From: Sandra Loosemore
Date: Thu, 13 Mar 2025 22:48:09 +0000 (+0000)
Subject: Doc: Rearrange remaining top-level sections in extend.texi [PR42270]
X-Git-Tag: basepoints/gcc-16~867
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=96492302a23c945d35fe1c83062da6f22c4f7b72;p=thirdparty%2Fgcc.git

Doc: Rearrange remaining top-level sections in extend.texi [PR42270]

This is part of an incremental effort to make the chapter on GCC
extensions better organized by grouping/rearranging sections by topic.

gcc/ChangeLog

	PR other/42270
	* doc/extend.texi (Nonlocal Gotos): Group with other built-ins
	sections.
	(Constructing Calls): Likewise.
	(Pragmas): Move earlier in the section, before the built-ins docs.
	(Thread-Local): Likewise.
	(OpenMP): Likewise.
	(OpenACC): Likewise.
---

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 59cad54d2cd..9f8a590a301 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -23,8 +23,6 @@ Some features that are in ISO C99 but not C90 or C++ are also, as
 extensions, accepted by GCC in C90 mode and in C++.
 
 @menu
-* Nonlocal Gotos:: Nonlocal gotos.
-* Constructing Calls:: Dispatching a call to another function.
 * Additional Numeric Types:: Additional sizes and formats, plus complex numbers.
 * Aggregate Types:: Extensions to arrays, structs, and unions.
 * Named Address Spaces:: Named address spaces.
@@ -36,11 +34,17 @@ extensions, accepted by GCC in C90 mode and in C++.
 * Function Properties:: Declaring that functions have no side effects, or that they can never return.
 * Enumerator Attributes:: Specifying attributes on enumerators.
 * Statement Attributes:: Specifying attributes on statements.
 * Attribute Syntax:: Formal syntax for attributes.
+* Pragmas:: Pragmas accepted by GCC.
+* Thread-Local:: Per-thread variables.
+* OpenMP:: Multiprocessing extensions.
+* OpenACC:: Extensions for offloading code to accelerator devices.
 * Inline:: Defining inline functions (as fast as macros).
 * Volatiles:: What constitutes an access to a volatile object.
 * Using Assembly Language with C:: Instructions and extensions for interfacing C with assembler.
 * Syntax Extensions:: Other extensions to C syntax.
 * Semantic Extensions:: GNU C defines behavior for some non-standard constructs.
+* Nonlocal Gotos:: Built-ins for nonlocal gotos.
+* Constructing Calls:: Built-ins for dispatching a call to another function.
 * Return Address:: Getting the return or frame address of a function.
 * Stack Scrubbing:: Stack scrubbing internal interfaces.
 * Vector Extensions:: Using vector instructions through built-in functions.
@@ -55,184 +59,8 @@ extensions, accepted by GCC in C90 mode and in C++.
 * Other Builtins:: Other built-in functions.
 * Target Builtins:: Built-in functions specific to particular targets.
 * Target Format Checks:: Format checks specific to particular targets.
-* Pragmas:: Pragmas accepted by GCC.
-* Thread-Local:: Per-thread variables.
-* OpenMP:: Multiprocessing extensions.
-* OpenACC:: Extensions for offloading code to accelerator devices.
 @end menu
 
-@node Nonlocal Gotos
-@section Nonlocal Gotos
-@cindex nonlocal gotos
-
-GCC provides the built-in functions @code{__builtin_setjmp} and
-@code{__builtin_longjmp} which are similar to, but not interchangeable
-with, the C library functions @code{setjmp} and @code{longjmp}.
-The built-in versions are used internally by GCC's libraries
-to implement exception handling on some targets. You should use the
-standard C library functions declared in @code{<setjmp.h>} in user code
-instead of the builtins.
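For ordinary user code, the portable pattern looks like this (a minimal
sketch using only the standard @code{<setjmp.h>} interface):

@smallexample
#include <setjmp.h>

static jmp_buf env;

static void
fail (void)
@{
  longjmp (env, 42);  /* Unwind back to the setjmp call below.  */
@}

int
f (void)
@{
  if (setjmp (env) == 0)  /* Returns 0 when called directly.  */
    fail ();              /* Control reenters setjmp, which then
                             returns 42.  */
  return 0;
@}
@end smallexample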
-
-The built-in versions of these functions use GCC's normal
-mechanisms to save and restore registers using the stack on function
-entry and exit. The jump buffer argument @var{buf} holds only the
-information needed to restore the stack frame, rather than the entire
-set of saved register values.
-
-An important caveat is that GCC arranges to save and restore only
-those registers known to the specific architecture variant being
-compiled for. This can make @code{__builtin_setjmp} and
-@code{__builtin_longjmp} more efficient than their library
-counterparts in some cases, but it can also cause incorrect and
-mysterious behavior when mixing with code that uses the full register
-set.
-
-You should declare the jump buffer argument @var{buf} to the
-built-in functions as:
-
-@smallexample
-#include <stdint.h>
-intptr_t @var{buf}[5];
-@end smallexample
-
-@defbuiltin{{int} __builtin_setjmp (intptr_t *@var{buf})}
-This function saves the current stack context in @var{buf}.
-@code{__builtin_setjmp} returns 0 when returning directly,
-and 1 when returning from @code{__builtin_longjmp} using the same
-@var{buf}.
-@enddefbuiltin
-
-@defbuiltin{{void} __builtin_longjmp (intptr_t *@var{buf}, int @var{val})}
-This function restores the stack context in @var{buf},
-saved by a previous call to @code{__builtin_setjmp}. After
-@code{__builtin_longjmp} is finished, the program resumes execution as
-if the matching @code{__builtin_setjmp} returns the value @var{val},
-which must be 1.
-
-Because @code{__builtin_longjmp} depends on the function return
-mechanism to restore the stack context, it cannot be called
-from the same function calling @code{__builtin_setjmp} to
-initialize @var{buf}. It can only be called from a function called
-(directly or indirectly) from the function calling @code{__builtin_setjmp}.
-@enddefbuiltin
-
-@node Constructing Calls
-@section Constructing Function Calls
-@cindex constructing calls
-@cindex forwarding calls
-
-Using the built-in functions described below, you can record
-the arguments a function received, and call another function
-with the same arguments, without knowing the number or types
-of the arguments.
-
-You can also record the return value of that function call,
-and later return that value, without knowing what data type
-the function tried to return (as long as your caller expects
-that data type).
-
-However, these built-in functions may interact badly with some
-sophisticated features or other extensions of the language. It
-is, therefore, not recommended to use them outside very simple
-functions acting as mere forwarders for their arguments.
-
-@defbuiltin{{void *} __builtin_apply_args ()}
-This built-in function returns a pointer to data
-describing how to perform a call with the same arguments as are passed
-to the current function.
-
-The function saves the arg pointer register, structure value address,
-and all registers that might be used to pass arguments to a function
-into a block of memory allocated on the stack. Then it returns the
-address of that block.
-@enddefbuiltin
-
-@defbuiltin{{void *} __builtin_apply (void (*@var{function})(), void *@var{arguments}, size_t @var{size})}
-This built-in function invokes @var{function}
-with a copy of the parameters described by @var{arguments}
-and @var{size}.
-
-The value of @var{arguments} should be the value returned by
-@code{__builtin_apply_args}. The argument @var{size} specifies the size
-of the stack argument data, in bytes.
- -This function returns a pointer to data describing -how to return whatever value is returned by @var{function}. The data -is saved in a block of memory allocated on the stack. - -It is not always simple to compute the proper value for @var{size}. The -value is used by @code{__builtin_apply} to compute the amount of data -that should be pushed on the stack and copied from the incoming argument -area. -@enddefbuiltin - -@defbuiltin{{void} __builtin_return (void *@var{result})} -This built-in function returns the value described by @var{result} from -the containing function. You should specify, for @var{result}, a value -returned by @code{__builtin_apply}. -@enddefbuiltin - -@defbuiltin{{} __builtin_va_arg_pack ()} -This built-in function represents all anonymous arguments of an inline -function. It can be used only in inline functions that are always -inlined, never compiled as a separate function, such as those using -@code{__attribute__ ((__always_inline__))} or -@code{__attribute__ ((__gnu_inline__))} extern inline functions. -It must be only passed as last argument to some other function -with variable arguments. This is useful for writing small wrapper -inlines for variable argument functions, when using preprocessor -macros is undesirable. For example: -@smallexample -extern int myprintf (FILE *f, const char *format, ...); -extern inline __attribute__ ((__gnu_inline__)) int -myprintf (FILE *f, const char *format, ...) -@{ - int r = fprintf (f, "myprintf: "); - if (r < 0) - return r; - int s = fprintf (f, format, __builtin_va_arg_pack ()); - if (s < 0) - return s; - return r + s; -@} -@end smallexample -@enddefbuiltin - -@defbuiltin{int __builtin_va_arg_pack_len ()} -This built-in function returns the number of anonymous arguments of -an inline function. It can be used only in inline functions that -are always inlined, never compiled as a separate function, such -as those using @code{__attribute__ ((__always_inline__))} or -@code{__attribute__ ((__gnu_inline__))} extern inline functions. -For example following does link- or run-time checking of open -arguments for optimized code: -@smallexample -#ifdef __OPTIMIZE__ -extern inline __attribute__((__gnu_inline__)) int -myopen (const char *path, int oflag, ...) -@{ - if (__builtin_va_arg_pack_len () > 1) - warn_open_too_many_arguments (); - - if (__builtin_constant_p (oflag)) - @{ - if ((oflag & O_CREAT) != 0 && __builtin_va_arg_pack_len () < 1) - @{ - warn_open_missing_mode (); - return __open_2 (path, oflag); - @} - return open (path, oflag, __builtin_va_arg_pack ()); - @} - - if (__builtin_va_arg_pack_len () < 1) - return __open_2 (path, oflag); - - return open (path, oflag, __builtin_va_arg_pack ()); -@} -#endif -@end smallexample -@enddefbuiltin - @node Additional Numeric Types @section Additional Numeric Types @@ -9751,19850 +9579,20022 @@ target type; if such an attribute is applied to a function return type that is not a pointer-to-function type, it is treated as applying to the function type. -@node Inline -@section An Inline Function is As Fast As a Macro -@cindex inline functions -@cindex integrating function code -@cindex open coding -@cindex macros, inline alternative +@node Pragmas +@section Pragmas Accepted by GCC +@cindex pragmas +@cindex @code{#pragma} -By declaring a function inline, you can direct GCC to make -calls to that function faster. One way GCC can achieve this is to -integrate that function's code into the code for its callers. 
This -makes execution faster by eliminating the function-call overhead; in -addition, if any of the actual argument values are constant, their -known values may permit simplifications at compile time so that not -all of the inline function's code needs to be included. The effect on -code size is less predictable; object code may be larger or smaller -with function inlining, depending on the particular case. You can -also direct GCC to try to integrate all ``simple enough'' functions -into their callers with the option @option{-finline-functions}. +GCC supports several types of pragmas, primarily in order to compile +code originally written for other compilers. Note that in general +we do not recommend the use of pragmas; @xref{Function Attributes}, +for further explanation. -GCC implements three different semantics of declaring a function -inline. One is available with @option{-std=gnu89} or -@option{-fgnu89-inline} or when @code{gnu_inline} attribute is present -on all inline declarations, another when -@option{-std=c99}, -@option{-std=gnu99} or an option for a later C version is used -(without @option{-fgnu89-inline}), and the third -is used when compiling C++. +The GNU C preprocessor recognizes several pragmas in addition to the +compiler pragmas documented here. Refer to the CPP manual for more +information. -To declare a function inline, use the @code{inline} keyword in its -declaration, like this: +GCC additionally recognizes OpenMP pragmas when the @option{-fopenmp} +option is specified, and OpenACC pragmas when the @option{-fopenacc} +option is specified. @xref{OpenMP}, and @ref{OpenACC}. + +@menu +* AArch64 Pragmas:: +* ARM Pragmas:: +* LoongArch Pragmas:: +* M32C Pragmas:: +* PRU Pragmas:: +* RS/6000 and PowerPC Pragmas:: +* S/390 Pragmas:: +* Darwin Pragmas:: +* Solaris Pragmas:: +* Symbol-Renaming Pragmas:: +* Structure-Layout Pragmas:: +* Weak Pragmas:: +* Diagnostic Pragmas:: +* Visibility Pragmas:: +* Push/Pop Macro Pragmas:: +* Function Specific Option Pragmas:: +* Loop-Specific Pragmas:: +@end menu + +@node AArch64 Pragmas +@subsection AArch64 Pragmas +The pragmas defined by the AArch64 target correspond to the AArch64 +target function attributes. They can be specified as below: @smallexample -static inline int -inc (int *a) -@{ - return (*a)++; -@} +#pragma GCC target("string") @end smallexample -If you are writing a header file to be included in ISO C90 programs, write -@code{__inline__} instead of @code{inline}. @xref{Alternate Keywords}. +where @code{@var{string}} can be any string accepted as an AArch64 target +attribute. @xref{AArch64 Function Attributes}, for more details +on the permissible values of @code{string}. -The three types of inlining behave similarly in two important cases: -when the @code{inline} keyword is used on a @code{static} function, -like the example above, and when a function is first declared without -using the @code{inline} keyword and then is defined with -@code{inline}, like this: +@node ARM Pragmas +@subsection ARM Pragmas -@smallexample -extern int inc (int *a); -inline int -inc (int *a) -@{ - return (*a)++; -@} -@end smallexample +The ARM target defines pragmas for controlling the default addition of +@code{long_call} and @code{short_call} attributes to functions. +@xref{Function Attributes}, for information about the effects of these +attributes. -In both of these common cases, the program behaves the same as if you -had not used the @code{inline} keyword, except for its speed. 
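Outside these two common cases the semantics diverge. As one
illustration (a sketch of the ISO C99 pattern, not anything specific to
GCC; the file names are hypothetical), a header can supply an inline
definition while exactly one source file requests the out-of-line,
externally visible definition:

@smallexample
/* inc.h */
inline int
inc (int *a)
@{
  return (*a)++;
@}

/* inc.c -- compiled with -std=c99 or later.  The extern declaration
   asks this translation unit to emit the external definition.  */
#include "inc.h"
extern inline int inc (int *a);
@end smallexample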
+@table @code +@cindex pragma, long_calls +@item long_calls +Set all subsequent functions to have the @code{long_call} attribute. -@cindex inline functions, omission of -@opindex fkeep-inline-functions -When a function is both inline and @code{static}, if all calls to the -function are integrated into the caller, and the function's address is -never used, then the function's own assembler code is never referenced. -In this case, GCC does not actually output assembler code for the -function, unless you specify the option @option{-fkeep-inline-functions}. -If there is a nonintegrated call, then the function is compiled to -assembler code as usual. The function must also be compiled as usual if -the program refers to its address, because that cannot be inlined. +@cindex pragma, no_long_calls +@item no_long_calls +Set all subsequent functions to have the @code{short_call} attribute. -@opindex Winline -Note that certain usages in a function definition can make it unsuitable -for inline substitution. Among these usages are: variadic functions, -use of @code{alloca}, use of computed goto (@pxref{Labels as Values}), -use of nonlocal goto, use of nested functions, use of @code{setjmp}, use -of @code{__builtin_longjmp} and use of @code{__builtin_return} or -@code{__builtin_apply_args}. Using @option{-Winline} warns when a -function marked @code{inline} could not be substituted, and gives the -reason for the failure. +@cindex pragma, long_calls_off +@item long_calls_off +Do not affect the @code{long_call} or @code{short_call} attributes of +subsequent functions. +@end table -@cindex automatic @code{inline} for C++ member fns -@cindex @code{inline} automatic for C++ member fns -@cindex member fns, automatically @code{inline} -@cindex C++ member fns, automatically @code{inline} -@opindex fno-default-inline -As required by ISO C++, GCC considers member functions defined within -the body of a class to be marked inline even if they are -not explicitly declared with the @code{inline} keyword. You can -override this with @option{-fno-default-inline}; @pxref{C++ Dialect -Options,,Options Controlling C++ Dialect}. +@node LoongArch Pragmas +@subsection LoongArch Pragmas -GCC does not inline any functions when not optimizing unless you specify -the @samp{always_inline} attribute for the function, like this: +The list of attributes supported by Pragma is the same as that of target +function attributes. @xref{LoongArch Function Attributes}. + +Example: @smallexample -/* @r{Prototype.} */ -inline void foo (const char) __attribute__((always_inline)); +#pragma GCC target("strict-align") @end smallexample -The remainder of this section is specific to GNU C90 inlining. +@node M32C Pragmas +@subsection M32C Pragmas -@cindex non-static inline function -When an inline function is not @code{static}, then the compiler must assume -that there may be calls from other source files; since a global symbol can -be defined only once in any program, the function must not be defined in -the other source files, so the calls therein cannot be integrated. -Therefore, a non-@code{static} inline function is always compiled on its -own in the usual fashion. +@table @code +@cindex pragma, memregs +@item GCC memregs @var{number} +Overrides the command-line option @code{-memregs=} for the current +file. Use with care! This pragma must be before any function in the +file, and mixing different memregs values in different objects may +make them incompatible. 
This pragma is useful when a +performance-critical function uses a memreg for temporary values, +as it may allow you to reduce the number of memregs used. -If you specify both @code{inline} and @code{extern} in the function -definition, then the definition is used only for inlining. In no case -is the function compiled on its own, not even if you refer to its -address explicitly. Such an address becomes an external reference, as -if you had only declared the function, and had not defined it. +@cindex pragma, address +@item ADDRESS @var{name} @var{address} +For any declared symbols matching @var{name}, this does three things +to that symbol: it forces the symbol to be located at the given +address (a number), it forces the symbol to be volatile, and it +changes the symbol's scope to be static. This pragma exists for +compatibility with other compilers, but note that the common +@code{1234H} numeric syntax is not supported (use @code{0x1234} +instead). Example: -This combination of @code{inline} and @code{extern} has almost the -effect of a macro. The way to use it is to put a function definition in -a header file with these keywords, and put another copy of the -definition (lacking @code{inline} and @code{extern}) in a library file. -The definition in the header file causes most calls to the function -to be inlined. If any uses of the function remain, they refer to -the single copy in the library. +@smallexample +#pragma ADDRESS port3 0x103 +char port3; +@end smallexample -@node Volatiles -@section When is a Volatile Object Accessed? -@cindex accessing volatiles -@cindex volatile read -@cindex volatile write -@cindex volatile access +@end table -C has the concept of volatile objects. These are normally accessed by -pointers and used for accessing hardware or inter-thread -communication. The standard encourages compilers to refrain from -optimizations concerning accesses to volatile objects, but leaves it -implementation defined as to what constitutes a volatile access. The -minimum requirement is that at a sequence point all previous accesses -to volatile objects have stabilized and no subsequent accesses have -occurred. Thus an implementation is free to reorder and combine -volatile accesses that occur between sequence points, but cannot do -so for accesses across a sequence point. The use of volatile does -not allow you to violate the restriction on updating objects multiple -times between two sequence points. +@node PRU Pragmas +@subsection PRU Pragmas -Accesses to non-volatile objects are not ordered with respect to -volatile accesses. You cannot use a volatile object as a memory -barrier to order a sequence of writes to non-volatile memory. For -instance: +@table @code + +@cindex pragma, ctable_entry +@item ctable_entry @var{index} @var{constant_address} +Specifies that the PRU CTABLE entry given by @var{index} has the value +@var{constant_address}. This enables GCC to emit LBCO/SBCO instructions +when the load/store address is known and can be addressed with some CTABLE +entry. For example: @smallexample -int *ptr = @var{something}; -volatile int vobj; -*ptr = @var{something}; -vobj = 1; +/* will compile to "sbco Rx, 2, 0x10, 4" */ +#pragma ctable_entry 2 0x4802a000 +*(unsigned int *)0x4802a010 = val; @end smallexample -@noindent -Unless @var{*ptr} and @var{vobj} can be aliased, it is not guaranteed -that the write to @var{*ptr} occurs by the time the update -of @var{vobj} happens. 
If you need this guarantee, you must use -a stronger memory barrier such as: +@end table -@smallexample -int *ptr = @var{something}; -volatile int vobj; -*ptr = @var{something}; -asm volatile ("" : : : "memory"); -vobj = 1; -@end smallexample +@node RS/6000 and PowerPC Pragmas +@subsection RS/6000 and PowerPC Pragmas -A scalar volatile object is read when it is accessed in a void context: +The RS/6000 and PowerPC targets define one pragma for controlling +whether or not the @code{longcall} attribute is added to function +declarations by default. This pragma overrides the @option{-mlongcall} +option, but not the @code{longcall} and @code{shortcall} attributes. +@xref{RS/6000 and PowerPC Options}, for more information about when long +calls are and are not necessary. -@smallexample -volatile int *src = @var{somevalue}; -*src; -@end smallexample +@table @code +@cindex pragma, longcall +@item longcall (1) +Apply the @code{longcall} attribute to all subsequent function +declarations. -Such expressions are rvalues, and GCC implements this as a -read of the volatile object being pointed to. +@item longcall (0) +Do not apply the @code{longcall} attribute to subsequent function +declarations. +@end table -Assignments are also expressions and have an rvalue. However when -assigning to a scalar volatile, the volatile object is not reread, -regardless of whether the assignment expression's rvalue is used or -not. If the assignment's rvalue is used, the value is that assigned -to the volatile object. For instance, there is no read of @var{vobj} -in all the following cases: +@c Describe h8300 pragmas here. +@c Describe sh pragmas here. +@c Describe v850 pragmas here. -@smallexample -int obj; -volatile int vobj; -vobj = @var{something}; -obj = vobj = @var{something}; -obj ? vobj = @var{onething} : vobj = @var{anotherthing}; -obj = (@var{something}, vobj = @var{anotherthing}); -@end smallexample +@node S/390 Pragmas +@subsection S/390 Pragmas -If you need to read the volatile object after an assignment has -occurred, you must use a separate expression with an intervening -sequence point. +The pragmas defined by the S/390 target correspond to the S/390 +target function attributes and some the additional options: -As bit-fields are not individually addressable, volatile bit-fields may -be implicitly read when written to, or when adjacent bit-fields are -accessed. Bit-field operations may be optimized such that adjacent -bit-fields are only partially accessed, if they straddle a storage unit -boundary. For these reasons it is unwise to use volatile bit-fields to -access hardware. +@table @samp +@item zvector +@itemx no-zvector +@end table -@node Using Assembly Language with C -@section How to Use Inline Assembly Language in C Code -@cindex @code{asm} keyword -@cindex assembly language in C -@cindex inline assembly language -@cindex mixing assembly language and C +Note that options of the pragma, unlike options of the target +attribute, do change the value of preprocessor macros like +@code{__VEC__}. They can be specified as below: -The @code{asm} keyword allows you to embed assembler instructions -within C code. GCC provides two forms of inline @code{asm} -statements. A @dfn{basic @code{asm}} statement is one with no -operands (@pxref{Basic Asm}), while an @dfn{extended @code{asm}} -statement (@pxref{Extended Asm}) includes one or more operands. -The extended form is preferred for mixing C and assembly language -within a function and can be used at top level as well with certain -restrictions. 
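For instance, a file-scope basic @code{asm} can emit raw assembler
directives (a sketch; it assumes the GNU assembler and an ELF-style
target where C symbols carry no prefix):

@smallexample
/* At file scope, outside any function.  */
__asm__ (".globl answer\n"
         ".data\n"
         "answer: .long 42\n"
         ".text");

extern int answer;  /* The symbol defined above, usable from C.  */
@end smallexample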
+@smallexample +#pragma GCC target("string[,string]...") +#pragma GCC target("string"[,"string"]...) +@end smallexample -You can also use the @code{asm} keyword to override the assembler name -for a C symbol, or to place a C variable in a specific register. +@node Darwin Pragmas +@subsection Darwin Pragmas -@menu -* Basic Asm:: Inline assembler without operands. -* Extended Asm:: Inline assembler with operands. -* Constraints:: Constraints for @code{asm} operands -* Asm constexprs:: C++11 constant expressions instead of string - literals. -* Asm Labels:: Specifying the assembler name to use for a C symbol. -* Explicit Register Variables:: Defining variables residing in specified - registers. -* Size of an asm:: How GCC calculates the size of an @code{asm} block. -@end menu +The following pragmas are available for all architectures running the +Darwin operating system. These are useful for compatibility with other +macOS compilers. -@node Basic Asm -@subsection Basic Asm --- Assembler Instructions Without Operands -@cindex basic @code{asm} -@cindex assembly language in C, basic +@table @code +@cindex pragma, mark +@item mark @var{tokens}@dots{} +This pragma is accepted, but has no effect. -A basic @code{asm} statement has the following syntax: +@cindex pragma, options align +@item options align=@var{alignment} +This pragma sets the alignment of fields in structures. The values of +@var{alignment} may be @code{mac68k}, to emulate m68k alignment, or +@code{power}, to emulate PowerPC alignment. Uses of this pragma nest +properly; to restore the previous setting, use @code{reset} for the +@var{alignment}. -@example -asm @var{asm-qualifiers} ( @var{AssemblerInstructions} ) -@end example +@cindex pragma, segment +@item segment @var{tokens}@dots{} +This pragma is accepted, but has no effect. -For the C language, the @code{asm} keyword is a GNU extension. -When writing C code that can be compiled with @option{-ansi} and the -@option{-std} options that select C dialects without GNU extensions, use -@code{__asm__} instead of @code{asm} (@pxref{Alternate Keywords}). For -the C++ language, @code{asm} is a standard keyword, but @code{__asm__} -can be used for code compiled with @option{-fno-asm}. +@cindex pragma, unused +@item unused (@var{var} [, @var{var}]@dots{}) +This pragma declares variables to be possibly unused. GCC does not +produce warnings for the listed variables. The effect is similar to +that of the @code{unused} attribute, except that this pragma may appear +anywhere within the variables' scopes. +@end table + +@node Solaris Pragmas +@subsection Solaris Pragmas + +The Solaris target supports @code{#pragma redefine_extname} +(@pxref{Symbol-Renaming Pragmas}). It also supports additional +@code{#pragma} directives for compatibility with the system compiler. -@subsubheading Qualifiers @table @code -@item volatile -The optional @code{volatile} qualifier has no effect. -All basic @code{asm} blocks are implicitly volatile. -Basic @code{asm} statements outside of functions may not use any -qualifiers. +@cindex pragma, align +@item align @var{alignment} (@var{variable} [, @var{variable}]...) -@item inline -If you use the @code{inline} qualifier, then for inlining purposes the size -of the @code{asm} statement is taken as the smallest size possible (@pxref{Size -of an asm}). -@end table +Increase the minimum alignment of each @var{variable} to @var{alignment}. +This is the same as GCC's @code{aligned} attribute @pxref{Variable +Attributes}). 
Macro expansion occurs on the arguments to this pragma +when compiling C and Objective-C@. It does not currently occur when +compiling C++, but this is a bug which may be fixed in a future +release. -@subsubheading Parameters -@table @var +@cindex pragma, fini +@item fini (@var{function} [, @var{function}]...) -@item AssemblerInstructions -This is a literal string that specifies the assembler code. -In C++ with @option{-std=gnu++11} or later, it can -also be a constant expression inside parentheses (see @ref{Asm constexprs}). +This pragma causes each listed @var{function} to be called after +main, or during shared module unloading, by adding a call to the +@code{.fini} section. -The string can contain any instructions recognized by the assembler, -including directives. GCC does not parse the assembler instructions -themselves and does not know what they mean or even whether they are -valid assembler input. +@cindex pragma, init +@item init (@var{function} [, @var{function}]...) + +This pragma causes each listed @var{function} to be called during +initialization (before @code{main}) or during shared module loading, by +adding a call to the @code{.init} section. -You may place multiple assembler instructions together in a single @code{asm} -string, separated by the characters normally used in assembly code for the -system. A combination that works in most places is a newline to break the -line, plus a tab character (written as @samp{\n\t}). -Some assemblers allow semicolons as a line separator. However, -note that some assembler dialects use semicolons to start a comment. @end table -@subsubheading Remarks -Using extended @code{asm} (@pxref{Extended Asm}) typically produces -smaller, safer, and more efficient code, and in most cases it is a -better solution than basic @code{asm}. However, functions declared -with the @code{naked} attribute require only basic @code{asm} -(@pxref{Function Attributes}). +@node Symbol-Renaming Pragmas +@subsection Symbol-Renaming Pragmas -Basic @code{asm} statements may be used both inside a C function or at -file scope (``top-level''), where you can use this technique to emit -assembler directives, define assembly language macros that can be invoked -elsewhere in the file, or write entire functions in assembly language. +GCC supports a @code{#pragma} directive that changes the name used in +assembly for a given declaration. While this pragma is supported on all +platforms, it is intended primarily to provide compatibility with the +Solaris system headers. This effect can also be achieved using the asm +labels extension (@pxref{Asm Labels}). -Safely accessing C data and calling functions from basic @code{asm} is more -complex than it may appear. To access C data, it is better to use extended -@code{asm}. +@table @code +@cindex pragma, redefine_extname +@item redefine_extname @var{oldname} @var{newname} -Do not expect a sequence of @code{asm} statements to remain perfectly -consecutive after compilation. If certain instructions need to remain -consecutive in the output, put them in a single multi-instruction @code{asm} -statement. Note that GCC's optimizers can move @code{asm} statements -relative to other code, including across jumps. +This pragma gives the C function @var{oldname} the assembly symbol +@var{newname}. The preprocessor macro @code{__PRAGMA_REDEFINE_EXTNAME} +is defined if this pragma is available (currently on all platforms). +@end table -@code{asm} statements may not perform jumps into other @code{asm} statements. 
-GCC does not know about these jumps, and therefore cannot take -account of them when deciding how to optimize. Jumps from @code{asm} to C -labels are only supported in extended @code{asm}. +This pragma and the @code{asm} labels extension interact in a complicated +manner. Here are some corner cases you may want to be aware of: -Under certain circumstances, GCC may duplicate (or remove duplicates of) your -assembly code when optimizing. This can lead to unexpected duplicate -symbol errors during compilation if your assembly code defines symbols or -labels. +@enumerate +@item This pragma silently applies only to declarations with external +linkage. The @code{asm} label feature does not have this restriction. -@strong{Warning:} The C standards do not specify semantics for @code{asm}, -making it a potential source of incompatibilities between compilers. These -incompatibilities may not produce compiler warnings/errors. +@item In C++, this pragma silently applies only to declarations with +``C'' linkage. Again, @code{asm} labels do not have this restriction. -GCC does not parse basic @code{asm}'s @var{AssemblerInstructions}, which -means there is no way to communicate to the compiler what is happening -inside them. GCC has no visibility of symbols in the @code{asm} and may -discard them as unreferenced. It also does not know about side effects of -the assembler code, such as modifications to memory or registers. Unlike -some compilers, GCC assumes that no changes to general purpose registers -occur. This assumption may change in a future release. +@item If either of the ways of changing the assembly name of a +declaration are applied to a declaration whose assembly name has +already been determined (either by a previous use of one of these +features, or because the compiler needed the assembly name in order to +generate code), and the new name is different, a warning issues and +the name does not change. -To avoid complications from future changes to the semantics and the -compatibility issues between compilers, consider replacing basic @code{asm} -with extended @code{asm}. See -@uref{https://gcc.gnu.org/wiki/ConvertBasicAsmToExtended, How to convert -from basic asm to extended asm} for information about how to perform this -conversion. +@item The @var{oldname} used by @code{#pragma redefine_extname} is +always the C-language name. +@end enumerate -The compiler copies the assembler instructions in a basic @code{asm} -verbatim to the assembly language output file, without -processing dialects or any of the @samp{%} operators that are available with -extended @code{asm}. This results in minor differences between basic -@code{asm} strings and extended @code{asm} templates. For example, to refer to -registers you might use @samp{%eax} in basic @code{asm} and -@samp{%%eax} in extended @code{asm}. +@node Structure-Layout Pragmas +@subsection Structure-Layout Pragmas -On targets such as x86 that support multiple assembler dialects, -all basic @code{asm} blocks use the assembler dialect specified by the -@option{-masm} command-line option (@pxref{x86 Options}). -Basic @code{asm} provides no -mechanism to provide different assembler strings for different dialects. +For compatibility with Microsoft Windows compilers, GCC supports a +set of @code{#pragma} directives that change the maximum alignment of +members of structures (other than zero-width bit-fields), unions, and +classes subsequently defined. 
The @var{n} value below always is required +to be a small power of two and specifies the new alignment in bytes. -For basic @code{asm} with non-empty assembler string GCC assumes -the assembler block does not change any general purpose registers, -but it may read or write any globally accessible variable. +@enumerate +@item @code{#pragma pack(@var{n})} simply sets the new alignment. +@item @code{#pragma pack()} sets the alignment to the one that was in +effect when compilation started (see also command-line option +@option{-fpack-struct[=@var{n}]} @pxref{Code Gen Options}). +@item @code{#pragma pack(push[,@var{n}])} pushes the current alignment +setting on an internal stack and then optionally sets the new alignment. +@item @code{#pragma pack(pop)} restores the alignment setting to the one +saved at the top of the internal stack (and removes that stack entry). +Note that @code{#pragma pack([@var{n}])} does not influence this internal +stack; thus it is possible to have @code{#pragma pack(push)} followed by +multiple @code{#pragma pack(@var{n})} instances and finalized by a single +@code{#pragma pack(pop)}. +@end enumerate -Here is an example of basic @code{asm} for i386: +Some targets, e.g.@: x86 and PowerPC, support the @code{#pragma ms_struct} +directive which lays out structures and unions subsequently defined as the +documented @code{__attribute__ ((ms_struct))}. -@example -/* Note that this code will not compile with -masm=intel */ -#define DebugBreak() asm("int $3") -@end example +@enumerate +@item @code{#pragma ms_struct on} turns on the Microsoft layout. +@item @code{#pragma ms_struct off} turns off the Microsoft layout. +@item @code{#pragma ms_struct reset} goes back to the default layout. +@end enumerate -@node Extended Asm -@subsection Extended Asm - Assembler Instructions with C Expression Operands -@cindex extended @code{asm} -@cindex assembly language in C, extended +Most targets also support the @code{#pragma scalar_storage_order} directive +which lays out structures and unions subsequently defined as the documented +@code{__attribute__ ((scalar_storage_order))}. -With extended @code{asm} you can read and write C variables from -assembler and perform jumps from assembler code to C labels. -Extended @code{asm} syntax uses colons (@samp{:}) to delimit -the operand parameters after the assembler template: +@enumerate +@item @code{#pragma scalar_storage_order big-endian} sets the storage order +of the scalar fields to big-endian. +@item @code{#pragma scalar_storage_order little-endian} sets the storage order +of the scalar fields to little-endian. +@item @code{#pragma scalar_storage_order default} goes back to the endianness +that was in effect when compilation started (see also command-line option +@option{-fsso-struct=@var{endianness}} @pxref{C Dialect Options}). +@end enumerate -@example -asm @var{asm-qualifiers} ( @var{AssemblerTemplate} - : @var{OutputOperands} - @r{[} : @var{InputOperands} - @r{[} : @var{Clobbers} @r{]} @r{]}) +@node Weak Pragmas +@subsection Weak Pragmas -asm @var{asm-qualifiers} ( @var{AssemblerTemplate} - : @var{OutputOperands} - : @var{InputOperands} - : @var{Clobbers} - : @var{GotoLabels}) -@end example -where in the last form, @var{asm-qualifiers} contains @code{goto} (and in the -first form, not). +For compatibility with SVR4, GCC supports a set of @code{#pragma} +directives for declaring symbols to be weak, and defining weak +aliases. -The @code{asm} keyword is a GNU extension. 
-When writing code that can be compiled with @option{-ansi} and the -various @option{-std} options, use @code{__asm__} instead of -@code{asm} (@pxref{Alternate Keywords}). - -@subsubheading Qualifiers @table @code +@cindex pragma, weak +@item #pragma weak @var{symbol} +This pragma declares @var{symbol} to be weak, as if the declaration +had the attribute of the same name. The pragma may appear before +or after the declaration of @var{symbol}. It is not an error for +@var{symbol} to never be defined at all. -@item volatile -The typical use of extended @code{asm} statements is to manipulate input -values to produce output values. However, your @code{asm} statements may -also produce side effects. If so, you may need to use the @code{volatile} -qualifier to disable certain optimizations. @xref{Volatile}. +@item #pragma weak @var{symbol1} = @var{symbol2} +This pragma declares @var{symbol1} to be a weak alias of @var{symbol2}. +It is an error if @var{symbol2} is not defined in the current +translation unit. +@end table -@item inline -If you use the @code{inline} qualifier, then for inlining purposes the size -of the @code{asm} statement is taken as the smallest size possible -(@pxref{Size of an asm}). +@node Diagnostic Pragmas +@subsection Diagnostic Pragmas -@item goto -This qualifier informs the compiler that the @code{asm} statement may -perform a jump to one of the labels listed in the @var{GotoLabels}. -@xref{GotoLabels}. -@end table +GCC allows the user to selectively enable or disable certain types of +diagnostics, and change the kind of the diagnostic. For example, a +project's policy might require that all sources compile with +@option{-Werror} but certain files might have exceptions allowing +specific types of warnings. Or, a project might selectively enable +diagnostics and treat them as errors depending on which preprocessor +macros are defined. -@subsubheading Parameters -@table @var -@item AssemblerTemplate -This is a literal string that is the template for the assembler code. It is a -combination of fixed text and tokens that refer to the input, output, -and goto parameters. @xref{AssemblerTemplate}. +@table @code +@cindex pragma, diagnostic +@item #pragma GCC diagnostic @var{kind} @var{option} -@item OutputOperands -A comma-separated list describing the C variables modified by the -instructions in the @var{AssemblerTemplate}. An empty list is permitted. -@xref{OutputOperands}. +Modifies the disposition of a diagnostic. Note that not all +diagnostics are modifiable; at the moment only warnings (normally +controlled by @samp{-W@dots{}}) can be controlled, and not all of them. +Use @option{-fdiagnostics-show-option} to determine which diagnostics +are controllable and which option controls them. -@item InputOperands -A comma-separated list describing the C expressions read by the -instructions in the @var{AssemblerTemplate}. An empty list is permitted. -@xref{InputOperands}. +@var{kind} is @samp{error} to treat this diagnostic as an error, +@samp{warning} to treat it like a warning (even if @option{-Werror} is +in effect), or @samp{ignored} if the diagnostic is to be ignored. +@var{option} is a double quoted string that matches the command-line +option. -@item Clobbers -A comma-separated list of registers or other values changed by the -@var{AssemblerTemplate}, beyond those listed as outputs. -An empty list is permitted. @xref{Clobbers and Scratch Registers}. 
+@smallexample +#pragma GCC diagnostic warning "-Wformat" +#pragma GCC diagnostic error "-Wformat" +#pragma GCC diagnostic ignored "-Wformat" +@end smallexample -@item GotoLabels -When you are using the @code{goto} form of @code{asm}, this section contains -the list of all C labels to which the code in the -@var{AssemblerTemplate} may jump. -@xref{GotoLabels}. +Note that these pragmas override any command-line options. GCC keeps +track of the location of each pragma, and issues diagnostics according +to the state as of that point in the source file. Thus, pragmas occurring +after a line do not affect diagnostics caused by that line. -@code{asm} statements may not perform jumps into other @code{asm} statements, -only to the listed @var{GotoLabels}. -GCC's optimizers do not know about other jumps; therefore they cannot take -account of them when deciding how to optimize. -@end table +@item #pragma GCC diagnostic push +@itemx #pragma GCC diagnostic pop -The total number of input + output + goto operands is limited to 30. +Causes GCC to remember the state of the diagnostics as of each +@code{push}, and restore to that point at each @code{pop}. If a +@code{pop} has no matching @code{push}, the command-line options are +restored. -@subsubheading Remarks -The @code{asm} statement allows you to include assembly instructions directly -within C code. This may help you to maximize performance in time-sensitive -code or to access assembly instructions that are not readily available to C -programs. +@smallexample +#pragma GCC diagnostic error "-Wuninitialized" + foo(a); /* error is given for this one */ +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wuninitialized" + foo(b); /* no diagnostic for this one */ +#pragma GCC diagnostic pop + foo(c); /* error is given for this one */ +#pragma GCC diagnostic pop + foo(d); /* depends on command-line options */ +@end smallexample -Similarly to basic @code{asm}, extended @code{asm} statements may be used -both inside a C function or at file scope (``top-level''), where you can -use this technique to emit assembler directives, define assembly language -macros that can be invoked elsewhere in the file, or write entire functions -in assembly language. -Extended @code{asm} statements outside of functions may not use any -qualifiers, may not specify clobbers, may not use @code{%}, @code{+} or -@code{&} modifiers in constraints and can only use constraints which don't -allow using any register. +@item #pragma GCC diagnostic ignored_attributes -Functions declared with the @code{naked} attribute require basic -@code{asm} (@pxref{Function Attributes}). +Similarly to @option{-Wno-attributes=}, this pragma allows users to suppress +warnings about unknown scoped attributes (in C++11 and C23). For example, +@code{#pragma GCC diagnostic ignored_attributes "vendor::attr"} disables +warning about the following declaration: -While the uses of @code{asm} are many and varied, it may help to think of an -@code{asm} statement as a series of low-level instructions that convert input -parameters to output parameters. 
So a simple (if not particularly useful) -example for i386 using @code{asm} might look like this: +@smallexample +[[vendor::attr]] void f(); +@end smallexample -@example -int src = 1; -int dst; +whereas @code{#pragma GCC diagnostic ignored_attributes "vendor::"} prevents +warning about both of these declarations: -asm ("mov %1, %0\n\t" - "add $1, %0" - : "=r" (dst) - : "r" (src)); +@smallexample +[[vendor::safe]] void f(); +[[vendor::unsafe]] void f2(); +@end smallexample -printf("%d\n", dst); -@end example +@end table -This code copies @code{src} to @code{dst} and add 1 to @code{dst}. +GCC also offers a simple mechanism for printing messages during +compilation. -@anchor{Volatile} -@subsubsection Volatile -@cindex volatile @code{asm} -@cindex @code{asm} volatile +@table @code +@cindex pragma, diagnostic +@item #pragma message @var{string} -GCC's optimizers sometimes discard @code{asm} statements if they determine -there is no need for the output variables. Also, the optimizers may move -code out of loops if they believe that the code will always return the same -result (i.e.@: none of its input values change between calls). Using the -@code{volatile} qualifier disables these optimizations. @code{asm} statements -that have no output operands and @code{asm goto} statements, -are implicitly volatile. +Prints @var{string} as a compiler message on compilation. The message +is informational only, and is neither a compilation warning nor an +error. Newlines can be included in the string by using the @samp{\n} +escape sequence. -This i386 code demonstrates a case that does not use (or require) the -@code{volatile} qualifier. If it is performing assertion checking, this code -uses @code{asm} to perform the validation. Otherwise, @code{dwRes} is -unreferenced by any code. As a result, the optimizers can discard the -@code{asm} statement, which in turn removes the need for the entire -@code{DoCheck} routine. By omitting the @code{volatile} qualifier when it -isn't needed you allow the optimizers to produce the most efficient code -possible. +@smallexample +#pragma message "Compiling " __FILE__ "..." +@end smallexample -@example -void DoCheck(uint32_t dwSomeValue) -@{ - uint32_t dwRes; +@var{string} may be parenthesized, and is printed with location +information. For example, - // Assumes dwSomeValue is not zero. - asm ("bsfl %1,%0" - : "=r" (dwRes) - : "r" (dwSomeValue) - : "cc"); +@smallexample +#define DO_PRAGMA(x) _Pragma (#x) +#define TODO(x) DO_PRAGMA(message ("TODO - " #x)) - assert(dwRes > 3); -@} -@end example +TODO(Remember to fix this) +@end smallexample -The next example shows a case where the optimizers can recognize that the input -(@code{dwSomeValue}) never changes during the execution of the function and can -therefore move the @code{asm} outside the loop to produce more efficient code. -Again, using the @code{volatile} qualifier disables this type of optimization. +@noindent +prints @samp{/tmp/file.c:4: note: #pragma message: +TODO - Remember to fix this}. -@example -void do_print(uint32_t dwSomeValue) -@{ - uint32_t dwRes; +@cindex pragma, diagnostic +@item #pragma GCC error @var{message} +Generates an error message. This pragma @emph{is} considered to +indicate an error in the compilation, and it will be treated as such. - for (uint32_t x=0; x < 5; x++) - @{ - // Assumes dwSomeValue is not zero. - asm ("bsfl %1,%0" - : "=r" (dwRes) - : "r" (dwSomeValue) - : "cc"); +Newlines can be included in the string by using the @samp{\n} +escape sequence. 
They will be displayed as newlines even if the +@option{-fmessage-length} option is set to zero. - printf("%u: %u %u\n", x, dwSomeValue, dwRes); - @} +The error is only generated if the pragma is present in the code after +pre-processing has been completed. It does not matter however if the +code containing the pragma is unreachable: + +@smallexample +#if 0 +#pragma GCC error "this error is not seen" +#endif +void foo (void) +@{ + return; +#pragma GCC error "this error is seen" @} -@end example +@end smallexample -The following example demonstrates a case where you need to use the -@code{volatile} qualifier. -It uses the x86 @code{rdtsc} instruction, which reads -the computer's time-stamp counter. Without the @code{volatile} qualifier, -the optimizers might assume that the @code{asm} block will always return the -same value and therefore optimize away the second call. +@cindex pragma, diagnostic +@item #pragma GCC warning @var{message} +This is just like @samp{pragma GCC error} except that a warning +message is issued instead of an error message. Unless +@option{-Werror} is in effect, in which case this pragma will generate +an error as well. -@example -uint64_t msr; +@end table -asm volatile ( "rdtsc\n\t" // Returns the time in EDX:EAX. - "shl $32, %%rdx\n\t" // Shift the upper bits left. - "or %%rdx, %0" // 'Or' in the lower bits. - : "=a" (msr) - : - : "rdx"); - -printf("msr: %llx\n", msr); +@node Visibility Pragmas +@subsection Visibility Pragmas -// Do other work... +@table @code +@cindex pragma, visibility +@item #pragma GCC visibility push(@var{visibility}) +@itemx #pragma GCC visibility pop -// Reprint the timestamp -asm volatile ( "rdtsc\n\t" // Returns the time in EDX:EAX. - "shl $32, %%rdx\n\t" // Shift the upper bits left. - "or %%rdx, %0" // 'Or' in the lower bits. - : "=a" (msr) - : - : "rdx"); +This pragma allows the user to set the visibility for multiple +declarations without having to give each a visibility attribute +(@pxref{Function Attributes}). -printf("msr: %llx\n", msr); -@end example +In C++, @samp{#pragma GCC visibility} affects only namespace-scope +declarations. Class members and template specializations are not +affected; if you want to override the visibility for a particular +member or instantiation, you must use an attribute. -GCC's optimizers do not treat this code like the non-volatile code in the -earlier examples. They do not move it out of loops or omit it on the -assumption that the result from a previous call is still valid. +@end table -Note that the compiler can move even @code{volatile asm} instructions relative -to other code, including across jump instructions. For example, on many -targets there is a system register that controls the rounding mode of -floating-point operations. Setting it with a @code{volatile asm} statement, -as in the following PowerPC example, does not work reliably. -@example -asm volatile("mtfsf 255, %0" : : "f" (fpenv)); -sum = x + y; -@end example +@node Push/Pop Macro Pragmas +@subsection Push/Pop Macro Pragmas -The compiler may move the addition back before the @code{volatile asm} -statement. To make it work as expected, add an artificial dependency to -the @code{asm} by referencing a variable in the subsequent code, for -example: +For compatibility with Microsoft Windows compilers, GCC supports +@samp{#pragma push_macro(@var{"macro_name"})} +and @samp{#pragma pop_macro(@var{"macro_name"})}. 
-@example -asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv)); -sum = x + y; -@end example +@table @code +@cindex pragma, push_macro +@item #pragma push_macro(@var{"macro_name"}) +This pragma saves the value of the macro named as @var{macro_name} to +the top of the stack for this macro. -Under certain circumstances, GCC may duplicate (or remove duplicates of) your -assembly code when optimizing. This can lead to unexpected duplicate symbol -errors during compilation if your @code{asm} code defines symbols or labels. -Using @samp{%=} -(@pxref{AssemblerTemplate}) may help resolve this problem. +@cindex pragma, pop_macro +@item #pragma pop_macro(@var{"macro_name"}) +This pragma sets the value of the macro named as @var{macro_name} to +the value on top of the stack for this macro. If the stack for +@var{macro_name} is empty, the value of the macro remains unchanged. +@end table -@anchor{AssemblerTemplate} -@subsubsection Assembler Template -@cindex @code{asm} assembler template +For example: -An assembler template is a literal string containing assembler instructions. -In C++ with @option{-std=gnu++11} or later, the assembler template can -also be a constant expression inside parentheses (see @ref{Asm constexprs}). +@smallexample +#define X 1 +#pragma push_macro("X") +#undef X +#define X -1 +#pragma pop_macro("X") +int x [X]; +@end smallexample -The compiler replaces tokens in the template that refer -to inputs, outputs, and goto labels, -and then outputs the resulting string to the assembler. The -string can contain any instructions recognized by the assembler, including -directives. GCC does not parse the assembler instructions -themselves and does not know what they mean or even whether they are valid -assembler input. However, it does count the statements -(@pxref{Size of an asm}). +@noindent +In this example, the definition of X as 1 is saved by @code{#pragma +push_macro} and restored by @code{#pragma pop_macro}. -You may place multiple assembler instructions together in a single @code{asm} -string, separated by the characters normally used in assembly code for the -system. A combination that works in most places is a newline to break the -line, plus a tab character to move to the instruction field (written as -@samp{\n\t}). -Some assemblers allow semicolons as a line separator. However, note -that some assembler dialects use semicolons to start a comment. +@node Function Specific Option Pragmas +@subsection Function Specific Option Pragmas -Do not expect a sequence of @code{asm} statements to remain perfectly -consecutive after compilation, even when you are using the @code{volatile} -qualifier. If certain instructions need to remain consecutive in the output, -put them in a single multi-instruction @code{asm} statement. +@table @code +@cindex pragma GCC target +@item #pragma GCC target (@var{string}, @dots{}) -Accessing data from C programs without using input/output operands (such as -by using global symbols directly from the assembler template) may not work as -expected. Similarly, calling functions directly from an assembler template -requires a detailed understanding of the target assembler and ABI. +This pragma allows you to set target-specific options for functions +defined later in the source file. One or more strings can be +specified. Each function that is defined after this point is treated +as if it had been declared with one @code{target(}@var{string}@code{)} +attribute for each @var{string} argument. The parentheses around +the strings in the pragma are optional. 
@xref{Function Attributes}, +for more information about the @code{target} attribute and the attribute +syntax. -Since GCC does not parse the assembler template, -it has no visibility of any -symbols it references. This may result in GCC discarding those symbols as -unreferenced unless they are also listed as input, output, or goto operands. +The @code{#pragma GCC target} pragma is presently implemented for +x86, ARM, AArch64, PowerPC, and S/390 targets only. -@subsubheading Special format strings +@cindex pragma GCC optimize +@item #pragma GCC optimize (@var{string}, @dots{}) -In addition to the tokens described by the input, output, and goto operands, -these tokens have special meanings in the assembler template: +This pragma allows you to set global optimization options for functions +defined later in the source file. One or more strings can be +specified. Each function that is defined after this point is treated +as if it had been declared with one @code{optimize(}@var{string}@code{)} +attribute for each @var{string} argument. The parentheses around +the strings in the pragma are optional. @xref{Function Attributes}, +for more information about the @code{optimize} attribute and the attribute +syntax. -@table @samp -@item %% -Outputs a single @samp{%} into the assembler code. +@cindex pragma GCC push_options +@cindex pragma GCC pop_options +@item #pragma GCC push_options +@itemx #pragma GCC pop_options -@item %= -Outputs a number that is unique to each instance of the @code{asm} -statement in the entire compilation. This option is useful when creating local -labels and referring to them multiple times in a single template that -generates multiple assembler instructions. +These pragmas maintain a stack of the current target and optimization +options. It is intended for include files where you temporarily want +to switch to using a different @samp{#pragma GCC target} or +@samp{#pragma GCC optimize} and then to pop back to the previous +options. -@item %@{ -@itemx %| -@itemx %@} -Outputs @samp{@{}, @samp{|}, and @samp{@}} characters (respectively) -into the assembler code. When unescaped, these characters have special -meaning to indicate multiple assembler dialects, as described below. -@end table +@cindex pragma GCC reset_options +@item #pragma GCC reset_options -@subsubheading Multiple assembler dialects in @code{asm} templates +This pragma clears the current @code{#pragma GCC target} and +@code{#pragma GCC optimize} to use the default switches as specified +on the command line. -On targets such as x86, GCC supports multiple assembler dialects. -The @option{-masm} option controls which dialect GCC uses as its -default for inline assembler. The target-specific documentation for the -@option{-masm} option contains the list of supported dialects, as well as the -default dialect if the option is not specified. This information may be -important to understand, since assembler code that works correctly when -compiled using one dialect will likely fail if compiled using another. -@xref{x86 Options}. +@end table -If your code needs to support multiple assembler dialects (for example, if -you are writing public headers that need to support a variety of compilation -options), use constructs of this form: +@node Loop-Specific Pragmas +@subsection Loop-Specific Pragmas -@example -@{ dialect0 | dialect1 | dialect2... 
-@end example

+@table @code
+@cindex pragma GCC ivdep
+@item #pragma GCC ivdep

-This construct outputs @code{dialect0}
-when using dialect #0 to compile the code,
-@code{dialect1} for dialect #1, etc. If there are fewer alternatives within the
-braces than the number of dialects the compiler supports, the construct
-outputs nothing.

+With this pragma, the programmer asserts that there are no loop-carried
+dependencies that would prevent consecutive iterations of
+the following loop from executing concurrently with SIMD
+(single instruction multiple data) instructions.

-For example, if an x86 compiler supports two dialects
-(@samp{att}, @samp{intel}), an
-assembler template such as this:

+For example, only with the pragma can the compiler unconditionally
+vectorize the following loop:

-@example
-"bt@{l %[Offset],%[Base] | %[Base],%[Offset]@}; jc %l2"
-@end example

+@smallexample
+void foo (int n, int *a, int *b, int *c)
+@{
+  int i;
+#pragma GCC ivdep
+  for (i = 0; i < n; ++i)
+    a[i] = b[i] + c[i];
+@}
+@end smallexample

@noindent
-is equivalent to one of

-@example
-"btl %[Offset],%[Base] ; jc %l2" @r{/* att dialect */}
-"bt %[Base],%[Offset]; jc %l2" @r{/* intel dialect */}
-@end example

+In this example, using the @code{restrict} qualifier would have the same
+effect. In the following example, that would not be possible. Assume
+@math{k < -m} or @math{k >= m}. Only with the pragma does the compiler
+know that it can unconditionally vectorize the following loop:

-Using that same compiler, this code:

+@smallexample
+void ignore_vec_dep (int *a, int k, int c, int m)
+@{
+#pragma GCC ivdep
+  for (int i = 0; i < m; i++)
+    a[i] = a[i + k] * c;
+@}
+@end smallexample

-@example
-"xchg@{l@}\t@{%%@}ebx, %1"
-@end example

+@cindex pragma GCC novector
+@item #pragma GCC novector

-@noindent
-corresponds to either

+With this pragma, the programmer asserts that the following loop must
+not be executed concurrently with SIMD (single instruction multiple
+data) instructions.

-@example
-"xchgl\t%%ebx, %1" @r{/* att dialect */}
-"xchg\tebx, %1" @r{/* intel dialect */}
-@end example

+For example, with the pragma, the compiler does not vectorize the
+following loop:

-There is no support for nesting dialect alternatives.

+@smallexample
+void foo (int n, int *a, int *b, int *c)
+@{
+  int i;
+#pragma GCC novector
+  for (i = 0; i < n; ++i)
+    a[i] = b[i] + c[i];
+@}
+@end smallexample

-@anchor{OutputOperands}
-@subsubsection Output Operands
-@cindex @code{asm} output operands

+@cindex pragma GCC unroll @var{n}
+@item #pragma GCC unroll @var{n}

-An @code{asm} statement has zero or more output operands indicating the names
-of C variables modified by the assembler code.

+You can use this pragma to control how many times a loop should be unrolled.
+It must be placed immediately before a @code{for}, @code{while} or @code{do}
+loop or a @code{#pragma GCC ivdep}, and applies only to the loop that follows.
+@var{n} is an integer constant expression specifying the unrolling factor.
+The values of @math{0} and @math{1} block any unrolling of the loop.

-In this i386 example, @code{old} (referred to in the template string as
-@code{%0}) and @code{*Base} (as @code{%1}) are outputs and @code{Offset}
-(@code{%2}) is an input:

+@end table

-@example
-bool old;

+@node Thread-Local
+@section Thread-Local Storage
+@cindex Thread-Local Storage
+@cindex @acronym{TLS}
+@cindex @code{__thread}

-__asm__ ("btsl %2,%1\n\t" // Turn on zero-based bit #Offset in Base.
-         "sbb %0,%0"      // Use the CF to calculate old.
-         : "=r" (old), "+rm" (*Base)
-         : "Ir" (Offset)
-         : "cc");

+Thread-local storage (@acronym{TLS}) is a mechanism by which variables
+are allocated such that there is one instance of the variable per extant
+thread. The runtime model GCC uses to implement this originates
+in the IA-64 processor-specific ABI, but has since been migrated
+to other processors as well. It requires significant support from
+the linker (@command{ld}), dynamic linker (@command{ld.so}), and
+system libraries (@file{libc.so} and @file{libpthread.so}), so it
+is not available everywhere.

-return old;
-@end example

+At the user level, the extension is visible with a new storage
+class keyword: @code{__thread}. For example:

-Operands are separated by commas. Each operand has this format:

+@smallexample
+__thread int i;
+extern __thread struct state s;
+static __thread char *p;
+@end smallexample

-@example
-@r{[} [@var{asmSymbolicName}] @r{]} @var{constraint} (@var{cvariablename})
-@end example

+The @code{__thread} specifier may be used alone, with the @code{extern}
+or @code{static} specifiers, but with no other storage class specifier.
+When used with @code{extern} or @code{static}, @code{__thread} must appear
+immediately after the other storage class specifier.

-@table @var
-@item asmSymbolicName
-Specifies an optional symbolic name for the operand. The literal square
-brackets @samp{[]} around the @var{asmSymbolicName} are required both
-in the operand specification and references to the operand in the assembler
-template, i.e.@: @samp{%[Value]}.
-The scope of the name is the @code{asm} statement
-that contains the definition. Any valid C identifier is acceptable,
-including names already defined in the surrounding code. No two operands
-within the same @code{asm} statement can use the same symbolic name.

+The @code{__thread} specifier may be applied to any global, file-scoped
+static, function-scoped static, or static data member of a class. It may
+not be applied to a block-scoped automatic variable or a non-static data
+member.

-When not using an @var{asmSymbolicName}, use the (zero-based) position
-of the operand
-in the list of operands in the assembler template. For example if there are
-three output operands, use @samp{%0} in the template to refer to the first,
-@samp{%1} for the second, and @samp{%2} for the third.

+When the address-of operator is applied to a thread-local variable, it is
+evaluated at run time and returns the address of the current thread's
+instance of that variable. An address so obtained may be used by any
+thread. When a thread terminates, any pointers to thread-local variables
+in that thread become invalid.

-@item constraint
-A string constant specifying constraints on the placement of the operand;
-@xref{Constraints}, for details.
-In C++ with @option{-std=gnu++11} or later, the constraint can
-also be a constant expression inside parentheses (see @ref{Asm constexprs}).

+No static initialization may refer to the address of a thread-local variable.

-Output constraints must begin with either @samp{=} (a variable overwriting an
-existing value) or @samp{+} (when reading and writing). When using
-@samp{=}, do not assume the location contains the existing value
-on entry to the @code{asm}, except
-when the operand is tied to an input; @pxref{InputOperands,,Input Operands}.

+In C++, if an initializer is present for a thread-local variable, it must
+be a @var{constant-expression}, as defined in 5.19.2 of the ANSI/ISO C++
+standard.
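+For illustration, here is a minimal sketch of these rules (the variable
+names are hypothetical):
+
+@smallexample
+__thread int val = 42;   /* @r{OK: constant initializer.} */
+int *p = &val;           /* @r{Error: a static initializer may not} */
+                         /* @r{refer to a thread-local address.} */
+
+int *
+get_val_ptr (void)
+@{
+  return &val;   /* @r{OK: evaluated at run time; yields the address} */
+                 /* @r{of the current thread's instance.} */
+@}
+@end smallexample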
-After the prefix, there must be one or more additional constraints -(@pxref{Constraints}) that describe where the value resides. Common -constraints include @samp{r} for register and @samp{m} for memory. -When you list more than one possible location (for example, @code{"=rm"}), -the compiler chooses the most efficient one based on the current context. -If you list as many alternates as the @code{asm} statement allows, you permit -the optimizers to produce the best possible code. -If you must use a specific register, but your Machine Constraints do not -provide sufficient control to select the specific register you want, -local register variables may provide a solution (@pxref{Local Register -Variables}). +See @uref{https://www.akkadia.org/drepper/tls.pdf, +ELF Handling For Thread-Local Storage} for a detailed explanation of +the four thread-local storage addressing models, and how the runtime +is expected to function. -@item cvariablename -Specifies a C lvalue expression to hold the output, typically a variable name. -The enclosing parentheses are a required part of the syntax. +@menu +* C99 Thread-Local Edits:: +* C++98 Thread-Local Edits:: +@end menu -@end table +@node C99 Thread-Local Edits +@subsection ISO/IEC 9899:1999 Edits for Thread-Local Storage -When the compiler selects the registers to use to -represent the output operands, it does not use any of the clobbered registers -(@pxref{Clobbers and Scratch Registers}). +The following are a set of changes to ISO/IEC 9899:1999 (aka C99) +that document the exact semantics of the language extension. -Output operand expressions must be lvalues. The compiler cannot check whether -the operands have data types that are reasonable for the instruction being -executed. For output expressions that are not directly addressable (for -example a bit-field), the constraint must allow a register. In that case, GCC -uses the register as the output of the @code{asm}, and then stores that -register into the output. +@itemize @bullet +@item +@cite{5.1.2 Execution environments} -Operands using the @samp{+} constraint modifier count as two operands -(that is, both as input and output) towards the total maximum of 30 operands -per @code{asm} statement. +Add new text after paragraph 1 -Use the @samp{&} constraint modifier (@pxref{Modifiers}) on all output -operands that must not overlap an input. Otherwise, -GCC may allocate the output operand in the same register as an unrelated -input operand, on the assumption that the assembler code consumes its -inputs before producing outputs. This assumption may be false if the assembler -code actually consists of more than one instruction. +@quotation +Within either execution environment, a @dfn{thread} is a flow of +control within a program. It is implementation defined whether +or not there may be more than one thread associated with a program. +It is implementation defined how threads beyond the first are +created, the name and type of the function called at thread +startup, and how threads may be terminated. However, objects +with thread storage duration shall be initialized before thread +startup. +@end quotation -The same problem can occur if one output parameter (@var{a}) allows a register -constraint and another output parameter (@var{b}) allows a memory constraint. -The code generated by GCC to access the memory address in @var{b} can contain -registers which @emph{might} be shared by @var{a}, and GCC considers those -registers to be inputs to the asm. 
As above, GCC assumes that such input -registers are consumed before any outputs are written. This assumption may -result in incorrect behavior if the @code{asm} statement writes to @var{a} -before using -@var{b}. Combining the @samp{&} modifier with the register constraint on @var{a} -ensures that modifying @var{a} does not affect the address referenced by -@var{b}. Otherwise, the location of @var{b} -is undefined if @var{a} is modified before using @var{b}. +@item +@cite{6.2.4 Storage durations of objects} -@code{asm} supports operand modifiers on operands (for example @samp{%k2} -instead of simply @samp{%2}). @ref{GenericOperandmodifiers, -Generic Operand modifiers} lists the modifiers that are available -on all targets. Other modifiers are hardware dependent. -For example, the list of supported modifiers for x86 is found at -@ref{x86Operandmodifiers,x86 Operand modifiers}. +Add new text before paragraph 3 -If the C code that follows the @code{asm} makes no use of any of the output -operands, use @code{volatile} for the @code{asm} statement to prevent the -optimizers from discarding the @code{asm} statement as unneeded -(see @ref{Volatile}). +@quotation +An object whose identifier is declared with the storage-class +specifier @w{@code{__thread}} has @dfn{thread storage duration}. +Its lifetime is the entire execution of the thread, and its +stored value is initialized only once, prior to thread startup. +@end quotation -This code makes no use of the optional @var{asmSymbolicName}. Therefore it -references the first output operand as @code{%0} (were there a second, it -would be @code{%1}, etc). The number of the first input operand is one greater -than that of the last output operand. In this i386 example, that makes -@code{Mask} referenced as @code{%1}: +@item +@cite{6.4.1 Keywords} -@example -uint32_t Mask = 1234; -uint32_t Index; +Add @code{__thread}. - asm ("bsfl %1, %0" - : "=r" (Index) - : "r" (Mask) - : "cc"); -@end example +@item +@cite{6.7.1 Storage-class specifiers} -That code overwrites the variable @code{Index} (@samp{=}), -placing the value in a register (@samp{r}). -Using the generic @samp{r} constraint instead of a constraint for a specific -register allows the compiler to pick the register to use, which can result -in more efficient code. This may not be possible if an assembler instruction -requires a specific register. +Add @code{__thread} to the list of storage class specifiers in +paragraph 1. -The following i386 example uses the @var{asmSymbolicName} syntax. -It produces the -same result as the code above, but some may consider it more readable or more -maintainable since reordering index numbers is not necessary when adding or -removing operands. The names @code{aIndex} and @code{aMask} -are only used in this example to emphasize which -names get used where. -It is acceptable to reuse the names @code{Index} and @code{Mask}. +Change paragraph 2 to -@example -uint32_t Mask = 1234; -uint32_t Index; +@quotation +With the exception of @code{__thread}, at most one storage-class +specifier may be given [@dots{}]. The @code{__thread} specifier may +be used alone, or immediately following @code{extern} or +@code{static}. +@end quotation - asm ("bsfl %[aMask], %[aIndex]" - : [aIndex] "=r" (Index) - : [aMask] "r" (Mask) - : "cc"); -@end example +Add new text after paragraph 6 -Here are some more examples of output operands. 
+@quotation +The declaration of an identifier for a variable that has +block scope that specifies @code{__thread} shall also +specify either @code{extern} or @code{static}. -@example -uint32_t c = 1; -uint32_t d; -uint32_t *e = &c; +The @code{__thread} specifier shall be used only with +variables. +@end quotation +@end itemize -asm ("mov %[e], %[d]" - : [d] "=rm" (d) - : [e] "rm" (*e)); -@end example +@node C++98 Thread-Local Edits +@subsection ISO/IEC 14882:1998 Edits for Thread-Local Storage -Here, @code{d} may either be in a register or in memory. Since the compiler -might already have the current value of the @code{uint32_t} location -pointed to by @code{e} -in a register, you can enable it to choose the best location -for @code{d} by specifying both constraints. +The following are a set of changes to ISO/IEC 14882:1998 (aka C++98) +that document the exact semantics of the language extension. -@anchor{FlagOutputOperands} -@subsubsection Flag Output Operands -@cindex @code{asm} flag output operands +@itemize @bullet +@item +@b{[intro.execution]} -Some targets have a special register that holds the ``flags'' for the -result of an operation or comparison. Normally, the contents of that -register are either unmodified by the asm, or the @code{asm} statement is -considered to clobber the contents. +New text after paragraph 4 -On some targets, a special form of output operand exists by which -conditions in the flags register may be outputs of the asm. The set of -conditions supported are target specific, but the general rule is that -the output variable must be a scalar integer, and the value is boolean. -When supported, the target defines the preprocessor symbol -@code{__GCC_ASM_FLAG_OUTPUTS__}. +@quotation +A @dfn{thread} is a flow of control within the abstract machine. +It is implementation defined whether or not there may be more than +one thread. +@end quotation -Because of the special nature of the flag output operands, the constraint -may not include alternatives. +New text after paragraph 7 -Most often, the target has only one flags register, and thus is an implied -operand of many instructions. In this case, the operand should not be -referenced within the assembler template via @code{%0} etc, as there's -no corresponding text in the assembly language. +@quotation +It is unspecified whether additional action must be taken to +ensure when and whether side effects are visible to other threads. +@end quotation -@table @asis -@item ARM -@itemx AArch64 -The flag output constraints for the ARM family are of the form -@samp{=@@cc@var{cond}} where @var{cond} is one of the standard -conditions defined in the ARM ARM for @code{ConditionHolds}. +@item +@b{[lex.key]} -@table @code -@item eq -Z flag set, or equal -@item ne -Z flag clear or not equal -@item cs -@itemx hs -C flag set or unsigned greater than equal -@item cc -@itemx lo -C flag clear or unsigned less than -@item mi -N flag set or ``minus'' -@item pl -N flag clear or ``plus'' -@item vs -V flag set or signed overflow -@item vc -V flag clear -@item hi -unsigned greater than -@item ls -unsigned less than equal -@item ge -signed greater than equal -@item lt -signed less than -@item gt -signed greater than -@item le -signed less than equal -@end table +Add @code{__thread}. -The flag output constraints are not supported in thumb1 mode. 
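+As an illustrative sketch, assuming an AArch64 target and a compiler
+version with flag-output support, a comparison result can be captured
+directly:
+
+@smallexample
+int a = 1, b = 2, eq;
+asm ("cmp %w1, %w2"
+     : "=@@cceq" (eq)
+     : "r" (a), "r" (b));
+/* @r{eq is now 1 if a == b, and 0 otherwise.} */
+@end smallexample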
+@item +@b{[basic.start.main]} -@item x86 family -The flag output constraints for the x86 family are of the form -@samp{=@@cc@var{cond}} where @var{cond} is one of the standard -conditions defined in the ISA manual for @code{j@var{cc}} or -@code{set@var{cc}}. +Add after paragraph 5 -@table @code -@item a -``above'' or unsigned greater than -@item ae -``above or equal'' or unsigned greater than or equal -@item b -``below'' or unsigned less than -@item be -``below or equal'' or unsigned less than or equal -@item c -carry flag set -@item e -@itemx z -``equal'' or zero flag set -@item g -signed greater than -@item ge -signed greater than or equal -@item l -signed less than -@item le -signed less than or equal -@item o -overflow flag set -@item p -parity flag set -@item s -sign flag set -@item na -@itemx nae -@itemx nb -@itemx nbe -@itemx nc -@itemx ne -@itemx ng -@itemx nge -@itemx nl -@itemx nle -@itemx no -@itemx np -@itemx ns -@itemx nz -``not'' @var{flag}, or inverted versions of those above -@end table +@quotation +The thread that begins execution at the @code{main} function is called +the @dfn{main thread}. It is implementation defined how functions +beginning threads other than the main thread are designated or typed. +A function so designated, as well as the @code{main} function, is called +a @dfn{thread startup function}. It is implementation defined what +happens if a thread startup function returns. It is implementation +defined what happens to other threads when any thread calls @code{exit}. +@end quotation -@item s390 -The flag output constraint for s390 is @samp{=@@cc}. Only one such -constraint is allowed. The variable has to be stored in a @samp{int} -variable. +@item +@b{[basic.start.init]} -@end table +Add after paragraph 4 -@anchor{InputOperands} -@subsubsection Input Operands -@cindex @code{asm} input operands -@cindex @code{asm} expressions +@quotation +The storage for an object of thread storage duration shall be +statically initialized before the first statement of the thread startup +function. An object of thread storage duration shall not require +dynamic initialization. +@end quotation -Input operands make values from C variables and expressions available to the -assembly code. +@item +@b{[basic.start.term]} -Operands are separated by commas. Each operand has this format: +Add after paragraph 3 -@example -@r{[} [@var{asmSymbolicName}] @r{]} @var{constraint} (@var{cexpression}) -@end example +@quotation +The type of an object with thread storage duration shall not have a +non-trivial destructor, nor shall it be an array type whose elements +(directly or indirectly) have non-trivial destructors. +@end quotation -@table @var -@item asmSymbolicName -Specifies an optional symbolic name for the operand. The literal square -brackets @samp{[]} around the @var{asmSymbolicName} are required both -in the operand specification and references to the operand in the assembler -template, i.e.@: @samp{%[Value]}. -The scope of the name is the @code{asm} statement -that contains the definition. Any valid C identifier is acceptable, -including names already defined in the surrounding code. No two operands -within the same @code{asm} statement can use the same symbolic name. +@item +@b{[basic.stc]} -When not using an @var{asmSymbolicName}, use the (zero-based) position -of the operand -in the list of operands in the assembler template. 
For example if there are -two output operands and three inputs, -use @samp{%2} in the template to refer to the first input operand, -@samp{%3} for the second, and @samp{%4} for the third. +Add ``thread storage duration'' to the list in paragraph 1. -@item constraint -A string constant specifying constraints on the placement of the operand; -@xref{Constraints}, for details. -In C++ with @option{-std=gnu++11} or later, the constraint can -also be a constant expression inside parentheses (see @ref{Asm constexprs}). +Change paragraph 2 -Input constraint strings may not begin with either @samp{=} or @samp{+}. -When you list more than one possible location (for example, @samp{"irm"}), -the compiler chooses the most efficient one based on the current context. -If you must use a specific register, but your Machine Constraints do not -provide sufficient control to select the specific register you want, -local register variables may provide a solution (@pxref{Local Register -Variables}). +@quotation +Thread, static, and automatic storage durations are associated with +objects introduced by declarations [@dots{}]. +@end quotation -Input constraints can also be digits (for example, @code{"0"}). This indicates -that the specified input must be in the same place as the output constraint -at the (zero-based) index in the output constraint list. -When using @var{asmSymbolicName} syntax for the output operands, -you may use these names (enclosed in brackets @samp{[]}) instead of digits. +Add @code{__thread} to the list of specifiers in paragraph 3. -@item cexpression -This is the C variable or expression being passed to the @code{asm} statement -as input. The enclosing parentheses are a required part of the syntax. +@item +@b{[basic.stc.thread]} -@end table +New section before @b{[basic.stc.static]} -When the compiler selects the registers to use to represent the input -operands, it does not use any of the clobbered registers -(@pxref{Clobbers and Scratch Registers}). +@quotation +The keyword @code{__thread} applied to a non-local object gives the +object thread storage duration. -If there are no output operands but there are input operands, place two -consecutive colons where the output operands would go: +A local variable or class data member declared both @code{static} +and @code{__thread} gives the variable or member thread storage +duration. +@end quotation -@example -__asm__ ("some instructions" - : /* No outputs. */ - : "r" (Offset / 8)); -@end example +@item +@b{[basic.stc.static]} -@strong{Warning:} Do @emph{not} modify the contents of input-only operands -(except for inputs tied to outputs). The compiler assumes that on exit from -the @code{asm} statement these operands contain the same values as they -had before executing the statement. -It is @emph{not} possible to use clobbers -to inform the compiler that the values in these inputs are changing. One -common work-around is to tie the changing input variable to an output variable -that never gets used. Note, however, that if the code that follows the -@code{asm} statement makes no use of any of the output operands, the GCC -optimizers may discard the @code{asm} statement as unneeded -(see @ref{Volatile}). +Change paragraph 1 -@code{asm} supports operand modifiers on operands (for example @samp{%k2} -instead of simply @samp{%2}). @ref{GenericOperandmodifiers, -Generic Operand modifiers} lists the modifiers that are available -on all targets. Other modifiers are hardware dependent. 
-For example, the list of supported modifiers for x86 is found at -@ref{x86Operandmodifiers,x86 Operand modifiers}. +@quotation +All objects that have neither thread storage duration, dynamic +storage duration nor are local [@dots{}]. +@end quotation -In this example using the fictitious @code{combine} instruction, the -constraint @code{"0"} for input operand 1 says that it must occupy the same -location as output operand 0. Only input operands may use numbers in -constraints, and they must each refer to an output operand. Only a number (or -the symbolic assembler name) in the constraint can guarantee that one operand -is in the same place as another. The mere fact that @code{foo} is the value of -both operands is not enough to guarantee that they are in the same place in -the generated assembler code. +@item +@b{[dcl.stc]} -@example -asm ("combine %2, %0" - : "=r" (foo) - : "0" (foo), "g" (bar)); -@end example +Add @code{__thread} to the list in paragraph 1. -Here is an example using symbolic names. +Change paragraph 1 -@example -asm ("cmoveq %1, %2, %[result]" - : [result] "=r"(result) - : "r" (test), "r" (new), "[result]" (old)); -@end example +@quotation +With the exception of @code{__thread}, at most one +@var{storage-class-specifier} shall appear in a given +@var{decl-specifier-seq}. The @code{__thread} specifier may +be used alone, or immediately following the @code{extern} or +@code{static} specifiers. [@dots{}] +@end quotation -@anchor{Clobbers and Scratch Registers} -@subsubsection Clobbers and Scratch Registers -@cindex @code{asm} clobbers -@cindex @code{asm} scratch registers +Add after paragraph 5 -While the compiler is aware of changes to entries listed in the output -operands, the inline @code{asm} code may modify more than just the outputs. For -example, calculations may require additional registers, or the processor may -overwrite a register as a side effect of a particular assembler instruction. -In order to inform the compiler of these changes, list them in the clobber -list. Clobber list items are either register names or the special clobbers -(listed below). Each clobber list item is a string constant -enclosed in double quotes and separated by commas. -In C++ with @option{-std=gnu++11} or later, a clobber list item can -also be a constant expression inside parentheses (see @ref{Asm constexprs}). +@quotation +The @code{__thread} specifier can be applied only to the names of objects +and to anonymous unions. +@end quotation -Clobber descriptions may not in any way overlap with an input or output -operand. For example, you may not have an operand describing a register class -with one member when listing that register in the clobber list. Variables -declared to live in specific registers (@pxref{Explicit Register -Variables}) and used -as @code{asm} input or output operands must have no part mentioned in the -clobber description. In particular, there is no way to specify that input -operands get modified without also specifying them as output operands. +@item +@b{[class.mem]} -When the compiler selects which registers to use to represent input and output -operands, it does not use any of the clobbered registers. As a result, -clobbered registers are available for any use in the assembler code. +Add after paragraph 6 -Another restriction is that the clobber list should not contain the -stack pointer register. This is because the compiler requires the -value of the stack pointer to be the same after an @code{asm} -statement as it was on entry to the statement. 
However, previous -versions of GCC did not enforce this rule and allowed the stack -pointer to appear in the list, with unclear semantics. This behavior -is deprecated and listing the stack pointer may become an error in -future versions of GCC@. +@quotation +Non-@code{static} members shall not be @code{__thread}. +@end quotation +@end itemize -Here is a realistic example for the VAX showing the use of clobbered -registers: +@node OpenMP +@section OpenMP +@cindex OpenMP extension support -@example -asm volatile ("movc3 %0, %1, %2" - : /* No outputs. */ - : "g" (from), "g" (to), "g" (count) - : "r0", "r1", "r2", "r3", "r4", "r5", "memory"); -@end example +OpenMP (Open Multi-Processing) is an application programming +interface (API) that supports multi-platform shared memory +multiprocessing programming in C/C++ and Fortran on many +architectures, including Unix and Microsoft Windows platforms. +It consists of a set of compiler directives, library routines, +and environment variables that influence run-time behavior. -Also, there are three special clobber arguments: +GCC implements all of the @uref{https://www.openmp.org/specifications/, +OpenMP Application Program Interface v4.5}, and many features from later +versions of the OpenMP specification. +@xref{OpenMP Implementation Status,,,libgomp, +GNU Offloading and Multi Processing Runtime Library}, +for more details about currently supported OpenMP features. -@table @code -@item "cc" -The @code{"cc"} clobber indicates that the assembler code modifies the flags -register. On some machines, GCC represents the condition codes as a specific -hardware register; @code{"cc"} serves to name this register. -On other machines, condition code handling is different, -and specifying @code{"cc"} has no effect. But -it is valid no matter what the target. +To enable the processing of OpenMP directives @samp{#pragma omp}, +@samp{[[omp::directive(...)]]}, @samp{[[omp::decl(...)]]}, +and @samp{[[omp::sequence(...)]]} in C and C++, +GCC needs to be invoked with the @option{-fopenmp} option. +This option also arranges for automatic linking of the OpenMP +runtime library. +@xref{,,,libgomp,GNU Offloading and Multi Processing Runtime Library}. -@item "memory" -The @code{"memory"} clobber tells the compiler that the assembly code -performs memory -reads or writes to items other than those listed in the input and output -operands (for example, accessing the memory pointed to by one of the input -parameters). To ensure memory contains correct values, GCC may need to flush -specific register values to memory before executing the @code{asm}. Further, -the compiler does not assume that any values read from memory before an -@code{asm} remain unchanged after that @code{asm}; it reloads them as -needed. -Using the @code{"memory"} clobber effectively forms a read/write -memory barrier for the compiler. +@xref{OpenMP and OpenACC Options}, for additional options useful with +@option{-fopenmp}. -Note that this clobber does not prevent the @emph{processor} from doing -speculative reads past the @code{asm} statement. To prevent that, you need -processor-specific fence instructions. 
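+For example, here is a minimal sketch of using the @code{"memory"}
+clobber as a compiler-level barrier between two stores (the variables
+are hypothetical):
+
+@smallexample
+extern int buf[16], ready;
+
+buf[0] = 1;
+asm volatile ("" : : : "memory");   /* @r{Compiler barrier only.} */
+ready = 1;   /* @r{Not moved above the barrier by the compiler; the} */
+             /* @r{processor may still reorder it without a fence.} */
+@end smallexample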
+@node OpenACC +@section OpenACC +@cindex OpenACC extension support -@item "redzone" -The @code{"redzone"} clobber tells the compiler that the assembly code -may write to the stack red zone, area below the stack pointer which on -some architectures in some calling conventions is guaranteed not to be -changed by signal handlers, interrupts or exceptions and so the compiler -can store there temporaries in leaf functions. On targets which have -no concept of the stack red zone, the clobber is ignored. -It should be used e.g.@: in case the assembly code uses call instructions -or pushes something to the stack without taking the red zone into account -by subtracting red zone size from the stack pointer first and restoring -it afterwards. +OpenACC is an application programming interface (API) that supports +offloading of code to accelerator devices. It consists of a set of +compiler directives, library routines, and environment variables that +influence run-time behavior. -@end table +GCC strives to be compatible with the +@uref{https://www.openacc.org/, OpenACC Application Programming +Interface v2.6}. -Flushing registers to memory has performance implications and may be -an issue for time-sensitive code. You can provide better information -to GCC to avoid this, as shown in the following examples. At a -minimum, aliasing rules allow GCC to know what memory @emph{doesn't} -need to be flushed. +To enable the processing of OpenACC directives @samp{#pragma acc} +in C and C++, GCC needs to be invoked with the @option{-fopenacc} option. +This option also arranges for automatic linking of the OpenACC runtime +library. +@xref{,,,libgomp,GNU Offloading and Multi Processing Runtime Library}. -Here is a fictitious sum of squares instruction, that takes two -pointers to floating point values in memory and produces a floating -point register output. -Notice that @code{x}, and @code{y} both appear twice in the @code{asm} -parameters, once to specify memory accessed, and once to specify a -base register used by the @code{asm}. You won't normally be wasting a -register by doing this as GCC can use the same register for both -purposes. However, it would be foolish to use both @code{%1} and -@code{%3} for @code{x} in this @code{asm} and expect them to be the -same. In fact, @code{%3} may well not be a register. It might be a -symbolic memory reference to the object pointed to by @code{x}. +@xref{OpenMP and OpenACC Options}, for additional options useful with +@option{-fopenacc}. -@smallexample -asm ("sumsq %0, %1, %2" - : "+f" (result) - : "r" (x), "r" (y), "m" (*x), "m" (*y)); -@end smallexample +@node Inline +@section An Inline Function is As Fast As a Macro +@cindex inline functions +@cindex integrating function code +@cindex open coding +@cindex macros, inline alternative -Here is a fictitious @code{*z++ = *x++ * *y++} instruction. -Notice that the @code{x}, @code{y} and @code{z} pointer registers -must be specified as input/output because the @code{asm} modifies -them. +By declaring a function inline, you can direct GCC to make +calls to that function faster. One way GCC can achieve this is to +integrate that function's code into the code for its callers. This +makes execution faster by eliminating the function-call overhead; in +addition, if any of the actual argument values are constant, their +known values may permit simplifications at compile time so that not +all of the inline function's code needs to be included. 
The effect on +code size is less predictable; object code may be larger or smaller +with function inlining, depending on the particular case. You can +also direct GCC to try to integrate all ``simple enough'' functions +into their callers with the option @option{-finline-functions}. -@smallexample -asm ("vecmul %0, %1, %2" - : "+r" (z), "+r" (x), "+r" (y), "=m" (*z) - : "m" (*x), "m" (*y)); -@end smallexample +GCC implements three different semantics of declaring a function +inline. One is available with @option{-std=gnu89} or +@option{-fgnu89-inline} or when @code{gnu_inline} attribute is present +on all inline declarations, another when +@option{-std=c99}, +@option{-std=gnu99} or an option for a later C version is used +(without @option{-fgnu89-inline}), and the third +is used when compiling C++. -An x86 example where the string memory argument is of unknown length. +To declare a function inline, use the @code{inline} keyword in its +declaration, like this: @smallexample -asm("repne scasb" - : "=c" (count), "+D" (p) - : "m" (*(const char (*)[]) p), "0" (-1), "a" (0)); +static inline int +inc (int *a) +@{ + return (*a)++; +@} @end smallexample -If you know the above will only be reading a ten byte array then you -could instead use a memory input like: -@code{"m" (*(const char (*)[10]) p)}. +If you are writing a header file to be included in ISO C90 programs, write +@code{__inline__} instead of @code{inline}. @xref{Alternate Keywords}. -Here is an example of a PowerPC vector scale implemented in assembly, -complete with vector and condition code clobbers, and some initialized -offset registers that are unchanged by the @code{asm}. +The three types of inlining behave similarly in two important cases: +when the @code{inline} keyword is used on a @code{static} function, +like the example above, and when a function is first declared without +using the @code{inline} keyword and then is defined with +@code{inline}, like this: @smallexample -void -dscal (size_t n, double *x, double alpha) +extern int inc (int *a); +inline int +inc (int *a) @{ - asm ("/* lots of asm here */" - : "+m" (*(double (*)[n]) x), "+&r" (n), "+b" (x) - : "d" (alpha), "b" (32), "b" (48), "b" (64), - "b" (80), "b" (96), "b" (112) - : "cr0", - "vs32","vs33","vs34","vs35","vs36","vs37","vs38","vs39", - "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"); + return (*a)++; @} @end smallexample -Rather than allocating fixed registers via clobbers to provide scratch -registers for an @code{asm} statement, an alternative is to define a -variable and make it an early-clobber output as with @code{a2} and -@code{a3} in the example below. This gives the compiler register -allocator more freedom. You can also define a variable and make it an -output tied to an input as with @code{a0} and @code{a1}, tied -respectively to @code{ap} and @code{lda}. Of course, with tied -outputs your @code{asm} can't use the input value after modifying the -output register since they are one and the same register. What's -more, if you omit the early-clobber on the output, it is possible that -GCC might allocate the same register to another of the inputs if GCC -could prove they had the same value on entry to the @code{asm}. This -is why @code{a1} has an early-clobber. Its tied input, @code{lda} -might conceivably be known to have the value 16 and without an -early-clobber share the same register as @code{%11}. On the other -hand, @code{ap} can't be the same as any of the other inputs, so an -early-clobber on @code{a0} is not needed. 
It is also not desirable in -this case. An early-clobber on @code{a0} would cause GCC to allocate -a separate register for the @code{"m" (*(const double (*)[]) ap)} -input. Note that tying an input to an output is the way to set up an -initialized temporary register modified by an @code{asm} statement. -An input not tied to an output is assumed by GCC to be unchanged, for -example @code{"b" (16)} below sets up @code{%11} to 16, and GCC might -use that register in following code if the value 16 happened to be -needed. You can even use a normal @code{asm} output for a scratch if -all inputs that might share the same register are consumed before the -scratch is used. The VSX registers clobbered by the @code{asm} -statement could have used this technique except for GCC's limit on the -number of @code{asm} parameters. +In both of these common cases, the program behaves the same as if you +had not used the @code{inline} keyword, except for its speed. -@smallexample -static void -dgemv_kernel_4x4 (long n, const double *ap, long lda, - const double *x, double *y, double alpha) -@{ - double *a0; - double *a1; - double *a2; - double *a3; +@cindex inline functions, omission of +@opindex fkeep-inline-functions +When a function is both inline and @code{static}, if all calls to the +function are integrated into the caller, and the function's address is +never used, then the function's own assembler code is never referenced. +In this case, GCC does not actually output assembler code for the +function, unless you specify the option @option{-fkeep-inline-functions}. +If there is a nonintegrated call, then the function is compiled to +assembler code as usual. The function must also be compiled as usual if +the program refers to its address, because that cannot be inlined. - __asm__ - ( - /* lots of asm here */ - "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n" - "#a0=%3 a1=%4 a2=%5 a3=%6" - : - "+m" (*(double (*)[n]) y), - "+&r" (n), // 1 - "+b" (y), // 2 - "=b" (a0), // 3 - "=&b" (a1), // 4 - "=&b" (a2), // 5 - "=&b" (a3) // 6 - : - "m" (*(const double (*)[n]) x), - "m" (*(const double (*)[]) ap), - "d" (alpha), // 9 - "r" (x), // 10 - "b" (16), // 11 - "3" (ap), // 12 - "4" (lda) // 13 - : - "cr0", - "vs32","vs33","vs34","vs35","vs36","vs37", - "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47" - ); -@} +@opindex Winline +Note that certain usages in a function definition can make it unsuitable +for inline substitution. Among these usages are: variadic functions, +use of @code{alloca}, use of computed goto (@pxref{Labels as Values}), +use of nonlocal goto, use of nested functions, use of @code{setjmp}, use +of @code{__builtin_longjmp} and use of @code{__builtin_return} or +@code{__builtin_apply_args}. Using @option{-Winline} warns when a +function marked @code{inline} could not be substituted, and gives the +reason for the failure. + +@cindex automatic @code{inline} for C++ member fns +@cindex @code{inline} automatic for C++ member fns +@cindex member fns, automatically @code{inline} +@cindex C++ member fns, automatically @code{inline} +@opindex fno-default-inline +As required by ISO C++, GCC considers member functions defined within +the body of a class to be marked inline even if they are +not explicitly declared with the @code{inline} keyword. You can +override this with @option{-fno-default-inline}; @pxref{C++ Dialect +Options,,Options Controlling C++ Dialect}. 
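+For instance, in this sketch both member functions are treated as if
+they had been declared @code{inline}:
+
+@smallexample
+struct counter
+@{
+  int n;
+  int value () const @{ return n; @}   // @r{implicitly inline}
+  void reset () @{ n = 0; @}           // @r{implicitly inline}
+@};
+@end smallexample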
+ +GCC does not inline any functions when not optimizing unless you specify +the @samp{always_inline} attribute for the function, like this: + +@smallexample +/* @r{Prototype.} */ +inline void foo (const char) __attribute__((always_inline)); @end smallexample -@anchor{GotoLabels} -@subsubsection Goto Labels -@cindex @code{asm} goto labels +The remainder of this section is specific to GNU C90 inlining. -@code{asm goto} allows assembly code to jump to one or more C labels. The -@var{GotoLabels} section in an @code{asm goto} statement contains -a comma-separated -list of all C labels to which the assembler code may jump. GCC assumes that -@code{asm} execution falls through to the next statement (if this is not the -case, consider using the @code{__builtin_unreachable} intrinsic after the -@code{asm} statement). Optimization of @code{asm goto} may be improved by -using the @code{hot} and @code{cold} label attributes (@pxref{Label -Attributes}). +@cindex non-static inline function +When an inline function is not @code{static}, then the compiler must assume +that there may be calls from other source files; since a global symbol can +be defined only once in any program, the function must not be defined in +the other source files, so the calls therein cannot be integrated. +Therefore, a non-@code{static} inline function is always compiled on its +own in the usual fashion. -If the assembler code does modify anything, use the @code{"memory"} clobber -to force the -optimizers to flush all register values to memory and reload them if -necessary after the @code{asm} statement. +If you specify both @code{inline} and @code{extern} in the function +definition, then the definition is used only for inlining. In no case +is the function compiled on its own, not even if you refer to its +address explicitly. Such an address becomes an external reference, as +if you had only declared the function, and had not defined it. -Also note that an @code{asm goto} statement is always implicitly -considered volatile. +This combination of @code{inline} and @code{extern} has almost the +effect of a macro. The way to use it is to put a function definition in +a header file with these keywords, and put another copy of the +definition (lacking @code{inline} and @code{extern}) in a library file. +The definition in the header file causes most calls to the function +to be inlined. If any uses of the function remain, they refer to +the single copy in the library. -Be careful when you set output operands inside @code{asm goto} only on -some possible control flow paths. If you don't set up the output on -given path and never use it on this path, it is okay. Otherwise, you -should use @samp{+} constraint modifier meaning that the operand is -input and output one. With this modifier you will have the correct -values on all possible paths from the @code{asm goto}. +@node Volatiles +@section When is a Volatile Object Accessed? +@cindex accessing volatiles +@cindex volatile read +@cindex volatile write +@cindex volatile access -To reference a label in the assembler template, prefix it with -@samp{%l} (lowercase @samp{L}) followed by its (zero-based) position -in @var{GotoLabels} plus the number of input and output operands. -Output operand with constraint modifier @samp{+} is counted as two -operands because it is considered as one output and one input operand. 
-For example, if the @code{asm} has three inputs, one output operand -with constraint modifier @samp{+} and one output operand with -constraint modifier @samp{=} and references two labels, refer to the -first label as @samp{%l6} and the second as @samp{%l7}). +C has the concept of volatile objects. These are normally accessed by +pointers and used for accessing hardware or inter-thread +communication. The standard encourages compilers to refrain from +optimizations concerning accesses to volatile objects, but leaves it +implementation defined as to what constitutes a volatile access. The +minimum requirement is that at a sequence point all previous accesses +to volatile objects have stabilized and no subsequent accesses have +occurred. Thus an implementation is free to reorder and combine +volatile accesses that occur between sequence points, but cannot do +so for accesses across a sequence point. The use of volatile does +not allow you to violate the restriction on updating objects multiple +times between two sequence points. -Alternately, you can reference labels using the actual C label name -enclosed in brackets. For example, to reference a label named -@code{carry}, you can use @samp{%l[carry]}. The label must still be -listed in the @var{GotoLabels} section when using this approach. It -is better to use the named references for labels as in this case you -can avoid counting input and output operands and special treatment of -output operands with constraint modifier @samp{+}. +Accesses to non-volatile objects are not ordered with respect to +volatile accesses. You cannot use a volatile object as a memory +barrier to order a sequence of writes to non-volatile memory. For +instance: -Here is an example of @code{asm goto} for i386: +@smallexample +int *ptr = @var{something}; +volatile int vobj; +*ptr = @var{something}; +vobj = 1; +@end smallexample -@example -asm goto ( - "btl %1, %0\n\t" - "jc %l2" - : /* No outputs. */ - : "r" (p1), "r" (p2) - : "cc" - : carry); +@noindent +Unless @var{*ptr} and @var{vobj} can be aliased, it is not guaranteed +that the write to @var{*ptr} occurs by the time the update +of @var{vobj} happens. If you need this guarantee, you must use +a stronger memory barrier such as: -return 0; +@smallexample +int *ptr = @var{something}; +volatile int vobj; +*ptr = @var{something}; +asm volatile ("" : : : "memory"); +vobj = 1; +@end smallexample -carry: -return 1; -@end example +A scalar volatile object is read when it is accessed in a void context: -The following example shows an @code{asm goto} that uses a memory clobber. +@smallexample +volatile int *src = @var{somevalue}; +*src; +@end smallexample -@example -int frob(int x) -@{ - int y; - asm goto ("frob %%r5, %1; jc %l[error]; mov (%2), %%r5" - : /* No outputs. */ - : "r"(x), "r"(&y) - : "r5", "memory" - : error); - return y; -error: - return -1; -@} -@end example +Such expressions are rvalues, and GCC implements this as a +read of the volatile object being pointed to. -The following example shows an @code{asm goto} that uses an output. +Assignments are also expressions and have an rvalue. However when +assigning to a scalar volatile, the volatile object is not reread, +regardless of whether the assignment expression's rvalue is used or +not. If the assignment's rvalue is used, the value is that assigned +to the volatile object. 
For instance, there is no read of @var{vobj} +in all the following cases: -@example -int foo(int count) -@{ - asm goto ("dec %0; jb %l[stop]" - : "+r" (count) - : - : - : stop); - return count; -stop: - return 0; -@} -@end example - -The following artificial example shows an @code{asm goto} that sets -up an output only on one path inside the @code{asm goto}. Usage of -constraint modifier @samp{=} instead of @samp{+} would be wrong as -@code{factor} is used on all paths from the @code{asm goto}. +@smallexample +int obj; +volatile int vobj; +vobj = @var{something}; +obj = vobj = @var{something}; +obj ? vobj = @var{onething} : vobj = @var{anotherthing}; +obj = (@var{something}, vobj = @var{anotherthing}); +@end smallexample -@example -int foo(int inp) -@{ - int factor = 0; - asm goto ("cmp %1, 10; jb %l[lab]; mov 2, %0" - : "+r" (factor) - : "r" (inp) - : - : lab); -lab: - return inp * factor; /* return 2 * inp or 0 if inp < 10 */ -@} -@end example +If you need to read the volatile object after an assignment has +occurred, you must use a separate expression with an intervening +sequence point. -@anchor{GenericOperandmodifiers} -@subsubsection Generic Operand Modifiers -@noindent -The following table shows the modifiers supported by all targets and their effects: +As bit-fields are not individually addressable, volatile bit-fields may +be implicitly read when written to, or when adjacent bit-fields are +accessed. Bit-field operations may be optimized such that adjacent +bit-fields are only partially accessed, if they straddle a storage unit +boundary. For these reasons it is unwise to use volatile bit-fields to +access hardware. -@multitable @columnfractions 0.15 0.7 0.15 -@headitem Modifier @tab Description @tab Example -@item @code{c} -@tab Require a constant operand and print the constant expression with no punctuation. -@tab @code{%c0} -@item @code{cc} -@tab Like @samp{%c} except try harder to print it with no punctuation. -@samp{%c} can e.g.@: fail to print constant addresses in position independent code on -some architectures. -@tab @code{%cc0} -@item @code{n} -@tab Like @samp{%c} except that the value of the constant is negated before printing. -@tab @code{%n0} -@item @code{a} -@tab Substitute a memory reference, with the actual operand treated as the address. -This may be useful when outputting a ``load address'' instruction, because -often the assembler syntax for such an instruction requires you to write the -operand as if it were a memory reference. -@tab @code{%a0} -@item @code{l} -@tab Print the label name with no punctuation. -@tab @code{%l0} -@end multitable +@node Using Assembly Language with C +@section How to Use Inline Assembly Language in C Code +@cindex @code{asm} keyword +@cindex assembly language in C +@cindex inline assembly language +@cindex mixing assembly language and C -@anchor{aarch64Operandmodifiers} -@subsubsection AArch64 Operand Modifiers +The @code{asm} keyword allows you to embed assembler instructions +within C code. GCC provides two forms of inline @code{asm} +statements. A @dfn{basic @code{asm}} statement is one with no +operands (@pxref{Basic Asm}), while an @dfn{extended @code{asm}} +statement (@pxref{Extended Asm}) includes one or more operands. +The extended form is preferred for mixing C and assembly language +within a function and can be used at top level as well with certain +restrictions. 
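+For example, assuming an x86 target and AT&T syntax, the two forms look
+like this:
+
+@smallexample
+asm ("nop");                /* @r{Basic asm: no operands.} */
+
+int src = 1, dst;
+asm ("mov %1, %0"           /* @r{Extended asm: C operands are connected} */
+     : "=r" (dst)           /* @r{to the template via constraints.} */
+     : "r" (src));
+@end smallexample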
-The following table shows the modifiers supported by AArch64 and their effects: +You can also use the @code{asm} keyword to override the assembler name +for a C symbol, or to place a C variable in a specific register. -@multitable @columnfractions .10 .90 -@headitem Modifier @tab Description -@item @code{w} @tab Print a 32-bit general-purpose register name or, given a -constant zero operand, the 32-bit zero register (@code{wzr}). -@item @code{x} @tab Print a 64-bit general-purpose register name or, given a -constant zero operand, the 64-bit zero register (@code{xzr}). -@item @code{b} @tab Print an FP/SIMD register name with a @code{b} (byte, 8-bit) -prefix. -@item @code{h} @tab Print an FP/SIMD register name with an @code{h} (halfword, -16-bit) prefix. -@item @code{s} @tab Print an FP/SIMD register name with an @code{s} (single -word, 32-bit) prefix. -@item @code{d} @tab Print an FP/SIMD register name with a @code{d} (doubleword, -64-bit) prefix. -@item @code{q} @tab Print an FP/SIMD register name with a @code{q} (quadword, -128-bit) prefix. -@item @code{Z} @tab Print an FP/SIMD register name as an SVE register (i.e. with -a @code{z} prefix). This is a no-op for SVE register operands. -@end multitable +@menu +* Basic Asm:: Inline assembler without operands. +* Extended Asm:: Inline assembler with operands. +* Constraints:: Constraints for @code{asm} operands +* Asm constexprs:: C++11 constant expressions instead of string + literals. +* Asm Labels:: Specifying the assembler name to use for a C symbol. +* Explicit Register Variables:: Defining variables residing in specified + registers. +* Size of an asm:: How GCC calculates the size of an @code{asm} block. +@end menu -@anchor{x86Operandmodifiers} -@subsubsection x86 Operand Modifiers +@node Basic Asm +@subsection Basic Asm --- Assembler Instructions Without Operands +@cindex basic @code{asm} +@cindex assembly language in C, basic -References to input, output, and goto operands in the assembler template -of extended @code{asm} statements can use -modifiers to affect the way the operands are formatted in -the code output to the assembler. For example, the -following code uses the @samp{h} and @samp{b} modifiers for x86: +A basic @code{asm} statement has the following syntax: @example -uint16_t num; -asm volatile ("xchg %h0, %b0" : "+a" (num) ); +asm @var{asm-qualifiers} ( @var{AssemblerInstructions} ) @end example -@noindent -These modifiers generate this assembler code: +For the C language, the @code{asm} keyword is a GNU extension. +When writing C code that can be compiled with @option{-ansi} and the +@option{-std} options that select C dialects without GNU extensions, use +@code{__asm__} instead of @code{asm} (@pxref{Alternate Keywords}). For +the C++ language, @code{asm} is a standard keyword, but @code{__asm__} +can be used for code compiled with @option{-fno-asm}. -@example -xchg %ah, %al -@end example +@subsubheading Qualifiers +@table @code +@item volatile +The optional @code{volatile} qualifier has no effect. +All basic @code{asm} blocks are implicitly volatile. +Basic @code{asm} statements outside of functions may not use any +qualifiers. -The rest of this discussion uses the following code for illustrative purposes. +@item inline +If you use the @code{inline} qualifier, then for inlining purposes the size +of the @code{asm} statement is taken as the smallest size possible (@pxref{Size +of an asm}). 
+@end table -@example -int main() -@{ - int iInt = 1; +@subsubheading Parameters +@table @var -top: +@item AssemblerInstructions +This is a literal string that specifies the assembler code. +In C++ with @option{-std=gnu++11} or later, it can +also be a constant expression inside parentheses (see @ref{Asm constexprs}). - asm volatile goto ("some assembler instructions here" - : /* No outputs. */ - : "q" (iInt), "X" (sizeof(unsigned char) + 1), "i" (42) - : /* No clobbers. */ - : top); -@} -@end example +The string can contain any instructions recognized by the assembler, +including directives. GCC does not parse the assembler instructions +themselves and does not know what they mean or even whether they are +valid assembler input. -With no modifiers, this is what the output from the operands would be -for the @samp{att} and @samp{intel} dialects of assembler: +You may place multiple assembler instructions together in a single @code{asm} +string, separated by the characters normally used in assembly code for the +system. A combination that works in most places is a newline to break the +line, plus a tab character (written as @samp{\n\t}). +Some assemblers allow semicolons as a line separator. However, +note that some assembler dialects use semicolons to start a comment. +@end table -@multitable {Operand} {$.L2} {OFFSET FLAT:.L2} -@headitem Operand @tab @samp{att} @tab @samp{intel} -@item @code{%0} -@tab @code{%eax} -@tab @code{eax} -@item @code{%1} -@tab @code{$2} -@tab @code{2} -@item @code{%3} -@tab @code{$.L3} -@tab @code{OFFSET FLAT:.L3} -@item @code{%4} -@tab @code{$8} -@tab @code{8} -@item @code{%5} -@tab @code{%xmm0} -@tab @code{xmm0} -@item @code{%7} -@tab @code{$0} -@tab @code{0} -@end multitable +@subsubheading Remarks +Using extended @code{asm} (@pxref{Extended Asm}) typically produces +smaller, safer, and more efficient code, and in most cases it is a +better solution than basic @code{asm}. However, functions declared +with the @code{naked} attribute require only basic @code{asm} +(@pxref{Function Attributes}). -The table below shows the list of supported modifiers and their effects. +Basic @code{asm} statements may be used both inside a C function or at +file scope (``top-level''), where you can use this technique to emit +assembler directives, define assembly language macros that can be invoked +elsewhere in the file, or write entire functions in assembly language. -@multitable {Modifier} {Print the opcode suffix for the size of th} {Operand} {@samp{att}} {@samp{intel}} -@headitem Modifier @tab Description @tab Operand @tab @samp{att} @tab @samp{intel} -@item @code{A} -@tab Print an absolute memory reference. -@tab @code{%A0} -@tab @code{*%rax} -@tab @code{rax} -@item @code{b} -@tab Print the QImode name of the register. -@tab @code{%b0} -@tab @code{%al} -@tab @code{al} -@item @code{B} -@tab print the opcode suffix of b. -@tab @code{%B0} -@tab @code{b} -@tab -@item @code{c} -@tab Require a constant operand and print the constant expression with no punctuation. -@tab @code{%c1} -@tab @code{2} -@tab @code{2} -@item @code{d} -@tab print duplicated register operand for AVX instruction. -@tab @code{%d5} -@tab @code{%xmm0, %xmm0} -@tab @code{xmm0, xmm0} -@item @code{E} -@tab Print the address in Double Integer (DImode) mode (8 bytes) when the target is 64-bit. -Otherwise mode is unspecified (VOIDmode). -@tab @code{%E1} -@tab @code{%(rax)} -@tab @code{[rax]} -@item @code{g} -@tab Print the V16SFmode name of the register. 
-@tab @code{%g0} -@tab @code{%zmm0} -@tab @code{zmm0} -@item @code{h} -@tab Print the QImode name for a ``high'' register. -@tab @code{%h0} -@tab @code{%ah} -@tab @code{ah} -@item @code{H} -@tab Add 8 bytes to an offsettable memory reference. Useful when accessing the -high 8 bytes of SSE values. For a memref in (%rax), it generates -@tab @code{%H0} -@tab @code{8(%rax)} -@tab @code{8[rax]} -@item @code{k} -@tab Print the SImode name of the register. -@tab @code{%k0} -@tab @code{%eax} -@tab @code{eax} -@item @code{l} -@tab Print the label name with no punctuation. -@tab @code{%l3} -@tab @code{.L3} -@tab @code{.L3} -@item @code{L} -@tab print the opcode suffix of l. -@tab @code{%L0} -@tab @code{l} -@tab -@item @code{N} -@tab print maskz. -@tab @code{%N7} -@tab @code{@{z@}} -@tab @code{@{z@}} -@item @code{p} -@tab Print raw symbol name (without syntax-specific prefixes). -@tab @code{%p2} -@tab @code{42} -@tab @code{42} -@item @code{P} -@tab If used for a function, print the PLT suffix and generate PIC code. -For example, emit @code{foo@@PLT} instead of 'foo' for the function -foo(). If used for a constant, drop all syntax-specific prefixes and -issue the bare constant. See @code{p} above. -@item @code{q} -@tab Print the DImode name of the register. -@tab @code{%q0} -@tab @code{%rax} -@tab @code{rax} -@item @code{Q} -@tab print the opcode suffix of q. -@tab @code{%Q0} -@tab @code{q} -@tab -@item @code{R} -@tab print embedded rounding and sae. -@tab @code{%R4} -@tab @code{@{rn-sae@}, } -@tab @code{, @{rn-sae@}} -@item @code{r} -@tab print only sae. -@tab @code{%r4} -@tab @code{@{sae@}, } -@tab @code{, @{sae@}} -@item @code{s} -@tab print a shift double count, followed by the assemblers argument -delimiterprint the opcode suffix of s. -@tab @code{%s1} -@tab @code{$2, } -@tab @code{2, } -@item @code{S} -@tab print the opcode suffix of s. -@tab @code{%S0} -@tab @code{s} -@tab -@item @code{t} -@tab print the V8SFmode name of the register. -@tab @code{%t5} -@tab @code{%ymm0} -@tab @code{ymm0} -@item @code{T} -@tab print the opcode suffix of t. -@tab @code{%T0} -@tab @code{t} -@tab -@item @code{V} -@tab print naked full integer register name without %. -@tab @code{%V0} -@tab @code{eax} -@tab @code{eax} -@item @code{w} -@tab Print the HImode name of the register. -@tab @code{%w0} -@tab @code{%ax} -@tab @code{ax} -@item @code{W} -@tab print the opcode suffix of w. -@tab @code{%W0} -@tab @code{w} -@tab -@item @code{x} -@tab print the V4SFmode name of the register. -@tab @code{%x5} -@tab @code{%xmm0} -@tab @code{xmm0} -@item @code{y} -@tab print "st(0)" instead of "st" as a register. -@tab @code{%y6} -@tab @code{%st(0)} -@tab @code{st(0)} -@item @code{z} -@tab Print the opcode suffix for the size of the current integer operand (one of @code{b}/@code{w}/@code{l}/@code{q}). -@tab @code{%z0} -@tab @code{l} -@tab -@item @code{Z} -@tab Like @code{z}, with special suffixes for x87 instructions. -@end multitable +Safely accessing C data and calling functions from basic @code{asm} is more +complex than it may appear. To access C data, it is better to use extended +@code{asm}. +Do not expect a sequence of @code{asm} statements to remain perfectly +consecutive after compilation. If certain instructions need to remain +consecutive in the output, put them in a single multi-instruction @code{asm} +statement. Note that GCC's optimizers can move @code{asm} statements +relative to other code, including across jumps. 
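+
+As an illustration, consider the following sketch, in which the
+mnemonics @code{instr_a} and @code{instr_b} are placeholders rather
+than real instructions. The first two statements may be separated by
+other generated code; the single combined statement keeps its
+instructions together:
+
+@example
+asm ("instr_a");   /* These two statements may drift apart... */
+asm ("instr_b");
+
+asm ("instr_a\n\t" /* ...while one statement stays in one piece.  */
+     "instr_b");
+@end example
+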
-@anchor{x86floatingpointasmoperands} -@subsubsection x86 Floating-Point @code{asm} Operands +@code{asm} statements may not perform jumps into other @code{asm} statements. +GCC does not know about these jumps, and therefore cannot take +account of them when deciding how to optimize. Jumps from @code{asm} to C +labels are only supported in extended @code{asm}. -On x86 targets, there are several rules on the usage of stack-like registers -in the operands of an @code{asm}. These rules apply only to the operands -that are stack-like registers: +Under certain circumstances, GCC may duplicate (or remove duplicates of) your +assembly code when optimizing. This can lead to unexpected duplicate +symbol errors during compilation if your assembly code defines symbols or +labels. -@enumerate -@item -Given a set of input registers that die in an @code{asm}, it is -necessary to know which are implicitly popped by the @code{asm}, and -which must be explicitly popped by GCC@. +@strong{Warning:} The C standards do not specify semantics for @code{asm}, +making it a potential source of incompatibilities between compilers. These +incompatibilities may not produce compiler warnings/errors. -An input register that is implicitly popped by the @code{asm} must be -explicitly clobbered, unless it is constrained to match an -output operand. +GCC does not parse basic @code{asm}'s @var{AssemblerInstructions}, which +means there is no way to communicate to the compiler what is happening +inside them. GCC has no visibility of symbols in the @code{asm} and may +discard them as unreferenced. It also does not know about side effects of +the assembler code, such as modifications to memory or registers. Unlike +some compilers, GCC assumes that no changes to general purpose registers +occur. This assumption may change in a future release. -@item -For any input register that is implicitly popped by an @code{asm}, it is -necessary to know how to adjust the stack to compensate for the pop. -If any non-popped input is closer to the top of the reg-stack than -the implicitly popped register, it would not be possible to know what the -stack looked like---it's not clear how the rest of the stack ``slides -up''. +To avoid complications from future changes to the semantics and the +compatibility issues between compilers, consider replacing basic @code{asm} +with extended @code{asm}. See +@uref{https://gcc.gnu.org/wiki/ConvertBasicAsmToExtended, How to convert +from basic asm to extended asm} for information about how to perform this +conversion. -All implicitly popped input registers must be closer to the top of -the reg-stack than any input that is not implicitly popped. +The compiler copies the assembler instructions in a basic @code{asm} +verbatim to the assembly language output file, without +processing dialects or any of the @samp{%} operators that are available with +extended @code{asm}. This results in minor differences between basic +@code{asm} strings and extended @code{asm} templates. For example, to refer to +registers you might use @samp{%eax} in basic @code{asm} and +@samp{%%eax} in extended @code{asm}. -It is possible that if an input dies in an @code{asm}, the compiler might -use the input register for an output reload. Consider this example: +On targets such as x86 that support multiple assembler dialects, +all basic @code{asm} blocks use the assembler dialect specified by the +@option{-masm} command-line option (@pxref{x86 Options}). 
+Basic @code{asm} provides no
+mechanism to provide different assembler strings for different dialects.

-@smallexample
-asm ("foo" : "=t" (a) : "f" (b));
-@end smallexample
+For basic @code{asm} with a non-empty assembler string, GCC assumes
+the assembler block does not change any general purpose registers,
+but it may read or write any globally accessible variable.

-@noindent
-This code says that input @code{b} is not popped by the @code{asm}, and that
-the @code{asm} pushes a result onto the reg-stack, i.e., the stack is one
-deeper after the @code{asm} than it was before. But, it is possible that
-reload may think that it can use the same register for both the input and
-the output.
+Here is an example of basic @code{asm} for i386:

-To prevent this from happening,
-if any input operand uses the @samp{f} constraint, all output register
-constraints must use the @samp{&} early-clobber modifier.
+@example
+/* Note that this code will not compile with -masm=intel */
+#define DebugBreak() asm("int $3")
+@end example

-The example above is correctly written as:
+@node Extended Asm
+@subsection Extended Asm --- Assembler Instructions with C Expression Operands
+@cindex extended @code{asm}
+@cindex assembly language in C, extended

-@smallexample
-asm ("foo" : "=&t" (a) : "f" (b));
-@end smallexample
+With extended @code{asm} you can read and write C variables from
+assembler and perform jumps from assembler code to C labels.
+Extended @code{asm} syntax uses colons (@samp{:}) to delimit
+the operand parameters after the assembler template:

-@item
-Some operands need to be in particular places on the stack. All
-output operands fall in this category---GCC has no other way to
-know which registers the outputs appear in unless you indicate
-this in the constraints.
+@example
+asm @var{asm-qualifiers} ( @var{AssemblerTemplate}
+ : @var{OutputOperands}
+ @r{[} : @var{InputOperands}
+ @r{[} : @var{Clobbers} @r{]} @r{]})

-Output operands must specifically indicate which register an output
-appears in after an @code{asm}. @samp{=f} is not allowed: the operand
-constraints must select a class with a single register.
+asm @var{asm-qualifiers} ( @var{AssemblerTemplate}
+ : @var{OutputOperands}
+ : @var{InputOperands}
+ : @var{Clobbers}
+ : @var{GotoLabels})
+@end example
+where in the last form @var{asm-qualifiers} must contain @code{goto}, and
+in the first form it must not.

-@item
-Output operands may not be ``inserted'' between existing stack registers.
-Since no 387 opcode uses a read/write operand, all output operands
-are dead before the @code{asm}, and are pushed by the @code{asm}.
-It makes no sense to push anywhere but the top of the reg-stack.
+The @code{asm} keyword is a GNU extension.
+When writing code that can be compiled with @option{-ansi} and the
+various @option{-std} options, use @code{__asm__} instead of
+@code{asm} (@pxref{Alternate Keywords}).

-Output operands must start at the top of the reg-stack: output
-operands may not ``skip'' a register.
+@subsubheading Qualifiers
+@table @code

-@item
-Some @code{asm} statements may need extra stack space for internal
-calculations. This can be guaranteed by clobbering stack registers
-unrelated to the inputs and outputs.
+@item volatile
+The typical use of extended @code{asm} statements is to manipulate input
+values to produce output values. However, your @code{asm} statements may
+also produce side effects. If so, you may need to use the @code{volatile}
+qualifier to disable certain optimizations. @xref{Volatile}.
-@end enumerate +@item inline +If you use the @code{inline} qualifier, then for inlining purposes the size +of the @code{asm} statement is taken as the smallest size possible +(@pxref{Size of an asm}). -This @code{asm} -takes one input, which is internally popped, and produces two outputs. +@item goto +This qualifier informs the compiler that the @code{asm} statement may +perform a jump to one of the labels listed in the @var{GotoLabels}. +@xref{GotoLabels}. +@end table -@smallexample -asm ("fsincos" : "=t" (cos), "=u" (sin) : "0" (inp)); -@end smallexample +@subsubheading Parameters +@table @var +@item AssemblerTemplate +This is a literal string that is the template for the assembler code. It is a +combination of fixed text and tokens that refer to the input, output, +and goto parameters. @xref{AssemblerTemplate}. -@noindent -This @code{asm} takes two inputs, which are popped by the @code{fyl2xp1} opcode, -and replaces them with one output. The @code{st(1)} clobber is necessary -for the compiler to know that @code{fyl2xp1} pops both inputs. +@item OutputOperands +A comma-separated list describing the C variables modified by the +instructions in the @var{AssemblerTemplate}. An empty list is permitted. +@xref{OutputOperands}. -@smallexample -asm ("fyl2xp1" : "=t" (result) : "0" (x), "u" (y) : "st(1)"); -@end smallexample +@item InputOperands +A comma-separated list describing the C expressions read by the +instructions in the @var{AssemblerTemplate}. An empty list is permitted. +@xref{InputOperands}. -@anchor{msp430Operandmodifiers} -@subsubsection MSP430 Operand Modifiers +@item Clobbers +A comma-separated list of registers or other values changed by the +@var{AssemblerTemplate}, beyond those listed as outputs. +An empty list is permitted. @xref{Clobbers and Scratch Registers}. -The list below describes the supported modifiers and their effects for MSP430. +@item GotoLabels +When you are using the @code{goto} form of @code{asm}, this section contains +the list of all C labels to which the code in the +@var{AssemblerTemplate} may jump. +@xref{GotoLabels}. -@multitable @columnfractions .10 .90 -@headitem Modifier @tab Description -@item @code{A} @tab Select low 16-bits of the constant/register/memory operand. -@item @code{B} @tab Select high 16-bits of the constant/register/memory -operand. -@item @code{C} @tab Select bits 32-47 of the constant/register/memory operand. -@item @code{D} @tab Select bits 48-63 of the constant/register/memory operand. -@item @code{H} @tab Equivalent to @code{B} (for backwards compatibility). -@item @code{I} @tab Print the inverse (logical @code{NOT}) of the constant -value. -@item @code{J} @tab Print an integer without a @code{#} prefix. -@item @code{L} @tab Equivalent to @code{A} (for backwards compatibility). -@item @code{O} @tab Offset of the current frame from the top of the stack. -@item @code{Q} @tab Use the @code{A} instruction postfix. -@item @code{R} @tab Inverse of condition code, for unsigned comparisons. -@item @code{W} @tab Subtract 16 from the constant value. -@item @code{X} @tab Use the @code{X} instruction postfix. -@item @code{Y} @tab Subtract 4 from the constant value. -@item @code{Z} @tab Subtract 1 from the constant value. -@item @code{b} @tab Append @code{.B}, @code{.W} or @code{.A} to the -instruction, depending on the mode. -@item @code{d} @tab Offset 1 byte of a memory reference or constant value. -@item @code{e} @tab Offset 3 bytes of a memory reference or constant value. 
-@item @code{f} @tab Offset 5 bytes of a memory reference or constant value.
-@item @code{g} @tab Offset 7 bytes of a memory reference or constant value.
-@item @code{p} @tab Print the value of 2, raised to the power of the given
-constant. Used to select the specified bit position.
-@item @code{r} @tab Inverse of condition code, for signed comparisons.
-@item @code{x} @tab Equivalent to @code{X}, but only for pointers.
-@end multitable
+@code{asm} statements may not perform jumps into other @code{asm} statements,
+only to the listed @var{GotoLabels}.
+GCC's optimizers do not know about other jumps; therefore they cannot take
+account of them when deciding how to optimize.
+@end table

-@anchor{loongarchOperandmodifiers}
-@subsubsection LoongArch Operand Modifiers
+The total number of input + output + goto operands is limited to 30.

-The list below describes the supported modifiers and their effects for LoongArch.
+@subsubheading Remarks
+The @code{asm} statement allows you to include assembly instructions directly
+within C code. This may help you to maximize performance in time-sensitive
+code or to access assembly instructions that are not readily available to C
+programs.

-@multitable @columnfractions .10 .90
-@headitem Modifier @tab Description
-@item @code{d} @tab Same as @code{c}.
-@item @code{i} @tab Print the character ''@code{i}'' if the operand is not a register.
-@item @code{m} @tab Same as @code{c}, but the printed value is @code{operand - 1}.
-@item @code{u} @tab Print a LASX register.
-@item @code{w} @tab Print a LSX register.
-@item @code{X} @tab Print a constant integer operand in hexadecimal.
-@item @code{z} @tab Print the operand in its unmodified form, followed by a comma.
-@end multitable
+Similarly to basic @code{asm}, extended @code{asm} statements may be used
+either inside a C function or at file scope (``top-level''), where you can
+use this technique to emit assembler directives, define assembly language
+macros that can be invoked elsewhere in the file, or write entire functions
+in assembly language.
+Extended @code{asm} statements outside of functions may not use any
+qualifiers, may not specify clobbers, may not use @code{%}, @code{+} or
+@code{&} modifiers in constraints and can only use constraints which don't
+allow using any register.

-References to input and output operands in the assembler template of extended
-asm statements can use modifiers to affect the way the operands are formatted
-in the code output to the assembler. For example, the following code uses the
-'w' modifier for LoongArch:
+Functions declared with the @code{naked} attribute require basic
+@code{asm} (@pxref{Function Attributes}).
+
+While the uses of @code{asm} are many and varied, it may help to think of an
+@code{asm} statement as a series of low-level instructions that convert input
+parameters to output parameters. So a simple (if not particularly useful)
+example for i386 using @code{asm} might look like this:

@example
-test-asm.c:
+int src = 1;
+int dst;

-#include
+asm ("mov %1, %0\n\t"
+ "add $1, %0"
+ : "=r" (dst)
+ : "r" (src));

-__m128i foo (void)
+printf("%d\n", dst);
+@end example
+
+This code copies @code{src} to @code{dst} and adds 1 to @code{dst}.
+
+@anchor{Volatile}
+@subsubsection Volatile
+@cindex volatile @code{asm}
+@cindex @code{asm} volatile
+
+GCC's optimizers sometimes discard @code{asm} statements if they determine
+there is no need for the output variables.
Also, the optimizers may move
+code out of loops if they believe that the code will always return the same
+result (i.e.@: none of its input values change between calls). Using the
+@code{volatile} qualifier disables these optimizations. @code{asm} statements
+that have no output operands and @code{asm goto} statements
+are implicitly volatile.
+
+This i386 code demonstrates a case that does not use (or require) the
+@code{volatile} qualifier. If it is performing assertion checking, this code
+uses @code{asm} to perform the validation. Otherwise, @code{dwRes} is
+unreferenced by any code. As a result, the optimizers can discard the
+@code{asm} statement, which in turn removes the need for the entire
+@code{DoCheck} routine. By omitting the @code{volatile} qualifier when it
+isn't needed, you allow the optimizers to produce the most efficient code
+possible.
+
+@example
+void DoCheck(uint32_t dwSomeValue)
 @{
-__m128i a,b,c;
-__asm__ ("vadd.d %w0,%w1,%w2\n\t"
- :"=f" (c)
- :"f" (a),"f" (b));
+ uint32_t dwRes;

-return c;
-@}
+ // Assumes dwSomeValue is not zero.
+ asm ("bsfl %1,%0"
+ : "=r" (dwRes)
+ : "r" (dwSomeValue)
+ : "cc");
+ assert(dwRes > 3);
+@}
@end example

-@noindent
-The compile command for the test case is as follows:
+The next example shows a case where the optimizers can recognize that the input
+(@code{dwSomeValue}) never changes during the execution of the function and can
+therefore move the @code{asm} outside the loop to produce more efficient code.
+Again, using the @code{volatile} qualifier disables this type of optimization.

@example
-gcc test-asm.c -mlsx -S -o test-asm.s
+void do_print(uint32_t dwSomeValue)
+@{
+ uint32_t dwRes;
+
+ for (uint32_t x=0; x < 5; x++)
+ @{
+ // Assumes dwSomeValue is not zero.
+ asm ("bsfl %1,%0"
+ : "=r" (dwRes)
+ : "r" (dwSomeValue)
+ : "cc");
+
+ printf("%u: %u %u\n", x, dwSomeValue, dwRes);
+ @}
+@}
@end example

-@noindent
-The assembly statement produces the following assembly code:
+The following example demonstrates a case where you need to use the
+@code{volatile} qualifier.
+It uses the x86 @code{rdtsc} instruction, which reads
+the computer's time-stamp counter. Without the @code{volatile} qualifier,
+the optimizers might assume that the @code{asm} block will always return the
+same value and therefore optimize away the second call.

@example
-vadd.d $vr0,$vr0,$vr1
-@end example
+uint64_t msr;

-This is a 128-bit vector addition instruction, @code{c} (referred to in the
-template string as %0) is the output, and @code{a} (%1) and @code{b} (%2) are
-the inputs. @code{__m128i} is a vector data type defined in the file
-@code{lsxintrin.h} (@xref{LoongArch SX Vector Intrinsics}). The symbol '=f'
-represents a constraint using a floating-point register as an output type, and
-the 'f' in the input operand represents a constraint using a floating-point
-register operand, which can refer to the definition of a constraint
-(@xref{Constraints}) in gcc.
+asm volatile ( "rdtsc\n\t" // Returns the time in EDX:EAX.
+ "shl $32, %%rdx\n\t" // Shift the upper bits left.
+ "or %%rdx, %0" // 'Or' in the lower bits.
+ : "=a" (msr)
+ :
+ : "rdx");

-@anchor{riscvOperandmodifiers}
-@subsubsection RISC-V Operand Modifiers
+printf("msr: %llx\n", msr);

-The list below describes the supported modifiers and their effects for RISC-V.
+// Do other work...

-@multitable @columnfractions .10 .90
-@headitem Modifier @tab Description
-@item @code{z} @tab Print ''@code{zero}'' instead of 0 if the operand is an immediate with a value of zero.
-@item @code{i} @tab Print the character ''@code{i}'' if the operand is an immediate. -@item @code{N} @tab Print the register encoding as integer (0 - 31). -@end multitable +// Reprint the timestamp +asm volatile ( "rdtsc\n\t" // Returns the time in EDX:EAX. + "shl $32, %%rdx\n\t" // Shift the upper bits left. + "or %%rdx, %0" // 'Or' in the lower bits. + : "=a" (msr) + : + : "rdx"); -@anchor{shOperandmodifiers} -@subsubsection SH Operand Modifiers +printf("msr: %llx\n", msr); +@end example -The list below describes the supported modifiers and their effects for the SH family of processors. +GCC's optimizers do not treat this code like the non-volatile code in the +earlier examples. They do not move it out of loops or omit it on the +assumption that the result from a previous call is still valid. -@multitable @columnfractions .10 .90 -@headitem Modifier @tab Description -@item @code{.} @tab Print ''@code{.s}'' if the instruction needs a delay slot. -@item @code{,} @tab Print ''@code{LOCAL_LABEL_PREFIX}''. -@item @code{@@} @tab Print ''@code{trap}'', ''@code{rte}'' or ''@code{rts}'' depending on the interrupt pragma used. -@item @code{#} @tab Print ''@code{nop}'' if there is nothing to put in the delay slot. -@item @code{'} @tab Print likelihood suffix (''@code{/u}'' for unlikely). -@item @code{>} @tab Print branch target if ''@code{-fverbose-asm}''. -@item @code{O} @tab Require a constant operand and print the constant expression with no punctuation. -@item @code{R} @tab Print the ''@code{LSW}'' of a dp value - changes if in little endian. -@item @code{S} @tab Print the ''@code{MSW}'' of a dp value - changes if in little endian. -@item @code{T} @tab Print the next word of a dp value - same as ''@code{R}'' in big endian mode. -@item @code{M} @tab Print ''@code{.b }'', ''@code{.w}'', ''@code{.l}'', ''@code{.s}'', ''@code{.d}'', suffix if operand is a MEM. -@item @code{N} @tab Print ''@code{r63}'' if the operand is ''@code{const_int 0}''. -@item @code{d} @tab Print a ''@code{V2SF}'' as ''@code{dN}'' instead of ''@code{fpN}''. -@item @code{m} @tab Print the pair ''@code{base,offset}'' or ''@code{base,index}'' for LD and ST. -@item @code{U} @tab Like ''@code{%m}'' for ''@code{LD}'' and ''@code{ST}'', ''@code{HI}'' and ''@code{LO}''. -@item @code{V} @tab Print the position of a single bit set. -@item @code{W} @tab Print the position of a single bit cleared. -@item @code{t} @tab Print a memory address which is a register. -@item @code{u} @tab Print the lowest 16 bits of ''@code{CONST_INT}'', as an unsigned value. -@item @code{o} @tab Print an operator. -@end multitable +Note that the compiler can move even @code{volatile asm} instructions relative +to other code, including across jump instructions. For example, on many +targets there is a system register that controls the rounding mode of +floating-point operations. Setting it with a @code{volatile asm} statement, +as in the following PowerPC example, does not work reliably. -@lowersections -@include md.texi -@raisesections +@example +asm volatile("mtfsf 255, %0" : : "f" (fpenv)); +sum = x + y; +@end example -@node Asm constexprs -@subsection C++11 Constant Expressions instead of String Literals +The compiler may move the addition back before the @code{volatile asm} +statement. 
To make it work as expected, add an artificial dependency to +the @code{asm} by referencing a variable in the subsequent code, for +example: -In C++ with @option{-std=gnu++11} or later, strings that appear in asm -syntax---specifically, the assembler template, constraints, and -clobbers---can be specified as parenthesized compile-time constant -expressions as well as by string literals. The parentheses around such -an expression are a required part of the syntax. The constant expression -can return a container with @code{data ()} and @code{size ()} -member functions, following similar rules as the C++26 @code{static_assert} -message. Any string is converted to the character set of the source code. -When this feature is available the @code{__GXX_CONSTEXPR_ASM__} preprocessor -macro is predefined. +@example +asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv)); +sum = x + y; +@end example -This extension is supported for both the basic and extended asm syntax. +Under certain circumstances, GCC may duplicate (or remove duplicates of) your +assembly code when optimizing. This can lead to unexpected duplicate symbol +errors during compilation if your @code{asm} code defines symbols or labels. +Using @samp{%=} +(@pxref{AssemblerTemplate}) may help resolve this problem. -@example -#include -constexpr std::string_view genfoo() @{ return "foo"; @} +@anchor{AssemblerTemplate} +@subsubsection Assembler Template +@cindex @code{asm} assembler template -void function() -@{ - asm((genfoo())); -@} -@end example +An assembler template is a literal string containing assembler instructions. +In C++ with @option{-std=gnu++11} or later, the assembler template can +also be a constant expression inside parentheses (see @ref{Asm constexprs}). -@node Asm Labels -@subsection Controlling Names Used in Assembler Code -@cindex assembler names for identifiers -@cindex names used in assembler code -@cindex identifiers, names in assembler code +The compiler replaces tokens in the template that refer +to inputs, outputs, and goto labels, +and then outputs the resulting string to the assembler. The +string can contain any instructions recognized by the assembler, including +directives. GCC does not parse the assembler instructions +themselves and does not know what they mean or even whether they are valid +assembler input. However, it does count the statements +(@pxref{Size of an asm}). -You can specify the name to be used in the assembler code for a C -function or variable by writing the @code{asm} (or @code{__asm__}) -keyword after the declarator. -It is up to you to make sure that the assembler names you choose do not -conflict with any other assembler symbols, or reference registers. +You may place multiple assembler instructions together in a single @code{asm} +string, separated by the characters normally used in assembly code for the +system. A combination that works in most places is a newline to break the +line, plus a tab character to move to the instruction field (written as +@samp{\n\t}). +Some assemblers allow semicolons as a line separator. However, note +that some assembler dialects use semicolons to start a comment. -@subsubheading Assembler names for data +Do not expect a sequence of @code{asm} statements to remain perfectly +consecutive after compilation, even when you are using the @code{volatile} +qualifier. If certain instructions need to remain consecutive in the output, +put them in a single multi-instruction @code{asm} statement. 
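+
+For example, this sketch (i386, AT&T syntax; the label name @code{loop}
+is arbitrary) writes a small loop as one multi-instruction template so
+that its instructions cannot be separated. It uses @samp{%=} (described
+below) so that the label stays unique even if the optimizers duplicate
+the statement:
+
+@example
+uint32_t count = 100;
+
+asm ("loop%=:\n\t"  // %= expands to a number unique to this asm.
+     "decl %0\n\t"
+     "jnz loop%="
+     : "+r" (count)
+     :
+     : "cc");
+@end example
+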
-This sample shows how to specify the assembler name for data: +Accessing data from C programs without using input/output operands (such as +by using global symbols directly from the assembler template) may not work as +expected. Similarly, calling functions directly from an assembler template +requires a detailed understanding of the target assembler and ABI. -@smallexample -int foo asm ("myfoo") = 2; -@end smallexample +Since GCC does not parse the assembler template, +it has no visibility of any +symbols it references. This may result in GCC discarding those symbols as +unreferenced unless they are also listed as input, output, or goto operands. -@noindent -This specifies that the name to be used for the variable @code{foo} in -the assembler code should be @samp{myfoo} rather than the usual -@samp{_foo}. +@subsubheading Special format strings -On systems where an underscore is normally prepended to the name of a C -variable, this feature allows you to define names for the -linker that do not start with an underscore. +In addition to the tokens described by the input, output, and goto operands, +these tokens have special meanings in the assembler template: -GCC does not support using this feature with a non-static local variable -since such variables do not have assembler names. If you are -trying to put the variable in a particular register, see -@ref{Explicit Register Variables}. +@table @samp +@item %% +Outputs a single @samp{%} into the assembler code. -@subsubheading Assembler names for functions +@item %= +Outputs a number that is unique to each instance of the @code{asm} +statement in the entire compilation. This option is useful when creating local +labels and referring to them multiple times in a single template that +generates multiple assembler instructions. -To specify the assembler name for functions, write a declaration for the -function before its definition and put @code{asm} there, like this: +@item %@{ +@itemx %| +@itemx %@} +Outputs @samp{@{}, @samp{|}, and @samp{@}} characters (respectively) +into the assembler code. When unescaped, these characters have special +meaning to indicate multiple assembler dialects, as described below. +@end table -@smallexample -int func (int x, int y) asm ("MYFUNC"); - -int func (int x, int y) -@{ - /* @r{@dots{}} */ -@end smallexample +@subsubheading Multiple assembler dialects in @code{asm} templates -@noindent -This specifies that the name to be used for the function @code{func} in -the assembler code should be @code{MYFUNC}. +On targets such as x86, GCC supports multiple assembler dialects. +The @option{-masm} option controls which dialect GCC uses as its +default for inline assembler. The target-specific documentation for the +@option{-masm} option contains the list of supported dialects, as well as the +default dialect if the option is not specified. This information may be +important to understand, since assembler code that works correctly when +compiled using one dialect will likely fail if compiled using another. +@xref{x86 Options}. -@node Explicit Register Variables -@subsection Variables in Specified Registers -@anchor{Explicit Reg Vars} -@cindex explicit register variables -@cindex variables in specified registers -@cindex specified registers +If your code needs to support multiple assembler dialects (for example, if +you are writing public headers that need to support a variety of compilation +options), use constructs of this form: -GNU C allows you to associate specific hardware registers with C -variables. 
In almost all cases, allowing the compiler to assign -registers produces the best code. However under certain unusual -circumstances, more precise control over the variable storage is -required. +@example +@{ dialect0 | dialect1 | dialect2... @} +@end example -Both global and local variables can be associated with a register. The -consequences of performing this association are very different between -the two, as explained in the sections below. +This construct outputs @code{dialect0} +when using dialect #0 to compile the code, +@code{dialect1} for dialect #1, etc. If there are fewer alternatives within the +braces than the number of dialects the compiler supports, the construct +outputs nothing. -@menu -* Global Register Variables:: Variables declared at global scope. -* Local Register Variables:: Variables declared within a function. -@end menu +For example, if an x86 compiler supports two dialects +(@samp{att}, @samp{intel}), an +assembler template such as this: -@node Global Register Variables -@subsubsection Defining Global Register Variables -@anchor{Global Reg Vars} -@cindex global register variables -@cindex registers, global variables in -@cindex registers, global allocation +@example +"bt@{l %[Offset],%[Base] | %[Base],%[Offset]@}; jc %l2" +@end example -You can define a global register variable and associate it with a specified -register like this: +@noindent +is equivalent to one of -@smallexample -register int *foo asm ("r12"); -@end smallexample +@example +"btl %[Offset],%[Base] ; jc %l2" @r{/* att dialect */} +"bt %[Base],%[Offset]; jc %l2" @r{/* intel dialect */} +@end example + +Using that same compiler, this code: + +@example +"xchg@{l@}\t@{%%@}ebx, %1" +@end example @noindent -Here @code{r12} is the name of the register that should be used. Note that -this is the same syntax used for defining local register variables, but for -a global variable the declaration appears outside a function. The -@code{register} keyword is required, and cannot be combined with -@code{static}. The register name must be a valid register name for the -target platform. +corresponds to either -Do not use type qualifiers such as @code{const} and @code{volatile}, as -the outcome may be contrary to expectations. In particular, using the -@code{volatile} qualifier does not fully prevent the compiler from -optimizing accesses to the register. +@example +"xchgl\t%%ebx, %1" @r{/* att dialect */} +"xchg\tebx, %1" @r{/* intel dialect */} +@end example -Registers are a scarce resource on most systems and allowing the -compiler to manage their usage usually results in the best code. However, -under special circumstances it can make sense to reserve some globally. -For example this may be useful in programs such as programming language -interpreters that have a couple of global variables that are accessed -very often. +There is no support for nesting dialect alternatives. -After defining a global register variable, for the current compilation -unit: +@anchor{OutputOperands} +@subsubsection Output Operands +@cindex @code{asm} output operands -@itemize @bullet -@item If the register is a call-saved register, call ABI is affected: -the register will not be restored in function epilogue sequences after -the variable has been assigned. Therefore, functions cannot safely -return to callers that assume standard ABI. -@item Conversely, if the register is a call-clobbered register, making -calls to functions that use standard ABI may lose contents of the variable. 
-Such calls may be created by the compiler even if none are evident in -the original program, for example when libgcc functions are used to -make up for unavailable instructions. -@item Accesses to the variable may be optimized as usual and the register -remains available for allocation and use in any computations, provided that -observable values of the variable are not affected. -@item If the variable is referenced in inline assembly, the type of access -must be provided to the compiler via constraints (@pxref{Constraints}). -Accesses from basic asms are not supported. -@end itemize +An @code{asm} statement has zero or more output operands indicating the names +of C variables modified by the assembler code. -Note that these points @emph{only} apply to code that is compiled with the -definition. The behavior of code that is merely linked in (for example -code from libraries) is not affected. +In this i386 example, @code{old} (referred to in the template string as +@code{%0}) and @code{*Base} (as @code{%1}) are outputs and @code{Offset} +(@code{%2}) is an input: -If you want to recompile source files that do not actually use your global -register variable so they do not use the specified register for any other -purpose, you need not actually add the global register declaration to -their source code. It suffices to specify the compiler option -@option{-ffixed-@var{reg}} (@pxref{Code Gen Options}) to reserve the -register. +@example +bool old; -@subsubheading Declaring the variable +__asm__ ("btsl %2,%1\n\t" // Turn on zero-based bit #Offset in Base. + "sbb %0,%0" // Use the CF to calculate old. + : "=r" (old), "+rm" (*Base) + : "Ir" (Offset) + : "cc"); -Global register variables cannot have initial values, because an -executable file has no means to supply initial contents for a register. +return old; +@end example -When selecting a register, choose one that is normally saved and -restored by function calls on your machine. This ensures that code -which is unaware of this reservation (such as library routines) will -restore it before returning. +Operands are separated by commas. Each operand has this format: -On machines with register windows, be sure to choose a global -register that is not affected magically by the function call mechanism. +@example +@r{[} [@var{asmSymbolicName}] @r{]} @var{constraint} (@var{cvariablename}) +@end example -@subsubheading Using the variable +@table @var +@item asmSymbolicName +Specifies an optional symbolic name for the operand. The literal square +brackets @samp{[]} around the @var{asmSymbolicName} are required both +in the operand specification and references to the operand in the assembler +template, i.e.@: @samp{%[Value]}. +The scope of the name is the @code{asm} statement +that contains the definition. Any valid C identifier is acceptable, +including names already defined in the surrounding code. No two operands +within the same @code{asm} statement can use the same symbolic name. -@cindex @code{qsort}, and global register variables -When calling routines that are not aware of the reservation, be -cautious if those routines call back into code which uses them. As an -example, if you call the system library version of @code{qsort}, it may -clobber your registers during execution, but (if you have selected -appropriate registers) it will restore them before returning. However -it will @emph{not} restore them before calling @code{qsort}'s comparison -function. 
As a result, global values will not reliably be available to -the comparison function unless the @code{qsort} function itself is rebuilt. +When not using an @var{asmSymbolicName}, use the (zero-based) position +of the operand +in the list of operands in the assembler template. For example if there are +three output operands, use @samp{%0} in the template to refer to the first, +@samp{%1} for the second, and @samp{%2} for the third. -Similarly, it is not safe to access the global register variables from signal -handlers or from more than one thread of control. Unless you recompile -them specially for the task at hand, the system library routines may -temporarily use the register for other things. Furthermore, since the register -is not reserved exclusively for the variable, accessing it from handlers of -asynchronous signals may observe unrelated temporary values residing in the -register. +@item constraint +A string constant specifying constraints on the placement of the operand; +@xref{Constraints}, for details. +In C++ with @option{-std=gnu++11} or later, the constraint can +also be a constant expression inside parentheses (see @ref{Asm constexprs}). -@cindex register variable after @code{longjmp} -@cindex global register after @code{longjmp} -@cindex value after @code{longjmp} -@findex longjmp -@findex setjmp -On most machines, @code{longjmp} restores to each global register -variable the value it had at the time of the @code{setjmp}. On some -machines, however, @code{longjmp} does not change the value of global -register variables. To be portable, the function that called @code{setjmp} -should make other arrangements to save the values of the global register -variables, and to restore them in a @code{longjmp}. This way, the same -thing happens regardless of what @code{longjmp} does. +Output constraints must begin with either @samp{=} (a variable overwriting an +existing value) or @samp{+} (when reading and writing). When using +@samp{=}, do not assume the location contains the existing value +on entry to the @code{asm}, except +when the operand is tied to an input; @pxref{InputOperands,,Input Operands}. -@node Local Register Variables -@subsubsection Specifying Registers for Local Variables -@anchor{Local Reg Vars} -@cindex local variables, specifying registers -@cindex specifying registers for local variables -@cindex registers for local variables +After the prefix, there must be one or more additional constraints +(@pxref{Constraints}) that describe where the value resides. Common +constraints include @samp{r} for register and @samp{m} for memory. +When you list more than one possible location (for example, @code{"=rm"}), +the compiler chooses the most efficient one based on the current context. +If you list as many alternates as the @code{asm} statement allows, you permit +the optimizers to produce the best possible code. +If you must use a specific register, but your Machine Constraints do not +provide sufficient control to select the specific register you want, +local register variables may provide a solution (@pxref{Local Register +Variables}). -You can define a local register variable and associate it with a specified -register like this: +@item cvariablename +Specifies a C lvalue expression to hold the output, typically a variable name. +The enclosing parentheses are a required part of the syntax. -@smallexample -register int *foo asm ("r12"); -@end smallexample +@end table -@noindent -Here @code{r12} is the name of the register that should be used. 
Note -that this is the same syntax used for defining global register variables, -but for a local variable the declaration appears within a function. The -@code{register} keyword is required, and cannot be combined with -@code{static}. The register name must be a valid register name for the -target platform. +When the compiler selects the registers to use to +represent the output operands, it does not use any of the clobbered registers +(@pxref{Clobbers and Scratch Registers}). -Do not use type qualifiers such as @code{const} and @code{volatile}, as -the outcome may be contrary to expectations. In particular, when the -@code{const} qualifier is used, the compiler may substitute the -variable with its initializer in @code{asm} statements, which may cause -the corresponding operand to appear in a different register. +Output operand expressions must be lvalues. The compiler cannot check whether +the operands have data types that are reasonable for the instruction being +executed. For output expressions that are not directly addressable (for +example a bit-field), the constraint must allow a register. In that case, GCC +uses the register as the output of the @code{asm}, and then stores that +register into the output. -As with global register variables, it is recommended that you choose -a register that is normally saved and restored by function calls on your -machine, so that calls to library routines will not clobber it. +Operands using the @samp{+} constraint modifier count as two operands +(that is, both as input and output) towards the total maximum of 30 operands +per @code{asm} statement. -The only supported use for this feature is to specify registers -for input and output operands when calling Extended @code{asm} -(@pxref{Extended Asm}). This may be necessary if the constraints for a -particular machine don't provide sufficient control to select the desired -register. To force an operand into a register, create a local variable -and specify the register name after the variable's declaration. Then use -the local variable for the @code{asm} operand and specify any constraint -letter that matches the register: +Use the @samp{&} constraint modifier (@pxref{Modifiers}) on all output +operands that must not overlap an input. Otherwise, +GCC may allocate the output operand in the same register as an unrelated +input operand, on the assumption that the assembler code consumes its +inputs before producing outputs. This assumption may be false if the assembler +code actually consists of more than one instruction. -@smallexample -register int *p1 asm ("r0") = @dots{}; -register int *p2 asm ("r1") = @dots{}; -register int *result asm ("r0"); -asm ("sysint" : "=r" (result) : "0" (p1), "r" (p2)); -@end smallexample +The same problem can occur if one output parameter (@var{a}) allows a register +constraint and another output parameter (@var{b}) allows a memory constraint. +The code generated by GCC to access the memory address in @var{b} can contain +registers which @emph{might} be shared by @var{a}, and GCC considers those +registers to be inputs to the asm. As above, GCC assumes that such input +registers are consumed before any outputs are written. This assumption may +result in incorrect behavior if the @code{asm} statement writes to @var{a} +before using +@var{b}. Combining the @samp{&} modifier with the register constraint on @var{a} +ensures that modifying @var{a} does not affect the address referenced by +@var{b}. 
Otherwise, the location of @var{b} +is undefined if @var{a} is modified before using @var{b}. -@emph{Warning:} In the above example, be aware that a register (for example -@code{r0}) can be call-clobbered by subsequent code, including function -calls and library calls for arithmetic operators on other variables (for -example the initialization of @code{p2}). In this case, use temporary -variables for expressions between the register assignments: +@code{asm} supports operand modifiers on operands (for example @samp{%k2} +instead of simply @samp{%2}). @ref{GenericOperandmodifiers, +Generic Operand modifiers} lists the modifiers that are available +on all targets. Other modifiers are hardware dependent. +For example, the list of supported modifiers for x86 is found at +@ref{x86Operandmodifiers,x86 Operand modifiers}. -@smallexample -int t1 = @dots{}; -register int *p1 asm ("r0") = @dots{}; -register int *p2 asm ("r1") = t1; -register int *result asm ("r0"); -asm ("sysint" : "=r" (result) : "0" (p1), "r" (p2)); -@end smallexample +If the C code that follows the @code{asm} makes no use of any of the output +operands, use @code{volatile} for the @code{asm} statement to prevent the +optimizers from discarding the @code{asm} statement as unneeded +(see @ref{Volatile}). -Defining a register variable does not reserve the register. Other than -when invoking the Extended @code{asm}, the contents of the specified -register are not guaranteed. For this reason, the following uses -are explicitly @emph{not} supported. If they appear to work, it is only -happenstance, and may stop working as intended due to (seemingly) -unrelated changes in surrounding code, or even minor changes in the -optimization of a future version of gcc: +This code makes no use of the optional @var{asmSymbolicName}. Therefore it +references the first output operand as @code{%0} (were there a second, it +would be @code{%1}, etc). The number of the first input operand is one greater +than that of the last output operand. In this i386 example, that makes +@code{Mask} referenced as @code{%1}: -@itemize @bullet -@item Passing parameters to or from Basic @code{asm} -@item Passing parameters to or from Extended @code{asm} without using input -or output operands. -@item Passing parameters to or from routines written in assembler (or -other languages) using non-standard calling conventions. -@end itemize +@example +uint32_t Mask = 1234; +uint32_t Index; -Some developers use Local Register Variables in an attempt to improve -gcc's allocation of registers, especially in large functions. In this -case the register name is essentially a hint to the register allocator. -While in some instances this can generate better code, improvements are -subject to the whims of the allocator/optimizers. Since there are no -guarantees that your improvements won't be lost, this usage of Local -Register Variables is discouraged. + asm ("bsfl %1, %0" + : "=r" (Index) + : "r" (Mask) + : "cc"); +@end example -On the MIPS platform, there is related use for local register variables -with slightly different characteristics (@pxref{MIPS Coprocessors,, -Defining coprocessor specifics for MIPS targets, gccint, -GNU Compiler Collection (GCC) Internals}). +That code overwrites the variable @code{Index} (@samp{=}), +placing the value in a register (@samp{r}). +Using the generic @samp{r} constraint instead of a constraint for a specific +register allows the compiler to pick the register to use, which can result +in more efficient code. 
This may not be possible if an assembler instruction +requires a specific register. -@node Size of an asm -@subsection Size of an @code{asm} +The following i386 example uses the @var{asmSymbolicName} syntax. +It produces the +same result as the code above, but some may consider it more readable or more +maintainable since reordering index numbers is not necessary when adding or +removing operands. The names @code{aIndex} and @code{aMask} +are only used in this example to emphasize which +names get used where. +It is acceptable to reuse the names @code{Index} and @code{Mask}. -Some targets require that GCC track the size of each instruction used -in order to generate correct code. Because the final length of the -code produced by an @code{asm} statement is only known by the -assembler, GCC must make an estimate as to how big it will be. It -does this by counting the number of instructions in the pattern of the -@code{asm} and multiplying that by the length of the longest -instruction supported by that processor. (When working out the number -of instructions, it assumes that any occurrence of a newline or of -whatever statement separator character is supported by the assembler --- -typically @samp{;} --- indicates the end of an instruction.) +@example +uint32_t Mask = 1234; +uint32_t Index; -Normally, GCC's estimate is adequate to ensure that correct -code is generated, but it is possible to confuse the compiler if you use -pseudo instructions or assembler macros that expand into multiple real -instructions, or if you use assembler directives that expand to more -space in the object file than is needed for a single instruction. -If this happens then the assembler may produce a diagnostic saying that -a label is unreachable. + asm ("bsfl %[aMask], %[aIndex]" + : [aIndex] "=r" (Index) + : [aMask] "r" (Mask) + : "cc"); +@end example -@cindex @code{asm inline} -This size is also used for inlining decisions. If you use @code{asm inline} -instead of just @code{asm}, then for inlining purposes the size of the asm -is taken as the minimum size, ignoring how many instructions GCC thinks it is. +Here are some more examples of output operands. -@node Syntax Extensions -@section Other Extensions to C Syntax +@example +uint32_t c = 1; +uint32_t d; +uint32_t *e = &c; -GNU C has traditionally supported numerous extensions to standard C -syntax. Some of these features were originally intended for -compatibility with other compilers or to ease traditional C -compatibility, some have been adopted into subsequent versions of the -C and/or C++ standards, while others remain specific to GNU C. +asm ("mov %[e], %[d]" + : [d] "=rm" (d) + : [e] "rm" (*e)); +@end example -@menu -* Statement Exprs:: Putting statements and declarations inside expressions. -* Local Labels:: Labels local to a block. -* Labels as Values:: Getting pointers to labels, and computed gotos. -* Nested Functions:: Nested functions in GNU C. -* Typeof:: @code{typeof}: referring to the type of an expression. -* Offsetof:: Special syntax for @code{offsetof}. -* Alignment:: Determining the alignment of a function, type or variable. -* Incomplete Enums:: @code{enum foo;}, with details to follow. -* Variadic Macros:: Macros with a variable number of arguments. -* Conditionals:: Omitting the middle operand of a @samp{?:} expression. -* Case Ranges:: `case 1 ... 9' and such. -* Mixed Labels and Declarations:: Mixing declarations, labels and code. -* C++ Comments:: C++ comments are recognized. 
-* Escaped Newlines:: Slightly looser rules for escaped newlines. -* Hex Floats:: Hexadecimal floating-point constants. -* Binary constants:: Binary constants using the @samp{0b} prefix. -* Dollar Signs:: Dollar sign is allowed in identifiers. -* Character Escapes:: @samp{\e} stands for the character @key{ESC}. -* Alternate Keywords:: @code{__const__}, @code{__asm__}, etc., for header files. -* Function Names:: Printable strings which are the name of the current - function. -@end menu +Here, @code{d} may either be in a register or in memory. Since the compiler +might already have the current value of the @code{uint32_t} location +pointed to by @code{e} +in a register, you can enable it to choose the best location +for @code{d} by specifying both constraints. -@node Statement Exprs -@subsection Statements and Declarations in Expressions -@cindex statements inside expressions -@cindex declarations inside expressions -@cindex expressions containing statements -@cindex macros, statements in expressions +@anchor{FlagOutputOperands} +@subsubsection Flag Output Operands +@cindex @code{asm} flag output operands -@c the above section title wrapped and causes an underfull hbox.. i -@c changed it from "within" to "in". --mew 4feb93 -A compound statement enclosed in parentheses may appear as an expression -in GNU C@. This allows you to use loops, switches, and local variables -within an expression. +Some targets have a special register that holds the ``flags'' for the +result of an operation or comparison. Normally, the contents of that +register are either unmodified by the asm, or the @code{asm} statement is +considered to clobber the contents. -Recall that a compound statement is a sequence of statements surrounded -by braces; in this construct, parentheses go around the braces. For -example: +On some targets, a special form of output operand exists by which +conditions in the flags register may be outputs of the asm. The set of +conditions supported are target specific, but the general rule is that +the output variable must be a scalar integer, and the value is boolean. +When supported, the target defines the preprocessor symbol +@code{__GCC_ASM_FLAG_OUTPUTS__}. -@smallexample -(@{ int y = foo (); int z; - if (y > 0) z = y; - else z = - y; - z; @}) -@end smallexample +Because of the special nature of the flag output operands, the constraint +may not include alternatives. -@noindent -is a valid (though slightly more complex than necessary) expression -for the absolute value of @code{foo ()}. +Most often, the target has only one flags register, and thus is an implied +operand of many instructions. In this case, the operand should not be +referenced within the assembler template via @code{%0} etc, as there's +no corresponding text in the assembly language. -The last thing in the compound statement should be an expression -followed by a semicolon; the value of this subexpression serves as the -value of the entire construct. (If you use some other kind of statement -last within the braces, the construct has type @code{void}, and thus -effectively no value.) +@table @asis +@item ARM +@itemx AArch64 +The flag output constraints for the ARM family are of the form +@samp{=@@cc@var{cond}} where @var{cond} is one of the standard +conditions defined in the ARM ARM for @code{ConditionHolds}. -This feature is especially useful in making macro definitions ``safe'' (so -that they evaluate each operand exactly once). 
For example, the
-``maximum'' function is commonly defined as a macro in standard C as
-follows:
+@table @code
+@item eq
+Z flag set, or equal
+@item ne
+Z flag clear or not equal
+@item cs
+@itemx hs
+C flag set or unsigned greater than or equal
+@item cc
+@itemx lo
+C flag clear or unsigned less than
+@item mi
+N flag set or ``minus''
+@item pl
+N flag clear or ``plus''
+@item vs
+V flag set or signed overflow
+@item vc
+V flag clear
+@item hi
+unsigned greater than
+@item ls
+unsigned less than or equal
+@item ge
+signed greater than or equal
+@item lt
+signed less than
+@item gt
+signed greater than
+@item le
+signed less than or equal
+@end table

-@smallexample
-#define max(a,b) ((a) > (b) ? (a) : (b))
-@end smallexample
+The flag output constraints are not supported in thumb1 mode.

-@noindent
-@cindex side effects, macro argument
-But this definition computes either @var{a} or @var{b} twice, with bad
-results if the operand has side effects. In GNU C, if you know the
-type of the operands (here taken as @code{int}), you can avoid this
-problem by defining the macro as follows:
+@item x86 family
+The flag output constraints for the x86 family are of the form
+@samp{=@@cc@var{cond}} where @var{cond} is one of the standard
+conditions defined in the ISA manual for @code{j@var{cc}} or
+@code{set@var{cc}}.

-@smallexample
-#define maxint(a,b) \
- (@{int _a = (a), _b = (b); _a > _b ? _a : _b; @})
-@end smallexample
+@table @code
+@item a
+``above'' or unsigned greater than
+@item ae
+``above or equal'' or unsigned greater than or equal
+@item b
+``below'' or unsigned less than
+@item be
+``below or equal'' or unsigned less than or equal
+@item c
+carry flag set
+@item e
+@itemx z
+``equal'' or zero flag set
+@item g
+signed greater than
+@item ge
+signed greater than or equal
+@item l
+signed less than
+@item le
+signed less than or equal
+@item o
+overflow flag set
+@item p
+parity flag set
+@item s
+sign flag set
+@item na
+@itemx nae
+@itemx nb
+@itemx nbe
+@itemx nc
+@itemx ne
+@itemx ng
+@itemx nge
+@itemx nl
+@itemx nle
+@itemx no
+@itemx np
+@itemx ns
+@itemx nz
+``not'' @var{flag}, or inverted versions of those above
+@end table

-Note that introducing variable declarations (as we do in @code{maxint}) can
-cause variable shadowing, so while this example using the @code{max} macro
-produces correct results:
-@smallexample
-int _a = 1, _b = 2, c;
-c = max (_a, _b);
-@end smallexample
-@noindent
-this example using maxint will not:
-@smallexample
-int _a = 1, _b = 2, c;
-c = maxint (_a, _b);
-@end smallexample
+@item s390
+The flag output constraint for s390 is @samp{=@@cc}. Only one such
+constraint is allowed. The output must be stored in an @code{int}
+variable.

-This problem may for instance occur when we use this pattern recursively, like
-so:
+@end table

-@smallexample
-#define maxint3(a, b, c) \
- (@{int _a = (a), _b = (b), _c = (c); maxint (maxint (_a, _b), _c); @})
-@end smallexample
+@anchor{InputOperands}
+@subsubsection Input Operands
+@cindex @code{asm} input operands
+@cindex @code{asm} expressions

-Embedded statements are not allowed in constant expressions, such as
-the value of an enumeration constant, the width of a bit-field, or
-the initial value of a static variable.
+Input operands make values from C variables and expressions available to the
+assembly code.

-If you don't know the type of the operand, you can still do this, but you
-must use @code{typeof} or @code{__auto_type} (@pxref{Typeof}).
+Operands are separated by commas.
Each operand has this format: -In G++, the result value of a statement expression undergoes array and -function pointer decay, and is returned by value to the enclosing -expression. For instance, if @code{A} is a class, then +@example +@r{[} [@var{asmSymbolicName}] @r{]} @var{constraint} (@var{cexpression}) +@end example -@smallexample - A a; +@table @var +@item asmSymbolicName +Specifies an optional symbolic name for the operand. The literal square +brackets @samp{[]} around the @var{asmSymbolicName} are required both +in the operand specification and references to the operand in the assembler +template, i.e.@: @samp{%[Value]}. +The scope of the name is the @code{asm} statement +that contains the definition. Any valid C identifier is acceptable, +including names already defined in the surrounding code. No two operands +within the same @code{asm} statement can use the same symbolic name. - (@{a;@}).Foo () -@end smallexample +When not using an @var{asmSymbolicName}, use the (zero-based) position +of the operand +in the list of operands in the assembler template. For example if there are +two output operands and three inputs, +use @samp{%2} in the template to refer to the first input operand, +@samp{%3} for the second, and @samp{%4} for the third. -@noindent -constructs a temporary @code{A} object to hold the result of the -statement expression, and that is used to invoke @code{Foo}. -Therefore the @code{this} pointer observed by @code{Foo} is not the -address of @code{a}. +@item constraint +A string constant specifying constraints on the placement of the operand; +@xref{Constraints}, for details. +In C++ with @option{-std=gnu++11} or later, the constraint can +also be a constant expression inside parentheses (see @ref{Asm constexprs}). -In a statement expression, any temporaries created within a statement -are destroyed at that statement's end. This makes statement -expressions inside macros slightly different from function calls. In -the latter case temporaries introduced during argument evaluation are -destroyed at the end of the statement that includes the function -call. In the statement expression case they are destroyed during -the statement expression. For instance, +Input constraint strings may not begin with either @samp{=} or @samp{+}. +When you list more than one possible location (for example, @samp{"irm"}), +the compiler chooses the most efficient one based on the current context. +If you must use a specific register, but your Machine Constraints do not +provide sufficient control to select the specific register you want, +local register variables may provide a solution (@pxref{Local Register +Variables}). -@smallexample -#define macro(a) (@{__typeof__(a) b = (a); b + 3; @}) -template T function(T a) @{ T b = a; return b + 3; @} +Input constraints can also be digits (for example, @code{"0"}). This indicates +that the specified input must be in the same place as the output constraint +at the (zero-based) index in the output constraint list. +When using @var{asmSymbolicName} syntax for the output operands, +you may use these names (enclosed in brackets @samp{[]}) instead of digits. -void foo () -@{ - macro (X ()); - function (X ()); -@} -@end smallexample +@item cexpression +This is the C variable or expression being passed to the @code{asm} statement +as input. The enclosing parentheses are a required part of the syntax. -@noindent -has different places where temporaries are destroyed. 
For the -@code{macro} case, the temporary @code{X} is destroyed just after -the initialization of @code{b}. In the @code{function} case that -temporary is destroyed when the function returns. +@end table -These considerations mean that it is probably a bad idea to use -statement expressions of this form in header files that are designed to -work with C++. (Note that some versions of the GNU C Library contained -header files using statement expressions that lead to precisely this -bug.) +When the compiler selects the registers to use to represent the input +operands, it does not use any of the clobbered registers +(@pxref{Clobbers and Scratch Registers}). -Jumping into a statement expression with @code{goto} or using a -@code{switch} statement outside the statement expression with a -@code{case} or @code{default} label inside the statement expression is -not permitted. Jumping into a statement expression with a computed -@code{goto} (@pxref{Labels as Values}) has undefined behavior. -Jumping out of a statement expression is permitted, but if the -statement expression is part of a larger expression then it is -unspecified which other subexpressions of that expression have been -evaluated except where the language definition requires certain -subexpressions to be evaluated before or after the statement -expression. A @code{break} or @code{continue} statement inside of -a statement expression used in @code{while}, @code{do} or @code{for} -loop or @code{switch} statement condition -or @code{for} statement init or increment expressions jumps to an -outer loop or @code{switch} statement if any (otherwise it is an error), -rather than to the loop or @code{switch} statement in whose condition -or init or increment expression it appears. -In any case, as with a function call, the evaluation of a -statement expression is not interleaved with the evaluation of other -parts of the containing expression. For example, +If there are no output operands but there are input operands, place two +consecutive colons where the output operands would go: -@smallexample - foo (), ((@{ bar1 (); goto a; 0; @}) + bar2 ()), baz(); -@end smallexample +@example +__asm__ ("some instructions" + : /* No outputs. */ + : "r" (Offset / 8)); +@end example -@noindent -calls @code{foo} and @code{bar1} and does not call @code{baz} but -may or may not call @code{bar2}. If @code{bar2} is called, it is -called after @code{foo} and before @code{bar1}. +@strong{Warning:} Do @emph{not} modify the contents of input-only operands +(except for inputs tied to outputs). The compiler assumes that on exit from +the @code{asm} statement these operands contain the same values as they +had before executing the statement. +It is @emph{not} possible to use clobbers +to inform the compiler that the values in these inputs are changing. One +common work-around is to tie the changing input variable to an output variable +that never gets used. Note, however, that if the code that follows the +@code{asm} statement makes no use of any of the output operands, the GCC +optimizers may discard the @code{asm} statement as unneeded +(see @ref{Volatile}). -@node Local Labels -@subsection Locally Declared Labels -@cindex local labels -@cindex macros, local labels +@code{asm} supports operand modifiers on operands (for example @samp{%k2} +instead of simply @samp{%2}). @ref{GenericOperandmodifiers, +Generic Operand modifiers} lists the modifiers that are available +on all targets. Other modifiers are hardware dependent. 
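+
+As a minimal sketch of a generic modifier (everything here is
+illustrative: @samp{%c0} prints the constant with no immediate-operand
+punctuation, and the @samp{#} comment syntax is not universal across
+assemblers):
+
+@smallexample
+/* Emits an assembler comment such as "# size 4".  */
+asm ("# size %c0" : : "i" (sizeof (int)));
+@end smallexample
+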
+For example, the list of supported modifiers for x86 is found at +@ref{x86Operandmodifiers,x86 Operand modifiers}. -GCC allows you to declare @dfn{local labels} in any nested block -scope. A local label is just like an ordinary label, but you can -only reference it (with a @code{goto} statement, or by taking its -address) within the block in which it is declared. +In this example using the fictitious @code{combine} instruction, the +constraint @code{"0"} for input operand 1 says that it must occupy the same +location as output operand 0. Only input operands may use numbers in +constraints, and they must each refer to an output operand. Only a number (or +the symbolic assembler name) in the constraint can guarantee that one operand +is in the same place as another. The mere fact that @code{foo} is the value of +both operands is not enough to guarantee that they are in the same place in +the generated assembler code. -A local label declaration looks like this: +@example +asm ("combine %2, %0" + : "=r" (foo) + : "0" (foo), "g" (bar)); +@end example -@smallexample -__label__ @var{label}; -@end smallexample +Here is an example using symbolic names. -@noindent -or +@example +asm ("cmoveq %1, %2, %[result]" + : [result] "=r"(result) + : "r" (test), "r" (new), "[result]" (old)); +@end example -@smallexample -__label__ @var{label1}, @var{label2}, /* @r{@dots{}} */; -@end smallexample +@anchor{Clobbers and Scratch Registers} +@subsubsection Clobbers and Scratch Registers +@cindex @code{asm} clobbers +@cindex @code{asm} scratch registers -Local label declarations must come at the beginning of the block, -before any ordinary declarations or statements. +While the compiler is aware of changes to entries listed in the output +operands, the inline @code{asm} code may modify more than just the outputs. For +example, calculations may require additional registers, or the processor may +overwrite a register as a side effect of a particular assembler instruction. +In order to inform the compiler of these changes, list them in the clobber +list. Clobber list items are either register names or the special clobbers +(listed below). Each clobber list item is a string constant +enclosed in double quotes and separated by commas. +In C++ with @option{-std=gnu++11} or later, a clobber list item can +also be a constant expression inside parentheses (see @ref{Asm constexprs}). -The label declaration defines the label @emph{name}, but does not define -the label itself. You must do this in the usual way, with -@code{@var{label}:}, within the statements of the statement expression. +Clobber descriptions may not in any way overlap with an input or output +operand. For example, you may not have an operand describing a register class +with one member when listing that register in the clobber list. Variables +declared to live in specific registers (@pxref{Explicit Register +Variables}) and used +as @code{asm} input or output operands must have no part mentioned in the +clobber description. In particular, there is no way to specify that input +operands get modified without also specifying them as output operands. -The local label feature is useful for complex macros. If a macro -contains nested loops, a @code{goto} can be useful for breaking out of -them. However, an ordinary label whose scope is the whole function -cannot be used: if the macro can be expanded several times in one -function, the label is multiply defined in that function. A -local label avoids this problem. 
For example: +When the compiler selects which registers to use to represent input and output +operands, it does not use any of the clobbered registers. As a result, +clobbered registers are available for any use in the assembler code. -@smallexample -#define SEARCH(value, array, target) \ -do @{ \ - __label__ found; \ - typeof (target) _SEARCH_target = (target); \ - typeof (*(array)) *_SEARCH_array = (array); \ - int i, j; \ - int value; \ - for (i = 0; i < max; i++) \ - for (j = 0; j < max; j++) \ - if (_SEARCH_array[i][j] == _SEARCH_target) \ - @{ (value) = i; goto found; @} \ - (value) = -1; \ - found:; \ -@} while (0) -@end smallexample +Another restriction is that the clobber list should not contain the +stack pointer register. This is because the compiler requires the +value of the stack pointer to be the same after an @code{asm} +statement as it was on entry to the statement. However, previous +versions of GCC did not enforce this rule and allowed the stack +pointer to appear in the list, with unclear semantics. This behavior +is deprecated and listing the stack pointer may become an error in +future versions of GCC@. -This could also be written using a statement expression: +Here is a realistic example for the VAX showing the use of clobbered +registers: -@smallexample -#define SEARCH(array, target) \ -(@{ \ - __label__ found; \ - typeof (target) _SEARCH_target = (target); \ - typeof (*(array)) *_SEARCH_array = (array); \ - int i, j; \ - int value; \ - for (i = 0; i < max; i++) \ - for (j = 0; j < max; j++) \ - if (_SEARCH_array[i][j] == _SEARCH_target) \ - @{ value = i; goto found; @} \ - value = -1; \ - found: \ - value; \ -@}) -@end smallexample +@example +asm volatile ("movc3 %0, %1, %2" + : /* No outputs. */ + : "g" (from), "g" (to), "g" (count) + : "r0", "r1", "r2", "r3", "r4", "r5", "memory"); +@end example -Local label declarations also make the labels they declare visible to -nested functions, if there are any. @xref{Nested Functions}, for details. +Also, there are three special clobber arguments: -@node Labels as Values -@subsection Labels as Values -@cindex labels as values -@cindex computed gotos -@cindex goto with computed label -@cindex address of a label +@table @code +@item "cc" +The @code{"cc"} clobber indicates that the assembler code modifies the flags +register. On some machines, GCC represents the condition codes as a specific +hardware register; @code{"cc"} serves to name this register. +On other machines, condition code handling is different, +and specifying @code{"cc"} has no effect. But +it is valid no matter what the target. -You can get the address of a label defined in the current function -(or a containing function) with the unary operator @samp{&&}. The -value has type @code{void *}. This value is a constant and can be used -wherever a constant of that type is valid. For example: +@item "memory" +The @code{"memory"} clobber tells the compiler that the assembly code +performs memory +reads or writes to items other than those listed in the input and output +operands (for example, accessing the memory pointed to by one of the input +parameters). To ensure memory contains correct values, GCC may need to flush +specific register values to memory before executing the @code{asm}. Further, +the compiler does not assume that any values read from memory before an +@code{asm} remain unchanged after that @code{asm}; it reloads them as +needed. +Using the @code{"memory"} clobber effectively forms a read/write +memory barrier for the compiler. 
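+
+A minimal sketch of this barrier use (the empty template is a common
+idiom for a pure compiler-level barrier, so no instructions are
+emitted; @code{shared} stands for any object whose address is visible
+outside the function):
+
+@smallexample
+extern int shared;
+shared = 1;
+asm volatile ("" : : : "memory");   /* @r{compiler-level barrier} */
+int copy = shared;   /* @r{reloaded from memory, not assumed to be 1} */
+@end smallexample
+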
-@smallexample -void *ptr; -/* @r{@dots{}} */ -ptr = &&foo; -@end smallexample +Note that this clobber does not prevent the @emph{processor} from doing +speculative reads past the @code{asm} statement. To prevent that, you need +processor-specific fence instructions. -To use these values, you need to be able to jump to one. This is done -with the computed goto statement@footnote{The analogous feature in -Fortran is called an assigned goto, but that name seems inappropriate in -C, where one can do more than simply store label addresses in label -variables.}, @code{goto *@var{exp};}. For example, +@item "redzone" +The @code{"redzone"} clobber tells the compiler that the assembly code +may write to the stack red zone, area below the stack pointer which on +some architectures in some calling conventions is guaranteed not to be +changed by signal handlers, interrupts or exceptions and so the compiler +can store there temporaries in leaf functions. On targets which have +no concept of the stack red zone, the clobber is ignored. +It should be used e.g.@: in case the assembly code uses call instructions +or pushes something to the stack without taking the red zone into account +by subtracting red zone size from the stack pointer first and restoring +it afterwards. -@smallexample -goto *ptr; -@end smallexample +@end table -@noindent -Any expression of type @code{void *} is allowed. +Flushing registers to memory has performance implications and may be +an issue for time-sensitive code. You can provide better information +to GCC to avoid this, as shown in the following examples. At a +minimum, aliasing rules allow GCC to know what memory @emph{doesn't} +need to be flushed. -One way of using these constants is in initializing a static array that -serves as a jump table: +Here is a fictitious sum of squares instruction, that takes two +pointers to floating point values in memory and produces a floating +point register output. +Notice that @code{x}, and @code{y} both appear twice in the @code{asm} +parameters, once to specify memory accessed, and once to specify a +base register used by the @code{asm}. You won't normally be wasting a +register by doing this as GCC can use the same register for both +purposes. However, it would be foolish to use both @code{%1} and +@code{%3} for @code{x} in this @code{asm} and expect them to be the +same. In fact, @code{%3} may well not be a register. It might be a +symbolic memory reference to the object pointed to by @code{x}. @smallexample -static void *array[] = @{ &&foo, &&bar, &&hack @}; +asm ("sumsq %0, %1, %2" + : "+f" (result) + : "r" (x), "r" (y), "m" (*x), "m" (*y)); @end smallexample -@noindent -Then you can select a label with indexing, like this: +Here is a fictitious @code{*z++ = *x++ * *y++} instruction. +Notice that the @code{x}, @code{y} and @code{z} pointer registers +must be specified as input/output because the @code{asm} modifies +them. @smallexample -goto *array[i]; +asm ("vecmul %0, %1, %2" + : "+r" (z), "+r" (x), "+r" (y), "=m" (*z) + : "m" (*x), "m" (*y)); @end smallexample -@noindent -Note that this does not check whether the subscript is in bounds---array -indexing in C never does that. - -Such an array of label values serves a purpose much like that of the -@code{switch} statement. The @code{switch} statement is cleaner, so -use that rather than an array unless the problem does not fit a -@code{switch} statement very well. - -Another use of label values is in an interpreter for threaded code. 
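+
+The casts to array types in these examples are what tell GCC how much
+memory the @code{asm} reads or writes. As a sketch under the same
+convention (@code{frobble} is a placeholder mnemonic), an @code{asm}
+that writes a fixed-size buffer can expose exactly those bytes as a
+memory output rather than resorting to a @code{"memory"} clobber:
+
+@smallexample
+void frobble8 (char *p)
+@{
+  /* @r{Only the 8 bytes at @code{p} are described as written.}  */
+  asm ("frobble %1"
+       : "=m" (*(char (*)[8]) p)
+       : "r" (p));
+@}
+@end smallexample
+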
-The labels within the interpreter function can be stored in the -threaded code for super-fast dispatching. - -You may not use this mechanism to jump to code in a different function. -If you do that, totally unpredictable things happen. The best way to -avoid this is to store the label address only in automatic variables and -never pass it as an argument. - -An alternate way to write the above example is +An x86 example where the string memory argument is of unknown length. @smallexample -static const int array[] = @{ &&foo - &&foo, &&bar - &&foo, - &&hack - &&foo @}; -goto *(&&foo + array[i]); +asm("repne scasb" + : "=c" (count), "+D" (p) + : "m" (*(const char (*)[]) p), "0" (-1), "a" (0)); @end smallexample -@noindent -This is more friendly to code living in shared libraries, as it reduces -the number of dynamic relocations that are needed, and by consequence, -allows the data to be read-only. -This alternative with label differences is not supported for the AVR target, -please use the first approach for AVR programs. - -The @code{&&foo} expressions for the same label might have different -values if the containing function is inlined or cloned. If a program -relies on them being always the same, -@code{__attribute__((__noinline__,__noclone__))} should be used to -prevent inlining and cloning. If @code{&&foo} is used in a static -variable initializer, inlining and cloning is forbidden. - -Unlike a normal goto, in GNU C++ a computed goto will not call -destructors for objects that go out of scope. - -@node Nested Functions -@subsection Nested Functions -@cindex nested functions -@cindex downward funargs -@cindex thunks - -A @dfn{nested function} is a function defined inside another function. -Nested functions are supported as an extension in GNU C, but are not -supported by GNU C++. - -The nested function's name is local to the block where it is defined. -For example, here we define a nested function named @code{square}, and -call it twice: - -@smallexample -@group -foo (double a, double b) -@{ - double square (double z) @{ return z * z; @} - - return square (a) + square (b); -@} -@end group -@end smallexample +If you know the above will only be reading a ten byte array then you +could instead use a memory input like: +@code{"m" (*(const char (*)[10]) p)}. -The nested function can access all the variables of the containing -function that are visible at the point of its definition. This is -called @dfn{lexical scoping}. For example, here we show a nested -function which uses an inherited variable named @code{offset}: +Here is an example of a PowerPC vector scale implemented in assembly, +complete with vector and condition code clobbers, and some initialized +offset registers that are unchanged by the @code{asm}. 
@smallexample -@group -bar (int *array, int offset, int size) +void +dscal (size_t n, double *x, double alpha) @{ - int access (int *array, int index) - @{ return array[index + offset]; @} - int i; - /* @r{@dots{}} */ - for (i = 0; i < size; i++) - /* @r{@dots{}} */ access (array, i) /* @r{@dots{}} */ + asm ("/* lots of asm here */" + : "+m" (*(double (*)[n]) x), "+&r" (n), "+b" (x) + : "d" (alpha), "b" (32), "b" (48), "b" (64), + "b" (80), "b" (96), "b" (112) + : "cr0", + "vs32","vs33","vs34","vs35","vs36","vs37","vs38","vs39", + "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"); @} -@end group @end smallexample -Nested function definitions are permitted within functions in the places -where variable definitions are allowed; that is, in any block, mixed -with the other declarations and statements in the block. - -It is possible to call the nested function from outside the scope of its -name by storing its address or passing the address to another function: +Rather than allocating fixed registers via clobbers to provide scratch +registers for an @code{asm} statement, an alternative is to define a +variable and make it an early-clobber output as with @code{a2} and +@code{a3} in the example below. This gives the compiler register +allocator more freedom. You can also define a variable and make it an +output tied to an input as with @code{a0} and @code{a1}, tied +respectively to @code{ap} and @code{lda}. Of course, with tied +outputs your @code{asm} can't use the input value after modifying the +output register since they are one and the same register. What's +more, if you omit the early-clobber on the output, it is possible that +GCC might allocate the same register to another of the inputs if GCC +could prove they had the same value on entry to the @code{asm}. This +is why @code{a1} has an early-clobber. Its tied input, @code{lda} +might conceivably be known to have the value 16 and without an +early-clobber share the same register as @code{%11}. On the other +hand, @code{ap} can't be the same as any of the other inputs, so an +early-clobber on @code{a0} is not needed. It is also not desirable in +this case. An early-clobber on @code{a0} would cause GCC to allocate +a separate register for the @code{"m" (*(const double (*)[]) ap)} +input. Note that tying an input to an output is the way to set up an +initialized temporary register modified by an @code{asm} statement. +An input not tied to an output is assumed by GCC to be unchanged, for +example @code{"b" (16)} below sets up @code{%11} to 16, and GCC might +use that register in following code if the value 16 happened to be +needed. You can even use a normal @code{asm} output for a scratch if +all inputs that might share the same register are consumed before the +scratch is used. The VSX registers clobbered by the @code{asm} +statement could have used this technique except for GCC's limit on the +number of @code{asm} parameters. @smallexample -hack (int *array, int size) +static void +dgemv_kernel_4x4 (long n, const double *ap, long lda, + const double *x, double *y, double alpha) @{ - void store (int index, int value) - @{ array[index] = value; @} - - intermediate (store, size); -@} -@end smallexample - -Here, the function @code{intermediate} receives the address of -@code{store} as an argument. If @code{intermediate} calls @code{store}, -the arguments given to @code{store} are used to store into @code{array}. -But this technique works only so long as the containing function -(@code{hack}, in this example) does not exit. 
+ double *a0; + double *a1; + double *a2; + double *a3; -If you try to call the nested function through its address after the -containing function exits, all hell breaks loose. If you try -to call it after a containing scope level exits, and if it refers -to some of the variables that are no longer in scope, you may be lucky, -but it's not wise to take the risk. If, however, the nested function -does not refer to anything that has gone out of scope, you should be -safe. - -GCC implements taking the address of a nested function using a technique -called @dfn{trampolines}. This technique was described in -@cite{Lexical Closures for C++} (Thomas M. Breuel, USENIX -C++ Conference Proceedings, October 17-21, 1988). - -A nested function can jump to a label inherited from a containing -function, provided the label is explicitly declared in the containing -function (@pxref{Local Labels}). Such a jump returns instantly to the -containing function, exiting the nested function that did the -@code{goto} and any intermediate functions as well. Here is an example: - -@smallexample -@group -bar (int *array, int offset, int size) -@{ - __label__ failure; - int access (int *array, int index) - @{ - if (index > size) - goto failure; - return array[index + offset]; - @} - int i; - /* @r{@dots{}} */ - for (i = 0; i < size; i++) - /* @r{@dots{}} */ access (array, i) /* @r{@dots{}} */ - /* @r{@dots{}} */ - return 0; - - /* @r{Control comes here from @code{access} - if it detects an error.} */ - failure: - return -1; + __asm__ + ( + /* lots of asm here */ + "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n" + "#a0=%3 a1=%4 a2=%5 a3=%6" + : + "+m" (*(double (*)[n]) y), + "+&r" (n), // 1 + "+b" (y), // 2 + "=b" (a0), // 3 + "=&b" (a1), // 4 + "=&b" (a2), // 5 + "=&b" (a3) // 6 + : + "m" (*(const double (*)[n]) x), + "m" (*(const double (*)[]) ap), + "d" (alpha), // 9 + "r" (x), // 10 + "b" (16), // 11 + "3" (ap), // 12 + "4" (lda) // 13 + : + "cr0", + "vs32","vs33","vs34","vs35","vs36","vs37", + "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47" + ); @} -@end group @end smallexample -A nested function always has no linkage. Declaring one with -@code{extern} or @code{static} is erroneous. If you need to declare the nested function -before its definition, use @code{auto} (which is otherwise meaningless -for function declarations). +@anchor{GotoLabels} +@subsubsection Goto Labels +@cindex @code{asm} goto labels -@smallexample -bar (int *array, int offset, int size) -@{ - __label__ failure; - auto int access (int *, int); - /* @r{@dots{}} */ - int access (int *array, int index) - @{ - if (index > size) - goto failure; - return array[index + offset]; - @} - /* @r{@dots{}} */ -@} -@end smallexample +@code{asm goto} allows assembly code to jump to one or more C labels. The +@var{GotoLabels} section in an @code{asm goto} statement contains +a comma-separated +list of all C labels to which the assembler code may jump. GCC assumes that +@code{asm} execution falls through to the next statement (if this is not the +case, consider using the @code{__builtin_unreachable} intrinsic after the +@code{asm} statement). Optimization of @code{asm goto} may be improved by +using the @code{hot} and @code{cold} label attributes (@pxref{Label +Attributes}). 
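+
+As a minimal sketch of the syntax only (the empty template cannot
+actually branch, so this particular @code{asm goto} always falls
+through):
+
+@smallexample
+asm goto ("" : : : : fallback);
+return 0;
+fallback:
+return 1;
+@end smallexample
+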
-@node Typeof -@subsection Referring to a Type with @code{typeof} -@findex typeof -@findex sizeof -@cindex macros, types of arguments +If the assembler code does modify anything, use the @code{"memory"} clobber +to force the +optimizers to flush all register values to memory and reload them if +necessary after the @code{asm} statement. -Another way to refer to the type of an expression is with @code{typeof}. -The syntax of using of this keyword looks like @code{sizeof}, but the -construct acts semantically like a type name defined with @code{typedef}. +Also note that an @code{asm goto} statement is always implicitly +considered volatile. -There are two ways of writing the argument to @code{typeof}: with an -expression or with a type. Here is an example with an expression: +Be careful when you set output operands inside @code{asm goto} only on +some possible control flow paths. If you don't set up the output on +given path and never use it on this path, it is okay. Otherwise, you +should use @samp{+} constraint modifier meaning that the operand is +input and output one. With this modifier you will have the correct +values on all possible paths from the @code{asm goto}. -@smallexample -typeof (x[0](1)) -@end smallexample +To reference a label in the assembler template, prefix it with +@samp{%l} (lowercase @samp{L}) followed by its (zero-based) position +in @var{GotoLabels} plus the number of input and output operands. +Output operand with constraint modifier @samp{+} is counted as two +operands because it is considered as one output and one input operand. +For example, if the @code{asm} has three inputs, one output operand +with constraint modifier @samp{+} and one output operand with +constraint modifier @samp{=} and references two labels, refer to the +first label as @samp{%l6} and the second as @samp{%l7}). -@noindent -This assumes that @code{x} is an array of pointers to functions; -the type described is that of the values of the functions. +Alternately, you can reference labels using the actual C label name +enclosed in brackets. For example, to reference a label named +@code{carry}, you can use @samp{%l[carry]}. The label must still be +listed in the @var{GotoLabels} section when using this approach. It +is better to use the named references for labels as in this case you +can avoid counting input and output operands and special treatment of +output operands with constraint modifier @samp{+}. -Here is an example with a typename as the argument: +Here is an example of @code{asm goto} for i386: -@smallexample -typeof (int *) -@end smallexample +@example +asm goto ( + "btl %1, %0\n\t" + "jc %l2" + : /* No outputs. */ + : "r" (p1), "r" (p2) + : "cc" + : carry); -@noindent -Here the type described is that of pointers to @code{int}. +return 0; -If you are writing a header file that must work when included in ISO C -programs, write @code{__typeof__} instead of @code{typeof}. -@xref{Alternate Keywords}. +carry: +return 1; +@end example -A @code{typeof} construct can be used anywhere a typedef name can be -used. For example, you can use it in a declaration, in a cast, or inside -of @code{sizeof} or @code{typeof}. +The following example shows an @code{asm goto} that uses a memory clobber. -The operand of @code{typeof} is evaluated for its side effects if and -only if it is an expression of variably modified type or the name of -such a type. +@example +int frob(int x) +@{ + int y; + asm goto ("frob %%r5, %1; jc %l[error]; mov (%2), %%r5" + : /* No outputs. 
*/ + : "r"(x), "r"(&y) + : "r5", "memory" + : error); + return y; +error: + return -1; +@} +@end example -@code{typeof} is often useful in conjunction with -statement expressions (@pxref{Statement Exprs}). -Here is how the two together can -be used to define a safe ``maximum'' macro which operates on any -arithmetic type and evaluates each of its arguments exactly once: +The following example shows an @code{asm goto} that uses an output. -@smallexample -#define max(a,b) \ - (@{ typeof (a) _a = (a); \ - typeof (b) _b = (b); \ - _a > _b ? _a : _b; @}) -@end smallexample +@example +int foo(int count) +@{ + asm goto ("dec %0; jb %l[stop]" + : "+r" (count) + : + : + : stop); + return count; +stop: + return 0; +@} +@end example -@cindex underscores in variables in macros -@cindex @samp{_} in variables in macros -@cindex local variables in macros -@cindex variables, local, in macros -@cindex macros, local variables in +The following artificial example shows an @code{asm goto} that sets +up an output only on one path inside the @code{asm goto}. Usage of +constraint modifier @samp{=} instead of @samp{+} would be wrong as +@code{factor} is used on all paths from the @code{asm goto}. -The reason for using names that start with underscores for the local -variables is to avoid conflicts with variable names that occur within the -expressions that are substituted for @code{a} and @code{b}. Eventually we -hope to design a new form of declaration syntax that allows you to declare -variables whose scopes start only after their initializers; this will be a -more reliable way to prevent such conflicts. +@example +int foo(int inp) +@{ + int factor = 0; + asm goto ("cmp %1, 10; jb %l[lab]; mov 2, %0" + : "+r" (factor) + : "r" (inp) + : + : lab); +lab: + return inp * factor; /* return 2 * inp or 0 if inp < 10 */ +@} +@end example +@anchor{GenericOperandmodifiers} +@subsubsection Generic Operand Modifiers @noindent -Some more examples of the use of @code{typeof}: - -@itemize @bullet -@item -This declares @code{y} with the type of what @code{x} points to. - -@smallexample -typeof (*x) y; -@end smallexample - -@item -This declares @code{y} as an array of such values. +The following table shows the modifiers supported by all targets and their effects: -@smallexample -typeof (*x) y[4]; -@end smallexample - -@item -This declares @code{y} as an array of pointers to characters: - -@smallexample -typeof (typeof (char *)[4]) y; -@end smallexample - -@noindent -It is equivalent to the following traditional C declaration: - -@smallexample -char *y[4]; -@end smallexample - -To see the meaning of the declaration using @code{typeof}, and why it -might be a useful way to write, rewrite it with these macros: - -@smallexample -#define pointer(T) typeof(T *) -#define array(T, N) typeof(T [N]) -@end smallexample - -@noindent -Now the declaration can be rewritten this way: - -@smallexample -array (pointer (char), 4) y; -@end smallexample - -@noindent -Thus, @code{array (pointer (char), 4)} is the type of arrays of 4 -pointers to @code{char}. -@end itemize - -The ISO C23 operator @code{typeof_unqual} is available in ISO C23 mode -and its result is the non-atomic unqualified version of what @code{typeof} -operator returns. Alternate spelling @code{__typeof_unqual__} is -available in all C modes and provides non-atomic unqualified version of -what @code{__typeof__} operator returns. -@xref{Alternate Keywords}. 
- -@cindex @code{__auto_type} in GNU C -In GNU C, but not GNU C++, you may also declare the type of a variable -as @code{__auto_type}. In that case, the declaration must declare -only one variable, whose declarator must just be an identifier, the -declaration must be initialized, and the type of the variable is -determined by the initializer; the name of the variable is not in -scope until after the initializer. (In C++, you should use C++11 -@code{auto} for this purpose.) Using @code{__auto_type}, the -``maximum'' macro above could be written as: - -@smallexample -#define max(a,b) \ - (@{ __auto_type _a = (a); \ - __auto_type _b = (b); \ - _a > _b ? _a : _b; @}) -@end smallexample - -Using @code{__auto_type} instead of @code{typeof} has two advantages: - -@itemize @bullet -@item Each argument to the macro appears only once in the expansion of -the macro. This prevents the size of the macro expansion growing -exponentially when calls to such macros are nested inside arguments of -such macros. - -@item If the argument to the macro has variably modified type, it is -evaluated only once when using @code{__auto_type}, but twice if -@code{typeof} is used. -@end itemize - -@node Offsetof -@subsection Support for @code{offsetof} -@findex __builtin_offsetof - -GCC implements for both C and C++ a syntactic extension to implement -the @code{offsetof} macro. - -@smallexample -primary: - "__builtin_offsetof" "(" @code{typename} "," offsetof_member_designator ")" - -offsetof_member_designator: - @code{identifier} - | offsetof_member_designator "." @code{identifier} - | offsetof_member_designator "[" @code{expr} "]" -@end smallexample - -This extension is sufficient such that - -@smallexample -#define offsetof(@var{type}, @var{member}) __builtin_offsetof (@var{type}, @var{member}) -@end smallexample - -@noindent -is a suitable definition of the @code{offsetof} macro. In C++, @var{type} -may be dependent. In either case, @var{member} may consist of a single -identifier, or a sequence of member accesses and array references. +@multitable @columnfractions 0.15 0.7 0.15 +@headitem Modifier @tab Description @tab Example +@item @code{c} +@tab Require a constant operand and print the constant expression with no punctuation. +@tab @code{%c0} +@item @code{cc} +@tab Like @samp{%c} except try harder to print it with no punctuation. +@samp{%c} can e.g.@: fail to print constant addresses in position independent code on +some architectures. +@tab @code{%cc0} +@item @code{n} +@tab Like @samp{%c} except that the value of the constant is negated before printing. +@tab @code{%n0} +@item @code{a} +@tab Substitute a memory reference, with the actual operand treated as the address. +This may be useful when outputting a ``load address'' instruction, because +often the assembler syntax for such an instruction requires you to write the +operand as if it were a memory reference. +@tab @code{%a0} +@item @code{l} +@tab Print the label name with no punctuation. +@tab @code{%l0} +@end multitable -@node Alignment -@subsection Determining the Alignment of Functions, Types or Variables -@cindex alignment -@cindex type alignment -@cindex variable alignment +@anchor{aarch64Operandmodifiers} +@subsubsection AArch64 Operand Modifiers -The keyword @code{__alignof__} determines the alignment requirement of -a function, object, or a type, or the minimum alignment usually required -by a type. Its syntax is just like @code{sizeof} and C11 @code{_Alignof}. 
+The following table shows the modifiers supported by AArch64 and their effects: -For example, if the target machine requires a @code{double} value to be -aligned on an 8-byte boundary, then @code{__alignof__ (double)} is 8. -This is true on many RISC machines. On more traditional machine -designs, @code{__alignof__ (double)} is 4 or even 2. +@multitable @columnfractions .10 .90 +@headitem Modifier @tab Description +@item @code{w} @tab Print a 32-bit general-purpose register name or, given a +constant zero operand, the 32-bit zero register (@code{wzr}). +@item @code{x} @tab Print a 64-bit general-purpose register name or, given a +constant zero operand, the 64-bit zero register (@code{xzr}). +@item @code{b} @tab Print an FP/SIMD register name with a @code{b} (byte, 8-bit) +prefix. +@item @code{h} @tab Print an FP/SIMD register name with an @code{h} (halfword, +16-bit) prefix. +@item @code{s} @tab Print an FP/SIMD register name with an @code{s} (single +word, 32-bit) prefix. +@item @code{d} @tab Print an FP/SIMD register name with a @code{d} (doubleword, +64-bit) prefix. +@item @code{q} @tab Print an FP/SIMD register name with a @code{q} (quadword, +128-bit) prefix. +@item @code{Z} @tab Print an FP/SIMD register name as an SVE register (i.e. with +a @code{z} prefix). This is a no-op for SVE register operands. +@end multitable -Some machines never actually require alignment; they allow references to any -data type even at an odd address. For these machines, @code{__alignof__} -reports the smallest alignment that GCC gives the data type, usually as -mandated by the target ABI. +@anchor{x86Operandmodifiers} +@subsubsection x86 Operand Modifiers -If the operand of @code{__alignof__} is an lvalue rather than a type, -its value is the required alignment for its type, taking into account -any minimum alignment specified by attribute @code{aligned} -(@pxref{Common Variable Attributes}). For example, after this -declaration: +References to input, output, and goto operands in the assembler template +of extended @code{asm} statements can use +modifiers to affect the way the operands are formatted in +the code output to the assembler. For example, the +following code uses the @samp{h} and @samp{b} modifiers for x86: -@smallexample -struct foo @{ int x; char y; @} foo1; -@end smallexample +@example +uint16_t num; +asm volatile ("xchg %h0, %b0" : "+a" (num) ); +@end example @noindent -the value of @code{__alignof__ (foo1.y)} is 1, even though its actual -alignment is probably 2 or 4, the same as @code{__alignof__ (int)}. -It is an error to ask for the alignment of an incomplete type other -than @code{void}. - -If the operand of the @code{__alignof__} expression is a function, -the expression evaluates to the alignment of the function which may -be specified by attribute @code{aligned} (@pxref{Common Function Attributes}). +These modifiers generate this assembler code: -@node Incomplete Enums -@subsection Incomplete @code{enum} Types +@example +xchg %ah, %al +@end example -You can define an @code{enum} tag without specifying its possible values. -This results in an incomplete type, much like what you get if you write -@code{struct foo} without describing the elements. A later declaration -that does specify the possible values completes the type. +The rest of this discussion uses the following code for illustrative purposes. -You cannot allocate variables or storage using the type while it is -incomplete. However, you can work with pointers to that type. 
+@example +int main() +@{ + int iInt = 1; -This extension may not be very useful, but it makes the handling of -@code{enum} more consistent with the way @code{struct} and @code{union} -are handled. +top: -This extension is not supported by GNU C++. + asm volatile goto ("some assembler instructions here" + : /* No outputs. */ + : "q" (iInt), "X" (sizeof(unsigned char) + 1), "i" (42) + : /* No clobbers. */ + : top); +@} +@end example -@node Variadic Macros -@subsection Macros with a Variable Number of Arguments. -@cindex variable number of arguments -@cindex macro with variable arguments -@cindex rest argument (in macro) -@cindex variadic macros +With no modifiers, this is what the output from the operands would be +for the @samp{att} and @samp{intel} dialects of assembler: -In the ISO C standard of 1999, a macro can be declared to accept a -variable number of arguments much as a function can. The syntax for -defining the macro is similar to that of a function. Here is an -example: +@multitable {Operand} {$.L2} {OFFSET FLAT:.L2} +@headitem Operand @tab @samp{att} @tab @samp{intel} +@item @code{%0} +@tab @code{%eax} +@tab @code{eax} +@item @code{%1} +@tab @code{$2} +@tab @code{2} +@item @code{%3} +@tab @code{$.L3} +@tab @code{OFFSET FLAT:.L3} +@item @code{%4} +@tab @code{$8} +@tab @code{8} +@item @code{%5} +@tab @code{%xmm0} +@tab @code{xmm0} +@item @code{%7} +@tab @code{$0} +@tab @code{0} +@end multitable -@smallexample -#define debug(format, ...) fprintf (stderr, format, __VA_ARGS__) -@end smallexample +The table below shows the list of supported modifiers and their effects. -@noindent -Here @samp{@dots{}} is a @dfn{variable argument}. In the invocation of -such a macro, it represents the zero or more tokens until the closing -parenthesis that ends the invocation, including any commas. This set of -tokens replaces the identifier @code{__VA_ARGS__} in the macro body -wherever it appears. See the CPP manual for more information. +@multitable {Modifier} {Print the opcode suffix for the size of th} {Operand} {@samp{att}} {@samp{intel}} +@headitem Modifier @tab Description @tab Operand @tab @samp{att} @tab @samp{intel} +@item @code{A} +@tab Print an absolute memory reference. +@tab @code{%A0} +@tab @code{*%rax} +@tab @code{rax} +@item @code{b} +@tab Print the QImode name of the register. +@tab @code{%b0} +@tab @code{%al} +@tab @code{al} +@item @code{B} +@tab print the opcode suffix of b. +@tab @code{%B0} +@tab @code{b} +@tab +@item @code{c} +@tab Require a constant operand and print the constant expression with no punctuation. +@tab @code{%c1} +@tab @code{2} +@tab @code{2} +@item @code{d} +@tab print duplicated register operand for AVX instruction. +@tab @code{%d5} +@tab @code{%xmm0, %xmm0} +@tab @code{xmm0, xmm0} +@item @code{E} +@tab Print the address in Double Integer (DImode) mode (8 bytes) when the target is 64-bit. +Otherwise mode is unspecified (VOIDmode). +@tab @code{%E1} +@tab @code{%(rax)} +@tab @code{[rax]} +@item @code{g} +@tab Print the V16SFmode name of the register. +@tab @code{%g0} +@tab @code{%zmm0} +@tab @code{zmm0} +@item @code{h} +@tab Print the QImode name for a ``high'' register. +@tab @code{%h0} +@tab @code{%ah} +@tab @code{ah} +@item @code{H} +@tab Add 8 bytes to an offsettable memory reference. Useful when accessing the +high 8 bytes of SSE values. For a memref in (%rax), it generates +@tab @code{%H0} +@tab @code{8(%rax)} +@tab @code{8[rax]} +@item @code{k} +@tab Print the SImode name of the register. 
+@tab @code{%k0}
+@tab @code{%eax}
+@tab @code{eax}
+@item @code{l}
+@tab Print the label name with no punctuation.
+@tab @code{%l3}
+@tab @code{.L3}
+@tab @code{.L3}
+@item @code{L}
+@tab Print the opcode suffix of l.
+@tab @code{%L0}
+@tab @code{l}
+@tab
+@item @code{N}
+@tab Print maskz.
+@tab @code{%N7}
+@tab @code{@{z@}}
+@tab @code{@{z@}}
+@item @code{p}
+@tab Print raw symbol name (without syntax-specific prefixes).
+@tab @code{%p2}
+@tab @code{42}
+@tab @code{42}
+@item @code{P}
+@tab If used for a function, print the PLT suffix and generate PIC code.
+For example, emit @code{foo@@PLT} instead of @code{foo} for the function
+@code{foo()}. If used for a constant, drop all syntax-specific prefixes and
+issue the bare constant. See @code{p} above.
+@item @code{q}
+@tab Print the DImode name of the register.
+@tab @code{%q0}
+@tab @code{%rax}
+@tab @code{rax}
+@item @code{Q}
+@tab Print the opcode suffix of q.
+@tab @code{%Q0}
+@tab @code{q}
+@tab
+@item @code{R}
+@tab Print embedded rounding and SAE.
+@tab @code{%R4}
+@tab @code{@{rn-sae@}, }
+@tab @code{, @{rn-sae@}}
+@item @code{r}
+@tab Print only SAE.
+@tab @code{%r4}
+@tab @code{@{sae@}, }
+@tab @code{, @{sae@}}
+@item @code{s}
+@tab Print a shift double count, followed by the assembler's argument
+delimiter.
+@tab @code{%s1}
+@tab @code{$2, }
+@tab @code{2, }
+@item @code{S}
+@tab Print the opcode suffix of s.
+@tab @code{%S0}
+@tab @code{s}
+@tab
+@item @code{t}
+@tab Print the V8SFmode name of the register.
+@tab @code{%t5}
+@tab @code{%ymm0}
+@tab @code{ymm0}
+@item @code{T}
+@tab Print the opcode suffix of t.
+@tab @code{%T0}
+@tab @code{t}
+@tab
+@item @code{V}
+@tab Print the full integer register name without the @samp{%} prefix.
+@tab @code{%V0}
+@tab @code{eax}
+@tab @code{eax}
+@item @code{w}
+@tab Print the HImode name of the register.
+@tab @code{%w0}
+@tab @code{%ax}
+@tab @code{ax}
+@item @code{W}
+@tab Print the opcode suffix of w.
+@tab @code{%W0}
+@tab @code{w}
+@tab
+@item @code{x}
+@tab Print the V4SFmode name of the register.
+@tab @code{%x5}
+@tab @code{%xmm0}
+@tab @code{xmm0}
+@item @code{y}
+@tab Print @code{st(0)} instead of @code{st} as a register.
+@tab @code{%y6}
+@tab @code{%st(0)}
+@tab @code{st(0)}
+@item @code{z}
+@tab Print the opcode suffix for the size of the current integer operand (one of @code{b}/@code{w}/@code{l}/@code{q}).
+@tab @code{%z0}
+@tab @code{l}
+@tab
+@item @code{Z}
+@tab Like @code{z}, with special suffixes for x87 instructions.
+@end multitable

+@anchor{x86floatingpointasmoperands}
+@subsubsection x86 Floating-Point @code{asm} Operands

+On x86 targets, there are several rules on the usage of stack-like registers
+in the operands of an @code{asm}. These rules apply only to the operands
+that are stack-like registers:

+@enumerate
+@item
+Given a set of input registers that die in an @code{asm}, it is
+necessary to know which are implicitly popped by the @code{asm}, and
+which must be explicitly popped by GCC@.
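+
+For instance, as a sketch (with @code{int i} and @code{double f}):
+@code{fistpl} implicitly pops its @code{st(0)} input, so one
+conventional way to express this constrains the input to the top of
+the stack with @samp{t} and lists @code{"st"} as a clobber to account
+for the pop:
+
+@smallexample
+asm ("fistpl %0" : "=m" (i) : "t" (f) : "st");
+@end smallexample
+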
-In standard C, you are not allowed to leave the variable argument out -entirely; but you are allowed to pass an empty argument. For example, -this invocation is invalid in ISO C, because there is no comma after -the string: +An input register that is implicitly popped by the @code{asm} must be +explicitly clobbered, unless it is constrained to match an +output operand. -@smallexample -debug ("A message") -@end smallexample +@item +For any input register that is implicitly popped by an @code{asm}, it is +necessary to know how to adjust the stack to compensate for the pop. +If any non-popped input is closer to the top of the reg-stack than +the implicitly popped register, it would not be possible to know what the +stack looked like---it's not clear how the rest of the stack ``slides +up''. -GNU CPP permits you to completely omit the variable arguments in this -way. In the above examples, the compiler would complain, though since -the expansion of the macro still has the extra comma after the format -string. +All implicitly popped input registers must be closer to the top of +the reg-stack than any input that is not implicitly popped. -To help solve this problem, CPP behaves specially for variable arguments -used with the token paste operator, @samp{##}. If instead you write +It is possible that if an input dies in an @code{asm}, the compiler might +use the input register for an output reload. Consider this example: @smallexample -#define debug(format, ...) fprintf (stderr, format, ## __VA_ARGS__) +asm ("foo" : "=t" (a) : "f" (b)); @end smallexample @noindent -and if the variable arguments are omitted or empty, the @samp{##} -operator causes the preprocessor to remove the comma before it. If you -do provide some variable arguments in your macro invocation, GNU CPP -does not complain about the paste operation and instead places the -variable arguments after the comma. Just like any other pasted macro -argument, these arguments are not macro expanded. - -@node Conditionals -@subsection Conditionals with Omitted Operands -@cindex conditional expressions, extensions -@cindex omitted middle-operands -@cindex middle-operands, omitted -@cindex extensions, @code{?:} -@cindex @code{?:} extensions +This code says that input @code{b} is not popped by the @code{asm}, and that +the @code{asm} pushes a result onto the reg-stack, i.e., the stack is one +deeper after the @code{asm} than it was before. But, it is possible that +reload may think that it can use the same register for both the input and +the output. -The middle operand in a conditional expression may be omitted. Then -if the first operand is nonzero, its value is the value of the conditional -expression. +To prevent this from happening, +if any input operand uses the @samp{f} constraint, all output register +constraints must use the @samp{&} early-clobber modifier. -Therefore, the expression +The example above is correctly written as: @smallexample -x ? : y +asm ("foo" : "=&t" (a) : "f" (b)); @end smallexample -@noindent -has the value of @code{x} if that is nonzero; otherwise, the value of -@code{y}. +@item +Some operands need to be in particular places on the stack. All +output operands fall in this category---GCC has no other way to +know which registers the outputs appear in unless you indicate +this in the constraints. -This example is perfectly equivalent to +Output operands must specifically indicate which register an output +appears in after an @code{asm}. 
@samp{=f} is not allowed: the operand +constraints must select a class with a single register. -@smallexample -x ? x : y -@end smallexample +@item +Output operands may not be ``inserted'' between existing stack registers. +Since no 387 opcode uses a read/write operand, all output operands +are dead before the @code{asm}, and are pushed by the @code{asm}. +It makes no sense to push anywhere but the top of the reg-stack. -@cindex side effect in @code{?:} -@cindex @code{?:} side effect -@noindent -In this simple case, the ability to omit the middle operand is not -especially useful. When it becomes useful is when the first operand does, -or may (if it is a macro argument), contain a side effect. Then repeating -the operand in the middle would perform the side effect twice. Omitting -the middle operand uses the value already computed without the undesirable -effects of recomputing it. - -@node Case Ranges -@subsection Case Ranges -@cindex case ranges -@cindex ranges in case statements - -You can specify a range of consecutive values in a single @code{case} label, -like this: - -@smallexample -case @var{low} ... @var{high}: -@end smallexample - -@noindent -This has the same effect as the proper number of individual @code{case} -labels, one for each integer value from @var{low} to @var{high}, inclusive. +Output operands must start at the top of the reg-stack: output +operands may not ``skip'' a register. -This feature is especially useful for ranges of ASCII character codes: +@item +Some @code{asm} statements may need extra stack space for internal +calculations. This can be guaranteed by clobbering stack registers +unrelated to the inputs and outputs. -@smallexample -case 'A' ... 'Z': -@end smallexample +@end enumerate -@strong{Be careful:} Write spaces around the @code{...}, for otherwise -it may be parsed wrong when you use it with integer values. For example, -write this: +This @code{asm} +takes one input, which is internally popped, and produces two outputs. @smallexample -case 1 ... 5: +asm ("fsincos" : "=t" (cos), "=u" (sin) : "0" (inp)); @end smallexample @noindent -rather than this: +This @code{asm} takes two inputs, which are popped by the @code{fyl2xp1} opcode, +and replaces them with one output. The @code{st(1)} clobber is necessary +for the compiler to know that @code{fyl2xp1} pops both inputs. @smallexample -case 1...5: +asm ("fyl2xp1" : "=t" (result) : "0" (x), "u" (y) : "st(1)"); @end smallexample -@node Mixed Labels and Declarations -@subsection Mixed Declarations, Labels and Code -@cindex mixed declarations and code -@cindex declarations, mixed with code -@cindex code, mixed with declarations - -ISO C99 and ISO C++ allow declarations and code to be freely mixed -within compound statements. ISO C23 allows labels to be -placed before declarations and at the end of a compound statement. -As an extension, GNU C also allows all this in C90 mode. For example, -you could do: +@anchor{msp430Operandmodifiers} +@subsubsection MSP430 Operand Modifiers -@smallexample -int i; -/* @r{@dots{}} */ -i++; -int j = i + 2; -@end smallexample +The list below describes the supported modifiers and their effects for MSP430. -Each identifier is visible from where it is declared until the end of -the enclosing block. +@multitable @columnfractions .10 .90 +@headitem Modifier @tab Description +@item @code{A} @tab Select low 16-bits of the constant/register/memory operand. +@item @code{B} @tab Select high 16-bits of the constant/register/memory +operand. 
+@item @code{C} @tab Select bits 32-47 of the constant/register/memory operand. +@item @code{D} @tab Select bits 48-63 of the constant/register/memory operand. +@item @code{H} @tab Equivalent to @code{B} (for backwards compatibility). +@item @code{I} @tab Print the inverse (logical @code{NOT}) of the constant +value. +@item @code{J} @tab Print an integer without a @code{#} prefix. +@item @code{L} @tab Equivalent to @code{A} (for backwards compatibility). +@item @code{O} @tab Offset of the current frame from the top of the stack. +@item @code{Q} @tab Use the @code{A} instruction postfix. +@item @code{R} @tab Inverse of condition code, for unsigned comparisons. +@item @code{W} @tab Subtract 16 from the constant value. +@item @code{X} @tab Use the @code{X} instruction postfix. +@item @code{Y} @tab Subtract 4 from the constant value. +@item @code{Z} @tab Subtract 1 from the constant value. +@item @code{b} @tab Append @code{.B}, @code{.W} or @code{.A} to the +instruction, depending on the mode. +@item @code{d} @tab Offset 1 byte of a memory reference or constant value. +@item @code{e} @tab Offset 3 bytes of a memory reference or constant value. +@item @code{f} @tab Offset 5 bytes of a memory reference or constant value. +@item @code{g} @tab Offset 7 bytes of a memory reference or constant value. +@item @code{p} @tab Print the value of 2, raised to the power of the given +constant. Used to select the specified bit position. +@item @code{r} @tab Inverse of condition code, for signed comparisons. +@item @code{x} @tab Equivalent to @code{X}, but only for pointers. +@end multitable -@node C++ Comments -@subsection C++ Style Comments -@cindex @code{//} -@cindex C++ comments -@cindex comments, C++ style +@anchor{loongarchOperandmodifiers} +@subsubsection LoongArch Operand Modifiers -In GNU C, you may use C++ style comments, which start with @samp{//} and -continue until the end of the line. Many other C implementations allow -such comments, and they are included in the 1999 C standard. However, -C++ style comments are not recognized if you specify an @option{-std} -option specifying a version of ISO C before C99, or @option{-ansi} -(equivalent to @option{-std=c90}). +The list below describes the supported modifiers and their effects for LoongArch. -@node Escaped Newlines -@subsection Slightly Looser Rules for Escaped Newlines -@cindex escaped newlines -@cindex newlines (escaped) +@multitable @columnfractions .10 .90 +@headitem Modifier @tab Description +@item @code{d} @tab Same as @code{c}. +@item @code{i} @tab Print the character ''@code{i}'' if the operand is not a register. +@item @code{m} @tab Same as @code{c}, but the printed value is @code{operand - 1}. +@item @code{u} @tab Print a LASX register. +@item @code{w} @tab Print a LSX register. +@item @code{X} @tab Print a constant integer operand in hexadecimal. +@item @code{z} @tab Print the operand in its unmodified form, followed by a comma. +@end multitable -The preprocessor treatment of escaped newlines is more relaxed -than that specified by the C90 standard, which requires the newline -to immediately follow a backslash. -GCC's implementation allows whitespace in the form -of spaces, horizontal and vertical tabs, and form feeds between the -backslash and the subsequent newline. The preprocessor issues a -warning, but treats it as a valid escaped newline and combines the two -lines to form a single logical line. This works within comments and -tokens, as well as between tokens. 
Comments are @emph{not} treated as
-whitespace for the purposes of this relaxation, since they have not
-yet been replaced with spaces.
+References to input and output operands in the assembler template of extended
+asm statements can use modifiers to affect the way the operands are formatted
+in the code output to the assembler. For example, the following code uses the
+'w' modifier for LoongArch:

-@node Hex Floats
-@subsection Hex Floats
-@cindex hex floats
+@example
+test-asm.c:

-ISO C99 and ISO C++17 support floating-point numbers written not only in
-the usual decimal notation, such as @code{1.55e1}, but also numbers such as
-@code{0x1.fp3} written in hexadecimal format. As a GNU extension, GCC
-supports this in C90 mode (except in some cases when strictly
-conforming) and in C++98, C++11 and C++14 modes. In that format the
-@samp{0x} hex introducer and the @samp{p} or @samp{P} exponent field are
-mandatory. The exponent is a decimal number that indicates the power of
-2 by which the significant part is multiplied. Thus @samp{0x1.f} is
-@tex
-$1 {15\over16}$,
-@end tex
-@ifnottex
-1 15/16,
-@end ifnottex
-@samp{p3} multiplies it by 8, and the value of @code{0x1.fp3}
-is the same as @code{1.55e1}.
+#include <lsxintrin.h>

-Unlike for floating-point numbers in the decimal notation the exponent
-is always required in the hexadecimal notation. Otherwise the compiler
-would not be able to resolve the ambiguity of, e.g., @code{0x1.f}. This
-could mean @code{1.0f} or @code{1.9375} since @samp{f} is also the
-extension for floating-point constants of type @code{float}.
+__m128i foo (void)
+@{
+__m128i a,b,c;
+__asm__ ("vadd.d %w0,%w1,%w2\n\t"
+ :"=f" (c)
+ :"f" (a),"f" (b));

-@node Binary constants
-@subsection Binary Constants using the @samp{0b} Prefix
-@cindex Binary constants using the @samp{0b} prefix
+return c;
+@}

-Integer constants can be written as binary constants, consisting of a
-sequence of @samp{0} and @samp{1} digits, prefixed by @samp{0b} or
-@samp{0B}. This is particularly useful in environments that operate a
-lot on the bit level (like microcontrollers).
+@end example

-The following statements are identical:
+@noindent
+The compile command for the test case is as follows:

-@smallexample
-i = 42;
-i = 0x2a;
-i = 052;
-i = 0b101010;
-@end smallexample
+@example
+gcc test-asm.c -mlsx -S -o test-asm.s
+@end example

-The type of these constants follows the same rules as for octal or
-hexadecimal integer constants, so suffixes like @samp{L} or @samp{UL}
-can be applied.
+@noindent
+The assembly statement produces the following assembly code:

-@node Dollar Signs
-@subsection Dollar Signs in Identifier Names
-@cindex $
-@cindex dollar signs in identifier names
-@cindex identifier names, dollar signs in
+@example
+vadd.d $vr0,$vr0,$vr1
+@end example

-In GNU C, you may normally use dollar signs in identifier names.
-This is because many traditional C implementations allow such identifiers.
-However, dollar signs in identifiers are not supported on a few target
-machines, typically because the target assembler does not allow them.
+This is a 128-bit vector addition instruction: @code{c} (referred to in the
+template string as @code{%0}) is the output, and @code{a} (@code{%1}) and
+@code{b} (@code{%2}) are the inputs. @code{__m128i} is a vector data type
+defined in the file @code{lsxintrin.h} (@pxref{LoongArch SX Vector
+Intrinsics}).
The @samp{=f} constraint on the output operand places the result in a
+floating-point register, and the @samp{f} constraints on the input operands
+likewise request floating-point registers; for details, see the definitions
+of the constraints (@pxref{Constraints}) in GCC.

-@node Character Escapes
-@subsection The Character @key{ESC} in Constants

+@anchor{riscvOperandmodifiers}
+@subsubsection RISC-V Operand Modifiers

-You can use the sequence @samp{\e} in a string or character constant to
-stand for the ASCII character @key{ESC}.

+The list below describes the supported modifiers and their effects for RISC-V.

-@node Alternate Keywords
-@subsection Alternate Keywords
-@cindex alternate keywords
-@cindex keywords, alternate

+@multitable @columnfractions .10 .90
+@headitem Modifier @tab Description
+@item @code{z} @tab Print ''@code{zero}'' instead of 0 if the operand is an immediate with a value of zero.
+@item @code{i} @tab Print the character ''@code{i}'' if the operand is an immediate.
+@item @code{N} @tab Print the register encoding as integer (0 - 31).
+@end multitable

-@option{-ansi} and the various @option{-std} options disable certain
-keywords that are GNU C extensions.
-Specifically, the keywords @code{asm}, @code{typeof} and
-@code{inline} are not available in programs compiled with
-@option{-ansi} or a @option{-std=} option specifying an ISO standard that
-doesn't define the keyword. This causes trouble when you want to use
-these extensions in a header file that can be included in programs that may
-be compiled with such options.

+@anchor{shOperandmodifiers}
+@subsubsection SH Operand Modifiers

-The way to solve these problems is to put @samp{__} at the beginning and
-end of each problematical keyword. For example, use @code{__asm__}
-instead of @code{asm}, and @code{__inline__} instead of @code{inline}.

+The list below describes the supported modifiers and their effects for the SH family of processors.

-Other C compilers won't accept these alternative keywords; if you want to
-compile with another compiler, you can define the alternate keywords as
-macros to replace them with the customary keywords. It looks like this:

+@multitable @columnfractions .10 .90
+@headitem Modifier @tab Description
+@item @code{.} @tab Print ''@code{.s}'' if the instruction needs a delay slot.
+@item @code{,} @tab Print ''@code{LOCAL_LABEL_PREFIX}''.
+@item @code{@@} @tab Print ''@code{trap}'', ''@code{rte}'' or ''@code{rts}'' depending on the interrupt pragma used.
+@item @code{#} @tab Print ''@code{nop}'' if there is nothing to put in the delay slot.
+@item @code{'} @tab Print likelihood suffix (''@code{/u}'' for unlikely).
+@item @code{>} @tab Print branch target if ''@code{-fverbose-asm}''.
+@item @code{O} @tab Require a constant operand and print the constant expression with no punctuation.
+@item @code{R} @tab Print the ''@code{LSW}'' of a dp value - changes if in little endian.
+@item @code{S} @tab Print the ''@code{MSW}'' of a dp value - changes if in little endian.
+@item @code{T} @tab Print the next word of a dp value - same as ''@code{R}'' in big endian mode.
+@item @code{M} @tab Print the ''@code{.b}'', ''@code{.w}'', ''@code{.l}'', ''@code{.s}'' or ''@code{.d}'' suffix if the operand is a MEM.
+@item @code{N} @tab Print ''@code{r63}'' if the operand is ''@code{const_int 0}''.
+@item @code{d} @tab Print a ''@code{V2SF}'' as ''@code{dN}'' instead of ''@code{fpN}''.
+@item @code{m} @tab Print the pair ''@code{base,offset}'' or ''@code{base,index}'' for LD and ST.
+@item @code{U} @tab Like ''@code{%m}'' for ''@code{LD}'' and ''@code{ST}'', ''@code{HI}'' and ''@code{LO}''.
+@item @code{V} @tab Print the position of a single bit set.
+@item @code{W} @tab Print the position of a single bit cleared.
+@item @code{t} @tab Print a memory address which is a register.
+@item @code{u} @tab Print the lowest 16 bits of ''@code{CONST_INT}'', as an unsigned value.
+@item @code{o} @tab Print an operator.
+@end multitable

-@smallexample
-#ifndef __GNUC__
-#define __asm__ asm
-#endif
-@end smallexample
+@lowersections
+@include md.texi
+@raisesections

-@findex __extension__
-@opindex pedantic
-@option{-pedantic} and other options cause warnings for many GNU C extensions.
-You can suppress such warnings using the keyword @code{__extension__}.
-Specifically:

+@node Asm constexprs
+@subsection C++11 Constant Expressions instead of String Literals

-@itemize @bullet
-@item
-Writing @code{__extension__} before an expression prevents warnings
-about extensions within that expression.

+In C++ with @option{-std=gnu++11} or later, strings that appear in asm
+syntax---specifically, the assembler template, constraints, and
+clobbers---can be specified as parenthesized compile-time constant
+expressions as well as by string literals. The parentheses around such
+an expression are a required part of the syntax. The constant expression
+can return a container with @code{data ()} and @code{size ()}
+member functions, following rules similar to those for the C++26
+@code{static_assert} message. Any string is converted to the character set
+of the source code. When this feature is available the
+@code{__GXX_CONSTEXPR_ASM__} preprocessor macro is predefined.

-@item
-In C, writing:

+This extension is supported for both the basic and extended asm syntax.

-@smallexample
-[[__extension__ @dots{}]]
-@end smallexample

+@example
+#include <string_view>
+constexpr std::string_view genfoo() @{ return "foo"; @}

-suppresses warnings about using @samp{[[]]} attributes in C versions
-that predate C23@.
-@end itemize

+void function()
+@{
+ asm((genfoo()));
+@}
+@end example

-@code{__extension__} has no effect aside from this.

+@node Asm Labels
+@subsection Controlling Names Used in Assembler Code
+@cindex assembler names for identifiers
+@cindex names used in assembler code
+@cindex identifiers, names in assembler code

-@node Function Names
-@subsection Function Names as Strings
-@cindex @code{__func__} identifier
-@cindex @code{__FUNCTION__} identifier
-@cindex @code{__PRETTY_FUNCTION__} identifier

+You can specify the name to be used in the assembler code for a C
+function or variable by writing the @code{asm} (or @code{__asm__})
+keyword after the declarator.
+It is up to you to make sure that the assembler names you choose do not
+conflict with any other assembler symbols, or reference registers.

-GCC provides three magic constants that hold the name of the current
-function as a string. In C++11 and later modes, all three are treated
-as constant expressions and can be used in @code{constexpr} contexts.
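As a small illustration of the @code{constexpr} treatment described above (a
sketch only; the function name @code{f} is arbitrary):

@smallexample
constexpr char
f ()
@{
  return __func__[0];  /* reading __func__ in a constant expression */
@}
static_assert (f () == 'f', "usable at compile time");
@end smallexample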
-The first of these constants is @code{__func__}, which is part of -the C99 standard: +@subsubheading Assembler names for data -The identifier @code{__func__} is implicitly declared by the translator -as if, immediately following the opening brace of each function -definition, the declaration +This sample shows how to specify the assembler name for data: @smallexample -static const char __func__[] = "function-name"; +int foo asm ("myfoo") = 2; @end smallexample @noindent -appeared, where function-name is the name of the lexically-enclosing -function. This name is the unadorned name of the function. As an -extension, at file (or, in C++, namespace scope), @code{__func__} -evaluates to the empty string. +This specifies that the name to be used for the variable @code{foo} in +the assembler code should be @samp{myfoo} rather than the usual +@samp{_foo}. -@code{__FUNCTION__} is another name for @code{__func__}, provided for -backward compatibility with old versions of GCC. +On systems where an underscore is normally prepended to the name of a C +variable, this feature allows you to define names for the +linker that do not start with an underscore. -In C, @code{__PRETTY_FUNCTION__} is yet another name for -@code{__func__}, except that at file scope (or, in C++, namespace scope), -it evaluates to the string @code{"top level"}. In addition, in C++, -@code{__PRETTY_FUNCTION__} contains the signature of the function as -well as its bare name. For example, this program: +GCC does not support using this feature with a non-static local variable +since such variables do not have assembler names. If you are +trying to put the variable in a particular register, see +@ref{Explicit Register Variables}. -@smallexample -extern "C" int printf (const char *, ...); +@subsubheading Assembler names for functions -class a @{ - public: - void sub (int i) - @{ - printf ("__FUNCTION__ = %s\n", __FUNCTION__); - printf ("__PRETTY_FUNCTION__ = %s\n", __PRETTY_FUNCTION__); - @} -@}; +To specify the assembler name for functions, write a declaration for the +function before its definition and put @code{asm} there, like this: -int -main (void) +@smallexample +int func (int x, int y) asm ("MYFUNC"); + +int func (int x, int y) @{ - a ax; - ax.sub (0); - return 0; -@} + /* @r{@dots{}} */ @end smallexample @noindent -gives this output: - -@smallexample -__FUNCTION__ = sub -__PRETTY_FUNCTION__ = void a::sub(int) -@end smallexample +This specifies that the name to be used for the function @code{func} in +the assembler code should be @code{MYFUNC}. -These identifiers are variables, not preprocessor macros, and may not -be used to initialize @code{char} arrays or be concatenated with string -literals. +@node Explicit Register Variables +@subsection Variables in Specified Registers +@anchor{Explicit Reg Vars} +@cindex explicit register variables +@cindex variables in specified registers +@cindex specified registers -@node Semantic Extensions -@section Extensions to C Semantics +GNU C allows you to associate specific hardware registers with C +variables. In almost all cases, allowing the compiler to assign +registers produces the best code. However under certain unusual +circumstances, more precise control over the variable storage is +required. -GNU C defines useful behavior for some constructs that are not allowed or -well-defined in standard C. +Both global and local variables can be associated with a register. The +consequences of performing this association are very different between +the two, as explained in the sections below. 
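Before turning to the details, a minimal sketch of the two flavors; the
register names @code{r12} and @code{r0} and the @code{svc #0} instruction are
target-specific placeholders, not portable choices:

@smallexample
/* Global: associates r12 with this variable throughout the
   compilation unit.  */
register int *stack_top asm ("r12");

void
do_syscall (int arg)
@{
  /* Local: only guaranteed to hold arg when used as an operand
     of an extended asm.  */
  register int a0 asm ("r0") = arg;
  asm volatile ("svc #0" : : "r" (a0));
@}
@end smallexample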
@menu -* Function Prototypes:: Prototype declarations and old-style definitions. -* Pointer Arith:: Arithmetic on @code{void}-pointers and function pointers. -* Variadic Pointer Args:: Pointer arguments to variadic functions. -* Pointers to Arrays:: Pointers to arrays with qualifiers work as expected. -* Const and Volatile Functions:: GCC interprets these specially in C. +* Global Register Variables:: Variables declared at global scope. +* Local Register Variables:: Variables declared within a function. @end menu -@node Function Prototypes -@subsection Prototypes and Old-Style Function Definitions -@cindex function prototype declarations -@cindex old-style function definitions -@cindex promotion of formal parameters +@node Global Register Variables +@subsubsection Defining Global Register Variables +@anchor{Global Reg Vars} +@cindex global register variables +@cindex registers, global variables in +@cindex registers, global allocation -GNU C extends ISO C to allow a function prototype to override a later -old-style non-prototype definition. Consider the following example: +You can define a global register variable and associate it with a specified +register like this: @smallexample -/* @r{Use prototypes unless the compiler is old-fashioned.} */ -#ifdef __STDC__ -#define P(x) x -#else -#define P(x) () -#endif - -/* @r{Prototype function declaration.} */ -int isroot P((uid_t)); - -/* @r{Old-style function definition.} */ -int -isroot (x) /* @r{??? lossage here ???} */ - uid_t x; -@{ - return x == 0; -@} +register int *foo asm ("r12"); @end smallexample -Suppose the type @code{uid_t} happens to be @code{short}. ISO C does -not allow this example, because subword arguments in old-style -non-prototype definitions are promoted. Therefore in this example the -function definition's argument is really an @code{int}, which does not -match the prototype argument type of @code{short}. +@noindent +Here @code{r12} is the name of the register that should be used. Note that +this is the same syntax used for defining local register variables, but for +a global variable the declaration appears outside a function. The +@code{register} keyword is required, and cannot be combined with +@code{static}. The register name must be a valid register name for the +target platform. -This restriction of ISO C makes it hard to write code that is portable -to traditional C compilers, because the programmer does not know -whether the @code{uid_t} type is @code{short}, @code{int}, or -@code{long}. Therefore, in cases like these GNU C allows a prototype -to override a later old-style definition. More precisely, in GNU C, a -function prototype argument type overrides the argument type specified -by a later old-style definition if the former type is the same as the -latter type before promotion. Thus in GNU C the above example is -equivalent to the following: +Do not use type qualifiers such as @code{const} and @code{volatile}, as +the outcome may be contrary to expectations. In particular, using the +@code{volatile} qualifier does not fully prevent the compiler from +optimizing accesses to the register. -@smallexample -int isroot (uid_t); +Registers are a scarce resource on most systems and allowing the +compiler to manage their usage usually results in the best code. However, +under special circumstances it can make sense to reserve some globally. +For example this may be useful in programs such as programming language +interpreters that have a couple of global variables that are accessed +very often. 
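For instance, a sketch of that interpreter pattern (the register name
@code{r13} is a placeholder; a real program must pick a register that is
valid and sensible for its target):

@smallexample
/* Keep the virtual machine's instruction pointer in a fixed
   register for every function in this translation unit.  */
register unsigned char *vm_ip asm ("r13");

int
run (void)
@{
  for (;;)
    switch (*vm_ip++)
      @{
      case 0:
        return 0;                 /* halt */
      /* @r{@dots{} other opcodes @dots{}} */
      @}
@}
@end smallexample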
-int -isroot (uid_t x) -@{ - return x == 0; -@} -@end smallexample +After defining a global register variable, for the current compilation +unit: -@noindent -GNU C++ does not support old-style function definitions, so this -extension is irrelevant. +@itemize @bullet +@item If the register is a call-saved register, call ABI is affected: +the register will not be restored in function epilogue sequences after +the variable has been assigned. Therefore, functions cannot safely +return to callers that assume standard ABI. +@item Conversely, if the register is a call-clobbered register, making +calls to functions that use standard ABI may lose contents of the variable. +Such calls may be created by the compiler even if none are evident in +the original program, for example when libgcc functions are used to +make up for unavailable instructions. +@item Accesses to the variable may be optimized as usual and the register +remains available for allocation and use in any computations, provided that +observable values of the variable are not affected. +@item If the variable is referenced in inline assembly, the type of access +must be provided to the compiler via constraints (@pxref{Constraints}). +Accesses from basic asms are not supported. +@end itemize -@node Pointer Arith -@subsection Arithmetic on @code{void}- and Function-Pointers -@cindex void pointers, arithmetic -@cindex void, size of pointer to -@cindex function pointers, arithmetic -@cindex function, size of pointer to +Note that these points @emph{only} apply to code that is compiled with the +definition. The behavior of code that is merely linked in (for example +code from libraries) is not affected. -In GNU C, addition and subtraction operations are supported on pointers to -@code{void} and on pointers to functions. This is done by treating the -size of a @code{void} or of a function as 1. +If you want to recompile source files that do not actually use your global +register variable so they do not use the specified register for any other +purpose, you need not actually add the global register declaration to +their source code. It suffices to specify the compiler option +@option{-ffixed-@var{reg}} (@pxref{Code Gen Options}) to reserve the +register. -A consequence of this is that @code{sizeof} is also allowed on @code{void} -and on function types, and returns 1. +@subsubheading Declaring the variable -@opindex Wpointer-arith -The option @option{-Wpointer-arith} requests a warning if these extensions -are used. +Global register variables cannot have initial values, because an +executable file has no means to supply initial contents for a register. -@node Variadic Pointer Args -@subsection Pointer Arguments in Variadic Functions -@cindex pointer arguments in variadic functions -@cindex variadic functions, pointer arguments +When selecting a register, choose one that is normally saved and +restored by function calls on your machine. This ensures that code +which is unaware of this reservation (such as library routines) will +restore it before returning. -Standard C requires that pointer types used with @code{va_arg} in -functions with variable argument lists either must be compatible with -that of the actual argument, or that one type must be a pointer to -@code{void} and the other a pointer to a character type. GNU C -implements the POSIX XSI extension that additionally permits the use -of @code{va_arg} with a pointer type to receive arguments of any other -pointer type. 
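As a sketch of what this relaxation permits (the function and its arguments
are hypothetical):

@smallexample
#include <stdarg.h>

/* Count pointer arguments of arbitrary pointer type, up to a
   terminating null pointer.  */
int
count_ptrs (int first, ...)
@{
  va_list ap;
  int n = 0;
  va_start (ap, first);
  while (va_arg (ap, void *))  /* any pointer type may be consumed here */
    n++;
  va_end (ap);
  return n;
@}
@end smallexample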
+On machines with register windows, be sure to choose a global +register that is not affected magically by the function call mechanism. -In particular, in GNU C @samp{va_arg (ap, void *)} can safely be used -to consume an argument of any pointer type. +@subsubheading Using the variable -@node Pointers to Arrays -@subsection Pointers to Arrays with Qualifiers Work as Expected -@cindex pointers to arrays -@cindex const qualifier +@cindex @code{qsort}, and global register variables +When calling routines that are not aware of the reservation, be +cautious if those routines call back into code which uses them. As an +example, if you call the system library version of @code{qsort}, it may +clobber your registers during execution, but (if you have selected +appropriate registers) it will restore them before returning. However +it will @emph{not} restore them before calling @code{qsort}'s comparison +function. As a result, global values will not reliably be available to +the comparison function unless the @code{qsort} function itself is rebuilt. -In GNU C, pointers to arrays with qualifiers work similar to pointers -to other qualified types. For example, a value of type @code{int (*)[5]} -can be used to initialize a variable of type @code{const int (*)[5]}. -These types are incompatible in ISO C because the @code{const} qualifier -is formally attached to the element type of the array and not the -array itself. +Similarly, it is not safe to access the global register variables from signal +handlers or from more than one thread of control. Unless you recompile +them specially for the task at hand, the system library routines may +temporarily use the register for other things. Furthermore, since the register +is not reserved exclusively for the variable, accessing it from handlers of +asynchronous signals may observe unrelated temporary values residing in the +register. + +@cindex register variable after @code{longjmp} +@cindex global register after @code{longjmp} +@cindex value after @code{longjmp} +@findex longjmp +@findex setjmp +On most machines, @code{longjmp} restores to each global register +variable the value it had at the time of the @code{setjmp}. On some +machines, however, @code{longjmp} does not change the value of global +register variables. To be portable, the function that called @code{setjmp} +should make other arrangements to save the values of the global register +variables, and to restore them in a @code{longjmp}. This way, the same +thing happens regardless of what @code{longjmp} does. + +@node Local Register Variables +@subsubsection Specifying Registers for Local Variables +@anchor{Local Reg Vars} +@cindex local variables, specifying registers +@cindex specifying registers for local variables +@cindex registers for local variables + +You can define a local register variable and associate it with a specified +register like this: @smallexample -extern void -transpose (int N, int M, double out[M][N], const double in[N][M]); -double x[3][2]; -double y[2][3]; -@r{@dots{}} -transpose(3, 2, y, x); +register int *foo asm ("r12"); @end smallexample -@node Const and Volatile Functions -@subsection Const and Volatile Functions -@cindex @code{const} applied to function -@cindex @code{volatile} applied to function +@noindent +Here @code{r12} is the name of the register that should be used. Note +that this is the same syntax used for defining global register variables, +but for a local variable the declaration appears within a function. 
The +@code{register} keyword is required, and cannot be combined with +@code{static}. The register name must be a valid register name for the +target platform. -The C standard explicitly leaves the behavior of the @code{const} and -@code{volatile} type qualifiers applied to functions undefined; these -constructs can only arise through the use of @code{typedef}. As an extension, -GCC defines this use of the @code{const} qualifier to have the same meaning -as the GCC @code{const} function attribute, and the @code{volatile} qualifier -to be equivalent to the @code{noreturn} attribute. -@xref{Common Function Attributes}, for more information. +Do not use type qualifiers such as @code{const} and @code{volatile}, as +the outcome may be contrary to expectations. In particular, when the +@code{const} qualifier is used, the compiler may substitute the +variable with its initializer in @code{asm} statements, which may cause +the corresponding operand to appear in a different register. -As examples of this usage, +As with global register variables, it is recommended that you choose +a register that is normally saved and restored by function calls on your +machine, so that calls to library routines will not clobber it. + +The only supported use for this feature is to specify registers +for input and output operands when calling Extended @code{asm} +(@pxref{Extended Asm}). This may be necessary if the constraints for a +particular machine don't provide sufficient control to select the desired +register. To force an operand into a register, create a local variable +and specify the register name after the variable's declaration. Then use +the local variable for the @code{asm} operand and specify any constraint +letter that matches the register: @smallexample +register int *p1 asm ("r0") = @dots{}; +register int *p2 asm ("r1") = @dots{}; +register int *result asm ("r0"); +asm ("sysint" : "=r" (result) : "0" (p1), "r" (p2)); +@end smallexample -/* @r{Equivalent to:} - void fatal () __attribute__ ((noreturn)); */ -typedef void voidfn (); -volatile voidfn fatal; +@emph{Warning:} In the above example, be aware that a register (for example +@code{r0}) can be call-clobbered by subsequent code, including function +calls and library calls for arithmetic operators on other variables (for +example the initialization of @code{p2}). In this case, use temporary +variables for expressions between the register assignments: -/* @r{Equivalent to:} - extern int square (int) __attribute__ ((const)); */ -typedef int intfn (int); -extern const intfn square; +@smallexample +int t1 = @dots{}; +register int *p1 asm ("r0") = @dots{}; +register int *p2 asm ("r1") = t1; +register int *result asm ("r0"); +asm ("sysint" : "=r" (result) : "0" (p1), "r" (p2)); @end smallexample -In general, using function attributes instead is preferred, since the -attributes make both the intent of the code and its reliance on a GNU -extension explicit. Additionally, using @code{const} and -@code{volatile} in this way is specific to GNU C and does not work in -GNU C++. - -@node Return Address -@section Getting the Return or Frame Address of a Function +Defining a register variable does not reserve the register. Other than +when invoking the Extended @code{asm}, the contents of the specified +register are not guaranteed. For this reason, the following uses +are explicitly @emph{not} supported. 
If they appear to work, it is only
+happenstance, and may stop working as intended due to (seemingly)
+unrelated changes in surrounding code, or even minor changes in the
+optimization of a future version of GCC:

-These functions may be used to get information about the callers of a
-function.

+@itemize @bullet
+@item Passing parameters to or from Basic @code{asm}
+@item Passing parameters to or from Extended @code{asm} without using input
+or output operands.
+@item Passing parameters to or from routines written in assembler (or
+other languages) using non-standard calling conventions.
+@end itemize

-@defbuiltin{{void *} __builtin_return_address (unsigned int @var{level})}
-This function returns the return address of the current function, or of
-one of its callers. The @var{level} argument is the number of frames to
-scan up the call stack. A value of @code{0} yields the return address
-of the current function, a value of @code{1} yields the return address
-of the caller of the current function, and so forth. When inlining,
-the expected behavior is that the function returns the address of
-the function that is returned to. To work around this behavior use
-the @code{noinline} function attribute.

+Some developers use Local Register Variables in an attempt to improve
+GCC's allocation of registers, especially in large functions. In this
+case the register name is essentially a hint to the register allocator.
+While in some instances this can generate better code, improvements are
+subject to the whims of the allocator/optimizers. Since there are no
+guarantees that your improvements won't be lost, this usage of Local
+Register Variables is discouraged.

-The @var{level} argument must be a constant integer.

+On the MIPS platform, there is a related use for local register variables
+with slightly different characteristics (@pxref{MIPS Coprocessors,,
+Defining coprocessor specifics for MIPS targets, gccint,
+GNU Compiler Collection (GCC) Internals}).

-On some machines it may be impossible to determine the return address of
-any function other than the current one; in such cases, or when the top
-of the stack has been reached, this function returns an unspecified
-value. In addition, @code{__builtin_frame_address} may be used
-to determine if the top of the stack has been reached.

+@node Size of an asm
+@subsection Size of an @code{asm}

-Additional post-processing of the returned value may be needed; see
-@code{__builtin_extract_return_addr}.

+Some targets require that GCC track the size of each instruction used
+in order to generate correct code. Because the final length of the
+code produced by an @code{asm} statement is only known by the
+assembler, GCC must make an estimate as to how big it will be. It
+does this by counting the number of instructions in the pattern of the
+@code{asm} and multiplying that by the length of the longest
+instruction supported by that processor. (When working out the number
+of instructions, it assumes that any occurrence of a newline or of
+whatever statement separator character is supported by the assembler ---
+typically @samp{;} --- indicates the end of an instruction.)

-The stored representation of the return address in memory may be different
-from the address returned by @code{__builtin_return_address}. For example,
-on AArch64 the stored address may be mangled with return address signing
-whereas the address returned by @code{__builtin_return_address} is not.
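For example (a sketch; the mnemonics are placeholders), GCC counts the
following statement as three instructions, since both the @samp{;}
separator and the newline are taken to end an instruction:

@smallexample
asm ("mvi  r0, %0;"
     "dbl  r0\n\t"
     "mvo  %0, r0"
     : "+r" (x));
@end smallexample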
+Normally, GCC's estimate is adequate to ensure that correct +code is generated, but it is possible to confuse the compiler if you use +pseudo instructions or assembler macros that expand into multiple real +instructions, or if you use assembler directives that expand to more +space in the object file than is needed for a single instruction. +If this happens then the assembler may produce a diagnostic saying that +a label is unreachable. -Calling this function with a nonzero argument can have unpredictable -effects, including crashing the calling program. As a result, calls -that are considered unsafe are diagnosed when the @option{-Wframe-address} -option is in effect. Such calls should only be made in debugging -situations. +@cindex @code{asm inline} +This size is also used for inlining decisions. If you use @code{asm inline} +instead of just @code{asm}, then for inlining purposes the size of the asm +is taken as the minimum size, ignoring how many instructions GCC thinks it is. -On targets where code addresses are representable as @code{void *}, -@smallexample -void *addr = __builtin_extract_return_addr (__builtin_return_address (0)); -@end smallexample -gives the code address where the current function would return. For example, -such an address may be used with @code{dladdr} or other interfaces that work -with code addresses. -@enddefbuiltin +@node Syntax Extensions +@section Other Extensions to C Syntax -@defbuiltin{{void *} __builtin_extract_return_addr (void *@var{addr})} -The address as returned by @code{__builtin_return_address} may have to be fed -through this function to get the actual encoded address. For example, on the -31-bit S/390 platform the highest bit has to be masked out, or on SPARC -platforms an offset has to be added for the true next instruction to be -executed. +GNU C has traditionally supported numerous extensions to standard C +syntax. Some of these features were originally intended for +compatibility with other compilers or to ease traditional C +compatibility, some have been adopted into subsequent versions of the +C and/or C++ standards, while others remain specific to GNU C. -If no fixup is needed, this function simply passes through @var{addr}. -@enddefbuiltin +@menu +* Statement Exprs:: Putting statements and declarations inside expressions. +* Local Labels:: Labels local to a block. +* Labels as Values:: Getting pointers to labels, and computed gotos. +* Nested Functions:: Nested functions in GNU C. +* Typeof:: @code{typeof}: referring to the type of an expression. +* Offsetof:: Special syntax for @code{offsetof}. +* Alignment:: Determining the alignment of a function, type or variable. +* Incomplete Enums:: @code{enum foo;}, with details to follow. +* Variadic Macros:: Macros with a variable number of arguments. +* Conditionals:: Omitting the middle operand of a @samp{?:} expression. +* Case Ranges:: `case 1 ... 9' and such. +* Mixed Labels and Declarations:: Mixing declarations, labels and code. +* C++ Comments:: C++ comments are recognized. +* Escaped Newlines:: Slightly looser rules for escaped newlines. +* Hex Floats:: Hexadecimal floating-point constants. +* Binary constants:: Binary constants using the @samp{0b} prefix. +* Dollar Signs:: Dollar sign is allowed in identifiers. +* Character Escapes:: @samp{\e} stands for the character @key{ESC}. +* Alternate Keywords:: @code{__const__}, @code{__asm__}, etc., for header files. +* Function Names:: Printable strings which are the name of the current + function. 
+@end menu -@defbuiltin{{void *} __builtin_frob_return_addr (void *@var{addr})} -This function does the reverse of @code{__builtin_extract_return_addr}. -@enddefbuiltin +@node Statement Exprs +@subsection Statements and Declarations in Expressions +@cindex statements inside expressions +@cindex declarations inside expressions +@cindex expressions containing statements +@cindex macros, statements in expressions -@defbuiltin{{void *} __builtin_frame_address (unsigned int @var{level})} -This function is similar to @code{__builtin_return_address}, but it -returns the address of the function frame rather than the return address -of the function. Calling @code{__builtin_frame_address} with a value of -@code{0} yields the frame address of the current function, a value of -@code{1} yields the frame address of the caller of the current function, -and so forth. +@c the above section title wrapped and causes an underfull hbox.. i +@c changed it from "within" to "in". --mew 4feb93 +A compound statement enclosed in parentheses may appear as an expression +in GNU C@. This allows you to use loops, switches, and local variables +within an expression. -The frame is the area on the stack that holds local variables and saved -registers. The frame address is normally the address of the first word -pushed on to the stack by the function. However, the exact definition -depends upon the processor and the calling convention. If the processor -has a dedicated frame pointer register, and the function has a frame, -then @code{__builtin_frame_address} returns the value of the frame -pointer register. +Recall that a compound statement is a sequence of statements surrounded +by braces; in this construct, parentheses go around the braces. For +example: -On some machines it may be impossible to determine the frame address of -any function other than the current one; in such cases, or when the top -of the stack has been reached, this function returns @code{0} if -the first frame pointer is properly initialized by the startup code. +@smallexample +(@{ int y = foo (); int z; + if (y > 0) z = y; + else z = - y; + z; @}) +@end smallexample -Calling this function with a nonzero argument can have unpredictable -effects, including crashing the calling program. As a result, calls -that are considered unsafe are diagnosed when the @option{-Wframe-address} -option is in effect. Such calls should only be made in debugging -situations. -@enddefbuiltin +@noindent +is a valid (though slightly more complex than necessary) expression +for the absolute value of @code{foo ()}. -@deftypefn {Built-in Function} {void *} __builtin_stack_address () -This function returns the stack pointer register, offset by -@code{STACK_ADDRESS_OFFSET} if that's defined. +The last thing in the compound statement should be an expression +followed by a semicolon; the value of this subexpression serves as the +value of the entire construct. (If you use some other kind of statement +last within the braces, the construct has type @code{void}, and thus +effectively no value.) -Conceptually, the returned address returned by this built-in function is -the boundary between the stack area allocated for use by its caller, and -the area that could be modified by a function call, that the caller -could safely zero-out before or after (but not during) the call -sequence. +This feature is especially useful in making macro definitions ``safe'' (so +that they evaluate each operand exactly once). 
For example, the +``maximum'' function is commonly defined as a macro in standard C as +follows: -Arguments for a callee may be preallocated as part of the caller's stack -frame, or allocated on a per-call basis, depending on the target, so -they may be on either side of this boundary. +@smallexample +#define max(a,b) ((a) > (b) ? (a) : (b)) +@end smallexample -Even if the stack pointer is biased, the result is not. The register -save area on SPARC is regarded as modifiable by calls, rather than as -allocated for use by the caller function, since it is never in use while -the caller function itself is running. +@noindent +@cindex side effects, macro argument +But this definition computes either @var{a} or @var{b} twice, with bad +results if the operand has side effects. In GNU C, if you know the +type of the operands (here taken as @code{int}), you can avoid this +problem by defining the macro as follows: -Red zones that only leaf functions could use are also regarded as -modifiable by calls, rather than as allocated for use by the caller. -This is only theoretical, since leaf functions do not issue calls, but a -constant offset makes this built-in function more predictable. -@end deftypefn +@smallexample +#define maxint(a,b) \ + (@{int _a = (a), _b = (b); _a > _b ? _a : _b; @}) +@end smallexample -@node Stack Scrubbing -@section Stack scrubbing internal interfaces +Note that introducing variable declarations (as we do in @code{maxint}) can +cause variable shadowing, so while this example using the @code{max} macro +produces correct results: +@smallexample +int _a = 1, _b = 2, c; +c = max (_a, _b); +@end smallexample +@noindent +this example using maxint will not: +@smallexample +int _a = 1, _b = 2, c; +c = maxint (_a, _b); +@end smallexample -Stack scrubbing involves cooperation between a @code{strub} context, -i.e., a function whose stack frame is to be zeroed-out, and its callers. -The caller initializes a stack watermark, the @code{strub} context -updates the watermark according to its stack use, and the caller zeroes -it out once it regains control, whether by the callee's returning or by -an exception. +This problem may for instance occur when we use this pattern recursively, like +so: -Each of these steps is performed by a different builtin function call. -Calls to these builtins are introduced automatically, in response to -@code{strub} attributes and command-line options; they are not expected -to be explicitly called by source code. +@smallexample +#define maxint3(a, b, c) \ + (@{int _a = (a), _b = (b), _c = (c); maxint (maxint (_a, _b), _c); @}) +@end smallexample -The functions that implement the builtins are available in libgcc but, -depending on optimization levels, they are expanded internally, adjusted -to account for inlining, and sometimes combined/deferred (e.g. passing -the caller-supplied watermark on to callees, refraining from erasing -stack areas that the caller will) to enable tail calls and to optimize -for code size. +Embedded statements are not allowed in constant expressions, such as +the value of an enumeration constant, the width of a bit-field, or +the initial value of a static variable. -@deftypefn {Built-in Function} {void} __builtin___strub_enter (void **@var{wmptr}) -This function initializes a stack @var{watermark} variable with the -current top of the stack. A call to this builtin function is introduced -before entering a @code{strub} context. It remains as a function call -if optimization is not enabled. 
-@end deftypefn
+If you don't know the type of the operand, you can still do this, but you
+must use @code{typeof} or @code{__auto_type} (@pxref{Typeof}).

-@deftypefn {Built-in Function} {void} __builtin___strub_update (void **@var{wmptr})
-This function updates a stack @var{watermark} variable with the current
-top of the stack, if it tops the previous watermark. A call to this
-builtin function is inserted within @code{strub} contexts, whenever
-additional stack space may have been used. It remains as a function
-call at optimization levels lower than 2.
-@end deftypefn

+In G++, the result value of a statement expression undergoes array and
+function pointer decay, and is returned by value to the enclosing
+expression. For instance, if @code{A} is a class, then

-@deftypefn {Built-in Function} {void} __builtin___strub_leave (void **@var{wmptr})
-This function overwrites the memory area between the current top of the
-stack, and the @var{watermark}ed address. A call to this builtin
-function is inserted after leaving a @code{strub} context. It remains
-as a function call at optimization levels lower than 3, and it is guarded by
-a condition at level 2.
-@end deftypefn

+@smallexample
+ A a;

-@node Vector Extensions
-@section Using Vector Instructions through Built-in Functions
+ (@{a;@}).Foo ()
+@end smallexample

-On some targets, the instruction set contains SIMD vector instructions which
-operate on multiple values contained in one large register at the same time.
-For example, on the x86 the MMX, 3DNow!@: and SSE extensions can be used
-this way.

+@noindent
+constructs a temporary @code{A} object to hold the result of the
+statement expression, and that is used to invoke @code{Foo}.
+Therefore the @code{this} pointer observed by @code{Foo} is not the
+address of @code{a}.

-The first step in using these extensions is to provide the necessary data
-types. This should be done using an appropriate @code{typedef}:

+In a statement expression, any temporaries created within a statement
+are destroyed at that statement's end. This makes statement
+expressions inside macros slightly different from function calls. In
+the latter case temporaries introduced during argument evaluation are
+destroyed at the end of the statement that includes the function
+call. In the statement expression case they are destroyed during
+the statement expression. For instance,

@smallexample
-typedef int v4si __attribute__ ((vector_size (16)));
+#define macro(a) (@{__typeof__(a) b = (a); b + 3; @})
+template <typename T> T function (T a) @{ T b = a; return b + 3; @}
+
+void foo ()
+@{
+ macro (X ());
+ function (X ());
+@}
@end smallexample

@noindent
-The @code{int} type specifies the @dfn{base type} (which can be a
-@code{typedef}), while the attribute specifies the vector size for the
-variable, measured in bytes. For example, the declaration above causes
-the compiler to set the mode for the @code{v4si} type to be 16 bytes wide
-and divided into @code{int} sized units. For a 32-bit @code{int} this
-means a vector of 4 units of 4 bytes, and the corresponding mode of
-@code{foo} is @acronym{V4SI}.
+has different places where temporaries are destroyed. For the
+@code{macro} case, the temporary @code{X} is destroyed just after
+the initialization of @code{b}. In the @code{function} case that
+temporary is destroyed when the function returns.

-The @code{vector_size} attribute is only applicable to integral and
-floating scalars, although arrays, pointers, and function return values
-are allowed in conjunction with this construct.
Only sizes that are -positive power-of-two multiples of the base type size are currently allowed. +These considerations mean that it is probably a bad idea to use +statement expressions of this form in header files that are designed to +work with C++. (Note that some versions of the GNU C Library contained +header files using statement expressions that lead to precisely this +bug.) -All the basic integer types can be used as base types, both as signed -and as unsigned: @code{char}, @code{short}, @code{int}, @code{long}, -@code{long long}. In addition, @code{float} and @code{double} can be -used to build floating-point vector types. +Jumping into a statement expression with @code{goto} or using a +@code{switch} statement outside the statement expression with a +@code{case} or @code{default} label inside the statement expression is +not permitted. Jumping into a statement expression with a computed +@code{goto} (@pxref{Labels as Values}) has undefined behavior. +Jumping out of a statement expression is permitted, but if the +statement expression is part of a larger expression then it is +unspecified which other subexpressions of that expression have been +evaluated except where the language definition requires certain +subexpressions to be evaluated before or after the statement +expression. A @code{break} or @code{continue} statement inside of +a statement expression used in @code{while}, @code{do} or @code{for} +loop or @code{switch} statement condition +or @code{for} statement init or increment expressions jumps to an +outer loop or @code{switch} statement if any (otherwise it is an error), +rather than to the loop or @code{switch} statement in whose condition +or init or increment expression it appears. +In any case, as with a function call, the evaluation of a +statement expression is not interleaved with the evaluation of other +parts of the containing expression. For example, -Specifying a combination that is not valid for the current architecture -causes GCC to synthesize the instructions using a narrower mode. -For example, if you specify a variable of type @code{V4SI} and your -architecture does not allow for this specific SIMD type, GCC -produces code that uses 4 @code{SIs}. +@smallexample + foo (), ((@{ bar1 (); goto a; 0; @}) + bar2 ()), baz(); +@end smallexample -The types defined in this manner can be used with a subset of normal C -operations. Currently, GCC allows using the following operators -on these types: @code{+, -, *, /, unary minus, ^, |, &, ~, %}@. +@noindent +calls @code{foo} and @code{bar1} and does not call @code{baz} but +may or may not call @code{bar2}. If @code{bar2} is called, it is +called after @code{foo} and before @code{bar1}. -The operations behave like C++ @code{valarrays}. Addition is defined as -the addition of the corresponding elements of the operands. For -example, in the code below, each of the 4 elements in @var{a} is -added to the corresponding 4 elements in @var{b} and the resulting -vector is stored in @var{c}. +@node Local Labels +@subsection Locally Declared Labels +@cindex local labels +@cindex macros, local labels -@smallexample -typedef int v4si __attribute__ ((vector_size (16))); +GCC allows you to declare @dfn{local labels} in any nested block +scope. A local label is just like an ordinary label, but you can +only reference it (with a @code{goto} statement, or by taking its +address) within the block in which it is declared. 
-v4si a, b, c;
+A local label declaration looks like this:
-c = a + b;
+@smallexample
+__label__ @var{label};
@end smallexample

-Subtraction, multiplication, division, and the logical operations
-operate in a similar manner. Likewise, the result of using the unary
-minus or complement operators on a vector type is a vector whose
-elements are the negative or complemented values of the corresponding
-elements in the operand.
+@noindent
+or

-It is possible to use shifting operators @code{<<}, @code{>>} on
-integer-type vectors. The operation is defined as follows: @code{@{a0,
-a1, @dots{}, an@} >> @{b0, b1, @dots{}, bn@} == @{a0 >> b0, a1 >> b1,
-@dots{}, an >> bn@}}@. Unlike OpenCL, values of @code{b} are not
-implicitly taken modulo bit width of the base type @code{B}, and the behavior
-is undefined if any @code{bi} is greater than or equal to @code{B}.
+@smallexample
+__label__ @var{label1}, @var{label2}, /* @r{@dots{}} */;
+@end smallexample

-In contrast to scalar operations in C and C++, operands of integer vector
-operations do not undergo integer promotions.
+Local label declarations must come at the beginning of the block,
+before any ordinary declarations or statements.

-Operands of binary vector operations must have the same number of
-elements.
+The label declaration defines the label @emph{name}, but does not define
+the label itself. You must do this in the usual way, with
+@code{@var{label}:}, within the statements of the block.

-For convenience, it is allowed to use a binary vector operation
-where one operand is a scalar. In that case the compiler transforms
-the scalar operand into a vector where each element is the scalar from
-the operation. The transformation happens only if the scalar could be
-safely converted to the vector-element type.
-Consider the following code.
+The local label feature is useful for complex macros. If a macro
+contains nested loops, a @code{goto} can be useful for breaking out of
+them. However, an ordinary label whose scope is the whole function
+cannot be used: if the macro can be expanded several times in one
+function, the label is multiply defined in that function. A
+local label avoids this problem. For example:

@smallexample
-typedef int v4si __attribute__ ((vector_size (16)));
-
-v4si a, b, c;
-long l;
+#define SEARCH(value, array, target) \
+do @{ \
+ __label__ found; \
+ typeof (target) _SEARCH_target = (target); \
+ typeof (*(array)) *_SEARCH_array = (array); \
+ int i, j; \
+ for (i = 0; i < max; i++) \
+ for (j = 0; j < max; j++) \
+ if (_SEARCH_array[i][j] == _SEARCH_target) \
+ @{ (value) = i; goto found; @} \
+ (value) = -1; \
+ found:; \
+@} while (0)
@end smallexample

-a = b + 1; /* a = b + @{1,1,1,1@}; */
-a = 2 * b; /* a = @{2,2,2,2@} * b; */
+This could also be written using a statement expression:

-a = l + a; /* Error, cannot convert long to int. */
@end smallexample

+@smallexample
+#define SEARCH(array, target) \
+(@{ \
+ __label__ found; \
+ typeof (target) _SEARCH_target = (target); \
+ typeof (*(array)) *_SEARCH_array = (array); \
+ int i, j; \
+ int value; \
+ for (i = 0; i < max; i++) \
+ for (j = 0; j < max; j++) \
+ if (_SEARCH_array[i][j] == _SEARCH_target) \
+ @{ value = i; goto found; @} \
+ value = -1; \
+ found: \
+ value; \
+@})
@end smallexample

-Vectors can be subscripted as if the vector were an array with
-the same number of elements and base type. Out of bound accesses
-invoke undefined behavior at run time.
Warnings for out of bound -accesses for vector subscription can be enabled with -@option{-Warray-bounds}. +Local label declarations also make the labels they declare visible to +nested functions, if there are any. @xref{Nested Functions}, for details. -Vector comparison is supported with standard comparison -operators: @code{==, !=, <, <=, >, >=}. Comparison operands can be -vector expressions of integer-type or real-type. Comparison between -integer-type vectors and real-type vectors are not supported. The -result of the comparison is a vector of the same width and number of -elements as the comparison operands with a signed integral element -type. +@node Labels as Values +@subsection Labels as Values +@cindex labels as values +@cindex computed gotos +@cindex goto with computed label +@cindex address of a label -Vectors are compared element-wise producing 0 when comparison is false -and -1 (constant of the appropriate type where all bits are set) -otherwise. Consider the following example. +You can get the address of a label defined in the current function +(or a containing function) with the unary operator @samp{&&}. The +value has type @code{void *}. This value is a constant and can be used +wherever a constant of that type is valid. For example: @smallexample -typedef int v4si __attribute__ ((vector_size (16))); - -v4si a = @{1,2,3,4@}; -v4si b = @{3,2,1,4@}; -v4si c; - -c = a > b; /* The result would be @{0, 0,-1, 0@} */ -c = a == b; /* The result would be @{0,-1, 0,-1@} */ +void *ptr; +/* @r{@dots{}} */ +ptr = &&foo; @end smallexample -In C++, the ternary operator @code{?:} is available. @code{a?b:c}, where -@code{b} and @code{c} are vectors of the same type and @code{a} is an -integer vector with the same number of elements of the same size as @code{b} -and @code{c}, computes all three arguments and creates a vector -@code{@{a[0]?b[0]:c[0], a[1]?b[1]:c[1], @dots{}@}}. Note that unlike in -OpenCL, @code{a} is thus interpreted as @code{a != 0} and not @code{a < 0}. -As in the case of binary operations, this syntax is also accepted when -one of @code{b} or @code{c} is a scalar that is then transformed into a -vector. If both @code{b} and @code{c} are scalars and the type of -@code{true?b:c} has the same size as the element type of @code{a}, then -@code{b} and @code{c} are converted to a vector type whose elements have -this type and with the same number of elements as @code{a}. - -In C++, the logic operators @code{!, &&, ||} are available for vectors. -@code{!v} is equivalent to @code{v == 0}, @code{a && b} is equivalent to -@code{a!=0 & b!=0} and @code{a || b} is equivalent to @code{a!=0 | b!=0}. -For mixed operations between a scalar @code{s} and a vector @code{v}, -@code{s && v} is equivalent to @code{s?v!=0:0} (the evaluation is -short-circuit) and @code{v && s} is equivalent to @code{v!=0 & (s?-1:0)}. +To use these values, you need to be able to jump to one. This is done +with the computed goto statement@footnote{The analogous feature in +Fortran is called an assigned goto, but that name seems inappropriate in +C, where one can do more than simply store label addresses in label +variables.}, @code{goto *@var{exp};}. For example, -@findex __builtin_shuffle -Vector shuffling is available using functions -@code{__builtin_shuffle (vec, mask)} and -@code{__builtin_shuffle (vec0, vec1, mask)}. -Both functions construct a permutation of elements from one or two -vectors and return a vector of the same type as the input vector(s). 
-The @var{mask} is an integral vector with the same width (@var{W})
-and element count (@var{N}) as the output vector.
+@smallexample
+goto *ptr;
+@end smallexample

-The elements of the input vectors are numbered in memory ordering of
-@var{vec0} beginning at 0 and @var{vec1} beginning at @var{N}. The
-elements of @var{mask} are considered modulo @var{N} in the single-operand
-case and modulo @math{2*@var{N}} in the two-operand case.
+@noindent
+Any expression of type @code{void *} is allowed.

-Consider the following example,
+One way of using these constants is in initializing a static array that
+serves as a jump table:

@smallexample
-typedef int v4si __attribute__ ((vector_size (16)));
+static void *array[] = @{ &&foo, &&bar, &&hack @};
+@end smallexample

-v4si a = @{1,2,3,4@};
-v4si b = @{5,6,7,8@};
-v4si mask1 = @{0,1,1,3@};
-v4si mask2 = @{0,4,2,5@};
-v4si res;
+@noindent
+Then you can select a label with indexing, like this:

-res = __builtin_shuffle (a, mask1); /* res is @{1,2,2,4@} */
-res = __builtin_shuffle (a, b, mask2); /* res is @{1,5,3,6@} */
+@smallexample
+goto *array[i];
@end smallexample

-Note that @code{__builtin_shuffle} is intentionally semantically
-compatible with the OpenCL @code{shuffle} and @code{shuffle2} functions.
+@noindent
+Note that this does not check whether the subscript is in bounds---array
+indexing in C never does that.

-You can declare variables and use them in function calls and returns, as
-well as in assignments and some casts. You can specify a vector type as
-a return type for a function. Vector types can also be used as function
-arguments. It is possible to cast from one vector type to another,
-provided they are of the same size (in fact, you can also cast vectors
-to and from other data types of the same size).
+Such an array of label values serves a purpose much like that of the
+@code{switch} statement. The @code{switch} statement is cleaner, so
+use that rather than an array unless the problem does not fit a
+@code{switch} statement very well.

-You cannot operate between vectors of different lengths or different
-signedness without a cast.
+Another use of label values is in an interpreter for threaded code.
+The labels within the interpreter function can be stored in the
+threaded code for super-fast dispatching.

-@findex __builtin_shufflevector
-Vector shuffling is available using the
-@code{__builtin_shufflevector (vec1, vec2, index...)}
-function. @var{vec1} and @var{vec2} must be expressions with
-vector type with a compatible element type. The result of
-@code{__builtin_shufflevector} is a vector with the same element type
-as @var{vec1} and @var{vec2} but that has an element count equal to
-the number of indices specified.
+You may not use this mechanism to jump to code in a different function.
+If you do that, totally unpredictable things happen. The best way to
+avoid this is to store the label address only in automatic variables and
+never pass it as an argument.

-The @var{index} arguments are a list of integers that specify the
-element indices of the first two vectors that should be extracted and
-returned in a new vector. These element indices are numbered sequentially
-starting with the first vector, continuing into the second vector.
-An index of -1 can be used to indicate that the corresponding element in
-the returned vector is a don't care and can be freely chosen to optimize
-the generated code sequence performing the shuffle operation.
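For instance, a brief sketch of the don't-care indices described above:

@smallexample
typedef int v4si __attribute__ ((vector_size (16)));

v4si a = @{1,2,3,4@};
/* Only the first two lanes matter; the -1 lanes may hold whatever
   the compiler finds cheapest to produce.  */
v4si b = __builtin_shufflevector (a, a, 3, 0, -1, -1);
@end smallexample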
+An alternate way to write the jump-table example above is

-Consider the following example,
@smallexample
-typedef int v4si __attribute__ ((vector_size (16)));
-typedef int v8si __attribute__ ((vector_size (32)));
-
-v8si a = @{1,-2,3,-4,5,-6,7,-8@};
-v4si b = __builtin_shufflevector (a, a, 0, 2, 4, 6); /* b is @{1,3,5,7@} */
-v4si c = @{-2,-4,-6,-8@};
-v8si d = __builtin_shufflevector (c, b, 4, 0, 5, 1, 6, 2, 7, 3); /* d is a */
+static const int array[] = @{ &&foo - &&foo, &&bar - &&foo,
+                             &&hack - &&foo @};
+goto *(&&foo + array[i]);
@end smallexample

-@findex __builtin_convertvector
-Vector conversion is available using the
-@code{__builtin_convertvector (vec, vectype)}
-function. @var{vec} must be an expression with integral or floating
-vector type and @var{vectype} an integral or floating vector type with the
-same number of elements. The result has @var{vectype} type and value of
-a C cast of every element of @var{vec} to the element type of @var{vectype}.
+@noindent
+This is more friendly to code living in shared libraries, as it reduces
+the number of dynamic relocations that are needed and, as a consequence,
+allows the data to be read-only.
+This alternative with label differences is not supported on the AVR
+target; please use the first approach for AVR programs.

-Consider the following example,
-@smallexample
-typedef int v4si __attribute__ ((vector_size (16)));
-typedef float v4sf __attribute__ ((vector_size (16)));
-typedef double v4df __attribute__ ((vector_size (32)));
-typedef unsigned long long v4di __attribute__ ((vector_size (32)));
+The @code{&&foo} expressions for the same label might have different
+values if the containing function is inlined or cloned. If a program
+relies on them being always the same,
+@code{__attribute__((__noinline__,__noclone__))} should be used to
+prevent inlining and cloning. If @code{&&foo} is used in a static
+variable initializer, inlining and cloning are forbidden.

-v4si a = @{1,-2,3,-4@};
-v4sf b = @{1.5f,-2.5f,3.f,7.f@};
-v4di c = @{1ULL,5ULL,0ULL,10ULL@};
-v4sf d = __builtin_convertvector (a, v4sf); /* d is @{1.f,-2.f,3.f,-4.f@} */
-/* Equivalent of:
-   v4sf d = @{ (float)a[0], (float)a[1], (float)a[2], (float)a[3] @}; */
-v4df e = __builtin_convertvector (a, v4df); /* e is @{1.,-2.,3.,-4.@} */
-v4df f = __builtin_convertvector (b, v4df); /* f is @{1.5,-2.5,3.,7.@} */
-v4si g = __builtin_convertvector (f, v4si); /* g is @{1,-2,3,7@} */
-v4si h = __builtin_convertvector (c, v4si); /* h is @{1,5,0,10@} */
-@end smallexample
+Unlike a normal goto, in GNU C++ a computed goto will not call
+destructors for objects that go out of scope.

-@cindex vector types, using with x86 intrinsics
-Sometimes it is desirable to write code using a mix of generic vector
-operations (for clarity) and machine-specific vector intrinsics (to
-access vector instructions that are not exposed via generic built-ins).
-On x86, intrinsic functions for integer vectors typically use the same
-vector type @code{__m128i} irrespective of how they interpret the vector,
-making it necessary to cast their arguments and return values from/to
-other vector types.
In C, you can make use of a @code{union} type: -@c In C++ such type punning via a union is not allowed by the language -@smallexample -#include +@node Nested Functions +@subsection Nested Functions +@cindex nested functions +@cindex downward funargs +@cindex thunks -typedef unsigned char u8x16 __attribute__ ((vector_size (16))); -typedef unsigned int u32x4 __attribute__ ((vector_size (16))); +A @dfn{nested function} is a function defined inside another function. +Nested functions are supported as an extension in GNU C, but are not +supported by GNU C++. -typedef union @{ - __m128i mm; - u8x16 u8; - u32x4 u32; -@} v128; +The nested function's name is local to the block where it is defined. +For example, here we define a nested function named @code{square}, and +call it twice: + +@smallexample +@group +foo (double a, double b) +@{ + double square (double z) @{ return z * z; @} + + return square (a) + square (b); +@} +@end group @end smallexample -@noindent -for variables that can be used with both built-in operators and x86 -intrinsics: +The nested function can access all the variables of the containing +function that are visible at the point of its definition. This is +called @dfn{lexical scoping}. For example, here we show a nested +function which uses an inherited variable named @code{offset}: @smallexample -v128 x, y = @{ 0 @}; -memcpy (&x, ptr, sizeof x); -y.u8 += 0x80; -x.mm = _mm_adds_epu8 (x.mm, y.mm); -x.u32 &= 0xffffff; - -/* Instead of a variable, a compound literal may be used to pass the - return value of an intrinsic call to a function expecting the union: */ -v128 foo (v128); -x = foo ((v128) @{_mm_adds_epu8 (x.mm, y.mm)@}); -@c This could be done implicitly with __attribute__((transparent_union)), -@c but GCC does not accept it for unions of vector types (PR 88955). +@group +bar (int *array, int offset, int size) +@{ + int access (int *array, int index) + @{ return array[index + offset]; @} + int i; + /* @r{@dots{}} */ + for (i = 0; i < size; i++) + /* @r{@dots{}} */ access (array, i) /* @r{@dots{}} */ +@} +@end group @end smallexample -@node __sync Builtins -@section Legacy @code{__sync} Built-in Functions for Atomic Memory Access +Nested function definitions are permitted within functions in the places +where variable definitions are allowed; that is, in any block, mixed +with the other declarations and statements in the block. -The following built-in functions -are intended to be compatible with those described -in the @cite{Intel Itanium Processor-specific Application Binary Interface}, -section 7.4. As such, they depart from normal GCC practice by not using -the @samp{__builtin_} prefix and also by being overloaded so that they -work on multiple types. +It is possible to call the nested function from outside the scope of its +name by storing its address or passing the address to another function: -The definition given in the Intel documentation allows only for the use of -the types @code{int}, @code{long}, @code{long long} or their unsigned -counterparts. GCC allows any scalar type that is 1, 2, 4 or 8 bytes in -size other than the C type @code{_Bool} or the C++ type @code{bool}. -Operations on pointer arguments are performed as if the operands were -of the @code{uintptr_t} type. That is, they are not scaled by the size -of the type to which the pointer points. 
+@smallexample +hack (int *array, int size) +@{ + void store (int index, int value) + @{ array[index] = value; @} -These functions are implemented in terms of the @samp{__atomic} -builtins (@pxref{__atomic Builtins}). They should not be used for new -code which should use the @samp{__atomic} builtins instead. + intermediate (store, size); +@} +@end smallexample -Not all operations are supported by all target processors. If a particular -operation cannot be implemented on the target processor, a call to an -external function is generated. The external function carries the same name -as the built-in version, with an additional suffix -@samp{_@var{n}} where @var{n} is the size of the data type. +Here, the function @code{intermediate} receives the address of +@code{store} as an argument. If @code{intermediate} calls @code{store}, +the arguments given to @code{store} are used to store into @code{array}. +But this technique works only so long as the containing function +(@code{hack}, in this example) does not exit. -In most cases, these built-in functions are considered a @dfn{full barrier}. -That is, -no memory operand is moved across the operation, either forward or -backward. Further, instructions are issued as necessary to prevent the -processor from speculating loads across the operation and from queuing stores -after the operation. +If you try to call the nested function through its address after the +containing function exits, all hell breaks loose. If you try +to call it after a containing scope level exits, and if it refers +to some of the variables that are no longer in scope, you may be lucky, +but it's not wise to take the risk. If, however, the nested function +does not refer to anything that has gone out of scope, you should be +safe. -All of the routines are described in the Intel documentation to take -``an optional list of variables protected by the memory barrier''. It's -not clear what is meant by that; it could mean that @emph{only} the -listed variables are protected, or it could mean a list of additional -variables to be protected. The list is ignored by GCC which treats it as -empty. GCC interprets an empty list as meaning that all globally -accessible variables should be protected. +GCC implements taking the address of a nested function using a technique +called @dfn{trampolines}. This technique was described in +@cite{Lexical Closures for C++} (Thomas M. Breuel, USENIX +C++ Conference Proceedings, October 17-21, 1988). -@defbuiltin{@var{type} __sync_fetch_and_add (@var{type} *@var{ptr}, @var{type} @var{value}, ...)} -@defbuiltinx{@var{type} __sync_fetch_and_sub (@var{type} *@var{ptr}, @var{type} @var{value}, ...)} -@defbuiltinx{@var{type} __sync_fetch_and_or (@var{type} *@var{ptr}, @var{type} @var{value}, ...)} -@defbuiltinx{@var{type} __sync_fetch_and_and (@var{type} *@var{ptr}, @var{type} @var{value}, ...)} -@defbuiltinx{@var{type} __sync_fetch_and_xor (@var{type} *@var{ptr}, @var{type} @var{value}, ...)} -@defbuiltinx{@var{type} __sync_fetch_and_nand (@var{type} *@var{ptr}, @var{type} @var{value}, ...)} -These built-in functions perform the operation suggested by the name, and -returns the value that had previously been in memory. That is, operations -on integer operands have the following semantics. Operations on pointer -arguments are performed as if the operands were of the @code{uintptr_t} -type. That is, they are not scaled by the size of the type to which -the pointer points. 
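+For example (a sketch of ours, not from the GCC sources), a nested
+comparison function that reads a variable of its containing function
+can be handed to @code{qsort}; this is safe because @code{sort_ints}
+has not exited while @code{qsort} is running:
+
+@smallexample
+#include <stdlib.h>
+
+void
+sort_ints (int *array, size_t n, int ascending)
+@{
+  /* cmp reaches ascending in the enclosing frame through the
+     trampoline GCC builds when cmp's address is taken.  */
+  int cmp (const void *a, const void *b)
+  @{
+    int x = *(const int *) a, y = *(const int *) b;
+    int d = (x > y) - (x < y);
+    return ascending ? d : -d;
+  @}
+  qsort (array, n, sizeof *array, cmp);
+@}
+@end smallexample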
+A nested function can jump to a label inherited from a containing
+function, provided the label is explicitly declared in the containing
+function (@pxref{Local Labels}). Such a jump returns instantly to the
+containing function, exiting the nested function that did the
+@code{goto} and any intermediate functions as well. Here is an example:

@smallexample
-@{ tmp = *ptr; *ptr @var{op}= value; return tmp; @}
-@{ tmp = *ptr; *ptr = ~(tmp & value); return tmp; @} // nand
+@group
+bar (int *array, int offset, int size)
+@{
+  __label__ failure;
+  int access (int *array, int index)
+    @{
+      if (index > size)
+        goto failure;
+      return array[index + offset];
+    @}
+  int i;
+  /* @r{@dots{}} */
+  for (i = 0; i < size; i++)
+    /* @r{@dots{}} */ access (array, i) /* @r{@dots{}} */
+  /* @r{@dots{}} */
+  return 0;
+
+ /* @r{Control comes here from @code{access}
+    if it detects an error.} */
+ failure:
+  return -1;
+@}
+@end group
@end smallexample

-The object pointed to by the first argument must be of integer or pointer
-type. It must not be a boolean type.
-
-@emph{Note:} GCC 4.4 and later implement @code{__sync_fetch_and_nand}
-as @code{*ptr = ~(tmp & value)} instead of @code{*ptr = ~tmp & value}.
-@enddefbuiltin
-
-@defbuiltin{@var{type} __sync_add_and_fetch (@var{type} *@var{ptr}, @
-  @var{type} @var{value}, ...)}
-@defbuiltinx{@var{type} __sync_sub_and_fetch (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
-@defbuiltinx{@var{type} __sync_or_and_fetch (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
-@defbuiltinx{@var{type} __sync_and_and_fetch (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
-@defbuiltinx{@var{type} __sync_xor_and_fetch (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
-@defbuiltinx{@var{type} __sync_nand_and_fetch (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
-These built-in functions perform the operation suggested by the name, and
-return the new value. That is, operations on integer operands have
-the following semantics. Operations on pointer operands are performed as
-if the operand's type were @code{uintptr_t}.
+A nested function always has no linkage. Declaring one with
+@code{extern} or @code{static} is erroneous. If you need to declare
+the nested function before its definition, use @code{auto} (which is
+otherwise meaningless for function declarations).

@smallexample
-@{ *ptr @var{op}= value; return *ptr; @}
-@{ *ptr = ~(*ptr & value); return *ptr; @} // nand
+bar (int *array, int offset, int size)
+@{
+  __label__ failure;
+  auto int access (int *, int);
+  /* @r{@dots{}} */
+  int access (int *array, int index)
+    @{
+      if (index > size)
+        goto failure;
+      return array[index + offset];
+    @}
+  /* @r{@dots{}} */
+@}
@end smallexample

-The same constraints on arguments apply as for the corresponding
-@code{__sync_op_and_fetch} built-in functions.
+@node Typeof
+@subsection Referring to a Type with @code{typeof}
+@findex typeof
+@findex sizeof
+@cindex macros, types of arguments

-@emph{Note:} GCC 4.4 and later implement @code{__sync_nand_and_fetch}
-as @code{*ptr = ~(*ptr & value)} instead of
-@code{*ptr = ~*ptr & value}.
-@enddefbuiltin
+Another way to refer to the type of an expression is with @code{typeof}.
+The syntax of this keyword looks like that of @code{sizeof}, but the
+construct acts semantically like a type name defined with @code{typedef}.
-@defbuiltin{bool __sync_bool_compare_and_swap (@var{type} *@var{ptr}, @var{type} @var{oldval}, @var{type} @var{newval}, ...)} -@defbuiltinx{@var{type} __sync_val_compare_and_swap (@var{type} *@var{ptr}, @var{type} @var{oldval}, @var{type} @var{newval}, ...)} -These built-in functions perform an atomic compare and swap. -That is, if the current -value of @code{*@var{ptr}} is @var{oldval}, then write @var{newval} into -@code{*@var{ptr}}. +There are two ways of writing the argument to @code{typeof}: with an +expression or with a type. Here is an example with an expression: -The ``bool'' version returns @code{true} if the comparison is successful and -@var{newval} is written. The ``val'' version returns the contents -of @code{*@var{ptr}} before the operation. -@enddefbuiltin +@smallexample +typeof (x[0](1)) +@end smallexample -@defbuiltin{void __sync_synchronize (...)} -This built-in function issues a full memory barrier. -@enddefbuiltin +@noindent +This assumes that @code{x} is an array of pointers to functions; +the type described is that of the values of the functions. -@defbuiltin{@var{type} __sync_lock_test_and_set (@var{type} *@var{ptr}, @var{type} @var{value}, ...)} -This built-in function, as described by Intel, is not a traditional test-and-set -operation, but rather an atomic exchange operation. It writes @var{value} -into @code{*@var{ptr}}, and returns the previous contents of -@code{*@var{ptr}}. +Here is an example with a typename as the argument: -Many targets have only minimal support for such locks, and do not support -a full exchange operation. In this case, a target may support reduced -functionality here by which the @emph{only} valid value to store is the -immediate constant 1. The exact value actually stored in @code{*@var{ptr}} -is implementation defined. +@smallexample +typeof (int *) +@end smallexample -This built-in function is not a full barrier, -but rather an @dfn{acquire barrier}. -This means that references after the operation cannot move to (or be -speculated to) before the operation, but previous memory stores may not -be globally visible yet, and previous memory loads may not yet be -satisfied. -@enddefbuiltin +@noindent +Here the type described is that of pointers to @code{int}. -@defbuiltin{void __sync_lock_release (@var{type} *@var{ptr}, ...)} -This built-in function releases the lock acquired by -@code{__sync_lock_test_and_set}. -Normally this means writing the constant 0 to @code{*@var{ptr}}. +If you are writing a header file that must work when included in ISO C +programs, write @code{__typeof__} instead of @code{typeof}. +@xref{Alternate Keywords}. -This built-in function is not a full barrier, -but rather a @dfn{release barrier}. -This means that all previous memory stores are globally visible, and all -previous memory loads have been satisfied, but following memory reads -are not prevented from being speculated to before the barrier. -@enddefbuiltin +A @code{typeof} construct can be used anywhere a typedef name can be +used. For example, you can use it in a declaration, in a cast, or inside +of @code{sizeof} or @code{typeof}. -@node __atomic Builtins -@section Built-in Functions for Memory Model Aware Atomic Operations +The operand of @code{typeof} is evaluated for its side effects if and +only if it is an expression of variably modified type or the name of +such a type. -The following built-in functions approximately match the requirements -for the C++11 memory model. 
They are all -identified by being prefixed with @samp{__atomic} and most are -overloaded so that they work with multiple types. +@code{typeof} is often useful in conjunction with +statement expressions (@pxref{Statement Exprs}). +Here is how the two together can +be used to define a safe ``maximum'' macro which operates on any +arithmetic type and evaluates each of its arguments exactly once: -These functions are intended to replace the legacy @samp{__sync} -builtins. The main difference is that the memory order that is requested -is a parameter to the functions. New code should always use the -@samp{__atomic} builtins rather than the @samp{__sync} builtins. +@smallexample +#define max(a,b) \ + (@{ typeof (a) _a = (a); \ + typeof (b) _b = (b); \ + _a > _b ? _a : _b; @}) +@end smallexample -Note that the @samp{__atomic} builtins assume that programs will -conform to the C++11 memory model. In particular, they assume -that programs are free of data races. See the C++11 standard for -detailed requirements. +@cindex underscores in variables in macros +@cindex @samp{_} in variables in macros +@cindex local variables in macros +@cindex variables, local, in macros +@cindex macros, local variables in -The @samp{__atomic} builtins can be used with any integral scalar or -pointer type that is 1, 2, 4, or 8 bytes in length. 16-byte integral -types are also allowed if @samp{__int128} (@pxref{__int128}) is -supported by the architecture. +The reason for using names that start with underscores for the local +variables is to avoid conflicts with variable names that occur within the +expressions that are substituted for @code{a} and @code{b}. Eventually we +hope to design a new form of declaration syntax that allows you to declare +variables whose scopes start only after their initializers; this will be a +more reliable way to prevent such conflicts. -The four non-arithmetic functions (load, store, exchange, and -compare_exchange) all have a generic version as well. This generic -version works on any data type. It uses the lock-free built-in function -if the specific data type size makes that possible; otherwise, an -external call is left to be resolved at run time. This external call is -the same format with the addition of a @samp{size_t} parameter inserted -as the first parameter indicating the size of the object being pointed to. -All objects must be the same size. +@noindent +Some more examples of the use of @code{typeof}: -There are 6 different memory orders that can be specified. These map -to the C++11 memory orders with the same names, see the C++11 standard -or the @uref{https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync,GCC wiki -on atomic synchronization} for detailed definitions. Individual -targets may also support additional memory orders for use on specific -architectures. Refer to the target documentation for details of -these. +@itemize @bullet +@item +This declares @code{y} with the type of what @code{x} points to. -An atomic operation can both constrain code motion and -be mapped to hardware instructions for synchronization between threads -(e.g., a fence). To which extent this happens is controlled by the -memory orders, which are listed here in approximately ascending order of -strength. The description of each memory order is only meant to roughly -illustrate the effects and is not a specification; see the C++11 -memory model for precise semantics. +@smallexample +typeof (*x) y; +@end smallexample -@table @code -@item __ATOMIC_RELAXED -Implies no inter-thread ordering constraints. 
-@item __ATOMIC_CONSUME
-This is currently implemented using the stronger @code{__ATOMIC_ACQUIRE}
-memory order because of a deficiency in C++11's semantics for
-@code{memory_order_consume}.
-@item __ATOMIC_ACQUIRE
-Creates an inter-thread happens-before constraint from the release (or
-stronger) semantic store to this acquire load. Can prevent hoisting
-of code to before the operation.
-@item __ATOMIC_RELEASE
-Creates an inter-thread happens-before constraint to acquire (or stronger)
-semantic loads that read from this release store. Can prevent sinking
-of code to after the operation.
-@item __ATOMIC_ACQ_REL
-Combines the effects of both @code{__ATOMIC_ACQUIRE} and
-@code{__ATOMIC_RELEASE}.
-@item __ATOMIC_SEQ_CST
-Enforces total ordering with all other @code{__ATOMIC_SEQ_CST} operations.
-@end table
+@item
+This declares @code{y} as an array of such values.

-Note that in the C++11 memory model, @emph{fences} (e.g.,
-@samp{__atomic_thread_fence}) take effect in combination with other
-atomic operations on specific memory locations (e.g., atomic loads);
-operations on specific memory locations do not necessarily affect other
-operations in the same way.
+@smallexample
+typeof (*x) y[4];
+@end smallexample

-Target architectures are encouraged to provide their own patterns for
-each of the atomic built-in functions. If no target is provided, the original
-non-memory model set of @samp{__sync} atomic built-in functions are
-used, along with any required synchronization fences surrounding it in
-order to achieve the proper behavior. Execution in this case is subject
-to the same restrictions as those built-in functions.
+@item
+This declares @code{y} as an array of pointers to characters:

-If there is no pattern or mechanism to provide a lock-free instruction
-sequence, a call is made to an external routine with the same parameters
-to be resolved at run time.
+@smallexample
+typeof (typeof (char *)[4]) y;
+@end smallexample

-When implementing patterns for these built-in functions, the memory order
-parameter can be ignored as long as the pattern implements the most
-restrictive @code{__ATOMIC_SEQ_CST} memory order. Any of the other memory
-orders execute correctly with this memory order but they may not execute as
-efficiently as they could with a more appropriate implementation of the
-relaxed requirements.
+@noindent
+It is equivalent to the following traditional C declaration:

-Note that the C++11 standard allows for the memory order parameter to be
-determined at run time rather than at compile time. These built-in
-functions map any run-time value to @code{__ATOMIC_SEQ_CST} rather
-than invoke a runtime library call or inline a switch statement. This is
-standard compliant, safe, and the simplest approach for now.
+@smallexample
+char *y[4];
+@end smallexample

-The memory order parameter is a signed int, but only the lower 16 bits are
-reserved for the memory order. The remainder of the signed int is reserved
-for target use and should be 0. Use of the predefined atomic values
-ensures proper usage.
+To see the meaning of the declaration using @code{typeof}, and why it
+might be a useful way to write it, rewrite it with these macros:

-@defbuiltin{@var{type} __atomic_load_n (@var{type} *@var{ptr}, int @var{memorder})}
-This built-in function implements an atomic load operation. It returns the
-contents of @code{*@var{ptr}}.
+@smallexample
+#define pointer(T) typeof(T *)
+#define array(T, N) typeof(T [N])
+@end smallexample

-The valid memory order variants are
-@code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, @code{__ATOMIC_ACQUIRE},
-and @code{__ATOMIC_CONSUME}.
+@noindent
+Now the declaration can be rewritten this way:

-@enddefbuiltin
+@smallexample
+array (pointer (char), 4) y;
+@end smallexample

-@defbuiltin{void __atomic_load (@var{type} *@var{ptr}, @var{type} *@var{ret}, int @var{memorder})}
-This is the generic version of an atomic load. It returns the
-contents of @code{*@var{ptr}} in @code{*@var{ret}}.
+@noindent
+Thus, @code{array (pointer (char), 4)} is the type of arrays of 4
+pointers to @code{char}.
+@end itemize

-@enddefbuiltin
+The ISO C23 operator @code{typeof_unqual} is available in ISO C23 mode
+and its result is the non-atomic unqualified version of what the
+@code{typeof} operator returns. The alternate spelling
+@code{__typeof_unqual__} is available in all C modes and provides the
+non-atomic unqualified version of what the @code{__typeof__} operator
+returns.
+@xref{Alternate Keywords}.

-@defbuiltin{void __atomic_store_n (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})}
-This built-in function implements an atomic store operation. It writes
-@code{@var{val}} into @code{*@var{ptr}}.
+@cindex @code{__auto_type} in GNU C
+In GNU C, but not GNU C++, you may also declare the type of a variable
+as @code{__auto_type}. In that case, the declaration must declare
+only one variable, whose declarator must just be an identifier, the
+declaration must be initialized, and the type of the variable is
+determined by the initializer; the name of the variable is not in
+scope until after the initializer. (In C++, you should use C++11
+@code{auto} for this purpose.) Using @code{__auto_type}, the
+``maximum'' macro above could be written as:

-The valid memory order variants are
-@code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, and @code{__ATOMIC_RELEASE}.
+@smallexample
+#define max(a,b) \
+  (@{ __auto_type _a = (a); \
+     __auto_type _b = (b); \
+     _a > _b ? _a : _b; @})
+@end smallexample

-@enddefbuiltin
+Using @code{__auto_type} instead of @code{typeof} has two advantages:

-@defbuiltin{void __atomic_store (@var{type} *@var{ptr}, @var{type} *@var{val}, int @var{memorder})}
-This is the generic version of an atomic store. It stores the value
-of @code{*@var{val}} into @code{*@var{ptr}}.
+@itemize @bullet
+@item Each argument to the macro appears only once in the expansion of
+the macro. This prevents the size of the macro expansion growing
+exponentially when calls to such macros are nested inside arguments of
+such macros.

-@enddefbuiltin
+@item If the argument to the macro has variably modified type, it is
+evaluated only once when using @code{__auto_type}, but twice if
+@code{typeof} is used.
+@end itemize

-@defbuiltin{@var{type} __atomic_exchange_n (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})}
-This built-in function implements an atomic exchange operation. It writes
-@var{val} into @code{*@var{ptr}}, and returns the previous contents of
-@code{*@var{ptr}}.
+@node Offsetof
+@subsection Support for @code{offsetof}
+@findex __builtin_offsetof

-All memory order variants are valid.
+GCC implements for both C and C++ a syntactic extension to implement
+the @code{offsetof} macro.
-@enddefbuiltin +@smallexample +primary: + "__builtin_offsetof" "(" @code{typename} "," offsetof_member_designator ")" -@defbuiltin{void __atomic_exchange (@var{type} *@var{ptr}, @var{type} *@var{val}, @var{type} *@var{ret}, int @var{memorder})} -This is the generic version of an atomic exchange. It stores the -contents of @code{*@var{val}} into @code{*@var{ptr}}. The original value -of @code{*@var{ptr}} is copied into @code{*@var{ret}}. +offsetof_member_designator: + @code{identifier} + | offsetof_member_designator "." @code{identifier} + | offsetof_member_designator "[" @code{expr} "]" +@end smallexample -@enddefbuiltin +This extension is sufficient such that -@defbuiltin{bool __atomic_compare_exchange_n (@var{type} *@var{ptr}, @var{type} *@var{expected}, @var{type} @var{desired}, bool @var{weak}, int @var{success_memorder}, int @var{failure_memorder})} -This built-in function implements an atomic compare and exchange operation. -This compares the contents of @code{*@var{ptr}} with the contents of -@code{*@var{expected}}. If equal, the operation is a @emph{read-modify-write} -operation that writes @var{desired} into @code{*@var{ptr}}. If they are not -equal, the operation is a @emph{read} and the current contents of -@code{*@var{ptr}} are written into @code{*@var{expected}}. @var{weak} is @code{true} -for weak compare_exchange, which may fail spuriously, and @code{false} for -the strong variation, which never fails spuriously. Many targets -only offer the strong variation and ignore the parameter. When in doubt, use -the strong variation. +@smallexample +#define offsetof(@var{type}, @var{member}) __builtin_offsetof (@var{type}, @var{member}) +@end smallexample -If @var{desired} is written into @code{*@var{ptr}} then @code{true} is returned -and memory is affected according to the -memory order specified by @var{success_memorder}. There are no -restrictions on what memory order can be used here. +@noindent +is a suitable definition of the @code{offsetof} macro. In C++, @var{type} +may be dependent. In either case, @var{member} may consist of a single +identifier, or a sequence of member accesses and array references. -Otherwise, @code{false} is returned and memory is affected according -to @var{failure_memorder}. This memory order cannot be -@code{__ATOMIC_RELEASE} nor @code{__ATOMIC_ACQ_REL}. It also cannot be a -stronger order than that specified by @var{success_memorder}. +@node Alignment +@subsection Determining the Alignment of Functions, Types or Variables +@cindex alignment +@cindex type alignment +@cindex variable alignment -@enddefbuiltin +The keyword @code{__alignof__} determines the alignment requirement of +a function, object, or a type, or the minimum alignment usually required +by a type. Its syntax is just like @code{sizeof} and C11 @code{_Alignof}. -@defbuiltin{bool __atomic_compare_exchange (@var{type} *@var{ptr}, @var{type} *@var{expected}, @var{type} *@var{desired}, bool @var{weak}, int @var{success_memorder}, int @var{failure_memorder})} -This built-in function implements the generic version of -@code{__atomic_compare_exchange}. The function is virtually identical to -@code{__atomic_compare_exchange_n}, except the desired value is also a -pointer. +For example, if the target machine requires a @code{double} value to be +aligned on an 8-byte boundary, then @code{__alignof__ (double)} is 8. +This is true on many RISC machines. On more traditional machine +designs, @code{__alignof__ (double)} is 4 or even 2. 
-@enddefbuiltin +Some machines never actually require alignment; they allow references to any +data type even at an odd address. For these machines, @code{__alignof__} +reports the smallest alignment that GCC gives the data type, usually as +mandated by the target ABI. -@defbuiltin{@var{type} __atomic_add_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -@defbuiltinx{@var{type} __atomic_sub_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -@defbuiltinx{@var{type} __atomic_and_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -@defbuiltinx{@var{type} __atomic_xor_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -@defbuiltinx{@var{type} __atomic_or_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -@defbuiltinx{@var{type} __atomic_nand_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -These built-in functions perform the operation suggested by the name, and -return the result of the operation. Operations on pointer arguments are -performed as if the operands were of the @code{uintptr_t} type. That is, -they are not scaled by the size of the type to which the pointer points. +If the operand of @code{__alignof__} is an lvalue rather than a type, +its value is the required alignment for its type, taking into account +any minimum alignment specified by attribute @code{aligned} +(@pxref{Common Variable Attributes}). For example, after this +declaration: @smallexample -@{ *ptr @var{op}= val; return *ptr; @} -@{ *ptr = ~(*ptr & val); return *ptr; @} // nand +struct foo @{ int x; char y; @} foo1; @end smallexample -The object pointed to by the first argument must be of integer or pointer -type. It must not be a boolean type. All memory orders are valid. +@noindent +the value of @code{__alignof__ (foo1.y)} is 1, even though its actual +alignment is probably 2 or 4, the same as @code{__alignof__ (int)}. +It is an error to ask for the alignment of an incomplete type other +than @code{void}. -@enddefbuiltin +If the operand of the @code{__alignof__} expression is a function, +the expression evaluates to the alignment of the function which may +be specified by attribute @code{aligned} (@pxref{Common Function Attributes}). -@defbuiltin{@var{type} __atomic_fetch_add (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -@defbuiltinx{@var{type} __atomic_fetch_sub (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -@defbuiltinx{@var{type} __atomic_fetch_and (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -@defbuiltinx{@var{type} __atomic_fetch_xor (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -@defbuiltinx{@var{type} __atomic_fetch_or (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -@defbuiltinx{@var{type} __atomic_fetch_nand (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -These built-in functions perform the operation suggested by the name, and -return the value that had previously been in @code{*@var{ptr}}. Operations -on pointer arguments are performed as if the operands were of -the @code{uintptr_t} type. That is, they are not scaled by the size of -the type to which the pointer points. 
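+The behavior described above can be checked with a small program (a
+sketch of ours; the values printed are typical for common ABIs, not
+guaranteed):
+
+@smallexample
+#include <stdio.h>
+
+struct foo @{ int x; char y; @} foo1;
+
+int
+main (void)
+@{
+  /* Required alignment of the member's own type: 1 for char.  */
+  printf ("%zu\n", (size_t) __alignof__ (foo1.y));
+  /* Alignment of the whole struct: that of int, commonly 4.  */
+  printf ("%zu\n", (size_t) __alignof__ (struct foo));
+  return 0;
+@}
+@end smallexample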
+@node Incomplete Enums +@subsection Incomplete @code{enum} Types -@smallexample -@{ tmp = *ptr; *ptr @var{op}= val; return tmp; @} -@{ tmp = *ptr; *ptr = ~(*ptr & val); return tmp; @} // nand -@end smallexample +You can define an @code{enum} tag without specifying its possible values. +This results in an incomplete type, much like what you get if you write +@code{struct foo} without describing the elements. A later declaration +that does specify the possible values completes the type. -The same constraints on arguments apply as for the corresponding -@code{__atomic_op_fetch} built-in functions. All memory orders are valid. +You cannot allocate variables or storage using the type while it is +incomplete. However, you can work with pointers to that type. -@enddefbuiltin +This extension may not be very useful, but it makes the handling of +@code{enum} more consistent with the way @code{struct} and @code{union} +are handled. -@defbuiltin{bool __atomic_test_and_set (void *@var{ptr}, int @var{memorder})} +This extension is not supported by GNU C++. -This built-in function performs an atomic test-and-set operation on -the byte at @code{*@var{ptr}}. The byte is set to some implementation -defined nonzero ``set'' value and the return value is @code{true} if and only -if the previous contents were ``set''. -It should be only used for operands of type @code{bool} or @code{char}. For -other types only part of the value may be set. +@node Variadic Macros +@subsection Macros with a Variable Number of Arguments. +@cindex variable number of arguments +@cindex macro with variable arguments +@cindex rest argument (in macro) +@cindex variadic macros -All memory orders are valid. +In the ISO C standard of 1999, a macro can be declared to accept a +variable number of arguments much as a function can. The syntax for +defining the macro is similar to that of a function. Here is an +example: -@enddefbuiltin +@smallexample +#define debug(format, ...) fprintf (stderr, format, __VA_ARGS__) +@end smallexample -@defbuiltin{void __atomic_clear (bool *@var{ptr}, int @var{memorder})} +@noindent +Here @samp{@dots{}} is a @dfn{variable argument}. In the invocation of +such a macro, it represents the zero or more tokens until the closing +parenthesis that ends the invocation, including any commas. This set of +tokens replaces the identifier @code{__VA_ARGS__} in the macro body +wherever it appears. See the CPP manual for more information. -This built-in function performs an atomic clear operation on -@code{*@var{ptr}}. After the operation, @code{*@var{ptr}} contains 0. -It should be only used for operands of type @code{bool} or @code{char} and -in conjunction with @code{__atomic_test_and_set}. -For other types it may only clear partially. If the type is not @code{bool} -prefer using @code{__atomic_store}. +GCC has long supported variadic macros, and used a different syntax that +allowed you to give a name to the variable arguments just like any other +argument. Here is an example: -The valid memory order variants are -@code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, and -@code{__ATOMIC_RELEASE}. +@smallexample +#define debug(format, args...) fprintf (stderr, format, args) +@end smallexample -@enddefbuiltin +@noindent +This is in all ways equivalent to the ISO C example above, but arguably +more readable and descriptive. -@defbuiltin{void __atomic_thread_fence (int @var{memorder})} +GNU CPP has two further variadic macro extensions, and permits them to +be used with either of the above forms of macro definition. 
-
+In standard C, you are not allowed to leave the variable argument out
-This built-in function acts as a synchronization fence between threads
-based on the specified memory order.
+entirely; but you are allowed to pass an empty argument. For example,
+this invocation is invalid in ISO C, because there is no comma after
+the string:

-All memory orders are valid.
+@smallexample
+debug ("A message")
+@end smallexample

-@enddefbuiltin
+GNU CPP permits you to completely omit the variable arguments in this
+way. In the above examples, the compiler would complain, though, since
+the expansion of the macro still has the extra comma after the format
+string.

-@defbuiltin{void __atomic_signal_fence (int @var{memorder})}
-
-This built-in function acts as a synchronization fence between a thread
-and signal handlers based in the same thread.
-
-All memory orders are valid.
-
-@enddefbuiltin
-
-@defbuiltin{bool __atomic_always_lock_free (size_t @var{size}, void *@var{ptr})}
-
-This built-in function returns @code{true} if objects of @var{size} bytes always
-generate lock-free atomic instructions for the target architecture.
-@var{size} must resolve to a compile-time constant and the result also
-resolves to a compile-time constant.
-
-@var{ptr} is an optional pointer to the object that may be used to determine
-alignment. A value of 0 indicates typical alignment should be used. The
-compiler may also ignore this parameter.
+To help solve this problem, CPP behaves specially for variable arguments
+used with the token paste operator, @samp{##}. If instead you write

@smallexample
-if (__atomic_always_lock_free (sizeof (long long), 0))
+#define debug(format, ...) fprintf (stderr, format, ## __VA_ARGS__)
@end smallexample

-@enddefbuiltin
-
-@defbuiltin{bool __atomic_is_lock_free (size_t @var{size}, void *@var{ptr})}
-
-This built-in function returns @code{true} if objects of @var{size} bytes always
-generate lock-free atomic instructions for the target architecture. If
-the built-in function is not known to be lock-free, a call is made to a
-runtime routine named @code{__atomic_is_lock_free}.
-
-@var{ptr} is an optional pointer to the object that may be used to determine
-alignment. A value of 0 indicates typical alignment should be used. The
-compiler may also ignore this parameter.
-@enddefbuiltin
-
-@node Integer Overflow Builtins
-@section Built-in Functions to Perform Arithmetic with Overflow Checking
-
-The following built-in functions allow performing simple arithmetic operations
-together with checking whether the operations overflowed.
-
-@defbuiltin{bool __builtin_add_overflow (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} *@var{res})}
-@defbuiltinx{bool __builtin_sadd_overflow (int @var{a}, int @var{b}, int *@var{res})}
-@defbuiltinx{bool __builtin_saddl_overflow (long int @var{a}, long int @var{b}, long int *@var{res})}
-@defbuiltinx{bool __builtin_saddll_overflow (long long int @var{a}, long long int @var{b}, long long int *@var{res})}
-@defbuiltinx{bool __builtin_uadd_overflow (unsigned int @var{a}, unsigned int @var{b}, unsigned int *@var{res})}
-@defbuiltinx{bool __builtin_uaddl_overflow (unsigned long int @var{a}, unsigned long int @var{b}, unsigned long int *@var{res})}
-@defbuiltinx{bool __builtin_uaddll_overflow (unsigned long long int @var{a}, unsigned long long int @var{b}, unsigned long long int *@var{res})}

+@noindent
+and if the variable arguments are omitted or empty, the @samp{##}
+operator causes the preprocessor to remove the comma before it.
If you +do provide some variable arguments in your macro invocation, GNU CPP +does not complain about the paste operation and instead places the +variable arguments after the comma. Just like any other pasted macro +argument, these arguments are not macro expanded. -These built-in functions promote the first two operands into infinite precision signed -type and perform addition on those promoted operands. The result is then -cast to the type the third pointer argument points to and stored there. -If the stored result is equal to the infinite precision result, the built-in -functions return @code{false}, otherwise they return @code{true}. As the addition is -performed in infinite signed precision, these built-in functions have fully defined -behavior for all argument values. +@node Conditionals +@subsection Conditionals with Omitted Operands +@cindex conditional expressions, extensions +@cindex omitted middle-operands +@cindex middle-operands, omitted +@cindex extensions, @code{?:} +@cindex @code{?:} extensions -The first built-in function allows arbitrary integral types for operands and -the result type must be pointer to some integral type other than enumerated or -boolean type, the rest of the built-in functions have explicit integer types. +The middle operand in a conditional expression may be omitted. Then +if the first operand is nonzero, its value is the value of the conditional +expression. -The compiler will attempt to use hardware instructions to implement -these built-in functions where possible, like conditional jump on overflow -after addition, conditional jump on carry etc. +Therefore, the expression -@enddefbuiltin +@smallexample +x ? : y +@end smallexample -@defbuiltin{bool __builtin_sub_overflow (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} *@var{res})} -@defbuiltinx{bool __builtin_ssub_overflow (int @var{a}, int @var{b}, int *@var{res})} -@defbuiltinx{bool __builtin_ssubl_overflow (long int @var{a}, long int @var{b}, long int *@var{res})} -@defbuiltinx{bool __builtin_ssubll_overflow (long long int @var{a}, long long int @var{b}, long long int *@var{res})} -@defbuiltinx{bool __builtin_usub_overflow (unsigned int @var{a}, unsigned int @var{b}, unsigned int *@var{res})} -@defbuiltinx{bool __builtin_usubl_overflow (unsigned long int @var{a}, unsigned long int @var{b}, unsigned long int *@var{res})} -@defbuiltinx{bool __builtin_usubll_overflow (unsigned long long int @var{a}, unsigned long long int @var{b}, unsigned long long int *@var{res})} +@noindent +has the value of @code{x} if that is nonzero; otherwise, the value of +@code{y}. -These built-in functions are similar to the add overflow checking built-in -functions above, except they perform subtraction, subtract the second argument -from the first one, instead of addition. +This example is perfectly equivalent to -@enddefbuiltin +@smallexample +x ? 
x : y
+@end smallexample

-@defbuiltin{bool __builtin_mul_overflow (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} *@var{res})}
-@defbuiltinx{bool __builtin_smul_overflow (int @var{a}, int @var{b}, int *@var{res})}
-@defbuiltinx{bool __builtin_smull_overflow (long int @var{a}, long int @var{b}, long int *@var{res})}
-@defbuiltinx{bool __builtin_smulll_overflow (long long int @var{a}, long long int @var{b}, long long int *@var{res})}
-@defbuiltinx{bool __builtin_umul_overflow (unsigned int @var{a}, unsigned int @var{b}, unsigned int *@var{res})}
-@defbuiltinx{bool __builtin_umull_overflow (unsigned long int @var{a}, unsigned long int @var{b}, unsigned long int *@var{res})}
-@defbuiltinx{bool __builtin_umulll_overflow (unsigned long long int @var{a}, unsigned long long int @var{b}, unsigned long long int *@var{res})}
+@cindex side effect in @code{?:}
+@cindex @code{?:} side effect
+@noindent
+In this simple case, the ability to omit the middle operand is not
+especially useful. It becomes useful when the first operand does,
+or may (if it is a macro argument), contain a side effect. Then repeating
+the operand in the middle would perform the side effect twice. Omitting
+the middle operand uses the value already computed without the undesirable
+effects of recomputing it.

-These built-in functions are similar to the add overflow checking built-in
-functions above, except they perform multiplication, instead of addition.
+@node Case Ranges
+@subsection Case Ranges
+@cindex case ranges
+@cindex ranges in case statements

-@enddefbuiltin
+You can specify a range of consecutive values in a single @code{case} label,
+like this:

-The following built-in functions allow checking if simple arithmetic operation
-would overflow.
+@smallexample
+case @var{low} ... @var{high}:
+@end smallexample

-@defbuiltin{bool __builtin_add_overflow_p (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} @var{c})}
-@defbuiltinx{bool __builtin_sub_overflow_p (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} @var{c})}
-@defbuiltinx{bool __builtin_mul_overflow_p (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} @var{c})}
+@noindent
+This has the same effect as the proper number of individual @code{case}
+labels, one for each integer value from @var{low} to @var{high}, inclusive.

-These built-in functions are similar to @code{__builtin_add_overflow},
-@code{__builtin_sub_overflow}, or @code{__builtin_mul_overflow}, except that
-they don't store the result of the arithmetic operation anywhere and the
-last argument is not a pointer, but some expression with integral type other
-than enumerated or boolean type.
+This feature is especially useful for ranges of ASCII character codes:

-The built-in functions promote the first two operands into infinite precision signed type
-and perform addition on those promoted operands. The result is then
-cast to the type of the third argument. If the cast result is equal to the infinite
-precision result, the built-in functions return @code{false}, otherwise they return @code{true}.
-The value of the third argument is ignored, just the side effects in the third argument
-are evaluated, and no integral argument promotions are performed on the last argument.
-If the third argument is a bit-field, the type used for the result cast has the
-precision and signedness of the given bit-field, rather than precision and signedness
-of the underlying type.
+@smallexample
+case 'A' ... 
'Z': +@end smallexample -For example, the following macro can be used to portably check, at -compile-time, whether or not adding two constant integers will overflow, -and perform the addition only when it is known to be safe and not to trigger -a @option{-Woverflow} warning. +@strong{Be careful:} Write spaces around the @code{...}, for otherwise +it may be parsed wrong when you use it with integer values. For example, +write this: @smallexample -#define INT_ADD_OVERFLOW_P(a, b) \ - __builtin_add_overflow_p (a, b, (__typeof__ ((a) + (b))) 0) - -enum @{ - A = INT_MAX, B = 3, - C = INT_ADD_OVERFLOW_P (A, B) ? 0 : A + B, - D = __builtin_add_overflow_p (1, SCHAR_MAX, (signed char) 0) -@}; +case 1 ... 5: @end smallexample -The compiler will attempt to use hardware instructions to implement -these built-in functions where possible, like conditional jump on overflow -after addition, conditional jump on carry etc. - -@enddefbuiltin - -@defbuiltin{{unsigned int} __builtin_addc (unsigned int @var{a}, unsigned int @var{b}, unsigned int @var{carry_in}, unsigned int *@var{carry_out})} -@defbuiltinx{{unsigned long int} __builtin_addcl (unsigned long int @var{a}, unsigned long int @var{b}, unsigned int @var{carry_in}, unsigned long int *@var{carry_out})} -@defbuiltinx{{unsigned long long int} __builtin_addcll (unsigned long long int @var{a}, unsigned long long int @var{b}, unsigned long long int @var{carry_in}, unsigned long long int *@var{carry_out})} +@noindent +rather than this: -These built-in functions are equivalent to: @smallexample - (@{ __typeof__ (@var{a}) s; \ - __typeof__ (@var{a}) c1 = __builtin_add_overflow (@var{a}, @var{b}, &s); \ - __typeof__ (@var{a}) c2 = __builtin_add_overflow (s, @var{carry_in}, &s); \ - *(@var{carry_out}) = c1 | c2; \ - s; @}) +case 1...5: @end smallexample -i.e.@: they add 3 unsigned values, set what the last argument -points to to 1 if any of the two additions overflowed (otherwise 0) -and return the sum of those 3 unsigned values. Note, while all -the first 3 arguments can have arbitrary values, better code will be -emitted if one of them (preferably the third one) has only values -0 or 1 (i.e.@: carry-in). - -@enddefbuiltin +@node Mixed Labels and Declarations +@subsection Mixed Declarations, Labels and Code +@cindex mixed declarations and code +@cindex declarations, mixed with code +@cindex code, mixed with declarations -@defbuiltin{{unsigned int} __builtin_subc (unsigned int @var{a}, unsigned int @var{b}, unsigned int @var{carry_in}, unsigned int *@var{carry_out})} -@defbuiltinx{{unsigned long int} __builtin_subcl (unsigned long int @var{a}, unsigned long int @var{b}, unsigned int @var{carry_in}, unsigned long int *@var{carry_out})} -@defbuiltinx{{unsigned long long int} __builtin_subcll (unsigned long long int @var{a}, unsigned long long int @var{b}, unsigned long long int @var{carry_in}, unsigned long long int *@var{carry_out})} +ISO C99 and ISO C++ allow declarations and code to be freely mixed +within compound statements. ISO C23 allows labels to be +placed before declarations and at the end of a compound statement. +As an extension, GNU C also allows all this in C90 mode. 
For example, +you could do: -These built-in functions are equivalent to: @smallexample - (@{ __typeof__ (@var{a}) s; \ - __typeof__ (@var{a}) c1 = __builtin_sub_overflow (@var{a}, @var{b}, &s); \ - __typeof__ (@var{a}) c2 = __builtin_sub_overflow (s, @var{carry_in}, &s); \ - *(@var{carry_out}) = c1 | c2; \ - s; @}) +int i; +/* @r{@dots{}} */ +i++; +int j = i + 2; @end smallexample -i.e.@: they subtract 2 unsigned values from the first unsigned value, -set what the last argument points to to 1 if any of the two subtractions -overflowed (otherwise 0) and return the result of the subtractions. -Note, while all the first 3 arguments can have arbitrary values, better code -will be emitted if one of them (preferrably the third one) has only values -0 or 1 (i.e.@: carry-in). - -@enddefbuiltin +Each identifier is visible from where it is declared until the end of +the enclosing block. -@node x86 specific memory model extensions for transactional memory -@section x86-Specific Memory Model Extensions for Transactional Memory +@node C++ Comments +@subsection C++ Style Comments +@cindex @code{//} +@cindex C++ comments +@cindex comments, C++ style -The x86 architecture supports additional memory ordering flags -to mark critical sections for hardware lock elision. -These must be specified in addition to an existing memory order to -atomic intrinsics. +In GNU C, you may use C++ style comments, which start with @samp{//} and +continue until the end of the line. Many other C implementations allow +such comments, and they are included in the 1999 C standard. However, +C++ style comments are not recognized if you specify an @option{-std} +option specifying a version of ISO C before C99, or @option{-ansi} +(equivalent to @option{-std=c90}). -@table @code -@item __ATOMIC_HLE_ACQUIRE -Start lock elision on a lock variable. -Memory order must be @code{__ATOMIC_ACQUIRE} or stronger. -@item __ATOMIC_HLE_RELEASE -End lock elision on a lock variable. -Memory order must be @code{__ATOMIC_RELEASE} or stronger. -@end table +@node Escaped Newlines +@subsection Slightly Looser Rules for Escaped Newlines +@cindex escaped newlines +@cindex newlines (escaped) -When a lock acquire fails, it is required for good performance to abort -the transaction quickly. This can be done with a @code{_mm_pause}. +The preprocessor treatment of escaped newlines is more relaxed +than that specified by the C90 standard, which requires the newline +to immediately follow a backslash. +GCC's implementation allows whitespace in the form +of spaces, horizontal and vertical tabs, and form feeds between the +backslash and the subsequent newline. The preprocessor issues a +warning, but treats it as a valid escaped newline and combines the two +lines to form a single logical line. This works within comments and +tokens, as well as between tokens. Comments are @emph{not} treated as +whitespace for the purposes of this relaxation, since they have not +yet been replaced with spaces. -@smallexample -#include // For _mm_pause +@node Hex Floats +@subsection Hex Floats +@cindex hex floats -int lockvar; +ISO C99 and ISO C++17 support floating-point numbers written not only in +the usual decimal notation, such as @code{1.55e1}, but also numbers such as +@code{0x1.fp3} written in hexadecimal format. As a GNU extension, GCC +supports this in C90 mode (except in some cases when strictly +conforming) and in C++98, C++11 and C++14 modes. In that format the +@samp{0x} hex introducer and the @samp{p} or @samp{P} exponent field are +mandatory. 
The exponent is a decimal number that indicates the power of +2 by which the significant part is multiplied. Thus @samp{0x1.f} is +@tex +$1 {15\over16}$, +@end tex +@ifnottex +1 15/16, +@end ifnottex +@samp{p3} multiplies it by 8, and the value of @code{0x1.fp3} +is the same as @code{1.55e1}. -/* Acquire lock with lock elision */ -while (__atomic_exchange_n(&lockvar, 1, __ATOMIC_ACQUIRE|__ATOMIC_HLE_ACQUIRE)) - _mm_pause(); /* Abort failed transaction */ -... -/* Free lock with lock elision */ -__atomic_store_n(&lockvar, 0, __ATOMIC_RELEASE|__ATOMIC_HLE_RELEASE); -@end smallexample +Unlike for floating-point numbers in the decimal notation the exponent +is always required in the hexadecimal notation. Otherwise the compiler +would not be able to resolve the ambiguity of, e.g., @code{0x1.f}. This +could mean @code{1.0f} or @code{1.9375} since @samp{f} is also the +extension for floating-point constants of type @code{float}. -@node Object Size Checking -@section Object Size Checking +@node Binary constants +@subsection Binary Constants using the @samp{0b} Prefix +@cindex Binary constants using the @samp{0b} prefix -@subsection Object Size Checking Built-in Functions -@findex __builtin___memcpy_chk -@findex __builtin___mempcpy_chk -@findex __builtin___memmove_chk -@findex __builtin___memset_chk -@findex __builtin___strcpy_chk -@findex __builtin___stpcpy_chk -@findex __builtin___strncpy_chk -@findex __builtin___strcat_chk -@findex __builtin___strncat_chk +Integer constants can be written as binary constants, consisting of a +sequence of @samp{0} and @samp{1} digits, prefixed by @samp{0b} or +@samp{0B}. This is particularly useful in environments that operate a +lot on the bit level (like microcontrollers). -GCC implements a limited buffer overflow protection mechanism that can -prevent some buffer overflow attacks by determining the sizes of objects -into which data is about to be written and preventing the writes when -the size isn't sufficient. The built-in functions described below yield -the best results when used together and when optimization is enabled. -For example, to detect object sizes across function boundaries or to -follow pointer assignments through non-trivial control flow they rely -on various optimization passes enabled with @option{-O2}. However, to -a limited extent, they can be used without optimization as well. +The following statements are identical: -@defbuiltin{size_t __builtin_object_size (const void * @var{ptr}, int @var{type})} -is a built-in construct that returns a constant number of bytes from -@var{ptr} to the end of the object @var{ptr} pointer points to -(if known at compile time). To determine the sizes of dynamically allocated -objects the function relies on the allocation functions called to obtain -the storage to be declared with the @code{alloc_size} attribute (@pxref{Common -Function Attributes}). @code{__builtin_object_size} never evaluates -its arguments for side effects. If there are any side effects in them, it -returns @code{(size_t) -1} for @var{type} 0 or 1 and @code{(size_t) 0} -for @var{type} 2 or 3. If there are multiple objects @var{ptr} can -point to and all of them are known at compile time, the returned number -is the maximum of remaining byte counts in those objects if @var{type} & 2 is -0 and minimum if nonzero. If it is not possible to determine which objects -@var{ptr} points to at compile time, @code{__builtin_object_size} should -return @code{(size_t) -1} for @var{type} 0 or 1 and @code{(size_t) 0} -for @var{type} 2 or 3. 
+@smallexample +i = 42; +i = 0x2a; +i = 052; +i = 0b101010; +@end smallexample -@var{type} is an integer constant from 0 to 3. If the least significant -bit is clear, objects are whole variables, if it is set, a closest -surrounding subobject is considered the object a pointer points to. -The second bit determines if maximum or minimum of remaining bytes -is computed. +The type of these constants follows the same rules as for octal or +hexadecimal integer constants, so suffixes like @samp{L} or @samp{UL} +can be applied. -@smallexample -struct V @{ char buf1[10]; int b; char buf2[10]; @} var; -char *p = &var.buf1[1], *q = &var.b; +@node Dollar Signs +@subsection Dollar Signs in Identifier Names +@cindex $ +@cindex dollar signs in identifier names +@cindex identifier names, dollar signs in -/* Here the object p points to is var. */ -assert (__builtin_object_size (p, 0) == sizeof (var) - 1); -/* The subobject p points to is var.buf1. */ -assert (__builtin_object_size (p, 1) == sizeof (var.buf1) - 1); -/* The object q points to is var. */ -assert (__builtin_object_size (q, 0) - == (char *) (&var + 1) - (char *) &var.b); -/* The subobject q points to is var.b. */ -assert (__builtin_object_size (q, 1) == sizeof (var.b)); -@end smallexample -@enddefbuiltin +In GNU C, you may normally use dollar signs in identifier names. +This is because many traditional C implementations allow such identifiers. +However, dollar signs in identifiers are not supported on a few target +machines, typically because the target assembler does not allow them. -@defbuiltin{{size_t} __builtin_dynamic_object_size (const void * @var{ptr}, int @var{type})} -is similar to @code{__builtin_object_size} in that it returns a number of bytes -from @var{ptr} to the end of the object @var{ptr} pointer points to, except -that the size returned may not be a constant. This results in successful -evaluation of object size estimates in a wider range of use cases and can be -more precise than @code{__builtin_object_size}, but it incurs a performance -penalty since it may add a runtime overhead on size computation. Semantics of -@var{type} as well as return values in case it is not possible to determine -which objects @var{ptr} points to at compile time are the same as in the case -of @code{__builtin_object_size}. -@enddefbuiltin +@node Character Escapes +@subsection The Character @key{ESC} in Constants -@subsection Object Size Checking and Source Fortification +You can use the sequence @samp{\e} in a string or character constant to +stand for the ASCII character @key{ESC}. -Hardening of function calls using the @code{_FORTIFY_SOURCE} macro is -one of the key uses of the object size checking built-in functions. To -make implementation of these features more convenient and improve -optimization and diagnostics, there are built-in functions added for -many common string operation functions, e.g., for @code{memcpy} -@code{__builtin___memcpy_chk} built-in is provided. This built-in has -an additional last argument, which is the number of bytes remaining in -the object the @var{dest} argument points to or @code{(size_t) -1} if -the size is not known. +@node Alternate Keywords +@subsection Alternate Keywords +@cindex alternate keywords +@cindex keywords, alternate -The built-in functions are optimized into the normal string functions -like @code{memcpy} if the last argument is @code{(size_t) -1} or if -it is known at compile time that the destination object will not -be overflowed. 
-If the compiler can determine at compile time that the
-object will always be overflowed, it issues a warning.
+@option{-ansi} and the various @option{-std} options disable certain
+keywords that are GNU C extensions.
+Specifically, the keywords @code{asm}, @code{typeof} and
+@code{inline} are not available in programs compiled with
+@option{-ansi} or a @option{-std=} option specifying an ISO standard that
+doesn't define the keyword.  This causes trouble when you want to use
+these extensions in a header file that can be included in programs that may
+be compiled with such options.

-The intended use can be e.g.@:
+The way to solve these problems is to put @samp{__} at the beginning and
+end of each problematical keyword.  For example, use @code{__asm__}
+instead of @code{asm}, and @code{__inline__} instead of @code{inline}.

-@smallexample
-#undef memcpy
-#define bos0(dest) __builtin_object_size (dest, 0)
-#define memcpy(dest, src, n) \
-  __builtin___memcpy_chk (dest, src, n, bos0 (dest))
+Other C compilers won't accept these alternative keywords; if you want to
+compile with another compiler, you can define the alternate keywords as
+macros to replace them with the customary keywords.  It looks like this:

-char *volatile p;
-char buf[10];
-/* It is unknown what object p points to, so this is optimized
-   into plain memcpy - no checking is possible.  */
-memcpy (p, "abcde", n);
-/* Destination is known and length too.  It is known at compile
-   time there will be no overflow.  */
-memcpy (&buf[5], "abcde", 5);
-/* Destination is known, but the length is not known at compile time.
-   This will result in __memcpy_chk call that can check for overflow
-   at run time.  */
-memcpy (&buf[5], "abcde", n);
-/* Destination is known and it is known at compile time there will
-   be overflow.  There will be a warning and __memcpy_chk call that
-   will abort the program at run time.  */
-memcpy (&buf[6], "abcde", 5);
+@smallexample
+#ifndef __GNUC__
+#define __asm__ asm
+#endif
@end smallexample

-Such built-in functions are provided for @code{memcpy}, @code{mempcpy},
-@code{memmove}, @code{memset}, @code{strcpy}, @code{stpcpy}, @code{strncpy},
-@code{strcat} and @code{strncat}.
+@findex __extension__
+@opindex pedantic
+@option{-pedantic} and other options cause warnings for many GNU C extensions.
+You can suppress such warnings using the keyword @code{__extension__}.
+Specifically:

-@subsubsection Formatted Output Function Checking
-@defbuiltin{int __builtin___sprintf_chk @
-  (char *@var{s}, int @var{flag}, size_t @var{os}, @
-  const char *@var{fmt}, ...)}
-@defbuiltinx{int __builtin___snprintf_chk @
-  (char *@var{s}, size_t @var{maxlen}, int @var{flag}, @
-  size_t @var{os}, const char *@var{fmt}, ...)}
-@defbuiltinx{int __builtin___vsprintf_chk @
-  (char *@var{s}, int @var{flag}, size_t @var{os}, @
-  const char *@var{fmt}, va_list @var{ap})}
-@defbuiltinx{int __builtin___vsnprintf_chk @
-  (char *@var{s}, size_t @var{maxlen}, int @var{flag}, @
-  size_t @var{os}, const char *@var{fmt}, @
-  va_list @var{ap})}
+@itemize @bullet
+@item
+Writing @code{__extension__} before an expression prevents warnings
+about extensions within that expression.

-The added @var{flag} argument is passed unchanged to @code{__sprintf_chk}
-etc.@: functions and can contain implementation specific flags on what
-additional security measures the checking function might take, such as
-handling @code{%n} differently.
+@item
+In C, writing:

-The @var{os} argument is the object size @var{s} points to, like in the
-other built-in functions.
-There is a small difference in the behavior
-though, if @var{os} is @code{(size_t) -1}, the built-in functions are
-optimized into the non-checking functions only if @var{flag} is 0, otherwise
-the checking function is called with @var{os} argument set to
-@code{(size_t) -1}.
+@smallexample
+[[__extension__ @dots{}]]
+@end smallexample

-In addition to this, there are checking built-in functions
-@code{__builtin___printf_chk}, @code{__builtin___vprintf_chk},
-@code{__builtin___fprintf_chk} and @code{__builtin___vfprintf_chk}.
-These have just one additional argument, @var{flag}, right before
-format string @var{fmt}.  If the compiler is able to optimize them to
-@code{fputc} etc.@: functions, it does, otherwise the checking function
-is called and the @var{flag} argument passed to it.
-@enddefbuiltin
+suppresses warnings about using @samp{[[]]} attributes in C versions
+that predate C23@.
+@end itemize

-@node New/Delete Builtins
-@section Built-in functions for C++ allocations and deallocations
-@findex __builtin_operator_new
-@findex __builtin_operator_delete
-Calling these C++ built-in functions is similar to calling
-@code{::operator new} or @code{::operator delete} with the same arguments,
-except that it is an error if the selected @code{::operator new} or
-@code{::operator delete} overload is not a replaceable global operator
-and for optimization purposes calls to pairs of these functions can be
-omitted if access to the allocation is optimized out, or could be replaced
-with implementation provided buffer on the stack, or multiple allocation
-calls can be merged into a single allocation.  In C++ such optimizations
-are normally allowed just for calls to such replaceable global operators
-from @code{new} and @code{delete} expressions.
+@code{__extension__} has no effect aside from this.
+
+@node Function Names
+@subsection Function Names as Strings
+@cindex @code{__func__} identifier
+@cindex @code{__FUNCTION__} identifier
+@cindex @code{__PRETTY_FUNCTION__} identifier
+
+GCC provides three magic constants that hold the name of the current
+function as a string.  In C++11 and later modes, all three are treated
+as constant expressions and can be used in @code{constexpr} contexts.
+The first of these constants is @code{__func__}, which is part of
+the C99 standard:
+
+The identifier @code{__func__} is implicitly declared by the translator
+as if, immediately following the opening brace of each function
+definition, the declaration

@smallexample
-void foo () @{
-  int *a = new int;
-  delete a; // This pair of allocation/deallocation operators can be omitted
-            // or replaced with int _temp; int *a = &_temp; etc.@:
-  void *b = ::operator new (32);
-  ::operator delete (b); // This one cannot.
-  void *c = __builtin_operator_new (32);
-  __builtin_operator_delete (c); // This one can.
+static const char __func__[] = "function-name";
+@end smallexample
+
+@noindent
+appeared, where function-name is the name of the lexically-enclosing
+function.  This name is the unadorned name of the function.  As an
+extension, at file (or, in C++, namespace) scope, @code{__func__}
+evaluates to the empty string.
+
+@code{__FUNCTION__} is another name for @code{__func__}, provided for
+backward compatibility with old versions of GCC.
+
+In C, @code{__PRETTY_FUNCTION__} is yet another name for
+@code{__func__}, except that at file scope (or, in C++, namespace scope),
+it evaluates to the string @code{"top level"}.
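For example, in C this minimal sketch (the function name is
illustrative) prints @samp{report} twice:

@smallexample
#include <stdio.h>

void
report (void)
@{
  printf ("%s\n", __func__);            /* prints "report" */
  printf ("%s\n", __PRETTY_FUNCTION__); /* likewise, in C */
@}

int
main (void)
@{
  report ();
  return 0;
@}
@end smallexample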
In addition, in C++, +@code{__PRETTY_FUNCTION__} contains the signature of the function as +well as its bare name. For example, this program: + +@smallexample +extern "C" int printf (const char *, ...); + +class a @{ + public: + void sub (int i) + @{ + printf ("__FUNCTION__ = %s\n", __FUNCTION__); + printf ("__PRETTY_FUNCTION__ = %s\n", __PRETTY_FUNCTION__); + @} +@}; + +int +main (void) +@{ + a ax; + ax.sub (0); + return 0; @} @end smallexample -@node Other Builtins -@section Other Built-in Functions Provided by GCC -@cindex built-in functions -@findex __builtin_iseqsig -@findex __builtin_isfinite -@findex __builtin_isnormal -@findex __builtin_isgreater -@findex __builtin_isgreaterequal -@findex __builtin_isunordered -@findex __builtin_speculation_safe_value -@findex _Exit -@findex _exit -@findex abort -@findex abs -@findex acos -@findex acosf -@findex acosh -@findex acoshf -@findex acoshl -@findex acosl -@findex alloca -@findex asin -@findex asinf -@findex asinh -@findex asinhf -@findex asinhl -@findex asinl -@findex atan -@findex atan2 -@findex atan2f -@findex atan2l -@findex atanf -@findex atanh -@findex atanhf -@findex atanhl -@findex atanl -@findex bcmp -@findex bzero -@findex cabs -@findex cabsf -@findex cabsl -@findex cacos -@findex cacosf -@findex cacosh -@findex cacoshf -@findex cacoshl -@findex cacosl -@findex calloc -@findex carg -@findex cargf -@findex cargl -@findex casin -@findex casinf -@findex casinh -@findex casinhf -@findex casinhl -@findex casinl -@findex catan -@findex catanf -@findex catanh -@findex catanhf -@findex catanhl -@findex catanl -@findex cbrt -@findex cbrtf -@findex cbrtl -@findex ccos -@findex ccosf -@findex ccosh -@findex ccoshf -@findex ccoshl -@findex ccosl -@findex ceil -@findex ceilf -@findex ceill -@findex cexp -@findex cexpf -@findex cexpl -@findex cimag -@findex cimagf -@findex cimagl -@findex clog -@findex clogf -@findex clogl -@findex clog10 -@findex clog10f -@findex clog10l -@findex conj -@findex conjf -@findex conjl -@findex copysign -@findex copysignf -@findex copysignl -@findex cos -@findex cosf -@findex cosh -@findex coshf -@findex coshl -@findex cosl -@findex cpow -@findex cpowf -@findex cpowl -@findex cproj -@findex cprojf -@findex cprojl -@findex creal -@findex crealf -@findex creall -@findex csin -@findex csinf -@findex csinh -@findex csinhf -@findex csinhl -@findex csinl -@findex csqrt -@findex csqrtf -@findex csqrtl -@findex ctan -@findex ctanf -@findex ctanh -@findex ctanhf -@findex ctanhl -@findex ctanl -@findex dcgettext -@findex dgettext -@findex drem -@findex dremf -@findex dreml -@findex erf -@findex erfc -@findex erfcf -@findex erfcl -@findex erff -@findex erfl -@findex exit -@findex exp -@findex exp10 -@findex exp10f -@findex exp10l -@findex exp2 -@findex exp2f -@findex exp2l -@findex expf -@findex expl -@findex expm1 -@findex expm1f -@findex expm1l -@findex fabs -@findex fabsf -@findex fabsl -@findex fdim -@findex fdimf -@findex fdiml -@findex ffs -@findex floor -@findex floorf -@findex floorl -@findex fma -@findex fmaf -@findex fmal -@findex fmax -@findex fmaxf -@findex fmaxl -@findex fmin -@findex fminf -@findex fminl -@findex fmod -@findex fmodf -@findex fmodl -@findex fprintf -@findex fprintf_unlocked -@findex fputs -@findex fputs_unlocked -@findex free -@findex frexp -@findex frexpf -@findex frexpl -@findex fscanf -@findex gamma -@findex gammaf -@findex gammal -@findex gamma_r -@findex gammaf_r -@findex gammal_r -@findex gettext -@findex hypot -@findex hypotf -@findex hypotl -@findex ilogb -@findex ilogbf -@findex ilogbl 
-@findex imaxabs -@findex index -@findex isalnum -@findex isalpha -@findex isascii -@findex isblank -@findex iscntrl -@findex isdigit -@findex isgraph -@findex islower -@findex isprint -@findex ispunct -@findex isspace -@findex isupper -@findex iswalnum -@findex iswalpha -@findex iswblank -@findex iswcntrl -@findex iswdigit -@findex iswgraph -@findex iswlower -@findex iswprint -@findex iswpunct -@findex iswspace -@findex iswupper -@findex iswxdigit -@findex isxdigit -@findex j0 -@findex j0f -@findex j0l -@findex j1 -@findex j1f -@findex j1l -@findex jn -@findex jnf -@findex jnl -@findex labs -@findex ldexp -@findex ldexpf -@findex ldexpl -@findex lgamma -@findex lgammaf -@findex lgammal -@findex lgamma_r -@findex lgammaf_r -@findex lgammal_r -@findex llabs -@findex llrint -@findex llrintf -@findex llrintl -@findex llround -@findex llroundf -@findex llroundl -@findex log -@findex log10 -@findex log10f -@findex log10l -@findex log1p -@findex log1pf -@findex log1pl -@findex log2 -@findex log2f -@findex log2l -@findex logb -@findex logbf -@findex logbl -@findex logf -@findex logl -@findex lrint -@findex lrintf -@findex lrintl -@findex lround -@findex lroundf -@findex lroundl -@findex malloc -@findex memchr -@findex memcmp -@findex memcpy -@findex mempcpy -@findex memset -@findex modf -@findex modff -@findex modfl -@findex nearbyint -@findex nearbyintf -@findex nearbyintl -@findex nextafter -@findex nextafterf -@findex nextafterl -@findex nexttoward -@findex nexttowardf -@findex nexttowardl -@findex pow -@findex pow10 -@findex pow10f -@findex pow10l -@findex powf -@findex powl -@findex printf -@findex printf_unlocked -@findex putchar -@findex puts -@findex realloc -@findex remainder -@findex remainderf -@findex remainderl -@findex remquo -@findex remquof -@findex remquol -@findex rindex -@findex rint -@findex rintf -@findex rintl -@findex round -@findex roundf -@findex roundl -@findex scalb -@findex scalbf -@findex scalbl -@findex scalbln -@findex scalblnf -@findex scalblnf -@findex scalbn -@findex scalbnf -@findex scanfnl -@findex signbit -@findex signbitf -@findex signbitl -@findex signbitd32 -@findex signbitd64 -@findex signbitd128 -@findex significand -@findex significandf -@findex significandl -@findex sin -@findex sincos -@findex sincosf -@findex sincosl -@findex sinf -@findex sinh -@findex sinhf -@findex sinhl -@findex sinl -@findex snprintf -@findex sprintf -@findex sqrt -@findex sqrtf -@findex sqrtl -@findex sscanf -@findex stpcpy -@findex stpncpy -@findex strcasecmp -@findex strcat -@findex strchr -@findex strcmp -@findex strcpy -@findex strcspn -@findex strdup -@findex strfmon -@findex strftime -@findex strlen -@findex strncasecmp -@findex strncat -@findex strncmp -@findex strncpy -@findex strndup -@findex strnlen -@findex strpbrk -@findex strrchr -@findex strspn -@findex strstr -@findex tan -@findex tanf -@findex tanh -@findex tanhf -@findex tanhl -@findex tanl -@findex tgamma -@findex tgammaf -@findex tgammal -@findex toascii -@findex tolower -@findex toupper -@findex towlower -@findex towupper -@findex trunc -@findex truncf -@findex truncl -@findex vfprintf -@findex vfscanf -@findex vprintf -@findex vscanf -@findex vsnprintf -@findex vsprintf -@findex vsscanf -@findex y0 -@findex y0f -@findex y0l -@findex y1 -@findex y1f -@findex y1l -@findex yn -@findex ynf -@findex ynl +@noindent +gives this output: -GCC provides a large number of built-in functions other than the ones -mentioned above. 
Some of these are for internal use in the processing -of exceptions or variable-length argument lists and are not -documented here because they may change from time to time; we do not -recommend general use of these functions. +@smallexample +__FUNCTION__ = sub +__PRETTY_FUNCTION__ = void a::sub(int) +@end smallexample -The remaining functions are provided for optimization purposes. +These identifiers are variables, not preprocessor macros, and may not +be used to initialize @code{char} arrays or be concatenated with string +literals. -With the exception of built-ins that have library equivalents such as -the standard C library functions discussed below, or that expand to -library calls, GCC built-in functions are always expanded inline and -thus do not have corresponding entry points and their address cannot -be obtained. Attempting to use them in an expression other than -a function call results in a compile-time error. +@node Semantic Extensions +@section Extensions to C Semantics -@opindex fno-builtin -GCC includes built-in versions of many of the functions in the standard -C library. These functions come in two forms: one whose names start with -the @code{__builtin_} prefix, and the other without. Both forms have the -same type (including prototype), the same address (when their address is -taken), and the same meaning as the C library functions even if you specify -the @option{-fno-builtin} option @pxref{C Dialect Options}). Many of these -functions are only optimized in certain cases; if they are not optimized in -a particular case, a call to the library function is emitted. +GNU C defines useful behavior for some constructs that are not allowed or +well-defined in standard C. -@opindex ansi -@opindex std -Outside strict ISO C mode (@option{-ansi}, @option{-std=c90}, -@option{-std=c99} or @option{-std=c11}), the functions -@code{_exit}, @code{alloca}, @code{bcmp}, @code{bzero}, -@code{dcgettext}, @code{dgettext}, @code{dremf}, @code{dreml}, -@code{drem}, @code{exp10f}, @code{exp10l}, @code{exp10}, @code{ffsll}, -@code{ffsl}, @code{ffs}, @code{fprintf_unlocked}, -@code{fputs_unlocked}, @code{gammaf}, @code{gammal}, @code{gamma}, -@code{gammaf_r}, @code{gammal_r}, @code{gamma_r}, @code{gettext}, -@code{index}, @code{isascii}, @code{j0f}, @code{j0l}, @code{j0}, -@code{j1f}, @code{j1l}, @code{j1}, @code{jnf}, @code{jnl}, @code{jn}, -@code{lgammaf_r}, @code{lgammal_r}, @code{lgamma_r}, @code{mempcpy}, -@code{pow10f}, @code{pow10l}, @code{pow10}, @code{printf_unlocked}, -@code{rindex}, @code{roundeven}, @code{roundevenf}, @code{roundevenl}, -@code{scalbf}, @code{scalbl}, @code{scalb}, -@code{signbit}, @code{signbitf}, @code{signbitl}, @code{signbitd32}, -@code{signbitd64}, @code{signbitd128}, @code{significandf}, -@code{significandl}, @code{significand}, @code{sincosf}, -@code{sincosl}, @code{sincos}, @code{stpcpy}, @code{stpncpy}, -@code{strcasecmp}, @code{strdup}, @code{strfmon}, @code{strncasecmp}, -@code{strndup}, @code{strnlen}, @code{toascii}, @code{y0f}, @code{y0l}, -@code{y0}, @code{y1f}, @code{y1l}, @code{y1}, @code{ynf}, @code{ynl} and -@code{yn} -may be handled as built-in functions. -All these functions have corresponding versions -prefixed with @code{__builtin_}, which may be used even in strict C90 -mode. 
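For instance (an illustrative sketch): @code{stpcpy} is not a reserved
name in strict C90, so the unprefixed built-in is disabled there, but
the prefixed form still works:

@smallexample
/* Compile with: gcc -std=c90 -pedantic-errors file.c  */
char dest[16];

char *
start_tag (void)
@{
  /* Returns a pointer to the terminating nul written to dest.  */
  return __builtin_stpcpy (dest, "id:");
@}
@end smallexample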
- -The ISO C99 functions -@code{_Exit}, @code{acoshf}, @code{acoshl}, @code{acosh}, @code{asinhf}, -@code{asinhl}, @code{asinh}, @code{atanhf}, @code{atanhl}, @code{atanh}, -@code{cabsf}, @code{cabsl}, @code{cabs}, @code{cacosf}, @code{cacoshf}, -@code{cacoshl}, @code{cacosh}, @code{cacosl}, @code{cacos}, -@code{cargf}, @code{cargl}, @code{carg}, @code{casinf}, @code{casinhf}, -@code{casinhl}, @code{casinh}, @code{casinl}, @code{casin}, -@code{catanf}, @code{catanhf}, @code{catanhl}, @code{catanh}, -@code{catanl}, @code{catan}, @code{cbrtf}, @code{cbrtl}, @code{cbrt}, -@code{ccosf}, @code{ccoshf}, @code{ccoshl}, @code{ccosh}, @code{ccosl}, -@code{ccos}, @code{cexpf}, @code{cexpl}, @code{cexp}, @code{cimagf}, -@code{cimagl}, @code{cimag}, @code{clogf}, @code{clogl}, @code{clog}, -@code{conjf}, @code{conjl}, @code{conj}, @code{copysignf}, @code{copysignl}, -@code{copysign}, @code{cpowf}, @code{cpowl}, @code{cpow}, @code{cprojf}, -@code{cprojl}, @code{cproj}, @code{crealf}, @code{creall}, @code{creal}, -@code{csinf}, @code{csinhf}, @code{csinhl}, @code{csinh}, @code{csinl}, -@code{csin}, @code{csqrtf}, @code{csqrtl}, @code{csqrt}, @code{ctanf}, -@code{ctanhf}, @code{ctanhl}, @code{ctanh}, @code{ctanl}, @code{ctan}, -@code{erfcf}, @code{erfcl}, @code{erfc}, @code{erff}, @code{erfl}, -@code{erf}, @code{exp2f}, @code{exp2l}, @code{exp2}, @code{expm1f}, -@code{expm1l}, @code{expm1}, @code{fdimf}, @code{fdiml}, @code{fdim}, -@code{fmaf}, @code{fmal}, @code{fmaxf}, @code{fmaxl}, @code{fmax}, -@code{fma}, @code{fminf}, @code{fminl}, @code{fmin}, @code{hypotf}, -@code{hypotl}, @code{hypot}, @code{ilogbf}, @code{ilogbl}, @code{ilogb}, -@code{imaxabs}, @code{isblank}, @code{iswblank}, @code{lgammaf}, -@code{lgammal}, @code{lgamma}, @code{llabs}, @code{llrintf}, @code{llrintl}, -@code{llrint}, @code{llroundf}, @code{llroundl}, @code{llround}, -@code{log1pf}, @code{log1pl}, @code{log1p}, @code{log2f}, @code{log2l}, -@code{log2}, @code{logbf}, @code{logbl}, @code{logb}, @code{lrintf}, -@code{lrintl}, @code{lrint}, @code{lroundf}, @code{lroundl}, -@code{lround}, @code{nearbyintf}, @code{nearbyintl}, @code{nearbyint}, -@code{nextafterf}, @code{nextafterl}, @code{nextafter}, -@code{nexttowardf}, @code{nexttowardl}, @code{nexttoward}, -@code{remainderf}, @code{remainderl}, @code{remainder}, @code{remquof}, -@code{remquol}, @code{remquo}, @code{rintf}, @code{rintl}, @code{rint}, -@code{roundf}, @code{roundl}, @code{round}, @code{scalblnf}, -@code{scalblnl}, @code{scalbln}, @code{scalbnf}, @code{scalbnl}, -@code{scalbn}, @code{snprintf}, @code{tgammaf}, @code{tgammal}, -@code{tgamma}, @code{truncf}, @code{truncl}, @code{trunc}, -@code{vfscanf}, @code{vscanf}, @code{vsnprintf} and @code{vsscanf} -are handled as built-in functions -except in strict ISO C90 mode (@option{-ansi} or @option{-std=c90}). 
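As a small illustration of what the built-in handling enables (a
sketch; exact code generation depends on the target and options), a
constant argument to such a function can be folded at compile time:

@smallexample
double
eight (void)
@{
  /* With exp2 handled as a built-in, this call typically folds to
     the constant 8.0 and no library call is emitted; in strict C90
     mode the same folding is available as __builtin_exp2.  */
  return __builtin_exp2 (3.0);
@}
@end smallexample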
- -There are also built-in versions of the ISO C99 functions -@code{acosf}, @code{acosl}, @code{asinf}, @code{asinl}, @code{atan2f}, -@code{atan2l}, @code{atanf}, @code{atanl}, @code{ceilf}, @code{ceill}, -@code{cosf}, @code{coshf}, @code{coshl}, @code{cosl}, @code{expf}, -@code{expl}, @code{fabsf}, @code{fabsl}, @code{floorf}, @code{floorl}, -@code{fmodf}, @code{fmodl}, @code{frexpf}, @code{frexpl}, @code{ldexpf}, -@code{ldexpl}, @code{log10f}, @code{log10l}, @code{logf}, @code{logl}, -@code{modfl}, @code{modff}, @code{powf}, @code{powl}, @code{sinf}, -@code{sinhf}, @code{sinhl}, @code{sinl}, @code{sqrtf}, @code{sqrtl}, -@code{tanf}, @code{tanhf}, @code{tanhl} and @code{tanl} -that are recognized in any mode since ISO C90 reserves these names for -the purpose to which ISO C99 puts them. All these functions have -corresponding versions prefixed with @code{__builtin_}. - -There are also built-in functions @code{__builtin_fabsf@var{n}}, -@code{__builtin_fabsf@var{n}x}, @code{__builtin_copysignf@var{n}} and -@code{__builtin_copysignf@var{n}x}, corresponding to the TS 18661-3 -functions @code{fabsf@var{n}}, @code{fabsf@var{n}x}, -@code{copysignf@var{n}} and @code{copysignf@var{n}x}, for supported -types @code{_Float@var{n}} and @code{_Float@var{n}x}. +@menu +* Function Prototypes:: Prototype declarations and old-style definitions. +* Pointer Arith:: Arithmetic on @code{void}-pointers and function pointers. +* Variadic Pointer Args:: Pointer arguments to variadic functions. +* Pointers to Arrays:: Pointers to arrays with qualifiers work as expected. +* Const and Volatile Functions:: GCC interprets these specially in C. +@end menu -There are also GNU extension functions @code{clog10}, @code{clog10f} and -@code{clog10l} which names are reserved by ISO C99 for future use. -All these functions have versions prefixed with @code{__builtin_}. +@node Function Prototypes +@subsection Prototypes and Old-Style Function Definitions +@cindex function prototype declarations +@cindex old-style function definitions +@cindex promotion of formal parameters -The ISO C94 functions -@code{iswalnum}, @code{iswalpha}, @code{iswcntrl}, @code{iswdigit}, -@code{iswgraph}, @code{iswlower}, @code{iswprint}, @code{iswpunct}, -@code{iswspace}, @code{iswupper}, @code{iswxdigit}, @code{towlower} and -@code{towupper} -are handled as built-in functions -except in strict ISO C90 mode (@option{-ansi} or @option{-std=c90}). +GNU C extends ISO C to allow a function prototype to override a later +old-style non-prototype definition. 
Consider the following example: -The ISO C90 functions -@code{abort}, @code{abs}, @code{acos}, @code{asin}, @code{atan2}, -@code{atan}, @code{calloc}, @code{ceil}, @code{cosh}, @code{cos}, -@code{exit}, @code{exp}, @code{fabs}, @code{floor}, @code{fmod}, -@code{fprintf}, @code{fputs}, @code{free}, @code{frexp}, @code{fscanf}, -@code{isalnum}, @code{isalpha}, @code{iscntrl}, @code{isdigit}, -@code{isgraph}, @code{islower}, @code{isprint}, @code{ispunct}, -@code{isspace}, @code{isupper}, @code{isxdigit}, @code{tolower}, -@code{toupper}, @code{labs}, @code{ldexp}, @code{log10}, @code{log}, -@code{malloc}, @code{memchr}, @code{memcmp}, @code{memcpy}, -@code{memset}, @code{modf}, @code{pow}, @code{printf}, @code{putchar}, -@code{puts}, @code{realloc}, @code{scanf}, @code{sinh}, @code{sin}, -@code{snprintf}, @code{sprintf}, @code{sqrt}, @code{sscanf}, @code{strcat}, -@code{strchr}, @code{strcmp}, @code{strcpy}, @code{strcspn}, -@code{strlen}, @code{strncat}, @code{strncmp}, @code{strncpy}, -@code{strpbrk}, @code{strrchr}, @code{strspn}, @code{strstr}, -@code{tanh}, @code{tan}, @code{vfprintf}, @code{vprintf} and @code{vsprintf} -are all recognized as built-in functions unless -@option{-fno-builtin} is specified (or @option{-fno-builtin-@var{function}} -is specified for an individual function). All of these functions have -corresponding versions prefixed with @code{__builtin_}. +@smallexample +/* @r{Use prototypes unless the compiler is old-fashioned.} */ +#ifdef __STDC__ +#define P(x) x +#else +#define P(x) () +#endif -GCC provides built-in versions of the ISO C99 floating-point comparison -macros that avoid raising exceptions for unordered operands. They have -the same names as the standard macros ( @code{isgreater}, -@code{isgreaterequal}, @code{isless}, @code{islessequal}, -@code{islessgreater}, and @code{isunordered}) , with @code{__builtin_} -prefixed. We intend for a library implementor to be able to simply -@code{#define} each standard macro to its built-in equivalent. -In the same fashion, GCC provides @code{fpclassify}, @code{iseqsig}, -@code{isfinite}, @code{isinf_sign}, @code{isnormal} and @code{signbit} built-ins -used with @code{__builtin_} prefixed. The @code{isinf} and @code{isnan} -built-in functions appear both with and without the @code{__builtin_} prefix. -With @code{-ffinite-math-only} option the @code{isinf} and @code{isnan} -built-in functions will always return 0. +/* @r{Prototype function declaration.} */ +int isroot P((uid_t)); -GCC provides built-in versions of the ISO C99 floating-point rounding and -exceptions handling functions @code{fegetround}, @code{feclearexcept} and -@code{feraiseexcept}. They may not be available for all targets, and because -they need close interaction with libc internal values, they may not be available -for all target libcs, but in all cases they will gracefully fallback to libc -calls. These built-in functions appear both with and without the -@code{__builtin_} prefix. +/* @r{Old-style function definition.} */ +int +isroot (x) /* @r{??? lossage here ???} */ + uid_t x; +@{ + return x == 0; +@} +@end smallexample -@defbuiltin{{void *} __builtin_alloca (size_t @var{size})} -The @code{__builtin_alloca} function must be called at block scope. -The function allocates an object @var{size} bytes large on the stack -of the calling function. The object is aligned on the default stack -alignment boundary for the target determined by the -@code{__BIGGEST_ALIGNMENT__} macro. 
The @code{__builtin_alloca} -function returns a pointer to the first byte of the allocated object. -The lifetime of the allocated object ends just before the calling -function returns to its caller. This is so even when -@code{__builtin_alloca} is called within a nested block. +Suppose the type @code{uid_t} happens to be @code{short}. ISO C does +not allow this example, because subword arguments in old-style +non-prototype definitions are promoted. Therefore in this example the +function definition's argument is really an @code{int}, which does not +match the prototype argument type of @code{short}. -For example, the following function allocates eight objects of @code{n} -bytes each on the stack, storing a pointer to each in consecutive elements -of the array @code{a}. It then passes the array to function @code{g} -which can safely use the storage pointed to by each of the array elements. +This restriction of ISO C makes it hard to write code that is portable +to traditional C compilers, because the programmer does not know +whether the @code{uid_t} type is @code{short}, @code{int}, or +@code{long}. Therefore, in cases like these GNU C allows a prototype +to override a later old-style definition. More precisely, in GNU C, a +function prototype argument type overrides the argument type specified +by a later old-style definition if the former type is the same as the +latter type before promotion. Thus in GNU C the above example is +equivalent to the following: @smallexample -void f (unsigned n) -@{ - void *a [8]; - for (int i = 0; i != 8; ++i) - a [i] = __builtin_alloca (n); +int isroot (uid_t); - g (a, n); // @r{safe} +int +isroot (uid_t x) +@{ + return x == 0; @} @end smallexample -Since the @code{__builtin_alloca} function doesn't validate its argument -it is the responsibility of its caller to make sure the argument doesn't -cause it to exceed the stack size limit. -The @code{__builtin_alloca} function is provided to make it possible to -allocate on the stack arrays of bytes with an upper bound that may be -computed at run time. Since C99 Variable Length Arrays offer -similar functionality under a portable, more convenient, and safer -interface they are recommended instead, in both C99 and C++ programs -where GCC provides them as an extension. -@xref{Variable Length}, for details. +@noindent +GNU C++ does not support old-style function definitions, so this +extension is irrelevant. -@enddefbuiltin +@node Pointer Arith +@subsection Arithmetic on @code{void}- and Function-Pointers +@cindex void pointers, arithmetic +@cindex void, size of pointer to +@cindex function pointers, arithmetic +@cindex function, size of pointer to -@defbuiltin{{void *} __builtin_alloca_with_align (size_t @var{size}, size_t @var{alignment})} -The @code{__builtin_alloca_with_align} function must be called at block -scope. The function allocates an object @var{size} bytes large on -the stack of the calling function. The allocated object is aligned on -the boundary specified by the argument @var{alignment} whose unit is given -in bits (not bytes). The @var{size} argument must be positive and not -exceed the stack size limit. The @var{alignment} argument must be a constant -integer expression that evaluates to a power of 2 greater than or equal to -@code{CHAR_BIT} and less than some unspecified maximum. Invocations -with other values are rejected with an error indicating the valid bounds. -The function returns a pointer to the first byte of the allocated object. 
-The lifetime of the allocated object ends at the end of the block in which
-the function was called.  The allocated storage is released no later than
-just before the calling function returns to its caller, but may be released
-at the end of the block in which the function was called.
+In GNU C, addition and subtraction operations are supported on pointers to
+@code{void} and on pointers to functions.  This is done by treating the
+size of a @code{void} or of a function as 1.

-For example, in the following function the call to @code{g} is unsafe
-because when @code{overalign} is non-zero, the space allocated by
-@code{__builtin_alloca_with_align} may have been released at the end
-of the @code{if} statement in which it was called.
+A consequence of this is that @code{sizeof} is also allowed on @code{void}
+and on function types, and returns 1.

-@smallexample
-void f (unsigned n, bool overalign)
-@{
-  void *p;
-  if (overalign)
-    p = __builtin_alloca_with_align (n, 64 /* bits */);
-  else
-    p = __builtin_alloca (n);
-
-  g (p, n);   // @r{unsafe}
-@}
-@end smallexample
+@opindex Wpointer-arith
+The option @option{-Wpointer-arith} requests a warning if these extensions
+are used.

-Since the @code{__builtin_alloca_with_align} function doesn't validate its
-@var{size} argument it is the responsibility of its caller to make sure
-the argument doesn't cause it to exceed the stack size limit.
-The @code{__builtin_alloca_with_align} function is provided to make
-it possible to allocate on the stack overaligned arrays of bytes with
-an upper bound that may be computed at run time.  Since C99
-Variable Length Arrays offer the same functionality under
-a portable, more convenient, and safer interface they are recommended
-instead, in both C99 and C++ programs where GCC provides them as
-an extension.  @xref{Variable Length}, for details.
+@node Variadic Pointer Args
+@subsection Pointer Arguments in Variadic Functions
+@cindex pointer arguments in variadic functions
+@cindex variadic functions, pointer arguments

-@enddefbuiltin
+Standard C requires that pointer types used with @code{va_arg} in
+functions with variable argument lists either must be compatible with
+that of the actual argument, or that one type must be a pointer to
+@code{void} and the other a pointer to a character type.  GNU C
+implements the POSIX XSI extension that additionally permits the use
+of @code{va_arg} with a pointer type to receive arguments of any other
+pointer type.

-@defbuiltin{{void *} __builtin_alloca_with_align_and_max (size_t @var{size}, size_t @var{alignment}, size_t @var{max_size})}
-Similar to @code{__builtin_alloca_with_align} but takes an extra argument
-specifying an upper bound for @var{size} in case its value cannot be computed
-at compile time, for use by @option{-fstack-usage}, @option{-Wstack-usage}
-and @option{-Walloca-larger-than}.  @var{max_size} must be a constant integer
-expression, it has no effect on code generation and no attempt is made to
-check its compatibility with @var{size}.
+In particular, in GNU C @samp{va_arg (ap, void *)} can safely be used
+to consume an argument of any pointer type.
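For example, a minimal sketch relying on this extension (the function
and variable names are illustrative):

@smallexample
#include <stdarg.h>
#include <stdio.h>

/* Consume pointer arguments of arbitrary pointer type with
   va_arg (ap, void *); the list ends with a null pointer.  */
static void
print_ptrs (int first, ...)
@{
  va_list ap;
  void *p;

  va_start (ap, first);
  while ((p = va_arg (ap, void *)) != NULL)
    printf ("%p\n", p);
  va_end (ap);
@}

int
main (void)
@{
  int i = 0;
  double d = 0.0;
  /* The int * and double * arguments are received as void *.  */
  print_ptrs (0, &i, &d, (void *) 0);
  return 0;
@}
@end smallexample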
-@enddefbuiltin +@node Pointers to Arrays +@subsection Pointers to Arrays with Qualifiers Work as Expected +@cindex pointers to arrays +@cindex const qualifier -@defbuiltin{bool __builtin_has_attribute (@var{type-or-expression}, @var{attribute})} -The @code{__builtin_has_attribute} function evaluates to an integer constant -expression equal to @code{true} if the symbol or type referenced by -the @var{type-or-expression} argument has been declared with -the @var{attribute} referenced by the second argument. For -an @var{type-or-expression} argument that does not reference a symbol, -since attributes do not apply to expressions the built-in consider -the type of the argument. Neither argument is evaluated. -The @var{type-or-expression} argument is subject to the same -restrictions as the argument to @code{typeof} (@pxref{Typeof}). The -@var{attribute} argument is an attribute name optionally followed by -a comma-separated list of arguments enclosed in parentheses. Both forms -of attribute names---with and without double leading and trailing -underscores---are recognized. @xref{Attribute Syntax}, for details. -When no attribute arguments are specified for an attribute that expects -one or more arguments the function returns @code{true} if -@var{type-or-expression} has been declared with the attribute regardless -of the attribute argument values. Arguments provided for an attribute -that expects some are validated and matched up to the provided number. -The function returns @code{true} if all provided arguments match. For -example, the first call to the function below evaluates to @code{true} -because @code{x} is declared with the @code{aligned} attribute but -the second call evaluates to @code{false} because @code{x} is declared -@code{aligned (8)} and not @code{aligned (4)}. +In GNU C, pointers to arrays with qualifiers work similar to pointers +to other qualified types. For example, a value of type @code{int (*)[5]} +can be used to initialize a variable of type @code{const int (*)[5]}. +These types are incompatible in ISO C because the @code{const} qualifier +is formally attached to the element type of the array and not the +array itself. @smallexample -__attribute__ ((aligned (8))) int x; -_Static_assert (__builtin_has_attribute (x, aligned), "aligned"); -_Static_assert (!__builtin_has_attribute (x, aligned (4)), "aligned (4)"); +extern void +transpose (int N, int M, double out[M][N], const double in[N][M]); +double x[3][2]; +double y[2][3]; +@r{@dots{}} +transpose(3, 2, y, x); @end smallexample -Due to a limitation the @code{__builtin_has_attribute} function returns -@code{false} for the @code{mode} attribute even if the type or variable -referenced by the @var{type-or-expression} argument was declared with one. -The function is also not supported with labels, and in C with enumerators. - -Note that unlike the @code{__has_attribute} preprocessor operator which -is suitable for use in @code{#if} preprocessing directives -@code{__builtin_has_attribute} is an intrinsic function that is not -recognized in such contexts. 
- -@enddefbuiltin +@node Const and Volatile Functions +@subsection Const and Volatile Functions +@cindex @code{const} applied to function +@cindex @code{volatile} applied to function -@defbuiltin{@var{type} __builtin_speculation_safe_value (@var{type} @var{val}, @var{type} @var{failval})} +The C standard explicitly leaves the behavior of the @code{const} and +@code{volatile} type qualifiers applied to functions undefined; these +constructs can only arise through the use of @code{typedef}. As an extension, +GCC defines this use of the @code{const} qualifier to have the same meaning +as the GCC @code{const} function attribute, and the @code{volatile} qualifier +to be equivalent to the @code{noreturn} attribute. +@xref{Common Function Attributes}, for more information. -This built-in function can be used to help mitigate against unsafe -speculative execution. @var{type} may be any integral type or any -pointer type. +As examples of this usage, -@enumerate -@item -If the CPU is not speculatively executing the code, then @var{val} -is returned. -@item -If the CPU is executing speculatively then either: -@itemize -@item -The function may cause execution to pause until it is known that the -code is no-longer being executed speculatively (in which case -@var{val} can be returned, as above); or -@item -The function may use target-dependent speculation tracking state to cause -@var{failval} to be returned when it is known that speculative -execution has incorrectly predicted a conditional branch operation. -@end itemize -@end enumerate +@smallexample -The second argument, @var{failval}, is optional and defaults to zero -if omitted. +/* @r{Equivalent to:} + void fatal () __attribute__ ((noreturn)); */ +typedef void voidfn (); +volatile voidfn fatal; -GCC defines the preprocessor macro -@code{__HAVE_BUILTIN_SPECULATION_SAFE_VALUE} for targets that have been -updated to support this builtin. +/* @r{Equivalent to:} + extern int square (int) __attribute__ ((const)); */ +typedef int intfn (int); +extern const intfn square; +@end smallexample -The built-in function can be used where a variable appears to be used in a -safe way, but the CPU, due to speculative execution may temporarily ignore -the bounds checks. Consider, for example, the following function: +In general, using function attributes instead is preferred, since the +attributes make both the intent of the code and its reliance on a GNU +extension explicit. Additionally, using @code{const} and +@code{volatile} in this way is specific to GNU C and does not work in +GNU C++. -@smallexample -int array[500]; -int f (unsigned untrusted_index) -@{ - if (untrusted_index < 500) - return array[untrusted_index]; - return 0; -@} -@end smallexample +@node Nonlocal Gotos +@section Nonlocal Gotos +@cindex nonlocal gotos -If the function is called repeatedly with @code{untrusted_index} less -than the limit of 500, then a branch predictor will learn that the -block of code that returns a value stored in @code{array} will be -executed. If the function is subsequently called with an -out-of-range value it will still try to execute that block of code -first until the CPU determines that the prediction was incorrect -(the CPU will unwind any incorrect operations at that point). -However, depending on how the result of the function is used, it might be -possible to leave traces in the cache that can reveal what was stored -at the out-of-bounds location. 
The built-in function can be used to -provide some protection against leaking data in this way by changing -the code to: +GCC provides the built-in functions @code{__builtin_setjmp} and +@code{__builtin_longjmp} which are similar to, but not interchangeable +with, the C library functions @code{setjmp} and @code{longjmp}. +The built-in versions are used internally by GCC's libraries +to implement exception handling on some targets. You should use the +standard C library functions declared in @code{} in user code +instead of the builtins. -@smallexample -int array[500]; -int f (unsigned untrusted_index) -@{ - if (untrusted_index < 500) - return array[__builtin_speculation_safe_value (untrusted_index)]; - return 0; -@} -@end smallexample +The built-in versions of these functions use GCC's normal +mechanisms to save and restore registers using the stack on function +entry and exit. The jump buffer argument @var{buf} holds only the +information needed to restore the stack frame, rather than the entire +set of saved register values. -The built-in function will either cause execution to stall until the -conditional branch has been fully resolved, or it may permit -speculative execution to continue, but using 0 instead of -@code{untrusted_value} if that exceeds the limit. +An important caveat is that GCC arranges to save and restore only +those registers known to the specific architecture variant being +compiled for. This can make @code{__builtin_setjmp} and +@code{__builtin_longjmp} more efficient than their library +counterparts in some cases, but it can also cause incorrect and +mysterious behavior when mixing with code that uses the full register +set. -If accessing any memory location is potentially unsafe when speculative -execution is incorrect, then the code can be rewritten as +You should declare the jump buffer argument @var{buf} to the +built-in functions as: @smallexample -int array[500]; -int f (unsigned untrusted_index) -@{ - if (untrusted_index < 500) - return *__builtin_speculation_safe_value (&array[untrusted_index], NULL); - return 0; -@} +#include +intptr_t @var{buf}[5]; @end smallexample -which will cause a @code{NULL} pointer to be used for the unsafe case. - +@defbuiltin{{int} __builtin_setjmp (intptr_t *@var{buf})} +This function saves the current stack context in @var{buf}. +@code{__builtin_setjmp} returns 0 when returning directly, +and 1 when returning from @code{__builtin_longjmp} using the same +@var{buf}. @enddefbuiltin -@defbuiltin{int __builtin_types_compatible_p (@var{type1}, @var{type2})} +@defbuiltin{{void} __builtin_longjmp (intptr_t *@var{buf}, int @var{val})} +This function restores the stack context in @var{buf}, +saved by a previous call to @code{__builtin_setjmp}. After +@code{__builtin_longjmp} is finished, the program resumes execution as +if the matching @code{__builtin_setjmp} returns the value @var{val}, +which must be 1. -You can use the built-in function @code{__builtin_types_compatible_p} to -determine whether two types are the same. +Because @code{__builtin_longjmp} depends on the function return +mechanism to restore the stack context, it cannot be called +from the same function calling @code{__builtin_setjmp} to +initialize @var{buf}. It can only be called from a function called +(directly or indirectly) from the function calling @code{__builtin_setjmp}. +@enddefbuiltin -This built-in function returns 1 if the unqualified versions of the -types @var{type1} and @var{type2} (which are types, not expressions) are -compatible, 0 otherwise. 
The result of this built-in function can be -used in integer constant expressions. +@node Constructing Calls +@section Constructing Function Calls +@cindex constructing calls +@cindex forwarding calls -This built-in function ignores top level qualifiers (e.g., @code{const}, -@code{volatile}). For example, @code{int} is equivalent to @code{const -int}. +Using the built-in functions described below, you can record +the arguments a function received, and call another function +with the same arguments, without knowing the number or types +of the arguments. -The type @code{int[]} and @code{int[5]} are compatible. On the other -hand, @code{int} and @code{char *} are not compatible, even if the size -of their types, on the particular architecture are the same. Also, the -amount of pointer indirection is taken into account when determining -similarity. Consequently, @code{short *} is not similar to -@code{short **}. Furthermore, two types that are typedefed are -considered compatible if their underlying types are compatible. +You can also record the return value of that function call, +and later return that value, without knowing what data type +the function tried to return (as long as your caller expects +that data type). -An @code{enum} type is not considered to be compatible with another -@code{enum} type even if both are compatible with the same integer -type; this is what the C standard specifies. -For example, @code{enum @{foo, bar@}} is not similar to -@code{enum @{hot, dog@}}. +However, these built-in functions may interact badly with some +sophisticated features or other extensions of the language. It +is, therefore, not recommended to use them outside very simple +functions acting as mere forwarders for their arguments. -You typically use this function in code whose execution varies -depending on the arguments' types. For example: +@defbuiltin{{void *} __builtin_apply_args ()} +This built-in function returns a pointer to data +describing how to perform a call with the same arguments as are passed +to the current function. -@smallexample -#define foo(x) \ - (@{ \ - typeof (x) tmp = (x); \ - if (__builtin_types_compatible_p (typeof (x), long double)) \ - tmp = foo_long_double (tmp); \ - else if (__builtin_types_compatible_p (typeof (x), double)) \ - tmp = foo_double (tmp); \ - else if (__builtin_types_compatible_p (typeof (x), float)) \ - tmp = foo_float (tmp); \ - else \ - abort (); \ - tmp; \ - @}) -@end smallexample +The function saves the arg pointer register, structure value address, +and all registers that might be used to pass arguments to a function +into a block of memory allocated on the stack. Then it returns the +address of that block. +@enddefbuiltin -@emph{Note:} This construct is only available for C@. +@defbuiltin{{void *} __builtin_apply (void (*@var{function})(), void *@var{arguments}, size_t @var{size})} +This built-in function invokes @var{function} +with a copy of the parameters described by @var{arguments} +and @var{size}. -@enddefbuiltin +The value of @var{arguments} should be the value returned by +@code{__builtin_apply_args}. The argument @var{size} specifies the size +of the stack argument data, in bytes. -@defbuiltin{@var{type} __builtin_call_with_static_chain (@var{call_exp}, @var{pointer_exp})} +This function returns a pointer to data describing +how to return whatever value is returned by @var{function}. The data +is saved in a block of memory allocated on the stack. 
-The @var{call_exp} expression must be a function call, and the -@var{pointer_exp} expression must be a pointer. The @var{pointer_exp} -is passed to the function call in the target's static chain location. -The result of builtin is the result of the function call. +It is not always simple to compute the proper value for @var{size}. The +value is used by @code{__builtin_apply} to compute the amount of data +that should be pushed on the stack and copied from the incoming argument +area. +@enddefbuiltin -@emph{Note:} This builtin is only available for C@. -This builtin can be used to call Go closures from C. +@defbuiltin{{void} __builtin_return (void *@var{result})} +This built-in function returns the value described by @var{result} from +the containing function. You should specify, for @var{result}, a value +returned by @code{__builtin_apply}. +@enddefbuiltin +@defbuiltin{{} __builtin_va_arg_pack ()} +This built-in function represents all anonymous arguments of an inline +function. It can be used only in inline functions that are always +inlined, never compiled as a separate function, such as those using +@code{__attribute__ ((__always_inline__))} or +@code{__attribute__ ((__gnu_inline__))} extern inline functions. +It must be only passed as last argument to some other function +with variable arguments. This is useful for writing small wrapper +inlines for variable argument functions, when using preprocessor +macros is undesirable. For example: +@smallexample +extern int myprintf (FILE *f, const char *format, ...); +extern inline __attribute__ ((__gnu_inline__)) int +myprintf (FILE *f, const char *format, ...) +@{ + int r = fprintf (f, "myprintf: "); + if (r < 0) + return r; + int s = fprintf (f, format, __builtin_va_arg_pack ()); + if (s < 0) + return s; + return r + s; +@} +@end smallexample @enddefbuiltin -@defbuiltin{@var{type} __builtin_choose_expr (@var{const_exp}, @var{exp1}, @var{exp2})} +@defbuiltin{int __builtin_va_arg_pack_len ()} +This built-in function returns the number of anonymous arguments of +an inline function. It can be used only in inline functions that +are always inlined, never compiled as a separate function, such +as those using @code{__attribute__ ((__always_inline__))} or +@code{__attribute__ ((__gnu_inline__))} extern inline functions. +For example following does link- or run-time checking of open +arguments for optimized code: +@smallexample +#ifdef __OPTIMIZE__ +extern inline __attribute__((__gnu_inline__)) int +myopen (const char *path, int oflag, ...) +@{ + if (__builtin_va_arg_pack_len () > 1) + warn_open_too_many_arguments (); -You can use the built-in function @code{__builtin_choose_expr} to -evaluate code depending on the value of a constant expression. This -built-in function returns @var{exp1} if @var{const_exp}, which is an -integer constant expression, is nonzero. Otherwise it returns @var{exp2}. + if (__builtin_constant_p (oflag)) + @{ + if ((oflag & O_CREAT) != 0 && __builtin_va_arg_pack_len () < 1) + @{ + warn_open_missing_mode (); + return __open_2 (path, oflag); + @} + return open (path, oflag, __builtin_va_arg_pack ()); + @} -Like the @samp{? :} operator, this built-in function does not evaluate the -expression that is not chosen. For example, if @var{const_exp} evaluates to -@code{true}, @var{exp2} is not evaluated even if it has side effects. On the -other hand, @code{__builtin_choose_expr} differs from @samp{? :} in that the -first operand must be a compile-time constant, and the other operands are not -subject to the @samp{? 
:} type constraints and promotions. + if (__builtin_va_arg_pack_len () < 1) + return __open_2 (path, oflag); -This built-in function can return an lvalue if the chosen argument is an -lvalue. + return open (path, oflag, __builtin_va_arg_pack ()); +@} +#endif +@end smallexample +@enddefbuiltin -If @var{exp1} is returned, the return type is the same as @var{exp1}'s -type. Similarly, if @var{exp2} is returned, its return type is the same -as @var{exp2}. +@node Return Address +@section Getting the Return or Frame Address of a Function -Example: +These functions may be used to get information about the callers of a +function. -@smallexample -#define foo(x) \ - __builtin_choose_expr ( \ - __builtin_types_compatible_p (typeof (x), double), \ - foo_double (x), \ - __builtin_choose_expr ( \ - __builtin_types_compatible_p (typeof (x), float), \ - foo_float (x), \ - /* @r{The void expression results in a compile-time error} \ - @r{when assigning the result to something.} */ \ - (void)0)) -@end smallexample +@defbuiltin{{void *} __builtin_return_address (unsigned int @var{level})} +This function returns the return address of the current function, or of +one of its callers. The @var{level} argument is number of frames to +scan up the call stack. A value of @code{0} yields the return address +of the current function, a value of @code{1} yields the return address +of the caller of the current function, and so forth. When inlining +the expected behavior is that the function returns the address of +the function that is returned to. To work around this behavior use +the @code{noinline} function attribute. -@emph{Note:} This construct is only available for C@. Furthermore, the -unused expression (@var{exp1} or @var{exp2} depending on the value of -@var{const_exp}) may still generate syntax errors. This may change in -future revisions. +The @var{level} argument must be a constant integer. -@enddefbuiltin +On some machines it may be impossible to determine the return address of +any function other than the current one; in such cases, or when the top +of the stack has been reached, this function returns an unspecified +value. In addition, @code{__builtin_frame_address} may be used +to determine if the top of the stack has been reached. -@defbuiltin{@var{type} __builtin_tgmath (@var{functions}, @var{arguments})} +Additional post-processing of the returned value may be needed, see +@code{__builtin_extract_return_addr}. -The built-in function @code{__builtin_tgmath}, available only for C -and Objective-C, calls a function determined according to the rules of -@code{} macros. It is intended to be used in -implementations of that header, so that expansions of macros from that -header only expand each of their arguments once, to avoid problems -when calls to such macros are nested inside the arguments of other -calls to such macros; in addition, it results in better diagnostics -for invalid calls to @code{} macros than implementations -using other GNU C language features. For example, the @code{pow} -type-generic macro might be defined as: +The stored representation of the return address in memory may be different +from the address returned by @code{__builtin_return_address}. For example, +on AArch64 the stored address may be mangled with return address signing +whereas the address returned by @code{__builtin_return_address} is not. + +Calling this function with a nonzero argument can have unpredictable +effects, including crashing the calling program. 
As a result, calls +that are considered unsafe are diagnosed when the @option{-Wframe-address} +option is in effect. Such calls should only be made in debugging +situations. +On targets where code addresses are representable as @code{void *}, @smallexample -#define pow(a, b) __builtin_tgmath (powf, pow, powl, \ - cpowf, cpow, cpowl, a, b) +void *addr = __builtin_extract_return_addr (__builtin_return_address (0)); @end smallexample +gives the code address where the current function would return. For example, +such an address may be used with @code{dladdr} or other interfaces that work +with code addresses. +@enddefbuiltin -The arguments to @code{__builtin_tgmath} are at least two pointers to -functions, followed by the arguments to the type-generic macro (which -will be passed as arguments to the selected function). All the -pointers to functions must be pointers to prototyped functions, none -of which may have variable arguments, and all of which must have the -same number of parameters; the number of parameters of the first -function determines how many arguments to @code{__builtin_tgmath} are -interpreted as function pointers, and how many as the arguments to the -called function. - -The types of the specified functions must all be different, but -related to each other in the same way as a set of functions that may -be selected between by a macro in @code{}. This means that -the functions are parameterized by a floating-point type @var{t}, -different for each such function. The function return types may all -be the same type, or they may be @var{t} for each function, or they -may be the real type corresponding to @var{t} for each function (if -some of the types @var{t} are complex). Likewise, for each parameter -position, the type of the parameter in that position may always be the -same type, or may be @var{t} for each function (this case must apply -for at least one parameter position), or may be the real type -corresponding to @var{t} for each function. +@defbuiltin{{void *} __builtin_extract_return_addr (void *@var{addr})} +The address as returned by @code{__builtin_return_address} may have to be fed +through this function to get the actual encoded address. For example, on the +31-bit S/390 platform the highest bit has to be masked out, or on SPARC +platforms an offset has to be added for the true next instruction to be +executed. -The standard rules for @code{} macros are used to find a -common type @var{u} from the types of the arguments for parameters -whose types vary between the functions; complex integer types (a GNU -extension) are treated like the complex type corresponding to the real -floating type that would be chosen for the corresponding real integer type. -If the function return types vary, or are all the same integer type, -the function called is the one for which @var{t} is @var{u}, and it is -an error if there is no such function. If the function return types -are all the same floating-point type, the type-generic macro is taken -to be one of those from TS 18661 that rounds the result to a narrower -type; if there is a function for which @var{t} is @var{u}, it is -called, and otherwise the first function, if any, for which @var{t} -has at least the range and precision of @var{u} is called, and it is -an error if there is no such function. +If no fixup is needed, this function simply passes through @var{addr}. +@enddefbuiltin +@defbuiltin{{void *} __builtin_frob_return_addr (void *@var{addr})} +This function does the reverse of @code{__builtin_extract_return_addr}. 
@enddefbuiltin -@defbuiltin{int __builtin_constant_p (@var{exp})} -You can use the built-in function @code{__builtin_constant_p} to -determine if the expression @var{exp} is known to be constant at -compile time and hence that GCC can perform constant-folding on expressions -involving that value. The argument of the function is the expression to test. -The expression is not evaluated, side-effects are discarded. The function -returns the integer 1 if the argument is known to be a compile-time -constant and 0 if it is not known to be a compile-time constant. -Any expression that has side-effects makes the function return 0. -A return of 0 does not indicate that the expression is @emph{not} a constant, -but merely that GCC cannot prove it is a constant within the constraints -of the active set of optimization options. - -You typically use this function in an embedded application where -memory is a critical resource. If you have some complex calculation, -you may want it to be folded if it involves constants, but need to call -a function if it does not. For example: +@defbuiltin{{void *} __builtin_frame_address (unsigned int @var{level})} +This function is similar to @code{__builtin_return_address}, but it +returns the address of the function frame rather than the return address +of the function. Calling @code{__builtin_frame_address} with a value of +@code{0} yields the frame address of the current function, a value of +@code{1} yields the frame address of the caller of the current function, +and so forth. -@smallexample -#define Scale_Value(X) \ - (__builtin_constant_p (X) \ - ? ((X) * SCALE + OFFSET) : Scale (X)) -@end smallexample +The frame is the area on the stack that holds local variables and saved +registers. The frame address is normally the address of the first word +pushed on to the stack by the function. However, the exact definition +depends upon the processor and the calling convention. If the processor +has a dedicated frame pointer register, and the function has a frame, +then @code{__builtin_frame_address} returns the value of the frame +pointer register. -You may use this built-in function in either a macro or an inline -function. However, if you use it in an inlined function and pass an -argument of the function as the argument to the built-in, GCC -never returns 1 when you call the inline function with a string constant -or compound literal (@pxref{Compound Literals}) and does not return 1 -when you pass a constant numeric value to the inline function unless you -specify the @option{-O} option. +On some machines it may be impossible to determine the frame address of +any function other than the current one; in such cases, or when the top +of the stack has been reached, this function returns @code{0} if +the first frame pointer is properly initialized by the startup code. -You may also use @code{__builtin_constant_p} in initializers for static -data. For instance, you can write +Calling this function with a nonzero argument can have unpredictable +effects, including crashing the calling program. As a result, calls +that are considered unsafe are diagnosed when the @option{-Wframe-address} +option is in effect. Such calls should only be made in debugging +situations. +@enddefbuiltin -@smallexample -static const int table[] = @{ - __builtin_constant_p (EXPRESSION) ? 
(EXPRESSION) : -1,
-  /* @r{@dots{}} */
-@};
-@end smallexample
+@deftypefn {Built-in Function} {void *} __builtin_stack_address ()
+This function returns the stack pointer register, offset by
+@code{STACK_ADDRESS_OFFSET} if that's defined.
-@noindent
-This is an acceptable initializer even if @var{EXPRESSION} is not a
-constant expression, including the case where
-@code{__builtin_constant_p} returns 1 because @var{EXPRESSION} can be
-folded to a constant but @var{EXPRESSION} contains operands that are
-not otherwise permitted in a static initializer (for example,
-@code{0 && foo ()}). GCC must be more conservative about evaluating the
-built-in in this case, because it has no opportunity to perform
-optimization.
-@enddefbuiltin
+Conceptually, the address returned by this built-in function is
+the boundary between the stack area allocated for use by its caller, and
+the area that could be modified by a function call and that the caller
+could safely zero out before or after (but not during) the call
+sequence.
-@defbuiltin{bool __builtin_is_constant_evaluated (void)}
-The @code{__builtin_is_constant_evaluated} function is available only
-in C++. The built-in is intended to be used by implementations of
-the @code{std::is_constant_evaluated} C++ function. Programs should make
-use of the latter function rather than invoking the built-in directly.
+Arguments for a callee may be preallocated as part of the caller's stack
+frame, or allocated on a per-call basis, depending on the target, so
+they may be on either side of this boundary.
-The main use case of the built-in is to determine whether a @code{constexpr}
-function is being called in a @code{constexpr} context. A call to
-the function evaluates to a core constant expression with the value
-@code{true} if and only if it occurs within the evaluation of an expression
-or conversion that is manifestly constant-evaluated as defined in the C++
-standard. Manifestly constant-evaluated contexts include constant-expressions,
-the conditions of @code{constexpr if} statements, constraint-expressions, and
-initializers of variables usable in constant expressions. For more details
-refer to the latest revision of the C++ standard.
-@enddefbuiltin
+Even if the stack pointer is biased, the result is not. The register
+save area on SPARC is regarded as modifiable by calls, rather than as
+allocated for use by the caller function, since it is never in use while
+the caller function itself is running.
-@defbuiltin{@var{type} __builtin_counted_by_ref (@var{ptr})}
-The built-in function @code{__builtin_counted_by_ref} checks whether the array
-object pointed by the pointer @var{ptr} has another object associated with it
-that represents the number of elements in the array object through the
-@code{counted_by} attribute (i.e. the counted-by object). If so, returns a
-pointer to the corresponding counted-by object.
-If such counted-by object does not exist, returns a null pointer.
+Red zones that only leaf functions could use are also regarded as
+modifiable by calls, rather than as allocated for use by the caller.
+This is only theoretical, since leaf functions do not issue calls, but a
+constant offset makes this built-in function more predictable.
+@end deftypefn
-This built-in function is only available in C for now.
+@node Stack Scrubbing
+@section Stack scrubbing internal interfaces
-The argument @var{ptr} must be a pointer to an array. 
-The @var{type} of the returned value is a pointer type pointing to the -corresponding type of the counted-by object or a void pointer type in case -of a null pointer being returned. +Stack scrubbing involves cooperation between a @code{strub} context, +i.e., a function whose stack frame is to be zeroed-out, and its callers. +The caller initializes a stack watermark, the @code{strub} context +updates the watermark according to its stack use, and the caller zeroes +it out once it regains control, whether by the callee's returning or by +an exception. -For example: +Each of these steps is performed by a different builtin function call. +Calls to these builtins are introduced automatically, in response to +@code{strub} attributes and command-line options; they are not expected +to be explicitly called by source code. -@smallexample -struct foo1 @{ - int counter; - struct bar1 array[] __attribute__((counted_by (counter))); -@} *p; +The functions that implement the builtins are available in libgcc but, +depending on optimization levels, they are expanded internally, adjusted +to account for inlining, and sometimes combined/deferred (e.g. passing +the caller-supplied watermark on to callees, refraining from erasing +stack areas that the caller will) to enable tail calls and to optimize +for code size. -struct foo2 @{ - int other; - struct bar2 array[]; -@} *q; -@end smallexample +@deftypefn {Built-in Function} {void} __builtin___strub_enter (void **@var{wmptr}) +This function initializes a stack @var{watermark} variable with the +current top of the stack. A call to this builtin function is introduced +before entering a @code{strub} context. It remains as a function call +if optimization is not enabled. +@end deftypefn -@noindent -the following call to the built-in +@deftypefn {Built-in Function} {void} __builtin___strub_update (void **@var{wmptr}) +This function updates a stack @var{watermark} variable with the current +top of the stack, if it tops the previous watermark. A call to this +builtin function is inserted within @code{strub} contexts, whenever +additional stack space may have been used. It remains as a function +call at optimization levels lower than 2. +@end deftypefn -@smallexample -__builtin_counted_by_ref (p->array) -@end smallexample +@deftypefn {Built-in Function} {void} __builtin___strub_leave (void **@var{wmptr}) +This function overwrites the memory area between the current top of the +stack, and the @var{watermark}ed address. A call to this builtin +function is inserted after leaving a @code{strub} context. It remains +as a function call at optimization levels lower than 3, and it is guarded by +a condition at level 2. +@end deftypefn -@noindent -returns: +@node Vector Extensions +@section Using Vector Instructions through Built-in Functions -@smallexample -&p->counter with type @code{int *}. -@end smallexample +On some targets, the instruction set contains SIMD vector instructions which +operate on multiple values contained in one large register at the same time. +For example, on the x86 the MMX, 3DNow!@: and SSE extensions can be used +this way. -@noindent -However, the following call to the built-in +The first step in using these extensions is to provide the necessary data +types. This should be done using an appropriate @code{typedef}: @smallexample -__builtin_counted_by_ref (q->array) +typedef int v4si __attribute__ ((vector_size (16))); @end smallexample @noindent -returns a null pointer to @code{void}. 
+The @code{int} type specifies the @dfn{base type} (which can be a
+@code{typedef}), while the attribute specifies the vector size for the
+variable, measured in bytes. For example, the declaration above causes
+the compiler to set the mode for the @code{v4si} type to be 16 bytes wide
+and divided into @code{int} sized units. For a 32-bit @code{int} this
+means a vector of 4 units of 4 bytes, and the corresponding mode of
+@code{v4si} is @acronym{V4SI}.
-@enddefbuiltin
+The @code{vector_size} attribute is only applicable to integral and
+floating scalars, although arrays, pointers, and function return values
+are allowed in conjunction with this construct. Only sizes that are
+positive power-of-two multiples of the base type size are currently allowed.
-@defbuiltin{void __builtin_clear_padding (@var{ptr})}
-The built-in function @code{__builtin_clear_padding} function clears
-padding bits inside of the object representation of object pointed by
-@var{ptr}, which has to be a pointer. The value representation of the
-object is not affected. The type of the object is assumed to be the type
-the pointer points to. Inside of a union, the only cleared bits are
-bits that are padding bits for all the union members.
+All the basic integer types can be used as base types, both as signed
+and as unsigned: @code{char}, @code{short}, @code{int}, @code{long},
+@code{long long}. In addition, @code{float} and @code{double} can be
+used to build floating-point vector types.
-This built-in-function is useful if the padding bits of an object might
-have indeterminate values and the object representation needs to be
-bitwise compared to some other object, for example for atomic operations.
+Specifying a combination that is not valid for the current architecture
+causes GCC to synthesize the instructions using a narrower mode.
+For example, if you specify a variable of type @code{V4SI} and your
+architecture does not allow for this specific SIMD type, GCC
+produces code that uses 4 @code{SIs}.
-For C++, @var{ptr} argument type should be pointer to trivially-copyable
-type, unless the argument is address of a variable or parameter, because
-otherwise it isn't known if the type isn't just a base class whose padding
-bits are reused or laid out differently in a derived class.
-@enddefbuiltin
+The types defined in this manner can be used with a subset of normal C
+operations. Currently, GCC allows using the following operators
+on these types: @code{+, -, *, /, unary minus, ^, |, &, ~, %}@.
-@defbuiltin{@var{type} __builtin_bit_cast (@var{type}, @var{arg})}
-The @code{__builtin_bit_cast} function is available only
-in C++. The built-in is intended to be used by implementations of
-the @code{std::bit_cast} C++ template function. Programs should make
-use of the latter function rather than invoking the built-in directly.
+The operations behave like C++ @code{valarrays}. Addition is defined as
+the addition of the corresponding elements of the operands. For
+example, in the code below, each of the 4 elements in @var{a} is
+added to the corresponding 4 elements in @var{b} and the resulting
+vector is stored in @var{c}.
-This built-in function allows reinterpreting the bits of the @var{arg}
-argument as if it had type @var{type}. @var{type} and the type of the
-@var{arg} argument need to be trivially copyable types with the same size.
-When manifestly constant-evaluated, it performs extra diagnostics required
-for @code{std::bit_cast} and returns a constant expression if @var{arg}
-is a constant expression. 
For more details
-refer to the latest revision of the C++ standard.
-@enddefbuiltin
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
-@defbuiltin{long __builtin_expect (long @var{exp}, long @var{c})}
-@opindex fprofile-arcs
-You may use @code{__builtin_expect} to provide the compiler with
-branch prediction information. In general, you should prefer to
-use actual profile feedback for this (@option{-fprofile-arcs}), as
-programmers are notoriously bad at predicting how their programs
-actually perform. However, there are applications in which this
-data is hard to collect.
+v4si a, b, c;
-The return value is the value of @var{exp}, which should be an integral
-expression. The semantics of the built-in are that it is expected that
-@var{exp} == @var{c}. For example:
-
-@smallexample
-if (__builtin_expect (x, 0))
-  foo ();
-@end smallexample
-
-@noindent
-indicates that we do not expect to call @code{foo}, since
-we expect @code{x} to be zero. Since you are limited to integral
-expressions for @var{exp}, you should use constructions such as
-
-@smallexample
-if (__builtin_expect (ptr != NULL, 1))
-  foo (*ptr);
+c = a + b;
@end smallexample
-
-@noindent
-when testing pointer or floating-point values.
+Subtraction, multiplication, division, and the logical operations
+operate in a similar manner. Likewise, the result of using the unary
+minus or complement operators on a vector type is a vector whose
+elements are the negative or complemented values of the corresponding
+elements in the operand.
-For the purposes of branch prediction optimizations, the probability that
-a @code{__builtin_expect} expression is @code{true} is controlled by GCC's
-@code{builtin-expect-probability} parameter, which defaults to 90%.
+It is possible to use shifting operators @code{<<}, @code{>>} on
+integer-type vectors. The operation is defined as follows: @code{@{a0,
+a1, @dots{}, an@} >> @{b0, b1, @dots{}, bn@} == @{a0 >> b0, a1 >> b1,
+@dots{}, an >> bn@}}@. Unlike OpenCL, values of @code{b} are not
+implicitly taken modulo the bit width of the base type @code{B}, and the
+behavior is undefined if any @code{bi} is greater than or equal to @code{B}.
-You can also use @code{__builtin_expect_with_probability} to explicitly
-assign a probability value to individual expressions. If the built-in
-is used in a loop construct, the provided probability will influence
-the expected number of iterations made by loop optimizations.
-@enddefbuiltin
+In contrast to scalar operations in C and C++, operands of integer vector
+operations do not undergo integer promotions.
-@defbuiltin{long __builtin_expect_with_probability}
-(long @var{exp}, long @var{c}, double @var{probability})
-
-This function has the same semantics as @code{__builtin_expect},
-but the caller provides the expected probability that @var{exp} == @var{c}.
-The last argument, @var{probability}, is a floating-point value in the
-range 0.0 to 1.0, inclusive. The @var{probability} argument must be a
-constant floating-point expression.
-@enddefbuiltin
+Operands of binary vector operations must have the same number of
+elements.
+
+For convenience, it is allowed to use a binary vector operation
+where one operand is a scalar. In that case the compiler transforms
+the scalar operand into a vector where each element is the scalar from
+the operation. The transformation happens only if the scalar could be
+safely converted to the vector-element type.
+Consider the following code. 
-@defbuiltin{void __builtin_trap (void)}
-This function causes the program to exit abnormally. GCC implements
-this function by using a target-dependent mechanism (such as
-intentionally executing an illegal instruction) or by calling
-@code{abort}. The mechanism used may vary from release to release so
-you should not rely on any particular implementation.
-@enddefbuiltin
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
-@defbuiltin{void __builtin_unreachable (void)}
-If control flow reaches the point of the @code{__builtin_unreachable},
-the program is undefined. It is useful in situations where the
-compiler cannot deduce the unreachability of the code.
+v4si a, b, c;
+long l;
-One such case is immediately following an @code{asm} statement that
-either never terminates, or one that transfers control elsewhere
-and never returns. In this example, without the
-@code{__builtin_unreachable}, GCC issues a warning that control
-reaches the end of a non-void function. It also generates code
-to return after the @code{asm}.
+a = b + 1; /* a = b + @{1,1,1,1@}; */
+a = 2 * b; /* a = @{2,2,2,2@} * b; */
-@smallexample
-int f (int c, int v)
-@{
-  if (c)
-    @{
-      return v;
-    @}
-  else
-    @{
-      asm("jmp error_handler");
-      __builtin_unreachable ();
-    @}
-@}
+a = l + a; /* Error, cannot convert long to int. */
@end smallexample
-@noindent
-Because the @code{asm} statement unconditionally transfers control out
-of the function, control never reaches the end of the function
-body. The @code{__builtin_unreachable} is in fact unreachable and
-communicates this fact to the compiler.
+Vectors can be subscripted as if the vector were an array with
+the same number of elements and base type. Out-of-bounds accesses
+invoke undefined behavior at run time. Warnings for out-of-bounds
+accesses for vector subscripting can be enabled with
+@option{-Warray-bounds}.
-Another use for @code{__builtin_unreachable} is following a call a
-function that never returns but that is not declared
-@code{__attribute__((noreturn))}, as in this example:
+Vector comparison is supported with standard comparison
+operators: @code{==, !=, <, <=, >, >=}. Comparison operands can be
+vector expressions of integer-type or real-type. Comparison between
+integer-type vectors and real-type vectors is not supported. The
+result of the comparison is a vector of the same width and number of
+elements as the comparison operands with a signed integral element
+type.
+
+Vectors are compared element-wise producing 0 when comparison is false
+and -1 (constant of the appropriate type where all bits are set)
+otherwise. Consider the following example.
@smallexample
-void function_that_never_returns (void);
+typedef int v4si __attribute__ ((vector_size (16)));
-int g (int c)
-@{
-  if (c)
-    @{
-      return 1;
-    @}
-  else
-    @{
-      function_that_never_returns ();
-      __builtin_unreachable ();
-    @}
-@}
+v4si a = @{1,2,3,4@};
+v4si b = @{3,2,1,4@};
+v4si c;
+
+c = a > b; /* The result would be @{0, 0,-1, 0@} */
+c = a == b; /* The result would be @{0,-1, 0,-1@} */
@end smallexample
-@enddefbuiltin
+In C++, the ternary operator @code{?:} is available. @code{a?b:c}, where
+@code{b} and @code{c} are vectors of the same type and @code{a} is an
+integer vector with the same number of elements of the same size as @code{b}
+and @code{c}, computes all three arguments and creates a vector
+@code{@{a[0]?b[0]:c[0], a[1]?b[1]:c[1], @dots{}@}}. Note that unlike in
+OpenCL, @code{a} is thus interpreted as @code{a != 0} and not @code{a < 0}. 
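+
+For example, the comparison and selection operations can be combined to
+clamp vector elements (a minimal sketch; the types and values are purely
+illustrative, and the selection itself is C++ only):
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si val = @{-5, 10, 250, 300@};
+v4si top = @{255, 255, 255, 255@};
+v4si m;
+
+m = val > top;       /* m is @{0, 0, 0, -1@} */
+val = m ? top : val; /* val is @{-5, 10, 250, 255@} */
+@end smallexample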
+As in the case of binary operations, this syntax is also accepted when +one of @code{b} or @code{c} is a scalar that is then transformed into a +vector. If both @code{b} and @code{c} are scalars and the type of +@code{true?b:c} has the same size as the element type of @code{a}, then +@code{b} and @code{c} are converted to a vector type whose elements have +this type and with the same number of elements as @code{a}. -@defbuiltin{@var{type} __builtin_assoc_barrier (@var{type} @var{expr})} -This built-in inhibits re-association of the floating-point expression -@var{expr} with expressions consuming the return value of the built-in. The -expression @var{expr} itself can be reordered, and the whole expression -@var{expr} can be reordered with operands after the barrier. The barrier is -relevant when @code{-fassociative-math} is active. +In C++, the logic operators @code{!, &&, ||} are available for vectors. +@code{!v} is equivalent to @code{v == 0}, @code{a && b} is equivalent to +@code{a!=0 & b!=0} and @code{a || b} is equivalent to @code{a!=0 | b!=0}. +For mixed operations between a scalar @code{s} and a vector @code{v}, +@code{s && v} is equivalent to @code{s?v!=0:0} (the evaluation is +short-circuit) and @code{v && s} is equivalent to @code{v!=0 & (s?-1:0)}. -@smallexample -float x0 = a + b - b; -float x1 = __builtin_assoc_barrier(a + b) - b; -@end smallexample +@findex __builtin_shuffle +Vector shuffling is available using functions +@code{__builtin_shuffle (vec, mask)} and +@code{__builtin_shuffle (vec0, vec1, mask)}. +Both functions construct a permutation of elements from one or two +vectors and return a vector of the same type as the input vector(s). +The @var{mask} is an integral vector with the same width (@var{W}) +and element count (@var{N}) as the output vector. -@noindent -means that, with @code{-fassociative-math}, @code{x0} can be optimized to -@code{x0 = a} but @code{x1} cannot. +The elements of the input vectors are numbered in memory ordering of +@var{vec0} beginning at 0 and @var{vec1} beginning at @var{N}. The +elements of @var{mask} are considered modulo @var{N} in the single-operand +case and modulo @math{2*@var{N}} in the two-operand case. -It is also relevant when @code{-ffp-contract=fast} is active; -it will prevent contraction between expressions. +Consider the following example, @smallexample -float x0 = a * b + c; -float x1 = __builtin_assoc_barrier (a * b) + c; +typedef int v4si __attribute__ ((vector_size (16))); + +v4si a = @{1,2,3,4@}; +v4si b = @{5,6,7,8@}; +v4si mask1 = @{0,1,1,3@}; +v4si mask2 = @{0,4,2,5@}; +v4si res; + +res = __builtin_shuffle (a, mask1); /* res is @{1,2,2,4@} */ +res = __builtin_shuffle (a, b, mask2); /* res is @{1,5,3,6@} */ @end smallexample -@noindent -means that, with @code{-ffp-contract=fast}, @code{x0} may be optimized to -use a fused multiply-add instruction but @code{x1} cannot. +Note that @code{__builtin_shuffle} is intentionally semantically +compatible with the OpenCL @code{shuffle} and @code{shuffle2} functions. -@enddefbuiltin +You can declare variables and use them in function calls and returns, as +well as in assignments and some casts. You can specify a vector type as +a return type for a function. Vector types can also be used as function +arguments. It is possible to cast from one vector type to another, +provided they are of the same size (in fact, you can also cast vectors +to and from other data types of the same size). 
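+
+Such casts do not convert the element values; they reinterpret the bytes
+of the operand as the new vector type. A minimal sketch (both types are
+assumptions for the example):
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef short v8hi __attribute__ ((vector_size (16)));
+
+v4si a = @{1, 2, 3, 4@};
+v8hi b = (v8hi) a; /* Same 16 bytes, viewed as 8 shorts. */
+@end smallexample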
-@defbuiltin{{void *} __builtin_assume_aligned (const void *@var{exp}, size_t @var{align}, ...)}
-This function returns its first argument, and allows the compiler
-to assume that the returned pointer is at least @var{align} bytes
-aligned. This built-in can have either two or three arguments,
-if it has three, the third argument should have integer type, and
-if it is nonzero means misalignment offset. For example:
+You cannot operate between vectors of different lengths or different
+signedness without a cast.
-@smallexample
-void *x = __builtin_assume_aligned (arg, 16);
-@end smallexample
+@findex __builtin_shufflevector
+Vector shuffling is available using the
+@code{__builtin_shufflevector (vec1, vec2, index...)}
+function. @var{vec1} and @var{vec2} must be expressions with
+vector type with a compatible element type. The result of
+@code{__builtin_shufflevector} is a vector with the same element type
+as @var{vec1} and @var{vec2} but that has an element count equal to
+the number of indices specified.
-@noindent
-means that the compiler can assume @code{x}, set to @code{arg}, is at least
-16-byte aligned, while:
+The @var{index} arguments are a list of integers that specify the
+element indices of the first two vectors that should be extracted and
+returned in a new vector. These element indices are numbered sequentially
+starting with the first vector, continuing into the second vector.
+An index of -1 can be used to indicate that the corresponding element in
+the returned vector is a don't care and can be freely chosen to optimize
+the generated code sequence performing the shuffle operation.
+Consider the following example,
-@smallexample
-void *x = __builtin_assume_aligned (arg, 32, 8);
-@end smallexample
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef int v8si __attribute__ ((vector_size (32)));
-@noindent
-means that the compiler can assume for @code{x}, set to @code{arg}, that
-@code{(char *) x - 8} is 32-byte aligned.
-@enddefbuiltin
+
+v8si a = @{1,-2,3,-4,5,-6,7,-8@};
+v4si b = __builtin_shufflevector (a, a, 0, 2, 4, 6); /* b is @{1,3,5,7@} */
+v4si c = @{-2,-4,-6,-8@};
+v8si d = __builtin_shufflevector (c, b, 4, 0, 5, 1, 6, 2, 7, 3); /* d is a */
+@end smallexample
-@defbuiltin{int __builtin_LINE ()}
-This function is the equivalent of the preprocessor @code{__LINE__}
-macro and returns a constant integer expression that evaluates to
-the line number of the invocation of the built-in. When used as a C++
-default argument for a function @var{F}, it returns the line number
-of the call to @var{F}.
-@enddefbuiltin
-
-@defbuiltin{{const char *} __builtin_FUNCTION ()}
-This function is the equivalent of the @code{__FUNCTION__} symbol
-and returns an address constant pointing to the name of the function
-from which the built-in was invoked, or the empty string if
-the invocation is not at function scope. When used as a C++ default
-argument for a function @var{F}, it returns the name of @var{F}'s
-caller or the empty string if the call was not made at function
-scope.
-@enddefbuiltin
+@findex __builtin_convertvector
+Vector conversion is available using the
+@code{__builtin_convertvector (vec, vectype)}
+function. @var{vec} must be an expression with integral or floating
+vector type and @var{vectype} an integral or floating vector type with the
+same number of elements. The result has @var{vectype} type and value of
+a C cast of every element of @var{vec} to the element type of @var{vectype}. 
-@defbuiltin{{const char *} __builtin_FILE ()}
-This function is the equivalent of the preprocessor @code{__FILE__}
-macro and returns an address constant pointing to the file name
-containing the invocation of the built-in, or the empty string if
-the invocation is not at function scope. When used as a C++ default
-argument for a function @var{F}, it returns the file name of the call
-to @var{F} or the empty string if the call was not made at function
-scope.
+Consider the following example,
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef float v4sf __attribute__ ((vector_size (16)));
+typedef double v4df __attribute__ ((vector_size (32)));
+typedef unsigned long long v4di __attribute__ ((vector_size (32)));
-For example, in the following, each call to function @code{foo} will
-print a line similar to @code{"file.c:123: foo: message"} with the name
-of the file and the line number of the @code{printf} call, the name of
-the function @code{foo}, followed by the word @code{message}.
+v4si a = @{1,-2,3,-4@};
+v4sf b = @{1.5f,-2.5f,3.f,7.f@};
+v4di c = @{1ULL,5ULL,0ULL,10ULL@};
+v4sf d = __builtin_convertvector (a, v4sf); /* d is @{1.f,-2.f,3.f,-4.f@} */
+/* Equivalent of:
+   v4sf d = @{ (float)a[0], (float)a[1], (float)a[2], (float)a[3] @}; */
+v4df e = __builtin_convertvector (a, v4df); /* e is @{1.,-2.,3.,-4.@} */
+v4df f = __builtin_convertvector (b, v4df); /* f is @{1.5,-2.5,3.,7.@} */
+v4si g = __builtin_convertvector (f, v4si); /* g is @{1,-2,3,7@} */
+v4si h = __builtin_convertvector (c, v4si); /* h is @{1,5,0,10@} */
+@end smallexample
+@cindex vector types, using with x86 intrinsics
+Sometimes it is desirable to write code using a mix of generic vector
+operations (for clarity) and machine-specific vector intrinsics (to
+access vector instructions that are not exposed via generic built-ins).
+On x86, intrinsic functions for integer vectors typically use the same
+vector type @code{__m128i} irrespective of how they interpret the vector,
+making it necessary to cast their arguments and return values from/to
+other vector types. In C, you can make use of a @code{union} type:
+@c In C++ such type punning via a union is not allowed by the language
@smallexample
-const char*
-function (const char *func = __builtin_FUNCTION ())
-@{
-  return func;
-@}
+#include <immintrin.h>
+
-void foo (void)
-@{
-  printf ("%s:%i: %s: message\n", file (), line (), function ());
-@}
+typedef unsigned char u8x16 __attribute__ ((vector_size (16)));
+typedef unsigned int u32x4 __attribute__ ((vector_size (16)));
+
+typedef union @{
+  __m128i mm;
+  u8x16 u8;
+  u32x4 u32;
+@} v128;
@end smallexample
-@enddefbuiltin
+@noindent
+for variables that can be used with both built-in operators and x86
+intrinsics:
-@defbuiltin{void __builtin___clear_cache (void *@var{begin}, void *@var{end})}
-This function is used to flush the processor's instruction cache for
-the region of memory between @var{begin} inclusive and @var{end}
-exclusive. Some targets require that the instruction cache be
-flushed, after modifying memory containing code, in order to obtain
-deterministic behavior.
+@smallexample
+v128 x, y = @{ 0 @};
+memcpy (&x, ptr, sizeof x);
+y.u8 += 0x80;
+x.mm = _mm_adds_epu8 (x.mm, y.mm);
+x.u32 &= 0xffffff;
-If the target does not require instruction cache flushes,
-@code{__builtin___clear_cache} has no effect. Otherwise either
-instructions are emitted in-line to clear the instruction cache or a
-call to the @code{__clear_cache} function in libgcc is made. 
-@enddefbuiltin
+/* Instead of a variable, a compound literal may be used to pass the
+   return value of an intrinsic call to a function expecting the union: */
+v128 foo (v128);
+x = foo ((v128) @{_mm_adds_epu8 (x.mm, y.mm)@});
+@c This could be done implicitly with __attribute__((transparent_union)),
+@c but GCC does not accept it for unions of vector types (PR 88955).
+@end smallexample
-@defbuiltin{void __builtin_prefetch (const void *@var{addr}, ...)}
-This function is used to minimize cache-miss latency by moving data into
-a cache before it is accessed.
-You can insert calls to @code{__builtin_prefetch} into code for which
-you know addresses of data in memory that is likely to be accessed soon.
-If the target supports them, data prefetch instructions are generated.
-If the prefetch is done early enough before the access then the data will
-be in the cache by the time it is accessed.
+@node __sync Builtins
+@section Legacy @code{__sync} Built-in Functions for Atomic Memory Access
-The value of @var{addr} is the address of the memory to prefetch.
-There are two optional arguments, @var{rw} and @var{locality}.
-The value of @var{rw} is a compile-time constant zero, one or two; one
-means that the prefetch is preparing for a write to the memory address,
-two means that the prefetch is preparing for a shared read (expected to be
-read by at least one other processor before it is written if written at
-all) and zero, the default, means that the prefetch is preparing for a read.
-The value @var{locality} must be a compile-time constant integer between
-zero and three. A value of zero means that the data has no temporal
-locality, so it need not be left in the cache after the access. A value
-of three means that the data has a high degree of temporal locality and
-should be left in all levels of cache possible. Values of one and two
-mean, respectively, a low or moderate degree of temporal locality. The
-default is three.
+The following built-in functions
+are intended to be compatible with those described
+in the @cite{Intel Itanium Processor-specific Application Binary Interface},
+section 7.4. As such, they depart from normal GCC practice by not using
+the @samp{__builtin_} prefix and also by being overloaded so that they
+work on multiple types.
-@smallexample
-for (i = 0; i < n; i++)
-  @{
-    a[i] = a[i] + b[i];
-    __builtin_prefetch (&a[i+j], 1, 1);
-    __builtin_prefetch (&b[i+j], 0, 1);
-    /* @r{@dots{}} */
-  @}
-@end smallexample
+The definition given in the Intel documentation allows only for the use of
+the types @code{int}, @code{long}, @code{long long} or their unsigned
+counterparts. GCC allows any scalar type that is 1, 2, 4 or 8 bytes in
+size other than the C type @code{_Bool} or the C++ type @code{bool}.
+Operations on pointer arguments are performed as if the operands were
+of the @code{uintptr_t} type. That is, they are not scaled by the size
+of the type to which the pointer points.
-Data prefetch does not generate faults if @var{addr} is invalid, but
-the address expression itself must be valid. For example, a prefetch
-of @code{p->next} does not fault if @code{p->next} is not a valid
-address, but evaluation faults if @code{p} is not a valid address.
+These functions are implemented in terms of the @samp{__atomic}
+builtins (@pxref{__atomic Builtins}). They should not be used for new
+code, which should use the @samp{__atomic} builtins instead. 
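+
+For instance, a legacy atomic increment and one possible replacement
+might look as follows (a sketch; @code{__ATOMIC_SEQ_CST} matches the
+full-barrier behavior of the legacy call, though a weaker order often
+suffices):
+
+@smallexample
+int counter;
+
+void increment (void)
+@{
+  /* Legacy style.  */
+  __sync_fetch_and_add (&counter, 1);
+  /* Preferred style.  */
+  __atomic_fetch_add (&counter, 1, __ATOMIC_SEQ_CST);
+@}
+@end smallexample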
-If the target does not support data prefetch, the address expression
-is evaluated if it includes side effects but no other code is generated
-and GCC does not issue a warning.
-@enddefbuiltin
+Not all operations are supported by all target processors. If a particular
+operation cannot be implemented on the target processor, a call to an
+external function is generated. The external function carries the same name
+as the built-in version, with an additional suffix
+@samp{_@var{n}} where @var{n} is the size of the data type.
-@defbuiltin{{size_t} __builtin_object_size (const void * @var{ptr}, int @var{type})}
-Returns a constant size estimate of an object pointed to by @var{ptr}.
-@xref{Object Size Checking}, for a detailed description of the function.
-@enddefbuiltin
+In most cases, these built-in functions are considered a @dfn{full barrier}.
+That is,
+no memory operand is moved across the operation, either forward or
+backward. Further, instructions are issued as necessary to prevent the
+processor from speculating loads across the operation and from queuing stores
+after the operation.
-@defbuiltin{{size_t} __builtin_dynamic_object_size (const void * @var{ptr}, int @var{type})}
-Similar to @code{__builtin_object_size} except that the return value
-need not be a constant. @xref{Object Size Checking}, for a detailed
-description of the function.
-@enddefbuiltin
+All of the routines are described in the Intel documentation to take
+``an optional list of variables protected by the memory barrier''. It's
+not clear what is meant by that; it could mean that @emph{only} the
+listed variables are protected, or it could mean a list of additional
+variables to be protected. The list is ignored by GCC, which treats it as
+empty. GCC interprets an empty list as meaning that all globally
+accessible variables should be protected.
-@defbuiltin{int __builtin_classify_type (@var{arg})}
-@defbuiltinx{int __builtin_classify_type (@var{type})}
-The @code{__builtin_classify_type} returns a small integer with a category
-of @var{arg} argument's type, like void type, integer type, enumeral type,
-boolean type, pointer type, reference type, offset type, real type, complex
-type, function type, method type, record type, union type, array type,
-string type, bit-precise integer type, vector type, etc. When the argument
-is an expression, for backwards compatibility reason the argument is promoted
-like arguments passed to @code{...} in varargs function, so some classes are
-never returned in certain languages. Alternatively, the argument of the
-built-in function can be a typename, such as the @code{typeof} specifier.
+@defbuiltin{@var{type} __sync_fetch_and_add (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
+@defbuiltinx{@var{type} __sync_fetch_and_sub (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
+@defbuiltinx{@var{type} __sync_fetch_and_or (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
+@defbuiltinx{@var{type} __sync_fetch_and_and (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
+@defbuiltinx{@var{type} __sync_fetch_and_xor (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
+@defbuiltinx{@var{type} __sync_fetch_and_nand (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
+These built-in functions perform the operation suggested by the name, and
+return the value that had previously been in memory. That is, operations
+on integer operands have the following semantics. Operations on pointer
+arguments are performed as if the operands were of the @code{uintptr_t}
+type. 
That is, they are not scaled by the size of the type to which
+the pointer points.
@smallexample
-int a[2];
-__builtin_classify_type (a) == __builtin_classify_type (int[5]);
-__builtin_classify_type (a) == __builtin_classify_type (void*);
-__builtin_classify_type (typeof (a)) == __builtin_classify_type (int[5]);
+@{ tmp = *ptr; *ptr @var{op}= value; return tmp; @}
+@{ tmp = *ptr; *ptr = ~(tmp & value); return tmp; @} // nand
@end smallexample
-The first comparison will never be true, as @var{a} is implicitly converted
-to pointer. The last two comparisons will be true as they classify
-pointers in the second case and arrays in the last case.
-@enddefbuiltin
+The object pointed to by the first argument must be of integer or pointer
+type. It must not be a boolean type.
-@defbuiltin{double __builtin_huge_val (void)}
-Returns a positive infinity, if supported by the floating-point format,
-else @code{DBL_MAX}. This function is suitable for implementing the
-ISO C macro @code{HUGE_VAL}.
+@emph{Note:} GCC 4.4 and later implement @code{__sync_fetch_and_nand}
+as @code{*ptr = ~(tmp & value)} instead of @code{*ptr = ~tmp & value}.
@enddefbuiltin
-@defbuiltin{float __builtin_huge_valf (void)}
-Similar to @code{__builtin_huge_val}, except the return type is @code{float}.
-@enddefbuiltin
+@defbuiltin{@var{type} __sync_add_and_fetch (@var{type} *@var{ptr}, @
+  @var{type} @var{value}, ...)}
+@defbuiltinx{@var{type} __sync_sub_and_fetch (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
+@defbuiltinx{@var{type} __sync_or_and_fetch (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
+@defbuiltinx{@var{type} __sync_and_and_fetch (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
+@defbuiltinx{@var{type} __sync_xor_and_fetch (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
+@defbuiltinx{@var{type} __sync_nand_and_fetch (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
-@defbuiltin{{long double} __builtin_huge_vall (void)}
-Similar to @code{__builtin_huge_val}, except the return
-type is @code{long double}.
-@enddefbuiltin
+These built-in functions perform the operation suggested by the name, and
+return the new value. That is, operations on integer operands have
+the following semantics. Operations on pointer operands are performed as
+if the operand's type were @code{uintptr_t}.
-@defbuiltin{_Float@var{n} __builtin_huge_valf@var{n} (void)}
-Similar to @code{__builtin_huge_val}, except the return type is
-@code{_Float@var{n}}.
-@enddefbuiltin
+@smallexample
+@{ *ptr @var{op}= value; return *ptr; @}
+@{ *ptr = ~(*ptr & value); return *ptr; @} // nand
+@end smallexample
-@defbuiltin{_Float@var{n}x __builtin_huge_valf@var{n}x (void)}
-Similar to @code{__builtin_huge_val}, except the return type is
-@code{_Float@var{n}x}.
+The same constraints on arguments apply as for the corresponding
+@code{__sync_fetch_and_@var{op}} built-in functions.
+
+@emph{Note:} GCC 4.4 and later implement @code{__sync_nand_and_fetch}
+as @code{*ptr = ~(*ptr & value)} instead of
+@code{*ptr = ~*ptr & value}.
@enddefbuiltin
-@defbuiltin{int __builtin_fpclassify (int, int, int, int, int, ...)}
-This built-in implements the C99 fpclassify functionality. The first
-five int arguments should be the target library's notion of the
-possible FP classes and are used for return values. They must be
-constant values and they must appear in this order: @code{FP_NAN},
-@code{FP_INFINITE}, @code{FP_NORMAL}, @code{FP_SUBNORMAL} and
-@code{FP_ZERO}. The ellipsis is for exactly one floating-point value
-to classify. 
GCC treats the last argument as type-generic, which -means it does not do default promotion from float to double. -@enddefbuiltin +@defbuiltin{bool __sync_bool_compare_and_swap (@var{type} *@var{ptr}, @var{type} @var{oldval}, @var{type} @var{newval}, ...)} +@defbuiltinx{@var{type} __sync_val_compare_and_swap (@var{type} *@var{ptr}, @var{type} @var{oldval}, @var{type} @var{newval}, ...)} +These built-in functions perform an atomic compare and swap. +That is, if the current +value of @code{*@var{ptr}} is @var{oldval}, then write @var{newval} into +@code{*@var{ptr}}. -@defbuiltin{double __builtin_inf (void)} -Similar to @code{__builtin_huge_val}, except a warning is generated -if the target floating-point format does not support infinities. +The ``bool'' version returns @code{true} if the comparison is successful and +@var{newval} is written. The ``val'' version returns the contents +of @code{*@var{ptr}} before the operation. @enddefbuiltin -@defbuiltin{_Decimal32 __builtin_infd32 (void)} -Similar to @code{__builtin_inf}, except the return type is @code{_Decimal32}. +@defbuiltin{void __sync_synchronize (...)} +This built-in function issues a full memory barrier. @enddefbuiltin -@defbuiltin{_Decimal64 __builtin_infd64 (void)} -Similar to @code{__builtin_inf}, except the return type is @code{_Decimal64}. -@enddefbuiltin +@defbuiltin{@var{type} __sync_lock_test_and_set (@var{type} *@var{ptr}, @var{type} @var{value}, ...)} +This built-in function, as described by Intel, is not a traditional test-and-set +operation, but rather an atomic exchange operation. It writes @var{value} +into @code{*@var{ptr}}, and returns the previous contents of +@code{*@var{ptr}}. -@defbuiltin{_Decimal128 __builtin_infd128 (void)} -Similar to @code{__builtin_inf}, except the return type is @code{_Decimal128}. -@enddefbuiltin +Many targets have only minimal support for such locks, and do not support +a full exchange operation. In this case, a target may support reduced +functionality here by which the @emph{only} valid value to store is the +immediate constant 1. The exact value actually stored in @code{*@var{ptr}} +is implementation defined. -@defbuiltin{float __builtin_inff (void)} -Similar to @code{__builtin_inf}, except the return type is @code{float}. -This function is suitable for implementing the ISO C99 macro @code{INFINITY}. +This built-in function is not a full barrier, +but rather an @dfn{acquire barrier}. +This means that references after the operation cannot move to (or be +speculated to) before the operation, but previous memory stores may not +be globally visible yet, and previous memory loads may not yet be +satisfied. @enddefbuiltin -@defbuiltin{{long double} __builtin_infl (void)} -Similar to @code{__builtin_inf}, except the return -type is @code{long double}. -@enddefbuiltin +@defbuiltin{void __sync_lock_release (@var{type} *@var{ptr}, ...)} +This built-in function releases the lock acquired by +@code{__sync_lock_test_and_set}. +Normally this means writing the constant 0 to @code{*@var{ptr}}. -@defbuiltin{_Float@var{n} __builtin_inff@var{n} (void)} -Similar to @code{__builtin_inf}, except the return -type is @code{_Float@var{n}}. +This built-in function is not a full barrier, +but rather a @dfn{release barrier}. +This means that all previous memory stores are globally visible, and all +previous memory loads have been satisfied, but following memory reads +are not prevented from being speculated to before the barrier. 
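+
+For example, these two built-in functions can implement a simple spin
+lock (a sketch only; busy-waiting is rarely appropriate outside very
+low-level code):
+
+@smallexample
+static int lock; /* 0 when free, 1 when held.  */
+
+void acquire (void)
+@{
+  while (__sync_lock_test_and_set (&lock, 1))
+    ; /* Spin until the previous value was 0.  */
+@}
+
+void release (void)
+@{
+  __sync_lock_release (&lock); /* Writes 0 with release semantics.  */
+@}
+@end smallexample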
@enddefbuiltin
+@node __atomic Builtins
+@section Built-in Functions for Memory Model Aware Atomic Operations
-@defbuiltin{int __builtin_isinf_sign (...)}
-Similar to @code{isinf}, except the return value is -1 for
-an argument of @code{-Inf} and 1 for an argument of @code{+Inf}.
-Note while the parameter list is an
-ellipsis, this function only accepts exactly one floating-point
-argument. GCC treats this parameter as type-generic, which means it
-does not do default promotion from float to double.
-@enddefbuiltin
+The following built-in functions approximately match the requirements
+for the C++11 memory model. They are all
+identified by being prefixed with @samp{__atomic} and most are
+overloaded so that they work with multiple types.
-@defbuiltin{double __builtin_nan (const char *@var{str})}
-This is an implementation of the ISO C99 function @code{nan}.
+These functions are intended to replace the legacy @samp{__sync}
+builtins. The main difference is that the memory order that is requested
+is a parameter to the functions. New code should always use the
+@samp{__atomic} builtins rather than the @samp{__sync} builtins.
-Since ISO C99 defines this function in terms of @code{strtod}, which we
-do not implement, a description of the parsing is in order. The string
-is parsed as by @code{strtol}; that is, the base is recognized by
-leading @samp{0} or @samp{0x} prefixes. The number parsed is placed
-in the significand such that the least significant bit of the number
-is at the least significant bit of the significand. The number is
-truncated to fit the significand field provided. The significand is
-forced to be a quiet NaN@.
+Note that the @samp{__atomic} builtins assume that programs will
+conform to the C++11 memory model. In particular, they assume
+that programs are free of data races. See the C++11 standard for
+detailed requirements.
-This function, if given a string literal all of which would have been
-consumed by @code{strtol}, is evaluated early enough that it is considered a
-compile-time constant.
-@enddefbuiltin
+The @samp{__atomic} builtins can be used with any integral scalar or
+pointer type that is 1, 2, 4, or 8 bytes in length. 16-byte integral
+types are also allowed if @samp{__int128} (@pxref{__int128}) is
+supported by the architecture.
-@defbuiltin{_Decimal32 __builtin_nand32 (const char *@var{str})}
-Similar to @code{__builtin_nan}, except the return type is @code{_Decimal32}.
-@enddefbuiltin
+The four non-arithmetic functions (load, store, exchange, and
+compare_exchange) all have a generic version as well. This generic
+version works on any data type. It uses the lock-free built-in function
+if the specific data type size makes that possible; otherwise, an
+external call is left to be resolved at run time. This external call uses
+the same format, with the addition of a @samp{size_t} parameter inserted
+as the first parameter indicating the size of the object being pointed to.
+All objects must be the same size.
-@defbuiltin{_Decimal64 __builtin_nand64 (const char *@var{str})}
-Similar to @code{__builtin_nan}, except the return type is @code{_Decimal64}.
-@enddefbuiltin
+There are 6 different memory orders that can be specified. 
These map
+to the C++11 memory orders with the same names; see the C++11 standard
+or the @uref{https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync,GCC wiki
+on atomic synchronization} for detailed definitions. Individual
+targets may also support additional memory orders for use on specific
+architectures. Refer to the target documentation for details of
+these.
-@defbuiltin{_Decimal128 __builtin_nand128 (const char *@var{str})}
-Similar to @code{__builtin_nan}, except the return type is @code{_Decimal128}.
-@enddefbuiltin
+An atomic operation can both constrain code motion and
+be mapped to hardware instructions for synchronization between threads
+(e.g., a fence). To which extent this happens is controlled by the
+memory orders, which are listed here in approximately ascending order of
+strength. The description of each memory order is only meant to roughly
+illustrate the effects and is not a specification; see the C++11
+memory model for precise semantics.
-@defbuiltin{float __builtin_nanf (const char *@var{str})}
-Similar to @code{__builtin_nan}, except the return type is @code{float}.
-@enddefbuiltin
+@table @code
+@item __ATOMIC_RELAXED
+Implies no inter-thread ordering constraints.
+@item __ATOMIC_CONSUME
+This is currently implemented using the stronger @code{__ATOMIC_ACQUIRE}
+memory order because of a deficiency in C++11's semantics for
+@code{memory_order_consume}.
+@item __ATOMIC_ACQUIRE
+Creates an inter-thread happens-before constraint from the release (or
+stronger) semantic store to this acquire load. Can prevent hoisting
+of code to before the operation.
+@item __ATOMIC_RELEASE
+Creates an inter-thread happens-before constraint to acquire (or stronger)
+semantic loads that read from this release store. Can prevent sinking
+of code to after the operation.
+@item __ATOMIC_ACQ_REL
+Combines the effects of both @code{__ATOMIC_ACQUIRE} and
+@code{__ATOMIC_RELEASE}.
+@item __ATOMIC_SEQ_CST
+Enforces total ordering with all other @code{__ATOMIC_SEQ_CST} operations.
+@end table
-@defbuiltin{{long double} __builtin_nanl (const char *@var{str})}
-Similar to @code{__builtin_nan}, except the return type is @code{long double}.
-@enddefbuiltin
+Note that in the C++11 memory model, @emph{fences} (e.g.,
+@samp{__atomic_thread_fence}) take effect in combination with other
+atomic operations on specific memory locations (e.g., atomic loads);
+operations on specific memory locations do not necessarily affect other
+operations in the same way.
-@defbuiltin{_Float@var{n} __builtin_nanf@var{n} (const char *@var{str})}
-Similar to @code{__builtin_nan}, except the return type is
-@code{_Float@var{n}}.
-@enddefbuiltin
+Target architectures are encouraged to provide their own patterns for
+each of the atomic built-in functions. If no target pattern is provided,
+the original
+non-memory model set of @samp{__sync} atomic built-in functions is
+used, along with any required synchronization fences surrounding it in
+order to achieve the proper behavior. Execution in this case is subject
+to the same restrictions as those built-in functions.
-@defbuiltin{_Float@var{n}x __builtin_nanf@var{n}x (const char *@var{str})}
-Similar to @code{__builtin_nan}, except the return type is
-@code{_Float@var{n}x}.
-@enddefbuiltin
+If there is no pattern or mechanism to provide a lock-free instruction
+sequence, a call is made to an external routine with the same parameters
+to be resolved at run time. 
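+
+As an illustration of the acquire/release pairing described above, one
+thread can publish a value for another (a minimal sketch; @code{data},
+@code{ready} and @code{use} are hypothetical):
+
+@smallexample
+extern void use (int);
+int data;
+int ready;
+
+void producer (void)
+@{
+  data = 42;
+  __atomic_store_n (&ready, 1, __ATOMIC_RELEASE);
+@}
+
+void consumer (void)
+@{
+  if (__atomic_load_n (&ready, __ATOMIC_ACQUIRE))
+    use (data); /* Guaranteed to observe data == 42.  */
+@}
+@end smallexample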
-@defbuiltin{double __builtin_nans (const char *@var{str})} -Similar to @code{__builtin_nan}, except the significand is forced -to be a signaling NaN@. The @code{nans} function is proposed by -@uref{https://www.open-std.org/jtc1/sc22/wg14/www/docs/n965.htm,,WG14 N965}. -@enddefbuiltin +When implementing patterns for these built-in functions, the memory order +parameter can be ignored as long as the pattern implements the most +restrictive @code{__ATOMIC_SEQ_CST} memory order. Any of the other memory +orders execute correctly with this memory order but they may not execute as +efficiently as they could with a more appropriate implementation of the +relaxed requirements. -@defbuiltin{_Decimal32 __builtin_nansd32 (const char *@var{str})} -Similar to @code{__builtin_nans}, except the return type is @code{_Decimal32}. -@enddefbuiltin +Note that the C++11 standard allows for the memory order parameter to be +determined at run time rather than at compile time. These built-in +functions map any run-time value to @code{__ATOMIC_SEQ_CST} rather +than invoke a runtime library call or inline a switch statement. This is +standard compliant, safe, and the simplest approach for now. -@defbuiltin{_Decimal64 __builtin_nansd64 (const char *@var{str})} -Similar to @code{__builtin_nans}, except the return type is @code{_Decimal64}. -@enddefbuiltin +The memory order parameter is a signed int, but only the lower 16 bits are +reserved for the memory order. The remainder of the signed int is reserved +for target use and should be 0. Use of the predefined atomic values +ensures proper usage. -@defbuiltin{_Decimal128 __builtin_nansd128 (const char *@var{str})} -Similar to @code{__builtin_nans}, except the return type is @code{_Decimal128}. -@enddefbuiltin +@defbuiltin{@var{type} __atomic_load_n (@var{type} *@var{ptr}, int @var{memorder})} +This built-in function implements an atomic load operation. It returns the +contents of @code{*@var{ptr}}. -@defbuiltin{float __builtin_nansf (const char *@var{str})} -Similar to @code{__builtin_nans}, except the return type is @code{float}. -@enddefbuiltin +The valid memory order variants are +@code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, @code{__ATOMIC_ACQUIRE}, +and @code{__ATOMIC_CONSUME}. -@defbuiltin{{long double} __builtin_nansl (const char *@var{str})} -Similar to @code{__builtin_nans}, except the return type is @code{long double}. @enddefbuiltin -@defbuiltin{_Float@var{n} __builtin_nansf@var{n} (const char *@var{str})} -Similar to @code{__builtin_nans}, except the return type is -@code{_Float@var{n}}. +@defbuiltin{void __atomic_load (@var{type} *@var{ptr}, @var{type} *@var{ret}, int @var{memorder})} +This is the generic version of an atomic load. It returns the +contents of @code{*@var{ptr}} in @code{*@var{ret}}. + @enddefbuiltin -@defbuiltin{_Float@var{n}x __builtin_nansf@var{n}x (const char *@var{str})} -Similar to @code{__builtin_nans}, except the return type is -@code{_Float@var{n}x}. +@defbuiltin{void __atomic_store_n (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +This built-in function implements an atomic store operation. It writes +@code{@var{val}} into @code{*@var{ptr}}. + +The valid memory order variants are +@code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, and @code{__ATOMIC_RELEASE}. + @enddefbuiltin -@defbuiltin{int __builtin_issignaling (...)} -Return non-zero if the argument is a signaling NaN and zero otherwise. -Note while the parameter list is an -ellipsis, this function only accepts exactly one floating-point -argument. 
GCC treats this parameter as type-generic, which means it -does not do default promotion from float to double. -This built-in function can work even without the non-default -@code{-fsignaling-nans} option, although if a signaling NaN is computed, -stored or passed as argument to some function other than this built-in -in the current translation unit, it is safer to use @code{-fsignaling-nans}. -With @code{-ffinite-math-only} option this built-in function will always -return 0. -@enddefbuiltin +@defbuiltin{void __atomic_store (@var{type} *@var{ptr}, @var{type} *@var{val}, int @var{memorder})} +This is the generic version of an atomic store. It stores the value +of @code{*@var{val}} into @code{*@var{ptr}}. -@defbuiltin{int __builtin_ffs (int @var{x})} -Returns one plus the index of the least significant 1-bit of @var{x}, or -if @var{x} is zero, returns zero. @enddefbuiltin -@defbuiltin{int __builtin_clz (unsigned int @var{x})} -Returns the number of leading 0-bits in @var{x}, starting at the most -significant bit position. If @var{x} is 0, the result is undefined. -@enddefbuiltin +@defbuiltin{@var{type} __atomic_exchange_n (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +This built-in function implements an atomic exchange operation. It writes +@var{val} into @code{*@var{ptr}}, and returns the previous contents of +@code{*@var{ptr}}. -@defbuiltin{int __builtin_ctz (unsigned int @var{x})} -Returns the number of trailing 0-bits in @var{x}, starting at the least -significant bit position. If @var{x} is 0, the result is undefined. -@enddefbuiltin +All memory order variants are valid. -@defbuiltin{int __builtin_clrsb (int @var{x})} -Returns the number of leading redundant sign bits in @var{x}, i.e.@: the -number of bits following the most significant bit that are identical -to it. There are no special cases for 0 or other values. @enddefbuiltin -@defbuiltin{int __builtin_popcount (unsigned int @var{x})} -Returns the number of 1-bits in @var{x}. -@enddefbuiltin +@defbuiltin{void __atomic_exchange (@var{type} *@var{ptr}, @var{type} *@var{val}, @var{type} *@var{ret}, int @var{memorder})} +This is the generic version of an atomic exchange. It stores the +contents of @code{*@var{val}} into @code{*@var{ptr}}. The original value +of @code{*@var{ptr}} is copied into @code{*@var{ret}}. -@defbuiltin{int __builtin_parity (unsigned int @var{x})} -Returns the parity of @var{x}, i.e.@: the number of 1-bits in @var{x} -modulo 2. @enddefbuiltin -@defbuiltin{int __builtin_ffsl (long)} -Similar to @code{__builtin_ffs}, except the argument type is -@code{long}. -@enddefbuiltin +@defbuiltin{bool __atomic_compare_exchange_n (@var{type} *@var{ptr}, @var{type} *@var{expected}, @var{type} @var{desired}, bool @var{weak}, int @var{success_memorder}, int @var{failure_memorder})} +This built-in function implements an atomic compare and exchange operation. +This compares the contents of @code{*@var{ptr}} with the contents of +@code{*@var{expected}}. If equal, the operation is a @emph{read-modify-write} +operation that writes @var{desired} into @code{*@var{ptr}}. If they are not +equal, the operation is a @emph{read} and the current contents of +@code{*@var{ptr}} are written into @code{*@var{expected}}. @var{weak} is @code{true} +for weak compare_exchange, which may fail spuriously, and @code{false} for +the strong variation, which never fails spuriously. Many targets +only offer the strong variation and ignore the parameter. When in doubt, use +the strong variation. 
-@defbuiltin{int __builtin_clzl (unsigned long)} -Similar to @code{__builtin_clz}, except the argument type is -@code{unsigned long}. -@enddefbuiltin +If @var{desired} is written into @code{*@var{ptr}} then @code{true} is returned +and memory is affected according to the +memory order specified by @var{success_memorder}. There are no +restrictions on what memory order can be used here. -@defbuiltin{int __builtin_ctzl (unsigned long)} -Similar to @code{__builtin_ctz}, except the argument type is -@code{unsigned long}. -@enddefbuiltin +Otherwise, @code{false} is returned and memory is affected according +to @var{failure_memorder}. This memory order cannot be +@code{__ATOMIC_RELEASE} nor @code{__ATOMIC_ACQ_REL}. It also cannot be a +stronger order than that specified by @var{success_memorder}. -@defbuiltin{int __builtin_clrsbl (long)} -Similar to @code{__builtin_clrsb}, except the argument type is -@code{long}. @enddefbuiltin -@defbuiltin{int __builtin_popcountl (unsigned long)} -Similar to @code{__builtin_popcount}, except the argument type is -@code{unsigned long}. -@enddefbuiltin +@defbuiltin{bool __atomic_compare_exchange (@var{type} *@var{ptr}, @var{type} *@var{expected}, @var{type} *@var{desired}, bool @var{weak}, int @var{success_memorder}, int @var{failure_memorder})} +This built-in function implements the generic version of +@code{__atomic_compare_exchange}. The function is virtually identical to +@code{__atomic_compare_exchange_n}, except the desired value is also a +pointer. -@defbuiltin{int __builtin_parityl (unsigned long)} -Similar to @code{__builtin_parity}, except the argument type is -@code{unsigned long}. @enddefbuiltin -@defbuiltin{int __builtin_ffsll (long long)} -Similar to @code{__builtin_ffs}, except the argument type is -@code{long long}. -@enddefbuiltin +@defbuiltin{@var{type} __atomic_add_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +@defbuiltinx{@var{type} __atomic_sub_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +@defbuiltinx{@var{type} __atomic_and_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +@defbuiltinx{@var{type} __atomic_xor_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +@defbuiltinx{@var{type} __atomic_or_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +@defbuiltinx{@var{type} __atomic_nand_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +These built-in functions perform the operation suggested by the name, and +return the result of the operation. Operations on pointer arguments are +performed as if the operands were of the @code{uintptr_t} type. That is, +they are not scaled by the size of the type to which the pointer points. -@defbuiltin{int __builtin_clzll (unsigned long long)} -Similar to @code{__builtin_clz}, except the argument type is -@code{unsigned long long}. -@enddefbuiltin +@smallexample +@{ *ptr @var{op}= val; return *ptr; @} +@{ *ptr = ~(*ptr & val); return *ptr; @} // nand +@end smallexample -@defbuiltin{int __builtin_ctzll (unsigned long long)} -Similar to @code{__builtin_ctz}, except the argument type is -@code{unsigned long long}. -@enddefbuiltin +The object pointed to by the first argument must be of integer or pointer +type. It must not be a boolean type. All memory orders are valid. -@defbuiltin{int __builtin_clrsbll (long long)} -Similar to @code{__builtin_clrsb}, except the argument type is -@code{long long}. 
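+
+For example, a statistics counter might use these built-in functions as
+follows (a minimal sketch; the names are illustrative only):
+
+@smallexample
+unsigned long hits;
+
+unsigned long
+record_hit (void)
+@{
+  /* Atomically increment hits and return the updated count.  */
+  return __atomic_add_fetch (&hits, 1, __ATOMIC_RELAXED);
+@}
+@end smallexample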
@enddefbuiltin -@defbuiltin{int __builtin_popcountll (unsigned long long)} -Similar to @code{__builtin_popcount}, except the argument type is -@code{unsigned long long}. -@enddefbuiltin +@defbuiltin{@var{type} __atomic_fetch_add (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +@defbuiltinx{@var{type} __atomic_fetch_sub (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +@defbuiltinx{@var{type} __atomic_fetch_and (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +@defbuiltinx{@var{type} __atomic_fetch_xor (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +@defbuiltinx{@var{type} __atomic_fetch_or (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +@defbuiltinx{@var{type} __atomic_fetch_nand (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +These built-in functions perform the operation suggested by the name, and +return the value that had previously been in @code{*@var{ptr}}. Operations +on pointer arguments are performed as if the operands were of +the @code{uintptr_t} type. That is, they are not scaled by the size of +the type to which the pointer points. -@defbuiltin{int __builtin_parityll (unsigned long long)} -Similar to @code{__builtin_parity}, except the argument type is -@code{unsigned long long}. -@enddefbuiltin +@smallexample +@{ tmp = *ptr; *ptr @var{op}= val; return tmp; @} +@{ tmp = *ptr; *ptr = ~(*ptr & val); return tmp; @} // nand +@end smallexample -@defbuiltin{int __builtin_ffsg (...)} -Similar to @code{__builtin_ffs}, except the argument is type-generic -signed integer (standard, extended or bit-precise). No integral argument -promotions are performed on the argument. -@enddefbuiltin +The same constraints on arguments apply as for the corresponding +@code{__atomic_op_fetch} built-in functions. All memory orders are valid. -@defbuiltin{int __builtin_clzg (...)} -Similar to @code{__builtin_clz}, except the argument is type-generic -unsigned integer (standard, extended or bit-precise) and there is -optional second argument with int type. No integral argument promotions -are performed on the first argument. If two arguments are specified, -and first argument is 0, the result is the second argument. If only -one argument is specified and it is 0, the result is undefined. @enddefbuiltin -@defbuiltin{int __builtin_ctzg (...)} -Similar to @code{__builtin_ctz}, except the argument is type-generic -unsigned integer (standard, extended or bit-precise) and there is -optional second argument with int type. No integral argument promotions -are performed on the first argument. If two arguments are specified, -and first argument is 0, the result is the second argument. If only -one argument is specified and it is 0, the result is undefined. -@enddefbuiltin +@defbuiltin{bool __atomic_test_and_set (void *@var{ptr}, int @var{memorder})} -@defbuiltin{int __builtin_clrsbg (...)} -Similar to @code{__builtin_clrsb}, except the argument is type-generic -signed integer (standard, extended or bit-precise). No integral argument -promotions are performed on the argument. -@enddefbuiltin +This built-in function performs an atomic test-and-set operation on +the byte at @code{*@var{ptr}}. The byte is set to some implementation +defined nonzero ``set'' value and the return value is @code{true} if and only +if the previous contents were ``set''. +It should be only used for operands of type @code{bool} or @code{char}. For +other types only part of the value may be set. 
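+
+Together with @code{__atomic_clear} (documented below), this built-in
+function can be used to build a simple spin lock; a minimal illustrative
+sketch, with @code{lock_flag} an arbitrary name:
+
+@smallexample
+static bool lock_flag;
+
+void
+spin_acquire (void)
+@{
+  /* Spin until the flag was previously clear.  */
+  while (__atomic_test_and_set (&lock_flag, __ATOMIC_ACQUIRE))
+    ;
+@}
+
+void
+spin_release (void)
+@{
+  __atomic_clear (&lock_flag, __ATOMIC_RELEASE);
+@}
+@end smallexample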
-@defbuiltin{int __builtin_popcountg (...)} -Similar to @code{__builtin_popcount}, except the argument is type-generic -unsigned integer (standard, extended or bit-precise). No integral argument -promotions are performed on the argument. -@enddefbuiltin +All memory orders are valid. -@defbuiltin{int __builtin_parityg (...)} -Similar to @code{__builtin_parity}, except the argument is type-generic -unsigned integer (standard, extended or bit-precise). No integral argument -promotions are performed on the argument. @enddefbuiltin -@defbuiltin{@var{type} __builtin_stdc_bit_ceil (@var{type} @var{arg})} -The @code{__builtin_stdc_bit_ceil} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{@var{arg} <= 1 ? (@var{type}) 1 -: (@var{type}) 2 << (@var{prec} - 1 - __builtin_clzg ((@var{type}) (@var{arg} - 1)))} -where @var{prec} is bit width of @var{type}, except that side-effects -in @var{arg} are evaluated just once. -@enddefbuiltin +@defbuiltin{void __atomic_clear (bool *@var{ptr}, int @var{memorder})} -@defbuiltin{@var{type} __builtin_stdc_bit_floor (@var{type} @var{arg})} -The @code{__builtin_stdc_bit_floor} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{@var{arg} == 0 ? (@var{type}) 0 -: (@var{type}) 1 << (@var{prec} - 1 - __builtin_clzg (@var{arg}))} -where @var{prec} is bit width of @var{type}, except that side-effects -in @var{arg} are evaluated just once. -@enddefbuiltin +This built-in function performs an atomic clear operation on +@code{*@var{ptr}}. After the operation, @code{*@var{ptr}} contains 0. +It should be only used for operands of type @code{bool} or @code{char} and +in conjunction with @code{__atomic_test_and_set}. +For other types it may only clear partially. If the type is not @code{bool} +prefer using @code{__atomic_store}. -@defbuiltin{{unsigned int} __builtin_stdc_bit_width (@var{type} @var{arg})} -The @code{__builtin_stdc_bit_width} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{(unsigned int) (@var{prec} - __builtin_clzg (@var{arg}, @var{prec}))} -where @var{prec} is bit width of @var{type}. -@enddefbuiltin +The valid memory order variants are +@code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, and +@code{__ATOMIC_RELEASE}. -@defbuiltin{{unsigned int} __builtin_stdc_count_ones (@var{type} @var{arg})} -The @code{__builtin_stdc_count_ones} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{(unsigned int) __builtin_popcountg (@var{arg})} @enddefbuiltin -@defbuiltin{{unsigned int} __builtin_stdc_count_zeros (@var{type} @var{arg})} -The @code{__builtin_stdc_count_zeros} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. 
It is equivalent to -@code{(unsigned int) __builtin_popcountg ((@var{type}) ~@var{arg})} -@enddefbuiltin +@defbuiltin{void __atomic_thread_fence (int @var{memorder})} + +This built-in function acts as a synchronization fence between threads +based on the specified memory order. + +All memory orders are valid. -@defbuiltin{{unsigned int} __builtin_stdc_first_leading_one (@var{type} @var{arg})} -The @code{__builtin_stdc_first_leading_one} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{__builtin_clzg (@var{arg}, -1) + 1U} @enddefbuiltin -@defbuiltin{{unsigned int} __builtin_stdc_first_leading_zero (@var{type} @var{arg})} -The @code{__builtin_stdc_first_leading_zero} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{__builtin_clzg ((@var{type}) ~@var{arg}, -1) + 1U} -@enddefbuiltin +@defbuiltin{void __atomic_signal_fence (int @var{memorder})} -@defbuiltin{{unsigned int} __builtin_stdc_first_trailing_one (@var{type} @var{arg})} -The @code{__builtin_stdc_first_trailing_one} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{__builtin_ctzg (@var{arg}, -1) + 1U} -@enddefbuiltin +This built-in function acts as a synchronization fence between a thread +and signal handlers based in the same thread. -@defbuiltin{{unsigned int} __builtin_stdc_first_trailing_zero (@var{type} @var{arg})} -The @code{__builtin_stdc_first_trailing_zero} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{__builtin_ctzg ((@var{type}) ~@var{arg}, -1) + 1U} -@enddefbuiltin +All memory orders are valid. -@defbuiltin{{unsigned int} __builtin_stdc_has_single_bit (@var{type} @var{arg})} -The @code{__builtin_stdc_has_single_bit} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{(_Bool) (__builtin_popcountg (@var{arg}) == 1)} @enddefbuiltin -@defbuiltin{{unsigned int} __builtin_stdc_leading_ones (@var{type} @var{arg})} -The @code{__builtin_stdc_leading_ones} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{(unsigned int) __builtin_clzg ((@var{type}) ~@var{arg}, @var{prec})} -@enddefbuiltin +@defbuiltin{bool __atomic_always_lock_free (size_t @var{size}, void *@var{ptr})} -@defbuiltin{{unsigned int} __builtin_stdc_leading_zeros (@var{type} @var{arg})} -The @code{__builtin_stdc_leading_zeros} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. 
It is equivalent to -@code{(unsigned int) __builtin_clzg (@var{arg}, @var{prec})} -@enddefbuiltin +This built-in function returns @code{true} if objects of @var{size} bytes always +generate lock-free atomic instructions for the target architecture. +@var{size} must resolve to a compile-time constant and the result also +resolves to a compile-time constant. -@defbuiltin{{unsigned int} __builtin_stdc_trailing_ones (@var{type} @var{arg})} -The @code{__builtin_stdc_trailing_ones} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{(unsigned int) __builtin_ctzg ((@var{type}) ~@var{arg}, @var{prec})} -@enddefbuiltin +@var{ptr} is an optional pointer to the object that may be used to determine +alignment. A value of 0 indicates typical alignment should be used. The +compiler may also ignore this parameter. -@defbuiltin{{unsigned int} __builtin_stdc_trailing_zeros (@var{type} @var{arg})} -The @code{__builtin_stdc_trailing_zeros} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{(unsigned int) __builtin_ctzg (@var{arg}, @var{prec})} -@enddefbuiltin +@smallexample +if (__atomic_always_lock_free (sizeof (long long), 0)) +@end smallexample -@defbuiltin{@var{type1} __builtin_stdc_rotate_left (@var{type1} @var{arg1}, @var{type2} @var{arg2})} -The @code{__builtin_stdc_rotate_left} function is available only -in C. It is type-generic, the first argument can be any unsigned integer -(standard, extended or bit-precise) and second argument any signed or -unsigned integer or @code{char}. No integral argument promotions are -performed on the arguments. It is equivalent to -@code{(@var{type1}) ((@var{arg1} << (@var{arg2} % @var{prec})) -| (@var{arg1} >> ((-(unsigned @var{type2}) @var{arg2}) % @var{prec})))} -where @var{prec} is bit width of @var{type1}, except that side-effects -in @var{arg1} and @var{arg2} are evaluated just once. The behavior is -undefined if @var{arg2} is negative. @enddefbuiltin -@defbuiltin{@var{type1} __builtin_stdc_rotate_right (@var{type1} @var{arg1}, @var{type2} @var{arg2})} -The @code{__builtin_stdc_rotate_right} function is available only -in C. It is type-generic, the first argument can be any unsigned integer -(standard, extended or bit-precise) and second argument any signed or -unsigned integer or @code{char}. No integral argument promotions are -performed on the arguments. It is equivalent to -@code{(@var{type1}) ((@var{arg1} >> (@var{arg2} % @var{prec})) -| (@var{arg1} << ((-(unsigned @var{type2}) @var{arg2}) % @var{prec})))} -where @var{prec} is bit width of @var{type1}, except that side-effects -in @var{arg1} and @var{arg2} are evaluated just once. The behavior is -undefined if @var{arg2} is negative. -@enddefbuiltin +@defbuiltin{bool __atomic_is_lock_free (size_t @var{size}, void *@var{ptr})} -@defbuiltin{double __builtin_powi (double, int)} -@defbuiltinx{float __builtin_powif (float, int)} -@defbuiltinx{{long double} __builtin_powil (long double, int)} -Returns the first argument raised to the power of the second. Unlike the -@code{pow} function no guarantees about precision and rounding are made. 
-@enddefbuiltin
+This built-in function returns @code{true} if objects of @var{size} bytes always
+generate lock-free atomic instructions for the target architecture.  If
+the built-in function is not known to be lock-free, a call is made to a
+runtime routine named @code{__atomic_is_lock_free}.

-@defbuiltin{uint16_t __builtin_bswap16 (uint16_t @var{x})}
-Returns @var{x} with the order of the bytes reversed; for example,
-@code{0xabcd} becomes @code{0xcdab}.  Byte here always means
-exactly 8 bits.
+@var{ptr} is an optional pointer to the object that may be used to determine
+alignment.  A value of 0 indicates typical alignment should be used.  The
+compiler may also ignore this parameter.
 @enddefbuiltin

-@defbuiltin{uint32_t __builtin_bswap32 (uint32_t @var{x})}
-Similar to @code{__builtin_bswap16}, except the argument and return types
-are 32-bit.
-@enddefbuiltin
+@node Integer Overflow Builtins
+@section Built-in Functions to Perform Arithmetic with Overflow Checking

-@defbuiltin{uint64_t __builtin_bswap64 (uint64_t @var{x})}
-Similar to @code{__builtin_bswap32}, except the argument and return types
-are 64-bit.
-@enddefbuiltin
+The following built-in functions allow performing simple arithmetic operations
+together with checking whether the operations overflowed.

-@defbuiltin{uint128_t __builtin_bswap128 (uint128_t @var{x})}
-Similar to @code{__builtin_bswap64}, except the argument and return types
-are 128-bit.  Only supported on targets when 128-bit types are supported.
-@enddefbuiltin
+@defbuiltin{bool __builtin_add_overflow (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} *@var{res})}
+@defbuiltinx{bool __builtin_sadd_overflow (int @var{a}, int @var{b}, int *@var{res})}
+@defbuiltinx{bool __builtin_saddl_overflow (long int @var{a}, long int @var{b}, long int *@var{res})}
+@defbuiltinx{bool __builtin_saddll_overflow (long long int @var{a}, long long int @var{b}, long long int *@var{res})}
+@defbuiltinx{bool __builtin_uadd_overflow (unsigned int @var{a}, unsigned int @var{b}, unsigned int *@var{res})}
+@defbuiltinx{bool __builtin_uaddl_overflow (unsigned long int @var{a}, unsigned long int @var{b}, unsigned long int *@var{res})}
+@defbuiltinx{bool __builtin_uaddll_overflow (unsigned long long int @var{a}, unsigned long long int @var{b}, unsigned long long int *@var{res})}
+These built-in functions promote the first two operands into infinite precision signed
+type and perform addition on those promoted operands.  The result is then
+cast to the type the third pointer argument points to and stored there.
+If the stored result is equal to the infinite precision result, the built-in
+functions return @code{false}, otherwise they return @code{true}.  As the addition is
+performed in infinite signed precision, these built-in functions have fully defined
+behavior for all argument values.

-@defbuiltin{Pmode __builtin_extend_pointer (void * @var{x})}
-On targets where the user visible pointer size is smaller than the size
-of an actual hardware address this function returns the extended user
-pointer.  Targets where this is true included ILP32 mode on x86_64 or
-Aarch64.  This function is mainly useful when writing inline assembly
-code.
-@enddefbuiltin
+The first built-in function allows arbitrary integral types for operands, and
+the result type must be a pointer to some integral type other than an
+enumerated or boolean type; the rest of the built-in functions have explicit
+integer types.
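+
+For example, a length computation can fail cleanly instead of wrapping
+around (a minimal sketch; @code{total_len} and its parameter names are
+illustrative only):
+
+@smallexample
+bool
+total_len (size_t len1, size_t len2, size_t *total)
+@{
+  /* Returns false, rather than wrapping around, when the sum does
+     not fit in a size_t.  */
+  return !__builtin_add_overflow (len1, len2, total);
+@}
+@end smallexample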
-@defbuiltin{int __builtin_goacc_parlevel_id (int @var{x})}
-Returns the openacc gang, worker or vector id depending on whether @var{x} is
-0, 1 or 2.
-@enddefbuiltin
+The compiler will attempt to use hardware instructions to implement
+these built-in functions where possible, like conditional jump on overflow
+after addition, conditional jump on carry etc.

-@defbuiltin{int __builtin_goacc_parlevel_size (int @var{x})}
-Returns the openacc gang, worker or vector size depending on whether @var{x} is
-0, 1 or 2.
 @enddefbuiltin

-@defbuiltin{uint8_t __builtin_rev_crc8_data8 (uint8_t @var{crc}, uint8_t @var{data}, uint8_t @var{poly})}
-Returns the calculated 8-bit bit-reversed CRC using the initial CRC (8-bit),
-data (8-bit) and the polynomial (8-bit).
-@var{crc} is the initial CRC, @var{data} is the data and
-@var{poly} is the polynomial without leading 1.
-Table-based or clmul-based CRC may be used for the
-calculation, depending on the target architecture.
-@enddefbuiltin
+@defbuiltin{bool __builtin_sub_overflow (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} *@var{res})}
+@defbuiltinx{bool __builtin_ssub_overflow (int @var{a}, int @var{b}, int *@var{res})}
+@defbuiltinx{bool __builtin_ssubl_overflow (long int @var{a}, long int @var{b}, long int *@var{res})}
+@defbuiltinx{bool __builtin_ssubll_overflow (long long int @var{a}, long long int @var{b}, long long int *@var{res})}
+@defbuiltinx{bool __builtin_usub_overflow (unsigned int @var{a}, unsigned int @var{b}, unsigned int *@var{res})}
+@defbuiltinx{bool __builtin_usubl_overflow (unsigned long int @var{a}, unsigned long int @var{b}, unsigned long int *@var{res})}
+@defbuiltinx{bool __builtin_usubll_overflow (unsigned long long int @var{a}, unsigned long long int @var{b}, unsigned long long int *@var{res})}

-@defbuiltin{uint16_t __builtin_rev_crc16_data16 (uint16_t @var{crc}, uint16_t @var{data}, uint16_t @var{poly})}
-Similar to @code{__builtin_rev_crc8_data8}, except the argument and return types
-are 16-bit.
-@enddefbuiltin
+These built-in functions are similar to the add overflow checking built-in
+functions above, except that they perform subtraction (subtracting the second
+argument from the first) instead of addition.

-@defbuiltin{uint16_t __builtin_rev_crc16_data8 (uint16_t @var{crc}, uint8_t @var{data}, uint16_t @var{poly})}
-Similar to @code{__builtin_rev_crc16_data16}, except the @var{data} argument
-type is 8-bit.
 @enddefbuiltin

-@defbuiltin{uint32_t __builtin_rev_crc32_data32 (uint32_t @var{crc}, uint32_t @var{data}, uint32_t @var{poly})}
-Similar to @code{__builtin_rev_crc8_data8}, except the argument and return types
-are 32-bit and for the CRC calculation may be also used crc* machine instruction
-depending on the target and the polynomial.
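+
+For example, unsigned wrap-around on subtraction can be detected and
+clamped (a minimal sketch; the names are illustrative only):
+
+@smallexample
+unsigned int
+clamped_sub (unsigned int budget, unsigned int cost)
+@{
+  unsigned int remaining;
+  /* On wrap-around, clamp the result to 0.  */
+  if (__builtin_sub_overflow (budget, cost, &remaining))
+    return 0;
+  return remaining;
+@}
+@end smallexample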
-@enddefbuiltin
+@defbuiltin{bool __builtin_mul_overflow (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} *@var{res})}
+@defbuiltinx{bool __builtin_smul_overflow (int @var{a}, int @var{b}, int *@var{res})}
+@defbuiltinx{bool __builtin_smull_overflow (long int @var{a}, long int @var{b}, long int *@var{res})}
+@defbuiltinx{bool __builtin_smulll_overflow (long long int @var{a}, long long int @var{b}, long long int *@var{res})}
+@defbuiltinx{bool __builtin_umul_overflow (unsigned int @var{a}, unsigned int @var{b}, unsigned int *@var{res})}
+@defbuiltinx{bool __builtin_umull_overflow (unsigned long int @var{a}, unsigned long int @var{b}, unsigned long int *@var{res})}
+@defbuiltinx{bool __builtin_umulll_overflow (unsigned long long int @var{a}, unsigned long long int @var{b}, unsigned long long int *@var{res})}

-@defbuiltin{uint32_t __builtin_rev_crc32_data8 (uint32_t @var{crc}, uint8_t @var{data}, uint32_t @var{poly})}
-Similar to @code{__builtin_rev_crc32_data32}, except the @var{data} argument
-type is 8-bit.
-@enddefbuiltin
+These built-in functions are similar to the add overflow checking built-in
+functions above, except that they perform multiplication instead of addition.

-@defbuiltin{uint32_t __builtin_rev_crc32_data16 (uint32_t @var{crc}, uint16_t @var{data}, uint32_t @var{poly})}
-Similar to @code{__builtin_rev_crc32_data32}, except the @var{data} argument
-type is 16-bit.
 @enddefbuiltin

-@defbuiltin{uint64_t __builtin_rev_crc64_data64 (uint64_t @var{crc}, uint64_t @var{data}, uint64_t @var{poly})}
-Similar to @code{__builtin_rev_crc8_data8}, except the argument and return types
-are 64-bit.
-@enddefbuiltin
+The following built-in functions allow checking whether a simple arithmetic
+operation would overflow.

-@defbuiltin{uint64_t __builtin_rev_crc64_data8 (uint64_t @var{crc}, uint8_t @var{data}, uint64_t @var{poly})}
-Similar to @code{__builtin_rev_crc64_data64}, except the @var{data} argument type
-is 8-bit.
-@enddefbuiltin
+@defbuiltin{bool __builtin_add_overflow_p (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} @var{c})}
+@defbuiltinx{bool __builtin_sub_overflow_p (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} @var{c})}
+@defbuiltinx{bool __builtin_mul_overflow_p (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} @var{c})}

-@defbuiltin{uint64_t __builtin_rev_crc64_data16 (uint64_t @var{crc}, uint16_t @var{data}, uint64_t @var{poly})}
-Similar to @code{__builtin_rev_crc64_data64}, except the @var{data} argument type
-is 16-bit.
-@enddefbuiltin
+These built-in functions are similar to @code{__builtin_add_overflow},
+@code{__builtin_sub_overflow}, or @code{__builtin_mul_overflow}, except that
+they don't store the result of the arithmetic operation anywhere and the
+last argument is not a pointer, but some expression with integral type other
+than enumerated or boolean type.

-@defbuiltin{uint64_t __builtin_rev_crc64_data32 (uint64_t @var{crc}, uint32_t @var{data}, uint64_t @var{poly})}
-Similar to @code{__builtin_rev_crc64_data64}, except the @var{data} argument type
-is 32-bit.
-@enddefbuiltin
+The built-in functions promote the first two operands into infinite precision signed type
+and perform the respective arithmetic operation on those promoted operands.  The result
+is then cast to the type of the third argument.  If the cast result is equal to the infinite
+precision result, the built-in functions return @code{false}, otherwise they return @code{true}.
+The value of the third argument is ignored, just the side effects in the third argument +are evaluated, and no integral argument promotions are performed on the last argument. +If the third argument is a bit-field, the type used for the result cast has the +precision and signedness of the given bit-field, rather than precision and signedness +of the underlying type. -@defbuiltin{uint8_t __builtin_crc8_data8 (uint8_t @var{crc}, uint8_t @var{data}, uint8_t @var{poly})} -Returns the calculated 8-bit bit-forward CRC using the initial CRC (8-bit), -data (8-bit) and the polynomial (8-bit). -@var{crc} is the initial CRC, @var{data} is the data and -@var{poly} is the polynomial without leading 1. -Table-based or clmul-based CRC may be used for the -calculation, depending on the target architecture. -@enddefbuiltin +For example, the following macro can be used to portably check, at +compile-time, whether or not adding two constant integers will overflow, +and perform the addition only when it is known to be safe and not to trigger +a @option{-Woverflow} warning. -@defbuiltin{uint16_t __builtin_crc16_data16 (uint16_t @var{crc}, uint16_t @var{data}, uint16_t @var{poly})} -Similar to @code{__builtin_crc8_data8}, except the argument and return types -are 16-bit. -@enddefbuiltin +@smallexample +#define INT_ADD_OVERFLOW_P(a, b) \ + __builtin_add_overflow_p (a, b, (__typeof__ ((a) + (b))) 0) -@defbuiltin{uint16_t __builtin_crc16_data8 (uint16_t @var{crc}, uint8_t @var{data}, uint16_t @var{poly})} -Similar to @code{__builtin_crc16_data16}, except the @var{data} argument type -is 8-bit. -@enddefbuiltin - -@defbuiltin{uint32_t __builtin_crc32_data32 (uint32_t @var{crc}, uint32_t @var{data}, uint32_t @var{poly})} -Similar to @code{__builtin_crc8_data8}, except the argument and return types -are 32-bit. -@enddefbuiltin - -@defbuiltin{uint32_t __builtin_crc32_data8 (uint32_t @var{crc}, uint8_t @var{data}, uint32_t @var{poly})} -Similar to @code{__builtin_crc32_data32}, except the @var{data} argument type -is 8-bit. -@enddefbuiltin +enum @{ + A = INT_MAX, B = 3, + C = INT_ADD_OVERFLOW_P (A, B) ? 0 : A + B, + D = __builtin_add_overflow_p (1, SCHAR_MAX, (signed char) 0) +@}; +@end smallexample -@defbuiltin{uint32_t __builtin_crc32_data16 (uint32_t @var{crc}, uint16_t @var{data}, uint32_t @var{poly})} -Similar to @code{__builtin_crc32_data32}, except the @var{data} argument type -is 16-bit. +The compiler will attempt to use hardware instructions to implement +these built-in functions where possible, like conditional jump on overflow +after addition, conditional jump on carry etc. + @enddefbuiltin -@defbuiltin{uint64_t __builtin_crc64_data64 (uint64_t @var{crc}, uint64_t @var{data}, uint64_t @var{poly})} -Similar to @code{__builtin_crc8_data8}, except the argument and return types -are 64-bit. -@enddefbuiltin +@defbuiltin{{unsigned int} __builtin_addc (unsigned int @var{a}, unsigned int @var{b}, unsigned int @var{carry_in}, unsigned int *@var{carry_out})} +@defbuiltinx{{unsigned long int} __builtin_addcl (unsigned long int @var{a}, unsigned long int @var{b}, unsigned int @var{carry_in}, unsigned long int *@var{carry_out})} +@defbuiltinx{{unsigned long long int} __builtin_addcll (unsigned long long int @var{a}, unsigned long long int @var{b}, unsigned long long int @var{carry_in}, unsigned long long int *@var{carry_out})} -@defbuiltin{uint64_t __builtin_crc64_data8 (uint64_t @var{crc}, uint8_t @var{data}, uint64_t @var{poly})} -Similar to @code{__builtin_crc64_data64}, except the @var{data} argument type -is 8-bit. 
-@enddefbuiltin +These built-in functions are equivalent to: +@smallexample + (@{ __typeof__ (@var{a}) s; \ + __typeof__ (@var{a}) c1 = __builtin_add_overflow (@var{a}, @var{b}, &s); \ + __typeof__ (@var{a}) c2 = __builtin_add_overflow (s, @var{carry_in}, &s); \ + *(@var{carry_out}) = c1 | c2; \ + s; @}) +@end smallexample -@defbuiltin{uint64_t __builtin_crc64_data16 (uint64_t @var{crc}, uint16_t @var{data}, uint64_t @var{poly})} -Similar to @code{__builtin_crc64_data64}, except the @var{data} argument type -is 16-bit. -@enddefbuiltin +i.e.@: they add 3 unsigned values, set what the last argument +points to to 1 if any of the two additions overflowed (otherwise 0) +and return the sum of those 3 unsigned values. Note, while all +the first 3 arguments can have arbitrary values, better code will be +emitted if one of them (preferably the third one) has only values +0 or 1 (i.e.@: carry-in). -@defbuiltin{uint64_t __builtin_crc64_data32 (uint64_t @var{crc}, uint32_t @var{data}, uint64_t @var{poly})} -Similar to @code{__builtin_crc64_data64}, except the @var{data} argument type -is 32-bit. @enddefbuiltin -@node Target Builtins -@section Built-in Functions Specific to Particular Target Machines - -On some target machines, GCC supports many built-in functions specific -to those machines. Generally these generate calls to specific machine -instructions, but allow the compiler to schedule those calls. - -@menu -* AArch64 Built-in Functions:: -* Alpha Built-in Functions:: -* ARC Built-in Functions:: -* ARC SIMD Built-in Functions:: -* ARM iWMMXt Built-in Functions:: -* ARM C Language Extensions (ACLE):: -* ARM Floating Point Status and Control Intrinsics:: -* ARM ARMv8-M Security Extensions:: -* AVR Built-in Functions:: -* Blackfin Built-in Functions:: -* BPF Built-in Functions:: -* FR-V Built-in Functions:: -* LoongArch Base Built-in Functions:: -* LoongArch SX Vector Intrinsics:: -* LoongArch ASX Vector Intrinsics:: -* MIPS DSP Built-in Functions:: -* MIPS Paired-Single Support:: -* MIPS Loongson Built-in Functions:: -* MIPS SIMD Architecture (MSA) Support:: -* Other MIPS Built-in Functions:: -* MSP430 Built-in Functions:: -* NDS32 Built-in Functions:: -* Nvidia PTX Built-in Functions:: -* Basic PowerPC Built-in Functions:: -* PowerPC AltiVec/VSX Built-in Functions:: -* PowerPC Hardware Transactional Memory Built-in Functions:: -* PowerPC Atomic Memory Operation Functions:: -* PowerPC Matrix-Multiply Assist Built-in Functions:: -* PRU Built-in Functions:: -* RISC-V Built-in Functions:: -* RISC-V Vector Intrinsics:: -* CORE-V Built-in Functions:: -* RX Built-in Functions:: -* S/390 System z Built-in Functions:: -* SH Built-in Functions:: -* SPARC VIS Built-in Functions:: -* TI C6X Built-in Functions:: -* x86 Built-in Functions:: -* x86 transactional memory intrinsics:: -* x86 control-flow protection intrinsics:: -@end menu - -@node AArch64 Built-in Functions -@subsection AArch64 Built-in Functions +@defbuiltin{{unsigned int} __builtin_subc (unsigned int @var{a}, unsigned int @var{b}, unsigned int @var{carry_in}, unsigned int *@var{carry_out})} +@defbuiltinx{{unsigned long int} __builtin_subcl (unsigned long int @var{a}, unsigned long int @var{b}, unsigned int @var{carry_in}, unsigned long int *@var{carry_out})} +@defbuiltinx{{unsigned long long int} __builtin_subcll (unsigned long long int @var{a}, unsigned long long int @var{b}, unsigned long long int @var{carry_in}, unsigned long long int *@var{carry_out})} -These built-in functions are available for the AArch64 family of -processors. 
+These built-in functions are equivalent to:
 @smallexample
-unsigned int __builtin_aarch64_get_fpcr ();
-void __builtin_aarch64_set_fpcr (unsigned int);
-unsigned int __builtin_aarch64_get_fpsr ();
-void __builtin_aarch64_set_fpsr (unsigned int);
-
-unsigned long long __builtin_aarch64_get_fpcr64 ();
-void __builtin_aarch64_set_fpcr64 (unsigned long long);
-unsigned long long __builtin_aarch64_get_fpsr64 ();
-void __builtin_aarch64_set_fpsr64 (unsigned long long);
+ (@{ __typeof__ (@var{a}) s; \
+ __typeof__ (@var{a}) c1 = __builtin_sub_overflow (@var{a}, @var{b}, &s); \
+ __typeof__ (@var{a}) c2 = __builtin_sub_overflow (s, @var{carry_in}, &s); \
+ *(@var{carry_out}) = c1 | c2; \
+ s; @})
 @end smallexample

-@node Alpha Built-in Functions
-@subsection Alpha Built-in Functions
-
-These built-in functions are available for the Alpha family of
-processors, depending on the command-line switches used.
+i.e.@: they subtract 2 unsigned values from the first unsigned value,
+set what the last argument points to to 1 if any of the two subtractions
+overflowed (otherwise 0) and return the result of the subtractions.
+Note, while all the first 3 arguments can have arbitrary values, better code
+will be emitted if one of them (preferably the third one) has only values
+0 or 1 (i.e.@: carry-in).

-The following built-in functions are always available.  They
-all generate the machine instruction that is part of the name.
+@enddefbuiltin

-@smallexample
-long __builtin_alpha_implver (void);
-long __builtin_alpha_rpcc (void);
-long __builtin_alpha_amask (long);
-long __builtin_alpha_cmpbge (long, long);
-long __builtin_alpha_extbl (long, long);
-long __builtin_alpha_extwl (long, long);
-long __builtin_alpha_extll (long, long);
-long __builtin_alpha_extql (long, long);
-long __builtin_alpha_extwh (long, long);
-long __builtin_alpha_extlh (long, long);
-long __builtin_alpha_extqh (long, long);
-long __builtin_alpha_insbl (long, long);
-long __builtin_alpha_inswl (long, long);
-long __builtin_alpha_insll (long, long);
-long __builtin_alpha_insql (long, long);
-long __builtin_alpha_inswh (long, long);
-long __builtin_alpha_inslh (long, long);
-long __builtin_alpha_insqh (long, long);
-long __builtin_alpha_mskbl (long, long);
-long __builtin_alpha_mskwl (long, long);
-long __builtin_alpha_mskll (long, long);
-long __builtin_alpha_mskql (long, long);
-long __builtin_alpha_mskwh (long, long);
-long __builtin_alpha_msklh (long, long);
-long __builtin_alpha_mskqh (long, long);
-long __builtin_alpha_umulh (long, long);
-long __builtin_alpha_zap (long, long);
-long __builtin_alpha_zapnot (long, long);
-@end smallexample
+@node x86 specific memory model extensions for transactional memory
+@section x86-Specific Memory Model Extensions for Transactional Memory

-The following built-in functions are always with @option{-mmax}
-or @option{-mcpu=@var{cpu}} where @var{cpu} is @code{pca56} or
-later.  They all generate the machine instruction that is part
-of the name.
+The x86 architecture supports additional memory ordering flags
+to mark critical sections for hardware lock elision.
+These must be specified, in addition to an existing memory order, to
+atomic intrinsics.
-@smallexample
-long __builtin_alpha_pklb (long);
-long __builtin_alpha_pkwb (long);
-long __builtin_alpha_unpkbl (long);
-long __builtin_alpha_unpkbw (long);
-long __builtin_alpha_minub8 (long, long);
-long __builtin_alpha_minsb8 (long, long);
-long __builtin_alpha_minuw4 (long, long);
-long __builtin_alpha_minsw4 (long, long);
-long __builtin_alpha_maxub8 (long, long);
-long __builtin_alpha_maxsb8 (long, long);
-long __builtin_alpha_maxuw4 (long, long);
-long __builtin_alpha_maxsw4 (long, long);
-long __builtin_alpha_perr (long, long);
-@end smallexample
+@table @code
+@item __ATOMIC_HLE_ACQUIRE
+Start lock elision on a lock variable.
+Memory order must be @code{__ATOMIC_ACQUIRE} or stronger.
+@item __ATOMIC_HLE_RELEASE
+End lock elision on a lock variable.
+Memory order must be @code{__ATOMIC_RELEASE} or stronger.
+@end table

-The following built-in functions are always with @option{-mcix}
-or @option{-mcpu=@var{cpu}} where @var{cpu} is @code{ev67} or
-later.  They all generate the machine instruction that is part
-of the name.
+When a lock acquire fails, it is required for good performance to abort
+the transaction quickly.  This can be done with a @code{_mm_pause}.

 @smallexample
-long __builtin_alpha_cttz (long);
-long __builtin_alpha_ctlz (long);
-long __builtin_alpha_ctpop (long);
-@end smallexample
+#include <immintrin.h> // For _mm_pause

-The following built-in functions are available on systems that use the OSF/1
-PALcode.  Normally they invoke the @code{rduniq} and @code{wruniq}
-PAL calls, but when invoked with @option{-mtls-kernel}, they invoke
-@code{rdval} and @code{wrval}.
+int lockvar;

-@smallexample
-void *__builtin_thread_pointer (void);
-void __builtin_set_thread_pointer (void *);
+/* Acquire lock with lock elision */
+while (__atomic_exchange_n(&lockvar, 1, __ATOMIC_ACQUIRE|__ATOMIC_HLE_ACQUIRE))
+    _mm_pause(); /* Abort failed transaction */
+...
+/* Free lock with lock elision */
+__atomic_store_n(&lockvar, 0, __ATOMIC_RELEASE|__ATOMIC_HLE_RELEASE);
 @end smallexample

-@node ARC Built-in Functions
-@subsection ARC Built-in Functions
+@node Object Size Checking
+@section Object Size Checking

-The following built-in functions are provided for ARC targets.  The
-built-ins generate the corresponding assembly instructions.  In the
-examples given below, the generated code often requires an operand or
-result to be in a register.  Where necessary further code will be
-generated to ensure this is true, but for brevity this is not
-described in each case.
+@subsection Object Size Checking Built-in Functions
+@findex __builtin___memcpy_chk
+@findex __builtin___mempcpy_chk
+@findex __builtin___memmove_chk
+@findex __builtin___memset_chk
+@findex __builtin___strcpy_chk
+@findex __builtin___stpcpy_chk
+@findex __builtin___strncpy_chk
+@findex __builtin___strcat_chk
+@findex __builtin___strncat_chk

-@emph{Note:} Using a built-in to generate an instruction not supported
-by a target may cause problems. At present the compiler is not
-guaranteed to detect such misuse, and as a result an internal compiler
-error may be generated.
-
-@defbuiltin{int __builtin_arc_aligned (void *@var{val}, int @var{alignval})}
-Return 1 if @var{val} is known to have the byte alignment given
-by @var{alignval}, otherwise return 0.
-Note that this is different from
-@smallexample
-__alignof__(*(char *)@var{val}) >= alignval
-@end smallexample
-because __alignof__ sees only the type of the dereference, whereas
-__builtin_arc_align uses alignment information from the pointer
-as well as from the pointed-to type.
-The information available will depend on optimization level. -@enddefbuiltin +GCC implements a limited buffer overflow protection mechanism that can +prevent some buffer overflow attacks by determining the sizes of objects +into which data is about to be written and preventing the writes when +the size isn't sufficient. The built-in functions described below yield +the best results when used together and when optimization is enabled. +For example, to detect object sizes across function boundaries or to +follow pointer assignments through non-trivial control flow they rely +on various optimization passes enabled with @option{-O2}. However, to +a limited extent, they can be used without optimization as well. -@defbuiltin{void __builtin_arc_brk (void)} -Generates -@example -brk -@end example -@enddefbuiltin +@defbuiltin{size_t __builtin_object_size (const void * @var{ptr}, int @var{type})} +is a built-in construct that returns a constant number of bytes from +@var{ptr} to the end of the object @var{ptr} pointer points to +(if known at compile time). To determine the sizes of dynamically allocated +objects the function relies on the allocation functions called to obtain +the storage to be declared with the @code{alloc_size} attribute (@pxref{Common +Function Attributes}). @code{__builtin_object_size} never evaluates +its arguments for side effects. If there are any side effects in them, it +returns @code{(size_t) -1} for @var{type} 0 or 1 and @code{(size_t) 0} +for @var{type} 2 or 3. If there are multiple objects @var{ptr} can +point to and all of them are known at compile time, the returned number +is the maximum of remaining byte counts in those objects if @var{type} & 2 is +0 and minimum if nonzero. If it is not possible to determine which objects +@var{ptr} points to at compile time, @code{__builtin_object_size} should +return @code{(size_t) -1} for @var{type} 0 or 1 and @code{(size_t) 0} +for @var{type} 2 or 3. -@defbuiltin{{unsigned int} __builtin_arc_core_read (unsigned int @var{regno})} -The operand is the number of a register to be read. Generates: -@example -mov @var{dest}, r@var{regno} -@end example -where the value in @var{dest} will be the result returned from the -built-in. -@enddefbuiltin +@var{type} is an integer constant from 0 to 3. If the least significant +bit is clear, objects are whole variables, if it is set, a closest +surrounding subobject is considered the object a pointer points to. +The second bit determines if maximum or minimum of remaining bytes +is computed. -@defbuiltin{void __builtin_arc_core_write (unsigned int @var{regno}, unsigned int @var{val})} -The first operand is the number of a register to be written, the -second operand is a compile time constant to write into that -register. Generates: -@example -mov r@var{regno}, @var{val} -@end example -@enddefbuiltin +@smallexample +struct V @{ char buf1[10]; int b; char buf2[10]; @} var; +char *p = &var.buf1[1], *q = &var.b; -@defbuiltin{int __builtin_arc_divaw (int @var{a}, int @var{b})} -Only available if either @option{-mcpu=ARC700} or @option{-meA} is set. -Generates: -@example -divaw @var{dest}, @var{a}, @var{b} -@end example -where the value in @var{dest} will be the result returned from the -built-in. +/* Here the object p points to is var. */ +assert (__builtin_object_size (p, 0) == sizeof (var) - 1); +/* The subobject p points to is var.buf1. */ +assert (__builtin_object_size (p, 1) == sizeof (var.buf1) - 1); +/* The object q points to is var. 
*/ +assert (__builtin_object_size (q, 0) + == (char *) (&var + 1) - (char *) &var.b); +/* The subobject q points to is var.b. */ +assert (__builtin_object_size (q, 1) == sizeof (var.b)); +@end smallexample @enddefbuiltin -@defbuiltin{void __builtin_arc_flag (unsigned int @var{a})} -Generates -@example -flag @var{a} -@end example +@defbuiltin{{size_t} __builtin_dynamic_object_size (const void * @var{ptr}, int @var{type})} +is similar to @code{__builtin_object_size} in that it returns a number of bytes +from @var{ptr} to the end of the object @var{ptr} pointer points to, except +that the size returned may not be a constant. This results in successful +evaluation of object size estimates in a wider range of use cases and can be +more precise than @code{__builtin_object_size}, but it incurs a performance +penalty since it may add a runtime overhead on size computation. Semantics of +@var{type} as well as return values in case it is not possible to determine +which objects @var{ptr} points to at compile time are the same as in the case +of @code{__builtin_object_size}. @enddefbuiltin -@defbuiltin{{unsigned int} __builtin_arc_lr (unsigned int @var{auxr})} -The operand, @var{auxv}, is the address of an auxiliary register and -must be a compile time constant. Generates: -@example -lr @var{dest}, [@var{auxr}] -@end example -Where the value in @var{dest} will be the result returned from the -built-in. -@enddefbuiltin +@subsection Object Size Checking and Source Fortification -@defbuiltin{void __builtin_arc_mul64 (int @var{a}, int @var{b})} -Only available with @option{-mmul64}. Generates: -@example -mul64 @var{a}, @var{b} -@end example -@enddefbuiltin +Hardening of function calls using the @code{_FORTIFY_SOURCE} macro is +one of the key uses of the object size checking built-in functions. To +make implementation of these features more convenient and improve +optimization and diagnostics, there are built-in functions added for +many common string operation functions, e.g., for @code{memcpy} +@code{__builtin___memcpy_chk} built-in is provided. This built-in has +an additional last argument, which is the number of bytes remaining in +the object the @var{dest} argument points to or @code{(size_t) -1} if +the size is not known. -@defbuiltin{void __builtin_arc_mulu64 (unsigned int @var{a}, unsigned int @var{b})} -Only available with @option{-mmul64}. Generates: -@example -mulu64 @var{a}, @var{b} -@end example -@enddefbuiltin +The built-in functions are optimized into the normal string functions +like @code{memcpy} if the last argument is @code{(size_t) -1} or if +it is known at compile time that the destination object will not +be overflowed. If the compiler can determine at compile time that the +object will always be overflowed, it issues a warning. -@defbuiltin{void __builtin_arc_nop (void)} -Generates: -@example -nop -@end example -@enddefbuiltin +The intended use can be e.g.@: -@defbuiltin{int __builtin_arc_norm (int @var{src})} -Only valid if the @samp{norm} instruction is available through the -@option{-mnorm} option or by default with @option{-mcpu=ARC700}. -Generates: -@example -norm @var{dest}, @var{src} -@end example -Where the value in @var{dest} will be the result returned from the -built-in. 
-@enddefbuiltin +@smallexample +#undef memcpy +#define bos0(dest) __builtin_object_size (dest, 0) +#define memcpy(dest, src, n) \ + __builtin___memcpy_chk (dest, src, n, bos0 (dest)) -@defbuiltin{{short int} __builtin_arc_normw (short int @var{src})} -Only valid if the @samp{normw} instruction is available through the -@option{-mnorm} option or by default with @option{-mcpu=ARC700}. -Generates: -@example -normw @var{dest}, @var{src} -@end example -Where the value in @var{dest} will be the result returned from the -built-in. -@enddefbuiltin +char *volatile p; +char buf[10]; +/* It is unknown what object p points to, so this is optimized + into plain memcpy - no checking is possible. */ +memcpy (p, "abcde", n); +/* Destination is known and length too. It is known at compile + time there will be no overflow. */ +memcpy (&buf[5], "abcde", 5); +/* Destination is known, but the length is not known at compile time. + This will result in __memcpy_chk call that can check for overflow + at run time. */ +memcpy (&buf[5], "abcde", n); +/* Destination is known and it is known at compile time there will + be overflow. There will be a warning and __memcpy_chk call that + will abort the program at run time. */ +memcpy (&buf[6], "abcde", 5); +@end smallexample -@defbuiltin{void __builtin_arc_rtie (void)} -Generates: -@example -rtie -@end example -@enddefbuiltin +Such built-in functions are provided for @code{memcpy}, @code{mempcpy}, +@code{memmove}, @code{memset}, @code{strcpy}, @code{stpcpy}, @code{strncpy}, +@code{strcat} and @code{strncat}. -@defbuiltin{void __builtin_arc_sleep (int @var{a}} -Generates: -@example -sleep @var{a} -@end example -@enddefbuiltin +@subsubsection Formatted Output Function Checking +@defbuiltin{int __builtin___sprintf_chk @ + (char *@var{s}, int @var{flag}, size_t @var{os}, @ + const char *@var{fmt}, ...)} +@defbuiltinx{int __builtin___snprintf_chk @ + (char *@var{s}, size_t @var{maxlen}, int @var{flag}, @ + size_t @var{os}, const char *@var{fmt}, ...)} +@defbuiltinx{int __builtin___vsprintf_chk @ + (char *@var{s}, int @var{flag}, size_t @var{os}, @ + const char *@var{fmt}, va_list @var{ap})} +@defbuiltinx{int __builtin___vsnprintf_chk @ + (char *@var{s}, size_t @var{maxlen}, int @var{flag}, @ + size_t @var{os}, const char *@var{fmt}, @ + va_list @var{ap})} -@defbuiltin{void __builtin_arc_sr (unsigned int @var{val}, unsigned int @var{auxr})} -The first argument, @var{val}, is a compile time constant to be -written to the register, the second argument, @var{auxr}, is the -address of an auxiliary register. Generates: -@example -sr @var{val}, [@var{auxr}] -@end example -@enddefbuiltin +The added @var{flag} argument is passed unchanged to @code{__sprintf_chk} +etc.@: functions and can contain implementation specific flags on what +additional security measures the checking function might take, such as +handling @code{%n} differently. -@defbuiltin{int __builtin_arc_swap (int @var{src})} -Only valid with @option{-mswap}. Generates: -@example -swap @var{dest}, @var{src} -@end example -Where the value in @var{dest} will be the result returned from the -built-in. -@enddefbuiltin +The @var{os} argument is the object size @var{s} points to, like in the +other built-in functions. There is a small difference in the behavior +though, if @var{os} is @code{(size_t) -1}, the built-in functions are +optimized into the non-checking functions only if @var{flag} is 0, otherwise +the checking function is called with @var{os} argument set to +@code{(size_t) -1}. 
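+
+For instance, @code{sprintf} calls could be hardened with a wrapper in the
+same spirit as the @code{memcpy} wrapper above (an illustrative sketch
+only):
+
+@smallexample
+#undef sprintf
+#define sprintf(dest, ...) \
+  __builtin___sprintf_chk (dest, 0 /* flag */, \
+                           __builtin_object_size (dest, 0), __VA_ARGS__)
+@end smallexample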
-@defbuiltin{void __builtin_arc_swi (void)} -Generates: -@example -swi -@end example +In addition to this, there are checking built-in functions +@code{__builtin___printf_chk}, @code{__builtin___vprintf_chk}, +@code{__builtin___fprintf_chk} and @code{__builtin___vfprintf_chk}. +These have just one additional argument, @var{flag}, right before +format string @var{fmt}. If the compiler is able to optimize them to +@code{fputc} etc.@: functions, it does, otherwise the checking function +is called and the @var{flag} argument passed to it. @enddefbuiltin -@defbuiltin{void __builtin_arc_sync (void)} -Only available with @option{-mcpu=ARC700}. Generates: -@example -sync -@end example -@enddefbuiltin - -@defbuiltin{void __builtin_arc_trap_s (unsigned int @var{c})} -Only available with @option{-mcpu=ARC700}. Generates: -@example -trap_s @var{c} -@end example -@enddefbuiltin - -@defbuiltin{void __builtin_arc_unimp_s (void)} -Only available with @option{-mcpu=ARC700}. Generates: -@example -unimp_s -@end example -@enddefbuiltin - -The instructions generated by the following builtins are not -considered as candidates for scheduling. They are not moved around by -the compiler during scheduling, and thus can be expected to appear -where they are put in the C code: -@example -__builtin_arc_brk() -__builtin_arc_core_read() -__builtin_arc_core_write() -__builtin_arc_flag() -__builtin_arc_lr() -__builtin_arc_sleep() -__builtin_arc_sr() -__builtin_arc_swi() -@end example - -The following built-in functions are available for the ARCv2 family of -processors. - -@example -int __builtin_arc_clri (); -void __builtin_arc_kflag (unsigned); -void __builtin_arc_seti (int); -@end example - -The following built-in functions are available for the ARCv2 family -and uses @option{-mnorm}. - -@example -int __builtin_arc_ffs (int); -int __builtin_arc_fls (int); -@end example - -@node ARC SIMD Built-in Functions -@subsection ARC SIMD Built-in Functions - -SIMD builtins provided by the compiler can be used to generate the -vector instructions. This section describes the available builtins -and their usage in programs. With the @option{-msimd} option, the -compiler provides 128-bit vector types, which can be specified using -the @code{vector_size} attribute. The header file @file{arc-simd.h} -can be included to use the following predefined types: -@example -typedef int __v4si __attribute__((vector_size(16))); -typedef short __v8hi __attribute__((vector_size(16))); -@end example - -These types can be used to define 128-bit variables. The built-in -functions listed in the following section can be used on these -variables to generate the vector operations. - -For all builtins, @code{__builtin_arc_@var{someinsn}}, the header file -@file{arc-simd.h} also provides equivalent macros called -@code{_@var{someinsn}} that can be used for programming ease and -improved readability. The following macros for DMA control are also -provided: -@example -#define _setup_dma_in_channel_reg _vdiwr -#define _setup_dma_out_channel_reg _vdowr -@end example - -The following is a complete list of all the SIMD built-ins provided -for ARC, grouped by calling signature. 
- -The following take two @code{__v8hi} arguments and return a -@code{__v8hi} result: -@example -__v8hi __builtin_arc_vaddaw (__v8hi, __v8hi); -__v8hi __builtin_arc_vaddw (__v8hi, __v8hi); -__v8hi __builtin_arc_vand (__v8hi, __v8hi); -__v8hi __builtin_arc_vandaw (__v8hi, __v8hi); -__v8hi __builtin_arc_vavb (__v8hi, __v8hi); -__v8hi __builtin_arc_vavrb (__v8hi, __v8hi); -__v8hi __builtin_arc_vbic (__v8hi, __v8hi); -__v8hi __builtin_arc_vbicaw (__v8hi, __v8hi); -__v8hi __builtin_arc_vdifaw (__v8hi, __v8hi); -__v8hi __builtin_arc_vdifw (__v8hi, __v8hi); -__v8hi __builtin_arc_veqw (__v8hi, __v8hi); -__v8hi __builtin_arc_vh264f (__v8hi, __v8hi); -__v8hi __builtin_arc_vh264ft (__v8hi, __v8hi); -__v8hi __builtin_arc_vh264fw (__v8hi, __v8hi); -__v8hi __builtin_arc_vlew (__v8hi, __v8hi); -__v8hi __builtin_arc_vltw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmaxaw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmaxw (__v8hi, __v8hi); -__v8hi __builtin_arc_vminaw (__v8hi, __v8hi); -__v8hi __builtin_arc_vminw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr1aw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr1w (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr2aw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr2w (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr3aw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr3w (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr4aw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr4w (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr5aw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr5w (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr6aw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr6w (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr7aw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr7w (__v8hi, __v8hi); -__v8hi __builtin_arc_vmrb (__v8hi, __v8hi); -__v8hi __builtin_arc_vmulaw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmulfaw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmulfw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmulw (__v8hi, __v8hi); -__v8hi __builtin_arc_vnew (__v8hi, __v8hi); -__v8hi __builtin_arc_vor (__v8hi, __v8hi); -__v8hi __builtin_arc_vsubaw (__v8hi, __v8hi); -__v8hi __builtin_arc_vsubw (__v8hi, __v8hi); -__v8hi __builtin_arc_vsummw (__v8hi, __v8hi); -__v8hi __builtin_arc_vvc1f (__v8hi, __v8hi); -__v8hi __builtin_arc_vvc1ft (__v8hi, __v8hi); -__v8hi __builtin_arc_vxor (__v8hi, __v8hi); -__v8hi __builtin_arc_vxoraw (__v8hi, __v8hi); -@end example - -The following take one @code{__v8hi} and one @code{int} argument and return a -@code{__v8hi} result: - -@example -__v8hi __builtin_arc_vbaddw (__v8hi, int); -__v8hi __builtin_arc_vbmaxw (__v8hi, int); -__v8hi __builtin_arc_vbminw (__v8hi, int); -__v8hi __builtin_arc_vbmulaw (__v8hi, int); -__v8hi __builtin_arc_vbmulfw (__v8hi, int); -__v8hi __builtin_arc_vbmulw (__v8hi, int); -__v8hi __builtin_arc_vbrsubw (__v8hi, int); -__v8hi __builtin_arc_vbsubw (__v8hi, int); -@end example - -The following take one @code{__v8hi} argument and one @code{int} argument which -must be a 3-bit compile time constant indicating a register number -I0-I7. They return a @code{__v8hi} result. -@example -__v8hi __builtin_arc_vasrw (__v8hi, const int); -__v8hi __builtin_arc_vsr8 (__v8hi, const int); -__v8hi __builtin_arc_vsr8aw (__v8hi, const int); -@end example - -The following take one @code{__v8hi} argument and one @code{int} -argument which must be a 6-bit compile time constant. They return a -@code{__v8hi} result. 
-@example
-__v8hi __builtin_arc_vasrpwbi (__v8hi, const int);
-__v8hi __builtin_arc_vasrrpwbi (__v8hi, const int);
-__v8hi __builtin_arc_vasrrwi (__v8hi, const int);
-__v8hi __builtin_arc_vasrsrwi (__v8hi, const int);
-__v8hi __builtin_arc_vasrwi (__v8hi, const int);
-__v8hi __builtin_arc_vsr8awi (__v8hi, const int);
-__v8hi __builtin_arc_vsr8i (__v8hi, const int);
-@end example
-
-The following take one @code{__v8hi} argument and one @code{int}
-argument which must be an 8-bit compile time constant.  They return a
-@code{__v8hi} result.
-@example
-__v8hi __builtin_arc_vd6tapf (__v8hi, const int);
-__v8hi __builtin_arc_vmvaw (__v8hi, const int);
-__v8hi __builtin_arc_vmvw (__v8hi, const int);
-__v8hi __builtin_arc_vmvzw (__v8hi, const int);
-@end example
-
-The following take two @code{int} arguments, the second of which
-must be an 8-bit compile time constant.  They return a @code{__v8hi}
-result:
-@example
-__v8hi __builtin_arc_vmovaw (int, const int);
-__v8hi __builtin_arc_vmovw (int, const int);
-__v8hi __builtin_arc_vmovzw (int, const int);
-@end example
-
-The following take a single @code{__v8hi} argument and return a
-@code{__v8hi} result:
-@example
-__v8hi __builtin_arc_vabsaw (__v8hi);
-__v8hi __builtin_arc_vabsw (__v8hi);
-__v8hi __builtin_arc_vaddsuw (__v8hi);
-__v8hi __builtin_arc_vexch1 (__v8hi);
-__v8hi __builtin_arc_vexch2 (__v8hi);
-__v8hi __builtin_arc_vexch4 (__v8hi);
-__v8hi __builtin_arc_vsignw (__v8hi);
-__v8hi __builtin_arc_vupbaw (__v8hi);
-__v8hi __builtin_arc_vupbw (__v8hi);
-__v8hi __builtin_arc_vupsbaw (__v8hi);
-__v8hi __builtin_arc_vupsbw (__v8hi);
-@end example
-
-The following take two @code{int} arguments and return no result:
-@example
-void __builtin_arc_vdirun (int, int);
-void __builtin_arc_vdorun (int, int);
-@end example
-
-The following take two @code{int} arguments and return no result.  The
-first argument must be a 3-bit compile time constant indicating one of
-the DR0-DR7 DMA setup channels:
-@example
-void __builtin_arc_vdiwr (const int, int);
-void __builtin_arc_vdowr (const int, int);
-@end example
-
-The following take an @code{int} argument and return no result:
-@example
-void __builtin_arc_vendrec (int);
-void __builtin_arc_vrec (int);
-void __builtin_arc_vrecrun (int);
-void __builtin_arc_vrun (int);
-@end example
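-
-As a brief illustration of how these builtins are used, here is a
-minimal sketch; the function name and the choice of operation are
-arbitrary, and the macro form follows the @code{_@var{someinsn}}
-convention from @file{arc-simd.h} described above:
-
-@example
-#include <arc-simd.h>
-
-__v8hi
-f (__v8hi a, __v8hi b)
-@{
-  return _vaddw (a, b);  /* Macro form of __builtin_arc_vaddw.  */
-@}
-@end example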
+@node New/Delete Builtins
+@section Built-in functions for C++ allocations and deallocations
+@findex __builtin_operator_new
+@findex __builtin_operator_delete
+Calling these C++ built-in functions is similar to calling
+@code{::operator new} or @code{::operator delete} with the same
+arguments, except that it is an error if the selected
+@code{::operator new} or @code{::operator delete} overload is not a
+replaceable global operator.  In addition, for optimization purposes,
+calls to pairs of these functions can be omitted if access to the
+allocation is optimized out or can be replaced with an
+implementation-provided buffer on the stack, and multiple allocation
+calls can be merged into a single allocation.  In C++ such
+optimizations are normally allowed only for calls to such replaceable
+global operators from @code{new} and @code{delete} expressions.
-
-The following take a @code{__v8hi} argument and two @code{int}
-arguments and return a @code{__v8hi} result.  The second argument must
-be a 3-bit compile time constant, indicating one of the registers
-I0-I7, and the third argument must be an 8-bit compile time constant.
+@smallexample
+void foo () @{
+  int *a = new int;
+  delete a;  // This pair of allocation/deallocation operators can be omitted
+             // or replaced with int _temp; int *a = &_temp; etc.@:
+  void *b = ::operator new (32);
+  ::operator delete (b);  // This one cannot.
+  void *c = __builtin_operator_new (32);
+  __builtin_operator_delete (c);  // This one can.
+@}
+@end smallexample
-
-@emph{Note:} Although the equivalent hardware instructions do not take
-a SIMD register as an operand, these builtins overwrite the relevant
-bits of the @code{__v8hi} register provided as the first argument with
-the value loaded from the @code{[Ib, u8]} location in the SDM.
-
-@example
-__v8hi __builtin_arc_vld32 (__v8hi, const int, const int);
-__v8hi __builtin_arc_vld32wh (__v8hi, const int, const int);
-__v8hi __builtin_arc_vld32wl (__v8hi, const int, const int);
-__v8hi __builtin_arc_vld64 (__v8hi, const int, const int);
-@end example
-
-The following take two @code{int} arguments and return a @code{__v8hi}
-result.  The first argument must be a 3-bit compile time constant,
-indicating one of the registers I0-I7, and the second argument must be
-an 8-bit compile time constant.
-
-@example
-__v8hi __builtin_arc_vld128 (const int, const int);
-__v8hi __builtin_arc_vld64w (const int, const int);
-@end example
-
-The following take a @code{__v8hi} argument and two @code{int}
-arguments and return no result.  The second argument must be a 3-bit
-compile time constant, indicating one of the registers I0-I7, and the
-third argument must be an 8-bit compile time constant.
-
-@example
-void __builtin_arc_vst128 (__v8hi, const int, const int);
-void __builtin_arc_vst64 (__v8hi, const int, const int);
-@end example
-
-The following take a @code{__v8hi} argument and three @code{int}
-arguments and return no result.  The second argument must be a 3-bit
-compile-time constant, identifying the 16-bit sub-register to be
-stored; the third argument must be a 3-bit compile time constant,
-indicating one of the registers I0-I7; and the fourth argument must be
-an 8-bit compile time constant.
-
-@example
-void __builtin_arc_vst16_n (__v8hi, const int, const int, const int);
-void __builtin_arc_vst32_n (__v8hi, const int, const int, const int);
-@end example
-
-The following built-in functions are available on systems that use
-@option{-mmpy-option=6} or higher.
-
-@example
-__v2hi __builtin_arc_dmach (__v2hi, __v2hi);
-__v2hi __builtin_arc_dmachu (__v2hi, __v2hi);
-__v2hi __builtin_arc_dmpyh (__v2hi, __v2hi);
-__v2hi __builtin_arc_dmpyhu (__v2hi, __v2hi);
-__v2hi __builtin_arc_vaddsub2h (__v2hi, __v2hi);
-__v2hi __builtin_arc_vsubadd2h (__v2hi, __v2hi);
-@end example
-
-The following built-in functions are available on systems that use
-@option{-mmpy-option=7} or higher.
-
-@example
-__v2si __builtin_arc_vmac2h (__v2hi, __v2hi);
-__v2si __builtin_arc_vmac2hu (__v2hi, __v2hi);
-__v2si __builtin_arc_vmpy2h (__v2hi, __v2hi);
-__v2si __builtin_arc_vmpy2hu (__v2hi, __v2hi);
-@end example
-
-The following built-in functions are available on systems that use
-@option{-mmpy-option=8} or higher.
- -@example -long long __builtin_arc_qmach (__v4hi, __v4hi); -long long __builtin_arc_qmachu (__v4hi, __v4hi); -long long __builtin_arc_qmpyh (__v4hi, __v4hi); -long long __builtin_arc_qmpyhu (__v4hi, __v4hi); -long long __builtin_arc_dmacwh (__v2si, __v2hi); -long long __builtin_arc_dmacwhu (__v2si, __v2hi); -_v2si __builtin_arc_vaddsub (__v2si, __v2si); -_v2si __builtin_arc_vsubadd (__v2si, __v2si); -_v4hi __builtin_arc_vaddsub4h (__v4hi, __v4hi); -_v4hi __builtin_arc_vsubadd4h (__v4hi, __v4hi); -@end example - -@node ARM iWMMXt Built-in Functions -@subsection ARM iWMMXt Built-in Functions - -These built-in functions are available for the ARM family of -processors when the @option{-mcpu=iwmmxt} switch is used: - -@smallexample -typedef int v2si __attribute__ ((vector_size (8))); -typedef short v4hi __attribute__ ((vector_size (8))); -typedef char v8qi __attribute__ ((vector_size (8))); - -int __builtin_arm_getwcgr0 (void); -void __builtin_arm_setwcgr0 (int); -int __builtin_arm_getwcgr1 (void); -void __builtin_arm_setwcgr1 (int); -int __builtin_arm_getwcgr2 (void); -void __builtin_arm_setwcgr2 (int); -int __builtin_arm_getwcgr3 (void); -void __builtin_arm_setwcgr3 (int); -int __builtin_arm_textrmsb (v8qi, int); -int __builtin_arm_textrmsh (v4hi, int); -int __builtin_arm_textrmsw (v2si, int); -int __builtin_arm_textrmub (v8qi, int); -int __builtin_arm_textrmuh (v4hi, int); -int __builtin_arm_textrmuw (v2si, int); -v8qi __builtin_arm_tinsrb (v8qi, int, int); -v4hi __builtin_arm_tinsrh (v4hi, int, int); -v2si __builtin_arm_tinsrw (v2si, int, int); -long long __builtin_arm_tmia (long long, int, int); -long long __builtin_arm_tmiabb (long long, int, int); -long long __builtin_arm_tmiabt (long long, int, int); -long long __builtin_arm_tmiaph (long long, int, int); -long long __builtin_arm_tmiatb (long long, int, int); -long long __builtin_arm_tmiatt (long long, int, int); -int __builtin_arm_tmovmskb (v8qi); -int __builtin_arm_tmovmskh (v4hi); -int __builtin_arm_tmovmskw (v2si); -long long __builtin_arm_waccb (v8qi); -long long __builtin_arm_wacch (v4hi); -long long __builtin_arm_waccw (v2si); -v8qi __builtin_arm_waddb (v8qi, v8qi); -v8qi __builtin_arm_waddbss (v8qi, v8qi); -v8qi __builtin_arm_waddbus (v8qi, v8qi); -v4hi __builtin_arm_waddh (v4hi, v4hi); -v4hi __builtin_arm_waddhss (v4hi, v4hi); -v4hi __builtin_arm_waddhus (v4hi, v4hi); -v2si __builtin_arm_waddw (v2si, v2si); -v2si __builtin_arm_waddwss (v2si, v2si); -v2si __builtin_arm_waddwus (v2si, v2si); -v8qi __builtin_arm_walign (v8qi, v8qi, int); -long long __builtin_arm_wand(long long, long long); -long long __builtin_arm_wandn (long long, long long); -v8qi __builtin_arm_wavg2b (v8qi, v8qi); -v8qi __builtin_arm_wavg2br (v8qi, v8qi); -v4hi __builtin_arm_wavg2h (v4hi, v4hi); -v4hi __builtin_arm_wavg2hr (v4hi, v4hi); -v8qi __builtin_arm_wcmpeqb (v8qi, v8qi); -v4hi __builtin_arm_wcmpeqh (v4hi, v4hi); -v2si __builtin_arm_wcmpeqw (v2si, v2si); -v8qi __builtin_arm_wcmpgtsb (v8qi, v8qi); -v4hi __builtin_arm_wcmpgtsh (v4hi, v4hi); -v2si __builtin_arm_wcmpgtsw (v2si, v2si); -v8qi __builtin_arm_wcmpgtub (v8qi, v8qi); -v4hi __builtin_arm_wcmpgtuh (v4hi, v4hi); -v2si __builtin_arm_wcmpgtuw (v2si, v2si); -long long __builtin_arm_wmacs (long long, v4hi, v4hi); -long long __builtin_arm_wmacsz (v4hi, v4hi); -long long __builtin_arm_wmacu (long long, v4hi, v4hi); -long long __builtin_arm_wmacuz (v4hi, v4hi); -v4hi __builtin_arm_wmadds (v4hi, v4hi); -v4hi __builtin_arm_wmaddu (v4hi, v4hi); -v8qi __builtin_arm_wmaxsb (v8qi, v8qi); -v4hi __builtin_arm_wmaxsh 
(v4hi, v4hi); -v2si __builtin_arm_wmaxsw (v2si, v2si); -v8qi __builtin_arm_wmaxub (v8qi, v8qi); -v4hi __builtin_arm_wmaxuh (v4hi, v4hi); -v2si __builtin_arm_wmaxuw (v2si, v2si); -v8qi __builtin_arm_wminsb (v8qi, v8qi); -v4hi __builtin_arm_wminsh (v4hi, v4hi); -v2si __builtin_arm_wminsw (v2si, v2si); -v8qi __builtin_arm_wminub (v8qi, v8qi); -v4hi __builtin_arm_wminuh (v4hi, v4hi); -v2si __builtin_arm_wminuw (v2si, v2si); -v4hi __builtin_arm_wmulsm (v4hi, v4hi); -v4hi __builtin_arm_wmulul (v4hi, v4hi); -v4hi __builtin_arm_wmulum (v4hi, v4hi); -long long __builtin_arm_wor (long long, long long); -v2si __builtin_arm_wpackdss (long long, long long); -v2si __builtin_arm_wpackdus (long long, long long); -v8qi __builtin_arm_wpackhss (v4hi, v4hi); -v8qi __builtin_arm_wpackhus (v4hi, v4hi); -v4hi __builtin_arm_wpackwss (v2si, v2si); -v4hi __builtin_arm_wpackwus (v2si, v2si); -long long __builtin_arm_wrord (long long, long long); -long long __builtin_arm_wrordi (long long, int); -v4hi __builtin_arm_wrorh (v4hi, long long); -v4hi __builtin_arm_wrorhi (v4hi, int); -v2si __builtin_arm_wrorw (v2si, long long); -v2si __builtin_arm_wrorwi (v2si, int); -v2si __builtin_arm_wsadb (v2si, v8qi, v8qi); -v2si __builtin_arm_wsadbz (v8qi, v8qi); -v2si __builtin_arm_wsadh (v2si, v4hi, v4hi); -v2si __builtin_arm_wsadhz (v4hi, v4hi); -v4hi __builtin_arm_wshufh (v4hi, int); -long long __builtin_arm_wslld (long long, long long); -long long __builtin_arm_wslldi (long long, int); -v4hi __builtin_arm_wsllh (v4hi, long long); -v4hi __builtin_arm_wsllhi (v4hi, int); -v2si __builtin_arm_wsllw (v2si, long long); -v2si __builtin_arm_wsllwi (v2si, int); -long long __builtin_arm_wsrad (long long, long long); -long long __builtin_arm_wsradi (long long, int); -v4hi __builtin_arm_wsrah (v4hi, long long); -v4hi __builtin_arm_wsrahi (v4hi, int); -v2si __builtin_arm_wsraw (v2si, long long); -v2si __builtin_arm_wsrawi (v2si, int); -long long __builtin_arm_wsrld (long long, long long); -long long __builtin_arm_wsrldi (long long, int); -v4hi __builtin_arm_wsrlh (v4hi, long long); -v4hi __builtin_arm_wsrlhi (v4hi, int); -v2si __builtin_arm_wsrlw (v2si, long long); -v2si __builtin_arm_wsrlwi (v2si, int); -v8qi __builtin_arm_wsubb (v8qi, v8qi); -v8qi __builtin_arm_wsubbss (v8qi, v8qi); -v8qi __builtin_arm_wsubbus (v8qi, v8qi); -v4hi __builtin_arm_wsubh (v4hi, v4hi); -v4hi __builtin_arm_wsubhss (v4hi, v4hi); -v4hi __builtin_arm_wsubhus (v4hi, v4hi); -v2si __builtin_arm_wsubw (v2si, v2si); -v2si __builtin_arm_wsubwss (v2si, v2si); -v2si __builtin_arm_wsubwus (v2si, v2si); -v4hi __builtin_arm_wunpckehsb (v8qi); -v2si __builtin_arm_wunpckehsh (v4hi); -long long __builtin_arm_wunpckehsw (v2si); -v4hi __builtin_arm_wunpckehub (v8qi); -v2si __builtin_arm_wunpckehuh (v4hi); -long long __builtin_arm_wunpckehuw (v2si); -v4hi __builtin_arm_wunpckelsb (v8qi); -v2si __builtin_arm_wunpckelsh (v4hi); -long long __builtin_arm_wunpckelsw (v2si); -v4hi __builtin_arm_wunpckelub (v8qi); -v2si __builtin_arm_wunpckeluh (v4hi); -long long __builtin_arm_wunpckeluw (v2si); -v8qi __builtin_arm_wunpckihb (v8qi, v8qi); -v4hi __builtin_arm_wunpckihh (v4hi, v4hi); -v2si __builtin_arm_wunpckihw (v2si, v2si); -v8qi __builtin_arm_wunpckilb (v8qi, v8qi); -v4hi __builtin_arm_wunpckilh (v4hi, v4hi); -v2si __builtin_arm_wunpckilw (v2si, v2si); -long long __builtin_arm_wxor (long long, long long); -long long __builtin_arm_wzero (); -@end smallexample - - -@node ARM C Language Extensions (ACLE) -@subsection ARM C Language Extensions (ACLE) - -GCC implements extensions for C as 
described in the ARM C Language
-Extensions (ACLE) specification, which can be found at
-@uref{https://developer.arm.com/documentation/ihi0053/latest/}.
-
-As a part of ACLE, GCC implements extensions for Advanced SIMD as
-described in the ARM C Language Extensions Specification.  The complete
-list of Advanced SIMD intrinsics can be found at
-@uref{https://developer.arm.com/documentation/ihi0073/latest/}.
-The built-in intrinsics for the Advanced SIMD extension are available
-when NEON is enabled.
-
-Currently, the ARM and AArch64 back ends do not support ACLE 2.0 fully.
-Both back ends support CRC32 intrinsics and the ARM back end supports
-the Coprocessor intrinsics, all from @file{arm_acle.h}.  The ARM back
-end's 16-bit floating-point Advanced SIMD intrinsics currently comply
-with ACLE v1.1.  The AArch64 back end does not support 16-bit
-floating-point Advanced SIMD intrinsics yet.
-
-See @ref{ARM Options} and @ref{AArch64 Options} for more information on the
-availability of extensions.
-
-@node ARM Floating Point Status and Control Intrinsics
-@subsection ARM Floating Point Status and Control Intrinsics
-
-These built-in functions are available for the ARM family of
-processors with a floating-point unit.
-
-@smallexample
-unsigned int __builtin_arm_get_fpscr ();
-void __builtin_arm_set_fpscr (unsigned int);
-@end smallexample
-
-@node ARM ARMv8-M Security Extensions
-@subsection ARM ARMv8-M Security Extensions
-
-GCC implements the ARMv8-M Security Extensions as described in the
-ARMv8-M Security Extensions: Requirements on Development Tools
-Engineering Specification, which can be found at
-@uref{https://developer.arm.com/documentation/ecm0359818/latest/}.
-
-As part of the Security Extensions GCC implements two new function
-attributes: @code{cmse_nonsecure_entry} and @code{cmse_nonsecure_call}.
-
-As part of the Security Extensions GCC implements the intrinsics below.
-FPTR is used here to mean any function pointer type.
-
-@smallexample
-cmse_address_info_t cmse_TT (void *);
-cmse_address_info_t cmse_TT_fptr (FPTR);
-cmse_address_info_t cmse_TTT (void *);
-cmse_address_info_t cmse_TTT_fptr (FPTR);
-cmse_address_info_t cmse_TTA (void *);
-cmse_address_info_t cmse_TTA_fptr (FPTR);
-cmse_address_info_t cmse_TTAT (void *);
-cmse_address_info_t cmse_TTAT_fptr (FPTR);
-void * cmse_check_address_range (void *, size_t, int);
-typeof(p) cmse_nsfptr_create (FPTR p);
-intptr_t cmse_is_nsfptr (FPTR);
-int cmse_nonsecure_caller (void);
-@end smallexample
-
-@node AVR Built-in Functions
-@subsection AVR Built-in Functions
-
-For each AVR built-in function there is an identically named,
-uppercase built-in macro defined, so that users can easily query
-whether a specific built-in is implemented.  For example, if
-@code{__builtin_avr_nop} is available, the macro
-@code{__BUILTIN_AVR_NOP} is defined to @code{1}, and it is undefined
-otherwise.
-
-@defbuiltin{void __builtin_avr_nop (void)}
-@defbuiltinx{void __builtin_avr_sei (void)}
-@defbuiltinx{void __builtin_avr_cli (void)}
-@defbuiltinx{void __builtin_avr_sleep (void)}
-@defbuiltinx{void __builtin_avr_wdr (void)}
-@defbuiltinx{uint8_t __builtin_avr_swap (uint8_t)}
-@defbuiltinx{uint16_t __builtin_avr_fmul (uint8_t, uint8_t)}
-@defbuiltinx{int16_t __builtin_avr_fmuls (int8_t, int8_t)}
-@defbuiltinx{int16_t __builtin_avr_fmulsu (int8_t, uint8_t)}
-
-These built-in functions map to the respective machine
-instructions, i.e.@: @code{nop}, @code{sei}, @code{cli}, @code{sleep},
-@code{wdr}, @code{swap}, @code{fmul}, @code{fmuls} and @code{fmulsu},
-respectively.  The three @code{fmul*} built-ins are implemented as
-library calls if no hardware multiplier is available.
-@enddefbuiltin
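-
-As a quick illustration of the feature-test macros described above,
-here is a minimal sketch; the fallback branch is purely illustrative:
-
-@smallexample
-#include <stdint.h>
-
-static inline uint8_t
-swap_nibbles (uint8_t x)
-@{
-#ifdef __BUILTIN_AVR_SWAP
-  return __builtin_avr_swap (x);  /* Single SWAP instruction.  */
-#else
-  return (x << 4) | (x >> 4);     /* Portable C fallback.  */
-#endif
-@}
-@end smallexample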
-
-@defbuiltin{void __builtin_avr_delay_cycles (uint32_t @var{ticks})}
-Delay execution for @var{ticks} cycles.  Note that this
-built-in does not take into account the effect of interrupts that
-might increase the delay time.  @var{ticks} must be a compile-time
-integer constant; delays with a variable number of cycles are not
-supported.
-@enddefbuiltin
-
-@defbuiltin{uint8_t __builtin_avr_insert_bits (uint32_t @var{map}, uint8_t @var{bits}, uint8_t @var{val})}
-Insert bits from @var{bits} into @var{val} and return the resulting
-value.  The nibbles of @var{map} determine how the insertion is
-performed: let @var{X} be the @var{n}-th nibble of @var{map}.
-@enumerate
-@item If @var{X} is @code{0xf},
-then the @var{n}-th bit of @var{val} is returned unaltered.
-
-@item If @var{X} is in the range 0@dots{}7,
-then the @var{n}-th result bit is set to the @var{X}-th bit of @var{bits}.
-
-@item If @var{X} is in the range 8@dots{}@code{0xe},
-then the @var{n}-th result bit is undefined.
-@end enumerate
-
-@noindent
-One typical use case for this built-in is adjusting input and
-output values to non-contiguous port layouts.  Some examples:
-
-@smallexample
-// same as val, bits is unused
-__builtin_avr_insert_bits (0xffffffff, bits, val);
-@end smallexample
-
-@smallexample
-// same as bits, val is unused
-__builtin_avr_insert_bits (0x76543210, bits, val);
-@end smallexample
-
-@smallexample
-// same as rotating bits by 4
-__builtin_avr_insert_bits (0x32107654, bits, 0);
-@end smallexample
-
-@smallexample
-// high nibble of result is the high nibble of val
-// low nibble of result is the low nibble of bits
-__builtin_avr_insert_bits (0xffff3210, bits, val);
-@end smallexample
-
-@smallexample
-// reverse the bit order of bits
-__builtin_avr_insert_bits (0x01234567, bits, 0);
-@end smallexample
-@enddefbuiltin
-
-@defbuiltin{uint8_t __builtin_avr_mask1 (uint8_t @var{mask}, uint8_t @var{offs})}
-Rotate the 8-bit constant value @var{mask} by an offset of @var{offs},
-where @var{mask} is in @{ 0x01, 0xfe, 0x7f, 0x80 @}.
-This built-in can be used as an alternative to 8-bit expressions like
-@code{1 << offs} when their computation consumes too much
-time, and @var{offs} is known to be in the range 0@dots{}7.
-@example
-__builtin_avr_mask1 (1, offs)     // same as 1 << offs
-__builtin_avr_mask1 (~1, offs)    // same as ~(1 << offs)
-__builtin_avr_mask1 (0x80, offs)  // same as 0x80 >> offs
-__builtin_avr_mask1 (~0x80, offs) // same as ~(0x80 >> offs)
-@end example
-The open-coded C versions take at least @code{5 + 4 * @var{offs}} cycles
-(and 5 instructions), whereas the built-in takes 7 cycles and instructions
-(8 cycles and instructions in the case of @code{@var{mask} = 0x7f}).
-@enddefbuiltin
-
-@defbuiltin{void __builtin_avr_nops (uint16_t @var{count})}
-Insert @var{count} @code{NOP} instructions.
-The number of instructions must be a compile-time integer constant.
-@enddefbuiltin
-
-@b{All of the following built-in functions are only available for GNU-C.}
-
-@defbuiltin{int8_t __builtin_avr_flash_segment (const __memx void*)}
-This built-in takes a byte address in the 24-bit
-@ref{AVR Named Address Spaces,named address space} @code{__memx} and
-returns the number of the flash segment (the 64 KiB chunk) that the
-address points to.  Counting starts at @code{0}.
-If the address does not point to flash memory, @code{-1} is returned.
-@enddefbuiltin
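-
-A small usage sketch; the data object and its placement are purely
-illustrative:
-
-@smallexample
-const __memx char msg[] = "hello";
-
-int8_t
-segment_of_msg (void)
-@{
-  /* Yields 0 for the first 64 KiB chunk of flash, -1 if msg
-     is not located in flash at all.  */
-  return __builtin_avr_flash_segment (msg);
-@}
-@end smallexample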
-
-@defbuiltin{size_t __builtin_avr_strlen_flash (const __flash char*)}
-@defbuiltinx{size_t __builtin_avr_strlen_flashx (const __flashx char*)}
-@defbuiltinx{size_t __builtin_avr_strlen_memx (const __memx char*)}
-These built-ins return the length of a string located in the
-named address space @code{__flash}, @code{__flashx} or @code{__memx},
-respectively.  They are used to support functions like @code{strlen_F} from
-@w{@uref{https://avrdudes.github.io/avr-libc/avr-libc-user-manual/,AVR-LibC}}'s
-header @code{avr/flash.h}.
-@enddefbuiltin
-
-@noindent
-There are many more AVR-specific built-in functions that are used to
-implement the ISO/IEC TR 18037 ``Embedded C'' fixed-point functions of
-section 7.18a.6.  You don't need to use these built-ins directly.
-Instead, use the declarations supplied by the @code{stdfix.h} header
-with GNU-C99:
-
-@smallexample
-#include <stdfix.h>
-
-// Re-interpret the bit representation of the unsigned 16-bit
-// integer @var{uval} as a Q-format 0.16 value.
-unsigned fract get_bits (uint_ur_t uval)
-@{
-  return urbits (uval);
-@}
-@end smallexample
-
-@node Blackfin Built-in Functions
-@subsection Blackfin Built-in Functions
-
-Currently, there are two Blackfin-specific built-in functions.  These are
-used for generating @code{CSYNC} and @code{SSYNC} machine insns without
-using inline assembly; by using these built-in functions the compiler can
-automatically add workarounds for hardware errata involving these
-instructions.  These functions are named as follows:
-
-@smallexample
-void __builtin_bfin_csync (void);
-void __builtin_bfin_ssync (void);
-@end smallexample
-
-@node BPF Built-in Functions
-@subsection BPF Built-in Functions
-
-The following built-in functions are available for eBPF targets.
-
-@defbuiltin{{unsigned long long} __builtin_bpf_load_byte (unsigned long long @var{offset})}
-Load a byte from the @code{struct sk_buff} packet data pointed to by the
-register @code{%r6}, and return it.
-@enddefbuiltin
-
-@defbuiltin{{unsigned long long} __builtin_bpf_load_half (unsigned long long @var{offset})}
-Load 16 bits from the @code{struct sk_buff} packet data pointed to by the
-register @code{%r6}, and return them.
-@enddefbuiltin
-
-@defbuiltin{{unsigned long long} __builtin_bpf_load_word (unsigned long long @var{offset})}
-Load 32 bits from the @code{struct sk_buff} packet data pointed to by the
-register @code{%r6}, and return them.
-@enddefbuiltin
-
-@defbuiltin{@var{type} __builtin_preserve_access_index (@var{type} @var{expr})}
-BPF Compile Once-Run Everywhere (CO-RE) support.  Instruct GCC to
-generate CO-RE relocation records for any accesses to aggregate
-data structures (struct, union, array types) in @var{expr}.  This builtin
-is otherwise transparent; @var{expr} may have any type and its value is
-returned.  This builtin has no effect if @code{-mco-re} is not in effect
-(either specified or implied).
-@enddefbuiltin
-
-@defbuiltin{{unsigned int} __builtin_preserve_field_info (@var{expr}, unsigned int @var{kind})}
-BPF Compile Once-Run Everywhere (CO-RE) support.  This builtin is used to
-extract information to aid in struct/union relocations.  @var{expr} is
-an access to a field of a struct or union.  Depending on @var{kind}, different
-information is returned to the program.  A CO-RE relocation for the access in
-@var{expr} with kind @var{kind} is recorded if @code{-mco-re} is in effect.
- -The following values are supported for @var{kind}: -@table @code -@item FIELD_BYTE_OFFSET = 0 -The returned value is the offset, in bytes, of the field from the -beginning of the containing structure. For bit-fields, this is the byte offset -of the containing word. - -@item FIELD_BYTE_SIZE = 1 -The returned value is the size, in bytes, of the field. For bit-fields, -this is the size in bytes of the containing word. - -@item FIELD_EXISTENCE = 2 -The returned value is 1 if the field exists, 0 otherwise. Always 1 at -compile time. - -@item FIELD_SIGNEDNESS = 3 -The returned value is 1 if the field is signed, 0 otherwise. - -@item FIELD_LSHIFT_U64 = 4 -@itemx FIELD_RSHIFT_U64 = 5 -The returned value is the number of bits of left- or right-shifting -(respectively) needed in order to recover the original value of the field, -after it has been loaded by a read of @code{FIELD_BYTE_SIZE} bytes into an -unsigned 64-bit value. Primarily useful for reading bit-field values -from structures that may change between kernel versions. - -@end table - -Note that the return value is a constant which is known at -compile time. If the field has a variable offset then -@code{FIELD_BYTE_OFFSET}, @code{FIELD_LSHIFT_U64}, -and @code{FIELD_RSHIFT_U64} are not supported. -Similarly, if the field has a variable size then -@code{FIELD_BYTE_SIZE}, @code{FIELD_LSHIFT_U64}, -and @code{FIELD_RSHIFT_U64} are not supported. - -For example, @code{__builtin_preserve_field_info} can be used to reliably -extract bit-field values from a structure that may change between -kernel versions: - -@smallexample -struct S -@{ - short a; - int x:7; - int y:5; -@}; - -int -read_y (struct S *arg) -@{ - unsigned long long val; - unsigned int offset - = __builtin_preserve_field_info (arg->y, FIELD_BYTE_OFFSET); - unsigned int size - = __builtin_preserve_field_info (arg->y, FIELD_BYTE_SIZE); - - /* Read size bytes from arg + offset into val. */ - bpf_probe_read (&val, size, arg + offset); - - val <<= __builtin_preserve_field_info (arg->y, FIELD_LSHIFT_U64); - - if (__builtin_preserve_field_info (arg->y, FIELD_SIGNEDNESS)) - val = ((long long) val - >> __builtin_preserve_field_info (arg->y, FIELD_RSHIFT_U64)); - else - val >>= __builtin_preserve_field_info (arg->y, FIELD_RSHIFT_U64); - - return val; -@} - -@end smallexample -@enddefbuiltin - -@defbuiltin{{unsigned int} __builtin_preserve_enum_value (@var{type}, @var{enum}, unsigned int @var{kind})} -BPF Compile Once-Run Everywhere (CO-RE) support. This builtin collects enum -information and creates a CO-RE relocation relative to @var{enum} that should -be of @var{type}. The @var{kind} specifies the action performed. - -The following values are supported for @var{kind}: -@table @code -@item ENUM_VALUE_EXISTS = 0 -The return value is either 0 or 1 depending if the enum value exists in the -target. - -@item ENUM_VALUE = 1 -The return value is the enum value in the target kernel. -@end table -@enddefbuiltin - -@defbuiltin{{unsigned int} __builtin_btf_type_id (@var{type}, unsigned int @var{kind})} -BPF Compile Once-Run Everywhere (CO-RE) support. This builtin is used to get -the BTF type ID of a specified @var{type}. -Depending on the @var{kind} argument, it -either returns the ID of the local BTF information, or the BTF type ID in -the target kernel. - -The following values are supported for @var{kind}: -@table @code -@item BTF_TYPE_ID_LOCAL = 0 -Return the local BTF type ID. Always succeeds. - -@item BTF_TYPE_ID_TARGET = 1 -Return the target BTF type ID. 
If @var{type} does not exist in the target, -returns 0. -@end table -@enddefbuiltin - -@defbuiltin{{unsigned int} __builtin_preserve_type_info (@var{type}, unsigned int @var{kind})} -BPF Compile Once-Run Everywhere (CO-RE) support. This builtin performs named -type (struct/union/enum/typedef) verifications. The type of verification -depends on the @var{kind} argument provided. This builtin always -returns 0 if @var{type} does not exist in the target kernel. - -The following values are supported for @var{kind}: -@table @code -@item BTF_TYPE_EXISTS = 0 -Checks if @var{type} exists in the target. - -@item BTF_TYPE_MATCHES = 1 -Checks if @var{type} matches the local definition in the target kernel. - -@item BTF_TYPE_SIZE = 2 -Returns the size of the @var{type} within the target. -@end table -@enddefbuiltin - -@node FR-V Built-in Functions -@subsection FR-V Built-in Functions - -GCC provides many FR-V-specific built-in functions. In general, -these functions are intended to be compatible with those described -by @cite{FR-V Family, Softune C/C++ Compiler Manual (V6), Fujitsu -Semiconductor}. The two exceptions are @code{__MDUNPACKH} and -@code{__MBTOHE}, the GCC forms of which pass 128-bit values by -pointer rather than by value. - -Most of the functions are named after specific FR-V instructions. -Such functions are said to be ``directly mapped'' and are summarized -here in tabular form. - -@menu -* Argument Types:: -* Directly-mapped Integer Functions:: -* Directly-mapped Media Functions:: -* Raw read/write Functions:: -* Other Built-in Functions:: -@end menu - -@node Argument Types -@subsubsection Argument Types - -The arguments to the built-in functions can be divided into three groups: -register numbers, compile-time constants and run-time values. In order -to make this classification clear at a glance, the arguments and return -values are given the following pseudo types: - -@multitable @columnfractions .20 .30 .15 .35 -@headitem Pseudo type @tab Real C type @tab Constant? @tab Description -@item @code{uh} @tab @code{unsigned short} @tab No @tab an unsigned halfword -@item @code{uw1} @tab @code{unsigned int} @tab No @tab an unsigned word -@item @code{sw1} @tab @code{int} @tab No @tab a signed word -@item @code{uw2} @tab @code{unsigned long long} @tab No -@tab an unsigned doubleword -@item @code{sw2} @tab @code{long long} @tab No @tab a signed doubleword -@item @code{const} @tab @code{int} @tab Yes @tab an integer constant -@item @code{acc} @tab @code{int} @tab Yes @tab an ACC register number -@item @code{iacc} @tab @code{int} @tab Yes @tab an IACC register number -@end multitable - -These pseudo types are not defined by GCC, they are simply a notational -convenience used in this manual. - -Arguments of type @code{uh}, @code{uw1}, @code{sw1}, @code{uw2} -and @code{sw2} are evaluated at run time. They correspond to -register operands in the underlying FR-V instructions. - -@code{const} arguments represent immediate operands in the underlying -FR-V instructions. They must be compile-time constants. - -@code{acc} arguments are evaluated at compile time and specify the number -of an accumulator register. For example, an @code{acc} argument of 2 -selects the ACC2 register. - -@code{iacc} arguments are similar to @code{acc} arguments but specify the -number of an IACC register. See @pxref{Other Built-in Functions} -for more details. - -@node Directly-mapped Integer Functions -@subsubsection Directly-Mapped Integer Functions - -The functions listed below map directly to FR-V I-type instructions. 
- -@multitable @columnfractions .45 .32 .23 -@headitem Function prototype @tab Example usage @tab Assembly output -@item @code{sw1 __ADDSS (sw1, sw1)} -@tab @code{@var{c} = __ADDSS (@var{a}, @var{b})} -@tab @code{ADDSS @var{a},@var{b},@var{c}} -@item @code{sw1 __SCAN (sw1, sw1)} -@tab @code{@var{c} = __SCAN (@var{a}, @var{b})} -@tab @code{SCAN @var{a},@var{b},@var{c}} -@item @code{sw1 __SCUTSS (sw1)} -@tab @code{@var{b} = __SCUTSS (@var{a})} -@tab @code{SCUTSS @var{a},@var{b}} -@item @code{sw1 __SLASS (sw1, sw1)} -@tab @code{@var{c} = __SLASS (@var{a}, @var{b})} -@tab @code{SLASS @var{a},@var{b},@var{c}} -@item @code{void __SMASS (sw1, sw1)} -@tab @code{__SMASS (@var{a}, @var{b})} -@tab @code{SMASS @var{a},@var{b}} -@item @code{void __SMSSS (sw1, sw1)} -@tab @code{__SMSSS (@var{a}, @var{b})} -@tab @code{SMSSS @var{a},@var{b}} -@item @code{void __SMU (sw1, sw1)} -@tab @code{__SMU (@var{a}, @var{b})} -@tab @code{SMU @var{a},@var{b}} -@item @code{sw2 __SMUL (sw1, sw1)} -@tab @code{@var{c} = __SMUL (@var{a}, @var{b})} -@tab @code{SMUL @var{a},@var{b},@var{c}} -@item @code{sw1 __SUBSS (sw1, sw1)} -@tab @code{@var{c} = __SUBSS (@var{a}, @var{b})} -@tab @code{SUBSS @var{a},@var{b},@var{c}} -@item @code{uw2 __UMUL (uw1, uw1)} -@tab @code{@var{c} = __UMUL (@var{a}, @var{b})} -@tab @code{UMUL @var{a},@var{b},@var{c}} -@end multitable - -@node Directly-mapped Media Functions -@subsubsection Directly-Mapped Media Functions - -The functions listed below map directly to FR-V M-type instructions. - -@multitable @columnfractions .45 .32 .23 -@headitem Function prototype @tab Example usage @tab Assembly output -@item @code{uw1 __MABSHS (sw1)} -@tab @code{@var{b} = __MABSHS (@var{a})} -@tab @code{MABSHS @var{a},@var{b}} -@item @code{void __MADDACCS (acc, acc)} -@tab @code{__MADDACCS (@var{b}, @var{a})} -@tab @code{MADDACCS @var{a},@var{b}} -@item @code{sw1 __MADDHSS (sw1, sw1)} -@tab @code{@var{c} = __MADDHSS (@var{a}, @var{b})} -@tab @code{MADDHSS @var{a},@var{b},@var{c}} -@item @code{uw1 __MADDHUS (uw1, uw1)} -@tab @code{@var{c} = __MADDHUS (@var{a}, @var{b})} -@tab @code{MADDHUS @var{a},@var{b},@var{c}} -@item @code{uw1 __MAND (uw1, uw1)} -@tab @code{@var{c} = __MAND (@var{a}, @var{b})} -@tab @code{MAND @var{a},@var{b},@var{c}} -@item @code{void __MASACCS (acc, acc)} -@tab @code{__MASACCS (@var{b}, @var{a})} -@tab @code{MASACCS @var{a},@var{b}} -@item @code{uw1 __MAVEH (uw1, uw1)} -@tab @code{@var{c} = __MAVEH (@var{a}, @var{b})} -@tab @code{MAVEH @var{a},@var{b},@var{c}} -@item @code{uw2 __MBTOH (uw1)} -@tab @code{@var{b} = __MBTOH (@var{a})} -@tab @code{MBTOH @var{a},@var{b}} -@item @code{void __MBTOHE (uw1 *, uw1)} -@tab @code{__MBTOHE (&@var{b}, @var{a})} -@tab @code{MBTOHE @var{a},@var{b}} -@item @code{void __MCLRACC (acc)} -@tab @code{__MCLRACC (@var{a})} -@tab @code{MCLRACC @var{a}} -@item @code{void __MCLRACCA (void)} -@tab @code{__MCLRACCA ()} -@tab @code{MCLRACCA} -@item @code{uw1 __Mcop1 (uw1, uw1)} -@tab @code{@var{c} = __Mcop1 (@var{a}, @var{b})} -@tab @code{Mcop1 @var{a},@var{b},@var{c}} -@item @code{uw1 __Mcop2 (uw1, uw1)} -@tab @code{@var{c} = __Mcop2 (@var{a}, @var{b})} -@tab @code{Mcop2 @var{a},@var{b},@var{c}} -@item @code{uw1 __MCPLHI (uw2, const)} -@tab @code{@var{c} = __MCPLHI (@var{a}, @var{b})} -@tab @code{MCPLHI @var{a},#@var{b},@var{c}} -@item @code{uw1 __MCPLI (uw2, const)} -@tab @code{@var{c} = __MCPLI (@var{a}, @var{b})} -@tab @code{MCPLI @var{a},#@var{b},@var{c}} -@item @code{void __MCPXIS (acc, sw1, sw1)} -@tab @code{__MCPXIS (@var{c}, @var{a}, @var{b})} -@tab @code{MCPXIS 
@var{a},@var{b},@var{c}} -@item @code{void __MCPXIU (acc, uw1, uw1)} -@tab @code{__MCPXIU (@var{c}, @var{a}, @var{b})} -@tab @code{MCPXIU @var{a},@var{b},@var{c}} -@item @code{void __MCPXRS (acc, sw1, sw1)} -@tab @code{__MCPXRS (@var{c}, @var{a}, @var{b})} -@tab @code{MCPXRS @var{a},@var{b},@var{c}} -@item @code{void __MCPXRU (acc, uw1, uw1)} -@tab @code{__MCPXRU (@var{c}, @var{a}, @var{b})} -@tab @code{MCPXRU @var{a},@var{b},@var{c}} -@item @code{uw1 __MCUT (acc, uw1)} -@tab @code{@var{c} = __MCUT (@var{a}, @var{b})} -@tab @code{MCUT @var{a},@var{b},@var{c}} -@item @code{uw1 __MCUTSS (acc, sw1)} -@tab @code{@var{c} = __MCUTSS (@var{a}, @var{b})} -@tab @code{MCUTSS @var{a},@var{b},@var{c}} -@item @code{void __MDADDACCS (acc, acc)} -@tab @code{__MDADDACCS (@var{b}, @var{a})} -@tab @code{MDADDACCS @var{a},@var{b}} -@item @code{void __MDASACCS (acc, acc)} -@tab @code{__MDASACCS (@var{b}, @var{a})} -@tab @code{MDASACCS @var{a},@var{b}} -@item @code{uw2 __MDCUTSSI (acc, const)} -@tab @code{@var{c} = __MDCUTSSI (@var{a}, @var{b})} -@tab @code{MDCUTSSI @var{a},#@var{b},@var{c}} -@item @code{uw2 __MDPACKH (uw2, uw2)} -@tab @code{@var{c} = __MDPACKH (@var{a}, @var{b})} -@tab @code{MDPACKH @var{a},@var{b},@var{c}} -@item @code{uw2 __MDROTLI (uw2, const)} -@tab @code{@var{c} = __MDROTLI (@var{a}, @var{b})} -@tab @code{MDROTLI @var{a},#@var{b},@var{c}} -@item @code{void __MDSUBACCS (acc, acc)} -@tab @code{__MDSUBACCS (@var{b}, @var{a})} -@tab @code{MDSUBACCS @var{a},@var{b}} -@item @code{void __MDUNPACKH (uw1 *, uw2)} -@tab @code{__MDUNPACKH (&@var{b}, @var{a})} -@tab @code{MDUNPACKH @var{a},@var{b}} -@item @code{uw2 __MEXPDHD (uw1, const)} -@tab @code{@var{c} = __MEXPDHD (@var{a}, @var{b})} -@tab @code{MEXPDHD @var{a},#@var{b},@var{c}} -@item @code{uw1 __MEXPDHW (uw1, const)} -@tab @code{@var{c} = __MEXPDHW (@var{a}, @var{b})} -@tab @code{MEXPDHW @var{a},#@var{b},@var{c}} -@item @code{uw1 __MHDSETH (uw1, const)} -@tab @code{@var{c} = __MHDSETH (@var{a}, @var{b})} -@tab @code{MHDSETH @var{a},#@var{b},@var{c}} -@item @code{sw1 __MHDSETS (const)} -@tab @code{@var{b} = __MHDSETS (@var{a})} -@tab @code{MHDSETS #@var{a},@var{b}} -@item @code{uw1 __MHSETHIH (uw1, const)} -@tab @code{@var{b} = __MHSETHIH (@var{b}, @var{a})} -@tab @code{MHSETHIH #@var{a},@var{b}} -@item @code{sw1 __MHSETHIS (sw1, const)} -@tab @code{@var{b} = __MHSETHIS (@var{b}, @var{a})} -@tab @code{MHSETHIS #@var{a},@var{b}} -@item @code{uw1 __MHSETLOH (uw1, const)} -@tab @code{@var{b} = __MHSETLOH (@var{b}, @var{a})} -@tab @code{MHSETLOH #@var{a},@var{b}} -@item @code{sw1 __MHSETLOS (sw1, const)} -@tab @code{@var{b} = __MHSETLOS (@var{b}, @var{a})} -@tab @code{MHSETLOS #@var{a},@var{b}} -@item @code{uw1 __MHTOB (uw2)} -@tab @code{@var{b} = __MHTOB (@var{a})} -@tab @code{MHTOB @var{a},@var{b}} -@item @code{void __MMACHS (acc, sw1, sw1)} -@tab @code{__MMACHS (@var{c}, @var{a}, @var{b})} -@tab @code{MMACHS @var{a},@var{b},@var{c}} -@item @code{void __MMACHU (acc, uw1, uw1)} -@tab @code{__MMACHU (@var{c}, @var{a}, @var{b})} -@tab @code{MMACHU @var{a},@var{b},@var{c}} -@item @code{void __MMRDHS (acc, sw1, sw1)} -@tab @code{__MMRDHS (@var{c}, @var{a}, @var{b})} -@tab @code{MMRDHS @var{a},@var{b},@var{c}} -@item @code{void __MMRDHU (acc, uw1, uw1)} -@tab @code{__MMRDHU (@var{c}, @var{a}, @var{b})} -@tab @code{MMRDHU @var{a},@var{b},@var{c}} -@item @code{void __MMULHS (acc, sw1, sw1)} -@tab @code{__MMULHS (@var{c}, @var{a}, @var{b})} -@tab @code{MMULHS @var{a},@var{b},@var{c}} -@item @code{void __MMULHU (acc, uw1, uw1)} -@tab @code{__MMULHU 
(@var{c}, @var{a}, @var{b})} -@tab @code{MMULHU @var{a},@var{b},@var{c}} -@item @code{void __MMULXHS (acc, sw1, sw1)} -@tab @code{__MMULXHS (@var{c}, @var{a}, @var{b})} -@tab @code{MMULXHS @var{a},@var{b},@var{c}} -@item @code{void __MMULXHU (acc, uw1, uw1)} -@tab @code{__MMULXHU (@var{c}, @var{a}, @var{b})} -@tab @code{MMULXHU @var{a},@var{b},@var{c}} -@item @code{uw1 __MNOT (uw1)} -@tab @code{@var{b} = __MNOT (@var{a})} -@tab @code{MNOT @var{a},@var{b}} -@item @code{uw1 __MOR (uw1, uw1)} -@tab @code{@var{c} = __MOR (@var{a}, @var{b})} -@tab @code{MOR @var{a},@var{b},@var{c}} -@item @code{uw1 __MPACKH (uh, uh)} -@tab @code{@var{c} = __MPACKH (@var{a}, @var{b})} -@tab @code{MPACKH @var{a},@var{b},@var{c}} -@item @code{sw2 __MQADDHSS (sw2, sw2)} -@tab @code{@var{c} = __MQADDHSS (@var{a}, @var{b})} -@tab @code{MQADDHSS @var{a},@var{b},@var{c}} -@item @code{uw2 __MQADDHUS (uw2, uw2)} -@tab @code{@var{c} = __MQADDHUS (@var{a}, @var{b})} -@tab @code{MQADDHUS @var{a},@var{b},@var{c}} -@item @code{void __MQCPXIS (acc, sw2, sw2)} -@tab @code{__MQCPXIS (@var{c}, @var{a}, @var{b})} -@tab @code{MQCPXIS @var{a},@var{b},@var{c}} -@item @code{void __MQCPXIU (acc, uw2, uw2)} -@tab @code{__MQCPXIU (@var{c}, @var{a}, @var{b})} -@tab @code{MQCPXIU @var{a},@var{b},@var{c}} -@item @code{void __MQCPXRS (acc, sw2, sw2)} -@tab @code{__MQCPXRS (@var{c}, @var{a}, @var{b})} -@tab @code{MQCPXRS @var{a},@var{b},@var{c}} -@item @code{void __MQCPXRU (acc, uw2, uw2)} -@tab @code{__MQCPXRU (@var{c}, @var{a}, @var{b})} -@tab @code{MQCPXRU @var{a},@var{b},@var{c}} -@item @code{sw2 __MQLCLRHS (sw2, sw2)} -@tab @code{@var{c} = __MQLCLRHS (@var{a}, @var{b})} -@tab @code{MQLCLRHS @var{a},@var{b},@var{c}} -@item @code{sw2 __MQLMTHS (sw2, sw2)} -@tab @code{@var{c} = __MQLMTHS (@var{a}, @var{b})} -@tab @code{MQLMTHS @var{a},@var{b},@var{c}} -@item @code{void __MQMACHS (acc, sw2, sw2)} -@tab @code{__MQMACHS (@var{c}, @var{a}, @var{b})} -@tab @code{MQMACHS @var{a},@var{b},@var{c}} -@item @code{void __MQMACHU (acc, uw2, uw2)} -@tab @code{__MQMACHU (@var{c}, @var{a}, @var{b})} -@tab @code{MQMACHU @var{a},@var{b},@var{c}} -@item @code{void __MQMACXHS (acc, sw2, sw2)} -@tab @code{__MQMACXHS (@var{c}, @var{a}, @var{b})} -@tab @code{MQMACXHS @var{a},@var{b},@var{c}} -@item @code{void __MQMULHS (acc, sw2, sw2)} -@tab @code{__MQMULHS (@var{c}, @var{a}, @var{b})} -@tab @code{MQMULHS @var{a},@var{b},@var{c}} -@item @code{void __MQMULHU (acc, uw2, uw2)} -@tab @code{__MQMULHU (@var{c}, @var{a}, @var{b})} -@tab @code{MQMULHU @var{a},@var{b},@var{c}} -@item @code{void __MQMULXHS (acc, sw2, sw2)} -@tab @code{__MQMULXHS (@var{c}, @var{a}, @var{b})} -@tab @code{MQMULXHS @var{a},@var{b},@var{c}} -@item @code{void __MQMULXHU (acc, uw2, uw2)} -@tab @code{__MQMULXHU (@var{c}, @var{a}, @var{b})} -@tab @code{MQMULXHU @var{a},@var{b},@var{c}} -@item @code{sw2 __MQSATHS (sw2, sw2)} -@tab @code{@var{c} = __MQSATHS (@var{a}, @var{b})} -@tab @code{MQSATHS @var{a},@var{b},@var{c}} -@item @code{uw2 __MQSLLHI (uw2, int)} -@tab @code{@var{c} = __MQSLLHI (@var{a}, @var{b})} -@tab @code{MQSLLHI @var{a},@var{b},@var{c}} -@item @code{sw2 __MQSRAHI (sw2, int)} -@tab @code{@var{c} = __MQSRAHI (@var{a}, @var{b})} -@tab @code{MQSRAHI @var{a},@var{b},@var{c}} -@item @code{sw2 __MQSUBHSS (sw2, sw2)} -@tab @code{@var{c} = __MQSUBHSS (@var{a}, @var{b})} -@tab @code{MQSUBHSS @var{a},@var{b},@var{c}} -@item @code{uw2 __MQSUBHUS (uw2, uw2)} -@tab @code{@var{c} = __MQSUBHUS (@var{a}, @var{b})} -@tab @code{MQSUBHUS @var{a},@var{b},@var{c}} -@item @code{void __MQXMACHS (acc, sw2, 
sw2)}
-@tab @code{__MQXMACHS (@var{c}, @var{a}, @var{b})}
-@tab @code{MQXMACHS @var{a},@var{b},@var{c}}
-@item @code{void __MQXMACXHS (acc, sw2, sw2)}
-@tab @code{__MQXMACXHS (@var{c}, @var{a}, @var{b})}
-@tab @code{MQXMACXHS @var{a},@var{b},@var{c}}
-@item @code{uw1 __MRDACC (acc)}
-@tab @code{@var{b} = __MRDACC (@var{a})}
-@tab @code{MRDACC @var{a},@var{b}}
-@item @code{uw1 __MRDACCG (acc)}
-@tab @code{@var{b} = __MRDACCG (@var{a})}
-@tab @code{MRDACCG @var{a},@var{b}}
-@item @code{uw1 __MROTLI (uw1, const)}
-@tab @code{@var{c} = __MROTLI (@var{a}, @var{b})}
-@tab @code{MROTLI @var{a},#@var{b},@var{c}}
-@item @code{uw1 __MROTRI (uw1, const)}
-@tab @code{@var{c} = __MROTRI (@var{a}, @var{b})}
-@tab @code{MROTRI @var{a},#@var{b},@var{c}}
-@item @code{sw1 __MSATHS (sw1, sw1)}
-@tab @code{@var{c} = __MSATHS (@var{a}, @var{b})}
-@tab @code{MSATHS @var{a},@var{b},@var{c}}
-@item @code{uw1 __MSATHU (uw1, uw1)}
-@tab @code{@var{c} = __MSATHU (@var{a}, @var{b})}
-@tab @code{MSATHU @var{a},@var{b},@var{c}}
-@item @code{uw1 __MSLLHI (uw1, const)}
-@tab @code{@var{c} = __MSLLHI (@var{a}, @var{b})}
-@tab @code{MSLLHI @var{a},#@var{b},@var{c}}
-@item @code{sw1 __MSRAHI (sw1, const)}
-@tab @code{@var{c} = __MSRAHI (@var{a}, @var{b})}
-@tab @code{MSRAHI @var{a},#@var{b},@var{c}}
-@item @code{uw1 __MSRLHI (uw1, const)}
-@tab @code{@var{c} = __MSRLHI (@var{a}, @var{b})}
-@tab @code{MSRLHI @var{a},#@var{b},@var{c}}
-@item @code{void __MSUBACCS (acc, acc)}
-@tab @code{__MSUBACCS (@var{b}, @var{a})}
-@tab @code{MSUBACCS @var{a},@var{b}}
-@item @code{sw1 __MSUBHSS (sw1, sw1)}
-@tab @code{@var{c} = __MSUBHSS (@var{a}, @var{b})}
-@tab @code{MSUBHSS @var{a},@var{b},@var{c}}
-@item @code{uw1 __MSUBHUS (uw1, uw1)}
-@tab @code{@var{c} = __MSUBHUS (@var{a}, @var{b})}
-@tab @code{MSUBHUS @var{a},@var{b},@var{c}}
-@item @code{void __MTRAP (void)}
-@tab @code{__MTRAP ()}
-@tab @code{MTRAP}
-@item @code{uw2 __MUNPACKH (uw1)}
-@tab @code{@var{b} = __MUNPACKH (@var{a})}
-@tab @code{MUNPACKH @var{a},@var{b}}
-@item @code{uw1 __MWCUT (uw2, uw1)}
-@tab @code{@var{c} = __MWCUT (@var{a}, @var{b})}
-@tab @code{MWCUT @var{a},@var{b},@var{c}}
-@item @code{void __MWTACC (acc, uw1)}
-@tab @code{__MWTACC (@var{b}, @var{a})}
-@tab @code{MWTACC @var{a},@var{b}}
-@item @code{void __MWTACCG (acc, uw1)}
-@tab @code{__MWTACCG (@var{b}, @var{a})}
-@tab @code{MWTACCG @var{a},@var{b}}
-@item @code{uw1 __MXOR (uw1, uw1)}
-@tab @code{@var{c} = __MXOR (@var{a}, @var{b})}
-@tab @code{MXOR @var{a},@var{b},@var{c}}
-@end multitable
-
-@node Raw read/write Functions
-@subsubsection Raw Read/Write Functions
-
-This section describes built-in functions related to read and write
-instructions to access memory.  These functions generate
-@code{membar} instructions to flush the I/O loads and stores where
-appropriate, as described in the Fujitsu manual cited above.
- -@table @code - -@item unsigned char __builtin_read8 (void *@var{data}) -@item unsigned short __builtin_read16 (void *@var{data}) -@item unsigned long __builtin_read32 (void *@var{data}) -@item unsigned long long __builtin_read64 (void *@var{data}) - -@item void __builtin_write8 (void *@var{data}, unsigned char @var{datum}) -@item void __builtin_write16 (void *@var{data}, unsigned short @var{datum}) -@item void __builtin_write32 (void *@var{data}, unsigned long @var{datum}) -@item void __builtin_write64 (void *@var{data}, unsigned long long @var{datum}) -@end table - -@node Other Built-in Functions -@subsubsection Other Built-in Functions - -This section describes built-in functions that are not named after -a specific FR-V instruction. - -@table @code -@item sw2 __IACCreadll (iacc @var{reg}) -Return the full 64-bit value of IACC0@. The @var{reg} argument is reserved -for future expansion and must be 0. - -@item sw1 __IACCreadl (iacc @var{reg}) -Return the value of IACC0H if @var{reg} is 0 and IACC0L if @var{reg} is 1. -Other values of @var{reg} are rejected as invalid. - -@item void __IACCsetll (iacc @var{reg}, sw2 @var{x}) -Set the full 64-bit value of IACC0 to @var{x}. The @var{reg} argument -is reserved for future expansion and must be 0. - -@item void __IACCsetl (iacc @var{reg}, sw1 @var{x}) -Set IACC0H to @var{x} if @var{reg} is 0 and IACC0L to @var{x} if @var{reg} -is 1. Other values of @var{reg} are rejected as invalid. - -@item void __data_prefetch0 (const void *@var{x}) -Use the @code{dcpl} instruction to load the contents of address @var{x} -into the data cache. - -@item void __data_prefetch (const void *@var{x}) -Use the @code{nldub} instruction to load the contents of address @var{x} -into the data cache. The instruction is issued in slot I1@. -@end table - -@node LoongArch Base Built-in Functions -@subsection LoongArch Base Built-in Functions - -These built-in functions are available for LoongArch. 
-
-Data Type Description:
-@itemize
-@item @code{imm0_31}, a compile-time constant in range 0 to 31;
-@item @code{imm0_16383}, a compile-time constant in range 0 to 16383;
-@item @code{imm0_32767}, a compile-time constant in range 0 to 32767;
-@item @code{imm_n2048_2047}, a compile-time constant in range -2048 to 2047.
-@end itemize
-
-The intrinsics provided are listed below:
-@smallexample
-  unsigned int __builtin_loongarch_movfcsr2gr (imm0_31)
-  void __builtin_loongarch_movgr2fcsr (imm0_31, unsigned int)
-  void __builtin_loongarch_cacop_d (imm0_31, unsigned long int, imm_n2048_2047)
-  unsigned int __builtin_loongarch_cpucfg (unsigned int)
-  void __builtin_loongarch_asrtle_d (long int, long int)
-  void __builtin_loongarch_asrtgt_d (long int, long int)
-  long int __builtin_loongarch_lddir_d (long int, imm0_31)
-  void __builtin_loongarch_ldpte_d (long int, imm0_31)
-
-  int __builtin_loongarch_crc_w_b_w (char, int)
-  int __builtin_loongarch_crc_w_h_w (short, int)
-  int __builtin_loongarch_crc_w_w_w (int, int)
-  int __builtin_loongarch_crc_w_d_w (long int, int)
-  int __builtin_loongarch_crcc_w_b_w (char, int)
-  int __builtin_loongarch_crcc_w_h_w (short, int)
-  int __builtin_loongarch_crcc_w_w_w (int, int)
-  int __builtin_loongarch_crcc_w_d_w (long int, int)
-
-  unsigned int __builtin_loongarch_csrrd_w (imm0_16383)
-  unsigned int __builtin_loongarch_csrwr_w (unsigned int, imm0_16383)
-  unsigned int __builtin_loongarch_csrxchg_w (unsigned int, unsigned int, imm0_16383)
-  unsigned long int __builtin_loongarch_csrrd_d (imm0_16383)
-  unsigned long int __builtin_loongarch_csrwr_d (unsigned long int, imm0_16383)
-  unsigned long int __builtin_loongarch_csrxchg_d (unsigned long int, unsigned long int, imm0_16383)
-
-  unsigned char __builtin_loongarch_iocsrrd_b (unsigned int)
-  unsigned short __builtin_loongarch_iocsrrd_h (unsigned int)
-  unsigned int __builtin_loongarch_iocsrrd_w (unsigned int)
-  unsigned long int __builtin_loongarch_iocsrrd_d (unsigned int)
-  void __builtin_loongarch_iocsrwr_b (unsigned char, unsigned int)
-  void __builtin_loongarch_iocsrwr_h (unsigned short, unsigned int)
-  void __builtin_loongarch_iocsrwr_w (unsigned int, unsigned int)
-  void __builtin_loongarch_iocsrwr_d (unsigned long int, unsigned int)
-
-  void __builtin_loongarch_dbar (imm0_32767)
-  void __builtin_loongarch_ibar (imm0_32767)
-
-  void __builtin_loongarch_syscall (imm0_32767)
-  void __builtin_loongarch_break (imm0_32767)
-@end smallexample
-
-These intrinsic functions are available by using @option{-mfrecipe}.
-@smallexample
-  float __builtin_loongarch_frecipe_s (float);
-  double __builtin_loongarch_frecipe_d (double);
-  float __builtin_loongarch_frsqrte_s (float);
-  double __builtin_loongarch_frsqrte_d (double);
-@end smallexample
-
-@emph{Note:} The control registers come in 32-bit and 64-bit variants,
-but the access instructions do not distinguish between them, so GCC
-renames the control instructions when implementing these intrinsics.
-
-Take the @code{csrrd} instruction as an example; the built-in functions
-are implemented as follows:
-@smallexample
-  __builtin_loongarch_csrrd_w  // Use to read a 32-bit control register.
-  __builtin_loongarch_csrrd_d  // Use to read a 64-bit control register.
-@end smallexample
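-
-For instance, a minimal usage sketch (the register number 0 is
-arbitrary and purely illustrative):
-
-@smallexample
-unsigned int
-read_csr0 (void)
-@{
-  /* The operand must be an imm0_16383 compile-time constant.  */
-  return __builtin_loongarch_csrrd_w (0);
-@}
-@end smallexample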
-
-For convenience, these built-in functions are wrapped in shorter
-functions; the wrappers and the types @code{__drdtime_t} and
-@code{__rdtime_t} are defined in @code{larchintrin.h}, so to call the
-following functions you need to include @code{larchintrin.h}.
-
-@smallexample
-  typedef struct drdtime@{
-    unsigned long dvalue;
-    unsigned long dtimeid;
-  @} __drdtime_t;
-
-  typedef struct rdtime@{
-    unsigned int value;
-    unsigned int timeid;
-  @} __rdtime_t;
-@end smallexample
-
-@smallexample
-  __drdtime_t __rdtime_d (void)
-  __rdtime_t __rdtimel_w (void)
-  __rdtime_t __rdtimeh_w (void)
-  unsigned int __movfcsr2gr (imm0_31)
-  void __movgr2fcsr (imm0_31, unsigned int)
-  void __cacop_d (imm0_31, unsigned long, imm_n2048_2047)
-  unsigned int __cpucfg (unsigned int)
-  void __asrtle_d (long int, long int)
-  void __asrtgt_d (long int, long int)
-  long int __lddir_d (long int, imm0_31)
-  void __ldpte_d (long int, imm0_31)
-
-  int __crc_w_b_w (char, int)
-  int __crc_w_h_w (short, int)
-  int __crc_w_w_w (int, int)
-  int __crc_w_d_w (long int, int)
-  int __crcc_w_b_w (char, int)
-  int __crcc_w_h_w (short, int)
-  int __crcc_w_w_w (int, int)
-  int __crcc_w_d_w (long int, int)
-
-  unsigned int __csrrd_w (imm0_16383)
-  unsigned int __csrwr_w (unsigned int, imm0_16383)
-  unsigned int __csrxchg_w (unsigned int, unsigned int, imm0_16383)
-  unsigned long __csrrd_d (imm0_16383)
-  unsigned long __csrwr_d (unsigned long, imm0_16383)
-  unsigned long __csrxchg_d (unsigned long, unsigned long, imm0_16383)
-
-  unsigned char __iocsrrd_b (unsigned int)
-  unsigned short __iocsrrd_h (unsigned int)
-  unsigned int __iocsrrd_w (unsigned int)
-  unsigned long __iocsrrd_d (unsigned int)
-  void __iocsrwr_b (unsigned char, unsigned int)
-  void __iocsrwr_h (unsigned short, unsigned int)
-  void __iocsrwr_w (unsigned int, unsigned int)
-  void __iocsrwr_d (unsigned long, unsigned int)
-
-  void __dbar (imm0_32767)
-  void __ibar (imm0_32767)
-
-  void __syscall (imm0_32767)
-  void __break (imm0_32767)
-@end smallexample
-
-These intrinsic functions are available by including @code{larchintrin.h}
-and using @option{-mfrecipe}.
-@smallexample
-  float __frecipe_s (float);
-  double __frecipe_d (double);
-  float __frsqrte_s (float);
-  double __frsqrte_d (double);
-@end smallexample
-
-Additional built-in functions are available for LoongArch family
-processors to efficiently use 128-bit floating-point
-(@code{__float128}) values.
-
-The following are the basic built-in functions supported.
-@smallexample
-__float128 __builtin_fabsq (__float128);
-__float128 __builtin_copysignq (__float128, __float128);
-__float128 __builtin_infq (void);
-__float128 __builtin_huge_valq (void);
-__float128 __builtin_nanq (void);
-__float128 __builtin_nansq (void);
-@end smallexample
-
-The following built-in function returns the value currently set in the
-@samp{tp} register.
-@smallexample
-  void * __builtin_thread_pointer (void)
-@end smallexample
-
-@node LoongArch SX Vector Intrinsics
-@subsection LoongArch SX Vector Intrinsics
-
-GCC provides intrinsics to access the LSX (Loongson SIMD Extension)
-instructions.  The interface is made available by including
-@code{<lsxintrin.h>} and using @option{-mlsx}.
-
-The following vector typedefs are included in @code{lsxintrin.h}:
-
-@itemize
-@item @code{__m128i}, a 128-bit vector of fixed point;
-@item @code{__m128}, a 128-bit vector of single precision floating point;
-@item @code{__m128d}, a 128-bit vector of double precision floating point.
-@end itemize
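-
-As a brief illustration of these types, a minimal sketch (the function
-name and operation are arbitrary; @code{__lsx_vadd_w} is taken from the
-list below):
-
-@smallexample
-#include <lsxintrin.h>
-
-// Element-wise addition of two vectors of 32-bit integers.
-__m128i
-add_words (__m128i a, __m128i b)
-@{
-  return __lsx_vadd_w (a, b);
-@}
-@end smallexample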
-
-Instructions and their corresponding built-ins may place additional
-restrictions on their input and output values:
-@itemize
-@item @code{imm0_1}, an integer literal in range 0 to 1;
-@item @code{imm0_3}, an integer literal in range 0 to 3;
-@item @code{imm0_7}, an integer literal in range 0 to 7;
-@item @code{imm0_15}, an integer literal in range 0 to 15;
-@item @code{imm0_31}, an integer literal in range 0 to 31;
-@item @code{imm0_63}, an integer literal in range 0 to 63;
-@item @code{imm0_127}, an integer literal in range 0 to 127;
-@item @code{imm0_255}, an integer literal in range 0 to 255;
-@item @code{imm_n16_15}, an integer literal in range -16 to 15;
-@item @code{imm_n128_127}, an integer literal in range -128 to 127;
-@item @code{imm_n256_255}, an integer literal in range -256 to 255;
-@item @code{imm_n512_511}, an integer literal in range -512 to 511;
-@item @code{imm_n1024_1023}, an integer literal in range -1024 to 1023;
-@item @code{imm_n2048_2047}, an integer literal in range -2048 to 2047.
-@end itemize
-
-For convenience, GCC defines functions @code{__lsx_vrepli_@{b/h/w/d@}} and
-@code{__lsx_b[n]z_@{v/b/h/w/d@}}, which are implemented as follows:
-
-@smallexample
-a. @code{__lsx_vrepli_@{b/h/w/d@}}: Implements the case where the highest
-   bit of the @code{vldi} instruction's @code{i13} field is 0.
-
-   i13[12] == 1'b0
-   case i13[11:10] of:
-     2'b00: __lsx_vrepli_b (imm_n512_511)
-     2'b01: __lsx_vrepli_h (imm_n512_511)
-     2'b10: __lsx_vrepli_w (imm_n512_511)
-     2'b11: __lsx_vrepli_d (imm_n512_511)
-
-b. @code{__lsx_b[n]z_@{v/b/h/w/d@}}: These functions are defined because
-   the @code{vseteqz} class of instructions cannot be used on their own.
-
-   _lsx_bz_v  => vseteqz.v    + bcnez
-   _lsx_bnz_v => vsetnez.v    + bcnez
-   _lsx_bz_b  => vsetanyeqz.b + bcnez
-   _lsx_bz_h  => vsetanyeqz.h + bcnez
-   _lsx_bz_w  => vsetanyeqz.w + bcnez
-   _lsx_bz_d  => vsetanyeqz.d + bcnez
-   _lsx_bnz_b => vsetallnez.b + bcnez
-   _lsx_bnz_h => vsetallnez.h + bcnez
-   _lsx_bnz_w => vsetallnez.w + bcnez
-   _lsx_bnz_d => vsetallnez.d + bcnez
-@end smallexample
-
-@smallexample
-eg:
-  #include <lsxintrin.h>
-
-  extern __m128i @var{a};
-
-  void
-  test (void)
-  @{
-    if (__lsx_bz_v (@var{a}))
-      printf ("1\n");
-    else
-      printf ("2\n");
-  @}
-@end smallexample
-
-@emph{Note:} For instructions where the destination operand is also a
-source operand (i.e.@: only part of the bitfield of the destination
-register is modified), the first argument of the built-in function
-supplies that operand.
- -@smallexample -eg: - #include - - extern __m128i @var{dst}; - extern int @var{src}; - - void - test (void) - @{ - @var{dst} = __lsx_vinsgr2vr_b (@var{dst}, @var{src}, 3); - @} -@end smallexample - -The intrinsics provided are listed below: -@smallexample -int __lsx_bnz_b (__m128i); -int __lsx_bnz_d (__m128i); -int __lsx_bnz_h (__m128i); -int __lsx_bnz_v (__m128i); -int __lsx_bnz_w (__m128i); -int __lsx_bz_b (__m128i); -int __lsx_bz_d (__m128i); -int __lsx_bz_h (__m128i); -int __lsx_bz_v (__m128i); -int __lsx_bz_w (__m128i); -__m128i __lsx_vabsd_b (__m128i, __m128i); -__m128i __lsx_vabsd_bu (__m128i, __m128i); -__m128i __lsx_vabsd_d (__m128i, __m128i); -__m128i __lsx_vabsd_du (__m128i, __m128i); -__m128i __lsx_vabsd_h (__m128i, __m128i); -__m128i __lsx_vabsd_hu (__m128i, __m128i); -__m128i __lsx_vabsd_w (__m128i, __m128i); -__m128i __lsx_vabsd_wu (__m128i, __m128i); -__m128i __lsx_vadda_b (__m128i, __m128i); -__m128i __lsx_vadda_d (__m128i, __m128i); -__m128i __lsx_vadda_h (__m128i, __m128i); -__m128i __lsx_vadda_w (__m128i, __m128i); -__m128i __lsx_vadd_b (__m128i, __m128i); -__m128i __lsx_vadd_d (__m128i, __m128i); -__m128i __lsx_vadd_h (__m128i, __m128i); -__m128i __lsx_vaddi_bu (__m128i, imm0_31); -__m128i __lsx_vaddi_du (__m128i, imm0_31); -__m128i __lsx_vaddi_hu (__m128i, imm0_31); -__m128i __lsx_vaddi_wu (__m128i, imm0_31); -__m128i __lsx_vadd_q (__m128i, __m128i); -__m128i __lsx_vadd_w (__m128i, __m128i); -__m128i __lsx_vaddwev_d_w (__m128i, __m128i); -__m128i __lsx_vaddwev_d_wu (__m128i, __m128i); -__m128i __lsx_vaddwev_d_wu_w (__m128i, __m128i); -__m128i __lsx_vaddwev_h_b (__m128i, __m128i); -__m128i __lsx_vaddwev_h_bu (__m128i, __m128i); -__m128i __lsx_vaddwev_h_bu_b (__m128i, __m128i); -__m128i __lsx_vaddwev_q_d (__m128i, __m128i); -__m128i __lsx_vaddwev_q_du (__m128i, __m128i); -__m128i __lsx_vaddwev_q_du_d (__m128i, __m128i); -__m128i __lsx_vaddwev_w_h (__m128i, __m128i); -__m128i __lsx_vaddwev_w_hu (__m128i, __m128i); -__m128i __lsx_vaddwev_w_hu_h (__m128i, __m128i); -__m128i __lsx_vaddwod_d_w (__m128i, __m128i); -__m128i __lsx_vaddwod_d_wu (__m128i, __m128i); -__m128i __lsx_vaddwod_d_wu_w (__m128i, __m128i); -__m128i __lsx_vaddwod_h_b (__m128i, __m128i); -__m128i __lsx_vaddwod_h_bu (__m128i, __m128i); -__m128i __lsx_vaddwod_h_bu_b (__m128i, __m128i); -__m128i __lsx_vaddwod_q_d (__m128i, __m128i); -__m128i __lsx_vaddwod_q_du (__m128i, __m128i); -__m128i __lsx_vaddwod_q_du_d (__m128i, __m128i); -__m128i __lsx_vaddwod_w_h (__m128i, __m128i); -__m128i __lsx_vaddwod_w_hu (__m128i, __m128i); -__m128i __lsx_vaddwod_w_hu_h (__m128i, __m128i); -__m128i __lsx_vandi_b (__m128i, imm0_255); -__m128i __lsx_vandn_v (__m128i, __m128i); -__m128i __lsx_vand_v (__m128i, __m128i); -__m128i __lsx_vavg_b (__m128i, __m128i); -__m128i __lsx_vavg_bu (__m128i, __m128i); -__m128i __lsx_vavg_d (__m128i, __m128i); -__m128i __lsx_vavg_du (__m128i, __m128i); -__m128i __lsx_vavg_h (__m128i, __m128i); -__m128i __lsx_vavg_hu (__m128i, __m128i); -__m128i __lsx_vavgr_b (__m128i, __m128i); -__m128i __lsx_vavgr_bu (__m128i, __m128i); -__m128i __lsx_vavgr_d (__m128i, __m128i); -__m128i __lsx_vavgr_du (__m128i, __m128i); -__m128i __lsx_vavgr_h (__m128i, __m128i); -__m128i __lsx_vavgr_hu (__m128i, __m128i); -__m128i __lsx_vavgr_w (__m128i, __m128i); -__m128i __lsx_vavgr_wu (__m128i, __m128i); -__m128i __lsx_vavg_w (__m128i, __m128i); -__m128i __lsx_vavg_wu (__m128i, __m128i); -__m128i __lsx_vbitclr_b (__m128i, __m128i); -__m128i __lsx_vbitclr_d (__m128i, __m128i); -__m128i __lsx_vbitclr_h (__m128i, __m128i); 
-__m128i __lsx_vbitclri_b (__m128i, imm0_7); -__m128i __lsx_vbitclri_d (__m128i, imm0_63); -__m128i __lsx_vbitclri_h (__m128i, imm0_15); -__m128i __lsx_vbitclri_w (__m128i, imm0_31); -__m128i __lsx_vbitclr_w (__m128i, __m128i); -__m128i __lsx_vbitrev_b (__m128i, __m128i); -__m128i __lsx_vbitrev_d (__m128i, __m128i); -__m128i __lsx_vbitrev_h (__m128i, __m128i); -__m128i __lsx_vbitrevi_b (__m128i, imm0_7); -__m128i __lsx_vbitrevi_d (__m128i, imm0_63); -__m128i __lsx_vbitrevi_h (__m128i, imm0_15); -__m128i __lsx_vbitrevi_w (__m128i, imm0_31); -__m128i __lsx_vbitrev_w (__m128i, __m128i); -__m128i __lsx_vbitseli_b (__m128i, __m128i, imm0_255); -__m128i __lsx_vbitsel_v (__m128i, __m128i, __m128i); -__m128i __lsx_vbitset_b (__m128i, __m128i); -__m128i __lsx_vbitset_d (__m128i, __m128i); -__m128i __lsx_vbitset_h (__m128i, __m128i); -__m128i __lsx_vbitseti_b (__m128i, imm0_7); -__m128i __lsx_vbitseti_d (__m128i, imm0_63); -__m128i __lsx_vbitseti_h (__m128i, imm0_15); -__m128i __lsx_vbitseti_w (__m128i, imm0_31); -__m128i __lsx_vbitset_w (__m128i, __m128i); -__m128i __lsx_vbsll_v (__m128i, imm0_31); -__m128i __lsx_vbsrl_v (__m128i, imm0_31); -__m128i __lsx_vclo_b (__m128i); -__m128i __lsx_vclo_d (__m128i); -__m128i __lsx_vclo_h (__m128i); -__m128i __lsx_vclo_w (__m128i); -__m128i __lsx_vclz_b (__m128i); -__m128i __lsx_vclz_d (__m128i); -__m128i __lsx_vclz_h (__m128i); -__m128i __lsx_vclz_w (__m128i); -__m128i __lsx_vdiv_b (__m128i, __m128i); -__m128i __lsx_vdiv_bu (__m128i, __m128i); -__m128i __lsx_vdiv_d (__m128i, __m128i); -__m128i __lsx_vdiv_du (__m128i, __m128i); -__m128i __lsx_vdiv_h (__m128i, __m128i); -__m128i __lsx_vdiv_hu (__m128i, __m128i); -__m128i __lsx_vdiv_w (__m128i, __m128i); -__m128i __lsx_vdiv_wu (__m128i, __m128i); -__m128i __lsx_vexth_du_wu (__m128i); -__m128i __lsx_vexth_d_w (__m128i); -__m128i __lsx_vexth_h_b (__m128i); -__m128i __lsx_vexth_hu_bu (__m128i); -__m128i __lsx_vexth_q_d (__m128i); -__m128i __lsx_vexth_qu_du (__m128i); -__m128i __lsx_vexth_w_h (__m128i); -__m128i __lsx_vexth_wu_hu (__m128i); -__m128i __lsx_vextl_q_d (__m128i); -__m128i __lsx_vextl_qu_du (__m128i); -__m128i __lsx_vextrins_b (__m128i, __m128i, imm0_255); -__m128i __lsx_vextrins_d (__m128i, __m128i, imm0_255); -__m128i __lsx_vextrins_h (__m128i, __m128i, imm0_255); -__m128i __lsx_vextrins_w (__m128i, __m128i, imm0_255); -__m128d __lsx_vfadd_d (__m128d, __m128d); -__m128 __lsx_vfadd_s (__m128, __m128); -__m128i __lsx_vfclass_d (__m128d); -__m128i __lsx_vfclass_s (__m128); -__m128i __lsx_vfcmp_caf_d (__m128d, __m128d); -__m128i __lsx_vfcmp_caf_s (__m128, __m128); -__m128i __lsx_vfcmp_ceq_d (__m128d, __m128d); -__m128i __lsx_vfcmp_ceq_s (__m128, __m128); -__m128i __lsx_vfcmp_cle_d (__m128d, __m128d); -__m128i __lsx_vfcmp_cle_s (__m128, __m128); -__m128i __lsx_vfcmp_clt_d (__m128d, __m128d); -__m128i __lsx_vfcmp_clt_s (__m128, __m128); -__m128i __lsx_vfcmp_cne_d (__m128d, __m128d); -__m128i __lsx_vfcmp_cne_s (__m128, __m128); -__m128i __lsx_vfcmp_cor_d (__m128d, __m128d); -__m128i __lsx_vfcmp_cor_s (__m128, __m128); -__m128i __lsx_vfcmp_cueq_d (__m128d, __m128d); -__m128i __lsx_vfcmp_cueq_s (__m128, __m128); -__m128i __lsx_vfcmp_cule_d (__m128d, __m128d); -__m128i __lsx_vfcmp_cule_s (__m128, __m128); -__m128i __lsx_vfcmp_cult_d (__m128d, __m128d); -__m128i __lsx_vfcmp_cult_s (__m128, __m128); -__m128i __lsx_vfcmp_cun_d (__m128d, __m128d); -__m128i __lsx_vfcmp_cune_d (__m128d, __m128d); -__m128i __lsx_vfcmp_cune_s (__m128, __m128); -__m128i __lsx_vfcmp_cun_s (__m128, __m128); -__m128i __lsx_vfcmp_saf_d 
(__m128d, __m128d); -__m128i __lsx_vfcmp_saf_s (__m128, __m128); -__m128i __lsx_vfcmp_seq_d (__m128d, __m128d); -__m128i __lsx_vfcmp_seq_s (__m128, __m128); -__m128i __lsx_vfcmp_sle_d (__m128d, __m128d); -__m128i __lsx_vfcmp_sle_s (__m128, __m128); -__m128i __lsx_vfcmp_slt_d (__m128d, __m128d); -__m128i __lsx_vfcmp_slt_s (__m128, __m128); -__m128i __lsx_vfcmp_sne_d (__m128d, __m128d); -__m128i __lsx_vfcmp_sne_s (__m128, __m128); -__m128i __lsx_vfcmp_sor_d (__m128d, __m128d); -__m128i __lsx_vfcmp_sor_s (__m128, __m128); -__m128i __lsx_vfcmp_sueq_d (__m128d, __m128d); -__m128i __lsx_vfcmp_sueq_s (__m128, __m128); -__m128i __lsx_vfcmp_sule_d (__m128d, __m128d); -__m128i __lsx_vfcmp_sule_s (__m128, __m128); -__m128i __lsx_vfcmp_sult_d (__m128d, __m128d); -__m128i __lsx_vfcmp_sult_s (__m128, __m128); -__m128i __lsx_vfcmp_sun_d (__m128d, __m128d); -__m128i __lsx_vfcmp_sune_d (__m128d, __m128d); -__m128i __lsx_vfcmp_sune_s (__m128, __m128); -__m128i __lsx_vfcmp_sun_s (__m128, __m128); -__m128d __lsx_vfcvth_d_s (__m128); -__m128i __lsx_vfcvt_h_s (__m128, __m128); -__m128 __lsx_vfcvth_s_h (__m128i); -__m128d __lsx_vfcvtl_d_s (__m128); -__m128 __lsx_vfcvtl_s_h (__m128i); -__m128 __lsx_vfcvt_s_d (__m128d, __m128d); -__m128d __lsx_vfdiv_d (__m128d, __m128d); -__m128 __lsx_vfdiv_s (__m128, __m128); -__m128d __lsx_vffint_d_l (__m128i); -__m128d __lsx_vffint_d_lu (__m128i); -__m128d __lsx_vffinth_d_w (__m128i); -__m128d __lsx_vffintl_d_w (__m128i); -__m128 __lsx_vffint_s_l (__m128i, __m128i); -__m128 __lsx_vffint_s_w (__m128i); -__m128 __lsx_vffint_s_wu (__m128i); -__m128d __lsx_vflogb_d (__m128d); -__m128 __lsx_vflogb_s (__m128); -__m128d __lsx_vfmadd_d (__m128d, __m128d, __m128d); -__m128 __lsx_vfmadd_s (__m128, __m128, __m128); -__m128d __lsx_vfmaxa_d (__m128d, __m128d); -__m128 __lsx_vfmaxa_s (__m128, __m128); -__m128d __lsx_vfmax_d (__m128d, __m128d); -__m128 __lsx_vfmax_s (__m128, __m128); -__m128d __lsx_vfmina_d (__m128d, __m128d); -__m128 __lsx_vfmina_s (__m128, __m128); -__m128d __lsx_vfmin_d (__m128d, __m128d); -__m128 __lsx_vfmin_s (__m128, __m128); -__m128d __lsx_vfmsub_d (__m128d, __m128d, __m128d); -__m128 __lsx_vfmsub_s (__m128, __m128, __m128); -__m128d __lsx_vfmul_d (__m128d, __m128d); -__m128 __lsx_vfmul_s (__m128, __m128); -__m128d __lsx_vfnmadd_d (__m128d, __m128d, __m128d); -__m128 __lsx_vfnmadd_s (__m128, __m128, __m128); -__m128d __lsx_vfnmsub_d (__m128d, __m128d, __m128d); -__m128 __lsx_vfnmsub_s (__m128, __m128, __m128); -__m128d __lsx_vfrecip_d (__m128d); -__m128 __lsx_vfrecip_s (__m128); -__m128d __lsx_vfrint_d (__m128d); -__m128d __lsx_vfrintrm_d (__m128d); -__m128 __lsx_vfrintrm_s (__m128); -__m128d __lsx_vfrintrne_d (__m128d); -__m128 __lsx_vfrintrne_s (__m128); -__m128d __lsx_vfrintrp_d (__m128d); -__m128 __lsx_vfrintrp_s (__m128); -__m128d __lsx_vfrintrz_d (__m128d); -__m128 __lsx_vfrintrz_s (__m128); -__m128 __lsx_vfrint_s (__m128); -__m128d __lsx_vfrsqrt_d (__m128d); -__m128 __lsx_vfrsqrt_s (__m128); -__m128i __lsx_vfrstp_b (__m128i, __m128i, __m128i); -__m128i __lsx_vfrstp_h (__m128i, __m128i, __m128i); -__m128i __lsx_vfrstpi_b (__m128i, __m128i, imm0_31); -__m128i __lsx_vfrstpi_h (__m128i, __m128i, imm0_31); -__m128d __lsx_vfsqrt_d (__m128d); -__m128 __lsx_vfsqrt_s (__m128); -__m128d __lsx_vfsub_d (__m128d, __m128d); -__m128 __lsx_vfsub_s (__m128, __m128); -__m128i __lsx_vftinth_l_s (__m128); -__m128i __lsx_vftint_l_d (__m128d); -__m128i __lsx_vftintl_l_s (__m128); -__m128i __lsx_vftint_lu_d (__m128d); -__m128i __lsx_vftintrmh_l_s (__m128); -__m128i __lsx_vftintrm_l_d 
(__m128d); -__m128i __lsx_vftintrml_l_s (__m128); -__m128i __lsx_vftintrm_w_d (__m128d, __m128d); -__m128i __lsx_vftintrm_w_s (__m128); -__m128i __lsx_vftintrneh_l_s (__m128); -__m128i __lsx_vftintrne_l_d (__m128d); -__m128i __lsx_vftintrnel_l_s (__m128); -__m128i __lsx_vftintrne_w_d (__m128d, __m128d); -__m128i __lsx_vftintrne_w_s (__m128); -__m128i __lsx_vftintrph_l_s (__m128); -__m128i __lsx_vftintrp_l_d (__m128d); -__m128i __lsx_vftintrpl_l_s (__m128); -__m128i __lsx_vftintrp_w_d (__m128d, __m128d); -__m128i __lsx_vftintrp_w_s (__m128); -__m128i __lsx_vftintrzh_l_s (__m128); -__m128i __lsx_vftintrz_l_d (__m128d); -__m128i __lsx_vftintrzl_l_s (__m128); -__m128i __lsx_vftintrz_lu_d (__m128d); -__m128i __lsx_vftintrz_w_d (__m128d, __m128d); -__m128i __lsx_vftintrz_w_s (__m128); -__m128i __lsx_vftintrz_wu_s (__m128); -__m128i __lsx_vftint_w_d (__m128d, __m128d); -__m128i __lsx_vftint_w_s (__m128); -__m128i __lsx_vftint_wu_s (__m128); -__m128i __lsx_vhaddw_du_wu (__m128i, __m128i); -__m128i __lsx_vhaddw_d_w (__m128i, __m128i); -__m128i __lsx_vhaddw_h_b (__m128i, __m128i); -__m128i __lsx_vhaddw_hu_bu (__m128i, __m128i); -__m128i __lsx_vhaddw_q_d (__m128i, __m128i); -__m128i __lsx_vhaddw_qu_du (__m128i, __m128i); -__m128i __lsx_vhaddw_w_h (__m128i, __m128i); -__m128i __lsx_vhaddw_wu_hu (__m128i, __m128i); -__m128i __lsx_vhsubw_du_wu (__m128i, __m128i); -__m128i __lsx_vhsubw_d_w (__m128i, __m128i); -__m128i __lsx_vhsubw_h_b (__m128i, __m128i); -__m128i __lsx_vhsubw_hu_bu (__m128i, __m128i); -__m128i __lsx_vhsubw_q_d (__m128i, __m128i); -__m128i __lsx_vhsubw_qu_du (__m128i, __m128i); -__m128i __lsx_vhsubw_w_h (__m128i, __m128i); -__m128i __lsx_vhsubw_wu_hu (__m128i, __m128i); -__m128i __lsx_vilvh_b (__m128i, __m128i); -__m128i __lsx_vilvh_d (__m128i, __m128i); -__m128i __lsx_vilvh_h (__m128i, __m128i); -__m128i __lsx_vilvh_w (__m128i, __m128i); -__m128i __lsx_vilvl_b (__m128i, __m128i); -__m128i __lsx_vilvl_d (__m128i, __m128i); -__m128i __lsx_vilvl_h (__m128i, __m128i); -__m128i __lsx_vilvl_w (__m128i, __m128i); -__m128i __lsx_vinsgr2vr_b (__m128i, int, imm0_15); -__m128i __lsx_vinsgr2vr_d (__m128i, long int, imm0_1); -__m128i __lsx_vinsgr2vr_h (__m128i, int, imm0_7); -__m128i __lsx_vinsgr2vr_w (__m128i, int, imm0_3); -__m128i __lsx_vld (void *, imm_n2048_2047); -__m128i __lsx_vldi (imm_n1024_1023); -__m128i __lsx_vldrepl_b (void *, imm_n2048_2047); -__m128i __lsx_vldrepl_d (void *, imm_n256_255); -__m128i __lsx_vldrepl_h (void *, imm_n1024_1023); -__m128i __lsx_vldrepl_w (void *, imm_n512_511); -__m128i __lsx_vldx (void *, long int); -__m128i __lsx_vmadd_b (__m128i, __m128i, __m128i); -__m128i __lsx_vmadd_d (__m128i, __m128i, __m128i); -__m128i __lsx_vmadd_h (__m128i, __m128i, __m128i); -__m128i __lsx_vmadd_w (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_d_w (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_d_wu (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_d_wu_w (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_h_b (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_h_bu (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_h_bu_b (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_q_d (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_q_du (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_q_du_d (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_w_h (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_w_hu (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_w_hu_h (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwod_d_w (__m128i, __m128i, __m128i); -__m128i 
__lsx_vmaddwod_d_wu (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwod_d_wu_w (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwod_h_b (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwod_h_bu (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwod_h_bu_b (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwod_q_d (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwod_q_du (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwod_q_du_d (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwod_w_h (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwod_w_hu (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwod_w_hu_h (__m128i, __m128i, __m128i); -__m128i __lsx_vmax_b (__m128i, __m128i); -__m128i __lsx_vmax_bu (__m128i, __m128i); -__m128i __lsx_vmax_d (__m128i, __m128i); -__m128i __lsx_vmax_du (__m128i, __m128i); -__m128i __lsx_vmax_h (__m128i, __m128i); -__m128i __lsx_vmax_hu (__m128i, __m128i); -__m128i __lsx_vmaxi_b (__m128i, imm_n16_15); -__m128i __lsx_vmaxi_bu (__m128i, imm0_31); -__m128i __lsx_vmaxi_d (__m128i, imm_n16_15); -__m128i __lsx_vmaxi_du (__m128i, imm0_31); -__m128i __lsx_vmaxi_h (__m128i, imm_n16_15); -__m128i __lsx_vmaxi_hu (__m128i, imm0_31); -__m128i __lsx_vmaxi_w (__m128i, imm_n16_15); -__m128i __lsx_vmaxi_wu (__m128i, imm0_31); -__m128i __lsx_vmax_w (__m128i, __m128i); -__m128i __lsx_vmax_wu (__m128i, __m128i); -__m128i __lsx_vmin_b (__m128i, __m128i); -__m128i __lsx_vmin_bu (__m128i, __m128i); -__m128i __lsx_vmin_d (__m128i, __m128i); -__m128i __lsx_vmin_du (__m128i, __m128i); -__m128i __lsx_vmin_h (__m128i, __m128i); -__m128i __lsx_vmin_hu (__m128i, __m128i); -__m128i __lsx_vmini_b (__m128i, imm_n16_15); -__m128i __lsx_vmini_bu (__m128i, imm0_31); -__m128i __lsx_vmini_d (__m128i, imm_n16_15); -__m128i __lsx_vmini_du (__m128i, imm0_31); -__m128i __lsx_vmini_h (__m128i, imm_n16_15); -__m128i __lsx_vmini_hu (__m128i, imm0_31); -__m128i __lsx_vmini_w (__m128i, imm_n16_15); -__m128i __lsx_vmini_wu (__m128i, imm0_31); -__m128i __lsx_vmin_w (__m128i, __m128i); -__m128i __lsx_vmin_wu (__m128i, __m128i); -__m128i __lsx_vmod_b (__m128i, __m128i); -__m128i __lsx_vmod_bu (__m128i, __m128i); -__m128i __lsx_vmod_d (__m128i, __m128i); -__m128i __lsx_vmod_du (__m128i, __m128i); -__m128i __lsx_vmod_h (__m128i, __m128i); -__m128i __lsx_vmod_hu (__m128i, __m128i); -__m128i __lsx_vmod_w (__m128i, __m128i); -__m128i __lsx_vmod_wu (__m128i, __m128i); -__m128i __lsx_vmskgez_b (__m128i); -__m128i __lsx_vmskltz_b (__m128i); -__m128i __lsx_vmskltz_d (__m128i); -__m128i __lsx_vmskltz_h (__m128i); -__m128i __lsx_vmskltz_w (__m128i); -__m128i __lsx_vmsknz_b (__m128i); -__m128i __lsx_vmsub_b (__m128i, __m128i, __m128i); -__m128i __lsx_vmsub_d (__m128i, __m128i, __m128i); -__m128i __lsx_vmsub_h (__m128i, __m128i, __m128i); -__m128i __lsx_vmsub_w (__m128i, __m128i, __m128i); -__m128i __lsx_vmuh_b (__m128i, __m128i); -__m128i __lsx_vmuh_bu (__m128i, __m128i); -__m128i __lsx_vmuh_d (__m128i, __m128i); -__m128i __lsx_vmuh_du (__m128i, __m128i); -__m128i __lsx_vmuh_h (__m128i, __m128i); -__m128i __lsx_vmuh_hu (__m128i, __m128i); -__m128i __lsx_vmuh_w (__m128i, __m128i); -__m128i __lsx_vmuh_wu (__m128i, __m128i); -__m128i __lsx_vmul_b (__m128i, __m128i); -__m128i __lsx_vmul_d (__m128i, __m128i); -__m128i __lsx_vmul_h (__m128i, __m128i); -__m128i __lsx_vmul_w (__m128i, __m128i); -__m128i __lsx_vmulwev_d_w (__m128i, __m128i); -__m128i __lsx_vmulwev_d_wu (__m128i, __m128i); -__m128i __lsx_vmulwev_d_wu_w (__m128i, __m128i); -__m128i __lsx_vmulwev_h_b (__m128i, __m128i); -__m128i __lsx_vmulwev_h_bu (__m128i, __m128i); -__m128i 
__lsx_vmulwev_h_bu_b (__m128i, __m128i); -__m128i __lsx_vmulwev_q_d (__m128i, __m128i); -__m128i __lsx_vmulwev_q_du (__m128i, __m128i); -__m128i __lsx_vmulwev_q_du_d (__m128i, __m128i); -__m128i __lsx_vmulwev_w_h (__m128i, __m128i); -__m128i __lsx_vmulwev_w_hu (__m128i, __m128i); -__m128i __lsx_vmulwev_w_hu_h (__m128i, __m128i); -__m128i __lsx_vmulwod_d_w (__m128i, __m128i); -__m128i __lsx_vmulwod_d_wu (__m128i, __m128i); -__m128i __lsx_vmulwod_d_wu_w (__m128i, __m128i); -__m128i __lsx_vmulwod_h_b (__m128i, __m128i); -__m128i __lsx_vmulwod_h_bu (__m128i, __m128i); -__m128i __lsx_vmulwod_h_bu_b (__m128i, __m128i); -__m128i __lsx_vmulwod_q_d (__m128i, __m128i); -__m128i __lsx_vmulwod_q_du (__m128i, __m128i); -__m128i __lsx_vmulwod_q_du_d (__m128i, __m128i); -__m128i __lsx_vmulwod_w_h (__m128i, __m128i); -__m128i __lsx_vmulwod_w_hu (__m128i, __m128i); -__m128i __lsx_vmulwod_w_hu_h (__m128i, __m128i); -__m128i __lsx_vneg_b (__m128i); -__m128i __lsx_vneg_d (__m128i); -__m128i __lsx_vneg_h (__m128i); -__m128i __lsx_vneg_w (__m128i); -__m128i __lsx_vnori_b (__m128i, imm0_255); -__m128i __lsx_vnor_v (__m128i, __m128i); -__m128i __lsx_vori_b (__m128i, imm0_255); -__m128i __lsx_vorn_v (__m128i, __m128i); -__m128i __lsx_vor_v (__m128i, __m128i); -__m128i __lsx_vpackev_b (__m128i, __m128i); -__m128i __lsx_vpackev_d (__m128i, __m128i); -__m128i __lsx_vpackev_h (__m128i, __m128i); -__m128i __lsx_vpackev_w (__m128i, __m128i); -__m128i __lsx_vpackod_b (__m128i, __m128i); -__m128i __lsx_vpackod_d (__m128i, __m128i); -__m128i __lsx_vpackod_h (__m128i, __m128i); -__m128i __lsx_vpackod_w (__m128i, __m128i); -__m128i __lsx_vpcnt_b (__m128i); -__m128i __lsx_vpcnt_d (__m128i); -__m128i __lsx_vpcnt_h (__m128i); -__m128i __lsx_vpcnt_w (__m128i); -__m128i __lsx_vpermi_w (__m128i, __m128i, imm0_255); -__m128i __lsx_vpickev_b (__m128i, __m128i); -__m128i __lsx_vpickev_d (__m128i, __m128i); -__m128i __lsx_vpickev_h (__m128i, __m128i); -__m128i __lsx_vpickev_w (__m128i, __m128i); -__m128i __lsx_vpickod_b (__m128i, __m128i); -__m128i __lsx_vpickod_d (__m128i, __m128i); -__m128i __lsx_vpickod_h (__m128i, __m128i); -__m128i __lsx_vpickod_w (__m128i, __m128i); -int __lsx_vpickve2gr_b (__m128i, imm0_15); -unsigned int __lsx_vpickve2gr_bu (__m128i, imm0_15); -long int __lsx_vpickve2gr_d (__m128i, imm0_1); -unsigned long int __lsx_vpickve2gr_du (__m128i, imm0_1); -int __lsx_vpickve2gr_h (__m128i, imm0_7); -unsigned int __lsx_vpickve2gr_hu (__m128i, imm0_7); -int __lsx_vpickve2gr_w (__m128i, imm0_3); -unsigned int __lsx_vpickve2gr_wu (__m128i, imm0_3); -__m128i __lsx_vreplgr2vr_b (int); -__m128i __lsx_vreplgr2vr_d (long int); -__m128i __lsx_vreplgr2vr_h (int); -__m128i __lsx_vreplgr2vr_w (int); -__m128i __lsx_vrepli_b (imm_n512_511); -__m128i __lsx_vrepli_d (imm_n512_511); -__m128i __lsx_vrepli_h (imm_n512_511); -__m128i __lsx_vrepli_w (imm_n512_511); -__m128i __lsx_vreplve_b (__m128i, int); -__m128i __lsx_vreplve_d (__m128i, int); -__m128i __lsx_vreplve_h (__m128i, int); -__m128i __lsx_vreplvei_b (__m128i, imm0_15); -__m128i __lsx_vreplvei_d (__m128i, imm0_1); -__m128i __lsx_vreplvei_h (__m128i, imm0_7); -__m128i __lsx_vreplvei_w (__m128i, imm0_3); -__m128i __lsx_vreplve_w (__m128i, int); -__m128i __lsx_vrotr_b (__m128i, __m128i); -__m128i __lsx_vrotr_d (__m128i, __m128i); -__m128i __lsx_vrotr_h (__m128i, __m128i); -__m128i __lsx_vrotri_b (__m128i, imm0_7); -__m128i __lsx_vrotri_d (__m128i, imm0_63); -__m128i __lsx_vrotri_h (__m128i, imm0_15); -__m128i __lsx_vrotri_w (__m128i, imm0_31); -__m128i __lsx_vrotr_w (__m128i, 
__m128i); -__m128i __lsx_vsadd_b (__m128i, __m128i); -__m128i __lsx_vsadd_bu (__m128i, __m128i); -__m128i __lsx_vsadd_d (__m128i, __m128i); -__m128i __lsx_vsadd_du (__m128i, __m128i); -__m128i __lsx_vsadd_h (__m128i, __m128i); -__m128i __lsx_vsadd_hu (__m128i, __m128i); -__m128i __lsx_vsadd_w (__m128i, __m128i); -__m128i __lsx_vsadd_wu (__m128i, __m128i); -__m128i __lsx_vsat_b (__m128i, imm0_7); -__m128i __lsx_vsat_bu (__m128i, imm0_7); -__m128i __lsx_vsat_d (__m128i, imm0_63); -__m128i __lsx_vsat_du (__m128i, imm0_63); -__m128i __lsx_vsat_h (__m128i, imm0_15); -__m128i __lsx_vsat_hu (__m128i, imm0_15); -__m128i __lsx_vsat_w (__m128i, imm0_31); -__m128i __lsx_vsat_wu (__m128i, imm0_31); -__m128i __lsx_vseq_b (__m128i, __m128i); -__m128i __lsx_vseq_d (__m128i, __m128i); -__m128i __lsx_vseq_h (__m128i, __m128i); -__m128i __lsx_vseqi_b (__m128i, imm_n16_15); -__m128i __lsx_vseqi_d (__m128i, imm_n16_15); -__m128i __lsx_vseqi_h (__m128i, imm_n16_15); -__m128i __lsx_vseqi_w (__m128i, imm_n16_15); -__m128i __lsx_vseq_w (__m128i, __m128i); -__m128i __lsx_vshuf4i_b (__m128i, imm0_255); -__m128i __lsx_vshuf4i_d (__m128i, __m128i, imm0_255); -__m128i __lsx_vshuf4i_h (__m128i, imm0_255); -__m128i __lsx_vshuf4i_w (__m128i, imm0_255); -__m128i __lsx_vshuf_b (__m128i, __m128i, __m128i); -__m128i __lsx_vshuf_d (__m128i, __m128i, __m128i); -__m128i __lsx_vshuf_h (__m128i, __m128i, __m128i); -__m128i __lsx_vshuf_w (__m128i, __m128i, __m128i); -__m128i __lsx_vsigncov_b (__m128i, __m128i); -__m128i __lsx_vsigncov_d (__m128i, __m128i); -__m128i __lsx_vsigncov_h (__m128i, __m128i); -__m128i __lsx_vsigncov_w (__m128i, __m128i); -__m128i __lsx_vsle_b (__m128i, __m128i); -__m128i __lsx_vsle_bu (__m128i, __m128i); -__m128i __lsx_vsle_d (__m128i, __m128i); -__m128i __lsx_vsle_du (__m128i, __m128i); -__m128i __lsx_vsle_h (__m128i, __m128i); -__m128i __lsx_vsle_hu (__m128i, __m128i); -__m128i __lsx_vslei_b (__m128i, imm_n16_15); -__m128i __lsx_vslei_bu (__m128i, imm0_31); -__m128i __lsx_vslei_d (__m128i, imm_n16_15); -__m128i __lsx_vslei_du (__m128i, imm0_31); -__m128i __lsx_vslei_h (__m128i, imm_n16_15); -__m128i __lsx_vslei_hu (__m128i, imm0_31); -__m128i __lsx_vslei_w (__m128i, imm_n16_15); -__m128i __lsx_vslei_wu (__m128i, imm0_31); -__m128i __lsx_vsle_w (__m128i, __m128i); -__m128i __lsx_vsle_wu (__m128i, __m128i); -__m128i __lsx_vsll_b (__m128i, __m128i); -__m128i __lsx_vsll_d (__m128i, __m128i); -__m128i __lsx_vsll_h (__m128i, __m128i); -__m128i __lsx_vslli_b (__m128i, imm0_7); -__m128i __lsx_vslli_d (__m128i, imm0_63); -__m128i __lsx_vslli_h (__m128i, imm0_15); -__m128i __lsx_vslli_w (__m128i, imm0_31); -__m128i __lsx_vsll_w (__m128i, __m128i); -__m128i __lsx_vsllwil_du_wu (__m128i, imm0_31); -__m128i __lsx_vsllwil_d_w (__m128i, imm0_31); -__m128i __lsx_vsllwil_h_b (__m128i, imm0_7); -__m128i __lsx_vsllwil_hu_bu (__m128i, imm0_7); -__m128i __lsx_vsllwil_w_h (__m128i, imm0_15); -__m128i __lsx_vsllwil_wu_hu (__m128i, imm0_15); -__m128i __lsx_vslt_b (__m128i, __m128i); -__m128i __lsx_vslt_bu (__m128i, __m128i); -__m128i __lsx_vslt_d (__m128i, __m128i); -__m128i __lsx_vslt_du (__m128i, __m128i); -__m128i __lsx_vslt_h (__m128i, __m128i); -__m128i __lsx_vslt_hu (__m128i, __m128i); -__m128i __lsx_vslti_b (__m128i, imm_n16_15); -__m128i __lsx_vslti_bu (__m128i, imm0_31); -__m128i __lsx_vslti_d (__m128i, imm_n16_15); -__m128i __lsx_vslti_du (__m128i, imm0_31); -__m128i __lsx_vslti_h (__m128i, imm_n16_15); -__m128i __lsx_vslti_hu (__m128i, imm0_31); -__m128i __lsx_vslti_w (__m128i, imm_n16_15); -__m128i __lsx_vslti_wu 
(__m128i, imm0_31); -__m128i __lsx_vslt_w (__m128i, __m128i); -__m128i __lsx_vslt_wu (__m128i, __m128i); -__m128i __lsx_vsra_b (__m128i, __m128i); -__m128i __lsx_vsra_d (__m128i, __m128i); -__m128i __lsx_vsra_h (__m128i, __m128i); -__m128i __lsx_vsrai_b (__m128i, imm0_7); -__m128i __lsx_vsrai_d (__m128i, imm0_63); -__m128i __lsx_vsrai_h (__m128i, imm0_15); -__m128i __lsx_vsrai_w (__m128i, imm0_31); -__m128i __lsx_vsran_b_h (__m128i, __m128i); -__m128i __lsx_vsran_h_w (__m128i, __m128i); -__m128i __lsx_vsrani_b_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vsrani_d_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vsrani_h_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vsrani_w_d (__m128i, __m128i, imm0_63); -__m128i __lsx_vsran_w_d (__m128i, __m128i); -__m128i __lsx_vsrar_b (__m128i, __m128i); -__m128i __lsx_vsrar_d (__m128i, __m128i); -__m128i __lsx_vsrar_h (__m128i, __m128i); -__m128i __lsx_vsrari_b (__m128i, imm0_7); -__m128i __lsx_vsrari_d (__m128i, imm0_63); -__m128i __lsx_vsrari_h (__m128i, imm0_15); -__m128i __lsx_vsrari_w (__m128i, imm0_31); -__m128i __lsx_vsrarn_b_h (__m128i, __m128i); -__m128i __lsx_vsrarn_h_w (__m128i, __m128i); -__m128i __lsx_vsrarni_b_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vsrarni_d_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vsrarni_h_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vsrarni_w_d (__m128i, __m128i, imm0_63); -__m128i __lsx_vsrarn_w_d (__m128i, __m128i); -__m128i __lsx_vsrar_w (__m128i, __m128i); -__m128i __lsx_vsra_w (__m128i, __m128i); -__m128i __lsx_vsrl_b (__m128i, __m128i); -__m128i __lsx_vsrl_d (__m128i, __m128i); -__m128i __lsx_vsrl_h (__m128i, __m128i); -__m128i __lsx_vsrli_b (__m128i, imm0_7); -__m128i __lsx_vsrli_d (__m128i, imm0_63); -__m128i __lsx_vsrli_h (__m128i, imm0_15); -__m128i __lsx_vsrli_w (__m128i, imm0_31); -__m128i __lsx_vsrln_b_h (__m128i, __m128i); -__m128i __lsx_vsrln_h_w (__m128i, __m128i); -__m128i __lsx_vsrlni_b_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vsrlni_d_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vsrlni_h_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vsrlni_w_d (__m128i, __m128i, imm0_63); -__m128i __lsx_vsrln_w_d (__m128i, __m128i); -__m128i __lsx_vsrlr_b (__m128i, __m128i); -__m128i __lsx_vsrlr_d (__m128i, __m128i); -__m128i __lsx_vsrlr_h (__m128i, __m128i); -__m128i __lsx_vsrlri_b (__m128i, imm0_7); -__m128i __lsx_vsrlri_d (__m128i, imm0_63); -__m128i __lsx_vsrlri_h (__m128i, imm0_15); -__m128i __lsx_vsrlri_w (__m128i, imm0_31); -__m128i __lsx_vsrlrn_b_h (__m128i, __m128i); -__m128i __lsx_vsrlrn_h_w (__m128i, __m128i); -__m128i __lsx_vsrlrni_b_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vsrlrni_d_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vsrlrni_h_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vsrlrni_w_d (__m128i, __m128i, imm0_63); -__m128i __lsx_vsrlrn_w_d (__m128i, __m128i); -__m128i __lsx_vsrlr_w (__m128i, __m128i); -__m128i __lsx_vsrl_w (__m128i, __m128i); -__m128i __lsx_vssran_b_h (__m128i, __m128i); -__m128i __lsx_vssran_bu_h (__m128i, __m128i); -__m128i __lsx_vssran_hu_w (__m128i, __m128i); -__m128i __lsx_vssran_h_w (__m128i, __m128i); -__m128i __lsx_vssrani_b_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vssrani_bu_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vssrani_d_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vssrani_du_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vssrani_hu_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vssrani_h_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vssrani_w_d (__m128i, __m128i, imm0_63); -__m128i __lsx_vssrani_wu_d (__m128i, __m128i, imm0_63); 
-__m128i __lsx_vssran_w_d (__m128i, __m128i); -__m128i __lsx_vssran_wu_d (__m128i, __m128i); -__m128i __lsx_vssrarn_b_h (__m128i, __m128i); -__m128i __lsx_vssrarn_bu_h (__m128i, __m128i); -__m128i __lsx_vssrarn_hu_w (__m128i, __m128i); -__m128i __lsx_vssrarn_h_w (__m128i, __m128i); -__m128i __lsx_vssrarni_b_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vssrarni_bu_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vssrarni_d_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vssrarni_du_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vssrarni_hu_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vssrarni_h_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vssrarni_w_d (__m128i, __m128i, imm0_63); -__m128i __lsx_vssrarni_wu_d (__m128i, __m128i, imm0_63); -__m128i __lsx_vssrarn_w_d (__m128i, __m128i); -__m128i __lsx_vssrarn_wu_d (__m128i, __m128i); -__m128i __lsx_vssrln_b_h (__m128i, __m128i); -__m128i __lsx_vssrln_bu_h (__m128i, __m128i); -__m128i __lsx_vssrln_hu_w (__m128i, __m128i); -__m128i __lsx_vssrln_h_w (__m128i, __m128i); -__m128i __lsx_vssrlni_b_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vssrlni_bu_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vssrlni_d_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vssrlni_du_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vssrlni_hu_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vssrlni_h_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vssrlni_w_d (__m128i, __m128i, imm0_63); -__m128i __lsx_vssrlni_wu_d (__m128i, __m128i, imm0_63); -__m128i __lsx_vssrln_w_d (__m128i, __m128i); -__m128i __lsx_vssrln_wu_d (__m128i, __m128i); -__m128i __lsx_vssrlrn_b_h (__m128i, __m128i); -__m128i __lsx_vssrlrn_bu_h (__m128i, __m128i); -__m128i __lsx_vssrlrn_hu_w (__m128i, __m128i); -__m128i __lsx_vssrlrn_h_w (__m128i, __m128i); -__m128i __lsx_vssrlrni_b_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vssrlrni_bu_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vssrlrni_d_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vssrlrni_du_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vssrlrni_hu_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vssrlrni_h_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vssrlrni_w_d (__m128i, __m128i, imm0_63); -__m128i __lsx_vssrlrni_wu_d (__m128i, __m128i, imm0_63); -__m128i __lsx_vssrlrn_w_d (__m128i, __m128i); -__m128i __lsx_vssrlrn_wu_d (__m128i, __m128i); -__m128i __lsx_vssub_b (__m128i, __m128i); -__m128i __lsx_vssub_bu (__m128i, __m128i); -__m128i __lsx_vssub_d (__m128i, __m128i); -__m128i __lsx_vssub_du (__m128i, __m128i); -__m128i __lsx_vssub_h (__m128i, __m128i); -__m128i __lsx_vssub_hu (__m128i, __m128i); -__m128i __lsx_vssub_w (__m128i, __m128i); -__m128i __lsx_vssub_wu (__m128i, __m128i); -void __lsx_vst (__m128i, void *, imm_n2048_2047); -void __lsx_vstelm_b (__m128i, void *, imm_n128_127, imm0_15); -void __lsx_vstelm_d (__m128i, void *, imm_n128_127, imm0_1); -void __lsx_vstelm_h (__m128i, void *, imm_n128_127, imm0_7); -void __lsx_vstelm_w (__m128i, void *, imm_n128_127, imm0_3); -void __lsx_vstx (__m128i, void *, long int); -__m128i __lsx_vsub_b (__m128i, __m128i); -__m128i __lsx_vsub_d (__m128i, __m128i); -__m128i __lsx_vsub_h (__m128i, __m128i); -__m128i __lsx_vsubi_bu (__m128i, imm0_31); -__m128i __lsx_vsubi_du (__m128i, imm0_31); -__m128i __lsx_vsubi_hu (__m128i, imm0_31); -__m128i __lsx_vsubi_wu (__m128i, imm0_31); -__m128i __lsx_vsub_q (__m128i, __m128i); -__m128i __lsx_vsub_w (__m128i, __m128i); -__m128i __lsx_vsubwev_d_w (__m128i, __m128i); -__m128i __lsx_vsubwev_d_wu (__m128i, __m128i); -__m128i __lsx_vsubwev_h_b (__m128i, __m128i); -__m128i 
__lsx_vsubwev_h_bu (__m128i, __m128i);
-__m128i __lsx_vsubwev_q_d (__m128i, __m128i);
-__m128i __lsx_vsubwev_q_du (__m128i, __m128i);
-__m128i __lsx_vsubwev_w_h (__m128i, __m128i);
-__m128i __lsx_vsubwev_w_hu (__m128i, __m128i);
-__m128i __lsx_vsubwod_d_w (__m128i, __m128i);
-__m128i __lsx_vsubwod_d_wu (__m128i, __m128i);
-__m128i __lsx_vsubwod_h_b (__m128i, __m128i);
-__m128i __lsx_vsubwod_h_bu (__m128i, __m128i);
-__m128i __lsx_vsubwod_q_d (__m128i, __m128i);
-__m128i __lsx_vsubwod_q_du (__m128i, __m128i);
-__m128i __lsx_vsubwod_w_h (__m128i, __m128i);
-__m128i __lsx_vsubwod_w_hu (__m128i, __m128i);
-__m128i __lsx_vxori_b (__m128i, imm0_255);
-__m128i __lsx_vxor_v (__m128i, __m128i);
-@end smallexample
-
-The following intrinsic functions are also available by including
-@code{lsxintrin.h}, but additionally require compiling with both
-@option{-mfrecipe} and @option{-mlsx}.
-@smallexample
-__m128d __lsx_vfrecipe_d (__m128d);
-__m128 __lsx_vfrecipe_s (__m128);
-__m128d __lsx_vfrsqrte_d (__m128d);
-__m128 __lsx_vfrsqrte_s (__m128);
-@end smallexample
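-
-For example, a reciprocal-estimate kernel using these functions might be
-compiled as sketched below (the file and variable names are illustrative
-only):
-
-@smallexample
-  /* Compile with: gcc -mlsx -mfrecipe example.c  */
-  #include <lsxintrin.h>
-
-  extern __m128 @var{x};
-
-  void
-  test (void)
-  @{
-    /* Approximate 1/x and 1/sqrt(x) elementwise.  */
-    @var{x} = __lsx_vfrecipe_s (@var{x});
-    @var{x} = __lsx_vfrsqrte_s (@var{x});
-  @}
-@end smallexample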
-
-@node LoongArch ASX Vector Intrinsics
-@subsection LoongArch ASX Vector Intrinsics
-
-GCC provides intrinsics to access the LASX (Loongson Advanced SIMD
-Extension) instructions. The interface is made available by including
-@code{<lasxintrin.h>} and using @option{-mlasx}.
-
-The following vector typedefs are included in @code{lasxintrin.h}:
-
-@itemize
-@item @code{__m256i}, a 256-bit vector of fixed point;
-@item @code{__m256}, a 256-bit vector of single precision floating point;
-@item @code{__m256d}, a 256-bit vector of double precision floating point.
-@end itemize
-
-Instructions and the corresponding built-ins may place additional
-restrictions on their operands.  The following names denote integer
-immediate operands with the given ranges; as for LSX, these must be
-compile-time integer constants:
-
-@itemize
-@item @code{imm0_1}, an integer literal in range 0 to 1.
-@item @code{imm0_3}, an integer literal in range 0 to 3.
-@item @code{imm0_7}, an integer literal in range 0 to 7.
-@item @code{imm0_15}, an integer literal in range 0 to 15.
-@item @code{imm0_31}, an integer literal in range 0 to 31.
-@item @code{imm0_63}, an integer literal in range 0 to 63.
-@item @code{imm0_127}, an integer literal in range 0 to 127.
-@item @code{imm0_255}, an integer literal in range 0 to 255.
-@item @code{imm_n16_15}, an integer literal in range -16 to 15.
-@item @code{imm_n128_127}, an integer literal in range -128 to 127.
-@item @code{imm_n256_255}, an integer literal in range -256 to 255.
-@item @code{imm_n512_511}, an integer literal in range -512 to 511.
-@item @code{imm_n1024_1023}, an integer literal in range -1024 to 1023.
-@item @code{imm_n2048_2047}, an integer literal in range -2048 to 2047.
-@end itemize
-
-For convenience, GCC defines functions @code{__lasx_xvrepli_@{b/h/w/d@}} and
-@code{__lasx_b[n]z_@{v/b/h/w/d@}}, which are implemented as follows:
-
-@smallexample
-a. @code{__lasx_xvrepli_@{b/h/w/d@}}: These implement the case where the
-   highest bit (@code{i13[12]}) of the @code{xvldi} instruction immediate
-   @code{i13} is 0.
-
-   i13[12] == 1'b0
-   case i13[11:10] of :
-   2'b00: __lasx_xvrepli_b (imm_n512_511)
-   2'b01: __lasx_xvrepli_h (imm_n512_511)
-   2'b10: __lasx_xvrepli_w (imm_n512_511)
-   2'b11: __lasx_xvrepli_d (imm_n512_511)
-
-b. @code{__lasx_b[n]z_@{v/b/h/w/d@}}: Because the @code{xvseteqz} class of
-   instructions cannot be used on its own, these functions are defined.
-
-   __lasx_xbz_v  => xvseteqz.v + bcnez
-   __lasx_xbnz_v => xvsetnez.v + bcnez
-   __lasx_xbz_b  => xvsetanyeqz.b + bcnez
-   __lasx_xbz_h  => xvsetanyeqz.h + bcnez
-   __lasx_xbz_w  => xvsetanyeqz.w + bcnez
-   __lasx_xbz_d  => xvsetanyeqz.d + bcnez
-   __lasx_xbnz_b => xvsetallnez.b + bcnez
-   __lasx_xbnz_h => xvsetallnez.h + bcnez
-   __lasx_xbnz_w => xvsetallnez.w + bcnez
-   __lasx_xbnz_d => xvsetallnez.d + bcnez
-@end smallexample
-
-@smallexample
-eg:
-  #include <lasxintrin.h>
-  #include <stdio.h>
-
-  extern __m256i @var{a};
-
-  void
-  test (void)
-  @{
-    if (__lasx_xbz_v (@var{a}))
-      printf ("1\n");
-    else
-      printf ("2\n");
-  @}
-@end smallexample
-
-@emph{Note:} For instructions where the destination operand is also a
-source operand (only part of the destination register is modified), the
-first argument of the built-in function is used as the destination
-operand, as in the following example.
-
-@smallexample
-eg:
-  #include <lasxintrin.h>
-
-  extern __m256i @var{dst};
-  extern int @var{src};
-
-  void
-  test (void)
-  @{
-    @var{dst} = __lasx_xvinsgr2vr_w (@var{dst}, @var{src}, 3);
-  @}
-@end smallexample
-
-The intrinsics provided are listed below:
-
-@smallexample
-__m256i __lasx_vext2xv_d_b (__m256i);
-__m256i __lasx_vext2xv_d_h (__m256i);
-__m256i __lasx_vext2xv_du_bu (__m256i);
-__m256i __lasx_vext2xv_du_hu (__m256i);
-__m256i __lasx_vext2xv_du_wu (__m256i);
-__m256i __lasx_vext2xv_d_w (__m256i);
-__m256i __lasx_vext2xv_h_b (__m256i);
-__m256i __lasx_vext2xv_hu_bu (__m256i);
-__m256i __lasx_vext2xv_w_b (__m256i);
-__m256i __lasx_vext2xv_w_h (__m256i);
-__m256i __lasx_vext2xv_wu_bu (__m256i);
-__m256i __lasx_vext2xv_wu_hu (__m256i);
-int __lasx_xbnz_b (__m256i);
-int __lasx_xbnz_d (__m256i);
-int __lasx_xbnz_h (__m256i);
-int __lasx_xbnz_v (__m256i);
-int __lasx_xbnz_w (__m256i);
-int __lasx_xbz_b (__m256i);
-int __lasx_xbz_d (__m256i);
-int __lasx_xbz_h (__m256i);
-int __lasx_xbz_v (__m256i);
-int __lasx_xbz_w (__m256i);
-__m256i __lasx_xvabsd_b (__m256i, __m256i);
-__m256i __lasx_xvabsd_bu (__m256i, __m256i);
-__m256i __lasx_xvabsd_d (__m256i, __m256i);
-__m256i __lasx_xvabsd_du (__m256i, __m256i);
-__m256i __lasx_xvabsd_h (__m256i, __m256i);
-__m256i __lasx_xvabsd_hu (__m256i, __m256i);
-__m256i __lasx_xvabsd_w (__m256i, __m256i);
-__m256i __lasx_xvabsd_wu (__m256i, __m256i);
-__m256i __lasx_xvadda_b (__m256i, __m256i);
-__m256i __lasx_xvadda_d (__m256i, __m256i);
-__m256i __lasx_xvadda_h (__m256i, __m256i);
-__m256i __lasx_xvadda_w (__m256i, __m256i);
-__m256i __lasx_xvadd_b (__m256i, __m256i);
-__m256i __lasx_xvadd_d (__m256i, __m256i);
-__m256i __lasx_xvadd_h (__m256i, __m256i);
-__m256i __lasx_xvaddi_bu (__m256i, imm0_31);
-__m256i __lasx_xvaddi_du (__m256i, imm0_31);
-__m256i __lasx_xvaddi_hu (__m256i, imm0_31);
-__m256i __lasx_xvaddi_wu (__m256i, imm0_31);
-__m256i __lasx_xvadd_q (__m256i, __m256i);
-__m256i __lasx_xvadd_w (__m256i, __m256i);
-__m256i __lasx_xvaddwev_d_w (__m256i, __m256i);
-__m256i __lasx_xvaddwev_d_wu (__m256i, __m256i);
-__m256i __lasx_xvaddwev_d_wu_w (__m256i, __m256i);
-__m256i __lasx_xvaddwev_h_b (__m256i, __m256i);
-__m256i __lasx_xvaddwev_h_bu (__m256i, __m256i);
-__m256i __lasx_xvaddwev_h_bu_b (__m256i, __m256i);
-__m256i __lasx_xvaddwev_q_d (__m256i, __m256i);
-__m256i __lasx_xvaddwev_q_du (__m256i, __m256i);
-__m256i __lasx_xvaddwev_q_du_d (__m256i, __m256i);
-__m256i __lasx_xvaddwev_w_h (__m256i, __m256i);
-__m256i __lasx_xvaddwev_w_hu (__m256i, __m256i);
-__m256i __lasx_xvaddwev_w_hu_h (__m256i, __m256i);
-__m256i __lasx_xvaddwod_d_w (__m256i, __m256i);
-__m256i __lasx_xvaddwod_d_wu (__m256i, __m256i);
-__m256i __lasx_xvaddwod_d_wu_w
(__m256i, __m256i); -__m256i __lasx_xvaddwod_h_b (__m256i, __m256i); -__m256i __lasx_xvaddwod_h_bu (__m256i, __m256i); -__m256i __lasx_xvaddwod_h_bu_b (__m256i, __m256i); -__m256i __lasx_xvaddwod_q_d (__m256i, __m256i); -__m256i __lasx_xvaddwod_q_du (__m256i, __m256i); -__m256i __lasx_xvaddwod_q_du_d (__m256i, __m256i); -__m256i __lasx_xvaddwod_w_h (__m256i, __m256i); -__m256i __lasx_xvaddwod_w_hu (__m256i, __m256i); -__m256i __lasx_xvaddwod_w_hu_h (__m256i, __m256i); -__m256i __lasx_xvandi_b (__m256i, imm0_255); -__m256i __lasx_xvandn_v (__m256i, __m256i); -__m256i __lasx_xvand_v (__m256i, __m256i); -__m256i __lasx_xvavg_b (__m256i, __m256i); -__m256i __lasx_xvavg_bu (__m256i, __m256i); -__m256i __lasx_xvavg_d (__m256i, __m256i); -__m256i __lasx_xvavg_du (__m256i, __m256i); -__m256i __lasx_xvavg_h (__m256i, __m256i); -__m256i __lasx_xvavg_hu (__m256i, __m256i); -__m256i __lasx_xvavgr_b (__m256i, __m256i); -__m256i __lasx_xvavgr_bu (__m256i, __m256i); -__m256i __lasx_xvavgr_d (__m256i, __m256i); -__m256i __lasx_xvavgr_du (__m256i, __m256i); -__m256i __lasx_xvavgr_h (__m256i, __m256i); -__m256i __lasx_xvavgr_hu (__m256i, __m256i); -__m256i __lasx_xvavgr_w (__m256i, __m256i); -__m256i __lasx_xvavgr_wu (__m256i, __m256i); -__m256i __lasx_xvavg_w (__m256i, __m256i); -__m256i __lasx_xvavg_wu (__m256i, __m256i); -__m256i __lasx_xvbitclr_b (__m256i, __m256i); -__m256i __lasx_xvbitclr_d (__m256i, __m256i); -__m256i __lasx_xvbitclr_h (__m256i, __m256i); -__m256i __lasx_xvbitclri_b (__m256i, imm0_7); -__m256i __lasx_xvbitclri_d (__m256i, imm0_63); -__m256i __lasx_xvbitclri_h (__m256i, imm0_15); -__m256i __lasx_xvbitclri_w (__m256i, imm0_31); -__m256i __lasx_xvbitclr_w (__m256i, __m256i); -__m256i __lasx_xvbitrev_b (__m256i, __m256i); -__m256i __lasx_xvbitrev_d (__m256i, __m256i); -__m256i __lasx_xvbitrev_h (__m256i, __m256i); -__m256i __lasx_xvbitrevi_b (__m256i, imm0_7); -__m256i __lasx_xvbitrevi_d (__m256i, imm0_63); -__m256i __lasx_xvbitrevi_h (__m256i, imm0_15); -__m256i __lasx_xvbitrevi_w (__m256i, imm0_31); -__m256i __lasx_xvbitrev_w (__m256i, __m256i); -__m256i __lasx_xvbitseli_b (__m256i, __m256i, imm0_255); -__m256i __lasx_xvbitsel_v (__m256i, __m256i, __m256i); -__m256i __lasx_xvbitset_b (__m256i, __m256i); -__m256i __lasx_xvbitset_d (__m256i, __m256i); -__m256i __lasx_xvbitset_h (__m256i, __m256i); -__m256i __lasx_xvbitseti_b (__m256i, imm0_7); -__m256i __lasx_xvbitseti_d (__m256i, imm0_63); -__m256i __lasx_xvbitseti_h (__m256i, imm0_15); -__m256i __lasx_xvbitseti_w (__m256i, imm0_31); -__m256i __lasx_xvbitset_w (__m256i, __m256i); -__m256i __lasx_xvbsll_v (__m256i, imm0_31); -__m256i __lasx_xvbsrl_v (__m256i, imm0_31); -__m256i __lasx_xvclo_b (__m256i); -__m256i __lasx_xvclo_d (__m256i); -__m256i __lasx_xvclo_h (__m256i); -__m256i __lasx_xvclo_w (__m256i); -__m256i __lasx_xvclz_b (__m256i); -__m256i __lasx_xvclz_d (__m256i); -__m256i __lasx_xvclz_h (__m256i); -__m256i __lasx_xvclz_w (__m256i); -__m256i __lasx_xvdiv_b (__m256i, __m256i); -__m256i __lasx_xvdiv_bu (__m256i, __m256i); -__m256i __lasx_xvdiv_d (__m256i, __m256i); -__m256i __lasx_xvdiv_du (__m256i, __m256i); -__m256i __lasx_xvdiv_h (__m256i, __m256i); -__m256i __lasx_xvdiv_hu (__m256i, __m256i); -__m256i __lasx_xvdiv_w (__m256i, __m256i); -__m256i __lasx_xvdiv_wu (__m256i, __m256i); -__m256i __lasx_xvexth_du_wu (__m256i); -__m256i __lasx_xvexth_d_w (__m256i); -__m256i __lasx_xvexth_h_b (__m256i); -__m256i __lasx_xvexth_hu_bu (__m256i); -__m256i __lasx_xvexth_q_d (__m256i); -__m256i __lasx_xvexth_qu_du (__m256i); -__m256i 
__lasx_xvexth_w_h (__m256i); -__m256i __lasx_xvexth_wu_hu (__m256i); -__m256i __lasx_xvextl_q_d (__m256i); -__m256i __lasx_xvextl_qu_du (__m256i); -__m256i __lasx_xvextrins_b (__m256i, __m256i, imm0_255); -__m256i __lasx_xvextrins_d (__m256i, __m256i, imm0_255); -__m256i __lasx_xvextrins_h (__m256i, __m256i, imm0_255); -__m256i __lasx_xvextrins_w (__m256i, __m256i, imm0_255); -__m256d __lasx_xvfadd_d (__m256d, __m256d); -__m256 __lasx_xvfadd_s (__m256, __m256); -__m256i __lasx_xvfclass_d (__m256d); -__m256i __lasx_xvfclass_s (__m256); -__m256i __lasx_xvfcmp_caf_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_caf_s (__m256, __m256); -__m256i __lasx_xvfcmp_ceq_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_ceq_s (__m256, __m256); -__m256i __lasx_xvfcmp_cle_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_cle_s (__m256, __m256); -__m256i __lasx_xvfcmp_clt_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_clt_s (__m256, __m256); -__m256i __lasx_xvfcmp_cne_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_cne_s (__m256, __m256); -__m256i __lasx_xvfcmp_cor_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_cor_s (__m256, __m256); -__m256i __lasx_xvfcmp_cueq_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_cueq_s (__m256, __m256); -__m256i __lasx_xvfcmp_cule_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_cule_s (__m256, __m256); -__m256i __lasx_xvfcmp_cult_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_cult_s (__m256, __m256); -__m256i __lasx_xvfcmp_cun_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_cune_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_cune_s (__m256, __m256); -__m256i __lasx_xvfcmp_cun_s (__m256, __m256); -__m256i __lasx_xvfcmp_saf_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_saf_s (__m256, __m256); -__m256i __lasx_xvfcmp_seq_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_seq_s (__m256, __m256); -__m256i __lasx_xvfcmp_sle_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_sle_s (__m256, __m256); -__m256i __lasx_xvfcmp_slt_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_slt_s (__m256, __m256); -__m256i __lasx_xvfcmp_sne_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_sne_s (__m256, __m256); -__m256i __lasx_xvfcmp_sor_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_sor_s (__m256, __m256); -__m256i __lasx_xvfcmp_sueq_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_sueq_s (__m256, __m256); -__m256i __lasx_xvfcmp_sule_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_sule_s (__m256, __m256); -__m256i __lasx_xvfcmp_sult_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_sult_s (__m256, __m256); -__m256i __lasx_xvfcmp_sun_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_sune_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_sune_s (__m256, __m256); -__m256i __lasx_xvfcmp_sun_s (__m256, __m256); -__m256d __lasx_xvfcvth_d_s (__m256); -__m256i __lasx_xvfcvt_h_s (__m256, __m256); -__m256 __lasx_xvfcvth_s_h (__m256i); -__m256d __lasx_xvfcvtl_d_s (__m256); -__m256 __lasx_xvfcvtl_s_h (__m256i); -__m256 __lasx_xvfcvt_s_d (__m256d, __m256d); -__m256d __lasx_xvfdiv_d (__m256d, __m256d); -__m256 __lasx_xvfdiv_s (__m256, __m256); -__m256d __lasx_xvffint_d_l (__m256i); -__m256d __lasx_xvffint_d_lu (__m256i); -__m256d __lasx_xvffinth_d_w (__m256i); -__m256d __lasx_xvffintl_d_w (__m256i); -__m256 __lasx_xvffint_s_l (__m256i, __m256i); -__m256 __lasx_xvffint_s_w (__m256i); -__m256 __lasx_xvffint_s_wu (__m256i); -__m256d __lasx_xvflogb_d (__m256d); -__m256 __lasx_xvflogb_s (__m256); -__m256d __lasx_xvfmadd_d (__m256d, __m256d, __m256d); -__m256 __lasx_xvfmadd_s (__m256, __m256, __m256); -__m256d __lasx_xvfmaxa_d (__m256d, __m256d); -__m256 __lasx_xvfmaxa_s (__m256, __m256); 
-__m256d __lasx_xvfmax_d (__m256d, __m256d); -__m256 __lasx_xvfmax_s (__m256, __m256); -__m256d __lasx_xvfmina_d (__m256d, __m256d); -__m256 __lasx_xvfmina_s (__m256, __m256); -__m256d __lasx_xvfmin_d (__m256d, __m256d); -__m256 __lasx_xvfmin_s (__m256, __m256); -__m256d __lasx_xvfmsub_d (__m256d, __m256d, __m256d); -__m256 __lasx_xvfmsub_s (__m256, __m256, __m256); -__m256d __lasx_xvfmul_d (__m256d, __m256d); -__m256 __lasx_xvfmul_s (__m256, __m256); -__m256d __lasx_xvfnmadd_d (__m256d, __m256d, __m256d); -__m256 __lasx_xvfnmadd_s (__m256, __m256, __m256); -__m256d __lasx_xvfnmsub_d (__m256d, __m256d, __m256d); -__m256 __lasx_xvfnmsub_s (__m256, __m256, __m256); -__m256d __lasx_xvfrecip_d (__m256d); -__m256 __lasx_xvfrecip_s (__m256); -__m256d __lasx_xvfrint_d (__m256d); -__m256d __lasx_xvfrintrm_d (__m256d); -__m256 __lasx_xvfrintrm_s (__m256); -__m256d __lasx_xvfrintrne_d (__m256d); -__m256 __lasx_xvfrintrne_s (__m256); -__m256d __lasx_xvfrintrp_d (__m256d); -__m256 __lasx_xvfrintrp_s (__m256); -__m256d __lasx_xvfrintrz_d (__m256d); -__m256 __lasx_xvfrintrz_s (__m256); -__m256 __lasx_xvfrint_s (__m256); -__m256d __lasx_xvfrsqrt_d (__m256d); -__m256 __lasx_xvfrsqrt_s (__m256); -__m256i __lasx_xvfrstp_b (__m256i, __m256i, __m256i); -__m256i __lasx_xvfrstp_h (__m256i, __m256i, __m256i); -__m256i __lasx_xvfrstpi_b (__m256i, __m256i, imm0_31); -__m256i __lasx_xvfrstpi_h (__m256i, __m256i, imm0_31); -__m256d __lasx_xvfsqrt_d (__m256d); -__m256 __lasx_xvfsqrt_s (__m256); -__m256d __lasx_xvfsub_d (__m256d, __m256d); -__m256 __lasx_xvfsub_s (__m256, __m256); -__m256i __lasx_xvftinth_l_s (__m256); -__m256i __lasx_xvftint_l_d (__m256d); -__m256i __lasx_xvftintl_l_s (__m256); -__m256i __lasx_xvftint_lu_d (__m256d); -__m256i __lasx_xvftintrmh_l_s (__m256); -__m256i __lasx_xvftintrm_l_d (__m256d); -__m256i __lasx_xvftintrml_l_s (__m256); -__m256i __lasx_xvftintrm_w_d (__m256d, __m256d); -__m256i __lasx_xvftintrm_w_s (__m256); -__m256i __lasx_xvftintrneh_l_s (__m256); -__m256i __lasx_xvftintrne_l_d (__m256d); -__m256i __lasx_xvftintrnel_l_s (__m256); -__m256i __lasx_xvftintrne_w_d (__m256d, __m256d); -__m256i __lasx_xvftintrne_w_s (__m256); -__m256i __lasx_xvftintrph_l_s (__m256); -__m256i __lasx_xvftintrp_l_d (__m256d); -__m256i __lasx_xvftintrpl_l_s (__m256); -__m256i __lasx_xvftintrp_w_d (__m256d, __m256d); -__m256i __lasx_xvftintrp_w_s (__m256); -__m256i __lasx_xvftintrzh_l_s (__m256); -__m256i __lasx_xvftintrz_l_d (__m256d); -__m256i __lasx_xvftintrzl_l_s (__m256); -__m256i __lasx_xvftintrz_lu_d (__m256d); -__m256i __lasx_xvftintrz_w_d (__m256d, __m256d); -__m256i __lasx_xvftintrz_w_s (__m256); -__m256i __lasx_xvftintrz_wu_s (__m256); -__m256i __lasx_xvftint_w_d (__m256d, __m256d); -__m256i __lasx_xvftint_w_s (__m256); -__m256i __lasx_xvftint_wu_s (__m256); -__m256i __lasx_xvhaddw_du_wu (__m256i, __m256i); -__m256i __lasx_xvhaddw_d_w (__m256i, __m256i); -__m256i __lasx_xvhaddw_h_b (__m256i, __m256i); -__m256i __lasx_xvhaddw_hu_bu (__m256i, __m256i); -__m256i __lasx_xvhaddw_q_d (__m256i, __m256i); -__m256i __lasx_xvhaddw_qu_du (__m256i, __m256i); -__m256i __lasx_xvhaddw_w_h (__m256i, __m256i); -__m256i __lasx_xvhaddw_wu_hu (__m256i, __m256i); -__m256i __lasx_xvhsubw_du_wu (__m256i, __m256i); -__m256i __lasx_xvhsubw_d_w (__m256i, __m256i); -__m256i __lasx_xvhsubw_h_b (__m256i, __m256i); -__m256i __lasx_xvhsubw_hu_bu (__m256i, __m256i); -__m256i __lasx_xvhsubw_q_d (__m256i, __m256i); -__m256i __lasx_xvhsubw_qu_du (__m256i, __m256i); -__m256i __lasx_xvhsubw_w_h (__m256i, __m256i); -__m256i 
__lasx_xvhsubw_wu_hu (__m256i, __m256i); -__m256i __lasx_xvilvh_b (__m256i, __m256i); -__m256i __lasx_xvilvh_d (__m256i, __m256i); -__m256i __lasx_xvilvh_h (__m256i, __m256i); -__m256i __lasx_xvilvh_w (__m256i, __m256i); -__m256i __lasx_xvilvl_b (__m256i, __m256i); -__m256i __lasx_xvilvl_d (__m256i, __m256i); -__m256i __lasx_xvilvl_h (__m256i, __m256i); -__m256i __lasx_xvilvl_w (__m256i, __m256i); -__m256i __lasx_xvinsgr2vr_d (__m256i, long int, imm0_3); -__m256i __lasx_xvinsgr2vr_w (__m256i, int, imm0_7); -__m256i __lasx_xvinsve0_d (__m256i, __m256i, imm0_3); -__m256i __lasx_xvinsve0_w (__m256i, __m256i, imm0_7); -__m256i __lasx_xvld (void *, imm_n2048_2047); -__m256i __lasx_xvldi (imm_n1024_1023); -__m256i __lasx_xvldrepl_b (void *, imm_n2048_2047); -__m256i __lasx_xvldrepl_d (void *, imm_n256_255); -__m256i __lasx_xvldrepl_h (void *, imm_n1024_1023); -__m256i __lasx_xvldrepl_w (void *, imm_n512_511); -__m256i __lasx_xvldx (void *, long int); -__m256i __lasx_xvmadd_b (__m256i, __m256i, __m256i); -__m256i __lasx_xvmadd_d (__m256i, __m256i, __m256i); -__m256i __lasx_xvmadd_h (__m256i, __m256i, __m256i); -__m256i __lasx_xvmadd_w (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_d_w (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_d_wu (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_d_wu_w (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_h_b (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_h_bu (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_h_bu_b (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_q_d (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_q_du (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_q_du_d (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_w_h (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_w_hu (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_w_hu_h (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_d_w (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_d_wu (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_d_wu_w (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_h_b (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_h_bu (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_h_bu_b (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_q_d (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_q_du (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_q_du_d (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_w_h (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_w_hu (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_w_hu_h (__m256i, __m256i, __m256i); -__m256i __lasx_xvmax_b (__m256i, __m256i); -__m256i __lasx_xvmax_bu (__m256i, __m256i); -__m256i __lasx_xvmax_d (__m256i, __m256i); -__m256i __lasx_xvmax_du (__m256i, __m256i); -__m256i __lasx_xvmax_h (__m256i, __m256i); -__m256i __lasx_xvmax_hu (__m256i, __m256i); -__m256i __lasx_xvmaxi_b (__m256i, imm_n16_15); -__m256i __lasx_xvmaxi_bu (__m256i, imm0_31); -__m256i __lasx_xvmaxi_d (__m256i, imm_n16_15); -__m256i __lasx_xvmaxi_du (__m256i, imm0_31); -__m256i __lasx_xvmaxi_h (__m256i, imm_n16_15); -__m256i __lasx_xvmaxi_hu (__m256i, imm0_31); -__m256i __lasx_xvmaxi_w (__m256i, imm_n16_15); -__m256i __lasx_xvmaxi_wu (__m256i, imm0_31); -__m256i __lasx_xvmax_w (__m256i, __m256i); -__m256i __lasx_xvmax_wu (__m256i, __m256i); -__m256i __lasx_xvmin_b (__m256i, __m256i); -__m256i __lasx_xvmin_bu (__m256i, __m256i); -__m256i __lasx_xvmin_d (__m256i, __m256i); -__m256i __lasx_xvmin_du (__m256i, __m256i); -__m256i 
__lasx_xvmin_h (__m256i, __m256i); -__m256i __lasx_xvmin_hu (__m256i, __m256i); -__m256i __lasx_xvmini_b (__m256i, imm_n16_15); -__m256i __lasx_xvmini_bu (__m256i, imm0_31); -__m256i __lasx_xvmini_d (__m256i, imm_n16_15); -__m256i __lasx_xvmini_du (__m256i, imm0_31); -__m256i __lasx_xvmini_h (__m256i, imm_n16_15); -__m256i __lasx_xvmini_hu (__m256i, imm0_31); -__m256i __lasx_xvmini_w (__m256i, imm_n16_15); -__m256i __lasx_xvmini_wu (__m256i, imm0_31); -__m256i __lasx_xvmin_w (__m256i, __m256i); -__m256i __lasx_xvmin_wu (__m256i, __m256i); -__m256i __lasx_xvmod_b (__m256i, __m256i); -__m256i __lasx_xvmod_bu (__m256i, __m256i); -__m256i __lasx_xvmod_d (__m256i, __m256i); -__m256i __lasx_xvmod_du (__m256i, __m256i); -__m256i __lasx_xvmod_h (__m256i, __m256i); -__m256i __lasx_xvmod_hu (__m256i, __m256i); -__m256i __lasx_xvmod_w (__m256i, __m256i); -__m256i __lasx_xvmod_wu (__m256i, __m256i); -__m256i __lasx_xvmskgez_b (__m256i); -__m256i __lasx_xvmskltz_b (__m256i); -__m256i __lasx_xvmskltz_d (__m256i); -__m256i __lasx_xvmskltz_h (__m256i); -__m256i __lasx_xvmskltz_w (__m256i); -__m256i __lasx_xvmsknz_b (__m256i); -__m256i __lasx_xvmsub_b (__m256i, __m256i, __m256i); -__m256i __lasx_xvmsub_d (__m256i, __m256i, __m256i); -__m256i __lasx_xvmsub_h (__m256i, __m256i, __m256i); -__m256i __lasx_xvmsub_w (__m256i, __m256i, __m256i); -__m256i __lasx_xvmuh_b (__m256i, __m256i); -__m256i __lasx_xvmuh_bu (__m256i, __m256i); -__m256i __lasx_xvmuh_d (__m256i, __m256i); -__m256i __lasx_xvmuh_du (__m256i, __m256i); -__m256i __lasx_xvmuh_h (__m256i, __m256i); -__m256i __lasx_xvmuh_hu (__m256i, __m256i); -__m256i __lasx_xvmuh_w (__m256i, __m256i); -__m256i __lasx_xvmuh_wu (__m256i, __m256i); -__m256i __lasx_xvmul_b (__m256i, __m256i); -__m256i __lasx_xvmul_d (__m256i, __m256i); -__m256i __lasx_xvmul_h (__m256i, __m256i); -__m256i __lasx_xvmul_w (__m256i, __m256i); -__m256i __lasx_xvmulwev_d_w (__m256i, __m256i); -__m256i __lasx_xvmulwev_d_wu (__m256i, __m256i); -__m256i __lasx_xvmulwev_d_wu_w (__m256i, __m256i); -__m256i __lasx_xvmulwev_h_b (__m256i, __m256i); -__m256i __lasx_xvmulwev_h_bu (__m256i, __m256i); -__m256i __lasx_xvmulwev_h_bu_b (__m256i, __m256i); -__m256i __lasx_xvmulwev_q_d (__m256i, __m256i); -__m256i __lasx_xvmulwev_q_du (__m256i, __m256i); -__m256i __lasx_xvmulwev_q_du_d (__m256i, __m256i); -__m256i __lasx_xvmulwev_w_h (__m256i, __m256i); -__m256i __lasx_xvmulwev_w_hu (__m256i, __m256i); -__m256i __lasx_xvmulwev_w_hu_h (__m256i, __m256i); -__m256i __lasx_xvmulwod_d_w (__m256i, __m256i); -__m256i __lasx_xvmulwod_d_wu (__m256i, __m256i); -__m256i __lasx_xvmulwod_d_wu_w (__m256i, __m256i); -__m256i __lasx_xvmulwod_h_b (__m256i, __m256i); -__m256i __lasx_xvmulwod_h_bu (__m256i, __m256i); -__m256i __lasx_xvmulwod_h_bu_b (__m256i, __m256i); -__m256i __lasx_xvmulwod_q_d (__m256i, __m256i); -__m256i __lasx_xvmulwod_q_du (__m256i, __m256i); -__m256i __lasx_xvmulwod_q_du_d (__m256i, __m256i); -__m256i __lasx_xvmulwod_w_h (__m256i, __m256i); -__m256i __lasx_xvmulwod_w_hu (__m256i, __m256i); -__m256i __lasx_xvmulwod_w_hu_h (__m256i, __m256i); -__m256i __lasx_xvneg_b (__m256i); -__m256i __lasx_xvneg_d (__m256i); -__m256i __lasx_xvneg_h (__m256i); -__m256i __lasx_xvneg_w (__m256i); -__m256i __lasx_xvnori_b (__m256i, imm0_255); -__m256i __lasx_xvnor_v (__m256i, __m256i); -__m256i __lasx_xvori_b (__m256i, imm0_255); -__m256i __lasx_xvorn_v (__m256i, __m256i); -__m256i __lasx_xvor_v (__m256i, __m256i); -__m256i __lasx_xvpackev_b (__m256i, __m256i); -__m256i __lasx_xvpackev_d (__m256i, __m256i); -__m256i 
__lasx_xvpackev_h (__m256i, __m256i); -__m256i __lasx_xvpackev_w (__m256i, __m256i); -__m256i __lasx_xvpackod_b (__m256i, __m256i); -__m256i __lasx_xvpackod_d (__m256i, __m256i); -__m256i __lasx_xvpackod_h (__m256i, __m256i); -__m256i __lasx_xvpackod_w (__m256i, __m256i); -__m256i __lasx_xvpcnt_b (__m256i); -__m256i __lasx_xvpcnt_d (__m256i); -__m256i __lasx_xvpcnt_h (__m256i); -__m256i __lasx_xvpcnt_w (__m256i); -__m256i __lasx_xvpermi_d (__m256i, imm0_255); -__m256i __lasx_xvpermi_q (__m256i, __m256i, imm0_255); -__m256i __lasx_xvpermi_w (__m256i, __m256i, imm0_255); -__m256i __lasx_xvperm_w (__m256i, __m256i); -__m256i __lasx_xvpickev_b (__m256i, __m256i); -__m256i __lasx_xvpickev_d (__m256i, __m256i); -__m256i __lasx_xvpickev_h (__m256i, __m256i); -__m256i __lasx_xvpickev_w (__m256i, __m256i); -__m256i __lasx_xvpickod_b (__m256i, __m256i); -__m256i __lasx_xvpickod_d (__m256i, __m256i); -__m256i __lasx_xvpickod_h (__m256i, __m256i); -__m256i __lasx_xvpickod_w (__m256i, __m256i); -long int __lasx_xvpickve2gr_d (__m256i, imm0_3); -unsigned long int __lasx_xvpickve2gr_du (__m256i, imm0_3); -int __lasx_xvpickve2gr_w (__m256i, imm0_7); -unsigned int __lasx_xvpickve2gr_wu (__m256i, imm0_7); -__m256i __lasx_xvpickve_d (__m256i, imm0_3); -__m256d __lasx_xvpickve_d_f (__m256d, imm0_3); -__m256i __lasx_xvpickve_w (__m256i, imm0_7); -__m256 __lasx_xvpickve_w_f (__m256, imm0_7); -__m256i __lasx_xvrepl128vei_b (__m256i, imm0_15); -__m256i __lasx_xvrepl128vei_d (__m256i, imm0_1); -__m256i __lasx_xvrepl128vei_h (__m256i, imm0_7); -__m256i __lasx_xvrepl128vei_w (__m256i, imm0_3); -__m256i __lasx_xvreplgr2vr_b (int); -__m256i __lasx_xvreplgr2vr_d (long int); -__m256i __lasx_xvreplgr2vr_h (int); -__m256i __lasx_xvreplgr2vr_w (int); -__m256i __lasx_xvrepli_b (imm_n512_511); -__m256i __lasx_xvrepli_d (imm_n512_511); -__m256i __lasx_xvrepli_h (imm_n512_511); -__m256i __lasx_xvrepli_w (imm_n512_511); -__m256i __lasx_xvreplve0_b (__m256i); -__m256i __lasx_xvreplve0_d (__m256i); -__m256i __lasx_xvreplve0_h (__m256i); -__m256i __lasx_xvreplve0_q (__m256i); -__m256i __lasx_xvreplve0_w (__m256i); -__m256i __lasx_xvreplve_b (__m256i, int); -__m256i __lasx_xvreplve_d (__m256i, int); -__m256i __lasx_xvreplve_h (__m256i, int); -__m256i __lasx_xvreplve_w (__m256i, int); -__m256i __lasx_xvrotr_b (__m256i, __m256i); -__m256i __lasx_xvrotr_d (__m256i, __m256i); -__m256i __lasx_xvrotr_h (__m256i, __m256i); -__m256i __lasx_xvrotri_b (__m256i, imm0_7); -__m256i __lasx_xvrotri_d (__m256i, imm0_63); -__m256i __lasx_xvrotri_h (__m256i, imm0_15); -__m256i __lasx_xvrotri_w (__m256i, imm0_31); -__m256i __lasx_xvrotr_w (__m256i, __m256i); -__m256i __lasx_xvsadd_b (__m256i, __m256i); -__m256i __lasx_xvsadd_bu (__m256i, __m256i); -__m256i __lasx_xvsadd_d (__m256i, __m256i); -__m256i __lasx_xvsadd_du (__m256i, __m256i); -__m256i __lasx_xvsadd_h (__m256i, __m256i); -__m256i __lasx_xvsadd_hu (__m256i, __m256i); -__m256i __lasx_xvsadd_w (__m256i, __m256i); -__m256i __lasx_xvsadd_wu (__m256i, __m256i); -__m256i __lasx_xvsat_b (__m256i, imm0_7); -__m256i __lasx_xvsat_bu (__m256i, imm0_7); -__m256i __lasx_xvsat_d (__m256i, imm0_63); -__m256i __lasx_xvsat_du (__m256i, imm0_63); -__m256i __lasx_xvsat_h (__m256i, imm0_15); -__m256i __lasx_xvsat_hu (__m256i, imm0_15); -__m256i __lasx_xvsat_w (__m256i, imm0_31); -__m256i __lasx_xvsat_wu (__m256i, imm0_31); -__m256i __lasx_xvseq_b (__m256i, __m256i); -__m256i __lasx_xvseq_d (__m256i, __m256i); -__m256i __lasx_xvseq_h (__m256i, __m256i); -__m256i __lasx_xvseqi_b (__m256i, imm_n16_15); -__m256i 
__lasx_xvseqi_d (__m256i, imm_n16_15); -__m256i __lasx_xvseqi_h (__m256i, imm_n16_15); -__m256i __lasx_xvseqi_w (__m256i, imm_n16_15); -__m256i __lasx_xvseq_w (__m256i, __m256i); -__m256i __lasx_xvshuf4i_b (__m256i, imm0_255); -__m256i __lasx_xvshuf4i_d (__m256i, __m256i, imm0_255); -__m256i __lasx_xvshuf4i_h (__m256i, imm0_255); -__m256i __lasx_xvshuf4i_w (__m256i, imm0_255); -__m256i __lasx_xvshuf_b (__m256i, __m256i, __m256i); -__m256i __lasx_xvshuf_d (__m256i, __m256i, __m256i); -__m256i __lasx_xvshuf_h (__m256i, __m256i, __m256i); -__m256i __lasx_xvshuf_w (__m256i, __m256i, __m256i); -__m256i __lasx_xvsigncov_b (__m256i, __m256i); -__m256i __lasx_xvsigncov_d (__m256i, __m256i); -__m256i __lasx_xvsigncov_h (__m256i, __m256i); -__m256i __lasx_xvsigncov_w (__m256i, __m256i); -__m256i __lasx_xvsle_b (__m256i, __m256i); -__m256i __lasx_xvsle_bu (__m256i, __m256i); -__m256i __lasx_xvsle_d (__m256i, __m256i); -__m256i __lasx_xvsle_du (__m256i, __m256i); -__m256i __lasx_xvsle_h (__m256i, __m256i); -__m256i __lasx_xvsle_hu (__m256i, __m256i); -__m256i __lasx_xvslei_b (__m256i, imm_n16_15); -__m256i __lasx_xvslei_bu (__m256i, imm0_31); -__m256i __lasx_xvslei_d (__m256i, imm_n16_15); -__m256i __lasx_xvslei_du (__m256i, imm0_31); -__m256i __lasx_xvslei_h (__m256i, imm_n16_15); -__m256i __lasx_xvslei_hu (__m256i, imm0_31); -__m256i __lasx_xvslei_w (__m256i, imm_n16_15); -__m256i __lasx_xvslei_wu (__m256i, imm0_31); -__m256i __lasx_xvsle_w (__m256i, __m256i); -__m256i __lasx_xvsle_wu (__m256i, __m256i); -__m256i __lasx_xvsll_b (__m256i, __m256i); -__m256i __lasx_xvsll_d (__m256i, __m256i); -__m256i __lasx_xvsll_h (__m256i, __m256i); -__m256i __lasx_xvslli_b (__m256i, imm0_7); -__m256i __lasx_xvslli_d (__m256i, imm0_63); -__m256i __lasx_xvslli_h (__m256i, imm0_15); -__m256i __lasx_xvslli_w (__m256i, imm0_31); -__m256i __lasx_xvsll_w (__m256i, __m256i); -__m256i __lasx_xvsllwil_du_wu (__m256i, imm0_31); -__m256i __lasx_xvsllwil_d_w (__m256i, imm0_31); -__m256i __lasx_xvsllwil_h_b (__m256i, imm0_7); -__m256i __lasx_xvsllwil_hu_bu (__m256i, imm0_7); -__m256i __lasx_xvsllwil_w_h (__m256i, imm0_15); -__m256i __lasx_xvsllwil_wu_hu (__m256i, imm0_15); -__m256i __lasx_xvslt_b (__m256i, __m256i); -__m256i __lasx_xvslt_bu (__m256i, __m256i); -__m256i __lasx_xvslt_d (__m256i, __m256i); -__m256i __lasx_xvslt_du (__m256i, __m256i); -__m256i __lasx_xvslt_h (__m256i, __m256i); -__m256i __lasx_xvslt_hu (__m256i, __m256i); -__m256i __lasx_xvslti_b (__m256i, imm_n16_15); -__m256i __lasx_xvslti_bu (__m256i, imm0_31); -__m256i __lasx_xvslti_d (__m256i, imm_n16_15); -__m256i __lasx_xvslti_du (__m256i, imm0_31); -__m256i __lasx_xvslti_h (__m256i, imm_n16_15); -__m256i __lasx_xvslti_hu (__m256i, imm0_31); -__m256i __lasx_xvslti_w (__m256i, imm_n16_15); -__m256i __lasx_xvslti_wu (__m256i, imm0_31); -__m256i __lasx_xvslt_w (__m256i, __m256i); -__m256i __lasx_xvslt_wu (__m256i, __m256i); -__m256i __lasx_xvsra_b (__m256i, __m256i); -__m256i __lasx_xvsra_d (__m256i, __m256i); -__m256i __lasx_xvsra_h (__m256i, __m256i); -__m256i __lasx_xvsrai_b (__m256i, imm0_7); -__m256i __lasx_xvsrai_d (__m256i, imm0_63); -__m256i __lasx_xvsrai_h (__m256i, imm0_15); -__m256i __lasx_xvsrai_w (__m256i, imm0_31); -__m256i __lasx_xvsran_b_h (__m256i, __m256i); -__m256i __lasx_xvsran_h_w (__m256i, __m256i); -__m256i __lasx_xvsrani_b_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvsrani_d_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvsrani_h_w (__m256i, __m256i, imm0_31); -__m256i __lasx_xvsrani_w_d (__m256i, __m256i, imm0_63); -__m256i 
__lasx_xvsran_w_d (__m256i, __m256i); -__m256i __lasx_xvsrar_b (__m256i, __m256i); -__m256i __lasx_xvsrar_d (__m256i, __m256i); -__m256i __lasx_xvsrar_h (__m256i, __m256i); -__m256i __lasx_xvsrari_b (__m256i, imm0_7); -__m256i __lasx_xvsrari_d (__m256i, imm0_63); -__m256i __lasx_xvsrari_h (__m256i, imm0_15); -__m256i __lasx_xvsrari_w (__m256i, imm0_31); -__m256i __lasx_xvsrarn_b_h (__m256i, __m256i); -__m256i __lasx_xvsrarn_h_w (__m256i, __m256i); -__m256i __lasx_xvsrarni_b_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvsrarni_d_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvsrarni_h_w (__m256i, __m256i, imm0_31); -__m256i __lasx_xvsrarni_w_d (__m256i, __m256i, imm0_63); -__m256i __lasx_xvsrarn_w_d (__m256i, __m256i); -__m256i __lasx_xvsrar_w (__m256i, __m256i); -__m256i __lasx_xvsra_w (__m256i, __m256i); -__m256i __lasx_xvsrl_b (__m256i, __m256i); -__m256i __lasx_xvsrl_d (__m256i, __m256i); -__m256i __lasx_xvsrl_h (__m256i, __m256i); -__m256i __lasx_xvsrli_b (__m256i, imm0_7); -__m256i __lasx_xvsrli_d (__m256i, imm0_63); -__m256i __lasx_xvsrli_h (__m256i, imm0_15); -__m256i __lasx_xvsrli_w (__m256i, imm0_31); -__m256i __lasx_xvsrln_b_h (__m256i, __m256i); -__m256i __lasx_xvsrln_h_w (__m256i, __m256i); -__m256i __lasx_xvsrlni_b_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvsrlni_d_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvsrlni_h_w (__m256i, __m256i, imm0_31); -__m256i __lasx_xvsrlni_w_d (__m256i, __m256i, imm0_63); -__m256i __lasx_xvsrln_w_d (__m256i, __m256i); -__m256i __lasx_xvsrlr_b (__m256i, __m256i); -__m256i __lasx_xvsrlr_d (__m256i, __m256i); -__m256i __lasx_xvsrlr_h (__m256i, __m256i); -__m256i __lasx_xvsrlri_b (__m256i, imm0_7); -__m256i __lasx_xvsrlri_d (__m256i, imm0_63); -__m256i __lasx_xvsrlri_h (__m256i, imm0_15); -__m256i __lasx_xvsrlri_w (__m256i, imm0_31); -__m256i __lasx_xvsrlrn_b_h (__m256i, __m256i); -__m256i __lasx_xvsrlrn_h_w (__m256i, __m256i); -__m256i __lasx_xvsrlrni_b_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvsrlrni_d_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvsrlrni_h_w (__m256i, __m256i, imm0_31); -__m256i __lasx_xvsrlrni_w_d (__m256i, __m256i, imm0_63); -__m256i __lasx_xvsrlrn_w_d (__m256i, __m256i); -__m256i __lasx_xvsrlr_w (__m256i, __m256i); -__m256i __lasx_xvsrl_w (__m256i, __m256i); -__m256i __lasx_xvssran_b_h (__m256i, __m256i); -__m256i __lasx_xvssran_bu_h (__m256i, __m256i); -__m256i __lasx_xvssran_hu_w (__m256i, __m256i); -__m256i __lasx_xvssran_h_w (__m256i, __m256i); -__m256i __lasx_xvssrani_b_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvssrani_bu_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvssrani_d_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvssrani_du_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvssrani_hu_w (__m256i, __m256i, imm0_31); -__m256i __lasx_xvssrani_h_w (__m256i, __m256i, imm0_31); -__m256i __lasx_xvssrani_w_d (__m256i, __m256i, imm0_63); -__m256i __lasx_xvssrani_wu_d (__m256i, __m256i, imm0_63); -__m256i __lasx_xvssran_w_d (__m256i, __m256i); -__m256i __lasx_xvssran_wu_d (__m256i, __m256i); -__m256i __lasx_xvssrarn_b_h (__m256i, __m256i); -__m256i __lasx_xvssrarn_bu_h (__m256i, __m256i); -__m256i __lasx_xvssrarn_hu_w (__m256i, __m256i); -__m256i __lasx_xvssrarn_h_w (__m256i, __m256i); -__m256i __lasx_xvssrarni_b_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvssrarni_bu_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvssrarni_d_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvssrarni_du_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvssrarni_hu_w (__m256i, __m256i, imm0_31); -__m256i 
__lasx_xvssrarni_h_w (__m256i, __m256i, imm0_31); -__m256i __lasx_xvssrarni_w_d (__m256i, __m256i, imm0_63); -__m256i __lasx_xvssrarni_wu_d (__m256i, __m256i, imm0_63); -__m256i __lasx_xvssrarn_w_d (__m256i, __m256i); -__m256i __lasx_xvssrarn_wu_d (__m256i, __m256i); -__m256i __lasx_xvssrln_b_h (__m256i, __m256i); -__m256i __lasx_xvssrln_bu_h (__m256i, __m256i); -__m256i __lasx_xvssrln_hu_w (__m256i, __m256i); -__m256i __lasx_xvssrln_h_w (__m256i, __m256i); -__m256i __lasx_xvssrlni_b_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvssrlni_bu_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvssrlni_d_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvssrlni_du_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvssrlni_hu_w (__m256i, __m256i, imm0_31); -__m256i __lasx_xvssrlni_h_w (__m256i, __m256i, imm0_31); -__m256i __lasx_xvssrlni_w_d (__m256i, __m256i, imm0_63); -__m256i __lasx_xvssrlni_wu_d (__m256i, __m256i, imm0_63); -__m256i __lasx_xvssrln_w_d (__m256i, __m256i); -__m256i __lasx_xvssrln_wu_d (__m256i, __m256i); -__m256i __lasx_xvssrlrn_b_h (__m256i, __m256i); -__m256i __lasx_xvssrlrn_bu_h (__m256i, __m256i); -__m256i __lasx_xvssrlrn_hu_w (__m256i, __m256i); -__m256i __lasx_xvssrlrn_h_w (__m256i, __m256i); -__m256i __lasx_xvssrlrni_b_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvssrlrni_bu_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvssrlrni_d_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvssrlrni_du_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvssrlrni_hu_w (__m256i, __m256i, imm0_31); -__m256i __lasx_xvssrlrni_h_w (__m256i, __m256i, imm0_31); -__m256i __lasx_xvssrlrni_w_d (__m256i, __m256i, imm0_63); -__m256i __lasx_xvssrlrni_wu_d (__m256i, __m256i, imm0_63); -__m256i __lasx_xvssrlrn_w_d (__m256i, __m256i); -__m256i __lasx_xvssrlrn_wu_d (__m256i, __m256i); -__m256i __lasx_xvssub_b (__m256i, __m256i); -__m256i __lasx_xvssub_bu (__m256i, __m256i); -__m256i __lasx_xvssub_d (__m256i, __m256i); -__m256i __lasx_xvssub_du (__m256i, __m256i); -__m256i __lasx_xvssub_h (__m256i, __m256i); -__m256i __lasx_xvssub_hu (__m256i, __m256i); -__m256i __lasx_xvssub_w (__m256i, __m256i); -__m256i __lasx_xvssub_wu (__m256i, __m256i); -void __lasx_xvst (__m256i, void *, imm_n2048_2047); -void __lasx_xvstelm_b (__m256i, void *, imm_n128_127, imm0_31); -void __lasx_xvstelm_d (__m256i, void *, imm_n128_127, imm0_3); -void __lasx_xvstelm_h (__m256i, void *, imm_n128_127, imm0_15); -void __lasx_xvstelm_w (__m256i, void *, imm_n128_127, imm0_7); -void __lasx_xvstx (__m256i, void *, long int); -__m256i __lasx_xvsub_b (__m256i, __m256i); -__m256i __lasx_xvsub_d (__m256i, __m256i); -__m256i __lasx_xvsub_h (__m256i, __m256i); -__m256i __lasx_xvsubi_bu (__m256i, imm0_31); -__m256i __lasx_xvsubi_du (__m256i, imm0_31); -__m256i __lasx_xvsubi_hu (__m256i, imm0_31); -__m256i __lasx_xvsubi_wu (__m256i, imm0_31); -__m256i __lasx_xvsub_q (__m256i, __m256i); -__m256i __lasx_xvsub_w (__m256i, __m256i); -__m256i __lasx_xvsubwev_d_w (__m256i, __m256i); -__m256i __lasx_xvsubwev_d_wu (__m256i, __m256i); -__m256i __lasx_xvsubwev_h_b (__m256i, __m256i); -__m256i __lasx_xvsubwev_h_bu (__m256i, __m256i); -__m256i __lasx_xvsubwev_q_d (__m256i, __m256i); -__m256i __lasx_xvsubwev_q_du (__m256i, __m256i); -__m256i __lasx_xvsubwev_w_h (__m256i, __m256i); -__m256i __lasx_xvsubwev_w_hu (__m256i, __m256i); -__m256i __lasx_xvsubwod_d_w (__m256i, __m256i); -__m256i __lasx_xvsubwod_d_wu (__m256i, __m256i); -__m256i __lasx_xvsubwod_h_b (__m256i, __m256i); -__m256i __lasx_xvsubwod_h_bu (__m256i, __m256i); -__m256i __lasx_xvsubwod_q_d 
(__m256i, __m256i); -__m256i __lasx_xvsubwod_q_du (__m256i, __m256i); -__m256i __lasx_xvsubwod_w_h (__m256i, __m256i); -__m256i __lasx_xvsubwod_w_hu (__m256i, __m256i); -__m256i __lasx_xvxori_b (__m256i, imm0_255); -__m256i __lasx_xvxor_v (__m256i, __m256i); +@node Other Builtins +@section Other Built-in Functions Provided by GCC +@cindex built-in functions +@findex __builtin_iseqsig +@findex __builtin_isfinite +@findex __builtin_isnormal +@findex __builtin_isgreater +@findex __builtin_isgreaterequal +@findex __builtin_isunordered +@findex __builtin_speculation_safe_value +@findex _Exit +@findex _exit +@findex abort +@findex abs +@findex acos +@findex acosf +@findex acosh +@findex acoshf +@findex acoshl +@findex acosl +@findex alloca +@findex asin +@findex asinf +@findex asinh +@findex asinhf +@findex asinhl +@findex asinl +@findex atan +@findex atan2 +@findex atan2f +@findex atan2l +@findex atanf +@findex atanh +@findex atanhf +@findex atanhl +@findex atanl +@findex bcmp +@findex bzero +@findex cabs +@findex cabsf +@findex cabsl +@findex cacos +@findex cacosf +@findex cacosh +@findex cacoshf +@findex cacoshl +@findex cacosl +@findex calloc +@findex carg +@findex cargf +@findex cargl +@findex casin +@findex casinf +@findex casinh +@findex casinhf +@findex casinhl +@findex casinl +@findex catan +@findex catanf +@findex catanh +@findex catanhf +@findex catanhl +@findex catanl +@findex cbrt +@findex cbrtf +@findex cbrtl +@findex ccos +@findex ccosf +@findex ccosh +@findex ccoshf +@findex ccoshl +@findex ccosl +@findex ceil +@findex ceilf +@findex ceill +@findex cexp +@findex cexpf +@findex cexpl +@findex cimag +@findex cimagf +@findex cimagl +@findex clog +@findex clogf +@findex clogl +@findex clog10 +@findex clog10f +@findex clog10l +@findex conj +@findex conjf +@findex conjl +@findex copysign +@findex copysignf +@findex copysignl +@findex cos +@findex cosf +@findex cosh +@findex coshf +@findex coshl +@findex cosl +@findex cpow +@findex cpowf +@findex cpowl +@findex cproj +@findex cprojf +@findex cprojl +@findex creal +@findex crealf +@findex creall +@findex csin +@findex csinf +@findex csinh +@findex csinhf +@findex csinhl +@findex csinl +@findex csqrt +@findex csqrtf +@findex csqrtl +@findex ctan +@findex ctanf +@findex ctanh +@findex ctanhf +@findex ctanhl +@findex ctanl +@findex dcgettext +@findex dgettext +@findex drem +@findex dremf +@findex dreml +@findex erf +@findex erfc +@findex erfcf +@findex erfcl +@findex erff +@findex erfl +@findex exit +@findex exp +@findex exp10 +@findex exp10f +@findex exp10l +@findex exp2 +@findex exp2f +@findex exp2l +@findex expf +@findex expl +@findex expm1 +@findex expm1f +@findex expm1l +@findex fabs +@findex fabsf +@findex fabsl +@findex fdim +@findex fdimf +@findex fdiml +@findex ffs +@findex floor +@findex floorf +@findex floorl +@findex fma +@findex fmaf +@findex fmal +@findex fmax +@findex fmaxf +@findex fmaxl +@findex fmin +@findex fminf +@findex fminl +@findex fmod +@findex fmodf +@findex fmodl +@findex fprintf +@findex fprintf_unlocked +@findex fputs +@findex fputs_unlocked +@findex free +@findex frexp +@findex frexpf +@findex frexpl +@findex fscanf +@findex gamma +@findex gammaf +@findex gammal +@findex gamma_r +@findex gammaf_r +@findex gammal_r +@findex gettext +@findex hypot +@findex hypotf +@findex hypotl +@findex ilogb +@findex ilogbf +@findex ilogbl +@findex imaxabs +@findex index +@findex isalnum +@findex isalpha +@findex isascii +@findex isblank +@findex iscntrl +@findex isdigit +@findex isgraph +@findex islower +@findex isprint 
+@findex ispunct +@findex isspace +@findex isupper +@findex iswalnum +@findex iswalpha +@findex iswblank +@findex iswcntrl +@findex iswdigit +@findex iswgraph +@findex iswlower +@findex iswprint +@findex iswpunct +@findex iswspace +@findex iswupper +@findex iswxdigit +@findex isxdigit +@findex j0 +@findex j0f +@findex j0l +@findex j1 +@findex j1f +@findex j1l +@findex jn +@findex jnf +@findex jnl +@findex labs +@findex ldexp +@findex ldexpf +@findex ldexpl +@findex lgamma +@findex lgammaf +@findex lgammal +@findex lgamma_r +@findex lgammaf_r +@findex lgammal_r +@findex llabs +@findex llrint +@findex llrintf +@findex llrintl +@findex llround +@findex llroundf +@findex llroundl +@findex log +@findex log10 +@findex log10f +@findex log10l +@findex log1p +@findex log1pf +@findex log1pl +@findex log2 +@findex log2f +@findex log2l +@findex logb +@findex logbf +@findex logbl +@findex logf +@findex logl +@findex lrint +@findex lrintf +@findex lrintl +@findex lround +@findex lroundf +@findex lroundl +@findex malloc +@findex memchr +@findex memcmp +@findex memcpy +@findex mempcpy +@findex memset +@findex modf +@findex modff +@findex modfl +@findex nearbyint +@findex nearbyintf +@findex nearbyintl +@findex nextafter +@findex nextafterf +@findex nextafterl +@findex nexttoward +@findex nexttowardf +@findex nexttowardl +@findex pow +@findex pow10 +@findex pow10f +@findex pow10l +@findex powf +@findex powl +@findex printf +@findex printf_unlocked +@findex putchar +@findex puts +@findex realloc +@findex remainder +@findex remainderf +@findex remainderl +@findex remquo +@findex remquof +@findex remquol +@findex rindex +@findex rint +@findex rintf +@findex rintl +@findex round +@findex roundf +@findex roundl +@findex scalb +@findex scalbf +@findex scalbl +@findex scalbln +@findex scalblnf +@findex scalblnf +@findex scalbn +@findex scalbnf +@findex scanfnl +@findex signbit +@findex signbitf +@findex signbitl +@findex signbitd32 +@findex signbitd64 +@findex signbitd128 +@findex significand +@findex significandf +@findex significandl +@findex sin +@findex sincos +@findex sincosf +@findex sincosl +@findex sinf +@findex sinh +@findex sinhf +@findex sinhl +@findex sinl +@findex snprintf +@findex sprintf +@findex sqrt +@findex sqrtf +@findex sqrtl +@findex sscanf +@findex stpcpy +@findex stpncpy +@findex strcasecmp +@findex strcat +@findex strchr +@findex strcmp +@findex strcpy +@findex strcspn +@findex strdup +@findex strfmon +@findex strftime +@findex strlen +@findex strncasecmp +@findex strncat +@findex strncmp +@findex strncpy +@findex strndup +@findex strnlen +@findex strpbrk +@findex strrchr +@findex strspn +@findex strstr +@findex tan +@findex tanf +@findex tanh +@findex tanhf +@findex tanhl +@findex tanl +@findex tgamma +@findex tgammaf +@findex tgammal +@findex toascii +@findex tolower +@findex toupper +@findex towlower +@findex towupper +@findex trunc +@findex truncf +@findex truncl +@findex vfprintf +@findex vfscanf +@findex vprintf +@findex vscanf +@findex vsnprintf +@findex vsprintf +@findex vsscanf +@findex y0 +@findex y0f +@findex y0l +@findex y1 +@findex y1f +@findex y1l +@findex yn +@findex ynf +@findex ynl + +GCC provides a large number of built-in functions other than the ones +mentioned above. Some of these are for internal use in the processing +of exceptions or variable-length argument lists and are not +documented here because they may change from time to time; we do not +recommend general use of these functions. + +The remaining functions are provided for optimization purposes. 
+
+With the exception of built-ins that have library equivalents such as
+the standard C library functions discussed below, or that expand to
+library calls, GCC built-in functions are always expanded inline and
+thus do not have corresponding entry points and their address cannot
+be obtained.  Attempting to use them in an expression other than
+a function call results in a compile-time error.
+
+@opindex fno-builtin
+GCC includes built-in versions of many of the functions in the standard
+C library.  These functions come in two forms: one whose names start with
+the @code{__builtin_} prefix, and the other without.  Both forms have the
+same type (including prototype), the same address (when their address is
+taken), and the same meaning as the C library functions even if you specify
+the @option{-fno-builtin} option (@pxref{C Dialect Options}).  Many of these
+functions are only optimized in certain cases; if they are not optimized in
+a particular case, a call to the library function is emitted.
+
+@opindex ansi
+@opindex std
+Outside strict ISO C mode (@option{-ansi}, @option{-std=c90},
+@option{-std=c99} or @option{-std=c11}), the functions
+@code{_exit}, @code{alloca}, @code{bcmp}, @code{bzero},
+@code{dcgettext}, @code{dgettext}, @code{dremf}, @code{dreml},
+@code{drem}, @code{exp10f}, @code{exp10l}, @code{exp10}, @code{ffsll},
+@code{ffsl}, @code{ffs}, @code{fprintf_unlocked},
+@code{fputs_unlocked}, @code{gammaf}, @code{gammal}, @code{gamma},
+@code{gammaf_r}, @code{gammal_r}, @code{gamma_r}, @code{gettext},
+@code{index}, @code{isascii}, @code{j0f}, @code{j0l}, @code{j0},
+@code{j1f}, @code{j1l}, @code{j1}, @code{jnf}, @code{jnl}, @code{jn},
+@code{lgammaf_r}, @code{lgammal_r}, @code{lgamma_r}, @code{mempcpy},
+@code{pow10f}, @code{pow10l}, @code{pow10}, @code{printf_unlocked},
+@code{rindex}, @code{roundeven}, @code{roundevenf}, @code{roundevenl},
+@code{scalbf}, @code{scalbl}, @code{scalb},
+@code{signbit}, @code{signbitf}, @code{signbitl}, @code{signbitd32},
+@code{signbitd64}, @code{signbitd128}, @code{significandf},
+@code{significandl}, @code{significand}, @code{sincosf},
+@code{sincosl}, @code{sincos}, @code{stpcpy}, @code{stpncpy},
+@code{strcasecmp}, @code{strdup}, @code{strfmon}, @code{strncasecmp},
+@code{strndup}, @code{strnlen}, @code{toascii}, @code{y0f}, @code{y0l},
+@code{y0}, @code{y1f}, @code{y1l}, @code{y1}, @code{ynf}, @code{ynl} and
+@code{yn}
+may be handled as built-in functions.
+All these functions have corresponding versions
+prefixed with @code{__builtin_}, which may be used even in strict C90
+mode.
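+
+For example, in strict C90 mode @code{ffs} is not recognized as a
+built-in, but the prefixed form still is.  A minimal illustration (not
+one of the manual's own examples):
+
+@smallexample
+/* Compiled with -ansi: ffs () would be an ordinary external call,
+   but __builtin_ffs () is still recognized by GCC.  */
+int
+lowest_set_bit (int x)
+@{
+  return __builtin_ffs (x);
+@}
+@end smallexample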
+
+The ISO C99 functions
+@code{_Exit}, @code{acoshf}, @code{acoshl}, @code{acosh}, @code{asinhf},
+@code{asinhl}, @code{asinh}, @code{atanhf}, @code{atanhl}, @code{atanh},
+@code{cabsf}, @code{cabsl}, @code{cabs}, @code{cacosf}, @code{cacoshf},
+@code{cacoshl}, @code{cacosh}, @code{cacosl}, @code{cacos},
+@code{cargf}, @code{cargl}, @code{carg}, @code{casinf}, @code{casinhf},
+@code{casinhl}, @code{casinh}, @code{casinl}, @code{casin},
+@code{catanf}, @code{catanhf}, @code{catanhl}, @code{catanh},
+@code{catanl}, @code{catan}, @code{cbrtf}, @code{cbrtl}, @code{cbrt},
+@code{ccosf}, @code{ccoshf}, @code{ccoshl}, @code{ccosh}, @code{ccosl},
+@code{ccos}, @code{cexpf}, @code{cexpl}, @code{cexp}, @code{cimagf},
+@code{cimagl}, @code{cimag}, @code{clogf}, @code{clogl}, @code{clog},
+@code{conjf}, @code{conjl}, @code{conj}, @code{copysignf}, @code{copysignl},
+@code{copysign}, @code{cpowf}, @code{cpowl}, @code{cpow}, @code{cprojf},
+@code{cprojl}, @code{cproj}, @code{crealf}, @code{creall}, @code{creal},
+@code{csinf}, @code{csinhf}, @code{csinhl}, @code{csinh}, @code{csinl},
+@code{csin}, @code{csqrtf}, @code{csqrtl}, @code{csqrt}, @code{ctanf},
+@code{ctanhf}, @code{ctanhl}, @code{ctanh}, @code{ctanl}, @code{ctan},
+@code{erfcf}, @code{erfcl}, @code{erfc}, @code{erff}, @code{erfl},
+@code{erf}, @code{exp2f}, @code{exp2l}, @code{exp2}, @code{expm1f},
+@code{expm1l}, @code{expm1}, @code{fdimf}, @code{fdiml}, @code{fdim},
+@code{fmaf}, @code{fmal}, @code{fmaxf}, @code{fmaxl}, @code{fmax},
+@code{fma}, @code{fminf}, @code{fminl}, @code{fmin}, @code{hypotf},
+@code{hypotl}, @code{hypot}, @code{ilogbf}, @code{ilogbl}, @code{ilogb},
+@code{imaxabs}, @code{isblank}, @code{iswblank}, @code{lgammaf},
+@code{lgammal}, @code{lgamma}, @code{llabs}, @code{llrintf}, @code{llrintl},
+@code{llrint}, @code{llroundf}, @code{llroundl}, @code{llround},
+@code{log1pf}, @code{log1pl}, @code{log1p}, @code{log2f}, @code{log2l},
+@code{log2}, @code{logbf}, @code{logbl}, @code{logb}, @code{lrintf},
+@code{lrintl}, @code{lrint}, @code{lroundf}, @code{lroundl},
+@code{lround}, @code{nearbyintf}, @code{nearbyintl}, @code{nearbyint},
+@code{nextafterf}, @code{nextafterl}, @code{nextafter},
+@code{nexttowardf}, @code{nexttowardl}, @code{nexttoward},
+@code{remainderf}, @code{remainderl}, @code{remainder}, @code{remquof},
+@code{remquol}, @code{remquo}, @code{rintf}, @code{rintl}, @code{rint},
+@code{roundf}, @code{roundl}, @code{round}, @code{scalblnf},
+@code{scalblnl}, @code{scalbln}, @code{scalbnf}, @code{scalbnl},
+@code{scalbn}, @code{snprintf}, @code{tgammaf}, @code{tgammal},
+@code{tgamma}, @code{truncf}, @code{truncl}, @code{trunc},
+@code{vfscanf}, @code{vscanf}, @code{vsnprintf} and @code{vsscanf}
+are handled as built-in functions
+except in strict ISO C90 mode (@option{-ansi} or @option{-std=c90}).
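+
+When these functions are treated as built-ins, GCC can fold calls with
+constant arguments at compile time.  A small illustration (the folding
+is an optimization the compiler may perform, not a guarantee):
+
+@smallexample
+/* With built-in handling, GCC may fold this to 5.0 at compile time;
+   with -fno-builtin-hypot it becomes an ordinary library call.  */
+double d = hypot (3.0, 4.0);
+@end smallexample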
+
+There are also built-in versions of the ISO C99 functions
+@code{acosf}, @code{acosl}, @code{asinf}, @code{asinl}, @code{atan2f},
+@code{atan2l}, @code{atanf}, @code{atanl}, @code{ceilf}, @code{ceill},
+@code{cosf}, @code{coshf}, @code{coshl}, @code{cosl}, @code{expf},
+@code{expl}, @code{fabsf}, @code{fabsl}, @code{floorf}, @code{floorl},
+@code{fmodf}, @code{fmodl}, @code{frexpf}, @code{frexpl}, @code{ldexpf},
+@code{ldexpl}, @code{log10f}, @code{log10l}, @code{logf}, @code{logl},
+@code{modfl}, @code{modff}, @code{powf}, @code{powl}, @code{sinf},
+@code{sinhf}, @code{sinhl}, @code{sinl}, @code{sqrtf}, @code{sqrtl},
+@code{tanf}, @code{tanhf}, @code{tanhl} and @code{tanl}
+that are recognized in any mode since ISO C90 reserves these names for
+the purpose to which ISO C99 puts them.  All these functions have
+corresponding versions prefixed with @code{__builtin_}.
+
+There are also built-in functions @code{__builtin_fabsf@var{n}},
+@code{__builtin_fabsf@var{n}x}, @code{__builtin_copysignf@var{n}} and
+@code{__builtin_copysignf@var{n}x}, corresponding to the TS 18661-3
+functions @code{fabsf@var{n}}, @code{fabsf@var{n}x},
+@code{copysignf@var{n}} and @code{copysignf@var{n}x}, for supported
+types @code{_Float@var{n}} and @code{_Float@var{n}x}.
+
+There are also GNU extension functions @code{clog10}, @code{clog10f} and
+@code{clog10l} whose names are reserved by ISO C99 for future use.
+All these functions have versions prefixed with @code{__builtin_}.
+
+The ISO C94 functions
+@code{iswalnum}, @code{iswalpha}, @code{iswcntrl}, @code{iswdigit},
+@code{iswgraph}, @code{iswlower}, @code{iswprint}, @code{iswpunct},
+@code{iswspace}, @code{iswupper}, @code{iswxdigit}, @code{towlower} and
+@code{towupper}
+are handled as built-in functions
+except in strict ISO C90 mode (@option{-ansi} or @option{-std=c90}).
+
+The ISO C90 functions
+@code{abort}, @code{abs}, @code{acos}, @code{asin}, @code{atan2},
+@code{atan}, @code{calloc}, @code{ceil}, @code{cosh}, @code{cos},
+@code{exit}, @code{exp}, @code{fabs}, @code{floor}, @code{fmod},
+@code{fprintf}, @code{fputs}, @code{free}, @code{frexp}, @code{fscanf},
+@code{isalnum}, @code{isalpha}, @code{iscntrl}, @code{isdigit},
+@code{isgraph}, @code{islower}, @code{isprint}, @code{ispunct},
+@code{isspace}, @code{isupper}, @code{isxdigit}, @code{tolower},
+@code{toupper}, @code{labs}, @code{ldexp}, @code{log10}, @code{log},
+@code{malloc}, @code{memchr}, @code{memcmp}, @code{memcpy},
+@code{memset}, @code{modf}, @code{pow}, @code{printf}, @code{putchar},
+@code{puts}, @code{realloc}, @code{scanf}, @code{sinh}, @code{sin},
+@code{snprintf}, @code{sprintf}, @code{sqrt}, @code{sscanf}, @code{strcat},
+@code{strchr}, @code{strcmp}, @code{strcpy}, @code{strcspn},
+@code{strlen}, @code{strncat}, @code{strncmp}, @code{strncpy},
+@code{strpbrk}, @code{strrchr}, @code{strspn}, @code{strstr},
+@code{tanh}, @code{tan}, @code{vfprintf}, @code{vprintf} and @code{vsprintf}
+are all recognized as built-in functions unless
+@option{-fno-builtin} is specified (or @option{-fno-builtin-@var{function}}
+is specified for an individual function).  All of these functions have
+corresponding versions prefixed with @code{__builtin_}.
+
+GCC provides built-in versions of the ISO C99 floating-point comparison
+macros that avoid raising exceptions for unordered operands.  They have
+the same names as the standard macros (@code{isgreater},
+@code{isgreaterequal}, @code{isless}, @code{islessequal},
+@code{islessgreater}, and @code{isunordered}), with @code{__builtin_}
+prefixed.
+We intend for a library implementor to be able to simply
+@code{#define} each standard macro to its built-in equivalent.
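+
+A C library's @code{<math.h>} might therefore contain definitions along
+these lines (a hypothetical sketch, not taken from any particular
+library):
+
+@smallexample
+#define isgreater(x, y)       __builtin_isgreater (x, y)
+#define isgreaterequal(x, y)  __builtin_isgreaterequal (x, y)
+#define isunordered(x, y)     __builtin_isunordered (x, y)
+@end smallexample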
+
+In the same fashion, GCC provides @code{fpclassify}, @code{iseqsig},
+@code{isfinite}, @code{isinf_sign}, @code{isnormal} and @code{signbit}
+built-ins, used with the @code{__builtin_} prefix.  The @code{isinf} and
+@code{isnan} built-in functions appear both with and without the
+@code{__builtin_} prefix.  With the @option{-ffinite-math-only} option
+the @code{isinf} and @code{isnan} built-in functions always return 0.
+
+GCC provides built-in versions of the ISO C99 floating-point rounding and
+exception handling functions @code{fegetround}, @code{feclearexcept} and
+@code{feraiseexcept}.  They may not be available for all targets, and
+because they need close interaction with libc internal values, they may
+not be available for all target libcs, but in all cases they gracefully
+fall back to libc calls.  These built-in functions appear both with and
+without the @code{__builtin_} prefix.
+
+@defbuiltin{{void *} __builtin_alloca (size_t @var{size})}
+The @code{__builtin_alloca} function must be called at block scope.
+The function allocates an object @var{size} bytes large on the stack
+of the calling function.  The object is aligned on the default stack
+alignment boundary for the target determined by the
+@code{__BIGGEST_ALIGNMENT__} macro.  The @code{__builtin_alloca}
+function returns a pointer to the first byte of the allocated object.
+The lifetime of the allocated object ends just before the calling
+function returns to its caller.  This is so even when
+@code{__builtin_alloca} is called within a nested block.
+
+For example, the following function allocates eight objects of @code{n}
+bytes each on the stack, storing a pointer to each in consecutive elements
+of the array @code{a}.  It then passes the array to function @code{g}
+which can safely use the storage pointed to by each of the array elements.
+
+@smallexample
+void f (unsigned n)
+@{
+  void *a [8];
+  for (int i = 0; i != 8; ++i)
+    a [i] = __builtin_alloca (n);
+
+  g (a, n);   // @r{safe}
+@}
+@end smallexample
+
+Since the @code{__builtin_alloca} function doesn't validate its argument,
+it is the responsibility of its caller to make sure the argument doesn't
+cause it to exceed the stack size limit.
+The @code{__builtin_alloca} function is provided to make it possible to
+allocate on the stack arrays of bytes with an upper bound that may be
+computed at run time.  Since C99 Variable Length Arrays offer
+similar functionality under a portable, more convenient, and safer
+interface, they are recommended instead, in both C99 and C++ programs
+where GCC provides them as an extension.
+@xref{Variable Length}, for details.
+
+@enddefbuiltin
+
+@defbuiltin{{void *} __builtin_alloca_with_align (size_t @var{size}, size_t @var{alignment})}
+The @code{__builtin_alloca_with_align} function must be called at block
+scope.  The function allocates an object @var{size} bytes large on
+the stack of the calling function.  The allocated object is aligned on
+the boundary specified by the argument @var{alignment} whose unit is given
+in bits (not bytes).  The @var{size} argument must be positive and not
+exceed the stack size limit.  The @var{alignment} argument must be a constant
+integer expression that evaluates to a power of 2 greater than or equal to
+@code{CHAR_BIT} and less than some unspecified maximum.  Invocations
+with other values are rejected with an error indicating the valid bounds.
+The function returns a pointer to the first byte of the allocated object.
+The lifetime of the allocated object ends at the end of the block in which
+the function was called.  The allocated storage is released no later than
+just before the calling function returns to its caller, but may be released
+at the end of the block in which the function was called.
+
+For example, in the following function the call to @code{g} is unsafe
+because when @code{overalign} is non-zero, the space allocated by
+@code{__builtin_alloca_with_align} may have been released at the end
+of the @code{if} statement in which it was called.
+
+@smallexample
+void f (unsigned n, bool overalign)
+@{
+  void *p;
+  if (overalign)
+    p = __builtin_alloca_with_align (n, 64 /* bits */);
+  else
+    p = __builtin_alloca (n);
+
+  g (p, n);   // @r{unsafe}
+@}
+@end smallexample
+
+Since the @code{__builtin_alloca_with_align} function doesn't validate its
+@var{size} argument, it is the responsibility of its caller to make sure
+the argument doesn't cause it to exceed the stack size limit.
+The @code{__builtin_alloca_with_align} function is provided to make
+it possible to allocate on the stack overaligned arrays of bytes with
+an upper bound that may be computed at run time.  Since C99
+Variable Length Arrays offer the same functionality under
+a portable, more convenient, and safer interface, they are recommended
+instead, in both C99 and C++ programs where GCC provides them as
+an extension.  @xref{Variable Length}, for details.
+
+@enddefbuiltin
+
+@defbuiltin{{void *} __builtin_alloca_with_align_and_max (size_t @var{size}, size_t @var{alignment}, size_t @var{max_size})}
+Similar to @code{__builtin_alloca_with_align} but takes an extra argument
+specifying an upper bound for @var{size} in case its value cannot be computed
+at compile time, for use by @option{-fstack-usage}, @option{-Wstack-usage}
+and @option{-Walloca-larger-than}.  @var{max_size} must be a constant integer
+expression; it has no effect on code generation, and no attempt is made to
+check its compatibility with @var{size}.
+
+@enddefbuiltin
+
+@defbuiltin{bool __builtin_has_attribute (@var{type-or-expression}, @var{attribute})}
+The @code{__builtin_has_attribute} function evaluates to an integer constant
+expression equal to @code{true} if the symbol or type referenced by
+the @var{type-or-expression} argument has been declared with
+the @var{attribute} referenced by the second argument.  For
+a @var{type-or-expression} argument that does not reference a symbol,
+the built-in considers the type of the argument, since attributes do
+not apply to expressions.  Neither argument is evaluated.
+The @var{type-or-expression} argument is subject to the same
+restrictions as the argument to @code{typeof} (@pxref{Typeof}).  The
+@var{attribute} argument is an attribute name optionally followed by
+a comma-separated list of arguments enclosed in parentheses.  Both forms
+of attribute names---with and without double leading and trailing
+underscores---are recognized.  @xref{Attribute Syntax}, for details.
+When no attribute arguments are specified for an attribute that expects
+one or more arguments, the function returns @code{true} if
+@var{type-or-expression} has been declared with the attribute, regardless
+of the attribute argument values.  Arguments provided for an attribute
+that expects them are validated and matched up to the provided number.
+The function returns @code{true} if all provided arguments match.
+For example, the first call to the function below evaluates to @code{true}
+because @code{x} is declared with the @code{aligned} attribute but
+the second call evaluates to @code{false} because @code{x} is declared
+@code{aligned (8)} and not @code{aligned (4)}.
+
+@smallexample
+__attribute__ ((aligned (8))) int x;
+_Static_assert (__builtin_has_attribute (x, aligned), "aligned");
+_Static_assert (!__builtin_has_attribute (x, aligned (4)), "aligned (4)");
+@end smallexample
+
+Due to a limitation, the @code{__builtin_has_attribute} function returns
+@code{false} for the @code{mode} attribute even if the type or variable
+referenced by the @var{type-or-expression} argument was declared with one.
+The function is also not supported with labels, and in C with enumerators.
+
+Note that unlike the @code{__has_attribute} preprocessor operator, which
+is suitable for use in @code{#if} preprocessing directives,
+@code{__builtin_has_attribute} is an intrinsic function that is not
+recognized in such contexts.
+
+@enddefbuiltin
+
+@defbuiltin{@var{type} __builtin_speculation_safe_value (@var{type} @var{val}, @var{type} @var{failval})}
+
+This built-in function can be used to help mitigate against unsafe
+speculative execution.  @var{type} may be any integral type or any
+pointer type.
+
+@enumerate
+@item
+If the CPU is not speculatively executing the code, then @var{val}
+is returned.
+@item
+If the CPU is executing speculatively then either:
+@itemize
+@item
+The function may cause execution to pause until it is known that the
+code is no longer being executed speculatively (in which case
+@var{val} can be returned, as above); or
+@item
+The function may use target-dependent speculation tracking state to cause
+@var{failval} to be returned when it is known that speculative
+execution has incorrectly predicted a conditional branch operation.
+@end itemize
+@end enumerate
+
+The second argument, @var{failval}, is optional and defaults to zero
+if omitted.
+
+GCC defines the preprocessor macro
+@code{__HAVE_BUILTIN_SPECULATION_SAFE_VALUE} for targets that have been
+updated to support this builtin.
+
+The built-in function can be used where a variable appears to be used in a
+safe way, but the CPU, due to speculative execution, may temporarily ignore
+the bounds checks.  Consider, for example, the following function:
+
+@smallexample
+int array[500];
+int f (unsigned untrusted_index)
+@{
+  if (untrusted_index < 500)
+    return array[untrusted_index];
+  return 0;
+@}
+@end smallexample
+
+If the function is called repeatedly with @code{untrusted_index} less
+than the limit of 500, then a branch predictor will learn that the
+block of code that returns a value stored in @code{array} will be
+executed.  If the function is subsequently called with an
+out-of-range value, it will still try to execute that block of code
+first until the CPU determines that the prediction was incorrect
+(the CPU will unwind any incorrect operations at that point).
+However, depending on how the result of the function is used, it might be
+possible to leave traces in the cache that can reveal what was stored
+at the out-of-bounds location.
+The built-in function can be used to
+provide some protection against leaking data in this way by changing
+the code to:
+
+@smallexample
+int array[500];
+int f (unsigned untrusted_index)
+@{
+  if (untrusted_index < 500)
+    return array[__builtin_speculation_safe_value (untrusted_index)];
+  return 0;
+@}
+@end smallexample
+
+The built-in function will either cause execution to stall until the
+conditional branch has been fully resolved, or it may permit
+speculative execution to continue, but using 0 instead of
+@code{untrusted_index} if that exceeds the limit.
+
+If accessing any memory location is potentially unsafe when speculative
+execution is incorrect, then the code can be rewritten as
+
+@smallexample
+int array[500];
+int f (unsigned untrusted_index)
+@{
+  if (untrusted_index < 500)
+    return *__builtin_speculation_safe_value (&array[untrusted_index], NULL);
+  return 0;
+@}
+@end smallexample
+
+which will cause a @code{NULL} pointer to be used for the unsafe case.
+
+@enddefbuiltin
+
+@defbuiltin{int __builtin_types_compatible_p (@var{type1}, @var{type2})}
+
+You can use the built-in function @code{__builtin_types_compatible_p} to
+determine whether two types are the same.
+
+This built-in function returns 1 if the unqualified versions of the
+types @var{type1} and @var{type2} (which are types, not expressions) are
+compatible, 0 otherwise.  The result of this built-in function can be
+used in integer constant expressions.
+
+This built-in function ignores top level qualifiers (e.g., @code{const},
+@code{volatile}).  For example, @code{int} is equivalent to @code{const
+int}.
+
+The types @code{int[]} and @code{int[5]} are compatible.  On the other
+hand, @code{int} and @code{char *} are not compatible, even if the sizes
+of their types, on the particular architecture, are the same.  Also, the
+amount of pointer indirection is taken into account when determining
+similarity.  Consequently, @code{short *} is not similar to
+@code{short **}.  Furthermore, two types that are typedefed are
+considered compatible if their underlying types are compatible.
+
+An @code{enum} type is not considered to be compatible with another
+@code{enum} type even if both are compatible with the same integer
+type; this is what the C standard specifies.
+For example, @code{enum @{foo, bar@}} is not similar to
+@code{enum @{hot, dog@}}.
+
+You typically use this function in code whose execution varies
+depending on the arguments' types.  For example:
+
+@smallexample
+#define foo(x)                                                  \
+  (@{                                                           \
+    typeof (x) tmp = (x);                                       \
+    if (__builtin_types_compatible_p (typeof (x), long double)) \
+      tmp = foo_long_double (tmp);                              \
+    else if (__builtin_types_compatible_p (typeof (x), double)) \
+      tmp = foo_double (tmp);                                   \
+    else if (__builtin_types_compatible_p (typeof (x), float))  \
+      tmp = foo_float (tmp);                                    \
+    else                                                        \
+      abort ();                                                 \
+    tmp;                                                        \
+  @})
+@end smallexample
+
+@emph{Note:} This construct is only available for C@.
+
+@enddefbuiltin
+
+@defbuiltin{@var{type} __builtin_call_with_static_chain (@var{call_exp}, @var{pointer_exp})}
+
+The @var{call_exp} expression must be a function call, and the
+@var{pointer_exp} expression must be a pointer.  The @var{pointer_exp}
+is passed to the function call in the target's static chain location.
+The result of the built-in is the result of the function call.
+
+@emph{Note:} This builtin is only available for C@.
+This builtin can be used to call Go closures from C.
+
+@enddefbuiltin
+
+@defbuiltin{@var{type} __builtin_choose_expr (@var{const_exp}, @var{exp1}, @var{exp2})}
+
+You can use the built-in function @code{__builtin_choose_expr} to
+evaluate code depending on the value of a constant expression.  This
+built-in function returns @var{exp1} if @var{const_exp}, which is an
+integer constant expression, is nonzero.  Otherwise it returns @var{exp2}.
+
+Like the @samp{? :} operator, this built-in function does not evaluate the
+expression that is not chosen.  For example, if @var{const_exp} evaluates to
+@code{true}, @var{exp2} is not evaluated even if it has side effects.  On the
+other hand, @code{__builtin_choose_expr} differs from @samp{? :} in that the
+first operand must be a compile-time constant, and the other operands are not
+subject to the @samp{? :} type constraints and promotions.
+
+This built-in function can return an lvalue if the chosen argument is an
+lvalue.
+
+If @var{exp1} is returned, the return type is the same as @var{exp1}'s
+type.  Similarly, if @var{exp2} is returned, its return type is the same
+as @var{exp2}'s.
+
+Example:
+
+@smallexample
+#define foo(x)                                                    \
+  __builtin_choose_expr (                                         \
+    __builtin_types_compatible_p (typeof (x), double),            \
+    foo_double (x),                                               \
+    __builtin_choose_expr (                                       \
+      __builtin_types_compatible_p (typeof (x), float),           \
+      foo_float (x),                                              \
+      /* @r{The void expression results in a compile-time error} \
+         @r{when assigning the result to something.}  */          \
+      (void)0))
+@end smallexample
+
+@emph{Note:} This construct is only available for C@.  Furthermore, the
+unused expression (@var{exp1} or @var{exp2} depending on the value of
+@var{const_exp}) may still generate syntax errors.  This may change in
+future revisions.
+
+@enddefbuiltin
+
+@defbuiltin{@var{type} __builtin_tgmath (@var{functions}, @var{arguments})}
+
+The built-in function @code{__builtin_tgmath}, available only for C
+and Objective-C, calls a function determined according to the rules of
+@code{<tgmath.h>} macros.  It is intended to be used in
+implementations of that header, so that expansions of macros from that
+header only expand each of their arguments once, to avoid problems
+when calls to such macros are nested inside the arguments of other
+calls to such macros; in addition, it results in better diagnostics
+for invalid calls to @code{<tgmath.h>} macros than implementations
+using other GNU C language features.  For example, the @code{pow}
+type-generic macro might be defined as:
+
+@smallexample
+#define pow(a, b) __builtin_tgmath (powf, pow, powl, \
+                                    cpowf, cpow, cpowl, a, b)
+@end smallexample
+
+The arguments to @code{__builtin_tgmath} are at least two pointers to
+functions, followed by the arguments to the type-generic macro (which
+will be passed as arguments to the selected function).  All the
+pointers to functions must be pointers to prototyped functions, none
+of which may have variable arguments, and all of which must have the
+same number of parameters; the number of parameters of the first
+function determines how many arguments to @code{__builtin_tgmath} are
+interpreted as function pointers, and how many as the arguments to the
+called function.
+
+The types of the specified functions must all be different, but
+related to each other in the same way as a set of functions that may
+be selected between by a macro in @code{<tgmath.h>}.  This means that
+the functions are parameterized by a floating-point type @var{t},
+different for each such function.
+The function return types may all
+be the same type, or they may be @var{t} for each function, or they
+may be the real type corresponding to @var{t} for each function (if
+some of the types @var{t} are complex).  Likewise, for each parameter
+position, the type of the parameter in that position may always be the
+same type, or may be @var{t} for each function (this case must apply
+for at least one parameter position), or may be the real type
+corresponding to @var{t} for each function.
+
+The standard rules for @code{<tgmath.h>} macros are used to find a
+common type @var{u} from the types of the arguments for parameters
+whose types vary between the functions; complex integer types (a GNU
+extension) are treated like the complex type corresponding to the real
+floating type that would be chosen for the corresponding real integer type.
+If the function return types vary, or are all the same integer type,
+the function called is the one for which @var{t} is @var{u}, and it is
+an error if there is no such function.  If the function return types
+are all the same floating-point type, the type-generic macro is taken
+to be one of those from TS 18661 that rounds the result to a narrower
+type; if there is a function for which @var{t} is @var{u}, it is
+called, and otherwise the first function, if any, for which @var{t}
+has at least the range and precision of @var{u} is called, and it is
+an error if there is no such function.
+
+@enddefbuiltin
+
+@defbuiltin{int __builtin_constant_p (@var{exp})}
+You can use the built-in function @code{__builtin_constant_p} to
+determine if the expression @var{exp} is known to be constant at
+compile time and hence that GCC can perform constant-folding on expressions
+involving that value.  The argument of the function is the expression to
+test.  The expression is not evaluated; side effects are discarded.  The
+function returns the integer 1 if the argument is known to be a compile-time
+constant and 0 if it is not known to be a compile-time constant.
+Any expression that has side effects makes the function return 0.
+A return of 0 does not indicate that the expression is @emph{not} a constant,
+but merely that GCC cannot prove it is a constant within the constraints
+of the active set of optimization options.
+
+You typically use this function in an embedded application where
+memory is a critical resource.  If you have some complex calculation,
+you may want it to be folded if it involves constants, but need to call
+a function if it does not.  For example:
+
+@smallexample
+#define Scale_Value(X)      \
+  (__builtin_constant_p (X) \
+   ? ((X) * SCALE + OFFSET) : Scale (X))
+@end smallexample
+
+You may use this built-in function in either a macro or an inline
+function.  However, if you use it in an inlined function and pass an
+argument of the function as the argument to the built-in, GCC
+never returns 1 when you call the inline function with a string constant
+or compound literal (@pxref{Compound Literals}) and does not return 1
+when you pass a constant numeric value to the inline function unless you
+specify the @option{-O} option.
+
+You may also use @code{__builtin_constant_p} in initializers for static
+data.  For instance, you can write
+
+@smallexample
+static const int table[] = @{
+  __builtin_constant_p (EXPRESSION) ? (EXPRESSION) : -1,
+  /* @r{@dots{}} */
+@};
+@end smallexample
+
+@noindent
+This is an acceptable initializer even if @var{EXPRESSION} is not a
+constant expression, including the case where
+@code{__builtin_constant_p} returns 1 because @var{EXPRESSION} can be
+folded to a constant but @var{EXPRESSION} contains operands that are
+not otherwise permitted in a static initializer (for example,
+@code{0 && foo ()}).  GCC must be more conservative about evaluating the
+built-in in this case, because it has no opportunity to perform
+optimization.
+@enddefbuiltin
+
+@defbuiltin{bool __builtin_is_constant_evaluated (void)}
+The @code{__builtin_is_constant_evaluated} function is available only
+in C++.  The built-in is intended to be used by implementations of
+the @code{std::is_constant_evaluated} C++ function.  Programs should make
+use of the latter function rather than invoking the built-in directly.
+
+The main use case of the built-in is to determine whether a @code{constexpr}
+function is being called in a @code{constexpr} context.  A call to
+the function evaluates to a core constant expression with the value
+@code{true} if and only if it occurs within the evaluation of an expression
+or conversion that is manifestly constant-evaluated as defined in the C++
+standard.  Manifestly constant-evaluated contexts include
+constant-expressions, the conditions of @code{constexpr if} statements,
+constraint-expressions, and initializers of variables usable in constant
+expressions.  For more details refer to the latest revision of the C++
+standard.
+@enddefbuiltin
+
+@defbuiltin{@var{type} __builtin_counted_by_ref (@var{ptr})}
+The built-in function @code{__builtin_counted_by_ref} checks whether the
+array object pointed to by the pointer @var{ptr} has another object
+associated with it that represents the number of elements in the array
+object through the @code{counted_by} attribute (i.e. the counted-by
+object).  If so, it returns a pointer to the corresponding counted-by
+object.  If no such counted-by object exists, it returns a null pointer.
+
+This built-in function is only available in C for now.
+
+The argument @var{ptr} must be a pointer to an array.
+The @var{type} of the returned value is a pointer type pointing to the
+corresponding type of the counted-by object, or a void pointer type in
+case of a null pointer being returned.
+
+For example:
+
+@smallexample
+struct foo1 @{
+  int counter;
+  struct bar1 array[] __attribute__((counted_by (counter)));
+@} *p;
+
+struct foo2 @{
+  int other;
+  struct bar2 array[];
+@} *q;
+@end smallexample
+
+@noindent
+the following call to the built-in
+
+@smallexample
+__builtin_counted_by_ref (p->array)
+@end smallexample
+
+@noindent
+returns
+
+@smallexample
+&p->counter
+@end smallexample
+
+@noindent
+with type @code{int *}.  However, the following call to the built-in
+
+@smallexample
+__builtin_counted_by_ref (q->array)
+@end smallexample
+
+@noindent
+returns a null pointer to @code{void}.
+
+@enddefbuiltin
+
+@defbuiltin{void __builtin_clear_padding (@var{ptr})}
+The built-in function @code{__builtin_clear_padding} clears the padding
+bits inside the object representation of the object pointed to by
+@var{ptr}, which has to be a pointer.  The value representation of the
+object is not affected.  The type of the object is assumed to be the type
+the pointer points to.  Inside a union, the only cleared bits are
+bits that are padding bits for all the union members.
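+
+For instance, it can be used before comparing object representations
+byte by byte.  A minimal sketch (the padding shown assumes a typical
+ABI where @code{int} is 4-byte aligned):
+
+@smallexample
+struct s @{ char c; /* typically 3 bytes of padding here */ int i; @};
+
+int
+equal_repr (struct s *a, struct s *b)
+@{
+  __builtin_clear_padding (a);   /* zero the padding bits in *a */
+  __builtin_clear_padding (b);   /* zero the padding bits in *b */
+  return __builtin_memcmp (a, b, sizeof (struct s)) == 0;
+@}
+@end smallexample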
+
+This built-in function is useful if the padding bits of an object might
+have indeterminate values and the object representation needs to be
+bitwise compared to some other object, for example for atomic operations.
+
+For C++, the @var{ptr} argument type should be a pointer to a
+trivially copyable type, unless the argument is the address of a variable
+or parameter, because otherwise it isn't known whether the type is just
+a base class whose padding bits are reused or laid out differently in
+a derived class.
+@enddefbuiltin
+
+@defbuiltin{@var{type} __builtin_bit_cast (@var{type}, @var{arg})}
+The @code{__builtin_bit_cast} function is available only
+in C++.  The built-in is intended to be used by implementations of
+the @code{std::bit_cast} C++ template function.  Programs should make
+use of the latter function rather than invoking the built-in directly.
+
+This built-in function allows reinterpreting the bits of the @var{arg}
+argument as if it had type @var{type}.  @var{type} and the type of the
+@var{arg} argument need to be trivially copyable types with the same size.
+When manifestly constant-evaluated, it performs extra diagnostics required
+for @code{std::bit_cast} and returns a constant expression if @var{arg}
+is a constant expression.  For more details
+refer to the latest revision of the C++ standard.
+@enddefbuiltin
+
+@defbuiltin{long __builtin_expect (long @var{exp}, long @var{c})}
+@opindex fprofile-arcs
+You may use @code{__builtin_expect} to provide the compiler with
+branch prediction information.  In general, you should prefer to
+use actual profile feedback for this (@option{-fprofile-arcs}), as
+programmers are notoriously bad at predicting how their programs
+actually perform.  However, there are applications in which this
+data is hard to collect.
+
+The return value is the value of @var{exp}, which should be an integral
+expression.  The semantics of the built-in are that it is expected that
+@var{exp} == @var{c}.  For example:
+
+@smallexample
+if (__builtin_expect (x, 0))
+  foo ();
+@end smallexample
+
+@noindent
+indicates that we do not expect to call @code{foo}, since
+we expect @code{x} to be zero.  Since you are limited to integral
+expressions for @var{exp}, you should use constructions such as
+
+@smallexample
+if (__builtin_expect (ptr != NULL, 1))
+  foo (*ptr);
+@end smallexample
+
+@noindent
+when testing pointer or floating-point values.
+
+For the purposes of branch prediction optimizations, the probability that
+a @code{__builtin_expect} expression is @code{true} is controlled by GCC's
+@code{builtin-expect-probability} parameter, which defaults to 90%.
+
+You can also use @code{__builtin_expect_with_probability} to explicitly
+assign a probability value to individual expressions.  If the built-in
+is used in a loop construct, the provided probability will influence
+the expected number of iterations made by loop optimizations.
+@enddefbuiltin
+
+@defbuiltin{long __builtin_expect_with_probability (long @var{exp}, long @var{c}, double @var{probability})}
+
+This function has the same semantics as @code{__builtin_expect},
+but the caller provides the expected probability that @var{exp} == @var{c}.
+The last argument, @var{probability}, is a floating-point value in the
+range 0.0 to 1.0, inclusive.  The @var{probability} argument must be a
+constant floating-point expression.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_trap (void)}
+This function causes the program to exit abnormally.
+GCC implements
+this function by using a target-dependent mechanism (such as
+intentionally executing an illegal instruction) or by calling
+@code{abort}.  The mechanism used may vary from release to release so
+you should not rely on any particular implementation.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_unreachable (void)}
+If control flow reaches the point of the @code{__builtin_unreachable},
+the program is undefined.  It is useful in situations where the
+compiler cannot deduce the unreachability of the code.
+
+One such case is immediately following an @code{asm} statement that
+either never terminates, or one that transfers control elsewhere
+and never returns.  In this example, without the
+@code{__builtin_unreachable}, GCC issues a warning that control
+reaches the end of a non-void function.  It also generates code
+to return after the @code{asm}.
+
+@smallexample
+int f (int c, int v)
+@{
+  if (c)
+    @{
+      return v;
+    @}
+  else
+    @{
+      asm ("jmp error_handler");
+      __builtin_unreachable ();
+    @}
+@}
+@end smallexample
+
+@noindent
+Because the @code{asm} statement unconditionally transfers control out
+of the function, control never reaches the end of the function
+body.  The @code{__builtin_unreachable} is in fact unreachable and
+communicates this fact to the compiler.
+
+Another use for @code{__builtin_unreachable} is following a call to a
+function that never returns but that is not declared
+@code{__attribute__((noreturn))}, as in this example:
+
+@smallexample
+void function_that_never_returns (void);
+
+int g (int c)
+@{
+  if (c)
+    @{
+      return 1;
+    @}
+  else
+    @{
+      function_that_never_returns ();
+      __builtin_unreachable ();
+    @}
+@}
+@end smallexample
+
+@enddefbuiltin
+
+@defbuiltin{@var{type} __builtin_assoc_barrier (@var{type} @var{expr})}
+This built-in inhibits re-association of the floating-point expression
+@var{expr} with expressions consuming the return value of the built-in.  The
+expression @var{expr} itself can be reordered, and the whole expression
+@var{expr} can be reordered with operands after the barrier.  The barrier is
+relevant when @option{-fassociative-math} is active.
+
+@smallexample
+float x0 = a + b - b;
+float x1 = __builtin_assoc_barrier (a + b) - b;
+@end smallexample
+
+@noindent
+means that, with @option{-fassociative-math}, @code{x0} can be optimized to
+@code{x0 = a} but @code{x1} cannot.
+
+It is also relevant when @option{-ffp-contract=fast} is active;
+it will prevent contraction between expressions.
+
+@smallexample
+float x0 = a * b + c;
+float x1 = __builtin_assoc_barrier (a * b) + c;
+@end smallexample
+
+@noindent
+means that, with @option{-ffp-contract=fast}, @code{x0} may be optimized to
+use a fused multiply-add instruction but @code{x1} cannot.
+
+@enddefbuiltin
+
+@defbuiltin{{void *} __builtin_assume_aligned (const void *@var{exp}, size_t @var{align}, ...)}
+This function returns its first argument, and allows the compiler
+to assume that the returned pointer is at least @var{align} bytes
+aligned.  This built-in can have either two or three arguments; if it
+has three, the third argument should have integer type, and if it is
+nonzero, it means the misalignment offset.
+For example:
+
+@smallexample
+void *x = __builtin_assume_aligned (arg, 16);
+@end smallexample
+
+@noindent
+means that the compiler can assume @code{x}, set to @code{arg}, is at least
+16-byte aligned, while:
+
+@smallexample
+void *x = __builtin_assume_aligned (arg, 32, 8);
+@end smallexample
+
+@noindent
+means that the compiler can assume for @code{x}, set to @code{arg}, that
+@code{(char *) x - 8} is 32-byte aligned.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_LINE ()}
+This function is the equivalent of the preprocessor @code{__LINE__}
+macro and returns a constant integer expression that evaluates to
+the line number of the invocation of the built-in.  When used as a C++
+default argument for a function @var{F}, it returns the line number
+of the call to @var{F}.
+@enddefbuiltin
+
+@defbuiltin{{const char *} __builtin_FUNCTION ()}
+This function is the equivalent of the @code{__FUNCTION__} symbol
+and returns an address constant pointing to the name of the function
+from which the built-in was invoked, or the empty string if
+the invocation is not at function scope.  When used as a C++ default
+argument for a function @var{F}, it returns the name of @var{F}'s
+caller or the empty string if the call was not made at function
+scope.
+@enddefbuiltin
+
+@defbuiltin{{const char *} __builtin_FILE ()}
+This function is the equivalent of the preprocessor @code{__FILE__}
+macro and returns an address constant pointing to the file name
+containing the invocation of the built-in, or the empty string if
+the invocation is not at function scope.  When used as a C++ default
+argument for a function @var{F}, it returns the file name of the call
+to @var{F} or the empty string if the call was not made at function
+scope.
+
+For example, in the following, each call to function @code{foo} will
+print a line similar to @code{"file.c:123: foo: message"} with the name
+of the file and the line number of the @code{printf} call, the name of
+the function @code{foo}, followed by the word @code{message}.  (The
+helper functions @code{file} and @code{line} mirror @code{function};
+they are spelled out here so the example is self-contained.)
+
+@smallexample
+const char*
+file (const char *f = __builtin_FILE ())
+@{
+  return f;
+@}
+
+int
+line (int l = __builtin_LINE ())
+@{
+  return l;
+@}
+
+const char*
+function (const char *func = __builtin_FUNCTION ())
+@{
+  return func;
+@}
+
+void foo (void)
+@{
+  printf ("%s:%i: %s: message\n", file (), line (), function ());
+@}
+@end smallexample
+
+@enddefbuiltin
+
+@defbuiltin{void __builtin___clear_cache (void *@var{begin}, void *@var{end})}
+This function is used to flush the processor's instruction cache for
+the region of memory between @var{begin} inclusive and @var{end}
+exclusive.  Some targets require that the instruction cache be
+flushed, after modifying memory containing code, in order to obtain
+deterministic behavior.
+
+If the target does not require instruction cache flushes,
+@code{__builtin___clear_cache} has no effect.  Otherwise either
+instructions are emitted in-line to clear the instruction cache or a
+call to the @code{__clear_cache} function in libgcc is made.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_prefetch (const void *@var{addr}, ...)}
+This function is used to minimize cache-miss latency by moving data into
+a cache before it is accessed.
+You can insert calls to @code{__builtin_prefetch} into code for which
+you know addresses of data in memory that is likely to be accessed soon.
+If the target supports them, data prefetch instructions are generated.
+If the prefetch is done early enough before the access then the data will
+be in the cache by the time it is accessed.
+
+The value of @var{addr} is the address of the memory to prefetch.
+There are two optional arguments, @var{rw} and @var{locality}.
The value of @var{rw} is a compile-time constant zero, one or two; one
+means that the prefetch is preparing for a write to the memory address,
+two means that the prefetch is preparing for a shared read (expected to be
+read by at least one other processor before it is written if written at
+all) and zero, the default, means that the prefetch is preparing for a read.
+The value @var{locality} must be a compile-time constant integer between
+zero and three. A value of zero means that the data has no temporal
+locality, so it need not be left in the cache after the access. A value
+of three means that the data has a high degree of temporal locality and
+should be left in all levels of cache possible. Values of one and two
+mean, respectively, a low or moderate degree of temporal locality. The
+default is three.
+
+@smallexample
+for (i = 0; i < n; i++)
+  @{
+    a[i] = a[i] + b[i];
+    __builtin_prefetch (&a[i+j], 1, 1);
+    __builtin_prefetch (&b[i+j], 0, 1);
+    /* @r{@dots{}} */
+  @}
+@end smallexample
+
+Data prefetch does not generate faults if @var{addr} is invalid, but
+the address expression itself must be valid. For example, a prefetch
+of @code{p->next} does not fault if @code{p->next} is not a valid
+address, but evaluation faults if @code{p} is not a valid address.
+
+If the target does not support data prefetch, the address expression
+is evaluated if it includes side effects but no other code is generated
+and GCC does not issue a warning.
+@enddefbuiltin
+
+@defbuiltin{{size_t} __builtin_object_size (const void * @var{ptr}, int @var{type})}
+Returns a constant size estimate of an object pointed to by @var{ptr}.
+@xref{Object Size Checking}, for a detailed description of the function.
+@enddefbuiltin
+
+@defbuiltin{{size_t} __builtin_dynamic_object_size (const void * @var{ptr}, int @var{type})}
+Similar to @code{__builtin_object_size} except that the return value
+need not be a constant. @xref{Object Size Checking}, for a detailed
+description of the function.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_classify_type (@var{arg})}
+@defbuiltinx{int __builtin_classify_type (@var{type})}
+The @code{__builtin_classify_type} built-in returns a small integer
+identifying the category of the type of @var{arg}: void type, integer
+type, enumeral type, boolean type, pointer type, reference type, offset
+type, real type, complex type, function type, method type, record type,
+union type, array type, string type, bit-precise integer type, vector
+type, etc. When the argument is an expression, for backwards
+compatibility reasons it is promoted like an argument passed to
+@code{...} in a varargs function, so some classes are never returned in
+certain languages. Alternatively, the argument of the built-in function
+can be a typename, such as a @code{typeof} specifier.
+
+@smallexample
+int a[2];
+__builtin_classify_type (a) == __builtin_classify_type (int[5]);
+__builtin_classify_type (a) == __builtin_classify_type (void*);
+__builtin_classify_type (typeof (a)) == __builtin_classify_type (int[5]);
 @end smallexample
-These intrinsic functions are available by including @code{lasxintrin.h} and
-using @option{-mfrecipe} and @option{-mlasx}.
-@smallexample
-__m256d __lasx_xvfrecipe_d (__m256d);
-__m256 __lasx_xvfrecipe_s (__m256);
-__m256d __lasx_xvfrsqrte_d (__m256d);
-__m256 __lasx_xvfrsqrte_s (__m256);
-@end smallexample
+The first comparison will never be true, as @var{a} is implicitly converted
+to a pointer.
The last two comparisons will be true as they classify +pointers in the second case and arrays in the last case. +@enddefbuiltin + +@defbuiltin{double __builtin_huge_val (void)} +Returns a positive infinity, if supported by the floating-point format, +else @code{DBL_MAX}. This function is suitable for implementing the +ISO C macro @code{HUGE_VAL}. +@enddefbuiltin + +@defbuiltin{float __builtin_huge_valf (void)} +Similar to @code{__builtin_huge_val}, except the return type is @code{float}. +@enddefbuiltin + +@defbuiltin{{long double} __builtin_huge_vall (void)} +Similar to @code{__builtin_huge_val}, except the return +type is @code{long double}. +@enddefbuiltin + +@defbuiltin{_Float@var{n} __builtin_huge_valf@var{n} (void)} +Similar to @code{__builtin_huge_val}, except the return type is +@code{_Float@var{n}}. +@enddefbuiltin + +@defbuiltin{_Float@var{n}x __builtin_huge_valf@var{n}x (void)} +Similar to @code{__builtin_huge_val}, except the return type is +@code{_Float@var{n}x}. +@enddefbuiltin + +@defbuiltin{int __builtin_fpclassify (int, int, int, int, int, ...)} +This built-in implements the C99 fpclassify functionality. The first +five int arguments should be the target library's notion of the +possible FP classes and are used for return values. They must be +constant values and they must appear in this order: @code{FP_NAN}, +@code{FP_INFINITE}, @code{FP_NORMAL}, @code{FP_SUBNORMAL} and +@code{FP_ZERO}. The ellipsis is for exactly one floating-point value +to classify. GCC treats the last argument as type-generic, which +means it does not do default promotion from float to double. +@enddefbuiltin + +@defbuiltin{double __builtin_inf (void)} +Similar to @code{__builtin_huge_val}, except a warning is generated +if the target floating-point format does not support infinities. +@enddefbuiltin + +@defbuiltin{_Decimal32 __builtin_infd32 (void)} +Similar to @code{__builtin_inf}, except the return type is @code{_Decimal32}. +@enddefbuiltin + +@defbuiltin{_Decimal64 __builtin_infd64 (void)} +Similar to @code{__builtin_inf}, except the return type is @code{_Decimal64}. +@enddefbuiltin + +@defbuiltin{_Decimal128 __builtin_infd128 (void)} +Similar to @code{__builtin_inf}, except the return type is @code{_Decimal128}. +@enddefbuiltin + +@defbuiltin{float __builtin_inff (void)} +Similar to @code{__builtin_inf}, except the return type is @code{float}. +This function is suitable for implementing the ISO C99 macro @code{INFINITY}. +@enddefbuiltin + +@defbuiltin{{long double} __builtin_infl (void)} +Similar to @code{__builtin_inf}, except the return +type is @code{long double}. +@enddefbuiltin + +@defbuiltin{_Float@var{n} __builtin_inff@var{n} (void)} +Similar to @code{__builtin_inf}, except the return +type is @code{_Float@var{n}}. +@enddefbuiltin + +@defbuiltin{_Float@var{n} __builtin_inff@var{n}x (void)} +Similar to @code{__builtin_inf}, except the return +type is @code{_Float@var{n}x}. +@enddefbuiltin + +@defbuiltin{int __builtin_isinf_sign (...)} +Similar to @code{isinf}, except the return value is -1 for +an argument of @code{-Inf} and 1 for an argument of @code{+Inf}. +Note while the parameter list is an +ellipsis, this function only accepts exactly one floating-point +argument. GCC treats this parameter as type-generic, which means it +does not do default promotion from float to double. +@enddefbuiltin + +@defbuiltin{double __builtin_nan (const char *@var{str})} +This is an implementation of the ISO C99 function @code{nan}. 
+ +Since ISO C99 defines this function in terms of @code{strtod}, which we +do not implement, a description of the parsing is in order. The string +is parsed as by @code{strtol}; that is, the base is recognized by +leading @samp{0} or @samp{0x} prefixes. The number parsed is placed +in the significand such that the least significant bit of the number +is at the least significant bit of the significand. The number is +truncated to fit the significand field provided. The significand is +forced to be a quiet NaN@. + +This function, if given a string literal all of which would have been +consumed by @code{strtol}, is evaluated early enough that it is considered a +compile-time constant. +@enddefbuiltin + +@defbuiltin{_Decimal32 __builtin_nand32 (const char *@var{str})} +Similar to @code{__builtin_nan}, except the return type is @code{_Decimal32}. +@enddefbuiltin + +@defbuiltin{_Decimal64 __builtin_nand64 (const char *@var{str})} +Similar to @code{__builtin_nan}, except the return type is @code{_Decimal64}. +@enddefbuiltin + +@defbuiltin{_Decimal128 __builtin_nand128 (const char *@var{str})} +Similar to @code{__builtin_nan}, except the return type is @code{_Decimal128}. +@enddefbuiltin + +@defbuiltin{float __builtin_nanf (const char *@var{str})} +Similar to @code{__builtin_nan}, except the return type is @code{float}. +@enddefbuiltin + +@defbuiltin{{long double} __builtin_nanl (const char *@var{str})} +Similar to @code{__builtin_nan}, except the return type is @code{long double}. +@enddefbuiltin + +@defbuiltin{_Float@var{n} __builtin_nanf@var{n} (const char *@var{str})} +Similar to @code{__builtin_nan}, except the return type is +@code{_Float@var{n}}. +@enddefbuiltin + +@defbuiltin{_Float@var{n}x __builtin_nanf@var{n}x (const char *@var{str})} +Similar to @code{__builtin_nan}, except the return type is +@code{_Float@var{n}x}. +@enddefbuiltin + +@defbuiltin{double __builtin_nans (const char *@var{str})} +Similar to @code{__builtin_nan}, except the significand is forced +to be a signaling NaN@. The @code{nans} function is proposed by +@uref{https://www.open-std.org/jtc1/sc22/wg14/www/docs/n965.htm,,WG14 N965}. +@enddefbuiltin + +@defbuiltin{_Decimal32 __builtin_nansd32 (const char *@var{str})} +Similar to @code{__builtin_nans}, except the return type is @code{_Decimal32}. +@enddefbuiltin + +@defbuiltin{_Decimal64 __builtin_nansd64 (const char *@var{str})} +Similar to @code{__builtin_nans}, except the return type is @code{_Decimal64}. +@enddefbuiltin + +@defbuiltin{_Decimal128 __builtin_nansd128 (const char *@var{str})} +Similar to @code{__builtin_nans}, except the return type is @code{_Decimal128}. +@enddefbuiltin + +@defbuiltin{float __builtin_nansf (const char *@var{str})} +Similar to @code{__builtin_nans}, except the return type is @code{float}. +@enddefbuiltin + +@defbuiltin{{long double} __builtin_nansl (const char *@var{str})} +Similar to @code{__builtin_nans}, except the return type is @code{long double}. +@enddefbuiltin + +@defbuiltin{_Float@var{n} __builtin_nansf@var{n} (const char *@var{str})} +Similar to @code{__builtin_nans}, except the return type is +@code{_Float@var{n}}. +@enddefbuiltin + +@defbuiltin{_Float@var{n}x __builtin_nansf@var{n}x (const char *@var{str})} +Similar to @code{__builtin_nans}, except the return type is +@code{_Float@var{n}x}. +@enddefbuiltin + +@defbuiltin{int __builtin_issignaling (...)} +Return non-zero if the argument is a signaling NaN and zero otherwise. 
+Note that while the parameter list is an
+ellipsis, this function only accepts exactly one floating-point
+argument. GCC treats this parameter as type-generic, which means it
+does not do default promotion from float to double.
+This built-in function can work even without the non-default
+@code{-fsignaling-nans} option, although if a signaling NaN is computed,
+stored or passed as an argument to some function other than this built-in
+in the current translation unit, it is safer to use @code{-fsignaling-nans}.
+With the @code{-ffinite-math-only} option this built-in function
+always returns 0.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_ffs (int @var{x})}
+Returns one plus the index of the least significant 1-bit of @var{x}, or
+if @var{x} is zero, returns zero.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_clz (unsigned int @var{x})}
+Returns the number of leading 0-bits in @var{x}, starting at the most
+significant bit position. If @var{x} is 0, the result is undefined.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_ctz (unsigned int @var{x})}
+Returns the number of trailing 0-bits in @var{x}, starting at the least
+significant bit position. If @var{x} is 0, the result is undefined.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_clrsb (int @var{x})}
+Returns the number of leading redundant sign bits in @var{x}, i.e.@: the
+number of bits following the most significant bit that are identical
+to it. There are no special cases for 0 or other values.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_popcount (unsigned int @var{x})}
+Returns the number of 1-bits in @var{x}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_parity (unsigned int @var{x})}
+Returns the parity of @var{x}, i.e.@: the number of 1-bits in @var{x}
+modulo 2.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_ffsl (long)}
+Similar to @code{__builtin_ffs}, except the argument type is
+@code{long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_clzl (unsigned long)}
+Similar to @code{__builtin_clz}, except the argument type is
+@code{unsigned long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_ctzl (unsigned long)}
+Similar to @code{__builtin_ctz}, except the argument type is
+@code{unsigned long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_clrsbl (long)}
+Similar to @code{__builtin_clrsb}, except the argument type is
+@code{long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_popcountl (unsigned long)}
+Similar to @code{__builtin_popcount}, except the argument type is
+@code{unsigned long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_parityl (unsigned long)}
+Similar to @code{__builtin_parity}, except the argument type is
+@code{unsigned long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_ffsll (long long)}
+Similar to @code{__builtin_ffs}, except the argument type is
+@code{long long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_clzll (unsigned long long)}
+Similar to @code{__builtin_clz}, except the argument type is
+@code{unsigned long long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_ctzll (unsigned long long)}
+Similar to @code{__builtin_ctz}, except the argument type is
+@code{unsigned long long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_clrsbll (long long)}
+Similar to @code{__builtin_clrsb}, except the argument type is
+@code{long long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_popcountll (unsigned long long)}
+Similar to @code{__builtin_popcount}, except the argument type is
+@code{unsigned long long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_parityll (unsigned long long)}
+Similar to @code{__builtin_parity}, except the argument type is
+@code{unsigned long long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_ffsg (...)}
+Similar to @code{__builtin_ffs}, except the argument is a type-generic
+signed integer (standard, extended or bit-precise). No integral argument
+promotions are performed on the argument.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_clzg (...)}
+Similar to @code{__builtin_clz}, except the argument is a type-generic
+unsigned integer (standard, extended or bit-precise) and there is an
+optional second argument of type @code{int}. No integral argument
+promotions are performed on the first argument. If two arguments are
+specified and the first argument is 0, the result is the second
+argument. If only one argument is specified and it is 0, the result is
+undefined.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_ctzg (...)}
+Similar to @code{__builtin_ctz}, except the argument is a type-generic
+unsigned integer (standard, extended or bit-precise) and there is an
+optional second argument of type @code{int}. No integral argument
+promotions are performed on the first argument. If two arguments are
+specified and the first argument is 0, the result is the second
+argument. If only one argument is specified and it is 0, the result is
+undefined.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_clrsbg (...)}
+Similar to @code{__builtin_clrsb}, except the argument is a type-generic
+signed integer (standard, extended or bit-precise). No integral argument
+promotions are performed on the argument.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_popcountg (...)}
+Similar to @code{__builtin_popcount}, except the argument is a
+type-generic unsigned integer (standard, extended or bit-precise). No
+integral argument promotions are performed on the argument.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_parityg (...)}
+Similar to @code{__builtin_parity}, except the argument is a
+type-generic unsigned integer (standard, extended or bit-precise). No
+integral argument promotions are performed on the argument.
+@enddefbuiltin
+
+@defbuiltin{@var{type} __builtin_stdc_bit_ceil (@var{type} @var{arg})}
+The @code{__builtin_stdc_bit_ceil} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{@var{arg} <= 1 ? (@var{type}) 1
+: (@var{type}) 2 << (@var{prec} - 1 - __builtin_clzg ((@var{type}) (@var{arg} - 1)))}
+where @var{prec} is the bit width of @var{type}, except that side-effects
+in @var{arg} are evaluated just once.
+@enddefbuiltin
+
+@defbuiltin{@var{type} __builtin_stdc_bit_floor (@var{type} @var{arg})}
+The @code{__builtin_stdc_bit_floor} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{@var{arg} == 0 ? (@var{type}) 0
+: (@var{type}) 1 << (@var{prec} - 1 - __builtin_clzg (@var{arg}))}
+where @var{prec} is the bit width of @var{type}, except that side-effects
+in @var{arg} are evaluated just once.
+@enddefbuiltin
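+
+For example, @code{__builtin_stdc_bit_ceil} can be used to round an
+allocation size up to the next power of two. This is a minimal sketch;
+the helper name is arbitrary:
+
+@smallexample
+#include <stddef.h>
+
+size_t
+round_up_pow2 (size_t n)
+@{
+  /* The result is undefined if it is not representable in size_t,
+     i.e. if n is greater than SIZE_MAX / 2 + 1.  */
+  return __builtin_stdc_bit_ceil (n);
+@}
+@end smallexample
+
+@defbuiltin{{unsigned int} __builtin_stdc_bit_width (@var{type} @var{arg})}
+The @code{__builtin_stdc_bit_width} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise).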
No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{(unsigned int) (@var{prec} - __builtin_clzg (@var{arg}, @var{prec}))}
+where @var{prec} is the bit width of @var{type}.
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_stdc_count_ones (@var{type} @var{arg})}
+The @code{__builtin_stdc_count_ones} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{(unsigned int) __builtin_popcountg (@var{arg})}.
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_stdc_count_zeros (@var{type} @var{arg})}
+The @code{__builtin_stdc_count_zeros} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{(unsigned int) __builtin_popcountg ((@var{type}) ~@var{arg})}.
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_stdc_first_leading_one (@var{type} @var{arg})}
+The @code{__builtin_stdc_first_leading_one} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{__builtin_clzg (@var{arg}, -1) + 1U}.
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_stdc_first_leading_zero (@var{type} @var{arg})}
+The @code{__builtin_stdc_first_leading_zero} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{__builtin_clzg ((@var{type}) ~@var{arg}, -1) + 1U}.
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_stdc_first_trailing_one (@var{type} @var{arg})}
+The @code{__builtin_stdc_first_trailing_one} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{__builtin_ctzg (@var{arg}, -1) + 1U}.
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_stdc_first_trailing_zero (@var{type} @var{arg})}
+The @code{__builtin_stdc_first_trailing_zero} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{__builtin_ctzg ((@var{type}) ~@var{arg}, -1) + 1U}.
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_stdc_has_single_bit (@var{type} @var{arg})}
+The @code{__builtin_stdc_has_single_bit} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{(_Bool) (__builtin_popcountg (@var{arg}) == 1)}.
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_stdc_leading_ones (@var{type} @var{arg})}
+The @code{__builtin_stdc_leading_ones} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument.
It is equivalent to
+@code{(unsigned int) __builtin_clzg ((@var{type}) ~@var{arg}, @var{prec})},
+where @var{prec} is the bit width of @var{type}.
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_stdc_leading_zeros (@var{type} @var{arg})}
+The @code{__builtin_stdc_leading_zeros} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{(unsigned int) __builtin_clzg (@var{arg}, @var{prec})},
+where @var{prec} is the bit width of @var{type}.
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_stdc_trailing_ones (@var{type} @var{arg})}
+The @code{__builtin_stdc_trailing_ones} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{(unsigned int) __builtin_ctzg ((@var{type}) ~@var{arg}, @var{prec})},
+where @var{prec} is the bit width of @var{type}.
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_stdc_trailing_zeros (@var{type} @var{arg})}
+The @code{__builtin_stdc_trailing_zeros} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{(unsigned int) __builtin_ctzg (@var{arg}, @var{prec})},
+where @var{prec} is the bit width of @var{type}.
+@enddefbuiltin
+
+@defbuiltin{@var{type1} __builtin_stdc_rotate_left (@var{type1} @var{arg1}, @var{type2} @var{arg2})}
+The @code{__builtin_stdc_rotate_left} function is available only
+in C. It is type-generic; the first argument can be any unsigned integer
+(standard, extended or bit-precise) and the second argument any signed or
+unsigned integer or @code{char}. No integral argument promotions are
+performed on the arguments. It is equivalent to
+@code{(@var{type1}) ((@var{arg1} << (@var{arg2} % @var{prec}))
+| (@var{arg1} >> ((-(unsigned @var{type2}) @var{arg2}) % @var{prec})))}
+where @var{prec} is the bit width of @var{type1}, except that side-effects
+in @var{arg1} and @var{arg2} are evaluated just once. The behavior is
+undefined if @var{arg2} is negative.
+@enddefbuiltin
+
+@defbuiltin{@var{type1} __builtin_stdc_rotate_right (@var{type1} @var{arg1}, @var{type2} @var{arg2})}
+The @code{__builtin_stdc_rotate_right} function is available only
+in C. It is type-generic; the first argument can be any unsigned integer
+(standard, extended or bit-precise) and the second argument any signed or
+unsigned integer or @code{char}. No integral argument promotions are
+performed on the arguments. It is equivalent to
+@code{(@var{type1}) ((@var{arg1} >> (@var{arg2} % @var{prec}))
+| (@var{arg1} << ((-(unsigned @var{type2}) @var{arg2}) % @var{prec})))}
+where @var{prec} is the bit width of @var{type1}, except that side-effects
+in @var{arg1} and @var{arg2} are evaluated just once. The behavior is
+undefined if @var{arg2} is negative.
+@enddefbuiltin
+
+@defbuiltin{double __builtin_powi (double, int)}
+@defbuiltinx{float __builtin_powif (float, int)}
+@defbuiltinx{{long double} __builtin_powil (long double, int)}
+Returns the first argument raised to the power of the second. Unlike the
+@code{pow} function, no guarantees about precision and rounding are made.
+@enddefbuiltin
+
+@defbuiltin{uint16_t __builtin_bswap16 (uint16_t @var{x})}
+Returns @var{x} with the order of the bytes reversed; for example,
+@code{0xabcd} becomes @code{0xcdab}. Byte here always means
+exactly 8 bits.
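+
+For example, a 16-bit big-endian value read from a network packet can
+be converted to host order on a little-endian target. This is a
+minimal sketch; portable code would check the host byte order first:
+
+@smallexample
+#include <stdint.h>
+
+uint16_t
+be16_to_host (uint16_t be_value)
+@{
+  /* On a little-endian target, swap the two bytes.  */
+  return __builtin_bswap16 (be_value);
+@}
+@end smallexample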
+@enddefbuiltin
+
+@defbuiltin{uint32_t __builtin_bswap32 (uint32_t @var{x})}
+Similar to @code{__builtin_bswap16}, except the argument and return types
+are 32-bit.
+@enddefbuiltin
+
+@defbuiltin{uint64_t __builtin_bswap64 (uint64_t @var{x})}
+Similar to @code{__builtin_bswap32}, except the argument and return types
+are 64-bit.
+@enddefbuiltin
+
+@defbuiltin{uint128_t __builtin_bswap128 (uint128_t @var{x})}
+Similar to @code{__builtin_bswap64}, except the argument and return types
+are 128-bit. Only supported on targets where 128-bit types are supported.
+@enddefbuiltin
+
+
+@defbuiltin{Pmode __builtin_extend_pointer (void * @var{x})}
+On targets where the user-visible pointer size is smaller than the size
+of an actual hardware address, this function returns the extended user
+pointer. Targets where this is true include ILP32 mode on x86_64 and
+AArch64. This function is mainly useful when writing inline assembly
+code.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_goacc_parlevel_id (int @var{x})}
+Returns the OpenACC gang, worker or vector id depending on whether @var{x} is
+0, 1 or 2.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_goacc_parlevel_size (int @var{x})}
+Returns the OpenACC gang, worker or vector size depending on whether @var{x} is
+0, 1 or 2.
+@enddefbuiltin
+
+@defbuiltin{uint8_t __builtin_rev_crc8_data8 (uint8_t @var{crc}, uint8_t @var{data}, uint8_t @var{poly})}
+Returns the calculated 8-bit bit-reversed CRC using the initial CRC (8-bit),
+data (8-bit) and the polynomial (8-bit).
+@var{crc} is the initial CRC, @var{data} is the data and
+@var{poly} is the polynomial without leading 1.
+Table-based or clmul-based CRC may be used for the
+calculation, depending on the target architecture.
+@enddefbuiltin
+
+@defbuiltin{uint16_t __builtin_rev_crc16_data16 (uint16_t @var{crc}, uint16_t @var{data}, uint16_t @var{poly})}
+Similar to @code{__builtin_rev_crc8_data8}, except the argument and return types
+are 16-bit.
+@enddefbuiltin
+
+@defbuiltin{uint16_t __builtin_rev_crc16_data8 (uint16_t @var{crc}, uint8_t @var{data}, uint16_t @var{poly})}
+Similar to @code{__builtin_rev_crc16_data16}, except the @var{data} argument
+type is 8-bit.
+@enddefbuiltin
+
+@defbuiltin{uint32_t __builtin_rev_crc32_data32 (uint32_t @var{crc}, uint32_t @var{data}, uint32_t @var{poly})}
+Similar to @code{__builtin_rev_crc8_data8}, except the argument and return
+types are 32-bit. Depending on the target and the polynomial, a crc*
+machine instruction may also be used for the calculation.
+@enddefbuiltin
+
+@defbuiltin{uint32_t __builtin_rev_crc32_data8 (uint32_t @var{crc}, uint8_t @var{data}, uint32_t @var{poly})}
+Similar to @code{__builtin_rev_crc32_data32}, except the @var{data} argument
+type is 8-bit.
+@enddefbuiltin
+
+@defbuiltin{uint32_t __builtin_rev_crc32_data16 (uint32_t @var{crc}, uint16_t @var{data}, uint32_t @var{poly})}
+Similar to @code{__builtin_rev_crc32_data32}, except the @var{data} argument
+type is 16-bit.
+@enddefbuiltin
+
+@defbuiltin{uint64_t __builtin_rev_crc64_data64 (uint64_t @var{crc}, uint64_t @var{data}, uint64_t @var{poly})}
+Similar to @code{__builtin_rev_crc8_data8}, except the argument and return types
+are 64-bit.
+@enddefbuiltin
+
+@defbuiltin{uint64_t __builtin_rev_crc64_data8 (uint64_t @var{crc}, uint8_t @var{data}, uint64_t @var{poly})}
+Similar to @code{__builtin_rev_crc64_data64}, except the @var{data} argument type
+is 8-bit.
+@enddefbuiltin + +@defbuiltin{uint64_t __builtin_rev_crc64_data16 (uint64_t @var{crc}, uint16_t @var{data}, uint64_t @var{poly})} +Similar to @code{__builtin_rev_crc64_data64}, except the @var{data} argument type +is 16-bit. +@enddefbuiltin -@node MIPS DSP Built-in Functions -@subsection MIPS DSP Built-in Functions +@defbuiltin{uint64_t __builtin_rev_crc64_data32 (uint64_t @var{crc}, uint32_t @var{data}, uint64_t @var{poly})} +Similar to @code{__builtin_rev_crc64_data64}, except the @var{data} argument type +is 32-bit. +@enddefbuiltin -The MIPS DSP Application-Specific Extension (ASE) includes new -instructions that are designed to improve the performance of DSP and -media applications. It provides instructions that operate on packed -8-bit/16-bit integer data, Q7, Q15 and Q31 fractional data. +@defbuiltin{uint8_t __builtin_crc8_data8 (uint8_t @var{crc}, uint8_t @var{data}, uint8_t @var{poly})} +Returns the calculated 8-bit bit-forward CRC using the initial CRC (8-bit), +data (8-bit) and the polynomial (8-bit). +@var{crc} is the initial CRC, @var{data} is the data and +@var{poly} is the polynomial without leading 1. +Table-based or clmul-based CRC may be used for the +calculation, depending on the target architecture. +@enddefbuiltin -GCC supports MIPS DSP operations using both the generic -vector extensions (@pxref{Vector Extensions}) and a collection of -MIPS-specific built-in functions. Both kinds of support are -enabled by the @option{-mdsp} command-line option. +@defbuiltin{uint16_t __builtin_crc16_data16 (uint16_t @var{crc}, uint16_t @var{data}, uint16_t @var{poly})} +Similar to @code{__builtin_crc8_data8}, except the argument and return types +are 16-bit. +@enddefbuiltin -Revision 2 of the ASE was introduced in the second half of 2006. -This revision adds extra instructions to the original ASE, but is -otherwise backwards-compatible with it. You can select revision 2 -using the command-line option @option{-mdspr2}; this option implies -@option{-mdsp}. +@defbuiltin{uint16_t __builtin_crc16_data8 (uint16_t @var{crc}, uint8_t @var{data}, uint16_t @var{poly})} +Similar to @code{__builtin_crc16_data16}, except the @var{data} argument type +is 8-bit. +@enddefbuiltin -The SCOUNT and POS bits of the DSP control register are global. The -WRDSP, EXTPDP, EXTPDPV and MTHLIP instructions modify the SCOUNT and -POS bits. During optimization, the compiler does not delete these -instructions and it does not delete calls to functions containing -these instructions. +@defbuiltin{uint32_t __builtin_crc32_data32 (uint32_t @var{crc}, uint32_t @var{data}, uint32_t @var{poly})} +Similar to @code{__builtin_crc8_data8}, except the argument and return types +are 32-bit. +@enddefbuiltin -At present, GCC only provides support for operations on 32-bit -vectors. The vector type associated with 8-bit integer data is -usually called @code{v4i8}, the vector type associated with Q7 -is usually called @code{v4q7}, the vector type associated with 16-bit -integer data is usually called @code{v2i16}, and the vector type -associated with Q15 is usually called @code{v2q15}. They can be -defined in C as follows: +@defbuiltin{uint32_t __builtin_crc32_data8 (uint32_t @var{crc}, uint8_t @var{data}, uint32_t @var{poly})} +Similar to @code{__builtin_crc32_data32}, except the @var{data} argument type +is 8-bit. 
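+
+For example, the bit-forward CRC built-ins can be chained to compute a
+checksum over a buffer. This is a minimal sketch using the common
+CRC-8 polynomial @code{0x07}, written, as described above, without its
+leading 1 bit:
+
+@smallexample
+#include <stdint.h>
+
+uint8_t
+crc8_buffer (const uint8_t *buf, unsigned long len)
+@{
+  uint8_t crc = 0;
+  while (len--)
+    crc = __builtin_crc8_data8 (crc, *buf++, 0x07);
+  return crc;
+@}
+@end smallexample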
+@enddefbuiltin -@smallexample -typedef signed char v4i8 __attribute__ ((vector_size(4))); -typedef signed char v4q7 __attribute__ ((vector_size(4))); -typedef short v2i16 __attribute__ ((vector_size(4))); -typedef short v2q15 __attribute__ ((vector_size(4))); -@end smallexample +@defbuiltin{uint32_t __builtin_crc32_data16 (uint32_t @var{crc}, uint16_t @var{data}, uint32_t @var{poly})} +Similar to @code{__builtin_crc32_data32}, except the @var{data} argument type +is 16-bit. +@enddefbuiltin -@code{v4i8}, @code{v4q7}, @code{v2i16} and @code{v2q15} values are -initialized in the same way as aggregates. For example: +@defbuiltin{uint64_t __builtin_crc64_data64 (uint64_t @var{crc}, uint64_t @var{data}, uint64_t @var{poly})} +Similar to @code{__builtin_crc8_data8}, except the argument and return types +are 64-bit. +@enddefbuiltin -@smallexample -v4i8 a = @{1, 2, 3, 4@}; -v4i8 b; -b = (v4i8) @{5, 6, 7, 8@}; +@defbuiltin{uint64_t __builtin_crc64_data8 (uint64_t @var{crc}, uint8_t @var{data}, uint64_t @var{poly})} +Similar to @code{__builtin_crc64_data64}, except the @var{data} argument type +is 8-bit. +@enddefbuiltin -v2q15 c = @{0x0fcb, 0x3a75@}; -v2q15 d; -d = (v2q15) @{0.1234 * 0x1.0p15, 0.4567 * 0x1.0p15@}; -@end smallexample +@defbuiltin{uint64_t __builtin_crc64_data16 (uint64_t @var{crc}, uint16_t @var{data}, uint64_t @var{poly})} +Similar to @code{__builtin_crc64_data64}, except the @var{data} argument type +is 16-bit. +@enddefbuiltin -@emph{Note:} The CPU's endianness determines the order in which values -are packed. On little-endian targets, the first value is the least -significant and the last value is the most significant. The opposite -order applies to big-endian targets. For example, the code above -sets the lowest byte of @code{a} to @code{1} on little-endian targets -and @code{4} on big-endian targets. +@defbuiltin{uint64_t __builtin_crc64_data32 (uint64_t @var{crc}, uint32_t @var{data}, uint64_t @var{poly})} +Similar to @code{__builtin_crc64_data64}, except the @var{data} argument type +is 32-bit. +@enddefbuiltin -@emph{Note:} Q7, Q15 and Q31 values must be initialized with their integer -representation. As shown in this example, the integer representation -of a Q7 value can be obtained by multiplying the fractional value by -@code{0x1.0p7}. The equivalent for Q15 values is to multiply by -@code{0x1.0p15}. The equivalent for Q31 values is to multiply by -@code{0x1.0p31}. +@node Target Builtins +@section Built-in Functions Specific to Particular Target Machines -The table below lists the @code{v4i8} and @code{v2q15} operations for which -hardware support exists. @code{a} and @code{b} are @code{v4i8} values, -and @code{c} and @code{d} are @code{v2q15} values. +On some target machines, GCC supports many built-in functions specific +to those machines. Generally these generate calls to specific machine +instructions, but allow the compiler to schedule those calls. 
-@multitable @columnfractions .50 .50 -@headitem C code @tab MIPS instruction -@item @code{a + b} @tab @code{addu.qb} -@item @code{c + d} @tab @code{addq.ph} -@item @code{a - b} @tab @code{subu.qb} -@item @code{c - d} @tab @code{subq.ph} -@end multitable +@menu +* AArch64 Built-in Functions:: +* Alpha Built-in Functions:: +* ARC Built-in Functions:: +* ARC SIMD Built-in Functions:: +* ARM iWMMXt Built-in Functions:: +* ARM C Language Extensions (ACLE):: +* ARM Floating Point Status and Control Intrinsics:: +* ARM ARMv8-M Security Extensions:: +* AVR Built-in Functions:: +* Blackfin Built-in Functions:: +* BPF Built-in Functions:: +* FR-V Built-in Functions:: +* LoongArch Base Built-in Functions:: +* LoongArch SX Vector Intrinsics:: +* LoongArch ASX Vector Intrinsics:: +* MIPS DSP Built-in Functions:: +* MIPS Paired-Single Support:: +* MIPS Loongson Built-in Functions:: +* MIPS SIMD Architecture (MSA) Support:: +* Other MIPS Built-in Functions:: +* MSP430 Built-in Functions:: +* NDS32 Built-in Functions:: +* Nvidia PTX Built-in Functions:: +* Basic PowerPC Built-in Functions:: +* PowerPC AltiVec/VSX Built-in Functions:: +* PowerPC Hardware Transactional Memory Built-in Functions:: +* PowerPC Atomic Memory Operation Functions:: +* PowerPC Matrix-Multiply Assist Built-in Functions:: +* PRU Built-in Functions:: +* RISC-V Built-in Functions:: +* RISC-V Vector Intrinsics:: +* CORE-V Built-in Functions:: +* RX Built-in Functions:: +* S/390 System z Built-in Functions:: +* SH Built-in Functions:: +* SPARC VIS Built-in Functions:: +* TI C6X Built-in Functions:: +* x86 Built-in Functions:: +* x86 transactional memory intrinsics:: +* x86 control-flow protection intrinsics:: +@end menu -The table below lists the @code{v2i16} operation for which -hardware support exists for the DSP ASE REV 2. @code{e} and @code{f} are -@code{v2i16} values. +@node AArch64 Built-in Functions +@subsection AArch64 Built-in Functions -@multitable @columnfractions .50 .50 -@headitem C code @tab MIPS instruction -@item @code{e * f} @tab @code{mul.ph} -@end multitable +These built-in functions are available for the AArch64 family of +processors. +@smallexample +unsigned int __builtin_aarch64_get_fpcr (); +void __builtin_aarch64_set_fpcr (unsigned int); +unsigned int __builtin_aarch64_get_fpsr (); +void __builtin_aarch64_set_fpsr (unsigned int); -It is easier to describe the DSP built-in functions if we first define -the following types: +unsigned long long __builtin_aarch64_get_fpcr64 (); +void __builtin_aarch64_set_fpcr64 (unsigned long long); +unsigned long long __builtin_aarch64_get_fpsr64 (); +void __builtin_aarch64_set_fpsr64 (unsigned long long); +@end smallexample + +@node Alpha Built-in Functions +@subsection Alpha Built-in Functions + +These built-in functions are available for the Alpha family of +processors, depending on the command-line switches used. + +The following built-in functions are always available. They +all generate the machine instruction that is part of the name. 
 @smallexample
-typedef int q31;
-typedef int i32;
-typedef unsigned int ui32;
-typedef long long a64;
+long __builtin_alpha_implver (void);
+long __builtin_alpha_rpcc (void);
+long __builtin_alpha_amask (long);
+long __builtin_alpha_cmpbge (long, long);
+long __builtin_alpha_extbl (long, long);
+long __builtin_alpha_extwl (long, long);
+long __builtin_alpha_extll (long, long);
+long __builtin_alpha_extql (long, long);
+long __builtin_alpha_extwh (long, long);
+long __builtin_alpha_extlh (long, long);
+long __builtin_alpha_extqh (long, long);
+long __builtin_alpha_insbl (long, long);
+long __builtin_alpha_inswl (long, long);
+long __builtin_alpha_insll (long, long);
+long __builtin_alpha_insql (long, long);
+long __builtin_alpha_inswh (long, long);
+long __builtin_alpha_inslh (long, long);
+long __builtin_alpha_insqh (long, long);
+long __builtin_alpha_mskbl (long, long);
+long __builtin_alpha_mskwl (long, long);
+long __builtin_alpha_mskll (long, long);
+long __builtin_alpha_mskql (long, long);
+long __builtin_alpha_mskwh (long, long);
+long __builtin_alpha_msklh (long, long);
+long __builtin_alpha_mskqh (long, long);
+long __builtin_alpha_umulh (long, long);
+long __builtin_alpha_zap (long, long);
+long __builtin_alpha_zapnot (long, long);
+@end smallexample
+
+The following built-in functions are always available with @option{-mmax}
+or @option{-mcpu=@var{cpu}} where @var{cpu} is @code{pca56} or
+later. They all generate the machine instruction that is part
+of the name.
+
+@smallexample
+long __builtin_alpha_pklb (long);
+long __builtin_alpha_pkwb (long);
+long __builtin_alpha_unpkbl (long);
+long __builtin_alpha_unpkbw (long);
+long __builtin_alpha_minub8 (long, long);
+long __builtin_alpha_minsb8 (long, long);
+long __builtin_alpha_minuw4 (long, long);
+long __builtin_alpha_minsw4 (long, long);
+long __builtin_alpha_maxub8 (long, long);
+long __builtin_alpha_maxsb8 (long, long);
+long __builtin_alpha_maxuw4 (long, long);
+long __builtin_alpha_maxsw4 (long, long);
+long __builtin_alpha_perr (long, long);
 @end smallexample
-@code{q31} and @code{i32} are actually the same as @code{int}, but we
-use @code{q31} to indicate a Q31 fractional value and @code{i32} to
-indicate a 32-bit integer value. Similarly, @code{a64} is the same as
-@code{long long}, but we use @code{a64} to indicate values that are
-placed in one of the four DSP accumulators (@code{$ac0},
-@code{$ac1}, @code{$ac2} or @code{$ac3}).
-
-Also, some built-in functions prefer or require immediate numbers as
-parameters, because the corresponding DSP instructions accept both immediate
-numbers and register operands, or accept immediate numbers only. The
-immediate parameters are listed as follows.
-
-@smallexample
-imm0_3: 0 to 3.
-imm0_7: 0 to 7.
-imm0_15: 0 to 15.
-imm0_31: 0 to 31.
-imm0_63: 0 to 63.
-imm0_255: 0 to 255.
-imm_n32_31: -32 to 31.
-imm_n512_511: -512 to 511.
-@end smallexample
+
+The following built-in functions are always available with @option{-mcix}
+or @option{-mcpu=@var{cpu}} where @var{cpu} is @code{ev67} or
+later. They all generate the machine instruction that is part
+of the name.
+
+@smallexample
+long __builtin_alpha_cttz (long);
+long __builtin_alpha_ctlz (long);
+long __builtin_alpha_ctpop (long);
+@end smallexample
+
+The following built-in functions are available on systems that use the OSF/1
+PALcode.
Normally they invoke the @code{rduniq} and @code{wruniq} +PAL calls, but when invoked with @option{-mtls-kernel}, they invoke +@code{rdval} and @code{wrval}. @smallexample -v2q15 __builtin_mips_addq_ph (v2q15, v2q15); -v2q15 __builtin_mips_addq_s_ph (v2q15, v2q15); -q31 __builtin_mips_addq_s_w (q31, q31); -v4i8 __builtin_mips_addu_qb (v4i8, v4i8); -v4i8 __builtin_mips_addu_s_qb (v4i8, v4i8); -v2q15 __builtin_mips_subq_ph (v2q15, v2q15); -v2q15 __builtin_mips_subq_s_ph (v2q15, v2q15); -q31 __builtin_mips_subq_s_w (q31, q31); -v4i8 __builtin_mips_subu_qb (v4i8, v4i8); -v4i8 __builtin_mips_subu_s_qb (v4i8, v4i8); -i32 __builtin_mips_addsc (i32, i32); -i32 __builtin_mips_addwc (i32, i32); -i32 __builtin_mips_modsub (i32, i32); -i32 __builtin_mips_raddu_w_qb (v4i8); -v2q15 __builtin_mips_absq_s_ph (v2q15); -q31 __builtin_mips_absq_s_w (q31); -v4i8 __builtin_mips_precrq_qb_ph (v2q15, v2q15); -v2q15 __builtin_mips_precrq_ph_w (q31, q31); -v2q15 __builtin_mips_precrq_rs_ph_w (q31, q31); -v4i8 __builtin_mips_precrqu_s_qb_ph (v2q15, v2q15); -q31 __builtin_mips_preceq_w_phl (v2q15); -q31 __builtin_mips_preceq_w_phr (v2q15); -v2q15 __builtin_mips_precequ_ph_qbl (v4i8); -v2q15 __builtin_mips_precequ_ph_qbr (v4i8); -v2q15 __builtin_mips_precequ_ph_qbla (v4i8); -v2q15 __builtin_mips_precequ_ph_qbra (v4i8); -v2q15 __builtin_mips_preceu_ph_qbl (v4i8); -v2q15 __builtin_mips_preceu_ph_qbr (v4i8); -v2q15 __builtin_mips_preceu_ph_qbla (v4i8); -v2q15 __builtin_mips_preceu_ph_qbra (v4i8); -v4i8 __builtin_mips_shll_qb (v4i8, imm0_7); -v4i8 __builtin_mips_shll_qb (v4i8, i32); -v2q15 __builtin_mips_shll_ph (v2q15, imm0_15); -v2q15 __builtin_mips_shll_ph (v2q15, i32); -v2q15 __builtin_mips_shll_s_ph (v2q15, imm0_15); -v2q15 __builtin_mips_shll_s_ph (v2q15, i32); -q31 __builtin_mips_shll_s_w (q31, imm0_31); -q31 __builtin_mips_shll_s_w (q31, i32); -v4i8 __builtin_mips_shrl_qb (v4i8, imm0_7); -v4i8 __builtin_mips_shrl_qb (v4i8, i32); -v2q15 __builtin_mips_shra_ph (v2q15, imm0_15); -v2q15 __builtin_mips_shra_ph (v2q15, i32); -v2q15 __builtin_mips_shra_r_ph (v2q15, imm0_15); -v2q15 __builtin_mips_shra_r_ph (v2q15, i32); -q31 __builtin_mips_shra_r_w (q31, imm0_31); -q31 __builtin_mips_shra_r_w (q31, i32); -v2q15 __builtin_mips_muleu_s_ph_qbl (v4i8, v2q15); -v2q15 __builtin_mips_muleu_s_ph_qbr (v4i8, v2q15); -v2q15 __builtin_mips_mulq_rs_ph (v2q15, v2q15); -q31 __builtin_mips_muleq_s_w_phl (v2q15, v2q15); -q31 __builtin_mips_muleq_s_w_phr (v2q15, v2q15); -a64 __builtin_mips_dpau_h_qbl (a64, v4i8, v4i8); -a64 __builtin_mips_dpau_h_qbr (a64, v4i8, v4i8); -a64 __builtin_mips_dpsu_h_qbl (a64, v4i8, v4i8); -a64 __builtin_mips_dpsu_h_qbr (a64, v4i8, v4i8); -a64 __builtin_mips_dpaq_s_w_ph (a64, v2q15, v2q15); -a64 __builtin_mips_dpaq_sa_l_w (a64, q31, q31); -a64 __builtin_mips_dpsq_s_w_ph (a64, v2q15, v2q15); -a64 __builtin_mips_dpsq_sa_l_w (a64, q31, q31); -a64 __builtin_mips_mulsaq_s_w_ph (a64, v2q15, v2q15); -a64 __builtin_mips_maq_s_w_phl (a64, v2q15, v2q15); -a64 __builtin_mips_maq_s_w_phr (a64, v2q15, v2q15); -a64 __builtin_mips_maq_sa_w_phl (a64, v2q15, v2q15); -a64 __builtin_mips_maq_sa_w_phr (a64, v2q15, v2q15); -i32 __builtin_mips_bitrev (i32); -i32 __builtin_mips_insv (i32, i32); -v4i8 __builtin_mips_repl_qb (imm0_255); -v4i8 __builtin_mips_repl_qb (i32); -v2q15 __builtin_mips_repl_ph (imm_n512_511); -v2q15 __builtin_mips_repl_ph (i32); -void __builtin_mips_cmpu_eq_qb (v4i8, v4i8); -void __builtin_mips_cmpu_lt_qb (v4i8, v4i8); -void __builtin_mips_cmpu_le_qb (v4i8, v4i8); -i32 __builtin_mips_cmpgu_eq_qb (v4i8, v4i8); 
-i32 __builtin_mips_cmpgu_lt_qb (v4i8, v4i8); -i32 __builtin_mips_cmpgu_le_qb (v4i8, v4i8); -void __builtin_mips_cmp_eq_ph (v2q15, v2q15); -void __builtin_mips_cmp_lt_ph (v2q15, v2q15); -void __builtin_mips_cmp_le_ph (v2q15, v2q15); -v4i8 __builtin_mips_pick_qb (v4i8, v4i8); -v2q15 __builtin_mips_pick_ph (v2q15, v2q15); -v2q15 __builtin_mips_packrl_ph (v2q15, v2q15); -i32 __builtin_mips_extr_w (a64, imm0_31); -i32 __builtin_mips_extr_w (a64, i32); -i32 __builtin_mips_extr_r_w (a64, imm0_31); -i32 __builtin_mips_extr_s_h (a64, i32); -i32 __builtin_mips_extr_rs_w (a64, imm0_31); -i32 __builtin_mips_extr_rs_w (a64, i32); -i32 __builtin_mips_extr_s_h (a64, imm0_31); -i32 __builtin_mips_extr_r_w (a64, i32); -i32 __builtin_mips_extp (a64, imm0_31); -i32 __builtin_mips_extp (a64, i32); -i32 __builtin_mips_extpdp (a64, imm0_31); -i32 __builtin_mips_extpdp (a64, i32); -a64 __builtin_mips_shilo (a64, imm_n32_31); -a64 __builtin_mips_shilo (a64, i32); -a64 __builtin_mips_mthlip (a64, i32); -void __builtin_mips_wrdsp (i32, imm0_63); -i32 __builtin_mips_rddsp (imm0_63); -i32 __builtin_mips_lbux (void *, i32); -i32 __builtin_mips_lhx (void *, i32); -i32 __builtin_mips_lwx (void *, i32); -a64 __builtin_mips_ldx (void *, i32); /* MIPS64 only */ -i32 __builtin_mips_bposge32 (void); -a64 __builtin_mips_madd (a64, i32, i32); -a64 __builtin_mips_maddu (a64, ui32, ui32); -a64 __builtin_mips_msub (a64, i32, i32); -a64 __builtin_mips_msubu (a64, ui32, ui32); -a64 __builtin_mips_mult (i32, i32); -a64 __builtin_mips_multu (ui32, ui32); +void *__builtin_thread_pointer (void); +void __builtin_set_thread_pointer (void *); @end smallexample -The following built-in functions map directly to a particular MIPS DSP REV 2 -instruction. Please refer to the architecture specification -for details on what each instruction does. +@node ARC Built-in Functions +@subsection ARC Built-in Functions + +The following built-in functions are provided for ARC targets. The +built-ins generate the corresponding assembly instructions. In the +examples given below, the generated code often requires an operand or +result to be in a register. Where necessary further code will be +generated to ensure this is true, but for brevity this is not +described in each case. + +@emph{Note:} Using a built-in to generate an instruction not supported +by a target may cause problems. At present the compiler is not +guaranteed to detect such misuse, and as a result an internal compiler +error may be generated. +@defbuiltin{int __builtin_arc_aligned (void *@var{val}, int @var{alignval})} +Return 1 if @var{val} is known to have the byte alignment given +by @var{alignval}, otherwise return 0. 
+Note that this is different from
 @smallexample
-v4q7 __builtin_mips_absq_s_qb (v4q7);
-v2i16 __builtin_mips_addu_ph (v2i16, v2i16);
-v2i16 __builtin_mips_addu_s_ph (v2i16, v2i16);
-v4i8 __builtin_mips_adduh_qb (v4i8, v4i8);
-v4i8 __builtin_mips_adduh_r_qb (v4i8, v4i8);
-i32 __builtin_mips_append (i32, i32, imm0_31);
-i32 __builtin_mips_balign (i32, i32, imm0_3);
-i32 __builtin_mips_cmpgdu_eq_qb (v4i8, v4i8);
-i32 __builtin_mips_cmpgdu_lt_qb (v4i8, v4i8);
-i32 __builtin_mips_cmpgdu_le_qb (v4i8, v4i8);
-a64 __builtin_mips_dpa_w_ph (a64, v2i16, v2i16);
-a64 __builtin_mips_dps_w_ph (a64, v2i16, v2i16);
-v2i16 __builtin_mips_mul_ph (v2i16, v2i16);
-v2i16 __builtin_mips_mul_s_ph (v2i16, v2i16);
-q31 __builtin_mips_mulq_rs_w (q31, q31);
-v2q15 __builtin_mips_mulq_s_ph (v2q15, v2q15);
-q31 __builtin_mips_mulq_s_w (q31, q31);
-a64 __builtin_mips_mulsa_w_ph (a64, v2i16, v2i16);
-v4i8 __builtin_mips_precr_qb_ph (v2i16, v2i16);
-v2i16 __builtin_mips_precr_sra_ph_w (i32, i32, imm0_31);
-v2i16 __builtin_mips_precr_sra_r_ph_w (i32, i32, imm0_31);
-i32 __builtin_mips_prepend (i32, i32, imm0_31);
-v4i8 __builtin_mips_shra_qb (v4i8, imm0_7);
-v4i8 __builtin_mips_shra_r_qb (v4i8, imm0_7);
-v4i8 __builtin_mips_shra_qb (v4i8, i32);
-v4i8 __builtin_mips_shra_r_qb (v4i8, i32);
-v2i16 __builtin_mips_shrl_ph (v2i16, imm0_15);
-v2i16 __builtin_mips_shrl_ph (v2i16, i32);
-v2i16 __builtin_mips_subu_ph (v2i16, v2i16);
-v2i16 __builtin_mips_subu_s_ph (v2i16, v2i16);
-v4i8 __builtin_mips_subuh_qb (v4i8, v4i8);
-v4i8 __builtin_mips_subuh_r_qb (v4i8, v4i8);
-v2q15 __builtin_mips_addqh_ph (v2q15, v2q15);
-v2q15 __builtin_mips_addqh_r_ph (v2q15, v2q15);
-q31 __builtin_mips_addqh_w (q31, q31);
-q31 __builtin_mips_addqh_r_w (q31, q31);
-v2q15 __builtin_mips_subqh_ph (v2q15, v2q15);
-v2q15 __builtin_mips_subqh_r_ph (v2q15, v2q15);
-q31 __builtin_mips_subqh_w (q31, q31);
-q31 __builtin_mips_subqh_r_w (q31, q31);
-a64 __builtin_mips_dpax_w_ph (a64, v2i16, v2i16);
-a64 __builtin_mips_dpsx_w_ph (a64, v2i16, v2i16);
-a64 __builtin_mips_dpaqx_s_w_ph (a64, v2q15, v2q15);
-a64 __builtin_mips_dpaqx_sa_w_ph (a64, v2q15, v2q15);
-a64 __builtin_mips_dpsqx_s_w_ph (a64, v2q15, v2q15);
-a64 __builtin_mips_dpsqx_sa_w_ph (a64, v2q15, v2q15);
+__alignof__(*(char *)@var{val}) >= @var{alignval}
 @end smallexample
+because @code{__alignof__} sees only the type of the dereference, whereas
+@code{__builtin_arc_aligned} uses alignment information from the pointer
+as well as from the pointed-to type.
+The information available will depend on the optimization level.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_arc_brk (void)}
+Generates
+@example
+brk
+@end example
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_arc_core_read (unsigned int @var{regno})}
+The operand is the number of a register to be read. Generates:
+@example
+mov @var{dest}, r@var{regno}
+@end example
+where the value in @var{dest} will be the result returned from the
+built-in.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_arc_core_write (unsigned int @var{regno}, unsigned int @var{val})}
+The first operand is the number of a register to be written, the
+second operand is a compile time constant to write into that
+register. Generates:
+@example
+mov r@var{regno}, @var{val}
+@end example
+@enddefbuiltin
+
+@defbuiltin{int __builtin_arc_divaw (int @var{a}, int @var{b})}
+Only available if either @option{-mcpu=ARC700} or @option{-meA} is set.
+Generates:
+@example
+divaw @var{dest}, @var{a}, @var{b}
+@end example
+where the value in @var{dest} will be the result returned from the
+built-in.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_arc_flag (unsigned int @var{a})}
+Generates
+@example
+flag @var{a}
+@end example
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_arc_lr (unsigned int @var{auxr})}
+The operand, @var{auxr}, is the address of an auxiliary register and
+must be a compile time constant. Generates:
+@example
+lr @var{dest}, [@var{auxr}]
+@end example
+Where the value in @var{dest} will be the result returned from the
+built-in.
+@enddefbuiltin
-
-@node MIPS Paired-Single Support
-@subsection MIPS Paired-Single Support
+
+@defbuiltin{void __builtin_arc_mul64 (int @var{a}, int @var{b})}
+Only available with @option{-mmul64}. Generates:
+@example
+mul64 @var{a}, @var{b}
+@end example
+@enddefbuiltin
-The MIPS64 architecture includes a number of instructions that
-operate on pairs of single-precision floating-point values.
-Each pair is packed into a 64-bit floating-point register,
-with one element being designated the ``upper half'' and
-the other being designated the ``lower half''.
+
+@defbuiltin{void __builtin_arc_mulu64 (unsigned int @var{a}, unsigned int @var{b})}
+Only available with @option{-mmul64}. Generates:
+@example
+mulu64 @var{a}, @var{b}
+@end example
+@enddefbuiltin
-GCC supports paired-single operations using both the generic
-vector extensions (@pxref{Vector Extensions}) and a collection of
-MIPS-specific built-in functions. Both kinds of support are
-enabled by the @option{-mpaired-single} command-line option.
+
+@defbuiltin{void __builtin_arc_nop (void)}
+Generates:
+@example
+nop
+@end example
+@enddefbuiltin
-The vector type associated with paired-single values is usually
-called @code{v2sf}. It can be defined in C as follows:
+
+@defbuiltin{int __builtin_arc_norm (int @var{src})}
+Only valid if the @samp{norm} instruction is available through the
+@option{-mnorm} option or by default with @option{-mcpu=ARC700}.
+Generates:
+@example
+norm @var{dest}, @var{src}
+@end example
+Where the value in @var{dest} will be the result returned from the
+built-in.
+@enddefbuiltin
-@smallexample
-typedef float v2sf __attribute__ ((vector_size (8)));
-@end smallexample
+
+@defbuiltin{{short int} __builtin_arc_normw (short int @var{src})}
+Only valid if the @samp{normw} instruction is available through the
+@option{-mnorm} option or by default with @option{-mcpu=ARC700}.
+Generates:
+@example
+normw @var{dest}, @var{src}
+@end example
+Where the value in @var{dest} will be the result returned from the
+built-in.
+@enddefbuiltin
-@code{v2sf} values are initialized in the same way as aggregates.
-For example:
+
+@defbuiltin{void __builtin_arc_rtie (void)}
+Generates:
+@example
+rtie
+@end example
+@enddefbuiltin
-@smallexample
-v2sf a = @{1.5, 9.1@};
-v2sf b;
-float e, f;
-b = (v2sf) @{e, f@};
-@end smallexample
+
+@defbuiltin{void __builtin_arc_sleep (int @var{a})}
+Generates:
+@example
+sleep @var{a}
+@end example
+@enddefbuiltin
-@emph{Note:} The CPU's endianness determines which value is stored in
-the upper half of a register and which value is stored in the lower half.
-On little-endian targets, the first value is the lower one and the second
-value is the upper one. The opposite order applies to big-endian targets.
-For example, the code above sets the lower half of @code{a} to
-@code{1.5} on little-endian targets and @code{9.1} on big-endian targets.
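+
+As a brief illustration of these built-ins, @code{__builtin_arc_lr} can
+be used to read an auxiliary register directly. A minimal sketch; it
+assumes, purely for illustration, that auxiliary register @code{0x0a}
+is the @code{STATUS32} register:
+
+@smallexample
+unsigned int status = __builtin_arc_lr (0x0a);
+@end smallexample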
+@defbuiltin{void __builtin_arc_sr (unsigned int @var{val}, unsigned int @var{auxr})} +The first argument, @var{val}, is a compile time constant to be +written to the register, the second argument, @var{auxr}, is the +address of an auxiliary register. Generates: +@example +sr @var{val}, [@var{auxr}] +@end example +@enddefbuiltin -@node MIPS Loongson Built-in Functions -@subsection MIPS Loongson Built-in Functions +@defbuiltin{int __builtin_arc_swap (int @var{src})} +Only valid with @option{-mswap}. Generates: +@example +swap @var{dest}, @var{src} +@end example +Where the value in @var{dest} will be the result returned from the +built-in. +@enddefbuiltin -GCC provides intrinsics to access the SIMD instructions provided by the -ST Microelectronics Loongson-2E and -2F processors. These intrinsics, -available after inclusion of the @code{loongson.h} header file, -operate on the following 64-bit vector types: +@defbuiltin{void __builtin_arc_swi (void)} +Generates: +@example +swi +@end example +@enddefbuiltin -@itemize -@item @code{uint8x8_t}, a vector of eight unsigned 8-bit integers; -@item @code{uint16x4_t}, a vector of four unsigned 16-bit integers; -@item @code{uint32x2_t}, a vector of two unsigned 32-bit integers; -@item @code{int8x8_t}, a vector of eight signed 8-bit integers; -@item @code{int16x4_t}, a vector of four signed 16-bit integers; -@item @code{int32x2_t}, a vector of two signed 32-bit integers. -@end itemize +@defbuiltin{void __builtin_arc_sync (void)} +Only available with @option{-mcpu=ARC700}. Generates: +@example +sync +@end example +@enddefbuiltin -The intrinsics provided are listed below; each is named after the -machine instruction to which it corresponds, with suffixes added as -appropriate to distinguish intrinsics that expand to the same machine -instruction yet have different argument types. Refer to the architecture -documentation for a description of the functionality of each -instruction. +@defbuiltin{void __builtin_arc_trap_s (unsigned int @var{c})} +Only available with @option{-mcpu=ARC700}. 
Generates: +@example +trap_s @var{c} +@end example +@enddefbuiltin -@smallexample -int16x4_t packsswh (int32x2_t s, int32x2_t t); -int8x8_t packsshb (int16x4_t s, int16x4_t t); -uint8x8_t packushb (uint16x4_t s, uint16x4_t t); -uint32x2_t paddw_u (uint32x2_t s, uint32x2_t t); -uint16x4_t paddh_u (uint16x4_t s, uint16x4_t t); -uint8x8_t paddb_u (uint8x8_t s, uint8x8_t t); -int32x2_t paddw_s (int32x2_t s, int32x2_t t); -int16x4_t paddh_s (int16x4_t s, int16x4_t t); -int8x8_t paddb_s (int8x8_t s, int8x8_t t); -uint64_t paddd_u (uint64_t s, uint64_t t); -int64_t paddd_s (int64_t s, int64_t t); -int16x4_t paddsh (int16x4_t s, int16x4_t t); -int8x8_t paddsb (int8x8_t s, int8x8_t t); -uint16x4_t paddush (uint16x4_t s, uint16x4_t t); -uint8x8_t paddusb (uint8x8_t s, uint8x8_t t); -uint64_t pandn_ud (uint64_t s, uint64_t t); -uint32x2_t pandn_uw (uint32x2_t s, uint32x2_t t); -uint16x4_t pandn_uh (uint16x4_t s, uint16x4_t t); -uint8x8_t pandn_ub (uint8x8_t s, uint8x8_t t); -int64_t pandn_sd (int64_t s, int64_t t); -int32x2_t pandn_sw (int32x2_t s, int32x2_t t); -int16x4_t pandn_sh (int16x4_t s, int16x4_t t); -int8x8_t pandn_sb (int8x8_t s, int8x8_t t); -uint16x4_t pavgh (uint16x4_t s, uint16x4_t t); -uint8x8_t pavgb (uint8x8_t s, uint8x8_t t); -uint32x2_t pcmpeqw_u (uint32x2_t s, uint32x2_t t); -uint16x4_t pcmpeqh_u (uint16x4_t s, uint16x4_t t); -uint8x8_t pcmpeqb_u (uint8x8_t s, uint8x8_t t); -int32x2_t pcmpeqw_s (int32x2_t s, int32x2_t t); -int16x4_t pcmpeqh_s (int16x4_t s, int16x4_t t); -int8x8_t pcmpeqb_s (int8x8_t s, int8x8_t t); -uint32x2_t pcmpgtw_u (uint32x2_t s, uint32x2_t t); -uint16x4_t pcmpgth_u (uint16x4_t s, uint16x4_t t); -uint8x8_t pcmpgtb_u (uint8x8_t s, uint8x8_t t); -int32x2_t pcmpgtw_s (int32x2_t s, int32x2_t t); -int16x4_t pcmpgth_s (int16x4_t s, int16x4_t t); -int8x8_t pcmpgtb_s (int8x8_t s, int8x8_t t); -uint16x4_t pextrh_u (uint16x4_t s, int field); -int16x4_t pextrh_s (int16x4_t s, int field); -uint16x4_t pinsrh_0_u (uint16x4_t s, uint16x4_t t); -uint16x4_t pinsrh_1_u (uint16x4_t s, uint16x4_t t); -uint16x4_t pinsrh_2_u (uint16x4_t s, uint16x4_t t); -uint16x4_t pinsrh_3_u (uint16x4_t s, uint16x4_t t); -int16x4_t pinsrh_0_s (int16x4_t s, int16x4_t t); -int16x4_t pinsrh_1_s (int16x4_t s, int16x4_t t); -int16x4_t pinsrh_2_s (int16x4_t s, int16x4_t t); -int16x4_t pinsrh_3_s (int16x4_t s, int16x4_t t); -int32x2_t pmaddhw (int16x4_t s, int16x4_t t); -int16x4_t pmaxsh (int16x4_t s, int16x4_t t); -uint8x8_t pmaxub (uint8x8_t s, uint8x8_t t); -int16x4_t pminsh (int16x4_t s, int16x4_t t); -uint8x8_t pminub (uint8x8_t s, uint8x8_t t); -uint8x8_t pmovmskb_u (uint8x8_t s); -int8x8_t pmovmskb_s (int8x8_t s); -uint16x4_t pmulhuh (uint16x4_t s, uint16x4_t t); -int16x4_t pmulhh (int16x4_t s, int16x4_t t); -int16x4_t pmullh (int16x4_t s, int16x4_t t); -int64_t pmuluw (uint32x2_t s, uint32x2_t t); -uint8x8_t pasubub (uint8x8_t s, uint8x8_t t); -uint16x4_t biadd (uint8x8_t s); -uint16x4_t psadbh (uint8x8_t s, uint8x8_t t); -uint16x4_t pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order); -int16x4_t pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order); -uint16x4_t psllh_u (uint16x4_t s, uint8_t amount); -int16x4_t psllh_s (int16x4_t s, uint8_t amount); -uint32x2_t psllw_u (uint32x2_t s, uint8_t amount); -int32x2_t psllw_s (int32x2_t s, uint8_t amount); -uint16x4_t psrlh_u (uint16x4_t s, uint8_t amount); -int16x4_t psrlh_s (int16x4_t s, uint8_t amount); -uint32x2_t psrlw_u (uint32x2_t s, uint8_t amount); -int32x2_t psrlw_s (int32x2_t s, uint8_t amount); -uint16x4_t psrah_u (uint16x4_t s, 
uint8_t amount);
-int16x4_t psrah_s (int16x4_t s, uint8_t amount);
-uint32x2_t psraw_u (uint32x2_t s, uint8_t amount);
-int32x2_t psraw_s (int32x2_t s, uint8_t amount);
-uint32x2_t psubw_u (uint32x2_t s, uint32x2_t t);
-uint16x4_t psubh_u (uint16x4_t s, uint16x4_t t);
-uint8x8_t psubb_u (uint8x8_t s, uint8x8_t t);
-int32x2_t psubw_s (int32x2_t s, int32x2_t t);
-int16x4_t psubh_s (int16x4_t s, int16x4_t t);
-int8x8_t psubb_s (int8x8_t s, int8x8_t t);
-uint64_t psubd_u (uint64_t s, uint64_t t);
-int64_t psubd_s (int64_t s, int64_t t);
-int16x4_t psubsh (int16x4_t s, int16x4_t t);
-int8x8_t psubsb (int8x8_t s, int8x8_t t);
-uint16x4_t psubush (uint16x4_t s, uint16x4_t t);
-uint8x8_t psubusb (uint8x8_t s, uint8x8_t t);
-uint32x2_t punpckhwd_u (uint32x2_t s, uint32x2_t t);
-uint16x4_t punpckhhw_u (uint16x4_t s, uint16x4_t t);
-uint8x8_t punpckhbh_u (uint8x8_t s, uint8x8_t t);
-int32x2_t punpckhwd_s (int32x2_t s, int32x2_t t);
-int16x4_t punpckhhw_s (int16x4_t s, int16x4_t t);
-int8x8_t punpckhbh_s (int8x8_t s, int8x8_t t);
-uint32x2_t punpcklwd_u (uint32x2_t s, uint32x2_t t);
-uint16x4_t punpcklhw_u (uint16x4_t s, uint16x4_t t);
-uint8x8_t punpcklbh_u (uint8x8_t s, uint8x8_t t);
-int32x2_t punpcklwd_s (int32x2_t s, int32x2_t t);
-int16x4_t punpcklhw_s (int16x4_t s, int16x4_t t);
-int8x8_t punpcklbh_s (int8x8_t s, int8x8_t t);
-@end smallexample

+@defbuiltin{void __builtin_arc_unimp_s (void)}
+Only available with @option{-mcpu=ARC700}.  Generates:
+@example
+unimp_s
+@end example
+@enddefbuiltin

-@menu
-* Paired-Single Arithmetic::
-* Paired-Single Built-in Functions::
-* MIPS-3D Built-in Functions::
-@end menu

+The instructions generated by the following builtins are not
+considered candidates for scheduling.  They are not moved around by
+the compiler during scheduling, and thus can be expected to appear
+where they are placed in the C code:
+@example
+__builtin_arc_brk()
+__builtin_arc_core_read()
+__builtin_arc_core_write()
+__builtin_arc_flag()
+__builtin_arc_lr()
+__builtin_arc_sleep()
+__builtin_arc_sr()
+__builtin_arc_swi()
+@end example

-@node Paired-Single Arithmetic
-@subsubsection Paired-Single Arithmetic

+The following built-in functions are available for the ARCv2 family of
+processors.

-The table below lists the @code{v2sf} operations for which hardware
-support exists.  @code{a}, @code{b} and @code{c} are @code{v2sf}
-values and @code{x} is an integral value.

+@example
+int __builtin_arc_clri ();
+void __builtin_arc_kflag (unsigned);
+void __builtin_arc_seti (int);
+@end example

-@multitable @columnfractions .50 .50
-@headitem C code @tab MIPS instruction
-@item @code{a + b} @tab @code{add.ps}
-@item @code{a - b} @tab @code{sub.ps}
-@item @code{-a} @tab @code{neg.ps}
-@item @code{a * b} @tab @code{mul.ps}
-@item @code{a * b + c} @tab @code{madd.ps}
-@item @code{a * b - c} @tab @code{msub.ps}
-@item @code{-(a * b + c)} @tab @code{nmadd.ps}
-@item @code{-(a * b - c)} @tab @code{nmsub.ps}
-@item @code{x ? a : b} @tab @code{movn.ps}/@code{movz.ps}
-@end multitable

-Note that the multiply-accumulate instructions can be disabled
-using the command-line option @code{-mno-fused-madd}.
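+As an illustration of the built-ins just listed, the following sketch
+shows one plausible save/restore pattern.  It assumes the usual ARCv2
+@code{clri}/@code{seti} semantics, namely that @code{__builtin_arc_clri}
+disables interrupts and returns the prior interrupt state, and that
+@code{__builtin_arc_seti} restores a state previously returned by
+@code{__builtin_arc_clri}; check the architecture documentation before
+relying on this pattern:
+
+@example
+int status = __builtin_arc_clri ();  /* Disable interrupts, save state.  */
+/* ... short critical section ...  */
+__builtin_arc_seti (status);         /* Restore the saved state.  */
+@end example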
+The following built-in functions are available for the ARCv2 family of
+processors when @option{-mnorm} is in effect.
+@example
+int __builtin_arc_ffs (int);
+int __builtin_arc_fls (int);
+@end example

-@node Paired-Single Built-in Functions
-@subsubsection Paired-Single Built-in Functions

+@node ARC SIMD Built-in Functions
+@subsection ARC SIMD Built-in Functions

-The following paired-single functions map directly to a particular
-MIPS instruction.  Please refer to the architecture specification
-for details on what each instruction does.

+The compiler provides SIMD built-ins that can be used to generate
+vector instructions.  This section describes the available built-ins
+and their use in programs.  With the @option{-msimd} option, the
+compiler provides 128-bit vector types, which can be specified using
+the @code{vector_size} attribute.  The header file @file{arc-simd.h}
+can be included to use the following predefined types:
+@example
+typedef int __v4si __attribute__((vector_size(16)));
+typedef short __v8hi __attribute__((vector_size(16)));
+@end example

-@table @code
-@item v2sf __builtin_mips_pll_ps (v2sf, v2sf)
-Pair lower lower (@code{pll.ps}).

+These types can be used to define 128-bit variables.  The built-in
+functions listed in the following section can be used on these
+variables to generate vector operations.

-@item v2sf __builtin_mips_pul_ps (v2sf, v2sf)
-Pair upper lower (@code{pul.ps}).

+For each built-in function @code{__builtin_arc_@var{someinsn}}, the
+header file @file{arc-simd.h} also provides an equivalent macro
+@code{_@var{someinsn}} that can be used for ease of programming and
+improved readability.  The following macros for DMA control are also
+provided:
+@example
+#define _setup_dma_in_channel_reg _vdiwr
+#define _setup_dma_out_channel_reg _vdowr
+@end example

-@item v2sf __builtin_mips_plu_ps (v2sf, v2sf)
-Pair lower upper (@code{plu.ps}).

+The following is a complete list of all the SIMD built-ins provided
+for ARC, grouped by calling signature.  A brief usage sketch is shown
+first.

-@item v2sf __builtin_mips_puu_ps (v2sf, v2sf)
-Pair upper upper (@code{puu.ps}).
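+As a purely illustrative fragment (assuming only the @code{__v8hi}
+type shown above, that @file{arc-simd.h} is included as
+@code{<arc-simd.h>}, and that @code{__builtin_arc_vaddw} performs an
+element-wise add, as its name suggests):
+
+@example
+#include <arc-simd.h>
+
+__v8hi
+double_elements (__v8hi v)
+@{
+  /* Element-wise add of v to itself, doubling each element.  */
+  return __builtin_arc_vaddw (v, v);
+@}
+@end example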
+The following take two @code{__v8hi} arguments and return a +@code{__v8hi} result: +@example +__v8hi __builtin_arc_vaddaw (__v8hi, __v8hi); +__v8hi __builtin_arc_vaddw (__v8hi, __v8hi); +__v8hi __builtin_arc_vand (__v8hi, __v8hi); +__v8hi __builtin_arc_vandaw (__v8hi, __v8hi); +__v8hi __builtin_arc_vavb (__v8hi, __v8hi); +__v8hi __builtin_arc_vavrb (__v8hi, __v8hi); +__v8hi __builtin_arc_vbic (__v8hi, __v8hi); +__v8hi __builtin_arc_vbicaw (__v8hi, __v8hi); +__v8hi __builtin_arc_vdifaw (__v8hi, __v8hi); +__v8hi __builtin_arc_vdifw (__v8hi, __v8hi); +__v8hi __builtin_arc_veqw (__v8hi, __v8hi); +__v8hi __builtin_arc_vh264f (__v8hi, __v8hi); +__v8hi __builtin_arc_vh264ft (__v8hi, __v8hi); +__v8hi __builtin_arc_vh264fw (__v8hi, __v8hi); +__v8hi __builtin_arc_vlew (__v8hi, __v8hi); +__v8hi __builtin_arc_vltw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmaxaw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmaxw (__v8hi, __v8hi); +__v8hi __builtin_arc_vminaw (__v8hi, __v8hi); +__v8hi __builtin_arc_vminw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr1aw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr1w (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr2aw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr2w (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr3aw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr3w (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr4aw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr4w (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr5aw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr5w (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr6aw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr6w (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr7aw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr7w (__v8hi, __v8hi); +__v8hi __builtin_arc_vmrb (__v8hi, __v8hi); +__v8hi __builtin_arc_vmulaw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmulfaw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmulfw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmulw (__v8hi, __v8hi); +__v8hi __builtin_arc_vnew (__v8hi, __v8hi); +__v8hi __builtin_arc_vor (__v8hi, __v8hi); +__v8hi __builtin_arc_vsubaw (__v8hi, __v8hi); +__v8hi __builtin_arc_vsubw (__v8hi, __v8hi); +__v8hi __builtin_arc_vsummw (__v8hi, __v8hi); +__v8hi __builtin_arc_vvc1f (__v8hi, __v8hi); +__v8hi __builtin_arc_vvc1ft (__v8hi, __v8hi); +__v8hi __builtin_arc_vxor (__v8hi, __v8hi); +__v8hi __builtin_arc_vxoraw (__v8hi, __v8hi); +@end example -@item v2sf __builtin_mips_cvt_ps_s (float, float) -Convert pair to paired single (@code{cvt.ps.s}). +The following take one @code{__v8hi} and one @code{int} argument and return a +@code{__v8hi} result: -@item float __builtin_mips_cvt_s_pl (v2sf) -Convert pair lower to single (@code{cvt.s.pl}). +@example +__v8hi __builtin_arc_vbaddw (__v8hi, int); +__v8hi __builtin_arc_vbmaxw (__v8hi, int); +__v8hi __builtin_arc_vbminw (__v8hi, int); +__v8hi __builtin_arc_vbmulaw (__v8hi, int); +__v8hi __builtin_arc_vbmulfw (__v8hi, int); +__v8hi __builtin_arc_vbmulw (__v8hi, int); +__v8hi __builtin_arc_vbrsubw (__v8hi, int); +__v8hi __builtin_arc_vbsubw (__v8hi, int); +@end example -@item float __builtin_mips_cvt_s_pu (v2sf) -Convert pair upper to single (@code{cvt.s.pu}). +The following take one @code{__v8hi} argument and one @code{int} argument which +must be a 3-bit compile time constant indicating a register number +I0-I7. They return a @code{__v8hi} result. +@example +__v8hi __builtin_arc_vasrw (__v8hi, const int); +__v8hi __builtin_arc_vsr8 (__v8hi, const int); +__v8hi __builtin_arc_vsr8aw (__v8hi, const int); +@end example -@item v2sf __builtin_mips_abs_ps (v2sf) -Absolute value (@code{abs.ps}). 
+The following take one @code{__v8hi} argument and one @code{int}
+argument which must be a 6-bit compile time constant.  They return a
+@code{__v8hi} result.
+@example
+__v8hi __builtin_arc_vasrpwbi (__v8hi, const int);
+__v8hi __builtin_arc_vasrrpwbi (__v8hi, const int);
+__v8hi __builtin_arc_vasrrwi (__v8hi, const int);
+__v8hi __builtin_arc_vasrsrwi (__v8hi, const int);
+__v8hi __builtin_arc_vasrwi (__v8hi, const int);
+__v8hi __builtin_arc_vsr8awi (__v8hi, const int);
+__v8hi __builtin_arc_vsr8i (__v8hi, const int);
+@end example

-@item v2sf __builtin_mips_alnv_ps (v2sf, v2sf, int)
-Align variable (@code{alnv.ps}).

+The following take one @code{__v8hi} argument and one @code{int} argument which
+must be an 8-bit compile time constant.  They return a @code{__v8hi}
+result.
+@example
+__v8hi __builtin_arc_vd6tapf (__v8hi, const int);
+__v8hi __builtin_arc_vmvaw (__v8hi, const int);
+__v8hi __builtin_arc_vmvw (__v8hi, const int);
+__v8hi __builtin_arc_vmvzw (__v8hi, const int);
+@end example

-@emph{Note:} The value of the third parameter must be 0 or 4
-modulo 8, otherwise the result is unpredictable.  Please read the
-instruction description for details.
-@end table

+The following take two @code{int} arguments, the second of which
+must be an 8-bit compile time constant.  They return a @code{__v8hi}
+result:
+@example
+__v8hi __builtin_arc_vmovaw (int, const int);
+__v8hi __builtin_arc_vmovw (int, const int);
+__v8hi __builtin_arc_vmovzw (int, const int);
+@end example

-The following multi-instruction functions are also available.
-In each case, @var{cond} can be any of the 16 floating-point conditions:
-@code{f}, @code{un}, @code{eq}, @code{ueq}, @code{olt}, @code{ult},
-@code{ole}, @code{ule}, @code{sf}, @code{ngle}, @code{seq}, @code{ngl},
-@code{lt}, @code{nge}, @code{le} or @code{ngt}.

+The following take a single @code{__v8hi} argument and return a
+@code{__v8hi} result:
+@example
+__v8hi __builtin_arc_vabsaw (__v8hi);
+__v8hi __builtin_arc_vabsw (__v8hi);
+__v8hi __builtin_arc_vaddsuw (__v8hi);
+__v8hi __builtin_arc_vexch1 (__v8hi);
+__v8hi __builtin_arc_vexch2 (__v8hi);
+__v8hi __builtin_arc_vexch4 (__v8hi);
+__v8hi __builtin_arc_vsignw (__v8hi);
+__v8hi __builtin_arc_vupbaw (__v8hi);
+__v8hi __builtin_arc_vupbw (__v8hi);
+__v8hi __builtin_arc_vupsbaw (__v8hi);
+__v8hi __builtin_arc_vupsbw (__v8hi);
+@end example
+
+The following take two @code{int} arguments and return no result:
+@example
+void __builtin_arc_vdirun (int, int);
+void __builtin_arc_vdorun (int, int);
+@end example
+
+The following take two @code{int} arguments and return no result.  The
+first argument must be a 3-bit compile time constant indicating one of
+the DR0-DR7 DMA setup channels:
+@example
+void __builtin_arc_vdiwr (const int, int);
+void __builtin_arc_vdowr (const int, int);
+@end example
+
+The following take an @code{int} argument and return no result:
+@example
+void __builtin_arc_vendrec (int);
+void __builtin_arc_vrec (int);
+void __builtin_arc_vrecrun (int);
+void __builtin_arc_vrun (int);
+@end example
+
+The following take a @code{__v8hi} argument and two @code{int}
+arguments and return a @code{__v8hi} result.  The second argument must
+be a 3-bit compile time constant, indicating one of the registers I0-I7,
+and the third argument must be an 8-bit compile time constant.
+
+@emph{Note:} Although the equivalent hardware instructions do not take
+a SIMD register as an operand, these builtins overwrite the relevant
+bits of the @code{__v8hi} register provided as the first argument with
+the value loaded from the @code{[Ib, u8]} location in the SDM.

-@table @code
-@item v2sf __builtin_mips_movt_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-@itemx v2sf __builtin_mips_movf_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-Conditional move based on floating-point comparison (@code{c.@var{cond}.ps},
-@code{movt.ps}/@code{movf.ps}).

+@example
+__v8hi __builtin_arc_vld32 (__v8hi, const int, const int);
+__v8hi __builtin_arc_vld32wh (__v8hi, const int, const int);
+__v8hi __builtin_arc_vld32wl (__v8hi, const int, const int);
+__v8hi __builtin_arc_vld64 (__v8hi, const int, const int);
+@end example

-The @code{movt} functions return the value @var{x} computed by:

+The following take two @code{int} arguments and return a @code{__v8hi}
+result.  The first argument must be a 3-bit compile time constant,
+indicating one of the registers I0-I7, and the second argument must be an
+8-bit compile time constant.

-@smallexample
-c.@var{cond}.ps @var{cc},@var{a},@var{b}
-mov.ps @var{x},@var{c}
-movt.ps @var{x},@var{d},@var{cc}
-@end smallexample

+@example
+__v8hi __builtin_arc_vld128 (const int, const int);
+__v8hi __builtin_arc_vld64w (const int, const int);
+@end example

-The @code{movf} functions are similar but use @code{movf.ps} instead
-of @code{movt.ps}.

+The following take a @code{__v8hi} argument and two @code{int}
+arguments and return no result.  The second argument must be a 3-bit
+compile time constant, indicating one of the registers I0-I7, and the
+third argument must be an 8-bit compile time constant.

-@item int __builtin_mips_upper_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-@itemx int __builtin_mips_lower_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-Comparison of two paired-single values (@code{c.@var{cond}.ps},
-@code{bc1t}/@code{bc1f}).

+@example
+void __builtin_arc_vst128 (__v8hi, const int, const int);
+void __builtin_arc_vst64 (__v8hi, const int, const int);
+@end example

-These functions compare @var{a} and @var{b} using @code{c.@var{cond}.ps}
-and return either the upper or lower half of the result.  For example:

+The following take a @code{__v8hi} argument and three @code{int}
+arguments and return no result.  The second argument must be a 3-bit
+compile-time constant, identifying the 16-bit sub-register to be
+stored, the third argument must be a 3-bit compile time constant,
+indicating one of the registers I0-I7, and the fourth argument must be an
+8-bit compile time constant.

-@smallexample
-v2sf a, b;
-if (__builtin_mips_upper_c_eq_ps (a, b))
-  upper_halves_are_equal ();
-else
-  upper_halves_are_unequal ();
-
-if (__builtin_mips_lower_c_eq_ps (a, b))
-  lower_halves_are_equal ();
-else
-  lower_halves_are_unequal ();
-@end smallexample
-@end table

+@example
+void __builtin_arc_vst16_n (__v8hi, const int, const int, const int);
+void __builtin_arc_vst32_n (__v8hi, const int, const int, const int);
+@end example
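+For illustration only, the following sketch combines the vector load,
+ALU and store built-ins documented above, using only the signatures
+given here.  The constant @code{2} names one of the I0-I7 registers and
+@code{0x10} is an 8-bit offset; the exact data movement through the SDM
+is described in the architecture manual:
+
+@example
+void
+scale_block (void)
+@{
+  __v8hi v = __builtin_arc_vld128 (2, 0x10);  /* Load from [I2, 0x10].  */
+  v = __builtin_arc_vaddw (v, v);             /* Element-wise add.  */
+  __builtin_arc_vst128 (v, 2, 0x10);          /* Store the result back.  */
+@}
+@end example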
+The following built-in functions are available on systems that use
+@option{-mmpy-option=6} or higher.

-@node MIPS-3D Built-in Functions
-@subsubsection MIPS-3D Built-in Functions

+@example
+__v2hi __builtin_arc_dmach (__v2hi, __v2hi);
+__v2hi __builtin_arc_dmachu (__v2hi, __v2hi);
+__v2hi __builtin_arc_dmpyh (__v2hi, __v2hi);
+__v2hi __builtin_arc_dmpyhu (__v2hi, __v2hi);
+__v2hi __builtin_arc_vaddsub2h (__v2hi, __v2hi);
+__v2hi __builtin_arc_vsubadd2h (__v2hi, __v2hi);
+@end example

-The MIPS-3D Application-Specific Extension (ASE) includes additional
-paired-single instructions that are designed to improve the performance
-of 3D graphics operations.  Support for these instructions is controlled
-by the @option{-mips3d} command-line option.

+The following built-in functions are available on systems that use
+@option{-mmpy-option=7} or higher.

-The functions listed below map directly to a particular MIPS-3D
-instruction.  Please refer to the architecture specification for
-more details on what each instruction does.

+@example
+__v2si __builtin_arc_vmac2h (__v2hi, __v2hi);
+__v2si __builtin_arc_vmac2hu (__v2hi, __v2hi);
+__v2si __builtin_arc_vmpy2h (__v2hi, __v2hi);
+__v2si __builtin_arc_vmpy2hu (__v2hi, __v2hi);
+@end example

-@table @code
-@item v2sf __builtin_mips_addr_ps (v2sf, v2sf)
-Reduction add (@code{addr.ps}).

+The following built-in functions are available on systems that use
+@option{-mmpy-option=8} or higher.

-@item v2sf __builtin_mips_mulr_ps (v2sf, v2sf)
-Reduction multiply (@code{mulr.ps}).

+@example
+long long __builtin_arc_qmach (__v4hi, __v4hi);
+long long __builtin_arc_qmachu (__v4hi, __v4hi);
+long long __builtin_arc_qmpyh (__v4hi, __v4hi);
+long long __builtin_arc_qmpyhu (__v4hi, __v4hi);
+long long __builtin_arc_dmacwh (__v2si, __v2hi);
+long long __builtin_arc_dmacwhu (__v2si, __v2hi);
+__v2si __builtin_arc_vaddsub (__v2si, __v2si);
+__v2si __builtin_arc_vsubadd (__v2si, __v2si);
+__v4hi __builtin_arc_vaddsub4h (__v4hi, __v4hi);
+__v4hi __builtin_arc_vsubadd4h (__v4hi, __v4hi);
+@end example

-@item v2sf __builtin_mips_cvt_pw_ps (v2sf)
-Convert paired single to paired word (@code{cvt.pw.ps}).

+@node ARM iWMMXt Built-in Functions
+@subsection ARM iWMMXt Built-in Functions

-@item v2sf __builtin_mips_cvt_ps_pw (v2sf)
-Convert paired word to paired single (@code{cvt.ps.pw}).

+These built-in functions are available for the ARM family of
+processors when the @option{-mcpu=iwmmxt} switch is used:

-@item float __builtin_mips_recip1_s (float)
-@itemx double __builtin_mips_recip1_d (double)
-@itemx v2sf __builtin_mips_recip1_ps (v2sf)
-Reduced-precision reciprocal (sequence step 1) (@code{recip1.@var{fmt}}).

+@smallexample
+typedef int v2si __attribute__ ((vector_size (8)));
+typedef short v4hi __attribute__ ((vector_size (8)));
+typedef char v8qi __attribute__ ((vector_size (8)));

-@item float __builtin_mips_recip2_s (float, float)
-@itemx double __builtin_mips_recip2_d (double, double)
-@itemx v2sf __builtin_mips_recip2_ps (v2sf, v2sf)
-Reduced-precision reciprocal (sequence step 2) (@code{recip2.@var{fmt}}).
+int __builtin_arm_getwcgr0 (void); +void __builtin_arm_setwcgr0 (int); +int __builtin_arm_getwcgr1 (void); +void __builtin_arm_setwcgr1 (int); +int __builtin_arm_getwcgr2 (void); +void __builtin_arm_setwcgr2 (int); +int __builtin_arm_getwcgr3 (void); +void __builtin_arm_setwcgr3 (int); +int __builtin_arm_textrmsb (v8qi, int); +int __builtin_arm_textrmsh (v4hi, int); +int __builtin_arm_textrmsw (v2si, int); +int __builtin_arm_textrmub (v8qi, int); +int __builtin_arm_textrmuh (v4hi, int); +int __builtin_arm_textrmuw (v2si, int); +v8qi __builtin_arm_tinsrb (v8qi, int, int); +v4hi __builtin_arm_tinsrh (v4hi, int, int); +v2si __builtin_arm_tinsrw (v2si, int, int); +long long __builtin_arm_tmia (long long, int, int); +long long __builtin_arm_tmiabb (long long, int, int); +long long __builtin_arm_tmiabt (long long, int, int); +long long __builtin_arm_tmiaph (long long, int, int); +long long __builtin_arm_tmiatb (long long, int, int); +long long __builtin_arm_tmiatt (long long, int, int); +int __builtin_arm_tmovmskb (v8qi); +int __builtin_arm_tmovmskh (v4hi); +int __builtin_arm_tmovmskw (v2si); +long long __builtin_arm_waccb (v8qi); +long long __builtin_arm_wacch (v4hi); +long long __builtin_arm_waccw (v2si); +v8qi __builtin_arm_waddb (v8qi, v8qi); +v8qi __builtin_arm_waddbss (v8qi, v8qi); +v8qi __builtin_arm_waddbus (v8qi, v8qi); +v4hi __builtin_arm_waddh (v4hi, v4hi); +v4hi __builtin_arm_waddhss (v4hi, v4hi); +v4hi __builtin_arm_waddhus (v4hi, v4hi); +v2si __builtin_arm_waddw (v2si, v2si); +v2si __builtin_arm_waddwss (v2si, v2si); +v2si __builtin_arm_waddwus (v2si, v2si); +v8qi __builtin_arm_walign (v8qi, v8qi, int); +long long __builtin_arm_wand(long long, long long); +long long __builtin_arm_wandn (long long, long long); +v8qi __builtin_arm_wavg2b (v8qi, v8qi); +v8qi __builtin_arm_wavg2br (v8qi, v8qi); +v4hi __builtin_arm_wavg2h (v4hi, v4hi); +v4hi __builtin_arm_wavg2hr (v4hi, v4hi); +v8qi __builtin_arm_wcmpeqb (v8qi, v8qi); +v4hi __builtin_arm_wcmpeqh (v4hi, v4hi); +v2si __builtin_arm_wcmpeqw (v2si, v2si); +v8qi __builtin_arm_wcmpgtsb (v8qi, v8qi); +v4hi __builtin_arm_wcmpgtsh (v4hi, v4hi); +v2si __builtin_arm_wcmpgtsw (v2si, v2si); +v8qi __builtin_arm_wcmpgtub (v8qi, v8qi); +v4hi __builtin_arm_wcmpgtuh (v4hi, v4hi); +v2si __builtin_arm_wcmpgtuw (v2si, v2si); +long long __builtin_arm_wmacs (long long, v4hi, v4hi); +long long __builtin_arm_wmacsz (v4hi, v4hi); +long long __builtin_arm_wmacu (long long, v4hi, v4hi); +long long __builtin_arm_wmacuz (v4hi, v4hi); +v4hi __builtin_arm_wmadds (v4hi, v4hi); +v4hi __builtin_arm_wmaddu (v4hi, v4hi); +v8qi __builtin_arm_wmaxsb (v8qi, v8qi); +v4hi __builtin_arm_wmaxsh (v4hi, v4hi); +v2si __builtin_arm_wmaxsw (v2si, v2si); +v8qi __builtin_arm_wmaxub (v8qi, v8qi); +v4hi __builtin_arm_wmaxuh (v4hi, v4hi); +v2si __builtin_arm_wmaxuw (v2si, v2si); +v8qi __builtin_arm_wminsb (v8qi, v8qi); +v4hi __builtin_arm_wminsh (v4hi, v4hi); +v2si __builtin_arm_wminsw (v2si, v2si); +v8qi __builtin_arm_wminub (v8qi, v8qi); +v4hi __builtin_arm_wminuh (v4hi, v4hi); +v2si __builtin_arm_wminuw (v2si, v2si); +v4hi __builtin_arm_wmulsm (v4hi, v4hi); +v4hi __builtin_arm_wmulul (v4hi, v4hi); +v4hi __builtin_arm_wmulum (v4hi, v4hi); +long long __builtin_arm_wor (long long, long long); +v2si __builtin_arm_wpackdss (long long, long long); +v2si __builtin_arm_wpackdus (long long, long long); +v8qi __builtin_arm_wpackhss (v4hi, v4hi); +v8qi __builtin_arm_wpackhus (v4hi, v4hi); +v4hi __builtin_arm_wpackwss (v2si, v2si); +v4hi __builtin_arm_wpackwus (v2si, v2si); +long long 
__builtin_arm_wrord (long long, long long); +long long __builtin_arm_wrordi (long long, int); +v4hi __builtin_arm_wrorh (v4hi, long long); +v4hi __builtin_arm_wrorhi (v4hi, int); +v2si __builtin_arm_wrorw (v2si, long long); +v2si __builtin_arm_wrorwi (v2si, int); +v2si __builtin_arm_wsadb (v2si, v8qi, v8qi); +v2si __builtin_arm_wsadbz (v8qi, v8qi); +v2si __builtin_arm_wsadh (v2si, v4hi, v4hi); +v2si __builtin_arm_wsadhz (v4hi, v4hi); +v4hi __builtin_arm_wshufh (v4hi, int); +long long __builtin_arm_wslld (long long, long long); +long long __builtin_arm_wslldi (long long, int); +v4hi __builtin_arm_wsllh (v4hi, long long); +v4hi __builtin_arm_wsllhi (v4hi, int); +v2si __builtin_arm_wsllw (v2si, long long); +v2si __builtin_arm_wsllwi (v2si, int); +long long __builtin_arm_wsrad (long long, long long); +long long __builtin_arm_wsradi (long long, int); +v4hi __builtin_arm_wsrah (v4hi, long long); +v4hi __builtin_arm_wsrahi (v4hi, int); +v2si __builtin_arm_wsraw (v2si, long long); +v2si __builtin_arm_wsrawi (v2si, int); +long long __builtin_arm_wsrld (long long, long long); +long long __builtin_arm_wsrldi (long long, int); +v4hi __builtin_arm_wsrlh (v4hi, long long); +v4hi __builtin_arm_wsrlhi (v4hi, int); +v2si __builtin_arm_wsrlw (v2si, long long); +v2si __builtin_arm_wsrlwi (v2si, int); +v8qi __builtin_arm_wsubb (v8qi, v8qi); +v8qi __builtin_arm_wsubbss (v8qi, v8qi); +v8qi __builtin_arm_wsubbus (v8qi, v8qi); +v4hi __builtin_arm_wsubh (v4hi, v4hi); +v4hi __builtin_arm_wsubhss (v4hi, v4hi); +v4hi __builtin_arm_wsubhus (v4hi, v4hi); +v2si __builtin_arm_wsubw (v2si, v2si); +v2si __builtin_arm_wsubwss (v2si, v2si); +v2si __builtin_arm_wsubwus (v2si, v2si); +v4hi __builtin_arm_wunpckehsb (v8qi); +v2si __builtin_arm_wunpckehsh (v4hi); +long long __builtin_arm_wunpckehsw (v2si); +v4hi __builtin_arm_wunpckehub (v8qi); +v2si __builtin_arm_wunpckehuh (v4hi); +long long __builtin_arm_wunpckehuw (v2si); +v4hi __builtin_arm_wunpckelsb (v8qi); +v2si __builtin_arm_wunpckelsh (v4hi); +long long __builtin_arm_wunpckelsw (v2si); +v4hi __builtin_arm_wunpckelub (v8qi); +v2si __builtin_arm_wunpckeluh (v4hi); +long long __builtin_arm_wunpckeluw (v2si); +v8qi __builtin_arm_wunpckihb (v8qi, v8qi); +v4hi __builtin_arm_wunpckihh (v4hi, v4hi); +v2si __builtin_arm_wunpckihw (v2si, v2si); +v8qi __builtin_arm_wunpckilb (v8qi, v8qi); +v4hi __builtin_arm_wunpckilh (v4hi, v4hi); +v2si __builtin_arm_wunpckilw (v2si, v2si); +long long __builtin_arm_wxor (long long, long long); +long long __builtin_arm_wzero (); +@end smallexample -@item float __builtin_mips_rsqrt1_s (float) -@itemx double __builtin_mips_rsqrt1_d (double) -@itemx v2sf __builtin_mips_rsqrt1_ps (v2sf) -Reduced-precision reciprocal square root (sequence step 1) -(@code{rsqrt1.@var{fmt}}). -@item float __builtin_mips_rsqrt2_s (float, float) -@itemx double __builtin_mips_rsqrt2_d (double, double) -@itemx v2sf __builtin_mips_rsqrt2_ps (v2sf, v2sf) -Reduced-precision reciprocal square root (sequence step 2) -(@code{rsqrt2.@var{fmt}}). -@end table +@node ARM C Language Extensions (ACLE) +@subsection ARM C Language Extensions (ACLE) -The following multi-instruction functions are also available. -In each case, @var{cond} can be any of the 16 floating-point conditions: -@code{f}, @code{un}, @code{eq}, @code{ueq}, @code{olt}, @code{ult}, -@code{ole}, @code{ule}, @code{sf}, @code{ngle}, @code{seq}, -@code{ngl}, @code{lt}, @code{nge}, @code{le} or @code{ngt}. 
+GCC implements extensions for C as described in the ARM C Language
+Extensions (ACLE) specification, which can be found at
+@uref{https://developer.arm.com/documentation/ihi0053/latest/}.

-@table @code
-@item int __builtin_mips_cabs_@var{cond}_s (float @var{a}, float @var{b})
-@itemx int __builtin_mips_cabs_@var{cond}_d (double @var{a}, double @var{b})
-Absolute comparison of two scalar values (@code{cabs.@var{cond}.@var{fmt}},
-@code{bc1t}/@code{bc1f}).

+As a part of ACLE, GCC implements extensions for Advanced SIMD as described in
+the ARM C Language Extensions Specification.  The complete list of Advanced SIMD
+intrinsics can be found at
+@uref{https://developer.arm.com/documentation/ihi0073/latest/}.
+The built-in intrinsics for the Advanced SIMD extension are available when
+NEON is enabled.

-These functions compare @var{a} and @var{b} using @code{cabs.@var{cond}.s}
-or @code{cabs.@var{cond}.d} and return the result as a boolean value.
-For example:

+Currently, the ARM and AArch64 back ends do not fully support ACLE 2.0.  Both
+back ends support CRC32 intrinsics and the ARM back end supports the
+Coprocessor intrinsics, all from @file{arm_acle.h}.  The ARM back end's 16-bit
+floating-point Advanced SIMD intrinsics currently comply with ACLE v1.1.
+The AArch64 back end does not yet support 16-bit floating-point Advanced
+SIMD intrinsics.

-@smallexample
-float a, b;
-if (__builtin_mips_cabs_eq_s (a, b))
-  true ();
-else
-  false ();
-@end smallexample

+See @ref{ARM Options} and @ref{AArch64 Options} for more information on the
+availability of extensions.

-@item int __builtin_mips_upper_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-@itemx int __builtin_mips_lower_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-Absolute comparison of two paired-single values (@code{cabs.@var{cond}.ps},
-@code{bc1t}/@code{bc1f}).

+@node ARM Floating Point Status and Control Intrinsics
+@subsection ARM Floating Point Status and Control Intrinsics

-These functions compare @var{a} and @var{b} using @code{cabs.@var{cond}.ps}
-and return either the upper or lower half of the result.  For example:

+These built-in functions are available for the ARM family of
+processors with a floating-point unit.

@smallexample
-v2sf a, b;
-if (__builtin_mips_upper_cabs_eq_ps (a, b))
-  upper_halves_are_equal ();
-else
-  upper_halves_are_unequal ();
-
-if (__builtin_mips_lower_cabs_eq_ps (a, b))
-  lower_halves_are_equal ();
-else
-  lower_halves_are_unequal ();
+unsigned int __builtin_arm_get_fpscr ();
+void __builtin_arm_set_fpscr (unsigned int);
@end smallexample

-@item v2sf __builtin_mips_movt_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-@itemx v2sf __builtin_mips_movf_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-Conditional move based on absolute comparison (@code{cabs.@var{cond}.ps},
-@code{movt.ps}/@code{movf.ps}).
-
-The @code{movt} functions return the value @var{x} computed by:
-
-@smallexample
-cabs.@var{cond}.ps @var{cc},@var{a},@var{b}
-mov.ps @var{x},@var{c}
-movt.ps @var{x},@var{d},@var{cc}
-@end smallexample

+@node ARM ARMv8-M Security Extensions
+@subsection ARM ARMv8-M Security Extensions

-The @code{movf} functions are similar but use @code{movf.ps} instead
-of @code{movt.ps}.

+GCC implements the ARMv8-M Security Extensions as described in the ARMv8-M
+Security Extensions: Requirements on Development Tools Engineering
+Specification, which can be found at
+@uref{https://developer.arm.com/documentation/ecm0359818/latest/}.
-@item int __builtin_mips_any_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-@itemx int __builtin_mips_all_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-@itemx int __builtin_mips_any_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-@itemx int __builtin_mips_all_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-Comparison of two paired-single values
-(@code{c.@var{cond}.ps}/@code{cabs.@var{cond}.ps},
-@code{bc1any2t}/@code{bc1any2f}).

+As part of the Security Extensions, GCC implements two new function
+attributes: @code{cmse_nonsecure_entry} and @code{cmse_nonsecure_call}.

-These functions compare @var{a} and @var{b} using @code{c.@var{cond}.ps}
-or @code{cabs.@var{cond}.ps}.  The @code{any} forms return @code{true} if either
-result is @code{true} and the @code{all} forms return @code{true} if both results are @code{true}.
-For example:

+As part of the Security Extensions, GCC implements the intrinsics below.
+FPTR is used here to mean any function pointer type.

@smallexample
-v2sf a, b;
-if (__builtin_mips_any_c_eq_ps (a, b))
-  one_is_true ();
-else
-  both_are_false ();
-
-if (__builtin_mips_all_c_eq_ps (a, b))
-  both_are_true ();
-else
-  one_is_false ();
+cmse_address_info_t cmse_TT (void *);
+cmse_address_info_t cmse_TT_fptr (FPTR);
+cmse_address_info_t cmse_TTT (void *);
+cmse_address_info_t cmse_TTT_fptr (FPTR);
+cmse_address_info_t cmse_TTA (void *);
+cmse_address_info_t cmse_TTA_fptr (FPTR);
+cmse_address_info_t cmse_TTAT (void *);
+cmse_address_info_t cmse_TTAT_fptr (FPTR);
+void * cmse_check_address_range (void *, size_t, int);
+typeof(p) cmse_nsfptr_create (FPTR p);
+intptr_t cmse_is_nsfptr (FPTR);
+int cmse_nonsecure_caller (void);
@end smallexample

-@item int __builtin_mips_any_c_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-@itemx int __builtin_mips_all_c_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-@itemx int __builtin_mips_any_cabs_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-@itemx int __builtin_mips_all_cabs_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-Comparison of four paired-single values
-(@code{c.@var{cond}.ps}/@code{cabs.@var{cond}.ps},
-@code{bc1any4t}/@code{bc1any4f}).

+@node AVR Built-in Functions
+@subsection AVR Built-in Functions

-These functions use @code{c.@var{cond}.ps} or @code{cabs.@var{cond}.ps}
-to compare @var{a} with @var{b} and to compare @var{c} with @var{d}.
-The @code{any} forms return @code{true} if any of the four results are @code{true}
-and the @code{all} forms return @code{true} if all four results are @code{true}.
-For example:

+For each built-in function for AVR, there is an identically named,
+uppercase built-in macro defined.  That way users can easily query
+whether or not a specific built-in is implemented.  For example, if
+@code{__builtin_avr_nop} is available, the macro
+@code{__BUILTIN_AVR_NOP} is defined to @code{1}; otherwise it is
+undefined.
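+A minimal sketch of this availability check (nothing beyond the macro
+convention above is assumed; the inline-assembly fallback is just one
+possible alternative):
+
+@smallexample
+#ifdef __BUILTIN_AVR_NOP
+  __builtin_avr_nop ();          /* Built-in is implemented.  */
+#else
+  __asm__ __volatile__ ("nop");  /* Fall back to inline assembly.  */
+#endif
+@end smallexample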
-@smallexample
-v2sf a, b, c, d;
-if (__builtin_mips_any_c_eq_4s (a, b, c, d))
-  some_are_true ();
-else
-  all_are_false ();

+@defbuiltin{void __builtin_avr_nop (void)}
+@defbuiltinx{void __builtin_avr_sei (void)}
+@defbuiltinx{void __builtin_avr_cli (void)}
+@defbuiltinx{void __builtin_avr_sleep (void)}
+@defbuiltinx{void __builtin_avr_wdr (void)}
+@defbuiltinx{uint8_t __builtin_avr_swap (uint8_t)}
+@defbuiltinx{uint16_t __builtin_avr_fmul (uint8_t, uint8_t)}
+@defbuiltinx{int16_t __builtin_avr_fmuls (int8_t, int8_t)}
+@defbuiltinx{int16_t __builtin_avr_fmulsu (int8_t, uint8_t)}

-if (__builtin_mips_all_c_eq_4s (a, b, c, d))
-  all_are_true ();
-else
-  some_are_false ();
-@end smallexample
-@end table

+These built-in functions map to the respective machine
+instruction, i.e.@: @code{nop}, @code{sei}, @code{cli}, @code{sleep},
+@code{wdr}, @code{swap}, @code{fmul}, @code{fmuls}
+resp. @code{fmulsu}.  The three @code{fmul*} built-ins are implemented
+as library calls if no hardware multiplier is available.
+@enddefbuiltin

-@node MIPS SIMD Architecture (MSA) Support
-@subsection MIPS SIMD Architecture (MSA) Support

+@defbuiltin{void __builtin_avr_delay_cycles (uint32_t @var{ticks})}
+Delay execution for @var{ticks} cycles.  Note that this
+built-in does not take into account the effect of interrupts that
+might increase delay time.  @var{ticks} must be a compile-time
+integer constant; delays with a variable number of cycles are not supported.
+@enddefbuiltin

-@menu
-* MIPS SIMD Architecture Built-in Functions::
-@end menu

+@defbuiltin{uint8_t __builtin_avr_insert_bits (uint32_t @var{map}, uint8_t @var{bits}, uint8_t @var{val})}
+Insert bits from @var{bits} into @var{val} and return the resulting
+value.  The nibbles of @var{map} determine how the insertion is
+performed: let @var{X} be the @var{n}-th nibble of @var{map}.
+@enumerate
+@item If @var{X} is @code{0xf},
+then the @var{n}-th bit of @var{val} is returned unaltered.

-GCC provides intrinsics to access the SIMD instructions provided by the
-MSA MIPS SIMD Architecture.  The interface is made available by including
-@code{<msa.h>} and using @option{-mmsa -mhard-float -mfp64 -mnan=2008}.
-For each @code{__builtin_msa_*}, there is a shortened name of the intrinsic,
-@code{__msa_*}.

+@item If @var{X} is in the range 0@dots{}7,
+then the @var{n}-th result bit is set to the @var{X}-th bit of @var{bits}.

-MSA implements 128-bit wide vector registers, operating on 8-, 16-, 32- and
-64-bit integer, 16- and 32-bit fixed-point, or 32- and 64-bit floating point
-data elements.  The following vector typedefs are included in @code{msa.h}:
-@itemize
-@item @code{v16i8}, a vector of sixteen signed 8-bit integers;
-@item @code{v16u8}, a vector of sixteen unsigned 8-bit integers;
-@item @code{v8i16}, a vector of eight signed 16-bit integers;
-@item @code{v8u16}, a vector of eight unsigned 16-bit integers;
-@item @code{v4i32}, a vector of four signed 32-bit integers;
-@item @code{v4u32}, a vector of four unsigned 32-bit integers;
-@item @code{v2i64}, a vector of two signed 64-bit integers;
-@item @code{v2u64}, a vector of two unsigned 64-bit integers;
-@item @code{v4f32}, a vector of four 32-bit floats;
-@item @code{v2f64}, a vector of two 64-bit doubles.
-@end itemize

+@item If @var{X} is in the range 8@dots{}@code{0xe},
+then the @var{n}-th result bit is undefined.
+@end enumerate -Instructions and corresponding built-ins may have additional restrictions and/or -input/output values manipulated: -@itemize -@item @code{imm0_1}, an integer literal in range 0 to 1; -@item @code{imm0_3}, an integer literal in range 0 to 3; -@item @code{imm0_7}, an integer literal in range 0 to 7; -@item @code{imm0_15}, an integer literal in range 0 to 15; -@item @code{imm0_31}, an integer literal in range 0 to 31; -@item @code{imm0_63}, an integer literal in range 0 to 63; -@item @code{imm0_255}, an integer literal in range 0 to 255; -@item @code{imm_n16_15}, an integer literal in range -16 to 15; -@item @code{imm_n512_511}, an integer literal in range -512 to 511; -@item @code{imm_n1024_1022}, an integer literal in range -512 to 511 left -shifted by 1 bit, i.e., -1024, -1022, @dots{}, 1020, 1022; -@item @code{imm_n2048_2044}, an integer literal in range -512 to 511 left -shifted by 2 bits, i.e., -2048, -2044, @dots{}, 2040, 2044; -@item @code{imm_n4096_4088}, an integer literal in range -512 to 511 left -shifted by 3 bits, i.e., -4096, -4088, @dots{}, 4080, 4088; -@item @code{imm1_4}, an integer literal in range 1 to 4; -@item @code{i32, i64, u32, u64, f32, f64}, defined as follows: -@end itemize +@noindent +One typical use case for this built-in is adjusting input and +output values to non-contiguous port layouts. Some examples: @smallexample -@{ -typedef int i32; -#if __LONG_MAX__ == __LONG_LONG_MAX__ -typedef long i64; -#else -typedef long long i64; -#endif - -typedef unsigned int u32; -#if __LONG_MAX__ == __LONG_LONG_MAX__ -typedef unsigned long u64; -#else -typedef unsigned long long u64; -#endif +// same as val, bits is unused +__builtin_avr_insert_bits (0xffffffff, bits, val); +@end smallexample -typedef double f64; -typedef float f32; -@} +@smallexample +// same as bits, val is unused +__builtin_avr_insert_bits (0x76543210, bits, val); @end smallexample -@node MIPS SIMD Architecture Built-in Functions -@subsubsection MIPS SIMD Architecture Built-in Functions +@smallexample +// same as rotating bits by 4 +__builtin_avr_insert_bits (0x32107654, bits, 0); +@end smallexample -The intrinsics provided are listed below; each is named after the -machine instruction. +@smallexample +// high nibble of result is the high nibble of val +// low nibble of result is the low nibble of bits +__builtin_avr_insert_bits (0xffff3210, bits, val); +@end smallexample @smallexample -v16i8 __builtin_msa_add_a_b (v16i8, v16i8); -v8i16 __builtin_msa_add_a_h (v8i16, v8i16); -v4i32 __builtin_msa_add_a_w (v4i32, v4i32); -v2i64 __builtin_msa_add_a_d (v2i64, v2i64); +// reverse the bit order of bits +__builtin_avr_insert_bits (0x01234567, bits, 0); +@end smallexample +@enddefbuiltin -v16i8 __builtin_msa_adds_a_b (v16i8, v16i8); -v8i16 __builtin_msa_adds_a_h (v8i16, v8i16); -v4i32 __builtin_msa_adds_a_w (v4i32, v4i32); -v2i64 __builtin_msa_adds_a_d (v2i64, v2i64); +@defbuiltin{uint8_t __builtin_avr_mask1 (uint8_t @var{mask}, uint8_t @var{offs})} +Rotate the 8-bit constant value @var{mask} by an offset of @var{offs}, +where @var{mask} is in @{ 0x01, 0xfe, 0x7f, 0x80 @}. +This built-in can be used as an alternative to 8-bit expressions like +@code{1 << offs} when their computation consumes too much +time, and @var{offs} is known to be in the range 0@dots{}7. 
+@example
+__builtin_avr_mask1 (1, offs)     // same as 1 << offs
+__builtin_avr_mask1 (~1, offs)    // same as ~(1 << offs)
+__builtin_avr_mask1 (0x80, offs)  // same as 0x80 >> offs
+__builtin_avr_mask1 (~0x80, offs) // same as ~(0x80 >> offs)
+@end example
+The open-coded C versions take at least @code{5 + 4 * @var{offs}} cycles
+(and 5 instructions), whereas the built-in takes 7 cycles and instructions
+(8 cycles and instructions in the case of @code{@var{mask} = 0x7f}).
+@enddefbuiltin

-v16i8 __builtin_msa_adds_s_b (v16i8, v16i8);
-v8i16 __builtin_msa_adds_s_h (v8i16, v8i16);
-v4i32 __builtin_msa_adds_s_w (v4i32, v4i32);
-v2i64 __builtin_msa_adds_s_d (v2i64, v2i64);

+@defbuiltin{void __builtin_avr_nops (uint16_t @var{count})}
+Insert @var{count} @code{NOP} instructions.
+The number of instructions must be a compile-time integer constant.
+@enddefbuiltin

-v16u8 __builtin_msa_adds_u_b (v16u8, v16u8);
-v8u16 __builtin_msa_adds_u_h (v8u16, v8u16);
-v4u32 __builtin_msa_adds_u_w (v4u32, v4u32);
-v2u64 __builtin_msa_adds_u_d (v2u64, v2u64);

+@b{All of the following built-in functions are only available for GNU-C.}

-v16i8 __builtin_msa_addv_b (v16i8, v16i8);
-v8i16 __builtin_msa_addv_h (v8i16, v8i16);
-v4i32 __builtin_msa_addv_w (v4i32, v4i32);
-v2i64 __builtin_msa_addv_d (v2i64, v2i64);

+@defbuiltin{int8_t __builtin_avr_flash_segment (const __memx void*)}
+This built-in takes a byte address in the 24-bit
+@ref{AVR Named Address Spaces,named address space} @code{__memx} and returns
+the number of the flash segment (the 64 KiB chunk) to which the address
+points.  Counting starts at @code{0}.
+If the address does not point to flash memory, the result is @code{-1}.
+@enddefbuiltin

-v16i8 __builtin_msa_addvi_b (v16i8, imm0_31);
-v8i16 __builtin_msa_addvi_h (v8i16, imm0_31);
-v4i32 __builtin_msa_addvi_w (v4i32, imm0_31);
-v2i64 __builtin_msa_addvi_d (v2i64, imm0_31);

+@defbuiltin{size_t __builtin_avr_strlen_flash (const __flash char*)}
+@defbuiltinx{size_t __builtin_avr_strlen_flashx (const __flashx char*)}
+@defbuiltinx{size_t __builtin_avr_strlen_memx (const __memx char*)}
+These built-ins return the length of a string located in
+named address space @code{__flash}, @code{__flashx} or @code{__memx},
+respectively.  They are used to support functions like @code{strlen_F} from
+@w{@uref{https://avrdudes.github.io/avr-libc/avr-libc-user-manual/,AVR-LibC}}'s
+header @code{avr/flash.h}.
+@enddefbuiltin

-v16u8 __builtin_msa_and_v (v16u8, v16u8);

+@noindent
+There are many more AVR-specific built-in functions that are used to
+implement the ISO/IEC TR 18037 ``Embedded C'' fixed-point functions of
+section 7.18a.6.  You don't need to use these built-ins directly.
+Instead, use the declarations as supplied by the @code{stdfix.h} header
+with GNU-C99:

-v16u8 __builtin_msa_andi_b (v16u8, imm0_255);

+@smallexample
+#include <stdfix.h>

-v16i8 __builtin_msa_asub_s_b (v16i8, v16i8);
-v8i16 __builtin_msa_asub_s_h (v8i16, v8i16);
-v4i32 __builtin_msa_asub_s_w (v4i32, v4i32);
-v2i64 __builtin_msa_asub_s_d (v2i64, v2i64);

+// Re-interpret the bit representation of unsigned 16-bit
+// integer @var{uval} as Q-format 0.16 value.
+unsigned fract get_bits (uint_ur_t uval)
+@{
+  return urbits (uval);
+@}
+@end smallexample

-v16u8 __builtin_msa_asub_u_b (v16u8, v16u8);
-v8u16 __builtin_msa_asub_u_h (v8u16, v8u16);
-v4u32 __builtin_msa_asub_u_w (v4u32, v4u32);
-v2u64 __builtin_msa_asub_u_d (v2u64, v2u64);

+@node Blackfin Built-in Functions
+@subsection Blackfin Built-in Functions

-v16i8 __builtin_msa_ave_s_b (v16i8, v16i8);
-v8i16 __builtin_msa_ave_s_h (v8i16, v8i16);
-v4i32 __builtin_msa_ave_s_w (v4i32, v4i32);
-v2i64 __builtin_msa_ave_s_d (v2i64, v2i64);

+Currently, there are two Blackfin-specific built-in functions.  These are
+used for generating @code{CSYNC} and @code{SSYNC} machine insns without
+using inline assembly; by using these built-in functions the compiler can
+automatically add workarounds for hardware errata involving these
+instructions.  These functions are named as follows:

-v16u8 __builtin_msa_ave_u_b (v16u8, v16u8);
-v8u16 __builtin_msa_ave_u_h (v8u16, v8u16);
-v4u32 __builtin_msa_ave_u_w (v4u32, v4u32);
-v2u64 __builtin_msa_ave_u_d (v2u64, v2u64);

+@smallexample
+void __builtin_bfin_csync (void);
+void __builtin_bfin_ssync (void);
+@end smallexample

-v16i8 __builtin_msa_aver_s_b (v16i8, v16i8);
-v8i16 __builtin_msa_aver_s_h (v8i16, v8i16);
-v4i32 __builtin_msa_aver_s_w (v4i32, v4i32);
-v2i64 __builtin_msa_aver_s_d (v2i64, v2i64);

+@node BPF Built-in Functions
+@subsection BPF Built-in Functions

-v16u8 __builtin_msa_aver_u_b (v16u8, v16u8);
-v8u16 __builtin_msa_aver_u_h (v8u16, v8u16);
-v4u32 __builtin_msa_aver_u_w (v4u32, v4u32);
-v2u64 __builtin_msa_aver_u_d (v2u64, v2u64);

+The following built-in functions are available for eBPF targets.

-v16u8 __builtin_msa_bclr_b (v16u8, v16u8);
-v8u16 __builtin_msa_bclr_h (v8u16, v8u16);
-v4u32 __builtin_msa_bclr_w (v4u32, v4u32);
-v2u64 __builtin_msa_bclr_d (v2u64, v2u64);

+@defbuiltin{{unsigned long long} __builtin_bpf_load_byte (unsigned long long @var{offset})}
+Load the byte at offset @var{offset} within the @code{struct sk_buff}
+packet data pointed to by the register @code{%r6}, and return it.
+@enddefbuiltin

-v16u8 __builtin_msa_bclri_b (v16u8, imm0_7);
-v8u16 __builtin_msa_bclri_h (v8u16, imm0_15);
-v4u32 __builtin_msa_bclri_w (v4u32, imm0_31);
-v2u64 __builtin_msa_bclri_d (v2u64, imm0_63);

+@defbuiltin{{unsigned long long} __builtin_bpf_load_half (unsigned long long @var{offset})}
+Load the 16 bits at offset @var{offset} within the @code{struct sk_buff}
+packet data pointed to by the register @code{%r6}, and return them.
+@enddefbuiltin

-v16u8 __builtin_msa_binsl_b (v16u8, v16u8, v16u8);
-v8u16 __builtin_msa_binsl_h (v8u16, v8u16, v8u16);
-v4u32 __builtin_msa_binsl_w (v4u32, v4u32, v4u32);
-v2u64 __builtin_msa_binsl_d (v2u64, v2u64, v2u64);

+@defbuiltin{{unsigned long long} __builtin_bpf_load_word (unsigned long long @var{offset})}
+Load the 32 bits at offset @var{offset} within the @code{struct sk_buff}
+packet data pointed to by the register @code{%r6}, and return them.
+@enddefbuiltin

-v16u8 __builtin_msa_binsli_b (v16u8, v16u8, imm0_7);
-v8u16 __builtin_msa_binsli_h (v8u16, v8u16, imm0_15);
-v4u32 __builtin_msa_binsli_w (v4u32, v4u32, imm0_31);
-v2u64 __builtin_msa_binsli_d (v2u64, v2u64, imm0_63);

+@defbuiltin{@var{type} __builtin_preserve_access_index (@var{type} @var{expr})}
+BPF Compile Once-Run Everywhere (CO-RE) support.  Instruct GCC to
+generate CO-RE relocation records for any accesses to aggregate
+data structures (struct, union, array types) in @var{expr}.  This builtin
+is otherwise transparent; @var{expr} may have any type and its value is
+returned.
This builtin has no effect if @code{-mco-re} is not in effect +(either specified or implied). +@enddefbuiltin -v16u8 __builtin_msa_binsr_b (v16u8, v16u8, v16u8); -v8u16 __builtin_msa_binsr_h (v8u16, v8u16, v8u16); -v4u32 __builtin_msa_binsr_w (v4u32, v4u32, v4u32); -v2u64 __builtin_msa_binsr_d (v2u64, v2u64, v2u64); +@defbuiltin{{unsigned int} __builtin_preserve_field_info (@var{expr}, unsigned int @var{kind})} +BPF Compile Once-Run Everywhere (CO-RE) support. This builtin is used to +extract information to aid in struct/union relocations. @var{expr} is +an access to a field of a struct or union. Depending on @var{kind}, different +information is returned to the program. A CO-RE relocation for the access in +@var{expr} with kind @var{kind} is recorded if @code{-mco-re} is in effect. -v16u8 __builtin_msa_binsri_b (v16u8, v16u8, imm0_7); -v8u16 __builtin_msa_binsri_h (v8u16, v8u16, imm0_15); -v4u32 __builtin_msa_binsri_w (v4u32, v4u32, imm0_31); -v2u64 __builtin_msa_binsri_d (v2u64, v2u64, imm0_63); +The following values are supported for @var{kind}: +@table @code +@item FIELD_BYTE_OFFSET = 0 +The returned value is the offset, in bytes, of the field from the +beginning of the containing structure. For bit-fields, this is the byte offset +of the containing word. -v16u8 __builtin_msa_bmnz_v (v16u8, v16u8, v16u8); +@item FIELD_BYTE_SIZE = 1 +The returned value is the size, in bytes, of the field. For bit-fields, +this is the size in bytes of the containing word. -v16u8 __builtin_msa_bmnzi_b (v16u8, v16u8, imm0_255); +@item FIELD_EXISTENCE = 2 +The returned value is 1 if the field exists, 0 otherwise. Always 1 at +compile time. -v16u8 __builtin_msa_bmz_v (v16u8, v16u8, v16u8); +@item FIELD_SIGNEDNESS = 3 +The returned value is 1 if the field is signed, 0 otherwise. -v16u8 __builtin_msa_bmzi_b (v16u8, v16u8, imm0_255); +@item FIELD_LSHIFT_U64 = 4 +@itemx FIELD_RSHIFT_U64 = 5 +The returned value is the number of bits of left- or right-shifting +(respectively) needed in order to recover the original value of the field, +after it has been loaded by a read of @code{FIELD_BYTE_SIZE} bytes into an +unsigned 64-bit value. Primarily useful for reading bit-field values +from structures that may change between kernel versions. -v16u8 __builtin_msa_bneg_b (v16u8, v16u8); -v8u16 __builtin_msa_bneg_h (v8u16, v8u16); -v4u32 __builtin_msa_bneg_w (v4u32, v4u32); -v2u64 __builtin_msa_bneg_d (v2u64, v2u64); +@end table -v16u8 __builtin_msa_bnegi_b (v16u8, imm0_7); -v8u16 __builtin_msa_bnegi_h (v8u16, imm0_15); -v4u32 __builtin_msa_bnegi_w (v4u32, imm0_31); -v2u64 __builtin_msa_bnegi_d (v2u64, imm0_63); +Note that the return value is a constant which is known at +compile time. If the field has a variable offset then +@code{FIELD_BYTE_OFFSET}, @code{FIELD_LSHIFT_U64}, +and @code{FIELD_RSHIFT_U64} are not supported. +Similarly, if the field has a variable size then +@code{FIELD_BYTE_SIZE}, @code{FIELD_LSHIFT_U64}, +and @code{FIELD_RSHIFT_U64} are not supported. 
-i32 __builtin_msa_bnz_b (v16u8);
-i32 __builtin_msa_bnz_h (v8u16);
-i32 __builtin_msa_bnz_w (v4u32);
-i32 __builtin_msa_bnz_d (v2u64);

+For example, @code{__builtin_preserve_field_info} can be used to reliably
+extract bit-field values from a structure that may change between
+kernel versions:

-i32 __builtin_msa_bnz_v (v16u8);

+@smallexample
+struct S
+@{
+  short a;
+  int x:7;
+  int y:5;
+@};

-v16u8 __builtin_msa_bsel_v (v16u8, v16u8, v16u8);

+int
+read_y (struct S *arg)
+@{
+  unsigned long long val;
+  unsigned int offset
+    = __builtin_preserve_field_info (arg->y, FIELD_BYTE_OFFSET);
+  unsigned int size
+    = __builtin_preserve_field_info (arg->y, FIELD_BYTE_SIZE);

-v16u8 __builtin_msa_bseli_b (v16u8, v16u8, imm0_255);

+  /* Read size bytes from arg + offset into val; the cast to char *
+     makes the offset arithmetic byte-based.  */
+  bpf_probe_read (&val, size, (char *) arg + offset);

-v16u8 __builtin_msa_bset_b (v16u8, v16u8);
-v8u16 __builtin_msa_bset_h (v8u16, v8u16);
-v4u32 __builtin_msa_bset_w (v4u32, v4u32);
-v2u64 __builtin_msa_bset_d (v2u64, v2u64);

+  val <<= __builtin_preserve_field_info (arg->y, FIELD_LSHIFT_U64);

-v16u8 __builtin_msa_bseti_b (v16u8, imm0_7);
-v8u16 __builtin_msa_bseti_h (v8u16, imm0_15);
-v4u32 __builtin_msa_bseti_w (v4u32, imm0_31);
-v2u64 __builtin_msa_bseti_d (v2u64, imm0_63);

+  if (__builtin_preserve_field_info (arg->y, FIELD_SIGNEDNESS))
+    val = ((long long) val
+           >> __builtin_preserve_field_info (arg->y, FIELD_RSHIFT_U64));
+  else
+    val >>= __builtin_preserve_field_info (arg->y, FIELD_RSHIFT_U64);

-i32 __builtin_msa_bz_b (v16u8);
-i32 __builtin_msa_bz_h (v8u16);
-i32 __builtin_msa_bz_w (v4u32);
-i32 __builtin_msa_bz_d (v2u64);

+  return val;
+@}

-i32 __builtin_msa_bz_v (v16u8);

+@end smallexample
+@enddefbuiltin

-v16i8 __builtin_msa_ceq_b (v16i8, v16i8);
-v8i16 __builtin_msa_ceq_h (v8i16, v8i16);
-v4i32 __builtin_msa_ceq_w (v4i32, v4i32);
-v2i64 __builtin_msa_ceq_d (v2i64, v2i64);

+@defbuiltin{{unsigned int} __builtin_preserve_enum_value (@var{type}, @var{enum}, unsigned int @var{kind})}
+BPF Compile Once-Run Everywhere (CO-RE) support.  This builtin collects enum
+information and creates a CO-RE relocation relative to @var{enum}, which
+should be of type @var{type}.  The @var{kind} specifies the action performed.

-v16i8 __builtin_msa_ceqi_b (v16i8, imm_n16_15);
-v8i16 __builtin_msa_ceqi_h (v8i16, imm_n16_15);
-v4i32 __builtin_msa_ceqi_w (v4i32, imm_n16_15);
-v2i64 __builtin_msa_ceqi_d (v2i64, imm_n16_15);

+The following values are supported for @var{kind}:
+@table @code
+@item ENUM_VALUE_EXISTS = 0
+The return value is either 0 or 1 depending on whether the enum value
+exists in the target.

-i32 __builtin_msa_cfcmsa (imm0_31);

+@item ENUM_VALUE = 1
+The return value is the enum value in the target kernel.
+@end table
+@enddefbuiltin

-v16i8 __builtin_msa_cle_s_b (v16i8, v16i8);
-v8i16 __builtin_msa_cle_s_h (v8i16, v8i16);
-v4i32 __builtin_msa_cle_s_w (v4i32, v4i32);
-v2i64 __builtin_msa_cle_s_d (v2i64, v2i64);

+@defbuiltin{{unsigned int} __builtin_btf_type_id (@var{type}, unsigned int @var{kind})}
+BPF Compile Once-Run Everywhere (CO-RE) support.  This builtin is used to get
+the BTF type ID of a specified @var{type}.
+Depending on the @var{kind} argument, it
+either returns the ID of the local BTF information, or the BTF type ID in
+the target kernel.
-v16i8 __builtin_msa_cle_u_b (v16u8, v16u8); -v8i16 __builtin_msa_cle_u_h (v8u16, v8u16); -v4i32 __builtin_msa_cle_u_w (v4u32, v4u32); -v2i64 __builtin_msa_cle_u_d (v2u64, v2u64); +The following values are supported for @var{kind}: +@table @code +@item BTF_TYPE_ID_LOCAL = 0 +Return the local BTF type ID. Always succeeds. -v16i8 __builtin_msa_clei_s_b (v16i8, imm_n16_15); -v8i16 __builtin_msa_clei_s_h (v8i16, imm_n16_15); -v4i32 __builtin_msa_clei_s_w (v4i32, imm_n16_15); -v2i64 __builtin_msa_clei_s_d (v2i64, imm_n16_15); +@item BTF_TYPE_ID_TARGET = 1 +Return the target BTF type ID. If @var{type} does not exist in the target, +returns 0. +@end table +@enddefbuiltin -v16i8 __builtin_msa_clei_u_b (v16u8, imm0_31); -v8i16 __builtin_msa_clei_u_h (v8u16, imm0_31); -v4i32 __builtin_msa_clei_u_w (v4u32, imm0_31); -v2i64 __builtin_msa_clei_u_d (v2u64, imm0_31); +@defbuiltin{{unsigned int} __builtin_preserve_type_info (@var{type}, unsigned int @var{kind})} +BPF Compile Once-Run Everywhere (CO-RE) support. This builtin performs named +type (struct/union/enum/typedef) verifications. The type of verification +depends on the @var{kind} argument provided. This builtin always +returns 0 if @var{type} does not exist in the target kernel. -v16i8 __builtin_msa_clt_s_b (v16i8, v16i8); -v8i16 __builtin_msa_clt_s_h (v8i16, v8i16); -v4i32 __builtin_msa_clt_s_w (v4i32, v4i32); -v2i64 __builtin_msa_clt_s_d (v2i64, v2i64); +The following values are supported for @var{kind}: +@table @code +@item BTF_TYPE_EXISTS = 0 +Checks if @var{type} exists in the target. -v16i8 __builtin_msa_clt_u_b (v16u8, v16u8); -v8i16 __builtin_msa_clt_u_h (v8u16, v8u16); -v4i32 __builtin_msa_clt_u_w (v4u32, v4u32); -v2i64 __builtin_msa_clt_u_d (v2u64, v2u64); +@item BTF_TYPE_MATCHES = 1 +Checks if @var{type} matches the local definition in the target kernel. -v16i8 __builtin_msa_clti_s_b (v16i8, imm_n16_15); -v8i16 __builtin_msa_clti_s_h (v8i16, imm_n16_15); -v4i32 __builtin_msa_clti_s_w (v4i32, imm_n16_15); -v2i64 __builtin_msa_clti_s_d (v2i64, imm_n16_15); +@item BTF_TYPE_SIZE = 2 +Returns the size of the @var{type} within the target. +@end table +@enddefbuiltin -v16i8 __builtin_msa_clti_u_b (v16u8, imm0_31); -v8i16 __builtin_msa_clti_u_h (v8u16, imm0_31); -v4i32 __builtin_msa_clti_u_w (v4u32, imm0_31); -v2i64 __builtin_msa_clti_u_d (v2u64, imm0_31); +@node FR-V Built-in Functions +@subsection FR-V Built-in Functions -i32 __builtin_msa_copy_s_b (v16i8, imm0_15); -i32 __builtin_msa_copy_s_h (v8i16, imm0_7); -i32 __builtin_msa_copy_s_w (v4i32, imm0_3); -i64 __builtin_msa_copy_s_d (v2i64, imm0_1); +GCC provides many FR-V-specific built-in functions. In general, +these functions are intended to be compatible with those described +by @cite{FR-V Family, Softune C/C++ Compiler Manual (V6), Fujitsu +Semiconductor}. The two exceptions are @code{__MDUNPACKH} and +@code{__MBTOHE}, the GCC forms of which pass 128-bit values by +pointer rather than by value. -u32 __builtin_msa_copy_u_b (v16i8, imm0_15); -u32 __builtin_msa_copy_u_h (v8i16, imm0_7); -u32 __builtin_msa_copy_u_w (v4i32, imm0_3); -u64 __builtin_msa_copy_u_d (v2i64, imm0_1); +Most of the functions are named after specific FR-V instructions. +Such functions are said to be ``directly mapped'' and are summarized +here in tabular form. 
-void __builtin_msa_ctcmsa (imm0_31, i32);

+@menu
+* Argument Types::
+* Directly-mapped Integer Functions::
+* Directly-mapped Media Functions::
+* Raw read/write Functions::
+* Other Built-in Functions::
+@end menu

-v16i8 __builtin_msa_div_s_b (v16i8, v16i8);
-v8i16 __builtin_msa_div_s_h (v8i16, v8i16);
-v4i32 __builtin_msa_div_s_w (v4i32, v4i32);
-v2i64 __builtin_msa_div_s_d (v2i64, v2i64);

+@node Argument Types
+@subsubsection Argument Types

-v16u8 __builtin_msa_div_u_b (v16u8, v16u8);
-v8u16 __builtin_msa_div_u_h (v8u16, v8u16);
-v4u32 __builtin_msa_div_u_w (v4u32, v4u32);
-v2u64 __builtin_msa_div_u_d (v2u64, v2u64);

+The arguments to the built-in functions can be divided into three groups:
+register numbers, compile-time constants and run-time values.  In order
+to make this classification clear at a glance, the arguments and return
+values are given the following pseudo types:

-v8i16 __builtin_msa_dotp_s_h (v16i8, v16i8);
-v4i32 __builtin_msa_dotp_s_w (v8i16, v8i16);
-v2i64 __builtin_msa_dotp_s_d (v4i32, v4i32);

+@multitable @columnfractions .20 .30 .15 .35
+@headitem Pseudo type @tab Real C type @tab Constant? @tab Description
+@item @code{uh} @tab @code{unsigned short} @tab No @tab an unsigned halfword
+@item @code{uw1} @tab @code{unsigned int} @tab No @tab an unsigned word
+@item @code{sw1} @tab @code{int} @tab No @tab a signed word
+@item @code{uw2} @tab @code{unsigned long long} @tab No
+@tab an unsigned doubleword
+@item @code{sw2} @tab @code{long long} @tab No @tab a signed doubleword
+@item @code{const} @tab @code{int} @tab Yes @tab an integer constant
+@item @code{acc} @tab @code{int} @tab Yes @tab an ACC register number
+@item @code{iacc} @tab @code{int} @tab Yes @tab an IACC register number
+@end multitable

-v8u16 __builtin_msa_dotp_u_h (v16u8, v16u8);
-v4u32 __builtin_msa_dotp_u_w (v8u16, v8u16);
-v2u64 __builtin_msa_dotp_u_d (v4u32, v4u32);

+These pseudo types are not defined by GCC; they are simply a notational
+convenience used in this manual.

-v8i16 __builtin_msa_dpadd_s_h (v8i16, v16i8, v16i8);
-v4i32 __builtin_msa_dpadd_s_w (v4i32, v8i16, v8i16);
-v2i64 __builtin_msa_dpadd_s_d (v2i64, v4i32, v4i32);

+Arguments of type @code{uh}, @code{uw1}, @code{sw1}, @code{uw2}
+and @code{sw2} are evaluated at run time.  They correspond to
+register operands in the underlying FR-V instructions.

-v8u16 __builtin_msa_dpadd_u_h (v8u16, v16u8, v16u8);
-v4u32 __builtin_msa_dpadd_u_w (v4u32, v8u16, v8u16);
-v2u64 __builtin_msa_dpadd_u_d (v2u64, v4u32, v4u32);

+@code{const} arguments represent immediate operands in the underlying
+FR-V instructions.  They must be compile-time constants.

-v8i16 __builtin_msa_dpsub_s_h (v8i16, v16i8, v16i8);
-v4i32 __builtin_msa_dpsub_s_w (v4i32, v8i16, v8i16);
-v2i64 __builtin_msa_dpsub_s_d (v2i64, v4i32, v4i32);

+@code{acc} arguments are evaluated at compile time and specify the number
+of an accumulator register.  For example, an @code{acc} argument of 2
+selects the ACC2 register.

-v8i16 __builtin_msa_dpsub_u_h (v8i16, v16u8, v16u8);
-v4i32 __builtin_msa_dpsub_u_w (v4i32, v8u16, v8u16);
-v2i64 __builtin_msa_dpsub_u_d (v2i64, v4u32, v4u32);

+@code{iacc} arguments are similar to @code{acc} arguments but specify the
+number of an IACC register.  See @ref{Other Built-in Functions}
+for more details.
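+As a purely illustrative sketch of how the pseudo types read in
+practice (it uses two directly-mapped functions from the following
+sections, @code{__SMUL} and @code{__MCLRACC}; the @code{__MCLRACC}
+call is included only to show an @code{acc} argument and is otherwise
+unrelated to the multiply):
+
+@smallexample
+long long
+scaled_product (int a, int b)   /* a and b are sw1 run-time values.  */
+@{
+  __MCLRACC (2);                /* acc argument: constant 2 selects ACC2.  */
+  return __SMUL (a, b);         /* Returns an sw2 (long long) value.  */
+@}
+@end smallexample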
-v4f32 __builtin_msa_fadd_w (v4f32, v4f32); -v2f64 __builtin_msa_fadd_d (v2f64, v2f64); +@node Directly-mapped Integer Functions +@subsubsection Directly-Mapped Integer Functions -v4i32 __builtin_msa_fcaf_w (v4f32, v4f32); -v2i64 __builtin_msa_fcaf_d (v2f64, v2f64); +The functions listed below map directly to FR-V I-type instructions. -v4i32 __builtin_msa_fceq_w (v4f32, v4f32); -v2i64 __builtin_msa_fceq_d (v2f64, v2f64); +@multitable @columnfractions .45 .32 .23 +@headitem Function prototype @tab Example usage @tab Assembly output +@item @code{sw1 __ADDSS (sw1, sw1)} +@tab @code{@var{c} = __ADDSS (@var{a}, @var{b})} +@tab @code{ADDSS @var{a},@var{b},@var{c}} +@item @code{sw1 __SCAN (sw1, sw1)} +@tab @code{@var{c} = __SCAN (@var{a}, @var{b})} +@tab @code{SCAN @var{a},@var{b},@var{c}} +@item @code{sw1 __SCUTSS (sw1)} +@tab @code{@var{b} = __SCUTSS (@var{a})} +@tab @code{SCUTSS @var{a},@var{b}} +@item @code{sw1 __SLASS (sw1, sw1)} +@tab @code{@var{c} = __SLASS (@var{a}, @var{b})} +@tab @code{SLASS @var{a},@var{b},@var{c}} +@item @code{void __SMASS (sw1, sw1)} +@tab @code{__SMASS (@var{a}, @var{b})} +@tab @code{SMASS @var{a},@var{b}} +@item @code{void __SMSSS (sw1, sw1)} +@tab @code{__SMSSS (@var{a}, @var{b})} +@tab @code{SMSSS @var{a},@var{b}} +@item @code{void __SMU (sw1, sw1)} +@tab @code{__SMU (@var{a}, @var{b})} +@tab @code{SMU @var{a},@var{b}} +@item @code{sw2 __SMUL (sw1, sw1)} +@tab @code{@var{c} = __SMUL (@var{a}, @var{b})} +@tab @code{SMUL @var{a},@var{b},@var{c}} +@item @code{sw1 __SUBSS (sw1, sw1)} +@tab @code{@var{c} = __SUBSS (@var{a}, @var{b})} +@tab @code{SUBSS @var{a},@var{b},@var{c}} +@item @code{uw2 __UMUL (uw1, uw1)} +@tab @code{@var{c} = __UMUL (@var{a}, @var{b})} +@tab @code{UMUL @var{a},@var{b},@var{c}} +@end multitable -v4i32 __builtin_msa_fclass_w (v4f32); -v2i64 __builtin_msa_fclass_d (v2f64); +@node Directly-mapped Media Functions +@subsubsection Directly-Mapped Media Functions -v4i32 __builtin_msa_fcle_w (v4f32, v4f32); -v2i64 __builtin_msa_fcle_d (v2f64, v2f64); +The functions listed below map directly to FR-V M-type instructions. 
-v4i32 __builtin_msa_fclt_w (v4f32, v4f32); -v2i64 __builtin_msa_fclt_d (v2f64, v2f64); +@multitable @columnfractions .45 .32 .23 +@headitem Function prototype @tab Example usage @tab Assembly output +@item @code{uw1 __MABSHS (sw1)} +@tab @code{@var{b} = __MABSHS (@var{a})} +@tab @code{MABSHS @var{a},@var{b}} +@item @code{void __MADDACCS (acc, acc)} +@tab @code{__MADDACCS (@var{b}, @var{a})} +@tab @code{MADDACCS @var{a},@var{b}} +@item @code{sw1 __MADDHSS (sw1, sw1)} +@tab @code{@var{c} = __MADDHSS (@var{a}, @var{b})} +@tab @code{MADDHSS @var{a},@var{b},@var{c}} +@item @code{uw1 __MADDHUS (uw1, uw1)} +@tab @code{@var{c} = __MADDHUS (@var{a}, @var{b})} +@tab @code{MADDHUS @var{a},@var{b},@var{c}} +@item @code{uw1 __MAND (uw1, uw1)} +@tab @code{@var{c} = __MAND (@var{a}, @var{b})} +@tab @code{MAND @var{a},@var{b},@var{c}} +@item @code{void __MASACCS (acc, acc)} +@tab @code{__MASACCS (@var{b}, @var{a})} +@tab @code{MASACCS @var{a},@var{b}} +@item @code{uw1 __MAVEH (uw1, uw1)} +@tab @code{@var{c} = __MAVEH (@var{a}, @var{b})} +@tab @code{MAVEH @var{a},@var{b},@var{c}} +@item @code{uw2 __MBTOH (uw1)} +@tab @code{@var{b} = __MBTOH (@var{a})} +@tab @code{MBTOH @var{a},@var{b}} +@item @code{void __MBTOHE (uw1 *, uw1)} +@tab @code{__MBTOHE (&@var{b}, @var{a})} +@tab @code{MBTOHE @var{a},@var{b}} +@item @code{void __MCLRACC (acc)} +@tab @code{__MCLRACC (@var{a})} +@tab @code{MCLRACC @var{a}} +@item @code{void __MCLRACCA (void)} +@tab @code{__MCLRACCA ()} +@tab @code{MCLRACCA} +@item @code{uw1 __Mcop1 (uw1, uw1)} +@tab @code{@var{c} = __Mcop1 (@var{a}, @var{b})} +@tab @code{Mcop1 @var{a},@var{b},@var{c}} +@item @code{uw1 __Mcop2 (uw1, uw1)} +@tab @code{@var{c} = __Mcop2 (@var{a}, @var{b})} +@tab @code{Mcop2 @var{a},@var{b},@var{c}} +@item @code{uw1 __MCPLHI (uw2, const)} +@tab @code{@var{c} = __MCPLHI (@var{a}, @var{b})} +@tab @code{MCPLHI @var{a},#@var{b},@var{c}} +@item @code{uw1 __MCPLI (uw2, const)} +@tab @code{@var{c} = __MCPLI (@var{a}, @var{b})} +@tab @code{MCPLI @var{a},#@var{b},@var{c}} +@item @code{void __MCPXIS (acc, sw1, sw1)} +@tab @code{__MCPXIS (@var{c}, @var{a}, @var{b})} +@tab @code{MCPXIS @var{a},@var{b},@var{c}} +@item @code{void __MCPXIU (acc, uw1, uw1)} +@tab @code{__MCPXIU (@var{c}, @var{a}, @var{b})} +@tab @code{MCPXIU @var{a},@var{b},@var{c}} +@item @code{void __MCPXRS (acc, sw1, sw1)} +@tab @code{__MCPXRS (@var{c}, @var{a}, @var{b})} +@tab @code{MCPXRS @var{a},@var{b},@var{c}} +@item @code{void __MCPXRU (acc, uw1, uw1)} +@tab @code{__MCPXRU (@var{c}, @var{a}, @var{b})} +@tab @code{MCPXRU @var{a},@var{b},@var{c}} +@item @code{uw1 __MCUT (acc, uw1)} +@tab @code{@var{c} = __MCUT (@var{a}, @var{b})} +@tab @code{MCUT @var{a},@var{b},@var{c}} +@item @code{uw1 __MCUTSS (acc, sw1)} +@tab @code{@var{c} = __MCUTSS (@var{a}, @var{b})} +@tab @code{MCUTSS @var{a},@var{b},@var{c}} +@item @code{void __MDADDACCS (acc, acc)} +@tab @code{__MDADDACCS (@var{b}, @var{a})} +@tab @code{MDADDACCS @var{a},@var{b}} +@item @code{void __MDASACCS (acc, acc)} +@tab @code{__MDASACCS (@var{b}, @var{a})} +@tab @code{MDASACCS @var{a},@var{b}} +@item @code{uw2 __MDCUTSSI (acc, const)} +@tab @code{@var{c} = __MDCUTSSI (@var{a}, @var{b})} +@tab @code{MDCUTSSI @var{a},#@var{b},@var{c}} +@item @code{uw2 __MDPACKH (uw2, uw2)} +@tab @code{@var{c} = __MDPACKH (@var{a}, @var{b})} +@tab @code{MDPACKH @var{a},@var{b},@var{c}} +@item @code{uw2 __MDROTLI (uw2, const)} +@tab @code{@var{c} = __MDROTLI (@var{a}, @var{b})} +@tab @code{MDROTLI @var{a},#@var{b},@var{c}} +@item @code{void __MDSUBACCS (acc, acc)} +@tab 
@code{__MDSUBACCS (@var{b}, @var{a})} +@tab @code{MDSUBACCS @var{a},@var{b}} +@item @code{void __MDUNPACKH (uw1 *, uw2)} +@tab @code{__MDUNPACKH (&@var{b}, @var{a})} +@tab @code{MDUNPACKH @var{a},@var{b}} +@item @code{uw2 __MEXPDHD (uw1, const)} +@tab @code{@var{c} = __MEXPDHD (@var{a}, @var{b})} +@tab @code{MEXPDHD @var{a},#@var{b},@var{c}} +@item @code{uw1 __MEXPDHW (uw1, const)} +@tab @code{@var{c} = __MEXPDHW (@var{a}, @var{b})} +@tab @code{MEXPDHW @var{a},#@var{b},@var{c}} +@item @code{uw1 __MHDSETH (uw1, const)} +@tab @code{@var{c} = __MHDSETH (@var{a}, @var{b})} +@tab @code{MHDSETH @var{a},#@var{b},@var{c}} +@item @code{sw1 __MHDSETS (const)} +@tab @code{@var{b} = __MHDSETS (@var{a})} +@tab @code{MHDSETS #@var{a},@var{b}} +@item @code{uw1 __MHSETHIH (uw1, const)} +@tab @code{@var{b} = __MHSETHIH (@var{b}, @var{a})} +@tab @code{MHSETHIH #@var{a},@var{b}} +@item @code{sw1 __MHSETHIS (sw1, const)} +@tab @code{@var{b} = __MHSETHIS (@var{b}, @var{a})} +@tab @code{MHSETHIS #@var{a},@var{b}} +@item @code{uw1 __MHSETLOH (uw1, const)} +@tab @code{@var{b} = __MHSETLOH (@var{b}, @var{a})} +@tab @code{MHSETLOH #@var{a},@var{b}} +@item @code{sw1 __MHSETLOS (sw1, const)} +@tab @code{@var{b} = __MHSETLOS (@var{b}, @var{a})} +@tab @code{MHSETLOS #@var{a},@var{b}} +@item @code{uw1 __MHTOB (uw2)} +@tab @code{@var{b} = __MHTOB (@var{a})} +@tab @code{MHTOB @var{a},@var{b}} +@item @code{void __MMACHS (acc, sw1, sw1)} +@tab @code{__MMACHS (@var{c}, @var{a}, @var{b})} +@tab @code{MMACHS @var{a},@var{b},@var{c}} +@item @code{void __MMACHU (acc, uw1, uw1)} +@tab @code{__MMACHU (@var{c}, @var{a}, @var{b})} +@tab @code{MMACHU @var{a},@var{b},@var{c}} +@item @code{void __MMRDHS (acc, sw1, sw1)} +@tab @code{__MMRDHS (@var{c}, @var{a}, @var{b})} +@tab @code{MMRDHS @var{a},@var{b},@var{c}} +@item @code{void __MMRDHU (acc, uw1, uw1)} +@tab @code{__MMRDHU (@var{c}, @var{a}, @var{b})} +@tab @code{MMRDHU @var{a},@var{b},@var{c}} +@item @code{void __MMULHS (acc, sw1, sw1)} +@tab @code{__MMULHS (@var{c}, @var{a}, @var{b})} +@tab @code{MMULHS @var{a},@var{b},@var{c}} +@item @code{void __MMULHU (acc, uw1, uw1)} +@tab @code{__MMULHU (@var{c}, @var{a}, @var{b})} +@tab @code{MMULHU @var{a},@var{b},@var{c}} +@item @code{void __MMULXHS (acc, sw1, sw1)} +@tab @code{__MMULXHS (@var{c}, @var{a}, @var{b})} +@tab @code{MMULXHS @var{a},@var{b},@var{c}} +@item @code{void __MMULXHU (acc, uw1, uw1)} +@tab @code{__MMULXHU (@var{c}, @var{a}, @var{b})} +@tab @code{MMULXHU @var{a},@var{b},@var{c}} +@item @code{uw1 __MNOT (uw1)} +@tab @code{@var{b} = __MNOT (@var{a})} +@tab @code{MNOT @var{a},@var{b}} +@item @code{uw1 __MOR (uw1, uw1)} +@tab @code{@var{c} = __MOR (@var{a}, @var{b})} +@tab @code{MOR @var{a},@var{b},@var{c}} +@item @code{uw1 __MPACKH (uh, uh)} +@tab @code{@var{c} = __MPACKH (@var{a}, @var{b})} +@tab @code{MPACKH @var{a},@var{b},@var{c}} +@item @code{sw2 __MQADDHSS (sw2, sw2)} +@tab @code{@var{c} = __MQADDHSS (@var{a}, @var{b})} +@tab @code{MQADDHSS @var{a},@var{b},@var{c}} +@item @code{uw2 __MQADDHUS (uw2, uw2)} +@tab @code{@var{c} = __MQADDHUS (@var{a}, @var{b})} +@tab @code{MQADDHUS @var{a},@var{b},@var{c}} +@item @code{void __MQCPXIS (acc, sw2, sw2)} +@tab @code{__MQCPXIS (@var{c}, @var{a}, @var{b})} +@tab @code{MQCPXIS @var{a},@var{b},@var{c}} +@item @code{void __MQCPXIU (acc, uw2, uw2)} +@tab @code{__MQCPXIU (@var{c}, @var{a}, @var{b})} +@tab @code{MQCPXIU @var{a},@var{b},@var{c}} +@item @code{void __MQCPXRS (acc, sw2, sw2)} +@tab @code{__MQCPXRS (@var{c}, @var{a}, @var{b})} +@tab @code{MQCPXRS @var{a},@var{b},@var{c}} 
+@item @code{void __MQCPXRU (acc, uw2, uw2)} +@tab @code{__MQCPXRU (@var{c}, @var{a}, @var{b})} +@tab @code{MQCPXRU @var{a},@var{b},@var{c}} +@item @code{sw2 __MQLCLRHS (sw2, sw2)} +@tab @code{@var{c} = __MQLCLRHS (@var{a}, @var{b})} +@tab @code{MQLCLRHS @var{a},@var{b},@var{c}} +@item @code{sw2 __MQLMTHS (sw2, sw2)} +@tab @code{@var{c} = __MQLMTHS (@var{a}, @var{b})} +@tab @code{MQLMTHS @var{a},@var{b},@var{c}} +@item @code{void __MQMACHS (acc, sw2, sw2)} +@tab @code{__MQMACHS (@var{c}, @var{a}, @var{b})} +@tab @code{MQMACHS @var{a},@var{b},@var{c}} +@item @code{void __MQMACHU (acc, uw2, uw2)} +@tab @code{__MQMACHU (@var{c}, @var{a}, @var{b})} +@tab @code{MQMACHU @var{a},@var{b},@var{c}} +@item @code{void __MQMACXHS (acc, sw2, sw2)} +@tab @code{__MQMACXHS (@var{c}, @var{a}, @var{b})} +@tab @code{MQMACXHS @var{a},@var{b},@var{c}} +@item @code{void __MQMULHS (acc, sw2, sw2)} +@tab @code{__MQMULHS (@var{c}, @var{a}, @var{b})} +@tab @code{MQMULHS @var{a},@var{b},@var{c}} +@item @code{void __MQMULHU (acc, uw2, uw2)} +@tab @code{__MQMULHU (@var{c}, @var{a}, @var{b})} +@tab @code{MQMULHU @var{a},@var{b},@var{c}} +@item @code{void __MQMULXHS (acc, sw2, sw2)} +@tab @code{__MQMULXHS (@var{c}, @var{a}, @var{b})} +@tab @code{MQMULXHS @var{a},@var{b},@var{c}} +@item @code{void __MQMULXHU (acc, uw2, uw2)} +@tab @code{__MQMULXHU (@var{c}, @var{a}, @var{b})} +@tab @code{MQMULXHU @var{a},@var{b},@var{c}} +@item @code{sw2 __MQSATHS (sw2, sw2)} +@tab @code{@var{c} = __MQSATHS (@var{a}, @var{b})} +@tab @code{MQSATHS @var{a},@var{b},@var{c}} +@item @code{uw2 __MQSLLHI (uw2, int)} +@tab @code{@var{c} = __MQSLLHI (@var{a}, @var{b})} +@tab @code{MQSLLHI @var{a},@var{b},@var{c}} +@item @code{sw2 __MQSRAHI (sw2, int)} +@tab @code{@var{c} = __MQSRAHI (@var{a}, @var{b})} +@tab @code{MQSRAHI @var{a},@var{b},@var{c}} +@item @code{sw2 __MQSUBHSS (sw2, sw2)} +@tab @code{@var{c} = __MQSUBHSS (@var{a}, @var{b})} +@tab @code{MQSUBHSS @var{a},@var{b},@var{c}} +@item @code{uw2 __MQSUBHUS (uw2, uw2)} +@tab @code{@var{c} = __MQSUBHUS (@var{a}, @var{b})} +@tab @code{MQSUBHUS @var{a},@var{b},@var{c}} +@item @code{void __MQXMACHS (acc, sw2, sw2)} +@tab @code{__MQXMACHS (@var{c}, @var{a}, @var{b})} +@tab @code{MQXMACHS @var{a},@var{b},@var{c}} +@item @code{void __MQXMACXHS (acc, sw2, sw2)} +@tab @code{__MQXMACXHS (@var{c}, @var{a}, @var{b})} +@tab @code{MQXMACXHS @var{a},@var{b},@var{c}} +@item @code{uw1 __MRDACC (acc)} +@tab @code{@var{b} = __MRDACC (@var{a})} +@tab @code{MRDACC @var{a},@var{b}} +@item @code{uw1 __MRDACCG (acc)} +@tab @code{@var{b} = __MRDACCG (@var{a})} +@tab @code{MRDACCG @var{a},@var{b}} +@item @code{uw1 __MROTLI (uw1, const)} +@tab @code{@var{c} = __MROTLI (@var{a}, @var{b})} +@tab @code{MROTLI @var{a},#@var{b},@var{c}} +@item @code{uw1 __MROTRI (uw1, const)} +@tab @code{@var{c} = __MROTRI (@var{a}, @var{b})} +@tab @code{MROTRI @var{a},#@var{b},@var{c}} +@item @code{sw1 __MSATHS (sw1, sw1)} +@tab @code{@var{c} = __MSATHS (@var{a}, @var{b})} +@tab @code{MSATHS @var{a},@var{b},@var{c}} +@item @code{uw1 __MSATHU (uw1, uw1)} +@tab @code{@var{c} = __MSATHU (@var{a}, @var{b})} +@tab @code{MSATHU @var{a},@var{b},@var{c}} +@item @code{uw1 __MSLLHI (uw1, const)} +@tab @code{@var{c} = __MSLLHI (@var{a}, @var{b})} +@tab @code{MSLLHI @var{a},#@var{b},@var{c}} +@item @code{sw1 __MSRAHI (sw1, const)} +@tab @code{@var{c} = __MSRAHI (@var{a}, @var{b})} +@tab @code{MSRAHI @var{a},#@var{b},@var{c}} +@item @code{uw1 __MSRLHI (uw1, const)} +@tab @code{@var{c} = __MSRLHI (@var{a}, @var{b})} +@tab @code{MSRLHI 
@var{a},#@var{b},@var{c}}
+@item @code{void __MSUBACCS (acc, acc)}
+@tab @code{__MSUBACCS (@var{b}, @var{a})}
+@tab @code{MSUBACCS @var{a},@var{b}}
+@item @code{sw1 __MSUBHSS (sw1, sw1)}
+@tab @code{@var{c} = __MSUBHSS (@var{a}, @var{b})}
+@tab @code{MSUBHSS @var{a},@var{b},@var{c}}
+@item @code{uw1 __MSUBHUS (uw1, uw1)}
+@tab @code{@var{c} = __MSUBHUS (@var{a}, @var{b})}
+@tab @code{MSUBHUS @var{a},@var{b},@var{c}}
+@item @code{void __MTRAP (void)}
+@tab @code{__MTRAP ()}
+@tab @code{MTRAP}
+@item @code{uw2 __MUNPACKH (uw1)}
+@tab @code{@var{b} = __MUNPACKH (@var{a})}
+@tab @code{MUNPACKH @var{a},@var{b}}
+@item @code{uw1 __MWCUT (uw2, uw1)}
+@tab @code{@var{c} = __MWCUT (@var{a}, @var{b})}
+@tab @code{MWCUT @var{a},@var{b},@var{c}}
+@item @code{void __MWTACC (acc, uw1)}
+@tab @code{__MWTACC (@var{b}, @var{a})}
+@tab @code{MWTACC @var{a},@var{b}}
+@item @code{void __MWTACCG (acc, uw1)}
+@tab @code{__MWTACCG (@var{b}, @var{a})}
+@tab @code{MWTACCG @var{a},@var{b}}
+@item @code{uw1 __MXOR (uw1, uw1)}
+@tab @code{@var{c} = __MXOR (@var{a}, @var{b})}
+@tab @code{MXOR @var{a},@var{b},@var{c}}
+@end multitable
-v4i32 __builtin_msa_fcne_w (v4f32, v4f32);
-v2i64 __builtin_msa_fcne_d (v2f64, v2f64);
+@node Raw read/write Functions
+@subsubsection Raw Read/Write Functions
-v4i32 __builtin_msa_fcor_w (v4f32, v4f32);
-v2i64 __builtin_msa_fcor_d (v2f64, v2f64);
+This section describes built-in functions related to the read and write
+instructions that access memory.  These functions generate
+@code{membar} instructions to flush the I/O loads and stores where
+appropriate, as described in the Fujitsu manual cited above.
-v4i32 __builtin_msa_fcueq_w (v4f32, v4f32);
-v2i64 __builtin_msa_fcueq_d (v2f64, v2f64);
+@table @code
-v4i32 __builtin_msa_fcule_w (v4f32, v4f32);
-v2i64 __builtin_msa_fcule_d (v2f64, v2f64);
+@item unsigned char __builtin_read8 (void *@var{data})
+@item unsigned short __builtin_read16 (void *@var{data})
+@item unsigned long __builtin_read32 (void *@var{data})
+@item unsigned long long __builtin_read64 (void *@var{data})
-v4i32 __builtin_msa_fcult_w (v4f32, v4f32);
-v2i64 __builtin_msa_fcult_d (v2f64, v2f64);
+@item void __builtin_write8 (void *@var{data}, unsigned char @var{datum})
+@item void __builtin_write16 (void *@var{data}, unsigned short @var{datum})
+@item void __builtin_write32 (void *@var{data}, unsigned long @var{datum})
+@item void __builtin_write64 (void *@var{data}, unsigned long long @var{datum})
+@end table
-v4i32 __builtin_msa_fcun_w (v4f32, v4f32);
-v2i64 __builtin_msa_fcun_d (v2f64, v2f64);
+@node Other Built-in Functions
+@subsubsection Other Built-in Functions
-v4i32 __builtin_msa_fcune_w (v4f32, v4f32);
-v2i64 __builtin_msa_fcune_d (v2f64, v2f64);
+This section describes built-in functions that are not named after
+a specific FR-V instruction.
-v4f32 __builtin_msa_fdiv_w (v4f32, v4f32);
-v2f64 __builtin_msa_fdiv_d (v2f64, v2f64);
+@table @code
+@item sw2 __IACCreadll (iacc @var{reg})
+Return the full 64-bit value of IACC0@.  The @var{reg} argument is reserved
+for future expansion and must be 0.
-v8i16 __builtin_msa_fexdo_h (v4f32, v4f32);
-v4f32 __builtin_msa_fexdo_w (v2f64, v2f64);
+@item sw1 __IACCreadl (iacc @var{reg})
+Return the value of IACC0H if @var{reg} is 0 and IACC0L if @var{reg} is 1.
+Other values of @var{reg} are rejected as invalid.
-v4f32 __builtin_msa_fexp2_w (v4f32, v4i32);
-v2f64 __builtin_msa_fexp2_d (v2f64, v2i64);
+@item void __IACCsetll (iacc @var{reg}, sw2 @var{x})
+Set the full 64-bit value of IACC0 to @var{x}.
The @var{reg} argument +is reserved for future expansion and must be 0. -v4f32 __builtin_msa_fexupl_w (v8i16); -v2f64 __builtin_msa_fexupl_d (v4f32); +@item void __IACCsetl (iacc @var{reg}, sw1 @var{x}) +Set IACC0H to @var{x} if @var{reg} is 0 and IACC0L to @var{x} if @var{reg} +is 1. Other values of @var{reg} are rejected as invalid. -v4f32 __builtin_msa_fexupr_w (v8i16); -v2f64 __builtin_msa_fexupr_d (v4f32); +@item void __data_prefetch0 (const void *@var{x}) +Use the @code{dcpl} instruction to load the contents of address @var{x} +into the data cache. -v4f32 __builtin_msa_ffint_s_w (v4i32); -v2f64 __builtin_msa_ffint_s_d (v2i64); +@item void __data_prefetch (const void *@var{x}) +Use the @code{nldub} instruction to load the contents of address @var{x} +into the data cache. The instruction is issued in slot I1@. +@end table -v4f32 __builtin_msa_ffint_u_w (v4u32); -v2f64 __builtin_msa_ffint_u_d (v2u64); +@node LoongArch Base Built-in Functions +@subsection LoongArch Base Built-in Functions -v4f32 __builtin_msa_ffql_w (v8i16); -v2f64 __builtin_msa_ffql_d (v4i32); +These built-in functions are available for LoongArch. -v4f32 __builtin_msa_ffqr_w (v8i16); -v2f64 __builtin_msa_ffqr_d (v4i32); +Data Type Description: +@itemize +@item @code{imm0_31}, a compile-time constant in range 0 to 31; +@item @code{imm0_16383}, a compile-time constant in range 0 to 16383; +@item @code{imm0_32767}, a compile-time constant in range 0 to 32767; +@item @code{imm_n2048_2047}, a compile-time constant in range -2048 to 2047; +@end itemize -v16i8 __builtin_msa_fill_b (i32); -v8i16 __builtin_msa_fill_h (i32); -v4i32 __builtin_msa_fill_w (i32); -v2i64 __builtin_msa_fill_d (i64); +The intrinsics provided are listed below: +@smallexample + unsigned int __builtin_loongarch_movfcsr2gr (imm0_31) + void __builtin_loongarch_movgr2fcsr (imm0_31, unsigned int) + void __builtin_loongarch_cacop_d (imm0_31, unsigned long int, imm_n2048_2047) + unsigned int __builtin_loongarch_cpucfg (unsigned int) + void __builtin_loongarch_asrtle_d (long int, long int) + void __builtin_loongarch_asrtgt_d (long int, long int) + long int __builtin_loongarch_lddir_d (long int, imm0_31) + void __builtin_loongarch_ldpte_d (long int, imm0_31) -v4f32 __builtin_msa_flog2_w (v4f32); -v2f64 __builtin_msa_flog2_d (v2f64); + int __builtin_loongarch_crc_w_b_w (char, int) + int __builtin_loongarch_crc_w_h_w (short, int) + int __builtin_loongarch_crc_w_w_w (int, int) + int __builtin_loongarch_crc_w_d_w (long int, int) + int __builtin_loongarch_crcc_w_b_w (char, int) + int __builtin_loongarch_crcc_w_h_w (short, int) + int __builtin_loongarch_crcc_w_w_w (int, int) + int __builtin_loongarch_crcc_w_d_w (long int, int) -v4f32 __builtin_msa_fmadd_w (v4f32, v4f32, v4f32); -v2f64 __builtin_msa_fmadd_d (v2f64, v2f64, v2f64); + unsigned int __builtin_loongarch_csrrd_w (imm0_16383) + unsigned int __builtin_loongarch_csrwr_w (unsigned int, imm0_16383) + unsigned int __builtin_loongarch_csrxchg_w (unsigned int, unsigned int, imm0_16383) + unsigned long int __builtin_loongarch_csrrd_d (imm0_16383) + unsigned long int __builtin_loongarch_csrwr_d (unsigned long int, imm0_16383) + unsigned long int __builtin_loongarch_csrxchg_d (unsigned long int, unsigned long int, imm0_16383) -v4f32 __builtin_msa_fmax_w (v4f32, v4f32); -v2f64 __builtin_msa_fmax_d (v2f64, v2f64); + unsigned char __builtin_loongarch_iocsrrd_b (unsigned int) + unsigned short __builtin_loongarch_iocsrrd_h (unsigned int) + unsigned int __builtin_loongarch_iocsrrd_w (unsigned int) + unsigned long int 
__builtin_loongarch_iocsrrd_d (unsigned int)
+  void __builtin_loongarch_iocsrwr_b (unsigned char, unsigned int)
+  void __builtin_loongarch_iocsrwr_h (unsigned short, unsigned int)
+  void __builtin_loongarch_iocsrwr_w (unsigned int, unsigned int)
+  void __builtin_loongarch_iocsrwr_d (unsigned long int, unsigned int)
-v4f32 __builtin_msa_fmax_a_w (v4f32, v4f32);
-v2f64 __builtin_msa_fmax_a_d (v2f64, v2f64);
+  void __builtin_loongarch_dbar (imm0_32767)
+  void __builtin_loongarch_ibar (imm0_32767)
-v4f32 __builtin_msa_fmin_w (v4f32, v4f32);
-v2f64 __builtin_msa_fmin_d (v2f64, v2f64);
+  void __builtin_loongarch_syscall (imm0_32767)
+  void __builtin_loongarch_break (imm0_32767)
+@end smallexample
-v4f32 __builtin_msa_fmin_a_w (v4f32, v4f32);
-v2f64 __builtin_msa_fmin_a_d (v2f64, v2f64);
+These intrinsic functions are available when using @option{-mfrecipe}.
+@smallexample
+  float __builtin_loongarch_frecipe_s (float);
+  double __builtin_loongarch_frecipe_d (double);
+  float __builtin_loongarch_frsqrte_s (float);
+  double __builtin_loongarch_frsqrte_d (double);
+@end smallexample
-v4f32 __builtin_msa_fmsub_w (v4f32, v4f32, v4f32);
-v2f64 __builtin_msa_fmsub_d (v2f64, v2f64, v2f64);
+@emph{Note:} The control registers come in 32-bit and 64-bit variants, but
+the access instructions do not distinguish between them, so GCC renames the
+control-register instructions when implementing these intrinsics.
-v4f32 __builtin_msa_fmul_w (v4f32, v4f32);
-v2f64 __builtin_msa_fmul_d (v2f64, v2f64);
+Taking the @code{csrrd} instruction as an example, the built-in functions
+are implemented as follows:
+@smallexample
+  __builtin_loongarch_csrrd_w  // Use when reading a 32-bit control register.
+  __builtin_loongarch_csrrd_d  // Use when reading a 64-bit control register.
+@end smallexample
-v4f32 __builtin_msa_frint_w (v4f32);
-v2f64 __builtin_msa_frint_d (v2f64);
+For convenience, wrapper functions for these built-ins, together with the
+types @code{__drdtime_t} and @code{__rdtime_t}, are defined in
+@code{larchintrin.h}.  To call the following functions you must include
+@code{larchintrin.h}.
-v4f32 __builtin_msa_frcp_w (v4f32); -v2f64 __builtin_msa_frcp_d (v2f64); +@smallexample + typedef struct drdtime@{ + unsigned long dvalue; + unsigned long dtimeid; + @} __drdtime_t; -v4f32 __builtin_msa_frsqrt_w (v4f32); -v2f64 __builtin_msa_frsqrt_d (v2f64); + typedef struct rdtime@{ + unsigned int value; + unsigned int timeid; + @} __rdtime_t; +@end smallexample -v4i32 __builtin_msa_fsaf_w (v4f32, v4f32); -v2i64 __builtin_msa_fsaf_d (v2f64, v2f64); +@smallexample + __drdtime_t __rdtime_d (void) + __rdtime_t __rdtimel_w (void) + __rdtime_t __rdtimeh_w (void) + unsigned int __movfcsr2gr (imm0_31) + void __movgr2fcsr (imm0_31, unsigned int) + void __cacop_d (imm0_31, unsigned long, imm_n2048_2047) + unsigned int __cpucfg (unsigned int) + void __asrtle_d (long int, long int) + void __asrtgt_d (long int, long int) + long int __lddir_d (long int, imm0_31) + void __ldpte_d (long int, imm0_31) -v4i32 __builtin_msa_fseq_w (v4f32, v4f32); -v2i64 __builtin_msa_fseq_d (v2f64, v2f64); + int __crc_w_b_w (char, int) + int __crc_w_h_w (short, int) + int __crc_w_w_w (int, int) + int __crc_w_d_w (long int, int) + int __crcc_w_b_w (char, int) + int __crcc_w_h_w (short, int) + int __crcc_w_w_w (int, int) + int __crcc_w_d_w (long int, int) -v4i32 __builtin_msa_fsle_w (v4f32, v4f32); -v2i64 __builtin_msa_fsle_d (v2f64, v2f64); + unsigned int __csrrd_w (imm0_16383) + unsigned int __csrwr_w (unsigned int, imm0_16383) + unsigned int __csrxchg_w (unsigned int, unsigned int, imm0_16383) + unsigned long __csrrd_d (imm0_16383) + unsigned long __csrwr_d (unsigned long, imm0_16383) + unsigned long __csrxchg_d (unsigned long, unsigned long, imm0_16383) -v4i32 __builtin_msa_fslt_w (v4f32, v4f32); -v2i64 __builtin_msa_fslt_d (v2f64, v2f64); + unsigned char __iocsrrd_b (unsigned int) + unsigned short __iocsrrd_h (unsigned int) + unsigned int __iocsrrd_w (unsigned int) + unsigned long __iocsrrd_d (unsigned int) + void __iocsrwr_b (unsigned char, unsigned int) + void __iocsrwr_h (unsigned short, unsigned int) + void __iocsrwr_w (unsigned int, unsigned int) + void __iocsrwr_d (unsigned long, unsigned int) -v4i32 __builtin_msa_fsne_w (v4f32, v4f32); -v2i64 __builtin_msa_fsne_d (v2f64, v2f64); + void __dbar (imm0_32767) + void __ibar (imm0_32767) -v4i32 __builtin_msa_fsor_w (v4f32, v4f32); -v2i64 __builtin_msa_fsor_d (v2f64, v2f64); + void __syscall (imm0_32767) + void __break (imm0_32767) +@end smallexample -v4f32 __builtin_msa_fsqrt_w (v4f32); -v2f64 __builtin_msa_fsqrt_d (v2f64); +These intrinsic functions are available by including @code{larchintrin.h} and +using @option{-mfrecipe}. +@smallexample + float __frecipe_s (float); + double __frecipe_d (double); + float __frsqrte_s (float); + double __frsqrte_d (double); +@end smallexample -v4f32 __builtin_msa_fsub_w (v4f32, v4f32); -v2f64 __builtin_msa_fsub_d (v2f64, v2f64); +Additional built-in functions are available for LoongArch family +processors to efficiently use 128-bit floating-point (__float128) +values. -v4i32 __builtin_msa_fsueq_w (v4f32, v4f32); -v2i64 __builtin_msa_fsueq_d (v2f64, v2f64); +The following are the basic built-in functions supported. 
+@smallexample
+__float128 __builtin_fabsq (__float128);
+__float128 __builtin_copysignq (__float128, __float128);
+__float128 __builtin_infq (void);
+__float128 __builtin_huge_valq (void);
+__float128 __builtin_nanq (void);
+__float128 __builtin_nansq (void);
+@end smallexample
-v4i32 __builtin_msa_fsule_w (v4f32, v4f32);
-v2i64 __builtin_msa_fsule_d (v2f64, v2f64);
+The following built-in function returns the value that is currently set
+in the @samp{tp} register.
+@smallexample
+  void * __builtin_thread_pointer (void)
+@end smallexample
-v4i32 __builtin_msa_fsult_w (v4f32, v4f32);
-v2i64 __builtin_msa_fsult_d (v2f64, v2f64);
+@node LoongArch SX Vector Intrinsics
+@subsection LoongArch SX Vector Intrinsics
-v4i32 __builtin_msa_fsun_w (v4f32, v4f32);
-v2i64 __builtin_msa_fsun_d (v2f64, v2f64);
+GCC provides intrinsics to access the LSX (Loongson SIMD Extension) instructions.
+The interface is made available by including @code{<lsxintrin.h>} and using
+@option{-mlsx}.
-v4i32 __builtin_msa_fsune_w (v4f32, v4f32);
-v2i64 __builtin_msa_fsune_d (v2f64, v2f64);
+The following vector typedefs are included in @code{lsxintrin.h}:
-v4i32 __builtin_msa_ftint_s_w (v4f32);
-v2i64 __builtin_msa_ftint_s_d (v2f64);
+@itemize
+@item @code{__m128i}, a 128-bit vector of fixed-point values;
+@item @code{__m128}, a 128-bit vector of single-precision floating point;
+@item @code{__m128d}, a 128-bit vector of double-precision floating point.
+@end itemize
-v4u32 __builtin_msa_ftint_u_w (v4f32);
-v2u64 __builtin_msa_ftint_u_d (v2f64);
+Instructions and their corresponding built-ins may place additional
+restrictions on their input and output values; the following pseudo types
+describe the immediate operands used below:
+@itemize
+@item @code{imm0_1}, an integer literal in range 0 to 1;
+@item @code{imm0_3}, an integer literal in range 0 to 3;
+@item @code{imm0_7}, an integer literal in range 0 to 7;
+@item @code{imm0_15}, an integer literal in range 0 to 15;
+@item @code{imm0_31}, an integer literal in range 0 to 31;
+@item @code{imm0_63}, an integer literal in range 0 to 63;
+@item @code{imm0_127}, an integer literal in range 0 to 127;
+@item @code{imm0_255}, an integer literal in range 0 to 255;
+@item @code{imm_n16_15}, an integer literal in range -16 to 15;
+@item @code{imm_n128_127}, an integer literal in range -128 to 127;
+@item @code{imm_n256_255}, an integer literal in range -256 to 255;
+@item @code{imm_n512_511}, an integer literal in range -512 to 511;
+@item @code{imm_n1024_1023}, an integer literal in range -1024 to 1023;
+@item @code{imm_n2048_2047}, an integer literal in range -2048 to 2047.
+@end itemize
-v8i16 __builtin_msa_ftq_h (v4f32, v4f32);
-v4i32 __builtin_msa_ftq_w (v2f64, v2f64);
+For convenience, GCC defines functions @code{__lsx_vrepli_@{b/h/w/d@}} and
+@code{__lsx_b[n]z_@{v/b/h/w/d@}}, which are implemented as follows:
-v4i32 __builtin_msa_ftrunc_s_w (v4f32);
-v2i64 __builtin_msa_ftrunc_s_d (v2f64);
+@smallexample
+a. @code{__lsx_vrepli_@{b/h/w/d@}}: Implements the case where the highest
+   bit of the @code{vldi} instruction's @code{i13} field is 0.
-v4u32 __builtin_msa_ftrunc_u_w (v4f32);
-v2u64 __builtin_msa_ftrunc_u_d (v2f64);
+   i13[12] == 1'b0
+   case i13[11:10] of :
+     2'b00: __lsx_vrepli_b (imm_n512_511)
+     2'b01: __lsx_vrepli_h (imm_n512_511)
+     2'b10: __lsx_vrepli_w (imm_n512_511)
+     2'b11: __lsx_vrepli_d (imm_n512_511)
-v8i16 __builtin_msa_hadd_s_h (v16i8, v16i8);
-v4i32 __builtin_msa_hadd_s_w (v8i16, v8i16);
-v2i64 __builtin_msa_hadd_s_d (v4i32, v4i32);
+b. @code{__lsx_b[n]z_@{v/b/h/w/d@}}: Defined because the @code{vseteqz}
+   class of instructions cannot be used on their own.
-v8u16 __builtin_msa_hadd_u_h (v16u8, v16u8);
-v4u32 __builtin_msa_hadd_u_w (v8u16, v8u16);
-v2u64 __builtin_msa_hadd_u_d (v4u32, v4u32);
+   __lsx_bz_v  => vseteqz.v + bcnez
+   __lsx_bnz_v => vsetnez.v + bcnez
+   __lsx_bz_b  => vsetanyeqz.b + bcnez
+   __lsx_bz_h  => vsetanyeqz.h + bcnez
+   __lsx_bz_w  => vsetanyeqz.w + bcnez
+   __lsx_bz_d  => vsetanyeqz.d + bcnez
+   __lsx_bnz_b => vsetallnez.b + bcnez
+   __lsx_bnz_h => vsetallnez.h + bcnez
+   __lsx_bnz_w => vsetallnez.w + bcnez
+   __lsx_bnz_d => vsetallnez.d + bcnez
+@end smallexample
-v8i16 __builtin_msa_hsub_s_h (v16i8, v16i8);
-v4i32 __builtin_msa_hsub_s_w (v8i16, v8i16);
-v2i64 __builtin_msa_hsub_s_d (v4i32, v4i32);
+@smallexample
+Example:
+  #include <lsxintrin.h>
+  #include <stdio.h>
-v8i16 __builtin_msa_hsub_u_h (v16u8, v16u8);
-v4i32 __builtin_msa_hsub_u_w (v8u16, v8u16);
-v2i64 __builtin_msa_hsub_u_d (v4u32, v4u32);
+  extern __m128i @var{a};
-v16i8 __builtin_msa_ilvev_b (v16i8, v16i8);
-v8i16 __builtin_msa_ilvev_h (v8i16, v8i16);
-v4i32 __builtin_msa_ilvev_w (v4i32, v4i32);
-v2i64 __builtin_msa_ilvev_d (v2i64, v2i64);
+  void
+  test (void)
+  @{
+    if (__lsx_bz_v (@var{a}))
+      printf ("1\n");
+    else
+      printf ("2\n");
+  @}
+@end smallexample
-v16i8 __builtin_msa_ilvl_b (v16i8, v16i8);
-v8i16 __builtin_msa_ilvl_h (v8i16, v8i16);
-v4i32 __builtin_msa_ilvl_w (v4i32, v4i32);
-v2i64 __builtin_msa_ilvl_d (v2i64, v2i64);
+@emph{Note:} For instructions where the destination operand is also a
+source operand (that is, only part of the destination register's bit-field
+is modified), the first argument of the built-in function is used as the
+destination operand.
-v16i8 __builtin_msa_ilvod_b (v16i8, v16i8);
-v8i16 __builtin_msa_ilvod_h (v8i16, v8i16);
-v4i32 __builtin_msa_ilvod_w (v4i32, v4i32);
-v2i64 __builtin_msa_ilvod_d (v2i64, v2i64);
+@smallexample
+Example:
+  #include <lsxintrin.h>
-v16i8 __builtin_msa_ilvr_b (v16i8, v16i8);
-v8i16 __builtin_msa_ilvr_h (v8i16, v8i16);
-v4i32 __builtin_msa_ilvr_w (v4i32, v4i32);
-v2i64 __builtin_msa_ilvr_d (v2i64, v2i64);
+  extern __m128i @var{dst};
+  extern int @var{src};
-v16i8 __builtin_msa_insert_b (v16i8, imm0_15, i32);
-v8i16 __builtin_msa_insert_h (v8i16, imm0_7, i32);
-v4i32 __builtin_msa_insert_w (v4i32, imm0_3, i32);
-v2i64 __builtin_msa_insert_d (v2i64, imm0_1, i64);
+  void
+  test (void)
+  @{
+    @var{dst} = __lsx_vinsgr2vr_b (@var{dst}, @var{src}, 3);
+  @}
+@end smallexample
-v16i8 __builtin_msa_insve_b (v16i8, imm0_15, v16i8);
-v8i16 __builtin_msa_insve_h (v8i16, imm0_7, v8i16);
-v4i32 __builtin_msa_insve_w (v4i32, imm0_3, v4i32);
-v2i64 __builtin_msa_insve_d (v2i64, imm0_1, v2i64);
+The intrinsics provided are listed below:
+@smallexample
+int __lsx_bnz_b (__m128i);
+int __lsx_bnz_d (__m128i);
+int __lsx_bnz_h (__m128i);
+int __lsx_bnz_v (__m128i);
+int __lsx_bnz_w (__m128i);
+int __lsx_bz_b (__m128i);
+int __lsx_bz_d (__m128i);
+int __lsx_bz_h (__m128i);
+int __lsx_bz_v (__m128i);
+int __lsx_bz_w (__m128i);
+__m128i __lsx_vabsd_b (__m128i, __m128i);
+__m128i __lsx_vabsd_bu (__m128i, __m128i);
+__m128i __lsx_vabsd_d (__m128i, __m128i);
+__m128i __lsx_vabsd_du (__m128i, __m128i);
+__m128i __lsx_vabsd_h (__m128i, __m128i);
+__m128i __lsx_vabsd_hu (__m128i, __m128i);
+__m128i __lsx_vabsd_w (__m128i, __m128i);
+__m128i __lsx_vabsd_wu (__m128i, __m128i);
+__m128i __lsx_vadda_b (__m128i, __m128i);
+__m128i __lsx_vadda_d (__m128i, __m128i);
+__m128i __lsx_vadda_h (__m128i, __m128i);
+__m128i __lsx_vadda_w (__m128i, __m128i);
+__m128i __lsx_vadd_b (__m128i, __m128i);
+__m128i __lsx_vadd_d (__m128i, __m128i);
+__m128i __lsx_vadd_h (__m128i, __m128i);
+__m128i __lsx_vaddi_bu (__m128i, imm0_31);
+__m128i
__lsx_vaddi_du (__m128i, imm0_31); +__m128i __lsx_vaddi_hu (__m128i, imm0_31); +__m128i __lsx_vaddi_wu (__m128i, imm0_31); +__m128i __lsx_vadd_q (__m128i, __m128i); +__m128i __lsx_vadd_w (__m128i, __m128i); +__m128i __lsx_vaddwev_d_w (__m128i, __m128i); +__m128i __lsx_vaddwev_d_wu (__m128i, __m128i); +__m128i __lsx_vaddwev_d_wu_w (__m128i, __m128i); +__m128i __lsx_vaddwev_h_b (__m128i, __m128i); +__m128i __lsx_vaddwev_h_bu (__m128i, __m128i); +__m128i __lsx_vaddwev_h_bu_b (__m128i, __m128i); +__m128i __lsx_vaddwev_q_d (__m128i, __m128i); +__m128i __lsx_vaddwev_q_du (__m128i, __m128i); +__m128i __lsx_vaddwev_q_du_d (__m128i, __m128i); +__m128i __lsx_vaddwev_w_h (__m128i, __m128i); +__m128i __lsx_vaddwev_w_hu (__m128i, __m128i); +__m128i __lsx_vaddwev_w_hu_h (__m128i, __m128i); +__m128i __lsx_vaddwod_d_w (__m128i, __m128i); +__m128i __lsx_vaddwod_d_wu (__m128i, __m128i); +__m128i __lsx_vaddwod_d_wu_w (__m128i, __m128i); +__m128i __lsx_vaddwod_h_b (__m128i, __m128i); +__m128i __lsx_vaddwod_h_bu (__m128i, __m128i); +__m128i __lsx_vaddwod_h_bu_b (__m128i, __m128i); +__m128i __lsx_vaddwod_q_d (__m128i, __m128i); +__m128i __lsx_vaddwod_q_du (__m128i, __m128i); +__m128i __lsx_vaddwod_q_du_d (__m128i, __m128i); +__m128i __lsx_vaddwod_w_h (__m128i, __m128i); +__m128i __lsx_vaddwod_w_hu (__m128i, __m128i); +__m128i __lsx_vaddwod_w_hu_h (__m128i, __m128i); +__m128i __lsx_vandi_b (__m128i, imm0_255); +__m128i __lsx_vandn_v (__m128i, __m128i); +__m128i __lsx_vand_v (__m128i, __m128i); +__m128i __lsx_vavg_b (__m128i, __m128i); +__m128i __lsx_vavg_bu (__m128i, __m128i); +__m128i __lsx_vavg_d (__m128i, __m128i); +__m128i __lsx_vavg_du (__m128i, __m128i); +__m128i __lsx_vavg_h (__m128i, __m128i); +__m128i __lsx_vavg_hu (__m128i, __m128i); +__m128i __lsx_vavgr_b (__m128i, __m128i); +__m128i __lsx_vavgr_bu (__m128i, __m128i); +__m128i __lsx_vavgr_d (__m128i, __m128i); +__m128i __lsx_vavgr_du (__m128i, __m128i); +__m128i __lsx_vavgr_h (__m128i, __m128i); +__m128i __lsx_vavgr_hu (__m128i, __m128i); +__m128i __lsx_vavgr_w (__m128i, __m128i); +__m128i __lsx_vavgr_wu (__m128i, __m128i); +__m128i __lsx_vavg_w (__m128i, __m128i); +__m128i __lsx_vavg_wu (__m128i, __m128i); +__m128i __lsx_vbitclr_b (__m128i, __m128i); +__m128i __lsx_vbitclr_d (__m128i, __m128i); +__m128i __lsx_vbitclr_h (__m128i, __m128i); +__m128i __lsx_vbitclri_b (__m128i, imm0_7); +__m128i __lsx_vbitclri_d (__m128i, imm0_63); +__m128i __lsx_vbitclri_h (__m128i, imm0_15); +__m128i __lsx_vbitclri_w (__m128i, imm0_31); +__m128i __lsx_vbitclr_w (__m128i, __m128i); +__m128i __lsx_vbitrev_b (__m128i, __m128i); +__m128i __lsx_vbitrev_d (__m128i, __m128i); +__m128i __lsx_vbitrev_h (__m128i, __m128i); +__m128i __lsx_vbitrevi_b (__m128i, imm0_7); +__m128i __lsx_vbitrevi_d (__m128i, imm0_63); +__m128i __lsx_vbitrevi_h (__m128i, imm0_15); +__m128i __lsx_vbitrevi_w (__m128i, imm0_31); +__m128i __lsx_vbitrev_w (__m128i, __m128i); +__m128i __lsx_vbitseli_b (__m128i, __m128i, imm0_255); +__m128i __lsx_vbitsel_v (__m128i, __m128i, __m128i); +__m128i __lsx_vbitset_b (__m128i, __m128i); +__m128i __lsx_vbitset_d (__m128i, __m128i); +__m128i __lsx_vbitset_h (__m128i, __m128i); +__m128i __lsx_vbitseti_b (__m128i, imm0_7); +__m128i __lsx_vbitseti_d (__m128i, imm0_63); +__m128i __lsx_vbitseti_h (__m128i, imm0_15); +__m128i __lsx_vbitseti_w (__m128i, imm0_31); +__m128i __lsx_vbitset_w (__m128i, __m128i); +__m128i __lsx_vbsll_v (__m128i, imm0_31); +__m128i __lsx_vbsrl_v (__m128i, imm0_31); +__m128i __lsx_vclo_b (__m128i); +__m128i __lsx_vclo_d (__m128i); +__m128i 
__lsx_vclo_h (__m128i); +__m128i __lsx_vclo_w (__m128i); +__m128i __lsx_vclz_b (__m128i); +__m128i __lsx_vclz_d (__m128i); +__m128i __lsx_vclz_h (__m128i); +__m128i __lsx_vclz_w (__m128i); +__m128i __lsx_vdiv_b (__m128i, __m128i); +__m128i __lsx_vdiv_bu (__m128i, __m128i); +__m128i __lsx_vdiv_d (__m128i, __m128i); +__m128i __lsx_vdiv_du (__m128i, __m128i); +__m128i __lsx_vdiv_h (__m128i, __m128i); +__m128i __lsx_vdiv_hu (__m128i, __m128i); +__m128i __lsx_vdiv_w (__m128i, __m128i); +__m128i __lsx_vdiv_wu (__m128i, __m128i); +__m128i __lsx_vexth_du_wu (__m128i); +__m128i __lsx_vexth_d_w (__m128i); +__m128i __lsx_vexth_h_b (__m128i); +__m128i __lsx_vexth_hu_bu (__m128i); +__m128i __lsx_vexth_q_d (__m128i); +__m128i __lsx_vexth_qu_du (__m128i); +__m128i __lsx_vexth_w_h (__m128i); +__m128i __lsx_vexth_wu_hu (__m128i); +__m128i __lsx_vextl_q_d (__m128i); +__m128i __lsx_vextl_qu_du (__m128i); +__m128i __lsx_vextrins_b (__m128i, __m128i, imm0_255); +__m128i __lsx_vextrins_d (__m128i, __m128i, imm0_255); +__m128i __lsx_vextrins_h (__m128i, __m128i, imm0_255); +__m128i __lsx_vextrins_w (__m128i, __m128i, imm0_255); +__m128d __lsx_vfadd_d (__m128d, __m128d); +__m128 __lsx_vfadd_s (__m128, __m128); +__m128i __lsx_vfclass_d (__m128d); +__m128i __lsx_vfclass_s (__m128); +__m128i __lsx_vfcmp_caf_d (__m128d, __m128d); +__m128i __lsx_vfcmp_caf_s (__m128, __m128); +__m128i __lsx_vfcmp_ceq_d (__m128d, __m128d); +__m128i __lsx_vfcmp_ceq_s (__m128, __m128); +__m128i __lsx_vfcmp_cle_d (__m128d, __m128d); +__m128i __lsx_vfcmp_cle_s (__m128, __m128); +__m128i __lsx_vfcmp_clt_d (__m128d, __m128d); +__m128i __lsx_vfcmp_clt_s (__m128, __m128); +__m128i __lsx_vfcmp_cne_d (__m128d, __m128d); +__m128i __lsx_vfcmp_cne_s (__m128, __m128); +__m128i __lsx_vfcmp_cor_d (__m128d, __m128d); +__m128i __lsx_vfcmp_cor_s (__m128, __m128); +__m128i __lsx_vfcmp_cueq_d (__m128d, __m128d); +__m128i __lsx_vfcmp_cueq_s (__m128, __m128); +__m128i __lsx_vfcmp_cule_d (__m128d, __m128d); +__m128i __lsx_vfcmp_cule_s (__m128, __m128); +__m128i __lsx_vfcmp_cult_d (__m128d, __m128d); +__m128i __lsx_vfcmp_cult_s (__m128, __m128); +__m128i __lsx_vfcmp_cun_d (__m128d, __m128d); +__m128i __lsx_vfcmp_cune_d (__m128d, __m128d); +__m128i __lsx_vfcmp_cune_s (__m128, __m128); +__m128i __lsx_vfcmp_cun_s (__m128, __m128); +__m128i __lsx_vfcmp_saf_d (__m128d, __m128d); +__m128i __lsx_vfcmp_saf_s (__m128, __m128); +__m128i __lsx_vfcmp_seq_d (__m128d, __m128d); +__m128i __lsx_vfcmp_seq_s (__m128, __m128); +__m128i __lsx_vfcmp_sle_d (__m128d, __m128d); +__m128i __lsx_vfcmp_sle_s (__m128, __m128); +__m128i __lsx_vfcmp_slt_d (__m128d, __m128d); +__m128i __lsx_vfcmp_slt_s (__m128, __m128); +__m128i __lsx_vfcmp_sne_d (__m128d, __m128d); +__m128i __lsx_vfcmp_sne_s (__m128, __m128); +__m128i __lsx_vfcmp_sor_d (__m128d, __m128d); +__m128i __lsx_vfcmp_sor_s (__m128, __m128); +__m128i __lsx_vfcmp_sueq_d (__m128d, __m128d); +__m128i __lsx_vfcmp_sueq_s (__m128, __m128); +__m128i __lsx_vfcmp_sule_d (__m128d, __m128d); +__m128i __lsx_vfcmp_sule_s (__m128, __m128); +__m128i __lsx_vfcmp_sult_d (__m128d, __m128d); +__m128i __lsx_vfcmp_sult_s (__m128, __m128); +__m128i __lsx_vfcmp_sun_d (__m128d, __m128d); +__m128i __lsx_vfcmp_sune_d (__m128d, __m128d); +__m128i __lsx_vfcmp_sune_s (__m128, __m128); +__m128i __lsx_vfcmp_sun_s (__m128, __m128); +__m128d __lsx_vfcvth_d_s (__m128); +__m128i __lsx_vfcvt_h_s (__m128, __m128); +__m128 __lsx_vfcvth_s_h (__m128i); +__m128d __lsx_vfcvtl_d_s (__m128); +__m128 __lsx_vfcvtl_s_h (__m128i); +__m128 __lsx_vfcvt_s_d (__m128d, __m128d); 
+__m128d __lsx_vfdiv_d (__m128d, __m128d); +__m128 __lsx_vfdiv_s (__m128, __m128); +__m128d __lsx_vffint_d_l (__m128i); +__m128d __lsx_vffint_d_lu (__m128i); +__m128d __lsx_vffinth_d_w (__m128i); +__m128d __lsx_vffintl_d_w (__m128i); +__m128 __lsx_vffint_s_l (__m128i, __m128i); +__m128 __lsx_vffint_s_w (__m128i); +__m128 __lsx_vffint_s_wu (__m128i); +__m128d __lsx_vflogb_d (__m128d); +__m128 __lsx_vflogb_s (__m128); +__m128d __lsx_vfmadd_d (__m128d, __m128d, __m128d); +__m128 __lsx_vfmadd_s (__m128, __m128, __m128); +__m128d __lsx_vfmaxa_d (__m128d, __m128d); +__m128 __lsx_vfmaxa_s (__m128, __m128); +__m128d __lsx_vfmax_d (__m128d, __m128d); +__m128 __lsx_vfmax_s (__m128, __m128); +__m128d __lsx_vfmina_d (__m128d, __m128d); +__m128 __lsx_vfmina_s (__m128, __m128); +__m128d __lsx_vfmin_d (__m128d, __m128d); +__m128 __lsx_vfmin_s (__m128, __m128); +__m128d __lsx_vfmsub_d (__m128d, __m128d, __m128d); +__m128 __lsx_vfmsub_s (__m128, __m128, __m128); +__m128d __lsx_vfmul_d (__m128d, __m128d); +__m128 __lsx_vfmul_s (__m128, __m128); +__m128d __lsx_vfnmadd_d (__m128d, __m128d, __m128d); +__m128 __lsx_vfnmadd_s (__m128, __m128, __m128); +__m128d __lsx_vfnmsub_d (__m128d, __m128d, __m128d); +__m128 __lsx_vfnmsub_s (__m128, __m128, __m128); +__m128d __lsx_vfrecip_d (__m128d); +__m128 __lsx_vfrecip_s (__m128); +__m128d __lsx_vfrint_d (__m128d); +__m128d __lsx_vfrintrm_d (__m128d); +__m128 __lsx_vfrintrm_s (__m128); +__m128d __lsx_vfrintrne_d (__m128d); +__m128 __lsx_vfrintrne_s (__m128); +__m128d __lsx_vfrintrp_d (__m128d); +__m128 __lsx_vfrintrp_s (__m128); +__m128d __lsx_vfrintrz_d (__m128d); +__m128 __lsx_vfrintrz_s (__m128); +__m128 __lsx_vfrint_s (__m128); +__m128d __lsx_vfrsqrt_d (__m128d); +__m128 __lsx_vfrsqrt_s (__m128); +__m128i __lsx_vfrstp_b (__m128i, __m128i, __m128i); +__m128i __lsx_vfrstp_h (__m128i, __m128i, __m128i); +__m128i __lsx_vfrstpi_b (__m128i, __m128i, imm0_31); +__m128i __lsx_vfrstpi_h (__m128i, __m128i, imm0_31); +__m128d __lsx_vfsqrt_d (__m128d); +__m128 __lsx_vfsqrt_s (__m128); +__m128d __lsx_vfsub_d (__m128d, __m128d); +__m128 __lsx_vfsub_s (__m128, __m128); +__m128i __lsx_vftinth_l_s (__m128); +__m128i __lsx_vftint_l_d (__m128d); +__m128i __lsx_vftintl_l_s (__m128); +__m128i __lsx_vftint_lu_d (__m128d); +__m128i __lsx_vftintrmh_l_s (__m128); +__m128i __lsx_vftintrm_l_d (__m128d); +__m128i __lsx_vftintrml_l_s (__m128); +__m128i __lsx_vftintrm_w_d (__m128d, __m128d); +__m128i __lsx_vftintrm_w_s (__m128); +__m128i __lsx_vftintrneh_l_s (__m128); +__m128i __lsx_vftintrne_l_d (__m128d); +__m128i __lsx_vftintrnel_l_s (__m128); +__m128i __lsx_vftintrne_w_d (__m128d, __m128d); +__m128i __lsx_vftintrne_w_s (__m128); +__m128i __lsx_vftintrph_l_s (__m128); +__m128i __lsx_vftintrp_l_d (__m128d); +__m128i __lsx_vftintrpl_l_s (__m128); +__m128i __lsx_vftintrp_w_d (__m128d, __m128d); +__m128i __lsx_vftintrp_w_s (__m128); +__m128i __lsx_vftintrzh_l_s (__m128); +__m128i __lsx_vftintrz_l_d (__m128d); +__m128i __lsx_vftintrzl_l_s (__m128); +__m128i __lsx_vftintrz_lu_d (__m128d); +__m128i __lsx_vftintrz_w_d (__m128d, __m128d); +__m128i __lsx_vftintrz_w_s (__m128); +__m128i __lsx_vftintrz_wu_s (__m128); +__m128i __lsx_vftint_w_d (__m128d, __m128d); +__m128i __lsx_vftint_w_s (__m128); +__m128i __lsx_vftint_wu_s (__m128); +__m128i __lsx_vhaddw_du_wu (__m128i, __m128i); +__m128i __lsx_vhaddw_d_w (__m128i, __m128i); +__m128i __lsx_vhaddw_h_b (__m128i, __m128i); +__m128i __lsx_vhaddw_hu_bu (__m128i, __m128i); +__m128i __lsx_vhaddw_q_d (__m128i, __m128i); +__m128i __lsx_vhaddw_qu_du (__m128i, 
__m128i); +__m128i __lsx_vhaddw_w_h (__m128i, __m128i); +__m128i __lsx_vhaddw_wu_hu (__m128i, __m128i); +__m128i __lsx_vhsubw_du_wu (__m128i, __m128i); +__m128i __lsx_vhsubw_d_w (__m128i, __m128i); +__m128i __lsx_vhsubw_h_b (__m128i, __m128i); +__m128i __lsx_vhsubw_hu_bu (__m128i, __m128i); +__m128i __lsx_vhsubw_q_d (__m128i, __m128i); +__m128i __lsx_vhsubw_qu_du (__m128i, __m128i); +__m128i __lsx_vhsubw_w_h (__m128i, __m128i); +__m128i __lsx_vhsubw_wu_hu (__m128i, __m128i); +__m128i __lsx_vilvh_b (__m128i, __m128i); +__m128i __lsx_vilvh_d (__m128i, __m128i); +__m128i __lsx_vilvh_h (__m128i, __m128i); +__m128i __lsx_vilvh_w (__m128i, __m128i); +__m128i __lsx_vilvl_b (__m128i, __m128i); +__m128i __lsx_vilvl_d (__m128i, __m128i); +__m128i __lsx_vilvl_h (__m128i, __m128i); +__m128i __lsx_vilvl_w (__m128i, __m128i); +__m128i __lsx_vinsgr2vr_b (__m128i, int, imm0_15); +__m128i __lsx_vinsgr2vr_d (__m128i, long int, imm0_1); +__m128i __lsx_vinsgr2vr_h (__m128i, int, imm0_7); +__m128i __lsx_vinsgr2vr_w (__m128i, int, imm0_3); +__m128i __lsx_vld (void *, imm_n2048_2047); +__m128i __lsx_vldi (imm_n1024_1023); +__m128i __lsx_vldrepl_b (void *, imm_n2048_2047); +__m128i __lsx_vldrepl_d (void *, imm_n256_255); +__m128i __lsx_vldrepl_h (void *, imm_n1024_1023); +__m128i __lsx_vldrepl_w (void *, imm_n512_511); +__m128i __lsx_vldx (void *, long int); +__m128i __lsx_vmadd_b (__m128i, __m128i, __m128i); +__m128i __lsx_vmadd_d (__m128i, __m128i, __m128i); +__m128i __lsx_vmadd_h (__m128i, __m128i, __m128i); +__m128i __lsx_vmadd_w (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_d_w (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_d_wu (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_d_wu_w (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_h_b (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_h_bu (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_h_bu_b (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_q_d (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_q_du (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_q_du_d (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_w_h (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_w_hu (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_w_hu_h (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_d_w (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_d_wu (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_d_wu_w (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_h_b (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_h_bu (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_h_bu_b (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_q_d (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_q_du (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_q_du_d (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_w_h (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_w_hu (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_w_hu_h (__m128i, __m128i, __m128i); +__m128i __lsx_vmax_b (__m128i, __m128i); +__m128i __lsx_vmax_bu (__m128i, __m128i); +__m128i __lsx_vmax_d (__m128i, __m128i); +__m128i __lsx_vmax_du (__m128i, __m128i); +__m128i __lsx_vmax_h (__m128i, __m128i); +__m128i __lsx_vmax_hu (__m128i, __m128i); +__m128i __lsx_vmaxi_b (__m128i, imm_n16_15); +__m128i __lsx_vmaxi_bu (__m128i, imm0_31); +__m128i __lsx_vmaxi_d (__m128i, imm_n16_15); +__m128i __lsx_vmaxi_du (__m128i, imm0_31); +__m128i __lsx_vmaxi_h (__m128i, imm_n16_15); +__m128i __lsx_vmaxi_hu (__m128i, imm0_31); +__m128i __lsx_vmaxi_w (__m128i, imm_n16_15); +__m128i 
__lsx_vmaxi_wu (__m128i, imm0_31); +__m128i __lsx_vmax_w (__m128i, __m128i); +__m128i __lsx_vmax_wu (__m128i, __m128i); +__m128i __lsx_vmin_b (__m128i, __m128i); +__m128i __lsx_vmin_bu (__m128i, __m128i); +__m128i __lsx_vmin_d (__m128i, __m128i); +__m128i __lsx_vmin_du (__m128i, __m128i); +__m128i __lsx_vmin_h (__m128i, __m128i); +__m128i __lsx_vmin_hu (__m128i, __m128i); +__m128i __lsx_vmini_b (__m128i, imm_n16_15); +__m128i __lsx_vmini_bu (__m128i, imm0_31); +__m128i __lsx_vmini_d (__m128i, imm_n16_15); +__m128i __lsx_vmini_du (__m128i, imm0_31); +__m128i __lsx_vmini_h (__m128i, imm_n16_15); +__m128i __lsx_vmini_hu (__m128i, imm0_31); +__m128i __lsx_vmini_w (__m128i, imm_n16_15); +__m128i __lsx_vmini_wu (__m128i, imm0_31); +__m128i __lsx_vmin_w (__m128i, __m128i); +__m128i __lsx_vmin_wu (__m128i, __m128i); +__m128i __lsx_vmod_b (__m128i, __m128i); +__m128i __lsx_vmod_bu (__m128i, __m128i); +__m128i __lsx_vmod_d (__m128i, __m128i); +__m128i __lsx_vmod_du (__m128i, __m128i); +__m128i __lsx_vmod_h (__m128i, __m128i); +__m128i __lsx_vmod_hu (__m128i, __m128i); +__m128i __lsx_vmod_w (__m128i, __m128i); +__m128i __lsx_vmod_wu (__m128i, __m128i); +__m128i __lsx_vmskgez_b (__m128i); +__m128i __lsx_vmskltz_b (__m128i); +__m128i __lsx_vmskltz_d (__m128i); +__m128i __lsx_vmskltz_h (__m128i); +__m128i __lsx_vmskltz_w (__m128i); +__m128i __lsx_vmsknz_b (__m128i); +__m128i __lsx_vmsub_b (__m128i, __m128i, __m128i); +__m128i __lsx_vmsub_d (__m128i, __m128i, __m128i); +__m128i __lsx_vmsub_h (__m128i, __m128i, __m128i); +__m128i __lsx_vmsub_w (__m128i, __m128i, __m128i); +__m128i __lsx_vmuh_b (__m128i, __m128i); +__m128i __lsx_vmuh_bu (__m128i, __m128i); +__m128i __lsx_vmuh_d (__m128i, __m128i); +__m128i __lsx_vmuh_du (__m128i, __m128i); +__m128i __lsx_vmuh_h (__m128i, __m128i); +__m128i __lsx_vmuh_hu (__m128i, __m128i); +__m128i __lsx_vmuh_w (__m128i, __m128i); +__m128i __lsx_vmuh_wu (__m128i, __m128i); +__m128i __lsx_vmul_b (__m128i, __m128i); +__m128i __lsx_vmul_d (__m128i, __m128i); +__m128i __lsx_vmul_h (__m128i, __m128i); +__m128i __lsx_vmul_w (__m128i, __m128i); +__m128i __lsx_vmulwev_d_w (__m128i, __m128i); +__m128i __lsx_vmulwev_d_wu (__m128i, __m128i); +__m128i __lsx_vmulwev_d_wu_w (__m128i, __m128i); +__m128i __lsx_vmulwev_h_b (__m128i, __m128i); +__m128i __lsx_vmulwev_h_bu (__m128i, __m128i); +__m128i __lsx_vmulwev_h_bu_b (__m128i, __m128i); +__m128i __lsx_vmulwev_q_d (__m128i, __m128i); +__m128i __lsx_vmulwev_q_du (__m128i, __m128i); +__m128i __lsx_vmulwev_q_du_d (__m128i, __m128i); +__m128i __lsx_vmulwev_w_h (__m128i, __m128i); +__m128i __lsx_vmulwev_w_hu (__m128i, __m128i); +__m128i __lsx_vmulwev_w_hu_h (__m128i, __m128i); +__m128i __lsx_vmulwod_d_w (__m128i, __m128i); +__m128i __lsx_vmulwod_d_wu (__m128i, __m128i); +__m128i __lsx_vmulwod_d_wu_w (__m128i, __m128i); +__m128i __lsx_vmulwod_h_b (__m128i, __m128i); +__m128i __lsx_vmulwod_h_bu (__m128i, __m128i); +__m128i __lsx_vmulwod_h_bu_b (__m128i, __m128i); +__m128i __lsx_vmulwod_q_d (__m128i, __m128i); +__m128i __lsx_vmulwod_q_du (__m128i, __m128i); +__m128i __lsx_vmulwod_q_du_d (__m128i, __m128i); +__m128i __lsx_vmulwod_w_h (__m128i, __m128i); +__m128i __lsx_vmulwod_w_hu (__m128i, __m128i); +__m128i __lsx_vmulwod_w_hu_h (__m128i, __m128i); +__m128i __lsx_vneg_b (__m128i); +__m128i __lsx_vneg_d (__m128i); +__m128i __lsx_vneg_h (__m128i); +__m128i __lsx_vneg_w (__m128i); +__m128i __lsx_vnori_b (__m128i, imm0_255); +__m128i __lsx_vnor_v (__m128i, __m128i); +__m128i __lsx_vori_b (__m128i, imm0_255); +__m128i __lsx_vorn_v (__m128i, __m128i); 
+__m128i __lsx_vor_v (__m128i, __m128i); +__m128i __lsx_vpackev_b (__m128i, __m128i); +__m128i __lsx_vpackev_d (__m128i, __m128i); +__m128i __lsx_vpackev_h (__m128i, __m128i); +__m128i __lsx_vpackev_w (__m128i, __m128i); +__m128i __lsx_vpackod_b (__m128i, __m128i); +__m128i __lsx_vpackod_d (__m128i, __m128i); +__m128i __lsx_vpackod_h (__m128i, __m128i); +__m128i __lsx_vpackod_w (__m128i, __m128i); +__m128i __lsx_vpcnt_b (__m128i); +__m128i __lsx_vpcnt_d (__m128i); +__m128i __lsx_vpcnt_h (__m128i); +__m128i __lsx_vpcnt_w (__m128i); +__m128i __lsx_vpermi_w (__m128i, __m128i, imm0_255); +__m128i __lsx_vpickev_b (__m128i, __m128i); +__m128i __lsx_vpickev_d (__m128i, __m128i); +__m128i __lsx_vpickev_h (__m128i, __m128i); +__m128i __lsx_vpickev_w (__m128i, __m128i); +__m128i __lsx_vpickod_b (__m128i, __m128i); +__m128i __lsx_vpickod_d (__m128i, __m128i); +__m128i __lsx_vpickod_h (__m128i, __m128i); +__m128i __lsx_vpickod_w (__m128i, __m128i); +int __lsx_vpickve2gr_b (__m128i, imm0_15); +unsigned int __lsx_vpickve2gr_bu (__m128i, imm0_15); +long int __lsx_vpickve2gr_d (__m128i, imm0_1); +unsigned long int __lsx_vpickve2gr_du (__m128i, imm0_1); +int __lsx_vpickve2gr_h (__m128i, imm0_7); +unsigned int __lsx_vpickve2gr_hu (__m128i, imm0_7); +int __lsx_vpickve2gr_w (__m128i, imm0_3); +unsigned int __lsx_vpickve2gr_wu (__m128i, imm0_3); +__m128i __lsx_vreplgr2vr_b (int); +__m128i __lsx_vreplgr2vr_d (long int); +__m128i __lsx_vreplgr2vr_h (int); +__m128i __lsx_vreplgr2vr_w (int); +__m128i __lsx_vrepli_b (imm_n512_511); +__m128i __lsx_vrepli_d (imm_n512_511); +__m128i __lsx_vrepli_h (imm_n512_511); +__m128i __lsx_vrepli_w (imm_n512_511); +__m128i __lsx_vreplve_b (__m128i, int); +__m128i __lsx_vreplve_d (__m128i, int); +__m128i __lsx_vreplve_h (__m128i, int); +__m128i __lsx_vreplvei_b (__m128i, imm0_15); +__m128i __lsx_vreplvei_d (__m128i, imm0_1); +__m128i __lsx_vreplvei_h (__m128i, imm0_7); +__m128i __lsx_vreplvei_w (__m128i, imm0_3); +__m128i __lsx_vreplve_w (__m128i, int); +__m128i __lsx_vrotr_b (__m128i, __m128i); +__m128i __lsx_vrotr_d (__m128i, __m128i); +__m128i __lsx_vrotr_h (__m128i, __m128i); +__m128i __lsx_vrotri_b (__m128i, imm0_7); +__m128i __lsx_vrotri_d (__m128i, imm0_63); +__m128i __lsx_vrotri_h (__m128i, imm0_15); +__m128i __lsx_vrotri_w (__m128i, imm0_31); +__m128i __lsx_vrotr_w (__m128i, __m128i); +__m128i __lsx_vsadd_b (__m128i, __m128i); +__m128i __lsx_vsadd_bu (__m128i, __m128i); +__m128i __lsx_vsadd_d (__m128i, __m128i); +__m128i __lsx_vsadd_du (__m128i, __m128i); +__m128i __lsx_vsadd_h (__m128i, __m128i); +__m128i __lsx_vsadd_hu (__m128i, __m128i); +__m128i __lsx_vsadd_w (__m128i, __m128i); +__m128i __lsx_vsadd_wu (__m128i, __m128i); +__m128i __lsx_vsat_b (__m128i, imm0_7); +__m128i __lsx_vsat_bu (__m128i, imm0_7); +__m128i __lsx_vsat_d (__m128i, imm0_63); +__m128i __lsx_vsat_du (__m128i, imm0_63); +__m128i __lsx_vsat_h (__m128i, imm0_15); +__m128i __lsx_vsat_hu (__m128i, imm0_15); +__m128i __lsx_vsat_w (__m128i, imm0_31); +__m128i __lsx_vsat_wu (__m128i, imm0_31); +__m128i __lsx_vseq_b (__m128i, __m128i); +__m128i __lsx_vseq_d (__m128i, __m128i); +__m128i __lsx_vseq_h (__m128i, __m128i); +__m128i __lsx_vseqi_b (__m128i, imm_n16_15); +__m128i __lsx_vseqi_d (__m128i, imm_n16_15); +__m128i __lsx_vseqi_h (__m128i, imm_n16_15); +__m128i __lsx_vseqi_w (__m128i, imm_n16_15); +__m128i __lsx_vseq_w (__m128i, __m128i); +__m128i __lsx_vshuf4i_b (__m128i, imm0_255); +__m128i __lsx_vshuf4i_d (__m128i, __m128i, imm0_255); +__m128i __lsx_vshuf4i_h (__m128i, imm0_255); +__m128i __lsx_vshuf4i_w 
(__m128i, imm0_255); +__m128i __lsx_vshuf_b (__m128i, __m128i, __m128i); +__m128i __lsx_vshuf_d (__m128i, __m128i, __m128i); +__m128i __lsx_vshuf_h (__m128i, __m128i, __m128i); +__m128i __lsx_vshuf_w (__m128i, __m128i, __m128i); +__m128i __lsx_vsigncov_b (__m128i, __m128i); +__m128i __lsx_vsigncov_d (__m128i, __m128i); +__m128i __lsx_vsigncov_h (__m128i, __m128i); +__m128i __lsx_vsigncov_w (__m128i, __m128i); +__m128i __lsx_vsle_b (__m128i, __m128i); +__m128i __lsx_vsle_bu (__m128i, __m128i); +__m128i __lsx_vsle_d (__m128i, __m128i); +__m128i __lsx_vsle_du (__m128i, __m128i); +__m128i __lsx_vsle_h (__m128i, __m128i); +__m128i __lsx_vsle_hu (__m128i, __m128i); +__m128i __lsx_vslei_b (__m128i, imm_n16_15); +__m128i __lsx_vslei_bu (__m128i, imm0_31); +__m128i __lsx_vslei_d (__m128i, imm_n16_15); +__m128i __lsx_vslei_du (__m128i, imm0_31); +__m128i __lsx_vslei_h (__m128i, imm_n16_15); +__m128i __lsx_vslei_hu (__m128i, imm0_31); +__m128i __lsx_vslei_w (__m128i, imm_n16_15); +__m128i __lsx_vslei_wu (__m128i, imm0_31); +__m128i __lsx_vsle_w (__m128i, __m128i); +__m128i __lsx_vsle_wu (__m128i, __m128i); +__m128i __lsx_vsll_b (__m128i, __m128i); +__m128i __lsx_vsll_d (__m128i, __m128i); +__m128i __lsx_vsll_h (__m128i, __m128i); +__m128i __lsx_vslli_b (__m128i, imm0_7); +__m128i __lsx_vslli_d (__m128i, imm0_63); +__m128i __lsx_vslli_h (__m128i, imm0_15); +__m128i __lsx_vslli_w (__m128i, imm0_31); +__m128i __lsx_vsll_w (__m128i, __m128i); +__m128i __lsx_vsllwil_du_wu (__m128i, imm0_31); +__m128i __lsx_vsllwil_d_w (__m128i, imm0_31); +__m128i __lsx_vsllwil_h_b (__m128i, imm0_7); +__m128i __lsx_vsllwil_hu_bu (__m128i, imm0_7); +__m128i __lsx_vsllwil_w_h (__m128i, imm0_15); +__m128i __lsx_vsllwil_wu_hu (__m128i, imm0_15); +__m128i __lsx_vslt_b (__m128i, __m128i); +__m128i __lsx_vslt_bu (__m128i, __m128i); +__m128i __lsx_vslt_d (__m128i, __m128i); +__m128i __lsx_vslt_du (__m128i, __m128i); +__m128i __lsx_vslt_h (__m128i, __m128i); +__m128i __lsx_vslt_hu (__m128i, __m128i); +__m128i __lsx_vslti_b (__m128i, imm_n16_15); +__m128i __lsx_vslti_bu (__m128i, imm0_31); +__m128i __lsx_vslti_d (__m128i, imm_n16_15); +__m128i __lsx_vslti_du (__m128i, imm0_31); +__m128i __lsx_vslti_h (__m128i, imm_n16_15); +__m128i __lsx_vslti_hu (__m128i, imm0_31); +__m128i __lsx_vslti_w (__m128i, imm_n16_15); +__m128i __lsx_vslti_wu (__m128i, imm0_31); +__m128i __lsx_vslt_w (__m128i, __m128i); +__m128i __lsx_vslt_wu (__m128i, __m128i); +__m128i __lsx_vsra_b (__m128i, __m128i); +__m128i __lsx_vsra_d (__m128i, __m128i); +__m128i __lsx_vsra_h (__m128i, __m128i); +__m128i __lsx_vsrai_b (__m128i, imm0_7); +__m128i __lsx_vsrai_d (__m128i, imm0_63); +__m128i __lsx_vsrai_h (__m128i, imm0_15); +__m128i __lsx_vsrai_w (__m128i, imm0_31); +__m128i __lsx_vsran_b_h (__m128i, __m128i); +__m128i __lsx_vsran_h_w (__m128i, __m128i); +__m128i __lsx_vsrani_b_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vsrani_d_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vsrani_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vsrani_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vsran_w_d (__m128i, __m128i); +__m128i __lsx_vsrar_b (__m128i, __m128i); +__m128i __lsx_vsrar_d (__m128i, __m128i); +__m128i __lsx_vsrar_h (__m128i, __m128i); +__m128i __lsx_vsrari_b (__m128i, imm0_7); +__m128i __lsx_vsrari_d (__m128i, imm0_63); +__m128i __lsx_vsrari_h (__m128i, imm0_15); +__m128i __lsx_vsrari_w (__m128i, imm0_31); +__m128i __lsx_vsrarn_b_h (__m128i, __m128i); +__m128i __lsx_vsrarn_h_w (__m128i, __m128i); +__m128i __lsx_vsrarni_b_h (__m128i, __m128i, imm0_15); +__m128i 
__lsx_vsrarni_d_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vsrarni_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vsrarni_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vsrarn_w_d (__m128i, __m128i); +__m128i __lsx_vsrar_w (__m128i, __m128i); +__m128i __lsx_vsra_w (__m128i, __m128i); +__m128i __lsx_vsrl_b (__m128i, __m128i); +__m128i __lsx_vsrl_d (__m128i, __m128i); +__m128i __lsx_vsrl_h (__m128i, __m128i); +__m128i __lsx_vsrli_b (__m128i, imm0_7); +__m128i __lsx_vsrli_d (__m128i, imm0_63); +__m128i __lsx_vsrli_h (__m128i, imm0_15); +__m128i __lsx_vsrli_w (__m128i, imm0_31); +__m128i __lsx_vsrln_b_h (__m128i, __m128i); +__m128i __lsx_vsrln_h_w (__m128i, __m128i); +__m128i __lsx_vsrlni_b_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vsrlni_d_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vsrlni_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vsrlni_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vsrln_w_d (__m128i, __m128i); +__m128i __lsx_vsrlr_b (__m128i, __m128i); +__m128i __lsx_vsrlr_d (__m128i, __m128i); +__m128i __lsx_vsrlr_h (__m128i, __m128i); +__m128i __lsx_vsrlri_b (__m128i, imm0_7); +__m128i __lsx_vsrlri_d (__m128i, imm0_63); +__m128i __lsx_vsrlri_h (__m128i, imm0_15); +__m128i __lsx_vsrlri_w (__m128i, imm0_31); +__m128i __lsx_vsrlrn_b_h (__m128i, __m128i); +__m128i __lsx_vsrlrn_h_w (__m128i, __m128i); +__m128i __lsx_vsrlrni_b_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vsrlrni_d_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vsrlrni_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vsrlrni_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vsrlrn_w_d (__m128i, __m128i); +__m128i __lsx_vsrlr_w (__m128i, __m128i); +__m128i __lsx_vsrl_w (__m128i, __m128i); +__m128i __lsx_vssran_b_h (__m128i, __m128i); +__m128i __lsx_vssran_bu_h (__m128i, __m128i); +__m128i __lsx_vssran_hu_w (__m128i, __m128i); +__m128i __lsx_vssran_h_w (__m128i, __m128i); +__m128i __lsx_vssrani_b_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrani_bu_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrani_d_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vssrani_du_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vssrani_hu_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrani_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrani_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssrani_wu_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssran_w_d (__m128i, __m128i); +__m128i __lsx_vssran_wu_d (__m128i, __m128i); +__m128i __lsx_vssrarn_b_h (__m128i, __m128i); +__m128i __lsx_vssrarn_bu_h (__m128i, __m128i); +__m128i __lsx_vssrarn_hu_w (__m128i, __m128i); +__m128i __lsx_vssrarn_h_w (__m128i, __m128i); +__m128i __lsx_vssrarni_b_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrarni_bu_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrarni_d_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vssrarni_du_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vssrarni_hu_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrarni_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrarni_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssrarni_wu_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssrarn_w_d (__m128i, __m128i); +__m128i __lsx_vssrarn_wu_d (__m128i, __m128i); +__m128i __lsx_vssrln_b_h (__m128i, __m128i); +__m128i __lsx_vssrln_bu_h (__m128i, __m128i); +__m128i __lsx_vssrln_hu_w (__m128i, __m128i); +__m128i __lsx_vssrln_h_w (__m128i, __m128i); +__m128i __lsx_vssrlni_b_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrlni_bu_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrlni_d_q (__m128i, __m128i, imm0_127); +__m128i 
__lsx_vssrlni_du_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vssrlni_hu_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrlni_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrlni_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssrlni_wu_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssrln_w_d (__m128i, __m128i); +__m128i __lsx_vssrln_wu_d (__m128i, __m128i); +__m128i __lsx_vssrlrn_b_h (__m128i, __m128i); +__m128i __lsx_vssrlrn_bu_h (__m128i, __m128i); +__m128i __lsx_vssrlrn_hu_w (__m128i, __m128i); +__m128i __lsx_vssrlrn_h_w (__m128i, __m128i); +__m128i __lsx_vssrlrni_b_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrlrni_bu_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrlrni_d_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vssrlrni_du_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vssrlrni_hu_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrlrni_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrlrni_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssrlrni_wu_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssrlrn_w_d (__m128i, __m128i); +__m128i __lsx_vssrlrn_wu_d (__m128i, __m128i); +__m128i __lsx_vssub_b (__m128i, __m128i); +__m128i __lsx_vssub_bu (__m128i, __m128i); +__m128i __lsx_vssub_d (__m128i, __m128i); +__m128i __lsx_vssub_du (__m128i, __m128i); +__m128i __lsx_vssub_h (__m128i, __m128i); +__m128i __lsx_vssub_hu (__m128i, __m128i); +__m128i __lsx_vssub_w (__m128i, __m128i); +__m128i __lsx_vssub_wu (__m128i, __m128i); +void __lsx_vst (__m128i, void *, imm_n2048_2047); +void __lsx_vstelm_b (__m128i, void *, imm_n128_127, imm0_15); +void __lsx_vstelm_d (__m128i, void *, imm_n128_127, imm0_1); +void __lsx_vstelm_h (__m128i, void *, imm_n128_127, imm0_7); +void __lsx_vstelm_w (__m128i, void *, imm_n128_127, imm0_3); +void __lsx_vstx (__m128i, void *, long int); +__m128i __lsx_vsub_b (__m128i, __m128i); +__m128i __lsx_vsub_d (__m128i, __m128i); +__m128i __lsx_vsub_h (__m128i, __m128i); +__m128i __lsx_vsubi_bu (__m128i, imm0_31); +__m128i __lsx_vsubi_du (__m128i, imm0_31); +__m128i __lsx_vsubi_hu (__m128i, imm0_31); +__m128i __lsx_vsubi_wu (__m128i, imm0_31); +__m128i __lsx_vsub_q (__m128i, __m128i); +__m128i __lsx_vsub_w (__m128i, __m128i); +__m128i __lsx_vsubwev_d_w (__m128i, __m128i); +__m128i __lsx_vsubwev_d_wu (__m128i, __m128i); +__m128i __lsx_vsubwev_h_b (__m128i, __m128i); +__m128i __lsx_vsubwev_h_bu (__m128i, __m128i); +__m128i __lsx_vsubwev_q_d (__m128i, __m128i); +__m128i __lsx_vsubwev_q_du (__m128i, __m128i); +__m128i __lsx_vsubwev_w_h (__m128i, __m128i); +__m128i __lsx_vsubwev_w_hu (__m128i, __m128i); +__m128i __lsx_vsubwod_d_w (__m128i, __m128i); +__m128i __lsx_vsubwod_d_wu (__m128i, __m128i); +__m128i __lsx_vsubwod_h_b (__m128i, __m128i); +__m128i __lsx_vsubwod_h_bu (__m128i, __m128i); +__m128i __lsx_vsubwod_q_d (__m128i, __m128i); +__m128i __lsx_vsubwod_q_du (__m128i, __m128i); +__m128i __lsx_vsubwod_w_h (__m128i, __m128i); +__m128i __lsx_vsubwod_w_hu (__m128i, __m128i); +__m128i __lsx_vxori_b (__m128i, imm0_255); +__m128i __lsx_vxor_v (__m128i, __m128i); +@end smallexample -v16i8 __builtin_msa_ld_b (const void *, imm_n512_511); -v8i16 __builtin_msa_ld_h (const void *, imm_n1024_1022); -v4i32 __builtin_msa_ld_w (const void *, imm_n2048_2044); -v2i64 __builtin_msa_ld_d (const void *, imm_n4096_4088); +These intrinsic functions are available by including @code{lsxintrin.h} and +using @option{-mfrecipe} and @option{-mlsx}. 
+@smallexample
+__m128d __lsx_vfrecipe_d (__m128d);
+__m128 __lsx_vfrecipe_s (__m128);
+__m128d __lsx_vfrsqrte_d (__m128d);
+__m128 __lsx_vfrsqrte_s (__m128);
+@end smallexample
-v16i8 __builtin_msa_ldi_b (imm_n512_511);
-v8i16 __builtin_msa_ldi_h (imm_n512_511);
-v4i32 __builtin_msa_ldi_w (imm_n512_511);
-v2i64 __builtin_msa_ldi_d (imm_n512_511);
+@node LoongArch ASX Vector Intrinsics
+@subsection LoongArch ASX Vector Intrinsics
-v8i16 __builtin_msa_madd_q_h (v8i16, v8i16, v8i16);
-v4i32 __builtin_msa_madd_q_w (v4i32, v4i32, v4i32);
+GCC provides intrinsics to access the LASX (Loongson Advanced SIMD Extension)
+instructions.  The interface is made available by including
+@code{<lasxintrin.h>} and using @option{-mlasx}.
-v8i16 __builtin_msa_maddr_q_h (v8i16, v8i16, v8i16);
-v4i32 __builtin_msa_maddr_q_w (v4i32, v4i32, v4i32);
+The following vector typedefs are included in @code{lasxintrin.h}:
-v16i8 __builtin_msa_maddv_b (v16i8, v16i8, v16i8);
-v8i16 __builtin_msa_maddv_h (v8i16, v8i16, v8i16);
-v4i32 __builtin_msa_maddv_w (v4i32, v4i32, v4i32);
-v2i64 __builtin_msa_maddv_d (v2i64, v2i64, v2i64);
+@itemize
+@item @code{__m256i}, a 256-bit vector of fixed point;
+@item @code{__m256}, a 256-bit vector of single precision floating point;
+@item @code{__m256d}, a 256-bit vector of double precision floating point.
+@end itemize
-v16i8 __builtin_msa_max_a_b (v16i8, v16i8);
-v8i16 __builtin_msa_max_a_h (v8i16, v8i16);
-v4i32 __builtin_msa_max_a_w (v4i32, v4i32);
-v2i64 __builtin_msa_max_a_d (v2i64, v2i64);
+Instructions and their corresponding built-ins may place additional
+restrictions on their operands.  In the prototypes below, the placeholder
+name of an immediate operand indicates the range of integer literals it
+accepts:
-v16i8 __builtin_msa_max_s_b (v16i8, v16i8);
-v8i16 __builtin_msa_max_s_h (v8i16, v8i16);
-v4i32 __builtin_msa_max_s_w (v4i32, v4i32);
-v2i64 __builtin_msa_max_s_d (v2i64, v2i64);
+@itemize
+@item @code{imm0_1}, an integer literal in range 0 to 1.
+@item @code{imm0_3}, an integer literal in range 0 to 3.
+@item @code{imm0_7}, an integer literal in range 0 to 7.
+@item @code{imm0_15}, an integer literal in range 0 to 15.
+@item @code{imm0_31}, an integer literal in range 0 to 31.
+@item @code{imm0_63}, an integer literal in range 0 to 63.
+@item @code{imm0_127}, an integer literal in range 0 to 127.
+@item @code{imm0_255}, an integer literal in range 0 to 255.
+@item @code{imm_n16_15}, an integer literal in range -16 to 15.
+@item @code{imm_n128_127}, an integer literal in range -128 to 127.
+@item @code{imm_n256_255}, an integer literal in range -256 to 255.
+@item @code{imm_n512_511}, an integer literal in range -512 to 511.
+@item @code{imm_n1024_1023}, an integer literal in range -1024 to 1023.
+@item @code{imm_n2048_2047}, an integer literal in range -2048 to 2047.
+@end itemize
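+For example, the second operand of @code{__lasx_xvaddi_wu} is listed
+below as @code{imm0_31}, so it must be an integer literal between 0 and
+31; a minimal sketch (the variable name is illustrative only):
+
+@smallexample
+#include <lasxintrin.h>
+
+extern __m256i @var{a};
+
+void
+test (void)
+@{
+  @var{a} = __lasx_xvaddi_wu (@var{a}, 17);  /* OK: 17 is a literal in 0..31.  */
+@}
+@end smallexample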
-v16u8 __builtin_msa_max_u_b (v16u8, v16u8);
-v8u16 __builtin_msa_max_u_h (v8u16, v8u16);
-v4u32 __builtin_msa_max_u_w (v4u32, v4u32);
-v2u64 __builtin_msa_max_u_d (v2u64, v2u64);
+For convenience, GCC defines functions @code{__lasx_xvrepli_@{b/h/w/d@}} and
+@code{__lasx_b[n]z_@{v/b/h/w/d@}}, which are implemented as follows:
-v16i8 __builtin_msa_maxi_s_b (v16i8, imm_n16_15);
-v8i16 __builtin_msa_maxi_s_h (v8i16, imm_n16_15);
-v4i32 __builtin_msa_maxi_s_w (v4i32, imm_n16_15);
-v2i64 __builtin_msa_maxi_s_d (v2i64, imm_n16_15);
+@smallexample
+a. @code{__lasx_xvrepli_@{b/h/w/d@}}: Implements the case where the highest
+   bit of the @code{xvldi} instruction's immediate @code{i13} is 0.
-v16u8 __builtin_msa_maxi_u_b (v16u8, imm0_31);
-v8u16 __builtin_msa_maxi_u_h (v8u16, imm0_31);
-v4u32 __builtin_msa_maxi_u_w (v4u32, imm0_31);
-v2u64 __builtin_msa_maxi_u_d (v2u64, imm0_31);
+   i13[12] == 1'b0
+   case i13[11:10] of :
+     2'b00: __lasx_xvrepli_b (imm_n512_511)
+     2'b01: __lasx_xvrepli_h (imm_n512_511)
+     2'b10: __lasx_xvrepli_w (imm_n512_511)
+     2'b11: __lasx_xvrepli_d (imm_n512_511)
-v16i8 __builtin_msa_min_a_b (v16i8, v16i8);
-v8i16 __builtin_msa_min_a_h (v8i16, v8i16);
-v4i32 __builtin_msa_min_a_w (v4i32, v4i32);
-v2i64 __builtin_msa_min_a_d (v2i64, v2i64);
+b. @code{__lasx_b[n]z_@{v/b/h/w/d@}}: Defined because the @code{xvseteqz}
+   class of instructions cannot be used on its own; each function expands
+   to a vector test plus a conditional branch.
-v16i8 __builtin_msa_min_s_b (v16i8, v16i8);
-v8i16 __builtin_msa_min_s_h (v8i16, v8i16);
-v4i32 __builtin_msa_min_s_w (v4i32, v4i32);
-v2i64 __builtin_msa_min_s_d (v2i64, v2i64);
+   __lasx_xbz_v  => xvseteqz.v + bcnez
+   __lasx_xbnz_v => xvsetnez.v + bcnez
+   __lasx_xbz_b  => xvsetanyeqz.b + bcnez
+   __lasx_xbz_h  => xvsetanyeqz.h + bcnez
+   __lasx_xbz_w  => xvsetanyeqz.w + bcnez
+   __lasx_xbz_d  => xvsetanyeqz.d + bcnez
+   __lasx_xbnz_b => xvsetallnez.b + bcnez
+   __lasx_xbnz_h => xvsetallnez.h + bcnez
+   __lasx_xbnz_w => xvsetallnez.w + bcnez
+   __lasx_xbnz_d => xvsetallnez.d + bcnez
+@end smallexample
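+For example, @code{__lasx_xvrepli_b (1)} produces a vector with each of
+its 32 bytes set to 1 (a sketch, not part of the listing above):
+
+@smallexample
+#include <lasxintrin.h>
+
+__m256i ones;
+
+void
+test (void)
+@{
+  ones = __lasx_xvrepli_b (1);
+@}
+@end smallexample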
-v16u8 __builtin_msa_min_u_b (v16u8, v16u8);
-v8u16 __builtin_msa_min_u_h (v8u16, v8u16);
-v4u32 __builtin_msa_min_u_w (v4u32, v4u32);
-v2u64 __builtin_msa_min_u_d (v2u64, v2u64);
+@smallexample
+eg:
+  #include <lasxintrin.h>
-v16i8 __builtin_msa_mini_s_b (v16i8, imm_n16_15);
-v8i16 __builtin_msa_mini_s_h (v8i16, imm_n16_15);
-v4i32 __builtin_msa_mini_s_w (v4i32, imm_n16_15);
-v2i64 __builtin_msa_mini_s_d (v2i64, imm_n16_15);
+  extern __m256i @var{a};
-v16u8 __builtin_msa_mini_u_b (v16u8, imm0_31);
-v8u16 __builtin_msa_mini_u_h (v8u16, imm0_31);
-v4u32 __builtin_msa_mini_u_w (v4u32, imm0_31);
-v2u64 __builtin_msa_mini_u_d (v2u64, imm0_31);
+  void
+  test (void)
+  @{
+    if (__lasx_xbz_v (@var{a}))
+      printf ("1\n");
+    else
+      printf ("2\n");
+  @}
+@end smallexample
-v16i8 __builtin_msa_mod_s_b (v16i8, v16i8);
-v8i16 __builtin_msa_mod_s_h (v8i16, v8i16);
-v4i32 __builtin_msa_mod_s_w (v4i32, v4i32);
-v2i64 __builtin_msa_mod_s_d (v2i64, v2i64);
+@emph{Note:} For instructions whose destination operand is also a source
+operand (i.e.@: only part of the destination register is modified), the
+first argument of the built-in function supplies the destination operand.
-v16u8 __builtin_msa_mod_u_b (v16u8, v16u8);
-v8u16 __builtin_msa_mod_u_h (v8u16, v8u16);
-v4u32 __builtin_msa_mod_u_w (v4u32, v4u32);
-v2u64 __builtin_msa_mod_u_d (v2u64, v2u64);
+@smallexample
+eg:
+  #include <lasxintrin.h>
+  extern __m256i @var{dst};
+  int @var{src};
-v16i8 __builtin_msa_move_v (v16i8);
+  void
+  test (void)
+  @{
+    @var{dst} = __lasx_xvinsgr2vr_w (@var{dst}, @var{src}, 3);
+  @}
+@end smallexample
-v8i16 __builtin_msa_msub_q_h (v8i16, v8i16, v8i16);
-v4i32 __builtin_msa_msub_q_w (v4i32, v4i32, v4i32);
-v8i16 __builtin_msa_msubr_q_h (v8i16, v8i16, v8i16);
-v4i32 __builtin_msa_msubr_q_w (v4i32, v4i32, v4i32);
+The intrinsics provided are listed below:
+
+@smallexample
+__m256i __lasx_vext2xv_d_b (__m256i);
+__m256i __lasx_vext2xv_d_h (__m256i);
+__m256i __lasx_vext2xv_du_bu (__m256i);
+__m256i __lasx_vext2xv_du_hu (__m256i);
+__m256i __lasx_vext2xv_du_wu (__m256i);
+__m256i __lasx_vext2xv_d_w (__m256i);
+__m256i __lasx_vext2xv_h_b (__m256i);
+__m256i __lasx_vext2xv_hu_bu (__m256i);
+__m256i __lasx_vext2xv_w_b (__m256i);
+__m256i __lasx_vext2xv_w_h (__m256i);
+__m256i __lasx_vext2xv_wu_bu (__m256i);
+__m256i __lasx_vext2xv_wu_hu (__m256i);
+int __lasx_xbnz_b (__m256i);
+int __lasx_xbnz_d (__m256i);
+int __lasx_xbnz_h (__m256i);
+int __lasx_xbnz_v (__m256i);
+int __lasx_xbnz_w (__m256i);
+int __lasx_xbz_b (__m256i);
+int __lasx_xbz_d (__m256i);
+int __lasx_xbz_h (__m256i);
+int __lasx_xbz_v (__m256i);
+int __lasx_xbz_w (__m256i);
+__m256i __lasx_xvabsd_b (__m256i, __m256i);
+__m256i __lasx_xvabsd_bu (__m256i, __m256i);
+__m256i __lasx_xvabsd_d (__m256i, __m256i);
+__m256i __lasx_xvabsd_du (__m256i, __m256i);
+__m256i __lasx_xvabsd_h (__m256i, __m256i);
+__m256i __lasx_xvabsd_hu (__m256i, __m256i);
+__m256i __lasx_xvabsd_w (__m256i, __m256i);
+__m256i __lasx_xvabsd_wu (__m256i, __m256i);
+__m256i __lasx_xvadda_b (__m256i, __m256i);
+__m256i __lasx_xvadda_d (__m256i, __m256i);
+__m256i __lasx_xvadda_h (__m256i, __m256i);
+__m256i __lasx_xvadda_w (__m256i, __m256i);
+__m256i __lasx_xvadd_b (__m256i, __m256i);
+__m256i __lasx_xvadd_d (__m256i, __m256i);
+__m256i __lasx_xvadd_h (__m256i, __m256i);
+__m256i __lasx_xvaddi_bu (__m256i, imm0_31);
+__m256i __lasx_xvaddi_du (__m256i, imm0_31);
+__m256i __lasx_xvaddi_hu (__m256i, imm0_31);
+__m256i __lasx_xvaddi_wu (__m256i, imm0_31);
+__m256i __lasx_xvadd_q (__m256i, __m256i);
+__m256i __lasx_xvadd_w (__m256i, __m256i);
+__m256i __lasx_xvaddwev_d_w (__m256i, __m256i);
+__m256i __lasx_xvaddwev_d_wu (__m256i, __m256i);
+__m256i __lasx_xvaddwev_d_wu_w (__m256i, __m256i);
+__m256i __lasx_xvaddwev_h_b (__m256i, __m256i);
+__m256i __lasx_xvaddwev_h_bu (__m256i, __m256i);
+__m256i __lasx_xvaddwev_h_bu_b (__m256i, __m256i);
+__m256i __lasx_xvaddwev_q_d (__m256i, __m256i);
+__m256i __lasx_xvaddwev_q_du (__m256i, __m256i);
+__m256i __lasx_xvaddwev_q_du_d (__m256i, __m256i);
+__m256i __lasx_xvaddwev_w_h (__m256i, __m256i);
+__m256i __lasx_xvaddwev_w_hu (__m256i, __m256i);
+__m256i __lasx_xvaddwev_w_hu_h (__m256i, __m256i);
+__m256i __lasx_xvaddwod_d_w (__m256i, __m256i);
+__m256i __lasx_xvaddwod_d_wu (__m256i, __m256i);
+__m256i __lasx_xvaddwod_d_wu_w (__m256i, __m256i);
+__m256i __lasx_xvaddwod_h_b (__m256i, __m256i);
+__m256i __lasx_xvaddwod_h_bu (__m256i, __m256i);
+__m256i __lasx_xvaddwod_h_bu_b (__m256i, __m256i);
+__m256i __lasx_xvaddwod_q_d (__m256i, __m256i);
+__m256i __lasx_xvaddwod_q_du (__m256i, __m256i);
+__m256i __lasx_xvaddwod_q_du_d (__m256i, __m256i);
+__m256i __lasx_xvaddwod_w_h (__m256i, __m256i);
+__m256i __lasx_xvaddwod_w_hu (__m256i,
__m256i); +__m256i __lasx_xvaddwod_w_hu_h (__m256i, __m256i); +__m256i __lasx_xvandi_b (__m256i, imm0_255); +__m256i __lasx_xvandn_v (__m256i, __m256i); +__m256i __lasx_xvand_v (__m256i, __m256i); +__m256i __lasx_xvavg_b (__m256i, __m256i); +__m256i __lasx_xvavg_bu (__m256i, __m256i); +__m256i __lasx_xvavg_d (__m256i, __m256i); +__m256i __lasx_xvavg_du (__m256i, __m256i); +__m256i __lasx_xvavg_h (__m256i, __m256i); +__m256i __lasx_xvavg_hu (__m256i, __m256i); +__m256i __lasx_xvavgr_b (__m256i, __m256i); +__m256i __lasx_xvavgr_bu (__m256i, __m256i); +__m256i __lasx_xvavgr_d (__m256i, __m256i); +__m256i __lasx_xvavgr_du (__m256i, __m256i); +__m256i __lasx_xvavgr_h (__m256i, __m256i); +__m256i __lasx_xvavgr_hu (__m256i, __m256i); +__m256i __lasx_xvavgr_w (__m256i, __m256i); +__m256i __lasx_xvavgr_wu (__m256i, __m256i); +__m256i __lasx_xvavg_w (__m256i, __m256i); +__m256i __lasx_xvavg_wu (__m256i, __m256i); +__m256i __lasx_xvbitclr_b (__m256i, __m256i); +__m256i __lasx_xvbitclr_d (__m256i, __m256i); +__m256i __lasx_xvbitclr_h (__m256i, __m256i); +__m256i __lasx_xvbitclri_b (__m256i, imm0_7); +__m256i __lasx_xvbitclri_d (__m256i, imm0_63); +__m256i __lasx_xvbitclri_h (__m256i, imm0_15); +__m256i __lasx_xvbitclri_w (__m256i, imm0_31); +__m256i __lasx_xvbitclr_w (__m256i, __m256i); +__m256i __lasx_xvbitrev_b (__m256i, __m256i); +__m256i __lasx_xvbitrev_d (__m256i, __m256i); +__m256i __lasx_xvbitrev_h (__m256i, __m256i); +__m256i __lasx_xvbitrevi_b (__m256i, imm0_7); +__m256i __lasx_xvbitrevi_d (__m256i, imm0_63); +__m256i __lasx_xvbitrevi_h (__m256i, imm0_15); +__m256i __lasx_xvbitrevi_w (__m256i, imm0_31); +__m256i __lasx_xvbitrev_w (__m256i, __m256i); +__m256i __lasx_xvbitseli_b (__m256i, __m256i, imm0_255); +__m256i __lasx_xvbitsel_v (__m256i, __m256i, __m256i); +__m256i __lasx_xvbitset_b (__m256i, __m256i); +__m256i __lasx_xvbitset_d (__m256i, __m256i); +__m256i __lasx_xvbitset_h (__m256i, __m256i); +__m256i __lasx_xvbitseti_b (__m256i, imm0_7); +__m256i __lasx_xvbitseti_d (__m256i, imm0_63); +__m256i __lasx_xvbitseti_h (__m256i, imm0_15); +__m256i __lasx_xvbitseti_w (__m256i, imm0_31); +__m256i __lasx_xvbitset_w (__m256i, __m256i); +__m256i __lasx_xvbsll_v (__m256i, imm0_31); +__m256i __lasx_xvbsrl_v (__m256i, imm0_31); +__m256i __lasx_xvclo_b (__m256i); +__m256i __lasx_xvclo_d (__m256i); +__m256i __lasx_xvclo_h (__m256i); +__m256i __lasx_xvclo_w (__m256i); +__m256i __lasx_xvclz_b (__m256i); +__m256i __lasx_xvclz_d (__m256i); +__m256i __lasx_xvclz_h (__m256i); +__m256i __lasx_xvclz_w (__m256i); +__m256i __lasx_xvdiv_b (__m256i, __m256i); +__m256i __lasx_xvdiv_bu (__m256i, __m256i); +__m256i __lasx_xvdiv_d (__m256i, __m256i); +__m256i __lasx_xvdiv_du (__m256i, __m256i); +__m256i __lasx_xvdiv_h (__m256i, __m256i); +__m256i __lasx_xvdiv_hu (__m256i, __m256i); +__m256i __lasx_xvdiv_w (__m256i, __m256i); +__m256i __lasx_xvdiv_wu (__m256i, __m256i); +__m256i __lasx_xvexth_du_wu (__m256i); +__m256i __lasx_xvexth_d_w (__m256i); +__m256i __lasx_xvexth_h_b (__m256i); +__m256i __lasx_xvexth_hu_bu (__m256i); +__m256i __lasx_xvexth_q_d (__m256i); +__m256i __lasx_xvexth_qu_du (__m256i); +__m256i __lasx_xvexth_w_h (__m256i); +__m256i __lasx_xvexth_wu_hu (__m256i); +__m256i __lasx_xvextl_q_d (__m256i); +__m256i __lasx_xvextl_qu_du (__m256i); +__m256i __lasx_xvextrins_b (__m256i, __m256i, imm0_255); +__m256i __lasx_xvextrins_d (__m256i, __m256i, imm0_255); +__m256i __lasx_xvextrins_h (__m256i, __m256i, imm0_255); +__m256i __lasx_xvextrins_w (__m256i, __m256i, imm0_255); +__m256d __lasx_xvfadd_d (__m256d, 
__m256d); +__m256 __lasx_xvfadd_s (__m256, __m256); +__m256i __lasx_xvfclass_d (__m256d); +__m256i __lasx_xvfclass_s (__m256); +__m256i __lasx_xvfcmp_caf_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_caf_s (__m256, __m256); +__m256i __lasx_xvfcmp_ceq_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_ceq_s (__m256, __m256); +__m256i __lasx_xvfcmp_cle_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cle_s (__m256, __m256); +__m256i __lasx_xvfcmp_clt_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_clt_s (__m256, __m256); +__m256i __lasx_xvfcmp_cne_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cne_s (__m256, __m256); +__m256i __lasx_xvfcmp_cor_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cor_s (__m256, __m256); +__m256i __lasx_xvfcmp_cueq_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cueq_s (__m256, __m256); +__m256i __lasx_xvfcmp_cule_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cule_s (__m256, __m256); +__m256i __lasx_xvfcmp_cult_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cult_s (__m256, __m256); +__m256i __lasx_xvfcmp_cun_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cune_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cune_s (__m256, __m256); +__m256i __lasx_xvfcmp_cun_s (__m256, __m256); +__m256i __lasx_xvfcmp_saf_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_saf_s (__m256, __m256); +__m256i __lasx_xvfcmp_seq_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_seq_s (__m256, __m256); +__m256i __lasx_xvfcmp_sle_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sle_s (__m256, __m256); +__m256i __lasx_xvfcmp_slt_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_slt_s (__m256, __m256); +__m256i __lasx_xvfcmp_sne_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sne_s (__m256, __m256); +__m256i __lasx_xvfcmp_sor_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sor_s (__m256, __m256); +__m256i __lasx_xvfcmp_sueq_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sueq_s (__m256, __m256); +__m256i __lasx_xvfcmp_sule_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sule_s (__m256, __m256); +__m256i __lasx_xvfcmp_sult_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sult_s (__m256, __m256); +__m256i __lasx_xvfcmp_sun_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sune_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sune_s (__m256, __m256); +__m256i __lasx_xvfcmp_sun_s (__m256, __m256); +__m256d __lasx_xvfcvth_d_s (__m256); +__m256i __lasx_xvfcvt_h_s (__m256, __m256); +__m256 __lasx_xvfcvth_s_h (__m256i); +__m256d __lasx_xvfcvtl_d_s (__m256); +__m256 __lasx_xvfcvtl_s_h (__m256i); +__m256 __lasx_xvfcvt_s_d (__m256d, __m256d); +__m256d __lasx_xvfdiv_d (__m256d, __m256d); +__m256 __lasx_xvfdiv_s (__m256, __m256); +__m256d __lasx_xvffint_d_l (__m256i); +__m256d __lasx_xvffint_d_lu (__m256i); +__m256d __lasx_xvffinth_d_w (__m256i); +__m256d __lasx_xvffintl_d_w (__m256i); +__m256 __lasx_xvffint_s_l (__m256i, __m256i); +__m256 __lasx_xvffint_s_w (__m256i); +__m256 __lasx_xvffint_s_wu (__m256i); +__m256d __lasx_xvflogb_d (__m256d); +__m256 __lasx_xvflogb_s (__m256); +__m256d __lasx_xvfmadd_d (__m256d, __m256d, __m256d); +__m256 __lasx_xvfmadd_s (__m256, __m256, __m256); +__m256d __lasx_xvfmaxa_d (__m256d, __m256d); +__m256 __lasx_xvfmaxa_s (__m256, __m256); +__m256d __lasx_xvfmax_d (__m256d, __m256d); +__m256 __lasx_xvfmax_s (__m256, __m256); +__m256d __lasx_xvfmina_d (__m256d, __m256d); +__m256 __lasx_xvfmina_s (__m256, __m256); +__m256d __lasx_xvfmin_d (__m256d, __m256d); +__m256 __lasx_xvfmin_s (__m256, __m256); +__m256d __lasx_xvfmsub_d (__m256d, __m256d, __m256d); +__m256 __lasx_xvfmsub_s (__m256, __m256, __m256); +__m256d __lasx_xvfmul_d (__m256d, __m256d); 
+__m256 __lasx_xvfmul_s (__m256, __m256); +__m256d __lasx_xvfnmadd_d (__m256d, __m256d, __m256d); +__m256 __lasx_xvfnmadd_s (__m256, __m256, __m256); +__m256d __lasx_xvfnmsub_d (__m256d, __m256d, __m256d); +__m256 __lasx_xvfnmsub_s (__m256, __m256, __m256); +__m256d __lasx_xvfrecip_d (__m256d); +__m256 __lasx_xvfrecip_s (__m256); +__m256d __lasx_xvfrint_d (__m256d); +__m256d __lasx_xvfrintrm_d (__m256d); +__m256 __lasx_xvfrintrm_s (__m256); +__m256d __lasx_xvfrintrne_d (__m256d); +__m256 __lasx_xvfrintrne_s (__m256); +__m256d __lasx_xvfrintrp_d (__m256d); +__m256 __lasx_xvfrintrp_s (__m256); +__m256d __lasx_xvfrintrz_d (__m256d); +__m256 __lasx_xvfrintrz_s (__m256); +__m256 __lasx_xvfrint_s (__m256); +__m256d __lasx_xvfrsqrt_d (__m256d); +__m256 __lasx_xvfrsqrt_s (__m256); +__m256i __lasx_xvfrstp_b (__m256i, __m256i, __m256i); +__m256i __lasx_xvfrstp_h (__m256i, __m256i, __m256i); +__m256i __lasx_xvfrstpi_b (__m256i, __m256i, imm0_31); +__m256i __lasx_xvfrstpi_h (__m256i, __m256i, imm0_31); +__m256d __lasx_xvfsqrt_d (__m256d); +__m256 __lasx_xvfsqrt_s (__m256); +__m256d __lasx_xvfsub_d (__m256d, __m256d); +__m256 __lasx_xvfsub_s (__m256, __m256); +__m256i __lasx_xvftinth_l_s (__m256); +__m256i __lasx_xvftint_l_d (__m256d); +__m256i __lasx_xvftintl_l_s (__m256); +__m256i __lasx_xvftint_lu_d (__m256d); +__m256i __lasx_xvftintrmh_l_s (__m256); +__m256i __lasx_xvftintrm_l_d (__m256d); +__m256i __lasx_xvftintrml_l_s (__m256); +__m256i __lasx_xvftintrm_w_d (__m256d, __m256d); +__m256i __lasx_xvftintrm_w_s (__m256); +__m256i __lasx_xvftintrneh_l_s (__m256); +__m256i __lasx_xvftintrne_l_d (__m256d); +__m256i __lasx_xvftintrnel_l_s (__m256); +__m256i __lasx_xvftintrne_w_d (__m256d, __m256d); +__m256i __lasx_xvftintrne_w_s (__m256); +__m256i __lasx_xvftintrph_l_s (__m256); +__m256i __lasx_xvftintrp_l_d (__m256d); +__m256i __lasx_xvftintrpl_l_s (__m256); +__m256i __lasx_xvftintrp_w_d (__m256d, __m256d); +__m256i __lasx_xvftintrp_w_s (__m256); +__m256i __lasx_xvftintrzh_l_s (__m256); +__m256i __lasx_xvftintrz_l_d (__m256d); +__m256i __lasx_xvftintrzl_l_s (__m256); +__m256i __lasx_xvftintrz_lu_d (__m256d); +__m256i __lasx_xvftintrz_w_d (__m256d, __m256d); +__m256i __lasx_xvftintrz_w_s (__m256); +__m256i __lasx_xvftintrz_wu_s (__m256); +__m256i __lasx_xvftint_w_d (__m256d, __m256d); +__m256i __lasx_xvftint_w_s (__m256); +__m256i __lasx_xvftint_wu_s (__m256); +__m256i __lasx_xvhaddw_du_wu (__m256i, __m256i); +__m256i __lasx_xvhaddw_d_w (__m256i, __m256i); +__m256i __lasx_xvhaddw_h_b (__m256i, __m256i); +__m256i __lasx_xvhaddw_hu_bu (__m256i, __m256i); +__m256i __lasx_xvhaddw_q_d (__m256i, __m256i); +__m256i __lasx_xvhaddw_qu_du (__m256i, __m256i); +__m256i __lasx_xvhaddw_w_h (__m256i, __m256i); +__m256i __lasx_xvhaddw_wu_hu (__m256i, __m256i); +__m256i __lasx_xvhsubw_du_wu (__m256i, __m256i); +__m256i __lasx_xvhsubw_d_w (__m256i, __m256i); +__m256i __lasx_xvhsubw_h_b (__m256i, __m256i); +__m256i __lasx_xvhsubw_hu_bu (__m256i, __m256i); +__m256i __lasx_xvhsubw_q_d (__m256i, __m256i); +__m256i __lasx_xvhsubw_qu_du (__m256i, __m256i); +__m256i __lasx_xvhsubw_w_h (__m256i, __m256i); +__m256i __lasx_xvhsubw_wu_hu (__m256i, __m256i); +__m256i __lasx_xvilvh_b (__m256i, __m256i); +__m256i __lasx_xvilvh_d (__m256i, __m256i); +__m256i __lasx_xvilvh_h (__m256i, __m256i); +__m256i __lasx_xvilvh_w (__m256i, __m256i); +__m256i __lasx_xvilvl_b (__m256i, __m256i); +__m256i __lasx_xvilvl_d (__m256i, __m256i); +__m256i __lasx_xvilvl_h (__m256i, __m256i); +__m256i __lasx_xvilvl_w (__m256i, __m256i); +__m256i 
__lasx_xvinsgr2vr_d (__m256i, long int, imm0_3); +__m256i __lasx_xvinsgr2vr_w (__m256i, int, imm0_7); +__m256i __lasx_xvinsve0_d (__m256i, __m256i, imm0_3); +__m256i __lasx_xvinsve0_w (__m256i, __m256i, imm0_7); +__m256i __lasx_xvld (void *, imm_n2048_2047); +__m256i __lasx_xvldi (imm_n1024_1023); +__m256i __lasx_xvldrepl_b (void *, imm_n2048_2047); +__m256i __lasx_xvldrepl_d (void *, imm_n256_255); +__m256i __lasx_xvldrepl_h (void *, imm_n1024_1023); +__m256i __lasx_xvldrepl_w (void *, imm_n512_511); +__m256i __lasx_xvldx (void *, long int); +__m256i __lasx_xvmadd_b (__m256i, __m256i, __m256i); +__m256i __lasx_xvmadd_d (__m256i, __m256i, __m256i); +__m256i __lasx_xvmadd_h (__m256i, __m256i, __m256i); +__m256i __lasx_xvmadd_w (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_d_w (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_d_wu (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_d_wu_w (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_h_b (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_h_bu (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_h_bu_b (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_q_d (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_q_du (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_q_du_d (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_w_h (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_w_hu (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_w_hu_h (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_d_w (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_d_wu (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_d_wu_w (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_h_b (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_h_bu (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_h_bu_b (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_q_d (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_q_du (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_q_du_d (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_w_h (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_w_hu (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_w_hu_h (__m256i, __m256i, __m256i); +__m256i __lasx_xvmax_b (__m256i, __m256i); +__m256i __lasx_xvmax_bu (__m256i, __m256i); +__m256i __lasx_xvmax_d (__m256i, __m256i); +__m256i __lasx_xvmax_du (__m256i, __m256i); +__m256i __lasx_xvmax_h (__m256i, __m256i); +__m256i __lasx_xvmax_hu (__m256i, __m256i); +__m256i __lasx_xvmaxi_b (__m256i, imm_n16_15); +__m256i __lasx_xvmaxi_bu (__m256i, imm0_31); +__m256i __lasx_xvmaxi_d (__m256i, imm_n16_15); +__m256i __lasx_xvmaxi_du (__m256i, imm0_31); +__m256i __lasx_xvmaxi_h (__m256i, imm_n16_15); +__m256i __lasx_xvmaxi_hu (__m256i, imm0_31); +__m256i __lasx_xvmaxi_w (__m256i, imm_n16_15); +__m256i __lasx_xvmaxi_wu (__m256i, imm0_31); +__m256i __lasx_xvmax_w (__m256i, __m256i); +__m256i __lasx_xvmax_wu (__m256i, __m256i); +__m256i __lasx_xvmin_b (__m256i, __m256i); +__m256i __lasx_xvmin_bu (__m256i, __m256i); +__m256i __lasx_xvmin_d (__m256i, __m256i); +__m256i __lasx_xvmin_du (__m256i, __m256i); +__m256i __lasx_xvmin_h (__m256i, __m256i); +__m256i __lasx_xvmin_hu (__m256i, __m256i); +__m256i __lasx_xvmini_b (__m256i, imm_n16_15); +__m256i __lasx_xvmini_bu (__m256i, imm0_31); +__m256i __lasx_xvmini_d (__m256i, imm_n16_15); +__m256i __lasx_xvmini_du (__m256i, imm0_31); +__m256i __lasx_xvmini_h (__m256i, imm_n16_15); +__m256i __lasx_xvmini_hu (__m256i, imm0_31); +__m256i __lasx_xvmini_w (__m256i, imm_n16_15); +__m256i 
__lasx_xvmini_wu (__m256i, imm0_31); +__m256i __lasx_xvmin_w (__m256i, __m256i); +__m256i __lasx_xvmin_wu (__m256i, __m256i); +__m256i __lasx_xvmod_b (__m256i, __m256i); +__m256i __lasx_xvmod_bu (__m256i, __m256i); +__m256i __lasx_xvmod_d (__m256i, __m256i); +__m256i __lasx_xvmod_du (__m256i, __m256i); +__m256i __lasx_xvmod_h (__m256i, __m256i); +__m256i __lasx_xvmod_hu (__m256i, __m256i); +__m256i __lasx_xvmod_w (__m256i, __m256i); +__m256i __lasx_xvmod_wu (__m256i, __m256i); +__m256i __lasx_xvmskgez_b (__m256i); +__m256i __lasx_xvmskltz_b (__m256i); +__m256i __lasx_xvmskltz_d (__m256i); +__m256i __lasx_xvmskltz_h (__m256i); +__m256i __lasx_xvmskltz_w (__m256i); +__m256i __lasx_xvmsknz_b (__m256i); +__m256i __lasx_xvmsub_b (__m256i, __m256i, __m256i); +__m256i __lasx_xvmsub_d (__m256i, __m256i, __m256i); +__m256i __lasx_xvmsub_h (__m256i, __m256i, __m256i); +__m256i __lasx_xvmsub_w (__m256i, __m256i, __m256i); +__m256i __lasx_xvmuh_b (__m256i, __m256i); +__m256i __lasx_xvmuh_bu (__m256i, __m256i); +__m256i __lasx_xvmuh_d (__m256i, __m256i); +__m256i __lasx_xvmuh_du (__m256i, __m256i); +__m256i __lasx_xvmuh_h (__m256i, __m256i); +__m256i __lasx_xvmuh_hu (__m256i, __m256i); +__m256i __lasx_xvmuh_w (__m256i, __m256i); +__m256i __lasx_xvmuh_wu (__m256i, __m256i); +__m256i __lasx_xvmul_b (__m256i, __m256i); +__m256i __lasx_xvmul_d (__m256i, __m256i); +__m256i __lasx_xvmul_h (__m256i, __m256i); +__m256i __lasx_xvmul_w (__m256i, __m256i); +__m256i __lasx_xvmulwev_d_w (__m256i, __m256i); +__m256i __lasx_xvmulwev_d_wu (__m256i, __m256i); +__m256i __lasx_xvmulwev_d_wu_w (__m256i, __m256i); +__m256i __lasx_xvmulwev_h_b (__m256i, __m256i); +__m256i __lasx_xvmulwev_h_bu (__m256i, __m256i); +__m256i __lasx_xvmulwev_h_bu_b (__m256i, __m256i); +__m256i __lasx_xvmulwev_q_d (__m256i, __m256i); +__m256i __lasx_xvmulwev_q_du (__m256i, __m256i); +__m256i __lasx_xvmulwev_q_du_d (__m256i, __m256i); +__m256i __lasx_xvmulwev_w_h (__m256i, __m256i); +__m256i __lasx_xvmulwev_w_hu (__m256i, __m256i); +__m256i __lasx_xvmulwev_w_hu_h (__m256i, __m256i); +__m256i __lasx_xvmulwod_d_w (__m256i, __m256i); +__m256i __lasx_xvmulwod_d_wu (__m256i, __m256i); +__m256i __lasx_xvmulwod_d_wu_w (__m256i, __m256i); +__m256i __lasx_xvmulwod_h_b (__m256i, __m256i); +__m256i __lasx_xvmulwod_h_bu (__m256i, __m256i); +__m256i __lasx_xvmulwod_h_bu_b (__m256i, __m256i); +__m256i __lasx_xvmulwod_q_d (__m256i, __m256i); +__m256i __lasx_xvmulwod_q_du (__m256i, __m256i); +__m256i __lasx_xvmulwod_q_du_d (__m256i, __m256i); +__m256i __lasx_xvmulwod_w_h (__m256i, __m256i); +__m256i __lasx_xvmulwod_w_hu (__m256i, __m256i); +__m256i __lasx_xvmulwod_w_hu_h (__m256i, __m256i); +__m256i __lasx_xvneg_b (__m256i); +__m256i __lasx_xvneg_d (__m256i); +__m256i __lasx_xvneg_h (__m256i); +__m256i __lasx_xvneg_w (__m256i); +__m256i __lasx_xvnori_b (__m256i, imm0_255); +__m256i __lasx_xvnor_v (__m256i, __m256i); +__m256i __lasx_xvori_b (__m256i, imm0_255); +__m256i __lasx_xvorn_v (__m256i, __m256i); +__m256i __lasx_xvor_v (__m256i, __m256i); +__m256i __lasx_xvpackev_b (__m256i, __m256i); +__m256i __lasx_xvpackev_d (__m256i, __m256i); +__m256i __lasx_xvpackev_h (__m256i, __m256i); +__m256i __lasx_xvpackev_w (__m256i, __m256i); +__m256i __lasx_xvpackod_b (__m256i, __m256i); +__m256i __lasx_xvpackod_d (__m256i, __m256i); +__m256i __lasx_xvpackod_h (__m256i, __m256i); +__m256i __lasx_xvpackod_w (__m256i, __m256i); +__m256i __lasx_xvpcnt_b (__m256i); +__m256i __lasx_xvpcnt_d (__m256i); +__m256i __lasx_xvpcnt_h (__m256i); +__m256i __lasx_xvpcnt_w (__m256i); 
+__m256i __lasx_xvpermi_d (__m256i, imm0_255); +__m256i __lasx_xvpermi_q (__m256i, __m256i, imm0_255); +__m256i __lasx_xvpermi_w (__m256i, __m256i, imm0_255); +__m256i __lasx_xvperm_w (__m256i, __m256i); +__m256i __lasx_xvpickev_b (__m256i, __m256i); +__m256i __lasx_xvpickev_d (__m256i, __m256i); +__m256i __lasx_xvpickev_h (__m256i, __m256i); +__m256i __lasx_xvpickev_w (__m256i, __m256i); +__m256i __lasx_xvpickod_b (__m256i, __m256i); +__m256i __lasx_xvpickod_d (__m256i, __m256i); +__m256i __lasx_xvpickod_h (__m256i, __m256i); +__m256i __lasx_xvpickod_w (__m256i, __m256i); +long int __lasx_xvpickve2gr_d (__m256i, imm0_3); +unsigned long int __lasx_xvpickve2gr_du (__m256i, imm0_3); +int __lasx_xvpickve2gr_w (__m256i, imm0_7); +unsigned int __lasx_xvpickve2gr_wu (__m256i, imm0_7); +__m256i __lasx_xvpickve_d (__m256i, imm0_3); +__m256d __lasx_xvpickve_d_f (__m256d, imm0_3); +__m256i __lasx_xvpickve_w (__m256i, imm0_7); +__m256 __lasx_xvpickve_w_f (__m256, imm0_7); +__m256i __lasx_xvrepl128vei_b (__m256i, imm0_15); +__m256i __lasx_xvrepl128vei_d (__m256i, imm0_1); +__m256i __lasx_xvrepl128vei_h (__m256i, imm0_7); +__m256i __lasx_xvrepl128vei_w (__m256i, imm0_3); +__m256i __lasx_xvreplgr2vr_b (int); +__m256i __lasx_xvreplgr2vr_d (long int); +__m256i __lasx_xvreplgr2vr_h (int); +__m256i __lasx_xvreplgr2vr_w (int); +__m256i __lasx_xvrepli_b (imm_n512_511); +__m256i __lasx_xvrepli_d (imm_n512_511); +__m256i __lasx_xvrepli_h (imm_n512_511); +__m256i __lasx_xvrepli_w (imm_n512_511); +__m256i __lasx_xvreplve0_b (__m256i); +__m256i __lasx_xvreplve0_d (__m256i); +__m256i __lasx_xvreplve0_h (__m256i); +__m256i __lasx_xvreplve0_q (__m256i); +__m256i __lasx_xvreplve0_w (__m256i); +__m256i __lasx_xvreplve_b (__m256i, int); +__m256i __lasx_xvreplve_d (__m256i, int); +__m256i __lasx_xvreplve_h (__m256i, int); +__m256i __lasx_xvreplve_w (__m256i, int); +__m256i __lasx_xvrotr_b (__m256i, __m256i); +__m256i __lasx_xvrotr_d (__m256i, __m256i); +__m256i __lasx_xvrotr_h (__m256i, __m256i); +__m256i __lasx_xvrotri_b (__m256i, imm0_7); +__m256i __lasx_xvrotri_d (__m256i, imm0_63); +__m256i __lasx_xvrotri_h (__m256i, imm0_15); +__m256i __lasx_xvrotri_w (__m256i, imm0_31); +__m256i __lasx_xvrotr_w (__m256i, __m256i); +__m256i __lasx_xvsadd_b (__m256i, __m256i); +__m256i __lasx_xvsadd_bu (__m256i, __m256i); +__m256i __lasx_xvsadd_d (__m256i, __m256i); +__m256i __lasx_xvsadd_du (__m256i, __m256i); +__m256i __lasx_xvsadd_h (__m256i, __m256i); +__m256i __lasx_xvsadd_hu (__m256i, __m256i); +__m256i __lasx_xvsadd_w (__m256i, __m256i); +__m256i __lasx_xvsadd_wu (__m256i, __m256i); +__m256i __lasx_xvsat_b (__m256i, imm0_7); +__m256i __lasx_xvsat_bu (__m256i, imm0_7); +__m256i __lasx_xvsat_d (__m256i, imm0_63); +__m256i __lasx_xvsat_du (__m256i, imm0_63); +__m256i __lasx_xvsat_h (__m256i, imm0_15); +__m256i __lasx_xvsat_hu (__m256i, imm0_15); +__m256i __lasx_xvsat_w (__m256i, imm0_31); +__m256i __lasx_xvsat_wu (__m256i, imm0_31); +__m256i __lasx_xvseq_b (__m256i, __m256i); +__m256i __lasx_xvseq_d (__m256i, __m256i); +__m256i __lasx_xvseq_h (__m256i, __m256i); +__m256i __lasx_xvseqi_b (__m256i, imm_n16_15); +__m256i __lasx_xvseqi_d (__m256i, imm_n16_15); +__m256i __lasx_xvseqi_h (__m256i, imm_n16_15); +__m256i __lasx_xvseqi_w (__m256i, imm_n16_15); +__m256i __lasx_xvseq_w (__m256i, __m256i); +__m256i __lasx_xvshuf4i_b (__m256i, imm0_255); +__m256i __lasx_xvshuf4i_d (__m256i, __m256i, imm0_255); +__m256i __lasx_xvshuf4i_h (__m256i, imm0_255); +__m256i __lasx_xvshuf4i_w (__m256i, imm0_255); +__m256i __lasx_xvshuf_b (__m256i, 
__m256i, __m256i); +__m256i __lasx_xvshuf_d (__m256i, __m256i, __m256i); +__m256i __lasx_xvshuf_h (__m256i, __m256i, __m256i); +__m256i __lasx_xvshuf_w (__m256i, __m256i, __m256i); +__m256i __lasx_xvsigncov_b (__m256i, __m256i); +__m256i __lasx_xvsigncov_d (__m256i, __m256i); +__m256i __lasx_xvsigncov_h (__m256i, __m256i); +__m256i __lasx_xvsigncov_w (__m256i, __m256i); +__m256i __lasx_xvsle_b (__m256i, __m256i); +__m256i __lasx_xvsle_bu (__m256i, __m256i); +__m256i __lasx_xvsle_d (__m256i, __m256i); +__m256i __lasx_xvsle_du (__m256i, __m256i); +__m256i __lasx_xvsle_h (__m256i, __m256i); +__m256i __lasx_xvsle_hu (__m256i, __m256i); +__m256i __lasx_xvslei_b (__m256i, imm_n16_15); +__m256i __lasx_xvslei_bu (__m256i, imm0_31); +__m256i __lasx_xvslei_d (__m256i, imm_n16_15); +__m256i __lasx_xvslei_du (__m256i, imm0_31); +__m256i __lasx_xvslei_h (__m256i, imm_n16_15); +__m256i __lasx_xvslei_hu (__m256i, imm0_31); +__m256i __lasx_xvslei_w (__m256i, imm_n16_15); +__m256i __lasx_xvslei_wu (__m256i, imm0_31); +__m256i __lasx_xvsle_w (__m256i, __m256i); +__m256i __lasx_xvsle_wu (__m256i, __m256i); +__m256i __lasx_xvsll_b (__m256i, __m256i); +__m256i __lasx_xvsll_d (__m256i, __m256i); +__m256i __lasx_xvsll_h (__m256i, __m256i); +__m256i __lasx_xvslli_b (__m256i, imm0_7); +__m256i __lasx_xvslli_d (__m256i, imm0_63); +__m256i __lasx_xvslli_h (__m256i, imm0_15); +__m256i __lasx_xvslli_w (__m256i, imm0_31); +__m256i __lasx_xvsll_w (__m256i, __m256i); +__m256i __lasx_xvsllwil_du_wu (__m256i, imm0_31); +__m256i __lasx_xvsllwil_d_w (__m256i, imm0_31); +__m256i __lasx_xvsllwil_h_b (__m256i, imm0_7); +__m256i __lasx_xvsllwil_hu_bu (__m256i, imm0_7); +__m256i __lasx_xvsllwil_w_h (__m256i, imm0_15); +__m256i __lasx_xvsllwil_wu_hu (__m256i, imm0_15); +__m256i __lasx_xvslt_b (__m256i, __m256i); +__m256i __lasx_xvslt_bu (__m256i, __m256i); +__m256i __lasx_xvslt_d (__m256i, __m256i); +__m256i __lasx_xvslt_du (__m256i, __m256i); +__m256i __lasx_xvslt_h (__m256i, __m256i); +__m256i __lasx_xvslt_hu (__m256i, __m256i); +__m256i __lasx_xvslti_b (__m256i, imm_n16_15); +__m256i __lasx_xvslti_bu (__m256i, imm0_31); +__m256i __lasx_xvslti_d (__m256i, imm_n16_15); +__m256i __lasx_xvslti_du (__m256i, imm0_31); +__m256i __lasx_xvslti_h (__m256i, imm_n16_15); +__m256i __lasx_xvslti_hu (__m256i, imm0_31); +__m256i __lasx_xvslti_w (__m256i, imm_n16_15); +__m256i __lasx_xvslti_wu (__m256i, imm0_31); +__m256i __lasx_xvslt_w (__m256i, __m256i); +__m256i __lasx_xvslt_wu (__m256i, __m256i); +__m256i __lasx_xvsra_b (__m256i, __m256i); +__m256i __lasx_xvsra_d (__m256i, __m256i); +__m256i __lasx_xvsra_h (__m256i, __m256i); +__m256i __lasx_xvsrai_b (__m256i, imm0_7); +__m256i __lasx_xvsrai_d (__m256i, imm0_63); +__m256i __lasx_xvsrai_h (__m256i, imm0_15); +__m256i __lasx_xvsrai_w (__m256i, imm0_31); +__m256i __lasx_xvsran_b_h (__m256i, __m256i); +__m256i __lasx_xvsran_h_w (__m256i, __m256i); +__m256i __lasx_xvsrani_b_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvsrani_d_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvsrani_h_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvsrani_w_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvsran_w_d (__m256i, __m256i); +__m256i __lasx_xvsrar_b (__m256i, __m256i); +__m256i __lasx_xvsrar_d (__m256i, __m256i); +__m256i __lasx_xvsrar_h (__m256i, __m256i); +__m256i __lasx_xvsrari_b (__m256i, imm0_7); +__m256i __lasx_xvsrari_d (__m256i, imm0_63); +__m256i __lasx_xvsrari_h (__m256i, imm0_15); +__m256i __lasx_xvsrari_w (__m256i, imm0_31); +__m256i __lasx_xvsrarn_b_h (__m256i, __m256i); +__m256i 
__lasx_xvsrarn_h_w (__m256i, __m256i); +__m256i __lasx_xvsrarni_b_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvsrarni_d_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvsrarni_h_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvsrarni_w_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvsrarn_w_d (__m256i, __m256i); +__m256i __lasx_xvsrar_w (__m256i, __m256i); +__m256i __lasx_xvsra_w (__m256i, __m256i); +__m256i __lasx_xvsrl_b (__m256i, __m256i); +__m256i __lasx_xvsrl_d (__m256i, __m256i); +__m256i __lasx_xvsrl_h (__m256i, __m256i); +__m256i __lasx_xvsrli_b (__m256i, imm0_7); +__m256i __lasx_xvsrli_d (__m256i, imm0_63); +__m256i __lasx_xvsrli_h (__m256i, imm0_15); +__m256i __lasx_xvsrli_w (__m256i, imm0_31); +__m256i __lasx_xvsrln_b_h (__m256i, __m256i); +__m256i __lasx_xvsrln_h_w (__m256i, __m256i); +__m256i __lasx_xvsrlni_b_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvsrlni_d_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvsrlni_h_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvsrlni_w_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvsrln_w_d (__m256i, __m256i); +__m256i __lasx_xvsrlr_b (__m256i, __m256i); +__m256i __lasx_xvsrlr_d (__m256i, __m256i); +__m256i __lasx_xvsrlr_h (__m256i, __m256i); +__m256i __lasx_xvsrlri_b (__m256i, imm0_7); +__m256i __lasx_xvsrlri_d (__m256i, imm0_63); +__m256i __lasx_xvsrlri_h (__m256i, imm0_15); +__m256i __lasx_xvsrlri_w (__m256i, imm0_31); +__m256i __lasx_xvsrlrn_b_h (__m256i, __m256i); +__m256i __lasx_xvsrlrn_h_w (__m256i, __m256i); +__m256i __lasx_xvsrlrni_b_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvsrlrni_d_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvsrlrni_h_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvsrlrni_w_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvsrlrn_w_d (__m256i, __m256i); +__m256i __lasx_xvsrlr_w (__m256i, __m256i); +__m256i __lasx_xvsrl_w (__m256i, __m256i); +__m256i __lasx_xvssran_b_h (__m256i, __m256i); +__m256i __lasx_xvssran_bu_h (__m256i, __m256i); +__m256i __lasx_xvssran_hu_w (__m256i, __m256i); +__m256i __lasx_xvssran_h_w (__m256i, __m256i); +__m256i __lasx_xvssrani_b_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvssrani_bu_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvssrani_d_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvssrani_du_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvssrani_hu_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvssrani_h_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvssrani_w_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvssrani_wu_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvssran_w_d (__m256i, __m256i); +__m256i __lasx_xvssran_wu_d (__m256i, __m256i); +__m256i __lasx_xvssrarn_b_h (__m256i, __m256i); +__m256i __lasx_xvssrarn_bu_h (__m256i, __m256i); +__m256i __lasx_xvssrarn_hu_w (__m256i, __m256i); +__m256i __lasx_xvssrarn_h_w (__m256i, __m256i); +__m256i __lasx_xvssrarni_b_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvssrarni_bu_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvssrarni_d_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvssrarni_du_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvssrarni_hu_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvssrarni_h_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvssrarni_w_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvssrarni_wu_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvssrarn_w_d (__m256i, __m256i); +__m256i __lasx_xvssrarn_wu_d (__m256i, __m256i); +__m256i __lasx_xvssrln_b_h (__m256i, __m256i); +__m256i __lasx_xvssrln_bu_h (__m256i, __m256i); +__m256i __lasx_xvssrln_hu_w (__m256i, 
__m256i); +__m256i __lasx_xvssrln_h_w (__m256i, __m256i); +__m256i __lasx_xvssrlni_b_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvssrlni_bu_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvssrlni_d_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvssrlni_du_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvssrlni_hu_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvssrlni_h_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvssrlni_w_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvssrlni_wu_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvssrln_w_d (__m256i, __m256i); +__m256i __lasx_xvssrln_wu_d (__m256i, __m256i); +__m256i __lasx_xvssrlrn_b_h (__m256i, __m256i); +__m256i __lasx_xvssrlrn_bu_h (__m256i, __m256i); +__m256i __lasx_xvssrlrn_hu_w (__m256i, __m256i); +__m256i __lasx_xvssrlrn_h_w (__m256i, __m256i); +__m256i __lasx_xvssrlrni_b_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvssrlrni_bu_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvssrlrni_d_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvssrlrni_du_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvssrlrni_hu_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvssrlrni_h_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvssrlrni_w_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvssrlrni_wu_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvssrlrn_w_d (__m256i, __m256i); +__m256i __lasx_xvssrlrn_wu_d (__m256i, __m256i); +__m256i __lasx_xvssub_b (__m256i, __m256i); +__m256i __lasx_xvssub_bu (__m256i, __m256i); +__m256i __lasx_xvssub_d (__m256i, __m256i); +__m256i __lasx_xvssub_du (__m256i, __m256i); +__m256i __lasx_xvssub_h (__m256i, __m256i); +__m256i __lasx_xvssub_hu (__m256i, __m256i); +__m256i __lasx_xvssub_w (__m256i, __m256i); +__m256i __lasx_xvssub_wu (__m256i, __m256i); +void __lasx_xvst (__m256i, void *, imm_n2048_2047); +void __lasx_xvstelm_b (__m256i, void *, imm_n128_127, imm0_31); +void __lasx_xvstelm_d (__m256i, void *, imm_n128_127, imm0_3); +void __lasx_xvstelm_h (__m256i, void *, imm_n128_127, imm0_15); +void __lasx_xvstelm_w (__m256i, void *, imm_n128_127, imm0_7); +void __lasx_xvstx (__m256i, void *, long int); +__m256i __lasx_xvsub_b (__m256i, __m256i); +__m256i __lasx_xvsub_d (__m256i, __m256i); +__m256i __lasx_xvsub_h (__m256i, __m256i); +__m256i __lasx_xvsubi_bu (__m256i, imm0_31); +__m256i __lasx_xvsubi_du (__m256i, imm0_31); +__m256i __lasx_xvsubi_hu (__m256i, imm0_31); +__m256i __lasx_xvsubi_wu (__m256i, imm0_31); +__m256i __lasx_xvsub_q (__m256i, __m256i); +__m256i __lasx_xvsub_w (__m256i, __m256i); +__m256i __lasx_xvsubwev_d_w (__m256i, __m256i); +__m256i __lasx_xvsubwev_d_wu (__m256i, __m256i); +__m256i __lasx_xvsubwev_h_b (__m256i, __m256i); +__m256i __lasx_xvsubwev_h_bu (__m256i, __m256i); +__m256i __lasx_xvsubwev_q_d (__m256i, __m256i); +__m256i __lasx_xvsubwev_q_du (__m256i, __m256i); +__m256i __lasx_xvsubwev_w_h (__m256i, __m256i); +__m256i __lasx_xvsubwev_w_hu (__m256i, __m256i); +__m256i __lasx_xvsubwod_d_w (__m256i, __m256i); +__m256i __lasx_xvsubwod_d_wu (__m256i, __m256i); +__m256i __lasx_xvsubwod_h_b (__m256i, __m256i); +__m256i __lasx_xvsubwod_h_bu (__m256i, __m256i); +__m256i __lasx_xvsubwod_q_d (__m256i, __m256i); +__m256i __lasx_xvsubwod_q_du (__m256i, __m256i); +__m256i __lasx_xvsubwod_w_h (__m256i, __m256i); +__m256i __lasx_xvsubwod_w_hu (__m256i, __m256i); +__m256i __lasx_xvxori_b (__m256i, imm0_255); +__m256i __lasx_xvxor_v (__m256i, __m256i); +@end smallexample -v16i8 __builtin_msa_msubv_b (v16i8, v16i8, v16i8); -v8i16 __builtin_msa_msubv_h (v8i16, v8i16, v8i16); -v4i32 
__builtin_msa_msubv_w (v4i32, v4i32, v4i32);
-v2i64 __builtin_msa_msubv_d (v2i64, v2i64, v2i64);
+These intrinsic functions are available by including @code{lasxintrin.h} and
+using @option{-mfrecipe} and @option{-mlasx}.
+@smallexample
+__m256d __lasx_xvfrecipe_d (__m256d);
+__m256 __lasx_xvfrecipe_s (__m256);
+__m256d __lasx_xvfrsqrte_d (__m256d);
+__m256 __lasx_xvfrsqrte_s (__m256);
+@end smallexample
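+For example, a sketch of calling the single-precision reciprocal-estimate
+intrinsic (the @code{__lsx_*} forms behave analogously); the estimate
+instructions return an approximation rather than an exact result:
+
+@smallexample
+#include <lasxintrin.h>
+
+extern __m256 @var{x};
+
+void
+test (void)
+@{
+  @var{x} = __lasx_xvfrecipe_s (@var{x});  /* Approximate 1/x per element.  */
+@}
+@end smallexample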
-v8i16 __builtin_msa_mul_q_h (v8i16, v8i16);
-v4i32 __builtin_msa_mul_q_w (v4i32, v4i32);
+@node MIPS DSP Built-in Functions
+@subsection MIPS DSP Built-in Functions
-v8i16 __builtin_msa_mulr_q_h (v8i16, v8i16);
-v4i32 __builtin_msa_mulr_q_w (v4i32, v4i32);
+The MIPS DSP Application-Specific Extension (ASE) includes new
+instructions that are designed to improve the performance of DSP and
+media applications.  It provides instructions that operate on packed
+8-bit and 16-bit integer data, and on Q7, Q15 and Q31 fractional data.
-v16i8 __builtin_msa_mulv_b (v16i8, v16i8);
-v8i16 __builtin_msa_mulv_h (v8i16, v8i16);
-v4i32 __builtin_msa_mulv_w (v4i32, v4i32);
-v2i64 __builtin_msa_mulv_d (v2i64, v2i64);
+GCC supports MIPS DSP operations using both the generic
+vector extensions (@pxref{Vector Extensions}) and a collection of
+MIPS-specific built-in functions.  Both kinds of support are
+enabled by the @option{-mdsp} command-line option.
-v16i8 __builtin_msa_nloc_b (v16i8);
-v8i16 __builtin_msa_nloc_h (v8i16);
-v4i32 __builtin_msa_nloc_w (v4i32);
-v2i64 __builtin_msa_nloc_d (v2i64);
+Revision 2 of the ASE was introduced in the second half of 2006.
+This revision adds extra instructions to the original ASE, but is
+otherwise backwards-compatible with it.  You can select revision 2
+using the command-line option @option{-mdspr2}; this option implies
+@option{-mdsp}.
-v16i8 __builtin_msa_nlzc_b (v16i8);
-v8i16 __builtin_msa_nlzc_h (v8i16);
-v4i32 __builtin_msa_nlzc_w (v4i32);
-v2i64 __builtin_msa_nlzc_d (v2i64);
+The SCOUNT and POS bits of the DSP control register are global.  The
+WRDSP, EXTPDP, EXTPDPV and MTHLIP instructions modify the SCOUNT and
+POS bits.  During optimization, the compiler does not delete these
+instructions and it does not delete calls to functions containing
+these instructions.
-v16u8 __builtin_msa_nor_v (v16u8, v16u8);
+At present, GCC only provides support for operations on 32-bit
+vectors.  The vector type associated with 8-bit integer data is
+usually called @code{v4i8}, the vector type associated with Q7
+is usually called @code{v4q7}, the vector type associated with 16-bit
+integer data is usually called @code{v2i16}, and the vector type
+associated with Q15 is usually called @code{v2q15}.  They can be
+defined in C as follows:
-v16u8 __builtin_msa_nori_b (v16u8, imm0_255);
+@smallexample
+typedef signed char v4i8 __attribute__ ((vector_size(4)));
+typedef signed char v4q7 __attribute__ ((vector_size(4)));
+typedef short v2i16 __attribute__ ((vector_size(4)));
+typedef short v2q15 __attribute__ ((vector_size(4)));
+@end smallexample
-v16u8 __builtin_msa_or_v (v16u8, v16u8);
+@code{v4i8}, @code{v4q7}, @code{v2i16} and @code{v2q15} values are
+initialized in the same way as aggregates.  For example:
-v16u8 __builtin_msa_ori_b (v16u8, imm0_255);
+@smallexample
+v4i8 a = @{1, 2, 3, 4@};
+v4i8 b;
+b = (v4i8) @{5, 6, 7, 8@};
-v16i8 __builtin_msa_pckev_b (v16i8, v16i8);
-v8i16 __builtin_msa_pckev_h (v8i16, v8i16);
-v4i32 __builtin_msa_pckev_w (v4i32, v4i32);
-v2i64 __builtin_msa_pckev_d (v2i64, v2i64);
+v2q15 c = @{0x0fcb, 0x3a75@};
+v2q15 d;
+d = (v2q15) @{0.1234 * 0x1.0p15, 0.4567 * 0x1.0p15@};
+@end smallexample
-v16i8 __builtin_msa_pckod_b (v16i8, v16i8);
-v8i16 __builtin_msa_pckod_h (v8i16, v8i16);
-v4i32 __builtin_msa_pckod_w (v4i32, v4i32);
-v2i64 __builtin_msa_pckod_d (v2i64, v2i64);
+@emph{Note:} The CPU's endianness determines the order in which values
+are packed.  On little-endian targets, the first value is the least
+significant and the last value is the most significant.  The opposite
+order applies to big-endian targets.  For example, the code above
+sets the lowest byte of @code{a} to @code{1} on little-endian targets
+and @code{4} on big-endian targets.
-v16i8 __builtin_msa_pcnt_b (v16i8);
-v8i16 __builtin_msa_pcnt_h (v8i16);
-v4i32 __builtin_msa_pcnt_w (v4i32);
-v2i64 __builtin_msa_pcnt_d (v2i64);
+@emph{Note:} Q7, Q15 and Q31 values must be initialized with their integer
+representation.  As shown in this example, the integer representation
+of a Q7 value can be obtained by multiplying the fractional value by
+@code{0x1.0p7}.  The equivalent for Q15 values is to multiply by
+@code{0x1.0p15}.  The equivalent for Q31 values is to multiply by
+@code{0x1.0p31}.
-v16i8 __builtin_msa_sat_s_b (v16i8, imm0_7);
-v8i16 __builtin_msa_sat_s_h (v8i16, imm0_15);
-v4i32 __builtin_msa_sat_s_w (v4i32, imm0_31);
-v2i64 __builtin_msa_sat_s_d (v2i64, imm0_63);
+The table below lists the @code{v4i8} and @code{v2q15} operations for which
+hardware support exists.  @code{a} and @code{b} are @code{v4i8} values,
+and @code{c} and @code{d} are @code{v2q15} values.
-v16u8 __builtin_msa_sat_u_b (v16u8, imm0_7);
-v8u16 __builtin_msa_sat_u_h (v8u16, imm0_15);
-v4u32 __builtin_msa_sat_u_w (v4u32, imm0_31);
-v2u64 __builtin_msa_sat_u_d (v2u64, imm0_63);
+@multitable @columnfractions .50 .50
+@headitem C code @tab MIPS instruction
+@item @code{a + b} @tab @code{addu.qb}
+@item @code{c + d} @tab @code{addq.ph}
+@item @code{a - b} @tab @code{subu.qb}
+@item @code{c - d} @tab @code{subq.ph}
+@end multitable
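+For example, with @option{-mdsp} the addition below can be carried out
+by a single @code{addu.qb} instruction (a sketch using the typedefs
+defined above):
+
+@smallexample
+v4i8 a = @{1, 2, 3, 4@};
+v4i8 b = @{5, 6, 7, 8@};
+v4i8 sum;
+
+void
+test (void)
+@{
+  sum = a + b;
+@}
+@end smallexample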
-v16i8 __builtin_msa_shf_b (v16i8, imm0_255);
-v8i16 __builtin_msa_shf_h (v8i16, imm0_255);
-v4i32 __builtin_msa_shf_w (v4i32, imm0_255);
+The table below lists the @code{v2i16} operation for which
+hardware support exists in revision 2 of the DSP ASE.  @code{e} and
+@code{f} are @code{v2i16} values.
-v16i8 __builtin_msa_sld_b (v16i8, v16i8, i32);
-v8i16 __builtin_msa_sld_h (v8i16, v8i16, i32);
-v4i32 __builtin_msa_sld_w (v4i32, v4i32, i32);
-v2i64 __builtin_msa_sld_d (v2i64, v2i64, i32);
+@multitable @columnfractions .50 .50
+@headitem C code @tab MIPS instruction
+@item @code{e * f} @tab @code{mul.ph}
+@end multitable
-v16i8 __builtin_msa_sldi_b (v16i8, v16i8, imm0_15);
-v8i16 __builtin_msa_sldi_h (v8i16, v8i16, imm0_7);
-v4i32 __builtin_msa_sldi_w (v4i32, v4i32, imm0_3);
-v2i64 __builtin_msa_sldi_d (v2i64, v2i64, imm0_1);
+It is easier to describe the DSP built-in functions if we first define
+the following types:
-v16i8 __builtin_msa_sll_b (v16i8, v16i8);
-v8i16 __builtin_msa_sll_h (v8i16, v8i16);
-v4i32 __builtin_msa_sll_w (v4i32, v4i32);
-v2i64 __builtin_msa_sll_d (v2i64, v2i64);
+@smallexample
+typedef int q31;
+typedef int i32;
+typedef unsigned int ui32;
+typedef long long a64;
+@end smallexample
-v16i8 __builtin_msa_slli_b (v16i8, imm0_7);
-v8i16 __builtin_msa_slli_h (v8i16, imm0_15);
-v4i32 __builtin_msa_slli_w (v4i32, imm0_31);
-v2i64 __builtin_msa_slli_d (v2i64, imm0_63);
+@code{q31} and @code{i32} are actually the same as @code{int}, but we
+use @code{q31} to indicate a Q31 fractional value and @code{i32} to
+indicate a 32-bit integer value.  Similarly, @code{a64} is the same as
+@code{long long}, but we use @code{a64} to indicate values that are
+placed in one of the four DSP accumulators (@code{$ac0},
+@code{$ac1}, @code{$ac2} or @code{$ac3}).
-v16i8 __builtin_msa_splat_b (v16i8, i32);
-v8i16 __builtin_msa_splat_h (v8i16, i32);
-v4i32 __builtin_msa_splat_w (v4i32, i32);
-v2i64 __builtin_msa_splat_d (v2i64, i32);
+Also, some built-in functions prefer or require immediate numbers as
+parameters, because the corresponding DSP instructions accept both
+immediate numbers and register operands, or accept immediate numbers
+only.  The immediate parameter ranges are listed below.
-v16i8 __builtin_msa_splati_b (v16i8, imm0_15);
-v8i16 __builtin_msa_splati_h (v8i16, imm0_7);
-v4i32 __builtin_msa_splati_w (v4i32, imm0_3);
-v2i64 __builtin_msa_splati_d (v2i64, imm0_1);
+@smallexample
+imm0_3: 0 to 3.
+imm0_7: 0 to 7.
+imm0_15: 0 to 15.
+imm0_31: 0 to 31.
+imm0_63: 0 to 63.
+imm0_255: 0 to 255.
+imm_n32_31: -32 to 31.
+imm_n512_511: -512 to 511.
+@end smallexample
-v16i8 __builtin_msa_sra_b (v16i8, v16i8);
-v8i16 __builtin_msa_sra_h (v8i16, v8i16);
-v4i32 __builtin_msa_sra_w (v4i32, v4i32);
-v2i64 __builtin_msa_sra_d (v2i64, v2i64);
+The following built-in functions map directly to a particular MIPS DSP
+instruction.  Please refer to the architecture specification
+for details on what each instruction does.
-v16i8 __builtin_msa_srai_b (v16i8, imm0_7); -v8i16 __builtin_msa_srai_h (v8i16, imm0_15); -v4i32 __builtin_msa_srai_w (v4i32, imm0_31); -v2i64 __builtin_msa_srai_d (v2i64, imm0_63); +@smallexample +v2q15 __builtin_mips_addq_ph (v2q15, v2q15); +v2q15 __builtin_mips_addq_s_ph (v2q15, v2q15); +q31 __builtin_mips_addq_s_w (q31, q31); +v4i8 __builtin_mips_addu_qb (v4i8, v4i8); +v4i8 __builtin_mips_addu_s_qb (v4i8, v4i8); +v2q15 __builtin_mips_subq_ph (v2q15, v2q15); +v2q15 __builtin_mips_subq_s_ph (v2q15, v2q15); +q31 __builtin_mips_subq_s_w (q31, q31); +v4i8 __builtin_mips_subu_qb (v4i8, v4i8); +v4i8 __builtin_mips_subu_s_qb (v4i8, v4i8); +i32 __builtin_mips_addsc (i32, i32); +i32 __builtin_mips_addwc (i32, i32); +i32 __builtin_mips_modsub (i32, i32); +i32 __builtin_mips_raddu_w_qb (v4i8); +v2q15 __builtin_mips_absq_s_ph (v2q15); +q31 __builtin_mips_absq_s_w (q31); +v4i8 __builtin_mips_precrq_qb_ph (v2q15, v2q15); +v2q15 __builtin_mips_precrq_ph_w (q31, q31); +v2q15 __builtin_mips_precrq_rs_ph_w (q31, q31); +v4i8 __builtin_mips_precrqu_s_qb_ph (v2q15, v2q15); +q31 __builtin_mips_preceq_w_phl (v2q15); +q31 __builtin_mips_preceq_w_phr (v2q15); +v2q15 __builtin_mips_precequ_ph_qbl (v4i8); +v2q15 __builtin_mips_precequ_ph_qbr (v4i8); +v2q15 __builtin_mips_precequ_ph_qbla (v4i8); +v2q15 __builtin_mips_precequ_ph_qbra (v4i8); +v2q15 __builtin_mips_preceu_ph_qbl (v4i8); +v2q15 __builtin_mips_preceu_ph_qbr (v4i8); +v2q15 __builtin_mips_preceu_ph_qbla (v4i8); +v2q15 __builtin_mips_preceu_ph_qbra (v4i8); +v4i8 __builtin_mips_shll_qb (v4i8, imm0_7); +v4i8 __builtin_mips_shll_qb (v4i8, i32); +v2q15 __builtin_mips_shll_ph (v2q15, imm0_15); +v2q15 __builtin_mips_shll_ph (v2q15, i32); +v2q15 __builtin_mips_shll_s_ph (v2q15, imm0_15); +v2q15 __builtin_mips_shll_s_ph (v2q15, i32); +q31 __builtin_mips_shll_s_w (q31, imm0_31); +q31 __builtin_mips_shll_s_w (q31, i32); +v4i8 __builtin_mips_shrl_qb (v4i8, imm0_7); +v4i8 __builtin_mips_shrl_qb (v4i8, i32); +v2q15 __builtin_mips_shra_ph (v2q15, imm0_15); +v2q15 __builtin_mips_shra_ph (v2q15, i32); +v2q15 __builtin_mips_shra_r_ph (v2q15, imm0_15); +v2q15 __builtin_mips_shra_r_ph (v2q15, i32); +q31 __builtin_mips_shra_r_w (q31, imm0_31); +q31 __builtin_mips_shra_r_w (q31, i32); +v2q15 __builtin_mips_muleu_s_ph_qbl (v4i8, v2q15); +v2q15 __builtin_mips_muleu_s_ph_qbr (v4i8, v2q15); +v2q15 __builtin_mips_mulq_rs_ph (v2q15, v2q15); +q31 __builtin_mips_muleq_s_w_phl (v2q15, v2q15); +q31 __builtin_mips_muleq_s_w_phr (v2q15, v2q15); +a64 __builtin_mips_dpau_h_qbl (a64, v4i8, v4i8); +a64 __builtin_mips_dpau_h_qbr (a64, v4i8, v4i8); +a64 __builtin_mips_dpsu_h_qbl (a64, v4i8, v4i8); +a64 __builtin_mips_dpsu_h_qbr (a64, v4i8, v4i8); +a64 __builtin_mips_dpaq_s_w_ph (a64, v2q15, v2q15); +a64 __builtin_mips_dpaq_sa_l_w (a64, q31, q31); +a64 __builtin_mips_dpsq_s_w_ph (a64, v2q15, v2q15); +a64 __builtin_mips_dpsq_sa_l_w (a64, q31, q31); +a64 __builtin_mips_mulsaq_s_w_ph (a64, v2q15, v2q15); +a64 __builtin_mips_maq_s_w_phl (a64, v2q15, v2q15); +a64 __builtin_mips_maq_s_w_phr (a64, v2q15, v2q15); +a64 __builtin_mips_maq_sa_w_phl (a64, v2q15, v2q15); +a64 __builtin_mips_maq_sa_w_phr (a64, v2q15, v2q15); +i32 __builtin_mips_bitrev (i32); +i32 __builtin_mips_insv (i32, i32); +v4i8 __builtin_mips_repl_qb (imm0_255); +v4i8 __builtin_mips_repl_qb (i32); +v2q15 __builtin_mips_repl_ph (imm_n512_511); +v2q15 __builtin_mips_repl_ph (i32); +void __builtin_mips_cmpu_eq_qb (v4i8, v4i8); +void __builtin_mips_cmpu_lt_qb (v4i8, v4i8); +void __builtin_mips_cmpu_le_qb (v4i8, v4i8); +i32 
__builtin_mips_cmpgu_eq_qb (v4i8, v4i8); +i32 __builtin_mips_cmpgu_lt_qb (v4i8, v4i8); +i32 __builtin_mips_cmpgu_le_qb (v4i8, v4i8); +void __builtin_mips_cmp_eq_ph (v2q15, v2q15); +void __builtin_mips_cmp_lt_ph (v2q15, v2q15); +void __builtin_mips_cmp_le_ph (v2q15, v2q15); +v4i8 __builtin_mips_pick_qb (v4i8, v4i8); +v2q15 __builtin_mips_pick_ph (v2q15, v2q15); +v2q15 __builtin_mips_packrl_ph (v2q15, v2q15); +i32 __builtin_mips_extr_w (a64, imm0_31); +i32 __builtin_mips_extr_w (a64, i32); +i32 __builtin_mips_extr_r_w (a64, imm0_31); +i32 __builtin_mips_extr_s_h (a64, i32); +i32 __builtin_mips_extr_rs_w (a64, imm0_31); +i32 __builtin_mips_extr_rs_w (a64, i32); +i32 __builtin_mips_extr_s_h (a64, imm0_31); +i32 __builtin_mips_extr_r_w (a64, i32); +i32 __builtin_mips_extp (a64, imm0_31); +i32 __builtin_mips_extp (a64, i32); +i32 __builtin_mips_extpdp (a64, imm0_31); +i32 __builtin_mips_extpdp (a64, i32); +a64 __builtin_mips_shilo (a64, imm_n32_31); +a64 __builtin_mips_shilo (a64, i32); +a64 __builtin_mips_mthlip (a64, i32); +void __builtin_mips_wrdsp (i32, imm0_63); +i32 __builtin_mips_rddsp (imm0_63); +i32 __builtin_mips_lbux (void *, i32); +i32 __builtin_mips_lhx (void *, i32); +i32 __builtin_mips_lwx (void *, i32); +a64 __builtin_mips_ldx (void *, i32); /* MIPS64 only */ +i32 __builtin_mips_bposge32 (void); +a64 __builtin_mips_madd (a64, i32, i32); +a64 __builtin_mips_maddu (a64, ui32, ui32); +a64 __builtin_mips_msub (a64, i32, i32); +a64 __builtin_mips_msubu (a64, ui32, ui32); +a64 __builtin_mips_mult (i32, i32); +a64 __builtin_mips_multu (ui32, ui32); +@end smallexample -v16i8 __builtin_msa_srar_b (v16i8, v16i8); -v8i16 __builtin_msa_srar_h (v8i16, v8i16); -v4i32 __builtin_msa_srar_w (v4i32, v4i32); -v2i64 __builtin_msa_srar_d (v2i64, v2i64); +The following built-in functions map directly to a particular MIPS DSP REV 2 +instruction. Please refer to the architecture specification +for details on what each instruction does. 
-v16i8 __builtin_msa_srari_b (v16i8, imm0_7); -v8i16 __builtin_msa_srari_h (v8i16, imm0_15); -v4i32 __builtin_msa_srari_w (v4i32, imm0_31); -v2i64 __builtin_msa_srari_d (v2i64, imm0_63); +@smallexample +v4q7 __builtin_mips_absq_s_qb (v4q7); +v2i16 __builtin_mips_addu_ph (v2i16, v2i16); +v2i16 __builtin_mips_addu_s_ph (v2i16, v2i16); +v4i8 __builtin_mips_adduh_qb (v4i8, v4i8); +v4i8 __builtin_mips_adduh_r_qb (v4i8, v4i8); +i32 __builtin_mips_append (i32, i32, imm0_31); +i32 __builtin_mips_balign (i32, i32, imm0_3); +i32 __builtin_mips_cmpgdu_eq_qb (v4i8, v4i8); +i32 __builtin_mips_cmpgdu_lt_qb (v4i8, v4i8); +i32 __builtin_mips_cmpgdu_le_qb (v4i8, v4i8); +a64 __builtin_mips_dpa_w_ph (a64, v2i16, v2i16); +a64 __builtin_mips_dps_w_ph (a64, v2i16, v2i16); +v2i16 __builtin_mips_mul_ph (v2i16, v2i16); +v2i16 __builtin_mips_mul_s_ph (v2i16, v2i16); +q31 __builtin_mips_mulq_rs_w (q31, q31); +v2q15 __builtin_mips_mulq_s_ph (v2q15, v2q15); +q31 __builtin_mips_mulq_s_w (q31, q31); +a64 __builtin_mips_mulsa_w_ph (a64, v2i16, v2i16); +v4i8 __builtin_mips_precr_qb_ph (v2i16, v2i16); +v2i16 __builtin_mips_precr_sra_ph_w (i32, i32, imm0_31); +v2i16 __builtin_mips_precr_sra_r_ph_w (i32, i32, imm0_31); +i32 __builtin_mips_prepend (i32, i32, imm0_31); +v4i8 __builtin_mips_shra_qb (v4i8, imm0_7); +v4i8 __builtin_mips_shra_r_qb (v4i8, imm0_7); +v4i8 __builtin_mips_shra_qb (v4i8, i32); +v4i8 __builtin_mips_shra_r_qb (v4i8, i32); +v2i16 __builtin_mips_shrl_ph (v2i16, imm0_15); +v2i16 __builtin_mips_shrl_ph (v2i16, i32); +v2i16 __builtin_mips_subu_ph (v2i16, v2i16); +v2i16 __builtin_mips_subu_s_ph (v2i16, v2i16); +v4i8 __builtin_mips_subuh_qb (v4i8, v4i8); +v4i8 __builtin_mips_subuh_r_qb (v4i8, v4i8); +v2q15 __builtin_mips_addqh_ph (v2q15, v2q15); +v2q15 __builtin_mips_addqh_r_ph (v2q15, v2q15); +q31 __builtin_mips_addqh_w (q31, q31); +q31 __builtin_mips_addqh_r_w (q31, q31); +v2q15 __builtin_mips_subqh_ph (v2q15, v2q15); +v2q15 __builtin_mips_subqh_r_ph (v2q15, v2q15); +q31 __builtin_mips_subqh_w (q31, q31); +q31 __builtin_mips_subqh_r_w (q31, q31); +a64 __builtin_mips_dpax_w_ph (a64, v2i16, v2i16); +a64 __builtin_mips_dpsx_w_ph (a64, v2i16, v2i16); +a64 __builtin_mips_dpaqx_s_w_ph (a64, v2q15, v2q15); +a64 __builtin_mips_dpaqx_sa_w_ph (a64, v2q15, v2q15); +a64 __builtin_mips_dpsqx_s_w_ph (a64, v2q15, v2q15); +a64 __builtin_mips_dpsqx_sa_w_ph (a64, v2q15, v2q15); +@end smallexample -v16i8 __builtin_msa_srl_b (v16i8, v16i8); -v8i16 __builtin_msa_srl_h (v8i16, v8i16); -v4i32 __builtin_msa_srl_w (v4i32, v4i32); -v2i64 __builtin_msa_srl_d (v2i64, v2i64); -v16i8 __builtin_msa_srli_b (v16i8, imm0_7); -v8i16 __builtin_msa_srli_h (v8i16, imm0_15); -v4i32 __builtin_msa_srli_w (v4i32, imm0_31); -v2i64 __builtin_msa_srli_d (v2i64, imm0_63); +@node MIPS Paired-Single Support +@subsection MIPS Paired-Single Support -v16i8 __builtin_msa_srlr_b (v16i8, v16i8); -v8i16 __builtin_msa_srlr_h (v8i16, v8i16); -v4i32 __builtin_msa_srlr_w (v4i32, v4i32); -v2i64 __builtin_msa_srlr_d (v2i64, v2i64); +The MIPS64 architecture includes a number of instructions that +operate on pairs of single-precision floating-point values. +Each pair is packed into a 64-bit floating-point register, +with one element being designated the ``upper half'' and +the other being designated the ``lower half''. 
-v16i8 __builtin_msa_srlri_b (v16i8, imm0_7); -v8i16 __builtin_msa_srlri_h (v8i16, imm0_15); -v4i32 __builtin_msa_srlri_w (v4i32, imm0_31); -v2i64 __builtin_msa_srlri_d (v2i64, imm0_63); +GCC supports paired-single operations using both the generic +vector extensions (@pxref{Vector Extensions}) and a collection of +MIPS-specific built-in functions. Both kinds of support are +enabled by the @option{-mpaired-single} command-line option. -void __builtin_msa_st_b (v16i8, void *, imm_n512_511); -void __builtin_msa_st_h (v8i16, void *, imm_n1024_1022); -void __builtin_msa_st_w (v4i32, void *, imm_n2048_2044); -void __builtin_msa_st_d (v2i64, void *, imm_n4096_4088); +The vector type associated with paired-single values is usually +called @code{v2sf}. It can be defined in C as follows: -v16i8 __builtin_msa_subs_s_b (v16i8, v16i8); -v8i16 __builtin_msa_subs_s_h (v8i16, v8i16); -v4i32 __builtin_msa_subs_s_w (v4i32, v4i32); -v2i64 __builtin_msa_subs_s_d (v2i64, v2i64); +@smallexample +typedef float v2sf __attribute__ ((vector_size (8))); +@end smallexample -v16u8 __builtin_msa_subs_u_b (v16u8, v16u8); -v8u16 __builtin_msa_subs_u_h (v8u16, v8u16); -v4u32 __builtin_msa_subs_u_w (v4u32, v4u32); -v2u64 __builtin_msa_subs_u_d (v2u64, v2u64); +@code{v2sf} values are initialized in the same way as aggregates. +For example: -v16u8 __builtin_msa_subsus_u_b (v16u8, v16i8); -v8u16 __builtin_msa_subsus_u_h (v8u16, v8i16); -v4u32 __builtin_msa_subsus_u_w (v4u32, v4i32); -v2u64 __builtin_msa_subsus_u_d (v2u64, v2i64); +@smallexample +v2sf a = @{1.5, 9.1@}; +v2sf b; +float e, f; +b = (v2sf) @{e, f@}; +@end smallexample -v16i8 __builtin_msa_subsuu_s_b (v16u8, v16u8); -v8i16 __builtin_msa_subsuu_s_h (v8u16, v8u16); -v4i32 __builtin_msa_subsuu_s_w (v4u32, v4u32); -v2i64 __builtin_msa_subsuu_s_d (v2u64, v2u64); +@emph{Note:} The CPU's endianness determines which value is stored in +the upper half of a register and which value is stored in the lower half. +On little-endian targets, the first value is the lower one and the second +value is the upper one. The opposite order applies to big-endian targets. +For example, the code above sets the lower half of @code{a} to +@code{1.5} on little-endian targets and @code{9.1} on big-endian targets. -v16i8 __builtin_msa_subv_b (v16i8, v16i8); -v8i16 __builtin_msa_subv_h (v8i16, v8i16); -v4i32 __builtin_msa_subv_w (v4i32, v4i32); -v2i64 __builtin_msa_subv_d (v2i64, v2i64); +@node MIPS Loongson Built-in Functions +@subsection MIPS Loongson Built-in Functions -v16i8 __builtin_msa_subvi_b (v16i8, imm0_31); -v8i16 __builtin_msa_subvi_h (v8i16, imm0_31); -v4i32 __builtin_msa_subvi_w (v4i32, imm0_31); -v2i64 __builtin_msa_subvi_d (v2i64, imm0_31); +GCC provides intrinsics to access the SIMD instructions provided by the +ST Microelectronics Loongson-2E and -2F processors. 
These intrinsics, +available after inclusion of the @code{loongson.h} header file, +operate on the following 64-bit vector types: -v16i8 __builtin_msa_vshf_b (v16i8, v16i8, v16i8); -v8i16 __builtin_msa_vshf_h (v8i16, v8i16, v8i16); -v4i32 __builtin_msa_vshf_w (v4i32, v4i32, v4i32); -v2i64 __builtin_msa_vshf_d (v2i64, v2i64, v2i64); +@itemize +@item @code{uint8x8_t}, a vector of eight unsigned 8-bit integers; +@item @code{uint16x4_t}, a vector of four unsigned 16-bit integers; +@item @code{uint32x2_t}, a vector of two unsigned 32-bit integers; +@item @code{int8x8_t}, a vector of eight signed 8-bit integers; +@item @code{int16x4_t}, a vector of four signed 16-bit integers; +@item @code{int32x2_t}, a vector of two signed 32-bit integers. +@end itemize -v16u8 __builtin_msa_xor_v (v16u8, v16u8); +The intrinsics provided are listed below; each is named after the +machine instruction to which it corresponds, with suffixes added as +appropriate to distinguish intrinsics that expand to the same machine +instruction yet have different argument types. Refer to the architecture +documentation for a description of the functionality of each +instruction. -v16u8 __builtin_msa_xori_b (v16u8, imm0_255); +@smallexample +int16x4_t packsswh (int32x2_t s, int32x2_t t); +int8x8_t packsshb (int16x4_t s, int16x4_t t); +uint8x8_t packushb (uint16x4_t s, uint16x4_t t); +uint32x2_t paddw_u (uint32x2_t s, uint32x2_t t); +uint16x4_t paddh_u (uint16x4_t s, uint16x4_t t); +uint8x8_t paddb_u (uint8x8_t s, uint8x8_t t); +int32x2_t paddw_s (int32x2_t s, int32x2_t t); +int16x4_t paddh_s (int16x4_t s, int16x4_t t); +int8x8_t paddb_s (int8x8_t s, int8x8_t t); +uint64_t paddd_u (uint64_t s, uint64_t t); +int64_t paddd_s (int64_t s, int64_t t); +int16x4_t paddsh (int16x4_t s, int16x4_t t); +int8x8_t paddsb (int8x8_t s, int8x8_t t); +uint16x4_t paddush (uint16x4_t s, uint16x4_t t); +uint8x8_t paddusb (uint8x8_t s, uint8x8_t t); +uint64_t pandn_ud (uint64_t s, uint64_t t); +uint32x2_t pandn_uw (uint32x2_t s, uint32x2_t t); +uint16x4_t pandn_uh (uint16x4_t s, uint16x4_t t); +uint8x8_t pandn_ub (uint8x8_t s, uint8x8_t t); +int64_t pandn_sd (int64_t s, int64_t t); +int32x2_t pandn_sw (int32x2_t s, int32x2_t t); +int16x4_t pandn_sh (int16x4_t s, int16x4_t t); +int8x8_t pandn_sb (int8x8_t s, int8x8_t t); +uint16x4_t pavgh (uint16x4_t s, uint16x4_t t); +uint8x8_t pavgb (uint8x8_t s, uint8x8_t t); +uint32x2_t pcmpeqw_u (uint32x2_t s, uint32x2_t t); +uint16x4_t pcmpeqh_u (uint16x4_t s, uint16x4_t t); +uint8x8_t pcmpeqb_u (uint8x8_t s, uint8x8_t t); +int32x2_t pcmpeqw_s (int32x2_t s, int32x2_t t); +int16x4_t pcmpeqh_s (int16x4_t s, int16x4_t t); +int8x8_t pcmpeqb_s (int8x8_t s, int8x8_t t); +uint32x2_t pcmpgtw_u (uint32x2_t s, uint32x2_t t); +uint16x4_t pcmpgth_u (uint16x4_t s, uint16x4_t t); +uint8x8_t pcmpgtb_u (uint8x8_t s, uint8x8_t t); +int32x2_t pcmpgtw_s (int32x2_t s, int32x2_t t); +int16x4_t pcmpgth_s (int16x4_t s, int16x4_t t); +int8x8_t pcmpgtb_s (int8x8_t s, int8x8_t t); +uint16x4_t pextrh_u (uint16x4_t s, int field); +int16x4_t pextrh_s (int16x4_t s, int field); +uint16x4_t pinsrh_0_u (uint16x4_t s, uint16x4_t t); +uint16x4_t pinsrh_1_u (uint16x4_t s, uint16x4_t t); +uint16x4_t pinsrh_2_u (uint16x4_t s, uint16x4_t t); +uint16x4_t pinsrh_3_u (uint16x4_t s, uint16x4_t t); +int16x4_t pinsrh_0_s (int16x4_t s, int16x4_t t); +int16x4_t pinsrh_1_s (int16x4_t s, int16x4_t t); +int16x4_t pinsrh_2_s (int16x4_t s, int16x4_t t); +int16x4_t pinsrh_3_s (int16x4_t s, int16x4_t t); +int32x2_t pmaddhw (int16x4_t s, int16x4_t t); +int16x4_t 
pmaxsh (int16x4_t s, int16x4_t t); +uint8x8_t pmaxub (uint8x8_t s, uint8x8_t t); +int16x4_t pminsh (int16x4_t s, int16x4_t t); +uint8x8_t pminub (uint8x8_t s, uint8x8_t t); +uint8x8_t pmovmskb_u (uint8x8_t s); +int8x8_t pmovmskb_s (int8x8_t s); +uint16x4_t pmulhuh (uint16x4_t s, uint16x4_t t); +int16x4_t pmulhh (int16x4_t s, int16x4_t t); +int16x4_t pmullh (int16x4_t s, int16x4_t t); +int64_t pmuluw (uint32x2_t s, uint32x2_t t); +uint8x8_t pasubub (uint8x8_t s, uint8x8_t t); +uint16x4_t biadd (uint8x8_t s); +uint16x4_t psadbh (uint8x8_t s, uint8x8_t t); +uint16x4_t pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order); +int16x4_t pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order); +uint16x4_t psllh_u (uint16x4_t s, uint8_t amount); +int16x4_t psllh_s (int16x4_t s, uint8_t amount); +uint32x2_t psllw_u (uint32x2_t s, uint8_t amount); +int32x2_t psllw_s (int32x2_t s, uint8_t amount); +uint16x4_t psrlh_u (uint16x4_t s, uint8_t amount); +int16x4_t psrlh_s (int16x4_t s, uint8_t amount); +uint32x2_t psrlw_u (uint32x2_t s, uint8_t amount); +int32x2_t psrlw_s (int32x2_t s, uint8_t amount); +uint16x4_t psrah_u (uint16x4_t s, uint8_t amount); +int16x4_t psrah_s (int16x4_t s, uint8_t amount); +uint32x2_t psraw_u (uint32x2_t s, uint8_t amount); +int32x2_t psraw_s (int32x2_t s, uint8_t amount); +uint32x2_t psubw_u (uint32x2_t s, uint32x2_t t); +uint16x4_t psubh_u (uint16x4_t s, uint16x4_t t); +uint8x8_t psubb_u (uint8x8_t s, uint8x8_t t); +int32x2_t psubw_s (int32x2_t s, int32x2_t t); +int16x4_t psubh_s (int16x4_t s, int16x4_t t); +int8x8_t psubb_s (int8x8_t s, int8x8_t t); +uint64_t psubd_u (uint64_t s, uint64_t t); +int64_t psubd_s (int64_t s, int64_t t); +int16x4_t psubsh (int16x4_t s, int16x4_t t); +int8x8_t psubsb (int8x8_t s, int8x8_t t); +uint16x4_t psubush (uint16x4_t s, uint16x4_t t); +uint8x8_t psubusb (uint8x8_t s, uint8x8_t t); +uint32x2_t punpckhwd_u (uint32x2_t s, uint32x2_t t); +uint16x4_t punpckhhw_u (uint16x4_t s, uint16x4_t t); +uint8x8_t punpckhbh_u (uint8x8_t s, uint8x8_t t); +int32x2_t punpckhwd_s (int32x2_t s, int32x2_t t); +int16x4_t punpckhhw_s (int16x4_t s, int16x4_t t); +int8x8_t punpckhbh_s (int8x8_t s, int8x8_t t); +uint32x2_t punpcklwd_u (uint32x2_t s, uint32x2_t t); +uint16x4_t punpcklhw_u (uint16x4_t s, uint16x4_t t); +uint8x8_t punpcklbh_u (uint8x8_t s, uint8x8_t t); +int32x2_t punpcklwd_s (int32x2_t s, int32x2_t t); +int16x4_t punpcklhw_s (int16x4_t s, int16x4_t t); +int8x8_t punpcklbh_s (int8x8_t s, int8x8_t t); @end smallexample -@node Other MIPS Built-in Functions -@subsection Other MIPS Built-in Functions +@menu +* Paired-Single Arithmetic:: +* Paired-Single Built-in Functions:: +* MIPS-3D Built-in Functions:: +@end menu -GCC provides other MIPS-specific built-in functions: +@node Paired-Single Arithmetic +@subsubsection Paired-Single Arithmetic -@table @code -@item void __builtin_mips_cache (int @var{op}, const volatile void *@var{addr}) -Insert a @samp{cache} instruction with operands @var{op} and @var{addr}. -GCC defines the preprocessor macro @code{___GCC_HAVE_BUILTIN_MIPS_CACHE} -when this function is available. +The table below lists the @code{v2sf} operations for which hardware +support exists. @code{a}, @code{b} and @code{c} are @code{v2sf} +values and @code{x} is an integral value. -@item unsigned int __builtin_mips_get_fcsr (void) -@itemx void __builtin_mips_set_fcsr (unsigned int @var{value}) -Get and set the contents of the floating-point control and status register -(FPU control register 31). 
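+Because this support is based on the generic vector extensions, the
+operations are written as ordinary C expressions on @code{v2sf} values.
+For instance, the following sketch (ours, not taken from the MIPS
+documentation) shows a multiply-add over @code{v2sf} values:
+
+@smallexample
+v2sf
+muladd (v2sf a, v2sf b, v2sf c)
+@{
+  /* May be implemented by madd.ps; see the table below.  */
+  return a * b + c;
+@}
+@end smallexample
+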
-These functions are only available in hard-float
-code but can be called in both MIPS16 and non-MIPS16 contexts.
+@multitable @columnfractions .50 .50
+@headitem C code @tab MIPS instruction
+@item @code{a + b} @tab @code{add.ps}
+@item @code{a - b} @tab @code{sub.ps}
+@item @code{-a} @tab @code{neg.ps}
+@item @code{a * b} @tab @code{mul.ps}
+@item @code{a * b + c} @tab @code{madd.ps}
+@item @code{a * b - c} @tab @code{msub.ps}
+@item @code{-(a * b + c)} @tab @code{nmadd.ps}
+@item @code{-(a * b - c)} @tab @code{nmsub.ps}
+@item @code{x ? a : b} @tab @code{movn.ps}/@code{movz.ps}
+@end multitable
-@code{__builtin_mips_set_fcsr} can be used to change any bit of the
-register except the condition codes, which GCC assumes are preserved.
-@end table
+Note that the multiply-accumulate instructions can be disabled
+using the command-line option @option{-mno-fused-madd}.
-@node MSP430 Built-in Functions
-@subsection MSP430 Built-in Functions
+@node Paired-Single Built-in Functions
+@subsubsection Paired-Single Built-in Functions
-GCC provides a couple of special builtin functions to aid in the
-writing of interrupt handlers in C.
+The following paired-single functions map directly to a particular
+MIPS instruction.  Please refer to the architecture specification
+for details on what each instruction does.
@table @code
-@item __bic_SR_register_on_exit (int @var{mask})
-This clears the indicated bits in the saved copy of the status register
-currently residing on the stack.  This only works inside interrupt
-handlers and the changes to the status register will only take affect
-once the handler returns.
-
-@item __bis_SR_register_on_exit (int @var{mask})
-This sets the indicated bits in the saved copy of the status register
-currently residing on the stack.  This only works inside interrupt
-handlers and the changes to the status register will only take affect
-once the handler returns.
-
-@item __delay_cycles (long long @var{cycles})
-This inserts an instruction sequence that takes exactly @var{cycles}
-cycles (between 0 and about 17E9) to complete.  The inserted sequence
-may use jumps, loops, or no-ops, and does not interfere with any other
-instructions.  Note that @var{cycles} must be a compile-time constant
-integer - that is, you must pass a number, not a variable that may be
-optimized to a constant later.  The number of cycles delayed by this
-builtin is exact.
-@end table
-
-@node NDS32 Built-in Functions
-@subsection NDS32 Built-in Functions
-
-These built-in functions are available for the NDS32 target:
-
-@defbuiltin{void __builtin_nds32_isync (int *@var{addr})}
-Insert an ISYNC instruction into the instruction stream where
-@var{addr} is an instruction address for serialization.
-@enddefbuiltin
-
-@defbuiltin{void __builtin_nds32_isb (void)}
-Insert an ISB instruction into the instruction stream.
-@enddefbuiltin
-
-@defbuiltin{int __builtin_nds32_mfsr (int @var{sr})}
-Return the content of a system register which is mapped by @var{sr}.
-@enddefbuiltin
-
-@defbuiltin{int __builtin_nds32_mfusr (int @var{usr})}
-Return the content of a user space register which is mapped by @var{usr}.
-@enddefbuiltin
+@item v2sf __builtin_mips_pll_ps (v2sf, v2sf)
+Pair lower lower (@code{pll.ps}).
-@defbuiltin{void __builtin_nds32_mtsr (int @var{value}, int @var{sr})}
-Move the @var{value} to a system register which is mapped by @var{sr}.
-@enddefbuiltin
+@item v2sf __builtin_mips_pul_ps (v2sf, v2sf)
+Pair upper lower (@code{pul.ps}).
-@defbuiltin{void __builtin_nds32_mtusr (int @var{value}, int @var{usr})} -Move the @var{value} to a user space register which is mapped by @var{usr}. -@enddefbuiltin +@item v2sf __builtin_mips_plu_ps (v2sf, v2sf) +Pair lower upper (@code{plu.ps}). -@defbuiltin{void __builtin_nds32_setgie_en (void)} -Enable global interrupt. -@enddefbuiltin +@item v2sf __builtin_mips_puu_ps (v2sf, v2sf) +Pair upper upper (@code{puu.ps}). -@defbuiltin{void __builtin_nds32_setgie_dis (void)} -Disable global interrupt. -@enddefbuiltin +@item v2sf __builtin_mips_cvt_ps_s (float, float) +Convert pair to paired single (@code{cvt.ps.s}). -@node Nvidia PTX Built-in Functions -@subsection Nvidia PTX Built-in Functions +@item float __builtin_mips_cvt_s_pl (v2sf) +Convert pair lower to single (@code{cvt.s.pl}). -These built-in functions are available for the Nvidia PTX target: +@item float __builtin_mips_cvt_s_pu (v2sf) +Convert pair upper to single (@code{cvt.s.pu}). -@defbuiltin{{unsigned int} __builtin_nvptx_brev (unsigned int @var{x})} -Reverse the bit order of a 32-bit unsigned integer. -@enddefbuiltin +@item v2sf __builtin_mips_abs_ps (v2sf) +Absolute value (@code{abs.ps}). -@defbuiltin{{unsigned long long} __builtin_nvptx_brevll (unsigned long long @var{x})} -Reverse the bit order of a 64-bit unsigned integer. -@enddefbuiltin +@item v2sf __builtin_mips_alnv_ps (v2sf, v2sf, int) +Align variable (@code{alnv.ps}). -@node Basic PowerPC Built-in Functions -@subsection Basic PowerPC Built-in Functions +@emph{Note:} The value of the third parameter must be 0 or 4 +modulo 8, otherwise the result is unpredictable. Please read the +instruction description for details. +@end table -@menu -* Basic PowerPC Built-in Functions Available on all Configurations:: -* Basic PowerPC Built-in Functions Available on ISA 2.05:: -* Basic PowerPC Built-in Functions Available on ISA 2.06:: -* Basic PowerPC Built-in Functions Available on ISA 2.07:: -* Basic PowerPC Built-in Functions Available on ISA 3.0:: -* Basic PowerPC Built-in Functions Available on ISA 3.1:: -@end menu +The following multi-instruction functions are also available. +In each case, @var{cond} can be any of the 16 floating-point conditions: +@code{f}, @code{un}, @code{eq}, @code{ueq}, @code{olt}, @code{ult}, +@code{ole}, @code{ule}, @code{sf}, @code{ngle}, @code{seq}, @code{ngl}, +@code{lt}, @code{nge}, @code{le} or @code{ngt}. -This section describes PowerPC built-in functions that do not require -the inclusion of any special header files to declare prototypes or -provide macro definitions. The sections that follow describe -additional PowerPC built-in functions. +@table @code +@item v2sf __builtin_mips_movt_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) +@itemx v2sf __builtin_mips_movf_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) +Conditional move based on floating-point comparison (@code{c.@var{cond}.ps}, +@code{movt.ps}/@code{movf.ps}). -@node Basic PowerPC Built-in Functions Available on all Configurations -@subsubsection Basic PowerPC Built-in Functions Available on all Configurations +The @code{movt} functions return the value @var{x} computed by: -@defbuiltin{void __builtin_cpu_init (void)} -This function is a @code{nop} on the PowerPC platform and is included solely -to maintain API compatibility with the x86 builtins. 
-@enddefbuiltin +@smallexample +c.@var{cond}.ps @var{cc},@var{a},@var{b} +mov.ps @var{x},@var{c} +movt.ps @var{x},@var{d},@var{cc} +@end smallexample -@defbuiltin{int __builtin_cpu_is (const char *@var{cpuname})} -This function returns a value of @code{1} if the run-time CPU is of type -@var{cpuname} and returns @code{0} otherwise +The @code{movf} functions are similar but use @code{movf.ps} instead +of @code{movt.ps}. -The @code{__builtin_cpu_is} function requires GLIBC 2.23 or newer -which exports the hardware capability bits. GCC defines the macro -@code{__BUILTIN_CPU_SUPPORTS__} if the @code{__builtin_cpu_supports} -built-in function is fully supported. +@item int __builtin_mips_upper_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +@itemx int __builtin_mips_lower_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +Comparison of two paired-single values (@code{c.@var{cond}.ps}, +@code{bc1t}/@code{bc1f}). -If GCC was configured to use a GLIBC before 2.23, the built-in -function @code{__builtin_cpu_is} always returns a 0 and the compiler -issues a warning. +These functions compare @var{a} and @var{b} using @code{c.@var{cond}.ps} +and return either the upper or lower half of the result. For example: -The following CPU names can be detected: +@smallexample +v2sf a, b; +if (__builtin_mips_upper_c_eq_ps (a, b)) + upper_halves_are_equal (); +else + upper_halves_are_unequal (); -@table @samp -@item power10 -IBM POWER10 Server CPU. -@item power9 -IBM POWER9 Server CPU. -@item power8 -IBM POWER8 Server CPU. -@item power7 -IBM POWER7 Server CPU. -@item power6x -IBM POWER6 Server CPU (RAW mode). -@item power6 -IBM POWER6 Server CPU (Architected mode). -@item power5+ -IBM POWER5+ Server CPU. -@item power5 -IBM POWER5 Server CPU. -@item ppc970 -IBM 970 Server CPU (ie, Apple G5). -@item power4 -IBM POWER4 Server CPU. -@item ppca2 -IBM A2 64-bit Embedded CPU -@item ppc476 -IBM PowerPC 476FP 32-bit Embedded CPU. -@item ppc464 -IBM PowerPC 464 32-bit Embedded CPU. -@item ppc440 -PowerPC 440 32-bit Embedded CPU. -@item ppc405 -PowerPC 405 32-bit Embedded CPU. -@item ppc-cell-be -IBM PowerPC Cell Broadband Engine Architecture CPU. +if (__builtin_mips_lower_c_eq_ps (a, b)) + lower_halves_are_equal (); +else + lower_halves_are_unequal (); +@end smallexample @end table -Here is an example: -@smallexample -#ifdef __BUILTIN_CPU_SUPPORTS__ - if (__builtin_cpu_is ("power8")) - @{ - do_power8 (); // POWER8 specific implementation. - @} - else -#endif - @{ - do_generic (); // Generic implementation. - @} -@end smallexample -@enddefbuiltin +@node MIPS-3D Built-in Functions +@subsubsection MIPS-3D Built-in Functions -@defbuiltin{int __builtin_cpu_supports (const char *@var{feature})} -This function returns a value of @code{1} if the run-time CPU supports the HWCAP -feature @var{feature} and returns @code{0} otherwise. +The MIPS-3D Application-Specific Extension (ASE) includes additional +paired-single instructions that are designed to improve the performance +of 3D graphics operations. Support for these instructions is controlled +by the @option{-mips3d} command-line option. -The @code{__builtin_cpu_supports} function requires GLIBC 2.23 or -newer which exports the hardware capability bits. GCC defines the -macro @code{__BUILTIN_CPU_SUPPORTS__} if the -@code{__builtin_cpu_supports} built-in function is fully supported. +The functions listed below map directly to a particular MIPS-3D +instruction. Please refer to the architecture specification for +more details on what each instruction does. 
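+As an illustration of the call style (an example of ours; see the
+specification for the exact placement of the results), a reduction add
+can be written:
+
+@smallexample
+v2sf a, b, r;
+
+/* Each half of r receives the sum of the two halves of one
+   operand (addr.ps).  */
+r = __builtin_mips_addr_ps (a, b);
+@end smallexample
+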
-If GCC was configured to use a GLIBC before 2.23, the built-in -function @code{__builtin_cpu_supports} always returns a 0 and the -compiler issues a warning. +@table @code +@item v2sf __builtin_mips_addr_ps (v2sf, v2sf) +Reduction add (@code{addr.ps}). -The following features can be -detected: +@item v2sf __builtin_mips_mulr_ps (v2sf, v2sf) +Reduction multiply (@code{mulr.ps}). -@table @samp -@item 4xxmac -4xx CPU has a Multiply Accumulator. -@item altivec -CPU has a SIMD/Vector Unit. -@item arch_2_05 -CPU supports ISA 2.05 (eg, POWER6) -@item arch_2_06 -CPU supports ISA 2.06 (eg, POWER7) -@item arch_2_07 -CPU supports ISA 2.07 (eg, POWER8) -@item arch_3_00 -CPU supports ISA 3.0 (eg, POWER9) -@item arch_3_1 -CPU supports ISA 3.1 (eg, POWER10) -@item archpmu -CPU supports the set of compatible performance monitoring events. -@item booke -CPU supports the Embedded ISA category. -@item cellbe -CPU has a CELL broadband engine. -@item darn -CPU supports the @code{darn} (deliver a random number) instruction. -@item dfp -CPU has a decimal floating point unit. -@item dscr -CPU supports the data stream control register. -@item ebb -CPU supports event base branching. -@item efpdouble -CPU has a SPE double precision floating point unit. -@item efpsingle -CPU has a SPE single precision floating point unit. -@item fpu -CPU has a floating point unit. -@item htm -CPU has hardware transaction memory instructions. -@item htm-nosc -Kernel aborts hardware transactions when a syscall is made. -@item htm-no-suspend -CPU supports hardware transaction memory but does not support the -@code{tsuspend.} instruction. -@item ic_snoop -CPU supports icache snooping capabilities. -@item ieee128 -CPU supports 128-bit IEEE binary floating point instructions. -@item isel -CPU supports the integer select instruction. -@item mma -CPU supports the matrix-multiply assist instructions. -@item mmu -CPU has a memory management unit. -@item notb -CPU does not have a timebase (eg, 601 and 403gx). -@item pa6t -CPU supports the PA Semi 6T CORE ISA. -@item power4 -CPU supports ISA 2.00 (eg, POWER4) -@item power5 -CPU supports ISA 2.02 (eg, POWER5) -@item power5+ -CPU supports ISA 2.03 (eg, POWER5+) -@item power6x -CPU supports ISA 2.05 (eg, POWER6) extended opcodes mffgpr and mftgpr. -@item ppc32 -CPU supports 32-bit mode execution. -@item ppc601 -CPU supports the old POWER ISA (eg, 601) -@item ppc64 -CPU supports 64-bit mode execution. -@item ppcle -CPU supports a little-endian mode that uses address swizzling. -@item scv -Kernel supports system call vectored. -@item smt -CPU support simultaneous multi-threading. -@item spe -CPU has a signal processing extension unit. -@item tar -CPU supports the target address register. -@item true_le -CPU supports true little-endian mode. -@item ucache -CPU has unified I/D cache. -@item vcrypto -CPU supports the vector cryptography instructions. -@item vsx -CPU supports the vector-scalar extension. +@item v2sf __builtin_mips_cvt_pw_ps (v2sf) +Convert paired single to paired word (@code{cvt.pw.ps}). + +@item v2sf __builtin_mips_cvt_ps_pw (v2sf) +Convert paired word to paired single (@code{cvt.ps.pw}). + +@item float __builtin_mips_recip1_s (float) +@itemx double __builtin_mips_recip1_d (double) +@itemx v2sf __builtin_mips_recip1_ps (v2sf) +Reduced-precision reciprocal (sequence step 1) (@code{recip1.@var{fmt}}). 
+ +@item float __builtin_mips_recip2_s (float, float) +@itemx double __builtin_mips_recip2_d (double, double) +@itemx v2sf __builtin_mips_recip2_ps (v2sf, v2sf) +Reduced-precision reciprocal (sequence step 2) (@code{recip2.@var{fmt}}). + +@item float __builtin_mips_rsqrt1_s (float) +@itemx double __builtin_mips_rsqrt1_d (double) +@itemx v2sf __builtin_mips_rsqrt1_ps (v2sf) +Reduced-precision reciprocal square root (sequence step 1) +(@code{rsqrt1.@var{fmt}}). + +@item float __builtin_mips_rsqrt2_s (float, float) +@itemx double __builtin_mips_rsqrt2_d (double, double) +@itemx v2sf __builtin_mips_rsqrt2_ps (v2sf, v2sf) +Reduced-precision reciprocal square root (sequence step 2) +(@code{rsqrt2.@var{fmt}}). @end table -Here is an example: -@smallexample -#ifdef __BUILTIN_CPU_SUPPORTS__ - if (__builtin_cpu_supports ("fpu")) - @{ - asm("fadd %0,%1,%2" : "=d"(dst) : "d"(src1), "d"(src2)); - @} - else -#endif - @{ - dst = __fadd (src1, src2); // Software FP addition function. - @} -@end smallexample -@enddefbuiltin +The following multi-instruction functions are also available. +In each case, @var{cond} can be any of the 16 floating-point conditions: +@code{f}, @code{un}, @code{eq}, @code{ueq}, @code{olt}, @code{ult}, +@code{ole}, @code{ule}, @code{sf}, @code{ngle}, @code{seq}, +@code{ngl}, @code{lt}, @code{nge}, @code{le} or @code{ngt}. + +@table @code +@item int __builtin_mips_cabs_@var{cond}_s (float @var{a}, float @var{b}) +@itemx int __builtin_mips_cabs_@var{cond}_d (double @var{a}, double @var{b}) +Absolute comparison of two scalar values (@code{cabs.@var{cond}.@var{fmt}}, +@code{bc1t}/@code{bc1f}). + +These functions compare @var{a} and @var{b} using @code{cabs.@var{cond}.s} +or @code{cabs.@var{cond}.d} and return the result as a boolean value. +For example: -The following built-in functions are also available on all PowerPC -processors: @smallexample -uint64_t __builtin_ppc_get_timebase (); -unsigned long __builtin_ppc_mftb (); -double __builtin_unpack_ibm128 (__ibm128, int); -__ibm128 __builtin_pack_ibm128 (double, double); -double __builtin_mffs (void); -void __builtin_mtfsf (const int, double); -void __builtin_mtfsb0 (const int); -void __builtin_mtfsb1 (const int); -double __builtin_set_fpscr_rn (int); +float a, b; +if (__builtin_mips_cabs_eq_s (a, b)) + true (); +else + false (); @end smallexample -The @code{__builtin_ppc_get_timebase} and @code{__builtin_ppc_mftb} -functions generate instructions to read the Time Base Register. The -@code{__builtin_ppc_get_timebase} function may generate multiple -instructions and always returns the 64 bits of the Time Base Register. -The @code{__builtin_ppc_mftb} function always generates one instruction and -returns the Time Base Register value as an unsigned long, throwing away -the most significant word on 32-bit environments. The @code{__builtin_mffs} -return the value of the FPSCR register. Note, ISA 3.0 supports the -@code{__builtin_mffsl()} which permits software to read the control and -non-sticky status bits in the FSPCR without the higher latency associated with -accessing the sticky status bits. The @code{__builtin_mtfsf} takes a constant -8-bit integer field mask and a double precision floating point argument -and generates the @code{mtfsf} (extended mnemonic) instruction to write new -values to selected fields of the FPSCR. The -@code{__builtin_mtfsb0} and @code{__builtin_mtfsb1} take the bit to change -as an argument. The valid bit range is between 0 and 31. 
The builtins map to -the @code{mtfsb0} and @code{mtfsb1} instructions which take the argument and -add 32. Hence these instructions only modify the FPSCR[32:63] bits by -changing the specified bit to a zero or one respectively. - -The @code{__builtin_set_fpscr_rn} built-in allows changing both of the floating -point rounding mode bits and returning the various FPSCR fields before the RN -field is updated. The built-in returns a double consisting of the initial -value of the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, and RN bit positions -with all other bits set to zero. The built-in argument is a 2-bit value for the -new RN field value. The argument can either be an @code{const int} or stored -in a variable. Earlier versions of @code{__builtin_set_fpscr_rn} returned -void. A @code{__SET_FPSCR_RN_RETURNS_FPSCR__} macro has been added. If -defined, then the @code{__builtin_set_fpscr_rn} built-in returns the FPSCR -fields. If not defined, the @code{__builtin_set_fpscr_rn} does not return a -value. If the @option{-msoft-float} option is used, the -@code{__builtin_set_fpscr_rn} built-in will not return a value. +@item int __builtin_mips_upper_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +@itemx int __builtin_mips_lower_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +Absolute comparison of two paired-single values (@code{cabs.@var{cond}.ps}, +@code{bc1t}/@code{bc1f}). -@node Basic PowerPC Built-in Functions Available on ISA 2.05 -@subsubsection Basic PowerPC Built-in Functions Available on ISA 2.05 +These functions compare @var{a} and @var{b} using @code{cabs.@var{cond}.ps} +and return either the upper or lower half of the result. For example: -The basic built-in functions described in this section are -available on the PowerPC family of processors starting with ISA 2.05 -or later. Unless specific options are explicitly disabled on the -command line, specifying option @option{-mcpu=power6} has the effect of -enabling the @option{-mpowerpc64}, @option{-mpowerpc-gpopt}, -@option{-mpowerpc-gfxopt}, @option{-mmfcrf}, @option{-mpopcntb}, -@option{-mfprnd}, @option{-mcmpb}, @option{-mhard-dfp}, and -@option{-mrecip-precision} options. Specify the -@option{-maltivec} option explicitly in -combination with the above options if desired. +@smallexample +v2sf a, b; +if (__builtin_mips_upper_cabs_eq_ps (a, b)) + upper_halves_are_equal (); +else + upper_halves_are_unequal (); -The following functions require option @option{-mcmpb}. -@smallexample -unsigned long long __builtin_cmpb (unsigned long long int, unsigned long long int); -unsigned int __builtin_cmpb (unsigned int, unsigned int); +if (__builtin_mips_lower_cabs_eq_ps (a, b)) + lower_halves_are_equal (); +else + lower_halves_are_unequal (); @end smallexample -The @code{__builtin_cmpb} function -performs a byte-wise compare on the contents of its two arguments, -returning the result of the byte-wise comparison as the returned -value. For each byte comparison, the corresponding byte of the return -value holds 0xff if the input bytes are equal and 0 if the input bytes -are not equal. If either of the arguments to this built-in function -is wider than 32 bits, the function call expands into the form that -expects @code{unsigned long long int} arguments -which is only available on 64-bit targets. 
+@item v2sf __builtin_mips_movt_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) +@itemx v2sf __builtin_mips_movf_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) +Conditional move based on absolute comparison (@code{cabs.@var{cond}.ps}, +@code{movt.ps}/@code{movf.ps}). + +The @code{movt} functions return the value @var{x} computed by: -The following built-in functions are available -when hardware decimal floating point -(@option{-mhard-dfp}) is available: @smallexample -void __builtin_set_fpscr_drn(int); -_Decimal64 __builtin_ddedpd (int, _Decimal64); -_Decimal128 __builtin_ddedpdq (int, _Decimal128); -_Decimal64 __builtin_denbcd (int, _Decimal64); -_Decimal128 __builtin_denbcdq (int, _Decimal128); -_Decimal64 __builtin_diex (long long, _Decimal64); -_Decimal128 _builtin_diexq (long long, _Decimal128); -_Decimal64 __builtin_dscli (_Decimal64, int); -_Decimal128 __builtin_dscliq (_Decimal128, int); -_Decimal64 __builtin_dscri (_Decimal64, int); -_Decimal128 __builtin_dscriq (_Decimal128, int); -long long __builtin_dxex (_Decimal64); -long long __builtin_dxexq (_Decimal128); -_Decimal128 __builtin_pack_dec128 (unsigned long long, unsigned long long); -unsigned long long __builtin_unpack_dec128 (_Decimal128, int); +cabs.@var{cond}.ps @var{cc},@var{a},@var{b} +mov.ps @var{x},@var{c} +movt.ps @var{x},@var{d},@var{cc} +@end smallexample -The @code{__builtin_set_fpscr_drn} builtin allows changing the three decimal -floating point rounding mode bits. The argument is a 3-bit value. The -argument can either be a @code{const int} or the value can be stored in -a variable. -The builtin uses the ISA 3.0 instruction @code{mffscdrn} if available. -Otherwise the builtin reads the FPSCR, masks the current decimal rounding -mode bits out and OR's in the new value. +The @code{movf} functions are similar but use @code{movf.ps} instead +of @code{movt.ps}. -_Decimal64 __builtin_dfp_quantize (_Decimal64, _Decimal64, const int); -_Decimal64 __builtin_dfp_quantize (const int, _Decimal64, const int); -_Decimal128 __builtin_dfp_quantize (_Decimal128, _Decimal128, const int); -_Decimal128 __builtin_dfp_quantize (const int, _Decimal128, const int); +@item int __builtin_mips_any_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +@itemx int __builtin_mips_all_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +@itemx int __builtin_mips_any_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +@itemx int __builtin_mips_all_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +Comparison of two paired-single values +(@code{c.@var{cond}.ps}/@code{cabs.@var{cond}.ps}, +@code{bc1any2t}/@code{bc1any2f}). -The @code{__builtin_dfp_quantize} built-in, converts and rounds the second -argument to the form with the exponent as specified by the first -argument based on the rounding mode specified by the third argument. -If the first argument is a decimal floating point value, its exponent is used -for converting and rounding of the second argument. If the first argument is a -5-bit constant integer value, then the value specifies the exponent to be used -when rounding and converting the second argument. The third argument is a -two bit constant integer that specifies the rounding mode. The possible modes -are: 00 Round to nearest, ties to even; 01 Round toward 0; 10 Round to nearest, -ties away from 0; 11 Round according to DRN where DRN is the Decimal Floating -point field of the FPSCR. +These functions compare @var{a} and @var{b} using @code{c.@var{cond}.ps} +or @code{cabs.@var{cond}.ps}. 
The @code{any} forms return @code{true} if either
+result is @code{true} and the @code{all} forms return @code{true} if
+both results are @code{true}.
+For example:
+
+@smallexample
+v2sf a, b;
+if (__builtin_mips_any_c_eq_ps (a, b))
+  one_is_true ();
+else
+  both_are_false ();
+
+if (__builtin_mips_all_c_eq_ps (a, b))
+  both_are_true ();
+else
+  one_is_false ();
@end smallexample
+@item int __builtin_mips_any_c_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
+@itemx int __builtin_mips_all_c_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
+@itemx int __builtin_mips_any_cabs_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
+@itemx int __builtin_mips_all_cabs_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
+Comparison of four paired-single values
+(@code{c.@var{cond}.ps}/@code{cabs.@var{cond}.ps},
+@code{bc1any4t}/@code{bc1any4f}).
+
+These functions use @code{c.@var{cond}.ps} or @code{cabs.@var{cond}.ps}
+to compare @var{a} with @var{b} and to compare @var{c} with @var{d}.
+The @code{any} forms return @code{true} if any of the four results are
+@code{true} and the @code{all} forms return @code{true} if all four
+results are @code{true}.
+For example:
-The following functions require @option{-mhard-float},
-@option{-mpowerpc-gfxopt}, and @option{-mpopcntb} options.
@smallexample
-double __builtin_recipdiv (double, double);
-float __builtin_recipdivf (float, float);
-double __builtin_rsqrt (double);
-float __builtin_rsqrtf (float);
+v2sf a, b, c, d;
+if (__builtin_mips_any_c_eq_4s (a, b, c, d))
+  some_are_true ();
+else
+  all_are_false ();
+
+if (__builtin_mips_all_c_eq_4s (a, b, c, d))
+  all_are_true ();
+else
+  some_are_false ();
@end smallexample
+@end table
-The @code{vec_rsqrt}, @code{__builtin_rsqrt}, and
-@code{__builtin_rsqrtf} functions generate multiple instructions to
-implement the reciprocal sqrt functionality using reciprocal sqrt
-estimate instructions.
+@node MIPS SIMD Architecture (MSA) Support
+@subsection MIPS SIMD Architecture (MSA) Support
-The @code{__builtin_recipdiv}, and @code{__builtin_recipdivf}
-functions generate multiple instructions to implement division using
-the reciprocal estimate instructions.
+@menu
+* MIPS SIMD Architecture Built-in Functions::
+@end menu
-The following functions require @option{-mhard-float} and
-@option{-mmultiple} options.
+GCC provides intrinsics to access the SIMD instructions provided by the
+MIPS SIMD Architecture (MSA).  The interface is made available by
+including @code{<msa.h>} and using
+@option{-mmsa -mhard-float -mfp64 -mnan=2008}.
+Each @code{__builtin_msa_*} built-in function also has a shortened
+intrinsic name, @code{__msa_*}.
-The @code{__builtin_unpack_longdouble} function takes a
-@code{long double} argument and a compile time constant of 0 or 1.  If
-the constant is 0, the first @code{double} within the
-@code{long double} is returned, otherwise the second @code{double}
-is returned.  The @code{__builtin_unpack_longdouble} function is only
-available if @code{long double} uses the IBM extended double
-representation.
+MSA implements 128-bit wide vector registers, operating on 8-, 16-, 32- and
+64-bit integer, 16- and 32-bit fixed-point, or 32- and 64-bit floating point
+data elements.
The following vector typedefs are included in @code{msa.h}:
+@itemize
+@item @code{v16i8}, a vector of sixteen signed 8-bit integers;
+@item @code{v16u8}, a vector of sixteen unsigned 8-bit integers;
+@item @code{v8i16}, a vector of eight signed 16-bit integers;
+@item @code{v8u16}, a vector of eight unsigned 16-bit integers;
+@item @code{v4i32}, a vector of four signed 32-bit integers;
+@item @code{v4u32}, a vector of four unsigned 32-bit integers;
+@item @code{v2i64}, a vector of two signed 64-bit integers;
+@item @code{v2u64}, a vector of two unsigned 64-bit integers;
+@item @code{v4f32}, a vector of four 32-bit floats;
+@item @code{v2f64}, a vector of two 64-bit doubles.
+@end itemize
-The @code{__builtin_pack_longdouble} function takes two @code{double}
-arguments and returns a @code{long double} value that combines the two
-arguments.  The @code{__builtin_pack_longdouble} function is only
-available if @code{long double} uses the IBM extended double
-representation.
+Some instructions and their corresponding built-in functions place
+additional restrictions on their operands; the following names are used
+below for these restricted operands:
+@itemize
+@item @code{imm0_1}, an integer literal in range 0 to 1;
+@item @code{imm0_3}, an integer literal in range 0 to 3;
+@item @code{imm0_7}, an integer literal in range 0 to 7;
+@item @code{imm0_15}, an integer literal in range 0 to 15;
+@item @code{imm0_31}, an integer literal in range 0 to 31;
+@item @code{imm0_63}, an integer literal in range 0 to 63;
+@item @code{imm0_255}, an integer literal in range 0 to 255;
+@item @code{imm_n16_15}, an integer literal in range -16 to 15;
+@item @code{imm_n512_511}, an integer literal in range -512 to 511;
+@item @code{imm_n1024_1022}, an integer literal in range -512 to 511 left
+shifted by 1 bit, i.e., -1024, -1022, @dots{}, 1020, 1022;
+@item @code{imm_n2048_2044}, an integer literal in range -512 to 511 left
+shifted by 2 bits, i.e., -2048, -2044, @dots{}, 2040, 2044;
+@item @code{imm_n4096_4088}, an integer literal in range -512 to 511 left
+shifted by 3 bits, i.e., -4096, -4088, @dots{}, 4080, 4088;
+@item @code{imm1_4}, an integer literal in range 1 to 4;
+@item @code{i32, i64, u32, u64, f32, f64}, defined as follows:
+@end itemize
-The @code{__builtin_unpack_ibm128} function takes a @code{__ibm128}
-argument and a compile time constant of 0 or 1.  If the constant is 0,
-the first @code{double} within the @code{__ibm128} is returned,
-otherwise the second @code{double} is returned.
+@smallexample
+@{
+typedef int i32;
+#if __LONG_MAX__ == __LONG_LONG_MAX__
+typedef long i64;
+#else
+typedef long long i64;
+#endif
+
+typedef unsigned int u32;
+#if __LONG_MAX__ == __LONG_LONG_MAX__
+typedef unsigned long u64;
+#else
+typedef unsigned long long u64;
+#endif
+
+typedef double f64;
+typedef float f32;
+@}
+@end smallexample
-The @code{__builtin_pack_ibm128} function takes two @code{double}
-arguments and returns a @code{__ibm128} value that combines the two
-arguments.
+@node MIPS SIMD Architecture Built-in Functions
+@subsubsection MIPS SIMD Architecture Built-in Functions
+
+The intrinsics provided are listed below; each is named after the
+machine instruction.
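+Before the full list, here is a brief usage sketch (ours, not taken
+from the MSA documentation); the shortened name @code{__msa_addv_w}
+could be used in place of the built-in:
+
+@smallexample
+#include <msa.h>
+
+v4i32
+add_vectors (v4i32 a, v4i32 b)
+@{
+  /* Element-wise addition of four 32-bit integers (addv.w).  */
+  return __builtin_msa_addv_w (a, b);
+@}
+@end smallexample
+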
@smallexample -vector signed __int128 vec_sel (vector signed __int128, - vector signed __int128, vector bool __int128); -vector signed __int128 vec_sel (vector signed __int128, - vector signed __int128, vector unsigned __int128); -vector unsigned __int128 vec_sel (vector unsigned __int128, - vector unsigned __int128, vector bool __int128); -vector unsigned __int128 vec_sel (vector unsigned __int128, - vector unsigned __int128, vector unsigned __int128); -vector bool __int128 vec_sel (vector bool __int128, - vector bool __int128, vector bool __int128); -vector bool __int128 vec_sel (vector bool __int128, - vector bool __int128, vector unsigned __int128); -@end smallexample +v16i8 __builtin_msa_add_a_b (v16i8, v16i8); +v8i16 __builtin_msa_add_a_h (v8i16, v8i16); +v4i32 __builtin_msa_add_a_w (v4i32, v4i32); +v2i64 __builtin_msa_add_a_d (v2i64, v2i64); + +v16i8 __builtin_msa_adds_a_b (v16i8, v16i8); +v8i16 __builtin_msa_adds_a_h (v8i16, v8i16); +v4i32 __builtin_msa_adds_a_w (v4i32, v4i32); +v2i64 __builtin_msa_adds_a_d (v2i64, v2i64); + +v16i8 __builtin_msa_adds_s_b (v16i8, v16i8); +v8i16 __builtin_msa_adds_s_h (v8i16, v8i16); +v4i32 __builtin_msa_adds_s_w (v4i32, v4i32); +v2i64 __builtin_msa_adds_s_d (v2i64, v2i64); + +v16u8 __builtin_msa_adds_u_b (v16u8, v16u8); +v8u16 __builtin_msa_adds_u_h (v8u16, v8u16); +v4u32 __builtin_msa_adds_u_w (v4u32, v4u32); +v2u64 __builtin_msa_adds_u_d (v2u64, v2u64); + +v16i8 __builtin_msa_addv_b (v16i8, v16i8); +v8i16 __builtin_msa_addv_h (v8i16, v8i16); +v4i32 __builtin_msa_addv_w (v4i32, v4i32); +v2i64 __builtin_msa_addv_d (v2i64, v2i64); -The instance is an extension of the existing overloaded built-in @code{vec_sel} -that is documented in the PVIPR. +v16i8 __builtin_msa_addvi_b (v16i8, imm0_31); +v8i16 __builtin_msa_addvi_h (v8i16, imm0_31); +v4i32 __builtin_msa_addvi_w (v4i32, imm0_31); +v2i64 __builtin_msa_addvi_d (v2i64, imm0_31); -@smallexample -vector signed __int128 vec_perm (vector signed __int128, - vector signed __int128); -vector unsigned __int128 vec_perm (vector unsigned __int128, - vector unsigned __int128); -@end smallexample +v16u8 __builtin_msa_and_v (v16u8, v16u8); -The instance is an extension of the existing overloaded built-in -@code{vec_perm} that is documented in the PVIPR. +v16u8 __builtin_msa_andi_b (v16u8, imm0_255); -@node Basic PowerPC Built-in Functions Available on ISA 2.06 -@subsubsection Basic PowerPC Built-in Functions Available on ISA 2.06 +v16i8 __builtin_msa_asub_s_b (v16i8, v16i8); +v8i16 __builtin_msa_asub_s_h (v8i16, v8i16); +v4i32 __builtin_msa_asub_s_w (v4i32, v4i32); +v2i64 __builtin_msa_asub_s_d (v2i64, v2i64); -The basic built-in functions described in this section are -available on the PowerPC family of processors starting with ISA 2.05 -or later. Unless specific options are explicitly disabled on the -command line, specifying option @option{-mcpu=power7} has the effect of -enabling all the same options as for @option{-mcpu=power6} in -addition to the @option{-maltivec}, @option{-mpopcntd}, and -@option{-mvsx} options. 
+v16u8 __builtin_msa_asub_u_b (v16u8, v16u8); +v8u16 __builtin_msa_asub_u_h (v8u16, v8u16); +v4u32 __builtin_msa_asub_u_w (v4u32, v4u32); +v2u64 __builtin_msa_asub_u_d (v2u64, v2u64); -The following basic built-in functions require @option{-mpopcntd}: -@smallexample -unsigned int __builtin_addg6s (unsigned int, unsigned int); -long long __builtin_bpermd (long long, long long); -unsigned int __builtin_cbcdtd (unsigned int); -unsigned int __builtin_cdtbcd (unsigned int); -long long __builtin_divde (long long, long long); -unsigned long long __builtin_divdeu (unsigned long long, unsigned long long); -int __builtin_divwe (int, int); -unsigned int __builtin_divweu (unsigned int, unsigned int); -vector __int128 __builtin_pack_vector_int128 (long long, long long); -void __builtin_rs6000_speculation_barrier (void); -long long __builtin_unpack_vector_int128 (vector __int128, signed char); -@end smallexample +v16i8 __builtin_msa_ave_s_b (v16i8, v16i8); +v8i16 __builtin_msa_ave_s_h (v8i16, v8i16); +v4i32 __builtin_msa_ave_s_w (v4i32, v4i32); +v2i64 __builtin_msa_ave_s_d (v2i64, v2i64); -Of these, the @code{__builtin_divde} and @code{__builtin_divdeu} functions -require a 64-bit environment. +v16u8 __builtin_msa_ave_u_b (v16u8, v16u8); +v8u16 __builtin_msa_ave_u_h (v8u16, v8u16); +v4u32 __builtin_msa_ave_u_w (v4u32, v4u32); +v2u64 __builtin_msa_ave_u_d (v2u64, v2u64); -The following basic built-in functions, which are also supported on -x86 targets, require @option{-mfloat128}. -@smallexample -__float128 __builtin_fabsq (__float128); -__float128 __builtin_copysignq (__float128, __float128); -__float128 __builtin_infq (void); -__float128 __builtin_huge_valq (void); -__float128 __builtin_nanq (void); -__float128 __builtin_nansq (void); +v16i8 __builtin_msa_aver_s_b (v16i8, v16i8); +v8i16 __builtin_msa_aver_s_h (v8i16, v8i16); +v4i32 __builtin_msa_aver_s_w (v4i32, v4i32); +v2i64 __builtin_msa_aver_s_d (v2i64, v2i64); -__float128 __builtin_sqrtf128 (__float128); -__float128 __builtin_fmaf128 (__float128, __float128, __float128); -@end smallexample +v16u8 __builtin_msa_aver_u_b (v16u8, v16u8); +v8u16 __builtin_msa_aver_u_h (v8u16, v8u16); +v4u32 __builtin_msa_aver_u_w (v4u32, v4u32); +v2u64 __builtin_msa_aver_u_d (v2u64, v2u64); -@node Basic PowerPC Built-in Functions Available on ISA 2.07 -@subsubsection Basic PowerPC Built-in Functions Available on ISA 2.07 +v16u8 __builtin_msa_bclr_b (v16u8, v16u8); +v8u16 __builtin_msa_bclr_h (v8u16, v8u16); +v4u32 __builtin_msa_bclr_w (v4u32, v4u32); +v2u64 __builtin_msa_bclr_d (v2u64, v2u64); -The basic built-in functions described in this section are -available on the PowerPC family of processors starting with ISA 2.07 -or later. Unless specific options are explicitly disabled on the -command line, specifying option @option{-mcpu=power8} has the effect of -enabling all the same options as for @option{-mcpu=power7} in -addition to the @option{-mpower8-fusion}, @option{-mcrypto}, -@option{-mhtm}, @option{-mquad-memory}, and -@option{-mquad-memory-atomic} options. +v16u8 __builtin_msa_bclri_b (v16u8, imm0_7); +v8u16 __builtin_msa_bclri_h (v8u16, imm0_15); +v4u32 __builtin_msa_bclri_w (v4u32, imm0_31); +v2u64 __builtin_msa_bclri_d (v2u64, imm0_63); -This section intentionally empty. 
+v16u8 __builtin_msa_binsl_b (v16u8, v16u8, v16u8); +v8u16 __builtin_msa_binsl_h (v8u16, v8u16, v8u16); +v4u32 __builtin_msa_binsl_w (v4u32, v4u32, v4u32); +v2u64 __builtin_msa_binsl_d (v2u64, v2u64, v2u64); -@node Basic PowerPC Built-in Functions Available on ISA 3.0 -@subsubsection Basic PowerPC Built-in Functions Available on ISA 3.0 +v16u8 __builtin_msa_binsli_b (v16u8, v16u8, imm0_7); +v8u16 __builtin_msa_binsli_h (v8u16, v8u16, imm0_15); +v4u32 __builtin_msa_binsli_w (v4u32, v4u32, imm0_31); +v2u64 __builtin_msa_binsli_d (v2u64, v2u64, imm0_63); -The basic built-in functions described in this section are -available on the PowerPC family of processors starting with ISA 3.0 -or later. Unless specific options are explicitly disabled on the -command line, specifying option @option{-mcpu=power9} has the effect of -enabling all the same options as for @option{-mcpu=power8} in -addition to the @option{-misel} option. +v16u8 __builtin_msa_binsr_b (v16u8, v16u8, v16u8); +v8u16 __builtin_msa_binsr_h (v8u16, v8u16, v8u16); +v4u32 __builtin_msa_binsr_w (v4u32, v4u32, v4u32); +v2u64 __builtin_msa_binsr_d (v2u64, v2u64, v2u64); -The following built-in functions are available on Linux 64-bit systems -that use the ISA 3.0 instruction set (@option{-mcpu=power9}): +v16u8 __builtin_msa_binsri_b (v16u8, v16u8, imm0_7); +v8u16 __builtin_msa_binsri_h (v8u16, v8u16, imm0_15); +v4u32 __builtin_msa_binsri_w (v4u32, v4u32, imm0_31); +v2u64 __builtin_msa_binsri_d (v2u64, v2u64, imm0_63); -@defbuiltin{__float128 __builtin_addf128_round_to_odd (__float128, __float128)} -Perform a 128-bit IEEE floating point add using round to odd as the -rounding mode. -@enddefbuiltin +v16u8 __builtin_msa_bmnz_v (v16u8, v16u8, v16u8); -@defbuiltin{__float128 __builtin_subf128_round_to_odd (__float128, __float128)} -Perform a 128-bit IEEE floating point subtract using round to odd as -the rounding mode. -@enddefbuiltin +v16u8 __builtin_msa_bmnzi_b (v16u8, v16u8, imm0_255); -@defbuiltin{__float128 __builtin_mulf128_round_to_odd (__float128, __float128)} -Perform a 128-bit IEEE floating point multiply using round to odd as -the rounding mode. -@enddefbuiltin +v16u8 __builtin_msa_bmz_v (v16u8, v16u8, v16u8); -@defbuiltin{__float128 __builtin_divf128_round_to_odd (__float128, __float128)} -Perform a 128-bit IEEE floating point divide using round to odd as -the rounding mode. -@enddefbuiltin +v16u8 __builtin_msa_bmzi_b (v16u8, v16u8, imm0_255); -@defbuiltin{__float128 __builtin_sqrtf128_round_to_odd (__float128)} -Perform a 128-bit IEEE floating point square root using round to odd -as the rounding mode. -@enddefbuiltin +v16u8 __builtin_msa_bneg_b (v16u8, v16u8); +v8u16 __builtin_msa_bneg_h (v8u16, v8u16); +v4u32 __builtin_msa_bneg_w (v4u32, v4u32); +v2u64 __builtin_msa_bneg_d (v2u64, v2u64); -@defbuiltin{__float128 __builtin_fmaf128_round_to_odd (__float128, __float128, __float128)} -Perform a 128-bit IEEE floating point fused multiply and add operation -using round to odd as the rounding mode. -@enddefbuiltin +v16u8 __builtin_msa_bnegi_b (v16u8, imm0_7); +v8u16 __builtin_msa_bnegi_h (v8u16, imm0_15); +v4u32 __builtin_msa_bnegi_w (v4u32, imm0_31); +v2u64 __builtin_msa_bnegi_d (v2u64, imm0_63); -@defbuiltin{double __builtin_truncf128_round_to_odd (__float128)} -Convert a 128-bit IEEE floating point value to @code{double} using -round to odd as the rounding mode. 
-@enddefbuiltin +i32 __builtin_msa_bnz_b (v16u8); +i32 __builtin_msa_bnz_h (v8u16); +i32 __builtin_msa_bnz_w (v4u32); +i32 __builtin_msa_bnz_d (v2u64); +i32 __builtin_msa_bnz_v (v16u8); -The following additional built-in functions are also available for the -PowerPC family of processors, starting with ISA 3.0 or later: +v16u8 __builtin_msa_bsel_v (v16u8, v16u8, v16u8); -@defbuiltin{{long long} __builtin_darn (void)} -@defbuiltinx{{long long} __builtin_darn_raw (void)} -@defbuiltinx{int __builtin_darn_32 (void)} -The @code{__builtin_darn} and @code{__builtin_darn_raw} -functions require a -64-bit environment supporting ISA 3.0 or later. -The @code{__builtin_darn} function provides a 64-bit conditioned -random number. The @code{__builtin_darn_raw} function provides a -64-bit raw random number. The @code{__builtin_darn_32} function -provides a 32-bit conditioned random number. -@enddefbuiltin +v16u8 __builtin_msa_bseli_b (v16u8, v16u8, imm0_255); -The following additional built-in functions are also available for the -PowerPC family of processors, starting with ISA 3.0 or later: +v16u8 __builtin_msa_bset_b (v16u8, v16u8); +v8u16 __builtin_msa_bset_h (v8u16, v8u16); +v4u32 __builtin_msa_bset_w (v4u32, v4u32); +v2u64 __builtin_msa_bset_d (v2u64, v2u64); -@smallexample -int __builtin_byte_in_set (unsigned char u, unsigned long long set); -int __builtin_byte_in_range (unsigned char u, unsigned int range); -int __builtin_byte_in_either_range (unsigned char u, unsigned int ranges); +v16u8 __builtin_msa_bseti_b (v16u8, imm0_7); +v8u16 __builtin_msa_bseti_h (v8u16, imm0_15); +v4u32 __builtin_msa_bseti_w (v4u32, imm0_31); +v2u64 __builtin_msa_bseti_d (v2u64, imm0_63); -int __builtin_dfp_dtstsfi_lt (unsigned int comparison, _Decimal64 value); -int __builtin_dfp_dtstsfi_lt (unsigned int comparison, _Decimal128 value); -int __builtin_dfp_dtstsfi_lt_dd (unsigned int comparison, _Decimal64 value); -int __builtin_dfp_dtstsfi_lt_td (unsigned int comparison, _Decimal128 value); +i32 __builtin_msa_bz_b (v16u8); +i32 __builtin_msa_bz_h (v8u16); +i32 __builtin_msa_bz_w (v4u32); +i32 __builtin_msa_bz_d (v2u64); -int __builtin_dfp_dtstsfi_gt (unsigned int comparison, _Decimal64 value); -int __builtin_dfp_dtstsfi_gt (unsigned int comparison, _Decimal128 value); -int __builtin_dfp_dtstsfi_gt_dd (unsigned int comparison, _Decimal64 value); -int __builtin_dfp_dtstsfi_gt_td (unsigned int comparison, _Decimal128 value); +i32 __builtin_msa_bz_v (v16u8); -int __builtin_dfp_dtstsfi_eq (unsigned int comparison, _Decimal64 value); -int __builtin_dfp_dtstsfi_eq (unsigned int comparison, _Decimal128 value); -int __builtin_dfp_dtstsfi_eq_dd (unsigned int comparison, _Decimal64 value); -int __builtin_dfp_dtstsfi_eq_td (unsigned int comparison, _Decimal128 value); +v16i8 __builtin_msa_ceq_b (v16i8, v16i8); +v8i16 __builtin_msa_ceq_h (v8i16, v8i16); +v4i32 __builtin_msa_ceq_w (v4i32, v4i32); +v2i64 __builtin_msa_ceq_d (v2i64, v2i64); -int __builtin_dfp_dtstsfi_ov (unsigned int comparison, _Decimal64 value); -int __builtin_dfp_dtstsfi_ov (unsigned int comparison, _Decimal128 value); -int __builtin_dfp_dtstsfi_ov_dd (unsigned int comparison, _Decimal64 value); -int __builtin_dfp_dtstsfi_ov_td (unsigned int comparison, _Decimal128 value); +v16i8 __builtin_msa_ceqi_b (v16i8, imm_n16_15); +v8i16 __builtin_msa_ceqi_h (v8i16, imm_n16_15); +v4i32 __builtin_msa_ceqi_w (v4i32, imm_n16_15); +v2i64 __builtin_msa_ceqi_d (v2i64, imm_n16_15); -double __builtin_mffsl(void); +i32 __builtin_msa_cfcmsa (imm0_31); -@end smallexample -The 
@code{__builtin_byte_in_set} function requires a -64-bit environment supporting ISA 3.0 or later. This function returns -a non-zero value if and only if its @code{u} argument exactly equals one of -the eight bytes contained within its 64-bit @code{set} argument. +v16i8 __builtin_msa_cle_s_b (v16i8, v16i8); +v8i16 __builtin_msa_cle_s_h (v8i16, v8i16); +v4i32 __builtin_msa_cle_s_w (v4i32, v4i32); +v2i64 __builtin_msa_cle_s_d (v2i64, v2i64); -The @code{__builtin_byte_in_range} and -@code{__builtin_byte_in_either_range} functions require an environment -supporting ISA 3.0 or later. For these two functions, the -@code{range} argument is encoded as 4 bytes, organized as -@code{hi_1:lo_1:hi_2:lo_2}. -The @code{__builtin_byte_in_range} function returns a -non-zero value if and only if its @code{u} argument is within the -range bounded between @code{lo_2} and @code{hi_2} inclusive. -The @code{__builtin_byte_in_either_range} function returns non-zero if -and only if its @code{u} argument is within either the range bounded -between @code{lo_1} and @code{hi_1} inclusive or the range bounded -between @code{lo_2} and @code{hi_2} inclusive. +v16i8 __builtin_msa_cle_u_b (v16u8, v16u8); +v8i16 __builtin_msa_cle_u_h (v8u16, v8u16); +v4i32 __builtin_msa_cle_u_w (v4u32, v4u32); +v2i64 __builtin_msa_cle_u_d (v2u64, v2u64); -The @code{__builtin_dfp_dtstsfi_lt} function returns a non-zero value -if and only if the number of significant digits of its @code{value} argument -is less than its @code{comparison} argument. The -@code{__builtin_dfp_dtstsfi_lt_dd} and -@code{__builtin_dfp_dtstsfi_lt_td} functions behave similarly, but -require that the type of the @code{value} argument be -@code{_Decimal64} and @code{_Decimal128}, respectively. +v16i8 __builtin_msa_clei_s_b (v16i8, imm_n16_15); +v8i16 __builtin_msa_clei_s_h (v8i16, imm_n16_15); +v4i32 __builtin_msa_clei_s_w (v4i32, imm_n16_15); +v2i64 __builtin_msa_clei_s_d (v2i64, imm_n16_15); -The @code{__builtin_dfp_dtstsfi_gt} function returns a non-zero value -if and only if the number of significant digits of its @code{value} argument -is greater than its @code{comparison} argument. The -@code{__builtin_dfp_dtstsfi_gt_dd} and -@code{__builtin_dfp_dtstsfi_gt_td} functions behave similarly, but -require that the type of the @code{value} argument be -@code{_Decimal64} and @code{_Decimal128}, respectively. +v16i8 __builtin_msa_clei_u_b (v16u8, imm0_31); +v8i16 __builtin_msa_clei_u_h (v8u16, imm0_31); +v4i32 __builtin_msa_clei_u_w (v4u32, imm0_31); +v2i64 __builtin_msa_clei_u_d (v2u64, imm0_31); -The @code{__builtin_dfp_dtstsfi_eq} function returns a non-zero value -if and only if the number of significant digits of its @code{value} argument -equals its @code{comparison} argument. The -@code{__builtin_dfp_dtstsfi_eq_dd} and -@code{__builtin_dfp_dtstsfi_eq_td} functions behave similarly, but -require that the type of the @code{value} argument be -@code{_Decimal64} and @code{_Decimal128}, respectively. +v16i8 __builtin_msa_clt_s_b (v16i8, v16i8); +v8i16 __builtin_msa_clt_s_h (v8i16, v8i16); +v4i32 __builtin_msa_clt_s_w (v4i32, v4i32); +v2i64 __builtin_msa_clt_s_d (v2i64, v2i64); -The @code{__builtin_dfp_dtstsfi_ov} function returns a non-zero value -if and only if its @code{value} argument has an undefined number of -significant digits, such as when @code{value} is an encoding of @code{NaN}.
-The @code{__builtin_dfp_dtstsfi_ov_dd} and -@code{__builtin_dfp_dtstsfi_ov_td} functions behave similarly, but -require that the type of the @code{value} argument be -@code{_Decimal64} and @code{_Decimal128}, respectively. +v16i8 __builtin_msa_clt_u_b (v16u8, v16u8); +v8i16 __builtin_msa_clt_u_h (v8u16, v8u16); +v4i32 __builtin_msa_clt_u_w (v4u32, v4u32); +v2i64 __builtin_msa_clt_u_d (v2u64, v2u64); -The @code{__builtin_mffsl} function uses the ISA 3.0 @code{mffsl} instruction to read -the FPSCR. The instruction is a lower-latency version of the @code{mffs} -instruction. If the @code{mffsl} instruction is not available, then the -builtin uses the older @code{mffs} instruction to read the FPSCR. +v16i8 __builtin_msa_clti_s_b (v16i8, imm_n16_15); +v8i16 __builtin_msa_clti_s_h (v8i16, imm_n16_15); +v4i32 __builtin_msa_clti_s_w (v4i32, imm_n16_15); +v2i64 __builtin_msa_clti_s_d (v2i64, imm_n16_15); -@node Basic PowerPC Built-in Functions Available on ISA 3.1 -@subsubsection Basic PowerPC Built-in Functions Available on ISA 3.1 +v16i8 __builtin_msa_clti_u_b (v16u8, imm0_31); +v8i16 __builtin_msa_clti_u_h (v8u16, imm0_31); +v4i32 __builtin_msa_clti_u_w (v4u32, imm0_31); +v2i64 __builtin_msa_clti_u_d (v2u64, imm0_31); -The basic built-in functions described in this section are -available on the PowerPC family of processors starting with ISA 3.1. -Unless specific options are explicitly disabled on the -command line, specifying option @option{-mcpu=power10} has the effect of -enabling all the same options as for @option{-mcpu=power9}. +i32 __builtin_msa_copy_s_b (v16i8, imm0_15); +i32 __builtin_msa_copy_s_h (v8i16, imm0_7); +i32 __builtin_msa_copy_s_w (v4i32, imm0_3); +i64 __builtin_msa_copy_s_d (v2i64, imm0_1); -The following built-in functions are available on Linux 64-bit systems -that use the ISA 3.1 instruction set (@option{-mcpu=power10}): +u32 __builtin_msa_copy_u_b (v16i8, imm0_15); +u32 __builtin_msa_copy_u_h (v8i16, imm0_7); +u32 __builtin_msa_copy_u_w (v4i32, imm0_3); +u64 __builtin_msa_copy_u_d (v2i64, imm0_1); -@defbuiltin{{unsigned long long} @ - __builtin_cfuged (unsigned long long, unsigned long long)} -Perform a 64-bit centrifuge operation, as if implemented by the -@code{cfuged} instruction. -@enddefbuiltin +void __builtin_msa_ctcmsa (imm0_31, i32); -@defbuiltin{{unsigned long long} @ - __builtin_cntlzdm (unsigned long long, unsigned long long)} -Perform a 64-bit count leading zeros operation under mask, as if -implemented by the @code{cntlzdm} instruction. -@enddefbuiltin +v16i8 __builtin_msa_div_s_b (v16i8, v16i8); +v8i16 __builtin_msa_div_s_h (v8i16, v8i16); +v4i32 __builtin_msa_div_s_w (v4i32, v4i32); +v2i64 __builtin_msa_div_s_d (v2i64, v2i64); -@defbuiltin{{unsigned long long} @ - __builtin_cnttzdm (unsigned long long, unsigned long long)} -Perform a 64-bit count trailing zeros operation under mask, as if -implemented by the @code{cnttzdm} instruction. -@enddefbuiltin +v16u8 __builtin_msa_div_u_b (v16u8, v16u8); +v8u16 __builtin_msa_div_u_h (v8u16, v8u16); +v4u32 __builtin_msa_div_u_w (v4u32, v4u32); +v2u64 __builtin_msa_div_u_d (v2u64, v2u64); -@defbuiltin{{unsigned long long} @ - __builtin_pdepd (unsigned long long, unsigned long long)} -Perform a 64-bit parallel bits deposit operation, as if implemented by the -@code{pdepd} instruction.
-@enddefbuiltin +v8i16 __builtin_msa_dotp_s_h (v16i8, v16i8); +v4i32 __builtin_msa_dotp_s_w (v8i16, v8i16); +v2i64 __builtin_msa_dotp_s_d (v4i32, v4i32); -@defbuiltin{{unsigned long long} @ - __builtin_pextd (unsigned long long, unsigned long long)} -Perform a 64-bit parallel bits extract operation, as if implemented by the -@code{pextd} instruction. -@enddefbuiltin +v8u16 __builtin_msa_dotp_u_h (v16u8, v16u8); +v4u32 __builtin_msa_dotp_u_w (v8u16, v8u16); +v2u64 __builtin_msa_dotp_u_d (v4u32, v4u32); -@defbuiltin{{vector signed __int128} vec_xl_sext (signed long long, signed char *)} -@defbuiltinx{{vector signed __int128} vec_xl_sext (signed long long, signed short *)} -@defbuiltinx{{vector signed __int128} vec_xl_sext (signed long long, signed int *)} -@defbuiltinx{{vector signed __int128} vec_xl_sext (signed long long, signed long long *)} -@defbuiltinx{{vector unsigned __int128} vec_xl_zext (signed long long, unsigned char *)} -@defbuiltinx{{vector unsigned __int128} vec_xl_zext (signed long long, unsigned short *)} -@defbuiltinx{{vector unsigned __int128} vec_xl_zext (signed long long, unsigned int *)} -@defbuiltinx{{vector unsigned __int128} vec_xl_zext (signed long long, unsigned long long *)} +v8i16 __builtin_msa_dpadd_s_h (v8i16, v16i8, v16i8); +v4i32 __builtin_msa_dpadd_s_w (v4i32, v8i16, v8i16); +v2i64 __builtin_msa_dpadd_s_d (v2i64, v4i32, v4i32); -Load (and sign extend) to an __int128 vector, as if implemented by the ISA 3.1 -@code{lxvrbx}, @code{lxvrhx}, @code{lxvrwx}, and @code{lxvrdx} -instructions. -@enddefbuiltin +v8u16 __builtin_msa_dpadd_u_h (v8u16, v16u8, v16u8); +v4u32 __builtin_msa_dpadd_u_w (v4u32, v8u16, v8u16); +v2u64 __builtin_msa_dpadd_u_d (v2u64, v4u32, v4u32); -@defbuiltin{{void} vec_xst_trunc (vector signed __int128, signed long long, signed char *)} -@defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, signed short *)} -@defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, signed int *)} -@defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, signed long long *)} -@defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, unsigned char *)} -@defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, unsigned short *)} -@defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, unsigned int *)} -@defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, unsigned long long *)} +v8i16 __builtin_msa_dpsub_s_h (v8i16, v16i8, v16i8); +v4i32 __builtin_msa_dpsub_s_w (v4i32, v8i16, v8i16); +v2i64 __builtin_msa_dpsub_s_d (v2i64, v4i32, v4i32); -Truncate and store the rightmost element of a vector, as if implemented by the -ISA 3.1 @code{stxvrbx}, @code{stxvrhx}, @code{stxvrwx}, and @code{stxvrdx} -instructions. -@enddefbuiltin +v8i16 __builtin_msa_dpsub_u_h (v8i16, v16u8, v16u8); +v4i32 __builtin_msa_dpsub_u_w (v4i32, v8u16, v8u16); +v2i64 __builtin_msa_dpsub_u_d (v2i64, v4u32, v4u32); -@node PowerPC AltiVec/VSX Built-in Functions -@subsection PowerPC AltiVec/VSX Built-in Functions +v4f32 __builtin_msa_fadd_w (v4f32, v4f32); +v2f64 __builtin_msa_fadd_d (v2f64, v2f64); -GCC provides an interface for the PowerPC family of processors to access -the AltiVec operations described in Motorola's AltiVec Programming -Interface Manual. The interface is made available by including -@code{<altivec.h>} and using @option{-maltivec} and -@option{-mabi=altivec}. The interface supports the following vector -types.
+v4i32 __builtin_msa_fcaf_w (v4f32, v4f32); +v2i64 __builtin_msa_fcaf_d (v2f64, v2f64); -@smallexample -vector unsigned char -vector signed char -vector bool char +v4i32 __builtin_msa_fceq_w (v4f32, v4f32); +v2i64 __builtin_msa_fceq_d (v2f64, v2f64); -vector unsigned short -vector signed short -vector bool short -vector pixel +v4i32 __builtin_msa_fclass_w (v4f32); +v2i64 __builtin_msa_fclass_d (v2f64); -vector unsigned int -vector signed int -vector bool int -vector float -@end smallexample +v4i32 __builtin_msa_fcle_w (v4f32, v4f32); +v2i64 __builtin_msa_fcle_d (v2f64, v2f64); -GCC's implementation of the high-level language interface available from -C and C++ code differs from Motorola's documentation in several ways. +v4i32 __builtin_msa_fclt_w (v4f32, v4f32); +v2i64 __builtin_msa_fclt_d (v2f64, v2f64); -@itemize @bullet +v4i32 __builtin_msa_fcne_w (v4f32, v4f32); +v2i64 __builtin_msa_fcne_d (v2f64, v2f64); -@item -A vector constant is a list of constant expressions within curly braces. +v4i32 __builtin_msa_fcor_w (v4f32, v4f32); +v2i64 __builtin_msa_fcor_d (v2f64, v2f64); -@item -A vector initializer requires no cast if the vector constant is of the -same type as the variable it is initializing. +v4i32 __builtin_msa_fcueq_w (v4f32, v4f32); +v2i64 __builtin_msa_fcueq_d (v2f64, v2f64); -@item -If @code{signed} or @code{unsigned} is omitted, the signedness of the -vector type is the default signedness of the base type. The default -varies depending on the operating system, so a portable program should -always specify the signedness. +v4i32 __builtin_msa_fcule_w (v4f32, v4f32); +v2i64 __builtin_msa_fcule_d (v2f64, v2f64); -@item -Compiling with @option{-maltivec} adds keywords @code{__vector}, -@code{vector}, @code{__pixel}, @code{pixel}, @code{__bool} and -@code{bool}. When compiling ISO C, the context-sensitive substitution -of the keywords @code{vector}, @code{pixel} and @code{bool} is -disabled. To use them, you must include @code{<altivec.h>} instead. +v4i32 __builtin_msa_fcult_w (v4f32, v4f32); +v2i64 __builtin_msa_fcult_d (v2f64, v2f64); -@item -GCC allows using a @code{typedef} name as the type specifier for a -vector type, but only under the following circumstances: +v4i32 __builtin_msa_fcun_w (v4f32, v4f32); +v2i64 __builtin_msa_fcun_d (v2f64, v2f64); -@itemize @bullet +v4i32 __builtin_msa_fcune_w (v4f32, v4f32); +v2i64 __builtin_msa_fcune_d (v2f64, v2f64); -@item -When using @code{__vector} instead of @code{vector}; for example, +v4f32 __builtin_msa_fdiv_w (v4f32, v4f32); +v2f64 __builtin_msa_fdiv_d (v2f64, v2f64); -@smallexample -typedef signed short int16; -__vector int16 data; -@end smallexample +v8i16 __builtin_msa_fexdo_h (v4f32, v4f32); +v4f32 __builtin_msa_fexdo_w (v2f64, v2f64); -@item -When using @code{vector} in keyword-and-predefine mode; for example, +v4f32 __builtin_msa_fexp2_w (v4f32, v4i32); +v2f64 __builtin_msa_fexp2_d (v2f64, v2i64); -@smallexample -typedef signed short int16; -vector int16 data; -@end smallexample +v4f32 __builtin_msa_fexupl_w (v8i16); +v2f64 __builtin_msa_fexupl_d (v4f32); -Note that keyword-and-predefine mode is enabled by disabling GNU -extensions (e.g., by using @code{-std=c11}) and including -@code{<altivec.h>}.
-@end itemize +v4f32 __builtin_msa_fexupr_w (v8i16); +v2f64 __builtin_msa_fexupr_d (v4f32); -@item -For C, overloaded functions are implemented with macros so the following -does not work: +v4f32 __builtin_msa_ffint_s_w (v4i32); +v2f64 __builtin_msa_ffint_s_d (v2i64); + +v4f32 __builtin_msa_ffint_u_w (v4u32); +v2f64 __builtin_msa_ffint_u_d (v2u64); + +v4f32 __builtin_msa_ffql_w (v8i16); +v2f64 __builtin_msa_ffql_d (v4i32); -@smallexample - vec_add ((vector signed int)@{1, 2, 3, 4@}, foo); -@end smallexample +v4f32 __builtin_msa_ffqr_w (v8i16); +v2f64 __builtin_msa_ffqr_d (v4i32); -@noindent -Since @code{vec_add} is a macro, the vector constant in the example -is treated as four separate arguments. Wrap the entire argument in -parentheses for this to work; a corrected call is sketched just below. -@end itemize +v16i8 __builtin_msa_fill_b (i32); +v8i16 __builtin_msa_fill_h (i32); +v4i32 __builtin_msa_fill_w (i32); +v2i64 __builtin_msa_fill_d (i64); -@emph{Note:} Only the @code{<altivec.h>} interface is supported. -Internally, GCC uses built-in functions to achieve the functionality in -the aforementioned header file, but they are not supported and are -subject to change without notice. +v4f32 __builtin_msa_flog2_w (v4f32); +v2f64 __builtin_msa_flog2_d (v2f64); -GCC complies with the Power Vector Intrinsic Programming Reference (PVIPR), -which may be found at -@uref{https://openpowerfoundation.org/?resource_lib=power-vector-intrinsic-programming-reference}. -Chapter 4 of this document fully documents the vector API interfaces -that must be -provided by compliant compilers. Programmers should preferentially use -the interfaces described therein. However, historically GCC has provided -additional interfaces for access to vector instructions. These are -briefly described below. Where the PVIPR provides a portable interface, -other functions in GCC that provide the same capabilities should be -considered deprecated.
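A minimal sketch of the parenthesization workaround just described: wrapping the compound-literal argument in an extra set of parentheses hides its commas from the @code{vec_add} macro, so it is passed as a single argument (here @code{foo} is assumed to be a @code{vector signed int} declared nearby).

@smallexample
vector signed int foo = @{5, 6, 7, 8@};
/* The extra parentheses keep the commas inside the vector constant
   from splitting it into four separate macro arguments.  */
vec_add (((vector signed int)@{1, 2, 3, 4@}), foo);
@end smallexample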
+v4f32 __builtin_msa_fmadd_w (v4f32, v4f32, v4f32); +v2f64 __builtin_msa_fmadd_d (v2f64, v2f64, v2f64); -The PVIPR documents the following overloaded functions: +v4f32 __builtin_msa_fmax_w (v4f32, v4f32); +v2f64 __builtin_msa_fmax_d (v2f64, v2f64); -@multitable @columnfractions 0.33 0.33 0.33 +v4f32 __builtin_msa_fmax_a_w (v4f32, v4f32); +v2f64 __builtin_msa_fmax_a_d (v2f64, v2f64); -@item @code{vec_abs} -@tab @code{vec_absd} -@tab @code{vec_abss} -@item @code{vec_add} -@tab @code{vec_addc} -@tab @code{vec_adde} -@item @code{vec_addec} -@tab @code{vec_adds} -@tab @code{vec_all_eq} -@item @code{vec_all_ge} -@tab @code{vec_all_gt} -@tab @code{vec_all_in} -@item @code{vec_all_le} -@tab @code{vec_all_lt} -@tab @code{vec_all_nan} -@item @code{vec_all_ne} -@tab @code{vec_all_nge} -@tab @code{vec_all_ngt} -@item @code{vec_all_nle} -@tab @code{vec_all_nlt} -@tab @code{vec_all_numeric} -@item @code{vec_and} -@tab @code{vec_andc} -@tab @code{vec_any_eq} -@item @code{vec_any_ge} -@tab @code{vec_any_gt} -@tab @code{vec_any_le} -@item @code{vec_any_lt} -@tab @code{vec_any_nan} -@tab @code{vec_any_ne} -@item @code{vec_any_nge} -@tab @code{vec_any_ngt} -@tab @code{vec_any_nle} -@item @code{vec_any_nlt} -@tab @code{vec_any_numeric} -@tab @code{vec_any_out} -@item @code{vec_avg} -@tab @code{vec_bperm} -@tab @code{vec_ceil} -@item @code{vec_cipher_be} -@tab @code{vec_cipherlast_be} -@tab @code{vec_cmpb} -@item @code{vec_cmpeq} -@tab @code{vec_cmpge} -@tab @code{vec_cmpgt} -@item @code{vec_cmple} -@tab @code{vec_cmplt} -@tab @code{vec_cmpne} -@item @code{vec_cmpnez} -@tab @code{vec_cntlz} -@tab @code{vec_cntlz_lsbb} -@item @code{vec_cnttz} -@tab @code{vec_cnttz_lsbb} -@tab @code{vec_cpsgn} -@item @code{vec_ctf} -@tab @code{vec_cts} -@tab @code{vec_ctu} -@item @code{vec_div} -@tab @code{vec_double} -@tab @code{vec_doublee} -@item @code{vec_doubleh} -@tab @code{vec_doublel} -@tab @code{vec_doubleo} -@item @code{vec_eqv} -@tab @code{vec_expte} -@tab @code{vec_extract} -@item @code{vec_extract_exp} -@tab @code{vec_extract_fp32_from_shorth} -@tab @code{vec_extract_fp32_from_shortl} -@item @code{vec_extract_sig} -@tab @code{vec_extract_4b} -@tab @code{vec_first_match_index} -@item @code{vec_first_match_or_eos_index} -@tab @code{vec_first_mismatch_index} -@tab @code{vec_first_mismatch_or_eos_index} -@item @code{vec_float} -@tab @code{vec_float2} -@tab @code{vec_floate} -@item @code{vec_floato} -@tab @code{vec_floor} -@tab @code{vec_gb} -@item @code{vec_insert} -@tab @code{vec_insert_exp} -@tab @code{vec_insert4b} -@item @code{vec_ld} -@tab @code{vec_lde} -@tab @code{vec_ldl} -@item @code{vec_loge} -@tab @code{vec_madd} -@tab @code{vec_madds} -@item @code{vec_max} -@tab @code{vec_mergee} -@tab @code{vec_mergeh} -@item @code{vec_mergel} -@tab @code{vec_mergeo} -@tab @code{vec_mfvscr} -@item @code{vec_min} -@tab @code{vec_mradds} -@tab @code{vec_msub} -@item @code{vec_msum} -@tab @code{vec_msums} -@tab @code{vec_mtvscr} -@item @code{vec_mul} -@tab @code{vec_mule} -@tab @code{vec_mulo} -@item @code{vec_nabs} -@tab @code{vec_nand} -@tab @code{vec_ncipher_be} -@item @code{vec_ncipherlast_be} -@tab @code{vec_nearbyint} -@tab @code{vec_neg} -@item @code{vec_nmadd} -@tab @code{vec_nmsub} -@tab @code{vec_nor} -@item @code{vec_or} -@tab @code{vec_orc} -@tab @code{vec_pack} -@item @code{vec_pack_to_short_fp32} -@tab @code{vec_packpx} -@tab @code{vec_packs} -@item @code{vec_packsu} -@tab @code{vec_parity_lsbb} -@tab @code{vec_perm} -@item @code{vec_permxor} -@tab @code{vec_pmsum_be} -@tab @code{vec_popcnt} -@item @code{vec_re} 
-@tab @code{vec_recipdiv} -@tab @code{vec_revb} -@item @code{vec_reve} -@tab @code{vec_rint} -@tab @code{vec_rl} -@item @code{vec_rlmi} -@tab @code{vec_rlnm} -@tab @code{vec_round} -@item @code{vec_rsqrt} -@tab @code{vec_rsqrte} -@tab @code{vec_sbox_be} -@item @code{vec_sel} -@tab @code{vec_shasigma_be} -@tab @code{vec_signed} -@item @code{vec_signed2} -@tab @code{vec_signede} -@tab @code{vec_signedo} -@item @code{vec_sl} -@tab @code{vec_sld} -@tab @code{vec_sldw} -@item @code{vec_sll} -@tab @code{vec_slo} -@tab @code{vec_slv} -@item @code{vec_splat} -@tab @code{vec_splat_s8} -@tab @code{vec_splat_s16} -@item @code{vec_splat_s32} -@tab @code{vec_splat_u8} -@tab @code{vec_splat_u16} -@item @code{vec_splat_u32} -@tab @code{vec_splats} -@tab @code{vec_sqrt} -@item @code{vec_sr} -@tab @code{vec_sra} -@tab @code{vec_srl} -@item @code{vec_sro} -@tab @code{vec_srv} -@tab @code{vec_st} -@item @code{vec_ste} -@tab @code{vec_stl} -@tab @code{vec_sub} -@item @code{vec_subc} -@tab @code{vec_sube} -@tab @code{vec_subec} -@item @code{vec_subs} -@tab @code{vec_sum2s} -@tab @code{vec_sum4s} -@item @code{vec_sums} -@tab @code{vec_test_data_class} -@tab @code{vec_trunc} -@item @code{vec_unpackh} -@tab @code{vec_unpackl} -@tab @code{vec_unsigned} -@item @code{vec_unsigned2} -@tab @code{vec_unsignede} -@tab @code{vec_unsignedo} -@item @code{vec_xl} -@tab @code{vec_xl_be} -@tab @code{vec_xl_len} -@item @code{vec_xl_len_r} -@tab @code{vec_xor} -@tab @code{vec_xst} -@item @code{vec_xst_be} -@tab @code{vec_xst_len} -@tab @code{vec_xst_len_r} +v4f32 __builtin_msa_fmin_w (v4f32, v4f32); +v2f64 __builtin_msa_fmin_d (v2f64, v2f64); -@end multitable +v4f32 __builtin_msa_fmin_a_w (v4f32, v4f32); +v2f64 __builtin_msa_fmin_a_d (v2f64, v2f64); -@menu -* PowerPC AltiVec Built-in Functions on ISA 2.05:: -* PowerPC AltiVec Built-in Functions Available on ISA 2.06:: -* PowerPC AltiVec Built-in Functions Available on ISA 2.07:: -* PowerPC AltiVec Built-in Functions Available on ISA 3.0:: -* PowerPC AltiVec Built-in Functions Available on ISA 3.1:: -@end menu +v4f32 __builtin_msa_fmsub_w (v4f32, v4f32, v4f32); +v2f64 __builtin_msa_fmsub_d (v2f64, v2f64, v2f64); -@node PowerPC AltiVec Built-in Functions on ISA 2.05 -@subsubsection PowerPC AltiVec Built-in Functions on ISA 2.05 +v4f32 __builtin_msa_fmul_w (v4f32, v4f32); +v2f64 __builtin_msa_fmul_d (v2f64, v2f64); -The following interfaces are supported for the generic and specific -AltiVec operations and the AltiVec predicates. In cases where there -is a direct mapping between generic and specific operations, only the -generic names are shown here, although the specific operations can also -be used. +v4f32 __builtin_msa_frint_w (v4f32); +v2f64 __builtin_msa_frint_d (v2f64); -Arguments that are documented as @code{const int} require literal -integral values within the range required for that operation. +v4f32 __builtin_msa_frcp_w (v4f32); +v2f64 __builtin_msa_frcp_d (v2f64); -Only functions excluded from the PVIPR are listed here. 
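To illustrate the literal-argument rule above with one entry from the list that follows: @code{vec_dst} takes a stream tag as its final @code{const int} argument, and since AltiVec provides four data streams the tag must be a literal constant in the range 0 through 3. A sketch only, with the prefetch control word @code{ctl} assumed to be encoded elsewhere:

@smallexample
const unsigned char *p = get_data ();  /* hypothetical data source */
int ctl = 0;                           /* stream control word, encoding assumed */
vec_dst (p, ctl, 0);   /* OK: the tag 0 is a literal in the range 0-3 */
/* vec_dst (p, ctl, tag);  would be rejected for a non-literal 'tag' */
@end smallexample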
+v4f32 __builtin_msa_frsqrt_w (v4f32); +v2f64 __builtin_msa_frsqrt_d (v2f64); -@smallexample -void vec_dss (const int); +v4i32 __builtin_msa_fsaf_w (v4f32, v4f32); +v2i64 __builtin_msa_fsaf_d (v2f64, v2f64); -void vec_dssall (void); +v4i32 __builtin_msa_fseq_w (v4f32, v4f32); +v2i64 __builtin_msa_fseq_d (v2f64, v2f64); -void vec_dst (const vector unsigned char *, int, const int); -void vec_dst (const vector signed char *, int, const int); -void vec_dst (const vector bool char *, int, const int); -void vec_dst (const vector unsigned short *, int, const int); -void vec_dst (const vector signed short *, int, const int); -void vec_dst (const vector bool short *, int, const int); -void vec_dst (const vector pixel *, int, const int); -void vec_dst (const vector unsigned int *, int, const int); -void vec_dst (const vector signed int *, int, const int); -void vec_dst (const vector bool int *, int, const int); -void vec_dst (const vector float *, int, const int); -void vec_dst (const unsigned char *, int, const int); -void vec_dst (const signed char *, int, const int); -void vec_dst (const unsigned short *, int, const int); -void vec_dst (const short *, int, const int); -void vec_dst (const unsigned int *, int, const int); -void vec_dst (const int *, int, const int); -void vec_dst (const float *, int, const int); +v4i32 __builtin_msa_fsle_w (v4f32, v4f32); +v2i64 __builtin_msa_fsle_d (v2f64, v2f64); -void vec_dstst (const vector unsigned char *, int, const int); -void vec_dstst (const vector signed char *, int, const int); -void vec_dstst (const vector bool char *, int, const int); -void vec_dstst (const vector unsigned short *, int, const int); -void vec_dstst (const vector signed short *, int, const int); -void vec_dstst (const vector bool short *, int, const int); -void vec_dstst (const vector pixel *, int, const int); -void vec_dstst (const vector unsigned int *, int, const int); -void vec_dstst (const vector signed int *, int, const int); -void vec_dstst (const vector bool int *, int, const int); -void vec_dstst (const vector float *, int, const int); -void vec_dstst (const unsigned char *, int, const int); -void vec_dstst (const signed char *, int, const int); -void vec_dstst (const unsigned short *, int, const int); -void vec_dstst (const short *, int, const int); -void vec_dstst (const unsigned int *, int, const int); -void vec_dstst (const int *, int, const int); -void vec_dstst (const unsigned long *, int, const int); -void vec_dstst (const long *, int, const int); -void vec_dstst (const float *, int, const int); +v4i32 __builtin_msa_fslt_w (v4f32, v4f32); +v2i64 __builtin_msa_fslt_d (v2f64, v2f64); -void vec_dststt (const vector unsigned char *, int, const int); -void vec_dststt (const vector signed char *, int, const int); -void vec_dststt (const vector bool char *, int, const int); -void vec_dststt (const vector unsigned short *, int, const int); -void vec_dststt (const vector signed short *, int, const int); -void vec_dststt (const vector bool short *, int, const int); -void vec_dststt (const vector pixel *, int, const int); -void vec_dststt (const vector unsigned int *, int, const int); -void vec_dststt (const vector signed int *, int, const int); -void vec_dststt (const vector bool int *, int, const int); -void vec_dststt (const vector float *, int, const int); -void vec_dststt (const unsigned char *, int, const int); -void vec_dststt (const signed char *, int, const int); -void vec_dststt (const unsigned short *, int, const int); -void vec_dststt (const short *, int, const int); 
-void vec_dststt (const unsigned int *, int, const int); -void vec_dststt (const int *, int, const int); -void vec_dststt (const float *, int, const int); +v4i32 __builtin_msa_fsne_w (v4f32, v4f32); +v2i64 __builtin_msa_fsne_d (v2f64, v2f64); -void vec_dstt (const vector unsigned char *, int, const int); -void vec_dstt (const vector signed char *, int, const int); -void vec_dstt (const vector bool char *, int, const int); -void vec_dstt (const vector unsigned short *, int, const int); -void vec_dstt (const vector signed short *, int, const int); -void vec_dstt (const vector bool short *, int, const int); -void vec_dstt (const vector pixel *, int, const int); -void vec_dstt (const vector unsigned int *, int, const int); -void vec_dstt (const vector signed int *, int, const int); -void vec_dstt (const vector bool int *, int, const int); -void vec_dstt (const vector float *, int, const int); -void vec_dstt (const unsigned char *, int, const int); -void vec_dstt (const signed char *, int, const int); -void vec_dstt (const unsigned short *, int, const int); -void vec_dstt (const short *, int, const int); -void vec_dstt (const unsigned int *, int, const int); -void vec_dstt (const int *, int, const int); -void vec_dstt (const float *, int, const int); +v4i32 __builtin_msa_fsor_w (v4f32, v4f32); +v2i64 __builtin_msa_fsor_d (v2f64, v2f64); -vector signed char vec_lvebx (int, char *); -vector unsigned char vec_lvebx (int, unsigned char *); +v4f32 __builtin_msa_fsqrt_w (v4f32); +v2f64 __builtin_msa_fsqrt_d (v2f64); -vector signed short vec_lvehx (int, short *); -vector unsigned short vec_lvehx (int, unsigned short *); +v4f32 __builtin_msa_fsub_w (v4f32, v4f32); +v2f64 __builtin_msa_fsub_d (v2f64, v2f64); -vector float vec_lvewx (int, float *); -vector signed int vec_lvewx (int, int *); -vector unsigned int vec_lvewx (int, unsigned int *); +v4i32 __builtin_msa_fsueq_w (v4f32, v4f32); +v2i64 __builtin_msa_fsueq_d (v2f64, v2f64); -vector unsigned char vec_lvsl (int, const unsigned char *); -vector unsigned char vec_lvsl (int, const signed char *); -vector unsigned char vec_lvsl (int, const unsigned short *); -vector unsigned char vec_lvsl (int, const short *); -vector unsigned char vec_lvsl (int, const unsigned int *); -vector unsigned char vec_lvsl (int, const int *); -vector unsigned char vec_lvsl (int, const float *); +v4i32 __builtin_msa_fsule_w (v4f32, v4f32); +v2i64 __builtin_msa_fsule_d (v2f64, v2f64); -vector unsigned char vec_lvsr (int, const unsigned char *); -vector unsigned char vec_lvsr (int, const signed char *); -vector unsigned char vec_lvsr (int, const unsigned short *); -vector unsigned char vec_lvsr (int, const short *); -vector unsigned char vec_lvsr (int, const unsigned int *); -vector unsigned char vec_lvsr (int, const int *); -vector unsigned char vec_lvsr (int, const float *); +v4i32 __builtin_msa_fsult_w (v4f32, v4f32); +v2i64 __builtin_msa_fsult_d (v2f64, v2f64); -void vec_stvebx (vector signed char, int, signed char *); -void vec_stvebx (vector unsigned char, int, unsigned char *); -void vec_stvebx (vector bool char, int, signed char *); -void vec_stvebx (vector bool char, int, unsigned char *); +v4i32 __builtin_msa_fsun_w (v4f32, v4f32); +v2i64 __builtin_msa_fsun_d (v2f64, v2f64); -void vec_stvehx (vector signed short, int, short *); -void vec_stvehx (vector unsigned short, int, unsigned short *); -void vec_stvehx (vector bool short, int, short *); -void vec_stvehx (vector bool short, int, unsigned short *); +v4i32 __builtin_msa_fsune_w (v4f32, v4f32); +v2i64 
__builtin_msa_fsune_d (v2f64, v2f64); -void vec_stvewx (vector float, int, float *); -void vec_stvewx (vector signed int, int, int *); -void vec_stvewx (vector unsigned int, int, unsigned int *); -void vec_stvewx (vector bool int, int, int *); -void vec_stvewx (vector bool int, int, unsigned int *); +v4i32 __builtin_msa_ftint_s_w (v4f32); +v2i64 __builtin_msa_ftint_s_d (v2f64); -vector float vec_vaddfp (vector float, vector float); +v4u32 __builtin_msa_ftint_u_w (v4f32); +v2u64 __builtin_msa_ftint_u_d (v2f64); -vector signed char vec_vaddsbs (vector bool char, vector signed char); -vector signed char vec_vaddsbs (vector signed char, vector bool char); -vector signed char vec_vaddsbs (vector signed char, vector signed char); +v8i16 __builtin_msa_ftq_h (v4f32, v4f32); +v4i32 __builtin_msa_ftq_w (v2f64, v2f64); -vector signed short vec_vaddshs (vector bool short, vector signed short); -vector signed short vec_vaddshs (vector signed short, vector bool short); -vector signed short vec_vaddshs (vector signed short, vector signed short); +v4i32 __builtin_msa_ftrunc_s_w (v4f32); +v2i64 __builtin_msa_ftrunc_s_d (v2f64); -vector signed int vec_vaddsws (vector bool int, vector signed int); -vector signed int vec_vaddsws (vector signed int, vector bool int); -vector signed int vec_vaddsws (vector signed int, vector signed int); +v4u32 __builtin_msa_ftrunc_u_w (v4f32); +v2u64 __builtin_msa_ftrunc_u_d (v2f64); -vector signed char vec_vaddubm (vector bool char, vector signed char); -vector signed char vec_vaddubm (vector signed char, vector bool char); -vector signed char vec_vaddubm (vector signed char, vector signed char); -vector unsigned char vec_vaddubm (vector bool char, vector unsigned char); -vector unsigned char vec_vaddubm (vector unsigned char, vector bool char); -vector unsigned char vec_vaddubm (vector unsigned char, vector unsigned char); +v8i16 __builtin_msa_hadd_s_h (v16i8, v16i8); +v4i32 __builtin_msa_hadd_s_w (v8i16, v8i16); +v2i64 __builtin_msa_hadd_s_d (v4i32, v4i32); -vector unsigned char vec_vaddubs (vector bool char, vector unsigned char); -vector unsigned char vec_vaddubs (vector unsigned char, vector bool char); -vector unsigned char vec_vaddubs (vector unsigned char, vector unsigned char); +v8u16 __builtin_msa_hadd_u_h (v16u8, v16u8); +v4u32 __builtin_msa_hadd_u_w (v8u16, v8u16); +v2u64 __builtin_msa_hadd_u_d (v4u32, v4u32); -vector signed short vec_vadduhm (vector bool short, vector signed short); -vector signed short vec_vadduhm (vector signed short, vector bool short); -vector signed short vec_vadduhm (vector signed short, vector signed short); -vector unsigned short vec_vadduhm (vector bool short, vector unsigned short); -vector unsigned short vec_vadduhm (vector unsigned short, vector bool short); -vector unsigned short vec_vadduhm (vector unsigned short, vector unsigned short); +v8i16 __builtin_msa_hsub_s_h (v16i8, v16i8); +v4i32 __builtin_msa_hsub_s_w (v8i16, v8i16); +v2i64 __builtin_msa_hsub_s_d (v4i32, v4i32); -vector unsigned short vec_vadduhs (vector bool short, vector unsigned short); -vector unsigned short vec_vadduhs (vector unsigned short, vector bool short); -vector unsigned short vec_vadduhs (vector unsigned short, vector unsigned short); +v8i16 __builtin_msa_hsub_u_h (v16u8, v16u8); +v4i32 __builtin_msa_hsub_u_w (v8u16, v8u16); +v2i64 __builtin_msa_hsub_u_d (v4u32, v4u32); -vector signed int vec_vadduwm (vector bool int, vector signed int); -vector signed int vec_vadduwm (vector signed int, vector bool int); -vector signed int vec_vadduwm (vector signed int, 
vector signed int); -vector unsigned int vec_vadduwm (vector bool int, vector unsigned int); -vector unsigned int vec_vadduwm (vector unsigned int, vector bool int); -vector unsigned int vec_vadduwm (vector unsigned int, vector unsigned int); +v16i8 __builtin_msa_ilvev_b (v16i8, v16i8); +v8i16 __builtin_msa_ilvev_h (v8i16, v8i16); +v4i32 __builtin_msa_ilvev_w (v4i32, v4i32); +v2i64 __builtin_msa_ilvev_d (v2i64, v2i64); -vector unsigned int vec_vadduws (vector bool int, vector unsigned int); -vector unsigned int vec_vadduws (vector unsigned int, vector bool int); -vector unsigned int vec_vadduws (vector unsigned int, vector unsigned int); +v16i8 __builtin_msa_ilvl_b (v16i8, v16i8); +v8i16 __builtin_msa_ilvl_h (v8i16, v8i16); +v4i32 __builtin_msa_ilvl_w (v4i32, v4i32); +v2i64 __builtin_msa_ilvl_d (v2i64, v2i64); -vector signed char vec_vavgsb (vector signed char, vector signed char); +v16i8 __builtin_msa_ilvod_b (v16i8, v16i8); +v8i16 __builtin_msa_ilvod_h (v8i16, v8i16); +v4i32 __builtin_msa_ilvod_w (v4i32, v4i32); +v2i64 __builtin_msa_ilvod_d (v2i64, v2i64); -vector signed short vec_vavgsh (vector signed short, vector signed short); +v16i8 __builtin_msa_ilvr_b (v16i8, v16i8); +v8i16 __builtin_msa_ilvr_h (v8i16, v8i16); +v4i32 __builtin_msa_ilvr_w (v4i32, v4i32); +v2i64 __builtin_msa_ilvr_d (v2i64, v2i64); -vector signed int vec_vavgsw (vector signed int, vector signed int); +v16i8 __builtin_msa_insert_b (v16i8, imm0_15, i32); +v8i16 __builtin_msa_insert_h (v8i16, imm0_7, i32); +v4i32 __builtin_msa_insert_w (v4i32, imm0_3, i32); +v2i64 __builtin_msa_insert_d (v2i64, imm0_1, i64); -vector unsigned char vec_vavgub (vector unsigned char, vector unsigned char); +v16i8 __builtin_msa_insve_b (v16i8, imm0_15, v16i8); +v8i16 __builtin_msa_insve_h (v8i16, imm0_7, v8i16); +v4i32 __builtin_msa_insve_w (v4i32, imm0_3, v4i32); +v2i64 __builtin_msa_insve_d (v2i64, imm0_1, v2i64); -vector unsigned short vec_vavguh (vector unsigned short, vector unsigned short); +v16i8 __builtin_msa_ld_b (const void *, imm_n512_511); +v8i16 __builtin_msa_ld_h (const void *, imm_n1024_1022); +v4i32 __builtin_msa_ld_w (const void *, imm_n2048_2044); +v2i64 __builtin_msa_ld_d (const void *, imm_n4096_4088); -vector unsigned int vec_vavguw (vector unsigned int, vector unsigned int); +v16i8 __builtin_msa_ldi_b (imm_n512_511); +v8i16 __builtin_msa_ldi_h (imm_n512_511); +v4i32 __builtin_msa_ldi_w (imm_n512_511); +v2i64 __builtin_msa_ldi_d (imm_n512_511); -vector float vec_vcfsx (vector signed int, const int); +v8i16 __builtin_msa_madd_q_h (v8i16, v8i16, v8i16); +v4i32 __builtin_msa_madd_q_w (v4i32, v4i32, v4i32); -vector float vec_vcfux (vector unsigned int, const int); +v8i16 __builtin_msa_maddr_q_h (v8i16, v8i16, v8i16); +v4i32 __builtin_msa_maddr_q_w (v4i32, v4i32, v4i32); -vector bool int vec_vcmpeqfp (vector float, vector float); +v16i8 __builtin_msa_maddv_b (v16i8, v16i8, v16i8); +v8i16 __builtin_msa_maddv_h (v8i16, v8i16, v8i16); +v4i32 __builtin_msa_maddv_w (v4i32, v4i32, v4i32); +v2i64 __builtin_msa_maddv_d (v2i64, v2i64, v2i64); -vector bool char vec_vcmpequb (vector signed char, vector signed char); -vector bool char vec_vcmpequb (vector unsigned char, vector unsigned char); +v16i8 __builtin_msa_max_a_b (v16i8, v16i8); +v8i16 __builtin_msa_max_a_h (v8i16, v8i16); +v4i32 __builtin_msa_max_a_w (v4i32, v4i32); +v2i64 __builtin_msa_max_a_d (v2i64, v2i64); -vector bool short vec_vcmpequh (vector signed short, vector signed short); -vector bool short vec_vcmpequh (vector unsigned short, vector unsigned short); +v16i8 
__builtin_msa_max_s_b (v16i8, v16i8); +v8i16 __builtin_msa_max_s_h (v8i16, v8i16); +v4i32 __builtin_msa_max_s_w (v4i32, v4i32); +v2i64 __builtin_msa_max_s_d (v2i64, v2i64); -vector bool int vec_vcmpequw (vector signed int, vector signed int); -vector bool int vec_vcmpequw (vector unsigned int, vector unsigned int); +v16u8 __builtin_msa_max_u_b (v16u8, v16u8); +v8u16 __builtin_msa_max_u_h (v8u16, v8u16); +v4u32 __builtin_msa_max_u_w (v4u32, v4u32); +v2u64 __builtin_msa_max_u_d (v2u64, v2u64); -vector bool int vec_vcmpgtfp (vector float, vector float); +v16i8 __builtin_msa_maxi_s_b (v16i8, imm_n16_15); +v8i16 __builtin_msa_maxi_s_h (v8i16, imm_n16_15); +v4i32 __builtin_msa_maxi_s_w (v4i32, imm_n16_15); +v2i64 __builtin_msa_maxi_s_d (v2i64, imm_n16_15); -vector bool char vec_vcmpgtsb (vector signed char, vector signed char); +v16u8 __builtin_msa_maxi_u_b (v16u8, imm0_31); +v8u16 __builtin_msa_maxi_u_h (v8u16, imm0_31); +v4u32 __builtin_msa_maxi_u_w (v4u32, imm0_31); +v2u64 __builtin_msa_maxi_u_d (v2u64, imm0_31); -vector bool short vec_vcmpgtsh (vector signed short, vector signed short); +v16i8 __builtin_msa_min_a_b (v16i8, v16i8); +v8i16 __builtin_msa_min_a_h (v8i16, v8i16); +v4i32 __builtin_msa_min_a_w (v4i32, v4i32); +v2i64 __builtin_msa_min_a_d (v2i64, v2i64); -vector bool int vec_vcmpgtsw (vector signed int, vector signed int); +v16i8 __builtin_msa_min_s_b (v16i8, v16i8); +v8i16 __builtin_msa_min_s_h (v8i16, v8i16); +v4i32 __builtin_msa_min_s_w (v4i32, v4i32); +v2i64 __builtin_msa_min_s_d (v2i64, v2i64); -vector bool char vec_vcmpgtub (vector unsigned char, vector unsigned char); +v16u8 __builtin_msa_min_u_b (v16u8, v16u8); +v8u16 __builtin_msa_min_u_h (v8u16, v8u16); +v4u32 __builtin_msa_min_u_w (v4u32, v4u32); +v2u64 __builtin_msa_min_u_d (v2u64, v2u64); -vector bool short vec_vcmpgtuh (vector unsigned short, vector unsigned short); +v16i8 __builtin_msa_mini_s_b (v16i8, imm_n16_15); +v8i16 __builtin_msa_mini_s_h (v8i16, imm_n16_15); +v4i32 __builtin_msa_mini_s_w (v4i32, imm_n16_15); +v2i64 __builtin_msa_mini_s_d (v2i64, imm_n16_15); -vector bool int vec_vcmpgtuw (vector unsigned int, vector unsigned int); +v16u8 __builtin_msa_mini_u_b (v16u8, imm0_31); +v8u16 __builtin_msa_mini_u_h (v8u16, imm0_31); +v4u32 __builtin_msa_mini_u_w (v4u32, imm0_31); +v2u64 __builtin_msa_mini_u_d (v2u64, imm0_31); -vector float vec_vmaxfp (vector float, vector float); +v16i8 __builtin_msa_mod_s_b (v16i8, v16i8); +v8i16 __builtin_msa_mod_s_h (v8i16, v8i16); +v4i32 __builtin_msa_mod_s_w (v4i32, v4i32); +v2i64 __builtin_msa_mod_s_d (v2i64, v2i64); -vector signed char vec_vmaxsb (vector bool char, vector signed char); -vector signed char vec_vmaxsb (vector signed char, vector bool char); -vector signed char vec_vmaxsb (vector signed char, vector signed char); +v16u8 __builtin_msa_mod_u_b (v16u8, v16u8); +v8u16 __builtin_msa_mod_u_h (v8u16, v8u16); +v4u32 __builtin_msa_mod_u_w (v4u32, v4u32); +v2u64 __builtin_msa_mod_u_d (v2u64, v2u64); -vector signed short vec_vmaxsh (vector bool short, vector signed short); -vector signed short vec_vmaxsh (vector signed short, vector bool short); -vector signed short vec_vmaxsh (vector signed short, vector signed short); +v16i8 __builtin_msa_move_v (v16i8); -vector signed int vec_vmaxsw (vector bool int, vector signed int); -vector signed int vec_vmaxsw (vector signed int, vector bool int); -vector signed int vec_vmaxsw (vector signed int, vector signed int); +v8i16 __builtin_msa_msub_q_h (v8i16, v8i16, v8i16); +v4i32 __builtin_msa_msub_q_w (v4i32, v4i32, v4i32); -vector 
unsigned char vec_vmaxub (vector bool char, vector unsigned char); -vector unsigned char vec_vmaxub (vector unsigned char, vector bool char); -vector unsigned char vec_vmaxub (vector unsigned char, vector unsigned char); +v8i16 __builtin_msa_msubr_q_h (v8i16, v8i16, v8i16); +v4i32 __builtin_msa_msubr_q_w (v4i32, v4i32, v4i32); -vector unsigned short vec_vmaxuh (vector bool short, vector unsigned short); -vector unsigned short vec_vmaxuh (vector unsigned short, vector bool short); -vector unsigned short vec_vmaxuh (vector unsigned short, vector unsigned short); +v16i8 __builtin_msa_msubv_b (v16i8, v16i8, v16i8); +v8i16 __builtin_msa_msubv_h (v8i16, v8i16, v8i16); +v4i32 __builtin_msa_msubv_w (v4i32, v4i32, v4i32); +v2i64 __builtin_msa_msubv_d (v2i64, v2i64, v2i64); -vector unsigned int vec_vmaxuw (vector bool int, vector unsigned int); -vector unsigned int vec_vmaxuw (vector unsigned int, vector bool int); -vector unsigned int vec_vmaxuw (vector unsigned int, vector unsigned int); +v8i16 __builtin_msa_mul_q_h (v8i16, v8i16); +v4i32 __builtin_msa_mul_q_w (v4i32, v4i32); -vector float vec_vminfp (vector float, vector float); +v8i16 __builtin_msa_mulr_q_h (v8i16, v8i16); +v4i32 __builtin_msa_mulr_q_w (v4i32, v4i32); -vector signed char vec_vminsb (vector bool char, vector signed char); -vector signed char vec_vminsb (vector signed char, vector bool char); -vector signed char vec_vminsb (vector signed char, vector signed char); +v16i8 __builtin_msa_mulv_b (v16i8, v16i8); +v8i16 __builtin_msa_mulv_h (v8i16, v8i16); +v4i32 __builtin_msa_mulv_w (v4i32, v4i32); +v2i64 __builtin_msa_mulv_d (v2i64, v2i64); -vector signed short vec_vminsh (vector bool short, vector signed short); -vector signed short vec_vminsh (vector signed short, vector bool short); -vector signed short vec_vminsh (vector signed short, vector signed short); +v16i8 __builtin_msa_nloc_b (v16i8); +v8i16 __builtin_msa_nloc_h (v8i16); +v4i32 __builtin_msa_nloc_w (v4i32); +v2i64 __builtin_msa_nloc_d (v2i64); -vector signed int vec_vminsw (vector bool int, vector signed int); -vector signed int vec_vminsw (vector signed int, vector bool int); -vector signed int vec_vminsw (vector signed int, vector signed int); +v16i8 __builtin_msa_nlzc_b (v16i8); +v8i16 __builtin_msa_nlzc_h (v8i16); +v4i32 __builtin_msa_nlzc_w (v4i32); +v2i64 __builtin_msa_nlzc_d (v2i64); -vector unsigned char vec_vminub (vector bool char, vector unsigned char); -vector unsigned char vec_vminub (vector unsigned char, vector bool char); -vector unsigned char vec_vminub (vector unsigned char, vector unsigned char); +v16u8 __builtin_msa_nor_v (v16u8, v16u8); -vector unsigned short vec_vminuh (vector bool short, vector unsigned short); -vector unsigned short vec_vminuh (vector unsigned short, vector bool short); -vector unsigned short vec_vminuh (vector unsigned short, vector unsigned short); +v16u8 __builtin_msa_nori_b (v16u8, imm0_255); -vector unsigned int vec_vminuw (vector bool int, vector unsigned int); -vector unsigned int vec_vminuw (vector unsigned int, vector bool int); -vector unsigned int vec_vminuw (vector unsigned int, vector unsigned int); +v16u8 __builtin_msa_or_v (v16u8, v16u8); -vector bool char vec_vmrghb (vector bool char, vector bool char); -vector signed char vec_vmrghb (vector signed char, vector signed char); -vector unsigned char vec_vmrghb (vector unsigned char, vector unsigned char); +v16u8 __builtin_msa_ori_b (v16u8, imm0_255); -vector bool short vec_vmrghh (vector bool short, vector bool short); -vector signed short vec_vmrghh (vector signed 
short, vector signed short); -vector unsigned short vec_vmrghh (vector unsigned short, vector unsigned short); -vector pixel vec_vmrghh (vector pixel, vector pixel); +v16i8 __builtin_msa_pckev_b (v16i8, v16i8); +v8i16 __builtin_msa_pckev_h (v8i16, v8i16); +v4i32 __builtin_msa_pckev_w (v4i32, v4i32); +v2i64 __builtin_msa_pckev_d (v2i64, v2i64); -vector float vec_vmrghw (vector float, vector float); -vector bool int vec_vmrghw (vector bool int, vector bool int); -vector signed int vec_vmrghw (vector signed int, vector signed int); -vector unsigned int vec_vmrghw (vector unsigned int, vector unsigned int); +v16i8 __builtin_msa_pckod_b (v16i8, v16i8); +v8i16 __builtin_msa_pckod_h (v8i16, v8i16); +v4i32 __builtin_msa_pckod_w (v4i32, v4i32); +v2i64 __builtin_msa_pckod_d (v2i64, v2i64); -vector bool char vec_vmrglb (vector bool char, vector bool char); -vector signed char vec_vmrglb (vector signed char, vector signed char); -vector unsigned char vec_vmrglb (vector unsigned char, vector unsigned char); +v16i8 __builtin_msa_pcnt_b (v16i8); +v8i16 __builtin_msa_pcnt_h (v8i16); +v4i32 __builtin_msa_pcnt_w (v4i32); +v2i64 __builtin_msa_pcnt_d (v2i64); -vector bool short vec_vmrglh (vector bool short, vector bool short); -vector signed short vec_vmrglh (vector signed short, vector signed short); -vector unsigned short vec_vmrglh (vector unsigned short, vector unsigned short); -vector pixel vec_vmrglh (vector pixel, vector pixel); +v16i8 __builtin_msa_sat_s_b (v16i8, imm0_7); +v8i16 __builtin_msa_sat_s_h (v8i16, imm0_15); +v4i32 __builtin_msa_sat_s_w (v4i32, imm0_31); +v2i64 __builtin_msa_sat_s_d (v2i64, imm0_63); -vector float vec_vmrglw (vector float, vector float); -vector signed int vec_vmrglw (vector signed int, vector signed int); -vector unsigned int vec_vmrglw (vector unsigned int, vector unsigned int); -vector bool int vec_vmrglw (vector bool int, vector bool int); +v16u8 __builtin_msa_sat_u_b (v16u8, imm0_7); +v8u16 __builtin_msa_sat_u_h (v8u16, imm0_15); +v4u32 __builtin_msa_sat_u_w (v4u32, imm0_31); +v2u64 __builtin_msa_sat_u_d (v2u64, imm0_63); -vector signed int vec_vmsummbm (vector signed char, vector unsigned char, - vector signed int); +v16i8 __builtin_msa_shf_b (v16i8, imm0_255); +v8i16 __builtin_msa_shf_h (v8i16, imm0_255); +v4i32 __builtin_msa_shf_w (v4i32, imm0_255); -vector signed int vec_vmsumshm (vector signed short, vector signed short, - vector signed int); +v16i8 __builtin_msa_sld_b (v16i8, v16i8, i32); +v8i16 __builtin_msa_sld_h (v8i16, v8i16, i32); +v4i32 __builtin_msa_sld_w (v4i32, v4i32, i32); +v2i64 __builtin_msa_sld_d (v2i64, v2i64, i32); -vector signed int vec_vmsumshs (vector signed short, vector signed short, - vector signed int); +v16i8 __builtin_msa_sldi_b (v16i8, v16i8, imm0_15); +v8i16 __builtin_msa_sldi_h (v8i16, v8i16, imm0_7); +v4i32 __builtin_msa_sldi_w (v4i32, v4i32, imm0_3); +v2i64 __builtin_msa_sldi_d (v2i64, v2i64, imm0_1); -vector unsigned int vec_vmsumubm (vector unsigned char, vector unsigned char, - vector unsigned int); +v16i8 __builtin_msa_sll_b (v16i8, v16i8); +v8i16 __builtin_msa_sll_h (v8i16, v8i16); +v4i32 __builtin_msa_sll_w (v4i32, v4i32); +v2i64 __builtin_msa_sll_d (v2i64, v2i64); -vector unsigned int vec_vmsumuhm (vector unsigned short, vector unsigned short, - vector unsigned int); +v16i8 __builtin_msa_slli_b (v16i8, imm0_7); +v8i16 __builtin_msa_slli_h (v8i16, imm0_15); +v4i32 __builtin_msa_slli_w (v4i32, imm0_31); +v2i64 __builtin_msa_slli_d (v2i64, imm0_63); -vector unsigned int vec_vmsumuhs (vector unsigned short, vector unsigned short, 
- vector unsigned int); +v16i8 __builtin_msa_splat_b (v16i8, i32); +v8i16 __builtin_msa_splat_h (v8i16, i32); +v4i32 __builtin_msa_splat_w (v4i32, i32); +v2i64 __builtin_msa_splat_d (v2i64, i32); -vector signed short vec_vmulesb (vector signed char, vector signed char); +v16i8 __builtin_msa_splati_b (v16i8, imm0_15); +v8i16 __builtin_msa_splati_h (v8i16, imm0_7); +v4i32 __builtin_msa_splati_w (v4i32, imm0_3); +v2i64 __builtin_msa_splati_d (v2i64, imm0_1); -vector signed int vec_vmulesh (vector signed short, vector signed short); +v16i8 __builtin_msa_sra_b (v16i8, v16i8); +v8i16 __builtin_msa_sra_h (v8i16, v8i16); +v4i32 __builtin_msa_sra_w (v4i32, v4i32); +v2i64 __builtin_msa_sra_d (v2i64, v2i64); -vector unsigned short vec_vmuleub (vector unsigned char, vector unsigned char); +v16i8 __builtin_msa_srai_b (v16i8, imm0_7); +v8i16 __builtin_msa_srai_h (v8i16, imm0_15); +v4i32 __builtin_msa_srai_w (v4i32, imm0_31); +v2i64 __builtin_msa_srai_d (v2i64, imm0_63); -vector unsigned int vec_vmuleuh (vector unsigned short, vector unsigned short); +v16i8 __builtin_msa_srar_b (v16i8, v16i8); +v8i16 __builtin_msa_srar_h (v8i16, v8i16); +v4i32 __builtin_msa_srar_w (v4i32, v4i32); +v2i64 __builtin_msa_srar_d (v2i64, v2i64); -vector signed short vec_vmulosb (vector signed char, vector signed char); +v16i8 __builtin_msa_srari_b (v16i8, imm0_7); +v8i16 __builtin_msa_srari_h (v8i16, imm0_15); +v4i32 __builtin_msa_srari_w (v4i32, imm0_31); +v2i64 __builtin_msa_srari_d (v2i64, imm0_63); -vector signed int vec_vmulosh (vector signed short, vector signed short); +v16i8 __builtin_msa_srl_b (v16i8, v16i8); +v8i16 __builtin_msa_srl_h (v8i16, v8i16); +v4i32 __builtin_msa_srl_w (v4i32, v4i32); +v2i64 __builtin_msa_srl_d (v2i64, v2i64); -vector unsigned short vec_vmuloub (vector unsigned char, vector unsigned char); +v16i8 __builtin_msa_srli_b (v16i8, imm0_7); +v8i16 __builtin_msa_srli_h (v8i16, imm0_15); +v4i32 __builtin_msa_srli_w (v4i32, imm0_31); +v2i64 __builtin_msa_srli_d (v2i64, imm0_63); -vector unsigned int vec_vmulouh (vector unsigned short, vector unsigned short); +v16i8 __builtin_msa_srlr_b (v16i8, v16i8); +v8i16 __builtin_msa_srlr_h (v8i16, v8i16); +v4i32 __builtin_msa_srlr_w (v4i32, v4i32); +v2i64 __builtin_msa_srlr_d (v2i64, v2i64); -vector signed char vec_vpkshss (vector signed short, vector signed short); +v16i8 __builtin_msa_srlri_b (v16i8, imm0_7); +v8i16 __builtin_msa_srlri_h (v8i16, imm0_15); +v4i32 __builtin_msa_srlri_w (v4i32, imm0_31); +v2i64 __builtin_msa_srlri_d (v2i64, imm0_63); -vector unsigned char vec_vpkshus (vector signed short, vector signed short); +void __builtin_msa_st_b (v16i8, void *, imm_n512_511); +void __builtin_msa_st_h (v8i16, void *, imm_n1024_1022); +void __builtin_msa_st_w (v4i32, void *, imm_n2048_2044); +void __builtin_msa_st_d (v2i64, void *, imm_n4096_4088); -vector signed short vec_vpkswss (vector signed int, vector signed int); +v16i8 __builtin_msa_subs_s_b (v16i8, v16i8); +v8i16 __builtin_msa_subs_s_h (v8i16, v8i16); +v4i32 __builtin_msa_subs_s_w (v4i32, v4i32); +v2i64 __builtin_msa_subs_s_d (v2i64, v2i64); -vector unsigned short vec_vpkswus (vector signed int, vector signed int); +v16u8 __builtin_msa_subs_u_b (v16u8, v16u8); +v8u16 __builtin_msa_subs_u_h (v8u16, v8u16); +v4u32 __builtin_msa_subs_u_w (v4u32, v4u32); +v2u64 __builtin_msa_subs_u_d (v2u64, v2u64); -vector bool char vec_vpkuhum (vector bool short, vector bool short); -vector signed char vec_vpkuhum (vector signed short, vector signed short); -vector unsigned char vec_vpkuhum (vector unsigned short, 
vector unsigned short); +v16u8 __builtin_msa_subsus_u_b (v16u8, v16i8); +v8u16 __builtin_msa_subsus_u_h (v8u16, v8i16); +v4u32 __builtin_msa_subsus_u_w (v4u32, v4i32); +v2u64 __builtin_msa_subsus_u_d (v2u64, v2i64); -vector unsigned char vec_vpkuhus (vector unsigned short, vector unsigned short); +v16i8 __builtin_msa_subsuu_s_b (v16u8, v16u8); +v8i16 __builtin_msa_subsuu_s_h (v8u16, v8u16); +v4i32 __builtin_msa_subsuu_s_w (v4u32, v4u32); +v2i64 __builtin_msa_subsuu_s_d (v2u64, v2u64); -vector bool short vec_vpkuwum (vector bool int, vector bool int); -vector signed short vec_vpkuwum (vector signed int, vector signed int); -vector unsigned short vec_vpkuwum (vector unsigned int, vector unsigned int); +v16i8 __builtin_msa_subv_b (v16i8, v16i8); +v8i16 __builtin_msa_subv_h (v8i16, v8i16); +v4i32 __builtin_msa_subv_w (v4i32, v4i32); +v2i64 __builtin_msa_subv_d (v2i64, v2i64); -vector unsigned short vec_vpkuwus (vector unsigned int, vector unsigned int); +v16i8 __builtin_msa_subvi_b (v16i8, imm0_31); +v8i16 __builtin_msa_subvi_h (v8i16, imm0_31); +v4i32 __builtin_msa_subvi_w (v4i32, imm0_31); +v2i64 __builtin_msa_subvi_d (v2i64, imm0_31); -vector signed char vec_vrlb (vector signed char, vector unsigned char); -vector unsigned char vec_vrlb (vector unsigned char, vector unsigned char); +v16i8 __builtin_msa_vshf_b (v16i8, v16i8, v16i8); +v8i16 __builtin_msa_vshf_h (v8i16, v8i16, v8i16); +v4i32 __builtin_msa_vshf_w (v4i32, v4i32, v4i32); +v2i64 __builtin_msa_vshf_d (v2i64, v2i64, v2i64); -vector signed short vec_vrlh (vector signed short, vector unsigned short); -vector unsigned short vec_vrlh (vector unsigned short, vector unsigned short); +v16u8 __builtin_msa_xor_v (v16u8, v16u8); -vector signed int vec_vrlw (vector signed int, vector unsigned int); -vector unsigned int vec_vrlw (vector unsigned int, vector unsigned int); +v16u8 __builtin_msa_xori_b (v16u8, imm0_255); +@end smallexample -vector signed char vec_vslb (vector signed char, vector unsigned char); -vector unsigned char vec_vslb (vector unsigned char, vector unsigned char); +@node Other MIPS Built-in Functions +@subsection Other MIPS Built-in Functions -vector signed short vec_vslh (vector signed short, vector unsigned short); -vector unsigned short vec_vslh (vector unsigned short, vector unsigned short); +GCC provides other MIPS-specific built-in functions: -vector signed int vec_vslw (vector signed int, vector unsigned int); -vector unsigned int vec_vslw (vector unsigned int, vector unsigned int); +@table @code +@item void __builtin_mips_cache (int @var{op}, const volatile void *@var{addr}) +Insert a @samp{cache} instruction with operands @var{op} and @var{addr}. +GCC defines the preprocessor macro @code{___GCC_HAVE_BUILTIN_MIPS_CACHE} +when this function is available. -vector signed char vec_vspltb (vector signed char, const int); -vector unsigned char vec_vspltb (vector unsigned char, const int); -vector bool char vec_vspltb (vector bool char, const int); +@item unsigned int __builtin_mips_get_fcsr (void) +@itemx void __builtin_mips_set_fcsr (unsigned int @var{value}) +Get and set the contents of the floating-point control and status register +(FPU control register 31). These functions are only available in hard-float +code but can be called in both MIPS16 and non-MIPS16 contexts. 
-vector bool short vec_vsplth (vector bool short, const int); -vector signed short vec_vsplth (vector signed short, const int); -vector unsigned short vec_vsplth (vector unsigned short, const int); -vector pixel vec_vsplth (vector pixel, const int); +@code{__builtin_mips_set_fcsr} can be used to change any bit of the +register except the condition codes, which GCC assumes are preserved. +@end table -vector float vec_vspltw (vector float, const int); -vector signed int vec_vspltw (vector signed int, const int); -vector unsigned int vec_vspltw (vector unsigned int, const int); -vector bool int vec_vspltw (vector bool int, const int); +@node MSP430 Built-in Functions +@subsection MSP430 Built-in Functions -vector signed char vec_vsrab (vector signed char, vector unsigned char); -vector unsigned char vec_vsrab (vector unsigned char, vector unsigned char); +GCC provides a couple of special built-in functions to aid in the +writing of interrupt handlers in C. -vector signed short vec_vsrah (vector signed short, vector unsigned short); -vector unsigned short vec_vsrah (vector unsigned short, vector unsigned short); +@table @code +@item __bic_SR_register_on_exit (int @var{mask}) +This clears the indicated bits in the saved copy of the status register +currently residing on the stack. This only works inside interrupt +handlers and the changes to the status register will only take effect +once the handler returns. -vector signed int vec_vsraw (vector signed int, vector unsigned int); -vector unsigned int vec_vsraw (vector unsigned int, vector unsigned int); +@item __bis_SR_register_on_exit (int @var{mask}) +This sets the indicated bits in the saved copy of the status register +currently residing on the stack. This only works inside interrupt +handlers and the changes to the status register will only take effect +once the handler returns. -vector signed char vec_vsrb (vector signed char, vector unsigned char); -vector unsigned char vec_vsrb (vector unsigned char, vector unsigned char); +@item __delay_cycles (long long @var{cycles}) +This inserts an instruction sequence that takes exactly @var{cycles} +cycles (between 0 and about 17E9) to complete. The inserted sequence +may use jumps, loops, or no-ops, and does not interfere with any other +instructions. Note that @var{cycles} must be a compile-time constant +integer; that is, you must pass a number, not a variable that may be +optimized to a constant later. The number of cycles delayed by this +built-in is exact. +@end table -vector signed short vec_vsrh (vector signed short, vector unsigned short); -vector unsigned short vec_vsrh (vector unsigned short, vector unsigned short); +@node NDS32 Built-in Functions +@subsection NDS32 Built-in Functions -vector signed int vec_vsrw (vector signed int, vector unsigned int); -vector unsigned int vec_vsrw (vector unsigned int, vector unsigned int); +These built-in functions are available for the NDS32 target: -vector float vec_vsubfp (vector float, vector float); +@defbuiltin{void __builtin_nds32_isync (int *@var{addr})} +Insert an ISYNC instruction into the instruction stream where +@var{addr} is an instruction address for serialization. +@enddefbuiltin -vector signed char vec_vsubsbs (vector bool char, vector signed char); -vector signed char vec_vsubsbs (vector signed char, vector bool char); -vector signed char vec_vsubsbs (vector signed char, vector signed char); +@defbuiltin{void __builtin_nds32_isb (void)} +Insert an ISB instruction into the instruction stream.
+@enddefbuiltin
-vector signed short vec_vsubshs (vector bool short, vector signed short);
-vector signed short vec_vsubshs (vector signed short, vector bool short);
-vector signed short vec_vsubshs (vector signed short, vector signed short);
+@defbuiltin{int __builtin_nds32_mfsr (int @var{sr})}
+Return the content of the system register that is mapped by @var{sr}.
+@enddefbuiltin
-vector signed int vec_vsubsws (vector bool int, vector signed int);
-vector signed int vec_vsubsws (vector signed int, vector bool int);
-vector signed int vec_vsubsws (vector signed int, vector signed int);
+@defbuiltin{int __builtin_nds32_mfusr (int @var{usr})}
+Return the content of the user space register that is mapped by @var{usr}.
+@enddefbuiltin
-vector signed char vec_vsububm (vector bool char, vector signed char);
-vector signed char vec_vsububm (vector signed char, vector bool char);
-vector signed char vec_vsububm (vector signed char, vector signed char);
-vector unsigned char vec_vsububm (vector bool char, vector unsigned char);
-vector unsigned char vec_vsububm (vector unsigned char, vector bool char);
-vector unsigned char vec_vsububm (vector unsigned char, vector unsigned char);
+@defbuiltin{void __builtin_nds32_mtsr (int @var{value}, int @var{sr})}
+Move @var{value} to the system register that is mapped by @var{sr}.
+@enddefbuiltin
-vector unsigned char vec_vsububs (vector bool char, vector unsigned char);
-vector unsigned char vec_vsububs (vector unsigned char, vector bool char);
-vector unsigned char vec_vsububs (vector unsigned char, vector unsigned char);
+@defbuiltin{void __builtin_nds32_mtusr (int @var{value}, int @var{usr})}
+Move @var{value} to the user space register that is mapped by @var{usr}.
+@enddefbuiltin
-vector signed short vec_vsubuhm (vector bool short, vector signed short);
-vector signed short vec_vsubuhm (vector signed short, vector bool short);
-vector signed short vec_vsubuhm (vector signed short, vector signed short);
-vector unsigned short vec_vsubuhm (vector bool short, vector unsigned short);
-vector unsigned short vec_vsubuhm (vector unsigned short, vector bool short);
-vector unsigned short vec_vsubuhm (vector unsigned short, vector unsigned short);
+@defbuiltin{void __builtin_nds32_setgie_en (void)}
+Enable global interrupts.
+@enddefbuiltin
-vector unsigned short vec_vsubuhs (vector bool short, vector unsigned short);
-vector unsigned short vec_vsubuhs (vector unsigned short, vector bool short);
-vector unsigned short vec_vsubuhs (vector unsigned short, vector unsigned short);
+@defbuiltin{void __builtin_nds32_setgie_dis (void)}
+Disable global interrupts.
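+
+A minimal sketch of how the two @code{setgie} built-ins might bracket a
+short critical section (the names below are illustrative only):
+
+@smallexample
+volatile int shared_counter;
+
+void
+increment_counter (void)
+@{
+  __builtin_nds32_setgie_dis ();  /* Mask global interrupts.  */
+  shared_counter++;               /* Update shared state safely.  */
+  __builtin_nds32_setgie_en ();   /* Unmask global interrupts.  */
+@}
+@end smallexample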
+@enddefbuiltin
-vector signed int vec_vsubuwm (vector bool int, vector signed int);
-vector signed int vec_vsubuwm (vector signed int, vector bool int);
-vector signed int vec_vsubuwm (vector signed int, vector signed int);
-vector unsigned int vec_vsubuwm (vector bool int, vector unsigned int);
-vector unsigned int vec_vsubuwm (vector unsigned int, vector bool int);
-vector unsigned int vec_vsubuwm (vector unsigned int, vector unsigned int);
+@node Nvidia PTX Built-in Functions
+@subsection Nvidia PTX Built-in Functions
-vector unsigned int vec_vsubuws (vector bool int, vector unsigned int);
-vector unsigned int vec_vsubuws (vector unsigned int, vector bool int);
-vector unsigned int vec_vsubuws (vector unsigned int, vector unsigned int);
+These built-in functions are available for the Nvidia PTX target:
-vector signed int vec_vsum4sbs (vector signed char, vector signed int);
+@defbuiltin{{unsigned int} __builtin_nvptx_brev (unsigned int @var{x})}
+Reverse the bit order of a 32-bit unsigned integer.
+@enddefbuiltin
-vector signed int vec_vsum4shs (vector signed short, vector signed int);
+@defbuiltin{{unsigned long long} __builtin_nvptx_brevll (unsigned long long @var{x})}
+Reverse the bit order of a 64-bit unsigned integer.
+@enddefbuiltin
-vector unsigned int vec_vsum4ubs (vector unsigned char, vector unsigned int);
+@node Basic PowerPC Built-in Functions
+@subsection Basic PowerPC Built-in Functions
-vector unsigned int vec_vupkhpx (vector pixel);
+@menu
+* Basic PowerPC Built-in Functions Available on all Configurations::
+* Basic PowerPC Built-in Functions Available on ISA 2.05::
+* Basic PowerPC Built-in Functions Available on ISA 2.06::
+* Basic PowerPC Built-in Functions Available on ISA 2.07::
+* Basic PowerPC Built-in Functions Available on ISA 3.0::
+* Basic PowerPC Built-in Functions Available on ISA 3.1::
+@end menu
-vector bool short vec_vupkhsb (vector bool char);
-vector signed short vec_vupkhsb (vector signed char);
+This section describes PowerPC built-in functions that do not require
+the inclusion of any special header files to declare prototypes or
+provide macro definitions.  The sections that follow describe
+additional PowerPC built-in functions.
-vector bool int vec_vupkhsh (vector bool short);
-vector signed int vec_vupkhsh (vector signed short);
+@node Basic PowerPC Built-in Functions Available on all Configurations
+@subsubsection Basic PowerPC Built-in Functions Available on all Configurations
-vector unsigned int vec_vupklpx (vector pixel);
+@defbuiltin{void __builtin_cpu_init (void)}
+This function is a @code{nop} on the PowerPC platform and is included solely
+to maintain API compatibility with the x86 builtins.
+@enddefbuiltin
-vector bool short vec_vupklsb (vector bool char);
-vector signed short vec_vupklsb (vector signed char);
+@defbuiltin{int __builtin_cpu_is (const char *@var{cpuname})}
+This function returns a value of @code{1} if the run-time CPU is of type
+@var{cpuname} and returns @code{0} otherwise.
-vector bool int vec_vupklsh (vector bool short);
-vector signed int vec_vupklsh (vector signed short);
-@end smallexample
+The @code{__builtin_cpu_is} function requires GLIBC 2.23 or newer,
+which exports the hardware capability bits.  GCC defines the macro
+@code{__BUILTIN_CPU_SUPPORTS__} if the @code{__builtin_cpu_supports}
+built-in function is fully supported.
-@node PowerPC AltiVec Built-in Functions Available on ISA 2.06
-@subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.06
+If GCC was configured to use a GLIBC before 2.23, the built-in
+function @code{__builtin_cpu_is} always returns a 0 and the compiler
+issues a warning.
-The AltiVec built-in functions described in this section are
-available on the PowerPC family of processors starting with ISA 2.06
-or later.  These are normally enabled by adding @option{-mvsx} to the
-command line.
+The following CPU names can be detected:
-When @option{-mvsx} is used, the following additional vector types are
-implemented.
+@table @samp
+@item power10
+IBM POWER10 Server CPU.
+@item power9
+IBM POWER9 Server CPU.
+@item power8
+IBM POWER8 Server CPU.
+@item power7
+IBM POWER7 Server CPU.
+@item power6x
+IBM POWER6 Server CPU (RAW mode).
+@item power6
+IBM POWER6 Server CPU (Architected mode).
+@item power5+
+IBM POWER5+ Server CPU.
+@item power5
+IBM POWER5 Server CPU.
+@item ppc970
+IBM 970 Server CPU (i.e., Apple G5).
+@item power4
+IBM POWER4 Server CPU.
+@item ppca2
+IBM A2 64-bit Embedded CPU.
+@item ppc476
+IBM PowerPC 476FP 32-bit Embedded CPU.
+@item ppc464
+IBM PowerPC 464 32-bit Embedded CPU.
+@item ppc440
+PowerPC 440 32-bit Embedded CPU.
+@item ppc405
+PowerPC 405 32-bit Embedded CPU.
+@item ppc-cell-be
+IBM PowerPC Cell Broadband Engine Architecture CPU.
+@end table
+Here is an example:
@smallexample
-vector unsigned __int128
-vector signed __int128
-vector unsigned long long int
-vector signed long long int
-vector double
+#ifdef __BUILTIN_CPU_SUPPORTS__
+  if (__builtin_cpu_is ("power8"))
+    @{
+       do_power8 (); // POWER8 specific implementation.
+    @}
+  else
+#endif
+    @{
+       do_generic (); // Generic implementation.
+    @}
@end smallexample
+@enddefbuiltin
-The long long types are only implemented for 64-bit code generation.
-
-Only functions excluded from the PVIPR are listed here.
-
-@smallexample
-void vec_dst (const unsigned long *, int, const int);
-void vec_dst (const long *, int, const int);
-
-void vec_dststt (const unsigned long *, int, const int);
-void vec_dststt (const long *, int, const int);
-
-void vec_dstt (const unsigned long *, int, const int);
-void vec_dstt (const long *, int, const int);
-
-vector unsigned char vec_lvsl (int, const unsigned long *);
-vector unsigned char vec_lvsl (int, const long *);
-
-vector unsigned char vec_lvsr (int, const unsigned long *);
-vector unsigned char vec_lvsr (int, const long *);
+@defbuiltin{int __builtin_cpu_supports (const char *@var{feature})}
+This function returns a value of @code{1} if the run-time CPU supports the HWCAP
+feature @var{feature} and returns @code{0} otherwise.
-vector unsigned char vec_lvsl (int, const double *);
-vector unsigned char vec_lvsr (int, const double *);
+The @code{__builtin_cpu_supports} function requires GLIBC 2.23 or
+newer, which exports the hardware capability bits.  GCC defines the
+macro @code{__BUILTIN_CPU_SUPPORTS__} if the
+@code{__builtin_cpu_supports} built-in function is fully supported.
-vector double vec_vsx_ld (int, const vector double *); -vector double vec_vsx_ld (int, const double *); -vector float vec_vsx_ld (int, const vector float *); -vector float vec_vsx_ld (int, const float *); -vector bool int vec_vsx_ld (int, const vector bool int *); -vector signed int vec_vsx_ld (int, const vector signed int *); -vector signed int vec_vsx_ld (int, const int *); -vector signed int vec_vsx_ld (int, const long *); -vector unsigned int vec_vsx_ld (int, const vector unsigned int *); -vector unsigned int vec_vsx_ld (int, const unsigned int *); -vector unsigned int vec_vsx_ld (int, const unsigned long *); -vector bool short vec_vsx_ld (int, const vector bool short *); -vector pixel vec_vsx_ld (int, const vector pixel *); -vector signed short vec_vsx_ld (int, const vector signed short *); -vector signed short vec_vsx_ld (int, const short *); -vector unsigned short vec_vsx_ld (int, const vector unsigned short *); -vector unsigned short vec_vsx_ld (int, const unsigned short *); -vector bool char vec_vsx_ld (int, const vector bool char *); -vector signed char vec_vsx_ld (int, const vector signed char *); -vector signed char vec_vsx_ld (int, const signed char *); -vector unsigned char vec_vsx_ld (int, const vector unsigned char *); -vector unsigned char vec_vsx_ld (int, const unsigned char *); +If GCC was configured to use a GLIBC before 2.23, the built-in +function @code{__builtin_cpu_supports} always returns a 0 and the +compiler issues a warning. -void vec_vsx_st (vector double, int, vector double *); -void vec_vsx_st (vector double, int, double *); -void vec_vsx_st (vector float, int, vector float *); -void vec_vsx_st (vector float, int, float *); -void vec_vsx_st (vector signed int, int, vector signed int *); -void vec_vsx_st (vector signed int, int, int *); -void vec_vsx_st (vector unsigned int, int, vector unsigned int *); -void vec_vsx_st (vector unsigned int, int, unsigned int *); -void vec_vsx_st (vector bool int, int, vector bool int *); -void vec_vsx_st (vector bool int, int, unsigned int *); -void vec_vsx_st (vector bool int, int, int *); -void vec_vsx_st (vector signed short, int, vector signed short *); -void vec_vsx_st (vector signed short, int, short *); -void vec_vsx_st (vector unsigned short, int, vector unsigned short *); -void vec_vsx_st (vector unsigned short, int, unsigned short *); -void vec_vsx_st (vector bool short, int, vector bool short *); -void vec_vsx_st (vector bool short, int, unsigned short *); -void vec_vsx_st (vector pixel, int, vector pixel *); -void vec_vsx_st (vector pixel, int, unsigned short *); -void vec_vsx_st (vector pixel, int, short *); -void vec_vsx_st (vector bool short, int, short *); -void vec_vsx_st (vector signed char, int, vector signed char *); -void vec_vsx_st (vector signed char, int, signed char *); -void vec_vsx_st (vector unsigned char, int, vector unsigned char *); -void vec_vsx_st (vector unsigned char, int, unsigned char *); -void vec_vsx_st (vector bool char, int, vector bool char *); -void vec_vsx_st (vector bool char, int, unsigned char *); -void vec_vsx_st (vector bool char, int, signed char *); +The following features can be +detected: -vector double vec_xxpermdi (vector double, vector double, const int); -vector float vec_xxpermdi (vector float, vector float, const int); -vector __int128 vec_xxpermdi (vector __int128, - vector __int128, const int); -vector __uint128 vec_xxpermdi (vector __uint128, - vector __uint128, const int); -vector long long vec_xxpermdi (vector long long, vector long long, const int); -vector 
unsigned long long vec_xxpermdi (vector unsigned long long,
-                                 vector unsigned long long, const int);
-vector int vec_xxpermdi (vector int, vector int, const int);
-vector unsigned int vec_xxpermdi (vector unsigned int,
-                                  vector unsigned int, const int);
-vector short vec_xxpermdi (vector short, vector short, const int);
-vector unsigned short vec_xxpermdi (vector unsigned short,
-                                    vector unsigned short, const int);
-vector signed char vec_xxpermdi (vector signed char, vector signed char,
-                                 const int);
-vector unsigned char vec_xxpermdi (vector unsigned char,
-                                   vector unsigned char, const int);
+@table @samp
+@item 4xxmac
+4xx CPU has a Multiply Accumulator.
+@item altivec
+CPU has a SIMD/Vector Unit.
+@item arch_2_05
+CPU supports ISA 2.05 (e.g., POWER6).
+@item arch_2_06
+CPU supports ISA 2.06 (e.g., POWER7).
+@item arch_2_07
+CPU supports ISA 2.07 (e.g., POWER8).
+@item arch_3_00
+CPU supports ISA 3.0 (e.g., POWER9).
+@item arch_3_1
+CPU supports ISA 3.1 (e.g., POWER10).
+@item archpmu
+CPU supports the set of compatible performance monitoring events.
+@item booke
+CPU supports the Embedded ISA category.
+@item cellbe
+CPU has a Cell Broadband Engine.
+@item darn
+CPU supports the @code{darn} (deliver a random number) instruction.
+@item dfp
+CPU has a decimal floating point unit.
+@item dscr
+CPU supports the data stream control register.
+@item ebb
+CPU supports event base branching.
+@item efpdouble
+CPU has a SPE double precision floating point unit.
+@item efpsingle
+CPU has a SPE single precision floating point unit.
+@item fpu
+CPU has a floating point unit.
+@item htm
+CPU has hardware transactional memory instructions.
+@item htm-nosc
+Kernel aborts hardware transactions when a syscall is made.
+@item htm-no-suspend
+CPU supports hardware transactional memory but does not support the
+@code{tsuspend.} instruction.
+@item ic_snoop
+CPU supports icache snooping capabilities.
+@item ieee128
+CPU supports 128-bit IEEE binary floating point instructions.
+@item isel
+CPU supports the integer select instruction.
+@item mma
+CPU supports the matrix-multiply assist instructions.
+@item mmu
+CPU has a memory management unit.
+@item notb
+CPU does not have a timebase (e.g., 601 and 403gx).
+@item pa6t
+CPU supports the PA Semi 6T CORE ISA.
+@item power4
+CPU supports ISA 2.00 (e.g., POWER4).
+@item power5
+CPU supports ISA 2.02 (e.g., POWER5).
+@item power5+
+CPU supports ISA 2.03 (e.g., POWER5+).
+@item power6x
+CPU supports ISA 2.05 (e.g., POWER6) extended opcodes mffgpr and mftgpr.
+@item ppc32
+CPU supports 32-bit mode execution.
+@item ppc601
+CPU supports the old POWER ISA (e.g., 601).
+@item ppc64
+CPU supports 64-bit mode execution.
+@item ppcle
+CPU supports a little-endian mode that uses address swizzling.
+@item scv
+Kernel supports system call vectored.
+@item smt
+CPU supports simultaneous multi-threading.
+@item spe
+CPU has a signal processing extension unit.
+@item tar
+CPU supports the target address register.
+@item true_le
+CPU supports true little-endian mode.
+@item ucache
+CPU has a unified I/D cache.
+@item vcrypto
+CPU supports the vector cryptography instructions.
+@item vsx
+CPU supports the vector-scalar extension.
+@end table
-vector double vec_xxsldi (vector double, vector double, int);
-vector float vec_xxsldi (vector float, vector float, int);
-vector long long vec_xxsldi (vector long long, vector long long, int);
-vector unsigned long long vec_xxsldi (vector unsigned long long,
-                                      vector unsigned long long, int);
-vector int vec_xxsldi (vector int, vector int, int);
-vector unsigned int vec_xxsldi (vector unsigned int, vector unsigned int, int);
-vector short vec_xxsldi (vector short, vector short, int);
-vector unsigned short vec_xxsldi (vector unsigned short,
-                                  vector unsigned short, int);
-vector signed char vec_xxsldi (vector signed char, vector signed char, int);
-vector unsigned char vec_xxsldi (vector unsigned char,
-                                 vector unsigned char, int);
+Here is an example:
+@smallexample
+#ifdef __BUILTIN_CPU_SUPPORTS__
+  if (__builtin_cpu_supports ("fpu"))
+    @{
+      asm("fadd %0,%1,%2" : "=d"(dst) : "d"(src1), "d"(src2));
+    @}
+  else
+#endif
+    @{
+      dst = __fadd (src1, src2); // Software FP addition function.
+    @}
@end smallexample
+@enddefbuiltin
-Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
-generate the AltiVec @samp{LVX} and @samp{STVX} instructions even
-if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
-@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
-@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
-
+The following built-in functions are also available on all PowerPC
+processors:
@smallexample
-vector signed long long vec_signedo (vector float);
-vector signed long long vec_signede (vector float);
-vector unsigned long long vec_unsignedo (vector float);
-vector unsigned long long vec_unsignede (vector float);
+uint64_t __builtin_ppc_get_timebase ();
+unsigned long __builtin_ppc_mftb ();
+double __builtin_unpack_ibm128 (__ibm128, int);
+__ibm128 __builtin_pack_ibm128 (double, double);
+double __builtin_mffs (void);
+void __builtin_mtfsf (const int, double);
+void __builtin_mtfsb0 (const int);
+void __builtin_mtfsb1 (const int);
+double __builtin_set_fpscr_rn (int);
@end smallexample
-The overloaded built-ins @code{vec_signedo} and @code{vec_signede} are
-additional extensions to the built-ins as documented in the PVIPR.
+The @code{__builtin_ppc_get_timebase} and @code{__builtin_ppc_mftb}
+functions generate instructions to read the Time Base Register.  The
+@code{__builtin_ppc_get_timebase} function may generate multiple
+instructions and always returns the 64 bits of the Time Base Register.
+The @code{__builtin_ppc_mftb} function always generates one instruction and
+returns the Time Base Register value as an unsigned long, throwing away
+the most significant word on 32-bit environments.  The @code{__builtin_mffs}
+built-in returns the value of the FPSCR register.  Note that ISA 3.0
+supports @code{__builtin_mffsl}, which permits software to read the control
+and non-sticky status bits in the FPSCR without the higher latency associated
+with accessing the sticky status bits.  The @code{__builtin_mtfsf} built-in
+takes a constant 8-bit integer field mask and a double precision floating
+point argument and generates the @code{mtfsf} (extended mnemonic) instruction
+to write new values to selected fields of the FPSCR.  The
+@code{__builtin_mtfsb0} and @code{__builtin_mtfsb1} built-ins take the bit
+to change as an argument.  The valid bit range is between 0 and 31.  The
+builtins map to the @code{mtfsb0} and @code{mtfsb1} instructions, which take
+the argument and add 32.
Hence these instructions only modify the FPSCR[32:63] bits by
+changing the specified bit to zero or one, respectively.
-@node PowerPC AltiVec Built-in Functions Available on ISA 2.07
-@subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.07
+The @code{__builtin_set_fpscr_rn} built-in allows changing both of the floating
+point rounding mode bits and returning the various FPSCR fields before the RN
+field is updated.  The built-in returns a double consisting of the initial
+value of the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, and RN bit positions
+with all other bits set to zero.  The built-in argument is a 2-bit value for the
+new RN field value.  The argument can either be a @code{const int} or stored
+in a variable.  Earlier versions of @code{__builtin_set_fpscr_rn} returned
+void.  A @code{__SET_FPSCR_RN_RETURNS_FPSCR__} macro has been added.  If
+defined, then the @code{__builtin_set_fpscr_rn} built-in returns the FPSCR
+fields.  If not defined, the @code{__builtin_set_fpscr_rn} does not return a
+value.  If the @option{-msoft-float} option is used, the
+@code{__builtin_set_fpscr_rn} built-in will not return a value.
-If the ISA 2.07 additions to the vector/scalar (power8-vector)
-instruction set are available, the following additional functions are
-available for both 32-bit and 64-bit targets.  For 64-bit targets, you
-can use @var{vector long} instead of @var{vector long long},
-@var{vector bool long} instead of @var{vector bool long long}, and
-@var{vector unsigned long} instead of @var{vector unsigned long long}.
+@node Basic PowerPC Built-in Functions Available on ISA 2.05
+@subsubsection Basic PowerPC Built-in Functions Available on ISA 2.05
-Only functions excluded from the PVIPR are listed here.
+The basic built-in functions described in this section are
+available on the PowerPC family of processors starting with ISA 2.05
+or later.  Unless specific options are explicitly disabled on the
+command line, specifying option @option{-mcpu=power6} has the effect of
+enabling the @option{-mpowerpc64}, @option{-mpowerpc-gpopt},
+@option{-mpowerpc-gfxopt}, @option{-mmfcrf}, @option{-mpopcntb},
+@option{-mfprnd}, @option{-mcmpb}, @option{-mhard-dfp}, and
+@option{-mrecip-precision} options.  Specify the
+@option{-maltivec} option explicitly in
+combination with the above options if desired.
+The following functions require option @option{-mcmpb}.
@smallexample
-vector long long vec_vaddudm (vector long long, vector long long);
-vector long long vec_vaddudm (vector bool long long, vector long long);
-vector long long vec_vaddudm (vector long long, vector bool long long);
-vector unsigned long long vec_vaddudm (vector unsigned long long,
-                                       vector unsigned long long);
-vector unsigned long long vec_vaddudm (vector bool unsigned long long,
-                                       vector unsigned long long);
-vector unsigned long long vec_vaddudm (vector unsigned long long,
-                                       vector bool unsigned long long);
-
-vector long long vec_vclz (vector long long);
-vector unsigned long long vec_vclz (vector unsigned long long);
-vector int vec_vclz (vector int);
-vector unsigned int vec_vclz (vector int);
-vector short vec_vclz (vector short);
-vector unsigned short vec_vclz (vector unsigned short);
-vector signed char vec_vclz (vector signed char);
-vector unsigned char vec_vclz (vector unsigned char);
-
-vector signed char vec_vclzb (vector signed char);
-vector unsigned char vec_vclzb (vector unsigned char);
-
-vector long long vec_vclzd (vector long long);
-vector unsigned long long vec_vclzd (vector unsigned long long);
-
-vector short vec_vclzh (vector short);
-vector unsigned short vec_vclzh (vector unsigned short);
-
-vector int vec_vclzw (vector int);
-vector unsigned int vec_vclzw (vector int);
-
-vector signed char vec_vgbbd (vector signed char);
-vector unsigned char vec_vgbbd (vector unsigned char);
-
-vector long long vec_vmaxsd (vector long long, vector long long);
-
-vector unsigned long long vec_vmaxud (vector unsigned long long,
-                                      unsigned vector long long);
-
-vector long long vec_vminsd (vector long long, vector long long);
-
-vector unsigned long long vec_vminud (vector long long, vector long long);
-
-vector int vec_vpksdss (vector long long, vector long long);
-vector unsigned int vec_vpksdss (vector long long, vector long long);
+unsigned long long __builtin_cmpb (unsigned long long int, unsigned long long int);
+unsigned int __builtin_cmpb (unsigned int, unsigned int);
+@end smallexample
-vector unsigned int vec_vpkudus (vector unsigned long long,
-                                 vector unsigned long long);
+The @code{__builtin_cmpb} function
+performs a byte-wise comparison of its two arguments and returns the
+result.  For each byte comparison, the corresponding byte of the return
+value holds 0xff if the input bytes are equal and 0 if the input bytes
+are not equal.  If either of the arguments to this built-in function
+is wider than 32 bits, the function call expands into the form that
+expects @code{unsigned long long int} arguments,
+which is only available on 64-bit targets.
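+
+As an illustrative sketch of the byte-wise semantics (the values below
+are made up):
+
+@smallexample
+unsigned long long a = 0x1122334455667788ULL;
+unsigned long long b = 0x1122AABB5566CC88ULL;
+/* Bytes 0, 1, 4, 5, and 7 (counting from the most significant byte)
+   are equal, so r is 0xFFFF0000FFFF00FFULL.  */
+unsigned long long r = __builtin_cmpb (a, b);
+@end smallexample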
-vector int vec_vpkudum (vector long long, vector long long);
-vector unsigned int vec_vpkudum (vector unsigned long long,
-                                 vector unsigned long long);
-vector bool int vec_vpkudum (vector bool long long, vector bool long long);
+The following built-in functions are provided
+when hardware decimal floating point
+(@option{-mhard-dfp}) is available:
+@smallexample
+void __builtin_set_fpscr_drn (int);
+_Decimal64 __builtin_ddedpd (int, _Decimal64);
+_Decimal128 __builtin_ddedpdq (int, _Decimal128);
+_Decimal64 __builtin_denbcd (int, _Decimal64);
+_Decimal128 __builtin_denbcdq (int, _Decimal128);
+_Decimal64 __builtin_diex (long long, _Decimal64);
+_Decimal128 __builtin_diexq (long long, _Decimal128);
+_Decimal64 __builtin_dscli (_Decimal64, int);
+_Decimal128 __builtin_dscliq (_Decimal128, int);
+_Decimal64 __builtin_dscri (_Decimal64, int);
+_Decimal128 __builtin_dscriq (_Decimal128, int);
+long long __builtin_dxex (_Decimal64);
+long long __builtin_dxexq (_Decimal128);
+_Decimal128 __builtin_pack_dec128 (unsigned long long, unsigned long long);
+unsigned long long __builtin_unpack_dec128 (_Decimal128, int);
+@end smallexample
-vector long long vec_vpopcnt (vector long long);
-vector unsigned long long vec_vpopcnt (vector unsigned long long);
-vector int vec_vpopcnt (vector int);
-vector unsigned int vec_vpopcnt (vector int);
-vector short vec_vpopcnt (vector short);
-vector unsigned short vec_vpopcnt (vector unsigned short);
-vector signed char vec_vpopcnt (vector signed char);
-vector unsigned char vec_vpopcnt (vector unsigned char);
+The @code{__builtin_set_fpscr_drn} builtin allows changing the three decimal
+floating point rounding mode bits.  The argument is a 3-bit value.  The
+argument can either be a @code{const int} or the value can be stored in
+a variable.
+The builtin uses the ISA 3.0 instruction @code{mffscdrn} if available.
+Otherwise the builtin reads the FPSCR, masks the current decimal rounding
+mode bits out and ORs in the new value.
-vector signed char vec_vpopcntb (vector signed char);
-vector unsigned char vec_vpopcntb (vector unsigned char);
+@smallexample
+_Decimal64 __builtin_dfp_quantize (_Decimal64, _Decimal64, const int);
+_Decimal64 __builtin_dfp_quantize (const int, _Decimal64, const int);
+_Decimal128 __builtin_dfp_quantize (_Decimal128, _Decimal128, const int);
+_Decimal128 __builtin_dfp_quantize (const int, _Decimal128, const int);
+@end smallexample
-vector long long vec_vpopcntd (vector long long);
-vector unsigned long long vec_vpopcntd (vector unsigned long long);
+The @code{__builtin_dfp_quantize} built-in converts and rounds the second
+argument to the form with the exponent specified by the first
+argument, based on the rounding mode specified by the third argument.
+If the first argument is a decimal floating point value, its exponent is used
+for converting and rounding the second argument.  If the first argument is a
+5-bit constant integer value, then the value specifies the exponent to be used
+when rounding and converting the second argument.  The third argument is a
+two-bit constant integer that specifies the rounding mode.  The possible modes
+are: 00 Round to nearest, ties to even; 01 Round toward 0; 10 Round to nearest,
+ties away from 0; 11 Round according to DRN, where DRN is the Decimal Floating
+point field of the FPSCR.
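+
+A hedged sketch of the first form, quantizing a value to two fractional
+digits (the names and values are illustrative; requires
+@option{-mhard-dfp}):
+
+@smallexample
+_Decimal64 price = 12.3456DD;
+/* Use the exponent of 0.01DD (-2) and rounding mode 0 (round to
+   nearest, ties to even); the result is 12.35DD.  */
+_Decimal64 cents = __builtin_dfp_quantize (0.01DD, price, 0);
+@end smallexample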
-vector short vec_vpopcnth (vector short);
-vector unsigned short vec_vpopcnth (vector unsigned short);
+The following functions require the @option{-mhard-float},
+@option{-mpowerpc-gfxopt}, and @option{-mpopcntb} options.
-vector int vec_vpopcntw (vector int);
-vector unsigned int vec_vpopcntw (vector int);
+@smallexample
+double __builtin_recipdiv (double, double);
+float __builtin_recipdivf (float, float);
+double __builtin_rsqrt (double);
+float __builtin_rsqrtf (float);
+@end smallexample
-vector long long vec_vrld (vector long long, vector unsigned long long);
-vector unsigned long long vec_vrld (vector unsigned long long,
-                                    vector unsigned long long);
+The @code{vec_rsqrt}, @code{__builtin_rsqrt}, and
+@code{__builtin_rsqrtf} functions generate multiple instructions to
+implement the reciprocal square root functionality using reciprocal
+square root estimate instructions.
-vector long long vec_vsld (vector long long, vector unsigned long long);
-vector long long vec_vsld (vector unsigned long long,
-                           vector unsigned long long);
+The @code{__builtin_recipdiv} and @code{__builtin_recipdivf}
+functions generate multiple instructions to implement division using
+the reciprocal estimate instructions.
-vector long long vec_vsrad (vector long long, vector unsigned long long);
-vector unsigned long long vec_vsrad (vector unsigned long long,
-                                     vector unsigned long long);
+The following functions require the @option{-mhard-float} and
+@option{-mmultiple} options.
-vector long long vec_vsrd (vector long long, vector unsigned long long);
-vector unsigned long long char vec_vsrd (vector unsigned long long,
-                                         vector unsigned long long);
+The @code{__builtin_unpack_longdouble} function takes a
+@code{long double} argument and a compile-time constant of 0 or 1.  If
+the constant is 0, the first @code{double} within the
+@code{long double} is returned, otherwise the second @code{double}
+is returned.  The @code{__builtin_unpack_longdouble} function is only
+available if @code{long double} uses the IBM extended double
+representation.
-vector long long vec_vsubudm (vector long long, vector long long);
-vector long long vec_vsubudm (vector bool long long, vector long long);
-vector long long vec_vsubudm (vector long long, vector bool long long);
-vector unsigned long long vec_vsubudm (vector unsigned long long,
-                                       vector unsigned long long);
-vector unsigned long long vec_vsubudm (vector bool long long,
-                                       vector unsigned long long);
-vector unsigned long long vec_vsubudm (vector unsigned long long,
-                                       vector bool long long);
+The @code{__builtin_pack_longdouble} function takes two @code{double}
+arguments and returns a @code{long double} value that combines the two
+arguments.  The @code{__builtin_pack_longdouble} function is only
+available if @code{long double} uses the IBM extended double
+representation.
-vector long long vec_vupkhsw (vector int);
-vector unsigned long long vec_vupkhsw (vector unsigned int);
+The @code{__builtin_unpack_ibm128} function takes a @code{__ibm128}
+argument and a compile-time constant of 0 or 1.  If the constant is 0,
+the first @code{double} within the @code{__ibm128} is returned,
+otherwise the second @code{double} is returned.
-vector long long vec_vupklsw (vector int);
-vector unsigned long long vec_vupklsw (vector int);
-@end smallexample
-If the ISA 2.07 additions to the vector/scalar (power8-vector)
-instruction set are available, the following additional functions are
-available for 64-bit targets.
New vector types
-(@var{vector __int128} and @var{vector __uint128}) are available
-to hold the @var{__int128} and @var{__uint128} types to use these
-builtins.
+The @code{__builtin_pack_ibm128} function takes two @code{double}
+arguments and returns a @code{__ibm128} value that combines the two
+arguments.
-The normal vector extract, and set operations work on
-@var{vector __int128} and @var{vector __uint128} types,
-but the index value must be 0.
+Additional built-in functions are available for the 64-bit PowerPC
+family of processors, for efficient use of 128-bit floating point
+(@code{__float128}) values.
-Only functions excluded from the PVIPR are listed here.
+Vector select:
@smallexample
-vector __int128 vec_vaddcuq (vector __int128, vector __int128);
-vector __uint128 vec_vaddcuq (vector __uint128, vector __uint128);
+vector signed __int128 vec_sel (vector signed __int128,
+             vector signed __int128, vector bool __int128);
+vector signed __int128 vec_sel (vector signed __int128,
+             vector signed __int128, vector unsigned __int128);
+vector unsigned __int128 vec_sel (vector unsigned __int128,
+             vector unsigned __int128, vector bool __int128);
+vector unsigned __int128 vec_sel (vector unsigned __int128,
+             vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_sel (vector bool __int128,
+             vector bool __int128, vector bool __int128);
+vector bool __int128 vec_sel (vector bool __int128,
+             vector bool __int128, vector unsigned __int128);
+@end smallexample
-vector __int128 vec_vadduqm (vector __int128, vector __int128);
-vector __uint128 vec_vadduqm (vector __uint128, vector __uint128);
+These instances are extensions of the existing overloaded built-in
+@code{vec_sel} that is documented in the PVIPR.
-vector __int128 vec_vaddecuq (vector __int128, vector __int128,
-                              vector __int128);
-vector __uint128 vec_vaddecuq (vector __uint128, vector __uint128,
-                               vector __uint128);
+@smallexample
+vector signed __int128 vec_perm (vector signed __int128,
+                                 vector signed __int128);
+vector unsigned __int128 vec_perm (vector unsigned __int128,
+                                   vector unsigned __int128);
+@end smallexample
-vector __int128 vec_vaddeuqm (vector __int128, vector __int128,
-                              vector __int128);
-vector __uint128 vec_vaddeuqm (vector __uint128, vector __uint128,
-                               vector __uint128);
+These instances are extensions of the existing overloaded built-in
+@code{vec_perm} that is documented in the PVIPR.
-vector __int128 vec_vsubecuq (vector __int128, vector __int128,
-                              vector __int128);
-vector __uint128 vec_vsubecuq (vector __uint128, vector __uint128,
-                               vector __uint128);
+@node Basic PowerPC Built-in Functions Available on ISA 2.06
+@subsubsection Basic PowerPC Built-in Functions Available on ISA 2.06
-vector __int128 vec_vsubeuqm (vector __int128, vector __int128,
-                              vector __int128);
-vector __uint128 vec_vsubeuqm (vector __uint128, vector __uint128,
-                               vector __uint128);
+The basic built-in functions described in this section are
+available on the PowerPC family of processors starting with ISA 2.06
+or later.  Unless specific options are explicitly disabled on the
+command line, specifying option @option{-mcpu=power7} has the effect of
+enabling all the same options as for @option{-mcpu=power6} in
+addition to the @option{-maltivec}, @option{-mpopcntd}, and
+@option{-mvsx} options.
-vector __int128 vec_vsubcuq (vector __int128, vector __int128); -vector __uint128 vec_vsubcuq (vector __uint128, vector __uint128); +The following basic built-in functions require @option{-mpopcntd}: +@smallexample +unsigned int __builtin_addg6s (unsigned int, unsigned int); +long long __builtin_bpermd (long long, long long); +unsigned int __builtin_cbcdtd (unsigned int); +unsigned int __builtin_cdtbcd (unsigned int); +long long __builtin_divde (long long, long long); +unsigned long long __builtin_divdeu (unsigned long long, unsigned long long); +int __builtin_divwe (int, int); +unsigned int __builtin_divweu (unsigned int, unsigned int); +vector __int128 __builtin_pack_vector_int128 (long long, long long); +void __builtin_rs6000_speculation_barrier (void); +long long __builtin_unpack_vector_int128 (vector __int128, signed char); +@end smallexample -__int128 vec_vsubuqm (__int128, __int128); -__uint128 vec_vsubuqm (__uint128, __uint128); +Of these, the @code{__builtin_divde} and @code{__builtin_divdeu} functions +require a 64-bit environment. -vector __int128 __builtin_bcdadd (vector __int128, vector __int128, const int); -vector unsigned char __builtin_bcdadd (vector unsigned char, vector unsigned char, - const int); -int __builtin_bcdadd_lt (vector __int128, vector __int128, const int); -int __builtin_bcdadd_lt (vector unsigned char, vector unsigned char, const int); -int __builtin_bcdadd_eq (vector __int128, vector __int128, const int); -int __builtin_bcdadd_eq (vector unsigned char, vector unsigned char, const int); -int __builtin_bcdadd_gt (vector __int128, vector __int128, const int); -int __builtin_bcdadd_gt (vector unsigned char, vector unsigned char, const int); -int __builtin_bcdadd_ov (vector __int128, vector __int128, const int); -int __builtin_bcdadd_ov (vector unsigned char, vector unsigned char, const int); +The following basic built-in functions, which are also supported on +x86 targets, require @option{-mfloat128}. 
+@smallexample
+__float128 __builtin_fabsq (__float128);
+__float128 __builtin_copysignq (__float128, __float128);
+__float128 __builtin_infq (void);
+__float128 __builtin_huge_valq (void);
+__float128 __builtin_nanq (void);
+__float128 __builtin_nansq (void);
-vector __int128 __builtin_bcdsub (vector __int128, vector __int128, const int);
-vector unsigned char __builtin_bcdsub (vector unsigned char, vector unsigned char,
-                                       const int);
-int __builtin_bcdsub_le (vector __int128, vector __int128, const int);
-int __builtin_bcdsub_le (vector unsigned char, vector unsigned char, const int);
-int __builtin_bcdsub_lt (vector __int128, vector __int128, const int);
-int __builtin_bcdsub_lt (vector unsigned char, vector unsigned char, const int);
-int __builtin_bcdsub_eq (vector __int128, vector __int128, const int);
-int __builtin_bcdsub_eq (vector unsigned char, vector unsigned char, const int);
-int __builtin_bcdsub_gt (vector __int128, vector __int128, const int);
-int __builtin_bcdsub_gt (vector unsigned char, vector unsigned char, const int);
-int __builtin_bcdsub_ge (vector __int128, vector __int128, const int);
-int __builtin_bcdsub_ge (vector unsigned char, vector unsigned char, const int);
-int __builtin_bcdsub_ov (vector __int128, vector __int128, const int);
-int __builtin_bcdsub_ov (vector unsigned char, vector unsigned char, const int);
+__float128 __builtin_sqrtf128 (__float128);
+__float128 __builtin_fmaf128 (__float128, __float128, __float128);
@end smallexample
-@node PowerPC AltiVec Built-in Functions Available on ISA 3.0
-@subsubsection PowerPC AltiVec Built-in Functions Available on ISA 3.0
-
-The following additional built-in functions are also available for the
-PowerPC family of processors, starting with ISA 3.0
-(@option{-mcpu=power9}) or later.
-
-Only instructions excluded from the PVIPR are listed here.
+@node Basic PowerPC Built-in Functions Available on ISA 2.07
+@subsubsection Basic PowerPC Built-in Functions Available on ISA 2.07
-@smallexample
-unsigned int scalar_extract_exp (double source);
-unsigned long long int scalar_extract_exp (__ieee128 source);
+The basic built-in functions described in this section are
+available on the PowerPC family of processors starting with ISA 2.07
+or later.  Unless specific options are explicitly disabled on the
+command line, specifying option @option{-mcpu=power8} has the effect of
+enabling all the same options as for @option{-mcpu=power7} in
+addition to the @option{-mpower8-fusion}, @option{-mcrypto},
+@option{-mhtm}, @option{-mquad-memory}, and
+@option{-mquad-memory-atomic} options.
-unsigned long long int scalar_extract_sig (double source);
-unsigned __int128 scalar_extract_sig (__ieee128 source);
+This section is intentionally left empty.
-double scalar_insert_exp (unsigned long long int significand, - unsigned long long int exponent); -double scalar_insert_exp (double significand, unsigned long long int exponent); +@node Basic PowerPC Built-in Functions Available on ISA 3.0 +@subsubsection Basic PowerPC Built-in Functions Available on ISA 3.0 -ieee_128 scalar_insert_exp (unsigned __int128 significand, - unsigned long long int exponent); -ieee_128 scalar_insert_exp (ieee_128 significand, unsigned long long int exponent); -vector ieee_128 scalar_insert_exp (vector unsigned __int128 significand, - vector unsigned long long exponent); -vector unsigned long long scalar_extract_exp_to_vec (ieee_128); -vector unsigned __int128 scalar_extract_sig_to_vec (ieee_128); +The basic built-in functions described in this section are +available on the PowerPC family of processors starting with ISA 3.0 +or later. Unless specific options are explicitly disabled on the +command line, specifying option @option{-mcpu=power9} has the effect of +enabling all the same options as for @option{-mcpu=power8} in +addition to the @option{-misel} option. -int scalar_cmp_exp_gt (double arg1, double arg2); -int scalar_cmp_exp_lt (double arg1, double arg2); -int scalar_cmp_exp_eq (double arg1, double arg2); -int scalar_cmp_exp_unordered (double arg1, double arg2); +The following built-in functions are available on Linux 64-bit systems +that use the ISA 3.0 instruction set (@option{-mcpu=power9}): -bool scalar_test_data_class (float source, const int condition); -bool scalar_test_data_class (double source, const int condition); -bool scalar_test_data_class (__ieee128 source, const int condition); +@defbuiltin{__float128 __builtin_addf128_round_to_odd (__float128, __float128)} +Perform a 128-bit IEEE floating point add using round to odd as the +rounding mode. +@enddefbuiltin -bool scalar_test_neg (float source); -bool scalar_test_neg (double source); -bool scalar_test_neg (__ieee128 source); -@end smallexample +@defbuiltin{__float128 __builtin_subf128_round_to_odd (__float128, __float128)} +Perform a 128-bit IEEE floating point subtract using round to odd as +the rounding mode. +@enddefbuiltin -The @code{scalar_extract_exp} with a 64-bit source argument -function requires an environment supporting ISA 3.0 or later. -The @code{scalar_extract_exp} with a 128-bit source argument -and @code{scalar_extract_sig} -functions require a 64-bit environment supporting ISA 3.0 or later. -The @code{scalar_extract_exp} and @code{scalar_extract_sig} built-in -functions return the significand and the biased exponent value -respectively of their @code{source} arguments. -When supplied with a 64-bit @code{source} argument, the -result returned by @code{scalar_extract_sig} has -the @code{0x0010000000000000} bit set if the -function's @code{source} argument is in normalized form. -Otherwise, this bit is set to 0. -When supplied with a 128-bit @code{source} argument, the -@code{0x00010000000000000000000000000000} bit of the result is -treated similarly. -Note that the sign of the significand is not represented in the result -returned from the @code{scalar_extract_sig} function. Use the -@code{scalar_test_neg} function to test the sign of its @code{double} -argument. +@defbuiltin{__float128 __builtin_mulf128_round_to_odd (__float128, __float128)} +Perform a 128-bit IEEE floating point multiply using round to odd as +the rounding mode. +@enddefbuiltin -The @code{scalar_insert_exp} -functions require a 64-bit environment supporting ISA 3.0 or later. 
-When supplied with a 64-bit first argument, the -@code{scalar_insert_exp} built-in function returns a double-precision -floating point value that is constructed by assembling the values of its -@code{significand} and @code{exponent} arguments. The sign of the -result is copied from the most significant bit of the -@code{significand} argument. The significand and exponent components -of the result are composed of the least significant 11 bits of the -@code{exponent} argument and the least significant 52 bits of the -@code{significand} argument respectively. +@defbuiltin{__float128 __builtin_divf128_round_to_odd (__float128, __float128)} +Perform a 128-bit IEEE floating point divide using round to odd as +the rounding mode. +@enddefbuiltin -When supplied with a 128-bit first argument, the -@code{scalar_insert_exp} built-in function returns a quad-precision -IEEE floating point value if the two arguments were scalar. If the two -arguments are vectors, the return value is a vector IEEE floating point value. -The sign bit of the result is copied from the most significant bit of the -@code{significand} argument. The significand and exponent components of the -result are composed of the least significant 15 bits of the @code{exponent} -argument (element 0 on big-endian and element 1 on little-endian) and the -least significant 112 bits of the @code{significand} argument -respectively. Note, the @code{significand} is the scalar argument or in the -case of vector arguments, @code{significand} is element 0 for big-endian and -element 1 for little-endian. +@defbuiltin{__float128 __builtin_sqrtf128_round_to_odd (__float128)} +Perform a 128-bit IEEE floating point square root using round to odd +as the rounding mode. +@enddefbuiltin -The @code{scalar_extract_exp_to_vec}, -and @code{scalar_extract_sig_to_vec} are similar to -@code{scalar_extract_exp}, @code{scalar_extract_sig} except they return -a vector result of type unsigned long long and unsigned __int128 respectively. +@defbuiltin{__float128 __builtin_fmaf128_round_to_odd (__float128, __float128, __float128)} +Perform a 128-bit IEEE floating point fused multiply and add operation +using round to odd as the rounding mode. +@enddefbuiltin -The @code{scalar_cmp_exp_gt}, @code{scalar_cmp_exp_lt}, -@code{scalar_cmp_exp_eq}, and @code{scalar_cmp_exp_unordered} built-in -functions return a non-zero value if @code{arg1} is greater than, less -than, equal to, or not comparable to @code{arg2} respectively. The -arguments are not comparable if one or the other equals NaN (not a -number). +@defbuiltin{double __builtin_truncf128_round_to_odd (__float128)} +Convert a 128-bit IEEE floating point value to @code{double} using +round to odd as the rounding mode. +@enddefbuiltin -The @code{scalar_test_data_class} built-in function returns 1 -if any of the condition tests enabled by the value of the -@code{condition} variable are true, and 0 otherwise. The -@code{condition} argument must be a compile-time constant integer with -value not exceeding 127. The -@code{condition} argument is encoded as a bitmask with each bit -enabling the testing of a different condition, as characterized by the -following: -@smallexample -0x40 Test for NaN -0x20 Test for +Infinity -0x10 Test for -Infinity -0x08 Test for +Zero -0x04 Test for -Zero -0x02 Test for +Denormal -0x01 Test for -Denormal -@end smallexample -The @code{scalar_test_neg} built-in function returns 1 if its -@code{source} argument holds a negative value, 0 otherwise. 
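+
+A hedged note on why the round-to-odd variants exist: rounding a wide
+intermediate result to odd avoids double-rounding error when that
+result is later narrowed, as in this illustrative sketch (the variable
+names are made up):
+
+@smallexample
+__float128 wide = __builtin_fmaf128_round_to_odd (a, b, c);
+double narrow = (double) wide;  /* The only round-to-nearest step.  */
+@end smallexample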
+The following additional built-in functions are also available for the +PowerPC family of processors, starting with ISA 3.0 or later: -The following built-in functions are also available for the PowerPC family -of processors, starting with ISA 3.0 or later -(@option{-mcpu=power9}). These string functions are described -separately in order to group the descriptions closer to the function -prototypes. +@defbuiltin{{long long} __builtin_darn (void)} +@defbuiltinx{{long long} __builtin_darn_raw (void)} +@defbuiltinx{int __builtin_darn_32 (void)} +The @code{__builtin_darn} and @code{__builtin_darn_raw} +functions require a +64-bit environment supporting ISA 3.0 or later. +The @code{__builtin_darn} function provides a 64-bit conditioned +random number. The @code{__builtin_darn_raw} function provides a +64-bit raw random number. The @code{__builtin_darn_32} function +provides a 32-bit conditioned random number. +@enddefbuiltin -Only functions excluded from the PVIPR are listed here. +The following additional built-in functions are also available for the +PowerPC family of processors, starting with ISA 3.0 or later: @smallexample -int vec_all_nez (vector signed char, vector signed char); -int vec_all_nez (vector unsigned char, vector unsigned char); -int vec_all_nez (vector signed short, vector signed short); -int vec_all_nez (vector unsigned short, vector unsigned short); -int vec_all_nez (vector signed int, vector signed int); -int vec_all_nez (vector unsigned int, vector unsigned int); - -int vec_any_eqz (vector signed char, vector signed char); -int vec_any_eqz (vector unsigned char, vector unsigned char); -int vec_any_eqz (vector signed short, vector signed short); -int vec_any_eqz (vector unsigned short, vector unsigned short); -int vec_any_eqz (vector signed int, vector signed int); -int vec_any_eqz (vector unsigned int, vector unsigned int); +int __builtin_byte_in_set (unsigned char u, unsigned long long set); +int __builtin_byte_in_range (unsigned char u, unsigned int range); +int __builtin_byte_in_either_range (unsigned char u, unsigned int ranges); -signed char vec_xlx (unsigned int index, vector signed char data); -unsigned char vec_xlx (unsigned int index, vector unsigned char data); -signed short vec_xlx (unsigned int index, vector signed short data); -unsigned short vec_xlx (unsigned int index, vector unsigned short data); -signed int vec_xlx (unsigned int index, vector signed int data); -unsigned int vec_xlx (unsigned int index, vector unsigned int data); -float vec_xlx (unsigned int index, vector float data); +int __builtin_dfp_dtstsfi_lt (unsigned int comparison, _Decimal64 value); +int __builtin_dfp_dtstsfi_lt (unsigned int comparison, _Decimal128 value); +int __builtin_dfp_dtstsfi_lt_dd (unsigned int comparison, _Decimal64 value); +int __builtin_dfp_dtstsfi_lt_td (unsigned int comparison, _Decimal128 value); -signed char vec_xrx (unsigned int index, vector signed char data); -unsigned char vec_xrx (unsigned int index, vector unsigned char data); -signed short vec_xrx (unsigned int index, vector signed short data); -unsigned short vec_xrx (unsigned int index, vector unsigned short data); -signed int vec_xrx (unsigned int index, vector signed int data); -unsigned int vec_xrx (unsigned int index, vector unsigned int data); -float vec_xrx (unsigned int index, vector float data); -@end smallexample +int __builtin_dfp_dtstsfi_gt (unsigned int comparison, _Decimal64 value); +int __builtin_dfp_dtstsfi_gt (unsigned int comparison, _Decimal128 value); +int __builtin_dfp_dtstsfi_gt_dd 
(unsigned int comparison, _Decimal64 value);
+int __builtin_dfp_dtstsfi_gt_td (unsigned int comparison, _Decimal128 value);
-The @code{vec_all_nez}, @code{vec_any_eqz}, and @code{vec_cmpnez}
-perform pairwise comparisons between the elements at the same
-positions within their two vector arguments.
-The @code{vec_all_nez} function returns a
-non-zero value if and only if all pairwise comparisons are not
-equal and no element of either vector argument contains a zero.
-The @code{vec_any_eqz} function returns a
-non-zero value if and only if at least one pairwise comparison is equal
-or if at least one element of either vector argument contains a zero.
-The @code{vec_cmpnez} function returns a vector of the same type as
-its two arguments, within which each element consists of all ones to
-denote that either the corresponding elements of the incoming arguments are
-not equal or that at least one of the corresponding elements contains
-zero.  Otherwise, the element of the returned vector contains all zeros.
+int __builtin_dfp_dtstsfi_eq (unsigned int comparison, _Decimal64 value);
+int __builtin_dfp_dtstsfi_eq (unsigned int comparison, _Decimal128 value);
+int __builtin_dfp_dtstsfi_eq_dd (unsigned int comparison, _Decimal64 value);
+int __builtin_dfp_dtstsfi_eq_td (unsigned int comparison, _Decimal128 value);
-The @code{vec_xlx} and @code{vec_xrx} functions extract the single
-element selected by the @code{index} argument from the vector
-represented by the @code{data} argument.  The @code{index} argument
-always specifies a byte offset, regardless of the size of the vector
-element.  With @code{vec_xlx}, @code{index} is the offset of the first
-byte of the element to be extracted.  With @code{vec_xrx}, @code{index}
-represents the last byte of the element to be extracted, measured
-from the right end of the vector.  In other words, the last byte of
-the element to be extracted is found at position @code{(15 - index)}.
-There is no requirement that @code{index} be a multiple of the vector
-element size.  However, if the size of the vector element added to
-@code{index} is greater than 15, the content of the returned value is
-undefined.
+int __builtin_dfp_dtstsfi_ov (unsigned int comparison, _Decimal64 value);
+int __builtin_dfp_dtstsfi_ov (unsigned int comparison, _Decimal128 value);
+int __builtin_dfp_dtstsfi_ov_dd (unsigned int comparison, _Decimal64 value);
+int __builtin_dfp_dtstsfi_ov_td (unsigned int comparison, _Decimal128 value);
-The following functions are also available if the ISA 3.0 instruction
-set additions (@option{-mcpu=power9}) are available.
+double __builtin_mffsl (void);
-Only functions excluded from the PVIPR are listed here.
+@end smallexample
+The @code{__builtin_byte_in_set} function requires a
+64-bit environment supporting ISA 3.0 or later.  This function returns
+a non-zero value if and only if its @code{u} argument exactly equals one of
+the eight bytes contained within its 64-bit @code{set} argument.
-@smallexample
-vector long long vec_vctz (vector long long);
-vector unsigned long long vec_vctz (vector unsigned long long);
-vector int vec_vctz (vector int);
-vector unsigned int vec_vctz (vector int);
-vector short vec_vctz (vector short);
-vector unsigned short vec_vctz (vector unsigned short);
-vector signed char vec_vctz (vector signed char);
-vector unsigned char vec_vctz (vector unsigned char);
+The @code{__builtin_byte_in_range} and
+@code{__builtin_byte_in_either_range} functions require an environment
+supporting ISA 3.0 or later.
For these two functions, the
+@code{range} argument is encoded as 4 bytes, organized as
+@code{hi_1:lo_1:hi_2:lo_2}.
+The @code{__builtin_byte_in_range} function returns a
+non-zero value if and only if its @code{u} argument is within the
+range bounded between @code{lo_2} and @code{hi_2} inclusive.
+The @code{__builtin_byte_in_either_range} function returns non-zero if
+and only if its @code{u} argument is within either the range bounded
+between @code{lo_1} and @code{hi_1} inclusive or the range bounded
+between @code{lo_2} and @code{hi_2} inclusive.  A short sketch of this
+encoding appears at the end of this section.
-vector signed char vec_vctzb (vector signed char);
-vector unsigned char vec_vctzb (vector unsigned char);
+The @code{__builtin_dfp_dtstsfi_lt} function returns a non-zero value
+if and only if the number of significant digits of its @code{value} argument
+is less than its @code{comparison} argument.  The
+@code{__builtin_dfp_dtstsfi_lt_dd} and
+@code{__builtin_dfp_dtstsfi_lt_td} functions behave similarly, but
+require that the type of the @code{value} argument be
+@code{_Decimal64} and @code{_Decimal128} respectively.
-vector long long vec_vctzd (vector long long);
-vector unsigned long long vec_vctzd (vector unsigned long long);
+The @code{__builtin_dfp_dtstsfi_gt} function returns a non-zero value
+if and only if the number of significant digits of its @code{value} argument
+is greater than its @code{comparison} argument.  The
+@code{__builtin_dfp_dtstsfi_gt_dd} and
+@code{__builtin_dfp_dtstsfi_gt_td} functions behave similarly, but
+require that the type of the @code{value} argument be
+@code{_Decimal64} and @code{_Decimal128} respectively.
-vector short vec_vctzh (vector short);
-vector unsigned short vec_vctzh (vector unsigned short);
+The @code{__builtin_dfp_dtstsfi_eq} function returns a non-zero value
+if and only if the number of significant digits of its @code{value} argument
+equals its @code{comparison} argument.  The
+@code{__builtin_dfp_dtstsfi_eq_dd} and
+@code{__builtin_dfp_dtstsfi_eq_td} functions behave similarly, but
+require that the type of the @code{value} argument be
+@code{_Decimal64} and @code{_Decimal128} respectively.
-vector int vec_vctzw (vector int);
-vector unsigned int vec_vctzw (vector int);
+The @code{__builtin_dfp_dtstsfi_ov} function returns a non-zero value
+if and only if its @code{value} argument has an undefined number of
+significant digits, such as when @code{value} is an encoding of @code{NaN}.
+The @code{__builtin_dfp_dtstsfi_ov_dd} and
+@code{__builtin_dfp_dtstsfi_ov_td} functions behave similarly, but
+require that the type of the @code{value} argument be
+@code{_Decimal64} and @code{_Decimal128} respectively.
-vector int vec_vprtyb (vector int);
-vector unsigned int vec_vprtyb (vector unsigned int);
-vector long long vec_vprtyb (vector long long);
-vector unsigned long long vec_vprtyb (vector unsigned long long);
+The @code{__builtin_mffsl} built-in uses the ISA 3.0 @code{mffsl}
+instruction to read the FPSCR.  The instruction is a lower-latency version
+of the @code{mffs} instruction.  If the @code{mffsl} instruction is not
+available, then the builtin uses the older @code{mffs} instruction to read
+the FPSCR.
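+
+As noted above, a hedged sketch of the @code{range} encoding (the values
+are illustrative): testing whether a byte is an ASCII letter uses the
+two ranges @samp{A}..@samp{Z} and @samp{a}..@samp{z}:
+
+@smallexample
+/* ranges encoded as hi_1:lo_1:hi_2:lo_2, i.e. 0x5A 0x41 0x7A 0x61.  */
+int is_letter = __builtin_byte_in_either_range (c, 0x5A417A61);
+@end smallexample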
-vector int vec_vprtybw (vector int);
-vector unsigned int vec_vprtybw (vector unsigned int);
+@node Basic PowerPC Built-in Functions Available on ISA 3.1
+@subsubsection Basic PowerPC Built-in Functions Available on ISA 3.1
-vector long long vec_vprtybd (vector long long);
-vector unsigned long long vec_vprtybd (vector unsigned long long);
-@end smallexample
+The basic built-in functions described in this section are
+available on the PowerPC family of processors starting with ISA 3.1.
+Unless specific options are explicitly disabled on the
+command line, specifying option @option{-mcpu=power10} has the effect of
+enabling all the same options as for @option{-mcpu=power9}.
-On 64-bit targets, if the ISA 3.0 additions (@option{-mcpu=power9})
-are available:
+The following built-in functions are available on Linux 64-bit systems
+that use the ISA 3.1 instruction set (@option{-mcpu=power10}):
-@smallexample
-vector long vec_vprtyb (vector long);
-vector unsigned long vec_vprtyb (vector unsigned long);
-vector __int128 vec_vprtyb (vector __int128);
-vector __uint128 vec_vprtyb (vector __uint128);
+@defbuiltin{{unsigned long long} @
+            __builtin_cfuged (unsigned long long, unsigned long long)}
+Perform a 64-bit centrifuge operation, as if implemented by the
+@code{cfuged} instruction.
+@enddefbuiltin
-vector long vec_vprtybd (vector long);
-vector unsigned long vec_vprtybd (vector unsigned long);
+@defbuiltin{{unsigned long long} @
+            __builtin_cntlzdm (unsigned long long, unsigned long long)}
+Perform a 64-bit count leading zeros operation under mask, as if
+implemented by the @code{cntlzdm} instruction.
+@enddefbuiltin
-vector __int128 vec_vprtybq (vector __int128);
-vector __uint128 vec_vprtybd (vector __uint128);
-@end smallexample
+@defbuiltin{{unsigned long long} @
+            __builtin_cnttzdm (unsigned long long, unsigned long long)}
+Perform a 64-bit count trailing zeros operation under mask, as if
+implemented by the @code{cnttzdm} instruction.
+@enddefbuiltin
-The following built-in functions are available for the PowerPC family
-of processors, starting with ISA 3.0 or later (@option{-mcpu=power9}).
+@defbuiltin{{unsigned long long} @
+            __builtin_pdepd (unsigned long long, unsigned long long)}
+Perform a 64-bit parallel bits deposit operation, as if implemented by the
+@code{pdepd} instruction.
+@enddefbuiltin
-Only functions excluded from the PVIPR are listed here.
+@defbuiltin{{unsigned long long} @
+            __builtin_pextd (unsigned long long, unsigned long long)}
+Perform a 64-bit parallel bits extract operation, as if implemented by the
+@code{pextd} instruction.
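+
+As an illustrative sketch (the mask and @code{src} are made-up values):
+the bits of the source selected by one-bits in the mask are gathered
+into the low-order bits of the result.
+
+@smallexample
+/* Gather the low nibble of each byte of src into the low 32 bits.  */
+unsigned long long packed
+  = __builtin_pextd (src, 0x0F0F0F0F0F0F0F0FULL);
+@end smallexample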
+@enddefbuiltin -@smallexample -__vector unsigned char -vec_absdb (__vector unsigned char arg1, __vector unsigned char arg2); -__vector unsigned short -vec_absdh (__vector unsigned short arg1, __vector unsigned short arg2); -__vector unsigned int -vec_absdw (__vector unsigned int arg1, __vector unsigned int arg2); -@end smallexample +@defbuiltin{{vector signed __int128} vec_xl_sext (signed long long, signed char *)} +@defbuiltinx{{vector signed __int128} vec_xl_sext (signed long long, signed short *)} +@defbuiltinx{{vector signed __int128} vec_xl_sext (signed long long, signed int *)} +@defbuiltinx{{vector signed __int128} vec_xl_sext (signed long long, signed long long *)} +@defbuiltinx{{vector unsigned __int128} vec_xl_zext (signed long long, unsigned char *)} +@defbuiltinx{{vector unsigned __int128} vec_xl_zext (signed long long, unsigned short *)} +@defbuiltinx{{vector unsigned __int128} vec_xl_zext (signed long long, unsigned int *)} +@defbuiltinx{{vector unsigned __int128} vec_xl_zext (signed long long, unsigned long long *)} -The @code{vec_absd}, @code{vec_absdb}, @code{vec_absdh}, and -@code{vec_absdw} built-in functions each computes the absolute -differences of the pairs of vector elements supplied in its two vector -arguments, placing the absolute differences into the corresponding -elements of the vector result. +Load (and sign or zero extend) to an @code{__int128} vector, as if implemented by the ISA 3.1 +@code{lxvrbx}, @code{lxvrhx}, @code{lxvrwx}, and @code{lxvrdx} +instructions. +@enddefbuiltin -The following built-in functions are available for the PowerPC family -of processors, starting with ISA 3.0 or later (@option{-mcpu=power9}): -@smallexample -vector unsigned int vec_vrlnm (vector unsigned int, vector unsigned int); -vector unsigned long long vec_vrlnm (vector unsigned long long, - vector unsigned long long); -@end smallexample +@defbuiltin{{void} vec_xst_trunc (vector signed __int128, signed long long, signed char *)} +@defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, signed short *)} +@defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, signed int *)} +@defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, signed long long *)} +@defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, unsigned char *)} +@defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, unsigned short *)} +@defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, unsigned int *)} +@defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, unsigned long long *)} -The result of @code{vec_vrlnm} is obtained by rotating each element -of the first argument vector left and ANDing it with a mask. The -second argument vector contains the mask beginning in bits 11:15, -the mask end in bits 19:23, and the shift count in bits 27:31, -of each element. +Truncate and store the rightmost element of a vector, as if implemented by the +ISA 3.1 @code{stxvrbx}, @code{stxvrhx}, @code{stxvrwx}, and @code{stxvrdx} +instructions. +@enddefbuiltin -If the cryptographic instructions are enabled (@option{-mcrypto} or -@option{-mcpu=power8}), the following builtins are enabled. +@node PowerPC AltiVec/VSX Built-in Functions +@subsection PowerPC AltiVec/VSX Built-in Functions -Only functions excluded from the PVIPR are listed here.
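The deposit/extract pair above may be easier to follow with a round-trip
sketch. This example is illustrative only; it assumes the first operand is
the source and the second is the bit mask, and it requires a target compiled
with @option{-mcpu=power10}.

@smallexample
unsigned long long
mask_round_trip (unsigned long long x, unsigned long long mask)
@{
  /* Gather the bits of X selected by MASK into the low-order bits...  */
  unsigned long long packed = __builtin_pextd (x, mask);
  /* ...then scatter them back out under the same mask.  Bits outside
     MASK become zero, so the result equals X & MASK.  */
  return __builtin_pdepd (packed, mask);
@}
@end smallexample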
+GCC provides an interface for the PowerPC family of processors to access +the AltiVec operations described in Motorola's AltiVec Programming +Interface Manual. The interface is made available by including +@code{<altivec.h>} and using @option{-maltivec} and +@option{-mabi=altivec}. The interface supports the following vector +types. @smallexample -vector unsigned long long __builtin_crypto_vsbox (vector unsigned long long); +vector unsigned char +vector signed char +vector bool char -vector unsigned long long __builtin_crypto_vcipher (vector unsigned long long, - vector unsigned long long); +vector unsigned short +vector signed short +vector bool short +vector pixel -vector unsigned long long __builtin_crypto_vcipherlast - (vector unsigned long long, - vector unsigned long long); +vector unsigned int +vector signed int +vector bool int +vector float +@end smallexample -vector unsigned long long __builtin_crypto_vncipher (vector unsigned long long, - vector unsigned long long); +GCC's implementation of the high-level language interface available from +C and C++ code differs from Motorola's documentation in several ways. -vector unsigned long long __builtin_crypto_vncipherlast (vector unsigned long long, - vector unsigned long long); +@itemize @bullet -vector unsigned char __builtin_crypto_vpermxor (vector unsigned char, - vector unsigned char, - vector unsigned char); +@item +A vector constant is a list of constant expressions within curly braces. -vector unsigned short __builtin_crypto_vpermxor (vector unsigned short, - vector unsigned short, - vector unsigned short); +@item +A vector initializer requires no cast if the vector constant is of the +same type as the variable it is initializing. -vector unsigned int __builtin_crypto_vpermxor (vector unsigned int, - vector unsigned int, - vector unsigned int); +@item +If @code{signed} or @code{unsigned} is omitted, the signedness of the +vector type is the default signedness of the base type. The default +varies depending on the operating system, so a portable program should +always specify the signedness. -vector unsigned long long __builtin_crypto_vpermxor (vector unsigned long long, - vector unsigned long long, - vector unsigned long long); +@item +Compiling with @option{-maltivec} adds keywords @code{__vector}, +@code{vector}, @code{__pixel}, @code{pixel}, @code{__bool} and +@code{bool}. When compiling ISO C, the context-sensitive substitution +of the keywords @code{vector}, @code{pixel} and @code{bool} is +disabled. To use them, you must include @code{<altivec.h>} instead.
-vector unsigned char __builtin_crypto_vpmsumb (vector unsigned char, - vector unsigned char); +@item +GCC allows using a @code{typedef} name as the type specifier for a +vector type, but only under the following circumstances: -vector unsigned short __builtin_crypto_vpmsumh (vector unsigned short, - vector unsigned short); +@itemize @bullet -vector unsigned int __builtin_crypto_vpmsumw (vector unsigned int, - vector unsigned int); +@item +When using @code{__vector} instead of @code{vector}; for example, -vector unsigned long long __builtin_crypto_vpmsumd (vector unsigned long long, - vector unsigned long long); +@smallexample +typedef signed short int16; +__vector int16 data; +@end smallexample -vector unsigned long long __builtin_crypto_vshasigmad (vector unsigned long long, - int, int); +@item +When using @code{vector} in keyword-and-predefine mode; for example, -vector unsigned int __builtin_crypto_vshasigmaw (vector unsigned int, int, int); +@smallexample +typedef signed short int16; +vector int16 data; @end smallexample -The second argument to @var{__builtin_crypto_vshasigmad} and -@var{__builtin_crypto_vshasigmaw} must be a constant -integer that is 0 or 1. The third argument to these built-in functions -must be a constant integer in the range of 0 to 15. +Note that keyword-and-predefine mode is enabled by disabling GNU +extensions (e.g., by using @code{-std=c11}) and including +@code{<altivec.h>}. +@end itemize -The following sign extension builtins are provided: +@item +For C, overloaded functions are implemented with macros, so the following +does not work: @smallexample -vector signed int vec_signexti (vector signed char a); -vector signed long long vec_signextll (vector signed char a); -vector signed int vec_signexti (vector signed short a); -vector signed long long vec_signextll (vector signed short a); -vector signed long long vec_signextll (vector signed int a); -vector signed long long vec_signextq (vector signed long long a); + vec_add ((vector signed int)@{1, 2, 3, 4@}, foo); @end smallexample -Each element of the result is produced by sign-extending the element of the -input vector that would fall in the least significant portion of the result -element. For example, a sign-extension of a vector signed char to a vector -signed long long will sign extend the rightmost byte of each doubleword. +@noindent +Since @code{vec_add} is a macro, the vector constant in the example +is treated as four separate arguments. Wrap the entire argument in +parentheses for this to work. +@end itemize -@node PowerPC AltiVec Built-in Functions Available on ISA 3.1 -@subsubsection PowerPC AltiVec Built-in Functions Available on ISA 3.1 +@emph{Note:} Only the @code{<altivec.h>} interface is supported. +Internally, GCC uses built-in functions to achieve the functionality in +the aforementioned header file, but they are not supported and are +subject to change without notice. -The following additional built-in functions are also available for the -PowerPC family of processors, starting with ISA 3.1 (@option{-mcpu=power10}): +GCC complies with the Power Vector Intrinsic Programming Reference (PVIPR), +which may be found at +@uref{https://openpowerfoundation.org/?resource_lib=power-vector-intrinsic-programming-reference}. +Chapter 4 of this document fully documents the vector API interfaces +that must be +provided by compliant compilers. Programmers should preferentially use +the interfaces described therein. However, historically GCC has provided +additional interfaces for access to vector instructions.
These are +briefly described below. Where the PVIPR provides a portable interface, +other functions in GCC that provide the same capabilities should be +considered deprecated. -@smallexample -@exdent int vec_test_lsbb_all_ones (vector signed char); -@exdent int vec_test_lsbb_all_ones (vector unsigned char); -@exdent int vec_test_lsbb_all_ones (vector bool char); -@end smallexample -@findex vec_test_lsbb_all_ones +The PVIPR documents the following overloaded functions: -The builtin @code{vec_test_lsbb_all_ones} returns 1 if the least significant -bit in each byte is equal to 1. It returns 0 otherwise. +@multitable @columnfractions 0.33 0.33 0.33 -@smallexample -@exdent int vec_test_lsbb_all_zeros (vector signed char); -@exdent int vec_test_lsbb_all_zeros (vector unsigned char); -@exdent int vec_test_lsbb_all_zeros (vector bool char); -@end smallexample -@findex vec_test_lsbb_all_zeros +@item @code{vec_abs} +@tab @code{vec_absd} +@tab @code{vec_abss} +@item @code{vec_add} +@tab @code{vec_addc} +@tab @code{vec_adde} +@item @code{vec_addec} +@tab @code{vec_adds} +@tab @code{vec_all_eq} +@item @code{vec_all_ge} +@tab @code{vec_all_gt} +@tab @code{vec_all_in} +@item @code{vec_all_le} +@tab @code{vec_all_lt} +@tab @code{vec_all_nan} +@item @code{vec_all_ne} +@tab @code{vec_all_nge} +@tab @code{vec_all_ngt} +@item @code{vec_all_nle} +@tab @code{vec_all_nlt} +@tab @code{vec_all_numeric} +@item @code{vec_and} +@tab @code{vec_andc} +@tab @code{vec_any_eq} +@item @code{vec_any_ge} +@tab @code{vec_any_gt} +@tab @code{vec_any_le} +@item @code{vec_any_lt} +@tab @code{vec_any_nan} +@tab @code{vec_any_ne} +@item @code{vec_any_nge} +@tab @code{vec_any_ngt} +@tab @code{vec_any_nle} +@item @code{vec_any_nlt} +@tab @code{vec_any_numeric} +@tab @code{vec_any_out} +@item @code{vec_avg} +@tab @code{vec_bperm} +@tab @code{vec_ceil} +@item @code{vec_cipher_be} +@tab @code{vec_cipherlast_be} +@tab @code{vec_cmpb} +@item @code{vec_cmpeq} +@tab @code{vec_cmpge} +@tab @code{vec_cmpgt} +@item @code{vec_cmple} +@tab @code{vec_cmplt} +@tab @code{vec_cmpne} +@item @code{vec_cmpnez} +@tab @code{vec_cntlz} +@tab @code{vec_cntlz_lsbb} +@item @code{vec_cnttz} +@tab @code{vec_cnttz_lsbb} +@tab @code{vec_cpsgn} +@item @code{vec_ctf} +@tab @code{vec_cts} +@tab @code{vec_ctu} +@item @code{vec_div} +@tab @code{vec_double} +@tab @code{vec_doublee} +@item @code{vec_doubleh} +@tab @code{vec_doublel} +@tab @code{vec_doubleo} +@item @code{vec_eqv} +@tab @code{vec_expte} +@tab @code{vec_extract} +@item @code{vec_extract_exp} +@tab @code{vec_extract_fp32_from_shorth} +@tab @code{vec_extract_fp32_from_shortl} +@item @code{vec_extract_sig} +@tab @code{vec_extract_4b} +@tab @code{vec_first_match_index} +@item @code{vec_first_match_or_eos_index} +@tab @code{vec_first_mismatch_index} +@tab @code{vec_first_mismatch_or_eos_index} +@item @code{vec_float} +@tab @code{vec_float2} +@tab @code{vec_floate} +@item @code{vec_floato} +@tab @code{vec_floor} +@tab @code{vec_gb} +@item @code{vec_insert} +@tab @code{vec_insert_exp} +@tab @code{vec_insert4b} +@item @code{vec_ld} +@tab @code{vec_lde} +@tab @code{vec_ldl} +@item @code{vec_loge} +@tab @code{vec_madd} +@tab @code{vec_madds} +@item @code{vec_max} +@tab @code{vec_mergee} +@tab @code{vec_mergeh} +@item @code{vec_mergel} +@tab @code{vec_mergeo} +@tab @code{vec_mfvscr} +@item @code{vec_min} +@tab @code{vec_mradds} +@tab @code{vec_msub} +@item @code{vec_msum} +@tab @code{vec_msums} +@tab @code{vec_mtvscr} +@item @code{vec_mul} +@tab @code{vec_mule} +@tab @code{vec_mulo} +@item @code{vec_nabs} +@tab 
@code{vec_nand} +@tab @code{vec_ncipher_be} +@item @code{vec_ncipherlast_be} +@tab @code{vec_nearbyint} +@tab @code{vec_neg} +@item @code{vec_nmadd} +@tab @code{vec_nmsub} +@tab @code{vec_nor} +@item @code{vec_or} +@tab @code{vec_orc} +@tab @code{vec_pack} +@item @code{vec_pack_to_short_fp32} +@tab @code{vec_packpx} +@tab @code{vec_packs} +@item @code{vec_packsu} +@tab @code{vec_parity_lsbb} +@tab @code{vec_perm} +@item @code{vec_permxor} +@tab @code{vec_pmsum_be} +@tab @code{vec_popcnt} +@item @code{vec_re} +@tab @code{vec_recipdiv} +@tab @code{vec_revb} +@item @code{vec_reve} +@tab @code{vec_rint} +@tab @code{vec_rl} +@item @code{vec_rlmi} +@tab @code{vec_rlnm} +@tab @code{vec_round} +@item @code{vec_rsqrt} +@tab @code{vec_rsqrte} +@tab @code{vec_sbox_be} +@item @code{vec_sel} +@tab @code{vec_shasigma_be} +@tab @code{vec_signed} +@item @code{vec_signed2} +@tab @code{vec_signede} +@tab @code{vec_signedo} +@item @code{vec_sl} +@tab @code{vec_sld} +@tab @code{vec_sldw} +@item @code{vec_sll} +@tab @code{vec_slo} +@tab @code{vec_slv} +@item @code{vec_splat} +@tab @code{vec_splat_s8} +@tab @code{vec_splat_s16} +@item @code{vec_splat_s32} +@tab @code{vec_splat_u8} +@tab @code{vec_splat_u16} +@item @code{vec_splat_u32} +@tab @code{vec_splats} +@tab @code{vec_sqrt} +@item @code{vec_sr} +@tab @code{vec_sra} +@tab @code{vec_srl} +@item @code{vec_sro} +@tab @code{vec_srv} +@tab @code{vec_st} +@item @code{vec_ste} +@tab @code{vec_stl} +@tab @code{vec_sub} +@item @code{vec_subc} +@tab @code{vec_sube} +@tab @code{vec_subec} +@item @code{vec_subs} +@tab @code{vec_sum2s} +@tab @code{vec_sum4s} +@item @code{vec_sums} +@tab @code{vec_test_data_class} +@tab @code{vec_trunc} +@item @code{vec_unpackh} +@tab @code{vec_unpackl} +@tab @code{vec_unsigned} +@item @code{vec_unsigned2} +@tab @code{vec_unsignede} +@tab @code{vec_unsignedo} +@item @code{vec_xl} +@tab @code{vec_xl_be} +@tab @code{vec_xl_len} +@item @code{vec_xl_len_r} +@tab @code{vec_xor} +@tab @code{vec_xst} +@item @code{vec_xst_be} +@tab @code{vec_xst_len} +@tab @code{vec_xst_len_r} -The builtin @code{vec_test_lsbb_all_zeros} returns 1 if the least significant -bit in each byte is equal to zero. It returns 0 otherwise. +@end multitable -@smallexample -@exdent vector unsigned long long int -@exdent vec_cfuge (vector unsigned long long int, vector unsigned long long int); -@end smallexample -Perform a vector centrifuge operation, as if implemented by the -@code{vcfuged} instruction. -@findex vec_cfuge +@menu +* PowerPC AltiVec Built-in Functions on ISA 2.05:: +* PowerPC AltiVec Built-in Functions Available on ISA 2.06:: +* PowerPC AltiVec Built-in Functions Available on ISA 2.07:: +* PowerPC AltiVec Built-in Functions Available on ISA 3.0:: +* PowerPC AltiVec Built-in Functions Available on ISA 3.1:: +@end menu -@smallexample -@exdent vector unsigned long long int -@exdent vec_cntlzm (vector unsigned long long int, vector unsigned long long int); -@end smallexample -Perform a vector count leading zeros under bit mask operation, as if -implemented by the @code{vclzdm} instruction. -@findex vec_cntlzm +@node PowerPC AltiVec Built-in Functions on ISA 2.05 +@subsubsection PowerPC AltiVec Built-in Functions on ISA 2.05 -@smallexample -@exdent vector unsigned long long int -@exdent vec_cnttzm (vector unsigned long long int, vector unsigned long long int); -@end smallexample -Perform a vector count trailing zeros under bit mask operation, as if -implemented by the @code{vctzdm} instruction. 
-@findex vec_cnttzm +The following interfaces are supported for the generic and specific +AltiVec operations and the AltiVec predicates. In cases where there +is a direct mapping between generic and specific operations, only the +generic names are shown here, although the specific operations can also +be used. -@smallexample -@exdent vector signed char -@exdent vec_clrl (vector signed char @var{a}, unsigned int @var{n}); -@exdent vector unsigned char -@exdent vec_clrl (vector unsigned char @var{a}, unsigned int @var{n}); -@end smallexample -Clear the left-most @code{(16 - n)} bytes of vector argument @code{a}, as if -implemented by the @code{vclrlb} instruction on a big-endian target -and by the @code{vclrrb} instruction on a little-endian target. A -value of @code{n} that is greater than 16 is treated as if it equaled 16. -@findex vec_clrl +Arguments that are documented as @code{const int} require literal +integral values within the range required for that operation. -@smallexample -@exdent vector signed char -@exdent vec_clrr (vector signed char @var{a}, unsigned int @var{n}); -@exdent vector unsigned char -@exdent vec_clrr (vector unsigned char @var{a}, unsigned int @var{n}); -@end smallexample -Clear the right-most @code{(16 - n)} bytes of vector argument @code{a}, as if -implemented by the @code{vclrrb} instruction on a big-endian target -and by the @code{vclrlb} instruction on a little-endian target. A -value of @code{n} that is greater than 16 is treated as if it equaled 16. -@findex vec_clrr +Only functions excluded from the PVIPR are listed here. @smallexample -@exdent vector unsigned long long int -@exdent vec_gnb (vector unsigned __int128, const unsigned char); -@end smallexample -Perform a 128-bit vector gather operation, as if implemented by the -@code{vgnb} instruction. The second argument must be a literal -integer value between 2 and 7 inclusive. 
-@findex vec_gnb +void vec_dss (const int); +void vec_dssall (void); -Vector Extract +void vec_dst (const vector unsigned char *, int, const int); +void vec_dst (const vector signed char *, int, const int); +void vec_dst (const vector bool char *, int, const int); +void vec_dst (const vector unsigned short *, int, const int); +void vec_dst (const vector signed short *, int, const int); +void vec_dst (const vector bool short *, int, const int); +void vec_dst (const vector pixel *, int, const int); +void vec_dst (const vector unsigned int *, int, const int); +void vec_dst (const vector signed int *, int, const int); +void vec_dst (const vector bool int *, int, const int); +void vec_dst (const vector float *, int, const int); +void vec_dst (const unsigned char *, int, const int); +void vec_dst (const signed char *, int, const int); +void vec_dst (const unsigned short *, int, const int); +void vec_dst (const short *, int, const int); +void vec_dst (const unsigned int *, int, const int); +void vec_dst (const int *, int, const int); +void vec_dst (const float *, int, const int); -@smallexample -@exdent vector unsigned long long int -@exdent vec_extractl (vector unsigned char, vector unsigned char, unsigned int); -@exdent vector unsigned long long int -@exdent vec_extractl (vector unsigned short, vector unsigned short, unsigned int); -@exdent vector unsigned long long int -@exdent vec_extractl (vector unsigned int, vector unsigned int, unsigned int); -@exdent vector unsigned long long int -@exdent vec_extractl (vector unsigned long long, vector unsigned long long, unsigned int); -@end smallexample -Extract an element from two concatenated vectors starting at the given byte index -in natural-endian order, and place it zero-extended in doubleword 1 of the result -according to natural element order. If the byte index is out of range for the -data type, the intrinsic will be rejected. -For little-endian, this output will match the placement by the hardware -instruction, i.e., dword[0] in RTL notation. For big-endian, an additional -instruction is needed to move it from the "left" doubleword to the "right" one. -For little-endian, semantics matching the @code{vextdubvrx}, -@code{vextduhvrx}, @code{vextduwvrx} instruction will be generated, while for -big-endian, semantics matching the @code{vextdubvlx}, @code{vextduhvlx}, -@code{vextduwvlx} instructions -will be generated. Note that some fairly anomalous results can be generated if -the byte index is not aligned on an element boundary for the element being -extracted. This is a limitation of the bi-endian vector programming model is -consistent with the limitation on @code{vec_perm}. 
-@findex vec_extractl +void vec_dstst (const vector unsigned char *, int, const int); +void vec_dstst (const vector signed char *, int, const int); +void vec_dstst (const vector bool char *, int, const int); +void vec_dstst (const vector unsigned short *, int, const int); +void vec_dstst (const vector signed short *, int, const int); +void vec_dstst (const vector bool short *, int, const int); +void vec_dstst (const vector pixel *, int, const int); +void vec_dstst (const vector unsigned int *, int, const int); +void vec_dstst (const vector signed int *, int, const int); +void vec_dstst (const vector bool int *, int, const int); +void vec_dstst (const vector float *, int, const int); +void vec_dstst (const unsigned char *, int, const int); +void vec_dstst (const signed char *, int, const int); +void vec_dstst (const unsigned short *, int, const int); +void vec_dstst (const short *, int, const int); +void vec_dstst (const unsigned int *, int, const int); +void vec_dstst (const int *, int, const int); +void vec_dstst (const unsigned long *, int, const int); +void vec_dstst (const long *, int, const int); +void vec_dstst (const float *, int, const int); -@smallexample -@exdent vector unsigned long long int -@exdent vec_extracth (vector unsigned char, vector unsigned char, unsigned int); -@exdent vector unsigned long long int -@exdent vec_extracth (vector unsigned short, vector unsigned short, -unsigned int); -@exdent vector unsigned long long int -@exdent vec_extracth (vector unsigned int, vector unsigned int, unsigned int); -@exdent vector unsigned long long int -@exdent vec_extracth (vector unsigned long long, vector unsigned long long, -unsigned int); -@end smallexample -Extract an element from two concatenated vectors starting at the given byte -index. The index is based on big endian order for a little endian system. -Similarly, the index is based on little endian order for a big endian system. -The extraced elements are zero-extended and put in doubleword 1 -according to natural element order. If the byte index is out of range for the -data type, the intrinsic will be rejected. For little-endian, this output -will match the placement by the hardware instruction (vextdubvrx, vextduhvrx, -vextduwvrx, vextddvrx) i.e., dword[0] in RTL -notation. For big-endian, an additional instruction is needed to move it -from the "left" doubleword to the "right" one. For little-endian, semantics -matching the @code{vextdubvlx}, @code{vextduhvlx}, @code{vextduwvlx} -instructions will be generated, while for big-endian, semantics matching the -@code{vextdubvrx}, @code{vextduhvrx}, @code{vextduwvrx} instructions will -be generated. Note that some fairly anomalous -results can be generated if the byte index is not aligned on the -element boundary for the element being extracted. This is a -limitation of the bi-endian vector programming model consistent with the -limitation on @code{vec_perm}. -@findex vec_extracth -@smallexample -@exdent vector unsigned long long int -@exdent vec_pdep (vector unsigned long long int, vector unsigned long long int); -@end smallexample -Perform a vector parallel bits deposit operation, as if implemented by -the @code{vpdepd} instruction. 
-@findex vec_pdep +void vec_dststt (const vector unsigned char *, int, const int); +void vec_dststt (const vector signed char *, int, const int); +void vec_dststt (const vector bool char *, int, const int); +void vec_dststt (const vector unsigned short *, int, const int); +void vec_dststt (const vector signed short *, int, const int); +void vec_dststt (const vector bool short *, int, const int); +void vec_dststt (const vector pixel *, int, const int); +void vec_dststt (const vector unsigned int *, int, const int); +void vec_dststt (const vector signed int *, int, const int); +void vec_dststt (const vector bool int *, int, const int); +void vec_dststt (const vector float *, int, const int); +void vec_dststt (const unsigned char *, int, const int); +void vec_dststt (const signed char *, int, const int); +void vec_dststt (const unsigned short *, int, const int); +void vec_dststt (const short *, int, const int); +void vec_dststt (const unsigned int *, int, const int); +void vec_dststt (const int *, int, const int); +void vec_dststt (const float *, int, const int); -Vector Insert +void vec_dstt (const vector unsigned char *, int, const int); +void vec_dstt (const vector signed char *, int, const int); +void vec_dstt (const vector bool char *, int, const int); +void vec_dstt (const vector unsigned short *, int, const int); +void vec_dstt (const vector signed short *, int, const int); +void vec_dstt (const vector bool short *, int, const int); +void vec_dstt (const vector pixel *, int, const int); +void vec_dstt (const vector unsigned int *, int, const int); +void vec_dstt (const vector signed int *, int, const int); +void vec_dstt (const vector bool int *, int, const int); +void vec_dstt (const vector float *, int, const int); +void vec_dstt (const unsigned char *, int, const int); +void vec_dstt (const signed char *, int, const int); +void vec_dstt (const unsigned short *, int, const int); +void vec_dstt (const short *, int, const int); +void vec_dstt (const unsigned int *, int, const int); +void vec_dstt (const int *, int, const int); +void vec_dstt (const float *, int, const int); -@smallexample -@exdent vector unsigned char -@exdent vec_insertl (unsigned char, vector unsigned char, unsigned int); -@exdent vector unsigned short -@exdent vec_insertl (unsigned short, vector unsigned short, unsigned int); -@exdent vector unsigned int -@exdent vec_insertl (unsigned int, vector unsigned int, unsigned int); -@exdent vector unsigned long long -@exdent vec_insertl (unsigned long long, vector unsigned long long, -unsigned int); -@exdent vector unsigned char -@exdent vec_insertl (vector unsigned char, vector unsigned char, unsigned int; -@exdent vector unsigned short -@exdent vec_insertl (vector unsigned short, vector unsigned short, -unsigned int); -@exdent vector unsigned int -@exdent vec_insertl (vector unsigned int, vector unsigned int, unsigned int); -@end smallexample +vector signed char vec_lvebx (int, char *); +vector unsigned char vec_lvebx (int, unsigned char *); -Let src be the first argument, when the first argument is a scalar, or the -rightmost element of the left doubleword of the first argument, when the first -argument is a vector. Insert the source into the destination at the position -given by the third argument, using natural element order in the second -argument. The rest of the second argument is unchanged. If the byte -index is greater than 14 for halfwords, greater than 12 for words, or -greater than 8 for doublewords the result is undefined. 
For little-endian, -the generated code will be semantically equivalent to @code{vins[bhwd]rx} -instructions. Similarly for big-endian it will be semantically equivalent -to @code{vins[bhwd]lx}. Note that some fairly anomalous results can be -generated if the byte index is not aligned on an element boundary for the -type of element being inserted. -@findex vec_insertl +vector signed short vec_lvehx (int, short *); +vector unsigned short vec_lvehx (int, unsigned short *); -@smallexample -@exdent vector unsigned char -@exdent vec_inserth (unsigned char, vector unsigned char, unsigned int); -@exdent vector unsigned short -@exdent vec_inserth (unsigned short, vector unsigned short, unsigned int); -@exdent vector unsigned int -@exdent vec_inserth (unsigned int, vector unsigned int, unsigned int); -@exdent vector unsigned long long -@exdent vec_inserth (unsigned long long, vector unsigned long long, -unsigned int); -@exdent vector unsigned char -@exdent vec_inserth (vector unsigned char, vector unsigned char, unsigned int); -@exdent vector unsigned short -@exdent vec_inserth (vector unsigned short, vector unsigned short, -unsigned int); -@exdent vector unsigned int -@exdent vec_inserth (vector unsigned int, vector unsigned int, unsigned int); -@end smallexample +vector float vec_lvewx (int, float *); +vector signed int vec_lvewx (int, int *); +vector unsigned int vec_lvewx (int, unsigned int *); -Let src be the first argument, when the first argument is a scalar, or the -rightmost element of the first argument, when the first argument is a vector. -Insert src into the second argument at the position identified by the third -argument, using opposite element order in the second argument, and leaving the -rest of the second argument unchanged. If the byte index is greater than 14 -for halfwords, 12 for words, or 8 for doublewords, the intrinsic will be -rejected. Note that the underlying hardware instruction uses the same register -for the second argument and the result. -For little-endian, the code generation will be semantically equivalent to -@code{vins[bhwd]lx}, while for big-endian it will be semantically equivalent to -@code{vins[bhwd]rx}. -Note that some fairly anomalous results can be generated if the byte index is -not aligned on an element boundary for the sort of element being inserted. -@findex vec_inserth +vector unsigned char vec_lvsl (int, const unsigned char *); +vector unsigned char vec_lvsl (int, const signed char *); +vector unsigned char vec_lvsl (int, const unsigned short *); +vector unsigned char vec_lvsl (int, const short *); +vector unsigned char vec_lvsl (int, const unsigned int *); +vector unsigned char vec_lvsl (int, const int *); +vector unsigned char vec_lvsl (int, const float *); -Vector Replace Element -@smallexample -@exdent vector signed int vec_replace_elt (vector signed int, signed int, -const int); -@exdent vector unsigned int vec_replace_elt (vector unsigned int, -unsigned int, const int); -@exdent vector float vec_replace_elt (vector float, float, const int); -@exdent vector signed long long vec_replace_elt (vector signed long long, -signed long long, const int); -@exdent vector unsigned long long vec_replace_elt (vector unsigned long long, -unsigned long long, const int); -@exdent vector double rec_replace_elt (vector double, double, const int); -@end smallexample -The third argument (constrained to [0,3]) identifies the natural-endian -element number of the first argument that will be replaced by the second -argument to produce the result. 
The other elements of the first argument will -remain unchanged in the result. +vector unsigned char vec_lvsr (int, const unsigned char *); +vector unsigned char vec_lvsr (int, const signed char *); +vector unsigned char vec_lvsr (int, const unsigned short *); +vector unsigned char vec_lvsr (int, const short *); +vector unsigned char vec_lvsr (int, const unsigned int *); +vector unsigned char vec_lvsr (int, const int *); +vector unsigned char vec_lvsr (int, const float *); -If it's desirable to insert a word at an unaligned position, use -vec_replace_unaligned instead. +void vec_stvebx (vector signed char, int, signed char *); +void vec_stvebx (vector unsigned char, int, unsigned char *); +void vec_stvebx (vector bool char, int, signed char *); +void vec_stvebx (vector bool char, int, unsigned char *); -@findex vec_replace_element +void vec_stvehx (vector signed short, int, short *); +void vec_stvehx (vector unsigned short, int, unsigned short *); +void vec_stvehx (vector bool short, int, short *); +void vec_stvehx (vector bool short, int, unsigned short *); -Vector Replace Unaligned -@smallexample -@exdent vector unsigned char vec_replace_unaligned (vector unsigned char, -signed int, const int); -@exdent vector unsigned char vec_replace_unaligned (vector unsigned char, -unsigned int, const int); -@exdent vector unsigned char vec_replace_unaligned (vector unsigned char, -float, const int); -@exdent vector unsigned char vec_replace_unaligned (vector unsigned char, -signed long long, const int); -@exdent vector unsigned char vec_replace_unaligned (vector unsigned char, -unsigned long long, const int); -@exdent vector unsigned char vec_replace_unaligned (vector unsigned char, -double, const int); -@end smallexample +void vec_stvewx (vector float, int, float *); +void vec_stvewx (vector signed int, int, int *); +void vec_stvewx (vector unsigned int, int, unsigned int *); +void vec_stvewx (vector bool int, int, int *); +void vec_stvewx (vector bool int, int, unsigned int *); -The second argument replaces a portion of the first argument to produce the -result, with the rest of the first argument unchanged in the result. The -third argument identifies the byte index (using left-to-right, or big-endian -order) where the high-order byte of the second argument will be placed, with -the remaining bytes of the second argument placed naturally "to the right" -of the high-order byte. +vector float vec_vaddfp (vector float, vector float); -The programmer is responsible for understanding the endianness issues involved -with the first argument and the result. 
-@findex vec_replace_unaligned +vector signed char vec_vaddsbs (vector bool char, vector signed char); +vector signed char vec_vaddsbs (vector signed char, vector bool char); +vector signed char vec_vaddsbs (vector signed char, vector signed char); -Vector Shift Left Double Bit Immediate -@smallexample -@exdent vector signed char vec_sldb (vector signed char, vector signed char, -const unsigned int); -@exdent vector unsigned char vec_sldb (vector unsigned char, -vector unsigned char, const unsigned int); -@exdent vector signed short vec_sldb (vector signed short, vector signed short, -const unsigned int); -@exdent vector unsigned short vec_sldb (vector unsigned short, -vector unsigned short, const unsigned int); -@exdent vector signed int vec_sldb (vector signed int, vector signed int, -const unsigned int); -@exdent vector unsigned int vec_sldb (vector unsigned int, vector unsigned int, -const unsigned int); -@exdent vector signed long long vec_sldb (vector signed long long, -vector signed long long, const unsigned int); -@exdent vector unsigned long long vec_sldb (vector unsigned long long, -vector unsigned long long, const unsigned int); -@exdent vector signed __int128 vec_sldb (vector signed __int128, -vector signed __int128, const unsigned int); -@exdent vector unsigned __int128 vec_sldb (vector unsigned __int128, -vector unsigned __int128, const unsigned int); -@end smallexample +vector signed short vec_vaddshs (vector bool short, vector signed short); +vector signed short vec_vaddshs (vector signed short, vector bool short); +vector signed short vec_vaddshs (vector signed short, vector signed short); + +vector signed int vec_vaddsws (vector bool int, vector signed int); +vector signed int vec_vaddsws (vector signed int, vector bool int); +vector signed int vec_vaddsws (vector signed int, vector signed int); + +vector signed char vec_vaddubm (vector bool char, vector signed char); +vector signed char vec_vaddubm (vector signed char, vector bool char); +vector signed char vec_vaddubm (vector signed char, vector signed char); +vector unsigned char vec_vaddubm (vector bool char, vector unsigned char); +vector unsigned char vec_vaddubm (vector unsigned char, vector bool char); +vector unsigned char vec_vaddubm (vector unsigned char, vector unsigned char); -Shift the combined input vectors left by the amount specified by the low-order -three bits of the third argument, and return the leftmost remaining 128 bits. -Code using this instruction must be endian-aware. 
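A hedged sketch of the @code{vec_sldb} behavior just described, assuming
@code{<altivec.h>} and a @option{-mcpu=power10} target:

@smallexample
#include <altivec.h>

/* Treat A:B as a single 256-bit value, shift it left by 3 bits, and
   keep the leftmost 128 bits.  */
vector unsigned char
shift_pair_left_3 (vector unsigned char a, vector unsigned char b)
@{
  return vec_sldb (a, b, 3);
@}
@end smallexample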
+vector unsigned char vec_vaddubs (vector bool char, vector unsigned char); +vector unsigned char vec_vaddubs (vector unsigned char, vector bool char); +vector unsigned char vec_vaddubs (vector unsigned char, vector unsigned char); -@findex vec_sldb +vector signed short vec_vadduhm (vector bool short, vector signed short); +vector signed short vec_vadduhm (vector signed short, vector bool short); +vector signed short vec_vadduhm (vector signed short, vector signed short); +vector unsigned short vec_vadduhm (vector bool short, vector unsigned short); +vector unsigned short vec_vadduhm (vector unsigned short, vector bool short); +vector unsigned short vec_vadduhm (vector unsigned short, vector unsigned short); -Vector Shift Right Double Bit Immediate +vector unsigned short vec_vadduhs (vector bool short, vector unsigned short); +vector unsigned short vec_vadduhs (vector unsigned short, vector bool short); +vector unsigned short vec_vadduhs (vector unsigned short, vector unsigned short); -@smallexample -@exdent vector signed char vec_srdb (vector signed char, vector signed char, -const unsigned int); -@exdent vector unsigned char vec_srdb (vector unsigned char, vector unsigned char, -const unsigned int); -@exdent vector signed short vec_srdb (vector signed short, vector signed short, -const unsigned int); -@exdent vector unsigned short vec_srdb (vector unsigned short, vector unsigned short, -const unsigned int); -@exdent vector signed int vec_srdb (vector signed int, vector signed int, -const unsigned int); -@exdent vector unsigned int vec_srdb (vector unsigned int, vector unsigned int, -const unsigned int); -@exdent vector signed long long vec_srdb (vector signed long long, -vector signed long long, const unsigned int); -@exdent vector unsigned long long vec_srdb (vector unsigned long long, -vector unsigned long long, const unsigned int); -@exdent vector signed __int128 vec_srdb (vector signed __int128, -vector signed __int128, const unsigned int); -@exdent vector unsigned __int128 vec_srdb (vector unsigned __int128, -vector unsigned __int128, const unsigned int); -@end smallexample +vector signed int vec_vadduwm (vector bool int, vector signed int); +vector signed int vec_vadduwm (vector signed int, vector bool int); +vector signed int vec_vadduwm (vector signed int, vector signed int); +vector unsigned int vec_vadduwm (vector bool int, vector unsigned int); +vector unsigned int vec_vadduwm (vector unsigned int, vector bool int); +vector unsigned int vec_vadduwm (vector unsigned int, vector unsigned int); -Shift the combined input vectors right by the amount specified by the low-order -three bits of the third argument, and return the remaining 128 bits. Code -using this built-in must be endian-aware. +vector unsigned int vec_vadduws (vector bool int, vector unsigned int); +vector unsigned int vec_vadduws (vector unsigned int, vector bool int); +vector unsigned int vec_vadduws (vector unsigned int, vector unsigned int); -@findex vec_srdb +vector signed char vec_vavgsb (vector signed char, vector signed char); -Vector Splat +vector signed short vec_vavgsh (vector signed short, vector signed short); -@smallexample -@exdent vector signed int vec_splati (const signed int); -@exdent vector float vec_splati (const float); -@end smallexample +vector signed int vec_vavgsw (vector signed int, vector signed int); -Splat a 32-bit immediate into a vector of words. 
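For instance, a minimal sketch of the @code{vec_splati} forms above
(assuming @code{<altivec.h>} and a @option{-mcpu=power10} target):

@smallexample
vector signed int all_sevens = vec_splati (7);   /* @{7, 7, 7, 7@} */
vector float all_halves = vec_splati (0.5f);     /* @{0.5, 0.5, 0.5, 0.5@} */
@end smallexample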
+vector unsigned char vec_vavgub (vector unsigned char, vector unsigned char); -@findex vec_splati +vector unsigned short vec_vavguh (vector unsigned short, vector unsigned short); -@smallexample -@exdent vector double vec_splatid (const float); -@end smallexample +vector unsigned int vec_vavguw (vector unsigned int, vector unsigned int); -Convert a single precision floating-point value to double-precision and splat -the result to a vector of double-precision floats. +vector float vec_vcfsx (vector signed int, const int); -@findex vec_splatid +vector float vec_vcfux (vector unsigned int, const int); -@smallexample -@exdent vector signed int vec_splati_ins (vector signed int, -const unsigned int, const signed int); -@exdent vector unsigned int vec_splati_ins (vector unsigned int, -const unsigned int, const unsigned int); -@exdent vector float vec_splati_ins (vector float, const unsigned int, -const float); -@end smallexample +vector bool int vec_vcmpeqfp (vector float, vector float); -Argument 2 must be either 0 or 1. Splat the value of argument 3 into the word -identified by argument 2 of each doubleword of argument 1 and return the -result. The other words of argument 1 are unchanged. +vector bool char vec_vcmpequb (vector signed char, vector signed char); +vector bool char vec_vcmpequb (vector unsigned char, vector unsigned char); -@findex vec_splati_ins +vector bool short vec_vcmpequh (vector signed short, vector signed short); +vector bool short vec_vcmpequh (vector unsigned short, vector unsigned short); -Vector Blend Variable +vector bool int vec_vcmpequw (vector signed int, vector signed int); +vector bool int vec_vcmpequw (vector unsigned int, vector unsigned int); -@smallexample -@exdent vector signed char vec_blendv (vector signed char, vector signed char, -vector unsigned char); -@exdent vector unsigned char vec_blendv (vector unsigned char, -vector unsigned char, vector unsigned char); -@exdent vector signed short vec_blendv (vector signed short, -vector signed short, vector unsigned short); -@exdent vector unsigned short vec_blendv (vector unsigned short, -vector unsigned short, vector unsigned short); -@exdent vector signed int vec_blendv (vector signed int, vector signed int, -vector unsigned int); -@exdent vector unsigned int vec_blendv (vector unsigned int, -vector unsigned int, vector unsigned int); -@exdent vector signed long long vec_blendv (vector signed long long, -vector signed long long, vector unsigned long long); -@exdent vector unsigned long long vec_blendv (vector unsigned long long, -vector unsigned long long, vector unsigned long long); -@exdent vector float vec_blendv (vector float, vector float, -vector unsigned int); -@exdent vector double vec_blendv (vector double, vector double, -vector unsigned long long); -@end smallexample +vector bool int vec_vcmpgtfp (vector float, vector float); -Blend the first and second argument vectors according to the sign bits of the -corresponding elements of the third argument vector. This is similar to the -@code{vsel} and @code{xxsel} instructions but for bigger elements. 
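A sketch of the @code{vec_blendv} semantics described above; this is
illustrative only and assumes, per the usual @code{xxblendv} convention,
that an element whose mask sign bit is set is taken from the second input:

@smallexample
/* Select B's element where MASK's element has its high (sign) bit set,
   and A's element otherwise.  */
vector signed int
blend_by_sign (vector signed int a, vector signed int b,
               vector unsigned int mask)
@{
  return vec_blendv (a, b, mask);
@}
@end smallexample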
+vector bool char vec_vcmpgtsb (vector signed char, vector signed char); -@findex vec_blendv +vector bool short vec_vcmpgtsh (vector signed short, vector signed short); -Vector Permute Extended +vector bool int vec_vcmpgtsw (vector signed int, vector signed int); -@smallexample -@exdent vector signed char vec_permx (vector signed char, vector signed char, -vector unsigned char, const int); -@exdent vector unsigned char vec_permx (vector unsigned char, -vector unsigned char, vector unsigned char, const int); -@exdent vector signed short vec_permx (vector signed short, -vector signed short, vector unsigned char, const int); -@exdent vector unsigned short vec_permx (vector unsigned short, -vector unsigned short, vector unsigned char, const int); -@exdent vector signed int vec_permx (vector signed int, vector signed int, -vector unsigned char, const int); -@exdent vector unsigned int vec_permx (vector unsigned int, -vector unsigned int, vector unsigned char, const int); -@exdent vector signed long long vec_permx (vector signed long long, -vector signed long long, vector unsigned char, const int); -@exdent vector unsigned long long vec_permx (vector unsigned long long, -vector unsigned long long, vector unsigned char, const int); -@exdent vector float (vector float, vector float, vector unsigned char, -const int); -@exdent vector double (vector double, vector double, vector unsigned char, -const int); -@end smallexample +vector bool char vec_vcmpgtub (vector unsigned char, vector unsigned char); -Perform a partial permute of the first two arguments, which form a 32-byte -section of an emulated vector up to 256 bytes wide, using the partial permute -control vector in the third argument. The fourth argument (constrained to -values of 0-7) identifies which 32-byte section of the emulated vector is -contained in the first two arguments. -@findex vec_permx +vector bool short vec_vcmpgtuh (vector unsigned short, vector unsigned short); -@smallexample -@exdent vector unsigned long long int -@exdent vec_pext (vector unsigned long long int, vector unsigned long long int); -@end smallexample -Perform a vector parallel bit extract operation, as if implemented by -the @code{vpextd} instruction. -@findex vec_pext +vector bool int vec_vcmpgtuw (vector unsigned int, vector unsigned int); -@smallexample -@exdent vector unsigned char vec_stril (vector unsigned char); -@exdent vector signed char vec_stril (vector signed char); -@exdent vector unsigned short vec_stril (vector unsigned short); -@exdent vector signed short vec_stril (vector signed short); -@end smallexample -Isolate the left-most non-zero elements of the incoming vector argument, -replacing all elements to the right of the left-most zero element -found within the argument with zero. The typical implementation uses -the @code{vstribl} or @code{vstrihl} instruction on big-endian targets -and uses the @code{vstribr} or @code{vstrihr} instruction on -little-endian targets. -@findex vec_stril +vector float vec_vmaxfp (vector float, vector float); -@smallexample -@exdent int vec_stril_p (vector unsigned char); -@exdent int vec_stril_p (vector signed char); -@exdent int short vec_stril_p (vector unsigned short); -@exdent int vec_stril_p (vector signed short); -@end smallexample -Return a non-zero value if and only if the argument contains a zero -element. The typical implementation uses -the @code{vstribl.} or @code{vstrihl.} instruction on big-endian targets -and uses the @code{vstribr.} or @code{vstrihr.} instruction on -little-endian targets. 
Choose this built-in to check for presence of -zero element if the same argument is also passed to @code{vec_stril}. -@findex vec_stril_p +vector signed char vec_vmaxsb (vector bool char, vector signed char); +vector signed char vec_vmaxsb (vector signed char, vector bool char); +vector signed char vec_vmaxsb (vector signed char, vector signed char); -@smallexample -@exdent vector unsigned char vec_strir (vector unsigned char); -@exdent vector signed char vec_strir (vector signed char); -@exdent vector unsigned short vec_strir (vector unsigned short); -@exdent vector signed short vec_strir (vector signed short); -@end smallexample -Isolate the right-most non-zero elements of the incoming vector argument, -replacing all elements to the left of the right-most zero element -found within the argument with zero. The typical implementation uses -the @code{vstribr} or @code{vstrihr} instruction on big-endian targets -and uses the @code{vstribl} or @code{vstrihl} instruction on -little-endian targets. -@findex vec_strir +vector signed short vec_vmaxsh (vector bool short, vector signed short); +vector signed short vec_vmaxsh (vector signed short, vector bool short); +vector signed short vec_vmaxsh (vector signed short, vector signed short); -@smallexample -@exdent int vec_strir_p (vector unsigned char); -@exdent int vec_strir_p (vector signed char); -@exdent int short vec_strir_p (vector unsigned short); -@exdent int vec_strir_p (vector signed short); -@end smallexample -Return a non-zero value if and only if the argument contains a zero -element. The typical implementation uses -the @code{vstribr.} or @code{vstrihr.} instruction on big-endian targets -and uses the @code{vstribl.} or @code{vstrihl.} instruction on -little-endian targets. Choose this built-in to check for presence of -zero element if the same argument is also passed to @code{vec_strir}. -@findex vec_strir_p +vector signed int vec_vmaxsw (vector bool int, vector signed int); +vector signed int vec_vmaxsw (vector signed int, vector bool int); +vector signed int vec_vmaxsw (vector signed int, vector signed int); -@smallexample -@exdent vector unsigned char -@exdent vec_ternarylogic (vector unsigned char, vector unsigned char, - vector unsigned char, const unsigned int); -@exdent vector unsigned short -@exdent vec_ternarylogic (vector unsigned short, vector unsigned short, - vector unsigned short, const unsigned int); -@exdent vector unsigned int -@exdent vec_ternarylogic (vector unsigned int, vector unsigned int, - vector unsigned int, const unsigned int); -@exdent vector unsigned long long int -@exdent vec_ternarylogic (vector unsigned long long int, vector unsigned long long int, - vector unsigned long long int, const unsigned int); -@exdent vector unsigned __int128 -@exdent vec_ternarylogic (vector unsigned __int128, vector unsigned __int128, - vector unsigned __int128, const unsigned int); -@end smallexample -Perform a 128-bit vector evaluate operation, as if implemented by the -@code{xxeval} instruction. The fourth argument must be a literal -integer value between 0 and 255 inclusive. 
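The @code{vec_stril}/@code{vec_stril_p} pair described above is typically
used together when scanning NUL-terminated data; a minimal sketch, assuming
@code{<altivec.h>} and a @option{-mcpu=power10} target:

@smallexample
/* Zero every byte after the first zero byte of CHUNK, and record
   whether a zero byte was present at all.  */
vector unsigned char
clip_at_nul (vector unsigned char chunk, int *saw_nul)
@{
  *saw_nul = vec_stril_p (chunk);  /* non-zero iff CHUNK has a 0 byte */
  return vec_stril (chunk);
@}
@end smallexample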
-@findex vec_ternarylogic +vector unsigned char vec_vmaxub (vector bool char, vector unsigned char); +vector unsigned char vec_vmaxub (vector unsigned char, vector bool char); +vector unsigned char vec_vmaxub (vector unsigned char, vector unsigned char); + +vector unsigned short vec_vmaxuh (vector bool short, vector unsigned short); +vector unsigned short vec_vmaxuh (vector unsigned short, vector bool short); +vector unsigned short vec_vmaxuh (vector unsigned short, vector unsigned short); -@smallexample -@exdent vector unsigned char vec_genpcvm (vector unsigned char, const int); -@exdent vector unsigned short vec_genpcvm (vector unsigned short, const int); -@exdent vector unsigned int vec_genpcvm (vector unsigned int, const int); -@exdent vector unsigned int vec_genpcvm (vector unsigned long long int, - const int); -@end smallexample +vector unsigned int vec_vmaxuw (vector bool int, vector unsigned int); +vector unsigned int vec_vmaxuw (vector unsigned int, vector bool int); +vector unsigned int vec_vmaxuw (vector unsigned int, vector unsigned int); -Vector Integer Multiply/Divide/Modulo +vector float vec_vminfp (vector float, vector float); -@smallexample -@exdent vector signed int -@exdent vec_mulh (vector signed int @var{a}, vector signed int @var{b}); -@exdent vector unsigned int -@exdent vec_mulh (vector unsigned int @var{a}, vector unsigned int @var{b}); -@end smallexample +vector signed char vec_vminsb (vector bool char, vector signed char); +vector signed char vec_vminsb (vector signed char, vector bool char); +vector signed char vec_vminsb (vector signed char, vector signed char); -For each integer value @code{i} from 0 to 3, do the following. The integer -value in word element @code{i} of a is multiplied by the integer value in word -element @code{i} of b. The high-order 32 bits of the 64-bit product are placed -into word element @code{i} of the vector returned. +vector signed short vec_vminsh (vector bool short, vector signed short); +vector signed short vec_vminsh (vector signed short, vector bool short); +vector signed short vec_vminsh (vector signed short, vector signed short); -@smallexample -@exdent vector signed long long -@exdent vec_mulh (vector signed long long @var{a}, vector signed long long @var{b}); -@exdent vector unsigned long long -@exdent vec_mulh (vector unsigned long long @var{a}, vector unsigned long long @var{b}); -@end smallexample +vector signed int vec_vminsw (vector bool int, vector signed int); +vector signed int vec_vminsw (vector signed int, vector bool int); +vector signed int vec_vminsw (vector signed int, vector signed int); -For each integer value @code{i} from 0 to 1, do the following. The integer -value in doubleword element @code{i} of a is multiplied by the integer value in -doubleword element @code{i} of b. The high-order 64 bits of the 128-bit product -are placed into doubleword element @code{i} of the vector returned. 
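The high-half multiply just described is a common building block for
fixed-point scaling; a hedged sketch (assuming @code{<altivec.h>} and a
@option{-mcpu=power10} target):

@smallexample
/* Multiply each element by a Q32 fixed-point fraction and keep the
   integer part, that is, the high 32 bits of each 64-bit product.  */
vector unsigned int
scale_q32 (vector unsigned int x, vector unsigned int frac_q32)
@{
  return vec_mulh (x, frac_q32);
@}
@end smallexample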
+vector unsigned char vec_vminub (vector bool char, vector unsigned char); +vector unsigned char vec_vminub (vector unsigned char, vector bool char); +vector unsigned char vec_vminub (vector unsigned char, vector unsigned char); -@smallexample -@exdent vector unsigned long long -@exdent vec_mul (vector unsigned long long @var{a}, vector unsigned long long @var{b}); -@exdent vector signed long long -@exdent vec_mul (vector signed long long @var{a}, vector signed long long @var{b}); -@end smallexample +vector unsigned short vec_vminuh (vector bool short, vector unsigned short); +vector unsigned short vec_vminuh (vector unsigned short, vector bool short); +vector unsigned short vec_vminuh (vector unsigned short, vector unsigned short); -For each integer value @code{i} from 0 to 1, do the following. The integer -value in doubleword element @code{i} of a is multiplied by the integer value in -doubleword element @code{i} of b. The low-order 64 bits of the 128-bit product -are placed into doubleword element @code{i} of the vector returned. +vector unsigned int vec_vminuw (vector bool int, vector unsigned int); +vector unsigned int vec_vminuw (vector unsigned int, vector bool int); +vector unsigned int vec_vminuw (vector unsigned int, vector unsigned int); -@smallexample -@exdent vector signed int -@exdent vec_div (vector signed int @var{a}, vector signed int @var{b}); -@exdent vector unsigned int -@exdent vec_div (vector unsigned int @var{a}, vector unsigned int @var{b}); -@end smallexample +vector bool char vec_vmrghb (vector bool char, vector bool char); +vector signed char vec_vmrghb (vector signed char, vector signed char); +vector unsigned char vec_vmrghb (vector unsigned char, vector unsigned char); -For each integer value @code{i} from 0 to 3, do the following. The integer in -word element @code{i} of a is divided by the integer in word element @code{i} -of b. The unique integer quotient is placed into the word element @code{i} of -the vector returned. If an attempt is made to perform any of the divisions -<anything> ÷ 0 then the quotient is undefined. +vector bool short vec_vmrghh (vector bool short, vector bool short); +vector signed short vec_vmrghh (vector signed short, vector signed short); +vector unsigned short vec_vmrghh (vector unsigned short, vector unsigned short); +vector pixel vec_vmrghh (vector pixel, vector pixel); -@smallexample -@exdent vector signed long long -@exdent vec_div (vector signed long long @var{a}, vector signed long long @var{b}); -@exdent vector unsigned long long -@exdent vec_div (vector unsigned long long @var{a}, vector unsigned long long @var{b}); -@end smallexample +vector float vec_vmrghw (vector float, vector float); +vector bool int vec_vmrghw (vector bool int, vector bool int); +vector signed int vec_vmrghw (vector signed int, vector signed int); +vector unsigned int vec_vmrghw (vector unsigned int, vector unsigned int); -For each integer value @code{i} from 0 to 1, do the following. The integer in -doubleword element @code{i} of a is divided by the integer in doubleword -element @code{i} of b. The unique integer quotient is placed into the -doubleword element @code{i} of the vector returned. If an attempt is made to -perform any of the divisions 0x8000_0000_0000_0000 ÷ -1 or <anything> ÷ 0 then -the quotient is undefined.
+vector bool char vec_vmrglb (vector bool char, vector bool char); +vector signed char vec_vmrglb (vector signed char, vector signed char); +vector unsigned char vec_vmrglb (vector unsigned char, vector unsigned char); -@smallexample -@exdent vector signed int -@exdent vec_dive (vector signed int @var{a}, vector signed int @var{b}); -@exdent vector unsigned int -@exdent vec_dive (vector unsigned int @var{a}, vector unsigned int @var{b}); -@end smallexample +vector bool short vec_vmrglh (vector bool short, vector bool short); +vector signed short vec_vmrglh (vector signed short, vector signed short); +vector unsigned short vec_vmrglh (vector unsigned short, vector unsigned short); +vector pixel vec_vmrglh (vector pixel, vector pixel); -For each integer value @code{i} from 0 to 3, do the following. The integer in -word element @code{i} of a is shifted left by 32 bits, then divided by the -integer in word element @code{i} of b. The unique integer quotient is placed -into the word element @code{i} of the vector returned. If the quotient cannot -be represented in 32 bits, or if an attempt is made to perform any of the -divisions <anything> ÷ 0 then the quotient is undefined. +vector float vec_vmrglw (vector float, vector float); +vector signed int vec_vmrglw (vector signed int, vector signed int); +vector unsigned int vec_vmrglw (vector unsigned int, vector unsigned int); +vector bool int vec_vmrglw (vector bool int, vector bool int); -@smallexample -@exdent vector signed long long -@exdent vec_dive (vector signed long long @var{a}, vector signed long long @var{b}); -@exdent vector unsigned long long -@exdent vec_dive (vector unsigned long long @var{a}, vector unsigned long long @var{b}); -@end smallexample +vector signed int vec_vmsummbm (vector signed char, vector unsigned char, + vector signed int); -For each integer value @code{i} from 0 to 1, do the following. The integer in -doubleword element @code{i} of a is shifted left by 64 bits, then divided by -the integer in doubleword element @code{i} of b. The unique integer quotient is -placed into the doubleword element @code{i} of the vector returned. If the -quotient cannot be represented in 64 bits, or if an attempt is made to perform -<anything> ÷ 0 then the quotient is undefined. +vector signed int vec_vmsumshm (vector signed short, vector signed short, + vector signed int); -@smallexample -@exdent vector signed int -@exdent vec_mod (vector signed int @var{a}, vector signed int @var{b}); -@exdent vector unsigned int -@exdent vec_mod (vector unsigned int @var{a}, vector unsigned int @var{b}); -@end smallexample +vector signed int vec_vmsumshs (vector signed short, vector signed short, + vector signed int); -For each integer value @code{i} from 0 to 3, do the following. The integer in -word element @code{i} of a is divided by the integer in word element @code{i} -of b. The unique integer remainder is placed into the word element @code{i} of -the vector returned. If an attempt is made to perform any of the divisions -0x8000_0000 ÷ -1 or <anything> ÷ 0 then the remainder is undefined.
+vector unsigned int vec_vmsumubm (vector unsigned char, vector unsigned char,
+                                  vector unsigned int);
-@smallexample
-@exdent vector signed long long
-@exdent vec_mod (vector signed long long @var{a}, vector signed long long @var{b});
-@exdent vector unsigned long long
-@exdent vec_mod (vector unsigned long long @var{a}, vector unsigned long long @var{b});
-@end smallexample
+vector unsigned int vec_vmsumuhm (vector unsigned short, vector unsigned short,
+                                  vector unsigned int);
-For each integer value @code{i} from 0 to 1, do the following. The integer in
-doubleword element @code{i} of a is divided by the integer in doubleword
-element @code{i} of b. The unique integer remainder is placed into the
-doubleword element @code{i} of the vector returned. If an attempt is made to
-divide by 0, the remainder is undefined.
+vector unsigned int vec_vmsumuhs (vector unsigned short, vector unsigned short,
+                                  vector unsigned int);
-These built-ins generate a PCV (Permute Control Vector) from the specified
-mask size, as if implemented by the @code{xxgenpcvbm}, @code{xxgenpcvhm}, or
-@code{xxgenpcvwm} instructions, where the immediate value is 0, 1, 2, or 3.
-@findex vec_genpcvm
+vector signed short vec_vmulesb (vector signed char, vector signed char);
-@smallexample
-@exdent vector unsigned __int128 vec_rl (vector unsigned __int128 @var{A},
-                                         vector unsigned __int128 @var{B});
-@exdent vector signed __int128 vec_rl (vector signed __int128 @var{A},
-                                       vector unsigned __int128 @var{B});
-@end smallexample
+vector signed int vec_vmulesh (vector signed short, vector signed short);
-Result value: Each element of @var{R} is obtained by rotating the
-corresponding element of @var{A} left by the number of bits specified by the
-corresponding element of @var{B}.
+vector unsigned short vec_vmuleub (vector unsigned char, vector unsigned char);
+vector unsigned int vec_vmuleuh (vector unsigned short, vector unsigned short);
-@smallexample
-@exdent vector unsigned __int128 vec_rlmi (vector unsigned __int128,
-                                           vector unsigned __int128,
-                                           vector unsigned __int128);
-@exdent vector signed __int128 vec_rlmi (vector signed __int128,
-                                         vector signed __int128,
-                                         vector unsigned __int128);
-@end smallexample
+vector signed short vec_vmulosb (vector signed char, vector signed char);
-Returns the result of rotating the first input and inserting it under mask
-into the second input. The first and last bits of the mask are obtained from
-the two 7-bit fields, bits [108:115] and bits [117:123] respectively, of the
-second input. The shift is obtained from the third input in the 7-bit field
-bits [125:131], where bits are numbered from zero starting at the left.
+vector signed int vec_vmulosh (vector signed short, vector signed short);
-@smallexample
-@exdent vector unsigned __int128 vec_rlnm (vector unsigned __int128,
-                                           vector unsigned __int128,
-                                           vector unsigned __int128);
-@exdent vector signed __int128 vec_rlnm (vector signed __int128,
-                                         vector unsigned __int128,
-                                         vector unsigned __int128);
-@end smallexample
+vector unsigned short vec_vmuloub (vector unsigned char, vector unsigned char);
-Returns the result of rotating the first input and ANDing it with a mask. The
-first and last bits of the mask are obtained from the two 7-bit fields, bits
-[117:123] and bits [125:131] respectively, of the second input. The shift is
-obtained from the third input in the 7-bit field bits [125:131], where bits
-are numbered from zero starting at the left.
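As a hedged sketch of the 128-bit @code{vec_rl} rotation described above
(wrapper name invented; assumes ISA 3.1 support):

@smallexample
#include <altivec.h>

/* Illustrative only: rotate the single 128-bit element of A left by
   the count held in the corresponding element of B.  */
vector unsigned __int128
rotate_quadword (vector unsigned __int128 a, vector unsigned __int128 b)
@{
  return vec_rl (a, b);
@}
@end smallexample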
+vector unsigned int vec_vmulouh (vector unsigned short, vector unsigned short); -@smallexample -@exdent vector unsigned __int128 vec_sl(vector unsigned __int128 @var{A}, vector unsigned __int128 @var{B}); -@exdent vector signed __int128 vec_sl(vector signed __int128 @var{A}, vector unsigned __int128 @var{B}); -@end smallexample +vector signed char vec_vpkshss (vector signed short, vector signed short); -Result value: Each element of @var{R} is obtained by shifting the corresponding element of -@var{A} left by the number of bits specified by the corresponding element of @var{B}. +vector unsigned char vec_vpkshus (vector signed short, vector signed short); -@smallexample -@exdent vector unsigned __int128 vec_sr(vector unsigned __int128 @var{A}, vector unsigned __int128 @var{B}); -@exdent vector signed __int128 vec_sr(vector signed __int128 @var{A}, vector unsigned __int128 @var{B}); -@end smallexample +vector signed short vec_vpkswss (vector signed int, vector signed int); -Result value: Each element of @var{R} is obtained by shifting the corresponding element of -@var{A} right by the number of bits specified by the corresponding element of @var{B}. +vector unsigned short vec_vpkswus (vector signed int, vector signed int); -@smallexample -@exdent vector unsigned __int128 vec_sra(vector unsigned __int128 @var{A}, vector unsigned __int128 @var{B}); -@exdent vector signed __int128 vec_sra(vector signed __int128 @var{A}, vector unsigned __int128 @var{B}); -@end smallexample +vector bool char vec_vpkuhum (vector bool short, vector bool short); +vector signed char vec_vpkuhum (vector signed short, vector signed short); +vector unsigned char vec_vpkuhum (vector unsigned short, vector unsigned short); -Result value: Each element of @var{R} is obtained by arithmetic shifting the corresponding -element of @var{A} right by the number of bits specified by the corresponding element of @var{B}. +vector unsigned char vec_vpkuhus (vector unsigned short, vector unsigned short); -@smallexample -@exdent vector unsigned __int128 vec_mule (vector unsigned long long, - vector unsigned long long); -@exdent vector signed __int128 vec_mule (vector signed long long, - vector signed long long); -@end smallexample +vector bool short vec_vpkuwum (vector bool int, vector bool int); +vector signed short vec_vpkuwum (vector signed int, vector signed int); +vector unsigned short vec_vpkuwum (vector unsigned int, vector unsigned int); + +vector unsigned short vec_vpkuwus (vector unsigned int, vector unsigned int); + +vector signed char vec_vrlb (vector signed char, vector unsigned char); +vector unsigned char vec_vrlb (vector unsigned char, vector unsigned char); + +vector signed short vec_vrlh (vector signed short, vector unsigned short); +vector unsigned short vec_vrlh (vector unsigned short, vector unsigned short); + +vector signed int vec_vrlw (vector signed int, vector unsigned int); +vector unsigned int vec_vrlw (vector unsigned int, vector unsigned int); -Returns a vector containing a 128-bit integer result of multiplying the even -doubleword elements of the two inputs. 
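As a hedged sketch of the even-doubleword @code{vec_mule} form just
described (wrapper name invented; assumes ISA 3.1, e.g.
@option{-mcpu=power10}):

@smallexample
#include <altivec.h>

/* Illustrative only: full 128-bit product of the even doubleword
   elements of the two inputs.  */
vector unsigned __int128
mul_even_doublewords (vector unsigned long long a,
                      vector unsigned long long b)
@{
  return vec_mule (a, b);
@}
@end smallexample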
+vector signed char vec_vslb (vector signed char, vector unsigned char);
+vector unsigned char vec_vslb (vector unsigned char, vector unsigned char);
-@smallexample
-@exdent vector unsigned __int128 vec_mulo (vector unsigned long long,
-                                           vector unsigned long long);
-@exdent vector signed __int128 vec_mulo (vector signed long long,
-                                         vector signed long long);
-@end smallexample
+vector signed short vec_vslh (vector signed short, vector unsigned short);
+vector unsigned short vec_vslh (vector unsigned short, vector unsigned short);
-Returns a vector containing a 128-bit integer result of multiplying the odd
-doubleword elements of the two inputs.
+vector signed int vec_vslw (vector signed int, vector unsigned int);
+vector unsigned int vec_vslw (vector unsigned int, vector unsigned int);
-@smallexample
-@exdent vector unsigned __int128 vec_div (vector unsigned __int128,
-                                          vector unsigned __int128);
-@exdent vector signed __int128 vec_div (vector signed __int128,
-                                        vector signed __int128);
-@end smallexample
+vector signed char vec_vspltb (vector signed char, const int);
+vector unsigned char vec_vspltb (vector unsigned char, const int);
+vector bool char vec_vspltb (vector bool char, const int);
-Returns the result of dividing the first operand by the second operand. An
-attempt to divide any value by zero or to divide the most negative signed
-128-bit integer by negative one results in an undefined value.
+vector bool short vec_vsplth (vector bool short, const int);
+vector signed short vec_vsplth (vector signed short, const int);
+vector unsigned short vec_vsplth (vector unsigned short, const int);
+vector pixel vec_vsplth (vector pixel, const int);
-@smallexample
-@exdent vector unsigned __int128 vec_dive (vector unsigned __int128,
-                                           vector unsigned __int128);
-@exdent vector signed __int128 vec_dive (vector signed __int128,
-                                         vector signed __int128);
-@end smallexample
+vector float vec_vspltw (vector float, const int);
+vector signed int vec_vspltw (vector signed int, const int);
+vector unsigned int vec_vspltw (vector unsigned int, const int);
+vector bool int vec_vspltw (vector bool int, const int);
-The result is produced by shifting the first input left by 128 bits and
-dividing by the second. If an attempt is made to divide by zero or the result
-is larger than 128 bits, the result is undefined.
+vector signed char vec_vsrab (vector signed char, vector unsigned char);
+vector unsigned char vec_vsrab (vector unsigned char, vector unsigned char);
-@smallexample
-@exdent vector unsigned __int128 vec_mod (vector unsigned __int128,
-                                          vector unsigned __int128);
-@exdent vector signed __int128 vec_mod (vector signed __int128,
-                                        vector signed __int128);
-@end smallexample
+vector signed short vec_vsrah (vector signed short, vector unsigned short);
+vector unsigned short vec_vsrah (vector unsigned short, vector unsigned short);
-The result is the remainder of dividing the first input by the second input.
+vector signed int vec_vsraw (vector signed int, vector unsigned int);
+vector unsigned int vec_vsraw (vector unsigned int, vector unsigned int);
-The following built-ins perform 128-bit vector comparisons. The
-@code{vec_all_xx}, @code{vec_any_xx}, and @code{vec_cmpxx} built-ins, where
-@code{xx} is one of the operations @code{eq, ne, gt, lt, ge, le}, perform
-pairwise comparisons between the elements at the same positions within their
-two vector arguments. The @code{vec_all_xx} function returns a non-zero value
-if and only if all pairwise comparisons are true. The @code{vec_any_xx}
-function returns a non-zero value if and only if at least one pairwise
-comparison is true. The @code{vec_cmpxx} function returns a vector of the
-same type as its two arguments, within which each element consists of all
-ones to denote that the specified logical comparison of the corresponding
-elements was true. Otherwise, the element of the returned vector contains
-all zeros.
+vector signed char vec_vsrb (vector signed char, vector unsigned char);
+vector unsigned char vec_vsrb (vector unsigned char, vector unsigned char);
-@smallexample
-vector bool __int128 vec_cmpeq (vector signed __int128, vector signed __int128);
-vector bool __int128 vec_cmpeq (vector unsigned __int128, vector unsigned __int128);
-vector bool __int128 vec_cmpne (vector signed __int128, vector signed __int128);
-vector bool __int128 vec_cmpne (vector unsigned __int128, vector unsigned __int128);
-vector bool __int128 vec_cmpgt (vector signed __int128, vector signed __int128);
-vector bool __int128 vec_cmpgt (vector unsigned __int128, vector unsigned __int128);
-vector bool __int128 vec_cmplt (vector signed __int128, vector signed __int128);
-vector bool __int128 vec_cmplt (vector unsigned __int128, vector unsigned __int128);
-vector bool __int128 vec_cmpge (vector signed __int128, vector signed __int128);
-vector bool __int128 vec_cmpge (vector unsigned __int128, vector unsigned __int128);
-vector bool __int128 vec_cmple (vector signed __int128, vector signed __int128);
-vector bool __int128 vec_cmple (vector unsigned __int128, vector unsigned __int128);
+vector signed short vec_vsrh (vector signed short, vector unsigned short);
+vector unsigned short vec_vsrh (vector unsigned short, vector unsigned short);
-int vec_all_eq (vector signed __int128, vector signed __int128);
-int vec_all_eq (vector unsigned __int128, vector unsigned __int128);
-int vec_all_ne (vector signed __int128, vector signed __int128);
-int vec_all_ne (vector unsigned __int128, vector unsigned __int128);
-int vec_all_gt (vector signed __int128, vector signed __int128);
-int vec_all_gt (vector unsigned __int128, vector unsigned __int128);
-int vec_all_lt (vector signed __int128, vector signed __int128);
-int vec_all_lt (vector unsigned __int128, vector unsigned __int128);
-int vec_all_ge (vector signed __int128, vector signed __int128);
-int vec_all_ge (vector unsigned __int128, vector unsigned __int128);
-int vec_all_le (vector signed __int128, vector signed __int128);
-int vec_all_le (vector unsigned __int128, vector unsigned __int128);
+vector signed int vec_vsrw (vector signed int, vector unsigned int);
+vector unsigned int vec_vsrw (vector unsigned int, vector unsigned int);
-int vec_any_eq (vector signed __int128, vector signed __int128);
-int vec_any_eq (vector unsigned __int128, vector unsigned __int128);
-int vec_any_ne (vector signed __int128, vector signed __int128);
-int vec_any_ne (vector unsigned __int128, vector unsigned __int128);
-int vec_any_gt (vector signed __int128, vector signed __int128);
-int vec_any_gt (vector unsigned __int128, vector unsigned __int128);
-int vec_any_lt (vector signed __int128, vector signed __int128);
-int vec_any_lt (vector unsigned __int128, vector unsigned __int128);
-int vec_any_ge (vector signed __int128, vector signed __int128);
-int vec_any_ge (vector unsigned __int128, vector unsigned __int128);
-int vec_any_le (vector signed __int128, vector signed __int128);
-int vec_any_le (vector unsigned __int128, vector unsigned __int128);
-@end smallexample
+vector float vec_vsubfp (vector float, vector float);
+vector signed char vec_vsubsbs (vector bool char, vector signed char);
+vector signed char vec_vsubsbs (vector signed char, vector bool char);
+vector signed char vec_vsubsbs (vector signed char, vector signed char);
-The following instances are extensions of the existing overloaded built-ins
-@code{vec_sld}, @code{vec_sldw}, @code{vec_slo}, @code{vec_sro}, and
-@code{vec_srl} that are documented in the PVIPR.
+vector signed short vec_vsubshs (vector bool short, vector signed short);
+vector signed short vec_vsubshs (vector signed short, vector bool short);
+vector signed short vec_vsubshs (vector signed short, vector signed short);
-@smallexample
-@exdent vector signed __int128 vec_sld (vector signed __int128,
-vector signed __int128, const unsigned int);
-@exdent vector unsigned __int128 vec_sld (vector unsigned __int128,
-vector unsigned __int128, const unsigned int);
-@exdent vector signed __int128 vec_sldw (vector signed __int128,
-vector signed __int128, const unsigned int);
-@exdent vector unsigned __int128 vec_sldw (vector unsigned __int128,
-vector unsigned __int128, const unsigned int);
-@exdent vector signed __int128 vec_slo (vector signed __int128,
-vector signed char);
-@exdent vector signed __int128 vec_slo (vector signed __int128,
-vector unsigned char);
-@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
-vector signed char);
-@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
-vector unsigned char);
-@exdent vector signed __int128 vec_sro (vector signed __int128,
-vector signed char);
-@exdent vector signed __int128 vec_sro (vector signed __int128,
-vector unsigned char);
-@exdent vector unsigned __int128 vec_sro (vector unsigned __int128,
-vector signed char);
-@exdent vector unsigned __int128 vec_sro (vector unsigned __int128,
-vector unsigned char);
-@exdent vector signed __int128 vec_srl (vector signed __int128,
-vector unsigned char);
-@exdent vector unsigned __int128 vec_srl (vector unsigned __int128,
-vector unsigned char);
-@end smallexample
+vector signed int vec_vsubsws (vector bool int, vector signed int);
+vector signed int vec_vsubsws (vector signed int, vector bool int);
+vector signed int vec_vsubsws (vector signed int, vector signed int);
-@node PowerPC Hardware Transactional Memory Built-in Functions
-@subsection PowerPC Hardware Transactional Memory Built-in Functions
-GCC provides two interfaces for accessing the Hardware Transactional
-Memory (HTM) instructions available on some of the PowerPC family
-of processors (e.g., POWER8). The two interfaces are a low-level
-interface, consisting of built-in functions specific to PowerPC, and a
-higher-level interface, consisting of inline functions that are common
-between PowerPC and S/390.
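As a hedged sketch of the 128-bit comparison built-ins listed above
(wrapper name invented; assumes ISA 3.1 support):

@smallexample
#include <altivec.h>

/* Illustrative only: vec_all_eq reduces the pairwise comparison of
   the single 128-bit elements to a scalar truth value.  */
int
quadwords_equal (vector unsigned __int128 a, vector unsigned __int128 b)
@{
  return vec_all_eq (a, b);
@}
@end smallexample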
+vector signed char vec_vsububm (vector bool char, vector signed char); +vector signed char vec_vsububm (vector signed char, vector bool char); +vector signed char vec_vsububm (vector signed char, vector signed char); +vector unsigned char vec_vsububm (vector bool char, vector unsigned char); +vector unsigned char vec_vsububm (vector unsigned char, vector bool char); +vector unsigned char vec_vsububm (vector unsigned char, vector unsigned char); -@subsubsection PowerPC HTM Low Level Built-in Functions +vector unsigned char vec_vsububs (vector bool char, vector unsigned char); +vector unsigned char vec_vsububs (vector unsigned char, vector bool char); +vector unsigned char vec_vsububs (vector unsigned char, vector unsigned char); -The following low level built-in functions are available with -@option{-mhtm} or @option{-mcpu=CPU} where CPU is `power8' or later. -They all generate the machine instruction that is part of the name. +vector signed short vec_vsubuhm (vector bool short, vector signed short); +vector signed short vec_vsubuhm (vector signed short, vector bool short); +vector signed short vec_vsubuhm (vector signed short, vector signed short); +vector unsigned short vec_vsubuhm (vector bool short, vector unsigned short); +vector unsigned short vec_vsubuhm (vector unsigned short, vector bool short); +vector unsigned short vec_vsubuhm (vector unsigned short, vector unsigned short); -The HTM builtins (with the exception of @code{__builtin_tbegin}) return -the full 4-bit condition register value set by their associated hardware -instruction. The header file @code{htmintrin.h} defines some macros that can -be used to decipher the return value. The @code{__builtin_tbegin} builtin -returns a simple @code{true} or @code{false} value depending on whether a transaction was -successfully started or not. The arguments of the builtins match exactly the -type and order of the associated hardware instruction's operands, except for -the @code{__builtin_tcheck} builtin, which does not take any input arguments. -Refer to the ISA manual for a description of each instruction's operands. 
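A complete retry loop appears in the example later in this section; as a
minimal hedged sketch of the basic @code{__builtin_tbegin}/@code{__builtin_tend}
pattern (function name invented; assumes @option{-mhtm}):

@smallexample
#include <htmintrin.h>

/* Illustrative only: returns 1 if the body ran and committed
   transactionally, 0 if the transaction failed to start or aborted.  */
static int
run_once_transactionally (void)
@{
  if (__builtin_tbegin (0))
    @{
      /* ... transactional body ... */
      __builtin_tend (0);
      return 1;
    @}
  return 0;
@}
@end smallexample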
+vector unsigned short vec_vsubuhs (vector bool short, vector unsigned short);
+vector unsigned short vec_vsubuhs (vector unsigned short, vector bool short);
+vector unsigned short vec_vsubuhs (vector unsigned short, vector unsigned short);
-@smallexample
-unsigned int __builtin_tbegin (unsigned int);
-unsigned int __builtin_tend (unsigned int);
+vector signed int vec_vsubuwm (vector bool int, vector signed int);
+vector signed int vec_vsubuwm (vector signed int, vector bool int);
+vector signed int vec_vsubuwm (vector signed int, vector signed int);
+vector unsigned int vec_vsubuwm (vector bool int, vector unsigned int);
+vector unsigned int vec_vsubuwm (vector unsigned int, vector bool int);
+vector unsigned int vec_vsubuwm (vector unsigned int, vector unsigned int);
-unsigned int __builtin_tabort (unsigned int);
-unsigned int __builtin_tabortdc (unsigned int, unsigned int, unsigned int);
-unsigned int __builtin_tabortdci (unsigned int, unsigned int, int);
-unsigned int __builtin_tabortwc (unsigned int, unsigned int, unsigned int);
-unsigned int __builtin_tabortwci (unsigned int, unsigned int, int);
+vector unsigned int vec_vsubuws (vector bool int, vector unsigned int);
+vector unsigned int vec_vsubuws (vector unsigned int, vector bool int);
+vector unsigned int vec_vsubuws (vector unsigned int, vector unsigned int);
-unsigned int __builtin_tcheck (void);
-unsigned int __builtin_treclaim (unsigned int);
-unsigned int __builtin_trechkpt (void);
-unsigned int __builtin_tsr (unsigned int);
-@end smallexample
+vector signed int vec_vsum4sbs (vector signed char, vector signed int);
-In addition to the above HTM built-ins, we have added built-ins for
-some common extended mnemonics of the HTM instructions:
+vector signed int vec_vsum4shs (vector signed short, vector signed int);
-@smallexample
-unsigned int __builtin_tendall (void);
-unsigned int __builtin_tresume (void);
-unsigned int __builtin_tsuspend (void);
-@end smallexample
+vector unsigned int vec_vsum4ubs (vector unsigned char, vector unsigned int);
-Note that the semantics of the above HTM builtins are required to mimic
-the locking semantics used for critical sections. Builtins that are used
-to create a new transaction or restart a suspended transaction must have
-lock-acquisition-like semantics, while those builtins that end or suspend a
-transaction must have lock-release-like semantics. Specifically, this must
-mimic lock semantics as specified by C++11, for example: Lock acquisition is
-as-if an execution of __atomic_exchange_n(&globallock,1,__ATOMIC_ACQUIRE)
-that returns 0, and lock release is as-if an execution of
-__atomic_store(&globallock,0,__ATOMIC_RELEASE), with globallock being an
-implicit implementation-defined lock used for all transactions. The HTM
-instructions associated with the builtins inherently provide the
-correct acquisition and release hardware barriers required. However,
-the compiler must also be prohibited from moving loads and stores across
-the builtins in a way that would violate their semantics. This has been
-accomplished by adding memory barriers to the associated HTM instructions
-(which is a conservative approach to provide acquire and release semantics).
-Earlier versions of the compiler did not treat the HTM instructions as
-memory barriers. A @code{__TM_FENCE__} macro has been added, which can
-be used to determine whether the current compiler treats HTM instructions
-as memory barriers or not.
This allows the user to explicitly add memory
-barriers to their code when using an older version of the compiler.
+vector unsigned int vec_vupkhpx (vector pixel);
+
+vector bool short vec_vupkhsb (vector bool char);
+vector signed short vec_vupkhsb (vector signed char);
+
+vector bool int vec_vupkhsh (vector bool short);
+vector signed int vec_vupkhsh (vector signed short);
-The following set of built-in functions are available to gain access
-to the HTM specific special purpose registers.
+vector unsigned int vec_vupklpx (vector pixel);
-@smallexample
-unsigned long __builtin_get_texasr (void);
-unsigned long __builtin_get_texasru (void);
-unsigned long __builtin_get_tfhar (void);
-unsigned long __builtin_get_tfiar (void);
+vector bool short vec_vupklsb (vector bool char);
+vector signed short vec_vupklsb (vector signed char);
-void __builtin_set_texasr (unsigned long);
-void __builtin_set_texasru (unsigned long);
-void __builtin_set_tfhar (unsigned long);
-void __builtin_set_tfiar (unsigned long);
+vector bool int vec_vupklsh (vector bool short);
+vector signed int vec_vupklsh (vector signed short);
 @end smallexample
-Example usage of these low level built-in functions may look like:
+@node PowerPC AltiVec Built-in Functions Available on ISA 2.06
+@subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.06
-@smallexample
-#include <htmintrin.h>
+The AltiVec built-in functions described in this section are
+available on the PowerPC family of processors starting with ISA 2.06
+or later. These are normally enabled by adding @option{-mvsx} to the
+command line.
-int num_retries = 10;
+When @option{-mvsx} is used, the following additional vector types are
+implemented.
-while (1)
-  @{
-    if (__builtin_tbegin (0))
-      @{
-        /* Transaction State Initiated. */
-        if (is_locked (lock))
-          __builtin_tabort (0);
-        ... transaction code...
-        __builtin_tend (0);
-        break;
-      @}
-    else
-      @{
-        /* Transaction State Failed.  Use locks if the transaction
-           failure is "persistent" or we've tried too many times. */
-        if (num_retries-- <= 0
-            || _TEXASRU_FAILURE_PERSISTENT (__builtin_get_texasru ()))
-          @{
-            acquire_lock (lock);
-            ... non transactional fallback path...
-            release_lock (lock);
-            break;
-          @}
-      @}
-  @}
+@smallexample
+vector unsigned __int128
+vector signed __int128
+vector unsigned long long int
+vector signed long long int
+vector double
 @end smallexample
-One final built-in function has been added that returns the value of
-the 2-bit Transaction State field of the Machine Status Register (MSR)
-as stored in @code{CR0}.
+The long long types are only implemented for 64-bit code generation.
+
+Only functions excluded from the PVIPR are listed here.
 @smallexample
-unsigned long __builtin_ttest (void);
-@end smallexample
+void vec_dst (const unsigned long *, int, const int);
+void vec_dst (const long *, int, const int);
-This built-in can be used to determine the current transaction state
-using the following code example:
+void vec_dststt (const unsigned long *, int, const int);
+void vec_dststt (const long *, int, const int);
-@smallexample
-#include <htmintrin.h>
+void vec_dstt (const unsigned long *, int, const int);
+void vec_dstt (const long *, int, const int);
-unsigned char tx_state = _HTM_STATE (__builtin_ttest ());
+vector unsigned char vec_lvsl (int, const unsigned long *);
+vector unsigned char vec_lvsl (int, const long *);
-if (tx_state == _HTM_TRANSACTIONAL)
-  @{
-    /* Code to use in transactional state. */
-  @}
-else if (tx_state == _HTM_NONTRANSACTIONAL)
-  @{
-    /* Code to use in non-transactional state. */
-  @}
-else if (tx_state == _HTM_SUSPENDED)
-  @{
-    /* Code to use in transaction suspended state. */
-  @}
-@end smallexample
+vector unsigned char vec_lvsr (int, const unsigned long *);
+vector unsigned char vec_lvsr (int, const long *);
-@subsubsection PowerPC HTM High Level Inline Functions
+vector unsigned char vec_lvsl (int, const double *);
+vector unsigned char vec_lvsr (int, const double *);
-The following high level HTM interface is made available by including
-@code{htmxlintrin.h} and using @option{-mhtm} or @option{-mcpu=CPU}
-where CPU is `power8' or later. This interface is common between PowerPC
-and S/390, allowing users to write one HTM source implementation that
-can be compiled and executed on either system.
+vector double vec_vsx_ld (int, const vector double *);
+vector double vec_vsx_ld (int, const double *);
+vector float vec_vsx_ld (int, const vector float *);
+vector float vec_vsx_ld (int, const float *);
+vector bool int vec_vsx_ld (int, const vector bool int *);
+vector signed int vec_vsx_ld (int, const vector signed int *);
+vector signed int vec_vsx_ld (int, const int *);
+vector signed int vec_vsx_ld (int, const long *);
+vector unsigned int vec_vsx_ld (int, const vector unsigned int *);
+vector unsigned int vec_vsx_ld (int, const unsigned int *);
+vector unsigned int vec_vsx_ld (int, const unsigned long *);
+vector bool short vec_vsx_ld (int, const vector bool short *);
+vector pixel vec_vsx_ld (int, const vector pixel *);
+vector signed short vec_vsx_ld (int, const vector signed short *);
+vector signed short vec_vsx_ld (int, const short *);
+vector unsigned short vec_vsx_ld (int, const vector unsigned short *);
+vector unsigned short vec_vsx_ld (int, const unsigned short *);
+vector bool char vec_vsx_ld (int, const vector bool char *);
+vector signed char vec_vsx_ld (int, const vector signed char *);
+vector signed char vec_vsx_ld (int, const signed char *);
+vector unsigned char vec_vsx_ld (int, const vector unsigned char *);
+vector unsigned char vec_vsx_ld (int, const unsigned char *);
-@smallexample
-long __TM_simple_begin (void);
-long __TM_begin (void* const TM_buff);
-long __TM_end (void);
-void __TM_abort (void);
-void __TM_named_abort (unsigned char const code);
-void __TM_resume (void);
-void __TM_suspend (void);
+void vec_vsx_st (vector double, int, vector double *);
+void vec_vsx_st (vector double, int, double *);
+void vec_vsx_st (vector float, int, vector float *);
+void vec_vsx_st (vector float, int, float *);
+void vec_vsx_st (vector signed int, int, vector signed int *);
+void vec_vsx_st (vector signed int, int, int *);
+void vec_vsx_st (vector unsigned int, int, vector unsigned int *);
+void vec_vsx_st (vector unsigned int, int, unsigned int *);
+void vec_vsx_st (vector bool int, int, vector bool int *);
+void vec_vsx_st (vector bool int, int, unsigned int *);
+void vec_vsx_st (vector bool int, int, int *);
+void vec_vsx_st (vector signed short, int, vector signed short *);
+void vec_vsx_st (vector signed short, int, short *);
+void vec_vsx_st (vector unsigned short, int, vector unsigned short *);
+void vec_vsx_st (vector unsigned short, int, unsigned short *);
+void vec_vsx_st (vector bool short, int, vector bool short *);
+void vec_vsx_st (vector bool short, int, unsigned short *);
+void vec_vsx_st (vector pixel, int, vector pixel *);
+void vec_vsx_st (vector pixel, int, unsigned short *);
+void vec_vsx_st (vector pixel, int, short *);
+void vec_vsx_st (vector bool short, int, short *);
+void vec_vsx_st (vector signed char, int, vector signed char *);
+void vec_vsx_st (vector signed char, int, signed char *);
+void vec_vsx_st (vector unsigned char, int, vector unsigned char *);
+void vec_vsx_st (vector unsigned char, int, unsigned char *);
+void vec_vsx_st (vector bool char, int, vector bool char *);
+void vec_vsx_st (vector bool char, int, unsigned char *);
+void vec_vsx_st (vector bool char, int, signed char *);
-long __TM_is_user_abort (void* const TM_buff);
-long __TM_is_named_user_abort (void* const TM_buff, unsigned char *code);
-long __TM_is_illegal (void* const TM_buff);
-long __TM_is_footprint_exceeded (void* const TM_buff);
-long __TM_nesting_depth (void* const TM_buff);
-long __TM_is_nested_too_deep (void* const TM_buff);
-long __TM_is_conflict (void* const TM_buff);
-long __TM_is_failure_persistent (void* const TM_buff);
-long __TM_failure_address (void* const TM_buff);
-long long __TM_failure_code (void* const TM_buff);
+vector double vec_xxpermdi (vector double, vector double, const int);
+vector float vec_xxpermdi (vector float, vector float, const int);
+vector __int128 vec_xxpermdi (vector __int128,
+                              vector __int128, const int);
+vector __uint128 vec_xxpermdi (vector __uint128,
+                               vector __uint128, const int);
+vector long long vec_xxpermdi (vector long long, vector long long, const int);
+vector unsigned long long vec_xxpermdi (vector unsigned long long,
+                                        vector unsigned long long, const int);
+vector int vec_xxpermdi (vector int, vector int, const int);
+vector unsigned int vec_xxpermdi (vector unsigned int,
+                                  vector unsigned int, const int);
+vector short vec_xxpermdi (vector short, vector short, const int);
+vector unsigned short vec_xxpermdi (vector unsigned short,
+                                    vector unsigned short, const int);
+vector signed char vec_xxpermdi (vector signed char, vector signed char,
+                                 const int);
+vector unsigned char vec_xxpermdi (vector unsigned char,
+                                   vector unsigned char, const int);
+
+vector double vec_xxsldi (vector double, vector double, int);
+vector float vec_xxsldi (vector float, vector float, int);
+vector long long vec_xxsldi (vector long long, vector long long, int);
+vector unsigned long long vec_xxsldi (vector unsigned long long,
+                                      vector unsigned long long, int);
+vector int vec_xxsldi (vector int, vector int, int);
+vector unsigned int vec_xxsldi (vector unsigned int, vector unsigned int, int);
+vector short vec_xxsldi (vector short, vector short, int);
+vector unsigned short vec_xxsldi (vector unsigned short,
+                                  vector unsigned short, int);
+vector signed char vec_xxsldi (vector signed char, vector signed char, int);
+vector unsigned char vec_xxsldi (vector unsigned char,
+                                 vector unsigned char, int);
 @end smallexample
-Using this common set of HTM inline functions, we can create
-a more portable version of the HTM example in the previous
-section that will work on either PowerPC or S/390:
+Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
+generate the AltiVec @samp{LVX} and @samp{STVX} instructions even
+if the VSX instruction set is available. The @samp{vec_vsx_ld} and
+@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
+@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
 @smallexample
-#include <htmxlintrin.h>
-
-int num_retries = 10;
-TM_buff_type TM_buff;
-
-while (1)
-  @{
-    if (__TM_begin (TM_buff) == _HTM_TBEGIN_STARTED)
-      @{
-        /* Transaction State Initiated. */
-        if (is_locked (lock))
-          __TM_abort ();
-        ... transaction code...
-        __TM_end ();
-        break;
-      @}
-    else
-      @{
-        /* Transaction State Failed.  Use locks if the transaction
-           failure is "persistent" or we've tried too many times. */
-        if (num_retries-- <= 0
-            || __TM_is_failure_persistent (TM_buff))
-          @{
-            acquire_lock (lock);
-            ... non transactional fallback path...
-            release_lock (lock);
-            break;
-          @}
-      @}
-  @}
+vector signed long long vec_signedo (vector float);
+vector signed long long vec_signede (vector float);
+vector unsigned long long vec_unsignedo (vector float);
+vector unsigned long long vec_unsignede (vector float);
 @end smallexample
-@node PowerPC Atomic Memory Operation Functions
-@subsection PowerPC Atomic Memory Operation Functions
-ISA 3.0 of the PowerPC added new atomic memory operation (amo)
-instructions. GCC provides support for these instructions in 64-bit
-environments. All of the functions are declared in the include file
-@code{amo.h}.
+The overloaded built-ins @code{vec_signedo} and @code{vec_signede} are
+additional extensions to the built-ins as documented in the PVIPR.
+
+@node PowerPC AltiVec Built-in Functions Available on ISA 2.07
+@subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.07
-The functions supported are:
+If the ISA 2.07 additions to the vector/scalar (power8-vector)
+instruction set are available, the following additional functions are
+available for both 32-bit and 64-bit targets. For 64-bit targets, you
+can use @var{vector long} instead of @var{vector long long},
+@var{vector bool long} instead of @var{vector bool long long}, and
+@var{vector unsigned long} instead of @var{vector unsigned long long}.
+
+Only functions excluded from the PVIPR are listed here.
 @smallexample
-#include <amo.h>
+vector long long vec_vaddudm (vector long long, vector long long);
+vector long long vec_vaddudm (vector bool long long, vector long long);
+vector long long vec_vaddudm (vector long long, vector bool long long);
+vector unsigned long long vec_vaddudm (vector unsigned long long,
+                                       vector unsigned long long);
+vector unsigned long long vec_vaddudm (vector bool long long,
+                                       vector unsigned long long);
+vector unsigned long long vec_vaddudm (vector unsigned long long,
+                                       vector bool long long);
-uint32_t amo_lwat_add (uint32_t *, uint32_t);
-uint32_t amo_lwat_xor (uint32_t *, uint32_t);
-uint32_t amo_lwat_ior (uint32_t *, uint32_t);
-uint32_t amo_lwat_and (uint32_t *, uint32_t);
-uint32_t amo_lwat_umax (uint32_t *, uint32_t);
-uint32_t amo_lwat_umin (uint32_t *, uint32_t);
-uint32_t amo_lwat_swap (uint32_t *, uint32_t);
+vector long long vec_vclz (vector long long);
+vector unsigned long long vec_vclz (vector unsigned long long);
+vector int vec_vclz (vector int);
+vector unsigned int vec_vclz (vector unsigned int);
+vector short vec_vclz (vector short);
+vector unsigned short vec_vclz (vector unsigned short);
+vector signed char vec_vclz (vector signed char);
+vector unsigned char vec_vclz (vector unsigned char);
-int32_t amo_lwat_sadd (int32_t *, int32_t);
-int32_t amo_lwat_smax (int32_t *, int32_t);
-int32_t amo_lwat_smin (int32_t *, int32_t);
-int32_t amo_lwat_sswap (int32_t *, int32_t);
+vector signed char vec_vclzb (vector signed char);
+vector unsigned char vec_vclzb (vector unsigned char);
-uint64_t amo_ldat_add (uint64_t *, uint64_t);
-uint64_t amo_ldat_xor (uint64_t *, uint64_t);
-uint64_t amo_ldat_ior (uint64_t *, uint64_t);
-uint64_t amo_ldat_and (uint64_t *, uint64_t);
-uint64_t amo_ldat_umax (uint64_t *, uint64_t);
-uint64_t amo_ldat_umin (uint64_t *, uint64_t);
-uint64_t amo_ldat_swap (uint64_t *, uint64_t);
+vector long long vec_vclzd (vector long long);
+vector unsigned long long vec_vclzd (vector unsigned long long);
-int64_t amo_ldat_sadd (int64_t *, int64_t);
-int64_t amo_ldat_smax (int64_t *, int64_t);
-int64_t amo_ldat_smin (int64_t *, int64_t);
-int64_t amo_ldat_sswap (int64_t *, int64_t);
+vector short vec_vclzh (vector short);
+vector unsigned short vec_vclzh (vector unsigned short);
-void amo_stwat_add (uint32_t *, uint32_t);
-void amo_stwat_xor (uint32_t *, uint32_t);
-void amo_stwat_ior (uint32_t *, uint32_t);
-void amo_stwat_and (uint32_t *, uint32_t);
-void amo_stwat_umax (uint32_t *, uint32_t);
-void amo_stwat_umin (uint32_t *, uint32_t);
+vector int vec_vclzw (vector int);
+vector unsigned int vec_vclzw (vector unsigned int);
-void amo_stwat_sadd (int32_t *, int32_t);
-void amo_stwat_smax (int32_t *, int32_t);
-void amo_stwat_smin (int32_t *, int32_t);
+vector signed char vec_vgbbd (vector signed char);
+vector unsigned char vec_vgbbd (vector unsigned char);
-void amo_stdat_add (uint64_t *, uint64_t);
-void amo_stdat_xor (uint64_t *, uint64_t);
-void amo_stdat_ior (uint64_t *, uint64_t);
-void amo_stdat_and (uint64_t *, uint64_t);
-void amo_stdat_umax (uint64_t *, uint64_t);
-void amo_stdat_umin (uint64_t *, uint64_t);
+vector long long vec_vmaxsd (vector long long, vector long long);
-void amo_stdat_sadd (int64_t *, int64_t);
-void amo_stdat_smax (int64_t *, int64_t);
-void amo_stdat_smin (int64_t *, int64_t);
-@end smallexample
+vector unsigned long long vec_vmaxud (vector unsigned long long,
+                                      vector unsigned long long);
-@node PowerPC Matrix-Multiply Assist Built-in Functions
-@subsection PowerPC Matrix-Multiply Assist Built-in Functions
-ISA 3.1 of the PowerPC added new Matrix-Multiply Assist (MMA) instructions.
-GCC provides support for these instructions through the following built-in
-functions which are enabled with the @option{-mmma} option. The @code{vec_t}
-type below is defined to be a normal vector unsigned char type. The
-@code{uint2}, @code{uint4} and @code{uint8} parameters are 2-bit, 4-bit and
-8-bit unsigned integer constants respectively. The compiler will verify that
-they are constants and that their values are within range.
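As a hedged sketch of how the accumulator-based MMA built-ins fit
together (function name invented; assumes @option{-mcpu=power10} and
@option{-mmma}):

@smallexample
#include <altivec.h>

typedef vector unsigned char vec_t;

/* Illustrative only: zero an accumulator, perform one rank-1
   single-precision update, then copy the four result rows out.  */
void
ger_once (vec_t a, vec_t b, vec_t result[4])
@{
  __vector_quad acc;
  __builtin_mma_xxsetaccz (&acc);
  __builtin_mma_xvf32gerpp (&acc, a, b);
  __builtin_mma_disassemble_acc (result, &acc);
@}
@end smallexample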
+vector long long vec_vminsd (vector long long, vector long long);
-The built-in functions supported are:
+vector unsigned long long vec_vminud (vector unsigned long long,
+                                      vector unsigned long long);
-@smallexample
-void __builtin_mma_xvi4ger8 (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvi8ger4 (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvi16ger2 (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvi16ger2s (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvf16ger2 (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvbf16ger2 (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvf32ger (__vector_quad *, vec_t, vec_t);
+vector int vec_vpksdss (vector long long, vector long long);
+vector unsigned int vec_vpksdss (vector long long, vector long long);
-void __builtin_mma_xvi4ger8pp (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvi8ger4pp (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvi8ger4spp (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvi16ger2pp (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvi16ger2spp (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvf16ger2pp (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvf16ger2pn (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvf16ger2np (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvf16ger2nn (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvbf16ger2pp (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvbf16ger2pn (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvbf16ger2np (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvbf16ger2nn (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvf32gerpp (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvf32gerpn (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvf32gernp (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvf32gernn (__vector_quad *, vec_t, vec_t);
+vector unsigned int vec_vpkudus (vector unsigned long long,
+                                 vector unsigned long long);
-void __builtin_mma_pmxvi4ger8 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint8);
-void __builtin_mma_pmxvi4ger8pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint8);
+vector int vec_vpkudum (vector long long, vector long long);
+vector unsigned int vec_vpkudum (vector unsigned long long,
+                                 vector unsigned long long);
+vector bool int vec_vpkudum (vector bool long long, vector bool long long);
-void __builtin_mma_pmxvi8ger4 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
-void __builtin_mma_pmxvi8ger4pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
-void __builtin_mma_pmxvi8ger4spp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
+vector long long vec_vpopcnt (vector long long);
+vector unsigned long long vec_vpopcnt (vector unsigned long long);
+vector int vec_vpopcnt (vector int);
+vector unsigned int vec_vpopcnt (vector unsigned int);
+vector short vec_vpopcnt (vector short);
+vector unsigned short vec_vpopcnt (vector unsigned short);
+vector signed char vec_vpopcnt (vector signed char);
+vector unsigned char vec_vpopcnt (vector unsigned char);
-void __builtin_mma_pmxvi16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvi16ger2s (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvf16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvbf16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+vector signed char vec_vpopcntb (vector signed char);
+vector unsigned char vec_vpopcntb (vector unsigned char);
-void __builtin_mma_pmxvi16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvi16ger2spp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvf16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvf16ger2pn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvf16ger2np (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvf16ger2nn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvbf16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvbf16ger2pn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvbf16ger2np (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvbf16ger2nn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+vector long long vec_vpopcntd (vector long long);
+vector unsigned long long vec_vpopcntd (vector unsigned long long);
-void __builtin_mma_pmxvf32ger (__vector_quad *, vec_t, vec_t, uint4, uint4);
-void __builtin_mma_pmxvf32gerpp (__vector_quad *, vec_t, vec_t, uint4, uint4);
-void __builtin_mma_pmxvf32gerpn (__vector_quad *, vec_t, vec_t, uint4, uint4);
-void __builtin_mma_pmxvf32gernp (__vector_quad *, vec_t, vec_t, uint4, uint4);
-void __builtin_mma_pmxvf32gernn (__vector_quad *, vec_t, vec_t, uint4, uint4);
+vector short vec_vpopcnth (vector short);
+vector unsigned short vec_vpopcnth (vector unsigned short);
-void __builtin_mma_xvf64ger (__vector_quad *, __vector_pair, vec_t);
-void __builtin_mma_xvf64gerpp (__vector_quad *, __vector_pair, vec_t);
-void __builtin_mma_xvf64gerpn (__vector_quad *, __vector_pair, vec_t);
-void __builtin_mma_xvf64gernp (__vector_quad *, __vector_pair, vec_t);
-void __builtin_mma_xvf64gernn (__vector_quad *, __vector_pair, vec_t);
+vector int vec_vpopcntw (vector int);
+vector unsigned int vec_vpopcntw (vector unsigned int);
-void __builtin_mma_pmxvf64ger (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
-void __builtin_mma_pmxvf64gerpp (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
-void __builtin_mma_pmxvf64gerpn (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
-void __builtin_mma_pmxvf64gernp (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
-void __builtin_mma_pmxvf64gernn (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
+vector long long vec_vrld (vector long long, vector unsigned long long);
+vector unsigned long long vec_vrld (vector unsigned long long,
+                                    vector unsigned long long);
-void __builtin_mma_xxmtacc (__vector_quad *);
-void __builtin_mma_xxmfacc (__vector_quad *);
-void __builtin_mma_xxsetaccz (__vector_quad *);
+vector long long vec_vsld (vector long long, vector unsigned long long);
+vector unsigned long long vec_vsld (vector unsigned long long,
+                                    vector unsigned long long);
-void __builtin_mma_build_acc (__vector_quad *, vec_t, vec_t, vec_t, vec_t);
-void __builtin_mma_disassemble_acc (void *, __vector_quad *);
+vector long long vec_vsrad (vector long long, vector unsigned long long);
+vector unsigned long long vec_vsrad (vector unsigned long long,
+                                     vector unsigned long long);
-void __builtin_vsx_build_pair (__vector_pair *, vec_t, vec_t);
-void __builtin_vsx_disassemble_pair (void *, __vector_pair *);
+vector long long vec_vsrd (vector long long, vector unsigned long long);
+vector unsigned long long vec_vsrd (vector unsigned long long,
+                                    vector unsigned long long);
-vec_t __builtin_vsx_xvcvspbf16 (vec_t);
-vec_t __builtin_vsx_xvcvbf16spn (vec_t);
+vector long long vec_vsubudm (vector long long, vector long long);
+vector long long vec_vsubudm (vector bool long long, vector long long);
+vector long long vec_vsubudm (vector long long, vector bool long long);
+vector unsigned long long vec_vsubudm (vector unsigned long long,
+                                       vector unsigned long long);
+vector unsigned long long vec_vsubudm (vector bool long long,
+                                       vector unsigned long long);
+vector unsigned long long vec_vsubudm (vector unsigned long long,
+                                       vector bool long long);
-__vector_pair __builtin_vsx_lxvp (size_t, __vector_pair *);
-void __builtin_vsx_stxvp (__vector_pair, size_t, __vector_pair *);
+vector long long vec_vupkhsw (vector int);
+vector unsigned long long vec_vupkhsw (vector unsigned int);
+
+vector long long vec_vupklsw (vector int);
+vector unsigned long long vec_vupklsw (vector unsigned int);
 @end smallexample
-@node PRU Built-in Functions
-@subsection PRU Built-in Functions
+If the ISA 2.07 additions to the vector/scalar (power8-vector)
+instruction set are available, the following additional functions are
+available for 64-bit targets. New vector types
+(@code{vector __int128} and @code{vector __uint128}) are available
+to hold the @code{__int128} and @code{__uint128} values used by these
+built-ins.
-GCC provides a couple of special built-in functions to aid in utilizing
-special PRU instructions.
+The normal vector extract and set operations work on
+@code{vector __int128} and @code{vector __uint128} types,
+but the index value must be 0.
-The built-in functions supported are:
+Only functions excluded from the PVIPR are listed here.
-@defbuiltin{void __delay_cycles (constant long long @var{cycles})}
-This inserts an instruction sequence that takes exactly @var{cycles}
-cycles (between 0 and 0xffffffff) to complete. The inserted sequence
-may use jumps, loops, or no-ops, and does not interfere with any other
-instructions. Note that @var{cycles} must be a compile-time constant
-integer; that is, you must pass a number, not a variable that may be
-optimized to a constant later. The number of cycles delayed by this
-builtin is exact.
-@enddefbuiltin
+@smallexample
+vector __int128 vec_vaddcuq (vector __int128, vector __int128);
+vector __uint128 vec_vaddcuq (vector __uint128, vector __uint128);
-@defbuiltin{void __halt (void)}
-This inserts a HALT instruction to stop processor execution.
-@enddefbuiltin
+vector __int128 vec_vadduqm (vector __int128, vector __int128);
+vector __uint128 vec_vadduqm (vector __uint128, vector __uint128);
+
+vector __int128 vec_vaddecuq (vector __int128, vector __int128,
+                              vector __int128);
+vector __uint128 vec_vaddecuq (vector __uint128, vector __uint128,
+                               vector __uint128);
+
+vector __int128 vec_vaddeuqm (vector __int128, vector __int128,
+                              vector __int128);
+vector __uint128 vec_vaddeuqm (vector __uint128, vector __uint128,
+                               vector __uint128);
-@defbuiltin{{unsigned int} @
-            __lmbd (unsigned int @var{wordval}, @
-                    unsigned int @var{bitval})}
-This inserts an LMBD instruction to calculate the left-most bit with value
-@var{bitval} in value @var{wordval}. Only the least significant bit
-of @var{bitval} is taken into account.
-@enddefbuiltin
+vector __int128 vec_vsubecuq (vector __int128, vector __int128,
+                              vector __int128);
+vector __uint128 vec_vsubecuq (vector __uint128, vector __uint128,
+                               vector __uint128);
-@node RISC-V Built-in Functions
-@subsection RISC-V Built-in Functions
+vector __int128 vec_vsubeuqm (vector __int128, vector __int128,
+                              vector __int128);
+vector __uint128 vec_vsubeuqm (vector __uint128, vector __uint128,
+                               vector __uint128);
-These built-in functions are available for the RISC-V family of
-processors.
+vector __int128 vec_vsubcuq (vector __int128, vector __int128);
+vector __uint128 vec_vsubcuq (vector __uint128, vector __uint128);
-@defbuiltin{{void *} __builtin_thread_pointer (void)}
-Returns the value that is currently set in the @samp{tp} register.
-@enddefbuiltin
+vector __int128 vec_vsubuqm (vector __int128, vector __int128);
+vector __uint128 vec_vsubuqm (vector __uint128, vector __uint128);
-@defbuiltin{void __builtin_riscv_pause (void)}
-Generates the @code{pause} (hint) machine instruction. If the target implements
-the Zihintpause extension, it indicates that the current hart should be
-temporarily paused or slowed down.
-@enddefbuiltin
+vector __int128 __builtin_bcdadd (vector __int128, vector __int128, const int);
+vector unsigned char __builtin_bcdadd (vector unsigned char, vector unsigned char,
+                                       const int);
+int __builtin_bcdadd_lt (vector __int128, vector __int128, const int);
+int __builtin_bcdadd_lt (vector unsigned char, vector unsigned char, const int);
+int __builtin_bcdadd_eq (vector __int128, vector __int128, const int);
+int __builtin_bcdadd_eq (vector unsigned char, vector unsigned char, const int);
+int __builtin_bcdadd_gt (vector __int128, vector __int128, const int);
+int __builtin_bcdadd_gt (vector unsigned char, vector unsigned char, const int);
+int __builtin_bcdadd_ov (vector __int128, vector __int128, const int);
+int __builtin_bcdadd_ov (vector unsigned char, vector unsigned char, const int);
-@node RISC-V Vector Intrinsics
-@subsection RISC-V Vector Intrinsics
+vector __int128 __builtin_bcdsub (vector __int128, vector __int128, const int);
+vector unsigned char __builtin_bcdsub (vector unsigned char, vector unsigned char,
+                                       const int);
+int __builtin_bcdsub_le (vector __int128, vector __int128, const int);
+int __builtin_bcdsub_le (vector unsigned char, vector unsigned char, const int);
+int __builtin_bcdsub_lt (vector __int128, vector __int128, const int);
+int __builtin_bcdsub_lt (vector unsigned char, vector unsigned char, const int);
+int __builtin_bcdsub_eq (vector __int128, vector __int128, const int);
+int __builtin_bcdsub_eq (vector unsigned char, vector unsigned char, const int);
+int __builtin_bcdsub_gt (vector __int128, vector __int128, const int);
+int __builtin_bcdsub_gt (vector unsigned char, vector unsigned char, const int);
+int __builtin_bcdsub_ge (vector __int128, vector __int128, const int);
+int __builtin_bcdsub_ge (vector unsigned char, vector unsigned char, const int);
+int __builtin_bcdsub_ov (vector __int128, vector __int128, const int);
+int __builtin_bcdsub_ov (vector unsigned char, vector unsigned char, const int);
+@end smallexample
-GCC supports vector intrinsics as specified in version 0.11 of the RISC-V
-vector intrinsic specification, which is available at the following link:
-@uref{https://github.com/riscv-non-isa/rvv-intrinsic-doc/tree/v0.11.x}.
-All of these functions are declared in the include file @file{riscv_vector.h}.
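As a hedged sketch of the BCD built-ins listed above (function name
invented; assumes @option{-mcpu=power8} or later; the final constant
argument selects the preferred-sign encoding):

@smallexample
#include <altivec.h>

/* Illustrative only: add two signed BCD values held in vectors and
   report overflow through *ovf.  */
vector unsigned char
bcd_add_checked (vector unsigned char a, vector unsigned char b, int *ovf)
@{
  *ovf = __builtin_bcdadd_ov (a, b, 0);
  return __builtin_bcdadd (a, b, 0);
@}
@end smallexample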
+@node PowerPC AltiVec Built-in Functions Available on ISA 3.0
+@subsubsection PowerPC AltiVec Built-in Functions Available on ISA 3.0
-@node CORE-V Built-in Functions
-@subsection CORE-V Built-in Functions
-For more information on all CORE-V built-ins, please see
-@uref{https://github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md}
+The following additional built-in functions are also available for the
+PowerPC family of processors, starting with ISA 3.0
+(@option{-mcpu=power9}) or later.
-These built-in functions are available for the CORE-V MAC machine
-architecture. For the multiply-accumulate built-ins specifically, see
-@uref{https://github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md#listing-of-multiply-accumulate-builtins-xcvmac}.
+Only functions excluded from the PVIPR are listed here.
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_mac (int32_t, int32_t, int32_t)
-Generates the @code{cv.mac} machine instruction.
-@end deftypefn
+@smallexample
+unsigned int scalar_extract_exp (double source);
+unsigned long long int scalar_extract_exp (__ieee128 source);
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_msu (int32_t, int32_t, int32_t)
-Generates the @code{cv.msu} machine instruction.
-@end deftypefn
+unsigned long long int scalar_extract_sig (double source);
+unsigned __int128 scalar_extract_sig (__ieee128 source);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_muluN (uint32_t, uint32_t, uint8_t)
-Generates the @code{cv.muluN} machine instruction.
-@end deftypefn
+double scalar_insert_exp (unsigned long long int significand,
+                          unsigned long long int exponent);
+double scalar_insert_exp (double significand, unsigned long long int exponent);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_mulhhuN (uint32_t, uint32_t, uint8_t)
-Generates the @code{cv.mulhhuN} machine instruction.
-@end deftypefn
+__ieee128 scalar_insert_exp (unsigned __int128 significand,
+                             unsigned long long int exponent);
+__ieee128 scalar_insert_exp (__ieee128 significand, unsigned long long int exponent);
+vector __ieee128 scalar_insert_exp (vector unsigned __int128 significand,
+                                    vector unsigned long long exponent);
+vector unsigned long long scalar_extract_exp_to_vec (__ieee128);
+vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_mulsN (int32_t, int32_t, uint8_t)
-Generates the @code{cv.mulsN} machine instruction.
-@end deftypefn
+int scalar_cmp_exp_gt (double arg1, double arg2);
+int scalar_cmp_exp_lt (double arg1, double arg2);
+int scalar_cmp_exp_eq (double arg1, double arg2);
+int scalar_cmp_exp_unordered (double arg1, double arg2);
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_mulhhsN (int32_t, int32_t, uint8_t)
-Generates the @code{cv.mulhhsN} machine instruction.
-@end deftypefn
+bool scalar_test_data_class (float source, const int condition);
+bool scalar_test_data_class (double source, const int condition);
+bool scalar_test_data_class (__ieee128 source, const int condition);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_muluRN (uint32_t, uint32_t, uint8_t)
-Generates the @code{cv.muluRN} machine instruction.
-@end deftypefn
+bool scalar_test_neg (float source);
+bool scalar_test_neg (double source);
+bool scalar_test_neg (__ieee128 source);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_mulhhuRN (uint32_t, uint32_t, uint8_t)
-Generates the @code{cv.mulhhuRN} machine instruction.
-@end deftypefn
+The @code{scalar_extract_exp} function with a 64-bit source argument
+requires an environment supporting ISA 3.0 or later.
+The @code{scalar_extract_exp} function with a 128-bit source argument
+and the @code{scalar_extract_sig}
+functions require a 64-bit environment supporting ISA 3.0 or later.
+The @code{scalar_extract_exp} and @code{scalar_extract_sig} built-in
+functions return the biased exponent value and the significand
+respectively of their @code{source} arguments.
+When supplied with a 64-bit @code{source} argument, the
+result returned by @code{scalar_extract_sig} has
+the @code{0x0010000000000000} bit set if the
+function's @code{source} argument is in normalized form.
+Otherwise, this bit is set to 0.
+When supplied with a 128-bit @code{source} argument, the
+@code{0x00010000000000000000000000000000} bit of the result is
+treated similarly.
+Note that the sign of the significand is not represented in the result
+returned from the @code{scalar_extract_sig} function. Use the
+@code{scalar_test_neg} function to test the sign of its @code{double}
+argument.
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_mulsRN (int32_t, int32_t, uint8_t)
-Generates the @code{cv.mulsRN} machine instruction.
-@end deftypefn
+The @code{scalar_insert_exp}
+functions require a 64-bit environment supporting ISA 3.0 or later.
+When supplied with a 64-bit first argument, the
+@code{scalar_insert_exp} built-in function returns a double-precision
+floating point value that is constructed by assembling the values of its
+@code{significand} and @code{exponent} arguments. The sign of the
+result is copied from the most significant bit of the
+@code{significand} argument. The significand and exponent components
+of the result are composed of the least significant 11 bits of the
+@code{exponent} argument and the least significant 52 bits of the
+@code{significand} argument respectively.
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_mulhhsRN (int32_t, int32_t, uint8_t)
-Generates the @code{cv.mulhhsRN} machine instruction.
-@end deftypefn
+When supplied with a 128-bit first argument, the
+@code{scalar_insert_exp} built-in function returns a quad-precision
+IEEE floating point value if the two arguments are scalar. If the two
+arguments are vectors, the return value is a vector IEEE floating point value.
+The sign bit of the result is copied from the most significant bit of the
+@code{significand} argument. The significand and exponent components of the
+result are composed of the least significant 15 bits of the @code{exponent}
+argument (element 0 on big-endian and element 1 on little-endian) and the
+least significant 112 bits of the @code{significand} argument
+respectively. Note that @code{significand} here refers to the scalar
+argument or, in the case of vector arguments, to element 0 on big-endian
+and element 1 on little-endian.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_macuN (uint32_t, uint32_t, uint8_t)
-Generates the @code{cv.macuN} machine instruction.
+The @code{scalar_extract_exp_to_vec} and
+@code{scalar_extract_sig_to_vec} built-in functions are similar to
+@code{scalar_extract_exp} and @code{scalar_extract_sig}, except that
+they return vector results of type @code{vector unsigned long long}
+and @code{vector unsigned __int128}, respectively.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_machhuN (uint32_t, uint32_t, uint8_t)
-Generates the @code{cv.machhuN} machine instruction.
-@end deftypefn
+The @code{scalar_cmp_exp_gt}, @code{scalar_cmp_exp_lt},
+@code{scalar_cmp_exp_eq}, and @code{scalar_cmp_exp_unordered} built-in
+functions return a non-zero value if @code{arg1} is greater than, less
+than, equal to, or not comparable to @code{arg2}, respectively. The
+arguments are not comparable if either of them is a NaN (not a
+number).
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_macsN (int32_t, int32_t, uint8_t)
-Generates the @code{cv.macsN} machine instruction.
-@end deftypefn
+The @code{scalar_test_data_class} built-in function returns 1
+if any of the condition tests enabled by the value of the
+@code{condition} argument are true, and 0 otherwise. The
+@code{condition} argument must be a compile-time constant integer with
+value not exceeding 127. The
+@code{condition} argument is encoded as a bitmask with each bit
+enabling the testing of a different condition, as characterized by the
+following:
+@smallexample
+0x40 Test for NaN
+0x20 Test for +Infinity
+0x10 Test for -Infinity
+0x08 Test for +Zero
+0x04 Test for -Zero
+0x02 Test for +Denormal
+0x01 Test for -Denormal
+@end smallexample
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_machhsN (int32_t, int32_t, uint8_t)
-Generates the @code{cv.machhsN} machine instruction.
-@end deftypefn
+The @code{scalar_test_neg} built-in function returns 1 if its
+@code{source} argument holds a negative value, 0 otherwise.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_macuRN (uint32_t, uint32_t, uint8_t)
-Generates the @code{cv.macuRN} machine instruction.
-@end deftypefn
+The following built-in functions are also available for the PowerPC family
+of processors, starting with ISA 3.0 or later
+(@option{-mcpu=power9}). These string functions are described
+separately in order to group the descriptions closer to the function
+prototypes.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_machhuRN (uint32_t, uint32_t, uint8_t)
-Generates the @code{cv.machhuRN} machine instruction.
-@end deftypefn
+Only functions excluded from the PVIPR are listed here.
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_macsRN (int32_t, int32_t, uint8_t)
-Generates the @code{cv.macsRN} machine instruction.
-@end deftypefn
+@smallexample
+int vec_all_nez (vector signed char, vector signed char);
+int vec_all_nez (vector unsigned char, vector unsigned char);
+int vec_all_nez (vector signed short, vector signed short);
+int vec_all_nez (vector unsigned short, vector unsigned short);
+int vec_all_nez (vector signed int, vector signed int);
+int vec_all_nez (vector unsigned int, vector unsigned int);
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_machhsRN (int32_t, int32_t, uint8_t)
-Generates the @code{cv.machhsRN} machine instruction.
-@end deftypefn +int vec_any_eqz (vector signed char, vector signed char); +int vec_any_eqz (vector unsigned char, vector unsigned char); +int vec_any_eqz (vector signed short, vector signed short); +int vec_any_eqz (vector unsigned short, vector unsigned short); +int vec_any_eqz (vector signed int, vector signed int); +int vec_any_eqz (vector unsigned int, vector unsigned int); + +signed char vec_xlx (unsigned int index, vector signed char data); +unsigned char vec_xlx (unsigned int index, vector unsigned char data); +signed short vec_xlx (unsigned int index, vector signed short data); +unsigned short vec_xlx (unsigned int index, vector unsigned short data); +signed int vec_xlx (unsigned int index, vector signed int data); +unsigned int vec_xlx (unsigned int index, vector unsigned int data); +float vec_xlx (unsigned int index, vector float data); -These built-in functions are available for the CORE-V ALU machine -architecture. For more information on CORE-V built-ins, please see -@uref{https://github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md#listing-of-miscellaneous-alu-builtins-xcvalu} +signed char vec_xrx (unsigned int index, vector signed char data); +unsigned char vec_xrx (unsigned int index, vector unsigned char data); +signed short vec_xrx (unsigned int index, vector signed short data); +unsigned short vec_xrx (unsigned int index, vector unsigned short data); +signed int vec_xrx (unsigned int index, vector signed int data); +unsigned int vec_xrx (unsigned int index, vector unsigned int data); +float vec_xrx (unsigned int index, vector float data); +@end smallexample -@deftypefn {Built-in Function} {int} __builtin_riscv_cv_alu_slet (int32_t, int32_t) -Generated assembler @code{cv.slet} -@end deftypefn +The @code{vec_all_nez}, @code{vec_any_eqz}, and @code{vec_cmpnez} +perform pairwise comparisons between the elements at the same +positions within their two vector arguments. +The @code{vec_all_nez} function returns a +non-zero value if and only if all pairwise comparisons are not +equal and no element of either vector argument contains a zero. +The @code{vec_any_eqz} function returns a +non-zero value if and only if at least one pairwise comparison is equal +or if at least one element of either vector argument contains a zero. +The @code{vec_cmpnez} function returns a vector of the same type as +its two arguments, within which each element consists of all ones to +denote that either the corresponding elements of the incoming arguments are +not equal or that at least one of the corresponding elements contains +zero. Otherwise, the element of the returned vector contains all zeros. -@deftypefn {Built-in Function} {int} __builtin_riscv_cv_alu_sletu (uint32_t, uint32_t) -Generated assembler @code{cv.sletu} -@end deftypefn +The @code{vec_xlx} and @code{vec_xrx} functions extract the single +element selected by the @code{index} argument from the vector +represented by the @code{data} argument. The @code{index} argument +always specifies a byte offset, regardless of the size of the vector +element. With @code{vec_xlx}, @code{index} is the offset of the first +byte of the element to be extracted. With @code{vec_xrx}, @code{index} +represents the last byte of the element to be extracted, measured +from the right end of the vector. In other words, the last byte of +the element to be extracted is found at position @code{(15 - index)}. +There is no requirement that @code{index} be a multiple of the vector +element size. 
However, if the size of the vector element added to
+@code{index} is greater than 15, the content of the returned value is
+undefined.
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_min (int32_t, int32_t)
-Generated assembler @code{cv.min}
-@end deftypefn
+The following functions are also available if the ISA 3.0 instruction
+set additions (@option{-mcpu=power9}) are available.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_minu (uint32_t, uint32_t)
-Generated assembler @code{cv.minu}
-@end deftypefn
+Only functions excluded from the PVIPR are listed here.
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_max (int32_t, int32_t)
-Generated assembler @code{cv.max}
-@end deftypefn
+@smallexample
+vector long long vec_vctz (vector long long);
+vector unsigned long long vec_vctz (vector unsigned long long);
+vector int vec_vctz (vector int);
+vector unsigned int vec_vctz (vector unsigned int);
+vector short vec_vctz (vector short);
+vector unsigned short vec_vctz (vector unsigned short);
+vector signed char vec_vctz (vector signed char);
+vector unsigned char vec_vctz (vector unsigned char);
-@deftypefn {Built-in Function} {uint32_tnt} __builtin_riscv_cv_alu_maxu (uint32_t, uint32_t)
-Generated assembler @code{cv.maxu}
-@end deftypefn
+vector signed char vec_vctzb (vector signed char);
+vector unsigned char vec_vctzb (vector unsigned char);
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_exths (int16_t)
-Generated assembler @code{cv.exths}
-@end deftypefn
+vector long long vec_vctzd (vector long long);
+vector unsigned long long vec_vctzd (vector unsigned long long);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_exthz (uint16_t)
-Generated assembler @code{cv.exthz}
-@end deftypefn
+vector short vec_vctzh (vector short);
+vector unsigned short vec_vctzh (vector unsigned short);
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_extbs (int8_t)
-Generated assembler @code{cv.extbs}
-@end deftypefn
+vector int vec_vctzw (vector int);
+vector unsigned int vec_vctzw (vector unsigned int);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_extbz (uint8_t)
-Generated assembler @code{cv.extbz}
-@end deftypefn
+vector int vec_vprtyb (vector int);
+vector unsigned int vec_vprtyb (vector unsigned int);
+vector long long vec_vprtyb (vector long long);
+vector unsigned long long vec_vprtyb (vector unsigned long long);
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_clip (int32_t, uint32_t)
-Generated assembler @code{cv.clip} if the uint32_t operand is a constant and an exact power of 2.
-Generated assembler @code{cv.clipr} if the it is a register.
-@end deftypefn
+vector int vec_vprtybw (vector int);
+vector unsigned int vec_vprtybw (vector unsigned int);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_clipu (uint32_t, uint32_t)
-Generated assembler @code{cv.clipu} if the uint32_t operand is a constant and an exact power of 2.
-Generated assembler @code{cv.clipur} if the it is a register.
-@end deftypefn
+vector long long vec_vprtybd (vector long long);
+vector unsigned long long vec_vprtybd (vector unsigned long long);
+@end smallexample
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_addN (int32_t, int32_t, uint8_t)
-Generated assembler @code{cv.addN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31.
-Generated assembler @code{cv.addNr} if the it is a register.
-@end deftypefn
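+As an informal sketch (not part of the reference lists above), the
+following example counts trailing zeros in each word of a vector.  It
+assumes @option{-mcpu=power9}, @code{<altivec.h>}, and GCC's vector
+initializer and subscripting extensions:
+
+@smallexample
+#include <altivec.h>
+#include <stdio.h>
+
+int
+main (void)
+@{
+  vector unsigned int v = @{ 8, 12, 1, 0 @};
+  /* Count trailing zeros element by element:
+     8 -> 3, 12 -> 2, 1 -> 0, 0 -> 32.  */
+  vector unsigned int z = vec_vctz (v);
+  for (int i = 0; i < 4; i++)
+    printf ("%u -> %u\n", v[i], z[i]);
+  return 0;
+@}
+@end smallexample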
+On 64-bit targets, if the ISA 3.0 additions (@option{-mcpu=power9})
+are available:
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_adduN (uint32_t, uint32_t, uint8_t)
-Generated assembler @code{cv.adduN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31.
-Generated assembler @code{cv.adduNr} if the it is a register.
-@end deftypefn
+@smallexample
+vector long vec_vprtyb (vector long);
+vector unsigned long vec_vprtyb (vector unsigned long);
+vector __int128 vec_vprtyb (vector __int128);
+vector __uint128 vec_vprtyb (vector __uint128);
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_addRN (int32_t, int32_t, uint8_t)
-Generated assembler @code{cv.addRN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31.
-Generated assembler @code{cv.addRNr} if the it is a register.
-@end deftypefn
+vector long vec_vprtybd (vector long);
+vector unsigned long vec_vprtybd (vector unsigned long);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_adduRN (uint32_t, uint32_t, uint8_t)
-Generated assembler @code{cv.adduRN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31.
-Generated assembler @code{cv.adduRNr} if the it is a register.
-@end deftypefn
+vector __int128 vec_vprtybq (vector __int128);
+vector __uint128 vec_vprtybq (vector __uint128);
+@end smallexample
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_subN (int32_t, int32_t, uint8_t)
-Generated assembler @code{cv.subN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31.
-Generated assembler @code{cv.subNr} if the it is a register.
-@end deftypefn
+The following built-in functions are available for the PowerPC family
+of processors, starting with ISA 3.0 or later (@option{-mcpu=power9}).
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_subuN (uint32_t, uint32_t, uint8_t)
-Generated assembler @code{cv.subuN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31.
-Generated assembler @code{cv.subuNr} if the it is a register.
-@end deftypefn
+Only functions excluded from the PVIPR are listed here.
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_subRN (int32_t, int32_t, uint8_t)
-Generated assembler @code{cv.subRN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31.
-Generated assembler @code{cv.subRNr} if the it is a register.
-@end deftypefn
+@smallexample
+__vector unsigned char
+vec_absdb (__vector unsigned char arg1, __vector unsigned char arg2);
+__vector unsigned short
+vec_absdh (__vector unsigned short arg1, __vector unsigned short arg2);
+__vector unsigned int
+vec_absdw (__vector unsigned int arg1, __vector unsigned int arg2);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_subuRN (uint32_t, uint32_t, uint8_t)
-Generated assembler @code{cv.subuRN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31.
-Generated assembler @code{cv.subuRNr} if the it is a register.
-@end deftypefn
+Each of the @code{vec_absd}, @code{vec_absdb}, @code{vec_absdh}, and
+@code{vec_absdw} built-in functions computes the absolute differences
+of the pairs of vector elements supplied in its two vector arguments,
+placing the absolute differences into the corresponding elements of
+the vector result.
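+As a brief illustration (a sketch, not part of the reference text),
+absolute differences of unsigned bytes can be computed like this,
+assuming @option{-mcpu=power9} and @code{<altivec.h>}:
+
+@smallexample
+#include <altivec.h>
+
+/* For each byte i, the result is |a[i] - b[i]|, computed without the
+   wraparound that plain (a - b) would suffer for unsigned types.  */
+vector unsigned char
+byte_distance (vector unsigned char a, vector unsigned char b)
+@{
+  return vec_absdb (a, b);
+@}
+@end smallexample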
-These built-in functions are available for the CORE-V Event Load machine
-architecture.  For more information on CORE-V ELW builtins, please see
-@uref{https://github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md#listing-of-event-load-word-builtins-xcvelw}
+The following built-in functions are available for the PowerPC family
+of processors, starting with ISA 3.0 or later (@option{-mcpu=power9}):
+@smallexample
+vector unsigned int vec_vrlnm (vector unsigned int, vector unsigned int);
+vector unsigned long long vec_vrlnm (vector unsigned long long,
+                                     vector unsigned long long);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_elw_elw (uint32_t *)
-Generated assembler @code{cv.elw}
-@end deftypefn
+The result of @code{vec_vrlnm} is obtained by rotating each element
+of the first argument vector left and ANDing it with a mask. Each
+element of the second argument vector contains the mask beginning in
+bits 11:15, the mask end in bits 19:23, and the shift count in bits
+27:31.
-These built-in functions are available for the CORE-V SIMD machine
-architecture. For more information on CORE-V SIMD built-ins, please see
-@uref{https://github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md#listing-of-pulp-816-bit-simd-builtins-xcvsimd}
+If the cryptographic instructions are enabled (@option{-mcrypto} or
+@option{-mcpu=power8}), the following built-in functions are available.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_h (uint32_t, uint32_t, uint4_t)
-Generated assembler @code{cv.add.h}
-@end deftypefn
+Only functions excluded from the PVIPR are listed here.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_b (uint32_t, uint32_t)
-Generated assembler @code{cv.add.b}
-@end deftypefn
+@smallexample
+vector unsigned long long __builtin_crypto_vsbox (vector unsigned long long);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.add.sc.h}
-@end deftypefn
+vector unsigned long long __builtin_crypto_vcipher (vector unsigned long long,
+                                                    vector unsigned long long);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.add.sci.h}
-@end deftypefn
+vector unsigned long long __builtin_crypto_vcipherlast
+                                     (vector unsigned long long,
+                                      vector unsigned long long);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.add.sc.b}
-@end deftypefn
+vector unsigned long long __builtin_crypto_vncipher (vector unsigned long long,
+                                                     vector unsigned long long);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.add.sci.b}
-@end deftypefn
+vector unsigned long long __builtin_crypto_vncipherlast (vector unsigned long long,
+                                                         vector unsigned long long);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_h (uint32_t, uint32_t, uint4_t)
-Generated assembler @code{cv.sub.h}
-@end deftypefn
+vector unsigned char __builtin_crypto_vpermxor (vector unsigned char,
+                                                vector unsigned char,
+                                                vector unsigned char);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_b (uint32_t, uint32_t)
-Generated assembler @code{cv.sub.b}
-@end deftypefn
+vector unsigned short __builtin_crypto_vpermxor (vector unsigned short,
+                                                 vector unsigned short,
+                                                 vector unsigned short);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.sub.sc.h}
-@end deftypefn
+vector unsigned int __builtin_crypto_vpermxor (vector unsigned int,
+                                               vector unsigned int,
+                                               vector unsigned int);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.sub.sci.h}
-@end deftypefn
+vector unsigned long long __builtin_crypto_vpermxor (vector unsigned long long,
+                                                     vector unsigned long long,
+                                                     vector unsigned long long);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.sub.sc.b}
-@end deftypefn
+vector unsigned char __builtin_crypto_vpmsumb (vector unsigned char,
+                                               vector unsigned char);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.sub.sci.b}
-@end deftypefn
+vector unsigned short __builtin_crypto_vpmsumh (vector unsigned short,
+                                                vector unsigned short);
+
+vector unsigned int __builtin_crypto_vpmsumw (vector unsigned int,
+                                              vector unsigned int);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_h (uint32_t, uint32_t)
-Generated assembler @code{cv.avg.h}
-@end deftypefn
+vector unsigned long long __builtin_crypto_vpmsumd (vector unsigned long long,
+                                                    vector unsigned long long);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_b (uint32_t, uint32_t)
-Generated assembler @code{cv.avg.b}
-@end deftypefn
+vector unsigned long long __builtin_crypto_vshasigmad (vector unsigned long long,
+                                                       int, int);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.avg.sc.h}
-@end deftypefn
+vector unsigned int __builtin_crypto_vshasigmaw (vector unsigned int, int, int);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.avg.sci.h}
-@end deftypefn
+The second argument to @code{__builtin_crypto_vshasigmad} and
+@code{__builtin_crypto_vshasigmaw} must be a constant
+integer that is 0 or 1. The third argument to these built-in functions
+must be a constant integer in the range of 0 to 15.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.avg.sc.b}
-@end deftypefn
+The following sign-extension built-in functions are provided:
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.avg.sci.b}
-@end deftypefn
+@smallexample
+vector signed int vec_signexti (vector signed char a);
+vector signed long long vec_signextll (vector signed char a);
+vector signed int vec_signexti (vector signed short a);
+vector signed long long vec_signextll (vector signed short a);
+vector signed long long vec_signextll (vector signed int a);
+vector signed __int128 vec_signextq (vector signed long long a);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_h (uint32_t, uint32_t)
-Generated assembler @code{cv.avgu.h}
-@end deftypefn
+Each element of the result is produced by sign-extending the element of the
+input vector that would fall in the least significant portion of the result
+element. For example, a sign extension of a @code{vector signed char} to a
+@code{vector signed long long} sign-extends the rightmost byte of each
+doubleword.
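+As an informal sketch (assuming @option{-mcpu=power9} and
+@code{<altivec.h>}), the doubleword sign extension just described
+looks like this:
+
+@smallexample
+#include <altivec.h>
+
+/* Sign-extend the rightmost byte of each doubleword of A into a
+   64-bit element, as described above.  */
+vector signed long long
+extend_bytes (vector signed char a)
+@{
+  return vec_signextll (a);
+@}
+@end smallexample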
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_b (uint32_t, uint32_t) -Generated assembler @code{cv.avgu.b} -@end deftypefn +@node PowerPC AltiVec Built-in Functions Available on ISA 3.1 +@subsubsection PowerPC AltiVec Built-in Functions Available on ISA 3.1 -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_sc_h (uint32_t, uint16_t) -Generated assembler @code{cv.avgu.sc.h} -@end deftypefn +The following additional built-in functions are also available for the +PowerPC family of processors, starting with ISA 3.1 (@option{-mcpu=power10}): -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_sc_h (uint32_t, uint6_t) -Generated assembler @code{cv.avgu.sci.h} -@end deftypefn +@smallexample +@exdent int vec_test_lsbb_all_ones (vector signed char); +@exdent int vec_test_lsbb_all_ones (vector unsigned char); +@exdent int vec_test_lsbb_all_ones (vector bool char); +@end smallexample +@findex vec_test_lsbb_all_ones -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_sc_b (uint32_t, uint8_t) -Generated assembler @code{cv.avgu.sc.b} -@end deftypefn +The builtin @code{vec_test_lsbb_all_ones} returns 1 if the least significant +bit in each byte is equal to 1. It returns 0 otherwise. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_sc_b (uint32_t, uint6_t) -Generated assembler @code{cv.avgu.sci.b} -@end deftypefn +@smallexample +@exdent int vec_test_lsbb_all_zeros (vector signed char); +@exdent int vec_test_lsbb_all_zeros (vector unsigned char); +@exdent int vec_test_lsbb_all_zeros (vector bool char); +@end smallexample +@findex vec_test_lsbb_all_zeros -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_h (uint32_t, uint32_t) -Generated assembler @code{cv.min.h} -@end deftypefn +The builtin @code{vec_test_lsbb_all_zeros} returns 1 if the least significant +bit in each byte is equal to zero. It returns 0 otherwise. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_b (uint32_t, uint32_t) -Generated assembler @code{cv.min.b} -@end deftypefn +@smallexample +@exdent vector unsigned long long int +@exdent vec_cfuge (vector unsigned long long int, vector unsigned long long int); +@end smallexample +Perform a vector centrifuge operation, as if implemented by the +@code{vcfuged} instruction. +@findex vec_cfuge -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_sc_h (uint32_t, int16_t) -Generated assembler @code{cv.min.sc.h} -@end deftypefn +@smallexample +@exdent vector unsigned long long int +@exdent vec_cntlzm (vector unsigned long long int, vector unsigned long long int); +@end smallexample +Perform a vector count leading zeros under bit mask operation, as if +implemented by the @code{vclzdm} instruction. +@findex vec_cntlzm -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_sc_h (uint32_t, int6_t) -Generated assembler @code{cv.min.sci.h} -@end deftypefn +@smallexample +@exdent vector unsigned long long int +@exdent vec_cnttzm (vector unsigned long long int, vector unsigned long long int); +@end smallexample +Perform a vector count trailing zeros under bit mask operation, as if +implemented by the @code{vctzdm} instruction. 
+@findex vec_cnttzm -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_sc_b (uint32_t, int8_t) -Generated assembler @code{cv.min.sc.b} -@end deftypefn +@smallexample +@exdent vector signed char +@exdent vec_clrl (vector signed char @var{a}, unsigned int @var{n}); +@exdent vector unsigned char +@exdent vec_clrl (vector unsigned char @var{a}, unsigned int @var{n}); +@end smallexample +Clear the left-most @code{(16 - n)} bytes of vector argument @code{a}, as if +implemented by the @code{vclrlb} instruction on a big-endian target +and by the @code{vclrrb} instruction on a little-endian target. A +value of @code{n} that is greater than 16 is treated as if it equaled 16. +@findex vec_clrl -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_sc_b (uint32_t, int6_t) -Generated assembler @code{cv.min.sci.b} -@end deftypefn +@smallexample +@exdent vector signed char +@exdent vec_clrr (vector signed char @var{a}, unsigned int @var{n}); +@exdent vector unsigned char +@exdent vec_clrr (vector unsigned char @var{a}, unsigned int @var{n}); +@end smallexample +Clear the right-most @code{(16 - n)} bytes of vector argument @code{a}, as if +implemented by the @code{vclrrb} instruction on a big-endian target +and by the @code{vclrlb} instruction on a little-endian target. A +value of @code{n} that is greater than 16 is treated as if it equaled 16. +@findex vec_clrr -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_h (uint32_t, uint32_t) -Generated assembler @code{cv.minu.h} -@end deftypefn +@smallexample +@exdent vector unsigned long long int +@exdent vec_gnb (vector unsigned __int128, const unsigned char); +@end smallexample +Perform a 128-bit vector gather operation, as if implemented by the +@code{vgnb} instruction. The second argument must be a literal +integer value between 2 and 7 inclusive. +@findex vec_gnb -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_b (uint32_t, uint32_t) -Generated assembler @code{cv.minu.b} -@end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_sc_h (uint32_t, uint16_t) -Generated assembler @code{cv.minu.sc.h} -@end deftypefn +Vector Extract -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_sc_h (uint32_t, uint6_t) -Generated assembler @code{cv.minu.sci.h} -@end deftypefn +@smallexample +@exdent vector unsigned long long int +@exdent vec_extractl (vector unsigned char, vector unsigned char, unsigned int); +@exdent vector unsigned long long int +@exdent vec_extractl (vector unsigned short, vector unsigned short, unsigned int); +@exdent vector unsigned long long int +@exdent vec_extractl (vector unsigned int, vector unsigned int, unsigned int); +@exdent vector unsigned long long int +@exdent vec_extractl (vector unsigned long long, vector unsigned long long, unsigned int); +@end smallexample +Extract an element from two concatenated vectors starting at the given byte index +in natural-endian order, and place it zero-extended in doubleword 1 of the result +according to natural element order. If the byte index is out of range for the +data type, the intrinsic will be rejected. +For little-endian, this output will match the placement by the hardware +instruction, i.e., dword[0] in RTL notation. For big-endian, an additional +instruction is needed to move it from the "left" doubleword to the "right" one. 
+For little-endian, semantics matching the @code{vextdubvrx},
+@code{vextduhvrx}, @code{vextduwvrx} instructions will be generated,
+while for big-endian, semantics matching the @code{vextdubvlx},
+@code{vextduhvlx}, @code{vextduwvlx} instructions will be generated.
+Note that some fairly anomalous results can be generated if
+the byte index is not aligned on an element boundary for the element being
+extracted. This is a limitation of the bi-endian vector programming model,
+consistent with the limitation on @code{vec_perm}.
+@findex vec_extractl
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_sc_b (uint32_t, uint8_t)
-Generated assembler @code{cv.minu.sc.b}
-@end deftypefn
+@smallexample
+@exdent vector unsigned long long int
+@exdent vec_extracth (vector unsigned char, vector unsigned char, unsigned int);
+@exdent vector unsigned long long int
+@exdent vec_extracth (vector unsigned short, vector unsigned short,
+unsigned int);
+@exdent vector unsigned long long int
+@exdent vec_extracth (vector unsigned int, vector unsigned int, unsigned int);
+@exdent vector unsigned long long int
+@exdent vec_extracth (vector unsigned long long, vector unsigned long long,
+unsigned int);
+@end smallexample
+Extract an element from two concatenated vectors starting at the given byte
+index. The index is based on big-endian order for a little-endian system.
+Similarly, the index is based on little-endian order for a big-endian system.
+The extracted elements are zero-extended and put in doubleword 1
+according to natural element order. If the byte index is out of range for the
+data type, the intrinsic will be rejected. For little-endian, this output
+will match the placement by the hardware instruction (@code{vextdubvrx},
+@code{vextduhvrx}, @code{vextduwvrx}, @code{vextddvrx}), i.e., dword[0] in RTL
+notation. For big-endian, an additional instruction is needed to move it
+from the "left" doubleword to the "right" one. For little-endian, semantics
+matching the @code{vextdubvlx}, @code{vextduhvlx}, @code{vextduwvlx}
+instructions will be generated, while for big-endian, semantics matching the
+@code{vextdubvrx}, @code{vextduhvrx}, @code{vextduwvrx} instructions will
+be generated. Note that some fairly anomalous
+results can be generated if the byte index is not aligned on the
+element boundary for the element being extracted. This is a
+limitation of the bi-endian vector programming model, consistent with the
+limitation on @code{vec_perm}.
+@findex vec_extracth
+@smallexample
+@exdent vector unsigned long long int
+@exdent vec_pdep (vector unsigned long long int, vector unsigned long long int);
+@end smallexample
+Perform a vector parallel bits deposit operation, as if implemented by
+the @code{vpdepd} instruction.
+@findex vec_pdep
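+A small sketch (informal; assumes @option{-mcpu=power10} and
+@code{<altivec.h>}) of the parallel bits deposit operation:
+
+@smallexample
+#include <altivec.h>
+
+/* In each doubleword, scatter the low-order bits of SRC into the bit
+   positions selected by the set bits of MASK; all other result bits
+   are zero.  E.g. src = 0b101 with mask = 0x111 gives 0x101.  */
+vector unsigned long long
+deposit_bits (vector unsigned long long src,
+              vector unsigned long long mask)
+@{
+  return vec_pdep (src, mask);
+@}
+@end smallexample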
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_sc_b (uint32_t, uint6_t)
-Generated assembler @code{cv.minu.sci.b}
-@end deftypefn
+Vector Insert
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_h (uint32_t, uint32_t)
-Generated assembler @code{cv.max.h}
-@end deftypefn
+@smallexample
+@exdent vector unsigned char
+@exdent vec_insertl (unsigned char, vector unsigned char, unsigned int);
+@exdent vector unsigned short
+@exdent vec_insertl (unsigned short, vector unsigned short, unsigned int);
+@exdent vector unsigned int
+@exdent vec_insertl (unsigned int, vector unsigned int, unsigned int);
+@exdent vector unsigned long long
+@exdent vec_insertl (unsigned long long, vector unsigned long long,
+unsigned int);
+@exdent vector unsigned char
+@exdent vec_insertl (vector unsigned char, vector unsigned char, unsigned int);
+@exdent vector unsigned short
+@exdent vec_insertl (vector unsigned short, vector unsigned short,
+unsigned int);
+@exdent vector unsigned int
+@exdent vec_insertl (vector unsigned int, vector unsigned int, unsigned int);
+@end smallexample
+
+Let src be the first argument, when the first argument is a scalar, or the
+rightmost element of the left doubleword of the first argument, when the first
+argument is a vector. Insert the source into the destination at the position
+given by the third argument, using natural element order in the second
+argument. The rest of the second argument is unchanged. If the byte
+index is greater than 14 for halfwords, greater than 12 for words, or
+greater than 8 for doublewords, the result is undefined. For little-endian,
+the generated code will be semantically equivalent to @code{vins[bhwd]rx}
+instructions. Similarly for big-endian it will be semantically equivalent
+to @code{vins[bhwd]lx}. Note that some fairly anomalous results can be
+generated if the byte index is not aligned on an element boundary for the
+type of element being inserted.
+@findex vec_insertl
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_b (uint32_t, uint32_t)
-Generated assembler @code{cv.max.b}
-@end deftypefn
+@smallexample
+@exdent vector unsigned char
+@exdent vec_inserth (unsigned char, vector unsigned char, unsigned int);
+@exdent vector unsigned short
+@exdent vec_inserth (unsigned short, vector unsigned short, unsigned int);
+@exdent vector unsigned int
+@exdent vec_inserth (unsigned int, vector unsigned int, unsigned int);
+@exdent vector unsigned long long
+@exdent vec_inserth (unsigned long long, vector unsigned long long,
+unsigned int);
+@exdent vector unsigned char
+@exdent vec_inserth (vector unsigned char, vector unsigned char, unsigned int);
+@exdent vector unsigned short
+@exdent vec_inserth (vector unsigned short, vector unsigned short,
+unsigned int);
+@exdent vector unsigned int
+@exdent vec_inserth (vector unsigned int, vector unsigned int, unsigned int);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.max.sc.h}
-@end deftypefn
+Let src be the first argument, when the first argument is a scalar, or the
+rightmost element of the first argument, when the first argument is a vector.
+Insert src into the second argument at the position identified by the third
+argument, using opposite element order in the second argument, and leaving the
+rest of the second argument unchanged.
+If the byte index is greater than 14
+for halfwords, 12 for words, or 8 for doublewords, the intrinsic will be
+rejected. Note that the underlying hardware instruction uses the same register
+for the second argument and the result.
+For little-endian, the code generation will be semantically equivalent to
+@code{vins[bhwd]lx}, while for big-endian it will be semantically equivalent to
+@code{vins[bhwd]rx}.
+Note that some fairly anomalous results can be generated if the byte index is
+not aligned on an element boundary for the type of element being inserted.
+@findex vec_inserth
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.max.sci.h}
-@end deftypefn
+Vector Replace Element
+@smallexample
+@exdent vector signed int vec_replace_elt (vector signed int, signed int,
+const int);
+@exdent vector unsigned int vec_replace_elt (vector unsigned int,
+unsigned int, const int);
+@exdent vector float vec_replace_elt (vector float, float, const int);
+@exdent vector signed long long vec_replace_elt (vector signed long long,
+signed long long, const int);
+@exdent vector unsigned long long vec_replace_elt (vector unsigned long long,
+unsigned long long, const int);
+@exdent vector double vec_replace_elt (vector double, double, const int);
+@end smallexample
+The third argument (constrained to [0,3] for word types, or [0,1] for
+doubleword types) identifies the natural-endian element number of the first
+argument that will be replaced by the second argument to produce the result.
+The other elements of the first argument will remain unchanged in the result.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.max.sc.b}
-@end deftypefn
+If it's desirable to insert a word at an unaligned position, use
+@code{vec_replace_unaligned} instead.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.max.sci.b}
-@end deftypefn
+@findex vec_replace_elt
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_h (uint32_t, uint32_t)
-Generated assembler @code{cv.maxu.h}
-@end deftypefn
+Vector Replace Unaligned
+@smallexample
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+signed int, const int);
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+unsigned int, const int);
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+float, const int);
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+signed long long, const int);
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+unsigned long long, const int);
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+double, const int);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_b (uint32_t, uint32_t)
-Generated assembler @code{cv.maxu.b}
-@end deftypefn
+The second argument replaces a portion of the first argument to produce the
+result, with the rest of the first argument unchanged in the result. The
+third argument identifies the byte index (using left-to-right, or big-endian
+order) where the high-order byte of the second argument will be placed, with
+the remaining bytes of the second argument placed naturally "to the right"
+of the high-order byte.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_sc_h (uint32_t, uint16_t)
-Generated assembler @code{cv.maxu.sc.h}
-@end deftypefn
+The programmer is responsible for understanding the endianness issues involved
+with the first argument and the result.
+@findex vec_replace_unaligned
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_sc_h (uint32_t, uint6_t)
-Generated assembler @code{cv.maxu.sci.h}
-@end deftypefn
+Vector Shift Left Double Bit Immediate
+@smallexample
+@exdent vector signed char vec_sldb (vector signed char, vector signed char,
+const unsigned int);
+@exdent vector unsigned char vec_sldb (vector unsigned char,
+vector unsigned char, const unsigned int);
+@exdent vector signed short vec_sldb (vector signed short, vector signed short,
+const unsigned int);
+@exdent vector unsigned short vec_sldb (vector unsigned short,
+vector unsigned short, const unsigned int);
+@exdent vector signed int vec_sldb (vector signed int, vector signed int,
+const unsigned int);
+@exdent vector unsigned int vec_sldb (vector unsigned int, vector unsigned int,
+const unsigned int);
+@exdent vector signed long long vec_sldb (vector signed long long,
+vector signed long long, const unsigned int);
+@exdent vector unsigned long long vec_sldb (vector unsigned long long,
+vector unsigned long long, const unsigned int);
+@exdent vector signed __int128 vec_sldb (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_sldb (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_sc_b (uint32_t, uint8_t)
-Generated assembler @code{cv.maxu.sc.b}
-@end deftypefn
+Shift the combined input vectors left by the amount specified by the low-order
+three bits of the third argument, and return the leftmost remaining 128 bits.
+Code using this built-in function must be endian-aware.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_sc_b (uint32_t, uint6_t) -Generated assembler @code{cv.maxu.sci.b} -@end deftypefn +@findex vec_sldb -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_srl_h (uint32_t, uint32_t) -Generated assembler @code{cv.srl.h} -@end deftypefn +Vector Shift Right Double Bit Immediate -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_srl_b (uint32_t, uint32_t) -Generated assembler @code{cv.srl.b} -@end deftypefn +@smallexample +@exdent vector signed char vec_srdb (vector signed char, vector signed char, +const unsigned int); +@exdent vector unsigned char vec_srdb (vector unsigned char, vector unsigned char, +const unsigned int); +@exdent vector signed short vec_srdb (vector signed short, vector signed short, +const unsigned int); +@exdent vector unsigned short vec_srdb (vector unsigned short, vector unsigned short, +const unsigned int); +@exdent vector signed int vec_srdb (vector signed int, vector signed int, +const unsigned int); +@exdent vector unsigned int vec_srdb (vector unsigned int, vector unsigned int, +const unsigned int); +@exdent vector signed long long vec_srdb (vector signed long long, +vector signed long long, const unsigned int); +@exdent vector unsigned long long vec_srdb (vector unsigned long long, +vector unsigned long long, const unsigned int); +@exdent vector signed __int128 vec_srdb (vector signed __int128, +vector signed __int128, const unsigned int); +@exdent vector unsigned __int128 vec_srdb (vector unsigned __int128, +vector unsigned __int128, const unsigned int); +@end smallexample -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_srl_sc_h (uint32_t, int16_t) -Generated assembler @code{cv.srl.sc.h} -@end deftypefn +Shift the combined input vectors right by the amount specified by the low-order +three bits of the third argument, and return the remaining 128 bits. Code +using this built-in must be endian-aware. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_srl_sc_h (uint32_t, int6_t) -Generated assembler @code{cv.srl.sci.h} -@end deftypefn +@findex vec_srdb -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_srl_sc_b (uint32_t, int8_t) -Generated assembler @code{cv.srl.sc.b} -@end deftypefn +Vector Splat -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_srl_sc_b (uint32_t, int6_t) -Generated assembler @code{cv.srl.sci.b} -@end deftypefn +@smallexample +@exdent vector signed int vec_splati (const signed int); +@exdent vector float vec_splati (const float); +@end smallexample -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_h (uint32_t, uint32_t) -Generated assembler @code{cv.sra.h} -@end deftypefn +Splat a 32-bit immediate into a vector of words. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_b (uint32_t, uint32_t) -Generated assembler @code{cv.sra.b} -@end deftypefn +@findex vec_splati -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_sc_h (uint32_t, int16_t) -Generated assembler @code{cv.sra.sc.h} -@end deftypefn +@smallexample +@exdent vector double vec_splatid (const float); +@end smallexample -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_sc_h (uint32_t, int6_t) -Generated assembler @code{cv.sra.sci.h} -@end deftypefn +Convert a single precision floating-point value to double-precision and splat +the result to a vector of double-precision floats. 
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_sc_b (uint32_t, int8_t) -Generated assembler @code{cv.sra.sc.b} -@end deftypefn +@findex vec_splatid -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_sc_b (uint32_t, int6_t) -Generated assembler @code{cv.sra.sci.b} -@end deftypefn +@smallexample +@exdent vector signed int vec_splati_ins (vector signed int, +const unsigned int, const signed int); +@exdent vector unsigned int vec_splati_ins (vector unsigned int, +const unsigned int, const unsigned int); +@exdent vector float vec_splati_ins (vector float, const unsigned int, +const float); +@end smallexample -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_h (uint32_t, uint32_t) -Generated assembler @code{cv.sll.h} -@end deftypefn +Argument 2 must be either 0 or 1. Splat the value of argument 3 into the word +identified by argument 2 of each doubleword of argument 1 and return the +result. The other words of argument 1 are unchanged. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_b (uint32_t, uint32_t) -Generated assembler @code{cv.sll.b} -@end deftypefn +@findex vec_splati_ins -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_sc_h (uint32_t, int16_t) -Generated assembler @code{cv.sll.sc.h} -@end deftypefn +Vector Blend Variable -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_sc_h (uint32_t, int6_t) -Generated assembler @code{cv.sll.sci.h} -@end deftypefn +@smallexample +@exdent vector signed char vec_blendv (vector signed char, vector signed char, +vector unsigned char); +@exdent vector unsigned char vec_blendv (vector unsigned char, +vector unsigned char, vector unsigned char); +@exdent vector signed short vec_blendv (vector signed short, +vector signed short, vector unsigned short); +@exdent vector unsigned short vec_blendv (vector unsigned short, +vector unsigned short, vector unsigned short); +@exdent vector signed int vec_blendv (vector signed int, vector signed int, +vector unsigned int); +@exdent vector unsigned int vec_blendv (vector unsigned int, +vector unsigned int, vector unsigned int); +@exdent vector signed long long vec_blendv (vector signed long long, +vector signed long long, vector unsigned long long); +@exdent vector unsigned long long vec_blendv (vector unsigned long long, +vector unsigned long long, vector unsigned long long); +@exdent vector float vec_blendv (vector float, vector float, +vector unsigned int); +@exdent vector double vec_blendv (vector double, vector double, +vector unsigned long long); +@end smallexample -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_sc_b (uint32_t, int8_t) -Generated assembler @code{cv.sll.sc.b} -@end deftypefn +Blend the first and second argument vectors according to the sign bits of the +corresponding elements of the third argument vector. This is similar to the +@code{vsel} and @code{xxsel} instructions but for bigger elements. 
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.sll.sci.b}
-@end deftypefn
+@findex vec_blendv
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_h (uint32_t, uint32_t)
-Generated assembler @code{cv.or.h}
-@end deftypefn
+Vector Permute Extended
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_b (uint32_t, uint32_t)
-Generated assembler @code{cv.or.b}
-@end deftypefn
+@smallexample
+@exdent vector signed char vec_permx (vector signed char, vector signed char,
+vector unsigned char, const int);
+@exdent vector unsigned char vec_permx (vector unsigned char,
+vector unsigned char, vector unsigned char, const int);
+@exdent vector signed short vec_permx (vector signed short,
+vector signed short, vector unsigned char, const int);
+@exdent vector unsigned short vec_permx (vector unsigned short,
+vector unsigned short, vector unsigned char, const int);
+@exdent vector signed int vec_permx (vector signed int, vector signed int,
+vector unsigned char, const int);
+@exdent vector unsigned int vec_permx (vector unsigned int,
+vector unsigned int, vector unsigned char, const int);
+@exdent vector signed long long vec_permx (vector signed long long,
+vector signed long long, vector unsigned char, const int);
+@exdent vector unsigned long long vec_permx (vector unsigned long long,
+vector unsigned long long, vector unsigned char, const int);
+@exdent vector float vec_permx (vector float, vector float,
+vector unsigned char, const int);
+@exdent vector double vec_permx (vector double, vector double,
+vector unsigned char, const int);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.or.sc.h}
-@end deftypefn
+Perform a partial permute of the first two arguments, which form a 32-byte
+section of an emulated vector up to 256 bytes wide, using the partial permute
+control vector in the third argument. The fourth argument (constrained to
+values of 0-7) identifies which 32-byte section of the emulated vector is
+contained in the first two arguments.
+@findex vec_permx
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.or.sci.h}
-@end deftypefn
+@smallexample
+@exdent vector unsigned long long int
+@exdent vec_pext (vector unsigned long long int, vector unsigned long long int);
+@end smallexample
+Perform a vector parallel bit extract operation, as if implemented by
+the @code{vpextd} instruction.
+@findex vec_pext
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.or.sc.b}
-@end deftypefn
+@smallexample
+@exdent vector unsigned char vec_stril (vector unsigned char);
+@exdent vector signed char vec_stril (vector signed char);
+@exdent vector unsigned short vec_stril (vector unsigned short);
+@exdent vector signed short vec_stril (vector signed short);
+@end smallexample
+Isolate the left-most non-zero elements of the incoming vector argument,
+replacing all elements to the right of the left-most zero element
+found within the argument with zero. The typical implementation uses
+the @code{vstribl} or @code{vstrihl} instruction on big-endian targets
+and uses the @code{vstribr} or @code{vstrihr} instruction on
+little-endian targets.
+@findex vec_stril
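+An informal sketch (assuming @option{-mcpu=power10} and
+@code{<altivec.h>}) of isolating the leading bytes of a
+zero-terminated string fragment:
+
+@smallexample
+#include <altivec.h>
+
+/* Clear every byte to the right of the first zero byte; the bytes
+   before it (the string prefix) are kept unchanged.  */
+vector unsigned char
+keep_prefix (vector unsigned char s)
+@{
+  return vec_stril (s);
+@}
+@end smallexample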
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.or.sci.b}
-@end deftypefn
+@smallexample
+@exdent int vec_stril_p (vector unsigned char);
+@exdent int vec_stril_p (vector signed char);
+@exdent int vec_stril_p (vector unsigned short);
+@exdent int vec_stril_p (vector signed short);
+@end smallexample
+Return a non-zero value if and only if the argument contains a zero
+element. The typical implementation uses
+the @code{vstribl.} or @code{vstrihl.} instruction on big-endian targets
+and uses the @code{vstribr.} or @code{vstrihr.} instruction on
+little-endian targets. Choose this built-in to check for the presence of
+a zero element if the same argument is also passed to @code{vec_stril}.
+@findex vec_stril_p
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_h (uint32_t, uint32_t)
-Generated assembler @code{cv.xor.h}
-@end deftypefn
+@smallexample
+@exdent vector unsigned char vec_strir (vector unsigned char);
+@exdent vector signed char vec_strir (vector signed char);
+@exdent vector unsigned short vec_strir (vector unsigned short);
+@exdent vector signed short vec_strir (vector signed short);
+@end smallexample
+Isolate the right-most non-zero elements of the incoming vector argument,
+replacing all elements to the left of the right-most zero element
+found within the argument with zero. The typical implementation uses
+the @code{vstribr} or @code{vstrihr} instruction on big-endian targets
+and uses the @code{vstribl} or @code{vstrihl} instruction on
+little-endian targets.
+@findex vec_strir
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_b (uint32_t, uint32_t)
-Generated assembler @code{cv.xor.b}
-@end deftypefn
+@smallexample
+@exdent int vec_strir_p (vector unsigned char);
+@exdent int vec_strir_p (vector signed char);
+@exdent int vec_strir_p (vector unsigned short);
+@exdent int vec_strir_p (vector signed short);
+@end smallexample
+Return a non-zero value if and only if the argument contains a zero
+element. The typical implementation uses
+the @code{vstribr.} or @code{vstrihr.} instruction on big-endian targets
+and uses the @code{vstribl.} or @code{vstrihl.} instruction on
+little-endian targets. Choose this built-in to check for the presence of
+a zero element if the same argument is also passed to @code{vec_strir}.
+@findex vec_strir_p
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.xor.sc.h}
-@end deftypefn
+@smallexample
+@exdent vector unsigned char
+@exdent vec_ternarylogic (vector unsigned char, vector unsigned char,
+                          vector unsigned char, const unsigned int);
+@exdent vector unsigned short
+@exdent vec_ternarylogic (vector unsigned short, vector unsigned short,
+                          vector unsigned short, const unsigned int);
+@exdent vector unsigned int
+@exdent vec_ternarylogic (vector unsigned int, vector unsigned int,
+                          vector unsigned int, const unsigned int);
+@exdent vector unsigned long long int
+@exdent vec_ternarylogic (vector unsigned long long int, vector unsigned long long int,
+                          vector unsigned long long int, const unsigned int);
+@exdent vector unsigned __int128
+@exdent vec_ternarylogic (vector unsigned __int128, vector unsigned __int128,
+                          vector unsigned __int128, const unsigned int);
+@end smallexample
+Perform a 128-bit vector evaluate operation, as if implemented by the
+@code{xxeval} instruction.  The fourth argument must be a literal
+integer value between 0 and 255 inclusive.
+@findex vec_ternarylogic
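+As an informal illustration (assuming @option{-mcpu=power10} and
+@code{<altivec.h>}), the 8-bit immediate is the truth table of the
+desired function of three bits.  The value 0x7E below is chosen
+because its truth table is palindromic (bit @math{i} equals bit
+@math{7 - i}), so it encodes the same function regardless of the
+bit-numbering convention used for the table:
+
+@smallexample
+#include <altivec.h>
+
+/* For each bit position, the result bit is 1 exactly when the three
+   input bits are not all equal.  0x7E sets the table entries for all
+   input combinations except 000 and 111, and is palindromic, so the
+   encoding is independent of the table's bit-numbering order.  */
+vector unsigned long long
+not_all_equal (vector unsigned long long a, vector unsigned long long b,
+               vector unsigned long long c)
+@{
+  return vec_ternarylogic (a, b, c, 0x7e);
+@}
+@end smallexample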
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.xor.sci.h}
-@end deftypefn
+@smallexample
+@exdent vector unsigned char vec_genpcvm (vector unsigned char, const int);
+@exdent vector unsigned short vec_genpcvm (vector unsigned short, const int);
+@exdent vector unsigned int vec_genpcvm (vector unsigned int, const int);
+@exdent vector unsigned long long int vec_genpcvm (vector unsigned long long int,
+                                                   const int);
+@end smallexample
+Generate a PCV (permute control vector) from the specified mask, as if
+implemented by the @code{xxgenpcvbm}, @code{xxgenpcvhm}, or
+@code{xxgenpcvwm} instruction, where the immediate value is either 0, 1,
+2, or 3.
+@findex vec_genpcvm
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.xor.sc.b}
-@end deftypefn
+Vector Integer Multiply/Divide/Modulo
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.xor.sci.b}
-@end deftypefn
+@smallexample
+@exdent vector signed int
+@exdent vec_mulh (vector signed int @var{a}, vector signed int @var{b});
+@exdent vector unsigned int
+@exdent vec_mulh (vector unsigned int @var{a}, vector unsigned int @var{b});
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_h (uint32_t, uint32_t)
-Generated assembler @code{cv.and.h}
-@end deftypefn
+For each integer value @code{i} from 0 to 3, do the following. The integer
+value in word element @code{i} of @var{a} is multiplied by the integer value
+in word element @code{i} of @var{b}. The high-order 32 bits of the 64-bit
+product are placed into word element @code{i} of the vector returned.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_b (uint32_t, uint32_t)
-Generated assembler @code{cv.and.b}
-@end deftypefn
+@smallexample
+@exdent vector signed long long
+@exdent vec_mulh (vector signed long long @var{a}, vector signed long long @var{b});
+@exdent vector unsigned long long
+@exdent vec_mulh (vector unsigned long long @var{a}, vector unsigned long long @var{b});
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.and.sc.h}
-@end deftypefn
+For each integer value @code{i} from 0 to 1, do the following. The integer
+value in doubleword element @code{i} of @var{a} is multiplied by the integer
+value in doubleword element @code{i} of @var{b}. The high-order 64 bits of
+the 128-bit product are placed into doubleword element @code{i} of the
+vector returned.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.and.sci.h}
-@end deftypefn
+@smallexample
+@exdent vector unsigned long long
+@exdent vec_mul (vector unsigned long long @var{a}, vector unsigned long long @var{b});
+@exdent vector signed long long
+@exdent vec_mul (vector signed long long @var{a}, vector signed long long @var{b});
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.and.sc.b}
-@end deftypefn
+For each integer value @code{i} from 0 to 1, do the following. The integer
+value in doubleword element @code{i} of @var{a} is multiplied by the integer
+value in doubleword element @code{i} of @var{b}. The low-order 64 bits of
+the 128-bit product are placed into doubleword element @code{i} of the
+vector returned.
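+A short informal sketch (assuming @option{-mcpu=power10} and
+@code{<altivec.h>}): the high and low halves of the full 64x64-bit
+products can be combined from @code{vec_mulh} and @code{vec_mul}:
+
+@smallexample
+#include <altivec.h>
+
+/* For each doubleword element i, hi[i] and lo[i] together form the
+   full 128-bit product a[i] * b[i].  */
+void
+full_products (vector unsigned long long a, vector unsigned long long b,
+               vector unsigned long long *hi, vector unsigned long long *lo)
+@{
+  *hi = vec_mulh (a, b);
+  *lo = vec_mul (a, b);
+@}
+@end smallexample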
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.and.sci.b}
-@end deftypefn
+@smallexample
+@exdent vector signed int
+@exdent vec_div (vector signed int @var{a}, vector signed int @var{b});
+@exdent vector unsigned int
+@exdent vec_div (vector unsigned int @var{a}, vector unsigned int @var{b});
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_abs_h (uint32_t)
-Generated assembler @code{cv.abs.h}
-@end deftypefn
+For each integer value @code{i} from 0 to 3, do the following. The integer in
+word element @code{i} of @var{a} is divided by the integer in word element
+@code{i} of @var{b}. The unique integer quotient is placed into the word
+element @code{i} of the vector returned. If an attempt is made to divide by
+zero, the quotient is undefined.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_abs_b (uint32_t)
-Generated assembler @code{cv.abs.b}
-@end deftypefn
+@smallexample
+@exdent vector signed long long
+@exdent vec_div (vector signed long long @var{a}, vector signed long long @var{b});
+@exdent vector unsigned long long
+@exdent vec_div (vector unsigned long long @var{a}, vector unsigned long long @var{b});
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_h (uint32_t, uint32_t)
-Generated assembler @code{cv.dotup.h}
-@end deftypefn
+For each integer value @code{i} from 0 to 1, do the following. The integer in
+doubleword element @code{i} of @var{a} is divided by the integer in doubleword
+element @code{i} of @var{b}. The unique integer quotient is placed into the
+doubleword element @code{i} of the vector returned. If an attempt is made to
+perform the division 0x8000_0000_0000_0000 ÷ -1, or to divide by zero,
+the quotient is undefined.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_b (uint32_t, uint32_t)
-Generated assembler @code{cv.dotup.b}
-@end deftypefn
+@smallexample
+@exdent vector signed int
+@exdent vec_dive (vector signed int @var{a}, vector signed int @var{b});
+@exdent vector unsigned int
+@exdent vec_dive (vector unsigned int @var{a}, vector unsigned int @var{b});
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_sc_h (uint32_t, uint16_t)
-Generated assembler @code{cv.dotup.sc.h}
-@end deftypefn
+For each integer value @code{i} from 0 to 3, do the following. The integer in
+word element @code{i} of @var{a} is shifted left by 32 bits, then divided by
+the integer in word element @code{i} of @var{b}. The unique integer quotient
+is placed into the word element @code{i} of the vector returned. If the
+quotient cannot be represented in 32 bits, or if an attempt is made to divide
+by zero, the quotient is undefined.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_sc_h (uint32_t, uint6_t)
-Generated assembler @code{cv.dotup.sci.h}
-@end deftypefn
+@smallexample
+@exdent vector signed long long
+@exdent vec_dive (vector signed long long @var{a}, vector signed long long @var{b});
+@exdent vector unsigned long long
+@exdent vec_dive (vector unsigned long long @var{a}, vector unsigned long long @var{b});
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_sc_b (uint32_t, uint8_t)
-Generated assembler @code{cv.dotup.sc.b}
-@end deftypefn
+For each integer value @code{i} from 0 to 1, do the following.
+The integer in
+doubleword element @code{i} of a is shifted left by 64 bits, then divided by
+the integer in doubleword element @code{i} of b. The unique integer quotient
+is placed into the doubleword element @code{i} of the vector returned. If the
+quotient cannot be represented in 64 bits, or if an attempt is made to perform
+<anything> ÷ 0 then the quotient is undefined.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_sc_b (uint32_t, uint6_t)
-Generated assembler @code{cv.dotup.sci.b}
-@end deftypefn
+@smallexample
+@exdent vector signed int
+@exdent vec_mod (vector signed int @var{a}, vector signed int @var{b});
+@exdent vector unsigned int
+@exdent vec_mod (vector unsigned int @var{a}, vector unsigned int @var{b});
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_h (uint32_t, uint32_t)
-Generated assembler @code{cv.dotusp.h}
-@end deftypefn
+For each integer value @code{i} from 0 to 3, do the following. The integer in
+word element @code{i} of a is divided by the integer in word element @code{i}
+of b. The unique integer remainder is placed into the word element @code{i} of
+the vector returned. If an attempt is made to perform any of the divisions
+0x8000_0000 ÷ -1 or <anything> ÷ 0 then the remainder is undefined.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_b (uint32_t, uint32_t)
-Generated assembler @code{cv.dotusp.b}
-@end deftypefn
+@smallexample
+@exdent vector signed long long
+@exdent vec_mod (vector signed long long @var{a}, vector signed long long @var{b});
+@exdent vector unsigned long long
+@exdent vec_mod (vector unsigned long long @var{a}, vector unsigned long long @var{b});
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.dotusp.sc.h}
-@end deftypefn
+For each integer value @code{i} from 0 to 1, do the following. The integer in
+doubleword element @code{i} of a is divided by the integer in doubleword
+element @code{i} of b. The unique integer remainder is placed into the
+doubleword element @code{i} of the vector returned. If an attempt is made to
+perform <anything> ÷ 0 then the remainder is undefined.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.dotusp.sci.h}
-@end deftypefn
+Generate PCV from the specified mask size, as if implemented by the
+@code{xxgenpcvbm}, @code{xxgenpcvhm}, @code{xxgenpcvwm} instructions, where
+the immediate value is 0, 1, 2, or 3.
+@findex vec_genpcvm
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.dotusp.sc.b}
-@end deftypefn
+@smallexample
+@exdent vector unsigned __int128 vec_rl (vector unsigned __int128 @var{A},
+                                         vector unsigned __int128 @var{B});
+@exdent vector signed __int128 vec_rl (vector signed __int128 @var{A},
+                                       vector unsigned __int128 @var{B});
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.dotusp.sci.b}
-@end deftypefn
+Result value: Each element of @var{R} is obtained by rotating the
+corresponding element of @var{A} left by the number of bits specified by the
+corresponding element of @var{B}.
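As a minimal sketch of the 128-bit rotate (same Power10 assumptions as the
earlier sketch; the helper name @code{rotl1} is hypothetical):

@smallexample
#include <altivec.h>

/* Sketch: rotate the single 128-bit element left by one bit.  */
vector unsigned __int128
rotl1 (vector unsigned __int128 a)
@{
  vector unsigned __int128 one = @{ 1 @};
  return vec_rl (a, one);
@}
@end smallexample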
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_h (uint32_t, uint32_t)
-Generated assembler @code{cv.dotsp.h}
-@end deftypefn
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_b (uint32_t, uint32_t)
-Generated assembler @code{cv.dotsp.b}
-@end deftypefn
+@smallexample
+@exdent vector unsigned __int128 vec_rlmi (vector unsigned __int128,
+                                           vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_rlmi (vector signed __int128,
+                                         vector signed __int128,
+                                         vector unsigned __int128);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.dotsp.sc.h}
-@end deftypefn
+Returns the result of rotating the first input and inserting it under mask
+into the second input. The first bit and the last bit of the mask are
+obtained from the two 7-bit fields bits [108:115] and bits [117:123]
+respectively of the second input. The shift is obtained from the third input
+in the 7-bit field bits [125:131], where all bits are counted from zero at
+the left.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.dotsp.sci.h}
-@end deftypefn
+@smallexample
+@exdent vector unsigned __int128 vec_rlnm (vector unsigned __int128,
+                                           vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_rlnm (vector signed __int128,
+                                         vector unsigned __int128,
+                                         vector unsigned __int128);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.dotsp.sc.b}
-@end deftypefn
+Returns the result of rotating the first input and ANDing it with a mask.
+The first bit and the last bit of the mask are obtained from the two 7-bit
+fields bits [117:123] and bits [125:131] respectively of the second input.
+The shift is obtained from the third input in the 7-bit field bits
+[125:131], where all bits are counted from zero at the left.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.dotsp.sci.b}
-@end deftypefn
+@smallexample
+@exdent vector unsigned __int128 vec_sl (vector unsigned __int128 @var{A}, vector unsigned __int128 @var{B});
+@exdent vector signed __int128 vec_sl (vector signed __int128 @var{A}, vector unsigned __int128 @var{B});
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_h (uint32_t, uint32_t, uint32_t)
-Generated assembler @code{cv.sdotup.h}
-@end deftypefn
+Result value: Each element of @var{R} is obtained by shifting the
+corresponding element of @var{A} left by the number of bits specified by the
+corresponding element of @var{B}.
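A corresponding sketch for the 128-bit shift (same assumptions as above; the
helper name @code{shl128} is hypothetical, and it assumes the shift count is
effectively taken from the low-order bits of the element of @var{B}, as
described for @code{vec_sl}):

@smallexample
#include <altivec.h>

/* Sketch: shift the single 128-bit element left by n bits.  */
vector unsigned __int128
shl128 (vector unsigned __int128 a, unsigned int n)
@{
  vector unsigned __int128 shift = @{ n @};
  return vec_sl (a, shift);
@}
@end smallexample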
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_b (uint32_t, uint32_t, uint32_t) -Generated assembler @code{cv.sdotup.b} -@end deftypefn +@smallexample +@exdent vector unsigned __int128 vec_sr(vector unsigned __int128 @var{A}, vector unsigned __int128 @var{B}); +@exdent vector signed __int128 vec_sr(vector signed __int128 @var{A}, vector unsigned __int128 @var{B}); +@end smallexample -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_sc_h (uint32_t, uint16_t, uint32_t) -Generated assembler @code{cv.sdotup.sc.h} -@end deftypefn +Result value: Each element of @var{R} is obtained by shifting the corresponding element of +@var{A} right by the number of bits specified by the corresponding element of @var{B}. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_sc_h (uint32_t, uint6_t, uint32_t) -Generated assembler @code{cv.sdotup.sci.h} -@end deftypefn +@smallexample +@exdent vector unsigned __int128 vec_sra(vector unsigned __int128 @var{A}, vector unsigned __int128 @var{B}); +@exdent vector signed __int128 vec_sra(vector signed __int128 @var{A}, vector unsigned __int128 @var{B}); +@end smallexample -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_sc_b (uint32_t, uint8_t, uint32_t) -Generated assembler @code{cv.sdotup.sc.b} -@end deftypefn +Result value: Each element of @var{R} is obtained by arithmetic shifting the corresponding +element of @var{A} right by the number of bits specified by the corresponding element of @var{B}. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_sc_b (uint32_t, uint6_t, uint32_t) -Generated assembler @code{cv.sdotup.sci.b} -@end deftypefn +@smallexample +@exdent vector unsigned __int128 vec_mule (vector unsigned long long, + vector unsigned long long); +@exdent vector signed __int128 vec_mule (vector signed long long, + vector signed long long); +@end smallexample -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_h (uint32_t, uint32_t, uint32_t) -Generated assembler @code{cv.sdotusp.h} -@end deftypefn +Returns a vector containing a 128-bit integer result of multiplying the even +doubleword elements of the two inputs. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_b (uint32_t, uint32_t, uint32_t) -Generated assembler @code{cv.sdotusp.b} -@end deftypefn +@smallexample +@exdent vector unsigned __int128 vec_mulo (vector unsigned long long, + vector unsigned long long); +@exdent vector signed __int128 vec_mulo (vector signed long long, + vector signed long long); +@end smallexample -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_sc_h (uint32_t, int16_t, uint32_t) -Generated assembler @code{cv.sdotusp.sc.h} -@end deftypefn +Returns a vector containing a 128-bit integer result of multiplying the odd +doubleword elements of the two inputs. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_sc_h (uint32_t, int6_t, uint32_t) -Generated assembler @code{cv.sdotusp.sci.h} -@end deftypefn +@smallexample +@exdent vector unsigned __int128 vec_div (vector unsigned __int128, + vector unsigned __int128); +@exdent vector signed __int128 vec_div (vector signed __int128, + vector signed __int128); +@end smallexample -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_sc_b (uint32_t, int8_t, uint32_t) -Generated assembler @code{cv.sdotusp.sc.b} -@end deftypefn +Returns the result of dividing the first operand by the second operand. 
+An
+attempt to divide any value by zero or to divide the most negative signed
+128-bit integer by negative one results in an undefined value.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_sc_b (uint32_t, int6_t, uint32_t)
-Generated assembler @code{cv.sdotusp.sci.b}
-@end deftypefn
+@smallexample
+@exdent vector unsigned __int128 vec_dive (vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_dive (vector signed __int128,
+                                         vector signed __int128);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_h (uint32_t, uint32_t, uint32_t)
-Generated assembler @code{cv.sdotsp.h}
-@end deftypefn
+The result is produced by shifting the first input left by 128 bits and
+dividing by the second. If an attempt is made to divide by zero or the result
+is larger than 128 bits, the result is undefined.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_b (uint32_t, uint32_t, uint32_t)
-Generated assembler @code{cv.sdotsp.b}
-@end deftypefn
+@smallexample
+@exdent vector unsigned __int128 vec_mod (vector unsigned __int128,
+                                          vector unsigned __int128);
+@exdent vector signed __int128 vec_mod (vector signed __int128,
+                                        vector signed __int128);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_sc_h (uint32_t, int16_t, uint32_t)
-Generated assembler @code{cv.sdotsp.sc.h}
-@end deftypefn
+The result is the remainder of dividing the first input by the second
+input.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_sc_h (uint32_t, int6_t, uint32_t)
-Generated assembler @code{cv.sdotsp.sci.h}
-@end deftypefn
+The following builtins perform 128-bit vector comparisons. The
+@code{vec_all_xx}, @code{vec_any_xx}, and @code{vec_cmpxx} functions, where
+@code{xx} is one of the operations @code{eq, ne, gt, lt, ge, le}, perform
+pairwise comparisons between the elements at the same positions within their
+two vector arguments. The @code{vec_all_xx} function returns a non-zero value
+if and only if all pairwise comparisons are true. The @code{vec_any_xx}
+function returns a non-zero value if and only if at least one pairwise
+comparison is true. The @code{vec_cmpxx} function returns a vector of the
+same type as its two arguments, within which each element consists of all
+ones if the specified logical comparison of the corresponding elements was
+true, and all zeros otherwise. A brief usage sketch appears below, followed
+by the full list of prototypes.
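A minimal sketch of the scalar-result form (assuming a Power10 target; the
function name @code{same128} is hypothetical):

@smallexample
#include <altivec.h>

/* Sketch: vec_all_eq yields a scalar truth value, while vec_cmpeq
   yields a per-element all-ones/all-zeros mask.  */
int
same128 (vector unsigned __int128 a, vector unsigned __int128 b)
@{
  return vec_all_eq (a, b);
@}
@end smallexample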
+
+@smallexample
+vector bool __int128 vec_cmpeq (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpeq (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpne (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpne (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpgt (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpgt (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmplt (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmplt (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpge (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpge (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmple (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmple (vector unsigned __int128, vector unsigned __int128);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_sc_b (uint32_t, int8_t, uint32_t)
-Generated assembler @code{cv.sdotsp.sc.b}
-@end deftypefn
+int vec_all_eq (vector signed __int128, vector signed __int128);
+int vec_all_eq (vector unsigned __int128, vector unsigned __int128);
+int vec_all_ne (vector signed __int128, vector signed __int128);
+int vec_all_ne (vector unsigned __int128, vector unsigned __int128);
+int vec_all_gt (vector signed __int128, vector signed __int128);
+int vec_all_gt (vector unsigned __int128, vector unsigned __int128);
+int vec_all_lt (vector signed __int128, vector signed __int128);
+int vec_all_lt (vector unsigned __int128, vector unsigned __int128);
+int vec_all_ge (vector signed __int128, vector signed __int128);
+int vec_all_ge (vector unsigned __int128, vector unsigned __int128);
+int vec_all_le (vector signed __int128, vector signed __int128);
+int vec_all_le (vector unsigned __int128, vector unsigned __int128);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_sc_b (uint32_t, int6_t, uint32_t)
-Generated assembler @code{cv.sdotsp.sci.b}
-@end deftypefn
+int vec_any_eq (vector signed __int128, vector signed __int128);
+int vec_any_eq (vector unsigned __int128, vector unsigned __int128);
+int vec_any_ne (vector signed __int128, vector signed __int128);
+int vec_any_ne (vector unsigned __int128, vector unsigned __int128);
+int vec_any_gt (vector signed __int128, vector signed __int128);
+int vec_any_gt (vector unsigned __int128, vector unsigned __int128);
+int vec_any_lt (vector signed __int128, vector signed __int128);
+int vec_any_lt (vector unsigned __int128, vector unsigned __int128);
+int vec_any_ge (vector signed __int128, vector signed __int128);
+int vec_any_ge (vector unsigned __int128, vector unsigned __int128);
+int vec_any_le (vector signed __int128, vector signed __int128);
+int vec_any_le (vector unsigned __int128, vector unsigned __int128);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_extract_h (uint32_t, uint6_t)
-Generated assembler @code{cv.extract.h}
-@end deftypefn
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_extract_b (uint32_t, uint6_t)
-Generated assembler @code{cv.extract.b}
-@end deftypefn
+The following instances are extensions of the existing overloaded built-ins
+@code{vec_sld}, @code{vec_sldw}, @code{vec_slo}, @code{vec_sro}, @code{vec_srl}
+that are documented in the PVIPR.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_extractu_h (uint32_t, uint6_t)
-Generated assembler @code{cv.extractu.h}
-@end deftypefn
+@smallexample
+@exdent vector signed __int128 vec_sld (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_sld (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
+@exdent vector signed __int128 vec_sldw (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_sldw (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
+@exdent vector signed __int128 vec_slo (vector signed __int128,
+vector signed char);
+@exdent vector signed __int128 vec_slo (vector signed __int128,
+vector unsigned char);
+@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
+vector signed char);
+@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
+vector unsigned char);
+@exdent vector signed __int128 vec_sro (vector signed __int128,
+vector signed char);
+@exdent vector signed __int128 vec_sro (vector signed __int128,
+vector unsigned char);
+@exdent vector unsigned __int128 vec_sro (vector unsigned __int128,
+vector signed char);
+@exdent vector unsigned __int128 vec_sro (vector unsigned __int128,
+vector unsigned char);
+@exdent vector signed __int128 vec_srl (vector signed __int128,
+vector unsigned char);
+@exdent vector unsigned __int128 vec_srl (vector unsigned __int128,
+vector unsigned char);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_extractu_b (uint32_t, uint6_t)
-Generated assembler @code{cv.extractu.b}
-@end deftypefn
+@node PowerPC Hardware Transactional Memory Built-in Functions
+@subsection PowerPC Hardware Transactional Memory Built-in Functions
+GCC provides two interfaces for accessing the Hardware Transactional
+Memory (HTM) instructions available on some of the PowerPC family
+of processors (e.g., POWER8). The two interfaces are a low-level
+interface, consisting of built-in functions specific to PowerPC, and a
+higher-level interface, consisting of inline functions that are common
+between PowerPC and S/390.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_insert_h (uint32_t, uint32_t)
-Generated assembler @code{cv.insert.h}
-@end deftypefn
+@subsubsection PowerPC HTM Low Level Built-in Functions
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_insert_b (uint32_t, uint32_t)
-Generated assembler @code{cv.insert.b}
-@end deftypefn
+The following low-level built-in functions are available with
+@option{-mhtm} or @option{-mcpu=CPU} where CPU is `power8' or later.
+They all generate the machine instruction that is part of the name.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shuffle_h (uint32_t, uint32_t)
-Generated assembler @code{cv.shuffle.h}
-@end deftypefn
+The HTM builtins (with the exception of @code{__builtin_tbegin}) return
+the full 4-bit condition register value set by their associated hardware
+instruction. The header file @code{htmintrin.h} defines some macros that can
+be used to decipher the return value. The @code{__builtin_tbegin} builtin
+returns a simple @code{true} or @code{false} value depending on whether a
+transaction was successfully started or not.
+The arguments of the builtins match exactly the
+type and order of the associated hardware instruction's operands, except for
+the @code{__builtin_tcheck} builtin, which does not take any input arguments.
+Refer to the ISA manual for a description of each instruction's operands.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shuffle_b (uint32_t, uint32_t)
-Generated assembler @code{cv.shuffle.b}
-@end deftypefn
+@smallexample
+unsigned int __builtin_tbegin (unsigned int);
+unsigned int __builtin_tend (unsigned int);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shuffle_sci_h (uint32_t, uint4_t)
-Generated assembler @code{cv.shuffle.sci.h}
-@end deftypefn
+unsigned int __builtin_tabort (unsigned int);
+unsigned int __builtin_tabortdc (unsigned int, unsigned int, unsigned int);
+unsigned int __builtin_tabortdci (unsigned int, unsigned int, int);
+unsigned int __builtin_tabortwc (unsigned int, unsigned int, unsigned int);
+unsigned int __builtin_tabortwci (unsigned int, unsigned int, int);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shufflei0_sci_b (uint32_t, uint4_t)
-Generated assembler @code{cv.shufflei0.sci.b}
-@end deftypefn
+unsigned int __builtin_tcheck (void);
+unsigned int __builtin_treclaim (unsigned int);
+unsigned int __builtin_trechkpt (void);
+unsigned int __builtin_tsr (unsigned int);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shufflei1_sci_b (uint32_t, uint4_t)
-Generated assembler @code{cv.shufflei1.sci.b}
-@end deftypefn
+In addition to the above HTM built-ins, we have added built-ins for
+some common extended mnemonics of the HTM instructions:
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shufflei2_sci_b (uint32_t, uint4_t)
-Generated assembler @code{cv.shufflei2.sci.b}
-@end deftypefn
+@smallexample
+unsigned int __builtin_tendall (void);
+unsigned int __builtin_tresume (void);
+unsigned int __builtin_tsuspend (void);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shufflei3_sci_b (uint32_t, uint4_t)
-Generated assembler @code{cv.shufflei3.sci.b}
-@end deftypefn
+Note that the semantics of the above HTM builtins are required to mimic
+the locking semantics used for critical sections. Builtins that are used
+to create a new transaction or restart a suspended transaction must have
+lock-acquisition-like semantics, while those builtins that end or suspend a
+transaction must have lock-release-like semantics. Specifically, this must
+mimic lock semantics as specified by C++11, for example: Lock acquisition is
+as-if an execution of __atomic_exchange_n(&globallock,1,__ATOMIC_ACQUIRE)
+that returns 0, and lock release is as-if an execution of
+__atomic_store(&globallock,0,__ATOMIC_RELEASE), with globallock being an
+implicit implementation-defined lock used for all transactions. The HTM
+instructions associated with the builtins inherently provide the
+correct acquisition and release hardware barriers required. However,
+the compiler must also be prohibited from moving loads and stores across
+the builtins in a way that would violate their semantics. This has been
+accomplished by adding memory barriers to the associated HTM instructions
+(which is a conservative approach to provide acquire and release semantics).
+Earlier versions of the compiler did not treat the HTM instructions as
+memory barriers.
+A @code{__TM_FENCE__} macro has been added, which can
+be used to determine whether the current compiler treats HTM instructions
+as memory barriers or not. This allows the user to explicitly add memory
+barriers to their code when using an older version of the compiler.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shuffle2_h (uint32_t, uint32_t, uint32_t)
-Generated assembler @code{cv.shuffle2.h}
-@end deftypefn
+The following set of built-in functions is available to gain access
+to the HTM-specific special purpose registers.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shuffle2_b (uint32_t, uint32_t, uint32_t)
-Generated assembler @code{cv.shuffle2.b}
-@end deftypefn
+@smallexample
+unsigned long __builtin_get_texasr (void);
+unsigned long __builtin_get_texasru (void);
+unsigned long __builtin_get_tfhar (void);
+unsigned long __builtin_get_tfiar (void);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_packlo_h (uint32_t, uint32_t)
-Generated assembler @code{cv.pack}
-@end deftypefn
+void __builtin_set_texasr (unsigned long);
+void __builtin_set_texasru (unsigned long);
+void __builtin_set_tfhar (unsigned long);
+void __builtin_set_tfiar (unsigned long);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_packhi_h (uint32_t, uint32_t)
-Generated assembler @code{cv.pack.h}
-@end deftypefn
+Example usage of these low-level built-in functions may look like:
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_packhi_b (uint32_t, uint32_t, uint32_t)
-Generated assembler @code{cv.packhi.b}
-@end deftypefn
+@smallexample
+#include <htmintrin.h>
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_packlo_b (uint32_t, uint32_t, uint32_t)
-Generated assembler @code{cv.packlo.b}
-@end deftypefn
+int num_retries = 10;
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_h (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpeq.h}
-@end deftypefn
+while (1)
+  @{
+    if (__builtin_tbegin (0))
+      @{
+        /* Transaction State Initiated. */
+        if (is_locked (lock))
+          __builtin_tabort (0);
+        ... transaction code...
+        __builtin_tend (0);
+        break;
+      @}
+    else
+      @{
+        /* Transaction State Failed. Use locks if the transaction
+           failure is "persistent" or we've tried too many times. */
+        if (num_retries-- <= 0
+            || _TEXASRU_FAILURE_PERSISTENT (__builtin_get_texasru ()))
+          @{
+            acquire_lock (lock);
+            ... non transactional fallback path...
+            release_lock (lock);
+            break;
+          @}
+      @}
+  @}
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_b (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpeq.b}
-@end deftypefn
+One final built-in function has been added that returns the value of
+the 2-bit Transaction State field of the Machine Status Register (MSR)
+as stored in @code{CR0}.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.cmpeq.sc.h}
-@end deftypefn
+@smallexample
+unsigned long __builtin_ttest (void)
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.cmpeq.sci.h}
-@end deftypefn
+This built-in can be used to determine the current transaction state
+using the following code example:
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.cmpeq.sc.b}
-@end deftypefn
+@smallexample
+#include <htmintrin.h>
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.cmpeq.sci.b}
-@end deftypefn
+unsigned char tx_state = _HTM_STATE (__builtin_ttest ());
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_h (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpne.h}
-@end deftypefn
+if (tx_state == _HTM_TRANSACTIONAL)
+  @{
+    /* Code to use in transactional state. */
+  @}
+else if (tx_state == _HTM_NONTRANSACTIONAL)
+  @{
+    /* Code to use in non-transactional state. */
+  @}
+else if (tx_state == _HTM_SUSPENDED)
+  @{
+    /* Code to use in transaction suspended state. */
+  @}
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_b (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpne.b}
-@end deftypefn
+@subsubsection PowerPC HTM High Level Inline Functions
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.cmpne.sc.h}
-@end deftypefn
+The following high-level HTM interface is made available by including
+@code{<htmxlintrin.h>} and using @option{-mhtm} or @option{-mcpu=CPU}
+where CPU is `power8' or later. This interface is common between PowerPC
+and S/390, allowing users to write one HTM source implementation that
+can be compiled and executed on either system.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.cmpne.sci.h}
-@end deftypefn
+@smallexample
+long __TM_simple_begin (void);
+long __TM_begin (void* const TM_buff);
+long __TM_end (void);
+void __TM_abort (void);
+void __TM_named_abort (unsigned char const code);
+void __TM_resume (void);
+void __TM_suspend (void);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.cmpne.sc.b}
-@end deftypefn
+long __TM_is_user_abort (void* const TM_buff);
+long __TM_is_named_user_abort (void* const TM_buff, unsigned char *code);
+long __TM_is_illegal (void* const TM_buff);
+long __TM_is_footprint_exceeded (void* const TM_buff);
+long __TM_nesting_depth (void* const TM_buff);
+long __TM_is_nested_too_deep(void* const TM_buff);
+long __TM_is_conflict(void* const TM_buff);
+long __TM_is_failure_persistent(void* const TM_buff);
+long __TM_failure_address(void* const TM_buff);
+long long __TM_failure_code(void* const TM_buff);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.cmpne.sci.b}
-@end deftypefn
+Using this common set of HTM inline functions, we can create
+a more portable version of the HTM example in the previous
+section that will work on either PowerPC or S/390:
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_h (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpgt.h}
-@end deftypefn
+@smallexample
+#include <htmxlintrin.h>
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_b (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpgt.b}
-@end deftypefn
+int num_retries = 10;
+TM_buff_type TM_buff;
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.cmpgt.sc.h}
-@end deftypefn
+while (1)
+  @{
+    if (__TM_begin (TM_buff) == _HTM_TBEGIN_STARTED)
+      @{
+        /* Transaction State Initiated. */
+        if (is_locked (lock))
+          __TM_abort ();
+        ... transaction code...
+        __TM_end ();
+        break;
+      @}
+    else
+      @{
+        /* Transaction State Failed. Use locks if the transaction
+           failure is "persistent" or we've tried too many times. */
+        if (num_retries-- <= 0
+            || __TM_is_failure_persistent (TM_buff))
+          @{
+            acquire_lock (lock);
+            ... non transactional fallback path...
+            release_lock (lock);
+            break;
+          @}
+      @}
+  @}
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.cmpgt.sci.h}
-@end deftypefn
+@node PowerPC Atomic Memory Operation Functions
+@subsection PowerPC Atomic Memory Operation Functions
+ISA 3.0 of the PowerPC added new atomic memory operation (amo)
+instructions. GCC provides support for these instructions in 64-bit
+environments. All of the functions are declared in the include file
+@code{amo.h}.
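As a quick illustration ahead of the full listing below, a minimal usage
sketch (the function name @code{bump} is hypothetical, and it assumes
@code{amo_lwat_add} returns the value loaded from memory before the
addition, per the load-and-operate semantics of the underlying instruction):

@smallexample
#include <stdint.h>
#include <amo.h>

/* Sketch: atomically add 1 to *counter and fetch the prior value.  */
uint32_t
bump (uint32_t *counter)
@{
  return amo_lwat_add (counter, 1);
@}
@end smallexample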
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.cmpgt.sc.b}
-@end deftypefn
+The functions supported are:
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.cmpgt.sci.b}
-@end deftypefn
+@smallexample
+#include <amo.h>
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_h (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpge.h}
-@end deftypefn
+uint32_t amo_lwat_add (uint32_t *, uint32_t);
+uint32_t amo_lwat_xor (uint32_t *, uint32_t);
+uint32_t amo_lwat_ior (uint32_t *, uint32_t);
+uint32_t amo_lwat_and (uint32_t *, uint32_t);
+uint32_t amo_lwat_umax (uint32_t *, uint32_t);
+uint32_t amo_lwat_umin (uint32_t *, uint32_t);
+uint32_t amo_lwat_swap (uint32_t *, uint32_t);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_b (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpge.b}
-@end deftypefn
+int32_t amo_lwat_sadd (int32_t *, int32_t);
+int32_t amo_lwat_smax (int32_t *, int32_t);
+int32_t amo_lwat_smin (int32_t *, int32_t);
+int32_t amo_lwat_sswap (int32_t *, int32_t);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.cmpge.sc.h}
-@end deftypefn
+uint64_t amo_ldat_add (uint64_t *, uint64_t);
+uint64_t amo_ldat_xor (uint64_t *, uint64_t);
+uint64_t amo_ldat_ior (uint64_t *, uint64_t);
+uint64_t amo_ldat_and (uint64_t *, uint64_t);
+uint64_t amo_ldat_umax (uint64_t *, uint64_t);
+uint64_t amo_ldat_umin (uint64_t *, uint64_t);
+uint64_t amo_ldat_swap (uint64_t *, uint64_t);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.cmpge.sci.h}
-@end deftypefn
+int64_t amo_ldat_sadd (int64_t *, int64_t);
+int64_t amo_ldat_smax (int64_t *, int64_t);
+int64_t amo_ldat_smin (int64_t *, int64_t);
+int64_t amo_ldat_sswap (int64_t *, int64_t);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.cmpge.sc.b}
-@end deftypefn
+void amo_stwat_add (uint32_t *, uint32_t);
+void amo_stwat_xor (uint32_t *, uint32_t);
+void amo_stwat_ior (uint32_t *, uint32_t);
+void amo_stwat_and (uint32_t *, uint32_t);
+void amo_stwat_umax (uint32_t *, uint32_t);
+void amo_stwat_umin (uint32_t *, uint32_t);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.cmpge.sci.b}
-@end deftypefn
+void amo_stwat_sadd (int32_t *, int32_t);
+void amo_stwat_smax (int32_t *, int32_t);
+void amo_stwat_smin (int32_t *, int32_t);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmplt_h (uint32_t, uint32_t)
-Generated assembler @code{cv.cmplt.h}
-@end deftypefn
+void amo_stdat_add (uint64_t *, uint64_t);
+void amo_stdat_xor (uint64_t *, uint64_t);
+void amo_stdat_ior (uint64_t *, uint64_t);
+void amo_stdat_and (uint64_t *, uint64_t);
+void amo_stdat_umax (uint64_t *, uint64_t);
+void amo_stdat_umin (uint64_t *, uint64_t);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmplt_b (uint32_t, uint32_t)
-Generated assembler @code{cv.cmplt.b}
-@end deftypefn
+void amo_stdat_sadd (int64_t *, int64_t);
+void amo_stdat_smax (int64_t *, int64_t);
+void amo_stdat_smin (int64_t *, int64_t);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmplt_sc_h (uint32_t, int16_t)
-Generated assembler
@code{cv.cmplt.sc.h} -@end deftypefn +@node PowerPC Matrix-Multiply Assist Built-in Functions +@subsection PowerPC Matrix-Multiply Assist Built-in Functions +ISA 3.1 of the PowerPC added new Matrix-Multiply Assist (MMA) instructions. +GCC provides support for these instructions through the following built-in +functions which are enabled with the @code{-mmma} option. The vec_t type +below is defined to be a normal vector unsigned char type. The uint2, uint4 +and uint8 parameters are 2-bit, 4-bit and 8-bit unsigned integer constants +respectively. The compiler will verify that they are constants and that +their values are within range. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmplt_sc_h (uint32_t, int6_t) -Generated assembler @code{cv.cmplt.sci.h} -@end deftypefn +The built-in functions supported are: -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmplt_sc_b (uint32_t, int8_t) -Generated assembler @code{cv.cmplt.sc.b} -@end deftypefn +@smallexample +void __builtin_mma_xvi4ger8 (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvi8ger4 (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvi16ger2 (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvi16ger2s (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvf16ger2 (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvbf16ger2 (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvf32ger (__vector_quad *, vec_t, vec_t); -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmplt_sc_b (uint32_t, int6_t) -Generated assembler @code{cv.cmplt.sci.b} -@end deftypefn +void __builtin_mma_xvi4ger8pp (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvi8ger4pp (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvi8ger4spp(__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvi16ger2pp (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvi16ger2spp (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvf16ger2pp (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvf16ger2pn (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvf16ger2np (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvf16ger2nn (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvbf16ger2pp (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvbf16ger2pn (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvbf16ger2np (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvbf16ger2nn (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvf32gerpp (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvf32gerpn (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvf32gernp (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvf32gernn (__vector_quad *, vec_t, vec_t); -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_h (uint32_t, uint32_t) -Generated assembler @code{cv.cmple.h} -@end deftypefn +void __builtin_mma_pmxvi4ger8 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint8); +void __builtin_mma_pmxvi4ger8pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint8); -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_b (uint32_t, uint32_t) -Generated assembler @code{cv.cmple.b} -@end deftypefn +void __builtin_mma_pmxvi8ger4 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4); +void __builtin_mma_pmxvi8ger4pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4); +void __builtin_mma_pmxvi8ger4spp(__vector_quad *, vec_t, vec_t, uint4, uint4, uint4); -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_sc_h (uint32_t, int16_t) -Generated 
assembler @code{cv.cmple.sc.h} -@end deftypefn +void __builtin_mma_pmxvi16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvi16ger2s (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvf16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvbf16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_sc_h (uint32_t, int6_t) -Generated assembler @code{cv.cmple.sci.h} -@end deftypefn +void __builtin_mma_pmxvi16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvi16ger2spp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvf16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvf16ger2pn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvf16ger2np (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvf16ger2nn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvbf16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvbf16ger2pn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvbf16ger2np (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvbf16ger2nn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_sc_b (uint32_t, int8_t) -Generated assembler @code{cv.cmple.sc.b} -@end deftypefn +void __builtin_mma_pmxvf32ger (__vector_quad *, vec_t, vec_t, uint4, uint4); +void __builtin_mma_pmxvf32gerpp (__vector_quad *, vec_t, vec_t, uint4, uint4); +void __builtin_mma_pmxvf32gerpn (__vector_quad *, vec_t, vec_t, uint4, uint4); +void __builtin_mma_pmxvf32gernp (__vector_quad *, vec_t, vec_t, uint4, uint4); +void __builtin_mma_pmxvf32gernn (__vector_quad *, vec_t, vec_t, uint4, uint4); -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_sc_b (uint32_t, int6_t) -Generated assembler @code{cv.cmple.sci.b} -@end deftypefn +void __builtin_mma_xvf64ger (__vector_quad *, __vector_pair, vec_t); +void __builtin_mma_xvf64gerpp (__vector_quad *, __vector_pair, vec_t); +void __builtin_mma_xvf64gerpn (__vector_quad *, __vector_pair, vec_t); +void __builtin_mma_xvf64gernp (__vector_quad *, __vector_pair, vec_t); +void __builtin_mma_xvf64gernn (__vector_quad *, __vector_pair, vec_t); -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_h (uint32_t, uint32_t) -Generated assembler @code{cv.cmpgtu.h} -@end deftypefn +void __builtin_mma_pmxvf64ger (__vector_quad *, __vector_pair, vec_t, uint4, uint2); +void __builtin_mma_pmxvf64gerpp (__vector_quad *, __vector_pair, vec_t, uint4, uint2); +void __builtin_mma_pmxvf64gerpn (__vector_quad *, __vector_pair, vec_t, uint4, uint2); +void __builtin_mma_pmxvf64gernp (__vector_quad *, __vector_pair, vec_t, uint4, uint2); +void __builtin_mma_pmxvf64gernn (__vector_quad *, __vector_pair, vec_t, uint4, uint2); -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_b (uint32_t, uint32_t) -Generated assembler @code{cv.cmpgtu.b} -@end deftypefn +void __builtin_mma_xxmtacc (__vector_quad *); +void __builtin_mma_xxmfacc (__vector_quad *); +void __builtin_mma_xxsetaccz (__vector_quad *); -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_sc_h (uint32_t, uint16_t) -Generated assembler @code{cv.cmpgtu.sc.h} -@end deftypefn +void 
+__builtin_mma_build_acc (__vector_quad *, vec_t, vec_t, vec_t, vec_t);
+void __builtin_mma_disassemble_acc (void *, __vector_quad *);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_sc_h (uint32_t, uint6_t)
-Generated assembler @code{cv.cmpgtu.sci.h}
-@end deftypefn
+void __builtin_vsx_build_pair (__vector_pair *, vec_t, vec_t);
+void __builtin_vsx_disassemble_pair (void *, __vector_pair *);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_sc_b (uint32_t, uint8_t)
-Generated assembler @code{cv.cmpgtu.sc.b}
-@end deftypefn
+vec_t __builtin_vsx_xvcvspbf16 (vec_t);
+vec_t __builtin_vsx_xvcvbf16spn (vec_t);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_sc_b (uint32_t, uint6_t)
-Generated assembler @code{cv.cmpgtu.sci.b}
-@end deftypefn
+__vector_pair __builtin_vsx_lxvp (size_t, __vector_pair *);
+void __builtin_vsx_stxvp (__vector_pair, size_t, __vector_pair *);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_h (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpgeu.h}
-@end deftypefn
+@node PRU Built-in Functions
+@subsection PRU Built-in Functions
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_b (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpgeu.b}
-@end deftypefn
+GCC provides a few special built-in functions to aid in utilizing
+special PRU instructions.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_sc_h (uint32_t, uint16_t)
-Generated assembler @code{cv.cmpgeu.sc.h}
-@end deftypefn
+The built-in functions supported are:
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_sc_h (uint32_t, uint6_t)
-Generated assembler @code{cv.cmpgeu.sci.h}
-@end deftypefn
+@defbuiltin{void __delay_cycles (constant long long @var{cycles})}
+This inserts an instruction sequence that takes exactly @var{cycles}
+cycles (between 0 and 0xffffffff) to complete. The inserted sequence
+may use jumps, loops, or no-ops, and does not interfere with any other
+instructions. Note that @var{cycles} must be a compile-time constant
+integer; that is, you must pass a number, not a variable that may be
+optimized to a constant later. The number of cycles delayed by this
+built-in is exact.
+@enddefbuiltin
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_sc_b (uint32_t, uint8_t)
-Generated assembler @code{cv.cmpgeu.sc.b}
-@end deftypefn
+@defbuiltin{void __halt (void)}
+This inserts a HALT instruction to stop processor execution.
+@enddefbuiltin
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_sc_b (uint32_t, uint6_t)
-Generated assembler @code{cv.cmpgeu.sci.b}
-@end deftypefn
+@defbuiltin{{unsigned int} @
+            __lmbd (unsigned int @var{wordval}, @
+                    unsigned int @var{bitval})}
+This inserts the LMBD instruction to calculate the left-most bit with value
+@var{bitval} in value @var{wordval}. Only the least significant bit
+of @var{bitval} is taken into account.
+@enddefbuiltin
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_h (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpltu.h}
-@end deftypefn
+@node RISC-V Built-in Functions
+@subsection RISC-V Built-in Functions
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_b (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpltu.b}
-@end deftypefn
+These built-in functions are available for the RISC-V family of
+processors.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_sc_h (uint32_t, uint16_t) -Generated assembler @code{cv.cmpltu.sc.h} -@end deftypefn +@defbuiltin{{void *} __builtin_thread_pointer (void)} +Returns the value that is currently set in the @samp{tp} register. +@enddefbuiltin -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_sc_h (uint32_t, uint6_t) -Generated assembler @code{cv.cmpltu.sci.h} -@end deftypefn +@defbuiltin{void __builtin_riscv_pause (void)} +Generates the @code{pause} (hint) machine instruction. If the target implements +the Zihintpause extension, it indicates that the current hart should be +temporarily paused or slowed down. +@enddefbuiltin -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_sc_b (uint32_t, uint8_t) -Generated assembler @code{cv.cmpltu.sc.b} -@end deftypefn +@node RISC-V Vector Intrinsics +@subsection RISC-V Vector Intrinsics -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_sc_b (uint32_t, uint6_t) -Generated assembler @code{cv.cmpltu.sci.b} -@end deftypefn +GCC supports vector intrinsics as specified in version 0.11 of the RISC-V +vector intrinsic specification, which is available at the following link: +@uref{https://github.com/riscv-non-isa/rvv-intrinsic-doc/tree/v0.11.x}. +All of these functions are declared in the include file @file{riscv_vector.h}. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_h (uint32_t, uint32_t) -Generated assembler @code{cv.cmpleu.h} -@end deftypefn +@node CORE-V Built-in Functions +@subsection CORE-V Built-in Functions +For more information on all CORE-V built-ins, please see +@uref{https://github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md} -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_b (uint32_t, uint32_t) -Generated assembler @code{cv.cmpleu.b} -@end deftypefn +These built-in functions are available for the CORE-V MAC machine +architecture. For more information on CORE-V built-ins, please see +@uref{https://github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md#listing-of-multiply-accumulate-builtins-xcvmac}. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_sc_h (uint32_t, uint16_t) -Generated assembler @code{cv.cmpleu.sc.h} +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_mac (int32_t, int32_t, int32_t) +Generated assembler @code{cv.mac} @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_sc_h (uint32_t, uint6_t) -Generated assembler @code{cv.cmpleu.sci.h} +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_msu (int32_t, int32_t, int32_t) +Generates the @code{cv.msu} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_sc_b (uint32_t, uint8_t) -Generated assembler @code{cv.cmpleu.sc.b} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_muluN (uint32_t, uint32_t, uint8_t) +Generates the @code{cv.muluN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_sc_b (uint32_t, uint6_t) -Generated assembler @code{cv.cmpleu.sci.b} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_mulhhuN (uint32_t, uint32_t, uint8_t) +Generates the @code{cv.mulhhuN} machine instruction. 
@end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_r (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.cplxmul.r} +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_mulsN (int32_t, int32_t, uint8_t) +Generates the @code{cv.mulsN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_i (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.cplxmul.i} +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_mulhhsN (int32_t, int32_t, uint8_t) +Generates the @code{cv.mulhhsN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_r (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.cplxmul.r.div2} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_muluRN (uint32_t, uint32_t, uint8_t) +Generates the @code{cv.muluRN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_i (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.cplxmul.i.div2} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_mulhhuRN (uint32_t, uint32_t, uint8_t) +Generates the @code{cv.mulhhuRN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_r (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.cplxmul.r.div4} +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_mulsRN (int32_t, int32_t, uint8_t) +Generates the @code{cv.mulsRN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_i (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.cplxmul.i.div4} +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_mulhhsRN (int32_t, int32_t, uint8_t) +Generates the @code{cv.mulhhsRN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_r (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.cplxmul.r.div8} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_macuN (uint32_t, uint32_t, uint8_t) +Generates the @code{cv.macuN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_i (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.cplxmul.i.div8} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_machhuN (uint32_t, uint32_t, uint8_t) +Generates the @code{cv.machhuN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxconj (uint32_t) -Generated assembler @code{cv.cplxconj} +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_macsN (int32_t, int32_t, uint8_t) +Generates the @code{cv.macsN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_subrotmj (uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.subrotmj} +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_machhsN (int32_t, int32_t, uint8_t) +Generates the @code{cv.machhsN} machine instruction. 
@end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_subrotmj (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.subrotmj.div2} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_macuRN (uint32_t, uint32_t, uint8_t) +Generates the @code{cv.macuRN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_subrotmj (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.subrotmj.div4} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_machhuRN (uint32_t, uint32_t, uint8_t) +Generates the @code{cv.machhuRN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_subrotmj (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.subrotmj.div8} +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_macsRN (int32_t, int32_t, uint8_t) +Generates the @code{cv.macsRN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_h (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.add.div2} +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_machhsRN (int32_t, int32_t, uint8_t) +Generates the @code{cv.machhsRN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_h (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.add.div4} -@end deftypefn +These built-in functions are available for the CORE-V ALU machine +architecture. For more information on CORE-V built-ins, please see +@uref{https://github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md#listing-of-miscellaneous-alu-builtins-xcvalu} -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_h (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.add.div8} +@deftypefn {Built-in Function} {int} __builtin_riscv_cv_alu_slet (int32_t, int32_t) +Generated assembler @code{cv.slet} @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_h (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.sub.div2} +@deftypefn {Built-in Function} {int} __builtin_riscv_cv_alu_sletu (uint32_t, uint32_t) +Generated assembler @code{cv.sletu} @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_h (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.sub.div4} +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_min (int32_t, int32_t) +Generated assembler @code{cv.min} @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_h (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.sub.div8} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_minu (uint32_t, uint32_t) +Generated assembler @code{cv.minu} @end deftypefn -@node RX Built-in Functions -@subsection RX Built-in Functions -GCC supports some of the RX instructions which cannot be expressed in -the C programming language via the use of built-in functions. The -following functions are supported: - -@defbuiltin{void __builtin_rx_brk (void)} -Generates the @code{brk} machine instruction. 
-@enddefbuiltin
+@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_max (int32_t, int32_t)
+Generated assembler @code{cv.max}
+@end deftypefn
-@defbuiltin{void __builtin_rx_clrpsw (int)}
-Generates the @code{clrpsw} machine instruction to clear the specified
-bit in the processor status word.
-@enddefbuiltin
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_maxu (uint32_t, uint32_t)
+Generated assembler @code{cv.maxu}
+@end deftypefn
-@defbuiltin{void __builtin_rx_int (int)}
-Generates the @code{int} machine instruction to generate an interrupt
-with the specified value.
-@enddefbuiltin
+@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_exths (int16_t)
+Generated assembler @code{cv.exths}
+@end deftypefn
-@defbuiltin{void __builtin_rx_machi (int, int)}
-Generates the @code{machi} machine instruction to add the result of
-multiplying the top 16 bits of the two arguments into the
-accumulator.
-@enddefbuiltin
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_exthz (uint16_t)
+Generated assembler @code{cv.exthz}
+@end deftypefn
-@defbuiltin{void __builtin_rx_maclo (int, int)}
-Generates the @code{maclo} machine instruction to add the result of
-multiplying the bottom 16 bits of the two arguments into the
-accumulator.
-@enddefbuiltin
+@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_extbs (int8_t)
+Generated assembler @code{cv.extbs}
+@end deftypefn
-@defbuiltin{void __builtin_rx_mulhi (int, int)}
-Generates the @code{mulhi} machine instruction to place the result of
-multiplying the top 16 bits of the two arguments into the
-accumulator.
-@enddefbuiltin
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_extbz (uint8_t)
+Generated assembler @code{cv.extbz}
+@end deftypefn
-@defbuiltin{void __builtin_rx_mullo (int, int)}
-Generates the @code{mullo} machine instruction to place the result of
-multiplying the bottom 16 bits of the two arguments into the
-accumulator.
-@enddefbuiltin
+@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_clip (int32_t, uint32_t)
+Generated assembler @code{cv.clip} if the uint32_t operand is a constant and an exact power of 2.
+Generated assembler @code{cv.clipr} if it is a register.
+@end deftypefn
-@defbuiltin{int __builtin_rx_mvfachi (void)}
-Generates the @code{mvfachi} machine instruction to read the top
-32 bits of the accumulator.
-@enddefbuiltin
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_clipu (uint32_t, uint32_t)
+Generated assembler @code{cv.clipu} if the uint32_t operand is a constant and an exact power of 2.
+Generated assembler @code{cv.clipur} if it is a register.
+@end deftypefn
-@defbuiltin{int __builtin_rx_mvfacmi (void)}
-Generates the @code{mvfacmi} machine instruction to read the middle
-32 bits of the accumulator.
-@enddefbuiltin
+@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_addN (int32_t, int32_t, uint8_t)
+Generated assembler @code{cv.addN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31.
+Generated assembler @code{cv.addNr} if it is a register.
+@end deftypefn
-@defbuiltin{int __builtin_rx_mvfc (int)}
-Generates the @code{mvfc} machine instruction which reads the control
-register specified in its argument and returns its value.
-@enddefbuiltin
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_adduN (uint32_t, uint32_t, uint8_t)
+Generated assembler @code{cv.adduN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31.
+Generated assembler @code{cv.adduNr} if the it is a register. +@end deftypefn -@defbuiltin{void __builtin_rx_mvtachi (int)} -Generates the @code{mvtachi} machine instruction to set the top -32 bits of the accumulator. -@enddefbuiltin +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_addRN (int32_t, int32_t, uint8_t) +Generated assembler @code{cv.addRN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31. +Generated assembler @code{cv.addRNr} if the it is a register. +@end deftypefn -@defbuiltin{void __builtin_rx_mvtaclo (int)} -Generates the @code{mvtaclo} machine instruction to set the bottom -32 bits of the accumulator. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_adduRN (uint32_t, uint32_t, uint8_t) +Generated assembler @code{cv.adduRN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31. +Generated assembler @code{cv.adduRNr} if the it is a register. +@end deftypefn -@defbuiltin{void __builtin_rx_mvtc (int @var{reg}, int @var{val})} -Generates the @code{mvtc} machine instruction which sets control -register number @code{reg} to @code{val}. -@enddefbuiltin +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_subN (int32_t, int32_t, uint8_t) +Generated assembler @code{cv.subN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31. +Generated assembler @code{cv.subNr} if the it is a register. +@end deftypefn -@defbuiltin{void __builtin_rx_mvtipl (int)} -Generates the @code{mvtipl} machine instruction set the interrupt -priority level. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_subuN (uint32_t, uint32_t, uint8_t) +Generated assembler @code{cv.subuN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31. +Generated assembler @code{cv.subuNr} if the it is a register. +@end deftypefn -@defbuiltin{void __builtin_rx_racw (int)} -Generates the @code{racw} machine instruction to round the accumulator -according to the specified mode. -@enddefbuiltin +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_subRN (int32_t, int32_t, uint8_t) +Generated assembler @code{cv.subRN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31. +Generated assembler @code{cv.subRNr} if the it is a register. +@end deftypefn -@defbuiltin{int __builtin_rx_revw (int)} -Generates the @code{revw} machine instruction which swaps the bytes in -the argument so that bits 0--7 now occupy bits 8--15 and vice versa, -and also bits 16--23 occupy bits 24--31 and vice versa. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_subuRN (uint32_t, uint32_t, uint8_t) +Generated assembler @code{cv.subuRN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31. +Generated assembler @code{cv.subuRNr} if the it is a register. +@end deftypefn -@defbuiltin{void __builtin_rx_rmpa (void)} -Generates the @code{rmpa} machine instruction which initiates a -repeated multiply and accumulate sequence. -@enddefbuiltin +These built-in functions are available for the CORE-V Event Load machine +architecture. 
For more information on CORE-V ELW builtins, please see +@uref{https://github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md#listing-of-event-load-word-builtins-xcvelw} -@defbuiltin{void __builtin_rx_round (float)} -Generates the @code{round} machine instruction which returns the -floating-point argument rounded according to the current rounding mode -set in the floating-point status word register. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_elw_elw (uint32_t *) +Generated assembler @code{cv.elw} +@end deftypefn -@defbuiltin{int __builtin_rx_sat (int)} -Generates the @code{sat} machine instruction which returns the -saturated value of the argument. -@enddefbuiltin +These built-in functions are available for the CORE-V SIMD machine +architecture. For more information on CORE-V SIMD built-ins, please see +@uref{https://github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md#listing-of-pulp-816-bit-simd-builtins-xcvsimd} -@defbuiltin{void __builtin_rx_setpsw (int)} -Generates the @code{setpsw} machine instruction to set the specified -bit in the processor status word. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_h (uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.add.h} +@end deftypefn -@defbuiltin{void __builtin_rx_wait (void)} -Generates the @code{wait} machine instruction. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_b (uint32_t, uint32_t) +Generated assembler @code{cv.add.b} +@end deftypefn -@node S/390 System z Built-in Functions -@subsection S/390 System z Built-in Functions -@defbuiltin{int __builtin_tbegin (void*)} -Generates the @code{tbegin} machine instruction starting a -non-constrained hardware transaction. If the parameter is non-NULL the -memory area is used to store the transaction diagnostic buffer and -will be passed as first operand to @code{tbegin}. This buffer can be -defined using the @code{struct __htm_tdb} C struct defined in -@code{htmintrin.h} and must reside on a double-word boundary. The -second tbegin operand is set to @code{0xff0c}. This enables -save/restore of all GPRs and disables aborts for FPR and AR -manipulations inside the transaction body. The condition code set by -the tbegin instruction is returned as integer value. The tbegin -instruction by definition overwrites the content of all FPRs. The -compiler will generate code which saves and restores the FPRs. For -soft-float code it is recommended to used the @code{*_nofloat} -variant. In order to prevent a TDB from being written it is required -to pass a constant zero value as parameter. Passing a zero value -through a variable is not sufficient. Although modifications of -access registers inside the transaction will not trigger an -transaction abort it is not supported to actually modify them. Access -registers do not get saved when entering a transaction. They will have -undefined state when reaching the abort code. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.add.sc.h} +@end deftypefn -Macros for the possible return codes of tbegin are defined in the -@code{htmintrin.h} header file: +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.add.sci.h} +@end deftypefn -@defmac _HTM_TBEGIN_STARTED -@code{tbegin} has been executed as part of normal processing. 
The -transaction body is supposed to be executed. -@end defmac +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.add.sc.b} +@end deftypefn -@defmac _HTM_TBEGIN_INDETERMINATE -The transaction was aborted due to an indeterminate condition which -might be persistent. -@end defmac +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.add.sci.b} +@end deftypefn -@defmac _HTM_TBEGIN_TRANSIENT -The transaction aborted due to a transient failure. The transaction -should be re-executed in that case. -@end defmac +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_h (uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.sub.h} +@end deftypefn -@defmac _HTM_TBEGIN_PERSISTENT -The transaction aborted due to a persistent failure. Re-execution -under same circumstances will not be productive. -@end defmac +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_b (uint32_t, uint32_t) +Generated assembler @code{cv.sub.b} +@end deftypefn -@defmac _HTM_FIRST_USER_ABORT_CODE -The @code{_HTM_FIRST_USER_ABORT_CODE} defined in @code{htmintrin.h} -specifies the first abort code which can be used for -@code{__builtin_tabort}. Values below this threshold are reserved for -machine use. -@end defmac +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.sub.sc.h} +@end deftypefn -@deftp {Data type} {struct __htm_tdb} -The @code{struct __htm_tdb} defined in @code{htmintrin.h} describes -the structure of the transaction diagnostic block as specified in the -Principles of Operation manual chapter 5-91. -@end deftp +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.sub.sci.h} +@end deftypefn -@defbuiltin{int __builtin_tbegin_nofloat (void*)} -Same as @code{__builtin_tbegin} but without FPR saves and restores. -Using this variant in code making use of FPRs will leave the FPRs in -undefined state when entering the transaction abort handler code. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.sub.sc.b} +@end deftypefn -@defbuiltin{int __builtin_tbegin_retry (void*, int)} -In addition to @code{__builtin_tbegin} a loop for transient failures -is generated. If tbegin returns a condition code of 2 the transaction -will be retried as often as specified in the second argument. The -perform processor assist instruction is used to tell the CPU about the -number of fails so far. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.sub.sci.b} +@end deftypefn -@defbuiltin{int __builtin_tbegin_retry_nofloat (void*, int)} -Same as @code{__builtin_tbegin_retry} but without FPR saves and -restores. Using this variant in code making use of FPRs will leave -the FPRs in undefined state when entering the transaction abort -handler code. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_h (uint32_t, uint32_t) +Generated assembler @code{cv.avg.h} +@end deftypefn -@defbuiltin{void __builtin_tbeginc (void)} -Generates the @code{tbeginc} machine instruction starting a constrained -hardware transaction. The second operand is set to @code{0xff08}. 
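+
+The element-wise add, subtract and average built-ins above all follow
+the same calling pattern.  A minimal sketch using the halfword
+average, assuming @code{cv.avg.h} averages the two 16-bit lanes as
+described in the specification linked above:
+
+@smallexample
+#include <stdint.h>
+
+/* Average two pairs of packed 16-bit samples; emits cv.avg.h.  */
+uint32_t
+mix_samples (uint32_t a, uint32_t b)
+@{
+  return __builtin_riscv_cv_simd_avg_h (a, b);
+@}
+@end smallexample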
-@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_b (uint32_t, uint32_t) +Generated assembler @code{cv.avg.b} +@end deftypefn -@defbuiltin{int __builtin_tend (void)} -Generates the @code{tend} machine instruction finishing a transaction -and making the changes visible to other threads. The condition code -generated by tend is returned as integer value. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.avg.sc.h} +@end deftypefn -@defbuiltin{void __builtin_tabort (int)} -Generates the @code{tabort} machine instruction with the specified -abort code. Abort codes from 0 through 255 are reserved and will -result in an error message. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.avg.sci.h} +@end deftypefn -@defbuiltin{void __builtin_tx_assist (int)} -Generates the @code{ppa rX,rY,1} machine instruction. Where the -integer parameter is loaded into rX and a value of zero is loaded into -rY. The integer parameter specifies the number of times the -transaction repeatedly aborted. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.avg.sc.b} +@end deftypefn -@defbuiltin{int __builtin_tx_nesting_depth (void)} -Generates the @code{etnd} machine instruction. The current nesting -depth is returned as integer value. For a nesting depth of 0 the code -is not executed as part of an transaction. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.avg.sci.b} +@end deftypefn -@defbuiltin{void __builtin_non_tx_store (uint64_t *, uint64_t)} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_h (uint32_t, uint32_t) +Generated assembler @code{cv.avgu.h} +@end deftypefn -Generates the @code{ntstg} machine instruction. The second argument -is written to the first arguments location. The store operation will -not be rolled-back in case of an transaction abort. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_b (uint32_t, uint32_t) +Generated assembler @code{cv.avgu.b} +@end deftypefn -@node SH Built-in Functions -@subsection SH Built-in Functions -The following built-in functions are supported on the SH1, SH2, SH3 and SH4 -families of processors: +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_sc_h (uint32_t, uint16_t) +Generated assembler @code{cv.avgu.sc.h} +@end deftypefn -@defbuiltin{{void} __builtin_set_thread_pointer (void *@var{ptr})} -Sets the @samp{GBR} register to the specified value @var{ptr}. This is usually -used by system code that manages threads and execution contexts. The compiler -normally does not generate code that modifies the contents of @samp{GBR} and -thus the value is preserved across function calls. Changing the @samp{GBR} -value in user code must be done with caution, since the compiler might use -@samp{GBR} in order to access thread local variables. 
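+
+A minimal sketch of the unsigned averaging form, assuming
+@code{cv.avgu.b} averages each pair of 8-bit lanes as described in
+the specification linked above:
+
+@smallexample
+#include <stdint.h>
+
+/* Blend two packed 8-bit pixel values lane by lane; emits
+   cv.avgu.b.  */
+uint32_t
+blend_pixels (uint32_t a, uint32_t b)
+@{
+  return __builtin_riscv_cv_simd_avgu_b (a, b);
+@}
+@end smallexample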
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_sc_h (uint32_t, uint6_t) +Generated assembler @code{cv.avgu.sci.h} +@end deftypefn -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_sc_b (uint32_t, uint8_t) +Generated assembler @code{cv.avgu.sc.b} +@end deftypefn -@defbuiltin{{void *} __builtin_thread_pointer (void)} -Returns the value that is currently set in the @samp{GBR} register. -Memory loads and stores that use the thread pointer as a base address are -turned into @samp{GBR} based displacement loads and stores, if possible. -For example: -@smallexample -struct my_tcb -@{ - int a, b, c, d, e; -@}; +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_sc_b (uint32_t, uint6_t) +Generated assembler @code{cv.avgu.sci.b} +@end deftypefn -int get_tcb_value (void) -@{ - // Generate @samp{mov.l @@(8,gbr),r0} instruction - return ((my_tcb*)__builtin_thread_pointer ())->c; -@} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_h (uint32_t, uint32_t) +Generated assembler @code{cv.min.h} +@end deftypefn -@end smallexample -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_b (uint32_t, uint32_t) +Generated assembler @code{cv.min.b} +@end deftypefn -@defbuiltin{{unsigned int} __builtin_sh_get_fpscr (void)} -Returns the value that is currently set in the @samp{FPSCR} register. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.min.sc.h} +@end deftypefn -@defbuiltin{{void} __builtin_sh_set_fpscr (unsigned int @var{val})} -Sets the @samp{FPSCR} register to the specified value @var{val}, while -preserving the current values of the FR, SZ and PR bits. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.min.sci.h} +@end deftypefn -@node SPARC VIS Built-in Functions -@subsection SPARC VIS Built-in Functions +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.min.sc.b} +@end deftypefn -GCC supports SIMD operations on the SPARC using both the generic vector -extensions (@pxref{Vector Extensions}) as well as built-in functions for -the SPARC Visual Instruction Set (VIS). 
When you use the @option{-mvis} -switch, the VIS extension is exposed as the following built-in functions: +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.min.sci.b} +@end deftypefn -@smallexample -typedef int v1si __attribute__ ((vector_size (4))); -typedef int v2si __attribute__ ((vector_size (8))); -typedef short v4hi __attribute__ ((vector_size (8))); -typedef short v2hi __attribute__ ((vector_size (4))); -typedef unsigned char v8qi __attribute__ ((vector_size (8))); -typedef unsigned char v4qi __attribute__ ((vector_size (4))); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_h (uint32_t, uint32_t) +Generated assembler @code{cv.minu.h} +@end deftypefn -void __builtin_vis_write_gsr (int64_t); -int64_t __builtin_vis_read_gsr (void); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_b (uint32_t, uint32_t) +Generated assembler @code{cv.minu.b} +@end deftypefn -void * __builtin_vis_alignaddr (void *, long); -void * __builtin_vis_alignaddrl (void *, long); -int64_t __builtin_vis_faligndatadi (int64_t, int64_t); -v2si __builtin_vis_faligndatav2si (v2si, v2si); -v4hi __builtin_vis_faligndatav4hi (v4si, v4si); -v8qi __builtin_vis_faligndatav8qi (v8qi, v8qi); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_sc_h (uint32_t, uint16_t) +Generated assembler @code{cv.minu.sc.h} +@end deftypefn -v4hi __builtin_vis_fexpand (v4qi); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_sc_h (uint32_t, uint6_t) +Generated assembler @code{cv.minu.sci.h} +@end deftypefn -v4hi __builtin_vis_fmul8x16 (v4qi, v4hi); -v4hi __builtin_vis_fmul8x16au (v4qi, v2hi); -v4hi __builtin_vis_fmul8x16al (v4qi, v2hi); -v4hi __builtin_vis_fmul8sux16 (v8qi, v4hi); -v4hi __builtin_vis_fmul8ulx16 (v8qi, v4hi); -v2si __builtin_vis_fmuld8sux16 (v4qi, v2hi); -v2si __builtin_vis_fmuld8ulx16 (v4qi, v2hi); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_sc_b (uint32_t, uint8_t) +Generated assembler @code{cv.minu.sc.b} +@end deftypefn -v4qi __builtin_vis_fpack16 (v4hi); -v8qi __builtin_vis_fpack32 (v2si, v8qi); -v2hi __builtin_vis_fpackfix (v2si); -v8qi __builtin_vis_fpmerge (v4qi, v4qi); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_sc_b (uint32_t, uint6_t) +Generated assembler @code{cv.minu.sci.b} +@end deftypefn -int64_t __builtin_vis_pdist (v8qi, v8qi, int64_t); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_h (uint32_t, uint32_t) +Generated assembler @code{cv.max.h} +@end deftypefn -long __builtin_vis_edge8 (void *, void *); -long __builtin_vis_edge8l (void *, void *); -long __builtin_vis_edge16 (void *, void *); -long __builtin_vis_edge16l (void *, void *); -long __builtin_vis_edge32 (void *, void *); -long __builtin_vis_edge32l (void *, void *); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_b (uint32_t, uint32_t) +Generated assembler @code{cv.max.b} +@end deftypefn -long __builtin_vis_fcmple16 (v4hi, v4hi); -long __builtin_vis_fcmple32 (v2si, v2si); -long __builtin_vis_fcmpne16 (v4hi, v4hi); -long __builtin_vis_fcmpne32 (v2si, v2si); -long __builtin_vis_fcmpgt16 (v4hi, v4hi); -long __builtin_vis_fcmpgt32 (v2si, v2si); -long __builtin_vis_fcmpeq16 (v4hi, v4hi); -long __builtin_vis_fcmpeq32 (v2si, v2si); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.max.sc.h} +@end deftypefn -v4hi 
__builtin_vis_fpadd16 (v4hi, v4hi); -v2hi __builtin_vis_fpadd16s (v2hi, v2hi); -v2si __builtin_vis_fpadd32 (v2si, v2si); -v1si __builtin_vis_fpadd32s (v1si, v1si); -v4hi __builtin_vis_fpsub16 (v4hi, v4hi); -v2hi __builtin_vis_fpsub16s (v2hi, v2hi); -v2si __builtin_vis_fpsub32 (v2si, v2si); -v1si __builtin_vis_fpsub32s (v1si, v1si); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.max.sci.h} +@end deftypefn -long __builtin_vis_array8 (long, long); -long __builtin_vis_array16 (long, long); -long __builtin_vis_array32 (long, long); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.max.sc.b} +@end deftypefn -When you use the @option{-mvis2} switch, the VIS version 2.0 built-in -functions also become available: +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.max.sci.b} +@end deftypefn -@smallexample -long __builtin_vis_bmask (long, long); -int64_t __builtin_vis_bshuffledi (int64_t, int64_t); -v2si __builtin_vis_bshufflev2si (v2si, v2si); -v4hi __builtin_vis_bshufflev2si (v4hi, v4hi); -v8qi __builtin_vis_bshufflev2si (v8qi, v8qi); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_h (uint32_t, uint32_t) +Generated assembler @code{cv.maxu.h} +@end deftypefn -long __builtin_vis_edge8n (void *, void *); -long __builtin_vis_edge8ln (void *, void *); -long __builtin_vis_edge16n (void *, void *); -long __builtin_vis_edge16ln (void *, void *); -long __builtin_vis_edge32n (void *, void *); -long __builtin_vis_edge32ln (void *, void *); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_b (uint32_t, uint32_t) +Generated assembler @code{cv.maxu.b} +@end deftypefn -When you use the @option{-mvis3} switch, the VIS version 3.0 built-in -functions also become available: +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_sc_h (uint32_t, uint16_t) +Generated assembler @code{cv.maxu.sc.h} +@end deftypefn -@smallexample -void __builtin_vis_cmask8 (long); -void __builtin_vis_cmask16 (long); -void __builtin_vis_cmask32 (long); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_sc_h (uint32_t, uint6_t) +Generated assembler @code{cv.maxu.sci.h} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_sc_b (uint32_t, uint8_t) +Generated assembler @code{cv.maxu.sc.b} +@end deftypefn -v4hi __builtin_vis_fchksm16 (v4hi, v4hi); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_sc_b (uint32_t, uint6_t) +Generated assembler @code{cv.maxu.sci.b} +@end deftypefn -v4hi __builtin_vis_fsll16 (v4hi, v4hi); -v4hi __builtin_vis_fslas16 (v4hi, v4hi); -v4hi __builtin_vis_fsrl16 (v4hi, v4hi); -v4hi __builtin_vis_fsra16 (v4hi, v4hi); -v2si __builtin_vis_fsll16 (v2si, v2si); -v2si __builtin_vis_fslas16 (v2si, v2si); -v2si __builtin_vis_fsrl16 (v2si, v2si); -v2si __builtin_vis_fsra16 (v2si, v2si); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_srl_h (uint32_t, uint32_t) +Generated assembler @code{cv.srl.h} +@end deftypefn -long __builtin_vis_pdistn (v8qi, v8qi); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_srl_b (uint32_t, uint32_t) +Generated assembler @code{cv.srl.b} +@end deftypefn -v4hi __builtin_vis_fmean16 (v4hi, v4hi); +@deftypefn {Built-in Function} {uint32_t} 
__builtin_riscv_cv_simd_srl_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.srl.sc.h} +@end deftypefn -int64_t __builtin_vis_fpadd64 (int64_t, int64_t); -int64_t __builtin_vis_fpsub64 (int64_t, int64_t); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_srl_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.srl.sci.h} +@end deftypefn -v4hi __builtin_vis_fpadds16 (v4hi, v4hi); -v2hi __builtin_vis_fpadds16s (v2hi, v2hi); -v4hi __builtin_vis_fpsubs16 (v4hi, v4hi); -v2hi __builtin_vis_fpsubs16s (v2hi, v2hi); -v2si __builtin_vis_fpadds32 (v2si, v2si); -v1si __builtin_vis_fpadds32s (v1si, v1si); -v2si __builtin_vis_fpsubs32 (v2si, v2si); -v1si __builtin_vis_fpsubs32s (v1si, v1si); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_srl_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.srl.sc.b} +@end deftypefn -long __builtin_vis_fucmple8 (v8qi, v8qi); -long __builtin_vis_fucmpne8 (v8qi, v8qi); -long __builtin_vis_fucmpgt8 (v8qi, v8qi); -long __builtin_vis_fucmpeq8 (v8qi, v8qi); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_srl_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.srl.sci.b} +@end deftypefn -float __builtin_vis_fhadds (float, float); -double __builtin_vis_fhaddd (double, double); -float __builtin_vis_fhsubs (float, float); -double __builtin_vis_fhsubd (double, double); -float __builtin_vis_fnhadds (float, float); -double __builtin_vis_fnhaddd (double, double); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_h (uint32_t, uint32_t) +Generated assembler @code{cv.sra.h} +@end deftypefn -int64_t __builtin_vis_umulxhi (int64_t, int64_t); -int64_t __builtin_vis_xmulx (int64_t, int64_t); -int64_t __builtin_vis_xmulxhi (int64_t, int64_t); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_b (uint32_t, uint32_t) +Generated assembler @code{cv.sra.b} +@end deftypefn -When you use the @option{-mvis4} switch, the VIS version 4.0 built-in -functions also become available: +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.sra.sc.h} +@end deftypefn -@smallexample -v8qi __builtin_vis_fpadd8 (v8qi, v8qi); -v8qi __builtin_vis_fpadds8 (v8qi, v8qi); -v8qi __builtin_vis_fpaddus8 (v8qi, v8qi); -v4hi __builtin_vis_fpaddus16 (v4hi, v4hi); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.sra.sci.h} +@end deftypefn -v8qi __builtin_vis_fpsub8 (v8qi, v8qi); -v8qi __builtin_vis_fpsubs8 (v8qi, v8qi); -v8qi __builtin_vis_fpsubus8 (v8qi, v8qi); -v4hi __builtin_vis_fpsubus16 (v4hi, v4hi); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.sra.sc.b} +@end deftypefn -long __builtin_vis_fpcmple8 (v8qi, v8qi); -long __builtin_vis_fpcmpgt8 (v8qi, v8qi); -long __builtin_vis_fpcmpule16 (v4hi, v4hi); -long __builtin_vis_fpcmpugt16 (v4hi, v4hi); -long __builtin_vis_fpcmpule32 (v2si, v2si); -long __builtin_vis_fpcmpugt32 (v2si, v2si); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.sra.sci.b} +@end deftypefn -v8qi __builtin_vis_fpmax8 (v8qi, v8qi); -v4hi __builtin_vis_fpmax16 (v4hi, v4hi); -v2si __builtin_vis_fpmax32 (v2si, v2si); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_h (uint32_t, uint32_t) +Generated assembler @code{cv.sll.h} +@end deftypefn -v8qi 
__builtin_vis_fpmaxu8 (v8qi, v8qi); -v4hi __builtin_vis_fpmaxu16 (v4hi, v4hi); -v2si __builtin_vis_fpmaxu32 (v2si, v2si); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_b (uint32_t, uint32_t) +Generated assembler @code{cv.sll.b} +@end deftypefn -v8qi __builtin_vis_fpmin8 (v8qi, v8qi); -v4hi __builtin_vis_fpmin16 (v4hi, v4hi); -v2si __builtin_vis_fpmin32 (v2si, v2si); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.sll.sc.h} +@end deftypefn -v8qi __builtin_vis_fpminu8 (v8qi, v8qi); -v4hi __builtin_vis_fpminu16 (v4hi, v4hi); -v2si __builtin_vis_fpminu32 (v2si, v2si); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.sll.sci.h} +@end deftypefn -When you use the @option{-mvis4b} switch, the VIS version 4.0B -built-in functions also become available: +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.sll.sc.b} +@end deftypefn -@smallexample -v8qi __builtin_vis_dictunpack8 (double, int); -v4hi __builtin_vis_dictunpack16 (double, int); -v2si __builtin_vis_dictunpack32 (double, int); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.sll.sci.b} +@end deftypefn -long __builtin_vis_fpcmple8shl (v8qi, v8qi, int); -long __builtin_vis_fpcmpgt8shl (v8qi, v8qi, int); -long __builtin_vis_fpcmpeq8shl (v8qi, v8qi, int); -long __builtin_vis_fpcmpne8shl (v8qi, v8qi, int); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_h (uint32_t, uint32_t) +Generated assembler @code{cv.or.h} +@end deftypefn -long __builtin_vis_fpcmple16shl (v4hi, v4hi, int); -long __builtin_vis_fpcmpgt16shl (v4hi, v4hi, int); -long __builtin_vis_fpcmpeq16shl (v4hi, v4hi, int); -long __builtin_vis_fpcmpne16shl (v4hi, v4hi, int); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_b (uint32_t, uint32_t) +Generated assembler @code{cv.or.b} +@end deftypefn -long __builtin_vis_fpcmple32shl (v2si, v2si, int); -long __builtin_vis_fpcmpgt32shl (v2si, v2si, int); -long __builtin_vis_fpcmpeq32shl (v2si, v2si, int); -long __builtin_vis_fpcmpne32shl (v2si, v2si, int); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.or.sc.h} +@end deftypefn -long __builtin_vis_fpcmpule8shl (v8qi, v8qi, int); -long __builtin_vis_fpcmpugt8shl (v8qi, v8qi, int); -long __builtin_vis_fpcmpule16shl (v4hi, v4hi, int); -long __builtin_vis_fpcmpugt16shl (v4hi, v4hi, int); -long __builtin_vis_fpcmpule32shl (v2si, v2si, int); -long __builtin_vis_fpcmpugt32shl (v2si, v2si, int); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.or.sci.h} +@end deftypefn -long __builtin_vis_fpcmpde8shl (v8qi, v8qi, int); -long __builtin_vis_fpcmpde16shl (v4hi, v4hi, int); -long __builtin_vis_fpcmpde32shl (v2si, v2si, int); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.or.sc.b} +@end deftypefn -long __builtin_vis_fpcmpur8shl (v8qi, v8qi, int); -long __builtin_vis_fpcmpur16shl (v4hi, v4hi, int); -long __builtin_vis_fpcmpur32shl (v2si, v2si, int); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.or.sci.b} 
+@end deftypefn -@node TI C6X Built-in Functions -@subsection TI C6X Built-in Functions +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_h (uint32_t, uint32_t) +Generated assembler @code{cv.xor.h} +@end deftypefn -GCC provides intrinsics to access certain instructions of the TI C6X -processors. These intrinsics, listed below, are available after -inclusion of the @code{c6x_intrinsics.h} header file. They map directly -to C6X instructions. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_b (uint32_t, uint32_t) +Generated assembler @code{cv.xor.b} +@end deftypefn -@smallexample -int _sadd (int, int); -int _ssub (int, int); -int _sadd2 (int, int); -int _ssub2 (int, int); -long long _mpy2 (int, int); -long long _smpy2 (int, int); -int _add4 (int, int); -int _sub4 (int, int); -int _saddu4 (int, int); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.xor.sc.h} +@end deftypefn -int _smpy (int, int); -int _smpyh (int, int); -int _smpyhl (int, int); -int _smpylh (int, int); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.xor.sci.h} +@end deftypefn -int _sshl (int, int); -int _subc (int, int); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.xor.sc.b} +@end deftypefn -int _avg2 (int, int); -int _avgu4 (int, int); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.xor.sci.b} +@end deftypefn -int _clrr (int, int); -int _extr (int, int); -int _extru (int, int); -int _abs (int); -int _abs2 (int); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_h (uint32_t, uint32_t) +Generated assembler @code{cv.and.h} +@end deftypefn -@node x86 Built-in Functions -@subsection x86 Built-in Functions +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_b (uint32_t, uint32_t) +Generated assembler @code{cv.and.b} +@end deftypefn -These built-in functions are available for the x86-32 and x86-64 family -of computers, depending on the command-line switches used. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.and.sc.h} +@end deftypefn -If you specify command-line switches such as @option{-msse}, -the compiler could use the extended instruction sets even if the built-ins -are not used explicitly in the program. For this reason, applications -that perform run-time CPU detection must compile separate files for each -supported architecture, using the appropriate flags. In particular, -the file containing the CPU detection code should be compiled without -these options. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.and.sci.h} +@end deftypefn -The following machine modes are available for use with MMX built-in functions -(@pxref{Vector Extensions}): @code{V2SI} for a vector of two 32-bit integers, -@code{V4HI} for a vector of four 16-bit integers, and @code{V8QI} for a -vector of eight 8-bit integers. Some of the built-in functions operate on -MMX registers as a whole 64-bit entity, these use @code{V1DI} as their mode. 
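+
+The bitwise forms are called the same way; a minimal sketch using the
+byte variant documented above:
+
+@smallexample
+#include <stdint.h>
+
+/* Mask four packed 8-bit lanes; emits cv.and.b.  */
+uint32_t
+mask_bytes (uint32_t v, uint32_t mask)
+@{
+  return __builtin_riscv_cv_simd_and_b (v, mask);
+@}
+@end smallexample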
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.and.sc.b} +@end deftypefn -If 3DNow!@: extensions are enabled, @code{V2SF} is used as a mode for a vector -of two 32-bit floating-point values. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.and.sci.b} +@end deftypefn -If SSE extensions are enabled, @code{V4SF} is used for a vector of four 32-bit -floating-point values. Some instructions use a vector of four 32-bit -integers, these use @code{V4SI}. Finally, some instructions operate on an -entire vector register, interpreting it as a 128-bit integer, these use mode -@code{TI}. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_abs_h (uint32_t) +Generated assembler @code{cv.abs.h} +@end deftypefn -The x86-32 and x86-64 family of processors use additional built-in -functions for efficient use of @code{TF} (@code{__float128}) 128-bit -floating point and @code{TC} 128-bit complex floating-point values. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_abs_b (uint32_t) +Generated assembler @code{cv.abs.b} +@end deftypefn -The following floating-point built-in functions are always available: +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_h (uint32_t, uint32_t) +Generated assembler @code{cv.dotup.h} +@end deftypefn -@defbuiltin{__float128 __builtin_fabsq (__float128 @var{x}))} -Computes the absolute value of @var{x}. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_b (uint32_t, uint32_t) +Generated assembler @code{cv.dotup.b} +@end deftypefn -@defbuiltin{__float128 __builtin_copysignq (__float128 @var{x}, @ - __float128 @var{y})} -Copies the sign of @var{y} into @var{x} and returns the new value of -@var{x}. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_sc_h (uint32_t, uint16_t) +Generated assembler @code{cv.dotup.sc.h} +@end deftypefn -@defbuiltin{__float128 __builtin_infq (void)} -Similar to @code{__builtin_inf}, except the return type is @code{__float128}. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_sc_h (uint32_t, uint6_t) +Generated assembler @code{cv.dotup.sci.h} +@end deftypefn -@defbuiltin{__float128 __builtin_huge_valq (void)} -Similar to @code{__builtin_huge_val}, except the return type is @code{__float128}. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_sc_b (uint32_t, uint8_t) +Generated assembler @code{cv.dotup.sc.b} +@end deftypefn -@defbuiltin{__float128 __builtin_nanq (void)} -Similar to @code{__builtin_nan}, except the return type is @code{__float128}. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_sc_b (uint32_t, uint6_t) +Generated assembler @code{cv.dotup.sci.b} +@end deftypefn -@defbuiltin{__float128 __builtin_nansq (void)} -Similar to @code{__builtin_nans}, except the return type is @code{__float128}. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_h (uint32_t, uint32_t) +Generated assembler @code{cv.dotusp.h} +@end deftypefn -The following built-in function is always available. 
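+
+The absolute-value forms above are unary.  A minimal sketch, assuming
+@code{cv.abs.h} replaces each signed 16-bit lane with its absolute
+value:
+
+@smallexample
+#include <stdint.h>
+
+/* Lane-wise absolute value of two packed 16-bit lanes; emits
+   cv.abs.h.  */
+uint32_t
+abs_lanes (uint32_t v)
+@{
+  return __builtin_riscv_cv_simd_abs_h (v);
+@}
+@end smallexample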
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_b (uint32_t, uint32_t) +Generated assembler @code{cv.dotusp.b} +@end deftypefn -@defbuiltin{void __builtin_ia32_pause (void)} -Generates the @code{pause} machine instruction with a compiler memory -barrier. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.dotusp.sc.h} +@end deftypefn -The following built-in functions are always available and can be used to -check the target platform type. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.dotusp.sci.h} +@end deftypefn -@defbuiltin{void __builtin_cpu_init (void)} -This function runs the CPU detection code to check the type of CPU and the -features supported. This built-in function needs to be invoked along with the built-in functions -to check CPU type and features, @code{__builtin_cpu_is} and -@code{__builtin_cpu_supports}, only when used in a function that is -executed before any constructors are called. The CPU detection code is -automatically executed in a very high priority constructor. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.dotusp.sc.b} +@end deftypefn -For example, this function has to be used in @code{ifunc} resolvers that -check for CPU type using the built-in functions @code{__builtin_cpu_is} -and @code{__builtin_cpu_supports}, or in constructors on targets that -don't support constructor priority. -@smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.dotusp.sci.b} +@end deftypefn -static void (*resolve_memcpy (void)) (void) -@{ - // ifunc resolvers fire before constructors, explicitly call the init - // function. - __builtin_cpu_init (); - if (__builtin_cpu_supports ("ssse3")) - return ssse3_memcpy; // super fast memcpy with ssse3 instructions. - else - return default_memcpy; -@} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_h (uint32_t, uint32_t) +Generated assembler @code{cv.dotsp.h} +@end deftypefn -void *memcpy (void *, const void *, size_t) - __attribute__ ((ifunc ("resolve_memcpy"))); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_b (uint32_t, uint32_t) +Generated assembler @code{cv.dotsp.b} +@end deftypefn -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.dotsp.sc.h} +@end deftypefn -@defbuiltin{int __builtin_cpu_is (const char *@var{cpuname})} -This function returns a positive integer if the run-time CPU -is of type @var{cpuname} -and returns @code{0} otherwise. The following CPU names can be detected: +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.dotsp.sci.h} +@end deftypefn -@table @samp -@item amd -AMD CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.dotsp.sc.b} +@end deftypefn -@item intel -Intel CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.dotsp.sci.b} +@end deftypefn -@item atom -Intel Atom CPU. 
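+
+Unlike the element-wise operations, the dot-product built-ins above
+reduce their lanes to a single scalar.  A minimal sketch, assuming
+@code{cv.dotup.h} multiplies corresponding unsigned 16-bit lanes and
+sums the two products, per the specification linked above:
+
+@smallexample
+#include <stdint.h>
+
+/* Two-element unsigned dot product; emits cv.dotup.h.  */
+uint32_t
+dot2_u16 (uint32_t a, uint32_t b)
+@{
+  return __builtin_riscv_cv_simd_dotup_h (a, b);
+@}
+@end smallexample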
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_h (uint32_t, uint32_t, uint32_t) +Generated assembler @code{cv.sdotup.h} +@end deftypefn -@item slm -Intel Silvermont CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_b (uint32_t, uint32_t, uint32_t) +Generated assembler @code{cv.sdotup.b} +@end deftypefn -@item core2 -Intel Core 2 CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_sc_h (uint32_t, uint16_t, uint32_t) +Generated assembler @code{cv.sdotup.sc.h} +@end deftypefn -@item corei7 -Intel Core i7 CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_sc_h (uint32_t, uint6_t, uint32_t) +Generated assembler @code{cv.sdotup.sci.h} +@end deftypefn -@item nehalem -Intel Core i7 Nehalem CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_sc_b (uint32_t, uint8_t, uint32_t) +Generated assembler @code{cv.sdotup.sc.b} +@end deftypefn -@item westmere -Intel Core i7 Westmere CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_sc_b (uint32_t, uint6_t, uint32_t) +Generated assembler @code{cv.sdotup.sci.b} +@end deftypefn -@item sandybridge -Intel Core i7 Sandy Bridge CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_h (uint32_t, uint32_t, uint32_t) +Generated assembler @code{cv.sdotusp.h} +@end deftypefn -@item ivybridge -Intel Core i7 Ivy Bridge CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_b (uint32_t, uint32_t, uint32_t) +Generated assembler @code{cv.sdotusp.b} +@end deftypefn -@item haswell -Intel Core i7 Haswell CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_sc_h (uint32_t, int16_t, uint32_t) +Generated assembler @code{cv.sdotusp.sc.h} +@end deftypefn -@item broadwell -Intel Core i7 Broadwell CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_sc_h (uint32_t, int6_t, uint32_t) +Generated assembler @code{cv.sdotusp.sci.h} +@end deftypefn -@item skylake -Intel Core i7 Skylake CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_sc_b (uint32_t, int8_t, uint32_t) +Generated assembler @code{cv.sdotusp.sc.b} +@end deftypefn -@item skylake-avx512 -Intel Core i7 Skylake AVX512 CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_sc_b (uint32_t, int6_t, uint32_t) +Generated assembler @code{cv.sdotusp.sci.b} +@end deftypefn -@item cannonlake -Intel Core i7 Cannon Lake CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_h (uint32_t, uint32_t, uint32_t) +Generated assembler @code{cv.sdotsp.h} +@end deftypefn -@item icelake-client -Intel Core i7 Ice Lake Client CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_b (uint32_t, uint32_t, uint32_t) +Generated assembler @code{cv.sdotsp.b} +@end deftypefn -@item icelake-server -Intel Core i7 Ice Lake Server CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_sc_h (uint32_t, int16_t, uint32_t) +Generated assembler @code{cv.sdotsp.sc.h} +@end deftypefn -@item cascadelake -Intel Core i7 Cascadelake CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_sc_h (uint32_t, int6_t, uint32_t) +Generated assembler @code{cv.sdotsp.sci.h} +@end deftypefn -@item tigerlake -Intel Core i7 Tigerlake CPU. 
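+
+The @code{cv.sdot*} forms take a third operand and accumulate into
+it, which suits reduction loops.  A minimal sketch, assuming the
+third argument carries the running sum as suggested by the signatures
+above:
+
+@smallexample
+#include <stdint.h>
+#include <stddef.h>
+
+/* Accumulating dot product over an array of packed 16-bit lane
+   pairs; each iteration emits cv.sdotup.h.  */
+uint32_t
+dot_reduce (const uint32_t *a, const uint32_t *b, size_t n)
+@{
+  uint32_t acc = 0;
+  for (size_t i = 0; i < n; i++)
+    acc = __builtin_riscv_cv_simd_sdotup_h (a[i], b[i], acc);
+  return acc;
+@}
+@end smallexample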
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_sc_b (uint32_t, int8_t, uint32_t) +Generated assembler @code{cv.sdotsp.sc.b} +@end deftypefn -@item cooperlake -Intel Core i7 Cooperlake CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_sc_b (uint32_t, int6_t, uint32_t) +Generated assembler @code{cv.sdotsp.sci.b} +@end deftypefn -@item sapphirerapids -Intel Core i7 sapphirerapids CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_extract_h (uint32_t, uint6_t) +Generated assembler @code{cv.extract.h} +@end deftypefn -@item alderlake -Intel Core i7 Alderlake CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_extract_b (uint32_t, uint6_t) +Generated assembler @code{cv.extract.b} +@end deftypefn -@item rocketlake -Intel Core i7 Rocketlake CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_extractu_h (uint32_t, uint6_t) +Generated assembler @code{cv.extractu.h} +@end deftypefn -@item graniterapids -Intel Core i7 graniterapids CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_extractu_b (uint32_t, uint6_t) +Generated assembler @code{cv.extractu.b} +@end deftypefn -@item graniterapids-d -Intel Core i7 graniterapids D CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_insert_h (uint32_t, uint32_t) +Generated assembler @code{cv.insert.h} +@end deftypefn -@item arrowlake -Intel Core i7 Arrow Lake CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_insert_b (uint32_t, uint32_t) +Generated assembler @code{cv.insert.b} +@end deftypefn -@item arrowlake-s -Intel Core i7 Arrow Lake S CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shuffle_h (uint32_t, uint32_t) +Generated assembler @code{cv.shuffle.h} +@end deftypefn -@item pantherlake -Intel Core i7 Panther Lake CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shuffle_b (uint32_t, uint32_t) +Generated assembler @code{cv.shuffle.b} +@end deftypefn -@item diamondrapids -Intel Core i7 Diamond Rapids CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shuffle_sci_h (uint32_t, uint4_t) +Generated assembler @code{cv.shuffle.sci.h} +@end deftypefn -@item bonnell -Intel Atom Bonnell CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shufflei0_sci_b (uint32_t, uint4_t) +Generated assembler @code{cv.shufflei0.sci.b} +@end deftypefn -@item silvermont -Intel Atom Silvermont CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shufflei1_sci_b (uint32_t, uint4_t) +Generated assembler @code{cv.shufflei1.sci.b} +@end deftypefn -@item goldmont -Intel Atom Goldmont CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shufflei2_sci_b (uint32_t, uint4_t) +Generated assembler @code{cv.shufflei2.sci.b} +@end deftypefn -@item goldmont-plus -Intel Atom Goldmont Plus CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shufflei3_sci_b (uint32_t, uint4_t) +Generated assembler @code{cv.shufflei3.sci.b} +@end deftypefn -@item tremont -Intel Atom Tremont CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shuffle2_h (uint32_t, uint32_t, uint32_t) +Generated assembler @code{cv.shuffle2.h} +@end deftypefn -@item sierraforest -Intel Atom Sierra Forest CPU. 
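+
+For the extraction built-ins above, the @code{uint6_t} operand
+denotes a small immediate, so the lane index is written as a
+constant.  A minimal sketch, assuming lane 1 of a halfword vector is
+bits 31:16:
+
+@smallexample
+#include <stdint.h>
+
+/* Sign-extending extraction of 16-bit lane 1; emits cv.extract.h.  */
+uint32_t
+high_lane (uint32_t v)
+@{
+  return __builtin_riscv_cv_simd_extract_h (v, 1);
+@}
+@end smallexample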
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shuffle2_b (uint32_t, uint32_t, uint32_t) +Generated assembler @code{cv.shuffle2.b} +@end deftypefn -@item grandridge -Intel Atom Grand Ridge CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_packlo_h (uint32_t, uint32_t) +Generated assembler @code{cv.pack} +@end deftypefn -@item clearwaterforest -Intel Atom Clearwater Forest CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_packhi_h (uint32_t, uint32_t) +Generated assembler @code{cv.pack.h} +@end deftypefn -@item lujiazui -ZHAOXIN lujiazui CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_packhi_b (uint32_t, uint32_t, uint32_t) +Generated assembler @code{cv.packhi.b} +@end deftypefn -@item yongfeng -ZHAOXIN yongfeng CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_packlo_b (uint32_t, uint32_t, uint32_t) +Generated assembler @code{cv.packlo.b} +@end deftypefn -@item shijidadao -ZHAOXIN shijidadao CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_h (uint32_t, uint32_t) +Generated assembler @code{cv.cmpeq.h} +@end deftypefn -@item amdfam10h -AMD Family 10h CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_b (uint32_t, uint32_t) +Generated assembler @code{cv.cmpeq.b} +@end deftypefn -@item barcelona -AMD Family 10h Barcelona CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.cmpeq.sc.h} +@end deftypefn -@item shanghai -AMD Family 10h Shanghai CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.cmpeq.sci.h} +@end deftypefn -@item istanbul -AMD Family 10h Istanbul CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.cmpeq.sc.b} +@end deftypefn -@item btver1 -AMD Family 14h CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.cmpeq.sci.b} +@end deftypefn -@item amdfam15h -AMD Family 15h CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_h (uint32_t, uint32_t) +Generated assembler @code{cv.cmpne.h} +@end deftypefn -@item bdver1 -AMD Family 15h Bulldozer version 1. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_b (uint32_t, uint32_t) +Generated assembler @code{cv.cmpne.b} +@end deftypefn -@item bdver2 -AMD Family 15h Bulldozer version 2. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.cmpne.sc.h} +@end deftypefn -@item bdver3 -AMD Family 15h Bulldozer version 3. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.cmpne.sci.h} +@end deftypefn -@item bdver4 -AMD Family 15h Bulldozer version 4. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.cmpne.sc.b} +@end deftypefn -@item btver2 -AMD Family 16h CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.cmpne.sci.b} +@end deftypefn -@item amdfam17h -AMD Family 17h CPU. 
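+
+A minimal sketch of the comparison forms, assuming each 16-bit lane
+of the result is set to all ones where the compared lanes are equal
+and to zero otherwise, per the specification linked above:
+
+@smallexample
+#include <stdint.h>
+
+/* Per-lane equality mask of two packed 16-bit values; emits
+   cv.cmpeq.h.  */
+uint32_t
+equal_lanes (uint32_t a, uint32_t b)
+@{
+  return __builtin_riscv_cv_simd_cmpeq_h (a, b);
+@}
+@end smallexample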
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_h (uint32_t, uint32_t) +Generated assembler @code{cv.cmpgt.h} +@end deftypefn -@item znver1 -AMD Family 17h Zen version 1. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_b (uint32_t, uint32_t) +Generated assembler @code{cv.cmpgt.b} +@end deftypefn -@item znver2 -AMD Family 17h Zen version 2. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.cmpgt.sc.h} +@end deftypefn -@item amdfam19h -AMD Family 19h CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.cmpgt.sci.h} +@end deftypefn -@item znver3 -AMD Family 19h Zen version 3. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.cmpgt.sc.b} +@end deftypefn -@item znver4 -AMD Family 19h Zen version 4. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.cmpgt.sci.b} +@end deftypefn -@item znver5 -AMD Family 1ah Zen version 5. -@end table +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_h (uint32_t, uint32_t) +Generated assembler @code{cv.cmpge.h} +@end deftypefn -Here is an example: -@smallexample -if (__builtin_cpu_is ("corei7")) - @{ - do_corei7 (); // Core i7 specific implementation. - @} -else - @{ - do_generic (); // Generic implementation. - @} -@end smallexample -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_b (uint32_t, uint32_t) +Generated assembler @code{cv.cmpge.b} +@end deftypefn -@defbuiltin{int __builtin_cpu_supports (const char *@var{feature})} -This function returns a positive integer if the run-time CPU -supports @var{feature} -and returns @code{0} otherwise. The following features can be detected: +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.cmpge.sc.h} +@end deftypefn -@table @samp -@item cmov -CMOV instruction. -@item mmx -MMX instructions. -@item popcnt -POPCNT instruction. -@item sse -SSE instructions. -@item sse2 -SSE2 instructions. -@item sse3 -SSE3 instructions. -@item ssse3 -SSSE3 instructions. -@item sse4.1 -SSE4.1 instructions. -@item sse4.2 -SSE4.2 instructions. -@item avx -AVX instructions. -@item avx2 -AVX2 instructions. -@item sse4a -SSE4A instructions. -@item fma4 -FMA4 instructions. -@item xop -XOP instructions. -@item fma -FMA instructions. -@item avx512f -AVX512F instructions. -@item bmi -BMI instructions. -@item bmi2 -BMI2 instructions. -@item aes -AES instructions. -@item pclmul -PCLMUL instructions. -@item avx512vl -AVX512VL instructions. -@item avx512bw -AVX512BW instructions. -@item avx512dq -AVX512DQ instructions. -@item avx512cd -AVX512CD instructions. -@item avx512vbmi -AVX512VBMI instructions. -@item avx512ifma -AVX512IFMA instructions. -@item avx512vpopcntdq -AVX512VPOPCNTDQ instructions. -@item avx512vbmi2 -AVX512VBMI2 instructions. -@item gfni -GFNI instructions. -@item vpclmulqdq -VPCLMULQDQ instructions. -@item avx512vnni -AVX512VNNI instructions. -@item avx512bitalg -AVX512BITALG instructions. -@item x86-64 -Baseline x86-64 microarchitecture level (as defined in x86-64 psABI). -@item x86-64-v2 -x86-64-v2 microarchitecture level. -@item x86-64-v3 -x86-64-v3 microarchitecture level. -@item x86-64-v4 -x86-64-v4 microarchitecture level. 
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.cmpge.sci.h} +@end deftypefn +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.cmpge.sc.b} +@end deftypefn -@end table +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.cmpge.sci.b} +@end deftypefn -Here is an example: -@smallexample -if (__builtin_cpu_supports ("popcnt")) - @{ - asm("popcnt %1,%0" : "=r"(count) : "rm"(n) : "cc"); - @} -else - @{ - count = generic_countbits (n); //generic implementation. - @} -@end smallexample -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmplt_h (uint32_t, uint32_t) +Generated assembler @code{cv.cmplt.h} +@end deftypefn -The following built-in functions are made available by @option{-mmmx}. -All of them generate the machine instruction that is part of the name. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmplt_b (uint32_t, uint32_t) +Generated assembler @code{cv.cmplt.b} +@end deftypefn -@smallexample -v8qi __builtin_ia32_paddb (v8qi, v8qi); -v4hi __builtin_ia32_paddw (v4hi, v4hi); -v2si __builtin_ia32_paddd (v2si, v2si); -v8qi __builtin_ia32_psubb (v8qi, v8qi); -v4hi __builtin_ia32_psubw (v4hi, v4hi); -v2si __builtin_ia32_psubd (v2si, v2si); -v8qi __builtin_ia32_paddsb (v8qi, v8qi); -v4hi __builtin_ia32_paddsw (v4hi, v4hi); -v8qi __builtin_ia32_psubsb (v8qi, v8qi); -v4hi __builtin_ia32_psubsw (v4hi, v4hi); -v8qi __builtin_ia32_paddusb (v8qi, v8qi); -v4hi __builtin_ia32_paddusw (v4hi, v4hi); -v8qi __builtin_ia32_psubusb (v8qi, v8qi); -v4hi __builtin_ia32_psubusw (v4hi, v4hi); -v4hi __builtin_ia32_pmullw (v4hi, v4hi); -v4hi __builtin_ia32_pmulhw (v4hi, v4hi); -di __builtin_ia32_pand (di, di); -di __builtin_ia32_pandn (di,di); -di __builtin_ia32_por (di, di); -di __builtin_ia32_pxor (di, di); -v8qi __builtin_ia32_pcmpeqb (v8qi, v8qi); -v4hi __builtin_ia32_pcmpeqw (v4hi, v4hi); -v2si __builtin_ia32_pcmpeqd (v2si, v2si); -v8qi __builtin_ia32_pcmpgtb (v8qi, v8qi); -v4hi __builtin_ia32_pcmpgtw (v4hi, v4hi); -v2si __builtin_ia32_pcmpgtd (v2si, v2si); -v8qi __builtin_ia32_punpckhbw (v8qi, v8qi); -v4hi __builtin_ia32_punpckhwd (v4hi, v4hi); -v2si __builtin_ia32_punpckhdq (v2si, v2si); -v8qi __builtin_ia32_punpcklbw (v8qi, v8qi); -v4hi __builtin_ia32_punpcklwd (v4hi, v4hi); -v2si __builtin_ia32_punpckldq (v2si, v2si); -v8qi __builtin_ia32_packsswb (v4hi, v4hi); -v4hi __builtin_ia32_packssdw (v2si, v2si); -v8qi __builtin_ia32_packuswb (v4hi, v4hi); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmplt_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.cmplt.sc.h} +@end deftypefn -v4hi __builtin_ia32_psllw (v4hi, v4hi); -v2si __builtin_ia32_pslld (v2si, v2si); -v1di __builtin_ia32_psllq (v1di, v1di); -v4hi __builtin_ia32_psrlw (v4hi, v4hi); -v2si __builtin_ia32_psrld (v2si, v2si); -v1di __builtin_ia32_psrlq (v1di, v1di); -v4hi __builtin_ia32_psraw (v4hi, v4hi); -v2si __builtin_ia32_psrad (v2si, v2si); -v4hi __builtin_ia32_psllwi (v4hi, int); -v2si __builtin_ia32_pslldi (v2si, int); -v1di __builtin_ia32_psllqi (v1di, int); -v4hi __builtin_ia32_psrlwi (v4hi, int); -v2si __builtin_ia32_psrldi (v2si, int); -v1di __builtin_ia32_psrlqi (v1di, int); -v4hi __builtin_ia32_psrawi (v4hi, int); -v2si __builtin_ia32_psradi (v2si, int); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} 
__builtin_riscv_cv_simd_cmplt_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.cmplt.sci.h} +@end deftypefn -The following built-in functions are made available either with -@option{-msse}, or with @option{-m3dnowa}. All of them generate -the machine instruction that is part of the name. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmplt_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.cmplt.sc.b} +@end deftypefn -@smallexample -v4hi __builtin_ia32_pmulhuw (v4hi, v4hi); -v8qi __builtin_ia32_pavgb (v8qi, v8qi); -v4hi __builtin_ia32_pavgw (v4hi, v4hi); -v1di __builtin_ia32_psadbw (v8qi, v8qi); -v8qi __builtin_ia32_pmaxub (v8qi, v8qi); -v4hi __builtin_ia32_pmaxsw (v4hi, v4hi); -v8qi __builtin_ia32_pminub (v8qi, v8qi); -v4hi __builtin_ia32_pminsw (v4hi, v4hi); -int __builtin_ia32_pmovmskb (v8qi); -void __builtin_ia32_maskmovq (v8qi, v8qi, char *); -void __builtin_ia32_movntq (di *, di); -void __builtin_ia32_sfence (void); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmplt_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.cmplt.sci.b} +@end deftypefn -The following built-in functions are available when @option{-msse} is used. -All of them generate the machine instruction that is part of the name. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_h (uint32_t, uint32_t) +Generated assembler @code{cv.cmple.h} +@end deftypefn -@smallexample -int __builtin_ia32_comieq (v4sf, v4sf); -int __builtin_ia32_comineq (v4sf, v4sf); -int __builtin_ia32_comilt (v4sf, v4sf); -int __builtin_ia32_comile (v4sf, v4sf); -int __builtin_ia32_comigt (v4sf, v4sf); -int __builtin_ia32_comige (v4sf, v4sf); -int __builtin_ia32_ucomieq (v4sf, v4sf); -int __builtin_ia32_ucomineq (v4sf, v4sf); -int __builtin_ia32_ucomilt (v4sf, v4sf); -int __builtin_ia32_ucomile (v4sf, v4sf); -int __builtin_ia32_ucomigt (v4sf, v4sf); -int __builtin_ia32_ucomige (v4sf, v4sf); -v4sf __builtin_ia32_addps (v4sf, v4sf); -v4sf __builtin_ia32_subps (v4sf, v4sf); -v4sf __builtin_ia32_mulps (v4sf, v4sf); -v4sf __builtin_ia32_divps (v4sf, v4sf); -v4sf __builtin_ia32_addss (v4sf, v4sf); -v4sf __builtin_ia32_subss (v4sf, v4sf); -v4sf __builtin_ia32_mulss (v4sf, v4sf); -v4sf __builtin_ia32_divss (v4sf, v4sf); -v4sf __builtin_ia32_cmpeqps (v4sf, v4sf); -v4sf __builtin_ia32_cmpltps (v4sf, v4sf); -v4sf __builtin_ia32_cmpleps (v4sf, v4sf); -v4sf __builtin_ia32_cmpgtps (v4sf, v4sf); -v4sf __builtin_ia32_cmpgeps (v4sf, v4sf); -v4sf __builtin_ia32_cmpunordps (v4sf, v4sf); -v4sf __builtin_ia32_cmpneqps (v4sf, v4sf); -v4sf __builtin_ia32_cmpnltps (v4sf, v4sf); -v4sf __builtin_ia32_cmpnleps (v4sf, v4sf); -v4sf __builtin_ia32_cmpngtps (v4sf, v4sf); -v4sf __builtin_ia32_cmpngeps (v4sf, v4sf); -v4sf __builtin_ia32_cmpordps (v4sf, v4sf); -v4sf __builtin_ia32_cmpeqss (v4sf, v4sf); -v4sf __builtin_ia32_cmpltss (v4sf, v4sf); -v4sf __builtin_ia32_cmpless (v4sf, v4sf); -v4sf __builtin_ia32_cmpunordss (v4sf, v4sf); -v4sf __builtin_ia32_cmpneqss (v4sf, v4sf); -v4sf __builtin_ia32_cmpnltss (v4sf, v4sf); -v4sf __builtin_ia32_cmpnless (v4sf, v4sf); -v4sf __builtin_ia32_cmpordss (v4sf, v4sf); -v4sf __builtin_ia32_maxps (v4sf, v4sf); -v4sf __builtin_ia32_maxss (v4sf, v4sf); -v4sf __builtin_ia32_minps (v4sf, v4sf); -v4sf __builtin_ia32_minss (v4sf, v4sf); -v4sf __builtin_ia32_andps (v4sf, v4sf); -v4sf __builtin_ia32_andnps (v4sf, v4sf); -v4sf __builtin_ia32_orps (v4sf, v4sf); -v4sf __builtin_ia32_xorps (v4sf, v4sf); -v4sf __builtin_ia32_movss (v4sf, v4sf); -v4sf 
__builtin_ia32_movhlps (v4sf, v4sf); -v4sf __builtin_ia32_movlhps (v4sf, v4sf); -v4sf __builtin_ia32_unpckhps (v4sf, v4sf); -v4sf __builtin_ia32_unpcklps (v4sf, v4sf); -v4sf __builtin_ia32_cvtpi2ps (v4sf, v2si); -v4sf __builtin_ia32_cvtsi2ss (v4sf, int); -v2si __builtin_ia32_cvtps2pi (v4sf); -int __builtin_ia32_cvtss2si (v4sf); -v2si __builtin_ia32_cvttps2pi (v4sf); -int __builtin_ia32_cvttss2si (v4sf); -v4sf __builtin_ia32_rcpps (v4sf); -v4sf __builtin_ia32_rsqrtps (v4sf); -v4sf __builtin_ia32_sqrtps (v4sf); -v4sf __builtin_ia32_rcpss (v4sf); -v4sf __builtin_ia32_rsqrtss (v4sf); -v4sf __builtin_ia32_sqrtss (v4sf); -v4sf __builtin_ia32_shufps (v4sf, v4sf, int); -void __builtin_ia32_movntps (float *, v4sf); -int __builtin_ia32_movmskps (v4sf); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_b (uint32_t, uint32_t) +Generated assembler @code{cv.cmple.b} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.cmple.sc.h} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.cmple.sci.h} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.cmple.sc.b} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.cmple.sci.b} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_h (uint32_t, uint32_t) +Generated assembler @code{cv.cmpgtu.h} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_b (uint32_t, uint32_t) +Generated assembler @code{cv.cmpgtu.b} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_sc_h (uint32_t, uint16_t) +Generated assembler @code{cv.cmpgtu.sc.h} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_sc_h (uint32_t, uint6_t) +Generated assembler @code{cv.cmpgtu.sci.h} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_sc_b (uint32_t, uint8_t) +Generated assembler @code{cv.cmpgtu.sc.b} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_sc_b (uint32_t, uint6_t) +Generated assembler @code{cv.cmpgtu.sci.b} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_h (uint32_t, uint32_t) +Generated assembler @code{cv.cmpgeu.h} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_b (uint32_t, uint32_t) +Generated assembler @code{cv.cmpgeu.b} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_sc_h (uint32_t, uint16_t) +Generated assembler @code{cv.cmpgeu.sc.h} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_sc_h (uint32_t, uint6_t) +Generated assembler @code{cv.cmpgeu.sci.h} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_sc_b (uint32_t, uint8_t) +Generated assembler @code{cv.cmpgeu.sc.b} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_sc_b (uint32_t, uint6_t) +Generated assembler @code{cv.cmpgeu.sci.b} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_h (uint32_t, 
uint32_t) +Generated assembler @code{cv.cmpltu.h} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_b (uint32_t, uint32_t) +Generated assembler @code{cv.cmpltu.b} +@end deftypefn -The following built-in functions are available when @option{-msse} is used. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_sc_h (uint32_t, uint16_t) +Generated assembler @code{cv.cmpltu.sc.h} +@end deftypefn -@defbuiltin{v4sf __builtin_ia32_loadups (float *)} -Generates the @code{movups} machine instruction as a load from memory. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_sc_h (uint32_t, uint6_t) +Generated assembler @code{cv.cmpltu.sci.h} +@end deftypefn -@defbuiltin{void __builtin_ia32_storeups (float *, v4sf)} -Generates the @code{movups} machine instruction as a store to memory. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_sc_b (uint32_t, uint8_t) +Generated assembler @code{cv.cmpltu.sc.b} +@end deftypefn -@defbuiltin{v4sf __builtin_ia32_loadss (float *)} -Generates the @code{movss} machine instruction as a load from memory. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_sc_b (uint32_t, uint6_t) +Generated assembler @code{cv.cmpltu.sci.b} +@end deftypefn -@defbuiltin{v4sf __builtin_ia32_loadhps (v4sf, const v2sf *)} -Generates the @code{movhps} machine instruction as a load from memory. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_h (uint32_t, uint32_t) +Generated assembler @code{cv.cmpleu.h} +@end deftypefn -@defbuiltin{v4sf __builtin_ia32_loadlps (v4sf, const v2sf *)} -Generates the @code{movlps} machine instruction as a load from memory -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_b (uint32_t, uint32_t) +Generated assembler @code{cv.cmpleu.b} +@end deftypefn -@defbuiltin{void __builtin_ia32_storehps (v2sf *, v4sf)} -Generates the @code{movhps} machine instruction as a store to memory. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_sc_h (uint32_t, uint16_t) +Generated assembler @code{cv.cmpleu.sc.h} +@end deftypefn -@defbuiltin{void __builtin_ia32_storelps (v2sf *, v4sf)} -Generates the @code{movlps} machine instruction as a store to memory. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_sc_h (uint32_t, uint6_t) +Generated assembler @code{cv.cmpleu.sci.h} +@end deftypefn -The following built-in functions are available when @option{-msse2} is used. -All of them generate the machine instruction that is part of the name. 
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_sc_b (uint32_t, uint8_t) +Generated assembler @code{cv.cmpleu.sc.b} +@end deftypefn -@smallexample -int __builtin_ia32_comisdeq (v2df, v2df); -int __builtin_ia32_comisdlt (v2df, v2df); -int __builtin_ia32_comisdle (v2df, v2df); -int __builtin_ia32_comisdgt (v2df, v2df); -int __builtin_ia32_comisdge (v2df, v2df); -int __builtin_ia32_comisdneq (v2df, v2df); -int __builtin_ia32_ucomisdeq (v2df, v2df); -int __builtin_ia32_ucomisdlt (v2df, v2df); -int __builtin_ia32_ucomisdle (v2df, v2df); -int __builtin_ia32_ucomisdgt (v2df, v2df); -int __builtin_ia32_ucomisdge (v2df, v2df); -int __builtin_ia32_ucomisdneq (v2df, v2df); -v2df __builtin_ia32_cmpeqpd (v2df, v2df); -v2df __builtin_ia32_cmpltpd (v2df, v2df); -v2df __builtin_ia32_cmplepd (v2df, v2df); -v2df __builtin_ia32_cmpgtpd (v2df, v2df); -v2df __builtin_ia32_cmpgepd (v2df, v2df); -v2df __builtin_ia32_cmpunordpd (v2df, v2df); -v2df __builtin_ia32_cmpneqpd (v2df, v2df); -v2df __builtin_ia32_cmpnltpd (v2df, v2df); -v2df __builtin_ia32_cmpnlepd (v2df, v2df); -v2df __builtin_ia32_cmpngtpd (v2df, v2df); -v2df __builtin_ia32_cmpngepd (v2df, v2df); -v2df __builtin_ia32_cmpordpd (v2df, v2df); -v2df __builtin_ia32_cmpeqsd (v2df, v2df); -v2df __builtin_ia32_cmpltsd (v2df, v2df); -v2df __builtin_ia32_cmplesd (v2df, v2df); -v2df __builtin_ia32_cmpunordsd (v2df, v2df); -v2df __builtin_ia32_cmpneqsd (v2df, v2df); -v2df __builtin_ia32_cmpnltsd (v2df, v2df); -v2df __builtin_ia32_cmpnlesd (v2df, v2df); -v2df __builtin_ia32_cmpordsd (v2df, v2df); -v2di __builtin_ia32_paddq (v2di, v2di); -v2di __builtin_ia32_psubq (v2di, v2di); -v2df __builtin_ia32_addpd (v2df, v2df); -v2df __builtin_ia32_subpd (v2df, v2df); -v2df __builtin_ia32_mulpd (v2df, v2df); -v2df __builtin_ia32_divpd (v2df, v2df); -v2df __builtin_ia32_addsd (v2df, v2df); -v2df __builtin_ia32_subsd (v2df, v2df); -v2df __builtin_ia32_mulsd (v2df, v2df); -v2df __builtin_ia32_divsd (v2df, v2df); -v2df __builtin_ia32_minpd (v2df, v2df); -v2df __builtin_ia32_maxpd (v2df, v2df); -v2df __builtin_ia32_minsd (v2df, v2df); -v2df __builtin_ia32_maxsd (v2df, v2df); -v2df __builtin_ia32_andpd (v2df, v2df); -v2df __builtin_ia32_andnpd (v2df, v2df); -v2df __builtin_ia32_orpd (v2df, v2df); -v2df __builtin_ia32_xorpd (v2df, v2df); -v2df __builtin_ia32_movsd (v2df, v2df); -v2df __builtin_ia32_unpckhpd (v2df, v2df); -v2df __builtin_ia32_unpcklpd (v2df, v2df); -v16qi __builtin_ia32_paddb128 (v16qi, v16qi); -v8hi __builtin_ia32_paddw128 (v8hi, v8hi); -v4si __builtin_ia32_paddd128 (v4si, v4si); -v2di __builtin_ia32_paddq128 (v2di, v2di); -v16qi __builtin_ia32_psubb128 (v16qi, v16qi); -v8hi __builtin_ia32_psubw128 (v8hi, v8hi); -v4si __builtin_ia32_psubd128 (v4si, v4si); -v2di __builtin_ia32_psubq128 (v2di, v2di); -v8hi __builtin_ia32_pmullw128 (v8hi, v8hi); -v8hi __builtin_ia32_pmulhw128 (v8hi, v8hi); -v2di __builtin_ia32_pand128 (v2di, v2di); -v2di __builtin_ia32_pandn128 (v2di, v2di); -v2di __builtin_ia32_por128 (v2di, v2di); -v2di __builtin_ia32_pxor128 (v2di, v2di); -v16qi __builtin_ia32_pavgb128 (v16qi, v16qi); -v8hi __builtin_ia32_pavgw128 (v8hi, v8hi); -v16qi __builtin_ia32_pcmpeqb128 (v16qi, v16qi); -v8hi __builtin_ia32_pcmpeqw128 (v8hi, v8hi); -v4si __builtin_ia32_pcmpeqd128 (v4si, v4si); -v16qi __builtin_ia32_pcmpgtb128 (v16qi, v16qi); -v8hi __builtin_ia32_pcmpgtw128 (v8hi, v8hi); -v4si __builtin_ia32_pcmpgtd128 (v4si, v4si); -v16qi __builtin_ia32_pmaxub128 (v16qi, v16qi); -v8hi __builtin_ia32_pmaxsw128 (v8hi, v8hi); -v16qi 
__builtin_ia32_pminub128 (v16qi, v16qi); -v8hi __builtin_ia32_pminsw128 (v8hi, v8hi); -v16qi __builtin_ia32_punpckhbw128 (v16qi, v16qi); -v8hi __builtin_ia32_punpckhwd128 (v8hi, v8hi); -v4si __builtin_ia32_punpckhdq128 (v4si, v4si); -v2di __builtin_ia32_punpckhqdq128 (v2di, v2di); -v16qi __builtin_ia32_punpcklbw128 (v16qi, v16qi); -v8hi __builtin_ia32_punpcklwd128 (v8hi, v8hi); -v4si __builtin_ia32_punpckldq128 (v4si, v4si); -v2di __builtin_ia32_punpcklqdq128 (v2di, v2di); -v16qi __builtin_ia32_packsswb128 (v8hi, v8hi); -v8hi __builtin_ia32_packssdw128 (v4si, v4si); -v16qi __builtin_ia32_packuswb128 (v8hi, v8hi); -v8hi __builtin_ia32_pmulhuw128 (v8hi, v8hi); -void __builtin_ia32_maskmovdqu (v16qi, v16qi); -v2df __builtin_ia32_loadupd (double *); -void __builtin_ia32_storeupd (double *, v2df); -v2df __builtin_ia32_loadhpd (v2df, double const *); -v2df __builtin_ia32_loadlpd (v2df, double const *); -int __builtin_ia32_movmskpd (v2df); -int __builtin_ia32_pmovmskb128 (v16qi); -void __builtin_ia32_movnti (int *, int); -void __builtin_ia32_movnti64 (long long int *, long long int); -void __builtin_ia32_movntpd (double *, v2df); -void __builtin_ia32_movntdq (v2df *, v2df); -v4si __builtin_ia32_pshufd (v4si, int); -v8hi __builtin_ia32_pshuflw (v8hi, int); -v8hi __builtin_ia32_pshufhw (v8hi, int); -v2di __builtin_ia32_psadbw128 (v16qi, v16qi); -v2df __builtin_ia32_sqrtpd (v2df); -v2df __builtin_ia32_sqrtsd (v2df); -v2df __builtin_ia32_shufpd (v2df, v2df, int); -v2df __builtin_ia32_cvtdq2pd (v4si); -v4sf __builtin_ia32_cvtdq2ps (v4si); -v4si __builtin_ia32_cvtpd2dq (v2df); -v2si __builtin_ia32_cvtpd2pi (v2df); -v4sf __builtin_ia32_cvtpd2ps (v2df); -v4si __builtin_ia32_cvttpd2dq (v2df); -v2si __builtin_ia32_cvttpd2pi (v2df); -v2df __builtin_ia32_cvtpi2pd (v2si); -int __builtin_ia32_cvtsd2si (v2df); -int __builtin_ia32_cvttsd2si (v2df); -long long __builtin_ia32_cvtsd2si64 (v2df); -long long __builtin_ia32_cvttsd2si64 (v2df); -v4si __builtin_ia32_cvtps2dq (v4sf); -v2df __builtin_ia32_cvtps2pd (v4sf); -v4si __builtin_ia32_cvttps2dq (v4sf); -v2df __builtin_ia32_cvtsi2sd (v2df, int); -v2df __builtin_ia32_cvtsi642sd (v2df, long long); -v4sf __builtin_ia32_cvtsd2ss (v4sf, v2df); -v2df __builtin_ia32_cvtss2sd (v2df, v4sf); -void __builtin_ia32_clflush (const void *); -void __builtin_ia32_lfence (void); -void __builtin_ia32_mfence (void); -v16qi __builtin_ia32_loaddqu (const char *); -void __builtin_ia32_storedqu (char *, v16qi); -v1di __builtin_ia32_pmuludq (v2si, v2si); -v2di __builtin_ia32_pmuludq128 (v4si, v4si); -v8hi __builtin_ia32_psllw128 (v8hi, v8hi); -v4si __builtin_ia32_pslld128 (v4si, v4si); -v2di __builtin_ia32_psllq128 (v2di, v2di); -v8hi __builtin_ia32_psrlw128 (v8hi, v8hi); -v4si __builtin_ia32_psrld128 (v4si, v4si); -v2di __builtin_ia32_psrlq128 (v2di, v2di); -v8hi __builtin_ia32_psraw128 (v8hi, v8hi); -v4si __builtin_ia32_psrad128 (v4si, v4si); -v2di __builtin_ia32_pslldqi128 (v2di, int); -v8hi __builtin_ia32_psllwi128 (v8hi, int); -v4si __builtin_ia32_pslldi128 (v4si, int); -v2di __builtin_ia32_psllqi128 (v2di, int); -v2di __builtin_ia32_psrldqi128 (v2di, int); -v8hi __builtin_ia32_psrlwi128 (v8hi, int); -v4si __builtin_ia32_psrldi128 (v4si, int); -v2di __builtin_ia32_psrlqi128 (v2di, int); -v8hi __builtin_ia32_psrawi128 (v8hi, int); -v4si __builtin_ia32_psradi128 (v4si, int); -v4si __builtin_ia32_pmaddwd128 (v8hi, v8hi); -v2di __builtin_ia32_movq128 (v2di); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_sc_b (uint32_t, uint6_t) +Generated 
assembler @code{cv.cmpleu.sci.b} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_r (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.cplxmul.r} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_i (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.cplxmul.i} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_r (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.cplxmul.r.div2} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_i (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.cplxmul.i.div2} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_r (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.cplxmul.r.div4} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_i (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.cplxmul.i.div4} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_r (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.cplxmul.r.div8} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_i (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.cplxmul.i.div8} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxconj (uint32_t) +Generated assembler @code{cv.cplxconj} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_subrotmj (uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.subrotmj} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_subrotmj (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.subrotmj.div2} +@end deftypefn -The following built-in functions are available when @option{-msse3} is used. -All of them generate the machine instruction that is part of the name. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_subrotmj (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.subrotmj.div4} +@end deftypefn -@smallexample -v2df __builtin_ia32_addsubpd (v2df, v2df); -v4sf __builtin_ia32_addsubps (v4sf, v4sf); -v2df __builtin_ia32_haddpd (v2df, v2df); -v4sf __builtin_ia32_haddps (v4sf, v4sf); -v2df __builtin_ia32_hsubpd (v2df, v2df); -v4sf __builtin_ia32_hsubps (v4sf, v4sf); -v16qi __builtin_ia32_lddqu (char const *); -void __builtin_ia32_monitor (void *, unsigned int, unsigned int); -v4sf __builtin_ia32_movshdup (v4sf); -v4sf __builtin_ia32_movsldup (v4sf); -void __builtin_ia32_mwait (unsigned int, unsigned int); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_subrotmj (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.subrotmj.div8} +@end deftypefn -The following built-in functions are available when @option{-mssse3} is used. -All of them generate the machine instruction that is part of the name. 
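+As a brief sketch of how the comparison built-ins above can be used
+(illustrative only, and not part of the CORE-V documentation; it
+assumes each 16-bit lane of the result is set to all ones where the
+comparison holds and to zero otherwise):
+
+@smallexample
+#include <stdint.h>
+
+/* Per-halfword mask: a and b each pack two 16-bit lanes.  */
+uint32_t
+mask_gt_h (uint32_t a, uint32_t b)
+@{
+  return __builtin_riscv_cv_simd_cmpgt_h (a, b);
+@}
+@end smallexample
+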
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_h (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.add.div2} +@end deftypefn -@smallexample -v2si __builtin_ia32_phaddd (v2si, v2si); -v4hi __builtin_ia32_phaddw (v4hi, v4hi); -v4hi __builtin_ia32_phaddsw (v4hi, v4hi); -v2si __builtin_ia32_phsubd (v2si, v2si); -v4hi __builtin_ia32_phsubw (v4hi, v4hi); -v4hi __builtin_ia32_phsubsw (v4hi, v4hi); -v4hi __builtin_ia32_pmaddubsw (v8qi, v8qi); -v4hi __builtin_ia32_pmulhrsw (v4hi, v4hi); -v8qi __builtin_ia32_pshufb (v8qi, v8qi); -v8qi __builtin_ia32_psignb (v8qi, v8qi); -v2si __builtin_ia32_psignd (v2si, v2si); -v4hi __builtin_ia32_psignw (v4hi, v4hi); -v1di __builtin_ia32_palignr (v1di, v1di, int); -v8qi __builtin_ia32_pabsb (v8qi); -v2si __builtin_ia32_pabsd (v2si); -v4hi __builtin_ia32_pabsw (v4hi); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_h (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.add.div4} +@end deftypefn -The following built-in functions are available when @option{-mssse3} is used. -All of them generate the machine instruction that is part of the name. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_h (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.add.div8} +@end deftypefn -@smallexample -v4si __builtin_ia32_phaddd128 (v4si, v4si); -v8hi __builtin_ia32_phaddw128 (v8hi, v8hi); -v8hi __builtin_ia32_phaddsw128 (v8hi, v8hi); -v4si __builtin_ia32_phsubd128 (v4si, v4si); -v8hi __builtin_ia32_phsubw128 (v8hi, v8hi); -v8hi __builtin_ia32_phsubsw128 (v8hi, v8hi); -v8hi __builtin_ia32_pmaddubsw128 (v16qi, v16qi); -v8hi __builtin_ia32_pmulhrsw128 (v8hi, v8hi); -v16qi __builtin_ia32_pshufb128 (v16qi, v16qi); -v16qi __builtin_ia32_psignb128 (v16qi, v16qi); -v4si __builtin_ia32_psignd128 (v4si, v4si); -v8hi __builtin_ia32_psignw128 (v8hi, v8hi); -v2di __builtin_ia32_palignr128 (v2di, v2di, int); -v16qi __builtin_ia32_pabsb128 (v16qi); -v4si __builtin_ia32_pabsd128 (v4si); -v8hi __builtin_ia32_pabsw128 (v8hi); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_h (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.sub.div2} +@end deftypefn -The following built-in functions are available when @option{-msse4.1} is -used. All of them generate the machine instruction that is part of the -name. 
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_h (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.sub.div4} +@end deftypefn -@smallexample -v2df __builtin_ia32_blendpd (v2df, v2df, const int); -v4sf __builtin_ia32_blendps (v4sf, v4sf, const int); -v2df __builtin_ia32_blendvpd (v2df, v2df, v2df); -v4sf __builtin_ia32_blendvps (v4sf, v4sf, v4sf); -v2df __builtin_ia32_dppd (v2df, v2df, const int); -v4sf __builtin_ia32_dpps (v4sf, v4sf, const int); -v4sf __builtin_ia32_insertps128 (v4sf, v4sf, const int); -v2di __builtin_ia32_movntdqa (v2di *); -v16qi __builtin_ia32_mpsadbw128 (v16qi, v16qi, const int); -v8hi __builtin_ia32_packusdw128 (v4si, v4si); -v16qi __builtin_ia32_pblendvb128 (v16qi, v16qi, v16qi); -v8hi __builtin_ia32_pblendw128 (v8hi, v8hi, const int); -v2di __builtin_ia32_pcmpeqq (v2di, v2di); -v8hi __builtin_ia32_phminposuw128 (v8hi); -v16qi __builtin_ia32_pmaxsb128 (v16qi, v16qi); -v4si __builtin_ia32_pmaxsd128 (v4si, v4si); -v4si __builtin_ia32_pmaxud128 (v4si, v4si); -v8hi __builtin_ia32_pmaxuw128 (v8hi, v8hi); -v16qi __builtin_ia32_pminsb128 (v16qi, v16qi); -v4si __builtin_ia32_pminsd128 (v4si, v4si); -v4si __builtin_ia32_pminud128 (v4si, v4si); -v8hi __builtin_ia32_pminuw128 (v8hi, v8hi); -v4si __builtin_ia32_pmovsxbd128 (v16qi); -v2di __builtin_ia32_pmovsxbq128 (v16qi); -v8hi __builtin_ia32_pmovsxbw128 (v16qi); -v2di __builtin_ia32_pmovsxdq128 (v4si); -v4si __builtin_ia32_pmovsxwd128 (v8hi); -v2di __builtin_ia32_pmovsxwq128 (v8hi); -v4si __builtin_ia32_pmovzxbd128 (v16qi); -v2di __builtin_ia32_pmovzxbq128 (v16qi); -v8hi __builtin_ia32_pmovzxbw128 (v16qi); -v2di __builtin_ia32_pmovzxdq128 (v4si); -v4si __builtin_ia32_pmovzxwd128 (v8hi); -v2di __builtin_ia32_pmovzxwq128 (v8hi); -v2di __builtin_ia32_pmuldq128 (v4si, v4si); -v4si __builtin_ia32_pmulld128 (v4si, v4si); -int __builtin_ia32_ptestc128 (v2di, v2di); -int __builtin_ia32_ptestnzc128 (v2di, v2di); -int __builtin_ia32_ptestz128 (v2di, v2di); -v2df __builtin_ia32_roundpd (v2df, const int); -v4sf __builtin_ia32_roundps (v4sf, const int); -v2df __builtin_ia32_roundsd (v2df, v2df, const int); -v4sf __builtin_ia32_roundss (v4sf, v4sf, const int); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_h (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.sub.div8} +@end deftypefn -The following built-in functions are available when @option{-msse4.1} is -used. +@node RX Built-in Functions +@subsection RX Built-in Functions +GCC supports some of the RX instructions which cannot be expressed in +the C programming language via the use of built-in functions. The +following functions are supported: -@defbuiltin{v4sf __builtin_ia32_vec_set_v4sf (v4sf, float, const int)} -Generates the @code{insertps} machine instruction. +@defbuiltin{void __builtin_rx_brk (void)} +Generates the @code{brk} machine instruction. @enddefbuiltin -@defbuiltin{int __builtin_ia32_vec_ext_v16qi (v16qi, const int)} -Generates the @code{pextrb} machine instruction. +@defbuiltin{void __builtin_rx_clrpsw (int)} +Generates the @code{clrpsw} machine instruction to clear the specified +bit in the processor status word. @enddefbuiltin -@defbuiltin{v16qi __builtin_ia32_vec_set_v16qi (v16qi, int, const int)} -Generates the @code{pinsrb} machine instruction. +@defbuiltin{void __builtin_rx_int (int)} +Generates the @code{int} machine instruction to generate an interrupt +with the specified value. 
@enddefbuiltin
-@defbuiltin{v4si __builtin_ia32_vec_set_v4si (v4si, int, const int)}
-Generates the @code{pinsrd} machine instruction.
+@defbuiltin{void __builtin_rx_machi (int, int)}
+Generates the @code{machi} machine instruction to add the result of
+multiplying the top 16 bits of the two arguments into the
+accumulator.
@enddefbuiltin
-@defbuiltin{v2di __builtin_ia32_vec_set_v2di (v2di, long long, const int)}
-Generates the @code{pinsrq} machine instruction in 64bit mode.
+@defbuiltin{void __builtin_rx_maclo (int, int)}
+Generates the @code{maclo} machine instruction to add the result of
+multiplying the bottom 16 bits of the two arguments into the
+accumulator.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_rx_mulhi (int, int)}
+Generates the @code{mulhi} machine instruction to place the result of
+multiplying the top 16 bits of the two arguments into the
+accumulator.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_rx_mullo (int, int)}
+Generates the @code{mullo} machine instruction to place the result of
+multiplying the bottom 16 bits of the two arguments into the
+accumulator.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_rx_mvfachi (void)}
+Generates the @code{mvfachi} machine instruction to read the top
+32 bits of the accumulator.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_rx_mvfacmi (void)}
+Generates the @code{mvfacmi} machine instruction to read the middle
+32 bits of the accumulator.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_rx_mvfc (int)}
+Generates the @code{mvfc} machine instruction which reads the control
+register specified in its argument and returns its value.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_rx_mvtachi (int)}
+Generates the @code{mvtachi} machine instruction to set the top
+32 bits of the accumulator.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_rx_mvtaclo (int)}
+Generates the @code{mvtaclo} machine instruction to set the bottom
+32 bits of the accumulator.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_rx_mvtc (int @var{reg}, int @var{val})}
+Generates the @code{mvtc} machine instruction which sets control
+register number @code{reg} to @code{val}.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_rx_mvtipl (int)}
+Generates the @code{mvtipl} machine instruction to set the interrupt
+priority level.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_rx_racw (int)}
+Generates the @code{racw} machine instruction to round the accumulator
+according to the specified mode.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_rx_revw (int)}
+Generates the @code{revw} machine instruction which swaps the bytes in
+the argument so that bits 0--7 now occupy bits 8--15 and vice versa,
+and also bits 16--23 occupy bits 24--31 and vice versa.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_rx_rmpa (void)}
+Generates the @code{rmpa} machine instruction which initiates a
+repeated multiply and accumulate sequence.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_rx_round (float)}
+Generates the @code{round} machine instruction which returns the
+floating-point argument rounded according to the current rounding mode
+set in the floating-point status word register.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_rx_sat (int)}
+Generates the @code{sat} machine instruction which returns the
+saturated value of the argument.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_rx_setpsw (int)}
+Generates the @code{setpsw} machine instruction to set the specified
+bit in the processor status word.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_rx_wait (void)}
+Generates the @code{wait} machine instruction.
+@enddefbuiltin
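+
+As a sketch of how the accumulator built-ins above combine
+(illustrative only; the placement of each product within the 64-bit
+accumulator is defined by the RX instruction set, so this sketch
+simply clears the accumulator and reads its top 32 bits back):
+
+@smallexample
+int
+mac_lo_halves (const int *x, const int *y, int n)
+@{
+  __builtin_rx_mvtachi (0);           /* clear top 32 bits */
+  __builtin_rx_mvtaclo (0);           /* clear bottom 32 bits */
+  for (int i = 0; i < n; i++)
+    __builtin_rx_maclo (x[i], y[i]);  /* accumulate low-half products */
+  return __builtin_rx_mvfachi ();     /* read top 32 bits back */
+@}
+@end smallexample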
+
+@node S/390 System z Built-in Functions
+@subsection S/390 System z Built-in Functions
+@defbuiltin{int __builtin_tbegin (void*)}
+Generates the @code{tbegin} machine instruction starting a
+non-constrained hardware transaction. If the parameter is non-NULL the
+memory area is used to store the transaction diagnostic buffer and
+will be passed as first operand to @code{tbegin}. This buffer can be
+defined using the @code{struct __htm_tdb} C struct defined in
+@code{htmintrin.h} and must reside on a double-word boundary. The
+second tbegin operand is set to @code{0xff0c}. This enables
+save/restore of all GPRs and disables aborts for FPR and AR
+manipulations inside the transaction body. The condition code set by
+the tbegin instruction is returned as an integer value. The tbegin
+instruction by definition overwrites the content of all FPRs. The
+compiler will generate code which saves and restores the FPRs. For
+soft-float code it is recommended to use the @code{*_nofloat}
+variant. In order to prevent a TDB from being written it is required
+to pass a constant zero value as the parameter. Passing a zero value
+through a variable is not sufficient. Although modifications of
+access registers inside the transaction do not trigger a transaction
+abort, it is not supported to actually modify them. Access
+registers do not get saved when entering a transaction. They will have
+undefined state when reaching the abort code.
@enddefbuiltin
-The following built-in functions are changed to generate new SSE4.1
-instructions when @option{-msse4.1} is used.
-
-@defbuiltin{float __builtin_ia32_vec_ext_v4sf (v4sf, const int)}
-Generates the @code{extractps} machine instruction.
-@enddefbuiltin
+Macros for the possible return codes of tbegin are defined in the
+@code{htmintrin.h} header file:
-@defbuiltin{int __builtin_ia32_vec_ext_v4si (v4si, const int)}
-Generates the @code{pextrd} machine instruction.
-@enddefbuiltin
+@defmac _HTM_TBEGIN_STARTED
+@code{tbegin} has been executed as part of normal processing. The
+transaction body is supposed to be executed.
+@end defmac
-@defbuiltin{{long long} __builtin_ia32_vec_ext_v2di (v2di, const int)}
-Generates the @code{pextrq} machine instruction in 64bit mode.
-@enddefbuiltin
+@defmac _HTM_TBEGIN_INDETERMINATE
+The transaction was aborted due to an indeterminate condition which
+might be persistent.
+@end defmac
-The following built-in functions are available when @option{-msse4.2} is
-used. All of them generate the machine instruction that is part of the
-name.
+@defmac _HTM_TBEGIN_TRANSIENT
+The transaction aborted due to a transient failure. The transaction
+should be re-executed in that case.
+@end defmac

-@smallexample
-v16qi __builtin_ia32_pcmpestrm128 (v16qi, int, v16qi, int, const int);
-int __builtin_ia32_pcmpestri128 (v16qi, int, v16qi, int, const int);
-int __builtin_ia32_pcmpestria128 (v16qi, int, v16qi, int, const int);
-int __builtin_ia32_pcmpestric128 (v16qi, int, v16qi, int, const int);
-int __builtin_ia32_pcmpestrio128 (v16qi, int, v16qi, int, const int);
-int __builtin_ia32_pcmpestris128 (v16qi, int, v16qi, int, const int);
-int __builtin_ia32_pcmpestriz128 (v16qi, int, v16qi, int, const int);
-v16qi __builtin_ia32_pcmpistrm128 (v16qi, v16qi, const int);
-int __builtin_ia32_pcmpistri128 (v16qi, v16qi, const int);
-int __builtin_ia32_pcmpistria128 (v16qi, v16qi, const int);
-int __builtin_ia32_pcmpistric128 (v16qi, v16qi, const int);
-int __builtin_ia32_pcmpistrio128 (v16qi, v16qi, const int);
-int __builtin_ia32_pcmpistris128 (v16qi, v16qi, const int);
-int __builtin_ia32_pcmpistriz128 (v16qi, v16qi, const int);
-v2di __builtin_ia32_pcmpgtq (v2di, v2di);
-@end smallexample
+@defmac _HTM_TBEGIN_PERSISTENT
+The transaction aborted due to a persistent failure. Re-execution
+under the same circumstances will not be productive.
+@end defmac
-The following built-in functions are available when @option{-msse4.2} is
-used.
+@defmac _HTM_FIRST_USER_ABORT_CODE
+The @code{_HTM_FIRST_USER_ABORT_CODE} defined in @code{htmintrin.h}
+specifies the first abort code which can be used for
+@code{__builtin_tabort}. Values below this threshold are reserved for
+machine use.
+@end defmac
-@defbuiltin{{unsigned int} __builtin_ia32_crc32qi (unsigned int, unsigned char)}
-Generates the @code{crc32b} machine instruction.
-@enddefbuiltin
+@deftp {Data type} {struct __htm_tdb}
+The @code{struct __htm_tdb} defined in @code{htmintrin.h} describes
+the structure of the transaction diagnostic block as specified in the
+Principles of Operation manual chapter 5-91.
+@end deftp
-@defbuiltin{{unsigned int} __builtin_ia32_crc32hi (unsigned int, unsigned short)}
-Generates the @code{crc32w} machine instruction.
+@defbuiltin{int __builtin_tbegin_nofloat (void*)}
+Same as @code{__builtin_tbegin} but without FPR saves and restores.
+Using this variant in code making use of FPRs will leave the FPRs in
+undefined state when entering the transaction abort handler code.
@enddefbuiltin
-@defbuiltin{{unsigned int} __builtin_ia32_crc32si (unsigned int, unsigned int)}
-Generates the @code{crc32l} machine instruction.
+@defbuiltin{int __builtin_tbegin_retry (void*, int)}
+In addition to @code{__builtin_tbegin} a loop for transient failures
+is generated. If tbegin returns a condition code of 2 the transaction
+will be retried as often as specified in the second argument. The
+perform processor assist instruction is used to tell the CPU about the
+number of failures so far.
@enddefbuiltin
-@defbuiltin{{unsigned long long} __builtin_ia32_crc32di (unsigned long long, unsigned long long)}
-Generates the @code{crc32q} machine instruction.
+@defbuiltin{int __builtin_tbegin_retry_nofloat (void*, int)}
+Same as @code{__builtin_tbegin_retry} but without FPR saves and
+restores. Using this variant in code making use of FPRs will leave
+the FPRs in undefined state when entering the transaction abort
+handler code.
@enddefbuiltin
-The following built-in functions are changed to generate new SSE4.2
-instructions when @option{-msse4.2} is used.
-
-@defbuiltin{int __builtin_popcount (unsigned int)}
-Generates the @code{popcntl} machine instruction.
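+
+As a minimal sketch of the usual transaction pattern (the fallback
+path is a stub here; @code{fallback_lock} is a hypothetical function,
+not part of this interface):
+
+@smallexample
+#include <htmintrin.h>
+
+extern void fallback_lock (void);
+
+void
+increment (long *p)
+@{
+  /* A constant zero suppresses the diagnostic buffer, as described
+     for __builtin_tbegin above.  */
+  if (__builtin_tbegin (0) == _HTM_TBEGIN_STARTED)
+    @{
+      *p += 1;            /* transactional body */
+      __builtin_tend ();
+    @}
+  else
+    fallback_lock ();     /* aborted or failed to start */
+@}
+@end smallexample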
+@defbuiltin{void __builtin_tbeginc (void)} +Generates the @code{tbeginc} machine instruction starting a constrained +hardware transaction. The second operand is set to @code{0xff08}. @enddefbuiltin -@defbuiltin{int __builtin_popcountl (unsigned long)} -Generates the @code{popcntl} or @code{popcntq} machine instruction, -depending on the size of @code{unsigned long}. +@defbuiltin{int __builtin_tend (void)} +Generates the @code{tend} machine instruction finishing a transaction +and making the changes visible to other threads. The condition code +generated by tend is returned as integer value. @enddefbuiltin -@defbuiltin{int __builtin_popcountll (unsigned long long)} -Generates the @code{popcntq} machine instruction. +@defbuiltin{void __builtin_tabort (int)} +Generates the @code{tabort} machine instruction with the specified +abort code. Abort codes from 0 through 255 are reserved and will +result in an error message. @enddefbuiltin -The following built-in functions are available when @option{-mavx} is -used. All of them generate the machine instruction that is part of the -name. - -@smallexample -v4df __builtin_ia32_addpd256 (v4df,v4df); -v8sf __builtin_ia32_addps256 (v8sf,v8sf); -v4df __builtin_ia32_addsubpd256 (v4df,v4df); -v8sf __builtin_ia32_addsubps256 (v8sf,v8sf); -v4df __builtin_ia32_andnpd256 (v4df,v4df); -v8sf __builtin_ia32_andnps256 (v8sf,v8sf); -v4df __builtin_ia32_andpd256 (v4df,v4df); -v8sf __builtin_ia32_andps256 (v8sf,v8sf); -v4df __builtin_ia32_blendpd256 (v4df,v4df,int); -v8sf __builtin_ia32_blendps256 (v8sf,v8sf,int); -v4df __builtin_ia32_blendvpd256 (v4df,v4df,v4df); -v8sf __builtin_ia32_blendvps256 (v8sf,v8sf,v8sf); -v2df __builtin_ia32_cmppd (v2df,v2df,int); -v4df __builtin_ia32_cmppd256 (v4df,v4df,int); -v4sf __builtin_ia32_cmpps (v4sf,v4sf,int); -v8sf __builtin_ia32_cmpps256 (v8sf,v8sf,int); -v2df __builtin_ia32_cmpsd (v2df,v2df,int); -v4sf __builtin_ia32_cmpss (v4sf,v4sf,int); -v4df __builtin_ia32_cvtdq2pd256 (v4si); -v8sf __builtin_ia32_cvtdq2ps256 (v8si); -v4si __builtin_ia32_cvtpd2dq256 (v4df); -v4sf __builtin_ia32_cvtpd2ps256 (v4df); -v8si __builtin_ia32_cvtps2dq256 (v8sf); -v4df __builtin_ia32_cvtps2pd256 (v4sf); -v4si __builtin_ia32_cvttpd2dq256 (v4df); -v8si __builtin_ia32_cvttps2dq256 (v8sf); -v4df __builtin_ia32_divpd256 (v4df,v4df); -v8sf __builtin_ia32_divps256 (v8sf,v8sf); -v8sf __builtin_ia32_dpps256 (v8sf,v8sf,int); -v4df __builtin_ia32_haddpd256 (v4df,v4df); -v8sf __builtin_ia32_haddps256 (v8sf,v8sf); -v4df __builtin_ia32_hsubpd256 (v4df,v4df); -v8sf __builtin_ia32_hsubps256 (v8sf,v8sf); -v32qi __builtin_ia32_lddqu256 (pcchar); -v32qi __builtin_ia32_loaddqu256 (pcchar); -v4df __builtin_ia32_loadupd256 (pcdouble); -v8sf __builtin_ia32_loadups256 (pcfloat); -v2df __builtin_ia32_maskloadpd (pcv2df,v2df); -v4df __builtin_ia32_maskloadpd256 (pcv4df,v4df); -v4sf __builtin_ia32_maskloadps (pcv4sf,v4sf); -v8sf __builtin_ia32_maskloadps256 (pcv8sf,v8sf); -void __builtin_ia32_maskstorepd (pv2df,v2df,v2df); -void __builtin_ia32_maskstorepd256 (pv4df,v4df,v4df); -void __builtin_ia32_maskstoreps (pv4sf,v4sf,v4sf); -void __builtin_ia32_maskstoreps256 (pv8sf,v8sf,v8sf); -v4df __builtin_ia32_maxpd256 (v4df,v4df); -v8sf __builtin_ia32_maxps256 (v8sf,v8sf); -v4df __builtin_ia32_minpd256 (v4df,v4df); -v8sf __builtin_ia32_minps256 (v8sf,v8sf); -v4df __builtin_ia32_movddup256 (v4df); -int __builtin_ia32_movmskpd256 (v4df); -int __builtin_ia32_movmskps256 (v8sf); -v8sf __builtin_ia32_movshdup256 (v8sf); -v8sf __builtin_ia32_movsldup256 (v8sf); -v4df __builtin_ia32_mulpd256 
(v4df,v4df); -v8sf __builtin_ia32_mulps256 (v8sf,v8sf); -v4df __builtin_ia32_orpd256 (v4df,v4df); -v8sf __builtin_ia32_orps256 (v8sf,v8sf); -v2df __builtin_ia32_pd_pd256 (v4df); -v4df __builtin_ia32_pd256_pd (v2df); -v4sf __builtin_ia32_ps_ps256 (v8sf); -v8sf __builtin_ia32_ps256_ps (v4sf); -int __builtin_ia32_ptestc256 (v4di,v4di,ptest); -int __builtin_ia32_ptestnzc256 (v4di,v4di,ptest); -int __builtin_ia32_ptestz256 (v4di,v4di,ptest); -v8sf __builtin_ia32_rcpps256 (v8sf); -v4df __builtin_ia32_roundpd256 (v4df,int); -v8sf __builtin_ia32_roundps256 (v8sf,int); -v8sf __builtin_ia32_rsqrtps_nr256 (v8sf); -v8sf __builtin_ia32_rsqrtps256 (v8sf); -v4df __builtin_ia32_shufpd256 (v4df,v4df,int); -v8sf __builtin_ia32_shufps256 (v8sf,v8sf,int); -v4si __builtin_ia32_si_si256 (v8si); -v8si __builtin_ia32_si256_si (v4si); -v4df __builtin_ia32_sqrtpd256 (v4df); -v8sf __builtin_ia32_sqrtps_nr256 (v8sf); -v8sf __builtin_ia32_sqrtps256 (v8sf); -void __builtin_ia32_storedqu256 (pchar,v32qi); -void __builtin_ia32_storeupd256 (pdouble,v4df); -void __builtin_ia32_storeups256 (pfloat,v8sf); -v4df __builtin_ia32_subpd256 (v4df,v4df); -v8sf __builtin_ia32_subps256 (v8sf,v8sf); -v4df __builtin_ia32_unpckhpd256 (v4df,v4df); -v8sf __builtin_ia32_unpckhps256 (v8sf,v8sf); -v4df __builtin_ia32_unpcklpd256 (v4df,v4df); -v8sf __builtin_ia32_unpcklps256 (v8sf,v8sf); -v4df __builtin_ia32_vbroadcastf128_pd256 (pcv2df); -v8sf __builtin_ia32_vbroadcastf128_ps256 (pcv4sf); -v4df __builtin_ia32_vbroadcastsd256 (pcdouble); -v4sf __builtin_ia32_vbroadcastss (pcfloat); -v8sf __builtin_ia32_vbroadcastss256 (pcfloat); -v2df __builtin_ia32_vextractf128_pd256 (v4df,int); -v4sf __builtin_ia32_vextractf128_ps256 (v8sf,int); -v4si __builtin_ia32_vextractf128_si256 (v8si,int); -v4df __builtin_ia32_vinsertf128_pd256 (v4df,v2df,int); -v8sf __builtin_ia32_vinsertf128_ps256 (v8sf,v4sf,int); -v8si __builtin_ia32_vinsertf128_si256 (v8si,v4si,int); -v4df __builtin_ia32_vperm2f128_pd256 (v4df,v4df,int); -v8sf __builtin_ia32_vperm2f128_ps256 (v8sf,v8sf,int); -v8si __builtin_ia32_vperm2f128_si256 (v8si,v8si,int); -v2df __builtin_ia32_vpermil2pd (v2df,v2df,v2di,int); -v4df __builtin_ia32_vpermil2pd256 (v4df,v4df,v4di,int); -v4sf __builtin_ia32_vpermil2ps (v4sf,v4sf,v4si,int); -v8sf __builtin_ia32_vpermil2ps256 (v8sf,v8sf,v8si,int); -v2df __builtin_ia32_vpermilpd (v2df,int); -v4df __builtin_ia32_vpermilpd256 (v4df,int); -v4sf __builtin_ia32_vpermilps (v4sf,int); -v8sf __builtin_ia32_vpermilps256 (v8sf,int); -v2df __builtin_ia32_vpermilvarpd (v2df,v2di); -v4df __builtin_ia32_vpermilvarpd256 (v4df,v4di); -v4sf __builtin_ia32_vpermilvarps (v4sf,v4si); -v8sf __builtin_ia32_vpermilvarps256 (v8sf,v8si); -int __builtin_ia32_vtestcpd (v2df,v2df,ptest); -int __builtin_ia32_vtestcpd256 (v4df,v4df,ptest); -int __builtin_ia32_vtestcps (v4sf,v4sf,ptest); -int __builtin_ia32_vtestcps256 (v8sf,v8sf,ptest); -int __builtin_ia32_vtestnzcpd (v2df,v2df,ptest); -int __builtin_ia32_vtestnzcpd256 (v4df,v4df,ptest); -int __builtin_ia32_vtestnzcps (v4sf,v4sf,ptest); -int __builtin_ia32_vtestnzcps256 (v8sf,v8sf,ptest); -int __builtin_ia32_vtestzpd (v2df,v2df,ptest); -int __builtin_ia32_vtestzpd256 (v4df,v4df,ptest); -int __builtin_ia32_vtestzps (v4sf,v4sf,ptest); -int __builtin_ia32_vtestzps256 (v8sf,v8sf,ptest); -void __builtin_ia32_vzeroall (void); -void __builtin_ia32_vzeroupper (void); -v4df __builtin_ia32_xorpd256 (v4df,v4df); -v8sf __builtin_ia32_xorps256 (v8sf,v8sf); -@end smallexample +@defbuiltin{void __builtin_tx_assist (int)} +Generates the @code{ppa rX,rY,1} 
machine instruction, where the
+integer parameter is loaded into rX and a value of zero is loaded into
+rY. The integer parameter specifies the number of times the
+transaction has aborted so far.
+@enddefbuiltin
-The following built-in functions are available when @option{-mavx2} is
-used. All of them generate the machine instruction that is part of the
-name.
+@defbuiltin{int __builtin_tx_nesting_depth (void)}
+Generates the @code{etnd} machine instruction. The current nesting
+depth is returned as an integer value. For a nesting depth of 0 the code
+is not executed as part of a transaction.
+@enddefbuiltin
-@smallexample
-v32qi __builtin_ia32_mpsadbw256 (v32qi,v32qi,int);
-v32qi __builtin_ia32_pabsb256 (v32qi);
-v16hi __builtin_ia32_pabsw256 (v16hi);
-v8si __builtin_ia32_pabsd256 (v8si);
-v16hi __builtin_ia32_packssdw256 (v8si,v8si);
-v32qi __builtin_ia32_packsswb256 (v16hi,v16hi);
-v16hi __builtin_ia32_packusdw256 (v8si,v8si);
-v32qi __builtin_ia32_packuswb256 (v16hi,v16hi);
-v32qi __builtin_ia32_paddb256 (v32qi,v32qi);
-v16hi __builtin_ia32_paddw256 (v16hi,v16hi);
-v8si __builtin_ia32_paddd256 (v8si,v8si);
-v4di __builtin_ia32_paddq256 (v4di,v4di);
-v32qi __builtin_ia32_paddsb256 (v32qi,v32qi);
-v16hi __builtin_ia32_paddsw256 (v16hi,v16hi);
-v32qi __builtin_ia32_paddusb256 (v32qi,v32qi);
-v16hi __builtin_ia32_paddusw256 (v16hi,v16hi);
-v4di __builtin_ia32_palignr256 (v4di,v4di,int);
-v4di __builtin_ia32_andsi256 (v4di,v4di);
-v4di __builtin_ia32_andnotsi256 (v4di,v4di);
-v32qi __builtin_ia32_pavgb256 (v32qi,v32qi);
-v16hi __builtin_ia32_pavgw256 (v16hi,v16hi);
-v32qi __builtin_ia32_pblendvb256 (v32qi,v32qi,v32qi);
-v16hi __builtin_ia32_pblendw256 (v16hi,v16hi,int);
-v32qi __builtin_ia32_pcmpeqb256 (v32qi,v32qi);
-v16hi __builtin_ia32_pcmpeqw256 (v16hi,v16hi);
-v8si __builtin_ia32_pcmpeqd256 (c8si,v8si);
-v4di __builtin_ia32_pcmpeqq256 (v4di,v4di);
-v32qi __builtin_ia32_pcmpgtb256 (v32qi,v32qi);
-v16hi __builtin_ia32_pcmpgtw256 (16hi,v16hi);
-v8si __builtin_ia32_pcmpgtd256 (v8si,v8si);
-v4di __builtin_ia32_pcmpgtq256 (v4di,v4di);
-v16hi __builtin_ia32_phaddw256 (v16hi,v16hi);
-v8si __builtin_ia32_phaddd256 (v8si,v8si);
-v16hi __builtin_ia32_phaddsw256 (v16hi,v16hi);
-v16hi __builtin_ia32_phsubw256 (v16hi,v16hi);
-v8si __builtin_ia32_phsubd256 (v8si,v8si);
-v16hi __builtin_ia32_phsubsw256 (v16hi,v16hi);
-v32qi __builtin_ia32_pmaddubsw256 (v32qi,v32qi);
-v16hi __builtin_ia32_pmaddwd256 (v16hi,v16hi);
-v32qi __builtin_ia32_pmaxsb256 (v32qi,v32qi);
-v16hi __builtin_ia32_pmaxsw256 (v16hi,v16hi);
-v8si __builtin_ia32_pmaxsd256 (v8si,v8si);
-v32qi __builtin_ia32_pmaxub256 (v32qi,v32qi);
-v16hi __builtin_ia32_pmaxuw256 (v16hi,v16hi);
-v8si __builtin_ia32_pmaxud256 (v8si,v8si);
-v32qi __builtin_ia32_pminsb256 (v32qi,v32qi);
-v16hi __builtin_ia32_pminsw256 (v16hi,v16hi);
-v8si __builtin_ia32_pminsd256 (v8si,v8si);
-v32qi __builtin_ia32_pminub256 (v32qi,v32qi);
-v16hi __builtin_ia32_pminuw256 (v16hi,v16hi);
-v8si __builtin_ia32_pminud256 (v8si,v8si);
-int __builtin_ia32_pmovmskb256 (v32qi);
-v16hi __builtin_ia32_pmovsxbw256 (v16qi);
-v8si __builtin_ia32_pmovsxbd256 (v16qi);
-v4di __builtin_ia32_pmovsxbq256 (v16qi);
-v8si __builtin_ia32_pmovsxwd256 (v8hi);
-v4di __builtin_ia32_pmovsxwq256 (v8hi);
-v4di __builtin_ia32_pmovsxdq256 (v4si);
-v16hi __builtin_ia32_pmovzxbw256 (v16qi);
-v8si __builtin_ia32_pmovzxbd256 (v16qi);
-v4di __builtin_ia32_pmovzxbq256 (v16qi);
-v8si __builtin_ia32_pmovzxwd256 (v8hi);
-v4di __builtin_ia32_pmovzxwq256 (v8hi);
-v4di __builtin_ia32_pmovzxdq256 (v4si);
-v4di
__builtin_ia32_pmuldq256 (v8si,v8si); -v16hi __builtin_ia32_pmulhrsw256 (v16hi, v16hi); -v16hi __builtin_ia32_pmulhuw256 (v16hi,v16hi); -v16hi __builtin_ia32_pmulhw256 (v16hi,v16hi); -v16hi __builtin_ia32_pmullw256 (v16hi,v16hi); -v8si __builtin_ia32_pmulld256 (v8si,v8si); -v4di __builtin_ia32_pmuludq256 (v8si,v8si); -v4di __builtin_ia32_por256 (v4di,v4di); -v16hi __builtin_ia32_psadbw256 (v32qi,v32qi); -v32qi __builtin_ia32_pshufb256 (v32qi,v32qi); -v8si __builtin_ia32_pshufd256 (v8si,int); -v16hi __builtin_ia32_pshufhw256 (v16hi,int); -v16hi __builtin_ia32_pshuflw256 (v16hi,int); -v32qi __builtin_ia32_psignb256 (v32qi,v32qi); -v16hi __builtin_ia32_psignw256 (v16hi,v16hi); -v8si __builtin_ia32_psignd256 (v8si,v8si); -v4di __builtin_ia32_pslldqi256 (v4di,int); -v16hi __builtin_ia32_psllwi256 (16hi,int); -v16hi __builtin_ia32_psllw256(v16hi,v8hi); -v8si __builtin_ia32_pslldi256 (v8si,int); -v8si __builtin_ia32_pslld256(v8si,v4si); -v4di __builtin_ia32_psllqi256 (v4di,int); -v4di __builtin_ia32_psllq256(v4di,v2di); -v16hi __builtin_ia32_psrawi256 (v16hi,int); -v16hi __builtin_ia32_psraw256 (v16hi,v8hi); -v8si __builtin_ia32_psradi256 (v8si,int); -v8si __builtin_ia32_psrad256 (v8si,v4si); -v4di __builtin_ia32_psrldqi256 (v4di, int); -v16hi __builtin_ia32_psrlwi256 (v16hi,int); -v16hi __builtin_ia32_psrlw256 (v16hi,v8hi); -v8si __builtin_ia32_psrldi256 (v8si,int); -v8si __builtin_ia32_psrld256 (v8si,v4si); -v4di __builtin_ia32_psrlqi256 (v4di,int); -v4di __builtin_ia32_psrlq256(v4di,v2di); -v32qi __builtin_ia32_psubb256 (v32qi,v32qi); -v32hi __builtin_ia32_psubw256 (v16hi,v16hi); -v8si __builtin_ia32_psubd256 (v8si,v8si); -v4di __builtin_ia32_psubq256 (v4di,v4di); -v32qi __builtin_ia32_psubsb256 (v32qi,v32qi); -v16hi __builtin_ia32_psubsw256 (v16hi,v16hi); -v32qi __builtin_ia32_psubusb256 (v32qi,v32qi); -v16hi __builtin_ia32_psubusw256 (v16hi,v16hi); -v32qi __builtin_ia32_punpckhbw256 (v32qi,v32qi); -v16hi __builtin_ia32_punpckhwd256 (v16hi,v16hi); -v8si __builtin_ia32_punpckhdq256 (v8si,v8si); -v4di __builtin_ia32_punpckhqdq256 (v4di,v4di); -v32qi __builtin_ia32_punpcklbw256 (v32qi,v32qi); -v16hi __builtin_ia32_punpcklwd256 (v16hi,v16hi); -v8si __builtin_ia32_punpckldq256 (v8si,v8si); -v4di __builtin_ia32_punpcklqdq256 (v4di,v4di); -v4di __builtin_ia32_pxor256 (v4di,v4di); -v4di __builtin_ia32_movntdqa256 (pv4di); -v4sf __builtin_ia32_vbroadcastss_ps (v4sf); -v8sf __builtin_ia32_vbroadcastss_ps256 (v4sf); -v4df __builtin_ia32_vbroadcastsd_pd256 (v2df); -v4di __builtin_ia32_vbroadcastsi256 (v2di); -v4si __builtin_ia32_pblendd128 (v4si,v4si); -v8si __builtin_ia32_pblendd256 (v8si,v8si); -v32qi __builtin_ia32_pbroadcastb256 (v16qi); -v16hi __builtin_ia32_pbroadcastw256 (v8hi); -v8si __builtin_ia32_pbroadcastd256 (v4si); -v4di __builtin_ia32_pbroadcastq256 (v2di); -v16qi __builtin_ia32_pbroadcastb128 (v16qi); -v8hi __builtin_ia32_pbroadcastw128 (v8hi); -v4si __builtin_ia32_pbroadcastd128 (v4si); -v2di __builtin_ia32_pbroadcastq128 (v2di); -v8si __builtin_ia32_permvarsi256 (v8si,v8si); -v4df __builtin_ia32_permdf256 (v4df,int); -v8sf __builtin_ia32_permvarsf256 (v8sf,v8sf); -v4di __builtin_ia32_permdi256 (v4di,int); -v4di __builtin_ia32_permti256 (v4di,v4di,int); -v4di __builtin_ia32_extract128i256 (v4di,int); -v4di __builtin_ia32_insert128i256 (v4di,v2di,int); -v8si __builtin_ia32_maskloadd256 (pcv8si,v8si); -v4di __builtin_ia32_maskloadq256 (pcv4di,v4di); -v4si __builtin_ia32_maskloadd (pcv4si,v4si); -v2di __builtin_ia32_maskloadq (pcv2di,v2di); -void __builtin_ia32_maskstored256 
(pv8si,v8si,v8si);
-void __builtin_ia32_maskstoreq256 (pv4di,v4di,v4di);
-void __builtin_ia32_maskstored (pv4si,v4si,v4si);
-void __builtin_ia32_maskstoreq (pv2di,v2di,v2di);
-v8si __builtin_ia32_psllv8si (v8si,v8si);
-v4si __builtin_ia32_psllv4si (v4si,v4si);
-v4di __builtin_ia32_psllv4di (v4di,v4di);
-v2di __builtin_ia32_psllv2di (v2di,v2di);
-v8si __builtin_ia32_psrav8si (v8si,v8si);
-v4si __builtin_ia32_psrav4si (v4si,v4si);
-v8si __builtin_ia32_psrlv8si (v8si,v8si);
-v4si __builtin_ia32_psrlv4si (v4si,v4si);
-v4di __builtin_ia32_psrlv4di (v4di,v4di);
-v2di __builtin_ia32_psrlv2di (v2di,v2di);
-v2df __builtin_ia32_gathersiv2df (v2df, pcdouble,v4si,v2df,int);
-v4df __builtin_ia32_gathersiv4df (v4df, pcdouble,v4si,v4df,int);
-v2df __builtin_ia32_gatherdiv2df (v2df, pcdouble,v2di,v2df,int);
-v4df __builtin_ia32_gatherdiv4df (v4df, pcdouble,v4di,v4df,int);
-v4sf __builtin_ia32_gathersiv4sf (v4sf, pcfloat,v4si,v4sf,int);
-v8sf __builtin_ia32_gathersiv8sf (v8sf, pcfloat,v8si,v8sf,int);
-v4sf __builtin_ia32_gatherdiv4sf (v4sf, pcfloat,v2di,v4sf,int);
-v4sf __builtin_ia32_gatherdiv4sf256 (v4sf, pcfloat,v4di,v4sf,int);
-v2di __builtin_ia32_gathersiv2di (v2di, pcint64,v4si,v2di,int);
-v4di __builtin_ia32_gathersiv4di (v4di, pcint64,v4si,v4di,int);
-v2di __builtin_ia32_gatherdiv2di (v2di, pcint64,v2di,v2di,int);
-v4di __builtin_ia32_gatherdiv4di (v4di, pcint64,v4di,v4di,int);
-v4si __builtin_ia32_gathersiv4si (v4si, pcint,v4si,v4si,int);
-v8si __builtin_ia32_gathersiv8si (v8si, pcint,v8si,v8si,int);
-v4si __builtin_ia32_gatherdiv4si (v4si, pcint,v2di,v4si,int);
-v4si __builtin_ia32_gatherdiv4si256 (v4si, pcint,v4di,v4si,int);
-@end smallexample
+@defbuiltin{void __builtin_non_tx_store (uint64_t *, uint64_t)}
-The following built-in functions are available when @option{-maes} is
-used. All of them generate the machine instruction that is part of the
-name.
+Generates the @code{ntstg} machine instruction. The second argument
+is written to the first argument's location. The store operation will
+not be rolled back in the case of a transaction abort.
+@enddefbuiltin
-@smallexample
-v2di __builtin_ia32_aesenc128 (v2di, v2di);
-v2di __builtin_ia32_aesenclast128 (v2di, v2di);
-v2di __builtin_ia32_aesdec128 (v2di, v2di);
-v2di __builtin_ia32_aesdeclast128 (v2di, v2di);
-v2di __builtin_ia32_aeskeygenassist128 (v2di, const int);
-v2di __builtin_ia32_aesimc128 (v2di);
-@end smallexample
+@node SH Built-in Functions
+@subsection SH Built-in Functions
+The following built-in functions are supported on the SH1, SH2, SH3 and SH4
+families of processors:
-The following built-in function is available when @option{-mpclmul} is
-used.
+@defbuiltin{{void} __builtin_set_thread_pointer (void *@var{ptr})}
+Sets the @samp{GBR} register to the specified value @var{ptr}. This is usually
+used by system code that manages threads and execution contexts. The compiler
+normally does not generate code that modifies the contents of @samp{GBR} and
+thus the value is preserved across function calls. Changing the @samp{GBR}
+value in user code must be done with caution, since the compiler might use
+@samp{GBR} in order to access thread local variables.
-@defbuiltin{v2di __builtin_ia32_pclmulqdq128 (v2di, v2di, const int)}
-Generates the @code{pclmulqdq} machine instruction.
@enddefbuiltin
-The following built-in function is available when @option{-mfsgsbase} is
-used. All of them generate the machine instruction that is part of the
-name.
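+
+As a short sketch of the intended usage (the function and its
+argument are hypothetical, not part of the SH interface):
+
+@smallexample
+/* Thread-library code installs the new thread's control block in
+   @samp{GBR}; later GBR-relative accesses then see that thread's
+   data.  */
+static void
+switch_to_thread (void *new_tcb)
+@{
+  __builtin_set_thread_pointer (new_tcb);
+@}
+@end smallexample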
-
+@defbuiltin{{void *} __builtin_thread_pointer (void)}
+Returns the value that is currently set in the @samp{GBR} register.
+Memory loads and stores that use the thread pointer as a base address are
+turned into @samp{GBR} based displacement loads and stores, if possible.
+For example:
@smallexample
-unsigned int __builtin_ia32_rdfsbase32 (void);
-unsigned long long __builtin_ia32_rdfsbase64 (void);
-unsigned int __builtin_ia32_rdgsbase32 (void);
-unsigned long long __builtin_ia32_rdgsbase64 (void);
-void _writefsbase_u32 (unsigned int);
-void _writefsbase_u64 (unsigned long long);
-void _writegsbase_u32 (unsigned int);
-void _writegsbase_u64 (unsigned long long);
+struct my_tcb
+@{
+  int a, b, c, d, e;
+@};
+
+int get_tcb_value (void)
+@{
+  // Generate @samp{mov.l @@(8,gbr),r0} instruction
+  return ((struct my_tcb *) __builtin_thread_pointer ())->c;
+@}
+
@end smallexample
+@enddefbuiltin
-The following built-in function is available when @option{-mrdrnd} is
-used. All of them generate the machine instruction that is part of the
-name.
+@defbuiltin{{unsigned int} __builtin_sh_get_fpscr (void)}
+Returns the value that is currently set in the @samp{FPSCR} register.
+@enddefbuiltin
+
+@defbuiltin{{void} __builtin_sh_set_fpscr (unsigned int @var{val})}
+Sets the @samp{FPSCR} register to the specified value @var{val}, while
+preserving the current values of the FR, SZ and PR bits.
+@enddefbuiltin
-@smallexample
-unsigned int __builtin_ia32_rdrand16_step (unsigned short *);
-unsigned int __builtin_ia32_rdrand32_step (unsigned int *);
-unsigned int __builtin_ia32_rdrand64_step (unsigned long long *);
-@end smallexample
+@node SPARC VIS Built-in Functions
+@subsection SPARC VIS Built-in Functions
-The following built-in function is available when @option{-mptwrite} is
-used. All of them generate the machine instruction that is part of the
-name.
+GCC supports SIMD operations on the SPARC using both the generic vector
+extensions (@pxref{Vector Extensions}) as well as built-in functions for
+the SPARC Visual Instruction Set (VIS). When you use the @option{-mvis}
+switch, the VIS extension is exposed as the following built-in functions:
@smallexample
-void __builtin_ia32_ptwrite32 (unsigned);
-void __builtin_ia32_ptwrite64 (unsigned long long);
-@end smallexample
+typedef int v1si __attribute__ ((vector_size (4)));
+typedef int v2si __attribute__ ((vector_size (8)));
+typedef short v4hi __attribute__ ((vector_size (8)));
+typedef short v2hi __attribute__ ((vector_size (4)));
+typedef unsigned char v8qi __attribute__ ((vector_size (8)));
+typedef unsigned char v4qi __attribute__ ((vector_size (4)));
-The following built-in functions are available when @option{-msse4a} is used.
-All of them generate the machine instruction that is part of the name.
+void __builtin_vis_write_gsr (int64_t); +int64_t __builtin_vis_read_gsr (void); -@smallexample -void __builtin_ia32_movntsd (double *, v2df); -void __builtin_ia32_movntss (float *, v4sf); -v2di __builtin_ia32_extrq (v2di, v16qi); -v2di __builtin_ia32_extrqi (v2di, const unsigned int, const unsigned int); -v2di __builtin_ia32_insertq (v2di, v2di); -v2di __builtin_ia32_insertqi (v2di, v2di, const unsigned int, const unsigned int); -@end smallexample +void * __builtin_vis_alignaddr (void *, long); +void * __builtin_vis_alignaddrl (void *, long); +int64_t __builtin_vis_faligndatadi (int64_t, int64_t); +v2si __builtin_vis_faligndatav2si (v2si, v2si); +v4hi __builtin_vis_faligndatav4hi (v4si, v4si); +v8qi __builtin_vis_faligndatav8qi (v8qi, v8qi); -The following built-in functions are available when @option{-mxop} is used. -@smallexample -v2df __builtin_ia32_vfrczpd (v2df); -v4sf __builtin_ia32_vfrczps (v4sf); -v2df __builtin_ia32_vfrczsd (v2df); -v4sf __builtin_ia32_vfrczss (v4sf); -v4df __builtin_ia32_vfrczpd256 (v4df); -v8sf __builtin_ia32_vfrczps256 (v8sf); -v2di __builtin_ia32_vpcmov (v2di, v2di, v2di); -v2di __builtin_ia32_vpcmov_v2di (v2di, v2di, v2di); -v4si __builtin_ia32_vpcmov_v4si (v4si, v4si, v4si); -v8hi __builtin_ia32_vpcmov_v8hi (v8hi, v8hi, v8hi); -v16qi __builtin_ia32_vpcmov_v16qi (v16qi, v16qi, v16qi); -v2df __builtin_ia32_vpcmov_v2df (v2df, v2df, v2df); -v4sf __builtin_ia32_vpcmov_v4sf (v4sf, v4sf, v4sf); -v4di __builtin_ia32_vpcmov_v4di256 (v4di, v4di, v4di); -v8si __builtin_ia32_vpcmov_v8si256 (v8si, v8si, v8si); -v16hi __builtin_ia32_vpcmov_v16hi256 (v16hi, v16hi, v16hi); -v32qi __builtin_ia32_vpcmov_v32qi256 (v32qi, v32qi, v32qi); -v4df __builtin_ia32_vpcmov_v4df256 (v4df, v4df, v4df); -v8sf __builtin_ia32_vpcmov_v8sf256 (v8sf, v8sf, v8sf); -v16qi __builtin_ia32_vpcomeqb (v16qi, v16qi); -v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi); -v4si __builtin_ia32_vpcomeqd (v4si, v4si); -v2di __builtin_ia32_vpcomeqq (v2di, v2di); -v16qi __builtin_ia32_vpcomequb (v16qi, v16qi); -v4si __builtin_ia32_vpcomequd (v4si, v4si); -v2di __builtin_ia32_vpcomequq (v2di, v2di); -v8hi __builtin_ia32_vpcomequw (v8hi, v8hi); -v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi); -v16qi __builtin_ia32_vpcomfalseb (v16qi, v16qi); -v4si __builtin_ia32_vpcomfalsed (v4si, v4si); -v2di __builtin_ia32_vpcomfalseq (v2di, v2di); -v16qi __builtin_ia32_vpcomfalseub (v16qi, v16qi); -v4si __builtin_ia32_vpcomfalseud (v4si, v4si); -v2di __builtin_ia32_vpcomfalseuq (v2di, v2di); -v8hi __builtin_ia32_vpcomfalseuw (v8hi, v8hi); -v8hi __builtin_ia32_vpcomfalsew (v8hi, v8hi); -v16qi __builtin_ia32_vpcomgeb (v16qi, v16qi); -v4si __builtin_ia32_vpcomged (v4si, v4si); -v2di __builtin_ia32_vpcomgeq (v2di, v2di); -v16qi __builtin_ia32_vpcomgeub (v16qi, v16qi); -v4si __builtin_ia32_vpcomgeud (v4si, v4si); -v2di __builtin_ia32_vpcomgeuq (v2di, v2di); -v8hi __builtin_ia32_vpcomgeuw (v8hi, v8hi); -v8hi __builtin_ia32_vpcomgew (v8hi, v8hi); -v16qi __builtin_ia32_vpcomgtb (v16qi, v16qi); -v4si __builtin_ia32_vpcomgtd (v4si, v4si); -v2di __builtin_ia32_vpcomgtq (v2di, v2di); -v16qi __builtin_ia32_vpcomgtub (v16qi, v16qi); -v4si __builtin_ia32_vpcomgtud (v4si, v4si); -v2di __builtin_ia32_vpcomgtuq (v2di, v2di); -v8hi __builtin_ia32_vpcomgtuw (v8hi, v8hi); -v8hi __builtin_ia32_vpcomgtw (v8hi, v8hi); -v16qi __builtin_ia32_vpcomleb (v16qi, v16qi); -v4si __builtin_ia32_vpcomled (v4si, v4si); -v2di __builtin_ia32_vpcomleq (v2di, v2di); -v16qi __builtin_ia32_vpcomleub (v16qi, v16qi); -v4si __builtin_ia32_vpcomleud (v4si, v4si); -v2di 
__builtin_ia32_vpcomleuq (v2di, v2di); -v8hi __builtin_ia32_vpcomleuw (v8hi, v8hi); -v8hi __builtin_ia32_vpcomlew (v8hi, v8hi); -v16qi __builtin_ia32_vpcomltb (v16qi, v16qi); -v4si __builtin_ia32_vpcomltd (v4si, v4si); -v2di __builtin_ia32_vpcomltq (v2di, v2di); -v16qi __builtin_ia32_vpcomltub (v16qi, v16qi); -v4si __builtin_ia32_vpcomltud (v4si, v4si); -v2di __builtin_ia32_vpcomltuq (v2di, v2di); -v8hi __builtin_ia32_vpcomltuw (v8hi, v8hi); -v8hi __builtin_ia32_vpcomltw (v8hi, v8hi); -v16qi __builtin_ia32_vpcomneb (v16qi, v16qi); -v4si __builtin_ia32_vpcomned (v4si, v4si); -v2di __builtin_ia32_vpcomneq (v2di, v2di); -v16qi __builtin_ia32_vpcomneub (v16qi, v16qi); -v4si __builtin_ia32_vpcomneud (v4si, v4si); -v2di __builtin_ia32_vpcomneuq (v2di, v2di); -v8hi __builtin_ia32_vpcomneuw (v8hi, v8hi); -v8hi __builtin_ia32_vpcomnew (v8hi, v8hi); -v16qi __builtin_ia32_vpcomtrueb (v16qi, v16qi); -v4si __builtin_ia32_vpcomtrued (v4si, v4si); -v2di __builtin_ia32_vpcomtrueq (v2di, v2di); -v16qi __builtin_ia32_vpcomtrueub (v16qi, v16qi); -v4si __builtin_ia32_vpcomtrueud (v4si, v4si); -v2di __builtin_ia32_vpcomtrueuq (v2di, v2di); -v8hi __builtin_ia32_vpcomtrueuw (v8hi, v8hi); -v8hi __builtin_ia32_vpcomtruew (v8hi, v8hi); -v4si __builtin_ia32_vphaddbd (v16qi); -v2di __builtin_ia32_vphaddbq (v16qi); -v8hi __builtin_ia32_vphaddbw (v16qi); -v2di __builtin_ia32_vphadddq (v4si); -v4si __builtin_ia32_vphaddubd (v16qi); -v2di __builtin_ia32_vphaddubq (v16qi); -v8hi __builtin_ia32_vphaddubw (v16qi); -v2di __builtin_ia32_vphaddudq (v4si); -v4si __builtin_ia32_vphadduwd (v8hi); -v2di __builtin_ia32_vphadduwq (v8hi); -v4si __builtin_ia32_vphaddwd (v8hi); -v2di __builtin_ia32_vphaddwq (v8hi); -v8hi __builtin_ia32_vphsubbw (v16qi); -v2di __builtin_ia32_vphsubdq (v4si); -v4si __builtin_ia32_vphsubwd (v8hi); -v4si __builtin_ia32_vpmacsdd (v4si, v4si, v4si); -v2di __builtin_ia32_vpmacsdqh (v4si, v4si, v2di); -v2di __builtin_ia32_vpmacsdql (v4si, v4si, v2di); -v4si __builtin_ia32_vpmacssdd (v4si, v4si, v4si); -v2di __builtin_ia32_vpmacssdqh (v4si, v4si, v2di); -v2di __builtin_ia32_vpmacssdql (v4si, v4si, v2di); -v4si __builtin_ia32_vpmacsswd (v8hi, v8hi, v4si); -v8hi __builtin_ia32_vpmacssww (v8hi, v8hi, v8hi); -v4si __builtin_ia32_vpmacswd (v8hi, v8hi, v4si); -v8hi __builtin_ia32_vpmacsww (v8hi, v8hi, v8hi); -v4si __builtin_ia32_vpmadcsswd (v8hi, v8hi, v4si); -v4si __builtin_ia32_vpmadcswd (v8hi, v8hi, v4si); -v16qi __builtin_ia32_vpperm (v16qi, v16qi, v16qi); -v16qi __builtin_ia32_vprotb (v16qi, v16qi); -v4si __builtin_ia32_vprotd (v4si, v4si); -v2di __builtin_ia32_vprotq (v2di, v2di); -v8hi __builtin_ia32_vprotw (v8hi, v8hi); -v16qi __builtin_ia32_vpshab (v16qi, v16qi); -v4si __builtin_ia32_vpshad (v4si, v4si); -v2di __builtin_ia32_vpshaq (v2di, v2di); -v8hi __builtin_ia32_vpshaw (v8hi, v8hi); -v16qi __builtin_ia32_vpshlb (v16qi, v16qi); -v4si __builtin_ia32_vpshld (v4si, v4si); -v2di __builtin_ia32_vpshlq (v2di, v2di); -v8hi __builtin_ia32_vpshlw (v8hi, v8hi); -@end smallexample +v4hi __builtin_vis_fexpand (v4qi); -The following built-in functions are available when @option{-mfma4} is used. -All of them generate the machine instruction that is part of the name. 
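+/* fexpand above widens each unsigned byte to a 16-bit fixed-point
+   value; the fmul8x16 family below multiplies 8-bit pixel data by
+   16-bit coefficients (a summary of the VIS semantics, not text
+   from the GCC sources).  */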
+v4hi __builtin_vis_fmul8x16 (v4qi, v4hi); +v4hi __builtin_vis_fmul8x16au (v4qi, v2hi); +v4hi __builtin_vis_fmul8x16al (v4qi, v2hi); +v4hi __builtin_vis_fmul8sux16 (v8qi, v4hi); +v4hi __builtin_vis_fmul8ulx16 (v8qi, v4hi); +v2si __builtin_vis_fmuld8sux16 (v4qi, v2hi); +v2si __builtin_vis_fmuld8ulx16 (v4qi, v2hi); -@smallexample -v2df __builtin_ia32_vfmaddpd (v2df, v2df, v2df); -v4sf __builtin_ia32_vfmaddps (v4sf, v4sf, v4sf); -v2df __builtin_ia32_vfmaddsd (v2df, v2df, v2df); -v4sf __builtin_ia32_vfmaddss (v4sf, v4sf, v4sf); -v2df __builtin_ia32_vfmsubpd (v2df, v2df, v2df); -v4sf __builtin_ia32_vfmsubps (v4sf, v4sf, v4sf); -v2df __builtin_ia32_vfmsubsd (v2df, v2df, v2df); -v4sf __builtin_ia32_vfmsubss (v4sf, v4sf, v4sf); -v2df __builtin_ia32_vfnmaddpd (v2df, v2df, v2df); -v4sf __builtin_ia32_vfnmaddps (v4sf, v4sf, v4sf); -v2df __builtin_ia32_vfnmaddsd (v2df, v2df, v2df); -v4sf __builtin_ia32_vfnmaddss (v4sf, v4sf, v4sf); -v2df __builtin_ia32_vfnmsubpd (v2df, v2df, v2df); -v4sf __builtin_ia32_vfnmsubps (v4sf, v4sf, v4sf); -v2df __builtin_ia32_vfnmsubsd (v2df, v2df, v2df); -v4sf __builtin_ia32_vfnmsubss (v4sf, v4sf, v4sf); -v2df __builtin_ia32_vfmaddsubpd (v2df, v2df, v2df); -v4sf __builtin_ia32_vfmaddsubps (v4sf, v4sf, v4sf); -v2df __builtin_ia32_vfmsubaddpd (v2df, v2df, v2df); -v4sf __builtin_ia32_vfmsubaddps (v4sf, v4sf, v4sf); -v4df __builtin_ia32_vfmaddpd256 (v4df, v4df, v4df); -v8sf __builtin_ia32_vfmaddps256 (v8sf, v8sf, v8sf); -v4df __builtin_ia32_vfmsubpd256 (v4df, v4df, v4df); -v8sf __builtin_ia32_vfmsubps256 (v8sf, v8sf, v8sf); -v4df __builtin_ia32_vfnmaddpd256 (v4df, v4df, v4df); -v8sf __builtin_ia32_vfnmaddps256 (v8sf, v8sf, v8sf); -v4df __builtin_ia32_vfnmsubpd256 (v4df, v4df, v4df); -v8sf __builtin_ia32_vfnmsubps256 (v8sf, v8sf, v8sf); -v4df __builtin_ia32_vfmaddsubpd256 (v4df, v4df, v4df); -v8sf __builtin_ia32_vfmaddsubps256 (v8sf, v8sf, v8sf); -v4df __builtin_ia32_vfmsubaddpd256 (v4df, v4df, v4df); -v8sf __builtin_ia32_vfmsubaddps256 (v8sf, v8sf, v8sf); +v4qi __builtin_vis_fpack16 (v4hi); +v8qi __builtin_vis_fpack32 (v2si, v8qi); +v2hi __builtin_vis_fpackfix (v2si); +v8qi __builtin_vis_fpmerge (v4qi, v4qi); -@end smallexample +int64_t __builtin_vis_pdist (v8qi, v8qi, int64_t); -The following built-in functions are available when @option{-mlwp} is used. 
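+/* pdist accumulates the sum of absolute differences of eight byte
+   pairs, e.g. acc = __builtin_vis_pdist (a, b, acc);.  The edge
+   functions below compute partial-store masks for the ends of an
+   unaligned region (a usage note, not text from the GCC sources).  */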
+long __builtin_vis_edge8 (void *, void *);
+long __builtin_vis_edge8l (void *, void *);
+long __builtin_vis_edge16 (void *, void *);
+long __builtin_vis_edge16l (void *, void *);
+long __builtin_vis_edge32 (void *, void *);
+long __builtin_vis_edge32l (void *, void *);
-@smallexample
-void __builtin_ia32_llwpcb16 (void *);
-void __builtin_ia32_llwpcb32 (void *);
-void __builtin_ia32_llwpcb64 (void *);
-void * __builtin_ia32_slwpcb16 (void);
-void * __builtin_ia32_slwpcb32 (void);
-void * __builtin_ia32_slwpcb64 (void);
-void __builtin_ia32_lwpval16 (unsigned short, unsigned int, unsigned short);
-void __builtin_ia32_lwpval32 (unsigned int, unsigned int, unsigned int);
-void __builtin_ia32_lwpval64 (unsigned __int64, unsigned int, unsigned int);
-unsigned char __builtin_ia32_lwpins16 (unsigned short, unsigned int, unsigned short);
-unsigned char __builtin_ia32_lwpins32 (unsigned int, unsigned int, unsigned int);
-unsigned char __builtin_ia32_lwpins64 (unsigned __int64, unsigned int, unsigned int);
-@end smallexample
+long __builtin_vis_fcmple16 (v4hi, v4hi);
+long __builtin_vis_fcmple32 (v2si, v2si);
+long __builtin_vis_fcmpne16 (v4hi, v4hi);
+long __builtin_vis_fcmpne32 (v2si, v2si);
+long __builtin_vis_fcmpgt16 (v4hi, v4hi);
+long __builtin_vis_fcmpgt32 (v2si, v2si);
+long __builtin_vis_fcmpeq16 (v4hi, v4hi);
+long __builtin_vis_fcmpeq32 (v2si, v2si);
+
+v4hi __builtin_vis_fpadd16 (v4hi, v4hi);
+v2hi __builtin_vis_fpadd16s (v2hi, v2hi);
+v2si __builtin_vis_fpadd32 (v2si, v2si);
+v1si __builtin_vis_fpadd32s (v1si, v1si);
+v4hi __builtin_vis_fpsub16 (v4hi, v4hi);
+v2hi __builtin_vis_fpsub16s (v2hi, v2hi);
+v2si __builtin_vis_fpsub32 (v2si, v2si);
+v1si __builtin_vis_fpsub32s (v1si, v1si);
-The following built-in functions are available when @option{-mbmi} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-unsigned int __builtin_ia32_bextr_u32 (unsigned int, unsigned int);
-unsigned long long __builtin_ia32_bextr_u64 (unsigned long long, unsigned long long);
+long __builtin_vis_array8 (long, long);
+long __builtin_vis_array16 (long, long);
+long __builtin_vis_array32 (long, long);
 @end smallexample
-The following built-in functions are available when @option{-mbmi2} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-unsigned int _bzhi_u32 (unsigned int, unsigned int);
-unsigned int _pdep_u32 (unsigned int, unsigned int);
-unsigned int _pext_u32 (unsigned int, unsigned int);
-unsigned long long _bzhi_u64 (unsigned long long, unsigned long long);
-unsigned long long _pdep_u64 (unsigned long long, unsigned long long);
-unsigned long long _pext_u64 (unsigned long long, unsigned long long);
-@end smallexample
+When you use the @option{-mvis2} switch, the VIS version 2.0 built-in
+functions also become available:
-The following built-in functions are available when @option{-mlzcnt} is used.
-All of them generate the machine instruction that is part of the name.
 @smallexample
-unsigned short __builtin_ia32_lzcnt_u16 (unsigned short);
-unsigned int __builtin_ia32_lzcnt_u32 (unsigned int);
-unsigned long long __builtin_ia32_lzcnt_u64 (unsigned long long);
-@end smallexample
+long __builtin_vis_bmask (long, long);
+int64_t __builtin_vis_bshuffledi (int64_t, int64_t);
+v2si __builtin_vis_bshufflev2si (v2si, v2si);
+v4hi __builtin_vis_bshufflev4hi (v4hi, v4hi);
+v8qi __builtin_vis_bshufflev8qi (v8qi, v8qi);
-The following built-in functions are available when @option{-mfxsr} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-void __builtin_ia32_fxsave (void *);
-void __builtin_ia32_fxrstor (void *);
-void __builtin_ia32_fxsave64 (void *);
-void __builtin_ia32_fxrstor64 (void *);
+long __builtin_vis_edge8n (void *, void *);
+long __builtin_vis_edge8ln (void *, void *);
+long __builtin_vis_edge16n (void *, void *);
+long __builtin_vis_edge16ln (void *, void *);
+long __builtin_vis_edge32n (void *, void *);
+long __builtin_vis_edge32ln (void *, void *);
 @end smallexample
-The following built-in functions are available when @option{-mxsave} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-void __builtin_ia32_xsave (void *, long long);
-void __builtin_ia32_xrstor (void *, long long);
-void __builtin_ia32_xsave64 (void *, long long);
-void __builtin_ia32_xrstor64 (void *, long long);
-@end smallexample
+When you use the @option{-mvis3} switch, the VIS version 3.0 built-in
+functions also become available:
-The following built-in functions are available when @option{-mxsaveopt} is used.
-All of them generate the machine instruction that is part of the name.
 @smallexample
-void __builtin_ia32_xsaveopt (void *, long long);
-void __builtin_ia32_xsaveopt64 (void *, long long);
-@end smallexample
+void __builtin_vis_cmask8 (long);
+void __builtin_vis_cmask16 (long);
+void __builtin_vis_cmask32 (long);
+
-The following built-in functions are available when @option{-mtbm} is used.
-Both of them generate the immediate form of the bextr machine instruction.
-@smallexample
-unsigned int __builtin_ia32_bextri_u32 (unsigned int,
-                                        const unsigned int);
-unsigned long long __builtin_ia32_bextri_u64 (unsigned long long,
-                                              const unsigned long long);
-@end smallexample
+v4hi __builtin_vis_fchksm16 (v4hi, v4hi);
+v4hi __builtin_vis_fsll16 (v4hi, v4hi);
+v4hi __builtin_vis_fslas16 (v4hi, v4hi);
+v4hi __builtin_vis_fsrl16 (v4hi, v4hi);
+v4hi __builtin_vis_fsra16 (v4hi, v4hi);
+v2si __builtin_vis_fsll32 (v2si, v2si);
+v2si __builtin_vis_fslas32 (v2si, v2si);
+v2si __builtin_vis_fsrl32 (v2si, v2si);
+v2si __builtin_vis_fsra32 (v2si, v2si);
-The following built-in functions are available when @option{-m3dnow} is used.
-All of them generate the machine instruction that is part of the name.
+long __builtin_vis_pdistn (v8qi, v8qi);
-@smallexample
-void __builtin_ia32_femms (void);
-v8qi __builtin_ia32_pavgusb (v8qi, v8qi);
-v2si __builtin_ia32_pf2id (v2sf);
-v2sf __builtin_ia32_pfacc (v2sf, v2sf);
-v2sf __builtin_ia32_pfadd (v2sf, v2sf);
-v2si __builtin_ia32_pfcmpeq (v2sf, v2sf);
-v2si __builtin_ia32_pfcmpge (v2sf, v2sf);
-v2si __builtin_ia32_pfcmpgt (v2sf, v2sf);
-v2sf __builtin_ia32_pfmax (v2sf, v2sf);
-v2sf __builtin_ia32_pfmin (v2sf, v2sf);
-v2sf __builtin_ia32_pfmul (v2sf, v2sf);
-v2sf __builtin_ia32_pfrcp (v2sf);
-v2sf __builtin_ia32_pfrcpit1 (v2sf, v2sf);
-v2sf __builtin_ia32_pfrcpit2 (v2sf, v2sf);
-v2sf __builtin_ia32_pfrsqrt (v2sf);
-v2sf __builtin_ia32_pfsub (v2sf, v2sf);
-v2sf __builtin_ia32_pfsubr (v2sf, v2sf);
-v2sf __builtin_ia32_pi2fd (v2si);
-v4hi __builtin_ia32_pmulhrw (v4hi, v4hi);
-@end smallexample
+v4hi __builtin_vis_fmean16 (v4hi, v4hi);
-The following built-in functions are available when @option{-m3dnowa} is used.
-All of them generate the machine instruction that is part of the name.
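+/* fpadd64 and fpsub64 below operate on a full 64-bit register rather
+   than on packed elements, unlike the fpadd16/fpadd32 variants (an
+   explanatory note, not text from the GCC sources).  */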
+int64_t __builtin_vis_fpadd64 (int64_t, int64_t);
+int64_t __builtin_vis_fpsub64 (int64_t, int64_t);
-@smallexample
-v2si __builtin_ia32_pf2iw (v2sf);
-v2sf __builtin_ia32_pfnacc (v2sf, v2sf);
-v2sf __builtin_ia32_pfpnacc (v2sf, v2sf);
-v2sf __builtin_ia32_pi2fw (v2si);
-v2sf __builtin_ia32_pswapdsf (v2sf);
-v2si __builtin_ia32_pswapdsi (v2si);
-@end smallexample
+v4hi __builtin_vis_fpadds16 (v4hi, v4hi);
+v2hi __builtin_vis_fpadds16s (v2hi, v2hi);
+v4hi __builtin_vis_fpsubs16 (v4hi, v4hi);
+v2hi __builtin_vis_fpsubs16s (v2hi, v2hi);
+v2si __builtin_vis_fpadds32 (v2si, v2si);
+v1si __builtin_vis_fpadds32s (v1si, v1si);
+v2si __builtin_vis_fpsubs32 (v2si, v2si);
+v1si __builtin_vis_fpsubs32s (v1si, v1si);
-The following built-in functions are available when @option{-mrtm} is used.
-They are used for restricted transactional memory. These are the internal
-low level functions. Normally the functions in
-@ref{x86 transactional memory intrinsics} should be used instead.
+long __builtin_vis_fucmple8 (v8qi, v8qi);
+long __builtin_vis_fucmpne8 (v8qi, v8qi);
+long __builtin_vis_fucmpgt8 (v8qi, v8qi);
+long __builtin_vis_fucmpeq8 (v8qi, v8qi);
-@smallexample
-int __builtin_ia32_xbegin ();
-void __builtin_ia32_xend ();
-void __builtin_ia32_xabort (status);
-int __builtin_ia32_xtest ();
-@end smallexample
+float __builtin_vis_fhadds (float, float);
+double __builtin_vis_fhaddd (double, double);
+float __builtin_vis_fhsubs (float, float);
+double __builtin_vis_fhsubd (double, double);
+float __builtin_vis_fnhadds (float, float);
+double __builtin_vis_fnhaddd (double, double);
-The following built-in functions are available when @option{-mmwaitx} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-void __builtin_ia32_monitorx (void *, unsigned int, unsigned int);
-void __builtin_ia32_mwaitx (unsigned int, unsigned int, unsigned int);
+int64_t __builtin_vis_umulxhi (int64_t, int64_t);
+int64_t __builtin_vis_xmulx (int64_t, int64_t);
+int64_t __builtin_vis_xmulxhi (int64_t, int64_t);
 @end smallexample
-The following built-in functions are available when @option{-mclzero} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-void __builtin_ia32_clzero (void *);
-@end smallexample
+When you use the @option{-mvis4} switch, the VIS version 4.0 built-in
+functions also become available:
-The following built-in functions are available when @option{-mpku} is used.
-They generate reads and writes to PKRU.
 @smallexample
-void __builtin_ia32_wrpkru (unsigned int);
-unsigned int __builtin_ia32_rdpkru ();
-@end smallexample
+v8qi __builtin_vis_fpadd8 (v8qi, v8qi);
+v8qi __builtin_vis_fpadds8 (v8qi, v8qi);
+v8qi __builtin_vis_fpaddus8 (v8qi, v8qi);
+v4hi __builtin_vis_fpaddus16 (v4hi, v4hi);
-The following built-in functions are available when the
-@option{-mshstk} option is used. They support shadow stack
-machine instructions from Intel Control-flow Enforcement Technology (CET).
-Each built-in function generates the machine instruction that is part
-of the function's name. These are the internal low-level functions.
-Normally the functions in @ref{x86 control-flow protection intrinsics}
-should be used instead.
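+/* In these VIS 4.0 names a trailing `s' denotes signed saturation
+   and `us' unsigned saturation, as opposed to the modular fpadd8 and
+   fpsub8 forms (a naming note, not text from the GCC sources).  */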
+v8qi __builtin_vis_fpsub8 (v8qi, v8qi); +v8qi __builtin_vis_fpsubs8 (v8qi, v8qi); +v8qi __builtin_vis_fpsubus8 (v8qi, v8qi); +v4hi __builtin_vis_fpsubus16 (v4hi, v4hi); -@smallexample -unsigned int __builtin_ia32_rdsspd (void); -unsigned long long __builtin_ia32_rdsspq (void); -void __builtin_ia32_incsspd (unsigned int); -void __builtin_ia32_incsspq (unsigned long long); -void __builtin_ia32_saveprevssp(void); -void __builtin_ia32_rstorssp(void *); -void __builtin_ia32_wrssd(unsigned int, void *); -void __builtin_ia32_wrssq(unsigned long long, void *); -void __builtin_ia32_wrussd(unsigned int, void *); -void __builtin_ia32_wrussq(unsigned long long, void *); -void __builtin_ia32_setssbsy(void); -void __builtin_ia32_clrssbsy(void *); -@end smallexample +long __builtin_vis_fpcmple8 (v8qi, v8qi); +long __builtin_vis_fpcmpgt8 (v8qi, v8qi); +long __builtin_vis_fpcmpule16 (v4hi, v4hi); +long __builtin_vis_fpcmpugt16 (v4hi, v4hi); +long __builtin_vis_fpcmpule32 (v2si, v2si); +long __builtin_vis_fpcmpugt32 (v2si, v2si); -@node x86 transactional memory intrinsics -@subsection x86 Transactional Memory Intrinsics +v8qi __builtin_vis_fpmax8 (v8qi, v8qi); +v4hi __builtin_vis_fpmax16 (v4hi, v4hi); +v2si __builtin_vis_fpmax32 (v2si, v2si); -These hardware transactional memory intrinsics for x86 allow you to use -memory transactions with RTM (Restricted Transactional Memory). -This support is enabled with the @option{-mrtm} option. -For using HLE (Hardware Lock Elision) see -@ref{x86 specific memory model extensions for transactional memory} instead. +v8qi __builtin_vis_fpmaxu8 (v8qi, v8qi); +v4hi __builtin_vis_fpmaxu16 (v4hi, v4hi); +v2si __builtin_vis_fpmaxu32 (v2si, v2si); -A memory transaction commits all changes to memory in an atomic way, -as visible to other threads. If the transaction fails it is rolled back -and all side effects discarded. +v8qi __builtin_vis_fpmin8 (v8qi, v8qi); +v4hi __builtin_vis_fpmin16 (v4hi, v4hi); +v2si __builtin_vis_fpmin32 (v2si, v2si); -Generally there is no guarantee that a memory transaction ever succeeds -and suitable fallback code always needs to be supplied. +v8qi __builtin_vis_fpminu8 (v8qi, v8qi); +v4hi __builtin_vis_fpminu16 (v4hi, v4hi); +v2si __builtin_vis_fpminu32 (v2si, v2si); +@end smallexample -@deftypefn {RTM Function} {unsigned} _xbegin () -Start a RTM (Restricted Transactional Memory) transaction. -Returns @code{_XBEGIN_STARTED} when the transaction -started successfully (note this is not 0, so the constant has to be -explicitly tested). +When you use the @option{-mvis4b} switch, the VIS version 4.0B +built-in functions also become available: -If the transaction aborts, all side effects -are undone and an abort code encoded as a bit mask is returned. -The following macros are defined: +@smallexample +v8qi __builtin_vis_dictunpack8 (double, int); +v4hi __builtin_vis_dictunpack16 (double, int); +v2si __builtin_vis_dictunpack32 (double, int); -@defmac{_XABORT_EXPLICIT} -Transaction was explicitly aborted with @code{_xabort}. The parameter passed -to @code{_xabort} is available with @code{_XABORT_CODE(status)}. 
-@end defmac +long __builtin_vis_fpcmple8shl (v8qi, v8qi, int); +long __builtin_vis_fpcmpgt8shl (v8qi, v8qi, int); +long __builtin_vis_fpcmpeq8shl (v8qi, v8qi, int); +long __builtin_vis_fpcmpne8shl (v8qi, v8qi, int); + +long __builtin_vis_fpcmple16shl (v4hi, v4hi, int); +long __builtin_vis_fpcmpgt16shl (v4hi, v4hi, int); +long __builtin_vis_fpcmpeq16shl (v4hi, v4hi, int); +long __builtin_vis_fpcmpne16shl (v4hi, v4hi, int); + +long __builtin_vis_fpcmple32shl (v2si, v2si, int); +long __builtin_vis_fpcmpgt32shl (v2si, v2si, int); +long __builtin_vis_fpcmpeq32shl (v2si, v2si, int); +long __builtin_vis_fpcmpne32shl (v2si, v2si, int); -@defmac{_XABORT_RETRY} -Transaction retry is possible. -@end defmac +long __builtin_vis_fpcmpule8shl (v8qi, v8qi, int); +long __builtin_vis_fpcmpugt8shl (v8qi, v8qi, int); +long __builtin_vis_fpcmpule16shl (v4hi, v4hi, int); +long __builtin_vis_fpcmpugt16shl (v4hi, v4hi, int); +long __builtin_vis_fpcmpule32shl (v2si, v2si, int); +long __builtin_vis_fpcmpugt32shl (v2si, v2si, int); -@defmac{_XABORT_CONFLICT} -Transaction abort due to a memory conflict with another thread. -@end defmac +long __builtin_vis_fpcmpde8shl (v8qi, v8qi, int); +long __builtin_vis_fpcmpde16shl (v4hi, v4hi, int); +long __builtin_vis_fpcmpde32shl (v2si, v2si, int); -@defmac{_XABORT_CAPACITY} -Transaction abort due to the transaction using too much memory. -@end defmac +long __builtin_vis_fpcmpur8shl (v8qi, v8qi, int); +long __builtin_vis_fpcmpur16shl (v4hi, v4hi, int); +long __builtin_vis_fpcmpur32shl (v2si, v2si, int); +@end smallexample -@defmac{_XABORT_DEBUG} -Transaction abort due to a debug trap. -@end defmac +@node TI C6X Built-in Functions +@subsection TI C6X Built-in Functions -@defmac{_XABORT_NESTED} -Transaction abort in an inner nested transaction. -@end defmac +GCC provides intrinsics to access certain instructions of the TI C6X +processors. These intrinsics, listed below, are available after +inclusion of the @code{c6x_intrinsics.h} header file. They map directly +to C6X instructions. -There is no guarantee -any transaction ever succeeds, so there always needs to be a valid -fallback path. -@end deftypefn +@smallexample +int _sadd (int, int); +int _ssub (int, int); +int _sadd2 (int, int); +int _ssub2 (int, int); +long long _mpy2 (int, int); +long long _smpy2 (int, int); +int _add4 (int, int); +int _sub4 (int, int); +int _saddu4 (int, int); -@deftypefn {RTM Function} {void} _xend () -Commit the current transaction. When no transaction is active this faults. -All memory side effects of the transaction become visible -to other threads in an atomic manner. -@end deftypefn +int _smpy (int, int); +int _smpyh (int, int); +int _smpyhl (int, int); +int _smpylh (int, int); -@deftypefn {RTM Function} {int} _xtest () -Return a nonzero value if a transaction is currently active, otherwise 0. -@end deftypefn +int _sshl (int, int); +int _subc (int, int); -@deftypefn {RTM Function} {void} _xabort (status) -Abort the current transaction. When no transaction is active this is a no-op. -The @var{status} is an 8-bit constant; its value is encoded in the return -value from @code{_xbegin}. 
-@end deftypefn
+int _avg2 (int, int);
+int _avgu4 (int, int);
-Here is an example showing handling for @code{_XABORT_RETRY}
-and a fallback path for other failures:
+int _clrr (int, int);
+int _extr (int, int);
+int _extru (int, int);
+int _abs (int);
+int _abs2 (int);
 @end smallexample
-@smallexample
-#include <immintrin.h>
+@node x86 Built-in Functions
+@subsection x86 Built-in Functions
-int n_tries, max_tries;
-unsigned status = _XABORT_EXPLICIT;
-...
+These built-in functions are available for the x86-32 and x86-64 family
+of computers, depending on the command-line switches used.
-for (n_tries = 0; n_tries < max_tries; n_tries++)
-  @{
-    status = _xbegin ();
-    if (status == _XBEGIN_STARTED || !(status & _XABORT_RETRY))
-      break;
-  @}
-if (status == _XBEGIN_STARTED)
-  @{
-    ... transaction code...
-    _xend ();
-  @}
-else
-  @{
-    ... non-transactional fallback path...
-  @}
-@end smallexample
+If you specify command-line switches such as @option{-msse},
+the compiler could use the extended instruction sets even if the built-ins
+are not used explicitly in the program. For this reason, applications
+that perform run-time CPU detection must compile separate files for each
+supported architecture, using the appropriate flags. In particular,
+the file containing the CPU detection code should be compiled without
+these options.
-@noindent
-Note that, in most cases, the transactional and non-transactional code
-must synchronize together to ensure consistency.
+The following machine modes are available for use with MMX built-in functions
+(@pxref{Vector Extensions}): @code{V2SI} for a vector of two 32-bit integers,
+@code{V4HI} for a vector of four 16-bit integers, and @code{V8QI} for a
+vector of eight 8-bit integers. Some of the built-in functions operate on
+MMX registers as a whole 64-bit entity; these use @code{V1DI} as their mode.
-@node x86 control-flow protection intrinsics
-@subsection x86 Control-Flow Protection Intrinsics
+If 3DNow!@: extensions are enabled, @code{V2SF} is used as a mode for a vector
+of two 32-bit floating-point values.
-@deftypefn {CET Function} {ret_type} _get_ssp (void)
-Get the current value of the shadow stack pointer if shadow stack support
-from Intel CET is enabled in the hardware or @code{0} otherwise.
-The @code{ret_type} is @code{unsigned long long} for 64-bit targets
-and @code{unsigned int} for 32-bit targets.
-@end deftypefn
+If SSE extensions are enabled, @code{V4SF} is used for a vector of four 32-bit
+floating-point values. Some instructions use a vector of four 32-bit
+integers; these use @code{V4SI}. Finally, some instructions operate on an
+entire vector register, interpreting it as a 128-bit integer; these use mode
+@code{TI}.
-@deftypefn {CET Function} void _inc_ssp (unsigned int)
-Increment the current shadow stack pointer by the size specified by the
-function argument. The argument is masked to a byte value for security
-reasons, so to increment by more than 255 bytes you must call the function
-multiple times.
-@end deftypefn
+The x86-32 and x86-64 family of processors use additional built-in
+functions for efficient use of @code{TF} (@code{__float128}) 128-bit
+floating point and @code{TC} 128-bit complex floating-point values.
-The shadow stack unwind code looks like:
+The following floating-point built-in functions are always available:
-@smallexample
-#include <immintrin.h>
+@defbuiltin{__float128 __builtin_fabsq (__float128 @var{x})}
+Computes the absolute value of @var{x}.
+@enddefbuiltin
-/* Unwind the shadow stack for EH.
*/ -#define _Unwind_Frames_Extra(x) \ - do \ - @{ \ - _Unwind_Word ssp = _get_ssp (); \ - if (ssp != 0) \ - @{ \ - _Unwind_Word tmp = (x); \ - while (tmp > 255) \ - @{ \ - _inc_ssp (tmp); \ - tmp -= 255; \ - @} \ - _inc_ssp (tmp); \ - @} \ - @} \ - while (0) -@end smallexample +@defbuiltin{__float128 __builtin_copysignq (__float128 @var{x}, @ + __float128 @var{y})} +Copies the sign of @var{y} into @var{x} and returns the new value of +@var{x}. +@enddefbuiltin -@noindent -This code runs unconditionally on all 64-bit processors. For 32-bit -processors the code runs on those that support multi-byte NOP instructions. +@defbuiltin{__float128 __builtin_infq (void)} +Similar to @code{__builtin_inf}, except the return type is @code{__float128}. +@enddefbuiltin -@node Target Format Checks -@section Format Checks Specific to Particular Target Machines +@defbuiltin{__float128 __builtin_huge_valq (void)} +Similar to @code{__builtin_huge_val}, except the return type is @code{__float128}. +@enddefbuiltin -For some target machines, GCC supports additional options to the -format attribute -(@pxref{Function Attributes,,Declaring Attributes of Functions}). +@defbuiltin{__float128 __builtin_nanq (void)} +Similar to @code{__builtin_nan}, except the return type is @code{__float128}. +@enddefbuiltin -@menu -* Solaris Format Checks:: -* Darwin Format Checks:: -@end menu +@defbuiltin{__float128 __builtin_nansq (void)} +Similar to @code{__builtin_nans}, except the return type is @code{__float128}. +@enddefbuiltin -@node Solaris Format Checks -@subsection Solaris Format Checks +The following built-in function is always available. -Solaris targets support the @code{cmn_err} (or @code{__cmn_err__}) format -check. @code{cmn_err} accepts a subset of the standard @code{printf} -conversions, and the two-argument @code{%b} conversion for displaying -bit-fields. See the Solaris man page for @code{cmn_err} for more information. +@defbuiltin{void __builtin_ia32_pause (void)} +Generates the @code{pause} machine instruction with a compiler memory +barrier. +@enddefbuiltin -@node Darwin Format Checks -@subsection Darwin Format Checks +The following built-in functions are always available and can be used to +check the target platform type. -In addition to the full set of format archetypes (attribute format style -arguments such as @code{printf}, @code{scanf}, @code{strftime}, and -@code{strfmon}), Darwin targets also support the @code{CFString} (or -@code{__CFString__}) archetype in the @code{format} attribute. -Declarations with this archetype are parsed for correct syntax -and argument types. However, parsing of the format string itself and -validating arguments against it in calls to such functions is currently -not performed. +@defbuiltin{void __builtin_cpu_init (void)} +This function runs the CPU detection code to check the type of CPU and the +features supported. This built-in function needs to be invoked along with the built-in functions +to check CPU type and features, @code{__builtin_cpu_is} and +@code{__builtin_cpu_supports}, only when used in a function that is +executed before any constructors are called. The CPU detection code is +automatically executed in a very high priority constructor. -Additionally, @code{CFStringRefs} (defined by the @code{CoreFoundation} headers) may -also be used as format arguments. Note that the relevant headers are only likely to be -available on Darwin (OSX) installations. 
On such installations, the XCode and system -documentation provide descriptions of @code{CFString}, @code{CFStringRefs} and -associated functions. +For example, this function has to be used in @code{ifunc} resolvers that +check for CPU type using the built-in functions @code{__builtin_cpu_is} +and @code{__builtin_cpu_supports}, or in constructors on targets that +don't support constructor priority. +@smallexample -@node Pragmas -@section Pragmas Accepted by GCC -@cindex pragmas -@cindex @code{#pragma} +static void (*resolve_memcpy (void)) (void) +@{ + // ifunc resolvers fire before constructors, explicitly call the init + // function. + __builtin_cpu_init (); + if (__builtin_cpu_supports ("ssse3")) + return ssse3_memcpy; // super fast memcpy with ssse3 instructions. + else + return default_memcpy; +@} -GCC supports several types of pragmas, primarily in order to compile -code originally written for other compilers. Note that in general -we do not recommend the use of pragmas; @xref{Function Attributes}, -for further explanation. +void *memcpy (void *, const void *, size_t) + __attribute__ ((ifunc ("resolve_memcpy"))); +@end smallexample -The GNU C preprocessor recognizes several pragmas in addition to the -compiler pragmas documented here. Refer to the CPP manual for more -information. +@enddefbuiltin -GCC additionally recognizes OpenMP pragmas when the @option{-fopenmp} -option is specified, and OpenACC pragmas when the @option{-fopenacc} -option is specified. @xref{OpenMP}, and @ref{OpenACC}. +@defbuiltin{int __builtin_cpu_is (const char *@var{cpuname})} +This function returns a positive integer if the run-time CPU +is of type @var{cpuname} +and returns @code{0} otherwise. The following CPU names can be detected: -@menu -* AArch64 Pragmas:: -* ARM Pragmas:: -* LoongArch Pragmas:: -* M32C Pragmas:: -* PRU Pragmas:: -* RS/6000 and PowerPC Pragmas:: -* S/390 Pragmas:: -* Darwin Pragmas:: -* Solaris Pragmas:: -* Symbol-Renaming Pragmas:: -* Structure-Layout Pragmas:: -* Weak Pragmas:: -* Diagnostic Pragmas:: -* Visibility Pragmas:: -* Push/Pop Macro Pragmas:: -* Function Specific Option Pragmas:: -* Loop-Specific Pragmas:: -@end menu +@table @samp +@item amd +AMD CPU. -@node AArch64 Pragmas -@subsection AArch64 Pragmas +@item intel +Intel CPU. -The pragmas defined by the AArch64 target correspond to the AArch64 -target function attributes. They can be specified as below: -@smallexample -#pragma GCC target("string") -@end smallexample +@item atom +Intel Atom CPU. -where @code{@var{string}} can be any string accepted as an AArch64 target -attribute. @xref{AArch64 Function Attributes}, for more details -on the permissible values of @code{string}. +@item slm +Intel Silvermont CPU. -@node ARM Pragmas -@subsection ARM Pragmas +@item core2 +Intel Core 2 CPU. -The ARM target defines pragmas for controlling the default addition of -@code{long_call} and @code{short_call} attributes to functions. -@xref{Function Attributes}, for information about the effects of these -attributes. +@item corei7 +Intel Core i7 CPU. -@table @code -@cindex pragma, long_calls -@item long_calls -Set all subsequent functions to have the @code{long_call} attribute. +@item nehalem +Intel Core i7 Nehalem CPU. -@cindex pragma, no_long_calls -@item no_long_calls -Set all subsequent functions to have the @code{short_call} attribute. +@item westmere +Intel Core i7 Westmere CPU. -@cindex pragma, long_calls_off -@item long_calls_off -Do not affect the @code{long_call} or @code{short_call} attributes of -subsequent functions. 
-@end table +@item sandybridge +Intel Core i7 Sandy Bridge CPU. -@node LoongArch Pragmas -@subsection LoongArch Pragmas +@item ivybridge +Intel Core i7 Ivy Bridge CPU. -The list of attributes supported by Pragma is the same as that of target -function attributes. @xref{LoongArch Function Attributes}. +@item haswell +Intel Core i7 Haswell CPU. -Example: +@item broadwell +Intel Core i7 Broadwell CPU. -@smallexample -#pragma GCC target("strict-align") -@end smallexample +@item skylake +Intel Core i7 Skylake CPU. -@node M32C Pragmas -@subsection M32C Pragmas +@item skylake-avx512 +Intel Core i7 Skylake AVX512 CPU. -@table @code -@cindex pragma, memregs -@item GCC memregs @var{number} -Overrides the command-line option @code{-memregs=} for the current -file. Use with care! This pragma must be before any function in the -file, and mixing different memregs values in different objects may -make them incompatible. This pragma is useful when a -performance-critical function uses a memreg for temporary values, -as it may allow you to reduce the number of memregs used. +@item cannonlake +Intel Core i7 Cannon Lake CPU. -@cindex pragma, address -@item ADDRESS @var{name} @var{address} -For any declared symbols matching @var{name}, this does three things -to that symbol: it forces the symbol to be located at the given -address (a number), it forces the symbol to be volatile, and it -changes the symbol's scope to be static. This pragma exists for -compatibility with other compilers, but note that the common -@code{1234H} numeric syntax is not supported (use @code{0x1234} -instead). Example: +@item icelake-client +Intel Core i7 Ice Lake Client CPU. -@smallexample -#pragma ADDRESS port3 0x103 -char port3; -@end smallexample +@item icelake-server +Intel Core i7 Ice Lake Server CPU. -@end table +@item cascadelake +Intel Core i7 Cascadelake CPU. -@node PRU Pragmas -@subsection PRU Pragmas +@item tigerlake +Intel Core i7 Tigerlake CPU. -@table @code +@item cooperlake +Intel Core i7 Cooperlake CPU. -@cindex pragma, ctable_entry -@item ctable_entry @var{index} @var{constant_address} -Specifies that the PRU CTABLE entry given by @var{index} has the value -@var{constant_address}. This enables GCC to emit LBCO/SBCO instructions -when the load/store address is known and can be addressed with some CTABLE -entry. For example: +@item sapphirerapids +Intel Core i7 sapphirerapids CPU. -@smallexample -/* will compile to "sbco Rx, 2, 0x10, 4" */ -#pragma ctable_entry 2 0x4802a000 -*(unsigned int *)0x4802a010 = val; -@end smallexample +@item alderlake +Intel Core i7 Alderlake CPU. -@end table +@item rocketlake +Intel Core i7 Rocketlake CPU. -@node RS/6000 and PowerPC Pragmas -@subsection RS/6000 and PowerPC Pragmas +@item graniterapids +Intel Core i7 graniterapids CPU. -The RS/6000 and PowerPC targets define one pragma for controlling -whether or not the @code{longcall} attribute is added to function -declarations by default. This pragma overrides the @option{-mlongcall} -option, but not the @code{longcall} and @code{shortcall} attributes. -@xref{RS/6000 and PowerPC Options}, for more information about when long -calls are and are not necessary. +@item graniterapids-d +Intel Core i7 graniterapids D CPU. -@table @code -@cindex pragma, longcall -@item longcall (1) -Apply the @code{longcall} attribute to all subsequent function -declarations. +@item arrowlake +Intel Core i7 Arrow Lake CPU. -@item longcall (0) -Do not apply the @code{longcall} attribute to subsequent function -declarations. 
-@end table
-@c Describe h8300 pragmas here.
-@c Describe sh pragmas here.
-@c Describe v850 pragmas here.
+@item pantherlake
+Intel Core i7 Panther Lake CPU.
-@node S/390 Pragmas
-@subsection S/390 Pragmas
+@item diamondrapids
+Intel Core i7 Diamond Rapids CPU.
-The pragmas defined by the S/390 target correspond to the S/390
-target function attributes and some of the additional options:
+@item bonnell
+Intel Atom Bonnell CPU.
-@table @samp
-@item zvector
-@itemx no-zvector
-@end table
+@item silvermont
+Intel Atom Silvermont CPU.
-Note that options of the pragma, unlike options of the target
-attribute, do change the value of preprocessor macros like
-@code{__VEC__}. They can be specified as below:
+@item goldmont
+Intel Atom Goldmont CPU.
-@smallexample
-#pragma GCC target("string[,string]...")
-#pragma GCC target("string"[,"string"]...)
-@end smallexample
+@item goldmont-plus
+Intel Atom Goldmont Plus CPU.
-@node Darwin Pragmas
-@subsection Darwin Pragmas
+@item tremont
+Intel Atom Tremont CPU.
-The following pragmas are available for all architectures running the
-Darwin operating system. These are useful for compatibility with other
-macOS compilers.
+@item sierraforest
+Intel Atom Sierra Forest CPU.
-@table @code
-@cindex pragma, mark
-@item mark @var{tokens}@dots{}
-This pragma is accepted, but has no effect.
+@item grandridge
+Intel Atom Grand Ridge CPU.
-@cindex pragma, options align
-@item options align=@var{alignment}
-This pragma sets the alignment of fields in structures. The values of
-@var{alignment} may be @code{mac68k}, to emulate m68k alignment, or
-@code{power}, to emulate PowerPC alignment. Uses of this pragma nest
-properly; to restore the previous setting, use @code{reset} for the
-@var{alignment}.
+@item clearwaterforest
+Intel Atom Clearwater Forest CPU.
-@cindex pragma, segment
-@item segment @var{tokens}@dots{}
-This pragma is accepted, but has no effect.
+@item lujiazui
+ZHAOXIN lujiazui CPU.
-@cindex pragma, unused
-@item unused (@var{var} [, @var{var}]@dots{})
-This pragma declares variables to be possibly unused. GCC does not
-produce warnings for the listed variables. The effect is similar to
-that of the @code{unused} attribute, except that this pragma may appear
-anywhere within the variables' scopes.
-@end table
+@item yongfeng
+ZHAOXIN yongfeng CPU.
-@node Solaris Pragmas
-@subsection Solaris Pragmas
+@item shijidadao
+ZHAOXIN shijidadao CPU.
-The Solaris target supports @code{#pragma redefine_extname}
-(@pxref{Symbol-Renaming Pragmas}). It also supports additional
-@code{#pragma} directives for compatibility with the system compiler.
+@item amdfam10h
+AMD Family 10h CPU.
-@table @code
-@cindex pragma, align
-@item align @var{alignment} (@var{variable} [, @var{variable}]...)
+@item barcelona
+AMD Family 10h Barcelona CPU.
-Increase the minimum alignment of each @var{variable} to @var{alignment}.
-This is the same as GCC's @code{aligned} attribute (@pxref{Variable
-Attributes}). Macro expansion occurs on the arguments to this pragma
-when compiling C and Objective-C@. It does not currently occur when
-compiling C++, but this is a bug which may be fixed in a future
-release.
+@item shanghai
+AMD Family 10h Shanghai CPU.
-@cindex pragma, fini
-@item fini (@var{function} [, @var{function}]...)
+@item istanbul
+AMD Family 10h Istanbul CPU.
-This pragma causes each listed @var{function} to be called after
-main, or during shared module unloading, by adding a call to the
-@code{.fini} section.
+@item btver1 +AMD Family 14h CPU. -@cindex pragma, init -@item init (@var{function} [, @var{function}]...) +@item amdfam15h +AMD Family 15h CPU. -This pragma causes each listed @var{function} to be called during -initialization (before @code{main}) or during shared module loading, by -adding a call to the @code{.init} section. +@item bdver1 +AMD Family 15h Bulldozer version 1. -@end table +@item bdver2 +AMD Family 15h Bulldozer version 2. -@node Symbol-Renaming Pragmas -@subsection Symbol-Renaming Pragmas +@item bdver3 +AMD Family 15h Bulldozer version 3. -GCC supports a @code{#pragma} directive that changes the name used in -assembly for a given declaration. While this pragma is supported on all -platforms, it is intended primarily to provide compatibility with the -Solaris system headers. This effect can also be achieved using the asm -labels extension (@pxref{Asm Labels}). +@item bdver4 +AMD Family 15h Bulldozer version 4. -@table @code -@cindex pragma, redefine_extname -@item redefine_extname @var{oldname} @var{newname} +@item btver2 +AMD Family 16h CPU. -This pragma gives the C function @var{oldname} the assembly symbol -@var{newname}. The preprocessor macro @code{__PRAGMA_REDEFINE_EXTNAME} -is defined if this pragma is available (currently on all platforms). -@end table +@item amdfam17h +AMD Family 17h CPU. -This pragma and the @code{asm} labels extension interact in a complicated -manner. Here are some corner cases you may want to be aware of: +@item znver1 +AMD Family 17h Zen version 1. -@enumerate -@item This pragma silently applies only to declarations with external -linkage. The @code{asm} label feature does not have this restriction. +@item znver2 +AMD Family 17h Zen version 2. -@item In C++, this pragma silently applies only to declarations with -``C'' linkage. Again, @code{asm} labels do not have this restriction. +@item amdfam19h +AMD Family 19h CPU. -@item If either of the ways of changing the assembly name of a -declaration are applied to a declaration whose assembly name has -already been determined (either by a previous use of one of these -features, or because the compiler needed the assembly name in order to -generate code), and the new name is different, a warning issues and -the name does not change. +@item znver3 +AMD Family 19h Zen version 3. -@item The @var{oldname} used by @code{#pragma redefine_extname} is -always the C-language name. -@end enumerate +@item znver4 +AMD Family 19h Zen version 4. -@node Structure-Layout Pragmas -@subsection Structure-Layout Pragmas +@item znver5 +AMD Family 1ah Zen version 5. +@end table -For compatibility with Microsoft Windows compilers, GCC supports a -set of @code{#pragma} directives that change the maximum alignment of -members of structures (other than zero-width bit-fields), unions, and -classes subsequently defined. The @var{n} value below always is required -to be a small power of two and specifies the new alignment in bytes. +Here is an example: +@smallexample +if (__builtin_cpu_is ("corei7")) + @{ + do_corei7 (); // Core i7 specific implementation. + @} +else + @{ + do_generic (); // Generic implementation. + @} +@end smallexample +@enddefbuiltin -@enumerate -@item @code{#pragma pack(@var{n})} simply sets the new alignment. -@item @code{#pragma pack()} sets the alignment to the one that was in -effect when compilation started (see also command-line option -@option{-fpack-struct[=@var{n}]} @pxref{Code Gen Options}). 
-@item @code{#pragma pack(push[,@var{n}])} pushes the current alignment -setting on an internal stack and then optionally sets the new alignment. -@item @code{#pragma pack(pop)} restores the alignment setting to the one -saved at the top of the internal stack (and removes that stack entry). -Note that @code{#pragma pack([@var{n}])} does not influence this internal -stack; thus it is possible to have @code{#pragma pack(push)} followed by -multiple @code{#pragma pack(@var{n})} instances and finalized by a single -@code{#pragma pack(pop)}. -@end enumerate +@defbuiltin{int __builtin_cpu_supports (const char *@var{feature})} +This function returns a positive integer if the run-time CPU +supports @var{feature} +and returns @code{0} otherwise. The following features can be detected: -Some targets, e.g.@: x86 and PowerPC, support the @code{#pragma ms_struct} -directive which lays out structures and unions subsequently defined as the -documented @code{__attribute__ ((ms_struct))}. +@table @samp +@item cmov +CMOV instruction. +@item mmx +MMX instructions. +@item popcnt +POPCNT instruction. +@item sse +SSE instructions. +@item sse2 +SSE2 instructions. +@item sse3 +SSE3 instructions. +@item ssse3 +SSSE3 instructions. +@item sse4.1 +SSE4.1 instructions. +@item sse4.2 +SSE4.2 instructions. +@item avx +AVX instructions. +@item avx2 +AVX2 instructions. +@item sse4a +SSE4A instructions. +@item fma4 +FMA4 instructions. +@item xop +XOP instructions. +@item fma +FMA instructions. +@item avx512f +AVX512F instructions. +@item bmi +BMI instructions. +@item bmi2 +BMI2 instructions. +@item aes +AES instructions. +@item pclmul +PCLMUL instructions. +@item avx512vl +AVX512VL instructions. +@item avx512bw +AVX512BW instructions. +@item avx512dq +AVX512DQ instructions. +@item avx512cd +AVX512CD instructions. +@item avx512vbmi +AVX512VBMI instructions. +@item avx512ifma +AVX512IFMA instructions. +@item avx512vpopcntdq +AVX512VPOPCNTDQ instructions. +@item avx512vbmi2 +AVX512VBMI2 instructions. +@item gfni +GFNI instructions. +@item vpclmulqdq +VPCLMULQDQ instructions. +@item avx512vnni +AVX512VNNI instructions. +@item avx512bitalg +AVX512BITALG instructions. +@item x86-64 +Baseline x86-64 microarchitecture level (as defined in x86-64 psABI). +@item x86-64-v2 +x86-64-v2 microarchitecture level. +@item x86-64-v3 +x86-64-v3 microarchitecture level. +@item x86-64-v4 +x86-64-v4 microarchitecture level. -@enumerate -@item @code{#pragma ms_struct on} turns on the Microsoft layout. -@item @code{#pragma ms_struct off} turns off the Microsoft layout. -@item @code{#pragma ms_struct reset} goes back to the default layout. -@end enumerate -Most targets also support the @code{#pragma scalar_storage_order} directive -which lays out structures and unions subsequently defined as the documented -@code{__attribute__ ((scalar_storage_order))}. +@end table -@enumerate -@item @code{#pragma scalar_storage_order big-endian} sets the storage order -of the scalar fields to big-endian. -@item @code{#pragma scalar_storage_order little-endian} sets the storage order -of the scalar fields to little-endian. -@item @code{#pragma scalar_storage_order default} goes back to the endianness -that was in effect when compilation started (see also command-line option -@option{-fsso-struct=@var{endianness}} @pxref{C Dialect Options}). -@end enumerate +Here is an example: +@smallexample +if (__builtin_cpu_supports ("popcnt")) + @{ + asm("popcnt %1,%0" : "=r"(count) : "rm"(n) : "cc"); + @} +else + @{ + count = generic_countbits (n); //generic implementation. 
+ @} +@end smallexample +@enddefbuiltin -@node Weak Pragmas -@subsection Weak Pragmas +The following built-in functions are made available by @option{-mmmx}. +All of them generate the machine instruction that is part of the name. -For compatibility with SVR4, GCC supports a set of @code{#pragma} -directives for declaring symbols to be weak, and defining weak -aliases. +@smallexample +v8qi __builtin_ia32_paddb (v8qi, v8qi); +v4hi __builtin_ia32_paddw (v4hi, v4hi); +v2si __builtin_ia32_paddd (v2si, v2si); +v8qi __builtin_ia32_psubb (v8qi, v8qi); +v4hi __builtin_ia32_psubw (v4hi, v4hi); +v2si __builtin_ia32_psubd (v2si, v2si); +v8qi __builtin_ia32_paddsb (v8qi, v8qi); +v4hi __builtin_ia32_paddsw (v4hi, v4hi); +v8qi __builtin_ia32_psubsb (v8qi, v8qi); +v4hi __builtin_ia32_psubsw (v4hi, v4hi); +v8qi __builtin_ia32_paddusb (v8qi, v8qi); +v4hi __builtin_ia32_paddusw (v4hi, v4hi); +v8qi __builtin_ia32_psubusb (v8qi, v8qi); +v4hi __builtin_ia32_psubusw (v4hi, v4hi); +v4hi __builtin_ia32_pmullw (v4hi, v4hi); +v4hi __builtin_ia32_pmulhw (v4hi, v4hi); +di __builtin_ia32_pand (di, di); +di __builtin_ia32_pandn (di,di); +di __builtin_ia32_por (di, di); +di __builtin_ia32_pxor (di, di); +v8qi __builtin_ia32_pcmpeqb (v8qi, v8qi); +v4hi __builtin_ia32_pcmpeqw (v4hi, v4hi); +v2si __builtin_ia32_pcmpeqd (v2si, v2si); +v8qi __builtin_ia32_pcmpgtb (v8qi, v8qi); +v4hi __builtin_ia32_pcmpgtw (v4hi, v4hi); +v2si __builtin_ia32_pcmpgtd (v2si, v2si); +v8qi __builtin_ia32_punpckhbw (v8qi, v8qi); +v4hi __builtin_ia32_punpckhwd (v4hi, v4hi); +v2si __builtin_ia32_punpckhdq (v2si, v2si); +v8qi __builtin_ia32_punpcklbw (v8qi, v8qi); +v4hi __builtin_ia32_punpcklwd (v4hi, v4hi); +v2si __builtin_ia32_punpckldq (v2si, v2si); +v8qi __builtin_ia32_packsswb (v4hi, v4hi); +v4hi __builtin_ia32_packssdw (v2si, v2si); +v8qi __builtin_ia32_packuswb (v4hi, v4hi); -@table @code -@cindex pragma, weak -@item #pragma weak @var{symbol} -This pragma declares @var{symbol} to be weak, as if the declaration -had the attribute of the same name. The pragma may appear before -or after the declaration of @var{symbol}. It is not an error for -@var{symbol} to never be defined at all. +v4hi __builtin_ia32_psllw (v4hi, v4hi); +v2si __builtin_ia32_pslld (v2si, v2si); +v1di __builtin_ia32_psllq (v1di, v1di); +v4hi __builtin_ia32_psrlw (v4hi, v4hi); +v2si __builtin_ia32_psrld (v2si, v2si); +v1di __builtin_ia32_psrlq (v1di, v1di); +v4hi __builtin_ia32_psraw (v4hi, v4hi); +v2si __builtin_ia32_psrad (v2si, v2si); +v4hi __builtin_ia32_psllwi (v4hi, int); +v2si __builtin_ia32_pslldi (v2si, int); +v1di __builtin_ia32_psllqi (v1di, int); +v4hi __builtin_ia32_psrlwi (v4hi, int); +v2si __builtin_ia32_psrldi (v2si, int); +v1di __builtin_ia32_psrlqi (v1di, int); +v4hi __builtin_ia32_psrawi (v4hi, int); +v2si __builtin_ia32_psradi (v2si, int); +@end smallexample -@item #pragma weak @var{symbol1} = @var{symbol2} -This pragma declares @var{symbol1} to be a weak alias of @var{symbol2}. -It is an error if @var{symbol2} is not defined in the current -translation unit. -@end table +The following built-in functions are made available either with +@option{-msse}, or with @option{-m3dnowa}. All of them generate +the machine instruction that is part of the name. 
-@node Diagnostic Pragmas -@subsection Diagnostic Pragmas +@smallexample +v4hi __builtin_ia32_pmulhuw (v4hi, v4hi); +v8qi __builtin_ia32_pavgb (v8qi, v8qi); +v4hi __builtin_ia32_pavgw (v4hi, v4hi); +v1di __builtin_ia32_psadbw (v8qi, v8qi); +v8qi __builtin_ia32_pmaxub (v8qi, v8qi); +v4hi __builtin_ia32_pmaxsw (v4hi, v4hi); +v8qi __builtin_ia32_pminub (v8qi, v8qi); +v4hi __builtin_ia32_pminsw (v4hi, v4hi); +int __builtin_ia32_pmovmskb (v8qi); +void __builtin_ia32_maskmovq (v8qi, v8qi, char *); +void __builtin_ia32_movntq (di *, di); +void __builtin_ia32_sfence (void); +@end smallexample -GCC allows the user to selectively enable or disable certain types of -diagnostics, and change the kind of the diagnostic. For example, a -project's policy might require that all sources compile with -@option{-Werror} but certain files might have exceptions allowing -specific types of warnings. Or, a project might selectively enable -diagnostics and treat them as errors depending on which preprocessor -macros are defined. +The following built-in functions are available when @option{-msse} is used. +All of them generate the machine instruction that is part of the name. -@table @code -@cindex pragma, diagnostic -@item #pragma GCC diagnostic @var{kind} @var{option} +@smallexample +int __builtin_ia32_comieq (v4sf, v4sf); +int __builtin_ia32_comineq (v4sf, v4sf); +int __builtin_ia32_comilt (v4sf, v4sf); +int __builtin_ia32_comile (v4sf, v4sf); +int __builtin_ia32_comigt (v4sf, v4sf); +int __builtin_ia32_comige (v4sf, v4sf); +int __builtin_ia32_ucomieq (v4sf, v4sf); +int __builtin_ia32_ucomineq (v4sf, v4sf); +int __builtin_ia32_ucomilt (v4sf, v4sf); +int __builtin_ia32_ucomile (v4sf, v4sf); +int __builtin_ia32_ucomigt (v4sf, v4sf); +int __builtin_ia32_ucomige (v4sf, v4sf); +v4sf __builtin_ia32_addps (v4sf, v4sf); +v4sf __builtin_ia32_subps (v4sf, v4sf); +v4sf __builtin_ia32_mulps (v4sf, v4sf); +v4sf __builtin_ia32_divps (v4sf, v4sf); +v4sf __builtin_ia32_addss (v4sf, v4sf); +v4sf __builtin_ia32_subss (v4sf, v4sf); +v4sf __builtin_ia32_mulss (v4sf, v4sf); +v4sf __builtin_ia32_divss (v4sf, v4sf); +v4sf __builtin_ia32_cmpeqps (v4sf, v4sf); +v4sf __builtin_ia32_cmpltps (v4sf, v4sf); +v4sf __builtin_ia32_cmpleps (v4sf, v4sf); +v4sf __builtin_ia32_cmpgtps (v4sf, v4sf); +v4sf __builtin_ia32_cmpgeps (v4sf, v4sf); +v4sf __builtin_ia32_cmpunordps (v4sf, v4sf); +v4sf __builtin_ia32_cmpneqps (v4sf, v4sf); +v4sf __builtin_ia32_cmpnltps (v4sf, v4sf); +v4sf __builtin_ia32_cmpnleps (v4sf, v4sf); +v4sf __builtin_ia32_cmpngtps (v4sf, v4sf); +v4sf __builtin_ia32_cmpngeps (v4sf, v4sf); +v4sf __builtin_ia32_cmpordps (v4sf, v4sf); +v4sf __builtin_ia32_cmpeqss (v4sf, v4sf); +v4sf __builtin_ia32_cmpltss (v4sf, v4sf); +v4sf __builtin_ia32_cmpless (v4sf, v4sf); +v4sf __builtin_ia32_cmpunordss (v4sf, v4sf); +v4sf __builtin_ia32_cmpneqss (v4sf, v4sf); +v4sf __builtin_ia32_cmpnltss (v4sf, v4sf); +v4sf __builtin_ia32_cmpnless (v4sf, v4sf); +v4sf __builtin_ia32_cmpordss (v4sf, v4sf); +v4sf __builtin_ia32_maxps (v4sf, v4sf); +v4sf __builtin_ia32_maxss (v4sf, v4sf); +v4sf __builtin_ia32_minps (v4sf, v4sf); +v4sf __builtin_ia32_minss (v4sf, v4sf); +v4sf __builtin_ia32_andps (v4sf, v4sf); +v4sf __builtin_ia32_andnps (v4sf, v4sf); +v4sf __builtin_ia32_orps (v4sf, v4sf); +v4sf __builtin_ia32_xorps (v4sf, v4sf); +v4sf __builtin_ia32_movss (v4sf, v4sf); +v4sf __builtin_ia32_movhlps (v4sf, v4sf); +v4sf __builtin_ia32_movlhps (v4sf, v4sf); +v4sf __builtin_ia32_unpckhps (v4sf, v4sf); +v4sf __builtin_ia32_unpcklps (v4sf, v4sf); +v4sf 
__builtin_ia32_cvtpi2ps (v4sf, v2si); +v4sf __builtin_ia32_cvtsi2ss (v4sf, int); +v2si __builtin_ia32_cvtps2pi (v4sf); +int __builtin_ia32_cvtss2si (v4sf); +v2si __builtin_ia32_cvttps2pi (v4sf); +int __builtin_ia32_cvttss2si (v4sf); +v4sf __builtin_ia32_rcpps (v4sf); +v4sf __builtin_ia32_rsqrtps (v4sf); +v4sf __builtin_ia32_sqrtps (v4sf); +v4sf __builtin_ia32_rcpss (v4sf); +v4sf __builtin_ia32_rsqrtss (v4sf); +v4sf __builtin_ia32_sqrtss (v4sf); +v4sf __builtin_ia32_shufps (v4sf, v4sf, int); +void __builtin_ia32_movntps (float *, v4sf); +int __builtin_ia32_movmskps (v4sf); +@end smallexample -Modifies the disposition of a diagnostic. Note that not all -diagnostics are modifiable; at the moment only warnings (normally -controlled by @samp{-W@dots{}}) can be controlled, and not all of them. -Use @option{-fdiagnostics-show-option} to determine which diagnostics -are controllable and which option controls them. +The following built-in functions are available when @option{-msse} is used. -@var{kind} is @samp{error} to treat this diagnostic as an error, -@samp{warning} to treat it like a warning (even if @option{-Werror} is -in effect), or @samp{ignored} if the diagnostic is to be ignored. -@var{option} is a double quoted string that matches the command-line -option. +@defbuiltin{v4sf __builtin_ia32_loadups (float *)} +Generates the @code{movups} machine instruction as a load from memory. +@enddefbuiltin -@smallexample -#pragma GCC diagnostic warning "-Wformat" -#pragma GCC diagnostic error "-Wformat" -#pragma GCC diagnostic ignored "-Wformat" -@end smallexample +@defbuiltin{void __builtin_ia32_storeups (float *, v4sf)} +Generates the @code{movups} machine instruction as a store to memory. +@enddefbuiltin -Note that these pragmas override any command-line options. GCC keeps -track of the location of each pragma, and issues diagnostics according -to the state as of that point in the source file. Thus, pragmas occurring -after a line do not affect diagnostics caused by that line. +@defbuiltin{v4sf __builtin_ia32_loadss (float *)} +Generates the @code{movss} machine instruction as a load from memory. +@enddefbuiltin -@item #pragma GCC diagnostic push -@itemx #pragma GCC diagnostic pop +@defbuiltin{v4sf __builtin_ia32_loadhps (v4sf, const v2sf *)} +Generates the @code{movhps} machine instruction as a load from memory. +@enddefbuiltin -Causes GCC to remember the state of the diagnostics as of each -@code{push}, and restore to that point at each @code{pop}. If a -@code{pop} has no matching @code{push}, the command-line options are -restored. +@defbuiltin{v4sf __builtin_ia32_loadlps (v4sf, const v2sf *)} +Generates the @code{movlps} machine instruction as a load from memory +@enddefbuiltin -@smallexample -#pragma GCC diagnostic error "-Wuninitialized" - foo(a); /* error is given for this one */ -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wuninitialized" - foo(b); /* no diagnostic for this one */ -#pragma GCC diagnostic pop - foo(c); /* error is given for this one */ -#pragma GCC diagnostic pop - foo(d); /* depends on command-line options */ -@end smallexample +@defbuiltin{void __builtin_ia32_storehps (v2sf *, v4sf)} +Generates the @code{movhps} machine instruction as a store to memory. +@enddefbuiltin -@item #pragma GCC diagnostic ignored_attributes +@defbuiltin{void __builtin_ia32_storelps (v2sf *, v4sf)} +Generates the @code{movlps} machine instruction as a store to memory. 
+@enddefbuiltin -Similarly to @option{-Wno-attributes=}, this pragma allows users to suppress -warnings about unknown scoped attributes (in C++11 and C23). For example, -@code{#pragma GCC diagnostic ignored_attributes "vendor::attr"} disables -warning about the following declaration: +The following built-in functions are available when @option{-msse2} is used. +All of them generate the machine instruction that is part of the name. @smallexample -[[vendor::attr]] void f(); +int __builtin_ia32_comisdeq (v2df, v2df); +int __builtin_ia32_comisdlt (v2df, v2df); +int __builtin_ia32_comisdle (v2df, v2df); +int __builtin_ia32_comisdgt (v2df, v2df); +int __builtin_ia32_comisdge (v2df, v2df); +int __builtin_ia32_comisdneq (v2df, v2df); +int __builtin_ia32_ucomisdeq (v2df, v2df); +int __builtin_ia32_ucomisdlt (v2df, v2df); +int __builtin_ia32_ucomisdle (v2df, v2df); +int __builtin_ia32_ucomisdgt (v2df, v2df); +int __builtin_ia32_ucomisdge (v2df, v2df); +int __builtin_ia32_ucomisdneq (v2df, v2df); +v2df __builtin_ia32_cmpeqpd (v2df, v2df); +v2df __builtin_ia32_cmpltpd (v2df, v2df); +v2df __builtin_ia32_cmplepd (v2df, v2df); +v2df __builtin_ia32_cmpgtpd (v2df, v2df); +v2df __builtin_ia32_cmpgepd (v2df, v2df); +v2df __builtin_ia32_cmpunordpd (v2df, v2df); +v2df __builtin_ia32_cmpneqpd (v2df, v2df); +v2df __builtin_ia32_cmpnltpd (v2df, v2df); +v2df __builtin_ia32_cmpnlepd (v2df, v2df); +v2df __builtin_ia32_cmpngtpd (v2df, v2df); +v2df __builtin_ia32_cmpngepd (v2df, v2df); +v2df __builtin_ia32_cmpordpd (v2df, v2df); +v2df __builtin_ia32_cmpeqsd (v2df, v2df); +v2df __builtin_ia32_cmpltsd (v2df, v2df); +v2df __builtin_ia32_cmplesd (v2df, v2df); +v2df __builtin_ia32_cmpunordsd (v2df, v2df); +v2df __builtin_ia32_cmpneqsd (v2df, v2df); +v2df __builtin_ia32_cmpnltsd (v2df, v2df); +v2df __builtin_ia32_cmpnlesd (v2df, v2df); +v2df __builtin_ia32_cmpordsd (v2df, v2df); +v2di __builtin_ia32_paddq (v2di, v2di); +v2di __builtin_ia32_psubq (v2di, v2di); +v2df __builtin_ia32_addpd (v2df, v2df); +v2df __builtin_ia32_subpd (v2df, v2df); +v2df __builtin_ia32_mulpd (v2df, v2df); +v2df __builtin_ia32_divpd (v2df, v2df); +v2df __builtin_ia32_addsd (v2df, v2df); +v2df __builtin_ia32_subsd (v2df, v2df); +v2df __builtin_ia32_mulsd (v2df, v2df); +v2df __builtin_ia32_divsd (v2df, v2df); +v2df __builtin_ia32_minpd (v2df, v2df); +v2df __builtin_ia32_maxpd (v2df, v2df); +v2df __builtin_ia32_minsd (v2df, v2df); +v2df __builtin_ia32_maxsd (v2df, v2df); +v2df __builtin_ia32_andpd (v2df, v2df); +v2df __builtin_ia32_andnpd (v2df, v2df); +v2df __builtin_ia32_orpd (v2df, v2df); +v2df __builtin_ia32_xorpd (v2df, v2df); +v2df __builtin_ia32_movsd (v2df, v2df); +v2df __builtin_ia32_unpckhpd (v2df, v2df); +v2df __builtin_ia32_unpcklpd (v2df, v2df); +v16qi __builtin_ia32_paddb128 (v16qi, v16qi); +v8hi __builtin_ia32_paddw128 (v8hi, v8hi); +v4si __builtin_ia32_paddd128 (v4si, v4si); +v2di __builtin_ia32_paddq128 (v2di, v2di); +v16qi __builtin_ia32_psubb128 (v16qi, v16qi); +v8hi __builtin_ia32_psubw128 (v8hi, v8hi); +v4si __builtin_ia32_psubd128 (v4si, v4si); +v2di __builtin_ia32_psubq128 (v2di, v2di); +v8hi __builtin_ia32_pmullw128 (v8hi, v8hi); +v8hi __builtin_ia32_pmulhw128 (v8hi, v8hi); +v2di __builtin_ia32_pand128 (v2di, v2di); +v2di __builtin_ia32_pandn128 (v2di, v2di); +v2di __builtin_ia32_por128 (v2di, v2di); +v2di __builtin_ia32_pxor128 (v2di, v2di); +v16qi __builtin_ia32_pavgb128 (v16qi, v16qi); +v8hi __builtin_ia32_pavgw128 (v8hi, v8hi); +v16qi __builtin_ia32_pcmpeqb128 (v16qi, v16qi); +v8hi __builtin_ia32_pcmpeqw128 (v8hi, v8hi); 
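+/* The pcmpeq and pcmpgt functions return per-element masks of all
+   ones (true) or all zeros (false) rather than 0/1 values (a
+   behavioral note, not text from the GCC sources).  */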
+v4si __builtin_ia32_pcmpeqd128 (v4si, v4si); +v16qi __builtin_ia32_pcmpgtb128 (v16qi, v16qi); +v8hi __builtin_ia32_pcmpgtw128 (v8hi, v8hi); +v4si __builtin_ia32_pcmpgtd128 (v4si, v4si); +v16qi __builtin_ia32_pmaxub128 (v16qi, v16qi); +v8hi __builtin_ia32_pmaxsw128 (v8hi, v8hi); +v16qi __builtin_ia32_pminub128 (v16qi, v16qi); +v8hi __builtin_ia32_pminsw128 (v8hi, v8hi); +v16qi __builtin_ia32_punpckhbw128 (v16qi, v16qi); +v8hi __builtin_ia32_punpckhwd128 (v8hi, v8hi); +v4si __builtin_ia32_punpckhdq128 (v4si, v4si); +v2di __builtin_ia32_punpckhqdq128 (v2di, v2di); +v16qi __builtin_ia32_punpcklbw128 (v16qi, v16qi); +v8hi __builtin_ia32_punpcklwd128 (v8hi, v8hi); +v4si __builtin_ia32_punpckldq128 (v4si, v4si); +v2di __builtin_ia32_punpcklqdq128 (v2di, v2di); +v16qi __builtin_ia32_packsswb128 (v8hi, v8hi); +v8hi __builtin_ia32_packssdw128 (v4si, v4si); +v16qi __builtin_ia32_packuswb128 (v8hi, v8hi); +v8hi __builtin_ia32_pmulhuw128 (v8hi, v8hi); +void __builtin_ia32_maskmovdqu (v16qi, v16qi); +v2df __builtin_ia32_loadupd (double *); +void __builtin_ia32_storeupd (double *, v2df); +v2df __builtin_ia32_loadhpd (v2df, double const *); +v2df __builtin_ia32_loadlpd (v2df, double const *); +int __builtin_ia32_movmskpd (v2df); +int __builtin_ia32_pmovmskb128 (v16qi); +void __builtin_ia32_movnti (int *, int); +void __builtin_ia32_movnti64 (long long int *, long long int); +void __builtin_ia32_movntpd (double *, v2df); +void __builtin_ia32_movntdq (v2df *, v2df); +v4si __builtin_ia32_pshufd (v4si, int); +v8hi __builtin_ia32_pshuflw (v8hi, int); +v8hi __builtin_ia32_pshufhw (v8hi, int); +v2di __builtin_ia32_psadbw128 (v16qi, v16qi); +v2df __builtin_ia32_sqrtpd (v2df); +v2df __builtin_ia32_sqrtsd (v2df); +v2df __builtin_ia32_shufpd (v2df, v2df, int); +v2df __builtin_ia32_cvtdq2pd (v4si); +v4sf __builtin_ia32_cvtdq2ps (v4si); +v4si __builtin_ia32_cvtpd2dq (v2df); +v2si __builtin_ia32_cvtpd2pi (v2df); +v4sf __builtin_ia32_cvtpd2ps (v2df); +v4si __builtin_ia32_cvttpd2dq (v2df); +v2si __builtin_ia32_cvttpd2pi (v2df); +v2df __builtin_ia32_cvtpi2pd (v2si); +int __builtin_ia32_cvtsd2si (v2df); +int __builtin_ia32_cvttsd2si (v2df); +long long __builtin_ia32_cvtsd2si64 (v2df); +long long __builtin_ia32_cvttsd2si64 (v2df); +v4si __builtin_ia32_cvtps2dq (v4sf); +v2df __builtin_ia32_cvtps2pd (v4sf); +v4si __builtin_ia32_cvttps2dq (v4sf); +v2df __builtin_ia32_cvtsi2sd (v2df, int); +v2df __builtin_ia32_cvtsi642sd (v2df, long long); +v4sf __builtin_ia32_cvtsd2ss (v4sf, v2df); +v2df __builtin_ia32_cvtss2sd (v2df, v4sf); +void __builtin_ia32_clflush (const void *); +void __builtin_ia32_lfence (void); +void __builtin_ia32_mfence (void); +v16qi __builtin_ia32_loaddqu (const char *); +void __builtin_ia32_storedqu (char *, v16qi); +v1di __builtin_ia32_pmuludq (v2si, v2si); +v2di __builtin_ia32_pmuludq128 (v4si, v4si); +v8hi __builtin_ia32_psllw128 (v8hi, v8hi); +v4si __builtin_ia32_pslld128 (v4si, v4si); +v2di __builtin_ia32_psllq128 (v2di, v2di); +v8hi __builtin_ia32_psrlw128 (v8hi, v8hi); +v4si __builtin_ia32_psrld128 (v4si, v4si); +v2di __builtin_ia32_psrlq128 (v2di, v2di); +v8hi __builtin_ia32_psraw128 (v8hi, v8hi); +v4si __builtin_ia32_psrad128 (v4si, v4si); +v2di __builtin_ia32_pslldqi128 (v2di, int); +v8hi __builtin_ia32_psllwi128 (v8hi, int); +v4si __builtin_ia32_pslldi128 (v4si, int); +v2di __builtin_ia32_psllqi128 (v2di, int); +v2di __builtin_ia32_psrldqi128 (v2di, int); +v8hi __builtin_ia32_psrlwi128 (v8hi, int); +v4si __builtin_ia32_psrldi128 (v4si, int); +v2di __builtin_ia32_psrlqi128 (v2di, int); +v8hi 
__builtin_ia32_psrawi128 (v8hi, int); +v4si __builtin_ia32_psradi128 (v4si, int); +v4si __builtin_ia32_pmaddwd128 (v8hi, v8hi); +v2di __builtin_ia32_movq128 (v2di); @end smallexample -whereas @code{#pragma GCC diagnostic ignored_attributes "vendor::"} prevents -warning about both of these declarations: +The following built-in functions are available when @option{-msse3} is used. +All of them generate the machine instruction that is part of the name. @smallexample -[[vendor::safe]] void f(); -[[vendor::unsafe]] void f2(); +v2df __builtin_ia32_addsubpd (v2df, v2df); +v4sf __builtin_ia32_addsubps (v4sf, v4sf); +v2df __builtin_ia32_haddpd (v2df, v2df); +v4sf __builtin_ia32_haddps (v4sf, v4sf); +v2df __builtin_ia32_hsubpd (v2df, v2df); +v4sf __builtin_ia32_hsubps (v4sf, v4sf); +v16qi __builtin_ia32_lddqu (char const *); +void __builtin_ia32_monitor (void *, unsigned int, unsigned int); +v4sf __builtin_ia32_movshdup (v4sf); +v4sf __builtin_ia32_movsldup (v4sf); +void __builtin_ia32_mwait (unsigned int, unsigned int); @end smallexample -@end table - -GCC also offers a simple mechanism for printing messages during -compilation. - -@table @code -@cindex pragma, diagnostic -@item #pragma message @var{string} - -Prints @var{string} as a compiler message on compilation. The message -is informational only, and is neither a compilation warning nor an -error. Newlines can be included in the string by using the @samp{\n} -escape sequence. +The following built-in functions are available when @option{-mssse3} is used. +All of them generate the machine instruction that is part of the name. @smallexample -#pragma message "Compiling " __FILE__ "..." +v2si __builtin_ia32_phaddd (v2si, v2si); +v4hi __builtin_ia32_phaddw (v4hi, v4hi); +v4hi __builtin_ia32_phaddsw (v4hi, v4hi); +v2si __builtin_ia32_phsubd (v2si, v2si); +v4hi __builtin_ia32_phsubw (v4hi, v4hi); +v4hi __builtin_ia32_phsubsw (v4hi, v4hi); +v4hi __builtin_ia32_pmaddubsw (v8qi, v8qi); +v4hi __builtin_ia32_pmulhrsw (v4hi, v4hi); +v8qi __builtin_ia32_pshufb (v8qi, v8qi); +v8qi __builtin_ia32_psignb (v8qi, v8qi); +v2si __builtin_ia32_psignd (v2si, v2si); +v4hi __builtin_ia32_psignw (v4hi, v4hi); +v1di __builtin_ia32_palignr (v1di, v1di, int); +v8qi __builtin_ia32_pabsb (v8qi); +v2si __builtin_ia32_pabsd (v2si); +v4hi __builtin_ia32_pabsw (v4hi); @end smallexample -@var{string} may be parenthesized, and is printed with location -information. For example, +The following built-in functions are available when @option{-mssse3} is used. +All of them generate the machine instruction that is part of the name. @smallexample -#define DO_PRAGMA(x) _Pragma (#x) -#define TODO(x) DO_PRAGMA(message ("TODO - " #x)) - -TODO(Remember to fix this) +v4si __builtin_ia32_phaddd128 (v4si, v4si); +v8hi __builtin_ia32_phaddw128 (v8hi, v8hi); +v8hi __builtin_ia32_phaddsw128 (v8hi, v8hi); +v4si __builtin_ia32_phsubd128 (v4si, v4si); +v8hi __builtin_ia32_phsubw128 (v8hi, v8hi); +v8hi __builtin_ia32_phsubsw128 (v8hi, v8hi); +v8hi __builtin_ia32_pmaddubsw128 (v16qi, v16qi); +v8hi __builtin_ia32_pmulhrsw128 (v8hi, v8hi); +v16qi __builtin_ia32_pshufb128 (v16qi, v16qi); +v16qi __builtin_ia32_psignb128 (v16qi, v16qi); +v4si __builtin_ia32_psignd128 (v4si, v4si); +v8hi __builtin_ia32_psignw128 (v8hi, v8hi); +v2di __builtin_ia32_palignr128 (v2di, v2di, int); +v16qi __builtin_ia32_pabsb128 (v16qi); +v4si __builtin_ia32_pabsd128 (v4si); +v8hi __builtin_ia32_pabsw128 (v8hi); @end smallexample -@noindent -prints @samp{/tmp/file.c:4: note: #pragma message: -TODO - Remember to fix this}. 
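+A minimal sketch of how such built-ins are called (the wrapper function
+here is purely illustrative, not part of the documented API): they
+operate on GCC's generic vector types, declared with the
+@code{vector_size} attribute, and require the matching option, here
+@option{-mssse3}.
+
+@smallexample
+typedef short v8hi __attribute__ ((vector_size (16)));
+
+/* Per-element absolute value of eight 16-bit integers,
+   via the SSSE3 pabsw instruction.  */
+v8hi
+abs_epi16 (v8hi x)
+@{
+  return __builtin_ia32_pabsw128 (x);
+@}
+@end smallexample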
- -@cindex pragma, diagnostic -@item #pragma GCC error @var{message} -Generates an error message. This pragma @emph{is} considered to -indicate an error in the compilation, and it will be treated as such. - -Newlines can be included in the string by using the @samp{\n} -escape sequence. They will be displayed as newlines even if the -@option{-fmessage-length} option is set to zero. - -The error is only generated if the pragma is present in the code after -pre-processing has been completed. It does not matter however if the -code containing the pragma is unreachable: +The following built-in functions are available when @option{-msse4.1} is +used. All of them generate the machine instruction that is part of the +name. @smallexample -#if 0 -#pragma GCC error "this error is not seen" -#endif -void foo (void) -@{ - return; -#pragma GCC error "this error is seen" -@} -@end smallexample - -@cindex pragma, diagnostic -@item #pragma GCC warning @var{message} -This is just like @samp{pragma GCC error} except that a warning -message is issued instead of an error message. Unless -@option{-Werror} is in effect, in which case this pragma will generate -an error as well. - -@end table +v2df __builtin_ia32_blendpd (v2df, v2df, const int); +v4sf __builtin_ia32_blendps (v4sf, v4sf, const int); +v2df __builtin_ia32_blendvpd (v2df, v2df, v2df); +v4sf __builtin_ia32_blendvps (v4sf, v4sf, v4sf); +v2df __builtin_ia32_dppd (v2df, v2df, const int); +v4sf __builtin_ia32_dpps (v4sf, v4sf, const int); +v4sf __builtin_ia32_insertps128 (v4sf, v4sf, const int); +v2di __builtin_ia32_movntdqa (v2di *); +v16qi __builtin_ia32_mpsadbw128 (v16qi, v16qi, const int); +v8hi __builtin_ia32_packusdw128 (v4si, v4si); +v16qi __builtin_ia32_pblendvb128 (v16qi, v16qi, v16qi); +v8hi __builtin_ia32_pblendw128 (v8hi, v8hi, const int); +v2di __builtin_ia32_pcmpeqq (v2di, v2di); +v8hi __builtin_ia32_phminposuw128 (v8hi); +v16qi __builtin_ia32_pmaxsb128 (v16qi, v16qi); +v4si __builtin_ia32_pmaxsd128 (v4si, v4si); +v4si __builtin_ia32_pmaxud128 (v4si, v4si); +v8hi __builtin_ia32_pmaxuw128 (v8hi, v8hi); +v16qi __builtin_ia32_pminsb128 (v16qi, v16qi); +v4si __builtin_ia32_pminsd128 (v4si, v4si); +v4si __builtin_ia32_pminud128 (v4si, v4si); +v8hi __builtin_ia32_pminuw128 (v8hi, v8hi); +v4si __builtin_ia32_pmovsxbd128 (v16qi); +v2di __builtin_ia32_pmovsxbq128 (v16qi); +v8hi __builtin_ia32_pmovsxbw128 (v16qi); +v2di __builtin_ia32_pmovsxdq128 (v4si); +v4si __builtin_ia32_pmovsxwd128 (v8hi); +v2di __builtin_ia32_pmovsxwq128 (v8hi); +v4si __builtin_ia32_pmovzxbd128 (v16qi); +v2di __builtin_ia32_pmovzxbq128 (v16qi); +v8hi __builtin_ia32_pmovzxbw128 (v16qi); +v2di __builtin_ia32_pmovzxdq128 (v4si); +v4si __builtin_ia32_pmovzxwd128 (v8hi); +v2di __builtin_ia32_pmovzxwq128 (v8hi); +v2di __builtin_ia32_pmuldq128 (v4si, v4si); +v4si __builtin_ia32_pmulld128 (v4si, v4si); +int __builtin_ia32_ptestc128 (v2di, v2di); +int __builtin_ia32_ptestnzc128 (v2di, v2di); +int __builtin_ia32_ptestz128 (v2di, v2di); +v2df __builtin_ia32_roundpd (v2df, const int); +v4sf __builtin_ia32_roundps (v4sf, const int); +v2df __builtin_ia32_roundsd (v2df, v2df, const int); +v4sf __builtin_ia32_roundss (v4sf, v4sf, const int); +@end smallexample -@node Visibility Pragmas -@subsection Visibility Pragmas +The following built-in functions are available when @option{-msse4.1} is +used. 
-@table @code -@cindex pragma, visibility -@item #pragma GCC visibility push(@var{visibility}) -@itemx #pragma GCC visibility pop +@defbuiltin{v4sf __builtin_ia32_vec_set_v4sf (v4sf, float, const int)} +Generates the @code{insertps} machine instruction. +@enddefbuiltin -This pragma allows the user to set the visibility for multiple -declarations without having to give each a visibility attribute -(@pxref{Function Attributes}). +@defbuiltin{int __builtin_ia32_vec_ext_v16qi (v16qi, const int)} +Generates the @code{pextrb} machine instruction. +@enddefbuiltin -In C++, @samp{#pragma GCC visibility} affects only namespace-scope -declarations. Class members and template specializations are not -affected; if you want to override the visibility for a particular -member or instantiation, you must use an attribute. +@defbuiltin{v16qi __builtin_ia32_vec_set_v16qi (v16qi, int, const int)} +Generates the @code{pinsrb} machine instruction. +@enddefbuiltin -@end table +@defbuiltin{v4si __builtin_ia32_vec_set_v4si (v4si, int, const int)} +Generates the @code{pinsrd} machine instruction. +@enddefbuiltin +@defbuiltin{v2di __builtin_ia32_vec_set_v2di (v2di, long long, const int)} +Generates the @code{pinsrq} machine instruction in 64-bit mode. +@enddefbuiltin -@node Push/Pop Macro Pragmas -@subsection Push/Pop Macro Pragmas +The following built-in functions are changed to generate new SSE4.1 +instructions when @option{-msse4.1} is used. -For compatibility with Microsoft Windows compilers, GCC supports -@samp{#pragma push_macro(@var{"macro_name"})} -and @samp{#pragma pop_macro(@var{"macro_name"})}. +@defbuiltin{float __builtin_ia32_vec_ext_v4sf (v4sf, const int)} +Generates the @code{extractps} machine instruction. +@enddefbuiltin -@table @code -@cindex pragma, push_macro -@item #pragma push_macro(@var{"macro_name"}) -This pragma saves the value of the macro named as @var{macro_name} to -the top of the stack for this macro. +@defbuiltin{int __builtin_ia32_vec_ext_v4si (v4si, const int)} +Generates the @code{pextrd} machine instruction. +@enddefbuiltin -@cindex pragma, pop_macro -@item #pragma pop_macro(@var{"macro_name"}) -This pragma sets the value of the macro named as @var{macro_name} to -the value on top of the stack for this macro. If the stack for -@var{macro_name} is empty, the value of the macro remains unchanged. -@end table +@defbuiltin{{long long} __builtin_ia32_vec_ext_v2di (v2di, const int)} +Generates the @code{pextrq} machine instruction in 64-bit mode. +@enddefbuiltin -For example: +The following built-in functions are available when @option{-msse4.2} is +used. All of them generate the machine instruction that is part of the +name.
@smallexample -#define X 1 -#pragma push_macro("X") -#undef X -#define X -1 -#pragma pop_macro("X") -int x [X]; +v16qi __builtin_ia32_pcmpestrm128 (v16qi, int, v16qi, int, const int); +int __builtin_ia32_pcmpestri128 (v16qi, int, v16qi, int, const int); +int __builtin_ia32_pcmpestria128 (v16qi, int, v16qi, int, const int); +int __builtin_ia32_pcmpestric128 (v16qi, int, v16qi, int, const int); +int __builtin_ia32_pcmpestrio128 (v16qi, int, v16qi, int, const int); +int __builtin_ia32_pcmpestris128 (v16qi, int, v16qi, int, const int); +int __builtin_ia32_pcmpestriz128 (v16qi, int, v16qi, int, const int); +v16qi __builtin_ia32_pcmpistrm128 (v16qi, v16qi, const int); +int __builtin_ia32_pcmpistri128 (v16qi, v16qi, const int); +int __builtin_ia32_pcmpistria128 (v16qi, v16qi, const int); +int __builtin_ia32_pcmpistric128 (v16qi, v16qi, const int); +int __builtin_ia32_pcmpistrio128 (v16qi, v16qi, const int); +int __builtin_ia32_pcmpistris128 (v16qi, v16qi, const int); +int __builtin_ia32_pcmpistriz128 (v16qi, v16qi, const int); +v2di __builtin_ia32_pcmpgtq (v2di, v2di); @end smallexample -@noindent -In this example, the definition of X as 1 is saved by @code{#pragma -push_macro} and restored by @code{#pragma pop_macro}. - -@node Function Specific Option Pragmas -@subsection Function Specific Option Pragmas - -@table @code -@cindex pragma GCC target -@item #pragma GCC target (@var{string}, @dots{}) - -This pragma allows you to set target-specific options for functions -defined later in the source file. One or more strings can be -specified. Each function that is defined after this point is treated -as if it had been declared with one @code{target(}@var{string}@code{)} -attribute for each @var{string} argument. The parentheses around -the strings in the pragma are optional. @xref{Function Attributes}, -for more information about the @code{target} attribute and the attribute -syntax. - -The @code{#pragma GCC target} pragma is presently implemented for -x86, ARM, AArch64, PowerPC, and S/390 targets only. - -@cindex pragma GCC optimize -@item #pragma GCC optimize (@var{string}, @dots{}) - -This pragma allows you to set global optimization options for functions -defined later in the source file. One or more strings can be -specified. Each function that is defined after this point is treated -as if it had been declared with one @code{optimize(}@var{string}@code{)} -attribute for each @var{string} argument. The parentheses around -the strings in the pragma are optional. @xref{Function Attributes}, -for more information about the @code{optimize} attribute and the attribute -syntax. - -@cindex pragma GCC push_options -@cindex pragma GCC pop_options -@item #pragma GCC push_options -@itemx #pragma GCC pop_options - -These pragmas maintain a stack of the current target and optimization -options. It is intended for include files where you temporarily want -to switch to using a different @samp{#pragma GCC target} or -@samp{#pragma GCC optimize} and then to pop back to the previous -options. +The following built-in functions are available when @option{-msse4.2} is +used. -@cindex pragma GCC reset_options -@item #pragma GCC reset_options +@defbuiltin{{unsigned int} __builtin_ia32_crc32qi (unsigned int, unsigned char)} +Generates the @code{crc32b} machine instruction. +@enddefbuiltin -This pragma clears the current @code{#pragma GCC target} and -@code{#pragma GCC optimize} to use the default switches as specified -on the command line. 
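+A short usage sketch (illustrative only; @code{buffer_crc} is a
+hypothetical helper, not a GCC API): the CRC32 built-ins thread an
+accumulator through successive calls, one input unit per call, and the
+wider variants that follow work the same way on 16-, 32-, and 64-bit
+units.
+
+@smallexample
+/* Accumulate a CRC32 over a buffer one byte at a time;
+   requires -msse4.2.  */
+unsigned int
+buffer_crc (const unsigned char *p, int len)
+@{
+  unsigned int crc = 0;
+  for (int i = 0; i < len; i++)
+    crc = __builtin_ia32_crc32qi (crc, p[i]);
+  return crc;
+@}
+@end smallexample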
+@defbuiltin{{unsigned int} __builtin_ia32_crc32hi (unsigned int, unsigned short)} +Generates the @code{crc32w} machine instruction. +@enddefbuiltin -@end table +@defbuiltin{{unsigned int} __builtin_ia32_crc32si (unsigned int, unsigned int)} +Generates the @code{crc32l} machine instruction. +@enddefbuiltin -@node Loop-Specific Pragmas -@subsection Loop-Specific Pragmas +@defbuiltin{{unsigned long long} __builtin_ia32_crc32di (unsigned long long, unsigned long long)} +Generates the @code{crc32q} machine instruction. +@enddefbuiltin -@table @code -@cindex pragma GCC ivdep -@item #pragma GCC ivdep +The following built-in functions are changed to generate new SSE4.2 +instructions when @option{-msse4.2} is used. -With this pragma, the programmer asserts that there are no loop-carried -dependencies which would prevent consecutive iterations of -the following loop from executing concurrently with SIMD -(single instruction multiple data) instructions. +@defbuiltin{int __builtin_popcount (unsigned int)} +Generates the @code{popcntl} machine instruction. +@enddefbuiltin -For example, the compiler can only unconditionally vectorize the following -loop with the pragma: +@defbuiltin{int __builtin_popcountl (unsigned long)} +Generates the @code{popcntl} or @code{popcntq} machine instruction, +depending on the size of @code{unsigned long}. +@enddefbuiltin -@smallexample -void foo (int n, int *a, int *b, int *c) -@{ - int i, j; -#pragma GCC ivdep - for (i = 0; i < n; ++i) - a[i] = b[i] + c[i]; -@} -@end smallexample +@defbuiltin{int __builtin_popcountll (unsigned long long)} +Generates the @code{popcntq} machine instruction. +@enddefbuiltin -@noindent -In this example, using the @code{restrict} qualifier had the same -effect. In the following example, that would not be possible. Assume -@math{k < -m} or @math{k >= m}. Only with the pragma, the compiler knows -that it can unconditionally vectorize the following loop: +The following built-in functions are available when @option{-mavx} is +used. All of them generate the machine instruction that is part of the +name. 
@smallexample -void ignore_vec_dep (int *a, int k, int c, int m) -@{ -#pragma GCC ivdep - for (int i = 0; i < m; i++) - a[i] = a[i + k] * c; -@} +v4df __builtin_ia32_addpd256 (v4df,v4df); +v8sf __builtin_ia32_addps256 (v8sf,v8sf); +v4df __builtin_ia32_addsubpd256 (v4df,v4df); +v8sf __builtin_ia32_addsubps256 (v8sf,v8sf); +v4df __builtin_ia32_andnpd256 (v4df,v4df); +v8sf __builtin_ia32_andnps256 (v8sf,v8sf); +v4df __builtin_ia32_andpd256 (v4df,v4df); +v8sf __builtin_ia32_andps256 (v8sf,v8sf); +v4df __builtin_ia32_blendpd256 (v4df,v4df,int); +v8sf __builtin_ia32_blendps256 (v8sf,v8sf,int); +v4df __builtin_ia32_blendvpd256 (v4df,v4df,v4df); +v8sf __builtin_ia32_blendvps256 (v8sf,v8sf,v8sf); +v2df __builtin_ia32_cmppd (v2df,v2df,int); +v4df __builtin_ia32_cmppd256 (v4df,v4df,int); +v4sf __builtin_ia32_cmpps (v4sf,v4sf,int); +v8sf __builtin_ia32_cmpps256 (v8sf,v8sf,int); +v2df __builtin_ia32_cmpsd (v2df,v2df,int); +v4sf __builtin_ia32_cmpss (v4sf,v4sf,int); +v4df __builtin_ia32_cvtdq2pd256 (v4si); +v8sf __builtin_ia32_cvtdq2ps256 (v8si); +v4si __builtin_ia32_cvtpd2dq256 (v4df); +v4sf __builtin_ia32_cvtpd2ps256 (v4df); +v8si __builtin_ia32_cvtps2dq256 (v8sf); +v4df __builtin_ia32_cvtps2pd256 (v4sf); +v4si __builtin_ia32_cvttpd2dq256 (v4df); +v8si __builtin_ia32_cvttps2dq256 (v8sf); +v4df __builtin_ia32_divpd256 (v4df,v4df); +v8sf __builtin_ia32_divps256 (v8sf,v8sf); +v8sf __builtin_ia32_dpps256 (v8sf,v8sf,int); +v4df __builtin_ia32_haddpd256 (v4df,v4df); +v8sf __builtin_ia32_haddps256 (v8sf,v8sf); +v4df __builtin_ia32_hsubpd256 (v4df,v4df); +v8sf __builtin_ia32_hsubps256 (v8sf,v8sf); +v32qi __builtin_ia32_lddqu256 (pcchar); +v32qi __builtin_ia32_loaddqu256 (pcchar); +v4df __builtin_ia32_loadupd256 (pcdouble); +v8sf __builtin_ia32_loadups256 (pcfloat); +v2df __builtin_ia32_maskloadpd (pcv2df,v2df); +v4df __builtin_ia32_maskloadpd256 (pcv4df,v4df); +v4sf __builtin_ia32_maskloadps (pcv4sf,v4sf); +v8sf __builtin_ia32_maskloadps256 (pcv8sf,v8sf); +void __builtin_ia32_maskstorepd (pv2df,v2df,v2df); +void __builtin_ia32_maskstorepd256 (pv4df,v4df,v4df); +void __builtin_ia32_maskstoreps (pv4sf,v4sf,v4sf); +void __builtin_ia32_maskstoreps256 (pv8sf,v8sf,v8sf); +v4df __builtin_ia32_maxpd256 (v4df,v4df); +v8sf __builtin_ia32_maxps256 (v8sf,v8sf); +v4df __builtin_ia32_minpd256 (v4df,v4df); +v8sf __builtin_ia32_minps256 (v8sf,v8sf); +v4df __builtin_ia32_movddup256 (v4df); +int __builtin_ia32_movmskpd256 (v4df); +int __builtin_ia32_movmskps256 (v8sf); +v8sf __builtin_ia32_movshdup256 (v8sf); +v8sf __builtin_ia32_movsldup256 (v8sf); +v4df __builtin_ia32_mulpd256 (v4df,v4df); +v8sf __builtin_ia32_mulps256 (v8sf,v8sf); +v4df __builtin_ia32_orpd256 (v4df,v4df); +v8sf __builtin_ia32_orps256 (v8sf,v8sf); +v2df __builtin_ia32_pd_pd256 (v4df); +v4df __builtin_ia32_pd256_pd (v2df); +v4sf __builtin_ia32_ps_ps256 (v8sf); +v8sf __builtin_ia32_ps256_ps (v4sf); +int __builtin_ia32_ptestc256 (v4di,v4di,ptest); +int __builtin_ia32_ptestnzc256 (v4di,v4di,ptest); +int __builtin_ia32_ptestz256 (v4di,v4di,ptest); +v8sf __builtin_ia32_rcpps256 (v8sf); +v4df __builtin_ia32_roundpd256 (v4df,int); +v8sf __builtin_ia32_roundps256 (v8sf,int); +v8sf __builtin_ia32_rsqrtps_nr256 (v8sf); +v8sf __builtin_ia32_rsqrtps256 (v8sf); +v4df __builtin_ia32_shufpd256 (v4df,v4df,int); +v8sf __builtin_ia32_shufps256 (v8sf,v8sf,int); +v4si __builtin_ia32_si_si256 (v8si); +v8si __builtin_ia32_si256_si (v4si); +v4df __builtin_ia32_sqrtpd256 (v4df); +v8sf __builtin_ia32_sqrtps_nr256 (v8sf); +v8sf __builtin_ia32_sqrtps256 (v8sf); +void 
__builtin_ia32_storedqu256 (pchar,v32qi); +void __builtin_ia32_storeupd256 (pdouble,v4df); +void __builtin_ia32_storeups256 (pfloat,v8sf); +v4df __builtin_ia32_subpd256 (v4df,v4df); +v8sf __builtin_ia32_subps256 (v8sf,v8sf); +v4df __builtin_ia32_unpckhpd256 (v4df,v4df); +v8sf __builtin_ia32_unpckhps256 (v8sf,v8sf); +v4df __builtin_ia32_unpcklpd256 (v4df,v4df); +v8sf __builtin_ia32_unpcklps256 (v8sf,v8sf); +v4df __builtin_ia32_vbroadcastf128_pd256 (pcv2df); +v8sf __builtin_ia32_vbroadcastf128_ps256 (pcv4sf); +v4df __builtin_ia32_vbroadcastsd256 (pcdouble); +v4sf __builtin_ia32_vbroadcastss (pcfloat); +v8sf __builtin_ia32_vbroadcastss256 (pcfloat); +v2df __builtin_ia32_vextractf128_pd256 (v4df,int); +v4sf __builtin_ia32_vextractf128_ps256 (v8sf,int); +v4si __builtin_ia32_vextractf128_si256 (v8si,int); +v4df __builtin_ia32_vinsertf128_pd256 (v4df,v2df,int); +v8sf __builtin_ia32_vinsertf128_ps256 (v8sf,v4sf,int); +v8si __builtin_ia32_vinsertf128_si256 (v8si,v4si,int); +v4df __builtin_ia32_vperm2f128_pd256 (v4df,v4df,int); +v8sf __builtin_ia32_vperm2f128_ps256 (v8sf,v8sf,int); +v8si __builtin_ia32_vperm2f128_si256 (v8si,v8si,int); +v2df __builtin_ia32_vpermil2pd (v2df,v2df,v2di,int); +v4df __builtin_ia32_vpermil2pd256 (v4df,v4df,v4di,int); +v4sf __builtin_ia32_vpermil2ps (v4sf,v4sf,v4si,int); +v8sf __builtin_ia32_vpermil2ps256 (v8sf,v8sf,v8si,int); +v2df __builtin_ia32_vpermilpd (v2df,int); +v4df __builtin_ia32_vpermilpd256 (v4df,int); +v4sf __builtin_ia32_vpermilps (v4sf,int); +v8sf __builtin_ia32_vpermilps256 (v8sf,int); +v2df __builtin_ia32_vpermilvarpd (v2df,v2di); +v4df __builtin_ia32_vpermilvarpd256 (v4df,v4di); +v4sf __builtin_ia32_vpermilvarps (v4sf,v4si); +v8sf __builtin_ia32_vpermilvarps256 (v8sf,v8si); +int __builtin_ia32_vtestcpd (v2df,v2df,ptest); +int __builtin_ia32_vtestcpd256 (v4df,v4df,ptest); +int __builtin_ia32_vtestcps (v4sf,v4sf,ptest); +int __builtin_ia32_vtestcps256 (v8sf,v8sf,ptest); +int __builtin_ia32_vtestnzcpd (v2df,v2df,ptest); +int __builtin_ia32_vtestnzcpd256 (v4df,v4df,ptest); +int __builtin_ia32_vtestnzcps (v4sf,v4sf,ptest); +int __builtin_ia32_vtestnzcps256 (v8sf,v8sf,ptest); +int __builtin_ia32_vtestzpd (v2df,v2df,ptest); +int __builtin_ia32_vtestzpd256 (v4df,v4df,ptest); +int __builtin_ia32_vtestzps (v4sf,v4sf,ptest); +int __builtin_ia32_vtestzps256 (v8sf,v8sf,ptest); +void __builtin_ia32_vzeroall (void); +void __builtin_ia32_vzeroupper (void); +v4df __builtin_ia32_xorpd256 (v4df,v4df); +v8sf __builtin_ia32_xorps256 (v8sf,v8sf); @end smallexample -@cindex pragma GCC novector -@item #pragma GCC novector - -With this pragma, the programmer asserts that the following loop should be -prevented from executing concurrently with SIMD (single instruction multiple -data) instructions. - -For example, the compiler cannot vectorize the following loop with the pragma: +The following built-in functions are available when @option{-mavx2} is +used. All of them generate the machine instruction that is part of the +name. 
@smallexample -void foo (int n, int *a, int *b, int *c) -@{ - int i, j; -#pragma GCC novector - for (i = 0; i < n; ++i) - a[i] = b[i] + c[i]; -@} +v32qi __builtin_ia32_mpsadbw256 (v32qi,v32qi,int); +v32qi __builtin_ia32_pabsb256 (v32qi); +v16hi __builtin_ia32_pabsw256 (v16hi); +v8si __builtin_ia32_pabsd256 (v8si); +v16hi __builtin_ia32_packssdw256 (v8si,v8si); +v32qi __builtin_ia32_packsswb256 (v16hi,v16hi); +v16hi __builtin_ia32_packusdw256 (v8si,v8si); +v32qi __builtin_ia32_packuswb256 (v16hi,v16hi); +v32qi __builtin_ia32_paddb256 (v32qi,v32qi); +v16hi __builtin_ia32_paddw256 (v16hi,v16hi); +v8si __builtin_ia32_paddd256 (v8si,v8si); +v4di __builtin_ia32_paddq256 (v4di,v4di); +v32qi __builtin_ia32_paddsb256 (v32qi,v32qi); +v16hi __builtin_ia32_paddsw256 (v16hi,v16hi); +v32qi __builtin_ia32_paddusb256 (v32qi,v32qi); +v16hi __builtin_ia32_paddusw256 (v16hi,v16hi); +v4di __builtin_ia32_palignr256 (v4di,v4di,int); +v4di __builtin_ia32_andsi256 (v4di,v4di); +v4di __builtin_ia32_andnotsi256 (v4di,v4di); +v32qi __builtin_ia32_pavgb256 (v32qi,v32qi); +v16hi __builtin_ia32_pavgw256 (v16hi,v16hi); +v32qi __builtin_ia32_pblendvb256 (v32qi,v32qi,v32qi); +v16hi __builtin_ia32_pblendw256 (v16hi,v16hi,int); +v32qi __builtin_ia32_pcmpeqb256 (v32qi,v32qi); +v16hi __builtin_ia32_pcmpeqw256 (v16hi,v16hi); +v8si __builtin_ia32_pcmpeqd256 (v8si,v8si); +v4di __builtin_ia32_pcmpeqq256 (v4di,v4di); +v32qi __builtin_ia32_pcmpgtb256 (v32qi,v32qi); +v16hi __builtin_ia32_pcmpgtw256 (v16hi,v16hi); +v8si __builtin_ia32_pcmpgtd256 (v8si,v8si); +v4di __builtin_ia32_pcmpgtq256 (v4di,v4di); +v16hi __builtin_ia32_phaddw256 (v16hi,v16hi); +v8si __builtin_ia32_phaddd256 (v8si,v8si); +v16hi __builtin_ia32_phaddsw256 (v16hi,v16hi); +v16hi __builtin_ia32_phsubw256 (v16hi,v16hi); +v8si __builtin_ia32_phsubd256 (v8si,v8si); +v16hi __builtin_ia32_phsubsw256 (v16hi,v16hi); +v32qi __builtin_ia32_pmaddubsw256 (v32qi,v32qi); +v16hi __builtin_ia32_pmaddwd256 (v16hi,v16hi); +v32qi __builtin_ia32_pmaxsb256 (v32qi,v32qi); +v16hi __builtin_ia32_pmaxsw256 (v16hi,v16hi); +v8si __builtin_ia32_pmaxsd256 (v8si,v8si); +v32qi __builtin_ia32_pmaxub256 (v32qi,v32qi); +v16hi __builtin_ia32_pmaxuw256 (v16hi,v16hi); +v8si __builtin_ia32_pmaxud256 (v8si,v8si); +v32qi __builtin_ia32_pminsb256 (v32qi,v32qi); +v16hi __builtin_ia32_pminsw256 (v16hi,v16hi); +v8si __builtin_ia32_pminsd256 (v8si,v8si); +v32qi __builtin_ia32_pminub256 (v32qi,v32qi); +v16hi __builtin_ia32_pminuw256 (v16hi,v16hi); +v8si __builtin_ia32_pminud256 (v8si,v8si); +int __builtin_ia32_pmovmskb256 (v32qi); +v16hi __builtin_ia32_pmovsxbw256 (v16qi); +v8si __builtin_ia32_pmovsxbd256 (v16qi); +v4di __builtin_ia32_pmovsxbq256 (v16qi); +v8si __builtin_ia32_pmovsxwd256 (v8hi); +v4di __builtin_ia32_pmovsxwq256 (v8hi); +v4di __builtin_ia32_pmovsxdq256 (v4si); +v16hi __builtin_ia32_pmovzxbw256 (v16qi); +v8si __builtin_ia32_pmovzxbd256 (v16qi); +v4di __builtin_ia32_pmovzxbq256 (v16qi); +v8si __builtin_ia32_pmovzxwd256 (v8hi); +v4di __builtin_ia32_pmovzxwq256 (v8hi); +v4di __builtin_ia32_pmovzxdq256 (v4si); +v4di __builtin_ia32_pmuldq256 (v8si,v8si); +v16hi __builtin_ia32_pmulhrsw256 (v16hi, v16hi); +v16hi __builtin_ia32_pmulhuw256 (v16hi,v16hi); +v16hi __builtin_ia32_pmulhw256 (v16hi,v16hi); +v16hi __builtin_ia32_pmullw256 (v16hi,v16hi); +v8si __builtin_ia32_pmulld256 (v8si,v8si); +v4di __builtin_ia32_pmuludq256 (v8si,v8si); +v4di __builtin_ia32_por256 (v4di,v4di); +v16hi __builtin_ia32_psadbw256 (v32qi,v32qi); +v32qi __builtin_ia32_pshufb256 (v32qi,v32qi); +v8si __builtin_ia32_pshufd256 (v8si,int); +v16hi __builtin_ia32_pshufhw256 (v16hi,int); +v16hi __builtin_ia32_pshuflw256 (v16hi,int); +v32qi __builtin_ia32_psignb256 (v32qi,v32qi); +v16hi __builtin_ia32_psignw256 (v16hi,v16hi); +v8si __builtin_ia32_psignd256 (v8si,v8si); +v4di __builtin_ia32_pslldqi256 (v4di,int); +v16hi __builtin_ia32_psllwi256 (v16hi,int); +v16hi __builtin_ia32_psllw256 (v16hi,v8hi); +v8si __builtin_ia32_pslldi256 (v8si,int); +v8si __builtin_ia32_pslld256 (v8si,v4si); +v4di __builtin_ia32_psllqi256 (v4di,int); +v4di __builtin_ia32_psllq256 (v4di,v2di); +v16hi __builtin_ia32_psrawi256 (v16hi,int); +v16hi __builtin_ia32_psraw256 (v16hi,v8hi); +v8si __builtin_ia32_psradi256 (v8si,int); +v8si __builtin_ia32_psrad256 (v8si,v4si); +v4di __builtin_ia32_psrldqi256 (v4di, int); +v16hi __builtin_ia32_psrlwi256 (v16hi,int); +v16hi __builtin_ia32_psrlw256 (v16hi,v8hi); +v8si __builtin_ia32_psrldi256 (v8si,int); +v8si __builtin_ia32_psrld256 (v8si,v4si); +v4di __builtin_ia32_psrlqi256 (v4di,int); +v4di __builtin_ia32_psrlq256 (v4di,v2di); +v32qi __builtin_ia32_psubb256 (v32qi,v32qi); +v16hi __builtin_ia32_psubw256 (v16hi,v16hi); +v8si __builtin_ia32_psubd256 (v8si,v8si); +v4di __builtin_ia32_psubq256 (v4di,v4di); +v32qi __builtin_ia32_psubsb256 (v32qi,v32qi); +v16hi __builtin_ia32_psubsw256 (v16hi,v16hi); +v32qi __builtin_ia32_psubusb256 (v32qi,v32qi); +v16hi __builtin_ia32_psubusw256 (v16hi,v16hi); +v32qi __builtin_ia32_punpckhbw256 (v32qi,v32qi); +v16hi __builtin_ia32_punpckhwd256 (v16hi,v16hi); +v8si __builtin_ia32_punpckhdq256 (v8si,v8si); +v4di __builtin_ia32_punpckhqdq256 (v4di,v4di); +v32qi __builtin_ia32_punpcklbw256 (v32qi,v32qi); +v16hi __builtin_ia32_punpcklwd256 (v16hi,v16hi); +v8si __builtin_ia32_punpckldq256 (v8si,v8si); +v4di __builtin_ia32_punpcklqdq256 (v4di,v4di); +v4di __builtin_ia32_pxor256 (v4di,v4di); +v4di __builtin_ia32_movntdqa256 (pv4di); +v4sf __builtin_ia32_vbroadcastss_ps (v4sf); +v8sf __builtin_ia32_vbroadcastss_ps256 (v4sf); +v4df __builtin_ia32_vbroadcastsd_pd256 (v2df); +v4di __builtin_ia32_vbroadcastsi256 (v2di); +v4si __builtin_ia32_pblendd128 (v4si,v4si); +v8si __builtin_ia32_pblendd256 (v8si,v8si); +v32qi __builtin_ia32_pbroadcastb256 (v16qi); +v16hi __builtin_ia32_pbroadcastw256 (v8hi); +v8si __builtin_ia32_pbroadcastd256 (v4si); +v4di __builtin_ia32_pbroadcastq256 (v2di); +v16qi __builtin_ia32_pbroadcastb128 (v16qi); +v8hi __builtin_ia32_pbroadcastw128 (v8hi); +v4si __builtin_ia32_pbroadcastd128 (v4si); +v2di __builtin_ia32_pbroadcastq128 (v2di); +v8si __builtin_ia32_permvarsi256 (v8si,v8si); +v4df __builtin_ia32_permdf256 (v4df,int); +v8sf __builtin_ia32_permvarsf256 (v8sf,v8sf); +v4di __builtin_ia32_permdi256 (v4di,int); +v4di __builtin_ia32_permti256 (v4di,v4di,int); +v4di __builtin_ia32_extract128i256 (v4di,int); +v4di __builtin_ia32_insert128i256 (v4di,v2di,int); +v8si __builtin_ia32_maskloadd256 (pcv8si,v8si); +v4di __builtin_ia32_maskloadq256 (pcv4di,v4di); +v4si __builtin_ia32_maskloadd (pcv4si,v4si); +v2di __builtin_ia32_maskloadq (pcv2di,v2di); +void __builtin_ia32_maskstored256 (pv8si,v8si,v8si); +void __builtin_ia32_maskstoreq256 (pv4di,v4di,v4di); +void __builtin_ia32_maskstored (pv4si,v4si,v4si); +void __builtin_ia32_maskstoreq (pv2di,v2di,v2di); +v8si __builtin_ia32_psllv8si (v8si,v8si); +v4si __builtin_ia32_psllv4si (v4si,v4si); +v4di __builtin_ia32_psllv4di (v4di,v4di); +v2di __builtin_ia32_psllv2di (v2di,v2di); +v8si __builtin_ia32_psrav8si (v8si,v8si); +v4si __builtin_ia32_psrav4si (v4si,v4si); +v8si __builtin_ia32_psrlv8si (v8si,v8si); +v4si
__builtin_ia32_psrlv4si (v4si,v4si); +v4di __builtin_ia32_psrlv4di (v4di,v4di); +v2di __builtin_ia32_psrlv2di (v2di,v2di); +v2df __builtin_ia32_gathersiv2df (v2df, pcdouble,v4si,v2df,int); +v4df __builtin_ia32_gathersiv4df (v4df, pcdouble,v4si,v4df,int); +v2df __builtin_ia32_gatherdiv2df (v2df, pcdouble,v2di,v2df,int); +v4df __builtin_ia32_gatherdiv4df (v4df, pcdouble,v4di,v4df,int); +v4sf __builtin_ia32_gathersiv4sf (v4sf, pcfloat,v4si,v4sf,int); +v8sf __builtin_ia32_gathersiv8sf (v8sf, pcfloat,v8si,v8sf,int); +v4sf __builtin_ia32_gatherdiv4sf (v4sf, pcfloat,v2di,v4sf,int); +v4sf __builtin_ia32_gatherdiv4sf256 (v4sf, pcfloat,v4di,v4sf,int); +v2di __builtin_ia32_gathersiv2di (v2di, pcint64,v4si,v2di,int); +v4di __builtin_ia32_gathersiv4di (v4di, pcint64,v4si,v4di,int); +v2di __builtin_ia32_gatherdiv2di (v2di, pcint64,v2di,v2di,int); +v4di __builtin_ia32_gatherdiv4di (v4di, pcint64,v4di,v4di,int); +v4si __builtin_ia32_gathersiv4si (v4si, pcint,v4si,v4si,int); +v8si __builtin_ia32_gathersiv8si (v8si, pcint,v8si,v8si,int); +v4si __builtin_ia32_gatherdiv4si (v4si, pcint,v2di,v4si,int); +v4si __builtin_ia32_gatherdiv4si256 (v4si, pcint,v4di,v4si,int); @end smallexample -@cindex pragma GCC unroll @var{n} -@item #pragma GCC unroll @var{n} - -You can use this pragma to control how many times a loop should be unrolled. -It must be placed immediately before a @code{for}, @code{while} or @code{do} -loop or a @code{#pragma GCC ivdep}, and applies only to the loop that follows. -@var{n} is an integer constant expression specifying the unrolling factor. -The values of @math{0} and @math{1} block any unrolling of the loop. - -@end table - -@node Thread-Local -@section Thread-Local Storage -@cindex Thread-Local Storage -@cindex @acronym{TLS} -@cindex @code{__thread} - -Thread-local storage (@acronym{TLS}) is a mechanism by which variables -are allocated such that there is one instance of the variable per extant -thread. The runtime model GCC uses to implement this originates -in the IA-64 processor-specific ABI, but has since been migrated -to other processors as well. It requires significant support from -the linker (@command{ld}), dynamic linker (@command{ld.so}), and -system libraries (@file{libc.so} and @file{libpthread.so}), so it -is not available everywhere. - -At the user level, the extension is visible with a new storage -class keyword: @code{__thread}. For example: +The following built-in functions are available when @option{-maes} is +used. All of them generate the machine instruction that is part of the +name. @smallexample -__thread int i; -extern __thread struct state s; -static __thread char *p; +v2di __builtin_ia32_aesenc128 (v2di, v2di); +v2di __builtin_ia32_aesenclast128 (v2di, v2di); +v2di __builtin_ia32_aesdec128 (v2di, v2di); +v2di __builtin_ia32_aesdeclast128 (v2di, v2di); +v2di __builtin_ia32_aeskeygenassist128 (v2di, const int); +v2di __builtin_ia32_aesimc128 (v2di); @end smallexample -The @code{__thread} specifier may be used alone, with the @code{extern} -or @code{static} specifiers, but with no other storage class specifier. -When used with @code{extern} or @code{static}, @code{__thread} must appear -immediately after the other storage class specifier. - -The @code{__thread} specifier may be applied to any global, file-scoped -static, function-scoped static, or static data member of a class. It may -not be applied to block-scoped automatic or non-static data member. 
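+As an illustrative sketch (the wrapper function is hypothetical, not
+part of any API): each AES built-in above performs one step of the AES
+algorithm on 128-bit state held in a vector register, assuming
+compilation with @option{-maes}.
+
+@smallexample
+typedef long long v2di __attribute__ ((vector_size (16)));
+
+/* One full AES encryption round: emits the aesenc instruction.  */
+v2di
+aes_round (v2di state, v2di round_key)
+@{
+  return __builtin_ia32_aesenc128 (state, round_key);
+@}
+@end smallexample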
- -When the address-of operator is applied to a thread-local variable, it is -evaluated at run time and returns the address of the current thread's -instance of that variable. An address so obtained may be used by any -thread. When a thread terminates, any pointers to thread-local variables -in that thread become invalid. +The following built-in function is available when @option{-mpclmul} is +used. -No static initialization may refer to the address of a thread-local variable. +@defbuiltin{v2di __builtin_ia32_pclmulqdq128 (v2di, v2di, const int)} +Generates the @code{pclmulqdq} machine instruction. +@enddefbuiltin -In C++, if an initializer is present for a thread-local variable, it must -be a @var{constant-expression}, as defined in 5.19.2 of the ANSI/ISO C++ -standard. +The following built-in functions are available when @option{-mfsgsbase} is +used. All of them generate the machine instruction that is part of the +name. -See @uref{https://www.akkadia.org/drepper/tls.pdf, -ELF Handling For Thread-Local Storage} for a detailed explanation of -the four thread-local storage addressing models, and how the runtime -is expected to function. +@smallexample +unsigned int __builtin_ia32_rdfsbase32 (void); +unsigned long long __builtin_ia32_rdfsbase64 (void); +unsigned int __builtin_ia32_rdgsbase32 (void); +unsigned long long __builtin_ia32_rdgsbase64 (void); +void _writefsbase_u32 (unsigned int); +void _writefsbase_u64 (unsigned long long); +void _writegsbase_u32 (unsigned int); +void _writegsbase_u64 (unsigned long long); +@end smallexample -@menu -* C99 Thread-Local Edits:: -* C++98 Thread-Local Edits:: -@end menu +The following built-in functions are available when @option{-mrdrnd} is +used. All of them generate the machine instruction that is part of the +name. -@node C99 Thread-Local Edits -@subsection ISO/IEC 9899:1999 Edits for Thread-Local Storage +@smallexample +unsigned int __builtin_ia32_rdrand16_step (unsigned short *); +unsigned int __builtin_ia32_rdrand32_step (unsigned int *); +unsigned int __builtin_ia32_rdrand64_step (unsigned long long *); +@end smallexample -The following are a set of changes to ISO/IEC 9899:1999 (aka C99) -that document the exact semantics of the language extension. +The following built-in functions are available when @option{-mptwrite} is +used. All of them generate the machine instruction that is part of the +name. -@itemize @bullet -@item -@cite{5.1.2 Execution environments} +@smallexample +void __builtin_ia32_ptwrite32 (unsigned); +void __builtin_ia32_ptwrite64 (unsigned long long); +@end smallexample -Add new text after paragraph 1 +The following built-in functions are available when @option{-msse4a} is used. +All of them generate the machine instruction that is part of the name. -@quotation -Within either execution environment, a @dfn{thread} is a flow of -control within a program. It is implementation defined whether -or not there may be more than one thread associated with a program. -It is implementation defined how threads beyond the first are -created, the name and type of the function called at thread -startup, and how threads may be terminated. However, objects -with thread storage duration shall be initialized before thread -startup.
-@end quotation +@smallexample +void __builtin_ia32_movntsd (double *, v2df); +void __builtin_ia32_movntss (float *, v4sf); +v2di __builtin_ia32_extrq (v2di, v16qi); +v2di __builtin_ia32_extrqi (v2di, const unsigned int, const unsigned int); +v2di __builtin_ia32_insertq (v2di, v2di); +v2di __builtin_ia32_insertqi (v2di, v2di, const unsigned int, const unsigned int); +@end smallexample -@item -@cite{6.2.4 Storage durations of objects} +The following built-in functions are available when @option{-mxop} is used. @smallexample +v2df __builtin_ia32_vfrczpd (v2df); +v4sf __builtin_ia32_vfrczps (v4sf); +v2df __builtin_ia32_vfrczsd (v2df); +v4sf __builtin_ia32_vfrczss (v4sf); +v4df __builtin_ia32_vfrczpd256 (v4df); +v8sf __builtin_ia32_vfrczps256 (v8sf); +v2di __builtin_ia32_vpcmov (v2di, v2di, v2di); +v2di __builtin_ia32_vpcmov_v2di (v2di, v2di, v2di); +v4si __builtin_ia32_vpcmov_v4si (v4si, v4si, v4si); +v8hi __builtin_ia32_vpcmov_v8hi (v8hi, v8hi, v8hi); +v16qi __builtin_ia32_vpcmov_v16qi (v16qi, v16qi, v16qi); +v2df __builtin_ia32_vpcmov_v2df (v2df, v2df, v2df); +v4sf __builtin_ia32_vpcmov_v4sf (v4sf, v4sf, v4sf); +v4di __builtin_ia32_vpcmov_v4di256 (v4di, v4di, v4di); +v8si __builtin_ia32_vpcmov_v8si256 (v8si, v8si, v8si); +v16hi __builtin_ia32_vpcmov_v16hi256 (v16hi, v16hi, v16hi); +v32qi __builtin_ia32_vpcmov_v32qi256 (v32qi, v32qi, v32qi); +v4df __builtin_ia32_vpcmov_v4df256 (v4df, v4df, v4df); +v8sf __builtin_ia32_vpcmov_v8sf256 (v8sf, v8sf, v8sf); +v16qi __builtin_ia32_vpcomeqb (v16qi, v16qi); +v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi); +v4si __builtin_ia32_vpcomeqd (v4si, v4si); +v2di __builtin_ia32_vpcomeqq (v2di, v2di); +v16qi __builtin_ia32_vpcomequb (v16qi, v16qi); +v4si __builtin_ia32_vpcomequd (v4si, v4si); +v2di __builtin_ia32_vpcomequq (v2di, v2di); +v8hi __builtin_ia32_vpcomequw (v8hi, v8hi); +v16qi __builtin_ia32_vpcomfalseb (v16qi, v16qi); +v4si __builtin_ia32_vpcomfalsed (v4si, v4si); +v2di __builtin_ia32_vpcomfalseq (v2di, v2di); +v16qi __builtin_ia32_vpcomfalseub (v16qi, v16qi); +v4si __builtin_ia32_vpcomfalseud (v4si, v4si); +v2di __builtin_ia32_vpcomfalseuq (v2di, v2di); +v8hi __builtin_ia32_vpcomfalseuw (v8hi, v8hi); +v8hi __builtin_ia32_vpcomfalsew (v8hi, v8hi); +v16qi __builtin_ia32_vpcomgeb (v16qi, v16qi); +v4si __builtin_ia32_vpcomged (v4si, v4si); +v2di __builtin_ia32_vpcomgeq (v2di, v2di); +v16qi __builtin_ia32_vpcomgeub (v16qi, v16qi); +v4si __builtin_ia32_vpcomgeud (v4si, v4si); +v2di __builtin_ia32_vpcomgeuq (v2di, v2di); +v8hi __builtin_ia32_vpcomgeuw (v8hi, v8hi); +v8hi __builtin_ia32_vpcomgew (v8hi, v8hi); +v16qi __builtin_ia32_vpcomgtb (v16qi, v16qi); +v4si __builtin_ia32_vpcomgtd (v4si, v4si); +v2di __builtin_ia32_vpcomgtq (v2di, v2di); +v16qi __builtin_ia32_vpcomgtub (v16qi, v16qi); +v4si __builtin_ia32_vpcomgtud (v4si, v4si); +v2di __builtin_ia32_vpcomgtuq (v2di, v2di); +v8hi __builtin_ia32_vpcomgtuw (v8hi, v8hi); +v8hi __builtin_ia32_vpcomgtw (v8hi, v8hi); +v16qi __builtin_ia32_vpcomleb (v16qi, v16qi); +v4si __builtin_ia32_vpcomled (v4si, v4si); +v2di __builtin_ia32_vpcomleq (v2di, v2di); +v16qi __builtin_ia32_vpcomleub (v16qi, v16qi); +v4si __builtin_ia32_vpcomleud (v4si, v4si); +v2di __builtin_ia32_vpcomleuq (v2di, v2di); +v8hi __builtin_ia32_vpcomleuw (v8hi, v8hi); +v8hi __builtin_ia32_vpcomlew (v8hi, v8hi); +v16qi __builtin_ia32_vpcomltb (v16qi, v16qi); +v4si __builtin_ia32_vpcomltd (v4si, v4si); +v2di __builtin_ia32_vpcomltq (v2di, v2di); +v16qi __builtin_ia32_vpcomltub (v16qi, v16qi); +v4si
__builtin_ia32_vpcomltud (v4si, v4si); +v2di __builtin_ia32_vpcomltuq (v2di, v2di); +v8hi __builtin_ia32_vpcomltuw (v8hi, v8hi); +v8hi __builtin_ia32_vpcomltw (v8hi, v8hi); +v16qi __builtin_ia32_vpcomneb (v16qi, v16qi); +v4si __builtin_ia32_vpcomned (v4si, v4si); +v2di __builtin_ia32_vpcomneq (v2di, v2di); +v16qi __builtin_ia32_vpcomneub (v16qi, v16qi); +v4si __builtin_ia32_vpcomneud (v4si, v4si); +v2di __builtin_ia32_vpcomneuq (v2di, v2di); +v8hi __builtin_ia32_vpcomneuw (v8hi, v8hi); +v8hi __builtin_ia32_vpcomnew (v8hi, v8hi); +v16qi __builtin_ia32_vpcomtrueb (v16qi, v16qi); +v4si __builtin_ia32_vpcomtrued (v4si, v4si); +v2di __builtin_ia32_vpcomtrueq (v2di, v2di); +v16qi __builtin_ia32_vpcomtrueub (v16qi, v16qi); +v4si __builtin_ia32_vpcomtrueud (v4si, v4si); +v2di __builtin_ia32_vpcomtrueuq (v2di, v2di); +v8hi __builtin_ia32_vpcomtrueuw (v8hi, v8hi); +v8hi __builtin_ia32_vpcomtruew (v8hi, v8hi); +v4si __builtin_ia32_vphaddbd (v16qi); +v2di __builtin_ia32_vphaddbq (v16qi); +v8hi __builtin_ia32_vphaddbw (v16qi); +v2di __builtin_ia32_vphadddq (v4si); +v4si __builtin_ia32_vphaddubd (v16qi); +v2di __builtin_ia32_vphaddubq (v16qi); +v8hi __builtin_ia32_vphaddubw (v16qi); +v2di __builtin_ia32_vphaddudq (v4si); +v4si __builtin_ia32_vphadduwd (v8hi); +v2di __builtin_ia32_vphadduwq (v8hi); +v4si __builtin_ia32_vphaddwd (v8hi); +v2di __builtin_ia32_vphaddwq (v8hi); +v8hi __builtin_ia32_vphsubbw (v16qi); +v2di __builtin_ia32_vphsubdq (v4si); +v4si __builtin_ia32_vphsubwd (v8hi); +v4si __builtin_ia32_vpmacsdd (v4si, v4si, v4si); +v2di __builtin_ia32_vpmacsdqh (v4si, v4si, v2di); +v2di __builtin_ia32_vpmacsdql (v4si, v4si, v2di); +v4si __builtin_ia32_vpmacssdd (v4si, v4si, v4si); +v2di __builtin_ia32_vpmacssdqh (v4si, v4si, v2di); +v2di __builtin_ia32_vpmacssdql (v4si, v4si, v2di); +v4si __builtin_ia32_vpmacsswd (v8hi, v8hi, v4si); +v8hi __builtin_ia32_vpmacssww (v8hi, v8hi, v8hi); +v4si __builtin_ia32_vpmacswd (v8hi, v8hi, v4si); +v8hi __builtin_ia32_vpmacsww (v8hi, v8hi, v8hi); +v4si __builtin_ia32_vpmadcsswd (v8hi, v8hi, v4si); +v4si __builtin_ia32_vpmadcswd (v8hi, v8hi, v4si); +v16qi __builtin_ia32_vpperm (v16qi, v16qi, v16qi); +v16qi __builtin_ia32_vprotb (v16qi, v16qi); +v4si __builtin_ia32_vprotd (v4si, v4si); +v2di __builtin_ia32_vprotq (v2di, v2di); +v8hi __builtin_ia32_vprotw (v8hi, v8hi); +v16qi __builtin_ia32_vpshab (v16qi, v16qi); +v4si __builtin_ia32_vpshad (v4si, v4si); +v2di __builtin_ia32_vpshaq (v2di, v2di); +v8hi __builtin_ia32_vpshaw (v8hi, v8hi); +v16qi __builtin_ia32_vpshlb (v16qi, v16qi); +v4si __builtin_ia32_vpshld (v4si, v4si); +v2di __builtin_ia32_vpshlq (v2di, v2di); +v8hi __builtin_ia32_vpshlw (v8hi, v8hi); +@end smallexample -Add new text before paragraph 3 +The following built-in functions are available when @option{-mfma4} is used. +All of them generate the machine instruction that is part of the name. -@quotation -An object whose identifier is declared with the storage-class -specifier @w{@code{__thread}} has @dfn{thread storage duration}. -Its lifetime is the entire execution of the thread, and its -stored value is initialized only once, prior to thread startup. 
-@end quotation +@smallexample +v2df __builtin_ia32_vfmaddpd (v2df, v2df, v2df); +v4sf __builtin_ia32_vfmaddps (v4sf, v4sf, v4sf); +v2df __builtin_ia32_vfmaddsd (v2df, v2df, v2df); +v4sf __builtin_ia32_vfmaddss (v4sf, v4sf, v4sf); +v2df __builtin_ia32_vfmsubpd (v2df, v2df, v2df); +v4sf __builtin_ia32_vfmsubps (v4sf, v4sf, v4sf); +v2df __builtin_ia32_vfmsubsd (v2df, v2df, v2df); +v4sf __builtin_ia32_vfmsubss (v4sf, v4sf, v4sf); +v2df __builtin_ia32_vfnmaddpd (v2df, v2df, v2df); +v4sf __builtin_ia32_vfnmaddps (v4sf, v4sf, v4sf); +v2df __builtin_ia32_vfnmaddsd (v2df, v2df, v2df); +v4sf __builtin_ia32_vfnmaddss (v4sf, v4sf, v4sf); +v2df __builtin_ia32_vfnmsubpd (v2df, v2df, v2df); +v4sf __builtin_ia32_vfnmsubps (v4sf, v4sf, v4sf); +v2df __builtin_ia32_vfnmsubsd (v2df, v2df, v2df); +v4sf __builtin_ia32_vfnmsubss (v4sf, v4sf, v4sf); +v2df __builtin_ia32_vfmaddsubpd (v2df, v2df, v2df); +v4sf __builtin_ia32_vfmaddsubps (v4sf, v4sf, v4sf); +v2df __builtin_ia32_vfmsubaddpd (v2df, v2df, v2df); +v4sf __builtin_ia32_vfmsubaddps (v4sf, v4sf, v4sf); +v4df __builtin_ia32_vfmaddpd256 (v4df, v4df, v4df); +v8sf __builtin_ia32_vfmaddps256 (v8sf, v8sf, v8sf); +v4df __builtin_ia32_vfmsubpd256 (v4df, v4df, v4df); +v8sf __builtin_ia32_vfmsubps256 (v8sf, v8sf, v8sf); +v4df __builtin_ia32_vfnmaddpd256 (v4df, v4df, v4df); +v8sf __builtin_ia32_vfnmaddps256 (v8sf, v8sf, v8sf); +v4df __builtin_ia32_vfnmsubpd256 (v4df, v4df, v4df); +v8sf __builtin_ia32_vfnmsubps256 (v8sf, v8sf, v8sf); +v4df __builtin_ia32_vfmaddsubpd256 (v4df, v4df, v4df); +v8sf __builtin_ia32_vfmaddsubps256 (v8sf, v8sf, v8sf); +v4df __builtin_ia32_vfmsubaddpd256 (v4df, v4df, v4df); +v8sf __builtin_ia32_vfmsubaddps256 (v8sf, v8sf, v8sf); -@item -@cite{6.4.1 Keywords} @end smallexample -Add @code{__thread}. +The following built-in functions are available when @option{-mlwp} is used. -@item -@cite{6.7.1 Storage-class specifiers} @smallexample +void __builtin_ia32_llwpcb16 (void *); +void __builtin_ia32_llwpcb32 (void *); +void __builtin_ia32_llwpcb64 (void *); +void * __builtin_ia32_slwpcb16 (void); +void * __builtin_ia32_slwpcb32 (void); +void * __builtin_ia32_slwpcb64 (void); +void __builtin_ia32_lwpval16 (unsigned short, unsigned int, unsigned short); +void __builtin_ia32_lwpval32 (unsigned int, unsigned int, unsigned int); +void __builtin_ia32_lwpval64 (unsigned __int64, unsigned int, unsigned int); +unsigned char __builtin_ia32_lwpins16 (unsigned short, unsigned int, unsigned short); +unsigned char __builtin_ia32_lwpins32 (unsigned int, unsigned int, unsigned int); +unsigned char __builtin_ia32_lwpins64 (unsigned __int64, unsigned int, unsigned int); +@end smallexample -Add @code{__thread} to the list of storage class specifiers in -paragraph 1. +The following built-in functions are available when @option{-mbmi} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +unsigned int __builtin_ia32_bextr_u32 (unsigned int, unsigned int); +unsigned long long __builtin_ia32_bextr_u64 (unsigned long long, unsigned long long); +@end smallexample -Change paragraph 2 to +The following built-in functions are available when @option{-mbmi2} is used. +All of them generate the machine instruction that is part of the name.
+@smallexample +unsigned int _bzhi_u32 (unsigned int, unsigned int); +unsigned int _pdep_u32 (unsigned int, unsigned int); +unsigned int _pext_u32 (unsigned int, unsigned int); +unsigned long long _bzhi_u64 (unsigned long long, unsigned long long); +unsigned long long _pdep_u64 (unsigned long long, unsigned long long); +unsigned long long _pext_u64 (unsigned long long, unsigned long long); +@end smallexample -@quotation -With the exception of @code{__thread}, at most one storage-class -specifier may be given [@dots{}]. The @code{__thread} specifier may -be used alone, or immediately following @code{extern} or -@code{static}. -@end quotation +The following built-in functions are available when @option{-mlzcnt} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +unsigned short __builtin_ia32_lzcnt_u16 (unsigned short); +unsigned int __builtin_ia32_lzcnt_u32 (unsigned int); +unsigned long long __builtin_ia32_lzcnt_u64 (unsigned long long); +@end smallexample -Add new text after paragraph 6 +The following built-in functions are available when @option{-mfxsr} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +void __builtin_ia32_fxsave (void *); +void __builtin_ia32_fxrstor (void *); +void __builtin_ia32_fxsave64 (void *); +void __builtin_ia32_fxrstor64 (void *); +@end smallexample -@quotation -The declaration of an identifier for a variable that has -block scope that specifies @code{__thread} shall also -specify either @code{extern} or @code{static}. +The following built-in functions are available when @option{-mxsave} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +void __builtin_ia32_xsave (void *, long long); +void __builtin_ia32_xrstor (void *, long long); +void __builtin_ia32_xsave64 (void *, long long); +void __builtin_ia32_xrstor64 (void *, long long); +@end smallexample -The @code{__thread} specifier shall be used only with -variables. -@end quotation -@end itemize +The following built-in functions are available when @option{-mxsaveopt} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +void __builtin_ia32_xsaveopt (void *, long long); +void __builtin_ia32_xsaveopt64 (void *, long long); +@end smallexample -@node C++98 Thread-Local Edits -@subsection ISO/IEC 14882:1998 Edits for Thread-Local Storage +The following built-in functions are available when @option{-mtbm} is used. +Both of them generate the immediate form of the @code{bextr} machine instruction. +@smallexample +unsigned int __builtin_ia32_bextri_u32 (unsigned int, + const unsigned int); +unsigned long long __builtin_ia32_bextri_u64 (unsigned long long, + const unsigned long long); +@end smallexample -The following are a set of changes to ISO/IEC 14882:1998 (aka C++98) -that document the exact semantics of the language extension. -@itemize @bullet -@item -@b{[intro.execution]} +The following built-in functions are available when @option{-m3dnow} is used. +All of them generate the machine instruction that is part of the name.
-New text after paragraph 4 +@smallexample +void __builtin_ia32_femms (void); +v8qi __builtin_ia32_pavgusb (v8qi, v8qi); +v2si __builtin_ia32_pf2id (v2sf); +v2sf __builtin_ia32_pfacc (v2sf, v2sf); +v2sf __builtin_ia32_pfadd (v2sf, v2sf); +v2si __builtin_ia32_pfcmpeq (v2sf, v2sf); +v2si __builtin_ia32_pfcmpge (v2sf, v2sf); +v2si __builtin_ia32_pfcmpgt (v2sf, v2sf); +v2sf __builtin_ia32_pfmax (v2sf, v2sf); +v2sf __builtin_ia32_pfmin (v2sf, v2sf); +v2sf __builtin_ia32_pfmul (v2sf, v2sf); +v2sf __builtin_ia32_pfrcp (v2sf); +v2sf __builtin_ia32_pfrcpit1 (v2sf, v2sf); +v2sf __builtin_ia32_pfrcpit2 (v2sf, v2sf); +v2sf __builtin_ia32_pfrsqrt (v2sf); +v2sf __builtin_ia32_pfsub (v2sf, v2sf); +v2sf __builtin_ia32_pfsubr (v2sf, v2sf); +v2sf __builtin_ia32_pi2fd (v2si); +v4hi __builtin_ia32_pmulhrw (v4hi, v4hi); +@end smallexample -@quotation -A @dfn{thread} is a flow of control within the abstract machine. -It is implementation defined whether or not there may be more than -one thread. -@end quotation +The following built-in functions are available when @option{-m3dnowa} is used. +All of them generate the machine instruction that is part of the name. -New text after paragraph 7 +@smallexample +v2si __builtin_ia32_pf2iw (v2sf); +v2sf __builtin_ia32_pfnacc (v2sf, v2sf); +v2sf __builtin_ia32_pfpnacc (v2sf, v2sf); +v2sf __builtin_ia32_pi2fw (v2si); +v2sf __builtin_ia32_pswapdsf (v2sf); +v2si __builtin_ia32_pswapdsi (v2si); +@end smallexample -@quotation -It is unspecified whether additional action must be taken to -ensure when and whether side effects are visible to other threads. -@end quotation +The following built-in functions are available when @option{-mrtm} is used. They are used for restricted transactional memory. These are the internal +low-level functions. Normally the functions in +@ref{x86 transactional memory intrinsics} should be used instead. -@item -@b{[lex.key]} +@smallexample +int __builtin_ia32_xbegin (); +void __builtin_ia32_xend (); +void __builtin_ia32_xabort (status); +int __builtin_ia32_xtest (); +@end smallexample -Add @code{__thread}. +The following built-in functions are available when @option{-mmwaitx} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +void __builtin_ia32_monitorx (void *, unsigned int, unsigned int); +void __builtin_ia32_mwaitx (unsigned int, unsigned int, unsigned int); +@end smallexample -@item -@b{[basic.start.main]} +The following built-in functions are available when @option{-mclzero} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +void __builtin_ia32_clzero (void *); +@end smallexample -Add after paragraph 5 +The following built-in functions are available when @option{-mpku} is used. +They generate reads and writes to PKRU. +@smallexample +void __builtin_ia32_wrpkru (unsigned int); +unsigned int __builtin_ia32_rdpkru (); +@end smallexample -@quotation -The thread that begins execution at the @code{main} function is called -the @dfn{main thread}. It is implementation defined how functions -beginning threads other than the main thread are designated or typed. -A function so designated, as well as the @code{main} function, is called -a @dfn{thread startup function}. It is implementation defined what -happens if a thread startup function returns. It is implementation -defined what happens to other threads when any thread calls @code{exit}. -@end quotation +The following built-in functions are available when +the @option{-mshstk} option is used.
They support shadow stack +machine instructions from Intel Control-flow Enforcement Technology (CET). +Each built-in function generates the machine instruction that is part +of the function's name. These are the internal low-level functions. +Normally the functions in @ref{x86 control-flow protection intrinsics} +should be used instead. -@item -@b{[basic.start.init]} +@smallexample +unsigned int __builtin_ia32_rdsspd (void); +unsigned long long __builtin_ia32_rdsspq (void); +void __builtin_ia32_incsspd (unsigned int); +void __builtin_ia32_incsspq (unsigned long long); +void __builtin_ia32_saveprevssp (void); +void __builtin_ia32_rstorssp (void *); +void __builtin_ia32_wrssd (unsigned int, void *); +void __builtin_ia32_wrssq (unsigned long long, void *); +void __builtin_ia32_wrussd (unsigned int, void *); +void __builtin_ia32_wrussq (unsigned long long, void *); +void __builtin_ia32_setssbsy (void); +void __builtin_ia32_clrssbsy (void *); +@end smallexample -Add after paragraph 4 +@node x86 transactional memory intrinsics +@subsection x86 Transactional Memory Intrinsics -@quotation -The storage for an object of thread storage duration shall be -statically initialized before the first statement of the thread startup -function. An object of thread storage duration shall not require -dynamic initialization. -@end quotation +These hardware transactional memory intrinsics for x86 allow you to use +memory transactions with RTM (Restricted Transactional Memory). +This support is enabled with the @option{-mrtm} option. +For using HLE (Hardware Lock Elision) see +@ref{x86 specific memory model extensions for transactional memory} instead. -@item -@b{[basic.start.term]} +A memory transaction commits all changes to memory in an atomic way, +as visible to other threads. If the transaction fails it is rolled back +and all side effects discarded. -Add after paragraph 3 +Generally there is no guarantee that a memory transaction ever succeeds +and suitable fallback code always needs to be supplied. -@quotation -The type of an object with thread storage duration shall not have a -non-trivial destructor, nor shall it be an array type whose elements -(directly or indirectly) have non-trivial destructors. -@end quotation +@deftypefn {RTM Function} {unsigned} _xbegin () +Start an RTM (Restricted Transactional Memory) transaction. +Returns @code{_XBEGIN_STARTED} when the transaction +started successfully (note this is not 0, so the constant has to be +explicitly tested). -@item -@b{[basic.stc]} +If the transaction aborts, all side effects +are undone and an abort code encoded as a bit mask is returned. +The following macros are defined: -Add ``thread storage duration'' to the list in paragraph 1. +@defmac{_XABORT_EXPLICIT} +Transaction was explicitly aborted with @code{_xabort}. The parameter passed +to @code{_xabort} is available with @code{_XABORT_CODE(status)}. +@end defmac -Change paragraph 2 +@defmac{_XABORT_RETRY} +Transaction retry is possible. +@end defmac -@quotation -Thread, static, and automatic storage durations are associated with -objects introduced by declarations [@dots{}]. -@end quotation +@defmac{_XABORT_CONFLICT} +Transaction abort due to a memory conflict with another thread. +@end defmac -Add @code{__thread} to the list of specifiers in paragraph 3. +@defmac{_XABORT_CAPACITY} +Transaction abort due to the transaction using too much memory. +@end defmac -@item -@b{[basic.stc.thread]} +@defmac{_XABORT_DEBUG} +Transaction abort due to a debug trap.
-A local variable or class data member declared both @code{static}
-and @code{__thread} gives the variable or member thread storage
-duration.
-@end quotation
+@deftypefn {RTM Function} {void} _xend ()
+Commit the current transaction. When no transaction is active, this faults.
+All memory side effects of the transaction become visible
+to other threads in an atomic manner.
+@end deftypefn
-@item
-@b{[basic.stc.static]}
+@deftypefn {RTM Function} {int} _xtest ()
+Return a nonzero value if a transaction is currently active, otherwise 0.
+@end deftypefn
-Change paragraph 1
+@deftypefn {RTM Function} {void} _xabort (status)
+Abort the current transaction. When no transaction is active, this is a no-op.
+The @var{status} is an 8-bit constant; its value is encoded in the return
+value from @code{_xbegin}.
+@end deftypefn
-@quotation
-All objects that have neither thread storage duration, dynamic
-storage duration nor are local [@dots{}].
-@end quotation
+Here is an example showing handling for @code{_XABORT_RETRY}
+and a fallback path for other failures:
-@item
-@b{[dcl.stc]}
+@smallexample
+#include <immintrin.h>
-Add @code{__thread} to the list in paragraph 1.
+int n_tries, max_tries;
+unsigned status = _XABORT_EXPLICIT;
+...
-Change paragraph 1
+for (n_tries = 0; n_tries < max_tries; n_tries++)
+  @{
+    status = _xbegin ();
+    if (status == _XBEGIN_STARTED || !(status & _XABORT_RETRY))
+      break;
+  @}
+if (status == _XBEGIN_STARTED)
+  @{
+    ... transaction code...
+    _xend ();
+  @}
+else
+  @{
+    ... non-transactional fallback path...
+  @}
+@end smallexample
-@quotation
-With the exception of @code{__thread}, at most one
-@var{storage-class-specifier} shall appear in a given
-@var{decl-specifier-seq}. The @code{__thread} specifier may
-be used alone, or immediately following the @code{extern} or
-@code{static} specifiers. [@dots{}]
-@end quotation
+@noindent
+Note that, in most cases, the transactional and non-transactional code
+must synchronize with each other to ensure consistency.
-Add after paragraph 5
+@node x86 control-flow protection intrinsics
+@subsection x86 Control-Flow Protection Intrinsics
-@quotation
-The @code{__thread} specifier can be applied only to the names of objects
-and to anonymous unions.
-@end quotation
+@deftypefn {CET Function} {ret_type} _get_ssp (void)
+Get the current value of the shadow stack pointer if shadow stack support
+from Intel CET is enabled in the hardware, or @code{0} otherwise.
+The @code{ret_type} is @code{unsigned long long} for 64-bit targets
+and @code{unsigned int} for 32-bit targets.
+@end deftypefn
-@item
-@b{[class.mem]}
+@deftypefn {CET Function} {void} _inc_ssp (unsigned int)
+Increment the current shadow stack pointer by the size specified by the
+function argument. The argument is masked to a byte value for security
+reasons, so to increment by more than 255 bytes you must call the function
+multiple times.
+@end deftypefn
-Add after paragraph 6
+The shadow stack unwind code looks like:
-@quotation
-Non-@code{static} members shall not be @code{__thread}.
-@end quotation
-@end itemize
+@smallexample
+#include <immintrin.h>
-@node OpenMP
-@section OpenMP
-@cindex OpenMP extension support
+/* Unwind the shadow stack for EH.  */
+#define _Unwind_Frames_Extra(x) \
+  do \
+    @{ \
+      _Unwind_Word ssp = _get_ssp (); \
+      if (ssp != 0) \
+        @{ \
+          _Unwind_Word tmp = (x); \
+          /* The argument to _inc_ssp is masked to a byte.  */ \
+          while (tmp > 255) \
+            @{ \
+              _inc_ssp (255); \
+              tmp -= 255; \
+            @} \
+          _inc_ssp (tmp); \
+        @} \
+    @} \
+  while (0)
+@end smallexample
-OpenMP (Open Multi-Processing) is an application programming
-interface (API) that supports multi-platform shared memory
-multiprocessing programming in C/C++ and Fortran on many
-architectures, including Unix and Microsoft Windows platforms.
-It consists of a set of compiler directives, library routines,
-and environment variables that influence run-time behavior.
+@noindent
+On 64-bit processors this code runs unconditionally; on 32-bit
+processors it runs on those that support multi-byte NOP instructions.
-GCC implements all of the @uref{https://www.openmp.org/specifications/,
-OpenMP Application Program Interface v4.5}, and many features from later
-versions of the OpenMP specification.
-@xref{OpenMP Implementation Status,,,libgomp,
-GNU Offloading and Multi Processing Runtime Library},
-for more details about currently supported OpenMP features.
+@node Target Format Checks
+@section Format Checks Specific to Particular Target Machines
-To enable the processing of OpenMP directives @samp{#pragma omp},
-@samp{[[omp::directive(...)]]}, @samp{[[omp::decl(...)]]},
-and @samp{[[omp::sequence(...)]]} in C and C++,
-GCC needs to be invoked with the @option{-fopenmp} option.
-This option also arranges for automatic linking of the OpenMP
-runtime library.
-@xref{,,,libgomp,GNU Offloading and Multi Processing Runtime Library}.
+For some target machines, GCC supports additional options to the
+format attribute
+(@pxref{Function Attributes,,Declaring Attributes of Functions}).
-@xref{OpenMP and OpenACC Options}, for additional options useful with
-@option{-fopenmp}.
+@menu
+* Solaris Format Checks::
+* Darwin Format Checks::
+@end menu
-@node OpenACC
-@section OpenACC
-@cindex OpenACC extension support
+@node Solaris Format Checks
+@subsection Solaris Format Checks
-OpenACC is an application programming interface (API) that supports
-offloading of code to accelerator devices. It consists of a set of
-compiler directives, library routines, and environment variables that
-influence run-time behavior.
+Solaris targets support the @code{cmn_err} (or @code{__cmn_err__}) format
+check. @code{cmn_err} accepts a subset of the standard @code{printf}
+conversions, and the two-argument @code{%b} conversion for displaying
+bit-fields. See the Solaris man page for @code{cmn_err} for more information.
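+
+As an illustrative sketch (the variable, the message text, and the
+bit names are invented; see the man page for the exact semantics of
+the bit-format string), a checked call might look like:
+
+@smallexample
+/* %b consumes two arguments: the value to print and a bit-format
+   string whose first character is the output base, followed by
+   octal bit positions, each with its name.  */
+cmn_err (CE_NOTE, "flags = %b", flags, "\020\1READ\2WRITE");
+@end smallexample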
-GCC strives to be compatible with the
-@uref{https://www.openacc.org/, OpenACC Application Programming
-Interface v2.6}.
+@node Darwin Format Checks
+@subsection Darwin Format Checks
-To enable the processing of OpenACC directives @samp{#pragma acc}
-in C and C++, GCC needs to be invoked with the @option{-fopenacc} option.
-This option also arranges for automatic linking of the OpenACC runtime
-library.
-@xref{,,,libgomp,GNU Offloading and Multi Processing Runtime Library}.
+In addition to the full set of format archetypes (attribute format style
+arguments such as @code{printf}, @code{scanf}, @code{strftime}, and
+@code{strfmon}), Darwin targets also support the @code{CFString} (or
+@code{__CFString__}) archetype in the @code{format} attribute.
+Declarations with this archetype are parsed for correct syntax
+and argument types. However, parsing of the format string itself and
+validating arguments against it in calls to such functions is currently
+not performed.
-@xref{OpenMP and OpenACC Options}, for additional options useful with
-@option{-fopenacc}.
+Additionally, @code{CFStringRefs} (defined by the @code{CoreFoundation}
+headers) may be used as format arguments. Note that the relevant headers
+are only likely to be available on Darwin (OSX) installations. On such
+installations, the Xcode and system documentation provide descriptions
+of @code{CFString}, @code{CFStringRefs}, and associated functions.
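+
+As an illustrative sketch (the @code{MyLog} function is hypothetical,
+and the @code{CoreFoundation} headers are assumed to be available), a
+declaration using this archetype might look like:
+
+@smallexample
+#include <CoreFoundation/CoreFoundation.h>
+
+/* The declaration is checked for correct syntax and argument types;
+   the format string itself is not validated against the calls.  */
+extern void MyLog (CFStringRef format, ...)
+  __attribute__ ((format (CFString, 1, 2)));
+@end smallexample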
@node C++ Extensions
@chapter Extensions to the C++ Language