From: Sandra Loosemore
Date: Thu, 13 Mar 2025 22:48:09 +0000 (+0000)
Subject: Doc: Rearrange remaining top-level sections in extend.texi [PR42270]
X-Git-Tag: basepoints/gcc-16~867
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=96492302a23c945d35fe1c83062da6f22c4f7b72;p=thirdparty%2Fgcc.git

Doc: Rearrange remaining top-level sections in extend.texi [PR42270]

This is part of an incremental effort to make the chapter on GCC
extensions better organized by grouping/rearranging sections by topic.

gcc/ChangeLog

	PR other/42270
	* doc/extend.texi (Nonlocal Gotos): Group with other built-ins
	sections.
	(Constructing Calls): Likewise.
	(Pragmas): Move earlier in the section, before the built-ins docs.
	(Thread-Local): Likewise.
	(OpenMP): Likewise.
	(OpenACC): Likewise.
---

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 59cad54d2cd..9f8a590a301 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -23,8 +23,6 @@ Some features that are in ISO C99 but not C90 or C++ are also, as
 extensions, accepted by GCC in C90 mode and in C++.
 
 @menu
-* Nonlocal Gotos:: Nonlocal gotos.
-* Constructing Calls:: Dispatching a call to another function.
 * Additional Numeric Types:: Additional sizes and formats, plus complex numbers.
 * Aggregate Types:: Extensions to arrays, structs, and unions.
 * Named Address Spaces:: Named address spaces.
@@ -36,11 +34,17 @@ extensions, accepted by GCC in C90 mode and in C++.
 * Function Properties:: Declaring that functions have no side effects, or that they can never return.
 * Enumerator Attributes:: Specifying attributes on enumerators.
 * Statement Attributes:: Specifying attributes on statements.
 * Attribute Syntax:: Formal syntax for attributes.
+* Pragmas:: Pragmas accepted by GCC.
+* Thread-Local:: Per-thread variables.
+* OpenMP:: Multiprocessing extensions.
+* OpenACC:: Extensions for offloading code to accelerator devices.
 * Inline:: Defining inline functions (as fast as macros).
 * Volatiles:: What constitutes an access to a volatile object.
 * Using Assembly Language with C:: Instructions and extensions for interfacing C with assembler.
 * Syntax Extensions:: Other extensions to C syntax.
 * Semantic Extensions:: GNU C defines behavior for some non-standard constructs.
+* Nonlocal Gotos:: Built-ins for nonlocal gotos.
+* Constructing Calls:: Built-ins for dispatching a call to another function.
 * Return Address:: Getting the return or frame address of a function.
 * Stack Scrubbing:: Stack scrubbing internal interfaces.
 * Vector Extensions:: Using vector instructions through built-in functions.
@@ -55,184 +59,8 @@ extensions, accepted by GCC in C90 mode and in C++.
 * Other Builtins:: Other built-in functions.
 * Target Builtins:: Built-in functions specific to particular targets.
 * Target Format Checks:: Format checks specific to particular targets.
-* Pragmas:: Pragmas accepted by GCC.
-* Thread-Local:: Per-thread variables.
-* OpenMP:: Multiprocessing extensions.
-* OpenACC:: Extensions for offloading code to accelerator devices.
 @end menu
 
-@node Nonlocal Gotos
-@section Nonlocal Gotos
-@cindex nonlocal gotos
-
-GCC provides the built-in functions @code{__builtin_setjmp} and
-@code{__builtin_longjmp} which are similar to, but not interchangeable
-with, the C library functions @code{setjmp} and @code{longjmp}.
-The built-in versions are used internally by GCC's libraries
-to implement exception handling on some targets. You should use the
-standard C library functions declared in @code{<setjmp.h>} in user code
-instead of the builtins.
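For ordinary user code, the portable pattern looks like this (a minimal
sketch using only the standard @code{<setjmp.h>} interface):

@smallexample
#include <setjmp.h>

static jmp_buf env;

static void
fail (void)
@{
  longjmp (env, 42);  /* Unwind back to the setjmp call below.  */
@}

int
f (void)
@{
  if (setjmp (env) == 0)  /* Returns 0 when called directly.  */
    fail ();              /* Control reenters setjmp, which then
                             returns 42.  */
  return 0;
@}
@end smallexample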
-
-The built-in versions of these functions use GCC's normal
-mechanisms to save and restore registers using the stack on function
-entry and exit. The jump buffer argument @var{buf} holds only the
-information needed to restore the stack frame, rather than the entire
-set of saved register values.
-
-An important caveat is that GCC arranges to save and restore only
-those registers known to the specific architecture variant being
-compiled for. This can make @code{__builtin_setjmp} and
-@code{__builtin_longjmp} more efficient than their library
-counterparts in some cases, but it can also cause incorrect and
-mysterious behavior when mixing with code that uses the full register
-set.
-
-You should declare the jump buffer argument @var{buf} to the
-built-in functions as:
-
-@smallexample
-#include <stdint.h>
-intptr_t @var{buf}[5];
-@end smallexample
-
-@defbuiltin{{int} __builtin_setjmp (intptr_t *@var{buf})}
-This function saves the current stack context in @var{buf}.
-@code{__builtin_setjmp} returns 0 when returning directly,
-and 1 when returning from @code{__builtin_longjmp} using the same
-@var{buf}.
-@enddefbuiltin
-
-@defbuiltin{{void} __builtin_longjmp (intptr_t *@var{buf}, int @var{val})}
-This function restores the stack context in @var{buf},
-saved by a previous call to @code{__builtin_setjmp}. After
-@code{__builtin_longjmp} is finished, the program resumes execution as
-if the matching @code{__builtin_setjmp} returns the value @var{val},
-which must be 1.
-
-Because @code{__builtin_longjmp} depends on the function return
-mechanism to restore the stack context, it cannot be called
-from the same function calling @code{__builtin_setjmp} to
-initialize @var{buf}. It can only be called from a function called
-(directly or indirectly) from the function calling @code{__builtin_setjmp}.
-@enddefbuiltin
-
-@node Constructing Calls
-@section Constructing Function Calls
-@cindex constructing calls
-@cindex forwarding calls
-
-Using the built-in functions described below, you can record
-the arguments a function received, and call another function
-with the same arguments, without knowing the number or types
-of the arguments.
-
-You can also record the return value of that function call,
-and later return that value, without knowing what data type
-the function tried to return (as long as your caller expects
-that data type).
-
-However, these built-in functions may interact badly with some
-sophisticated features or other extensions of the language. It
-is, therefore, not recommended to use them outside very simple
-functions acting as mere forwarders for their arguments.
-
-@defbuiltin{{void *} __builtin_apply_args ()}
-This built-in function returns a pointer to data
-describing how to perform a call with the same arguments as are passed
-to the current function.
-
-The function saves the arg pointer register, structure value address,
-and all registers that might be used to pass arguments to a function
-into a block of memory allocated on the stack. Then it returns the
-address of that block.
-@enddefbuiltin
-
-@defbuiltin{{void *} __builtin_apply (void (*@var{function})(), void *@var{arguments}, size_t @var{size})}
-This built-in function invokes @var{function}
-with a copy of the parameters described by @var{arguments}
-and @var{size}.
-
-The value of @var{arguments} should be the value returned by
-@code{__builtin_apply_args}. The argument @var{size} specifies the size
-of the stack argument data, in bytes.
- -This function returns a pointer to data describing -how to return whatever value is returned by @var{function}. The data -is saved in a block of memory allocated on the stack. - -It is not always simple to compute the proper value for @var{size}. The -value is used by @code{__builtin_apply} to compute the amount of data -that should be pushed on the stack and copied from the incoming argument -area. -@enddefbuiltin - -@defbuiltin{{void} __builtin_return (void *@var{result})} -This built-in function returns the value described by @var{result} from -the containing function. You should specify, for @var{result}, a value -returned by @code{__builtin_apply}. -@enddefbuiltin - -@defbuiltin{{} __builtin_va_arg_pack ()} -This built-in function represents all anonymous arguments of an inline -function. It can be used only in inline functions that are always -inlined, never compiled as a separate function, such as those using -@code{__attribute__ ((__always_inline__))} or -@code{__attribute__ ((__gnu_inline__))} extern inline functions. -It must be only passed as last argument to some other function -with variable arguments. This is useful for writing small wrapper -inlines for variable argument functions, when using preprocessor -macros is undesirable. For example: -@smallexample -extern int myprintf (FILE *f, const char *format, ...); -extern inline __attribute__ ((__gnu_inline__)) int -myprintf (FILE *f, const char *format, ...) -@{ - int r = fprintf (f, "myprintf: "); - if (r < 0) - return r; - int s = fprintf (f, format, __builtin_va_arg_pack ()); - if (s < 0) - return s; - return r + s; -@} -@end smallexample -@enddefbuiltin - -@defbuiltin{int __builtin_va_arg_pack_len ()} -This built-in function returns the number of anonymous arguments of -an inline function. It can be used only in inline functions that -are always inlined, never compiled as a separate function, such -as those using @code{__attribute__ ((__always_inline__))} or -@code{__attribute__ ((__gnu_inline__))} extern inline functions. -For example following does link- or run-time checking of open -arguments for optimized code: -@smallexample -#ifdef __OPTIMIZE__ -extern inline __attribute__((__gnu_inline__)) int -myopen (const char *path, int oflag, ...) -@{ - if (__builtin_va_arg_pack_len () > 1) - warn_open_too_many_arguments (); - - if (__builtin_constant_p (oflag)) - @{ - if ((oflag & O_CREAT) != 0 && __builtin_va_arg_pack_len () < 1) - @{ - warn_open_missing_mode (); - return __open_2 (path, oflag); - @} - return open (path, oflag, __builtin_va_arg_pack ()); - @} - - if (__builtin_va_arg_pack_len () < 1) - return __open_2 (path, oflag); - - return open (path, oflag, __builtin_va_arg_pack ()); -@} -#endif -@end smallexample -@enddefbuiltin - @node Additional Numeric Types @section Additional Numeric Types @@ -9751,19850 +9579,20022 @@ target type; if such an attribute is applied to a function return type that is not a pointer-to-function type, it is treated as applying to the function type. -@node Inline -@section An Inline Function is As Fast As a Macro -@cindex inline functions -@cindex integrating function code -@cindex open coding -@cindex macros, inline alternative +@node Pragmas +@section Pragmas Accepted by GCC +@cindex pragmas +@cindex @code{#pragma} -By declaring a function inline, you can direct GCC to make -calls to that function faster. One way GCC can achieve this is to -integrate that function's code into the code for its callers. 
This -makes execution faster by eliminating the function-call overhead; in -addition, if any of the actual argument values are constant, their -known values may permit simplifications at compile time so that not -all of the inline function's code needs to be included. The effect on -code size is less predictable; object code may be larger or smaller -with function inlining, depending on the particular case. You can -also direct GCC to try to integrate all ``simple enough'' functions -into their callers with the option @option{-finline-functions}. +GCC supports several types of pragmas, primarily in order to compile +code originally written for other compilers. Note that in general +we do not recommend the use of pragmas; @xref{Function Attributes}, +for further explanation. -GCC implements three different semantics of declaring a function -inline. One is available with @option{-std=gnu89} or -@option{-fgnu89-inline} or when @code{gnu_inline} attribute is present -on all inline declarations, another when -@option{-std=c99}, -@option{-std=gnu99} or an option for a later C version is used -(without @option{-fgnu89-inline}), and the third -is used when compiling C++. +The GNU C preprocessor recognizes several pragmas in addition to the +compiler pragmas documented here. Refer to the CPP manual for more +information. -To declare a function inline, use the @code{inline} keyword in its -declaration, like this: +GCC additionally recognizes OpenMP pragmas when the @option{-fopenmp} +option is specified, and OpenACC pragmas when the @option{-fopenacc} +option is specified. @xref{OpenMP}, and @ref{OpenACC}. + +@menu +* AArch64 Pragmas:: +* ARM Pragmas:: +* LoongArch Pragmas:: +* M32C Pragmas:: +* PRU Pragmas:: +* RS/6000 and PowerPC Pragmas:: +* S/390 Pragmas:: +* Darwin Pragmas:: +* Solaris Pragmas:: +* Symbol-Renaming Pragmas:: +* Structure-Layout Pragmas:: +* Weak Pragmas:: +* Diagnostic Pragmas:: +* Visibility Pragmas:: +* Push/Pop Macro Pragmas:: +* Function Specific Option Pragmas:: +* Loop-Specific Pragmas:: +@end menu + +@node AArch64 Pragmas +@subsection AArch64 Pragmas +The pragmas defined by the AArch64 target correspond to the AArch64 +target function attributes. They can be specified as below: @smallexample -static inline int -inc (int *a) -@{ - return (*a)++; -@} +#pragma GCC target("string") @end smallexample -If you are writing a header file to be included in ISO C90 programs, write -@code{__inline__} instead of @code{inline}. @xref{Alternate Keywords}. +where @code{@var{string}} can be any string accepted as an AArch64 target +attribute. @xref{AArch64 Function Attributes}, for more details +on the permissible values of @code{string}. -The three types of inlining behave similarly in two important cases: -when the @code{inline} keyword is used on a @code{static} function, -like the example above, and when a function is first declared without -using the @code{inline} keyword and then is defined with -@code{inline}, like this: +@node ARM Pragmas +@subsection ARM Pragmas -@smallexample -extern int inc (int *a); -inline int -inc (int *a) -@{ - return (*a)++; -@} -@end smallexample +The ARM target defines pragmas for controlling the default addition of +@code{long_call} and @code{short_call} attributes to functions. +@xref{Function Attributes}, for information about the effects of these +attributes. -In both of these common cases, the program behaves the same as if you -had not used the @code{inline} keyword, except for its speed. 
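Outside these two common cases the semantics diverge. As one
illustration (a sketch of the ISO C99 pattern, not anything specific to
GCC; the file names are hypothetical), a header can supply an inline
definition while exactly one source file requests the out-of-line,
externally visible definition:

@smallexample
/* inc.h */
inline int
inc (int *a)
@{
  return (*a)++;
@}

/* inc.c -- compiled with -std=c99 or later.  The extern declaration
   asks this translation unit to emit the external definition.  */
#include "inc.h"
extern inline int inc (int *a);
@end smallexample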
+@table @code +@cindex pragma, long_calls +@item long_calls +Set all subsequent functions to have the @code{long_call} attribute. -@cindex inline functions, omission of -@opindex fkeep-inline-functions -When a function is both inline and @code{static}, if all calls to the -function are integrated into the caller, and the function's address is -never used, then the function's own assembler code is never referenced. -In this case, GCC does not actually output assembler code for the -function, unless you specify the option @option{-fkeep-inline-functions}. -If there is a nonintegrated call, then the function is compiled to -assembler code as usual. The function must also be compiled as usual if -the program refers to its address, because that cannot be inlined. +@cindex pragma, no_long_calls +@item no_long_calls +Set all subsequent functions to have the @code{short_call} attribute. -@opindex Winline -Note that certain usages in a function definition can make it unsuitable -for inline substitution. Among these usages are: variadic functions, -use of @code{alloca}, use of computed goto (@pxref{Labels as Values}), -use of nonlocal goto, use of nested functions, use of @code{setjmp}, use -of @code{__builtin_longjmp} and use of @code{__builtin_return} or -@code{__builtin_apply_args}. Using @option{-Winline} warns when a -function marked @code{inline} could not be substituted, and gives the -reason for the failure. +@cindex pragma, long_calls_off +@item long_calls_off +Do not affect the @code{long_call} or @code{short_call} attributes of +subsequent functions. +@end table -@cindex automatic @code{inline} for C++ member fns -@cindex @code{inline} automatic for C++ member fns -@cindex member fns, automatically @code{inline} -@cindex C++ member fns, automatically @code{inline} -@opindex fno-default-inline -As required by ISO C++, GCC considers member functions defined within -the body of a class to be marked inline even if they are -not explicitly declared with the @code{inline} keyword. You can -override this with @option{-fno-default-inline}; @pxref{C++ Dialect -Options,,Options Controlling C++ Dialect}. +@node LoongArch Pragmas +@subsection LoongArch Pragmas -GCC does not inline any functions when not optimizing unless you specify -the @samp{always_inline} attribute for the function, like this: +The list of attributes supported by Pragma is the same as that of target +function attributes. @xref{LoongArch Function Attributes}. + +Example: @smallexample -/* @r{Prototype.} */ -inline void foo (const char) __attribute__((always_inline)); +#pragma GCC target("strict-align") @end smallexample -The remainder of this section is specific to GNU C90 inlining. +@node M32C Pragmas +@subsection M32C Pragmas -@cindex non-static inline function -When an inline function is not @code{static}, then the compiler must assume -that there may be calls from other source files; since a global symbol can -be defined only once in any program, the function must not be defined in -the other source files, so the calls therein cannot be integrated. -Therefore, a non-@code{static} inline function is always compiled on its -own in the usual fashion. +@table @code +@cindex pragma, memregs +@item GCC memregs @var{number} +Overrides the command-line option @code{-memregs=} for the current +file. Use with care! This pragma must be before any function in the +file, and mixing different memregs values in different objects may +make them incompatible. 
This pragma is useful when a +performance-critical function uses a memreg for temporary values, +as it may allow you to reduce the number of memregs used. -If you specify both @code{inline} and @code{extern} in the function -definition, then the definition is used only for inlining. In no case -is the function compiled on its own, not even if you refer to its -address explicitly. Such an address becomes an external reference, as -if you had only declared the function, and had not defined it. +@cindex pragma, address +@item ADDRESS @var{name} @var{address} +For any declared symbols matching @var{name}, this does three things +to that symbol: it forces the symbol to be located at the given +address (a number), it forces the symbol to be volatile, and it +changes the symbol's scope to be static. This pragma exists for +compatibility with other compilers, but note that the common +@code{1234H} numeric syntax is not supported (use @code{0x1234} +instead). Example: -This combination of @code{inline} and @code{extern} has almost the -effect of a macro. The way to use it is to put a function definition in -a header file with these keywords, and put another copy of the -definition (lacking @code{inline} and @code{extern}) in a library file. -The definition in the header file causes most calls to the function -to be inlined. If any uses of the function remain, they refer to -the single copy in the library. +@smallexample +#pragma ADDRESS port3 0x103 +char port3; +@end smallexample -@node Volatiles -@section When is a Volatile Object Accessed? -@cindex accessing volatiles -@cindex volatile read -@cindex volatile write -@cindex volatile access +@end table -C has the concept of volatile objects. These are normally accessed by -pointers and used for accessing hardware or inter-thread -communication. The standard encourages compilers to refrain from -optimizations concerning accesses to volatile objects, but leaves it -implementation defined as to what constitutes a volatile access. The -minimum requirement is that at a sequence point all previous accesses -to volatile objects have stabilized and no subsequent accesses have -occurred. Thus an implementation is free to reorder and combine -volatile accesses that occur between sequence points, but cannot do -so for accesses across a sequence point. The use of volatile does -not allow you to violate the restriction on updating objects multiple -times between two sequence points. +@node PRU Pragmas +@subsection PRU Pragmas -Accesses to non-volatile objects are not ordered with respect to -volatile accesses. You cannot use a volatile object as a memory -barrier to order a sequence of writes to non-volatile memory. For -instance: +@table @code + +@cindex pragma, ctable_entry +@item ctable_entry @var{index} @var{constant_address} +Specifies that the PRU CTABLE entry given by @var{index} has the value +@var{constant_address}. This enables GCC to emit LBCO/SBCO instructions +when the load/store address is known and can be addressed with some CTABLE +entry. For example: @smallexample -int *ptr = @var{something}; -volatile int vobj; -*ptr = @var{something}; -vobj = 1; +/* will compile to "sbco Rx, 2, 0x10, 4" */ +#pragma ctable_entry 2 0x4802a000 +*(unsigned int *)0x4802a010 = val; @end smallexample -@noindent -Unless @var{*ptr} and @var{vobj} can be aliased, it is not guaranteed -that the write to @var{*ptr} occurs by the time the update -of @var{vobj} happens. 
If you need this guarantee, you must use -a stronger memory barrier such as: +@end table -@smallexample -int *ptr = @var{something}; -volatile int vobj; -*ptr = @var{something}; -asm volatile ("" : : : "memory"); -vobj = 1; -@end smallexample +@node RS/6000 and PowerPC Pragmas +@subsection RS/6000 and PowerPC Pragmas -A scalar volatile object is read when it is accessed in a void context: +The RS/6000 and PowerPC targets define one pragma for controlling +whether or not the @code{longcall} attribute is added to function +declarations by default. This pragma overrides the @option{-mlongcall} +option, but not the @code{longcall} and @code{shortcall} attributes. +@xref{RS/6000 and PowerPC Options}, for more information about when long +calls are and are not necessary. -@smallexample -volatile int *src = @var{somevalue}; -*src; -@end smallexample +@table @code +@cindex pragma, longcall +@item longcall (1) +Apply the @code{longcall} attribute to all subsequent function +declarations. -Such expressions are rvalues, and GCC implements this as a -read of the volatile object being pointed to. +@item longcall (0) +Do not apply the @code{longcall} attribute to subsequent function +declarations. +@end table -Assignments are also expressions and have an rvalue. However when -assigning to a scalar volatile, the volatile object is not reread, -regardless of whether the assignment expression's rvalue is used or -not. If the assignment's rvalue is used, the value is that assigned -to the volatile object. For instance, there is no read of @var{vobj} -in all the following cases: +@c Describe h8300 pragmas here. +@c Describe sh pragmas here. +@c Describe v850 pragmas here. -@smallexample -int obj; -volatile int vobj; -vobj = @var{something}; -obj = vobj = @var{something}; -obj ? vobj = @var{onething} : vobj = @var{anotherthing}; -obj = (@var{something}, vobj = @var{anotherthing}); -@end smallexample +@node S/390 Pragmas +@subsection S/390 Pragmas -If you need to read the volatile object after an assignment has -occurred, you must use a separate expression with an intervening -sequence point. +The pragmas defined by the S/390 target correspond to the S/390 +target function attributes and some the additional options: -As bit-fields are not individually addressable, volatile bit-fields may -be implicitly read when written to, or when adjacent bit-fields are -accessed. Bit-field operations may be optimized such that adjacent -bit-fields are only partially accessed, if they straddle a storage unit -boundary. For these reasons it is unwise to use volatile bit-fields to -access hardware. +@table @samp +@item zvector +@itemx no-zvector +@end table -@node Using Assembly Language with C -@section How to Use Inline Assembly Language in C Code -@cindex @code{asm} keyword -@cindex assembly language in C -@cindex inline assembly language -@cindex mixing assembly language and C +Note that options of the pragma, unlike options of the target +attribute, do change the value of preprocessor macros like +@code{__VEC__}. They can be specified as below: -The @code{asm} keyword allows you to embed assembler instructions -within C code. GCC provides two forms of inline @code{asm} -statements. A @dfn{basic @code{asm}} statement is one with no -operands (@pxref{Basic Asm}), while an @dfn{extended @code{asm}} -statement (@pxref{Extended Asm}) includes one or more operands. -The extended form is preferred for mixing C and assembly language -within a function and can be used at top level as well with certain -restrictions. 
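For instance, a file-scope basic @code{asm} can emit raw assembler
directives (a sketch; it assumes the GNU assembler and an ELF-style
target where C symbols carry no prefix):

@smallexample
/* At file scope, outside any function.  */
__asm__ (".globl answer\n"
         ".data\n"
         "answer: .long 42\n"
         ".text");

extern int answer;  /* The symbol defined above, usable from C.  */
@end smallexample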
+@smallexample +#pragma GCC target("string[,string]...") +#pragma GCC target("string"[,"string"]...) +@end smallexample -You can also use the @code{asm} keyword to override the assembler name -for a C symbol, or to place a C variable in a specific register. +@node Darwin Pragmas +@subsection Darwin Pragmas -@menu -* Basic Asm:: Inline assembler without operands. -* Extended Asm:: Inline assembler with operands. -* Constraints:: Constraints for @code{asm} operands -* Asm constexprs:: C++11 constant expressions instead of string - literals. -* Asm Labels:: Specifying the assembler name to use for a C symbol. -* Explicit Register Variables:: Defining variables residing in specified - registers. -* Size of an asm:: How GCC calculates the size of an @code{asm} block. -@end menu +The following pragmas are available for all architectures running the +Darwin operating system. These are useful for compatibility with other +macOS compilers. -@node Basic Asm -@subsection Basic Asm --- Assembler Instructions Without Operands -@cindex basic @code{asm} -@cindex assembly language in C, basic +@table @code +@cindex pragma, mark +@item mark @var{tokens}@dots{} +This pragma is accepted, but has no effect. -A basic @code{asm} statement has the following syntax: +@cindex pragma, options align +@item options align=@var{alignment} +This pragma sets the alignment of fields in structures. The values of +@var{alignment} may be @code{mac68k}, to emulate m68k alignment, or +@code{power}, to emulate PowerPC alignment. Uses of this pragma nest +properly; to restore the previous setting, use @code{reset} for the +@var{alignment}. -@example -asm @var{asm-qualifiers} ( @var{AssemblerInstructions} ) -@end example +@cindex pragma, segment +@item segment @var{tokens}@dots{} +This pragma is accepted, but has no effect. -For the C language, the @code{asm} keyword is a GNU extension. -When writing C code that can be compiled with @option{-ansi} and the -@option{-std} options that select C dialects without GNU extensions, use -@code{__asm__} instead of @code{asm} (@pxref{Alternate Keywords}). For -the C++ language, @code{asm} is a standard keyword, but @code{__asm__} -can be used for code compiled with @option{-fno-asm}. +@cindex pragma, unused +@item unused (@var{var} [, @var{var}]@dots{}) +This pragma declares variables to be possibly unused. GCC does not +produce warnings for the listed variables. The effect is similar to +that of the @code{unused} attribute, except that this pragma may appear +anywhere within the variables' scopes. +@end table + +@node Solaris Pragmas +@subsection Solaris Pragmas + +The Solaris target supports @code{#pragma redefine_extname} +(@pxref{Symbol-Renaming Pragmas}). It also supports additional +@code{#pragma} directives for compatibility with the system compiler. -@subsubheading Qualifiers @table @code -@item volatile -The optional @code{volatile} qualifier has no effect. -All basic @code{asm} blocks are implicitly volatile. -Basic @code{asm} statements outside of functions may not use any -qualifiers. +@cindex pragma, align +@item align @var{alignment} (@var{variable} [, @var{variable}]...) -@item inline -If you use the @code{inline} qualifier, then for inlining purposes the size -of the @code{asm} statement is taken as the smallest size possible (@pxref{Size -of an asm}). -@end table +Increase the minimum alignment of each @var{variable} to @var{alignment}. +This is the same as GCC's @code{aligned} attribute @pxref{Variable +Attributes}). 
Macro expansion occurs on the arguments to this pragma +when compiling C and Objective-C@. It does not currently occur when +compiling C++, but this is a bug which may be fixed in a future +release. -@subsubheading Parameters -@table @var +@cindex pragma, fini +@item fini (@var{function} [, @var{function}]...) -@item AssemblerInstructions -This is a literal string that specifies the assembler code. -In C++ with @option{-std=gnu++11} or later, it can -also be a constant expression inside parentheses (see @ref{Asm constexprs}). +This pragma causes each listed @var{function} to be called after +main, or during shared module unloading, by adding a call to the +@code{.fini} section. -The string can contain any instructions recognized by the assembler, -including directives. GCC does not parse the assembler instructions -themselves and does not know what they mean or even whether they are -valid assembler input. +@cindex pragma, init +@item init (@var{function} [, @var{function}]...) + +This pragma causes each listed @var{function} to be called during +initialization (before @code{main}) or during shared module loading, by +adding a call to the @code{.init} section. -You may place multiple assembler instructions together in a single @code{asm} -string, separated by the characters normally used in assembly code for the -system. A combination that works in most places is a newline to break the -line, plus a tab character (written as @samp{\n\t}). -Some assemblers allow semicolons as a line separator. However, -note that some assembler dialects use semicolons to start a comment. @end table -@subsubheading Remarks -Using extended @code{asm} (@pxref{Extended Asm}) typically produces -smaller, safer, and more efficient code, and in most cases it is a -better solution than basic @code{asm}. However, functions declared -with the @code{naked} attribute require only basic @code{asm} -(@pxref{Function Attributes}). +@node Symbol-Renaming Pragmas +@subsection Symbol-Renaming Pragmas -Basic @code{asm} statements may be used both inside a C function or at -file scope (``top-level''), where you can use this technique to emit -assembler directives, define assembly language macros that can be invoked -elsewhere in the file, or write entire functions in assembly language. +GCC supports a @code{#pragma} directive that changes the name used in +assembly for a given declaration. While this pragma is supported on all +platforms, it is intended primarily to provide compatibility with the +Solaris system headers. This effect can also be achieved using the asm +labels extension (@pxref{Asm Labels}). -Safely accessing C data and calling functions from basic @code{asm} is more -complex than it may appear. To access C data, it is better to use extended -@code{asm}. +@table @code +@cindex pragma, redefine_extname +@item redefine_extname @var{oldname} @var{newname} -Do not expect a sequence of @code{asm} statements to remain perfectly -consecutive after compilation. If certain instructions need to remain -consecutive in the output, put them in a single multi-instruction @code{asm} -statement. Note that GCC's optimizers can move @code{asm} statements -relative to other code, including across jumps. +This pragma gives the C function @var{oldname} the assembly symbol +@var{newname}. The preprocessor macro @code{__PRAGMA_REDEFINE_EXTNAME} +is defined if this pragma is available (currently on all platforms). +@end table -@code{asm} statements may not perform jumps into other @code{asm} statements. 
-GCC does not know about these jumps, and therefore cannot take -account of them when deciding how to optimize. Jumps from @code{asm} to C -labels are only supported in extended @code{asm}. +This pragma and the @code{asm} labels extension interact in a complicated +manner. Here are some corner cases you may want to be aware of: -Under certain circumstances, GCC may duplicate (or remove duplicates of) your -assembly code when optimizing. This can lead to unexpected duplicate -symbol errors during compilation if your assembly code defines symbols or -labels. +@enumerate +@item This pragma silently applies only to declarations with external +linkage. The @code{asm} label feature does not have this restriction. -@strong{Warning:} The C standards do not specify semantics for @code{asm}, -making it a potential source of incompatibilities between compilers. These -incompatibilities may not produce compiler warnings/errors. +@item In C++, this pragma silently applies only to declarations with +``C'' linkage. Again, @code{asm} labels do not have this restriction. -GCC does not parse basic @code{asm}'s @var{AssemblerInstructions}, which -means there is no way to communicate to the compiler what is happening -inside them. GCC has no visibility of symbols in the @code{asm} and may -discard them as unreferenced. It also does not know about side effects of -the assembler code, such as modifications to memory or registers. Unlike -some compilers, GCC assumes that no changes to general purpose registers -occur. This assumption may change in a future release. +@item If either of the ways of changing the assembly name of a +declaration are applied to a declaration whose assembly name has +already been determined (either by a previous use of one of these +features, or because the compiler needed the assembly name in order to +generate code), and the new name is different, a warning issues and +the name does not change. -To avoid complications from future changes to the semantics and the -compatibility issues between compilers, consider replacing basic @code{asm} -with extended @code{asm}. See -@uref{https://gcc.gnu.org/wiki/ConvertBasicAsmToExtended, How to convert -from basic asm to extended asm} for information about how to perform this -conversion. +@item The @var{oldname} used by @code{#pragma redefine_extname} is +always the C-language name. +@end enumerate -The compiler copies the assembler instructions in a basic @code{asm} -verbatim to the assembly language output file, without -processing dialects or any of the @samp{%} operators that are available with -extended @code{asm}. This results in minor differences between basic -@code{asm} strings and extended @code{asm} templates. For example, to refer to -registers you might use @samp{%eax} in basic @code{asm} and -@samp{%%eax} in extended @code{asm}. +@node Structure-Layout Pragmas +@subsection Structure-Layout Pragmas -On targets such as x86 that support multiple assembler dialects, -all basic @code{asm} blocks use the assembler dialect specified by the -@option{-masm} command-line option (@pxref{x86 Options}). -Basic @code{asm} provides no -mechanism to provide different assembler strings for different dialects. +For compatibility with Microsoft Windows compilers, GCC supports a +set of @code{#pragma} directives that change the maximum alignment of +members of structures (other than zero-width bit-fields), unions, and +classes subsequently defined. 
The @var{n} value below always is required +to be a small power of two and specifies the new alignment in bytes. -For basic @code{asm} with non-empty assembler string GCC assumes -the assembler block does not change any general purpose registers, -but it may read or write any globally accessible variable. +@enumerate +@item @code{#pragma pack(@var{n})} simply sets the new alignment. +@item @code{#pragma pack()} sets the alignment to the one that was in +effect when compilation started (see also command-line option +@option{-fpack-struct[=@var{n}]} @pxref{Code Gen Options}). +@item @code{#pragma pack(push[,@var{n}])} pushes the current alignment +setting on an internal stack and then optionally sets the new alignment. +@item @code{#pragma pack(pop)} restores the alignment setting to the one +saved at the top of the internal stack (and removes that stack entry). +Note that @code{#pragma pack([@var{n}])} does not influence this internal +stack; thus it is possible to have @code{#pragma pack(push)} followed by +multiple @code{#pragma pack(@var{n})} instances and finalized by a single +@code{#pragma pack(pop)}. +@end enumerate -Here is an example of basic @code{asm} for i386: +Some targets, e.g.@: x86 and PowerPC, support the @code{#pragma ms_struct} +directive which lays out structures and unions subsequently defined as the +documented @code{__attribute__ ((ms_struct))}. -@example -/* Note that this code will not compile with -masm=intel */ -#define DebugBreak() asm("int $3") -@end example +@enumerate +@item @code{#pragma ms_struct on} turns on the Microsoft layout. +@item @code{#pragma ms_struct off} turns off the Microsoft layout. +@item @code{#pragma ms_struct reset} goes back to the default layout. +@end enumerate -@node Extended Asm -@subsection Extended Asm - Assembler Instructions with C Expression Operands -@cindex extended @code{asm} -@cindex assembly language in C, extended +Most targets also support the @code{#pragma scalar_storage_order} directive +which lays out structures and unions subsequently defined as the documented +@code{__attribute__ ((scalar_storage_order))}. -With extended @code{asm} you can read and write C variables from -assembler and perform jumps from assembler code to C labels. -Extended @code{asm} syntax uses colons (@samp{:}) to delimit -the operand parameters after the assembler template: +@enumerate +@item @code{#pragma scalar_storage_order big-endian} sets the storage order +of the scalar fields to big-endian. +@item @code{#pragma scalar_storage_order little-endian} sets the storage order +of the scalar fields to little-endian. +@item @code{#pragma scalar_storage_order default} goes back to the endianness +that was in effect when compilation started (see also command-line option +@option{-fsso-struct=@var{endianness}} @pxref{C Dialect Options}). +@end enumerate -@example -asm @var{asm-qualifiers} ( @var{AssemblerTemplate} - : @var{OutputOperands} - @r{[} : @var{InputOperands} - @r{[} : @var{Clobbers} @r{]} @r{]}) +@node Weak Pragmas +@subsection Weak Pragmas -asm @var{asm-qualifiers} ( @var{AssemblerTemplate} - : @var{OutputOperands} - : @var{InputOperands} - : @var{Clobbers} - : @var{GotoLabels}) -@end example -where in the last form, @var{asm-qualifiers} contains @code{goto} (and in the -first form, not). +For compatibility with SVR4, GCC supports a set of @code{#pragma} +directives for declaring symbols to be weak, and defining weak +aliases. -The @code{asm} keyword is a GNU extension. 
-When writing code that can be compiled with @option{-ansi} and the -various @option{-std} options, use @code{__asm__} instead of -@code{asm} (@pxref{Alternate Keywords}). - -@subsubheading Qualifiers @table @code +@cindex pragma, weak +@item #pragma weak @var{symbol} +This pragma declares @var{symbol} to be weak, as if the declaration +had the attribute of the same name. The pragma may appear before +or after the declaration of @var{symbol}. It is not an error for +@var{symbol} to never be defined at all. -@item volatile -The typical use of extended @code{asm} statements is to manipulate input -values to produce output values. However, your @code{asm} statements may -also produce side effects. If so, you may need to use the @code{volatile} -qualifier to disable certain optimizations. @xref{Volatile}. +@item #pragma weak @var{symbol1} = @var{symbol2} +This pragma declares @var{symbol1} to be a weak alias of @var{symbol2}. +It is an error if @var{symbol2} is not defined in the current +translation unit. +@end table -@item inline -If you use the @code{inline} qualifier, then for inlining purposes the size -of the @code{asm} statement is taken as the smallest size possible -(@pxref{Size of an asm}). +@node Diagnostic Pragmas +@subsection Diagnostic Pragmas -@item goto -This qualifier informs the compiler that the @code{asm} statement may -perform a jump to one of the labels listed in the @var{GotoLabels}. -@xref{GotoLabels}. -@end table +GCC allows the user to selectively enable or disable certain types of +diagnostics, and change the kind of the diagnostic. For example, a +project's policy might require that all sources compile with +@option{-Werror} but certain files might have exceptions allowing +specific types of warnings. Or, a project might selectively enable +diagnostics and treat them as errors depending on which preprocessor +macros are defined. -@subsubheading Parameters -@table @var -@item AssemblerTemplate -This is a literal string that is the template for the assembler code. It is a -combination of fixed text and tokens that refer to the input, output, -and goto parameters. @xref{AssemblerTemplate}. +@table @code +@cindex pragma, diagnostic +@item #pragma GCC diagnostic @var{kind} @var{option} -@item OutputOperands -A comma-separated list describing the C variables modified by the -instructions in the @var{AssemblerTemplate}. An empty list is permitted. -@xref{OutputOperands}. +Modifies the disposition of a diagnostic. Note that not all +diagnostics are modifiable; at the moment only warnings (normally +controlled by @samp{-W@dots{}}) can be controlled, and not all of them. +Use @option{-fdiagnostics-show-option} to determine which diagnostics +are controllable and which option controls them. -@item InputOperands -A comma-separated list describing the C expressions read by the -instructions in the @var{AssemblerTemplate}. An empty list is permitted. -@xref{InputOperands}. +@var{kind} is @samp{error} to treat this diagnostic as an error, +@samp{warning} to treat it like a warning (even if @option{-Werror} is +in effect), or @samp{ignored} if the diagnostic is to be ignored. +@var{option} is a double quoted string that matches the command-line +option. -@item Clobbers -A comma-separated list of registers or other values changed by the -@var{AssemblerTemplate}, beyond those listed as outputs. -An empty list is permitted. @xref{Clobbers and Scratch Registers}. 
+@smallexample +#pragma GCC diagnostic warning "-Wformat" +#pragma GCC diagnostic error "-Wformat" +#pragma GCC diagnostic ignored "-Wformat" +@end smallexample -@item GotoLabels -When you are using the @code{goto} form of @code{asm}, this section contains -the list of all C labels to which the code in the -@var{AssemblerTemplate} may jump. -@xref{GotoLabels}. +Note that these pragmas override any command-line options. GCC keeps +track of the location of each pragma, and issues diagnostics according +to the state as of that point in the source file. Thus, pragmas occurring +after a line do not affect diagnostics caused by that line. -@code{asm} statements may not perform jumps into other @code{asm} statements, -only to the listed @var{GotoLabels}. -GCC's optimizers do not know about other jumps; therefore they cannot take -account of them when deciding how to optimize. -@end table +@item #pragma GCC diagnostic push +@itemx #pragma GCC diagnostic pop -The total number of input + output + goto operands is limited to 30. +Causes GCC to remember the state of the diagnostics as of each +@code{push}, and restore to that point at each @code{pop}. If a +@code{pop} has no matching @code{push}, the command-line options are +restored. -@subsubheading Remarks -The @code{asm} statement allows you to include assembly instructions directly -within C code. This may help you to maximize performance in time-sensitive -code or to access assembly instructions that are not readily available to C -programs. +@smallexample +#pragma GCC diagnostic error "-Wuninitialized" + foo(a); /* error is given for this one */ +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wuninitialized" + foo(b); /* no diagnostic for this one */ +#pragma GCC diagnostic pop + foo(c); /* error is given for this one */ +#pragma GCC diagnostic pop + foo(d); /* depends on command-line options */ +@end smallexample -Similarly to basic @code{asm}, extended @code{asm} statements may be used -both inside a C function or at file scope (``top-level''), where you can -use this technique to emit assembler directives, define assembly language -macros that can be invoked elsewhere in the file, or write entire functions -in assembly language. -Extended @code{asm} statements outside of functions may not use any -qualifiers, may not specify clobbers, may not use @code{%}, @code{+} or -@code{&} modifiers in constraints and can only use constraints which don't -allow using any register. +@item #pragma GCC diagnostic ignored_attributes -Functions declared with the @code{naked} attribute require basic -@code{asm} (@pxref{Function Attributes}). +Similarly to @option{-Wno-attributes=}, this pragma allows users to suppress +warnings about unknown scoped attributes (in C++11 and C23). For example, +@code{#pragma GCC diagnostic ignored_attributes "vendor::attr"} disables +warning about the following declaration: -While the uses of @code{asm} are many and varied, it may help to think of an -@code{asm} statement as a series of low-level instructions that convert input -parameters to output parameters. 
So a simple (if not particularly useful) -example for i386 using @code{asm} might look like this: +@smallexample +[[vendor::attr]] void f(); +@end smallexample -@example -int src = 1; -int dst; +whereas @code{#pragma GCC diagnostic ignored_attributes "vendor::"} prevents +warning about both of these declarations: -asm ("mov %1, %0\n\t" - "add $1, %0" - : "=r" (dst) - : "r" (src)); +@smallexample +[[vendor::safe]] void f(); +[[vendor::unsafe]] void f2(); +@end smallexample -printf("%d\n", dst); -@end example +@end table -This code copies @code{src} to @code{dst} and add 1 to @code{dst}. +GCC also offers a simple mechanism for printing messages during +compilation. -@anchor{Volatile} -@subsubsection Volatile -@cindex volatile @code{asm} -@cindex @code{asm} volatile +@table @code +@cindex pragma, diagnostic +@item #pragma message @var{string} -GCC's optimizers sometimes discard @code{asm} statements if they determine -there is no need for the output variables. Also, the optimizers may move -code out of loops if they believe that the code will always return the same -result (i.e.@: none of its input values change between calls). Using the -@code{volatile} qualifier disables these optimizations. @code{asm} statements -that have no output operands and @code{asm goto} statements, -are implicitly volatile. +Prints @var{string} as a compiler message on compilation. The message +is informational only, and is neither a compilation warning nor an +error. Newlines can be included in the string by using the @samp{\n} +escape sequence. -This i386 code demonstrates a case that does not use (or require) the -@code{volatile} qualifier. If it is performing assertion checking, this code -uses @code{asm} to perform the validation. Otherwise, @code{dwRes} is -unreferenced by any code. As a result, the optimizers can discard the -@code{asm} statement, which in turn removes the need for the entire -@code{DoCheck} routine. By omitting the @code{volatile} qualifier when it -isn't needed you allow the optimizers to produce the most efficient code -possible. +@smallexample +#pragma message "Compiling " __FILE__ "..." +@end smallexample -@example -void DoCheck(uint32_t dwSomeValue) -@{ - uint32_t dwRes; +@var{string} may be parenthesized, and is printed with location +information. For example, - // Assumes dwSomeValue is not zero. - asm ("bsfl %1,%0" - : "=r" (dwRes) - : "r" (dwSomeValue) - : "cc"); +@smallexample +#define DO_PRAGMA(x) _Pragma (#x) +#define TODO(x) DO_PRAGMA(message ("TODO - " #x)) - assert(dwRes > 3); -@} -@end example +TODO(Remember to fix this) +@end smallexample -The next example shows a case where the optimizers can recognize that the input -(@code{dwSomeValue}) never changes during the execution of the function and can -therefore move the @code{asm} outside the loop to produce more efficient code. -Again, using the @code{volatile} qualifier disables this type of optimization. +@noindent +prints @samp{/tmp/file.c:4: note: #pragma message: +TODO - Remember to fix this}. -@example -void do_print(uint32_t dwSomeValue) -@{ - uint32_t dwRes; +@cindex pragma, diagnostic +@item #pragma GCC error @var{message} +Generates an error message. This pragma @emph{is} considered to +indicate an error in the compilation, and it will be treated as such. - for (uint32_t x=0; x < 5; x++) - @{ - // Assumes dwSomeValue is not zero. - asm ("bsfl %1,%0" - : "=r" (dwRes) - : "r" (dwSomeValue) - : "cc"); +Newlines can be included in the string by using the @samp{\n} +escape sequence. 
They will be displayed as newlines even if the +@option{-fmessage-length} option is set to zero. - printf("%u: %u %u\n", x, dwSomeValue, dwRes); - @} +The error is only generated if the pragma is present in the code after +pre-processing has been completed. It does not matter however if the +code containing the pragma is unreachable: + +@smallexample +#if 0 +#pragma GCC error "this error is not seen" +#endif +void foo (void) +@{ + return; +#pragma GCC error "this error is seen" @} -@end example +@end smallexample -The following example demonstrates a case where you need to use the -@code{volatile} qualifier. -It uses the x86 @code{rdtsc} instruction, which reads -the computer's time-stamp counter. Without the @code{volatile} qualifier, -the optimizers might assume that the @code{asm} block will always return the -same value and therefore optimize away the second call. +@cindex pragma, diagnostic +@item #pragma GCC warning @var{message} +This is just like @samp{pragma GCC error} except that a warning +message is issued instead of an error message. Unless +@option{-Werror} is in effect, in which case this pragma will generate +an error as well. -@example -uint64_t msr; +@end table -asm volatile ( "rdtsc\n\t" // Returns the time in EDX:EAX. - "shl $32, %%rdx\n\t" // Shift the upper bits left. - "or %%rdx, %0" // 'Or' in the lower bits. - : "=a" (msr) - : - : "rdx"); - -printf("msr: %llx\n", msr); +@node Visibility Pragmas +@subsection Visibility Pragmas -// Do other work... +@table @code +@cindex pragma, visibility +@item #pragma GCC visibility push(@var{visibility}) +@itemx #pragma GCC visibility pop -// Reprint the timestamp -asm volatile ( "rdtsc\n\t" // Returns the time in EDX:EAX. - "shl $32, %%rdx\n\t" // Shift the upper bits left. - "or %%rdx, %0" // 'Or' in the lower bits. - : "=a" (msr) - : - : "rdx"); +This pragma allows the user to set the visibility for multiple +declarations without having to give each a visibility attribute +(@pxref{Function Attributes}). -printf("msr: %llx\n", msr); -@end example +In C++, @samp{#pragma GCC visibility} affects only namespace-scope +declarations. Class members and template specializations are not +affected; if you want to override the visibility for a particular +member or instantiation, you must use an attribute. -GCC's optimizers do not treat this code like the non-volatile code in the -earlier examples. They do not move it out of loops or omit it on the -assumption that the result from a previous call is still valid. +@end table -Note that the compiler can move even @code{volatile asm} instructions relative -to other code, including across jump instructions. For example, on many -targets there is a system register that controls the rounding mode of -floating-point operations. Setting it with a @code{volatile asm} statement, -as in the following PowerPC example, does not work reliably. -@example -asm volatile("mtfsf 255, %0" : : "f" (fpenv)); -sum = x + y; -@end example +@node Push/Pop Macro Pragmas +@subsection Push/Pop Macro Pragmas -The compiler may move the addition back before the @code{volatile asm} -statement. To make it work as expected, add an artificial dependency to -the @code{asm} by referencing a variable in the subsequent code, for -example: +For compatibility with Microsoft Windows compilers, GCC supports +@samp{#pragma push_macro(@var{"macro_name"})} +and @samp{#pragma pop_macro(@var{"macro_name"})}. 
-@example -asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv)); -sum = x + y; -@end example +@table @code +@cindex pragma, push_macro +@item #pragma push_macro(@var{"macro_name"}) +This pragma saves the value of the macro named as @var{macro_name} to +the top of the stack for this macro. -Under certain circumstances, GCC may duplicate (or remove duplicates of) your -assembly code when optimizing. This can lead to unexpected duplicate symbol -errors during compilation if your @code{asm} code defines symbols or labels. -Using @samp{%=} -(@pxref{AssemblerTemplate}) may help resolve this problem. +@cindex pragma, pop_macro +@item #pragma pop_macro(@var{"macro_name"}) +This pragma sets the value of the macro named as @var{macro_name} to +the value on top of the stack for this macro. If the stack for +@var{macro_name} is empty, the value of the macro remains unchanged. +@end table -@anchor{AssemblerTemplate} -@subsubsection Assembler Template -@cindex @code{asm} assembler template +For example: -An assembler template is a literal string containing assembler instructions. -In C++ with @option{-std=gnu++11} or later, the assembler template can -also be a constant expression inside parentheses (see @ref{Asm constexprs}). +@smallexample +#define X 1 +#pragma push_macro("X") +#undef X +#define X -1 +#pragma pop_macro("X") +int x [X]; +@end smallexample -The compiler replaces tokens in the template that refer -to inputs, outputs, and goto labels, -and then outputs the resulting string to the assembler. The -string can contain any instructions recognized by the assembler, including -directives. GCC does not parse the assembler instructions -themselves and does not know what they mean or even whether they are valid -assembler input. However, it does count the statements -(@pxref{Size of an asm}). +@noindent +In this example, the definition of X as 1 is saved by @code{#pragma +push_macro} and restored by @code{#pragma pop_macro}. -You may place multiple assembler instructions together in a single @code{asm} -string, separated by the characters normally used in assembly code for the -system. A combination that works in most places is a newline to break the -line, plus a tab character to move to the instruction field (written as -@samp{\n\t}). -Some assemblers allow semicolons as a line separator. However, note -that some assembler dialects use semicolons to start a comment. +@node Function Specific Option Pragmas +@subsection Function Specific Option Pragmas -Do not expect a sequence of @code{asm} statements to remain perfectly -consecutive after compilation, even when you are using the @code{volatile} -qualifier. If certain instructions need to remain consecutive in the output, -put them in a single multi-instruction @code{asm} statement. +@table @code +@cindex pragma GCC target +@item #pragma GCC target (@var{string}, @dots{}) -Accessing data from C programs without using input/output operands (such as -by using global symbols directly from the assembler template) may not work as -expected. Similarly, calling functions directly from an assembler template -requires a detailed understanding of the target assembler and ABI. +This pragma allows you to set target-specific options for functions +defined later in the source file. One or more strings can be +specified. Each function that is defined after this point is treated +as if it had been declared with one @code{target(}@var{string}@code{)} +attribute for each @var{string} argument. The parentheses around +the strings in the pragma are optional. 
@xref{Function Attributes}, +for more information about the @code{target} attribute and the attribute +syntax. -Since GCC does not parse the assembler template, -it has no visibility of any -symbols it references. This may result in GCC discarding those symbols as -unreferenced unless they are also listed as input, output, or goto operands. +The @code{#pragma GCC target} pragma is presently implemented for +x86, ARM, AArch64, PowerPC, and S/390 targets only. -@subsubheading Special format strings +@cindex pragma GCC optimize +@item #pragma GCC optimize (@var{string}, @dots{}) -In addition to the tokens described by the input, output, and goto operands, -these tokens have special meanings in the assembler template: +This pragma allows you to set global optimization options for functions +defined later in the source file. One or more strings can be +specified. Each function that is defined after this point is treated +as if it had been declared with one @code{optimize(}@var{string}@code{)} +attribute for each @var{string} argument. The parentheses around +the strings in the pragma are optional. @xref{Function Attributes}, +for more information about the @code{optimize} attribute and the attribute +syntax. -@table @samp -@item %% -Outputs a single @samp{%} into the assembler code. +@cindex pragma GCC push_options +@cindex pragma GCC pop_options +@item #pragma GCC push_options +@itemx #pragma GCC pop_options -@item %= -Outputs a number that is unique to each instance of the @code{asm} -statement in the entire compilation. This option is useful when creating local -labels and referring to them multiple times in a single template that -generates multiple assembler instructions. +These pragmas maintain a stack of the current target and optimization +options. It is intended for include files where you temporarily want +to switch to using a different @samp{#pragma GCC target} or +@samp{#pragma GCC optimize} and then to pop back to the previous +options. -@item %@{ -@itemx %| -@itemx %@} -Outputs @samp{@{}, @samp{|}, and @samp{@}} characters (respectively) -into the assembler code. When unescaped, these characters have special -meaning to indicate multiple assembler dialects, as described below. -@end table +@cindex pragma GCC reset_options +@item #pragma GCC reset_options -@subsubheading Multiple assembler dialects in @code{asm} templates +This pragma clears the current @code{#pragma GCC target} and +@code{#pragma GCC optimize} to use the default switches as specified +on the command line. -On targets such as x86, GCC supports multiple assembler dialects. -The @option{-masm} option controls which dialect GCC uses as its -default for inline assembler. The target-specific documentation for the -@option{-masm} option contains the list of supported dialects, as well as the -default dialect if the option is not specified. This information may be -important to understand, since assembler code that works correctly when -compiled using one dialect will likely fail if compiled using another. -@xref{x86 Options}. +@end table -If your code needs to support multiple assembler dialects (for example, if -you are writing public headers that need to support a variety of compilation -options), use constructs of this form: +@node Loop-Specific Pragmas +@subsection Loop-Specific Pragmas -@example -@{ dialect0 | dialect1 | dialect2... 
-@end example

+@table @code
+@cindex pragma GCC ivdep
+@item #pragma GCC ivdep

-This construct outputs @code{dialect0}
-when using dialect #0 to compile the code,
-@code{dialect1} for dialect #1, etc. If there are fewer alternatives within the
-braces than the number of dialects the compiler supports, the construct
-outputs nothing.

+With this pragma, the programmer asserts that there are no loop-carried
+dependencies that would prevent consecutive iterations of
+the following loop from executing concurrently with SIMD
+(single instruction multiple data) instructions.

-For example, if an x86 compiler supports two dialects
-(@samp{att}, @samp{intel}), an
-assembler template such as this:

+For example, only with the pragma can the compiler unconditionally
+vectorize the following loop:

-@example
-"bt@{l %[Offset],%[Base] | %[Base],%[Offset]@}; jc %l2"
-@end example

+@smallexample
+void foo (int n, int *a, int *b, int *c)
+@{
+  int i;
+#pragma GCC ivdep
+  for (i = 0; i < n; ++i)
+    a[i] = b[i] + c[i];
+@}
+@end smallexample

@noindent
-is equivalent to one of

-@example
-"btl %[Offset],%[Base] ; jc %l2" @r{/* att dialect */}
-"bt %[Base],%[Offset]; jc %l2" @r{/* intel dialect */}
-@end example

+In this example, using the @code{restrict} qualifier would have the same
+effect. In the following example, that would not be possible. Assume
+@math{k < -m} or @math{k >= m}. Only with the pragma does the compiler
+know that it can unconditionally vectorize the following loop:

-Using that same compiler, this code:

+@smallexample
+void ignore_vec_dep (int *a, int k, int c, int m)
+@{
+#pragma GCC ivdep
+  for (int i = 0; i < m; i++)
+    a[i] = a[i + k] * c;
+@}
+@end smallexample

-@example
-"xchg@{l@}\t@{%%@}ebx, %1"
-@end example

+@cindex pragma GCC novector
+@item #pragma GCC novector

-@noindent
-corresponds to either

+With this pragma, the programmer asserts that the following loop must
+not be executed concurrently with SIMD (single instruction multiple
+data) instructions.

-@example
-"xchgl\t%%ebx, %1" @r{/* att dialect */}
-"xchg\tebx, %1" @r{/* intel dialect */}
-@end example

+For example, with the pragma, the compiler does not vectorize the
+following loop:

-There is no support for nesting dialect alternatives.

+@smallexample
+void foo (int n, int *a, int *b, int *c)
+@{
+  int i;
+#pragma GCC novector
+  for (i = 0; i < n; ++i)
+    a[i] = b[i] + c[i];
+@}
+@end smallexample

-@anchor{OutputOperands}
-@subsubsection Output Operands
-@cindex @code{asm} output operands

+@cindex pragma GCC unroll @var{n}
+@item #pragma GCC unroll @var{n}

-An @code{asm} statement has zero or more output operands indicating the names
-of C variables modified by the assembler code.

+You can use this pragma to control how many times a loop should be unrolled.
+It must be placed immediately before a @code{for}, @code{while} or @code{do}
+loop or a @code{#pragma GCC ivdep}, and applies only to the loop that follows.
+@var{n} is an integer constant expression specifying the unrolling factor.
+The values of @math{0} and @math{1} block any unrolling of the loop.

-In this i386 example, @code{old} (referred to in the template string as
-@code{%0}) and @code{*Base} (as @code{%1}) are outputs and @code{Offset}
-(@code{%2}) is an input:

+@end table

-@example
-bool old;

+@node Thread-Local
+@section Thread-Local Storage
+@cindex Thread-Local Storage
+@cindex @acronym{TLS}
+@cindex @code{__thread}

-__asm__ ("btsl %2,%1\n\t" // Turn on zero-based bit #Offset in Base.
-         "sbb %0,%0"      // Use the CF to calculate old.
-         : "=r" (old), "+rm" (*Base)
-         : "Ir" (Offset)
-         : "cc");

+Thread-local storage (@acronym{TLS}) is a mechanism by which variables
+are allocated such that there is one instance of the variable per extant
+thread. The runtime model GCC uses to implement this originates
+in the IA-64 processor-specific ABI, but has since been migrated
+to other processors as well. It requires significant support from
+the linker (@command{ld}), dynamic linker (@command{ld.so}), and
+system libraries (@file{libc.so} and @file{libpthread.so}), so it
+is not available everywhere.

-return old;
-@end example

+At the user level, the extension is visible with a new storage
+class keyword: @code{__thread}. For example:

-Operands are separated by commas. Each operand has this format:

+@smallexample
+__thread int i;
+extern __thread struct state s;
+static __thread char *p;
+@end smallexample

-@example
-@r{[} [@var{asmSymbolicName}] @r{]} @var{constraint} (@var{cvariablename})
-@end example

+The @code{__thread} specifier may be used alone, with the @code{extern}
+or @code{static} specifiers, but with no other storage class specifier.
+When used with @code{extern} or @code{static}, @code{__thread} must appear
+immediately after the other storage class specifier.

-@table @var
-@item asmSymbolicName
-Specifies an optional symbolic name for the operand. The literal square
-brackets @samp{[]} around the @var{asmSymbolicName} are required both
-in the operand specification and references to the operand in the assembler
-template, i.e.@: @samp{%[Value]}.
-The scope of the name is the @code{asm} statement
-that contains the definition. Any valid C identifier is acceptable,
-including names already defined in the surrounding code. No two operands
-within the same @code{asm} statement can use the same symbolic name.

+The @code{__thread} specifier may be applied to any global, file-scoped
+static, function-scoped static, or static data member of a class. It may
+not be applied to a block-scoped automatic variable or a non-static data
+member.

-When not using an @var{asmSymbolicName}, use the (zero-based) position
-of the operand
-in the list of operands in the assembler template. For example if there are
-three output operands, use @samp{%0} in the template to refer to the first,
-@samp{%1} for the second, and @samp{%2} for the third.

+When the address-of operator is applied to a thread-local variable, it is
+evaluated at run time and returns the address of the current thread's
+instance of that variable. An address so obtained may be used by any
+thread. When a thread terminates, any pointers to thread-local variables
+in that thread become invalid.

-@item constraint
-A string constant specifying constraints on the placement of the operand;
-@xref{Constraints}, for details.
-In C++ with @option{-std=gnu++11} or later, the constraint can
-also be a constant expression inside parentheses (see @ref{Asm constexprs}).

+No static initialization may refer to the address of a thread-local variable.

-Output constraints must begin with either @samp{=} (a variable overwriting an
-existing value) or @samp{+} (when reading and writing). When using
-@samp{=}, do not assume the location contains the existing value
-on entry to the @code{asm}, except
-when the operand is tied to an input; @pxref{InputOperands,,Input Operands}.

+In C++, if an initializer is present for a thread-local variable, it must
+be a @var{constant-expression}, as defined in 5.19.2 of the ANSI/ISO C++
+standard.
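+For illustration, here is a minimal sketch of these rules (the variable
+names are hypothetical):
+
+@smallexample
+__thread int val = 42;   /* @r{OK: constant initializer.} */
+int *p = &val;           /* @r{Error: a static initializer may not} */
+                         /* @r{refer to a thread-local address.} */
+
+int *
+get_val_ptr (void)
+@{
+  return &val;   /* @r{OK: evaluated at run time; yields the address} */
+                 /* @r{of the current thread's instance.} */
+@}
+@end smallexample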
-After the prefix, there must be one or more additional constraints -(@pxref{Constraints}) that describe where the value resides. Common -constraints include @samp{r} for register and @samp{m} for memory. -When you list more than one possible location (for example, @code{"=rm"}), -the compiler chooses the most efficient one based on the current context. -If you list as many alternates as the @code{asm} statement allows, you permit -the optimizers to produce the best possible code. -If you must use a specific register, but your Machine Constraints do not -provide sufficient control to select the specific register you want, -local register variables may provide a solution (@pxref{Local Register -Variables}). +See @uref{https://www.akkadia.org/drepper/tls.pdf, +ELF Handling For Thread-Local Storage} for a detailed explanation of +the four thread-local storage addressing models, and how the runtime +is expected to function. -@item cvariablename -Specifies a C lvalue expression to hold the output, typically a variable name. -The enclosing parentheses are a required part of the syntax. +@menu +* C99 Thread-Local Edits:: +* C++98 Thread-Local Edits:: +@end menu -@end table +@node C99 Thread-Local Edits +@subsection ISO/IEC 9899:1999 Edits for Thread-Local Storage -When the compiler selects the registers to use to -represent the output operands, it does not use any of the clobbered registers -(@pxref{Clobbers and Scratch Registers}). +The following are a set of changes to ISO/IEC 9899:1999 (aka C99) +that document the exact semantics of the language extension. -Output operand expressions must be lvalues. The compiler cannot check whether -the operands have data types that are reasonable for the instruction being -executed. For output expressions that are not directly addressable (for -example a bit-field), the constraint must allow a register. In that case, GCC -uses the register as the output of the @code{asm}, and then stores that -register into the output. +@itemize @bullet +@item +@cite{5.1.2 Execution environments} -Operands using the @samp{+} constraint modifier count as two operands -(that is, both as input and output) towards the total maximum of 30 operands -per @code{asm} statement. +Add new text after paragraph 1 -Use the @samp{&} constraint modifier (@pxref{Modifiers}) on all output -operands that must not overlap an input. Otherwise, -GCC may allocate the output operand in the same register as an unrelated -input operand, on the assumption that the assembler code consumes its -inputs before producing outputs. This assumption may be false if the assembler -code actually consists of more than one instruction. +@quotation +Within either execution environment, a @dfn{thread} is a flow of +control within a program. It is implementation defined whether +or not there may be more than one thread associated with a program. +It is implementation defined how threads beyond the first are +created, the name and type of the function called at thread +startup, and how threads may be terminated. However, objects +with thread storage duration shall be initialized before thread +startup. +@end quotation -The same problem can occur if one output parameter (@var{a}) allows a register -constraint and another output parameter (@var{b}) allows a memory constraint. -The code generated by GCC to access the memory address in @var{b} can contain -registers which @emph{might} be shared by @var{a}, and GCC considers those -registers to be inputs to the asm. 
As above, GCC assumes that such input -registers are consumed before any outputs are written. This assumption may -result in incorrect behavior if the @code{asm} statement writes to @var{a} -before using -@var{b}. Combining the @samp{&} modifier with the register constraint on @var{a} -ensures that modifying @var{a} does not affect the address referenced by -@var{b}. Otherwise, the location of @var{b} -is undefined if @var{a} is modified before using @var{b}. +@item +@cite{6.2.4 Storage durations of objects} -@code{asm} supports operand modifiers on operands (for example @samp{%k2} -instead of simply @samp{%2}). @ref{GenericOperandmodifiers, -Generic Operand modifiers} lists the modifiers that are available -on all targets. Other modifiers are hardware dependent. -For example, the list of supported modifiers for x86 is found at -@ref{x86Operandmodifiers,x86 Operand modifiers}. +Add new text before paragraph 3 -If the C code that follows the @code{asm} makes no use of any of the output -operands, use @code{volatile} for the @code{asm} statement to prevent the -optimizers from discarding the @code{asm} statement as unneeded -(see @ref{Volatile}). +@quotation +An object whose identifier is declared with the storage-class +specifier @w{@code{__thread}} has @dfn{thread storage duration}. +Its lifetime is the entire execution of the thread, and its +stored value is initialized only once, prior to thread startup. +@end quotation -This code makes no use of the optional @var{asmSymbolicName}. Therefore it -references the first output operand as @code{%0} (were there a second, it -would be @code{%1}, etc). The number of the first input operand is one greater -than that of the last output operand. In this i386 example, that makes -@code{Mask} referenced as @code{%1}: +@item +@cite{6.4.1 Keywords} -@example -uint32_t Mask = 1234; -uint32_t Index; +Add @code{__thread}. - asm ("bsfl %1, %0" - : "=r" (Index) - : "r" (Mask) - : "cc"); -@end example +@item +@cite{6.7.1 Storage-class specifiers} -That code overwrites the variable @code{Index} (@samp{=}), -placing the value in a register (@samp{r}). -Using the generic @samp{r} constraint instead of a constraint for a specific -register allows the compiler to pick the register to use, which can result -in more efficient code. This may not be possible if an assembler instruction -requires a specific register. +Add @code{__thread} to the list of storage class specifiers in +paragraph 1. -The following i386 example uses the @var{asmSymbolicName} syntax. -It produces the -same result as the code above, but some may consider it more readable or more -maintainable since reordering index numbers is not necessary when adding or -removing operands. The names @code{aIndex} and @code{aMask} -are only used in this example to emphasize which -names get used where. -It is acceptable to reuse the names @code{Index} and @code{Mask}. +Change paragraph 2 to -@example -uint32_t Mask = 1234; -uint32_t Index; +@quotation +With the exception of @code{__thread}, at most one storage-class +specifier may be given [@dots{}]. The @code{__thread} specifier may +be used alone, or immediately following @code{extern} or +@code{static}. +@end quotation - asm ("bsfl %[aMask], %[aIndex]" - : [aIndex] "=r" (Index) - : [aMask] "r" (Mask) - : "cc"); -@end example +Add new text after paragraph 6 -Here are some more examples of output operands. 
+@quotation +The declaration of an identifier for a variable that has +block scope that specifies @code{__thread} shall also +specify either @code{extern} or @code{static}. -@example -uint32_t c = 1; -uint32_t d; -uint32_t *e = &c; +The @code{__thread} specifier shall be used only with +variables. +@end quotation +@end itemize -asm ("mov %[e], %[d]" - : [d] "=rm" (d) - : [e] "rm" (*e)); -@end example +@node C++98 Thread-Local Edits +@subsection ISO/IEC 14882:1998 Edits for Thread-Local Storage -Here, @code{d} may either be in a register or in memory. Since the compiler -might already have the current value of the @code{uint32_t} location -pointed to by @code{e} -in a register, you can enable it to choose the best location -for @code{d} by specifying both constraints. +The following are a set of changes to ISO/IEC 14882:1998 (aka C++98) +that document the exact semantics of the language extension. -@anchor{FlagOutputOperands} -@subsubsection Flag Output Operands -@cindex @code{asm} flag output operands +@itemize @bullet +@item +@b{[intro.execution]} -Some targets have a special register that holds the ``flags'' for the -result of an operation or comparison. Normally, the contents of that -register are either unmodified by the asm, or the @code{asm} statement is -considered to clobber the contents. +New text after paragraph 4 -On some targets, a special form of output operand exists by which -conditions in the flags register may be outputs of the asm. The set of -conditions supported are target specific, but the general rule is that -the output variable must be a scalar integer, and the value is boolean. -When supported, the target defines the preprocessor symbol -@code{__GCC_ASM_FLAG_OUTPUTS__}. +@quotation +A @dfn{thread} is a flow of control within the abstract machine. +It is implementation defined whether or not there may be more than +one thread. +@end quotation -Because of the special nature of the flag output operands, the constraint -may not include alternatives. +New text after paragraph 7 -Most often, the target has only one flags register, and thus is an implied -operand of many instructions. In this case, the operand should not be -referenced within the assembler template via @code{%0} etc, as there's -no corresponding text in the assembly language. +@quotation +It is unspecified whether additional action must be taken to +ensure when and whether side effects are visible to other threads. +@end quotation -@table @asis -@item ARM -@itemx AArch64 -The flag output constraints for the ARM family are of the form -@samp{=@@cc@var{cond}} where @var{cond} is one of the standard -conditions defined in the ARM ARM for @code{ConditionHolds}. +@item +@b{[lex.key]} -@table @code -@item eq -Z flag set, or equal -@item ne -Z flag clear or not equal -@item cs -@itemx hs -C flag set or unsigned greater than equal -@item cc -@itemx lo -C flag clear or unsigned less than -@item mi -N flag set or ``minus'' -@item pl -N flag clear or ``plus'' -@item vs -V flag set or signed overflow -@item vc -V flag clear -@item hi -unsigned greater than -@item ls -unsigned less than equal -@item ge -signed greater than equal -@item lt -signed less than -@item gt -signed greater than -@item le -signed less than equal -@end table +Add @code{__thread}. -The flag output constraints are not supported in thumb1 mode. 
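+As an illustrative sketch, assuming an AArch64 target and a compiler
+version with flag-output support, a comparison result can be captured
+directly:
+
+@smallexample
+int a = 1, b = 2, eq;
+asm ("cmp %w1, %w2"
+     : "=@@cceq" (eq)
+     : "r" (a), "r" (b));
+/* @r{eq is now 1 if a == b, and 0 otherwise.} */
+@end smallexample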
+@item +@b{[basic.start.main]} -@item x86 family -The flag output constraints for the x86 family are of the form -@samp{=@@cc@var{cond}} where @var{cond} is one of the standard -conditions defined in the ISA manual for @code{j@var{cc}} or -@code{set@var{cc}}. +Add after paragraph 5 -@table @code -@item a -``above'' or unsigned greater than -@item ae -``above or equal'' or unsigned greater than or equal -@item b -``below'' or unsigned less than -@item be -``below or equal'' or unsigned less than or equal -@item c -carry flag set -@item e -@itemx z -``equal'' or zero flag set -@item g -signed greater than -@item ge -signed greater than or equal -@item l -signed less than -@item le -signed less than or equal -@item o -overflow flag set -@item p -parity flag set -@item s -sign flag set -@item na -@itemx nae -@itemx nb -@itemx nbe -@itemx nc -@itemx ne -@itemx ng -@itemx nge -@itemx nl -@itemx nle -@itemx no -@itemx np -@itemx ns -@itemx nz -``not'' @var{flag}, or inverted versions of those above -@end table +@quotation +The thread that begins execution at the @code{main} function is called +the @dfn{main thread}. It is implementation defined how functions +beginning threads other than the main thread are designated or typed. +A function so designated, as well as the @code{main} function, is called +a @dfn{thread startup function}. It is implementation defined what +happens if a thread startup function returns. It is implementation +defined what happens to other threads when any thread calls @code{exit}. +@end quotation -@item s390 -The flag output constraint for s390 is @samp{=@@cc}. Only one such -constraint is allowed. The variable has to be stored in a @samp{int} -variable. +@item +@b{[basic.start.init]} -@end table +Add after paragraph 4 -@anchor{InputOperands} -@subsubsection Input Operands -@cindex @code{asm} input operands -@cindex @code{asm} expressions +@quotation +The storage for an object of thread storage duration shall be +statically initialized before the first statement of the thread startup +function. An object of thread storage duration shall not require +dynamic initialization. +@end quotation -Input operands make values from C variables and expressions available to the -assembly code. +@item +@b{[basic.start.term]} -Operands are separated by commas. Each operand has this format: +Add after paragraph 3 -@example -@r{[} [@var{asmSymbolicName}] @r{]} @var{constraint} (@var{cexpression}) -@end example +@quotation +The type of an object with thread storage duration shall not have a +non-trivial destructor, nor shall it be an array type whose elements +(directly or indirectly) have non-trivial destructors. +@end quotation -@table @var -@item asmSymbolicName -Specifies an optional symbolic name for the operand. The literal square -brackets @samp{[]} around the @var{asmSymbolicName} are required both -in the operand specification and references to the operand in the assembler -template, i.e.@: @samp{%[Value]}. -The scope of the name is the @code{asm} statement -that contains the definition. Any valid C identifier is acceptable, -including names already defined in the surrounding code. No two operands -within the same @code{asm} statement can use the same symbolic name. +@item +@b{[basic.stc]} -When not using an @var{asmSymbolicName}, use the (zero-based) position -of the operand -in the list of operands in the assembler template. 
For example if there are -two output operands and three inputs, -use @samp{%2} in the template to refer to the first input operand, -@samp{%3} for the second, and @samp{%4} for the third. +Add ``thread storage duration'' to the list in paragraph 1. -@item constraint -A string constant specifying constraints on the placement of the operand; -@xref{Constraints}, for details. -In C++ with @option{-std=gnu++11} or later, the constraint can -also be a constant expression inside parentheses (see @ref{Asm constexprs}). +Change paragraph 2 -Input constraint strings may not begin with either @samp{=} or @samp{+}. -When you list more than one possible location (for example, @samp{"irm"}), -the compiler chooses the most efficient one based on the current context. -If you must use a specific register, but your Machine Constraints do not -provide sufficient control to select the specific register you want, -local register variables may provide a solution (@pxref{Local Register -Variables}). +@quotation +Thread, static, and automatic storage durations are associated with +objects introduced by declarations [@dots{}]. +@end quotation -Input constraints can also be digits (for example, @code{"0"}). This indicates -that the specified input must be in the same place as the output constraint -at the (zero-based) index in the output constraint list. -When using @var{asmSymbolicName} syntax for the output operands, -you may use these names (enclosed in brackets @samp{[]}) instead of digits. +Add @code{__thread} to the list of specifiers in paragraph 3. -@item cexpression -This is the C variable or expression being passed to the @code{asm} statement -as input. The enclosing parentheses are a required part of the syntax. +@item +@b{[basic.stc.thread]} -@end table +New section before @b{[basic.stc.static]} -When the compiler selects the registers to use to represent the input -operands, it does not use any of the clobbered registers -(@pxref{Clobbers and Scratch Registers}). +@quotation +The keyword @code{__thread} applied to a non-local object gives the +object thread storage duration. -If there are no output operands but there are input operands, place two -consecutive colons where the output operands would go: +A local variable or class data member declared both @code{static} +and @code{__thread} gives the variable or member thread storage +duration. +@end quotation -@example -__asm__ ("some instructions" - : /* No outputs. */ - : "r" (Offset / 8)); -@end example +@item +@b{[basic.stc.static]} -@strong{Warning:} Do @emph{not} modify the contents of input-only operands -(except for inputs tied to outputs). The compiler assumes that on exit from -the @code{asm} statement these operands contain the same values as they -had before executing the statement. -It is @emph{not} possible to use clobbers -to inform the compiler that the values in these inputs are changing. One -common work-around is to tie the changing input variable to an output variable -that never gets used. Note, however, that if the code that follows the -@code{asm} statement makes no use of any of the output operands, the GCC -optimizers may discard the @code{asm} statement as unneeded -(see @ref{Volatile}). +Change paragraph 1 -@code{asm} supports operand modifiers on operands (for example @samp{%k2} -instead of simply @samp{%2}). @ref{GenericOperandmodifiers, -Generic Operand modifiers} lists the modifiers that are available -on all targets. Other modifiers are hardware dependent. 
-For example, the list of supported modifiers for x86 is found at -@ref{x86Operandmodifiers,x86 Operand modifiers}. +@quotation +All objects that have neither thread storage duration, dynamic +storage duration nor are local [@dots{}]. +@end quotation -In this example using the fictitious @code{combine} instruction, the -constraint @code{"0"} for input operand 1 says that it must occupy the same -location as output operand 0. Only input operands may use numbers in -constraints, and they must each refer to an output operand. Only a number (or -the symbolic assembler name) in the constraint can guarantee that one operand -is in the same place as another. The mere fact that @code{foo} is the value of -both operands is not enough to guarantee that they are in the same place in -the generated assembler code. +@item +@b{[dcl.stc]} -@example -asm ("combine %2, %0" - : "=r" (foo) - : "0" (foo), "g" (bar)); -@end example +Add @code{__thread} to the list in paragraph 1. -Here is an example using symbolic names. +Change paragraph 1 -@example -asm ("cmoveq %1, %2, %[result]" - : [result] "=r"(result) - : "r" (test), "r" (new), "[result]" (old)); -@end example +@quotation +With the exception of @code{__thread}, at most one +@var{storage-class-specifier} shall appear in a given +@var{decl-specifier-seq}. The @code{__thread} specifier may +be used alone, or immediately following the @code{extern} or +@code{static} specifiers. [@dots{}] +@end quotation -@anchor{Clobbers and Scratch Registers} -@subsubsection Clobbers and Scratch Registers -@cindex @code{asm} clobbers -@cindex @code{asm} scratch registers +Add after paragraph 5 -While the compiler is aware of changes to entries listed in the output -operands, the inline @code{asm} code may modify more than just the outputs. For -example, calculations may require additional registers, or the processor may -overwrite a register as a side effect of a particular assembler instruction. -In order to inform the compiler of these changes, list them in the clobber -list. Clobber list items are either register names or the special clobbers -(listed below). Each clobber list item is a string constant -enclosed in double quotes and separated by commas. -In C++ with @option{-std=gnu++11} or later, a clobber list item can -also be a constant expression inside parentheses (see @ref{Asm constexprs}). +@quotation +The @code{__thread} specifier can be applied only to the names of objects +and to anonymous unions. +@end quotation -Clobber descriptions may not in any way overlap with an input or output -operand. For example, you may not have an operand describing a register class -with one member when listing that register in the clobber list. Variables -declared to live in specific registers (@pxref{Explicit Register -Variables}) and used -as @code{asm} input or output operands must have no part mentioned in the -clobber description. In particular, there is no way to specify that input -operands get modified without also specifying them as output operands. +@item +@b{[class.mem]} -When the compiler selects which registers to use to represent input and output -operands, it does not use any of the clobbered registers. As a result, -clobbered registers are available for any use in the assembler code. +Add after paragraph 6 -Another restriction is that the clobber list should not contain the -stack pointer register. This is because the compiler requires the -value of the stack pointer to be the same after an @code{asm} -statement as it was on entry to the statement. 
However, previous -versions of GCC did not enforce this rule and allowed the stack -pointer to appear in the list, with unclear semantics. This behavior -is deprecated and listing the stack pointer may become an error in -future versions of GCC@. +@quotation +Non-@code{static} members shall not be @code{__thread}. +@end quotation +@end itemize -Here is a realistic example for the VAX showing the use of clobbered -registers: +@node OpenMP +@section OpenMP +@cindex OpenMP extension support -@example -asm volatile ("movc3 %0, %1, %2" - : /* No outputs. */ - : "g" (from), "g" (to), "g" (count) - : "r0", "r1", "r2", "r3", "r4", "r5", "memory"); -@end example +OpenMP (Open Multi-Processing) is an application programming +interface (API) that supports multi-platform shared memory +multiprocessing programming in C/C++ and Fortran on many +architectures, including Unix and Microsoft Windows platforms. +It consists of a set of compiler directives, library routines, +and environment variables that influence run-time behavior. -Also, there are three special clobber arguments: +GCC implements all of the @uref{https://www.openmp.org/specifications/, +OpenMP Application Program Interface v4.5}, and many features from later +versions of the OpenMP specification. +@xref{OpenMP Implementation Status,,,libgomp, +GNU Offloading and Multi Processing Runtime Library}, +for more details about currently supported OpenMP features. -@table @code -@item "cc" -The @code{"cc"} clobber indicates that the assembler code modifies the flags -register. On some machines, GCC represents the condition codes as a specific -hardware register; @code{"cc"} serves to name this register. -On other machines, condition code handling is different, -and specifying @code{"cc"} has no effect. But -it is valid no matter what the target. +To enable the processing of OpenMP directives @samp{#pragma omp}, +@samp{[[omp::directive(...)]]}, @samp{[[omp::decl(...)]]}, +and @samp{[[omp::sequence(...)]]} in C and C++, +GCC needs to be invoked with the @option{-fopenmp} option. +This option also arranges for automatic linking of the OpenMP +runtime library. +@xref{,,,libgomp,GNU Offloading and Multi Processing Runtime Library}. -@item "memory" -The @code{"memory"} clobber tells the compiler that the assembly code -performs memory -reads or writes to items other than those listed in the input and output -operands (for example, accessing the memory pointed to by one of the input -parameters). To ensure memory contains correct values, GCC may need to flush -specific register values to memory before executing the @code{asm}. Further, -the compiler does not assume that any values read from memory before an -@code{asm} remain unchanged after that @code{asm}; it reloads them as -needed. -Using the @code{"memory"} clobber effectively forms a read/write -memory barrier for the compiler. +@xref{OpenMP and OpenACC Options}, for additional options useful with +@option{-fopenmp}. -Note that this clobber does not prevent the @emph{processor} from doing -speculative reads past the @code{asm} statement. To prevent that, you need -processor-specific fence instructions. 
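+For example, here is a minimal sketch of using the @code{"memory"}
+clobber as a compiler-level barrier between two stores (the variables
+are hypothetical):
+
+@smallexample
+extern int buf[16], ready;
+
+buf[0] = 1;
+asm volatile ("" : : : "memory");   /* @r{Compiler barrier only.} */
+ready = 1;   /* @r{Not moved above the barrier by the compiler; the} */
+             /* @r{processor may still reorder it without a fence.} */
+@end smallexample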
+@node OpenACC +@section OpenACC +@cindex OpenACC extension support -@item "redzone" -The @code{"redzone"} clobber tells the compiler that the assembly code -may write to the stack red zone, area below the stack pointer which on -some architectures in some calling conventions is guaranteed not to be -changed by signal handlers, interrupts or exceptions and so the compiler -can store there temporaries in leaf functions. On targets which have -no concept of the stack red zone, the clobber is ignored. -It should be used e.g.@: in case the assembly code uses call instructions -or pushes something to the stack without taking the red zone into account -by subtracting red zone size from the stack pointer first and restoring -it afterwards. +OpenACC is an application programming interface (API) that supports +offloading of code to accelerator devices. It consists of a set of +compiler directives, library routines, and environment variables that +influence run-time behavior. -@end table +GCC strives to be compatible with the +@uref{https://www.openacc.org/, OpenACC Application Programming +Interface v2.6}. -Flushing registers to memory has performance implications and may be -an issue for time-sensitive code. You can provide better information -to GCC to avoid this, as shown in the following examples. At a -minimum, aliasing rules allow GCC to know what memory @emph{doesn't} -need to be flushed. +To enable the processing of OpenACC directives @samp{#pragma acc} +in C and C++, GCC needs to be invoked with the @option{-fopenacc} option. +This option also arranges for automatic linking of the OpenACC runtime +library. +@xref{,,,libgomp,GNU Offloading and Multi Processing Runtime Library}. -Here is a fictitious sum of squares instruction, that takes two -pointers to floating point values in memory and produces a floating -point register output. -Notice that @code{x}, and @code{y} both appear twice in the @code{asm} -parameters, once to specify memory accessed, and once to specify a -base register used by the @code{asm}. You won't normally be wasting a -register by doing this as GCC can use the same register for both -purposes. However, it would be foolish to use both @code{%1} and -@code{%3} for @code{x} in this @code{asm} and expect them to be the -same. In fact, @code{%3} may well not be a register. It might be a -symbolic memory reference to the object pointed to by @code{x}. +@xref{OpenMP and OpenACC Options}, for additional options useful with +@option{-fopenacc}. -@smallexample -asm ("sumsq %0, %1, %2" - : "+f" (result) - : "r" (x), "r" (y), "m" (*x), "m" (*y)); -@end smallexample +@node Inline +@section An Inline Function is As Fast As a Macro +@cindex inline functions +@cindex integrating function code +@cindex open coding +@cindex macros, inline alternative -Here is a fictitious @code{*z++ = *x++ * *y++} instruction. -Notice that the @code{x}, @code{y} and @code{z} pointer registers -must be specified as input/output because the @code{asm} modifies -them. +By declaring a function inline, you can direct GCC to make +calls to that function faster. One way GCC can achieve this is to +integrate that function's code into the code for its callers. This +makes execution faster by eliminating the function-call overhead; in +addition, if any of the actual argument values are constant, their +known values may permit simplifications at compile time so that not +all of the inline function's code needs to be included. 
The effect on +code size is less predictable; object code may be larger or smaller +with function inlining, depending on the particular case. You can +also direct GCC to try to integrate all ``simple enough'' functions +into their callers with the option @option{-finline-functions}. -@smallexample -asm ("vecmul %0, %1, %2" - : "+r" (z), "+r" (x), "+r" (y), "=m" (*z) - : "m" (*x), "m" (*y)); -@end smallexample +GCC implements three different semantics of declaring a function +inline. One is available with @option{-std=gnu89} or +@option{-fgnu89-inline} or when @code{gnu_inline} attribute is present +on all inline declarations, another when +@option{-std=c99}, +@option{-std=gnu99} or an option for a later C version is used +(without @option{-fgnu89-inline}), and the third +is used when compiling C++. -An x86 example where the string memory argument is of unknown length. +To declare a function inline, use the @code{inline} keyword in its +declaration, like this: @smallexample -asm("repne scasb" - : "=c" (count), "+D" (p) - : "m" (*(const char (*)[]) p), "0" (-1), "a" (0)); +static inline int +inc (int *a) +@{ + return (*a)++; +@} @end smallexample -If you know the above will only be reading a ten byte array then you -could instead use a memory input like: -@code{"m" (*(const char (*)[10]) p)}. +If you are writing a header file to be included in ISO C90 programs, write +@code{__inline__} instead of @code{inline}. @xref{Alternate Keywords}. -Here is an example of a PowerPC vector scale implemented in assembly, -complete with vector and condition code clobbers, and some initialized -offset registers that are unchanged by the @code{asm}. +The three types of inlining behave similarly in two important cases: +when the @code{inline} keyword is used on a @code{static} function, +like the example above, and when a function is first declared without +using the @code{inline} keyword and then is defined with +@code{inline}, like this: @smallexample -void -dscal (size_t n, double *x, double alpha) +extern int inc (int *a); +inline int +inc (int *a) @{ - asm ("/* lots of asm here */" - : "+m" (*(double (*)[n]) x), "+&r" (n), "+b" (x) - : "d" (alpha), "b" (32), "b" (48), "b" (64), - "b" (80), "b" (96), "b" (112) - : "cr0", - "vs32","vs33","vs34","vs35","vs36","vs37","vs38","vs39", - "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"); + return (*a)++; @} @end smallexample -Rather than allocating fixed registers via clobbers to provide scratch -registers for an @code{asm} statement, an alternative is to define a -variable and make it an early-clobber output as with @code{a2} and -@code{a3} in the example below. This gives the compiler register -allocator more freedom. You can also define a variable and make it an -output tied to an input as with @code{a0} and @code{a1}, tied -respectively to @code{ap} and @code{lda}. Of course, with tied -outputs your @code{asm} can't use the input value after modifying the -output register since they are one and the same register. What's -more, if you omit the early-clobber on the output, it is possible that -GCC might allocate the same register to another of the inputs if GCC -could prove they had the same value on entry to the @code{asm}. This -is why @code{a1} has an early-clobber. Its tied input, @code{lda} -might conceivably be known to have the value 16 and without an -early-clobber share the same register as @code{%11}. On the other -hand, @code{ap} can't be the same as any of the other inputs, so an -early-clobber on @code{a0} is not needed. 
It is also not desirable in -this case. An early-clobber on @code{a0} would cause GCC to allocate -a separate register for the @code{"m" (*(const double (*)[]) ap)} -input. Note that tying an input to an output is the way to set up an -initialized temporary register modified by an @code{asm} statement. -An input not tied to an output is assumed by GCC to be unchanged, for -example @code{"b" (16)} below sets up @code{%11} to 16, and GCC might -use that register in following code if the value 16 happened to be -needed. You can even use a normal @code{asm} output for a scratch if -all inputs that might share the same register are consumed before the -scratch is used. The VSX registers clobbered by the @code{asm} -statement could have used this technique except for GCC's limit on the -number of @code{asm} parameters. +In both of these common cases, the program behaves the same as if you +had not used the @code{inline} keyword, except for its speed. -@smallexample -static void -dgemv_kernel_4x4 (long n, const double *ap, long lda, - const double *x, double *y, double alpha) -@{ - double *a0; - double *a1; - double *a2; - double *a3; +@cindex inline functions, omission of +@opindex fkeep-inline-functions +When a function is both inline and @code{static}, if all calls to the +function are integrated into the caller, and the function's address is +never used, then the function's own assembler code is never referenced. +In this case, GCC does not actually output assembler code for the +function, unless you specify the option @option{-fkeep-inline-functions}. +If there is a nonintegrated call, then the function is compiled to +assembler code as usual. The function must also be compiled as usual if +the program refers to its address, because that cannot be inlined. - __asm__ - ( - /* lots of asm here */ - "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n" - "#a0=%3 a1=%4 a2=%5 a3=%6" - : - "+m" (*(double (*)[n]) y), - "+&r" (n), // 1 - "+b" (y), // 2 - "=b" (a0), // 3 - "=&b" (a1), // 4 - "=&b" (a2), // 5 - "=&b" (a3) // 6 - : - "m" (*(const double (*)[n]) x), - "m" (*(const double (*)[]) ap), - "d" (alpha), // 9 - "r" (x), // 10 - "b" (16), // 11 - "3" (ap), // 12 - "4" (lda) // 13 - : - "cr0", - "vs32","vs33","vs34","vs35","vs36","vs37", - "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47" - ); -@} +@opindex Winline +Note that certain usages in a function definition can make it unsuitable +for inline substitution. Among these usages are: variadic functions, +use of @code{alloca}, use of computed goto (@pxref{Labels as Values}), +use of nonlocal goto, use of nested functions, use of @code{setjmp}, use +of @code{__builtin_longjmp} and use of @code{__builtin_return} or +@code{__builtin_apply_args}. Using @option{-Winline} warns when a +function marked @code{inline} could not be substituted, and gives the +reason for the failure. + +@cindex automatic @code{inline} for C++ member fns +@cindex @code{inline} automatic for C++ member fns +@cindex member fns, automatically @code{inline} +@cindex C++ member fns, automatically @code{inline} +@opindex fno-default-inline +As required by ISO C++, GCC considers member functions defined within +the body of a class to be marked inline even if they are +not explicitly declared with the @code{inline} keyword. You can +override this with @option{-fno-default-inline}; @pxref{C++ Dialect +Options,,Options Controlling C++ Dialect}. 
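+For instance, in this sketch both member functions are treated as if
+they had been declared @code{inline}:
+
+@smallexample
+struct counter
+@{
+  int n;
+  int value () const @{ return n; @}   // @r{implicitly inline}
+  void reset () @{ n = 0; @}           // @r{implicitly inline}
+@};
+@end smallexample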
+ +GCC does not inline any functions when not optimizing unless you specify +the @samp{always_inline} attribute for the function, like this: + +@smallexample +/* @r{Prototype.} */ +inline void foo (const char) __attribute__((always_inline)); @end smallexample -@anchor{GotoLabels} -@subsubsection Goto Labels -@cindex @code{asm} goto labels +The remainder of this section is specific to GNU C90 inlining. -@code{asm goto} allows assembly code to jump to one or more C labels. The -@var{GotoLabels} section in an @code{asm goto} statement contains -a comma-separated -list of all C labels to which the assembler code may jump. GCC assumes that -@code{asm} execution falls through to the next statement (if this is not the -case, consider using the @code{__builtin_unreachable} intrinsic after the -@code{asm} statement). Optimization of @code{asm goto} may be improved by -using the @code{hot} and @code{cold} label attributes (@pxref{Label -Attributes}). +@cindex non-static inline function +When an inline function is not @code{static}, then the compiler must assume +that there may be calls from other source files; since a global symbol can +be defined only once in any program, the function must not be defined in +the other source files, so the calls therein cannot be integrated. +Therefore, a non-@code{static} inline function is always compiled on its +own in the usual fashion. -If the assembler code does modify anything, use the @code{"memory"} clobber -to force the -optimizers to flush all register values to memory and reload them if -necessary after the @code{asm} statement. +If you specify both @code{inline} and @code{extern} in the function +definition, then the definition is used only for inlining. In no case +is the function compiled on its own, not even if you refer to its +address explicitly. Such an address becomes an external reference, as +if you had only declared the function, and had not defined it. -Also note that an @code{asm goto} statement is always implicitly -considered volatile. +This combination of @code{inline} and @code{extern} has almost the +effect of a macro. The way to use it is to put a function definition in +a header file with these keywords, and put another copy of the +definition (lacking @code{inline} and @code{extern}) in a library file. +The definition in the header file causes most calls to the function +to be inlined. If any uses of the function remain, they refer to +the single copy in the library. -Be careful when you set output operands inside @code{asm goto} only on -some possible control flow paths. If you don't set up the output on -given path and never use it on this path, it is okay. Otherwise, you -should use @samp{+} constraint modifier meaning that the operand is -input and output one. With this modifier you will have the correct -values on all possible paths from the @code{asm goto}. +@node Volatiles +@section When is a Volatile Object Accessed? +@cindex accessing volatiles +@cindex volatile read +@cindex volatile write +@cindex volatile access -To reference a label in the assembler template, prefix it with -@samp{%l} (lowercase @samp{L}) followed by its (zero-based) position -in @var{GotoLabels} plus the number of input and output operands. -Output operand with constraint modifier @samp{+} is counted as two -operands because it is considered as one output and one input operand. 
-For example, if the @code{asm} has three inputs, one output operand -with constraint modifier @samp{+} and one output operand with -constraint modifier @samp{=} and references two labels, refer to the -first label as @samp{%l6} and the second as @samp{%l7}). +C has the concept of volatile objects. These are normally accessed by +pointers and used for accessing hardware or inter-thread +communication. The standard encourages compilers to refrain from +optimizations concerning accesses to volatile objects, but leaves it +implementation defined as to what constitutes a volatile access. The +minimum requirement is that at a sequence point all previous accesses +to volatile objects have stabilized and no subsequent accesses have +occurred. Thus an implementation is free to reorder and combine +volatile accesses that occur between sequence points, but cannot do +so for accesses across a sequence point. The use of volatile does +not allow you to violate the restriction on updating objects multiple +times between two sequence points. -Alternately, you can reference labels using the actual C label name -enclosed in brackets. For example, to reference a label named -@code{carry}, you can use @samp{%l[carry]}. The label must still be -listed in the @var{GotoLabels} section when using this approach. It -is better to use the named references for labels as in this case you -can avoid counting input and output operands and special treatment of -output operands with constraint modifier @samp{+}. +Accesses to non-volatile objects are not ordered with respect to +volatile accesses. You cannot use a volatile object as a memory +barrier to order a sequence of writes to non-volatile memory. For +instance: -Here is an example of @code{asm goto} for i386: +@smallexample +int *ptr = @var{something}; +volatile int vobj; +*ptr = @var{something}; +vobj = 1; +@end smallexample -@example -asm goto ( - "btl %1, %0\n\t" - "jc %l2" - : /* No outputs. */ - : "r" (p1), "r" (p2) - : "cc" - : carry); +@noindent +Unless @var{*ptr} and @var{vobj} can be aliased, it is not guaranteed +that the write to @var{*ptr} occurs by the time the update +of @var{vobj} happens. If you need this guarantee, you must use +a stronger memory barrier such as: -return 0; +@smallexample +int *ptr = @var{something}; +volatile int vobj; +*ptr = @var{something}; +asm volatile ("" : : : "memory"); +vobj = 1; +@end smallexample -carry: -return 1; -@end example +A scalar volatile object is read when it is accessed in a void context: -The following example shows an @code{asm goto} that uses a memory clobber. +@smallexample +volatile int *src = @var{somevalue}; +*src; +@end smallexample -@example -int frob(int x) -@{ - int y; - asm goto ("frob %%r5, %1; jc %l[error]; mov (%2), %%r5" - : /* No outputs. */ - : "r"(x), "r"(&y) - : "r5", "memory" - : error); - return y; -error: - return -1; -@} -@end example +Such expressions are rvalues, and GCC implements this as a +read of the volatile object being pointed to. -The following example shows an @code{asm goto} that uses an output. +Assignments are also expressions and have an rvalue. However when +assigning to a scalar volatile, the volatile object is not reread, +regardless of whether the assignment expression's rvalue is used or +not. If the assignment's rvalue is used, the value is that assigned +to the volatile object. 
For instance, there is no read of @var{vobj} +in all the following cases: -@example -int foo(int count) -@{ - asm goto ("dec %0; jb %l[stop]" - : "+r" (count) - : - : - : stop); - return count; -stop: - return 0; -@} -@end example - -The following artificial example shows an @code{asm goto} that sets -up an output only on one path inside the @code{asm goto}. Usage of -constraint modifier @samp{=} instead of @samp{+} would be wrong as -@code{factor} is used on all paths from the @code{asm goto}. +@smallexample +int obj; +volatile int vobj; +vobj = @var{something}; +obj = vobj = @var{something}; +obj ? vobj = @var{onething} : vobj = @var{anotherthing}; +obj = (@var{something}, vobj = @var{anotherthing}); +@end smallexample -@example -int foo(int inp) -@{ - int factor = 0; - asm goto ("cmp %1, 10; jb %l[lab]; mov 2, %0" - : "+r" (factor) - : "r" (inp) - : - : lab); -lab: - return inp * factor; /* return 2 * inp or 0 if inp < 10 */ -@} -@end example +If you need to read the volatile object after an assignment has +occurred, you must use a separate expression with an intervening +sequence point. -@anchor{GenericOperandmodifiers} -@subsubsection Generic Operand Modifiers -@noindent -The following table shows the modifiers supported by all targets and their effects: +As bit-fields are not individually addressable, volatile bit-fields may +be implicitly read when written to, or when adjacent bit-fields are +accessed. Bit-field operations may be optimized such that adjacent +bit-fields are only partially accessed, if they straddle a storage unit +boundary. For these reasons it is unwise to use volatile bit-fields to +access hardware. -@multitable @columnfractions 0.15 0.7 0.15 -@headitem Modifier @tab Description @tab Example -@item @code{c} -@tab Require a constant operand and print the constant expression with no punctuation. -@tab @code{%c0} -@item @code{cc} -@tab Like @samp{%c} except try harder to print it with no punctuation. -@samp{%c} can e.g.@: fail to print constant addresses in position independent code on -some architectures. -@tab @code{%cc0} -@item @code{n} -@tab Like @samp{%c} except that the value of the constant is negated before printing. -@tab @code{%n0} -@item @code{a} -@tab Substitute a memory reference, with the actual operand treated as the address. -This may be useful when outputting a ``load address'' instruction, because -often the assembler syntax for such an instruction requires you to write the -operand as if it were a memory reference. -@tab @code{%a0} -@item @code{l} -@tab Print the label name with no punctuation. -@tab @code{%l0} -@end multitable +@node Using Assembly Language with C +@section How to Use Inline Assembly Language in C Code +@cindex @code{asm} keyword +@cindex assembly language in C +@cindex inline assembly language +@cindex mixing assembly language and C -@anchor{aarch64Operandmodifiers} -@subsubsection AArch64 Operand Modifiers +The @code{asm} keyword allows you to embed assembler instructions +within C code. GCC provides two forms of inline @code{asm} +statements. A @dfn{basic @code{asm}} statement is one with no +operands (@pxref{Basic Asm}), while an @dfn{extended @code{asm}} +statement (@pxref{Extended Asm}) includes one or more operands. +The extended form is preferred for mixing C and assembly language +within a function and can be used at top level as well with certain +restrictions. 
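+For example, assuming an x86 target and AT&T syntax, the two forms look
+like this:
+
+@smallexample
+asm ("nop");                /* @r{Basic asm: no operands.} */
+
+int src = 1, dst;
+asm ("mov %1, %0"           /* @r{Extended asm: C operands are connected} */
+     : "=r" (dst)           /* @r{to the template via constraints.} */
+     : "r" (src));
+@end smallexample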
-The following table shows the modifiers supported by AArch64 and their effects: +You can also use the @code{asm} keyword to override the assembler name +for a C symbol, or to place a C variable in a specific register. -@multitable @columnfractions .10 .90 -@headitem Modifier @tab Description -@item @code{w} @tab Print a 32-bit general-purpose register name or, given a -constant zero operand, the 32-bit zero register (@code{wzr}). -@item @code{x} @tab Print a 64-bit general-purpose register name or, given a -constant zero operand, the 64-bit zero register (@code{xzr}). -@item @code{b} @tab Print an FP/SIMD register name with a @code{b} (byte, 8-bit) -prefix. -@item @code{h} @tab Print an FP/SIMD register name with an @code{h} (halfword, -16-bit) prefix. -@item @code{s} @tab Print an FP/SIMD register name with an @code{s} (single -word, 32-bit) prefix. -@item @code{d} @tab Print an FP/SIMD register name with a @code{d} (doubleword, -64-bit) prefix. -@item @code{q} @tab Print an FP/SIMD register name with a @code{q} (quadword, -128-bit) prefix. -@item @code{Z} @tab Print an FP/SIMD register name as an SVE register (i.e. with -a @code{z} prefix). This is a no-op for SVE register operands. -@end multitable +@menu +* Basic Asm:: Inline assembler without operands. +* Extended Asm:: Inline assembler with operands. +* Constraints:: Constraints for @code{asm} operands +* Asm constexprs:: C++11 constant expressions instead of string + literals. +* Asm Labels:: Specifying the assembler name to use for a C symbol. +* Explicit Register Variables:: Defining variables residing in specified + registers. +* Size of an asm:: How GCC calculates the size of an @code{asm} block. +@end menu -@anchor{x86Operandmodifiers} -@subsubsection x86 Operand Modifiers +@node Basic Asm +@subsection Basic Asm --- Assembler Instructions Without Operands +@cindex basic @code{asm} +@cindex assembly language in C, basic -References to input, output, and goto operands in the assembler template -of extended @code{asm} statements can use -modifiers to affect the way the operands are formatted in -the code output to the assembler. For example, the -following code uses the @samp{h} and @samp{b} modifiers for x86: +A basic @code{asm} statement has the following syntax: @example -uint16_t num; -asm volatile ("xchg %h0, %b0" : "+a" (num) ); +asm @var{asm-qualifiers} ( @var{AssemblerInstructions} ) @end example -@noindent -These modifiers generate this assembler code: +For the C language, the @code{asm} keyword is a GNU extension. +When writing C code that can be compiled with @option{-ansi} and the +@option{-std} options that select C dialects without GNU extensions, use +@code{__asm__} instead of @code{asm} (@pxref{Alternate Keywords}). For +the C++ language, @code{asm} is a standard keyword, but @code{__asm__} +can be used for code compiled with @option{-fno-asm}. -@example -xchg %ah, %al -@end example +@subsubheading Qualifiers +@table @code +@item volatile +The optional @code{volatile} qualifier has no effect. +All basic @code{asm} blocks are implicitly volatile. +Basic @code{asm} statements outside of functions may not use any +qualifiers. -The rest of this discussion uses the following code for illustrative purposes. +@item inline +If you use the @code{inline} qualifier, then for inlining purposes the size +of the @code{asm} statement is taken as the smallest size possible (@pxref{Size +of an asm}). 
+@end table -@example -int main() -@{ - int iInt = 1; +@subsubheading Parameters +@table @var -top: +@item AssemblerInstructions +This is a literal string that specifies the assembler code. +In C++ with @option{-std=gnu++11} or later, it can +also be a constant expression inside parentheses (see @ref{Asm constexprs}). - asm volatile goto ("some assembler instructions here" - : /* No outputs. */ - : "q" (iInt), "X" (sizeof(unsigned char) + 1), "i" (42) - : /* No clobbers. */ - : top); -@} -@end example +The string can contain any instructions recognized by the assembler, +including directives. GCC does not parse the assembler instructions +themselves and does not know what they mean or even whether they are +valid assembler input. -With no modifiers, this is what the output from the operands would be -for the @samp{att} and @samp{intel} dialects of assembler: +You may place multiple assembler instructions together in a single @code{asm} +string, separated by the characters normally used in assembly code for the +system. A combination that works in most places is a newline to break the +line, plus a tab character (written as @samp{\n\t}). +Some assemblers allow semicolons as a line separator. However, +note that some assembler dialects use semicolons to start a comment. +@end table -@multitable {Operand} {$.L2} {OFFSET FLAT:.L2} -@headitem Operand @tab @samp{att} @tab @samp{intel} -@item @code{%0} -@tab @code{%eax} -@tab @code{eax} -@item @code{%1} -@tab @code{$2} -@tab @code{2} -@item @code{%3} -@tab @code{$.L3} -@tab @code{OFFSET FLAT:.L3} -@item @code{%4} -@tab @code{$8} -@tab @code{8} -@item @code{%5} -@tab @code{%xmm0} -@tab @code{xmm0} -@item @code{%7} -@tab @code{$0} -@tab @code{0} -@end multitable +@subsubheading Remarks +Using extended @code{asm} (@pxref{Extended Asm}) typically produces +smaller, safer, and more efficient code, and in most cases it is a +better solution than basic @code{asm}. However, functions declared +with the @code{naked} attribute require only basic @code{asm} +(@pxref{Function Attributes}). -The table below shows the list of supported modifiers and their effects. +Basic @code{asm} statements may be used both inside a C function or at +file scope (``top-level''), where you can use this technique to emit +assembler directives, define assembly language macros that can be invoked +elsewhere in the file, or write entire functions in assembly language. -@multitable {Modifier} {Print the opcode suffix for the size of th} {Operand} {@samp{att}} {@samp{intel}} -@headitem Modifier @tab Description @tab Operand @tab @samp{att} @tab @samp{intel} -@item @code{A} -@tab Print an absolute memory reference. -@tab @code{%A0} -@tab @code{*%rax} -@tab @code{rax} -@item @code{b} -@tab Print the QImode name of the register. -@tab @code{%b0} -@tab @code{%al} -@tab @code{al} -@item @code{B} -@tab print the opcode suffix of b. -@tab @code{%B0} -@tab @code{b} -@tab -@item @code{c} -@tab Require a constant operand and print the constant expression with no punctuation. -@tab @code{%c1} -@tab @code{2} -@tab @code{2} -@item @code{d} -@tab print duplicated register operand for AVX instruction. -@tab @code{%d5} -@tab @code{%xmm0, %xmm0} -@tab @code{xmm0, xmm0} -@item @code{E} -@tab Print the address in Double Integer (DImode) mode (8 bytes) when the target is 64-bit. -Otherwise mode is unspecified (VOIDmode). -@tab @code{%E1} -@tab @code{%(rax)} -@tab @code{[rax]} -@item @code{g} -@tab Print the V16SFmode name of the register. 
-@tab @code{%g0} -@tab @code{%zmm0} -@tab @code{zmm0} -@item @code{h} -@tab Print the QImode name for a ``high'' register. -@tab @code{%h0} -@tab @code{%ah} -@tab @code{ah} -@item @code{H} -@tab Add 8 bytes to an offsettable memory reference. Useful when accessing the -high 8 bytes of SSE values. For a memref in (%rax), it generates -@tab @code{%H0} -@tab @code{8(%rax)} -@tab @code{8[rax]} -@item @code{k} -@tab Print the SImode name of the register. -@tab @code{%k0} -@tab @code{%eax} -@tab @code{eax} -@item @code{l} -@tab Print the label name with no punctuation. -@tab @code{%l3} -@tab @code{.L3} -@tab @code{.L3} -@item @code{L} -@tab print the opcode suffix of l. -@tab @code{%L0} -@tab @code{l} -@tab -@item @code{N} -@tab print maskz. -@tab @code{%N7} -@tab @code{@{z@}} -@tab @code{@{z@}} -@item @code{p} -@tab Print raw symbol name (without syntax-specific prefixes). -@tab @code{%p2} -@tab @code{42} -@tab @code{42} -@item @code{P} -@tab If used for a function, print the PLT suffix and generate PIC code. -For example, emit @code{foo@@PLT} instead of 'foo' for the function -foo(). If used for a constant, drop all syntax-specific prefixes and -issue the bare constant. See @code{p} above. -@item @code{q} -@tab Print the DImode name of the register. -@tab @code{%q0} -@tab @code{%rax} -@tab @code{rax} -@item @code{Q} -@tab print the opcode suffix of q. -@tab @code{%Q0} -@tab @code{q} -@tab -@item @code{R} -@tab print embedded rounding and sae. -@tab @code{%R4} -@tab @code{@{rn-sae@}, } -@tab @code{, @{rn-sae@}} -@item @code{r} -@tab print only sae. -@tab @code{%r4} -@tab @code{@{sae@}, } -@tab @code{, @{sae@}} -@item @code{s} -@tab print a shift double count, followed by the assemblers argument -delimiterprint the opcode suffix of s. -@tab @code{%s1} -@tab @code{$2, } -@tab @code{2, } -@item @code{S} -@tab print the opcode suffix of s. -@tab @code{%S0} -@tab @code{s} -@tab -@item @code{t} -@tab print the V8SFmode name of the register. -@tab @code{%t5} -@tab @code{%ymm0} -@tab @code{ymm0} -@item @code{T} -@tab print the opcode suffix of t. -@tab @code{%T0} -@tab @code{t} -@tab -@item @code{V} -@tab print naked full integer register name without %. -@tab @code{%V0} -@tab @code{eax} -@tab @code{eax} -@item @code{w} -@tab Print the HImode name of the register. -@tab @code{%w0} -@tab @code{%ax} -@tab @code{ax} -@item @code{W} -@tab print the opcode suffix of w. -@tab @code{%W0} -@tab @code{w} -@tab -@item @code{x} -@tab print the V4SFmode name of the register. -@tab @code{%x5} -@tab @code{%xmm0} -@tab @code{xmm0} -@item @code{y} -@tab print "st(0)" instead of "st" as a register. -@tab @code{%y6} -@tab @code{%st(0)} -@tab @code{st(0)} -@item @code{z} -@tab Print the opcode suffix for the size of the current integer operand (one of @code{b}/@code{w}/@code{l}/@code{q}). -@tab @code{%z0} -@tab @code{l} -@tab -@item @code{Z} -@tab Like @code{z}, with special suffixes for x87 instructions. -@end multitable +Safely accessing C data and calling functions from basic @code{asm} is more +complex than it may appear. To access C data, it is better to use extended +@code{asm}. +Do not expect a sequence of @code{asm} statements to remain perfectly +consecutive after compilation. If certain instructions need to remain +consecutive in the output, put them in a single multi-instruction @code{asm} +statement. Note that GCC's optimizers can move @code{asm} statements +relative to other code, including across jumps. 
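+
+As an illustration, consider the following sketch, in which the
+mnemonics @code{instr_a} and @code{instr_b} are placeholders rather
+than real instructions. The first two statements may be separated by
+other generated code; the single combined statement keeps its
+instructions together:
+
+@example
+asm ("instr_a");   /* These two statements may drift apart... */
+asm ("instr_b");
+
+asm ("instr_a\n\t" /* ...while one statement stays in one piece.  */
+     "instr_b");
+@end example
+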
-@anchor{x86floatingpointasmoperands} -@subsubsection x86 Floating-Point @code{asm} Operands +@code{asm} statements may not perform jumps into other @code{asm} statements. +GCC does not know about these jumps, and therefore cannot take +account of them when deciding how to optimize. Jumps from @code{asm} to C +labels are only supported in extended @code{asm}. -On x86 targets, there are several rules on the usage of stack-like registers -in the operands of an @code{asm}. These rules apply only to the operands -that are stack-like registers: +Under certain circumstances, GCC may duplicate (or remove duplicates of) your +assembly code when optimizing. This can lead to unexpected duplicate +symbol errors during compilation if your assembly code defines symbols or +labels. -@enumerate -@item -Given a set of input registers that die in an @code{asm}, it is -necessary to know which are implicitly popped by the @code{asm}, and -which must be explicitly popped by GCC@. +@strong{Warning:} The C standards do not specify semantics for @code{asm}, +making it a potential source of incompatibilities between compilers. These +incompatibilities may not produce compiler warnings/errors. -An input register that is implicitly popped by the @code{asm} must be -explicitly clobbered, unless it is constrained to match an -output operand. +GCC does not parse basic @code{asm}'s @var{AssemblerInstructions}, which +means there is no way to communicate to the compiler what is happening +inside them. GCC has no visibility of symbols in the @code{asm} and may +discard them as unreferenced. It also does not know about side effects of +the assembler code, such as modifications to memory or registers. Unlike +some compilers, GCC assumes that no changes to general purpose registers +occur. This assumption may change in a future release. -@item -For any input register that is implicitly popped by an @code{asm}, it is -necessary to know how to adjust the stack to compensate for the pop. -If any non-popped input is closer to the top of the reg-stack than -the implicitly popped register, it would not be possible to know what the -stack looked like---it's not clear how the rest of the stack ``slides -up''. +To avoid complications from future changes to the semantics and the +compatibility issues between compilers, consider replacing basic @code{asm} +with extended @code{asm}. See +@uref{https://gcc.gnu.org/wiki/ConvertBasicAsmToExtended, How to convert +from basic asm to extended asm} for information about how to perform this +conversion. -All implicitly popped input registers must be closer to the top of -the reg-stack than any input that is not implicitly popped. +The compiler copies the assembler instructions in a basic @code{asm} +verbatim to the assembly language output file, without +processing dialects or any of the @samp{%} operators that are available with +extended @code{asm}. This results in minor differences between basic +@code{asm} strings and extended @code{asm} templates. For example, to refer to +registers you might use @samp{%eax} in basic @code{asm} and +@samp{%%eax} in extended @code{asm}. -It is possible that if an input dies in an @code{asm}, the compiler might -use the input register for an output reload. Consider this example: +On targets such as x86 that support multiple assembler dialects, +all basic @code{asm} blocks use the assembler dialect specified by the +@option{-masm} command-line option (@pxref{x86 Options}). 
+Basic @code{asm} provides no
+mechanism to provide different assembler strings for different dialects.

-@smallexample
-asm ("foo" : "=t" (a) : "f" (b));
-@end smallexample
+For basic @code{asm} with a non-empty assembler string, GCC assumes
+the assembler block does not change any general purpose registers,
+but it may read or write any globally accessible variable.

-@noindent
-This code says that input @code{b} is not popped by the @code{asm}, and that
-the @code{asm} pushes a result onto the reg-stack, i.e., the stack is one
-deeper after the @code{asm} than it was before. But, it is possible that
-reload may think that it can use the same register for both the input and
-the output.
+Here is an example of basic @code{asm} for i386:

-To prevent this from happening,
-if any input operand uses the @samp{f} constraint, all output register
-constraints must use the @samp{&} early-clobber modifier.
+@example
+/* Note that this code will not compile with -masm=intel */
+#define DebugBreak() asm("int $3")
+@end example

-The example above is correctly written as:
+@node Extended Asm
+@subsection Extended Asm --- Assembler Instructions with C Expression Operands
+@cindex extended @code{asm}
+@cindex assembly language in C, extended

-@smallexample
-asm ("foo" : "=&t" (a) : "f" (b));
-@end smallexample
+With extended @code{asm} you can read and write C variables from
+assembler and perform jumps from assembler code to C labels.
+Extended @code{asm} syntax uses colons (@samp{:}) to delimit
+the operand parameters after the assembler template:

-@item
-Some operands need to be in particular places on the stack. All
-output operands fall in this category---GCC has no other way to
-know which registers the outputs appear in unless you indicate
-this in the constraints.
+@example
+asm @var{asm-qualifiers} ( @var{AssemblerTemplate}
+ : @var{OutputOperands}
+ @r{[} : @var{InputOperands}
+ @r{[} : @var{Clobbers} @r{]} @r{]})

-Output operands must specifically indicate which register an output
-appears in after an @code{asm}. @samp{=f} is not allowed: the operand
-constraints must select a class with a single register.
+asm @var{asm-qualifiers} ( @var{AssemblerTemplate}
+ : @var{OutputOperands}
+ : @var{InputOperands}
+ : @var{Clobbers}
+ : @var{GotoLabels})
+@end example
+where in the last form @var{asm-qualifiers} must contain @code{goto}, and
+in the first form it must not.

-@item
-Output operands may not be ``inserted'' between existing stack registers.
-Since no 387 opcode uses a read/write operand, all output operands
-are dead before the @code{asm}, and are pushed by the @code{asm}.
-It makes no sense to push anywhere but the top of the reg-stack.
+The @code{asm} keyword is a GNU extension.
+When writing code that can be compiled with @option{-ansi} and the
+various @option{-std} options, use @code{__asm__} instead of
+@code{asm} (@pxref{Alternate Keywords}).

-Output operands must start at the top of the reg-stack: output
-operands may not ``skip'' a register.
+@subsubheading Qualifiers
+@table @code

-@item
-Some @code{asm} statements may need extra stack space for internal
-calculations. This can be guaranteed by clobbering stack registers
-unrelated to the inputs and outputs.
+@item volatile
+The typical use of extended @code{asm} statements is to manipulate input
+values to produce output values. However, your @code{asm} statements may
+also produce side effects. If so, you may need to use the @code{volatile}
+qualifier to disable certain optimizations. @xref{Volatile}.
-@end enumerate +@item inline +If you use the @code{inline} qualifier, then for inlining purposes the size +of the @code{asm} statement is taken as the smallest size possible +(@pxref{Size of an asm}). -This @code{asm} -takes one input, which is internally popped, and produces two outputs. +@item goto +This qualifier informs the compiler that the @code{asm} statement may +perform a jump to one of the labels listed in the @var{GotoLabels}. +@xref{GotoLabels}. +@end table -@smallexample -asm ("fsincos" : "=t" (cos), "=u" (sin) : "0" (inp)); -@end smallexample +@subsubheading Parameters +@table @var +@item AssemblerTemplate +This is a literal string that is the template for the assembler code. It is a +combination of fixed text and tokens that refer to the input, output, +and goto parameters. @xref{AssemblerTemplate}. -@noindent -This @code{asm} takes two inputs, which are popped by the @code{fyl2xp1} opcode, -and replaces them with one output. The @code{st(1)} clobber is necessary -for the compiler to know that @code{fyl2xp1} pops both inputs. +@item OutputOperands +A comma-separated list describing the C variables modified by the +instructions in the @var{AssemblerTemplate}. An empty list is permitted. +@xref{OutputOperands}. -@smallexample -asm ("fyl2xp1" : "=t" (result) : "0" (x), "u" (y) : "st(1)"); -@end smallexample +@item InputOperands +A comma-separated list describing the C expressions read by the +instructions in the @var{AssemblerTemplate}. An empty list is permitted. +@xref{InputOperands}. -@anchor{msp430Operandmodifiers} -@subsubsection MSP430 Operand Modifiers +@item Clobbers +A comma-separated list of registers or other values changed by the +@var{AssemblerTemplate}, beyond those listed as outputs. +An empty list is permitted. @xref{Clobbers and Scratch Registers}. -The list below describes the supported modifiers and their effects for MSP430. +@item GotoLabels +When you are using the @code{goto} form of @code{asm}, this section contains +the list of all C labels to which the code in the +@var{AssemblerTemplate} may jump. +@xref{GotoLabels}. -@multitable @columnfractions .10 .90 -@headitem Modifier @tab Description -@item @code{A} @tab Select low 16-bits of the constant/register/memory operand. -@item @code{B} @tab Select high 16-bits of the constant/register/memory -operand. -@item @code{C} @tab Select bits 32-47 of the constant/register/memory operand. -@item @code{D} @tab Select bits 48-63 of the constant/register/memory operand. -@item @code{H} @tab Equivalent to @code{B} (for backwards compatibility). -@item @code{I} @tab Print the inverse (logical @code{NOT}) of the constant -value. -@item @code{J} @tab Print an integer without a @code{#} prefix. -@item @code{L} @tab Equivalent to @code{A} (for backwards compatibility). -@item @code{O} @tab Offset of the current frame from the top of the stack. -@item @code{Q} @tab Use the @code{A} instruction postfix. -@item @code{R} @tab Inverse of condition code, for unsigned comparisons. -@item @code{W} @tab Subtract 16 from the constant value. -@item @code{X} @tab Use the @code{X} instruction postfix. -@item @code{Y} @tab Subtract 4 from the constant value. -@item @code{Z} @tab Subtract 1 from the constant value. -@item @code{b} @tab Append @code{.B}, @code{.W} or @code{.A} to the -instruction, depending on the mode. -@item @code{d} @tab Offset 1 byte of a memory reference or constant value. -@item @code{e} @tab Offset 3 bytes of a memory reference or constant value. 
-@item @code{f} @tab Offset 5 bytes of a memory reference or constant value.
-@item @code{g} @tab Offset 7 bytes of a memory reference or constant value.
-@item @code{p} @tab Print the value of 2, raised to the power of the given
-constant. Used to select the specified bit position.
-@item @code{r} @tab Inverse of condition code, for signed comparisons.
-@item @code{x} @tab Equivalent to @code{X}, but only for pointers.
-@end multitable
+@code{asm} statements may not perform jumps into other @code{asm} statements,
+only to the listed @var{GotoLabels}.
+GCC's optimizers do not know about other jumps; therefore they cannot take
+account of them when deciding how to optimize.
+@end table

-@anchor{loongarchOperandmodifiers}
-@subsubsection LoongArch Operand Modifiers
+The total number of input + output + goto operands is limited to 30.

-The list below describes the supported modifiers and their effects for LoongArch.
+@subsubheading Remarks
+The @code{asm} statement allows you to include assembly instructions directly
+within C code. This may help you to maximize performance in time-sensitive
+code or to access assembly instructions that are not readily available to C
+programs.

-@multitable @columnfractions .10 .90
-@headitem Modifier @tab Description
-@item @code{d} @tab Same as @code{c}.
-@item @code{i} @tab Print the character ''@code{i}'' if the operand is not a register.
-@item @code{m} @tab Same as @code{c}, but the printed value is @code{operand - 1}.
-@item @code{u} @tab Print a LASX register.
-@item @code{w} @tab Print a LSX register.
-@item @code{X} @tab Print a constant integer operand in hexadecimal.
-@item @code{z} @tab Print the operand in its unmodified form, followed by a comma.
-@end multitable
+Similarly to basic @code{asm}, extended @code{asm} statements may be used
+either inside a C function or at file scope (``top-level''), where you can
+use this technique to emit assembler directives, define assembly language
+macros that can be invoked elsewhere in the file, or write entire functions
+in assembly language.
+Extended @code{asm} statements outside of functions may not use any
+qualifiers, may not specify clobbers, may not use @code{%}, @code{+} or
+@code{&} modifiers in constraints and can only use constraints which don't
+allow using any register.

-References to input and output operands in the assembler template of extended
-asm statements can use modifiers to affect the way the operands are formatted
-in the code output to the assembler. For example, the following code uses the
-'w' modifier for LoongArch:
+Functions declared with the @code{naked} attribute require basic
+@code{asm} (@pxref{Function Attributes}).
+
+While the uses of @code{asm} are many and varied, it may help to think of an
+@code{asm} statement as a series of low-level instructions that convert input
+parameters to output parameters. So a simple (if not particularly useful)
+example for i386 using @code{asm} might look like this:

@example
-test-asm.c:
+int src = 1;
+int dst;

-#include
+asm ("mov %1, %0\n\t"
+ "add $1, %0"
+ : "=r" (dst)
+ : "r" (src));

-__m128i foo (void)
+printf("%d\n", dst);
+@end example
+
+This code copies @code{src} to @code{dst} and adds 1 to @code{dst}.
+
+@anchor{Volatile}
+@subsubsection Volatile
+@cindex volatile @code{asm}
+@cindex @code{asm} volatile
+
+GCC's optimizers sometimes discard @code{asm} statements if they determine
+there is no need for the output variables.
Also, the optimizers may move
+code out of loops if they believe that the code will always return the same
+result (i.e.@: none of its input values change between calls). Using the
+@code{volatile} qualifier disables these optimizations. @code{asm} statements
+that have no output operands and @code{asm goto} statements
+are implicitly volatile.
+
+This i386 code demonstrates a case that does not use (or require) the
+@code{volatile} qualifier. If it is performing assertion checking, this code
+uses @code{asm} to perform the validation. Otherwise, @code{dwRes} is
+unreferenced by any code. As a result, the optimizers can discard the
+@code{asm} statement, which in turn removes the need for the entire
+@code{DoCheck} routine. By omitting the @code{volatile} qualifier when it
+isn't needed, you allow the optimizers to produce the most efficient code
+possible.
+
+@example
+void DoCheck(uint32_t dwSomeValue)
 @{
-__m128i a,b,c;
-__asm__ ("vadd.d %w0,%w1,%w2\n\t"
- :"=f" (c)
- :"f" (a),"f" (b));
+ uint32_t dwRes;

-return c;
-@}
+ // Assumes dwSomeValue is not zero.
+ asm ("bsfl %1,%0"
+ : "=r" (dwRes)
+ : "r" (dwSomeValue)
+ : "cc");
+ assert(dwRes > 3);
+@}
@end example

-@noindent
-The compile command for the test case is as follows:
+The next example shows a case where the optimizers can recognize that the input
+(@code{dwSomeValue}) never changes during the execution of the function and can
+therefore move the @code{asm} outside the loop to produce more efficient code.
+Again, using the @code{volatile} qualifier disables this type of optimization.

@example
-gcc test-asm.c -mlsx -S -o test-asm.s
+void do_print(uint32_t dwSomeValue)
+@{
+ uint32_t dwRes;
+
+ for (uint32_t x=0; x < 5; x++)
+ @{
+ // Assumes dwSomeValue is not zero.
+ asm ("bsfl %1,%0"
+ : "=r" (dwRes)
+ : "r" (dwSomeValue)
+ : "cc");
+
+ printf("%u: %u %u\n", x, dwSomeValue, dwRes);
+ @}
+@}
@end example

-@noindent
-The assembly statement produces the following assembly code:
+The following example demonstrates a case where you need to use the
+@code{volatile} qualifier.
+It uses the x86 @code{rdtsc} instruction, which reads
+the computer's time-stamp counter. Without the @code{volatile} qualifier,
+the optimizers might assume that the @code{asm} block will always return the
+same value and therefore optimize away the second call.

@example
-vadd.d $vr0,$vr0,$vr1
-@end example
+uint64_t msr;

-This is a 128-bit vector addition instruction, @code{c} (referred to in the
-template string as %0) is the output, and @code{a} (%1) and @code{b} (%2) are
-the inputs. @code{__m128i} is a vector data type defined in the file
-@code{lsxintrin.h} (@xref{LoongArch SX Vector Intrinsics}). The symbol '=f'
-represents a constraint using a floating-point register as an output type, and
-the 'f' in the input operand represents a constraint using a floating-point
-register operand, which can refer to the definition of a constraint
-(@xref{Constraints}) in gcc.
+asm volatile ( "rdtsc\n\t" // Returns the time in EDX:EAX.
+ "shl $32, %%rdx\n\t" // Shift the upper bits left.
+ "or %%rdx, %0" // 'Or' in the lower bits.
+ : "=a" (msr)
+ :
+ : "rdx");

-@anchor{riscvOperandmodifiers}
-@subsubsection RISC-V Operand Modifiers
+printf("msr: %llx\n", msr);

-The list below describes the supported modifiers and their effects for RISC-V.
+// Do other work...

-@multitable @columnfractions .10 .90
-@headitem Modifier @tab Description
-@item @code{z} @tab Print ''@code{zero}'' instead of 0 if the operand is an immediate with a value of zero.
-@item @code{i} @tab Print the character ''@code{i}'' if the operand is an immediate. -@item @code{N} @tab Print the register encoding as integer (0 - 31). -@end multitable +// Reprint the timestamp +asm volatile ( "rdtsc\n\t" // Returns the time in EDX:EAX. + "shl $32, %%rdx\n\t" // Shift the upper bits left. + "or %%rdx, %0" // 'Or' in the lower bits. + : "=a" (msr) + : + : "rdx"); -@anchor{shOperandmodifiers} -@subsubsection SH Operand Modifiers +printf("msr: %llx\n", msr); +@end example -The list below describes the supported modifiers and their effects for the SH family of processors. +GCC's optimizers do not treat this code like the non-volatile code in the +earlier examples. They do not move it out of loops or omit it on the +assumption that the result from a previous call is still valid. -@multitable @columnfractions .10 .90 -@headitem Modifier @tab Description -@item @code{.} @tab Print ''@code{.s}'' if the instruction needs a delay slot. -@item @code{,} @tab Print ''@code{LOCAL_LABEL_PREFIX}''. -@item @code{@@} @tab Print ''@code{trap}'', ''@code{rte}'' or ''@code{rts}'' depending on the interrupt pragma used. -@item @code{#} @tab Print ''@code{nop}'' if there is nothing to put in the delay slot. -@item @code{'} @tab Print likelihood suffix (''@code{/u}'' for unlikely). -@item @code{>} @tab Print branch target if ''@code{-fverbose-asm}''. -@item @code{O} @tab Require a constant operand and print the constant expression with no punctuation. -@item @code{R} @tab Print the ''@code{LSW}'' of a dp value - changes if in little endian. -@item @code{S} @tab Print the ''@code{MSW}'' of a dp value - changes if in little endian. -@item @code{T} @tab Print the next word of a dp value - same as ''@code{R}'' in big endian mode. -@item @code{M} @tab Print ''@code{.b }'', ''@code{.w}'', ''@code{.l}'', ''@code{.s}'', ''@code{.d}'', suffix if operand is a MEM. -@item @code{N} @tab Print ''@code{r63}'' if the operand is ''@code{const_int 0}''. -@item @code{d} @tab Print a ''@code{V2SF}'' as ''@code{dN}'' instead of ''@code{fpN}''. -@item @code{m} @tab Print the pair ''@code{base,offset}'' or ''@code{base,index}'' for LD and ST. -@item @code{U} @tab Like ''@code{%m}'' for ''@code{LD}'' and ''@code{ST}'', ''@code{HI}'' and ''@code{LO}''. -@item @code{V} @tab Print the position of a single bit set. -@item @code{W} @tab Print the position of a single bit cleared. -@item @code{t} @tab Print a memory address which is a register. -@item @code{u} @tab Print the lowest 16 bits of ''@code{CONST_INT}'', as an unsigned value. -@item @code{o} @tab Print an operator. -@end multitable +Note that the compiler can move even @code{volatile asm} instructions relative +to other code, including across jump instructions. For example, on many +targets there is a system register that controls the rounding mode of +floating-point operations. Setting it with a @code{volatile asm} statement, +as in the following PowerPC example, does not work reliably. -@lowersections -@include md.texi -@raisesections +@example +asm volatile("mtfsf 255, %0" : : "f" (fpenv)); +sum = x + y; +@end example -@node Asm constexprs -@subsection C++11 Constant Expressions instead of String Literals +The compiler may move the addition back before the @code{volatile asm} +statement. 
To make it work as expected, add an artificial dependency to +the @code{asm} by referencing a variable in the subsequent code, for +example: -In C++ with @option{-std=gnu++11} or later, strings that appear in asm -syntax---specifically, the assembler template, constraints, and -clobbers---can be specified as parenthesized compile-time constant -expressions as well as by string literals. The parentheses around such -an expression are a required part of the syntax. The constant expression -can return a container with @code{data ()} and @code{size ()} -member functions, following similar rules as the C++26 @code{static_assert} -message. Any string is converted to the character set of the source code. -When this feature is available the @code{__GXX_CONSTEXPR_ASM__} preprocessor -macro is predefined. +@example +asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv)); +sum = x + y; +@end example -This extension is supported for both the basic and extended asm syntax. +Under certain circumstances, GCC may duplicate (or remove duplicates of) your +assembly code when optimizing. This can lead to unexpected duplicate symbol +errors during compilation if your @code{asm} code defines symbols or labels. +Using @samp{%=} +(@pxref{AssemblerTemplate}) may help resolve this problem. -@example -#include -constexpr std::string_view genfoo() @{ return "foo"; @} +@anchor{AssemblerTemplate} +@subsubsection Assembler Template +@cindex @code{asm} assembler template -void function() -@{ - asm((genfoo())); -@} -@end example +An assembler template is a literal string containing assembler instructions. +In C++ with @option{-std=gnu++11} or later, the assembler template can +also be a constant expression inside parentheses (see @ref{Asm constexprs}). -@node Asm Labels -@subsection Controlling Names Used in Assembler Code -@cindex assembler names for identifiers -@cindex names used in assembler code -@cindex identifiers, names in assembler code +The compiler replaces tokens in the template that refer +to inputs, outputs, and goto labels, +and then outputs the resulting string to the assembler. The +string can contain any instructions recognized by the assembler, including +directives. GCC does not parse the assembler instructions +themselves and does not know what they mean or even whether they are valid +assembler input. However, it does count the statements +(@pxref{Size of an asm}). -You can specify the name to be used in the assembler code for a C -function or variable by writing the @code{asm} (or @code{__asm__}) -keyword after the declarator. -It is up to you to make sure that the assembler names you choose do not -conflict with any other assembler symbols, or reference registers. +You may place multiple assembler instructions together in a single @code{asm} +string, separated by the characters normally used in assembly code for the +system. A combination that works in most places is a newline to break the +line, plus a tab character to move to the instruction field (written as +@samp{\n\t}). +Some assemblers allow semicolons as a line separator. However, note +that some assembler dialects use semicolons to start a comment. -@subsubheading Assembler names for data +Do not expect a sequence of @code{asm} statements to remain perfectly +consecutive after compilation, even when you are using the @code{volatile} +qualifier. If certain instructions need to remain consecutive in the output, +put them in a single multi-instruction @code{asm} statement. 
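+
+For example, this sketch (i386, AT&T syntax; the label name @code{loop}
+is arbitrary) writes a small loop as one multi-instruction template so
+that its instructions cannot be separated. It uses @samp{%=} (described
+below) so that the label stays unique even if the optimizers duplicate
+the statement:
+
+@example
+uint32_t count = 100;
+
+asm ("loop%=:\n\t"  // %= expands to a number unique to this asm.
+     "decl %0\n\t"
+     "jnz loop%="
+     : "+r" (count)
+     :
+     : "cc");
+@end example
+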
-This sample shows how to specify the assembler name for data: +Accessing data from C programs without using input/output operands (such as +by using global symbols directly from the assembler template) may not work as +expected. Similarly, calling functions directly from an assembler template +requires a detailed understanding of the target assembler and ABI. -@smallexample -int foo asm ("myfoo") = 2; -@end smallexample +Since GCC does not parse the assembler template, +it has no visibility of any +symbols it references. This may result in GCC discarding those symbols as +unreferenced unless they are also listed as input, output, or goto operands. -@noindent -This specifies that the name to be used for the variable @code{foo} in -the assembler code should be @samp{myfoo} rather than the usual -@samp{_foo}. +@subsubheading Special format strings -On systems where an underscore is normally prepended to the name of a C -variable, this feature allows you to define names for the -linker that do not start with an underscore. +In addition to the tokens described by the input, output, and goto operands, +these tokens have special meanings in the assembler template: -GCC does not support using this feature with a non-static local variable -since such variables do not have assembler names. If you are -trying to put the variable in a particular register, see -@ref{Explicit Register Variables}. +@table @samp +@item %% +Outputs a single @samp{%} into the assembler code. -@subsubheading Assembler names for functions +@item %= +Outputs a number that is unique to each instance of the @code{asm} +statement in the entire compilation. This option is useful when creating local +labels and referring to them multiple times in a single template that +generates multiple assembler instructions. -To specify the assembler name for functions, write a declaration for the -function before its definition and put @code{asm} there, like this: +@item %@{ +@itemx %| +@itemx %@} +Outputs @samp{@{}, @samp{|}, and @samp{@}} characters (respectively) +into the assembler code. When unescaped, these characters have special +meaning to indicate multiple assembler dialects, as described below. +@end table -@smallexample -int func (int x, int y) asm ("MYFUNC"); - -int func (int x, int y) -@{ - /* @r{@dots{}} */ -@end smallexample +@subsubheading Multiple assembler dialects in @code{asm} templates -@noindent -This specifies that the name to be used for the function @code{func} in -the assembler code should be @code{MYFUNC}. +On targets such as x86, GCC supports multiple assembler dialects. +The @option{-masm} option controls which dialect GCC uses as its +default for inline assembler. The target-specific documentation for the +@option{-masm} option contains the list of supported dialects, as well as the +default dialect if the option is not specified. This information may be +important to understand, since assembler code that works correctly when +compiled using one dialect will likely fail if compiled using another. +@xref{x86 Options}. -@node Explicit Register Variables -@subsection Variables in Specified Registers -@anchor{Explicit Reg Vars} -@cindex explicit register variables -@cindex variables in specified registers -@cindex specified registers +If your code needs to support multiple assembler dialects (for example, if +you are writing public headers that need to support a variety of compilation +options), use constructs of this form: -GNU C allows you to associate specific hardware registers with C -variables. 
In almost all cases, allowing the compiler to assign -registers produces the best code. However under certain unusual -circumstances, more precise control over the variable storage is -required. +@example +@{ dialect0 | dialect1 | dialect2... @} +@end example -Both global and local variables can be associated with a register. The -consequences of performing this association are very different between -the two, as explained in the sections below. +This construct outputs @code{dialect0} +when using dialect #0 to compile the code, +@code{dialect1} for dialect #1, etc. If there are fewer alternatives within the +braces than the number of dialects the compiler supports, the construct +outputs nothing. -@menu -* Global Register Variables:: Variables declared at global scope. -* Local Register Variables:: Variables declared within a function. -@end menu +For example, if an x86 compiler supports two dialects +(@samp{att}, @samp{intel}), an +assembler template such as this: -@node Global Register Variables -@subsubsection Defining Global Register Variables -@anchor{Global Reg Vars} -@cindex global register variables -@cindex registers, global variables in -@cindex registers, global allocation +@example +"bt@{l %[Offset],%[Base] | %[Base],%[Offset]@}; jc %l2" +@end example -You can define a global register variable and associate it with a specified -register like this: +@noindent +is equivalent to one of -@smallexample -register int *foo asm ("r12"); -@end smallexample +@example +"btl %[Offset],%[Base] ; jc %l2" @r{/* att dialect */} +"bt %[Base],%[Offset]; jc %l2" @r{/* intel dialect */} +@end example + +Using that same compiler, this code: + +@example +"xchg@{l@}\t@{%%@}ebx, %1" +@end example @noindent -Here @code{r12} is the name of the register that should be used. Note that -this is the same syntax used for defining local register variables, but for -a global variable the declaration appears outside a function. The -@code{register} keyword is required, and cannot be combined with -@code{static}. The register name must be a valid register name for the -target platform. +corresponds to either -Do not use type qualifiers such as @code{const} and @code{volatile}, as -the outcome may be contrary to expectations. In particular, using the -@code{volatile} qualifier does not fully prevent the compiler from -optimizing accesses to the register. +@example +"xchgl\t%%ebx, %1" @r{/* att dialect */} +"xchg\tebx, %1" @r{/* intel dialect */} +@end example -Registers are a scarce resource on most systems and allowing the -compiler to manage their usage usually results in the best code. However, -under special circumstances it can make sense to reserve some globally. -For example this may be useful in programs such as programming language -interpreters that have a couple of global variables that are accessed -very often. +There is no support for nesting dialect alternatives. -After defining a global register variable, for the current compilation -unit: +@anchor{OutputOperands} +@subsubsection Output Operands +@cindex @code{asm} output operands -@itemize @bullet -@item If the register is a call-saved register, call ABI is affected: -the register will not be restored in function epilogue sequences after -the variable has been assigned. Therefore, functions cannot safely -return to callers that assume standard ABI. -@item Conversely, if the register is a call-clobbered register, making -calls to functions that use standard ABI may lose contents of the variable. 
-Such calls may be created by the compiler even if none are evident in -the original program, for example when libgcc functions are used to -make up for unavailable instructions. -@item Accesses to the variable may be optimized as usual and the register -remains available for allocation and use in any computations, provided that -observable values of the variable are not affected. -@item If the variable is referenced in inline assembly, the type of access -must be provided to the compiler via constraints (@pxref{Constraints}). -Accesses from basic asms are not supported. -@end itemize +An @code{asm} statement has zero or more output operands indicating the names +of C variables modified by the assembler code. -Note that these points @emph{only} apply to code that is compiled with the -definition. The behavior of code that is merely linked in (for example -code from libraries) is not affected. +In this i386 example, @code{old} (referred to in the template string as +@code{%0}) and @code{*Base} (as @code{%1}) are outputs and @code{Offset} +(@code{%2}) is an input: -If you want to recompile source files that do not actually use your global -register variable so they do not use the specified register for any other -purpose, you need not actually add the global register declaration to -their source code. It suffices to specify the compiler option -@option{-ffixed-@var{reg}} (@pxref{Code Gen Options}) to reserve the -register. +@example +bool old; -@subsubheading Declaring the variable +__asm__ ("btsl %2,%1\n\t" // Turn on zero-based bit #Offset in Base. + "sbb %0,%0" // Use the CF to calculate old. + : "=r" (old), "+rm" (*Base) + : "Ir" (Offset) + : "cc"); -Global register variables cannot have initial values, because an -executable file has no means to supply initial contents for a register. +return old; +@end example -When selecting a register, choose one that is normally saved and -restored by function calls on your machine. This ensures that code -which is unaware of this reservation (such as library routines) will -restore it before returning. +Operands are separated by commas. Each operand has this format: -On machines with register windows, be sure to choose a global -register that is not affected magically by the function call mechanism. +@example +@r{[} [@var{asmSymbolicName}] @r{]} @var{constraint} (@var{cvariablename}) +@end example -@subsubheading Using the variable +@table @var +@item asmSymbolicName +Specifies an optional symbolic name for the operand. The literal square +brackets @samp{[]} around the @var{asmSymbolicName} are required both +in the operand specification and references to the operand in the assembler +template, i.e.@: @samp{%[Value]}. +The scope of the name is the @code{asm} statement +that contains the definition. Any valid C identifier is acceptable, +including names already defined in the surrounding code. No two operands +within the same @code{asm} statement can use the same symbolic name. -@cindex @code{qsort}, and global register variables -When calling routines that are not aware of the reservation, be -cautious if those routines call back into code which uses them. As an -example, if you call the system library version of @code{qsort}, it may -clobber your registers during execution, but (if you have selected -appropriate registers) it will restore them before returning. However -it will @emph{not} restore them before calling @code{qsort}'s comparison -function. 
As a result, global values will not reliably be available to -the comparison function unless the @code{qsort} function itself is rebuilt. +When not using an @var{asmSymbolicName}, use the (zero-based) position +of the operand +in the list of operands in the assembler template. For example if there are +three output operands, use @samp{%0} in the template to refer to the first, +@samp{%1} for the second, and @samp{%2} for the third. -Similarly, it is not safe to access the global register variables from signal -handlers or from more than one thread of control. Unless you recompile -them specially for the task at hand, the system library routines may -temporarily use the register for other things. Furthermore, since the register -is not reserved exclusively for the variable, accessing it from handlers of -asynchronous signals may observe unrelated temporary values residing in the -register. +@item constraint +A string constant specifying constraints on the placement of the operand; +@xref{Constraints}, for details. +In C++ with @option{-std=gnu++11} or later, the constraint can +also be a constant expression inside parentheses (see @ref{Asm constexprs}). -@cindex register variable after @code{longjmp} -@cindex global register after @code{longjmp} -@cindex value after @code{longjmp} -@findex longjmp -@findex setjmp -On most machines, @code{longjmp} restores to each global register -variable the value it had at the time of the @code{setjmp}. On some -machines, however, @code{longjmp} does not change the value of global -register variables. To be portable, the function that called @code{setjmp} -should make other arrangements to save the values of the global register -variables, and to restore them in a @code{longjmp}. This way, the same -thing happens regardless of what @code{longjmp} does. +Output constraints must begin with either @samp{=} (a variable overwriting an +existing value) or @samp{+} (when reading and writing). When using +@samp{=}, do not assume the location contains the existing value +on entry to the @code{asm}, except +when the operand is tied to an input; @pxref{InputOperands,,Input Operands}. -@node Local Register Variables -@subsubsection Specifying Registers for Local Variables -@anchor{Local Reg Vars} -@cindex local variables, specifying registers -@cindex specifying registers for local variables -@cindex registers for local variables +After the prefix, there must be one or more additional constraints +(@pxref{Constraints}) that describe where the value resides. Common +constraints include @samp{r} for register and @samp{m} for memory. +When you list more than one possible location (for example, @code{"=rm"}), +the compiler chooses the most efficient one based on the current context. +If you list as many alternates as the @code{asm} statement allows, you permit +the optimizers to produce the best possible code. +If you must use a specific register, but your Machine Constraints do not +provide sufficient control to select the specific register you want, +local register variables may provide a solution (@pxref{Local Register +Variables}). -You can define a local register variable and associate it with a specified -register like this: +@item cvariablename +Specifies a C lvalue expression to hold the output, typically a variable name. +The enclosing parentheses are a required part of the syntax. -@smallexample -register int *foo asm ("r12"); -@end smallexample +@end table -@noindent -Here @code{r12} is the name of the register that should be used. 
Note -that this is the same syntax used for defining global register variables, -but for a local variable the declaration appears within a function. The -@code{register} keyword is required, and cannot be combined with -@code{static}. The register name must be a valid register name for the -target platform. +When the compiler selects the registers to use to +represent the output operands, it does not use any of the clobbered registers +(@pxref{Clobbers and Scratch Registers}). -Do not use type qualifiers such as @code{const} and @code{volatile}, as -the outcome may be contrary to expectations. In particular, when the -@code{const} qualifier is used, the compiler may substitute the -variable with its initializer in @code{asm} statements, which may cause -the corresponding operand to appear in a different register. +Output operand expressions must be lvalues. The compiler cannot check whether +the operands have data types that are reasonable for the instruction being +executed. For output expressions that are not directly addressable (for +example a bit-field), the constraint must allow a register. In that case, GCC +uses the register as the output of the @code{asm}, and then stores that +register into the output. -As with global register variables, it is recommended that you choose -a register that is normally saved and restored by function calls on your -machine, so that calls to library routines will not clobber it. +Operands using the @samp{+} constraint modifier count as two operands +(that is, both as input and output) towards the total maximum of 30 operands +per @code{asm} statement. -The only supported use for this feature is to specify registers -for input and output operands when calling Extended @code{asm} -(@pxref{Extended Asm}). This may be necessary if the constraints for a -particular machine don't provide sufficient control to select the desired -register. To force an operand into a register, create a local variable -and specify the register name after the variable's declaration. Then use -the local variable for the @code{asm} operand and specify any constraint -letter that matches the register: +Use the @samp{&} constraint modifier (@pxref{Modifiers}) on all output +operands that must not overlap an input. Otherwise, +GCC may allocate the output operand in the same register as an unrelated +input operand, on the assumption that the assembler code consumes its +inputs before producing outputs. This assumption may be false if the assembler +code actually consists of more than one instruction. -@smallexample -register int *p1 asm ("r0") = @dots{}; -register int *p2 asm ("r1") = @dots{}; -register int *result asm ("r0"); -asm ("sysint" : "=r" (result) : "0" (p1), "r" (p2)); -@end smallexample +The same problem can occur if one output parameter (@var{a}) allows a register +constraint and another output parameter (@var{b}) allows a memory constraint. +The code generated by GCC to access the memory address in @var{b} can contain +registers which @emph{might} be shared by @var{a}, and GCC considers those +registers to be inputs to the asm. As above, GCC assumes that such input +registers are consumed before any outputs are written. This assumption may +result in incorrect behavior if the @code{asm} statement writes to @var{a} +before using +@var{b}. Combining the @samp{&} modifier with the register constraint on @var{a} +ensures that modifying @var{a} does not affect the address referenced by +@var{b}. 
Otherwise, the location of @var{b} +is undefined if @var{a} is modified before using @var{b}. -@emph{Warning:} In the above example, be aware that a register (for example -@code{r0}) can be call-clobbered by subsequent code, including function -calls and library calls for arithmetic operators on other variables (for -example the initialization of @code{p2}). In this case, use temporary -variables for expressions between the register assignments: +@code{asm} supports operand modifiers on operands (for example @samp{%k2} +instead of simply @samp{%2}). @ref{GenericOperandmodifiers, +Generic Operand modifiers} lists the modifiers that are available +on all targets. Other modifiers are hardware dependent. +For example, the list of supported modifiers for x86 is found at +@ref{x86Operandmodifiers,x86 Operand modifiers}. -@smallexample -int t1 = @dots{}; -register int *p1 asm ("r0") = @dots{}; -register int *p2 asm ("r1") = t1; -register int *result asm ("r0"); -asm ("sysint" : "=r" (result) : "0" (p1), "r" (p2)); -@end smallexample +If the C code that follows the @code{asm} makes no use of any of the output +operands, use @code{volatile} for the @code{asm} statement to prevent the +optimizers from discarding the @code{asm} statement as unneeded +(see @ref{Volatile}). -Defining a register variable does not reserve the register. Other than -when invoking the Extended @code{asm}, the contents of the specified -register are not guaranteed. For this reason, the following uses -are explicitly @emph{not} supported. If they appear to work, it is only -happenstance, and may stop working as intended due to (seemingly) -unrelated changes in surrounding code, or even minor changes in the -optimization of a future version of gcc: +This code makes no use of the optional @var{asmSymbolicName}. Therefore it +references the first output operand as @code{%0} (were there a second, it +would be @code{%1}, etc). The number of the first input operand is one greater +than that of the last output operand. In this i386 example, that makes +@code{Mask} referenced as @code{%1}: -@itemize @bullet -@item Passing parameters to or from Basic @code{asm} -@item Passing parameters to or from Extended @code{asm} without using input -or output operands. -@item Passing parameters to or from routines written in assembler (or -other languages) using non-standard calling conventions. -@end itemize +@example +uint32_t Mask = 1234; +uint32_t Index; -Some developers use Local Register Variables in an attempt to improve -gcc's allocation of registers, especially in large functions. In this -case the register name is essentially a hint to the register allocator. -While in some instances this can generate better code, improvements are -subject to the whims of the allocator/optimizers. Since there are no -guarantees that your improvements won't be lost, this usage of Local -Register Variables is discouraged. + asm ("bsfl %1, %0" + : "=r" (Index) + : "r" (Mask) + : "cc"); +@end example -On the MIPS platform, there is related use for local register variables -with slightly different characteristics (@pxref{MIPS Coprocessors,, -Defining coprocessor specifics for MIPS targets, gccint, -GNU Compiler Collection (GCC) Internals}). +That code overwrites the variable @code{Index} (@samp{=}), +placing the value in a register (@samp{r}). +Using the generic @samp{r} constraint instead of a constraint for a specific +register allows the compiler to pick the register to use, which can result +in more efficient code. 
This may not be possible if an assembler instruction +requires a specific register. -@node Size of an asm -@subsection Size of an @code{asm} +The following i386 example uses the @var{asmSymbolicName} syntax. +It produces the +same result as the code above, but some may consider it more readable or more +maintainable since reordering index numbers is not necessary when adding or +removing operands. The names @code{aIndex} and @code{aMask} +are only used in this example to emphasize which +names get used where. +It is acceptable to reuse the names @code{Index} and @code{Mask}. -Some targets require that GCC track the size of each instruction used -in order to generate correct code. Because the final length of the -code produced by an @code{asm} statement is only known by the -assembler, GCC must make an estimate as to how big it will be. It -does this by counting the number of instructions in the pattern of the -@code{asm} and multiplying that by the length of the longest -instruction supported by that processor. (When working out the number -of instructions, it assumes that any occurrence of a newline or of -whatever statement separator character is supported by the assembler --- -typically @samp{;} --- indicates the end of an instruction.) +@example +uint32_t Mask = 1234; +uint32_t Index; -Normally, GCC's estimate is adequate to ensure that correct -code is generated, but it is possible to confuse the compiler if you use -pseudo instructions or assembler macros that expand into multiple real -instructions, or if you use assembler directives that expand to more -space in the object file than is needed for a single instruction. -If this happens then the assembler may produce a diagnostic saying that -a label is unreachable. + asm ("bsfl %[aMask], %[aIndex]" + : [aIndex] "=r" (Index) + : [aMask] "r" (Mask) + : "cc"); +@end example -@cindex @code{asm inline} -This size is also used for inlining decisions. If you use @code{asm inline} -instead of just @code{asm}, then for inlining purposes the size of the asm -is taken as the minimum size, ignoring how many instructions GCC thinks it is. +Here are some more examples of output operands. -@node Syntax Extensions -@section Other Extensions to C Syntax +@example +uint32_t c = 1; +uint32_t d; +uint32_t *e = &c; -GNU C has traditionally supported numerous extensions to standard C -syntax. Some of these features were originally intended for -compatibility with other compilers or to ease traditional C -compatibility, some have been adopted into subsequent versions of the -C and/or C++ standards, while others remain specific to GNU C. +asm ("mov %[e], %[d]" + : [d] "=rm" (d) + : [e] "rm" (*e)); +@end example -@menu -* Statement Exprs:: Putting statements and declarations inside expressions. -* Local Labels:: Labels local to a block. -* Labels as Values:: Getting pointers to labels, and computed gotos. -* Nested Functions:: Nested functions in GNU C. -* Typeof:: @code{typeof}: referring to the type of an expression. -* Offsetof:: Special syntax for @code{offsetof}. -* Alignment:: Determining the alignment of a function, type or variable. -* Incomplete Enums:: @code{enum foo;}, with details to follow. -* Variadic Macros:: Macros with a variable number of arguments. -* Conditionals:: Omitting the middle operand of a @samp{?:} expression. -* Case Ranges:: `case 1 ... 9' and such. -* Mixed Labels and Declarations:: Mixing declarations, labels and code. -* C++ Comments:: C++ comments are recognized. 
-* Escaped Newlines:: Slightly looser rules for escaped newlines. -* Hex Floats:: Hexadecimal floating-point constants. -* Binary constants:: Binary constants using the @samp{0b} prefix. -* Dollar Signs:: Dollar sign is allowed in identifiers. -* Character Escapes:: @samp{\e} stands for the character @key{ESC}. -* Alternate Keywords:: @code{__const__}, @code{__asm__}, etc., for header files. -* Function Names:: Printable strings which are the name of the current - function. -@end menu +Here, @code{d} may either be in a register or in memory. Since the compiler +might already have the current value of the @code{uint32_t} location +pointed to by @code{e} +in a register, you can enable it to choose the best location +for @code{d} by specifying both constraints. -@node Statement Exprs -@subsection Statements and Declarations in Expressions -@cindex statements inside expressions -@cindex declarations inside expressions -@cindex expressions containing statements -@cindex macros, statements in expressions +@anchor{FlagOutputOperands} +@subsubsection Flag Output Operands +@cindex @code{asm} flag output operands -@c the above section title wrapped and causes an underfull hbox.. i -@c changed it from "within" to "in". --mew 4feb93 -A compound statement enclosed in parentheses may appear as an expression -in GNU C@. This allows you to use loops, switches, and local variables -within an expression. +Some targets have a special register that holds the ``flags'' for the +result of an operation or comparison. Normally, the contents of that +register are either unmodified by the asm, or the @code{asm} statement is +considered to clobber the contents. -Recall that a compound statement is a sequence of statements surrounded -by braces; in this construct, parentheses go around the braces. For -example: +On some targets, a special form of output operand exists by which +conditions in the flags register may be outputs of the asm. The set of +conditions supported are target specific, but the general rule is that +the output variable must be a scalar integer, and the value is boolean. +When supported, the target defines the preprocessor symbol +@code{__GCC_ASM_FLAG_OUTPUTS__}. -@smallexample -(@{ int y = foo (); int z; - if (y > 0) z = y; - else z = - y; - z; @}) -@end smallexample +Because of the special nature of the flag output operands, the constraint +may not include alternatives. -@noindent -is a valid (though slightly more complex than necessary) expression -for the absolute value of @code{foo ()}. +Most often, the target has only one flags register, and thus is an implied +operand of many instructions. In this case, the operand should not be +referenced within the assembler template via @code{%0} etc, as there's +no corresponding text in the assembly language. -The last thing in the compound statement should be an expression -followed by a semicolon; the value of this subexpression serves as the -value of the entire construct. (If you use some other kind of statement -last within the braces, the construct has type @code{void}, and thus -effectively no value.) +@table @asis +@item ARM +@itemx AArch64 +The flag output constraints for the ARM family are of the form +@samp{=@@cc@var{cond}} where @var{cond} is one of the standard +conditions defined in the ARM ARM for @code{ConditionHolds}. -This feature is especially useful in making macro definitions ``safe'' (so -that they evaluate each operand exactly once). 
For example, the
-``maximum'' function is commonly defined as a macro in standard C as
-follows:
+@table @code
+@item eq
+Z flag set, or equal
+@item ne
+Z flag clear or not equal
+@item cs
+@itemx hs
+C flag set or unsigned greater than or equal
+@item cc
+@itemx lo
+C flag clear or unsigned less than
+@item mi
+N flag set or ``minus''
+@item pl
+N flag clear or ``plus''
+@item vs
+V flag set or signed overflow
+@item vc
+V flag clear
+@item hi
+unsigned greater than
+@item ls
+unsigned less than or equal
+@item ge
+signed greater than or equal
+@item lt
+signed less than
+@item gt
+signed greater than
+@item le
+signed less than or equal
+@end table

-@smallexample
-#define max(a,b) ((a) > (b) ? (a) : (b))
-@end smallexample
+The flag output constraints are not supported in thumb1 mode.

-@noindent
-@cindex side effects, macro argument
-But this definition computes either @var{a} or @var{b} twice, with bad
-results if the operand has side effects. In GNU C, if you know the
-type of the operands (here taken as @code{int}), you can avoid this
-problem by defining the macro as follows:
+@item x86 family
+The flag output constraints for the x86 family are of the form
+@samp{=@@cc@var{cond}} where @var{cond} is one of the standard
+conditions defined in the ISA manual for @code{j@var{cc}} or
+@code{set@var{cc}}.

-@smallexample
-#define maxint(a,b) \
- (@{int _a = (a), _b = (b); _a > _b ? _a : _b; @})
-@end smallexample
+@table @code
+@item a
+``above'' or unsigned greater than
+@item ae
+``above or equal'' or unsigned greater than or equal
+@item b
+``below'' or unsigned less than
+@item be
+``below or equal'' or unsigned less than or equal
+@item c
+carry flag set
+@item e
+@itemx z
+``equal'' or zero flag set
+@item g
+signed greater than
+@item ge
+signed greater than or equal
+@item l
+signed less than
+@item le
+signed less than or equal
+@item o
+overflow flag set
+@item p
+parity flag set
+@item s
+sign flag set
+@item na
+@itemx nae
+@itemx nb
+@itemx nbe
+@itemx nc
+@itemx ne
+@itemx ng
+@itemx nge
+@itemx nl
+@itemx nle
+@itemx no
+@itemx np
+@itemx ns
+@itemx nz
+``not'' @var{flag}, or inverted versions of those above
+@end table

-Note that introducing variable declarations (as we do in @code{maxint}) can
-cause variable shadowing, so while this example using the @code{max} macro
-produces correct results:
-@smallexample
-int _a = 1, _b = 2, c;
-c = max (_a, _b);
-@end smallexample
-@noindent
-this example using maxint will not:
-@smallexample
-int _a = 1, _b = 2, c;
-c = maxint (_a, _b);
-@end smallexample
+@item s390
+The flag output constraint for s390 is @samp{=@@cc}. Only one such
+constraint is allowed. The output must be stored in an @code{int}
+variable.

-This problem may for instance occur when we use this pattern recursively, like
-so:
+@end table

-@smallexample
-#define maxint3(a, b, c) \
- (@{int _a = (a), _b = (b), _c = (c); maxint (maxint (_a, _b), _c); @})
-@end smallexample
+@anchor{InputOperands}
+@subsubsection Input Operands
+@cindex @code{asm} input operands
+@cindex @code{asm} expressions

-Embedded statements are not allowed in constant expressions, such as
-the value of an enumeration constant, the width of a bit-field, or
-the initial value of a static variable.
+Input operands make values from C variables and expressions available to the
+assembly code.

-If you don't know the type of the operand, you can still do this, but you
-must use @code{typeof} or @code{__auto_type} (@pxref{Typeof}).
+Operands are separated by commas.
Each operand has this format: -In G++, the result value of a statement expression undergoes array and -function pointer decay, and is returned by value to the enclosing -expression. For instance, if @code{A} is a class, then +@example +@r{[} [@var{asmSymbolicName}] @r{]} @var{constraint} (@var{cexpression}) +@end example -@smallexample - A a; +@table @var +@item asmSymbolicName +Specifies an optional symbolic name for the operand. The literal square +brackets @samp{[]} around the @var{asmSymbolicName} are required both +in the operand specification and references to the operand in the assembler +template, i.e.@: @samp{%[Value]}. +The scope of the name is the @code{asm} statement +that contains the definition. Any valid C identifier is acceptable, +including names already defined in the surrounding code. No two operands +within the same @code{asm} statement can use the same symbolic name. - (@{a;@}).Foo () -@end smallexample +When not using an @var{asmSymbolicName}, use the (zero-based) position +of the operand +in the list of operands in the assembler template. For example if there are +two output operands and three inputs, +use @samp{%2} in the template to refer to the first input operand, +@samp{%3} for the second, and @samp{%4} for the third. -@noindent -constructs a temporary @code{A} object to hold the result of the -statement expression, and that is used to invoke @code{Foo}. -Therefore the @code{this} pointer observed by @code{Foo} is not the -address of @code{a}. +@item constraint +A string constant specifying constraints on the placement of the operand; +@xref{Constraints}, for details. +In C++ with @option{-std=gnu++11} or later, the constraint can +also be a constant expression inside parentheses (see @ref{Asm constexprs}). -In a statement expression, any temporaries created within a statement -are destroyed at that statement's end. This makes statement -expressions inside macros slightly different from function calls. In -the latter case temporaries introduced during argument evaluation are -destroyed at the end of the statement that includes the function -call. In the statement expression case they are destroyed during -the statement expression. For instance, +Input constraint strings may not begin with either @samp{=} or @samp{+}. +When you list more than one possible location (for example, @samp{"irm"}), +the compiler chooses the most efficient one based on the current context. +If you must use a specific register, but your Machine Constraints do not +provide sufficient control to select the specific register you want, +local register variables may provide a solution (@pxref{Local Register +Variables}). -@smallexample -#define macro(a) (@{__typeof__(a) b = (a); b + 3; @}) -template T function(T a) @{ T b = a; return b + 3; @} +Input constraints can also be digits (for example, @code{"0"}). This indicates +that the specified input must be in the same place as the output constraint +at the (zero-based) index in the output constraint list. +When using @var{asmSymbolicName} syntax for the output operands, +you may use these names (enclosed in brackets @samp{[]}) instead of digits. -void foo () -@{ - macro (X ()); - function (X ()); -@} -@end smallexample +@item cexpression +This is the C variable or expression being passed to the @code{asm} statement +as input. The enclosing parentheses are a required part of the syntax. -@noindent -has different places where temporaries are destroyed. 
For the -@code{macro} case, the temporary @code{X} is destroyed just after -the initialization of @code{b}. In the @code{function} case that -temporary is destroyed when the function returns. +@end table -These considerations mean that it is probably a bad idea to use -statement expressions of this form in header files that are designed to -work with C++. (Note that some versions of the GNU C Library contained -header files using statement expressions that lead to precisely this -bug.) +When the compiler selects the registers to use to represent the input +operands, it does not use any of the clobbered registers +(@pxref{Clobbers and Scratch Registers}). -Jumping into a statement expression with @code{goto} or using a -@code{switch} statement outside the statement expression with a -@code{case} or @code{default} label inside the statement expression is -not permitted. Jumping into a statement expression with a computed -@code{goto} (@pxref{Labels as Values}) has undefined behavior. -Jumping out of a statement expression is permitted, but if the -statement expression is part of a larger expression then it is -unspecified which other subexpressions of that expression have been -evaluated except where the language definition requires certain -subexpressions to be evaluated before or after the statement -expression. A @code{break} or @code{continue} statement inside of -a statement expression used in @code{while}, @code{do} or @code{for} -loop or @code{switch} statement condition -or @code{for} statement init or increment expressions jumps to an -outer loop or @code{switch} statement if any (otherwise it is an error), -rather than to the loop or @code{switch} statement in whose condition -or init or increment expression it appears. -In any case, as with a function call, the evaluation of a -statement expression is not interleaved with the evaluation of other -parts of the containing expression. For example, +If there are no output operands but there are input operands, place two +consecutive colons where the output operands would go: -@smallexample - foo (), ((@{ bar1 (); goto a; 0; @}) + bar2 ()), baz(); -@end smallexample +@example +__asm__ ("some instructions" + : /* No outputs. */ + : "r" (Offset / 8)); +@end example -@noindent -calls @code{foo} and @code{bar1} and does not call @code{baz} but -may or may not call @code{bar2}. If @code{bar2} is called, it is -called after @code{foo} and before @code{bar1}. +@strong{Warning:} Do @emph{not} modify the contents of input-only operands +(except for inputs tied to outputs). The compiler assumes that on exit from +the @code{asm} statement these operands contain the same values as they +had before executing the statement. +It is @emph{not} possible to use clobbers +to inform the compiler that the values in these inputs are changing. One +common work-around is to tie the changing input variable to an output variable +that never gets used. Note, however, that if the code that follows the +@code{asm} statement makes no use of any of the output operands, the GCC +optimizers may discard the @code{asm} statement as unneeded +(see @ref{Volatile}). -@node Local Labels -@subsection Locally Declared Labels -@cindex local labels -@cindex macros, local labels +@code{asm} supports operand modifiers on operands (for example @samp{%k2} +instead of simply @samp{%2}). @ref{GenericOperandmodifiers, +Generic Operand modifiers} lists the modifiers that are available +on all targets. Other modifiers are hardware dependent. 
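+
+As a minimal sketch of a generic modifier (everything here is
+illustrative: @samp{%c0} prints the constant with no immediate-operand
+punctuation, and the @samp{#} comment syntax is not universal across
+assemblers):
+
+@smallexample
+/* Emits an assembler comment such as "# size 4".  */
+asm ("# size %c0" : : "i" (sizeof (int)));
+@end smallexample
+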
+For example, the list of supported modifiers for x86 is found at +@ref{x86Operandmodifiers,x86 Operand modifiers}. -GCC allows you to declare @dfn{local labels} in any nested block -scope. A local label is just like an ordinary label, but you can -only reference it (with a @code{goto} statement, or by taking its -address) within the block in which it is declared. +In this example using the fictitious @code{combine} instruction, the +constraint @code{"0"} for input operand 1 says that it must occupy the same +location as output operand 0. Only input operands may use numbers in +constraints, and they must each refer to an output operand. Only a number (or +the symbolic assembler name) in the constraint can guarantee that one operand +is in the same place as another. The mere fact that @code{foo} is the value of +both operands is not enough to guarantee that they are in the same place in +the generated assembler code. -A local label declaration looks like this: +@example +asm ("combine %2, %0" + : "=r" (foo) + : "0" (foo), "g" (bar)); +@end example -@smallexample -__label__ @var{label}; -@end smallexample +Here is an example using symbolic names. -@noindent -or +@example +asm ("cmoveq %1, %2, %[result]" + : [result] "=r"(result) + : "r" (test), "r" (new), "[result]" (old)); +@end example -@smallexample -__label__ @var{label1}, @var{label2}, /* @r{@dots{}} */; -@end smallexample +@anchor{Clobbers and Scratch Registers} +@subsubsection Clobbers and Scratch Registers +@cindex @code{asm} clobbers +@cindex @code{asm} scratch registers -Local label declarations must come at the beginning of the block, -before any ordinary declarations or statements. +While the compiler is aware of changes to entries listed in the output +operands, the inline @code{asm} code may modify more than just the outputs. For +example, calculations may require additional registers, or the processor may +overwrite a register as a side effect of a particular assembler instruction. +In order to inform the compiler of these changes, list them in the clobber +list. Clobber list items are either register names or the special clobbers +(listed below). Each clobber list item is a string constant +enclosed in double quotes and separated by commas. +In C++ with @option{-std=gnu++11} or later, a clobber list item can +also be a constant expression inside parentheses (see @ref{Asm constexprs}). -The label declaration defines the label @emph{name}, but does not define -the label itself. You must do this in the usual way, with -@code{@var{label}:}, within the statements of the statement expression. +Clobber descriptions may not in any way overlap with an input or output +operand. For example, you may not have an operand describing a register class +with one member when listing that register in the clobber list. Variables +declared to live in specific registers (@pxref{Explicit Register +Variables}) and used +as @code{asm} input or output operands must have no part mentioned in the +clobber description. In particular, there is no way to specify that input +operands get modified without also specifying them as output operands. -The local label feature is useful for complex macros. If a macro -contains nested loops, a @code{goto} can be useful for breaking out of -them. However, an ordinary label whose scope is the whole function -cannot be used: if the macro can be expanded several times in one -function, the label is multiply defined in that function. A -local label avoids this problem. 
For example: +When the compiler selects which registers to use to represent input and output +operands, it does not use any of the clobbered registers. As a result, +clobbered registers are available for any use in the assembler code. -@smallexample -#define SEARCH(value, array, target) \ -do @{ \ - __label__ found; \ - typeof (target) _SEARCH_target = (target); \ - typeof (*(array)) *_SEARCH_array = (array); \ - int i, j; \ - int value; \ - for (i = 0; i < max; i++) \ - for (j = 0; j < max; j++) \ - if (_SEARCH_array[i][j] == _SEARCH_target) \ - @{ (value) = i; goto found; @} \ - (value) = -1; \ - found:; \ -@} while (0) -@end smallexample +Another restriction is that the clobber list should not contain the +stack pointer register. This is because the compiler requires the +value of the stack pointer to be the same after an @code{asm} +statement as it was on entry to the statement. However, previous +versions of GCC did not enforce this rule and allowed the stack +pointer to appear in the list, with unclear semantics. This behavior +is deprecated and listing the stack pointer may become an error in +future versions of GCC@. -This could also be written using a statement expression: +Here is a realistic example for the VAX showing the use of clobbered +registers: -@smallexample -#define SEARCH(array, target) \ -(@{ \ - __label__ found; \ - typeof (target) _SEARCH_target = (target); \ - typeof (*(array)) *_SEARCH_array = (array); \ - int i, j; \ - int value; \ - for (i = 0; i < max; i++) \ - for (j = 0; j < max; j++) \ - if (_SEARCH_array[i][j] == _SEARCH_target) \ - @{ value = i; goto found; @} \ - value = -1; \ - found: \ - value; \ -@}) -@end smallexample +@example +asm volatile ("movc3 %0, %1, %2" + : /* No outputs. */ + : "g" (from), "g" (to), "g" (count) + : "r0", "r1", "r2", "r3", "r4", "r5", "memory"); +@end example -Local label declarations also make the labels they declare visible to -nested functions, if there are any. @xref{Nested Functions}, for details. +Also, there are three special clobber arguments: -@node Labels as Values -@subsection Labels as Values -@cindex labels as values -@cindex computed gotos -@cindex goto with computed label -@cindex address of a label +@table @code +@item "cc" +The @code{"cc"} clobber indicates that the assembler code modifies the flags +register. On some machines, GCC represents the condition codes as a specific +hardware register; @code{"cc"} serves to name this register. +On other machines, condition code handling is different, +and specifying @code{"cc"} has no effect. But +it is valid no matter what the target. -You can get the address of a label defined in the current function -(or a containing function) with the unary operator @samp{&&}. The -value has type @code{void *}. This value is a constant and can be used -wherever a constant of that type is valid. For example: +@item "memory" +The @code{"memory"} clobber tells the compiler that the assembly code +performs memory +reads or writes to items other than those listed in the input and output +operands (for example, accessing the memory pointed to by one of the input +parameters). To ensure memory contains correct values, GCC may need to flush +specific register values to memory before executing the @code{asm}. Further, +the compiler does not assume that any values read from memory before an +@code{asm} remain unchanged after that @code{asm}; it reloads them as +needed. +Using the @code{"memory"} clobber effectively forms a read/write +memory barrier for the compiler. 
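+
+A minimal sketch of this barrier use (the empty template is a common
+idiom for a pure compiler-level barrier, so no instructions are
+emitted; @code{shared} stands for any object whose address is visible
+outside the function):
+
+@smallexample
+extern int shared;
+shared = 1;
+asm volatile ("" : : : "memory");   /* @r{compiler-level barrier} */
+int copy = shared;   /* @r{reloaded from memory, not assumed to be 1} */
+@end smallexample
+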
-@smallexample -void *ptr; -/* @r{@dots{}} */ -ptr = &&foo; -@end smallexample +Note that this clobber does not prevent the @emph{processor} from doing +speculative reads past the @code{asm} statement. To prevent that, you need +processor-specific fence instructions. -To use these values, you need to be able to jump to one. This is done -with the computed goto statement@footnote{The analogous feature in -Fortran is called an assigned goto, but that name seems inappropriate in -C, where one can do more than simply store label addresses in label -variables.}, @code{goto *@var{exp};}. For example, +@item "redzone" +The @code{"redzone"} clobber tells the compiler that the assembly code +may write to the stack red zone, area below the stack pointer which on +some architectures in some calling conventions is guaranteed not to be +changed by signal handlers, interrupts or exceptions and so the compiler +can store there temporaries in leaf functions. On targets which have +no concept of the stack red zone, the clobber is ignored. +It should be used e.g.@: in case the assembly code uses call instructions +or pushes something to the stack without taking the red zone into account +by subtracting red zone size from the stack pointer first and restoring +it afterwards. -@smallexample -goto *ptr; -@end smallexample +@end table -@noindent -Any expression of type @code{void *} is allowed. +Flushing registers to memory has performance implications and may be +an issue for time-sensitive code. You can provide better information +to GCC to avoid this, as shown in the following examples. At a +minimum, aliasing rules allow GCC to know what memory @emph{doesn't} +need to be flushed. -One way of using these constants is in initializing a static array that -serves as a jump table: +Here is a fictitious sum of squares instruction, that takes two +pointers to floating point values in memory and produces a floating +point register output. +Notice that @code{x}, and @code{y} both appear twice in the @code{asm} +parameters, once to specify memory accessed, and once to specify a +base register used by the @code{asm}. You won't normally be wasting a +register by doing this as GCC can use the same register for both +purposes. However, it would be foolish to use both @code{%1} and +@code{%3} for @code{x} in this @code{asm} and expect them to be the +same. In fact, @code{%3} may well not be a register. It might be a +symbolic memory reference to the object pointed to by @code{x}. @smallexample -static void *array[] = @{ &&foo, &&bar, &&hack @}; +asm ("sumsq %0, %1, %2" + : "+f" (result) + : "r" (x), "r" (y), "m" (*x), "m" (*y)); @end smallexample -@noindent -Then you can select a label with indexing, like this: +Here is a fictitious @code{*z++ = *x++ * *y++} instruction. +Notice that the @code{x}, @code{y} and @code{z} pointer registers +must be specified as input/output because the @code{asm} modifies +them. @smallexample -goto *array[i]; +asm ("vecmul %0, %1, %2" + : "+r" (z), "+r" (x), "+r" (y), "=m" (*z) + : "m" (*x), "m" (*y)); @end smallexample -@noindent -Note that this does not check whether the subscript is in bounds---array -indexing in C never does that. - -Such an array of label values serves a purpose much like that of the -@code{switch} statement. The @code{switch} statement is cleaner, so -use that rather than an array unless the problem does not fit a -@code{switch} statement very well. - -Another use of label values is in an interpreter for threaded code. 
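+
+The casts to array types in these examples are what tell GCC how much
+memory the @code{asm} reads or writes. As a sketch under the same
+convention (@code{frobble} is a placeholder mnemonic), an @code{asm}
+that writes a fixed-size buffer can expose exactly those bytes as a
+memory output rather than resorting to a @code{"memory"} clobber:
+
+@smallexample
+void frobble8 (char *p)
+@{
+  /* @r{Only the 8 bytes at @code{p} are described as written.}  */
+  asm ("frobble %1"
+       : "=m" (*(char (*)[8]) p)
+       : "r" (p));
+@}
+@end smallexample
+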
-The labels within the interpreter function can be stored in the -threaded code for super-fast dispatching. - -You may not use this mechanism to jump to code in a different function. -If you do that, totally unpredictable things happen. The best way to -avoid this is to store the label address only in automatic variables and -never pass it as an argument. - -An alternate way to write the above example is +An x86 example where the string memory argument is of unknown length. @smallexample -static const int array[] = @{ &&foo - &&foo, &&bar - &&foo, - &&hack - &&foo @}; -goto *(&&foo + array[i]); +asm("repne scasb" + : "=c" (count), "+D" (p) + : "m" (*(const char (*)[]) p), "0" (-1), "a" (0)); @end smallexample -@noindent -This is more friendly to code living in shared libraries, as it reduces -the number of dynamic relocations that are needed, and by consequence, -allows the data to be read-only. -This alternative with label differences is not supported for the AVR target, -please use the first approach for AVR programs. - -The @code{&&foo} expressions for the same label might have different -values if the containing function is inlined or cloned. If a program -relies on them being always the same, -@code{__attribute__((__noinline__,__noclone__))} should be used to -prevent inlining and cloning. If @code{&&foo} is used in a static -variable initializer, inlining and cloning is forbidden. - -Unlike a normal goto, in GNU C++ a computed goto will not call -destructors for objects that go out of scope. - -@node Nested Functions -@subsection Nested Functions -@cindex nested functions -@cindex downward funargs -@cindex thunks - -A @dfn{nested function} is a function defined inside another function. -Nested functions are supported as an extension in GNU C, but are not -supported by GNU C++. - -The nested function's name is local to the block where it is defined. -For example, here we define a nested function named @code{square}, and -call it twice: - -@smallexample -@group -foo (double a, double b) -@{ - double square (double z) @{ return z * z; @} - - return square (a) + square (b); -@} -@end group -@end smallexample +If you know the above will only be reading a ten byte array then you +could instead use a memory input like: +@code{"m" (*(const char (*)[10]) p)}. -The nested function can access all the variables of the containing -function that are visible at the point of its definition. This is -called @dfn{lexical scoping}. For example, here we show a nested -function which uses an inherited variable named @code{offset}: +Here is an example of a PowerPC vector scale implemented in assembly, +complete with vector and condition code clobbers, and some initialized +offset registers that are unchanged by the @code{asm}. 
@smallexample -@group -bar (int *array, int offset, int size) +void +dscal (size_t n, double *x, double alpha) @{ - int access (int *array, int index) - @{ return array[index + offset]; @} - int i; - /* @r{@dots{}} */ - for (i = 0; i < size; i++) - /* @r{@dots{}} */ access (array, i) /* @r{@dots{}} */ + asm ("/* lots of asm here */" + : "+m" (*(double (*)[n]) x), "+&r" (n), "+b" (x) + : "d" (alpha), "b" (32), "b" (48), "b" (64), + "b" (80), "b" (96), "b" (112) + : "cr0", + "vs32","vs33","vs34","vs35","vs36","vs37","vs38","vs39", + "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"); @} -@end group @end smallexample -Nested function definitions are permitted within functions in the places -where variable definitions are allowed; that is, in any block, mixed -with the other declarations and statements in the block. - -It is possible to call the nested function from outside the scope of its -name by storing its address or passing the address to another function: +Rather than allocating fixed registers via clobbers to provide scratch +registers for an @code{asm} statement, an alternative is to define a +variable and make it an early-clobber output as with @code{a2} and +@code{a3} in the example below. This gives the compiler register +allocator more freedom. You can also define a variable and make it an +output tied to an input as with @code{a0} and @code{a1}, tied +respectively to @code{ap} and @code{lda}. Of course, with tied +outputs your @code{asm} can't use the input value after modifying the +output register since they are one and the same register. What's +more, if you omit the early-clobber on the output, it is possible that +GCC might allocate the same register to another of the inputs if GCC +could prove they had the same value on entry to the @code{asm}. This +is why @code{a1} has an early-clobber. Its tied input, @code{lda} +might conceivably be known to have the value 16 and without an +early-clobber share the same register as @code{%11}. On the other +hand, @code{ap} can't be the same as any of the other inputs, so an +early-clobber on @code{a0} is not needed. It is also not desirable in +this case. An early-clobber on @code{a0} would cause GCC to allocate +a separate register for the @code{"m" (*(const double (*)[]) ap)} +input. Note that tying an input to an output is the way to set up an +initialized temporary register modified by an @code{asm} statement. +An input not tied to an output is assumed by GCC to be unchanged, for +example @code{"b" (16)} below sets up @code{%11} to 16, and GCC might +use that register in following code if the value 16 happened to be +needed. You can even use a normal @code{asm} output for a scratch if +all inputs that might share the same register are consumed before the +scratch is used. The VSX registers clobbered by the @code{asm} +statement could have used this technique except for GCC's limit on the +number of @code{asm} parameters. @smallexample -hack (int *array, int size) +static void +dgemv_kernel_4x4 (long n, const double *ap, long lda, + const double *x, double *y, double alpha) @{ - void store (int index, int value) - @{ array[index] = value; @} - - intermediate (store, size); -@} -@end smallexample - -Here, the function @code{intermediate} receives the address of -@code{store} as an argument. If @code{intermediate} calls @code{store}, -the arguments given to @code{store} are used to store into @code{array}. -But this technique works only so long as the containing function -(@code{hack}, in this example) does not exit. 
+ double *a0; + double *a1; + double *a2; + double *a3; -If you try to call the nested function through its address after the -containing function exits, all hell breaks loose. If you try -to call it after a containing scope level exits, and if it refers -to some of the variables that are no longer in scope, you may be lucky, -but it's not wise to take the risk. If, however, the nested function -does not refer to anything that has gone out of scope, you should be -safe. - -GCC implements taking the address of a nested function using a technique -called @dfn{trampolines}. This technique was described in -@cite{Lexical Closures for C++} (Thomas M. Breuel, USENIX -C++ Conference Proceedings, October 17-21, 1988). - -A nested function can jump to a label inherited from a containing -function, provided the label is explicitly declared in the containing -function (@pxref{Local Labels}). Such a jump returns instantly to the -containing function, exiting the nested function that did the -@code{goto} and any intermediate functions as well. Here is an example: - -@smallexample -@group -bar (int *array, int offset, int size) -@{ - __label__ failure; - int access (int *array, int index) - @{ - if (index > size) - goto failure; - return array[index + offset]; - @} - int i; - /* @r{@dots{}} */ - for (i = 0; i < size; i++) - /* @r{@dots{}} */ access (array, i) /* @r{@dots{}} */ - /* @r{@dots{}} */ - return 0; - - /* @r{Control comes here from @code{access} - if it detects an error.} */ - failure: - return -1; + __asm__ + ( + /* lots of asm here */ + "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n" + "#a0=%3 a1=%4 a2=%5 a3=%6" + : + "+m" (*(double (*)[n]) y), + "+&r" (n), // 1 + "+b" (y), // 2 + "=b" (a0), // 3 + "=&b" (a1), // 4 + "=&b" (a2), // 5 + "=&b" (a3) // 6 + : + "m" (*(const double (*)[n]) x), + "m" (*(const double (*)[]) ap), + "d" (alpha), // 9 + "r" (x), // 10 + "b" (16), // 11 + "3" (ap), // 12 + "4" (lda) // 13 + : + "cr0", + "vs32","vs33","vs34","vs35","vs36","vs37", + "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47" + ); @} -@end group @end smallexample -A nested function always has no linkage. Declaring one with -@code{extern} or @code{static} is erroneous. If you need to declare the nested function -before its definition, use @code{auto} (which is otherwise meaningless -for function declarations). +@anchor{GotoLabels} +@subsubsection Goto Labels +@cindex @code{asm} goto labels -@smallexample -bar (int *array, int offset, int size) -@{ - __label__ failure; - auto int access (int *, int); - /* @r{@dots{}} */ - int access (int *array, int index) - @{ - if (index > size) - goto failure; - return array[index + offset]; - @} - /* @r{@dots{}} */ -@} -@end smallexample +@code{asm goto} allows assembly code to jump to one or more C labels. The +@var{GotoLabels} section in an @code{asm goto} statement contains +a comma-separated +list of all C labels to which the assembler code may jump. GCC assumes that +@code{asm} execution falls through to the next statement (if this is not the +case, consider using the @code{__builtin_unreachable} intrinsic after the +@code{asm} statement). Optimization of @code{asm goto} may be improved by +using the @code{hot} and @code{cold} label attributes (@pxref{Label +Attributes}). 
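+
+As a minimal sketch of the syntax only (the empty template cannot
+actually branch, so this particular @code{asm goto} always falls
+through):
+
+@smallexample
+asm goto ("" : : : : fallback);
+return 0;
+fallback:
+return 1;
+@end smallexample
+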
-@node Typeof -@subsection Referring to a Type with @code{typeof} -@findex typeof -@findex sizeof -@cindex macros, types of arguments +If the assembler code does modify anything, use the @code{"memory"} clobber +to force the +optimizers to flush all register values to memory and reload them if +necessary after the @code{asm} statement. -Another way to refer to the type of an expression is with @code{typeof}. -The syntax of using of this keyword looks like @code{sizeof}, but the -construct acts semantically like a type name defined with @code{typedef}. +Also note that an @code{asm goto} statement is always implicitly +considered volatile. -There are two ways of writing the argument to @code{typeof}: with an -expression or with a type. Here is an example with an expression: +Be careful when you set output operands inside @code{asm goto} only on +some possible control flow paths. If you don't set up the output on +given path and never use it on this path, it is okay. Otherwise, you +should use @samp{+} constraint modifier meaning that the operand is +input and output one. With this modifier you will have the correct +values on all possible paths from the @code{asm goto}. -@smallexample -typeof (x[0](1)) -@end smallexample +To reference a label in the assembler template, prefix it with +@samp{%l} (lowercase @samp{L}) followed by its (zero-based) position +in @var{GotoLabels} plus the number of input and output operands. +Output operand with constraint modifier @samp{+} is counted as two +operands because it is considered as one output and one input operand. +For example, if the @code{asm} has three inputs, one output operand +with constraint modifier @samp{+} and one output operand with +constraint modifier @samp{=} and references two labels, refer to the +first label as @samp{%l6} and the second as @samp{%l7}). -@noindent -This assumes that @code{x} is an array of pointers to functions; -the type described is that of the values of the functions. +Alternately, you can reference labels using the actual C label name +enclosed in brackets. For example, to reference a label named +@code{carry}, you can use @samp{%l[carry]}. The label must still be +listed in the @var{GotoLabels} section when using this approach. It +is better to use the named references for labels as in this case you +can avoid counting input and output operands and special treatment of +output operands with constraint modifier @samp{+}. -Here is an example with a typename as the argument: +Here is an example of @code{asm goto} for i386: -@smallexample -typeof (int *) -@end smallexample +@example +asm goto ( + "btl %1, %0\n\t" + "jc %l2" + : /* No outputs. */ + : "r" (p1), "r" (p2) + : "cc" + : carry); -@noindent -Here the type described is that of pointers to @code{int}. +return 0; -If you are writing a header file that must work when included in ISO C -programs, write @code{__typeof__} instead of @code{typeof}. -@xref{Alternate Keywords}. +carry: +return 1; +@end example -A @code{typeof} construct can be used anywhere a typedef name can be -used. For example, you can use it in a declaration, in a cast, or inside -of @code{sizeof} or @code{typeof}. +The following example shows an @code{asm goto} that uses a memory clobber. -The operand of @code{typeof} is evaluated for its side effects if and -only if it is an expression of variably modified type or the name of -such a type. +@example +int frob(int x) +@{ + int y; + asm goto ("frob %%r5, %1; jc %l[error]; mov (%2), %%r5" + : /* No outputs. 
*/ + : "r"(x), "r"(&y) + : "r5", "memory" + : error); + return y; +error: + return -1; +@} +@end example -@code{typeof} is often useful in conjunction with -statement expressions (@pxref{Statement Exprs}). -Here is how the two together can -be used to define a safe ``maximum'' macro which operates on any -arithmetic type and evaluates each of its arguments exactly once: +The following example shows an @code{asm goto} that uses an output. -@smallexample -#define max(a,b) \ - (@{ typeof (a) _a = (a); \ - typeof (b) _b = (b); \ - _a > _b ? _a : _b; @}) -@end smallexample +@example +int foo(int count) +@{ + asm goto ("dec %0; jb %l[stop]" + : "+r" (count) + : + : + : stop); + return count; +stop: + return 0; +@} +@end example -@cindex underscores in variables in macros -@cindex @samp{_} in variables in macros -@cindex local variables in macros -@cindex variables, local, in macros -@cindex macros, local variables in +The following artificial example shows an @code{asm goto} that sets +up an output only on one path inside the @code{asm goto}. Usage of +constraint modifier @samp{=} instead of @samp{+} would be wrong as +@code{factor} is used on all paths from the @code{asm goto}. -The reason for using names that start with underscores for the local -variables is to avoid conflicts with variable names that occur within the -expressions that are substituted for @code{a} and @code{b}. Eventually we -hope to design a new form of declaration syntax that allows you to declare -variables whose scopes start only after their initializers; this will be a -more reliable way to prevent such conflicts. +@example +int foo(int inp) +@{ + int factor = 0; + asm goto ("cmp %1, 10; jb %l[lab]; mov 2, %0" + : "+r" (factor) + : "r" (inp) + : + : lab); +lab: + return inp * factor; /* return 2 * inp or 0 if inp < 10 */ +@} +@end example +@anchor{GenericOperandmodifiers} +@subsubsection Generic Operand Modifiers @noindent -Some more examples of the use of @code{typeof}: - -@itemize @bullet -@item -This declares @code{y} with the type of what @code{x} points to. - -@smallexample -typeof (*x) y; -@end smallexample - -@item -This declares @code{y} as an array of such values. +The following table shows the modifiers supported by all targets and their effects: -@smallexample -typeof (*x) y[4]; -@end smallexample - -@item -This declares @code{y} as an array of pointers to characters: - -@smallexample -typeof (typeof (char *)[4]) y; -@end smallexample - -@noindent -It is equivalent to the following traditional C declaration: - -@smallexample -char *y[4]; -@end smallexample - -To see the meaning of the declaration using @code{typeof}, and why it -might be a useful way to write, rewrite it with these macros: - -@smallexample -#define pointer(T) typeof(T *) -#define array(T, N) typeof(T [N]) -@end smallexample - -@noindent -Now the declaration can be rewritten this way: - -@smallexample -array (pointer (char), 4) y; -@end smallexample - -@noindent -Thus, @code{array (pointer (char), 4)} is the type of arrays of 4 -pointers to @code{char}. -@end itemize - -The ISO C23 operator @code{typeof_unqual} is available in ISO C23 mode -and its result is the non-atomic unqualified version of what @code{typeof} -operator returns. Alternate spelling @code{__typeof_unqual__} is -available in all C modes and provides non-atomic unqualified version of -what @code{__typeof__} operator returns. -@xref{Alternate Keywords}. 
- -@cindex @code{__auto_type} in GNU C -In GNU C, but not GNU C++, you may also declare the type of a variable -as @code{__auto_type}. In that case, the declaration must declare -only one variable, whose declarator must just be an identifier, the -declaration must be initialized, and the type of the variable is -determined by the initializer; the name of the variable is not in -scope until after the initializer. (In C++, you should use C++11 -@code{auto} for this purpose.) Using @code{__auto_type}, the -``maximum'' macro above could be written as: - -@smallexample -#define max(a,b) \ - (@{ __auto_type _a = (a); \ - __auto_type _b = (b); \ - _a > _b ? _a : _b; @}) -@end smallexample - -Using @code{__auto_type} instead of @code{typeof} has two advantages: - -@itemize @bullet -@item Each argument to the macro appears only once in the expansion of -the macro. This prevents the size of the macro expansion growing -exponentially when calls to such macros are nested inside arguments of -such macros. - -@item If the argument to the macro has variably modified type, it is -evaluated only once when using @code{__auto_type}, but twice if -@code{typeof} is used. -@end itemize - -@node Offsetof -@subsection Support for @code{offsetof} -@findex __builtin_offsetof - -GCC implements for both C and C++ a syntactic extension to implement -the @code{offsetof} macro. - -@smallexample -primary: - "__builtin_offsetof" "(" @code{typename} "," offsetof_member_designator ")" - -offsetof_member_designator: - @code{identifier} - | offsetof_member_designator "." @code{identifier} - | offsetof_member_designator "[" @code{expr} "]" -@end smallexample - -This extension is sufficient such that - -@smallexample -#define offsetof(@var{type}, @var{member}) __builtin_offsetof (@var{type}, @var{member}) -@end smallexample - -@noindent -is a suitable definition of the @code{offsetof} macro. In C++, @var{type} -may be dependent. In either case, @var{member} may consist of a single -identifier, or a sequence of member accesses and array references. +@multitable @columnfractions 0.15 0.7 0.15 +@headitem Modifier @tab Description @tab Example +@item @code{c} +@tab Require a constant operand and print the constant expression with no punctuation. +@tab @code{%c0} +@item @code{cc} +@tab Like @samp{%c} except try harder to print it with no punctuation. +@samp{%c} can e.g.@: fail to print constant addresses in position independent code on +some architectures. +@tab @code{%cc0} +@item @code{n} +@tab Like @samp{%c} except that the value of the constant is negated before printing. +@tab @code{%n0} +@item @code{a} +@tab Substitute a memory reference, with the actual operand treated as the address. +This may be useful when outputting a ``load address'' instruction, because +often the assembler syntax for such an instruction requires you to write the +operand as if it were a memory reference. +@tab @code{%a0} +@item @code{l} +@tab Print the label name with no punctuation. +@tab @code{%l0} +@end multitable -@node Alignment -@subsection Determining the Alignment of Functions, Types or Variables -@cindex alignment -@cindex type alignment -@cindex variable alignment +@anchor{aarch64Operandmodifiers} +@subsubsection AArch64 Operand Modifiers -The keyword @code{__alignof__} determines the alignment requirement of -a function, object, or a type, or the minimum alignment usually required -by a type. Its syntax is just like @code{sizeof} and C11 @code{_Alignof}. 
+The following table shows the modifiers supported by AArch64 and their effects: -For example, if the target machine requires a @code{double} value to be -aligned on an 8-byte boundary, then @code{__alignof__ (double)} is 8. -This is true on many RISC machines. On more traditional machine -designs, @code{__alignof__ (double)} is 4 or even 2. +@multitable @columnfractions .10 .90 +@headitem Modifier @tab Description +@item @code{w} @tab Print a 32-bit general-purpose register name or, given a +constant zero operand, the 32-bit zero register (@code{wzr}). +@item @code{x} @tab Print a 64-bit general-purpose register name or, given a +constant zero operand, the 64-bit zero register (@code{xzr}). +@item @code{b} @tab Print an FP/SIMD register name with a @code{b} (byte, 8-bit) +prefix. +@item @code{h} @tab Print an FP/SIMD register name with an @code{h} (halfword, +16-bit) prefix. +@item @code{s} @tab Print an FP/SIMD register name with an @code{s} (single +word, 32-bit) prefix. +@item @code{d} @tab Print an FP/SIMD register name with a @code{d} (doubleword, +64-bit) prefix. +@item @code{q} @tab Print an FP/SIMD register name with a @code{q} (quadword, +128-bit) prefix. +@item @code{Z} @tab Print an FP/SIMD register name as an SVE register (i.e. with +a @code{z} prefix). This is a no-op for SVE register operands. +@end multitable -Some machines never actually require alignment; they allow references to any -data type even at an odd address. For these machines, @code{__alignof__} -reports the smallest alignment that GCC gives the data type, usually as -mandated by the target ABI. +@anchor{x86Operandmodifiers} +@subsubsection x86 Operand Modifiers -If the operand of @code{__alignof__} is an lvalue rather than a type, -its value is the required alignment for its type, taking into account -any minimum alignment specified by attribute @code{aligned} -(@pxref{Common Variable Attributes}). For example, after this -declaration: +References to input, output, and goto operands in the assembler template +of extended @code{asm} statements can use +modifiers to affect the way the operands are formatted in +the code output to the assembler. For example, the +following code uses the @samp{h} and @samp{b} modifiers for x86: -@smallexample -struct foo @{ int x; char y; @} foo1; -@end smallexample +@example +uint16_t num; +asm volatile ("xchg %h0, %b0" : "+a" (num) ); +@end example @noindent -the value of @code{__alignof__ (foo1.y)} is 1, even though its actual -alignment is probably 2 or 4, the same as @code{__alignof__ (int)}. -It is an error to ask for the alignment of an incomplete type other -than @code{void}. - -If the operand of the @code{__alignof__} expression is a function, -the expression evaluates to the alignment of the function which may -be specified by attribute @code{aligned} (@pxref{Common Function Attributes}). +These modifiers generate this assembler code: -@node Incomplete Enums -@subsection Incomplete @code{enum} Types +@example +xchg %ah, %al +@end example -You can define an @code{enum} tag without specifying its possible values. -This results in an incomplete type, much like what you get if you write -@code{struct foo} without describing the elements. A later declaration -that does specify the possible values completes the type. +The rest of this discussion uses the following code for illustrative purposes. -You cannot allocate variables or storage using the type while it is -incomplete. However, you can work with pointers to that type. 
+@example +int main() +@{ + int iInt = 1; -This extension may not be very useful, but it makes the handling of -@code{enum} more consistent with the way @code{struct} and @code{union} -are handled. +top: -This extension is not supported by GNU C++. + asm volatile goto ("some assembler instructions here" + : /* No outputs. */ + : "q" (iInt), "X" (sizeof(unsigned char) + 1), "i" (42) + : /* No clobbers. */ + : top); +@} +@end example -@node Variadic Macros -@subsection Macros with a Variable Number of Arguments. -@cindex variable number of arguments -@cindex macro with variable arguments -@cindex rest argument (in macro) -@cindex variadic macros +With no modifiers, this is what the output from the operands would be +for the @samp{att} and @samp{intel} dialects of assembler: -In the ISO C standard of 1999, a macro can be declared to accept a -variable number of arguments much as a function can. The syntax for -defining the macro is similar to that of a function. Here is an -example: +@multitable {Operand} {$.L2} {OFFSET FLAT:.L2} +@headitem Operand @tab @samp{att} @tab @samp{intel} +@item @code{%0} +@tab @code{%eax} +@tab @code{eax} +@item @code{%1} +@tab @code{$2} +@tab @code{2} +@item @code{%3} +@tab @code{$.L3} +@tab @code{OFFSET FLAT:.L3} +@item @code{%4} +@tab @code{$8} +@tab @code{8} +@item @code{%5} +@tab @code{%xmm0} +@tab @code{xmm0} +@item @code{%7} +@tab @code{$0} +@tab @code{0} +@end multitable -@smallexample -#define debug(format, ...) fprintf (stderr, format, __VA_ARGS__) -@end smallexample +The table below shows the list of supported modifiers and their effects. -@noindent -Here @samp{@dots{}} is a @dfn{variable argument}. In the invocation of -such a macro, it represents the zero or more tokens until the closing -parenthesis that ends the invocation, including any commas. This set of -tokens replaces the identifier @code{__VA_ARGS__} in the macro body -wherever it appears. See the CPP manual for more information. +@multitable {Modifier} {Print the opcode suffix for the size of th} {Operand} {@samp{att}} {@samp{intel}} +@headitem Modifier @tab Description @tab Operand @tab @samp{att} @tab @samp{intel} +@item @code{A} +@tab Print an absolute memory reference. +@tab @code{%A0} +@tab @code{*%rax} +@tab @code{rax} +@item @code{b} +@tab Print the QImode name of the register. +@tab @code{%b0} +@tab @code{%al} +@tab @code{al} +@item @code{B} +@tab print the opcode suffix of b. +@tab @code{%B0} +@tab @code{b} +@tab +@item @code{c} +@tab Require a constant operand and print the constant expression with no punctuation. +@tab @code{%c1} +@tab @code{2} +@tab @code{2} +@item @code{d} +@tab print duplicated register operand for AVX instruction. +@tab @code{%d5} +@tab @code{%xmm0, %xmm0} +@tab @code{xmm0, xmm0} +@item @code{E} +@tab Print the address in Double Integer (DImode) mode (8 bytes) when the target is 64-bit. +Otherwise mode is unspecified (VOIDmode). +@tab @code{%E1} +@tab @code{%(rax)} +@tab @code{[rax]} +@item @code{g} +@tab Print the V16SFmode name of the register. +@tab @code{%g0} +@tab @code{%zmm0} +@tab @code{zmm0} +@item @code{h} +@tab Print the QImode name for a ``high'' register. +@tab @code{%h0} +@tab @code{%ah} +@tab @code{ah} +@item @code{H} +@tab Add 8 bytes to an offsettable memory reference. Useful when accessing the +high 8 bytes of SSE values. For a memref in (%rax), it generates +@tab @code{%H0} +@tab @code{8(%rax)} +@tab @code{8[rax]} +@item @code{k} +@tab Print the SImode name of the register. 
+@tab @code{%k0}
+@tab @code{%eax}
+@tab @code{eax}
+@item @code{l}
+@tab Print the label name with no punctuation.
+@tab @code{%l3}
+@tab @code{.L3}
+@tab @code{.L3}
+@item @code{L}
+@tab Print the opcode suffix of l.
+@tab @code{%L0}
+@tab @code{l}
+@tab
+@item @code{N}
+@tab Print maskz.
+@tab @code{%N7}
+@tab @code{@{z@}}
+@tab @code{@{z@}}
+@item @code{p}
+@tab Print raw symbol name (without syntax-specific prefixes).
+@tab @code{%p2}
+@tab @code{42}
+@tab @code{42}
+@item @code{P}
+@tab If used for a function, print the PLT suffix and generate PIC code.
+For example, emit @code{foo@@PLT} instead of @code{foo} for the function
+@code{foo()}. If used for a constant, drop all syntax-specific prefixes and
+issue the bare constant. See @code{p} above.
+@item @code{q}
+@tab Print the DImode name of the register.
+@tab @code{%q0}
+@tab @code{%rax}
+@tab @code{rax}
+@item @code{Q}
+@tab Print the opcode suffix of q.
+@tab @code{%Q0}
+@tab @code{q}
+@tab
+@item @code{R}
+@tab Print embedded rounding and SAE.
+@tab @code{%R4}
+@tab @code{@{rn-sae@}, }
+@tab @code{, @{rn-sae@}}
+@item @code{r}
+@tab Print only SAE.
+@tab @code{%r4}
+@tab @code{@{sae@}, }
+@tab @code{, @{sae@}}
+@item @code{s}
+@tab Print a shift double count, followed by the assembler's argument
+delimiter.
+@tab @code{%s1}
+@tab @code{$2, }
+@tab @code{2, }
+@item @code{S}
+@tab Print the opcode suffix of s.
+@tab @code{%S0}
+@tab @code{s}
+@tab
+@item @code{t}
+@tab Print the V8SFmode name of the register.
+@tab @code{%t5}
+@tab @code{%ymm0}
+@tab @code{ymm0}
+@item @code{T}
+@tab Print the opcode suffix of t.
+@tab @code{%T0}
+@tab @code{t}
+@tab
+@item @code{V}
+@tab Print the full integer register name without the @samp{%} prefix.
+@tab @code{%V0}
+@tab @code{eax}
+@tab @code{eax}
+@item @code{w}
+@tab Print the HImode name of the register.
+@tab @code{%w0}
+@tab @code{%ax}
+@tab @code{ax}
+@item @code{W}
+@tab Print the opcode suffix of w.
+@tab @code{%W0}
+@tab @code{w}
+@tab
+@item @code{x}
+@tab Print the V4SFmode name of the register.
+@tab @code{%x5}
+@tab @code{%xmm0}
+@tab @code{xmm0}
+@item @code{y}
+@tab Print @code{st(0)} instead of @code{st} as a register.
+@tab @code{%y6}
+@tab @code{%st(0)}
+@tab @code{st(0)}
+@item @code{z}
+@tab Print the opcode suffix for the size of the current integer operand (one of @code{b}/@code{w}/@code{l}/@code{q}).
+@tab @code{%z0}
+@tab @code{l}
+@tab
+@item @code{Z}
+@tab Like @code{z}, with special suffixes for x87 instructions.
+@end multitable

+@anchor{x86floatingpointasmoperands}
+@subsubsection x86 Floating-Point @code{asm} Operands

+On x86 targets, there are several rules on the usage of stack-like registers
+in the operands of an @code{asm}. These rules apply only to the operands
+that are stack-like registers:

+@enumerate
+@item
+Given a set of input registers that die in an @code{asm}, it is
+necessary to know which are implicitly popped by the @code{asm}, and
+which must be explicitly popped by GCC@.
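+
+For instance, as a sketch (with @code{int i} and @code{double f}):
+@code{fistpl} implicitly pops its @code{st(0)} input, so one
+conventional way to express this constrains the input to the top of
+the stack with @samp{t} and lists @code{"st"} as a clobber to account
+for the pop:
+
+@smallexample
+asm ("fistpl %0" : "=m" (i) : "t" (f) : "st");
+@end smallexample
+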
-In standard C, you are not allowed to leave the variable argument out -entirely; but you are allowed to pass an empty argument. For example, -this invocation is invalid in ISO C, because there is no comma after -the string: +An input register that is implicitly popped by the @code{asm} must be +explicitly clobbered, unless it is constrained to match an +output operand. -@smallexample -debug ("A message") -@end smallexample +@item +For any input register that is implicitly popped by an @code{asm}, it is +necessary to know how to adjust the stack to compensate for the pop. +If any non-popped input is closer to the top of the reg-stack than +the implicitly popped register, it would not be possible to know what the +stack looked like---it's not clear how the rest of the stack ``slides +up''. -GNU CPP permits you to completely omit the variable arguments in this -way. In the above examples, the compiler would complain, though since -the expansion of the macro still has the extra comma after the format -string. +All implicitly popped input registers must be closer to the top of +the reg-stack than any input that is not implicitly popped. -To help solve this problem, CPP behaves specially for variable arguments -used with the token paste operator, @samp{##}. If instead you write +It is possible that if an input dies in an @code{asm}, the compiler might +use the input register for an output reload. Consider this example: @smallexample -#define debug(format, ...) fprintf (stderr, format, ## __VA_ARGS__) +asm ("foo" : "=t" (a) : "f" (b)); @end smallexample @noindent -and if the variable arguments are omitted or empty, the @samp{##} -operator causes the preprocessor to remove the comma before it. If you -do provide some variable arguments in your macro invocation, GNU CPP -does not complain about the paste operation and instead places the -variable arguments after the comma. Just like any other pasted macro -argument, these arguments are not macro expanded. - -@node Conditionals -@subsection Conditionals with Omitted Operands -@cindex conditional expressions, extensions -@cindex omitted middle-operands -@cindex middle-operands, omitted -@cindex extensions, @code{?:} -@cindex @code{?:} extensions +This code says that input @code{b} is not popped by the @code{asm}, and that +the @code{asm} pushes a result onto the reg-stack, i.e., the stack is one +deeper after the @code{asm} than it was before. But, it is possible that +reload may think that it can use the same register for both the input and +the output. -The middle operand in a conditional expression may be omitted. Then -if the first operand is nonzero, its value is the value of the conditional -expression. +To prevent this from happening, +if any input operand uses the @samp{f} constraint, all output register +constraints must use the @samp{&} early-clobber modifier. -Therefore, the expression +The example above is correctly written as: @smallexample -x ? : y +asm ("foo" : "=&t" (a) : "f" (b)); @end smallexample -@noindent -has the value of @code{x} if that is nonzero; otherwise, the value of -@code{y}. +@item +Some operands need to be in particular places on the stack. All +output operands fall in this category---GCC has no other way to +know which registers the outputs appear in unless you indicate +this in the constraints. -This example is perfectly equivalent to +Output operands must specifically indicate which register an output +appears in after an @code{asm}. 
@samp{=f} is not allowed: the operand +constraints must select a class with a single register. -@smallexample -x ? x : y -@end smallexample +@item +Output operands may not be ``inserted'' between existing stack registers. +Since no 387 opcode uses a read/write operand, all output operands +are dead before the @code{asm}, and are pushed by the @code{asm}. +It makes no sense to push anywhere but the top of the reg-stack. -@cindex side effect in @code{?:} -@cindex @code{?:} side effect -@noindent -In this simple case, the ability to omit the middle operand is not -especially useful. When it becomes useful is when the first operand does, -or may (if it is a macro argument), contain a side effect. Then repeating -the operand in the middle would perform the side effect twice. Omitting -the middle operand uses the value already computed without the undesirable -effects of recomputing it. - -@node Case Ranges -@subsection Case Ranges -@cindex case ranges -@cindex ranges in case statements - -You can specify a range of consecutive values in a single @code{case} label, -like this: - -@smallexample -case @var{low} ... @var{high}: -@end smallexample - -@noindent -This has the same effect as the proper number of individual @code{case} -labels, one for each integer value from @var{low} to @var{high}, inclusive. +Output operands must start at the top of the reg-stack: output +operands may not ``skip'' a register. -This feature is especially useful for ranges of ASCII character codes: +@item +Some @code{asm} statements may need extra stack space for internal +calculations. This can be guaranteed by clobbering stack registers +unrelated to the inputs and outputs. -@smallexample -case 'A' ... 'Z': -@end smallexample +@end enumerate -@strong{Be careful:} Write spaces around the @code{...}, for otherwise -it may be parsed wrong when you use it with integer values. For example, -write this: +This @code{asm} +takes one input, which is internally popped, and produces two outputs. @smallexample -case 1 ... 5: +asm ("fsincos" : "=t" (cos), "=u" (sin) : "0" (inp)); @end smallexample @noindent -rather than this: +This @code{asm} takes two inputs, which are popped by the @code{fyl2xp1} opcode, +and replaces them with one output. The @code{st(1)} clobber is necessary +for the compiler to know that @code{fyl2xp1} pops both inputs. @smallexample -case 1...5: +asm ("fyl2xp1" : "=t" (result) : "0" (x), "u" (y) : "st(1)"); @end smallexample -@node Mixed Labels and Declarations -@subsection Mixed Declarations, Labels and Code -@cindex mixed declarations and code -@cindex declarations, mixed with code -@cindex code, mixed with declarations - -ISO C99 and ISO C++ allow declarations and code to be freely mixed -within compound statements. ISO C23 allows labels to be -placed before declarations and at the end of a compound statement. -As an extension, GNU C also allows all this in C90 mode. For example, -you could do: +@anchor{msp430Operandmodifiers} +@subsubsection MSP430 Operand Modifiers -@smallexample -int i; -/* @r{@dots{}} */ -i++; -int j = i + 2; -@end smallexample +The list below describes the supported modifiers and their effects for MSP430. -Each identifier is visible from where it is declared until the end of -the enclosing block. +@multitable @columnfractions .10 .90 +@headitem Modifier @tab Description +@item @code{A} @tab Select low 16-bits of the constant/register/memory operand. +@item @code{B} @tab Select high 16-bits of the constant/register/memory +operand. 
+@item @code{C} @tab Select bits 32-47 of the constant/register/memory operand. +@item @code{D} @tab Select bits 48-63 of the constant/register/memory operand. +@item @code{H} @tab Equivalent to @code{B} (for backwards compatibility). +@item @code{I} @tab Print the inverse (logical @code{NOT}) of the constant +value. +@item @code{J} @tab Print an integer without a @code{#} prefix. +@item @code{L} @tab Equivalent to @code{A} (for backwards compatibility). +@item @code{O} @tab Offset of the current frame from the top of the stack. +@item @code{Q} @tab Use the @code{A} instruction postfix. +@item @code{R} @tab Inverse of condition code, for unsigned comparisons. +@item @code{W} @tab Subtract 16 from the constant value. +@item @code{X} @tab Use the @code{X} instruction postfix. +@item @code{Y} @tab Subtract 4 from the constant value. +@item @code{Z} @tab Subtract 1 from the constant value. +@item @code{b} @tab Append @code{.B}, @code{.W} or @code{.A} to the +instruction, depending on the mode. +@item @code{d} @tab Offset 1 byte of a memory reference or constant value. +@item @code{e} @tab Offset 3 bytes of a memory reference or constant value. +@item @code{f} @tab Offset 5 bytes of a memory reference or constant value. +@item @code{g} @tab Offset 7 bytes of a memory reference or constant value. +@item @code{p} @tab Print the value of 2, raised to the power of the given +constant. Used to select the specified bit position. +@item @code{r} @tab Inverse of condition code, for signed comparisons. +@item @code{x} @tab Equivalent to @code{X}, but only for pointers. +@end multitable -@node C++ Comments -@subsection C++ Style Comments -@cindex @code{//} -@cindex C++ comments -@cindex comments, C++ style +@anchor{loongarchOperandmodifiers} +@subsubsection LoongArch Operand Modifiers -In GNU C, you may use C++ style comments, which start with @samp{//} and -continue until the end of the line. Many other C implementations allow -such comments, and they are included in the 1999 C standard. However, -C++ style comments are not recognized if you specify an @option{-std} -option specifying a version of ISO C before C99, or @option{-ansi} -(equivalent to @option{-std=c90}). +The list below describes the supported modifiers and their effects for LoongArch. -@node Escaped Newlines -@subsection Slightly Looser Rules for Escaped Newlines -@cindex escaped newlines -@cindex newlines (escaped) +@multitable @columnfractions .10 .90 +@headitem Modifier @tab Description +@item @code{d} @tab Same as @code{c}. +@item @code{i} @tab Print the character ''@code{i}'' if the operand is not a register. +@item @code{m} @tab Same as @code{c}, but the printed value is @code{operand - 1}. +@item @code{u} @tab Print a LASX register. +@item @code{w} @tab Print a LSX register. +@item @code{X} @tab Print a constant integer operand in hexadecimal. +@item @code{z} @tab Print the operand in its unmodified form, followed by a comma. +@end multitable -The preprocessor treatment of escaped newlines is more relaxed -than that specified by the C90 standard, which requires the newline -to immediately follow a backslash. -GCC's implementation allows whitespace in the form -of spaces, horizontal and vertical tabs, and form feeds between the -backslash and the subsequent newline. The preprocessor issues a -warning, but treats it as a valid escaped newline and combines the two -lines to form a single logical line. This works within comments and -tokens, as well as between tokens. 
Comments are @emph{not} treated as
-whitespace for the purposes of this relaxation, since they have not
-yet been replaced with spaces.
+References to input and output operands in the assembler template of extended
+asm statements can use modifiers to affect the way the operands are formatted
+in the code output to the assembler. For example, the following code uses the
+'w' modifier for LoongArch:

-@node Hex Floats
-@subsection Hex Floats
-@cindex hex floats
+@example
+test-asm.c:

-ISO C99 and ISO C++17 support floating-point numbers written not only in
-the usual decimal notation, such as @code{1.55e1}, but also numbers such as
-@code{0x1.fp3} written in hexadecimal format. As a GNU extension, GCC
-supports this in C90 mode (except in some cases when strictly
-conforming) and in C++98, C++11 and C++14 modes. In that format the
-@samp{0x} hex introducer and the @samp{p} or @samp{P} exponent field are
-mandatory. The exponent is a decimal number that indicates the power of
-2 by which the significant part is multiplied. Thus @samp{0x1.f} is
-@tex
-$1 {15\over16}$,
-@end tex
-@ifnottex
-1 15/16,
-@end ifnottex
-@samp{p3} multiplies it by 8, and the value of @code{0x1.fp3}
-is the same as @code{1.55e1}.
+#include <lsxintrin.h>

-Unlike for floating-point numbers in the decimal notation the exponent
-is always required in the hexadecimal notation. Otherwise the compiler
-would not be able to resolve the ambiguity of, e.g., @code{0x1.f}. This
-could mean @code{1.0f} or @code{1.9375} since @samp{f} is also the
-extension for floating-point constants of type @code{float}.
+__m128i foo (void)
+@{
+__m128i a,b,c;
+__asm__ ("vadd.d %w0,%w1,%w2\n\t"
+ :"=f" (c)
+ :"f" (a),"f" (b));

-@node Binary constants
-@subsection Binary Constants using the @samp{0b} Prefix
-@cindex Binary constants using the @samp{0b} prefix
+return c;
+@}

-Integer constants can be written as binary constants, consisting of a
-sequence of @samp{0} and @samp{1} digits, prefixed by @samp{0b} or
-@samp{0B}. This is particularly useful in environments that operate a
-lot on the bit level (like microcontrollers).
+@end example

-The following statements are identical:
+@noindent
+The compile command for the test case is as follows:

-@smallexample
-i = 42;
-i = 0x2a;
-i = 052;
-i = 0b101010;
-@end smallexample
+@example
+gcc test-asm.c -mlsx -S -o test-asm.s
+@end example

-The type of these constants follows the same rules as for octal or
-hexadecimal integer constants, so suffixes like @samp{L} or @samp{UL}
-can be applied.
+@noindent
+The assembly statement produces the following assembly code:

-@node Dollar Signs
-@subsection Dollar Signs in Identifier Names
-@cindex $
-@cindex dollar signs in identifier names
-@cindex identifier names, dollar signs in
+@example
+vadd.d $vr0,$vr0,$vr1
+@end example

-In GNU C, you may normally use dollar signs in identifier names.
-This is because many traditional C implementations allow such identifiers.
-However, dollar signs in identifiers are not supported on a few target
-machines, typically because the target assembler does not allow them.
+This is a 128-bit vector addition instruction: @code{c} (referred to in the
+template string as @code{%0}) is the output, and @code{a} (@code{%1}) and
+@code{b} (@code{%2}) are the inputs. @code{__m128i} is a vector data type
+defined in the file @code{lsxintrin.h} (@pxref{LoongArch SX Vector
+Intrinsics}).
The @samp{=f} constraint on the output operand places the result in a
+floating-point register, and the @samp{f} constraints on the input operands
+likewise request floating-point registers; for details, see the definitions
+of the constraints (@pxref{Constraints}) in GCC.

-@node Character Escapes
-@subsection The Character @key{ESC} in Constants

+@anchor{riscvOperandmodifiers}
+@subsubsection RISC-V Operand Modifiers

-You can use the sequence @samp{\e} in a string or character constant to
-stand for the ASCII character @key{ESC}.

+The list below describes the supported modifiers and their effects for RISC-V.

-@node Alternate Keywords
-@subsection Alternate Keywords
-@cindex alternate keywords
-@cindex keywords, alternate

+@multitable @columnfractions .10 .90
+@headitem Modifier @tab Description
+@item @code{z} @tab Print ''@code{zero}'' instead of 0 if the operand is an immediate with a value of zero.
+@item @code{i} @tab Print the character ''@code{i}'' if the operand is an immediate.
+@item @code{N} @tab Print the register encoding as integer (0 - 31).
+@end multitable

-@option{-ansi} and the various @option{-std} options disable certain
-keywords that are GNU C extensions.
-Specifically, the keywords @code{asm}, @code{typeof} and
-@code{inline} are not available in programs compiled with
-@option{-ansi} or a @option{-std=} option specifying an ISO standard that
-doesn't define the keyword. This causes trouble when you want to use
-these extensions in a header file that can be included in programs that may
-be compiled with such options.

+@anchor{shOperandmodifiers}
+@subsubsection SH Operand Modifiers

-The way to solve these problems is to put @samp{__} at the beginning and
-end of each problematical keyword. For example, use @code{__asm__}
-instead of @code{asm}, and @code{__inline__} instead of @code{inline}.

+The list below describes the supported modifiers and their effects for the SH family of processors.

-Other C compilers won't accept these alternative keywords; if you want to
-compile with another compiler, you can define the alternate keywords as
-macros to replace them with the customary keywords. It looks like this:

+@multitable @columnfractions .10 .90
+@headitem Modifier @tab Description
+@item @code{.} @tab Print ''@code{.s}'' if the instruction needs a delay slot.
+@item @code{,} @tab Print ''@code{LOCAL_LABEL_PREFIX}''.
+@item @code{@@} @tab Print ''@code{trap}'', ''@code{rte}'' or ''@code{rts}'' depending on the interrupt pragma used.
+@item @code{#} @tab Print ''@code{nop}'' if there is nothing to put in the delay slot.
+@item @code{'} @tab Print likelihood suffix (''@code{/u}'' for unlikely).
+@item @code{>} @tab Print branch target if ''@code{-fverbose-asm}''.
+@item @code{O} @tab Require a constant operand and print the constant expression with no punctuation.
+@item @code{R} @tab Print the ''@code{LSW}'' of a dp value - changes if in little endian.
+@item @code{S} @tab Print the ''@code{MSW}'' of a dp value - changes if in little endian.
+@item @code{T} @tab Print the next word of a dp value - same as ''@code{R}'' in big endian mode.
+@item @code{M} @tab Print the ''@code{.b}'', ''@code{.w}'', ''@code{.l}'', ''@code{.s}'' or ''@code{.d}'' suffix if the operand is a MEM.
+@item @code{N} @tab Print ''@code{r63}'' if the operand is ''@code{const_int 0}''.
+@item @code{d} @tab Print a ''@code{V2SF}'' as ''@code{dN}'' instead of ''@code{fpN}''.
+@item @code{m} @tab Print the pair ''@code{base,offset}'' or ''@code{base,index}'' for LD and ST.
+@item @code{U} @tab Like ''@code{%m}'' for ''@code{LD}'' and ''@code{ST}'', ''@code{HI}'' and ''@code{LO}''.
+@item @code{V} @tab Print the position of a single bit set.
+@item @code{W} @tab Print the position of a single bit cleared.
+@item @code{t} @tab Print a memory address which is a register.
+@item @code{u} @tab Print the lowest 16 bits of ''@code{CONST_INT}'', as an unsigned value.
+@item @code{o} @tab Print an operator.
+@end multitable

-@smallexample
-#ifndef __GNUC__
-#define __asm__ asm
-#endif
-@end smallexample
+@lowersections
+@include md.texi
+@raisesections

-@findex __extension__
-@opindex pedantic
-@option{-pedantic} and other options cause warnings for many GNU C extensions.
-You can suppress such warnings using the keyword @code{__extension__}.
-Specifically:

+@node Asm constexprs
+@subsection C++11 Constant Expressions instead of String Literals

-@itemize @bullet
-@item
-Writing @code{__extension__} before an expression prevents warnings
-about extensions within that expression.

+In C++ with @option{-std=gnu++11} or later, strings that appear in asm
+syntax---specifically, the assembler template, constraints, and
+clobbers---can be specified as parenthesized compile-time constant
+expressions as well as by string literals. The parentheses around such
+an expression are a required part of the syntax. The constant expression
+can return a container with @code{data ()} and @code{size ()}
+member functions, following rules similar to those for the C++26
+@code{static_assert} message. Any string is converted to the character set
+of the source code. When this feature is available the
+@code{__GXX_CONSTEXPR_ASM__} preprocessor macro is predefined.

-@item
-In C, writing:

+This extension is supported for both the basic and extended asm syntax.

-@smallexample
-[[__extension__ @dots{}]]
-@end smallexample

+@example
+#include <string_view>
+constexpr std::string_view genfoo() @{ return "foo"; @}

-suppresses warnings about using @samp{[[]]} attributes in C versions
-that predate C23@.
-@end itemize

+void function()
+@{
+ asm((genfoo()));
+@}
+@end example

-@code{__extension__} has no effect aside from this.

+@node Asm Labels
+@subsection Controlling Names Used in Assembler Code
+@cindex assembler names for identifiers
+@cindex names used in assembler code
+@cindex identifiers, names in assembler code

-@node Function Names
-@subsection Function Names as Strings
-@cindex @code{__func__} identifier
-@cindex @code{__FUNCTION__} identifier
-@cindex @code{__PRETTY_FUNCTION__} identifier

+You can specify the name to be used in the assembler code for a C
+function or variable by writing the @code{asm} (or @code{__asm__})
+keyword after the declarator.
+It is up to you to make sure that the assembler names you choose do not
+conflict with any other assembler symbols, or reference registers.

-GCC provides three magic constants that hold the name of the current
-function as a string. In C++11 and later modes, all three are treated
-as constant expressions and can be used in @code{constexpr} contexts.
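As a small illustration of the @code{constexpr} treatment described above (a
sketch only; the function name @code{f} is arbitrary):

@smallexample
constexpr char
f ()
@{
  return __func__[0];  /* reading __func__ in a constant expression */
@}
static_assert (f () == 'f', "usable at compile time");
@end smallexample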
-The first of these constants is @code{__func__}, which is part of -the C99 standard: +@subsubheading Assembler names for data -The identifier @code{__func__} is implicitly declared by the translator -as if, immediately following the opening brace of each function -definition, the declaration +This sample shows how to specify the assembler name for data: @smallexample -static const char __func__[] = "function-name"; +int foo asm ("myfoo") = 2; @end smallexample @noindent -appeared, where function-name is the name of the lexically-enclosing -function. This name is the unadorned name of the function. As an -extension, at file (or, in C++, namespace scope), @code{__func__} -evaluates to the empty string. +This specifies that the name to be used for the variable @code{foo} in +the assembler code should be @samp{myfoo} rather than the usual +@samp{_foo}. -@code{__FUNCTION__} is another name for @code{__func__}, provided for -backward compatibility with old versions of GCC. +On systems where an underscore is normally prepended to the name of a C +variable, this feature allows you to define names for the +linker that do not start with an underscore. -In C, @code{__PRETTY_FUNCTION__} is yet another name for -@code{__func__}, except that at file scope (or, in C++, namespace scope), -it evaluates to the string @code{"top level"}. In addition, in C++, -@code{__PRETTY_FUNCTION__} contains the signature of the function as -well as its bare name. For example, this program: +GCC does not support using this feature with a non-static local variable +since such variables do not have assembler names. If you are +trying to put the variable in a particular register, see +@ref{Explicit Register Variables}. -@smallexample -extern "C" int printf (const char *, ...); +@subsubheading Assembler names for functions -class a @{ - public: - void sub (int i) - @{ - printf ("__FUNCTION__ = %s\n", __FUNCTION__); - printf ("__PRETTY_FUNCTION__ = %s\n", __PRETTY_FUNCTION__); - @} -@}; +To specify the assembler name for functions, write a declaration for the +function before its definition and put @code{asm} there, like this: -int -main (void) +@smallexample +int func (int x, int y) asm ("MYFUNC"); + +int func (int x, int y) @{ - a ax; - ax.sub (0); - return 0; -@} + /* @r{@dots{}} */ @end smallexample @noindent -gives this output: - -@smallexample -__FUNCTION__ = sub -__PRETTY_FUNCTION__ = void a::sub(int) -@end smallexample +This specifies that the name to be used for the function @code{func} in +the assembler code should be @code{MYFUNC}. -These identifiers are variables, not preprocessor macros, and may not -be used to initialize @code{char} arrays or be concatenated with string -literals. +@node Explicit Register Variables +@subsection Variables in Specified Registers +@anchor{Explicit Reg Vars} +@cindex explicit register variables +@cindex variables in specified registers +@cindex specified registers -@node Semantic Extensions -@section Extensions to C Semantics +GNU C allows you to associate specific hardware registers with C +variables. In almost all cases, allowing the compiler to assign +registers produces the best code. However under certain unusual +circumstances, more precise control over the variable storage is +required. -GNU C defines useful behavior for some constructs that are not allowed or -well-defined in standard C. +Both global and local variables can be associated with a register. The +consequences of performing this association are very different between +the two, as explained in the sections below. 
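Before turning to the details, a minimal sketch of the two flavors; the
register names @code{r12} and @code{r0} and the @code{svc #0} instruction are
target-specific placeholders, not portable choices:

@smallexample
/* Global: associates r12 with this variable throughout the
   compilation unit.  */
register int *stack_top asm ("r12");

void
do_syscall (int arg)
@{
  /* Local: only guaranteed to hold arg when used as an operand
     of an extended asm.  */
  register int a0 asm ("r0") = arg;
  asm volatile ("svc #0" : : "r" (a0));
@}
@end smallexample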
@menu -* Function Prototypes:: Prototype declarations and old-style definitions. -* Pointer Arith:: Arithmetic on @code{void}-pointers and function pointers. -* Variadic Pointer Args:: Pointer arguments to variadic functions. -* Pointers to Arrays:: Pointers to arrays with qualifiers work as expected. -* Const and Volatile Functions:: GCC interprets these specially in C. +* Global Register Variables:: Variables declared at global scope. +* Local Register Variables:: Variables declared within a function. @end menu -@node Function Prototypes -@subsection Prototypes and Old-Style Function Definitions -@cindex function prototype declarations -@cindex old-style function definitions -@cindex promotion of formal parameters +@node Global Register Variables +@subsubsection Defining Global Register Variables +@anchor{Global Reg Vars} +@cindex global register variables +@cindex registers, global variables in +@cindex registers, global allocation -GNU C extends ISO C to allow a function prototype to override a later -old-style non-prototype definition. Consider the following example: +You can define a global register variable and associate it with a specified +register like this: @smallexample -/* @r{Use prototypes unless the compiler is old-fashioned.} */ -#ifdef __STDC__ -#define P(x) x -#else -#define P(x) () -#endif - -/* @r{Prototype function declaration.} */ -int isroot P((uid_t)); - -/* @r{Old-style function definition.} */ -int -isroot (x) /* @r{??? lossage here ???} */ - uid_t x; -@{ - return x == 0; -@} +register int *foo asm ("r12"); @end smallexample -Suppose the type @code{uid_t} happens to be @code{short}. ISO C does -not allow this example, because subword arguments in old-style -non-prototype definitions are promoted. Therefore in this example the -function definition's argument is really an @code{int}, which does not -match the prototype argument type of @code{short}. +@noindent +Here @code{r12} is the name of the register that should be used. Note that +this is the same syntax used for defining local register variables, but for +a global variable the declaration appears outside a function. The +@code{register} keyword is required, and cannot be combined with +@code{static}. The register name must be a valid register name for the +target platform. -This restriction of ISO C makes it hard to write code that is portable -to traditional C compilers, because the programmer does not know -whether the @code{uid_t} type is @code{short}, @code{int}, or -@code{long}. Therefore, in cases like these GNU C allows a prototype -to override a later old-style definition. More precisely, in GNU C, a -function prototype argument type overrides the argument type specified -by a later old-style definition if the former type is the same as the -latter type before promotion. Thus in GNU C the above example is -equivalent to the following: +Do not use type qualifiers such as @code{const} and @code{volatile}, as +the outcome may be contrary to expectations. In particular, using the +@code{volatile} qualifier does not fully prevent the compiler from +optimizing accesses to the register. -@smallexample -int isroot (uid_t); +Registers are a scarce resource on most systems and allowing the +compiler to manage their usage usually results in the best code. However, +under special circumstances it can make sense to reserve some globally. +For example this may be useful in programs such as programming language +interpreters that have a couple of global variables that are accessed +very often. 
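For instance, a sketch of that interpreter pattern (the register name
@code{r13} is a placeholder; a real program must pick a register that is
valid and sensible for its target):

@smallexample
/* Keep the virtual machine's instruction pointer in a fixed
   register for every function in this translation unit.  */
register unsigned char *vm_ip asm ("r13");

int
run (void)
@{
  for (;;)
    switch (*vm_ip++)
      @{
      case 0:
        return 0;                 /* halt */
      /* @r{@dots{} other opcodes @dots{}} */
      @}
@}
@end smallexample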
-int -isroot (uid_t x) -@{ - return x == 0; -@} -@end smallexample +After defining a global register variable, for the current compilation +unit: -@noindent -GNU C++ does not support old-style function definitions, so this -extension is irrelevant. +@itemize @bullet +@item If the register is a call-saved register, call ABI is affected: +the register will not be restored in function epilogue sequences after +the variable has been assigned. Therefore, functions cannot safely +return to callers that assume standard ABI. +@item Conversely, if the register is a call-clobbered register, making +calls to functions that use standard ABI may lose contents of the variable. +Such calls may be created by the compiler even if none are evident in +the original program, for example when libgcc functions are used to +make up for unavailable instructions. +@item Accesses to the variable may be optimized as usual and the register +remains available for allocation and use in any computations, provided that +observable values of the variable are not affected. +@item If the variable is referenced in inline assembly, the type of access +must be provided to the compiler via constraints (@pxref{Constraints}). +Accesses from basic asms are not supported. +@end itemize -@node Pointer Arith -@subsection Arithmetic on @code{void}- and Function-Pointers -@cindex void pointers, arithmetic -@cindex void, size of pointer to -@cindex function pointers, arithmetic -@cindex function, size of pointer to +Note that these points @emph{only} apply to code that is compiled with the +definition. The behavior of code that is merely linked in (for example +code from libraries) is not affected. -In GNU C, addition and subtraction operations are supported on pointers to -@code{void} and on pointers to functions. This is done by treating the -size of a @code{void} or of a function as 1. +If you want to recompile source files that do not actually use your global +register variable so they do not use the specified register for any other +purpose, you need not actually add the global register declaration to +their source code. It suffices to specify the compiler option +@option{-ffixed-@var{reg}} (@pxref{Code Gen Options}) to reserve the +register. -A consequence of this is that @code{sizeof} is also allowed on @code{void} -and on function types, and returns 1. +@subsubheading Declaring the variable -@opindex Wpointer-arith -The option @option{-Wpointer-arith} requests a warning if these extensions -are used. +Global register variables cannot have initial values, because an +executable file has no means to supply initial contents for a register. -@node Variadic Pointer Args -@subsection Pointer Arguments in Variadic Functions -@cindex pointer arguments in variadic functions -@cindex variadic functions, pointer arguments +When selecting a register, choose one that is normally saved and +restored by function calls on your machine. This ensures that code +which is unaware of this reservation (such as library routines) will +restore it before returning. -Standard C requires that pointer types used with @code{va_arg} in -functions with variable argument lists either must be compatible with -that of the actual argument, or that one type must be a pointer to -@code{void} and the other a pointer to a character type. GNU C -implements the POSIX XSI extension that additionally permits the use -of @code{va_arg} with a pointer type to receive arguments of any other -pointer type. 
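As a sketch of what this relaxation permits (the function and its arguments
are hypothetical):

@smallexample
#include <stdarg.h>

/* Count pointer arguments of arbitrary pointer type, up to a
   terminating null pointer.  */
int
count_ptrs (int first, ...)
@{
  va_list ap;
  int n = 0;
  va_start (ap, first);
  while (va_arg (ap, void *))  /* any pointer type may be consumed here */
    n++;
  va_end (ap);
  return n;
@}
@end smallexample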
+On machines with register windows, be sure to choose a global +register that is not affected magically by the function call mechanism. -In particular, in GNU C @samp{va_arg (ap, void *)} can safely be used -to consume an argument of any pointer type. +@subsubheading Using the variable -@node Pointers to Arrays -@subsection Pointers to Arrays with Qualifiers Work as Expected -@cindex pointers to arrays -@cindex const qualifier +@cindex @code{qsort}, and global register variables +When calling routines that are not aware of the reservation, be +cautious if those routines call back into code which uses them. As an +example, if you call the system library version of @code{qsort}, it may +clobber your registers during execution, but (if you have selected +appropriate registers) it will restore them before returning. However +it will @emph{not} restore them before calling @code{qsort}'s comparison +function. As a result, global values will not reliably be available to +the comparison function unless the @code{qsort} function itself is rebuilt. -In GNU C, pointers to arrays with qualifiers work similar to pointers -to other qualified types. For example, a value of type @code{int (*)[5]} -can be used to initialize a variable of type @code{const int (*)[5]}. -These types are incompatible in ISO C because the @code{const} qualifier -is formally attached to the element type of the array and not the -array itself. +Similarly, it is not safe to access the global register variables from signal +handlers or from more than one thread of control. Unless you recompile +them specially for the task at hand, the system library routines may +temporarily use the register for other things. Furthermore, since the register +is not reserved exclusively for the variable, accessing it from handlers of +asynchronous signals may observe unrelated temporary values residing in the +register. + +@cindex register variable after @code{longjmp} +@cindex global register after @code{longjmp} +@cindex value after @code{longjmp} +@findex longjmp +@findex setjmp +On most machines, @code{longjmp} restores to each global register +variable the value it had at the time of the @code{setjmp}. On some +machines, however, @code{longjmp} does not change the value of global +register variables. To be portable, the function that called @code{setjmp} +should make other arrangements to save the values of the global register +variables, and to restore them in a @code{longjmp}. This way, the same +thing happens regardless of what @code{longjmp} does. + +@node Local Register Variables +@subsubsection Specifying Registers for Local Variables +@anchor{Local Reg Vars} +@cindex local variables, specifying registers +@cindex specifying registers for local variables +@cindex registers for local variables + +You can define a local register variable and associate it with a specified +register like this: @smallexample -extern void -transpose (int N, int M, double out[M][N], const double in[N][M]); -double x[3][2]; -double y[2][3]; -@r{@dots{}} -transpose(3, 2, y, x); +register int *foo asm ("r12"); @end smallexample -@node Const and Volatile Functions -@subsection Const and Volatile Functions -@cindex @code{const} applied to function -@cindex @code{volatile} applied to function +@noindent +Here @code{r12} is the name of the register that should be used. Note +that this is the same syntax used for defining global register variables, +but for a local variable the declaration appears within a function. 
The +@code{register} keyword is required, and cannot be combined with +@code{static}. The register name must be a valid register name for the +target platform. -The C standard explicitly leaves the behavior of the @code{const} and -@code{volatile} type qualifiers applied to functions undefined; these -constructs can only arise through the use of @code{typedef}. As an extension, -GCC defines this use of the @code{const} qualifier to have the same meaning -as the GCC @code{const} function attribute, and the @code{volatile} qualifier -to be equivalent to the @code{noreturn} attribute. -@xref{Common Function Attributes}, for more information. +Do not use type qualifiers such as @code{const} and @code{volatile}, as +the outcome may be contrary to expectations. In particular, when the +@code{const} qualifier is used, the compiler may substitute the +variable with its initializer in @code{asm} statements, which may cause +the corresponding operand to appear in a different register. -As examples of this usage, +As with global register variables, it is recommended that you choose +a register that is normally saved and restored by function calls on your +machine, so that calls to library routines will not clobber it. + +The only supported use for this feature is to specify registers +for input and output operands when calling Extended @code{asm} +(@pxref{Extended Asm}). This may be necessary if the constraints for a +particular machine don't provide sufficient control to select the desired +register. To force an operand into a register, create a local variable +and specify the register name after the variable's declaration. Then use +the local variable for the @code{asm} operand and specify any constraint +letter that matches the register: @smallexample +register int *p1 asm ("r0") = @dots{}; +register int *p2 asm ("r1") = @dots{}; +register int *result asm ("r0"); +asm ("sysint" : "=r" (result) : "0" (p1), "r" (p2)); +@end smallexample -/* @r{Equivalent to:} - void fatal () __attribute__ ((noreturn)); */ -typedef void voidfn (); -volatile voidfn fatal; +@emph{Warning:} In the above example, be aware that a register (for example +@code{r0}) can be call-clobbered by subsequent code, including function +calls and library calls for arithmetic operators on other variables (for +example the initialization of @code{p2}). In this case, use temporary +variables for expressions between the register assignments: -/* @r{Equivalent to:} - extern int square (int) __attribute__ ((const)); */ -typedef int intfn (int); -extern const intfn square; +@smallexample +int t1 = @dots{}; +register int *p1 asm ("r0") = @dots{}; +register int *p2 asm ("r1") = t1; +register int *result asm ("r0"); +asm ("sysint" : "=r" (result) : "0" (p1), "r" (p2)); @end smallexample -In general, using function attributes instead is preferred, since the -attributes make both the intent of the code and its reliance on a GNU -extension explicit. Additionally, using @code{const} and -@code{volatile} in this way is specific to GNU C and does not work in -GNU C++. - -@node Return Address -@section Getting the Return or Frame Address of a Function +Defining a register variable does not reserve the register. Other than +when invoking the Extended @code{asm}, the contents of the specified +register are not guaranteed. For this reason, the following uses +are explicitly @emph{not} supported. 
If they appear to work, it is only
+happenstance, and may stop working as intended due to (seemingly)
+unrelated changes in surrounding code, or even minor changes in the
+optimization of a future version of GCC:

-These functions may be used to get information about the callers of a
-function.

+@itemize @bullet
+@item Passing parameters to or from Basic @code{asm}
+@item Passing parameters to or from Extended @code{asm} without using input
+or output operands.
+@item Passing parameters to or from routines written in assembler (or
+other languages) using non-standard calling conventions.
+@end itemize

-@defbuiltin{{void *} __builtin_return_address (unsigned int @var{level})}
-This function returns the return address of the current function, or of
-one of its callers. The @var{level} argument is the number of frames to
-scan up the call stack. A value of @code{0} yields the return address
-of the current function, a value of @code{1} yields the return address
-of the caller of the current function, and so forth. When inlining,
-the expected behavior is that the function returns the address of
-the function that is returned to. To work around this behavior use
-the @code{noinline} function attribute.

+Some developers use Local Register Variables in an attempt to improve
+GCC's allocation of registers, especially in large functions. In this
+case the register name is essentially a hint to the register allocator.
+While in some instances this can generate better code, improvements are
+subject to the whims of the allocator/optimizers. Since there are no
+guarantees that your improvements won't be lost, this usage of Local
+Register Variables is discouraged.

-The @var{level} argument must be a constant integer.

+On the MIPS platform, there is a related use for local register variables
+with slightly different characteristics (@pxref{MIPS Coprocessors,,
+Defining coprocessor specifics for MIPS targets, gccint,
+GNU Compiler Collection (GCC) Internals}).

-On some machines it may be impossible to determine the return address of
-any function other than the current one; in such cases, or when the top
-of the stack has been reached, this function returns an unspecified
-value. In addition, @code{__builtin_frame_address} may be used
-to determine if the top of the stack has been reached.

+@node Size of an asm
+@subsection Size of an @code{asm}

-Additional post-processing of the returned value may be needed; see
-@code{__builtin_extract_return_addr}.

+Some targets require that GCC track the size of each instruction used
+in order to generate correct code. Because the final length of the
+code produced by an @code{asm} statement is only known by the
+assembler, GCC must make an estimate as to how big it will be. It
+does this by counting the number of instructions in the pattern of the
+@code{asm} and multiplying that by the length of the longest
+instruction supported by that processor. (When working out the number
+of instructions, it assumes that any occurrence of a newline or of
+whatever statement separator character is supported by the assembler ---
+typically @samp{;} --- indicates the end of an instruction.)

-The stored representation of the return address in memory may be different
-from the address returned by @code{__builtin_return_address}. For example,
-on AArch64 the stored address may be mangled with return address signing
-whereas the address returned by @code{__builtin_return_address} is not.
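For example (a sketch; the mnemonics are placeholders), GCC counts the
following statement as three instructions, since both the @samp{;}
separator and the newline are taken to end an instruction:

@smallexample
asm ("mvi  r0, %0;"
     "dbl  r0\n\t"
     "mvo  %0, r0"
     : "+r" (x));
@end smallexample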
+Normally, GCC's estimate is adequate to ensure that correct +code is generated, but it is possible to confuse the compiler if you use +pseudo instructions or assembler macros that expand into multiple real +instructions, or if you use assembler directives that expand to more +space in the object file than is needed for a single instruction. +If this happens then the assembler may produce a diagnostic saying that +a label is unreachable. -Calling this function with a nonzero argument can have unpredictable -effects, including crashing the calling program. As a result, calls -that are considered unsafe are diagnosed when the @option{-Wframe-address} -option is in effect. Such calls should only be made in debugging -situations. +@cindex @code{asm inline} +This size is also used for inlining decisions. If you use @code{asm inline} +instead of just @code{asm}, then for inlining purposes the size of the asm +is taken as the minimum size, ignoring how many instructions GCC thinks it is. -On targets where code addresses are representable as @code{void *}, -@smallexample -void *addr = __builtin_extract_return_addr (__builtin_return_address (0)); -@end smallexample -gives the code address where the current function would return. For example, -such an address may be used with @code{dladdr} or other interfaces that work -with code addresses. -@enddefbuiltin +@node Syntax Extensions +@section Other Extensions to C Syntax -@defbuiltin{{void *} __builtin_extract_return_addr (void *@var{addr})} -The address as returned by @code{__builtin_return_address} may have to be fed -through this function to get the actual encoded address. For example, on the -31-bit S/390 platform the highest bit has to be masked out, or on SPARC -platforms an offset has to be added for the true next instruction to be -executed. +GNU C has traditionally supported numerous extensions to standard C +syntax. Some of these features were originally intended for +compatibility with other compilers or to ease traditional C +compatibility, some have been adopted into subsequent versions of the +C and/or C++ standards, while others remain specific to GNU C. -If no fixup is needed, this function simply passes through @var{addr}. -@enddefbuiltin +@menu +* Statement Exprs:: Putting statements and declarations inside expressions. +* Local Labels:: Labels local to a block. +* Labels as Values:: Getting pointers to labels, and computed gotos. +* Nested Functions:: Nested functions in GNU C. +* Typeof:: @code{typeof}: referring to the type of an expression. +* Offsetof:: Special syntax for @code{offsetof}. +* Alignment:: Determining the alignment of a function, type or variable. +* Incomplete Enums:: @code{enum foo;}, with details to follow. +* Variadic Macros:: Macros with a variable number of arguments. +* Conditionals:: Omitting the middle operand of a @samp{?:} expression. +* Case Ranges:: `case 1 ... 9' and such. +* Mixed Labels and Declarations:: Mixing declarations, labels and code. +* C++ Comments:: C++ comments are recognized. +* Escaped Newlines:: Slightly looser rules for escaped newlines. +* Hex Floats:: Hexadecimal floating-point constants. +* Binary constants:: Binary constants using the @samp{0b} prefix. +* Dollar Signs:: Dollar sign is allowed in identifiers. +* Character Escapes:: @samp{\e} stands for the character @key{ESC}. +* Alternate Keywords:: @code{__const__}, @code{__asm__}, etc., for header files. +* Function Names:: Printable strings which are the name of the current + function. 
+@end menu -@defbuiltin{{void *} __builtin_frob_return_addr (void *@var{addr})} -This function does the reverse of @code{__builtin_extract_return_addr}. -@enddefbuiltin +@node Statement Exprs +@subsection Statements and Declarations in Expressions +@cindex statements inside expressions +@cindex declarations inside expressions +@cindex expressions containing statements +@cindex macros, statements in expressions -@defbuiltin{{void *} __builtin_frame_address (unsigned int @var{level})} -This function is similar to @code{__builtin_return_address}, but it -returns the address of the function frame rather than the return address -of the function. Calling @code{__builtin_frame_address} with a value of -@code{0} yields the frame address of the current function, a value of -@code{1} yields the frame address of the caller of the current function, -and so forth. +@c the above section title wrapped and causes an underfull hbox.. i +@c changed it from "within" to "in". --mew 4feb93 +A compound statement enclosed in parentheses may appear as an expression +in GNU C@. This allows you to use loops, switches, and local variables +within an expression. -The frame is the area on the stack that holds local variables and saved -registers. The frame address is normally the address of the first word -pushed on to the stack by the function. However, the exact definition -depends upon the processor and the calling convention. If the processor -has a dedicated frame pointer register, and the function has a frame, -then @code{__builtin_frame_address} returns the value of the frame -pointer register. +Recall that a compound statement is a sequence of statements surrounded +by braces; in this construct, parentheses go around the braces. For +example: -On some machines it may be impossible to determine the frame address of -any function other than the current one; in such cases, or when the top -of the stack has been reached, this function returns @code{0} if -the first frame pointer is properly initialized by the startup code. +@smallexample +(@{ int y = foo (); int z; + if (y > 0) z = y; + else z = - y; + z; @}) +@end smallexample -Calling this function with a nonzero argument can have unpredictable -effects, including crashing the calling program. As a result, calls -that are considered unsafe are diagnosed when the @option{-Wframe-address} -option is in effect. Such calls should only be made in debugging -situations. -@enddefbuiltin +@noindent +is a valid (though slightly more complex than necessary) expression +for the absolute value of @code{foo ()}. -@deftypefn {Built-in Function} {void *} __builtin_stack_address () -This function returns the stack pointer register, offset by -@code{STACK_ADDRESS_OFFSET} if that's defined. +The last thing in the compound statement should be an expression +followed by a semicolon; the value of this subexpression serves as the +value of the entire construct. (If you use some other kind of statement +last within the braces, the construct has type @code{void}, and thus +effectively no value.) -Conceptually, the returned address returned by this built-in function is -the boundary between the stack area allocated for use by its caller, and -the area that could be modified by a function call, that the caller -could safely zero-out before or after (but not during) the call -sequence. +This feature is especially useful in making macro definitions ``safe'' (so +that they evaluate each operand exactly once). 
For example, the +``maximum'' function is commonly defined as a macro in standard C as +follows: -Arguments for a callee may be preallocated as part of the caller's stack -frame, or allocated on a per-call basis, depending on the target, so -they may be on either side of this boundary. +@smallexample +#define max(a,b) ((a) > (b) ? (a) : (b)) +@end smallexample -Even if the stack pointer is biased, the result is not. The register -save area on SPARC is regarded as modifiable by calls, rather than as -allocated for use by the caller function, since it is never in use while -the caller function itself is running. +@noindent +@cindex side effects, macro argument +But this definition computes either @var{a} or @var{b} twice, with bad +results if the operand has side effects. In GNU C, if you know the +type of the operands (here taken as @code{int}), you can avoid this +problem by defining the macro as follows: -Red zones that only leaf functions could use are also regarded as -modifiable by calls, rather than as allocated for use by the caller. -This is only theoretical, since leaf functions do not issue calls, but a -constant offset makes this built-in function more predictable. -@end deftypefn +@smallexample +#define maxint(a,b) \ + (@{int _a = (a), _b = (b); _a > _b ? _a : _b; @}) +@end smallexample -@node Stack Scrubbing -@section Stack scrubbing internal interfaces +Note that introducing variable declarations (as we do in @code{maxint}) can +cause variable shadowing, so while this example using the @code{max} macro +produces correct results: +@smallexample +int _a = 1, _b = 2, c; +c = max (_a, _b); +@end smallexample +@noindent +this example using maxint will not: +@smallexample +int _a = 1, _b = 2, c; +c = maxint (_a, _b); +@end smallexample -Stack scrubbing involves cooperation between a @code{strub} context, -i.e., a function whose stack frame is to be zeroed-out, and its callers. -The caller initializes a stack watermark, the @code{strub} context -updates the watermark according to its stack use, and the caller zeroes -it out once it regains control, whether by the callee's returning or by -an exception. +This problem may for instance occur when we use this pattern recursively, like +so: -Each of these steps is performed by a different builtin function call. -Calls to these builtins are introduced automatically, in response to -@code{strub} attributes and command-line options; they are not expected -to be explicitly called by source code. +@smallexample +#define maxint3(a, b, c) \ + (@{int _a = (a), _b = (b), _c = (c); maxint (maxint (_a, _b), _c); @}) +@end smallexample -The functions that implement the builtins are available in libgcc but, -depending on optimization levels, they are expanded internally, adjusted -to account for inlining, and sometimes combined/deferred (e.g. passing -the caller-supplied watermark on to callees, refraining from erasing -stack areas that the caller will) to enable tail calls and to optimize -for code size. +Embedded statements are not allowed in constant expressions, such as +the value of an enumeration constant, the width of a bit-field, or +the initial value of a static variable. -@deftypefn {Built-in Function} {void} __builtin___strub_enter (void **@var{wmptr}) -This function initializes a stack @var{watermark} variable with the -current top of the stack. A call to this builtin function is introduced -before entering a @code{strub} context. It remains as a function call -if optimization is not enabled. 
-@end deftypefn
+If you don't know the type of the operand, you can still do this, but you
+must use @code{typeof} or @code{__auto_type} (@pxref{Typeof}).

-@deftypefn {Built-in Function} {void} __builtin___strub_update (void **@var{wmptr})
-This function updates a stack @var{watermark} variable with the current
-top of the stack, if it tops the previous watermark. A call to this
-builtin function is inserted within @code{strub} contexts, whenever
-additional stack space may have been used. It remains as a function
-call at optimization levels lower than 2.
-@end deftypefn

+In G++, the result value of a statement expression undergoes array and
+function pointer decay, and is returned by value to the enclosing
+expression. For instance, if @code{A} is a class, then

-@deftypefn {Built-in Function} {void} __builtin___strub_leave (void **@var{wmptr})
-This function overwrites the memory area between the current top of the
-stack, and the @var{watermark}ed address. A call to this builtin
-function is inserted after leaving a @code{strub} context. It remains
-as a function call at optimization levels lower than 3, and it is guarded by
-a condition at level 2.
-@end deftypefn

+@smallexample
+ A a;

-@node Vector Extensions
-@section Using Vector Instructions through Built-in Functions
+ (@{a;@}).Foo ()
+@end smallexample

-On some targets, the instruction set contains SIMD vector instructions which
-operate on multiple values contained in one large register at the same time.
-For example, on the x86 the MMX, 3DNow!@: and SSE extensions can be used
-this way.

+@noindent
+constructs a temporary @code{A} object to hold the result of the
+statement expression, and that is used to invoke @code{Foo}.
+Therefore the @code{this} pointer observed by @code{Foo} is not the
+address of @code{a}.

-The first step in using these extensions is to provide the necessary data
-types. This should be done using an appropriate @code{typedef}:

+In a statement expression, any temporaries created within a statement
+are destroyed at that statement's end. This makes statement
+expressions inside macros slightly different from function calls. In
+the latter case temporaries introduced during argument evaluation are
+destroyed at the end of the statement that includes the function
+call. In the statement expression case they are destroyed during
+the statement expression. For instance,

@smallexample
-typedef int v4si __attribute__ ((vector_size (16)));
+#define macro(a) (@{__typeof__(a) b = (a); b + 3; @})
+template <typename T> T function (T a) @{ T b = a; return b + 3; @}
+
+void foo ()
+@{
+ macro (X ());
+ function (X ());
+@}
@end smallexample

@noindent
-The @code{int} type specifies the @dfn{base type} (which can be a
-@code{typedef}), while the attribute specifies the vector size for the
-variable, measured in bytes. For example, the declaration above causes
-the compiler to set the mode for the @code{v4si} type to be 16 bytes wide
-and divided into @code{int} sized units. For a 32-bit @code{int} this
-means a vector of 4 units of 4 bytes, and the corresponding mode of
-@code{foo} is @acronym{V4SI}.
+has different places where temporaries are destroyed. For the
+@code{macro} case, the temporary @code{X} is destroyed just after
+the initialization of @code{b}. In the @code{function} case that
+temporary is destroyed when the function returns.

-The @code{vector_size} attribute is only applicable to integral and
-floating scalars, although arrays, pointers, and function return values
-are allowed in conjunction with this construct.
Only sizes that are -positive power-of-two multiples of the base type size are currently allowed. +These considerations mean that it is probably a bad idea to use +statement expressions of this form in header files that are designed to +work with C++. (Note that some versions of the GNU C Library contained +header files using statement expressions that lead to precisely this +bug.) -All the basic integer types can be used as base types, both as signed -and as unsigned: @code{char}, @code{short}, @code{int}, @code{long}, -@code{long long}. In addition, @code{float} and @code{double} can be -used to build floating-point vector types. +Jumping into a statement expression with @code{goto} or using a +@code{switch} statement outside the statement expression with a +@code{case} or @code{default} label inside the statement expression is +not permitted. Jumping into a statement expression with a computed +@code{goto} (@pxref{Labels as Values}) has undefined behavior. +Jumping out of a statement expression is permitted, but if the +statement expression is part of a larger expression then it is +unspecified which other subexpressions of that expression have been +evaluated except where the language definition requires certain +subexpressions to be evaluated before or after the statement +expression. A @code{break} or @code{continue} statement inside of +a statement expression used in @code{while}, @code{do} or @code{for} +loop or @code{switch} statement condition +or @code{for} statement init or increment expressions jumps to an +outer loop or @code{switch} statement if any (otherwise it is an error), +rather than to the loop or @code{switch} statement in whose condition +or init or increment expression it appears. +In any case, as with a function call, the evaluation of a +statement expression is not interleaved with the evaluation of other +parts of the containing expression. For example, -Specifying a combination that is not valid for the current architecture -causes GCC to synthesize the instructions using a narrower mode. -For example, if you specify a variable of type @code{V4SI} and your -architecture does not allow for this specific SIMD type, GCC -produces code that uses 4 @code{SIs}. +@smallexample + foo (), ((@{ bar1 (); goto a; 0; @}) + bar2 ()), baz(); +@end smallexample -The types defined in this manner can be used with a subset of normal C -operations. Currently, GCC allows using the following operators -on these types: @code{+, -, *, /, unary minus, ^, |, &, ~, %}@. +@noindent +calls @code{foo} and @code{bar1} and does not call @code{baz} but +may or may not call @code{bar2}. If @code{bar2} is called, it is +called after @code{foo} and before @code{bar1}. -The operations behave like C++ @code{valarrays}. Addition is defined as -the addition of the corresponding elements of the operands. For -example, in the code below, each of the 4 elements in @var{a} is -added to the corresponding 4 elements in @var{b} and the resulting -vector is stored in @var{c}. +@node Local Labels +@subsection Locally Declared Labels +@cindex local labels +@cindex macros, local labels -@smallexample -typedef int v4si __attribute__ ((vector_size (16))); +GCC allows you to declare @dfn{local labels} in any nested block +scope. A local label is just like an ordinary label, but you can +only reference it (with a @code{goto} statement, or by taking its +address) within the block in which it is declared. 
-v4si a, b, c;
+A local label declaration looks like this:
-c = a + b;
+@smallexample
+__label__ @var{label};
@end smallexample

-Subtraction, multiplication, division, and the logical operations
-operate in a similar manner. Likewise, the result of using the unary
-minus or complement operators on a vector type is a vector whose
-elements are the negative or complemented values of the corresponding
-elements in the operand.
+@noindent
+or

-It is possible to use shifting operators @code{<<}, @code{>>} on
-integer-type vectors. The operation is defined as follows: @code{@{a0,
-a1, @dots{}, an@} >> @{b0, b1, @dots{}, bn@} == @{a0 >> b0, a1 >> b1,
-@dots{}, an >> bn@}}@. Unlike OpenCL, values of @code{b} are not
-implicitly taken modulo bit width of the base type @code{B}, and the behavior
-is undefined if any @code{bi} is greater than or equal to @code{B}.
+@smallexample
+__label__ @var{label1}, @var{label2}, /* @r{@dots{}} */;
+@end smallexample

-In contrast to scalar operations in C and C++, operands of integer vector
-operations do not undergo integer promotions.
+Local label declarations must come at the beginning of the block,
+before any ordinary declarations or statements.

-Operands of binary vector operations must have the same number of
-elements.
+The label declaration defines the label @emph{name}, but does not define
+the label itself. You must do this in the usual way, with
+@code{@var{label}:}, within the statements of the block.

-For convenience, it is allowed to use a binary vector operation
-where one operand is a scalar. In that case the compiler transforms
-the scalar operand into a vector where each element is the scalar from
-the operation. The transformation happens only if the scalar could be
-safely converted to the vector-element type.
-Consider the following code.
+The local label feature is useful for complex macros. If a macro
+contains nested loops, a @code{goto} can be useful for breaking out of
+them. However, an ordinary label whose scope is the whole function
+cannot be used: if the macro can be expanded several times in one
+function, the label is multiply defined in that function. A
+local label avoids this problem. For example:

@smallexample
-typedef int v4si __attribute__ ((vector_size (16)));
-
-v4si a, b, c;
-long l;
+#define SEARCH(value, array, target) \
+do @{ \
+ __label__ found; \
+ typeof (target) _SEARCH_target = (target); \
+ typeof (*(array)) *_SEARCH_array = (array); \
+ int i, j; \
+ for (i = 0; i < max; i++) \
+ for (j = 0; j < max; j++) \
+ if (_SEARCH_array[i][j] == _SEARCH_target) \
+ @{ (value) = i; goto found; @} \
+ (value) = -1; \
+ found:; \
+@} while (0)
@end smallexample

-a = b + 1; /* a = b + @{1,1,1,1@}; */
-a = 2 * b; /* a = @{2,2,2,2@} * b; */
+This could also be written using a statement expression:

-a = l + a; /* Error, cannot convert long to int. */
@end smallexample

+@smallexample
+#define SEARCH(array, target) \
+(@{ \
+ __label__ found; \
+ typeof (target) _SEARCH_target = (target); \
+ typeof (*(array)) *_SEARCH_array = (array); \
+ int i, j; \
+ int value; \
+ for (i = 0; i < max; i++) \
+ for (j = 0; j < max; j++) \
+ if (_SEARCH_array[i][j] == _SEARCH_target) \
+ @{ value = i; goto found; @} \
+ value = -1; \
+ found: \
+ value; \
+@})
@end smallexample

-Vectors can be subscripted as if the vector were an array with
-the same number of elements and base type. Out of bound accesses
-invoke undefined behavior at run time.
Warnings for out of bound -accesses for vector subscription can be enabled with -@option{-Warray-bounds}. +Local label declarations also make the labels they declare visible to +nested functions, if there are any. @xref{Nested Functions}, for details. -Vector comparison is supported with standard comparison -operators: @code{==, !=, <, <=, >, >=}. Comparison operands can be -vector expressions of integer-type or real-type. Comparison between -integer-type vectors and real-type vectors are not supported. The -result of the comparison is a vector of the same width and number of -elements as the comparison operands with a signed integral element -type. +@node Labels as Values +@subsection Labels as Values +@cindex labels as values +@cindex computed gotos +@cindex goto with computed label +@cindex address of a label -Vectors are compared element-wise producing 0 when comparison is false -and -1 (constant of the appropriate type where all bits are set) -otherwise. Consider the following example. +You can get the address of a label defined in the current function +(or a containing function) with the unary operator @samp{&&}. The +value has type @code{void *}. This value is a constant and can be used +wherever a constant of that type is valid. For example: @smallexample -typedef int v4si __attribute__ ((vector_size (16))); - -v4si a = @{1,2,3,4@}; -v4si b = @{3,2,1,4@}; -v4si c; - -c = a > b; /* The result would be @{0, 0,-1, 0@} */ -c = a == b; /* The result would be @{0,-1, 0,-1@} */ +void *ptr; +/* @r{@dots{}} */ +ptr = &&foo; @end smallexample -In C++, the ternary operator @code{?:} is available. @code{a?b:c}, where -@code{b} and @code{c} are vectors of the same type and @code{a} is an -integer vector with the same number of elements of the same size as @code{b} -and @code{c}, computes all three arguments and creates a vector -@code{@{a[0]?b[0]:c[0], a[1]?b[1]:c[1], @dots{}@}}. Note that unlike in -OpenCL, @code{a} is thus interpreted as @code{a != 0} and not @code{a < 0}. -As in the case of binary operations, this syntax is also accepted when -one of @code{b} or @code{c} is a scalar that is then transformed into a -vector. If both @code{b} and @code{c} are scalars and the type of -@code{true?b:c} has the same size as the element type of @code{a}, then -@code{b} and @code{c} are converted to a vector type whose elements have -this type and with the same number of elements as @code{a}. - -In C++, the logic operators @code{!, &&, ||} are available for vectors. -@code{!v} is equivalent to @code{v == 0}, @code{a && b} is equivalent to -@code{a!=0 & b!=0} and @code{a || b} is equivalent to @code{a!=0 | b!=0}. -For mixed operations between a scalar @code{s} and a vector @code{v}, -@code{s && v} is equivalent to @code{s?v!=0:0} (the evaluation is -short-circuit) and @code{v && s} is equivalent to @code{v!=0 & (s?-1:0)}. +To use these values, you need to be able to jump to one. This is done +with the computed goto statement@footnote{The analogous feature in +Fortran is called an assigned goto, but that name seems inappropriate in +C, where one can do more than simply store label addresses in label +variables.}, @code{goto *@var{exp};}. For example, -@findex __builtin_shuffle -Vector shuffling is available using functions -@code{__builtin_shuffle (vec, mask)} and -@code{__builtin_shuffle (vec0, vec1, mask)}. -Both functions construct a permutation of elements from one or two -vectors and return a vector of the same type as the input vector(s). 
-The @var{mask} is an integral vector with the same width (@var{W})
-and element count (@var{N}) as the output vector.
+@smallexample
+goto *ptr;
+@end smallexample

-The elements of the input vectors are numbered in memory ordering of
-@var{vec0} beginning at 0 and @var{vec1} beginning at @var{N}. The
-elements of @var{mask} are considered modulo @var{N} in the single-operand
-case and modulo @math{2*@var{N}} in the two-operand case.
+@noindent
+Any expression of type @code{void *} is allowed.

-Consider the following example,
+One way of using these constants is in initializing a static array that
+serves as a jump table:

@smallexample
-typedef int v4si __attribute__ ((vector_size (16)));
+static void *array[] = @{ &&foo, &&bar, &&hack @};
+@end smallexample

-v4si a = @{1,2,3,4@};
-v4si b = @{5,6,7,8@};
-v4si mask1 = @{0,1,1,3@};
-v4si mask2 = @{0,4,2,5@};
-v4si res;
+@noindent
+Then you can select a label with indexing, like this:

-res = __builtin_shuffle (a, mask1); /* res is @{1,2,2,4@} */
-res = __builtin_shuffle (a, b, mask2); /* res is @{1,5,3,6@} */
+@smallexample
+goto *array[i];
@end smallexample

-Note that @code{__builtin_shuffle} is intentionally semantically
-compatible with the OpenCL @code{shuffle} and @code{shuffle2} functions.
+@noindent
+Note that this does not check whether the subscript is in bounds---array
+indexing in C never does that.

-You can declare variables and use them in function calls and returns, as
-well as in assignments and some casts. You can specify a vector type as
-a return type for a function. Vector types can also be used as function
-arguments. It is possible to cast from one vector type to another,
-provided they are of the same size (in fact, you can also cast vectors
-to and from other data types of the same size).
+Such an array of label values serves a purpose much like that of the
+@code{switch} statement. The @code{switch} statement is cleaner, so
+use that rather than an array unless the problem does not fit a
+@code{switch} statement very well.

-You cannot operate between vectors of different lengths or different
-signedness without a cast.
+Another use of label values is in an interpreter for threaded code.
+The labels within the interpreter function can be stored in the
+threaded code for super-fast dispatching.

-@findex __builtin_shufflevector
-Vector shuffling is available using the
-@code{__builtin_shufflevector (vec1, vec2, index...)}
-function. @var{vec1} and @var{vec2} must be expressions with
-vector type with a compatible element type. The result of
-@code{__builtin_shufflevector} is a vector with the same element type
-as @var{vec1} and @var{vec2} but that has an element count equal to
-the number of indices specified.
+You may not use this mechanism to jump to code in a different function.
+If you do that, totally unpredictable things happen. The best way to
+avoid this is to store the label address only in automatic variables and
+never pass it as an argument.

-The @var{index} arguments are a list of integers that specify the
-element indices of the first two vectors that should be extracted and
-returned in a new vector. These element indices are numbered sequentially
-starting with the first vector, continuing into the second vector.
-An index of -1 can be used to indicate that the corresponding element in
-the returned vector is a don't care and can be freely chosen to optimize
-the generated code sequence performing the shuffle operation.
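For instance, a brief sketch of the don't-care indices described above:

@smallexample
typedef int v4si __attribute__ ((vector_size (16)));

v4si a = @{1,2,3,4@};
/* Only the first two lanes matter; the -1 lanes may hold whatever
   the compiler finds cheapest to produce.  */
v4si b = __builtin_shufflevector (a, a, 3, 0, -1, -1);
@end smallexample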
+An alternate way to write the jump-table example above is

-Consider the following example,
@smallexample
-typedef int v4si __attribute__ ((vector_size (16)));
-typedef int v8si __attribute__ ((vector_size (32)));
-
-v8si a = @{1,-2,3,-4,5,-6,7,-8@};
-v4si b = __builtin_shufflevector (a, a, 0, 2, 4, 6); /* b is @{1,3,5,7@} */
-v4si c = @{-2,-4,-6,-8@};
-v8si d = __builtin_shufflevector (c, b, 4, 0, 5, 1, 6, 2, 7, 3); /* d is a */
+static const int array[] = @{ &&foo - &&foo, &&bar - &&foo,
+                             &&hack - &&foo @};
+goto *(&&foo + array[i]);
@end smallexample

-@findex __builtin_convertvector
-Vector conversion is available using the
-@code{__builtin_convertvector (vec, vectype)}
-function. @var{vec} must be an expression with integral or floating
-vector type and @var{vectype} an integral or floating vector type with the
-same number of elements. The result has @var{vectype} type and value of
-a C cast of every element of @var{vec} to the element type of @var{vectype}.
+@noindent
+This is more friendly to code living in shared libraries, as it reduces
+the number of dynamic relocations that are needed and, as a consequence,
+allows the data to be read-only.
+This alternative with label differences is not supported on the AVR
+target; please use the first approach for AVR programs.

-Consider the following example,
-@smallexample
-typedef int v4si __attribute__ ((vector_size (16)));
-typedef float v4sf __attribute__ ((vector_size (16)));
-typedef double v4df __attribute__ ((vector_size (32)));
-typedef unsigned long long v4di __attribute__ ((vector_size (32)));
+The @code{&&foo} expressions for the same label might have different
+values if the containing function is inlined or cloned. If a program
+relies on them being always the same,
+@code{__attribute__((__noinline__,__noclone__))} should be used to
+prevent inlining and cloning. If @code{&&foo} is used in a static
+variable initializer, inlining and cloning are forbidden.

-v4si a = @{1,-2,3,-4@};
-v4sf b = @{1.5f,-2.5f,3.f,7.f@};
-v4di c = @{1ULL,5ULL,0ULL,10ULL@};
-v4sf d = __builtin_convertvector (a, v4sf); /* d is @{1.f,-2.f,3.f,-4.f@} */
-/* Equivalent of:
-   v4sf d = @{ (float)a[0], (float)a[1], (float)a[2], (float)a[3] @}; */
-v4df e = __builtin_convertvector (a, v4df); /* e is @{1.,-2.,3.,-4.@} */
-v4df f = __builtin_convertvector (b, v4df); /* f is @{1.5,-2.5,3.,7.@} */
-v4si g = __builtin_convertvector (f, v4si); /* g is @{1,-2,3,7@} */
-v4si h = __builtin_convertvector (c, v4si); /* h is @{1,5,0,10@} */
-@end smallexample
+Unlike a normal goto, in GNU C++ a computed goto will not call
+destructors for objects that go out of scope.

-@cindex vector types, using with x86 intrinsics
-Sometimes it is desirable to write code using a mix of generic vector
-operations (for clarity) and machine-specific vector intrinsics (to
-access vector instructions that are not exposed via generic built-ins).
-On x86, intrinsic functions for integer vectors typically use the same
-vector type @code{__m128i} irrespective of how they interpret the vector,
-making it necessary to cast their arguments and return values from/to
-other vector types.
In C, you can make use of a @code{union} type: -@c In C++ such type punning via a union is not allowed by the language -@smallexample -#include +@node Nested Functions +@subsection Nested Functions +@cindex nested functions +@cindex downward funargs +@cindex thunks -typedef unsigned char u8x16 __attribute__ ((vector_size (16))); -typedef unsigned int u32x4 __attribute__ ((vector_size (16))); +A @dfn{nested function} is a function defined inside another function. +Nested functions are supported as an extension in GNU C, but are not +supported by GNU C++. -typedef union @{ - __m128i mm; - u8x16 u8; - u32x4 u32; -@} v128; +The nested function's name is local to the block where it is defined. +For example, here we define a nested function named @code{square}, and +call it twice: + +@smallexample +@group +foo (double a, double b) +@{ + double square (double z) @{ return z * z; @} + + return square (a) + square (b); +@} +@end group @end smallexample -@noindent -for variables that can be used with both built-in operators and x86 -intrinsics: +The nested function can access all the variables of the containing +function that are visible at the point of its definition. This is +called @dfn{lexical scoping}. For example, here we show a nested +function which uses an inherited variable named @code{offset}: @smallexample -v128 x, y = @{ 0 @}; -memcpy (&x, ptr, sizeof x); -y.u8 += 0x80; -x.mm = _mm_adds_epu8 (x.mm, y.mm); -x.u32 &= 0xffffff; - -/* Instead of a variable, a compound literal may be used to pass the - return value of an intrinsic call to a function expecting the union: */ -v128 foo (v128); -x = foo ((v128) @{_mm_adds_epu8 (x.mm, y.mm)@}); -@c This could be done implicitly with __attribute__((transparent_union)), -@c but GCC does not accept it for unions of vector types (PR 88955). +@group +bar (int *array, int offset, int size) +@{ + int access (int *array, int index) + @{ return array[index + offset]; @} + int i; + /* @r{@dots{}} */ + for (i = 0; i < size; i++) + /* @r{@dots{}} */ access (array, i) /* @r{@dots{}} */ +@} +@end group @end smallexample -@node __sync Builtins -@section Legacy @code{__sync} Built-in Functions for Atomic Memory Access +Nested function definitions are permitted within functions in the places +where variable definitions are allowed; that is, in any block, mixed +with the other declarations and statements in the block. -The following built-in functions -are intended to be compatible with those described -in the @cite{Intel Itanium Processor-specific Application Binary Interface}, -section 7.4. As such, they depart from normal GCC practice by not using -the @samp{__builtin_} prefix and also by being overloaded so that they -work on multiple types. +It is possible to call the nested function from outside the scope of its +name by storing its address or passing the address to another function: -The definition given in the Intel documentation allows only for the use of -the types @code{int}, @code{long}, @code{long long} or their unsigned -counterparts. GCC allows any scalar type that is 1, 2, 4 or 8 bytes in -size other than the C type @code{_Bool} or the C++ type @code{bool}. -Operations on pointer arguments are performed as if the operands were -of the @code{uintptr_t} type. That is, they are not scaled by the size -of the type to which the pointer points. 
+@smallexample +hack (int *array, int size) +@{ + void store (int index, int value) + @{ array[index] = value; @} -These functions are implemented in terms of the @samp{__atomic} -builtins (@pxref{__atomic Builtins}). They should not be used for new -code which should use the @samp{__atomic} builtins instead. + intermediate (store, size); +@} +@end smallexample -Not all operations are supported by all target processors. If a particular -operation cannot be implemented on the target processor, a call to an -external function is generated. The external function carries the same name -as the built-in version, with an additional suffix -@samp{_@var{n}} where @var{n} is the size of the data type. +Here, the function @code{intermediate} receives the address of +@code{store} as an argument. If @code{intermediate} calls @code{store}, +the arguments given to @code{store} are used to store into @code{array}. +But this technique works only so long as the containing function +(@code{hack}, in this example) does not exit. -In most cases, these built-in functions are considered a @dfn{full barrier}. -That is, -no memory operand is moved across the operation, either forward or -backward. Further, instructions are issued as necessary to prevent the -processor from speculating loads across the operation and from queuing stores -after the operation. +If you try to call the nested function through its address after the +containing function exits, all hell breaks loose. If you try +to call it after a containing scope level exits, and if it refers +to some of the variables that are no longer in scope, you may be lucky, +but it's not wise to take the risk. If, however, the nested function +does not refer to anything that has gone out of scope, you should be +safe. -All of the routines are described in the Intel documentation to take -``an optional list of variables protected by the memory barrier''. It's -not clear what is meant by that; it could mean that @emph{only} the -listed variables are protected, or it could mean a list of additional -variables to be protected. The list is ignored by GCC which treats it as -empty. GCC interprets an empty list as meaning that all globally -accessible variables should be protected. +GCC implements taking the address of a nested function using a technique +called @dfn{trampolines}. This technique was described in +@cite{Lexical Closures for C++} (Thomas M. Breuel, USENIX +C++ Conference Proceedings, October 17-21, 1988). -@defbuiltin{@var{type} __sync_fetch_and_add (@var{type} *@var{ptr}, @var{type} @var{value}, ...)} -@defbuiltinx{@var{type} __sync_fetch_and_sub (@var{type} *@var{ptr}, @var{type} @var{value}, ...)} -@defbuiltinx{@var{type} __sync_fetch_and_or (@var{type} *@var{ptr}, @var{type} @var{value}, ...)} -@defbuiltinx{@var{type} __sync_fetch_and_and (@var{type} *@var{ptr}, @var{type} @var{value}, ...)} -@defbuiltinx{@var{type} __sync_fetch_and_xor (@var{type} *@var{ptr}, @var{type} @var{value}, ...)} -@defbuiltinx{@var{type} __sync_fetch_and_nand (@var{type} *@var{ptr}, @var{type} @var{value}, ...)} -These built-in functions perform the operation suggested by the name, and -returns the value that had previously been in memory. That is, operations -on integer operands have the following semantics. Operations on pointer -arguments are performed as if the operands were of the @code{uintptr_t} -type. That is, they are not scaled by the size of the type to which -the pointer points. 
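+For example (a sketch of ours, not from the GCC sources), a nested
+comparison function that reads a variable of its containing function
+can be handed to @code{qsort}; this is safe because @code{sort_ints}
+has not exited while @code{qsort} is running:
+
+@smallexample
+#include <stdlib.h>
+
+void
+sort_ints (int *array, size_t n, int ascending)
+@{
+  /* cmp reaches ascending in the enclosing frame through the
+     trampoline GCC builds when cmp's address is taken.  */
+  int cmp (const void *a, const void *b)
+  @{
+    int x = *(const int *) a, y = *(const int *) b;
+    int d = (x > y) - (x < y);
+    return ascending ? d : -d;
+  @}
+  qsort (array, n, sizeof *array, cmp);
+@}
+@end smallexample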
+A nested function can jump to a label inherited from a containing
+function, provided the label is explicitly declared in the containing
+function (@pxref{Local Labels}). Such a jump returns instantly to the
+containing function, exiting the nested function that did the
+@code{goto} and any intermediate functions as well. Here is an example:

@smallexample
-@{ tmp = *ptr; *ptr @var{op}= value; return tmp; @}
-@{ tmp = *ptr; *ptr = ~(tmp & value); return tmp; @} // nand
+@group
+bar (int *array, int offset, int size)
+@{
+  __label__ failure;
+  int access (int *array, int index)
+    @{
+      if (index > size)
+        goto failure;
+      return array[index + offset];
+    @}
+  int i;
+  /* @r{@dots{}} */
+  for (i = 0; i < size; i++)
+    /* @r{@dots{}} */ access (array, i) /* @r{@dots{}} */
+  /* @r{@dots{}} */
+  return 0;
+
+ /* @r{Control comes here from @code{access}
+    if it detects an error.} */
+ failure:
+  return -1;
+@}
+@end group
@end smallexample

-The object pointed to by the first argument must be of integer or pointer
-type. It must not be a boolean type.
-
-@emph{Note:} GCC 4.4 and later implement @code{__sync_fetch_and_nand}
-as @code{*ptr = ~(tmp & value)} instead of @code{*ptr = ~tmp & value}.
-@enddefbuiltin
-
-@defbuiltin{@var{type} __sync_add_and_fetch (@var{type} *@var{ptr}, @
-  @var{type} @var{value}, ...)}
-@defbuiltinx{@var{type} __sync_sub_and_fetch (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
-@defbuiltinx{@var{type} __sync_or_and_fetch (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
-@defbuiltinx{@var{type} __sync_and_and_fetch (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
-@defbuiltinx{@var{type} __sync_xor_and_fetch (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
-@defbuiltinx{@var{type} __sync_nand_and_fetch (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
-These built-in functions perform the operation suggested by the name, and
-return the new value. That is, operations on integer operands have
-the following semantics. Operations on pointer operands are performed as
-if the operand's type were @code{uintptr_t}.
+A nested function always has no linkage. Declaring one with
+@code{extern} or @code{static} is erroneous. If you need to declare
+the nested function before its definition, use @code{auto} (which is
+otherwise meaningless for function declarations).

@smallexample
-@{ *ptr @var{op}= value; return *ptr; @}
-@{ *ptr = ~(*ptr & value); return *ptr; @} // nand
+bar (int *array, int offset, int size)
+@{
+  __label__ failure;
+  auto int access (int *, int);
+  /* @r{@dots{}} */
+  int access (int *array, int index)
+    @{
+      if (index > size)
+        goto failure;
+      return array[index + offset];
+    @}
+  /* @r{@dots{}} */
+@}
@end smallexample

-The same constraints on arguments apply as for the corresponding
-@code{__sync_op_and_fetch} built-in functions.
+@node Typeof
+@subsection Referring to a Type with @code{typeof}
+@findex typeof
+@findex sizeof
+@cindex macros, types of arguments

-@emph{Note:} GCC 4.4 and later implement @code{__sync_nand_and_fetch}
-as @code{*ptr = ~(*ptr & value)} instead of
-@code{*ptr = ~*ptr & value}.
-@enddefbuiltin
+Another way to refer to the type of an expression is with @code{typeof}.
+The syntax of this keyword looks like that of @code{sizeof}, but the
+construct acts semantically like a type name defined with @code{typedef}.
-@defbuiltin{bool __sync_bool_compare_and_swap (@var{type} *@var{ptr}, @var{type} @var{oldval}, @var{type} @var{newval}, ...)} -@defbuiltinx{@var{type} __sync_val_compare_and_swap (@var{type} *@var{ptr}, @var{type} @var{oldval}, @var{type} @var{newval}, ...)} -These built-in functions perform an atomic compare and swap. -That is, if the current -value of @code{*@var{ptr}} is @var{oldval}, then write @var{newval} into -@code{*@var{ptr}}. +There are two ways of writing the argument to @code{typeof}: with an +expression or with a type. Here is an example with an expression: -The ``bool'' version returns @code{true} if the comparison is successful and -@var{newval} is written. The ``val'' version returns the contents -of @code{*@var{ptr}} before the operation. -@enddefbuiltin +@smallexample +typeof (x[0](1)) +@end smallexample -@defbuiltin{void __sync_synchronize (...)} -This built-in function issues a full memory barrier. -@enddefbuiltin +@noindent +This assumes that @code{x} is an array of pointers to functions; +the type described is that of the values of the functions. -@defbuiltin{@var{type} __sync_lock_test_and_set (@var{type} *@var{ptr}, @var{type} @var{value}, ...)} -This built-in function, as described by Intel, is not a traditional test-and-set -operation, but rather an atomic exchange operation. It writes @var{value} -into @code{*@var{ptr}}, and returns the previous contents of -@code{*@var{ptr}}. +Here is an example with a typename as the argument: -Many targets have only minimal support for such locks, and do not support -a full exchange operation. In this case, a target may support reduced -functionality here by which the @emph{only} valid value to store is the -immediate constant 1. The exact value actually stored in @code{*@var{ptr}} -is implementation defined. +@smallexample +typeof (int *) +@end smallexample -This built-in function is not a full barrier, -but rather an @dfn{acquire barrier}. -This means that references after the operation cannot move to (or be -speculated to) before the operation, but previous memory stores may not -be globally visible yet, and previous memory loads may not yet be -satisfied. -@enddefbuiltin +@noindent +Here the type described is that of pointers to @code{int}. -@defbuiltin{void __sync_lock_release (@var{type} *@var{ptr}, ...)} -This built-in function releases the lock acquired by -@code{__sync_lock_test_and_set}. -Normally this means writing the constant 0 to @code{*@var{ptr}}. +If you are writing a header file that must work when included in ISO C +programs, write @code{__typeof__} instead of @code{typeof}. +@xref{Alternate Keywords}. -This built-in function is not a full barrier, -but rather a @dfn{release barrier}. -This means that all previous memory stores are globally visible, and all -previous memory loads have been satisfied, but following memory reads -are not prevented from being speculated to before the barrier. -@enddefbuiltin +A @code{typeof} construct can be used anywhere a typedef name can be +used. For example, you can use it in a declaration, in a cast, or inside +of @code{sizeof} or @code{typeof}. -@node __atomic Builtins -@section Built-in Functions for Memory Model Aware Atomic Operations +The operand of @code{typeof} is evaluated for its side effects if and +only if it is an expression of variably modified type or the name of +such a type. -The following built-in functions approximately match the requirements -for the C++11 memory model. 
They are all -identified by being prefixed with @samp{__atomic} and most are -overloaded so that they work with multiple types. +@code{typeof} is often useful in conjunction with +statement expressions (@pxref{Statement Exprs}). +Here is how the two together can +be used to define a safe ``maximum'' macro which operates on any +arithmetic type and evaluates each of its arguments exactly once: -These functions are intended to replace the legacy @samp{__sync} -builtins. The main difference is that the memory order that is requested -is a parameter to the functions. New code should always use the -@samp{__atomic} builtins rather than the @samp{__sync} builtins. +@smallexample +#define max(a,b) \ + (@{ typeof (a) _a = (a); \ + typeof (b) _b = (b); \ + _a > _b ? _a : _b; @}) +@end smallexample -Note that the @samp{__atomic} builtins assume that programs will -conform to the C++11 memory model. In particular, they assume -that programs are free of data races. See the C++11 standard for -detailed requirements. +@cindex underscores in variables in macros +@cindex @samp{_} in variables in macros +@cindex local variables in macros +@cindex variables, local, in macros +@cindex macros, local variables in -The @samp{__atomic} builtins can be used with any integral scalar or -pointer type that is 1, 2, 4, or 8 bytes in length. 16-byte integral -types are also allowed if @samp{__int128} (@pxref{__int128}) is -supported by the architecture. +The reason for using names that start with underscores for the local +variables is to avoid conflicts with variable names that occur within the +expressions that are substituted for @code{a} and @code{b}. Eventually we +hope to design a new form of declaration syntax that allows you to declare +variables whose scopes start only after their initializers; this will be a +more reliable way to prevent such conflicts. -The four non-arithmetic functions (load, store, exchange, and -compare_exchange) all have a generic version as well. This generic -version works on any data type. It uses the lock-free built-in function -if the specific data type size makes that possible; otherwise, an -external call is left to be resolved at run time. This external call is -the same format with the addition of a @samp{size_t} parameter inserted -as the first parameter indicating the size of the object being pointed to. -All objects must be the same size. +@noindent +Some more examples of the use of @code{typeof}: -There are 6 different memory orders that can be specified. These map -to the C++11 memory orders with the same names, see the C++11 standard -or the @uref{https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync,GCC wiki -on atomic synchronization} for detailed definitions. Individual -targets may also support additional memory orders for use on specific -architectures. Refer to the target documentation for details of -these. +@itemize @bullet +@item +This declares @code{y} with the type of what @code{x} points to. -An atomic operation can both constrain code motion and -be mapped to hardware instructions for synchronization between threads -(e.g., a fence). To which extent this happens is controlled by the -memory orders, which are listed here in approximately ascending order of -strength. The description of each memory order is only meant to roughly -illustrate the effects and is not a specification; see the C++11 -memory model for precise semantics. +@smallexample +typeof (*x) y; +@end smallexample -@table @code -@item __ATOMIC_RELAXED -Implies no inter-thread ordering constraints. 
-@item __ATOMIC_CONSUME
-This is currently implemented using the stronger @code{__ATOMIC_ACQUIRE}
-memory order because of a deficiency in C++11's semantics for
-@code{memory_order_consume}.
-@item __ATOMIC_ACQUIRE
-Creates an inter-thread happens-before constraint from the release (or
-stronger) semantic store to this acquire load. Can prevent hoisting
-of code to before the operation.
-@item __ATOMIC_RELEASE
-Creates an inter-thread happens-before constraint to acquire (or stronger)
-semantic loads that read from this release store. Can prevent sinking
-of code to after the operation.
-@item __ATOMIC_ACQ_REL
-Combines the effects of both @code{__ATOMIC_ACQUIRE} and
-@code{__ATOMIC_RELEASE}.
-@item __ATOMIC_SEQ_CST
-Enforces total ordering with all other @code{__ATOMIC_SEQ_CST} operations.
-@end table
+@item
+This declares @code{y} as an array of such values.

-Note that in the C++11 memory model, @emph{fences} (e.g.,
-@samp{__atomic_thread_fence}) take effect in combination with other
-atomic operations on specific memory locations (e.g., atomic loads);
-operations on specific memory locations do not necessarily affect other
-operations in the same way.
+@smallexample
+typeof (*x) y[4];
+@end smallexample

-Target architectures are encouraged to provide their own patterns for
-each of the atomic built-in functions. If no target is provided, the original
-non-memory model set of @samp{__sync} atomic built-in functions are
-used, along with any required synchronization fences surrounding it in
-order to achieve the proper behavior. Execution in this case is subject
-to the same restrictions as those built-in functions.
+@item
+This declares @code{y} as an array of pointers to characters:

-If there is no pattern or mechanism to provide a lock-free instruction
-sequence, a call is made to an external routine with the same parameters
-to be resolved at run time.
+@smallexample
+typeof (typeof (char *)[4]) y;
+@end smallexample

-When implementing patterns for these built-in functions, the memory order
-parameter can be ignored as long as the pattern implements the most
-restrictive @code{__ATOMIC_SEQ_CST} memory order. Any of the other memory
-orders execute correctly with this memory order but they may not execute as
-efficiently as they could with a more appropriate implementation of the
-relaxed requirements.
+@noindent
+It is equivalent to the following traditional C declaration:

-Note that the C++11 standard allows for the memory order parameter to be
-determined at run time rather than at compile time. These built-in
-functions map any run-time value to @code{__ATOMIC_SEQ_CST} rather
-than invoke a runtime library call or inline a switch statement. This is
-standard compliant, safe, and the simplest approach for now.
+@smallexample
+char *y[4];
+@end smallexample

-The memory order parameter is a signed int, but only the lower 16 bits are
-reserved for the memory order. The remainder of the signed int is reserved
-for target use and should be 0. Use of the predefined atomic values
-ensures proper usage.
+To see the meaning of the declaration using @code{typeof}, and why it
+might be a useful way to write it, rewrite it with these macros:

-@defbuiltin{@var{type} __atomic_load_n (@var{type} *@var{ptr}, int @var{memorder})}
-This built-in function implements an atomic load operation. It returns the
-contents of @code{*@var{ptr}}.
+@smallexample
+#define pointer(T) typeof(T *)
+#define array(T, N) typeof(T [N])
+@end smallexample

-The valid memory order variants are
-@code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, @code{__ATOMIC_ACQUIRE},
-and @code{__ATOMIC_CONSUME}.
+@noindent
+Now the declaration can be rewritten this way:

-@enddefbuiltin
+@smallexample
+array (pointer (char), 4) y;
+@end smallexample

-@defbuiltin{void __atomic_load (@var{type} *@var{ptr}, @var{type} *@var{ret}, int @var{memorder})}
-This is the generic version of an atomic load. It returns the
-contents of @code{*@var{ptr}} in @code{*@var{ret}}.
+@noindent
+Thus, @code{array (pointer (char), 4)} is the type of arrays of 4
+pointers to @code{char}.
+@end itemize

-@enddefbuiltin
+The ISO C23 operator @code{typeof_unqual} is available in ISO C23 mode
+and its result is the non-atomic unqualified version of what the
+@code{typeof} operator returns. The alternate spelling
+@code{__typeof_unqual__} is available in all C modes and provides the
+non-atomic unqualified version of what the @code{__typeof__} operator
+returns.
+@xref{Alternate Keywords}.

-@defbuiltin{void __atomic_store_n (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})}
-This built-in function implements an atomic store operation. It writes
-@code{@var{val}} into @code{*@var{ptr}}.
+@cindex @code{__auto_type} in GNU C
+In GNU C, but not GNU C++, you may also declare the type of a variable
+as @code{__auto_type}. In that case, the declaration must declare
+only one variable, whose declarator must just be an identifier, the
+declaration must be initialized, and the type of the variable is
+determined by the initializer; the name of the variable is not in
+scope until after the initializer. (In C++, you should use C++11
+@code{auto} for this purpose.) Using @code{__auto_type}, the
+``maximum'' macro above could be written as:

-The valid memory order variants are
-@code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, and @code{__ATOMIC_RELEASE}.
+@smallexample
+#define max(a,b) \
+  (@{ __auto_type _a = (a); \
+     __auto_type _b = (b); \
+     _a > _b ? _a : _b; @})
+@end smallexample

-@enddefbuiltin
+Using @code{__auto_type} instead of @code{typeof} has two advantages:

-@defbuiltin{void __atomic_store (@var{type} *@var{ptr}, @var{type} *@var{val}, int @var{memorder})}
-This is the generic version of an atomic store. It stores the value
-of @code{*@var{val}} into @code{*@var{ptr}}.
+@itemize @bullet
+@item Each argument to the macro appears only once in the expansion of
+the macro. This prevents the size of the macro expansion growing
+exponentially when calls to such macros are nested inside arguments of
+such macros.

-@enddefbuiltin
+@item If the argument to the macro has variably modified type, it is
+evaluated only once when using @code{__auto_type}, but twice if
+@code{typeof} is used.
+@end itemize

-@defbuiltin{@var{type} __atomic_exchange_n (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})}
-This built-in function implements an atomic exchange operation. It writes
-@var{val} into @code{*@var{ptr}}, and returns the previous contents of
-@code{*@var{ptr}}.
+@node Offsetof
+@subsection Support for @code{offsetof}
+@findex __builtin_offsetof

-All memory order variants are valid.
+GCC implements for both C and C++ a syntactic extension to implement
+the @code{offsetof} macro.
-@enddefbuiltin +@smallexample +primary: + "__builtin_offsetof" "(" @code{typename} "," offsetof_member_designator ")" -@defbuiltin{void __atomic_exchange (@var{type} *@var{ptr}, @var{type} *@var{val}, @var{type} *@var{ret}, int @var{memorder})} -This is the generic version of an atomic exchange. It stores the -contents of @code{*@var{val}} into @code{*@var{ptr}}. The original value -of @code{*@var{ptr}} is copied into @code{*@var{ret}}. +offsetof_member_designator: + @code{identifier} + | offsetof_member_designator "." @code{identifier} + | offsetof_member_designator "[" @code{expr} "]" +@end smallexample -@enddefbuiltin +This extension is sufficient such that -@defbuiltin{bool __atomic_compare_exchange_n (@var{type} *@var{ptr}, @var{type} *@var{expected}, @var{type} @var{desired}, bool @var{weak}, int @var{success_memorder}, int @var{failure_memorder})} -This built-in function implements an atomic compare and exchange operation. -This compares the contents of @code{*@var{ptr}} with the contents of -@code{*@var{expected}}. If equal, the operation is a @emph{read-modify-write} -operation that writes @var{desired} into @code{*@var{ptr}}. If they are not -equal, the operation is a @emph{read} and the current contents of -@code{*@var{ptr}} are written into @code{*@var{expected}}. @var{weak} is @code{true} -for weak compare_exchange, which may fail spuriously, and @code{false} for -the strong variation, which never fails spuriously. Many targets -only offer the strong variation and ignore the parameter. When in doubt, use -the strong variation. +@smallexample +#define offsetof(@var{type}, @var{member}) __builtin_offsetof (@var{type}, @var{member}) +@end smallexample -If @var{desired} is written into @code{*@var{ptr}} then @code{true} is returned -and memory is affected according to the -memory order specified by @var{success_memorder}. There are no -restrictions on what memory order can be used here. +@noindent +is a suitable definition of the @code{offsetof} macro. In C++, @var{type} +may be dependent. In either case, @var{member} may consist of a single +identifier, or a sequence of member accesses and array references. -Otherwise, @code{false} is returned and memory is affected according -to @var{failure_memorder}. This memory order cannot be -@code{__ATOMIC_RELEASE} nor @code{__ATOMIC_ACQ_REL}. It also cannot be a -stronger order than that specified by @var{success_memorder}. +@node Alignment +@subsection Determining the Alignment of Functions, Types or Variables +@cindex alignment +@cindex type alignment +@cindex variable alignment -@enddefbuiltin +The keyword @code{__alignof__} determines the alignment requirement of +a function, object, or a type, or the minimum alignment usually required +by a type. Its syntax is just like @code{sizeof} and C11 @code{_Alignof}. -@defbuiltin{bool __atomic_compare_exchange (@var{type} *@var{ptr}, @var{type} *@var{expected}, @var{type} *@var{desired}, bool @var{weak}, int @var{success_memorder}, int @var{failure_memorder})} -This built-in function implements the generic version of -@code{__atomic_compare_exchange}. The function is virtually identical to -@code{__atomic_compare_exchange_n}, except the desired value is also a -pointer. +For example, if the target machine requires a @code{double} value to be +aligned on an 8-byte boundary, then @code{__alignof__ (double)} is 8. +This is true on many RISC machines. On more traditional machine +designs, @code{__alignof__ (double)} is 4 or even 2. 
-@enddefbuiltin +Some machines never actually require alignment; they allow references to any +data type even at an odd address. For these machines, @code{__alignof__} +reports the smallest alignment that GCC gives the data type, usually as +mandated by the target ABI. -@defbuiltin{@var{type} __atomic_add_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -@defbuiltinx{@var{type} __atomic_sub_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -@defbuiltinx{@var{type} __atomic_and_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -@defbuiltinx{@var{type} __atomic_xor_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -@defbuiltinx{@var{type} __atomic_or_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -@defbuiltinx{@var{type} __atomic_nand_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -These built-in functions perform the operation suggested by the name, and -return the result of the operation. Operations on pointer arguments are -performed as if the operands were of the @code{uintptr_t} type. That is, -they are not scaled by the size of the type to which the pointer points. +If the operand of @code{__alignof__} is an lvalue rather than a type, +its value is the required alignment for its type, taking into account +any minimum alignment specified by attribute @code{aligned} +(@pxref{Common Variable Attributes}). For example, after this +declaration: @smallexample -@{ *ptr @var{op}= val; return *ptr; @} -@{ *ptr = ~(*ptr & val); return *ptr; @} // nand +struct foo @{ int x; char y; @} foo1; @end smallexample -The object pointed to by the first argument must be of integer or pointer -type. It must not be a boolean type. All memory orders are valid. +@noindent +the value of @code{__alignof__ (foo1.y)} is 1, even though its actual +alignment is probably 2 or 4, the same as @code{__alignof__ (int)}. +It is an error to ask for the alignment of an incomplete type other +than @code{void}. -@enddefbuiltin +If the operand of the @code{__alignof__} expression is a function, +the expression evaluates to the alignment of the function which may +be specified by attribute @code{aligned} (@pxref{Common Function Attributes}). -@defbuiltin{@var{type} __atomic_fetch_add (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -@defbuiltinx{@var{type} __atomic_fetch_sub (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -@defbuiltinx{@var{type} __atomic_fetch_and (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -@defbuiltinx{@var{type} __atomic_fetch_xor (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -@defbuiltinx{@var{type} __atomic_fetch_or (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -@defbuiltinx{@var{type} __atomic_fetch_nand (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} -These built-in functions perform the operation suggested by the name, and -return the value that had previously been in @code{*@var{ptr}}. Operations -on pointer arguments are performed as if the operands were of -the @code{uintptr_t} type. That is, they are not scaled by the size of -the type to which the pointer points. 
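+The behavior described above can be checked with a small program (a
+sketch of ours; the values printed are typical for common ABIs, not
+guaranteed):
+
+@smallexample
+#include <stdio.h>
+
+struct foo @{ int x; char y; @} foo1;
+
+int
+main (void)
+@{
+  /* Required alignment of the member's own type: 1 for char.  */
+  printf ("%zu\n", (size_t) __alignof__ (foo1.y));
+  /* Alignment of the whole struct: that of int, commonly 4.  */
+  printf ("%zu\n", (size_t) __alignof__ (struct foo));
+  return 0;
+@}
+@end smallexample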
+@node Incomplete Enums +@subsection Incomplete @code{enum} Types -@smallexample -@{ tmp = *ptr; *ptr @var{op}= val; return tmp; @} -@{ tmp = *ptr; *ptr = ~(*ptr & val); return tmp; @} // nand -@end smallexample +You can define an @code{enum} tag without specifying its possible values. +This results in an incomplete type, much like what you get if you write +@code{struct foo} without describing the elements. A later declaration +that does specify the possible values completes the type. -The same constraints on arguments apply as for the corresponding -@code{__atomic_op_fetch} built-in functions. All memory orders are valid. +You cannot allocate variables or storage using the type while it is +incomplete. However, you can work with pointers to that type. -@enddefbuiltin +This extension may not be very useful, but it makes the handling of +@code{enum} more consistent with the way @code{struct} and @code{union} +are handled. -@defbuiltin{bool __atomic_test_and_set (void *@var{ptr}, int @var{memorder})} +This extension is not supported by GNU C++. -This built-in function performs an atomic test-and-set operation on -the byte at @code{*@var{ptr}}. The byte is set to some implementation -defined nonzero ``set'' value and the return value is @code{true} if and only -if the previous contents were ``set''. -It should be only used for operands of type @code{bool} or @code{char}. For -other types only part of the value may be set. +@node Variadic Macros +@subsection Macros with a Variable Number of Arguments. +@cindex variable number of arguments +@cindex macro with variable arguments +@cindex rest argument (in macro) +@cindex variadic macros -All memory orders are valid. +In the ISO C standard of 1999, a macro can be declared to accept a +variable number of arguments much as a function can. The syntax for +defining the macro is similar to that of a function. Here is an +example: -@enddefbuiltin +@smallexample +#define debug(format, ...) fprintf (stderr, format, __VA_ARGS__) +@end smallexample -@defbuiltin{void __atomic_clear (bool *@var{ptr}, int @var{memorder})} +@noindent +Here @samp{@dots{}} is a @dfn{variable argument}. In the invocation of +such a macro, it represents the zero or more tokens until the closing +parenthesis that ends the invocation, including any commas. This set of +tokens replaces the identifier @code{__VA_ARGS__} in the macro body +wherever it appears. See the CPP manual for more information. -This built-in function performs an atomic clear operation on -@code{*@var{ptr}}. After the operation, @code{*@var{ptr}} contains 0. -It should be only used for operands of type @code{bool} or @code{char} and -in conjunction with @code{__atomic_test_and_set}. -For other types it may only clear partially. If the type is not @code{bool} -prefer using @code{__atomic_store}. +GCC has long supported variadic macros, and used a different syntax that +allowed you to give a name to the variable arguments just like any other +argument. Here is an example: -The valid memory order variants are -@code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, and -@code{__ATOMIC_RELEASE}. +@smallexample +#define debug(format, args...) fprintf (stderr, format, args) +@end smallexample -@enddefbuiltin +@noindent +This is in all ways equivalent to the ISO C example above, but arguably +more readable and descriptive. -@defbuiltin{void __atomic_thread_fence (int @var{memorder})} +GNU CPP has two further variadic macro extensions, and permits them to +be used with either of the above forms of macro definition. 
-
+In standard C, you are not allowed to leave the variable argument out
-This built-in function acts as a synchronization fence between threads
-based on the specified memory order.
+entirely; but you are allowed to pass an empty argument. For example,
+this invocation is invalid in ISO C, because there is no comma after
+the string:

-All memory orders are valid.
+@smallexample
+debug ("A message")
+@end smallexample

-@enddefbuiltin
+GNU CPP permits you to completely omit the variable arguments in this
+way. In the above examples, the compiler would complain, though, since
+the expansion of the macro still has the extra comma after the format
+string.

-@defbuiltin{void __atomic_signal_fence (int @var{memorder})}
-
-This built-in function acts as a synchronization fence between a thread
-and signal handlers based in the same thread.
-
-All memory orders are valid.
-
-@enddefbuiltin
-
-@defbuiltin{bool __atomic_always_lock_free (size_t @var{size}, void *@var{ptr})}
-
-This built-in function returns @code{true} if objects of @var{size} bytes always
-generate lock-free atomic instructions for the target architecture.
-@var{size} must resolve to a compile-time constant and the result also
-resolves to a compile-time constant.
-
-@var{ptr} is an optional pointer to the object that may be used to determine
-alignment. A value of 0 indicates typical alignment should be used. The
-compiler may also ignore this parameter.
+To help solve this problem, CPP behaves specially for variable arguments
+used with the token paste operator, @samp{##}. If instead you write

@smallexample
-if (__atomic_always_lock_free (sizeof (long long), 0))
+#define debug(format, ...) fprintf (stderr, format, ## __VA_ARGS__)
@end smallexample

-@enddefbuiltin
-
-@defbuiltin{bool __atomic_is_lock_free (size_t @var{size}, void *@var{ptr})}
-
-This built-in function returns @code{true} if objects of @var{size} bytes always
-generate lock-free atomic instructions for the target architecture. If
-the built-in function is not known to be lock-free, a call is made to a
-runtime routine named @code{__atomic_is_lock_free}.
-
-@var{ptr} is an optional pointer to the object that may be used to determine
-alignment. A value of 0 indicates typical alignment should be used. The
-compiler may also ignore this parameter.
-@enddefbuiltin
-
-@node Integer Overflow Builtins
-@section Built-in Functions to Perform Arithmetic with Overflow Checking
-
-The following built-in functions allow performing simple arithmetic operations
-together with checking whether the operations overflowed.
-
-@defbuiltin{bool __builtin_add_overflow (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} *@var{res})}
-@defbuiltinx{bool __builtin_sadd_overflow (int @var{a}, int @var{b}, int *@var{res})}
-@defbuiltinx{bool __builtin_saddl_overflow (long int @var{a}, long int @var{b}, long int *@var{res})}
-@defbuiltinx{bool __builtin_saddll_overflow (long long int @var{a}, long long int @var{b}, long long int *@var{res})}
-@defbuiltinx{bool __builtin_uadd_overflow (unsigned int @var{a}, unsigned int @var{b}, unsigned int *@var{res})}
-@defbuiltinx{bool __builtin_uaddl_overflow (unsigned long int @var{a}, unsigned long int @var{b}, unsigned long int *@var{res})}
-@defbuiltinx{bool __builtin_uaddll_overflow (unsigned long long int @var{a}, unsigned long long int @var{b}, unsigned long long int *@var{res})}

+@noindent
+and if the variable arguments are omitted or empty, the @samp{##}
+operator causes the preprocessor to remove the comma before it.
If you +do provide some variable arguments in your macro invocation, GNU CPP +does not complain about the paste operation and instead places the +variable arguments after the comma. Just like any other pasted macro +argument, these arguments are not macro expanded. -These built-in functions promote the first two operands into infinite precision signed -type and perform addition on those promoted operands. The result is then -cast to the type the third pointer argument points to and stored there. -If the stored result is equal to the infinite precision result, the built-in -functions return @code{false}, otherwise they return @code{true}. As the addition is -performed in infinite signed precision, these built-in functions have fully defined -behavior for all argument values. +@node Conditionals +@subsection Conditionals with Omitted Operands +@cindex conditional expressions, extensions +@cindex omitted middle-operands +@cindex middle-operands, omitted +@cindex extensions, @code{?:} +@cindex @code{?:} extensions -The first built-in function allows arbitrary integral types for operands and -the result type must be pointer to some integral type other than enumerated or -boolean type, the rest of the built-in functions have explicit integer types. +The middle operand in a conditional expression may be omitted. Then +if the first operand is nonzero, its value is the value of the conditional +expression. -The compiler will attempt to use hardware instructions to implement -these built-in functions where possible, like conditional jump on overflow -after addition, conditional jump on carry etc. +Therefore, the expression -@enddefbuiltin +@smallexample +x ? : y +@end smallexample -@defbuiltin{bool __builtin_sub_overflow (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} *@var{res})} -@defbuiltinx{bool __builtin_ssub_overflow (int @var{a}, int @var{b}, int *@var{res})} -@defbuiltinx{bool __builtin_ssubl_overflow (long int @var{a}, long int @var{b}, long int *@var{res})} -@defbuiltinx{bool __builtin_ssubll_overflow (long long int @var{a}, long long int @var{b}, long long int *@var{res})} -@defbuiltinx{bool __builtin_usub_overflow (unsigned int @var{a}, unsigned int @var{b}, unsigned int *@var{res})} -@defbuiltinx{bool __builtin_usubl_overflow (unsigned long int @var{a}, unsigned long int @var{b}, unsigned long int *@var{res})} -@defbuiltinx{bool __builtin_usubll_overflow (unsigned long long int @var{a}, unsigned long long int @var{b}, unsigned long long int *@var{res})} +@noindent +has the value of @code{x} if that is nonzero; otherwise, the value of +@code{y}. -These built-in functions are similar to the add overflow checking built-in -functions above, except they perform subtraction, subtract the second argument -from the first one, instead of addition. +This example is perfectly equivalent to -@enddefbuiltin +@smallexample +x ? 
x : y
+@end smallexample

-@defbuiltin{bool __builtin_mul_overflow (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} *@var{res})}
-@defbuiltinx{bool __builtin_smul_overflow (int @var{a}, int @var{b}, int *@var{res})}
-@defbuiltinx{bool __builtin_smull_overflow (long int @var{a}, long int @var{b}, long int *@var{res})}
-@defbuiltinx{bool __builtin_smulll_overflow (long long int @var{a}, long long int @var{b}, long long int *@var{res})}
-@defbuiltinx{bool __builtin_umul_overflow (unsigned int @var{a}, unsigned int @var{b}, unsigned int *@var{res})}
-@defbuiltinx{bool __builtin_umull_overflow (unsigned long int @var{a}, unsigned long int @var{b}, unsigned long int *@var{res})}
-@defbuiltinx{bool __builtin_umulll_overflow (unsigned long long int @var{a}, unsigned long long int @var{b}, unsigned long long int *@var{res})}
+@cindex side effect in @code{?:}
+@cindex @code{?:} side effect
+@noindent
+In this simple case, the ability to omit the middle operand is not
+especially useful. It becomes useful when the first operand does,
+or may (if it is a macro argument), contain a side effect. Then repeating
+the operand in the middle would perform the side effect twice. Omitting
+the middle operand uses the value already computed without the undesirable
+effects of recomputing it.

-These built-in functions are similar to the add overflow checking built-in
-functions above, except they perform multiplication, instead of addition.
+@node Case Ranges
+@subsection Case Ranges
+@cindex case ranges
+@cindex ranges in case statements

-@enddefbuiltin
+You can specify a range of consecutive values in a single @code{case} label,
+like this:

-The following built-in functions allow checking if simple arithmetic operation
-would overflow.
+@smallexample
+case @var{low} ... @var{high}:
+@end smallexample

-@defbuiltin{bool __builtin_add_overflow_p (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} @var{c})}
-@defbuiltinx{bool __builtin_sub_overflow_p (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} @var{c})}
-@defbuiltinx{bool __builtin_mul_overflow_p (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} @var{c})}
+@noindent
+This has the same effect as the proper number of individual @code{case}
+labels, one for each integer value from @var{low} to @var{high}, inclusive.

-These built-in functions are similar to @code{__builtin_add_overflow},
-@code{__builtin_sub_overflow}, or @code{__builtin_mul_overflow}, except that
-they don't store the result of the arithmetic operation anywhere and the
-last argument is not a pointer, but some expression with integral type other
-than enumerated or boolean type.
+This feature is especially useful for ranges of ASCII character codes:

-The built-in functions promote the first two operands into infinite precision signed type
-and perform addition on those promoted operands. The result is then
-cast to the type of the third argument. If the cast result is equal to the infinite
-precision result, the built-in functions return @code{false}, otherwise they return @code{true}.
-The value of the third argument is ignored, just the side effects in the third argument
-are evaluated, and no integral argument promotions are performed on the last argument.
-If the third argument is a bit-field, the type used for the result cast has the
-precision and signedness of the given bit-field, rather than precision and signedness
-of the underlying type.
+@smallexample
+case 'A' ... 
'Z': +@end smallexample -For example, the following macro can be used to portably check, at -compile-time, whether or not adding two constant integers will overflow, -and perform the addition only when it is known to be safe and not to trigger -a @option{-Woverflow} warning. +@strong{Be careful:} Write spaces around the @code{...}, for otherwise +it may be parsed wrong when you use it with integer values. For example, +write this: @smallexample -#define INT_ADD_OVERFLOW_P(a, b) \ - __builtin_add_overflow_p (a, b, (__typeof__ ((a) + (b))) 0) - -enum @{ - A = INT_MAX, B = 3, - C = INT_ADD_OVERFLOW_P (A, B) ? 0 : A + B, - D = __builtin_add_overflow_p (1, SCHAR_MAX, (signed char) 0) -@}; +case 1 ... 5: @end smallexample -The compiler will attempt to use hardware instructions to implement -these built-in functions where possible, like conditional jump on overflow -after addition, conditional jump on carry etc. - -@enddefbuiltin - -@defbuiltin{{unsigned int} __builtin_addc (unsigned int @var{a}, unsigned int @var{b}, unsigned int @var{carry_in}, unsigned int *@var{carry_out})} -@defbuiltinx{{unsigned long int} __builtin_addcl (unsigned long int @var{a}, unsigned long int @var{b}, unsigned int @var{carry_in}, unsigned long int *@var{carry_out})} -@defbuiltinx{{unsigned long long int} __builtin_addcll (unsigned long long int @var{a}, unsigned long long int @var{b}, unsigned long long int @var{carry_in}, unsigned long long int *@var{carry_out})} +@noindent +rather than this: -These built-in functions are equivalent to: @smallexample - (@{ __typeof__ (@var{a}) s; \ - __typeof__ (@var{a}) c1 = __builtin_add_overflow (@var{a}, @var{b}, &s); \ - __typeof__ (@var{a}) c2 = __builtin_add_overflow (s, @var{carry_in}, &s); \ - *(@var{carry_out}) = c1 | c2; \ - s; @}) +case 1...5: @end smallexample -i.e.@: they add 3 unsigned values, set what the last argument -points to to 1 if any of the two additions overflowed (otherwise 0) -and return the sum of those 3 unsigned values. Note, while all -the first 3 arguments can have arbitrary values, better code will be -emitted if one of them (preferably the third one) has only values -0 or 1 (i.e.@: carry-in). - -@enddefbuiltin +@node Mixed Labels and Declarations +@subsection Mixed Declarations, Labels and Code +@cindex mixed declarations and code +@cindex declarations, mixed with code +@cindex code, mixed with declarations -@defbuiltin{{unsigned int} __builtin_subc (unsigned int @var{a}, unsigned int @var{b}, unsigned int @var{carry_in}, unsigned int *@var{carry_out})} -@defbuiltinx{{unsigned long int} __builtin_subcl (unsigned long int @var{a}, unsigned long int @var{b}, unsigned int @var{carry_in}, unsigned long int *@var{carry_out})} -@defbuiltinx{{unsigned long long int} __builtin_subcll (unsigned long long int @var{a}, unsigned long long int @var{b}, unsigned long long int @var{carry_in}, unsigned long long int *@var{carry_out})} +ISO C99 and ISO C++ allow declarations and code to be freely mixed +within compound statements. ISO C23 allows labels to be +placed before declarations and at the end of a compound statement. +As an extension, GNU C also allows all this in C90 mode. 
For example, +you could do: -These built-in functions are equivalent to: @smallexample - (@{ __typeof__ (@var{a}) s; \ - __typeof__ (@var{a}) c1 = __builtin_sub_overflow (@var{a}, @var{b}, &s); \ - __typeof__ (@var{a}) c2 = __builtin_sub_overflow (s, @var{carry_in}, &s); \ - *(@var{carry_out}) = c1 | c2; \ - s; @}) +int i; +/* @r{@dots{}} */ +i++; +int j = i + 2; @end smallexample -i.e.@: they subtract 2 unsigned values from the first unsigned value, -set what the last argument points to to 1 if any of the two subtractions -overflowed (otherwise 0) and return the result of the subtractions. -Note, while all the first 3 arguments can have arbitrary values, better code -will be emitted if one of them (preferrably the third one) has only values -0 or 1 (i.e.@: carry-in). - -@enddefbuiltin +Each identifier is visible from where it is declared until the end of +the enclosing block. -@node x86 specific memory model extensions for transactional memory -@section x86-Specific Memory Model Extensions for Transactional Memory +@node C++ Comments +@subsection C++ Style Comments +@cindex @code{//} +@cindex C++ comments +@cindex comments, C++ style -The x86 architecture supports additional memory ordering flags -to mark critical sections for hardware lock elision. -These must be specified in addition to an existing memory order to -atomic intrinsics. +In GNU C, you may use C++ style comments, which start with @samp{//} and +continue until the end of the line. Many other C implementations allow +such comments, and they are included in the 1999 C standard. However, +C++ style comments are not recognized if you specify an @option{-std} +option specifying a version of ISO C before C99, or @option{-ansi} +(equivalent to @option{-std=c90}). -@table @code -@item __ATOMIC_HLE_ACQUIRE -Start lock elision on a lock variable. -Memory order must be @code{__ATOMIC_ACQUIRE} or stronger. -@item __ATOMIC_HLE_RELEASE -End lock elision on a lock variable. -Memory order must be @code{__ATOMIC_RELEASE} or stronger. -@end table +@node Escaped Newlines +@subsection Slightly Looser Rules for Escaped Newlines +@cindex escaped newlines +@cindex newlines (escaped) -When a lock acquire fails, it is required for good performance to abort -the transaction quickly. This can be done with a @code{_mm_pause}. +The preprocessor treatment of escaped newlines is more relaxed +than that specified by the C90 standard, which requires the newline +to immediately follow a backslash. +GCC's implementation allows whitespace in the form +of spaces, horizontal and vertical tabs, and form feeds between the +backslash and the subsequent newline. The preprocessor issues a +warning, but treats it as a valid escaped newline and combines the two +lines to form a single logical line. This works within comments and +tokens, as well as between tokens. Comments are @emph{not} treated as +whitespace for the purposes of this relaxation, since they have not +yet been replaced with spaces. -@smallexample -#include // For _mm_pause +@node Hex Floats +@subsection Hex Floats +@cindex hex floats -int lockvar; +ISO C99 and ISO C++17 support floating-point numbers written not only in +the usual decimal notation, such as @code{1.55e1}, but also numbers such as +@code{0x1.fp3} written in hexadecimal format. As a GNU extension, GCC +supports this in C90 mode (except in some cases when strictly +conforming) and in C++98, C++11 and C++14 modes. In that format the +@samp{0x} hex introducer and the @samp{p} or @samp{P} exponent field are +mandatory. 
The exponent is a decimal number that indicates the power of +2 by which the significant part is multiplied. Thus @samp{0x1.f} is +@tex +$1 {15\over16}$, +@end tex +@ifnottex +1 15/16, +@end ifnottex +@samp{p3} multiplies it by 8, and the value of @code{0x1.fp3} +is the same as @code{1.55e1}. -/* Acquire lock with lock elision */ -while (__atomic_exchange_n(&lockvar, 1, __ATOMIC_ACQUIRE|__ATOMIC_HLE_ACQUIRE)) - _mm_pause(); /* Abort failed transaction */ -... -/* Free lock with lock elision */ -__atomic_store_n(&lockvar, 0, __ATOMIC_RELEASE|__ATOMIC_HLE_RELEASE); -@end smallexample +Unlike for floating-point numbers in the decimal notation the exponent +is always required in the hexadecimal notation. Otherwise the compiler +would not be able to resolve the ambiguity of, e.g., @code{0x1.f}. This +could mean @code{1.0f} or @code{1.9375} since @samp{f} is also the +extension for floating-point constants of type @code{float}. -@node Object Size Checking -@section Object Size Checking +@node Binary constants +@subsection Binary Constants using the @samp{0b} Prefix +@cindex Binary constants using the @samp{0b} prefix -@subsection Object Size Checking Built-in Functions -@findex __builtin___memcpy_chk -@findex __builtin___mempcpy_chk -@findex __builtin___memmove_chk -@findex __builtin___memset_chk -@findex __builtin___strcpy_chk -@findex __builtin___stpcpy_chk -@findex __builtin___strncpy_chk -@findex __builtin___strcat_chk -@findex __builtin___strncat_chk +Integer constants can be written as binary constants, consisting of a +sequence of @samp{0} and @samp{1} digits, prefixed by @samp{0b} or +@samp{0B}. This is particularly useful in environments that operate a +lot on the bit level (like microcontrollers). -GCC implements a limited buffer overflow protection mechanism that can -prevent some buffer overflow attacks by determining the sizes of objects -into which data is about to be written and preventing the writes when -the size isn't sufficient. The built-in functions described below yield -the best results when used together and when optimization is enabled. -For example, to detect object sizes across function boundaries or to -follow pointer assignments through non-trivial control flow they rely -on various optimization passes enabled with @option{-O2}. However, to -a limited extent, they can be used without optimization as well. +The following statements are identical: -@defbuiltin{size_t __builtin_object_size (const void * @var{ptr}, int @var{type})} -is a built-in construct that returns a constant number of bytes from -@var{ptr} to the end of the object @var{ptr} pointer points to -(if known at compile time). To determine the sizes of dynamically allocated -objects the function relies on the allocation functions called to obtain -the storage to be declared with the @code{alloc_size} attribute (@pxref{Common -Function Attributes}). @code{__builtin_object_size} never evaluates -its arguments for side effects. If there are any side effects in them, it -returns @code{(size_t) -1} for @var{type} 0 or 1 and @code{(size_t) 0} -for @var{type} 2 or 3. If there are multiple objects @var{ptr} can -point to and all of them are known at compile time, the returned number -is the maximum of remaining byte counts in those objects if @var{type} & 2 is -0 and minimum if nonzero. If it is not possible to determine which objects -@var{ptr} points to at compile time, @code{__builtin_object_size} should -return @code{(size_t) -1} for @var{type} 0 or 1 and @code{(size_t) 0} -for @var{type} 2 or 3. 
+@smallexample +i = 42; +i = 0x2a; +i = 052; +i = 0b101010; +@end smallexample -@var{type} is an integer constant from 0 to 3. If the least significant -bit is clear, objects are whole variables, if it is set, a closest -surrounding subobject is considered the object a pointer points to. -The second bit determines if maximum or minimum of remaining bytes -is computed. +The type of these constants follows the same rules as for octal or +hexadecimal integer constants, so suffixes like @samp{L} or @samp{UL} +can be applied. -@smallexample -struct V @{ char buf1[10]; int b; char buf2[10]; @} var; -char *p = &var.buf1[1], *q = &var.b; +@node Dollar Signs +@subsection Dollar Signs in Identifier Names +@cindex $ +@cindex dollar signs in identifier names +@cindex identifier names, dollar signs in -/* Here the object p points to is var. */ -assert (__builtin_object_size (p, 0) == sizeof (var) - 1); -/* The subobject p points to is var.buf1. */ -assert (__builtin_object_size (p, 1) == sizeof (var.buf1) - 1); -/* The object q points to is var. */ -assert (__builtin_object_size (q, 0) - == (char *) (&var + 1) - (char *) &var.b); -/* The subobject q points to is var.b. */ -assert (__builtin_object_size (q, 1) == sizeof (var.b)); -@end smallexample -@enddefbuiltin +In GNU C, you may normally use dollar signs in identifier names. +This is because many traditional C implementations allow such identifiers. +However, dollar signs in identifiers are not supported on a few target +machines, typically because the target assembler does not allow them. -@defbuiltin{{size_t} __builtin_dynamic_object_size (const void * @var{ptr}, int @var{type})} -is similar to @code{__builtin_object_size} in that it returns a number of bytes -from @var{ptr} to the end of the object @var{ptr} pointer points to, except -that the size returned may not be a constant. This results in successful -evaluation of object size estimates in a wider range of use cases and can be -more precise than @code{__builtin_object_size}, but it incurs a performance -penalty since it may add a runtime overhead on size computation. Semantics of -@var{type} as well as return values in case it is not possible to determine -which objects @var{ptr} points to at compile time are the same as in the case -of @code{__builtin_object_size}. -@enddefbuiltin +@node Character Escapes +@subsection The Character @key{ESC} in Constants -@subsection Object Size Checking and Source Fortification +You can use the sequence @samp{\e} in a string or character constant to +stand for the ASCII character @key{ESC}. -Hardening of function calls using the @code{_FORTIFY_SOURCE} macro is -one of the key uses of the object size checking built-in functions. To -make implementation of these features more convenient and improve -optimization and diagnostics, there are built-in functions added for -many common string operation functions, e.g., for @code{memcpy} -@code{__builtin___memcpy_chk} built-in is provided. This built-in has -an additional last argument, which is the number of bytes remaining in -the object the @var{dest} argument points to or @code{(size_t) -1} if -the size is not known. +@node Alternate Keywords +@subsection Alternate Keywords +@cindex alternate keywords +@cindex keywords, alternate -The built-in functions are optimized into the normal string functions -like @code{memcpy} if the last argument is @code{(size_t) -1} or if -it is known at compile time that the destination object will not -be overflowed. 
-If the compiler can determine at compile time that the
-object will always be overflowed, it issues a warning.
+@option{-ansi} and the various @option{-std} options disable certain
+keywords that are GNU C extensions.
+Specifically, the keywords @code{asm}, @code{typeof} and
+@code{inline} are not available in programs compiled with
+@option{-ansi} or a @option{-std=} option specifying an ISO standard that
+doesn't define the keyword.  This causes trouble when you want to use
+these extensions in a header file that can be included in programs that may
+be compiled with such options.

-The intended use can be e.g.@:
+The way to solve these problems is to put @samp{__} at the beginning and
+end of each problematical keyword.  For example, use @code{__asm__}
+instead of @code{asm}, and @code{__inline__} instead of @code{inline}.

-@smallexample
-#undef memcpy
-#define bos0(dest) __builtin_object_size (dest, 0)
-#define memcpy(dest, src, n) \
-  __builtin___memcpy_chk (dest, src, n, bos0 (dest))
+Other C compilers won't accept these alternative keywords; if you want to
+compile with another compiler, you can define the alternate keywords as
+macros to replace them with the customary keywords.  It looks like this:

-char *volatile p;
-char buf[10];
-/* It is unknown what object p points to, so this is optimized
-   into plain memcpy - no checking is possible.  */
-memcpy (p, "abcde", n);
-/* Destination is known and length too.  It is known at compile
-   time there will be no overflow.  */
-memcpy (&buf[5], "abcde", 5);
-/* Destination is known, but the length is not known at compile time.
-   This will result in __memcpy_chk call that can check for overflow
-   at run time.  */
-memcpy (&buf[5], "abcde", n);
-/* Destination is known and it is known at compile time there will
-   be overflow.  There will be a warning and __memcpy_chk call that
-   will abort the program at run time.  */
-memcpy (&buf[6], "abcde", 5);
+@smallexample
+#ifndef __GNUC__
+#define __asm__ asm
+#endif
@end smallexample

-Such built-in functions are provided for @code{memcpy}, @code{mempcpy},
-@code{memmove}, @code{memset}, @code{strcpy}, @code{stpcpy}, @code{strncpy},
-@code{strcat} and @code{strncat}.
+@findex __extension__
+@opindex pedantic
+@option{-pedantic} and other options cause warnings for many GNU C extensions.
+You can suppress such warnings using the keyword @code{__extension__}.
+Specifically:

-@subsubsection Formatted Output Function Checking
-@defbuiltin{int __builtin___sprintf_chk @
-  (char *@var{s}, int @var{flag}, size_t @var{os}, @
-  const char *@var{fmt}, ...)}
-@defbuiltinx{int __builtin___snprintf_chk @
-  (char *@var{s}, size_t @var{maxlen}, int @var{flag}, @
-  size_t @var{os}, const char *@var{fmt}, ...)}
-@defbuiltinx{int __builtin___vsprintf_chk @
-  (char *@var{s}, int @var{flag}, size_t @var{os}, @
-  const char *@var{fmt}, va_list @var{ap})}
-@defbuiltinx{int __builtin___vsnprintf_chk @
-  (char *@var{s}, size_t @var{maxlen}, int @var{flag}, @
-  size_t @var{os}, const char *@var{fmt}, @
-  va_list @var{ap})}
+@itemize @bullet
+@item
+Writing @code{__extension__} before an expression prevents warnings
+about extensions within that expression.

-The added @var{flag} argument is passed unchanged to @code{__sprintf_chk}
-etc.@: functions and can contain implementation specific flags on what
-additional security measures the checking function might take, such as
-handling @code{%n} differently.
+@item
+In C, writing:

-The @var{os} argument is the object size @var{s} points to, like in the
-other built-in functions.
-There is a small difference in the behavior
-though, if @var{os} is @code{(size_t) -1}, the built-in functions are
-optimized into the non-checking functions only if @var{flag} is 0, otherwise
-the checking function is called with @var{os} argument set to
-@code{(size_t) -1}.
+@smallexample
+[[__extension__ @dots{}]]
+@end smallexample

-In addition to this, there are checking built-in functions
-@code{__builtin___printf_chk}, @code{__builtin___vprintf_chk},
-@code{__builtin___fprintf_chk} and @code{__builtin___vfprintf_chk}.
-These have just one additional argument, @var{flag}, right before
-format string @var{fmt}.  If the compiler is able to optimize them to
-@code{fputc} etc.@: functions, it does, otherwise the checking function
-is called and the @var{flag} argument passed to it.
-@enddefbuiltin
+suppresses warnings about using @samp{[[]]} attributes in C versions
+that predate C23@.
+@end itemize

-@node New/Delete Builtins
-@section Built-in functions for C++ allocations and deallocations
-@findex __builtin_operator_new
-@findex __builtin_operator_delete
-Calling these C++ built-in functions is similar to calling
-@code{::operator new} or @code{::operator delete} with the same arguments,
-except that it is an error if the selected @code{::operator new} or
-@code{::operator delete} overload is not a replaceable global operator
-and for optimization purposes calls to pairs of these functions can be
-omitted if access to the allocation is optimized out, or could be replaced
-with implementation provided buffer on the stack, or multiple allocation
-calls can be merged into a single allocation.  In C++ such optimizations
-are normally allowed just for calls to such replaceable global operators
-from @code{new} and @code{delete} expressions.
+@code{__extension__} has no effect aside from this.
+
+@node Function Names
+@subsection Function Names as Strings
+@cindex @code{__func__} identifier
+@cindex @code{__FUNCTION__} identifier
+@cindex @code{__PRETTY_FUNCTION__} identifier
+
+GCC provides three magic constants that hold the name of the current
+function as a string.  In C++11 and later modes, all three are treated
+as constant expressions and can be used in @code{constexpr} contexts.
+The first of these constants is @code{__func__}, which is part of
+the C99 standard:
+
+The identifier @code{__func__} is implicitly declared by the translator
+as if, immediately following the opening brace of each function
+definition, the declaration

@smallexample
-void foo () @{
-  int *a = new int;
-  delete a; // This pair of allocation/deallocation operators can be omitted
-            // or replaced with int _temp; int *a = &_temp; etc.@:
-  void *b = ::operator new (32);
-  ::operator delete (b); // This one cannot.
-  void *c = __builtin_operator_new (32);
-  __builtin_operator_delete (c); // This one can.
+static const char __func__[] = "function-name";
+@end smallexample
+
+@noindent
+appeared, where function-name is the name of the lexically-enclosing
+function.  This name is the unadorned name of the function.  As an
+extension, at file (or, in C++, namespace) scope, @code{__func__}
+evaluates to the empty string.
+
+@code{__FUNCTION__} is another name for @code{__func__}, provided for
+backward compatibility with old versions of GCC.
+
+In C, @code{__PRETTY_FUNCTION__} is yet another name for
+@code{__func__}, except that at file scope (or, in C++, namespace scope),
+it evaluates to the string @code{"top level"}.
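For example, in C this minimal sketch (the function name is
illustrative) prints @samp{report} twice:

@smallexample
#include <stdio.h>

void
report (void)
@{
  printf ("%s\n", __func__);            /* prints "report" */
  printf ("%s\n", __PRETTY_FUNCTION__); /* likewise, in C */
@}

int
main (void)
@{
  report ();
  return 0;
@}
@end smallexample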
In addition, in C++, +@code{__PRETTY_FUNCTION__} contains the signature of the function as +well as its bare name. For example, this program: + +@smallexample +extern "C" int printf (const char *, ...); + +class a @{ + public: + void sub (int i) + @{ + printf ("__FUNCTION__ = %s\n", __FUNCTION__); + printf ("__PRETTY_FUNCTION__ = %s\n", __PRETTY_FUNCTION__); + @} +@}; + +int +main (void) +@{ + a ax; + ax.sub (0); + return 0; @} @end smallexample -@node Other Builtins -@section Other Built-in Functions Provided by GCC -@cindex built-in functions -@findex __builtin_iseqsig -@findex __builtin_isfinite -@findex __builtin_isnormal -@findex __builtin_isgreater -@findex __builtin_isgreaterequal -@findex __builtin_isunordered -@findex __builtin_speculation_safe_value -@findex _Exit -@findex _exit -@findex abort -@findex abs -@findex acos -@findex acosf -@findex acosh -@findex acoshf -@findex acoshl -@findex acosl -@findex alloca -@findex asin -@findex asinf -@findex asinh -@findex asinhf -@findex asinhl -@findex asinl -@findex atan -@findex atan2 -@findex atan2f -@findex atan2l -@findex atanf -@findex atanh -@findex atanhf -@findex atanhl -@findex atanl -@findex bcmp -@findex bzero -@findex cabs -@findex cabsf -@findex cabsl -@findex cacos -@findex cacosf -@findex cacosh -@findex cacoshf -@findex cacoshl -@findex cacosl -@findex calloc -@findex carg -@findex cargf -@findex cargl -@findex casin -@findex casinf -@findex casinh -@findex casinhf -@findex casinhl -@findex casinl -@findex catan -@findex catanf -@findex catanh -@findex catanhf -@findex catanhl -@findex catanl -@findex cbrt -@findex cbrtf -@findex cbrtl -@findex ccos -@findex ccosf -@findex ccosh -@findex ccoshf -@findex ccoshl -@findex ccosl -@findex ceil -@findex ceilf -@findex ceill -@findex cexp -@findex cexpf -@findex cexpl -@findex cimag -@findex cimagf -@findex cimagl -@findex clog -@findex clogf -@findex clogl -@findex clog10 -@findex clog10f -@findex clog10l -@findex conj -@findex conjf -@findex conjl -@findex copysign -@findex copysignf -@findex copysignl -@findex cos -@findex cosf -@findex cosh -@findex coshf -@findex coshl -@findex cosl -@findex cpow -@findex cpowf -@findex cpowl -@findex cproj -@findex cprojf -@findex cprojl -@findex creal -@findex crealf -@findex creall -@findex csin -@findex csinf -@findex csinh -@findex csinhf -@findex csinhl -@findex csinl -@findex csqrt -@findex csqrtf -@findex csqrtl -@findex ctan -@findex ctanf -@findex ctanh -@findex ctanhf -@findex ctanhl -@findex ctanl -@findex dcgettext -@findex dgettext -@findex drem -@findex dremf -@findex dreml -@findex erf -@findex erfc -@findex erfcf -@findex erfcl -@findex erff -@findex erfl -@findex exit -@findex exp -@findex exp10 -@findex exp10f -@findex exp10l -@findex exp2 -@findex exp2f -@findex exp2l -@findex expf -@findex expl -@findex expm1 -@findex expm1f -@findex expm1l -@findex fabs -@findex fabsf -@findex fabsl -@findex fdim -@findex fdimf -@findex fdiml -@findex ffs -@findex floor -@findex floorf -@findex floorl -@findex fma -@findex fmaf -@findex fmal -@findex fmax -@findex fmaxf -@findex fmaxl -@findex fmin -@findex fminf -@findex fminl -@findex fmod -@findex fmodf -@findex fmodl -@findex fprintf -@findex fprintf_unlocked -@findex fputs -@findex fputs_unlocked -@findex free -@findex frexp -@findex frexpf -@findex frexpl -@findex fscanf -@findex gamma -@findex gammaf -@findex gammal -@findex gamma_r -@findex gammaf_r -@findex gammal_r -@findex gettext -@findex hypot -@findex hypotf -@findex hypotl -@findex ilogb -@findex ilogbf -@findex ilogbl 
-@findex imaxabs -@findex index -@findex isalnum -@findex isalpha -@findex isascii -@findex isblank -@findex iscntrl -@findex isdigit -@findex isgraph -@findex islower -@findex isprint -@findex ispunct -@findex isspace -@findex isupper -@findex iswalnum -@findex iswalpha -@findex iswblank -@findex iswcntrl -@findex iswdigit -@findex iswgraph -@findex iswlower -@findex iswprint -@findex iswpunct -@findex iswspace -@findex iswupper -@findex iswxdigit -@findex isxdigit -@findex j0 -@findex j0f -@findex j0l -@findex j1 -@findex j1f -@findex j1l -@findex jn -@findex jnf -@findex jnl -@findex labs -@findex ldexp -@findex ldexpf -@findex ldexpl -@findex lgamma -@findex lgammaf -@findex lgammal -@findex lgamma_r -@findex lgammaf_r -@findex lgammal_r -@findex llabs -@findex llrint -@findex llrintf -@findex llrintl -@findex llround -@findex llroundf -@findex llroundl -@findex log -@findex log10 -@findex log10f -@findex log10l -@findex log1p -@findex log1pf -@findex log1pl -@findex log2 -@findex log2f -@findex log2l -@findex logb -@findex logbf -@findex logbl -@findex logf -@findex logl -@findex lrint -@findex lrintf -@findex lrintl -@findex lround -@findex lroundf -@findex lroundl -@findex malloc -@findex memchr -@findex memcmp -@findex memcpy -@findex mempcpy -@findex memset -@findex modf -@findex modff -@findex modfl -@findex nearbyint -@findex nearbyintf -@findex nearbyintl -@findex nextafter -@findex nextafterf -@findex nextafterl -@findex nexttoward -@findex nexttowardf -@findex nexttowardl -@findex pow -@findex pow10 -@findex pow10f -@findex pow10l -@findex powf -@findex powl -@findex printf -@findex printf_unlocked -@findex putchar -@findex puts -@findex realloc -@findex remainder -@findex remainderf -@findex remainderl -@findex remquo -@findex remquof -@findex remquol -@findex rindex -@findex rint -@findex rintf -@findex rintl -@findex round -@findex roundf -@findex roundl -@findex scalb -@findex scalbf -@findex scalbl -@findex scalbln -@findex scalblnf -@findex scalblnf -@findex scalbn -@findex scalbnf -@findex scanfnl -@findex signbit -@findex signbitf -@findex signbitl -@findex signbitd32 -@findex signbitd64 -@findex signbitd128 -@findex significand -@findex significandf -@findex significandl -@findex sin -@findex sincos -@findex sincosf -@findex sincosl -@findex sinf -@findex sinh -@findex sinhf -@findex sinhl -@findex sinl -@findex snprintf -@findex sprintf -@findex sqrt -@findex sqrtf -@findex sqrtl -@findex sscanf -@findex stpcpy -@findex stpncpy -@findex strcasecmp -@findex strcat -@findex strchr -@findex strcmp -@findex strcpy -@findex strcspn -@findex strdup -@findex strfmon -@findex strftime -@findex strlen -@findex strncasecmp -@findex strncat -@findex strncmp -@findex strncpy -@findex strndup -@findex strnlen -@findex strpbrk -@findex strrchr -@findex strspn -@findex strstr -@findex tan -@findex tanf -@findex tanh -@findex tanhf -@findex tanhl -@findex tanl -@findex tgamma -@findex tgammaf -@findex tgammal -@findex toascii -@findex tolower -@findex toupper -@findex towlower -@findex towupper -@findex trunc -@findex truncf -@findex truncl -@findex vfprintf -@findex vfscanf -@findex vprintf -@findex vscanf -@findex vsnprintf -@findex vsprintf -@findex vsscanf -@findex y0 -@findex y0f -@findex y0l -@findex y1 -@findex y1f -@findex y1l -@findex yn -@findex ynf -@findex ynl +@noindent +gives this output: -GCC provides a large number of built-in functions other than the ones -mentioned above. 
Some of these are for internal use in the processing -of exceptions or variable-length argument lists and are not -documented here because they may change from time to time; we do not -recommend general use of these functions. +@smallexample +__FUNCTION__ = sub +__PRETTY_FUNCTION__ = void a::sub(int) +@end smallexample -The remaining functions are provided for optimization purposes. +These identifiers are variables, not preprocessor macros, and may not +be used to initialize @code{char} arrays or be concatenated with string +literals. -With the exception of built-ins that have library equivalents such as -the standard C library functions discussed below, or that expand to -library calls, GCC built-in functions are always expanded inline and -thus do not have corresponding entry points and their address cannot -be obtained. Attempting to use them in an expression other than -a function call results in a compile-time error. +@node Semantic Extensions +@section Extensions to C Semantics -@opindex fno-builtin -GCC includes built-in versions of many of the functions in the standard -C library. These functions come in two forms: one whose names start with -the @code{__builtin_} prefix, and the other without. Both forms have the -same type (including prototype), the same address (when their address is -taken), and the same meaning as the C library functions even if you specify -the @option{-fno-builtin} option @pxref{C Dialect Options}). Many of these -functions are only optimized in certain cases; if they are not optimized in -a particular case, a call to the library function is emitted. +GNU C defines useful behavior for some constructs that are not allowed or +well-defined in standard C. -@opindex ansi -@opindex std -Outside strict ISO C mode (@option{-ansi}, @option{-std=c90}, -@option{-std=c99} or @option{-std=c11}), the functions -@code{_exit}, @code{alloca}, @code{bcmp}, @code{bzero}, -@code{dcgettext}, @code{dgettext}, @code{dremf}, @code{dreml}, -@code{drem}, @code{exp10f}, @code{exp10l}, @code{exp10}, @code{ffsll}, -@code{ffsl}, @code{ffs}, @code{fprintf_unlocked}, -@code{fputs_unlocked}, @code{gammaf}, @code{gammal}, @code{gamma}, -@code{gammaf_r}, @code{gammal_r}, @code{gamma_r}, @code{gettext}, -@code{index}, @code{isascii}, @code{j0f}, @code{j0l}, @code{j0}, -@code{j1f}, @code{j1l}, @code{j1}, @code{jnf}, @code{jnl}, @code{jn}, -@code{lgammaf_r}, @code{lgammal_r}, @code{lgamma_r}, @code{mempcpy}, -@code{pow10f}, @code{pow10l}, @code{pow10}, @code{printf_unlocked}, -@code{rindex}, @code{roundeven}, @code{roundevenf}, @code{roundevenl}, -@code{scalbf}, @code{scalbl}, @code{scalb}, -@code{signbit}, @code{signbitf}, @code{signbitl}, @code{signbitd32}, -@code{signbitd64}, @code{signbitd128}, @code{significandf}, -@code{significandl}, @code{significand}, @code{sincosf}, -@code{sincosl}, @code{sincos}, @code{stpcpy}, @code{stpncpy}, -@code{strcasecmp}, @code{strdup}, @code{strfmon}, @code{strncasecmp}, -@code{strndup}, @code{strnlen}, @code{toascii}, @code{y0f}, @code{y0l}, -@code{y0}, @code{y1f}, @code{y1l}, @code{y1}, @code{ynf}, @code{ynl} and -@code{yn} -may be handled as built-in functions. -All these functions have corresponding versions -prefixed with @code{__builtin_}, which may be used even in strict C90 -mode. 
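For instance (an illustrative sketch): @code{stpcpy} is not a reserved
name in strict C90, so the unprefixed built-in is disabled there, but
the prefixed form still works:

@smallexample
/* Compile with: gcc -std=c90 -pedantic-errors file.c  */
char dest[16];

char *
start_tag (void)
@{
  /* Returns a pointer to the terminating nul written to dest.  */
  return __builtin_stpcpy (dest, "id:");
@}
@end smallexample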
- -The ISO C99 functions -@code{_Exit}, @code{acoshf}, @code{acoshl}, @code{acosh}, @code{asinhf}, -@code{asinhl}, @code{asinh}, @code{atanhf}, @code{atanhl}, @code{atanh}, -@code{cabsf}, @code{cabsl}, @code{cabs}, @code{cacosf}, @code{cacoshf}, -@code{cacoshl}, @code{cacosh}, @code{cacosl}, @code{cacos}, -@code{cargf}, @code{cargl}, @code{carg}, @code{casinf}, @code{casinhf}, -@code{casinhl}, @code{casinh}, @code{casinl}, @code{casin}, -@code{catanf}, @code{catanhf}, @code{catanhl}, @code{catanh}, -@code{catanl}, @code{catan}, @code{cbrtf}, @code{cbrtl}, @code{cbrt}, -@code{ccosf}, @code{ccoshf}, @code{ccoshl}, @code{ccosh}, @code{ccosl}, -@code{ccos}, @code{cexpf}, @code{cexpl}, @code{cexp}, @code{cimagf}, -@code{cimagl}, @code{cimag}, @code{clogf}, @code{clogl}, @code{clog}, -@code{conjf}, @code{conjl}, @code{conj}, @code{copysignf}, @code{copysignl}, -@code{copysign}, @code{cpowf}, @code{cpowl}, @code{cpow}, @code{cprojf}, -@code{cprojl}, @code{cproj}, @code{crealf}, @code{creall}, @code{creal}, -@code{csinf}, @code{csinhf}, @code{csinhl}, @code{csinh}, @code{csinl}, -@code{csin}, @code{csqrtf}, @code{csqrtl}, @code{csqrt}, @code{ctanf}, -@code{ctanhf}, @code{ctanhl}, @code{ctanh}, @code{ctanl}, @code{ctan}, -@code{erfcf}, @code{erfcl}, @code{erfc}, @code{erff}, @code{erfl}, -@code{erf}, @code{exp2f}, @code{exp2l}, @code{exp2}, @code{expm1f}, -@code{expm1l}, @code{expm1}, @code{fdimf}, @code{fdiml}, @code{fdim}, -@code{fmaf}, @code{fmal}, @code{fmaxf}, @code{fmaxl}, @code{fmax}, -@code{fma}, @code{fminf}, @code{fminl}, @code{fmin}, @code{hypotf}, -@code{hypotl}, @code{hypot}, @code{ilogbf}, @code{ilogbl}, @code{ilogb}, -@code{imaxabs}, @code{isblank}, @code{iswblank}, @code{lgammaf}, -@code{lgammal}, @code{lgamma}, @code{llabs}, @code{llrintf}, @code{llrintl}, -@code{llrint}, @code{llroundf}, @code{llroundl}, @code{llround}, -@code{log1pf}, @code{log1pl}, @code{log1p}, @code{log2f}, @code{log2l}, -@code{log2}, @code{logbf}, @code{logbl}, @code{logb}, @code{lrintf}, -@code{lrintl}, @code{lrint}, @code{lroundf}, @code{lroundl}, -@code{lround}, @code{nearbyintf}, @code{nearbyintl}, @code{nearbyint}, -@code{nextafterf}, @code{nextafterl}, @code{nextafter}, -@code{nexttowardf}, @code{nexttowardl}, @code{nexttoward}, -@code{remainderf}, @code{remainderl}, @code{remainder}, @code{remquof}, -@code{remquol}, @code{remquo}, @code{rintf}, @code{rintl}, @code{rint}, -@code{roundf}, @code{roundl}, @code{round}, @code{scalblnf}, -@code{scalblnl}, @code{scalbln}, @code{scalbnf}, @code{scalbnl}, -@code{scalbn}, @code{snprintf}, @code{tgammaf}, @code{tgammal}, -@code{tgamma}, @code{truncf}, @code{truncl}, @code{trunc}, -@code{vfscanf}, @code{vscanf}, @code{vsnprintf} and @code{vsscanf} -are handled as built-in functions -except in strict ISO C90 mode (@option{-ansi} or @option{-std=c90}). 
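As a small illustration of what the built-in handling enables (a
sketch; exact code generation depends on the target and options), a
constant argument to such a function can be folded at compile time:

@smallexample
double
eight (void)
@{
  /* With exp2 handled as a built-in, this call typically folds to
     the constant 8.0 and no library call is emitted; in strict C90
     mode the same folding is available as __builtin_exp2.  */
  return __builtin_exp2 (3.0);
@}
@end smallexample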
- -There are also built-in versions of the ISO C99 functions -@code{acosf}, @code{acosl}, @code{asinf}, @code{asinl}, @code{atan2f}, -@code{atan2l}, @code{atanf}, @code{atanl}, @code{ceilf}, @code{ceill}, -@code{cosf}, @code{coshf}, @code{coshl}, @code{cosl}, @code{expf}, -@code{expl}, @code{fabsf}, @code{fabsl}, @code{floorf}, @code{floorl}, -@code{fmodf}, @code{fmodl}, @code{frexpf}, @code{frexpl}, @code{ldexpf}, -@code{ldexpl}, @code{log10f}, @code{log10l}, @code{logf}, @code{logl}, -@code{modfl}, @code{modff}, @code{powf}, @code{powl}, @code{sinf}, -@code{sinhf}, @code{sinhl}, @code{sinl}, @code{sqrtf}, @code{sqrtl}, -@code{tanf}, @code{tanhf}, @code{tanhl} and @code{tanl} -that are recognized in any mode since ISO C90 reserves these names for -the purpose to which ISO C99 puts them. All these functions have -corresponding versions prefixed with @code{__builtin_}. - -There are also built-in functions @code{__builtin_fabsf@var{n}}, -@code{__builtin_fabsf@var{n}x}, @code{__builtin_copysignf@var{n}} and -@code{__builtin_copysignf@var{n}x}, corresponding to the TS 18661-3 -functions @code{fabsf@var{n}}, @code{fabsf@var{n}x}, -@code{copysignf@var{n}} and @code{copysignf@var{n}x}, for supported -types @code{_Float@var{n}} and @code{_Float@var{n}x}. +@menu +* Function Prototypes:: Prototype declarations and old-style definitions. +* Pointer Arith:: Arithmetic on @code{void}-pointers and function pointers. +* Variadic Pointer Args:: Pointer arguments to variadic functions. +* Pointers to Arrays:: Pointers to arrays with qualifiers work as expected. +* Const and Volatile Functions:: GCC interprets these specially in C. +@end menu -There are also GNU extension functions @code{clog10}, @code{clog10f} and -@code{clog10l} which names are reserved by ISO C99 for future use. -All these functions have versions prefixed with @code{__builtin_}. +@node Function Prototypes +@subsection Prototypes and Old-Style Function Definitions +@cindex function prototype declarations +@cindex old-style function definitions +@cindex promotion of formal parameters -The ISO C94 functions -@code{iswalnum}, @code{iswalpha}, @code{iswcntrl}, @code{iswdigit}, -@code{iswgraph}, @code{iswlower}, @code{iswprint}, @code{iswpunct}, -@code{iswspace}, @code{iswupper}, @code{iswxdigit}, @code{towlower} and -@code{towupper} -are handled as built-in functions -except in strict ISO C90 mode (@option{-ansi} or @option{-std=c90}). +GNU C extends ISO C to allow a function prototype to override a later +old-style non-prototype definition. 
Consider the following example: -The ISO C90 functions -@code{abort}, @code{abs}, @code{acos}, @code{asin}, @code{atan2}, -@code{atan}, @code{calloc}, @code{ceil}, @code{cosh}, @code{cos}, -@code{exit}, @code{exp}, @code{fabs}, @code{floor}, @code{fmod}, -@code{fprintf}, @code{fputs}, @code{free}, @code{frexp}, @code{fscanf}, -@code{isalnum}, @code{isalpha}, @code{iscntrl}, @code{isdigit}, -@code{isgraph}, @code{islower}, @code{isprint}, @code{ispunct}, -@code{isspace}, @code{isupper}, @code{isxdigit}, @code{tolower}, -@code{toupper}, @code{labs}, @code{ldexp}, @code{log10}, @code{log}, -@code{malloc}, @code{memchr}, @code{memcmp}, @code{memcpy}, -@code{memset}, @code{modf}, @code{pow}, @code{printf}, @code{putchar}, -@code{puts}, @code{realloc}, @code{scanf}, @code{sinh}, @code{sin}, -@code{snprintf}, @code{sprintf}, @code{sqrt}, @code{sscanf}, @code{strcat}, -@code{strchr}, @code{strcmp}, @code{strcpy}, @code{strcspn}, -@code{strlen}, @code{strncat}, @code{strncmp}, @code{strncpy}, -@code{strpbrk}, @code{strrchr}, @code{strspn}, @code{strstr}, -@code{tanh}, @code{tan}, @code{vfprintf}, @code{vprintf} and @code{vsprintf} -are all recognized as built-in functions unless -@option{-fno-builtin} is specified (or @option{-fno-builtin-@var{function}} -is specified for an individual function). All of these functions have -corresponding versions prefixed with @code{__builtin_}. +@smallexample +/* @r{Use prototypes unless the compiler is old-fashioned.} */ +#ifdef __STDC__ +#define P(x) x +#else +#define P(x) () +#endif -GCC provides built-in versions of the ISO C99 floating-point comparison -macros that avoid raising exceptions for unordered operands. They have -the same names as the standard macros ( @code{isgreater}, -@code{isgreaterequal}, @code{isless}, @code{islessequal}, -@code{islessgreater}, and @code{isunordered}) , with @code{__builtin_} -prefixed. We intend for a library implementor to be able to simply -@code{#define} each standard macro to its built-in equivalent. -In the same fashion, GCC provides @code{fpclassify}, @code{iseqsig}, -@code{isfinite}, @code{isinf_sign}, @code{isnormal} and @code{signbit} built-ins -used with @code{__builtin_} prefixed. The @code{isinf} and @code{isnan} -built-in functions appear both with and without the @code{__builtin_} prefix. -With @code{-ffinite-math-only} option the @code{isinf} and @code{isnan} -built-in functions will always return 0. +/* @r{Prototype function declaration.} */ +int isroot P((uid_t)); -GCC provides built-in versions of the ISO C99 floating-point rounding and -exceptions handling functions @code{fegetround}, @code{feclearexcept} and -@code{feraiseexcept}. They may not be available for all targets, and because -they need close interaction with libc internal values, they may not be available -for all target libcs, but in all cases they will gracefully fallback to libc -calls. These built-in functions appear both with and without the -@code{__builtin_} prefix. +/* @r{Old-style function definition.} */ +int +isroot (x) /* @r{??? lossage here ???} */ + uid_t x; +@{ + return x == 0; +@} +@end smallexample -@defbuiltin{{void *} __builtin_alloca (size_t @var{size})} -The @code{__builtin_alloca} function must be called at block scope. -The function allocates an object @var{size} bytes large on the stack -of the calling function. The object is aligned on the default stack -alignment boundary for the target determined by the -@code{__BIGGEST_ALIGNMENT__} macro. 
The @code{__builtin_alloca} -function returns a pointer to the first byte of the allocated object. -The lifetime of the allocated object ends just before the calling -function returns to its caller. This is so even when -@code{__builtin_alloca} is called within a nested block. +Suppose the type @code{uid_t} happens to be @code{short}. ISO C does +not allow this example, because subword arguments in old-style +non-prototype definitions are promoted. Therefore in this example the +function definition's argument is really an @code{int}, which does not +match the prototype argument type of @code{short}. -For example, the following function allocates eight objects of @code{n} -bytes each on the stack, storing a pointer to each in consecutive elements -of the array @code{a}. It then passes the array to function @code{g} -which can safely use the storage pointed to by each of the array elements. +This restriction of ISO C makes it hard to write code that is portable +to traditional C compilers, because the programmer does not know +whether the @code{uid_t} type is @code{short}, @code{int}, or +@code{long}. Therefore, in cases like these GNU C allows a prototype +to override a later old-style definition. More precisely, in GNU C, a +function prototype argument type overrides the argument type specified +by a later old-style definition if the former type is the same as the +latter type before promotion. Thus in GNU C the above example is +equivalent to the following: @smallexample -void f (unsigned n) -@{ - void *a [8]; - for (int i = 0; i != 8; ++i) - a [i] = __builtin_alloca (n); +int isroot (uid_t); - g (a, n); // @r{safe} +int +isroot (uid_t x) +@{ + return x == 0; @} @end smallexample -Since the @code{__builtin_alloca} function doesn't validate its argument -it is the responsibility of its caller to make sure the argument doesn't -cause it to exceed the stack size limit. -The @code{__builtin_alloca} function is provided to make it possible to -allocate on the stack arrays of bytes with an upper bound that may be -computed at run time. Since C99 Variable Length Arrays offer -similar functionality under a portable, more convenient, and safer -interface they are recommended instead, in both C99 and C++ programs -where GCC provides them as an extension. -@xref{Variable Length}, for details. +@noindent +GNU C++ does not support old-style function definitions, so this +extension is irrelevant. -@enddefbuiltin +@node Pointer Arith +@subsection Arithmetic on @code{void}- and Function-Pointers +@cindex void pointers, arithmetic +@cindex void, size of pointer to +@cindex function pointers, arithmetic +@cindex function, size of pointer to -@defbuiltin{{void *} __builtin_alloca_with_align (size_t @var{size}, size_t @var{alignment})} -The @code{__builtin_alloca_with_align} function must be called at block -scope. The function allocates an object @var{size} bytes large on -the stack of the calling function. The allocated object is aligned on -the boundary specified by the argument @var{alignment} whose unit is given -in bits (not bytes). The @var{size} argument must be positive and not -exceed the stack size limit. The @var{alignment} argument must be a constant -integer expression that evaluates to a power of 2 greater than or equal to -@code{CHAR_BIT} and less than some unspecified maximum. Invocations -with other values are rejected with an error indicating the valid bounds. -The function returns a pointer to the first byte of the allocated object. 
-The lifetime of the allocated object ends at the end of the block in which
-the function was called.  The allocated storage is released no later than
-just before the calling function returns to its caller, but may be released
-at the end of the block in which the function was called.
+In GNU C, addition and subtraction operations are supported on pointers to
+@code{void} and on pointers to functions.  This is done by treating the
+size of a @code{void} or of a function as 1.

-For example, in the following function the call to @code{g} is unsafe
-because when @code{overalign} is non-zero, the space allocated by
-@code{__builtin_alloca_with_align} may have been released at the end
-of the @code{if} statement in which it was called.
+A consequence of this is that @code{sizeof} is also allowed on @code{void}
+and on function types, and returns 1.

-@smallexample
-void f (unsigned n, bool overalign)
-@{
-  void *p;
-  if (overalign)
-    p = __builtin_alloca_with_align (n, 64 /* bits */);
-  else
-    p = __builtin_alloca (n);
-
-  g (p, n);   // @r{unsafe}
-@}
-@end smallexample
+@opindex Wpointer-arith
+The option @option{-Wpointer-arith} requests a warning if these extensions
+are used.

-Since the @code{__builtin_alloca_with_align} function doesn't validate its
-@var{size} argument it is the responsibility of its caller to make sure
-the argument doesn't cause it to exceed the stack size limit.
-The @code{__builtin_alloca_with_align} function is provided to make
-it possible to allocate on the stack overaligned arrays of bytes with
-an upper bound that may be computed at run time.  Since C99
-Variable Length Arrays offer the same functionality under
-a portable, more convenient, and safer interface they are recommended
-instead, in both C99 and C++ programs where GCC provides them as
-an extension.  @xref{Variable Length}, for details.
+@node Variadic Pointer Args
+@subsection Pointer Arguments in Variadic Functions
+@cindex pointer arguments in variadic functions
+@cindex variadic functions, pointer arguments

-@enddefbuiltin
+Standard C requires that pointer types used with @code{va_arg} in
+functions with variable argument lists either must be compatible with
+that of the actual argument, or that one type must be a pointer to
+@code{void} and the other a pointer to a character type.  GNU C
+implements the POSIX XSI extension that additionally permits the use
+of @code{va_arg} with a pointer type to receive arguments of any other
+pointer type.

-@defbuiltin{{void *} __builtin_alloca_with_align_and_max (size_t @var{size}, size_t @var{alignment}, size_t @var{max_size})}
-Similar to @code{__builtin_alloca_with_align} but takes an extra argument
-specifying an upper bound for @var{size} in case its value cannot be computed
-at compile time, for use by @option{-fstack-usage}, @option{-Wstack-usage}
-and @option{-Walloca-larger-than}.  @var{max_size} must be a constant integer
-expression, it has no effect on code generation and no attempt is made to
-check its compatibility with @var{size}.
+In particular, in GNU C @samp{va_arg (ap, void *)} can safely be used
+to consume an argument of any pointer type.
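For example, a minimal sketch relying on this extension (the function
and variable names are illustrative):

@smallexample
#include <stdarg.h>
#include <stdio.h>

/* Consume pointer arguments of arbitrary pointer type with
   va_arg (ap, void *); the list ends with a null pointer.  */
static void
print_ptrs (int first, ...)
@{
  va_list ap;
  void *p;

  va_start (ap, first);
  while ((p = va_arg (ap, void *)) != NULL)
    printf ("%p\n", p);
  va_end (ap);
@}

int
main (void)
@{
  int i = 0;
  double d = 0.0;
  /* The int * and double * arguments are received as void *.  */
  print_ptrs (0, &i, &d, (void *) 0);
  return 0;
@}
@end smallexample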
-@enddefbuiltin +@node Pointers to Arrays +@subsection Pointers to Arrays with Qualifiers Work as Expected +@cindex pointers to arrays +@cindex const qualifier -@defbuiltin{bool __builtin_has_attribute (@var{type-or-expression}, @var{attribute})} -The @code{__builtin_has_attribute} function evaluates to an integer constant -expression equal to @code{true} if the symbol or type referenced by -the @var{type-or-expression} argument has been declared with -the @var{attribute} referenced by the second argument. For -an @var{type-or-expression} argument that does not reference a symbol, -since attributes do not apply to expressions the built-in consider -the type of the argument. Neither argument is evaluated. -The @var{type-or-expression} argument is subject to the same -restrictions as the argument to @code{typeof} (@pxref{Typeof}). The -@var{attribute} argument is an attribute name optionally followed by -a comma-separated list of arguments enclosed in parentheses. Both forms -of attribute names---with and without double leading and trailing -underscores---are recognized. @xref{Attribute Syntax}, for details. -When no attribute arguments are specified for an attribute that expects -one or more arguments the function returns @code{true} if -@var{type-or-expression} has been declared with the attribute regardless -of the attribute argument values. Arguments provided for an attribute -that expects some are validated and matched up to the provided number. -The function returns @code{true} if all provided arguments match. For -example, the first call to the function below evaluates to @code{true} -because @code{x} is declared with the @code{aligned} attribute but -the second call evaluates to @code{false} because @code{x} is declared -@code{aligned (8)} and not @code{aligned (4)}. +In GNU C, pointers to arrays with qualifiers work similar to pointers +to other qualified types. For example, a value of type @code{int (*)[5]} +can be used to initialize a variable of type @code{const int (*)[5]}. +These types are incompatible in ISO C because the @code{const} qualifier +is formally attached to the element type of the array and not the +array itself. @smallexample -__attribute__ ((aligned (8))) int x; -_Static_assert (__builtin_has_attribute (x, aligned), "aligned"); -_Static_assert (!__builtin_has_attribute (x, aligned (4)), "aligned (4)"); +extern void +transpose (int N, int M, double out[M][N], const double in[N][M]); +double x[3][2]; +double y[2][3]; +@r{@dots{}} +transpose(3, 2, y, x); @end smallexample -Due to a limitation the @code{__builtin_has_attribute} function returns -@code{false} for the @code{mode} attribute even if the type or variable -referenced by the @var{type-or-expression} argument was declared with one. -The function is also not supported with labels, and in C with enumerators. - -Note that unlike the @code{__has_attribute} preprocessor operator which -is suitable for use in @code{#if} preprocessing directives -@code{__builtin_has_attribute} is an intrinsic function that is not -recognized in such contexts. 
- -@enddefbuiltin +@node Const and Volatile Functions +@subsection Const and Volatile Functions +@cindex @code{const} applied to function +@cindex @code{volatile} applied to function -@defbuiltin{@var{type} __builtin_speculation_safe_value (@var{type} @var{val}, @var{type} @var{failval})} +The C standard explicitly leaves the behavior of the @code{const} and +@code{volatile} type qualifiers applied to functions undefined; these +constructs can only arise through the use of @code{typedef}. As an extension, +GCC defines this use of the @code{const} qualifier to have the same meaning +as the GCC @code{const} function attribute, and the @code{volatile} qualifier +to be equivalent to the @code{noreturn} attribute. +@xref{Common Function Attributes}, for more information. -This built-in function can be used to help mitigate against unsafe -speculative execution. @var{type} may be any integral type or any -pointer type. +As examples of this usage, -@enumerate -@item -If the CPU is not speculatively executing the code, then @var{val} -is returned. -@item -If the CPU is executing speculatively then either: -@itemize -@item -The function may cause execution to pause until it is known that the -code is no-longer being executed speculatively (in which case -@var{val} can be returned, as above); or -@item -The function may use target-dependent speculation tracking state to cause -@var{failval} to be returned when it is known that speculative -execution has incorrectly predicted a conditional branch operation. -@end itemize -@end enumerate +@smallexample -The second argument, @var{failval}, is optional and defaults to zero -if omitted. +/* @r{Equivalent to:} + void fatal () __attribute__ ((noreturn)); */ +typedef void voidfn (); +volatile voidfn fatal; -GCC defines the preprocessor macro -@code{__HAVE_BUILTIN_SPECULATION_SAFE_VALUE} for targets that have been -updated to support this builtin. +/* @r{Equivalent to:} + extern int square (int) __attribute__ ((const)); */ +typedef int intfn (int); +extern const intfn square; +@end smallexample -The built-in function can be used where a variable appears to be used in a -safe way, but the CPU, due to speculative execution may temporarily ignore -the bounds checks. Consider, for example, the following function: +In general, using function attributes instead is preferred, since the +attributes make both the intent of the code and its reliance on a GNU +extension explicit. Additionally, using @code{const} and +@code{volatile} in this way is specific to GNU C and does not work in +GNU C++. -@smallexample -int array[500]; -int f (unsigned untrusted_index) -@{ - if (untrusted_index < 500) - return array[untrusted_index]; - return 0; -@} -@end smallexample +@node Nonlocal Gotos +@section Nonlocal Gotos +@cindex nonlocal gotos -If the function is called repeatedly with @code{untrusted_index} less -than the limit of 500, then a branch predictor will learn that the -block of code that returns a value stored in @code{array} will be -executed. If the function is subsequently called with an -out-of-range value it will still try to execute that block of code -first until the CPU determines that the prediction was incorrect -(the CPU will unwind any incorrect operations at that point). -However, depending on how the result of the function is used, it might be -possible to leave traces in the cache that can reveal what was stored -at the out-of-bounds location. 
The built-in function can be used to -provide some protection against leaking data in this way by changing -the code to: +GCC provides the built-in functions @code{__builtin_setjmp} and +@code{__builtin_longjmp} which are similar to, but not interchangeable +with, the C library functions @code{setjmp} and @code{longjmp}. +The built-in versions are used internally by GCC's libraries +to implement exception handling on some targets. You should use the +standard C library functions declared in @code{} in user code +instead of the builtins. -@smallexample -int array[500]; -int f (unsigned untrusted_index) -@{ - if (untrusted_index < 500) - return array[__builtin_speculation_safe_value (untrusted_index)]; - return 0; -@} -@end smallexample +The built-in versions of these functions use GCC's normal +mechanisms to save and restore registers using the stack on function +entry and exit. The jump buffer argument @var{buf} holds only the +information needed to restore the stack frame, rather than the entire +set of saved register values. -The built-in function will either cause execution to stall until the -conditional branch has been fully resolved, or it may permit -speculative execution to continue, but using 0 instead of -@code{untrusted_value} if that exceeds the limit. +An important caveat is that GCC arranges to save and restore only +those registers known to the specific architecture variant being +compiled for. This can make @code{__builtin_setjmp} and +@code{__builtin_longjmp} more efficient than their library +counterparts in some cases, but it can also cause incorrect and +mysterious behavior when mixing with code that uses the full register +set. -If accessing any memory location is potentially unsafe when speculative -execution is incorrect, then the code can be rewritten as +You should declare the jump buffer argument @var{buf} to the +built-in functions as: @smallexample -int array[500]; -int f (unsigned untrusted_index) -@{ - if (untrusted_index < 500) - return *__builtin_speculation_safe_value (&array[untrusted_index], NULL); - return 0; -@} +#include +intptr_t @var{buf}[5]; @end smallexample -which will cause a @code{NULL} pointer to be used for the unsafe case. - +@defbuiltin{{int} __builtin_setjmp (intptr_t *@var{buf})} +This function saves the current stack context in @var{buf}. +@code{__builtin_setjmp} returns 0 when returning directly, +and 1 when returning from @code{__builtin_longjmp} using the same +@var{buf}. @enddefbuiltin -@defbuiltin{int __builtin_types_compatible_p (@var{type1}, @var{type2})} +@defbuiltin{{void} __builtin_longjmp (intptr_t *@var{buf}, int @var{val})} +This function restores the stack context in @var{buf}, +saved by a previous call to @code{__builtin_setjmp}. After +@code{__builtin_longjmp} is finished, the program resumes execution as +if the matching @code{__builtin_setjmp} returns the value @var{val}, +which must be 1. -You can use the built-in function @code{__builtin_types_compatible_p} to -determine whether two types are the same. +Because @code{__builtin_longjmp} depends on the function return +mechanism to restore the stack context, it cannot be called +from the same function calling @code{__builtin_setjmp} to +initialize @var{buf}. It can only be called from a function called +(directly or indirectly) from the function calling @code{__builtin_setjmp}. +@enddefbuiltin -This built-in function returns 1 if the unqualified versions of the -types @var{type1} and @var{type2} (which are types, not expressions) are -compatible, 0 otherwise. 
The result of this built-in function can be -used in integer constant expressions. +@node Constructing Calls +@section Constructing Function Calls +@cindex constructing calls +@cindex forwarding calls -This built-in function ignores top level qualifiers (e.g., @code{const}, -@code{volatile}). For example, @code{int} is equivalent to @code{const -int}. +Using the built-in functions described below, you can record +the arguments a function received, and call another function +with the same arguments, without knowing the number or types +of the arguments. -The type @code{int[]} and @code{int[5]} are compatible. On the other -hand, @code{int} and @code{char *} are not compatible, even if the size -of their types, on the particular architecture are the same. Also, the -amount of pointer indirection is taken into account when determining -similarity. Consequently, @code{short *} is not similar to -@code{short **}. Furthermore, two types that are typedefed are -considered compatible if their underlying types are compatible. +You can also record the return value of that function call, +and later return that value, without knowing what data type +the function tried to return (as long as your caller expects +that data type). -An @code{enum} type is not considered to be compatible with another -@code{enum} type even if both are compatible with the same integer -type; this is what the C standard specifies. -For example, @code{enum @{foo, bar@}} is not similar to -@code{enum @{hot, dog@}}. +However, these built-in functions may interact badly with some +sophisticated features or other extensions of the language. It +is, therefore, not recommended to use them outside very simple +functions acting as mere forwarders for their arguments. -You typically use this function in code whose execution varies -depending on the arguments' types. For example: +@defbuiltin{{void *} __builtin_apply_args ()} +This built-in function returns a pointer to data +describing how to perform a call with the same arguments as are passed +to the current function. -@smallexample -#define foo(x) \ - (@{ \ - typeof (x) tmp = (x); \ - if (__builtin_types_compatible_p (typeof (x), long double)) \ - tmp = foo_long_double (tmp); \ - else if (__builtin_types_compatible_p (typeof (x), double)) \ - tmp = foo_double (tmp); \ - else if (__builtin_types_compatible_p (typeof (x), float)) \ - tmp = foo_float (tmp); \ - else \ - abort (); \ - tmp; \ - @}) -@end smallexample +The function saves the arg pointer register, structure value address, +and all registers that might be used to pass arguments to a function +into a block of memory allocated on the stack. Then it returns the +address of that block. +@enddefbuiltin -@emph{Note:} This construct is only available for C@. +@defbuiltin{{void *} __builtin_apply (void (*@var{function})(), void *@var{arguments}, size_t @var{size})} +This built-in function invokes @var{function} +with a copy of the parameters described by @var{arguments} +and @var{size}. -@enddefbuiltin +The value of @var{arguments} should be the value returned by +@code{__builtin_apply_args}. The argument @var{size} specifies the size +of the stack argument data, in bytes. -@defbuiltin{@var{type} __builtin_call_with_static_chain (@var{call_exp}, @var{pointer_exp})} +This function returns a pointer to data describing +how to return whatever value is returned by @var{function}. The data +is saved in a block of memory allocated on the stack. 
-The @var{call_exp} expression must be a function call, and the -@var{pointer_exp} expression must be a pointer. The @var{pointer_exp} -is passed to the function call in the target's static chain location. -The result of builtin is the result of the function call. +It is not always simple to compute the proper value for @var{size}. The +value is used by @code{__builtin_apply} to compute the amount of data +that should be pushed on the stack and copied from the incoming argument +area. +@enddefbuiltin -@emph{Note:} This builtin is only available for C@. -This builtin can be used to call Go closures from C. +@defbuiltin{{void} __builtin_return (void *@var{result})} +This built-in function returns the value described by @var{result} from +the containing function. You should specify, for @var{result}, a value +returned by @code{__builtin_apply}. +@enddefbuiltin +@defbuiltin{{} __builtin_va_arg_pack ()} +This built-in function represents all anonymous arguments of an inline +function. It can be used only in inline functions that are always +inlined, never compiled as a separate function, such as those using +@code{__attribute__ ((__always_inline__))} or +@code{__attribute__ ((__gnu_inline__))} extern inline functions. +It must be only passed as last argument to some other function +with variable arguments. This is useful for writing small wrapper +inlines for variable argument functions, when using preprocessor +macros is undesirable. For example: +@smallexample +extern int myprintf (FILE *f, const char *format, ...); +extern inline __attribute__ ((__gnu_inline__)) int +myprintf (FILE *f, const char *format, ...) +@{ + int r = fprintf (f, "myprintf: "); + if (r < 0) + return r; + int s = fprintf (f, format, __builtin_va_arg_pack ()); + if (s < 0) + return s; + return r + s; +@} +@end smallexample @enddefbuiltin -@defbuiltin{@var{type} __builtin_choose_expr (@var{const_exp}, @var{exp1}, @var{exp2})} +@defbuiltin{int __builtin_va_arg_pack_len ()} +This built-in function returns the number of anonymous arguments of +an inline function. It can be used only in inline functions that +are always inlined, never compiled as a separate function, such +as those using @code{__attribute__ ((__always_inline__))} or +@code{__attribute__ ((__gnu_inline__))} extern inline functions. +For example following does link- or run-time checking of open +arguments for optimized code: +@smallexample +#ifdef __OPTIMIZE__ +extern inline __attribute__((__gnu_inline__)) int +myopen (const char *path, int oflag, ...) +@{ + if (__builtin_va_arg_pack_len () > 1) + warn_open_too_many_arguments (); -You can use the built-in function @code{__builtin_choose_expr} to -evaluate code depending on the value of a constant expression. This -built-in function returns @var{exp1} if @var{const_exp}, which is an -integer constant expression, is nonzero. Otherwise it returns @var{exp2}. + if (__builtin_constant_p (oflag)) + @{ + if ((oflag & O_CREAT) != 0 && __builtin_va_arg_pack_len () < 1) + @{ + warn_open_missing_mode (); + return __open_2 (path, oflag); + @} + return open (path, oflag, __builtin_va_arg_pack ()); + @} -Like the @samp{? :} operator, this built-in function does not evaluate the -expression that is not chosen. For example, if @var{const_exp} evaluates to -@code{true}, @var{exp2} is not evaluated even if it has side effects. On the -other hand, @code{__builtin_choose_expr} differs from @samp{? :} in that the -first operand must be a compile-time constant, and the other operands are not -subject to the @samp{? 
:} type constraints and promotions. + if (__builtin_va_arg_pack_len () < 1) + return __open_2 (path, oflag); -This built-in function can return an lvalue if the chosen argument is an -lvalue. + return open (path, oflag, __builtin_va_arg_pack ()); +@} +#endif +@end smallexample +@enddefbuiltin -If @var{exp1} is returned, the return type is the same as @var{exp1}'s -type. Similarly, if @var{exp2} is returned, its return type is the same -as @var{exp2}. +@node Return Address +@section Getting the Return or Frame Address of a Function -Example: +These functions may be used to get information about the callers of a +function. -@smallexample -#define foo(x) \ - __builtin_choose_expr ( \ - __builtin_types_compatible_p (typeof (x), double), \ - foo_double (x), \ - __builtin_choose_expr ( \ - __builtin_types_compatible_p (typeof (x), float), \ - foo_float (x), \ - /* @r{The void expression results in a compile-time error} \ - @r{when assigning the result to something.} */ \ - (void)0)) -@end smallexample +@defbuiltin{{void *} __builtin_return_address (unsigned int @var{level})} +This function returns the return address of the current function, or of +one of its callers. The @var{level} argument is number of frames to +scan up the call stack. A value of @code{0} yields the return address +of the current function, a value of @code{1} yields the return address +of the caller of the current function, and so forth. When inlining +the expected behavior is that the function returns the address of +the function that is returned to. To work around this behavior use +the @code{noinline} function attribute. -@emph{Note:} This construct is only available for C@. Furthermore, the -unused expression (@var{exp1} or @var{exp2} depending on the value of -@var{const_exp}) may still generate syntax errors. This may change in -future revisions. +The @var{level} argument must be a constant integer. -@enddefbuiltin +On some machines it may be impossible to determine the return address of +any function other than the current one; in such cases, or when the top +of the stack has been reached, this function returns an unspecified +value. In addition, @code{__builtin_frame_address} may be used +to determine if the top of the stack has been reached. -@defbuiltin{@var{type} __builtin_tgmath (@var{functions}, @var{arguments})} +Additional post-processing of the returned value may be needed, see +@code{__builtin_extract_return_addr}. -The built-in function @code{__builtin_tgmath}, available only for C -and Objective-C, calls a function determined according to the rules of -@code{} macros. It is intended to be used in -implementations of that header, so that expansions of macros from that -header only expand each of their arguments once, to avoid problems -when calls to such macros are nested inside the arguments of other -calls to such macros; in addition, it results in better diagnostics -for invalid calls to @code{} macros than implementations -using other GNU C language features. For example, the @code{pow} -type-generic macro might be defined as: +The stored representation of the return address in memory may be different +from the address returned by @code{__builtin_return_address}. For example, +on AArch64 the stored address may be mangled with return address signing +whereas the address returned by @code{__builtin_return_address} is not. + +Calling this function with a nonzero argument can have unpredictable +effects, including crashing the calling program. 
As a result, calls +that are considered unsafe are diagnosed when the @option{-Wframe-address} +option is in effect. Such calls should only be made in debugging +situations. +On targets where code addresses are representable as @code{void *}, @smallexample -#define pow(a, b) __builtin_tgmath (powf, pow, powl, \ - cpowf, cpow, cpowl, a, b) +void *addr = __builtin_extract_return_addr (__builtin_return_address (0)); @end smallexample +gives the code address where the current function would return. For example, +such an address may be used with @code{dladdr} or other interfaces that work +with code addresses. +@enddefbuiltin -The arguments to @code{__builtin_tgmath} are at least two pointers to -functions, followed by the arguments to the type-generic macro (which -will be passed as arguments to the selected function). All the -pointers to functions must be pointers to prototyped functions, none -of which may have variable arguments, and all of which must have the -same number of parameters; the number of parameters of the first -function determines how many arguments to @code{__builtin_tgmath} are -interpreted as function pointers, and how many as the arguments to the -called function. - -The types of the specified functions must all be different, but -related to each other in the same way as a set of functions that may -be selected between by a macro in @code{}. This means that -the functions are parameterized by a floating-point type @var{t}, -different for each such function. The function return types may all -be the same type, or they may be @var{t} for each function, or they -may be the real type corresponding to @var{t} for each function (if -some of the types @var{t} are complex). Likewise, for each parameter -position, the type of the parameter in that position may always be the -same type, or may be @var{t} for each function (this case must apply -for at least one parameter position), or may be the real type -corresponding to @var{t} for each function. +@defbuiltin{{void *} __builtin_extract_return_addr (void *@var{addr})} +The address as returned by @code{__builtin_return_address} may have to be fed +through this function to get the actual encoded address. For example, on the +31-bit S/390 platform the highest bit has to be masked out, or on SPARC +platforms an offset has to be added for the true next instruction to be +executed. -The standard rules for @code{} macros are used to find a -common type @var{u} from the types of the arguments for parameters -whose types vary between the functions; complex integer types (a GNU -extension) are treated like the complex type corresponding to the real -floating type that would be chosen for the corresponding real integer type. -If the function return types vary, or are all the same integer type, -the function called is the one for which @var{t} is @var{u}, and it is -an error if there is no such function. If the function return types -are all the same floating-point type, the type-generic macro is taken -to be one of those from TS 18661 that rounds the result to a narrower -type; if there is a function for which @var{t} is @var{u}, it is -called, and otherwise the first function, if any, for which @var{t} -has at least the range and precision of @var{u} is called, and it is -an error if there is no such function. +If no fixup is needed, this function simply passes through @var{addr}. +@enddefbuiltin +@defbuiltin{{void *} __builtin_frob_return_addr (void *@var{addr})} +This function does the reverse of @code{__builtin_extract_return_addr}. 
@enddefbuiltin -@defbuiltin{int __builtin_constant_p (@var{exp})} -You can use the built-in function @code{__builtin_constant_p} to -determine if the expression @var{exp} is known to be constant at -compile time and hence that GCC can perform constant-folding on expressions -involving that value. The argument of the function is the expression to test. -The expression is not evaluated, side-effects are discarded. The function -returns the integer 1 if the argument is known to be a compile-time -constant and 0 if it is not known to be a compile-time constant. -Any expression that has side-effects makes the function return 0. -A return of 0 does not indicate that the expression is @emph{not} a constant, -but merely that GCC cannot prove it is a constant within the constraints -of the active set of optimization options. - -You typically use this function in an embedded application where -memory is a critical resource. If you have some complex calculation, -you may want it to be folded if it involves constants, but need to call -a function if it does not. For example: +@defbuiltin{{void *} __builtin_frame_address (unsigned int @var{level})} +This function is similar to @code{__builtin_return_address}, but it +returns the address of the function frame rather than the return address +of the function. Calling @code{__builtin_frame_address} with a value of +@code{0} yields the frame address of the current function, a value of +@code{1} yields the frame address of the caller of the current function, +and so forth. -@smallexample -#define Scale_Value(X) \ - (__builtin_constant_p (X) \ - ? ((X) * SCALE + OFFSET) : Scale (X)) -@end smallexample +The frame is the area on the stack that holds local variables and saved +registers. The frame address is normally the address of the first word +pushed on to the stack by the function. However, the exact definition +depends upon the processor and the calling convention. If the processor +has a dedicated frame pointer register, and the function has a frame, +then @code{__builtin_frame_address} returns the value of the frame +pointer register. -You may use this built-in function in either a macro or an inline -function. However, if you use it in an inlined function and pass an -argument of the function as the argument to the built-in, GCC -never returns 1 when you call the inline function with a string constant -or compound literal (@pxref{Compound Literals}) and does not return 1 -when you pass a constant numeric value to the inline function unless you -specify the @option{-O} option. +On some machines it may be impossible to determine the frame address of +any function other than the current one; in such cases, or when the top +of the stack has been reached, this function returns @code{0} if +the first frame pointer is properly initialized by the startup code. -You may also use @code{__builtin_constant_p} in initializers for static -data. For instance, you can write +Calling this function with a nonzero argument can have unpredictable +effects, including crashing the calling program. As a result, calls +that are considered unsafe are diagnosed when the @option{-Wframe-address} +option is in effect. Such calls should only be made in debugging +situations. +@enddefbuiltin -@smallexample -static const int table[] = @{ - __builtin_constant_p (EXPRESSION) ? 
(EXPRESSION) : -1,
-  /* @r{@dots{}} */
-@};
-@end smallexample
+@deftypefn {Built-in Function} {void *} __builtin_stack_address ()
+This function returns the stack pointer register, offset by
+@code{STACK_ADDRESS_OFFSET} if that's defined.
-@noindent
-This is an acceptable initializer even if @var{EXPRESSION} is not a
-constant expression, including the case where
-@code{__builtin_constant_p} returns 1 because @var{EXPRESSION} can be
-folded to a constant but @var{EXPRESSION} contains operands that are
-not otherwise permitted in a static initializer (for example,
-@code{0 && foo ()}). GCC must be more conservative about evaluating the
-built-in in this case, because it has no opportunity to perform
-optimization.
-@enddefbuiltin
+Conceptually, the address returned by this built-in function is
+the boundary between the stack area allocated for use by its caller, and
+the area that could be modified by a function call and that the caller
+could safely zero out before or after (but not during) the call
+sequence.
-@defbuiltin{bool __builtin_is_constant_evaluated (void)}
-The @code{__builtin_is_constant_evaluated} function is available only
-in C++. The built-in is intended to be used by implementations of
-the @code{std::is_constant_evaluated} C++ function. Programs should make
-use of the latter function rather than invoking the built-in directly.
+Arguments for a callee may be preallocated as part of the caller's stack
+frame, or allocated on a per-call basis, depending on the target, so
+they may be on either side of this boundary.
-The main use case of the built-in is to determine whether a @code{constexpr}
-function is being called in a @code{constexpr} context. A call to
-the function evaluates to a core constant expression with the value
-@code{true} if and only if it occurs within the evaluation of an expression
-or conversion that is manifestly constant-evaluated as defined in the C++
-standard. Manifestly constant-evaluated contexts include constant-expressions,
-the conditions of @code{constexpr if} statements, constraint-expressions, and
-initializers of variables usable in constant expressions. For more details
-refer to the latest revision of the C++ standard.
-@enddefbuiltin
+Even if the stack pointer is biased, the result is not. The register
+save area on SPARC is regarded as modifiable by calls, rather than as
+allocated for use by the caller function, since it is never in use while
+the caller function itself is running.
-@defbuiltin{@var{type} __builtin_counted_by_ref (@var{ptr})}
-The built-in function @code{__builtin_counted_by_ref} checks whether the array
-object pointed by the pointer @var{ptr} has another object associated with it
-that represents the number of elements in the array object through the
-@code{counted_by} attribute (i.e. the counted-by object). If so, returns a
-pointer to the corresponding counted-by object.
-If such counted-by object does not exist, returns a null pointer.
+Red zones that only leaf functions could use are also regarded as
+modifiable by calls, rather than as allocated for use by the caller.
+This is only theoretical, since leaf functions do not issue calls, but a
+constant offset makes this built-in function more predictable.
+@end deftypefn
-This built-in function is only available in C for now.
+@node Stack Scrubbing
+@section Stack scrubbing internal interfaces
-The argument @var{ptr} must be a pointer to an array. 
-The @var{type} of the returned value is a pointer type pointing to the -corresponding type of the counted-by object or a void pointer type in case -of a null pointer being returned. +Stack scrubbing involves cooperation between a @code{strub} context, +i.e., a function whose stack frame is to be zeroed-out, and its callers. +The caller initializes a stack watermark, the @code{strub} context +updates the watermark according to its stack use, and the caller zeroes +it out once it regains control, whether by the callee's returning or by +an exception. -For example: +Each of these steps is performed by a different builtin function call. +Calls to these builtins are introduced automatically, in response to +@code{strub} attributes and command-line options; they are not expected +to be explicitly called by source code. -@smallexample -struct foo1 @{ - int counter; - struct bar1 array[] __attribute__((counted_by (counter))); -@} *p; +The functions that implement the builtins are available in libgcc but, +depending on optimization levels, they are expanded internally, adjusted +to account for inlining, and sometimes combined/deferred (e.g. passing +the caller-supplied watermark on to callees, refraining from erasing +stack areas that the caller will) to enable tail calls and to optimize +for code size. -struct foo2 @{ - int other; - struct bar2 array[]; -@} *q; -@end smallexample +@deftypefn {Built-in Function} {void} __builtin___strub_enter (void **@var{wmptr}) +This function initializes a stack @var{watermark} variable with the +current top of the stack. A call to this builtin function is introduced +before entering a @code{strub} context. It remains as a function call +if optimization is not enabled. +@end deftypefn -@noindent -the following call to the built-in +@deftypefn {Built-in Function} {void} __builtin___strub_update (void **@var{wmptr}) +This function updates a stack @var{watermark} variable with the current +top of the stack, if it tops the previous watermark. A call to this +builtin function is inserted within @code{strub} contexts, whenever +additional stack space may have been used. It remains as a function +call at optimization levels lower than 2. +@end deftypefn -@smallexample -__builtin_counted_by_ref (p->array) -@end smallexample +@deftypefn {Built-in Function} {void} __builtin___strub_leave (void **@var{wmptr}) +This function overwrites the memory area between the current top of the +stack, and the @var{watermark}ed address. A call to this builtin +function is inserted after leaving a @code{strub} context. It remains +as a function call at optimization levels lower than 3, and it is guarded by +a condition at level 2. +@end deftypefn -@noindent -returns: +@node Vector Extensions +@section Using Vector Instructions through Built-in Functions -@smallexample -&p->counter with type @code{int *}. -@end smallexample +On some targets, the instruction set contains SIMD vector instructions which +operate on multiple values contained in one large register at the same time. +For example, on the x86 the MMX, 3DNow!@: and SSE extensions can be used +this way. -@noindent -However, the following call to the built-in +The first step in using these extensions is to provide the necessary data +types. This should be done using an appropriate @code{typedef}: @smallexample -__builtin_counted_by_ref (q->array) +typedef int v4si __attribute__ ((vector_size (16))); @end smallexample @noindent -returns a null pointer to @code{void}. 
+The @code{int} type specifies the @dfn{base type} (which can be a
+@code{typedef}), while the attribute specifies the vector size for the
+variable, measured in bytes. For example, the declaration above causes
+the compiler to set the mode for the @code{v4si} type to be 16 bytes wide
+and divided into @code{int} sized units. For a 32-bit @code{int} this
+means a vector of 4 units of 4 bytes, and the corresponding mode of
+@code{v4si} is @acronym{V4SI}.
-@enddefbuiltin
+The @code{vector_size} attribute is only applicable to integral and
+floating scalars, although arrays, pointers, and function return values
+are allowed in conjunction with this construct. Only sizes that are
+positive power-of-two multiples of the base type size are currently allowed.
-@defbuiltin{void __builtin_clear_padding (@var{ptr})}
-The built-in function @code{__builtin_clear_padding} function clears
-padding bits inside of the object representation of object pointed by
-@var{ptr}, which has to be a pointer. The value representation of the
-object is not affected. The type of the object is assumed to be the type
-the pointer points to. Inside of a union, the only cleared bits are
-bits that are padding bits for all the union members.
+All the basic integer types can be used as base types, both as signed
+and as unsigned: @code{char}, @code{short}, @code{int}, @code{long},
+@code{long long}. In addition, @code{float} and @code{double} can be
+used to build floating-point vector types.
-This built-in-function is useful if the padding bits of an object might
-have indeterminate values and the object representation needs to be
-bitwise compared to some other object, for example for atomic operations.
+Specifying a combination that is not valid for the current architecture
+causes GCC to synthesize the instructions using a narrower mode.
+For example, if you specify a variable of type @code{V4SI} and your
+architecture does not allow for this specific SIMD type, GCC
+produces code that uses 4 @code{SIs}.
-For C++, @var{ptr} argument type should be pointer to trivially-copyable
-type, unless the argument is address of a variable or parameter, because
-otherwise it isn't known if the type isn't just a base class whose padding
-bits are reused or laid out differently in a derived class.
-@enddefbuiltin
+The types defined in this manner can be used with a subset of normal C
+operations. Currently, GCC allows using the following operators
+on these types: @code{+, -, *, /, unary minus, ^, |, &, ~, %}@.
-@defbuiltin{@var{type} __builtin_bit_cast (@var{type}, @var{arg})}
-The @code{__builtin_bit_cast} function is available only
-in C++. The built-in is intended to be used by implementations of
-the @code{std::bit_cast} C++ template function. Programs should make
-use of the latter function rather than invoking the built-in directly.
+The operations behave like C++ @code{valarrays}. Addition is defined as
+the addition of the corresponding elements of the operands. For
+example, in the code below, each of the 4 elements in @var{a} is
+added to the corresponding 4 elements in @var{b} and the resulting
+vector is stored in @var{c}.
-This built-in function allows reinterpreting the bits of the @var{arg}
-argument as if it had type @var{type}. @var{type} and the type of the
-@var{arg} argument need to be trivially copyable types with the same size.
-When manifestly constant-evaluated, it performs extra diagnostics required
-for @code{std::bit_cast} and returns a constant expression if @var{arg}
-is a constant expression. 
For more details
-refer to the latest revision of the C++ standard.
-@enddefbuiltin
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
-@defbuiltin{long __builtin_expect (long @var{exp}, long @var{c})}
-@opindex fprofile-arcs
-You may use @code{__builtin_expect} to provide the compiler with
-branch prediction information. In general, you should prefer to
-use actual profile feedback for this (@option{-fprofile-arcs}), as
-programmers are notoriously bad at predicting how their programs
-actually perform. However, there are applications in which this
-data is hard to collect.
+v4si a, b, c;
-The return value is the value of @var{exp}, which should be an integral
-expression. The semantics of the built-in are that it is expected that
-@var{exp} == @var{c}. For example:
-
-@smallexample
-if (__builtin_expect (x, 0))
-  foo ();
-@end smallexample
-
-@noindent
-indicates that we do not expect to call @code{foo}, since
-we expect @code{x} to be zero. Since you are limited to integral
-expressions for @var{exp}, you should use constructions such as
-
-@smallexample
-if (__builtin_expect (ptr != NULL, 1))
-  foo (*ptr);
+c = a + b;
@end smallexample
-
-@noindent
-when testing pointer or floating-point values.
+Subtraction, multiplication, division, and the logical operations
+operate in a similar manner. Likewise, the result of using the unary
+minus or complement operators on a vector type is a vector whose
+elements are the negative or complemented values of the corresponding
+elements in the operand.
-For the purposes of branch prediction optimizations, the probability that
-a @code{__builtin_expect} expression is @code{true} is controlled by GCC's
-@code{builtin-expect-probability} parameter, which defaults to 90%.
+It is possible to use shifting operators @code{<<}, @code{>>} on
+integer-type vectors. The operation is defined as follows: @code{@{a0,
+a1, @dots{}, an@} >> @{b0, b1, @dots{}, bn@} == @{a0 >> b0, a1 >> b1,
+@dots{}, an >> bn@}}@. Unlike OpenCL, values of @code{b} are not
+implicitly taken modulo the bit width of the base type @code{B}, and the
+behavior is undefined if any @code{bi} is greater than or equal to @code{B}.
-You can also use @code{__builtin_expect_with_probability} to explicitly
-assign a probability value to individual expressions. If the built-in
-is used in a loop construct, the provided probability will influence
-the expected number of iterations made by loop optimizations.
-@enddefbuiltin
+In contrast to scalar operations in C and C++, operands of integer vector
+operations do not undergo integer promotions.
-@defbuiltin{long __builtin_expect_with_probability}
-(long @var{exp}, long @var{c}, double @var{probability})
-
-This function has the same semantics as @code{__builtin_expect},
-but the caller provides the expected probability that @var{exp} == @var{c}.
-The last argument, @var{probability}, is a floating-point value in the
-range 0.0 to 1.0, inclusive. The @var{probability} argument must be a
-constant floating-point expression.
-@enddefbuiltin
+Operands of binary vector operations must have the same number of
+elements.
+
+For convenience, it is allowed to use a binary vector operation
+where one operand is a scalar. In that case the compiler transforms
+the scalar operand into a vector where each element is the scalar from
+the operation. The transformation happens only if the scalar could be
+safely converted to the vector-element type.
+Consider the following code. 
-@defbuiltin{void __builtin_trap (void)}
-This function causes the program to exit abnormally. GCC implements
-this function by using a target-dependent mechanism (such as
-intentionally executing an illegal instruction) or by calling
-@code{abort}. The mechanism used may vary from release to release so
-you should not rely on any particular implementation.
-@enddefbuiltin
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
-@defbuiltin{void __builtin_unreachable (void)}
-If control flow reaches the point of the @code{__builtin_unreachable},
-the program is undefined. It is useful in situations where the
-compiler cannot deduce the unreachability of the code.
+v4si a, b, c;
+long l;
-One such case is immediately following an @code{asm} statement that
-either never terminates, or one that transfers control elsewhere
-and never returns. In this example, without the
-@code{__builtin_unreachable}, GCC issues a warning that control
-reaches the end of a non-void function. It also generates code
-to return after the @code{asm}.
+a = b + 1; /* a = b + @{1,1,1,1@}; */
+a = 2 * b; /* a = @{2,2,2,2@} * b; */
-@smallexample
-int f (int c, int v)
-@{
-  if (c)
-    @{
-      return v;
-    @}
-  else
-    @{
-      asm("jmp error_handler");
-      __builtin_unreachable ();
-    @}
-@}
+a = l + a; /* Error, cannot convert long to int. */
@end smallexample
-@noindent
-Because the @code{asm} statement unconditionally transfers control out
-of the function, control never reaches the end of the function
-body. The @code{__builtin_unreachable} is in fact unreachable and
-communicates this fact to the compiler.
+Vectors can be subscripted as if the vector were an array with
+the same number of elements and base type. Out-of-bounds accesses
+invoke undefined behavior at run time. Warnings for out-of-bounds
+accesses for vector subscripting can be enabled with
+@option{-Warray-bounds}.
-Another use for @code{__builtin_unreachable} is following a call a
-function that never returns but that is not declared
-@code{__attribute__((noreturn))}, as in this example:
+Vector comparison is supported with standard comparison
+operators: @code{==, !=, <, <=, >, >=}. Comparison operands can be
+vector expressions of integer-type or real-type. Comparison between
+integer-type vectors and real-type vectors is not supported. The
+result of the comparison is a vector of the same width and number of
+elements as the comparison operands with a signed integral element
+type.
+
+Vectors are compared element-wise producing 0 when comparison is false
+and -1 (constant of the appropriate type where all bits are set)
+otherwise. Consider the following example.
@smallexample
-void function_that_never_returns (void);
+typedef int v4si __attribute__ ((vector_size (16)));
-int g (int c)
-@{
-  if (c)
-    @{
-      return 1;
-    @}
-  else
-    @{
-      function_that_never_returns ();
-      __builtin_unreachable ();
-    @}
-@}
+v4si a = @{1,2,3,4@};
+v4si b = @{3,2,1,4@};
+v4si c;
+
+c = a > b; /* The result would be @{0, 0,-1, 0@} */
+c = a == b; /* The result would be @{0,-1, 0,-1@} */
@end smallexample
-@enddefbuiltin
+In C++, the ternary operator @code{?:} is available. @code{a?b:c}, where
+@code{b} and @code{c} are vectors of the same type and @code{a} is an
+integer vector with the same number of elements of the same size as @code{b}
+and @code{c}, computes all three arguments and creates a vector
+@code{@{a[0]?b[0]:c[0], a[1]?b[1]:c[1], @dots{}@}}. Note that unlike in
+OpenCL, @code{a} is thus interpreted as @code{a != 0} and not @code{a < 0}. 
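+
+For example, the comparison and selection operations can be combined to
+clamp vector elements (a minimal sketch; the types and values are purely
+illustrative, and the selection itself is C++ only):
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si val = @{-5, 10, 250, 300@};
+v4si top = @{255, 255, 255, 255@};
+v4si m;
+
+m = val > top;       /* m is @{0, 0, 0, -1@} */
+val = m ? top : val; /* val is @{-5, 10, 250, 255@} */
+@end smallexample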
+As in the case of binary operations, this syntax is also accepted when +one of @code{b} or @code{c} is a scalar that is then transformed into a +vector. If both @code{b} and @code{c} are scalars and the type of +@code{true?b:c} has the same size as the element type of @code{a}, then +@code{b} and @code{c} are converted to a vector type whose elements have +this type and with the same number of elements as @code{a}. -@defbuiltin{@var{type} __builtin_assoc_barrier (@var{type} @var{expr})} -This built-in inhibits re-association of the floating-point expression -@var{expr} with expressions consuming the return value of the built-in. The -expression @var{expr} itself can be reordered, and the whole expression -@var{expr} can be reordered with operands after the barrier. The barrier is -relevant when @code{-fassociative-math} is active. +In C++, the logic operators @code{!, &&, ||} are available for vectors. +@code{!v} is equivalent to @code{v == 0}, @code{a && b} is equivalent to +@code{a!=0 & b!=0} and @code{a || b} is equivalent to @code{a!=0 | b!=0}. +For mixed operations between a scalar @code{s} and a vector @code{v}, +@code{s && v} is equivalent to @code{s?v!=0:0} (the evaluation is +short-circuit) and @code{v && s} is equivalent to @code{v!=0 & (s?-1:0)}. -@smallexample -float x0 = a + b - b; -float x1 = __builtin_assoc_barrier(a + b) - b; -@end smallexample +@findex __builtin_shuffle +Vector shuffling is available using functions +@code{__builtin_shuffle (vec, mask)} and +@code{__builtin_shuffle (vec0, vec1, mask)}. +Both functions construct a permutation of elements from one or two +vectors and return a vector of the same type as the input vector(s). +The @var{mask} is an integral vector with the same width (@var{W}) +and element count (@var{N}) as the output vector. -@noindent -means that, with @code{-fassociative-math}, @code{x0} can be optimized to -@code{x0 = a} but @code{x1} cannot. +The elements of the input vectors are numbered in memory ordering of +@var{vec0} beginning at 0 and @var{vec1} beginning at @var{N}. The +elements of @var{mask} are considered modulo @var{N} in the single-operand +case and modulo @math{2*@var{N}} in the two-operand case. -It is also relevant when @code{-ffp-contract=fast} is active; -it will prevent contraction between expressions. +Consider the following example, @smallexample -float x0 = a * b + c; -float x1 = __builtin_assoc_barrier (a * b) + c; +typedef int v4si __attribute__ ((vector_size (16))); + +v4si a = @{1,2,3,4@}; +v4si b = @{5,6,7,8@}; +v4si mask1 = @{0,1,1,3@}; +v4si mask2 = @{0,4,2,5@}; +v4si res; + +res = __builtin_shuffle (a, mask1); /* res is @{1,2,2,4@} */ +res = __builtin_shuffle (a, b, mask2); /* res is @{1,5,3,6@} */ @end smallexample -@noindent -means that, with @code{-ffp-contract=fast}, @code{x0} may be optimized to -use a fused multiply-add instruction but @code{x1} cannot. +Note that @code{__builtin_shuffle} is intentionally semantically +compatible with the OpenCL @code{shuffle} and @code{shuffle2} functions. -@enddefbuiltin +You can declare variables and use them in function calls and returns, as +well as in assignments and some casts. You can specify a vector type as +a return type for a function. Vector types can also be used as function +arguments. It is possible to cast from one vector type to another, +provided they are of the same size (in fact, you can also cast vectors +to and from other data types of the same size). 
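+
+Such casts do not convert the element values; they reinterpret the bytes
+of the operand as the new vector type. A minimal sketch (both types are
+assumptions for the example):
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef short v8hi __attribute__ ((vector_size (16)));
+
+v4si a = @{1, 2, 3, 4@};
+v8hi b = (v8hi) a; /* Same 16 bytes, viewed as 8 shorts. */
+@end smallexample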
-@defbuiltin{{void *} __builtin_assume_aligned (const void *@var{exp}, size_t @var{align}, ...)}
-This function returns its first argument, and allows the compiler
-to assume that the returned pointer is at least @var{align} bytes
-aligned. This built-in can have either two or three arguments,
-if it has three, the third argument should have integer type, and
-if it is nonzero means misalignment offset. For example:
+You cannot operate between vectors of different lengths or different
+signedness without a cast.
-@smallexample
-void *x = __builtin_assume_aligned (arg, 16);
-@end smallexample
+@findex __builtin_shufflevector
+Vector shuffling is available using the
+@code{__builtin_shufflevector (vec1, vec2, index...)}
+function. @var{vec1} and @var{vec2} must be expressions with
+vector type with a compatible element type. The result of
+@code{__builtin_shufflevector} is a vector with the same element type
+as @var{vec1} and @var{vec2} but that has an element count equal to
+the number of indices specified.
-@noindent
-means that the compiler can assume @code{x}, set to @code{arg}, is at least
-16-byte aligned, while:
+The @var{index} arguments are a list of integers that specify the
+element indices of the first two vectors that should be extracted and
+returned in a new vector. These element indices are numbered sequentially
+starting with the first vector, continuing into the second vector.
+An index of -1 can be used to indicate that the corresponding element in
+the returned vector is a don't care and can be freely chosen to optimize
+the generated code sequence performing the shuffle operation.
+Consider the following example,
-@smallexample
-void *x = __builtin_assume_aligned (arg, 32, 8);
-@end smallexample
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef int v8si __attribute__ ((vector_size (32)));
-@noindent
-means that the compiler can assume for @code{x}, set to @code{arg}, that
-@code{(char *) x - 8} is 32-byte aligned.
-@enddefbuiltin
+
+v8si a = @{1,-2,3,-4,5,-6,7,-8@};
+v4si b = __builtin_shufflevector (a, a, 0, 2, 4, 6); /* b is @{1,3,5,7@} */
+v4si c = @{-2,-4,-6,-8@};
+v8si d = __builtin_shufflevector (c, b, 4, 0, 5, 1, 6, 2, 7, 3); /* d is a */
+@end smallexample
-@defbuiltin{int __builtin_LINE ()}
-This function is the equivalent of the preprocessor @code{__LINE__}
-macro and returns a constant integer expression that evaluates to
-the line number of the invocation of the built-in. When used as a C++
-default argument for a function @var{F}, it returns the line number
-of the call to @var{F}.
-@enddefbuiltin
-
-@defbuiltin{{const char *} __builtin_FUNCTION ()}
-This function is the equivalent of the @code{__FUNCTION__} symbol
-and returns an address constant pointing to the name of the function
-from which the built-in was invoked, or the empty string if
-the invocation is not at function scope. When used as a C++ default
-argument for a function @var{F}, it returns the name of @var{F}'s
-caller or the empty string if the call was not made at function
-scope.
-@enddefbuiltin
+@findex __builtin_convertvector
+Vector conversion is available using the
+@code{__builtin_convertvector (vec, vectype)}
+function. @var{vec} must be an expression with integral or floating
+vector type and @var{vectype} an integral or floating vector type with the
+same number of elements. The result has @var{vectype} type and value of
+a C cast of every element of @var{vec} to the element type of @var{vectype}. 
-@defbuiltin{{const char *} __builtin_FILE ()}
-This function is the equivalent of the preprocessor @code{__FILE__}
-macro and returns an address constant pointing to the file name
-containing the invocation of the built-in, or the empty string if
-the invocation is not at function scope. When used as a C++ default
-argument for a function @var{F}, it returns the file name of the call
-to @var{F} or the empty string if the call was not made at function
-scope.
+Consider the following example,
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef float v4sf __attribute__ ((vector_size (16)));
+typedef double v4df __attribute__ ((vector_size (32)));
+typedef unsigned long long v4di __attribute__ ((vector_size (32)));
-For example, in the following, each call to function @code{foo} will
-print a line similar to @code{"file.c:123: foo: message"} with the name
-of the file and the line number of the @code{printf} call, the name of
-the function @code{foo}, followed by the word @code{message}.
+v4si a = @{1,-2,3,-4@};
+v4sf b = @{1.5f,-2.5f,3.f,7.f@};
+v4di c = @{1ULL,5ULL,0ULL,10ULL@};
+v4sf d = __builtin_convertvector (a, v4sf); /* d is @{1.f,-2.f,3.f,-4.f@} */
+/* Equivalent of:
+   v4sf d = @{ (float)a[0], (float)a[1], (float)a[2], (float)a[3] @}; */
+v4df e = __builtin_convertvector (a, v4df); /* e is @{1.,-2.,3.,-4.@} */
+v4df f = __builtin_convertvector (b, v4df); /* f is @{1.5,-2.5,3.,7.@} */
+v4si g = __builtin_convertvector (f, v4si); /* g is @{1,-2,3,7@} */
+v4si h = __builtin_convertvector (c, v4si); /* h is @{1,5,0,10@} */
+@end smallexample
+@cindex vector types, using with x86 intrinsics
+Sometimes it is desirable to write code using a mix of generic vector
+operations (for clarity) and machine-specific vector intrinsics (to
+access vector instructions that are not exposed via generic built-ins).
+On x86, intrinsic functions for integer vectors typically use the same
+vector type @code{__m128i} irrespective of how they interpret the vector,
+making it necessary to cast their arguments and return values from/to
+other vector types. In C, you can make use of a @code{union} type:
+@c In C++ such type punning via a union is not allowed by the language
@smallexample
-const char*
-function (const char *func = __builtin_FUNCTION ())
-@{
-  return func;
-@}
+#include <immintrin.h>
+
-void foo (void)
-@{
-  printf ("%s:%i: %s: message\n", file (), line (), function ());
-@}
+typedef unsigned char u8x16 __attribute__ ((vector_size (16)));
+typedef unsigned int u32x4 __attribute__ ((vector_size (16)));
+
+typedef union @{
+  __m128i mm;
+  u8x16 u8;
+  u32x4 u32;
+@} v128;
@end smallexample
-@enddefbuiltin
+@noindent
+for variables that can be used with both built-in operators and x86
+intrinsics:
-@defbuiltin{void __builtin___clear_cache (void *@var{begin}, void *@var{end})}
-This function is used to flush the processor's instruction cache for
-the region of memory between @var{begin} inclusive and @var{end}
-exclusive. Some targets require that the instruction cache be
-flushed, after modifying memory containing code, in order to obtain
-deterministic behavior.
+@smallexample
+v128 x, y = @{ 0 @};
+memcpy (&x, ptr, sizeof x);
+y.u8 += 0x80;
+x.mm = _mm_adds_epu8 (x.mm, y.mm);
+x.u32 &= 0xffffff;
-If the target does not require instruction cache flushes,
-@code{__builtin___clear_cache} has no effect. Otherwise either
-instructions are emitted in-line to clear the instruction cache or a
-call to the @code{__clear_cache} function in libgcc is made. 
-@enddefbuiltin
+/* Instead of a variable, a compound literal may be used to pass the
+   return value of an intrinsic call to a function expecting the union: */
+v128 foo (v128);
+x = foo ((v128) @{_mm_adds_epu8 (x.mm, y.mm)@});
+@c This could be done implicitly with __attribute__((transparent_union)),
+@c but GCC does not accept it for unions of vector types (PR 88955).
+@end smallexample
-@defbuiltin{void __builtin_prefetch (const void *@var{addr}, ...)}
-This function is used to minimize cache-miss latency by moving data into
-a cache before it is accessed.
-You can insert calls to @code{__builtin_prefetch} into code for which
-you know addresses of data in memory that is likely to be accessed soon.
-If the target supports them, data prefetch instructions are generated.
-If the prefetch is done early enough before the access then the data will
-be in the cache by the time it is accessed.
+@node __sync Builtins
+@section Legacy @code{__sync} Built-in Functions for Atomic Memory Access
-The value of @var{addr} is the address of the memory to prefetch.
-There are two optional arguments, @var{rw} and @var{locality}.
-The value of @var{rw} is a compile-time constant zero, one or two; one
-means that the prefetch is preparing for a write to the memory address,
-two means that the prefetch is preparing for a shared read (expected to be
-read by at least one other processor before it is written if written at
-all) and zero, the default, means that the prefetch is preparing for a read.
-The value @var{locality} must be a compile-time constant integer between
-zero and three. A value of zero means that the data has no temporal
-locality, so it need not be left in the cache after the access. A value
-of three means that the data has a high degree of temporal locality and
-should be left in all levels of cache possible. Values of one and two
-mean, respectively, a low or moderate degree of temporal locality. The
-default is three.
+The following built-in functions
+are intended to be compatible with those described
+in the @cite{Intel Itanium Processor-specific Application Binary Interface},
+section 7.4. As such, they depart from normal GCC practice by not using
+the @samp{__builtin_} prefix and also by being overloaded so that they
+work on multiple types.
-@smallexample
-for (i = 0; i < n; i++)
-  @{
-    a[i] = a[i] + b[i];
-    __builtin_prefetch (&a[i+j], 1, 1);
-    __builtin_prefetch (&b[i+j], 0, 1);
-    /* @r{@dots{}} */
-  @}
-@end smallexample
+The definition given in the Intel documentation allows only for the use of
+the types @code{int}, @code{long}, @code{long long} or their unsigned
+counterparts. GCC allows any scalar type that is 1, 2, 4 or 8 bytes in
+size other than the C type @code{_Bool} or the C++ type @code{bool}.
+Operations on pointer arguments are performed as if the operands were
+of the @code{uintptr_t} type. That is, they are not scaled by the size
+of the type to which the pointer points.
-Data prefetch does not generate faults if @var{addr} is invalid, but
-the address expression itself must be valid. For example, a prefetch
-of @code{p->next} does not fault if @code{p->next} is not a valid
-address, but evaluation faults if @code{p} is not a valid address.
+These functions are implemented in terms of the @samp{__atomic}
+builtins (@pxref{__atomic Builtins}). They should not be used for new
+code, which should use the @samp{__atomic} builtins instead. 
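+
+For instance, a legacy atomic increment and one possible replacement
+might look as follows (a sketch; @code{__ATOMIC_SEQ_CST} matches the
+full-barrier behavior of the legacy call, though a weaker order often
+suffices):
+
+@smallexample
+int counter;
+
+void increment (void)
+@{
+  /* Legacy style.  */
+  __sync_fetch_and_add (&counter, 1);
+  /* Preferred style.  */
+  __atomic_fetch_add (&counter, 1, __ATOMIC_SEQ_CST);
+@}
+@end smallexample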
-If the target does not support data prefetch, the address expression
-is evaluated if it includes side effects but no other code is generated
-and GCC does not issue a warning.
-@enddefbuiltin
+Not all operations are supported by all target processors. If a particular
+operation cannot be implemented on the target processor, a call to an
+external function is generated. The external function carries the same name
+as the built-in version, with an additional suffix
+@samp{_@var{n}} where @var{n} is the size of the data type.
-@defbuiltin{{size_t} __builtin_object_size (const void * @var{ptr}, int @var{type})}
-Returns a constant size estimate of an object pointed to by @var{ptr}.
-@xref{Object Size Checking}, for a detailed description of the function.
-@enddefbuiltin
+In most cases, these built-in functions are considered a @dfn{full barrier}.
+That is,
+no memory operand is moved across the operation, either forward or
+backward. Further, instructions are issued as necessary to prevent the
+processor from speculating loads across the operation and from queuing stores
+after the operation.
-@defbuiltin{{size_t} __builtin_dynamic_object_size (const void * @var{ptr}, int @var{type})}
-Similar to @code{__builtin_object_size} except that the return value
-need not be a constant. @xref{Object Size Checking}, for a detailed
-description of the function.
-@enddefbuiltin
+All of the routines are described in the Intel documentation to take
+``an optional list of variables protected by the memory barrier''. It's
+not clear what is meant by that; it could mean that @emph{only} the
+listed variables are protected, or it could mean a list of additional
+variables to be protected. The list is ignored by GCC, which treats it as
+empty. GCC interprets an empty list as meaning that all globally
+accessible variables should be protected.
-@defbuiltin{int __builtin_classify_type (@var{arg})}
-@defbuiltinx{int __builtin_classify_type (@var{type})}
-The @code{__builtin_classify_type} returns a small integer with a category
-of @var{arg} argument's type, like void type, integer type, enumeral type,
-boolean type, pointer type, reference type, offset type, real type, complex
-type, function type, method type, record type, union type, array type,
-string type, bit-precise integer type, vector type, etc. When the argument
-is an expression, for backwards compatibility reason the argument is promoted
-like arguments passed to @code{...} in varargs function, so some classes are
-never returned in certain languages. Alternatively, the argument of the
-built-in function can be a typename, such as the @code{typeof} specifier.
+@defbuiltin{@var{type} __sync_fetch_and_add (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
+@defbuiltinx{@var{type} __sync_fetch_and_sub (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
+@defbuiltinx{@var{type} __sync_fetch_and_or (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
+@defbuiltinx{@var{type} __sync_fetch_and_and (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
+@defbuiltinx{@var{type} __sync_fetch_and_xor (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
+@defbuiltinx{@var{type} __sync_fetch_and_nand (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
+These built-in functions perform the operation suggested by the name, and
+return the value that had previously been in memory. That is, operations
+on integer operands have the following semantics. Operations on pointer
+arguments are performed as if the operands were of the @code{uintptr_t}
+type. 
That is, they are not scaled by the size of the type to which
+the pointer points.
@smallexample
-int a[2];
-__builtin_classify_type (a) == __builtin_classify_type (int[5]);
-__builtin_classify_type (a) == __builtin_classify_type (void*);
-__builtin_classify_type (typeof (a)) == __builtin_classify_type (int[5]);
+@{ tmp = *ptr; *ptr @var{op}= value; return tmp; @}
+@{ tmp = *ptr; *ptr = ~(tmp & value); return tmp; @} // nand
@end smallexample
-The first comparison will never be true, as @var{a} is implicitly converted
-to pointer. The last two comparisons will be true as they classify
-pointers in the second case and arrays in the last case.
-@enddefbuiltin
+The object pointed to by the first argument must be of integer or pointer
+type. It must not be a boolean type.
-@defbuiltin{double __builtin_huge_val (void)}
-Returns a positive infinity, if supported by the floating-point format,
-else @code{DBL_MAX}. This function is suitable for implementing the
-ISO C macro @code{HUGE_VAL}.
+@emph{Note:} GCC 4.4 and later implement @code{__sync_fetch_and_nand}
+as @code{*ptr = ~(tmp & value)} instead of @code{*ptr = ~tmp & value}.
@enddefbuiltin
-@defbuiltin{float __builtin_huge_valf (void)}
-Similar to @code{__builtin_huge_val}, except the return type is @code{float}.
-@enddefbuiltin
+@defbuiltin{@var{type} __sync_add_and_fetch (@var{type} *@var{ptr}, @
+  @var{type} @var{value}, ...)}
+@defbuiltinx{@var{type} __sync_sub_and_fetch (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
+@defbuiltinx{@var{type} __sync_or_and_fetch (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
+@defbuiltinx{@var{type} __sync_and_and_fetch (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
+@defbuiltinx{@var{type} __sync_xor_and_fetch (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
+@defbuiltinx{@var{type} __sync_nand_and_fetch (@var{type} *@var{ptr}, @var{type} @var{value}, ...)}
-@defbuiltin{{long double} __builtin_huge_vall (void)}
-Similar to @code{__builtin_huge_val}, except the return
-type is @code{long double}.
-@enddefbuiltin
+These built-in functions perform the operation suggested by the name, and
+return the new value. That is, operations on integer operands have
+the following semantics. Operations on pointer operands are performed as
+if the operand's type were @code{uintptr_t}.
-@defbuiltin{_Float@var{n} __builtin_huge_valf@var{n} (void)}
-Similar to @code{__builtin_huge_val}, except the return type is
-@code{_Float@var{n}}.
-@enddefbuiltin
+@smallexample
+@{ *ptr @var{op}= value; return *ptr; @}
+@{ *ptr = ~(*ptr & value); return *ptr; @} // nand
+@end smallexample
-@defbuiltin{_Float@var{n}x __builtin_huge_valf@var{n}x (void)}
-Similar to @code{__builtin_huge_val}, except the return type is
-@code{_Float@var{n}x}.
+The same constraints on arguments apply as for the corresponding
+@code{__sync_fetch_and_@var{op}} built-in functions.
+
+@emph{Note:} GCC 4.4 and later implement @code{__sync_nand_and_fetch}
+as @code{*ptr = ~(*ptr & value)} instead of
+@code{*ptr = ~*ptr & value}.
@enddefbuiltin
-@defbuiltin{int __builtin_fpclassify (int, int, int, int, int, ...)}
-This built-in implements the C99 fpclassify functionality. The first
-five int arguments should be the target library's notion of the
-possible FP classes and are used for return values. They must be
-constant values and they must appear in this order: @code{FP_NAN},
-@code{FP_INFINITE}, @code{FP_NORMAL}, @code{FP_SUBNORMAL} and
-@code{FP_ZERO}. The ellipsis is for exactly one floating-point value
-to classify. 
GCC treats the last argument as type-generic, which -means it does not do default promotion from float to double. -@enddefbuiltin +@defbuiltin{bool __sync_bool_compare_and_swap (@var{type} *@var{ptr}, @var{type} @var{oldval}, @var{type} @var{newval}, ...)} +@defbuiltinx{@var{type} __sync_val_compare_and_swap (@var{type} *@var{ptr}, @var{type} @var{oldval}, @var{type} @var{newval}, ...)} +These built-in functions perform an atomic compare and swap. +That is, if the current +value of @code{*@var{ptr}} is @var{oldval}, then write @var{newval} into +@code{*@var{ptr}}. -@defbuiltin{double __builtin_inf (void)} -Similar to @code{__builtin_huge_val}, except a warning is generated -if the target floating-point format does not support infinities. +The ``bool'' version returns @code{true} if the comparison is successful and +@var{newval} is written. The ``val'' version returns the contents +of @code{*@var{ptr}} before the operation. @enddefbuiltin -@defbuiltin{_Decimal32 __builtin_infd32 (void)} -Similar to @code{__builtin_inf}, except the return type is @code{_Decimal32}. +@defbuiltin{void __sync_synchronize (...)} +This built-in function issues a full memory barrier. @enddefbuiltin -@defbuiltin{_Decimal64 __builtin_infd64 (void)} -Similar to @code{__builtin_inf}, except the return type is @code{_Decimal64}. -@enddefbuiltin +@defbuiltin{@var{type} __sync_lock_test_and_set (@var{type} *@var{ptr}, @var{type} @var{value}, ...)} +This built-in function, as described by Intel, is not a traditional test-and-set +operation, but rather an atomic exchange operation. It writes @var{value} +into @code{*@var{ptr}}, and returns the previous contents of +@code{*@var{ptr}}. -@defbuiltin{_Decimal128 __builtin_infd128 (void)} -Similar to @code{__builtin_inf}, except the return type is @code{_Decimal128}. -@enddefbuiltin +Many targets have only minimal support for such locks, and do not support +a full exchange operation. In this case, a target may support reduced +functionality here by which the @emph{only} valid value to store is the +immediate constant 1. The exact value actually stored in @code{*@var{ptr}} +is implementation defined. -@defbuiltin{float __builtin_inff (void)} -Similar to @code{__builtin_inf}, except the return type is @code{float}. -This function is suitable for implementing the ISO C99 macro @code{INFINITY}. +This built-in function is not a full barrier, +but rather an @dfn{acquire barrier}. +This means that references after the operation cannot move to (or be +speculated to) before the operation, but previous memory stores may not +be globally visible yet, and previous memory loads may not yet be +satisfied. @enddefbuiltin -@defbuiltin{{long double} __builtin_infl (void)} -Similar to @code{__builtin_inf}, except the return -type is @code{long double}. -@enddefbuiltin +@defbuiltin{void __sync_lock_release (@var{type} *@var{ptr}, ...)} +This built-in function releases the lock acquired by +@code{__sync_lock_test_and_set}. +Normally this means writing the constant 0 to @code{*@var{ptr}}. -@defbuiltin{_Float@var{n} __builtin_inff@var{n} (void)} -Similar to @code{__builtin_inf}, except the return -type is @code{_Float@var{n}}. +This built-in function is not a full barrier, +but rather a @dfn{release barrier}. +This means that all previous memory stores are globally visible, and all +previous memory loads have been satisfied, but following memory reads +are not prevented from being speculated to before the barrier. 
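+
+For example, these two built-in functions can implement a simple spin
+lock (a sketch only; busy-waiting is rarely appropriate outside very
+low-level code):
+
+@smallexample
+static int lock; /* 0 when free, 1 when held.  */
+
+void acquire (void)
+@{
+  while (__sync_lock_test_and_set (&lock, 1))
+    ; /* Spin until the previous value was 0.  */
+@}
+
+void release (void)
+@{
+  __sync_lock_release (&lock); /* Writes 0 with release semantics.  */
+@}
+@end smallexample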
@enddefbuiltin
+@node __atomic Builtins
+@section Built-in Functions for Memory Model Aware Atomic Operations
-@defbuiltin{int __builtin_isinf_sign (...)}
-Similar to @code{isinf}, except the return value is -1 for
-an argument of @code{-Inf} and 1 for an argument of @code{+Inf}.
-Note while the parameter list is an
-ellipsis, this function only accepts exactly one floating-point
-argument. GCC treats this parameter as type-generic, which means it
-does not do default promotion from float to double.
-@enddefbuiltin
+The following built-in functions approximately match the requirements
+for the C++11 memory model. They are all
+identified by being prefixed with @samp{__atomic} and most are
+overloaded so that they work with multiple types.
-@defbuiltin{double __builtin_nan (const char *@var{str})}
-This is an implementation of the ISO C99 function @code{nan}.
+These functions are intended to replace the legacy @samp{__sync}
+builtins. The main difference is that the memory order that is requested
+is a parameter to the functions. New code should always use the
+@samp{__atomic} builtins rather than the @samp{__sync} builtins.
-Since ISO C99 defines this function in terms of @code{strtod}, which we
-do not implement, a description of the parsing is in order. The string
-is parsed as by @code{strtol}; that is, the base is recognized by
-leading @samp{0} or @samp{0x} prefixes. The number parsed is placed
-in the significand such that the least significant bit of the number
-is at the least significant bit of the significand. The number is
-truncated to fit the significand field provided. The significand is
-forced to be a quiet NaN@.
+Note that the @samp{__atomic} builtins assume that programs will
+conform to the C++11 memory model. In particular, they assume
+that programs are free of data races. See the C++11 standard for
+detailed requirements.
-This function, if given a string literal all of which would have been
-consumed by @code{strtol}, is evaluated early enough that it is considered a
-compile-time constant.
-@enddefbuiltin
+The @samp{__atomic} builtins can be used with any integral scalar or
+pointer type that is 1, 2, 4, or 8 bytes in length. 16-byte integral
+types are also allowed if @samp{__int128} (@pxref{__int128}) is
+supported by the architecture.
-@defbuiltin{_Decimal32 __builtin_nand32 (const char *@var{str})}
-Similar to @code{__builtin_nan}, except the return type is @code{_Decimal32}.
-@enddefbuiltin
+The four non-arithmetic functions (load, store, exchange, and
+compare_exchange) all have a generic version as well. This generic
+version works on any data type. It uses the lock-free built-in function
+if the specific data type size makes that possible; otherwise, an
+external call is left to be resolved at run time. This external call uses
+the same format, with the addition of a @samp{size_t} parameter inserted
+as the first parameter indicating the size of the object being pointed to.
+All objects must be the same size.
-@defbuiltin{_Decimal64 __builtin_nand64 (const char *@var{str})}
-Similar to @code{__builtin_nan}, except the return type is @code{_Decimal64}.
-@enddefbuiltin
+There are 6 different memory orders that can be specified. 
These map
+to the C++11 memory orders with the same names; see the C++11 standard
+or the @uref{https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync,GCC wiki
+on atomic synchronization} for detailed definitions. Individual
+targets may also support additional memory orders for use on specific
+architectures. Refer to the target documentation for details of
+these.
-@defbuiltin{_Decimal128 __builtin_nand128 (const char *@var{str})}
-Similar to @code{__builtin_nan}, except the return type is @code{_Decimal128}.
-@enddefbuiltin
+An atomic operation can both constrain code motion and
+be mapped to hardware instructions for synchronization between threads
+(e.g., a fence). To which extent this happens is controlled by the
+memory orders, which are listed here in approximately ascending order of
+strength. The description of each memory order is only meant to roughly
+illustrate the effects and is not a specification; see the C++11
+memory model for precise semantics.
-@defbuiltin{float __builtin_nanf (const char *@var{str})}
-Similar to @code{__builtin_nan}, except the return type is @code{float}.
-@enddefbuiltin
+@table @code
+@item __ATOMIC_RELAXED
+Implies no inter-thread ordering constraints.
+@item __ATOMIC_CONSUME
+This is currently implemented using the stronger @code{__ATOMIC_ACQUIRE}
+memory order because of a deficiency in C++11's semantics for
+@code{memory_order_consume}.
+@item __ATOMIC_ACQUIRE
+Creates an inter-thread happens-before constraint from the release (or
+stronger) semantic store to this acquire load. Can prevent hoisting
+of code to before the operation.
+@item __ATOMIC_RELEASE
+Creates an inter-thread happens-before constraint to acquire (or stronger)
+semantic loads that read from this release store. Can prevent sinking
+of code to after the operation.
+@item __ATOMIC_ACQ_REL
+Combines the effects of both @code{__ATOMIC_ACQUIRE} and
+@code{__ATOMIC_RELEASE}.
+@item __ATOMIC_SEQ_CST
+Enforces total ordering with all other @code{__ATOMIC_SEQ_CST} operations.
+@end table
-@defbuiltin{{long double} __builtin_nanl (const char *@var{str})}
-Similar to @code{__builtin_nan}, except the return type is @code{long double}.
-@enddefbuiltin
+Note that in the C++11 memory model, @emph{fences} (e.g.,
+@samp{__atomic_thread_fence}) take effect in combination with other
+atomic operations on specific memory locations (e.g., atomic loads);
+operations on specific memory locations do not necessarily affect other
+operations in the same way.
-@defbuiltin{_Float@var{n} __builtin_nanf@var{n} (const char *@var{str})}
-Similar to @code{__builtin_nan}, except the return type is
-@code{_Float@var{n}}.
-@enddefbuiltin
+Target architectures are encouraged to provide their own patterns for
+each of the atomic built-in functions. If no target pattern is provided,
+the original
+non-memory model set of @samp{__sync} atomic built-in functions is
+used, along with any required synchronization fences surrounding it in
+order to achieve the proper behavior. Execution in this case is subject
+to the same restrictions as those built-in functions.
-@defbuiltin{_Float@var{n}x __builtin_nanf@var{n}x (const char *@var{str})}
-Similar to @code{__builtin_nan}, except the return type is
-@code{_Float@var{n}x}.
-@enddefbuiltin
+If there is no pattern or mechanism to provide a lock-free instruction
+sequence, a call is made to an external routine with the same parameters
+to be resolved at run time. 
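+
+As an illustration of the acquire/release pairing described above, one
+thread can publish a value for another (a minimal sketch; @code{data},
+@code{ready} and @code{use} are hypothetical):
+
+@smallexample
+extern void use (int);
+int data;
+int ready;
+
+void producer (void)
+@{
+  data = 42;
+  __atomic_store_n (&ready, 1, __ATOMIC_RELEASE);
+@}
+
+void consumer (void)
+@{
+  if (__atomic_load_n (&ready, __ATOMIC_ACQUIRE))
+    use (data); /* Guaranteed to observe data == 42.  */
+@}
+@end smallexample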
-@defbuiltin{double __builtin_nans (const char *@var{str})} -Similar to @code{__builtin_nan}, except the significand is forced -to be a signaling NaN@. The @code{nans} function is proposed by -@uref{https://www.open-std.org/jtc1/sc22/wg14/www/docs/n965.htm,,WG14 N965}. -@enddefbuiltin +When implementing patterns for these built-in functions, the memory order +parameter can be ignored as long as the pattern implements the most +restrictive @code{__ATOMIC_SEQ_CST} memory order. Any of the other memory +orders execute correctly with this memory order but they may not execute as +efficiently as they could with a more appropriate implementation of the +relaxed requirements. -@defbuiltin{_Decimal32 __builtin_nansd32 (const char *@var{str})} -Similar to @code{__builtin_nans}, except the return type is @code{_Decimal32}. -@enddefbuiltin +Note that the C++11 standard allows for the memory order parameter to be +determined at run time rather than at compile time. These built-in +functions map any run-time value to @code{__ATOMIC_SEQ_CST} rather +than invoke a runtime library call or inline a switch statement. This is +standard compliant, safe, and the simplest approach for now. -@defbuiltin{_Decimal64 __builtin_nansd64 (const char *@var{str})} -Similar to @code{__builtin_nans}, except the return type is @code{_Decimal64}. -@enddefbuiltin +The memory order parameter is a signed int, but only the lower 16 bits are +reserved for the memory order. The remainder of the signed int is reserved +for target use and should be 0. Use of the predefined atomic values +ensures proper usage. -@defbuiltin{_Decimal128 __builtin_nansd128 (const char *@var{str})} -Similar to @code{__builtin_nans}, except the return type is @code{_Decimal128}. -@enddefbuiltin +@defbuiltin{@var{type} __atomic_load_n (@var{type} *@var{ptr}, int @var{memorder})} +This built-in function implements an atomic load operation. It returns the +contents of @code{*@var{ptr}}. -@defbuiltin{float __builtin_nansf (const char *@var{str})} -Similar to @code{__builtin_nans}, except the return type is @code{float}. -@enddefbuiltin +The valid memory order variants are +@code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, @code{__ATOMIC_ACQUIRE}, +and @code{__ATOMIC_CONSUME}. -@defbuiltin{{long double} __builtin_nansl (const char *@var{str})} -Similar to @code{__builtin_nans}, except the return type is @code{long double}. @enddefbuiltin -@defbuiltin{_Float@var{n} __builtin_nansf@var{n} (const char *@var{str})} -Similar to @code{__builtin_nans}, except the return type is -@code{_Float@var{n}}. +@defbuiltin{void __atomic_load (@var{type} *@var{ptr}, @var{type} *@var{ret}, int @var{memorder})} +This is the generic version of an atomic load. It returns the +contents of @code{*@var{ptr}} in @code{*@var{ret}}. + @enddefbuiltin -@defbuiltin{_Float@var{n}x __builtin_nansf@var{n}x (const char *@var{str})} -Similar to @code{__builtin_nans}, except the return type is -@code{_Float@var{n}x}. +@defbuiltin{void __atomic_store_n (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +This built-in function implements an atomic store operation. It writes +@code{@var{val}} into @code{*@var{ptr}}. + +The valid memory order variants are +@code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, and @code{__ATOMIC_RELEASE}. + @enddefbuiltin -@defbuiltin{int __builtin_issignaling (...)} -Return non-zero if the argument is a signaling NaN and zero otherwise. -Note while the parameter list is an -ellipsis, this function only accepts exactly one floating-point -argument. 
GCC treats this parameter as type-generic, which means it -does not do default promotion from float to double. -This built-in function can work even without the non-default -@code{-fsignaling-nans} option, although if a signaling NaN is computed, -stored or passed as argument to some function other than this built-in -in the current translation unit, it is safer to use @code{-fsignaling-nans}. -With @code{-ffinite-math-only} option this built-in function will always -return 0. -@enddefbuiltin +@defbuiltin{void __atomic_store (@var{type} *@var{ptr}, @var{type} *@var{val}, int @var{memorder})} +This is the generic version of an atomic store. It stores the value +of @code{*@var{val}} into @code{*@var{ptr}}. -@defbuiltin{int __builtin_ffs (int @var{x})} -Returns one plus the index of the least significant 1-bit of @var{x}, or -if @var{x} is zero, returns zero. @enddefbuiltin -@defbuiltin{int __builtin_clz (unsigned int @var{x})} -Returns the number of leading 0-bits in @var{x}, starting at the most -significant bit position. If @var{x} is 0, the result is undefined. -@enddefbuiltin +@defbuiltin{@var{type} __atomic_exchange_n (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +This built-in function implements an atomic exchange operation. It writes +@var{val} into @code{*@var{ptr}}, and returns the previous contents of +@code{*@var{ptr}}. -@defbuiltin{int __builtin_ctz (unsigned int @var{x})} -Returns the number of trailing 0-bits in @var{x}, starting at the least -significant bit position. If @var{x} is 0, the result is undefined. -@enddefbuiltin +All memory order variants are valid. -@defbuiltin{int __builtin_clrsb (int @var{x})} -Returns the number of leading redundant sign bits in @var{x}, i.e.@: the -number of bits following the most significant bit that are identical -to it. There are no special cases for 0 or other values. @enddefbuiltin -@defbuiltin{int __builtin_popcount (unsigned int @var{x})} -Returns the number of 1-bits in @var{x}. -@enddefbuiltin +@defbuiltin{void __atomic_exchange (@var{type} *@var{ptr}, @var{type} *@var{val}, @var{type} *@var{ret}, int @var{memorder})} +This is the generic version of an atomic exchange. It stores the +contents of @code{*@var{val}} into @code{*@var{ptr}}. The original value +of @code{*@var{ptr}} is copied into @code{*@var{ret}}. -@defbuiltin{int __builtin_parity (unsigned int @var{x})} -Returns the parity of @var{x}, i.e.@: the number of 1-bits in @var{x} -modulo 2. @enddefbuiltin -@defbuiltin{int __builtin_ffsl (long)} -Similar to @code{__builtin_ffs}, except the argument type is -@code{long}. -@enddefbuiltin +@defbuiltin{bool __atomic_compare_exchange_n (@var{type} *@var{ptr}, @var{type} *@var{expected}, @var{type} @var{desired}, bool @var{weak}, int @var{success_memorder}, int @var{failure_memorder})} +This built-in function implements an atomic compare and exchange operation. +This compares the contents of @code{*@var{ptr}} with the contents of +@code{*@var{expected}}. If equal, the operation is a @emph{read-modify-write} +operation that writes @var{desired} into @code{*@var{ptr}}. If they are not +equal, the operation is a @emph{read} and the current contents of +@code{*@var{ptr}} are written into @code{*@var{expected}}. @var{weak} is @code{true} +for weak compare_exchange, which may fail spuriously, and @code{false} for +the strong variation, which never fails spuriously. Many targets +only offer the strong variation and ignore the parameter. When in doubt, use +the strong variation. 
-@defbuiltin{int __builtin_clzl (unsigned long)} -Similar to @code{__builtin_clz}, except the argument type is -@code{unsigned long}. -@enddefbuiltin +If @var{desired} is written into @code{*@var{ptr}} then @code{true} is returned +and memory is affected according to the +memory order specified by @var{success_memorder}. There are no +restrictions on what memory order can be used here. -@defbuiltin{int __builtin_ctzl (unsigned long)} -Similar to @code{__builtin_ctz}, except the argument type is -@code{unsigned long}. -@enddefbuiltin +Otherwise, @code{false} is returned and memory is affected according +to @var{failure_memorder}. This memory order cannot be +@code{__ATOMIC_RELEASE} nor @code{__ATOMIC_ACQ_REL}. It also cannot be a +stronger order than that specified by @var{success_memorder}. -@defbuiltin{int __builtin_clrsbl (long)} -Similar to @code{__builtin_clrsb}, except the argument type is -@code{long}. @enddefbuiltin -@defbuiltin{int __builtin_popcountl (unsigned long)} -Similar to @code{__builtin_popcount}, except the argument type is -@code{unsigned long}. -@enddefbuiltin +@defbuiltin{bool __atomic_compare_exchange (@var{type} *@var{ptr}, @var{type} *@var{expected}, @var{type} *@var{desired}, bool @var{weak}, int @var{success_memorder}, int @var{failure_memorder})} +This built-in function implements the generic version of +@code{__atomic_compare_exchange}. The function is virtually identical to +@code{__atomic_compare_exchange_n}, except the desired value is also a +pointer. -@defbuiltin{int __builtin_parityl (unsigned long)} -Similar to @code{__builtin_parity}, except the argument type is -@code{unsigned long}. @enddefbuiltin -@defbuiltin{int __builtin_ffsll (long long)} -Similar to @code{__builtin_ffs}, except the argument type is -@code{long long}. -@enddefbuiltin +@defbuiltin{@var{type} __atomic_add_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +@defbuiltinx{@var{type} __atomic_sub_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +@defbuiltinx{@var{type} __atomic_and_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +@defbuiltinx{@var{type} __atomic_xor_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +@defbuiltinx{@var{type} __atomic_or_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +@defbuiltinx{@var{type} __atomic_nand_fetch (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +These built-in functions perform the operation suggested by the name, and +return the result of the operation. Operations on pointer arguments are +performed as if the operands were of the @code{uintptr_t} type. That is, +they are not scaled by the size of the type to which the pointer points. -@defbuiltin{int __builtin_clzll (unsigned long long)} -Similar to @code{__builtin_clz}, except the argument type is -@code{unsigned long long}. -@enddefbuiltin +@smallexample +@{ *ptr @var{op}= val; return *ptr; @} +@{ *ptr = ~(*ptr & val); return *ptr; @} // nand +@end smallexample -@defbuiltin{int __builtin_ctzll (unsigned long long)} -Similar to @code{__builtin_ctz}, except the argument type is -@code{unsigned long long}. -@enddefbuiltin +The object pointed to by the first argument must be of integer or pointer +type. It must not be a boolean type. All memory orders are valid. -@defbuiltin{int __builtin_clrsbll (long long)} -Similar to @code{__builtin_clrsb}, except the argument type is -@code{long long}. 
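+
+For example, a statistics counter might use these built-in functions as
+follows (a minimal sketch; the names are illustrative only):
+
+@smallexample
+unsigned long hits;
+
+unsigned long
+record_hit (void)
+@{
+  /* Atomically increment hits and return the updated count.  */
+  return __atomic_add_fetch (&hits, 1, __ATOMIC_RELAXED);
+@}
+@end smallexample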
@enddefbuiltin -@defbuiltin{int __builtin_popcountll (unsigned long long)} -Similar to @code{__builtin_popcount}, except the argument type is -@code{unsigned long long}. -@enddefbuiltin +@defbuiltin{@var{type} __atomic_fetch_add (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +@defbuiltinx{@var{type} __atomic_fetch_sub (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +@defbuiltinx{@var{type} __atomic_fetch_and (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +@defbuiltinx{@var{type} __atomic_fetch_xor (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +@defbuiltinx{@var{type} __atomic_fetch_or (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +@defbuiltinx{@var{type} __atomic_fetch_nand (@var{type} *@var{ptr}, @var{type} @var{val}, int @var{memorder})} +These built-in functions perform the operation suggested by the name, and +return the value that had previously been in @code{*@var{ptr}}. Operations +on pointer arguments are performed as if the operands were of +the @code{uintptr_t} type. That is, they are not scaled by the size of +the type to which the pointer points. -@defbuiltin{int __builtin_parityll (unsigned long long)} -Similar to @code{__builtin_parity}, except the argument type is -@code{unsigned long long}. -@enddefbuiltin +@smallexample +@{ tmp = *ptr; *ptr @var{op}= val; return tmp; @} +@{ tmp = *ptr; *ptr = ~(*ptr & val); return tmp; @} // nand +@end smallexample -@defbuiltin{int __builtin_ffsg (...)} -Similar to @code{__builtin_ffs}, except the argument is type-generic -signed integer (standard, extended or bit-precise). No integral argument -promotions are performed on the argument. -@enddefbuiltin +The same constraints on arguments apply as for the corresponding +@code{__atomic_op_fetch} built-in functions. All memory orders are valid. -@defbuiltin{int __builtin_clzg (...)} -Similar to @code{__builtin_clz}, except the argument is type-generic -unsigned integer (standard, extended or bit-precise) and there is -optional second argument with int type. No integral argument promotions -are performed on the first argument. If two arguments are specified, -and first argument is 0, the result is the second argument. If only -one argument is specified and it is 0, the result is undefined. @enddefbuiltin -@defbuiltin{int __builtin_ctzg (...)} -Similar to @code{__builtin_ctz}, except the argument is type-generic -unsigned integer (standard, extended or bit-precise) and there is -optional second argument with int type. No integral argument promotions -are performed on the first argument. If two arguments are specified, -and first argument is 0, the result is the second argument. If only -one argument is specified and it is 0, the result is undefined. -@enddefbuiltin +@defbuiltin{bool __atomic_test_and_set (void *@var{ptr}, int @var{memorder})} -@defbuiltin{int __builtin_clrsbg (...)} -Similar to @code{__builtin_clrsb}, except the argument is type-generic -signed integer (standard, extended or bit-precise). No integral argument -promotions are performed on the argument. -@enddefbuiltin +This built-in function performs an atomic test-and-set operation on +the byte at @code{*@var{ptr}}. The byte is set to some implementation +defined nonzero ``set'' value and the return value is @code{true} if and only +if the previous contents were ``set''. +It should be only used for operands of type @code{bool} or @code{char}. For +other types only part of the value may be set. 
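+
+Together with @code{__atomic_clear} (documented below), this built-in
+function can be used to build a simple spin lock; a minimal illustrative
+sketch, with @code{lock_flag} an arbitrary name:
+
+@smallexample
+static bool lock_flag;
+
+void
+spin_acquire (void)
+@{
+  /* Spin until the flag was previously clear.  */
+  while (__atomic_test_and_set (&lock_flag, __ATOMIC_ACQUIRE))
+    ;
+@}
+
+void
+spin_release (void)
+@{
+  __atomic_clear (&lock_flag, __ATOMIC_RELEASE);
+@}
+@end smallexample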
-@defbuiltin{int __builtin_popcountg (...)} -Similar to @code{__builtin_popcount}, except the argument is type-generic -unsigned integer (standard, extended or bit-precise). No integral argument -promotions are performed on the argument. -@enddefbuiltin +All memory orders are valid. -@defbuiltin{int __builtin_parityg (...)} -Similar to @code{__builtin_parity}, except the argument is type-generic -unsigned integer (standard, extended or bit-precise). No integral argument -promotions are performed on the argument. @enddefbuiltin -@defbuiltin{@var{type} __builtin_stdc_bit_ceil (@var{type} @var{arg})} -The @code{__builtin_stdc_bit_ceil} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{@var{arg} <= 1 ? (@var{type}) 1 -: (@var{type}) 2 << (@var{prec} - 1 - __builtin_clzg ((@var{type}) (@var{arg} - 1)))} -where @var{prec} is bit width of @var{type}, except that side-effects -in @var{arg} are evaluated just once. -@enddefbuiltin +@defbuiltin{void __atomic_clear (bool *@var{ptr}, int @var{memorder})} -@defbuiltin{@var{type} __builtin_stdc_bit_floor (@var{type} @var{arg})} -The @code{__builtin_stdc_bit_floor} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{@var{arg} == 0 ? (@var{type}) 0 -: (@var{type}) 1 << (@var{prec} - 1 - __builtin_clzg (@var{arg}))} -where @var{prec} is bit width of @var{type}, except that side-effects -in @var{arg} are evaluated just once. -@enddefbuiltin +This built-in function performs an atomic clear operation on +@code{*@var{ptr}}. After the operation, @code{*@var{ptr}} contains 0. +It should be only used for operands of type @code{bool} or @code{char} and +in conjunction with @code{__atomic_test_and_set}. +For other types it may only clear partially. If the type is not @code{bool} +prefer using @code{__atomic_store}. -@defbuiltin{{unsigned int} __builtin_stdc_bit_width (@var{type} @var{arg})} -The @code{__builtin_stdc_bit_width} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{(unsigned int) (@var{prec} - __builtin_clzg (@var{arg}, @var{prec}))} -where @var{prec} is bit width of @var{type}. -@enddefbuiltin +The valid memory order variants are +@code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, and +@code{__ATOMIC_RELEASE}. -@defbuiltin{{unsigned int} __builtin_stdc_count_ones (@var{type} @var{arg})} -The @code{__builtin_stdc_count_ones} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{(unsigned int) __builtin_popcountg (@var{arg})} @enddefbuiltin -@defbuiltin{{unsigned int} __builtin_stdc_count_zeros (@var{type} @var{arg})} -The @code{__builtin_stdc_count_zeros} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. 
It is equivalent to -@code{(unsigned int) __builtin_popcountg ((@var{type}) ~@var{arg})} -@enddefbuiltin +@defbuiltin{void __atomic_thread_fence (int @var{memorder})} + +This built-in function acts as a synchronization fence between threads +based on the specified memory order. + +All memory orders are valid. -@defbuiltin{{unsigned int} __builtin_stdc_first_leading_one (@var{type} @var{arg})} -The @code{__builtin_stdc_first_leading_one} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{__builtin_clzg (@var{arg}, -1) + 1U} @enddefbuiltin -@defbuiltin{{unsigned int} __builtin_stdc_first_leading_zero (@var{type} @var{arg})} -The @code{__builtin_stdc_first_leading_zero} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{__builtin_clzg ((@var{type}) ~@var{arg}, -1) + 1U} -@enddefbuiltin +@defbuiltin{void __atomic_signal_fence (int @var{memorder})} -@defbuiltin{{unsigned int} __builtin_stdc_first_trailing_one (@var{type} @var{arg})} -The @code{__builtin_stdc_first_trailing_one} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{__builtin_ctzg (@var{arg}, -1) + 1U} -@enddefbuiltin +This built-in function acts as a synchronization fence between a thread +and signal handlers based in the same thread. -@defbuiltin{{unsigned int} __builtin_stdc_first_trailing_zero (@var{type} @var{arg})} -The @code{__builtin_stdc_first_trailing_zero} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{__builtin_ctzg ((@var{type}) ~@var{arg}, -1) + 1U} -@enddefbuiltin +All memory orders are valid. -@defbuiltin{{unsigned int} __builtin_stdc_has_single_bit (@var{type} @var{arg})} -The @code{__builtin_stdc_has_single_bit} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{(_Bool) (__builtin_popcountg (@var{arg}) == 1)} @enddefbuiltin -@defbuiltin{{unsigned int} __builtin_stdc_leading_ones (@var{type} @var{arg})} -The @code{__builtin_stdc_leading_ones} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{(unsigned int) __builtin_clzg ((@var{type}) ~@var{arg}, @var{prec})} -@enddefbuiltin +@defbuiltin{bool __atomic_always_lock_free (size_t @var{size}, void *@var{ptr})} -@defbuiltin{{unsigned int} __builtin_stdc_leading_zeros (@var{type} @var{arg})} -The @code{__builtin_stdc_leading_zeros} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. 
It is equivalent to -@code{(unsigned int) __builtin_clzg (@var{arg}, @var{prec})} -@enddefbuiltin +This built-in function returns @code{true} if objects of @var{size} bytes always +generate lock-free atomic instructions for the target architecture. +@var{size} must resolve to a compile-time constant and the result also +resolves to a compile-time constant. -@defbuiltin{{unsigned int} __builtin_stdc_trailing_ones (@var{type} @var{arg})} -The @code{__builtin_stdc_trailing_ones} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{(unsigned int) __builtin_ctzg ((@var{type}) ~@var{arg}, @var{prec})} -@enddefbuiltin +@var{ptr} is an optional pointer to the object that may be used to determine +alignment. A value of 0 indicates typical alignment should be used. The +compiler may also ignore this parameter. -@defbuiltin{{unsigned int} __builtin_stdc_trailing_zeros (@var{type} @var{arg})} -The @code{__builtin_stdc_trailing_zeros} function is available only -in C. It is type-generic, the argument can be any unsigned integer -(standard, extended or bit-precise). No integral argument promotions are -performed on the argument. It is equivalent to -@code{(unsigned int) __builtin_ctzg (@var{arg}, @var{prec})} -@enddefbuiltin +@smallexample +if (__atomic_always_lock_free (sizeof (long long), 0)) +@end smallexample -@defbuiltin{@var{type1} __builtin_stdc_rotate_left (@var{type1} @var{arg1}, @var{type2} @var{arg2})} -The @code{__builtin_stdc_rotate_left} function is available only -in C. It is type-generic, the first argument can be any unsigned integer -(standard, extended or bit-precise) and second argument any signed or -unsigned integer or @code{char}. No integral argument promotions are -performed on the arguments. It is equivalent to -@code{(@var{type1}) ((@var{arg1} << (@var{arg2} % @var{prec})) -| (@var{arg1} >> ((-(unsigned @var{type2}) @var{arg2}) % @var{prec})))} -where @var{prec} is bit width of @var{type1}, except that side-effects -in @var{arg1} and @var{arg2} are evaluated just once. The behavior is -undefined if @var{arg2} is negative. @enddefbuiltin -@defbuiltin{@var{type1} __builtin_stdc_rotate_right (@var{type1} @var{arg1}, @var{type2} @var{arg2})} -The @code{__builtin_stdc_rotate_right} function is available only -in C. It is type-generic, the first argument can be any unsigned integer -(standard, extended or bit-precise) and second argument any signed or -unsigned integer or @code{char}. No integral argument promotions are -performed on the arguments. It is equivalent to -@code{(@var{type1}) ((@var{arg1} >> (@var{arg2} % @var{prec})) -| (@var{arg1} << ((-(unsigned @var{type2}) @var{arg2}) % @var{prec})))} -where @var{prec} is bit width of @var{type1}, except that side-effects -in @var{arg1} and @var{arg2} are evaluated just once. The behavior is -undefined if @var{arg2} is negative. -@enddefbuiltin +@defbuiltin{bool __atomic_is_lock_free (size_t @var{size}, void *@var{ptr})} -@defbuiltin{double __builtin_powi (double, int)} -@defbuiltinx{float __builtin_powif (float, int)} -@defbuiltinx{{long double} __builtin_powil (long double, int)} -Returns the first argument raised to the power of the second. Unlike the -@code{pow} function no guarantees about precision and rounding are made. 
-@enddefbuiltin
+This built-in function returns @code{true} if objects of @var{size} bytes always
+generate lock-free atomic instructions for the target architecture.  If
+the built-in function is not known to be lock-free, a call is made to a
+runtime routine named @code{__atomic_is_lock_free}.

-@defbuiltin{uint16_t __builtin_bswap16 (uint16_t @var{x})}
-Returns @var{x} with the order of the bytes reversed; for example,
-@code{0xabcd} becomes @code{0xcdab}.  Byte here always means
-exactly 8 bits.
+@var{ptr} is an optional pointer to the object that may be used to determine
+alignment.  A value of 0 indicates typical alignment should be used.  The
+compiler may also ignore this parameter.
 @enddefbuiltin

-@defbuiltin{uint32_t __builtin_bswap32 (uint32_t @var{x})}
-Similar to @code{__builtin_bswap16}, except the argument and return types
-are 32-bit.
-@enddefbuiltin
+@node Integer Overflow Builtins
+@section Built-in Functions to Perform Arithmetic with Overflow Checking

-@defbuiltin{uint64_t __builtin_bswap64 (uint64_t @var{x})}
-Similar to @code{__builtin_bswap32}, except the argument and return types
-are 64-bit.
-@enddefbuiltin
+The following built-in functions allow performing simple arithmetic operations
+together with checking whether the operations overflowed.

-@defbuiltin{uint128_t __builtin_bswap128 (uint128_t @var{x})}
-Similar to @code{__builtin_bswap64}, except the argument and return types
-are 128-bit.  Only supported on targets when 128-bit types are supported.
-@enddefbuiltin
+@defbuiltin{bool __builtin_add_overflow (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} *@var{res})}
+@defbuiltinx{bool __builtin_sadd_overflow (int @var{a}, int @var{b}, int *@var{res})}
+@defbuiltinx{bool __builtin_saddl_overflow (long int @var{a}, long int @var{b}, long int *@var{res})}
+@defbuiltinx{bool __builtin_saddll_overflow (long long int @var{a}, long long int @var{b}, long long int *@var{res})}
+@defbuiltinx{bool __builtin_uadd_overflow (unsigned int @var{a}, unsigned int @var{b}, unsigned int *@var{res})}
+@defbuiltinx{bool __builtin_uaddl_overflow (unsigned long int @var{a}, unsigned long int @var{b}, unsigned long int *@var{res})}
+@defbuiltinx{bool __builtin_uaddll_overflow (unsigned long long int @var{a}, unsigned long long int @var{b}, unsigned long long int *@var{res})}
+These built-in functions promote the first two operands into infinite precision signed
+type and perform addition on those promoted operands.  The result is then
+cast to the type the third pointer argument points to and stored there.
+If the stored result is equal to the infinite precision result, the built-in
+functions return @code{false}, otherwise they return @code{true}.  As the addition is
+performed in infinite signed precision, these built-in functions have fully defined
+behavior for all argument values.

-@defbuiltin{Pmode __builtin_extend_pointer (void * @var{x})}
-On targets where the user visible pointer size is smaller than the size
-of an actual hardware address this function returns the extended user
-pointer.  Targets where this is true included ILP32 mode on x86_64 or
-Aarch64.  This function is mainly useful when writing inline assembly
-code.
-@enddefbuiltin
+The first built-in function allows arbitrary integral types for operands, and
+the result type must be a pointer to some integral type other than an
+enumerated or boolean type; the rest of the built-in functions have explicit
+integer types.
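+
+For example, a length computation can fail cleanly instead of wrapping
+around (a minimal sketch; @code{total_len} and its parameter names are
+illustrative only):
+
+@smallexample
+bool
+total_len (size_t len1, size_t len2, size_t *total)
+@{
+  /* Returns false, rather than wrapping around, when the sum does
+     not fit in a size_t.  */
+  return !__builtin_add_overflow (len1, len2, total);
+@}
+@end smallexample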
-@defbuiltin{int __builtin_goacc_parlevel_id (int @var{x})}
-Returns the openacc gang, worker or vector id depending on whether @var{x} is
-0, 1 or 2.
-@enddefbuiltin
+The compiler will attempt to use hardware instructions to implement
+these built-in functions where possible, like conditional jump on overflow
+after addition, conditional jump on carry etc.

-@defbuiltin{int __builtin_goacc_parlevel_size (int @var{x})}
-Returns the openacc gang, worker or vector size depending on whether @var{x} is
-0, 1 or 2.
 @enddefbuiltin

-@defbuiltin{uint8_t __builtin_rev_crc8_data8 (uint8_t @var{crc}, uint8_t @var{data}, uint8_t @var{poly})}
-Returns the calculated 8-bit bit-reversed CRC using the initial CRC (8-bit),
-data (8-bit) and the polynomial (8-bit).
-@var{crc} is the initial CRC, @var{data} is the data and
-@var{poly} is the polynomial without leading 1.
-Table-based or clmul-based CRC may be used for the
-calculation, depending on the target architecture.
-@enddefbuiltin
+@defbuiltin{bool __builtin_sub_overflow (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} *@var{res})}
+@defbuiltinx{bool __builtin_ssub_overflow (int @var{a}, int @var{b}, int *@var{res})}
+@defbuiltinx{bool __builtin_ssubl_overflow (long int @var{a}, long int @var{b}, long int *@var{res})}
+@defbuiltinx{bool __builtin_ssubll_overflow (long long int @var{a}, long long int @var{b}, long long int *@var{res})}
+@defbuiltinx{bool __builtin_usub_overflow (unsigned int @var{a}, unsigned int @var{b}, unsigned int *@var{res})}
+@defbuiltinx{bool __builtin_usubl_overflow (unsigned long int @var{a}, unsigned long int @var{b}, unsigned long int *@var{res})}
+@defbuiltinx{bool __builtin_usubll_overflow (unsigned long long int @var{a}, unsigned long long int @var{b}, unsigned long long int *@var{res})}

-@defbuiltin{uint16_t __builtin_rev_crc16_data16 (uint16_t @var{crc}, uint16_t @var{data}, uint16_t @var{poly})}
-Similar to @code{__builtin_rev_crc8_data8}, except the argument and return types
-are 16-bit.
-@enddefbuiltin
+These built-in functions are similar to the add overflow checking built-in
+functions above, except that they perform subtraction (subtracting the second
+argument from the first) instead of addition.

-@defbuiltin{uint16_t __builtin_rev_crc16_data8 (uint16_t @var{crc}, uint8_t @var{data}, uint16_t @var{poly})}
-Similar to @code{__builtin_rev_crc16_data16}, except the @var{data} argument
-type is 8-bit.
 @enddefbuiltin

-@defbuiltin{uint32_t __builtin_rev_crc32_data32 (uint32_t @var{crc}, uint32_t @var{data}, uint32_t @var{poly})}
-Similar to @code{__builtin_rev_crc8_data8}, except the argument and return types
-are 32-bit and for the CRC calculation may be also used crc* machine instruction
-depending on the target and the polynomial.
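+
+For example, unsigned wrap-around on subtraction can be detected and
+clamped (a minimal sketch; the names are illustrative only):
+
+@smallexample
+unsigned int
+clamped_sub (unsigned int budget, unsigned int cost)
+@{
+  unsigned int remaining;
+  /* On wrap-around, clamp the result to 0.  */
+  if (__builtin_sub_overflow (budget, cost, &remaining))
+    return 0;
+  return remaining;
+@}
+@end smallexample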
-@enddefbuiltin
+@defbuiltin{bool __builtin_mul_overflow (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} *@var{res})}
+@defbuiltinx{bool __builtin_smul_overflow (int @var{a}, int @var{b}, int *@var{res})}
+@defbuiltinx{bool __builtin_smull_overflow (long int @var{a}, long int @var{b}, long int *@var{res})}
+@defbuiltinx{bool __builtin_smulll_overflow (long long int @var{a}, long long int @var{b}, long long int *@var{res})}
+@defbuiltinx{bool __builtin_umul_overflow (unsigned int @var{a}, unsigned int @var{b}, unsigned int *@var{res})}
+@defbuiltinx{bool __builtin_umull_overflow (unsigned long int @var{a}, unsigned long int @var{b}, unsigned long int *@var{res})}
+@defbuiltinx{bool __builtin_umulll_overflow (unsigned long long int @var{a}, unsigned long long int @var{b}, unsigned long long int *@var{res})}

-@defbuiltin{uint32_t __builtin_rev_crc32_data8 (uint32_t @var{crc}, uint8_t @var{data}, uint32_t @var{poly})}
-Similar to @code{__builtin_rev_crc32_data32}, except the @var{data} argument
-type is 8-bit.
-@enddefbuiltin
+These built-in functions are similar to the add overflow checking built-in
+functions above, except that they perform multiplication instead of addition.

-@defbuiltin{uint32_t __builtin_rev_crc32_data16 (uint32_t @var{crc}, uint16_t @var{data}, uint32_t @var{poly})}
-Similar to @code{__builtin_rev_crc32_data32}, except the @var{data} argument
-type is 16-bit.
 @enddefbuiltin

-@defbuiltin{uint64_t __builtin_rev_crc64_data64 (uint64_t @var{crc}, uint64_t @var{data}, uint64_t @var{poly})}
-Similar to @code{__builtin_rev_crc8_data8}, except the argument and return types
-are 64-bit.
-@enddefbuiltin
+The following built-in functions allow checking whether a simple arithmetic
+operation would overflow.

-@defbuiltin{uint64_t __builtin_rev_crc64_data8 (uint64_t @var{crc}, uint8_t @var{data}, uint64_t @var{poly})}
-Similar to @code{__builtin_rev_crc64_data64}, except the @var{data} argument type
-is 8-bit.
-@enddefbuiltin
+@defbuiltin{bool __builtin_add_overflow_p (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} @var{c})}
+@defbuiltinx{bool __builtin_sub_overflow_p (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} @var{c})}
+@defbuiltinx{bool __builtin_mul_overflow_p (@var{type1} @var{a}, @var{type2} @var{b}, @var{type3} @var{c})}

-@defbuiltin{uint64_t __builtin_rev_crc64_data16 (uint64_t @var{crc}, uint16_t @var{data}, uint64_t @var{poly})}
-Similar to @code{__builtin_rev_crc64_data64}, except the @var{data} argument type
-is 16-bit.
-@enddefbuiltin
+These built-in functions are similar to @code{__builtin_add_overflow},
+@code{__builtin_sub_overflow}, or @code{__builtin_mul_overflow}, except that
+they don't store the result of the arithmetic operation anywhere and the
+last argument is not a pointer, but some expression with integral type other
+than enumerated or boolean type.

-@defbuiltin{uint64_t __builtin_rev_crc64_data32 (uint64_t @var{crc}, uint32_t @var{data}, uint64_t @var{poly})}
-Similar to @code{__builtin_rev_crc64_data64}, except the @var{data} argument type
-is 32-bit.
-@enddefbuiltin
+The built-in functions promote the first two operands into infinite precision signed type
+and perform the respective arithmetic operation on those promoted operands.  The result
+is then cast to the type of the third argument.  If the cast result is equal to the infinite
+precision result, the built-in functions return @code{false}, otherwise they return @code{true}.
+The value of the third argument is ignored, just the side effects in the third argument +are evaluated, and no integral argument promotions are performed on the last argument. +If the third argument is a bit-field, the type used for the result cast has the +precision and signedness of the given bit-field, rather than precision and signedness +of the underlying type. -@defbuiltin{uint8_t __builtin_crc8_data8 (uint8_t @var{crc}, uint8_t @var{data}, uint8_t @var{poly})} -Returns the calculated 8-bit bit-forward CRC using the initial CRC (8-bit), -data (8-bit) and the polynomial (8-bit). -@var{crc} is the initial CRC, @var{data} is the data and -@var{poly} is the polynomial without leading 1. -Table-based or clmul-based CRC may be used for the -calculation, depending on the target architecture. -@enddefbuiltin +For example, the following macro can be used to portably check, at +compile-time, whether or not adding two constant integers will overflow, +and perform the addition only when it is known to be safe and not to trigger +a @option{-Woverflow} warning. -@defbuiltin{uint16_t __builtin_crc16_data16 (uint16_t @var{crc}, uint16_t @var{data}, uint16_t @var{poly})} -Similar to @code{__builtin_crc8_data8}, except the argument and return types -are 16-bit. -@enddefbuiltin +@smallexample +#define INT_ADD_OVERFLOW_P(a, b) \ + __builtin_add_overflow_p (a, b, (__typeof__ ((a) + (b))) 0) -@defbuiltin{uint16_t __builtin_crc16_data8 (uint16_t @var{crc}, uint8_t @var{data}, uint16_t @var{poly})} -Similar to @code{__builtin_crc16_data16}, except the @var{data} argument type -is 8-bit. -@enddefbuiltin - -@defbuiltin{uint32_t __builtin_crc32_data32 (uint32_t @var{crc}, uint32_t @var{data}, uint32_t @var{poly})} -Similar to @code{__builtin_crc8_data8}, except the argument and return types -are 32-bit. -@enddefbuiltin - -@defbuiltin{uint32_t __builtin_crc32_data8 (uint32_t @var{crc}, uint8_t @var{data}, uint32_t @var{poly})} -Similar to @code{__builtin_crc32_data32}, except the @var{data} argument type -is 8-bit. -@enddefbuiltin +enum @{ + A = INT_MAX, B = 3, + C = INT_ADD_OVERFLOW_P (A, B) ? 0 : A + B, + D = __builtin_add_overflow_p (1, SCHAR_MAX, (signed char) 0) +@}; +@end smallexample -@defbuiltin{uint32_t __builtin_crc32_data16 (uint32_t @var{crc}, uint16_t @var{data}, uint32_t @var{poly})} -Similar to @code{__builtin_crc32_data32}, except the @var{data} argument type -is 16-bit. +The compiler will attempt to use hardware instructions to implement +these built-in functions where possible, like conditional jump on overflow +after addition, conditional jump on carry etc. + @enddefbuiltin -@defbuiltin{uint64_t __builtin_crc64_data64 (uint64_t @var{crc}, uint64_t @var{data}, uint64_t @var{poly})} -Similar to @code{__builtin_crc8_data8}, except the argument and return types -are 64-bit. -@enddefbuiltin +@defbuiltin{{unsigned int} __builtin_addc (unsigned int @var{a}, unsigned int @var{b}, unsigned int @var{carry_in}, unsigned int *@var{carry_out})} +@defbuiltinx{{unsigned long int} __builtin_addcl (unsigned long int @var{a}, unsigned long int @var{b}, unsigned int @var{carry_in}, unsigned long int *@var{carry_out})} +@defbuiltinx{{unsigned long long int} __builtin_addcll (unsigned long long int @var{a}, unsigned long long int @var{b}, unsigned long long int @var{carry_in}, unsigned long long int *@var{carry_out})} -@defbuiltin{uint64_t __builtin_crc64_data8 (uint64_t @var{crc}, uint8_t @var{data}, uint64_t @var{poly})} -Similar to @code{__builtin_crc64_data64}, except the @var{data} argument type -is 8-bit. 
-@enddefbuiltin +These built-in functions are equivalent to: +@smallexample + (@{ __typeof__ (@var{a}) s; \ + __typeof__ (@var{a}) c1 = __builtin_add_overflow (@var{a}, @var{b}, &s); \ + __typeof__ (@var{a}) c2 = __builtin_add_overflow (s, @var{carry_in}, &s); \ + *(@var{carry_out}) = c1 | c2; \ + s; @}) +@end smallexample -@defbuiltin{uint64_t __builtin_crc64_data16 (uint64_t @var{crc}, uint16_t @var{data}, uint64_t @var{poly})} -Similar to @code{__builtin_crc64_data64}, except the @var{data} argument type -is 16-bit. -@enddefbuiltin +i.e.@: they add 3 unsigned values, set what the last argument +points to to 1 if any of the two additions overflowed (otherwise 0) +and return the sum of those 3 unsigned values. Note, while all +the first 3 arguments can have arbitrary values, better code will be +emitted if one of them (preferably the third one) has only values +0 or 1 (i.e.@: carry-in). -@defbuiltin{uint64_t __builtin_crc64_data32 (uint64_t @var{crc}, uint32_t @var{data}, uint64_t @var{poly})} -Similar to @code{__builtin_crc64_data64}, except the @var{data} argument type -is 32-bit. @enddefbuiltin -@node Target Builtins -@section Built-in Functions Specific to Particular Target Machines - -On some target machines, GCC supports many built-in functions specific -to those machines. Generally these generate calls to specific machine -instructions, but allow the compiler to schedule those calls. - -@menu -* AArch64 Built-in Functions:: -* Alpha Built-in Functions:: -* ARC Built-in Functions:: -* ARC SIMD Built-in Functions:: -* ARM iWMMXt Built-in Functions:: -* ARM C Language Extensions (ACLE):: -* ARM Floating Point Status and Control Intrinsics:: -* ARM ARMv8-M Security Extensions:: -* AVR Built-in Functions:: -* Blackfin Built-in Functions:: -* BPF Built-in Functions:: -* FR-V Built-in Functions:: -* LoongArch Base Built-in Functions:: -* LoongArch SX Vector Intrinsics:: -* LoongArch ASX Vector Intrinsics:: -* MIPS DSP Built-in Functions:: -* MIPS Paired-Single Support:: -* MIPS Loongson Built-in Functions:: -* MIPS SIMD Architecture (MSA) Support:: -* Other MIPS Built-in Functions:: -* MSP430 Built-in Functions:: -* NDS32 Built-in Functions:: -* Nvidia PTX Built-in Functions:: -* Basic PowerPC Built-in Functions:: -* PowerPC AltiVec/VSX Built-in Functions:: -* PowerPC Hardware Transactional Memory Built-in Functions:: -* PowerPC Atomic Memory Operation Functions:: -* PowerPC Matrix-Multiply Assist Built-in Functions:: -* PRU Built-in Functions:: -* RISC-V Built-in Functions:: -* RISC-V Vector Intrinsics:: -* CORE-V Built-in Functions:: -* RX Built-in Functions:: -* S/390 System z Built-in Functions:: -* SH Built-in Functions:: -* SPARC VIS Built-in Functions:: -* TI C6X Built-in Functions:: -* x86 Built-in Functions:: -* x86 transactional memory intrinsics:: -* x86 control-flow protection intrinsics:: -@end menu - -@node AArch64 Built-in Functions -@subsection AArch64 Built-in Functions +@defbuiltin{{unsigned int} __builtin_subc (unsigned int @var{a}, unsigned int @var{b}, unsigned int @var{carry_in}, unsigned int *@var{carry_out})} +@defbuiltinx{{unsigned long int} __builtin_subcl (unsigned long int @var{a}, unsigned long int @var{b}, unsigned int @var{carry_in}, unsigned long int *@var{carry_out})} +@defbuiltinx{{unsigned long long int} __builtin_subcll (unsigned long long int @var{a}, unsigned long long int @var{b}, unsigned long long int @var{carry_in}, unsigned long long int *@var{carry_out})} -These built-in functions are available for the AArch64 family of -processors. 
+These built-in functions are equivalent to:
 @smallexample
-unsigned int __builtin_aarch64_get_fpcr ();
-void __builtin_aarch64_set_fpcr (unsigned int);
-unsigned int __builtin_aarch64_get_fpsr ();
-void __builtin_aarch64_set_fpsr (unsigned int);
-
-unsigned long long __builtin_aarch64_get_fpcr64 ();
-void __builtin_aarch64_set_fpcr64 (unsigned long long);
-unsigned long long __builtin_aarch64_get_fpsr64 ();
-void __builtin_aarch64_set_fpsr64 (unsigned long long);
+ (@{ __typeof__ (@var{a}) s; \
+ __typeof__ (@var{a}) c1 = __builtin_sub_overflow (@var{a}, @var{b}, &s); \
+ __typeof__ (@var{a}) c2 = __builtin_sub_overflow (s, @var{carry_in}, &s); \
+ *(@var{carry_out}) = c1 | c2; \
+ s; @})
 @end smallexample

-@node Alpha Built-in Functions
-@subsection Alpha Built-in Functions
-
-These built-in functions are available for the Alpha family of
-processors, depending on the command-line switches used.
+i.e.@: they subtract 2 unsigned values from the first unsigned value,
+set what the last argument points to to 1 if any of the two subtractions
+overflowed (otherwise 0) and return the result of the subtractions.
+Note, while all the first 3 arguments can have arbitrary values, better code
+will be emitted if one of them (preferably the third one) has only values
+0 or 1 (i.e.@: carry-in).

-The following built-in functions are always available.  They
-all generate the machine instruction that is part of the name.
+@enddefbuiltin

-@smallexample
-long __builtin_alpha_implver (void);
-long __builtin_alpha_rpcc (void);
-long __builtin_alpha_amask (long);
-long __builtin_alpha_cmpbge (long, long);
-long __builtin_alpha_extbl (long, long);
-long __builtin_alpha_extwl (long, long);
-long __builtin_alpha_extll (long, long);
-long __builtin_alpha_extql (long, long);
-long __builtin_alpha_extwh (long, long);
-long __builtin_alpha_extlh (long, long);
-long __builtin_alpha_extqh (long, long);
-long __builtin_alpha_insbl (long, long);
-long __builtin_alpha_inswl (long, long);
-long __builtin_alpha_insll (long, long);
-long __builtin_alpha_insql (long, long);
-long __builtin_alpha_inswh (long, long);
-long __builtin_alpha_inslh (long, long);
-long __builtin_alpha_insqh (long, long);
-long __builtin_alpha_mskbl (long, long);
-long __builtin_alpha_mskwl (long, long);
-long __builtin_alpha_mskll (long, long);
-long __builtin_alpha_mskql (long, long);
-long __builtin_alpha_mskwh (long, long);
-long __builtin_alpha_msklh (long, long);
-long __builtin_alpha_mskqh (long, long);
-long __builtin_alpha_umulh (long, long);
-long __builtin_alpha_zap (long, long);
-long __builtin_alpha_zapnot (long, long);
-@end smallexample
+@node x86 specific memory model extensions for transactional memory
+@section x86-Specific Memory Model Extensions for Transactional Memory

-The following built-in functions are always with @option{-mmax}
-or @option{-mcpu=@var{cpu}} where @var{cpu} is @code{pca56} or
-later.  They all generate the machine instruction that is part
-of the name.
+The x86 architecture supports additional memory ordering flags
+to mark critical sections for hardware lock elision.
+These must be specified, in addition to an existing memory order, to
+atomic intrinsics.
-@smallexample
-long __builtin_alpha_pklb (long);
-long __builtin_alpha_pkwb (long);
-long __builtin_alpha_unpkbl (long);
-long __builtin_alpha_unpkbw (long);
-long __builtin_alpha_minub8 (long, long);
-long __builtin_alpha_minsb8 (long, long);
-long __builtin_alpha_minuw4 (long, long);
-long __builtin_alpha_minsw4 (long, long);
-long __builtin_alpha_maxub8 (long, long);
-long __builtin_alpha_maxsb8 (long, long);
-long __builtin_alpha_maxuw4 (long, long);
-long __builtin_alpha_maxsw4 (long, long);
-long __builtin_alpha_perr (long, long);
-@end smallexample
+@table @code
+@item __ATOMIC_HLE_ACQUIRE
+Start lock elision on a lock variable.
+Memory order must be @code{__ATOMIC_ACQUIRE} or stronger.
+@item __ATOMIC_HLE_RELEASE
+End lock elision on a lock variable.
+Memory order must be @code{__ATOMIC_RELEASE} or stronger.
+@end table

-The following built-in functions are always with @option{-mcix}
-or @option{-mcpu=@var{cpu}} where @var{cpu} is @code{ev67} or
-later.  They all generate the machine instruction that is part
-of the name.
+When a lock acquire fails, it is required for good performance to abort
+the transaction quickly.  This can be done with a @code{_mm_pause}.

 @smallexample
-long __builtin_alpha_cttz (long);
-long __builtin_alpha_ctlz (long);
-long __builtin_alpha_ctpop (long);
-@end smallexample
+#include <immintrin.h> // For _mm_pause

-The following built-in functions are available on systems that use the OSF/1
-PALcode.  Normally they invoke the @code{rduniq} and @code{wruniq}
-PAL calls, but when invoked with @option{-mtls-kernel}, they invoke
-@code{rdval} and @code{wrval}.
+int lockvar;

-@smallexample
-void *__builtin_thread_pointer (void);
-void __builtin_set_thread_pointer (void *);
+/* Acquire lock with lock elision */
+while (__atomic_exchange_n(&lockvar, 1, __ATOMIC_ACQUIRE|__ATOMIC_HLE_ACQUIRE))
+    _mm_pause(); /* Abort failed transaction */
+...
+/* Free lock with lock elision */
+__atomic_store_n(&lockvar, 0, __ATOMIC_RELEASE|__ATOMIC_HLE_RELEASE);
 @end smallexample

-@node ARC Built-in Functions
-@subsection ARC Built-in Functions
+@node Object Size Checking
+@section Object Size Checking

-The following built-in functions are provided for ARC targets.  The
-built-ins generate the corresponding assembly instructions.  In the
-examples given below, the generated code often requires an operand or
-result to be in a register.  Where necessary further code will be
-generated to ensure this is true, but for brevity this is not
-described in each case.
+@subsection Object Size Checking Built-in Functions
+@findex __builtin___memcpy_chk
+@findex __builtin___mempcpy_chk
+@findex __builtin___memmove_chk
+@findex __builtin___memset_chk
+@findex __builtin___strcpy_chk
+@findex __builtin___stpcpy_chk
+@findex __builtin___strncpy_chk
+@findex __builtin___strcat_chk
+@findex __builtin___strncat_chk

-@emph{Note:} Using a built-in to generate an instruction not supported
-by a target may cause problems. At present the compiler is not
-guaranteed to detect such misuse, and as a result an internal compiler
-error may be generated.
-
-@defbuiltin{int __builtin_arc_aligned (void *@var{val}, int @var{alignval})}
-Return 1 if @var{val} is known to have the byte alignment given
-by @var{alignval}, otherwise return 0.
-Note that this is different from
-@smallexample
-__alignof__(*(char *)@var{val}) >= alignval
-@end smallexample
-because __alignof__ sees only the type of the dereference, whereas
-__builtin_arc_align uses alignment information from the pointer
-as well as from the pointed-to type.
-The information available will depend on optimization level. -@enddefbuiltin +GCC implements a limited buffer overflow protection mechanism that can +prevent some buffer overflow attacks by determining the sizes of objects +into which data is about to be written and preventing the writes when +the size isn't sufficient. The built-in functions described below yield +the best results when used together and when optimization is enabled. +For example, to detect object sizes across function boundaries or to +follow pointer assignments through non-trivial control flow they rely +on various optimization passes enabled with @option{-O2}. However, to +a limited extent, they can be used without optimization as well. -@defbuiltin{void __builtin_arc_brk (void)} -Generates -@example -brk -@end example -@enddefbuiltin +@defbuiltin{size_t __builtin_object_size (const void * @var{ptr}, int @var{type})} +is a built-in construct that returns a constant number of bytes from +@var{ptr} to the end of the object @var{ptr} pointer points to +(if known at compile time). To determine the sizes of dynamically allocated +objects the function relies on the allocation functions called to obtain +the storage to be declared with the @code{alloc_size} attribute (@pxref{Common +Function Attributes}). @code{__builtin_object_size} never evaluates +its arguments for side effects. If there are any side effects in them, it +returns @code{(size_t) -1} for @var{type} 0 or 1 and @code{(size_t) 0} +for @var{type} 2 or 3. If there are multiple objects @var{ptr} can +point to and all of them are known at compile time, the returned number +is the maximum of remaining byte counts in those objects if @var{type} & 2 is +0 and minimum if nonzero. If it is not possible to determine which objects +@var{ptr} points to at compile time, @code{__builtin_object_size} should +return @code{(size_t) -1} for @var{type} 0 or 1 and @code{(size_t) 0} +for @var{type} 2 or 3. -@defbuiltin{{unsigned int} __builtin_arc_core_read (unsigned int @var{regno})} -The operand is the number of a register to be read. Generates: -@example -mov @var{dest}, r@var{regno} -@end example -where the value in @var{dest} will be the result returned from the -built-in. -@enddefbuiltin +@var{type} is an integer constant from 0 to 3. If the least significant +bit is clear, objects are whole variables, if it is set, a closest +surrounding subobject is considered the object a pointer points to. +The second bit determines if maximum or minimum of remaining bytes +is computed. -@defbuiltin{void __builtin_arc_core_write (unsigned int @var{regno}, unsigned int @var{val})} -The first operand is the number of a register to be written, the -second operand is a compile time constant to write into that -register. Generates: -@example -mov r@var{regno}, @var{val} -@end example -@enddefbuiltin +@smallexample +struct V @{ char buf1[10]; int b; char buf2[10]; @} var; +char *p = &var.buf1[1], *q = &var.b; -@defbuiltin{int __builtin_arc_divaw (int @var{a}, int @var{b})} -Only available if either @option{-mcpu=ARC700} or @option{-meA} is set. -Generates: -@example -divaw @var{dest}, @var{a}, @var{b} -@end example -where the value in @var{dest} will be the result returned from the -built-in. +/* Here the object p points to is var. */ +assert (__builtin_object_size (p, 0) == sizeof (var) - 1); +/* The subobject p points to is var.buf1. */ +assert (__builtin_object_size (p, 1) == sizeof (var.buf1) - 1); +/* The object q points to is var. 
*/ +assert (__builtin_object_size (q, 0) + == (char *) (&var + 1) - (char *) &var.b); +/* The subobject q points to is var.b. */ +assert (__builtin_object_size (q, 1) == sizeof (var.b)); +@end smallexample @enddefbuiltin -@defbuiltin{void __builtin_arc_flag (unsigned int @var{a})} -Generates -@example -flag @var{a} -@end example +@defbuiltin{{size_t} __builtin_dynamic_object_size (const void * @var{ptr}, int @var{type})} +is similar to @code{__builtin_object_size} in that it returns a number of bytes +from @var{ptr} to the end of the object @var{ptr} pointer points to, except +that the size returned may not be a constant. This results in successful +evaluation of object size estimates in a wider range of use cases and can be +more precise than @code{__builtin_object_size}, but it incurs a performance +penalty since it may add a runtime overhead on size computation. Semantics of +@var{type} as well as return values in case it is not possible to determine +which objects @var{ptr} points to at compile time are the same as in the case +of @code{__builtin_object_size}. @enddefbuiltin -@defbuiltin{{unsigned int} __builtin_arc_lr (unsigned int @var{auxr})} -The operand, @var{auxv}, is the address of an auxiliary register and -must be a compile time constant. Generates: -@example -lr @var{dest}, [@var{auxr}] -@end example -Where the value in @var{dest} will be the result returned from the -built-in. -@enddefbuiltin +@subsection Object Size Checking and Source Fortification -@defbuiltin{void __builtin_arc_mul64 (int @var{a}, int @var{b})} -Only available with @option{-mmul64}. Generates: -@example -mul64 @var{a}, @var{b} -@end example -@enddefbuiltin +Hardening of function calls using the @code{_FORTIFY_SOURCE} macro is +one of the key uses of the object size checking built-in functions. To +make implementation of these features more convenient and improve +optimization and diagnostics, there are built-in functions added for +many common string operation functions, e.g., for @code{memcpy} +@code{__builtin___memcpy_chk} built-in is provided. This built-in has +an additional last argument, which is the number of bytes remaining in +the object the @var{dest} argument points to or @code{(size_t) -1} if +the size is not known. -@defbuiltin{void __builtin_arc_mulu64 (unsigned int @var{a}, unsigned int @var{b})} -Only available with @option{-mmul64}. Generates: -@example -mulu64 @var{a}, @var{b} -@end example -@enddefbuiltin +The built-in functions are optimized into the normal string functions +like @code{memcpy} if the last argument is @code{(size_t) -1} or if +it is known at compile time that the destination object will not +be overflowed. If the compiler can determine at compile time that the +object will always be overflowed, it issues a warning. -@defbuiltin{void __builtin_arc_nop (void)} -Generates: -@example -nop -@end example -@enddefbuiltin +The intended use can be e.g.@: -@defbuiltin{int __builtin_arc_norm (int @var{src})} -Only valid if the @samp{norm} instruction is available through the -@option{-mnorm} option or by default with @option{-mcpu=ARC700}. -Generates: -@example -norm @var{dest}, @var{src} -@end example -Where the value in @var{dest} will be the result returned from the -built-in. 
-@enddefbuiltin +@smallexample +#undef memcpy +#define bos0(dest) __builtin_object_size (dest, 0) +#define memcpy(dest, src, n) \ + __builtin___memcpy_chk (dest, src, n, bos0 (dest)) -@defbuiltin{{short int} __builtin_arc_normw (short int @var{src})} -Only valid if the @samp{normw} instruction is available through the -@option{-mnorm} option or by default with @option{-mcpu=ARC700}. -Generates: -@example -normw @var{dest}, @var{src} -@end example -Where the value in @var{dest} will be the result returned from the -built-in. -@enddefbuiltin +char *volatile p; +char buf[10]; +/* It is unknown what object p points to, so this is optimized + into plain memcpy - no checking is possible. */ +memcpy (p, "abcde", n); +/* Destination is known and length too. It is known at compile + time there will be no overflow. */ +memcpy (&buf[5], "abcde", 5); +/* Destination is known, but the length is not known at compile time. + This will result in __memcpy_chk call that can check for overflow + at run time. */ +memcpy (&buf[5], "abcde", n); +/* Destination is known and it is known at compile time there will + be overflow. There will be a warning and __memcpy_chk call that + will abort the program at run time. */ +memcpy (&buf[6], "abcde", 5); +@end smallexample -@defbuiltin{void __builtin_arc_rtie (void)} -Generates: -@example -rtie -@end example -@enddefbuiltin +Such built-in functions are provided for @code{memcpy}, @code{mempcpy}, +@code{memmove}, @code{memset}, @code{strcpy}, @code{stpcpy}, @code{strncpy}, +@code{strcat} and @code{strncat}. -@defbuiltin{void __builtin_arc_sleep (int @var{a}} -Generates: -@example -sleep @var{a} -@end example -@enddefbuiltin +@subsubsection Formatted Output Function Checking +@defbuiltin{int __builtin___sprintf_chk @ + (char *@var{s}, int @var{flag}, size_t @var{os}, @ + const char *@var{fmt}, ...)} +@defbuiltinx{int __builtin___snprintf_chk @ + (char *@var{s}, size_t @var{maxlen}, int @var{flag}, @ + size_t @var{os}, const char *@var{fmt}, ...)} +@defbuiltinx{int __builtin___vsprintf_chk @ + (char *@var{s}, int @var{flag}, size_t @var{os}, @ + const char *@var{fmt}, va_list @var{ap})} +@defbuiltinx{int __builtin___vsnprintf_chk @ + (char *@var{s}, size_t @var{maxlen}, int @var{flag}, @ + size_t @var{os}, const char *@var{fmt}, @ + va_list @var{ap})} -@defbuiltin{void __builtin_arc_sr (unsigned int @var{val}, unsigned int @var{auxr})} -The first argument, @var{val}, is a compile time constant to be -written to the register, the second argument, @var{auxr}, is the -address of an auxiliary register. Generates: -@example -sr @var{val}, [@var{auxr}] -@end example -@enddefbuiltin +The added @var{flag} argument is passed unchanged to @code{__sprintf_chk} +etc.@: functions and can contain implementation specific flags on what +additional security measures the checking function might take, such as +handling @code{%n} differently. -@defbuiltin{int __builtin_arc_swap (int @var{src})} -Only valid with @option{-mswap}. Generates: -@example -swap @var{dest}, @var{src} -@end example -Where the value in @var{dest} will be the result returned from the -built-in. -@enddefbuiltin +The @var{os} argument is the object size @var{s} points to, like in the +other built-in functions. There is a small difference in the behavior +though, if @var{os} is @code{(size_t) -1}, the built-in functions are +optimized into the non-checking functions only if @var{flag} is 0, otherwise +the checking function is called with @var{os} argument set to +@code{(size_t) -1}. 
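+
+For instance, @code{sprintf} calls could be hardened with a wrapper in the
+same spirit as the @code{memcpy} wrapper above (an illustrative sketch
+only):
+
+@smallexample
+#undef sprintf
+#define sprintf(dest, ...) \
+  __builtin___sprintf_chk (dest, 0 /* flag */, \
+                           __builtin_object_size (dest, 0), __VA_ARGS__)
+@end smallexample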
-@defbuiltin{void __builtin_arc_swi (void)} -Generates: -@example -swi -@end example +In addition to this, there are checking built-in functions +@code{__builtin___printf_chk}, @code{__builtin___vprintf_chk}, +@code{__builtin___fprintf_chk} and @code{__builtin___vfprintf_chk}. +These have just one additional argument, @var{flag}, right before +format string @var{fmt}. If the compiler is able to optimize them to +@code{fputc} etc.@: functions, it does, otherwise the checking function +is called and the @var{flag} argument passed to it. @enddefbuiltin -@defbuiltin{void __builtin_arc_sync (void)} -Only available with @option{-mcpu=ARC700}. Generates: -@example -sync -@end example -@enddefbuiltin - -@defbuiltin{void __builtin_arc_trap_s (unsigned int @var{c})} -Only available with @option{-mcpu=ARC700}. Generates: -@example -trap_s @var{c} -@end example -@enddefbuiltin - -@defbuiltin{void __builtin_arc_unimp_s (void)} -Only available with @option{-mcpu=ARC700}. Generates: -@example -unimp_s -@end example -@enddefbuiltin - -The instructions generated by the following builtins are not -considered as candidates for scheduling. They are not moved around by -the compiler during scheduling, and thus can be expected to appear -where they are put in the C code: -@example -__builtin_arc_brk() -__builtin_arc_core_read() -__builtin_arc_core_write() -__builtin_arc_flag() -__builtin_arc_lr() -__builtin_arc_sleep() -__builtin_arc_sr() -__builtin_arc_swi() -@end example - -The following built-in functions are available for the ARCv2 family of -processors. - -@example -int __builtin_arc_clri (); -void __builtin_arc_kflag (unsigned); -void __builtin_arc_seti (int); -@end example - -The following built-in functions are available for the ARCv2 family -and uses @option{-mnorm}. - -@example -int __builtin_arc_ffs (int); -int __builtin_arc_fls (int); -@end example - -@node ARC SIMD Built-in Functions -@subsection ARC SIMD Built-in Functions - -SIMD builtins provided by the compiler can be used to generate the -vector instructions. This section describes the available builtins -and their usage in programs. With the @option{-msimd} option, the -compiler provides 128-bit vector types, which can be specified using -the @code{vector_size} attribute. The header file @file{arc-simd.h} -can be included to use the following predefined types: -@example -typedef int __v4si __attribute__((vector_size(16))); -typedef short __v8hi __attribute__((vector_size(16))); -@end example - -These types can be used to define 128-bit variables. The built-in -functions listed in the following section can be used on these -variables to generate the vector operations. - -For all builtins, @code{__builtin_arc_@var{someinsn}}, the header file -@file{arc-simd.h} also provides equivalent macros called -@code{_@var{someinsn}} that can be used for programming ease and -improved readability. The following macros for DMA control are also -provided: -@example -#define _setup_dma_in_channel_reg _vdiwr -#define _setup_dma_out_channel_reg _vdowr -@end example - -The following is a complete list of all the SIMD built-ins provided -for ARC, grouped by calling signature. 
- -The following take two @code{__v8hi} arguments and return a -@code{__v8hi} result: -@example -__v8hi __builtin_arc_vaddaw (__v8hi, __v8hi); -__v8hi __builtin_arc_vaddw (__v8hi, __v8hi); -__v8hi __builtin_arc_vand (__v8hi, __v8hi); -__v8hi __builtin_arc_vandaw (__v8hi, __v8hi); -__v8hi __builtin_arc_vavb (__v8hi, __v8hi); -__v8hi __builtin_arc_vavrb (__v8hi, __v8hi); -__v8hi __builtin_arc_vbic (__v8hi, __v8hi); -__v8hi __builtin_arc_vbicaw (__v8hi, __v8hi); -__v8hi __builtin_arc_vdifaw (__v8hi, __v8hi); -__v8hi __builtin_arc_vdifw (__v8hi, __v8hi); -__v8hi __builtin_arc_veqw (__v8hi, __v8hi); -__v8hi __builtin_arc_vh264f (__v8hi, __v8hi); -__v8hi __builtin_arc_vh264ft (__v8hi, __v8hi); -__v8hi __builtin_arc_vh264fw (__v8hi, __v8hi); -__v8hi __builtin_arc_vlew (__v8hi, __v8hi); -__v8hi __builtin_arc_vltw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmaxaw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmaxw (__v8hi, __v8hi); -__v8hi __builtin_arc_vminaw (__v8hi, __v8hi); -__v8hi __builtin_arc_vminw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr1aw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr1w (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr2aw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr2w (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr3aw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr3w (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr4aw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr4w (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr5aw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr5w (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr6aw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr6w (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr7aw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmr7w (__v8hi, __v8hi); -__v8hi __builtin_arc_vmrb (__v8hi, __v8hi); -__v8hi __builtin_arc_vmulaw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmulfaw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmulfw (__v8hi, __v8hi); -__v8hi __builtin_arc_vmulw (__v8hi, __v8hi); -__v8hi __builtin_arc_vnew (__v8hi, __v8hi); -__v8hi __builtin_arc_vor (__v8hi, __v8hi); -__v8hi __builtin_arc_vsubaw (__v8hi, __v8hi); -__v8hi __builtin_arc_vsubw (__v8hi, __v8hi); -__v8hi __builtin_arc_vsummw (__v8hi, __v8hi); -__v8hi __builtin_arc_vvc1f (__v8hi, __v8hi); -__v8hi __builtin_arc_vvc1ft (__v8hi, __v8hi); -__v8hi __builtin_arc_vxor (__v8hi, __v8hi); -__v8hi __builtin_arc_vxoraw (__v8hi, __v8hi); -@end example - -The following take one @code{__v8hi} and one @code{int} argument and return a -@code{__v8hi} result: - -@example -__v8hi __builtin_arc_vbaddw (__v8hi, int); -__v8hi __builtin_arc_vbmaxw (__v8hi, int); -__v8hi __builtin_arc_vbminw (__v8hi, int); -__v8hi __builtin_arc_vbmulaw (__v8hi, int); -__v8hi __builtin_arc_vbmulfw (__v8hi, int); -__v8hi __builtin_arc_vbmulw (__v8hi, int); -__v8hi __builtin_arc_vbrsubw (__v8hi, int); -__v8hi __builtin_arc_vbsubw (__v8hi, int); -@end example - -The following take one @code{__v8hi} argument and one @code{int} argument which -must be a 3-bit compile time constant indicating a register number -I0-I7. They return a @code{__v8hi} result. -@example -__v8hi __builtin_arc_vasrw (__v8hi, const int); -__v8hi __builtin_arc_vsr8 (__v8hi, const int); -__v8hi __builtin_arc_vsr8aw (__v8hi, const int); -@end example - -The following take one @code{__v8hi} argument and one @code{int} -argument which must be a 6-bit compile time constant. They return a -@code{__v8hi} result. 
-@example
-__v8hi __builtin_arc_vasrpwbi (__v8hi, const int);
-__v8hi __builtin_arc_vasrrpwbi (__v8hi, const int);
-__v8hi __builtin_arc_vasrrwi (__v8hi, const int);
-__v8hi __builtin_arc_vasrsrwi (__v8hi, const int);
-__v8hi __builtin_arc_vasrwi (__v8hi, const int);
-__v8hi __builtin_arc_vsr8awi (__v8hi, const int);
-__v8hi __builtin_arc_vsr8i (__v8hi, const int);
-@end example
-
-The following take one @code{__v8hi} argument and one @code{int}
-argument which must be an 8-bit compile time constant.  They return a
-@code{__v8hi} result.
-@example
-__v8hi __builtin_arc_vd6tapf (__v8hi, const int);
-__v8hi __builtin_arc_vmvaw (__v8hi, const int);
-__v8hi __builtin_arc_vmvw (__v8hi, const int);
-__v8hi __builtin_arc_vmvzw (__v8hi, const int);
-@end example
-
-The following take two @code{int} arguments, the second of which
-must be an 8-bit compile time constant.  They return a @code{__v8hi}
-result:
-@example
-__v8hi __builtin_arc_vmovaw (int, const int);
-__v8hi __builtin_arc_vmovw (int, const int);
-__v8hi __builtin_arc_vmovzw (int, const int);
-@end example
-
-The following take a single @code{__v8hi} argument and return a
-@code{__v8hi} result:
-@example
-__v8hi __builtin_arc_vabsaw (__v8hi);
-__v8hi __builtin_arc_vabsw (__v8hi);
-__v8hi __builtin_arc_vaddsuw (__v8hi);
-__v8hi __builtin_arc_vexch1 (__v8hi);
-__v8hi __builtin_arc_vexch2 (__v8hi);
-__v8hi __builtin_arc_vexch4 (__v8hi);
-__v8hi __builtin_arc_vsignw (__v8hi);
-__v8hi __builtin_arc_vupbaw (__v8hi);
-__v8hi __builtin_arc_vupbw (__v8hi);
-__v8hi __builtin_arc_vupsbaw (__v8hi);
-__v8hi __builtin_arc_vupsbw (__v8hi);
-@end example
-
-The following take two @code{int} arguments and return no result:
-@example
-void __builtin_arc_vdirun (int, int);
-void __builtin_arc_vdorun (int, int);
-@end example
-
-The following take two @code{int} arguments and return no result.  The
-first argument must be a 3-bit compile time constant indicating one of
-the DR0-DR7 DMA setup channels:
-@example
-void __builtin_arc_vdiwr (const int, int);
-void __builtin_arc_vdowr (const int, int);
-@end example
-
-The following take an @code{int} argument and return no result:
-@example
-void __builtin_arc_vendrec (int);
-void __builtin_arc_vrec (int);
-void __builtin_arc_vrecrun (int);
-void __builtin_arc_vrun (int);
-@end example
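-
-As a brief illustration of how these builtins are used, here is a
-minimal sketch; the function name and the choice of operation are
-arbitrary, and the macro form follows the @code{_@var{someinsn}}
-convention from @file{arc-simd.h} described above:
-
-@example
-#include <arc-simd.h>
-
-__v8hi
-f (__v8hi a, __v8hi b)
-@{
-  return _vaddw (a, b);  /* Macro form of __builtin_arc_vaddw.  */
-@}
-@end example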
+@node New/Delete Builtins
+@section Built-in functions for C++ allocations and deallocations
+@findex __builtin_operator_new
+@findex __builtin_operator_delete
+Calling these C++ built-in functions is similar to calling
+@code{::operator new} or @code{::operator delete} with the same
+arguments, except that it is an error if the selected
+@code{::operator new} or @code{::operator delete} overload is not a
+replaceable global operator.  In addition, for optimization purposes,
+calls to pairs of these functions can be omitted if access to the
+allocation is optimized out or can be replaced with an
+implementation-provided buffer on the stack, and multiple allocation
+calls can be merged into a single allocation.  In C++ such
+optimizations are normally allowed only for calls to such replaceable
+global operators from @code{new} and @code{delete} expressions.
-
-The following take a @code{__v8hi} argument and two @code{int}
-arguments and return a @code{__v8hi} result.  The second argument must
-be a 3-bit compile time constant, indicating one of the registers
-I0-I7, and the third argument must be an 8-bit compile time constant.
+@smallexample
+void foo () @{
+  int *a = new int;
+  delete a;  // This pair of allocation/deallocation operators can be omitted
+             // or replaced with int _temp; int *a = &_temp; etc.@:
+  void *b = ::operator new (32);
+  ::operator delete (b);  // This one cannot.
+  void *c = __builtin_operator_new (32);
+  __builtin_operator_delete (c);  // This one can.
+@}
+@end smallexample
-
-@emph{Note:} Although the equivalent hardware instructions do not take
-a SIMD register as an operand, these builtins overwrite the relevant
-bits of the @code{__v8hi} register provided as the first argument with
-the value loaded from the @code{[Ib, u8]} location in the SDM.
-
-@example
-__v8hi __builtin_arc_vld32 (__v8hi, const int, const int);
-__v8hi __builtin_arc_vld32wh (__v8hi, const int, const int);
-__v8hi __builtin_arc_vld32wl (__v8hi, const int, const int);
-__v8hi __builtin_arc_vld64 (__v8hi, const int, const int);
-@end example
-
-The following take two @code{int} arguments and return a @code{__v8hi}
-result.  The first argument must be a 3-bit compile time constant,
-indicating one of the registers I0-I7, and the second argument must be
-an 8-bit compile time constant.
-
-@example
-__v8hi __builtin_arc_vld128 (const int, const int);
-__v8hi __builtin_arc_vld64w (const int, const int);
-@end example
-
-The following take a @code{__v8hi} argument and two @code{int}
-arguments and return no result.  The second argument must be a 3-bit
-compile time constant, indicating one of the registers I0-I7, and the
-third argument must be an 8-bit compile time constant.
-
-@example
-void __builtin_arc_vst128 (__v8hi, const int, const int);
-void __builtin_arc_vst64 (__v8hi, const int, const int);
-@end example
-
-The following take a @code{__v8hi} argument and three @code{int}
-arguments and return no result.  The second argument must be a 3-bit
-compile-time constant, identifying the 16-bit sub-register to be
-stored; the third argument must be a 3-bit compile time constant,
-indicating one of the registers I0-I7; and the fourth argument must be
-an 8-bit compile time constant.
-
-@example
-void __builtin_arc_vst16_n (__v8hi, const int, const int, const int);
-void __builtin_arc_vst32_n (__v8hi, const int, const int, const int);
-@end example
-
-The following built-in functions are available on systems that use
-@option{-mmpy-option=6} or higher.
-
-@example
-__v2hi __builtin_arc_dmach (__v2hi, __v2hi);
-__v2hi __builtin_arc_dmachu (__v2hi, __v2hi);
-__v2hi __builtin_arc_dmpyh (__v2hi, __v2hi);
-__v2hi __builtin_arc_dmpyhu (__v2hi, __v2hi);
-__v2hi __builtin_arc_vaddsub2h (__v2hi, __v2hi);
-__v2hi __builtin_arc_vsubadd2h (__v2hi, __v2hi);
-@end example
-
-The following built-in functions are available on systems that use
-@option{-mmpy-option=7} or higher.
-
-@example
-__v2si __builtin_arc_vmac2h (__v2hi, __v2hi);
-__v2si __builtin_arc_vmac2hu (__v2hi, __v2hi);
-__v2si __builtin_arc_vmpy2h (__v2hi, __v2hi);
-__v2si __builtin_arc_vmpy2hu (__v2hi, __v2hi);
-@end example
-
-The following built-in functions are available on systems that use
-@option{-mmpy-option=8} or higher.
- -@example -long long __builtin_arc_qmach (__v4hi, __v4hi); -long long __builtin_arc_qmachu (__v4hi, __v4hi); -long long __builtin_arc_qmpyh (__v4hi, __v4hi); -long long __builtin_arc_qmpyhu (__v4hi, __v4hi); -long long __builtin_arc_dmacwh (__v2si, __v2hi); -long long __builtin_arc_dmacwhu (__v2si, __v2hi); -_v2si __builtin_arc_vaddsub (__v2si, __v2si); -_v2si __builtin_arc_vsubadd (__v2si, __v2si); -_v4hi __builtin_arc_vaddsub4h (__v4hi, __v4hi); -_v4hi __builtin_arc_vsubadd4h (__v4hi, __v4hi); -@end example - -@node ARM iWMMXt Built-in Functions -@subsection ARM iWMMXt Built-in Functions - -These built-in functions are available for the ARM family of -processors when the @option{-mcpu=iwmmxt} switch is used: - -@smallexample -typedef int v2si __attribute__ ((vector_size (8))); -typedef short v4hi __attribute__ ((vector_size (8))); -typedef char v8qi __attribute__ ((vector_size (8))); - -int __builtin_arm_getwcgr0 (void); -void __builtin_arm_setwcgr0 (int); -int __builtin_arm_getwcgr1 (void); -void __builtin_arm_setwcgr1 (int); -int __builtin_arm_getwcgr2 (void); -void __builtin_arm_setwcgr2 (int); -int __builtin_arm_getwcgr3 (void); -void __builtin_arm_setwcgr3 (int); -int __builtin_arm_textrmsb (v8qi, int); -int __builtin_arm_textrmsh (v4hi, int); -int __builtin_arm_textrmsw (v2si, int); -int __builtin_arm_textrmub (v8qi, int); -int __builtin_arm_textrmuh (v4hi, int); -int __builtin_arm_textrmuw (v2si, int); -v8qi __builtin_arm_tinsrb (v8qi, int, int); -v4hi __builtin_arm_tinsrh (v4hi, int, int); -v2si __builtin_arm_tinsrw (v2si, int, int); -long long __builtin_arm_tmia (long long, int, int); -long long __builtin_arm_tmiabb (long long, int, int); -long long __builtin_arm_tmiabt (long long, int, int); -long long __builtin_arm_tmiaph (long long, int, int); -long long __builtin_arm_tmiatb (long long, int, int); -long long __builtin_arm_tmiatt (long long, int, int); -int __builtin_arm_tmovmskb (v8qi); -int __builtin_arm_tmovmskh (v4hi); -int __builtin_arm_tmovmskw (v2si); -long long __builtin_arm_waccb (v8qi); -long long __builtin_arm_wacch (v4hi); -long long __builtin_arm_waccw (v2si); -v8qi __builtin_arm_waddb (v8qi, v8qi); -v8qi __builtin_arm_waddbss (v8qi, v8qi); -v8qi __builtin_arm_waddbus (v8qi, v8qi); -v4hi __builtin_arm_waddh (v4hi, v4hi); -v4hi __builtin_arm_waddhss (v4hi, v4hi); -v4hi __builtin_arm_waddhus (v4hi, v4hi); -v2si __builtin_arm_waddw (v2si, v2si); -v2si __builtin_arm_waddwss (v2si, v2si); -v2si __builtin_arm_waddwus (v2si, v2si); -v8qi __builtin_arm_walign (v8qi, v8qi, int); -long long __builtin_arm_wand(long long, long long); -long long __builtin_arm_wandn (long long, long long); -v8qi __builtin_arm_wavg2b (v8qi, v8qi); -v8qi __builtin_arm_wavg2br (v8qi, v8qi); -v4hi __builtin_arm_wavg2h (v4hi, v4hi); -v4hi __builtin_arm_wavg2hr (v4hi, v4hi); -v8qi __builtin_arm_wcmpeqb (v8qi, v8qi); -v4hi __builtin_arm_wcmpeqh (v4hi, v4hi); -v2si __builtin_arm_wcmpeqw (v2si, v2si); -v8qi __builtin_arm_wcmpgtsb (v8qi, v8qi); -v4hi __builtin_arm_wcmpgtsh (v4hi, v4hi); -v2si __builtin_arm_wcmpgtsw (v2si, v2si); -v8qi __builtin_arm_wcmpgtub (v8qi, v8qi); -v4hi __builtin_arm_wcmpgtuh (v4hi, v4hi); -v2si __builtin_arm_wcmpgtuw (v2si, v2si); -long long __builtin_arm_wmacs (long long, v4hi, v4hi); -long long __builtin_arm_wmacsz (v4hi, v4hi); -long long __builtin_arm_wmacu (long long, v4hi, v4hi); -long long __builtin_arm_wmacuz (v4hi, v4hi); -v4hi __builtin_arm_wmadds (v4hi, v4hi); -v4hi __builtin_arm_wmaddu (v4hi, v4hi); -v8qi __builtin_arm_wmaxsb (v8qi, v8qi); -v4hi __builtin_arm_wmaxsh 
(v4hi, v4hi); -v2si __builtin_arm_wmaxsw (v2si, v2si); -v8qi __builtin_arm_wmaxub (v8qi, v8qi); -v4hi __builtin_arm_wmaxuh (v4hi, v4hi); -v2si __builtin_arm_wmaxuw (v2si, v2si); -v8qi __builtin_arm_wminsb (v8qi, v8qi); -v4hi __builtin_arm_wminsh (v4hi, v4hi); -v2si __builtin_arm_wminsw (v2si, v2si); -v8qi __builtin_arm_wminub (v8qi, v8qi); -v4hi __builtin_arm_wminuh (v4hi, v4hi); -v2si __builtin_arm_wminuw (v2si, v2si); -v4hi __builtin_arm_wmulsm (v4hi, v4hi); -v4hi __builtin_arm_wmulul (v4hi, v4hi); -v4hi __builtin_arm_wmulum (v4hi, v4hi); -long long __builtin_arm_wor (long long, long long); -v2si __builtin_arm_wpackdss (long long, long long); -v2si __builtin_arm_wpackdus (long long, long long); -v8qi __builtin_arm_wpackhss (v4hi, v4hi); -v8qi __builtin_arm_wpackhus (v4hi, v4hi); -v4hi __builtin_arm_wpackwss (v2si, v2si); -v4hi __builtin_arm_wpackwus (v2si, v2si); -long long __builtin_arm_wrord (long long, long long); -long long __builtin_arm_wrordi (long long, int); -v4hi __builtin_arm_wrorh (v4hi, long long); -v4hi __builtin_arm_wrorhi (v4hi, int); -v2si __builtin_arm_wrorw (v2si, long long); -v2si __builtin_arm_wrorwi (v2si, int); -v2si __builtin_arm_wsadb (v2si, v8qi, v8qi); -v2si __builtin_arm_wsadbz (v8qi, v8qi); -v2si __builtin_arm_wsadh (v2si, v4hi, v4hi); -v2si __builtin_arm_wsadhz (v4hi, v4hi); -v4hi __builtin_arm_wshufh (v4hi, int); -long long __builtin_arm_wslld (long long, long long); -long long __builtin_arm_wslldi (long long, int); -v4hi __builtin_arm_wsllh (v4hi, long long); -v4hi __builtin_arm_wsllhi (v4hi, int); -v2si __builtin_arm_wsllw (v2si, long long); -v2si __builtin_arm_wsllwi (v2si, int); -long long __builtin_arm_wsrad (long long, long long); -long long __builtin_arm_wsradi (long long, int); -v4hi __builtin_arm_wsrah (v4hi, long long); -v4hi __builtin_arm_wsrahi (v4hi, int); -v2si __builtin_arm_wsraw (v2si, long long); -v2si __builtin_arm_wsrawi (v2si, int); -long long __builtin_arm_wsrld (long long, long long); -long long __builtin_arm_wsrldi (long long, int); -v4hi __builtin_arm_wsrlh (v4hi, long long); -v4hi __builtin_arm_wsrlhi (v4hi, int); -v2si __builtin_arm_wsrlw (v2si, long long); -v2si __builtin_arm_wsrlwi (v2si, int); -v8qi __builtin_arm_wsubb (v8qi, v8qi); -v8qi __builtin_arm_wsubbss (v8qi, v8qi); -v8qi __builtin_arm_wsubbus (v8qi, v8qi); -v4hi __builtin_arm_wsubh (v4hi, v4hi); -v4hi __builtin_arm_wsubhss (v4hi, v4hi); -v4hi __builtin_arm_wsubhus (v4hi, v4hi); -v2si __builtin_arm_wsubw (v2si, v2si); -v2si __builtin_arm_wsubwss (v2si, v2si); -v2si __builtin_arm_wsubwus (v2si, v2si); -v4hi __builtin_arm_wunpckehsb (v8qi); -v2si __builtin_arm_wunpckehsh (v4hi); -long long __builtin_arm_wunpckehsw (v2si); -v4hi __builtin_arm_wunpckehub (v8qi); -v2si __builtin_arm_wunpckehuh (v4hi); -long long __builtin_arm_wunpckehuw (v2si); -v4hi __builtin_arm_wunpckelsb (v8qi); -v2si __builtin_arm_wunpckelsh (v4hi); -long long __builtin_arm_wunpckelsw (v2si); -v4hi __builtin_arm_wunpckelub (v8qi); -v2si __builtin_arm_wunpckeluh (v4hi); -long long __builtin_arm_wunpckeluw (v2si); -v8qi __builtin_arm_wunpckihb (v8qi, v8qi); -v4hi __builtin_arm_wunpckihh (v4hi, v4hi); -v2si __builtin_arm_wunpckihw (v2si, v2si); -v8qi __builtin_arm_wunpckilb (v8qi, v8qi); -v4hi __builtin_arm_wunpckilh (v4hi, v4hi); -v2si __builtin_arm_wunpckilw (v2si, v2si); -long long __builtin_arm_wxor (long long, long long); -long long __builtin_arm_wzero (); -@end smallexample - - -@node ARM C Language Extensions (ACLE) -@subsection ARM C Language Extensions (ACLE) - -GCC implements extensions for C as 
described in the ARM C Language
-Extensions (ACLE) specification, which can be found at
-@uref{https://developer.arm.com/documentation/ihi0053/latest/}.
-
-As a part of ACLE, GCC implements extensions for Advanced SIMD as
-described in the ARM C Language Extensions Specification.  The complete
-list of Advanced SIMD intrinsics can be found at
-@uref{https://developer.arm.com/documentation/ihi0073/latest/}.
-The built-in intrinsics for the Advanced SIMD extension are available
-when NEON is enabled.
-
-Currently, the ARM and AArch64 back ends do not support ACLE 2.0 fully.
-Both back ends support CRC32 intrinsics and the ARM back end supports
-the Coprocessor intrinsics, all from @file{arm_acle.h}.  The ARM back
-end's 16-bit floating-point Advanced SIMD intrinsics currently comply
-with ACLE v1.1.  The AArch64 back end does not support 16-bit
-floating-point Advanced SIMD intrinsics yet.
-
-See @ref{ARM Options} and @ref{AArch64 Options} for more information on the
-availability of extensions.
-
-@node ARM Floating Point Status and Control Intrinsics
-@subsection ARM Floating Point Status and Control Intrinsics
-
-These built-in functions are available for the ARM family of
-processors with a floating-point unit.
-
-@smallexample
-unsigned int __builtin_arm_get_fpscr ();
-void __builtin_arm_set_fpscr (unsigned int);
-@end smallexample
-
-@node ARM ARMv8-M Security Extensions
-@subsection ARM ARMv8-M Security Extensions
-
-GCC implements the ARMv8-M Security Extensions as described in the
-ARMv8-M Security Extensions: Requirements on Development Tools
-Engineering Specification, which can be found at
-@uref{https://developer.arm.com/documentation/ecm0359818/latest/}.
-
-As part of the Security Extensions GCC implements two new function
-attributes: @code{cmse_nonsecure_entry} and @code{cmse_nonsecure_call}.
-
-As part of the Security Extensions GCC implements the intrinsics below.
-FPTR is used here to mean any function pointer type.
-
-@smallexample
-cmse_address_info_t cmse_TT (void *);
-cmse_address_info_t cmse_TT_fptr (FPTR);
-cmse_address_info_t cmse_TTT (void *);
-cmse_address_info_t cmse_TTT_fptr (FPTR);
-cmse_address_info_t cmse_TTA (void *);
-cmse_address_info_t cmse_TTA_fptr (FPTR);
-cmse_address_info_t cmse_TTAT (void *);
-cmse_address_info_t cmse_TTAT_fptr (FPTR);
-void * cmse_check_address_range (void *, size_t, int);
-typeof(p) cmse_nsfptr_create (FPTR p);
-intptr_t cmse_is_nsfptr (FPTR);
-int cmse_nonsecure_caller (void);
-@end smallexample
-
-@node AVR Built-in Functions
-@subsection AVR Built-in Functions
-
-For each AVR built-in function there is an identically named,
-uppercase built-in macro defined, so that users can easily query
-whether a specific built-in is implemented.  For example, if
-@code{__builtin_avr_nop} is available, the macro
-@code{__BUILTIN_AVR_NOP} is defined to @code{1}, and it is undefined
-otherwise.
-
-@defbuiltin{void __builtin_avr_nop (void)}
-@defbuiltinx{void __builtin_avr_sei (void)}
-@defbuiltinx{void __builtin_avr_cli (void)}
-@defbuiltinx{void __builtin_avr_sleep (void)}
-@defbuiltinx{void __builtin_avr_wdr (void)}
-@defbuiltinx{uint8_t __builtin_avr_swap (uint8_t)}
-@defbuiltinx{uint16_t __builtin_avr_fmul (uint8_t, uint8_t)}
-@defbuiltinx{int16_t __builtin_avr_fmuls (int8_t, int8_t)}
-@defbuiltinx{int16_t __builtin_avr_fmulsu (int8_t, uint8_t)}
-
-These built-in functions map to the respective machine
-instructions, i.e.@: @code{nop}, @code{sei}, @code{cli}, @code{sleep},
-@code{wdr}, @code{swap}, @code{fmul}, @code{fmuls} and @code{fmulsu},
-respectively.  The three @code{fmul*} built-ins are implemented as
-library calls if no hardware multiplier is available.
-@enddefbuiltin
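-
-As a quick illustration of the feature-test macros described above,
-here is a minimal sketch; the fallback branch is purely illustrative:
-
-@smallexample
-#include <stdint.h>
-
-static inline uint8_t
-swap_nibbles (uint8_t x)
-@{
-#ifdef __BUILTIN_AVR_SWAP
-  return __builtin_avr_swap (x);  /* Single SWAP instruction.  */
-#else
-  return (x << 4) | (x >> 4);     /* Portable C fallback.  */
-#endif
-@}
-@end smallexample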
-
-@defbuiltin{void __builtin_avr_delay_cycles (uint32_t @var{ticks})}
-Delay execution for @var{ticks} cycles.  Note that this
-built-in does not take into account the effect of interrupts that
-might increase the delay time.  @var{ticks} must be a compile-time
-integer constant; delays with a variable number of cycles are not
-supported.
-@enddefbuiltin
-
-@defbuiltin{uint8_t __builtin_avr_insert_bits (uint32_t @var{map}, uint8_t @var{bits}, uint8_t @var{val})}
-Insert bits from @var{bits} into @var{val} and return the resulting
-value.  The nibbles of @var{map} determine how the insertion is
-performed: let @var{X} be the @var{n}-th nibble of @var{map}.
-@enumerate
-@item If @var{X} is @code{0xf},
-then the @var{n}-th bit of @var{val} is returned unaltered.
-
-@item If @var{X} is in the range 0@dots{}7,
-then the @var{n}-th result bit is set to the @var{X}-th bit of @var{bits}.
-
-@item If @var{X} is in the range 8@dots{}@code{0xe},
-then the @var{n}-th result bit is undefined.
-@end enumerate
-
-@noindent
-One typical use case for this built-in is adjusting input and
-output values to non-contiguous port layouts.  Some examples:
-
-@smallexample
-// same as val, bits is unused
-__builtin_avr_insert_bits (0xffffffff, bits, val);
-@end smallexample
-
-@smallexample
-// same as bits, val is unused
-__builtin_avr_insert_bits (0x76543210, bits, val);
-@end smallexample
-
-@smallexample
-// same as rotating bits by 4
-__builtin_avr_insert_bits (0x32107654, bits, 0);
-@end smallexample
-
-@smallexample
-// high nibble of result is the high nibble of val
-// low nibble of result is the low nibble of bits
-__builtin_avr_insert_bits (0xffff3210, bits, val);
-@end smallexample
-
-@smallexample
-// reverse the bit order of bits
-__builtin_avr_insert_bits (0x01234567, bits, 0);
-@end smallexample
-@enddefbuiltin
-
-@defbuiltin{uint8_t __builtin_avr_mask1 (uint8_t @var{mask}, uint8_t @var{offs})}
-Rotate the 8-bit constant value @var{mask} by an offset of @var{offs},
-where @var{mask} is in @{ 0x01, 0xfe, 0x7f, 0x80 @}.
-This built-in can be used as an alternative to 8-bit expressions like
-@code{1 << offs} when their computation consumes too much
-time, and @var{offs} is known to be in the range 0@dots{}7.
-@example
-__builtin_avr_mask1 (1, offs)     // same as 1 << offs
-__builtin_avr_mask1 (~1, offs)    // same as ~(1 << offs)
-__builtin_avr_mask1 (0x80, offs)  // same as 0x80 >> offs
-__builtin_avr_mask1 (~0x80, offs) // same as ~(0x80 >> offs)
-@end example
-The open-coded C versions take at least @code{5 + 4 * @var{offs}} cycles
-(and 5 instructions), whereas the built-in takes 7 cycles and instructions
-(8 cycles and instructions in the case of @code{@var{mask} = 0x7f}).
-@enddefbuiltin
-
-@defbuiltin{void __builtin_avr_nops (uint16_t @var{count})}
-Insert @var{count} @code{NOP} instructions.
-The number of instructions must be a compile-time integer constant.
-@enddefbuiltin
-
-@b{All of the following built-in functions are only available for GNU-C.}
-
-@defbuiltin{int8_t __builtin_avr_flash_segment (const __memx void*)}
-This built-in takes a byte address in the 24-bit
-@ref{AVR Named Address Spaces,named address space} @code{__memx} and
-returns the number of the flash segment (the 64 KiB chunk) that the
-address points to.  Counting starts at @code{0}.
-If the address does not point to flash memory, @code{-1} is returned.
-@enddefbuiltin
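-
-A small usage sketch; the data object and its placement are purely
-illustrative:
-
-@smallexample
-const __memx char msg[] = "hello";
-
-int8_t
-segment_of_msg (void)
-@{
-  /* Yields 0 for the first 64 KiB chunk of flash, -1 if msg
-     is not located in flash at all.  */
-  return __builtin_avr_flash_segment (msg);
-@}
-@end smallexample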
-
-@defbuiltin{size_t __builtin_avr_strlen_flash (const __flash char*)}
-@defbuiltinx{size_t __builtin_avr_strlen_flashx (const __flashx char*)}
-@defbuiltinx{size_t __builtin_avr_strlen_memx (const __memx char*)}
-These built-ins return the length of a string located in the
-named address space @code{__flash}, @code{__flashx} or @code{__memx},
-respectively.  They are used to support functions like @code{strlen_F} from
-@w{@uref{https://avrdudes.github.io/avr-libc/avr-libc-user-manual/,AVR-LibC}}'s
-header @code{avr/flash.h}.
-@enddefbuiltin
-
-@noindent
-There are many more AVR-specific built-in functions that are used to
-implement the ISO/IEC TR 18037 ``Embedded C'' fixed-point functions of
-section 7.18a.6.  You don't need to use these built-ins directly.
-Instead, use the declarations supplied by the @code{stdfix.h} header
-with GNU-C99:
-
-@smallexample
-#include <stdfix.h>
-
-// Re-interpret the bit representation of the unsigned 16-bit
-// integer @var{uval} as a Q-format 0.16 value.
-unsigned fract get_bits (uint_ur_t uval)
-@{
-  return urbits (uval);
-@}
-@end smallexample
-
-@node Blackfin Built-in Functions
-@subsection Blackfin Built-in Functions
-
-Currently, there are two Blackfin-specific built-in functions.  These are
-used for generating @code{CSYNC} and @code{SSYNC} machine insns without
-using inline assembly; by using these built-in functions the compiler can
-automatically add workarounds for hardware errata involving these
-instructions.  These functions are named as follows:
-
-@smallexample
-void __builtin_bfin_csync (void);
-void __builtin_bfin_ssync (void);
-@end smallexample
-
-@node BPF Built-in Functions
-@subsection BPF Built-in Functions
-
-The following built-in functions are available for eBPF targets.
-
-@defbuiltin{{unsigned long long} __builtin_bpf_load_byte (unsigned long long @var{offset})}
-Load a byte from the @code{struct sk_buff} packet data pointed to by the
-register @code{%r6}, and return it.
-@enddefbuiltin
-
-@defbuiltin{{unsigned long long} __builtin_bpf_load_half (unsigned long long @var{offset})}
-Load 16 bits from the @code{struct sk_buff} packet data pointed to by the
-register @code{%r6}, and return them.
-@enddefbuiltin
-
-@defbuiltin{{unsigned long long} __builtin_bpf_load_word (unsigned long long @var{offset})}
-Load 32 bits from the @code{struct sk_buff} packet data pointed to by the
-register @code{%r6}, and return them.
-@enddefbuiltin
-
-@defbuiltin{@var{type} __builtin_preserve_access_index (@var{type} @var{expr})}
-BPF Compile Once-Run Everywhere (CO-RE) support.  Instruct GCC to
-generate CO-RE relocation records for any accesses to aggregate
-data structures (struct, union, array types) in @var{expr}.  This builtin
-is otherwise transparent; @var{expr} may have any type and its value is
-returned.  This builtin has no effect if @code{-mco-re} is not in effect
-(either specified or implied).
-@enddefbuiltin
-
-@defbuiltin{{unsigned int} __builtin_preserve_field_info (@var{expr}, unsigned int @var{kind})}
-BPF Compile Once-Run Everywhere (CO-RE) support.  This builtin is used to
-extract information to aid in struct/union relocations.  @var{expr} is
-an access to a field of a struct or union.  Depending on @var{kind}, different
-information is returned to the program.  A CO-RE relocation for the access in
-@var{expr} with kind @var{kind} is recorded if @code{-mco-re} is in effect.
- -The following values are supported for @var{kind}: -@table @code -@item FIELD_BYTE_OFFSET = 0 -The returned value is the offset, in bytes, of the field from the -beginning of the containing structure. For bit-fields, this is the byte offset -of the containing word. - -@item FIELD_BYTE_SIZE = 1 -The returned value is the size, in bytes, of the field. For bit-fields, -this is the size in bytes of the containing word. - -@item FIELD_EXISTENCE = 2 -The returned value is 1 if the field exists, 0 otherwise. Always 1 at -compile time. - -@item FIELD_SIGNEDNESS = 3 -The returned value is 1 if the field is signed, 0 otherwise. - -@item FIELD_LSHIFT_U64 = 4 -@itemx FIELD_RSHIFT_U64 = 5 -The returned value is the number of bits of left- or right-shifting -(respectively) needed in order to recover the original value of the field, -after it has been loaded by a read of @code{FIELD_BYTE_SIZE} bytes into an -unsigned 64-bit value. Primarily useful for reading bit-field values -from structures that may change between kernel versions. - -@end table - -Note that the return value is a constant which is known at -compile time. If the field has a variable offset then -@code{FIELD_BYTE_OFFSET}, @code{FIELD_LSHIFT_U64}, -and @code{FIELD_RSHIFT_U64} are not supported. -Similarly, if the field has a variable size then -@code{FIELD_BYTE_SIZE}, @code{FIELD_LSHIFT_U64}, -and @code{FIELD_RSHIFT_U64} are not supported. - -For example, @code{__builtin_preserve_field_info} can be used to reliably -extract bit-field values from a structure that may change between -kernel versions: - -@smallexample -struct S -@{ - short a; - int x:7; - int y:5; -@}; - -int -read_y (struct S *arg) -@{ - unsigned long long val; - unsigned int offset - = __builtin_preserve_field_info (arg->y, FIELD_BYTE_OFFSET); - unsigned int size - = __builtin_preserve_field_info (arg->y, FIELD_BYTE_SIZE); - - /* Read size bytes from arg + offset into val. */ - bpf_probe_read (&val, size, arg + offset); - - val <<= __builtin_preserve_field_info (arg->y, FIELD_LSHIFT_U64); - - if (__builtin_preserve_field_info (arg->y, FIELD_SIGNEDNESS)) - val = ((long long) val - >> __builtin_preserve_field_info (arg->y, FIELD_RSHIFT_U64)); - else - val >>= __builtin_preserve_field_info (arg->y, FIELD_RSHIFT_U64); - - return val; -@} - -@end smallexample -@enddefbuiltin - -@defbuiltin{{unsigned int} __builtin_preserve_enum_value (@var{type}, @var{enum}, unsigned int @var{kind})} -BPF Compile Once-Run Everywhere (CO-RE) support. This builtin collects enum -information and creates a CO-RE relocation relative to @var{enum} that should -be of @var{type}. The @var{kind} specifies the action performed. - -The following values are supported for @var{kind}: -@table @code -@item ENUM_VALUE_EXISTS = 0 -The return value is either 0 or 1 depending if the enum value exists in the -target. - -@item ENUM_VALUE = 1 -The return value is the enum value in the target kernel. -@end table -@enddefbuiltin - -@defbuiltin{{unsigned int} __builtin_btf_type_id (@var{type}, unsigned int @var{kind})} -BPF Compile Once-Run Everywhere (CO-RE) support. This builtin is used to get -the BTF type ID of a specified @var{type}. -Depending on the @var{kind} argument, it -either returns the ID of the local BTF information, or the BTF type ID in -the target kernel. - -The following values are supported for @var{kind}: -@table @code -@item BTF_TYPE_ID_LOCAL = 0 -Return the local BTF type ID. Always succeeds. - -@item BTF_TYPE_ID_TARGET = 1 -Return the target BTF type ID. 
If @var{type} does not exist in the target, -returns 0. -@end table -@enddefbuiltin - -@defbuiltin{{unsigned int} __builtin_preserve_type_info (@var{type}, unsigned int @var{kind})} -BPF Compile Once-Run Everywhere (CO-RE) support. This builtin performs named -type (struct/union/enum/typedef) verifications. The type of verification -depends on the @var{kind} argument provided. This builtin always -returns 0 if @var{type} does not exist in the target kernel. - -The following values are supported for @var{kind}: -@table @code -@item BTF_TYPE_EXISTS = 0 -Checks if @var{type} exists in the target. - -@item BTF_TYPE_MATCHES = 1 -Checks if @var{type} matches the local definition in the target kernel. - -@item BTF_TYPE_SIZE = 2 -Returns the size of the @var{type} within the target. -@end table -@enddefbuiltin - -@node FR-V Built-in Functions -@subsection FR-V Built-in Functions - -GCC provides many FR-V-specific built-in functions. In general, -these functions are intended to be compatible with those described -by @cite{FR-V Family, Softune C/C++ Compiler Manual (V6), Fujitsu -Semiconductor}. The two exceptions are @code{__MDUNPACKH} and -@code{__MBTOHE}, the GCC forms of which pass 128-bit values by -pointer rather than by value. - -Most of the functions are named after specific FR-V instructions. -Such functions are said to be ``directly mapped'' and are summarized -here in tabular form. - -@menu -* Argument Types:: -* Directly-mapped Integer Functions:: -* Directly-mapped Media Functions:: -* Raw read/write Functions:: -* Other Built-in Functions:: -@end menu - -@node Argument Types -@subsubsection Argument Types - -The arguments to the built-in functions can be divided into three groups: -register numbers, compile-time constants and run-time values. In order -to make this classification clear at a glance, the arguments and return -values are given the following pseudo types: - -@multitable @columnfractions .20 .30 .15 .35 -@headitem Pseudo type @tab Real C type @tab Constant? @tab Description -@item @code{uh} @tab @code{unsigned short} @tab No @tab an unsigned halfword -@item @code{uw1} @tab @code{unsigned int} @tab No @tab an unsigned word -@item @code{sw1} @tab @code{int} @tab No @tab a signed word -@item @code{uw2} @tab @code{unsigned long long} @tab No -@tab an unsigned doubleword -@item @code{sw2} @tab @code{long long} @tab No @tab a signed doubleword -@item @code{const} @tab @code{int} @tab Yes @tab an integer constant -@item @code{acc} @tab @code{int} @tab Yes @tab an ACC register number -@item @code{iacc} @tab @code{int} @tab Yes @tab an IACC register number -@end multitable - -These pseudo types are not defined by GCC, they are simply a notational -convenience used in this manual. - -Arguments of type @code{uh}, @code{uw1}, @code{sw1}, @code{uw2} -and @code{sw2} are evaluated at run time. They correspond to -register operands in the underlying FR-V instructions. - -@code{const} arguments represent immediate operands in the underlying -FR-V instructions. They must be compile-time constants. - -@code{acc} arguments are evaluated at compile time and specify the number -of an accumulator register. For example, an @code{acc} argument of 2 -selects the ACC2 register. - -@code{iacc} arguments are similar to @code{acc} arguments but specify the -number of an IACC register. See @pxref{Other Built-in Functions} -for more details. - -@node Directly-mapped Integer Functions -@subsubsection Directly-Mapped Integer Functions - -The functions listed below map directly to FR-V I-type instructions. 
- -@multitable @columnfractions .45 .32 .23 -@headitem Function prototype @tab Example usage @tab Assembly output -@item @code{sw1 __ADDSS (sw1, sw1)} -@tab @code{@var{c} = __ADDSS (@var{a}, @var{b})} -@tab @code{ADDSS @var{a},@var{b},@var{c}} -@item @code{sw1 __SCAN (sw1, sw1)} -@tab @code{@var{c} = __SCAN (@var{a}, @var{b})} -@tab @code{SCAN @var{a},@var{b},@var{c}} -@item @code{sw1 __SCUTSS (sw1)} -@tab @code{@var{b} = __SCUTSS (@var{a})} -@tab @code{SCUTSS @var{a},@var{b}} -@item @code{sw1 __SLASS (sw1, sw1)} -@tab @code{@var{c} = __SLASS (@var{a}, @var{b})} -@tab @code{SLASS @var{a},@var{b},@var{c}} -@item @code{void __SMASS (sw1, sw1)} -@tab @code{__SMASS (@var{a}, @var{b})} -@tab @code{SMASS @var{a},@var{b}} -@item @code{void __SMSSS (sw1, sw1)} -@tab @code{__SMSSS (@var{a}, @var{b})} -@tab @code{SMSSS @var{a},@var{b}} -@item @code{void __SMU (sw1, sw1)} -@tab @code{__SMU (@var{a}, @var{b})} -@tab @code{SMU @var{a},@var{b}} -@item @code{sw2 __SMUL (sw1, sw1)} -@tab @code{@var{c} = __SMUL (@var{a}, @var{b})} -@tab @code{SMUL @var{a},@var{b},@var{c}} -@item @code{sw1 __SUBSS (sw1, sw1)} -@tab @code{@var{c} = __SUBSS (@var{a}, @var{b})} -@tab @code{SUBSS @var{a},@var{b},@var{c}} -@item @code{uw2 __UMUL (uw1, uw1)} -@tab @code{@var{c} = __UMUL (@var{a}, @var{b})} -@tab @code{UMUL @var{a},@var{b},@var{c}} -@end multitable - -@node Directly-mapped Media Functions -@subsubsection Directly-Mapped Media Functions - -The functions listed below map directly to FR-V M-type instructions. - -@multitable @columnfractions .45 .32 .23 -@headitem Function prototype @tab Example usage @tab Assembly output -@item @code{uw1 __MABSHS (sw1)} -@tab @code{@var{b} = __MABSHS (@var{a})} -@tab @code{MABSHS @var{a},@var{b}} -@item @code{void __MADDACCS (acc, acc)} -@tab @code{__MADDACCS (@var{b}, @var{a})} -@tab @code{MADDACCS @var{a},@var{b}} -@item @code{sw1 __MADDHSS (sw1, sw1)} -@tab @code{@var{c} = __MADDHSS (@var{a}, @var{b})} -@tab @code{MADDHSS @var{a},@var{b},@var{c}} -@item @code{uw1 __MADDHUS (uw1, uw1)} -@tab @code{@var{c} = __MADDHUS (@var{a}, @var{b})} -@tab @code{MADDHUS @var{a},@var{b},@var{c}} -@item @code{uw1 __MAND (uw1, uw1)} -@tab @code{@var{c} = __MAND (@var{a}, @var{b})} -@tab @code{MAND @var{a},@var{b},@var{c}} -@item @code{void __MASACCS (acc, acc)} -@tab @code{__MASACCS (@var{b}, @var{a})} -@tab @code{MASACCS @var{a},@var{b}} -@item @code{uw1 __MAVEH (uw1, uw1)} -@tab @code{@var{c} = __MAVEH (@var{a}, @var{b})} -@tab @code{MAVEH @var{a},@var{b},@var{c}} -@item @code{uw2 __MBTOH (uw1)} -@tab @code{@var{b} = __MBTOH (@var{a})} -@tab @code{MBTOH @var{a},@var{b}} -@item @code{void __MBTOHE (uw1 *, uw1)} -@tab @code{__MBTOHE (&@var{b}, @var{a})} -@tab @code{MBTOHE @var{a},@var{b}} -@item @code{void __MCLRACC (acc)} -@tab @code{__MCLRACC (@var{a})} -@tab @code{MCLRACC @var{a}} -@item @code{void __MCLRACCA (void)} -@tab @code{__MCLRACCA ()} -@tab @code{MCLRACCA} -@item @code{uw1 __Mcop1 (uw1, uw1)} -@tab @code{@var{c} = __Mcop1 (@var{a}, @var{b})} -@tab @code{Mcop1 @var{a},@var{b},@var{c}} -@item @code{uw1 __Mcop2 (uw1, uw1)} -@tab @code{@var{c} = __Mcop2 (@var{a}, @var{b})} -@tab @code{Mcop2 @var{a},@var{b},@var{c}} -@item @code{uw1 __MCPLHI (uw2, const)} -@tab @code{@var{c} = __MCPLHI (@var{a}, @var{b})} -@tab @code{MCPLHI @var{a},#@var{b},@var{c}} -@item @code{uw1 __MCPLI (uw2, const)} -@tab @code{@var{c} = __MCPLI (@var{a}, @var{b})} -@tab @code{MCPLI @var{a},#@var{b},@var{c}} -@item @code{void __MCPXIS (acc, sw1, sw1)} -@tab @code{__MCPXIS (@var{c}, @var{a}, @var{b})} -@tab @code{MCPXIS 
@var{a},@var{b},@var{c}} -@item @code{void __MCPXIU (acc, uw1, uw1)} -@tab @code{__MCPXIU (@var{c}, @var{a}, @var{b})} -@tab @code{MCPXIU @var{a},@var{b},@var{c}} -@item @code{void __MCPXRS (acc, sw1, sw1)} -@tab @code{__MCPXRS (@var{c}, @var{a}, @var{b})} -@tab @code{MCPXRS @var{a},@var{b},@var{c}} -@item @code{void __MCPXRU (acc, uw1, uw1)} -@tab @code{__MCPXRU (@var{c}, @var{a}, @var{b})} -@tab @code{MCPXRU @var{a},@var{b},@var{c}} -@item @code{uw1 __MCUT (acc, uw1)} -@tab @code{@var{c} = __MCUT (@var{a}, @var{b})} -@tab @code{MCUT @var{a},@var{b},@var{c}} -@item @code{uw1 __MCUTSS (acc, sw1)} -@tab @code{@var{c} = __MCUTSS (@var{a}, @var{b})} -@tab @code{MCUTSS @var{a},@var{b},@var{c}} -@item @code{void __MDADDACCS (acc, acc)} -@tab @code{__MDADDACCS (@var{b}, @var{a})} -@tab @code{MDADDACCS @var{a},@var{b}} -@item @code{void __MDASACCS (acc, acc)} -@tab @code{__MDASACCS (@var{b}, @var{a})} -@tab @code{MDASACCS @var{a},@var{b}} -@item @code{uw2 __MDCUTSSI (acc, const)} -@tab @code{@var{c} = __MDCUTSSI (@var{a}, @var{b})} -@tab @code{MDCUTSSI @var{a},#@var{b},@var{c}} -@item @code{uw2 __MDPACKH (uw2, uw2)} -@tab @code{@var{c} = __MDPACKH (@var{a}, @var{b})} -@tab @code{MDPACKH @var{a},@var{b},@var{c}} -@item @code{uw2 __MDROTLI (uw2, const)} -@tab @code{@var{c} = __MDROTLI (@var{a}, @var{b})} -@tab @code{MDROTLI @var{a},#@var{b},@var{c}} -@item @code{void __MDSUBACCS (acc, acc)} -@tab @code{__MDSUBACCS (@var{b}, @var{a})} -@tab @code{MDSUBACCS @var{a},@var{b}} -@item @code{void __MDUNPACKH (uw1 *, uw2)} -@tab @code{__MDUNPACKH (&@var{b}, @var{a})} -@tab @code{MDUNPACKH @var{a},@var{b}} -@item @code{uw2 __MEXPDHD (uw1, const)} -@tab @code{@var{c} = __MEXPDHD (@var{a}, @var{b})} -@tab @code{MEXPDHD @var{a},#@var{b},@var{c}} -@item @code{uw1 __MEXPDHW (uw1, const)} -@tab @code{@var{c} = __MEXPDHW (@var{a}, @var{b})} -@tab @code{MEXPDHW @var{a},#@var{b},@var{c}} -@item @code{uw1 __MHDSETH (uw1, const)} -@tab @code{@var{c} = __MHDSETH (@var{a}, @var{b})} -@tab @code{MHDSETH @var{a},#@var{b},@var{c}} -@item @code{sw1 __MHDSETS (const)} -@tab @code{@var{b} = __MHDSETS (@var{a})} -@tab @code{MHDSETS #@var{a},@var{b}} -@item @code{uw1 __MHSETHIH (uw1, const)} -@tab @code{@var{b} = __MHSETHIH (@var{b}, @var{a})} -@tab @code{MHSETHIH #@var{a},@var{b}} -@item @code{sw1 __MHSETHIS (sw1, const)} -@tab @code{@var{b} = __MHSETHIS (@var{b}, @var{a})} -@tab @code{MHSETHIS #@var{a},@var{b}} -@item @code{uw1 __MHSETLOH (uw1, const)} -@tab @code{@var{b} = __MHSETLOH (@var{b}, @var{a})} -@tab @code{MHSETLOH #@var{a},@var{b}} -@item @code{sw1 __MHSETLOS (sw1, const)} -@tab @code{@var{b} = __MHSETLOS (@var{b}, @var{a})} -@tab @code{MHSETLOS #@var{a},@var{b}} -@item @code{uw1 __MHTOB (uw2)} -@tab @code{@var{b} = __MHTOB (@var{a})} -@tab @code{MHTOB @var{a},@var{b}} -@item @code{void __MMACHS (acc, sw1, sw1)} -@tab @code{__MMACHS (@var{c}, @var{a}, @var{b})} -@tab @code{MMACHS @var{a},@var{b},@var{c}} -@item @code{void __MMACHU (acc, uw1, uw1)} -@tab @code{__MMACHU (@var{c}, @var{a}, @var{b})} -@tab @code{MMACHU @var{a},@var{b},@var{c}} -@item @code{void __MMRDHS (acc, sw1, sw1)} -@tab @code{__MMRDHS (@var{c}, @var{a}, @var{b})} -@tab @code{MMRDHS @var{a},@var{b},@var{c}} -@item @code{void __MMRDHU (acc, uw1, uw1)} -@tab @code{__MMRDHU (@var{c}, @var{a}, @var{b})} -@tab @code{MMRDHU @var{a},@var{b},@var{c}} -@item @code{void __MMULHS (acc, sw1, sw1)} -@tab @code{__MMULHS (@var{c}, @var{a}, @var{b})} -@tab @code{MMULHS @var{a},@var{b},@var{c}} -@item @code{void __MMULHU (acc, uw1, uw1)} -@tab @code{__MMULHU 
(@var{c}, @var{a}, @var{b})} -@tab @code{MMULHU @var{a},@var{b},@var{c}} -@item @code{void __MMULXHS (acc, sw1, sw1)} -@tab @code{__MMULXHS (@var{c}, @var{a}, @var{b})} -@tab @code{MMULXHS @var{a},@var{b},@var{c}} -@item @code{void __MMULXHU (acc, uw1, uw1)} -@tab @code{__MMULXHU (@var{c}, @var{a}, @var{b})} -@tab @code{MMULXHU @var{a},@var{b},@var{c}} -@item @code{uw1 __MNOT (uw1)} -@tab @code{@var{b} = __MNOT (@var{a})} -@tab @code{MNOT @var{a},@var{b}} -@item @code{uw1 __MOR (uw1, uw1)} -@tab @code{@var{c} = __MOR (@var{a}, @var{b})} -@tab @code{MOR @var{a},@var{b},@var{c}} -@item @code{uw1 __MPACKH (uh, uh)} -@tab @code{@var{c} = __MPACKH (@var{a}, @var{b})} -@tab @code{MPACKH @var{a},@var{b},@var{c}} -@item @code{sw2 __MQADDHSS (sw2, sw2)} -@tab @code{@var{c} = __MQADDHSS (@var{a}, @var{b})} -@tab @code{MQADDHSS @var{a},@var{b},@var{c}} -@item @code{uw2 __MQADDHUS (uw2, uw2)} -@tab @code{@var{c} = __MQADDHUS (@var{a}, @var{b})} -@tab @code{MQADDHUS @var{a},@var{b},@var{c}} -@item @code{void __MQCPXIS (acc, sw2, sw2)} -@tab @code{__MQCPXIS (@var{c}, @var{a}, @var{b})} -@tab @code{MQCPXIS @var{a},@var{b},@var{c}} -@item @code{void __MQCPXIU (acc, uw2, uw2)} -@tab @code{__MQCPXIU (@var{c}, @var{a}, @var{b})} -@tab @code{MQCPXIU @var{a},@var{b},@var{c}} -@item @code{void __MQCPXRS (acc, sw2, sw2)} -@tab @code{__MQCPXRS (@var{c}, @var{a}, @var{b})} -@tab @code{MQCPXRS @var{a},@var{b},@var{c}} -@item @code{void __MQCPXRU (acc, uw2, uw2)} -@tab @code{__MQCPXRU (@var{c}, @var{a}, @var{b})} -@tab @code{MQCPXRU @var{a},@var{b},@var{c}} -@item @code{sw2 __MQLCLRHS (sw2, sw2)} -@tab @code{@var{c} = __MQLCLRHS (@var{a}, @var{b})} -@tab @code{MQLCLRHS @var{a},@var{b},@var{c}} -@item @code{sw2 __MQLMTHS (sw2, sw2)} -@tab @code{@var{c} = __MQLMTHS (@var{a}, @var{b})} -@tab @code{MQLMTHS @var{a},@var{b},@var{c}} -@item @code{void __MQMACHS (acc, sw2, sw2)} -@tab @code{__MQMACHS (@var{c}, @var{a}, @var{b})} -@tab @code{MQMACHS @var{a},@var{b},@var{c}} -@item @code{void __MQMACHU (acc, uw2, uw2)} -@tab @code{__MQMACHU (@var{c}, @var{a}, @var{b})} -@tab @code{MQMACHU @var{a},@var{b},@var{c}} -@item @code{void __MQMACXHS (acc, sw2, sw2)} -@tab @code{__MQMACXHS (@var{c}, @var{a}, @var{b})} -@tab @code{MQMACXHS @var{a},@var{b},@var{c}} -@item @code{void __MQMULHS (acc, sw2, sw2)} -@tab @code{__MQMULHS (@var{c}, @var{a}, @var{b})} -@tab @code{MQMULHS @var{a},@var{b},@var{c}} -@item @code{void __MQMULHU (acc, uw2, uw2)} -@tab @code{__MQMULHU (@var{c}, @var{a}, @var{b})} -@tab @code{MQMULHU @var{a},@var{b},@var{c}} -@item @code{void __MQMULXHS (acc, sw2, sw2)} -@tab @code{__MQMULXHS (@var{c}, @var{a}, @var{b})} -@tab @code{MQMULXHS @var{a},@var{b},@var{c}} -@item @code{void __MQMULXHU (acc, uw2, uw2)} -@tab @code{__MQMULXHU (@var{c}, @var{a}, @var{b})} -@tab @code{MQMULXHU @var{a},@var{b},@var{c}} -@item @code{sw2 __MQSATHS (sw2, sw2)} -@tab @code{@var{c} = __MQSATHS (@var{a}, @var{b})} -@tab @code{MQSATHS @var{a},@var{b},@var{c}} -@item @code{uw2 __MQSLLHI (uw2, int)} -@tab @code{@var{c} = __MQSLLHI (@var{a}, @var{b})} -@tab @code{MQSLLHI @var{a},@var{b},@var{c}} -@item @code{sw2 __MQSRAHI (sw2, int)} -@tab @code{@var{c} = __MQSRAHI (@var{a}, @var{b})} -@tab @code{MQSRAHI @var{a},@var{b},@var{c}} -@item @code{sw2 __MQSUBHSS (sw2, sw2)} -@tab @code{@var{c} = __MQSUBHSS (@var{a}, @var{b})} -@tab @code{MQSUBHSS @var{a},@var{b},@var{c}} -@item @code{uw2 __MQSUBHUS (uw2, uw2)} -@tab @code{@var{c} = __MQSUBHUS (@var{a}, @var{b})} -@tab @code{MQSUBHUS @var{a},@var{b},@var{c}} -@item @code{void __MQXMACHS (acc, sw2, 
sw2)}
-@tab @code{__MQXMACHS (@var{c}, @var{a}, @var{b})}
-@tab @code{MQXMACHS @var{a},@var{b},@var{c}}
-@item @code{void __MQXMACXHS (acc, sw2, sw2)}
-@tab @code{__MQXMACXHS (@var{c}, @var{a}, @var{b})}
-@tab @code{MQXMACXHS @var{a},@var{b},@var{c}}
-@item @code{uw1 __MRDACC (acc)}
-@tab @code{@var{b} = __MRDACC (@var{a})}
-@tab @code{MRDACC @var{a},@var{b}}
-@item @code{uw1 __MRDACCG (acc)}
-@tab @code{@var{b} = __MRDACCG (@var{a})}
-@tab @code{MRDACCG @var{a},@var{b}}
-@item @code{uw1 __MROTLI (uw1, const)}
-@tab @code{@var{c} = __MROTLI (@var{a}, @var{b})}
-@tab @code{MROTLI @var{a},#@var{b},@var{c}}
-@item @code{uw1 __MROTRI (uw1, const)}
-@tab @code{@var{c} = __MROTRI (@var{a}, @var{b})}
-@tab @code{MROTRI @var{a},#@var{b},@var{c}}
-@item @code{sw1 __MSATHS (sw1, sw1)}
-@tab @code{@var{c} = __MSATHS (@var{a}, @var{b})}
-@tab @code{MSATHS @var{a},@var{b},@var{c}}
-@item @code{uw1 __MSATHU (uw1, uw1)}
-@tab @code{@var{c} = __MSATHU (@var{a}, @var{b})}
-@tab @code{MSATHU @var{a},@var{b},@var{c}}
-@item @code{uw1 __MSLLHI (uw1, const)}
-@tab @code{@var{c} = __MSLLHI (@var{a}, @var{b})}
-@tab @code{MSLLHI @var{a},#@var{b},@var{c}}
-@item @code{sw1 __MSRAHI (sw1, const)}
-@tab @code{@var{c} = __MSRAHI (@var{a}, @var{b})}
-@tab @code{MSRAHI @var{a},#@var{b},@var{c}}
-@item @code{uw1 __MSRLHI (uw1, const)}
-@tab @code{@var{c} = __MSRLHI (@var{a}, @var{b})}
-@tab @code{MSRLHI @var{a},#@var{b},@var{c}}
-@item @code{void __MSUBACCS (acc, acc)}
-@tab @code{__MSUBACCS (@var{b}, @var{a})}
-@tab @code{MSUBACCS @var{a},@var{b}}
-@item @code{sw1 __MSUBHSS (sw1, sw1)}
-@tab @code{@var{c} = __MSUBHSS (@var{a}, @var{b})}
-@tab @code{MSUBHSS @var{a},@var{b},@var{c}}
-@item @code{uw1 __MSUBHUS (uw1, uw1)}
-@tab @code{@var{c} = __MSUBHUS (@var{a}, @var{b})}
-@tab @code{MSUBHUS @var{a},@var{b},@var{c}}
-@item @code{void __MTRAP (void)}
-@tab @code{__MTRAP ()}
-@tab @code{MTRAP}
-@item @code{uw2 __MUNPACKH (uw1)}
-@tab @code{@var{b} = __MUNPACKH (@var{a})}
-@tab @code{MUNPACKH @var{a},@var{b}}
-@item @code{uw1 __MWCUT (uw2, uw1)}
-@tab @code{@var{c} = __MWCUT (@var{a}, @var{b})}
-@tab @code{MWCUT @var{a},@var{b},@var{c}}
-@item @code{void __MWTACC (acc, uw1)}
-@tab @code{__MWTACC (@var{b}, @var{a})}
-@tab @code{MWTACC @var{a},@var{b}}
-@item @code{void __MWTACCG (acc, uw1)}
-@tab @code{__MWTACCG (@var{b}, @var{a})}
-@tab @code{MWTACCG @var{a},@var{b}}
-@item @code{uw1 __MXOR (uw1, uw1)}
-@tab @code{@var{c} = __MXOR (@var{a}, @var{b})}
-@tab @code{MXOR @var{a},@var{b},@var{c}}
-@end multitable
-
-@node Raw read/write Functions
-@subsubsection Raw Read/Write Functions
-
-This section describes built-in functions related to read and write
-instructions to access memory.  These functions generate
-@code{membar} instructions to flush the I/O loads and stores where
-appropriate, as described in the Fujitsu manual cited above.
- -@table @code - -@item unsigned char __builtin_read8 (void *@var{data}) -@item unsigned short __builtin_read16 (void *@var{data}) -@item unsigned long __builtin_read32 (void *@var{data}) -@item unsigned long long __builtin_read64 (void *@var{data}) - -@item void __builtin_write8 (void *@var{data}, unsigned char @var{datum}) -@item void __builtin_write16 (void *@var{data}, unsigned short @var{datum}) -@item void __builtin_write32 (void *@var{data}, unsigned long @var{datum}) -@item void __builtin_write64 (void *@var{data}, unsigned long long @var{datum}) -@end table - -@node Other Built-in Functions -@subsubsection Other Built-in Functions - -This section describes built-in functions that are not named after -a specific FR-V instruction. - -@table @code -@item sw2 __IACCreadll (iacc @var{reg}) -Return the full 64-bit value of IACC0@. The @var{reg} argument is reserved -for future expansion and must be 0. - -@item sw1 __IACCreadl (iacc @var{reg}) -Return the value of IACC0H if @var{reg} is 0 and IACC0L if @var{reg} is 1. -Other values of @var{reg} are rejected as invalid. - -@item void __IACCsetll (iacc @var{reg}, sw2 @var{x}) -Set the full 64-bit value of IACC0 to @var{x}. The @var{reg} argument -is reserved for future expansion and must be 0. - -@item void __IACCsetl (iacc @var{reg}, sw1 @var{x}) -Set IACC0H to @var{x} if @var{reg} is 0 and IACC0L to @var{x} if @var{reg} -is 1. Other values of @var{reg} are rejected as invalid. - -@item void __data_prefetch0 (const void *@var{x}) -Use the @code{dcpl} instruction to load the contents of address @var{x} -into the data cache. - -@item void __data_prefetch (const void *@var{x}) -Use the @code{nldub} instruction to load the contents of address @var{x} -into the data cache. The instruction is issued in slot I1@. -@end table - -@node LoongArch Base Built-in Functions -@subsection LoongArch Base Built-in Functions - -These built-in functions are available for LoongArch. 
-
-Data Type Description:
-@itemize
-@item @code{imm0_31}, a compile-time constant in range 0 to 31;
-@item @code{imm0_16383}, a compile-time constant in range 0 to 16383;
-@item @code{imm0_32767}, a compile-time constant in range 0 to 32767;
-@item @code{imm_n2048_2047}, a compile-time constant in range -2048 to 2047.
-@end itemize
-
-The intrinsics provided are listed below:
-@smallexample
-  unsigned int __builtin_loongarch_movfcsr2gr (imm0_31)
-  void __builtin_loongarch_movgr2fcsr (imm0_31, unsigned int)
-  void __builtin_loongarch_cacop_d (imm0_31, unsigned long int, imm_n2048_2047)
-  unsigned int __builtin_loongarch_cpucfg (unsigned int)
-  void __builtin_loongarch_asrtle_d (long int, long int)
-  void __builtin_loongarch_asrtgt_d (long int, long int)
-  long int __builtin_loongarch_lddir_d (long int, imm0_31)
-  void __builtin_loongarch_ldpte_d (long int, imm0_31)
-
-  int __builtin_loongarch_crc_w_b_w (char, int)
-  int __builtin_loongarch_crc_w_h_w (short, int)
-  int __builtin_loongarch_crc_w_w_w (int, int)
-  int __builtin_loongarch_crc_w_d_w (long int, int)
-  int __builtin_loongarch_crcc_w_b_w (char, int)
-  int __builtin_loongarch_crcc_w_h_w (short, int)
-  int __builtin_loongarch_crcc_w_w_w (int, int)
-  int __builtin_loongarch_crcc_w_d_w (long int, int)
-
-  unsigned int __builtin_loongarch_csrrd_w (imm0_16383)
-  unsigned int __builtin_loongarch_csrwr_w (unsigned int, imm0_16383)
-  unsigned int __builtin_loongarch_csrxchg_w (unsigned int, unsigned int, imm0_16383)
-  unsigned long int __builtin_loongarch_csrrd_d (imm0_16383)
-  unsigned long int __builtin_loongarch_csrwr_d (unsigned long int, imm0_16383)
-  unsigned long int __builtin_loongarch_csrxchg_d (unsigned long int, unsigned long int, imm0_16383)
-
-  unsigned char __builtin_loongarch_iocsrrd_b (unsigned int)
-  unsigned short __builtin_loongarch_iocsrrd_h (unsigned int)
-  unsigned int __builtin_loongarch_iocsrrd_w (unsigned int)
-  unsigned long int __builtin_loongarch_iocsrrd_d (unsigned int)
-  void __builtin_loongarch_iocsrwr_b (unsigned char, unsigned int)
-  void __builtin_loongarch_iocsrwr_h (unsigned short, unsigned int)
-  void __builtin_loongarch_iocsrwr_w (unsigned int, unsigned int)
-  void __builtin_loongarch_iocsrwr_d (unsigned long int, unsigned int)
-
-  void __builtin_loongarch_dbar (imm0_32767)
-  void __builtin_loongarch_ibar (imm0_32767)
-
-  void __builtin_loongarch_syscall (imm0_32767)
-  void __builtin_loongarch_break (imm0_32767)
-@end smallexample
-
-These intrinsic functions are available by using @option{-mfrecipe}.
-@smallexample
-  float __builtin_loongarch_frecipe_s (float);
-  double __builtin_loongarch_frecipe_d (double);
-  float __builtin_loongarch_frsqrte_s (float);
-  double __builtin_loongarch_frsqrte_d (double);
-@end smallexample
-
-@emph{Note:} The control registers come in 32-bit and 64-bit variants,
-but the access instructions do not distinguish between them, so GCC
-renames the control instructions when implementing these intrinsics.
-
-Take the @code{csrrd} instruction as an example; the built-in functions
-are implemented as follows:
-@smallexample
-  __builtin_loongarch_csrrd_w  // Use to read a 32-bit control register.
-  __builtin_loongarch_csrrd_d  // Use to read a 64-bit control register.
-@end smallexample
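-
-For instance, a minimal usage sketch (the register number 0 is
-arbitrary and purely illustrative):
-
-@smallexample
-unsigned int
-read_csr0 (void)
-@{
-  /* The operand must be an imm0_16383 compile-time constant.  */
-  return __builtin_loongarch_csrrd_w (0);
-@}
-@end smallexample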
-
-For convenience, these built-in functions are wrapped in shorter
-functions; the wrappers and the types @code{__drdtime_t} and
-@code{__rdtime_t} are defined in @code{larchintrin.h}, so to call the
-following functions you need to include @code{larchintrin.h}.
-
-@smallexample
-  typedef struct drdtime@{
-    unsigned long dvalue;
-    unsigned long dtimeid;
-  @} __drdtime_t;
-
-  typedef struct rdtime@{
-    unsigned int value;
-    unsigned int timeid;
-  @} __rdtime_t;
-@end smallexample
-
-@smallexample
-  __drdtime_t __rdtime_d (void)
-  __rdtime_t __rdtimel_w (void)
-  __rdtime_t __rdtimeh_w (void)
-  unsigned int __movfcsr2gr (imm0_31)
-  void __movgr2fcsr (imm0_31, unsigned int)
-  void __cacop_d (imm0_31, unsigned long, imm_n2048_2047)
-  unsigned int __cpucfg (unsigned int)
-  void __asrtle_d (long int, long int)
-  void __asrtgt_d (long int, long int)
-  long int __lddir_d (long int, imm0_31)
-  void __ldpte_d (long int, imm0_31)
-
-  int __crc_w_b_w (char, int)
-  int __crc_w_h_w (short, int)
-  int __crc_w_w_w (int, int)
-  int __crc_w_d_w (long int, int)
-  int __crcc_w_b_w (char, int)
-  int __crcc_w_h_w (short, int)
-  int __crcc_w_w_w (int, int)
-  int __crcc_w_d_w (long int, int)
-
-  unsigned int __csrrd_w (imm0_16383)
-  unsigned int __csrwr_w (unsigned int, imm0_16383)
-  unsigned int __csrxchg_w (unsigned int, unsigned int, imm0_16383)
-  unsigned long __csrrd_d (imm0_16383)
-  unsigned long __csrwr_d (unsigned long, imm0_16383)
-  unsigned long __csrxchg_d (unsigned long, unsigned long, imm0_16383)
-
-  unsigned char __iocsrrd_b (unsigned int)
-  unsigned short __iocsrrd_h (unsigned int)
-  unsigned int __iocsrrd_w (unsigned int)
-  unsigned long __iocsrrd_d (unsigned int)
-  void __iocsrwr_b (unsigned char, unsigned int)
-  void __iocsrwr_h (unsigned short, unsigned int)
-  void __iocsrwr_w (unsigned int, unsigned int)
-  void __iocsrwr_d (unsigned long, unsigned int)
-
-  void __dbar (imm0_32767)
-  void __ibar (imm0_32767)
-
-  void __syscall (imm0_32767)
-  void __break (imm0_32767)
-@end smallexample
-
-These intrinsic functions are available by including @code{larchintrin.h}
-and using @option{-mfrecipe}.
-@smallexample
-  float __frecipe_s (float);
-  double __frecipe_d (double);
-  float __frsqrte_s (float);
-  double __frsqrte_d (double);
-@end smallexample
-
-Additional built-in functions are available for LoongArch family
-processors to efficiently use 128-bit floating-point
-(@code{__float128}) values.
-
-The following are the basic built-in functions supported.
-@smallexample
-__float128 __builtin_fabsq (__float128);
-__float128 __builtin_copysignq (__float128, __float128);
-__float128 __builtin_infq (void);
-__float128 __builtin_huge_valq (void);
-__float128 __builtin_nanq (void);
-__float128 __builtin_nansq (void);
-@end smallexample
-
-The following built-in function returns the value currently set in the
-@samp{tp} register.
-@smallexample
-  void * __builtin_thread_pointer (void)
-@end smallexample
-
-@node LoongArch SX Vector Intrinsics
-@subsection LoongArch SX Vector Intrinsics
-
-GCC provides intrinsics to access the LSX (Loongson SIMD Extension)
-instructions.  The interface is made available by including
-@code{<lsxintrin.h>} and using @option{-mlsx}.
-
-The following vector typedefs are included in @code{lsxintrin.h}:
-
-@itemize
-@item @code{__m128i}, a 128-bit vector of fixed point;
-@item @code{__m128}, a 128-bit vector of single precision floating point;
-@item @code{__m128d}, a 128-bit vector of double precision floating point.
-@end itemize
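-
-As a brief illustration of these types, a minimal sketch (the function
-name and operation are arbitrary; @code{__lsx_vadd_w} is taken from the
-list below):
-
-@smallexample
-#include <lsxintrin.h>
-
-// Element-wise addition of two vectors of 32-bit integers.
-__m128i
-add_words (__m128i a, __m128i b)
-@{
-  return __lsx_vadd_w (a, b);
-@}
-@end smallexample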
-
-Instructions and their corresponding built-ins may place additional
-restrictions on their input and output values:
-@itemize
-@item @code{imm0_1}, an integer literal in range 0 to 1;
-@item @code{imm0_3}, an integer literal in range 0 to 3;
-@item @code{imm0_7}, an integer literal in range 0 to 7;
-@item @code{imm0_15}, an integer literal in range 0 to 15;
-@item @code{imm0_31}, an integer literal in range 0 to 31;
-@item @code{imm0_63}, an integer literal in range 0 to 63;
-@item @code{imm0_127}, an integer literal in range 0 to 127;
-@item @code{imm0_255}, an integer literal in range 0 to 255;
-@item @code{imm_n16_15}, an integer literal in range -16 to 15;
-@item @code{imm_n128_127}, an integer literal in range -128 to 127;
-@item @code{imm_n256_255}, an integer literal in range -256 to 255;
-@item @code{imm_n512_511}, an integer literal in range -512 to 511;
-@item @code{imm_n1024_1023}, an integer literal in range -1024 to 1023;
-@item @code{imm_n2048_2047}, an integer literal in range -2048 to 2047.
-@end itemize
-
-For convenience, GCC defines functions @code{__lsx_vrepli_@{b/h/w/d@}} and
-@code{__lsx_b[n]z_@{v/b/h/w/d@}}, which are implemented as follows:
-
-@smallexample
-a. @code{__lsx_vrepli_@{b/h/w/d@}}: Implements the case where the highest
-   bit of the @code{vldi} instruction's @code{i13} field is 0.
-
-   i13[12] == 1'b0
-   case i13[11:10] of:
-     2'b00: __lsx_vrepli_b (imm_n512_511)
-     2'b01: __lsx_vrepli_h (imm_n512_511)
-     2'b10: __lsx_vrepli_w (imm_n512_511)
-     2'b11: __lsx_vrepli_d (imm_n512_511)
-
-b. @code{__lsx_b[n]z_@{v/b/h/w/d@}}: These functions are defined because
-   the @code{vseteqz} class of instructions cannot be used on their own.
-
-   _lsx_bz_v  => vseteqz.v    + bcnez
-   _lsx_bnz_v => vsetnez.v    + bcnez
-   _lsx_bz_b  => vsetanyeqz.b + bcnez
-   _lsx_bz_h  => vsetanyeqz.h + bcnez
-   _lsx_bz_w  => vsetanyeqz.w + bcnez
-   _lsx_bz_d  => vsetanyeqz.d + bcnez
-   _lsx_bnz_b => vsetallnez.b + bcnez
-   _lsx_bnz_h => vsetallnez.h + bcnez
-   _lsx_bnz_w => vsetallnez.w + bcnez
-   _lsx_bnz_d => vsetallnez.d + bcnez
-@end smallexample
-
-@smallexample
-eg:
-  #include <lsxintrin.h>
-
-  extern __m128i @var{a};
-
-  void
-  test (void)
-  @{
-    if (__lsx_bz_v (@var{a}))
-      printf ("1\n");
-    else
-      printf ("2\n");
-  @}
-@end smallexample
-
-@emph{Note:} For instructions where the destination operand is also a
-source operand (i.e.@: only part of the bitfield of the destination
-register is modified), the first argument of the built-in function
-supplies that operand.
- -@smallexample -eg: - #include - - extern __m128i @var{dst}; - extern int @var{src}; - - void - test (void) - @{ - @var{dst} = __lsx_vinsgr2vr_b (@var{dst}, @var{src}, 3); - @} -@end smallexample - -The intrinsics provided are listed below: -@smallexample -int __lsx_bnz_b (__m128i); -int __lsx_bnz_d (__m128i); -int __lsx_bnz_h (__m128i); -int __lsx_bnz_v (__m128i); -int __lsx_bnz_w (__m128i); -int __lsx_bz_b (__m128i); -int __lsx_bz_d (__m128i); -int __lsx_bz_h (__m128i); -int __lsx_bz_v (__m128i); -int __lsx_bz_w (__m128i); -__m128i __lsx_vabsd_b (__m128i, __m128i); -__m128i __lsx_vabsd_bu (__m128i, __m128i); -__m128i __lsx_vabsd_d (__m128i, __m128i); -__m128i __lsx_vabsd_du (__m128i, __m128i); -__m128i __lsx_vabsd_h (__m128i, __m128i); -__m128i __lsx_vabsd_hu (__m128i, __m128i); -__m128i __lsx_vabsd_w (__m128i, __m128i); -__m128i __lsx_vabsd_wu (__m128i, __m128i); -__m128i __lsx_vadda_b (__m128i, __m128i); -__m128i __lsx_vadda_d (__m128i, __m128i); -__m128i __lsx_vadda_h (__m128i, __m128i); -__m128i __lsx_vadda_w (__m128i, __m128i); -__m128i __lsx_vadd_b (__m128i, __m128i); -__m128i __lsx_vadd_d (__m128i, __m128i); -__m128i __lsx_vadd_h (__m128i, __m128i); -__m128i __lsx_vaddi_bu (__m128i, imm0_31); -__m128i __lsx_vaddi_du (__m128i, imm0_31); -__m128i __lsx_vaddi_hu (__m128i, imm0_31); -__m128i __lsx_vaddi_wu (__m128i, imm0_31); -__m128i __lsx_vadd_q (__m128i, __m128i); -__m128i __lsx_vadd_w (__m128i, __m128i); -__m128i __lsx_vaddwev_d_w (__m128i, __m128i); -__m128i __lsx_vaddwev_d_wu (__m128i, __m128i); -__m128i __lsx_vaddwev_d_wu_w (__m128i, __m128i); -__m128i __lsx_vaddwev_h_b (__m128i, __m128i); -__m128i __lsx_vaddwev_h_bu (__m128i, __m128i); -__m128i __lsx_vaddwev_h_bu_b (__m128i, __m128i); -__m128i __lsx_vaddwev_q_d (__m128i, __m128i); -__m128i __lsx_vaddwev_q_du (__m128i, __m128i); -__m128i __lsx_vaddwev_q_du_d (__m128i, __m128i); -__m128i __lsx_vaddwev_w_h (__m128i, __m128i); -__m128i __lsx_vaddwev_w_hu (__m128i, __m128i); -__m128i __lsx_vaddwev_w_hu_h (__m128i, __m128i); -__m128i __lsx_vaddwod_d_w (__m128i, __m128i); -__m128i __lsx_vaddwod_d_wu (__m128i, __m128i); -__m128i __lsx_vaddwod_d_wu_w (__m128i, __m128i); -__m128i __lsx_vaddwod_h_b (__m128i, __m128i); -__m128i __lsx_vaddwod_h_bu (__m128i, __m128i); -__m128i __lsx_vaddwod_h_bu_b (__m128i, __m128i); -__m128i __lsx_vaddwod_q_d (__m128i, __m128i); -__m128i __lsx_vaddwod_q_du (__m128i, __m128i); -__m128i __lsx_vaddwod_q_du_d (__m128i, __m128i); -__m128i __lsx_vaddwod_w_h (__m128i, __m128i); -__m128i __lsx_vaddwod_w_hu (__m128i, __m128i); -__m128i __lsx_vaddwod_w_hu_h (__m128i, __m128i); -__m128i __lsx_vandi_b (__m128i, imm0_255); -__m128i __lsx_vandn_v (__m128i, __m128i); -__m128i __lsx_vand_v (__m128i, __m128i); -__m128i __lsx_vavg_b (__m128i, __m128i); -__m128i __lsx_vavg_bu (__m128i, __m128i); -__m128i __lsx_vavg_d (__m128i, __m128i); -__m128i __lsx_vavg_du (__m128i, __m128i); -__m128i __lsx_vavg_h (__m128i, __m128i); -__m128i __lsx_vavg_hu (__m128i, __m128i); -__m128i __lsx_vavgr_b (__m128i, __m128i); -__m128i __lsx_vavgr_bu (__m128i, __m128i); -__m128i __lsx_vavgr_d (__m128i, __m128i); -__m128i __lsx_vavgr_du (__m128i, __m128i); -__m128i __lsx_vavgr_h (__m128i, __m128i); -__m128i __lsx_vavgr_hu (__m128i, __m128i); -__m128i __lsx_vavgr_w (__m128i, __m128i); -__m128i __lsx_vavgr_wu (__m128i, __m128i); -__m128i __lsx_vavg_w (__m128i, __m128i); -__m128i __lsx_vavg_wu (__m128i, __m128i); -__m128i __lsx_vbitclr_b (__m128i, __m128i); -__m128i __lsx_vbitclr_d (__m128i, __m128i); -__m128i __lsx_vbitclr_h (__m128i, __m128i); 
-__m128i __lsx_vbitclri_b (__m128i, imm0_7); -__m128i __lsx_vbitclri_d (__m128i, imm0_63); -__m128i __lsx_vbitclri_h (__m128i, imm0_15); -__m128i __lsx_vbitclri_w (__m128i, imm0_31); -__m128i __lsx_vbitclr_w (__m128i, __m128i); -__m128i __lsx_vbitrev_b (__m128i, __m128i); -__m128i __lsx_vbitrev_d (__m128i, __m128i); -__m128i __lsx_vbitrev_h (__m128i, __m128i); -__m128i __lsx_vbitrevi_b (__m128i, imm0_7); -__m128i __lsx_vbitrevi_d (__m128i, imm0_63); -__m128i __lsx_vbitrevi_h (__m128i, imm0_15); -__m128i __lsx_vbitrevi_w (__m128i, imm0_31); -__m128i __lsx_vbitrev_w (__m128i, __m128i); -__m128i __lsx_vbitseli_b (__m128i, __m128i, imm0_255); -__m128i __lsx_vbitsel_v (__m128i, __m128i, __m128i); -__m128i __lsx_vbitset_b (__m128i, __m128i); -__m128i __lsx_vbitset_d (__m128i, __m128i); -__m128i __lsx_vbitset_h (__m128i, __m128i); -__m128i __lsx_vbitseti_b (__m128i, imm0_7); -__m128i __lsx_vbitseti_d (__m128i, imm0_63); -__m128i __lsx_vbitseti_h (__m128i, imm0_15); -__m128i __lsx_vbitseti_w (__m128i, imm0_31); -__m128i __lsx_vbitset_w (__m128i, __m128i); -__m128i __lsx_vbsll_v (__m128i, imm0_31); -__m128i __lsx_vbsrl_v (__m128i, imm0_31); -__m128i __lsx_vclo_b (__m128i); -__m128i __lsx_vclo_d (__m128i); -__m128i __lsx_vclo_h (__m128i); -__m128i __lsx_vclo_w (__m128i); -__m128i __lsx_vclz_b (__m128i); -__m128i __lsx_vclz_d (__m128i); -__m128i __lsx_vclz_h (__m128i); -__m128i __lsx_vclz_w (__m128i); -__m128i __lsx_vdiv_b (__m128i, __m128i); -__m128i __lsx_vdiv_bu (__m128i, __m128i); -__m128i __lsx_vdiv_d (__m128i, __m128i); -__m128i __lsx_vdiv_du (__m128i, __m128i); -__m128i __lsx_vdiv_h (__m128i, __m128i); -__m128i __lsx_vdiv_hu (__m128i, __m128i); -__m128i __lsx_vdiv_w (__m128i, __m128i); -__m128i __lsx_vdiv_wu (__m128i, __m128i); -__m128i __lsx_vexth_du_wu (__m128i); -__m128i __lsx_vexth_d_w (__m128i); -__m128i __lsx_vexth_h_b (__m128i); -__m128i __lsx_vexth_hu_bu (__m128i); -__m128i __lsx_vexth_q_d (__m128i); -__m128i __lsx_vexth_qu_du (__m128i); -__m128i __lsx_vexth_w_h (__m128i); -__m128i __lsx_vexth_wu_hu (__m128i); -__m128i __lsx_vextl_q_d (__m128i); -__m128i __lsx_vextl_qu_du (__m128i); -__m128i __lsx_vextrins_b (__m128i, __m128i, imm0_255); -__m128i __lsx_vextrins_d (__m128i, __m128i, imm0_255); -__m128i __lsx_vextrins_h (__m128i, __m128i, imm0_255); -__m128i __lsx_vextrins_w (__m128i, __m128i, imm0_255); -__m128d __lsx_vfadd_d (__m128d, __m128d); -__m128 __lsx_vfadd_s (__m128, __m128); -__m128i __lsx_vfclass_d (__m128d); -__m128i __lsx_vfclass_s (__m128); -__m128i __lsx_vfcmp_caf_d (__m128d, __m128d); -__m128i __lsx_vfcmp_caf_s (__m128, __m128); -__m128i __lsx_vfcmp_ceq_d (__m128d, __m128d); -__m128i __lsx_vfcmp_ceq_s (__m128, __m128); -__m128i __lsx_vfcmp_cle_d (__m128d, __m128d); -__m128i __lsx_vfcmp_cle_s (__m128, __m128); -__m128i __lsx_vfcmp_clt_d (__m128d, __m128d); -__m128i __lsx_vfcmp_clt_s (__m128, __m128); -__m128i __lsx_vfcmp_cne_d (__m128d, __m128d); -__m128i __lsx_vfcmp_cne_s (__m128, __m128); -__m128i __lsx_vfcmp_cor_d (__m128d, __m128d); -__m128i __lsx_vfcmp_cor_s (__m128, __m128); -__m128i __lsx_vfcmp_cueq_d (__m128d, __m128d); -__m128i __lsx_vfcmp_cueq_s (__m128, __m128); -__m128i __lsx_vfcmp_cule_d (__m128d, __m128d); -__m128i __lsx_vfcmp_cule_s (__m128, __m128); -__m128i __lsx_vfcmp_cult_d (__m128d, __m128d); -__m128i __lsx_vfcmp_cult_s (__m128, __m128); -__m128i __lsx_vfcmp_cun_d (__m128d, __m128d); -__m128i __lsx_vfcmp_cune_d (__m128d, __m128d); -__m128i __lsx_vfcmp_cune_s (__m128, __m128); -__m128i __lsx_vfcmp_cun_s (__m128, __m128); -__m128i __lsx_vfcmp_saf_d 
(__m128d, __m128d); -__m128i __lsx_vfcmp_saf_s (__m128, __m128); -__m128i __lsx_vfcmp_seq_d (__m128d, __m128d); -__m128i __lsx_vfcmp_seq_s (__m128, __m128); -__m128i __lsx_vfcmp_sle_d (__m128d, __m128d); -__m128i __lsx_vfcmp_sle_s (__m128, __m128); -__m128i __lsx_vfcmp_slt_d (__m128d, __m128d); -__m128i __lsx_vfcmp_slt_s (__m128, __m128); -__m128i __lsx_vfcmp_sne_d (__m128d, __m128d); -__m128i __lsx_vfcmp_sne_s (__m128, __m128); -__m128i __lsx_vfcmp_sor_d (__m128d, __m128d); -__m128i __lsx_vfcmp_sor_s (__m128, __m128); -__m128i __lsx_vfcmp_sueq_d (__m128d, __m128d); -__m128i __lsx_vfcmp_sueq_s (__m128, __m128); -__m128i __lsx_vfcmp_sule_d (__m128d, __m128d); -__m128i __lsx_vfcmp_sule_s (__m128, __m128); -__m128i __lsx_vfcmp_sult_d (__m128d, __m128d); -__m128i __lsx_vfcmp_sult_s (__m128, __m128); -__m128i __lsx_vfcmp_sun_d (__m128d, __m128d); -__m128i __lsx_vfcmp_sune_d (__m128d, __m128d); -__m128i __lsx_vfcmp_sune_s (__m128, __m128); -__m128i __lsx_vfcmp_sun_s (__m128, __m128); -__m128d __lsx_vfcvth_d_s (__m128); -__m128i __lsx_vfcvt_h_s (__m128, __m128); -__m128 __lsx_vfcvth_s_h (__m128i); -__m128d __lsx_vfcvtl_d_s (__m128); -__m128 __lsx_vfcvtl_s_h (__m128i); -__m128 __lsx_vfcvt_s_d (__m128d, __m128d); -__m128d __lsx_vfdiv_d (__m128d, __m128d); -__m128 __lsx_vfdiv_s (__m128, __m128); -__m128d __lsx_vffint_d_l (__m128i); -__m128d __lsx_vffint_d_lu (__m128i); -__m128d __lsx_vffinth_d_w (__m128i); -__m128d __lsx_vffintl_d_w (__m128i); -__m128 __lsx_vffint_s_l (__m128i, __m128i); -__m128 __lsx_vffint_s_w (__m128i); -__m128 __lsx_vffint_s_wu (__m128i); -__m128d __lsx_vflogb_d (__m128d); -__m128 __lsx_vflogb_s (__m128); -__m128d __lsx_vfmadd_d (__m128d, __m128d, __m128d); -__m128 __lsx_vfmadd_s (__m128, __m128, __m128); -__m128d __lsx_vfmaxa_d (__m128d, __m128d); -__m128 __lsx_vfmaxa_s (__m128, __m128); -__m128d __lsx_vfmax_d (__m128d, __m128d); -__m128 __lsx_vfmax_s (__m128, __m128); -__m128d __lsx_vfmina_d (__m128d, __m128d); -__m128 __lsx_vfmina_s (__m128, __m128); -__m128d __lsx_vfmin_d (__m128d, __m128d); -__m128 __lsx_vfmin_s (__m128, __m128); -__m128d __lsx_vfmsub_d (__m128d, __m128d, __m128d); -__m128 __lsx_vfmsub_s (__m128, __m128, __m128); -__m128d __lsx_vfmul_d (__m128d, __m128d); -__m128 __lsx_vfmul_s (__m128, __m128); -__m128d __lsx_vfnmadd_d (__m128d, __m128d, __m128d); -__m128 __lsx_vfnmadd_s (__m128, __m128, __m128); -__m128d __lsx_vfnmsub_d (__m128d, __m128d, __m128d); -__m128 __lsx_vfnmsub_s (__m128, __m128, __m128); -__m128d __lsx_vfrecip_d (__m128d); -__m128 __lsx_vfrecip_s (__m128); -__m128d __lsx_vfrint_d (__m128d); -__m128d __lsx_vfrintrm_d (__m128d); -__m128 __lsx_vfrintrm_s (__m128); -__m128d __lsx_vfrintrne_d (__m128d); -__m128 __lsx_vfrintrne_s (__m128); -__m128d __lsx_vfrintrp_d (__m128d); -__m128 __lsx_vfrintrp_s (__m128); -__m128d __lsx_vfrintrz_d (__m128d); -__m128 __lsx_vfrintrz_s (__m128); -__m128 __lsx_vfrint_s (__m128); -__m128d __lsx_vfrsqrt_d (__m128d); -__m128 __lsx_vfrsqrt_s (__m128); -__m128i __lsx_vfrstp_b (__m128i, __m128i, __m128i); -__m128i __lsx_vfrstp_h (__m128i, __m128i, __m128i); -__m128i __lsx_vfrstpi_b (__m128i, __m128i, imm0_31); -__m128i __lsx_vfrstpi_h (__m128i, __m128i, imm0_31); -__m128d __lsx_vfsqrt_d (__m128d); -__m128 __lsx_vfsqrt_s (__m128); -__m128d __lsx_vfsub_d (__m128d, __m128d); -__m128 __lsx_vfsub_s (__m128, __m128); -__m128i __lsx_vftinth_l_s (__m128); -__m128i __lsx_vftint_l_d (__m128d); -__m128i __lsx_vftintl_l_s (__m128); -__m128i __lsx_vftint_lu_d (__m128d); -__m128i __lsx_vftintrmh_l_s (__m128); -__m128i __lsx_vftintrm_l_d 
(__m128d); -__m128i __lsx_vftintrml_l_s (__m128); -__m128i __lsx_vftintrm_w_d (__m128d, __m128d); -__m128i __lsx_vftintrm_w_s (__m128); -__m128i __lsx_vftintrneh_l_s (__m128); -__m128i __lsx_vftintrne_l_d (__m128d); -__m128i __lsx_vftintrnel_l_s (__m128); -__m128i __lsx_vftintrne_w_d (__m128d, __m128d); -__m128i __lsx_vftintrne_w_s (__m128); -__m128i __lsx_vftintrph_l_s (__m128); -__m128i __lsx_vftintrp_l_d (__m128d); -__m128i __lsx_vftintrpl_l_s (__m128); -__m128i __lsx_vftintrp_w_d (__m128d, __m128d); -__m128i __lsx_vftintrp_w_s (__m128); -__m128i __lsx_vftintrzh_l_s (__m128); -__m128i __lsx_vftintrz_l_d (__m128d); -__m128i __lsx_vftintrzl_l_s (__m128); -__m128i __lsx_vftintrz_lu_d (__m128d); -__m128i __lsx_vftintrz_w_d (__m128d, __m128d); -__m128i __lsx_vftintrz_w_s (__m128); -__m128i __lsx_vftintrz_wu_s (__m128); -__m128i __lsx_vftint_w_d (__m128d, __m128d); -__m128i __lsx_vftint_w_s (__m128); -__m128i __lsx_vftint_wu_s (__m128); -__m128i __lsx_vhaddw_du_wu (__m128i, __m128i); -__m128i __lsx_vhaddw_d_w (__m128i, __m128i); -__m128i __lsx_vhaddw_h_b (__m128i, __m128i); -__m128i __lsx_vhaddw_hu_bu (__m128i, __m128i); -__m128i __lsx_vhaddw_q_d (__m128i, __m128i); -__m128i __lsx_vhaddw_qu_du (__m128i, __m128i); -__m128i __lsx_vhaddw_w_h (__m128i, __m128i); -__m128i __lsx_vhaddw_wu_hu (__m128i, __m128i); -__m128i __lsx_vhsubw_du_wu (__m128i, __m128i); -__m128i __lsx_vhsubw_d_w (__m128i, __m128i); -__m128i __lsx_vhsubw_h_b (__m128i, __m128i); -__m128i __lsx_vhsubw_hu_bu (__m128i, __m128i); -__m128i __lsx_vhsubw_q_d (__m128i, __m128i); -__m128i __lsx_vhsubw_qu_du (__m128i, __m128i); -__m128i __lsx_vhsubw_w_h (__m128i, __m128i); -__m128i __lsx_vhsubw_wu_hu (__m128i, __m128i); -__m128i __lsx_vilvh_b (__m128i, __m128i); -__m128i __lsx_vilvh_d (__m128i, __m128i); -__m128i __lsx_vilvh_h (__m128i, __m128i); -__m128i __lsx_vilvh_w (__m128i, __m128i); -__m128i __lsx_vilvl_b (__m128i, __m128i); -__m128i __lsx_vilvl_d (__m128i, __m128i); -__m128i __lsx_vilvl_h (__m128i, __m128i); -__m128i __lsx_vilvl_w (__m128i, __m128i); -__m128i __lsx_vinsgr2vr_b (__m128i, int, imm0_15); -__m128i __lsx_vinsgr2vr_d (__m128i, long int, imm0_1); -__m128i __lsx_vinsgr2vr_h (__m128i, int, imm0_7); -__m128i __lsx_vinsgr2vr_w (__m128i, int, imm0_3); -__m128i __lsx_vld (void *, imm_n2048_2047); -__m128i __lsx_vldi (imm_n1024_1023); -__m128i __lsx_vldrepl_b (void *, imm_n2048_2047); -__m128i __lsx_vldrepl_d (void *, imm_n256_255); -__m128i __lsx_vldrepl_h (void *, imm_n1024_1023); -__m128i __lsx_vldrepl_w (void *, imm_n512_511); -__m128i __lsx_vldx (void *, long int); -__m128i __lsx_vmadd_b (__m128i, __m128i, __m128i); -__m128i __lsx_vmadd_d (__m128i, __m128i, __m128i); -__m128i __lsx_vmadd_h (__m128i, __m128i, __m128i); -__m128i __lsx_vmadd_w (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_d_w (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_d_wu (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_d_wu_w (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_h_b (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_h_bu (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_h_bu_b (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_q_d (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_q_du (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_q_du_d (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_w_h (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_w_hu (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwev_w_hu_h (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwod_d_w (__m128i, __m128i, __m128i); -__m128i 
__lsx_vmaddwod_d_wu (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwod_d_wu_w (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwod_h_b (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwod_h_bu (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwod_h_bu_b (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwod_q_d (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwod_q_du (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwod_q_du_d (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwod_w_h (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwod_w_hu (__m128i, __m128i, __m128i); -__m128i __lsx_vmaddwod_w_hu_h (__m128i, __m128i, __m128i); -__m128i __lsx_vmax_b (__m128i, __m128i); -__m128i __lsx_vmax_bu (__m128i, __m128i); -__m128i __lsx_vmax_d (__m128i, __m128i); -__m128i __lsx_vmax_du (__m128i, __m128i); -__m128i __lsx_vmax_h (__m128i, __m128i); -__m128i __lsx_vmax_hu (__m128i, __m128i); -__m128i __lsx_vmaxi_b (__m128i, imm_n16_15); -__m128i __lsx_vmaxi_bu (__m128i, imm0_31); -__m128i __lsx_vmaxi_d (__m128i, imm_n16_15); -__m128i __lsx_vmaxi_du (__m128i, imm0_31); -__m128i __lsx_vmaxi_h (__m128i, imm_n16_15); -__m128i __lsx_vmaxi_hu (__m128i, imm0_31); -__m128i __lsx_vmaxi_w (__m128i, imm_n16_15); -__m128i __lsx_vmaxi_wu (__m128i, imm0_31); -__m128i __lsx_vmax_w (__m128i, __m128i); -__m128i __lsx_vmax_wu (__m128i, __m128i); -__m128i __lsx_vmin_b (__m128i, __m128i); -__m128i __lsx_vmin_bu (__m128i, __m128i); -__m128i __lsx_vmin_d (__m128i, __m128i); -__m128i __lsx_vmin_du (__m128i, __m128i); -__m128i __lsx_vmin_h (__m128i, __m128i); -__m128i __lsx_vmin_hu (__m128i, __m128i); -__m128i __lsx_vmini_b (__m128i, imm_n16_15); -__m128i __lsx_vmini_bu (__m128i, imm0_31); -__m128i __lsx_vmini_d (__m128i, imm_n16_15); -__m128i __lsx_vmini_du (__m128i, imm0_31); -__m128i __lsx_vmini_h (__m128i, imm_n16_15); -__m128i __lsx_vmini_hu (__m128i, imm0_31); -__m128i __lsx_vmini_w (__m128i, imm_n16_15); -__m128i __lsx_vmini_wu (__m128i, imm0_31); -__m128i __lsx_vmin_w (__m128i, __m128i); -__m128i __lsx_vmin_wu (__m128i, __m128i); -__m128i __lsx_vmod_b (__m128i, __m128i); -__m128i __lsx_vmod_bu (__m128i, __m128i); -__m128i __lsx_vmod_d (__m128i, __m128i); -__m128i __lsx_vmod_du (__m128i, __m128i); -__m128i __lsx_vmod_h (__m128i, __m128i); -__m128i __lsx_vmod_hu (__m128i, __m128i); -__m128i __lsx_vmod_w (__m128i, __m128i); -__m128i __lsx_vmod_wu (__m128i, __m128i); -__m128i __lsx_vmskgez_b (__m128i); -__m128i __lsx_vmskltz_b (__m128i); -__m128i __lsx_vmskltz_d (__m128i); -__m128i __lsx_vmskltz_h (__m128i); -__m128i __lsx_vmskltz_w (__m128i); -__m128i __lsx_vmsknz_b (__m128i); -__m128i __lsx_vmsub_b (__m128i, __m128i, __m128i); -__m128i __lsx_vmsub_d (__m128i, __m128i, __m128i); -__m128i __lsx_vmsub_h (__m128i, __m128i, __m128i); -__m128i __lsx_vmsub_w (__m128i, __m128i, __m128i); -__m128i __lsx_vmuh_b (__m128i, __m128i); -__m128i __lsx_vmuh_bu (__m128i, __m128i); -__m128i __lsx_vmuh_d (__m128i, __m128i); -__m128i __lsx_vmuh_du (__m128i, __m128i); -__m128i __lsx_vmuh_h (__m128i, __m128i); -__m128i __lsx_vmuh_hu (__m128i, __m128i); -__m128i __lsx_vmuh_w (__m128i, __m128i); -__m128i __lsx_vmuh_wu (__m128i, __m128i); -__m128i __lsx_vmul_b (__m128i, __m128i); -__m128i __lsx_vmul_d (__m128i, __m128i); -__m128i __lsx_vmul_h (__m128i, __m128i); -__m128i __lsx_vmul_w (__m128i, __m128i); -__m128i __lsx_vmulwev_d_w (__m128i, __m128i); -__m128i __lsx_vmulwev_d_wu (__m128i, __m128i); -__m128i __lsx_vmulwev_d_wu_w (__m128i, __m128i); -__m128i __lsx_vmulwev_h_b (__m128i, __m128i); -__m128i __lsx_vmulwev_h_bu (__m128i, __m128i); -__m128i 
__lsx_vmulwev_h_bu_b (__m128i, __m128i); -__m128i __lsx_vmulwev_q_d (__m128i, __m128i); -__m128i __lsx_vmulwev_q_du (__m128i, __m128i); -__m128i __lsx_vmulwev_q_du_d (__m128i, __m128i); -__m128i __lsx_vmulwev_w_h (__m128i, __m128i); -__m128i __lsx_vmulwev_w_hu (__m128i, __m128i); -__m128i __lsx_vmulwev_w_hu_h (__m128i, __m128i); -__m128i __lsx_vmulwod_d_w (__m128i, __m128i); -__m128i __lsx_vmulwod_d_wu (__m128i, __m128i); -__m128i __lsx_vmulwod_d_wu_w (__m128i, __m128i); -__m128i __lsx_vmulwod_h_b (__m128i, __m128i); -__m128i __lsx_vmulwod_h_bu (__m128i, __m128i); -__m128i __lsx_vmulwod_h_bu_b (__m128i, __m128i); -__m128i __lsx_vmulwod_q_d (__m128i, __m128i); -__m128i __lsx_vmulwod_q_du (__m128i, __m128i); -__m128i __lsx_vmulwod_q_du_d (__m128i, __m128i); -__m128i __lsx_vmulwod_w_h (__m128i, __m128i); -__m128i __lsx_vmulwod_w_hu (__m128i, __m128i); -__m128i __lsx_vmulwod_w_hu_h (__m128i, __m128i); -__m128i __lsx_vneg_b (__m128i); -__m128i __lsx_vneg_d (__m128i); -__m128i __lsx_vneg_h (__m128i); -__m128i __lsx_vneg_w (__m128i); -__m128i __lsx_vnori_b (__m128i, imm0_255); -__m128i __lsx_vnor_v (__m128i, __m128i); -__m128i __lsx_vori_b (__m128i, imm0_255); -__m128i __lsx_vorn_v (__m128i, __m128i); -__m128i __lsx_vor_v (__m128i, __m128i); -__m128i __lsx_vpackev_b (__m128i, __m128i); -__m128i __lsx_vpackev_d (__m128i, __m128i); -__m128i __lsx_vpackev_h (__m128i, __m128i); -__m128i __lsx_vpackev_w (__m128i, __m128i); -__m128i __lsx_vpackod_b (__m128i, __m128i); -__m128i __lsx_vpackod_d (__m128i, __m128i); -__m128i __lsx_vpackod_h (__m128i, __m128i); -__m128i __lsx_vpackod_w (__m128i, __m128i); -__m128i __lsx_vpcnt_b (__m128i); -__m128i __lsx_vpcnt_d (__m128i); -__m128i __lsx_vpcnt_h (__m128i); -__m128i __lsx_vpcnt_w (__m128i); -__m128i __lsx_vpermi_w (__m128i, __m128i, imm0_255); -__m128i __lsx_vpickev_b (__m128i, __m128i); -__m128i __lsx_vpickev_d (__m128i, __m128i); -__m128i __lsx_vpickev_h (__m128i, __m128i); -__m128i __lsx_vpickev_w (__m128i, __m128i); -__m128i __lsx_vpickod_b (__m128i, __m128i); -__m128i __lsx_vpickod_d (__m128i, __m128i); -__m128i __lsx_vpickod_h (__m128i, __m128i); -__m128i __lsx_vpickod_w (__m128i, __m128i); -int __lsx_vpickve2gr_b (__m128i, imm0_15); -unsigned int __lsx_vpickve2gr_bu (__m128i, imm0_15); -long int __lsx_vpickve2gr_d (__m128i, imm0_1); -unsigned long int __lsx_vpickve2gr_du (__m128i, imm0_1); -int __lsx_vpickve2gr_h (__m128i, imm0_7); -unsigned int __lsx_vpickve2gr_hu (__m128i, imm0_7); -int __lsx_vpickve2gr_w (__m128i, imm0_3); -unsigned int __lsx_vpickve2gr_wu (__m128i, imm0_3); -__m128i __lsx_vreplgr2vr_b (int); -__m128i __lsx_vreplgr2vr_d (long int); -__m128i __lsx_vreplgr2vr_h (int); -__m128i __lsx_vreplgr2vr_w (int); -__m128i __lsx_vrepli_b (imm_n512_511); -__m128i __lsx_vrepli_d (imm_n512_511); -__m128i __lsx_vrepli_h (imm_n512_511); -__m128i __lsx_vrepli_w (imm_n512_511); -__m128i __lsx_vreplve_b (__m128i, int); -__m128i __lsx_vreplve_d (__m128i, int); -__m128i __lsx_vreplve_h (__m128i, int); -__m128i __lsx_vreplvei_b (__m128i, imm0_15); -__m128i __lsx_vreplvei_d (__m128i, imm0_1); -__m128i __lsx_vreplvei_h (__m128i, imm0_7); -__m128i __lsx_vreplvei_w (__m128i, imm0_3); -__m128i __lsx_vreplve_w (__m128i, int); -__m128i __lsx_vrotr_b (__m128i, __m128i); -__m128i __lsx_vrotr_d (__m128i, __m128i); -__m128i __lsx_vrotr_h (__m128i, __m128i); -__m128i __lsx_vrotri_b (__m128i, imm0_7); -__m128i __lsx_vrotri_d (__m128i, imm0_63); -__m128i __lsx_vrotri_h (__m128i, imm0_15); -__m128i __lsx_vrotri_w (__m128i, imm0_31); -__m128i __lsx_vrotr_w (__m128i, 
__m128i); -__m128i __lsx_vsadd_b (__m128i, __m128i); -__m128i __lsx_vsadd_bu (__m128i, __m128i); -__m128i __lsx_vsadd_d (__m128i, __m128i); -__m128i __lsx_vsadd_du (__m128i, __m128i); -__m128i __lsx_vsadd_h (__m128i, __m128i); -__m128i __lsx_vsadd_hu (__m128i, __m128i); -__m128i __lsx_vsadd_w (__m128i, __m128i); -__m128i __lsx_vsadd_wu (__m128i, __m128i); -__m128i __lsx_vsat_b (__m128i, imm0_7); -__m128i __lsx_vsat_bu (__m128i, imm0_7); -__m128i __lsx_vsat_d (__m128i, imm0_63); -__m128i __lsx_vsat_du (__m128i, imm0_63); -__m128i __lsx_vsat_h (__m128i, imm0_15); -__m128i __lsx_vsat_hu (__m128i, imm0_15); -__m128i __lsx_vsat_w (__m128i, imm0_31); -__m128i __lsx_vsat_wu (__m128i, imm0_31); -__m128i __lsx_vseq_b (__m128i, __m128i); -__m128i __lsx_vseq_d (__m128i, __m128i); -__m128i __lsx_vseq_h (__m128i, __m128i); -__m128i __lsx_vseqi_b (__m128i, imm_n16_15); -__m128i __lsx_vseqi_d (__m128i, imm_n16_15); -__m128i __lsx_vseqi_h (__m128i, imm_n16_15); -__m128i __lsx_vseqi_w (__m128i, imm_n16_15); -__m128i __lsx_vseq_w (__m128i, __m128i); -__m128i __lsx_vshuf4i_b (__m128i, imm0_255); -__m128i __lsx_vshuf4i_d (__m128i, __m128i, imm0_255); -__m128i __lsx_vshuf4i_h (__m128i, imm0_255); -__m128i __lsx_vshuf4i_w (__m128i, imm0_255); -__m128i __lsx_vshuf_b (__m128i, __m128i, __m128i); -__m128i __lsx_vshuf_d (__m128i, __m128i, __m128i); -__m128i __lsx_vshuf_h (__m128i, __m128i, __m128i); -__m128i __lsx_vshuf_w (__m128i, __m128i, __m128i); -__m128i __lsx_vsigncov_b (__m128i, __m128i); -__m128i __lsx_vsigncov_d (__m128i, __m128i); -__m128i __lsx_vsigncov_h (__m128i, __m128i); -__m128i __lsx_vsigncov_w (__m128i, __m128i); -__m128i __lsx_vsle_b (__m128i, __m128i); -__m128i __lsx_vsle_bu (__m128i, __m128i); -__m128i __lsx_vsle_d (__m128i, __m128i); -__m128i __lsx_vsle_du (__m128i, __m128i); -__m128i __lsx_vsle_h (__m128i, __m128i); -__m128i __lsx_vsle_hu (__m128i, __m128i); -__m128i __lsx_vslei_b (__m128i, imm_n16_15); -__m128i __lsx_vslei_bu (__m128i, imm0_31); -__m128i __lsx_vslei_d (__m128i, imm_n16_15); -__m128i __lsx_vslei_du (__m128i, imm0_31); -__m128i __lsx_vslei_h (__m128i, imm_n16_15); -__m128i __lsx_vslei_hu (__m128i, imm0_31); -__m128i __lsx_vslei_w (__m128i, imm_n16_15); -__m128i __lsx_vslei_wu (__m128i, imm0_31); -__m128i __lsx_vsle_w (__m128i, __m128i); -__m128i __lsx_vsle_wu (__m128i, __m128i); -__m128i __lsx_vsll_b (__m128i, __m128i); -__m128i __lsx_vsll_d (__m128i, __m128i); -__m128i __lsx_vsll_h (__m128i, __m128i); -__m128i __lsx_vslli_b (__m128i, imm0_7); -__m128i __lsx_vslli_d (__m128i, imm0_63); -__m128i __lsx_vslli_h (__m128i, imm0_15); -__m128i __lsx_vslli_w (__m128i, imm0_31); -__m128i __lsx_vsll_w (__m128i, __m128i); -__m128i __lsx_vsllwil_du_wu (__m128i, imm0_31); -__m128i __lsx_vsllwil_d_w (__m128i, imm0_31); -__m128i __lsx_vsllwil_h_b (__m128i, imm0_7); -__m128i __lsx_vsllwil_hu_bu (__m128i, imm0_7); -__m128i __lsx_vsllwil_w_h (__m128i, imm0_15); -__m128i __lsx_vsllwil_wu_hu (__m128i, imm0_15); -__m128i __lsx_vslt_b (__m128i, __m128i); -__m128i __lsx_vslt_bu (__m128i, __m128i); -__m128i __lsx_vslt_d (__m128i, __m128i); -__m128i __lsx_vslt_du (__m128i, __m128i); -__m128i __lsx_vslt_h (__m128i, __m128i); -__m128i __lsx_vslt_hu (__m128i, __m128i); -__m128i __lsx_vslti_b (__m128i, imm_n16_15); -__m128i __lsx_vslti_bu (__m128i, imm0_31); -__m128i __lsx_vslti_d (__m128i, imm_n16_15); -__m128i __lsx_vslti_du (__m128i, imm0_31); -__m128i __lsx_vslti_h (__m128i, imm_n16_15); -__m128i __lsx_vslti_hu (__m128i, imm0_31); -__m128i __lsx_vslti_w (__m128i, imm_n16_15); -__m128i __lsx_vslti_wu 
(__m128i, imm0_31); -__m128i __lsx_vslt_w (__m128i, __m128i); -__m128i __lsx_vslt_wu (__m128i, __m128i); -__m128i __lsx_vsra_b (__m128i, __m128i); -__m128i __lsx_vsra_d (__m128i, __m128i); -__m128i __lsx_vsra_h (__m128i, __m128i); -__m128i __lsx_vsrai_b (__m128i, imm0_7); -__m128i __lsx_vsrai_d (__m128i, imm0_63); -__m128i __lsx_vsrai_h (__m128i, imm0_15); -__m128i __lsx_vsrai_w (__m128i, imm0_31); -__m128i __lsx_vsran_b_h (__m128i, __m128i); -__m128i __lsx_vsran_h_w (__m128i, __m128i); -__m128i __lsx_vsrani_b_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vsrani_d_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vsrani_h_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vsrani_w_d (__m128i, __m128i, imm0_63); -__m128i __lsx_vsran_w_d (__m128i, __m128i); -__m128i __lsx_vsrar_b (__m128i, __m128i); -__m128i __lsx_vsrar_d (__m128i, __m128i); -__m128i __lsx_vsrar_h (__m128i, __m128i); -__m128i __lsx_vsrari_b (__m128i, imm0_7); -__m128i __lsx_vsrari_d (__m128i, imm0_63); -__m128i __lsx_vsrari_h (__m128i, imm0_15); -__m128i __lsx_vsrari_w (__m128i, imm0_31); -__m128i __lsx_vsrarn_b_h (__m128i, __m128i); -__m128i __lsx_vsrarn_h_w (__m128i, __m128i); -__m128i __lsx_vsrarni_b_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vsrarni_d_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vsrarni_h_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vsrarni_w_d (__m128i, __m128i, imm0_63); -__m128i __lsx_vsrarn_w_d (__m128i, __m128i); -__m128i __lsx_vsrar_w (__m128i, __m128i); -__m128i __lsx_vsra_w (__m128i, __m128i); -__m128i __lsx_vsrl_b (__m128i, __m128i); -__m128i __lsx_vsrl_d (__m128i, __m128i); -__m128i __lsx_vsrl_h (__m128i, __m128i); -__m128i __lsx_vsrli_b (__m128i, imm0_7); -__m128i __lsx_vsrli_d (__m128i, imm0_63); -__m128i __lsx_vsrli_h (__m128i, imm0_15); -__m128i __lsx_vsrli_w (__m128i, imm0_31); -__m128i __lsx_vsrln_b_h (__m128i, __m128i); -__m128i __lsx_vsrln_h_w (__m128i, __m128i); -__m128i __lsx_vsrlni_b_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vsrlni_d_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vsrlni_h_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vsrlni_w_d (__m128i, __m128i, imm0_63); -__m128i __lsx_vsrln_w_d (__m128i, __m128i); -__m128i __lsx_vsrlr_b (__m128i, __m128i); -__m128i __lsx_vsrlr_d (__m128i, __m128i); -__m128i __lsx_vsrlr_h (__m128i, __m128i); -__m128i __lsx_vsrlri_b (__m128i, imm0_7); -__m128i __lsx_vsrlri_d (__m128i, imm0_63); -__m128i __lsx_vsrlri_h (__m128i, imm0_15); -__m128i __lsx_vsrlri_w (__m128i, imm0_31); -__m128i __lsx_vsrlrn_b_h (__m128i, __m128i); -__m128i __lsx_vsrlrn_h_w (__m128i, __m128i); -__m128i __lsx_vsrlrni_b_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vsrlrni_d_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vsrlrni_h_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vsrlrni_w_d (__m128i, __m128i, imm0_63); -__m128i __lsx_vsrlrn_w_d (__m128i, __m128i); -__m128i __lsx_vsrlr_w (__m128i, __m128i); -__m128i __lsx_vsrl_w (__m128i, __m128i); -__m128i __lsx_vssran_b_h (__m128i, __m128i); -__m128i __lsx_vssran_bu_h (__m128i, __m128i); -__m128i __lsx_vssran_hu_w (__m128i, __m128i); -__m128i __lsx_vssran_h_w (__m128i, __m128i); -__m128i __lsx_vssrani_b_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vssrani_bu_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vssrani_d_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vssrani_du_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vssrani_hu_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vssrani_h_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vssrani_w_d (__m128i, __m128i, imm0_63); -__m128i __lsx_vssrani_wu_d (__m128i, __m128i, imm0_63); 
-__m128i __lsx_vssran_w_d (__m128i, __m128i); -__m128i __lsx_vssran_wu_d (__m128i, __m128i); -__m128i __lsx_vssrarn_b_h (__m128i, __m128i); -__m128i __lsx_vssrarn_bu_h (__m128i, __m128i); -__m128i __lsx_vssrarn_hu_w (__m128i, __m128i); -__m128i __lsx_vssrarn_h_w (__m128i, __m128i); -__m128i __lsx_vssrarni_b_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vssrarni_bu_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vssrarni_d_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vssrarni_du_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vssrarni_hu_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vssrarni_h_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vssrarni_w_d (__m128i, __m128i, imm0_63); -__m128i __lsx_vssrarni_wu_d (__m128i, __m128i, imm0_63); -__m128i __lsx_vssrarn_w_d (__m128i, __m128i); -__m128i __lsx_vssrarn_wu_d (__m128i, __m128i); -__m128i __lsx_vssrln_b_h (__m128i, __m128i); -__m128i __lsx_vssrln_bu_h (__m128i, __m128i); -__m128i __lsx_vssrln_hu_w (__m128i, __m128i); -__m128i __lsx_vssrln_h_w (__m128i, __m128i); -__m128i __lsx_vssrlni_b_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vssrlni_bu_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vssrlni_d_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vssrlni_du_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vssrlni_hu_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vssrlni_h_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vssrlni_w_d (__m128i, __m128i, imm0_63); -__m128i __lsx_vssrlni_wu_d (__m128i, __m128i, imm0_63); -__m128i __lsx_vssrln_w_d (__m128i, __m128i); -__m128i __lsx_vssrln_wu_d (__m128i, __m128i); -__m128i __lsx_vssrlrn_b_h (__m128i, __m128i); -__m128i __lsx_vssrlrn_bu_h (__m128i, __m128i); -__m128i __lsx_vssrlrn_hu_w (__m128i, __m128i); -__m128i __lsx_vssrlrn_h_w (__m128i, __m128i); -__m128i __lsx_vssrlrni_b_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vssrlrni_bu_h (__m128i, __m128i, imm0_15); -__m128i __lsx_vssrlrni_d_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vssrlrni_du_q (__m128i, __m128i, imm0_127); -__m128i __lsx_vssrlrni_hu_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vssrlrni_h_w (__m128i, __m128i, imm0_31); -__m128i __lsx_vssrlrni_w_d (__m128i, __m128i, imm0_63); -__m128i __lsx_vssrlrni_wu_d (__m128i, __m128i, imm0_63); -__m128i __lsx_vssrlrn_w_d (__m128i, __m128i); -__m128i __lsx_vssrlrn_wu_d (__m128i, __m128i); -__m128i __lsx_vssub_b (__m128i, __m128i); -__m128i __lsx_vssub_bu (__m128i, __m128i); -__m128i __lsx_vssub_d (__m128i, __m128i); -__m128i __lsx_vssub_du (__m128i, __m128i); -__m128i __lsx_vssub_h (__m128i, __m128i); -__m128i __lsx_vssub_hu (__m128i, __m128i); -__m128i __lsx_vssub_w (__m128i, __m128i); -__m128i __lsx_vssub_wu (__m128i, __m128i); -void __lsx_vst (__m128i, void *, imm_n2048_2047); -void __lsx_vstelm_b (__m128i, void *, imm_n128_127, imm0_15); -void __lsx_vstelm_d (__m128i, void *, imm_n128_127, imm0_1); -void __lsx_vstelm_h (__m128i, void *, imm_n128_127, imm0_7); -void __lsx_vstelm_w (__m128i, void *, imm_n128_127, imm0_3); -void __lsx_vstx (__m128i, void *, long int); -__m128i __lsx_vsub_b (__m128i, __m128i); -__m128i __lsx_vsub_d (__m128i, __m128i); -__m128i __lsx_vsub_h (__m128i, __m128i); -__m128i __lsx_vsubi_bu (__m128i, imm0_31); -__m128i __lsx_vsubi_du (__m128i, imm0_31); -__m128i __lsx_vsubi_hu (__m128i, imm0_31); -__m128i __lsx_vsubi_wu (__m128i, imm0_31); -__m128i __lsx_vsub_q (__m128i, __m128i); -__m128i __lsx_vsub_w (__m128i, __m128i); -__m128i __lsx_vsubwev_d_w (__m128i, __m128i); -__m128i __lsx_vsubwev_d_wu (__m128i, __m128i); -__m128i __lsx_vsubwev_h_b (__m128i, __m128i); -__m128i 
__lsx_vsubwev_h_bu (__m128i, __m128i);
-__m128i __lsx_vsubwev_q_d (__m128i, __m128i);
-__m128i __lsx_vsubwev_q_du (__m128i, __m128i);
-__m128i __lsx_vsubwev_w_h (__m128i, __m128i);
-__m128i __lsx_vsubwev_w_hu (__m128i, __m128i);
-__m128i __lsx_vsubwod_d_w (__m128i, __m128i);
-__m128i __lsx_vsubwod_d_wu (__m128i, __m128i);
-__m128i __lsx_vsubwod_h_b (__m128i, __m128i);
-__m128i __lsx_vsubwod_h_bu (__m128i, __m128i);
-__m128i __lsx_vsubwod_q_d (__m128i, __m128i);
-__m128i __lsx_vsubwod_q_du (__m128i, __m128i);
-__m128i __lsx_vsubwod_w_h (__m128i, __m128i);
-__m128i __lsx_vsubwod_w_hu (__m128i, __m128i);
-__m128i __lsx_vxori_b (__m128i, imm0_255);
-__m128i __lsx_vxor_v (__m128i, __m128i);
-@end smallexample
-
-The following intrinsic functions are also available by including
-@code{lsxintrin.h}, but additionally require compiling with both
-@option{-mfrecipe} and @option{-mlsx}.
-@smallexample
-__m128d __lsx_vfrecipe_d (__m128d);
-__m128 __lsx_vfrecipe_s (__m128);
-__m128d __lsx_vfrsqrte_d (__m128d);
-__m128 __lsx_vfrsqrte_s (__m128);
-@end smallexample
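-
-For example, a reciprocal-estimate kernel using these functions might be
-compiled as sketched below (the file and variable names are illustrative
-only):
-
-@smallexample
-  /* Compile with: gcc -mlsx -mfrecipe example.c  */
-  #include <lsxintrin.h>
-
-  extern __m128 @var{x};
-
-  void
-  test (void)
-  @{
-    /* Approximate 1/x and 1/sqrt(x) elementwise.  */
-    @var{x} = __lsx_vfrecipe_s (@var{x});
-    @var{x} = __lsx_vfrsqrte_s (@var{x});
-  @}
-@end smallexample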
-
-@node LoongArch ASX Vector Intrinsics
-@subsection LoongArch ASX Vector Intrinsics
-
-GCC provides intrinsics to access the LASX (Loongson Advanced SIMD
-Extension) instructions. The interface is made available by including
-@code{<lasxintrin.h>} and using @option{-mlasx}.
-
-The following vector typedefs are included in @code{lasxintrin.h}:
-
-@itemize
-@item @code{__m256i}, a 256-bit vector of fixed point;
-@item @code{__m256}, a 256-bit vector of single precision floating point;
-@item @code{__m256d}, a 256-bit vector of double precision floating point.
-@end itemize
-
-Instructions and the corresponding built-ins may place additional
-restrictions on their operands.  The following names denote integer
-immediate operands with the given ranges; as for LSX, these must be
-compile-time integer constants:
-
-@itemize
-@item @code{imm0_1}, an integer literal in range 0 to 1.
-@item @code{imm0_3}, an integer literal in range 0 to 3.
-@item @code{imm0_7}, an integer literal in range 0 to 7.
-@item @code{imm0_15}, an integer literal in range 0 to 15.
-@item @code{imm0_31}, an integer literal in range 0 to 31.
-@item @code{imm0_63}, an integer literal in range 0 to 63.
-@item @code{imm0_127}, an integer literal in range 0 to 127.
-@item @code{imm0_255}, an integer literal in range 0 to 255.
-@item @code{imm_n16_15}, an integer literal in range -16 to 15.
-@item @code{imm_n128_127}, an integer literal in range -128 to 127.
-@item @code{imm_n256_255}, an integer literal in range -256 to 255.
-@item @code{imm_n512_511}, an integer literal in range -512 to 511.
-@item @code{imm_n1024_1023}, an integer literal in range -1024 to 1023.
-@item @code{imm_n2048_2047}, an integer literal in range -2048 to 2047.
-@end itemize
-
-For convenience, GCC defines functions @code{__lasx_xvrepli_@{b/h/w/d@}} and
-@code{__lasx_b[n]z_@{v/b/h/w/d@}}, which are implemented as follows:
-
-@smallexample
-a. @code{__lasx_xvrepli_@{b/h/w/d@}}: These implement the case where the
-   highest bit (@code{i13[12]}) of the @code{xvldi} instruction immediate
-   @code{i13} is 0.
-
-   i13[12] == 1'b0
-   case i13[11:10] of :
-   2'b00: __lasx_xvrepli_b (imm_n512_511)
-   2'b01: __lasx_xvrepli_h (imm_n512_511)
-   2'b10: __lasx_xvrepli_w (imm_n512_511)
-   2'b11: __lasx_xvrepli_d (imm_n512_511)
-
-b. @code{__lasx_b[n]z_@{v/b/h/w/d@}}: Because the @code{xvseteqz} class of
-   instructions cannot be used on its own, these functions are defined.
-
-   __lasx_xbz_v  => xvseteqz.v + bcnez
-   __lasx_xbnz_v => xvsetnez.v + bcnez
-   __lasx_xbz_b  => xvsetanyeqz.b + bcnez
-   __lasx_xbz_h  => xvsetanyeqz.h + bcnez
-   __lasx_xbz_w  => xvsetanyeqz.w + bcnez
-   __lasx_xbz_d  => xvsetanyeqz.d + bcnez
-   __lasx_xbnz_b => xvsetallnez.b + bcnez
-   __lasx_xbnz_h => xvsetallnez.h + bcnez
-   __lasx_xbnz_w => xvsetallnez.w + bcnez
-   __lasx_xbnz_d => xvsetallnez.d + bcnez
-@end smallexample
-
-@smallexample
-eg:
-  #include <lasxintrin.h>
-  #include <stdio.h>
-
-  extern __m256i @var{a};
-
-  void
-  test (void)
-  @{
-    if (__lasx_xbz_v (@var{a}))
-      printf ("1\n");
-    else
-      printf ("2\n");
-  @}
-@end smallexample
-
-@emph{Note:} For instructions where the destination operand is also a
-source operand (only part of the destination register is modified), the
-first argument of the built-in function is used as the destination
-operand, as in the following example.
-
-@smallexample
-eg:
-  #include <lasxintrin.h>
-
-  extern __m256i @var{dst};
-  extern int @var{src};
-
-  void
-  test (void)
-  @{
-    @var{dst} = __lasx_xvinsgr2vr_w (@var{dst}, @var{src}, 3);
-  @}
-@end smallexample
-
-The intrinsics provided are listed below:
-
-@smallexample
-__m256i __lasx_vext2xv_d_b (__m256i);
-__m256i __lasx_vext2xv_d_h (__m256i);
-__m256i __lasx_vext2xv_du_bu (__m256i);
-__m256i __lasx_vext2xv_du_hu (__m256i);
-__m256i __lasx_vext2xv_du_wu (__m256i);
-__m256i __lasx_vext2xv_d_w (__m256i);
-__m256i __lasx_vext2xv_h_b (__m256i);
-__m256i __lasx_vext2xv_hu_bu (__m256i);
-__m256i __lasx_vext2xv_w_b (__m256i);
-__m256i __lasx_vext2xv_w_h (__m256i);
-__m256i __lasx_vext2xv_wu_bu (__m256i);
-__m256i __lasx_vext2xv_wu_hu (__m256i);
-int __lasx_xbnz_b (__m256i);
-int __lasx_xbnz_d (__m256i);
-int __lasx_xbnz_h (__m256i);
-int __lasx_xbnz_v (__m256i);
-int __lasx_xbnz_w (__m256i);
-int __lasx_xbz_b (__m256i);
-int __lasx_xbz_d (__m256i);
-int __lasx_xbz_h (__m256i);
-int __lasx_xbz_v (__m256i);
-int __lasx_xbz_w (__m256i);
-__m256i __lasx_xvabsd_b (__m256i, __m256i);
-__m256i __lasx_xvabsd_bu (__m256i, __m256i);
-__m256i __lasx_xvabsd_d (__m256i, __m256i);
-__m256i __lasx_xvabsd_du (__m256i, __m256i);
-__m256i __lasx_xvabsd_h (__m256i, __m256i);
-__m256i __lasx_xvabsd_hu (__m256i, __m256i);
-__m256i __lasx_xvabsd_w (__m256i, __m256i);
-__m256i __lasx_xvabsd_wu (__m256i, __m256i);
-__m256i __lasx_xvadda_b (__m256i, __m256i);
-__m256i __lasx_xvadda_d (__m256i, __m256i);
-__m256i __lasx_xvadda_h (__m256i, __m256i);
-__m256i __lasx_xvadda_w (__m256i, __m256i);
-__m256i __lasx_xvadd_b (__m256i, __m256i);
-__m256i __lasx_xvadd_d (__m256i, __m256i);
-__m256i __lasx_xvadd_h (__m256i, __m256i);
-__m256i __lasx_xvaddi_bu (__m256i, imm0_31);
-__m256i __lasx_xvaddi_du (__m256i, imm0_31);
-__m256i __lasx_xvaddi_hu (__m256i, imm0_31);
-__m256i __lasx_xvaddi_wu (__m256i, imm0_31);
-__m256i __lasx_xvadd_q (__m256i, __m256i);
-__m256i __lasx_xvadd_w (__m256i, __m256i);
-__m256i __lasx_xvaddwev_d_w (__m256i, __m256i);
-__m256i __lasx_xvaddwev_d_wu (__m256i, __m256i);
-__m256i __lasx_xvaddwev_d_wu_w (__m256i, __m256i);
-__m256i __lasx_xvaddwev_h_b (__m256i, __m256i);
-__m256i __lasx_xvaddwev_h_bu (__m256i, __m256i);
-__m256i __lasx_xvaddwev_h_bu_b (__m256i, __m256i);
-__m256i __lasx_xvaddwev_q_d (__m256i, __m256i);
-__m256i __lasx_xvaddwev_q_du (__m256i, __m256i);
-__m256i __lasx_xvaddwev_q_du_d (__m256i, __m256i);
-__m256i __lasx_xvaddwev_w_h (__m256i, __m256i);
-__m256i __lasx_xvaddwev_w_hu (__m256i, __m256i);
-__m256i __lasx_xvaddwev_w_hu_h (__m256i, __m256i);
-__m256i __lasx_xvaddwod_d_w (__m256i, __m256i);
-__m256i __lasx_xvaddwod_d_wu (__m256i, __m256i);
-__m256i __lasx_xvaddwod_d_wu_w
(__m256i, __m256i); -__m256i __lasx_xvaddwod_h_b (__m256i, __m256i); -__m256i __lasx_xvaddwod_h_bu (__m256i, __m256i); -__m256i __lasx_xvaddwod_h_bu_b (__m256i, __m256i); -__m256i __lasx_xvaddwod_q_d (__m256i, __m256i); -__m256i __lasx_xvaddwod_q_du (__m256i, __m256i); -__m256i __lasx_xvaddwod_q_du_d (__m256i, __m256i); -__m256i __lasx_xvaddwod_w_h (__m256i, __m256i); -__m256i __lasx_xvaddwod_w_hu (__m256i, __m256i); -__m256i __lasx_xvaddwod_w_hu_h (__m256i, __m256i); -__m256i __lasx_xvandi_b (__m256i, imm0_255); -__m256i __lasx_xvandn_v (__m256i, __m256i); -__m256i __lasx_xvand_v (__m256i, __m256i); -__m256i __lasx_xvavg_b (__m256i, __m256i); -__m256i __lasx_xvavg_bu (__m256i, __m256i); -__m256i __lasx_xvavg_d (__m256i, __m256i); -__m256i __lasx_xvavg_du (__m256i, __m256i); -__m256i __lasx_xvavg_h (__m256i, __m256i); -__m256i __lasx_xvavg_hu (__m256i, __m256i); -__m256i __lasx_xvavgr_b (__m256i, __m256i); -__m256i __lasx_xvavgr_bu (__m256i, __m256i); -__m256i __lasx_xvavgr_d (__m256i, __m256i); -__m256i __lasx_xvavgr_du (__m256i, __m256i); -__m256i __lasx_xvavgr_h (__m256i, __m256i); -__m256i __lasx_xvavgr_hu (__m256i, __m256i); -__m256i __lasx_xvavgr_w (__m256i, __m256i); -__m256i __lasx_xvavgr_wu (__m256i, __m256i); -__m256i __lasx_xvavg_w (__m256i, __m256i); -__m256i __lasx_xvavg_wu (__m256i, __m256i); -__m256i __lasx_xvbitclr_b (__m256i, __m256i); -__m256i __lasx_xvbitclr_d (__m256i, __m256i); -__m256i __lasx_xvbitclr_h (__m256i, __m256i); -__m256i __lasx_xvbitclri_b (__m256i, imm0_7); -__m256i __lasx_xvbitclri_d (__m256i, imm0_63); -__m256i __lasx_xvbitclri_h (__m256i, imm0_15); -__m256i __lasx_xvbitclri_w (__m256i, imm0_31); -__m256i __lasx_xvbitclr_w (__m256i, __m256i); -__m256i __lasx_xvbitrev_b (__m256i, __m256i); -__m256i __lasx_xvbitrev_d (__m256i, __m256i); -__m256i __lasx_xvbitrev_h (__m256i, __m256i); -__m256i __lasx_xvbitrevi_b (__m256i, imm0_7); -__m256i __lasx_xvbitrevi_d (__m256i, imm0_63); -__m256i __lasx_xvbitrevi_h (__m256i, imm0_15); -__m256i __lasx_xvbitrevi_w (__m256i, imm0_31); -__m256i __lasx_xvbitrev_w (__m256i, __m256i); -__m256i __lasx_xvbitseli_b (__m256i, __m256i, imm0_255); -__m256i __lasx_xvbitsel_v (__m256i, __m256i, __m256i); -__m256i __lasx_xvbitset_b (__m256i, __m256i); -__m256i __lasx_xvbitset_d (__m256i, __m256i); -__m256i __lasx_xvbitset_h (__m256i, __m256i); -__m256i __lasx_xvbitseti_b (__m256i, imm0_7); -__m256i __lasx_xvbitseti_d (__m256i, imm0_63); -__m256i __lasx_xvbitseti_h (__m256i, imm0_15); -__m256i __lasx_xvbitseti_w (__m256i, imm0_31); -__m256i __lasx_xvbitset_w (__m256i, __m256i); -__m256i __lasx_xvbsll_v (__m256i, imm0_31); -__m256i __lasx_xvbsrl_v (__m256i, imm0_31); -__m256i __lasx_xvclo_b (__m256i); -__m256i __lasx_xvclo_d (__m256i); -__m256i __lasx_xvclo_h (__m256i); -__m256i __lasx_xvclo_w (__m256i); -__m256i __lasx_xvclz_b (__m256i); -__m256i __lasx_xvclz_d (__m256i); -__m256i __lasx_xvclz_h (__m256i); -__m256i __lasx_xvclz_w (__m256i); -__m256i __lasx_xvdiv_b (__m256i, __m256i); -__m256i __lasx_xvdiv_bu (__m256i, __m256i); -__m256i __lasx_xvdiv_d (__m256i, __m256i); -__m256i __lasx_xvdiv_du (__m256i, __m256i); -__m256i __lasx_xvdiv_h (__m256i, __m256i); -__m256i __lasx_xvdiv_hu (__m256i, __m256i); -__m256i __lasx_xvdiv_w (__m256i, __m256i); -__m256i __lasx_xvdiv_wu (__m256i, __m256i); -__m256i __lasx_xvexth_du_wu (__m256i); -__m256i __lasx_xvexth_d_w (__m256i); -__m256i __lasx_xvexth_h_b (__m256i); -__m256i __lasx_xvexth_hu_bu (__m256i); -__m256i __lasx_xvexth_q_d (__m256i); -__m256i __lasx_xvexth_qu_du (__m256i); -__m256i 
__lasx_xvexth_w_h (__m256i); -__m256i __lasx_xvexth_wu_hu (__m256i); -__m256i __lasx_xvextl_q_d (__m256i); -__m256i __lasx_xvextl_qu_du (__m256i); -__m256i __lasx_xvextrins_b (__m256i, __m256i, imm0_255); -__m256i __lasx_xvextrins_d (__m256i, __m256i, imm0_255); -__m256i __lasx_xvextrins_h (__m256i, __m256i, imm0_255); -__m256i __lasx_xvextrins_w (__m256i, __m256i, imm0_255); -__m256d __lasx_xvfadd_d (__m256d, __m256d); -__m256 __lasx_xvfadd_s (__m256, __m256); -__m256i __lasx_xvfclass_d (__m256d); -__m256i __lasx_xvfclass_s (__m256); -__m256i __lasx_xvfcmp_caf_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_caf_s (__m256, __m256); -__m256i __lasx_xvfcmp_ceq_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_ceq_s (__m256, __m256); -__m256i __lasx_xvfcmp_cle_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_cle_s (__m256, __m256); -__m256i __lasx_xvfcmp_clt_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_clt_s (__m256, __m256); -__m256i __lasx_xvfcmp_cne_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_cne_s (__m256, __m256); -__m256i __lasx_xvfcmp_cor_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_cor_s (__m256, __m256); -__m256i __lasx_xvfcmp_cueq_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_cueq_s (__m256, __m256); -__m256i __lasx_xvfcmp_cule_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_cule_s (__m256, __m256); -__m256i __lasx_xvfcmp_cult_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_cult_s (__m256, __m256); -__m256i __lasx_xvfcmp_cun_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_cune_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_cune_s (__m256, __m256); -__m256i __lasx_xvfcmp_cun_s (__m256, __m256); -__m256i __lasx_xvfcmp_saf_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_saf_s (__m256, __m256); -__m256i __lasx_xvfcmp_seq_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_seq_s (__m256, __m256); -__m256i __lasx_xvfcmp_sle_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_sle_s (__m256, __m256); -__m256i __lasx_xvfcmp_slt_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_slt_s (__m256, __m256); -__m256i __lasx_xvfcmp_sne_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_sne_s (__m256, __m256); -__m256i __lasx_xvfcmp_sor_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_sor_s (__m256, __m256); -__m256i __lasx_xvfcmp_sueq_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_sueq_s (__m256, __m256); -__m256i __lasx_xvfcmp_sule_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_sule_s (__m256, __m256); -__m256i __lasx_xvfcmp_sult_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_sult_s (__m256, __m256); -__m256i __lasx_xvfcmp_sun_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_sune_d (__m256d, __m256d); -__m256i __lasx_xvfcmp_sune_s (__m256, __m256); -__m256i __lasx_xvfcmp_sun_s (__m256, __m256); -__m256d __lasx_xvfcvth_d_s (__m256); -__m256i __lasx_xvfcvt_h_s (__m256, __m256); -__m256 __lasx_xvfcvth_s_h (__m256i); -__m256d __lasx_xvfcvtl_d_s (__m256); -__m256 __lasx_xvfcvtl_s_h (__m256i); -__m256 __lasx_xvfcvt_s_d (__m256d, __m256d); -__m256d __lasx_xvfdiv_d (__m256d, __m256d); -__m256 __lasx_xvfdiv_s (__m256, __m256); -__m256d __lasx_xvffint_d_l (__m256i); -__m256d __lasx_xvffint_d_lu (__m256i); -__m256d __lasx_xvffinth_d_w (__m256i); -__m256d __lasx_xvffintl_d_w (__m256i); -__m256 __lasx_xvffint_s_l (__m256i, __m256i); -__m256 __lasx_xvffint_s_w (__m256i); -__m256 __lasx_xvffint_s_wu (__m256i); -__m256d __lasx_xvflogb_d (__m256d); -__m256 __lasx_xvflogb_s (__m256); -__m256d __lasx_xvfmadd_d (__m256d, __m256d, __m256d); -__m256 __lasx_xvfmadd_s (__m256, __m256, __m256); -__m256d __lasx_xvfmaxa_d (__m256d, __m256d); -__m256 __lasx_xvfmaxa_s (__m256, __m256); 
-__m256d __lasx_xvfmax_d (__m256d, __m256d); -__m256 __lasx_xvfmax_s (__m256, __m256); -__m256d __lasx_xvfmina_d (__m256d, __m256d); -__m256 __lasx_xvfmina_s (__m256, __m256); -__m256d __lasx_xvfmin_d (__m256d, __m256d); -__m256 __lasx_xvfmin_s (__m256, __m256); -__m256d __lasx_xvfmsub_d (__m256d, __m256d, __m256d); -__m256 __lasx_xvfmsub_s (__m256, __m256, __m256); -__m256d __lasx_xvfmul_d (__m256d, __m256d); -__m256 __lasx_xvfmul_s (__m256, __m256); -__m256d __lasx_xvfnmadd_d (__m256d, __m256d, __m256d); -__m256 __lasx_xvfnmadd_s (__m256, __m256, __m256); -__m256d __lasx_xvfnmsub_d (__m256d, __m256d, __m256d); -__m256 __lasx_xvfnmsub_s (__m256, __m256, __m256); -__m256d __lasx_xvfrecip_d (__m256d); -__m256 __lasx_xvfrecip_s (__m256); -__m256d __lasx_xvfrint_d (__m256d); -__m256d __lasx_xvfrintrm_d (__m256d); -__m256 __lasx_xvfrintrm_s (__m256); -__m256d __lasx_xvfrintrne_d (__m256d); -__m256 __lasx_xvfrintrne_s (__m256); -__m256d __lasx_xvfrintrp_d (__m256d); -__m256 __lasx_xvfrintrp_s (__m256); -__m256d __lasx_xvfrintrz_d (__m256d); -__m256 __lasx_xvfrintrz_s (__m256); -__m256 __lasx_xvfrint_s (__m256); -__m256d __lasx_xvfrsqrt_d (__m256d); -__m256 __lasx_xvfrsqrt_s (__m256); -__m256i __lasx_xvfrstp_b (__m256i, __m256i, __m256i); -__m256i __lasx_xvfrstp_h (__m256i, __m256i, __m256i); -__m256i __lasx_xvfrstpi_b (__m256i, __m256i, imm0_31); -__m256i __lasx_xvfrstpi_h (__m256i, __m256i, imm0_31); -__m256d __lasx_xvfsqrt_d (__m256d); -__m256 __lasx_xvfsqrt_s (__m256); -__m256d __lasx_xvfsub_d (__m256d, __m256d); -__m256 __lasx_xvfsub_s (__m256, __m256); -__m256i __lasx_xvftinth_l_s (__m256); -__m256i __lasx_xvftint_l_d (__m256d); -__m256i __lasx_xvftintl_l_s (__m256); -__m256i __lasx_xvftint_lu_d (__m256d); -__m256i __lasx_xvftintrmh_l_s (__m256); -__m256i __lasx_xvftintrm_l_d (__m256d); -__m256i __lasx_xvftintrml_l_s (__m256); -__m256i __lasx_xvftintrm_w_d (__m256d, __m256d); -__m256i __lasx_xvftintrm_w_s (__m256); -__m256i __lasx_xvftintrneh_l_s (__m256); -__m256i __lasx_xvftintrne_l_d (__m256d); -__m256i __lasx_xvftintrnel_l_s (__m256); -__m256i __lasx_xvftintrne_w_d (__m256d, __m256d); -__m256i __lasx_xvftintrne_w_s (__m256); -__m256i __lasx_xvftintrph_l_s (__m256); -__m256i __lasx_xvftintrp_l_d (__m256d); -__m256i __lasx_xvftintrpl_l_s (__m256); -__m256i __lasx_xvftintrp_w_d (__m256d, __m256d); -__m256i __lasx_xvftintrp_w_s (__m256); -__m256i __lasx_xvftintrzh_l_s (__m256); -__m256i __lasx_xvftintrz_l_d (__m256d); -__m256i __lasx_xvftintrzl_l_s (__m256); -__m256i __lasx_xvftintrz_lu_d (__m256d); -__m256i __lasx_xvftintrz_w_d (__m256d, __m256d); -__m256i __lasx_xvftintrz_w_s (__m256); -__m256i __lasx_xvftintrz_wu_s (__m256); -__m256i __lasx_xvftint_w_d (__m256d, __m256d); -__m256i __lasx_xvftint_w_s (__m256); -__m256i __lasx_xvftint_wu_s (__m256); -__m256i __lasx_xvhaddw_du_wu (__m256i, __m256i); -__m256i __lasx_xvhaddw_d_w (__m256i, __m256i); -__m256i __lasx_xvhaddw_h_b (__m256i, __m256i); -__m256i __lasx_xvhaddw_hu_bu (__m256i, __m256i); -__m256i __lasx_xvhaddw_q_d (__m256i, __m256i); -__m256i __lasx_xvhaddw_qu_du (__m256i, __m256i); -__m256i __lasx_xvhaddw_w_h (__m256i, __m256i); -__m256i __lasx_xvhaddw_wu_hu (__m256i, __m256i); -__m256i __lasx_xvhsubw_du_wu (__m256i, __m256i); -__m256i __lasx_xvhsubw_d_w (__m256i, __m256i); -__m256i __lasx_xvhsubw_h_b (__m256i, __m256i); -__m256i __lasx_xvhsubw_hu_bu (__m256i, __m256i); -__m256i __lasx_xvhsubw_q_d (__m256i, __m256i); -__m256i __lasx_xvhsubw_qu_du (__m256i, __m256i); -__m256i __lasx_xvhsubw_w_h (__m256i, __m256i); -__m256i 
__lasx_xvhsubw_wu_hu (__m256i, __m256i); -__m256i __lasx_xvilvh_b (__m256i, __m256i); -__m256i __lasx_xvilvh_d (__m256i, __m256i); -__m256i __lasx_xvilvh_h (__m256i, __m256i); -__m256i __lasx_xvilvh_w (__m256i, __m256i); -__m256i __lasx_xvilvl_b (__m256i, __m256i); -__m256i __lasx_xvilvl_d (__m256i, __m256i); -__m256i __lasx_xvilvl_h (__m256i, __m256i); -__m256i __lasx_xvilvl_w (__m256i, __m256i); -__m256i __lasx_xvinsgr2vr_d (__m256i, long int, imm0_3); -__m256i __lasx_xvinsgr2vr_w (__m256i, int, imm0_7); -__m256i __lasx_xvinsve0_d (__m256i, __m256i, imm0_3); -__m256i __lasx_xvinsve0_w (__m256i, __m256i, imm0_7); -__m256i __lasx_xvld (void *, imm_n2048_2047); -__m256i __lasx_xvldi (imm_n1024_1023); -__m256i __lasx_xvldrepl_b (void *, imm_n2048_2047); -__m256i __lasx_xvldrepl_d (void *, imm_n256_255); -__m256i __lasx_xvldrepl_h (void *, imm_n1024_1023); -__m256i __lasx_xvldrepl_w (void *, imm_n512_511); -__m256i __lasx_xvldx (void *, long int); -__m256i __lasx_xvmadd_b (__m256i, __m256i, __m256i); -__m256i __lasx_xvmadd_d (__m256i, __m256i, __m256i); -__m256i __lasx_xvmadd_h (__m256i, __m256i, __m256i); -__m256i __lasx_xvmadd_w (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_d_w (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_d_wu (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_d_wu_w (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_h_b (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_h_bu (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_h_bu_b (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_q_d (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_q_du (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_q_du_d (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_w_h (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_w_hu (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwev_w_hu_h (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_d_w (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_d_wu (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_d_wu_w (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_h_b (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_h_bu (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_h_bu_b (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_q_d (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_q_du (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_q_du_d (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_w_h (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_w_hu (__m256i, __m256i, __m256i); -__m256i __lasx_xvmaddwod_w_hu_h (__m256i, __m256i, __m256i); -__m256i __lasx_xvmax_b (__m256i, __m256i); -__m256i __lasx_xvmax_bu (__m256i, __m256i); -__m256i __lasx_xvmax_d (__m256i, __m256i); -__m256i __lasx_xvmax_du (__m256i, __m256i); -__m256i __lasx_xvmax_h (__m256i, __m256i); -__m256i __lasx_xvmax_hu (__m256i, __m256i); -__m256i __lasx_xvmaxi_b (__m256i, imm_n16_15); -__m256i __lasx_xvmaxi_bu (__m256i, imm0_31); -__m256i __lasx_xvmaxi_d (__m256i, imm_n16_15); -__m256i __lasx_xvmaxi_du (__m256i, imm0_31); -__m256i __lasx_xvmaxi_h (__m256i, imm_n16_15); -__m256i __lasx_xvmaxi_hu (__m256i, imm0_31); -__m256i __lasx_xvmaxi_w (__m256i, imm_n16_15); -__m256i __lasx_xvmaxi_wu (__m256i, imm0_31); -__m256i __lasx_xvmax_w (__m256i, __m256i); -__m256i __lasx_xvmax_wu (__m256i, __m256i); -__m256i __lasx_xvmin_b (__m256i, __m256i); -__m256i __lasx_xvmin_bu (__m256i, __m256i); -__m256i __lasx_xvmin_d (__m256i, __m256i); -__m256i __lasx_xvmin_du (__m256i, __m256i); -__m256i 
__lasx_xvmin_h (__m256i, __m256i); -__m256i __lasx_xvmin_hu (__m256i, __m256i); -__m256i __lasx_xvmini_b (__m256i, imm_n16_15); -__m256i __lasx_xvmini_bu (__m256i, imm0_31); -__m256i __lasx_xvmini_d (__m256i, imm_n16_15); -__m256i __lasx_xvmini_du (__m256i, imm0_31); -__m256i __lasx_xvmini_h (__m256i, imm_n16_15); -__m256i __lasx_xvmini_hu (__m256i, imm0_31); -__m256i __lasx_xvmini_w (__m256i, imm_n16_15); -__m256i __lasx_xvmini_wu (__m256i, imm0_31); -__m256i __lasx_xvmin_w (__m256i, __m256i); -__m256i __lasx_xvmin_wu (__m256i, __m256i); -__m256i __lasx_xvmod_b (__m256i, __m256i); -__m256i __lasx_xvmod_bu (__m256i, __m256i); -__m256i __lasx_xvmod_d (__m256i, __m256i); -__m256i __lasx_xvmod_du (__m256i, __m256i); -__m256i __lasx_xvmod_h (__m256i, __m256i); -__m256i __lasx_xvmod_hu (__m256i, __m256i); -__m256i __lasx_xvmod_w (__m256i, __m256i); -__m256i __lasx_xvmod_wu (__m256i, __m256i); -__m256i __lasx_xvmskgez_b (__m256i); -__m256i __lasx_xvmskltz_b (__m256i); -__m256i __lasx_xvmskltz_d (__m256i); -__m256i __lasx_xvmskltz_h (__m256i); -__m256i __lasx_xvmskltz_w (__m256i); -__m256i __lasx_xvmsknz_b (__m256i); -__m256i __lasx_xvmsub_b (__m256i, __m256i, __m256i); -__m256i __lasx_xvmsub_d (__m256i, __m256i, __m256i); -__m256i __lasx_xvmsub_h (__m256i, __m256i, __m256i); -__m256i __lasx_xvmsub_w (__m256i, __m256i, __m256i); -__m256i __lasx_xvmuh_b (__m256i, __m256i); -__m256i __lasx_xvmuh_bu (__m256i, __m256i); -__m256i __lasx_xvmuh_d (__m256i, __m256i); -__m256i __lasx_xvmuh_du (__m256i, __m256i); -__m256i __lasx_xvmuh_h (__m256i, __m256i); -__m256i __lasx_xvmuh_hu (__m256i, __m256i); -__m256i __lasx_xvmuh_w (__m256i, __m256i); -__m256i __lasx_xvmuh_wu (__m256i, __m256i); -__m256i __lasx_xvmul_b (__m256i, __m256i); -__m256i __lasx_xvmul_d (__m256i, __m256i); -__m256i __lasx_xvmul_h (__m256i, __m256i); -__m256i __lasx_xvmul_w (__m256i, __m256i); -__m256i __lasx_xvmulwev_d_w (__m256i, __m256i); -__m256i __lasx_xvmulwev_d_wu (__m256i, __m256i); -__m256i __lasx_xvmulwev_d_wu_w (__m256i, __m256i); -__m256i __lasx_xvmulwev_h_b (__m256i, __m256i); -__m256i __lasx_xvmulwev_h_bu (__m256i, __m256i); -__m256i __lasx_xvmulwev_h_bu_b (__m256i, __m256i); -__m256i __lasx_xvmulwev_q_d (__m256i, __m256i); -__m256i __lasx_xvmulwev_q_du (__m256i, __m256i); -__m256i __lasx_xvmulwev_q_du_d (__m256i, __m256i); -__m256i __lasx_xvmulwev_w_h (__m256i, __m256i); -__m256i __lasx_xvmulwev_w_hu (__m256i, __m256i); -__m256i __lasx_xvmulwev_w_hu_h (__m256i, __m256i); -__m256i __lasx_xvmulwod_d_w (__m256i, __m256i); -__m256i __lasx_xvmulwod_d_wu (__m256i, __m256i); -__m256i __lasx_xvmulwod_d_wu_w (__m256i, __m256i); -__m256i __lasx_xvmulwod_h_b (__m256i, __m256i); -__m256i __lasx_xvmulwod_h_bu (__m256i, __m256i); -__m256i __lasx_xvmulwod_h_bu_b (__m256i, __m256i); -__m256i __lasx_xvmulwod_q_d (__m256i, __m256i); -__m256i __lasx_xvmulwod_q_du (__m256i, __m256i); -__m256i __lasx_xvmulwod_q_du_d (__m256i, __m256i); -__m256i __lasx_xvmulwod_w_h (__m256i, __m256i); -__m256i __lasx_xvmulwod_w_hu (__m256i, __m256i); -__m256i __lasx_xvmulwod_w_hu_h (__m256i, __m256i); -__m256i __lasx_xvneg_b (__m256i); -__m256i __lasx_xvneg_d (__m256i); -__m256i __lasx_xvneg_h (__m256i); -__m256i __lasx_xvneg_w (__m256i); -__m256i __lasx_xvnori_b (__m256i, imm0_255); -__m256i __lasx_xvnor_v (__m256i, __m256i); -__m256i __lasx_xvori_b (__m256i, imm0_255); -__m256i __lasx_xvorn_v (__m256i, __m256i); -__m256i __lasx_xvor_v (__m256i, __m256i); -__m256i __lasx_xvpackev_b (__m256i, __m256i); -__m256i __lasx_xvpackev_d (__m256i, __m256i); -__m256i 
__lasx_xvpackev_h (__m256i, __m256i); -__m256i __lasx_xvpackev_w (__m256i, __m256i); -__m256i __lasx_xvpackod_b (__m256i, __m256i); -__m256i __lasx_xvpackod_d (__m256i, __m256i); -__m256i __lasx_xvpackod_h (__m256i, __m256i); -__m256i __lasx_xvpackod_w (__m256i, __m256i); -__m256i __lasx_xvpcnt_b (__m256i); -__m256i __lasx_xvpcnt_d (__m256i); -__m256i __lasx_xvpcnt_h (__m256i); -__m256i __lasx_xvpcnt_w (__m256i); -__m256i __lasx_xvpermi_d (__m256i, imm0_255); -__m256i __lasx_xvpermi_q (__m256i, __m256i, imm0_255); -__m256i __lasx_xvpermi_w (__m256i, __m256i, imm0_255); -__m256i __lasx_xvperm_w (__m256i, __m256i); -__m256i __lasx_xvpickev_b (__m256i, __m256i); -__m256i __lasx_xvpickev_d (__m256i, __m256i); -__m256i __lasx_xvpickev_h (__m256i, __m256i); -__m256i __lasx_xvpickev_w (__m256i, __m256i); -__m256i __lasx_xvpickod_b (__m256i, __m256i); -__m256i __lasx_xvpickod_d (__m256i, __m256i); -__m256i __lasx_xvpickod_h (__m256i, __m256i); -__m256i __lasx_xvpickod_w (__m256i, __m256i); -long int __lasx_xvpickve2gr_d (__m256i, imm0_3); -unsigned long int __lasx_xvpickve2gr_du (__m256i, imm0_3); -int __lasx_xvpickve2gr_w (__m256i, imm0_7); -unsigned int __lasx_xvpickve2gr_wu (__m256i, imm0_7); -__m256i __lasx_xvpickve_d (__m256i, imm0_3); -__m256d __lasx_xvpickve_d_f (__m256d, imm0_3); -__m256i __lasx_xvpickve_w (__m256i, imm0_7); -__m256 __lasx_xvpickve_w_f (__m256, imm0_7); -__m256i __lasx_xvrepl128vei_b (__m256i, imm0_15); -__m256i __lasx_xvrepl128vei_d (__m256i, imm0_1); -__m256i __lasx_xvrepl128vei_h (__m256i, imm0_7); -__m256i __lasx_xvrepl128vei_w (__m256i, imm0_3); -__m256i __lasx_xvreplgr2vr_b (int); -__m256i __lasx_xvreplgr2vr_d (long int); -__m256i __lasx_xvreplgr2vr_h (int); -__m256i __lasx_xvreplgr2vr_w (int); -__m256i __lasx_xvrepli_b (imm_n512_511); -__m256i __lasx_xvrepli_d (imm_n512_511); -__m256i __lasx_xvrepli_h (imm_n512_511); -__m256i __lasx_xvrepli_w (imm_n512_511); -__m256i __lasx_xvreplve0_b (__m256i); -__m256i __lasx_xvreplve0_d (__m256i); -__m256i __lasx_xvreplve0_h (__m256i); -__m256i __lasx_xvreplve0_q (__m256i); -__m256i __lasx_xvreplve0_w (__m256i); -__m256i __lasx_xvreplve_b (__m256i, int); -__m256i __lasx_xvreplve_d (__m256i, int); -__m256i __lasx_xvreplve_h (__m256i, int); -__m256i __lasx_xvreplve_w (__m256i, int); -__m256i __lasx_xvrotr_b (__m256i, __m256i); -__m256i __lasx_xvrotr_d (__m256i, __m256i); -__m256i __lasx_xvrotr_h (__m256i, __m256i); -__m256i __lasx_xvrotri_b (__m256i, imm0_7); -__m256i __lasx_xvrotri_d (__m256i, imm0_63); -__m256i __lasx_xvrotri_h (__m256i, imm0_15); -__m256i __lasx_xvrotri_w (__m256i, imm0_31); -__m256i __lasx_xvrotr_w (__m256i, __m256i); -__m256i __lasx_xvsadd_b (__m256i, __m256i); -__m256i __lasx_xvsadd_bu (__m256i, __m256i); -__m256i __lasx_xvsadd_d (__m256i, __m256i); -__m256i __lasx_xvsadd_du (__m256i, __m256i); -__m256i __lasx_xvsadd_h (__m256i, __m256i); -__m256i __lasx_xvsadd_hu (__m256i, __m256i); -__m256i __lasx_xvsadd_w (__m256i, __m256i); -__m256i __lasx_xvsadd_wu (__m256i, __m256i); -__m256i __lasx_xvsat_b (__m256i, imm0_7); -__m256i __lasx_xvsat_bu (__m256i, imm0_7); -__m256i __lasx_xvsat_d (__m256i, imm0_63); -__m256i __lasx_xvsat_du (__m256i, imm0_63); -__m256i __lasx_xvsat_h (__m256i, imm0_15); -__m256i __lasx_xvsat_hu (__m256i, imm0_15); -__m256i __lasx_xvsat_w (__m256i, imm0_31); -__m256i __lasx_xvsat_wu (__m256i, imm0_31); -__m256i __lasx_xvseq_b (__m256i, __m256i); -__m256i __lasx_xvseq_d (__m256i, __m256i); -__m256i __lasx_xvseq_h (__m256i, __m256i); -__m256i __lasx_xvseqi_b (__m256i, imm_n16_15); -__m256i 
__lasx_xvseqi_d (__m256i, imm_n16_15); -__m256i __lasx_xvseqi_h (__m256i, imm_n16_15); -__m256i __lasx_xvseqi_w (__m256i, imm_n16_15); -__m256i __lasx_xvseq_w (__m256i, __m256i); -__m256i __lasx_xvshuf4i_b (__m256i, imm0_255); -__m256i __lasx_xvshuf4i_d (__m256i, __m256i, imm0_255); -__m256i __lasx_xvshuf4i_h (__m256i, imm0_255); -__m256i __lasx_xvshuf4i_w (__m256i, imm0_255); -__m256i __lasx_xvshuf_b (__m256i, __m256i, __m256i); -__m256i __lasx_xvshuf_d (__m256i, __m256i, __m256i); -__m256i __lasx_xvshuf_h (__m256i, __m256i, __m256i); -__m256i __lasx_xvshuf_w (__m256i, __m256i, __m256i); -__m256i __lasx_xvsigncov_b (__m256i, __m256i); -__m256i __lasx_xvsigncov_d (__m256i, __m256i); -__m256i __lasx_xvsigncov_h (__m256i, __m256i); -__m256i __lasx_xvsigncov_w (__m256i, __m256i); -__m256i __lasx_xvsle_b (__m256i, __m256i); -__m256i __lasx_xvsle_bu (__m256i, __m256i); -__m256i __lasx_xvsle_d (__m256i, __m256i); -__m256i __lasx_xvsle_du (__m256i, __m256i); -__m256i __lasx_xvsle_h (__m256i, __m256i); -__m256i __lasx_xvsle_hu (__m256i, __m256i); -__m256i __lasx_xvslei_b (__m256i, imm_n16_15); -__m256i __lasx_xvslei_bu (__m256i, imm0_31); -__m256i __lasx_xvslei_d (__m256i, imm_n16_15); -__m256i __lasx_xvslei_du (__m256i, imm0_31); -__m256i __lasx_xvslei_h (__m256i, imm_n16_15); -__m256i __lasx_xvslei_hu (__m256i, imm0_31); -__m256i __lasx_xvslei_w (__m256i, imm_n16_15); -__m256i __lasx_xvslei_wu (__m256i, imm0_31); -__m256i __lasx_xvsle_w (__m256i, __m256i); -__m256i __lasx_xvsle_wu (__m256i, __m256i); -__m256i __lasx_xvsll_b (__m256i, __m256i); -__m256i __lasx_xvsll_d (__m256i, __m256i); -__m256i __lasx_xvsll_h (__m256i, __m256i); -__m256i __lasx_xvslli_b (__m256i, imm0_7); -__m256i __lasx_xvslli_d (__m256i, imm0_63); -__m256i __lasx_xvslli_h (__m256i, imm0_15); -__m256i __lasx_xvslli_w (__m256i, imm0_31); -__m256i __lasx_xvsll_w (__m256i, __m256i); -__m256i __lasx_xvsllwil_du_wu (__m256i, imm0_31); -__m256i __lasx_xvsllwil_d_w (__m256i, imm0_31); -__m256i __lasx_xvsllwil_h_b (__m256i, imm0_7); -__m256i __lasx_xvsllwil_hu_bu (__m256i, imm0_7); -__m256i __lasx_xvsllwil_w_h (__m256i, imm0_15); -__m256i __lasx_xvsllwil_wu_hu (__m256i, imm0_15); -__m256i __lasx_xvslt_b (__m256i, __m256i); -__m256i __lasx_xvslt_bu (__m256i, __m256i); -__m256i __lasx_xvslt_d (__m256i, __m256i); -__m256i __lasx_xvslt_du (__m256i, __m256i); -__m256i __lasx_xvslt_h (__m256i, __m256i); -__m256i __lasx_xvslt_hu (__m256i, __m256i); -__m256i __lasx_xvslti_b (__m256i, imm_n16_15); -__m256i __lasx_xvslti_bu (__m256i, imm0_31); -__m256i __lasx_xvslti_d (__m256i, imm_n16_15); -__m256i __lasx_xvslti_du (__m256i, imm0_31); -__m256i __lasx_xvslti_h (__m256i, imm_n16_15); -__m256i __lasx_xvslti_hu (__m256i, imm0_31); -__m256i __lasx_xvslti_w (__m256i, imm_n16_15); -__m256i __lasx_xvslti_wu (__m256i, imm0_31); -__m256i __lasx_xvslt_w (__m256i, __m256i); -__m256i __lasx_xvslt_wu (__m256i, __m256i); -__m256i __lasx_xvsra_b (__m256i, __m256i); -__m256i __lasx_xvsra_d (__m256i, __m256i); -__m256i __lasx_xvsra_h (__m256i, __m256i); -__m256i __lasx_xvsrai_b (__m256i, imm0_7); -__m256i __lasx_xvsrai_d (__m256i, imm0_63); -__m256i __lasx_xvsrai_h (__m256i, imm0_15); -__m256i __lasx_xvsrai_w (__m256i, imm0_31); -__m256i __lasx_xvsran_b_h (__m256i, __m256i); -__m256i __lasx_xvsran_h_w (__m256i, __m256i); -__m256i __lasx_xvsrani_b_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvsrani_d_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvsrani_h_w (__m256i, __m256i, imm0_31); -__m256i __lasx_xvsrani_w_d (__m256i, __m256i, imm0_63); -__m256i 
__lasx_xvsran_w_d (__m256i, __m256i); -__m256i __lasx_xvsrar_b (__m256i, __m256i); -__m256i __lasx_xvsrar_d (__m256i, __m256i); -__m256i __lasx_xvsrar_h (__m256i, __m256i); -__m256i __lasx_xvsrari_b (__m256i, imm0_7); -__m256i __lasx_xvsrari_d (__m256i, imm0_63); -__m256i __lasx_xvsrari_h (__m256i, imm0_15); -__m256i __lasx_xvsrari_w (__m256i, imm0_31); -__m256i __lasx_xvsrarn_b_h (__m256i, __m256i); -__m256i __lasx_xvsrarn_h_w (__m256i, __m256i); -__m256i __lasx_xvsrarni_b_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvsrarni_d_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvsrarni_h_w (__m256i, __m256i, imm0_31); -__m256i __lasx_xvsrarni_w_d (__m256i, __m256i, imm0_63); -__m256i __lasx_xvsrarn_w_d (__m256i, __m256i); -__m256i __lasx_xvsrar_w (__m256i, __m256i); -__m256i __lasx_xvsra_w (__m256i, __m256i); -__m256i __lasx_xvsrl_b (__m256i, __m256i); -__m256i __lasx_xvsrl_d (__m256i, __m256i); -__m256i __lasx_xvsrl_h (__m256i, __m256i); -__m256i __lasx_xvsrli_b (__m256i, imm0_7); -__m256i __lasx_xvsrli_d (__m256i, imm0_63); -__m256i __lasx_xvsrli_h (__m256i, imm0_15); -__m256i __lasx_xvsrli_w (__m256i, imm0_31); -__m256i __lasx_xvsrln_b_h (__m256i, __m256i); -__m256i __lasx_xvsrln_h_w (__m256i, __m256i); -__m256i __lasx_xvsrlni_b_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvsrlni_d_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvsrlni_h_w (__m256i, __m256i, imm0_31); -__m256i __lasx_xvsrlni_w_d (__m256i, __m256i, imm0_63); -__m256i __lasx_xvsrln_w_d (__m256i, __m256i); -__m256i __lasx_xvsrlr_b (__m256i, __m256i); -__m256i __lasx_xvsrlr_d (__m256i, __m256i); -__m256i __lasx_xvsrlr_h (__m256i, __m256i); -__m256i __lasx_xvsrlri_b (__m256i, imm0_7); -__m256i __lasx_xvsrlri_d (__m256i, imm0_63); -__m256i __lasx_xvsrlri_h (__m256i, imm0_15); -__m256i __lasx_xvsrlri_w (__m256i, imm0_31); -__m256i __lasx_xvsrlrn_b_h (__m256i, __m256i); -__m256i __lasx_xvsrlrn_h_w (__m256i, __m256i); -__m256i __lasx_xvsrlrni_b_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvsrlrni_d_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvsrlrni_h_w (__m256i, __m256i, imm0_31); -__m256i __lasx_xvsrlrni_w_d (__m256i, __m256i, imm0_63); -__m256i __lasx_xvsrlrn_w_d (__m256i, __m256i); -__m256i __lasx_xvsrlr_w (__m256i, __m256i); -__m256i __lasx_xvsrl_w (__m256i, __m256i); -__m256i __lasx_xvssran_b_h (__m256i, __m256i); -__m256i __lasx_xvssran_bu_h (__m256i, __m256i); -__m256i __lasx_xvssran_hu_w (__m256i, __m256i); -__m256i __lasx_xvssran_h_w (__m256i, __m256i); -__m256i __lasx_xvssrani_b_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvssrani_bu_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvssrani_d_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvssrani_du_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvssrani_hu_w (__m256i, __m256i, imm0_31); -__m256i __lasx_xvssrani_h_w (__m256i, __m256i, imm0_31); -__m256i __lasx_xvssrani_w_d (__m256i, __m256i, imm0_63); -__m256i __lasx_xvssrani_wu_d (__m256i, __m256i, imm0_63); -__m256i __lasx_xvssran_w_d (__m256i, __m256i); -__m256i __lasx_xvssran_wu_d (__m256i, __m256i); -__m256i __lasx_xvssrarn_b_h (__m256i, __m256i); -__m256i __lasx_xvssrarn_bu_h (__m256i, __m256i); -__m256i __lasx_xvssrarn_hu_w (__m256i, __m256i); -__m256i __lasx_xvssrarn_h_w (__m256i, __m256i); -__m256i __lasx_xvssrarni_b_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvssrarni_bu_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvssrarni_d_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvssrarni_du_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvssrarni_hu_w (__m256i, __m256i, imm0_31); -__m256i 
__lasx_xvssrarni_h_w (__m256i, __m256i, imm0_31); -__m256i __lasx_xvssrarni_w_d (__m256i, __m256i, imm0_63); -__m256i __lasx_xvssrarni_wu_d (__m256i, __m256i, imm0_63); -__m256i __lasx_xvssrarn_w_d (__m256i, __m256i); -__m256i __lasx_xvssrarn_wu_d (__m256i, __m256i); -__m256i __lasx_xvssrln_b_h (__m256i, __m256i); -__m256i __lasx_xvssrln_bu_h (__m256i, __m256i); -__m256i __lasx_xvssrln_hu_w (__m256i, __m256i); -__m256i __lasx_xvssrln_h_w (__m256i, __m256i); -__m256i __lasx_xvssrlni_b_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvssrlni_bu_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvssrlni_d_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvssrlni_du_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvssrlni_hu_w (__m256i, __m256i, imm0_31); -__m256i __lasx_xvssrlni_h_w (__m256i, __m256i, imm0_31); -__m256i __lasx_xvssrlni_w_d (__m256i, __m256i, imm0_63); -__m256i __lasx_xvssrlni_wu_d (__m256i, __m256i, imm0_63); -__m256i __lasx_xvssrln_w_d (__m256i, __m256i); -__m256i __lasx_xvssrln_wu_d (__m256i, __m256i); -__m256i __lasx_xvssrlrn_b_h (__m256i, __m256i); -__m256i __lasx_xvssrlrn_bu_h (__m256i, __m256i); -__m256i __lasx_xvssrlrn_hu_w (__m256i, __m256i); -__m256i __lasx_xvssrlrn_h_w (__m256i, __m256i); -__m256i __lasx_xvssrlrni_b_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvssrlrni_bu_h (__m256i, __m256i, imm0_15); -__m256i __lasx_xvssrlrni_d_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvssrlrni_du_q (__m256i, __m256i, imm0_127); -__m256i __lasx_xvssrlrni_hu_w (__m256i, __m256i, imm0_31); -__m256i __lasx_xvssrlrni_h_w (__m256i, __m256i, imm0_31); -__m256i __lasx_xvssrlrni_w_d (__m256i, __m256i, imm0_63); -__m256i __lasx_xvssrlrni_wu_d (__m256i, __m256i, imm0_63); -__m256i __lasx_xvssrlrn_w_d (__m256i, __m256i); -__m256i __lasx_xvssrlrn_wu_d (__m256i, __m256i); -__m256i __lasx_xvssub_b (__m256i, __m256i); -__m256i __lasx_xvssub_bu (__m256i, __m256i); -__m256i __lasx_xvssub_d (__m256i, __m256i); -__m256i __lasx_xvssub_du (__m256i, __m256i); -__m256i __lasx_xvssub_h (__m256i, __m256i); -__m256i __lasx_xvssub_hu (__m256i, __m256i); -__m256i __lasx_xvssub_w (__m256i, __m256i); -__m256i __lasx_xvssub_wu (__m256i, __m256i); -void __lasx_xvst (__m256i, void *, imm_n2048_2047); -void __lasx_xvstelm_b (__m256i, void *, imm_n128_127, imm0_31); -void __lasx_xvstelm_d (__m256i, void *, imm_n128_127, imm0_3); -void __lasx_xvstelm_h (__m256i, void *, imm_n128_127, imm0_15); -void __lasx_xvstelm_w (__m256i, void *, imm_n128_127, imm0_7); -void __lasx_xvstx (__m256i, void *, long int); -__m256i __lasx_xvsub_b (__m256i, __m256i); -__m256i __lasx_xvsub_d (__m256i, __m256i); -__m256i __lasx_xvsub_h (__m256i, __m256i); -__m256i __lasx_xvsubi_bu (__m256i, imm0_31); -__m256i __lasx_xvsubi_du (__m256i, imm0_31); -__m256i __lasx_xvsubi_hu (__m256i, imm0_31); -__m256i __lasx_xvsubi_wu (__m256i, imm0_31); -__m256i __lasx_xvsub_q (__m256i, __m256i); -__m256i __lasx_xvsub_w (__m256i, __m256i); -__m256i __lasx_xvsubwev_d_w (__m256i, __m256i); -__m256i __lasx_xvsubwev_d_wu (__m256i, __m256i); -__m256i __lasx_xvsubwev_h_b (__m256i, __m256i); -__m256i __lasx_xvsubwev_h_bu (__m256i, __m256i); -__m256i __lasx_xvsubwev_q_d (__m256i, __m256i); -__m256i __lasx_xvsubwev_q_du (__m256i, __m256i); -__m256i __lasx_xvsubwev_w_h (__m256i, __m256i); -__m256i __lasx_xvsubwev_w_hu (__m256i, __m256i); -__m256i __lasx_xvsubwod_d_w (__m256i, __m256i); -__m256i __lasx_xvsubwod_d_wu (__m256i, __m256i); -__m256i __lasx_xvsubwod_h_b (__m256i, __m256i); -__m256i __lasx_xvsubwod_h_bu (__m256i, __m256i); -__m256i __lasx_xvsubwod_q_d 
(__m256i, __m256i); -__m256i __lasx_xvsubwod_q_du (__m256i, __m256i); -__m256i __lasx_xvsubwod_w_h (__m256i, __m256i); -__m256i __lasx_xvsubwod_w_hu (__m256i, __m256i); -__m256i __lasx_xvxori_b (__m256i, imm0_255); -__m256i __lasx_xvxor_v (__m256i, __m256i); +@node Other Builtins +@section Other Built-in Functions Provided by GCC +@cindex built-in functions +@findex __builtin_iseqsig +@findex __builtin_isfinite +@findex __builtin_isnormal +@findex __builtin_isgreater +@findex __builtin_isgreaterequal +@findex __builtin_isunordered +@findex __builtin_speculation_safe_value +@findex _Exit +@findex _exit +@findex abort +@findex abs +@findex acos +@findex acosf +@findex acosh +@findex acoshf +@findex acoshl +@findex acosl +@findex alloca +@findex asin +@findex asinf +@findex asinh +@findex asinhf +@findex asinhl +@findex asinl +@findex atan +@findex atan2 +@findex atan2f +@findex atan2l +@findex atanf +@findex atanh +@findex atanhf +@findex atanhl +@findex atanl +@findex bcmp +@findex bzero +@findex cabs +@findex cabsf +@findex cabsl +@findex cacos +@findex cacosf +@findex cacosh +@findex cacoshf +@findex cacoshl +@findex cacosl +@findex calloc +@findex carg +@findex cargf +@findex cargl +@findex casin +@findex casinf +@findex casinh +@findex casinhf +@findex casinhl +@findex casinl +@findex catan +@findex catanf +@findex catanh +@findex catanhf +@findex catanhl +@findex catanl +@findex cbrt +@findex cbrtf +@findex cbrtl +@findex ccos +@findex ccosf +@findex ccosh +@findex ccoshf +@findex ccoshl +@findex ccosl +@findex ceil +@findex ceilf +@findex ceill +@findex cexp +@findex cexpf +@findex cexpl +@findex cimag +@findex cimagf +@findex cimagl +@findex clog +@findex clogf +@findex clogl +@findex clog10 +@findex clog10f +@findex clog10l +@findex conj +@findex conjf +@findex conjl +@findex copysign +@findex copysignf +@findex copysignl +@findex cos +@findex cosf +@findex cosh +@findex coshf +@findex coshl +@findex cosl +@findex cpow +@findex cpowf +@findex cpowl +@findex cproj +@findex cprojf +@findex cprojl +@findex creal +@findex crealf +@findex creall +@findex csin +@findex csinf +@findex csinh +@findex csinhf +@findex csinhl +@findex csinl +@findex csqrt +@findex csqrtf +@findex csqrtl +@findex ctan +@findex ctanf +@findex ctanh +@findex ctanhf +@findex ctanhl +@findex ctanl +@findex dcgettext +@findex dgettext +@findex drem +@findex dremf +@findex dreml +@findex erf +@findex erfc +@findex erfcf +@findex erfcl +@findex erff +@findex erfl +@findex exit +@findex exp +@findex exp10 +@findex exp10f +@findex exp10l +@findex exp2 +@findex exp2f +@findex exp2l +@findex expf +@findex expl +@findex expm1 +@findex expm1f +@findex expm1l +@findex fabs +@findex fabsf +@findex fabsl +@findex fdim +@findex fdimf +@findex fdiml +@findex ffs +@findex floor +@findex floorf +@findex floorl +@findex fma +@findex fmaf +@findex fmal +@findex fmax +@findex fmaxf +@findex fmaxl +@findex fmin +@findex fminf +@findex fminl +@findex fmod +@findex fmodf +@findex fmodl +@findex fprintf +@findex fprintf_unlocked +@findex fputs +@findex fputs_unlocked +@findex free +@findex frexp +@findex frexpf +@findex frexpl +@findex fscanf +@findex gamma +@findex gammaf +@findex gammal +@findex gamma_r +@findex gammaf_r +@findex gammal_r +@findex gettext +@findex hypot +@findex hypotf +@findex hypotl +@findex ilogb +@findex ilogbf +@findex ilogbl +@findex imaxabs +@findex index +@findex isalnum +@findex isalpha +@findex isascii +@findex isblank +@findex iscntrl +@findex isdigit +@findex isgraph +@findex islower +@findex isprint 
+@findex ispunct +@findex isspace +@findex isupper +@findex iswalnum +@findex iswalpha +@findex iswblank +@findex iswcntrl +@findex iswdigit +@findex iswgraph +@findex iswlower +@findex iswprint +@findex iswpunct +@findex iswspace +@findex iswupper +@findex iswxdigit +@findex isxdigit +@findex j0 +@findex j0f +@findex j0l +@findex j1 +@findex j1f +@findex j1l +@findex jn +@findex jnf +@findex jnl +@findex labs +@findex ldexp +@findex ldexpf +@findex ldexpl +@findex lgamma +@findex lgammaf +@findex lgammal +@findex lgamma_r +@findex lgammaf_r +@findex lgammal_r +@findex llabs +@findex llrint +@findex llrintf +@findex llrintl +@findex llround +@findex llroundf +@findex llroundl +@findex log +@findex log10 +@findex log10f +@findex log10l +@findex log1p +@findex log1pf +@findex log1pl +@findex log2 +@findex log2f +@findex log2l +@findex logb +@findex logbf +@findex logbl +@findex logf +@findex logl +@findex lrint +@findex lrintf +@findex lrintl +@findex lround +@findex lroundf +@findex lroundl +@findex malloc +@findex memchr +@findex memcmp +@findex memcpy +@findex mempcpy +@findex memset +@findex modf +@findex modff +@findex modfl +@findex nearbyint +@findex nearbyintf +@findex nearbyintl +@findex nextafter +@findex nextafterf +@findex nextafterl +@findex nexttoward +@findex nexttowardf +@findex nexttowardl +@findex pow +@findex pow10 +@findex pow10f +@findex pow10l +@findex powf +@findex powl +@findex printf +@findex printf_unlocked +@findex putchar +@findex puts +@findex realloc +@findex remainder +@findex remainderf +@findex remainderl +@findex remquo +@findex remquof +@findex remquol +@findex rindex +@findex rint +@findex rintf +@findex rintl +@findex round +@findex roundf +@findex roundl +@findex scalb +@findex scalbf +@findex scalbl +@findex scalbln +@findex scalblnf +@findex scalblnf +@findex scalbn +@findex scalbnf +@findex scanfnl +@findex signbit +@findex signbitf +@findex signbitl +@findex signbitd32 +@findex signbitd64 +@findex signbitd128 +@findex significand +@findex significandf +@findex significandl +@findex sin +@findex sincos +@findex sincosf +@findex sincosl +@findex sinf +@findex sinh +@findex sinhf +@findex sinhl +@findex sinl +@findex snprintf +@findex sprintf +@findex sqrt +@findex sqrtf +@findex sqrtl +@findex sscanf +@findex stpcpy +@findex stpncpy +@findex strcasecmp +@findex strcat +@findex strchr +@findex strcmp +@findex strcpy +@findex strcspn +@findex strdup +@findex strfmon +@findex strftime +@findex strlen +@findex strncasecmp +@findex strncat +@findex strncmp +@findex strncpy +@findex strndup +@findex strnlen +@findex strpbrk +@findex strrchr +@findex strspn +@findex strstr +@findex tan +@findex tanf +@findex tanh +@findex tanhf +@findex tanhl +@findex tanl +@findex tgamma +@findex tgammaf +@findex tgammal +@findex toascii +@findex tolower +@findex toupper +@findex towlower +@findex towupper +@findex trunc +@findex truncf +@findex truncl +@findex vfprintf +@findex vfscanf +@findex vprintf +@findex vscanf +@findex vsnprintf +@findex vsprintf +@findex vsscanf +@findex y0 +@findex y0f +@findex y0l +@findex y1 +@findex y1f +@findex y1l +@findex yn +@findex ynf +@findex ynl + +GCC provides a large number of built-in functions other than the ones +mentioned above. Some of these are for internal use in the processing +of exceptions or variable-length argument lists and are not +documented here because they may change from time to time; we do not +recommend general use of these functions. + +The remaining functions are provided for optimization purposes. 
+
+With the exception of built-ins that have library equivalents such as
+the standard C library functions discussed below, or that expand to
+library calls, GCC built-in functions are always expanded inline and
+thus do not have corresponding entry points and their address cannot
+be obtained.  Attempting to use them in an expression other than
+a function call results in a compile-time error.
+
+@opindex fno-builtin
+GCC includes built-in versions of many of the functions in the standard
+C library.  These functions come in two forms: one whose names start with
+the @code{__builtin_} prefix, and the other without.  Both forms have the
+same type (including prototype), the same address (when their address is
+taken), and the same meaning as the C library functions even if you specify
+the @option{-fno-builtin} option (@pxref{C Dialect Options}).  Many of these
+functions are only optimized in certain cases; if they are not optimized in
+a particular case, a call to the library function is emitted.
+
+@opindex ansi
+@opindex std
+Outside strict ISO C mode (@option{-ansi}, @option{-std=c90},
+@option{-std=c99} or @option{-std=c11}), the functions
+@code{_exit}, @code{alloca}, @code{bcmp}, @code{bzero},
+@code{dcgettext}, @code{dgettext}, @code{dremf}, @code{dreml},
+@code{drem}, @code{exp10f}, @code{exp10l}, @code{exp10}, @code{ffsll},
+@code{ffsl}, @code{ffs}, @code{fprintf_unlocked},
+@code{fputs_unlocked}, @code{gammaf}, @code{gammal}, @code{gamma},
+@code{gammaf_r}, @code{gammal_r}, @code{gamma_r}, @code{gettext},
+@code{index}, @code{isascii}, @code{j0f}, @code{j0l}, @code{j0},
+@code{j1f}, @code{j1l}, @code{j1}, @code{jnf}, @code{jnl}, @code{jn},
+@code{lgammaf_r}, @code{lgammal_r}, @code{lgamma_r}, @code{mempcpy},
+@code{pow10f}, @code{pow10l}, @code{pow10}, @code{printf_unlocked},
+@code{rindex}, @code{roundeven}, @code{roundevenf}, @code{roundevenl},
+@code{scalbf}, @code{scalbl}, @code{scalb},
+@code{signbit}, @code{signbitf}, @code{signbitl}, @code{signbitd32},
+@code{signbitd64}, @code{signbitd128}, @code{significandf},
+@code{significandl}, @code{significand}, @code{sincosf},
+@code{sincosl}, @code{sincos}, @code{stpcpy}, @code{stpncpy},
+@code{strcasecmp}, @code{strdup}, @code{strfmon}, @code{strncasecmp},
+@code{strndup}, @code{strnlen}, @code{toascii}, @code{y0f}, @code{y0l},
+@code{y0}, @code{y1f}, @code{y1l}, @code{y1}, @code{ynf}, @code{ynl} and
+@code{yn}
+may be handled as built-in functions.
+All these functions have corresponding versions
+prefixed with @code{__builtin_}, which may be used even in strict C90
+mode.
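+
+For example, in strict C90 mode @code{ffs} is not recognized as a
+built-in, but the prefixed form still is.  A minimal illustration (not
+one of the manual's own examples):
+
+@smallexample
+/* Compiled with -ansi: ffs () would be an ordinary external call,
+   but __builtin_ffs () is still recognized by GCC.  */
+int
+lowest_set_bit (int x)
+@{
+  return __builtin_ffs (x);
+@}
+@end smallexample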
+
+The ISO C99 functions
+@code{_Exit}, @code{acoshf}, @code{acoshl}, @code{acosh}, @code{asinhf},
+@code{asinhl}, @code{asinh}, @code{atanhf}, @code{atanhl}, @code{atanh},
+@code{cabsf}, @code{cabsl}, @code{cabs}, @code{cacosf}, @code{cacoshf},
+@code{cacoshl}, @code{cacosh}, @code{cacosl}, @code{cacos},
+@code{cargf}, @code{cargl}, @code{carg}, @code{casinf}, @code{casinhf},
+@code{casinhl}, @code{casinh}, @code{casinl}, @code{casin},
+@code{catanf}, @code{catanhf}, @code{catanhl}, @code{catanh},
+@code{catanl}, @code{catan}, @code{cbrtf}, @code{cbrtl}, @code{cbrt},
+@code{ccosf}, @code{ccoshf}, @code{ccoshl}, @code{ccosh}, @code{ccosl},
+@code{ccos}, @code{cexpf}, @code{cexpl}, @code{cexp}, @code{cimagf},
+@code{cimagl}, @code{cimag}, @code{clogf}, @code{clogl}, @code{clog},
+@code{conjf}, @code{conjl}, @code{conj}, @code{copysignf}, @code{copysignl},
+@code{copysign}, @code{cpowf}, @code{cpowl}, @code{cpow}, @code{cprojf},
+@code{cprojl}, @code{cproj}, @code{crealf}, @code{creall}, @code{creal},
+@code{csinf}, @code{csinhf}, @code{csinhl}, @code{csinh}, @code{csinl},
+@code{csin}, @code{csqrtf}, @code{csqrtl}, @code{csqrt}, @code{ctanf},
+@code{ctanhf}, @code{ctanhl}, @code{ctanh}, @code{ctanl}, @code{ctan},
+@code{erfcf}, @code{erfcl}, @code{erfc}, @code{erff}, @code{erfl},
+@code{erf}, @code{exp2f}, @code{exp2l}, @code{exp2}, @code{expm1f},
+@code{expm1l}, @code{expm1}, @code{fdimf}, @code{fdiml}, @code{fdim},
+@code{fmaf}, @code{fmal}, @code{fmaxf}, @code{fmaxl}, @code{fmax},
+@code{fma}, @code{fminf}, @code{fminl}, @code{fmin}, @code{hypotf},
+@code{hypotl}, @code{hypot}, @code{ilogbf}, @code{ilogbl}, @code{ilogb},
+@code{imaxabs}, @code{isblank}, @code{iswblank}, @code{lgammaf},
+@code{lgammal}, @code{lgamma}, @code{llabs}, @code{llrintf}, @code{llrintl},
+@code{llrint}, @code{llroundf}, @code{llroundl}, @code{llround},
+@code{log1pf}, @code{log1pl}, @code{log1p}, @code{log2f}, @code{log2l},
+@code{log2}, @code{logbf}, @code{logbl}, @code{logb}, @code{lrintf},
+@code{lrintl}, @code{lrint}, @code{lroundf}, @code{lroundl},
+@code{lround}, @code{nearbyintf}, @code{nearbyintl}, @code{nearbyint},
+@code{nextafterf}, @code{nextafterl}, @code{nextafter},
+@code{nexttowardf}, @code{nexttowardl}, @code{nexttoward},
+@code{remainderf}, @code{remainderl}, @code{remainder}, @code{remquof},
+@code{remquol}, @code{remquo}, @code{rintf}, @code{rintl}, @code{rint},
+@code{roundf}, @code{roundl}, @code{round}, @code{scalblnf},
+@code{scalblnl}, @code{scalbln}, @code{scalbnf}, @code{scalbnl},
+@code{scalbn}, @code{snprintf}, @code{tgammaf}, @code{tgammal},
+@code{tgamma}, @code{truncf}, @code{truncl}, @code{trunc},
+@code{vfscanf}, @code{vscanf}, @code{vsnprintf} and @code{vsscanf}
+are handled as built-in functions
+except in strict ISO C90 mode (@option{-ansi} or @option{-std=c90}).
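+
+When these functions are treated as built-ins, GCC can fold calls with
+constant arguments at compile time.  A small illustration (the folding
+is an optimization the compiler may perform, not a guarantee):
+
+@smallexample
+/* With built-in handling, GCC may fold this to 5.0 at compile time;
+   with -fno-builtin-hypot it becomes an ordinary library call.  */
+double d = hypot (3.0, 4.0);
+@end smallexample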
+
+There are also built-in versions of the ISO C99 functions
+@code{acosf}, @code{acosl}, @code{asinf}, @code{asinl}, @code{atan2f},
+@code{atan2l}, @code{atanf}, @code{atanl}, @code{ceilf}, @code{ceill},
+@code{cosf}, @code{coshf}, @code{coshl}, @code{cosl}, @code{expf},
+@code{expl}, @code{fabsf}, @code{fabsl}, @code{floorf}, @code{floorl},
+@code{fmodf}, @code{fmodl}, @code{frexpf}, @code{frexpl}, @code{ldexpf},
+@code{ldexpl}, @code{log10f}, @code{log10l}, @code{logf}, @code{logl},
+@code{modfl}, @code{modff}, @code{powf}, @code{powl}, @code{sinf},
+@code{sinhf}, @code{sinhl}, @code{sinl}, @code{sqrtf}, @code{sqrtl},
+@code{tanf}, @code{tanhf}, @code{tanhl} and @code{tanl}
+that are recognized in any mode since ISO C90 reserves these names for
+the purpose to which ISO C99 puts them.  All these functions have
+corresponding versions prefixed with @code{__builtin_}.
+
+There are also built-in functions @code{__builtin_fabsf@var{n}},
+@code{__builtin_fabsf@var{n}x}, @code{__builtin_copysignf@var{n}} and
+@code{__builtin_copysignf@var{n}x}, corresponding to the TS 18661-3
+functions @code{fabsf@var{n}}, @code{fabsf@var{n}x},
+@code{copysignf@var{n}} and @code{copysignf@var{n}x}, for supported
+types @code{_Float@var{n}} and @code{_Float@var{n}x}.
+
+There are also GNU extension functions @code{clog10}, @code{clog10f} and
+@code{clog10l} whose names are reserved by ISO C99 for future use.
+All these functions have versions prefixed with @code{__builtin_}.
+
+The ISO C94 functions
+@code{iswalnum}, @code{iswalpha}, @code{iswcntrl}, @code{iswdigit},
+@code{iswgraph}, @code{iswlower}, @code{iswprint}, @code{iswpunct},
+@code{iswspace}, @code{iswupper}, @code{iswxdigit}, @code{towlower} and
+@code{towupper}
+are handled as built-in functions
+except in strict ISO C90 mode (@option{-ansi} or @option{-std=c90}).
+
+The ISO C90 functions
+@code{abort}, @code{abs}, @code{acos}, @code{asin}, @code{atan2},
+@code{atan}, @code{calloc}, @code{ceil}, @code{cosh}, @code{cos},
+@code{exit}, @code{exp}, @code{fabs}, @code{floor}, @code{fmod},
+@code{fprintf}, @code{fputs}, @code{free}, @code{frexp}, @code{fscanf},
+@code{isalnum}, @code{isalpha}, @code{iscntrl}, @code{isdigit},
+@code{isgraph}, @code{islower}, @code{isprint}, @code{ispunct},
+@code{isspace}, @code{isupper}, @code{isxdigit}, @code{tolower},
+@code{toupper}, @code{labs}, @code{ldexp}, @code{log10}, @code{log},
+@code{malloc}, @code{memchr}, @code{memcmp}, @code{memcpy},
+@code{memset}, @code{modf}, @code{pow}, @code{printf}, @code{putchar},
+@code{puts}, @code{realloc}, @code{scanf}, @code{sinh}, @code{sin},
+@code{snprintf}, @code{sprintf}, @code{sqrt}, @code{sscanf}, @code{strcat},
+@code{strchr}, @code{strcmp}, @code{strcpy}, @code{strcspn},
+@code{strlen}, @code{strncat}, @code{strncmp}, @code{strncpy},
+@code{strpbrk}, @code{strrchr}, @code{strspn}, @code{strstr},
+@code{tanh}, @code{tan}, @code{vfprintf}, @code{vprintf} and @code{vsprintf}
+are all recognized as built-in functions unless
+@option{-fno-builtin} is specified (or @option{-fno-builtin-@var{function}}
+is specified for an individual function).  All of these functions have
+corresponding versions prefixed with @code{__builtin_}.
+
+GCC provides built-in versions of the ISO C99 floating-point comparison
+macros that avoid raising exceptions for unordered operands.  They have
+the same names as the standard macros (@code{isgreater},
+@code{isgreaterequal}, @code{isless}, @code{islessequal},
+@code{islessgreater}, and @code{isunordered}), with @code{__builtin_}
+prefixed.
+We intend for a library implementor to be able to simply
+@code{#define} each standard macro to its built-in equivalent.
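+
+A C library's @code{<math.h>} might therefore contain definitions along
+these lines (a hypothetical sketch, not taken from any particular
+library):
+
+@smallexample
+#define isgreater(x, y)       __builtin_isgreater (x, y)
+#define isgreaterequal(x, y)  __builtin_isgreaterequal (x, y)
+#define isunordered(x, y)     __builtin_isunordered (x, y)
+@end smallexample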
+
+In the same fashion, GCC provides @code{fpclassify}, @code{iseqsig},
+@code{isfinite}, @code{isinf_sign}, @code{isnormal} and @code{signbit}
+built-ins, used with the @code{__builtin_} prefix.  The @code{isinf} and
+@code{isnan} built-in functions appear both with and without the
+@code{__builtin_} prefix.  With the @option{-ffinite-math-only} option
+the @code{isinf} and @code{isnan} built-in functions always return 0.
+
+GCC provides built-in versions of the ISO C99 floating-point rounding and
+exception handling functions @code{fegetround}, @code{feclearexcept} and
+@code{feraiseexcept}.  They may not be available for all targets, and
+because they need close interaction with libc internal values, they may
+not be available for all target libcs, but in all cases they gracefully
+fall back to libc calls.  These built-in functions appear both with and
+without the @code{__builtin_} prefix.
+
+@defbuiltin{{void *} __builtin_alloca (size_t @var{size})}
+The @code{__builtin_alloca} function must be called at block scope.
+The function allocates an object @var{size} bytes large on the stack
+of the calling function.  The object is aligned on the default stack
+alignment boundary for the target determined by the
+@code{__BIGGEST_ALIGNMENT__} macro.  The @code{__builtin_alloca}
+function returns a pointer to the first byte of the allocated object.
+The lifetime of the allocated object ends just before the calling
+function returns to its caller.  This is so even when
+@code{__builtin_alloca} is called within a nested block.
+
+For example, the following function allocates eight objects of @code{n}
+bytes each on the stack, storing a pointer to each in consecutive elements
+of the array @code{a}.  It then passes the array to function @code{g}
+which can safely use the storage pointed to by each of the array elements.
+
+@smallexample
+void f (unsigned n)
+@{
+  void *a [8];
+  for (int i = 0; i != 8; ++i)
+    a [i] = __builtin_alloca (n);
+
+  g (a, n);   // @r{safe}
+@}
+@end smallexample
+
+Since the @code{__builtin_alloca} function doesn't validate its argument,
+it is the responsibility of its caller to make sure the argument doesn't
+cause it to exceed the stack size limit.
+The @code{__builtin_alloca} function is provided to make it possible to
+allocate on the stack arrays of bytes with an upper bound that may be
+computed at run time.  Since C99 Variable Length Arrays offer
+similar functionality under a portable, more convenient, and safer
+interface, they are recommended instead, in both C99 and C++ programs
+where GCC provides them as an extension.
+@xref{Variable Length}, for details.
+
+@enddefbuiltin
+
+@defbuiltin{{void *} __builtin_alloca_with_align (size_t @var{size}, size_t @var{alignment})}
+The @code{__builtin_alloca_with_align} function must be called at block
+scope.  The function allocates an object @var{size} bytes large on
+the stack of the calling function.  The allocated object is aligned on
+the boundary specified by the argument @var{alignment} whose unit is given
+in bits (not bytes).  The @var{size} argument must be positive and not
+exceed the stack size limit.  The @var{alignment} argument must be a constant
+integer expression that evaluates to a power of 2 greater than or equal to
+@code{CHAR_BIT} and less than some unspecified maximum.  Invocations
+with other values are rejected with an error indicating the valid bounds.
+The function returns a pointer to the first byte of the allocated object.
+The lifetime of the allocated object ends at the end of the block in which
+the function was called.  The allocated storage is released no later than
+just before the calling function returns to its caller, but may be released
+at the end of the block in which the function was called.
+
+For example, in the following function the call to @code{g} is unsafe
+because when @code{overalign} is non-zero, the space allocated by
+@code{__builtin_alloca_with_align} may have been released at the end
+of the @code{if} statement in which it was called.
+
+@smallexample
+void f (unsigned n, bool overalign)
+@{
+  void *p;
+  if (overalign)
+    p = __builtin_alloca_with_align (n, 64 /* bits */);
+  else
+    p = __builtin_alloca (n);
+
+  g (p, n);   // @r{unsafe}
+@}
+@end smallexample
+
+Since the @code{__builtin_alloca_with_align} function doesn't validate its
+@var{size} argument, it is the responsibility of its caller to make sure
+the argument doesn't cause it to exceed the stack size limit.
+The @code{__builtin_alloca_with_align} function is provided to make
+it possible to allocate on the stack overaligned arrays of bytes with
+an upper bound that may be computed at run time.  Since C99
+Variable Length Arrays offer the same functionality under
+a portable, more convenient, and safer interface, they are recommended
+instead, in both C99 and C++ programs where GCC provides them as
+an extension.  @xref{Variable Length}, for details.
+
+@enddefbuiltin
+
+@defbuiltin{{void *} __builtin_alloca_with_align_and_max (size_t @var{size}, size_t @var{alignment}, size_t @var{max_size})}
+Similar to @code{__builtin_alloca_with_align} but takes an extra argument
+specifying an upper bound for @var{size} in case its value cannot be computed
+at compile time, for use by @option{-fstack-usage}, @option{-Wstack-usage}
+and @option{-Walloca-larger-than}.  @var{max_size} must be a constant integer
+expression; it has no effect on code generation, and no attempt is made to
+check its compatibility with @var{size}.
+
+@enddefbuiltin
+
+@defbuiltin{bool __builtin_has_attribute (@var{type-or-expression}, @var{attribute})}
+The @code{__builtin_has_attribute} function evaluates to an integer constant
+expression equal to @code{true} if the symbol or type referenced by
+the @var{type-or-expression} argument has been declared with
+the @var{attribute} referenced by the second argument.  For
+a @var{type-or-expression} argument that does not reference a symbol,
+the built-in considers the type of the argument, since attributes do
+not apply to expressions.  Neither argument is evaluated.
+The @var{type-or-expression} argument is subject to the same
+restrictions as the argument to @code{typeof} (@pxref{Typeof}).  The
+@var{attribute} argument is an attribute name optionally followed by
+a comma-separated list of arguments enclosed in parentheses.  Both forms
+of attribute names---with and without double leading and trailing
+underscores---are recognized.  @xref{Attribute Syntax}, for details.
+When no attribute arguments are specified for an attribute that expects
+one or more arguments, the function returns @code{true} if
+@var{type-or-expression} has been declared with the attribute, regardless
+of the attribute argument values.  Arguments provided for an attribute
+that expects them are validated and matched up to the provided number.
+The function returns @code{true} if all provided arguments match.
+For example, the first call to the function below evaluates to @code{true}
+because @code{x} is declared with the @code{aligned} attribute but
+the second call evaluates to @code{false} because @code{x} is declared
+@code{aligned (8)} and not @code{aligned (4)}.
+
+@smallexample
+__attribute__ ((aligned (8))) int x;
+_Static_assert (__builtin_has_attribute (x, aligned), "aligned");
+_Static_assert (!__builtin_has_attribute (x, aligned (4)), "aligned (4)");
+@end smallexample
+
+Due to a limitation, the @code{__builtin_has_attribute} function returns
+@code{false} for the @code{mode} attribute even if the type or variable
+referenced by the @var{type-or-expression} argument was declared with one.
+The function is also not supported with labels, and in C with enumerators.
+
+Note that unlike the @code{__has_attribute} preprocessor operator, which
+is suitable for use in @code{#if} preprocessing directives,
+@code{__builtin_has_attribute} is an intrinsic function that is not
+recognized in such contexts.
+
+@enddefbuiltin
+
+@defbuiltin{@var{type} __builtin_speculation_safe_value (@var{type} @var{val}, @var{type} @var{failval})}
+
+This built-in function can be used to help mitigate against unsafe
+speculative execution.  @var{type} may be any integral type or any
+pointer type.
+
+@enumerate
+@item
+If the CPU is not speculatively executing the code, then @var{val}
+is returned.
+@item
+If the CPU is executing speculatively then either:
+@itemize
+@item
+The function may cause execution to pause until it is known that the
+code is no longer being executed speculatively (in which case
+@var{val} can be returned, as above); or
+@item
+The function may use target-dependent speculation tracking state to cause
+@var{failval} to be returned when it is known that speculative
+execution has incorrectly predicted a conditional branch operation.
+@end itemize
+@end enumerate
+
+The second argument, @var{failval}, is optional and defaults to zero
+if omitted.
+
+GCC defines the preprocessor macro
+@code{__HAVE_BUILTIN_SPECULATION_SAFE_VALUE} for targets that have been
+updated to support this builtin.
+
+The built-in function can be used where a variable appears to be used in a
+safe way, but the CPU, due to speculative execution, may temporarily ignore
+the bounds checks.  Consider, for example, the following function:
+
+@smallexample
+int array[500];
+int f (unsigned untrusted_index)
+@{
+  if (untrusted_index < 500)
+    return array[untrusted_index];
+  return 0;
+@}
+@end smallexample
+
+If the function is called repeatedly with @code{untrusted_index} less
+than the limit of 500, then a branch predictor will learn that the
+block of code that returns a value stored in @code{array} will be
+executed.  If the function is subsequently called with an
+out-of-range value, it will still try to execute that block of code
+first until the CPU determines that the prediction was incorrect
+(the CPU will unwind any incorrect operations at that point).
+However, depending on how the result of the function is used, it might be
+possible to leave traces in the cache that can reveal what was stored
+at the out-of-bounds location.
+The built-in function can be used to
+provide some protection against leaking data in this way by changing
+the code to:
+
+@smallexample
+int array[500];
+int f (unsigned untrusted_index)
+@{
+  if (untrusted_index < 500)
+    return array[__builtin_speculation_safe_value (untrusted_index)];
+  return 0;
+@}
+@end smallexample
+
+The built-in function will either cause execution to stall until the
+conditional branch has been fully resolved, or it may permit
+speculative execution to continue, but using 0 instead of
+@code{untrusted_index} if that exceeds the limit.
+
+If accessing any memory location is potentially unsafe when speculative
+execution is incorrect, then the code can be rewritten as
+
+@smallexample
+int array[500];
+int f (unsigned untrusted_index)
+@{
+  if (untrusted_index < 500)
+    return *__builtin_speculation_safe_value (&array[untrusted_index], NULL);
+  return 0;
+@}
+@end smallexample
+
+which will cause a @code{NULL} pointer to be used for the unsafe case.
+
+@enddefbuiltin
+
+@defbuiltin{int __builtin_types_compatible_p (@var{type1}, @var{type2})}
+
+You can use the built-in function @code{__builtin_types_compatible_p} to
+determine whether two types are the same.
+
+This built-in function returns 1 if the unqualified versions of the
+types @var{type1} and @var{type2} (which are types, not expressions) are
+compatible, 0 otherwise.  The result of this built-in function can be
+used in integer constant expressions.
+
+This built-in function ignores top level qualifiers (e.g., @code{const},
+@code{volatile}).  For example, @code{int} is equivalent to @code{const
+int}.
+
+The types @code{int[]} and @code{int[5]} are compatible.  On the other
+hand, @code{int} and @code{char *} are not compatible, even if the sizes
+of their types, on the particular architecture, are the same.  Also, the
+amount of pointer indirection is taken into account when determining
+similarity.  Consequently, @code{short *} is not similar to
+@code{short **}.  Furthermore, two types that are typedefed are
+considered compatible if their underlying types are compatible.
+
+An @code{enum} type is not considered to be compatible with another
+@code{enum} type even if both are compatible with the same integer
+type; this is what the C standard specifies.
+For example, @code{enum @{foo, bar@}} is not similar to
+@code{enum @{hot, dog@}}.
+
+You typically use this function in code whose execution varies
+depending on the arguments' types.  For example:
+
+@smallexample
+#define foo(x)                                                  \
+  (@{                                                           \
+    typeof (x) tmp = (x);                                       \
+    if (__builtin_types_compatible_p (typeof (x), long double)) \
+      tmp = foo_long_double (tmp);                              \
+    else if (__builtin_types_compatible_p (typeof (x), double)) \
+      tmp = foo_double (tmp);                                   \
+    else if (__builtin_types_compatible_p (typeof (x), float))  \
+      tmp = foo_float (tmp);                                    \
+    else                                                        \
+      abort ();                                                 \
+    tmp;                                                        \
+  @})
+@end smallexample
+
+@emph{Note:} This construct is only available for C@.
+
+@enddefbuiltin
+
+@defbuiltin{@var{type} __builtin_call_with_static_chain (@var{call_exp}, @var{pointer_exp})}
+
+The @var{call_exp} expression must be a function call, and the
+@var{pointer_exp} expression must be a pointer.  The @var{pointer_exp}
+is passed to the function call in the target's static chain location.
+The result of the built-in is the result of the function call.
+
+@emph{Note:} This builtin is only available for C@.
+This builtin can be used to call Go closures from C.
+
+@enddefbuiltin
+
+@defbuiltin{@var{type} __builtin_choose_expr (@var{const_exp}, @var{exp1}, @var{exp2})}
+
+You can use the built-in function @code{__builtin_choose_expr} to
+evaluate code depending on the value of a constant expression.  This
+built-in function returns @var{exp1} if @var{const_exp}, which is an
+integer constant expression, is nonzero.  Otherwise it returns @var{exp2}.
+
+Like the @samp{? :} operator, this built-in function does not evaluate the
+expression that is not chosen.  For example, if @var{const_exp} evaluates to
+@code{true}, @var{exp2} is not evaluated even if it has side effects.  On the
+other hand, @code{__builtin_choose_expr} differs from @samp{? :} in that the
+first operand must be a compile-time constant, and the other operands are not
+subject to the @samp{? :} type constraints and promotions.
+
+This built-in function can return an lvalue if the chosen argument is an
+lvalue.
+
+If @var{exp1} is returned, the return type is the same as @var{exp1}'s
+type.  Similarly, if @var{exp2} is returned, its return type is the same
+as @var{exp2}'s.
+
+Example:
+
+@smallexample
+#define foo(x)                                                    \
+  __builtin_choose_expr (                                         \
+    __builtin_types_compatible_p (typeof (x), double),            \
+    foo_double (x),                                               \
+    __builtin_choose_expr (                                       \
+      __builtin_types_compatible_p (typeof (x), float),           \
+      foo_float (x),                                              \
+      /* @r{The void expression results in a compile-time error} \
+         @r{when assigning the result to something.}  */          \
+      (void)0))
+@end smallexample
+
+@emph{Note:} This construct is only available for C@.  Furthermore, the
+unused expression (@var{exp1} or @var{exp2} depending on the value of
+@var{const_exp}) may still generate syntax errors.  This may change in
+future revisions.
+
+@enddefbuiltin
+
+@defbuiltin{@var{type} __builtin_tgmath (@var{functions}, @var{arguments})}
+
+The built-in function @code{__builtin_tgmath}, available only for C
+and Objective-C, calls a function determined according to the rules of
+@code{<tgmath.h>} macros.  It is intended to be used in
+implementations of that header, so that expansions of macros from that
+header only expand each of their arguments once, to avoid problems
+when calls to such macros are nested inside the arguments of other
+calls to such macros; in addition, it results in better diagnostics
+for invalid calls to @code{<tgmath.h>} macros than implementations
+using other GNU C language features.  For example, the @code{pow}
+type-generic macro might be defined as:
+
+@smallexample
+#define pow(a, b) __builtin_tgmath (powf, pow, powl, \
+                                    cpowf, cpow, cpowl, a, b)
+@end smallexample
+
+The arguments to @code{__builtin_tgmath} are at least two pointers to
+functions, followed by the arguments to the type-generic macro (which
+will be passed as arguments to the selected function).  All the
+pointers to functions must be pointers to prototyped functions, none
+of which may have variable arguments, and all of which must have the
+same number of parameters; the number of parameters of the first
+function determines how many arguments to @code{__builtin_tgmath} are
+interpreted as function pointers, and how many as the arguments to the
+called function.
+
+The types of the specified functions must all be different, but
+related to each other in the same way as a set of functions that may
+be selected between by a macro in @code{<tgmath.h>}.  This means that
+the functions are parameterized by a floating-point type @var{t},
+different for each such function.
+The function return types may all
+be the same type, or they may be @var{t} for each function, or they
+may be the real type corresponding to @var{t} for each function (if
+some of the types @var{t} are complex).  Likewise, for each parameter
+position, the type of the parameter in that position may always be the
+same type, or may be @var{t} for each function (this case must apply
+for at least one parameter position), or may be the real type
+corresponding to @var{t} for each function.
+
+The standard rules for @code{<tgmath.h>} macros are used to find a
+common type @var{u} from the types of the arguments for parameters
+whose types vary between the functions; complex integer types (a GNU
+extension) are treated like the complex type corresponding to the real
+floating type that would be chosen for the corresponding real integer type.
+If the function return types vary, or are all the same integer type,
+the function called is the one for which @var{t} is @var{u}, and it is
+an error if there is no such function.  If the function return types
+are all the same floating-point type, the type-generic macro is taken
+to be one of those from TS 18661 that rounds the result to a narrower
+type; if there is a function for which @var{t} is @var{u}, it is
+called, and otherwise the first function, if any, for which @var{t}
+has at least the range and precision of @var{u} is called, and it is
+an error if there is no such function.
+
+@enddefbuiltin
+
+@defbuiltin{int __builtin_constant_p (@var{exp})}
+You can use the built-in function @code{__builtin_constant_p} to
+determine if the expression @var{exp} is known to be constant at
+compile time and hence that GCC can perform constant-folding on expressions
+involving that value.  The argument of the function is the expression to
+test.  The expression is not evaluated; side effects are discarded.  The
+function returns the integer 1 if the argument is known to be a compile-time
+constant and 0 if it is not known to be a compile-time constant.
+Any expression that has side effects makes the function return 0.
+A return of 0 does not indicate that the expression is @emph{not} a constant,
+but merely that GCC cannot prove it is a constant within the constraints
+of the active set of optimization options.
+
+You typically use this function in an embedded application where
+memory is a critical resource.  If you have some complex calculation,
+you may want it to be folded if it involves constants, but need to call
+a function if it does not.  For example:
+
+@smallexample
+#define Scale_Value(X)      \
+  (__builtin_constant_p (X) \
+   ? ((X) * SCALE + OFFSET) : Scale (X))
+@end smallexample
+
+You may use this built-in function in either a macro or an inline
+function.  However, if you use it in an inlined function and pass an
+argument of the function as the argument to the built-in, GCC
+never returns 1 when you call the inline function with a string constant
+or compound literal (@pxref{Compound Literals}) and does not return 1
+when you pass a constant numeric value to the inline function unless you
+specify the @option{-O} option.
+
+You may also use @code{__builtin_constant_p} in initializers for static
+data.  For instance, you can write
+
+@smallexample
+static const int table[] = @{
+  __builtin_constant_p (EXPRESSION) ? (EXPRESSION) : -1,
+  /* @r{@dots{}} */
+@};
+@end smallexample
+
+@noindent
+This is an acceptable initializer even if @var{EXPRESSION} is not a
+constant expression, including the case where
+@code{__builtin_constant_p} returns 1 because @var{EXPRESSION} can be
+folded to a constant but @var{EXPRESSION} contains operands that are
+not otherwise permitted in a static initializer (for example,
+@code{0 && foo ()}).  GCC must be more conservative about evaluating the
+built-in in this case, because it has no opportunity to perform
+optimization.
+@enddefbuiltin
+
+@defbuiltin{bool __builtin_is_constant_evaluated (void)}
+The @code{__builtin_is_constant_evaluated} function is available only
+in C++.  The built-in is intended to be used by implementations of
+the @code{std::is_constant_evaluated} C++ function.  Programs should make
+use of the latter function rather than invoking the built-in directly.
+
+The main use case of the built-in is to determine whether a @code{constexpr}
+function is being called in a @code{constexpr} context.  A call to
+the function evaluates to a core constant expression with the value
+@code{true} if and only if it occurs within the evaluation of an expression
+or conversion that is manifestly constant-evaluated as defined in the C++
+standard.  Manifestly constant-evaluated contexts include
+constant-expressions, the conditions of @code{constexpr if} statements,
+constraint-expressions, and initializers of variables usable in constant
+expressions.  For more details refer to the latest revision of the C++
+standard.
+@enddefbuiltin
+
+@defbuiltin{@var{type} __builtin_counted_by_ref (@var{ptr})}
+The built-in function @code{__builtin_counted_by_ref} checks whether the
+array object pointed to by the pointer @var{ptr} has another object
+associated with it that represents the number of elements in the array
+object through the @code{counted_by} attribute (i.e. the counted-by
+object).  If so, it returns a pointer to the corresponding counted-by
+object.  If no such counted-by object exists, it returns a null pointer.
+
+This built-in function is only available in C for now.
+
+The argument @var{ptr} must be a pointer to an array.
+The @var{type} of the returned value is a pointer type pointing to the
+corresponding type of the counted-by object, or a void pointer type in
+case of a null pointer being returned.
+
+For example:
+
+@smallexample
+struct foo1 @{
+  int counter;
+  struct bar1 array[] __attribute__((counted_by (counter)));
+@} *p;
+
+struct foo2 @{
+  int other;
+  struct bar2 array[];
+@} *q;
+@end smallexample
+
+@noindent
+the following call to the built-in
+
+@smallexample
+__builtin_counted_by_ref (p->array)
+@end smallexample
+
+@noindent
+returns
+
+@smallexample
+&p->counter
+@end smallexample
+
+@noindent
+with type @code{int *}.  However, the following call to the built-in
+
+@smallexample
+__builtin_counted_by_ref (q->array)
+@end smallexample
+
+@noindent
+returns a null pointer to @code{void}.
+
+@enddefbuiltin
+
+@defbuiltin{void __builtin_clear_padding (@var{ptr})}
+The built-in function @code{__builtin_clear_padding} clears the padding
+bits inside the object representation of the object pointed to by
+@var{ptr}, which has to be a pointer.  The value representation of the
+object is not affected.  The type of the object is assumed to be the type
+the pointer points to.  Inside a union, the only cleared bits are
+bits that are padding bits for all the union members.
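+
+For instance, it can be used before comparing object representations
+byte by byte.  A minimal sketch (the padding shown assumes a typical
+ABI where @code{int} is 4-byte aligned):
+
+@smallexample
+struct s @{ char c; /* typically 3 bytes of padding here */ int i; @};
+
+int
+equal_repr (struct s *a, struct s *b)
+@{
+  __builtin_clear_padding (a);   /* zero the padding bits in *a */
+  __builtin_clear_padding (b);   /* zero the padding bits in *b */
+  return __builtin_memcmp (a, b, sizeof (struct s)) == 0;
+@}
+@end smallexample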
+
+This built-in function is useful if the padding bits of an object might
+have indeterminate values and the object representation needs to be
+bitwise compared to some other object, for example for atomic operations.
+
+For C++, the @var{ptr} argument type should be a pointer to a
+trivially copyable type, unless the argument is the address of a variable
+or parameter, because otherwise it isn't known whether the type is just
+a base class whose padding bits are reused or laid out differently in
+a derived class.
+@enddefbuiltin
+
+@defbuiltin{@var{type} __builtin_bit_cast (@var{type}, @var{arg})}
+The @code{__builtin_bit_cast} function is available only
+in C++.  The built-in is intended to be used by implementations of
+the @code{std::bit_cast} C++ template function.  Programs should make
+use of the latter function rather than invoking the built-in directly.
+
+This built-in function allows reinterpreting the bits of the @var{arg}
+argument as if it had type @var{type}.  @var{type} and the type of the
+@var{arg} argument need to be trivially copyable types with the same size.
+When manifestly constant-evaluated, it performs extra diagnostics required
+for @code{std::bit_cast} and returns a constant expression if @var{arg}
+is a constant expression.  For more details
+refer to the latest revision of the C++ standard.
+@enddefbuiltin
+
+@defbuiltin{long __builtin_expect (long @var{exp}, long @var{c})}
+@opindex fprofile-arcs
+You may use @code{__builtin_expect} to provide the compiler with
+branch prediction information.  In general, you should prefer to
+use actual profile feedback for this (@option{-fprofile-arcs}), as
+programmers are notoriously bad at predicting how their programs
+actually perform.  However, there are applications in which this
+data is hard to collect.
+
+The return value is the value of @var{exp}, which should be an integral
+expression.  The semantics of the built-in are that it is expected that
+@var{exp} == @var{c}.  For example:
+
+@smallexample
+if (__builtin_expect (x, 0))
+  foo ();
+@end smallexample
+
+@noindent
+indicates that we do not expect to call @code{foo}, since
+we expect @code{x} to be zero.  Since you are limited to integral
+expressions for @var{exp}, you should use constructions such as
+
+@smallexample
+if (__builtin_expect (ptr != NULL, 1))
+  foo (*ptr);
+@end smallexample
+
+@noindent
+when testing pointer or floating-point values.
+
+For the purposes of branch prediction optimizations, the probability that
+a @code{__builtin_expect} expression is @code{true} is controlled by GCC's
+@code{builtin-expect-probability} parameter, which defaults to 90%.
+
+You can also use @code{__builtin_expect_with_probability} to explicitly
+assign a probability value to individual expressions.  If the built-in
+is used in a loop construct, the provided probability will influence
+the expected number of iterations made by loop optimizations.
+@enddefbuiltin
+
+@defbuiltin{long __builtin_expect_with_probability (long @var{exp}, long @var{c}, double @var{probability})}
+
+This function has the same semantics as @code{__builtin_expect},
+but the caller provides the expected probability that @var{exp} == @var{c}.
+The last argument, @var{probability}, is a floating-point value in the
+range 0.0 to 1.0, inclusive.  The @var{probability} argument must be a
+constant floating-point expression.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_trap (void)}
+This function causes the program to exit abnormally.
+GCC implements
+this function by using a target-dependent mechanism (such as
+intentionally executing an illegal instruction) or by calling
+@code{abort}.  The mechanism used may vary from release to release so
+you should not rely on any particular implementation.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_unreachable (void)}
+If control flow reaches the point of the @code{__builtin_unreachable},
+the program is undefined.  It is useful in situations where the
+compiler cannot deduce the unreachability of the code.
+
+One such case is immediately following an @code{asm} statement that
+either never terminates, or one that transfers control elsewhere
+and never returns.  In this example, without the
+@code{__builtin_unreachable}, GCC issues a warning that control
+reaches the end of a non-void function.  It also generates code
+to return after the @code{asm}.
+
+@smallexample
+int f (int c, int v)
+@{
+  if (c)
+    @{
+      return v;
+    @}
+  else
+    @{
+      asm ("jmp error_handler");
+      __builtin_unreachable ();
+    @}
+@}
+@end smallexample
+
+@noindent
+Because the @code{asm} statement unconditionally transfers control out
+of the function, control never reaches the end of the function
+body.  The @code{__builtin_unreachable} is in fact unreachable and
+communicates this fact to the compiler.
+
+Another use for @code{__builtin_unreachable} is following a call to a
+function that never returns but that is not declared
+@code{__attribute__((noreturn))}, as in this example:
+
+@smallexample
+void function_that_never_returns (void);
+
+int g (int c)
+@{
+  if (c)
+    @{
+      return 1;
+    @}
+  else
+    @{
+      function_that_never_returns ();
+      __builtin_unreachable ();
+    @}
+@}
+@end smallexample
+
+@enddefbuiltin
+
+@defbuiltin{@var{type} __builtin_assoc_barrier (@var{type} @var{expr})}
+This built-in inhibits re-association of the floating-point expression
+@var{expr} with expressions consuming the return value of the built-in.  The
+expression @var{expr} itself can be reordered, and the whole expression
+@var{expr} can be reordered with operands after the barrier.  The barrier is
+relevant when @option{-fassociative-math} is active.
+
+@smallexample
+float x0 = a + b - b;
+float x1 = __builtin_assoc_barrier (a + b) - b;
+@end smallexample
+
+@noindent
+means that, with @option{-fassociative-math}, @code{x0} can be optimized to
+@code{x0 = a} but @code{x1} cannot.
+
+It is also relevant when @option{-ffp-contract=fast} is active;
+it will prevent contraction between expressions.
+
+@smallexample
+float x0 = a * b + c;
+float x1 = __builtin_assoc_barrier (a * b) + c;
+@end smallexample
+
+@noindent
+means that, with @option{-ffp-contract=fast}, @code{x0} may be optimized to
+use a fused multiply-add instruction but @code{x1} cannot.
+
+@enddefbuiltin
+
+@defbuiltin{{void *} __builtin_assume_aligned (const void *@var{exp}, size_t @var{align}, ...)}
+This function returns its first argument, and allows the compiler
+to assume that the returned pointer is at least @var{align} bytes
+aligned.  This built-in can have either two or three arguments; if it
+has three, the third argument should have integer type, and if it is
+nonzero, it means the misalignment offset.
+For example:
+
+@smallexample
+void *x = __builtin_assume_aligned (arg, 16);
+@end smallexample
+
+@noindent
+means that the compiler can assume @code{x}, set to @code{arg}, is at least
+16-byte aligned, while:
+
+@smallexample
+void *x = __builtin_assume_aligned (arg, 32, 8);
+@end smallexample
+
+@noindent
+means that the compiler can assume for @code{x}, set to @code{arg}, that
+@code{(char *) x - 8} is 32-byte aligned.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_LINE ()}
+This function is the equivalent of the preprocessor @code{__LINE__}
+macro and returns a constant integer expression that evaluates to
+the line number of the invocation of the built-in.  When used as a C++
+default argument for a function @var{F}, it returns the line number
+of the call to @var{F}.
+@enddefbuiltin
+
+@defbuiltin{{const char *} __builtin_FUNCTION ()}
+This function is the equivalent of the @code{__FUNCTION__} symbol
+and returns an address constant pointing to the name of the function
+from which the built-in was invoked, or the empty string if
+the invocation is not at function scope.  When used as a C++ default
+argument for a function @var{F}, it returns the name of @var{F}'s
+caller or the empty string if the call was not made at function
+scope.
+@enddefbuiltin
+
+@defbuiltin{{const char *} __builtin_FILE ()}
+This function is the equivalent of the preprocessor @code{__FILE__}
+macro and returns an address constant pointing to the file name
+containing the invocation of the built-in, or the empty string if
+the invocation is not at function scope.  When used as a C++ default
+argument for a function @var{F}, it returns the file name of the call
+to @var{F} or the empty string if the call was not made at function
+scope.
+
+For example, in the following, each call to function @code{foo} will
+print a line similar to @code{"file.c:123: foo: message"} with the name
+of the file and the line number of the @code{printf} call, the name of
+the function @code{foo}, followed by the word @code{message}.  (The
+helper functions @code{file} and @code{line} mirror @code{function};
+they are spelled out here so the example is self-contained.)
+
+@smallexample
+const char*
+file (const char *f = __builtin_FILE ())
+@{
+  return f;
+@}
+
+int
+line (int l = __builtin_LINE ())
+@{
+  return l;
+@}
+
+const char*
+function (const char *func = __builtin_FUNCTION ())
+@{
+  return func;
+@}
+
+void foo (void)
+@{
+  printf ("%s:%i: %s: message\n", file (), line (), function ());
+@}
+@end smallexample
+
+@enddefbuiltin
+
+@defbuiltin{void __builtin___clear_cache (void *@var{begin}, void *@var{end})}
+This function is used to flush the processor's instruction cache for
+the region of memory between @var{begin} inclusive and @var{end}
+exclusive.  Some targets require that the instruction cache be
+flushed, after modifying memory containing code, in order to obtain
+deterministic behavior.
+
+If the target does not require instruction cache flushes,
+@code{__builtin___clear_cache} has no effect.  Otherwise either
+instructions are emitted in-line to clear the instruction cache or a
+call to the @code{__clear_cache} function in libgcc is made.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_prefetch (const void *@var{addr}, ...)}
+This function is used to minimize cache-miss latency by moving data into
+a cache before it is accessed.
+You can insert calls to @code{__builtin_prefetch} into code for which
+you know addresses of data in memory that is likely to be accessed soon.
+If the target supports them, data prefetch instructions are generated.
+If the prefetch is done early enough before the access then the data will
+be in the cache by the time it is accessed.
+
+The value of @var{addr} is the address of the memory to prefetch.
+There are two optional arguments, @var{rw} and @var{locality}.
The value of @var{rw} is a compile-time constant zero, one or two; one
+means that the prefetch is preparing for a write to the memory address,
+two means that the prefetch is preparing for a shared read (expected to be
+read by at least one other processor before it is written if written at
+all) and zero, the default, means that the prefetch is preparing for a read.
+The value @var{locality} must be a compile-time constant integer between
+zero and three. A value of zero means that the data has no temporal
+locality, so it need not be left in the cache after the access. A value
+of three means that the data has a high degree of temporal locality and
+should be left in all levels of cache possible. Values of one and two
+mean, respectively, a low or moderate degree of temporal locality. The
+default is three.
+
+@smallexample
+for (i = 0; i < n; i++)
+  @{
+    a[i] = a[i] + b[i];
+    __builtin_prefetch (&a[i+j], 1, 1);
+    __builtin_prefetch (&b[i+j], 0, 1);
+    /* @r{@dots{}} */
+  @}
+@end smallexample
+
+Data prefetch does not generate faults if @var{addr} is invalid, but
+the address expression itself must be valid. For example, a prefetch
+of @code{p->next} does not fault if @code{p->next} is not a valid
+address, but evaluation faults if @code{p} is not a valid address.
+
+If the target does not support data prefetch, the address expression
+is evaluated if it includes side effects but no other code is generated
+and GCC does not issue a warning.
+@enddefbuiltin
+
+@defbuiltin{{size_t} __builtin_object_size (const void * @var{ptr}, int @var{type})}
+Returns a constant size estimate of an object pointed to by @var{ptr}.
+@xref{Object Size Checking}, for a detailed description of the function.
+@enddefbuiltin
+
+@defbuiltin{{size_t} __builtin_dynamic_object_size (const void * @var{ptr}, int @var{type})}
+Similar to @code{__builtin_object_size} except that the return value
+need not be a constant. @xref{Object Size Checking}, for a detailed
+description of the function.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_classify_type (@var{arg})}
+@defbuiltinx{int __builtin_classify_type (@var{type})}
+The @code{__builtin_classify_type} built-in returns a small integer
+identifying the category of the type of @var{arg}: void type, integer
+type, enumeral type, boolean type, pointer type, reference type, offset
+type, real type, complex type, function type, method type, record type,
+union type, array type, string type, bit-precise integer type, vector
+type, etc. When the argument is an expression, for backwards
+compatibility reasons it is promoted like an argument passed to
+@code{...} in a varargs function, so some classes are never returned in
+certain languages. Alternatively, the argument of the built-in function
+can be a typename, such as a @code{typeof} specifier.
+
+@smallexample
+int a[2];
+__builtin_classify_type (a) == __builtin_classify_type (int[5]);
+__builtin_classify_type (a) == __builtin_classify_type (void*);
+__builtin_classify_type (typeof (a)) == __builtin_classify_type (int[5]);
 @end smallexample
-These intrinsic functions are available by including @code{lasxintrin.h} and
-using @option{-mfrecipe} and @option{-mlasx}.
-@smallexample
-__m256d __lasx_xvfrecipe_d (__m256d);
-__m256 __lasx_xvfrecipe_s (__m256);
-__m256d __lasx_xvfrsqrte_d (__m256d);
-__m256 __lasx_xvfrsqrte_s (__m256);
-@end smallexample
+The first comparison will never be true, as @var{a} is implicitly converted
+to a pointer.
The last two comparisons will be true as they classify +pointers in the second case and arrays in the last case. +@enddefbuiltin + +@defbuiltin{double __builtin_huge_val (void)} +Returns a positive infinity, if supported by the floating-point format, +else @code{DBL_MAX}. This function is suitable for implementing the +ISO C macro @code{HUGE_VAL}. +@enddefbuiltin + +@defbuiltin{float __builtin_huge_valf (void)} +Similar to @code{__builtin_huge_val}, except the return type is @code{float}. +@enddefbuiltin + +@defbuiltin{{long double} __builtin_huge_vall (void)} +Similar to @code{__builtin_huge_val}, except the return +type is @code{long double}. +@enddefbuiltin + +@defbuiltin{_Float@var{n} __builtin_huge_valf@var{n} (void)} +Similar to @code{__builtin_huge_val}, except the return type is +@code{_Float@var{n}}. +@enddefbuiltin + +@defbuiltin{_Float@var{n}x __builtin_huge_valf@var{n}x (void)} +Similar to @code{__builtin_huge_val}, except the return type is +@code{_Float@var{n}x}. +@enddefbuiltin + +@defbuiltin{int __builtin_fpclassify (int, int, int, int, int, ...)} +This built-in implements the C99 fpclassify functionality. The first +five int arguments should be the target library's notion of the +possible FP classes and are used for return values. They must be +constant values and they must appear in this order: @code{FP_NAN}, +@code{FP_INFINITE}, @code{FP_NORMAL}, @code{FP_SUBNORMAL} and +@code{FP_ZERO}. The ellipsis is for exactly one floating-point value +to classify. GCC treats the last argument as type-generic, which +means it does not do default promotion from float to double. +@enddefbuiltin + +@defbuiltin{double __builtin_inf (void)} +Similar to @code{__builtin_huge_val}, except a warning is generated +if the target floating-point format does not support infinities. +@enddefbuiltin + +@defbuiltin{_Decimal32 __builtin_infd32 (void)} +Similar to @code{__builtin_inf}, except the return type is @code{_Decimal32}. +@enddefbuiltin + +@defbuiltin{_Decimal64 __builtin_infd64 (void)} +Similar to @code{__builtin_inf}, except the return type is @code{_Decimal64}. +@enddefbuiltin + +@defbuiltin{_Decimal128 __builtin_infd128 (void)} +Similar to @code{__builtin_inf}, except the return type is @code{_Decimal128}. +@enddefbuiltin + +@defbuiltin{float __builtin_inff (void)} +Similar to @code{__builtin_inf}, except the return type is @code{float}. +This function is suitable for implementing the ISO C99 macro @code{INFINITY}. +@enddefbuiltin + +@defbuiltin{{long double} __builtin_infl (void)} +Similar to @code{__builtin_inf}, except the return +type is @code{long double}. +@enddefbuiltin + +@defbuiltin{_Float@var{n} __builtin_inff@var{n} (void)} +Similar to @code{__builtin_inf}, except the return +type is @code{_Float@var{n}}. +@enddefbuiltin + +@defbuiltin{_Float@var{n} __builtin_inff@var{n}x (void)} +Similar to @code{__builtin_inf}, except the return +type is @code{_Float@var{n}x}. +@enddefbuiltin + +@defbuiltin{int __builtin_isinf_sign (...)} +Similar to @code{isinf}, except the return value is -1 for +an argument of @code{-Inf} and 1 for an argument of @code{+Inf}. +Note while the parameter list is an +ellipsis, this function only accepts exactly one floating-point +argument. GCC treats this parameter as type-generic, which means it +does not do default promotion from float to double. +@enddefbuiltin + +@defbuiltin{double __builtin_nan (const char *@var{str})} +This is an implementation of the ISO C99 function @code{nan}. 
+ +Since ISO C99 defines this function in terms of @code{strtod}, which we +do not implement, a description of the parsing is in order. The string +is parsed as by @code{strtol}; that is, the base is recognized by +leading @samp{0} or @samp{0x} prefixes. The number parsed is placed +in the significand such that the least significant bit of the number +is at the least significant bit of the significand. The number is +truncated to fit the significand field provided. The significand is +forced to be a quiet NaN@. + +This function, if given a string literal all of which would have been +consumed by @code{strtol}, is evaluated early enough that it is considered a +compile-time constant. +@enddefbuiltin + +@defbuiltin{_Decimal32 __builtin_nand32 (const char *@var{str})} +Similar to @code{__builtin_nan}, except the return type is @code{_Decimal32}. +@enddefbuiltin + +@defbuiltin{_Decimal64 __builtin_nand64 (const char *@var{str})} +Similar to @code{__builtin_nan}, except the return type is @code{_Decimal64}. +@enddefbuiltin + +@defbuiltin{_Decimal128 __builtin_nand128 (const char *@var{str})} +Similar to @code{__builtin_nan}, except the return type is @code{_Decimal128}. +@enddefbuiltin + +@defbuiltin{float __builtin_nanf (const char *@var{str})} +Similar to @code{__builtin_nan}, except the return type is @code{float}. +@enddefbuiltin + +@defbuiltin{{long double} __builtin_nanl (const char *@var{str})} +Similar to @code{__builtin_nan}, except the return type is @code{long double}. +@enddefbuiltin + +@defbuiltin{_Float@var{n} __builtin_nanf@var{n} (const char *@var{str})} +Similar to @code{__builtin_nan}, except the return type is +@code{_Float@var{n}}. +@enddefbuiltin + +@defbuiltin{_Float@var{n}x __builtin_nanf@var{n}x (const char *@var{str})} +Similar to @code{__builtin_nan}, except the return type is +@code{_Float@var{n}x}. +@enddefbuiltin + +@defbuiltin{double __builtin_nans (const char *@var{str})} +Similar to @code{__builtin_nan}, except the significand is forced +to be a signaling NaN@. The @code{nans} function is proposed by +@uref{https://www.open-std.org/jtc1/sc22/wg14/www/docs/n965.htm,,WG14 N965}. +@enddefbuiltin + +@defbuiltin{_Decimal32 __builtin_nansd32 (const char *@var{str})} +Similar to @code{__builtin_nans}, except the return type is @code{_Decimal32}. +@enddefbuiltin + +@defbuiltin{_Decimal64 __builtin_nansd64 (const char *@var{str})} +Similar to @code{__builtin_nans}, except the return type is @code{_Decimal64}. +@enddefbuiltin + +@defbuiltin{_Decimal128 __builtin_nansd128 (const char *@var{str})} +Similar to @code{__builtin_nans}, except the return type is @code{_Decimal128}. +@enddefbuiltin + +@defbuiltin{float __builtin_nansf (const char *@var{str})} +Similar to @code{__builtin_nans}, except the return type is @code{float}. +@enddefbuiltin + +@defbuiltin{{long double} __builtin_nansl (const char *@var{str})} +Similar to @code{__builtin_nans}, except the return type is @code{long double}. +@enddefbuiltin + +@defbuiltin{_Float@var{n} __builtin_nansf@var{n} (const char *@var{str})} +Similar to @code{__builtin_nans}, except the return type is +@code{_Float@var{n}}. +@enddefbuiltin + +@defbuiltin{_Float@var{n}x __builtin_nansf@var{n}x (const char *@var{str})} +Similar to @code{__builtin_nans}, except the return type is +@code{_Float@var{n}x}. +@enddefbuiltin + +@defbuiltin{int __builtin_issignaling (...)} +Return non-zero if the argument is a signaling NaN and zero otherwise. 
+Note that while the parameter list is an
+ellipsis, this function only accepts exactly one floating-point
+argument. GCC treats this parameter as type-generic, which means it
+does not do default promotion from float to double.
+This built-in function can work even without the non-default
+@code{-fsignaling-nans} option, although if a signaling NaN is computed,
+stored or passed as an argument to some function other than this built-in
+in the current translation unit, it is safer to use @code{-fsignaling-nans}.
+With the @code{-ffinite-math-only} option this built-in function
+always returns 0.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_ffs (int @var{x})}
+Returns one plus the index of the least significant 1-bit of @var{x}, or
+if @var{x} is zero, returns zero.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_clz (unsigned int @var{x})}
+Returns the number of leading 0-bits in @var{x}, starting at the most
+significant bit position. If @var{x} is 0, the result is undefined.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_ctz (unsigned int @var{x})}
+Returns the number of trailing 0-bits in @var{x}, starting at the least
+significant bit position. If @var{x} is 0, the result is undefined.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_clrsb (int @var{x})}
+Returns the number of leading redundant sign bits in @var{x}, i.e.@: the
+number of bits following the most significant bit that are identical
+to it. There are no special cases for 0 or other values.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_popcount (unsigned int @var{x})}
+Returns the number of 1-bits in @var{x}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_parity (unsigned int @var{x})}
+Returns the parity of @var{x}, i.e.@: the number of 1-bits in @var{x}
+modulo 2.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_ffsl (long)}
+Similar to @code{__builtin_ffs}, except the argument type is
+@code{long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_clzl (unsigned long)}
+Similar to @code{__builtin_clz}, except the argument type is
+@code{unsigned long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_ctzl (unsigned long)}
+Similar to @code{__builtin_ctz}, except the argument type is
+@code{unsigned long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_clrsbl (long)}
+Similar to @code{__builtin_clrsb}, except the argument type is
+@code{long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_popcountl (unsigned long)}
+Similar to @code{__builtin_popcount}, except the argument type is
+@code{unsigned long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_parityl (unsigned long)}
+Similar to @code{__builtin_parity}, except the argument type is
+@code{unsigned long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_ffsll (long long)}
+Similar to @code{__builtin_ffs}, except the argument type is
+@code{long long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_clzll (unsigned long long)}
+Similar to @code{__builtin_clz}, except the argument type is
+@code{unsigned long long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_ctzll (unsigned long long)}
+Similar to @code{__builtin_ctz}, except the argument type is
+@code{unsigned long long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_clrsbll (long long)}
+Similar to @code{__builtin_clrsb}, except the argument type is
+@code{long long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_popcountll (unsigned long long)}
+Similar to @code{__builtin_popcount}, except the argument type is
+@code{unsigned long long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_parityll (unsigned long long)}
+Similar to @code{__builtin_parity}, except the argument type is
+@code{unsigned long long}.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_ffsg (...)}
+Similar to @code{__builtin_ffs}, except the argument is a type-generic
+signed integer (standard, extended or bit-precise). No integral argument
+promotions are performed on the argument.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_clzg (...)}
+Similar to @code{__builtin_clz}, except the argument is a type-generic
+unsigned integer (standard, extended or bit-precise) and there is an
+optional second argument of type @code{int}. No integral argument
+promotions are performed on the first argument. If two arguments are
+specified and the first argument is 0, the result is the second
+argument. If only one argument is specified and it is 0, the result is
+undefined.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_ctzg (...)}
+Similar to @code{__builtin_ctz}, except the argument is a type-generic
+unsigned integer (standard, extended or bit-precise) and there is an
+optional second argument of type @code{int}. No integral argument
+promotions are performed on the first argument. If two arguments are
+specified and the first argument is 0, the result is the second
+argument. If only one argument is specified and it is 0, the result is
+undefined.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_clrsbg (...)}
+Similar to @code{__builtin_clrsb}, except the argument is a type-generic
+signed integer (standard, extended or bit-precise). No integral argument
+promotions are performed on the argument.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_popcountg (...)}
+Similar to @code{__builtin_popcount}, except the argument is a
+type-generic unsigned integer (standard, extended or bit-precise). No
+integral argument promotions are performed on the argument.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_parityg (...)}
+Similar to @code{__builtin_parity}, except the argument is a
+type-generic unsigned integer (standard, extended or bit-precise). No
+integral argument promotions are performed on the argument.
+@enddefbuiltin
+
+@defbuiltin{@var{type} __builtin_stdc_bit_ceil (@var{type} @var{arg})}
+The @code{__builtin_stdc_bit_ceil} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{@var{arg} <= 1 ? (@var{type}) 1
+: (@var{type}) 2 << (@var{prec} - 1 - __builtin_clzg ((@var{type}) (@var{arg} - 1)))}
+where @var{prec} is the bit width of @var{type}, except that side-effects
+in @var{arg} are evaluated just once.
+@enddefbuiltin
+
+@defbuiltin{@var{type} __builtin_stdc_bit_floor (@var{type} @var{arg})}
+The @code{__builtin_stdc_bit_floor} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{@var{arg} == 0 ? (@var{type}) 0
+: (@var{type}) 1 << (@var{prec} - 1 - __builtin_clzg (@var{arg}))}
+where @var{prec} is the bit width of @var{type}, except that side-effects
+in @var{arg} are evaluated just once.
+@enddefbuiltin
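+
+For example, @code{__builtin_stdc_bit_ceil} can be used to round an
+allocation size up to the next power of two. This is a minimal sketch;
+the helper name is arbitrary:
+
+@smallexample
+#include <stddef.h>
+
+size_t
+round_up_pow2 (size_t n)
+@{
+  /* The result is undefined if it is not representable in size_t,
+     i.e. if n is greater than SIZE_MAX / 2 + 1.  */
+  return __builtin_stdc_bit_ceil (n);
+@}
+@end smallexample
+
+@defbuiltin{{unsigned int} __builtin_stdc_bit_width (@var{type} @var{arg})}
+The @code{__builtin_stdc_bit_width} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise).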
No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{(unsigned int) (@var{prec} - __builtin_clzg (@var{arg}, @var{prec}))}
+where @var{prec} is the bit width of @var{type}.
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_stdc_count_ones (@var{type} @var{arg})}
+The @code{__builtin_stdc_count_ones} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{(unsigned int) __builtin_popcountg (@var{arg})}.
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_stdc_count_zeros (@var{type} @var{arg})}
+The @code{__builtin_stdc_count_zeros} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{(unsigned int) __builtin_popcountg ((@var{type}) ~@var{arg})}.
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_stdc_first_leading_one (@var{type} @var{arg})}
+The @code{__builtin_stdc_first_leading_one} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{__builtin_clzg (@var{arg}, -1) + 1U}.
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_stdc_first_leading_zero (@var{type} @var{arg})}
+The @code{__builtin_stdc_first_leading_zero} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{__builtin_clzg ((@var{type}) ~@var{arg}, -1) + 1U}.
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_stdc_first_trailing_one (@var{type} @var{arg})}
+The @code{__builtin_stdc_first_trailing_one} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{__builtin_ctzg (@var{arg}, -1) + 1U}.
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_stdc_first_trailing_zero (@var{type} @var{arg})}
+The @code{__builtin_stdc_first_trailing_zero} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{__builtin_ctzg ((@var{type}) ~@var{arg}, -1) + 1U}.
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_stdc_has_single_bit (@var{type} @var{arg})}
+The @code{__builtin_stdc_has_single_bit} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{(_Bool) (__builtin_popcountg (@var{arg}) == 1)}.
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_stdc_leading_ones (@var{type} @var{arg})}
+The @code{__builtin_stdc_leading_ones} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument.
It is equivalent to
+@code{(unsigned int) __builtin_clzg ((@var{type}) ~@var{arg}, @var{prec})},
+where @var{prec} is the bit width of @var{type}.
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_stdc_leading_zeros (@var{type} @var{arg})}
+The @code{__builtin_stdc_leading_zeros} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{(unsigned int) __builtin_clzg (@var{arg}, @var{prec})},
+where @var{prec} is the bit width of @var{type}.
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_stdc_trailing_ones (@var{type} @var{arg})}
+The @code{__builtin_stdc_trailing_ones} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{(unsigned int) __builtin_ctzg ((@var{type}) ~@var{arg}, @var{prec})},
+where @var{prec} is the bit width of @var{type}.
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_stdc_trailing_zeros (@var{type} @var{arg})}
+The @code{__builtin_stdc_trailing_zeros} function is available only
+in C. It is type-generic; the argument can be any unsigned integer
+(standard, extended or bit-precise). No integral argument promotions are
+performed on the argument. It is equivalent to
+@code{(unsigned int) __builtin_ctzg (@var{arg}, @var{prec})},
+where @var{prec} is the bit width of @var{type}.
+@enddefbuiltin
+
+@defbuiltin{@var{type1} __builtin_stdc_rotate_left (@var{type1} @var{arg1}, @var{type2} @var{arg2})}
+The @code{__builtin_stdc_rotate_left} function is available only
+in C. It is type-generic; the first argument can be any unsigned integer
+(standard, extended or bit-precise) and the second argument any signed or
+unsigned integer or @code{char}. No integral argument promotions are
+performed on the arguments. It is equivalent to
+@code{(@var{type1}) ((@var{arg1} << (@var{arg2} % @var{prec}))
+| (@var{arg1} >> ((-(unsigned @var{type2}) @var{arg2}) % @var{prec})))}
+where @var{prec} is the bit width of @var{type1}, except that side-effects
+in @var{arg1} and @var{arg2} are evaluated just once. The behavior is
+undefined if @var{arg2} is negative.
+@enddefbuiltin
+
+@defbuiltin{@var{type1} __builtin_stdc_rotate_right (@var{type1} @var{arg1}, @var{type2} @var{arg2})}
+The @code{__builtin_stdc_rotate_right} function is available only
+in C. It is type-generic; the first argument can be any unsigned integer
+(standard, extended or bit-precise) and the second argument any signed or
+unsigned integer or @code{char}. No integral argument promotions are
+performed on the arguments. It is equivalent to
+@code{(@var{type1}) ((@var{arg1} >> (@var{arg2} % @var{prec}))
+| (@var{arg1} << ((-(unsigned @var{type2}) @var{arg2}) % @var{prec})))}
+where @var{prec} is the bit width of @var{type1}, except that side-effects
+in @var{arg1} and @var{arg2} are evaluated just once. The behavior is
+undefined if @var{arg2} is negative.
+@enddefbuiltin
+
+@defbuiltin{double __builtin_powi (double, int)}
+@defbuiltinx{float __builtin_powif (float, int)}
+@defbuiltinx{{long double} __builtin_powil (long double, int)}
+Returns the first argument raised to the power of the second. Unlike the
+@code{pow} function, no guarantees about precision and rounding are made.
+@enddefbuiltin
+
+@defbuiltin{uint16_t __builtin_bswap16 (uint16_t @var{x})}
+Returns @var{x} with the order of the bytes reversed; for example,
+@code{0xabcd} becomes @code{0xcdab}. Byte here always means
+exactly 8 bits.
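+
+For example, a 16-bit big-endian value read from a network packet can
+be converted to host order on a little-endian target. This is a
+minimal sketch; portable code would check the host byte order first:
+
+@smallexample
+#include <stdint.h>
+
+uint16_t
+be16_to_host (uint16_t be_value)
+@{
+  /* On a little-endian target, swap the two bytes.  */
+  return __builtin_bswap16 (be_value);
+@}
+@end smallexample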
+@enddefbuiltin
+
+@defbuiltin{uint32_t __builtin_bswap32 (uint32_t @var{x})}
+Similar to @code{__builtin_bswap16}, except the argument and return types
+are 32-bit.
+@enddefbuiltin
+
+@defbuiltin{uint64_t __builtin_bswap64 (uint64_t @var{x})}
+Similar to @code{__builtin_bswap32}, except the argument and return types
+are 64-bit.
+@enddefbuiltin
+
+@defbuiltin{uint128_t __builtin_bswap128 (uint128_t @var{x})}
+Similar to @code{__builtin_bswap64}, except the argument and return types
+are 128-bit. Only supported on targets where 128-bit types are supported.
+@enddefbuiltin
+
+
+@defbuiltin{Pmode __builtin_extend_pointer (void * @var{x})}
+On targets where the user-visible pointer size is smaller than the size
+of an actual hardware address, this function returns the extended user
+pointer. Targets where this is true include ILP32 mode on x86_64 and
+AArch64. This function is mainly useful when writing inline assembly
+code.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_goacc_parlevel_id (int @var{x})}
+Returns the OpenACC gang, worker or vector id depending on whether @var{x} is
+0, 1 or 2.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_goacc_parlevel_size (int @var{x})}
+Returns the OpenACC gang, worker or vector size depending on whether @var{x} is
+0, 1 or 2.
+@enddefbuiltin
+
+@defbuiltin{uint8_t __builtin_rev_crc8_data8 (uint8_t @var{crc}, uint8_t @var{data}, uint8_t @var{poly})}
+Returns the calculated 8-bit bit-reversed CRC using the initial CRC (8-bit),
+data (8-bit) and the polynomial (8-bit).
+@var{crc} is the initial CRC, @var{data} is the data and
+@var{poly} is the polynomial without leading 1.
+Table-based or clmul-based CRC may be used for the
+calculation, depending on the target architecture.
+@enddefbuiltin
+
+@defbuiltin{uint16_t __builtin_rev_crc16_data16 (uint16_t @var{crc}, uint16_t @var{data}, uint16_t @var{poly})}
+Similar to @code{__builtin_rev_crc8_data8}, except the argument and return types
+are 16-bit.
+@enddefbuiltin
+
+@defbuiltin{uint16_t __builtin_rev_crc16_data8 (uint16_t @var{crc}, uint8_t @var{data}, uint16_t @var{poly})}
+Similar to @code{__builtin_rev_crc16_data16}, except the @var{data} argument
+type is 8-bit.
+@enddefbuiltin
+
+@defbuiltin{uint32_t __builtin_rev_crc32_data32 (uint32_t @var{crc}, uint32_t @var{data}, uint32_t @var{poly})}
+Similar to @code{__builtin_rev_crc8_data8}, except the argument and return
+types are 32-bit. Depending on the target and the polynomial, a crc*
+machine instruction may also be used for the calculation.
+@enddefbuiltin
+
+@defbuiltin{uint32_t __builtin_rev_crc32_data8 (uint32_t @var{crc}, uint8_t @var{data}, uint32_t @var{poly})}
+Similar to @code{__builtin_rev_crc32_data32}, except the @var{data} argument
+type is 8-bit.
+@enddefbuiltin
+
+@defbuiltin{uint32_t __builtin_rev_crc32_data16 (uint32_t @var{crc}, uint16_t @var{data}, uint32_t @var{poly})}
+Similar to @code{__builtin_rev_crc32_data32}, except the @var{data} argument
+type is 16-bit.
+@enddefbuiltin
+
+@defbuiltin{uint64_t __builtin_rev_crc64_data64 (uint64_t @var{crc}, uint64_t @var{data}, uint64_t @var{poly})}
+Similar to @code{__builtin_rev_crc8_data8}, except the argument and return types
+are 64-bit.
+@enddefbuiltin
+
+@defbuiltin{uint64_t __builtin_rev_crc64_data8 (uint64_t @var{crc}, uint8_t @var{data}, uint64_t @var{poly})}
+Similar to @code{__builtin_rev_crc64_data64}, except the @var{data} argument type
+is 8-bit.
+@enddefbuiltin + +@defbuiltin{uint64_t __builtin_rev_crc64_data16 (uint64_t @var{crc}, uint16_t @var{data}, uint64_t @var{poly})} +Similar to @code{__builtin_rev_crc64_data64}, except the @var{data} argument type +is 16-bit. +@enddefbuiltin -@node MIPS DSP Built-in Functions -@subsection MIPS DSP Built-in Functions +@defbuiltin{uint64_t __builtin_rev_crc64_data32 (uint64_t @var{crc}, uint32_t @var{data}, uint64_t @var{poly})} +Similar to @code{__builtin_rev_crc64_data64}, except the @var{data} argument type +is 32-bit. +@enddefbuiltin -The MIPS DSP Application-Specific Extension (ASE) includes new -instructions that are designed to improve the performance of DSP and -media applications. It provides instructions that operate on packed -8-bit/16-bit integer data, Q7, Q15 and Q31 fractional data. +@defbuiltin{uint8_t __builtin_crc8_data8 (uint8_t @var{crc}, uint8_t @var{data}, uint8_t @var{poly})} +Returns the calculated 8-bit bit-forward CRC using the initial CRC (8-bit), +data (8-bit) and the polynomial (8-bit). +@var{crc} is the initial CRC, @var{data} is the data and +@var{poly} is the polynomial without leading 1. +Table-based or clmul-based CRC may be used for the +calculation, depending on the target architecture. +@enddefbuiltin -GCC supports MIPS DSP operations using both the generic -vector extensions (@pxref{Vector Extensions}) and a collection of -MIPS-specific built-in functions. Both kinds of support are -enabled by the @option{-mdsp} command-line option. +@defbuiltin{uint16_t __builtin_crc16_data16 (uint16_t @var{crc}, uint16_t @var{data}, uint16_t @var{poly})} +Similar to @code{__builtin_crc8_data8}, except the argument and return types +are 16-bit. +@enddefbuiltin -Revision 2 of the ASE was introduced in the second half of 2006. -This revision adds extra instructions to the original ASE, but is -otherwise backwards-compatible with it. You can select revision 2 -using the command-line option @option{-mdspr2}; this option implies -@option{-mdsp}. +@defbuiltin{uint16_t __builtin_crc16_data8 (uint16_t @var{crc}, uint8_t @var{data}, uint16_t @var{poly})} +Similar to @code{__builtin_crc16_data16}, except the @var{data} argument type +is 8-bit. +@enddefbuiltin -The SCOUNT and POS bits of the DSP control register are global. The -WRDSP, EXTPDP, EXTPDPV and MTHLIP instructions modify the SCOUNT and -POS bits. During optimization, the compiler does not delete these -instructions and it does not delete calls to functions containing -these instructions. +@defbuiltin{uint32_t __builtin_crc32_data32 (uint32_t @var{crc}, uint32_t @var{data}, uint32_t @var{poly})} +Similar to @code{__builtin_crc8_data8}, except the argument and return types +are 32-bit. +@enddefbuiltin -At present, GCC only provides support for operations on 32-bit -vectors. The vector type associated with 8-bit integer data is -usually called @code{v4i8}, the vector type associated with Q7 -is usually called @code{v4q7}, the vector type associated with 16-bit -integer data is usually called @code{v2i16}, and the vector type -associated with Q15 is usually called @code{v2q15}. They can be -defined in C as follows: +@defbuiltin{uint32_t __builtin_crc32_data8 (uint32_t @var{crc}, uint8_t @var{data}, uint32_t @var{poly})} +Similar to @code{__builtin_crc32_data32}, except the @var{data} argument type +is 8-bit. 
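+
+For example, the bit-forward CRC built-ins can be chained to compute a
+checksum over a buffer. This is a minimal sketch using the common
+CRC-8 polynomial @code{0x07}, written, as described above, without its
+leading 1 bit:
+
+@smallexample
+#include <stdint.h>
+
+uint8_t
+crc8_buffer (const uint8_t *buf, unsigned long len)
+@{
+  uint8_t crc = 0;
+  while (len--)
+    crc = __builtin_crc8_data8 (crc, *buf++, 0x07);
+  return crc;
+@}
+@end smallexample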
+@enddefbuiltin -@smallexample -typedef signed char v4i8 __attribute__ ((vector_size(4))); -typedef signed char v4q7 __attribute__ ((vector_size(4))); -typedef short v2i16 __attribute__ ((vector_size(4))); -typedef short v2q15 __attribute__ ((vector_size(4))); -@end smallexample +@defbuiltin{uint32_t __builtin_crc32_data16 (uint32_t @var{crc}, uint16_t @var{data}, uint32_t @var{poly})} +Similar to @code{__builtin_crc32_data32}, except the @var{data} argument type +is 16-bit. +@enddefbuiltin -@code{v4i8}, @code{v4q7}, @code{v2i16} and @code{v2q15} values are -initialized in the same way as aggregates. For example: +@defbuiltin{uint64_t __builtin_crc64_data64 (uint64_t @var{crc}, uint64_t @var{data}, uint64_t @var{poly})} +Similar to @code{__builtin_crc8_data8}, except the argument and return types +are 64-bit. +@enddefbuiltin -@smallexample -v4i8 a = @{1, 2, 3, 4@}; -v4i8 b; -b = (v4i8) @{5, 6, 7, 8@}; +@defbuiltin{uint64_t __builtin_crc64_data8 (uint64_t @var{crc}, uint8_t @var{data}, uint64_t @var{poly})} +Similar to @code{__builtin_crc64_data64}, except the @var{data} argument type +is 8-bit. +@enddefbuiltin -v2q15 c = @{0x0fcb, 0x3a75@}; -v2q15 d; -d = (v2q15) @{0.1234 * 0x1.0p15, 0.4567 * 0x1.0p15@}; -@end smallexample +@defbuiltin{uint64_t __builtin_crc64_data16 (uint64_t @var{crc}, uint16_t @var{data}, uint64_t @var{poly})} +Similar to @code{__builtin_crc64_data64}, except the @var{data} argument type +is 16-bit. +@enddefbuiltin -@emph{Note:} The CPU's endianness determines the order in which values -are packed. On little-endian targets, the first value is the least -significant and the last value is the most significant. The opposite -order applies to big-endian targets. For example, the code above -sets the lowest byte of @code{a} to @code{1} on little-endian targets -and @code{4} on big-endian targets. +@defbuiltin{uint64_t __builtin_crc64_data32 (uint64_t @var{crc}, uint32_t @var{data}, uint64_t @var{poly})} +Similar to @code{__builtin_crc64_data64}, except the @var{data} argument type +is 32-bit. +@enddefbuiltin -@emph{Note:} Q7, Q15 and Q31 values must be initialized with their integer -representation. As shown in this example, the integer representation -of a Q7 value can be obtained by multiplying the fractional value by -@code{0x1.0p7}. The equivalent for Q15 values is to multiply by -@code{0x1.0p15}. The equivalent for Q31 values is to multiply by -@code{0x1.0p31}. +@node Target Builtins +@section Built-in Functions Specific to Particular Target Machines -The table below lists the @code{v4i8} and @code{v2q15} operations for which -hardware support exists. @code{a} and @code{b} are @code{v4i8} values, -and @code{c} and @code{d} are @code{v2q15} values. +On some target machines, GCC supports many built-in functions specific +to those machines. Generally these generate calls to specific machine +instructions, but allow the compiler to schedule those calls. 
-@multitable @columnfractions .50 .50 -@headitem C code @tab MIPS instruction -@item @code{a + b} @tab @code{addu.qb} -@item @code{c + d} @tab @code{addq.ph} -@item @code{a - b} @tab @code{subu.qb} -@item @code{c - d} @tab @code{subq.ph} -@end multitable +@menu +* AArch64 Built-in Functions:: +* Alpha Built-in Functions:: +* ARC Built-in Functions:: +* ARC SIMD Built-in Functions:: +* ARM iWMMXt Built-in Functions:: +* ARM C Language Extensions (ACLE):: +* ARM Floating Point Status and Control Intrinsics:: +* ARM ARMv8-M Security Extensions:: +* AVR Built-in Functions:: +* Blackfin Built-in Functions:: +* BPF Built-in Functions:: +* FR-V Built-in Functions:: +* LoongArch Base Built-in Functions:: +* LoongArch SX Vector Intrinsics:: +* LoongArch ASX Vector Intrinsics:: +* MIPS DSP Built-in Functions:: +* MIPS Paired-Single Support:: +* MIPS Loongson Built-in Functions:: +* MIPS SIMD Architecture (MSA) Support:: +* Other MIPS Built-in Functions:: +* MSP430 Built-in Functions:: +* NDS32 Built-in Functions:: +* Nvidia PTX Built-in Functions:: +* Basic PowerPC Built-in Functions:: +* PowerPC AltiVec/VSX Built-in Functions:: +* PowerPC Hardware Transactional Memory Built-in Functions:: +* PowerPC Atomic Memory Operation Functions:: +* PowerPC Matrix-Multiply Assist Built-in Functions:: +* PRU Built-in Functions:: +* RISC-V Built-in Functions:: +* RISC-V Vector Intrinsics:: +* CORE-V Built-in Functions:: +* RX Built-in Functions:: +* S/390 System z Built-in Functions:: +* SH Built-in Functions:: +* SPARC VIS Built-in Functions:: +* TI C6X Built-in Functions:: +* x86 Built-in Functions:: +* x86 transactional memory intrinsics:: +* x86 control-flow protection intrinsics:: +@end menu -The table below lists the @code{v2i16} operation for which -hardware support exists for the DSP ASE REV 2. @code{e} and @code{f} are -@code{v2i16} values. +@node AArch64 Built-in Functions +@subsection AArch64 Built-in Functions -@multitable @columnfractions .50 .50 -@headitem C code @tab MIPS instruction -@item @code{e * f} @tab @code{mul.ph} -@end multitable +These built-in functions are available for the AArch64 family of +processors. +@smallexample +unsigned int __builtin_aarch64_get_fpcr (); +void __builtin_aarch64_set_fpcr (unsigned int); +unsigned int __builtin_aarch64_get_fpsr (); +void __builtin_aarch64_set_fpsr (unsigned int); -It is easier to describe the DSP built-in functions if we first define -the following types: +unsigned long long __builtin_aarch64_get_fpcr64 (); +void __builtin_aarch64_set_fpcr64 (unsigned long long); +unsigned long long __builtin_aarch64_get_fpsr64 (); +void __builtin_aarch64_set_fpsr64 (unsigned long long); +@end smallexample + +@node Alpha Built-in Functions +@subsection Alpha Built-in Functions + +These built-in functions are available for the Alpha family of +processors, depending on the command-line switches used. + +The following built-in functions are always available. They +all generate the machine instruction that is part of the name. 
 @smallexample
-typedef int q31;
-typedef int i32;
-typedef unsigned int ui32;
-typedef long long a64;
+long __builtin_alpha_implver (void);
+long __builtin_alpha_rpcc (void);
+long __builtin_alpha_amask (long);
+long __builtin_alpha_cmpbge (long, long);
+long __builtin_alpha_extbl (long, long);
+long __builtin_alpha_extwl (long, long);
+long __builtin_alpha_extll (long, long);
+long __builtin_alpha_extql (long, long);
+long __builtin_alpha_extwh (long, long);
+long __builtin_alpha_extlh (long, long);
+long __builtin_alpha_extqh (long, long);
+long __builtin_alpha_insbl (long, long);
+long __builtin_alpha_inswl (long, long);
+long __builtin_alpha_insll (long, long);
+long __builtin_alpha_insql (long, long);
+long __builtin_alpha_inswh (long, long);
+long __builtin_alpha_inslh (long, long);
+long __builtin_alpha_insqh (long, long);
+long __builtin_alpha_mskbl (long, long);
+long __builtin_alpha_mskwl (long, long);
+long __builtin_alpha_mskll (long, long);
+long __builtin_alpha_mskql (long, long);
+long __builtin_alpha_mskwh (long, long);
+long __builtin_alpha_msklh (long, long);
+long __builtin_alpha_mskqh (long, long);
+long __builtin_alpha_umulh (long, long);
+long __builtin_alpha_zap (long, long);
+long __builtin_alpha_zapnot (long, long);
+@end smallexample
+
+The following built-in functions are always available with @option{-mmax}
+or @option{-mcpu=@var{cpu}} where @var{cpu} is @code{pca56} or
+later. They all generate the machine instruction that is part
+of the name.
+
+@smallexample
+long __builtin_alpha_pklb (long);
+long __builtin_alpha_pkwb (long);
+long __builtin_alpha_unpkbl (long);
+long __builtin_alpha_unpkbw (long);
+long __builtin_alpha_minub8 (long, long);
+long __builtin_alpha_minsb8 (long, long);
+long __builtin_alpha_minuw4 (long, long);
+long __builtin_alpha_minsw4 (long, long);
+long __builtin_alpha_maxub8 (long, long);
+long __builtin_alpha_maxsb8 (long, long);
+long __builtin_alpha_maxuw4 (long, long);
+long __builtin_alpha_maxsw4 (long, long);
+long __builtin_alpha_perr (long, long);
 @end smallexample
-@code{q31} and @code{i32} are actually the same as @code{int}, but we
-use @code{q31} to indicate a Q31 fractional value and @code{i32} to
-indicate a 32-bit integer value. Similarly, @code{a64} is the same as
-@code{long long}, but we use @code{a64} to indicate values that are
-placed in one of the four DSP accumulators (@code{$ac0},
-@code{$ac1}, @code{$ac2} or @code{$ac3}).
-
-Also, some built-in functions prefer or require immediate numbers as
-parameters, because the corresponding DSP instructions accept both immediate
-numbers and register operands, or accept immediate numbers only. The
-immediate parameters are listed as follows.
-
-@smallexample
-imm0_3: 0 to 3.
-imm0_7: 0 to 7.
-imm0_15: 0 to 15.
-imm0_31: 0 to 31.
-imm0_63: 0 to 63.
-imm0_255: 0 to 255.
-imm_n32_31: -32 to 31.
-imm_n512_511: -512 to 511.
-@end smallexample
+
+The following built-in functions are always available with @option{-mcix}
+or @option{-mcpu=@var{cpu}} where @var{cpu} is @code{ev67} or
+later. They all generate the machine instruction that is part
+of the name.
+
+@smallexample
+long __builtin_alpha_cttz (long);
+long __builtin_alpha_ctlz (long);
+long __builtin_alpha_ctpop (long);
+@end smallexample
+
+The following built-in functions are available on systems that use the OSF/1
+PALcode.
Normally they invoke the @code{rduniq} and @code{wruniq} +PAL calls, but when invoked with @option{-mtls-kernel}, they invoke +@code{rdval} and @code{wrval}. @smallexample -v2q15 __builtin_mips_addq_ph (v2q15, v2q15); -v2q15 __builtin_mips_addq_s_ph (v2q15, v2q15); -q31 __builtin_mips_addq_s_w (q31, q31); -v4i8 __builtin_mips_addu_qb (v4i8, v4i8); -v4i8 __builtin_mips_addu_s_qb (v4i8, v4i8); -v2q15 __builtin_mips_subq_ph (v2q15, v2q15); -v2q15 __builtin_mips_subq_s_ph (v2q15, v2q15); -q31 __builtin_mips_subq_s_w (q31, q31); -v4i8 __builtin_mips_subu_qb (v4i8, v4i8); -v4i8 __builtin_mips_subu_s_qb (v4i8, v4i8); -i32 __builtin_mips_addsc (i32, i32); -i32 __builtin_mips_addwc (i32, i32); -i32 __builtin_mips_modsub (i32, i32); -i32 __builtin_mips_raddu_w_qb (v4i8); -v2q15 __builtin_mips_absq_s_ph (v2q15); -q31 __builtin_mips_absq_s_w (q31); -v4i8 __builtin_mips_precrq_qb_ph (v2q15, v2q15); -v2q15 __builtin_mips_precrq_ph_w (q31, q31); -v2q15 __builtin_mips_precrq_rs_ph_w (q31, q31); -v4i8 __builtin_mips_precrqu_s_qb_ph (v2q15, v2q15); -q31 __builtin_mips_preceq_w_phl (v2q15); -q31 __builtin_mips_preceq_w_phr (v2q15); -v2q15 __builtin_mips_precequ_ph_qbl (v4i8); -v2q15 __builtin_mips_precequ_ph_qbr (v4i8); -v2q15 __builtin_mips_precequ_ph_qbla (v4i8); -v2q15 __builtin_mips_precequ_ph_qbra (v4i8); -v2q15 __builtin_mips_preceu_ph_qbl (v4i8); -v2q15 __builtin_mips_preceu_ph_qbr (v4i8); -v2q15 __builtin_mips_preceu_ph_qbla (v4i8); -v2q15 __builtin_mips_preceu_ph_qbra (v4i8); -v4i8 __builtin_mips_shll_qb (v4i8, imm0_7); -v4i8 __builtin_mips_shll_qb (v4i8, i32); -v2q15 __builtin_mips_shll_ph (v2q15, imm0_15); -v2q15 __builtin_mips_shll_ph (v2q15, i32); -v2q15 __builtin_mips_shll_s_ph (v2q15, imm0_15); -v2q15 __builtin_mips_shll_s_ph (v2q15, i32); -q31 __builtin_mips_shll_s_w (q31, imm0_31); -q31 __builtin_mips_shll_s_w (q31, i32); -v4i8 __builtin_mips_shrl_qb (v4i8, imm0_7); -v4i8 __builtin_mips_shrl_qb (v4i8, i32); -v2q15 __builtin_mips_shra_ph (v2q15, imm0_15); -v2q15 __builtin_mips_shra_ph (v2q15, i32); -v2q15 __builtin_mips_shra_r_ph (v2q15, imm0_15); -v2q15 __builtin_mips_shra_r_ph (v2q15, i32); -q31 __builtin_mips_shra_r_w (q31, imm0_31); -q31 __builtin_mips_shra_r_w (q31, i32); -v2q15 __builtin_mips_muleu_s_ph_qbl (v4i8, v2q15); -v2q15 __builtin_mips_muleu_s_ph_qbr (v4i8, v2q15); -v2q15 __builtin_mips_mulq_rs_ph (v2q15, v2q15); -q31 __builtin_mips_muleq_s_w_phl (v2q15, v2q15); -q31 __builtin_mips_muleq_s_w_phr (v2q15, v2q15); -a64 __builtin_mips_dpau_h_qbl (a64, v4i8, v4i8); -a64 __builtin_mips_dpau_h_qbr (a64, v4i8, v4i8); -a64 __builtin_mips_dpsu_h_qbl (a64, v4i8, v4i8); -a64 __builtin_mips_dpsu_h_qbr (a64, v4i8, v4i8); -a64 __builtin_mips_dpaq_s_w_ph (a64, v2q15, v2q15); -a64 __builtin_mips_dpaq_sa_l_w (a64, q31, q31); -a64 __builtin_mips_dpsq_s_w_ph (a64, v2q15, v2q15); -a64 __builtin_mips_dpsq_sa_l_w (a64, q31, q31); -a64 __builtin_mips_mulsaq_s_w_ph (a64, v2q15, v2q15); -a64 __builtin_mips_maq_s_w_phl (a64, v2q15, v2q15); -a64 __builtin_mips_maq_s_w_phr (a64, v2q15, v2q15); -a64 __builtin_mips_maq_sa_w_phl (a64, v2q15, v2q15); -a64 __builtin_mips_maq_sa_w_phr (a64, v2q15, v2q15); -i32 __builtin_mips_bitrev (i32); -i32 __builtin_mips_insv (i32, i32); -v4i8 __builtin_mips_repl_qb (imm0_255); -v4i8 __builtin_mips_repl_qb (i32); -v2q15 __builtin_mips_repl_ph (imm_n512_511); -v2q15 __builtin_mips_repl_ph (i32); -void __builtin_mips_cmpu_eq_qb (v4i8, v4i8); -void __builtin_mips_cmpu_lt_qb (v4i8, v4i8); -void __builtin_mips_cmpu_le_qb (v4i8, v4i8); -i32 __builtin_mips_cmpgu_eq_qb (v4i8, v4i8); 
-i32 __builtin_mips_cmpgu_lt_qb (v4i8, v4i8); -i32 __builtin_mips_cmpgu_le_qb (v4i8, v4i8); -void __builtin_mips_cmp_eq_ph (v2q15, v2q15); -void __builtin_mips_cmp_lt_ph (v2q15, v2q15); -void __builtin_mips_cmp_le_ph (v2q15, v2q15); -v4i8 __builtin_mips_pick_qb (v4i8, v4i8); -v2q15 __builtin_mips_pick_ph (v2q15, v2q15); -v2q15 __builtin_mips_packrl_ph (v2q15, v2q15); -i32 __builtin_mips_extr_w (a64, imm0_31); -i32 __builtin_mips_extr_w (a64, i32); -i32 __builtin_mips_extr_r_w (a64, imm0_31); -i32 __builtin_mips_extr_s_h (a64, i32); -i32 __builtin_mips_extr_rs_w (a64, imm0_31); -i32 __builtin_mips_extr_rs_w (a64, i32); -i32 __builtin_mips_extr_s_h (a64, imm0_31); -i32 __builtin_mips_extr_r_w (a64, i32); -i32 __builtin_mips_extp (a64, imm0_31); -i32 __builtin_mips_extp (a64, i32); -i32 __builtin_mips_extpdp (a64, imm0_31); -i32 __builtin_mips_extpdp (a64, i32); -a64 __builtin_mips_shilo (a64, imm_n32_31); -a64 __builtin_mips_shilo (a64, i32); -a64 __builtin_mips_mthlip (a64, i32); -void __builtin_mips_wrdsp (i32, imm0_63); -i32 __builtin_mips_rddsp (imm0_63); -i32 __builtin_mips_lbux (void *, i32); -i32 __builtin_mips_lhx (void *, i32); -i32 __builtin_mips_lwx (void *, i32); -a64 __builtin_mips_ldx (void *, i32); /* MIPS64 only */ -i32 __builtin_mips_bposge32 (void); -a64 __builtin_mips_madd (a64, i32, i32); -a64 __builtin_mips_maddu (a64, ui32, ui32); -a64 __builtin_mips_msub (a64, i32, i32); -a64 __builtin_mips_msubu (a64, ui32, ui32); -a64 __builtin_mips_mult (i32, i32); -a64 __builtin_mips_multu (ui32, ui32); +void *__builtin_thread_pointer (void); +void __builtin_set_thread_pointer (void *); @end smallexample -The following built-in functions map directly to a particular MIPS DSP REV 2 -instruction. Please refer to the architecture specification -for details on what each instruction does. +@node ARC Built-in Functions +@subsection ARC Built-in Functions + +The following built-in functions are provided for ARC targets. The +built-ins generate the corresponding assembly instructions. In the +examples given below, the generated code often requires an operand or +result to be in a register. Where necessary further code will be +generated to ensure this is true, but for brevity this is not +described in each case. + +@emph{Note:} Using a built-in to generate an instruction not supported +by a target may cause problems. At present the compiler is not +guaranteed to detect such misuse, and as a result an internal compiler +error may be generated. +@defbuiltin{int __builtin_arc_aligned (void *@var{val}, int @var{alignval})} +Return 1 if @var{val} is known to have the byte alignment given +by @var{alignval}, otherwise return 0. 
+Note that this is different from
 @smallexample
-v4q7 __builtin_mips_absq_s_qb (v4q7);
-v2i16 __builtin_mips_addu_ph (v2i16, v2i16);
-v2i16 __builtin_mips_addu_s_ph (v2i16, v2i16);
-v4i8 __builtin_mips_adduh_qb (v4i8, v4i8);
-v4i8 __builtin_mips_adduh_r_qb (v4i8, v4i8);
-i32 __builtin_mips_append (i32, i32, imm0_31);
-i32 __builtin_mips_balign (i32, i32, imm0_3);
-i32 __builtin_mips_cmpgdu_eq_qb (v4i8, v4i8);
-i32 __builtin_mips_cmpgdu_lt_qb (v4i8, v4i8);
-i32 __builtin_mips_cmpgdu_le_qb (v4i8, v4i8);
-a64 __builtin_mips_dpa_w_ph (a64, v2i16, v2i16);
-a64 __builtin_mips_dps_w_ph (a64, v2i16, v2i16);
-v2i16 __builtin_mips_mul_ph (v2i16, v2i16);
-v2i16 __builtin_mips_mul_s_ph (v2i16, v2i16);
-q31 __builtin_mips_mulq_rs_w (q31, q31);
-v2q15 __builtin_mips_mulq_s_ph (v2q15, v2q15);
-q31 __builtin_mips_mulq_s_w (q31, q31);
-a64 __builtin_mips_mulsa_w_ph (a64, v2i16, v2i16);
-v4i8 __builtin_mips_precr_qb_ph (v2i16, v2i16);
-v2i16 __builtin_mips_precr_sra_ph_w (i32, i32, imm0_31);
-v2i16 __builtin_mips_precr_sra_r_ph_w (i32, i32, imm0_31);
-i32 __builtin_mips_prepend (i32, i32, imm0_31);
-v4i8 __builtin_mips_shra_qb (v4i8, imm0_7);
-v4i8 __builtin_mips_shra_r_qb (v4i8, imm0_7);
-v4i8 __builtin_mips_shra_qb (v4i8, i32);
-v4i8 __builtin_mips_shra_r_qb (v4i8, i32);
-v2i16 __builtin_mips_shrl_ph (v2i16, imm0_15);
-v2i16 __builtin_mips_shrl_ph (v2i16, i32);
-v2i16 __builtin_mips_subu_ph (v2i16, v2i16);
-v2i16 __builtin_mips_subu_s_ph (v2i16, v2i16);
-v4i8 __builtin_mips_subuh_qb (v4i8, v4i8);
-v4i8 __builtin_mips_subuh_r_qb (v4i8, v4i8);
-v2q15 __builtin_mips_addqh_ph (v2q15, v2q15);
-v2q15 __builtin_mips_addqh_r_ph (v2q15, v2q15);
-q31 __builtin_mips_addqh_w (q31, q31);
-q31 __builtin_mips_addqh_r_w (q31, q31);
-v2q15 __builtin_mips_subqh_ph (v2q15, v2q15);
-v2q15 __builtin_mips_subqh_r_ph (v2q15, v2q15);
-q31 __builtin_mips_subqh_w (q31, q31);
-q31 __builtin_mips_subqh_r_w (q31, q31);
-a64 __builtin_mips_dpax_w_ph (a64, v2i16, v2i16);
-a64 __builtin_mips_dpsx_w_ph (a64, v2i16, v2i16);
-a64 __builtin_mips_dpaqx_s_w_ph (a64, v2q15, v2q15);
-a64 __builtin_mips_dpaqx_sa_w_ph (a64, v2q15, v2q15);
-a64 __builtin_mips_dpsqx_s_w_ph (a64, v2q15, v2q15);
-a64 __builtin_mips_dpsqx_sa_w_ph (a64, v2q15, v2q15);
+__alignof__(*(char *)@var{val}) >= @var{alignval}
 @end smallexample
+because @code{__alignof__} sees only the type of the dereference, whereas
+@code{__builtin_arc_aligned} uses alignment information from the pointer
+as well as from the pointed-to type.
+The information available will depend on the optimization level.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_arc_brk (void)}
+Generates
+@example
+brk
+@end example
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_arc_core_read (unsigned int @var{regno})}
+The operand is the number of a register to be read. Generates:
+@example
+mov @var{dest}, r@var{regno}
+@end example
+where the value in @var{dest} will be the result returned from the
+built-in.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_arc_core_write (unsigned int @var{regno}, unsigned int @var{val})}
+The first operand is the number of a register to be written, the
+second operand is a compile time constant to write into that
+register. Generates:
+@example
+mov r@var{regno}, @var{val}
+@end example
+@enddefbuiltin
+
+@defbuiltin{int __builtin_arc_divaw (int @var{a}, int @var{b})}
+Only available if either @option{-mcpu=ARC700} or @option{-meA} is set.
+Generates:
+@example
+divaw @var{dest}, @var{a}, @var{b}
+@end example
+where the value in @var{dest} will be the result returned from the
+built-in.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_arc_flag (unsigned int @var{a})}
+Generates
+@example
+flag @var{a}
+@end example
+@enddefbuiltin
+
+@defbuiltin{{unsigned int} __builtin_arc_lr (unsigned int @var{auxr})}
+The operand, @var{auxr}, is the address of an auxiliary register and
+must be a compile time constant. Generates:
+@example
+lr @var{dest}, [@var{auxr}]
+@end example
+Where the value in @var{dest} will be the result returned from the
+built-in.
+@enddefbuiltin
-
-@node MIPS Paired-Single Support
-@subsection MIPS Paired-Single Support
+
+@defbuiltin{void __builtin_arc_mul64 (int @var{a}, int @var{b})}
+Only available with @option{-mmul64}. Generates:
+@example
+mul64 @var{a}, @var{b}
+@end example
+@enddefbuiltin
-The MIPS64 architecture includes a number of instructions that
-operate on pairs of single-precision floating-point values.
-Each pair is packed into a 64-bit floating-point register,
-with one element being designated the ``upper half'' and
-the other being designated the ``lower half''.
+
+@defbuiltin{void __builtin_arc_mulu64 (unsigned int @var{a}, unsigned int @var{b})}
+Only available with @option{-mmul64}. Generates:
+@example
+mulu64 @var{a}, @var{b}
+@end example
+@enddefbuiltin
-GCC supports paired-single operations using both the generic
-vector extensions (@pxref{Vector Extensions}) and a collection of
-MIPS-specific built-in functions. Both kinds of support are
-enabled by the @option{-mpaired-single} command-line option.
+
+@defbuiltin{void __builtin_arc_nop (void)}
+Generates:
+@example
+nop
+@end example
+@enddefbuiltin
-The vector type associated with paired-single values is usually
-called @code{v2sf}. It can be defined in C as follows:
+
+@defbuiltin{int __builtin_arc_norm (int @var{src})}
+Only valid if the @samp{norm} instruction is available through the
+@option{-mnorm} option or by default with @option{-mcpu=ARC700}.
+Generates:
+@example
+norm @var{dest}, @var{src}
+@end example
+Where the value in @var{dest} will be the result returned from the
+built-in.
+@enddefbuiltin
-@smallexample
-typedef float v2sf __attribute__ ((vector_size (8)));
-@end smallexample
+
+@defbuiltin{{short int} __builtin_arc_normw (short int @var{src})}
+Only valid if the @samp{normw} instruction is available through the
+@option{-mnorm} option or by default with @option{-mcpu=ARC700}.
+Generates:
+@example
+normw @var{dest}, @var{src}
+@end example
+Where the value in @var{dest} will be the result returned from the
+built-in.
+@enddefbuiltin
-@code{v2sf} values are initialized in the same way as aggregates.
-For example:
+
+@defbuiltin{void __builtin_arc_rtie (void)}
+Generates:
+@example
+rtie
+@end example
+@enddefbuiltin
-@smallexample
-v2sf a = @{1.5, 9.1@};
-v2sf b;
-float e, f;
-b = (v2sf) @{e, f@};
-@end smallexample
+
+@defbuiltin{void __builtin_arc_sleep (int @var{a})}
+Generates:
+@example
+sleep @var{a}
+@end example
+@enddefbuiltin
-@emph{Note:} The CPU's endianness determines which value is stored in
-the upper half of a register and which value is stored in the lower half.
-On little-endian targets, the first value is the lower one and the second
-value is the upper one. The opposite order applies to big-endian targets.
-For example, the code above sets the lower half of @code{a} to
-@code{1.5} on little-endian targets and @code{9.1} on big-endian targets.
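+
+As a brief illustration of these built-ins, @code{__builtin_arc_lr} can
+be used to read an auxiliary register directly. A minimal sketch; it
+assumes, purely for illustration, that auxiliary register @code{0x0a}
+is the @code{STATUS32} register:
+
+@smallexample
+unsigned int status = __builtin_arc_lr (0x0a);
+@end smallexample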
+@defbuiltin{void __builtin_arc_sr (unsigned int @var{val}, unsigned int @var{auxr})} +The first argument, @var{val}, is a compile time constant to be +written to the register, the second argument, @var{auxr}, is the +address of an auxiliary register. Generates: +@example +sr @var{val}, [@var{auxr}] +@end example +@enddefbuiltin -@node MIPS Loongson Built-in Functions -@subsection MIPS Loongson Built-in Functions +@defbuiltin{int __builtin_arc_swap (int @var{src})} +Only valid with @option{-mswap}. Generates: +@example +swap @var{dest}, @var{src} +@end example +Where the value in @var{dest} will be the result returned from the +built-in. +@enddefbuiltin -GCC provides intrinsics to access the SIMD instructions provided by the -ST Microelectronics Loongson-2E and -2F processors. These intrinsics, -available after inclusion of the @code{loongson.h} header file, -operate on the following 64-bit vector types: +@defbuiltin{void __builtin_arc_swi (void)} +Generates: +@example +swi +@end example +@enddefbuiltin -@itemize -@item @code{uint8x8_t}, a vector of eight unsigned 8-bit integers; -@item @code{uint16x4_t}, a vector of four unsigned 16-bit integers; -@item @code{uint32x2_t}, a vector of two unsigned 32-bit integers; -@item @code{int8x8_t}, a vector of eight signed 8-bit integers; -@item @code{int16x4_t}, a vector of four signed 16-bit integers; -@item @code{int32x2_t}, a vector of two signed 32-bit integers. -@end itemize +@defbuiltin{void __builtin_arc_sync (void)} +Only available with @option{-mcpu=ARC700}. Generates: +@example +sync +@end example +@enddefbuiltin -The intrinsics provided are listed below; each is named after the -machine instruction to which it corresponds, with suffixes added as -appropriate to distinguish intrinsics that expand to the same machine -instruction yet have different argument types. Refer to the architecture -documentation for a description of the functionality of each -instruction. +@defbuiltin{void __builtin_arc_trap_s (unsigned int @var{c})} +Only available with @option{-mcpu=ARC700}. 
Generates: +@example +trap_s @var{c} +@end example +@enddefbuiltin -@smallexample -int16x4_t packsswh (int32x2_t s, int32x2_t t); -int8x8_t packsshb (int16x4_t s, int16x4_t t); -uint8x8_t packushb (uint16x4_t s, uint16x4_t t); -uint32x2_t paddw_u (uint32x2_t s, uint32x2_t t); -uint16x4_t paddh_u (uint16x4_t s, uint16x4_t t); -uint8x8_t paddb_u (uint8x8_t s, uint8x8_t t); -int32x2_t paddw_s (int32x2_t s, int32x2_t t); -int16x4_t paddh_s (int16x4_t s, int16x4_t t); -int8x8_t paddb_s (int8x8_t s, int8x8_t t); -uint64_t paddd_u (uint64_t s, uint64_t t); -int64_t paddd_s (int64_t s, int64_t t); -int16x4_t paddsh (int16x4_t s, int16x4_t t); -int8x8_t paddsb (int8x8_t s, int8x8_t t); -uint16x4_t paddush (uint16x4_t s, uint16x4_t t); -uint8x8_t paddusb (uint8x8_t s, uint8x8_t t); -uint64_t pandn_ud (uint64_t s, uint64_t t); -uint32x2_t pandn_uw (uint32x2_t s, uint32x2_t t); -uint16x4_t pandn_uh (uint16x4_t s, uint16x4_t t); -uint8x8_t pandn_ub (uint8x8_t s, uint8x8_t t); -int64_t pandn_sd (int64_t s, int64_t t); -int32x2_t pandn_sw (int32x2_t s, int32x2_t t); -int16x4_t pandn_sh (int16x4_t s, int16x4_t t); -int8x8_t pandn_sb (int8x8_t s, int8x8_t t); -uint16x4_t pavgh (uint16x4_t s, uint16x4_t t); -uint8x8_t pavgb (uint8x8_t s, uint8x8_t t); -uint32x2_t pcmpeqw_u (uint32x2_t s, uint32x2_t t); -uint16x4_t pcmpeqh_u (uint16x4_t s, uint16x4_t t); -uint8x8_t pcmpeqb_u (uint8x8_t s, uint8x8_t t); -int32x2_t pcmpeqw_s (int32x2_t s, int32x2_t t); -int16x4_t pcmpeqh_s (int16x4_t s, int16x4_t t); -int8x8_t pcmpeqb_s (int8x8_t s, int8x8_t t); -uint32x2_t pcmpgtw_u (uint32x2_t s, uint32x2_t t); -uint16x4_t pcmpgth_u (uint16x4_t s, uint16x4_t t); -uint8x8_t pcmpgtb_u (uint8x8_t s, uint8x8_t t); -int32x2_t pcmpgtw_s (int32x2_t s, int32x2_t t); -int16x4_t pcmpgth_s (int16x4_t s, int16x4_t t); -int8x8_t pcmpgtb_s (int8x8_t s, int8x8_t t); -uint16x4_t pextrh_u (uint16x4_t s, int field); -int16x4_t pextrh_s (int16x4_t s, int field); -uint16x4_t pinsrh_0_u (uint16x4_t s, uint16x4_t t); -uint16x4_t pinsrh_1_u (uint16x4_t s, uint16x4_t t); -uint16x4_t pinsrh_2_u (uint16x4_t s, uint16x4_t t); -uint16x4_t pinsrh_3_u (uint16x4_t s, uint16x4_t t); -int16x4_t pinsrh_0_s (int16x4_t s, int16x4_t t); -int16x4_t pinsrh_1_s (int16x4_t s, int16x4_t t); -int16x4_t pinsrh_2_s (int16x4_t s, int16x4_t t); -int16x4_t pinsrh_3_s (int16x4_t s, int16x4_t t); -int32x2_t pmaddhw (int16x4_t s, int16x4_t t); -int16x4_t pmaxsh (int16x4_t s, int16x4_t t); -uint8x8_t pmaxub (uint8x8_t s, uint8x8_t t); -int16x4_t pminsh (int16x4_t s, int16x4_t t); -uint8x8_t pminub (uint8x8_t s, uint8x8_t t); -uint8x8_t pmovmskb_u (uint8x8_t s); -int8x8_t pmovmskb_s (int8x8_t s); -uint16x4_t pmulhuh (uint16x4_t s, uint16x4_t t); -int16x4_t pmulhh (int16x4_t s, int16x4_t t); -int16x4_t pmullh (int16x4_t s, int16x4_t t); -int64_t pmuluw (uint32x2_t s, uint32x2_t t); -uint8x8_t pasubub (uint8x8_t s, uint8x8_t t); -uint16x4_t biadd (uint8x8_t s); -uint16x4_t psadbh (uint8x8_t s, uint8x8_t t); -uint16x4_t pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order); -int16x4_t pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order); -uint16x4_t psllh_u (uint16x4_t s, uint8_t amount); -int16x4_t psllh_s (int16x4_t s, uint8_t amount); -uint32x2_t psllw_u (uint32x2_t s, uint8_t amount); -int32x2_t psllw_s (int32x2_t s, uint8_t amount); -uint16x4_t psrlh_u (uint16x4_t s, uint8_t amount); -int16x4_t psrlh_s (int16x4_t s, uint8_t amount); -uint32x2_t psrlw_u (uint32x2_t s, uint8_t amount); -int32x2_t psrlw_s (int32x2_t s, uint8_t amount); -uint16x4_t psrah_u (uint16x4_t s, 
uint8_t amount);
-int16x4_t psrah_s (int16x4_t s, uint8_t amount);
-uint32x2_t psraw_u (uint32x2_t s, uint8_t amount);
-int32x2_t psraw_s (int32x2_t s, uint8_t amount);
-uint32x2_t psubw_u (uint32x2_t s, uint32x2_t t);
-uint16x4_t psubh_u (uint16x4_t s, uint16x4_t t);
-uint8x8_t psubb_u (uint8x8_t s, uint8x8_t t);
-int32x2_t psubw_s (int32x2_t s, int32x2_t t);
-int16x4_t psubh_s (int16x4_t s, int16x4_t t);
-int8x8_t psubb_s (int8x8_t s, int8x8_t t);
-uint64_t psubd_u (uint64_t s, uint64_t t);
-int64_t psubd_s (int64_t s, int64_t t);
-int16x4_t psubsh (int16x4_t s, int16x4_t t);
-int8x8_t psubsb (int8x8_t s, int8x8_t t);
-uint16x4_t psubush (uint16x4_t s, uint16x4_t t);
-uint8x8_t psubusb (uint8x8_t s, uint8x8_t t);
-uint32x2_t punpckhwd_u (uint32x2_t s, uint32x2_t t);
-uint16x4_t punpckhhw_u (uint16x4_t s, uint16x4_t t);
-uint8x8_t punpckhbh_u (uint8x8_t s, uint8x8_t t);
-int32x2_t punpckhwd_s (int32x2_t s, int32x2_t t);
-int16x4_t punpckhhw_s (int16x4_t s, int16x4_t t);
-int8x8_t punpckhbh_s (int8x8_t s, int8x8_t t);
-uint32x2_t punpcklwd_u (uint32x2_t s, uint32x2_t t);
-uint16x4_t punpcklhw_u (uint16x4_t s, uint16x4_t t);
-uint8x8_t punpcklbh_u (uint8x8_t s, uint8x8_t t);
-int32x2_t punpcklwd_s (int32x2_t s, int32x2_t t);
-int16x4_t punpcklhw_s (int16x4_t s, int16x4_t t);
-int8x8_t punpcklbh_s (int8x8_t s, int8x8_t t);
-@end smallexample

+@defbuiltin{void __builtin_arc_unimp_s (void)}
+Only available with @option{-mcpu=ARC700}.  Generates:
+@example
+unimp_s
+@end example
+@enddefbuiltin

-@menu
-* Paired-Single Arithmetic::
-* Paired-Single Built-in Functions::
-* MIPS-3D Built-in Functions::
-@end menu

+The instructions generated by the following builtins are not
+considered candidates for scheduling.  They are not moved around by
+the compiler during scheduling, and thus can be expected to appear
+where they are placed in the C code:
+@example
+__builtin_arc_brk()
+__builtin_arc_core_read()
+__builtin_arc_core_write()
+__builtin_arc_flag()
+__builtin_arc_lr()
+__builtin_arc_sleep()
+__builtin_arc_sr()
+__builtin_arc_swi()
+@end example

-@node Paired-Single Arithmetic
-@subsubsection Paired-Single Arithmetic

+The following built-in functions are available for the ARCv2 family of
+processors.

-The table below lists the @code{v2sf} operations for which hardware
-support exists.  @code{a}, @code{b} and @code{c} are @code{v2sf}
-values and @code{x} is an integral value.

+@example
+int __builtin_arc_clri ();
+void __builtin_arc_kflag (unsigned);
+void __builtin_arc_seti (int);
+@end example

-@multitable @columnfractions .50 .50
-@headitem C code @tab MIPS instruction
-@item @code{a + b} @tab @code{add.ps}
-@item @code{a - b} @tab @code{sub.ps}
-@item @code{-a} @tab @code{neg.ps}
-@item @code{a * b} @tab @code{mul.ps}
-@item @code{a * b + c} @tab @code{madd.ps}
-@item @code{a * b - c} @tab @code{msub.ps}
-@item @code{-(a * b + c)} @tab @code{nmadd.ps}
-@item @code{-(a * b - c)} @tab @code{nmsub.ps}
-@item @code{x ? a : b} @tab @code{movn.ps}/@code{movz.ps}
-@end multitable

-Note that the multiply-accumulate instructions can be disabled
-using the command-line option @code{-mno-fused-madd}.
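+As an illustration of the built-ins just listed, the following sketch
+shows one plausible save/restore pattern.  It assumes the usual ARCv2
+@code{clri}/@code{seti} semantics, namely that @code{__builtin_arc_clri}
+disables interrupts and returns the prior interrupt state, and that
+@code{__builtin_arc_seti} restores a state previously returned by
+@code{__builtin_arc_clri}; check the architecture documentation before
+relying on this pattern:
+
+@example
+int status = __builtin_arc_clri ();  /* Disable interrupts, save state.  */
+/* ... short critical section ...  */
+__builtin_arc_seti (status);         /* Restore the saved state.  */
+@end example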
+The following built-in functions are available for the ARCv2 family of
+processors when @option{-mnorm} is in effect.
+@example
+int __builtin_arc_ffs (int);
+int __builtin_arc_fls (int);
+@end example

-@node Paired-Single Built-in Functions
-@subsubsection Paired-Single Built-in Functions

+@node ARC SIMD Built-in Functions
+@subsection ARC SIMD Built-in Functions

-The following paired-single functions map directly to a particular
-MIPS instruction.  Please refer to the architecture specification
-for details on what each instruction does.

+The compiler provides SIMD built-ins that can be used to generate
+vector instructions.  This section describes the available built-ins
+and their use in programs.  With the @option{-msimd} option, the
+compiler provides 128-bit vector types, which can be specified using
+the @code{vector_size} attribute.  The header file @file{arc-simd.h}
+can be included to use the following predefined types:
+@example
+typedef int __v4si __attribute__((vector_size(16)));
+typedef short __v8hi __attribute__((vector_size(16)));
+@end example

-@table @code
-@item v2sf __builtin_mips_pll_ps (v2sf, v2sf)
-Pair lower lower (@code{pll.ps}).

+These types can be used to define 128-bit variables.  The built-in
+functions listed in the following section can be used on these
+variables to generate vector operations.

-@item v2sf __builtin_mips_pul_ps (v2sf, v2sf)
-Pair upper lower (@code{pul.ps}).

+For each built-in function @code{__builtin_arc_@var{someinsn}}, the
+header file @file{arc-simd.h} also provides an equivalent macro
+@code{_@var{someinsn}} that can be used for ease of programming and
+improved readability.  The following macros for DMA control are also
+provided:
+@example
+#define _setup_dma_in_channel_reg _vdiwr
+#define _setup_dma_out_channel_reg _vdowr
+@end example

-@item v2sf __builtin_mips_plu_ps (v2sf, v2sf)
-Pair lower upper (@code{plu.ps}).

+The following is a complete list of all the SIMD built-ins provided
+for ARC, grouped by calling signature.  A brief usage sketch is shown
+first.

-@item v2sf __builtin_mips_puu_ps (v2sf, v2sf)
-Pair upper upper (@code{puu.ps}).
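+As a purely illustrative fragment (assuming only the @code{__v8hi}
+type shown above, that @file{arc-simd.h} is included as
+@code{<arc-simd.h>}, and that @code{__builtin_arc_vaddw} performs an
+element-wise add, as its name suggests):
+
+@example
+#include <arc-simd.h>
+
+__v8hi
+double_elements (__v8hi v)
+@{
+  /* Element-wise add of v to itself, doubling each element.  */
+  return __builtin_arc_vaddw (v, v);
+@}
+@end example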
+The following take two @code{__v8hi} arguments and return a +@code{__v8hi} result: +@example +__v8hi __builtin_arc_vaddaw (__v8hi, __v8hi); +__v8hi __builtin_arc_vaddw (__v8hi, __v8hi); +__v8hi __builtin_arc_vand (__v8hi, __v8hi); +__v8hi __builtin_arc_vandaw (__v8hi, __v8hi); +__v8hi __builtin_arc_vavb (__v8hi, __v8hi); +__v8hi __builtin_arc_vavrb (__v8hi, __v8hi); +__v8hi __builtin_arc_vbic (__v8hi, __v8hi); +__v8hi __builtin_arc_vbicaw (__v8hi, __v8hi); +__v8hi __builtin_arc_vdifaw (__v8hi, __v8hi); +__v8hi __builtin_arc_vdifw (__v8hi, __v8hi); +__v8hi __builtin_arc_veqw (__v8hi, __v8hi); +__v8hi __builtin_arc_vh264f (__v8hi, __v8hi); +__v8hi __builtin_arc_vh264ft (__v8hi, __v8hi); +__v8hi __builtin_arc_vh264fw (__v8hi, __v8hi); +__v8hi __builtin_arc_vlew (__v8hi, __v8hi); +__v8hi __builtin_arc_vltw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmaxaw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmaxw (__v8hi, __v8hi); +__v8hi __builtin_arc_vminaw (__v8hi, __v8hi); +__v8hi __builtin_arc_vminw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr1aw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr1w (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr2aw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr2w (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr3aw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr3w (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr4aw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr4w (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr5aw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr5w (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr6aw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr6w (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr7aw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmr7w (__v8hi, __v8hi); +__v8hi __builtin_arc_vmrb (__v8hi, __v8hi); +__v8hi __builtin_arc_vmulaw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmulfaw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmulfw (__v8hi, __v8hi); +__v8hi __builtin_arc_vmulw (__v8hi, __v8hi); +__v8hi __builtin_arc_vnew (__v8hi, __v8hi); +__v8hi __builtin_arc_vor (__v8hi, __v8hi); +__v8hi __builtin_arc_vsubaw (__v8hi, __v8hi); +__v8hi __builtin_arc_vsubw (__v8hi, __v8hi); +__v8hi __builtin_arc_vsummw (__v8hi, __v8hi); +__v8hi __builtin_arc_vvc1f (__v8hi, __v8hi); +__v8hi __builtin_arc_vvc1ft (__v8hi, __v8hi); +__v8hi __builtin_arc_vxor (__v8hi, __v8hi); +__v8hi __builtin_arc_vxoraw (__v8hi, __v8hi); +@end example -@item v2sf __builtin_mips_cvt_ps_s (float, float) -Convert pair to paired single (@code{cvt.ps.s}). +The following take one @code{__v8hi} and one @code{int} argument and return a +@code{__v8hi} result: -@item float __builtin_mips_cvt_s_pl (v2sf) -Convert pair lower to single (@code{cvt.s.pl}). +@example +__v8hi __builtin_arc_vbaddw (__v8hi, int); +__v8hi __builtin_arc_vbmaxw (__v8hi, int); +__v8hi __builtin_arc_vbminw (__v8hi, int); +__v8hi __builtin_arc_vbmulaw (__v8hi, int); +__v8hi __builtin_arc_vbmulfw (__v8hi, int); +__v8hi __builtin_arc_vbmulw (__v8hi, int); +__v8hi __builtin_arc_vbrsubw (__v8hi, int); +__v8hi __builtin_arc_vbsubw (__v8hi, int); +@end example -@item float __builtin_mips_cvt_s_pu (v2sf) -Convert pair upper to single (@code{cvt.s.pu}). +The following take one @code{__v8hi} argument and one @code{int} argument which +must be a 3-bit compile time constant indicating a register number +I0-I7. They return a @code{__v8hi} result. +@example +__v8hi __builtin_arc_vasrw (__v8hi, const int); +__v8hi __builtin_arc_vsr8 (__v8hi, const int); +__v8hi __builtin_arc_vsr8aw (__v8hi, const int); +@end example -@item v2sf __builtin_mips_abs_ps (v2sf) -Absolute value (@code{abs.ps}). 
+The following take one @code{__v8hi} argument and one @code{int}
+argument which must be a 6-bit compile time constant.  They return a
+@code{__v8hi} result.
+@example
+__v8hi __builtin_arc_vasrpwbi (__v8hi, const int);
+__v8hi __builtin_arc_vasrrpwbi (__v8hi, const int);
+__v8hi __builtin_arc_vasrrwi (__v8hi, const int);
+__v8hi __builtin_arc_vasrsrwi (__v8hi, const int);
+__v8hi __builtin_arc_vasrwi (__v8hi, const int);
+__v8hi __builtin_arc_vsr8awi (__v8hi, const int);
+__v8hi __builtin_arc_vsr8i (__v8hi, const int);
+@end example

-@item v2sf __builtin_mips_alnv_ps (v2sf, v2sf, int)
-Align variable (@code{alnv.ps}).

+The following take one @code{__v8hi} argument and one @code{int} argument which
+must be an 8-bit compile time constant.  They return a @code{__v8hi}
+result.
+@example
+__v8hi __builtin_arc_vd6tapf (__v8hi, const int);
+__v8hi __builtin_arc_vmvaw (__v8hi, const int);
+__v8hi __builtin_arc_vmvw (__v8hi, const int);
+__v8hi __builtin_arc_vmvzw (__v8hi, const int);
+@end example

-@emph{Note:} The value of the third parameter must be 0 or 4
-modulo 8, otherwise the result is unpredictable.  Please read the
-instruction description for details.
-@end table

+The following take two @code{int} arguments, the second of which
+must be an 8-bit compile time constant.  They return a @code{__v8hi}
+result:
+@example
+__v8hi __builtin_arc_vmovaw (int, const int);
+__v8hi __builtin_arc_vmovw (int, const int);
+__v8hi __builtin_arc_vmovzw (int, const int);
+@end example

-The following multi-instruction functions are also available.
-In each case, @var{cond} can be any of the 16 floating-point conditions:
-@code{f}, @code{un}, @code{eq}, @code{ueq}, @code{olt}, @code{ult},
-@code{ole}, @code{ule}, @code{sf}, @code{ngle}, @code{seq}, @code{ngl},
-@code{lt}, @code{nge}, @code{le} or @code{ngt}.

+The following take a single @code{__v8hi} argument and return a
+@code{__v8hi} result:
+@example
+__v8hi __builtin_arc_vabsaw (__v8hi);
+__v8hi __builtin_arc_vabsw (__v8hi);
+__v8hi __builtin_arc_vaddsuw (__v8hi);
+__v8hi __builtin_arc_vexch1 (__v8hi);
+__v8hi __builtin_arc_vexch2 (__v8hi);
+__v8hi __builtin_arc_vexch4 (__v8hi);
+__v8hi __builtin_arc_vsignw (__v8hi);
+__v8hi __builtin_arc_vupbaw (__v8hi);
+__v8hi __builtin_arc_vupbw (__v8hi);
+__v8hi __builtin_arc_vupsbaw (__v8hi);
+__v8hi __builtin_arc_vupsbw (__v8hi);
+@end example
+
+The following take two @code{int} arguments and return no result:
+@example
+void __builtin_arc_vdirun (int, int);
+void __builtin_arc_vdorun (int, int);
+@end example
+
+The following take two @code{int} arguments and return no result.  The
+first argument must be a 3-bit compile time constant indicating one of
+the DR0-DR7 DMA setup channels:
+@example
+void __builtin_arc_vdiwr (const int, int);
+void __builtin_arc_vdowr (const int, int);
+@end example
+
+The following take an @code{int} argument and return no result:
+@example
+void __builtin_arc_vendrec (int);
+void __builtin_arc_vrec (int);
+void __builtin_arc_vrecrun (int);
+void __builtin_arc_vrun (int);
+@end example
+
+The following take a @code{__v8hi} argument and two @code{int}
+arguments and return a @code{__v8hi} result.  The second argument must
+be a 3-bit compile time constant, indicating one of the registers I0-I7,
+and the third argument must be an 8-bit compile time constant.
+
+@emph{Note:} Although the equivalent hardware instructions do not take
+a SIMD register as an operand, these builtins overwrite the relevant
+bits of the @code{__v8hi} register provided as the first argument with
+the value loaded from the @code{[Ib, u8]} location in the SDM.

-@table @code
-@item v2sf __builtin_mips_movt_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-@itemx v2sf __builtin_mips_movf_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-Conditional move based on floating-point comparison (@code{c.@var{cond}.ps},
-@code{movt.ps}/@code{movf.ps}).

+@example
+__v8hi __builtin_arc_vld32 (__v8hi, const int, const int);
+__v8hi __builtin_arc_vld32wh (__v8hi, const int, const int);
+__v8hi __builtin_arc_vld32wl (__v8hi, const int, const int);
+__v8hi __builtin_arc_vld64 (__v8hi, const int, const int);
+@end example

-The @code{movt} functions return the value @var{x} computed by:

+The following take two @code{int} arguments and return a @code{__v8hi}
+result.  The first argument must be a 3-bit compile time constant,
+indicating one of the registers I0-I7, and the second argument must be an
+8-bit compile time constant.

-@smallexample
-c.@var{cond}.ps @var{cc},@var{a},@var{b}
-mov.ps @var{x},@var{c}
-movt.ps @var{x},@var{d},@var{cc}
-@end smallexample

+@example
+__v8hi __builtin_arc_vld128 (const int, const int);
+__v8hi __builtin_arc_vld64w (const int, const int);
+@end example

-The @code{movf} functions are similar but use @code{movf.ps} instead
-of @code{movt.ps}.

+The following take a @code{__v8hi} argument and two @code{int}
+arguments and return no result.  The second argument must be a 3-bit
+compile time constant, indicating one of the registers I0-I7, and the
+third argument must be an 8-bit compile time constant.

-@item int __builtin_mips_upper_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-@itemx int __builtin_mips_lower_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-Comparison of two paired-single values (@code{c.@var{cond}.ps},
-@code{bc1t}/@code{bc1f}).

+@example
+void __builtin_arc_vst128 (__v8hi, const int, const int);
+void __builtin_arc_vst64 (__v8hi, const int, const int);
+@end example

-These functions compare @var{a} and @var{b} using @code{c.@var{cond}.ps}
-and return either the upper or lower half of the result.  For example:

+The following take a @code{__v8hi} argument and three @code{int}
+arguments and return no result.  The second argument must be a 3-bit
+compile-time constant, identifying the 16-bit sub-register to be
+stored, the third argument must be a 3-bit compile time constant,
+indicating one of the registers I0-I7, and the fourth argument must be an
+8-bit compile time constant.

-@smallexample
-v2sf a, b;
-if (__builtin_mips_upper_c_eq_ps (a, b))
-  upper_halves_are_equal ();
-else
-  upper_halves_are_unequal ();
-
-if (__builtin_mips_lower_c_eq_ps (a, b))
-  lower_halves_are_equal ();
-else
-  lower_halves_are_unequal ();
-@end smallexample
-@end table

+@example
+void __builtin_arc_vst16_n (__v8hi, const int, const int, const int);
+void __builtin_arc_vst32_n (__v8hi, const int, const int, const int);
+@end example
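+For illustration only, the following sketch combines the vector load,
+ALU and store built-ins documented above, using only the signatures
+given here.  The constant @code{2} names one of the I0-I7 registers and
+@code{0x10} is an 8-bit offset; the exact data movement through the SDM
+is described in the architecture manual:
+
+@example
+void
+scale_block (void)
+@{
+  __v8hi v = __builtin_arc_vld128 (2, 0x10);  /* Load from [I2, 0x10].  */
+  v = __builtin_arc_vaddw (v, v);             /* Element-wise add.  */
+  __builtin_arc_vst128 (v, 2, 0x10);          /* Store the result back.  */
+@}
+@end example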
+The following built-in functions are available on systems that use
+@option{-mmpy-option=6} or higher.

-@node MIPS-3D Built-in Functions
-@subsubsection MIPS-3D Built-in Functions

+@example
+__v2hi __builtin_arc_dmach (__v2hi, __v2hi);
+__v2hi __builtin_arc_dmachu (__v2hi, __v2hi);
+__v2hi __builtin_arc_dmpyh (__v2hi, __v2hi);
+__v2hi __builtin_arc_dmpyhu (__v2hi, __v2hi);
+__v2hi __builtin_arc_vaddsub2h (__v2hi, __v2hi);
+__v2hi __builtin_arc_vsubadd2h (__v2hi, __v2hi);
+@end example

-The MIPS-3D Application-Specific Extension (ASE) includes additional
-paired-single instructions that are designed to improve the performance
-of 3D graphics operations.  Support for these instructions is controlled
-by the @option{-mips3d} command-line option.

+The following built-in functions are available on systems that use
+@option{-mmpy-option=7} or higher.

-The functions listed below map directly to a particular MIPS-3D
-instruction.  Please refer to the architecture specification for
-more details on what each instruction does.

+@example
+__v2si __builtin_arc_vmac2h (__v2hi, __v2hi);
+__v2si __builtin_arc_vmac2hu (__v2hi, __v2hi);
+__v2si __builtin_arc_vmpy2h (__v2hi, __v2hi);
+__v2si __builtin_arc_vmpy2hu (__v2hi, __v2hi);
+@end example

-@table @code
-@item v2sf __builtin_mips_addr_ps (v2sf, v2sf)
-Reduction add (@code{addr.ps}).

+The following built-in functions are available on systems that use
+@option{-mmpy-option=8} or higher.

-@item v2sf __builtin_mips_mulr_ps (v2sf, v2sf)
-Reduction multiply (@code{mulr.ps}).

+@example
+long long __builtin_arc_qmach (__v4hi, __v4hi);
+long long __builtin_arc_qmachu (__v4hi, __v4hi);
+long long __builtin_arc_qmpyh (__v4hi, __v4hi);
+long long __builtin_arc_qmpyhu (__v4hi, __v4hi);
+long long __builtin_arc_dmacwh (__v2si, __v2hi);
+long long __builtin_arc_dmacwhu (__v2si, __v2hi);
+__v2si __builtin_arc_vaddsub (__v2si, __v2si);
+__v2si __builtin_arc_vsubadd (__v2si, __v2si);
+__v4hi __builtin_arc_vaddsub4h (__v4hi, __v4hi);
+__v4hi __builtin_arc_vsubadd4h (__v4hi, __v4hi);
+@end example

-@item v2sf __builtin_mips_cvt_pw_ps (v2sf)
-Convert paired single to paired word (@code{cvt.pw.ps}).

+@node ARM iWMMXt Built-in Functions
+@subsection ARM iWMMXt Built-in Functions

-@item v2sf __builtin_mips_cvt_ps_pw (v2sf)
-Convert paired word to paired single (@code{cvt.ps.pw}).

+These built-in functions are available for the ARM family of
+processors when the @option{-mcpu=iwmmxt} switch is used:

-@item float __builtin_mips_recip1_s (float)
-@itemx double __builtin_mips_recip1_d (double)
-@itemx v2sf __builtin_mips_recip1_ps (v2sf)
-Reduced-precision reciprocal (sequence step 1) (@code{recip1.@var{fmt}}).

+@smallexample
+typedef int v2si __attribute__ ((vector_size (8)));
+typedef short v4hi __attribute__ ((vector_size (8)));
+typedef char v8qi __attribute__ ((vector_size (8)));

-@item float __builtin_mips_recip2_s (float, float)
-@itemx double __builtin_mips_recip2_d (double, double)
-@itemx v2sf __builtin_mips_recip2_ps (v2sf, v2sf)
-Reduced-precision reciprocal (sequence step 2) (@code{recip2.@var{fmt}}).
+int __builtin_arm_getwcgr0 (void); +void __builtin_arm_setwcgr0 (int); +int __builtin_arm_getwcgr1 (void); +void __builtin_arm_setwcgr1 (int); +int __builtin_arm_getwcgr2 (void); +void __builtin_arm_setwcgr2 (int); +int __builtin_arm_getwcgr3 (void); +void __builtin_arm_setwcgr3 (int); +int __builtin_arm_textrmsb (v8qi, int); +int __builtin_arm_textrmsh (v4hi, int); +int __builtin_arm_textrmsw (v2si, int); +int __builtin_arm_textrmub (v8qi, int); +int __builtin_arm_textrmuh (v4hi, int); +int __builtin_arm_textrmuw (v2si, int); +v8qi __builtin_arm_tinsrb (v8qi, int, int); +v4hi __builtin_arm_tinsrh (v4hi, int, int); +v2si __builtin_arm_tinsrw (v2si, int, int); +long long __builtin_arm_tmia (long long, int, int); +long long __builtin_arm_tmiabb (long long, int, int); +long long __builtin_arm_tmiabt (long long, int, int); +long long __builtin_arm_tmiaph (long long, int, int); +long long __builtin_arm_tmiatb (long long, int, int); +long long __builtin_arm_tmiatt (long long, int, int); +int __builtin_arm_tmovmskb (v8qi); +int __builtin_arm_tmovmskh (v4hi); +int __builtin_arm_tmovmskw (v2si); +long long __builtin_arm_waccb (v8qi); +long long __builtin_arm_wacch (v4hi); +long long __builtin_arm_waccw (v2si); +v8qi __builtin_arm_waddb (v8qi, v8qi); +v8qi __builtin_arm_waddbss (v8qi, v8qi); +v8qi __builtin_arm_waddbus (v8qi, v8qi); +v4hi __builtin_arm_waddh (v4hi, v4hi); +v4hi __builtin_arm_waddhss (v4hi, v4hi); +v4hi __builtin_arm_waddhus (v4hi, v4hi); +v2si __builtin_arm_waddw (v2si, v2si); +v2si __builtin_arm_waddwss (v2si, v2si); +v2si __builtin_arm_waddwus (v2si, v2si); +v8qi __builtin_arm_walign (v8qi, v8qi, int); +long long __builtin_arm_wand(long long, long long); +long long __builtin_arm_wandn (long long, long long); +v8qi __builtin_arm_wavg2b (v8qi, v8qi); +v8qi __builtin_arm_wavg2br (v8qi, v8qi); +v4hi __builtin_arm_wavg2h (v4hi, v4hi); +v4hi __builtin_arm_wavg2hr (v4hi, v4hi); +v8qi __builtin_arm_wcmpeqb (v8qi, v8qi); +v4hi __builtin_arm_wcmpeqh (v4hi, v4hi); +v2si __builtin_arm_wcmpeqw (v2si, v2si); +v8qi __builtin_arm_wcmpgtsb (v8qi, v8qi); +v4hi __builtin_arm_wcmpgtsh (v4hi, v4hi); +v2si __builtin_arm_wcmpgtsw (v2si, v2si); +v8qi __builtin_arm_wcmpgtub (v8qi, v8qi); +v4hi __builtin_arm_wcmpgtuh (v4hi, v4hi); +v2si __builtin_arm_wcmpgtuw (v2si, v2si); +long long __builtin_arm_wmacs (long long, v4hi, v4hi); +long long __builtin_arm_wmacsz (v4hi, v4hi); +long long __builtin_arm_wmacu (long long, v4hi, v4hi); +long long __builtin_arm_wmacuz (v4hi, v4hi); +v4hi __builtin_arm_wmadds (v4hi, v4hi); +v4hi __builtin_arm_wmaddu (v4hi, v4hi); +v8qi __builtin_arm_wmaxsb (v8qi, v8qi); +v4hi __builtin_arm_wmaxsh (v4hi, v4hi); +v2si __builtin_arm_wmaxsw (v2si, v2si); +v8qi __builtin_arm_wmaxub (v8qi, v8qi); +v4hi __builtin_arm_wmaxuh (v4hi, v4hi); +v2si __builtin_arm_wmaxuw (v2si, v2si); +v8qi __builtin_arm_wminsb (v8qi, v8qi); +v4hi __builtin_arm_wminsh (v4hi, v4hi); +v2si __builtin_arm_wminsw (v2si, v2si); +v8qi __builtin_arm_wminub (v8qi, v8qi); +v4hi __builtin_arm_wminuh (v4hi, v4hi); +v2si __builtin_arm_wminuw (v2si, v2si); +v4hi __builtin_arm_wmulsm (v4hi, v4hi); +v4hi __builtin_arm_wmulul (v4hi, v4hi); +v4hi __builtin_arm_wmulum (v4hi, v4hi); +long long __builtin_arm_wor (long long, long long); +v2si __builtin_arm_wpackdss (long long, long long); +v2si __builtin_arm_wpackdus (long long, long long); +v8qi __builtin_arm_wpackhss (v4hi, v4hi); +v8qi __builtin_arm_wpackhus (v4hi, v4hi); +v4hi __builtin_arm_wpackwss (v2si, v2si); +v4hi __builtin_arm_wpackwus (v2si, v2si); +long long 
__builtin_arm_wrord (long long, long long); +long long __builtin_arm_wrordi (long long, int); +v4hi __builtin_arm_wrorh (v4hi, long long); +v4hi __builtin_arm_wrorhi (v4hi, int); +v2si __builtin_arm_wrorw (v2si, long long); +v2si __builtin_arm_wrorwi (v2si, int); +v2si __builtin_arm_wsadb (v2si, v8qi, v8qi); +v2si __builtin_arm_wsadbz (v8qi, v8qi); +v2si __builtin_arm_wsadh (v2si, v4hi, v4hi); +v2si __builtin_arm_wsadhz (v4hi, v4hi); +v4hi __builtin_arm_wshufh (v4hi, int); +long long __builtin_arm_wslld (long long, long long); +long long __builtin_arm_wslldi (long long, int); +v4hi __builtin_arm_wsllh (v4hi, long long); +v4hi __builtin_arm_wsllhi (v4hi, int); +v2si __builtin_arm_wsllw (v2si, long long); +v2si __builtin_arm_wsllwi (v2si, int); +long long __builtin_arm_wsrad (long long, long long); +long long __builtin_arm_wsradi (long long, int); +v4hi __builtin_arm_wsrah (v4hi, long long); +v4hi __builtin_arm_wsrahi (v4hi, int); +v2si __builtin_arm_wsraw (v2si, long long); +v2si __builtin_arm_wsrawi (v2si, int); +long long __builtin_arm_wsrld (long long, long long); +long long __builtin_arm_wsrldi (long long, int); +v4hi __builtin_arm_wsrlh (v4hi, long long); +v4hi __builtin_arm_wsrlhi (v4hi, int); +v2si __builtin_arm_wsrlw (v2si, long long); +v2si __builtin_arm_wsrlwi (v2si, int); +v8qi __builtin_arm_wsubb (v8qi, v8qi); +v8qi __builtin_arm_wsubbss (v8qi, v8qi); +v8qi __builtin_arm_wsubbus (v8qi, v8qi); +v4hi __builtin_arm_wsubh (v4hi, v4hi); +v4hi __builtin_arm_wsubhss (v4hi, v4hi); +v4hi __builtin_arm_wsubhus (v4hi, v4hi); +v2si __builtin_arm_wsubw (v2si, v2si); +v2si __builtin_arm_wsubwss (v2si, v2si); +v2si __builtin_arm_wsubwus (v2si, v2si); +v4hi __builtin_arm_wunpckehsb (v8qi); +v2si __builtin_arm_wunpckehsh (v4hi); +long long __builtin_arm_wunpckehsw (v2si); +v4hi __builtin_arm_wunpckehub (v8qi); +v2si __builtin_arm_wunpckehuh (v4hi); +long long __builtin_arm_wunpckehuw (v2si); +v4hi __builtin_arm_wunpckelsb (v8qi); +v2si __builtin_arm_wunpckelsh (v4hi); +long long __builtin_arm_wunpckelsw (v2si); +v4hi __builtin_arm_wunpckelub (v8qi); +v2si __builtin_arm_wunpckeluh (v4hi); +long long __builtin_arm_wunpckeluw (v2si); +v8qi __builtin_arm_wunpckihb (v8qi, v8qi); +v4hi __builtin_arm_wunpckihh (v4hi, v4hi); +v2si __builtin_arm_wunpckihw (v2si, v2si); +v8qi __builtin_arm_wunpckilb (v8qi, v8qi); +v4hi __builtin_arm_wunpckilh (v4hi, v4hi); +v2si __builtin_arm_wunpckilw (v2si, v2si); +long long __builtin_arm_wxor (long long, long long); +long long __builtin_arm_wzero (); +@end smallexample -@item float __builtin_mips_rsqrt1_s (float) -@itemx double __builtin_mips_rsqrt1_d (double) -@itemx v2sf __builtin_mips_rsqrt1_ps (v2sf) -Reduced-precision reciprocal square root (sequence step 1) -(@code{rsqrt1.@var{fmt}}). -@item float __builtin_mips_rsqrt2_s (float, float) -@itemx double __builtin_mips_rsqrt2_d (double, double) -@itemx v2sf __builtin_mips_rsqrt2_ps (v2sf, v2sf) -Reduced-precision reciprocal square root (sequence step 2) -(@code{rsqrt2.@var{fmt}}). -@end table +@node ARM C Language Extensions (ACLE) +@subsection ARM C Language Extensions (ACLE) -The following multi-instruction functions are also available. -In each case, @var{cond} can be any of the 16 floating-point conditions: -@code{f}, @code{un}, @code{eq}, @code{ueq}, @code{olt}, @code{ult}, -@code{ole}, @code{ule}, @code{sf}, @code{ngle}, @code{seq}, -@code{ngl}, @code{lt}, @code{nge}, @code{le} or @code{ngt}. 
+GCC implements extensions for C as described in the ARM C Language
+Extensions (ACLE) specification, which can be found at
+@uref{https://developer.arm.com/documentation/ihi0053/latest/}.

-@table @code
-@item int __builtin_mips_cabs_@var{cond}_s (float @var{a}, float @var{b})
-@itemx int __builtin_mips_cabs_@var{cond}_d (double @var{a}, double @var{b})
-Absolute comparison of two scalar values (@code{cabs.@var{cond}.@var{fmt}},
-@code{bc1t}/@code{bc1f}).

+As a part of ACLE, GCC implements extensions for Advanced SIMD as described in
+the ARM C Language Extensions Specification.  The complete list of Advanced SIMD
+intrinsics can be found at
+@uref{https://developer.arm.com/documentation/ihi0073/latest/}.
+The built-in intrinsics for the Advanced SIMD extension are available when
+NEON is enabled.

-These functions compare @var{a} and @var{b} using @code{cabs.@var{cond}.s}
-or @code{cabs.@var{cond}.d} and return the result as a boolean value.
-For example:

+Currently, the ARM and AArch64 back ends do not fully support ACLE 2.0.  Both
+back ends support CRC32 intrinsics and the ARM back end supports the
+Coprocessor intrinsics, all from @file{arm_acle.h}.  The ARM back end's 16-bit
+floating-point Advanced SIMD intrinsics currently comply with ACLE v1.1.
+The AArch64 back end does not yet support 16-bit floating-point Advanced
+SIMD intrinsics.

-@smallexample
-float a, b;
-if (__builtin_mips_cabs_eq_s (a, b))
-  true ();
-else
-  false ();
-@end smallexample

+See @ref{ARM Options} and @ref{AArch64 Options} for more information on the
+availability of extensions.

-@item int __builtin_mips_upper_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-@itemx int __builtin_mips_lower_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-Absolute comparison of two paired-single values (@code{cabs.@var{cond}.ps},
-@code{bc1t}/@code{bc1f}).

+@node ARM Floating Point Status and Control Intrinsics
+@subsection ARM Floating Point Status and Control Intrinsics

-These functions compare @var{a} and @var{b} using @code{cabs.@var{cond}.ps}
-and return either the upper or lower half of the result.  For example:

+These built-in functions are available for the ARM family of
+processors with a floating-point unit.

@smallexample
-v2sf a, b;
-if (__builtin_mips_upper_cabs_eq_ps (a, b))
-  upper_halves_are_equal ();
-else
-  upper_halves_are_unequal ();
-
-if (__builtin_mips_lower_cabs_eq_ps (a, b))
-  lower_halves_are_equal ();
-else
-  lower_halves_are_unequal ();
+unsigned int __builtin_arm_get_fpscr ();
+void __builtin_arm_set_fpscr (unsigned int);
@end smallexample

-@item v2sf __builtin_mips_movt_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-@itemx v2sf __builtin_mips_movf_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-Conditional move based on absolute comparison (@code{cabs.@var{cond}.ps},
-@code{movt.ps}/@code{movf.ps}).
-
-The @code{movt} functions return the value @var{x} computed by:
-
-@smallexample
-cabs.@var{cond}.ps @var{cc},@var{a},@var{b}
-mov.ps @var{x},@var{c}
-movt.ps @var{x},@var{d},@var{cc}
-@end smallexample

+@node ARM ARMv8-M Security Extensions
+@subsection ARM ARMv8-M Security Extensions

-The @code{movf} functions are similar but use @code{movf.ps} instead
-of @code{movt.ps}.

+GCC implements the ARMv8-M Security Extensions as described in the ARMv8-M
+Security Extensions: Requirements on Development Tools Engineering
+Specification, which can be found at
+@uref{https://developer.arm.com/documentation/ecm0359818/latest/}.
-@item int __builtin_mips_any_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-@itemx int __builtin_mips_all_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-@itemx int __builtin_mips_any_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-@itemx int __builtin_mips_all_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b})
-Comparison of two paired-single values
-(@code{c.@var{cond}.ps}/@code{cabs.@var{cond}.ps},
-@code{bc1any2t}/@code{bc1any2f}).

+As part of the Security Extensions, GCC implements two new function
+attributes: @code{cmse_nonsecure_entry} and @code{cmse_nonsecure_call}.

-These functions compare @var{a} and @var{b} using @code{c.@var{cond}.ps}
-or @code{cabs.@var{cond}.ps}.  The @code{any} forms return @code{true} if either
-result is @code{true} and the @code{all} forms return @code{true} if both results are @code{true}.
-For example:

+As part of the Security Extensions, GCC implements the intrinsics below.
+FPTR is used here to mean any function pointer type.

@smallexample
-v2sf a, b;
-if (__builtin_mips_any_c_eq_ps (a, b))
-  one_is_true ();
-else
-  both_are_false ();
-
-if (__builtin_mips_all_c_eq_ps (a, b))
-  both_are_true ();
-else
-  one_is_false ();
+cmse_address_info_t cmse_TT (void *);
+cmse_address_info_t cmse_TT_fptr (FPTR);
+cmse_address_info_t cmse_TTT (void *);
+cmse_address_info_t cmse_TTT_fptr (FPTR);
+cmse_address_info_t cmse_TTA (void *);
+cmse_address_info_t cmse_TTA_fptr (FPTR);
+cmse_address_info_t cmse_TTAT (void *);
+cmse_address_info_t cmse_TTAT_fptr (FPTR);
+void * cmse_check_address_range (void *, size_t, int);
+typeof(p) cmse_nsfptr_create (FPTR p);
+intptr_t cmse_is_nsfptr (FPTR);
+int cmse_nonsecure_caller (void);
@end smallexample

-@item int __builtin_mips_any_c_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-@itemx int __builtin_mips_all_c_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-@itemx int __builtin_mips_any_cabs_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-@itemx int __builtin_mips_all_cabs_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
-Comparison of four paired-single values
-(@code{c.@var{cond}.ps}/@code{cabs.@var{cond}.ps},
-@code{bc1any4t}/@code{bc1any4f}).

+@node AVR Built-in Functions
+@subsection AVR Built-in Functions

-These functions use @code{c.@var{cond}.ps} or @code{cabs.@var{cond}.ps}
-to compare @var{a} with @var{b} and to compare @var{c} with @var{d}.
-The @code{any} forms return @code{true} if any of the four results are @code{true}
-and the @code{all} forms return @code{true} if all four results are @code{true}.
-For example:

+For each built-in function for AVR, there is an identically named,
+uppercase built-in macro defined.  That way users can easily query
+whether or not a specific built-in is implemented.  For example, if
+@code{__builtin_avr_nop} is available, the macro
+@code{__BUILTIN_AVR_NOP} is defined to @code{1}; otherwise it is
+undefined.
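+A minimal sketch of this availability check (nothing beyond the macro
+convention above is assumed; the inline-assembly fallback is just one
+possible alternative):
+
+@smallexample
+#ifdef __BUILTIN_AVR_NOP
+  __builtin_avr_nop ();          /* Built-in is implemented.  */
+#else
+  __asm__ __volatile__ ("nop");  /* Fall back to inline assembly.  */
+#endif
+@end smallexample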
-@smallexample
-v2sf a, b, c, d;
-if (__builtin_mips_any_c_eq_4s (a, b, c, d))
-  some_are_true ();
-else
-  all_are_false ();

+@defbuiltin{void __builtin_avr_nop (void)}
+@defbuiltinx{void __builtin_avr_sei (void)}
+@defbuiltinx{void __builtin_avr_cli (void)}
+@defbuiltinx{void __builtin_avr_sleep (void)}
+@defbuiltinx{void __builtin_avr_wdr (void)}
+@defbuiltinx{uint8_t __builtin_avr_swap (uint8_t)}
+@defbuiltinx{uint16_t __builtin_avr_fmul (uint8_t, uint8_t)}
+@defbuiltinx{int16_t __builtin_avr_fmuls (int8_t, int8_t)}
+@defbuiltinx{int16_t __builtin_avr_fmulsu (int8_t, uint8_t)}

-if (__builtin_mips_all_c_eq_4s (a, b, c, d))
-  all_are_true ();
-else
-  some_are_false ();
-@end smallexample
-@end table

+These built-in functions map to the respective machine
+instruction, i.e.@: @code{nop}, @code{sei}, @code{cli}, @code{sleep},
+@code{wdr}, @code{swap}, @code{fmul}, @code{fmuls}
+resp. @code{fmulsu}.  The three @code{fmul*} built-ins are implemented
+as library calls if no hardware multiplier is available.
+@enddefbuiltin

-@node MIPS SIMD Architecture (MSA) Support
-@subsection MIPS SIMD Architecture (MSA) Support

+@defbuiltin{void __builtin_avr_delay_cycles (uint32_t @var{ticks})}
+Delay execution for @var{ticks} cycles.  Note that this
+built-in does not take into account the effect of interrupts that
+might increase delay time.  @var{ticks} must be a compile-time
+integer constant; delays with a variable number of cycles are not supported.
+@enddefbuiltin

-@menu
-* MIPS SIMD Architecture Built-in Functions::
-@end menu

+@defbuiltin{uint8_t __builtin_avr_insert_bits (uint32_t @var{map}, uint8_t @var{bits}, uint8_t @var{val})}
+Insert bits from @var{bits} into @var{val} and return the resulting
+value.  The nibbles of @var{map} determine how the insertion is
+performed: let @var{X} be the @var{n}-th nibble of @var{map}.
+@enumerate
+@item If @var{X} is @code{0xf},
+then the @var{n}-th bit of @var{val} is returned unaltered.

-GCC provides intrinsics to access the SIMD instructions provided by the
-MSA MIPS SIMD Architecture.  The interface is made available by including
-@code{<msa.h>} and using @option{-mmsa -mhard-float -mfp64 -mnan=2008}.
-For each @code{__builtin_msa_*}, there is a shortened name of the intrinsic,
-@code{__msa_*}.

+@item If @var{X} is in the range 0@dots{}7,
+then the @var{n}-th result bit is set to the @var{X}-th bit of @var{bits}.

-MSA implements 128-bit wide vector registers, operating on 8-, 16-, 32- and
-64-bit integer, 16- and 32-bit fixed-point, or 32- and 64-bit floating point
-data elements.  The following vector typedefs are included in @code{msa.h}:
-@itemize
-@item @code{v16i8}, a vector of sixteen signed 8-bit integers;
-@item @code{v16u8}, a vector of sixteen unsigned 8-bit integers;
-@item @code{v8i16}, a vector of eight signed 16-bit integers;
-@item @code{v8u16}, a vector of eight unsigned 16-bit integers;
-@item @code{v4i32}, a vector of four signed 32-bit integers;
-@item @code{v4u32}, a vector of four unsigned 32-bit integers;
-@item @code{v2i64}, a vector of two signed 64-bit integers;
-@item @code{v2u64}, a vector of two unsigned 64-bit integers;
-@item @code{v4f32}, a vector of four 32-bit floats;
-@item @code{v2f64}, a vector of two 64-bit doubles.
-@end itemize

+@item If @var{X} is in the range 8@dots{}@code{0xe},
+then the @var{n}-th result bit is undefined.
+@end enumerate -Instructions and corresponding built-ins may have additional restrictions and/or -input/output values manipulated: -@itemize -@item @code{imm0_1}, an integer literal in range 0 to 1; -@item @code{imm0_3}, an integer literal in range 0 to 3; -@item @code{imm0_7}, an integer literal in range 0 to 7; -@item @code{imm0_15}, an integer literal in range 0 to 15; -@item @code{imm0_31}, an integer literal in range 0 to 31; -@item @code{imm0_63}, an integer literal in range 0 to 63; -@item @code{imm0_255}, an integer literal in range 0 to 255; -@item @code{imm_n16_15}, an integer literal in range -16 to 15; -@item @code{imm_n512_511}, an integer literal in range -512 to 511; -@item @code{imm_n1024_1022}, an integer literal in range -512 to 511 left -shifted by 1 bit, i.e., -1024, -1022, @dots{}, 1020, 1022; -@item @code{imm_n2048_2044}, an integer literal in range -512 to 511 left -shifted by 2 bits, i.e., -2048, -2044, @dots{}, 2040, 2044; -@item @code{imm_n4096_4088}, an integer literal in range -512 to 511 left -shifted by 3 bits, i.e., -4096, -4088, @dots{}, 4080, 4088; -@item @code{imm1_4}, an integer literal in range 1 to 4; -@item @code{i32, i64, u32, u64, f32, f64}, defined as follows: -@end itemize +@noindent +One typical use case for this built-in is adjusting input and +output values to non-contiguous port layouts. Some examples: @smallexample -@{ -typedef int i32; -#if __LONG_MAX__ == __LONG_LONG_MAX__ -typedef long i64; -#else -typedef long long i64; -#endif - -typedef unsigned int u32; -#if __LONG_MAX__ == __LONG_LONG_MAX__ -typedef unsigned long u64; -#else -typedef unsigned long long u64; -#endif +// same as val, bits is unused +__builtin_avr_insert_bits (0xffffffff, bits, val); +@end smallexample -typedef double f64; -typedef float f32; -@} +@smallexample +// same as bits, val is unused +__builtin_avr_insert_bits (0x76543210, bits, val); @end smallexample -@node MIPS SIMD Architecture Built-in Functions -@subsubsection MIPS SIMD Architecture Built-in Functions +@smallexample +// same as rotating bits by 4 +__builtin_avr_insert_bits (0x32107654, bits, 0); +@end smallexample -The intrinsics provided are listed below; each is named after the -machine instruction. +@smallexample +// high nibble of result is the high nibble of val +// low nibble of result is the low nibble of bits +__builtin_avr_insert_bits (0xffff3210, bits, val); +@end smallexample @smallexample -v16i8 __builtin_msa_add_a_b (v16i8, v16i8); -v8i16 __builtin_msa_add_a_h (v8i16, v8i16); -v4i32 __builtin_msa_add_a_w (v4i32, v4i32); -v2i64 __builtin_msa_add_a_d (v2i64, v2i64); +// reverse the bit order of bits +__builtin_avr_insert_bits (0x01234567, bits, 0); +@end smallexample +@enddefbuiltin -v16i8 __builtin_msa_adds_a_b (v16i8, v16i8); -v8i16 __builtin_msa_adds_a_h (v8i16, v8i16); -v4i32 __builtin_msa_adds_a_w (v4i32, v4i32); -v2i64 __builtin_msa_adds_a_d (v2i64, v2i64); +@defbuiltin{uint8_t __builtin_avr_mask1 (uint8_t @var{mask}, uint8_t @var{offs})} +Rotate the 8-bit constant value @var{mask} by an offset of @var{offs}, +where @var{mask} is in @{ 0x01, 0xfe, 0x7f, 0x80 @}. +This built-in can be used as an alternative to 8-bit expressions like +@code{1 << offs} when their computation consumes too much +time, and @var{offs} is known to be in the range 0@dots{}7. 
+@example
+__builtin_avr_mask1 (1, offs)     // same as 1 << offs
+__builtin_avr_mask1 (~1, offs)    // same as ~(1 << offs)
+__builtin_avr_mask1 (0x80, offs)  // same as 0x80 >> offs
+__builtin_avr_mask1 (~0x80, offs) // same as ~(0x80 >> offs)
+@end example
+The open-coded C versions take at least @code{5 + 4 * @var{offs}} cycles
+(and 5 instructions), whereas the built-in takes 7 cycles and instructions
+(8 cycles and instructions in the case of @code{@var{mask} = 0x7f}).
+@enddefbuiltin

-v16i8 __builtin_msa_adds_s_b (v16i8, v16i8);
-v8i16 __builtin_msa_adds_s_h (v8i16, v8i16);
-v4i32 __builtin_msa_adds_s_w (v4i32, v4i32);
-v2i64 __builtin_msa_adds_s_d (v2i64, v2i64);

+@defbuiltin{void __builtin_avr_nops (uint16_t @var{count})}
+Insert @var{count} @code{NOP} instructions.
+The number of instructions must be a compile-time integer constant.
+@enddefbuiltin

-v16u8 __builtin_msa_adds_u_b (v16u8, v16u8);
-v8u16 __builtin_msa_adds_u_h (v8u16, v8u16);
-v4u32 __builtin_msa_adds_u_w (v4u32, v4u32);
-v2u64 __builtin_msa_adds_u_d (v2u64, v2u64);

+@b{All of the following built-in functions are only available for GNU-C.}

-v16i8 __builtin_msa_addv_b (v16i8, v16i8);
-v8i16 __builtin_msa_addv_h (v8i16, v8i16);
-v4i32 __builtin_msa_addv_w (v4i32, v4i32);
-v2i64 __builtin_msa_addv_d (v2i64, v2i64);

+@defbuiltin{int8_t __builtin_avr_flash_segment (const __memx void*)}
+This built-in takes a byte address in the 24-bit
+@ref{AVR Named Address Spaces,named address space} @code{__memx} and returns
+the number of the flash segment (the 64 KiB chunk) to which the address
+points.  Counting starts at @code{0}.
+If the address does not point to flash memory, the result is @code{-1}.
+@enddefbuiltin

-v16i8 __builtin_msa_addvi_b (v16i8, imm0_31);
-v8i16 __builtin_msa_addvi_h (v8i16, imm0_31);
-v4i32 __builtin_msa_addvi_w (v4i32, imm0_31);
-v2i64 __builtin_msa_addvi_d (v2i64, imm0_31);

+@defbuiltin{size_t __builtin_avr_strlen_flash (const __flash char*)}
+@defbuiltinx{size_t __builtin_avr_strlen_flashx (const __flashx char*)}
+@defbuiltinx{size_t __builtin_avr_strlen_memx (const __memx char*)}
+These built-ins return the length of a string located in
+named address space @code{__flash}, @code{__flashx} or @code{__memx},
+respectively.  They are used to support functions like @code{strlen_F} from
+@w{@uref{https://avrdudes.github.io/avr-libc/avr-libc-user-manual/,AVR-LibC}}'s
+header @code{avr/flash.h}.
+@enddefbuiltin

-v16u8 __builtin_msa_and_v (v16u8, v16u8);

+@noindent
+There are many more AVR-specific built-in functions that are used to
+implement the ISO/IEC TR 18037 ``Embedded C'' fixed-point functions of
+section 7.18a.6.  You don't need to use these built-ins directly.
+Instead, use the declarations as supplied by the @code{stdfix.h} header
+with GNU-C99:

-v16u8 __builtin_msa_andi_b (v16u8, imm0_255);

+@smallexample
+#include <stdfix.h>

-v16i8 __builtin_msa_asub_s_b (v16i8, v16i8);
-v8i16 __builtin_msa_asub_s_h (v8i16, v8i16);
-v4i32 __builtin_msa_asub_s_w (v4i32, v4i32);
-v2i64 __builtin_msa_asub_s_d (v2i64, v2i64);

+// Re-interpret the bit representation of unsigned 16-bit
+// integer @var{uval} as Q-format 0.16 value.
+unsigned fract get_bits (uint_ur_t uval)
+@{
+  return urbits (uval);
+@}
+@end smallexample

-v16u8 __builtin_msa_asub_u_b (v16u8, v16u8);
-v8u16 __builtin_msa_asub_u_h (v8u16, v8u16);
-v4u32 __builtin_msa_asub_u_w (v4u32, v4u32);
-v2u64 __builtin_msa_asub_u_d (v2u64, v2u64);

+@node Blackfin Built-in Functions
+@subsection Blackfin Built-in Functions

-v16i8 __builtin_msa_ave_s_b (v16i8, v16i8);
-v8i16 __builtin_msa_ave_s_h (v8i16, v8i16);
-v4i32 __builtin_msa_ave_s_w (v4i32, v4i32);
-v2i64 __builtin_msa_ave_s_d (v2i64, v2i64);

+Currently, there are two Blackfin-specific built-in functions.  These are
+used for generating @code{CSYNC} and @code{SSYNC} machine insns without
+using inline assembly; by using these built-in functions the compiler can
+automatically add workarounds for hardware errata involving these
+instructions.  These functions are named as follows:

-v16u8 __builtin_msa_ave_u_b (v16u8, v16u8);
-v8u16 __builtin_msa_ave_u_h (v8u16, v8u16);
-v4u32 __builtin_msa_ave_u_w (v4u32, v4u32);
-v2u64 __builtin_msa_ave_u_d (v2u64, v2u64);

+@smallexample
+void __builtin_bfin_csync (void);
+void __builtin_bfin_ssync (void);
+@end smallexample

-v16i8 __builtin_msa_aver_s_b (v16i8, v16i8);
-v8i16 __builtin_msa_aver_s_h (v8i16, v8i16);
-v4i32 __builtin_msa_aver_s_w (v4i32, v4i32);
-v2i64 __builtin_msa_aver_s_d (v2i64, v2i64);

+@node BPF Built-in Functions
+@subsection BPF Built-in Functions

-v16u8 __builtin_msa_aver_u_b (v16u8, v16u8);
-v8u16 __builtin_msa_aver_u_h (v8u16, v8u16);
-v4u32 __builtin_msa_aver_u_w (v4u32, v4u32);
-v2u64 __builtin_msa_aver_u_d (v2u64, v2u64);

+The following built-in functions are available for eBPF targets.

-v16u8 __builtin_msa_bclr_b (v16u8, v16u8);
-v8u16 __builtin_msa_bclr_h (v8u16, v8u16);
-v4u32 __builtin_msa_bclr_w (v4u32, v4u32);
-v2u64 __builtin_msa_bclr_d (v2u64, v2u64);

+@defbuiltin{{unsigned long long} __builtin_bpf_load_byte (unsigned long long @var{offset})}
+Load the byte at offset @var{offset} within the @code{struct sk_buff}
+packet data pointed to by the register @code{%r6}, and return it.
+@enddefbuiltin

-v16u8 __builtin_msa_bclri_b (v16u8, imm0_7);
-v8u16 __builtin_msa_bclri_h (v8u16, imm0_15);
-v4u32 __builtin_msa_bclri_w (v4u32, imm0_31);
-v2u64 __builtin_msa_bclri_d (v2u64, imm0_63);

+@defbuiltin{{unsigned long long} __builtin_bpf_load_half (unsigned long long @var{offset})}
+Load the 16 bits at offset @var{offset} within the @code{struct sk_buff}
+packet data pointed to by the register @code{%r6}, and return them.
+@enddefbuiltin

-v16u8 __builtin_msa_binsl_b (v16u8, v16u8, v16u8);
-v8u16 __builtin_msa_binsl_h (v8u16, v8u16, v8u16);
-v4u32 __builtin_msa_binsl_w (v4u32, v4u32, v4u32);
-v2u64 __builtin_msa_binsl_d (v2u64, v2u64, v2u64);

+@defbuiltin{{unsigned long long} __builtin_bpf_load_word (unsigned long long @var{offset})}
+Load the 32 bits at offset @var{offset} within the @code{struct sk_buff}
+packet data pointed to by the register @code{%r6}, and return them.
+@enddefbuiltin

-v16u8 __builtin_msa_binsli_b (v16u8, v16u8, imm0_7);
-v8u16 __builtin_msa_binsli_h (v8u16, v8u16, imm0_15);
-v4u32 __builtin_msa_binsli_w (v4u32, v4u32, imm0_31);
-v2u64 __builtin_msa_binsli_d (v2u64, v2u64, imm0_63);

+@defbuiltin{@var{type} __builtin_preserve_access_index (@var{type} @var{expr})}
+BPF Compile Once-Run Everywhere (CO-RE) support.  Instruct GCC to
+generate CO-RE relocation records for any accesses to aggregate
+data structures (struct, union, array types) in @var{expr}.  This builtin
+is otherwise transparent; @var{expr} may have any type and its value is
+returned.
This builtin has no effect if @code{-mco-re} is not in effect +(either specified or implied). +@enddefbuiltin -v16u8 __builtin_msa_binsr_b (v16u8, v16u8, v16u8); -v8u16 __builtin_msa_binsr_h (v8u16, v8u16, v8u16); -v4u32 __builtin_msa_binsr_w (v4u32, v4u32, v4u32); -v2u64 __builtin_msa_binsr_d (v2u64, v2u64, v2u64); +@defbuiltin{{unsigned int} __builtin_preserve_field_info (@var{expr}, unsigned int @var{kind})} +BPF Compile Once-Run Everywhere (CO-RE) support. This builtin is used to +extract information to aid in struct/union relocations. @var{expr} is +an access to a field of a struct or union. Depending on @var{kind}, different +information is returned to the program. A CO-RE relocation for the access in +@var{expr} with kind @var{kind} is recorded if @code{-mco-re} is in effect. -v16u8 __builtin_msa_binsri_b (v16u8, v16u8, imm0_7); -v8u16 __builtin_msa_binsri_h (v8u16, v8u16, imm0_15); -v4u32 __builtin_msa_binsri_w (v4u32, v4u32, imm0_31); -v2u64 __builtin_msa_binsri_d (v2u64, v2u64, imm0_63); +The following values are supported for @var{kind}: +@table @code +@item FIELD_BYTE_OFFSET = 0 +The returned value is the offset, in bytes, of the field from the +beginning of the containing structure. For bit-fields, this is the byte offset +of the containing word. -v16u8 __builtin_msa_bmnz_v (v16u8, v16u8, v16u8); +@item FIELD_BYTE_SIZE = 1 +The returned value is the size, in bytes, of the field. For bit-fields, +this is the size in bytes of the containing word. -v16u8 __builtin_msa_bmnzi_b (v16u8, v16u8, imm0_255); +@item FIELD_EXISTENCE = 2 +The returned value is 1 if the field exists, 0 otherwise. Always 1 at +compile time. -v16u8 __builtin_msa_bmz_v (v16u8, v16u8, v16u8); +@item FIELD_SIGNEDNESS = 3 +The returned value is 1 if the field is signed, 0 otherwise. -v16u8 __builtin_msa_bmzi_b (v16u8, v16u8, imm0_255); +@item FIELD_LSHIFT_U64 = 4 +@itemx FIELD_RSHIFT_U64 = 5 +The returned value is the number of bits of left- or right-shifting +(respectively) needed in order to recover the original value of the field, +after it has been loaded by a read of @code{FIELD_BYTE_SIZE} bytes into an +unsigned 64-bit value. Primarily useful for reading bit-field values +from structures that may change between kernel versions. -v16u8 __builtin_msa_bneg_b (v16u8, v16u8); -v8u16 __builtin_msa_bneg_h (v8u16, v8u16); -v4u32 __builtin_msa_bneg_w (v4u32, v4u32); -v2u64 __builtin_msa_bneg_d (v2u64, v2u64); +@end table -v16u8 __builtin_msa_bnegi_b (v16u8, imm0_7); -v8u16 __builtin_msa_bnegi_h (v8u16, imm0_15); -v4u32 __builtin_msa_bnegi_w (v4u32, imm0_31); -v2u64 __builtin_msa_bnegi_d (v2u64, imm0_63); +Note that the return value is a constant which is known at +compile time. If the field has a variable offset then +@code{FIELD_BYTE_OFFSET}, @code{FIELD_LSHIFT_U64}, +and @code{FIELD_RSHIFT_U64} are not supported. +Similarly, if the field has a variable size then +@code{FIELD_BYTE_SIZE}, @code{FIELD_LSHIFT_U64}, +and @code{FIELD_RSHIFT_U64} are not supported. 
-i32 __builtin_msa_bnz_b (v16u8);
-i32 __builtin_msa_bnz_h (v8u16);
-i32 __builtin_msa_bnz_w (v4u32);
-i32 __builtin_msa_bnz_d (v2u64);

+For example, @code{__builtin_preserve_field_info} can be used to reliably
+extract bit-field values from a structure that may change between
+kernel versions:

-i32 __builtin_msa_bnz_v (v16u8);

+@smallexample
+struct S
+@{
+  short a;
+  int x:7;
+  int y:5;
+@};

-v16u8 __builtin_msa_bsel_v (v16u8, v16u8, v16u8);

+int
+read_y (struct S *arg)
+@{
+  unsigned long long val;
+  unsigned int offset
+    = __builtin_preserve_field_info (arg->y, FIELD_BYTE_OFFSET);
+  unsigned int size
+    = __builtin_preserve_field_info (arg->y, FIELD_BYTE_SIZE);

-v16u8 __builtin_msa_bseli_b (v16u8, v16u8, imm0_255);

+  /* Read size bytes from arg + offset into val; the cast to char *
+     makes the offset arithmetic byte-based.  */
+  bpf_probe_read (&val, size, (char *) arg + offset);

-v16u8 __builtin_msa_bset_b (v16u8, v16u8);
-v8u16 __builtin_msa_bset_h (v8u16, v8u16);
-v4u32 __builtin_msa_bset_w (v4u32, v4u32);
-v2u64 __builtin_msa_bset_d (v2u64, v2u64);

+  val <<= __builtin_preserve_field_info (arg->y, FIELD_LSHIFT_U64);

-v16u8 __builtin_msa_bseti_b (v16u8, imm0_7);
-v8u16 __builtin_msa_bseti_h (v8u16, imm0_15);
-v4u32 __builtin_msa_bseti_w (v4u32, imm0_31);
-v2u64 __builtin_msa_bseti_d (v2u64, imm0_63);

+  if (__builtin_preserve_field_info (arg->y, FIELD_SIGNEDNESS))
+    val = ((long long) val
+           >> __builtin_preserve_field_info (arg->y, FIELD_RSHIFT_U64));
+  else
+    val >>= __builtin_preserve_field_info (arg->y, FIELD_RSHIFT_U64);

-i32 __builtin_msa_bz_b (v16u8);
-i32 __builtin_msa_bz_h (v8u16);
-i32 __builtin_msa_bz_w (v4u32);
-i32 __builtin_msa_bz_d (v2u64);

+  return val;
+@}

-i32 __builtin_msa_bz_v (v16u8);

+@end smallexample
+@enddefbuiltin

-v16i8 __builtin_msa_ceq_b (v16i8, v16i8);
-v8i16 __builtin_msa_ceq_h (v8i16, v8i16);
-v4i32 __builtin_msa_ceq_w (v4i32, v4i32);
-v2i64 __builtin_msa_ceq_d (v2i64, v2i64);

+@defbuiltin{{unsigned int} __builtin_preserve_enum_value (@var{type}, @var{enum}, unsigned int @var{kind})}
+BPF Compile Once-Run Everywhere (CO-RE) support.  This builtin collects enum
+information and creates a CO-RE relocation relative to @var{enum}, which
+should be of type @var{type}.  The @var{kind} specifies the action performed.

-v16i8 __builtin_msa_ceqi_b (v16i8, imm_n16_15);
-v8i16 __builtin_msa_ceqi_h (v8i16, imm_n16_15);
-v4i32 __builtin_msa_ceqi_w (v4i32, imm_n16_15);
-v2i64 __builtin_msa_ceqi_d (v2i64, imm_n16_15);

+The following values are supported for @var{kind}:
+@table @code
+@item ENUM_VALUE_EXISTS = 0
+The return value is either 0 or 1 depending on whether the enum value
+exists in the target.

-i32 __builtin_msa_cfcmsa (imm0_31);

+@item ENUM_VALUE = 1
+The return value is the enum value in the target kernel.
+@end table
+@enddefbuiltin

-v16i8 __builtin_msa_cle_s_b (v16i8, v16i8);
-v8i16 __builtin_msa_cle_s_h (v8i16, v8i16);
-v4i32 __builtin_msa_cle_s_w (v4i32, v4i32);
-v2i64 __builtin_msa_cle_s_d (v2i64, v2i64);

+@defbuiltin{{unsigned int} __builtin_btf_type_id (@var{type}, unsigned int @var{kind})}
+BPF Compile Once-Run Everywhere (CO-RE) support.  This builtin is used to get
+the BTF type ID of a specified @var{type}.
+Depending on the @var{kind} argument, it
+either returns the ID of the local BTF information, or the BTF type ID in
+the target kernel.
-v16i8 __builtin_msa_cle_u_b (v16u8, v16u8); -v8i16 __builtin_msa_cle_u_h (v8u16, v8u16); -v4i32 __builtin_msa_cle_u_w (v4u32, v4u32); -v2i64 __builtin_msa_cle_u_d (v2u64, v2u64); +The following values are supported for @var{kind}: +@table @code +@item BTF_TYPE_ID_LOCAL = 0 +Return the local BTF type ID. Always succeeds. -v16i8 __builtin_msa_clei_s_b (v16i8, imm_n16_15); -v8i16 __builtin_msa_clei_s_h (v8i16, imm_n16_15); -v4i32 __builtin_msa_clei_s_w (v4i32, imm_n16_15); -v2i64 __builtin_msa_clei_s_d (v2i64, imm_n16_15); +@item BTF_TYPE_ID_TARGET = 1 +Return the target BTF type ID. If @var{type} does not exist in the target, +returns 0. +@end table +@enddefbuiltin -v16i8 __builtin_msa_clei_u_b (v16u8, imm0_31); -v8i16 __builtin_msa_clei_u_h (v8u16, imm0_31); -v4i32 __builtin_msa_clei_u_w (v4u32, imm0_31); -v2i64 __builtin_msa_clei_u_d (v2u64, imm0_31); +@defbuiltin{{unsigned int} __builtin_preserve_type_info (@var{type}, unsigned int @var{kind})} +BPF Compile Once-Run Everywhere (CO-RE) support. This builtin performs named +type (struct/union/enum/typedef) verifications. The type of verification +depends on the @var{kind} argument provided. This builtin always +returns 0 if @var{type} does not exist in the target kernel. -v16i8 __builtin_msa_clt_s_b (v16i8, v16i8); -v8i16 __builtin_msa_clt_s_h (v8i16, v8i16); -v4i32 __builtin_msa_clt_s_w (v4i32, v4i32); -v2i64 __builtin_msa_clt_s_d (v2i64, v2i64); +The following values are supported for @var{kind}: +@table @code +@item BTF_TYPE_EXISTS = 0 +Checks if @var{type} exists in the target. -v16i8 __builtin_msa_clt_u_b (v16u8, v16u8); -v8i16 __builtin_msa_clt_u_h (v8u16, v8u16); -v4i32 __builtin_msa_clt_u_w (v4u32, v4u32); -v2i64 __builtin_msa_clt_u_d (v2u64, v2u64); +@item BTF_TYPE_MATCHES = 1 +Checks if @var{type} matches the local definition in the target kernel. -v16i8 __builtin_msa_clti_s_b (v16i8, imm_n16_15); -v8i16 __builtin_msa_clti_s_h (v8i16, imm_n16_15); -v4i32 __builtin_msa_clti_s_w (v4i32, imm_n16_15); -v2i64 __builtin_msa_clti_s_d (v2i64, imm_n16_15); +@item BTF_TYPE_SIZE = 2 +Returns the size of the @var{type} within the target. +@end table +@enddefbuiltin -v16i8 __builtin_msa_clti_u_b (v16u8, imm0_31); -v8i16 __builtin_msa_clti_u_h (v8u16, imm0_31); -v4i32 __builtin_msa_clti_u_w (v4u32, imm0_31); -v2i64 __builtin_msa_clti_u_d (v2u64, imm0_31); +@node FR-V Built-in Functions +@subsection FR-V Built-in Functions -i32 __builtin_msa_copy_s_b (v16i8, imm0_15); -i32 __builtin_msa_copy_s_h (v8i16, imm0_7); -i32 __builtin_msa_copy_s_w (v4i32, imm0_3); -i64 __builtin_msa_copy_s_d (v2i64, imm0_1); +GCC provides many FR-V-specific built-in functions. In general, +these functions are intended to be compatible with those described +by @cite{FR-V Family, Softune C/C++ Compiler Manual (V6), Fujitsu +Semiconductor}. The two exceptions are @code{__MDUNPACKH} and +@code{__MBTOHE}, the GCC forms of which pass 128-bit values by +pointer rather than by value. -u32 __builtin_msa_copy_u_b (v16i8, imm0_15); -u32 __builtin_msa_copy_u_h (v8i16, imm0_7); -u32 __builtin_msa_copy_u_w (v4i32, imm0_3); -u64 __builtin_msa_copy_u_d (v2i64, imm0_1); +Most of the functions are named after specific FR-V instructions. +Such functions are said to be ``directly mapped'' and are summarized +here in tabular form. 
-void __builtin_msa_ctcmsa (imm0_31, i32);

+@menu
+* Argument Types::
+* Directly-mapped Integer Functions::
+* Directly-mapped Media Functions::
+* Raw read/write Functions::
+* Other Built-in Functions::
+@end menu

-v16i8 __builtin_msa_div_s_b (v16i8, v16i8);
-v8i16 __builtin_msa_div_s_h (v8i16, v8i16);
-v4i32 __builtin_msa_div_s_w (v4i32, v4i32);
-v2i64 __builtin_msa_div_s_d (v2i64, v2i64);

+@node Argument Types
+@subsubsection Argument Types

-v16u8 __builtin_msa_div_u_b (v16u8, v16u8);
-v8u16 __builtin_msa_div_u_h (v8u16, v8u16);
-v4u32 __builtin_msa_div_u_w (v4u32, v4u32);
-v2u64 __builtin_msa_div_u_d (v2u64, v2u64);

+The arguments to the built-in functions can be divided into three groups:
+register numbers, compile-time constants and run-time values.  In order
+to make this classification clear at a glance, the arguments and return
+values are given the following pseudo types:

-v8i16 __builtin_msa_dotp_s_h (v16i8, v16i8);
-v4i32 __builtin_msa_dotp_s_w (v8i16, v8i16);
-v2i64 __builtin_msa_dotp_s_d (v4i32, v4i32);

+@multitable @columnfractions .20 .30 .15 .35
+@headitem Pseudo type @tab Real C type @tab Constant? @tab Description
+@item @code{uh} @tab @code{unsigned short} @tab No @tab an unsigned halfword
+@item @code{uw1} @tab @code{unsigned int} @tab No @tab an unsigned word
+@item @code{sw1} @tab @code{int} @tab No @tab a signed word
+@item @code{uw2} @tab @code{unsigned long long} @tab No
+@tab an unsigned doubleword
+@item @code{sw2} @tab @code{long long} @tab No @tab a signed doubleword
+@item @code{const} @tab @code{int} @tab Yes @tab an integer constant
+@item @code{acc} @tab @code{int} @tab Yes @tab an ACC register number
+@item @code{iacc} @tab @code{int} @tab Yes @tab an IACC register number
+@end multitable

-v8u16 __builtin_msa_dotp_u_h (v16u8, v16u8);
-v4u32 __builtin_msa_dotp_u_w (v8u16, v8u16);
-v2u64 __builtin_msa_dotp_u_d (v4u32, v4u32);

+These pseudo types are not defined by GCC; they are simply a notational
+convenience used in this manual.

-v8i16 __builtin_msa_dpadd_s_h (v8i16, v16i8, v16i8);
-v4i32 __builtin_msa_dpadd_s_w (v4i32, v8i16, v8i16);
-v2i64 __builtin_msa_dpadd_s_d (v2i64, v4i32, v4i32);

+Arguments of type @code{uh}, @code{uw1}, @code{sw1}, @code{uw2}
+and @code{sw2} are evaluated at run time.  They correspond to
+register operands in the underlying FR-V instructions.

-v8u16 __builtin_msa_dpadd_u_h (v8u16, v16u8, v16u8);
-v4u32 __builtin_msa_dpadd_u_w (v4u32, v8u16, v8u16);
-v2u64 __builtin_msa_dpadd_u_d (v2u64, v4u32, v4u32);

+@code{const} arguments represent immediate operands in the underlying
+FR-V instructions.  They must be compile-time constants.

-v8i16 __builtin_msa_dpsub_s_h (v8i16, v16i8, v16i8);
-v4i32 __builtin_msa_dpsub_s_w (v4i32, v8i16, v8i16);
-v2i64 __builtin_msa_dpsub_s_d (v2i64, v4i32, v4i32);

+@code{acc} arguments are evaluated at compile time and specify the number
+of an accumulator register.  For example, an @code{acc} argument of 2
+selects the ACC2 register.

-v8i16 __builtin_msa_dpsub_u_h (v8i16, v16u8, v16u8);
-v4i32 __builtin_msa_dpsub_u_w (v4i32, v8u16, v8u16);
-v2i64 __builtin_msa_dpsub_u_d (v2i64, v4u32, v4u32);

+@code{iacc} arguments are similar to @code{acc} arguments but specify the
+number of an IACC register.  See @ref{Other Built-in Functions}
+for more details.
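+As a purely illustrative sketch of how the pseudo types read in
+practice (it uses two directly-mapped functions from the following
+sections, @code{__SMUL} and @code{__MCLRACC}; the @code{__MCLRACC}
+call is included only to show an @code{acc} argument and is otherwise
+unrelated to the multiply):
+
+@smallexample
+long long
+scaled_product (int a, int b)   /* a and b are sw1 run-time values.  */
+@{
+  __MCLRACC (2);                /* acc argument: constant 2 selects ACC2.  */
+  return __SMUL (a, b);         /* Returns an sw2 (long long) value.  */
+@}
+@end smallexample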
-v4f32 __builtin_msa_fadd_w (v4f32, v4f32); -v2f64 __builtin_msa_fadd_d (v2f64, v2f64); +@node Directly-mapped Integer Functions +@subsubsection Directly-Mapped Integer Functions -v4i32 __builtin_msa_fcaf_w (v4f32, v4f32); -v2i64 __builtin_msa_fcaf_d (v2f64, v2f64); +The functions listed below map directly to FR-V I-type instructions. -v4i32 __builtin_msa_fceq_w (v4f32, v4f32); -v2i64 __builtin_msa_fceq_d (v2f64, v2f64); +@multitable @columnfractions .45 .32 .23 +@headitem Function prototype @tab Example usage @tab Assembly output +@item @code{sw1 __ADDSS (sw1, sw1)} +@tab @code{@var{c} = __ADDSS (@var{a}, @var{b})} +@tab @code{ADDSS @var{a},@var{b},@var{c}} +@item @code{sw1 __SCAN (sw1, sw1)} +@tab @code{@var{c} = __SCAN (@var{a}, @var{b})} +@tab @code{SCAN @var{a},@var{b},@var{c}} +@item @code{sw1 __SCUTSS (sw1)} +@tab @code{@var{b} = __SCUTSS (@var{a})} +@tab @code{SCUTSS @var{a},@var{b}} +@item @code{sw1 __SLASS (sw1, sw1)} +@tab @code{@var{c} = __SLASS (@var{a}, @var{b})} +@tab @code{SLASS @var{a},@var{b},@var{c}} +@item @code{void __SMASS (sw1, sw1)} +@tab @code{__SMASS (@var{a}, @var{b})} +@tab @code{SMASS @var{a},@var{b}} +@item @code{void __SMSSS (sw1, sw1)} +@tab @code{__SMSSS (@var{a}, @var{b})} +@tab @code{SMSSS @var{a},@var{b}} +@item @code{void __SMU (sw1, sw1)} +@tab @code{__SMU (@var{a}, @var{b})} +@tab @code{SMU @var{a},@var{b}} +@item @code{sw2 __SMUL (sw1, sw1)} +@tab @code{@var{c} = __SMUL (@var{a}, @var{b})} +@tab @code{SMUL @var{a},@var{b},@var{c}} +@item @code{sw1 __SUBSS (sw1, sw1)} +@tab @code{@var{c} = __SUBSS (@var{a}, @var{b})} +@tab @code{SUBSS @var{a},@var{b},@var{c}} +@item @code{uw2 __UMUL (uw1, uw1)} +@tab @code{@var{c} = __UMUL (@var{a}, @var{b})} +@tab @code{UMUL @var{a},@var{b},@var{c}} +@end multitable -v4i32 __builtin_msa_fclass_w (v4f32); -v2i64 __builtin_msa_fclass_d (v2f64); +@node Directly-mapped Media Functions +@subsubsection Directly-Mapped Media Functions -v4i32 __builtin_msa_fcle_w (v4f32, v4f32); -v2i64 __builtin_msa_fcle_d (v2f64, v2f64); +The functions listed below map directly to FR-V M-type instructions. 
-v4i32 __builtin_msa_fclt_w (v4f32, v4f32); -v2i64 __builtin_msa_fclt_d (v2f64, v2f64); +@multitable @columnfractions .45 .32 .23 +@headitem Function prototype @tab Example usage @tab Assembly output +@item @code{uw1 __MABSHS (sw1)} +@tab @code{@var{b} = __MABSHS (@var{a})} +@tab @code{MABSHS @var{a},@var{b}} +@item @code{void __MADDACCS (acc, acc)} +@tab @code{__MADDACCS (@var{b}, @var{a})} +@tab @code{MADDACCS @var{a},@var{b}} +@item @code{sw1 __MADDHSS (sw1, sw1)} +@tab @code{@var{c} = __MADDHSS (@var{a}, @var{b})} +@tab @code{MADDHSS @var{a},@var{b},@var{c}} +@item @code{uw1 __MADDHUS (uw1, uw1)} +@tab @code{@var{c} = __MADDHUS (@var{a}, @var{b})} +@tab @code{MADDHUS @var{a},@var{b},@var{c}} +@item @code{uw1 __MAND (uw1, uw1)} +@tab @code{@var{c} = __MAND (@var{a}, @var{b})} +@tab @code{MAND @var{a},@var{b},@var{c}} +@item @code{void __MASACCS (acc, acc)} +@tab @code{__MASACCS (@var{b}, @var{a})} +@tab @code{MASACCS @var{a},@var{b}} +@item @code{uw1 __MAVEH (uw1, uw1)} +@tab @code{@var{c} = __MAVEH (@var{a}, @var{b})} +@tab @code{MAVEH @var{a},@var{b},@var{c}} +@item @code{uw2 __MBTOH (uw1)} +@tab @code{@var{b} = __MBTOH (@var{a})} +@tab @code{MBTOH @var{a},@var{b}} +@item @code{void __MBTOHE (uw1 *, uw1)} +@tab @code{__MBTOHE (&@var{b}, @var{a})} +@tab @code{MBTOHE @var{a},@var{b}} +@item @code{void __MCLRACC (acc)} +@tab @code{__MCLRACC (@var{a})} +@tab @code{MCLRACC @var{a}} +@item @code{void __MCLRACCA (void)} +@tab @code{__MCLRACCA ()} +@tab @code{MCLRACCA} +@item @code{uw1 __Mcop1 (uw1, uw1)} +@tab @code{@var{c} = __Mcop1 (@var{a}, @var{b})} +@tab @code{Mcop1 @var{a},@var{b},@var{c}} +@item @code{uw1 __Mcop2 (uw1, uw1)} +@tab @code{@var{c} = __Mcop2 (@var{a}, @var{b})} +@tab @code{Mcop2 @var{a},@var{b},@var{c}} +@item @code{uw1 __MCPLHI (uw2, const)} +@tab @code{@var{c} = __MCPLHI (@var{a}, @var{b})} +@tab @code{MCPLHI @var{a},#@var{b},@var{c}} +@item @code{uw1 __MCPLI (uw2, const)} +@tab @code{@var{c} = __MCPLI (@var{a}, @var{b})} +@tab @code{MCPLI @var{a},#@var{b},@var{c}} +@item @code{void __MCPXIS (acc, sw1, sw1)} +@tab @code{__MCPXIS (@var{c}, @var{a}, @var{b})} +@tab @code{MCPXIS @var{a},@var{b},@var{c}} +@item @code{void __MCPXIU (acc, uw1, uw1)} +@tab @code{__MCPXIU (@var{c}, @var{a}, @var{b})} +@tab @code{MCPXIU @var{a},@var{b},@var{c}} +@item @code{void __MCPXRS (acc, sw1, sw1)} +@tab @code{__MCPXRS (@var{c}, @var{a}, @var{b})} +@tab @code{MCPXRS @var{a},@var{b},@var{c}} +@item @code{void __MCPXRU (acc, uw1, uw1)} +@tab @code{__MCPXRU (@var{c}, @var{a}, @var{b})} +@tab @code{MCPXRU @var{a},@var{b},@var{c}} +@item @code{uw1 __MCUT (acc, uw1)} +@tab @code{@var{c} = __MCUT (@var{a}, @var{b})} +@tab @code{MCUT @var{a},@var{b},@var{c}} +@item @code{uw1 __MCUTSS (acc, sw1)} +@tab @code{@var{c} = __MCUTSS (@var{a}, @var{b})} +@tab @code{MCUTSS @var{a},@var{b},@var{c}} +@item @code{void __MDADDACCS (acc, acc)} +@tab @code{__MDADDACCS (@var{b}, @var{a})} +@tab @code{MDADDACCS @var{a},@var{b}} +@item @code{void __MDASACCS (acc, acc)} +@tab @code{__MDASACCS (@var{b}, @var{a})} +@tab @code{MDASACCS @var{a},@var{b}} +@item @code{uw2 __MDCUTSSI (acc, const)} +@tab @code{@var{c} = __MDCUTSSI (@var{a}, @var{b})} +@tab @code{MDCUTSSI @var{a},#@var{b},@var{c}} +@item @code{uw2 __MDPACKH (uw2, uw2)} +@tab @code{@var{c} = __MDPACKH (@var{a}, @var{b})} +@tab @code{MDPACKH @var{a},@var{b},@var{c}} +@item @code{uw2 __MDROTLI (uw2, const)} +@tab @code{@var{c} = __MDROTLI (@var{a}, @var{b})} +@tab @code{MDROTLI @var{a},#@var{b},@var{c}} +@item @code{void __MDSUBACCS (acc, acc)} +@tab 
@code{__MDSUBACCS (@var{b}, @var{a})} +@tab @code{MDSUBACCS @var{a},@var{b}} +@item @code{void __MDUNPACKH (uw1 *, uw2)} +@tab @code{__MDUNPACKH (&@var{b}, @var{a})} +@tab @code{MDUNPACKH @var{a},@var{b}} +@item @code{uw2 __MEXPDHD (uw1, const)} +@tab @code{@var{c} = __MEXPDHD (@var{a}, @var{b})} +@tab @code{MEXPDHD @var{a},#@var{b},@var{c}} +@item @code{uw1 __MEXPDHW (uw1, const)} +@tab @code{@var{c} = __MEXPDHW (@var{a}, @var{b})} +@tab @code{MEXPDHW @var{a},#@var{b},@var{c}} +@item @code{uw1 __MHDSETH (uw1, const)} +@tab @code{@var{c} = __MHDSETH (@var{a}, @var{b})} +@tab @code{MHDSETH @var{a},#@var{b},@var{c}} +@item @code{sw1 __MHDSETS (const)} +@tab @code{@var{b} = __MHDSETS (@var{a})} +@tab @code{MHDSETS #@var{a},@var{b}} +@item @code{uw1 __MHSETHIH (uw1, const)} +@tab @code{@var{b} = __MHSETHIH (@var{b}, @var{a})} +@tab @code{MHSETHIH #@var{a},@var{b}} +@item @code{sw1 __MHSETHIS (sw1, const)} +@tab @code{@var{b} = __MHSETHIS (@var{b}, @var{a})} +@tab @code{MHSETHIS #@var{a},@var{b}} +@item @code{uw1 __MHSETLOH (uw1, const)} +@tab @code{@var{b} = __MHSETLOH (@var{b}, @var{a})} +@tab @code{MHSETLOH #@var{a},@var{b}} +@item @code{sw1 __MHSETLOS (sw1, const)} +@tab @code{@var{b} = __MHSETLOS (@var{b}, @var{a})} +@tab @code{MHSETLOS #@var{a},@var{b}} +@item @code{uw1 __MHTOB (uw2)} +@tab @code{@var{b} = __MHTOB (@var{a})} +@tab @code{MHTOB @var{a},@var{b}} +@item @code{void __MMACHS (acc, sw1, sw1)} +@tab @code{__MMACHS (@var{c}, @var{a}, @var{b})} +@tab @code{MMACHS @var{a},@var{b},@var{c}} +@item @code{void __MMACHU (acc, uw1, uw1)} +@tab @code{__MMACHU (@var{c}, @var{a}, @var{b})} +@tab @code{MMACHU @var{a},@var{b},@var{c}} +@item @code{void __MMRDHS (acc, sw1, sw1)} +@tab @code{__MMRDHS (@var{c}, @var{a}, @var{b})} +@tab @code{MMRDHS @var{a},@var{b},@var{c}} +@item @code{void __MMRDHU (acc, uw1, uw1)} +@tab @code{__MMRDHU (@var{c}, @var{a}, @var{b})} +@tab @code{MMRDHU @var{a},@var{b},@var{c}} +@item @code{void __MMULHS (acc, sw1, sw1)} +@tab @code{__MMULHS (@var{c}, @var{a}, @var{b})} +@tab @code{MMULHS @var{a},@var{b},@var{c}} +@item @code{void __MMULHU (acc, uw1, uw1)} +@tab @code{__MMULHU (@var{c}, @var{a}, @var{b})} +@tab @code{MMULHU @var{a},@var{b},@var{c}} +@item @code{void __MMULXHS (acc, sw1, sw1)} +@tab @code{__MMULXHS (@var{c}, @var{a}, @var{b})} +@tab @code{MMULXHS @var{a},@var{b},@var{c}} +@item @code{void __MMULXHU (acc, uw1, uw1)} +@tab @code{__MMULXHU (@var{c}, @var{a}, @var{b})} +@tab @code{MMULXHU @var{a},@var{b},@var{c}} +@item @code{uw1 __MNOT (uw1)} +@tab @code{@var{b} = __MNOT (@var{a})} +@tab @code{MNOT @var{a},@var{b}} +@item @code{uw1 __MOR (uw1, uw1)} +@tab @code{@var{c} = __MOR (@var{a}, @var{b})} +@tab @code{MOR @var{a},@var{b},@var{c}} +@item @code{uw1 __MPACKH (uh, uh)} +@tab @code{@var{c} = __MPACKH (@var{a}, @var{b})} +@tab @code{MPACKH @var{a},@var{b},@var{c}} +@item @code{sw2 __MQADDHSS (sw2, sw2)} +@tab @code{@var{c} = __MQADDHSS (@var{a}, @var{b})} +@tab @code{MQADDHSS @var{a},@var{b},@var{c}} +@item @code{uw2 __MQADDHUS (uw2, uw2)} +@tab @code{@var{c} = __MQADDHUS (@var{a}, @var{b})} +@tab @code{MQADDHUS @var{a},@var{b},@var{c}} +@item @code{void __MQCPXIS (acc, sw2, sw2)} +@tab @code{__MQCPXIS (@var{c}, @var{a}, @var{b})} +@tab @code{MQCPXIS @var{a},@var{b},@var{c}} +@item @code{void __MQCPXIU (acc, uw2, uw2)} +@tab @code{__MQCPXIU (@var{c}, @var{a}, @var{b})} +@tab @code{MQCPXIU @var{a},@var{b},@var{c}} +@item @code{void __MQCPXRS (acc, sw2, sw2)} +@tab @code{__MQCPXRS (@var{c}, @var{a}, @var{b})} +@tab @code{MQCPXRS @var{a},@var{b},@var{c}} 
+@item @code{void __MQCPXRU (acc, uw2, uw2)} +@tab @code{__MQCPXRU (@var{c}, @var{a}, @var{b})} +@tab @code{MQCPXRU @var{a},@var{b},@var{c}} +@item @code{sw2 __MQLCLRHS (sw2, sw2)} +@tab @code{@var{c} = __MQLCLRHS (@var{a}, @var{b})} +@tab @code{MQLCLRHS @var{a},@var{b},@var{c}} +@item @code{sw2 __MQLMTHS (sw2, sw2)} +@tab @code{@var{c} = __MQLMTHS (@var{a}, @var{b})} +@tab @code{MQLMTHS @var{a},@var{b},@var{c}} +@item @code{void __MQMACHS (acc, sw2, sw2)} +@tab @code{__MQMACHS (@var{c}, @var{a}, @var{b})} +@tab @code{MQMACHS @var{a},@var{b},@var{c}} +@item @code{void __MQMACHU (acc, uw2, uw2)} +@tab @code{__MQMACHU (@var{c}, @var{a}, @var{b})} +@tab @code{MQMACHU @var{a},@var{b},@var{c}} +@item @code{void __MQMACXHS (acc, sw2, sw2)} +@tab @code{__MQMACXHS (@var{c}, @var{a}, @var{b})} +@tab @code{MQMACXHS @var{a},@var{b},@var{c}} +@item @code{void __MQMULHS (acc, sw2, sw2)} +@tab @code{__MQMULHS (@var{c}, @var{a}, @var{b})} +@tab @code{MQMULHS @var{a},@var{b},@var{c}} +@item @code{void __MQMULHU (acc, uw2, uw2)} +@tab @code{__MQMULHU (@var{c}, @var{a}, @var{b})} +@tab @code{MQMULHU @var{a},@var{b},@var{c}} +@item @code{void __MQMULXHS (acc, sw2, sw2)} +@tab @code{__MQMULXHS (@var{c}, @var{a}, @var{b})} +@tab @code{MQMULXHS @var{a},@var{b},@var{c}} +@item @code{void __MQMULXHU (acc, uw2, uw2)} +@tab @code{__MQMULXHU (@var{c}, @var{a}, @var{b})} +@tab @code{MQMULXHU @var{a},@var{b},@var{c}} +@item @code{sw2 __MQSATHS (sw2, sw2)} +@tab @code{@var{c} = __MQSATHS (@var{a}, @var{b})} +@tab @code{MQSATHS @var{a},@var{b},@var{c}} +@item @code{uw2 __MQSLLHI (uw2, int)} +@tab @code{@var{c} = __MQSLLHI (@var{a}, @var{b})} +@tab @code{MQSLLHI @var{a},@var{b},@var{c}} +@item @code{sw2 __MQSRAHI (sw2, int)} +@tab @code{@var{c} = __MQSRAHI (@var{a}, @var{b})} +@tab @code{MQSRAHI @var{a},@var{b},@var{c}} +@item @code{sw2 __MQSUBHSS (sw2, sw2)} +@tab @code{@var{c} = __MQSUBHSS (@var{a}, @var{b})} +@tab @code{MQSUBHSS @var{a},@var{b},@var{c}} +@item @code{uw2 __MQSUBHUS (uw2, uw2)} +@tab @code{@var{c} = __MQSUBHUS (@var{a}, @var{b})} +@tab @code{MQSUBHUS @var{a},@var{b},@var{c}} +@item @code{void __MQXMACHS (acc, sw2, sw2)} +@tab @code{__MQXMACHS (@var{c}, @var{a}, @var{b})} +@tab @code{MQXMACHS @var{a},@var{b},@var{c}} +@item @code{void __MQXMACXHS (acc, sw2, sw2)} +@tab @code{__MQXMACXHS (@var{c}, @var{a}, @var{b})} +@tab @code{MQXMACXHS @var{a},@var{b},@var{c}} +@item @code{uw1 __MRDACC (acc)} +@tab @code{@var{b} = __MRDACC (@var{a})} +@tab @code{MRDACC @var{a},@var{b}} +@item @code{uw1 __MRDACCG (acc)} +@tab @code{@var{b} = __MRDACCG (@var{a})} +@tab @code{MRDACCG @var{a},@var{b}} +@item @code{uw1 __MROTLI (uw1, const)} +@tab @code{@var{c} = __MROTLI (@var{a}, @var{b})} +@tab @code{MROTLI @var{a},#@var{b},@var{c}} +@item @code{uw1 __MROTRI (uw1, const)} +@tab @code{@var{c} = __MROTRI (@var{a}, @var{b})} +@tab @code{MROTRI @var{a},#@var{b},@var{c}} +@item @code{sw1 __MSATHS (sw1, sw1)} +@tab @code{@var{c} = __MSATHS (@var{a}, @var{b})} +@tab @code{MSATHS @var{a},@var{b},@var{c}} +@item @code{uw1 __MSATHU (uw1, uw1)} +@tab @code{@var{c} = __MSATHU (@var{a}, @var{b})} +@tab @code{MSATHU @var{a},@var{b},@var{c}} +@item @code{uw1 __MSLLHI (uw1, const)} +@tab @code{@var{c} = __MSLLHI (@var{a}, @var{b})} +@tab @code{MSLLHI @var{a},#@var{b},@var{c}} +@item @code{sw1 __MSRAHI (sw1, const)} +@tab @code{@var{c} = __MSRAHI (@var{a}, @var{b})} +@tab @code{MSRAHI @var{a},#@var{b},@var{c}} +@item @code{uw1 __MSRLHI (uw1, const)} +@tab @code{@var{c} = __MSRLHI (@var{a}, @var{b})} +@tab @code{MSRLHI 
@var{a},#@var{b},@var{c}}
+@item @code{void __MSUBACCS (acc, acc)}
+@tab @code{__MSUBACCS (@var{b}, @var{a})}
+@tab @code{MSUBACCS @var{a},@var{b}}
+@item @code{sw1 __MSUBHSS (sw1, sw1)}
+@tab @code{@var{c} = __MSUBHSS (@var{a}, @var{b})}
+@tab @code{MSUBHSS @var{a},@var{b},@var{c}}
+@item @code{uw1 __MSUBHUS (uw1, uw1)}
+@tab @code{@var{c} = __MSUBHUS (@var{a}, @var{b})}
+@tab @code{MSUBHUS @var{a},@var{b},@var{c}}
+@item @code{void __MTRAP (void)}
+@tab @code{__MTRAP ()}
+@tab @code{MTRAP}
+@item @code{uw2 __MUNPACKH (uw1)}
+@tab @code{@var{b} = __MUNPACKH (@var{a})}
+@tab @code{MUNPACKH @var{a},@var{b}}
+@item @code{uw1 __MWCUT (uw2, uw1)}
+@tab @code{@var{c} = __MWCUT (@var{a}, @var{b})}
+@tab @code{MWCUT @var{a},@var{b},@var{c}}
+@item @code{void __MWTACC (acc, uw1)}
+@tab @code{__MWTACC (@var{b}, @var{a})}
+@tab @code{MWTACC @var{a},@var{b}}
+@item @code{void __MWTACCG (acc, uw1)}
+@tab @code{__MWTACCG (@var{b}, @var{a})}
+@tab @code{MWTACCG @var{a},@var{b}}
+@item @code{uw1 __MXOR (uw1, uw1)}
+@tab @code{@var{c} = __MXOR (@var{a}, @var{b})}
+@tab @code{MXOR @var{a},@var{b},@var{c}}
+@end multitable
-v4i32 __builtin_msa_fcne_w (v4f32, v4f32);
-v2i64 __builtin_msa_fcne_d (v2f64, v2f64);
+@node Raw read/write Functions
+@subsubsection Raw Read/Write Functions
-v4i32 __builtin_msa_fcor_w (v4f32, v4f32);
-v2i64 __builtin_msa_fcor_d (v2f64, v2f64);
+This section describes built-in functions related to the read and write
+instructions that access memory.  These functions generate
+@code{membar} instructions to flush the I/O loads and stores where
+appropriate, as described in the Fujitsu manual cited above.
-v4i32 __builtin_msa_fcueq_w (v4f32, v4f32);
-v2i64 __builtin_msa_fcueq_d (v2f64, v2f64);
+@table @code
-v4i32 __builtin_msa_fcule_w (v4f32, v4f32);
-v2i64 __builtin_msa_fcule_d (v2f64, v2f64);
+@item unsigned char __builtin_read8 (void *@var{data})
+@item unsigned short __builtin_read16 (void *@var{data})
+@item unsigned long __builtin_read32 (void *@var{data})
+@item unsigned long long __builtin_read64 (void *@var{data})
-v4i32 __builtin_msa_fcult_w (v4f32, v4f32);
-v2i64 __builtin_msa_fcult_d (v2f64, v2f64);
+@item void __builtin_write8 (void *@var{data}, unsigned char @var{datum})
+@item void __builtin_write16 (void *@var{data}, unsigned short @var{datum})
+@item void __builtin_write32 (void *@var{data}, unsigned long @var{datum})
+@item void __builtin_write64 (void *@var{data}, unsigned long long @var{datum})
+@end table
-v4i32 __builtin_msa_fcun_w (v4f32, v4f32);
-v2i64 __builtin_msa_fcun_d (v2f64, v2f64);
+@node Other Built-in Functions
+@subsubsection Other Built-in Functions
-v4i32 __builtin_msa_fcune_w (v4f32, v4f32);
-v2i64 __builtin_msa_fcune_d (v2f64, v2f64);
+This section describes built-in functions that are not named after
+a specific FR-V instruction.
-v4f32 __builtin_msa_fdiv_w (v4f32, v4f32);
-v2f64 __builtin_msa_fdiv_d (v2f64, v2f64);
+@table @code
+@item sw2 __IACCreadll (iacc @var{reg})
+Return the full 64-bit value of IACC0@.  The @var{reg} argument is reserved
+for future expansion and must be 0.
-v8i16 __builtin_msa_fexdo_h (v4f32, v4f32);
-v4f32 __builtin_msa_fexdo_w (v2f64, v2f64);
+@item sw1 __IACCreadl (iacc @var{reg})
+Return the value of IACC0H if @var{reg} is 0 and IACC0L if @var{reg} is 1.
+Other values of @var{reg} are rejected as invalid.
-v4f32 __builtin_msa_fexp2_w (v4f32, v4i32);
-v2f64 __builtin_msa_fexp2_d (v2f64, v2i64);
+@item void __IACCsetll (iacc @var{reg}, sw2 @var{x})
+Set the full 64-bit value of IACC0 to @var{x}.
The @var{reg} argument +is reserved for future expansion and must be 0. -v4f32 __builtin_msa_fexupl_w (v8i16); -v2f64 __builtin_msa_fexupl_d (v4f32); +@item void __IACCsetl (iacc @var{reg}, sw1 @var{x}) +Set IACC0H to @var{x} if @var{reg} is 0 and IACC0L to @var{x} if @var{reg} +is 1. Other values of @var{reg} are rejected as invalid. -v4f32 __builtin_msa_fexupr_w (v8i16); -v2f64 __builtin_msa_fexupr_d (v4f32); +@item void __data_prefetch0 (const void *@var{x}) +Use the @code{dcpl} instruction to load the contents of address @var{x} +into the data cache. -v4f32 __builtin_msa_ffint_s_w (v4i32); -v2f64 __builtin_msa_ffint_s_d (v2i64); +@item void __data_prefetch (const void *@var{x}) +Use the @code{nldub} instruction to load the contents of address @var{x} +into the data cache. The instruction is issued in slot I1@. +@end table -v4f32 __builtin_msa_ffint_u_w (v4u32); -v2f64 __builtin_msa_ffint_u_d (v2u64); +@node LoongArch Base Built-in Functions +@subsection LoongArch Base Built-in Functions -v4f32 __builtin_msa_ffql_w (v8i16); -v2f64 __builtin_msa_ffql_d (v4i32); +These built-in functions are available for LoongArch. -v4f32 __builtin_msa_ffqr_w (v8i16); -v2f64 __builtin_msa_ffqr_d (v4i32); +Data Type Description: +@itemize +@item @code{imm0_31}, a compile-time constant in range 0 to 31; +@item @code{imm0_16383}, a compile-time constant in range 0 to 16383; +@item @code{imm0_32767}, a compile-time constant in range 0 to 32767; +@item @code{imm_n2048_2047}, a compile-time constant in range -2048 to 2047; +@end itemize -v16i8 __builtin_msa_fill_b (i32); -v8i16 __builtin_msa_fill_h (i32); -v4i32 __builtin_msa_fill_w (i32); -v2i64 __builtin_msa_fill_d (i64); +The intrinsics provided are listed below: +@smallexample + unsigned int __builtin_loongarch_movfcsr2gr (imm0_31) + void __builtin_loongarch_movgr2fcsr (imm0_31, unsigned int) + void __builtin_loongarch_cacop_d (imm0_31, unsigned long int, imm_n2048_2047) + unsigned int __builtin_loongarch_cpucfg (unsigned int) + void __builtin_loongarch_asrtle_d (long int, long int) + void __builtin_loongarch_asrtgt_d (long int, long int) + long int __builtin_loongarch_lddir_d (long int, imm0_31) + void __builtin_loongarch_ldpte_d (long int, imm0_31) -v4f32 __builtin_msa_flog2_w (v4f32); -v2f64 __builtin_msa_flog2_d (v2f64); + int __builtin_loongarch_crc_w_b_w (char, int) + int __builtin_loongarch_crc_w_h_w (short, int) + int __builtin_loongarch_crc_w_w_w (int, int) + int __builtin_loongarch_crc_w_d_w (long int, int) + int __builtin_loongarch_crcc_w_b_w (char, int) + int __builtin_loongarch_crcc_w_h_w (short, int) + int __builtin_loongarch_crcc_w_w_w (int, int) + int __builtin_loongarch_crcc_w_d_w (long int, int) -v4f32 __builtin_msa_fmadd_w (v4f32, v4f32, v4f32); -v2f64 __builtin_msa_fmadd_d (v2f64, v2f64, v2f64); + unsigned int __builtin_loongarch_csrrd_w (imm0_16383) + unsigned int __builtin_loongarch_csrwr_w (unsigned int, imm0_16383) + unsigned int __builtin_loongarch_csrxchg_w (unsigned int, unsigned int, imm0_16383) + unsigned long int __builtin_loongarch_csrrd_d (imm0_16383) + unsigned long int __builtin_loongarch_csrwr_d (unsigned long int, imm0_16383) + unsigned long int __builtin_loongarch_csrxchg_d (unsigned long int, unsigned long int, imm0_16383) -v4f32 __builtin_msa_fmax_w (v4f32, v4f32); -v2f64 __builtin_msa_fmax_d (v2f64, v2f64); + unsigned char __builtin_loongarch_iocsrrd_b (unsigned int) + unsigned short __builtin_loongarch_iocsrrd_h (unsigned int) + unsigned int __builtin_loongarch_iocsrrd_w (unsigned int) + unsigned long int 
__builtin_loongarch_iocsrrd_d (unsigned int)
+  void __builtin_loongarch_iocsrwr_b (unsigned char, unsigned int)
+  void __builtin_loongarch_iocsrwr_h (unsigned short, unsigned int)
+  void __builtin_loongarch_iocsrwr_w (unsigned int, unsigned int)
+  void __builtin_loongarch_iocsrwr_d (unsigned long int, unsigned int)
-v4f32 __builtin_msa_fmax_a_w (v4f32, v4f32);
-v2f64 __builtin_msa_fmax_a_d (v2f64, v2f64);
+  void __builtin_loongarch_dbar (imm0_32767)
+  void __builtin_loongarch_ibar (imm0_32767)
-v4f32 __builtin_msa_fmin_w (v4f32, v4f32);
-v2f64 __builtin_msa_fmin_d (v2f64, v2f64);
+  void __builtin_loongarch_syscall (imm0_32767)
+  void __builtin_loongarch_break (imm0_32767)
+@end smallexample
-v4f32 __builtin_msa_fmin_a_w (v4f32, v4f32);
-v2f64 __builtin_msa_fmin_a_d (v2f64, v2f64);
+These intrinsic functions are available when using @option{-mfrecipe}.
+@smallexample
+  float __builtin_loongarch_frecipe_s (float);
+  double __builtin_loongarch_frecipe_d (double);
+  float __builtin_loongarch_frsqrte_s (float);
+  double __builtin_loongarch_frsqrte_d (double);
+@end smallexample
-v4f32 __builtin_msa_fmsub_w (v4f32, v4f32, v4f32);
-v2f64 __builtin_msa_fmsub_d (v2f64, v2f64, v2f64);
+@emph{Note:} The control registers come in 32-bit and 64-bit variants, but
+the access instructions do not distinguish between them, so GCC renames the
+control-register instructions when implementing these intrinsics.
-v4f32 __builtin_msa_fmul_w (v4f32, v4f32);
-v2f64 __builtin_msa_fmul_d (v2f64, v2f64);
+Taking the @code{csrrd} instruction as an example, the built-in functions
+are implemented as follows:
+@smallexample
+  __builtin_loongarch_csrrd_w  // Use when reading a 32-bit control register.
+  __builtin_loongarch_csrrd_d  // Use when reading a 64-bit control register.
+@end smallexample
-v4f32 __builtin_msa_frint_w (v4f32);
-v2f64 __builtin_msa_frint_d (v2f64);
+For convenience, wrapper functions for these built-ins, together with the
+types @code{__drdtime_t} and @code{__rdtime_t}, are defined in
+@code{larchintrin.h}.  To call the following functions you must include
+@code{larchintrin.h}.
-v4f32 __builtin_msa_frcp_w (v4f32); -v2f64 __builtin_msa_frcp_d (v2f64); +@smallexample + typedef struct drdtime@{ + unsigned long dvalue; + unsigned long dtimeid; + @} __drdtime_t; -v4f32 __builtin_msa_frsqrt_w (v4f32); -v2f64 __builtin_msa_frsqrt_d (v2f64); + typedef struct rdtime@{ + unsigned int value; + unsigned int timeid; + @} __rdtime_t; +@end smallexample -v4i32 __builtin_msa_fsaf_w (v4f32, v4f32); -v2i64 __builtin_msa_fsaf_d (v2f64, v2f64); +@smallexample + __drdtime_t __rdtime_d (void) + __rdtime_t __rdtimel_w (void) + __rdtime_t __rdtimeh_w (void) + unsigned int __movfcsr2gr (imm0_31) + void __movgr2fcsr (imm0_31, unsigned int) + void __cacop_d (imm0_31, unsigned long, imm_n2048_2047) + unsigned int __cpucfg (unsigned int) + void __asrtle_d (long int, long int) + void __asrtgt_d (long int, long int) + long int __lddir_d (long int, imm0_31) + void __ldpte_d (long int, imm0_31) -v4i32 __builtin_msa_fseq_w (v4f32, v4f32); -v2i64 __builtin_msa_fseq_d (v2f64, v2f64); + int __crc_w_b_w (char, int) + int __crc_w_h_w (short, int) + int __crc_w_w_w (int, int) + int __crc_w_d_w (long int, int) + int __crcc_w_b_w (char, int) + int __crcc_w_h_w (short, int) + int __crcc_w_w_w (int, int) + int __crcc_w_d_w (long int, int) -v4i32 __builtin_msa_fsle_w (v4f32, v4f32); -v2i64 __builtin_msa_fsle_d (v2f64, v2f64); + unsigned int __csrrd_w (imm0_16383) + unsigned int __csrwr_w (unsigned int, imm0_16383) + unsigned int __csrxchg_w (unsigned int, unsigned int, imm0_16383) + unsigned long __csrrd_d (imm0_16383) + unsigned long __csrwr_d (unsigned long, imm0_16383) + unsigned long __csrxchg_d (unsigned long, unsigned long, imm0_16383) -v4i32 __builtin_msa_fslt_w (v4f32, v4f32); -v2i64 __builtin_msa_fslt_d (v2f64, v2f64); + unsigned char __iocsrrd_b (unsigned int) + unsigned short __iocsrrd_h (unsigned int) + unsigned int __iocsrrd_w (unsigned int) + unsigned long __iocsrrd_d (unsigned int) + void __iocsrwr_b (unsigned char, unsigned int) + void __iocsrwr_h (unsigned short, unsigned int) + void __iocsrwr_w (unsigned int, unsigned int) + void __iocsrwr_d (unsigned long, unsigned int) -v4i32 __builtin_msa_fsne_w (v4f32, v4f32); -v2i64 __builtin_msa_fsne_d (v2f64, v2f64); + void __dbar (imm0_32767) + void __ibar (imm0_32767) -v4i32 __builtin_msa_fsor_w (v4f32, v4f32); -v2i64 __builtin_msa_fsor_d (v2f64, v2f64); + void __syscall (imm0_32767) + void __break (imm0_32767) +@end smallexample -v4f32 __builtin_msa_fsqrt_w (v4f32); -v2f64 __builtin_msa_fsqrt_d (v2f64); +These intrinsic functions are available by including @code{larchintrin.h} and +using @option{-mfrecipe}. +@smallexample + float __frecipe_s (float); + double __frecipe_d (double); + float __frsqrte_s (float); + double __frsqrte_d (double); +@end smallexample -v4f32 __builtin_msa_fsub_w (v4f32, v4f32); -v2f64 __builtin_msa_fsub_d (v2f64, v2f64); +Additional built-in functions are available for LoongArch family +processors to efficiently use 128-bit floating-point (__float128) +values. -v4i32 __builtin_msa_fsueq_w (v4f32, v4f32); -v2i64 __builtin_msa_fsueq_d (v2f64, v2f64); +The following are the basic built-in functions supported. 
+@smallexample
+__float128 __builtin_fabsq (__float128);
+__float128 __builtin_copysignq (__float128, __float128);
+__float128 __builtin_infq (void);
+__float128 __builtin_huge_valq (void);
+__float128 __builtin_nanq (void);
+__float128 __builtin_nansq (void);
+@end smallexample
-v4i32 __builtin_msa_fsule_w (v4f32, v4f32);
-v2i64 __builtin_msa_fsule_d (v2f64, v2f64);
+The following built-in function returns the value that is currently set
+in the @samp{tp} register.
+@smallexample
+  void * __builtin_thread_pointer (void)
+@end smallexample
-v4i32 __builtin_msa_fsult_w (v4f32, v4f32);
-v2i64 __builtin_msa_fsult_d (v2f64, v2f64);
+@node LoongArch SX Vector Intrinsics
+@subsection LoongArch SX Vector Intrinsics
-v4i32 __builtin_msa_fsun_w (v4f32, v4f32);
-v2i64 __builtin_msa_fsun_d (v2f64, v2f64);
+GCC provides intrinsics to access the LSX (Loongson SIMD Extension) instructions.
+The interface is made available by including @code{<lsxintrin.h>} and using
+@option{-mlsx}.
-v4i32 __builtin_msa_fsune_w (v4f32, v4f32);
-v2i64 __builtin_msa_fsune_d (v2f64, v2f64);
+The following vector typedefs are included in @code{lsxintrin.h}:
-v4i32 __builtin_msa_ftint_s_w (v4f32);
-v2i64 __builtin_msa_ftint_s_d (v2f64);
+@itemize
+@item @code{__m128i}, a 128-bit vector of fixed-point values;
+@item @code{__m128}, a 128-bit vector of single-precision floating point;
+@item @code{__m128d}, a 128-bit vector of double-precision floating point.
+@end itemize
-v4u32 __builtin_msa_ftint_u_w (v4f32);
-v2u64 __builtin_msa_ftint_u_d (v2f64);
+Instructions and their corresponding built-ins may place additional
+restrictions on their input and output values; the following pseudo types
+describe the immediate operands used below:
+@itemize
+@item @code{imm0_1}, an integer literal in range 0 to 1;
+@item @code{imm0_3}, an integer literal in range 0 to 3;
+@item @code{imm0_7}, an integer literal in range 0 to 7;
+@item @code{imm0_15}, an integer literal in range 0 to 15;
+@item @code{imm0_31}, an integer literal in range 0 to 31;
+@item @code{imm0_63}, an integer literal in range 0 to 63;
+@item @code{imm0_127}, an integer literal in range 0 to 127;
+@item @code{imm0_255}, an integer literal in range 0 to 255;
+@item @code{imm_n16_15}, an integer literal in range -16 to 15;
+@item @code{imm_n128_127}, an integer literal in range -128 to 127;
+@item @code{imm_n256_255}, an integer literal in range -256 to 255;
+@item @code{imm_n512_511}, an integer literal in range -512 to 511;
+@item @code{imm_n1024_1023}, an integer literal in range -1024 to 1023;
+@item @code{imm_n2048_2047}, an integer literal in range -2048 to 2047.
+@end itemize
-v8i16 __builtin_msa_ftq_h (v4f32, v4f32);
-v4i32 __builtin_msa_ftq_w (v2f64, v2f64);
+For convenience, GCC defines functions @code{__lsx_vrepli_@{b/h/w/d@}} and
+@code{__lsx_b[n]z_@{v/b/h/w/d@}}, which are implemented as follows:
-v4i32 __builtin_msa_ftrunc_s_w (v4f32);
-v2i64 __builtin_msa_ftrunc_s_d (v2f64);
+@smallexample
+a. @code{__lsx_vrepli_@{b/h/w/d@}}: Implements the case where the highest
+   bit of the @code{vldi} instruction's @code{i13} field is 0.
-v4u32 __builtin_msa_ftrunc_u_w (v4f32);
-v2u64 __builtin_msa_ftrunc_u_d (v2f64);
+   i13[12] == 1'b0
+   case i13[11:10] of :
+     2'b00: __lsx_vrepli_b (imm_n512_511)
+     2'b01: __lsx_vrepli_h (imm_n512_511)
+     2'b10: __lsx_vrepli_w (imm_n512_511)
+     2'b11: __lsx_vrepli_d (imm_n512_511)
-v8i16 __builtin_msa_hadd_s_h (v16i8, v16i8);
-v4i32 __builtin_msa_hadd_s_w (v8i16, v8i16);
-v2i64 __builtin_msa_hadd_s_d (v4i32, v4i32);
+b. @code{__lsx_b[n]z_@{v/b/h/w/d@}}: Defined because the @code{vseteqz}
+   class of instructions cannot be used on their own.
-v8u16 __builtin_msa_hadd_u_h (v16u8, v16u8);
-v4u32 __builtin_msa_hadd_u_w (v8u16, v8u16);
-v2u64 __builtin_msa_hadd_u_d (v4u32, v4u32);
+   __lsx_bz_v  => vseteqz.v + bcnez
+   __lsx_bnz_v => vsetnez.v + bcnez
+   __lsx_bz_b  => vsetanyeqz.b + bcnez
+   __lsx_bz_h  => vsetanyeqz.h + bcnez
+   __lsx_bz_w  => vsetanyeqz.w + bcnez
+   __lsx_bz_d  => vsetanyeqz.d + bcnez
+   __lsx_bnz_b => vsetallnez.b + bcnez
+   __lsx_bnz_h => vsetallnez.h + bcnez
+   __lsx_bnz_w => vsetallnez.w + bcnez
+   __lsx_bnz_d => vsetallnez.d + bcnez
+@end smallexample
-v8i16 __builtin_msa_hsub_s_h (v16i8, v16i8);
-v4i32 __builtin_msa_hsub_s_w (v8i16, v8i16);
-v2i64 __builtin_msa_hsub_s_d (v4i32, v4i32);
+@smallexample
+Example:
+  #include <lsxintrin.h>
+  #include <stdio.h>
-v8i16 __builtin_msa_hsub_u_h (v16u8, v16u8);
-v4i32 __builtin_msa_hsub_u_w (v8u16, v8u16);
-v2i64 __builtin_msa_hsub_u_d (v4u32, v4u32);
+  extern __m128i @var{a};
-v16i8 __builtin_msa_ilvev_b (v16i8, v16i8);
-v8i16 __builtin_msa_ilvev_h (v8i16, v8i16);
-v4i32 __builtin_msa_ilvev_w (v4i32, v4i32);
-v2i64 __builtin_msa_ilvev_d (v2i64, v2i64);
+  void
+  test (void)
+  @{
+    if (__lsx_bz_v (@var{a}))
+      printf ("1\n");
+    else
+      printf ("2\n");
+  @}
+@end smallexample
-v16i8 __builtin_msa_ilvl_b (v16i8, v16i8);
-v8i16 __builtin_msa_ilvl_h (v8i16, v8i16);
-v4i32 __builtin_msa_ilvl_w (v4i32, v4i32);
-v2i64 __builtin_msa_ilvl_d (v2i64, v2i64);
+@emph{Note:} For instructions where the destination operand is also a
+source operand (that is, only part of the destination register's bit-field
+is modified), the first argument of the built-in function is used as the
+destination operand.
-v16i8 __builtin_msa_ilvod_b (v16i8, v16i8);
-v8i16 __builtin_msa_ilvod_h (v8i16, v8i16);
-v4i32 __builtin_msa_ilvod_w (v4i32, v4i32);
-v2i64 __builtin_msa_ilvod_d (v2i64, v2i64);
+@smallexample
+Example:
+  #include <lsxintrin.h>
-v16i8 __builtin_msa_ilvr_b (v16i8, v16i8);
-v8i16 __builtin_msa_ilvr_h (v8i16, v8i16);
-v4i32 __builtin_msa_ilvr_w (v4i32, v4i32);
-v2i64 __builtin_msa_ilvr_d (v2i64, v2i64);
+  extern __m128i @var{dst};
+  extern int @var{src};
-v16i8 __builtin_msa_insert_b (v16i8, imm0_15, i32);
-v8i16 __builtin_msa_insert_h (v8i16, imm0_7, i32);
-v4i32 __builtin_msa_insert_w (v4i32, imm0_3, i32);
-v2i64 __builtin_msa_insert_d (v2i64, imm0_1, i64);
+  void
+  test (void)
+  @{
+    @var{dst} = __lsx_vinsgr2vr_b (@var{dst}, @var{src}, 3);
+  @}
+@end smallexample
-v16i8 __builtin_msa_insve_b (v16i8, imm0_15, v16i8);
-v8i16 __builtin_msa_insve_h (v8i16, imm0_7, v8i16);
-v4i32 __builtin_msa_insve_w (v4i32, imm0_3, v4i32);
-v2i64 __builtin_msa_insve_d (v2i64, imm0_1, v2i64);
+The intrinsics provided are listed below:
+@smallexample
+int __lsx_bnz_b (__m128i);
+int __lsx_bnz_d (__m128i);
+int __lsx_bnz_h (__m128i);
+int __lsx_bnz_v (__m128i);
+int __lsx_bnz_w (__m128i);
+int __lsx_bz_b (__m128i);
+int __lsx_bz_d (__m128i);
+int __lsx_bz_h (__m128i);
+int __lsx_bz_v (__m128i);
+int __lsx_bz_w (__m128i);
+__m128i __lsx_vabsd_b (__m128i, __m128i);
+__m128i __lsx_vabsd_bu (__m128i, __m128i);
+__m128i __lsx_vabsd_d (__m128i, __m128i);
+__m128i __lsx_vabsd_du (__m128i, __m128i);
+__m128i __lsx_vabsd_h (__m128i, __m128i);
+__m128i __lsx_vabsd_hu (__m128i, __m128i);
+__m128i __lsx_vabsd_w (__m128i, __m128i);
+__m128i __lsx_vabsd_wu (__m128i, __m128i);
+__m128i __lsx_vadda_b (__m128i, __m128i);
+__m128i __lsx_vadda_d (__m128i, __m128i);
+__m128i __lsx_vadda_h (__m128i, __m128i);
+__m128i __lsx_vadda_w (__m128i, __m128i);
+__m128i __lsx_vadd_b (__m128i, __m128i);
+__m128i __lsx_vadd_d (__m128i, __m128i);
+__m128i __lsx_vadd_h (__m128i, __m128i);
+__m128i __lsx_vaddi_bu (__m128i, imm0_31);
+__m128i
__lsx_vaddi_du (__m128i, imm0_31); +__m128i __lsx_vaddi_hu (__m128i, imm0_31); +__m128i __lsx_vaddi_wu (__m128i, imm0_31); +__m128i __lsx_vadd_q (__m128i, __m128i); +__m128i __lsx_vadd_w (__m128i, __m128i); +__m128i __lsx_vaddwev_d_w (__m128i, __m128i); +__m128i __lsx_vaddwev_d_wu (__m128i, __m128i); +__m128i __lsx_vaddwev_d_wu_w (__m128i, __m128i); +__m128i __lsx_vaddwev_h_b (__m128i, __m128i); +__m128i __lsx_vaddwev_h_bu (__m128i, __m128i); +__m128i __lsx_vaddwev_h_bu_b (__m128i, __m128i); +__m128i __lsx_vaddwev_q_d (__m128i, __m128i); +__m128i __lsx_vaddwev_q_du (__m128i, __m128i); +__m128i __lsx_vaddwev_q_du_d (__m128i, __m128i); +__m128i __lsx_vaddwev_w_h (__m128i, __m128i); +__m128i __lsx_vaddwev_w_hu (__m128i, __m128i); +__m128i __lsx_vaddwev_w_hu_h (__m128i, __m128i); +__m128i __lsx_vaddwod_d_w (__m128i, __m128i); +__m128i __lsx_vaddwod_d_wu (__m128i, __m128i); +__m128i __lsx_vaddwod_d_wu_w (__m128i, __m128i); +__m128i __lsx_vaddwod_h_b (__m128i, __m128i); +__m128i __lsx_vaddwod_h_bu (__m128i, __m128i); +__m128i __lsx_vaddwod_h_bu_b (__m128i, __m128i); +__m128i __lsx_vaddwod_q_d (__m128i, __m128i); +__m128i __lsx_vaddwod_q_du (__m128i, __m128i); +__m128i __lsx_vaddwod_q_du_d (__m128i, __m128i); +__m128i __lsx_vaddwod_w_h (__m128i, __m128i); +__m128i __lsx_vaddwod_w_hu (__m128i, __m128i); +__m128i __lsx_vaddwod_w_hu_h (__m128i, __m128i); +__m128i __lsx_vandi_b (__m128i, imm0_255); +__m128i __lsx_vandn_v (__m128i, __m128i); +__m128i __lsx_vand_v (__m128i, __m128i); +__m128i __lsx_vavg_b (__m128i, __m128i); +__m128i __lsx_vavg_bu (__m128i, __m128i); +__m128i __lsx_vavg_d (__m128i, __m128i); +__m128i __lsx_vavg_du (__m128i, __m128i); +__m128i __lsx_vavg_h (__m128i, __m128i); +__m128i __lsx_vavg_hu (__m128i, __m128i); +__m128i __lsx_vavgr_b (__m128i, __m128i); +__m128i __lsx_vavgr_bu (__m128i, __m128i); +__m128i __lsx_vavgr_d (__m128i, __m128i); +__m128i __lsx_vavgr_du (__m128i, __m128i); +__m128i __lsx_vavgr_h (__m128i, __m128i); +__m128i __lsx_vavgr_hu (__m128i, __m128i); +__m128i __lsx_vavgr_w (__m128i, __m128i); +__m128i __lsx_vavgr_wu (__m128i, __m128i); +__m128i __lsx_vavg_w (__m128i, __m128i); +__m128i __lsx_vavg_wu (__m128i, __m128i); +__m128i __lsx_vbitclr_b (__m128i, __m128i); +__m128i __lsx_vbitclr_d (__m128i, __m128i); +__m128i __lsx_vbitclr_h (__m128i, __m128i); +__m128i __lsx_vbitclri_b (__m128i, imm0_7); +__m128i __lsx_vbitclri_d (__m128i, imm0_63); +__m128i __lsx_vbitclri_h (__m128i, imm0_15); +__m128i __lsx_vbitclri_w (__m128i, imm0_31); +__m128i __lsx_vbitclr_w (__m128i, __m128i); +__m128i __lsx_vbitrev_b (__m128i, __m128i); +__m128i __lsx_vbitrev_d (__m128i, __m128i); +__m128i __lsx_vbitrev_h (__m128i, __m128i); +__m128i __lsx_vbitrevi_b (__m128i, imm0_7); +__m128i __lsx_vbitrevi_d (__m128i, imm0_63); +__m128i __lsx_vbitrevi_h (__m128i, imm0_15); +__m128i __lsx_vbitrevi_w (__m128i, imm0_31); +__m128i __lsx_vbitrev_w (__m128i, __m128i); +__m128i __lsx_vbitseli_b (__m128i, __m128i, imm0_255); +__m128i __lsx_vbitsel_v (__m128i, __m128i, __m128i); +__m128i __lsx_vbitset_b (__m128i, __m128i); +__m128i __lsx_vbitset_d (__m128i, __m128i); +__m128i __lsx_vbitset_h (__m128i, __m128i); +__m128i __lsx_vbitseti_b (__m128i, imm0_7); +__m128i __lsx_vbitseti_d (__m128i, imm0_63); +__m128i __lsx_vbitseti_h (__m128i, imm0_15); +__m128i __lsx_vbitseti_w (__m128i, imm0_31); +__m128i __lsx_vbitset_w (__m128i, __m128i); +__m128i __lsx_vbsll_v (__m128i, imm0_31); +__m128i __lsx_vbsrl_v (__m128i, imm0_31); +__m128i __lsx_vclo_b (__m128i); +__m128i __lsx_vclo_d (__m128i); +__m128i 
__lsx_vclo_h (__m128i); +__m128i __lsx_vclo_w (__m128i); +__m128i __lsx_vclz_b (__m128i); +__m128i __lsx_vclz_d (__m128i); +__m128i __lsx_vclz_h (__m128i); +__m128i __lsx_vclz_w (__m128i); +__m128i __lsx_vdiv_b (__m128i, __m128i); +__m128i __lsx_vdiv_bu (__m128i, __m128i); +__m128i __lsx_vdiv_d (__m128i, __m128i); +__m128i __lsx_vdiv_du (__m128i, __m128i); +__m128i __lsx_vdiv_h (__m128i, __m128i); +__m128i __lsx_vdiv_hu (__m128i, __m128i); +__m128i __lsx_vdiv_w (__m128i, __m128i); +__m128i __lsx_vdiv_wu (__m128i, __m128i); +__m128i __lsx_vexth_du_wu (__m128i); +__m128i __lsx_vexth_d_w (__m128i); +__m128i __lsx_vexth_h_b (__m128i); +__m128i __lsx_vexth_hu_bu (__m128i); +__m128i __lsx_vexth_q_d (__m128i); +__m128i __lsx_vexth_qu_du (__m128i); +__m128i __lsx_vexth_w_h (__m128i); +__m128i __lsx_vexth_wu_hu (__m128i); +__m128i __lsx_vextl_q_d (__m128i); +__m128i __lsx_vextl_qu_du (__m128i); +__m128i __lsx_vextrins_b (__m128i, __m128i, imm0_255); +__m128i __lsx_vextrins_d (__m128i, __m128i, imm0_255); +__m128i __lsx_vextrins_h (__m128i, __m128i, imm0_255); +__m128i __lsx_vextrins_w (__m128i, __m128i, imm0_255); +__m128d __lsx_vfadd_d (__m128d, __m128d); +__m128 __lsx_vfadd_s (__m128, __m128); +__m128i __lsx_vfclass_d (__m128d); +__m128i __lsx_vfclass_s (__m128); +__m128i __lsx_vfcmp_caf_d (__m128d, __m128d); +__m128i __lsx_vfcmp_caf_s (__m128, __m128); +__m128i __lsx_vfcmp_ceq_d (__m128d, __m128d); +__m128i __lsx_vfcmp_ceq_s (__m128, __m128); +__m128i __lsx_vfcmp_cle_d (__m128d, __m128d); +__m128i __lsx_vfcmp_cle_s (__m128, __m128); +__m128i __lsx_vfcmp_clt_d (__m128d, __m128d); +__m128i __lsx_vfcmp_clt_s (__m128, __m128); +__m128i __lsx_vfcmp_cne_d (__m128d, __m128d); +__m128i __lsx_vfcmp_cne_s (__m128, __m128); +__m128i __lsx_vfcmp_cor_d (__m128d, __m128d); +__m128i __lsx_vfcmp_cor_s (__m128, __m128); +__m128i __lsx_vfcmp_cueq_d (__m128d, __m128d); +__m128i __lsx_vfcmp_cueq_s (__m128, __m128); +__m128i __lsx_vfcmp_cule_d (__m128d, __m128d); +__m128i __lsx_vfcmp_cule_s (__m128, __m128); +__m128i __lsx_vfcmp_cult_d (__m128d, __m128d); +__m128i __lsx_vfcmp_cult_s (__m128, __m128); +__m128i __lsx_vfcmp_cun_d (__m128d, __m128d); +__m128i __lsx_vfcmp_cune_d (__m128d, __m128d); +__m128i __lsx_vfcmp_cune_s (__m128, __m128); +__m128i __lsx_vfcmp_cun_s (__m128, __m128); +__m128i __lsx_vfcmp_saf_d (__m128d, __m128d); +__m128i __lsx_vfcmp_saf_s (__m128, __m128); +__m128i __lsx_vfcmp_seq_d (__m128d, __m128d); +__m128i __lsx_vfcmp_seq_s (__m128, __m128); +__m128i __lsx_vfcmp_sle_d (__m128d, __m128d); +__m128i __lsx_vfcmp_sle_s (__m128, __m128); +__m128i __lsx_vfcmp_slt_d (__m128d, __m128d); +__m128i __lsx_vfcmp_slt_s (__m128, __m128); +__m128i __lsx_vfcmp_sne_d (__m128d, __m128d); +__m128i __lsx_vfcmp_sne_s (__m128, __m128); +__m128i __lsx_vfcmp_sor_d (__m128d, __m128d); +__m128i __lsx_vfcmp_sor_s (__m128, __m128); +__m128i __lsx_vfcmp_sueq_d (__m128d, __m128d); +__m128i __lsx_vfcmp_sueq_s (__m128, __m128); +__m128i __lsx_vfcmp_sule_d (__m128d, __m128d); +__m128i __lsx_vfcmp_sule_s (__m128, __m128); +__m128i __lsx_vfcmp_sult_d (__m128d, __m128d); +__m128i __lsx_vfcmp_sult_s (__m128, __m128); +__m128i __lsx_vfcmp_sun_d (__m128d, __m128d); +__m128i __lsx_vfcmp_sune_d (__m128d, __m128d); +__m128i __lsx_vfcmp_sune_s (__m128, __m128); +__m128i __lsx_vfcmp_sun_s (__m128, __m128); +__m128d __lsx_vfcvth_d_s (__m128); +__m128i __lsx_vfcvt_h_s (__m128, __m128); +__m128 __lsx_vfcvth_s_h (__m128i); +__m128d __lsx_vfcvtl_d_s (__m128); +__m128 __lsx_vfcvtl_s_h (__m128i); +__m128 __lsx_vfcvt_s_d (__m128d, __m128d); 
+__m128d __lsx_vfdiv_d (__m128d, __m128d); +__m128 __lsx_vfdiv_s (__m128, __m128); +__m128d __lsx_vffint_d_l (__m128i); +__m128d __lsx_vffint_d_lu (__m128i); +__m128d __lsx_vffinth_d_w (__m128i); +__m128d __lsx_vffintl_d_w (__m128i); +__m128 __lsx_vffint_s_l (__m128i, __m128i); +__m128 __lsx_vffint_s_w (__m128i); +__m128 __lsx_vffint_s_wu (__m128i); +__m128d __lsx_vflogb_d (__m128d); +__m128 __lsx_vflogb_s (__m128); +__m128d __lsx_vfmadd_d (__m128d, __m128d, __m128d); +__m128 __lsx_vfmadd_s (__m128, __m128, __m128); +__m128d __lsx_vfmaxa_d (__m128d, __m128d); +__m128 __lsx_vfmaxa_s (__m128, __m128); +__m128d __lsx_vfmax_d (__m128d, __m128d); +__m128 __lsx_vfmax_s (__m128, __m128); +__m128d __lsx_vfmina_d (__m128d, __m128d); +__m128 __lsx_vfmina_s (__m128, __m128); +__m128d __lsx_vfmin_d (__m128d, __m128d); +__m128 __lsx_vfmin_s (__m128, __m128); +__m128d __lsx_vfmsub_d (__m128d, __m128d, __m128d); +__m128 __lsx_vfmsub_s (__m128, __m128, __m128); +__m128d __lsx_vfmul_d (__m128d, __m128d); +__m128 __lsx_vfmul_s (__m128, __m128); +__m128d __lsx_vfnmadd_d (__m128d, __m128d, __m128d); +__m128 __lsx_vfnmadd_s (__m128, __m128, __m128); +__m128d __lsx_vfnmsub_d (__m128d, __m128d, __m128d); +__m128 __lsx_vfnmsub_s (__m128, __m128, __m128); +__m128d __lsx_vfrecip_d (__m128d); +__m128 __lsx_vfrecip_s (__m128); +__m128d __lsx_vfrint_d (__m128d); +__m128d __lsx_vfrintrm_d (__m128d); +__m128 __lsx_vfrintrm_s (__m128); +__m128d __lsx_vfrintrne_d (__m128d); +__m128 __lsx_vfrintrne_s (__m128); +__m128d __lsx_vfrintrp_d (__m128d); +__m128 __lsx_vfrintrp_s (__m128); +__m128d __lsx_vfrintrz_d (__m128d); +__m128 __lsx_vfrintrz_s (__m128); +__m128 __lsx_vfrint_s (__m128); +__m128d __lsx_vfrsqrt_d (__m128d); +__m128 __lsx_vfrsqrt_s (__m128); +__m128i __lsx_vfrstp_b (__m128i, __m128i, __m128i); +__m128i __lsx_vfrstp_h (__m128i, __m128i, __m128i); +__m128i __lsx_vfrstpi_b (__m128i, __m128i, imm0_31); +__m128i __lsx_vfrstpi_h (__m128i, __m128i, imm0_31); +__m128d __lsx_vfsqrt_d (__m128d); +__m128 __lsx_vfsqrt_s (__m128); +__m128d __lsx_vfsub_d (__m128d, __m128d); +__m128 __lsx_vfsub_s (__m128, __m128); +__m128i __lsx_vftinth_l_s (__m128); +__m128i __lsx_vftint_l_d (__m128d); +__m128i __lsx_vftintl_l_s (__m128); +__m128i __lsx_vftint_lu_d (__m128d); +__m128i __lsx_vftintrmh_l_s (__m128); +__m128i __lsx_vftintrm_l_d (__m128d); +__m128i __lsx_vftintrml_l_s (__m128); +__m128i __lsx_vftintrm_w_d (__m128d, __m128d); +__m128i __lsx_vftintrm_w_s (__m128); +__m128i __lsx_vftintrneh_l_s (__m128); +__m128i __lsx_vftintrne_l_d (__m128d); +__m128i __lsx_vftintrnel_l_s (__m128); +__m128i __lsx_vftintrne_w_d (__m128d, __m128d); +__m128i __lsx_vftintrne_w_s (__m128); +__m128i __lsx_vftintrph_l_s (__m128); +__m128i __lsx_vftintrp_l_d (__m128d); +__m128i __lsx_vftintrpl_l_s (__m128); +__m128i __lsx_vftintrp_w_d (__m128d, __m128d); +__m128i __lsx_vftintrp_w_s (__m128); +__m128i __lsx_vftintrzh_l_s (__m128); +__m128i __lsx_vftintrz_l_d (__m128d); +__m128i __lsx_vftintrzl_l_s (__m128); +__m128i __lsx_vftintrz_lu_d (__m128d); +__m128i __lsx_vftintrz_w_d (__m128d, __m128d); +__m128i __lsx_vftintrz_w_s (__m128); +__m128i __lsx_vftintrz_wu_s (__m128); +__m128i __lsx_vftint_w_d (__m128d, __m128d); +__m128i __lsx_vftint_w_s (__m128); +__m128i __lsx_vftint_wu_s (__m128); +__m128i __lsx_vhaddw_du_wu (__m128i, __m128i); +__m128i __lsx_vhaddw_d_w (__m128i, __m128i); +__m128i __lsx_vhaddw_h_b (__m128i, __m128i); +__m128i __lsx_vhaddw_hu_bu (__m128i, __m128i); +__m128i __lsx_vhaddw_q_d (__m128i, __m128i); +__m128i __lsx_vhaddw_qu_du (__m128i, 
__m128i); +__m128i __lsx_vhaddw_w_h (__m128i, __m128i); +__m128i __lsx_vhaddw_wu_hu (__m128i, __m128i); +__m128i __lsx_vhsubw_du_wu (__m128i, __m128i); +__m128i __lsx_vhsubw_d_w (__m128i, __m128i); +__m128i __lsx_vhsubw_h_b (__m128i, __m128i); +__m128i __lsx_vhsubw_hu_bu (__m128i, __m128i); +__m128i __lsx_vhsubw_q_d (__m128i, __m128i); +__m128i __lsx_vhsubw_qu_du (__m128i, __m128i); +__m128i __lsx_vhsubw_w_h (__m128i, __m128i); +__m128i __lsx_vhsubw_wu_hu (__m128i, __m128i); +__m128i __lsx_vilvh_b (__m128i, __m128i); +__m128i __lsx_vilvh_d (__m128i, __m128i); +__m128i __lsx_vilvh_h (__m128i, __m128i); +__m128i __lsx_vilvh_w (__m128i, __m128i); +__m128i __lsx_vilvl_b (__m128i, __m128i); +__m128i __lsx_vilvl_d (__m128i, __m128i); +__m128i __lsx_vilvl_h (__m128i, __m128i); +__m128i __lsx_vilvl_w (__m128i, __m128i); +__m128i __lsx_vinsgr2vr_b (__m128i, int, imm0_15); +__m128i __lsx_vinsgr2vr_d (__m128i, long int, imm0_1); +__m128i __lsx_vinsgr2vr_h (__m128i, int, imm0_7); +__m128i __lsx_vinsgr2vr_w (__m128i, int, imm0_3); +__m128i __lsx_vld (void *, imm_n2048_2047); +__m128i __lsx_vldi (imm_n1024_1023); +__m128i __lsx_vldrepl_b (void *, imm_n2048_2047); +__m128i __lsx_vldrepl_d (void *, imm_n256_255); +__m128i __lsx_vldrepl_h (void *, imm_n1024_1023); +__m128i __lsx_vldrepl_w (void *, imm_n512_511); +__m128i __lsx_vldx (void *, long int); +__m128i __lsx_vmadd_b (__m128i, __m128i, __m128i); +__m128i __lsx_vmadd_d (__m128i, __m128i, __m128i); +__m128i __lsx_vmadd_h (__m128i, __m128i, __m128i); +__m128i __lsx_vmadd_w (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_d_w (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_d_wu (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_d_wu_w (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_h_b (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_h_bu (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_h_bu_b (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_q_d (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_q_du (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_q_du_d (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_w_h (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_w_hu (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwev_w_hu_h (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_d_w (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_d_wu (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_d_wu_w (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_h_b (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_h_bu (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_h_bu_b (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_q_d (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_q_du (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_q_du_d (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_w_h (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_w_hu (__m128i, __m128i, __m128i); +__m128i __lsx_vmaddwod_w_hu_h (__m128i, __m128i, __m128i); +__m128i __lsx_vmax_b (__m128i, __m128i); +__m128i __lsx_vmax_bu (__m128i, __m128i); +__m128i __lsx_vmax_d (__m128i, __m128i); +__m128i __lsx_vmax_du (__m128i, __m128i); +__m128i __lsx_vmax_h (__m128i, __m128i); +__m128i __lsx_vmax_hu (__m128i, __m128i); +__m128i __lsx_vmaxi_b (__m128i, imm_n16_15); +__m128i __lsx_vmaxi_bu (__m128i, imm0_31); +__m128i __lsx_vmaxi_d (__m128i, imm_n16_15); +__m128i __lsx_vmaxi_du (__m128i, imm0_31); +__m128i __lsx_vmaxi_h (__m128i, imm_n16_15); +__m128i __lsx_vmaxi_hu (__m128i, imm0_31); +__m128i __lsx_vmaxi_w (__m128i, imm_n16_15); +__m128i 
__lsx_vmaxi_wu (__m128i, imm0_31); +__m128i __lsx_vmax_w (__m128i, __m128i); +__m128i __lsx_vmax_wu (__m128i, __m128i); +__m128i __lsx_vmin_b (__m128i, __m128i); +__m128i __lsx_vmin_bu (__m128i, __m128i); +__m128i __lsx_vmin_d (__m128i, __m128i); +__m128i __lsx_vmin_du (__m128i, __m128i); +__m128i __lsx_vmin_h (__m128i, __m128i); +__m128i __lsx_vmin_hu (__m128i, __m128i); +__m128i __lsx_vmini_b (__m128i, imm_n16_15); +__m128i __lsx_vmini_bu (__m128i, imm0_31); +__m128i __lsx_vmini_d (__m128i, imm_n16_15); +__m128i __lsx_vmini_du (__m128i, imm0_31); +__m128i __lsx_vmini_h (__m128i, imm_n16_15); +__m128i __lsx_vmini_hu (__m128i, imm0_31); +__m128i __lsx_vmini_w (__m128i, imm_n16_15); +__m128i __lsx_vmini_wu (__m128i, imm0_31); +__m128i __lsx_vmin_w (__m128i, __m128i); +__m128i __lsx_vmin_wu (__m128i, __m128i); +__m128i __lsx_vmod_b (__m128i, __m128i); +__m128i __lsx_vmod_bu (__m128i, __m128i); +__m128i __lsx_vmod_d (__m128i, __m128i); +__m128i __lsx_vmod_du (__m128i, __m128i); +__m128i __lsx_vmod_h (__m128i, __m128i); +__m128i __lsx_vmod_hu (__m128i, __m128i); +__m128i __lsx_vmod_w (__m128i, __m128i); +__m128i __lsx_vmod_wu (__m128i, __m128i); +__m128i __lsx_vmskgez_b (__m128i); +__m128i __lsx_vmskltz_b (__m128i); +__m128i __lsx_vmskltz_d (__m128i); +__m128i __lsx_vmskltz_h (__m128i); +__m128i __lsx_vmskltz_w (__m128i); +__m128i __lsx_vmsknz_b (__m128i); +__m128i __lsx_vmsub_b (__m128i, __m128i, __m128i); +__m128i __lsx_vmsub_d (__m128i, __m128i, __m128i); +__m128i __lsx_vmsub_h (__m128i, __m128i, __m128i); +__m128i __lsx_vmsub_w (__m128i, __m128i, __m128i); +__m128i __lsx_vmuh_b (__m128i, __m128i); +__m128i __lsx_vmuh_bu (__m128i, __m128i); +__m128i __lsx_vmuh_d (__m128i, __m128i); +__m128i __lsx_vmuh_du (__m128i, __m128i); +__m128i __lsx_vmuh_h (__m128i, __m128i); +__m128i __lsx_vmuh_hu (__m128i, __m128i); +__m128i __lsx_vmuh_w (__m128i, __m128i); +__m128i __lsx_vmuh_wu (__m128i, __m128i); +__m128i __lsx_vmul_b (__m128i, __m128i); +__m128i __lsx_vmul_d (__m128i, __m128i); +__m128i __lsx_vmul_h (__m128i, __m128i); +__m128i __lsx_vmul_w (__m128i, __m128i); +__m128i __lsx_vmulwev_d_w (__m128i, __m128i); +__m128i __lsx_vmulwev_d_wu (__m128i, __m128i); +__m128i __lsx_vmulwev_d_wu_w (__m128i, __m128i); +__m128i __lsx_vmulwev_h_b (__m128i, __m128i); +__m128i __lsx_vmulwev_h_bu (__m128i, __m128i); +__m128i __lsx_vmulwev_h_bu_b (__m128i, __m128i); +__m128i __lsx_vmulwev_q_d (__m128i, __m128i); +__m128i __lsx_vmulwev_q_du (__m128i, __m128i); +__m128i __lsx_vmulwev_q_du_d (__m128i, __m128i); +__m128i __lsx_vmulwev_w_h (__m128i, __m128i); +__m128i __lsx_vmulwev_w_hu (__m128i, __m128i); +__m128i __lsx_vmulwev_w_hu_h (__m128i, __m128i); +__m128i __lsx_vmulwod_d_w (__m128i, __m128i); +__m128i __lsx_vmulwod_d_wu (__m128i, __m128i); +__m128i __lsx_vmulwod_d_wu_w (__m128i, __m128i); +__m128i __lsx_vmulwod_h_b (__m128i, __m128i); +__m128i __lsx_vmulwod_h_bu (__m128i, __m128i); +__m128i __lsx_vmulwod_h_bu_b (__m128i, __m128i); +__m128i __lsx_vmulwod_q_d (__m128i, __m128i); +__m128i __lsx_vmulwod_q_du (__m128i, __m128i); +__m128i __lsx_vmulwod_q_du_d (__m128i, __m128i); +__m128i __lsx_vmulwod_w_h (__m128i, __m128i); +__m128i __lsx_vmulwod_w_hu (__m128i, __m128i); +__m128i __lsx_vmulwod_w_hu_h (__m128i, __m128i); +__m128i __lsx_vneg_b (__m128i); +__m128i __lsx_vneg_d (__m128i); +__m128i __lsx_vneg_h (__m128i); +__m128i __lsx_vneg_w (__m128i); +__m128i __lsx_vnori_b (__m128i, imm0_255); +__m128i __lsx_vnor_v (__m128i, __m128i); +__m128i __lsx_vori_b (__m128i, imm0_255); +__m128i __lsx_vorn_v (__m128i, __m128i); 
+__m128i __lsx_vor_v (__m128i, __m128i); +__m128i __lsx_vpackev_b (__m128i, __m128i); +__m128i __lsx_vpackev_d (__m128i, __m128i); +__m128i __lsx_vpackev_h (__m128i, __m128i); +__m128i __lsx_vpackev_w (__m128i, __m128i); +__m128i __lsx_vpackod_b (__m128i, __m128i); +__m128i __lsx_vpackod_d (__m128i, __m128i); +__m128i __lsx_vpackod_h (__m128i, __m128i); +__m128i __lsx_vpackod_w (__m128i, __m128i); +__m128i __lsx_vpcnt_b (__m128i); +__m128i __lsx_vpcnt_d (__m128i); +__m128i __lsx_vpcnt_h (__m128i); +__m128i __lsx_vpcnt_w (__m128i); +__m128i __lsx_vpermi_w (__m128i, __m128i, imm0_255); +__m128i __lsx_vpickev_b (__m128i, __m128i); +__m128i __lsx_vpickev_d (__m128i, __m128i); +__m128i __lsx_vpickev_h (__m128i, __m128i); +__m128i __lsx_vpickev_w (__m128i, __m128i); +__m128i __lsx_vpickod_b (__m128i, __m128i); +__m128i __lsx_vpickod_d (__m128i, __m128i); +__m128i __lsx_vpickod_h (__m128i, __m128i); +__m128i __lsx_vpickod_w (__m128i, __m128i); +int __lsx_vpickve2gr_b (__m128i, imm0_15); +unsigned int __lsx_vpickve2gr_bu (__m128i, imm0_15); +long int __lsx_vpickve2gr_d (__m128i, imm0_1); +unsigned long int __lsx_vpickve2gr_du (__m128i, imm0_1); +int __lsx_vpickve2gr_h (__m128i, imm0_7); +unsigned int __lsx_vpickve2gr_hu (__m128i, imm0_7); +int __lsx_vpickve2gr_w (__m128i, imm0_3); +unsigned int __lsx_vpickve2gr_wu (__m128i, imm0_3); +__m128i __lsx_vreplgr2vr_b (int); +__m128i __lsx_vreplgr2vr_d (long int); +__m128i __lsx_vreplgr2vr_h (int); +__m128i __lsx_vreplgr2vr_w (int); +__m128i __lsx_vrepli_b (imm_n512_511); +__m128i __lsx_vrepli_d (imm_n512_511); +__m128i __lsx_vrepli_h (imm_n512_511); +__m128i __lsx_vrepli_w (imm_n512_511); +__m128i __lsx_vreplve_b (__m128i, int); +__m128i __lsx_vreplve_d (__m128i, int); +__m128i __lsx_vreplve_h (__m128i, int); +__m128i __lsx_vreplvei_b (__m128i, imm0_15); +__m128i __lsx_vreplvei_d (__m128i, imm0_1); +__m128i __lsx_vreplvei_h (__m128i, imm0_7); +__m128i __lsx_vreplvei_w (__m128i, imm0_3); +__m128i __lsx_vreplve_w (__m128i, int); +__m128i __lsx_vrotr_b (__m128i, __m128i); +__m128i __lsx_vrotr_d (__m128i, __m128i); +__m128i __lsx_vrotr_h (__m128i, __m128i); +__m128i __lsx_vrotri_b (__m128i, imm0_7); +__m128i __lsx_vrotri_d (__m128i, imm0_63); +__m128i __lsx_vrotri_h (__m128i, imm0_15); +__m128i __lsx_vrotri_w (__m128i, imm0_31); +__m128i __lsx_vrotr_w (__m128i, __m128i); +__m128i __lsx_vsadd_b (__m128i, __m128i); +__m128i __lsx_vsadd_bu (__m128i, __m128i); +__m128i __lsx_vsadd_d (__m128i, __m128i); +__m128i __lsx_vsadd_du (__m128i, __m128i); +__m128i __lsx_vsadd_h (__m128i, __m128i); +__m128i __lsx_vsadd_hu (__m128i, __m128i); +__m128i __lsx_vsadd_w (__m128i, __m128i); +__m128i __lsx_vsadd_wu (__m128i, __m128i); +__m128i __lsx_vsat_b (__m128i, imm0_7); +__m128i __lsx_vsat_bu (__m128i, imm0_7); +__m128i __lsx_vsat_d (__m128i, imm0_63); +__m128i __lsx_vsat_du (__m128i, imm0_63); +__m128i __lsx_vsat_h (__m128i, imm0_15); +__m128i __lsx_vsat_hu (__m128i, imm0_15); +__m128i __lsx_vsat_w (__m128i, imm0_31); +__m128i __lsx_vsat_wu (__m128i, imm0_31); +__m128i __lsx_vseq_b (__m128i, __m128i); +__m128i __lsx_vseq_d (__m128i, __m128i); +__m128i __lsx_vseq_h (__m128i, __m128i); +__m128i __lsx_vseqi_b (__m128i, imm_n16_15); +__m128i __lsx_vseqi_d (__m128i, imm_n16_15); +__m128i __lsx_vseqi_h (__m128i, imm_n16_15); +__m128i __lsx_vseqi_w (__m128i, imm_n16_15); +__m128i __lsx_vseq_w (__m128i, __m128i); +__m128i __lsx_vshuf4i_b (__m128i, imm0_255); +__m128i __lsx_vshuf4i_d (__m128i, __m128i, imm0_255); +__m128i __lsx_vshuf4i_h (__m128i, imm0_255); +__m128i __lsx_vshuf4i_w 
(__m128i, imm0_255); +__m128i __lsx_vshuf_b (__m128i, __m128i, __m128i); +__m128i __lsx_vshuf_d (__m128i, __m128i, __m128i); +__m128i __lsx_vshuf_h (__m128i, __m128i, __m128i); +__m128i __lsx_vshuf_w (__m128i, __m128i, __m128i); +__m128i __lsx_vsigncov_b (__m128i, __m128i); +__m128i __lsx_vsigncov_d (__m128i, __m128i); +__m128i __lsx_vsigncov_h (__m128i, __m128i); +__m128i __lsx_vsigncov_w (__m128i, __m128i); +__m128i __lsx_vsle_b (__m128i, __m128i); +__m128i __lsx_vsle_bu (__m128i, __m128i); +__m128i __lsx_vsle_d (__m128i, __m128i); +__m128i __lsx_vsle_du (__m128i, __m128i); +__m128i __lsx_vsle_h (__m128i, __m128i); +__m128i __lsx_vsle_hu (__m128i, __m128i); +__m128i __lsx_vslei_b (__m128i, imm_n16_15); +__m128i __lsx_vslei_bu (__m128i, imm0_31); +__m128i __lsx_vslei_d (__m128i, imm_n16_15); +__m128i __lsx_vslei_du (__m128i, imm0_31); +__m128i __lsx_vslei_h (__m128i, imm_n16_15); +__m128i __lsx_vslei_hu (__m128i, imm0_31); +__m128i __lsx_vslei_w (__m128i, imm_n16_15); +__m128i __lsx_vslei_wu (__m128i, imm0_31); +__m128i __lsx_vsle_w (__m128i, __m128i); +__m128i __lsx_vsle_wu (__m128i, __m128i); +__m128i __lsx_vsll_b (__m128i, __m128i); +__m128i __lsx_vsll_d (__m128i, __m128i); +__m128i __lsx_vsll_h (__m128i, __m128i); +__m128i __lsx_vslli_b (__m128i, imm0_7); +__m128i __lsx_vslli_d (__m128i, imm0_63); +__m128i __lsx_vslli_h (__m128i, imm0_15); +__m128i __lsx_vslli_w (__m128i, imm0_31); +__m128i __lsx_vsll_w (__m128i, __m128i); +__m128i __lsx_vsllwil_du_wu (__m128i, imm0_31); +__m128i __lsx_vsllwil_d_w (__m128i, imm0_31); +__m128i __lsx_vsllwil_h_b (__m128i, imm0_7); +__m128i __lsx_vsllwil_hu_bu (__m128i, imm0_7); +__m128i __lsx_vsllwil_w_h (__m128i, imm0_15); +__m128i __lsx_vsllwil_wu_hu (__m128i, imm0_15); +__m128i __lsx_vslt_b (__m128i, __m128i); +__m128i __lsx_vslt_bu (__m128i, __m128i); +__m128i __lsx_vslt_d (__m128i, __m128i); +__m128i __lsx_vslt_du (__m128i, __m128i); +__m128i __lsx_vslt_h (__m128i, __m128i); +__m128i __lsx_vslt_hu (__m128i, __m128i); +__m128i __lsx_vslti_b (__m128i, imm_n16_15); +__m128i __lsx_vslti_bu (__m128i, imm0_31); +__m128i __lsx_vslti_d (__m128i, imm_n16_15); +__m128i __lsx_vslti_du (__m128i, imm0_31); +__m128i __lsx_vslti_h (__m128i, imm_n16_15); +__m128i __lsx_vslti_hu (__m128i, imm0_31); +__m128i __lsx_vslti_w (__m128i, imm_n16_15); +__m128i __lsx_vslti_wu (__m128i, imm0_31); +__m128i __lsx_vslt_w (__m128i, __m128i); +__m128i __lsx_vslt_wu (__m128i, __m128i); +__m128i __lsx_vsra_b (__m128i, __m128i); +__m128i __lsx_vsra_d (__m128i, __m128i); +__m128i __lsx_vsra_h (__m128i, __m128i); +__m128i __lsx_vsrai_b (__m128i, imm0_7); +__m128i __lsx_vsrai_d (__m128i, imm0_63); +__m128i __lsx_vsrai_h (__m128i, imm0_15); +__m128i __lsx_vsrai_w (__m128i, imm0_31); +__m128i __lsx_vsran_b_h (__m128i, __m128i); +__m128i __lsx_vsran_h_w (__m128i, __m128i); +__m128i __lsx_vsrani_b_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vsrani_d_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vsrani_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vsrani_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vsran_w_d (__m128i, __m128i); +__m128i __lsx_vsrar_b (__m128i, __m128i); +__m128i __lsx_vsrar_d (__m128i, __m128i); +__m128i __lsx_vsrar_h (__m128i, __m128i); +__m128i __lsx_vsrari_b (__m128i, imm0_7); +__m128i __lsx_vsrari_d (__m128i, imm0_63); +__m128i __lsx_vsrari_h (__m128i, imm0_15); +__m128i __lsx_vsrari_w (__m128i, imm0_31); +__m128i __lsx_vsrarn_b_h (__m128i, __m128i); +__m128i __lsx_vsrarn_h_w (__m128i, __m128i); +__m128i __lsx_vsrarni_b_h (__m128i, __m128i, imm0_15); +__m128i 
__lsx_vsrarni_d_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vsrarni_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vsrarni_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vsrarn_w_d (__m128i, __m128i); +__m128i __lsx_vsrar_w (__m128i, __m128i); +__m128i __lsx_vsra_w (__m128i, __m128i); +__m128i __lsx_vsrl_b (__m128i, __m128i); +__m128i __lsx_vsrl_d (__m128i, __m128i); +__m128i __lsx_vsrl_h (__m128i, __m128i); +__m128i __lsx_vsrli_b (__m128i, imm0_7); +__m128i __lsx_vsrli_d (__m128i, imm0_63); +__m128i __lsx_vsrli_h (__m128i, imm0_15); +__m128i __lsx_vsrli_w (__m128i, imm0_31); +__m128i __lsx_vsrln_b_h (__m128i, __m128i); +__m128i __lsx_vsrln_h_w (__m128i, __m128i); +__m128i __lsx_vsrlni_b_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vsrlni_d_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vsrlni_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vsrlni_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vsrln_w_d (__m128i, __m128i); +__m128i __lsx_vsrlr_b (__m128i, __m128i); +__m128i __lsx_vsrlr_d (__m128i, __m128i); +__m128i __lsx_vsrlr_h (__m128i, __m128i); +__m128i __lsx_vsrlri_b (__m128i, imm0_7); +__m128i __lsx_vsrlri_d (__m128i, imm0_63); +__m128i __lsx_vsrlri_h (__m128i, imm0_15); +__m128i __lsx_vsrlri_w (__m128i, imm0_31); +__m128i __lsx_vsrlrn_b_h (__m128i, __m128i); +__m128i __lsx_vsrlrn_h_w (__m128i, __m128i); +__m128i __lsx_vsrlrni_b_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vsrlrni_d_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vsrlrni_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vsrlrni_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vsrlrn_w_d (__m128i, __m128i); +__m128i __lsx_vsrlr_w (__m128i, __m128i); +__m128i __lsx_vsrl_w (__m128i, __m128i); +__m128i __lsx_vssran_b_h (__m128i, __m128i); +__m128i __lsx_vssran_bu_h (__m128i, __m128i); +__m128i __lsx_vssran_hu_w (__m128i, __m128i); +__m128i __lsx_vssran_h_w (__m128i, __m128i); +__m128i __lsx_vssrani_b_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrani_bu_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrani_d_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vssrani_du_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vssrani_hu_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrani_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrani_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssrani_wu_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssran_w_d (__m128i, __m128i); +__m128i __lsx_vssran_wu_d (__m128i, __m128i); +__m128i __lsx_vssrarn_b_h (__m128i, __m128i); +__m128i __lsx_vssrarn_bu_h (__m128i, __m128i); +__m128i __lsx_vssrarn_hu_w (__m128i, __m128i); +__m128i __lsx_vssrarn_h_w (__m128i, __m128i); +__m128i __lsx_vssrarni_b_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrarni_bu_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrarni_d_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vssrarni_du_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vssrarni_hu_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrarni_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrarni_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssrarni_wu_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssrarn_w_d (__m128i, __m128i); +__m128i __lsx_vssrarn_wu_d (__m128i, __m128i); +__m128i __lsx_vssrln_b_h (__m128i, __m128i); +__m128i __lsx_vssrln_bu_h (__m128i, __m128i); +__m128i __lsx_vssrln_hu_w (__m128i, __m128i); +__m128i __lsx_vssrln_h_w (__m128i, __m128i); +__m128i __lsx_vssrlni_b_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrlni_bu_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrlni_d_q (__m128i, __m128i, imm0_127); +__m128i 
__lsx_vssrlni_du_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vssrlni_hu_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrlni_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrlni_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssrlni_wu_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssrln_w_d (__m128i, __m128i); +__m128i __lsx_vssrln_wu_d (__m128i, __m128i); +__m128i __lsx_vssrlrn_b_h (__m128i, __m128i); +__m128i __lsx_vssrlrn_bu_h (__m128i, __m128i); +__m128i __lsx_vssrlrn_hu_w (__m128i, __m128i); +__m128i __lsx_vssrlrn_h_w (__m128i, __m128i); +__m128i __lsx_vssrlrni_b_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrlrni_bu_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrlrni_d_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vssrlrni_du_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vssrlrni_hu_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrlrni_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrlrni_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssrlrni_wu_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssrlrn_w_d (__m128i, __m128i); +__m128i __lsx_vssrlrn_wu_d (__m128i, __m128i); +__m128i __lsx_vssub_b (__m128i, __m128i); +__m128i __lsx_vssub_bu (__m128i, __m128i); +__m128i __lsx_vssub_d (__m128i, __m128i); +__m128i __lsx_vssub_du (__m128i, __m128i); +__m128i __lsx_vssub_h (__m128i, __m128i); +__m128i __lsx_vssub_hu (__m128i, __m128i); +__m128i __lsx_vssub_w (__m128i, __m128i); +__m128i __lsx_vssub_wu (__m128i, __m128i); +void __lsx_vst (__m128i, void *, imm_n2048_2047); +void __lsx_vstelm_b (__m128i, void *, imm_n128_127, imm0_15); +void __lsx_vstelm_d (__m128i, void *, imm_n128_127, imm0_1); +void __lsx_vstelm_h (__m128i, void *, imm_n128_127, imm0_7); +void __lsx_vstelm_w (__m128i, void *, imm_n128_127, imm0_3); +void __lsx_vstx (__m128i, void *, long int); +__m128i __lsx_vsub_b (__m128i, __m128i); +__m128i __lsx_vsub_d (__m128i, __m128i); +__m128i __lsx_vsub_h (__m128i, __m128i); +__m128i __lsx_vsubi_bu (__m128i, imm0_31); +__m128i __lsx_vsubi_du (__m128i, imm0_31); +__m128i __lsx_vsubi_hu (__m128i, imm0_31); +__m128i __lsx_vsubi_wu (__m128i, imm0_31); +__m128i __lsx_vsub_q (__m128i, __m128i); +__m128i __lsx_vsub_w (__m128i, __m128i); +__m128i __lsx_vsubwev_d_w (__m128i, __m128i); +__m128i __lsx_vsubwev_d_wu (__m128i, __m128i); +__m128i __lsx_vsubwev_h_b (__m128i, __m128i); +__m128i __lsx_vsubwev_h_bu (__m128i, __m128i); +__m128i __lsx_vsubwev_q_d (__m128i, __m128i); +__m128i __lsx_vsubwev_q_du (__m128i, __m128i); +__m128i __lsx_vsubwev_w_h (__m128i, __m128i); +__m128i __lsx_vsubwev_w_hu (__m128i, __m128i); +__m128i __lsx_vsubwod_d_w (__m128i, __m128i); +__m128i __lsx_vsubwod_d_wu (__m128i, __m128i); +__m128i __lsx_vsubwod_h_b (__m128i, __m128i); +__m128i __lsx_vsubwod_h_bu (__m128i, __m128i); +__m128i __lsx_vsubwod_q_d (__m128i, __m128i); +__m128i __lsx_vsubwod_q_du (__m128i, __m128i); +__m128i __lsx_vsubwod_w_h (__m128i, __m128i); +__m128i __lsx_vsubwod_w_hu (__m128i, __m128i); +__m128i __lsx_vxori_b (__m128i, imm0_255); +__m128i __lsx_vxor_v (__m128i, __m128i); +@end smallexample -v16i8 __builtin_msa_ld_b (const void *, imm_n512_511); -v8i16 __builtin_msa_ld_h (const void *, imm_n1024_1022); -v4i32 __builtin_msa_ld_w (const void *, imm_n2048_2044); -v2i64 __builtin_msa_ld_d (const void *, imm_n4096_4088); +These intrinsic functions are available by including @code{lsxintrin.h} and +using @option{-mfrecipe} and @option{-mlsx}. 
+@smallexample
+__m128d __lsx_vfrecipe_d (__m128d);
+__m128 __lsx_vfrecipe_s (__m128);
+__m128d __lsx_vfrsqrte_d (__m128d);
+__m128 __lsx_vfrsqrte_s (__m128);
+@end smallexample
-v16i8 __builtin_msa_ldi_b (imm_n512_511);
-v8i16 __builtin_msa_ldi_h (imm_n512_511);
-v4i32 __builtin_msa_ldi_w (imm_n512_511);
-v2i64 __builtin_msa_ldi_d (imm_n512_511);
+@node LoongArch ASX Vector Intrinsics
+@subsection LoongArch ASX Vector Intrinsics
-v8i16 __builtin_msa_madd_q_h (v8i16, v8i16, v8i16);
-v4i32 __builtin_msa_madd_q_w (v4i32, v4i32, v4i32);
+GCC provides intrinsics to access the LASX (Loongson Advanced SIMD Extension)
+instructions.  The interface is made available by including
+@code{<lasxintrin.h>} and using @option{-mlasx}.
-v8i16 __builtin_msa_maddr_q_h (v8i16, v8i16, v8i16);
-v4i32 __builtin_msa_maddr_q_w (v4i32, v4i32, v4i32);
+The following vector typedefs are included in @code{lasxintrin.h}:
-v16i8 __builtin_msa_maddv_b (v16i8, v16i8, v16i8);
-v8i16 __builtin_msa_maddv_h (v8i16, v8i16, v8i16);
-v4i32 __builtin_msa_maddv_w (v4i32, v4i32, v4i32);
-v2i64 __builtin_msa_maddv_d (v2i64, v2i64, v2i64);
+@itemize
+@item @code{__m256i}, a 256-bit vector of fixed point;
+@item @code{__m256}, a 256-bit vector of single precision floating point;
+@item @code{__m256d}, a 256-bit vector of double precision floating point.
+@end itemize
-v16i8 __builtin_msa_max_a_b (v16i8, v16i8);
-v8i16 __builtin_msa_max_a_h (v8i16, v8i16);
-v4i32 __builtin_msa_max_a_w (v4i32, v4i32);
-v2i64 __builtin_msa_max_a_d (v2i64, v2i64);
+Instructions and their corresponding built-ins may place additional
+restrictions on their operands.  In the prototypes below, the placeholder
+name of an immediate operand indicates the range of integer literals it
+accepts:
-v16i8 __builtin_msa_max_s_b (v16i8, v16i8);
-v8i16 __builtin_msa_max_s_h (v8i16, v8i16);
-v4i32 __builtin_msa_max_s_w (v4i32, v4i32);
-v2i64 __builtin_msa_max_s_d (v2i64, v2i64);
+@itemize
+@item @code{imm0_1}, an integer literal in range 0 to 1.
+@item @code{imm0_3}, an integer literal in range 0 to 3.
+@item @code{imm0_7}, an integer literal in range 0 to 7.
+@item @code{imm0_15}, an integer literal in range 0 to 15.
+@item @code{imm0_31}, an integer literal in range 0 to 31.
+@item @code{imm0_63}, an integer literal in range 0 to 63.
+@item @code{imm0_127}, an integer literal in range 0 to 127.
+@item @code{imm0_255}, an integer literal in range 0 to 255.
+@item @code{imm_n16_15}, an integer literal in range -16 to 15.
+@item @code{imm_n128_127}, an integer literal in range -128 to 127.
+@item @code{imm_n256_255}, an integer literal in range -256 to 255.
+@item @code{imm_n512_511}, an integer literal in range -512 to 511.
+@item @code{imm_n1024_1023}, an integer literal in range -1024 to 1023.
+@item @code{imm_n2048_2047}, an integer literal in range -2048 to 2047.
+@end itemize
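+For example, the second operand of @code{__lasx_xvaddi_wu} is listed
+below as @code{imm0_31}, so it must be an integer literal between 0 and
+31; a minimal sketch (the variable name is illustrative only):
+
+@smallexample
+#include <lasxintrin.h>
+
+extern __m256i @var{a};
+
+void
+test (void)
+@{
+  @var{a} = __lasx_xvaddi_wu (@var{a}, 17);  /* OK: 17 is a literal in 0..31.  */
+@}
+@end smallexample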
-v16u8 __builtin_msa_max_u_b (v16u8, v16u8);
-v8u16 __builtin_msa_max_u_h (v8u16, v8u16);
-v4u32 __builtin_msa_max_u_w (v4u32, v4u32);
-v2u64 __builtin_msa_max_u_d (v2u64, v2u64);
+For convenience, GCC defines functions @code{__lasx_xvrepli_@{b/h/w/d@}} and
+@code{__lasx_b[n]z_@{v/b/h/w/d@}}, which are implemented as follows:
-v16i8 __builtin_msa_maxi_s_b (v16i8, imm_n16_15);
-v8i16 __builtin_msa_maxi_s_h (v8i16, imm_n16_15);
-v4i32 __builtin_msa_maxi_s_w (v4i32, imm_n16_15);
-v2i64 __builtin_msa_maxi_s_d (v2i64, imm_n16_15);
+@smallexample
+a. @code{__lasx_xvrepli_@{b/h/w/d@}}: Implements the case where the highest
+   bit of the @code{xvldi} instruction's immediate @code{i13} is 0.
-v16u8 __builtin_msa_maxi_u_b (v16u8, imm0_31);
-v8u16 __builtin_msa_maxi_u_h (v8u16, imm0_31);
-v4u32 __builtin_msa_maxi_u_w (v4u32, imm0_31);
-v2u64 __builtin_msa_maxi_u_d (v2u64, imm0_31);
+   i13[12] == 1'b0
+   case i13[11:10] of :
+     2'b00: __lasx_xvrepli_b (imm_n512_511)
+     2'b01: __lasx_xvrepli_h (imm_n512_511)
+     2'b10: __lasx_xvrepli_w (imm_n512_511)
+     2'b11: __lasx_xvrepli_d (imm_n512_511)
-v16i8 __builtin_msa_min_a_b (v16i8, v16i8);
-v8i16 __builtin_msa_min_a_h (v8i16, v8i16);
-v4i32 __builtin_msa_min_a_w (v4i32, v4i32);
-v2i64 __builtin_msa_min_a_d (v2i64, v2i64);
+b. @code{__lasx_b[n]z_@{v/b/h/w/d@}}: Defined because the @code{xvseteqz}
+   class of instructions cannot be used on its own; each function expands
+   to a vector test plus a conditional branch.
-v16i8 __builtin_msa_min_s_b (v16i8, v16i8);
-v8i16 __builtin_msa_min_s_h (v8i16, v8i16);
-v4i32 __builtin_msa_min_s_w (v4i32, v4i32);
-v2i64 __builtin_msa_min_s_d (v2i64, v2i64);
+   __lasx_xbz_v  => xvseteqz.v + bcnez
+   __lasx_xbnz_v => xvsetnez.v + bcnez
+   __lasx_xbz_b  => xvsetanyeqz.b + bcnez
+   __lasx_xbz_h  => xvsetanyeqz.h + bcnez
+   __lasx_xbz_w  => xvsetanyeqz.w + bcnez
+   __lasx_xbz_d  => xvsetanyeqz.d + bcnez
+   __lasx_xbnz_b => xvsetallnez.b + bcnez
+   __lasx_xbnz_h => xvsetallnez.h + bcnez
+   __lasx_xbnz_w => xvsetallnez.w + bcnez
+   __lasx_xbnz_d => xvsetallnez.d + bcnez
+@end smallexample
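+For example, @code{__lasx_xvrepli_b (1)} produces a vector with each of
+its 32 bytes set to 1 (a sketch, not part of the listing above):
+
+@smallexample
+#include <lasxintrin.h>
+
+__m256i ones;
+
+void
+test (void)
+@{
+  ones = __lasx_xvrepli_b (1);
+@}
+@end smallexample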
-v16u8 __builtin_msa_min_u_b (v16u8, v16u8);
-v8u16 __builtin_msa_min_u_h (v8u16, v8u16);
-v4u32 __builtin_msa_min_u_w (v4u32, v4u32);
-v2u64 __builtin_msa_min_u_d (v2u64, v2u64);
+@smallexample
+eg:
+  #include <lasxintrin.h>
-v16i8 __builtin_msa_mini_s_b (v16i8, imm_n16_15);
-v8i16 __builtin_msa_mini_s_h (v8i16, imm_n16_15);
-v4i32 __builtin_msa_mini_s_w (v4i32, imm_n16_15);
-v2i64 __builtin_msa_mini_s_d (v2i64, imm_n16_15);
+  extern __m256i @var{a};
-v16u8 __builtin_msa_mini_u_b (v16u8, imm0_31);
-v8u16 __builtin_msa_mini_u_h (v8u16, imm0_31);
-v4u32 __builtin_msa_mini_u_w (v4u32, imm0_31);
-v2u64 __builtin_msa_mini_u_d (v2u64, imm0_31);
+  void
+  test (void)
+  @{
+    if (__lasx_xbz_v (@var{a}))
+      printf ("1\n");
+    else
+      printf ("2\n");
+  @}
+@end smallexample
-v16i8 __builtin_msa_mod_s_b (v16i8, v16i8);
-v8i16 __builtin_msa_mod_s_h (v8i16, v8i16);
-v4i32 __builtin_msa_mod_s_w (v4i32, v4i32);
-v2i64 __builtin_msa_mod_s_d (v2i64, v2i64);
+@emph{Note:} For instructions whose destination operand is also a source
+operand (i.e.@: only part of the destination register is modified), the
+first argument of the built-in function supplies the destination operand.
-v16u8 __builtin_msa_mod_u_b (v16u8, v16u8);
-v8u16 __builtin_msa_mod_u_h (v8u16, v8u16);
-v4u32 __builtin_msa_mod_u_w (v4u32, v4u32);
-v2u64 __builtin_msa_mod_u_d (v2u64, v2u64);
+@smallexample
+eg:
+  #include <lasxintrin.h>
+  extern __m256i @var{dst};
+  int @var{src};
-v16i8 __builtin_msa_move_v (v16i8);
+  void
+  test (void)
+  @{
+    @var{dst} = __lasx_xvinsgr2vr_w (@var{dst}, @var{src}, 3);
+  @}
+@end smallexample
-v8i16 __builtin_msa_msub_q_h (v8i16, v8i16, v8i16);
-v4i32 __builtin_msa_msub_q_w (v4i32, v4i32, v4i32);
-v8i16 __builtin_msa_msubr_q_h (v8i16, v8i16, v8i16);
-v4i32 __builtin_msa_msubr_q_w (v4i32, v4i32, v4i32);
+The intrinsics provided are listed below:
+
+@smallexample
+__m256i __lasx_vext2xv_d_b (__m256i);
+__m256i __lasx_vext2xv_d_h (__m256i);
+__m256i __lasx_vext2xv_du_bu (__m256i);
+__m256i __lasx_vext2xv_du_hu (__m256i);
+__m256i __lasx_vext2xv_du_wu (__m256i);
+__m256i __lasx_vext2xv_d_w (__m256i);
+__m256i __lasx_vext2xv_h_b (__m256i);
+__m256i __lasx_vext2xv_hu_bu (__m256i);
+__m256i __lasx_vext2xv_w_b (__m256i);
+__m256i __lasx_vext2xv_w_h (__m256i);
+__m256i __lasx_vext2xv_wu_bu (__m256i);
+__m256i __lasx_vext2xv_wu_hu (__m256i);
+int __lasx_xbnz_b (__m256i);
+int __lasx_xbnz_d (__m256i);
+int __lasx_xbnz_h (__m256i);
+int __lasx_xbnz_v (__m256i);
+int __lasx_xbnz_w (__m256i);
+int __lasx_xbz_b (__m256i);
+int __lasx_xbz_d (__m256i);
+int __lasx_xbz_h (__m256i);
+int __lasx_xbz_v (__m256i);
+int __lasx_xbz_w (__m256i);
+__m256i __lasx_xvabsd_b (__m256i, __m256i);
+__m256i __lasx_xvabsd_bu (__m256i, __m256i);
+__m256i __lasx_xvabsd_d (__m256i, __m256i);
+__m256i __lasx_xvabsd_du (__m256i, __m256i);
+__m256i __lasx_xvabsd_h (__m256i, __m256i);
+__m256i __lasx_xvabsd_hu (__m256i, __m256i);
+__m256i __lasx_xvabsd_w (__m256i, __m256i);
+__m256i __lasx_xvabsd_wu (__m256i, __m256i);
+__m256i __lasx_xvadda_b (__m256i, __m256i);
+__m256i __lasx_xvadda_d (__m256i, __m256i);
+__m256i __lasx_xvadda_h (__m256i, __m256i);
+__m256i __lasx_xvadda_w (__m256i, __m256i);
+__m256i __lasx_xvadd_b (__m256i, __m256i);
+__m256i __lasx_xvadd_d (__m256i, __m256i);
+__m256i __lasx_xvadd_h (__m256i, __m256i);
+__m256i __lasx_xvaddi_bu (__m256i, imm0_31);
+__m256i __lasx_xvaddi_du (__m256i, imm0_31);
+__m256i __lasx_xvaddi_hu (__m256i, imm0_31);
+__m256i __lasx_xvaddi_wu (__m256i, imm0_31);
+__m256i __lasx_xvadd_q (__m256i, __m256i);
+__m256i __lasx_xvadd_w (__m256i, __m256i);
+__m256i __lasx_xvaddwev_d_w (__m256i, __m256i);
+__m256i __lasx_xvaddwev_d_wu (__m256i, __m256i);
+__m256i __lasx_xvaddwev_d_wu_w (__m256i, __m256i);
+__m256i __lasx_xvaddwev_h_b (__m256i, __m256i);
+__m256i __lasx_xvaddwev_h_bu (__m256i, __m256i);
+__m256i __lasx_xvaddwev_h_bu_b (__m256i, __m256i);
+__m256i __lasx_xvaddwev_q_d (__m256i, __m256i);
+__m256i __lasx_xvaddwev_q_du (__m256i, __m256i);
+__m256i __lasx_xvaddwev_q_du_d (__m256i, __m256i);
+__m256i __lasx_xvaddwev_w_h (__m256i, __m256i);
+__m256i __lasx_xvaddwev_w_hu (__m256i, __m256i);
+__m256i __lasx_xvaddwev_w_hu_h (__m256i, __m256i);
+__m256i __lasx_xvaddwod_d_w (__m256i, __m256i);
+__m256i __lasx_xvaddwod_d_wu (__m256i, __m256i);
+__m256i __lasx_xvaddwod_d_wu_w (__m256i, __m256i);
+__m256i __lasx_xvaddwod_h_b (__m256i, __m256i);
+__m256i __lasx_xvaddwod_h_bu (__m256i, __m256i);
+__m256i __lasx_xvaddwod_h_bu_b (__m256i, __m256i);
+__m256i __lasx_xvaddwod_q_d (__m256i, __m256i);
+__m256i __lasx_xvaddwod_q_du (__m256i, __m256i);
+__m256i __lasx_xvaddwod_q_du_d (__m256i, __m256i);
+__m256i __lasx_xvaddwod_w_h (__m256i, __m256i);
+__m256i __lasx_xvaddwod_w_hu (__m256i,
__m256i); +__m256i __lasx_xvaddwod_w_hu_h (__m256i, __m256i); +__m256i __lasx_xvandi_b (__m256i, imm0_255); +__m256i __lasx_xvandn_v (__m256i, __m256i); +__m256i __lasx_xvand_v (__m256i, __m256i); +__m256i __lasx_xvavg_b (__m256i, __m256i); +__m256i __lasx_xvavg_bu (__m256i, __m256i); +__m256i __lasx_xvavg_d (__m256i, __m256i); +__m256i __lasx_xvavg_du (__m256i, __m256i); +__m256i __lasx_xvavg_h (__m256i, __m256i); +__m256i __lasx_xvavg_hu (__m256i, __m256i); +__m256i __lasx_xvavgr_b (__m256i, __m256i); +__m256i __lasx_xvavgr_bu (__m256i, __m256i); +__m256i __lasx_xvavgr_d (__m256i, __m256i); +__m256i __lasx_xvavgr_du (__m256i, __m256i); +__m256i __lasx_xvavgr_h (__m256i, __m256i); +__m256i __lasx_xvavgr_hu (__m256i, __m256i); +__m256i __lasx_xvavgr_w (__m256i, __m256i); +__m256i __lasx_xvavgr_wu (__m256i, __m256i); +__m256i __lasx_xvavg_w (__m256i, __m256i); +__m256i __lasx_xvavg_wu (__m256i, __m256i); +__m256i __lasx_xvbitclr_b (__m256i, __m256i); +__m256i __lasx_xvbitclr_d (__m256i, __m256i); +__m256i __lasx_xvbitclr_h (__m256i, __m256i); +__m256i __lasx_xvbitclri_b (__m256i, imm0_7); +__m256i __lasx_xvbitclri_d (__m256i, imm0_63); +__m256i __lasx_xvbitclri_h (__m256i, imm0_15); +__m256i __lasx_xvbitclri_w (__m256i, imm0_31); +__m256i __lasx_xvbitclr_w (__m256i, __m256i); +__m256i __lasx_xvbitrev_b (__m256i, __m256i); +__m256i __lasx_xvbitrev_d (__m256i, __m256i); +__m256i __lasx_xvbitrev_h (__m256i, __m256i); +__m256i __lasx_xvbitrevi_b (__m256i, imm0_7); +__m256i __lasx_xvbitrevi_d (__m256i, imm0_63); +__m256i __lasx_xvbitrevi_h (__m256i, imm0_15); +__m256i __lasx_xvbitrevi_w (__m256i, imm0_31); +__m256i __lasx_xvbitrev_w (__m256i, __m256i); +__m256i __lasx_xvbitseli_b (__m256i, __m256i, imm0_255); +__m256i __lasx_xvbitsel_v (__m256i, __m256i, __m256i); +__m256i __lasx_xvbitset_b (__m256i, __m256i); +__m256i __lasx_xvbitset_d (__m256i, __m256i); +__m256i __lasx_xvbitset_h (__m256i, __m256i); +__m256i __lasx_xvbitseti_b (__m256i, imm0_7); +__m256i __lasx_xvbitseti_d (__m256i, imm0_63); +__m256i __lasx_xvbitseti_h (__m256i, imm0_15); +__m256i __lasx_xvbitseti_w (__m256i, imm0_31); +__m256i __lasx_xvbitset_w (__m256i, __m256i); +__m256i __lasx_xvbsll_v (__m256i, imm0_31); +__m256i __lasx_xvbsrl_v (__m256i, imm0_31); +__m256i __lasx_xvclo_b (__m256i); +__m256i __lasx_xvclo_d (__m256i); +__m256i __lasx_xvclo_h (__m256i); +__m256i __lasx_xvclo_w (__m256i); +__m256i __lasx_xvclz_b (__m256i); +__m256i __lasx_xvclz_d (__m256i); +__m256i __lasx_xvclz_h (__m256i); +__m256i __lasx_xvclz_w (__m256i); +__m256i __lasx_xvdiv_b (__m256i, __m256i); +__m256i __lasx_xvdiv_bu (__m256i, __m256i); +__m256i __lasx_xvdiv_d (__m256i, __m256i); +__m256i __lasx_xvdiv_du (__m256i, __m256i); +__m256i __lasx_xvdiv_h (__m256i, __m256i); +__m256i __lasx_xvdiv_hu (__m256i, __m256i); +__m256i __lasx_xvdiv_w (__m256i, __m256i); +__m256i __lasx_xvdiv_wu (__m256i, __m256i); +__m256i __lasx_xvexth_du_wu (__m256i); +__m256i __lasx_xvexth_d_w (__m256i); +__m256i __lasx_xvexth_h_b (__m256i); +__m256i __lasx_xvexth_hu_bu (__m256i); +__m256i __lasx_xvexth_q_d (__m256i); +__m256i __lasx_xvexth_qu_du (__m256i); +__m256i __lasx_xvexth_w_h (__m256i); +__m256i __lasx_xvexth_wu_hu (__m256i); +__m256i __lasx_xvextl_q_d (__m256i); +__m256i __lasx_xvextl_qu_du (__m256i); +__m256i __lasx_xvextrins_b (__m256i, __m256i, imm0_255); +__m256i __lasx_xvextrins_d (__m256i, __m256i, imm0_255); +__m256i __lasx_xvextrins_h (__m256i, __m256i, imm0_255); +__m256i __lasx_xvextrins_w (__m256i, __m256i, imm0_255); +__m256d __lasx_xvfadd_d (__m256d, 
__m256d); +__m256 __lasx_xvfadd_s (__m256, __m256); +__m256i __lasx_xvfclass_d (__m256d); +__m256i __lasx_xvfclass_s (__m256); +__m256i __lasx_xvfcmp_caf_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_caf_s (__m256, __m256); +__m256i __lasx_xvfcmp_ceq_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_ceq_s (__m256, __m256); +__m256i __lasx_xvfcmp_cle_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cle_s (__m256, __m256); +__m256i __lasx_xvfcmp_clt_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_clt_s (__m256, __m256); +__m256i __lasx_xvfcmp_cne_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cne_s (__m256, __m256); +__m256i __lasx_xvfcmp_cor_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cor_s (__m256, __m256); +__m256i __lasx_xvfcmp_cueq_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cueq_s (__m256, __m256); +__m256i __lasx_xvfcmp_cule_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cule_s (__m256, __m256); +__m256i __lasx_xvfcmp_cult_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cult_s (__m256, __m256); +__m256i __lasx_xvfcmp_cun_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cune_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cune_s (__m256, __m256); +__m256i __lasx_xvfcmp_cun_s (__m256, __m256); +__m256i __lasx_xvfcmp_saf_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_saf_s (__m256, __m256); +__m256i __lasx_xvfcmp_seq_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_seq_s (__m256, __m256); +__m256i __lasx_xvfcmp_sle_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sle_s (__m256, __m256); +__m256i __lasx_xvfcmp_slt_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_slt_s (__m256, __m256); +__m256i __lasx_xvfcmp_sne_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sne_s (__m256, __m256); +__m256i __lasx_xvfcmp_sor_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sor_s (__m256, __m256); +__m256i __lasx_xvfcmp_sueq_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sueq_s (__m256, __m256); +__m256i __lasx_xvfcmp_sule_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sule_s (__m256, __m256); +__m256i __lasx_xvfcmp_sult_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sult_s (__m256, __m256); +__m256i __lasx_xvfcmp_sun_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sune_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sune_s (__m256, __m256); +__m256i __lasx_xvfcmp_sun_s (__m256, __m256); +__m256d __lasx_xvfcvth_d_s (__m256); +__m256i __lasx_xvfcvt_h_s (__m256, __m256); +__m256 __lasx_xvfcvth_s_h (__m256i); +__m256d __lasx_xvfcvtl_d_s (__m256); +__m256 __lasx_xvfcvtl_s_h (__m256i); +__m256 __lasx_xvfcvt_s_d (__m256d, __m256d); +__m256d __lasx_xvfdiv_d (__m256d, __m256d); +__m256 __lasx_xvfdiv_s (__m256, __m256); +__m256d __lasx_xvffint_d_l (__m256i); +__m256d __lasx_xvffint_d_lu (__m256i); +__m256d __lasx_xvffinth_d_w (__m256i); +__m256d __lasx_xvffintl_d_w (__m256i); +__m256 __lasx_xvffint_s_l (__m256i, __m256i); +__m256 __lasx_xvffint_s_w (__m256i); +__m256 __lasx_xvffint_s_wu (__m256i); +__m256d __lasx_xvflogb_d (__m256d); +__m256 __lasx_xvflogb_s (__m256); +__m256d __lasx_xvfmadd_d (__m256d, __m256d, __m256d); +__m256 __lasx_xvfmadd_s (__m256, __m256, __m256); +__m256d __lasx_xvfmaxa_d (__m256d, __m256d); +__m256 __lasx_xvfmaxa_s (__m256, __m256); +__m256d __lasx_xvfmax_d (__m256d, __m256d); +__m256 __lasx_xvfmax_s (__m256, __m256); +__m256d __lasx_xvfmina_d (__m256d, __m256d); +__m256 __lasx_xvfmina_s (__m256, __m256); +__m256d __lasx_xvfmin_d (__m256d, __m256d); +__m256 __lasx_xvfmin_s (__m256, __m256); +__m256d __lasx_xvfmsub_d (__m256d, __m256d, __m256d); +__m256 __lasx_xvfmsub_s (__m256, __m256, __m256); +__m256d __lasx_xvfmul_d (__m256d, __m256d); 
+__m256 __lasx_xvfmul_s (__m256, __m256); +__m256d __lasx_xvfnmadd_d (__m256d, __m256d, __m256d); +__m256 __lasx_xvfnmadd_s (__m256, __m256, __m256); +__m256d __lasx_xvfnmsub_d (__m256d, __m256d, __m256d); +__m256 __lasx_xvfnmsub_s (__m256, __m256, __m256); +__m256d __lasx_xvfrecip_d (__m256d); +__m256 __lasx_xvfrecip_s (__m256); +__m256d __lasx_xvfrint_d (__m256d); +__m256d __lasx_xvfrintrm_d (__m256d); +__m256 __lasx_xvfrintrm_s (__m256); +__m256d __lasx_xvfrintrne_d (__m256d); +__m256 __lasx_xvfrintrne_s (__m256); +__m256d __lasx_xvfrintrp_d (__m256d); +__m256 __lasx_xvfrintrp_s (__m256); +__m256d __lasx_xvfrintrz_d (__m256d); +__m256 __lasx_xvfrintrz_s (__m256); +__m256 __lasx_xvfrint_s (__m256); +__m256d __lasx_xvfrsqrt_d (__m256d); +__m256 __lasx_xvfrsqrt_s (__m256); +__m256i __lasx_xvfrstp_b (__m256i, __m256i, __m256i); +__m256i __lasx_xvfrstp_h (__m256i, __m256i, __m256i); +__m256i __lasx_xvfrstpi_b (__m256i, __m256i, imm0_31); +__m256i __lasx_xvfrstpi_h (__m256i, __m256i, imm0_31); +__m256d __lasx_xvfsqrt_d (__m256d); +__m256 __lasx_xvfsqrt_s (__m256); +__m256d __lasx_xvfsub_d (__m256d, __m256d); +__m256 __lasx_xvfsub_s (__m256, __m256); +__m256i __lasx_xvftinth_l_s (__m256); +__m256i __lasx_xvftint_l_d (__m256d); +__m256i __lasx_xvftintl_l_s (__m256); +__m256i __lasx_xvftint_lu_d (__m256d); +__m256i __lasx_xvftintrmh_l_s (__m256); +__m256i __lasx_xvftintrm_l_d (__m256d); +__m256i __lasx_xvftintrml_l_s (__m256); +__m256i __lasx_xvftintrm_w_d (__m256d, __m256d); +__m256i __lasx_xvftintrm_w_s (__m256); +__m256i __lasx_xvftintrneh_l_s (__m256); +__m256i __lasx_xvftintrne_l_d (__m256d); +__m256i __lasx_xvftintrnel_l_s (__m256); +__m256i __lasx_xvftintrne_w_d (__m256d, __m256d); +__m256i __lasx_xvftintrne_w_s (__m256); +__m256i __lasx_xvftintrph_l_s (__m256); +__m256i __lasx_xvftintrp_l_d (__m256d); +__m256i __lasx_xvftintrpl_l_s (__m256); +__m256i __lasx_xvftintrp_w_d (__m256d, __m256d); +__m256i __lasx_xvftintrp_w_s (__m256); +__m256i __lasx_xvftintrzh_l_s (__m256); +__m256i __lasx_xvftintrz_l_d (__m256d); +__m256i __lasx_xvftintrzl_l_s (__m256); +__m256i __lasx_xvftintrz_lu_d (__m256d); +__m256i __lasx_xvftintrz_w_d (__m256d, __m256d); +__m256i __lasx_xvftintrz_w_s (__m256); +__m256i __lasx_xvftintrz_wu_s (__m256); +__m256i __lasx_xvftint_w_d (__m256d, __m256d); +__m256i __lasx_xvftint_w_s (__m256); +__m256i __lasx_xvftint_wu_s (__m256); +__m256i __lasx_xvhaddw_du_wu (__m256i, __m256i); +__m256i __lasx_xvhaddw_d_w (__m256i, __m256i); +__m256i __lasx_xvhaddw_h_b (__m256i, __m256i); +__m256i __lasx_xvhaddw_hu_bu (__m256i, __m256i); +__m256i __lasx_xvhaddw_q_d (__m256i, __m256i); +__m256i __lasx_xvhaddw_qu_du (__m256i, __m256i); +__m256i __lasx_xvhaddw_w_h (__m256i, __m256i); +__m256i __lasx_xvhaddw_wu_hu (__m256i, __m256i); +__m256i __lasx_xvhsubw_du_wu (__m256i, __m256i); +__m256i __lasx_xvhsubw_d_w (__m256i, __m256i); +__m256i __lasx_xvhsubw_h_b (__m256i, __m256i); +__m256i __lasx_xvhsubw_hu_bu (__m256i, __m256i); +__m256i __lasx_xvhsubw_q_d (__m256i, __m256i); +__m256i __lasx_xvhsubw_qu_du (__m256i, __m256i); +__m256i __lasx_xvhsubw_w_h (__m256i, __m256i); +__m256i __lasx_xvhsubw_wu_hu (__m256i, __m256i); +__m256i __lasx_xvilvh_b (__m256i, __m256i); +__m256i __lasx_xvilvh_d (__m256i, __m256i); +__m256i __lasx_xvilvh_h (__m256i, __m256i); +__m256i __lasx_xvilvh_w (__m256i, __m256i); +__m256i __lasx_xvilvl_b (__m256i, __m256i); +__m256i __lasx_xvilvl_d (__m256i, __m256i); +__m256i __lasx_xvilvl_h (__m256i, __m256i); +__m256i __lasx_xvilvl_w (__m256i, __m256i); +__m256i 
__lasx_xvinsgr2vr_d (__m256i, long int, imm0_3); +__m256i __lasx_xvinsgr2vr_w (__m256i, int, imm0_7); +__m256i __lasx_xvinsve0_d (__m256i, __m256i, imm0_3); +__m256i __lasx_xvinsve0_w (__m256i, __m256i, imm0_7); +__m256i __lasx_xvld (void *, imm_n2048_2047); +__m256i __lasx_xvldi (imm_n1024_1023); +__m256i __lasx_xvldrepl_b (void *, imm_n2048_2047); +__m256i __lasx_xvldrepl_d (void *, imm_n256_255); +__m256i __lasx_xvldrepl_h (void *, imm_n1024_1023); +__m256i __lasx_xvldrepl_w (void *, imm_n512_511); +__m256i __lasx_xvldx (void *, long int); +__m256i __lasx_xvmadd_b (__m256i, __m256i, __m256i); +__m256i __lasx_xvmadd_d (__m256i, __m256i, __m256i); +__m256i __lasx_xvmadd_h (__m256i, __m256i, __m256i); +__m256i __lasx_xvmadd_w (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_d_w (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_d_wu (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_d_wu_w (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_h_b (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_h_bu (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_h_bu_b (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_q_d (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_q_du (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_q_du_d (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_w_h (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_w_hu (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwev_w_hu_h (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_d_w (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_d_wu (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_d_wu_w (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_h_b (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_h_bu (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_h_bu_b (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_q_d (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_q_du (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_q_du_d (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_w_h (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_w_hu (__m256i, __m256i, __m256i); +__m256i __lasx_xvmaddwod_w_hu_h (__m256i, __m256i, __m256i); +__m256i __lasx_xvmax_b (__m256i, __m256i); +__m256i __lasx_xvmax_bu (__m256i, __m256i); +__m256i __lasx_xvmax_d (__m256i, __m256i); +__m256i __lasx_xvmax_du (__m256i, __m256i); +__m256i __lasx_xvmax_h (__m256i, __m256i); +__m256i __lasx_xvmax_hu (__m256i, __m256i); +__m256i __lasx_xvmaxi_b (__m256i, imm_n16_15); +__m256i __lasx_xvmaxi_bu (__m256i, imm0_31); +__m256i __lasx_xvmaxi_d (__m256i, imm_n16_15); +__m256i __lasx_xvmaxi_du (__m256i, imm0_31); +__m256i __lasx_xvmaxi_h (__m256i, imm_n16_15); +__m256i __lasx_xvmaxi_hu (__m256i, imm0_31); +__m256i __lasx_xvmaxi_w (__m256i, imm_n16_15); +__m256i __lasx_xvmaxi_wu (__m256i, imm0_31); +__m256i __lasx_xvmax_w (__m256i, __m256i); +__m256i __lasx_xvmax_wu (__m256i, __m256i); +__m256i __lasx_xvmin_b (__m256i, __m256i); +__m256i __lasx_xvmin_bu (__m256i, __m256i); +__m256i __lasx_xvmin_d (__m256i, __m256i); +__m256i __lasx_xvmin_du (__m256i, __m256i); +__m256i __lasx_xvmin_h (__m256i, __m256i); +__m256i __lasx_xvmin_hu (__m256i, __m256i); +__m256i __lasx_xvmini_b (__m256i, imm_n16_15); +__m256i __lasx_xvmini_bu (__m256i, imm0_31); +__m256i __lasx_xvmini_d (__m256i, imm_n16_15); +__m256i __lasx_xvmini_du (__m256i, imm0_31); +__m256i __lasx_xvmini_h (__m256i, imm_n16_15); +__m256i __lasx_xvmini_hu (__m256i, imm0_31); +__m256i __lasx_xvmini_w (__m256i, imm_n16_15); +__m256i 
__lasx_xvmini_wu (__m256i, imm0_31); +__m256i __lasx_xvmin_w (__m256i, __m256i); +__m256i __lasx_xvmin_wu (__m256i, __m256i); +__m256i __lasx_xvmod_b (__m256i, __m256i); +__m256i __lasx_xvmod_bu (__m256i, __m256i); +__m256i __lasx_xvmod_d (__m256i, __m256i); +__m256i __lasx_xvmod_du (__m256i, __m256i); +__m256i __lasx_xvmod_h (__m256i, __m256i); +__m256i __lasx_xvmod_hu (__m256i, __m256i); +__m256i __lasx_xvmod_w (__m256i, __m256i); +__m256i __lasx_xvmod_wu (__m256i, __m256i); +__m256i __lasx_xvmskgez_b (__m256i); +__m256i __lasx_xvmskltz_b (__m256i); +__m256i __lasx_xvmskltz_d (__m256i); +__m256i __lasx_xvmskltz_h (__m256i); +__m256i __lasx_xvmskltz_w (__m256i); +__m256i __lasx_xvmsknz_b (__m256i); +__m256i __lasx_xvmsub_b (__m256i, __m256i, __m256i); +__m256i __lasx_xvmsub_d (__m256i, __m256i, __m256i); +__m256i __lasx_xvmsub_h (__m256i, __m256i, __m256i); +__m256i __lasx_xvmsub_w (__m256i, __m256i, __m256i); +__m256i __lasx_xvmuh_b (__m256i, __m256i); +__m256i __lasx_xvmuh_bu (__m256i, __m256i); +__m256i __lasx_xvmuh_d (__m256i, __m256i); +__m256i __lasx_xvmuh_du (__m256i, __m256i); +__m256i __lasx_xvmuh_h (__m256i, __m256i); +__m256i __lasx_xvmuh_hu (__m256i, __m256i); +__m256i __lasx_xvmuh_w (__m256i, __m256i); +__m256i __lasx_xvmuh_wu (__m256i, __m256i); +__m256i __lasx_xvmul_b (__m256i, __m256i); +__m256i __lasx_xvmul_d (__m256i, __m256i); +__m256i __lasx_xvmul_h (__m256i, __m256i); +__m256i __lasx_xvmul_w (__m256i, __m256i); +__m256i __lasx_xvmulwev_d_w (__m256i, __m256i); +__m256i __lasx_xvmulwev_d_wu (__m256i, __m256i); +__m256i __lasx_xvmulwev_d_wu_w (__m256i, __m256i); +__m256i __lasx_xvmulwev_h_b (__m256i, __m256i); +__m256i __lasx_xvmulwev_h_bu (__m256i, __m256i); +__m256i __lasx_xvmulwev_h_bu_b (__m256i, __m256i); +__m256i __lasx_xvmulwev_q_d (__m256i, __m256i); +__m256i __lasx_xvmulwev_q_du (__m256i, __m256i); +__m256i __lasx_xvmulwev_q_du_d (__m256i, __m256i); +__m256i __lasx_xvmulwev_w_h (__m256i, __m256i); +__m256i __lasx_xvmulwev_w_hu (__m256i, __m256i); +__m256i __lasx_xvmulwev_w_hu_h (__m256i, __m256i); +__m256i __lasx_xvmulwod_d_w (__m256i, __m256i); +__m256i __lasx_xvmulwod_d_wu (__m256i, __m256i); +__m256i __lasx_xvmulwod_d_wu_w (__m256i, __m256i); +__m256i __lasx_xvmulwod_h_b (__m256i, __m256i); +__m256i __lasx_xvmulwod_h_bu (__m256i, __m256i); +__m256i __lasx_xvmulwod_h_bu_b (__m256i, __m256i); +__m256i __lasx_xvmulwod_q_d (__m256i, __m256i); +__m256i __lasx_xvmulwod_q_du (__m256i, __m256i); +__m256i __lasx_xvmulwod_q_du_d (__m256i, __m256i); +__m256i __lasx_xvmulwod_w_h (__m256i, __m256i); +__m256i __lasx_xvmulwod_w_hu (__m256i, __m256i); +__m256i __lasx_xvmulwod_w_hu_h (__m256i, __m256i); +__m256i __lasx_xvneg_b (__m256i); +__m256i __lasx_xvneg_d (__m256i); +__m256i __lasx_xvneg_h (__m256i); +__m256i __lasx_xvneg_w (__m256i); +__m256i __lasx_xvnori_b (__m256i, imm0_255); +__m256i __lasx_xvnor_v (__m256i, __m256i); +__m256i __lasx_xvori_b (__m256i, imm0_255); +__m256i __lasx_xvorn_v (__m256i, __m256i); +__m256i __lasx_xvor_v (__m256i, __m256i); +__m256i __lasx_xvpackev_b (__m256i, __m256i); +__m256i __lasx_xvpackev_d (__m256i, __m256i); +__m256i __lasx_xvpackev_h (__m256i, __m256i); +__m256i __lasx_xvpackev_w (__m256i, __m256i); +__m256i __lasx_xvpackod_b (__m256i, __m256i); +__m256i __lasx_xvpackod_d (__m256i, __m256i); +__m256i __lasx_xvpackod_h (__m256i, __m256i); +__m256i __lasx_xvpackod_w (__m256i, __m256i); +__m256i __lasx_xvpcnt_b (__m256i); +__m256i __lasx_xvpcnt_d (__m256i); +__m256i __lasx_xvpcnt_h (__m256i); +__m256i __lasx_xvpcnt_w (__m256i); 
+__m256i __lasx_xvpermi_d (__m256i, imm0_255); +__m256i __lasx_xvpermi_q (__m256i, __m256i, imm0_255); +__m256i __lasx_xvpermi_w (__m256i, __m256i, imm0_255); +__m256i __lasx_xvperm_w (__m256i, __m256i); +__m256i __lasx_xvpickev_b (__m256i, __m256i); +__m256i __lasx_xvpickev_d (__m256i, __m256i); +__m256i __lasx_xvpickev_h (__m256i, __m256i); +__m256i __lasx_xvpickev_w (__m256i, __m256i); +__m256i __lasx_xvpickod_b (__m256i, __m256i); +__m256i __lasx_xvpickod_d (__m256i, __m256i); +__m256i __lasx_xvpickod_h (__m256i, __m256i); +__m256i __lasx_xvpickod_w (__m256i, __m256i); +long int __lasx_xvpickve2gr_d (__m256i, imm0_3); +unsigned long int __lasx_xvpickve2gr_du (__m256i, imm0_3); +int __lasx_xvpickve2gr_w (__m256i, imm0_7); +unsigned int __lasx_xvpickve2gr_wu (__m256i, imm0_7); +__m256i __lasx_xvpickve_d (__m256i, imm0_3); +__m256d __lasx_xvpickve_d_f (__m256d, imm0_3); +__m256i __lasx_xvpickve_w (__m256i, imm0_7); +__m256 __lasx_xvpickve_w_f (__m256, imm0_7); +__m256i __lasx_xvrepl128vei_b (__m256i, imm0_15); +__m256i __lasx_xvrepl128vei_d (__m256i, imm0_1); +__m256i __lasx_xvrepl128vei_h (__m256i, imm0_7); +__m256i __lasx_xvrepl128vei_w (__m256i, imm0_3); +__m256i __lasx_xvreplgr2vr_b (int); +__m256i __lasx_xvreplgr2vr_d (long int); +__m256i __lasx_xvreplgr2vr_h (int); +__m256i __lasx_xvreplgr2vr_w (int); +__m256i __lasx_xvrepli_b (imm_n512_511); +__m256i __lasx_xvrepli_d (imm_n512_511); +__m256i __lasx_xvrepli_h (imm_n512_511); +__m256i __lasx_xvrepli_w (imm_n512_511); +__m256i __lasx_xvreplve0_b (__m256i); +__m256i __lasx_xvreplve0_d (__m256i); +__m256i __lasx_xvreplve0_h (__m256i); +__m256i __lasx_xvreplve0_q (__m256i); +__m256i __lasx_xvreplve0_w (__m256i); +__m256i __lasx_xvreplve_b (__m256i, int); +__m256i __lasx_xvreplve_d (__m256i, int); +__m256i __lasx_xvreplve_h (__m256i, int); +__m256i __lasx_xvreplve_w (__m256i, int); +__m256i __lasx_xvrotr_b (__m256i, __m256i); +__m256i __lasx_xvrotr_d (__m256i, __m256i); +__m256i __lasx_xvrotr_h (__m256i, __m256i); +__m256i __lasx_xvrotri_b (__m256i, imm0_7); +__m256i __lasx_xvrotri_d (__m256i, imm0_63); +__m256i __lasx_xvrotri_h (__m256i, imm0_15); +__m256i __lasx_xvrotri_w (__m256i, imm0_31); +__m256i __lasx_xvrotr_w (__m256i, __m256i); +__m256i __lasx_xvsadd_b (__m256i, __m256i); +__m256i __lasx_xvsadd_bu (__m256i, __m256i); +__m256i __lasx_xvsadd_d (__m256i, __m256i); +__m256i __lasx_xvsadd_du (__m256i, __m256i); +__m256i __lasx_xvsadd_h (__m256i, __m256i); +__m256i __lasx_xvsadd_hu (__m256i, __m256i); +__m256i __lasx_xvsadd_w (__m256i, __m256i); +__m256i __lasx_xvsadd_wu (__m256i, __m256i); +__m256i __lasx_xvsat_b (__m256i, imm0_7); +__m256i __lasx_xvsat_bu (__m256i, imm0_7); +__m256i __lasx_xvsat_d (__m256i, imm0_63); +__m256i __lasx_xvsat_du (__m256i, imm0_63); +__m256i __lasx_xvsat_h (__m256i, imm0_15); +__m256i __lasx_xvsat_hu (__m256i, imm0_15); +__m256i __lasx_xvsat_w (__m256i, imm0_31); +__m256i __lasx_xvsat_wu (__m256i, imm0_31); +__m256i __lasx_xvseq_b (__m256i, __m256i); +__m256i __lasx_xvseq_d (__m256i, __m256i); +__m256i __lasx_xvseq_h (__m256i, __m256i); +__m256i __lasx_xvseqi_b (__m256i, imm_n16_15); +__m256i __lasx_xvseqi_d (__m256i, imm_n16_15); +__m256i __lasx_xvseqi_h (__m256i, imm_n16_15); +__m256i __lasx_xvseqi_w (__m256i, imm_n16_15); +__m256i __lasx_xvseq_w (__m256i, __m256i); +__m256i __lasx_xvshuf4i_b (__m256i, imm0_255); +__m256i __lasx_xvshuf4i_d (__m256i, __m256i, imm0_255); +__m256i __lasx_xvshuf4i_h (__m256i, imm0_255); +__m256i __lasx_xvshuf4i_w (__m256i, imm0_255); +__m256i __lasx_xvshuf_b (__m256i, 
__m256i, __m256i); +__m256i __lasx_xvshuf_d (__m256i, __m256i, __m256i); +__m256i __lasx_xvshuf_h (__m256i, __m256i, __m256i); +__m256i __lasx_xvshuf_w (__m256i, __m256i, __m256i); +__m256i __lasx_xvsigncov_b (__m256i, __m256i); +__m256i __lasx_xvsigncov_d (__m256i, __m256i); +__m256i __lasx_xvsigncov_h (__m256i, __m256i); +__m256i __lasx_xvsigncov_w (__m256i, __m256i); +__m256i __lasx_xvsle_b (__m256i, __m256i); +__m256i __lasx_xvsle_bu (__m256i, __m256i); +__m256i __lasx_xvsle_d (__m256i, __m256i); +__m256i __lasx_xvsle_du (__m256i, __m256i); +__m256i __lasx_xvsle_h (__m256i, __m256i); +__m256i __lasx_xvsle_hu (__m256i, __m256i); +__m256i __lasx_xvslei_b (__m256i, imm_n16_15); +__m256i __lasx_xvslei_bu (__m256i, imm0_31); +__m256i __lasx_xvslei_d (__m256i, imm_n16_15); +__m256i __lasx_xvslei_du (__m256i, imm0_31); +__m256i __lasx_xvslei_h (__m256i, imm_n16_15); +__m256i __lasx_xvslei_hu (__m256i, imm0_31); +__m256i __lasx_xvslei_w (__m256i, imm_n16_15); +__m256i __lasx_xvslei_wu (__m256i, imm0_31); +__m256i __lasx_xvsle_w (__m256i, __m256i); +__m256i __lasx_xvsle_wu (__m256i, __m256i); +__m256i __lasx_xvsll_b (__m256i, __m256i); +__m256i __lasx_xvsll_d (__m256i, __m256i); +__m256i __lasx_xvsll_h (__m256i, __m256i); +__m256i __lasx_xvslli_b (__m256i, imm0_7); +__m256i __lasx_xvslli_d (__m256i, imm0_63); +__m256i __lasx_xvslli_h (__m256i, imm0_15); +__m256i __lasx_xvslli_w (__m256i, imm0_31); +__m256i __lasx_xvsll_w (__m256i, __m256i); +__m256i __lasx_xvsllwil_du_wu (__m256i, imm0_31); +__m256i __lasx_xvsllwil_d_w (__m256i, imm0_31); +__m256i __lasx_xvsllwil_h_b (__m256i, imm0_7); +__m256i __lasx_xvsllwil_hu_bu (__m256i, imm0_7); +__m256i __lasx_xvsllwil_w_h (__m256i, imm0_15); +__m256i __lasx_xvsllwil_wu_hu (__m256i, imm0_15); +__m256i __lasx_xvslt_b (__m256i, __m256i); +__m256i __lasx_xvslt_bu (__m256i, __m256i); +__m256i __lasx_xvslt_d (__m256i, __m256i); +__m256i __lasx_xvslt_du (__m256i, __m256i); +__m256i __lasx_xvslt_h (__m256i, __m256i); +__m256i __lasx_xvslt_hu (__m256i, __m256i); +__m256i __lasx_xvslti_b (__m256i, imm_n16_15); +__m256i __lasx_xvslti_bu (__m256i, imm0_31); +__m256i __lasx_xvslti_d (__m256i, imm_n16_15); +__m256i __lasx_xvslti_du (__m256i, imm0_31); +__m256i __lasx_xvslti_h (__m256i, imm_n16_15); +__m256i __lasx_xvslti_hu (__m256i, imm0_31); +__m256i __lasx_xvslti_w (__m256i, imm_n16_15); +__m256i __lasx_xvslti_wu (__m256i, imm0_31); +__m256i __lasx_xvslt_w (__m256i, __m256i); +__m256i __lasx_xvslt_wu (__m256i, __m256i); +__m256i __lasx_xvsra_b (__m256i, __m256i); +__m256i __lasx_xvsra_d (__m256i, __m256i); +__m256i __lasx_xvsra_h (__m256i, __m256i); +__m256i __lasx_xvsrai_b (__m256i, imm0_7); +__m256i __lasx_xvsrai_d (__m256i, imm0_63); +__m256i __lasx_xvsrai_h (__m256i, imm0_15); +__m256i __lasx_xvsrai_w (__m256i, imm0_31); +__m256i __lasx_xvsran_b_h (__m256i, __m256i); +__m256i __lasx_xvsran_h_w (__m256i, __m256i); +__m256i __lasx_xvsrani_b_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvsrani_d_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvsrani_h_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvsrani_w_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvsran_w_d (__m256i, __m256i); +__m256i __lasx_xvsrar_b (__m256i, __m256i); +__m256i __lasx_xvsrar_d (__m256i, __m256i); +__m256i __lasx_xvsrar_h (__m256i, __m256i); +__m256i __lasx_xvsrari_b (__m256i, imm0_7); +__m256i __lasx_xvsrari_d (__m256i, imm0_63); +__m256i __lasx_xvsrari_h (__m256i, imm0_15); +__m256i __lasx_xvsrari_w (__m256i, imm0_31); +__m256i __lasx_xvsrarn_b_h (__m256i, __m256i); +__m256i 
__lasx_xvsrarn_h_w (__m256i, __m256i); +__m256i __lasx_xvsrarni_b_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvsrarni_d_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvsrarni_h_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvsrarni_w_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvsrarn_w_d (__m256i, __m256i); +__m256i __lasx_xvsrar_w (__m256i, __m256i); +__m256i __lasx_xvsra_w (__m256i, __m256i); +__m256i __lasx_xvsrl_b (__m256i, __m256i); +__m256i __lasx_xvsrl_d (__m256i, __m256i); +__m256i __lasx_xvsrl_h (__m256i, __m256i); +__m256i __lasx_xvsrli_b (__m256i, imm0_7); +__m256i __lasx_xvsrli_d (__m256i, imm0_63); +__m256i __lasx_xvsrli_h (__m256i, imm0_15); +__m256i __lasx_xvsrli_w (__m256i, imm0_31); +__m256i __lasx_xvsrln_b_h (__m256i, __m256i); +__m256i __lasx_xvsrln_h_w (__m256i, __m256i); +__m256i __lasx_xvsrlni_b_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvsrlni_d_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvsrlni_h_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvsrlni_w_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvsrln_w_d (__m256i, __m256i); +__m256i __lasx_xvsrlr_b (__m256i, __m256i); +__m256i __lasx_xvsrlr_d (__m256i, __m256i); +__m256i __lasx_xvsrlr_h (__m256i, __m256i); +__m256i __lasx_xvsrlri_b (__m256i, imm0_7); +__m256i __lasx_xvsrlri_d (__m256i, imm0_63); +__m256i __lasx_xvsrlri_h (__m256i, imm0_15); +__m256i __lasx_xvsrlri_w (__m256i, imm0_31); +__m256i __lasx_xvsrlrn_b_h (__m256i, __m256i); +__m256i __lasx_xvsrlrn_h_w (__m256i, __m256i); +__m256i __lasx_xvsrlrni_b_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvsrlrni_d_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvsrlrni_h_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvsrlrni_w_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvsrlrn_w_d (__m256i, __m256i); +__m256i __lasx_xvsrlr_w (__m256i, __m256i); +__m256i __lasx_xvsrl_w (__m256i, __m256i); +__m256i __lasx_xvssran_b_h (__m256i, __m256i); +__m256i __lasx_xvssran_bu_h (__m256i, __m256i); +__m256i __lasx_xvssran_hu_w (__m256i, __m256i); +__m256i __lasx_xvssran_h_w (__m256i, __m256i); +__m256i __lasx_xvssrani_b_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvssrani_bu_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvssrani_d_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvssrani_du_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvssrani_hu_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvssrani_h_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvssrani_w_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvssrani_wu_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvssran_w_d (__m256i, __m256i); +__m256i __lasx_xvssran_wu_d (__m256i, __m256i); +__m256i __lasx_xvssrarn_b_h (__m256i, __m256i); +__m256i __lasx_xvssrarn_bu_h (__m256i, __m256i); +__m256i __lasx_xvssrarn_hu_w (__m256i, __m256i); +__m256i __lasx_xvssrarn_h_w (__m256i, __m256i); +__m256i __lasx_xvssrarni_b_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvssrarni_bu_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvssrarni_d_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvssrarni_du_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvssrarni_hu_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvssrarni_h_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvssrarni_w_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvssrarni_wu_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvssrarn_w_d (__m256i, __m256i); +__m256i __lasx_xvssrarn_wu_d (__m256i, __m256i); +__m256i __lasx_xvssrln_b_h (__m256i, __m256i); +__m256i __lasx_xvssrln_bu_h (__m256i, __m256i); +__m256i __lasx_xvssrln_hu_w (__m256i, 
__m256i); +__m256i __lasx_xvssrln_h_w (__m256i, __m256i); +__m256i __lasx_xvssrlni_b_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvssrlni_bu_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvssrlni_d_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvssrlni_du_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvssrlni_hu_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvssrlni_h_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvssrlni_w_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvssrlni_wu_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvssrln_w_d (__m256i, __m256i); +__m256i __lasx_xvssrln_wu_d (__m256i, __m256i); +__m256i __lasx_xvssrlrn_b_h (__m256i, __m256i); +__m256i __lasx_xvssrlrn_bu_h (__m256i, __m256i); +__m256i __lasx_xvssrlrn_hu_w (__m256i, __m256i); +__m256i __lasx_xvssrlrn_h_w (__m256i, __m256i); +__m256i __lasx_xvssrlrni_b_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvssrlrni_bu_h (__m256i, __m256i, imm0_15); +__m256i __lasx_xvssrlrni_d_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvssrlrni_du_q (__m256i, __m256i, imm0_127); +__m256i __lasx_xvssrlrni_hu_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvssrlrni_h_w (__m256i, __m256i, imm0_31); +__m256i __lasx_xvssrlrni_w_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvssrlrni_wu_d (__m256i, __m256i, imm0_63); +__m256i __lasx_xvssrlrn_w_d (__m256i, __m256i); +__m256i __lasx_xvssrlrn_wu_d (__m256i, __m256i); +__m256i __lasx_xvssub_b (__m256i, __m256i); +__m256i __lasx_xvssub_bu (__m256i, __m256i); +__m256i __lasx_xvssub_d (__m256i, __m256i); +__m256i __lasx_xvssub_du (__m256i, __m256i); +__m256i __lasx_xvssub_h (__m256i, __m256i); +__m256i __lasx_xvssub_hu (__m256i, __m256i); +__m256i __lasx_xvssub_w (__m256i, __m256i); +__m256i __lasx_xvssub_wu (__m256i, __m256i); +void __lasx_xvst (__m256i, void *, imm_n2048_2047); +void __lasx_xvstelm_b (__m256i, void *, imm_n128_127, imm0_31); +void __lasx_xvstelm_d (__m256i, void *, imm_n128_127, imm0_3); +void __lasx_xvstelm_h (__m256i, void *, imm_n128_127, imm0_15); +void __lasx_xvstelm_w (__m256i, void *, imm_n128_127, imm0_7); +void __lasx_xvstx (__m256i, void *, long int); +__m256i __lasx_xvsub_b (__m256i, __m256i); +__m256i __lasx_xvsub_d (__m256i, __m256i); +__m256i __lasx_xvsub_h (__m256i, __m256i); +__m256i __lasx_xvsubi_bu (__m256i, imm0_31); +__m256i __lasx_xvsubi_du (__m256i, imm0_31); +__m256i __lasx_xvsubi_hu (__m256i, imm0_31); +__m256i __lasx_xvsubi_wu (__m256i, imm0_31); +__m256i __lasx_xvsub_q (__m256i, __m256i); +__m256i __lasx_xvsub_w (__m256i, __m256i); +__m256i __lasx_xvsubwev_d_w (__m256i, __m256i); +__m256i __lasx_xvsubwev_d_wu (__m256i, __m256i); +__m256i __lasx_xvsubwev_h_b (__m256i, __m256i); +__m256i __lasx_xvsubwev_h_bu (__m256i, __m256i); +__m256i __lasx_xvsubwev_q_d (__m256i, __m256i); +__m256i __lasx_xvsubwev_q_du (__m256i, __m256i); +__m256i __lasx_xvsubwev_w_h (__m256i, __m256i); +__m256i __lasx_xvsubwev_w_hu (__m256i, __m256i); +__m256i __lasx_xvsubwod_d_w (__m256i, __m256i); +__m256i __lasx_xvsubwod_d_wu (__m256i, __m256i); +__m256i __lasx_xvsubwod_h_b (__m256i, __m256i); +__m256i __lasx_xvsubwod_h_bu (__m256i, __m256i); +__m256i __lasx_xvsubwod_q_d (__m256i, __m256i); +__m256i __lasx_xvsubwod_q_du (__m256i, __m256i); +__m256i __lasx_xvsubwod_w_h (__m256i, __m256i); +__m256i __lasx_xvsubwod_w_hu (__m256i, __m256i); +__m256i __lasx_xvxori_b (__m256i, imm0_255); +__m256i __lasx_xvxor_v (__m256i, __m256i); +@end smallexample -v16i8 __builtin_msa_msubv_b (v16i8, v16i8, v16i8); -v8i16 __builtin_msa_msubv_h (v8i16, v8i16, v8i16); -v4i32 
__builtin_msa_msubv_w (v4i32, v4i32, v4i32);
-v2i64 __builtin_msa_msubv_d (v2i64, v2i64, v2i64);
+These intrinsic functions are available by including @code{lasxintrin.h} and
+using @option{-mfrecipe} and @option{-mlasx}.
+@smallexample
+__m256d __lasx_xvfrecipe_d (__m256d);
+__m256 __lasx_xvfrecipe_s (__m256);
+__m256d __lasx_xvfrsqrte_d (__m256d);
+__m256 __lasx_xvfrsqrte_s (__m256);
+@end smallexample
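+For example, a sketch of calling the single-precision reciprocal-estimate
+intrinsic (the @code{__lsx_*} forms behave analogously); the estimate
+instructions return an approximation rather than an exact result:
+
+@smallexample
+#include <lasxintrin.h>
+
+extern __m256 @var{x};
+
+void
+test (void)
+@{
+  @var{x} = __lasx_xvfrecipe_s (@var{x});  /* Approximate 1/x per element.  */
+@}
+@end smallexample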
-v8i16 __builtin_msa_mul_q_h (v8i16, v8i16);
-v4i32 __builtin_msa_mul_q_w (v4i32, v4i32);
+@node MIPS DSP Built-in Functions
+@subsection MIPS DSP Built-in Functions
-v8i16 __builtin_msa_mulr_q_h (v8i16, v8i16);
-v4i32 __builtin_msa_mulr_q_w (v4i32, v4i32);
+The MIPS DSP Application-Specific Extension (ASE) includes new
+instructions that are designed to improve the performance of DSP and
+media applications.  It provides instructions that operate on packed
+8-bit and 16-bit integer data, and on Q7, Q15 and Q31 fractional data.
-v16i8 __builtin_msa_mulv_b (v16i8, v16i8);
-v8i16 __builtin_msa_mulv_h (v8i16, v8i16);
-v4i32 __builtin_msa_mulv_w (v4i32, v4i32);
-v2i64 __builtin_msa_mulv_d (v2i64, v2i64);
+GCC supports MIPS DSP operations using both the generic
+vector extensions (@pxref{Vector Extensions}) and a collection of
+MIPS-specific built-in functions.  Both kinds of support are
+enabled by the @option{-mdsp} command-line option.
-v16i8 __builtin_msa_nloc_b (v16i8);
-v8i16 __builtin_msa_nloc_h (v8i16);
-v4i32 __builtin_msa_nloc_w (v4i32);
-v2i64 __builtin_msa_nloc_d (v2i64);
+Revision 2 of the ASE was introduced in the second half of 2006.
+This revision adds extra instructions to the original ASE, but is
+otherwise backwards-compatible with it.  You can select revision 2
+using the command-line option @option{-mdspr2}; this option implies
+@option{-mdsp}.
-v16i8 __builtin_msa_nlzc_b (v16i8);
-v8i16 __builtin_msa_nlzc_h (v8i16);
-v4i32 __builtin_msa_nlzc_w (v4i32);
-v2i64 __builtin_msa_nlzc_d (v2i64);
+The SCOUNT and POS bits of the DSP control register are global.  The
+WRDSP, EXTPDP, EXTPDPV and MTHLIP instructions modify the SCOUNT and
+POS bits.  During optimization, the compiler does not delete these
+instructions and it does not delete calls to functions containing
+these instructions.
-v16u8 __builtin_msa_nor_v (v16u8, v16u8);
+At present, GCC only provides support for operations on 32-bit
+vectors.  The vector type associated with 8-bit integer data is
+usually called @code{v4i8}, the vector type associated with Q7
+is usually called @code{v4q7}, the vector type associated with 16-bit
+integer data is usually called @code{v2i16}, and the vector type
+associated with Q15 is usually called @code{v2q15}.  They can be
+defined in C as follows:
-v16u8 __builtin_msa_nori_b (v16u8, imm0_255);
+@smallexample
+typedef signed char v4i8 __attribute__ ((vector_size(4)));
+typedef signed char v4q7 __attribute__ ((vector_size(4)));
+typedef short v2i16 __attribute__ ((vector_size(4)));
+typedef short v2q15 __attribute__ ((vector_size(4)));
+@end smallexample
-v16u8 __builtin_msa_or_v (v16u8, v16u8);
+@code{v4i8}, @code{v4q7}, @code{v2i16} and @code{v2q15} values are
+initialized in the same way as aggregates.  For example:
-v16u8 __builtin_msa_ori_b (v16u8, imm0_255);
+@smallexample
+v4i8 a = @{1, 2, 3, 4@};
+v4i8 b;
+b = (v4i8) @{5, 6, 7, 8@};
-v16i8 __builtin_msa_pckev_b (v16i8, v16i8);
-v8i16 __builtin_msa_pckev_h (v8i16, v8i16);
-v4i32 __builtin_msa_pckev_w (v4i32, v4i32);
-v2i64 __builtin_msa_pckev_d (v2i64, v2i64);
+v2q15 c = @{0x0fcb, 0x3a75@};
+v2q15 d;
+d = (v2q15) @{0.1234 * 0x1.0p15, 0.4567 * 0x1.0p15@};
+@end smallexample
-v16i8 __builtin_msa_pckod_b (v16i8, v16i8);
-v8i16 __builtin_msa_pckod_h (v8i16, v8i16);
-v4i32 __builtin_msa_pckod_w (v4i32, v4i32);
-v2i64 __builtin_msa_pckod_d (v2i64, v2i64);
+@emph{Note:} The CPU's endianness determines the order in which values
+are packed.  On little-endian targets, the first value is the least
+significant and the last value is the most significant.  The opposite
+order applies to big-endian targets.  For example, the code above
+sets the lowest byte of @code{a} to @code{1} on little-endian targets
+and @code{4} on big-endian targets.
-v16i8 __builtin_msa_pcnt_b (v16i8);
-v8i16 __builtin_msa_pcnt_h (v8i16);
-v4i32 __builtin_msa_pcnt_w (v4i32);
-v2i64 __builtin_msa_pcnt_d (v2i64);
+@emph{Note:} Q7, Q15 and Q31 values must be initialized with their integer
+representation.  As shown in this example, the integer representation
+of a Q7 value can be obtained by multiplying the fractional value by
+@code{0x1.0p7}.  The equivalent for Q15 values is to multiply by
+@code{0x1.0p15}.  The equivalent for Q31 values is to multiply by
+@code{0x1.0p31}.
-v16i8 __builtin_msa_sat_s_b (v16i8, imm0_7);
-v8i16 __builtin_msa_sat_s_h (v8i16, imm0_15);
-v4i32 __builtin_msa_sat_s_w (v4i32, imm0_31);
-v2i64 __builtin_msa_sat_s_d (v2i64, imm0_63);
+The table below lists the @code{v4i8} and @code{v2q15} operations for which
+hardware support exists.  @code{a} and @code{b} are @code{v4i8} values,
+and @code{c} and @code{d} are @code{v2q15} values.
-v16u8 __builtin_msa_sat_u_b (v16u8, imm0_7);
-v8u16 __builtin_msa_sat_u_h (v8u16, imm0_15);
-v4u32 __builtin_msa_sat_u_w (v4u32, imm0_31);
-v2u64 __builtin_msa_sat_u_d (v2u64, imm0_63);
+@multitable @columnfractions .50 .50
+@headitem C code @tab MIPS instruction
+@item @code{a + b} @tab @code{addu.qb}
+@item @code{c + d} @tab @code{addq.ph}
+@item @code{a - b} @tab @code{subu.qb}
+@item @code{c - d} @tab @code{subq.ph}
+@end multitable
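+For example, with @option{-mdsp} the addition below can be carried out
+by a single @code{addu.qb} instruction (a sketch using the typedefs
+defined above):
+
+@smallexample
+v4i8 a = @{1, 2, 3, 4@};
+v4i8 b = @{5, 6, 7, 8@};
+v4i8 sum;
+
+void
+test (void)
+@{
+  sum = a + b;
+@}
+@end smallexample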
-v16i8 __builtin_msa_shf_b (v16i8, imm0_255);
-v8i16 __builtin_msa_shf_h (v8i16, imm0_255);
-v4i32 __builtin_msa_shf_w (v4i32, imm0_255);
+The table below lists the @code{v2i16} operation for which
+hardware support exists in revision 2 of the DSP ASE.  @code{e} and
+@code{f} are @code{v2i16} values.
-v16i8 __builtin_msa_sld_b (v16i8, v16i8, i32);
-v8i16 __builtin_msa_sld_h (v8i16, v8i16, i32);
-v4i32 __builtin_msa_sld_w (v4i32, v4i32, i32);
-v2i64 __builtin_msa_sld_d (v2i64, v2i64, i32);
+@multitable @columnfractions .50 .50
+@headitem C code @tab MIPS instruction
+@item @code{e * f} @tab @code{mul.ph}
+@end multitable
-v16i8 __builtin_msa_sldi_b (v16i8, v16i8, imm0_15);
-v8i16 __builtin_msa_sldi_h (v8i16, v8i16, imm0_7);
-v4i32 __builtin_msa_sldi_w (v4i32, v4i32, imm0_3);
-v2i64 __builtin_msa_sldi_d (v2i64, v2i64, imm0_1);
+It is easier to describe the DSP built-in functions if we first define
+the following types:
-v16i8 __builtin_msa_sll_b (v16i8, v16i8);
-v8i16 __builtin_msa_sll_h (v8i16, v8i16);
-v4i32 __builtin_msa_sll_w (v4i32, v4i32);
-v2i64 __builtin_msa_sll_d (v2i64, v2i64);
+@smallexample
+typedef int q31;
+typedef int i32;
+typedef unsigned int ui32;
+typedef long long a64;
+@end smallexample
-v16i8 __builtin_msa_slli_b (v16i8, imm0_7);
-v8i16 __builtin_msa_slli_h (v8i16, imm0_15);
-v4i32 __builtin_msa_slli_w (v4i32, imm0_31);
-v2i64 __builtin_msa_slli_d (v2i64, imm0_63);
+@code{q31} and @code{i32} are actually the same as @code{int}, but we
+use @code{q31} to indicate a Q31 fractional value and @code{i32} to
+indicate a 32-bit integer value.  Similarly, @code{a64} is the same as
+@code{long long}, but we use @code{a64} to indicate values that are
+placed in one of the four DSP accumulators (@code{$ac0},
+@code{$ac1}, @code{$ac2} or @code{$ac3}).
-v16i8 __builtin_msa_splat_b (v16i8, i32);
-v8i16 __builtin_msa_splat_h (v8i16, i32);
-v4i32 __builtin_msa_splat_w (v4i32, i32);
-v2i64 __builtin_msa_splat_d (v2i64, i32);
+Also, some built-in functions prefer or require immediate numbers as
+parameters, because the corresponding DSP instructions accept both
+immediate numbers and register operands, or accept immediate numbers
+only.  The immediate parameter ranges are listed below.
-v16i8 __builtin_msa_splati_b (v16i8, imm0_15);
-v8i16 __builtin_msa_splati_h (v8i16, imm0_7);
-v4i32 __builtin_msa_splati_w (v4i32, imm0_3);
-v2i64 __builtin_msa_splati_d (v2i64, imm0_1);
+@smallexample
+imm0_3: 0 to 3.
+imm0_7: 0 to 7.
+imm0_15: 0 to 15.
+imm0_31: 0 to 31.
+imm0_63: 0 to 63.
+imm0_255: 0 to 255.
+imm_n32_31: -32 to 31.
+imm_n512_511: -512 to 511.
+@end smallexample
-v16i8 __builtin_msa_sra_b (v16i8, v16i8);
-v8i16 __builtin_msa_sra_h (v8i16, v8i16);
-v4i32 __builtin_msa_sra_w (v4i32, v4i32);
-v2i64 __builtin_msa_sra_d (v2i64, v2i64);
+The following built-in functions map directly to a particular MIPS DSP
+instruction.  Please refer to the architecture specification
+for details on what each instruction does.
-v16i8 __builtin_msa_srai_b (v16i8, imm0_7); -v8i16 __builtin_msa_srai_h (v8i16, imm0_15); -v4i32 __builtin_msa_srai_w (v4i32, imm0_31); -v2i64 __builtin_msa_srai_d (v2i64, imm0_63); +@smallexample +v2q15 __builtin_mips_addq_ph (v2q15, v2q15); +v2q15 __builtin_mips_addq_s_ph (v2q15, v2q15); +q31 __builtin_mips_addq_s_w (q31, q31); +v4i8 __builtin_mips_addu_qb (v4i8, v4i8); +v4i8 __builtin_mips_addu_s_qb (v4i8, v4i8); +v2q15 __builtin_mips_subq_ph (v2q15, v2q15); +v2q15 __builtin_mips_subq_s_ph (v2q15, v2q15); +q31 __builtin_mips_subq_s_w (q31, q31); +v4i8 __builtin_mips_subu_qb (v4i8, v4i8); +v4i8 __builtin_mips_subu_s_qb (v4i8, v4i8); +i32 __builtin_mips_addsc (i32, i32); +i32 __builtin_mips_addwc (i32, i32); +i32 __builtin_mips_modsub (i32, i32); +i32 __builtin_mips_raddu_w_qb (v4i8); +v2q15 __builtin_mips_absq_s_ph (v2q15); +q31 __builtin_mips_absq_s_w (q31); +v4i8 __builtin_mips_precrq_qb_ph (v2q15, v2q15); +v2q15 __builtin_mips_precrq_ph_w (q31, q31); +v2q15 __builtin_mips_precrq_rs_ph_w (q31, q31); +v4i8 __builtin_mips_precrqu_s_qb_ph (v2q15, v2q15); +q31 __builtin_mips_preceq_w_phl (v2q15); +q31 __builtin_mips_preceq_w_phr (v2q15); +v2q15 __builtin_mips_precequ_ph_qbl (v4i8); +v2q15 __builtin_mips_precequ_ph_qbr (v4i8); +v2q15 __builtin_mips_precequ_ph_qbla (v4i8); +v2q15 __builtin_mips_precequ_ph_qbra (v4i8); +v2q15 __builtin_mips_preceu_ph_qbl (v4i8); +v2q15 __builtin_mips_preceu_ph_qbr (v4i8); +v2q15 __builtin_mips_preceu_ph_qbla (v4i8); +v2q15 __builtin_mips_preceu_ph_qbra (v4i8); +v4i8 __builtin_mips_shll_qb (v4i8, imm0_7); +v4i8 __builtin_mips_shll_qb (v4i8, i32); +v2q15 __builtin_mips_shll_ph (v2q15, imm0_15); +v2q15 __builtin_mips_shll_ph (v2q15, i32); +v2q15 __builtin_mips_shll_s_ph (v2q15, imm0_15); +v2q15 __builtin_mips_shll_s_ph (v2q15, i32); +q31 __builtin_mips_shll_s_w (q31, imm0_31); +q31 __builtin_mips_shll_s_w (q31, i32); +v4i8 __builtin_mips_shrl_qb (v4i8, imm0_7); +v4i8 __builtin_mips_shrl_qb (v4i8, i32); +v2q15 __builtin_mips_shra_ph (v2q15, imm0_15); +v2q15 __builtin_mips_shra_ph (v2q15, i32); +v2q15 __builtin_mips_shra_r_ph (v2q15, imm0_15); +v2q15 __builtin_mips_shra_r_ph (v2q15, i32); +q31 __builtin_mips_shra_r_w (q31, imm0_31); +q31 __builtin_mips_shra_r_w (q31, i32); +v2q15 __builtin_mips_muleu_s_ph_qbl (v4i8, v2q15); +v2q15 __builtin_mips_muleu_s_ph_qbr (v4i8, v2q15); +v2q15 __builtin_mips_mulq_rs_ph (v2q15, v2q15); +q31 __builtin_mips_muleq_s_w_phl (v2q15, v2q15); +q31 __builtin_mips_muleq_s_w_phr (v2q15, v2q15); +a64 __builtin_mips_dpau_h_qbl (a64, v4i8, v4i8); +a64 __builtin_mips_dpau_h_qbr (a64, v4i8, v4i8); +a64 __builtin_mips_dpsu_h_qbl (a64, v4i8, v4i8); +a64 __builtin_mips_dpsu_h_qbr (a64, v4i8, v4i8); +a64 __builtin_mips_dpaq_s_w_ph (a64, v2q15, v2q15); +a64 __builtin_mips_dpaq_sa_l_w (a64, q31, q31); +a64 __builtin_mips_dpsq_s_w_ph (a64, v2q15, v2q15); +a64 __builtin_mips_dpsq_sa_l_w (a64, q31, q31); +a64 __builtin_mips_mulsaq_s_w_ph (a64, v2q15, v2q15); +a64 __builtin_mips_maq_s_w_phl (a64, v2q15, v2q15); +a64 __builtin_mips_maq_s_w_phr (a64, v2q15, v2q15); +a64 __builtin_mips_maq_sa_w_phl (a64, v2q15, v2q15); +a64 __builtin_mips_maq_sa_w_phr (a64, v2q15, v2q15); +i32 __builtin_mips_bitrev (i32); +i32 __builtin_mips_insv (i32, i32); +v4i8 __builtin_mips_repl_qb (imm0_255); +v4i8 __builtin_mips_repl_qb (i32); +v2q15 __builtin_mips_repl_ph (imm_n512_511); +v2q15 __builtin_mips_repl_ph (i32); +void __builtin_mips_cmpu_eq_qb (v4i8, v4i8); +void __builtin_mips_cmpu_lt_qb (v4i8, v4i8); +void __builtin_mips_cmpu_le_qb (v4i8, v4i8); +i32 
__builtin_mips_cmpgu_eq_qb (v4i8, v4i8); +i32 __builtin_mips_cmpgu_lt_qb (v4i8, v4i8); +i32 __builtin_mips_cmpgu_le_qb (v4i8, v4i8); +void __builtin_mips_cmp_eq_ph (v2q15, v2q15); +void __builtin_mips_cmp_lt_ph (v2q15, v2q15); +void __builtin_mips_cmp_le_ph (v2q15, v2q15); +v4i8 __builtin_mips_pick_qb (v4i8, v4i8); +v2q15 __builtin_mips_pick_ph (v2q15, v2q15); +v2q15 __builtin_mips_packrl_ph (v2q15, v2q15); +i32 __builtin_mips_extr_w (a64, imm0_31); +i32 __builtin_mips_extr_w (a64, i32); +i32 __builtin_mips_extr_r_w (a64, imm0_31); +i32 __builtin_mips_extr_s_h (a64, i32); +i32 __builtin_mips_extr_rs_w (a64, imm0_31); +i32 __builtin_mips_extr_rs_w (a64, i32); +i32 __builtin_mips_extr_s_h (a64, imm0_31); +i32 __builtin_mips_extr_r_w (a64, i32); +i32 __builtin_mips_extp (a64, imm0_31); +i32 __builtin_mips_extp (a64, i32); +i32 __builtin_mips_extpdp (a64, imm0_31); +i32 __builtin_mips_extpdp (a64, i32); +a64 __builtin_mips_shilo (a64, imm_n32_31); +a64 __builtin_mips_shilo (a64, i32); +a64 __builtin_mips_mthlip (a64, i32); +void __builtin_mips_wrdsp (i32, imm0_63); +i32 __builtin_mips_rddsp (imm0_63); +i32 __builtin_mips_lbux (void *, i32); +i32 __builtin_mips_lhx (void *, i32); +i32 __builtin_mips_lwx (void *, i32); +a64 __builtin_mips_ldx (void *, i32); /* MIPS64 only */ +i32 __builtin_mips_bposge32 (void); +a64 __builtin_mips_madd (a64, i32, i32); +a64 __builtin_mips_maddu (a64, ui32, ui32); +a64 __builtin_mips_msub (a64, i32, i32); +a64 __builtin_mips_msubu (a64, ui32, ui32); +a64 __builtin_mips_mult (i32, i32); +a64 __builtin_mips_multu (ui32, ui32); +@end smallexample -v16i8 __builtin_msa_srar_b (v16i8, v16i8); -v8i16 __builtin_msa_srar_h (v8i16, v8i16); -v4i32 __builtin_msa_srar_w (v4i32, v4i32); -v2i64 __builtin_msa_srar_d (v2i64, v2i64); +The following built-in functions map directly to a particular MIPS DSP REV 2 +instruction. Please refer to the architecture specification +for details on what each instruction does. 
-v16i8 __builtin_msa_srari_b (v16i8, imm0_7); -v8i16 __builtin_msa_srari_h (v8i16, imm0_15); -v4i32 __builtin_msa_srari_w (v4i32, imm0_31); -v2i64 __builtin_msa_srari_d (v2i64, imm0_63); +@smallexample +v4q7 __builtin_mips_absq_s_qb (v4q7); +v2i16 __builtin_mips_addu_ph (v2i16, v2i16); +v2i16 __builtin_mips_addu_s_ph (v2i16, v2i16); +v4i8 __builtin_mips_adduh_qb (v4i8, v4i8); +v4i8 __builtin_mips_adduh_r_qb (v4i8, v4i8); +i32 __builtin_mips_append (i32, i32, imm0_31); +i32 __builtin_mips_balign (i32, i32, imm0_3); +i32 __builtin_mips_cmpgdu_eq_qb (v4i8, v4i8); +i32 __builtin_mips_cmpgdu_lt_qb (v4i8, v4i8); +i32 __builtin_mips_cmpgdu_le_qb (v4i8, v4i8); +a64 __builtin_mips_dpa_w_ph (a64, v2i16, v2i16); +a64 __builtin_mips_dps_w_ph (a64, v2i16, v2i16); +v2i16 __builtin_mips_mul_ph (v2i16, v2i16); +v2i16 __builtin_mips_mul_s_ph (v2i16, v2i16); +q31 __builtin_mips_mulq_rs_w (q31, q31); +v2q15 __builtin_mips_mulq_s_ph (v2q15, v2q15); +q31 __builtin_mips_mulq_s_w (q31, q31); +a64 __builtin_mips_mulsa_w_ph (a64, v2i16, v2i16); +v4i8 __builtin_mips_precr_qb_ph (v2i16, v2i16); +v2i16 __builtin_mips_precr_sra_ph_w (i32, i32, imm0_31); +v2i16 __builtin_mips_precr_sra_r_ph_w (i32, i32, imm0_31); +i32 __builtin_mips_prepend (i32, i32, imm0_31); +v4i8 __builtin_mips_shra_qb (v4i8, imm0_7); +v4i8 __builtin_mips_shra_r_qb (v4i8, imm0_7); +v4i8 __builtin_mips_shra_qb (v4i8, i32); +v4i8 __builtin_mips_shra_r_qb (v4i8, i32); +v2i16 __builtin_mips_shrl_ph (v2i16, imm0_15); +v2i16 __builtin_mips_shrl_ph (v2i16, i32); +v2i16 __builtin_mips_subu_ph (v2i16, v2i16); +v2i16 __builtin_mips_subu_s_ph (v2i16, v2i16); +v4i8 __builtin_mips_subuh_qb (v4i8, v4i8); +v4i8 __builtin_mips_subuh_r_qb (v4i8, v4i8); +v2q15 __builtin_mips_addqh_ph (v2q15, v2q15); +v2q15 __builtin_mips_addqh_r_ph (v2q15, v2q15); +q31 __builtin_mips_addqh_w (q31, q31); +q31 __builtin_mips_addqh_r_w (q31, q31); +v2q15 __builtin_mips_subqh_ph (v2q15, v2q15); +v2q15 __builtin_mips_subqh_r_ph (v2q15, v2q15); +q31 __builtin_mips_subqh_w (q31, q31); +q31 __builtin_mips_subqh_r_w (q31, q31); +a64 __builtin_mips_dpax_w_ph (a64, v2i16, v2i16); +a64 __builtin_mips_dpsx_w_ph (a64, v2i16, v2i16); +a64 __builtin_mips_dpaqx_s_w_ph (a64, v2q15, v2q15); +a64 __builtin_mips_dpaqx_sa_w_ph (a64, v2q15, v2q15); +a64 __builtin_mips_dpsqx_s_w_ph (a64, v2q15, v2q15); +a64 __builtin_mips_dpsqx_sa_w_ph (a64, v2q15, v2q15); +@end smallexample -v16i8 __builtin_msa_srl_b (v16i8, v16i8); -v8i16 __builtin_msa_srl_h (v8i16, v8i16); -v4i32 __builtin_msa_srl_w (v4i32, v4i32); -v2i64 __builtin_msa_srl_d (v2i64, v2i64); -v16i8 __builtin_msa_srli_b (v16i8, imm0_7); -v8i16 __builtin_msa_srli_h (v8i16, imm0_15); -v4i32 __builtin_msa_srli_w (v4i32, imm0_31); -v2i64 __builtin_msa_srli_d (v2i64, imm0_63); +@node MIPS Paired-Single Support +@subsection MIPS Paired-Single Support -v16i8 __builtin_msa_srlr_b (v16i8, v16i8); -v8i16 __builtin_msa_srlr_h (v8i16, v8i16); -v4i32 __builtin_msa_srlr_w (v4i32, v4i32); -v2i64 __builtin_msa_srlr_d (v2i64, v2i64); +The MIPS64 architecture includes a number of instructions that +operate on pairs of single-precision floating-point values. +Each pair is packed into a 64-bit floating-point register, +with one element being designated the ``upper half'' and +the other being designated the ``lower half''. 
-v16i8 __builtin_msa_srlri_b (v16i8, imm0_7); -v8i16 __builtin_msa_srlri_h (v8i16, imm0_15); -v4i32 __builtin_msa_srlri_w (v4i32, imm0_31); -v2i64 __builtin_msa_srlri_d (v2i64, imm0_63); +GCC supports paired-single operations using both the generic +vector extensions (@pxref{Vector Extensions}) and a collection of +MIPS-specific built-in functions. Both kinds of support are +enabled by the @option{-mpaired-single} command-line option. -void __builtin_msa_st_b (v16i8, void *, imm_n512_511); -void __builtin_msa_st_h (v8i16, void *, imm_n1024_1022); -void __builtin_msa_st_w (v4i32, void *, imm_n2048_2044); -void __builtin_msa_st_d (v2i64, void *, imm_n4096_4088); +The vector type associated with paired-single values is usually +called @code{v2sf}. It can be defined in C as follows: -v16i8 __builtin_msa_subs_s_b (v16i8, v16i8); -v8i16 __builtin_msa_subs_s_h (v8i16, v8i16); -v4i32 __builtin_msa_subs_s_w (v4i32, v4i32); -v2i64 __builtin_msa_subs_s_d (v2i64, v2i64); +@smallexample +typedef float v2sf __attribute__ ((vector_size (8))); +@end smallexample -v16u8 __builtin_msa_subs_u_b (v16u8, v16u8); -v8u16 __builtin_msa_subs_u_h (v8u16, v8u16); -v4u32 __builtin_msa_subs_u_w (v4u32, v4u32); -v2u64 __builtin_msa_subs_u_d (v2u64, v2u64); +@code{v2sf} values are initialized in the same way as aggregates. +For example: -v16u8 __builtin_msa_subsus_u_b (v16u8, v16i8); -v8u16 __builtin_msa_subsus_u_h (v8u16, v8i16); -v4u32 __builtin_msa_subsus_u_w (v4u32, v4i32); -v2u64 __builtin_msa_subsus_u_d (v2u64, v2i64); +@smallexample +v2sf a = @{1.5, 9.1@}; +v2sf b; +float e, f; +b = (v2sf) @{e, f@}; +@end smallexample -v16i8 __builtin_msa_subsuu_s_b (v16u8, v16u8); -v8i16 __builtin_msa_subsuu_s_h (v8u16, v8u16); -v4i32 __builtin_msa_subsuu_s_w (v4u32, v4u32); -v2i64 __builtin_msa_subsuu_s_d (v2u64, v2u64); +@emph{Note:} The CPU's endianness determines which value is stored in +the upper half of a register and which value is stored in the lower half. +On little-endian targets, the first value is the lower one and the second +value is the upper one. The opposite order applies to big-endian targets. +For example, the code above sets the lower half of @code{a} to +@code{1.5} on little-endian targets and @code{9.1} on big-endian targets. -v16i8 __builtin_msa_subv_b (v16i8, v16i8); -v8i16 __builtin_msa_subv_h (v8i16, v8i16); -v4i32 __builtin_msa_subv_w (v4i32, v4i32); -v2i64 __builtin_msa_subv_d (v2i64, v2i64); +@node MIPS Loongson Built-in Functions +@subsection MIPS Loongson Built-in Functions -v16i8 __builtin_msa_subvi_b (v16i8, imm0_31); -v8i16 __builtin_msa_subvi_h (v8i16, imm0_31); -v4i32 __builtin_msa_subvi_w (v4i32, imm0_31); -v2i64 __builtin_msa_subvi_d (v2i64, imm0_31); +GCC provides intrinsics to access the SIMD instructions provided by the +ST Microelectronics Loongson-2E and -2F processors. 
These intrinsics, +available after inclusion of the @code{loongson.h} header file, +operate on the following 64-bit vector types: -v16i8 __builtin_msa_vshf_b (v16i8, v16i8, v16i8); -v8i16 __builtin_msa_vshf_h (v8i16, v8i16, v8i16); -v4i32 __builtin_msa_vshf_w (v4i32, v4i32, v4i32); -v2i64 __builtin_msa_vshf_d (v2i64, v2i64, v2i64); +@itemize +@item @code{uint8x8_t}, a vector of eight unsigned 8-bit integers; +@item @code{uint16x4_t}, a vector of four unsigned 16-bit integers; +@item @code{uint32x2_t}, a vector of two unsigned 32-bit integers; +@item @code{int8x8_t}, a vector of eight signed 8-bit integers; +@item @code{int16x4_t}, a vector of four signed 16-bit integers; +@item @code{int32x2_t}, a vector of two signed 32-bit integers. +@end itemize -v16u8 __builtin_msa_xor_v (v16u8, v16u8); +The intrinsics provided are listed below; each is named after the +machine instruction to which it corresponds, with suffixes added as +appropriate to distinguish intrinsics that expand to the same machine +instruction yet have different argument types. Refer to the architecture +documentation for a description of the functionality of each +instruction. -v16u8 __builtin_msa_xori_b (v16u8, imm0_255); +@smallexample +int16x4_t packsswh (int32x2_t s, int32x2_t t); +int8x8_t packsshb (int16x4_t s, int16x4_t t); +uint8x8_t packushb (uint16x4_t s, uint16x4_t t); +uint32x2_t paddw_u (uint32x2_t s, uint32x2_t t); +uint16x4_t paddh_u (uint16x4_t s, uint16x4_t t); +uint8x8_t paddb_u (uint8x8_t s, uint8x8_t t); +int32x2_t paddw_s (int32x2_t s, int32x2_t t); +int16x4_t paddh_s (int16x4_t s, int16x4_t t); +int8x8_t paddb_s (int8x8_t s, int8x8_t t); +uint64_t paddd_u (uint64_t s, uint64_t t); +int64_t paddd_s (int64_t s, int64_t t); +int16x4_t paddsh (int16x4_t s, int16x4_t t); +int8x8_t paddsb (int8x8_t s, int8x8_t t); +uint16x4_t paddush (uint16x4_t s, uint16x4_t t); +uint8x8_t paddusb (uint8x8_t s, uint8x8_t t); +uint64_t pandn_ud (uint64_t s, uint64_t t); +uint32x2_t pandn_uw (uint32x2_t s, uint32x2_t t); +uint16x4_t pandn_uh (uint16x4_t s, uint16x4_t t); +uint8x8_t pandn_ub (uint8x8_t s, uint8x8_t t); +int64_t pandn_sd (int64_t s, int64_t t); +int32x2_t pandn_sw (int32x2_t s, int32x2_t t); +int16x4_t pandn_sh (int16x4_t s, int16x4_t t); +int8x8_t pandn_sb (int8x8_t s, int8x8_t t); +uint16x4_t pavgh (uint16x4_t s, uint16x4_t t); +uint8x8_t pavgb (uint8x8_t s, uint8x8_t t); +uint32x2_t pcmpeqw_u (uint32x2_t s, uint32x2_t t); +uint16x4_t pcmpeqh_u (uint16x4_t s, uint16x4_t t); +uint8x8_t pcmpeqb_u (uint8x8_t s, uint8x8_t t); +int32x2_t pcmpeqw_s (int32x2_t s, int32x2_t t); +int16x4_t pcmpeqh_s (int16x4_t s, int16x4_t t); +int8x8_t pcmpeqb_s (int8x8_t s, int8x8_t t); +uint32x2_t pcmpgtw_u (uint32x2_t s, uint32x2_t t); +uint16x4_t pcmpgth_u (uint16x4_t s, uint16x4_t t); +uint8x8_t pcmpgtb_u (uint8x8_t s, uint8x8_t t); +int32x2_t pcmpgtw_s (int32x2_t s, int32x2_t t); +int16x4_t pcmpgth_s (int16x4_t s, int16x4_t t); +int8x8_t pcmpgtb_s (int8x8_t s, int8x8_t t); +uint16x4_t pextrh_u (uint16x4_t s, int field); +int16x4_t pextrh_s (int16x4_t s, int field); +uint16x4_t pinsrh_0_u (uint16x4_t s, uint16x4_t t); +uint16x4_t pinsrh_1_u (uint16x4_t s, uint16x4_t t); +uint16x4_t pinsrh_2_u (uint16x4_t s, uint16x4_t t); +uint16x4_t pinsrh_3_u (uint16x4_t s, uint16x4_t t); +int16x4_t pinsrh_0_s (int16x4_t s, int16x4_t t); +int16x4_t pinsrh_1_s (int16x4_t s, int16x4_t t); +int16x4_t pinsrh_2_s (int16x4_t s, int16x4_t t); +int16x4_t pinsrh_3_s (int16x4_t s, int16x4_t t); +int32x2_t pmaddhw (int16x4_t s, int16x4_t t); +int16x4_t 
pmaxsh (int16x4_t s, int16x4_t t); +uint8x8_t pmaxub (uint8x8_t s, uint8x8_t t); +int16x4_t pminsh (int16x4_t s, int16x4_t t); +uint8x8_t pminub (uint8x8_t s, uint8x8_t t); +uint8x8_t pmovmskb_u (uint8x8_t s); +int8x8_t pmovmskb_s (int8x8_t s); +uint16x4_t pmulhuh (uint16x4_t s, uint16x4_t t); +int16x4_t pmulhh (int16x4_t s, int16x4_t t); +int16x4_t pmullh (int16x4_t s, int16x4_t t); +int64_t pmuluw (uint32x2_t s, uint32x2_t t); +uint8x8_t pasubub (uint8x8_t s, uint8x8_t t); +uint16x4_t biadd (uint8x8_t s); +uint16x4_t psadbh (uint8x8_t s, uint8x8_t t); +uint16x4_t pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order); +int16x4_t pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order); +uint16x4_t psllh_u (uint16x4_t s, uint8_t amount); +int16x4_t psllh_s (int16x4_t s, uint8_t amount); +uint32x2_t psllw_u (uint32x2_t s, uint8_t amount); +int32x2_t psllw_s (int32x2_t s, uint8_t amount); +uint16x4_t psrlh_u (uint16x4_t s, uint8_t amount); +int16x4_t psrlh_s (int16x4_t s, uint8_t amount); +uint32x2_t psrlw_u (uint32x2_t s, uint8_t amount); +int32x2_t psrlw_s (int32x2_t s, uint8_t amount); +uint16x4_t psrah_u (uint16x4_t s, uint8_t amount); +int16x4_t psrah_s (int16x4_t s, uint8_t amount); +uint32x2_t psraw_u (uint32x2_t s, uint8_t amount); +int32x2_t psraw_s (int32x2_t s, uint8_t amount); +uint32x2_t psubw_u (uint32x2_t s, uint32x2_t t); +uint16x4_t psubh_u (uint16x4_t s, uint16x4_t t); +uint8x8_t psubb_u (uint8x8_t s, uint8x8_t t); +int32x2_t psubw_s (int32x2_t s, int32x2_t t); +int16x4_t psubh_s (int16x4_t s, int16x4_t t); +int8x8_t psubb_s (int8x8_t s, int8x8_t t); +uint64_t psubd_u (uint64_t s, uint64_t t); +int64_t psubd_s (int64_t s, int64_t t); +int16x4_t psubsh (int16x4_t s, int16x4_t t); +int8x8_t psubsb (int8x8_t s, int8x8_t t); +uint16x4_t psubush (uint16x4_t s, uint16x4_t t); +uint8x8_t psubusb (uint8x8_t s, uint8x8_t t); +uint32x2_t punpckhwd_u (uint32x2_t s, uint32x2_t t); +uint16x4_t punpckhhw_u (uint16x4_t s, uint16x4_t t); +uint8x8_t punpckhbh_u (uint8x8_t s, uint8x8_t t); +int32x2_t punpckhwd_s (int32x2_t s, int32x2_t t); +int16x4_t punpckhhw_s (int16x4_t s, int16x4_t t); +int8x8_t punpckhbh_s (int8x8_t s, int8x8_t t); +uint32x2_t punpcklwd_u (uint32x2_t s, uint32x2_t t); +uint16x4_t punpcklhw_u (uint16x4_t s, uint16x4_t t); +uint8x8_t punpcklbh_u (uint8x8_t s, uint8x8_t t); +int32x2_t punpcklwd_s (int32x2_t s, int32x2_t t); +int16x4_t punpcklhw_s (int16x4_t s, int16x4_t t); +int8x8_t punpcklbh_s (int8x8_t s, int8x8_t t); @end smallexample -@node Other MIPS Built-in Functions -@subsection Other MIPS Built-in Functions +@menu +* Paired-Single Arithmetic:: +* Paired-Single Built-in Functions:: +* MIPS-3D Built-in Functions:: +@end menu -GCC provides other MIPS-specific built-in functions: +@node Paired-Single Arithmetic +@subsubsection Paired-Single Arithmetic -@table @code -@item void __builtin_mips_cache (int @var{op}, const volatile void *@var{addr}) -Insert a @samp{cache} instruction with operands @var{op} and @var{addr}. -GCC defines the preprocessor macro @code{___GCC_HAVE_BUILTIN_MIPS_CACHE} -when this function is available. +The table below lists the @code{v2sf} operations for which hardware +support exists. @code{a}, @code{b} and @code{c} are @code{v2sf} +values and @code{x} is an integral value. -@item unsigned int __builtin_mips_get_fcsr (void) -@itemx void __builtin_mips_set_fcsr (unsigned int @var{value}) -Get and set the contents of the floating-point control and status register -(FPU control register 31). 
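+Because this support is based on the generic vector extensions, the
+operations are written as ordinary C expressions on @code{v2sf} values.
+For instance, the following sketch (ours, not taken from the MIPS
+documentation) shows a multiply-add over @code{v2sf} values:
+
+@smallexample
+v2sf
+muladd (v2sf a, v2sf b, v2sf c)
+@{
+  /* May be implemented by madd.ps; see the table below.  */
+  return a * b + c;
+@}
+@end smallexample
+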
-These functions are only available in hard-float
-code but can be called in both MIPS16 and non-MIPS16 contexts.
+@multitable @columnfractions .50 .50
+@headitem C code @tab MIPS instruction
+@item @code{a + b} @tab @code{add.ps}
+@item @code{a - b} @tab @code{sub.ps}
+@item @code{-a} @tab @code{neg.ps}
+@item @code{a * b} @tab @code{mul.ps}
+@item @code{a * b + c} @tab @code{madd.ps}
+@item @code{a * b - c} @tab @code{msub.ps}
+@item @code{-(a * b + c)} @tab @code{nmadd.ps}
+@item @code{-(a * b - c)} @tab @code{nmsub.ps}
+@item @code{x ? a : b} @tab @code{movn.ps}/@code{movz.ps}
+@end multitable
-@code{__builtin_mips_set_fcsr} can be used to change any bit of the
-register except the condition codes, which GCC assumes are preserved.
-@end table
+Note that the multiply-accumulate instructions can be disabled
+using the command-line option @option{-mno-fused-madd}.
-@node MSP430 Built-in Functions
-@subsection MSP430 Built-in Functions
+@node Paired-Single Built-in Functions
+@subsubsection Paired-Single Built-in Functions
-GCC provides a couple of special builtin functions to aid in the
-writing of interrupt handlers in C.
+The following paired-single functions map directly to a particular
+MIPS instruction.  Please refer to the architecture specification
+for details on what each instruction does.
@table @code
-@item __bic_SR_register_on_exit (int @var{mask})
-This clears the indicated bits in the saved copy of the status register
-currently residing on the stack.  This only works inside interrupt
-handlers and the changes to the status register will only take affect
-once the handler returns.
-
-@item __bis_SR_register_on_exit (int @var{mask})
-This sets the indicated bits in the saved copy of the status register
-currently residing on the stack.  This only works inside interrupt
-handlers and the changes to the status register will only take affect
-once the handler returns.
-
-@item __delay_cycles (long long @var{cycles})
-This inserts an instruction sequence that takes exactly @var{cycles}
-cycles (between 0 and about 17E9) to complete.  The inserted sequence
-may use jumps, loops, or no-ops, and does not interfere with any other
-instructions.  Note that @var{cycles} must be a compile-time constant
-integer - that is, you must pass a number, not a variable that may be
-optimized to a constant later.  The number of cycles delayed by this
-builtin is exact.
-@end table
-
-@node NDS32 Built-in Functions
-@subsection NDS32 Built-in Functions
-
-These built-in functions are available for the NDS32 target:
-
-@defbuiltin{void __builtin_nds32_isync (int *@var{addr})}
-Insert an ISYNC instruction into the instruction stream where
-@var{addr} is an instruction address for serialization.
-@enddefbuiltin
-
-@defbuiltin{void __builtin_nds32_isb (void)}
-Insert an ISB instruction into the instruction stream.
-@enddefbuiltin
-
-@defbuiltin{int __builtin_nds32_mfsr (int @var{sr})}
-Return the content of a system register which is mapped by @var{sr}.
-@enddefbuiltin
-
-@defbuiltin{int __builtin_nds32_mfusr (int @var{usr})}
-Return the content of a user space register which is mapped by @var{usr}.
-@enddefbuiltin
+@item v2sf __builtin_mips_pll_ps (v2sf, v2sf)
+Pair lower lower (@code{pll.ps}).
-@defbuiltin{void __builtin_nds32_mtsr (int @var{value}, int @var{sr})}
-Move the @var{value} to a system register which is mapped by @var{sr}.
-@enddefbuiltin
+@item v2sf __builtin_mips_pul_ps (v2sf, v2sf)
+Pair upper lower (@code{pul.ps}).
-@defbuiltin{void __builtin_nds32_mtusr (int @var{value}, int @var{usr})} -Move the @var{value} to a user space register which is mapped by @var{usr}. -@enddefbuiltin +@item v2sf __builtin_mips_plu_ps (v2sf, v2sf) +Pair lower upper (@code{plu.ps}). -@defbuiltin{void __builtin_nds32_setgie_en (void)} -Enable global interrupt. -@enddefbuiltin +@item v2sf __builtin_mips_puu_ps (v2sf, v2sf) +Pair upper upper (@code{puu.ps}). -@defbuiltin{void __builtin_nds32_setgie_dis (void)} -Disable global interrupt. -@enddefbuiltin +@item v2sf __builtin_mips_cvt_ps_s (float, float) +Convert pair to paired single (@code{cvt.ps.s}). -@node Nvidia PTX Built-in Functions -@subsection Nvidia PTX Built-in Functions +@item float __builtin_mips_cvt_s_pl (v2sf) +Convert pair lower to single (@code{cvt.s.pl}). -These built-in functions are available for the Nvidia PTX target: +@item float __builtin_mips_cvt_s_pu (v2sf) +Convert pair upper to single (@code{cvt.s.pu}). -@defbuiltin{{unsigned int} __builtin_nvptx_brev (unsigned int @var{x})} -Reverse the bit order of a 32-bit unsigned integer. -@enddefbuiltin +@item v2sf __builtin_mips_abs_ps (v2sf) +Absolute value (@code{abs.ps}). -@defbuiltin{{unsigned long long} __builtin_nvptx_brevll (unsigned long long @var{x})} -Reverse the bit order of a 64-bit unsigned integer. -@enddefbuiltin +@item v2sf __builtin_mips_alnv_ps (v2sf, v2sf, int) +Align variable (@code{alnv.ps}). -@node Basic PowerPC Built-in Functions -@subsection Basic PowerPC Built-in Functions +@emph{Note:} The value of the third parameter must be 0 or 4 +modulo 8, otherwise the result is unpredictable. Please read the +instruction description for details. +@end table -@menu -* Basic PowerPC Built-in Functions Available on all Configurations:: -* Basic PowerPC Built-in Functions Available on ISA 2.05:: -* Basic PowerPC Built-in Functions Available on ISA 2.06:: -* Basic PowerPC Built-in Functions Available on ISA 2.07:: -* Basic PowerPC Built-in Functions Available on ISA 3.0:: -* Basic PowerPC Built-in Functions Available on ISA 3.1:: -@end menu +The following multi-instruction functions are also available. +In each case, @var{cond} can be any of the 16 floating-point conditions: +@code{f}, @code{un}, @code{eq}, @code{ueq}, @code{olt}, @code{ult}, +@code{ole}, @code{ule}, @code{sf}, @code{ngle}, @code{seq}, @code{ngl}, +@code{lt}, @code{nge}, @code{le} or @code{ngt}. -This section describes PowerPC built-in functions that do not require -the inclusion of any special header files to declare prototypes or -provide macro definitions. The sections that follow describe -additional PowerPC built-in functions. +@table @code +@item v2sf __builtin_mips_movt_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) +@itemx v2sf __builtin_mips_movf_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) +Conditional move based on floating-point comparison (@code{c.@var{cond}.ps}, +@code{movt.ps}/@code{movf.ps}). -@node Basic PowerPC Built-in Functions Available on all Configurations -@subsubsection Basic PowerPC Built-in Functions Available on all Configurations +The @code{movt} functions return the value @var{x} computed by: -@defbuiltin{void __builtin_cpu_init (void)} -This function is a @code{nop} on the PowerPC platform and is included solely -to maintain API compatibility with the x86 builtins. 
-@enddefbuiltin +@smallexample +c.@var{cond}.ps @var{cc},@var{a},@var{b} +mov.ps @var{x},@var{c} +movt.ps @var{x},@var{d},@var{cc} +@end smallexample -@defbuiltin{int __builtin_cpu_is (const char *@var{cpuname})} -This function returns a value of @code{1} if the run-time CPU is of type -@var{cpuname} and returns @code{0} otherwise +The @code{movf} functions are similar but use @code{movf.ps} instead +of @code{movt.ps}. -The @code{__builtin_cpu_is} function requires GLIBC 2.23 or newer -which exports the hardware capability bits. GCC defines the macro -@code{__BUILTIN_CPU_SUPPORTS__} if the @code{__builtin_cpu_supports} -built-in function is fully supported. +@item int __builtin_mips_upper_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +@itemx int __builtin_mips_lower_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +Comparison of two paired-single values (@code{c.@var{cond}.ps}, +@code{bc1t}/@code{bc1f}). -If GCC was configured to use a GLIBC before 2.23, the built-in -function @code{__builtin_cpu_is} always returns a 0 and the compiler -issues a warning. +These functions compare @var{a} and @var{b} using @code{c.@var{cond}.ps} +and return either the upper or lower half of the result. For example: -The following CPU names can be detected: +@smallexample +v2sf a, b; +if (__builtin_mips_upper_c_eq_ps (a, b)) + upper_halves_are_equal (); +else + upper_halves_are_unequal (); -@table @samp -@item power10 -IBM POWER10 Server CPU. -@item power9 -IBM POWER9 Server CPU. -@item power8 -IBM POWER8 Server CPU. -@item power7 -IBM POWER7 Server CPU. -@item power6x -IBM POWER6 Server CPU (RAW mode). -@item power6 -IBM POWER6 Server CPU (Architected mode). -@item power5+ -IBM POWER5+ Server CPU. -@item power5 -IBM POWER5 Server CPU. -@item ppc970 -IBM 970 Server CPU (ie, Apple G5). -@item power4 -IBM POWER4 Server CPU. -@item ppca2 -IBM A2 64-bit Embedded CPU -@item ppc476 -IBM PowerPC 476FP 32-bit Embedded CPU. -@item ppc464 -IBM PowerPC 464 32-bit Embedded CPU. -@item ppc440 -PowerPC 440 32-bit Embedded CPU. -@item ppc405 -PowerPC 405 32-bit Embedded CPU. -@item ppc-cell-be -IBM PowerPC Cell Broadband Engine Architecture CPU. +if (__builtin_mips_lower_c_eq_ps (a, b)) + lower_halves_are_equal (); +else + lower_halves_are_unequal (); +@end smallexample @end table -Here is an example: -@smallexample -#ifdef __BUILTIN_CPU_SUPPORTS__ - if (__builtin_cpu_is ("power8")) - @{ - do_power8 (); // POWER8 specific implementation. - @} - else -#endif - @{ - do_generic (); // Generic implementation. - @} -@end smallexample -@enddefbuiltin +@node MIPS-3D Built-in Functions +@subsubsection MIPS-3D Built-in Functions -@defbuiltin{int __builtin_cpu_supports (const char *@var{feature})} -This function returns a value of @code{1} if the run-time CPU supports the HWCAP -feature @var{feature} and returns @code{0} otherwise. +The MIPS-3D Application-Specific Extension (ASE) includes additional +paired-single instructions that are designed to improve the performance +of 3D graphics operations. Support for these instructions is controlled +by the @option{-mips3d} command-line option. -The @code{__builtin_cpu_supports} function requires GLIBC 2.23 or -newer which exports the hardware capability bits. GCC defines the -macro @code{__BUILTIN_CPU_SUPPORTS__} if the -@code{__builtin_cpu_supports} built-in function is fully supported. +The functions listed below map directly to a particular MIPS-3D +instruction. Please refer to the architecture specification for +more details on what each instruction does. 
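+As an illustration of the call style (an example of ours; see the
+specification for the exact placement of the results), a reduction add
+can be written:
+
+@smallexample
+v2sf a, b, r;
+
+/* Each half of r receives the sum of the two halves of one
+   operand (addr.ps).  */
+r = __builtin_mips_addr_ps (a, b);
+@end smallexample
+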
-If GCC was configured to use a GLIBC before 2.23, the built-in -function @code{__builtin_cpu_supports} always returns a 0 and the -compiler issues a warning. +@table @code +@item v2sf __builtin_mips_addr_ps (v2sf, v2sf) +Reduction add (@code{addr.ps}). -The following features can be -detected: +@item v2sf __builtin_mips_mulr_ps (v2sf, v2sf) +Reduction multiply (@code{mulr.ps}). -@table @samp -@item 4xxmac -4xx CPU has a Multiply Accumulator. -@item altivec -CPU has a SIMD/Vector Unit. -@item arch_2_05 -CPU supports ISA 2.05 (eg, POWER6) -@item arch_2_06 -CPU supports ISA 2.06 (eg, POWER7) -@item arch_2_07 -CPU supports ISA 2.07 (eg, POWER8) -@item arch_3_00 -CPU supports ISA 3.0 (eg, POWER9) -@item arch_3_1 -CPU supports ISA 3.1 (eg, POWER10) -@item archpmu -CPU supports the set of compatible performance monitoring events. -@item booke -CPU supports the Embedded ISA category. -@item cellbe -CPU has a CELL broadband engine. -@item darn -CPU supports the @code{darn} (deliver a random number) instruction. -@item dfp -CPU has a decimal floating point unit. -@item dscr -CPU supports the data stream control register. -@item ebb -CPU supports event base branching. -@item efpdouble -CPU has a SPE double precision floating point unit. -@item efpsingle -CPU has a SPE single precision floating point unit. -@item fpu -CPU has a floating point unit. -@item htm -CPU has hardware transaction memory instructions. -@item htm-nosc -Kernel aborts hardware transactions when a syscall is made. -@item htm-no-suspend -CPU supports hardware transaction memory but does not support the -@code{tsuspend.} instruction. -@item ic_snoop -CPU supports icache snooping capabilities. -@item ieee128 -CPU supports 128-bit IEEE binary floating point instructions. -@item isel -CPU supports the integer select instruction. -@item mma -CPU supports the matrix-multiply assist instructions. -@item mmu -CPU has a memory management unit. -@item notb -CPU does not have a timebase (eg, 601 and 403gx). -@item pa6t -CPU supports the PA Semi 6T CORE ISA. -@item power4 -CPU supports ISA 2.00 (eg, POWER4) -@item power5 -CPU supports ISA 2.02 (eg, POWER5) -@item power5+ -CPU supports ISA 2.03 (eg, POWER5+) -@item power6x -CPU supports ISA 2.05 (eg, POWER6) extended opcodes mffgpr and mftgpr. -@item ppc32 -CPU supports 32-bit mode execution. -@item ppc601 -CPU supports the old POWER ISA (eg, 601) -@item ppc64 -CPU supports 64-bit mode execution. -@item ppcle -CPU supports a little-endian mode that uses address swizzling. -@item scv -Kernel supports system call vectored. -@item smt -CPU support simultaneous multi-threading. -@item spe -CPU has a signal processing extension unit. -@item tar -CPU supports the target address register. -@item true_le -CPU supports true little-endian mode. -@item ucache -CPU has unified I/D cache. -@item vcrypto -CPU supports the vector cryptography instructions. -@item vsx -CPU supports the vector-scalar extension. +@item v2sf __builtin_mips_cvt_pw_ps (v2sf) +Convert paired single to paired word (@code{cvt.pw.ps}). + +@item v2sf __builtin_mips_cvt_ps_pw (v2sf) +Convert paired word to paired single (@code{cvt.ps.pw}). + +@item float __builtin_mips_recip1_s (float) +@itemx double __builtin_mips_recip1_d (double) +@itemx v2sf __builtin_mips_recip1_ps (v2sf) +Reduced-precision reciprocal (sequence step 1) (@code{recip1.@var{fmt}}). 
+ +@item float __builtin_mips_recip2_s (float, float) +@itemx double __builtin_mips_recip2_d (double, double) +@itemx v2sf __builtin_mips_recip2_ps (v2sf, v2sf) +Reduced-precision reciprocal (sequence step 2) (@code{recip2.@var{fmt}}). + +@item float __builtin_mips_rsqrt1_s (float) +@itemx double __builtin_mips_rsqrt1_d (double) +@itemx v2sf __builtin_mips_rsqrt1_ps (v2sf) +Reduced-precision reciprocal square root (sequence step 1) +(@code{rsqrt1.@var{fmt}}). + +@item float __builtin_mips_rsqrt2_s (float, float) +@itemx double __builtin_mips_rsqrt2_d (double, double) +@itemx v2sf __builtin_mips_rsqrt2_ps (v2sf, v2sf) +Reduced-precision reciprocal square root (sequence step 2) +(@code{rsqrt2.@var{fmt}}). @end table -Here is an example: -@smallexample -#ifdef __BUILTIN_CPU_SUPPORTS__ - if (__builtin_cpu_supports ("fpu")) - @{ - asm("fadd %0,%1,%2" : "=d"(dst) : "d"(src1), "d"(src2)); - @} - else -#endif - @{ - dst = __fadd (src1, src2); // Software FP addition function. - @} -@end smallexample -@enddefbuiltin +The following multi-instruction functions are also available. +In each case, @var{cond} can be any of the 16 floating-point conditions: +@code{f}, @code{un}, @code{eq}, @code{ueq}, @code{olt}, @code{ult}, +@code{ole}, @code{ule}, @code{sf}, @code{ngle}, @code{seq}, +@code{ngl}, @code{lt}, @code{nge}, @code{le} or @code{ngt}. + +@table @code +@item int __builtin_mips_cabs_@var{cond}_s (float @var{a}, float @var{b}) +@itemx int __builtin_mips_cabs_@var{cond}_d (double @var{a}, double @var{b}) +Absolute comparison of two scalar values (@code{cabs.@var{cond}.@var{fmt}}, +@code{bc1t}/@code{bc1f}). + +These functions compare @var{a} and @var{b} using @code{cabs.@var{cond}.s} +or @code{cabs.@var{cond}.d} and return the result as a boolean value. +For example: -The following built-in functions are also available on all PowerPC -processors: @smallexample -uint64_t __builtin_ppc_get_timebase (); -unsigned long __builtin_ppc_mftb (); -double __builtin_unpack_ibm128 (__ibm128, int); -__ibm128 __builtin_pack_ibm128 (double, double); -double __builtin_mffs (void); -void __builtin_mtfsf (const int, double); -void __builtin_mtfsb0 (const int); -void __builtin_mtfsb1 (const int); -double __builtin_set_fpscr_rn (int); +float a, b; +if (__builtin_mips_cabs_eq_s (a, b)) + true (); +else + false (); @end smallexample -The @code{__builtin_ppc_get_timebase} and @code{__builtin_ppc_mftb} -functions generate instructions to read the Time Base Register. The -@code{__builtin_ppc_get_timebase} function may generate multiple -instructions and always returns the 64 bits of the Time Base Register. -The @code{__builtin_ppc_mftb} function always generates one instruction and -returns the Time Base Register value as an unsigned long, throwing away -the most significant word on 32-bit environments. The @code{__builtin_mffs} -return the value of the FPSCR register. Note, ISA 3.0 supports the -@code{__builtin_mffsl()} which permits software to read the control and -non-sticky status bits in the FSPCR without the higher latency associated with -accessing the sticky status bits. The @code{__builtin_mtfsf} takes a constant -8-bit integer field mask and a double precision floating point argument -and generates the @code{mtfsf} (extended mnemonic) instruction to write new -values to selected fields of the FPSCR. The -@code{__builtin_mtfsb0} and @code{__builtin_mtfsb1} take the bit to change -as an argument. The valid bit range is between 0 and 31. 
The builtins map to -the @code{mtfsb0} and @code{mtfsb1} instructions which take the argument and -add 32. Hence these instructions only modify the FPSCR[32:63] bits by -changing the specified bit to a zero or one respectively. - -The @code{__builtin_set_fpscr_rn} built-in allows changing both of the floating -point rounding mode bits and returning the various FPSCR fields before the RN -field is updated. The built-in returns a double consisting of the initial -value of the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, and RN bit positions -with all other bits set to zero. The built-in argument is a 2-bit value for the -new RN field value. The argument can either be an @code{const int} or stored -in a variable. Earlier versions of @code{__builtin_set_fpscr_rn} returned -void. A @code{__SET_FPSCR_RN_RETURNS_FPSCR__} macro has been added. If -defined, then the @code{__builtin_set_fpscr_rn} built-in returns the FPSCR -fields. If not defined, the @code{__builtin_set_fpscr_rn} does not return a -value. If the @option{-msoft-float} option is used, the -@code{__builtin_set_fpscr_rn} built-in will not return a value. +@item int __builtin_mips_upper_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +@itemx int __builtin_mips_lower_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +Absolute comparison of two paired-single values (@code{cabs.@var{cond}.ps}, +@code{bc1t}/@code{bc1f}). -@node Basic PowerPC Built-in Functions Available on ISA 2.05 -@subsubsection Basic PowerPC Built-in Functions Available on ISA 2.05 +These functions compare @var{a} and @var{b} using @code{cabs.@var{cond}.ps} +and return either the upper or lower half of the result. For example: -The basic built-in functions described in this section are -available on the PowerPC family of processors starting with ISA 2.05 -or later. Unless specific options are explicitly disabled on the -command line, specifying option @option{-mcpu=power6} has the effect of -enabling the @option{-mpowerpc64}, @option{-mpowerpc-gpopt}, -@option{-mpowerpc-gfxopt}, @option{-mmfcrf}, @option{-mpopcntb}, -@option{-mfprnd}, @option{-mcmpb}, @option{-mhard-dfp}, and -@option{-mrecip-precision} options. Specify the -@option{-maltivec} option explicitly in -combination with the above options if desired. +@smallexample +v2sf a, b; +if (__builtin_mips_upper_cabs_eq_ps (a, b)) + upper_halves_are_equal (); +else + upper_halves_are_unequal (); -The following functions require option @option{-mcmpb}. -@smallexample -unsigned long long __builtin_cmpb (unsigned long long int, unsigned long long int); -unsigned int __builtin_cmpb (unsigned int, unsigned int); +if (__builtin_mips_lower_cabs_eq_ps (a, b)) + lower_halves_are_equal (); +else + lower_halves_are_unequal (); @end smallexample -The @code{__builtin_cmpb} function -performs a byte-wise compare on the contents of its two arguments, -returning the result of the byte-wise comparison as the returned -value. For each byte comparison, the corresponding byte of the return -value holds 0xff if the input bytes are equal and 0 if the input bytes -are not equal. If either of the arguments to this built-in function -is wider than 32 bits, the function call expands into the form that -expects @code{unsigned long long int} arguments -which is only available on 64-bit targets. 
+@item v2sf __builtin_mips_movt_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) +@itemx v2sf __builtin_mips_movf_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) +Conditional move based on absolute comparison (@code{cabs.@var{cond}.ps}, +@code{movt.ps}/@code{movf.ps}). + +The @code{movt} functions return the value @var{x} computed by: -The following built-in functions are available -when hardware decimal floating point -(@option{-mhard-dfp}) is available: @smallexample -void __builtin_set_fpscr_drn(int); -_Decimal64 __builtin_ddedpd (int, _Decimal64); -_Decimal128 __builtin_ddedpdq (int, _Decimal128); -_Decimal64 __builtin_denbcd (int, _Decimal64); -_Decimal128 __builtin_denbcdq (int, _Decimal128); -_Decimal64 __builtin_diex (long long, _Decimal64); -_Decimal128 _builtin_diexq (long long, _Decimal128); -_Decimal64 __builtin_dscli (_Decimal64, int); -_Decimal128 __builtin_dscliq (_Decimal128, int); -_Decimal64 __builtin_dscri (_Decimal64, int); -_Decimal128 __builtin_dscriq (_Decimal128, int); -long long __builtin_dxex (_Decimal64); -long long __builtin_dxexq (_Decimal128); -_Decimal128 __builtin_pack_dec128 (unsigned long long, unsigned long long); -unsigned long long __builtin_unpack_dec128 (_Decimal128, int); +cabs.@var{cond}.ps @var{cc},@var{a},@var{b} +mov.ps @var{x},@var{c} +movt.ps @var{x},@var{d},@var{cc} +@end smallexample -The @code{__builtin_set_fpscr_drn} builtin allows changing the three decimal -floating point rounding mode bits. The argument is a 3-bit value. The -argument can either be a @code{const int} or the value can be stored in -a variable. -The builtin uses the ISA 3.0 instruction @code{mffscdrn} if available. -Otherwise the builtin reads the FPSCR, masks the current decimal rounding -mode bits out and OR's in the new value. +The @code{movf} functions are similar but use @code{movf.ps} instead +of @code{movt.ps}. -_Decimal64 __builtin_dfp_quantize (_Decimal64, _Decimal64, const int); -_Decimal64 __builtin_dfp_quantize (const int, _Decimal64, const int); -_Decimal128 __builtin_dfp_quantize (_Decimal128, _Decimal128, const int); -_Decimal128 __builtin_dfp_quantize (const int, _Decimal128, const int); +@item int __builtin_mips_any_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +@itemx int __builtin_mips_all_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +@itemx int __builtin_mips_any_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +@itemx int __builtin_mips_all_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +Comparison of two paired-single values +(@code{c.@var{cond}.ps}/@code{cabs.@var{cond}.ps}, +@code{bc1any2t}/@code{bc1any2f}). -The @code{__builtin_dfp_quantize} built-in, converts and rounds the second -argument to the form with the exponent as specified by the first -argument based on the rounding mode specified by the third argument. -If the first argument is a decimal floating point value, its exponent is used -for converting and rounding of the second argument. If the first argument is a -5-bit constant integer value, then the value specifies the exponent to be used -when rounding and converting the second argument. The third argument is a -two bit constant integer that specifies the rounding mode. The possible modes -are: 00 Round to nearest, ties to even; 01 Round toward 0; 10 Round to nearest, -ties away from 0; 11 Round according to DRN where DRN is the Decimal Floating -point field of the FPSCR. +These functions compare @var{a} and @var{b} using @code{c.@var{cond}.ps} +or @code{cabs.@var{cond}.ps}. 
The @code{any} forms return @code{true} if either
+result is @code{true} and the @code{all} forms return @code{true} if
+both results are @code{true}.
+For example:
+
+@smallexample
+v2sf a, b;
+if (__builtin_mips_any_c_eq_ps (a, b))
+  one_is_true ();
+else
+  both_are_false ();
+
+if (__builtin_mips_all_c_eq_ps (a, b))
+  both_are_true ();
+else
+  one_is_false ();
@end smallexample
+@item int __builtin_mips_any_c_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
+@itemx int __builtin_mips_all_c_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
+@itemx int __builtin_mips_any_cabs_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
+@itemx int __builtin_mips_all_cabs_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d})
+Comparison of four paired-single values
+(@code{c.@var{cond}.ps}/@code{cabs.@var{cond}.ps},
+@code{bc1any4t}/@code{bc1any4f}).
+
+These functions use @code{c.@var{cond}.ps} or @code{cabs.@var{cond}.ps}
+to compare @var{a} with @var{b} and to compare @var{c} with @var{d}.
+The @code{any} forms return @code{true} if any of the four results are
+@code{true} and the @code{all} forms return @code{true} if all four
+results are @code{true}.
+For example:
-The following functions require @option{-mhard-float},
-@option{-mpowerpc-gfxopt}, and @option{-mpopcntb} options.
@smallexample
-double __builtin_recipdiv (double, double);
-float __builtin_recipdivf (float, float);
-double __builtin_rsqrt (double);
-float __builtin_rsqrtf (float);
+v2sf a, b, c, d;
+if (__builtin_mips_any_c_eq_4s (a, b, c, d))
+  some_are_true ();
+else
+  all_are_false ();
+
+if (__builtin_mips_all_c_eq_4s (a, b, c, d))
+  all_are_true ();
+else
+  some_are_false ();
@end smallexample
+@end table
-The @code{vec_rsqrt}, @code{__builtin_rsqrt}, and
-@code{__builtin_rsqrtf} functions generate multiple instructions to
-implement the reciprocal sqrt functionality using reciprocal sqrt
-estimate instructions.
+@node MIPS SIMD Architecture (MSA) Support
+@subsection MIPS SIMD Architecture (MSA) Support
-The @code{__builtin_recipdiv}, and @code{__builtin_recipdivf}
-functions generate multiple instructions to implement division using
-the reciprocal estimate instructions.
+@menu
+* MIPS SIMD Architecture Built-in Functions::
+@end menu
-The following functions require @option{-mhard-float} and
-@option{-mmultiple} options.
+GCC provides intrinsics to access the SIMD instructions provided by the
+MIPS SIMD Architecture (MSA).  The interface is made available by
+including @code{<msa.h>} and using
+@option{-mmsa -mhard-float -mfp64 -mnan=2008}.
+Each @code{__builtin_msa_*} built-in function also has a shortened
+intrinsic name, @code{__msa_*}.
-The @code{__builtin_unpack_longdouble} function takes a
-@code{long double} argument and a compile time constant of 0 or 1.  If
-the constant is 0, the first @code{double} within the
-@code{long double} is returned, otherwise the second @code{double}
-is returned.  The @code{__builtin_unpack_longdouble} function is only
-available if @code{long double} uses the IBM extended double
-representation.
+MSA implements 128-bit wide vector registers, operating on 8-, 16-, 32- and
+64-bit integer, 16- and 32-bit fixed-point, or 32- and 64-bit floating point
+data elements.
The following vector typedefs are included in @code{msa.h}:
+@itemize
+@item @code{v16i8}, a vector of sixteen signed 8-bit integers;
+@item @code{v16u8}, a vector of sixteen unsigned 8-bit integers;
+@item @code{v8i16}, a vector of eight signed 16-bit integers;
+@item @code{v8u16}, a vector of eight unsigned 16-bit integers;
+@item @code{v4i32}, a vector of four signed 32-bit integers;
+@item @code{v4u32}, a vector of four unsigned 32-bit integers;
+@item @code{v2i64}, a vector of two signed 64-bit integers;
+@item @code{v2u64}, a vector of two unsigned 64-bit integers;
+@item @code{v4f32}, a vector of four 32-bit floats;
+@item @code{v2f64}, a vector of two 64-bit doubles.
+@end itemize
-The @code{__builtin_pack_longdouble} function takes two @code{double}
-arguments and returns a @code{long double} value that combines the two
-arguments.  The @code{__builtin_pack_longdouble} function is only
-available if @code{long double} uses the IBM extended double
-representation.
+Some instructions and their corresponding built-in functions place
+additional restrictions on their operands; the following names are used
+below for these restricted operands:
+@itemize
+@item @code{imm0_1}, an integer literal in range 0 to 1;
+@item @code{imm0_3}, an integer literal in range 0 to 3;
+@item @code{imm0_7}, an integer literal in range 0 to 7;
+@item @code{imm0_15}, an integer literal in range 0 to 15;
+@item @code{imm0_31}, an integer literal in range 0 to 31;
+@item @code{imm0_63}, an integer literal in range 0 to 63;
+@item @code{imm0_255}, an integer literal in range 0 to 255;
+@item @code{imm_n16_15}, an integer literal in range -16 to 15;
+@item @code{imm_n512_511}, an integer literal in range -512 to 511;
+@item @code{imm_n1024_1022}, an integer literal in range -512 to 511 left
+shifted by 1 bit, i.e., -1024, -1022, @dots{}, 1020, 1022;
+@item @code{imm_n2048_2044}, an integer literal in range -512 to 511 left
+shifted by 2 bits, i.e., -2048, -2044, @dots{}, 2040, 2044;
+@item @code{imm_n4096_4088}, an integer literal in range -512 to 511 left
+shifted by 3 bits, i.e., -4096, -4088, @dots{}, 4080, 4088;
+@item @code{imm1_4}, an integer literal in range 1 to 4;
+@item @code{i32, i64, u32, u64, f32, f64}, defined as follows:
+@end itemize
-The @code{__builtin_unpack_ibm128} function takes a @code{__ibm128}
-argument and a compile time constant of 0 or 1.  If the constant is 0,
-the first @code{double} within the @code{__ibm128} is returned,
-otherwise the second @code{double} is returned.
+@smallexample
+@{
+typedef int i32;
+#if __LONG_MAX__ == __LONG_LONG_MAX__
+typedef long i64;
+#else
+typedef long long i64;
+#endif
+
+typedef unsigned int u32;
+#if __LONG_MAX__ == __LONG_LONG_MAX__
+typedef unsigned long u64;
+#else
+typedef unsigned long long u64;
+#endif
+
+typedef double f64;
+typedef float f32;
+@}
+@end smallexample
-The @code{__builtin_pack_ibm128} function takes two @code{double}
-arguments and returns a @code{__ibm128} value that combines the two
-arguments.
+@node MIPS SIMD Architecture Built-in Functions
+@subsubsection MIPS SIMD Architecture Built-in Functions
+
+The intrinsics provided are listed below; each is named after the
+machine instruction.
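+Before the full list, here is a brief usage sketch (ours, not taken
+from the MSA documentation); the shortened name @code{__msa_addv_w}
+could be used in place of the built-in:
+
+@smallexample
+#include <msa.h>
+
+v4i32
+add_vectors (v4i32 a, v4i32 b)
+@{
+  /* Element-wise addition of four 32-bit integers (addv.w).  */
+  return __builtin_msa_addv_w (a, b);
+@}
+@end smallexample
+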
@smallexample -vector signed __int128 vec_sel (vector signed __int128, - vector signed __int128, vector bool __int128); -vector signed __int128 vec_sel (vector signed __int128, - vector signed __int128, vector unsigned __int128); -vector unsigned __int128 vec_sel (vector unsigned __int128, - vector unsigned __int128, vector bool __int128); -vector unsigned __int128 vec_sel (vector unsigned __int128, - vector unsigned __int128, vector unsigned __int128); -vector bool __int128 vec_sel (vector bool __int128, - vector bool __int128, vector bool __int128); -vector bool __int128 vec_sel (vector bool __int128, - vector bool __int128, vector unsigned __int128); -@end smallexample +v16i8 __builtin_msa_add_a_b (v16i8, v16i8); +v8i16 __builtin_msa_add_a_h (v8i16, v8i16); +v4i32 __builtin_msa_add_a_w (v4i32, v4i32); +v2i64 __builtin_msa_add_a_d (v2i64, v2i64); + +v16i8 __builtin_msa_adds_a_b (v16i8, v16i8); +v8i16 __builtin_msa_adds_a_h (v8i16, v8i16); +v4i32 __builtin_msa_adds_a_w (v4i32, v4i32); +v2i64 __builtin_msa_adds_a_d (v2i64, v2i64); + +v16i8 __builtin_msa_adds_s_b (v16i8, v16i8); +v8i16 __builtin_msa_adds_s_h (v8i16, v8i16); +v4i32 __builtin_msa_adds_s_w (v4i32, v4i32); +v2i64 __builtin_msa_adds_s_d (v2i64, v2i64); + +v16u8 __builtin_msa_adds_u_b (v16u8, v16u8); +v8u16 __builtin_msa_adds_u_h (v8u16, v8u16); +v4u32 __builtin_msa_adds_u_w (v4u32, v4u32); +v2u64 __builtin_msa_adds_u_d (v2u64, v2u64); + +v16i8 __builtin_msa_addv_b (v16i8, v16i8); +v8i16 __builtin_msa_addv_h (v8i16, v8i16); +v4i32 __builtin_msa_addv_w (v4i32, v4i32); +v2i64 __builtin_msa_addv_d (v2i64, v2i64); -The instance is an extension of the existing overloaded built-in @code{vec_sel} -that is documented in the PVIPR. +v16i8 __builtin_msa_addvi_b (v16i8, imm0_31); +v8i16 __builtin_msa_addvi_h (v8i16, imm0_31); +v4i32 __builtin_msa_addvi_w (v4i32, imm0_31); +v2i64 __builtin_msa_addvi_d (v2i64, imm0_31); -@smallexample -vector signed __int128 vec_perm (vector signed __int128, - vector signed __int128); -vector unsigned __int128 vec_perm (vector unsigned __int128, - vector unsigned __int128); -@end smallexample +v16u8 __builtin_msa_and_v (v16u8, v16u8); -The instance is an extension of the existing overloaded built-in -@code{vec_perm} that is documented in the PVIPR. +v16u8 __builtin_msa_andi_b (v16u8, imm0_255); -@node Basic PowerPC Built-in Functions Available on ISA 2.06 -@subsubsection Basic PowerPC Built-in Functions Available on ISA 2.06 +v16i8 __builtin_msa_asub_s_b (v16i8, v16i8); +v8i16 __builtin_msa_asub_s_h (v8i16, v8i16); +v4i32 __builtin_msa_asub_s_w (v4i32, v4i32); +v2i64 __builtin_msa_asub_s_d (v2i64, v2i64); -The basic built-in functions described in this section are -available on the PowerPC family of processors starting with ISA 2.05 -or later. Unless specific options are explicitly disabled on the -command line, specifying option @option{-mcpu=power7} has the effect of -enabling all the same options as for @option{-mcpu=power6} in -addition to the @option{-maltivec}, @option{-mpopcntd}, and -@option{-mvsx} options. 
+v16u8 __builtin_msa_asub_u_b (v16u8, v16u8); +v8u16 __builtin_msa_asub_u_h (v8u16, v8u16); +v4u32 __builtin_msa_asub_u_w (v4u32, v4u32); +v2u64 __builtin_msa_asub_u_d (v2u64, v2u64); -The following basic built-in functions require @option{-mpopcntd}: -@smallexample -unsigned int __builtin_addg6s (unsigned int, unsigned int); -long long __builtin_bpermd (long long, long long); -unsigned int __builtin_cbcdtd (unsigned int); -unsigned int __builtin_cdtbcd (unsigned int); -long long __builtin_divde (long long, long long); -unsigned long long __builtin_divdeu (unsigned long long, unsigned long long); -int __builtin_divwe (int, int); -unsigned int __builtin_divweu (unsigned int, unsigned int); -vector __int128 __builtin_pack_vector_int128 (long long, long long); -void __builtin_rs6000_speculation_barrier (void); -long long __builtin_unpack_vector_int128 (vector __int128, signed char); -@end smallexample +v16i8 __builtin_msa_ave_s_b (v16i8, v16i8); +v8i16 __builtin_msa_ave_s_h (v8i16, v8i16); +v4i32 __builtin_msa_ave_s_w (v4i32, v4i32); +v2i64 __builtin_msa_ave_s_d (v2i64, v2i64); -Of these, the @code{__builtin_divde} and @code{__builtin_divdeu} functions -require a 64-bit environment. +v16u8 __builtin_msa_ave_u_b (v16u8, v16u8); +v8u16 __builtin_msa_ave_u_h (v8u16, v8u16); +v4u32 __builtin_msa_ave_u_w (v4u32, v4u32); +v2u64 __builtin_msa_ave_u_d (v2u64, v2u64); -The following basic built-in functions, which are also supported on -x86 targets, require @option{-mfloat128}. -@smallexample -__float128 __builtin_fabsq (__float128); -__float128 __builtin_copysignq (__float128, __float128); -__float128 __builtin_infq (void); -__float128 __builtin_huge_valq (void); -__float128 __builtin_nanq (void); -__float128 __builtin_nansq (void); +v16i8 __builtin_msa_aver_s_b (v16i8, v16i8); +v8i16 __builtin_msa_aver_s_h (v8i16, v8i16); +v4i32 __builtin_msa_aver_s_w (v4i32, v4i32); +v2i64 __builtin_msa_aver_s_d (v2i64, v2i64); -__float128 __builtin_sqrtf128 (__float128); -__float128 __builtin_fmaf128 (__float128, __float128, __float128); -@end smallexample +v16u8 __builtin_msa_aver_u_b (v16u8, v16u8); +v8u16 __builtin_msa_aver_u_h (v8u16, v8u16); +v4u32 __builtin_msa_aver_u_w (v4u32, v4u32); +v2u64 __builtin_msa_aver_u_d (v2u64, v2u64); -@node Basic PowerPC Built-in Functions Available on ISA 2.07 -@subsubsection Basic PowerPC Built-in Functions Available on ISA 2.07 +v16u8 __builtin_msa_bclr_b (v16u8, v16u8); +v8u16 __builtin_msa_bclr_h (v8u16, v8u16); +v4u32 __builtin_msa_bclr_w (v4u32, v4u32); +v2u64 __builtin_msa_bclr_d (v2u64, v2u64); -The basic built-in functions described in this section are -available on the PowerPC family of processors starting with ISA 2.07 -or later. Unless specific options are explicitly disabled on the -command line, specifying option @option{-mcpu=power8} has the effect of -enabling all the same options as for @option{-mcpu=power7} in -addition to the @option{-mpower8-fusion}, @option{-mcrypto}, -@option{-mhtm}, @option{-mquad-memory}, and -@option{-mquad-memory-atomic} options. +v16u8 __builtin_msa_bclri_b (v16u8, imm0_7); +v8u16 __builtin_msa_bclri_h (v8u16, imm0_15); +v4u32 __builtin_msa_bclri_w (v4u32, imm0_31); +v2u64 __builtin_msa_bclri_d (v2u64, imm0_63); -This section intentionally empty. 
+v16u8 __builtin_msa_binsl_b (v16u8, v16u8, v16u8); +v8u16 __builtin_msa_binsl_h (v8u16, v8u16, v8u16); +v4u32 __builtin_msa_binsl_w (v4u32, v4u32, v4u32); +v2u64 __builtin_msa_binsl_d (v2u64, v2u64, v2u64); -@node Basic PowerPC Built-in Functions Available on ISA 3.0 -@subsubsection Basic PowerPC Built-in Functions Available on ISA 3.0 +v16u8 __builtin_msa_binsli_b (v16u8, v16u8, imm0_7); +v8u16 __builtin_msa_binsli_h (v8u16, v8u16, imm0_15); +v4u32 __builtin_msa_binsli_w (v4u32, v4u32, imm0_31); +v2u64 __builtin_msa_binsli_d (v2u64, v2u64, imm0_63); -The basic built-in functions described in this section are -available on the PowerPC family of processors starting with ISA 3.0 -or later. Unless specific options are explicitly disabled on the -command line, specifying option @option{-mcpu=power9} has the effect of -enabling all the same options as for @option{-mcpu=power8} in -addition to the @option{-misel} option. +v16u8 __builtin_msa_binsr_b (v16u8, v16u8, v16u8); +v8u16 __builtin_msa_binsr_h (v8u16, v8u16, v8u16); +v4u32 __builtin_msa_binsr_w (v4u32, v4u32, v4u32); +v2u64 __builtin_msa_binsr_d (v2u64, v2u64, v2u64); -The following built-in functions are available on Linux 64-bit systems -that use the ISA 3.0 instruction set (@option{-mcpu=power9}): +v16u8 __builtin_msa_binsri_b (v16u8, v16u8, imm0_7); +v8u16 __builtin_msa_binsri_h (v8u16, v8u16, imm0_15); +v4u32 __builtin_msa_binsri_w (v4u32, v4u32, imm0_31); +v2u64 __builtin_msa_binsri_d (v2u64, v2u64, imm0_63); -@defbuiltin{__float128 __builtin_addf128_round_to_odd (__float128, __float128)} -Perform a 128-bit IEEE floating point add using round to odd as the -rounding mode. -@enddefbuiltin +v16u8 __builtin_msa_bmnz_v (v16u8, v16u8, v16u8); -@defbuiltin{__float128 __builtin_subf128_round_to_odd (__float128, __float128)} -Perform a 128-bit IEEE floating point subtract using round to odd as -the rounding mode. -@enddefbuiltin +v16u8 __builtin_msa_bmnzi_b (v16u8, v16u8, imm0_255); -@defbuiltin{__float128 __builtin_mulf128_round_to_odd (__float128, __float128)} -Perform a 128-bit IEEE floating point multiply using round to odd as -the rounding mode. -@enddefbuiltin +v16u8 __builtin_msa_bmz_v (v16u8, v16u8, v16u8); -@defbuiltin{__float128 __builtin_divf128_round_to_odd (__float128, __float128)} -Perform a 128-bit IEEE floating point divide using round to odd as -the rounding mode. -@enddefbuiltin +v16u8 __builtin_msa_bmzi_b (v16u8, v16u8, imm0_255); -@defbuiltin{__float128 __builtin_sqrtf128_round_to_odd (__float128)} -Perform a 128-bit IEEE floating point square root using round to odd -as the rounding mode. -@enddefbuiltin +v16u8 __builtin_msa_bneg_b (v16u8, v16u8); +v8u16 __builtin_msa_bneg_h (v8u16, v8u16); +v4u32 __builtin_msa_bneg_w (v4u32, v4u32); +v2u64 __builtin_msa_bneg_d (v2u64, v2u64); -@defbuiltin{__float128 __builtin_fmaf128_round_to_odd (__float128, __float128, __float128)} -Perform a 128-bit IEEE floating point fused multiply and add operation -using round to odd as the rounding mode. -@enddefbuiltin +v16u8 __builtin_msa_bnegi_b (v16u8, imm0_7); +v8u16 __builtin_msa_bnegi_h (v8u16, imm0_15); +v4u32 __builtin_msa_bnegi_w (v4u32, imm0_31); +v2u64 __builtin_msa_bnegi_d (v2u64, imm0_63); -@defbuiltin{double __builtin_truncf128_round_to_odd (__float128)} -Convert a 128-bit IEEE floating point value to @code{double} using -round to odd as the rounding mode. 
-@enddefbuiltin +i32 __builtin_msa_bnz_b (v16u8); +i32 __builtin_msa_bnz_h (v8u16); +i32 __builtin_msa_bnz_w (v4u32); +i32 __builtin_msa_bnz_d (v2u64); +i32 __builtin_msa_bnz_v (v16u8); -The following additional built-in functions are also available for the -PowerPC family of processors, starting with ISA 3.0 or later: +v16u8 __builtin_msa_bsel_v (v16u8, v16u8, v16u8); -@defbuiltin{{long long} __builtin_darn (void)} -@defbuiltinx{{long long} __builtin_darn_raw (void)} -@defbuiltinx{int __builtin_darn_32 (void)} -The @code{__builtin_darn} and @code{__builtin_darn_raw} -functions require a -64-bit environment supporting ISA 3.0 or later. -The @code{__builtin_darn} function provides a 64-bit conditioned -random number. The @code{__builtin_darn_raw} function provides a -64-bit raw random number. The @code{__builtin_darn_32} function -provides a 32-bit conditioned random number. -@enddefbuiltin +v16u8 __builtin_msa_bseli_b (v16u8, v16u8, imm0_255); -The following additional built-in functions are also available for the -PowerPC family of processors, starting with ISA 3.0 or later: +v16u8 __builtin_msa_bset_b (v16u8, v16u8); +v8u16 __builtin_msa_bset_h (v8u16, v8u16); +v4u32 __builtin_msa_bset_w (v4u32, v4u32); +v2u64 __builtin_msa_bset_d (v2u64, v2u64); -@smallexample -int __builtin_byte_in_set (unsigned char u, unsigned long long set); -int __builtin_byte_in_range (unsigned char u, unsigned int range); -int __builtin_byte_in_either_range (unsigned char u, unsigned int ranges); +v16u8 __builtin_msa_bseti_b (v16u8, imm0_7); +v8u16 __builtin_msa_bseti_h (v8u16, imm0_15); +v4u32 __builtin_msa_bseti_w (v4u32, imm0_31); +v2u64 __builtin_msa_bseti_d (v2u64, imm0_63); -int __builtin_dfp_dtstsfi_lt (unsigned int comparison, _Decimal64 value); -int __builtin_dfp_dtstsfi_lt (unsigned int comparison, _Decimal128 value); -int __builtin_dfp_dtstsfi_lt_dd (unsigned int comparison, _Decimal64 value); -int __builtin_dfp_dtstsfi_lt_td (unsigned int comparison, _Decimal128 value); +i32 __builtin_msa_bz_b (v16u8); +i32 __builtin_msa_bz_h (v8u16); +i32 __builtin_msa_bz_w (v4u32); +i32 __builtin_msa_bz_d (v2u64); -int __builtin_dfp_dtstsfi_gt (unsigned int comparison, _Decimal64 value); -int __builtin_dfp_dtstsfi_gt (unsigned int comparison, _Decimal128 value); -int __builtin_dfp_dtstsfi_gt_dd (unsigned int comparison, _Decimal64 value); -int __builtin_dfp_dtstsfi_gt_td (unsigned int comparison, _Decimal128 value); +i32 __builtin_msa_bz_v (v16u8); -int __builtin_dfp_dtstsfi_eq (unsigned int comparison, _Decimal64 value); -int __builtin_dfp_dtstsfi_eq (unsigned int comparison, _Decimal128 value); -int __builtin_dfp_dtstsfi_eq_dd (unsigned int comparison, _Decimal64 value); -int __builtin_dfp_dtstsfi_eq_td (unsigned int comparison, _Decimal128 value); +v16i8 __builtin_msa_ceq_b (v16i8, v16i8); +v8i16 __builtin_msa_ceq_h (v8i16, v8i16); +v4i32 __builtin_msa_ceq_w (v4i32, v4i32); +v2i64 __builtin_msa_ceq_d (v2i64, v2i64); -int __builtin_dfp_dtstsfi_ov (unsigned int comparison, _Decimal64 value); -int __builtin_dfp_dtstsfi_ov (unsigned int comparison, _Decimal128 value); -int __builtin_dfp_dtstsfi_ov_dd (unsigned int comparison, _Decimal64 value); -int __builtin_dfp_dtstsfi_ov_td (unsigned int comparison, _Decimal128 value); +v16i8 __builtin_msa_ceqi_b (v16i8, imm_n16_15); +v8i16 __builtin_msa_ceqi_h (v8i16, imm_n16_15); +v4i32 __builtin_msa_ceqi_w (v4i32, imm_n16_15); +v2i64 __builtin_msa_ceqi_d (v2i64, imm_n16_15); -double __builtin_mffsl(void); +i32 __builtin_msa_cfcmsa (imm0_31); -@end smallexample -The 
@code{__builtin_byte_in_set} function requires a -64-bit environment supporting ISA 3.0 or later. This function returns -a non-zero value if and only if its @code{u} argument exactly equals one of -the eight bytes contained within its 64-bit @code{set} argument. +v16i8 __builtin_msa_cle_s_b (v16i8, v16i8); +v8i16 __builtin_msa_cle_s_h (v8i16, v8i16); +v4i32 __builtin_msa_cle_s_w (v4i32, v4i32); +v2i64 __builtin_msa_cle_s_d (v2i64, v2i64); -The @code{__builtin_byte_in_range} and -@code{__builtin_byte_in_either_range} functions require an environment -supporting ISA 3.0 or later. For these two functions, the -@code{range} argument is encoded as 4 bytes, organized as -@code{hi_1:lo_1:hi_2:lo_2}. -The @code{__builtin_byte_in_range} function returns a -non-zero value if and only if its @code{u} argument is within the -range bounded between @code{lo_2} and @code{hi_2} inclusive. -The @code{__builtin_byte_in_either_range} function returns non-zero if -and only if its @code{u} argument is within either the range bounded -between @code{lo_1} and @code{hi_1} inclusive or the range bounded -between @code{lo_2} and @code{hi_2} inclusive. +v16i8 __builtin_msa_cle_u_b (v16u8, v16u8); +v8i16 __builtin_msa_cle_u_h (v8u16, v8u16); +v4i32 __builtin_msa_cle_u_w (v4u32, v4u32); +v2i64 __builtin_msa_cle_u_d (v2u64, v2u64); -The @code{__builtin_dfp_dtstsfi_lt} function returns a non-zero value -if and only if the number of significant digits of its @code{value} argument -is less than its @code{comparison} argument. The -@code{__builtin_dfp_dtstsfi_lt_dd} and -@code{__builtin_dfp_dtstsfi_lt_td} functions behave similarly, but -require that the type of the @code{value} argument be -@code{_Decimal64} and @code{_Decimal128}, respectively. +v16i8 __builtin_msa_clei_s_b (v16i8, imm_n16_15); +v8i16 __builtin_msa_clei_s_h (v8i16, imm_n16_15); +v4i32 __builtin_msa_clei_s_w (v4i32, imm_n16_15); +v2i64 __builtin_msa_clei_s_d (v2i64, imm_n16_15); -The @code{__builtin_dfp_dtstsfi_gt} function returns a non-zero value -if and only if the number of significant digits of its @code{value} argument -is greater than its @code{comparison} argument. The -@code{__builtin_dfp_dtstsfi_gt_dd} and -@code{__builtin_dfp_dtstsfi_gt_td} functions behave similarly, but -require that the type of the @code{value} argument be -@code{_Decimal64} and @code{_Decimal128}, respectively. +v16i8 __builtin_msa_clei_u_b (v16u8, imm0_31); +v8i16 __builtin_msa_clei_u_h (v8u16, imm0_31); +v4i32 __builtin_msa_clei_u_w (v4u32, imm0_31); +v2i64 __builtin_msa_clei_u_d (v2u64, imm0_31); -The @code{__builtin_dfp_dtstsfi_eq} function returns a non-zero value -if and only if the number of significant digits of its @code{value} argument -equals its @code{comparison} argument. The -@code{__builtin_dfp_dtstsfi_eq_dd} and -@code{__builtin_dfp_dtstsfi_eq_td} functions behave similarly, but -require that the type of the @code{value} argument be -@code{_Decimal64} and @code{_Decimal128}, respectively. +v16i8 __builtin_msa_clt_s_b (v16i8, v16i8); +v8i16 __builtin_msa_clt_s_h (v8i16, v8i16); +v4i32 __builtin_msa_clt_s_w (v4i32, v4i32); +v2i64 __builtin_msa_clt_s_d (v2i64, v2i64); -The @code{__builtin_dfp_dtstsfi_ov} function returns a non-zero value -if and only if its @code{value} argument has an undefined number of -significant digits, such as when @code{value} is an encoding of @code{NaN}.
-The @code{__builtin_dfp_dtstsfi_ov_dd} and -@code{__builtin_dfp_dtstsfi_ov_td} functions behave similarly, but -require that the type of the @code{value} argument be -@code{_Decimal64} and @code{_Decimal128}, respectively. +v16i8 __builtin_msa_clt_u_b (v16u8, v16u8); +v8i16 __builtin_msa_clt_u_h (v8u16, v8u16); +v4i32 __builtin_msa_clt_u_w (v4u32, v4u32); +v2i64 __builtin_msa_clt_u_d (v2u64, v2u64); -The @code{__builtin_mffsl} function uses the ISA 3.0 @code{mffsl} instruction to read -the FPSCR. The instruction is a lower-latency version of the @code{mffs} -instruction. If the @code{mffsl} instruction is not available, then the -builtin uses the older @code{mffs} instruction to read the FPSCR. +v16i8 __builtin_msa_clti_s_b (v16i8, imm_n16_15); +v8i16 __builtin_msa_clti_s_h (v8i16, imm_n16_15); +v4i32 __builtin_msa_clti_s_w (v4i32, imm_n16_15); +v2i64 __builtin_msa_clti_s_d (v2i64, imm_n16_15); -@node Basic PowerPC Built-in Functions Available on ISA 3.1 -@subsubsection Basic PowerPC Built-in Functions Available on ISA 3.1 +v16i8 __builtin_msa_clti_u_b (v16u8, imm0_31); +v8i16 __builtin_msa_clti_u_h (v8u16, imm0_31); +v4i32 __builtin_msa_clti_u_w (v4u32, imm0_31); +v2i64 __builtin_msa_clti_u_d (v2u64, imm0_31); -The basic built-in functions described in this section are -available on the PowerPC family of processors starting with ISA 3.1. -Unless specific options are explicitly disabled on the -command line, specifying option @option{-mcpu=power10} has the effect of -enabling all the same options as for @option{-mcpu=power9}. +i32 __builtin_msa_copy_s_b (v16i8, imm0_15); +i32 __builtin_msa_copy_s_h (v8i16, imm0_7); +i32 __builtin_msa_copy_s_w (v4i32, imm0_3); +i64 __builtin_msa_copy_s_d (v2i64, imm0_1); -The following built-in functions are available on Linux 64-bit systems -that use the ISA 3.1 instruction set (@option{-mcpu=power10}): +u32 __builtin_msa_copy_u_b (v16i8, imm0_15); +u32 __builtin_msa_copy_u_h (v8i16, imm0_7); +u32 __builtin_msa_copy_u_w (v4i32, imm0_3); +u64 __builtin_msa_copy_u_d (v2i64, imm0_1); -@defbuiltin{{unsigned long long} @ - __builtin_cfuged (unsigned long long, unsigned long long)} -Perform a 64-bit centrifuge operation, as if implemented by the -@code{cfuged} instruction. -@enddefbuiltin +void __builtin_msa_ctcmsa (imm0_31, i32); -@defbuiltin{{unsigned long long} @ - __builtin_cntlzdm (unsigned long long, unsigned long long)} -Perform a 64-bit count leading zeros operation under mask, as if -implemented by the @code{cntlzdm} instruction. -@enddefbuiltin +v16i8 __builtin_msa_div_s_b (v16i8, v16i8); +v8i16 __builtin_msa_div_s_h (v8i16, v8i16); +v4i32 __builtin_msa_div_s_w (v4i32, v4i32); +v2i64 __builtin_msa_div_s_d (v2i64, v2i64); -@defbuiltin{{unsigned long long} @ - __builtin_cnttzdm (unsigned long long, unsigned long long)} -Perform a 64-bit count trailing zeros operation under mask, as if -implemented by the @code{cnttzdm} instruction. -@enddefbuiltin +v16u8 __builtin_msa_div_u_b (v16u8, v16u8); +v8u16 __builtin_msa_div_u_h (v8u16, v8u16); +v4u32 __builtin_msa_div_u_w (v4u32, v4u32); +v2u64 __builtin_msa_div_u_d (v2u64, v2u64); -@defbuiltin{{unsigned long long} @ - __builtin_pdepd (unsigned long long, unsigned long long)} -Perform a 64-bit parallel bits deposit operation, as if implemented by the -@code{pdepd} instruction.
-@enddefbuiltin +v8i16 __builtin_msa_dotp_s_h (v16i8, v16i8); +v4i32 __builtin_msa_dotp_s_w (v8i16, v8i16); +v2i64 __builtin_msa_dotp_s_d (v4i32, v4i32); -@defbuiltin{{unsigned long long} @ - __builtin_pextd (unsigned long long, unsigned long long)} -Perform a 64-bit parallel bits extract operation, as if implemented by the -@code{pextd} instruction. -@enddefbuiltin +v8u16 __builtin_msa_dotp_u_h (v16u8, v16u8); +v4u32 __builtin_msa_dotp_u_w (v8u16, v8u16); +v2u64 __builtin_msa_dotp_u_d (v4u32, v4u32); -@defbuiltin{{vector signed __int128} vec_xl_sext (signed long long, signed char *)} -@defbuiltinx{{vector signed __int128} vec_xl_sext (signed long long, signed short *)} -@defbuiltinx{{vector signed __int128} vec_xl_sext (signed long long, signed int *)} -@defbuiltinx{{vector signed __int128} vec_xl_sext (signed long long, signed long long *)} -@defbuiltinx{{vector unsigned __int128} vec_xl_zext (signed long long, unsigned char *)} -@defbuiltinx{{vector unsigned __int128} vec_xl_zext (signed long long, unsigned short *)} -@defbuiltinx{{vector unsigned __int128} vec_xl_zext (signed long long, unsigned int *)} -@defbuiltinx{{vector unsigned __int128} vec_xl_zext (signed long long, unsigned long long *)} +v8i16 __builtin_msa_dpadd_s_h (v8i16, v16i8, v16i8); +v4i32 __builtin_msa_dpadd_s_w (v4i32, v8i16, v8i16); +v2i64 __builtin_msa_dpadd_s_d (v2i64, v4i32, v4i32); -Load (and sign extend) to an __int128 vector, as if implemented by the ISA 3.1 -@code{lxvrbx}, @code{lxvrhx}, @code{lxvrwx}, and @code{lxvrdx} -instructions. -@enddefbuiltin +v8u16 __builtin_msa_dpadd_u_h (v8u16, v16u8, v16u8); +v4u32 __builtin_msa_dpadd_u_w (v4u32, v8u16, v8u16); +v2u64 __builtin_msa_dpadd_u_d (v2u64, v4u32, v4u32); -@defbuiltin{{void} vec_xst_trunc (vector signed __int128, signed long long, signed char *)} -@defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, signed short *)} -@defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, signed int *)} -@defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, signed long long *)} -@defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, unsigned char *)} -@defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, unsigned short *)} -@defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, unsigned int *)} -@defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, unsigned long long *)} +v8i16 __builtin_msa_dpsub_s_h (v8i16, v16i8, v16i8); +v4i32 __builtin_msa_dpsub_s_w (v4i32, v8i16, v8i16); +v2i64 __builtin_msa_dpsub_s_d (v2i64, v4i32, v4i32); -Truncate and store the rightmost element of a vector, as if implemented by the -ISA 3.1 @code{stxvrbx}, @code{stxvrhx}, @code{stxvrwx}, and @code{stxvrdx} -instructions. -@enddefbuiltin +v8i16 __builtin_msa_dpsub_u_h (v8i16, v16u8, v16u8); +v4i32 __builtin_msa_dpsub_u_w (v4i32, v8u16, v8u16); +v2i64 __builtin_msa_dpsub_u_d (v2i64, v4u32, v4u32); -@node PowerPC AltiVec/VSX Built-in Functions -@subsection PowerPC AltiVec/VSX Built-in Functions +v4f32 __builtin_msa_fadd_w (v4f32, v4f32); +v2f64 __builtin_msa_fadd_d (v2f64, v2f64); -GCC provides an interface for the PowerPC family of processors to access -the AltiVec operations described in Motorola's AltiVec Programming -Interface Manual. The interface is made available by including -@code{<altivec.h>} and using @option{-maltivec} and -@option{-mabi=altivec}. The interface supports the following vector -types.
+v4i32 __builtin_msa_fcaf_w (v4f32, v4f32); +v2i64 __builtin_msa_fcaf_d (v2f64, v2f64); -@smallexample -vector unsigned char -vector signed char -vector bool char +v4i32 __builtin_msa_fceq_w (v4f32, v4f32); +v2i64 __builtin_msa_fceq_d (v2f64, v2f64); -vector unsigned short -vector signed short -vector bool short -vector pixel +v4i32 __builtin_msa_fclass_w (v4f32); +v2i64 __builtin_msa_fclass_d (v2f64); -vector unsigned int -vector signed int -vector bool int -vector float -@end smallexample +v4i32 __builtin_msa_fcle_w (v4f32, v4f32); +v2i64 __builtin_msa_fcle_d (v2f64, v2f64); -GCC's implementation of the high-level language interface available from -C and C++ code differs from Motorola's documentation in several ways. +v4i32 __builtin_msa_fclt_w (v4f32, v4f32); +v2i64 __builtin_msa_fclt_d (v2f64, v2f64); -@itemize @bullet +v4i32 __builtin_msa_fcne_w (v4f32, v4f32); +v2i64 __builtin_msa_fcne_d (v2f64, v2f64); -@item -A vector constant is a list of constant expressions within curly braces. +v4i32 __builtin_msa_fcor_w (v4f32, v4f32); +v2i64 __builtin_msa_fcor_d (v2f64, v2f64); -@item -A vector initializer requires no cast if the vector constant is of the -same type as the variable it is initializing. +v4i32 __builtin_msa_fcueq_w (v4f32, v4f32); +v2i64 __builtin_msa_fcueq_d (v2f64, v2f64); -@item -If @code{signed} or @code{unsigned} is omitted, the signedness of the -vector type is the default signedness of the base type. The default -varies depending on the operating system, so a portable program should -always specify the signedness. +v4i32 __builtin_msa_fcule_w (v4f32, v4f32); +v2i64 __builtin_msa_fcule_d (v2f64, v2f64); -@item -Compiling with @option{-maltivec} adds keywords @code{__vector}, -@code{vector}, @code{__pixel}, @code{pixel}, @code{__bool} and -@code{bool}. When compiling ISO C, the context-sensitive substitution -of the keywords @code{vector}, @code{pixel} and @code{bool} is -disabled. To use them, you must include @code{<altivec.h>} instead. +v4i32 __builtin_msa_fcult_w (v4f32, v4f32); +v2i64 __builtin_msa_fcult_d (v2f64, v2f64); -@item -GCC allows using a @code{typedef} name as the type specifier for a -vector type, but only under the following circumstances: +v4i32 __builtin_msa_fcun_w (v4f32, v4f32); +v2i64 __builtin_msa_fcun_d (v2f64, v2f64); -@itemize @bullet +v4i32 __builtin_msa_fcune_w (v4f32, v4f32); +v2i64 __builtin_msa_fcune_d (v2f64, v2f64); -@item -When using @code{__vector} instead of @code{vector}; for example, +v4f32 __builtin_msa_fdiv_w (v4f32, v4f32); +v2f64 __builtin_msa_fdiv_d (v2f64, v2f64); -@smallexample -typedef signed short int16; -__vector int16 data; -@end smallexample +v8i16 __builtin_msa_fexdo_h (v4f32, v4f32); +v4f32 __builtin_msa_fexdo_w (v2f64, v2f64); -@item -When using @code{vector} in keyword-and-predefine mode; for example, +v4f32 __builtin_msa_fexp2_w (v4f32, v4i32); +v2f64 __builtin_msa_fexp2_d (v2f64, v2i64); -@smallexample -typedef signed short int16; -vector int16 data; -@end smallexample +v4f32 __builtin_msa_fexupl_w (v8i16); +v2f64 __builtin_msa_fexupl_d (v4f32); -Note that keyword-and-predefine mode is enabled by disabling GNU -extensions (e.g., by using @code{-std=c11}) and including -@code{<altivec.h>}.
-@end itemize +v4f32 __builtin_msa_fexupr_w (v8i16); +v2f64 __builtin_msa_fexupr_d (v4f32); -@item -For C, overloaded functions are implemented with macros so the following -does not work: +v4f32 __builtin_msa_ffint_s_w (v4i32); +v2f64 __builtin_msa_ffint_s_d (v2i64); + +v4f32 __builtin_msa_ffint_u_w (v4u32); +v2f64 __builtin_msa_ffint_u_d (v2u64); + +v4f32 __builtin_msa_ffql_w (v8i16); +v2f64 __builtin_msa_ffql_d (v4i32); -@smallexample - vec_add ((vector signed int)@{1, 2, 3, 4@}, foo); -@end smallexample +v4f32 __builtin_msa_ffqr_w (v8i16); +v2f64 __builtin_msa_ffqr_d (v4i32); -@noindent -Since @code{vec_add} is a macro, the vector constant in the example -is treated as four separate arguments. Wrap the entire argument in -parentheses for this to work; a corrected call is sketched just below. -@end itemize +v16i8 __builtin_msa_fill_b (i32); +v8i16 __builtin_msa_fill_h (i32); +v4i32 __builtin_msa_fill_w (i32); +v2i64 __builtin_msa_fill_d (i64); -@emph{Note:} Only the @code{<altivec.h>} interface is supported. -Internally, GCC uses built-in functions to achieve the functionality in -the aforementioned header file, but they are not supported and are -subject to change without notice. +v4f32 __builtin_msa_flog2_w (v4f32); +v2f64 __builtin_msa_flog2_d (v2f64); -GCC complies with the Power Vector Intrinsic Programming Reference (PVIPR), -which may be found at -@uref{https://openpowerfoundation.org/?resource_lib=power-vector-intrinsic-programming-reference}. -Chapter 4 of this document fully documents the vector API interfaces -that must be -provided by compliant compilers. Programmers should preferentially use -the interfaces described therein. However, historically GCC has provided -additional interfaces for access to vector instructions. These are -briefly described below. Where the PVIPR provides a portable interface, -other functions in GCC that provide the same capabilities should be -considered deprecated.
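A minimal sketch of the parenthesization workaround just described: wrapping the compound-literal argument in an extra set of parentheses hides its commas from the @code{vec_add} macro, so it is passed as a single argument (here @code{foo} is assumed to be a @code{vector signed int} declared nearby).

@smallexample
vector signed int foo = @{5, 6, 7, 8@};
/* The extra parentheses keep the commas inside the vector constant
   from splitting it into four separate macro arguments.  */
vec_add (((vector signed int)@{1, 2, 3, 4@}), foo);
@end smallexample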
+v4f32 __builtin_msa_fmadd_w (v4f32, v4f32, v4f32); +v2f64 __builtin_msa_fmadd_d (v2f64, v2f64, v2f64); -The PVIPR documents the following overloaded functions: +v4f32 __builtin_msa_fmax_w (v4f32, v4f32); +v2f64 __builtin_msa_fmax_d (v2f64, v2f64); -@multitable @columnfractions 0.33 0.33 0.33 +v4f32 __builtin_msa_fmax_a_w (v4f32, v4f32); +v2f64 __builtin_msa_fmax_a_d (v2f64, v2f64); -@item @code{vec_abs} -@tab @code{vec_absd} -@tab @code{vec_abss} -@item @code{vec_add} -@tab @code{vec_addc} -@tab @code{vec_adde} -@item @code{vec_addec} -@tab @code{vec_adds} -@tab @code{vec_all_eq} -@item @code{vec_all_ge} -@tab @code{vec_all_gt} -@tab @code{vec_all_in} -@item @code{vec_all_le} -@tab @code{vec_all_lt} -@tab @code{vec_all_nan} -@item @code{vec_all_ne} -@tab @code{vec_all_nge} -@tab @code{vec_all_ngt} -@item @code{vec_all_nle} -@tab @code{vec_all_nlt} -@tab @code{vec_all_numeric} -@item @code{vec_and} -@tab @code{vec_andc} -@tab @code{vec_any_eq} -@item @code{vec_any_ge} -@tab @code{vec_any_gt} -@tab @code{vec_any_le} -@item @code{vec_any_lt} -@tab @code{vec_any_nan} -@tab @code{vec_any_ne} -@item @code{vec_any_nge} -@tab @code{vec_any_ngt} -@tab @code{vec_any_nle} -@item @code{vec_any_nlt} -@tab @code{vec_any_numeric} -@tab @code{vec_any_out} -@item @code{vec_avg} -@tab @code{vec_bperm} -@tab @code{vec_ceil} -@item @code{vec_cipher_be} -@tab @code{vec_cipherlast_be} -@tab @code{vec_cmpb} -@item @code{vec_cmpeq} -@tab @code{vec_cmpge} -@tab @code{vec_cmpgt} -@item @code{vec_cmple} -@tab @code{vec_cmplt} -@tab @code{vec_cmpne} -@item @code{vec_cmpnez} -@tab @code{vec_cntlz} -@tab @code{vec_cntlz_lsbb} -@item @code{vec_cnttz} -@tab @code{vec_cnttz_lsbb} -@tab @code{vec_cpsgn} -@item @code{vec_ctf} -@tab @code{vec_cts} -@tab @code{vec_ctu} -@item @code{vec_div} -@tab @code{vec_double} -@tab @code{vec_doublee} -@item @code{vec_doubleh} -@tab @code{vec_doublel} -@tab @code{vec_doubleo} -@item @code{vec_eqv} -@tab @code{vec_expte} -@tab @code{vec_extract} -@item @code{vec_extract_exp} -@tab @code{vec_extract_fp32_from_shorth} -@tab @code{vec_extract_fp32_from_shortl} -@item @code{vec_extract_sig} -@tab @code{vec_extract_4b} -@tab @code{vec_first_match_index} -@item @code{vec_first_match_or_eos_index} -@tab @code{vec_first_mismatch_index} -@tab @code{vec_first_mismatch_or_eos_index} -@item @code{vec_float} -@tab @code{vec_float2} -@tab @code{vec_floate} -@item @code{vec_floato} -@tab @code{vec_floor} -@tab @code{vec_gb} -@item @code{vec_insert} -@tab @code{vec_insert_exp} -@tab @code{vec_insert4b} -@item @code{vec_ld} -@tab @code{vec_lde} -@tab @code{vec_ldl} -@item @code{vec_loge} -@tab @code{vec_madd} -@tab @code{vec_madds} -@item @code{vec_max} -@tab @code{vec_mergee} -@tab @code{vec_mergeh} -@item @code{vec_mergel} -@tab @code{vec_mergeo} -@tab @code{vec_mfvscr} -@item @code{vec_min} -@tab @code{vec_mradds} -@tab @code{vec_msub} -@item @code{vec_msum} -@tab @code{vec_msums} -@tab @code{vec_mtvscr} -@item @code{vec_mul} -@tab @code{vec_mule} -@tab @code{vec_mulo} -@item @code{vec_nabs} -@tab @code{vec_nand} -@tab @code{vec_ncipher_be} -@item @code{vec_ncipherlast_be} -@tab @code{vec_nearbyint} -@tab @code{vec_neg} -@item @code{vec_nmadd} -@tab @code{vec_nmsub} -@tab @code{vec_nor} -@item @code{vec_or} -@tab @code{vec_orc} -@tab @code{vec_pack} -@item @code{vec_pack_to_short_fp32} -@tab @code{vec_packpx} -@tab @code{vec_packs} -@item @code{vec_packsu} -@tab @code{vec_parity_lsbb} -@tab @code{vec_perm} -@item @code{vec_permxor} -@tab @code{vec_pmsum_be} -@tab @code{vec_popcnt} -@item @code{vec_re} 
-@tab @code{vec_recipdiv} -@tab @code{vec_revb} -@item @code{vec_reve} -@tab @code{vec_rint} -@tab @code{vec_rl} -@item @code{vec_rlmi} -@tab @code{vec_rlnm} -@tab @code{vec_round} -@item @code{vec_rsqrt} -@tab @code{vec_rsqrte} -@tab @code{vec_sbox_be} -@item @code{vec_sel} -@tab @code{vec_shasigma_be} -@tab @code{vec_signed} -@item @code{vec_signed2} -@tab @code{vec_signede} -@tab @code{vec_signedo} -@item @code{vec_sl} -@tab @code{vec_sld} -@tab @code{vec_sldw} -@item @code{vec_sll} -@tab @code{vec_slo} -@tab @code{vec_slv} -@item @code{vec_splat} -@tab @code{vec_splat_s8} -@tab @code{vec_splat_s16} -@item @code{vec_splat_s32} -@tab @code{vec_splat_u8} -@tab @code{vec_splat_u16} -@item @code{vec_splat_u32} -@tab @code{vec_splats} -@tab @code{vec_sqrt} -@item @code{vec_sr} -@tab @code{vec_sra} -@tab @code{vec_srl} -@item @code{vec_sro} -@tab @code{vec_srv} -@tab @code{vec_st} -@item @code{vec_ste} -@tab @code{vec_stl} -@tab @code{vec_sub} -@item @code{vec_subc} -@tab @code{vec_sube} -@tab @code{vec_subec} -@item @code{vec_subs} -@tab @code{vec_sum2s} -@tab @code{vec_sum4s} -@item @code{vec_sums} -@tab @code{vec_test_data_class} -@tab @code{vec_trunc} -@item @code{vec_unpackh} -@tab @code{vec_unpackl} -@tab @code{vec_unsigned} -@item @code{vec_unsigned2} -@tab @code{vec_unsignede} -@tab @code{vec_unsignedo} -@item @code{vec_xl} -@tab @code{vec_xl_be} -@tab @code{vec_xl_len} -@item @code{vec_xl_len_r} -@tab @code{vec_xor} -@tab @code{vec_xst} -@item @code{vec_xst_be} -@tab @code{vec_xst_len} -@tab @code{vec_xst_len_r} +v4f32 __builtin_msa_fmin_w (v4f32, v4f32); +v2f64 __builtin_msa_fmin_d (v2f64, v2f64); -@end multitable +v4f32 __builtin_msa_fmin_a_w (v4f32, v4f32); +v2f64 __builtin_msa_fmin_a_d (v2f64, v2f64); -@menu -* PowerPC AltiVec Built-in Functions on ISA 2.05:: -* PowerPC AltiVec Built-in Functions Available on ISA 2.06:: -* PowerPC AltiVec Built-in Functions Available on ISA 2.07:: -* PowerPC AltiVec Built-in Functions Available on ISA 3.0:: -* PowerPC AltiVec Built-in Functions Available on ISA 3.1:: -@end menu +v4f32 __builtin_msa_fmsub_w (v4f32, v4f32, v4f32); +v2f64 __builtin_msa_fmsub_d (v2f64, v2f64, v2f64); -@node PowerPC AltiVec Built-in Functions on ISA 2.05 -@subsubsection PowerPC AltiVec Built-in Functions on ISA 2.05 +v4f32 __builtin_msa_fmul_w (v4f32, v4f32); +v2f64 __builtin_msa_fmul_d (v2f64, v2f64); -The following interfaces are supported for the generic and specific -AltiVec operations and the AltiVec predicates. In cases where there -is a direct mapping between generic and specific operations, only the -generic names are shown here, although the specific operations can also -be used. +v4f32 __builtin_msa_frint_w (v4f32); +v2f64 __builtin_msa_frint_d (v2f64); -Arguments that are documented as @code{const int} require literal -integral values within the range required for that operation. +v4f32 __builtin_msa_frcp_w (v4f32); +v2f64 __builtin_msa_frcp_d (v2f64); -Only functions excluded from the PVIPR are listed here. 
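To illustrate the literal-argument rule above with one entry from the list that follows: @code{vec_dst} takes a stream tag as its final @code{const int} argument, and since AltiVec provides four data streams the tag must be a literal constant in the range 0 through 3. A sketch only, with the prefetch control word @code{ctl} assumed to be encoded elsewhere:

@smallexample
const unsigned char *p = get_data ();  /* hypothetical data source */
int ctl = 0;                           /* stream control word, encoding assumed */
vec_dst (p, ctl, 0);   /* OK: the tag 0 is a literal in the range 0-3 */
/* vec_dst (p, ctl, tag);  would be rejected for a non-literal 'tag' */
@end smallexample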
+v4f32 __builtin_msa_frsqrt_w (v4f32); +v2f64 __builtin_msa_frsqrt_d (v2f64); -@smallexample -void vec_dss (const int); +v4i32 __builtin_msa_fsaf_w (v4f32, v4f32); +v2i64 __builtin_msa_fsaf_d (v2f64, v2f64); -void vec_dssall (void); +v4i32 __builtin_msa_fseq_w (v4f32, v4f32); +v2i64 __builtin_msa_fseq_d (v2f64, v2f64); -void vec_dst (const vector unsigned char *, int, const int); -void vec_dst (const vector signed char *, int, const int); -void vec_dst (const vector bool char *, int, const int); -void vec_dst (const vector unsigned short *, int, const int); -void vec_dst (const vector signed short *, int, const int); -void vec_dst (const vector bool short *, int, const int); -void vec_dst (const vector pixel *, int, const int); -void vec_dst (const vector unsigned int *, int, const int); -void vec_dst (const vector signed int *, int, const int); -void vec_dst (const vector bool int *, int, const int); -void vec_dst (const vector float *, int, const int); -void vec_dst (const unsigned char *, int, const int); -void vec_dst (const signed char *, int, const int); -void vec_dst (const unsigned short *, int, const int); -void vec_dst (const short *, int, const int); -void vec_dst (const unsigned int *, int, const int); -void vec_dst (const int *, int, const int); -void vec_dst (const float *, int, const int); +v4i32 __builtin_msa_fsle_w (v4f32, v4f32); +v2i64 __builtin_msa_fsle_d (v2f64, v2f64); -void vec_dstst (const vector unsigned char *, int, const int); -void vec_dstst (const vector signed char *, int, const int); -void vec_dstst (const vector bool char *, int, const int); -void vec_dstst (const vector unsigned short *, int, const int); -void vec_dstst (const vector signed short *, int, const int); -void vec_dstst (const vector bool short *, int, const int); -void vec_dstst (const vector pixel *, int, const int); -void vec_dstst (const vector unsigned int *, int, const int); -void vec_dstst (const vector signed int *, int, const int); -void vec_dstst (const vector bool int *, int, const int); -void vec_dstst (const vector float *, int, const int); -void vec_dstst (const unsigned char *, int, const int); -void vec_dstst (const signed char *, int, const int); -void vec_dstst (const unsigned short *, int, const int); -void vec_dstst (const short *, int, const int); -void vec_dstst (const unsigned int *, int, const int); -void vec_dstst (const int *, int, const int); -void vec_dstst (const unsigned long *, int, const int); -void vec_dstst (const long *, int, const int); -void vec_dstst (const float *, int, const int); +v4i32 __builtin_msa_fslt_w (v4f32, v4f32); +v2i64 __builtin_msa_fslt_d (v2f64, v2f64); -void vec_dststt (const vector unsigned char *, int, const int); -void vec_dststt (const vector signed char *, int, const int); -void vec_dststt (const vector bool char *, int, const int); -void vec_dststt (const vector unsigned short *, int, const int); -void vec_dststt (const vector signed short *, int, const int); -void vec_dststt (const vector bool short *, int, const int); -void vec_dststt (const vector pixel *, int, const int); -void vec_dststt (const vector unsigned int *, int, const int); -void vec_dststt (const vector signed int *, int, const int); -void vec_dststt (const vector bool int *, int, const int); -void vec_dststt (const vector float *, int, const int); -void vec_dststt (const unsigned char *, int, const int); -void vec_dststt (const signed char *, int, const int); -void vec_dststt (const unsigned short *, int, const int); -void vec_dststt (const short *, int, const int); 
-void vec_dststt (const unsigned int *, int, const int); -void vec_dststt (const int *, int, const int); -void vec_dststt (const float *, int, const int); +v4i32 __builtin_msa_fsne_w (v4f32, v4f32); +v2i64 __builtin_msa_fsne_d (v2f64, v2f64); -void vec_dstt (const vector unsigned char *, int, const int); -void vec_dstt (const vector signed char *, int, const int); -void vec_dstt (const vector bool char *, int, const int); -void vec_dstt (const vector unsigned short *, int, const int); -void vec_dstt (const vector signed short *, int, const int); -void vec_dstt (const vector bool short *, int, const int); -void vec_dstt (const vector pixel *, int, const int); -void vec_dstt (const vector unsigned int *, int, const int); -void vec_dstt (const vector signed int *, int, const int); -void vec_dstt (const vector bool int *, int, const int); -void vec_dstt (const vector float *, int, const int); -void vec_dstt (const unsigned char *, int, const int); -void vec_dstt (const signed char *, int, const int); -void vec_dstt (const unsigned short *, int, const int); -void vec_dstt (const short *, int, const int); -void vec_dstt (const unsigned int *, int, const int); -void vec_dstt (const int *, int, const int); -void vec_dstt (const float *, int, const int); +v4i32 __builtin_msa_fsor_w (v4f32, v4f32); +v2i64 __builtin_msa_fsor_d (v2f64, v2f64); -vector signed char vec_lvebx (int, char *); -vector unsigned char vec_lvebx (int, unsigned char *); +v4f32 __builtin_msa_fsqrt_w (v4f32); +v2f64 __builtin_msa_fsqrt_d (v2f64); -vector signed short vec_lvehx (int, short *); -vector unsigned short vec_lvehx (int, unsigned short *); +v4f32 __builtin_msa_fsub_w (v4f32, v4f32); +v2f64 __builtin_msa_fsub_d (v2f64, v2f64); -vector float vec_lvewx (int, float *); -vector signed int vec_lvewx (int, int *); -vector unsigned int vec_lvewx (int, unsigned int *); +v4i32 __builtin_msa_fsueq_w (v4f32, v4f32); +v2i64 __builtin_msa_fsueq_d (v2f64, v2f64); -vector unsigned char vec_lvsl (int, const unsigned char *); -vector unsigned char vec_lvsl (int, const signed char *); -vector unsigned char vec_lvsl (int, const unsigned short *); -vector unsigned char vec_lvsl (int, const short *); -vector unsigned char vec_lvsl (int, const unsigned int *); -vector unsigned char vec_lvsl (int, const int *); -vector unsigned char vec_lvsl (int, const float *); +v4i32 __builtin_msa_fsule_w (v4f32, v4f32); +v2i64 __builtin_msa_fsule_d (v2f64, v2f64); -vector unsigned char vec_lvsr (int, const unsigned char *); -vector unsigned char vec_lvsr (int, const signed char *); -vector unsigned char vec_lvsr (int, const unsigned short *); -vector unsigned char vec_lvsr (int, const short *); -vector unsigned char vec_lvsr (int, const unsigned int *); -vector unsigned char vec_lvsr (int, const int *); -vector unsigned char vec_lvsr (int, const float *); +v4i32 __builtin_msa_fsult_w (v4f32, v4f32); +v2i64 __builtin_msa_fsult_d (v2f64, v2f64); -void vec_stvebx (vector signed char, int, signed char *); -void vec_stvebx (vector unsigned char, int, unsigned char *); -void vec_stvebx (vector bool char, int, signed char *); -void vec_stvebx (vector bool char, int, unsigned char *); +v4i32 __builtin_msa_fsun_w (v4f32, v4f32); +v2i64 __builtin_msa_fsun_d (v2f64, v2f64); -void vec_stvehx (vector signed short, int, short *); -void vec_stvehx (vector unsigned short, int, unsigned short *); -void vec_stvehx (vector bool short, int, short *); -void vec_stvehx (vector bool short, int, unsigned short *); +v4i32 __builtin_msa_fsune_w (v4f32, v4f32); +v2i64 
__builtin_msa_fsune_d (v2f64, v2f64); -void vec_stvewx (vector float, int, float *); -void vec_stvewx (vector signed int, int, int *); -void vec_stvewx (vector unsigned int, int, unsigned int *); -void vec_stvewx (vector bool int, int, int *); -void vec_stvewx (vector bool int, int, unsigned int *); +v4i32 __builtin_msa_ftint_s_w (v4f32); +v2i64 __builtin_msa_ftint_s_d (v2f64); -vector float vec_vaddfp (vector float, vector float); +v4u32 __builtin_msa_ftint_u_w (v4f32); +v2u64 __builtin_msa_ftint_u_d (v2f64); -vector signed char vec_vaddsbs (vector bool char, vector signed char); -vector signed char vec_vaddsbs (vector signed char, vector bool char); -vector signed char vec_vaddsbs (vector signed char, vector signed char); +v8i16 __builtin_msa_ftq_h (v4f32, v4f32); +v4i32 __builtin_msa_ftq_w (v2f64, v2f64); -vector signed short vec_vaddshs (vector bool short, vector signed short); -vector signed short vec_vaddshs (vector signed short, vector bool short); -vector signed short vec_vaddshs (vector signed short, vector signed short); +v4i32 __builtin_msa_ftrunc_s_w (v4f32); +v2i64 __builtin_msa_ftrunc_s_d (v2f64); -vector signed int vec_vaddsws (vector bool int, vector signed int); -vector signed int vec_vaddsws (vector signed int, vector bool int); -vector signed int vec_vaddsws (vector signed int, vector signed int); +v4u32 __builtin_msa_ftrunc_u_w (v4f32); +v2u64 __builtin_msa_ftrunc_u_d (v2f64); -vector signed char vec_vaddubm (vector bool char, vector signed char); -vector signed char vec_vaddubm (vector signed char, vector bool char); -vector signed char vec_vaddubm (vector signed char, vector signed char); -vector unsigned char vec_vaddubm (vector bool char, vector unsigned char); -vector unsigned char vec_vaddubm (vector unsigned char, vector bool char); -vector unsigned char vec_vaddubm (vector unsigned char, vector unsigned char); +v8i16 __builtin_msa_hadd_s_h (v16i8, v16i8); +v4i32 __builtin_msa_hadd_s_w (v8i16, v8i16); +v2i64 __builtin_msa_hadd_s_d (v4i32, v4i32); -vector unsigned char vec_vaddubs (vector bool char, vector unsigned char); -vector unsigned char vec_vaddubs (vector unsigned char, vector bool char); -vector unsigned char vec_vaddubs (vector unsigned char, vector unsigned char); +v8u16 __builtin_msa_hadd_u_h (v16u8, v16u8); +v4u32 __builtin_msa_hadd_u_w (v8u16, v8u16); +v2u64 __builtin_msa_hadd_u_d (v4u32, v4u32); -vector signed short vec_vadduhm (vector bool short, vector signed short); -vector signed short vec_vadduhm (vector signed short, vector bool short); -vector signed short vec_vadduhm (vector signed short, vector signed short); -vector unsigned short vec_vadduhm (vector bool short, vector unsigned short); -vector unsigned short vec_vadduhm (vector unsigned short, vector bool short); -vector unsigned short vec_vadduhm (vector unsigned short, vector unsigned short); +v8i16 __builtin_msa_hsub_s_h (v16i8, v16i8); +v4i32 __builtin_msa_hsub_s_w (v8i16, v8i16); +v2i64 __builtin_msa_hsub_s_d (v4i32, v4i32); -vector unsigned short vec_vadduhs (vector bool short, vector unsigned short); -vector unsigned short vec_vadduhs (vector unsigned short, vector bool short); -vector unsigned short vec_vadduhs (vector unsigned short, vector unsigned short); +v8i16 __builtin_msa_hsub_u_h (v16u8, v16u8); +v4i32 __builtin_msa_hsub_u_w (v8u16, v8u16); +v2i64 __builtin_msa_hsub_u_d (v4u32, v4u32); -vector signed int vec_vadduwm (vector bool int, vector signed int); -vector signed int vec_vadduwm (vector signed int, vector bool int); -vector signed int vec_vadduwm (vector signed int, 
vector signed int); -vector unsigned int vec_vadduwm (vector bool int, vector unsigned int); -vector unsigned int vec_vadduwm (vector unsigned int, vector bool int); -vector unsigned int vec_vadduwm (vector unsigned int, vector unsigned int); +v16i8 __builtin_msa_ilvev_b (v16i8, v16i8); +v8i16 __builtin_msa_ilvev_h (v8i16, v8i16); +v4i32 __builtin_msa_ilvev_w (v4i32, v4i32); +v2i64 __builtin_msa_ilvev_d (v2i64, v2i64); -vector unsigned int vec_vadduws (vector bool int, vector unsigned int); -vector unsigned int vec_vadduws (vector unsigned int, vector bool int); -vector unsigned int vec_vadduws (vector unsigned int, vector unsigned int); +v16i8 __builtin_msa_ilvl_b (v16i8, v16i8); +v8i16 __builtin_msa_ilvl_h (v8i16, v8i16); +v4i32 __builtin_msa_ilvl_w (v4i32, v4i32); +v2i64 __builtin_msa_ilvl_d (v2i64, v2i64); -vector signed char vec_vavgsb (vector signed char, vector signed char); +v16i8 __builtin_msa_ilvod_b (v16i8, v16i8); +v8i16 __builtin_msa_ilvod_h (v8i16, v8i16); +v4i32 __builtin_msa_ilvod_w (v4i32, v4i32); +v2i64 __builtin_msa_ilvod_d (v2i64, v2i64); -vector signed short vec_vavgsh (vector signed short, vector signed short); +v16i8 __builtin_msa_ilvr_b (v16i8, v16i8); +v8i16 __builtin_msa_ilvr_h (v8i16, v8i16); +v4i32 __builtin_msa_ilvr_w (v4i32, v4i32); +v2i64 __builtin_msa_ilvr_d (v2i64, v2i64); -vector signed int vec_vavgsw (vector signed int, vector signed int); +v16i8 __builtin_msa_insert_b (v16i8, imm0_15, i32); +v8i16 __builtin_msa_insert_h (v8i16, imm0_7, i32); +v4i32 __builtin_msa_insert_w (v4i32, imm0_3, i32); +v2i64 __builtin_msa_insert_d (v2i64, imm0_1, i64); -vector unsigned char vec_vavgub (vector unsigned char, vector unsigned char); +v16i8 __builtin_msa_insve_b (v16i8, imm0_15, v16i8); +v8i16 __builtin_msa_insve_h (v8i16, imm0_7, v8i16); +v4i32 __builtin_msa_insve_w (v4i32, imm0_3, v4i32); +v2i64 __builtin_msa_insve_d (v2i64, imm0_1, v2i64); -vector unsigned short vec_vavguh (vector unsigned short, vector unsigned short); +v16i8 __builtin_msa_ld_b (const void *, imm_n512_511); +v8i16 __builtin_msa_ld_h (const void *, imm_n1024_1022); +v4i32 __builtin_msa_ld_w (const void *, imm_n2048_2044); +v2i64 __builtin_msa_ld_d (const void *, imm_n4096_4088); -vector unsigned int vec_vavguw (vector unsigned int, vector unsigned int); +v16i8 __builtin_msa_ldi_b (imm_n512_511); +v8i16 __builtin_msa_ldi_h (imm_n512_511); +v4i32 __builtin_msa_ldi_w (imm_n512_511); +v2i64 __builtin_msa_ldi_d (imm_n512_511); -vector float vec_vcfsx (vector signed int, const int); +v8i16 __builtin_msa_madd_q_h (v8i16, v8i16, v8i16); +v4i32 __builtin_msa_madd_q_w (v4i32, v4i32, v4i32); -vector float vec_vcfux (vector unsigned int, const int); +v8i16 __builtin_msa_maddr_q_h (v8i16, v8i16, v8i16); +v4i32 __builtin_msa_maddr_q_w (v4i32, v4i32, v4i32); -vector bool int vec_vcmpeqfp (vector float, vector float); +v16i8 __builtin_msa_maddv_b (v16i8, v16i8, v16i8); +v8i16 __builtin_msa_maddv_h (v8i16, v8i16, v8i16); +v4i32 __builtin_msa_maddv_w (v4i32, v4i32, v4i32); +v2i64 __builtin_msa_maddv_d (v2i64, v2i64, v2i64); -vector bool char vec_vcmpequb (vector signed char, vector signed char); -vector bool char vec_vcmpequb (vector unsigned char, vector unsigned char); +v16i8 __builtin_msa_max_a_b (v16i8, v16i8); +v8i16 __builtin_msa_max_a_h (v8i16, v8i16); +v4i32 __builtin_msa_max_a_w (v4i32, v4i32); +v2i64 __builtin_msa_max_a_d (v2i64, v2i64); -vector bool short vec_vcmpequh (vector signed short, vector signed short); -vector bool short vec_vcmpequh (vector unsigned short, vector unsigned short); +v16i8 
__builtin_msa_max_s_b (v16i8, v16i8); +v8i16 __builtin_msa_max_s_h (v8i16, v8i16); +v4i32 __builtin_msa_max_s_w (v4i32, v4i32); +v2i64 __builtin_msa_max_s_d (v2i64, v2i64); -vector bool int vec_vcmpequw (vector signed int, vector signed int); -vector bool int vec_vcmpequw (vector unsigned int, vector unsigned int); +v16u8 __builtin_msa_max_u_b (v16u8, v16u8); +v8u16 __builtin_msa_max_u_h (v8u16, v8u16); +v4u32 __builtin_msa_max_u_w (v4u32, v4u32); +v2u64 __builtin_msa_max_u_d (v2u64, v2u64); -vector bool int vec_vcmpgtfp (vector float, vector float); +v16i8 __builtin_msa_maxi_s_b (v16i8, imm_n16_15); +v8i16 __builtin_msa_maxi_s_h (v8i16, imm_n16_15); +v4i32 __builtin_msa_maxi_s_w (v4i32, imm_n16_15); +v2i64 __builtin_msa_maxi_s_d (v2i64, imm_n16_15); -vector bool char vec_vcmpgtsb (vector signed char, vector signed char); +v16u8 __builtin_msa_maxi_u_b (v16u8, imm0_31); +v8u16 __builtin_msa_maxi_u_h (v8u16, imm0_31); +v4u32 __builtin_msa_maxi_u_w (v4u32, imm0_31); +v2u64 __builtin_msa_maxi_u_d (v2u64, imm0_31); -vector bool short vec_vcmpgtsh (vector signed short, vector signed short); +v16i8 __builtin_msa_min_a_b (v16i8, v16i8); +v8i16 __builtin_msa_min_a_h (v8i16, v8i16); +v4i32 __builtin_msa_min_a_w (v4i32, v4i32); +v2i64 __builtin_msa_min_a_d (v2i64, v2i64); -vector bool int vec_vcmpgtsw (vector signed int, vector signed int); +v16i8 __builtin_msa_min_s_b (v16i8, v16i8); +v8i16 __builtin_msa_min_s_h (v8i16, v8i16); +v4i32 __builtin_msa_min_s_w (v4i32, v4i32); +v2i64 __builtin_msa_min_s_d (v2i64, v2i64); -vector bool char vec_vcmpgtub (vector unsigned char, vector unsigned char); +v16u8 __builtin_msa_min_u_b (v16u8, v16u8); +v8u16 __builtin_msa_min_u_h (v8u16, v8u16); +v4u32 __builtin_msa_min_u_w (v4u32, v4u32); +v2u64 __builtin_msa_min_u_d (v2u64, v2u64); -vector bool short vec_vcmpgtuh (vector unsigned short, vector unsigned short); +v16i8 __builtin_msa_mini_s_b (v16i8, imm_n16_15); +v8i16 __builtin_msa_mini_s_h (v8i16, imm_n16_15); +v4i32 __builtin_msa_mini_s_w (v4i32, imm_n16_15); +v2i64 __builtin_msa_mini_s_d (v2i64, imm_n16_15); -vector bool int vec_vcmpgtuw (vector unsigned int, vector unsigned int); +v16u8 __builtin_msa_mini_u_b (v16u8, imm0_31); +v8u16 __builtin_msa_mini_u_h (v8u16, imm0_31); +v4u32 __builtin_msa_mini_u_w (v4u32, imm0_31); +v2u64 __builtin_msa_mini_u_d (v2u64, imm0_31); -vector float vec_vmaxfp (vector float, vector float); +v16i8 __builtin_msa_mod_s_b (v16i8, v16i8); +v8i16 __builtin_msa_mod_s_h (v8i16, v8i16); +v4i32 __builtin_msa_mod_s_w (v4i32, v4i32); +v2i64 __builtin_msa_mod_s_d (v2i64, v2i64); -vector signed char vec_vmaxsb (vector bool char, vector signed char); -vector signed char vec_vmaxsb (vector signed char, vector bool char); -vector signed char vec_vmaxsb (vector signed char, vector signed char); +v16u8 __builtin_msa_mod_u_b (v16u8, v16u8); +v8u16 __builtin_msa_mod_u_h (v8u16, v8u16); +v4u32 __builtin_msa_mod_u_w (v4u32, v4u32); +v2u64 __builtin_msa_mod_u_d (v2u64, v2u64); -vector signed short vec_vmaxsh (vector bool short, vector signed short); -vector signed short vec_vmaxsh (vector signed short, vector bool short); -vector signed short vec_vmaxsh (vector signed short, vector signed short); +v16i8 __builtin_msa_move_v (v16i8); -vector signed int vec_vmaxsw (vector bool int, vector signed int); -vector signed int vec_vmaxsw (vector signed int, vector bool int); -vector signed int vec_vmaxsw (vector signed int, vector signed int); +v8i16 __builtin_msa_msub_q_h (v8i16, v8i16, v8i16); +v4i32 __builtin_msa_msub_q_w (v4i32, v4i32, v4i32); -vector 
unsigned char vec_vmaxub (vector bool char, vector unsigned char); -vector unsigned char vec_vmaxub (vector unsigned char, vector bool char); -vector unsigned char vec_vmaxub (vector unsigned char, vector unsigned char); +v8i16 __builtin_msa_msubr_q_h (v8i16, v8i16, v8i16); +v4i32 __builtin_msa_msubr_q_w (v4i32, v4i32, v4i32); -vector unsigned short vec_vmaxuh (vector bool short, vector unsigned short); -vector unsigned short vec_vmaxuh (vector unsigned short, vector bool short); -vector unsigned short vec_vmaxuh (vector unsigned short, vector unsigned short); +v16i8 __builtin_msa_msubv_b (v16i8, v16i8, v16i8); +v8i16 __builtin_msa_msubv_h (v8i16, v8i16, v8i16); +v4i32 __builtin_msa_msubv_w (v4i32, v4i32, v4i32); +v2i64 __builtin_msa_msubv_d (v2i64, v2i64, v2i64); -vector unsigned int vec_vmaxuw (vector bool int, vector unsigned int); -vector unsigned int vec_vmaxuw (vector unsigned int, vector bool int); -vector unsigned int vec_vmaxuw (vector unsigned int, vector unsigned int); +v8i16 __builtin_msa_mul_q_h (v8i16, v8i16); +v4i32 __builtin_msa_mul_q_w (v4i32, v4i32); -vector float vec_vminfp (vector float, vector float); +v8i16 __builtin_msa_mulr_q_h (v8i16, v8i16); +v4i32 __builtin_msa_mulr_q_w (v4i32, v4i32); -vector signed char vec_vminsb (vector bool char, vector signed char); -vector signed char vec_vminsb (vector signed char, vector bool char); -vector signed char vec_vminsb (vector signed char, vector signed char); +v16i8 __builtin_msa_mulv_b (v16i8, v16i8); +v8i16 __builtin_msa_mulv_h (v8i16, v8i16); +v4i32 __builtin_msa_mulv_w (v4i32, v4i32); +v2i64 __builtin_msa_mulv_d (v2i64, v2i64); -vector signed short vec_vminsh (vector bool short, vector signed short); -vector signed short vec_vminsh (vector signed short, vector bool short); -vector signed short vec_vminsh (vector signed short, vector signed short); +v16i8 __builtin_msa_nloc_b (v16i8); +v8i16 __builtin_msa_nloc_h (v8i16); +v4i32 __builtin_msa_nloc_w (v4i32); +v2i64 __builtin_msa_nloc_d (v2i64); -vector signed int vec_vminsw (vector bool int, vector signed int); -vector signed int vec_vminsw (vector signed int, vector bool int); -vector signed int vec_vminsw (vector signed int, vector signed int); +v16i8 __builtin_msa_nlzc_b (v16i8); +v8i16 __builtin_msa_nlzc_h (v8i16); +v4i32 __builtin_msa_nlzc_w (v4i32); +v2i64 __builtin_msa_nlzc_d (v2i64); -vector unsigned char vec_vminub (vector bool char, vector unsigned char); -vector unsigned char vec_vminub (vector unsigned char, vector bool char); -vector unsigned char vec_vminub (vector unsigned char, vector unsigned char); +v16u8 __builtin_msa_nor_v (v16u8, v16u8); -vector unsigned short vec_vminuh (vector bool short, vector unsigned short); -vector unsigned short vec_vminuh (vector unsigned short, vector bool short); -vector unsigned short vec_vminuh (vector unsigned short, vector unsigned short); +v16u8 __builtin_msa_nori_b (v16u8, imm0_255); -vector unsigned int vec_vminuw (vector bool int, vector unsigned int); -vector unsigned int vec_vminuw (vector unsigned int, vector bool int); -vector unsigned int vec_vminuw (vector unsigned int, vector unsigned int); +v16u8 __builtin_msa_or_v (v16u8, v16u8); -vector bool char vec_vmrghb (vector bool char, vector bool char); -vector signed char vec_vmrghb (vector signed char, vector signed char); -vector unsigned char vec_vmrghb (vector unsigned char, vector unsigned char); +v16u8 __builtin_msa_ori_b (v16u8, imm0_255); -vector bool short vec_vmrghh (vector bool short, vector bool short); -vector signed short vec_vmrghh (vector signed 
short, vector signed short); -vector unsigned short vec_vmrghh (vector unsigned short, vector unsigned short); -vector pixel vec_vmrghh (vector pixel, vector pixel); +v16i8 __builtin_msa_pckev_b (v16i8, v16i8); +v8i16 __builtin_msa_pckev_h (v8i16, v8i16); +v4i32 __builtin_msa_pckev_w (v4i32, v4i32); +v2i64 __builtin_msa_pckev_d (v2i64, v2i64); -vector float vec_vmrghw (vector float, vector float); -vector bool int vec_vmrghw (vector bool int, vector bool int); -vector signed int vec_vmrghw (vector signed int, vector signed int); -vector unsigned int vec_vmrghw (vector unsigned int, vector unsigned int); +v16i8 __builtin_msa_pckod_b (v16i8, v16i8); +v8i16 __builtin_msa_pckod_h (v8i16, v8i16); +v4i32 __builtin_msa_pckod_w (v4i32, v4i32); +v2i64 __builtin_msa_pckod_d (v2i64, v2i64); -vector bool char vec_vmrglb (vector bool char, vector bool char); -vector signed char vec_vmrglb (vector signed char, vector signed char); -vector unsigned char vec_vmrglb (vector unsigned char, vector unsigned char); +v16i8 __builtin_msa_pcnt_b (v16i8); +v8i16 __builtin_msa_pcnt_h (v8i16); +v4i32 __builtin_msa_pcnt_w (v4i32); +v2i64 __builtin_msa_pcnt_d (v2i64); -vector bool short vec_vmrglh (vector bool short, vector bool short); -vector signed short vec_vmrglh (vector signed short, vector signed short); -vector unsigned short vec_vmrglh (vector unsigned short, vector unsigned short); -vector pixel vec_vmrglh (vector pixel, vector pixel); +v16i8 __builtin_msa_sat_s_b (v16i8, imm0_7); +v8i16 __builtin_msa_sat_s_h (v8i16, imm0_15); +v4i32 __builtin_msa_sat_s_w (v4i32, imm0_31); +v2i64 __builtin_msa_sat_s_d (v2i64, imm0_63); -vector float vec_vmrglw (vector float, vector float); -vector signed int vec_vmrglw (vector signed int, vector signed int); -vector unsigned int vec_vmrglw (vector unsigned int, vector unsigned int); -vector bool int vec_vmrglw (vector bool int, vector bool int); +v16u8 __builtin_msa_sat_u_b (v16u8, imm0_7); +v8u16 __builtin_msa_sat_u_h (v8u16, imm0_15); +v4u32 __builtin_msa_sat_u_w (v4u32, imm0_31); +v2u64 __builtin_msa_sat_u_d (v2u64, imm0_63); -vector signed int vec_vmsummbm (vector signed char, vector unsigned char, - vector signed int); +v16i8 __builtin_msa_shf_b (v16i8, imm0_255); +v8i16 __builtin_msa_shf_h (v8i16, imm0_255); +v4i32 __builtin_msa_shf_w (v4i32, imm0_255); -vector signed int vec_vmsumshm (vector signed short, vector signed short, - vector signed int); +v16i8 __builtin_msa_sld_b (v16i8, v16i8, i32); +v8i16 __builtin_msa_sld_h (v8i16, v8i16, i32); +v4i32 __builtin_msa_sld_w (v4i32, v4i32, i32); +v2i64 __builtin_msa_sld_d (v2i64, v2i64, i32); -vector signed int vec_vmsumshs (vector signed short, vector signed short, - vector signed int); +v16i8 __builtin_msa_sldi_b (v16i8, v16i8, imm0_15); +v8i16 __builtin_msa_sldi_h (v8i16, v8i16, imm0_7); +v4i32 __builtin_msa_sldi_w (v4i32, v4i32, imm0_3); +v2i64 __builtin_msa_sldi_d (v2i64, v2i64, imm0_1); -vector unsigned int vec_vmsumubm (vector unsigned char, vector unsigned char, - vector unsigned int); +v16i8 __builtin_msa_sll_b (v16i8, v16i8); +v8i16 __builtin_msa_sll_h (v8i16, v8i16); +v4i32 __builtin_msa_sll_w (v4i32, v4i32); +v2i64 __builtin_msa_sll_d (v2i64, v2i64); -vector unsigned int vec_vmsumuhm (vector unsigned short, vector unsigned short, - vector unsigned int); +v16i8 __builtin_msa_slli_b (v16i8, imm0_7); +v8i16 __builtin_msa_slli_h (v8i16, imm0_15); +v4i32 __builtin_msa_slli_w (v4i32, imm0_31); +v2i64 __builtin_msa_slli_d (v2i64, imm0_63); -vector unsigned int vec_vmsumuhs (vector unsigned short, vector unsigned short, 
- vector unsigned int); +v16i8 __builtin_msa_splat_b (v16i8, i32); +v8i16 __builtin_msa_splat_h (v8i16, i32); +v4i32 __builtin_msa_splat_w (v4i32, i32); +v2i64 __builtin_msa_splat_d (v2i64, i32); -vector signed short vec_vmulesb (vector signed char, vector signed char); +v16i8 __builtin_msa_splati_b (v16i8, imm0_15); +v8i16 __builtin_msa_splati_h (v8i16, imm0_7); +v4i32 __builtin_msa_splati_w (v4i32, imm0_3); +v2i64 __builtin_msa_splati_d (v2i64, imm0_1); -vector signed int vec_vmulesh (vector signed short, vector signed short); +v16i8 __builtin_msa_sra_b (v16i8, v16i8); +v8i16 __builtin_msa_sra_h (v8i16, v8i16); +v4i32 __builtin_msa_sra_w (v4i32, v4i32); +v2i64 __builtin_msa_sra_d (v2i64, v2i64); -vector unsigned short vec_vmuleub (vector unsigned char, vector unsigned char); +v16i8 __builtin_msa_srai_b (v16i8, imm0_7); +v8i16 __builtin_msa_srai_h (v8i16, imm0_15); +v4i32 __builtin_msa_srai_w (v4i32, imm0_31); +v2i64 __builtin_msa_srai_d (v2i64, imm0_63); -vector unsigned int vec_vmuleuh (vector unsigned short, vector unsigned short); +v16i8 __builtin_msa_srar_b (v16i8, v16i8); +v8i16 __builtin_msa_srar_h (v8i16, v8i16); +v4i32 __builtin_msa_srar_w (v4i32, v4i32); +v2i64 __builtin_msa_srar_d (v2i64, v2i64); -vector signed short vec_vmulosb (vector signed char, vector signed char); +v16i8 __builtin_msa_srari_b (v16i8, imm0_7); +v8i16 __builtin_msa_srari_h (v8i16, imm0_15); +v4i32 __builtin_msa_srari_w (v4i32, imm0_31); +v2i64 __builtin_msa_srari_d (v2i64, imm0_63); -vector signed int vec_vmulosh (vector signed short, vector signed short); +v16i8 __builtin_msa_srl_b (v16i8, v16i8); +v8i16 __builtin_msa_srl_h (v8i16, v8i16); +v4i32 __builtin_msa_srl_w (v4i32, v4i32); +v2i64 __builtin_msa_srl_d (v2i64, v2i64); -vector unsigned short vec_vmuloub (vector unsigned char, vector unsigned char); +v16i8 __builtin_msa_srli_b (v16i8, imm0_7); +v8i16 __builtin_msa_srli_h (v8i16, imm0_15); +v4i32 __builtin_msa_srli_w (v4i32, imm0_31); +v2i64 __builtin_msa_srli_d (v2i64, imm0_63); -vector unsigned int vec_vmulouh (vector unsigned short, vector unsigned short); +v16i8 __builtin_msa_srlr_b (v16i8, v16i8); +v8i16 __builtin_msa_srlr_h (v8i16, v8i16); +v4i32 __builtin_msa_srlr_w (v4i32, v4i32); +v2i64 __builtin_msa_srlr_d (v2i64, v2i64); -vector signed char vec_vpkshss (vector signed short, vector signed short); +v16i8 __builtin_msa_srlri_b (v16i8, imm0_7); +v8i16 __builtin_msa_srlri_h (v8i16, imm0_15); +v4i32 __builtin_msa_srlri_w (v4i32, imm0_31); +v2i64 __builtin_msa_srlri_d (v2i64, imm0_63); -vector unsigned char vec_vpkshus (vector signed short, vector signed short); +void __builtin_msa_st_b (v16i8, void *, imm_n512_511); +void __builtin_msa_st_h (v8i16, void *, imm_n1024_1022); +void __builtin_msa_st_w (v4i32, void *, imm_n2048_2044); +void __builtin_msa_st_d (v2i64, void *, imm_n4096_4088); -vector signed short vec_vpkswss (vector signed int, vector signed int); +v16i8 __builtin_msa_subs_s_b (v16i8, v16i8); +v8i16 __builtin_msa_subs_s_h (v8i16, v8i16); +v4i32 __builtin_msa_subs_s_w (v4i32, v4i32); +v2i64 __builtin_msa_subs_s_d (v2i64, v2i64); -vector unsigned short vec_vpkswus (vector signed int, vector signed int); +v16u8 __builtin_msa_subs_u_b (v16u8, v16u8); +v8u16 __builtin_msa_subs_u_h (v8u16, v8u16); +v4u32 __builtin_msa_subs_u_w (v4u32, v4u32); +v2u64 __builtin_msa_subs_u_d (v2u64, v2u64); -vector bool char vec_vpkuhum (vector bool short, vector bool short); -vector signed char vec_vpkuhum (vector signed short, vector signed short); -vector unsigned char vec_vpkuhum (vector unsigned short, 
vector unsigned short); +v16u8 __builtin_msa_subsus_u_b (v16u8, v16i8); +v8u16 __builtin_msa_subsus_u_h (v8u16, v8i16); +v4u32 __builtin_msa_subsus_u_w (v4u32, v4i32); +v2u64 __builtin_msa_subsus_u_d (v2u64, v2i64); -vector unsigned char vec_vpkuhus (vector unsigned short, vector unsigned short); +v16i8 __builtin_msa_subsuu_s_b (v16u8, v16u8); +v8i16 __builtin_msa_subsuu_s_h (v8u16, v8u16); +v4i32 __builtin_msa_subsuu_s_w (v4u32, v4u32); +v2i64 __builtin_msa_subsuu_s_d (v2u64, v2u64); -vector bool short vec_vpkuwum (vector bool int, vector bool int); -vector signed short vec_vpkuwum (vector signed int, vector signed int); -vector unsigned short vec_vpkuwum (vector unsigned int, vector unsigned int); +v16i8 __builtin_msa_subv_b (v16i8, v16i8); +v8i16 __builtin_msa_subv_h (v8i16, v8i16); +v4i32 __builtin_msa_subv_w (v4i32, v4i32); +v2i64 __builtin_msa_subv_d (v2i64, v2i64); -vector unsigned short vec_vpkuwus (vector unsigned int, vector unsigned int); +v16i8 __builtin_msa_subvi_b (v16i8, imm0_31); +v8i16 __builtin_msa_subvi_h (v8i16, imm0_31); +v4i32 __builtin_msa_subvi_w (v4i32, imm0_31); +v2i64 __builtin_msa_subvi_d (v2i64, imm0_31); -vector signed char vec_vrlb (vector signed char, vector unsigned char); -vector unsigned char vec_vrlb (vector unsigned char, vector unsigned char); +v16i8 __builtin_msa_vshf_b (v16i8, v16i8, v16i8); +v8i16 __builtin_msa_vshf_h (v8i16, v8i16, v8i16); +v4i32 __builtin_msa_vshf_w (v4i32, v4i32, v4i32); +v2i64 __builtin_msa_vshf_d (v2i64, v2i64, v2i64); -vector signed short vec_vrlh (vector signed short, vector unsigned short); -vector unsigned short vec_vrlh (vector unsigned short, vector unsigned short); +v16u8 __builtin_msa_xor_v (v16u8, v16u8); -vector signed int vec_vrlw (vector signed int, vector unsigned int); -vector unsigned int vec_vrlw (vector unsigned int, vector unsigned int); +v16u8 __builtin_msa_xori_b (v16u8, imm0_255); +@end smallexample -vector signed char vec_vslb (vector signed char, vector unsigned char); -vector unsigned char vec_vslb (vector unsigned char, vector unsigned char); +@node Other MIPS Built-in Functions +@subsection Other MIPS Built-in Functions -vector signed short vec_vslh (vector signed short, vector unsigned short); -vector unsigned short vec_vslh (vector unsigned short, vector unsigned short); +GCC provides other MIPS-specific built-in functions: -vector signed int vec_vslw (vector signed int, vector unsigned int); -vector unsigned int vec_vslw (vector unsigned int, vector unsigned int); +@table @code +@item void __builtin_mips_cache (int @var{op}, const volatile void *@var{addr}) +Insert a @samp{cache} instruction with operands @var{op} and @var{addr}. +GCC defines the preprocessor macro @code{___GCC_HAVE_BUILTIN_MIPS_CACHE} +when this function is available. -vector signed char vec_vspltb (vector signed char, const int); -vector unsigned char vec_vspltb (vector unsigned char, const int); -vector bool char vec_vspltb (vector bool char, const int); +@item unsigned int __builtin_mips_get_fcsr (void) +@itemx void __builtin_mips_set_fcsr (unsigned int @var{value}) +Get and set the contents of the floating-point control and status register +(FPU control register 31). These functions are only available in hard-float +code but can be called in both MIPS16 and non-MIPS16 contexts. 
-vector bool short vec_vsplth (vector bool short, const int); -vector signed short vec_vsplth (vector signed short, const int); -vector unsigned short vec_vsplth (vector unsigned short, const int); -vector pixel vec_vsplth (vector pixel, const int); +@code{__builtin_mips_set_fcsr} can be used to change any bit of the +register except the condition codes, which GCC assumes are preserved. +@end table -vector float vec_vspltw (vector float, const int); -vector signed int vec_vspltw (vector signed int, const int); -vector unsigned int vec_vspltw (vector unsigned int, const int); -vector bool int vec_vspltw (vector bool int, const int); +@node MSP430 Built-in Functions +@subsection MSP430 Built-in Functions -vector signed char vec_vsrab (vector signed char, vector unsigned char); -vector unsigned char vec_vsrab (vector unsigned char, vector unsigned char); +GCC provides a couple of special built-in functions to aid in the +writing of interrupt handlers in C. -vector signed short vec_vsrah (vector signed short, vector unsigned short); -vector unsigned short vec_vsrah (vector unsigned short, vector unsigned short); +@table @code +@item __bic_SR_register_on_exit (int @var{mask}) +This clears the indicated bits in the saved copy of the status register +currently residing on the stack. This only works inside interrupt +handlers and the changes to the status register will only take effect +once the handler returns. -vector signed int vec_vsraw (vector signed int, vector unsigned int); -vector unsigned int vec_vsraw (vector unsigned int, vector unsigned int); +@item __bis_SR_register_on_exit (int @var{mask}) +This sets the indicated bits in the saved copy of the status register +currently residing on the stack. This only works inside interrupt +handlers and the changes to the status register will only take effect +once the handler returns. -vector signed char vec_vsrb (vector signed char, vector unsigned char); -vector unsigned char vec_vsrb (vector unsigned char, vector unsigned char); +@item __delay_cycles (long long @var{cycles}) +This inserts an instruction sequence that takes exactly @var{cycles} +cycles (between 0 and about 17E9) to complete. The inserted sequence +may use jumps, loops, or no-ops, and does not interfere with any other +instructions. Note that @var{cycles} must be a compile-time constant +integer; that is, you must pass a number, not a variable that may be +optimized to a constant later. The number of cycles delayed by this +built-in is exact. +@end table -vector signed short vec_vsrh (vector signed short, vector unsigned short); -vector unsigned short vec_vsrh (vector unsigned short, vector unsigned short); +@node NDS32 Built-in Functions +@subsection NDS32 Built-in Functions -vector signed int vec_vsrw (vector signed int, vector unsigned int); -vector unsigned int vec_vsrw (vector unsigned int, vector unsigned int); +These built-in functions are available for the NDS32 target: -vector float vec_vsubfp (vector float, vector float); +@defbuiltin{void __builtin_nds32_isync (int *@var{addr})} +Insert an ISYNC instruction into the instruction stream where +@var{addr} is an instruction address for serialization. +@enddefbuiltin -vector signed char vec_vsubsbs (vector bool char, vector signed char); -vector signed char vec_vsubsbs (vector signed char, vector bool char); -vector signed char vec_vsubsbs (vector signed char, vector signed char); +@defbuiltin{void __builtin_nds32_isb (void)} +Insert an ISB instruction into the instruction stream.
+@enddefbuiltin
-vector signed short vec_vsubshs (vector bool short, vector signed short);
-vector signed short vec_vsubshs (vector signed short, vector bool short);
-vector signed short vec_vsubshs (vector signed short, vector signed short);
+@defbuiltin{int __builtin_nds32_mfsr (int @var{sr})}
+Return the content of the system register that is mapped by @var{sr}.
+@enddefbuiltin
-vector signed int vec_vsubsws (vector bool int, vector signed int);
-vector signed int vec_vsubsws (vector signed int, vector bool int);
-vector signed int vec_vsubsws (vector signed int, vector signed int);
+@defbuiltin{int __builtin_nds32_mfusr (int @var{usr})}
+Return the content of the user space register that is mapped by @var{usr}.
+@enddefbuiltin
-vector signed char vec_vsububm (vector bool char, vector signed char);
-vector signed char vec_vsububm (vector signed char, vector bool char);
-vector signed char vec_vsububm (vector signed char, vector signed char);
-vector unsigned char vec_vsububm (vector bool char, vector unsigned char);
-vector unsigned char vec_vsububm (vector unsigned char, vector bool char);
-vector unsigned char vec_vsububm (vector unsigned char, vector unsigned char);
+@defbuiltin{void __builtin_nds32_mtsr (int @var{value}, int @var{sr})}
+Move @var{value} to the system register that is mapped by @var{sr}.
+@enddefbuiltin
-vector unsigned char vec_vsububs (vector bool char, vector unsigned char);
-vector unsigned char vec_vsububs (vector unsigned char, vector bool char);
-vector unsigned char vec_vsububs (vector unsigned char, vector unsigned char);
+@defbuiltin{void __builtin_nds32_mtusr (int @var{value}, int @var{usr})}
+Move @var{value} to the user space register that is mapped by @var{usr}.
+@enddefbuiltin
-vector signed short vec_vsubuhm (vector bool short, vector signed short);
-vector signed short vec_vsubuhm (vector signed short, vector bool short);
-vector signed short vec_vsubuhm (vector signed short, vector signed short);
-vector unsigned short vec_vsubuhm (vector bool short, vector unsigned short);
-vector unsigned short vec_vsubuhm (vector unsigned short, vector bool short);
-vector unsigned short vec_vsubuhm (vector unsigned short, vector unsigned short);
+@defbuiltin{void __builtin_nds32_setgie_en (void)}
+Enable global interrupts.
+@enddefbuiltin
-vector unsigned short vec_vsubuhs (vector bool short, vector unsigned short);
-vector unsigned short vec_vsubuhs (vector unsigned short, vector bool short);
-vector unsigned short vec_vsubuhs (vector unsigned short, vector unsigned short);
+@defbuiltin{void __builtin_nds32_setgie_dis (void)}
+Disable global interrupts.
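+
+A minimal sketch of how the two @code{setgie} built-ins might bracket a
+short critical section (the names below are illustrative only):
+
+@smallexample
+volatile int shared_counter;
+
+void
+increment_counter (void)
+@{
+  __builtin_nds32_setgie_dis ();  /* Mask global interrupts.  */
+  shared_counter++;               /* Update shared state safely.  */
+  __builtin_nds32_setgie_en ();   /* Unmask global interrupts.  */
+@}
+@end smallexample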
+@enddefbuiltin
-vector signed int vec_vsubuwm (vector bool int, vector signed int);
-vector signed int vec_vsubuwm (vector signed int, vector bool int);
-vector signed int vec_vsubuwm (vector signed int, vector signed int);
-vector unsigned int vec_vsubuwm (vector bool int, vector unsigned int);
-vector unsigned int vec_vsubuwm (vector unsigned int, vector bool int);
-vector unsigned int vec_vsubuwm (vector unsigned int, vector unsigned int);
+@node Nvidia PTX Built-in Functions
+@subsection Nvidia PTX Built-in Functions
-vector unsigned int vec_vsubuws (vector bool int, vector unsigned int);
-vector unsigned int vec_vsubuws (vector unsigned int, vector bool int);
-vector unsigned int vec_vsubuws (vector unsigned int, vector unsigned int);
+These built-in functions are available for the Nvidia PTX target:
-vector signed int vec_vsum4sbs (vector signed char, vector signed int);
+@defbuiltin{{unsigned int} __builtin_nvptx_brev (unsigned int @var{x})}
+Reverse the bit order of a 32-bit unsigned integer.
+@enddefbuiltin
-vector signed int vec_vsum4shs (vector signed short, vector signed int);
+@defbuiltin{{unsigned long long} __builtin_nvptx_brevll (unsigned long long @var{x})}
+Reverse the bit order of a 64-bit unsigned integer.
+@enddefbuiltin
-vector unsigned int vec_vsum4ubs (vector unsigned char, vector unsigned int);
+@node Basic PowerPC Built-in Functions
+@subsection Basic PowerPC Built-in Functions
-vector unsigned int vec_vupkhpx (vector pixel);
+@menu
+* Basic PowerPC Built-in Functions Available on all Configurations::
+* Basic PowerPC Built-in Functions Available on ISA 2.05::
+* Basic PowerPC Built-in Functions Available on ISA 2.06::
+* Basic PowerPC Built-in Functions Available on ISA 2.07::
+* Basic PowerPC Built-in Functions Available on ISA 3.0::
+* Basic PowerPC Built-in Functions Available on ISA 3.1::
+@end menu
-vector bool short vec_vupkhsb (vector bool char);
-vector signed short vec_vupkhsb (vector signed char);
+This section describes PowerPC built-in functions that do not require
+the inclusion of any special header files to declare prototypes or
+provide macro definitions.  The sections that follow describe
+additional PowerPC built-in functions.
-vector bool int vec_vupkhsh (vector bool short);
-vector signed int vec_vupkhsh (vector signed short);
+@node Basic PowerPC Built-in Functions Available on all Configurations
+@subsubsection Basic PowerPC Built-in Functions Available on all Configurations
-vector unsigned int vec_vupklpx (vector pixel);
+@defbuiltin{void __builtin_cpu_init (void)}
+This function is a @code{nop} on the PowerPC platform and is included solely
+to maintain API compatibility with the x86 builtins.
+@enddefbuiltin
-vector bool short vec_vupklsb (vector bool char);
-vector signed short vec_vupklsb (vector signed char);
+@defbuiltin{int __builtin_cpu_is (const char *@var{cpuname})}
+This function returns a value of @code{1} if the run-time CPU is of type
+@var{cpuname} and returns @code{0} otherwise.
-vector bool int vec_vupklsh (vector bool short);
-vector signed int vec_vupklsh (vector signed short);
-@end smallexample
+The @code{__builtin_cpu_is} function requires GLIBC 2.23 or newer,
+which exports the hardware capability bits.  GCC defines the macro
+@code{__BUILTIN_CPU_SUPPORTS__} if the @code{__builtin_cpu_supports}
+built-in function is fully supported.
-@node PowerPC AltiVec Built-in Functions Available on ISA 2.06
-@subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.06
+If GCC was configured to use a GLIBC before 2.23, the built-in
+function @code{__builtin_cpu_is} always returns a 0 and the compiler
+issues a warning.
-The AltiVec built-in functions described in this section are
-available on the PowerPC family of processors starting with ISA 2.06
-or later.  These are normally enabled by adding @option{-mvsx} to the
-command line.
+The following CPU names can be detected:
-When @option{-mvsx} is used, the following additional vector types are
-implemented.
+@table @samp
+@item power10
+IBM POWER10 Server CPU.
+@item power9
+IBM POWER9 Server CPU.
+@item power8
+IBM POWER8 Server CPU.
+@item power7
+IBM POWER7 Server CPU.
+@item power6x
+IBM POWER6 Server CPU (RAW mode).
+@item power6
+IBM POWER6 Server CPU (Architected mode).
+@item power5+
+IBM POWER5+ Server CPU.
+@item power5
+IBM POWER5 Server CPU.
+@item ppc970
+IBM 970 Server CPU (i.e., Apple G5).
+@item power4
+IBM POWER4 Server CPU.
+@item ppca2
+IBM A2 64-bit Embedded CPU.
+@item ppc476
+IBM PowerPC 476FP 32-bit Embedded CPU.
+@item ppc464
+IBM PowerPC 464 32-bit Embedded CPU.
+@item ppc440
+PowerPC 440 32-bit Embedded CPU.
+@item ppc405
+PowerPC 405 32-bit Embedded CPU.
+@item ppc-cell-be
+IBM PowerPC Cell Broadband Engine Architecture CPU.
+@end table
+Here is an example:
@smallexample
-vector unsigned __int128
-vector signed __int128
-vector unsigned long long int
-vector signed long long int
-vector double
+#ifdef __BUILTIN_CPU_SUPPORTS__
+  if (__builtin_cpu_is ("power8"))
+    @{
+       do_power8 (); // POWER8 specific implementation.
+    @}
+  else
+#endif
+    @{
+       do_generic (); // Generic implementation.
+    @}
@end smallexample
+@enddefbuiltin
-The long long types are only implemented for 64-bit code generation.
-
-Only functions excluded from the PVIPR are listed here.
-
-@smallexample
-void vec_dst (const unsigned long *, int, const int);
-void vec_dst (const long *, int, const int);
-
-void vec_dststt (const unsigned long *, int, const int);
-void vec_dststt (const long *, int, const int);
-
-void vec_dstt (const unsigned long *, int, const int);
-void vec_dstt (const long *, int, const int);
-
-vector unsigned char vec_lvsl (int, const unsigned long *);
-vector unsigned char vec_lvsl (int, const long *);
-
-vector unsigned char vec_lvsr (int, const unsigned long *);
-vector unsigned char vec_lvsr (int, const long *);
+@defbuiltin{int __builtin_cpu_supports (const char *@var{feature})}
+This function returns a value of @code{1} if the run-time CPU supports the HWCAP
+feature @var{feature} and returns @code{0} otherwise.
-vector unsigned char vec_lvsl (int, const double *);
-vector unsigned char vec_lvsr (int, const double *);
+The @code{__builtin_cpu_supports} function requires GLIBC 2.23 or
+newer, which exports the hardware capability bits.  GCC defines the
+macro @code{__BUILTIN_CPU_SUPPORTS__} if the
+@code{__builtin_cpu_supports} built-in function is fully supported.
-vector double vec_vsx_ld (int, const vector double *); -vector double vec_vsx_ld (int, const double *); -vector float vec_vsx_ld (int, const vector float *); -vector float vec_vsx_ld (int, const float *); -vector bool int vec_vsx_ld (int, const vector bool int *); -vector signed int vec_vsx_ld (int, const vector signed int *); -vector signed int vec_vsx_ld (int, const int *); -vector signed int vec_vsx_ld (int, const long *); -vector unsigned int vec_vsx_ld (int, const vector unsigned int *); -vector unsigned int vec_vsx_ld (int, const unsigned int *); -vector unsigned int vec_vsx_ld (int, const unsigned long *); -vector bool short vec_vsx_ld (int, const vector bool short *); -vector pixel vec_vsx_ld (int, const vector pixel *); -vector signed short vec_vsx_ld (int, const vector signed short *); -vector signed short vec_vsx_ld (int, const short *); -vector unsigned short vec_vsx_ld (int, const vector unsigned short *); -vector unsigned short vec_vsx_ld (int, const unsigned short *); -vector bool char vec_vsx_ld (int, const vector bool char *); -vector signed char vec_vsx_ld (int, const vector signed char *); -vector signed char vec_vsx_ld (int, const signed char *); -vector unsigned char vec_vsx_ld (int, const vector unsigned char *); -vector unsigned char vec_vsx_ld (int, const unsigned char *); +If GCC was configured to use a GLIBC before 2.23, the built-in +function @code{__builtin_cpu_supports} always returns a 0 and the +compiler issues a warning. -void vec_vsx_st (vector double, int, vector double *); -void vec_vsx_st (vector double, int, double *); -void vec_vsx_st (vector float, int, vector float *); -void vec_vsx_st (vector float, int, float *); -void vec_vsx_st (vector signed int, int, vector signed int *); -void vec_vsx_st (vector signed int, int, int *); -void vec_vsx_st (vector unsigned int, int, vector unsigned int *); -void vec_vsx_st (vector unsigned int, int, unsigned int *); -void vec_vsx_st (vector bool int, int, vector bool int *); -void vec_vsx_st (vector bool int, int, unsigned int *); -void vec_vsx_st (vector bool int, int, int *); -void vec_vsx_st (vector signed short, int, vector signed short *); -void vec_vsx_st (vector signed short, int, short *); -void vec_vsx_st (vector unsigned short, int, vector unsigned short *); -void vec_vsx_st (vector unsigned short, int, unsigned short *); -void vec_vsx_st (vector bool short, int, vector bool short *); -void vec_vsx_st (vector bool short, int, unsigned short *); -void vec_vsx_st (vector pixel, int, vector pixel *); -void vec_vsx_st (vector pixel, int, unsigned short *); -void vec_vsx_st (vector pixel, int, short *); -void vec_vsx_st (vector bool short, int, short *); -void vec_vsx_st (vector signed char, int, vector signed char *); -void vec_vsx_st (vector signed char, int, signed char *); -void vec_vsx_st (vector unsigned char, int, vector unsigned char *); -void vec_vsx_st (vector unsigned char, int, unsigned char *); -void vec_vsx_st (vector bool char, int, vector bool char *); -void vec_vsx_st (vector bool char, int, unsigned char *); -void vec_vsx_st (vector bool char, int, signed char *); +The following features can be +detected: -vector double vec_xxpermdi (vector double, vector double, const int); -vector float vec_xxpermdi (vector float, vector float, const int); -vector __int128 vec_xxpermdi (vector __int128, - vector __int128, const int); -vector __uint128 vec_xxpermdi (vector __uint128, - vector __uint128, const int); -vector long long vec_xxpermdi (vector long long, vector long long, const int); -vector 
unsigned long long vec_xxpermdi (vector unsigned long long,
-                                 vector unsigned long long, const int);
-vector int vec_xxpermdi (vector int, vector int, const int);
-vector unsigned int vec_xxpermdi (vector unsigned int,
-                                  vector unsigned int, const int);
-vector short vec_xxpermdi (vector short, vector short, const int);
-vector unsigned short vec_xxpermdi (vector unsigned short,
-                                    vector unsigned short, const int);
-vector signed char vec_xxpermdi (vector signed char, vector signed char,
-                                 const int);
-vector unsigned char vec_xxpermdi (vector unsigned char,
-                                   vector unsigned char, const int);
+@table @samp
+@item 4xxmac
+4xx CPU has a Multiply Accumulator.
+@item altivec
+CPU has a SIMD/Vector Unit.
+@item arch_2_05
+CPU supports ISA 2.05 (e.g., POWER6).
+@item arch_2_06
+CPU supports ISA 2.06 (e.g., POWER7).
+@item arch_2_07
+CPU supports ISA 2.07 (e.g., POWER8).
+@item arch_3_00
+CPU supports ISA 3.0 (e.g., POWER9).
+@item arch_3_1
+CPU supports ISA 3.1 (e.g., POWER10).
+@item archpmu
+CPU supports the set of compatible performance monitoring events.
+@item booke
+CPU supports the Embedded ISA category.
+@item cellbe
+CPU has a Cell Broadband Engine.
+@item darn
+CPU supports the @code{darn} (deliver a random number) instruction.
+@item dfp
+CPU has a decimal floating point unit.
+@item dscr
+CPU supports the data stream control register.
+@item ebb
+CPU supports event base branching.
+@item efpdouble
+CPU has a SPE double precision floating point unit.
+@item efpsingle
+CPU has a SPE single precision floating point unit.
+@item fpu
+CPU has a floating point unit.
+@item htm
+CPU has hardware transactional memory instructions.
+@item htm-nosc
+Kernel aborts hardware transactions when a syscall is made.
+@item htm-no-suspend
+CPU supports hardware transactional memory but does not support the
+@code{tsuspend.} instruction.
+@item ic_snoop
+CPU supports icache snooping capabilities.
+@item ieee128
+CPU supports 128-bit IEEE binary floating point instructions.
+@item isel
+CPU supports the integer select instruction.
+@item mma
+CPU supports the matrix-multiply assist instructions.
+@item mmu
+CPU has a memory management unit.
+@item notb
+CPU does not have a timebase (e.g., 601 and 403gx).
+@item pa6t
+CPU supports the PA Semi 6T CORE ISA.
+@item power4
+CPU supports ISA 2.00 (e.g., POWER4).
+@item power5
+CPU supports ISA 2.02 (e.g., POWER5).
+@item power5+
+CPU supports ISA 2.03 (e.g., POWER5+).
+@item power6x
+CPU supports ISA 2.05 (e.g., POWER6) extended opcodes mffgpr and mftgpr.
+@item ppc32
+CPU supports 32-bit mode execution.
+@item ppc601
+CPU supports the old POWER ISA (e.g., 601).
+@item ppc64
+CPU supports 64-bit mode execution.
+@item ppcle
+CPU supports a little-endian mode that uses address swizzling.
+@item scv
+Kernel supports system call vectored.
+@item smt
+CPU supports simultaneous multi-threading.
+@item spe
+CPU has a signal processing extension unit.
+@item tar
+CPU supports the target address register.
+@item true_le
+CPU supports true little-endian mode.
+@item ucache
+CPU has a unified I/D cache.
+@item vcrypto
+CPU supports the vector cryptography instructions.
+@item vsx
+CPU supports the vector-scalar extension.
+@end table
-vector double vec_xxsldi (vector double, vector double, int);
-vector float vec_xxsldi (vector float, vector float, int);
-vector long long vec_xxsldi (vector long long, vector long long, int);
-vector unsigned long long vec_xxsldi (vector unsigned long long,
-                                      vector unsigned long long, int);
-vector int vec_xxsldi (vector int, vector int, int);
-vector unsigned int vec_xxsldi (vector unsigned int, vector unsigned int, int);
-vector short vec_xxsldi (vector short, vector short, int);
-vector unsigned short vec_xxsldi (vector unsigned short,
-                                  vector unsigned short, int);
-vector signed char vec_xxsldi (vector signed char, vector signed char, int);
-vector unsigned char vec_xxsldi (vector unsigned char,
-                                 vector unsigned char, int);
+Here is an example:
+@smallexample
+#ifdef __BUILTIN_CPU_SUPPORTS__
+  if (__builtin_cpu_supports ("fpu"))
+    @{
+      asm("fadd %0,%1,%2" : "=d"(dst) : "d"(src1), "d"(src2));
+    @}
+  else
+#endif
+    @{
+      dst = __fadd (src1, src2); // Software FP addition function.
+    @}
@end smallexample
+@enddefbuiltin
-Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
-generate the AltiVec @samp{LVX} and @samp{STVX} instructions even
-if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
-@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
-@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
-
+The following built-in functions are also available on all PowerPC
+processors:
@smallexample
-vector signed long long vec_signedo (vector float);
-vector signed long long vec_signede (vector float);
-vector unsigned long long vec_unsignedo (vector float);
-vector unsigned long long vec_unsignede (vector float);
+uint64_t __builtin_ppc_get_timebase ();
+unsigned long __builtin_ppc_mftb ();
+double __builtin_unpack_ibm128 (__ibm128, int);
+__ibm128 __builtin_pack_ibm128 (double, double);
+double __builtin_mffs (void);
+void __builtin_mtfsf (const int, double);
+void __builtin_mtfsb0 (const int);
+void __builtin_mtfsb1 (const int);
+double __builtin_set_fpscr_rn (int);
@end smallexample
-The overloaded built-ins @code{vec_signedo} and @code{vec_signede} are
-additional extensions to the built-ins as documented in the PVIPR.
+The @code{__builtin_ppc_get_timebase} and @code{__builtin_ppc_mftb}
+functions generate instructions to read the Time Base Register.  The
+@code{__builtin_ppc_get_timebase} function may generate multiple
+instructions and always returns the 64 bits of the Time Base Register.
+The @code{__builtin_ppc_mftb} function always generates one instruction and
+returns the Time Base Register value as an unsigned long, throwing away
+the most significant word on 32-bit environments.  The @code{__builtin_mffs}
+built-in returns the value of the FPSCR register.  Note that ISA 3.0
+supports @code{__builtin_mffsl}, which permits software to read the control
+and non-sticky status bits in the FPSCR without the higher latency associated
+with accessing the sticky status bits.  The @code{__builtin_mtfsf} built-in
+takes a constant 8-bit integer field mask and a double precision floating
+point argument and generates the @code{mtfsf} (extended mnemonic) instruction
+to write new values to selected fields of the FPSCR.  The
+@code{__builtin_mtfsb0} and @code{__builtin_mtfsb1} built-ins take the bit
+to change as an argument.  The valid bit range is between 0 and 31.  The
+builtins map to the @code{mtfsb0} and @code{mtfsb1} instructions, which take
+the argument and add 32.
Hence these instructions only modify the FPSCR[32:63] bits by
+changing the specified bit to zero or one, respectively.
-@node PowerPC AltiVec Built-in Functions Available on ISA 2.07
-@subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.07
+The @code{__builtin_set_fpscr_rn} built-in allows changing both of the floating
+point rounding mode bits and returning the various FPSCR fields before the RN
+field is updated.  The built-in returns a double consisting of the initial
+value of the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, and RN bit positions
+with all other bits set to zero.  The built-in argument is a 2-bit value for the
+new RN field value.  The argument can either be a @code{const int} or stored
+in a variable.  Earlier versions of @code{__builtin_set_fpscr_rn} returned
+void.  A @code{__SET_FPSCR_RN_RETURNS_FPSCR__} macro has been added.  If
+defined, then the @code{__builtin_set_fpscr_rn} built-in returns the FPSCR
+fields.  If not defined, the @code{__builtin_set_fpscr_rn} does not return a
+value.  If the @option{-msoft-float} option is used, the
+@code{__builtin_set_fpscr_rn} built-in will not return a value.
-If the ISA 2.07 additions to the vector/scalar (power8-vector)
-instruction set are available, the following additional functions are
-available for both 32-bit and 64-bit targets.  For 64-bit targets, you
-can use @var{vector long} instead of @var{vector long long},
-@var{vector bool long} instead of @var{vector bool long long}, and
-@var{vector unsigned long} instead of @var{vector unsigned long long}.
+@node Basic PowerPC Built-in Functions Available on ISA 2.05
+@subsubsection Basic PowerPC Built-in Functions Available on ISA 2.05
-Only functions excluded from the PVIPR are listed here.
+The basic built-in functions described in this section are
+available on the PowerPC family of processors starting with ISA 2.05
+or later.  Unless specific options are explicitly disabled on the
+command line, specifying option @option{-mcpu=power6} has the effect of
+enabling the @option{-mpowerpc64}, @option{-mpowerpc-gpopt},
+@option{-mpowerpc-gfxopt}, @option{-mmfcrf}, @option{-mpopcntb},
+@option{-mfprnd}, @option{-mcmpb}, @option{-mhard-dfp}, and
+@option{-mrecip-precision} options.  Specify the
+@option{-maltivec} option explicitly in
+combination with the above options if desired.
+The following functions require option @option{-mcmpb}.
@smallexample
-vector long long vec_vaddudm (vector long long, vector long long);
-vector long long vec_vaddudm (vector bool long long, vector long long);
-vector long long vec_vaddudm (vector long long, vector bool long long);
-vector unsigned long long vec_vaddudm (vector unsigned long long,
-                                       vector unsigned long long);
-vector unsigned long long vec_vaddudm (vector bool unsigned long long,
-                                       vector unsigned long long);
-vector unsigned long long vec_vaddudm (vector unsigned long long,
-                                       vector bool unsigned long long);
-
-vector long long vec_vclz (vector long long);
-vector unsigned long long vec_vclz (vector unsigned long long);
-vector int vec_vclz (vector int);
-vector unsigned int vec_vclz (vector int);
-vector short vec_vclz (vector short);
-vector unsigned short vec_vclz (vector unsigned short);
-vector signed char vec_vclz (vector signed char);
-vector unsigned char vec_vclz (vector unsigned char);
-
-vector signed char vec_vclzb (vector signed char);
-vector unsigned char vec_vclzb (vector unsigned char);
-
-vector long long vec_vclzd (vector long long);
-vector unsigned long long vec_vclzd (vector unsigned long long);
-
-vector short vec_vclzh (vector short);
-vector unsigned short vec_vclzh (vector unsigned short);
-
-vector int vec_vclzw (vector int);
-vector unsigned int vec_vclzw (vector int);
-
-vector signed char vec_vgbbd (vector signed char);
-vector unsigned char vec_vgbbd (vector unsigned char);
-
-vector long long vec_vmaxsd (vector long long, vector long long);
-
-vector unsigned long long vec_vmaxud (vector unsigned long long,
-                                      unsigned vector long long);
-
-vector long long vec_vminsd (vector long long, vector long long);
-
-vector unsigned long long vec_vminud (vector long long, vector long long);
-
-vector int vec_vpksdss (vector long long, vector long long);
-vector unsigned int vec_vpksdss (vector long long, vector long long);
+unsigned long long __builtin_cmpb (unsigned long long int, unsigned long long int);
+unsigned int __builtin_cmpb (unsigned int, unsigned int);
+@end smallexample
-vector unsigned int vec_vpkudus (vector unsigned long long,
-                                 vector unsigned long long);
+The @code{__builtin_cmpb} function
+performs a byte-wise comparison of its two arguments and returns the
+result.  For each byte comparison, the corresponding byte of the return
+value holds 0xff if the input bytes are equal and 0 if the input bytes
+are not equal.  If either of the arguments to this built-in function
+is wider than 32 bits, the function call expands into the form that
+expects @code{unsigned long long int} arguments,
+which is only available on 64-bit targets.
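+
+As an illustrative sketch of the byte-wise semantics (the values below
+are made up):
+
+@smallexample
+unsigned long long a = 0x1122334455667788ULL;
+unsigned long long b = 0x1122AABB5566CC88ULL;
+/* Bytes 0, 1, 4, 5, and 7 (counting from the most significant byte)
+   are equal, so r is 0xFFFF0000FFFF00FFULL.  */
+unsigned long long r = __builtin_cmpb (a, b);
+@end smallexample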
-vector int vec_vpkudum (vector long long, vector long long);
-vector unsigned int vec_vpkudum (vector unsigned long long,
-                                 vector unsigned long long);
-vector bool int vec_vpkudum (vector bool long long, vector bool long long);
+The following built-in functions are provided
+when hardware decimal floating point
+(@option{-mhard-dfp}) is available:
+@smallexample
+void __builtin_set_fpscr_drn (int);
+_Decimal64 __builtin_ddedpd (int, _Decimal64);
+_Decimal128 __builtin_ddedpdq (int, _Decimal128);
+_Decimal64 __builtin_denbcd (int, _Decimal64);
+_Decimal128 __builtin_denbcdq (int, _Decimal128);
+_Decimal64 __builtin_diex (long long, _Decimal64);
+_Decimal128 __builtin_diexq (long long, _Decimal128);
+_Decimal64 __builtin_dscli (_Decimal64, int);
+_Decimal128 __builtin_dscliq (_Decimal128, int);
+_Decimal64 __builtin_dscri (_Decimal64, int);
+_Decimal128 __builtin_dscriq (_Decimal128, int);
+long long __builtin_dxex (_Decimal64);
+long long __builtin_dxexq (_Decimal128);
+_Decimal128 __builtin_pack_dec128 (unsigned long long, unsigned long long);
+unsigned long long __builtin_unpack_dec128 (_Decimal128, int);
+@end smallexample
-vector long long vec_vpopcnt (vector long long);
-vector unsigned long long vec_vpopcnt (vector unsigned long long);
-vector int vec_vpopcnt (vector int);
-vector unsigned int vec_vpopcnt (vector int);
-vector short vec_vpopcnt (vector short);
-vector unsigned short vec_vpopcnt (vector unsigned short);
-vector signed char vec_vpopcnt (vector signed char);
-vector unsigned char vec_vpopcnt (vector unsigned char);
+The @code{__builtin_set_fpscr_drn} builtin allows changing the three decimal
+floating point rounding mode bits.  The argument is a 3-bit value.  The
+argument can either be a @code{const int} or the value can be stored in
+a variable.
+The builtin uses the ISA 3.0 instruction @code{mffscdrn} if available.
+Otherwise the builtin reads the FPSCR, masks the current decimal rounding
+mode bits out and ORs in the new value.
-vector signed char vec_vpopcntb (vector signed char);
-vector unsigned char vec_vpopcntb (vector unsigned char);
+@smallexample
+_Decimal64 __builtin_dfp_quantize (_Decimal64, _Decimal64, const int);
+_Decimal64 __builtin_dfp_quantize (const int, _Decimal64, const int);
+_Decimal128 __builtin_dfp_quantize (_Decimal128, _Decimal128, const int);
+_Decimal128 __builtin_dfp_quantize (const int, _Decimal128, const int);
+@end smallexample
-vector long long vec_vpopcntd (vector long long);
-vector unsigned long long vec_vpopcntd (vector unsigned long long);
+The @code{__builtin_dfp_quantize} built-in converts and rounds the second
+argument to the form with the exponent specified by the first
+argument, based on the rounding mode specified by the third argument.
+If the first argument is a decimal floating point value, its exponent is used
+for converting and rounding the second argument.  If the first argument is a
+5-bit constant integer value, then the value specifies the exponent to be used
+when rounding and converting the second argument.  The third argument is a
+two-bit constant integer that specifies the rounding mode.  The possible modes
+are: 00 Round to nearest, ties to even; 01 Round toward 0; 10 Round to nearest,
+ties away from 0; 11 Round according to DRN, where DRN is the Decimal Floating
+point field of the FPSCR.
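+
+A hedged sketch of the first form, quantizing a value to two fractional
+digits (the names and values are illustrative; requires
+@option{-mhard-dfp}):
+
+@smallexample
+_Decimal64 price = 12.3456DD;
+/* Use the exponent of 0.01DD (-2) and rounding mode 0 (round to
+   nearest, ties to even); the result is 12.35DD.  */
+_Decimal64 cents = __builtin_dfp_quantize (0.01DD, price, 0);
+@end smallexample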
-vector short vec_vpopcnth (vector short);
-vector unsigned short vec_vpopcnth (vector unsigned short);
+The following functions require the @option{-mhard-float},
+@option{-mpowerpc-gfxopt}, and @option{-mpopcntb} options.
-vector int vec_vpopcntw (vector int);
-vector unsigned int vec_vpopcntw (vector int);
+@smallexample
+double __builtin_recipdiv (double, double);
+float __builtin_recipdivf (float, float);
+double __builtin_rsqrt (double);
+float __builtin_rsqrtf (float);
+@end smallexample
-vector long long vec_vrld (vector long long, vector unsigned long long);
-vector unsigned long long vec_vrld (vector unsigned long long,
-                                    vector unsigned long long);
+The @code{vec_rsqrt}, @code{__builtin_rsqrt}, and
+@code{__builtin_rsqrtf} functions generate multiple instructions to
+implement the reciprocal square root functionality using reciprocal
+square root estimate instructions.
-vector long long vec_vsld (vector long long, vector unsigned long long);
-vector long long vec_vsld (vector unsigned long long,
-                           vector unsigned long long);
+The @code{__builtin_recipdiv} and @code{__builtin_recipdivf}
+functions generate multiple instructions to implement division using
+the reciprocal estimate instructions.
-vector long long vec_vsrad (vector long long, vector unsigned long long);
-vector unsigned long long vec_vsrad (vector unsigned long long,
-                                     vector unsigned long long);
+The following functions require the @option{-mhard-float} and
+@option{-mmultiple} options.
-vector long long vec_vsrd (vector long long, vector unsigned long long);
-vector unsigned long long char vec_vsrd (vector unsigned long long,
-                                         vector unsigned long long);
+The @code{__builtin_unpack_longdouble} function takes a
+@code{long double} argument and a compile-time constant of 0 or 1.  If
+the constant is 0, the first @code{double} within the
+@code{long double} is returned, otherwise the second @code{double}
+is returned.  The @code{__builtin_unpack_longdouble} function is only
+available if @code{long double} uses the IBM extended double
+representation.
-vector long long vec_vsubudm (vector long long, vector long long);
-vector long long vec_vsubudm (vector bool long long, vector long long);
-vector long long vec_vsubudm (vector long long, vector bool long long);
-vector unsigned long long vec_vsubudm (vector unsigned long long,
-                                       vector unsigned long long);
-vector unsigned long long vec_vsubudm (vector bool long long,
-                                       vector unsigned long long);
-vector unsigned long long vec_vsubudm (vector unsigned long long,
-                                       vector bool long long);
+The @code{__builtin_pack_longdouble} function takes two @code{double}
+arguments and returns a @code{long double} value that combines the two
+arguments.  The @code{__builtin_pack_longdouble} function is only
+available if @code{long double} uses the IBM extended double
+representation.
-vector long long vec_vupkhsw (vector int);
-vector unsigned long long vec_vupkhsw (vector unsigned int);
+The @code{__builtin_unpack_ibm128} function takes a @code{__ibm128}
+argument and a compile-time constant of 0 or 1.  If the constant is 0,
+the first @code{double} within the @code{__ibm128} is returned,
+otherwise the second @code{double} is returned.
-vector long long vec_vupklsw (vector int);
-vector unsigned long long vec_vupklsw (vector int);
-@end smallexample
-If the ISA 2.07 additions to the vector/scalar (power8-vector)
-instruction set are available, the following additional functions are
-available for 64-bit targets.
New vector types
-(@var{vector __int128} and @var{vector __uint128}) are available
-to hold the @var{__int128} and @var{__uint128} types to use these
-builtins.
+The @code{__builtin_pack_ibm128} function takes two @code{double}
+arguments and returns a @code{__ibm128} value that combines the two
+arguments.
-The normal vector extract, and set operations work on
-@var{vector __int128} and @var{vector __uint128} types,
-but the index value must be 0.
+Additional built-in functions are available for the 64-bit PowerPC
+family of processors, for efficient use of 128-bit floating point
+(@code{__float128}) values.
-Only functions excluded from the PVIPR are listed here.
+Vector select:
@smallexample
-vector __int128 vec_vaddcuq (vector __int128, vector __int128);
-vector __uint128 vec_vaddcuq (vector __uint128, vector __uint128);
+vector signed __int128 vec_sel (vector signed __int128,
+             vector signed __int128, vector bool __int128);
+vector signed __int128 vec_sel (vector signed __int128,
+             vector signed __int128, vector unsigned __int128);
+vector unsigned __int128 vec_sel (vector unsigned __int128,
+             vector unsigned __int128, vector bool __int128);
+vector unsigned __int128 vec_sel (vector unsigned __int128,
+             vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_sel (vector bool __int128,
+             vector bool __int128, vector bool __int128);
+vector bool __int128 vec_sel (vector bool __int128,
+             vector bool __int128, vector unsigned __int128);
+@end smallexample
-vector __int128 vec_vadduqm (vector __int128, vector __int128);
-vector __uint128 vec_vadduqm (vector __uint128, vector __uint128);
+These instances are extensions of the existing overloaded built-in
+@code{vec_sel} that is documented in the PVIPR.
-vector __int128 vec_vaddecuq (vector __int128, vector __int128,
-                              vector __int128);
-vector __uint128 vec_vaddecuq (vector __uint128, vector __uint128,
-                               vector __uint128);
+@smallexample
+vector signed __int128 vec_perm (vector signed __int128,
+                                 vector signed __int128);
+vector unsigned __int128 vec_perm (vector unsigned __int128,
+                                   vector unsigned __int128);
+@end smallexample
-vector __int128 vec_vaddeuqm (vector __int128, vector __int128,
-                              vector __int128);
-vector __uint128 vec_vaddeuqm (vector __uint128, vector __uint128,
-                               vector __uint128);
+These instances are extensions of the existing overloaded built-in
+@code{vec_perm} that is documented in the PVIPR.
-vector __int128 vec_vsubecuq (vector __int128, vector __int128,
-                              vector __int128);
-vector __uint128 vec_vsubecuq (vector __uint128, vector __uint128,
-                               vector __uint128);
+@node Basic PowerPC Built-in Functions Available on ISA 2.06
+@subsubsection Basic PowerPC Built-in Functions Available on ISA 2.06
-vector __int128 vec_vsubeuqm (vector __int128, vector __int128,
-                              vector __int128);
-vector __uint128 vec_vsubeuqm (vector __uint128, vector __uint128,
-                               vector __uint128);
+The basic built-in functions described in this section are
+available on the PowerPC family of processors starting with ISA 2.06
+or later.  Unless specific options are explicitly disabled on the
+command line, specifying option @option{-mcpu=power7} has the effect of
+enabling all the same options as for @option{-mcpu=power6} in
+addition to the @option{-maltivec}, @option{-mpopcntd}, and
+@option{-mvsx} options.
-vector __int128 vec_vsubcuq (vector __int128, vector __int128); -vector __uint128 vec_vsubcuq (vector __uint128, vector __uint128); +The following basic built-in functions require @option{-mpopcntd}: +@smallexample +unsigned int __builtin_addg6s (unsigned int, unsigned int); +long long __builtin_bpermd (long long, long long); +unsigned int __builtin_cbcdtd (unsigned int); +unsigned int __builtin_cdtbcd (unsigned int); +long long __builtin_divde (long long, long long); +unsigned long long __builtin_divdeu (unsigned long long, unsigned long long); +int __builtin_divwe (int, int); +unsigned int __builtin_divweu (unsigned int, unsigned int); +vector __int128 __builtin_pack_vector_int128 (long long, long long); +void __builtin_rs6000_speculation_barrier (void); +long long __builtin_unpack_vector_int128 (vector __int128, signed char); +@end smallexample -__int128 vec_vsubuqm (__int128, __int128); -__uint128 vec_vsubuqm (__uint128, __uint128); +Of these, the @code{__builtin_divde} and @code{__builtin_divdeu} functions +require a 64-bit environment. -vector __int128 __builtin_bcdadd (vector __int128, vector __int128, const int); -vector unsigned char __builtin_bcdadd (vector unsigned char, vector unsigned char, - const int); -int __builtin_bcdadd_lt (vector __int128, vector __int128, const int); -int __builtin_bcdadd_lt (vector unsigned char, vector unsigned char, const int); -int __builtin_bcdadd_eq (vector __int128, vector __int128, const int); -int __builtin_bcdadd_eq (vector unsigned char, vector unsigned char, const int); -int __builtin_bcdadd_gt (vector __int128, vector __int128, const int); -int __builtin_bcdadd_gt (vector unsigned char, vector unsigned char, const int); -int __builtin_bcdadd_ov (vector __int128, vector __int128, const int); -int __builtin_bcdadd_ov (vector unsigned char, vector unsigned char, const int); +The following basic built-in functions, which are also supported on +x86 targets, require @option{-mfloat128}. 
+@smallexample
+__float128 __builtin_fabsq (__float128);
+__float128 __builtin_copysignq (__float128, __float128);
+__float128 __builtin_infq (void);
+__float128 __builtin_huge_valq (void);
+__float128 __builtin_nanq (void);
+__float128 __builtin_nansq (void);
-vector __int128 __builtin_bcdsub (vector __int128, vector __int128, const int);
-vector unsigned char __builtin_bcdsub (vector unsigned char, vector unsigned char,
-                                       const int);
-int __builtin_bcdsub_le (vector __int128, vector __int128, const int);
-int __builtin_bcdsub_le (vector unsigned char, vector unsigned char, const int);
-int __builtin_bcdsub_lt (vector __int128, vector __int128, const int);
-int __builtin_bcdsub_lt (vector unsigned char, vector unsigned char, const int);
-int __builtin_bcdsub_eq (vector __int128, vector __int128, const int);
-int __builtin_bcdsub_eq (vector unsigned char, vector unsigned char, const int);
-int __builtin_bcdsub_gt (vector __int128, vector __int128, const int);
-int __builtin_bcdsub_gt (vector unsigned char, vector unsigned char, const int);
-int __builtin_bcdsub_ge (vector __int128, vector __int128, const int);
-int __builtin_bcdsub_ge (vector unsigned char, vector unsigned char, const int);
-int __builtin_bcdsub_ov (vector __int128, vector __int128, const int);
-int __builtin_bcdsub_ov (vector unsigned char, vector unsigned char, const int);
+__float128 __builtin_sqrtf128 (__float128);
+__float128 __builtin_fmaf128 (__float128, __float128, __float128);
@end smallexample
-@node PowerPC AltiVec Built-in Functions Available on ISA 3.0
-@subsubsection PowerPC AltiVec Built-in Functions Available on ISA 3.0
-
-The following additional built-in functions are also available for the
-PowerPC family of processors, starting with ISA 3.0
-(@option{-mcpu=power9}) or later.
-
-Only instructions excluded from the PVIPR are listed here.
+@node Basic PowerPC Built-in Functions Available on ISA 2.07
+@subsubsection Basic PowerPC Built-in Functions Available on ISA 2.07
-@smallexample
-unsigned int scalar_extract_exp (double source);
-unsigned long long int scalar_extract_exp (__ieee128 source);
+The basic built-in functions described in this section are
+available on the PowerPC family of processors starting with ISA 2.07
+or later.  Unless specific options are explicitly disabled on the
+command line, specifying option @option{-mcpu=power8} has the effect of
+enabling all the same options as for @option{-mcpu=power7} in
+addition to the @option{-mpower8-fusion}, @option{-mcrypto},
+@option{-mhtm}, @option{-mquad-memory}, and
+@option{-mquad-memory-atomic} options.
-unsigned long long int scalar_extract_sig (double source);
-unsigned __int128 scalar_extract_sig (__ieee128 source);
+This section is intentionally left empty.
-double scalar_insert_exp (unsigned long long int significand, - unsigned long long int exponent); -double scalar_insert_exp (double significand, unsigned long long int exponent); +@node Basic PowerPC Built-in Functions Available on ISA 3.0 +@subsubsection Basic PowerPC Built-in Functions Available on ISA 3.0 -ieee_128 scalar_insert_exp (unsigned __int128 significand, - unsigned long long int exponent); -ieee_128 scalar_insert_exp (ieee_128 significand, unsigned long long int exponent); -vector ieee_128 scalar_insert_exp (vector unsigned __int128 significand, - vector unsigned long long exponent); -vector unsigned long long scalar_extract_exp_to_vec (ieee_128); -vector unsigned __int128 scalar_extract_sig_to_vec (ieee_128); +The basic built-in functions described in this section are +available on the PowerPC family of processors starting with ISA 3.0 +or later. Unless specific options are explicitly disabled on the +command line, specifying option @option{-mcpu=power9} has the effect of +enabling all the same options as for @option{-mcpu=power8} in +addition to the @option{-misel} option. -int scalar_cmp_exp_gt (double arg1, double arg2); -int scalar_cmp_exp_lt (double arg1, double arg2); -int scalar_cmp_exp_eq (double arg1, double arg2); -int scalar_cmp_exp_unordered (double arg1, double arg2); +The following built-in functions are available on Linux 64-bit systems +that use the ISA 3.0 instruction set (@option{-mcpu=power9}): -bool scalar_test_data_class (float source, const int condition); -bool scalar_test_data_class (double source, const int condition); -bool scalar_test_data_class (__ieee128 source, const int condition); +@defbuiltin{__float128 __builtin_addf128_round_to_odd (__float128, __float128)} +Perform a 128-bit IEEE floating point add using round to odd as the +rounding mode. +@enddefbuiltin -bool scalar_test_neg (float source); -bool scalar_test_neg (double source); -bool scalar_test_neg (__ieee128 source); -@end smallexample +@defbuiltin{__float128 __builtin_subf128_round_to_odd (__float128, __float128)} +Perform a 128-bit IEEE floating point subtract using round to odd as +the rounding mode. +@enddefbuiltin -The @code{scalar_extract_exp} with a 64-bit source argument -function requires an environment supporting ISA 3.0 or later. -The @code{scalar_extract_exp} with a 128-bit source argument -and @code{scalar_extract_sig} -functions require a 64-bit environment supporting ISA 3.0 or later. -The @code{scalar_extract_exp} and @code{scalar_extract_sig} built-in -functions return the significand and the biased exponent value -respectively of their @code{source} arguments. -When supplied with a 64-bit @code{source} argument, the -result returned by @code{scalar_extract_sig} has -the @code{0x0010000000000000} bit set if the -function's @code{source} argument is in normalized form. -Otherwise, this bit is set to 0. -When supplied with a 128-bit @code{source} argument, the -@code{0x00010000000000000000000000000000} bit of the result is -treated similarly. -Note that the sign of the significand is not represented in the result -returned from the @code{scalar_extract_sig} function. Use the -@code{scalar_test_neg} function to test the sign of its @code{double} -argument. +@defbuiltin{__float128 __builtin_mulf128_round_to_odd (__float128, __float128)} +Perform a 128-bit IEEE floating point multiply using round to odd as +the rounding mode. +@enddefbuiltin -The @code{scalar_insert_exp} -functions require a 64-bit environment supporting ISA 3.0 or later. 
-When supplied with a 64-bit first argument, the -@code{scalar_insert_exp} built-in function returns a double-precision -floating point value that is constructed by assembling the values of its -@code{significand} and @code{exponent} arguments. The sign of the -result is copied from the most significant bit of the -@code{significand} argument. The significand and exponent components -of the result are composed of the least significant 11 bits of the -@code{exponent} argument and the least significant 52 bits of the -@code{significand} argument respectively. +@defbuiltin{__float128 __builtin_divf128_round_to_odd (__float128, __float128)} +Perform a 128-bit IEEE floating point divide using round to odd as +the rounding mode. +@enddefbuiltin -When supplied with a 128-bit first argument, the -@code{scalar_insert_exp} built-in function returns a quad-precision -IEEE floating point value if the two arguments were scalar. If the two -arguments are vectors, the return value is a vector IEEE floating point value. -The sign bit of the result is copied from the most significant bit of the -@code{significand} argument. The significand and exponent components of the -result are composed of the least significant 15 bits of the @code{exponent} -argument (element 0 on big-endian and element 1 on little-endian) and the -least significant 112 bits of the @code{significand} argument -respectively. Note, the @code{significand} is the scalar argument or in the -case of vector arguments, @code{significand} is element 0 for big-endian and -element 1 for little-endian. +@defbuiltin{__float128 __builtin_sqrtf128_round_to_odd (__float128)} +Perform a 128-bit IEEE floating point square root using round to odd +as the rounding mode. +@enddefbuiltin -The @code{scalar_extract_exp_to_vec}, -and @code{scalar_extract_sig_to_vec} are similar to -@code{scalar_extract_exp}, @code{scalar_extract_sig} except they return -a vector result of type unsigned long long and unsigned __int128 respectively. +@defbuiltin{__float128 __builtin_fmaf128_round_to_odd (__float128, __float128, __float128)} +Perform a 128-bit IEEE floating point fused multiply and add operation +using round to odd as the rounding mode. +@enddefbuiltin -The @code{scalar_cmp_exp_gt}, @code{scalar_cmp_exp_lt}, -@code{scalar_cmp_exp_eq}, and @code{scalar_cmp_exp_unordered} built-in -functions return a non-zero value if @code{arg1} is greater than, less -than, equal to, or not comparable to @code{arg2} respectively. The -arguments are not comparable if one or the other equals NaN (not a -number). +@defbuiltin{double __builtin_truncf128_round_to_odd (__float128)} +Convert a 128-bit IEEE floating point value to @code{double} using +round to odd as the rounding mode. +@enddefbuiltin -The @code{scalar_test_data_class} built-in function returns 1 -if any of the condition tests enabled by the value of the -@code{condition} variable are true, and 0 otherwise. The -@code{condition} argument must be a compile-time constant integer with -value not exceeding 127. The -@code{condition} argument is encoded as a bitmask with each bit -enabling the testing of a different condition, as characterized by the -following: -@smallexample -0x40 Test for NaN -0x20 Test for +Infinity -0x10 Test for -Infinity -0x08 Test for +Zero -0x04 Test for -Zero -0x02 Test for +Denormal -0x01 Test for -Denormal -@end smallexample -The @code{scalar_test_neg} built-in function returns 1 if its -@code{source} argument holds a negative value, 0 otherwise. 
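+
+A hedged note on why the round-to-odd variants exist: rounding a wide
+intermediate result to odd avoids double-rounding error when that
+result is later narrowed, as in this illustrative sketch (the variable
+names are made up):
+
+@smallexample
+__float128 wide = __builtin_fmaf128_round_to_odd (a, b, c);
+double narrow = (double) wide;  /* The only round-to-nearest step.  */
+@end smallexample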
+The following additional built-in functions are also available for the +PowerPC family of processors, starting with ISA 3.0 or later: -The following built-in functions are also available for the PowerPC family -of processors, starting with ISA 3.0 or later -(@option{-mcpu=power9}). These string functions are described -separately in order to group the descriptions closer to the function -prototypes. +@defbuiltin{{long long} __builtin_darn (void)} +@defbuiltinx{{long long} __builtin_darn_raw (void)} +@defbuiltinx{int __builtin_darn_32 (void)} +The @code{__builtin_darn} and @code{__builtin_darn_raw} +functions require a +64-bit environment supporting ISA 3.0 or later. +The @code{__builtin_darn} function provides a 64-bit conditioned +random number. The @code{__builtin_darn_raw} function provides a +64-bit raw random number. The @code{__builtin_darn_32} function +provides a 32-bit conditioned random number. +@enddefbuiltin -Only functions excluded from the PVIPR are listed here. +The following additional built-in functions are also available for the +PowerPC family of processors, starting with ISA 3.0 or later: @smallexample -int vec_all_nez (vector signed char, vector signed char); -int vec_all_nez (vector unsigned char, vector unsigned char); -int vec_all_nez (vector signed short, vector signed short); -int vec_all_nez (vector unsigned short, vector unsigned short); -int vec_all_nez (vector signed int, vector signed int); -int vec_all_nez (vector unsigned int, vector unsigned int); - -int vec_any_eqz (vector signed char, vector signed char); -int vec_any_eqz (vector unsigned char, vector unsigned char); -int vec_any_eqz (vector signed short, vector signed short); -int vec_any_eqz (vector unsigned short, vector unsigned short); -int vec_any_eqz (vector signed int, vector signed int); -int vec_any_eqz (vector unsigned int, vector unsigned int); +int __builtin_byte_in_set (unsigned char u, unsigned long long set); +int __builtin_byte_in_range (unsigned char u, unsigned int range); +int __builtin_byte_in_either_range (unsigned char u, unsigned int ranges); -signed char vec_xlx (unsigned int index, vector signed char data); -unsigned char vec_xlx (unsigned int index, vector unsigned char data); -signed short vec_xlx (unsigned int index, vector signed short data); -unsigned short vec_xlx (unsigned int index, vector unsigned short data); -signed int vec_xlx (unsigned int index, vector signed int data); -unsigned int vec_xlx (unsigned int index, vector unsigned int data); -float vec_xlx (unsigned int index, vector float data); +int __builtin_dfp_dtstsfi_lt (unsigned int comparison, _Decimal64 value); +int __builtin_dfp_dtstsfi_lt (unsigned int comparison, _Decimal128 value); +int __builtin_dfp_dtstsfi_lt_dd (unsigned int comparison, _Decimal64 value); +int __builtin_dfp_dtstsfi_lt_td (unsigned int comparison, _Decimal128 value); -signed char vec_xrx (unsigned int index, vector signed char data); -unsigned char vec_xrx (unsigned int index, vector unsigned char data); -signed short vec_xrx (unsigned int index, vector signed short data); -unsigned short vec_xrx (unsigned int index, vector unsigned short data); -signed int vec_xrx (unsigned int index, vector signed int data); -unsigned int vec_xrx (unsigned int index, vector unsigned int data); -float vec_xrx (unsigned int index, vector float data); -@end smallexample +int __builtin_dfp_dtstsfi_gt (unsigned int comparison, _Decimal64 value); +int __builtin_dfp_dtstsfi_gt (unsigned int comparison, _Decimal128 value); +int __builtin_dfp_dtstsfi_gt_dd 
(unsigned int comparison, _Decimal64 value);
+int __builtin_dfp_dtstsfi_gt_td (unsigned int comparison, _Decimal128 value);
-The @code{vec_all_nez}, @code{vec_any_eqz}, and @code{vec_cmpnez}
-perform pairwise comparisons between the elements at the same
-positions within their two vector arguments.
-The @code{vec_all_nez} function returns a
-non-zero value if and only if all pairwise comparisons are not
-equal and no element of either vector argument contains a zero.
-The @code{vec_any_eqz} function returns a
-non-zero value if and only if at least one pairwise comparison is equal
-or if at least one element of either vector argument contains a zero.
-The @code{vec_cmpnez} function returns a vector of the same type as
-its two arguments, within which each element consists of all ones to
-denote that either the corresponding elements of the incoming arguments are
-not equal or that at least one of the corresponding elements contains
-zero.  Otherwise, the element of the returned vector contains all zeros.
+int __builtin_dfp_dtstsfi_eq (unsigned int comparison, _Decimal64 value);
+int __builtin_dfp_dtstsfi_eq (unsigned int comparison, _Decimal128 value);
+int __builtin_dfp_dtstsfi_eq_dd (unsigned int comparison, _Decimal64 value);
+int __builtin_dfp_dtstsfi_eq_td (unsigned int comparison, _Decimal128 value);
-The @code{vec_xlx} and @code{vec_xrx} functions extract the single
-element selected by the @code{index} argument from the vector
-represented by the @code{data} argument.  The @code{index} argument
-always specifies a byte offset, regardless of the size of the vector
-element.  With @code{vec_xlx}, @code{index} is the offset of the first
-byte of the element to be extracted.  With @code{vec_xrx}, @code{index}
-represents the last byte of the element to be extracted, measured
-from the right end of the vector.  In other words, the last byte of
-the element to be extracted is found at position @code{(15 - index)}.
-There is no requirement that @code{index} be a multiple of the vector
-element size.  However, if the size of the vector element added to
-@code{index} is greater than 15, the content of the returned value is
-undefined.
+int __builtin_dfp_dtstsfi_ov (unsigned int comparison, _Decimal64 value);
+int __builtin_dfp_dtstsfi_ov (unsigned int comparison, _Decimal128 value);
+int __builtin_dfp_dtstsfi_ov_dd (unsigned int comparison, _Decimal64 value);
+int __builtin_dfp_dtstsfi_ov_td (unsigned int comparison, _Decimal128 value);
-The following functions are also available if the ISA 3.0 instruction
-set additions (@option{-mcpu=power9}) are available.
+double __builtin_mffsl (void);
-Only functions excluded from the PVIPR are listed here.
+@end smallexample
+The @code{__builtin_byte_in_set} function requires a
+64-bit environment supporting ISA 3.0 or later.  This function returns
+a non-zero value if and only if its @code{u} argument exactly equals one of
+the eight bytes contained within its 64-bit @code{set} argument.
-@smallexample
-vector long long vec_vctz (vector long long);
-vector unsigned long long vec_vctz (vector unsigned long long);
-vector int vec_vctz (vector int);
-vector unsigned int vec_vctz (vector int);
-vector short vec_vctz (vector short);
-vector unsigned short vec_vctz (vector unsigned short);
-vector signed char vec_vctz (vector signed char);
-vector unsigned char vec_vctz (vector unsigned char);
+The @code{__builtin_byte_in_range} and
+@code{__builtin_byte_in_either_range} functions require an environment
+supporting ISA 3.0 or later.
For these two functions, the
+@code{range} argument is encoded as 4 bytes, organized as
+@code{hi_1:lo_1:hi_2:lo_2}.
+The @code{__builtin_byte_in_range} function returns a
+non-zero value if and only if its @code{u} argument is within the
+range bounded between @code{lo_2} and @code{hi_2} inclusive.
+The @code{__builtin_byte_in_either_range} function returns non-zero if
+and only if its @code{u} argument is within either the range bounded
+between @code{lo_1} and @code{hi_1} inclusive or the range bounded
+between @code{lo_2} and @code{hi_2} inclusive.  A short sketch of this
+encoding appears at the end of this section.
-vector signed char vec_vctzb (vector signed char);
-vector unsigned char vec_vctzb (vector unsigned char);
+The @code{__builtin_dfp_dtstsfi_lt} function returns a non-zero value
+if and only if the number of significant digits of its @code{value} argument
+is less than its @code{comparison} argument.  The
+@code{__builtin_dfp_dtstsfi_lt_dd} and
+@code{__builtin_dfp_dtstsfi_lt_td} functions behave similarly, but
+require that the type of the @code{value} argument be
+@code{_Decimal64} and @code{_Decimal128} respectively.
-vector long long vec_vctzd (vector long long);
-vector unsigned long long vec_vctzd (vector unsigned long long);
+The @code{__builtin_dfp_dtstsfi_gt} function returns a non-zero value
+if and only if the number of significant digits of its @code{value} argument
+is greater than its @code{comparison} argument.  The
+@code{__builtin_dfp_dtstsfi_gt_dd} and
+@code{__builtin_dfp_dtstsfi_gt_td} functions behave similarly, but
+require that the type of the @code{value} argument be
+@code{_Decimal64} and @code{_Decimal128} respectively.
-vector short vec_vctzh (vector short);
-vector unsigned short vec_vctzh (vector unsigned short);
+The @code{__builtin_dfp_dtstsfi_eq} function returns a non-zero value
+if and only if the number of significant digits of its @code{value} argument
+equals its @code{comparison} argument.  The
+@code{__builtin_dfp_dtstsfi_eq_dd} and
+@code{__builtin_dfp_dtstsfi_eq_td} functions behave similarly, but
+require that the type of the @code{value} argument be
+@code{_Decimal64} and @code{_Decimal128} respectively.
-vector int vec_vctzw (vector int);
-vector unsigned int vec_vctzw (vector int);
+The @code{__builtin_dfp_dtstsfi_ov} function returns a non-zero value
+if and only if its @code{value} argument has an undefined number of
+significant digits, such as when @code{value} is an encoding of @code{NaN}.
+The @code{__builtin_dfp_dtstsfi_ov_dd} and
+@code{__builtin_dfp_dtstsfi_ov_td} functions behave similarly, but
+require that the type of the @code{value} argument be
+@code{_Decimal64} and @code{_Decimal128} respectively.
-vector int vec_vprtyb (vector int);
-vector unsigned int vec_vprtyb (vector unsigned int);
-vector long long vec_vprtyb (vector long long);
-vector unsigned long long vec_vprtyb (vector unsigned long long);
+The @code{__builtin_mffsl} built-in uses the ISA 3.0 @code{mffsl}
+instruction to read the FPSCR.  The instruction is a lower-latency version
+of the @code{mffs} instruction.  If the @code{mffsl} instruction is not
+available, then the builtin uses the older @code{mffs} instruction to read
+the FPSCR.
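+
+As noted above, a hedged sketch of the @code{range} encoding (the values
+are illustrative): testing whether a byte is an ASCII letter uses the
+two ranges @samp{A}..@samp{Z} and @samp{a}..@samp{z}:
+
+@smallexample
+/* ranges encoded as hi_1:lo_1:hi_2:lo_2, i.e. 0x5A 0x41 0x7A 0x61.  */
+int is_letter = __builtin_byte_in_either_range (c, 0x5A417A61);
+@end smallexample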
-vector int vec_vprtybw (vector int);
-vector unsigned int vec_vprtybw (vector unsigned int);
+@node Basic PowerPC Built-in Functions Available on ISA 3.1
+@subsubsection Basic PowerPC Built-in Functions Available on ISA 3.1
-vector long long vec_vprtybd (vector long long);
-vector unsigned long long vec_vprtybd (vector unsigned long long);
-@end smallexample
+The basic built-in functions described in this section are
+available on the PowerPC family of processors starting with ISA 3.1.
+Unless specific options are explicitly disabled on the
+command line, specifying option @option{-mcpu=power10} has the effect of
+enabling all the same options as for @option{-mcpu=power9}.
-On 64-bit targets, if the ISA 3.0 additions (@option{-mcpu=power9})
-are available:
+The following built-in functions are available on Linux 64-bit systems
+that use the ISA 3.1 instruction set (@option{-mcpu=power10}):
-@smallexample
-vector long vec_vprtyb (vector long);
-vector unsigned long vec_vprtyb (vector unsigned long);
-vector __int128 vec_vprtyb (vector __int128);
-vector __uint128 vec_vprtyb (vector __uint128);
+@defbuiltin{{unsigned long long} @
+            __builtin_cfuged (unsigned long long, unsigned long long)}
+Perform a 64-bit centrifuge operation, as if implemented by the
+@code{cfuged} instruction.
+@enddefbuiltin
-vector long vec_vprtybd (vector long);
-vector unsigned long vec_vprtybd (vector unsigned long);
+@defbuiltin{{unsigned long long} @
+            __builtin_cntlzdm (unsigned long long, unsigned long long)}
+Perform a 64-bit count leading zeros operation under mask, as if
+implemented by the @code{cntlzdm} instruction.
+@enddefbuiltin
-vector __int128 vec_vprtybq (vector __int128);
-vector __uint128 vec_vprtybd (vector __uint128);
-@end smallexample
+@defbuiltin{{unsigned long long} @
+            __builtin_cnttzdm (unsigned long long, unsigned long long)}
+Perform a 64-bit count trailing zeros operation under mask, as if
+implemented by the @code{cnttzdm} instruction.
+@enddefbuiltin
-The following built-in functions are available for the PowerPC family
-of processors, starting with ISA 3.0 or later (@option{-mcpu=power9}).
+@defbuiltin{{unsigned long long} @
+            __builtin_pdepd (unsigned long long, unsigned long long)}
+Perform a 64-bit parallel bits deposit operation, as if implemented by the
+@code{pdepd} instruction.
+@enddefbuiltin
-Only functions excluded from the PVIPR are listed here.
+@defbuiltin{{unsigned long long} @
+            __builtin_pextd (unsigned long long, unsigned long long)}
+Perform a 64-bit parallel bits extract operation, as if implemented by the
+@code{pextd} instruction.
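+
+As an illustrative sketch (the mask and @code{src} are made-up values):
+the bits of the source selected by one-bits in the mask are gathered
+into the low-order bits of the result.
+
+@smallexample
+/* Gather the low nibble of each byte of src into the low 32 bits.  */
+unsigned long long packed
+  = __builtin_pextd (src, 0x0F0F0F0F0F0F0F0FULL);
+@end smallexample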
+@enddefbuiltin -@smallexample -__vector unsigned char -vec_absdb (__vector unsigned char arg1, __vector unsigned char arg2); -__vector unsigned short -vec_absdh (__vector unsigned short arg1, __vector unsigned short arg2); -__vector unsigned int -vec_absdw (__vector unsigned int arg1, __vector unsigned int arg2); -@end smallexample +@defbuiltin{{vector signed __int128} vec_xl_sext (signed long long, signed char *)} +@defbuiltinx{{vector signed __int128} vec_xl_sext (signed long long, signed short *)} +@defbuiltinx{{vector signed __int128} vec_xl_sext (signed long long, signed int *)} +@defbuiltinx{{vector signed __int128} vec_xl_sext (signed long long, signed long long *)} +@defbuiltinx{{vector unsigned __int128} vec_xl_zext (signed long long, unsigned char *)} +@defbuiltinx{{vector unsigned __int128} vec_xl_zext (signed long long, unsigned short *)} +@defbuiltinx{{vector unsigned __int128} vec_xl_zext (signed long long, unsigned int *)} +@defbuiltinx{{vector unsigned __int128} vec_xl_zext (signed long long, unsigned long long *)} -The @code{vec_absd}, @code{vec_absdb}, @code{vec_absdh}, and -@code{vec_absdw} built-in functions each computes the absolute -differences of the pairs of vector elements supplied in its two vector -arguments, placing the absolute differences into the corresponding -elements of the vector result. +Load (and sign or zero extend) to an @code{__int128} vector, as if implemented by the ISA 3.1 +@code{lxvrbx}, @code{lxvrhx}, @code{lxvrwx}, and @code{lxvrdx} +instructions. +@enddefbuiltin -The following built-in functions are available for the PowerPC family -of processors, starting with ISA 3.0 or later (@option{-mcpu=power9}): -@smallexample -vector unsigned int vec_vrlnm (vector unsigned int, vector unsigned int); -vector unsigned long long vec_vrlnm (vector unsigned long long, - vector unsigned long long); -@end smallexample +@defbuiltin{{void} vec_xst_trunc (vector signed __int128, signed long long, signed char *)} +@defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, signed short *)} +@defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, signed int *)} +@defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, signed long long *)} +@defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, unsigned char *)} +@defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, unsigned short *)} +@defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, unsigned int *)} +@defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, unsigned long long *)} -The result of @code{vec_vrlnm} is obtained by rotating each element -of the first argument vector left and ANDing it with a mask. The -second argument vector contains the mask beginning in bits 11:15, -the mask end in bits 19:23, and the shift count in bits 27:31, -of each element. +Truncate and store the rightmost element of a vector, as if implemented by the +ISA 3.1 @code{stxvrbx}, @code{stxvrhx}, @code{stxvrwx}, and @code{stxvrdx} +instructions. +@enddefbuiltin -If the cryptographic instructions are enabled (@option{-mcrypto} or -@option{-mcpu=power8}), the following builtins are enabled. +@node PowerPC AltiVec/VSX Built-in Functions +@subsection PowerPC AltiVec/VSX Built-in Functions -Only functions excluded from the PVIPR are listed here.
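The deposit/extract pair above may be easier to follow with a round-trip
sketch. This example is illustrative only; it assumes the first operand is
the source and the second is the bit mask, and it requires a target compiled
with @option{-mcpu=power10}.

@smallexample
unsigned long long
mask_round_trip (unsigned long long x, unsigned long long mask)
@{
  /* Gather the bits of X selected by MASK into the low-order bits...  */
  unsigned long long packed = __builtin_pextd (x, mask);
  /* ...then scatter them back out under the same mask.  Bits outside
     MASK become zero, so the result equals X & MASK.  */
  return __builtin_pdepd (packed, mask);
@}
@end smallexample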
+GCC provides an interface for the PowerPC family of processors to access +the AltiVec operations described in Motorola's AltiVec Programming +Interface Manual. The interface is made available by including +@code{<altivec.h>} and using @option{-maltivec} and +@option{-mabi=altivec}. The interface supports the following vector +types. @smallexample -vector unsigned long long __builtin_crypto_vsbox (vector unsigned long long); +vector unsigned char +vector signed char +vector bool char -vector unsigned long long __builtin_crypto_vcipher (vector unsigned long long, - vector unsigned long long); +vector unsigned short +vector signed short +vector bool short +vector pixel -vector unsigned long long __builtin_crypto_vcipherlast - (vector unsigned long long, - vector unsigned long long); +vector unsigned int +vector signed int +vector bool int +vector float +@end smallexample -vector unsigned long long __builtin_crypto_vncipher (vector unsigned long long, - vector unsigned long long); +GCC's implementation of the high-level language interface available from +C and C++ code differs from Motorola's documentation in several ways. -vector unsigned long long __builtin_crypto_vncipherlast (vector unsigned long long, - vector unsigned long long); +@itemize @bullet -vector unsigned char __builtin_crypto_vpermxor (vector unsigned char, - vector unsigned char, - vector unsigned char); +@item +A vector constant is a list of constant expressions within curly braces. -vector unsigned short __builtin_crypto_vpermxor (vector unsigned short, - vector unsigned short, - vector unsigned short); +@item +A vector initializer requires no cast if the vector constant is of the +same type as the variable it is initializing. -vector unsigned int __builtin_crypto_vpermxor (vector unsigned int, - vector unsigned int, - vector unsigned int); +@item +If @code{signed} or @code{unsigned} is omitted, the signedness of the +vector type is the default signedness of the base type. The default +varies depending on the operating system, so a portable program should +always specify the signedness. -vector unsigned long long __builtin_crypto_vpermxor (vector unsigned long long, - vector unsigned long long, - vector unsigned long long); +@item +Compiling with @option{-maltivec} adds keywords @code{__vector}, +@code{vector}, @code{__pixel}, @code{pixel}, @code{__bool} and +@code{bool}. When compiling ISO C, the context-sensitive substitution +of the keywords @code{vector}, @code{pixel} and @code{bool} is +disabled. To use them, you must include @code{<altivec.h>} instead.
-vector unsigned char __builtin_crypto_vpmsumb (vector unsigned char, - vector unsigned char); +@item +GCC allows using a @code{typedef} name as the type specifier for a +vector type, but only under the following circumstances: -vector unsigned short __builtin_crypto_vpmsumh (vector unsigned short, - vector unsigned short); +@itemize @bullet -vector unsigned int __builtin_crypto_vpmsumw (vector unsigned int, - vector unsigned int); +@item +When using @code{__vector} instead of @code{vector}; for example, -vector unsigned long long __builtin_crypto_vpmsumd (vector unsigned long long, - vector unsigned long long); +@smallexample +typedef signed short int16; +__vector int16 data; +@end smallexample -vector unsigned long long __builtin_crypto_vshasigmad (vector unsigned long long, - int, int); +@item +When using @code{vector} in keyword-and-predefine mode; for example, -vector unsigned int __builtin_crypto_vshasigmaw (vector unsigned int, int, int); +@smallexample +typedef signed short int16; +vector int16 data; @end smallexample -The second argument to @var{__builtin_crypto_vshasigmad} and -@var{__builtin_crypto_vshasigmaw} must be a constant -integer that is 0 or 1. The third argument to these built-in functions -must be a constant integer in the range of 0 to 15. +Note that keyword-and-predefine mode is enabled by disabling GNU +extensions (e.g., by using @code{-std=c11}) and including +@code{<altivec.h>}. +@end itemize -The following sign extension builtins are provided: +@item +For C, overloaded functions are implemented with macros, so the following +does not work: @smallexample -vector signed int vec_signexti (vector signed char a); -vector signed long long vec_signextll (vector signed char a); -vector signed int vec_signexti (vector signed short a); -vector signed long long vec_signextll (vector signed short a); -vector signed long long vec_signextll (vector signed int a); -vector signed long long vec_signextq (vector signed long long a); + vec_add ((vector signed int)@{1, 2, 3, 4@}, foo); @end smallexample -Each element of the result is produced by sign-extending the element of the -input vector that would fall in the least significant portion of the result -element. For example, a sign-extension of a vector signed char to a vector -signed long long will sign extend the rightmost byte of each doubleword. +@noindent +Since @code{vec_add} is a macro, the vector constant in the example +is treated as four separate arguments. Wrap the entire argument in +parentheses for this to work. +@end itemize -@node PowerPC AltiVec Built-in Functions Available on ISA 3.1 -@subsubsection PowerPC AltiVec Built-in Functions Available on ISA 3.1 +@emph{Note:} Only the @code{<altivec.h>} interface is supported. +Internally, GCC uses built-in functions to achieve the functionality in +the aforementioned header file, but they are not supported and are +subject to change without notice. -The following additional built-in functions are also available for the -PowerPC family of processors, starting with ISA 3.1 (@option{-mcpu=power10}): +GCC complies with the Power Vector Intrinsic Programming Reference (PVIPR), +which may be found at +@uref{https://openpowerfoundation.org/?resource_lib=power-vector-intrinsic-programming-reference}. +Chapter 4 of this document fully documents the vector API interfaces +that must be +provided by compliant compilers. Programmers should preferentially use +the interfaces described therein. However, historically GCC has provided +additional interfaces for access to vector instructions.
These are +briefly described below. Where the PVIPR provides a portable interface, +other functions in GCC that provide the same capabilities should be +considered deprecated. -@smallexample -@exdent int vec_test_lsbb_all_ones (vector signed char); -@exdent int vec_test_lsbb_all_ones (vector unsigned char); -@exdent int vec_test_lsbb_all_ones (vector bool char); -@end smallexample -@findex vec_test_lsbb_all_ones +The PVIPR documents the following overloaded functions: -The builtin @code{vec_test_lsbb_all_ones} returns 1 if the least significant -bit in each byte is equal to 1. It returns 0 otherwise. +@multitable @columnfractions 0.33 0.33 0.33 -@smallexample -@exdent int vec_test_lsbb_all_zeros (vector signed char); -@exdent int vec_test_lsbb_all_zeros (vector unsigned char); -@exdent int vec_test_lsbb_all_zeros (vector bool char); -@end smallexample -@findex vec_test_lsbb_all_zeros +@item @code{vec_abs} +@tab @code{vec_absd} +@tab @code{vec_abss} +@item @code{vec_add} +@tab @code{vec_addc} +@tab @code{vec_adde} +@item @code{vec_addec} +@tab @code{vec_adds} +@tab @code{vec_all_eq} +@item @code{vec_all_ge} +@tab @code{vec_all_gt} +@tab @code{vec_all_in} +@item @code{vec_all_le} +@tab @code{vec_all_lt} +@tab @code{vec_all_nan} +@item @code{vec_all_ne} +@tab @code{vec_all_nge} +@tab @code{vec_all_ngt} +@item @code{vec_all_nle} +@tab @code{vec_all_nlt} +@tab @code{vec_all_numeric} +@item @code{vec_and} +@tab @code{vec_andc} +@tab @code{vec_any_eq} +@item @code{vec_any_ge} +@tab @code{vec_any_gt} +@tab @code{vec_any_le} +@item @code{vec_any_lt} +@tab @code{vec_any_nan} +@tab @code{vec_any_ne} +@item @code{vec_any_nge} +@tab @code{vec_any_ngt} +@tab @code{vec_any_nle} +@item @code{vec_any_nlt} +@tab @code{vec_any_numeric} +@tab @code{vec_any_out} +@item @code{vec_avg} +@tab @code{vec_bperm} +@tab @code{vec_ceil} +@item @code{vec_cipher_be} +@tab @code{vec_cipherlast_be} +@tab @code{vec_cmpb} +@item @code{vec_cmpeq} +@tab @code{vec_cmpge} +@tab @code{vec_cmpgt} +@item @code{vec_cmple} +@tab @code{vec_cmplt} +@tab @code{vec_cmpne} +@item @code{vec_cmpnez} +@tab @code{vec_cntlz} +@tab @code{vec_cntlz_lsbb} +@item @code{vec_cnttz} +@tab @code{vec_cnttz_lsbb} +@tab @code{vec_cpsgn} +@item @code{vec_ctf} +@tab @code{vec_cts} +@tab @code{vec_ctu} +@item @code{vec_div} +@tab @code{vec_double} +@tab @code{vec_doublee} +@item @code{vec_doubleh} +@tab @code{vec_doublel} +@tab @code{vec_doubleo} +@item @code{vec_eqv} +@tab @code{vec_expte} +@tab @code{vec_extract} +@item @code{vec_extract_exp} +@tab @code{vec_extract_fp32_from_shorth} +@tab @code{vec_extract_fp32_from_shortl} +@item @code{vec_extract_sig} +@tab @code{vec_extract_4b} +@tab @code{vec_first_match_index} +@item @code{vec_first_match_or_eos_index} +@tab @code{vec_first_mismatch_index} +@tab @code{vec_first_mismatch_or_eos_index} +@item @code{vec_float} +@tab @code{vec_float2} +@tab @code{vec_floate} +@item @code{vec_floato} +@tab @code{vec_floor} +@tab @code{vec_gb} +@item @code{vec_insert} +@tab @code{vec_insert_exp} +@tab @code{vec_insert4b} +@item @code{vec_ld} +@tab @code{vec_lde} +@tab @code{vec_ldl} +@item @code{vec_loge} +@tab @code{vec_madd} +@tab @code{vec_madds} +@item @code{vec_max} +@tab @code{vec_mergee} +@tab @code{vec_mergeh} +@item @code{vec_mergel} +@tab @code{vec_mergeo} +@tab @code{vec_mfvscr} +@item @code{vec_min} +@tab @code{vec_mradds} +@tab @code{vec_msub} +@item @code{vec_msum} +@tab @code{vec_msums} +@tab @code{vec_mtvscr} +@item @code{vec_mul} +@tab @code{vec_mule} +@tab @code{vec_mulo} +@item @code{vec_nabs} +@tab 
@code{vec_nand} +@tab @code{vec_ncipher_be} +@item @code{vec_ncipherlast_be} +@tab @code{vec_nearbyint} +@tab @code{vec_neg} +@item @code{vec_nmadd} +@tab @code{vec_nmsub} +@tab @code{vec_nor} +@item @code{vec_or} +@tab @code{vec_orc} +@tab @code{vec_pack} +@item @code{vec_pack_to_short_fp32} +@tab @code{vec_packpx} +@tab @code{vec_packs} +@item @code{vec_packsu} +@tab @code{vec_parity_lsbb} +@tab @code{vec_perm} +@item @code{vec_permxor} +@tab @code{vec_pmsum_be} +@tab @code{vec_popcnt} +@item @code{vec_re} +@tab @code{vec_recipdiv} +@tab @code{vec_revb} +@item @code{vec_reve} +@tab @code{vec_rint} +@tab @code{vec_rl} +@item @code{vec_rlmi} +@tab @code{vec_rlnm} +@tab @code{vec_round} +@item @code{vec_rsqrt} +@tab @code{vec_rsqrte} +@tab @code{vec_sbox_be} +@item @code{vec_sel} +@tab @code{vec_shasigma_be} +@tab @code{vec_signed} +@item @code{vec_signed2} +@tab @code{vec_signede} +@tab @code{vec_signedo} +@item @code{vec_sl} +@tab @code{vec_sld} +@tab @code{vec_sldw} +@item @code{vec_sll} +@tab @code{vec_slo} +@tab @code{vec_slv} +@item @code{vec_splat} +@tab @code{vec_splat_s8} +@tab @code{vec_splat_s16} +@item @code{vec_splat_s32} +@tab @code{vec_splat_u8} +@tab @code{vec_splat_u16} +@item @code{vec_splat_u32} +@tab @code{vec_splats} +@tab @code{vec_sqrt} +@item @code{vec_sr} +@tab @code{vec_sra} +@tab @code{vec_srl} +@item @code{vec_sro} +@tab @code{vec_srv} +@tab @code{vec_st} +@item @code{vec_ste} +@tab @code{vec_stl} +@tab @code{vec_sub} +@item @code{vec_subc} +@tab @code{vec_sube} +@tab @code{vec_subec} +@item @code{vec_subs} +@tab @code{vec_sum2s} +@tab @code{vec_sum4s} +@item @code{vec_sums} +@tab @code{vec_test_data_class} +@tab @code{vec_trunc} +@item @code{vec_unpackh} +@tab @code{vec_unpackl} +@tab @code{vec_unsigned} +@item @code{vec_unsigned2} +@tab @code{vec_unsignede} +@tab @code{vec_unsignedo} +@item @code{vec_xl} +@tab @code{vec_xl_be} +@tab @code{vec_xl_len} +@item @code{vec_xl_len_r} +@tab @code{vec_xor} +@tab @code{vec_xst} +@item @code{vec_xst_be} +@tab @code{vec_xst_len} +@tab @code{vec_xst_len_r} -The builtin @code{vec_test_lsbb_all_zeros} returns 1 if the least significant -bit in each byte is equal to zero. It returns 0 otherwise. +@end multitable -@smallexample -@exdent vector unsigned long long int -@exdent vec_cfuge (vector unsigned long long int, vector unsigned long long int); -@end smallexample -Perform a vector centrifuge operation, as if implemented by the -@code{vcfuged} instruction. -@findex vec_cfuge +@menu +* PowerPC AltiVec Built-in Functions on ISA 2.05:: +* PowerPC AltiVec Built-in Functions Available on ISA 2.06:: +* PowerPC AltiVec Built-in Functions Available on ISA 2.07:: +* PowerPC AltiVec Built-in Functions Available on ISA 3.0:: +* PowerPC AltiVec Built-in Functions Available on ISA 3.1:: +@end menu -@smallexample -@exdent vector unsigned long long int -@exdent vec_cntlzm (vector unsigned long long int, vector unsigned long long int); -@end smallexample -Perform a vector count leading zeros under bit mask operation, as if -implemented by the @code{vclzdm} instruction. -@findex vec_cntlzm +@node PowerPC AltiVec Built-in Functions on ISA 2.05 +@subsubsection PowerPC AltiVec Built-in Functions on ISA 2.05 -@smallexample -@exdent vector unsigned long long int -@exdent vec_cnttzm (vector unsigned long long int, vector unsigned long long int); -@end smallexample -Perform a vector count trailing zeros under bit mask operation, as if -implemented by the @code{vctzdm} instruction. 
-@findex vec_cnttzm +The following interfaces are supported for the generic and specific +AltiVec operations and the AltiVec predicates. In cases where there +is a direct mapping between generic and specific operations, only the +generic names are shown here, although the specific operations can also +be used. -@smallexample -@exdent vector signed char -@exdent vec_clrl (vector signed char @var{a}, unsigned int @var{n}); -@exdent vector unsigned char -@exdent vec_clrl (vector unsigned char @var{a}, unsigned int @var{n}); -@end smallexample -Clear the left-most @code{(16 - n)} bytes of vector argument @code{a}, as if -implemented by the @code{vclrlb} instruction on a big-endian target -and by the @code{vclrrb} instruction on a little-endian target. A -value of @code{n} that is greater than 16 is treated as if it equaled 16. -@findex vec_clrl +Arguments that are documented as @code{const int} require literal +integral values within the range required for that operation. -@smallexample -@exdent vector signed char -@exdent vec_clrr (vector signed char @var{a}, unsigned int @var{n}); -@exdent vector unsigned char -@exdent vec_clrr (vector unsigned char @var{a}, unsigned int @var{n}); -@end smallexample -Clear the right-most @code{(16 - n)} bytes of vector argument @code{a}, as if -implemented by the @code{vclrrb} instruction on a big-endian target -and by the @code{vclrlb} instruction on a little-endian target. A -value of @code{n} that is greater than 16 is treated as if it equaled 16. -@findex vec_clrr +Only functions excluded from the PVIPR are listed here. @smallexample -@exdent vector unsigned long long int -@exdent vec_gnb (vector unsigned __int128, const unsigned char); -@end smallexample -Perform a 128-bit vector gather operation, as if implemented by the -@code{vgnb} instruction. The second argument must be a literal -integer value between 2 and 7 inclusive. 
-@findex vec_gnb +void vec_dss (const int); +void vec_dssall (void); -Vector Extract +void vec_dst (const vector unsigned char *, int, const int); +void vec_dst (const vector signed char *, int, const int); +void vec_dst (const vector bool char *, int, const int); +void vec_dst (const vector unsigned short *, int, const int); +void vec_dst (const vector signed short *, int, const int); +void vec_dst (const vector bool short *, int, const int); +void vec_dst (const vector pixel *, int, const int); +void vec_dst (const vector unsigned int *, int, const int); +void vec_dst (const vector signed int *, int, const int); +void vec_dst (const vector bool int *, int, const int); +void vec_dst (const vector float *, int, const int); +void vec_dst (const unsigned char *, int, const int); +void vec_dst (const signed char *, int, const int); +void vec_dst (const unsigned short *, int, const int); +void vec_dst (const short *, int, const int); +void vec_dst (const unsigned int *, int, const int); +void vec_dst (const int *, int, const int); +void vec_dst (const float *, int, const int); -@smallexample -@exdent vector unsigned long long int -@exdent vec_extractl (vector unsigned char, vector unsigned char, unsigned int); -@exdent vector unsigned long long int -@exdent vec_extractl (vector unsigned short, vector unsigned short, unsigned int); -@exdent vector unsigned long long int -@exdent vec_extractl (vector unsigned int, vector unsigned int, unsigned int); -@exdent vector unsigned long long int -@exdent vec_extractl (vector unsigned long long, vector unsigned long long, unsigned int); -@end smallexample -Extract an element from two concatenated vectors starting at the given byte index -in natural-endian order, and place it zero-extended in doubleword 1 of the result -according to natural element order. If the byte index is out of range for the -data type, the intrinsic will be rejected. -For little-endian, this output will match the placement by the hardware -instruction, i.e., dword[0] in RTL notation. For big-endian, an additional -instruction is needed to move it from the "left" doubleword to the "right" one. -For little-endian, semantics matching the @code{vextdubvrx}, -@code{vextduhvrx}, @code{vextduwvrx} instruction will be generated, while for -big-endian, semantics matching the @code{vextdubvlx}, @code{vextduhvlx}, -@code{vextduwvlx} instructions -will be generated. Note that some fairly anomalous results can be generated if -the byte index is not aligned on an element boundary for the element being -extracted. This is a limitation of the bi-endian vector programming model is -consistent with the limitation on @code{vec_perm}. 
-@findex vec_extractl +void vec_dstst (const vector unsigned char *, int, const int); +void vec_dstst (const vector signed char *, int, const int); +void vec_dstst (const vector bool char *, int, const int); +void vec_dstst (const vector unsigned short *, int, const int); +void vec_dstst (const vector signed short *, int, const int); +void vec_dstst (const vector bool short *, int, const int); +void vec_dstst (const vector pixel *, int, const int); +void vec_dstst (const vector unsigned int *, int, const int); +void vec_dstst (const vector signed int *, int, const int); +void vec_dstst (const vector bool int *, int, const int); +void vec_dstst (const vector float *, int, const int); +void vec_dstst (const unsigned char *, int, const int); +void vec_dstst (const signed char *, int, const int); +void vec_dstst (const unsigned short *, int, const int); +void vec_dstst (const short *, int, const int); +void vec_dstst (const unsigned int *, int, const int); +void vec_dstst (const int *, int, const int); +void vec_dstst (const unsigned long *, int, const int); +void vec_dstst (const long *, int, const int); +void vec_dstst (const float *, int, const int); -@smallexample -@exdent vector unsigned long long int -@exdent vec_extracth (vector unsigned char, vector unsigned char, unsigned int); -@exdent vector unsigned long long int -@exdent vec_extracth (vector unsigned short, vector unsigned short, -unsigned int); -@exdent vector unsigned long long int -@exdent vec_extracth (vector unsigned int, vector unsigned int, unsigned int); -@exdent vector unsigned long long int -@exdent vec_extracth (vector unsigned long long, vector unsigned long long, -unsigned int); -@end smallexample -Extract an element from two concatenated vectors starting at the given byte -index. The index is based on big endian order for a little endian system. -Similarly, the index is based on little endian order for a big endian system. -The extraced elements are zero-extended and put in doubleword 1 -according to natural element order. If the byte index is out of range for the -data type, the intrinsic will be rejected. For little-endian, this output -will match the placement by the hardware instruction (vextdubvrx, vextduhvrx, -vextduwvrx, vextddvrx) i.e., dword[0] in RTL -notation. For big-endian, an additional instruction is needed to move it -from the "left" doubleword to the "right" one. For little-endian, semantics -matching the @code{vextdubvlx}, @code{vextduhvlx}, @code{vextduwvlx} -instructions will be generated, while for big-endian, semantics matching the -@code{vextdubvrx}, @code{vextduhvrx}, @code{vextduwvrx} instructions will -be generated. Note that some fairly anomalous -results can be generated if the byte index is not aligned on the -element boundary for the element being extracted. This is a -limitation of the bi-endian vector programming model consistent with the -limitation on @code{vec_perm}. -@findex vec_extracth -@smallexample -@exdent vector unsigned long long int -@exdent vec_pdep (vector unsigned long long int, vector unsigned long long int); -@end smallexample -Perform a vector parallel bits deposit operation, as if implemented by -the @code{vpdepd} instruction. 
-@findex vec_pdep +void vec_dststt (const vector unsigned char *, int, const int); +void vec_dststt (const vector signed char *, int, const int); +void vec_dststt (const vector bool char *, int, const int); +void vec_dststt (const vector unsigned short *, int, const int); +void vec_dststt (const vector signed short *, int, const int); +void vec_dststt (const vector bool short *, int, const int); +void vec_dststt (const vector pixel *, int, const int); +void vec_dststt (const vector unsigned int *, int, const int); +void vec_dststt (const vector signed int *, int, const int); +void vec_dststt (const vector bool int *, int, const int); +void vec_dststt (const vector float *, int, const int); +void vec_dststt (const unsigned char *, int, const int); +void vec_dststt (const signed char *, int, const int); +void vec_dststt (const unsigned short *, int, const int); +void vec_dststt (const short *, int, const int); +void vec_dststt (const unsigned int *, int, const int); +void vec_dststt (const int *, int, const int); +void vec_dststt (const float *, int, const int); -Vector Insert +void vec_dstt (const vector unsigned char *, int, const int); +void vec_dstt (const vector signed char *, int, const int); +void vec_dstt (const vector bool char *, int, const int); +void vec_dstt (const vector unsigned short *, int, const int); +void vec_dstt (const vector signed short *, int, const int); +void vec_dstt (const vector bool short *, int, const int); +void vec_dstt (const vector pixel *, int, const int); +void vec_dstt (const vector unsigned int *, int, const int); +void vec_dstt (const vector signed int *, int, const int); +void vec_dstt (const vector bool int *, int, const int); +void vec_dstt (const vector float *, int, const int); +void vec_dstt (const unsigned char *, int, const int); +void vec_dstt (const signed char *, int, const int); +void vec_dstt (const unsigned short *, int, const int); +void vec_dstt (const short *, int, const int); +void vec_dstt (const unsigned int *, int, const int); +void vec_dstt (const int *, int, const int); +void vec_dstt (const float *, int, const int); -@smallexample -@exdent vector unsigned char -@exdent vec_insertl (unsigned char, vector unsigned char, unsigned int); -@exdent vector unsigned short -@exdent vec_insertl (unsigned short, vector unsigned short, unsigned int); -@exdent vector unsigned int -@exdent vec_insertl (unsigned int, vector unsigned int, unsigned int); -@exdent vector unsigned long long -@exdent vec_insertl (unsigned long long, vector unsigned long long, -unsigned int); -@exdent vector unsigned char -@exdent vec_insertl (vector unsigned char, vector unsigned char, unsigned int; -@exdent vector unsigned short -@exdent vec_insertl (vector unsigned short, vector unsigned short, -unsigned int); -@exdent vector unsigned int -@exdent vec_insertl (vector unsigned int, vector unsigned int, unsigned int); -@end smallexample +vector signed char vec_lvebx (int, char *); +vector unsigned char vec_lvebx (int, unsigned char *); -Let src be the first argument, when the first argument is a scalar, or the -rightmost element of the left doubleword of the first argument, when the first -argument is a vector. Insert the source into the destination at the position -given by the third argument, using natural element order in the second -argument. The rest of the second argument is unchanged. If the byte -index is greater than 14 for halfwords, greater than 12 for words, or -greater than 8 for doublewords the result is undefined. 
For little-endian, -the generated code will be semantically equivalent to @code{vins[bhwd]rx} -instructions. Similarly for big-endian it will be semantically equivalent -to @code{vins[bhwd]lx}. Note that some fairly anomalous results can be -generated if the byte index is not aligned on an element boundary for the -type of element being inserted. -@findex vec_insertl +vector signed short vec_lvehx (int, short *); +vector unsigned short vec_lvehx (int, unsigned short *); -@smallexample -@exdent vector unsigned char -@exdent vec_inserth (unsigned char, vector unsigned char, unsigned int); -@exdent vector unsigned short -@exdent vec_inserth (unsigned short, vector unsigned short, unsigned int); -@exdent vector unsigned int -@exdent vec_inserth (unsigned int, vector unsigned int, unsigned int); -@exdent vector unsigned long long -@exdent vec_inserth (unsigned long long, vector unsigned long long, -unsigned int); -@exdent vector unsigned char -@exdent vec_inserth (vector unsigned char, vector unsigned char, unsigned int); -@exdent vector unsigned short -@exdent vec_inserth (vector unsigned short, vector unsigned short, -unsigned int); -@exdent vector unsigned int -@exdent vec_inserth (vector unsigned int, vector unsigned int, unsigned int); -@end smallexample +vector float vec_lvewx (int, float *); +vector signed int vec_lvewx (int, int *); +vector unsigned int vec_lvewx (int, unsigned int *); -Let src be the first argument, when the first argument is a scalar, or the -rightmost element of the first argument, when the first argument is a vector. -Insert src into the second argument at the position identified by the third -argument, using opposite element order in the second argument, and leaving the -rest of the second argument unchanged. If the byte index is greater than 14 -for halfwords, 12 for words, or 8 for doublewords, the intrinsic will be -rejected. Note that the underlying hardware instruction uses the same register -for the second argument and the result. -For little-endian, the code generation will be semantically equivalent to -@code{vins[bhwd]lx}, while for big-endian it will be semantically equivalent to -@code{vins[bhwd]rx}. -Note that some fairly anomalous results can be generated if the byte index is -not aligned on an element boundary for the sort of element being inserted. -@findex vec_inserth +vector unsigned char vec_lvsl (int, const unsigned char *); +vector unsigned char vec_lvsl (int, const signed char *); +vector unsigned char vec_lvsl (int, const unsigned short *); +vector unsigned char vec_lvsl (int, const short *); +vector unsigned char vec_lvsl (int, const unsigned int *); +vector unsigned char vec_lvsl (int, const int *); +vector unsigned char vec_lvsl (int, const float *); -Vector Replace Element -@smallexample -@exdent vector signed int vec_replace_elt (vector signed int, signed int, -const int); -@exdent vector unsigned int vec_replace_elt (vector unsigned int, -unsigned int, const int); -@exdent vector float vec_replace_elt (vector float, float, const int); -@exdent vector signed long long vec_replace_elt (vector signed long long, -signed long long, const int); -@exdent vector unsigned long long vec_replace_elt (vector unsigned long long, -unsigned long long, const int); -@exdent vector double rec_replace_elt (vector double, double, const int); -@end smallexample -The third argument (constrained to [0,3]) identifies the natural-endian -element number of the first argument that will be replaced by the second -argument to produce the result. 
The other elements of the first argument will -remain unchanged in the result. +vector unsigned char vec_lvsr (int, const unsigned char *); +vector unsigned char vec_lvsr (int, const signed char *); +vector unsigned char vec_lvsr (int, const unsigned short *); +vector unsigned char vec_lvsr (int, const short *); +vector unsigned char vec_lvsr (int, const unsigned int *); +vector unsigned char vec_lvsr (int, const int *); +vector unsigned char vec_lvsr (int, const float *); -If it's desirable to insert a word at an unaligned position, use -vec_replace_unaligned instead. +void vec_stvebx (vector signed char, int, signed char *); +void vec_stvebx (vector unsigned char, int, unsigned char *); +void vec_stvebx (vector bool char, int, signed char *); +void vec_stvebx (vector bool char, int, unsigned char *); -@findex vec_replace_element +void vec_stvehx (vector signed short, int, short *); +void vec_stvehx (vector unsigned short, int, unsigned short *); +void vec_stvehx (vector bool short, int, short *); +void vec_stvehx (vector bool short, int, unsigned short *); -Vector Replace Unaligned -@smallexample -@exdent vector unsigned char vec_replace_unaligned (vector unsigned char, -signed int, const int); -@exdent vector unsigned char vec_replace_unaligned (vector unsigned char, -unsigned int, const int); -@exdent vector unsigned char vec_replace_unaligned (vector unsigned char, -float, const int); -@exdent vector unsigned char vec_replace_unaligned (vector unsigned char, -signed long long, const int); -@exdent vector unsigned char vec_replace_unaligned (vector unsigned char, -unsigned long long, const int); -@exdent vector unsigned char vec_replace_unaligned (vector unsigned char, -double, const int); -@end smallexample +void vec_stvewx (vector float, int, float *); +void vec_stvewx (vector signed int, int, int *); +void vec_stvewx (vector unsigned int, int, unsigned int *); +void vec_stvewx (vector bool int, int, int *); +void vec_stvewx (vector bool int, int, unsigned int *); -The second argument replaces a portion of the first argument to produce the -result, with the rest of the first argument unchanged in the result. The -third argument identifies the byte index (using left-to-right, or big-endian -order) where the high-order byte of the second argument will be placed, with -the remaining bytes of the second argument placed naturally "to the right" -of the high-order byte. +vector float vec_vaddfp (vector float, vector float); -The programmer is responsible for understanding the endianness issues involved -with the first argument and the result. 
-@findex vec_replace_unaligned +vector signed char vec_vaddsbs (vector bool char, vector signed char); +vector signed char vec_vaddsbs (vector signed char, vector bool char); +vector signed char vec_vaddsbs (vector signed char, vector signed char); -Vector Shift Left Double Bit Immediate -@smallexample -@exdent vector signed char vec_sldb (vector signed char, vector signed char, -const unsigned int); -@exdent vector unsigned char vec_sldb (vector unsigned char, -vector unsigned char, const unsigned int); -@exdent vector signed short vec_sldb (vector signed short, vector signed short, -const unsigned int); -@exdent vector unsigned short vec_sldb (vector unsigned short, -vector unsigned short, const unsigned int); -@exdent vector signed int vec_sldb (vector signed int, vector signed int, -const unsigned int); -@exdent vector unsigned int vec_sldb (vector unsigned int, vector unsigned int, -const unsigned int); -@exdent vector signed long long vec_sldb (vector signed long long, -vector signed long long, const unsigned int); -@exdent vector unsigned long long vec_sldb (vector unsigned long long, -vector unsigned long long, const unsigned int); -@exdent vector signed __int128 vec_sldb (vector signed __int128, -vector signed __int128, const unsigned int); -@exdent vector unsigned __int128 vec_sldb (vector unsigned __int128, -vector unsigned __int128, const unsigned int); -@end smallexample +vector signed short vec_vaddshs (vector bool short, vector signed short); +vector signed short vec_vaddshs (vector signed short, vector bool short); +vector signed short vec_vaddshs (vector signed short, vector signed short); + +vector signed int vec_vaddsws (vector bool int, vector signed int); +vector signed int vec_vaddsws (vector signed int, vector bool int); +vector signed int vec_vaddsws (vector signed int, vector signed int); + +vector signed char vec_vaddubm (vector bool char, vector signed char); +vector signed char vec_vaddubm (vector signed char, vector bool char); +vector signed char vec_vaddubm (vector signed char, vector signed char); +vector unsigned char vec_vaddubm (vector bool char, vector unsigned char); +vector unsigned char vec_vaddubm (vector unsigned char, vector bool char); +vector unsigned char vec_vaddubm (vector unsigned char, vector unsigned char); -Shift the combined input vectors left by the amount specified by the low-order -three bits of the third argument, and return the leftmost remaining 128 bits. -Code using this instruction must be endian-aware. 
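A hedged sketch of the @code{vec_sldb} behavior just described, assuming
@code{<altivec.h>} and a @option{-mcpu=power10} target:

@smallexample
#include <altivec.h>

/* Treat A:B as a single 256-bit value, shift it left by 3 bits, and
   keep the leftmost 128 bits.  */
vector unsigned char
shift_pair_left_3 (vector unsigned char a, vector unsigned char b)
@{
  return vec_sldb (a, b, 3);
@}
@end smallexample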
+vector unsigned char vec_vaddubs (vector bool char, vector unsigned char); +vector unsigned char vec_vaddubs (vector unsigned char, vector bool char); +vector unsigned char vec_vaddubs (vector unsigned char, vector unsigned char); -@findex vec_sldb +vector signed short vec_vadduhm (vector bool short, vector signed short); +vector signed short vec_vadduhm (vector signed short, vector bool short); +vector signed short vec_vadduhm (vector signed short, vector signed short); +vector unsigned short vec_vadduhm (vector bool short, vector unsigned short); +vector unsigned short vec_vadduhm (vector unsigned short, vector bool short); +vector unsigned short vec_vadduhm (vector unsigned short, vector unsigned short); -Vector Shift Right Double Bit Immediate +vector unsigned short vec_vadduhs (vector bool short, vector unsigned short); +vector unsigned short vec_vadduhs (vector unsigned short, vector bool short); +vector unsigned short vec_vadduhs (vector unsigned short, vector unsigned short); -@smallexample -@exdent vector signed char vec_srdb (vector signed char, vector signed char, -const unsigned int); -@exdent vector unsigned char vec_srdb (vector unsigned char, vector unsigned char, -const unsigned int); -@exdent vector signed short vec_srdb (vector signed short, vector signed short, -const unsigned int); -@exdent vector unsigned short vec_srdb (vector unsigned short, vector unsigned short, -const unsigned int); -@exdent vector signed int vec_srdb (vector signed int, vector signed int, -const unsigned int); -@exdent vector unsigned int vec_srdb (vector unsigned int, vector unsigned int, -const unsigned int); -@exdent vector signed long long vec_srdb (vector signed long long, -vector signed long long, const unsigned int); -@exdent vector unsigned long long vec_srdb (vector unsigned long long, -vector unsigned long long, const unsigned int); -@exdent vector signed __int128 vec_srdb (vector signed __int128, -vector signed __int128, const unsigned int); -@exdent vector unsigned __int128 vec_srdb (vector unsigned __int128, -vector unsigned __int128, const unsigned int); -@end smallexample +vector signed int vec_vadduwm (vector bool int, vector signed int); +vector signed int vec_vadduwm (vector signed int, vector bool int); +vector signed int vec_vadduwm (vector signed int, vector signed int); +vector unsigned int vec_vadduwm (vector bool int, vector unsigned int); +vector unsigned int vec_vadduwm (vector unsigned int, vector bool int); +vector unsigned int vec_vadduwm (vector unsigned int, vector unsigned int); -Shift the combined input vectors right by the amount specified by the low-order -three bits of the third argument, and return the remaining 128 bits. Code -using this built-in must be endian-aware. +vector unsigned int vec_vadduws (vector bool int, vector unsigned int); +vector unsigned int vec_vadduws (vector unsigned int, vector bool int); +vector unsigned int vec_vadduws (vector unsigned int, vector unsigned int); -@findex vec_srdb +vector signed char vec_vavgsb (vector signed char, vector signed char); -Vector Splat +vector signed short vec_vavgsh (vector signed short, vector signed short); -@smallexample -@exdent vector signed int vec_splati (const signed int); -@exdent vector float vec_splati (const float); -@end smallexample +vector signed int vec_vavgsw (vector signed int, vector signed int); -Splat a 32-bit immediate into a vector of words. 
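For instance, a minimal sketch of the @code{vec_splati} forms above
(assuming @code{<altivec.h>} and a @option{-mcpu=power10} target):

@smallexample
vector signed int all_sevens = vec_splati (7);   /* @{7, 7, 7, 7@} */
vector float all_halves = vec_splati (0.5f);     /* @{0.5, 0.5, 0.5, 0.5@} */
@end smallexample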
+vector unsigned char vec_vavgub (vector unsigned char, vector unsigned char); -@findex vec_splati +vector unsigned short vec_vavguh (vector unsigned short, vector unsigned short); -@smallexample -@exdent vector double vec_splatid (const float); -@end smallexample +vector unsigned int vec_vavguw (vector unsigned int, vector unsigned int); -Convert a single precision floating-point value to double-precision and splat -the result to a vector of double-precision floats. +vector float vec_vcfsx (vector signed int, const int); -@findex vec_splatid +vector float vec_vcfux (vector unsigned int, const int); -@smallexample -@exdent vector signed int vec_splati_ins (vector signed int, -const unsigned int, const signed int); -@exdent vector unsigned int vec_splati_ins (vector unsigned int, -const unsigned int, const unsigned int); -@exdent vector float vec_splati_ins (vector float, const unsigned int, -const float); -@end smallexample +vector bool int vec_vcmpeqfp (vector float, vector float); -Argument 2 must be either 0 or 1. Splat the value of argument 3 into the word -identified by argument 2 of each doubleword of argument 1 and return the -result. The other words of argument 1 are unchanged. +vector bool char vec_vcmpequb (vector signed char, vector signed char); +vector bool char vec_vcmpequb (vector unsigned char, vector unsigned char); -@findex vec_splati_ins +vector bool short vec_vcmpequh (vector signed short, vector signed short); +vector bool short vec_vcmpequh (vector unsigned short, vector unsigned short); -Vector Blend Variable +vector bool int vec_vcmpequw (vector signed int, vector signed int); +vector bool int vec_vcmpequw (vector unsigned int, vector unsigned int); -@smallexample -@exdent vector signed char vec_blendv (vector signed char, vector signed char, -vector unsigned char); -@exdent vector unsigned char vec_blendv (vector unsigned char, -vector unsigned char, vector unsigned char); -@exdent vector signed short vec_blendv (vector signed short, -vector signed short, vector unsigned short); -@exdent vector unsigned short vec_blendv (vector unsigned short, -vector unsigned short, vector unsigned short); -@exdent vector signed int vec_blendv (vector signed int, vector signed int, -vector unsigned int); -@exdent vector unsigned int vec_blendv (vector unsigned int, -vector unsigned int, vector unsigned int); -@exdent vector signed long long vec_blendv (vector signed long long, -vector signed long long, vector unsigned long long); -@exdent vector unsigned long long vec_blendv (vector unsigned long long, -vector unsigned long long, vector unsigned long long); -@exdent vector float vec_blendv (vector float, vector float, -vector unsigned int); -@exdent vector double vec_blendv (vector double, vector double, -vector unsigned long long); -@end smallexample +vector bool int vec_vcmpgtfp (vector float, vector float); -Blend the first and second argument vectors according to the sign bits of the -corresponding elements of the third argument vector. This is similar to the -@code{vsel} and @code{xxsel} instructions but for bigger elements. 
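A sketch of the @code{vec_blendv} semantics described above; this is
illustrative only and assumes, per the usual @code{xxblendv} convention,
that an element whose mask sign bit is set is taken from the second input:

@smallexample
/* Select B's element where MASK's element has its high (sign) bit set,
   and A's element otherwise.  */
vector signed int
blend_by_sign (vector signed int a, vector signed int b,
               vector unsigned int mask)
@{
  return vec_blendv (a, b, mask);
@}
@end smallexample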
+vector bool char vec_vcmpgtsb (vector signed char, vector signed char); -@findex vec_blendv +vector bool short vec_vcmpgtsh (vector signed short, vector signed short); -Vector Permute Extended +vector bool int vec_vcmpgtsw (vector signed int, vector signed int); -@smallexample -@exdent vector signed char vec_permx (vector signed char, vector signed char, -vector unsigned char, const int); -@exdent vector unsigned char vec_permx (vector unsigned char, -vector unsigned char, vector unsigned char, const int); -@exdent vector signed short vec_permx (vector signed short, -vector signed short, vector unsigned char, const int); -@exdent vector unsigned short vec_permx (vector unsigned short, -vector unsigned short, vector unsigned char, const int); -@exdent vector signed int vec_permx (vector signed int, vector signed int, -vector unsigned char, const int); -@exdent vector unsigned int vec_permx (vector unsigned int, -vector unsigned int, vector unsigned char, const int); -@exdent vector signed long long vec_permx (vector signed long long, -vector signed long long, vector unsigned char, const int); -@exdent vector unsigned long long vec_permx (vector unsigned long long, -vector unsigned long long, vector unsigned char, const int); -@exdent vector float (vector float, vector float, vector unsigned char, -const int); -@exdent vector double (vector double, vector double, vector unsigned char, -const int); -@end smallexample +vector bool char vec_vcmpgtub (vector unsigned char, vector unsigned char); -Perform a partial permute of the first two arguments, which form a 32-byte -section of an emulated vector up to 256 bytes wide, using the partial permute -control vector in the third argument. The fourth argument (constrained to -values of 0-7) identifies which 32-byte section of the emulated vector is -contained in the first two arguments. -@findex vec_permx +vector bool short vec_vcmpgtuh (vector unsigned short, vector unsigned short); -@smallexample -@exdent vector unsigned long long int -@exdent vec_pext (vector unsigned long long int, vector unsigned long long int); -@end smallexample -Perform a vector parallel bit extract operation, as if implemented by -the @code{vpextd} instruction. -@findex vec_pext +vector bool int vec_vcmpgtuw (vector unsigned int, vector unsigned int); -@smallexample -@exdent vector unsigned char vec_stril (vector unsigned char); -@exdent vector signed char vec_stril (vector signed char); -@exdent vector unsigned short vec_stril (vector unsigned short); -@exdent vector signed short vec_stril (vector signed short); -@end smallexample -Isolate the left-most non-zero elements of the incoming vector argument, -replacing all elements to the right of the left-most zero element -found within the argument with zero. The typical implementation uses -the @code{vstribl} or @code{vstrihl} instruction on big-endian targets -and uses the @code{vstribr} or @code{vstrihr} instruction on -little-endian targets. -@findex vec_stril +vector float vec_vmaxfp (vector float, vector float); -@smallexample -@exdent int vec_stril_p (vector unsigned char); -@exdent int vec_stril_p (vector signed char); -@exdent int short vec_stril_p (vector unsigned short); -@exdent int vec_stril_p (vector signed short); -@end smallexample -Return a non-zero value if and only if the argument contains a zero -element. The typical implementation uses -the @code{vstribl.} or @code{vstrihl.} instruction on big-endian targets -and uses the @code{vstribr.} or @code{vstrihr.} instruction on -little-endian targets. 
Choose this built-in to check for presence of -zero element if the same argument is also passed to @code{vec_stril}. -@findex vec_stril_p +vector signed char vec_vmaxsb (vector bool char, vector signed char); +vector signed char vec_vmaxsb (vector signed char, vector bool char); +vector signed char vec_vmaxsb (vector signed char, vector signed char); -@smallexample -@exdent vector unsigned char vec_strir (vector unsigned char); -@exdent vector signed char vec_strir (vector signed char); -@exdent vector unsigned short vec_strir (vector unsigned short); -@exdent vector signed short vec_strir (vector signed short); -@end smallexample -Isolate the right-most non-zero elements of the incoming vector argument, -replacing all elements to the left of the right-most zero element -found within the argument with zero. The typical implementation uses -the @code{vstribr} or @code{vstrihr} instruction on big-endian targets -and uses the @code{vstribl} or @code{vstrihl} instruction on -little-endian targets. -@findex vec_strir +vector signed short vec_vmaxsh (vector bool short, vector signed short); +vector signed short vec_vmaxsh (vector signed short, vector bool short); +vector signed short vec_vmaxsh (vector signed short, vector signed short); -@smallexample -@exdent int vec_strir_p (vector unsigned char); -@exdent int vec_strir_p (vector signed char); -@exdent int short vec_strir_p (vector unsigned short); -@exdent int vec_strir_p (vector signed short); -@end smallexample -Return a non-zero value if and only if the argument contains a zero -element. The typical implementation uses -the @code{vstribr.} or @code{vstrihr.} instruction on big-endian targets -and uses the @code{vstribl.} or @code{vstrihl.} instruction on -little-endian targets. Choose this built-in to check for presence of -zero element if the same argument is also passed to @code{vec_strir}. -@findex vec_strir_p +vector signed int vec_vmaxsw (vector bool int, vector signed int); +vector signed int vec_vmaxsw (vector signed int, vector bool int); +vector signed int vec_vmaxsw (vector signed int, vector signed int); -@smallexample -@exdent vector unsigned char -@exdent vec_ternarylogic (vector unsigned char, vector unsigned char, - vector unsigned char, const unsigned int); -@exdent vector unsigned short -@exdent vec_ternarylogic (vector unsigned short, vector unsigned short, - vector unsigned short, const unsigned int); -@exdent vector unsigned int -@exdent vec_ternarylogic (vector unsigned int, vector unsigned int, - vector unsigned int, const unsigned int); -@exdent vector unsigned long long int -@exdent vec_ternarylogic (vector unsigned long long int, vector unsigned long long int, - vector unsigned long long int, const unsigned int); -@exdent vector unsigned __int128 -@exdent vec_ternarylogic (vector unsigned __int128, vector unsigned __int128, - vector unsigned __int128, const unsigned int); -@end smallexample -Perform a 128-bit vector evaluate operation, as if implemented by the -@code{xxeval} instruction. The fourth argument must be a literal -integer value between 0 and 255 inclusive. 
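The @code{vec_stril}/@code{vec_stril_p} pair described above is typically
used together when scanning NUL-terminated data; a minimal sketch, assuming
@code{<altivec.h>} and a @option{-mcpu=power10} target:

@smallexample
/* Zero every byte after the first zero byte of CHUNK, and record
   whether a zero byte was present at all.  */
vector unsigned char
clip_at_nul (vector unsigned char chunk, int *saw_nul)
@{
  *saw_nul = vec_stril_p (chunk);  /* non-zero iff CHUNK has a 0 byte */
  return vec_stril (chunk);
@}
@end smallexample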
-@findex vec_ternarylogic +vector unsigned char vec_vmaxub (vector bool char, vector unsigned char); +vector unsigned char vec_vmaxub (vector unsigned char, vector bool char); +vector unsigned char vec_vmaxub (vector unsigned char, vector unsigned char); + +vector unsigned short vec_vmaxuh (vector bool short, vector unsigned short); +vector unsigned short vec_vmaxuh (vector unsigned short, vector bool short); +vector unsigned short vec_vmaxuh (vector unsigned short, vector unsigned short); -@smallexample -@exdent vector unsigned char vec_genpcvm (vector unsigned char, const int); -@exdent vector unsigned short vec_genpcvm (vector unsigned short, const int); -@exdent vector unsigned int vec_genpcvm (vector unsigned int, const int); -@exdent vector unsigned int vec_genpcvm (vector unsigned long long int, - const int); -@end smallexample +vector unsigned int vec_vmaxuw (vector bool int, vector unsigned int); +vector unsigned int vec_vmaxuw (vector unsigned int, vector bool int); +vector unsigned int vec_vmaxuw (vector unsigned int, vector unsigned int); -Vector Integer Multiply/Divide/Modulo +vector float vec_vminfp (vector float, vector float); -@smallexample -@exdent vector signed int -@exdent vec_mulh (vector signed int @var{a}, vector signed int @var{b}); -@exdent vector unsigned int -@exdent vec_mulh (vector unsigned int @var{a}, vector unsigned int @var{b}); -@end smallexample +vector signed char vec_vminsb (vector bool char, vector signed char); +vector signed char vec_vminsb (vector signed char, vector bool char); +vector signed char vec_vminsb (vector signed char, vector signed char); -For each integer value @code{i} from 0 to 3, do the following. The integer -value in word element @code{i} of a is multiplied by the integer value in word -element @code{i} of b. The high-order 32 bits of the 64-bit product are placed -into word element @code{i} of the vector returned. +vector signed short vec_vminsh (vector bool short, vector signed short); +vector signed short vec_vminsh (vector signed short, vector bool short); +vector signed short vec_vminsh (vector signed short, vector signed short); -@smallexample -@exdent vector signed long long -@exdent vec_mulh (vector signed long long @var{a}, vector signed long long @var{b}); -@exdent vector unsigned long long -@exdent vec_mulh (vector unsigned long long @var{a}, vector unsigned long long @var{b}); -@end smallexample +vector signed int vec_vminsw (vector bool int, vector signed int); +vector signed int vec_vminsw (vector signed int, vector bool int); +vector signed int vec_vminsw (vector signed int, vector signed int); -For each integer value @code{i} from 0 to 1, do the following. The integer -value in doubleword element @code{i} of a is multiplied by the integer value in -doubleword element @code{i} of b. The high-order 64 bits of the 128-bit product -are placed into doubleword element @code{i} of the vector returned. 
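The high-half multiply just described is a common building block for
fixed-point scaling; a hedged sketch (assuming @code{<altivec.h>} and a
@option{-mcpu=power10} target):

@smallexample
/* Multiply each element by a Q32 fixed-point fraction and keep the
   integer part, that is, the high 32 bits of each 64-bit product.  */
vector unsigned int
scale_q32 (vector unsigned int x, vector unsigned int frac_q32)
@{
  return vec_mulh (x, frac_q32);
@}
@end smallexample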
+vector unsigned char vec_vminub (vector bool char, vector unsigned char); +vector unsigned char vec_vminub (vector unsigned char, vector bool char); +vector unsigned char vec_vminub (vector unsigned char, vector unsigned char); -@smallexample -@exdent vector unsigned long long -@exdent vec_mul (vector unsigned long long @var{a}, vector unsigned long long @var{b}); -@exdent vector signed long long -@exdent vec_mul (vector signed long long @var{a}, vector signed long long @var{b}); -@end smallexample +vector unsigned short vec_vminuh (vector bool short, vector unsigned short); +vector unsigned short vec_vminuh (vector unsigned short, vector bool short); +vector unsigned short vec_vminuh (vector unsigned short, vector unsigned short); -For each integer value @code{i} from 0 to 1, do the following. The integer -value in doubleword element @code{i} of a is multiplied by the integer value in -doubleword element @code{i} of b. The low-order 64 bits of the 128-bit product -are placed into doubleword element @code{i} of the vector returned. +vector unsigned int vec_vminuw (vector bool int, vector unsigned int); +vector unsigned int vec_vminuw (vector unsigned int, vector bool int); +vector unsigned int vec_vminuw (vector unsigned int, vector unsigned int); -@smallexample -@exdent vector signed int -@exdent vec_div (vector signed int @var{a}, vector signed int @var{b}); -@exdent vector unsigned int -@exdent vec_div (vector unsigned int @var{a}, vector unsigned int @var{b}); -@end smallexample +vector bool char vec_vmrghb (vector bool char, vector bool char); +vector signed char vec_vmrghb (vector signed char, vector signed char); +vector unsigned char vec_vmrghb (vector unsigned char, vector unsigned char); -For each integer value @code{i} from 0 to 3, do the following. The integer in -word element @code{i} of a is divided by the integer in word element @code{i} -of b. The unique integer quotient is placed into the word element @code{i} of -the vector returned. If an attempt is made to perform any of the divisions -<anything> ÷ 0 then the quotient is undefined. +vector bool short vec_vmrghh (vector bool short, vector bool short); +vector signed short vec_vmrghh (vector signed short, vector signed short); +vector unsigned short vec_vmrghh (vector unsigned short, vector unsigned short); +vector pixel vec_vmrghh (vector pixel, vector pixel); -@smallexample -@exdent vector signed long long -@exdent vec_div (vector signed long long @var{a}, vector signed long long @var{b}); -@exdent vector unsigned long long -@exdent vec_div (vector unsigned long long @var{a}, vector unsigned long long @var{b}); -@end smallexample +vector float vec_vmrghw (vector float, vector float); +vector bool int vec_vmrghw (vector bool int, vector bool int); +vector signed int vec_vmrghw (vector signed int, vector signed int); +vector unsigned int vec_vmrghw (vector unsigned int, vector unsigned int); -For each integer value @code{i} from 0 to 1, do the following. The integer in -doubleword element @code{i} of a is divided by the integer in doubleword -element @code{i} of b. The unique integer quotient is placed into the -doubleword element @code{i} of the vector returned. If an attempt is made to -perform any of the divisions 0x8000_0000_0000_0000 ÷ -1 or <anything> ÷ 0 then -the quotient is undefined.
+vector bool char vec_vmrglb (vector bool char, vector bool char); +vector signed char vec_vmrglb (vector signed char, vector signed char); +vector unsigned char vec_vmrglb (vector unsigned char, vector unsigned char); -@smallexample -@exdent vector signed int -@exdent vec_dive (vector signed int @var{a}, vector signed int @var{b}); -@exdent vector unsigned int -@exdent vec_dive (vector unsigned int @var{a}, vector unsigned int @var{b}); -@end smallexample +vector bool short vec_vmrglh (vector bool short, vector bool short); +vector signed short vec_vmrglh (vector signed short, vector signed short); +vector unsigned short vec_vmrglh (vector unsigned short, vector unsigned short); +vector pixel vec_vmrglh (vector pixel, vector pixel); -For each integer value @code{i} from 0 to 3, do the following. The integer in -word element @code{i} of a is shifted left by 32 bits, then divided by the -integer in word element @code{i} of b. The unique integer quotient is placed -into the word element @code{i} of the vector returned. If the quotient cannot -be represented in 32 bits, or if an attempt is made to perform any of the -divisions <anything> ÷ 0 then the quotient is undefined. +vector float vec_vmrglw (vector float, vector float); +vector signed int vec_vmrglw (vector signed int, vector signed int); +vector unsigned int vec_vmrglw (vector unsigned int, vector unsigned int); +vector bool int vec_vmrglw (vector bool int, vector bool int); -@smallexample -@exdent vector signed long long -@exdent vec_dive (vector signed long long @var{a}, vector signed long long @var{b}); -@exdent vector unsigned long long -@exdent vec_dive (vector unsigned long long @var{a}, vector unsigned long long @var{b}); -@end smallexample +vector signed int vec_vmsummbm (vector signed char, vector unsigned char, + vector signed int); -For each integer value @code{i} from 0 to 1, do the following. The integer in -doubleword element @code{i} of a is shifted left by 64 bits, then divided by -the integer in doubleword element @code{i} of b. The unique integer quotient is -placed into the doubleword element @code{i} of the vector returned. If the -quotient cannot be represented in 64 bits, or if an attempt is made to perform -<anything> ÷ 0 then the quotient is undefined. +vector signed int vec_vmsumshm (vector signed short, vector signed short, + vector signed int); -@smallexample -@exdent vector signed int -@exdent vec_mod (vector signed int @var{a}, vector signed int @var{b}); -@exdent vector unsigned int -@exdent vec_mod (vector unsigned int @var{a}, vector unsigned int @var{b}); -@end smallexample +vector signed int vec_vmsumshs (vector signed short, vector signed short, + vector signed int); -For each integer value @code{i} from 0 to 3, do the following. The integer in -word element @code{i} of a is divided by the integer in word element @code{i} -of b. The unique integer remainder is placed into the word element @code{i} of -the vector returned. If an attempt is made to perform any of the divisions -0x8000_0000 ÷ -1 or <anything> ÷ 0 then the remainder is undefined.
+vector unsigned int vec_vmsumubm (vector unsigned char, vector unsigned char,
+                                  vector unsigned int);
-@smallexample
-@exdent vector signed long long
-@exdent vec_mod (vector signed long long @var{a}, vector signed long long @var{b});
-@exdent vector unsigned long long
-@exdent vec_mod (vector unsigned long long @var{a}, vector unsigned long long @var{b});
-@end smallexample
+vector unsigned int vec_vmsumuhm (vector unsigned short, vector unsigned short,
+                                  vector unsigned int);
-For each integer value @code{i} from 0 to 1, do the following. The integer in
-doubleword element @code{i} of a is divided by the integer in doubleword
-element @code{i} of b. The unique integer remainder is placed into the
-doubleword element @code{i} of the vector returned. If an attempt is made to
-divide by 0, the remainder is undefined.
+vector unsigned int vec_vmsumuhs (vector unsigned short, vector unsigned short,
+                                  vector unsigned int);
-These built-ins generate a PCV (Permute Control Vector) from the specified
-mask size, as if implemented by the @code{xxgenpcvbm}, @code{xxgenpcvhm}, or
-@code{xxgenpcvwm} instructions, where the immediate value is 0, 1, 2, or 3.
-@findex vec_genpcvm
+vector signed short vec_vmulesb (vector signed char, vector signed char);
-@smallexample
-@exdent vector unsigned __int128 vec_rl (vector unsigned __int128 @var{A},
-                                         vector unsigned __int128 @var{B});
-@exdent vector signed __int128 vec_rl (vector signed __int128 @var{A},
-                                       vector unsigned __int128 @var{B});
-@end smallexample
+vector signed int vec_vmulesh (vector signed short, vector signed short);
-Result value: Each element of @var{R} is obtained by rotating the
-corresponding element of @var{A} left by the number of bits specified by the
-corresponding element of @var{B}.
+vector unsigned short vec_vmuleub (vector unsigned char, vector unsigned char);
+vector unsigned int vec_vmuleuh (vector unsigned short, vector unsigned short);
-@smallexample
-@exdent vector unsigned __int128 vec_rlmi (vector unsigned __int128,
-                                           vector unsigned __int128,
-                                           vector unsigned __int128);
-@exdent vector signed __int128 vec_rlmi (vector signed __int128,
-                                         vector signed __int128,
-                                         vector unsigned __int128);
-@end smallexample
+vector signed short vec_vmulosb (vector signed char, vector signed char);
-Returns the result of rotating the first input and inserting it under mask
-into the second input. The first and last bits of the mask are obtained from
-the two 7-bit fields, bits [108:115] and bits [117:123] respectively, of the
-second input. The shift is obtained from the third input in the 7-bit field
-bits [125:131], where bits are numbered from zero starting at the left.
+vector signed int vec_vmulosh (vector signed short, vector signed short);
-@smallexample
-@exdent vector unsigned __int128 vec_rlnm (vector unsigned __int128,
-                                           vector unsigned __int128,
-                                           vector unsigned __int128);
-@exdent vector signed __int128 vec_rlnm (vector signed __int128,
-                                         vector unsigned __int128,
-                                         vector unsigned __int128);
-@end smallexample
+vector unsigned short vec_vmuloub (vector unsigned char, vector unsigned char);
-Returns the result of rotating the first input and ANDing it with a mask. The
-first and last bits of the mask are obtained from the two 7-bit fields, bits
-[117:123] and bits [125:131] respectively, of the second input. The shift is
-obtained from the third input in the 7-bit field bits [125:131], where bits
-are numbered from zero starting at the left.
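As a hedged sketch of the 128-bit @code{vec_rl} rotation described above
(wrapper name invented; assumes ISA 3.1 support):

@smallexample
#include <altivec.h>

/* Illustrative only: rotate the single 128-bit element of A left by
   the count held in the corresponding element of B.  */
vector unsigned __int128
rotate_quadword (vector unsigned __int128 a, vector unsigned __int128 b)
@{
  return vec_rl (a, b);
@}
@end smallexample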
+vector unsigned int vec_vmulouh (vector unsigned short, vector unsigned short); -@smallexample -@exdent vector unsigned __int128 vec_sl(vector unsigned __int128 @var{A}, vector unsigned __int128 @var{B}); -@exdent vector signed __int128 vec_sl(vector signed __int128 @var{A}, vector unsigned __int128 @var{B}); -@end smallexample +vector signed char vec_vpkshss (vector signed short, vector signed short); -Result value: Each element of @var{R} is obtained by shifting the corresponding element of -@var{A} left by the number of bits specified by the corresponding element of @var{B}. +vector unsigned char vec_vpkshus (vector signed short, vector signed short); -@smallexample -@exdent vector unsigned __int128 vec_sr(vector unsigned __int128 @var{A}, vector unsigned __int128 @var{B}); -@exdent vector signed __int128 vec_sr(vector signed __int128 @var{A}, vector unsigned __int128 @var{B}); -@end smallexample +vector signed short vec_vpkswss (vector signed int, vector signed int); -Result value: Each element of @var{R} is obtained by shifting the corresponding element of -@var{A} right by the number of bits specified by the corresponding element of @var{B}. +vector unsigned short vec_vpkswus (vector signed int, vector signed int); -@smallexample -@exdent vector unsigned __int128 vec_sra(vector unsigned __int128 @var{A}, vector unsigned __int128 @var{B}); -@exdent vector signed __int128 vec_sra(vector signed __int128 @var{A}, vector unsigned __int128 @var{B}); -@end smallexample +vector bool char vec_vpkuhum (vector bool short, vector bool short); +vector signed char vec_vpkuhum (vector signed short, vector signed short); +vector unsigned char vec_vpkuhum (vector unsigned short, vector unsigned short); -Result value: Each element of @var{R} is obtained by arithmetic shifting the corresponding -element of @var{A} right by the number of bits specified by the corresponding element of @var{B}. +vector unsigned char vec_vpkuhus (vector unsigned short, vector unsigned short); -@smallexample -@exdent vector unsigned __int128 vec_mule (vector unsigned long long, - vector unsigned long long); -@exdent vector signed __int128 vec_mule (vector signed long long, - vector signed long long); -@end smallexample +vector bool short vec_vpkuwum (vector bool int, vector bool int); +vector signed short vec_vpkuwum (vector signed int, vector signed int); +vector unsigned short vec_vpkuwum (vector unsigned int, vector unsigned int); + +vector unsigned short vec_vpkuwus (vector unsigned int, vector unsigned int); + +vector signed char vec_vrlb (vector signed char, vector unsigned char); +vector unsigned char vec_vrlb (vector unsigned char, vector unsigned char); + +vector signed short vec_vrlh (vector signed short, vector unsigned short); +vector unsigned short vec_vrlh (vector unsigned short, vector unsigned short); + +vector signed int vec_vrlw (vector signed int, vector unsigned int); +vector unsigned int vec_vrlw (vector unsigned int, vector unsigned int); -Returns a vector containing a 128-bit integer result of multiplying the even -doubleword elements of the two inputs. 
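As a hedged sketch of the even-doubleword @code{vec_mule} form just
described (wrapper name invented; assumes ISA 3.1, e.g.
@option{-mcpu=power10}):

@smallexample
#include <altivec.h>

/* Illustrative only: full 128-bit product of the even doubleword
   elements of the two inputs.  */
vector unsigned __int128
mul_even_doublewords (vector unsigned long long a,
                      vector unsigned long long b)
@{
  return vec_mule (a, b);
@}
@end smallexample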
+vector signed char vec_vslb (vector signed char, vector unsigned char);
+vector unsigned char vec_vslb (vector unsigned char, vector unsigned char);
-@smallexample
-@exdent vector unsigned __int128 vec_mulo (vector unsigned long long,
-                                           vector unsigned long long);
-@exdent vector signed __int128 vec_mulo (vector signed long long,
-                                         vector signed long long);
-@end smallexample
+vector signed short vec_vslh (vector signed short, vector unsigned short);
+vector unsigned short vec_vslh (vector unsigned short, vector unsigned short);
-Returns a vector containing a 128-bit integer result of multiplying the odd
-doubleword elements of the two inputs.
+vector signed int vec_vslw (vector signed int, vector unsigned int);
+vector unsigned int vec_vslw (vector unsigned int, vector unsigned int);
-@smallexample
-@exdent vector unsigned __int128 vec_div (vector unsigned __int128,
-                                          vector unsigned __int128);
-@exdent vector signed __int128 vec_div (vector signed __int128,
-                                        vector signed __int128);
-@end smallexample
+vector signed char vec_vspltb (vector signed char, const int);
+vector unsigned char vec_vspltb (vector unsigned char, const int);
+vector bool char vec_vspltb (vector bool char, const int);
-Returns the result of dividing the first operand by the second operand. An
-attempt to divide any value by zero or to divide the most negative signed
-128-bit integer by negative one results in an undefined value.
+vector bool short vec_vsplth (vector bool short, const int);
+vector signed short vec_vsplth (vector signed short, const int);
+vector unsigned short vec_vsplth (vector unsigned short, const int);
+vector pixel vec_vsplth (vector pixel, const int);
-@smallexample
-@exdent vector unsigned __int128 vec_dive (vector unsigned __int128,
-                                           vector unsigned __int128);
-@exdent vector signed __int128 vec_dive (vector signed __int128,
-                                         vector signed __int128);
-@end smallexample
+vector float vec_vspltw (vector float, const int);
+vector signed int vec_vspltw (vector signed int, const int);
+vector unsigned int vec_vspltw (vector unsigned int, const int);
+vector bool int vec_vspltw (vector bool int, const int);
-The result is produced by shifting the first input left by 128 bits and
-dividing by the second. If an attempt is made to divide by zero or the result
-is larger than 128 bits, the result is undefined.
+vector signed char vec_vsrab (vector signed char, vector unsigned char);
+vector unsigned char vec_vsrab (vector unsigned char, vector unsigned char);
-@smallexample
-@exdent vector unsigned __int128 vec_mod (vector unsigned __int128,
-                                          vector unsigned __int128);
-@exdent vector signed __int128 vec_mod (vector signed __int128,
-                                        vector signed __int128);
-@end smallexample
+vector signed short vec_vsrah (vector signed short, vector unsigned short);
+vector unsigned short vec_vsrah (vector unsigned short, vector unsigned short);
-The result is the remainder of dividing the first input by the second input.
+vector signed int vec_vsraw (vector signed int, vector unsigned int);
+vector unsigned int vec_vsraw (vector unsigned int, vector unsigned int);
-The following built-ins perform 128-bit vector comparisons. The
-@code{vec_all_xx}, @code{vec_any_xx}, and @code{vec_cmpxx} built-ins, where
-@code{xx} is one of the operations @code{eq, ne, gt, lt, ge, le}, perform
-pairwise comparisons between the elements at the same positions within their
-two vector arguments. The @code{vec_all_xx} function returns a non-zero value
-if and only if all pairwise comparisons are true. The @code{vec_any_xx}
-function returns a non-zero value if and only if at least one pairwise
-comparison is true. The @code{vec_cmpxx} function returns a vector of the
-same type as its two arguments, within which each element consists of all
-ones to denote that the specified logical comparison of the corresponding
-elements was true. Otherwise, the element of the returned vector contains
-all zeros.
+vector signed char vec_vsrb (vector signed char, vector unsigned char);
+vector unsigned char vec_vsrb (vector unsigned char, vector unsigned char);
-@smallexample
-vector bool __int128 vec_cmpeq (vector signed __int128, vector signed __int128);
-vector bool __int128 vec_cmpeq (vector unsigned __int128, vector unsigned __int128);
-vector bool __int128 vec_cmpne (vector signed __int128, vector signed __int128);
-vector bool __int128 vec_cmpne (vector unsigned __int128, vector unsigned __int128);
-vector bool __int128 vec_cmpgt (vector signed __int128, vector signed __int128);
-vector bool __int128 vec_cmpgt (vector unsigned __int128, vector unsigned __int128);
-vector bool __int128 vec_cmplt (vector signed __int128, vector signed __int128);
-vector bool __int128 vec_cmplt (vector unsigned __int128, vector unsigned __int128);
-vector bool __int128 vec_cmpge (vector signed __int128, vector signed __int128);
-vector bool __int128 vec_cmpge (vector unsigned __int128, vector unsigned __int128);
-vector bool __int128 vec_cmple (vector signed __int128, vector signed __int128);
-vector bool __int128 vec_cmple (vector unsigned __int128, vector unsigned __int128);
+vector signed short vec_vsrh (vector signed short, vector unsigned short);
+vector unsigned short vec_vsrh (vector unsigned short, vector unsigned short);
-int vec_all_eq (vector signed __int128, vector signed __int128);
-int vec_all_eq (vector unsigned __int128, vector unsigned __int128);
-int vec_all_ne (vector signed __int128, vector signed __int128);
-int vec_all_ne (vector unsigned __int128, vector unsigned __int128);
-int vec_all_gt (vector signed __int128, vector signed __int128);
-int vec_all_gt (vector unsigned __int128, vector unsigned __int128);
-int vec_all_lt (vector signed __int128, vector signed __int128);
-int vec_all_lt (vector unsigned __int128, vector unsigned __int128);
-int vec_all_ge (vector signed __int128, vector signed __int128);
-int vec_all_ge (vector unsigned __int128, vector unsigned __int128);
-int vec_all_le (vector signed __int128, vector signed __int128);
-int vec_all_le (vector unsigned __int128, vector unsigned __int128);
+vector signed int vec_vsrw (vector signed int, vector unsigned int);
+vector unsigned int vec_vsrw (vector unsigned int, vector unsigned int);
-int vec_any_eq (vector signed __int128, vector signed __int128);
-int vec_any_eq (vector unsigned __int128, vector unsigned __int128);
-int vec_any_ne (vector signed __int128, vector signed __int128);
-int vec_any_ne (vector unsigned __int128, vector unsigned __int128);
-int vec_any_gt (vector signed __int128, vector signed __int128);
-int vec_any_gt (vector unsigned __int128, vector unsigned __int128);
-int vec_any_lt (vector signed __int128, vector signed __int128);
-int vec_any_lt (vector unsigned __int128, vector unsigned __int128);
-int vec_any_ge (vector signed __int128, vector signed __int128);
-int vec_any_ge (vector unsigned __int128, vector unsigned __int128);
-int vec_any_le (vector signed __int128, vector signed __int128);
-int vec_any_le (vector unsigned __int128, vector unsigned __int128);
-@end smallexample
+vector float vec_vsubfp (vector float, vector float);
+vector signed char vec_vsubsbs (vector bool char, vector signed char);
+vector signed char vec_vsubsbs (vector signed char, vector bool char);
+vector signed char vec_vsubsbs (vector signed char, vector signed char);
-The following instances are extensions of the existing overloaded built-ins
-@code{vec_sld}, @code{vec_sldw}, @code{vec_slo}, @code{vec_sro}, and
-@code{vec_srl} that are documented in the PVIPR.
+vector signed short vec_vsubshs (vector bool short, vector signed short);
+vector signed short vec_vsubshs (vector signed short, vector bool short);
+vector signed short vec_vsubshs (vector signed short, vector signed short);
-@smallexample
-@exdent vector signed __int128 vec_sld (vector signed __int128,
-vector signed __int128, const unsigned int);
-@exdent vector unsigned __int128 vec_sld (vector unsigned __int128,
-vector unsigned __int128, const unsigned int);
-@exdent vector signed __int128 vec_sldw (vector signed __int128,
-vector signed __int128, const unsigned int);
-@exdent vector unsigned __int128 vec_sldw (vector unsigned __int128,
-vector unsigned __int128, const unsigned int);
-@exdent vector signed __int128 vec_slo (vector signed __int128,
-vector signed char);
-@exdent vector signed __int128 vec_slo (vector signed __int128,
-vector unsigned char);
-@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
-vector signed char);
-@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
-vector unsigned char);
-@exdent vector signed __int128 vec_sro (vector signed __int128,
-vector signed char);
-@exdent vector signed __int128 vec_sro (vector signed __int128,
-vector unsigned char);
-@exdent vector unsigned __int128 vec_sro (vector unsigned __int128,
-vector signed char);
-@exdent vector unsigned __int128 vec_sro (vector unsigned __int128,
-vector unsigned char);
-@exdent vector signed __int128 vec_srl (vector signed __int128,
-vector unsigned char);
-@exdent vector unsigned __int128 vec_srl (vector unsigned __int128,
-vector unsigned char);
-@end smallexample
+vector signed int vec_vsubsws (vector bool int, vector signed int);
+vector signed int vec_vsubsws (vector signed int, vector bool int);
+vector signed int vec_vsubsws (vector signed int, vector signed int);
-@node PowerPC Hardware Transactional Memory Built-in Functions
-@subsection PowerPC Hardware Transactional Memory Built-in Functions
-GCC provides two interfaces for accessing the Hardware Transactional
-Memory (HTM) instructions available on some of the PowerPC family
-of processors (e.g., POWER8). The two interfaces are a low-level
-interface, consisting of built-in functions specific to PowerPC, and a
-higher-level interface, consisting of inline functions that are common
-between PowerPC and S/390.
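As a hedged sketch of the 128-bit comparison built-ins listed above
(wrapper name invented; assumes ISA 3.1 support):

@smallexample
#include <altivec.h>

/* Illustrative only: vec_all_eq reduces the pairwise comparison of
   the single 128-bit elements to a scalar truth value.  */
int
quadwords_equal (vector unsigned __int128 a, vector unsigned __int128 b)
@{
  return vec_all_eq (a, b);
@}
@end smallexample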
+vector signed char vec_vsububm (vector bool char, vector signed char); +vector signed char vec_vsububm (vector signed char, vector bool char); +vector signed char vec_vsububm (vector signed char, vector signed char); +vector unsigned char vec_vsububm (vector bool char, vector unsigned char); +vector unsigned char vec_vsububm (vector unsigned char, vector bool char); +vector unsigned char vec_vsububm (vector unsigned char, vector unsigned char); -@subsubsection PowerPC HTM Low Level Built-in Functions +vector unsigned char vec_vsububs (vector bool char, vector unsigned char); +vector unsigned char vec_vsububs (vector unsigned char, vector bool char); +vector unsigned char vec_vsububs (vector unsigned char, vector unsigned char); -The following low level built-in functions are available with -@option{-mhtm} or @option{-mcpu=CPU} where CPU is `power8' or later. -They all generate the machine instruction that is part of the name. +vector signed short vec_vsubuhm (vector bool short, vector signed short); +vector signed short vec_vsubuhm (vector signed short, vector bool short); +vector signed short vec_vsubuhm (vector signed short, vector signed short); +vector unsigned short vec_vsubuhm (vector bool short, vector unsigned short); +vector unsigned short vec_vsubuhm (vector unsigned short, vector bool short); +vector unsigned short vec_vsubuhm (vector unsigned short, vector unsigned short); -The HTM builtins (with the exception of @code{__builtin_tbegin}) return -the full 4-bit condition register value set by their associated hardware -instruction. The header file @code{htmintrin.h} defines some macros that can -be used to decipher the return value. The @code{__builtin_tbegin} builtin -returns a simple @code{true} or @code{false} value depending on whether a transaction was -successfully started or not. The arguments of the builtins match exactly the -type and order of the associated hardware instruction's operands, except for -the @code{__builtin_tcheck} builtin, which does not take any input arguments. -Refer to the ISA manual for a description of each instruction's operands. 
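A complete retry loop appears in the example later in this section; as a
minimal hedged sketch of the basic @code{__builtin_tbegin}/@code{__builtin_tend}
pattern (function name invented; assumes @option{-mhtm}):

@smallexample
#include <htmintrin.h>

/* Illustrative only: returns 1 if the body ran and committed
   transactionally, 0 if the transaction failed to start or aborted.  */
static int
run_once_transactionally (void)
@{
  if (__builtin_tbegin (0))
    @{
      /* ... transactional body ... */
      __builtin_tend (0);
      return 1;
    @}
  return 0;
@}
@end smallexample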
+vector unsigned short vec_vsubuhs (vector bool short, vector unsigned short);
+vector unsigned short vec_vsubuhs (vector unsigned short, vector bool short);
+vector unsigned short vec_vsubuhs (vector unsigned short, vector unsigned short);
-@smallexample
-unsigned int __builtin_tbegin (unsigned int);
-unsigned int __builtin_tend (unsigned int);
+vector signed int vec_vsubuwm (vector bool int, vector signed int);
+vector signed int vec_vsubuwm (vector signed int, vector bool int);
+vector signed int vec_vsubuwm (vector signed int, vector signed int);
+vector unsigned int vec_vsubuwm (vector bool int, vector unsigned int);
+vector unsigned int vec_vsubuwm (vector unsigned int, vector bool int);
+vector unsigned int vec_vsubuwm (vector unsigned int, vector unsigned int);
-unsigned int __builtin_tabort (unsigned int);
-unsigned int __builtin_tabortdc (unsigned int, unsigned int, unsigned int);
-unsigned int __builtin_tabortdci (unsigned int, unsigned int, int);
-unsigned int __builtin_tabortwc (unsigned int, unsigned int, unsigned int);
-unsigned int __builtin_tabortwci (unsigned int, unsigned int, int);
+vector unsigned int vec_vsubuws (vector bool int, vector unsigned int);
+vector unsigned int vec_vsubuws (vector unsigned int, vector bool int);
+vector unsigned int vec_vsubuws (vector unsigned int, vector unsigned int);
-unsigned int __builtin_tcheck (void);
-unsigned int __builtin_treclaim (unsigned int);
-unsigned int __builtin_trechkpt (void);
-unsigned int __builtin_tsr (unsigned int);
-@end smallexample
+vector signed int vec_vsum4sbs (vector signed char, vector signed int);
-In addition to the above HTM built-ins, we have added built-ins for
-some common extended mnemonics of the HTM instructions:
+vector signed int vec_vsum4shs (vector signed short, vector signed int);
-@smallexample
-unsigned int __builtin_tendall (void);
-unsigned int __builtin_tresume (void);
-unsigned int __builtin_tsuspend (void);
-@end smallexample
+vector unsigned int vec_vsum4ubs (vector unsigned char, vector unsigned int);
-Note that the semantics of the above HTM builtins are required to mimic
-the locking semantics used for critical sections. Builtins that are used
-to create a new transaction or restart a suspended transaction must have
-lock-acquisition-like semantics, while those builtins that end or suspend a
-transaction must have lock-release-like semantics. Specifically, this must
-mimic lock semantics as specified by C++11, for example: Lock acquisition is
-as-if an execution of __atomic_exchange_n(&globallock,1,__ATOMIC_ACQUIRE)
-that returns 0, and lock release is as-if an execution of
-__atomic_store(&globallock,0,__ATOMIC_RELEASE), with globallock being an
-implicit implementation-defined lock used for all transactions. The HTM
-instructions associated with the builtins inherently provide the
-correct acquisition and release hardware barriers required. However,
-the compiler must also be prohibited from moving loads and stores across
-the builtins in a way that would violate their semantics. This has been
-accomplished by adding memory barriers to the associated HTM instructions
-(which is a conservative approach to provide acquire and release semantics).
-Earlier versions of the compiler did not treat the HTM instructions as
-memory barriers. A @code{__TM_FENCE__} macro has been added, which can
-be used to determine whether the current compiler treats HTM instructions
-as memory barriers or not.
This allows the user to explicitly add memory
-barriers to their code when using an older version of the compiler.
+vector unsigned int vec_vupkhpx (vector pixel);
+
+vector bool short vec_vupkhsb (vector bool char);
+vector signed short vec_vupkhsb (vector signed char);
+
+vector bool int vec_vupkhsh (vector bool short);
+vector signed int vec_vupkhsh (vector signed short);
-The following set of built-in functions are available to gain access
-to the HTM specific special purpose registers.
+vector unsigned int vec_vupklpx (vector pixel);
-@smallexample
-unsigned long __builtin_get_texasr (void);
-unsigned long __builtin_get_texasru (void);
-unsigned long __builtin_get_tfhar (void);
-unsigned long __builtin_get_tfiar (void);
+vector bool short vec_vupklsb (vector bool char);
+vector signed short vec_vupklsb (vector signed char);
-void __builtin_set_texasr (unsigned long);
-void __builtin_set_texasru (unsigned long);
-void __builtin_set_tfhar (unsigned long);
-void __builtin_set_tfiar (unsigned long);
+vector bool int vec_vupklsh (vector bool short);
+vector signed int vec_vupklsh (vector signed short);
 @end smallexample
-Example usage of these low level built-in functions may look like:
+@node PowerPC AltiVec Built-in Functions Available on ISA 2.06
+@subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.06
-@smallexample
-#include <htmintrin.h>
+The AltiVec built-in functions described in this section are
+available on the PowerPC family of processors starting with ISA 2.06
+or later. These are normally enabled by adding @option{-mvsx} to the
+command line.
-int num_retries = 10;
+When @option{-mvsx} is used, the following additional vector types are
+implemented.
-while (1)
-  @{
-    if (__builtin_tbegin (0))
-      @{
-        /* Transaction State Initiated. */
-        if (is_locked (lock))
-          __builtin_tabort (0);
-        ... transaction code...
-        __builtin_tend (0);
-        break;
-      @}
-    else
-      @{
-        /* Transaction State Failed.  Use locks if the transaction
-           failure is "persistent" or we've tried too many times. */
-        if (num_retries-- <= 0
-            || _TEXASRU_FAILURE_PERSISTENT (__builtin_get_texasru ()))
-          @{
-            acquire_lock (lock);
-            ... non transactional fallback path...
-            release_lock (lock);
-            break;
-          @}
-      @}
-  @}
+@smallexample
+vector unsigned __int128
+vector signed __int128
+vector unsigned long long int
+vector signed long long int
+vector double
 @end smallexample
-One final built-in function has been added that returns the value of
-the 2-bit Transaction State field of the Machine Status Register (MSR)
-as stored in @code{CR0}.
+The long long types are only implemented for 64-bit code generation.
+
+Only functions excluded from the PVIPR are listed here.
 @smallexample
-unsigned long __builtin_ttest (void);
-@end smallexample
+void vec_dst (const unsigned long *, int, const int);
+void vec_dst (const long *, int, const int);
-This built-in can be used to determine the current transaction state
-using the following code example:
+void vec_dststt (const unsigned long *, int, const int);
+void vec_dststt (const long *, int, const int);
-@smallexample
-#include <htmintrin.h>
+void vec_dstt (const unsigned long *, int, const int);
+void vec_dstt (const long *, int, const int);
-unsigned char tx_state = _HTM_STATE (__builtin_ttest ());
+vector unsigned char vec_lvsl (int, const unsigned long *);
+vector unsigned char vec_lvsl (int, const long *);
-if (tx_state == _HTM_TRANSACTIONAL)
-  @{
-    /* Code to use in transactional state. */
-  @}
-else if (tx_state == _HTM_NONTRANSACTIONAL)
-  @{
-    /* Code to use in non-transactional state. */
-  @}
-else if (tx_state == _HTM_SUSPENDED)
-  @{
-    /* Code to use in transaction suspended state. */
-  @}
-@end smallexample
+vector unsigned char vec_lvsr (int, const unsigned long *);
+vector unsigned char vec_lvsr (int, const long *);
-@subsubsection PowerPC HTM High Level Inline Functions
+vector unsigned char vec_lvsl (int, const double *);
+vector unsigned char vec_lvsr (int, const double *);
-The following high level HTM interface is made available by including
-@code{htmxlintrin.h} and using @option{-mhtm} or @option{-mcpu=CPU}
-where CPU is `power8' or later. This interface is common between PowerPC
-and S/390, allowing users to write one HTM source implementation that
-can be compiled and executed on either system.
+vector double vec_vsx_ld (int, const vector double *);
+vector double vec_vsx_ld (int, const double *);
+vector float vec_vsx_ld (int, const vector float *);
+vector float vec_vsx_ld (int, const float *);
+vector bool int vec_vsx_ld (int, const vector bool int *);
+vector signed int vec_vsx_ld (int, const vector signed int *);
+vector signed int vec_vsx_ld (int, const int *);
+vector signed int vec_vsx_ld (int, const long *);
+vector unsigned int vec_vsx_ld (int, const vector unsigned int *);
+vector unsigned int vec_vsx_ld (int, const unsigned int *);
+vector unsigned int vec_vsx_ld (int, const unsigned long *);
+vector bool short vec_vsx_ld (int, const vector bool short *);
+vector pixel vec_vsx_ld (int, const vector pixel *);
+vector signed short vec_vsx_ld (int, const vector signed short *);
+vector signed short vec_vsx_ld (int, const short *);
+vector unsigned short vec_vsx_ld (int, const vector unsigned short *);
+vector unsigned short vec_vsx_ld (int, const unsigned short *);
+vector bool char vec_vsx_ld (int, const vector bool char *);
+vector signed char vec_vsx_ld (int, const vector signed char *);
+vector signed char vec_vsx_ld (int, const signed char *);
+vector unsigned char vec_vsx_ld (int, const vector unsigned char *);
+vector unsigned char vec_vsx_ld (int, const unsigned char *);
-@smallexample
-long __TM_simple_begin (void);
-long __TM_begin (void* const TM_buff);
-long __TM_end (void);
-void __TM_abort (void);
-void __TM_named_abort (unsigned char const code);
-void __TM_resume (void);
-void __TM_suspend (void);
+void vec_vsx_st (vector double, int, vector double *);
+void vec_vsx_st (vector double, int, double *);
+void vec_vsx_st (vector float, int, vector float *);
+void vec_vsx_st (vector float, int, float *);
+void vec_vsx_st (vector signed int, int, vector signed int *);
+void vec_vsx_st (vector signed int, int, int *);
+void vec_vsx_st (vector unsigned int, int, vector unsigned int *);
+void vec_vsx_st (vector unsigned int, int, unsigned int *);
+void vec_vsx_st (vector bool int, int, vector bool int *);
+void vec_vsx_st (vector bool int, int, unsigned int *);
+void vec_vsx_st (vector bool int, int, int *);
+void vec_vsx_st (vector signed short, int, vector signed short *);
+void vec_vsx_st (vector signed short, int, short *);
+void vec_vsx_st (vector unsigned short, int, vector unsigned short *);
+void vec_vsx_st (vector unsigned short, int, unsigned short *);
+void vec_vsx_st (vector bool short, int, vector bool short *);
+void vec_vsx_st (vector bool short, int, unsigned short *);
+void vec_vsx_st (vector pixel, int, vector pixel *);
+void vec_vsx_st (vector pixel, int, unsigned short *);
+void vec_vsx_st (vector pixel, int, short *);
+void vec_vsx_st (vector bool short, int, short *);
+void vec_vsx_st (vector signed char, int, vector signed char *);
+void vec_vsx_st (vector signed char, int, signed char *);
+void vec_vsx_st (vector unsigned char, int, vector unsigned char *);
+void vec_vsx_st (vector unsigned char, int, unsigned char *);
+void vec_vsx_st (vector bool char, int, vector bool char *);
+void vec_vsx_st (vector bool char, int, unsigned char *);
+void vec_vsx_st (vector bool char, int, signed char *);
-long __TM_is_user_abort (void* const TM_buff);
-long __TM_is_named_user_abort (void* const TM_buff, unsigned char *code);
-long __TM_is_illegal (void* const TM_buff);
-long __TM_is_footprint_exceeded (void* const TM_buff);
-long __TM_nesting_depth (void* const TM_buff);
-long __TM_is_nested_too_deep (void* const TM_buff);
-long __TM_is_conflict (void* const TM_buff);
-long __TM_is_failure_persistent (void* const TM_buff);
-long __TM_failure_address (void* const TM_buff);
-long long __TM_failure_code (void* const TM_buff);
+vector double vec_xxpermdi (vector double, vector double, const int);
+vector float vec_xxpermdi (vector float, vector float, const int);
+vector __int128 vec_xxpermdi (vector __int128,
+                              vector __int128, const int);
+vector __uint128 vec_xxpermdi (vector __uint128,
+                               vector __uint128, const int);
+vector long long vec_xxpermdi (vector long long, vector long long, const int);
+vector unsigned long long vec_xxpermdi (vector unsigned long long,
+                                        vector unsigned long long, const int);
+vector int vec_xxpermdi (vector int, vector int, const int);
+vector unsigned int vec_xxpermdi (vector unsigned int,
+                                  vector unsigned int, const int);
+vector short vec_xxpermdi (vector short, vector short, const int);
+vector unsigned short vec_xxpermdi (vector unsigned short,
+                                    vector unsigned short, const int);
+vector signed char vec_xxpermdi (vector signed char, vector signed char,
+                                 const int);
+vector unsigned char vec_xxpermdi (vector unsigned char,
+                                   vector unsigned char, const int);
+
+vector double vec_xxsldi (vector double, vector double, int);
+vector float vec_xxsldi (vector float, vector float, int);
+vector long long vec_xxsldi (vector long long, vector long long, int);
+vector unsigned long long vec_xxsldi (vector unsigned long long,
+                                      vector unsigned long long, int);
+vector int vec_xxsldi (vector int, vector int, int);
+vector unsigned int vec_xxsldi (vector unsigned int, vector unsigned int, int);
+vector short vec_xxsldi (vector short, vector short, int);
+vector unsigned short vec_xxsldi (vector unsigned short,
+                                  vector unsigned short, int);
+vector signed char vec_xxsldi (vector signed char, vector signed char, int);
+vector unsigned char vec_xxsldi (vector unsigned char,
+                                 vector unsigned char, int);
 @end smallexample
-Using this common set of HTM inline functions, we can create
-a more portable version of the HTM example in the previous
-section that will work on either PowerPC or S/390:
+Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
+generate the AltiVec @samp{LVX} and @samp{STVX} instructions even
+if the VSX instruction set is available. The @samp{vec_vsx_ld} and
+@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
+@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
 @smallexample
-#include <htmxlintrin.h>
-
-int num_retries = 10;
-TM_buff_type TM_buff;
-
-while (1)
-  @{
-    if (__TM_begin (TM_buff) == _HTM_TBEGIN_STARTED)
-      @{
-        /* Transaction State Initiated. */
-        if (is_locked (lock))
-          __TM_abort ();
-        ... transaction code...
-        __TM_end ();
-        break;
-      @}
-    else
-      @{
-        /* Transaction State Failed.  Use locks if the transaction
-           failure is "persistent" or we've tried too many times. */
-        if (num_retries-- <= 0
-            || __TM_is_failure_persistent (TM_buff))
-          @{
-            acquire_lock (lock);
-            ... non transactional fallback path...
-            release_lock (lock);
-            break;
-          @}
-      @}
-  @}
+vector signed long long vec_signedo (vector float);
+vector signed long long vec_signede (vector float);
+vector unsigned long long vec_unsignedo (vector float);
+vector unsigned long long vec_unsignede (vector float);
 @end smallexample
-@node PowerPC Atomic Memory Operation Functions
-@subsection PowerPC Atomic Memory Operation Functions
-ISA 3.0 of the PowerPC added new atomic memory operation (amo)
-instructions. GCC provides support for these instructions in 64-bit
-environments. All of the functions are declared in the include file
-@code{amo.h}.
+The overloaded built-ins @code{vec_signedo} and @code{vec_signede} are
+additional extensions to the built-ins as documented in the PVIPR.
+
+@node PowerPC AltiVec Built-in Functions Available on ISA 2.07
+@subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.07
-The functions supported are:
+If the ISA 2.07 additions to the vector/scalar (power8-vector)
+instruction set are available, the following additional functions are
+available for both 32-bit and 64-bit targets. For 64-bit targets, you
+can use @var{vector long} instead of @var{vector long long},
+@var{vector bool long} instead of @var{vector bool long long}, and
+@var{vector unsigned long} instead of @var{vector unsigned long long}.
+
+Only functions excluded from the PVIPR are listed here.
 @smallexample
-#include <amo.h>
+vector long long vec_vaddudm (vector long long, vector long long);
+vector long long vec_vaddudm (vector bool long long, vector long long);
+vector long long vec_vaddudm (vector long long, vector bool long long);
+vector unsigned long long vec_vaddudm (vector unsigned long long,
+                                       vector unsigned long long);
+vector unsigned long long vec_vaddudm (vector bool long long,
+                                       vector unsigned long long);
+vector unsigned long long vec_vaddudm (vector unsigned long long,
+                                       vector bool long long);
-uint32_t amo_lwat_add (uint32_t *, uint32_t);
-uint32_t amo_lwat_xor (uint32_t *, uint32_t);
-uint32_t amo_lwat_ior (uint32_t *, uint32_t);
-uint32_t amo_lwat_and (uint32_t *, uint32_t);
-uint32_t amo_lwat_umax (uint32_t *, uint32_t);
-uint32_t amo_lwat_umin (uint32_t *, uint32_t);
-uint32_t amo_lwat_swap (uint32_t *, uint32_t);
+vector long long vec_vclz (vector long long);
+vector unsigned long long vec_vclz (vector unsigned long long);
+vector int vec_vclz (vector int);
+vector unsigned int vec_vclz (vector unsigned int);
+vector short vec_vclz (vector short);
+vector unsigned short vec_vclz (vector unsigned short);
+vector signed char vec_vclz (vector signed char);
+vector unsigned char vec_vclz (vector unsigned char);
-int32_t amo_lwat_sadd (int32_t *, int32_t);
-int32_t amo_lwat_smax (int32_t *, int32_t);
-int32_t amo_lwat_smin (int32_t *, int32_t);
-int32_t amo_lwat_sswap (int32_t *, int32_t);
+vector signed char vec_vclzb (vector signed char);
+vector unsigned char vec_vclzb (vector unsigned char);
-uint64_t amo_ldat_add (uint64_t *, uint64_t);
-uint64_t amo_ldat_xor (uint64_t *, uint64_t);
-uint64_t amo_ldat_ior (uint64_t *, uint64_t);
-uint64_t amo_ldat_and (uint64_t *, uint64_t);
-uint64_t amo_ldat_umax (uint64_t *, uint64_t);
-uint64_t amo_ldat_umin (uint64_t *, uint64_t);
-uint64_t amo_ldat_swap (uint64_t *, uint64_t);
+vector long long vec_vclzd (vector long long);
+vector unsigned long long vec_vclzd (vector unsigned long long);
-int64_t amo_ldat_sadd (int64_t *, int64_t);
-int64_t amo_ldat_smax (int64_t *, int64_t);
-int64_t amo_ldat_smin (int64_t *, int64_t);
-int64_t amo_ldat_sswap (int64_t *, int64_t);
+vector short vec_vclzh (vector short);
+vector unsigned short vec_vclzh (vector unsigned short);
-void amo_stwat_add (uint32_t *, uint32_t);
-void amo_stwat_xor (uint32_t *, uint32_t);
-void amo_stwat_ior (uint32_t *, uint32_t);
-void amo_stwat_and (uint32_t *, uint32_t);
-void amo_stwat_umax (uint32_t *, uint32_t);
-void amo_stwat_umin (uint32_t *, uint32_t);
+vector int vec_vclzw (vector int);
+vector unsigned int vec_vclzw (vector unsigned int);
-void amo_stwat_sadd (int32_t *, int32_t);
-void amo_stwat_smax (int32_t *, int32_t);
-void amo_stwat_smin (int32_t *, int32_t);
+vector signed char vec_vgbbd (vector signed char);
+vector unsigned char vec_vgbbd (vector unsigned char);
-void amo_stdat_add (uint64_t *, uint64_t);
-void amo_stdat_xor (uint64_t *, uint64_t);
-void amo_stdat_ior (uint64_t *, uint64_t);
-void amo_stdat_and (uint64_t *, uint64_t);
-void amo_stdat_umax (uint64_t *, uint64_t);
-void amo_stdat_umin (uint64_t *, uint64_t);
+vector long long vec_vmaxsd (vector long long, vector long long);
-void amo_stdat_sadd (int64_t *, int64_t);
-void amo_stdat_smax (int64_t *, int64_t);
-void amo_stdat_smin (int64_t *, int64_t);
-@end smallexample
+vector unsigned long long vec_vmaxud (vector unsigned long long,
+                                      vector unsigned long long);
-@node PowerPC Matrix-Multiply Assist Built-in Functions
-@subsection PowerPC Matrix-Multiply Assist Built-in Functions
-ISA 3.1 of the PowerPC added new Matrix-Multiply Assist (MMA) instructions.
-GCC provides support for these instructions through the following built-in
-functions which are enabled with the @option{-mmma} option. The @code{vec_t}
-type below is defined to be a normal vector unsigned char type. The
-@code{uint2}, @code{uint4} and @code{uint8} parameters are 2-bit, 4-bit and
-8-bit unsigned integer constants respectively. The compiler will verify that
-they are constants and that their values are within range.
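As a hedged sketch of how the accumulator-based MMA built-ins fit
together (function name invented; assumes @option{-mcpu=power10} and
@option{-mmma}):

@smallexample
#include <altivec.h>

typedef vector unsigned char vec_t;

/* Illustrative only: zero an accumulator, perform one rank-1
   single-precision update, then copy the four result rows out.  */
void
ger_once (vec_t a, vec_t b, vec_t result[4])
@{
  __vector_quad acc;
  __builtin_mma_xxsetaccz (&acc);
  __builtin_mma_xvf32gerpp (&acc, a, b);
  __builtin_mma_disassemble_acc (result, &acc);
@}
@end smallexample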
+vector long long vec_vminsd (vector long long, vector long long);
-The built-in functions supported are:
+vector unsigned long long vec_vminud (vector unsigned long long,
+                                      vector unsigned long long);
-@smallexample
-void __builtin_mma_xvi4ger8 (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvi8ger4 (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvi16ger2 (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvi16ger2s (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvf16ger2 (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvbf16ger2 (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvf32ger (__vector_quad *, vec_t, vec_t);
+vector int vec_vpksdss (vector long long, vector long long);
+vector unsigned int vec_vpksdss (vector long long, vector long long);
-void __builtin_mma_xvi4ger8pp (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvi8ger4pp (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvi8ger4spp (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvi16ger2pp (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvi16ger2spp (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvf16ger2pp (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvf16ger2pn (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvf16ger2np (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvf16ger2nn (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvbf16ger2pp (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvbf16ger2pn (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvbf16ger2np (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvbf16ger2nn (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvf32gerpp (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvf32gerpn (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvf32gernp (__vector_quad *, vec_t, vec_t);
-void __builtin_mma_xvf32gernn (__vector_quad *, vec_t, vec_t);
+vector unsigned int vec_vpkudus (vector unsigned long long,
+                                 vector unsigned long long);
-void __builtin_mma_pmxvi4ger8 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint8);
-void __builtin_mma_pmxvi4ger8pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint8);
+vector int vec_vpkudum (vector long long, vector long long);
+vector unsigned int vec_vpkudum (vector unsigned long long,
+                                 vector unsigned long long);
+vector bool int vec_vpkudum (vector bool long long, vector bool long long);
-void __builtin_mma_pmxvi8ger4 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
-void __builtin_mma_pmxvi8ger4pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
-void __builtin_mma_pmxvi8ger4spp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
+vector long long vec_vpopcnt (vector long long);
+vector unsigned long long vec_vpopcnt (vector unsigned long long);
+vector int vec_vpopcnt (vector int);
+vector unsigned int vec_vpopcnt (vector unsigned int);
+vector short vec_vpopcnt (vector short);
+vector unsigned short vec_vpopcnt (vector unsigned short);
+vector signed char vec_vpopcnt (vector signed char);
+vector unsigned char vec_vpopcnt (vector unsigned char);
-void __builtin_mma_pmxvi16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvi16ger2s (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvf16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvbf16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+vector signed char vec_vpopcntb (vector signed char);
+vector unsigned char vec_vpopcntb (vector unsigned char);
-void __builtin_mma_pmxvi16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvi16ger2spp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvf16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvf16ger2pn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvf16ger2np (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvf16ger2nn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvbf16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvbf16ger2pn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvbf16ger2np (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
-void __builtin_mma_pmxvbf16ger2nn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+vector long long vec_vpopcntd (vector long long);
+vector unsigned long long vec_vpopcntd (vector unsigned long long);
-void __builtin_mma_pmxvf32ger (__vector_quad *, vec_t, vec_t, uint4, uint4);
-void __builtin_mma_pmxvf32gerpp (__vector_quad *, vec_t, vec_t, uint4, uint4);
-void __builtin_mma_pmxvf32gerpn (__vector_quad *, vec_t, vec_t, uint4, uint4);
-void __builtin_mma_pmxvf32gernp (__vector_quad *, vec_t, vec_t, uint4, uint4);
-void __builtin_mma_pmxvf32gernn (__vector_quad *, vec_t, vec_t, uint4, uint4);
+vector short vec_vpopcnth (vector short);
+vector unsigned short vec_vpopcnth (vector unsigned short);
-void __builtin_mma_xvf64ger (__vector_quad *, __vector_pair, vec_t);
-void __builtin_mma_xvf64gerpp (__vector_quad *, __vector_pair, vec_t);
-void __builtin_mma_xvf64gerpn (__vector_quad *, __vector_pair, vec_t);
-void __builtin_mma_xvf64gernp (__vector_quad *, __vector_pair, vec_t);
-void __builtin_mma_xvf64gernn (__vector_quad *, __vector_pair, vec_t);
+vector int vec_vpopcntw (vector int);
+vector unsigned int vec_vpopcntw (vector unsigned int);
-void __builtin_mma_pmxvf64ger (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
-void __builtin_mma_pmxvf64gerpp (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
-void __builtin_mma_pmxvf64gerpn (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
-void __builtin_mma_pmxvf64gernp (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
-void __builtin_mma_pmxvf64gernn (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
+vector long long vec_vrld (vector long long, vector unsigned long long);
+vector unsigned long long vec_vrld (vector unsigned long long,
+                                    vector unsigned long long);
-void __builtin_mma_xxmtacc (__vector_quad *);
-void __builtin_mma_xxmfacc (__vector_quad *);
-void __builtin_mma_xxsetaccz (__vector_quad *);
+vector long long vec_vsld (vector long long, vector unsigned long long);
+vector unsigned long long vec_vsld (vector unsigned long long,
+                                    vector unsigned long long);
-void __builtin_mma_build_acc (__vector_quad *, vec_t, vec_t, vec_t, vec_t);
-void __builtin_mma_disassemble_acc (void *, __vector_quad *);
+vector long long vec_vsrad (vector long long, vector unsigned long long);
+vector unsigned long long vec_vsrad (vector unsigned long long,
+                                     vector unsigned long long);
-void __builtin_vsx_build_pair (__vector_pair *, vec_t, vec_t);
-void __builtin_vsx_disassemble_pair (void *, __vector_pair *);
+vector long long vec_vsrd (vector long long, vector unsigned long long);
+vector unsigned long long vec_vsrd (vector unsigned long long,
+                                    vector unsigned long long);
-vec_t __builtin_vsx_xvcvspbf16 (vec_t);
-vec_t __builtin_vsx_xvcvbf16spn (vec_t);
+vector long long vec_vsubudm (vector long long, vector long long);
+vector long long vec_vsubudm (vector bool long long, vector long long);
+vector long long vec_vsubudm (vector long long, vector bool long long);
+vector unsigned long long vec_vsubudm (vector unsigned long long,
+                                       vector unsigned long long);
+vector unsigned long long vec_vsubudm (vector bool long long,
+                                       vector unsigned long long);
+vector unsigned long long vec_vsubudm (vector unsigned long long,
+                                       vector bool long long);
-__vector_pair __builtin_vsx_lxvp (size_t, __vector_pair *);
-void __builtin_vsx_stxvp (__vector_pair, size_t, __vector_pair *);
+vector long long vec_vupkhsw (vector int);
+vector unsigned long long vec_vupkhsw (vector unsigned int);
+
+vector long long vec_vupklsw (vector int);
+vector unsigned long long vec_vupklsw (vector unsigned int);
 @end smallexample
-@node PRU Built-in Functions
-@subsection PRU Built-in Functions
+If the ISA 2.07 additions to the vector/scalar (power8-vector)
+instruction set are available, the following additional functions are
+available for 64-bit targets. New vector types
+(@code{vector __int128} and @code{vector __uint128}) are available
+to hold the @code{__int128} and @code{__uint128} values used by these
+built-ins.
-GCC provides a couple of special built-in functions to aid in utilizing
-special PRU instructions.
+The normal vector extract and set operations work on
+@code{vector __int128} and @code{vector __uint128} types,
+but the index value must be 0.
-The built-in functions supported are:
+Only functions excluded from the PVIPR are listed here.
-@defbuiltin{void __delay_cycles (constant long long @var{cycles})}
-This inserts an instruction sequence that takes exactly @var{cycles}
-cycles (between 0 and 0xffffffff) to complete. The inserted sequence
-may use jumps, loops, or no-ops, and does not interfere with any other
-instructions. Note that @var{cycles} must be a compile-time constant
-integer; that is, you must pass a number, not a variable that may be
-optimized to a constant later. The number of cycles delayed by this
-builtin is exact.
-@enddefbuiltin
+@smallexample
+vector __int128 vec_vaddcuq (vector __int128, vector __int128);
+vector __uint128 vec_vaddcuq (vector __uint128, vector __uint128);
-@defbuiltin{void __halt (void)}
-This inserts a HALT instruction to stop processor execution.
-@enddefbuiltin
+vector __int128 vec_vadduqm (vector __int128, vector __int128);
+vector __uint128 vec_vadduqm (vector __uint128, vector __uint128);
+
+vector __int128 vec_vaddecuq (vector __int128, vector __int128,
+                              vector __int128);
+vector __uint128 vec_vaddecuq (vector __uint128, vector __uint128,
+                               vector __uint128);
+
+vector __int128 vec_vaddeuqm (vector __int128, vector __int128,
+                              vector __int128);
+vector __uint128 vec_vaddeuqm (vector __uint128, vector __uint128,
+                               vector __uint128);
-@defbuiltin{{unsigned int} @
-            __lmbd (unsigned int @var{wordval}, @
-                    unsigned int @var{bitval})}
-This inserts an LMBD instruction to calculate the left-most bit with value
-@var{bitval} in value @var{wordval}. Only the least significant bit
-of @var{bitval} is taken into account.
-@enddefbuiltin
+vector __int128 vec_vsubecuq (vector __int128, vector __int128,
+                              vector __int128);
+vector __uint128 vec_vsubecuq (vector __uint128, vector __uint128,
+                               vector __uint128);
-@node RISC-V Built-in Functions
-@subsection RISC-V Built-in Functions
+vector __int128 vec_vsubeuqm (vector __int128, vector __int128,
+                              vector __int128);
+vector __uint128 vec_vsubeuqm (vector __uint128, vector __uint128,
+                               vector __uint128);
-These built-in functions are available for the RISC-V family of
-processors.
+vector __int128 vec_vsubcuq (vector __int128, vector __int128);
+vector __uint128 vec_vsubcuq (vector __uint128, vector __uint128);
-@defbuiltin{{void *} __builtin_thread_pointer (void)}
-Returns the value that is currently set in the @samp{tp} register.
-@enddefbuiltin
+vector __int128 vec_vsubuqm (vector __int128, vector __int128);
+vector __uint128 vec_vsubuqm (vector __uint128, vector __uint128);
-@defbuiltin{void __builtin_riscv_pause (void)}
-Generates the @code{pause} (hint) machine instruction. If the target implements
-the Zihintpause extension, it indicates that the current hart should be
-temporarily paused or slowed down.
-@enddefbuiltin
+vector __int128 __builtin_bcdadd (vector __int128, vector __int128, const int);
+vector unsigned char __builtin_bcdadd (vector unsigned char, vector unsigned char,
+                                       const int);
+int __builtin_bcdadd_lt (vector __int128, vector __int128, const int);
+int __builtin_bcdadd_lt (vector unsigned char, vector unsigned char, const int);
+int __builtin_bcdadd_eq (vector __int128, vector __int128, const int);
+int __builtin_bcdadd_eq (vector unsigned char, vector unsigned char, const int);
+int __builtin_bcdadd_gt (vector __int128, vector __int128, const int);
+int __builtin_bcdadd_gt (vector unsigned char, vector unsigned char, const int);
+int __builtin_bcdadd_ov (vector __int128, vector __int128, const int);
+int __builtin_bcdadd_ov (vector unsigned char, vector unsigned char, const int);
-@node RISC-V Vector Intrinsics
-@subsection RISC-V Vector Intrinsics
+vector __int128 __builtin_bcdsub (vector __int128, vector __int128, const int);
+vector unsigned char __builtin_bcdsub (vector unsigned char, vector unsigned char,
+                                       const int);
+int __builtin_bcdsub_le (vector __int128, vector __int128, const int);
+int __builtin_bcdsub_le (vector unsigned char, vector unsigned char, const int);
+int __builtin_bcdsub_lt (vector __int128, vector __int128, const int);
+int __builtin_bcdsub_lt (vector unsigned char, vector unsigned char, const int);
+int __builtin_bcdsub_eq (vector __int128, vector __int128, const int);
+int __builtin_bcdsub_eq (vector unsigned char, vector unsigned char, const int);
+int __builtin_bcdsub_gt (vector __int128, vector __int128, const int);
+int __builtin_bcdsub_gt (vector unsigned char, vector unsigned char, const int);
+int __builtin_bcdsub_ge (vector __int128, vector __int128, const int);
+int __builtin_bcdsub_ge (vector unsigned char, vector unsigned char, const int);
+int __builtin_bcdsub_ov (vector __int128, vector __int128, const int);
+int __builtin_bcdsub_ov (vector unsigned char, vector unsigned char, const int);
+@end smallexample
-GCC supports vector intrinsics as specified in version 0.11 of the RISC-V
-vector intrinsic specification, which is available at the following link:
-@uref{https://github.com/riscv-non-isa/rvv-intrinsic-doc/tree/v0.11.x}.
-All of these functions are declared in the include file @file{riscv_vector.h}.
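As a hedged sketch of the BCD built-ins listed above (function name
invented; assumes @option{-mcpu=power8} or later; the final constant
argument selects the preferred-sign encoding):

@smallexample
#include <altivec.h>

/* Illustrative only: add two signed BCD values held in vectors and
   report overflow through *ovf.  */
vector unsigned char
bcd_add_checked (vector unsigned char a, vector unsigned char b, int *ovf)
@{
  *ovf = __builtin_bcdadd_ov (a, b, 0);
  return __builtin_bcdadd (a, b, 0);
@}
@end smallexample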
+@node PowerPC AltiVec Built-in Functions Available on ISA 3.0
+@subsubsection PowerPC AltiVec Built-in Functions Available on ISA 3.0
-@node CORE-V Built-in Functions
-@subsection CORE-V Built-in Functions
-For more information on all CORE-V built-ins, please see
-@uref{https://github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md}
+The following additional built-in functions are also available for the
+PowerPC family of processors, starting with ISA 3.0
+(@option{-mcpu=power9}) or later.
-These built-in functions are available for the CORE-V MAC machine
-architecture. For the multiply-accumulate built-ins specifically, see
-@uref{https://github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md#listing-of-multiply-accumulate-builtins-xcvmac}.
+Only functions excluded from the PVIPR are listed here.
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_mac (int32_t, int32_t, int32_t)
-Generates the @code{cv.mac} machine instruction.
-@end deftypefn
+@smallexample
+unsigned int scalar_extract_exp (double source);
+unsigned long long int scalar_extract_exp (__ieee128 source);
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_msu (int32_t, int32_t, int32_t)
-Generates the @code{cv.msu} machine instruction.
-@end deftypefn
+unsigned long long int scalar_extract_sig (double source);
+unsigned __int128 scalar_extract_sig (__ieee128 source);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_muluN (uint32_t, uint32_t, uint8_t)
-Generates the @code{cv.muluN} machine instruction.
-@end deftypefn
+double scalar_insert_exp (unsigned long long int significand,
+                          unsigned long long int exponent);
+double scalar_insert_exp (double significand, unsigned long long int exponent);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_mulhhuN (uint32_t, uint32_t, uint8_t)
-Generates the @code{cv.mulhhuN} machine instruction.
-@end deftypefn
+__ieee128 scalar_insert_exp (unsigned __int128 significand,
+                             unsigned long long int exponent);
+__ieee128 scalar_insert_exp (__ieee128 significand, unsigned long long int exponent);
+vector __ieee128 scalar_insert_exp (vector unsigned __int128 significand,
+                                    vector unsigned long long exponent);
+vector unsigned long long scalar_extract_exp_to_vec (__ieee128);
+vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_mulsN (int32_t, int32_t, uint8_t)
-Generates the @code{cv.mulsN} machine instruction.
-@end deftypefn
+int scalar_cmp_exp_gt (double arg1, double arg2);
+int scalar_cmp_exp_lt (double arg1, double arg2);
+int scalar_cmp_exp_eq (double arg1, double arg2);
+int scalar_cmp_exp_unordered (double arg1, double arg2);
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_mulhhsN (int32_t, int32_t, uint8_t)
-Generates the @code{cv.mulhhsN} machine instruction.
-@end deftypefn
+bool scalar_test_data_class (float source, const int condition);
+bool scalar_test_data_class (double source, const int condition);
+bool scalar_test_data_class (__ieee128 source, const int condition);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_muluRN (uint32_t, uint32_t, uint8_t)
-Generates the @code{cv.muluRN} machine instruction.
-@end deftypefn
+bool scalar_test_neg (float source);
+bool scalar_test_neg (double source);
+bool scalar_test_neg (__ieee128 source);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_mulhhuRN (uint32_t, uint32_t, uint8_t)
-Generates the @code{cv.mulhhuRN} machine instruction.
-@end deftypefn
+The @code{scalar_extract_exp} function with a 64-bit source argument
+requires an environment supporting ISA 3.0 or later.
+The @code{scalar_extract_exp} function with a 128-bit source argument
+and the @code{scalar_extract_sig}
+functions require a 64-bit environment supporting ISA 3.0 or later.
+The @code{scalar_extract_exp} and @code{scalar_extract_sig} built-in
+functions return the biased exponent value and the significand
+respectively of their @code{source} arguments.
+When supplied with a 64-bit @code{source} argument, the
+result returned by @code{scalar_extract_sig} has
+the @code{0x0010000000000000} bit set if the
+function's @code{source} argument is in normalized form.
+Otherwise, this bit is set to 0.
+When supplied with a 128-bit @code{source} argument, the
+@code{0x00010000000000000000000000000000} bit of the result is
+treated similarly.
+Note that the sign of the significand is not represented in the result
+returned from the @code{scalar_extract_sig} function. Use the
+@code{scalar_test_neg} function to test the sign of its @code{double}
+argument.
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_mulsRN (int32_t, int32_t, uint8_t)
-Generates the @code{cv.mulsRN} machine instruction.
-@end deftypefn
+The @code{scalar_insert_exp}
+functions require a 64-bit environment supporting ISA 3.0 or later.
+When supplied with a 64-bit first argument, the
+@code{scalar_insert_exp} built-in function returns a double-precision
+floating point value that is constructed by assembling the values of its
+@code{significand} and @code{exponent} arguments. The sign of the
+result is copied from the most significant bit of the
+@code{significand} argument. The significand and exponent components
+of the result are composed of the least significant 11 bits of the
+@code{exponent} argument and the least significant 52 bits of the
+@code{significand} argument respectively.
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_mulhhsRN (int32_t, int32_t, uint8_t)
-Generates the @code{cv.mulhhsRN} machine instruction.
-@end deftypefn
+When supplied with a 128-bit first argument, the
+@code{scalar_insert_exp} built-in function returns a quad-precision
+IEEE floating point value if the two arguments are scalar. If the two
+arguments are vectors, the return value is a vector IEEE floating point value.
+The sign bit of the result is copied from the most significant bit of the
+@code{significand} argument. The significand and exponent components of the
+result are composed of the least significant 15 bits of the @code{exponent}
+argument (element 0 on big-endian and element 1 on little-endian) and the
+least significant 112 bits of the @code{significand} argument
+respectively. Note that @code{significand} here refers to the scalar
+argument or, in the case of vector arguments, to element 0 on big-endian
+and element 1 on little-endian.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_macuN (uint32_t, uint32_t, uint8_t)
-Generates the @code{cv.macuN} machine instruction.
+The @code{scalar_extract_exp_to_vec} and
+@code{scalar_extract_sig_to_vec} built-in functions are similar to
+@code{scalar_extract_exp} and @code{scalar_extract_sig}, except that
+they return vector results of type @code{vector unsigned long long}
+and @code{vector unsigned __int128}, respectively.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_machhuN (uint32_t, uint32_t, uint8_t)
-Generates the @code{cv.machhuN} machine instruction.
-@end deftypefn
+The @code{scalar_cmp_exp_gt}, @code{scalar_cmp_exp_lt},
+@code{scalar_cmp_exp_eq}, and @code{scalar_cmp_exp_unordered} built-in
+functions return a non-zero value if @code{arg1} is greater than, less
+than, equal to, or not comparable to @code{arg2}, respectively. The
+arguments are not comparable if either of them is a NaN (not a
+number).
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_macsN (int32_t, int32_t, uint8_t)
-Generates the @code{cv.macsN} machine instruction.
-@end deftypefn
+The @code{scalar_test_data_class} built-in function returns 1
+if any of the condition tests enabled by the value of the
+@code{condition} argument are true, and 0 otherwise. The
+@code{condition} argument must be a compile-time constant integer with
+value not exceeding 127. The
+@code{condition} argument is encoded as a bitmask with each bit
+enabling the testing of a different condition, as characterized by the
+following:
+@smallexample
+0x40 Test for NaN
+0x20 Test for +Infinity
+0x10 Test for -Infinity
+0x08 Test for +Zero
+0x04 Test for -Zero
+0x02 Test for +Denormal
+0x01 Test for -Denormal
+@end smallexample
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_machhsN (int32_t, int32_t, uint8_t)
-Generates the @code{cv.machhsN} machine instruction.
-@end deftypefn
+The @code{scalar_test_neg} built-in function returns 1 if its
+@code{source} argument holds a negative value, 0 otherwise.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_macuRN (uint32_t, uint32_t, uint8_t)
-Generates the @code{cv.macuRN} machine instruction.
-@end deftypefn
+The following built-in functions are also available for the PowerPC family
+of processors, starting with ISA 3.0 or later
+(@option{-mcpu=power9}). These string functions are described
+separately in order to group the descriptions closer to the function
+prototypes.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_machhuRN (uint32_t, uint32_t, uint8_t)
-Generates the @code{cv.machhuRN} machine instruction.
-@end deftypefn
+Only functions excluded from the PVIPR are listed here.
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_macsRN (int32_t, int32_t, uint8_t)
-Generates the @code{cv.macsRN} machine instruction.
-@end deftypefn
+@smallexample
+int vec_all_nez (vector signed char, vector signed char);
+int vec_all_nez (vector unsigned char, vector unsigned char);
+int vec_all_nez (vector signed short, vector signed short);
+int vec_all_nez (vector unsigned short, vector unsigned short);
+int vec_all_nez (vector signed int, vector signed int);
+int vec_all_nez (vector unsigned int, vector unsigned int);
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_machhsRN (int32_t, int32_t, uint8_t)
-Generates the @code{cv.machhsRN} machine instruction.
-@end deftypefn +int vec_any_eqz (vector signed char, vector signed char); +int vec_any_eqz (vector unsigned char, vector unsigned char); +int vec_any_eqz (vector signed short, vector signed short); +int vec_any_eqz (vector unsigned short, vector unsigned short); +int vec_any_eqz (vector signed int, vector signed int); +int vec_any_eqz (vector unsigned int, vector unsigned int); + +signed char vec_xlx (unsigned int index, vector signed char data); +unsigned char vec_xlx (unsigned int index, vector unsigned char data); +signed short vec_xlx (unsigned int index, vector signed short data); +unsigned short vec_xlx (unsigned int index, vector unsigned short data); +signed int vec_xlx (unsigned int index, vector signed int data); +unsigned int vec_xlx (unsigned int index, vector unsigned int data); +float vec_xlx (unsigned int index, vector float data); -These built-in functions are available for the CORE-V ALU machine -architecture. For more information on CORE-V built-ins, please see -@uref{https://github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md#listing-of-miscellaneous-alu-builtins-xcvalu} +signed char vec_xrx (unsigned int index, vector signed char data); +unsigned char vec_xrx (unsigned int index, vector unsigned char data); +signed short vec_xrx (unsigned int index, vector signed short data); +unsigned short vec_xrx (unsigned int index, vector unsigned short data); +signed int vec_xrx (unsigned int index, vector signed int data); +unsigned int vec_xrx (unsigned int index, vector unsigned int data); +float vec_xrx (unsigned int index, vector float data); +@end smallexample -@deftypefn {Built-in Function} {int} __builtin_riscv_cv_alu_slet (int32_t, int32_t) -Generated assembler @code{cv.slet} -@end deftypefn +The @code{vec_all_nez}, @code{vec_any_eqz}, and @code{vec_cmpnez} +perform pairwise comparisons between the elements at the same +positions within their two vector arguments. +The @code{vec_all_nez} function returns a +non-zero value if and only if all pairwise comparisons are not +equal and no element of either vector argument contains a zero. +The @code{vec_any_eqz} function returns a +non-zero value if and only if at least one pairwise comparison is equal +or if at least one element of either vector argument contains a zero. +The @code{vec_cmpnez} function returns a vector of the same type as +its two arguments, within which each element consists of all ones to +denote that either the corresponding elements of the incoming arguments are +not equal or that at least one of the corresponding elements contains +zero. Otherwise, the element of the returned vector contains all zeros. -@deftypefn {Built-in Function} {int} __builtin_riscv_cv_alu_sletu (uint32_t, uint32_t) -Generated assembler @code{cv.sletu} -@end deftypefn +The @code{vec_xlx} and @code{vec_xrx} functions extract the single +element selected by the @code{index} argument from the vector +represented by the @code{data} argument. The @code{index} argument +always specifies a byte offset, regardless of the size of the vector +element. With @code{vec_xlx}, @code{index} is the offset of the first +byte of the element to be extracted. With @code{vec_xrx}, @code{index} +represents the last byte of the element to be extracted, measured +from the right end of the vector. In other words, the last byte of +the element to be extracted is found at position @code{(15 - index)}. +There is no requirement that @code{index} be a multiple of the vector +element size. 
However, if the size of the vector element added to
+@code{index} is greater than 15, the content of the returned value is
+undefined.
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_min (int32_t, int32_t)
-Generated assembler @code{cv.min}
-@end deftypefn
+The following functions are also available if the ISA 3.0 instruction
+set additions (@option{-mcpu=power9}) are available.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_minu (uint32_t, uint32_t)
-Generated assembler @code{cv.minu}
-@end deftypefn
+Only functions excluded from the PVIPR are listed here.
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_max (int32_t, int32_t)
-Generated assembler @code{cv.max}
-@end deftypefn
+@smallexample
+vector long long vec_vctz (vector long long);
+vector unsigned long long vec_vctz (vector unsigned long long);
+vector int vec_vctz (vector int);
+vector unsigned int vec_vctz (vector unsigned int);
+vector short vec_vctz (vector short);
+vector unsigned short vec_vctz (vector unsigned short);
+vector signed char vec_vctz (vector signed char);
+vector unsigned char vec_vctz (vector unsigned char);
-@deftypefn {Built-in Function} {uint32_tnt} __builtin_riscv_cv_alu_maxu (uint32_t, uint32_t)
-Generated assembler @code{cv.maxu}
-@end deftypefn
+vector signed char vec_vctzb (vector signed char);
+vector unsigned char vec_vctzb (vector unsigned char);
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_exths (int16_t)
-Generated assembler @code{cv.exths}
-@end deftypefn
+vector long long vec_vctzd (vector long long);
+vector unsigned long long vec_vctzd (vector unsigned long long);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_exthz (uint16_t)
-Generated assembler @code{cv.exthz}
-@end deftypefn
+vector short vec_vctzh (vector short);
+vector unsigned short vec_vctzh (vector unsigned short);
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_extbs (int8_t)
-Generated assembler @code{cv.extbs}
-@end deftypefn
+vector int vec_vctzw (vector int);
+vector unsigned int vec_vctzw (vector unsigned int);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_extbz (uint8_t)
-Generated assembler @code{cv.extbz}
-@end deftypefn
+vector int vec_vprtyb (vector int);
+vector unsigned int vec_vprtyb (vector unsigned int);
+vector long long vec_vprtyb (vector long long);
+vector unsigned long long vec_vprtyb (vector unsigned long long);
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_clip (int32_t, uint32_t)
-Generated assembler @code{cv.clip} if the uint32_t operand is a constant and an exact power of 2.
-Generated assembler @code{cv.clipr} if the it is a register.
-@end deftypefn
+vector int vec_vprtybw (vector int);
+vector unsigned int vec_vprtybw (vector unsigned int);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_clipu (uint32_t, uint32_t)
-Generated assembler @code{cv.clipu} if the uint32_t operand is a constant and an exact power of 2.
-Generated assembler @code{cv.clipur} if the it is a register.
-@end deftypefn
+vector long long vec_vprtybd (vector long long);
+vector unsigned long long vec_vprtybd (vector unsigned long long);
+@end smallexample
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_addN (int32_t, int32_t, uint8_t)
-Generated assembler @code{cv.addN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31.
-Generated assembler @code{cv.addNr} if the it is a register.
-@end deftypefn
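+As an informal sketch (not part of the reference lists above), the
+following example counts trailing zeros in each word of a vector.  It
+assumes @option{-mcpu=power9}, @code{<altivec.h>}, and GCC's vector
+initializer and subscripting extensions:
+
+@smallexample
+#include <altivec.h>
+#include <stdio.h>
+
+int
+main (void)
+@{
+  vector unsigned int v = @{ 8, 12, 1, 0 @};
+  /* Count trailing zeros element by element:
+     8 -> 3, 12 -> 2, 1 -> 0, 0 -> 32.  */
+  vector unsigned int z = vec_vctz (v);
+  for (int i = 0; i < 4; i++)
+    printf ("%u -> %u\n", v[i], z[i]);
+  return 0;
+@}
+@end smallexample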
+On 64-bit targets, if the ISA 3.0 additions (@option{-mcpu=power9})
+are available:
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_adduN (uint32_t, uint32_t, uint8_t)
-Generated assembler @code{cv.adduN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31.
-Generated assembler @code{cv.adduNr} if the it is a register.
-@end deftypefn
+@smallexample
+vector long vec_vprtyb (vector long);
+vector unsigned long vec_vprtyb (vector unsigned long);
+vector __int128 vec_vprtyb (vector __int128);
+vector __uint128 vec_vprtyb (vector __uint128);
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_addRN (int32_t, int32_t, uint8_t)
-Generated assembler @code{cv.addRN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31.
-Generated assembler @code{cv.addRNr} if the it is a register.
-@end deftypefn
+vector long vec_vprtybd (vector long);
+vector unsigned long vec_vprtybd (vector unsigned long);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_adduRN (uint32_t, uint32_t, uint8_t)
-Generated assembler @code{cv.adduRN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31.
-Generated assembler @code{cv.adduRNr} if the it is a register.
-@end deftypefn
+vector __int128 vec_vprtybq (vector __int128);
+vector __uint128 vec_vprtybq (vector __uint128);
+@end smallexample
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_subN (int32_t, int32_t, uint8_t)
-Generated assembler @code{cv.subN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31.
-Generated assembler @code{cv.subNr} if the it is a register.
-@end deftypefn
+The following built-in functions are available for the PowerPC family
+of processors, starting with ISA 3.0 or later (@option{-mcpu=power9}).
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_subuN (uint32_t, uint32_t, uint8_t)
-Generated assembler @code{cv.subuN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31.
-Generated assembler @code{cv.subuNr} if the it is a register.
-@end deftypefn
+Only functions excluded from the PVIPR are listed here.
-@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_subRN (int32_t, int32_t, uint8_t)
-Generated assembler @code{cv.subRN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31.
-Generated assembler @code{cv.subRNr} if the it is a register.
-@end deftypefn
+@smallexample
+__vector unsigned char
+vec_absdb (__vector unsigned char arg1, __vector unsigned char arg2);
+__vector unsigned short
+vec_absdh (__vector unsigned short arg1, __vector unsigned short arg2);
+__vector unsigned int
+vec_absdw (__vector unsigned int arg1, __vector unsigned int arg2);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_subuRN (uint32_t, uint32_t, uint8_t)
-Generated assembler @code{cv.subuRN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31.
-Generated assembler @code{cv.subuRNr} if the it is a register.
-@end deftypefn
+Each of the @code{vec_absd}, @code{vec_absdb}, @code{vec_absdh}, and
+@code{vec_absdw} built-in functions computes the absolute differences
+of the pairs of vector elements supplied in its two vector arguments,
+placing the absolute differences into the corresponding elements of
+the vector result.
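+As a brief illustration (a sketch, not part of the reference text),
+absolute differences of unsigned bytes can be computed like this,
+assuming @option{-mcpu=power9} and @code{<altivec.h>}:
+
+@smallexample
+#include <altivec.h>
+
+/* For each byte i, the result is |a[i] - b[i]|, computed without the
+   wraparound that plain (a - b) would suffer for unsigned types.  */
+vector unsigned char
+byte_distance (vector unsigned char a, vector unsigned char b)
+@{
+  return vec_absdb (a, b);
+@}
+@end smallexample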
-These built-in functions are available for the CORE-V Event Load machine
-architecture.  For more information on CORE-V ELW builtins, please see
-@uref{https://github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md#listing-of-event-load-word-builtins-xcvelw}
+The following built-in functions are available for the PowerPC family
+of processors, starting with ISA 3.0 or later (@option{-mcpu=power9}):
+@smallexample
+vector unsigned int vec_vrlnm (vector unsigned int, vector unsigned int);
+vector unsigned long long vec_vrlnm (vector unsigned long long,
+                                     vector unsigned long long);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_elw_elw (uint32_t *)
-Generated assembler @code{cv.elw}
-@end deftypefn
+The result of @code{vec_vrlnm} is obtained by rotating each element
+of the first argument vector left and ANDing it with a mask. Each
+element of the second argument vector contains the mask beginning in
+bits 11:15, the mask end in bits 19:23, and the shift count in bits
+27:31.
-These built-in functions are available for the CORE-V SIMD machine
-architecture. For more information on CORE-V SIMD built-ins, please see
-@uref{https://github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md#listing-of-pulp-816-bit-simd-builtins-xcvsimd}
+If the cryptographic instructions are enabled (@option{-mcrypto} or
+@option{-mcpu=power8}), the following built-in functions are available.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_h (uint32_t, uint32_t, uint4_t)
-Generated assembler @code{cv.add.h}
-@end deftypefn
+Only functions excluded from the PVIPR are listed here.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_b (uint32_t, uint32_t)
-Generated assembler @code{cv.add.b}
-@end deftypefn
+@smallexample
+vector unsigned long long __builtin_crypto_vsbox (vector unsigned long long);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.add.sc.h}
-@end deftypefn
+vector unsigned long long __builtin_crypto_vcipher (vector unsigned long long,
+                                                    vector unsigned long long);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.add.sci.h}
-@end deftypefn
+vector unsigned long long __builtin_crypto_vcipherlast
+                                     (vector unsigned long long,
+                                      vector unsigned long long);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.add.sc.b}
-@end deftypefn
+vector unsigned long long __builtin_crypto_vncipher (vector unsigned long long,
+                                                     vector unsigned long long);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.add.sci.b}
-@end deftypefn
+vector unsigned long long __builtin_crypto_vncipherlast (vector unsigned long long,
+                                                         vector unsigned long long);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_h (uint32_t, uint32_t, uint4_t)
-Generated assembler @code{cv.sub.h}
-@end deftypefn
+vector unsigned char __builtin_crypto_vpermxor (vector unsigned char,
+                                                vector unsigned char,
+                                                vector unsigned char);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_b (uint32_t, uint32_t)
-Generated assembler @code{cv.sub.b}
-@end deftypefn
+vector unsigned short __builtin_crypto_vpermxor (vector unsigned short,
+                                                 vector unsigned short,
+                                                 vector unsigned short);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.sub.sc.h}
-@end deftypefn
+vector unsigned int __builtin_crypto_vpermxor (vector unsigned int,
+                                               vector unsigned int,
+                                               vector unsigned int);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.sub.sci.h}
-@end deftypefn
+vector unsigned long long __builtin_crypto_vpermxor (vector unsigned long long,
+                                                     vector unsigned long long,
+                                                     vector unsigned long long);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.sub.sc.b}
-@end deftypefn
+vector unsigned char __builtin_crypto_vpmsumb (vector unsigned char,
+                                               vector unsigned char);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.sub.sci.b}
-@end deftypefn
+vector unsigned short __builtin_crypto_vpmsumh (vector unsigned short,
+                                                vector unsigned short);
+
+vector unsigned int __builtin_crypto_vpmsumw (vector unsigned int,
+                                              vector unsigned int);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_h (uint32_t, uint32_t)
-Generated assembler @code{cv.avg.h}
-@end deftypefn
+vector unsigned long long __builtin_crypto_vpmsumd (vector unsigned long long,
+                                                    vector unsigned long long);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_b (uint32_t, uint32_t)
-Generated assembler @code{cv.avg.b}
-@end deftypefn
+vector unsigned long long __builtin_crypto_vshasigmad (vector unsigned long long,
+                                                       int, int);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.avg.sc.h}
-@end deftypefn
+vector unsigned int __builtin_crypto_vshasigmaw (vector unsigned int, int, int);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.avg.sci.h}
-@end deftypefn
+The second argument to @code{__builtin_crypto_vshasigmad} and
+@code{__builtin_crypto_vshasigmaw} must be a constant
+integer that is 0 or 1. The third argument to these built-in functions
+must be a constant integer in the range of 0 to 15.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.avg.sc.b}
-@end deftypefn
+The following sign-extension built-in functions are provided:
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.avg.sci.b}
-@end deftypefn
+@smallexample
+vector signed int vec_signexti (vector signed char a);
+vector signed long long vec_signextll (vector signed char a);
+vector signed int vec_signexti (vector signed short a);
+vector signed long long vec_signextll (vector signed short a);
+vector signed long long vec_signextll (vector signed int a);
+vector signed __int128 vec_signextq (vector signed long long a);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_h (uint32_t, uint32_t)
-Generated assembler @code{cv.avgu.h}
-@end deftypefn
+Each element of the result is produced by sign-extending the element of the
+input vector that would fall in the least significant portion of the result
+element. For example, a sign extension of a @code{vector signed char} to a
+@code{vector signed long long} sign-extends the rightmost byte of each
+doubleword.
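+As an informal sketch (assuming @option{-mcpu=power9} and
+@code{<altivec.h>}), the doubleword sign extension just described
+looks like this:
+
+@smallexample
+#include <altivec.h>
+
+/* Sign-extend the rightmost byte of each doubleword of A into a
+   64-bit element, as described above.  */
+vector signed long long
+extend_bytes (vector signed char a)
+@{
+  return vec_signextll (a);
+@}
+@end smallexample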
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_b (uint32_t, uint32_t) -Generated assembler @code{cv.avgu.b} -@end deftypefn +@node PowerPC AltiVec Built-in Functions Available on ISA 3.1 +@subsubsection PowerPC AltiVec Built-in Functions Available on ISA 3.1 -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_sc_h (uint32_t, uint16_t) -Generated assembler @code{cv.avgu.sc.h} -@end deftypefn +The following additional built-in functions are also available for the +PowerPC family of processors, starting with ISA 3.1 (@option{-mcpu=power10}): -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_sc_h (uint32_t, uint6_t) -Generated assembler @code{cv.avgu.sci.h} -@end deftypefn +@smallexample +@exdent int vec_test_lsbb_all_ones (vector signed char); +@exdent int vec_test_lsbb_all_ones (vector unsigned char); +@exdent int vec_test_lsbb_all_ones (vector bool char); +@end smallexample +@findex vec_test_lsbb_all_ones -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_sc_b (uint32_t, uint8_t) -Generated assembler @code{cv.avgu.sc.b} -@end deftypefn +The builtin @code{vec_test_lsbb_all_ones} returns 1 if the least significant +bit in each byte is equal to 1. It returns 0 otherwise. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_sc_b (uint32_t, uint6_t) -Generated assembler @code{cv.avgu.sci.b} -@end deftypefn +@smallexample +@exdent int vec_test_lsbb_all_zeros (vector signed char); +@exdent int vec_test_lsbb_all_zeros (vector unsigned char); +@exdent int vec_test_lsbb_all_zeros (vector bool char); +@end smallexample +@findex vec_test_lsbb_all_zeros -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_h (uint32_t, uint32_t) -Generated assembler @code{cv.min.h} -@end deftypefn +The builtin @code{vec_test_lsbb_all_zeros} returns 1 if the least significant +bit in each byte is equal to zero. It returns 0 otherwise. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_b (uint32_t, uint32_t) -Generated assembler @code{cv.min.b} -@end deftypefn +@smallexample +@exdent vector unsigned long long int +@exdent vec_cfuge (vector unsigned long long int, vector unsigned long long int); +@end smallexample +Perform a vector centrifuge operation, as if implemented by the +@code{vcfuged} instruction. +@findex vec_cfuge -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_sc_h (uint32_t, int16_t) -Generated assembler @code{cv.min.sc.h} -@end deftypefn +@smallexample +@exdent vector unsigned long long int +@exdent vec_cntlzm (vector unsigned long long int, vector unsigned long long int); +@end smallexample +Perform a vector count leading zeros under bit mask operation, as if +implemented by the @code{vclzdm} instruction. +@findex vec_cntlzm -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_sc_h (uint32_t, int6_t) -Generated assembler @code{cv.min.sci.h} -@end deftypefn +@smallexample +@exdent vector unsigned long long int +@exdent vec_cnttzm (vector unsigned long long int, vector unsigned long long int); +@end smallexample +Perform a vector count trailing zeros under bit mask operation, as if +implemented by the @code{vctzdm} instruction. 
+@findex vec_cnttzm -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_sc_b (uint32_t, int8_t) -Generated assembler @code{cv.min.sc.b} -@end deftypefn +@smallexample +@exdent vector signed char +@exdent vec_clrl (vector signed char @var{a}, unsigned int @var{n}); +@exdent vector unsigned char +@exdent vec_clrl (vector unsigned char @var{a}, unsigned int @var{n}); +@end smallexample +Clear the left-most @code{(16 - n)} bytes of vector argument @code{a}, as if +implemented by the @code{vclrlb} instruction on a big-endian target +and by the @code{vclrrb} instruction on a little-endian target. A +value of @code{n} that is greater than 16 is treated as if it equaled 16. +@findex vec_clrl -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_sc_b (uint32_t, int6_t) -Generated assembler @code{cv.min.sci.b} -@end deftypefn +@smallexample +@exdent vector signed char +@exdent vec_clrr (vector signed char @var{a}, unsigned int @var{n}); +@exdent vector unsigned char +@exdent vec_clrr (vector unsigned char @var{a}, unsigned int @var{n}); +@end smallexample +Clear the right-most @code{(16 - n)} bytes of vector argument @code{a}, as if +implemented by the @code{vclrrb} instruction on a big-endian target +and by the @code{vclrlb} instruction on a little-endian target. A +value of @code{n} that is greater than 16 is treated as if it equaled 16. +@findex vec_clrr -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_h (uint32_t, uint32_t) -Generated assembler @code{cv.minu.h} -@end deftypefn +@smallexample +@exdent vector unsigned long long int +@exdent vec_gnb (vector unsigned __int128, const unsigned char); +@end smallexample +Perform a 128-bit vector gather operation, as if implemented by the +@code{vgnb} instruction. The second argument must be a literal +integer value between 2 and 7 inclusive. +@findex vec_gnb -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_b (uint32_t, uint32_t) -Generated assembler @code{cv.minu.b} -@end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_sc_h (uint32_t, uint16_t) -Generated assembler @code{cv.minu.sc.h} -@end deftypefn +Vector Extract -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_sc_h (uint32_t, uint6_t) -Generated assembler @code{cv.minu.sci.h} -@end deftypefn +@smallexample +@exdent vector unsigned long long int +@exdent vec_extractl (vector unsigned char, vector unsigned char, unsigned int); +@exdent vector unsigned long long int +@exdent vec_extractl (vector unsigned short, vector unsigned short, unsigned int); +@exdent vector unsigned long long int +@exdent vec_extractl (vector unsigned int, vector unsigned int, unsigned int); +@exdent vector unsigned long long int +@exdent vec_extractl (vector unsigned long long, vector unsigned long long, unsigned int); +@end smallexample +Extract an element from two concatenated vectors starting at the given byte index +in natural-endian order, and place it zero-extended in doubleword 1 of the result +according to natural element order. If the byte index is out of range for the +data type, the intrinsic will be rejected. +For little-endian, this output will match the placement by the hardware +instruction, i.e., dword[0] in RTL notation. For big-endian, an additional +instruction is needed to move it from the "left" doubleword to the "right" one. 
+For little-endian, semantics matching the @code{vextdubvrx},
+@code{vextduhvrx}, @code{vextduwvrx} instructions will be generated,
+while for big-endian, semantics matching the @code{vextdubvlx},
+@code{vextduhvlx}, @code{vextduwvlx} instructions will be generated.
+Note that some fairly anomalous results can be generated if
+the byte index is not aligned on an element boundary for the element being
+extracted. This is a limitation of the bi-endian vector programming model,
+consistent with the limitation on @code{vec_perm}.
+@findex vec_extractl
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_sc_b (uint32_t, uint8_t)
-Generated assembler @code{cv.minu.sc.b}
-@end deftypefn
+@smallexample
+@exdent vector unsigned long long int
+@exdent vec_extracth (vector unsigned char, vector unsigned char, unsigned int);
+@exdent vector unsigned long long int
+@exdent vec_extracth (vector unsigned short, vector unsigned short,
+unsigned int);
+@exdent vector unsigned long long int
+@exdent vec_extracth (vector unsigned int, vector unsigned int, unsigned int);
+@exdent vector unsigned long long int
+@exdent vec_extracth (vector unsigned long long, vector unsigned long long,
+unsigned int);
+@end smallexample
+Extract an element from two concatenated vectors starting at the given byte
+index. The index is based on big-endian order for a little-endian system.
+Similarly, the index is based on little-endian order for a big-endian system.
+The extracted elements are zero-extended and put in doubleword 1
+according to natural element order. If the byte index is out of range for the
+data type, the intrinsic will be rejected. For little-endian, this output
+will match the placement by the hardware instruction (@code{vextdubvrx},
+@code{vextduhvrx}, @code{vextduwvrx}, @code{vextddvrx}), i.e., dword[0] in RTL
+notation. For big-endian, an additional instruction is needed to move it
+from the "left" doubleword to the "right" one. For little-endian, semantics
+matching the @code{vextdubvlx}, @code{vextduhvlx}, @code{vextduwvlx}
+instructions will be generated, while for big-endian, semantics matching the
+@code{vextdubvrx}, @code{vextduhvrx}, @code{vextduwvrx} instructions will
+be generated. Note that some fairly anomalous
+results can be generated if the byte index is not aligned on the
+element boundary for the element being extracted. This is a
+limitation of the bi-endian vector programming model, consistent with the
+limitation on @code{vec_perm}.
+@findex vec_extracth
+@smallexample
+@exdent vector unsigned long long int
+@exdent vec_pdep (vector unsigned long long int, vector unsigned long long int);
+@end smallexample
+Perform a vector parallel bits deposit operation, as if implemented by
+the @code{vpdepd} instruction.
+@findex vec_pdep
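+A small sketch (informal; assumes @option{-mcpu=power10} and
+@code{<altivec.h>}) of the parallel bits deposit operation:
+
+@smallexample
+#include <altivec.h>
+
+/* In each doubleword, scatter the low-order bits of SRC into the bit
+   positions selected by the set bits of MASK; all other result bits
+   are zero.  E.g. src = 0b101 with mask = 0x111 gives 0x101.  */
+vector unsigned long long
+deposit_bits (vector unsigned long long src,
+              vector unsigned long long mask)
+@{
+  return vec_pdep (src, mask);
+@}
+@end smallexample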
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_sc_b (uint32_t, uint6_t)
-Generated assembler @code{cv.minu.sci.b}
-@end deftypefn
+Vector Insert
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_h (uint32_t, uint32_t)
-Generated assembler @code{cv.max.h}
-@end deftypefn
+@smallexample
+@exdent vector unsigned char
+@exdent vec_insertl (unsigned char, vector unsigned char, unsigned int);
+@exdent vector unsigned short
+@exdent vec_insertl (unsigned short, vector unsigned short, unsigned int);
+@exdent vector unsigned int
+@exdent vec_insertl (unsigned int, vector unsigned int, unsigned int);
+@exdent vector unsigned long long
+@exdent vec_insertl (unsigned long long, vector unsigned long long,
+unsigned int);
+@exdent vector unsigned char
+@exdent vec_insertl (vector unsigned char, vector unsigned char, unsigned int);
+@exdent vector unsigned short
+@exdent vec_insertl (vector unsigned short, vector unsigned short,
+unsigned int);
+@exdent vector unsigned int
+@exdent vec_insertl (vector unsigned int, vector unsigned int, unsigned int);
+@end smallexample
+
+Let src be the first argument, when the first argument is a scalar, or the
+rightmost element of the left doubleword of the first argument, when the first
+argument is a vector. Insert the source into the destination at the position
+given by the third argument, using natural element order in the second
+argument. The rest of the second argument is unchanged. If the byte
+index is greater than 14 for halfwords, greater than 12 for words, or
+greater than 8 for doublewords, the result is undefined. For little-endian,
+the generated code will be semantically equivalent to @code{vins[bhwd]rx}
+instructions. Similarly for big-endian it will be semantically equivalent
+to @code{vins[bhwd]lx}. Note that some fairly anomalous results can be
+generated if the byte index is not aligned on an element boundary for the
+type of element being inserted.
+@findex vec_insertl
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_b (uint32_t, uint32_t)
-Generated assembler @code{cv.max.b}
-@end deftypefn
+@smallexample
+@exdent vector unsigned char
+@exdent vec_inserth (unsigned char, vector unsigned char, unsigned int);
+@exdent vector unsigned short
+@exdent vec_inserth (unsigned short, vector unsigned short, unsigned int);
+@exdent vector unsigned int
+@exdent vec_inserth (unsigned int, vector unsigned int, unsigned int);
+@exdent vector unsigned long long
+@exdent vec_inserth (unsigned long long, vector unsigned long long,
+unsigned int);
+@exdent vector unsigned char
+@exdent vec_inserth (vector unsigned char, vector unsigned char, unsigned int);
+@exdent vector unsigned short
+@exdent vec_inserth (vector unsigned short, vector unsigned short,
+unsigned int);
+@exdent vector unsigned int
+@exdent vec_inserth (vector unsigned int, vector unsigned int, unsigned int);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.max.sc.h}
-@end deftypefn
+Let src be the first argument, when the first argument is a scalar, or the
+rightmost element of the first argument, when the first argument is a vector.
+Insert src into the second argument at the position identified by the third
+argument, using opposite element order in the second argument, and leaving the
+rest of the second argument unchanged.
+If the byte index is greater than 14
+for halfwords, 12 for words, or 8 for doublewords, the intrinsic will be
+rejected. Note that the underlying hardware instruction uses the same register
+for the second argument and the result.
+For little-endian, the code generation will be semantically equivalent to
+@code{vins[bhwd]lx}, while for big-endian it will be semantically equivalent to
+@code{vins[bhwd]rx}.
+Note that some fairly anomalous results can be generated if the byte index is
+not aligned on an element boundary for the type of element being inserted.
+@findex vec_inserth
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.max.sci.h}
-@end deftypefn
+Vector Replace Element
+@smallexample
+@exdent vector signed int vec_replace_elt (vector signed int, signed int,
+const int);
+@exdent vector unsigned int vec_replace_elt (vector unsigned int,
+unsigned int, const int);
+@exdent vector float vec_replace_elt (vector float, float, const int);
+@exdent vector signed long long vec_replace_elt (vector signed long long,
+signed long long, const int);
+@exdent vector unsigned long long vec_replace_elt (vector unsigned long long,
+unsigned long long, const int);
+@exdent vector double vec_replace_elt (vector double, double, const int);
+@end smallexample
+The third argument (constrained to [0,3] for word types, or [0,1] for
+doubleword types) identifies the natural-endian element number of the first
+argument that will be replaced by the second argument to produce the result.
+The other elements of the first argument will remain unchanged in the result.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.max.sc.b}
-@end deftypefn
+If it's desirable to insert a word at an unaligned position, use
+@code{vec_replace_unaligned} instead.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.max.sci.b}
-@end deftypefn
+@findex vec_replace_elt
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_h (uint32_t, uint32_t)
-Generated assembler @code{cv.maxu.h}
-@end deftypefn
+Vector Replace Unaligned
+@smallexample
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+signed int, const int);
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+unsigned int, const int);
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+float, const int);
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+signed long long, const int);
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+unsigned long long, const int);
+@exdent vector unsigned char vec_replace_unaligned (vector unsigned char,
+double, const int);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_b (uint32_t, uint32_t)
-Generated assembler @code{cv.maxu.b}
-@end deftypefn
+The second argument replaces a portion of the first argument to produce the
+result, with the rest of the first argument unchanged in the result. The
+third argument identifies the byte index (using left-to-right, or big-endian
+order) where the high-order byte of the second argument will be placed, with
+the remaining bytes of the second argument placed naturally "to the right"
+of the high-order byte.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_sc_h (uint32_t, uint16_t)
-Generated assembler @code{cv.maxu.sc.h}
-@end deftypefn
+The programmer is responsible for understanding the endianness issues involved
+with the first argument and the result.
+@findex vec_replace_unaligned
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_sc_h (uint32_t, uint6_t)
-Generated assembler @code{cv.maxu.sci.h}
-@end deftypefn
+Vector Shift Left Double Bit Immediate
+@smallexample
+@exdent vector signed char vec_sldb (vector signed char, vector signed char,
+const unsigned int);
+@exdent vector unsigned char vec_sldb (vector unsigned char,
+vector unsigned char, const unsigned int);
+@exdent vector signed short vec_sldb (vector signed short, vector signed short,
+const unsigned int);
+@exdent vector unsigned short vec_sldb (vector unsigned short,
+vector unsigned short, const unsigned int);
+@exdent vector signed int vec_sldb (vector signed int, vector signed int,
+const unsigned int);
+@exdent vector unsigned int vec_sldb (vector unsigned int, vector unsigned int,
+const unsigned int);
+@exdent vector signed long long vec_sldb (vector signed long long,
+vector signed long long, const unsigned int);
+@exdent vector unsigned long long vec_sldb (vector unsigned long long,
+vector unsigned long long, const unsigned int);
+@exdent vector signed __int128 vec_sldb (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_sldb (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_sc_b (uint32_t, uint8_t)
-Generated assembler @code{cv.maxu.sc.b}
-@end deftypefn
+Shift the combined input vectors left by the amount specified by the low-order
+three bits of the third argument, and return the leftmost remaining 128 bits.
+Code using this built-in function must be endian-aware.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_sc_b (uint32_t, uint6_t) -Generated assembler @code{cv.maxu.sci.b} -@end deftypefn +@findex vec_sldb -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_srl_h (uint32_t, uint32_t) -Generated assembler @code{cv.srl.h} -@end deftypefn +Vector Shift Right Double Bit Immediate -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_srl_b (uint32_t, uint32_t) -Generated assembler @code{cv.srl.b} -@end deftypefn +@smallexample +@exdent vector signed char vec_srdb (vector signed char, vector signed char, +const unsigned int); +@exdent vector unsigned char vec_srdb (vector unsigned char, vector unsigned char, +const unsigned int); +@exdent vector signed short vec_srdb (vector signed short, vector signed short, +const unsigned int); +@exdent vector unsigned short vec_srdb (vector unsigned short, vector unsigned short, +const unsigned int); +@exdent vector signed int vec_srdb (vector signed int, vector signed int, +const unsigned int); +@exdent vector unsigned int vec_srdb (vector unsigned int, vector unsigned int, +const unsigned int); +@exdent vector signed long long vec_srdb (vector signed long long, +vector signed long long, const unsigned int); +@exdent vector unsigned long long vec_srdb (vector unsigned long long, +vector unsigned long long, const unsigned int); +@exdent vector signed __int128 vec_srdb (vector signed __int128, +vector signed __int128, const unsigned int); +@exdent vector unsigned __int128 vec_srdb (vector unsigned __int128, +vector unsigned __int128, const unsigned int); +@end smallexample -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_srl_sc_h (uint32_t, int16_t) -Generated assembler @code{cv.srl.sc.h} -@end deftypefn +Shift the combined input vectors right by the amount specified by the low-order +three bits of the third argument, and return the remaining 128 bits. Code +using this built-in must be endian-aware. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_srl_sc_h (uint32_t, int6_t) -Generated assembler @code{cv.srl.sci.h} -@end deftypefn +@findex vec_srdb -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_srl_sc_b (uint32_t, int8_t) -Generated assembler @code{cv.srl.sc.b} -@end deftypefn +Vector Splat -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_srl_sc_b (uint32_t, int6_t) -Generated assembler @code{cv.srl.sci.b} -@end deftypefn +@smallexample +@exdent vector signed int vec_splati (const signed int); +@exdent vector float vec_splati (const float); +@end smallexample -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_h (uint32_t, uint32_t) -Generated assembler @code{cv.sra.h} -@end deftypefn +Splat a 32-bit immediate into a vector of words. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_b (uint32_t, uint32_t) -Generated assembler @code{cv.sra.b} -@end deftypefn +@findex vec_splati -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_sc_h (uint32_t, int16_t) -Generated assembler @code{cv.sra.sc.h} -@end deftypefn +@smallexample +@exdent vector double vec_splatid (const float); +@end smallexample -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_sc_h (uint32_t, int6_t) -Generated assembler @code{cv.sra.sci.h} -@end deftypefn +Convert a single precision floating-point value to double-precision and splat +the result to a vector of double-precision floats. 
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_sc_b (uint32_t, int8_t) -Generated assembler @code{cv.sra.sc.b} -@end deftypefn +@findex vec_splatid -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_sc_b (uint32_t, int6_t) -Generated assembler @code{cv.sra.sci.b} -@end deftypefn +@smallexample +@exdent vector signed int vec_splati_ins (vector signed int, +const unsigned int, const signed int); +@exdent vector unsigned int vec_splati_ins (vector unsigned int, +const unsigned int, const unsigned int); +@exdent vector float vec_splati_ins (vector float, const unsigned int, +const float); +@end smallexample -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_h (uint32_t, uint32_t) -Generated assembler @code{cv.sll.h} -@end deftypefn +Argument 2 must be either 0 or 1. Splat the value of argument 3 into the word +identified by argument 2 of each doubleword of argument 1 and return the +result. The other words of argument 1 are unchanged. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_b (uint32_t, uint32_t) -Generated assembler @code{cv.sll.b} -@end deftypefn +@findex vec_splati_ins -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_sc_h (uint32_t, int16_t) -Generated assembler @code{cv.sll.sc.h} -@end deftypefn +Vector Blend Variable -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_sc_h (uint32_t, int6_t) -Generated assembler @code{cv.sll.sci.h} -@end deftypefn +@smallexample +@exdent vector signed char vec_blendv (vector signed char, vector signed char, +vector unsigned char); +@exdent vector unsigned char vec_blendv (vector unsigned char, +vector unsigned char, vector unsigned char); +@exdent vector signed short vec_blendv (vector signed short, +vector signed short, vector unsigned short); +@exdent vector unsigned short vec_blendv (vector unsigned short, +vector unsigned short, vector unsigned short); +@exdent vector signed int vec_blendv (vector signed int, vector signed int, +vector unsigned int); +@exdent vector unsigned int vec_blendv (vector unsigned int, +vector unsigned int, vector unsigned int); +@exdent vector signed long long vec_blendv (vector signed long long, +vector signed long long, vector unsigned long long); +@exdent vector unsigned long long vec_blendv (vector unsigned long long, +vector unsigned long long, vector unsigned long long); +@exdent vector float vec_blendv (vector float, vector float, +vector unsigned int); +@exdent vector double vec_blendv (vector double, vector double, +vector unsigned long long); +@end smallexample -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_sc_b (uint32_t, int8_t) -Generated assembler @code{cv.sll.sc.b} -@end deftypefn +Blend the first and second argument vectors according to the sign bits of the +corresponding elements of the third argument vector. This is similar to the +@code{vsel} and @code{xxsel} instructions but for bigger elements. 
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.sll.sci.b}
-@end deftypefn
+@findex vec_blendv
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_h (uint32_t, uint32_t)
-Generated assembler @code{cv.or.h}
-@end deftypefn
+Vector Permute Extended
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_b (uint32_t, uint32_t)
-Generated assembler @code{cv.or.b}
-@end deftypefn
+@smallexample
+@exdent vector signed char vec_permx (vector signed char, vector signed char,
+vector unsigned char, const int);
+@exdent vector unsigned char vec_permx (vector unsigned char,
+vector unsigned char, vector unsigned char, const int);
+@exdent vector signed short vec_permx (vector signed short,
+vector signed short, vector unsigned char, const int);
+@exdent vector unsigned short vec_permx (vector unsigned short,
+vector unsigned short, vector unsigned char, const int);
+@exdent vector signed int vec_permx (vector signed int, vector signed int,
+vector unsigned char, const int);
+@exdent vector unsigned int vec_permx (vector unsigned int,
+vector unsigned int, vector unsigned char, const int);
+@exdent vector signed long long vec_permx (vector signed long long,
+vector signed long long, vector unsigned char, const int);
+@exdent vector unsigned long long vec_permx (vector unsigned long long,
+vector unsigned long long, vector unsigned char, const int);
+@exdent vector float vec_permx (vector float, vector float,
+vector unsigned char, const int);
+@exdent vector double vec_permx (vector double, vector double,
+vector unsigned char, const int);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.or.sc.h}
-@end deftypefn
+Perform a partial permute of the first two arguments, which form a 32-byte
+section of an emulated vector up to 256 bytes wide, using the partial permute
+control vector in the third argument. The fourth argument (constrained to
+values of 0-7) identifies which 32-byte section of the emulated vector is
+contained in the first two arguments.
+@findex vec_permx
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.or.sci.h}
-@end deftypefn
+@smallexample
+@exdent vector unsigned long long int
+@exdent vec_pext (vector unsigned long long int, vector unsigned long long int);
+@end smallexample
+Perform a vector parallel bit extract operation, as if implemented by
+the @code{vpextd} instruction.
+@findex vec_pext
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.or.sc.b}
-@end deftypefn
+@smallexample
+@exdent vector unsigned char vec_stril (vector unsigned char);
+@exdent vector signed char vec_stril (vector signed char);
+@exdent vector unsigned short vec_stril (vector unsigned short);
+@exdent vector signed short vec_stril (vector signed short);
+@end smallexample
+Isolate the left-most non-zero elements of the incoming vector argument,
+replacing all elements to the right of the left-most zero element
+found within the argument with zero. The typical implementation uses
+the @code{vstribl} or @code{vstrihl} instruction on big-endian targets
+and uses the @code{vstribr} or @code{vstrihr} instruction on
+little-endian targets.
+@findex vec_stril
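+An informal sketch (assuming @option{-mcpu=power10} and
+@code{<altivec.h>}) of isolating the leading bytes of a
+zero-terminated string fragment:
+
+@smallexample
+#include <altivec.h>
+
+/* Clear every byte to the right of the first zero byte; the bytes
+   before it (the string prefix) are kept unchanged.  */
+vector unsigned char
+keep_prefix (vector unsigned char s)
+@{
+  return vec_stril (s);
+@}
+@end smallexample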
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.or.sci.b}
-@end deftypefn
+@smallexample
+@exdent int vec_stril_p (vector unsigned char);
+@exdent int vec_stril_p (vector signed char);
+@exdent int vec_stril_p (vector unsigned short);
+@exdent int vec_stril_p (vector signed short);
+@end smallexample
+Return a non-zero value if and only if the argument contains a zero
+element. The typical implementation uses
+the @code{vstribl.} or @code{vstrihl.} instruction on big-endian targets
+and uses the @code{vstribr.} or @code{vstrihr.} instruction on
+little-endian targets. Choose this built-in to check for the presence of
+a zero element if the same argument is also passed to @code{vec_stril}.
+@findex vec_stril_p
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_h (uint32_t, uint32_t)
-Generated assembler @code{cv.xor.h}
-@end deftypefn
+@smallexample
+@exdent vector unsigned char vec_strir (vector unsigned char);
+@exdent vector signed char vec_strir (vector signed char);
+@exdent vector unsigned short vec_strir (vector unsigned short);
+@exdent vector signed short vec_strir (vector signed short);
+@end smallexample
+Isolate the right-most non-zero elements of the incoming vector argument,
+replacing all elements to the left of the right-most zero element
+found within the argument with zero. The typical implementation uses
+the @code{vstribr} or @code{vstrihr} instruction on big-endian targets
+and uses the @code{vstribl} or @code{vstrihl} instruction on
+little-endian targets.
+@findex vec_strir
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_b (uint32_t, uint32_t)
-Generated assembler @code{cv.xor.b}
-@end deftypefn
+@smallexample
+@exdent int vec_strir_p (vector unsigned char);
+@exdent int vec_strir_p (vector signed char);
+@exdent int vec_strir_p (vector unsigned short);
+@exdent int vec_strir_p (vector signed short);
+@end smallexample
+Return a non-zero value if and only if the argument contains a zero
+element. The typical implementation uses
+the @code{vstribr.} or @code{vstrihr.} instruction on big-endian targets
+and uses the @code{vstribl.} or @code{vstrihl.} instruction on
+little-endian targets. Choose this built-in to check for the presence of
+a zero element if the same argument is also passed to @code{vec_strir}.
+@findex vec_strir_p
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.xor.sc.h}
-@end deftypefn
+@smallexample
+@exdent vector unsigned char
+@exdent vec_ternarylogic (vector unsigned char, vector unsigned char,
+                          vector unsigned char, const unsigned int);
+@exdent vector unsigned short
+@exdent vec_ternarylogic (vector unsigned short, vector unsigned short,
+                          vector unsigned short, const unsigned int);
+@exdent vector unsigned int
+@exdent vec_ternarylogic (vector unsigned int, vector unsigned int,
+                          vector unsigned int, const unsigned int);
+@exdent vector unsigned long long int
+@exdent vec_ternarylogic (vector unsigned long long int, vector unsigned long long int,
+                          vector unsigned long long int, const unsigned int);
+@exdent vector unsigned __int128
+@exdent vec_ternarylogic (vector unsigned __int128, vector unsigned __int128,
+                          vector unsigned __int128, const unsigned int);
+@end smallexample
+Perform a 128-bit vector evaluate operation, as if implemented by the
+@code{xxeval} instruction.  The fourth argument must be a literal
+integer value between 0 and 255 inclusive.
+@findex vec_ternarylogic
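+As an informal illustration (assuming @option{-mcpu=power10} and
+@code{<altivec.h>}), the 8-bit immediate is the truth table of the
+desired function of three bits.  The value 0x7E below is chosen
+because its truth table is palindromic (bit @math{i} equals bit
+@math{7 - i}), so it encodes the same function regardless of the
+bit-numbering convention used for the table:
+
+@smallexample
+#include <altivec.h>
+
+/* For each bit position, the result bit is 1 exactly when the three
+   input bits are not all equal.  0x7E sets the table entries for all
+   input combinations except 000 and 111, and is palindromic, so the
+   encoding is independent of the table's bit-numbering order.  */
+vector unsigned long long
+not_all_equal (vector unsigned long long a, vector unsigned long long b,
+               vector unsigned long long c)
+@{
+  return vec_ternarylogic (a, b, c, 0x7e);
+@}
+@end smallexample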
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.xor.sci.h}
-@end deftypefn
+@smallexample
+@exdent vector unsigned char vec_genpcvm (vector unsigned char, const int);
+@exdent vector unsigned short vec_genpcvm (vector unsigned short, const int);
+@exdent vector unsigned int vec_genpcvm (vector unsigned int, const int);
+@exdent vector unsigned long long int vec_genpcvm (vector unsigned long long int,
+                                                   const int);
+@end smallexample
+Generate a PCV (permute control vector) from the specified mask, as if
+implemented by the @code{xxgenpcvbm}, @code{xxgenpcvhm}, or
+@code{xxgenpcvwm} instruction, where the immediate value is either 0, 1,
+2, or 3.
+@findex vec_genpcvm
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.xor.sc.b}
-@end deftypefn
+Vector Integer Multiply/Divide/Modulo
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.xor.sci.b}
-@end deftypefn
+@smallexample
+@exdent vector signed int
+@exdent vec_mulh (vector signed int @var{a}, vector signed int @var{b});
+@exdent vector unsigned int
+@exdent vec_mulh (vector unsigned int @var{a}, vector unsigned int @var{b});
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_h (uint32_t, uint32_t)
-Generated assembler @code{cv.and.h}
-@end deftypefn
+For each integer value @code{i} from 0 to 3, do the following. The integer
+value in word element @code{i} of @var{a} is multiplied by the integer value
+in word element @code{i} of @var{b}. The high-order 32 bits of the 64-bit
+product are placed into word element @code{i} of the vector returned.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_b (uint32_t, uint32_t)
-Generated assembler @code{cv.and.b}
-@end deftypefn
+@smallexample
+@exdent vector signed long long
+@exdent vec_mulh (vector signed long long @var{a}, vector signed long long @var{b});
+@exdent vector unsigned long long
+@exdent vec_mulh (vector unsigned long long @var{a}, vector unsigned long long @var{b});
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.and.sc.h}
-@end deftypefn
+For each integer value @code{i} from 0 to 1, do the following. The integer
+value in doubleword element @code{i} of @var{a} is multiplied by the integer
+value in doubleword element @code{i} of @var{b}. The high-order 64 bits of
+the 128-bit product are placed into doubleword element @code{i} of the
+vector returned.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.and.sci.h}
-@end deftypefn
+@smallexample
+@exdent vector unsigned long long
+@exdent vec_mul (vector unsigned long long @var{a}, vector unsigned long long @var{b});
+@exdent vector signed long long
+@exdent vec_mul (vector signed long long @var{a}, vector signed long long @var{b});
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.and.sc.b}
-@end deftypefn
+For each integer value @code{i} from 0 to 1, do the following. The integer
+value in doubleword element @code{i} of @var{a} is multiplied by the integer
+value in doubleword element @code{i} of @var{b}. The low-order 64 bits of
+the 128-bit product are placed into doubleword element @code{i} of the
+vector returned.
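+A short informal sketch (assuming @option{-mcpu=power10} and
+@code{<altivec.h>}): the high and low halves of the full 64x64-bit
+products can be combined from @code{vec_mulh} and @code{vec_mul}:
+
+@smallexample
+#include <altivec.h>
+
+/* For each doubleword element i, hi[i] and lo[i] together form the
+   full 128-bit product a[i] * b[i].  */
+void
+full_products (vector unsigned long long a, vector unsigned long long b,
+               vector unsigned long long *hi, vector unsigned long long *lo)
+@{
+  *hi = vec_mulh (a, b);
+  *lo = vec_mul (a, b);
+@}
+@end smallexample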
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.and.sci.b}
-@end deftypefn
+@smallexample
+@exdent vector signed int
+@exdent vec_div (vector signed int @var{a}, vector signed int @var{b});
+@exdent vector unsigned int
+@exdent vec_div (vector unsigned int @var{a}, vector unsigned int @var{b});
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_abs_h (uint32_t)
-Generated assembler @code{cv.abs.h}
-@end deftypefn
+For each integer value @code{i} from 0 to 3, do the following. The integer in
+word element @code{i} of @var{a} is divided by the integer in word element
+@code{i} of @var{b}. The unique integer quotient is placed into the word
+element @code{i} of the vector returned. If an attempt is made to divide by
+zero, the quotient is undefined.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_abs_b (uint32_t)
-Generated assembler @code{cv.abs.b}
-@end deftypefn
+@smallexample
+@exdent vector signed long long
+@exdent vec_div (vector signed long long @var{a}, vector signed long long @var{b});
+@exdent vector unsigned long long
+@exdent vec_div (vector unsigned long long @var{a}, vector unsigned long long @var{b});
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_h (uint32_t, uint32_t)
-Generated assembler @code{cv.dotup.h}
-@end deftypefn
+For each integer value @code{i} from 0 to 1, do the following. The integer in
+doubleword element @code{i} of @var{a} is divided by the integer in doubleword
+element @code{i} of @var{b}. The unique integer quotient is placed into the
+doubleword element @code{i} of the vector returned. If an attempt is made to
+perform the division 0x8000_0000_0000_0000 ÷ -1, or to divide by zero,
+the quotient is undefined.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_b (uint32_t, uint32_t)
-Generated assembler @code{cv.dotup.b}
-@end deftypefn
+@smallexample
+@exdent vector signed int
+@exdent vec_dive (vector signed int @var{a}, vector signed int @var{b});
+@exdent vector unsigned int
+@exdent vec_dive (vector unsigned int @var{a}, vector unsigned int @var{b});
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_sc_h (uint32_t, uint16_t)
-Generated assembler @code{cv.dotup.sc.h}
-@end deftypefn
+For each integer value @code{i} from 0 to 3, do the following. The integer in
+word element @code{i} of @var{a} is shifted left by 32 bits, then divided by
+the integer in word element @code{i} of @var{b}. The unique integer quotient
+is placed into the word element @code{i} of the vector returned. If the
+quotient cannot be represented in 32 bits, or if an attempt is made to divide
+by zero, the quotient is undefined.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_sc_h (uint32_t, uint6_t)
-Generated assembler @code{cv.dotup.sci.h}
-@end deftypefn
+@smallexample
+@exdent vector signed long long
+@exdent vec_dive (vector signed long long @var{a}, vector signed long long @var{b});
+@exdent vector unsigned long long
+@exdent vec_dive (vector unsigned long long @var{a}, vector unsigned long long @var{b});
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_sc_b (uint32_t, uint8_t)
-Generated assembler @code{cv.dotup.sc.b}
-@end deftypefn
+For each integer value @code{i} from 0 to 1, do the following.
+The integer in
+doubleword element @code{i} of a is shifted left by 64 bits, then divided by
+the integer in doubleword element @code{i} of b. The unique integer quotient
+is placed into the doubleword element @code{i} of the vector returned. If the
+quotient cannot be represented in 64 bits, or if an attempt is made to perform
+<anything> ÷ 0 then the quotient is undefined.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_sc_b (uint32_t, uint6_t)
-Generated assembler @code{cv.dotup.sci.b}
-@end deftypefn
+@smallexample
+@exdent vector signed int
+@exdent vec_mod (vector signed int @var{a}, vector signed int @var{b});
+@exdent vector unsigned int
+@exdent vec_mod (vector unsigned int @var{a}, vector unsigned int @var{b});
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_h (uint32_t, uint32_t)
-Generated assembler @code{cv.dotusp.h}
-@end deftypefn
+For each integer value @code{i} from 0 to 3, do the following. The integer in
+word element @code{i} of a is divided by the integer in word element @code{i}
+of b. The unique integer remainder is placed into the word element @code{i} of
+the vector returned. If an attempt is made to perform any of the divisions
+0x8000_0000 ÷ -1 or <anything> ÷ 0 then the remainder is undefined.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_b (uint32_t, uint32_t)
-Generated assembler @code{cv.dotusp.b}
-@end deftypefn
+@smallexample
+@exdent vector signed long long
+@exdent vec_mod (vector signed long long @var{a}, vector signed long long @var{b});
+@exdent vector unsigned long long
+@exdent vec_mod (vector unsigned long long @var{a}, vector unsigned long long @var{b});
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.dotusp.sc.h}
-@end deftypefn
+For each integer value @code{i} from 0 to 1, do the following. The integer in
+doubleword element @code{i} of a is divided by the integer in doubleword
+element @code{i} of b. The unique integer remainder is placed into the
+doubleword element @code{i} of the vector returned. If an attempt is made to
+perform <anything> ÷ 0 then the remainder is undefined.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.dotusp.sci.h}
-@end deftypefn
+Generate PCV from the specified mask size, as if implemented by the
+@code{xxgenpcvbm}, @code{xxgenpcvhm}, @code{xxgenpcvwm} instructions, where
+the immediate value is 0, 1, 2, or 3.
+@findex vec_genpcvm
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.dotusp.sc.b}
-@end deftypefn
+@smallexample
+@exdent vector unsigned __int128 vec_rl (vector unsigned __int128 @var{A},
+                                         vector unsigned __int128 @var{B});
+@exdent vector signed __int128 vec_rl (vector signed __int128 @var{A},
+                                       vector unsigned __int128 @var{B});
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.dotusp.sci.b}
-@end deftypefn
+Result value: Each element of @var{R} is obtained by rotating the
+corresponding element of @var{A} left by the number of bits specified by the
+corresponding element of @var{B}.
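As a minimal sketch of the 128-bit rotate (same Power10 assumptions as the
earlier sketch; the helper name @code{rotl1} is hypothetical):

@smallexample
#include <altivec.h>

/* Sketch: rotate the single 128-bit element left by one bit.  */
vector unsigned __int128
rotl1 (vector unsigned __int128 a)
@{
  vector unsigned __int128 one = @{ 1 @};
  return vec_rl (a, one);
@}
@end smallexample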
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_h (uint32_t, uint32_t)
-Generated assembler @code{cv.dotsp.h}
-@end deftypefn
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_b (uint32_t, uint32_t)
-Generated assembler @code{cv.dotsp.b}
-@end deftypefn
+@smallexample
+@exdent vector unsigned __int128 vec_rlmi (vector unsigned __int128,
+                                           vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_rlmi (vector signed __int128,
+                                         vector signed __int128,
+                                         vector unsigned __int128);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.dotsp.sc.h}
-@end deftypefn
+Returns the result of rotating the first input and inserting it under mask
+into the second input. The first bit and the last bit of the mask are
+obtained from the two 7-bit fields bits [108:115] and bits [117:123]
+respectively of the second input. The shift is obtained from the third input
+in the 7-bit field bits [125:131], where all bits are counted from zero at
+the left.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.dotsp.sci.h}
-@end deftypefn
+@smallexample
+@exdent vector unsigned __int128 vec_rlnm (vector unsigned __int128,
+                                           vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_rlnm (vector signed __int128,
+                                         vector unsigned __int128,
+                                         vector unsigned __int128);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.dotsp.sc.b}
-@end deftypefn
+Returns the result of rotating the first input and ANDing it with a mask.
+The first bit and the last bit of the mask are obtained from the two 7-bit
+fields bits [117:123] and bits [125:131] respectively of the second input.
+The shift is obtained from the third input in the 7-bit field bits
+[125:131], where all bits are counted from zero at the left.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.dotsp.sci.b}
-@end deftypefn
+@smallexample
+@exdent vector unsigned __int128 vec_sl (vector unsigned __int128 @var{A}, vector unsigned __int128 @var{B});
+@exdent vector signed __int128 vec_sl (vector signed __int128 @var{A}, vector unsigned __int128 @var{B});
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_h (uint32_t, uint32_t, uint32_t)
-Generated assembler @code{cv.sdotup.h}
-@end deftypefn
+Result value: Each element of @var{R} is obtained by shifting the
+corresponding element of @var{A} left by the number of bits specified by the
+corresponding element of @var{B}.
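A corresponding sketch for the 128-bit shift (same assumptions as above; the
helper name @code{shl128} is hypothetical, and it assumes the shift count is
effectively taken from the low-order bits of the element of @var{B}, as
described for @code{vec_sl}):

@smallexample
#include <altivec.h>

/* Sketch: shift the single 128-bit element left by n bits.  */
vector unsigned __int128
shl128 (vector unsigned __int128 a, unsigned int n)
@{
  vector unsigned __int128 shift = @{ n @};
  return vec_sl (a, shift);
@}
@end smallexample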
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_b (uint32_t, uint32_t, uint32_t) -Generated assembler @code{cv.sdotup.b} -@end deftypefn +@smallexample +@exdent vector unsigned __int128 vec_sr(vector unsigned __int128 @var{A}, vector unsigned __int128 @var{B}); +@exdent vector signed __int128 vec_sr(vector signed __int128 @var{A}, vector unsigned __int128 @var{B}); +@end smallexample -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_sc_h (uint32_t, uint16_t, uint32_t) -Generated assembler @code{cv.sdotup.sc.h} -@end deftypefn +Result value: Each element of @var{R} is obtained by shifting the corresponding element of +@var{A} right by the number of bits specified by the corresponding element of @var{B}. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_sc_h (uint32_t, uint6_t, uint32_t) -Generated assembler @code{cv.sdotup.sci.h} -@end deftypefn +@smallexample +@exdent vector unsigned __int128 vec_sra(vector unsigned __int128 @var{A}, vector unsigned __int128 @var{B}); +@exdent vector signed __int128 vec_sra(vector signed __int128 @var{A}, vector unsigned __int128 @var{B}); +@end smallexample -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_sc_b (uint32_t, uint8_t, uint32_t) -Generated assembler @code{cv.sdotup.sc.b} -@end deftypefn +Result value: Each element of @var{R} is obtained by arithmetic shifting the corresponding +element of @var{A} right by the number of bits specified by the corresponding element of @var{B}. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_sc_b (uint32_t, uint6_t, uint32_t) -Generated assembler @code{cv.sdotup.sci.b} -@end deftypefn +@smallexample +@exdent vector unsigned __int128 vec_mule (vector unsigned long long, + vector unsigned long long); +@exdent vector signed __int128 vec_mule (vector signed long long, + vector signed long long); +@end smallexample -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_h (uint32_t, uint32_t, uint32_t) -Generated assembler @code{cv.sdotusp.h} -@end deftypefn +Returns a vector containing a 128-bit integer result of multiplying the even +doubleword elements of the two inputs. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_b (uint32_t, uint32_t, uint32_t) -Generated assembler @code{cv.sdotusp.b} -@end deftypefn +@smallexample +@exdent vector unsigned __int128 vec_mulo (vector unsigned long long, + vector unsigned long long); +@exdent vector signed __int128 vec_mulo (vector signed long long, + vector signed long long); +@end smallexample -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_sc_h (uint32_t, int16_t, uint32_t) -Generated assembler @code{cv.sdotusp.sc.h} -@end deftypefn +Returns a vector containing a 128-bit integer result of multiplying the odd +doubleword elements of the two inputs. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_sc_h (uint32_t, int6_t, uint32_t) -Generated assembler @code{cv.sdotusp.sci.h} -@end deftypefn +@smallexample +@exdent vector unsigned __int128 vec_div (vector unsigned __int128, + vector unsigned __int128); +@exdent vector signed __int128 vec_div (vector signed __int128, + vector signed __int128); +@end smallexample -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_sc_b (uint32_t, int8_t, uint32_t) -Generated assembler @code{cv.sdotusp.sc.b} -@end deftypefn +Returns the result of dividing the first operand by the second operand. 
+An
+attempt to divide any value by zero or to divide the most negative signed
+128-bit integer by negative one results in an undefined value.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_sc_b (uint32_t, int6_t, uint32_t)
-Generated assembler @code{cv.sdotusp.sci.b}
-@end deftypefn
+@smallexample
+@exdent vector unsigned __int128 vec_dive (vector unsigned __int128,
+                                           vector unsigned __int128);
+@exdent vector signed __int128 vec_dive (vector signed __int128,
+                                         vector signed __int128);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_h (uint32_t, uint32_t, uint32_t)
-Generated assembler @code{cv.sdotsp.h}
-@end deftypefn
+The result is produced by shifting the first input left by 128 bits and
+dividing by the second. If an attempt is made to divide by zero or the result
+is larger than 128 bits, the result is undefined.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_b (uint32_t, uint32_t, uint32_t)
-Generated assembler @code{cv.sdotsp.b}
-@end deftypefn
+@smallexample
+@exdent vector unsigned __int128 vec_mod (vector unsigned __int128,
+                                          vector unsigned __int128);
+@exdent vector signed __int128 vec_mod (vector signed __int128,
+                                        vector signed __int128);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_sc_h (uint32_t, int16_t, uint32_t)
-Generated assembler @code{cv.sdotsp.sc.h}
-@end deftypefn
+The result is the remainder of dividing the first input by the second
+input.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_sc_h (uint32_t, int6_t, uint32_t)
-Generated assembler @code{cv.sdotsp.sci.h}
-@end deftypefn
+The following builtins perform 128-bit vector comparisons. The
+@code{vec_all_xx}, @code{vec_any_xx}, and @code{vec_cmpxx} functions, where
+@code{xx} is one of the operations @code{eq, ne, gt, lt, ge, le}, perform
+pairwise comparisons between the elements at the same positions within their
+two vector arguments. The @code{vec_all_xx} function returns a non-zero value
+if and only if all pairwise comparisons are true. The @code{vec_any_xx}
+function returns a non-zero value if and only if at least one pairwise
+comparison is true. The @code{vec_cmpxx} function returns a vector of the
+same type as its two arguments, within which each element consists of all
+ones if the specified logical comparison of the corresponding elements was
+true, and all zeros otherwise. A brief usage sketch appears below, followed
+by the full list of prototypes.
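A minimal sketch of the scalar-result form (assuming a Power10 target; the
function name @code{same128} is hypothetical):

@smallexample
#include <altivec.h>

/* Sketch: vec_all_eq yields a scalar truth value, while vec_cmpeq
   yields a per-element all-ones/all-zeros mask.  */
int
same128 (vector unsigned __int128 a, vector unsigned __int128 b)
@{
  return vec_all_eq (a, b);
@}
@end smallexample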
+
+@smallexample
+vector bool __int128 vec_cmpeq (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpeq (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpne (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpne (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpgt (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpgt (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmplt (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmplt (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmpge (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmpge (vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_cmple (vector signed __int128, vector signed __int128);
+vector bool __int128 vec_cmple (vector unsigned __int128, vector unsigned __int128);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_sc_b (uint32_t, int8_t, uint32_t)
-Generated assembler @code{cv.sdotsp.sc.b}
-@end deftypefn
+int vec_all_eq (vector signed __int128, vector signed __int128);
+int vec_all_eq (vector unsigned __int128, vector unsigned __int128);
+int vec_all_ne (vector signed __int128, vector signed __int128);
+int vec_all_ne (vector unsigned __int128, vector unsigned __int128);
+int vec_all_gt (vector signed __int128, vector signed __int128);
+int vec_all_gt (vector unsigned __int128, vector unsigned __int128);
+int vec_all_lt (vector signed __int128, vector signed __int128);
+int vec_all_lt (vector unsigned __int128, vector unsigned __int128);
+int vec_all_ge (vector signed __int128, vector signed __int128);
+int vec_all_ge (vector unsigned __int128, vector unsigned __int128);
+int vec_all_le (vector signed __int128, vector signed __int128);
+int vec_all_le (vector unsigned __int128, vector unsigned __int128);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_sc_b (uint32_t, int6_t, uint32_t)
-Generated assembler @code{cv.sdotsp.sci.b}
-@end deftypefn
+int vec_any_eq (vector signed __int128, vector signed __int128);
+int vec_any_eq (vector unsigned __int128, vector unsigned __int128);
+int vec_any_ne (vector signed __int128, vector signed __int128);
+int vec_any_ne (vector unsigned __int128, vector unsigned __int128);
+int vec_any_gt (vector signed __int128, vector signed __int128);
+int vec_any_gt (vector unsigned __int128, vector unsigned __int128);
+int vec_any_lt (vector signed __int128, vector signed __int128);
+int vec_any_lt (vector unsigned __int128, vector unsigned __int128);
+int vec_any_ge (vector signed __int128, vector signed __int128);
+int vec_any_ge (vector unsigned __int128, vector unsigned __int128);
+int vec_any_le (vector signed __int128, vector signed __int128);
+int vec_any_le (vector unsigned __int128, vector unsigned __int128);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_extract_h (uint32_t, uint6_t)
-Generated assembler @code{cv.extract.h}
-@end deftypefn
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_extract_b (uint32_t, uint6_t)
-Generated assembler @code{cv.extract.b}
-@end deftypefn
+The following instances are extensions of the existing overloaded built-ins
+@code{vec_sld}, @code{vec_sldw}, @code{vec_slo}, @code{vec_sro}, @code{vec_srl}
+that are documented in the PVIPR.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_extractu_h (uint32_t, uint6_t)
-Generated assembler @code{cv.extractu.h}
-@end deftypefn
+@smallexample
+@exdent vector signed __int128 vec_sld (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_sld (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
+@exdent vector signed __int128 vec_sldw (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_sldw (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
+@exdent vector signed __int128 vec_slo (vector signed __int128,
+vector signed char);
+@exdent vector signed __int128 vec_slo (vector signed __int128,
+vector unsigned char);
+@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
+vector signed char);
+@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
+vector unsigned char);
+@exdent vector signed __int128 vec_sro (vector signed __int128,
+vector signed char);
+@exdent vector signed __int128 vec_sro (vector signed __int128,
+vector unsigned char);
+@exdent vector unsigned __int128 vec_sro (vector unsigned __int128,
+vector signed char);
+@exdent vector unsigned __int128 vec_sro (vector unsigned __int128,
+vector unsigned char);
+@exdent vector signed __int128 vec_srl (vector signed __int128,
+vector unsigned char);
+@exdent vector unsigned __int128 vec_srl (vector unsigned __int128,
+vector unsigned char);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_extractu_b (uint32_t, uint6_t)
-Generated assembler @code{cv.extractu.b}
-@end deftypefn
+@node PowerPC Hardware Transactional Memory Built-in Functions
+@subsection PowerPC Hardware Transactional Memory Built-in Functions
+GCC provides two interfaces for accessing the Hardware Transactional
+Memory (HTM) instructions available on some of the PowerPC family
+of processors (e.g., POWER8). The two interfaces are a low-level
+interface, consisting of built-in functions specific to PowerPC, and a
+higher-level interface, consisting of inline functions that are common
+between PowerPC and S/390.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_insert_h (uint32_t, uint32_t)
-Generated assembler @code{cv.insert.h}
-@end deftypefn
+@subsubsection PowerPC HTM Low Level Built-in Functions
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_insert_b (uint32_t, uint32_t)
-Generated assembler @code{cv.insert.b}
-@end deftypefn
+The following low-level built-in functions are available with
+@option{-mhtm} or @option{-mcpu=CPU} where CPU is `power8' or later.
+They all generate the machine instruction that is part of the name.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shuffle_h (uint32_t, uint32_t)
-Generated assembler @code{cv.shuffle.h}
-@end deftypefn
+The HTM builtins (with the exception of @code{__builtin_tbegin}) return
+the full 4-bit condition register value set by their associated hardware
+instruction. The header file @code{htmintrin.h} defines some macros that can
+be used to decipher the return value. The @code{__builtin_tbegin} builtin
+returns a simple @code{true} or @code{false} value depending on whether a
+transaction was successfully started or not.
+The arguments of the builtins match exactly the
+type and order of the associated hardware instruction's operands, except for
+the @code{__builtin_tcheck} builtin, which does not take any input arguments.
+Refer to the ISA manual for a description of each instruction's operands.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shuffle_b (uint32_t, uint32_t)
-Generated assembler @code{cv.shuffle.b}
-@end deftypefn
+@smallexample
+unsigned int __builtin_tbegin (unsigned int);
+unsigned int __builtin_tend (unsigned int);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shuffle_sci_h (uint32_t, uint4_t)
-Generated assembler @code{cv.shuffle.sci.h}
-@end deftypefn
+unsigned int __builtin_tabort (unsigned int);
+unsigned int __builtin_tabortdc (unsigned int, unsigned int, unsigned int);
+unsigned int __builtin_tabortdci (unsigned int, unsigned int, int);
+unsigned int __builtin_tabortwc (unsigned int, unsigned int, unsigned int);
+unsigned int __builtin_tabortwci (unsigned int, unsigned int, int);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shufflei0_sci_b (uint32_t, uint4_t)
-Generated assembler @code{cv.shufflei0.sci.b}
-@end deftypefn
+unsigned int __builtin_tcheck (void);
+unsigned int __builtin_treclaim (unsigned int);
+unsigned int __builtin_trechkpt (void);
+unsigned int __builtin_tsr (unsigned int);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shufflei1_sci_b (uint32_t, uint4_t)
-Generated assembler @code{cv.shufflei1.sci.b}
-@end deftypefn
+In addition to the above HTM built-ins, we have added built-ins for
+some common extended mnemonics of the HTM instructions:
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shufflei2_sci_b (uint32_t, uint4_t)
-Generated assembler @code{cv.shufflei2.sci.b}
-@end deftypefn
+@smallexample
+unsigned int __builtin_tendall (void);
+unsigned int __builtin_tresume (void);
+unsigned int __builtin_tsuspend (void);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shufflei3_sci_b (uint32_t, uint4_t)
-Generated assembler @code{cv.shufflei3.sci.b}
-@end deftypefn
+Note that the semantics of the above HTM builtins are required to mimic
+the locking semantics used for critical sections. Builtins that are used
+to create a new transaction or restart a suspended transaction must have
+lock-acquisition-like semantics, while those builtins that end or suspend a
+transaction must have lock-release-like semantics. Specifically, this must
+mimic lock semantics as specified by C++11, for example: Lock acquisition is
+as-if an execution of __atomic_exchange_n(&globallock,1,__ATOMIC_ACQUIRE)
+that returns 0, and lock release is as-if an execution of
+__atomic_store(&globallock,0,__ATOMIC_RELEASE), with globallock being an
+implicit implementation-defined lock used for all transactions. The HTM
+instructions associated with the builtins inherently provide the
+correct acquisition and release hardware barriers required. However,
+the compiler must also be prohibited from moving loads and stores across
+the builtins in a way that would violate their semantics. This has been
+accomplished by adding memory barriers to the associated HTM instructions
+(which is a conservative approach to provide acquire and release semantics).
+Earlier versions of the compiler did not treat the HTM instructions as
+memory barriers.
+A @code{__TM_FENCE__} macro has been added, which can
+be used to determine whether the current compiler treats HTM instructions
+as memory barriers or not. This allows the user to explicitly add memory
+barriers to their code when using an older version of the compiler.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shuffle2_h (uint32_t, uint32_t, uint32_t)
-Generated assembler @code{cv.shuffle2.h}
-@end deftypefn
+The following set of built-in functions is available to gain access
+to the HTM-specific special purpose registers.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shuffle2_b (uint32_t, uint32_t, uint32_t)
-Generated assembler @code{cv.shuffle2.b}
-@end deftypefn
+@smallexample
+unsigned long __builtin_get_texasr (void);
+unsigned long __builtin_get_texasru (void);
+unsigned long __builtin_get_tfhar (void);
+unsigned long __builtin_get_tfiar (void);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_packlo_h (uint32_t, uint32_t)
-Generated assembler @code{cv.pack}
-@end deftypefn
+void __builtin_set_texasr (unsigned long);
+void __builtin_set_texasru (unsigned long);
+void __builtin_set_tfhar (unsigned long);
+void __builtin_set_tfiar (unsigned long);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_packhi_h (uint32_t, uint32_t)
-Generated assembler @code{cv.pack.h}
-@end deftypefn
+Example usage of these low-level built-in functions may look like:
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_packhi_b (uint32_t, uint32_t, uint32_t)
-Generated assembler @code{cv.packhi.b}
-@end deftypefn
+@smallexample
+#include <htmintrin.h>
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_packlo_b (uint32_t, uint32_t, uint32_t)
-Generated assembler @code{cv.packlo.b}
-@end deftypefn
+int num_retries = 10;
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_h (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpeq.h}
-@end deftypefn
+while (1)
+  @{
+    if (__builtin_tbegin (0))
+      @{
+        /* Transaction State Initiated. */
+        if (is_locked (lock))
+          __builtin_tabort (0);
+        ... transaction code...
+        __builtin_tend (0);
+        break;
+      @}
+    else
+      @{
+        /* Transaction State Failed. Use locks if the transaction
+           failure is "persistent" or we've tried too many times. */
+        if (num_retries-- <= 0
+            || _TEXASRU_FAILURE_PERSISTENT (__builtin_get_texasru ()))
+          @{
+            acquire_lock (lock);
+            ... non transactional fallback path...
+            release_lock (lock);
+            break;
+          @}
+      @}
+  @}
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_b (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpeq.b}
-@end deftypefn
+One final built-in function has been added that returns the value of
+the 2-bit Transaction State field of the Machine Status Register (MSR)
+as stored in @code{CR0}.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.cmpeq.sc.h}
-@end deftypefn
+@smallexample
+unsigned long __builtin_ttest (void)
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.cmpeq.sci.h}
-@end deftypefn
+This built-in can be used to determine the current transaction state
+using the following code example:
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.cmpeq.sc.b}
-@end deftypefn
+@smallexample
+#include <htmintrin.h>
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.cmpeq.sci.b}
-@end deftypefn
+unsigned char tx_state = _HTM_STATE (__builtin_ttest ());
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_h (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpne.h}
-@end deftypefn
+if (tx_state == _HTM_TRANSACTIONAL)
+  @{
+    /* Code to use in transactional state. */
+  @}
+else if (tx_state == _HTM_NONTRANSACTIONAL)
+  @{
+    /* Code to use in non-transactional state. */
+  @}
+else if (tx_state == _HTM_SUSPENDED)
+  @{
+    /* Code to use in transaction suspended state. */
+  @}
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_b (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpne.b}
-@end deftypefn
+@subsubsection PowerPC HTM High Level Inline Functions
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.cmpne.sc.h}
-@end deftypefn
+The following high-level HTM interface is made available by including
+@code{<htmxlintrin.h>} and using @option{-mhtm} or @option{-mcpu=CPU}
+where CPU is `power8' or later. This interface is common between PowerPC
+and S/390, allowing users to write one HTM source implementation that
+can be compiled and executed on either system.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.cmpne.sci.h}
-@end deftypefn
+@smallexample
+long __TM_simple_begin (void);
+long __TM_begin (void* const TM_buff);
+long __TM_end (void);
+void __TM_abort (void);
+void __TM_named_abort (unsigned char const code);
+void __TM_resume (void);
+void __TM_suspend (void);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.cmpne.sc.b}
-@end deftypefn
+long __TM_is_user_abort (void* const TM_buff);
+long __TM_is_named_user_abort (void* const TM_buff, unsigned char *code);
+long __TM_is_illegal (void* const TM_buff);
+long __TM_is_footprint_exceeded (void* const TM_buff);
+long __TM_nesting_depth (void* const TM_buff);
+long __TM_is_nested_too_deep(void* const TM_buff);
+long __TM_is_conflict(void* const TM_buff);
+long __TM_is_failure_persistent(void* const TM_buff);
+long __TM_failure_address(void* const TM_buff);
+long long __TM_failure_code(void* const TM_buff);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.cmpne.sci.b}
-@end deftypefn
+Using this common set of HTM inline functions, we can create
+a more portable version of the HTM example in the previous
+section that will work on either PowerPC or S/390:
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_h (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpgt.h}
-@end deftypefn
+@smallexample
+#include <htmxlintrin.h>
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_b (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpgt.b}
-@end deftypefn
+int num_retries = 10;
+TM_buff_type TM_buff;
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.cmpgt.sc.h}
-@end deftypefn
+while (1)
+  @{
+    if (__TM_begin (TM_buff) == _HTM_TBEGIN_STARTED)
+      @{
+        /* Transaction State Initiated. */
+        if (is_locked (lock))
+          __TM_abort ();
+        ... transaction code...
+        __TM_end ();
+        break;
+      @}
+    else
+      @{
+        /* Transaction State Failed. Use locks if the transaction
+           failure is "persistent" or we've tried too many times. */
+        if (num_retries-- <= 0
+            || __TM_is_failure_persistent (TM_buff))
+          @{
+            acquire_lock (lock);
+            ... non transactional fallback path...
+            release_lock (lock);
+            break;
+          @}
+      @}
+  @}
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.cmpgt.sci.h}
-@end deftypefn
+@node PowerPC Atomic Memory Operation Functions
+@subsection PowerPC Atomic Memory Operation Functions
+ISA 3.0 of the PowerPC added new atomic memory operation (amo)
+instructions. GCC provides support for these instructions in 64-bit
+environments. All of the functions are declared in the include file
+@code{amo.h}.
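As a quick illustration ahead of the full listing below, a minimal usage
sketch (the function name @code{bump} is hypothetical, and it assumes
@code{amo_lwat_add} returns the value loaded from memory before the
addition, per the load-and-operate semantics of the underlying instruction):

@smallexample
#include <stdint.h>
#include <amo.h>

/* Sketch: atomically add 1 to *counter and fetch the prior value.  */
uint32_t
bump (uint32_t *counter)
@{
  return amo_lwat_add (counter, 1);
@}
@end smallexample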
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.cmpgt.sc.b}
-@end deftypefn
+The functions supported are:
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.cmpgt.sci.b}
-@end deftypefn
+@smallexample
+#include <amo.h>
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_h (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpge.h}
-@end deftypefn
+uint32_t amo_lwat_add (uint32_t *, uint32_t);
+uint32_t amo_lwat_xor (uint32_t *, uint32_t);
+uint32_t amo_lwat_ior (uint32_t *, uint32_t);
+uint32_t amo_lwat_and (uint32_t *, uint32_t);
+uint32_t amo_lwat_umax (uint32_t *, uint32_t);
+uint32_t amo_lwat_umin (uint32_t *, uint32_t);
+uint32_t amo_lwat_swap (uint32_t *, uint32_t);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_b (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpge.b}
-@end deftypefn
+int32_t amo_lwat_sadd (int32_t *, int32_t);
+int32_t amo_lwat_smax (int32_t *, int32_t);
+int32_t amo_lwat_smin (int32_t *, int32_t);
+int32_t amo_lwat_sswap (int32_t *, int32_t);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_sc_h (uint32_t, int16_t)
-Generated assembler @code{cv.cmpge.sc.h}
-@end deftypefn
+uint64_t amo_ldat_add (uint64_t *, uint64_t);
+uint64_t amo_ldat_xor (uint64_t *, uint64_t);
+uint64_t amo_ldat_ior (uint64_t *, uint64_t);
+uint64_t amo_ldat_and (uint64_t *, uint64_t);
+uint64_t amo_ldat_umax (uint64_t *, uint64_t);
+uint64_t amo_ldat_umin (uint64_t *, uint64_t);
+uint64_t amo_ldat_swap (uint64_t *, uint64_t);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_sc_h (uint32_t, int6_t)
-Generated assembler @code{cv.cmpge.sci.h}
-@end deftypefn
+int64_t amo_ldat_sadd (int64_t *, int64_t);
+int64_t amo_ldat_smax (int64_t *, int64_t);
+int64_t amo_ldat_smin (int64_t *, int64_t);
+int64_t amo_ldat_sswap (int64_t *, int64_t);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_sc_b (uint32_t, int8_t)
-Generated assembler @code{cv.cmpge.sc.b}
-@end deftypefn
+void amo_stwat_add (uint32_t *, uint32_t);
+void amo_stwat_xor (uint32_t *, uint32_t);
+void amo_stwat_ior (uint32_t *, uint32_t);
+void amo_stwat_and (uint32_t *, uint32_t);
+void amo_stwat_umax (uint32_t *, uint32_t);
+void amo_stwat_umin (uint32_t *, uint32_t);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_sc_b (uint32_t, int6_t)
-Generated assembler @code{cv.cmpge.sci.b}
-@end deftypefn
+void amo_stwat_sadd (int32_t *, int32_t);
+void amo_stwat_smax (int32_t *, int32_t);
+void amo_stwat_smin (int32_t *, int32_t);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmplt_h (uint32_t, uint32_t)
-Generated assembler @code{cv.cmplt.h}
-@end deftypefn
+void amo_stdat_add (uint64_t *, uint64_t);
+void amo_stdat_xor (uint64_t *, uint64_t);
+void amo_stdat_ior (uint64_t *, uint64_t);
+void amo_stdat_and (uint64_t *, uint64_t);
+void amo_stdat_umax (uint64_t *, uint64_t);
+void amo_stdat_umin (uint64_t *, uint64_t);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmplt_b (uint32_t, uint32_t)
-Generated assembler @code{cv.cmplt.b}
-@end deftypefn
+void amo_stdat_sadd (int64_t *, int64_t);
+void amo_stdat_smax (int64_t *, int64_t);
+void amo_stdat_smin (int64_t *, int64_t);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmplt_sc_h (uint32_t, int16_t)
-Generated assembler
@code{cv.cmplt.sc.h} -@end deftypefn +@node PowerPC Matrix-Multiply Assist Built-in Functions +@subsection PowerPC Matrix-Multiply Assist Built-in Functions +ISA 3.1 of the PowerPC added new Matrix-Multiply Assist (MMA) instructions. +GCC provides support for these instructions through the following built-in +functions which are enabled with the @code{-mmma} option. The vec_t type +below is defined to be a normal vector unsigned char type. The uint2, uint4 +and uint8 parameters are 2-bit, 4-bit and 8-bit unsigned integer constants +respectively. The compiler will verify that they are constants and that +their values are within range. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmplt_sc_h (uint32_t, int6_t) -Generated assembler @code{cv.cmplt.sci.h} -@end deftypefn +The built-in functions supported are: -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmplt_sc_b (uint32_t, int8_t) -Generated assembler @code{cv.cmplt.sc.b} -@end deftypefn +@smallexample +void __builtin_mma_xvi4ger8 (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvi8ger4 (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvi16ger2 (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvi16ger2s (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvf16ger2 (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvbf16ger2 (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvf32ger (__vector_quad *, vec_t, vec_t); -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmplt_sc_b (uint32_t, int6_t) -Generated assembler @code{cv.cmplt.sci.b} -@end deftypefn +void __builtin_mma_xvi4ger8pp (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvi8ger4pp (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvi8ger4spp(__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvi16ger2pp (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvi16ger2spp (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvf16ger2pp (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvf16ger2pn (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvf16ger2np (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvf16ger2nn (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvbf16ger2pp (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvbf16ger2pn (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvbf16ger2np (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvbf16ger2nn (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvf32gerpp (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvf32gerpn (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvf32gernp (__vector_quad *, vec_t, vec_t); +void __builtin_mma_xvf32gernn (__vector_quad *, vec_t, vec_t); -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_h (uint32_t, uint32_t) -Generated assembler @code{cv.cmple.h} -@end deftypefn +void __builtin_mma_pmxvi4ger8 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint8); +void __builtin_mma_pmxvi4ger8pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint8); -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_b (uint32_t, uint32_t) -Generated assembler @code{cv.cmple.b} -@end deftypefn +void __builtin_mma_pmxvi8ger4 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4); +void __builtin_mma_pmxvi8ger4pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4); +void __builtin_mma_pmxvi8ger4spp(__vector_quad *, vec_t, vec_t, uint4, uint4, uint4); -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_sc_h (uint32_t, int16_t) -Generated 
assembler @code{cv.cmple.sc.h} -@end deftypefn +void __builtin_mma_pmxvi16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvi16ger2s (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvf16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvbf16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_sc_h (uint32_t, int6_t) -Generated assembler @code{cv.cmple.sci.h} -@end deftypefn +void __builtin_mma_pmxvi16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvi16ger2spp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvf16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvf16ger2pn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvf16ger2np (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvf16ger2nn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvbf16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvbf16ger2pn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvbf16ger2np (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); +void __builtin_mma_pmxvbf16ger2nn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2); -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_sc_b (uint32_t, int8_t) -Generated assembler @code{cv.cmple.sc.b} -@end deftypefn +void __builtin_mma_pmxvf32ger (__vector_quad *, vec_t, vec_t, uint4, uint4); +void __builtin_mma_pmxvf32gerpp (__vector_quad *, vec_t, vec_t, uint4, uint4); +void __builtin_mma_pmxvf32gerpn (__vector_quad *, vec_t, vec_t, uint4, uint4); +void __builtin_mma_pmxvf32gernp (__vector_quad *, vec_t, vec_t, uint4, uint4); +void __builtin_mma_pmxvf32gernn (__vector_quad *, vec_t, vec_t, uint4, uint4); -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_sc_b (uint32_t, int6_t) -Generated assembler @code{cv.cmple.sci.b} -@end deftypefn +void __builtin_mma_xvf64ger (__vector_quad *, __vector_pair, vec_t); +void __builtin_mma_xvf64gerpp (__vector_quad *, __vector_pair, vec_t); +void __builtin_mma_xvf64gerpn (__vector_quad *, __vector_pair, vec_t); +void __builtin_mma_xvf64gernp (__vector_quad *, __vector_pair, vec_t); +void __builtin_mma_xvf64gernn (__vector_quad *, __vector_pair, vec_t); -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_h (uint32_t, uint32_t) -Generated assembler @code{cv.cmpgtu.h} -@end deftypefn +void __builtin_mma_pmxvf64ger (__vector_quad *, __vector_pair, vec_t, uint4, uint2); +void __builtin_mma_pmxvf64gerpp (__vector_quad *, __vector_pair, vec_t, uint4, uint2); +void __builtin_mma_pmxvf64gerpn (__vector_quad *, __vector_pair, vec_t, uint4, uint2); +void __builtin_mma_pmxvf64gernp (__vector_quad *, __vector_pair, vec_t, uint4, uint2); +void __builtin_mma_pmxvf64gernn (__vector_quad *, __vector_pair, vec_t, uint4, uint2); -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_b (uint32_t, uint32_t) -Generated assembler @code{cv.cmpgtu.b} -@end deftypefn +void __builtin_mma_xxmtacc (__vector_quad *); +void __builtin_mma_xxmfacc (__vector_quad *); +void __builtin_mma_xxsetaccz (__vector_quad *); -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_sc_h (uint32_t, uint16_t) -Generated assembler @code{cv.cmpgtu.sc.h} -@end deftypefn +void 
+__builtin_mma_build_acc (__vector_quad *, vec_t, vec_t, vec_t, vec_t);
+void __builtin_mma_disassemble_acc (void *, __vector_quad *);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_sc_h (uint32_t, uint6_t)
-Generated assembler @code{cv.cmpgtu.sci.h}
-@end deftypefn
+void __builtin_vsx_build_pair (__vector_pair *, vec_t, vec_t);
+void __builtin_vsx_disassemble_pair (void *, __vector_pair *);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_sc_b (uint32_t, uint8_t)
-Generated assembler @code{cv.cmpgtu.sc.b}
-@end deftypefn
+vec_t __builtin_vsx_xvcvspbf16 (vec_t);
+vec_t __builtin_vsx_xvcvbf16spn (vec_t);
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_sc_b (uint32_t, uint6_t)
-Generated assembler @code{cv.cmpgtu.sci.b}
-@end deftypefn
+__vector_pair __builtin_vsx_lxvp (size_t, __vector_pair *);
+void __builtin_vsx_stxvp (__vector_pair, size_t, __vector_pair *);
+@end smallexample
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_h (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpgeu.h}
-@end deftypefn
+@node PRU Built-in Functions
+@subsection PRU Built-in Functions
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_b (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpgeu.b}
-@end deftypefn
+GCC provides a few special built-in functions to aid in utilizing
+special PRU instructions.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_sc_h (uint32_t, uint16_t)
-Generated assembler @code{cv.cmpgeu.sc.h}
-@end deftypefn
+The built-in functions supported are:
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_sc_h (uint32_t, uint6_t)
-Generated assembler @code{cv.cmpgeu.sci.h}
-@end deftypefn
+@defbuiltin{void __delay_cycles (constant long long @var{cycles})}
+This inserts an instruction sequence that takes exactly @var{cycles}
+cycles (between 0 and 0xffffffff) to complete. The inserted sequence
+may use jumps, loops, or no-ops, and does not interfere with any other
+instructions. Note that @var{cycles} must be a compile-time constant
+integer; that is, you must pass a number, not a variable that may be
+optimized to a constant later. The number of cycles delayed by this
+built-in is exact.
+@enddefbuiltin
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_sc_b (uint32_t, uint8_t)
-Generated assembler @code{cv.cmpgeu.sc.b}
-@end deftypefn
+@defbuiltin{void __halt (void)}
+This inserts a HALT instruction to stop processor execution.
+@enddefbuiltin
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_sc_b (uint32_t, uint6_t)
-Generated assembler @code{cv.cmpgeu.sci.b}
-@end deftypefn
+@defbuiltin{{unsigned int} @
+            __lmbd (unsigned int @var{wordval}, @
+                    unsigned int @var{bitval})}
+This inserts the LMBD instruction to calculate the left-most bit with value
+@var{bitval} in value @var{wordval}. Only the least significant bit
+of @var{bitval} is taken into account.
+@enddefbuiltin
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_h (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpltu.h}
-@end deftypefn
+@node RISC-V Built-in Functions
+@subsection RISC-V Built-in Functions
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_b (uint32_t, uint32_t)
-Generated assembler @code{cv.cmpltu.b}
-@end deftypefn
+These built-in functions are available for the RISC-V family of
+processors.
-@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_sc_h (uint32_t, uint16_t) -Generated assembler @code{cv.cmpltu.sc.h} -@end deftypefn +@defbuiltin{{void *} __builtin_thread_pointer (void)} +Returns the value that is currently set in the @samp{tp} register. +@enddefbuiltin -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_sc_h (uint32_t, uint6_t) -Generated assembler @code{cv.cmpltu.sci.h} -@end deftypefn +@defbuiltin{void __builtin_riscv_pause (void)} +Generates the @code{pause} (hint) machine instruction. If the target implements +the Zihintpause extension, it indicates that the current hart should be +temporarily paused or slowed down. +@enddefbuiltin -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_sc_b (uint32_t, uint8_t) -Generated assembler @code{cv.cmpltu.sc.b} -@end deftypefn +@node RISC-V Vector Intrinsics +@subsection RISC-V Vector Intrinsics -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_sc_b (uint32_t, uint6_t) -Generated assembler @code{cv.cmpltu.sci.b} -@end deftypefn +GCC supports vector intrinsics as specified in version 0.11 of the RISC-V +vector intrinsic specification, which is available at the following link: +@uref{https://github.com/riscv-non-isa/rvv-intrinsic-doc/tree/v0.11.x}. +All of these functions are declared in the include file @file{riscv_vector.h}. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_h (uint32_t, uint32_t) -Generated assembler @code{cv.cmpleu.h} -@end deftypefn +@node CORE-V Built-in Functions +@subsection CORE-V Built-in Functions +For more information on all CORE-V built-ins, please see +@uref{https://github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md} -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_b (uint32_t, uint32_t) -Generated assembler @code{cv.cmpleu.b} -@end deftypefn +These built-in functions are available for the CORE-V MAC machine +architecture. For more information on CORE-V built-ins, please see +@uref{https://github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md#listing-of-multiply-accumulate-builtins-xcvmac}. -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_sc_h (uint32_t, uint16_t) -Generated assembler @code{cv.cmpleu.sc.h} +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_mac (int32_t, int32_t, int32_t) +Generated assembler @code{cv.mac} @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_sc_h (uint32_t, uint6_t) -Generated assembler @code{cv.cmpleu.sci.h} +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_msu (int32_t, int32_t, int32_t) +Generates the @code{cv.msu} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_sc_b (uint32_t, uint8_t) -Generated assembler @code{cv.cmpleu.sc.b} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_muluN (uint32_t, uint32_t, uint8_t) +Generates the @code{cv.muluN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_sc_b (uint32_t, uint6_t) -Generated assembler @code{cv.cmpleu.sci.b} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_mulhhuN (uint32_t, uint32_t, uint8_t) +Generates the @code{cv.mulhhuN} machine instruction. 
@end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_r (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.cplxmul.r} +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_mulsN (int32_t, int32_t, uint8_t) +Generates the @code{cv.mulsN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_i (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.cplxmul.i} +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_mulhhsN (int32_t, int32_t, uint8_t) +Generates the @code{cv.mulhhsN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_r (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.cplxmul.r.div2} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_muluRN (uint32_t, uint32_t, uint8_t) +Generates the @code{cv.muluRN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_i (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.cplxmul.i.div2} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_mulhhuRN (uint32_t, uint32_t, uint8_t) +Generates the @code{cv.mulhhuRN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_r (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.cplxmul.r.div4} +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_mulsRN (int32_t, int32_t, uint8_t) +Generates the @code{cv.mulsRN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_i (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.cplxmul.i.div4} +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_mulhhsRN (int32_t, int32_t, uint8_t) +Generates the @code{cv.mulhhsRN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_r (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.cplxmul.r.div8} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_macuN (uint32_t, uint32_t, uint8_t) +Generates the @code{cv.macuN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_i (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.cplxmul.i.div8} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_machhuN (uint32_t, uint32_t, uint8_t) +Generates the @code{cv.machhuN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxconj (uint32_t) -Generated assembler @code{cv.cplxconj} +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_macsN (int32_t, int32_t, uint8_t) +Generates the @code{cv.macsN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_subrotmj (uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.subrotmj} +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_machhsN (int32_t, int32_t, uint8_t) +Generates the @code{cv.machhsN} machine instruction. 
@end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_subrotmj (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.subrotmj.div2} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_macuRN (uint32_t, uint32_t, uint8_t) +Generates the @code{cv.macuRN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_subrotmj (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.subrotmj.div4} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_mac_machhuRN (uint32_t, uint32_t, uint8_t) +Generates the @code{cv.machhuRN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_subrotmj (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.subrotmj.div8} +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_macsRN (int32_t, int32_t, uint8_t) +Generates the @code{cv.macsRN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_h (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.add.div2} +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_mac_machhsRN (int32_t, int32_t, uint8_t) +Generates the @code{cv.machhsRN} machine instruction. @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_h (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.add.div4} -@end deftypefn +These built-in functions are available for the CORE-V ALU machine +architecture. For more information on CORE-V built-ins, please see +@uref{https://github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md#listing-of-miscellaneous-alu-builtins-xcvalu} -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_h (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.add.div8} +@deftypefn {Built-in Function} {int} __builtin_riscv_cv_alu_slet (int32_t, int32_t) +Generated assembler @code{cv.slet} @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_h (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.sub.div2} +@deftypefn {Built-in Function} {int} __builtin_riscv_cv_alu_sletu (uint32_t, uint32_t) +Generated assembler @code{cv.sletu} @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_h (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.sub.div4} +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_min (int32_t, int32_t) +Generated assembler @code{cv.min} @end deftypefn -@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_h (uint32_t, uint32_t, uint32_t, uint4_t) -Generated assembler @code{cv.sub.div8} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_minu (uint32_t, uint32_t) +Generated assembler @code{cv.minu} @end deftypefn -@node RX Built-in Functions -@subsection RX Built-in Functions -GCC supports some of the RX instructions which cannot be expressed in -the C programming language via the use of built-in functions. The -following functions are supported: - -@defbuiltin{void __builtin_rx_brk (void)} -Generates the @code{brk} machine instruction. 
-@enddefbuiltin
+@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_max (int32_t, int32_t)
+Generated assembler @code{cv.max}
+@end deftypefn
-@defbuiltin{void __builtin_rx_clrpsw (int)}
-Generates the @code{clrpsw} machine instruction to clear the specified
-bit in the processor status word.
-@enddefbuiltin
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_maxu (uint32_t, uint32_t)
+Generated assembler @code{cv.maxu}
+@end deftypefn
-@defbuiltin{void __builtin_rx_int (int)}
-Generates the @code{int} machine instruction to generate an interrupt
-with the specified value.
-@enddefbuiltin
+@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_exths (int16_t)
+Generated assembler @code{cv.exths}
+@end deftypefn
-@defbuiltin{void __builtin_rx_machi (int, int)}
-Generates the @code{machi} machine instruction to add the result of
-multiplying the top 16 bits of the two arguments into the
-accumulator.
-@enddefbuiltin
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_exthz (uint16_t)
+Generated assembler @code{cv.exthz}
+@end deftypefn
-@defbuiltin{void __builtin_rx_maclo (int, int)}
-Generates the @code{maclo} machine instruction to add the result of
-multiplying the bottom 16 bits of the two arguments into the
-accumulator.
-@enddefbuiltin
+@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_extbs (int8_t)
+Generated assembler @code{cv.extbs}
+@end deftypefn
-@defbuiltin{void __builtin_rx_mulhi (int, int)}
-Generates the @code{mulhi} machine instruction to place the result of
-multiplying the top 16 bits of the two arguments into the
-accumulator.
-@enddefbuiltin
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_extbz (uint8_t)
+Generated assembler @code{cv.extbz}
+@end deftypefn
-@defbuiltin{void __builtin_rx_mullo (int, int)}
-Generates the @code{mullo} machine instruction to place the result of
-multiplying the bottom 16 bits of the two arguments into the
-accumulator.
-@enddefbuiltin
+@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_clip (int32_t, uint32_t)
+Generated assembler @code{cv.clip} if the uint32_t operand is a constant and an exact power of 2.
+Generated assembler @code{cv.clipr} if it is a register.
+@end deftypefn
-@defbuiltin{int __builtin_rx_mvfachi (void)}
-Generates the @code{mvfachi} machine instruction to read the top
-32 bits of the accumulator.
-@enddefbuiltin
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_clipu (uint32_t, uint32_t)
+Generated assembler @code{cv.clipu} if the uint32_t operand is a constant and an exact power of 2.
+Generated assembler @code{cv.clipur} if it is a register.
+@end deftypefn
-@defbuiltin{int __builtin_rx_mvfacmi (void)}
-Generates the @code{mvfacmi} machine instruction to read the middle
-32 bits of the accumulator.
-@enddefbuiltin
+@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_addN (int32_t, int32_t, uint8_t)
+Generated assembler @code{cv.addN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31.
+Generated assembler @code{cv.addNr} if it is a register.
+@end deftypefn
-@defbuiltin{int __builtin_rx_mvfc (int)}
-Generates the @code{mvfc} machine instruction which reads the control
-register specified in its argument and returns its value.
-@enddefbuiltin
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_adduN (uint32_t, uint32_t, uint8_t)
+Generated assembler @code{cv.adduN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31.
+Generated assembler @code{cv.adduNr} if the it is a register. +@end deftypefn -@defbuiltin{void __builtin_rx_mvtachi (int)} -Generates the @code{mvtachi} machine instruction to set the top -32 bits of the accumulator. -@enddefbuiltin +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_addRN (int32_t, int32_t, uint8_t) +Generated assembler @code{cv.addRN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31. +Generated assembler @code{cv.addRNr} if the it is a register. +@end deftypefn -@defbuiltin{void __builtin_rx_mvtaclo (int)} -Generates the @code{mvtaclo} machine instruction to set the bottom -32 bits of the accumulator. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_adduRN (uint32_t, uint32_t, uint8_t) +Generated assembler @code{cv.adduRN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31. +Generated assembler @code{cv.adduRNr} if the it is a register. +@end deftypefn -@defbuiltin{void __builtin_rx_mvtc (int @var{reg}, int @var{val})} -Generates the @code{mvtc} machine instruction which sets control -register number @code{reg} to @code{val}. -@enddefbuiltin +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_subN (int32_t, int32_t, uint8_t) +Generated assembler @code{cv.subN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31. +Generated assembler @code{cv.subNr} if the it is a register. +@end deftypefn -@defbuiltin{void __builtin_rx_mvtipl (int)} -Generates the @code{mvtipl} machine instruction set the interrupt -priority level. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_subuN (uint32_t, uint32_t, uint8_t) +Generated assembler @code{cv.subuN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31. +Generated assembler @code{cv.subuNr} if the it is a register. +@end deftypefn -@defbuiltin{void __builtin_rx_racw (int)} -Generates the @code{racw} machine instruction to round the accumulator -according to the specified mode. -@enddefbuiltin +@deftypefn {Built-in Function} {int32_t} __builtin_riscv_cv_alu_subRN (int32_t, int32_t, uint8_t) +Generated assembler @code{cv.subRN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31. +Generated assembler @code{cv.subRNr} if the it is a register. +@end deftypefn -@defbuiltin{int __builtin_rx_revw (int)} -Generates the @code{revw} machine instruction which swaps the bytes in -the argument so that bits 0--7 now occupy bits 8--15 and vice versa, -and also bits 16--23 occupy bits 24--31 and vice versa. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_alu_subuRN (uint32_t, uint32_t, uint8_t) +Generated assembler @code{cv.subuRN} if the uint8_t operand is a constant and in the range 0 <= shft <= 31. +Generated assembler @code{cv.subuRNr} if the it is a register. +@end deftypefn -@defbuiltin{void __builtin_rx_rmpa (void)} -Generates the @code{rmpa} machine instruction which initiates a -repeated multiply and accumulate sequence. -@enddefbuiltin +These built-in functions are available for the CORE-V Event Load machine +architecture. 
For more information on CORE-V ELW builtins, please see +@uref{https://github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md#listing-of-event-load-word-builtins-xcvelw} -@defbuiltin{void __builtin_rx_round (float)} -Generates the @code{round} machine instruction which returns the -floating-point argument rounded according to the current rounding mode -set in the floating-point status word register. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_elw_elw (uint32_t *) +Generated assembler @code{cv.elw} +@end deftypefn -@defbuiltin{int __builtin_rx_sat (int)} -Generates the @code{sat} machine instruction which returns the -saturated value of the argument. -@enddefbuiltin +These built-in functions are available for the CORE-V SIMD machine +architecture. For more information on CORE-V SIMD built-ins, please see +@uref{https://github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md#listing-of-pulp-816-bit-simd-builtins-xcvsimd} -@defbuiltin{void __builtin_rx_setpsw (int)} -Generates the @code{setpsw} machine instruction to set the specified -bit in the processor status word. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_h (uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.add.h} +@end deftypefn -@defbuiltin{void __builtin_rx_wait (void)} -Generates the @code{wait} machine instruction. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_b (uint32_t, uint32_t) +Generated assembler @code{cv.add.b} +@end deftypefn -@node S/390 System z Built-in Functions -@subsection S/390 System z Built-in Functions -@defbuiltin{int __builtin_tbegin (void*)} -Generates the @code{tbegin} machine instruction starting a -non-constrained hardware transaction. If the parameter is non-NULL the -memory area is used to store the transaction diagnostic buffer and -will be passed as first operand to @code{tbegin}. This buffer can be -defined using the @code{struct __htm_tdb} C struct defined in -@code{htmintrin.h} and must reside on a double-word boundary. The -second tbegin operand is set to @code{0xff0c}. This enables -save/restore of all GPRs and disables aborts for FPR and AR -manipulations inside the transaction body. The condition code set by -the tbegin instruction is returned as integer value. The tbegin -instruction by definition overwrites the content of all FPRs. The -compiler will generate code which saves and restores the FPRs. For -soft-float code it is recommended to used the @code{*_nofloat} -variant. In order to prevent a TDB from being written it is required -to pass a constant zero value as parameter. Passing a zero value -through a variable is not sufficient. Although modifications of -access registers inside the transaction will not trigger an -transaction abort it is not supported to actually modify them. Access -registers do not get saved when entering a transaction. They will have -undefined state when reaching the abort code. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.add.sc.h} +@end deftypefn -Macros for the possible return codes of tbegin are defined in the -@code{htmintrin.h} header file: +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.add.sci.h} +@end deftypefn -@defmac _HTM_TBEGIN_STARTED -@code{tbegin} has been executed as part of normal processing. 
The -transaction body is supposed to be executed. -@end defmac +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.add.sc.b} +@end deftypefn -@defmac _HTM_TBEGIN_INDETERMINATE -The transaction was aborted due to an indeterminate condition which -might be persistent. -@end defmac +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.add.sci.b} +@end deftypefn -@defmac _HTM_TBEGIN_TRANSIENT -The transaction aborted due to a transient failure. The transaction -should be re-executed in that case. -@end defmac +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_h (uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.sub.h} +@end deftypefn -@defmac _HTM_TBEGIN_PERSISTENT -The transaction aborted due to a persistent failure. Re-execution -under same circumstances will not be productive. -@end defmac +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_b (uint32_t, uint32_t) +Generated assembler @code{cv.sub.b} +@end deftypefn -@defmac _HTM_FIRST_USER_ABORT_CODE -The @code{_HTM_FIRST_USER_ABORT_CODE} defined in @code{htmintrin.h} -specifies the first abort code which can be used for -@code{__builtin_tabort}. Values below this threshold are reserved for -machine use. -@end defmac +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.sub.sc.h} +@end deftypefn -@deftp {Data type} {struct __htm_tdb} -The @code{struct __htm_tdb} defined in @code{htmintrin.h} describes -the structure of the transaction diagnostic block as specified in the -Principles of Operation manual chapter 5-91. -@end deftp +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.sub.sci.h} +@end deftypefn -@defbuiltin{int __builtin_tbegin_nofloat (void*)} -Same as @code{__builtin_tbegin} but without FPR saves and restores. -Using this variant in code making use of FPRs will leave the FPRs in -undefined state when entering the transaction abort handler code. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.sub.sc.b} +@end deftypefn -@defbuiltin{int __builtin_tbegin_retry (void*, int)} -In addition to @code{__builtin_tbegin} a loop for transient failures -is generated. If tbegin returns a condition code of 2 the transaction -will be retried as often as specified in the second argument. The -perform processor assist instruction is used to tell the CPU about the -number of fails so far. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.sub.sci.b} +@end deftypefn -@defbuiltin{int __builtin_tbegin_retry_nofloat (void*, int)} -Same as @code{__builtin_tbegin_retry} but without FPR saves and -restores. Using this variant in code making use of FPRs will leave -the FPRs in undefined state when entering the transaction abort -handler code. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_h (uint32_t, uint32_t) +Generated assembler @code{cv.avg.h} +@end deftypefn -@defbuiltin{void __builtin_tbeginc (void)} -Generates the @code{tbeginc} machine instruction starting a constrained -hardware transaction. The second operand is set to @code{0xff08}. 
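+
+The element-wise add, subtract and average built-ins above all follow
+the same calling pattern.  A minimal sketch using the halfword
+average, assuming @code{cv.avg.h} averages the two 16-bit lanes as
+described in the specification linked above:
+
+@smallexample
+#include <stdint.h>
+
+/* Average two pairs of packed 16-bit samples; emits cv.avg.h.  */
+uint32_t
+mix_samples (uint32_t a, uint32_t b)
+@{
+  return __builtin_riscv_cv_simd_avg_h (a, b);
+@}
+@end smallexample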
-@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_b (uint32_t, uint32_t) +Generated assembler @code{cv.avg.b} +@end deftypefn -@defbuiltin{int __builtin_tend (void)} -Generates the @code{tend} machine instruction finishing a transaction -and making the changes visible to other threads. The condition code -generated by tend is returned as integer value. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.avg.sc.h} +@end deftypefn -@defbuiltin{void __builtin_tabort (int)} -Generates the @code{tabort} machine instruction with the specified -abort code. Abort codes from 0 through 255 are reserved and will -result in an error message. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.avg.sci.h} +@end deftypefn -@defbuiltin{void __builtin_tx_assist (int)} -Generates the @code{ppa rX,rY,1} machine instruction. Where the -integer parameter is loaded into rX and a value of zero is loaded into -rY. The integer parameter specifies the number of times the -transaction repeatedly aborted. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.avg.sc.b} +@end deftypefn -@defbuiltin{int __builtin_tx_nesting_depth (void)} -Generates the @code{etnd} machine instruction. The current nesting -depth is returned as integer value. For a nesting depth of 0 the code -is not executed as part of an transaction. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avg_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.avg.sci.b} +@end deftypefn -@defbuiltin{void __builtin_non_tx_store (uint64_t *, uint64_t)} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_h (uint32_t, uint32_t) +Generated assembler @code{cv.avgu.h} +@end deftypefn -Generates the @code{ntstg} machine instruction. The second argument -is written to the first arguments location. The store operation will -not be rolled-back in case of an transaction abort. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_b (uint32_t, uint32_t) +Generated assembler @code{cv.avgu.b} +@end deftypefn -@node SH Built-in Functions -@subsection SH Built-in Functions -The following built-in functions are supported on the SH1, SH2, SH3 and SH4 -families of processors: +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_sc_h (uint32_t, uint16_t) +Generated assembler @code{cv.avgu.sc.h} +@end deftypefn -@defbuiltin{{void} __builtin_set_thread_pointer (void *@var{ptr})} -Sets the @samp{GBR} register to the specified value @var{ptr}. This is usually -used by system code that manages threads and execution contexts. The compiler -normally does not generate code that modifies the contents of @samp{GBR} and -thus the value is preserved across function calls. Changing the @samp{GBR} -value in user code must be done with caution, since the compiler might use -@samp{GBR} in order to access thread local variables. 
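+
+A minimal sketch of the unsigned averaging form, assuming
+@code{cv.avgu.b} averages each pair of 8-bit lanes as described in
+the specification linked above:
+
+@smallexample
+#include <stdint.h>
+
+/* Blend two packed 8-bit pixel values lane by lane; emits
+   cv.avgu.b.  */
+uint32_t
+blend_pixels (uint32_t a, uint32_t b)
+@{
+  return __builtin_riscv_cv_simd_avgu_b (a, b);
+@}
+@end smallexample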
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_sc_h (uint32_t, uint6_t) +Generated assembler @code{cv.avgu.sci.h} +@end deftypefn -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_sc_b (uint32_t, uint8_t) +Generated assembler @code{cv.avgu.sc.b} +@end deftypefn -@defbuiltin{{void *} __builtin_thread_pointer (void)} -Returns the value that is currently set in the @samp{GBR} register. -Memory loads and stores that use the thread pointer as a base address are -turned into @samp{GBR} based displacement loads and stores, if possible. -For example: -@smallexample -struct my_tcb -@{ - int a, b, c, d, e; -@}; +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_avgu_sc_b (uint32_t, uint6_t) +Generated assembler @code{cv.avgu.sci.b} +@end deftypefn -int get_tcb_value (void) -@{ - // Generate @samp{mov.l @@(8,gbr),r0} instruction - return ((my_tcb*)__builtin_thread_pointer ())->c; -@} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_h (uint32_t, uint32_t) +Generated assembler @code{cv.min.h} +@end deftypefn -@end smallexample -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_b (uint32_t, uint32_t) +Generated assembler @code{cv.min.b} +@end deftypefn -@defbuiltin{{unsigned int} __builtin_sh_get_fpscr (void)} -Returns the value that is currently set in the @samp{FPSCR} register. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.min.sc.h} +@end deftypefn -@defbuiltin{{void} __builtin_sh_set_fpscr (unsigned int @var{val})} -Sets the @samp{FPSCR} register to the specified value @var{val}, while -preserving the current values of the FR, SZ and PR bits. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.min.sci.h} +@end deftypefn -@node SPARC VIS Built-in Functions -@subsection SPARC VIS Built-in Functions +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.min.sc.b} +@end deftypefn -GCC supports SIMD operations on the SPARC using both the generic vector -extensions (@pxref{Vector Extensions}) as well as built-in functions for -the SPARC Visual Instruction Set (VIS). 
When you use the @option{-mvis} -switch, the VIS extension is exposed as the following built-in functions: +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_min_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.min.sci.b} +@end deftypefn -@smallexample -typedef int v1si __attribute__ ((vector_size (4))); -typedef int v2si __attribute__ ((vector_size (8))); -typedef short v4hi __attribute__ ((vector_size (8))); -typedef short v2hi __attribute__ ((vector_size (4))); -typedef unsigned char v8qi __attribute__ ((vector_size (8))); -typedef unsigned char v4qi __attribute__ ((vector_size (4))); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_h (uint32_t, uint32_t) +Generated assembler @code{cv.minu.h} +@end deftypefn -void __builtin_vis_write_gsr (int64_t); -int64_t __builtin_vis_read_gsr (void); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_b (uint32_t, uint32_t) +Generated assembler @code{cv.minu.b} +@end deftypefn -void * __builtin_vis_alignaddr (void *, long); -void * __builtin_vis_alignaddrl (void *, long); -int64_t __builtin_vis_faligndatadi (int64_t, int64_t); -v2si __builtin_vis_faligndatav2si (v2si, v2si); -v4hi __builtin_vis_faligndatav4hi (v4si, v4si); -v8qi __builtin_vis_faligndatav8qi (v8qi, v8qi); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_sc_h (uint32_t, uint16_t) +Generated assembler @code{cv.minu.sc.h} +@end deftypefn -v4hi __builtin_vis_fexpand (v4qi); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_sc_h (uint32_t, uint6_t) +Generated assembler @code{cv.minu.sci.h} +@end deftypefn -v4hi __builtin_vis_fmul8x16 (v4qi, v4hi); -v4hi __builtin_vis_fmul8x16au (v4qi, v2hi); -v4hi __builtin_vis_fmul8x16al (v4qi, v2hi); -v4hi __builtin_vis_fmul8sux16 (v8qi, v4hi); -v4hi __builtin_vis_fmul8ulx16 (v8qi, v4hi); -v2si __builtin_vis_fmuld8sux16 (v4qi, v2hi); -v2si __builtin_vis_fmuld8ulx16 (v4qi, v2hi); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_sc_b (uint32_t, uint8_t) +Generated assembler @code{cv.minu.sc.b} +@end deftypefn -v4qi __builtin_vis_fpack16 (v4hi); -v8qi __builtin_vis_fpack32 (v2si, v8qi); -v2hi __builtin_vis_fpackfix (v2si); -v8qi __builtin_vis_fpmerge (v4qi, v4qi); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_minu_sc_b (uint32_t, uint6_t) +Generated assembler @code{cv.minu.sci.b} +@end deftypefn -int64_t __builtin_vis_pdist (v8qi, v8qi, int64_t); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_h (uint32_t, uint32_t) +Generated assembler @code{cv.max.h} +@end deftypefn -long __builtin_vis_edge8 (void *, void *); -long __builtin_vis_edge8l (void *, void *); -long __builtin_vis_edge16 (void *, void *); -long __builtin_vis_edge16l (void *, void *); -long __builtin_vis_edge32 (void *, void *); -long __builtin_vis_edge32l (void *, void *); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_b (uint32_t, uint32_t) +Generated assembler @code{cv.max.b} +@end deftypefn -long __builtin_vis_fcmple16 (v4hi, v4hi); -long __builtin_vis_fcmple32 (v2si, v2si); -long __builtin_vis_fcmpne16 (v4hi, v4hi); -long __builtin_vis_fcmpne32 (v2si, v2si); -long __builtin_vis_fcmpgt16 (v4hi, v4hi); -long __builtin_vis_fcmpgt32 (v2si, v2si); -long __builtin_vis_fcmpeq16 (v4hi, v4hi); -long __builtin_vis_fcmpeq32 (v2si, v2si); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.max.sc.h} +@end deftypefn -v4hi 
__builtin_vis_fpadd16 (v4hi, v4hi); -v2hi __builtin_vis_fpadd16s (v2hi, v2hi); -v2si __builtin_vis_fpadd32 (v2si, v2si); -v1si __builtin_vis_fpadd32s (v1si, v1si); -v4hi __builtin_vis_fpsub16 (v4hi, v4hi); -v2hi __builtin_vis_fpsub16s (v2hi, v2hi); -v2si __builtin_vis_fpsub32 (v2si, v2si); -v1si __builtin_vis_fpsub32s (v1si, v1si); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.max.sci.h} +@end deftypefn -long __builtin_vis_array8 (long, long); -long __builtin_vis_array16 (long, long); -long __builtin_vis_array32 (long, long); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.max.sc.b} +@end deftypefn -When you use the @option{-mvis2} switch, the VIS version 2.0 built-in -functions also become available: +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_max_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.max.sci.b} +@end deftypefn -@smallexample -long __builtin_vis_bmask (long, long); -int64_t __builtin_vis_bshuffledi (int64_t, int64_t); -v2si __builtin_vis_bshufflev2si (v2si, v2si); -v4hi __builtin_vis_bshufflev2si (v4hi, v4hi); -v8qi __builtin_vis_bshufflev2si (v8qi, v8qi); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_h (uint32_t, uint32_t) +Generated assembler @code{cv.maxu.h} +@end deftypefn -long __builtin_vis_edge8n (void *, void *); -long __builtin_vis_edge8ln (void *, void *); -long __builtin_vis_edge16n (void *, void *); -long __builtin_vis_edge16ln (void *, void *); -long __builtin_vis_edge32n (void *, void *); -long __builtin_vis_edge32ln (void *, void *); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_b (uint32_t, uint32_t) +Generated assembler @code{cv.maxu.b} +@end deftypefn -When you use the @option{-mvis3} switch, the VIS version 3.0 built-in -functions also become available: +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_sc_h (uint32_t, uint16_t) +Generated assembler @code{cv.maxu.sc.h} +@end deftypefn -@smallexample -void __builtin_vis_cmask8 (long); -void __builtin_vis_cmask16 (long); -void __builtin_vis_cmask32 (long); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_sc_h (uint32_t, uint6_t) +Generated assembler @code{cv.maxu.sci.h} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_sc_b (uint32_t, uint8_t) +Generated assembler @code{cv.maxu.sc.b} +@end deftypefn -v4hi __builtin_vis_fchksm16 (v4hi, v4hi); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_maxu_sc_b (uint32_t, uint6_t) +Generated assembler @code{cv.maxu.sci.b} +@end deftypefn -v4hi __builtin_vis_fsll16 (v4hi, v4hi); -v4hi __builtin_vis_fslas16 (v4hi, v4hi); -v4hi __builtin_vis_fsrl16 (v4hi, v4hi); -v4hi __builtin_vis_fsra16 (v4hi, v4hi); -v2si __builtin_vis_fsll16 (v2si, v2si); -v2si __builtin_vis_fslas16 (v2si, v2si); -v2si __builtin_vis_fsrl16 (v2si, v2si); -v2si __builtin_vis_fsra16 (v2si, v2si); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_srl_h (uint32_t, uint32_t) +Generated assembler @code{cv.srl.h} +@end deftypefn -long __builtin_vis_pdistn (v8qi, v8qi); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_srl_b (uint32_t, uint32_t) +Generated assembler @code{cv.srl.b} +@end deftypefn -v4hi __builtin_vis_fmean16 (v4hi, v4hi); +@deftypefn {Built-in Function} {uint32_t} 
__builtin_riscv_cv_simd_srl_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.srl.sc.h} +@end deftypefn -int64_t __builtin_vis_fpadd64 (int64_t, int64_t); -int64_t __builtin_vis_fpsub64 (int64_t, int64_t); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_srl_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.srl.sci.h} +@end deftypefn -v4hi __builtin_vis_fpadds16 (v4hi, v4hi); -v2hi __builtin_vis_fpadds16s (v2hi, v2hi); -v4hi __builtin_vis_fpsubs16 (v4hi, v4hi); -v2hi __builtin_vis_fpsubs16s (v2hi, v2hi); -v2si __builtin_vis_fpadds32 (v2si, v2si); -v1si __builtin_vis_fpadds32s (v1si, v1si); -v2si __builtin_vis_fpsubs32 (v2si, v2si); -v1si __builtin_vis_fpsubs32s (v1si, v1si); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_srl_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.srl.sc.b} +@end deftypefn -long __builtin_vis_fucmple8 (v8qi, v8qi); -long __builtin_vis_fucmpne8 (v8qi, v8qi); -long __builtin_vis_fucmpgt8 (v8qi, v8qi); -long __builtin_vis_fucmpeq8 (v8qi, v8qi); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_srl_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.srl.sci.b} +@end deftypefn -float __builtin_vis_fhadds (float, float); -double __builtin_vis_fhaddd (double, double); -float __builtin_vis_fhsubs (float, float); -double __builtin_vis_fhsubd (double, double); -float __builtin_vis_fnhadds (float, float); -double __builtin_vis_fnhaddd (double, double); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_h (uint32_t, uint32_t) +Generated assembler @code{cv.sra.h} +@end deftypefn -int64_t __builtin_vis_umulxhi (int64_t, int64_t); -int64_t __builtin_vis_xmulx (int64_t, int64_t); -int64_t __builtin_vis_xmulxhi (int64_t, int64_t); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_b (uint32_t, uint32_t) +Generated assembler @code{cv.sra.b} +@end deftypefn -When you use the @option{-mvis4} switch, the VIS version 4.0 built-in -functions also become available: +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.sra.sc.h} +@end deftypefn -@smallexample -v8qi __builtin_vis_fpadd8 (v8qi, v8qi); -v8qi __builtin_vis_fpadds8 (v8qi, v8qi); -v8qi __builtin_vis_fpaddus8 (v8qi, v8qi); -v4hi __builtin_vis_fpaddus16 (v4hi, v4hi); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.sra.sci.h} +@end deftypefn -v8qi __builtin_vis_fpsub8 (v8qi, v8qi); -v8qi __builtin_vis_fpsubs8 (v8qi, v8qi); -v8qi __builtin_vis_fpsubus8 (v8qi, v8qi); -v4hi __builtin_vis_fpsubus16 (v4hi, v4hi); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.sra.sc.b} +@end deftypefn -long __builtin_vis_fpcmple8 (v8qi, v8qi); -long __builtin_vis_fpcmpgt8 (v8qi, v8qi); -long __builtin_vis_fpcmpule16 (v4hi, v4hi); -long __builtin_vis_fpcmpugt16 (v4hi, v4hi); -long __builtin_vis_fpcmpule32 (v2si, v2si); -long __builtin_vis_fpcmpugt32 (v2si, v2si); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sra_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.sra.sci.b} +@end deftypefn -v8qi __builtin_vis_fpmax8 (v8qi, v8qi); -v4hi __builtin_vis_fpmax16 (v4hi, v4hi); -v2si __builtin_vis_fpmax32 (v2si, v2si); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_h (uint32_t, uint32_t) +Generated assembler @code{cv.sll.h} +@end deftypefn -v8qi 
__builtin_vis_fpmaxu8 (v8qi, v8qi); -v4hi __builtin_vis_fpmaxu16 (v4hi, v4hi); -v2si __builtin_vis_fpmaxu32 (v2si, v2si); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_b (uint32_t, uint32_t) +Generated assembler @code{cv.sll.b} +@end deftypefn -v8qi __builtin_vis_fpmin8 (v8qi, v8qi); -v4hi __builtin_vis_fpmin16 (v4hi, v4hi); -v2si __builtin_vis_fpmin32 (v2si, v2si); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.sll.sc.h} +@end deftypefn -v8qi __builtin_vis_fpminu8 (v8qi, v8qi); -v4hi __builtin_vis_fpminu16 (v4hi, v4hi); -v2si __builtin_vis_fpminu32 (v2si, v2si); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.sll.sci.h} +@end deftypefn -When you use the @option{-mvis4b} switch, the VIS version 4.0B -built-in functions also become available: +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.sll.sc.b} +@end deftypefn -@smallexample -v8qi __builtin_vis_dictunpack8 (double, int); -v4hi __builtin_vis_dictunpack16 (double, int); -v2si __builtin_vis_dictunpack32 (double, int); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sll_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.sll.sci.b} +@end deftypefn -long __builtin_vis_fpcmple8shl (v8qi, v8qi, int); -long __builtin_vis_fpcmpgt8shl (v8qi, v8qi, int); -long __builtin_vis_fpcmpeq8shl (v8qi, v8qi, int); -long __builtin_vis_fpcmpne8shl (v8qi, v8qi, int); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_h (uint32_t, uint32_t) +Generated assembler @code{cv.or.h} +@end deftypefn -long __builtin_vis_fpcmple16shl (v4hi, v4hi, int); -long __builtin_vis_fpcmpgt16shl (v4hi, v4hi, int); -long __builtin_vis_fpcmpeq16shl (v4hi, v4hi, int); -long __builtin_vis_fpcmpne16shl (v4hi, v4hi, int); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_b (uint32_t, uint32_t) +Generated assembler @code{cv.or.b} +@end deftypefn -long __builtin_vis_fpcmple32shl (v2si, v2si, int); -long __builtin_vis_fpcmpgt32shl (v2si, v2si, int); -long __builtin_vis_fpcmpeq32shl (v2si, v2si, int); -long __builtin_vis_fpcmpne32shl (v2si, v2si, int); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.or.sc.h} +@end deftypefn -long __builtin_vis_fpcmpule8shl (v8qi, v8qi, int); -long __builtin_vis_fpcmpugt8shl (v8qi, v8qi, int); -long __builtin_vis_fpcmpule16shl (v4hi, v4hi, int); -long __builtin_vis_fpcmpugt16shl (v4hi, v4hi, int); -long __builtin_vis_fpcmpule32shl (v2si, v2si, int); -long __builtin_vis_fpcmpugt32shl (v2si, v2si, int); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.or.sci.h} +@end deftypefn -long __builtin_vis_fpcmpde8shl (v8qi, v8qi, int); -long __builtin_vis_fpcmpde16shl (v4hi, v4hi, int); -long __builtin_vis_fpcmpde32shl (v2si, v2si, int); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.or.sc.b} +@end deftypefn -long __builtin_vis_fpcmpur8shl (v8qi, v8qi, int); -long __builtin_vis_fpcmpur16shl (v4hi, v4hi, int); -long __builtin_vis_fpcmpur32shl (v2si, v2si, int); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_or_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.or.sci.b} 
+@end deftypefn -@node TI C6X Built-in Functions -@subsection TI C6X Built-in Functions +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_h (uint32_t, uint32_t) +Generated assembler @code{cv.xor.h} +@end deftypefn -GCC provides intrinsics to access certain instructions of the TI C6X -processors. These intrinsics, listed below, are available after -inclusion of the @code{c6x_intrinsics.h} header file. They map directly -to C6X instructions. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_b (uint32_t, uint32_t) +Generated assembler @code{cv.xor.b} +@end deftypefn -@smallexample -int _sadd (int, int); -int _ssub (int, int); -int _sadd2 (int, int); -int _ssub2 (int, int); -long long _mpy2 (int, int); -long long _smpy2 (int, int); -int _add4 (int, int); -int _sub4 (int, int); -int _saddu4 (int, int); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.xor.sc.h} +@end deftypefn -int _smpy (int, int); -int _smpyh (int, int); -int _smpyhl (int, int); -int _smpylh (int, int); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.xor.sci.h} +@end deftypefn -int _sshl (int, int); -int _subc (int, int); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.xor.sc.b} +@end deftypefn -int _avg2 (int, int); -int _avgu4 (int, int); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_xor_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.xor.sci.b} +@end deftypefn -int _clrr (int, int); -int _extr (int, int); -int _extru (int, int); -int _abs (int); -int _abs2 (int); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_h (uint32_t, uint32_t) +Generated assembler @code{cv.and.h} +@end deftypefn -@node x86 Built-in Functions -@subsection x86 Built-in Functions +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_b (uint32_t, uint32_t) +Generated assembler @code{cv.and.b} +@end deftypefn -These built-in functions are available for the x86-32 and x86-64 family -of computers, depending on the command-line switches used. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.and.sc.h} +@end deftypefn -If you specify command-line switches such as @option{-msse}, -the compiler could use the extended instruction sets even if the built-ins -are not used explicitly in the program. For this reason, applications -that perform run-time CPU detection must compile separate files for each -supported architecture, using the appropriate flags. In particular, -the file containing the CPU detection code should be compiled without -these options. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.and.sci.h} +@end deftypefn -The following machine modes are available for use with MMX built-in functions -(@pxref{Vector Extensions}): @code{V2SI} for a vector of two 32-bit integers, -@code{V4HI} for a vector of four 16-bit integers, and @code{V8QI} for a -vector of eight 8-bit integers. Some of the built-in functions operate on -MMX registers as a whole 64-bit entity, these use @code{V1DI} as their mode. 
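+
+The bitwise forms are called the same way; a minimal sketch using the
+byte variant documented above:
+
+@smallexample
+#include <stdint.h>
+
+/* Mask four packed 8-bit lanes; emits cv.and.b.  */
+uint32_t
+mask_bytes (uint32_t v, uint32_t mask)
+@{
+  return __builtin_riscv_cv_simd_and_b (v, mask);
+@}
+@end smallexample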
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.and.sc.b} +@end deftypefn -If 3DNow!@: extensions are enabled, @code{V2SF} is used as a mode for a vector -of two 32-bit floating-point values. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_and_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.and.sci.b} +@end deftypefn -If SSE extensions are enabled, @code{V4SF} is used for a vector of four 32-bit -floating-point values. Some instructions use a vector of four 32-bit -integers, these use @code{V4SI}. Finally, some instructions operate on an -entire vector register, interpreting it as a 128-bit integer, these use mode -@code{TI}. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_abs_h (uint32_t) +Generated assembler @code{cv.abs.h} +@end deftypefn -The x86-32 and x86-64 family of processors use additional built-in -functions for efficient use of @code{TF} (@code{__float128}) 128-bit -floating point and @code{TC} 128-bit complex floating-point values. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_abs_b (uint32_t) +Generated assembler @code{cv.abs.b} +@end deftypefn -The following floating-point built-in functions are always available: +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_h (uint32_t, uint32_t) +Generated assembler @code{cv.dotup.h} +@end deftypefn -@defbuiltin{__float128 __builtin_fabsq (__float128 @var{x}))} -Computes the absolute value of @var{x}. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_b (uint32_t, uint32_t) +Generated assembler @code{cv.dotup.b} +@end deftypefn -@defbuiltin{__float128 __builtin_copysignq (__float128 @var{x}, @ - __float128 @var{y})} -Copies the sign of @var{y} into @var{x} and returns the new value of -@var{x}. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_sc_h (uint32_t, uint16_t) +Generated assembler @code{cv.dotup.sc.h} +@end deftypefn -@defbuiltin{__float128 __builtin_infq (void)} -Similar to @code{__builtin_inf}, except the return type is @code{__float128}. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_sc_h (uint32_t, uint6_t) +Generated assembler @code{cv.dotup.sci.h} +@end deftypefn -@defbuiltin{__float128 __builtin_huge_valq (void)} -Similar to @code{__builtin_huge_val}, except the return type is @code{__float128}. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_sc_b (uint32_t, uint8_t) +Generated assembler @code{cv.dotup.sc.b} +@end deftypefn -@defbuiltin{__float128 __builtin_nanq (void)} -Similar to @code{__builtin_nan}, except the return type is @code{__float128}. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotup_sc_b (uint32_t, uint6_t) +Generated assembler @code{cv.dotup.sci.b} +@end deftypefn -@defbuiltin{__float128 __builtin_nansq (void)} -Similar to @code{__builtin_nans}, except the return type is @code{__float128}. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_h (uint32_t, uint32_t) +Generated assembler @code{cv.dotusp.h} +@end deftypefn -The following built-in function is always available. 
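+
+The absolute-value forms above are unary.  A minimal sketch, assuming
+@code{cv.abs.h} replaces each signed 16-bit lane with its absolute
+value:
+
+@smallexample
+#include <stdint.h>
+
+/* Lane-wise absolute value of two packed 16-bit lanes; emits
+   cv.abs.h.  */
+uint32_t
+abs_lanes (uint32_t v)
+@{
+  return __builtin_riscv_cv_simd_abs_h (v);
+@}
+@end smallexample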
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_b (uint32_t, uint32_t) +Generated assembler @code{cv.dotusp.b} +@end deftypefn -@defbuiltin{void __builtin_ia32_pause (void)} -Generates the @code{pause} machine instruction with a compiler memory -barrier. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.dotusp.sc.h} +@end deftypefn -The following built-in functions are always available and can be used to -check the target platform type. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.dotusp.sci.h} +@end deftypefn -@defbuiltin{void __builtin_cpu_init (void)} -This function runs the CPU detection code to check the type of CPU and the -features supported. This built-in function needs to be invoked along with the built-in functions -to check CPU type and features, @code{__builtin_cpu_is} and -@code{__builtin_cpu_supports}, only when used in a function that is -executed before any constructors are called. The CPU detection code is -automatically executed in a very high priority constructor. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.dotusp.sc.b} +@end deftypefn -For example, this function has to be used in @code{ifunc} resolvers that -check for CPU type using the built-in functions @code{__builtin_cpu_is} -and @code{__builtin_cpu_supports}, or in constructors on targets that -don't support constructor priority. -@smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotusp_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.dotusp.sci.b} +@end deftypefn -static void (*resolve_memcpy (void)) (void) -@{ - // ifunc resolvers fire before constructors, explicitly call the init - // function. - __builtin_cpu_init (); - if (__builtin_cpu_supports ("ssse3")) - return ssse3_memcpy; // super fast memcpy with ssse3 instructions. - else - return default_memcpy; -@} +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_h (uint32_t, uint32_t) +Generated assembler @code{cv.dotsp.h} +@end deftypefn -void *memcpy (void *, const void *, size_t) - __attribute__ ((ifunc ("resolve_memcpy"))); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_b (uint32_t, uint32_t) +Generated assembler @code{cv.dotsp.b} +@end deftypefn -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.dotsp.sc.h} +@end deftypefn -@defbuiltin{int __builtin_cpu_is (const char *@var{cpuname})} -This function returns a positive integer if the run-time CPU -is of type @var{cpuname} -and returns @code{0} otherwise. The following CPU names can be detected: +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.dotsp.sci.h} +@end deftypefn -@table @samp -@item amd -AMD CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.dotsp.sc.b} +@end deftypefn -@item intel -Intel CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_dotsp_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.dotsp.sci.b} +@end deftypefn -@item atom -Intel Atom CPU. 
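+
+Unlike the element-wise operations, the dot-product built-ins above
+reduce their lanes to a single scalar.  A minimal sketch, assuming
+@code{cv.dotup.h} multiplies corresponding unsigned 16-bit lanes and
+sums the two products, per the specification linked above:
+
+@smallexample
+#include <stdint.h>
+
+/* Two-element unsigned dot product; emits cv.dotup.h.  */
+uint32_t
+dot2_u16 (uint32_t a, uint32_t b)
+@{
+  return __builtin_riscv_cv_simd_dotup_h (a, b);
+@}
+@end smallexample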
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_h (uint32_t, uint32_t, uint32_t) +Generated assembler @code{cv.sdotup.h} +@end deftypefn -@item slm -Intel Silvermont CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_b (uint32_t, uint32_t, uint32_t) +Generated assembler @code{cv.sdotup.b} +@end deftypefn -@item core2 -Intel Core 2 CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_sc_h (uint32_t, uint16_t, uint32_t) +Generated assembler @code{cv.sdotup.sc.h} +@end deftypefn -@item corei7 -Intel Core i7 CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_sc_h (uint32_t, uint6_t, uint32_t) +Generated assembler @code{cv.sdotup.sci.h} +@end deftypefn -@item nehalem -Intel Core i7 Nehalem CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_sc_b (uint32_t, uint8_t, uint32_t) +Generated assembler @code{cv.sdotup.sc.b} +@end deftypefn -@item westmere -Intel Core i7 Westmere CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotup_sc_b (uint32_t, uint6_t, uint32_t) +Generated assembler @code{cv.sdotup.sci.b} +@end deftypefn -@item sandybridge -Intel Core i7 Sandy Bridge CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_h (uint32_t, uint32_t, uint32_t) +Generated assembler @code{cv.sdotusp.h} +@end deftypefn -@item ivybridge -Intel Core i7 Ivy Bridge CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_b (uint32_t, uint32_t, uint32_t) +Generated assembler @code{cv.sdotusp.b} +@end deftypefn -@item haswell -Intel Core i7 Haswell CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_sc_h (uint32_t, int16_t, uint32_t) +Generated assembler @code{cv.sdotusp.sc.h} +@end deftypefn -@item broadwell -Intel Core i7 Broadwell CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_sc_h (uint32_t, int6_t, uint32_t) +Generated assembler @code{cv.sdotusp.sci.h} +@end deftypefn -@item skylake -Intel Core i7 Skylake CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_sc_b (uint32_t, int8_t, uint32_t) +Generated assembler @code{cv.sdotusp.sc.b} +@end deftypefn -@item skylake-avx512 -Intel Core i7 Skylake AVX512 CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotusp_sc_b (uint32_t, int6_t, uint32_t) +Generated assembler @code{cv.sdotusp.sci.b} +@end deftypefn -@item cannonlake -Intel Core i7 Cannon Lake CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_h (uint32_t, uint32_t, uint32_t) +Generated assembler @code{cv.sdotsp.h} +@end deftypefn -@item icelake-client -Intel Core i7 Ice Lake Client CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_b (uint32_t, uint32_t, uint32_t) +Generated assembler @code{cv.sdotsp.b} +@end deftypefn -@item icelake-server -Intel Core i7 Ice Lake Server CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_sc_h (uint32_t, int16_t, uint32_t) +Generated assembler @code{cv.sdotsp.sc.h} +@end deftypefn -@item cascadelake -Intel Core i7 Cascadelake CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_sc_h (uint32_t, int6_t, uint32_t) +Generated assembler @code{cv.sdotsp.sci.h} +@end deftypefn -@item tigerlake -Intel Core i7 Tigerlake CPU. 
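+
+The @code{cv.sdot*} forms take a third operand and accumulate into
+it, which suits reduction loops.  A minimal sketch, assuming the
+third argument carries the running sum as suggested by the signatures
+above:
+
+@smallexample
+#include <stdint.h>
+#include <stddef.h>
+
+/* Accumulating dot product over an array of packed 16-bit lane
+   pairs; each iteration emits cv.sdotup.h.  */
+uint32_t
+dot_reduce (const uint32_t *a, const uint32_t *b, size_t n)
+@{
+  uint32_t acc = 0;
+  for (size_t i = 0; i < n; i++)
+    acc = __builtin_riscv_cv_simd_sdotup_h (a[i], b[i], acc);
+  return acc;
+@}
+@end smallexample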
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_sc_b (uint32_t, int8_t, uint32_t) +Generated assembler @code{cv.sdotsp.sc.b} +@end deftypefn -@item cooperlake -Intel Core i7 Cooperlake CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sdotsp_sc_b (uint32_t, int6_t, uint32_t) +Generated assembler @code{cv.sdotsp.sci.b} +@end deftypefn -@item sapphirerapids -Intel Core i7 sapphirerapids CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_extract_h (uint32_t, uint6_t) +Generated assembler @code{cv.extract.h} +@end deftypefn -@item alderlake -Intel Core i7 Alderlake CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_extract_b (uint32_t, uint6_t) +Generated assembler @code{cv.extract.b} +@end deftypefn -@item rocketlake -Intel Core i7 Rocketlake CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_extractu_h (uint32_t, uint6_t) +Generated assembler @code{cv.extractu.h} +@end deftypefn -@item graniterapids -Intel Core i7 graniterapids CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_extractu_b (uint32_t, uint6_t) +Generated assembler @code{cv.extractu.b} +@end deftypefn -@item graniterapids-d -Intel Core i7 graniterapids D CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_insert_h (uint32_t, uint32_t) +Generated assembler @code{cv.insert.h} +@end deftypefn -@item arrowlake -Intel Core i7 Arrow Lake CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_insert_b (uint32_t, uint32_t) +Generated assembler @code{cv.insert.b} +@end deftypefn -@item arrowlake-s -Intel Core i7 Arrow Lake S CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shuffle_h (uint32_t, uint32_t) +Generated assembler @code{cv.shuffle.h} +@end deftypefn -@item pantherlake -Intel Core i7 Panther Lake CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shuffle_b (uint32_t, uint32_t) +Generated assembler @code{cv.shuffle.b} +@end deftypefn -@item diamondrapids -Intel Core i7 Diamond Rapids CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shuffle_sci_h (uint32_t, uint4_t) +Generated assembler @code{cv.shuffle.sci.h} +@end deftypefn -@item bonnell -Intel Atom Bonnell CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shufflei0_sci_b (uint32_t, uint4_t) +Generated assembler @code{cv.shufflei0.sci.b} +@end deftypefn -@item silvermont -Intel Atom Silvermont CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shufflei1_sci_b (uint32_t, uint4_t) +Generated assembler @code{cv.shufflei1.sci.b} +@end deftypefn -@item goldmont -Intel Atom Goldmont CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shufflei2_sci_b (uint32_t, uint4_t) +Generated assembler @code{cv.shufflei2.sci.b} +@end deftypefn -@item goldmont-plus -Intel Atom Goldmont Plus CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shufflei3_sci_b (uint32_t, uint4_t) +Generated assembler @code{cv.shufflei3.sci.b} +@end deftypefn -@item tremont -Intel Atom Tremont CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shuffle2_h (uint32_t, uint32_t, uint32_t) +Generated assembler @code{cv.shuffle2.h} +@end deftypefn -@item sierraforest -Intel Atom Sierra Forest CPU. 
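+
+For the extraction built-ins above, the @code{uint6_t} operand
+denotes a small immediate, so the lane index is written as a
+constant.  A minimal sketch, assuming lane 1 of a halfword vector is
+bits 31:16:
+
+@smallexample
+#include <stdint.h>
+
+/* Sign-extending extraction of 16-bit lane 1; emits cv.extract.h.  */
+uint32_t
+high_lane (uint32_t v)
+@{
+  return __builtin_riscv_cv_simd_extract_h (v, 1);
+@}
+@end smallexample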
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_shuffle2_b (uint32_t, uint32_t, uint32_t) +Generated assembler @code{cv.shuffle2.b} +@end deftypefn -@item grandridge -Intel Atom Grand Ridge CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_packlo_h (uint32_t, uint32_t) +Generated assembler @code{cv.pack} +@end deftypefn -@item clearwaterforest -Intel Atom Clearwater Forest CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_packhi_h (uint32_t, uint32_t) +Generated assembler @code{cv.pack.h} +@end deftypefn -@item lujiazui -ZHAOXIN lujiazui CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_packhi_b (uint32_t, uint32_t, uint32_t) +Generated assembler @code{cv.packhi.b} +@end deftypefn -@item yongfeng -ZHAOXIN yongfeng CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_packlo_b (uint32_t, uint32_t, uint32_t) +Generated assembler @code{cv.packlo.b} +@end deftypefn -@item shijidadao -ZHAOXIN shijidadao CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_h (uint32_t, uint32_t) +Generated assembler @code{cv.cmpeq.h} +@end deftypefn -@item amdfam10h -AMD Family 10h CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_b (uint32_t, uint32_t) +Generated assembler @code{cv.cmpeq.b} +@end deftypefn -@item barcelona -AMD Family 10h Barcelona CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.cmpeq.sc.h} +@end deftypefn -@item shanghai -AMD Family 10h Shanghai CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.cmpeq.sci.h} +@end deftypefn -@item istanbul -AMD Family 10h Istanbul CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.cmpeq.sc.b} +@end deftypefn -@item btver1 -AMD Family 14h CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpeq_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.cmpeq.sci.b} +@end deftypefn -@item amdfam15h -AMD Family 15h CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_h (uint32_t, uint32_t) +Generated assembler @code{cv.cmpne.h} +@end deftypefn -@item bdver1 -AMD Family 15h Bulldozer version 1. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_b (uint32_t, uint32_t) +Generated assembler @code{cv.cmpne.b} +@end deftypefn -@item bdver2 -AMD Family 15h Bulldozer version 2. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.cmpne.sc.h} +@end deftypefn -@item bdver3 -AMD Family 15h Bulldozer version 3. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.cmpne.sci.h} +@end deftypefn -@item bdver4 -AMD Family 15h Bulldozer version 4. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.cmpne.sc.b} +@end deftypefn -@item btver2 -AMD Family 16h CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpne_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.cmpne.sci.b} +@end deftypefn -@item amdfam17h -AMD Family 17h CPU. 
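+
+A minimal sketch of the comparison forms, assuming each 16-bit lane
+of the result is set to all ones where the compared lanes are equal
+and to zero otherwise, per the specification linked above:
+
+@smallexample
+#include <stdint.h>
+
+/* Per-lane equality mask of two packed 16-bit values; emits
+   cv.cmpeq.h.  */
+uint32_t
+equal_lanes (uint32_t a, uint32_t b)
+@{
+  return __builtin_riscv_cv_simd_cmpeq_h (a, b);
+@}
+@end smallexample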
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_h (uint32_t, uint32_t) +Generated assembler @code{cv.cmpgt.h} +@end deftypefn -@item znver1 -AMD Family 17h Zen version 1. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_b (uint32_t, uint32_t) +Generated assembler @code{cv.cmpgt.b} +@end deftypefn -@item znver2 -AMD Family 17h Zen version 2. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.cmpgt.sc.h} +@end deftypefn -@item amdfam19h -AMD Family 19h CPU. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.cmpgt.sci.h} +@end deftypefn -@item znver3 -AMD Family 19h Zen version 3. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.cmpgt.sc.b} +@end deftypefn -@item znver4 -AMD Family 19h Zen version 4. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgt_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.cmpgt.sci.b} +@end deftypefn -@item znver5 -AMD Family 1ah Zen version 5. -@end table +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_h (uint32_t, uint32_t) +Generated assembler @code{cv.cmpge.h} +@end deftypefn -Here is an example: -@smallexample -if (__builtin_cpu_is ("corei7")) - @{ - do_corei7 (); // Core i7 specific implementation. - @} -else - @{ - do_generic (); // Generic implementation. - @} -@end smallexample -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_b (uint32_t, uint32_t) +Generated assembler @code{cv.cmpge.b} +@end deftypefn -@defbuiltin{int __builtin_cpu_supports (const char *@var{feature})} -This function returns a positive integer if the run-time CPU -supports @var{feature} -and returns @code{0} otherwise. The following features can be detected: +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.cmpge.sc.h} +@end deftypefn -@table @samp -@item cmov -CMOV instruction. -@item mmx -MMX instructions. -@item popcnt -POPCNT instruction. -@item sse -SSE instructions. -@item sse2 -SSE2 instructions. -@item sse3 -SSE3 instructions. -@item ssse3 -SSSE3 instructions. -@item sse4.1 -SSE4.1 instructions. -@item sse4.2 -SSE4.2 instructions. -@item avx -AVX instructions. -@item avx2 -AVX2 instructions. -@item sse4a -SSE4A instructions. -@item fma4 -FMA4 instructions. -@item xop -XOP instructions. -@item fma -FMA instructions. -@item avx512f -AVX512F instructions. -@item bmi -BMI instructions. -@item bmi2 -BMI2 instructions. -@item aes -AES instructions. -@item pclmul -PCLMUL instructions. -@item avx512vl -AVX512VL instructions. -@item avx512bw -AVX512BW instructions. -@item avx512dq -AVX512DQ instructions. -@item avx512cd -AVX512CD instructions. -@item avx512vbmi -AVX512VBMI instructions. -@item avx512ifma -AVX512IFMA instructions. -@item avx512vpopcntdq -AVX512VPOPCNTDQ instructions. -@item avx512vbmi2 -AVX512VBMI2 instructions. -@item gfni -GFNI instructions. -@item vpclmulqdq -VPCLMULQDQ instructions. -@item avx512vnni -AVX512VNNI instructions. -@item avx512bitalg -AVX512BITALG instructions. -@item x86-64 -Baseline x86-64 microarchitecture level (as defined in x86-64 psABI). -@item x86-64-v2 -x86-64-v2 microarchitecture level. -@item x86-64-v3 -x86-64-v3 microarchitecture level. -@item x86-64-v4 -x86-64-v4 microarchitecture level. 
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.cmpge.sci.h} +@end deftypefn +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.cmpge.sc.b} +@end deftypefn -@end table +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpge_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.cmpge.sci.b} +@end deftypefn -Here is an example: -@smallexample -if (__builtin_cpu_supports ("popcnt")) - @{ - asm("popcnt %1,%0" : "=r"(count) : "rm"(n) : "cc"); - @} -else - @{ - count = generic_countbits (n); //generic implementation. - @} -@end smallexample -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmplt_h (uint32_t, uint32_t) +Generated assembler @code{cv.cmplt.h} +@end deftypefn -The following built-in functions are made available by @option{-mmmx}. -All of them generate the machine instruction that is part of the name. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmplt_b (uint32_t, uint32_t) +Generated assembler @code{cv.cmplt.b} +@end deftypefn -@smallexample -v8qi __builtin_ia32_paddb (v8qi, v8qi); -v4hi __builtin_ia32_paddw (v4hi, v4hi); -v2si __builtin_ia32_paddd (v2si, v2si); -v8qi __builtin_ia32_psubb (v8qi, v8qi); -v4hi __builtin_ia32_psubw (v4hi, v4hi); -v2si __builtin_ia32_psubd (v2si, v2si); -v8qi __builtin_ia32_paddsb (v8qi, v8qi); -v4hi __builtin_ia32_paddsw (v4hi, v4hi); -v8qi __builtin_ia32_psubsb (v8qi, v8qi); -v4hi __builtin_ia32_psubsw (v4hi, v4hi); -v8qi __builtin_ia32_paddusb (v8qi, v8qi); -v4hi __builtin_ia32_paddusw (v4hi, v4hi); -v8qi __builtin_ia32_psubusb (v8qi, v8qi); -v4hi __builtin_ia32_psubusw (v4hi, v4hi); -v4hi __builtin_ia32_pmullw (v4hi, v4hi); -v4hi __builtin_ia32_pmulhw (v4hi, v4hi); -di __builtin_ia32_pand (di, di); -di __builtin_ia32_pandn (di,di); -di __builtin_ia32_por (di, di); -di __builtin_ia32_pxor (di, di); -v8qi __builtin_ia32_pcmpeqb (v8qi, v8qi); -v4hi __builtin_ia32_pcmpeqw (v4hi, v4hi); -v2si __builtin_ia32_pcmpeqd (v2si, v2si); -v8qi __builtin_ia32_pcmpgtb (v8qi, v8qi); -v4hi __builtin_ia32_pcmpgtw (v4hi, v4hi); -v2si __builtin_ia32_pcmpgtd (v2si, v2si); -v8qi __builtin_ia32_punpckhbw (v8qi, v8qi); -v4hi __builtin_ia32_punpckhwd (v4hi, v4hi); -v2si __builtin_ia32_punpckhdq (v2si, v2si); -v8qi __builtin_ia32_punpcklbw (v8qi, v8qi); -v4hi __builtin_ia32_punpcklwd (v4hi, v4hi); -v2si __builtin_ia32_punpckldq (v2si, v2si); -v8qi __builtin_ia32_packsswb (v4hi, v4hi); -v4hi __builtin_ia32_packssdw (v2si, v2si); -v8qi __builtin_ia32_packuswb (v4hi, v4hi); +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmplt_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.cmplt.sc.h} +@end deftypefn -v4hi __builtin_ia32_psllw (v4hi, v4hi); -v2si __builtin_ia32_pslld (v2si, v2si); -v1di __builtin_ia32_psllq (v1di, v1di); -v4hi __builtin_ia32_psrlw (v4hi, v4hi); -v2si __builtin_ia32_psrld (v2si, v2si); -v1di __builtin_ia32_psrlq (v1di, v1di); -v4hi __builtin_ia32_psraw (v4hi, v4hi); -v2si __builtin_ia32_psrad (v2si, v2si); -v4hi __builtin_ia32_psllwi (v4hi, int); -v2si __builtin_ia32_pslldi (v2si, int); -v1di __builtin_ia32_psllqi (v1di, int); -v4hi __builtin_ia32_psrlwi (v4hi, int); -v2si __builtin_ia32_psrldi (v2si, int); -v1di __builtin_ia32_psrlqi (v1di, int); -v4hi __builtin_ia32_psrawi (v4hi, int); -v2si __builtin_ia32_psradi (v2si, int); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} 
__builtin_riscv_cv_simd_cmplt_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.cmplt.sci.h} +@end deftypefn -The following built-in functions are made available either with -@option{-msse}, or with @option{-m3dnowa}. All of them generate -the machine instruction that is part of the name. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmplt_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.cmplt.sc.b} +@end deftypefn -@smallexample -v4hi __builtin_ia32_pmulhuw (v4hi, v4hi); -v8qi __builtin_ia32_pavgb (v8qi, v8qi); -v4hi __builtin_ia32_pavgw (v4hi, v4hi); -v1di __builtin_ia32_psadbw (v8qi, v8qi); -v8qi __builtin_ia32_pmaxub (v8qi, v8qi); -v4hi __builtin_ia32_pmaxsw (v4hi, v4hi); -v8qi __builtin_ia32_pminub (v8qi, v8qi); -v4hi __builtin_ia32_pminsw (v4hi, v4hi); -int __builtin_ia32_pmovmskb (v8qi); -void __builtin_ia32_maskmovq (v8qi, v8qi, char *); -void __builtin_ia32_movntq (di *, di); -void __builtin_ia32_sfence (void); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmplt_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.cmplt.sci.b} +@end deftypefn -The following built-in functions are available when @option{-msse} is used. -All of them generate the machine instruction that is part of the name. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_h (uint32_t, uint32_t) +Generated assembler @code{cv.cmple.h} +@end deftypefn -@smallexample -int __builtin_ia32_comieq (v4sf, v4sf); -int __builtin_ia32_comineq (v4sf, v4sf); -int __builtin_ia32_comilt (v4sf, v4sf); -int __builtin_ia32_comile (v4sf, v4sf); -int __builtin_ia32_comigt (v4sf, v4sf); -int __builtin_ia32_comige (v4sf, v4sf); -int __builtin_ia32_ucomieq (v4sf, v4sf); -int __builtin_ia32_ucomineq (v4sf, v4sf); -int __builtin_ia32_ucomilt (v4sf, v4sf); -int __builtin_ia32_ucomile (v4sf, v4sf); -int __builtin_ia32_ucomigt (v4sf, v4sf); -int __builtin_ia32_ucomige (v4sf, v4sf); -v4sf __builtin_ia32_addps (v4sf, v4sf); -v4sf __builtin_ia32_subps (v4sf, v4sf); -v4sf __builtin_ia32_mulps (v4sf, v4sf); -v4sf __builtin_ia32_divps (v4sf, v4sf); -v4sf __builtin_ia32_addss (v4sf, v4sf); -v4sf __builtin_ia32_subss (v4sf, v4sf); -v4sf __builtin_ia32_mulss (v4sf, v4sf); -v4sf __builtin_ia32_divss (v4sf, v4sf); -v4sf __builtin_ia32_cmpeqps (v4sf, v4sf); -v4sf __builtin_ia32_cmpltps (v4sf, v4sf); -v4sf __builtin_ia32_cmpleps (v4sf, v4sf); -v4sf __builtin_ia32_cmpgtps (v4sf, v4sf); -v4sf __builtin_ia32_cmpgeps (v4sf, v4sf); -v4sf __builtin_ia32_cmpunordps (v4sf, v4sf); -v4sf __builtin_ia32_cmpneqps (v4sf, v4sf); -v4sf __builtin_ia32_cmpnltps (v4sf, v4sf); -v4sf __builtin_ia32_cmpnleps (v4sf, v4sf); -v4sf __builtin_ia32_cmpngtps (v4sf, v4sf); -v4sf __builtin_ia32_cmpngeps (v4sf, v4sf); -v4sf __builtin_ia32_cmpordps (v4sf, v4sf); -v4sf __builtin_ia32_cmpeqss (v4sf, v4sf); -v4sf __builtin_ia32_cmpltss (v4sf, v4sf); -v4sf __builtin_ia32_cmpless (v4sf, v4sf); -v4sf __builtin_ia32_cmpunordss (v4sf, v4sf); -v4sf __builtin_ia32_cmpneqss (v4sf, v4sf); -v4sf __builtin_ia32_cmpnltss (v4sf, v4sf); -v4sf __builtin_ia32_cmpnless (v4sf, v4sf); -v4sf __builtin_ia32_cmpordss (v4sf, v4sf); -v4sf __builtin_ia32_maxps (v4sf, v4sf); -v4sf __builtin_ia32_maxss (v4sf, v4sf); -v4sf __builtin_ia32_minps (v4sf, v4sf); -v4sf __builtin_ia32_minss (v4sf, v4sf); -v4sf __builtin_ia32_andps (v4sf, v4sf); -v4sf __builtin_ia32_andnps (v4sf, v4sf); -v4sf __builtin_ia32_orps (v4sf, v4sf); -v4sf __builtin_ia32_xorps (v4sf, v4sf); -v4sf __builtin_ia32_movss (v4sf, v4sf); -v4sf 
__builtin_ia32_movhlps (v4sf, v4sf); -v4sf __builtin_ia32_movlhps (v4sf, v4sf); -v4sf __builtin_ia32_unpckhps (v4sf, v4sf); -v4sf __builtin_ia32_unpcklps (v4sf, v4sf); -v4sf __builtin_ia32_cvtpi2ps (v4sf, v2si); -v4sf __builtin_ia32_cvtsi2ss (v4sf, int); -v2si __builtin_ia32_cvtps2pi (v4sf); -int __builtin_ia32_cvtss2si (v4sf); -v2si __builtin_ia32_cvttps2pi (v4sf); -int __builtin_ia32_cvttss2si (v4sf); -v4sf __builtin_ia32_rcpps (v4sf); -v4sf __builtin_ia32_rsqrtps (v4sf); -v4sf __builtin_ia32_sqrtps (v4sf); -v4sf __builtin_ia32_rcpss (v4sf); -v4sf __builtin_ia32_rsqrtss (v4sf); -v4sf __builtin_ia32_sqrtss (v4sf); -v4sf __builtin_ia32_shufps (v4sf, v4sf, int); -void __builtin_ia32_movntps (float *, v4sf); -int __builtin_ia32_movmskps (v4sf); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_b (uint32_t, uint32_t) +Generated assembler @code{cv.cmple.b} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_sc_h (uint32_t, int16_t) +Generated assembler @code{cv.cmple.sc.h} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_sc_h (uint32_t, int6_t) +Generated assembler @code{cv.cmple.sci.h} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_sc_b (uint32_t, int8_t) +Generated assembler @code{cv.cmple.sc.b} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmple_sc_b (uint32_t, int6_t) +Generated assembler @code{cv.cmple.sci.b} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_h (uint32_t, uint32_t) +Generated assembler @code{cv.cmpgtu.h} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_b (uint32_t, uint32_t) +Generated assembler @code{cv.cmpgtu.b} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_sc_h (uint32_t, uint16_t) +Generated assembler @code{cv.cmpgtu.sc.h} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_sc_h (uint32_t, uint6_t) +Generated assembler @code{cv.cmpgtu.sci.h} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_sc_b (uint32_t, uint8_t) +Generated assembler @code{cv.cmpgtu.sc.b} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgtu_sc_b (uint32_t, uint6_t) +Generated assembler @code{cv.cmpgtu.sci.b} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_h (uint32_t, uint32_t) +Generated assembler @code{cv.cmpgeu.h} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_b (uint32_t, uint32_t) +Generated assembler @code{cv.cmpgeu.b} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_sc_h (uint32_t, uint16_t) +Generated assembler @code{cv.cmpgeu.sc.h} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_sc_h (uint32_t, uint6_t) +Generated assembler @code{cv.cmpgeu.sci.h} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_sc_b (uint32_t, uint8_t) +Generated assembler @code{cv.cmpgeu.sc.b} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpgeu_sc_b (uint32_t, uint6_t) +Generated assembler @code{cv.cmpgeu.sci.b} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_h (uint32_t, 
uint32_t) +Generated assembler @code{cv.cmpltu.h} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_b (uint32_t, uint32_t) +Generated assembler @code{cv.cmpltu.b} +@end deftypefn -The following built-in functions are available when @option{-msse} is used. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_sc_h (uint32_t, uint16_t) +Generated assembler @code{cv.cmpltu.sc.h} +@end deftypefn -@defbuiltin{v4sf __builtin_ia32_loadups (float *)} -Generates the @code{movups} machine instruction as a load from memory. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_sc_h (uint32_t, uint6_t) +Generated assembler @code{cv.cmpltu.sci.h} +@end deftypefn -@defbuiltin{void __builtin_ia32_storeups (float *, v4sf)} -Generates the @code{movups} machine instruction as a store to memory. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_sc_b (uint32_t, uint8_t) +Generated assembler @code{cv.cmpltu.sc.b} +@end deftypefn -@defbuiltin{v4sf __builtin_ia32_loadss (float *)} -Generates the @code{movss} machine instruction as a load from memory. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpltu_sc_b (uint32_t, uint6_t) +Generated assembler @code{cv.cmpltu.sci.b} +@end deftypefn -@defbuiltin{v4sf __builtin_ia32_loadhps (v4sf, const v2sf *)} -Generates the @code{movhps} machine instruction as a load from memory. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_h (uint32_t, uint32_t) +Generated assembler @code{cv.cmpleu.h} +@end deftypefn -@defbuiltin{v4sf __builtin_ia32_loadlps (v4sf, const v2sf *)} -Generates the @code{movlps} machine instruction as a load from memory -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_b (uint32_t, uint32_t) +Generated assembler @code{cv.cmpleu.b} +@end deftypefn -@defbuiltin{void __builtin_ia32_storehps (v2sf *, v4sf)} -Generates the @code{movhps} machine instruction as a store to memory. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_sc_h (uint32_t, uint16_t) +Generated assembler @code{cv.cmpleu.sc.h} +@end deftypefn -@defbuiltin{void __builtin_ia32_storelps (v2sf *, v4sf)} -Generates the @code{movlps} machine instruction as a store to memory. -@enddefbuiltin +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_sc_h (uint32_t, uint6_t) +Generated assembler @code{cv.cmpleu.sci.h} +@end deftypefn -The following built-in functions are available when @option{-msse2} is used. -All of them generate the machine instruction that is part of the name. 
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_sc_b (uint32_t, uint8_t) +Generated assembler @code{cv.cmpleu.sc.b} +@end deftypefn -@smallexample -int __builtin_ia32_comisdeq (v2df, v2df); -int __builtin_ia32_comisdlt (v2df, v2df); -int __builtin_ia32_comisdle (v2df, v2df); -int __builtin_ia32_comisdgt (v2df, v2df); -int __builtin_ia32_comisdge (v2df, v2df); -int __builtin_ia32_comisdneq (v2df, v2df); -int __builtin_ia32_ucomisdeq (v2df, v2df); -int __builtin_ia32_ucomisdlt (v2df, v2df); -int __builtin_ia32_ucomisdle (v2df, v2df); -int __builtin_ia32_ucomisdgt (v2df, v2df); -int __builtin_ia32_ucomisdge (v2df, v2df); -int __builtin_ia32_ucomisdneq (v2df, v2df); -v2df __builtin_ia32_cmpeqpd (v2df, v2df); -v2df __builtin_ia32_cmpltpd (v2df, v2df); -v2df __builtin_ia32_cmplepd (v2df, v2df); -v2df __builtin_ia32_cmpgtpd (v2df, v2df); -v2df __builtin_ia32_cmpgepd (v2df, v2df); -v2df __builtin_ia32_cmpunordpd (v2df, v2df); -v2df __builtin_ia32_cmpneqpd (v2df, v2df); -v2df __builtin_ia32_cmpnltpd (v2df, v2df); -v2df __builtin_ia32_cmpnlepd (v2df, v2df); -v2df __builtin_ia32_cmpngtpd (v2df, v2df); -v2df __builtin_ia32_cmpngepd (v2df, v2df); -v2df __builtin_ia32_cmpordpd (v2df, v2df); -v2df __builtin_ia32_cmpeqsd (v2df, v2df); -v2df __builtin_ia32_cmpltsd (v2df, v2df); -v2df __builtin_ia32_cmplesd (v2df, v2df); -v2df __builtin_ia32_cmpunordsd (v2df, v2df); -v2df __builtin_ia32_cmpneqsd (v2df, v2df); -v2df __builtin_ia32_cmpnltsd (v2df, v2df); -v2df __builtin_ia32_cmpnlesd (v2df, v2df); -v2df __builtin_ia32_cmpordsd (v2df, v2df); -v2di __builtin_ia32_paddq (v2di, v2di); -v2di __builtin_ia32_psubq (v2di, v2di); -v2df __builtin_ia32_addpd (v2df, v2df); -v2df __builtin_ia32_subpd (v2df, v2df); -v2df __builtin_ia32_mulpd (v2df, v2df); -v2df __builtin_ia32_divpd (v2df, v2df); -v2df __builtin_ia32_addsd (v2df, v2df); -v2df __builtin_ia32_subsd (v2df, v2df); -v2df __builtin_ia32_mulsd (v2df, v2df); -v2df __builtin_ia32_divsd (v2df, v2df); -v2df __builtin_ia32_minpd (v2df, v2df); -v2df __builtin_ia32_maxpd (v2df, v2df); -v2df __builtin_ia32_minsd (v2df, v2df); -v2df __builtin_ia32_maxsd (v2df, v2df); -v2df __builtin_ia32_andpd (v2df, v2df); -v2df __builtin_ia32_andnpd (v2df, v2df); -v2df __builtin_ia32_orpd (v2df, v2df); -v2df __builtin_ia32_xorpd (v2df, v2df); -v2df __builtin_ia32_movsd (v2df, v2df); -v2df __builtin_ia32_unpckhpd (v2df, v2df); -v2df __builtin_ia32_unpcklpd (v2df, v2df); -v16qi __builtin_ia32_paddb128 (v16qi, v16qi); -v8hi __builtin_ia32_paddw128 (v8hi, v8hi); -v4si __builtin_ia32_paddd128 (v4si, v4si); -v2di __builtin_ia32_paddq128 (v2di, v2di); -v16qi __builtin_ia32_psubb128 (v16qi, v16qi); -v8hi __builtin_ia32_psubw128 (v8hi, v8hi); -v4si __builtin_ia32_psubd128 (v4si, v4si); -v2di __builtin_ia32_psubq128 (v2di, v2di); -v8hi __builtin_ia32_pmullw128 (v8hi, v8hi); -v8hi __builtin_ia32_pmulhw128 (v8hi, v8hi); -v2di __builtin_ia32_pand128 (v2di, v2di); -v2di __builtin_ia32_pandn128 (v2di, v2di); -v2di __builtin_ia32_por128 (v2di, v2di); -v2di __builtin_ia32_pxor128 (v2di, v2di); -v16qi __builtin_ia32_pavgb128 (v16qi, v16qi); -v8hi __builtin_ia32_pavgw128 (v8hi, v8hi); -v16qi __builtin_ia32_pcmpeqb128 (v16qi, v16qi); -v8hi __builtin_ia32_pcmpeqw128 (v8hi, v8hi); -v4si __builtin_ia32_pcmpeqd128 (v4si, v4si); -v16qi __builtin_ia32_pcmpgtb128 (v16qi, v16qi); -v8hi __builtin_ia32_pcmpgtw128 (v8hi, v8hi); -v4si __builtin_ia32_pcmpgtd128 (v4si, v4si); -v16qi __builtin_ia32_pmaxub128 (v16qi, v16qi); -v8hi __builtin_ia32_pmaxsw128 (v8hi, v8hi); -v16qi 
__builtin_ia32_pminub128 (v16qi, v16qi); -v8hi __builtin_ia32_pminsw128 (v8hi, v8hi); -v16qi __builtin_ia32_punpckhbw128 (v16qi, v16qi); -v8hi __builtin_ia32_punpckhwd128 (v8hi, v8hi); -v4si __builtin_ia32_punpckhdq128 (v4si, v4si); -v2di __builtin_ia32_punpckhqdq128 (v2di, v2di); -v16qi __builtin_ia32_punpcklbw128 (v16qi, v16qi); -v8hi __builtin_ia32_punpcklwd128 (v8hi, v8hi); -v4si __builtin_ia32_punpckldq128 (v4si, v4si); -v2di __builtin_ia32_punpcklqdq128 (v2di, v2di); -v16qi __builtin_ia32_packsswb128 (v8hi, v8hi); -v8hi __builtin_ia32_packssdw128 (v4si, v4si); -v16qi __builtin_ia32_packuswb128 (v8hi, v8hi); -v8hi __builtin_ia32_pmulhuw128 (v8hi, v8hi); -void __builtin_ia32_maskmovdqu (v16qi, v16qi); -v2df __builtin_ia32_loadupd (double *); -void __builtin_ia32_storeupd (double *, v2df); -v2df __builtin_ia32_loadhpd (v2df, double const *); -v2df __builtin_ia32_loadlpd (v2df, double const *); -int __builtin_ia32_movmskpd (v2df); -int __builtin_ia32_pmovmskb128 (v16qi); -void __builtin_ia32_movnti (int *, int); -void __builtin_ia32_movnti64 (long long int *, long long int); -void __builtin_ia32_movntpd (double *, v2df); -void __builtin_ia32_movntdq (v2df *, v2df); -v4si __builtin_ia32_pshufd (v4si, int); -v8hi __builtin_ia32_pshuflw (v8hi, int); -v8hi __builtin_ia32_pshufhw (v8hi, int); -v2di __builtin_ia32_psadbw128 (v16qi, v16qi); -v2df __builtin_ia32_sqrtpd (v2df); -v2df __builtin_ia32_sqrtsd (v2df); -v2df __builtin_ia32_shufpd (v2df, v2df, int); -v2df __builtin_ia32_cvtdq2pd (v4si); -v4sf __builtin_ia32_cvtdq2ps (v4si); -v4si __builtin_ia32_cvtpd2dq (v2df); -v2si __builtin_ia32_cvtpd2pi (v2df); -v4sf __builtin_ia32_cvtpd2ps (v2df); -v4si __builtin_ia32_cvttpd2dq (v2df); -v2si __builtin_ia32_cvttpd2pi (v2df); -v2df __builtin_ia32_cvtpi2pd (v2si); -int __builtin_ia32_cvtsd2si (v2df); -int __builtin_ia32_cvttsd2si (v2df); -long long __builtin_ia32_cvtsd2si64 (v2df); -long long __builtin_ia32_cvttsd2si64 (v2df); -v4si __builtin_ia32_cvtps2dq (v4sf); -v2df __builtin_ia32_cvtps2pd (v4sf); -v4si __builtin_ia32_cvttps2dq (v4sf); -v2df __builtin_ia32_cvtsi2sd (v2df, int); -v2df __builtin_ia32_cvtsi642sd (v2df, long long); -v4sf __builtin_ia32_cvtsd2ss (v4sf, v2df); -v2df __builtin_ia32_cvtss2sd (v2df, v4sf); -void __builtin_ia32_clflush (const void *); -void __builtin_ia32_lfence (void); -void __builtin_ia32_mfence (void); -v16qi __builtin_ia32_loaddqu (const char *); -void __builtin_ia32_storedqu (char *, v16qi); -v1di __builtin_ia32_pmuludq (v2si, v2si); -v2di __builtin_ia32_pmuludq128 (v4si, v4si); -v8hi __builtin_ia32_psllw128 (v8hi, v8hi); -v4si __builtin_ia32_pslld128 (v4si, v4si); -v2di __builtin_ia32_psllq128 (v2di, v2di); -v8hi __builtin_ia32_psrlw128 (v8hi, v8hi); -v4si __builtin_ia32_psrld128 (v4si, v4si); -v2di __builtin_ia32_psrlq128 (v2di, v2di); -v8hi __builtin_ia32_psraw128 (v8hi, v8hi); -v4si __builtin_ia32_psrad128 (v4si, v4si); -v2di __builtin_ia32_pslldqi128 (v2di, int); -v8hi __builtin_ia32_psllwi128 (v8hi, int); -v4si __builtin_ia32_pslldi128 (v4si, int); -v2di __builtin_ia32_psllqi128 (v2di, int); -v2di __builtin_ia32_psrldqi128 (v2di, int); -v8hi __builtin_ia32_psrlwi128 (v8hi, int); -v4si __builtin_ia32_psrldi128 (v4si, int); -v2di __builtin_ia32_psrlqi128 (v2di, int); -v8hi __builtin_ia32_psrawi128 (v8hi, int); -v4si __builtin_ia32_psradi128 (v4si, int); -v4si __builtin_ia32_pmaddwd128 (v8hi, v8hi); -v2di __builtin_ia32_movq128 (v2di); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cmpleu_sc_b (uint32_t, uint6_t) +Generated 
assembler @code{cv.cmpleu.sci.b} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_r (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.cplxmul.r} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_i (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.cplxmul.i} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_r (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.cplxmul.r.div2} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_i (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.cplxmul.i.div2} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_r (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.cplxmul.r.div4} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_i (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.cplxmul.i.div4} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_r (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.cplxmul.r.div8} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxmul_i (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.cplxmul.i.div8} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_cplxconj (uint32_t) +Generated assembler @code{cv.cplxconj} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_subrotmj (uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.subrotmj} +@end deftypefn + +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_subrotmj (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.subrotmj.div2} +@end deftypefn -The following built-in functions are available when @option{-msse3} is used. -All of them generate the machine instruction that is part of the name. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_subrotmj (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.subrotmj.div4} +@end deftypefn -@smallexample -v2df __builtin_ia32_addsubpd (v2df, v2df); -v4sf __builtin_ia32_addsubps (v4sf, v4sf); -v2df __builtin_ia32_haddpd (v2df, v2df); -v4sf __builtin_ia32_haddps (v4sf, v4sf); -v2df __builtin_ia32_hsubpd (v2df, v2df); -v4sf __builtin_ia32_hsubps (v4sf, v4sf); -v16qi __builtin_ia32_lddqu (char const *); -void __builtin_ia32_monitor (void *, unsigned int, unsigned int); -v4sf __builtin_ia32_movshdup (v4sf); -v4sf __builtin_ia32_movsldup (v4sf); -void __builtin_ia32_mwait (unsigned int, unsigned int); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_subrotmj (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.subrotmj.div8} +@end deftypefn -The following built-in functions are available when @option{-mssse3} is used. -All of them generate the machine instruction that is part of the name. 
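+As a brief sketch of how the comparison built-ins above can be used
+(illustrative only, and not part of the CORE-V documentation; it
+assumes each 16-bit lane of the result is set to all ones where the
+comparison holds and to zero otherwise):
+
+@smallexample
+#include <stdint.h>
+
+/* Per-halfword mask: a and b each pack two 16-bit lanes.  */
+uint32_t
+mask_gt_h (uint32_t a, uint32_t b)
+@{
+  return __builtin_riscv_cv_simd_cmpgt_h (a, b);
+@}
+@end smallexample
+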
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_h (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.add.div2} +@end deftypefn -@smallexample -v2si __builtin_ia32_phaddd (v2si, v2si); -v4hi __builtin_ia32_phaddw (v4hi, v4hi); -v4hi __builtin_ia32_phaddsw (v4hi, v4hi); -v2si __builtin_ia32_phsubd (v2si, v2si); -v4hi __builtin_ia32_phsubw (v4hi, v4hi); -v4hi __builtin_ia32_phsubsw (v4hi, v4hi); -v4hi __builtin_ia32_pmaddubsw (v8qi, v8qi); -v4hi __builtin_ia32_pmulhrsw (v4hi, v4hi); -v8qi __builtin_ia32_pshufb (v8qi, v8qi); -v8qi __builtin_ia32_psignb (v8qi, v8qi); -v2si __builtin_ia32_psignd (v2si, v2si); -v4hi __builtin_ia32_psignw (v4hi, v4hi); -v1di __builtin_ia32_palignr (v1di, v1di, int); -v8qi __builtin_ia32_pabsb (v8qi); -v2si __builtin_ia32_pabsd (v2si); -v4hi __builtin_ia32_pabsw (v4hi); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_h (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.add.div4} +@end deftypefn -The following built-in functions are available when @option{-mssse3} is used. -All of them generate the machine instruction that is part of the name. +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_add_h (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.add.div8} +@end deftypefn -@smallexample -v4si __builtin_ia32_phaddd128 (v4si, v4si); -v8hi __builtin_ia32_phaddw128 (v8hi, v8hi); -v8hi __builtin_ia32_phaddsw128 (v8hi, v8hi); -v4si __builtin_ia32_phsubd128 (v4si, v4si); -v8hi __builtin_ia32_phsubw128 (v8hi, v8hi); -v8hi __builtin_ia32_phsubsw128 (v8hi, v8hi); -v8hi __builtin_ia32_pmaddubsw128 (v16qi, v16qi); -v8hi __builtin_ia32_pmulhrsw128 (v8hi, v8hi); -v16qi __builtin_ia32_pshufb128 (v16qi, v16qi); -v16qi __builtin_ia32_psignb128 (v16qi, v16qi); -v4si __builtin_ia32_psignd128 (v4si, v4si); -v8hi __builtin_ia32_psignw128 (v8hi, v8hi); -v2di __builtin_ia32_palignr128 (v2di, v2di, int); -v16qi __builtin_ia32_pabsb128 (v16qi); -v4si __builtin_ia32_pabsd128 (v4si); -v8hi __builtin_ia32_pabsw128 (v8hi); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_h (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.sub.div2} +@end deftypefn -The following built-in functions are available when @option{-msse4.1} is -used. All of them generate the machine instruction that is part of the -name. 
+@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_h (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.sub.div4} +@end deftypefn -@smallexample -v2df __builtin_ia32_blendpd (v2df, v2df, const int); -v4sf __builtin_ia32_blendps (v4sf, v4sf, const int); -v2df __builtin_ia32_blendvpd (v2df, v2df, v2df); -v4sf __builtin_ia32_blendvps (v4sf, v4sf, v4sf); -v2df __builtin_ia32_dppd (v2df, v2df, const int); -v4sf __builtin_ia32_dpps (v4sf, v4sf, const int); -v4sf __builtin_ia32_insertps128 (v4sf, v4sf, const int); -v2di __builtin_ia32_movntdqa (v2di *); -v16qi __builtin_ia32_mpsadbw128 (v16qi, v16qi, const int); -v8hi __builtin_ia32_packusdw128 (v4si, v4si); -v16qi __builtin_ia32_pblendvb128 (v16qi, v16qi, v16qi); -v8hi __builtin_ia32_pblendw128 (v8hi, v8hi, const int); -v2di __builtin_ia32_pcmpeqq (v2di, v2di); -v8hi __builtin_ia32_phminposuw128 (v8hi); -v16qi __builtin_ia32_pmaxsb128 (v16qi, v16qi); -v4si __builtin_ia32_pmaxsd128 (v4si, v4si); -v4si __builtin_ia32_pmaxud128 (v4si, v4si); -v8hi __builtin_ia32_pmaxuw128 (v8hi, v8hi); -v16qi __builtin_ia32_pminsb128 (v16qi, v16qi); -v4si __builtin_ia32_pminsd128 (v4si, v4si); -v4si __builtin_ia32_pminud128 (v4si, v4si); -v8hi __builtin_ia32_pminuw128 (v8hi, v8hi); -v4si __builtin_ia32_pmovsxbd128 (v16qi); -v2di __builtin_ia32_pmovsxbq128 (v16qi); -v8hi __builtin_ia32_pmovsxbw128 (v16qi); -v2di __builtin_ia32_pmovsxdq128 (v4si); -v4si __builtin_ia32_pmovsxwd128 (v8hi); -v2di __builtin_ia32_pmovsxwq128 (v8hi); -v4si __builtin_ia32_pmovzxbd128 (v16qi); -v2di __builtin_ia32_pmovzxbq128 (v16qi); -v8hi __builtin_ia32_pmovzxbw128 (v16qi); -v2di __builtin_ia32_pmovzxdq128 (v4si); -v4si __builtin_ia32_pmovzxwd128 (v8hi); -v2di __builtin_ia32_pmovzxwq128 (v8hi); -v2di __builtin_ia32_pmuldq128 (v4si, v4si); -v4si __builtin_ia32_pmulld128 (v4si, v4si); -int __builtin_ia32_ptestc128 (v2di, v2di); -int __builtin_ia32_ptestnzc128 (v2di, v2di); -int __builtin_ia32_ptestz128 (v2di, v2di); -v2df __builtin_ia32_roundpd (v2df, const int); -v4sf __builtin_ia32_roundps (v4sf, const int); -v2df __builtin_ia32_roundsd (v2df, v2df, const int); -v4sf __builtin_ia32_roundss (v4sf, v4sf, const int); -@end smallexample +@deftypefn {Built-in Function} {uint32_t} __builtin_riscv_cv_simd_sub_h (uint32_t, uint32_t, uint32_t, uint4_t) +Generated assembler @code{cv.sub.div8} +@end deftypefn -The following built-in functions are available when @option{-msse4.1} is -used. +@node RX Built-in Functions +@subsection RX Built-in Functions +GCC supports some of the RX instructions which cannot be expressed in +the C programming language via the use of built-in functions. The +following functions are supported: -@defbuiltin{v4sf __builtin_ia32_vec_set_v4sf (v4sf, float, const int)} -Generates the @code{insertps} machine instruction. +@defbuiltin{void __builtin_rx_brk (void)} +Generates the @code{brk} machine instruction. @enddefbuiltin -@defbuiltin{int __builtin_ia32_vec_ext_v16qi (v16qi, const int)} -Generates the @code{pextrb} machine instruction. +@defbuiltin{void __builtin_rx_clrpsw (int)} +Generates the @code{clrpsw} machine instruction to clear the specified +bit in the processor status word. @enddefbuiltin -@defbuiltin{v16qi __builtin_ia32_vec_set_v16qi (v16qi, int, const int)} -Generates the @code{pinsrb} machine instruction. +@defbuiltin{void __builtin_rx_int (int)} +Generates the @code{int} machine instruction to generate an interrupt +with the specified value. 
@enddefbuiltin
-@defbuiltin{v4si __builtin_ia32_vec_set_v4si (v4si, int, const int)}
-Generates the @code{pinsrd} machine instruction.
+@defbuiltin{void __builtin_rx_machi (int, int)}
+Generates the @code{machi} machine instruction to add the result of
+multiplying the top 16 bits of the two arguments into the
+accumulator.
@enddefbuiltin
-@defbuiltin{v2di __builtin_ia32_vec_set_v2di (v2di, long long, const int)}
-Generates the @code{pinsrq} machine instruction in 64bit mode.
+@defbuiltin{void __builtin_rx_maclo (int, int)}
+Generates the @code{maclo} machine instruction to add the result of
+multiplying the bottom 16 bits of the two arguments into the
+accumulator.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_rx_mulhi (int, int)}
+Generates the @code{mulhi} machine instruction to place the result of
+multiplying the top 16 bits of the two arguments into the
+accumulator.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_rx_mullo (int, int)}
+Generates the @code{mullo} machine instruction to place the result of
+multiplying the bottom 16 bits of the two arguments into the
+accumulator.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_rx_mvfachi (void)}
+Generates the @code{mvfachi} machine instruction to read the top
+32 bits of the accumulator.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_rx_mvfacmi (void)}
+Generates the @code{mvfacmi} machine instruction to read the middle
+32 bits of the accumulator.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_rx_mvfc (int)}
+Generates the @code{mvfc} machine instruction which reads the control
+register specified in its argument and returns its value.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_rx_mvtachi (int)}
+Generates the @code{mvtachi} machine instruction to set the top
+32 bits of the accumulator.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_rx_mvtaclo (int)}
+Generates the @code{mvtaclo} machine instruction to set the bottom
+32 bits of the accumulator.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_rx_mvtc (int @var{reg}, int @var{val})}
+Generates the @code{mvtc} machine instruction which sets control
+register number @code{reg} to @code{val}.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_rx_mvtipl (int)}
+Generates the @code{mvtipl} machine instruction to set the interrupt
+priority level.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_rx_racw (int)}
+Generates the @code{racw} machine instruction to round the accumulator
+according to the specified mode.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_rx_revw (int)}
+Generates the @code{revw} machine instruction which swaps the bytes in
+the argument so that bits 0--7 now occupy bits 8--15 and vice versa,
+and also bits 16--23 occupy bits 24--31 and vice versa.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_rx_rmpa (void)}
+Generates the @code{rmpa} machine instruction which initiates a
+repeated multiply and accumulate sequence.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_rx_round (float)}
+Generates the @code{round} machine instruction which returns the
+floating-point argument rounded according to the current rounding mode
+set in the floating-point status word register.
+@enddefbuiltin
+
+@defbuiltin{int __builtin_rx_sat (int)}
+Generates the @code{sat} machine instruction which returns the
+saturated value of the argument.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_rx_setpsw (int)}
+Generates the @code{setpsw} machine instruction to set the specified
+bit in the processor status word.
+@enddefbuiltin
+
+@defbuiltin{void __builtin_rx_wait (void)}
+Generates the @code{wait} machine instruction.
+@enddefbuiltin
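+
+As a sketch of how the accumulator built-ins above combine
+(illustrative only; the placement of each product within the 64-bit
+accumulator is defined by the RX instruction set, so this sketch
+simply clears the accumulator and reads its top 32 bits back):
+
+@smallexample
+int
+mac_lo_halves (const int *x, const int *y, int n)
+@{
+  __builtin_rx_mvtachi (0);           /* clear top 32 bits */
+  __builtin_rx_mvtaclo (0);           /* clear bottom 32 bits */
+  for (int i = 0; i < n; i++)
+    __builtin_rx_maclo (x[i], y[i]);  /* accumulate low-half products */
+  return __builtin_rx_mvfachi ();     /* read top 32 bits back */
+@}
+@end smallexample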
+
+@node S/390 System z Built-in Functions
+@subsection S/390 System z Built-in Functions
+@defbuiltin{int __builtin_tbegin (void*)}
+Generates the @code{tbegin} machine instruction starting a
+non-constrained hardware transaction. If the parameter is non-NULL the
+memory area is used to store the transaction diagnostic buffer and
+will be passed as first operand to @code{tbegin}. This buffer can be
+defined using the @code{struct __htm_tdb} C struct defined in
+@code{htmintrin.h} and must reside on a double-word boundary. The
+second tbegin operand is set to @code{0xff0c}. This enables
+save/restore of all GPRs and disables aborts for FPR and AR
+manipulations inside the transaction body. The condition code set by
+the tbegin instruction is returned as an integer value. The tbegin
+instruction by definition overwrites the content of all FPRs. The
+compiler will generate code which saves and restores the FPRs. For
+soft-float code it is recommended to use the @code{*_nofloat}
+variant. In order to prevent a TDB from being written it is required
+to pass a constant zero value as the parameter. Passing a zero value
+through a variable is not sufficient. Although modifications of
+access registers inside the transaction do not trigger a transaction
+abort, it is not supported to actually modify them. Access
+registers do not get saved when entering a transaction. They will have
+undefined state when reaching the abort code.
@enddefbuiltin
-The following built-in functions are changed to generate new SSE4.1
-instructions when @option{-msse4.1} is used.
-
-@defbuiltin{float __builtin_ia32_vec_ext_v4sf (v4sf, const int)}
-Generates the @code{extractps} machine instruction.
-@enddefbuiltin
+Macros for the possible return codes of tbegin are defined in the
+@code{htmintrin.h} header file:
-@defbuiltin{int __builtin_ia32_vec_ext_v4si (v4si, const int)}
-Generates the @code{pextrd} machine instruction.
-@enddefbuiltin
+@defmac _HTM_TBEGIN_STARTED
+@code{tbegin} has been executed as part of normal processing. The
+transaction body is supposed to be executed.
+@end defmac
-@defbuiltin{{long long} __builtin_ia32_vec_ext_v2di (v2di, const int)}
-Generates the @code{pextrq} machine instruction in 64bit mode.
-@enddefbuiltin
+@defmac _HTM_TBEGIN_INDETERMINATE
+The transaction was aborted due to an indeterminate condition which
+might be persistent.
+@end defmac
-The following built-in functions are available when @option{-msse4.2} is
-used. All of them generate the machine instruction that is part of the
-name.
+@defmac _HTM_TBEGIN_TRANSIENT
+The transaction aborted due to a transient failure. The transaction
+should be re-executed in that case.
+@end defmac

-@smallexample
-v16qi __builtin_ia32_pcmpestrm128 (v16qi, int, v16qi, int, const int);
-int __builtin_ia32_pcmpestri128 (v16qi, int, v16qi, int, const int);
-int __builtin_ia32_pcmpestria128 (v16qi, int, v16qi, int, const int);
-int __builtin_ia32_pcmpestric128 (v16qi, int, v16qi, int, const int);
-int __builtin_ia32_pcmpestrio128 (v16qi, int, v16qi, int, const int);
-int __builtin_ia32_pcmpestris128 (v16qi, int, v16qi, int, const int);
-int __builtin_ia32_pcmpestriz128 (v16qi, int, v16qi, int, const int);
-v16qi __builtin_ia32_pcmpistrm128 (v16qi, v16qi, const int);
-int __builtin_ia32_pcmpistri128 (v16qi, v16qi, const int);
-int __builtin_ia32_pcmpistria128 (v16qi, v16qi, const int);
-int __builtin_ia32_pcmpistric128 (v16qi, v16qi, const int);
-int __builtin_ia32_pcmpistrio128 (v16qi, v16qi, const int);
-int __builtin_ia32_pcmpistris128 (v16qi, v16qi, const int);
-int __builtin_ia32_pcmpistriz128 (v16qi, v16qi, const int);
-v2di __builtin_ia32_pcmpgtq (v2di, v2di);
-@end smallexample
+@defmac _HTM_TBEGIN_PERSISTENT
+The transaction aborted due to a persistent failure. Re-execution
+under the same circumstances will not be productive.
+@end defmac
-The following built-in functions are available when @option{-msse4.2} is
-used.
+@defmac _HTM_FIRST_USER_ABORT_CODE
+The @code{_HTM_FIRST_USER_ABORT_CODE} defined in @code{htmintrin.h}
+specifies the first abort code which can be used for
+@code{__builtin_tabort}. Values below this threshold are reserved for
+machine use.
+@end defmac
-@defbuiltin{{unsigned int} __builtin_ia32_crc32qi (unsigned int, unsigned char)}
-Generates the @code{crc32b} machine instruction.
-@enddefbuiltin
+@deftp {Data type} {struct __htm_tdb}
+The @code{struct __htm_tdb} defined in @code{htmintrin.h} describes
+the structure of the transaction diagnostic block as specified in the
+Principles of Operation manual chapter 5-91.
+@end deftp
-@defbuiltin{{unsigned int} __builtin_ia32_crc32hi (unsigned int, unsigned short)}
-Generates the @code{crc32w} machine instruction.
+@defbuiltin{int __builtin_tbegin_nofloat (void*)}
+Same as @code{__builtin_tbegin} but without FPR saves and restores.
+Using this variant in code making use of FPRs will leave the FPRs in
+undefined state when entering the transaction abort handler code.
@enddefbuiltin
-@defbuiltin{{unsigned int} __builtin_ia32_crc32si (unsigned int, unsigned int)}
-Generates the @code{crc32l} machine instruction.
+@defbuiltin{int __builtin_tbegin_retry (void*, int)}
+In addition to @code{__builtin_tbegin} a loop for transient failures
+is generated. If tbegin returns a condition code of 2 the transaction
+will be retried as often as specified in the second argument. The
+perform processor assist instruction is used to tell the CPU about the
+number of failures so far.
@enddefbuiltin
-@defbuiltin{{unsigned long long} __builtin_ia32_crc32di (unsigned long long, unsigned long long)}
-Generates the @code{crc32q} machine instruction.
+@defbuiltin{int __builtin_tbegin_retry_nofloat (void*, int)}
+Same as @code{__builtin_tbegin_retry} but without FPR saves and
+restores. Using this variant in code making use of FPRs will leave
+the FPRs in undefined state when entering the transaction abort
+handler code.
@enddefbuiltin
-The following built-in functions are changed to generate new SSE4.2
-instructions when @option{-msse4.2} is used.
-
-@defbuiltin{int __builtin_popcount (unsigned int)}
-Generates the @code{popcntl} machine instruction.
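+
+As a minimal sketch of the usual transaction pattern (the fallback
+path is a stub here; @code{fallback_lock} is a hypothetical function,
+not part of this interface):
+
+@smallexample
+#include <htmintrin.h>
+
+extern void fallback_lock (void);
+
+void
+increment (long *p)
+@{
+  /* A constant zero suppresses the diagnostic buffer, as described
+     for __builtin_tbegin above.  */
+  if (__builtin_tbegin (0) == _HTM_TBEGIN_STARTED)
+    @{
+      *p += 1;            /* transactional body */
+      __builtin_tend ();
+    @}
+  else
+    fallback_lock ();     /* aborted or failed to start */
+@}
+@end smallexample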
+@defbuiltin{void __builtin_tbeginc (void)} +Generates the @code{tbeginc} machine instruction starting a constrained +hardware transaction. The second operand is set to @code{0xff08}. @enddefbuiltin -@defbuiltin{int __builtin_popcountl (unsigned long)} -Generates the @code{popcntl} or @code{popcntq} machine instruction, -depending on the size of @code{unsigned long}. +@defbuiltin{int __builtin_tend (void)} +Generates the @code{tend} machine instruction finishing a transaction +and making the changes visible to other threads. The condition code +generated by tend is returned as integer value. @enddefbuiltin -@defbuiltin{int __builtin_popcountll (unsigned long long)} -Generates the @code{popcntq} machine instruction. +@defbuiltin{void __builtin_tabort (int)} +Generates the @code{tabort} machine instruction with the specified +abort code. Abort codes from 0 through 255 are reserved and will +result in an error message. @enddefbuiltin -The following built-in functions are available when @option{-mavx} is -used. All of them generate the machine instruction that is part of the -name. - -@smallexample -v4df __builtin_ia32_addpd256 (v4df,v4df); -v8sf __builtin_ia32_addps256 (v8sf,v8sf); -v4df __builtin_ia32_addsubpd256 (v4df,v4df); -v8sf __builtin_ia32_addsubps256 (v8sf,v8sf); -v4df __builtin_ia32_andnpd256 (v4df,v4df); -v8sf __builtin_ia32_andnps256 (v8sf,v8sf); -v4df __builtin_ia32_andpd256 (v4df,v4df); -v8sf __builtin_ia32_andps256 (v8sf,v8sf); -v4df __builtin_ia32_blendpd256 (v4df,v4df,int); -v8sf __builtin_ia32_blendps256 (v8sf,v8sf,int); -v4df __builtin_ia32_blendvpd256 (v4df,v4df,v4df); -v8sf __builtin_ia32_blendvps256 (v8sf,v8sf,v8sf); -v2df __builtin_ia32_cmppd (v2df,v2df,int); -v4df __builtin_ia32_cmppd256 (v4df,v4df,int); -v4sf __builtin_ia32_cmpps (v4sf,v4sf,int); -v8sf __builtin_ia32_cmpps256 (v8sf,v8sf,int); -v2df __builtin_ia32_cmpsd (v2df,v2df,int); -v4sf __builtin_ia32_cmpss (v4sf,v4sf,int); -v4df __builtin_ia32_cvtdq2pd256 (v4si); -v8sf __builtin_ia32_cvtdq2ps256 (v8si); -v4si __builtin_ia32_cvtpd2dq256 (v4df); -v4sf __builtin_ia32_cvtpd2ps256 (v4df); -v8si __builtin_ia32_cvtps2dq256 (v8sf); -v4df __builtin_ia32_cvtps2pd256 (v4sf); -v4si __builtin_ia32_cvttpd2dq256 (v4df); -v8si __builtin_ia32_cvttps2dq256 (v8sf); -v4df __builtin_ia32_divpd256 (v4df,v4df); -v8sf __builtin_ia32_divps256 (v8sf,v8sf); -v8sf __builtin_ia32_dpps256 (v8sf,v8sf,int); -v4df __builtin_ia32_haddpd256 (v4df,v4df); -v8sf __builtin_ia32_haddps256 (v8sf,v8sf); -v4df __builtin_ia32_hsubpd256 (v4df,v4df); -v8sf __builtin_ia32_hsubps256 (v8sf,v8sf); -v32qi __builtin_ia32_lddqu256 (pcchar); -v32qi __builtin_ia32_loaddqu256 (pcchar); -v4df __builtin_ia32_loadupd256 (pcdouble); -v8sf __builtin_ia32_loadups256 (pcfloat); -v2df __builtin_ia32_maskloadpd (pcv2df,v2df); -v4df __builtin_ia32_maskloadpd256 (pcv4df,v4df); -v4sf __builtin_ia32_maskloadps (pcv4sf,v4sf); -v8sf __builtin_ia32_maskloadps256 (pcv8sf,v8sf); -void __builtin_ia32_maskstorepd (pv2df,v2df,v2df); -void __builtin_ia32_maskstorepd256 (pv4df,v4df,v4df); -void __builtin_ia32_maskstoreps (pv4sf,v4sf,v4sf); -void __builtin_ia32_maskstoreps256 (pv8sf,v8sf,v8sf); -v4df __builtin_ia32_maxpd256 (v4df,v4df); -v8sf __builtin_ia32_maxps256 (v8sf,v8sf); -v4df __builtin_ia32_minpd256 (v4df,v4df); -v8sf __builtin_ia32_minps256 (v8sf,v8sf); -v4df __builtin_ia32_movddup256 (v4df); -int __builtin_ia32_movmskpd256 (v4df); -int __builtin_ia32_movmskps256 (v8sf); -v8sf __builtin_ia32_movshdup256 (v8sf); -v8sf __builtin_ia32_movsldup256 (v8sf); -v4df __builtin_ia32_mulpd256 
(v4df,v4df); -v8sf __builtin_ia32_mulps256 (v8sf,v8sf); -v4df __builtin_ia32_orpd256 (v4df,v4df); -v8sf __builtin_ia32_orps256 (v8sf,v8sf); -v2df __builtin_ia32_pd_pd256 (v4df); -v4df __builtin_ia32_pd256_pd (v2df); -v4sf __builtin_ia32_ps_ps256 (v8sf); -v8sf __builtin_ia32_ps256_ps (v4sf); -int __builtin_ia32_ptestc256 (v4di,v4di,ptest); -int __builtin_ia32_ptestnzc256 (v4di,v4di,ptest); -int __builtin_ia32_ptestz256 (v4di,v4di,ptest); -v8sf __builtin_ia32_rcpps256 (v8sf); -v4df __builtin_ia32_roundpd256 (v4df,int); -v8sf __builtin_ia32_roundps256 (v8sf,int); -v8sf __builtin_ia32_rsqrtps_nr256 (v8sf); -v8sf __builtin_ia32_rsqrtps256 (v8sf); -v4df __builtin_ia32_shufpd256 (v4df,v4df,int); -v8sf __builtin_ia32_shufps256 (v8sf,v8sf,int); -v4si __builtin_ia32_si_si256 (v8si); -v8si __builtin_ia32_si256_si (v4si); -v4df __builtin_ia32_sqrtpd256 (v4df); -v8sf __builtin_ia32_sqrtps_nr256 (v8sf); -v8sf __builtin_ia32_sqrtps256 (v8sf); -void __builtin_ia32_storedqu256 (pchar,v32qi); -void __builtin_ia32_storeupd256 (pdouble,v4df); -void __builtin_ia32_storeups256 (pfloat,v8sf); -v4df __builtin_ia32_subpd256 (v4df,v4df); -v8sf __builtin_ia32_subps256 (v8sf,v8sf); -v4df __builtin_ia32_unpckhpd256 (v4df,v4df); -v8sf __builtin_ia32_unpckhps256 (v8sf,v8sf); -v4df __builtin_ia32_unpcklpd256 (v4df,v4df); -v8sf __builtin_ia32_unpcklps256 (v8sf,v8sf); -v4df __builtin_ia32_vbroadcastf128_pd256 (pcv2df); -v8sf __builtin_ia32_vbroadcastf128_ps256 (pcv4sf); -v4df __builtin_ia32_vbroadcastsd256 (pcdouble); -v4sf __builtin_ia32_vbroadcastss (pcfloat); -v8sf __builtin_ia32_vbroadcastss256 (pcfloat); -v2df __builtin_ia32_vextractf128_pd256 (v4df,int); -v4sf __builtin_ia32_vextractf128_ps256 (v8sf,int); -v4si __builtin_ia32_vextractf128_si256 (v8si,int); -v4df __builtin_ia32_vinsertf128_pd256 (v4df,v2df,int); -v8sf __builtin_ia32_vinsertf128_ps256 (v8sf,v4sf,int); -v8si __builtin_ia32_vinsertf128_si256 (v8si,v4si,int); -v4df __builtin_ia32_vperm2f128_pd256 (v4df,v4df,int); -v8sf __builtin_ia32_vperm2f128_ps256 (v8sf,v8sf,int); -v8si __builtin_ia32_vperm2f128_si256 (v8si,v8si,int); -v2df __builtin_ia32_vpermil2pd (v2df,v2df,v2di,int); -v4df __builtin_ia32_vpermil2pd256 (v4df,v4df,v4di,int); -v4sf __builtin_ia32_vpermil2ps (v4sf,v4sf,v4si,int); -v8sf __builtin_ia32_vpermil2ps256 (v8sf,v8sf,v8si,int); -v2df __builtin_ia32_vpermilpd (v2df,int); -v4df __builtin_ia32_vpermilpd256 (v4df,int); -v4sf __builtin_ia32_vpermilps (v4sf,int); -v8sf __builtin_ia32_vpermilps256 (v8sf,int); -v2df __builtin_ia32_vpermilvarpd (v2df,v2di); -v4df __builtin_ia32_vpermilvarpd256 (v4df,v4di); -v4sf __builtin_ia32_vpermilvarps (v4sf,v4si); -v8sf __builtin_ia32_vpermilvarps256 (v8sf,v8si); -int __builtin_ia32_vtestcpd (v2df,v2df,ptest); -int __builtin_ia32_vtestcpd256 (v4df,v4df,ptest); -int __builtin_ia32_vtestcps (v4sf,v4sf,ptest); -int __builtin_ia32_vtestcps256 (v8sf,v8sf,ptest); -int __builtin_ia32_vtestnzcpd (v2df,v2df,ptest); -int __builtin_ia32_vtestnzcpd256 (v4df,v4df,ptest); -int __builtin_ia32_vtestnzcps (v4sf,v4sf,ptest); -int __builtin_ia32_vtestnzcps256 (v8sf,v8sf,ptest); -int __builtin_ia32_vtestzpd (v2df,v2df,ptest); -int __builtin_ia32_vtestzpd256 (v4df,v4df,ptest); -int __builtin_ia32_vtestzps (v4sf,v4sf,ptest); -int __builtin_ia32_vtestzps256 (v8sf,v8sf,ptest); -void __builtin_ia32_vzeroall (void); -void __builtin_ia32_vzeroupper (void); -v4df __builtin_ia32_xorpd256 (v4df,v4df); -v8sf __builtin_ia32_xorps256 (v8sf,v8sf); -@end smallexample +@defbuiltin{void __builtin_tx_assist (int)} +Generates the @code{ppa rX,rY,1} 
machine instruction, where the
+integer parameter is loaded into rX and a value of zero is loaded into
+rY. The integer parameter specifies the number of times the
+transaction has aborted so far.
+@enddefbuiltin
-The following built-in functions are available when @option{-mavx2} is
-used. All of them generate the machine instruction that is part of the
-name.
+@defbuiltin{int __builtin_tx_nesting_depth (void)}
+Generates the @code{etnd} machine instruction. The current nesting
+depth is returned as an integer value. For a nesting depth of 0 the code
+is not executed as part of a transaction.
+@enddefbuiltin
-@smallexample
-v32qi __builtin_ia32_mpsadbw256 (v32qi,v32qi,int);
-v32qi __builtin_ia32_pabsb256 (v32qi);
-v16hi __builtin_ia32_pabsw256 (v16hi);
-v8si __builtin_ia32_pabsd256 (v8si);
-v16hi __builtin_ia32_packssdw256 (v8si,v8si);
-v32qi __builtin_ia32_packsswb256 (v16hi,v16hi);
-v16hi __builtin_ia32_packusdw256 (v8si,v8si);
-v32qi __builtin_ia32_packuswb256 (v16hi,v16hi);
-v32qi __builtin_ia32_paddb256 (v32qi,v32qi);
-v16hi __builtin_ia32_paddw256 (v16hi,v16hi);
-v8si __builtin_ia32_paddd256 (v8si,v8si);
-v4di __builtin_ia32_paddq256 (v4di,v4di);
-v32qi __builtin_ia32_paddsb256 (v32qi,v32qi);
-v16hi __builtin_ia32_paddsw256 (v16hi,v16hi);
-v32qi __builtin_ia32_paddusb256 (v32qi,v32qi);
-v16hi __builtin_ia32_paddusw256 (v16hi,v16hi);
-v4di __builtin_ia32_palignr256 (v4di,v4di,int);
-v4di __builtin_ia32_andsi256 (v4di,v4di);
-v4di __builtin_ia32_andnotsi256 (v4di,v4di);
-v32qi __builtin_ia32_pavgb256 (v32qi,v32qi);
-v16hi __builtin_ia32_pavgw256 (v16hi,v16hi);
-v32qi __builtin_ia32_pblendvb256 (v32qi,v32qi,v32qi);
-v16hi __builtin_ia32_pblendw256 (v16hi,v16hi,int);
-v32qi __builtin_ia32_pcmpeqb256 (v32qi,v32qi);
-v16hi __builtin_ia32_pcmpeqw256 (v16hi,v16hi);
-v8si __builtin_ia32_pcmpeqd256 (c8si,v8si);
-v4di __builtin_ia32_pcmpeqq256 (v4di,v4di);
-v32qi __builtin_ia32_pcmpgtb256 (v32qi,v32qi);
-v16hi __builtin_ia32_pcmpgtw256 (16hi,v16hi);
-v8si __builtin_ia32_pcmpgtd256 (v8si,v8si);
-v4di __builtin_ia32_pcmpgtq256 (v4di,v4di);
-v16hi __builtin_ia32_phaddw256 (v16hi,v16hi);
-v8si __builtin_ia32_phaddd256 (v8si,v8si);
-v16hi __builtin_ia32_phaddsw256 (v16hi,v16hi);
-v16hi __builtin_ia32_phsubw256 (v16hi,v16hi);
-v8si __builtin_ia32_phsubd256 (v8si,v8si);
-v16hi __builtin_ia32_phsubsw256 (v16hi,v16hi);
-v32qi __builtin_ia32_pmaddubsw256 (v32qi,v32qi);
-v16hi __builtin_ia32_pmaddwd256 (v16hi,v16hi);
-v32qi __builtin_ia32_pmaxsb256 (v32qi,v32qi);
-v16hi __builtin_ia32_pmaxsw256 (v16hi,v16hi);
-v8si __builtin_ia32_pmaxsd256 (v8si,v8si);
-v32qi __builtin_ia32_pmaxub256 (v32qi,v32qi);
-v16hi __builtin_ia32_pmaxuw256 (v16hi,v16hi);
-v8si __builtin_ia32_pmaxud256 (v8si,v8si);
-v32qi __builtin_ia32_pminsb256 (v32qi,v32qi);
-v16hi __builtin_ia32_pminsw256 (v16hi,v16hi);
-v8si __builtin_ia32_pminsd256 (v8si,v8si);
-v32qi __builtin_ia32_pminub256 (v32qi,v32qi);
-v16hi __builtin_ia32_pminuw256 (v16hi,v16hi);
-v8si __builtin_ia32_pminud256 (v8si,v8si);
-int __builtin_ia32_pmovmskb256 (v32qi);
-v16hi __builtin_ia32_pmovsxbw256 (v16qi);
-v8si __builtin_ia32_pmovsxbd256 (v16qi);
-v4di __builtin_ia32_pmovsxbq256 (v16qi);
-v8si __builtin_ia32_pmovsxwd256 (v8hi);
-v4di __builtin_ia32_pmovsxwq256 (v8hi);
-v4di __builtin_ia32_pmovsxdq256 (v4si);
-v16hi __builtin_ia32_pmovzxbw256 (v16qi);
-v8si __builtin_ia32_pmovzxbd256 (v16qi);
-v4di __builtin_ia32_pmovzxbq256 (v16qi);
-v8si __builtin_ia32_pmovzxwd256 (v8hi);
-v4di __builtin_ia32_pmovzxwq256 (v8hi);
-v4di __builtin_ia32_pmovzxdq256 (v4si);
-v4di
__builtin_ia32_pmuldq256 (v8si,v8si); -v16hi __builtin_ia32_pmulhrsw256 (v16hi, v16hi); -v16hi __builtin_ia32_pmulhuw256 (v16hi,v16hi); -v16hi __builtin_ia32_pmulhw256 (v16hi,v16hi); -v16hi __builtin_ia32_pmullw256 (v16hi,v16hi); -v8si __builtin_ia32_pmulld256 (v8si,v8si); -v4di __builtin_ia32_pmuludq256 (v8si,v8si); -v4di __builtin_ia32_por256 (v4di,v4di); -v16hi __builtin_ia32_psadbw256 (v32qi,v32qi); -v32qi __builtin_ia32_pshufb256 (v32qi,v32qi); -v8si __builtin_ia32_pshufd256 (v8si,int); -v16hi __builtin_ia32_pshufhw256 (v16hi,int); -v16hi __builtin_ia32_pshuflw256 (v16hi,int); -v32qi __builtin_ia32_psignb256 (v32qi,v32qi); -v16hi __builtin_ia32_psignw256 (v16hi,v16hi); -v8si __builtin_ia32_psignd256 (v8si,v8si); -v4di __builtin_ia32_pslldqi256 (v4di,int); -v16hi __builtin_ia32_psllwi256 (16hi,int); -v16hi __builtin_ia32_psllw256(v16hi,v8hi); -v8si __builtin_ia32_pslldi256 (v8si,int); -v8si __builtin_ia32_pslld256(v8si,v4si); -v4di __builtin_ia32_psllqi256 (v4di,int); -v4di __builtin_ia32_psllq256(v4di,v2di); -v16hi __builtin_ia32_psrawi256 (v16hi,int); -v16hi __builtin_ia32_psraw256 (v16hi,v8hi); -v8si __builtin_ia32_psradi256 (v8si,int); -v8si __builtin_ia32_psrad256 (v8si,v4si); -v4di __builtin_ia32_psrldqi256 (v4di, int); -v16hi __builtin_ia32_psrlwi256 (v16hi,int); -v16hi __builtin_ia32_psrlw256 (v16hi,v8hi); -v8si __builtin_ia32_psrldi256 (v8si,int); -v8si __builtin_ia32_psrld256 (v8si,v4si); -v4di __builtin_ia32_psrlqi256 (v4di,int); -v4di __builtin_ia32_psrlq256(v4di,v2di); -v32qi __builtin_ia32_psubb256 (v32qi,v32qi); -v32hi __builtin_ia32_psubw256 (v16hi,v16hi); -v8si __builtin_ia32_psubd256 (v8si,v8si); -v4di __builtin_ia32_psubq256 (v4di,v4di); -v32qi __builtin_ia32_psubsb256 (v32qi,v32qi); -v16hi __builtin_ia32_psubsw256 (v16hi,v16hi); -v32qi __builtin_ia32_psubusb256 (v32qi,v32qi); -v16hi __builtin_ia32_psubusw256 (v16hi,v16hi); -v32qi __builtin_ia32_punpckhbw256 (v32qi,v32qi); -v16hi __builtin_ia32_punpckhwd256 (v16hi,v16hi); -v8si __builtin_ia32_punpckhdq256 (v8si,v8si); -v4di __builtin_ia32_punpckhqdq256 (v4di,v4di); -v32qi __builtin_ia32_punpcklbw256 (v32qi,v32qi); -v16hi __builtin_ia32_punpcklwd256 (v16hi,v16hi); -v8si __builtin_ia32_punpckldq256 (v8si,v8si); -v4di __builtin_ia32_punpcklqdq256 (v4di,v4di); -v4di __builtin_ia32_pxor256 (v4di,v4di); -v4di __builtin_ia32_movntdqa256 (pv4di); -v4sf __builtin_ia32_vbroadcastss_ps (v4sf); -v8sf __builtin_ia32_vbroadcastss_ps256 (v4sf); -v4df __builtin_ia32_vbroadcastsd_pd256 (v2df); -v4di __builtin_ia32_vbroadcastsi256 (v2di); -v4si __builtin_ia32_pblendd128 (v4si,v4si); -v8si __builtin_ia32_pblendd256 (v8si,v8si); -v32qi __builtin_ia32_pbroadcastb256 (v16qi); -v16hi __builtin_ia32_pbroadcastw256 (v8hi); -v8si __builtin_ia32_pbroadcastd256 (v4si); -v4di __builtin_ia32_pbroadcastq256 (v2di); -v16qi __builtin_ia32_pbroadcastb128 (v16qi); -v8hi __builtin_ia32_pbroadcastw128 (v8hi); -v4si __builtin_ia32_pbroadcastd128 (v4si); -v2di __builtin_ia32_pbroadcastq128 (v2di); -v8si __builtin_ia32_permvarsi256 (v8si,v8si); -v4df __builtin_ia32_permdf256 (v4df,int); -v8sf __builtin_ia32_permvarsf256 (v8sf,v8sf); -v4di __builtin_ia32_permdi256 (v4di,int); -v4di __builtin_ia32_permti256 (v4di,v4di,int); -v4di __builtin_ia32_extract128i256 (v4di,int); -v4di __builtin_ia32_insert128i256 (v4di,v2di,int); -v8si __builtin_ia32_maskloadd256 (pcv8si,v8si); -v4di __builtin_ia32_maskloadq256 (pcv4di,v4di); -v4si __builtin_ia32_maskloadd (pcv4si,v4si); -v2di __builtin_ia32_maskloadq (pcv2di,v2di); -void __builtin_ia32_maskstored256 
(pv8si,v8si,v8si);
-void __builtin_ia32_maskstoreq256 (pv4di,v4di,v4di);
-void __builtin_ia32_maskstored (pv4si,v4si,v4si);
-void __builtin_ia32_maskstoreq (pv2di,v2di,v2di);
-v8si __builtin_ia32_psllv8si (v8si,v8si);
-v4si __builtin_ia32_psllv4si (v4si,v4si);
-v4di __builtin_ia32_psllv4di (v4di,v4di);
-v2di __builtin_ia32_psllv2di (v2di,v2di);
-v8si __builtin_ia32_psrav8si (v8si,v8si);
-v4si __builtin_ia32_psrav4si (v4si,v4si);
-v8si __builtin_ia32_psrlv8si (v8si,v8si);
-v4si __builtin_ia32_psrlv4si (v4si,v4si);
-v4di __builtin_ia32_psrlv4di (v4di,v4di);
-v2di __builtin_ia32_psrlv2di (v2di,v2di);
-v2df __builtin_ia32_gathersiv2df (v2df, pcdouble,v4si,v2df,int);
-v4df __builtin_ia32_gathersiv4df (v4df, pcdouble,v4si,v4df,int);
-v2df __builtin_ia32_gatherdiv2df (v2df, pcdouble,v2di,v2df,int);
-v4df __builtin_ia32_gatherdiv4df (v4df, pcdouble,v4di,v4df,int);
-v4sf __builtin_ia32_gathersiv4sf (v4sf, pcfloat,v4si,v4sf,int);
-v8sf __builtin_ia32_gathersiv8sf (v8sf, pcfloat,v8si,v8sf,int);
-v4sf __builtin_ia32_gatherdiv4sf (v4sf, pcfloat,v2di,v4sf,int);
-v4sf __builtin_ia32_gatherdiv4sf256 (v4sf, pcfloat,v4di,v4sf,int);
-v2di __builtin_ia32_gathersiv2di (v2di, pcint64,v4si,v2di,int);
-v4di __builtin_ia32_gathersiv4di (v4di, pcint64,v4si,v4di,int);
-v2di __builtin_ia32_gatherdiv2di (v2di, pcint64,v2di,v2di,int);
-v4di __builtin_ia32_gatherdiv4di (v4di, pcint64,v4di,v4di,int);
-v4si __builtin_ia32_gathersiv4si (v4si, pcint,v4si,v4si,int);
-v8si __builtin_ia32_gathersiv8si (v8si, pcint,v8si,v8si,int);
-v4si __builtin_ia32_gatherdiv4si (v4si, pcint,v2di,v4si,int);
-v4si __builtin_ia32_gatherdiv4si256 (v4si, pcint,v4di,v4si,int);
-@end smallexample
+@defbuiltin{void __builtin_non_tx_store (uint64_t *, uint64_t)}
-The following built-in functions are available when @option{-maes} is
-used. All of them generate the machine instruction that is part of the
-name.
+Generates the @code{ntstg} machine instruction. The second argument
+is written to the first argument's location. The store operation will
+not be rolled back in the case of a transaction abort.
+@enddefbuiltin
-@smallexample
-v2di __builtin_ia32_aesenc128 (v2di, v2di);
-v2di __builtin_ia32_aesenclast128 (v2di, v2di);
-v2di __builtin_ia32_aesdec128 (v2di, v2di);
-v2di __builtin_ia32_aesdeclast128 (v2di, v2di);
-v2di __builtin_ia32_aeskeygenassist128 (v2di, const int);
-v2di __builtin_ia32_aesimc128 (v2di);
-@end smallexample
+@node SH Built-in Functions
+@subsection SH Built-in Functions
+The following built-in functions are supported on the SH1, SH2, SH3 and SH4
+families of processors:
-The following built-in function is available when @option{-mpclmul} is
-used.
+@defbuiltin{{void} __builtin_set_thread_pointer (void *@var{ptr})}
+Sets the @samp{GBR} register to the specified value @var{ptr}. This is usually
+used by system code that manages threads and execution contexts. The compiler
+normally does not generate code that modifies the contents of @samp{GBR} and
+thus the value is preserved across function calls. Changing the @samp{GBR}
+value in user code must be done with caution, since the compiler might use
+@samp{GBR} in order to access thread local variables.
-@defbuiltin{v2di __builtin_ia32_pclmulqdq128 (v2di, v2di, const int)}
-Generates the @code{pclmulqdq} machine instruction.
@enddefbuiltin
-The following built-in function is available when @option{-mfsgsbase} is
-used. All of them generate the machine instruction that is part of the
-name.
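+
+As a short sketch of the intended usage (the function and its
+argument are hypothetical, not part of the SH interface):
+
+@smallexample
+/* Thread-library code installs the new thread's control block in
+   @samp{GBR}; later GBR-relative accesses then see that thread's
+   data.  */
+static void
+switch_to_thread (void *new_tcb)
+@{
+  __builtin_set_thread_pointer (new_tcb);
+@}
+@end smallexample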
-
+@defbuiltin{{void *} __builtin_thread_pointer (void)}
+Returns the value that is currently set in the @samp{GBR} register.
+Memory loads and stores that use the thread pointer as a base address are
+turned into @samp{GBR} based displacement loads and stores, if possible.
+For example:
@smallexample
-unsigned int __builtin_ia32_rdfsbase32 (void);
-unsigned long long __builtin_ia32_rdfsbase64 (void);
-unsigned int __builtin_ia32_rdgsbase32 (void);
-unsigned long long __builtin_ia32_rdgsbase64 (void);
-void _writefsbase_u32 (unsigned int);
-void _writefsbase_u64 (unsigned long long);
-void _writegsbase_u32 (unsigned int);
-void _writegsbase_u64 (unsigned long long);
+struct my_tcb
+@{
+  int a, b, c, d, e;
+@};
+
+int get_tcb_value (void)
+@{
+  // Generate @samp{mov.l @@(8,gbr),r0} instruction
+  return ((struct my_tcb *) __builtin_thread_pointer ())->c;
+@}
+
@end smallexample
+@enddefbuiltin
-The following built-in function is available when @option{-mrdrnd} is
-used. All of them generate the machine instruction that is part of the
-name.
+@defbuiltin{{unsigned int} __builtin_sh_get_fpscr (void)}
+Returns the value that is currently set in the @samp{FPSCR} register.
+@enddefbuiltin
+
+@defbuiltin{{void} __builtin_sh_set_fpscr (unsigned int @var{val})}
+Sets the @samp{FPSCR} register to the specified value @var{val}, while
+preserving the current values of the FR, SZ and PR bits.
+@enddefbuiltin
-@smallexample
-unsigned int __builtin_ia32_rdrand16_step (unsigned short *);
-unsigned int __builtin_ia32_rdrand32_step (unsigned int *);
-unsigned int __builtin_ia32_rdrand64_step (unsigned long long *);
-@end smallexample
+@node SPARC VIS Built-in Functions
+@subsection SPARC VIS Built-in Functions
-The following built-in function is available when @option{-mptwrite} is
-used. All of them generate the machine instruction that is part of the
-name.
+GCC supports SIMD operations on the SPARC using both the generic vector
+extensions (@pxref{Vector Extensions}) as well as built-in functions for
+the SPARC Visual Instruction Set (VIS). When you use the @option{-mvis}
+switch, the VIS extension is exposed as the following built-in functions:
@smallexample
-void __builtin_ia32_ptwrite32 (unsigned);
-void __builtin_ia32_ptwrite64 (unsigned long long);
-@end smallexample
+typedef int v1si __attribute__ ((vector_size (4)));
+typedef int v2si __attribute__ ((vector_size (8)));
+typedef short v4hi __attribute__ ((vector_size (8)));
+typedef short v2hi __attribute__ ((vector_size (4)));
+typedef unsigned char v8qi __attribute__ ((vector_size (8)));
+typedef unsigned char v4qi __attribute__ ((vector_size (4)));
-The following built-in functions are available when @option{-msse4a} is used.
-All of them generate the machine instruction that is part of the name.
+void __builtin_vis_write_gsr (int64_t); +int64_t __builtin_vis_read_gsr (void); -@smallexample -void __builtin_ia32_movntsd (double *, v2df); -void __builtin_ia32_movntss (float *, v4sf); -v2di __builtin_ia32_extrq (v2di, v16qi); -v2di __builtin_ia32_extrqi (v2di, const unsigned int, const unsigned int); -v2di __builtin_ia32_insertq (v2di, v2di); -v2di __builtin_ia32_insertqi (v2di, v2di, const unsigned int, const unsigned int); -@end smallexample +void * __builtin_vis_alignaddr (void *, long); +void * __builtin_vis_alignaddrl (void *, long); +int64_t __builtin_vis_faligndatadi (int64_t, int64_t); +v2si __builtin_vis_faligndatav2si (v2si, v2si); +v4hi __builtin_vis_faligndatav4hi (v4si, v4si); +v8qi __builtin_vis_faligndatav8qi (v8qi, v8qi); -The following built-in functions are available when @option{-mxop} is used. -@smallexample -v2df __builtin_ia32_vfrczpd (v2df); -v4sf __builtin_ia32_vfrczps (v4sf); -v2df __builtin_ia32_vfrczsd (v2df); -v4sf __builtin_ia32_vfrczss (v4sf); -v4df __builtin_ia32_vfrczpd256 (v4df); -v8sf __builtin_ia32_vfrczps256 (v8sf); -v2di __builtin_ia32_vpcmov (v2di, v2di, v2di); -v2di __builtin_ia32_vpcmov_v2di (v2di, v2di, v2di); -v4si __builtin_ia32_vpcmov_v4si (v4si, v4si, v4si); -v8hi __builtin_ia32_vpcmov_v8hi (v8hi, v8hi, v8hi); -v16qi __builtin_ia32_vpcmov_v16qi (v16qi, v16qi, v16qi); -v2df __builtin_ia32_vpcmov_v2df (v2df, v2df, v2df); -v4sf __builtin_ia32_vpcmov_v4sf (v4sf, v4sf, v4sf); -v4di __builtin_ia32_vpcmov_v4di256 (v4di, v4di, v4di); -v8si __builtin_ia32_vpcmov_v8si256 (v8si, v8si, v8si); -v16hi __builtin_ia32_vpcmov_v16hi256 (v16hi, v16hi, v16hi); -v32qi __builtin_ia32_vpcmov_v32qi256 (v32qi, v32qi, v32qi); -v4df __builtin_ia32_vpcmov_v4df256 (v4df, v4df, v4df); -v8sf __builtin_ia32_vpcmov_v8sf256 (v8sf, v8sf, v8sf); -v16qi __builtin_ia32_vpcomeqb (v16qi, v16qi); -v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi); -v4si __builtin_ia32_vpcomeqd (v4si, v4si); -v2di __builtin_ia32_vpcomeqq (v2di, v2di); -v16qi __builtin_ia32_vpcomequb (v16qi, v16qi); -v4si __builtin_ia32_vpcomequd (v4si, v4si); -v2di __builtin_ia32_vpcomequq (v2di, v2di); -v8hi __builtin_ia32_vpcomequw (v8hi, v8hi); -v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi); -v16qi __builtin_ia32_vpcomfalseb (v16qi, v16qi); -v4si __builtin_ia32_vpcomfalsed (v4si, v4si); -v2di __builtin_ia32_vpcomfalseq (v2di, v2di); -v16qi __builtin_ia32_vpcomfalseub (v16qi, v16qi); -v4si __builtin_ia32_vpcomfalseud (v4si, v4si); -v2di __builtin_ia32_vpcomfalseuq (v2di, v2di); -v8hi __builtin_ia32_vpcomfalseuw (v8hi, v8hi); -v8hi __builtin_ia32_vpcomfalsew (v8hi, v8hi); -v16qi __builtin_ia32_vpcomgeb (v16qi, v16qi); -v4si __builtin_ia32_vpcomged (v4si, v4si); -v2di __builtin_ia32_vpcomgeq (v2di, v2di); -v16qi __builtin_ia32_vpcomgeub (v16qi, v16qi); -v4si __builtin_ia32_vpcomgeud (v4si, v4si); -v2di __builtin_ia32_vpcomgeuq (v2di, v2di); -v8hi __builtin_ia32_vpcomgeuw (v8hi, v8hi); -v8hi __builtin_ia32_vpcomgew (v8hi, v8hi); -v16qi __builtin_ia32_vpcomgtb (v16qi, v16qi); -v4si __builtin_ia32_vpcomgtd (v4si, v4si); -v2di __builtin_ia32_vpcomgtq (v2di, v2di); -v16qi __builtin_ia32_vpcomgtub (v16qi, v16qi); -v4si __builtin_ia32_vpcomgtud (v4si, v4si); -v2di __builtin_ia32_vpcomgtuq (v2di, v2di); -v8hi __builtin_ia32_vpcomgtuw (v8hi, v8hi); -v8hi __builtin_ia32_vpcomgtw (v8hi, v8hi); -v16qi __builtin_ia32_vpcomleb (v16qi, v16qi); -v4si __builtin_ia32_vpcomled (v4si, v4si); -v2di __builtin_ia32_vpcomleq (v2di, v2di); -v16qi __builtin_ia32_vpcomleub (v16qi, v16qi); -v4si __builtin_ia32_vpcomleud (v4si, v4si); -v2di 
__builtin_ia32_vpcomleuq (v2di, v2di); -v8hi __builtin_ia32_vpcomleuw (v8hi, v8hi); -v8hi __builtin_ia32_vpcomlew (v8hi, v8hi); -v16qi __builtin_ia32_vpcomltb (v16qi, v16qi); -v4si __builtin_ia32_vpcomltd (v4si, v4si); -v2di __builtin_ia32_vpcomltq (v2di, v2di); -v16qi __builtin_ia32_vpcomltub (v16qi, v16qi); -v4si __builtin_ia32_vpcomltud (v4si, v4si); -v2di __builtin_ia32_vpcomltuq (v2di, v2di); -v8hi __builtin_ia32_vpcomltuw (v8hi, v8hi); -v8hi __builtin_ia32_vpcomltw (v8hi, v8hi); -v16qi __builtin_ia32_vpcomneb (v16qi, v16qi); -v4si __builtin_ia32_vpcomned (v4si, v4si); -v2di __builtin_ia32_vpcomneq (v2di, v2di); -v16qi __builtin_ia32_vpcomneub (v16qi, v16qi); -v4si __builtin_ia32_vpcomneud (v4si, v4si); -v2di __builtin_ia32_vpcomneuq (v2di, v2di); -v8hi __builtin_ia32_vpcomneuw (v8hi, v8hi); -v8hi __builtin_ia32_vpcomnew (v8hi, v8hi); -v16qi __builtin_ia32_vpcomtrueb (v16qi, v16qi); -v4si __builtin_ia32_vpcomtrued (v4si, v4si); -v2di __builtin_ia32_vpcomtrueq (v2di, v2di); -v16qi __builtin_ia32_vpcomtrueub (v16qi, v16qi); -v4si __builtin_ia32_vpcomtrueud (v4si, v4si); -v2di __builtin_ia32_vpcomtrueuq (v2di, v2di); -v8hi __builtin_ia32_vpcomtrueuw (v8hi, v8hi); -v8hi __builtin_ia32_vpcomtruew (v8hi, v8hi); -v4si __builtin_ia32_vphaddbd (v16qi); -v2di __builtin_ia32_vphaddbq (v16qi); -v8hi __builtin_ia32_vphaddbw (v16qi); -v2di __builtin_ia32_vphadddq (v4si); -v4si __builtin_ia32_vphaddubd (v16qi); -v2di __builtin_ia32_vphaddubq (v16qi); -v8hi __builtin_ia32_vphaddubw (v16qi); -v2di __builtin_ia32_vphaddudq (v4si); -v4si __builtin_ia32_vphadduwd (v8hi); -v2di __builtin_ia32_vphadduwq (v8hi); -v4si __builtin_ia32_vphaddwd (v8hi); -v2di __builtin_ia32_vphaddwq (v8hi); -v8hi __builtin_ia32_vphsubbw (v16qi); -v2di __builtin_ia32_vphsubdq (v4si); -v4si __builtin_ia32_vphsubwd (v8hi); -v4si __builtin_ia32_vpmacsdd (v4si, v4si, v4si); -v2di __builtin_ia32_vpmacsdqh (v4si, v4si, v2di); -v2di __builtin_ia32_vpmacsdql (v4si, v4si, v2di); -v4si __builtin_ia32_vpmacssdd (v4si, v4si, v4si); -v2di __builtin_ia32_vpmacssdqh (v4si, v4si, v2di); -v2di __builtin_ia32_vpmacssdql (v4si, v4si, v2di); -v4si __builtin_ia32_vpmacsswd (v8hi, v8hi, v4si); -v8hi __builtin_ia32_vpmacssww (v8hi, v8hi, v8hi); -v4si __builtin_ia32_vpmacswd (v8hi, v8hi, v4si); -v8hi __builtin_ia32_vpmacsww (v8hi, v8hi, v8hi); -v4si __builtin_ia32_vpmadcsswd (v8hi, v8hi, v4si); -v4si __builtin_ia32_vpmadcswd (v8hi, v8hi, v4si); -v16qi __builtin_ia32_vpperm (v16qi, v16qi, v16qi); -v16qi __builtin_ia32_vprotb (v16qi, v16qi); -v4si __builtin_ia32_vprotd (v4si, v4si); -v2di __builtin_ia32_vprotq (v2di, v2di); -v8hi __builtin_ia32_vprotw (v8hi, v8hi); -v16qi __builtin_ia32_vpshab (v16qi, v16qi); -v4si __builtin_ia32_vpshad (v4si, v4si); -v2di __builtin_ia32_vpshaq (v2di, v2di); -v8hi __builtin_ia32_vpshaw (v8hi, v8hi); -v16qi __builtin_ia32_vpshlb (v16qi, v16qi); -v4si __builtin_ia32_vpshld (v4si, v4si); -v2di __builtin_ia32_vpshlq (v2di, v2di); -v8hi __builtin_ia32_vpshlw (v8hi, v8hi); -@end smallexample +v4hi __builtin_vis_fexpand (v4qi); -The following built-in functions are available when @option{-mfma4} is used. -All of them generate the machine instruction that is part of the name. 
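+/* fexpand above widens each unsigned byte to a 16-bit fixed-point
+   value; the fmul8x16 family below multiplies 8-bit pixel data by
+   16-bit coefficients (a summary of the VIS semantics, not text
+   from the GCC sources).  */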
+v4hi __builtin_vis_fmul8x16 (v4qi, v4hi); +v4hi __builtin_vis_fmul8x16au (v4qi, v2hi); +v4hi __builtin_vis_fmul8x16al (v4qi, v2hi); +v4hi __builtin_vis_fmul8sux16 (v8qi, v4hi); +v4hi __builtin_vis_fmul8ulx16 (v8qi, v4hi); +v2si __builtin_vis_fmuld8sux16 (v4qi, v2hi); +v2si __builtin_vis_fmuld8ulx16 (v4qi, v2hi); -@smallexample -v2df __builtin_ia32_vfmaddpd (v2df, v2df, v2df); -v4sf __builtin_ia32_vfmaddps (v4sf, v4sf, v4sf); -v2df __builtin_ia32_vfmaddsd (v2df, v2df, v2df); -v4sf __builtin_ia32_vfmaddss (v4sf, v4sf, v4sf); -v2df __builtin_ia32_vfmsubpd (v2df, v2df, v2df); -v4sf __builtin_ia32_vfmsubps (v4sf, v4sf, v4sf); -v2df __builtin_ia32_vfmsubsd (v2df, v2df, v2df); -v4sf __builtin_ia32_vfmsubss (v4sf, v4sf, v4sf); -v2df __builtin_ia32_vfnmaddpd (v2df, v2df, v2df); -v4sf __builtin_ia32_vfnmaddps (v4sf, v4sf, v4sf); -v2df __builtin_ia32_vfnmaddsd (v2df, v2df, v2df); -v4sf __builtin_ia32_vfnmaddss (v4sf, v4sf, v4sf); -v2df __builtin_ia32_vfnmsubpd (v2df, v2df, v2df); -v4sf __builtin_ia32_vfnmsubps (v4sf, v4sf, v4sf); -v2df __builtin_ia32_vfnmsubsd (v2df, v2df, v2df); -v4sf __builtin_ia32_vfnmsubss (v4sf, v4sf, v4sf); -v2df __builtin_ia32_vfmaddsubpd (v2df, v2df, v2df); -v4sf __builtin_ia32_vfmaddsubps (v4sf, v4sf, v4sf); -v2df __builtin_ia32_vfmsubaddpd (v2df, v2df, v2df); -v4sf __builtin_ia32_vfmsubaddps (v4sf, v4sf, v4sf); -v4df __builtin_ia32_vfmaddpd256 (v4df, v4df, v4df); -v8sf __builtin_ia32_vfmaddps256 (v8sf, v8sf, v8sf); -v4df __builtin_ia32_vfmsubpd256 (v4df, v4df, v4df); -v8sf __builtin_ia32_vfmsubps256 (v8sf, v8sf, v8sf); -v4df __builtin_ia32_vfnmaddpd256 (v4df, v4df, v4df); -v8sf __builtin_ia32_vfnmaddps256 (v8sf, v8sf, v8sf); -v4df __builtin_ia32_vfnmsubpd256 (v4df, v4df, v4df); -v8sf __builtin_ia32_vfnmsubps256 (v8sf, v8sf, v8sf); -v4df __builtin_ia32_vfmaddsubpd256 (v4df, v4df, v4df); -v8sf __builtin_ia32_vfmaddsubps256 (v8sf, v8sf, v8sf); -v4df __builtin_ia32_vfmsubaddpd256 (v4df, v4df, v4df); -v8sf __builtin_ia32_vfmsubaddps256 (v8sf, v8sf, v8sf); +v4qi __builtin_vis_fpack16 (v4hi); +v8qi __builtin_vis_fpack32 (v2si, v8qi); +v2hi __builtin_vis_fpackfix (v2si); +v8qi __builtin_vis_fpmerge (v4qi, v4qi); -@end smallexample +int64_t __builtin_vis_pdist (v8qi, v8qi, int64_t); -The following built-in functions are available when @option{-mlwp} is used. 
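+/* pdist accumulates the sum of absolute differences of eight byte
+   pairs, e.g. acc = __builtin_vis_pdist (a, b, acc);.  The edge
+   functions below compute partial-store masks for the ends of an
+   unaligned region (a usage note, not text from the GCC sources).  */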
+long __builtin_vis_edge8 (void *, void *);
+long __builtin_vis_edge8l (void *, void *);
+long __builtin_vis_edge16 (void *, void *);
+long __builtin_vis_edge16l (void *, void *);
+long __builtin_vis_edge32 (void *, void *);
+long __builtin_vis_edge32l (void *, void *);
-@smallexample
-void __builtin_ia32_llwpcb16 (void *);
-void __builtin_ia32_llwpcb32 (void *);
-void __builtin_ia32_llwpcb64 (void *);
-void * __builtin_ia32_slwpcb16 (void);
-void * __builtin_ia32_slwpcb32 (void);
-void * __builtin_ia32_slwpcb64 (void);
-void __builtin_ia32_lwpval16 (unsigned short, unsigned int, unsigned short);
-void __builtin_ia32_lwpval32 (unsigned int, unsigned int, unsigned int);
-void __builtin_ia32_lwpval64 (unsigned __int64, unsigned int, unsigned int);
-unsigned char __builtin_ia32_lwpins16 (unsigned short, unsigned int, unsigned short);
-unsigned char __builtin_ia32_lwpins32 (unsigned int, unsigned int, unsigned int);
-unsigned char __builtin_ia32_lwpins64 (unsigned __int64, unsigned int, unsigned int);
-@end smallexample
+long __builtin_vis_fcmple16 (v4hi, v4hi);
+long __builtin_vis_fcmple32 (v2si, v2si);
+long __builtin_vis_fcmpne16 (v4hi, v4hi);
+long __builtin_vis_fcmpne32 (v2si, v2si);
+long __builtin_vis_fcmpgt16 (v4hi, v4hi);
+long __builtin_vis_fcmpgt32 (v2si, v2si);
+long __builtin_vis_fcmpeq16 (v4hi, v4hi);
+long __builtin_vis_fcmpeq32 (v2si, v2si);
+
+v4hi __builtin_vis_fpadd16 (v4hi, v4hi);
+v2hi __builtin_vis_fpadd16s (v2hi, v2hi);
+v2si __builtin_vis_fpadd32 (v2si, v2si);
+v1si __builtin_vis_fpadd32s (v1si, v1si);
+v4hi __builtin_vis_fpsub16 (v4hi, v4hi);
+v2hi __builtin_vis_fpsub16s (v2hi, v2hi);
+v2si __builtin_vis_fpsub32 (v2si, v2si);
+v1si __builtin_vis_fpsub32s (v1si, v1si);
-The following built-in functions are available when @option{-mbmi} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-unsigned int __builtin_ia32_bextr_u32 (unsigned int, unsigned int);
-unsigned long long __builtin_ia32_bextr_u64 (unsigned long long, unsigned long long);
+long __builtin_vis_array8 (long, long);
+long __builtin_vis_array16 (long, long);
+long __builtin_vis_array32 (long, long);
 @end smallexample
-The following built-in functions are available when @option{-mbmi2} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-unsigned int _bzhi_u32 (unsigned int, unsigned int);
-unsigned int _pdep_u32 (unsigned int, unsigned int);
-unsigned int _pext_u32 (unsigned int, unsigned int);
-unsigned long long _bzhi_u64 (unsigned long long, unsigned long long);
-unsigned long long _pdep_u64 (unsigned long long, unsigned long long);
-unsigned long long _pext_u64 (unsigned long long, unsigned long long);
-@end smallexample
+When you use the @option{-mvis2} switch, the VIS version 2.0 built-in
+functions also become available:
-The following built-in functions are available when @option{-mlzcnt} is used.
-All of them generate the machine instruction that is part of the name.
 @smallexample
-unsigned short __builtin_ia32_lzcnt_u16 (unsigned short);
-unsigned int __builtin_ia32_lzcnt_u32 (unsigned int);
-unsigned long long __builtin_ia32_lzcnt_u64 (unsigned long long);
-@end smallexample
+long __builtin_vis_bmask (long, long);
+int64_t __builtin_vis_bshuffledi (int64_t, int64_t);
+v2si __builtin_vis_bshufflev2si (v2si, v2si);
+v4hi __builtin_vis_bshufflev4hi (v4hi, v4hi);
+v8qi __builtin_vis_bshufflev8qi (v8qi, v8qi);
-The following built-in functions are available when @option{-mfxsr} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-void __builtin_ia32_fxsave (void *);
-void __builtin_ia32_fxrstor (void *);
-void __builtin_ia32_fxsave64 (void *);
-void __builtin_ia32_fxrstor64 (void *);
+long __builtin_vis_edge8n (void *, void *);
+long __builtin_vis_edge8ln (void *, void *);
+long __builtin_vis_edge16n (void *, void *);
+long __builtin_vis_edge16ln (void *, void *);
+long __builtin_vis_edge32n (void *, void *);
+long __builtin_vis_edge32ln (void *, void *);
 @end smallexample
-The following built-in functions are available when @option{-mxsave} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-void __builtin_ia32_xsave (void *, long long);
-void __builtin_ia32_xrstor (void *, long long);
-void __builtin_ia32_xsave64 (void *, long long);
-void __builtin_ia32_xrstor64 (void *, long long);
-@end smallexample
+When you use the @option{-mvis3} switch, the VIS version 3.0 built-in
+functions also become available:
-The following built-in functions are available when @option{-mxsaveopt} is used.
-All of them generate the machine instruction that is part of the name.
 @smallexample
-void __builtin_ia32_xsaveopt (void *, long long);
-void __builtin_ia32_xsaveopt64 (void *, long long);
-@end smallexample
+void __builtin_vis_cmask8 (long);
+void __builtin_vis_cmask16 (long);
+void __builtin_vis_cmask32 (long);
+
-The following built-in functions are available when @option{-mtbm} is used.
-Both of them generate the immediate form of the bextr machine instruction.
-@smallexample
-unsigned int __builtin_ia32_bextri_u32 (unsigned int,
-                                        const unsigned int);
-unsigned long long __builtin_ia32_bextri_u64 (unsigned long long,
-                                              const unsigned long long);
-@end smallexample
+v4hi __builtin_vis_fchksm16 (v4hi, v4hi);
+v4hi __builtin_vis_fsll16 (v4hi, v4hi);
+v4hi __builtin_vis_fslas16 (v4hi, v4hi);
+v4hi __builtin_vis_fsrl16 (v4hi, v4hi);
+v4hi __builtin_vis_fsra16 (v4hi, v4hi);
+v2si __builtin_vis_fsll32 (v2si, v2si);
+v2si __builtin_vis_fslas32 (v2si, v2si);
+v2si __builtin_vis_fsrl32 (v2si, v2si);
+v2si __builtin_vis_fsra32 (v2si, v2si);
-The following built-in functions are available when @option{-m3dnow} is used.
-All of them generate the machine instruction that is part of the name.
+long __builtin_vis_pdistn (v8qi, v8qi);
-@smallexample
-void __builtin_ia32_femms (void);
-v8qi __builtin_ia32_pavgusb (v8qi, v8qi);
-v2si __builtin_ia32_pf2id (v2sf);
-v2sf __builtin_ia32_pfacc (v2sf, v2sf);
-v2sf __builtin_ia32_pfadd (v2sf, v2sf);
-v2si __builtin_ia32_pfcmpeq (v2sf, v2sf);
-v2si __builtin_ia32_pfcmpge (v2sf, v2sf);
-v2si __builtin_ia32_pfcmpgt (v2sf, v2sf);
-v2sf __builtin_ia32_pfmax (v2sf, v2sf);
-v2sf __builtin_ia32_pfmin (v2sf, v2sf);
-v2sf __builtin_ia32_pfmul (v2sf, v2sf);
-v2sf __builtin_ia32_pfrcp (v2sf);
-v2sf __builtin_ia32_pfrcpit1 (v2sf, v2sf);
-v2sf __builtin_ia32_pfrcpit2 (v2sf, v2sf);
-v2sf __builtin_ia32_pfrsqrt (v2sf);
-v2sf __builtin_ia32_pfsub (v2sf, v2sf);
-v2sf __builtin_ia32_pfsubr (v2sf, v2sf);
-v2sf __builtin_ia32_pi2fd (v2si);
-v4hi __builtin_ia32_pmulhrw (v4hi, v4hi);
-@end smallexample
+v4hi __builtin_vis_fmean16 (v4hi, v4hi);
-The following built-in functions are available when @option{-m3dnowa} is used.
-All of them generate the machine instruction that is part of the name.
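+/* fpadd64 and fpsub64 below operate on a full 64-bit register rather
+   than on packed elements, unlike the fpadd16/fpadd32 variants (an
+   explanatory note, not text from the GCC sources).  */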
+int64_t __builtin_vis_fpadd64 (int64_t, int64_t);
+int64_t __builtin_vis_fpsub64 (int64_t, int64_t);
-@smallexample
-v2si __builtin_ia32_pf2iw (v2sf);
-v2sf __builtin_ia32_pfnacc (v2sf, v2sf);
-v2sf __builtin_ia32_pfpnacc (v2sf, v2sf);
-v2sf __builtin_ia32_pi2fw (v2si);
-v2sf __builtin_ia32_pswapdsf (v2sf);
-v2si __builtin_ia32_pswapdsi (v2si);
-@end smallexample
+v4hi __builtin_vis_fpadds16 (v4hi, v4hi);
+v2hi __builtin_vis_fpadds16s (v2hi, v2hi);
+v4hi __builtin_vis_fpsubs16 (v4hi, v4hi);
+v2hi __builtin_vis_fpsubs16s (v2hi, v2hi);
+v2si __builtin_vis_fpadds32 (v2si, v2si);
+v1si __builtin_vis_fpadds32s (v1si, v1si);
+v2si __builtin_vis_fpsubs32 (v2si, v2si);
+v1si __builtin_vis_fpsubs32s (v1si, v1si);
-The following built-in functions are available when @option{-mrtm} is used.
-They are used for restricted transactional memory. These are the internal
-low level functions. Normally the functions in
-@ref{x86 transactional memory intrinsics} should be used instead.
+long __builtin_vis_fucmple8 (v8qi, v8qi);
+long __builtin_vis_fucmpne8 (v8qi, v8qi);
+long __builtin_vis_fucmpgt8 (v8qi, v8qi);
+long __builtin_vis_fucmpeq8 (v8qi, v8qi);
-@smallexample
-int __builtin_ia32_xbegin ();
-void __builtin_ia32_xend ();
-void __builtin_ia32_xabort (status);
-int __builtin_ia32_xtest ();
-@end smallexample
+float __builtin_vis_fhadds (float, float);
+double __builtin_vis_fhaddd (double, double);
+float __builtin_vis_fhsubs (float, float);
+double __builtin_vis_fhsubd (double, double);
+float __builtin_vis_fnhadds (float, float);
+double __builtin_vis_fnhaddd (double, double);
-The following built-in functions are available when @option{-mmwaitx} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-void __builtin_ia32_monitorx (void *, unsigned int, unsigned int);
-void __builtin_ia32_mwaitx (unsigned int, unsigned int, unsigned int);
+int64_t __builtin_vis_umulxhi (int64_t, int64_t);
+int64_t __builtin_vis_xmulx (int64_t, int64_t);
+int64_t __builtin_vis_xmulxhi (int64_t, int64_t);
 @end smallexample
-The following built-in functions are available when @option{-mclzero} is used.
-All of them generate the machine instruction that is part of the name.
-@smallexample
-void __builtin_ia32_clzero (void *);
-@end smallexample
+When you use the @option{-mvis4} switch, the VIS version 4.0 built-in
+functions also become available:
-The following built-in functions are available when @option{-mpku} is used.
-They generate reads and writes to PKRU.
 @smallexample
-void __builtin_ia32_wrpkru (unsigned int);
-unsigned int __builtin_ia32_rdpkru ();
-@end smallexample
+v8qi __builtin_vis_fpadd8 (v8qi, v8qi);
+v8qi __builtin_vis_fpadds8 (v8qi, v8qi);
+v8qi __builtin_vis_fpaddus8 (v8qi, v8qi);
+v4hi __builtin_vis_fpaddus16 (v4hi, v4hi);
-The following built-in functions are available when the
-@option{-mshstk} option is used. They support shadow stack
-machine instructions from Intel Control-flow Enforcement Technology (CET).
-Each built-in function generates the machine instruction that is part
-of the function's name. These are the internal low-level functions.
-Normally the functions in @ref{x86 control-flow protection intrinsics}
-should be used instead.
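+/* In these VIS 4.0 names a trailing `s' denotes signed saturation
+   and `us' unsigned saturation, as opposed to the modular fpadd8 and
+   fpsub8 forms (a naming note, not text from the GCC sources).  */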
+v8qi __builtin_vis_fpsub8 (v8qi, v8qi); +v8qi __builtin_vis_fpsubs8 (v8qi, v8qi); +v8qi __builtin_vis_fpsubus8 (v8qi, v8qi); +v4hi __builtin_vis_fpsubus16 (v4hi, v4hi); -@smallexample -unsigned int __builtin_ia32_rdsspd (void); -unsigned long long __builtin_ia32_rdsspq (void); -void __builtin_ia32_incsspd (unsigned int); -void __builtin_ia32_incsspq (unsigned long long); -void __builtin_ia32_saveprevssp(void); -void __builtin_ia32_rstorssp(void *); -void __builtin_ia32_wrssd(unsigned int, void *); -void __builtin_ia32_wrssq(unsigned long long, void *); -void __builtin_ia32_wrussd(unsigned int, void *); -void __builtin_ia32_wrussq(unsigned long long, void *); -void __builtin_ia32_setssbsy(void); -void __builtin_ia32_clrssbsy(void *); -@end smallexample +long __builtin_vis_fpcmple8 (v8qi, v8qi); +long __builtin_vis_fpcmpgt8 (v8qi, v8qi); +long __builtin_vis_fpcmpule16 (v4hi, v4hi); +long __builtin_vis_fpcmpugt16 (v4hi, v4hi); +long __builtin_vis_fpcmpule32 (v2si, v2si); +long __builtin_vis_fpcmpugt32 (v2si, v2si); -@node x86 transactional memory intrinsics -@subsection x86 Transactional Memory Intrinsics +v8qi __builtin_vis_fpmax8 (v8qi, v8qi); +v4hi __builtin_vis_fpmax16 (v4hi, v4hi); +v2si __builtin_vis_fpmax32 (v2si, v2si); -These hardware transactional memory intrinsics for x86 allow you to use -memory transactions with RTM (Restricted Transactional Memory). -This support is enabled with the @option{-mrtm} option. -For using HLE (Hardware Lock Elision) see -@ref{x86 specific memory model extensions for transactional memory} instead. +v8qi __builtin_vis_fpmaxu8 (v8qi, v8qi); +v4hi __builtin_vis_fpmaxu16 (v4hi, v4hi); +v2si __builtin_vis_fpmaxu32 (v2si, v2si); -A memory transaction commits all changes to memory in an atomic way, -as visible to other threads. If the transaction fails it is rolled back -and all side effects discarded. +v8qi __builtin_vis_fpmin8 (v8qi, v8qi); +v4hi __builtin_vis_fpmin16 (v4hi, v4hi); +v2si __builtin_vis_fpmin32 (v2si, v2si); -Generally there is no guarantee that a memory transaction ever succeeds -and suitable fallback code always needs to be supplied. +v8qi __builtin_vis_fpminu8 (v8qi, v8qi); +v4hi __builtin_vis_fpminu16 (v4hi, v4hi); +v2si __builtin_vis_fpminu32 (v2si, v2si); +@end smallexample -@deftypefn {RTM Function} {unsigned} _xbegin () -Start a RTM (Restricted Transactional Memory) transaction. -Returns @code{_XBEGIN_STARTED} when the transaction -started successfully (note this is not 0, so the constant has to be -explicitly tested). +When you use the @option{-mvis4b} switch, the VIS version 4.0B +built-in functions also become available: -If the transaction aborts, all side effects -are undone and an abort code encoded as a bit mask is returned. -The following macros are defined: +@smallexample +v8qi __builtin_vis_dictunpack8 (double, int); +v4hi __builtin_vis_dictunpack16 (double, int); +v2si __builtin_vis_dictunpack32 (double, int); -@defmac{_XABORT_EXPLICIT} -Transaction was explicitly aborted with @code{_xabort}. The parameter passed -to @code{_xabort} is available with @code{_XABORT_CODE(status)}. 
-@end defmac +long __builtin_vis_fpcmple8shl (v8qi, v8qi, int); +long __builtin_vis_fpcmpgt8shl (v8qi, v8qi, int); +long __builtin_vis_fpcmpeq8shl (v8qi, v8qi, int); +long __builtin_vis_fpcmpne8shl (v8qi, v8qi, int); + +long __builtin_vis_fpcmple16shl (v4hi, v4hi, int); +long __builtin_vis_fpcmpgt16shl (v4hi, v4hi, int); +long __builtin_vis_fpcmpeq16shl (v4hi, v4hi, int); +long __builtin_vis_fpcmpne16shl (v4hi, v4hi, int); + +long __builtin_vis_fpcmple32shl (v2si, v2si, int); +long __builtin_vis_fpcmpgt32shl (v2si, v2si, int); +long __builtin_vis_fpcmpeq32shl (v2si, v2si, int); +long __builtin_vis_fpcmpne32shl (v2si, v2si, int); -@defmac{_XABORT_RETRY} -Transaction retry is possible. -@end defmac +long __builtin_vis_fpcmpule8shl (v8qi, v8qi, int); +long __builtin_vis_fpcmpugt8shl (v8qi, v8qi, int); +long __builtin_vis_fpcmpule16shl (v4hi, v4hi, int); +long __builtin_vis_fpcmpugt16shl (v4hi, v4hi, int); +long __builtin_vis_fpcmpule32shl (v2si, v2si, int); +long __builtin_vis_fpcmpugt32shl (v2si, v2si, int); -@defmac{_XABORT_CONFLICT} -Transaction abort due to a memory conflict with another thread. -@end defmac +long __builtin_vis_fpcmpde8shl (v8qi, v8qi, int); +long __builtin_vis_fpcmpde16shl (v4hi, v4hi, int); +long __builtin_vis_fpcmpde32shl (v2si, v2si, int); -@defmac{_XABORT_CAPACITY} -Transaction abort due to the transaction using too much memory. -@end defmac +long __builtin_vis_fpcmpur8shl (v8qi, v8qi, int); +long __builtin_vis_fpcmpur16shl (v4hi, v4hi, int); +long __builtin_vis_fpcmpur32shl (v2si, v2si, int); +@end smallexample -@defmac{_XABORT_DEBUG} -Transaction abort due to a debug trap. -@end defmac +@node TI C6X Built-in Functions +@subsection TI C6X Built-in Functions -@defmac{_XABORT_NESTED} -Transaction abort in an inner nested transaction. -@end defmac +GCC provides intrinsics to access certain instructions of the TI C6X +processors. These intrinsics, listed below, are available after +inclusion of the @code{c6x_intrinsics.h} header file. They map directly +to C6X instructions. -There is no guarantee -any transaction ever succeeds, so there always needs to be a valid -fallback path. -@end deftypefn +@smallexample +int _sadd (int, int); +int _ssub (int, int); +int _sadd2 (int, int); +int _ssub2 (int, int); +long long _mpy2 (int, int); +long long _smpy2 (int, int); +int _add4 (int, int); +int _sub4 (int, int); +int _saddu4 (int, int); -@deftypefn {RTM Function} {void} _xend () -Commit the current transaction. When no transaction is active this faults. -All memory side effects of the transaction become visible -to other threads in an atomic manner. -@end deftypefn +int _smpy (int, int); +int _smpyh (int, int); +int _smpyhl (int, int); +int _smpylh (int, int); -@deftypefn {RTM Function} {int} _xtest () -Return a nonzero value if a transaction is currently active, otherwise 0. -@end deftypefn +int _sshl (int, int); +int _subc (int, int); -@deftypefn {RTM Function} {void} _xabort (status) -Abort the current transaction. When no transaction is active this is a no-op. -The @var{status} is an 8-bit constant; its value is encoded in the return -value from @code{_xbegin}. 
-@end deftypefn
+int _avg2 (int, int);
+int _avgu4 (int, int);
-Here is an example showing handling for @code{_XABORT_RETRY}
-and a fallback path for other failures:
+int _clrr (int, int);
+int _extr (int, int);
+int _extru (int, int);
+int _abs (int);
+int _abs2 (int);
 @end smallexample
-@smallexample
-#include <immintrin.h>
+@node x86 Built-in Functions
+@subsection x86 Built-in Functions
-int n_tries, max_tries;
-unsigned status = _XABORT_EXPLICIT;
-...
+These built-in functions are available for the x86-32 and x86-64 family
+of computers, depending on the command-line switches used.
-for (n_tries = 0; n_tries < max_tries; n_tries++)
-  @{
-    status = _xbegin ();
-    if (status == _XBEGIN_STARTED || !(status & _XABORT_RETRY))
-      break;
-  @}
-if (status == _XBEGIN_STARTED)
-  @{
-    ... transaction code...
-    _xend ();
-  @}
-else
-  @{
-    ... non-transactional fallback path...
-  @}
-@end smallexample
+If you specify command-line switches such as @option{-msse},
+the compiler could use the extended instruction sets even if the built-ins
+are not used explicitly in the program. For this reason, applications
+that perform run-time CPU detection must compile separate files for each
+supported architecture, using the appropriate flags. In particular,
+the file containing the CPU detection code should be compiled without
+these options.
-@noindent
-Note that, in most cases, the transactional and non-transactional code
-must synchronize together to ensure consistency.
+The following machine modes are available for use with MMX built-in functions
+(@pxref{Vector Extensions}): @code{V2SI} for a vector of two 32-bit integers,
+@code{V4HI} for a vector of four 16-bit integers, and @code{V8QI} for a
+vector of eight 8-bit integers. Some of the built-in functions operate on
+MMX registers as a whole 64-bit entity; these use @code{V1DI} as their mode.
-@node x86 control-flow protection intrinsics
-@subsection x86 Control-Flow Protection Intrinsics
+If 3DNow!@: extensions are enabled, @code{V2SF} is used as a mode for a vector
+of two 32-bit floating-point values.
-@deftypefn {CET Function} {ret_type} _get_ssp (void)
-Get the current value of the shadow stack pointer if shadow stack support
-from Intel CET is enabled in the hardware or @code{0} otherwise.
-The @code{ret_type} is @code{unsigned long long} for 64-bit targets
-and @code{unsigned int} for 32-bit targets.
-@end deftypefn
+If SSE extensions are enabled, @code{V4SF} is used for a vector of four 32-bit
+floating-point values. Some instructions use a vector of four 32-bit
+integers; these use @code{V4SI}. Finally, some instructions operate on an
+entire vector register, interpreting it as a 128-bit integer; these use mode
+@code{TI}.
-@deftypefn {CET Function} void _inc_ssp (unsigned int)
-Increment the current shadow stack pointer by the size specified by the
-function argument. The argument is masked to a byte value for security
-reasons, so to increment by more than 255 bytes you must call the function
-multiple times.
-@end deftypefn
+The x86-32 and x86-64 family of processors use additional built-in
+functions for efficient use of @code{TF} (@code{__float128}) 128-bit
+floating point and @code{TC} 128-bit complex floating-point values.
-The shadow stack unwind code looks like:
+The following floating-point built-in functions are always available:
-@smallexample
-#include <immintrin.h>
+@defbuiltin{__float128 __builtin_fabsq (__float128 @var{x})}
+Computes the absolute value of @var{x}.
+@enddefbuiltin
-/* Unwind the shadow stack for EH.
*/ -#define _Unwind_Frames_Extra(x) \ - do \ - @{ \ - _Unwind_Word ssp = _get_ssp (); \ - if (ssp != 0) \ - @{ \ - _Unwind_Word tmp = (x); \ - while (tmp > 255) \ - @{ \ - _inc_ssp (tmp); \ - tmp -= 255; \ - @} \ - _inc_ssp (tmp); \ - @} \ - @} \ - while (0) -@end smallexample +@defbuiltin{__float128 __builtin_copysignq (__float128 @var{x}, @ + __float128 @var{y})} +Copies the sign of @var{y} into @var{x} and returns the new value of +@var{x}. +@enddefbuiltin -@noindent -This code runs unconditionally on all 64-bit processors. For 32-bit -processors the code runs on those that support multi-byte NOP instructions. +@defbuiltin{__float128 __builtin_infq (void)} +Similar to @code{__builtin_inf}, except the return type is @code{__float128}. +@enddefbuiltin -@node Target Format Checks -@section Format Checks Specific to Particular Target Machines +@defbuiltin{__float128 __builtin_huge_valq (void)} +Similar to @code{__builtin_huge_val}, except the return type is @code{__float128}. +@enddefbuiltin -For some target machines, GCC supports additional options to the -format attribute -(@pxref{Function Attributes,,Declaring Attributes of Functions}). +@defbuiltin{__float128 __builtin_nanq (void)} +Similar to @code{__builtin_nan}, except the return type is @code{__float128}. +@enddefbuiltin -@menu -* Solaris Format Checks:: -* Darwin Format Checks:: -@end menu +@defbuiltin{__float128 __builtin_nansq (void)} +Similar to @code{__builtin_nans}, except the return type is @code{__float128}. +@enddefbuiltin -@node Solaris Format Checks -@subsection Solaris Format Checks +The following built-in function is always available. -Solaris targets support the @code{cmn_err} (or @code{__cmn_err__}) format -check. @code{cmn_err} accepts a subset of the standard @code{printf} -conversions, and the two-argument @code{%b} conversion for displaying -bit-fields. See the Solaris man page for @code{cmn_err} for more information. +@defbuiltin{void __builtin_ia32_pause (void)} +Generates the @code{pause} machine instruction with a compiler memory +barrier. +@enddefbuiltin -@node Darwin Format Checks -@subsection Darwin Format Checks +The following built-in functions are always available and can be used to +check the target platform type. -In addition to the full set of format archetypes (attribute format style -arguments such as @code{printf}, @code{scanf}, @code{strftime}, and -@code{strfmon}), Darwin targets also support the @code{CFString} (or -@code{__CFString__}) archetype in the @code{format} attribute. -Declarations with this archetype are parsed for correct syntax -and argument types. However, parsing of the format string itself and -validating arguments against it in calls to such functions is currently -not performed. +@defbuiltin{void __builtin_cpu_init (void)} +This function runs the CPU detection code to check the type of CPU and the +features supported. This built-in function needs to be invoked along with the built-in functions +to check CPU type and features, @code{__builtin_cpu_is} and +@code{__builtin_cpu_supports}, only when used in a function that is +executed before any constructors are called. The CPU detection code is +automatically executed in a very high priority constructor. -Additionally, @code{CFStringRefs} (defined by the @code{CoreFoundation} headers) may -also be used as format arguments. Note that the relevant headers are only likely to be -available on Darwin (OSX) installations. 
On such installations, the XCode and system -documentation provide descriptions of @code{CFString}, @code{CFStringRefs} and -associated functions. +For example, this function has to be used in @code{ifunc} resolvers that +check for CPU type using the built-in functions @code{__builtin_cpu_is} +and @code{__builtin_cpu_supports}, or in constructors on targets that +don't support constructor priority. +@smallexample -@node Pragmas -@section Pragmas Accepted by GCC -@cindex pragmas -@cindex @code{#pragma} +static void (*resolve_memcpy (void)) (void) +@{ + // ifunc resolvers fire before constructors, explicitly call the init + // function. + __builtin_cpu_init (); + if (__builtin_cpu_supports ("ssse3")) + return ssse3_memcpy; // super fast memcpy with ssse3 instructions. + else + return default_memcpy; +@} -GCC supports several types of pragmas, primarily in order to compile -code originally written for other compilers. Note that in general -we do not recommend the use of pragmas; @xref{Function Attributes}, -for further explanation. +void *memcpy (void *, const void *, size_t) + __attribute__ ((ifunc ("resolve_memcpy"))); +@end smallexample -The GNU C preprocessor recognizes several pragmas in addition to the -compiler pragmas documented here. Refer to the CPP manual for more -information. +@enddefbuiltin -GCC additionally recognizes OpenMP pragmas when the @option{-fopenmp} -option is specified, and OpenACC pragmas when the @option{-fopenacc} -option is specified. @xref{OpenMP}, and @ref{OpenACC}. +@defbuiltin{int __builtin_cpu_is (const char *@var{cpuname})} +This function returns a positive integer if the run-time CPU +is of type @var{cpuname} +and returns @code{0} otherwise. The following CPU names can be detected: -@menu -* AArch64 Pragmas:: -* ARM Pragmas:: -* LoongArch Pragmas:: -* M32C Pragmas:: -* PRU Pragmas:: -* RS/6000 and PowerPC Pragmas:: -* S/390 Pragmas:: -* Darwin Pragmas:: -* Solaris Pragmas:: -* Symbol-Renaming Pragmas:: -* Structure-Layout Pragmas:: -* Weak Pragmas:: -* Diagnostic Pragmas:: -* Visibility Pragmas:: -* Push/Pop Macro Pragmas:: -* Function Specific Option Pragmas:: -* Loop-Specific Pragmas:: -@end menu +@table @samp +@item amd +AMD CPU. -@node AArch64 Pragmas -@subsection AArch64 Pragmas +@item intel +Intel CPU. -The pragmas defined by the AArch64 target correspond to the AArch64 -target function attributes. They can be specified as below: -@smallexample -#pragma GCC target("string") -@end smallexample +@item atom +Intel Atom CPU. -where @code{@var{string}} can be any string accepted as an AArch64 target -attribute. @xref{AArch64 Function Attributes}, for more details -on the permissible values of @code{string}. +@item slm +Intel Silvermont CPU. -@node ARM Pragmas -@subsection ARM Pragmas +@item core2 +Intel Core 2 CPU. -The ARM target defines pragmas for controlling the default addition of -@code{long_call} and @code{short_call} attributes to functions. -@xref{Function Attributes}, for information about the effects of these -attributes. +@item corei7 +Intel Core i7 CPU. -@table @code -@cindex pragma, long_calls -@item long_calls -Set all subsequent functions to have the @code{long_call} attribute. +@item nehalem +Intel Core i7 Nehalem CPU. -@cindex pragma, no_long_calls -@item no_long_calls -Set all subsequent functions to have the @code{short_call} attribute. +@item westmere +Intel Core i7 Westmere CPU. -@cindex pragma, long_calls_off -@item long_calls_off -Do not affect the @code{long_call} or @code{short_call} attributes of -subsequent functions. 
-@end table +@item sandybridge +Intel Core i7 Sandy Bridge CPU. -@node LoongArch Pragmas -@subsection LoongArch Pragmas +@item ivybridge +Intel Core i7 Ivy Bridge CPU. -The list of attributes supported by Pragma is the same as that of target -function attributes. @xref{LoongArch Function Attributes}. +@item haswell +Intel Core i7 Haswell CPU. -Example: +@item broadwell +Intel Core i7 Broadwell CPU. -@smallexample -#pragma GCC target("strict-align") -@end smallexample +@item skylake +Intel Core i7 Skylake CPU. -@node M32C Pragmas -@subsection M32C Pragmas +@item skylake-avx512 +Intel Core i7 Skylake AVX512 CPU. -@table @code -@cindex pragma, memregs -@item GCC memregs @var{number} -Overrides the command-line option @code{-memregs=} for the current -file. Use with care! This pragma must be before any function in the -file, and mixing different memregs values in different objects may -make them incompatible. This pragma is useful when a -performance-critical function uses a memreg for temporary values, -as it may allow you to reduce the number of memregs used. +@item cannonlake +Intel Core i7 Cannon Lake CPU. -@cindex pragma, address -@item ADDRESS @var{name} @var{address} -For any declared symbols matching @var{name}, this does three things -to that symbol: it forces the symbol to be located at the given -address (a number), it forces the symbol to be volatile, and it -changes the symbol's scope to be static. This pragma exists for -compatibility with other compilers, but note that the common -@code{1234H} numeric syntax is not supported (use @code{0x1234} -instead). Example: +@item icelake-client +Intel Core i7 Ice Lake Client CPU. -@smallexample -#pragma ADDRESS port3 0x103 -char port3; -@end smallexample +@item icelake-server +Intel Core i7 Ice Lake Server CPU. -@end table +@item cascadelake +Intel Core i7 Cascadelake CPU. -@node PRU Pragmas -@subsection PRU Pragmas +@item tigerlake +Intel Core i7 Tigerlake CPU. -@table @code +@item cooperlake +Intel Core i7 Cooperlake CPU. -@cindex pragma, ctable_entry -@item ctable_entry @var{index} @var{constant_address} -Specifies that the PRU CTABLE entry given by @var{index} has the value -@var{constant_address}. This enables GCC to emit LBCO/SBCO instructions -when the load/store address is known and can be addressed with some CTABLE -entry. For example: +@item sapphirerapids +Intel Core i7 sapphirerapids CPU. -@smallexample -/* will compile to "sbco Rx, 2, 0x10, 4" */ -#pragma ctable_entry 2 0x4802a000 -*(unsigned int *)0x4802a010 = val; -@end smallexample +@item alderlake +Intel Core i7 Alderlake CPU. -@end table +@item rocketlake +Intel Core i7 Rocketlake CPU. -@node RS/6000 and PowerPC Pragmas -@subsection RS/6000 and PowerPC Pragmas +@item graniterapids +Intel Core i7 graniterapids CPU. -The RS/6000 and PowerPC targets define one pragma for controlling -whether or not the @code{longcall} attribute is added to function -declarations by default. This pragma overrides the @option{-mlongcall} -option, but not the @code{longcall} and @code{shortcall} attributes. -@xref{RS/6000 and PowerPC Options}, for more information about when long -calls are and are not necessary. +@item graniterapids-d +Intel Core i7 graniterapids D CPU. -@table @code -@cindex pragma, longcall -@item longcall (1) -Apply the @code{longcall} attribute to all subsequent function -declarations. +@item arrowlake +Intel Core i7 Arrow Lake CPU. -@item longcall (0) -Do not apply the @code{longcall} attribute to subsequent function -declarations. 
-@end table
-@c Describe h8300 pragmas here.
-@c Describe sh pragmas here.
-@c Describe v850 pragmas here.
+@item pantherlake
+Intel Core i7 Panther Lake CPU.
-@node S/390 Pragmas
-@subsection S/390 Pragmas
+@item diamondrapids
+Intel Core i7 Diamond Rapids CPU.
-The pragmas defined by the S/390 target correspond to the S/390
-target function attributes and some of the additional options:
+@item bonnell
+Intel Atom Bonnell CPU.
-@table @samp
-@item zvector
-@itemx no-zvector
-@end table
+@item silvermont
+Intel Atom Silvermont CPU.
-Note that options of the pragma, unlike options of the target
-attribute, do change the value of preprocessor macros like
-@code{__VEC__}. They can be specified as below:
+@item goldmont
+Intel Atom Goldmont CPU.
-@smallexample
-#pragma GCC target("string[,string]...")
-#pragma GCC target("string"[,"string"]...)
-@end smallexample
+@item goldmont-plus
+Intel Atom Goldmont Plus CPU.
-@node Darwin Pragmas
-@subsection Darwin Pragmas
+@item tremont
+Intel Atom Tremont CPU.
-The following pragmas are available for all architectures running the
-Darwin operating system. These are useful for compatibility with other
-macOS compilers.
+@item sierraforest
+Intel Atom Sierra Forest CPU.
-@table @code
-@cindex pragma, mark
-@item mark @var{tokens}@dots{}
-This pragma is accepted, but has no effect.
+@item grandridge
+Intel Atom Grand Ridge CPU.
-@cindex pragma, options align
-@item options align=@var{alignment}
-This pragma sets the alignment of fields in structures. The values of
-@var{alignment} may be @code{mac68k}, to emulate m68k alignment, or
-@code{power}, to emulate PowerPC alignment. Uses of this pragma nest
-properly; to restore the previous setting, use @code{reset} for the
-@var{alignment}.
+@item clearwaterforest
+Intel Atom Clearwater Forest CPU.
-@cindex pragma, segment
-@item segment @var{tokens}@dots{}
-This pragma is accepted, but has no effect.
+@item lujiazui
+ZHAOXIN lujiazui CPU.
-@cindex pragma, unused
-@item unused (@var{var} [, @var{var}]@dots{})
-This pragma declares variables to be possibly unused. GCC does not
-produce warnings for the listed variables. The effect is similar to
-that of the @code{unused} attribute, except that this pragma may appear
-anywhere within the variables' scopes.
-@end table
+@item yongfeng
+ZHAOXIN yongfeng CPU.
-@node Solaris Pragmas
-@subsection Solaris Pragmas
+@item shijidadao
+ZHAOXIN shijidadao CPU.
-The Solaris target supports @code{#pragma redefine_extname}
-(@pxref{Symbol-Renaming Pragmas}). It also supports additional
-@code{#pragma} directives for compatibility with the system compiler.
+@item amdfam10h
+AMD Family 10h CPU.
-@table @code
-@cindex pragma, align
-@item align @var{alignment} (@var{variable} [, @var{variable}]...)
+@item barcelona
+AMD Family 10h Barcelona CPU.
-Increase the minimum alignment of each @var{variable} to @var{alignment}.
-This is the same as GCC's @code{aligned} attribute (@pxref{Variable
-Attributes}). Macro expansion occurs on the arguments to this pragma
-when compiling C and Objective-C@. It does not currently occur when
-compiling C++, but this is a bug which may be fixed in a future
-release.
+@item shanghai
+AMD Family 10h Shanghai CPU.
-@cindex pragma, fini
-@item fini (@var{function} [, @var{function}]...)
+@item istanbul
+AMD Family 10h Istanbul CPU.
-This pragma causes each listed @var{function} to be called after
-main, or during shared module unloading, by adding a call to the
-@code{.fini} section.
+@item btver1 +AMD Family 14h CPU. -@cindex pragma, init -@item init (@var{function} [, @var{function}]...) +@item amdfam15h +AMD Family 15h CPU. -This pragma causes each listed @var{function} to be called during -initialization (before @code{main}) or during shared module loading, by -adding a call to the @code{.init} section. +@item bdver1 +AMD Family 15h Bulldozer version 1. -@end table +@item bdver2 +AMD Family 15h Bulldozer version 2. -@node Symbol-Renaming Pragmas -@subsection Symbol-Renaming Pragmas +@item bdver3 +AMD Family 15h Bulldozer version 3. -GCC supports a @code{#pragma} directive that changes the name used in -assembly for a given declaration. While this pragma is supported on all -platforms, it is intended primarily to provide compatibility with the -Solaris system headers. This effect can also be achieved using the asm -labels extension (@pxref{Asm Labels}). +@item bdver4 +AMD Family 15h Bulldozer version 4. -@table @code -@cindex pragma, redefine_extname -@item redefine_extname @var{oldname} @var{newname} +@item btver2 +AMD Family 16h CPU. -This pragma gives the C function @var{oldname} the assembly symbol -@var{newname}. The preprocessor macro @code{__PRAGMA_REDEFINE_EXTNAME} -is defined if this pragma is available (currently on all platforms). -@end table +@item amdfam17h +AMD Family 17h CPU. -This pragma and the @code{asm} labels extension interact in a complicated -manner. Here are some corner cases you may want to be aware of: +@item znver1 +AMD Family 17h Zen version 1. -@enumerate -@item This pragma silently applies only to declarations with external -linkage. The @code{asm} label feature does not have this restriction. +@item znver2 +AMD Family 17h Zen version 2. -@item In C++, this pragma silently applies only to declarations with -``C'' linkage. Again, @code{asm} labels do not have this restriction. +@item amdfam19h +AMD Family 19h CPU. -@item If either of the ways of changing the assembly name of a -declaration are applied to a declaration whose assembly name has -already been determined (either by a previous use of one of these -features, or because the compiler needed the assembly name in order to -generate code), and the new name is different, a warning issues and -the name does not change. +@item znver3 +AMD Family 19h Zen version 3. -@item The @var{oldname} used by @code{#pragma redefine_extname} is -always the C-language name. -@end enumerate +@item znver4 +AMD Family 19h Zen version 4. -@node Structure-Layout Pragmas -@subsection Structure-Layout Pragmas +@item znver5 +AMD Family 1ah Zen version 5. +@end table -For compatibility with Microsoft Windows compilers, GCC supports a -set of @code{#pragma} directives that change the maximum alignment of -members of structures (other than zero-width bit-fields), unions, and -classes subsequently defined. The @var{n} value below always is required -to be a small power of two and specifies the new alignment in bytes. +Here is an example: +@smallexample +if (__builtin_cpu_is ("corei7")) + @{ + do_corei7 (); // Core i7 specific implementation. + @} +else + @{ + do_generic (); // Generic implementation. + @} +@end smallexample +@enddefbuiltin -@enumerate -@item @code{#pragma pack(@var{n})} simply sets the new alignment. -@item @code{#pragma pack()} sets the alignment to the one that was in -effect when compilation started (see also command-line option -@option{-fpack-struct[=@var{n}]} @pxref{Code Gen Options}). 
-@item @code{#pragma pack(push[,@var{n}])} pushes the current alignment -setting on an internal stack and then optionally sets the new alignment. -@item @code{#pragma pack(pop)} restores the alignment setting to the one -saved at the top of the internal stack (and removes that stack entry). -Note that @code{#pragma pack([@var{n}])} does not influence this internal -stack; thus it is possible to have @code{#pragma pack(push)} followed by -multiple @code{#pragma pack(@var{n})} instances and finalized by a single -@code{#pragma pack(pop)}. -@end enumerate +@defbuiltin{int __builtin_cpu_supports (const char *@var{feature})} +This function returns a positive integer if the run-time CPU +supports @var{feature} +and returns @code{0} otherwise. The following features can be detected: -Some targets, e.g.@: x86 and PowerPC, support the @code{#pragma ms_struct} -directive which lays out structures and unions subsequently defined as the -documented @code{__attribute__ ((ms_struct))}. +@table @samp +@item cmov +CMOV instruction. +@item mmx +MMX instructions. +@item popcnt +POPCNT instruction. +@item sse +SSE instructions. +@item sse2 +SSE2 instructions. +@item sse3 +SSE3 instructions. +@item ssse3 +SSSE3 instructions. +@item sse4.1 +SSE4.1 instructions. +@item sse4.2 +SSE4.2 instructions. +@item avx +AVX instructions. +@item avx2 +AVX2 instructions. +@item sse4a +SSE4A instructions. +@item fma4 +FMA4 instructions. +@item xop +XOP instructions. +@item fma +FMA instructions. +@item avx512f +AVX512F instructions. +@item bmi +BMI instructions. +@item bmi2 +BMI2 instructions. +@item aes +AES instructions. +@item pclmul +PCLMUL instructions. +@item avx512vl +AVX512VL instructions. +@item avx512bw +AVX512BW instructions. +@item avx512dq +AVX512DQ instructions. +@item avx512cd +AVX512CD instructions. +@item avx512vbmi +AVX512VBMI instructions. +@item avx512ifma +AVX512IFMA instructions. +@item avx512vpopcntdq +AVX512VPOPCNTDQ instructions. +@item avx512vbmi2 +AVX512VBMI2 instructions. +@item gfni +GFNI instructions. +@item vpclmulqdq +VPCLMULQDQ instructions. +@item avx512vnni +AVX512VNNI instructions. +@item avx512bitalg +AVX512BITALG instructions. +@item x86-64 +Baseline x86-64 microarchitecture level (as defined in x86-64 psABI). +@item x86-64-v2 +x86-64-v2 microarchitecture level. +@item x86-64-v3 +x86-64-v3 microarchitecture level. +@item x86-64-v4 +x86-64-v4 microarchitecture level. -@enumerate -@item @code{#pragma ms_struct on} turns on the Microsoft layout. -@item @code{#pragma ms_struct off} turns off the Microsoft layout. -@item @code{#pragma ms_struct reset} goes back to the default layout. -@end enumerate -Most targets also support the @code{#pragma scalar_storage_order} directive -which lays out structures and unions subsequently defined as the documented -@code{__attribute__ ((scalar_storage_order))}. +@end table -@enumerate -@item @code{#pragma scalar_storage_order big-endian} sets the storage order -of the scalar fields to big-endian. -@item @code{#pragma scalar_storage_order little-endian} sets the storage order -of the scalar fields to little-endian. -@item @code{#pragma scalar_storage_order default} goes back to the endianness -that was in effect when compilation started (see also command-line option -@option{-fsso-struct=@var{endianness}} @pxref{C Dialect Options}). -@end enumerate +Here is an example: +@smallexample +if (__builtin_cpu_supports ("popcnt")) + @{ + asm("popcnt %1,%0" : "=r"(count) : "rm"(n) : "cc"); + @} +else + @{ + count = generic_countbits (n); //generic implementation. 
+ @} +@end smallexample +@enddefbuiltin -@node Weak Pragmas -@subsection Weak Pragmas +The following built-in functions are made available by @option{-mmmx}. +All of them generate the machine instruction that is part of the name. -For compatibility with SVR4, GCC supports a set of @code{#pragma} -directives for declaring symbols to be weak, and defining weak -aliases. +@smallexample +v8qi __builtin_ia32_paddb (v8qi, v8qi); +v4hi __builtin_ia32_paddw (v4hi, v4hi); +v2si __builtin_ia32_paddd (v2si, v2si); +v8qi __builtin_ia32_psubb (v8qi, v8qi); +v4hi __builtin_ia32_psubw (v4hi, v4hi); +v2si __builtin_ia32_psubd (v2si, v2si); +v8qi __builtin_ia32_paddsb (v8qi, v8qi); +v4hi __builtin_ia32_paddsw (v4hi, v4hi); +v8qi __builtin_ia32_psubsb (v8qi, v8qi); +v4hi __builtin_ia32_psubsw (v4hi, v4hi); +v8qi __builtin_ia32_paddusb (v8qi, v8qi); +v4hi __builtin_ia32_paddusw (v4hi, v4hi); +v8qi __builtin_ia32_psubusb (v8qi, v8qi); +v4hi __builtin_ia32_psubusw (v4hi, v4hi); +v4hi __builtin_ia32_pmullw (v4hi, v4hi); +v4hi __builtin_ia32_pmulhw (v4hi, v4hi); +di __builtin_ia32_pand (di, di); +di __builtin_ia32_pandn (di,di); +di __builtin_ia32_por (di, di); +di __builtin_ia32_pxor (di, di); +v8qi __builtin_ia32_pcmpeqb (v8qi, v8qi); +v4hi __builtin_ia32_pcmpeqw (v4hi, v4hi); +v2si __builtin_ia32_pcmpeqd (v2si, v2si); +v8qi __builtin_ia32_pcmpgtb (v8qi, v8qi); +v4hi __builtin_ia32_pcmpgtw (v4hi, v4hi); +v2si __builtin_ia32_pcmpgtd (v2si, v2si); +v8qi __builtin_ia32_punpckhbw (v8qi, v8qi); +v4hi __builtin_ia32_punpckhwd (v4hi, v4hi); +v2si __builtin_ia32_punpckhdq (v2si, v2si); +v8qi __builtin_ia32_punpcklbw (v8qi, v8qi); +v4hi __builtin_ia32_punpcklwd (v4hi, v4hi); +v2si __builtin_ia32_punpckldq (v2si, v2si); +v8qi __builtin_ia32_packsswb (v4hi, v4hi); +v4hi __builtin_ia32_packssdw (v2si, v2si); +v8qi __builtin_ia32_packuswb (v4hi, v4hi); -@table @code -@cindex pragma, weak -@item #pragma weak @var{symbol} -This pragma declares @var{symbol} to be weak, as if the declaration -had the attribute of the same name. The pragma may appear before -or after the declaration of @var{symbol}. It is not an error for -@var{symbol} to never be defined at all. +v4hi __builtin_ia32_psllw (v4hi, v4hi); +v2si __builtin_ia32_pslld (v2si, v2si); +v1di __builtin_ia32_psllq (v1di, v1di); +v4hi __builtin_ia32_psrlw (v4hi, v4hi); +v2si __builtin_ia32_psrld (v2si, v2si); +v1di __builtin_ia32_psrlq (v1di, v1di); +v4hi __builtin_ia32_psraw (v4hi, v4hi); +v2si __builtin_ia32_psrad (v2si, v2si); +v4hi __builtin_ia32_psllwi (v4hi, int); +v2si __builtin_ia32_pslldi (v2si, int); +v1di __builtin_ia32_psllqi (v1di, int); +v4hi __builtin_ia32_psrlwi (v4hi, int); +v2si __builtin_ia32_psrldi (v2si, int); +v1di __builtin_ia32_psrlqi (v1di, int); +v4hi __builtin_ia32_psrawi (v4hi, int); +v2si __builtin_ia32_psradi (v2si, int); +@end smallexample -@item #pragma weak @var{symbol1} = @var{symbol2} -This pragma declares @var{symbol1} to be a weak alias of @var{symbol2}. -It is an error if @var{symbol2} is not defined in the current -translation unit. -@end table +The following built-in functions are made available either with +@option{-msse}, or with @option{-m3dnowa}. All of them generate +the machine instruction that is part of the name. 
-@node Diagnostic Pragmas -@subsection Diagnostic Pragmas +@smallexample +v4hi __builtin_ia32_pmulhuw (v4hi, v4hi); +v8qi __builtin_ia32_pavgb (v8qi, v8qi); +v4hi __builtin_ia32_pavgw (v4hi, v4hi); +v1di __builtin_ia32_psadbw (v8qi, v8qi); +v8qi __builtin_ia32_pmaxub (v8qi, v8qi); +v4hi __builtin_ia32_pmaxsw (v4hi, v4hi); +v8qi __builtin_ia32_pminub (v8qi, v8qi); +v4hi __builtin_ia32_pminsw (v4hi, v4hi); +int __builtin_ia32_pmovmskb (v8qi); +void __builtin_ia32_maskmovq (v8qi, v8qi, char *); +void __builtin_ia32_movntq (di *, di); +void __builtin_ia32_sfence (void); +@end smallexample -GCC allows the user to selectively enable or disable certain types of -diagnostics, and change the kind of the diagnostic. For example, a -project's policy might require that all sources compile with -@option{-Werror} but certain files might have exceptions allowing -specific types of warnings. Or, a project might selectively enable -diagnostics and treat them as errors depending on which preprocessor -macros are defined. +The following built-in functions are available when @option{-msse} is used. +All of them generate the machine instruction that is part of the name. -@table @code -@cindex pragma, diagnostic -@item #pragma GCC diagnostic @var{kind} @var{option} +@smallexample +int __builtin_ia32_comieq (v4sf, v4sf); +int __builtin_ia32_comineq (v4sf, v4sf); +int __builtin_ia32_comilt (v4sf, v4sf); +int __builtin_ia32_comile (v4sf, v4sf); +int __builtin_ia32_comigt (v4sf, v4sf); +int __builtin_ia32_comige (v4sf, v4sf); +int __builtin_ia32_ucomieq (v4sf, v4sf); +int __builtin_ia32_ucomineq (v4sf, v4sf); +int __builtin_ia32_ucomilt (v4sf, v4sf); +int __builtin_ia32_ucomile (v4sf, v4sf); +int __builtin_ia32_ucomigt (v4sf, v4sf); +int __builtin_ia32_ucomige (v4sf, v4sf); +v4sf __builtin_ia32_addps (v4sf, v4sf); +v4sf __builtin_ia32_subps (v4sf, v4sf); +v4sf __builtin_ia32_mulps (v4sf, v4sf); +v4sf __builtin_ia32_divps (v4sf, v4sf); +v4sf __builtin_ia32_addss (v4sf, v4sf); +v4sf __builtin_ia32_subss (v4sf, v4sf); +v4sf __builtin_ia32_mulss (v4sf, v4sf); +v4sf __builtin_ia32_divss (v4sf, v4sf); +v4sf __builtin_ia32_cmpeqps (v4sf, v4sf); +v4sf __builtin_ia32_cmpltps (v4sf, v4sf); +v4sf __builtin_ia32_cmpleps (v4sf, v4sf); +v4sf __builtin_ia32_cmpgtps (v4sf, v4sf); +v4sf __builtin_ia32_cmpgeps (v4sf, v4sf); +v4sf __builtin_ia32_cmpunordps (v4sf, v4sf); +v4sf __builtin_ia32_cmpneqps (v4sf, v4sf); +v4sf __builtin_ia32_cmpnltps (v4sf, v4sf); +v4sf __builtin_ia32_cmpnleps (v4sf, v4sf); +v4sf __builtin_ia32_cmpngtps (v4sf, v4sf); +v4sf __builtin_ia32_cmpngeps (v4sf, v4sf); +v4sf __builtin_ia32_cmpordps (v4sf, v4sf); +v4sf __builtin_ia32_cmpeqss (v4sf, v4sf); +v4sf __builtin_ia32_cmpltss (v4sf, v4sf); +v4sf __builtin_ia32_cmpless (v4sf, v4sf); +v4sf __builtin_ia32_cmpunordss (v4sf, v4sf); +v4sf __builtin_ia32_cmpneqss (v4sf, v4sf); +v4sf __builtin_ia32_cmpnltss (v4sf, v4sf); +v4sf __builtin_ia32_cmpnless (v4sf, v4sf); +v4sf __builtin_ia32_cmpordss (v4sf, v4sf); +v4sf __builtin_ia32_maxps (v4sf, v4sf); +v4sf __builtin_ia32_maxss (v4sf, v4sf); +v4sf __builtin_ia32_minps (v4sf, v4sf); +v4sf __builtin_ia32_minss (v4sf, v4sf); +v4sf __builtin_ia32_andps (v4sf, v4sf); +v4sf __builtin_ia32_andnps (v4sf, v4sf); +v4sf __builtin_ia32_orps (v4sf, v4sf); +v4sf __builtin_ia32_xorps (v4sf, v4sf); +v4sf __builtin_ia32_movss (v4sf, v4sf); +v4sf __builtin_ia32_movhlps (v4sf, v4sf); +v4sf __builtin_ia32_movlhps (v4sf, v4sf); +v4sf __builtin_ia32_unpckhps (v4sf, v4sf); +v4sf __builtin_ia32_unpcklps (v4sf, v4sf); +v4sf 
__builtin_ia32_cvtpi2ps (v4sf, v2si); +v4sf __builtin_ia32_cvtsi2ss (v4sf, int); +v2si __builtin_ia32_cvtps2pi (v4sf); +int __builtin_ia32_cvtss2si (v4sf); +v2si __builtin_ia32_cvttps2pi (v4sf); +int __builtin_ia32_cvttss2si (v4sf); +v4sf __builtin_ia32_rcpps (v4sf); +v4sf __builtin_ia32_rsqrtps (v4sf); +v4sf __builtin_ia32_sqrtps (v4sf); +v4sf __builtin_ia32_rcpss (v4sf); +v4sf __builtin_ia32_rsqrtss (v4sf); +v4sf __builtin_ia32_sqrtss (v4sf); +v4sf __builtin_ia32_shufps (v4sf, v4sf, int); +void __builtin_ia32_movntps (float *, v4sf); +int __builtin_ia32_movmskps (v4sf); +@end smallexample -Modifies the disposition of a diagnostic. Note that not all -diagnostics are modifiable; at the moment only warnings (normally -controlled by @samp{-W@dots{}}) can be controlled, and not all of them. -Use @option{-fdiagnostics-show-option} to determine which diagnostics -are controllable and which option controls them. +The following built-in functions are available when @option{-msse} is used. -@var{kind} is @samp{error} to treat this diagnostic as an error, -@samp{warning} to treat it like a warning (even if @option{-Werror} is -in effect), or @samp{ignored} if the diagnostic is to be ignored. -@var{option} is a double quoted string that matches the command-line -option. +@defbuiltin{v4sf __builtin_ia32_loadups (float *)} +Generates the @code{movups} machine instruction as a load from memory. +@enddefbuiltin -@smallexample -#pragma GCC diagnostic warning "-Wformat" -#pragma GCC diagnostic error "-Wformat" -#pragma GCC diagnostic ignored "-Wformat" -@end smallexample +@defbuiltin{void __builtin_ia32_storeups (float *, v4sf)} +Generates the @code{movups} machine instruction as a store to memory. +@enddefbuiltin -Note that these pragmas override any command-line options. GCC keeps -track of the location of each pragma, and issues diagnostics according -to the state as of that point in the source file. Thus, pragmas occurring -after a line do not affect diagnostics caused by that line. +@defbuiltin{v4sf __builtin_ia32_loadss (float *)} +Generates the @code{movss} machine instruction as a load from memory. +@enddefbuiltin -@item #pragma GCC diagnostic push -@itemx #pragma GCC diagnostic pop +@defbuiltin{v4sf __builtin_ia32_loadhps (v4sf, const v2sf *)} +Generates the @code{movhps} machine instruction as a load from memory. +@enddefbuiltin -Causes GCC to remember the state of the diagnostics as of each -@code{push}, and restore to that point at each @code{pop}. If a -@code{pop} has no matching @code{push}, the command-line options are -restored. +@defbuiltin{v4sf __builtin_ia32_loadlps (v4sf, const v2sf *)} +Generates the @code{movlps} machine instruction as a load from memory +@enddefbuiltin -@smallexample -#pragma GCC diagnostic error "-Wuninitialized" - foo(a); /* error is given for this one */ -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wuninitialized" - foo(b); /* no diagnostic for this one */ -#pragma GCC diagnostic pop - foo(c); /* error is given for this one */ -#pragma GCC diagnostic pop - foo(d); /* depends on command-line options */ -@end smallexample +@defbuiltin{void __builtin_ia32_storehps (v2sf *, v4sf)} +Generates the @code{movhps} machine instruction as a store to memory. +@enddefbuiltin -@item #pragma GCC diagnostic ignored_attributes +@defbuiltin{void __builtin_ia32_storelps (v2sf *, v4sf)} +Generates the @code{movlps} machine instruction as a store to memory. 
+@enddefbuiltin -Similarly to @option{-Wno-attributes=}, this pragma allows users to suppress -warnings about unknown scoped attributes (in C++11 and C23). For example, -@code{#pragma GCC diagnostic ignored_attributes "vendor::attr"} disables -warning about the following declaration: +The following built-in functions are available when @option{-msse2} is used. +All of them generate the machine instruction that is part of the name. @smallexample -[[vendor::attr]] void f(); +int __builtin_ia32_comisdeq (v2df, v2df); +int __builtin_ia32_comisdlt (v2df, v2df); +int __builtin_ia32_comisdle (v2df, v2df); +int __builtin_ia32_comisdgt (v2df, v2df); +int __builtin_ia32_comisdge (v2df, v2df); +int __builtin_ia32_comisdneq (v2df, v2df); +int __builtin_ia32_ucomisdeq (v2df, v2df); +int __builtin_ia32_ucomisdlt (v2df, v2df); +int __builtin_ia32_ucomisdle (v2df, v2df); +int __builtin_ia32_ucomisdgt (v2df, v2df); +int __builtin_ia32_ucomisdge (v2df, v2df); +int __builtin_ia32_ucomisdneq (v2df, v2df); +v2df __builtin_ia32_cmpeqpd (v2df, v2df); +v2df __builtin_ia32_cmpltpd (v2df, v2df); +v2df __builtin_ia32_cmplepd (v2df, v2df); +v2df __builtin_ia32_cmpgtpd (v2df, v2df); +v2df __builtin_ia32_cmpgepd (v2df, v2df); +v2df __builtin_ia32_cmpunordpd (v2df, v2df); +v2df __builtin_ia32_cmpneqpd (v2df, v2df); +v2df __builtin_ia32_cmpnltpd (v2df, v2df); +v2df __builtin_ia32_cmpnlepd (v2df, v2df); +v2df __builtin_ia32_cmpngtpd (v2df, v2df); +v2df __builtin_ia32_cmpngepd (v2df, v2df); +v2df __builtin_ia32_cmpordpd (v2df, v2df); +v2df __builtin_ia32_cmpeqsd (v2df, v2df); +v2df __builtin_ia32_cmpltsd (v2df, v2df); +v2df __builtin_ia32_cmplesd (v2df, v2df); +v2df __builtin_ia32_cmpunordsd (v2df, v2df); +v2df __builtin_ia32_cmpneqsd (v2df, v2df); +v2df __builtin_ia32_cmpnltsd (v2df, v2df); +v2df __builtin_ia32_cmpnlesd (v2df, v2df); +v2df __builtin_ia32_cmpordsd (v2df, v2df); +v2di __builtin_ia32_paddq (v2di, v2di); +v2di __builtin_ia32_psubq (v2di, v2di); +v2df __builtin_ia32_addpd (v2df, v2df); +v2df __builtin_ia32_subpd (v2df, v2df); +v2df __builtin_ia32_mulpd (v2df, v2df); +v2df __builtin_ia32_divpd (v2df, v2df); +v2df __builtin_ia32_addsd (v2df, v2df); +v2df __builtin_ia32_subsd (v2df, v2df); +v2df __builtin_ia32_mulsd (v2df, v2df); +v2df __builtin_ia32_divsd (v2df, v2df); +v2df __builtin_ia32_minpd (v2df, v2df); +v2df __builtin_ia32_maxpd (v2df, v2df); +v2df __builtin_ia32_minsd (v2df, v2df); +v2df __builtin_ia32_maxsd (v2df, v2df); +v2df __builtin_ia32_andpd (v2df, v2df); +v2df __builtin_ia32_andnpd (v2df, v2df); +v2df __builtin_ia32_orpd (v2df, v2df); +v2df __builtin_ia32_xorpd (v2df, v2df); +v2df __builtin_ia32_movsd (v2df, v2df); +v2df __builtin_ia32_unpckhpd (v2df, v2df); +v2df __builtin_ia32_unpcklpd (v2df, v2df); +v16qi __builtin_ia32_paddb128 (v16qi, v16qi); +v8hi __builtin_ia32_paddw128 (v8hi, v8hi); +v4si __builtin_ia32_paddd128 (v4si, v4si); +v2di __builtin_ia32_paddq128 (v2di, v2di); +v16qi __builtin_ia32_psubb128 (v16qi, v16qi); +v8hi __builtin_ia32_psubw128 (v8hi, v8hi); +v4si __builtin_ia32_psubd128 (v4si, v4si); +v2di __builtin_ia32_psubq128 (v2di, v2di); +v8hi __builtin_ia32_pmullw128 (v8hi, v8hi); +v8hi __builtin_ia32_pmulhw128 (v8hi, v8hi); +v2di __builtin_ia32_pand128 (v2di, v2di); +v2di __builtin_ia32_pandn128 (v2di, v2di); +v2di __builtin_ia32_por128 (v2di, v2di); +v2di __builtin_ia32_pxor128 (v2di, v2di); +v16qi __builtin_ia32_pavgb128 (v16qi, v16qi); +v8hi __builtin_ia32_pavgw128 (v8hi, v8hi); +v16qi __builtin_ia32_pcmpeqb128 (v16qi, v16qi); +v8hi __builtin_ia32_pcmpeqw128 (v8hi, v8hi); 
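+/* The pcmpeq and pcmpgt functions return per-element masks of all
+   ones (true) or all zeros (false) rather than 0/1 values (a
+   behavioral note, not text from the GCC sources).  */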
+v4si __builtin_ia32_pcmpeqd128 (v4si, v4si); +v16qi __builtin_ia32_pcmpgtb128 (v16qi, v16qi); +v8hi __builtin_ia32_pcmpgtw128 (v8hi, v8hi); +v4si __builtin_ia32_pcmpgtd128 (v4si, v4si); +v16qi __builtin_ia32_pmaxub128 (v16qi, v16qi); +v8hi __builtin_ia32_pmaxsw128 (v8hi, v8hi); +v16qi __builtin_ia32_pminub128 (v16qi, v16qi); +v8hi __builtin_ia32_pminsw128 (v8hi, v8hi); +v16qi __builtin_ia32_punpckhbw128 (v16qi, v16qi); +v8hi __builtin_ia32_punpckhwd128 (v8hi, v8hi); +v4si __builtin_ia32_punpckhdq128 (v4si, v4si); +v2di __builtin_ia32_punpckhqdq128 (v2di, v2di); +v16qi __builtin_ia32_punpcklbw128 (v16qi, v16qi); +v8hi __builtin_ia32_punpcklwd128 (v8hi, v8hi); +v4si __builtin_ia32_punpckldq128 (v4si, v4si); +v2di __builtin_ia32_punpcklqdq128 (v2di, v2di); +v16qi __builtin_ia32_packsswb128 (v8hi, v8hi); +v8hi __builtin_ia32_packssdw128 (v4si, v4si); +v16qi __builtin_ia32_packuswb128 (v8hi, v8hi); +v8hi __builtin_ia32_pmulhuw128 (v8hi, v8hi); +void __builtin_ia32_maskmovdqu (v16qi, v16qi); +v2df __builtin_ia32_loadupd (double *); +void __builtin_ia32_storeupd (double *, v2df); +v2df __builtin_ia32_loadhpd (v2df, double const *); +v2df __builtin_ia32_loadlpd (v2df, double const *); +int __builtin_ia32_movmskpd (v2df); +int __builtin_ia32_pmovmskb128 (v16qi); +void __builtin_ia32_movnti (int *, int); +void __builtin_ia32_movnti64 (long long int *, long long int); +void __builtin_ia32_movntpd (double *, v2df); +void __builtin_ia32_movntdq (v2df *, v2df); +v4si __builtin_ia32_pshufd (v4si, int); +v8hi __builtin_ia32_pshuflw (v8hi, int); +v8hi __builtin_ia32_pshufhw (v8hi, int); +v2di __builtin_ia32_psadbw128 (v16qi, v16qi); +v2df __builtin_ia32_sqrtpd (v2df); +v2df __builtin_ia32_sqrtsd (v2df); +v2df __builtin_ia32_shufpd (v2df, v2df, int); +v2df __builtin_ia32_cvtdq2pd (v4si); +v4sf __builtin_ia32_cvtdq2ps (v4si); +v4si __builtin_ia32_cvtpd2dq (v2df); +v2si __builtin_ia32_cvtpd2pi (v2df); +v4sf __builtin_ia32_cvtpd2ps (v2df); +v4si __builtin_ia32_cvttpd2dq (v2df); +v2si __builtin_ia32_cvttpd2pi (v2df); +v2df __builtin_ia32_cvtpi2pd (v2si); +int __builtin_ia32_cvtsd2si (v2df); +int __builtin_ia32_cvttsd2si (v2df); +long long __builtin_ia32_cvtsd2si64 (v2df); +long long __builtin_ia32_cvttsd2si64 (v2df); +v4si __builtin_ia32_cvtps2dq (v4sf); +v2df __builtin_ia32_cvtps2pd (v4sf); +v4si __builtin_ia32_cvttps2dq (v4sf); +v2df __builtin_ia32_cvtsi2sd (v2df, int); +v2df __builtin_ia32_cvtsi642sd (v2df, long long); +v4sf __builtin_ia32_cvtsd2ss (v4sf, v2df); +v2df __builtin_ia32_cvtss2sd (v2df, v4sf); +void __builtin_ia32_clflush (const void *); +void __builtin_ia32_lfence (void); +void __builtin_ia32_mfence (void); +v16qi __builtin_ia32_loaddqu (const char *); +void __builtin_ia32_storedqu (char *, v16qi); +v1di __builtin_ia32_pmuludq (v2si, v2si); +v2di __builtin_ia32_pmuludq128 (v4si, v4si); +v8hi __builtin_ia32_psllw128 (v8hi, v8hi); +v4si __builtin_ia32_pslld128 (v4si, v4si); +v2di __builtin_ia32_psllq128 (v2di, v2di); +v8hi __builtin_ia32_psrlw128 (v8hi, v8hi); +v4si __builtin_ia32_psrld128 (v4si, v4si); +v2di __builtin_ia32_psrlq128 (v2di, v2di); +v8hi __builtin_ia32_psraw128 (v8hi, v8hi); +v4si __builtin_ia32_psrad128 (v4si, v4si); +v2di __builtin_ia32_pslldqi128 (v2di, int); +v8hi __builtin_ia32_psllwi128 (v8hi, int); +v4si __builtin_ia32_pslldi128 (v4si, int); +v2di __builtin_ia32_psllqi128 (v2di, int); +v2di __builtin_ia32_psrldqi128 (v2di, int); +v8hi __builtin_ia32_psrlwi128 (v8hi, int); +v4si __builtin_ia32_psrldi128 (v4si, int); +v2di __builtin_ia32_psrlqi128 (v2di, int); +v8hi 
__builtin_ia32_psrawi128 (v8hi, int); +v4si __builtin_ia32_psradi128 (v4si, int); +v4si __builtin_ia32_pmaddwd128 (v8hi, v8hi); +v2di __builtin_ia32_movq128 (v2di); @end smallexample -whereas @code{#pragma GCC diagnostic ignored_attributes "vendor::"} prevents -warning about both of these declarations: +The following built-in functions are available when @option{-msse3} is used. +All of them generate the machine instruction that is part of the name. @smallexample -[[vendor::safe]] void f(); -[[vendor::unsafe]] void f2(); +v2df __builtin_ia32_addsubpd (v2df, v2df); +v4sf __builtin_ia32_addsubps (v4sf, v4sf); +v2df __builtin_ia32_haddpd (v2df, v2df); +v4sf __builtin_ia32_haddps (v4sf, v4sf); +v2df __builtin_ia32_hsubpd (v2df, v2df); +v4sf __builtin_ia32_hsubps (v4sf, v4sf); +v16qi __builtin_ia32_lddqu (char const *); +void __builtin_ia32_monitor (void *, unsigned int, unsigned int); +v4sf __builtin_ia32_movshdup (v4sf); +v4sf __builtin_ia32_movsldup (v4sf); +void __builtin_ia32_mwait (unsigned int, unsigned int); @end smallexample -@end table - -GCC also offers a simple mechanism for printing messages during -compilation. - -@table @code -@cindex pragma, diagnostic -@item #pragma message @var{string} - -Prints @var{string} as a compiler message on compilation. The message -is informational only, and is neither a compilation warning nor an -error. Newlines can be included in the string by using the @samp{\n} -escape sequence. +The following built-in functions are available when @option{-mssse3} is used. +All of them generate the machine instruction that is part of the name. @smallexample -#pragma message "Compiling " __FILE__ "..." +v2si __builtin_ia32_phaddd (v2si, v2si); +v4hi __builtin_ia32_phaddw (v4hi, v4hi); +v4hi __builtin_ia32_phaddsw (v4hi, v4hi); +v2si __builtin_ia32_phsubd (v2si, v2si); +v4hi __builtin_ia32_phsubw (v4hi, v4hi); +v4hi __builtin_ia32_phsubsw (v4hi, v4hi); +v4hi __builtin_ia32_pmaddubsw (v8qi, v8qi); +v4hi __builtin_ia32_pmulhrsw (v4hi, v4hi); +v8qi __builtin_ia32_pshufb (v8qi, v8qi); +v8qi __builtin_ia32_psignb (v8qi, v8qi); +v2si __builtin_ia32_psignd (v2si, v2si); +v4hi __builtin_ia32_psignw (v4hi, v4hi); +v1di __builtin_ia32_palignr (v1di, v1di, int); +v8qi __builtin_ia32_pabsb (v8qi); +v2si __builtin_ia32_pabsd (v2si); +v4hi __builtin_ia32_pabsw (v4hi); @end smallexample -@var{string} may be parenthesized, and is printed with location -information. For example, +The following built-in functions are available when @option{-mssse3} is used. +All of them generate the machine instruction that is part of the name. @smallexample -#define DO_PRAGMA(x) _Pragma (#x) -#define TODO(x) DO_PRAGMA(message ("TODO - " #x)) - -TODO(Remember to fix this) +v4si __builtin_ia32_phaddd128 (v4si, v4si); +v8hi __builtin_ia32_phaddw128 (v8hi, v8hi); +v8hi __builtin_ia32_phaddsw128 (v8hi, v8hi); +v4si __builtin_ia32_phsubd128 (v4si, v4si); +v8hi __builtin_ia32_phsubw128 (v8hi, v8hi); +v8hi __builtin_ia32_phsubsw128 (v8hi, v8hi); +v8hi __builtin_ia32_pmaddubsw128 (v16qi, v16qi); +v8hi __builtin_ia32_pmulhrsw128 (v8hi, v8hi); +v16qi __builtin_ia32_pshufb128 (v16qi, v16qi); +v16qi __builtin_ia32_psignb128 (v16qi, v16qi); +v4si __builtin_ia32_psignd128 (v4si, v4si); +v8hi __builtin_ia32_psignw128 (v8hi, v8hi); +v2di __builtin_ia32_palignr128 (v2di, v2di, int); +v16qi __builtin_ia32_pabsb128 (v16qi); +v4si __builtin_ia32_pabsd128 (v4si); +v8hi __builtin_ia32_pabsw128 (v8hi); @end smallexample -@noindent -prints @samp{/tmp/file.c:4: note: #pragma message: -TODO - Remember to fix this}. 
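+A minimal sketch of how such built-ins are called (the wrapper function
+here is purely illustrative, not part of the documented API): they
+operate on GCC's generic vector types, declared with the
+@code{vector_size} attribute, and require the matching option, here
+@option{-mssse3}.
+
+@smallexample
+typedef short v8hi __attribute__ ((vector_size (16)));
+
+/* Per-element absolute value of eight 16-bit integers,
+   via the SSSE3 pabsw instruction.  */
+v8hi
+abs_epi16 (v8hi x)
+@{
+  return __builtin_ia32_pabsw128 (x);
+@}
+@end smallexample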
- -@cindex pragma, diagnostic -@item #pragma GCC error @var{message} -Generates an error message. This pragma @emph{is} considered to -indicate an error in the compilation, and it will be treated as such. - -Newlines can be included in the string by using the @samp{\n} -escape sequence. They will be displayed as newlines even if the -@option{-fmessage-length} option is set to zero. - -The error is only generated if the pragma is present in the code after -pre-processing has been completed. It does not matter however if the -code containing the pragma is unreachable: +The following built-in functions are available when @option{-msse4.1} is +used. All of them generate the machine instruction that is part of the +name. @smallexample -#if 0 -#pragma GCC error "this error is not seen" -#endif -void foo (void) -@{ - return; -#pragma GCC error "this error is seen" -@} -@end smallexample - -@cindex pragma, diagnostic -@item #pragma GCC warning @var{message} -This is just like @samp{pragma GCC error} except that a warning -message is issued instead of an error message. Unless -@option{-Werror} is in effect, in which case this pragma will generate -an error as well. - -@end table +v2df __builtin_ia32_blendpd (v2df, v2df, const int); +v4sf __builtin_ia32_blendps (v4sf, v4sf, const int); +v2df __builtin_ia32_blendvpd (v2df, v2df, v2df); +v4sf __builtin_ia32_blendvps (v4sf, v4sf, v4sf); +v2df __builtin_ia32_dppd (v2df, v2df, const int); +v4sf __builtin_ia32_dpps (v4sf, v4sf, const int); +v4sf __builtin_ia32_insertps128 (v4sf, v4sf, const int); +v2di __builtin_ia32_movntdqa (v2di *); +v16qi __builtin_ia32_mpsadbw128 (v16qi, v16qi, const int); +v8hi __builtin_ia32_packusdw128 (v4si, v4si); +v16qi __builtin_ia32_pblendvb128 (v16qi, v16qi, v16qi); +v8hi __builtin_ia32_pblendw128 (v8hi, v8hi, const int); +v2di __builtin_ia32_pcmpeqq (v2di, v2di); +v8hi __builtin_ia32_phminposuw128 (v8hi); +v16qi __builtin_ia32_pmaxsb128 (v16qi, v16qi); +v4si __builtin_ia32_pmaxsd128 (v4si, v4si); +v4si __builtin_ia32_pmaxud128 (v4si, v4si); +v8hi __builtin_ia32_pmaxuw128 (v8hi, v8hi); +v16qi __builtin_ia32_pminsb128 (v16qi, v16qi); +v4si __builtin_ia32_pminsd128 (v4si, v4si); +v4si __builtin_ia32_pminud128 (v4si, v4si); +v8hi __builtin_ia32_pminuw128 (v8hi, v8hi); +v4si __builtin_ia32_pmovsxbd128 (v16qi); +v2di __builtin_ia32_pmovsxbq128 (v16qi); +v8hi __builtin_ia32_pmovsxbw128 (v16qi); +v2di __builtin_ia32_pmovsxdq128 (v4si); +v4si __builtin_ia32_pmovsxwd128 (v8hi); +v2di __builtin_ia32_pmovsxwq128 (v8hi); +v4si __builtin_ia32_pmovzxbd128 (v16qi); +v2di __builtin_ia32_pmovzxbq128 (v16qi); +v8hi __builtin_ia32_pmovzxbw128 (v16qi); +v2di __builtin_ia32_pmovzxdq128 (v4si); +v4si __builtin_ia32_pmovzxwd128 (v8hi); +v2di __builtin_ia32_pmovzxwq128 (v8hi); +v2di __builtin_ia32_pmuldq128 (v4si, v4si); +v4si __builtin_ia32_pmulld128 (v4si, v4si); +int __builtin_ia32_ptestc128 (v2di, v2di); +int __builtin_ia32_ptestnzc128 (v2di, v2di); +int __builtin_ia32_ptestz128 (v2di, v2di); +v2df __builtin_ia32_roundpd (v2df, const int); +v4sf __builtin_ia32_roundps (v4sf, const int); +v2df __builtin_ia32_roundsd (v2df, v2df, const int); +v4sf __builtin_ia32_roundss (v4sf, v4sf, const int); +@end smallexample -@node Visibility Pragmas -@subsection Visibility Pragmas +The following built-in functions are available when @option{-msse4.1} is +used. 
-@table @code -@cindex pragma, visibility -@item #pragma GCC visibility push(@var{visibility}) -@itemx #pragma GCC visibility pop +@defbuiltin{v4sf __builtin_ia32_vec_set_v4sf (v4sf, float, const int)} +Generates the @code{insertps} machine instruction. +@enddefbuiltin -This pragma allows the user to set the visibility for multiple -declarations without having to give each a visibility attribute -(@pxref{Function Attributes}). +@defbuiltin{int __builtin_ia32_vec_ext_v16qi (v16qi, const int)} +Generates the @code{pextrb} machine instruction. +@enddefbuiltin -In C++, @samp{#pragma GCC visibility} affects only namespace-scope -declarations. Class members and template specializations are not -affected; if you want to override the visibility for a particular -member or instantiation, you must use an attribute. +@defbuiltin{v16qi __builtin_ia32_vec_set_v16qi (v16qi, int, const int)} +Generates the @code{pinsrb} machine instruction. +@enddefbuiltin -@end table +@defbuiltin{v4si __builtin_ia32_vec_set_v4si (v4si, int, const int)} +Generates the @code{pinsrd} machine instruction. +@enddefbuiltin +@defbuiltin{v2di __builtin_ia32_vec_set_v2di (v2di, long long, const int)} +Generates the @code{pinsrq} machine instruction in 64-bit mode. +@enddefbuiltin -@node Push/Pop Macro Pragmas -@subsection Push/Pop Macro Pragmas +The following built-in functions are changed to generate new SSE4.1 +instructions when @option{-msse4.1} is used. -For compatibility with Microsoft Windows compilers, GCC supports -@samp{#pragma push_macro(@var{"macro_name"})} -and @samp{#pragma pop_macro(@var{"macro_name"})}. +@defbuiltin{float __builtin_ia32_vec_ext_v4sf (v4sf, const int)} +Generates the @code{extractps} machine instruction. +@enddefbuiltin -@table @code -@cindex pragma, push_macro -@item #pragma push_macro(@var{"macro_name"}) -This pragma saves the value of the macro named as @var{macro_name} to -the top of the stack for this macro. +@defbuiltin{int __builtin_ia32_vec_ext_v4si (v4si, const int)} +Generates the @code{pextrd} machine instruction. +@enddefbuiltin -@cindex pragma, pop_macro -@item #pragma pop_macro(@var{"macro_name"}) -This pragma sets the value of the macro named as @var{macro_name} to -the value on top of the stack for this macro. If the stack for -@var{macro_name} is empty, the value of the macro remains unchanged. -@end table +@defbuiltin{{long long} __builtin_ia32_vec_ext_v2di (v2di, const int)} +Generates the @code{pextrq} machine instruction in 64-bit mode. +@enddefbuiltin -For example: +The following built-in functions are available when @option{-msse4.2} is +used. All of them generate the machine instruction that is part of the +name.
@smallexample -#define X 1 -#pragma push_macro("X") -#undef X -#define X -1 -#pragma pop_macro("X") -int x [X]; +v16qi __builtin_ia32_pcmpestrm128 (v16qi, int, v16qi, int, const int); +int __builtin_ia32_pcmpestri128 (v16qi, int, v16qi, int, const int); +int __builtin_ia32_pcmpestria128 (v16qi, int, v16qi, int, const int); +int __builtin_ia32_pcmpestric128 (v16qi, int, v16qi, int, const int); +int __builtin_ia32_pcmpestrio128 (v16qi, int, v16qi, int, const int); +int __builtin_ia32_pcmpestris128 (v16qi, int, v16qi, int, const int); +int __builtin_ia32_pcmpestriz128 (v16qi, int, v16qi, int, const int); +v16qi __builtin_ia32_pcmpistrm128 (v16qi, v16qi, const int); +int __builtin_ia32_pcmpistri128 (v16qi, v16qi, const int); +int __builtin_ia32_pcmpistria128 (v16qi, v16qi, const int); +int __builtin_ia32_pcmpistric128 (v16qi, v16qi, const int); +int __builtin_ia32_pcmpistrio128 (v16qi, v16qi, const int); +int __builtin_ia32_pcmpistris128 (v16qi, v16qi, const int); +int __builtin_ia32_pcmpistriz128 (v16qi, v16qi, const int); +v2di __builtin_ia32_pcmpgtq (v2di, v2di); @end smallexample -@noindent -In this example, the definition of X as 1 is saved by @code{#pragma -push_macro} and restored by @code{#pragma pop_macro}. - -@node Function Specific Option Pragmas -@subsection Function Specific Option Pragmas - -@table @code -@cindex pragma GCC target -@item #pragma GCC target (@var{string}, @dots{}) - -This pragma allows you to set target-specific options for functions -defined later in the source file. One or more strings can be -specified. Each function that is defined after this point is treated -as if it had been declared with one @code{target(}@var{string}@code{)} -attribute for each @var{string} argument. The parentheses around -the strings in the pragma are optional. @xref{Function Attributes}, -for more information about the @code{target} attribute and the attribute -syntax. - -The @code{#pragma GCC target} pragma is presently implemented for -x86, ARM, AArch64, PowerPC, and S/390 targets only. - -@cindex pragma GCC optimize -@item #pragma GCC optimize (@var{string}, @dots{}) - -This pragma allows you to set global optimization options for functions -defined later in the source file. One or more strings can be -specified. Each function that is defined after this point is treated -as if it had been declared with one @code{optimize(}@var{string}@code{)} -attribute for each @var{string} argument. The parentheses around -the strings in the pragma are optional. @xref{Function Attributes}, -for more information about the @code{optimize} attribute and the attribute -syntax. - -@cindex pragma GCC push_options -@cindex pragma GCC pop_options -@item #pragma GCC push_options -@itemx #pragma GCC pop_options - -These pragmas maintain a stack of the current target and optimization -options. It is intended for include files where you temporarily want -to switch to using a different @samp{#pragma GCC target} or -@samp{#pragma GCC optimize} and then to pop back to the previous -options. +The following built-in functions are available when @option{-msse4.2} is +used. -@cindex pragma GCC reset_options -@item #pragma GCC reset_options +@defbuiltin{{unsigned int} __builtin_ia32_crc32qi (unsigned int, unsigned char)} +Generates the @code{crc32b} machine instruction. +@enddefbuiltin -This pragma clears the current @code{#pragma GCC target} and -@code{#pragma GCC optimize} to use the default switches as specified -on the command line. 
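+A short usage sketch (illustrative only; @code{buffer_crc} is a
+hypothetical helper, not a GCC API): the CRC32 built-ins thread an
+accumulator through successive calls, one input unit per call, and the
+wider variants that follow work the same way on 16-, 32-, and 64-bit
+units.
+
+@smallexample
+/* Accumulate a CRC32 over a buffer one byte at a time;
+   requires -msse4.2.  */
+unsigned int
+buffer_crc (const unsigned char *p, int len)
+@{
+  unsigned int crc = 0;
+  for (int i = 0; i < len; i++)
+    crc = __builtin_ia32_crc32qi (crc, p[i]);
+  return crc;
+@}
+@end smallexample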
+@defbuiltin{{unsigned int} __builtin_ia32_crc32hi (unsigned int, unsigned short)} +Generates the @code{crc32w} machine instruction. +@enddefbuiltin -@end table +@defbuiltin{{unsigned int} __builtin_ia32_crc32si (unsigned int, unsigned int)} +Generates the @code{crc32l} machine instruction. +@enddefbuiltin -@node Loop-Specific Pragmas -@subsection Loop-Specific Pragmas +@defbuiltin{{unsigned long long} __builtin_ia32_crc32di (unsigned long long, unsigned long long)} +Generates the @code{crc32q} machine instruction. +@enddefbuiltin -@table @code -@cindex pragma GCC ivdep -@item #pragma GCC ivdep +The following built-in functions are changed to generate new SSE4.2 +instructions when @option{-msse4.2} is used. -With this pragma, the programmer asserts that there are no loop-carried -dependencies which would prevent consecutive iterations of -the following loop from executing concurrently with SIMD -(single instruction multiple data) instructions. +@defbuiltin{int __builtin_popcount (unsigned int)} +Generates the @code{popcntl} machine instruction. +@enddefbuiltin -For example, the compiler can only unconditionally vectorize the following -loop with the pragma: +@defbuiltin{int __builtin_popcountl (unsigned long)} +Generates the @code{popcntl} or @code{popcntq} machine instruction, +depending on the size of @code{unsigned long}. +@enddefbuiltin -@smallexample -void foo (int n, int *a, int *b, int *c) -@{ - int i, j; -#pragma GCC ivdep - for (i = 0; i < n; ++i) - a[i] = b[i] + c[i]; -@} -@end smallexample +@defbuiltin{int __builtin_popcountll (unsigned long long)} +Generates the @code{popcntq} machine instruction. +@enddefbuiltin -@noindent -In this example, using the @code{restrict} qualifier had the same -effect. In the following example, that would not be possible. Assume -@math{k < -m} or @math{k >= m}. Only with the pragma, the compiler knows -that it can unconditionally vectorize the following loop: +The following built-in functions are available when @option{-mavx} is +used. All of them generate the machine instruction that is part of the +name. 
@smallexample -void ignore_vec_dep (int *a, int k, int c, int m) -@{ -#pragma GCC ivdep - for (int i = 0; i < m; i++) - a[i] = a[i + k] * c; -@} +v4df __builtin_ia32_addpd256 (v4df,v4df); +v8sf __builtin_ia32_addps256 (v8sf,v8sf); +v4df __builtin_ia32_addsubpd256 (v4df,v4df); +v8sf __builtin_ia32_addsubps256 (v8sf,v8sf); +v4df __builtin_ia32_andnpd256 (v4df,v4df); +v8sf __builtin_ia32_andnps256 (v8sf,v8sf); +v4df __builtin_ia32_andpd256 (v4df,v4df); +v8sf __builtin_ia32_andps256 (v8sf,v8sf); +v4df __builtin_ia32_blendpd256 (v4df,v4df,int); +v8sf __builtin_ia32_blendps256 (v8sf,v8sf,int); +v4df __builtin_ia32_blendvpd256 (v4df,v4df,v4df); +v8sf __builtin_ia32_blendvps256 (v8sf,v8sf,v8sf); +v2df __builtin_ia32_cmppd (v2df,v2df,int); +v4df __builtin_ia32_cmppd256 (v4df,v4df,int); +v4sf __builtin_ia32_cmpps (v4sf,v4sf,int); +v8sf __builtin_ia32_cmpps256 (v8sf,v8sf,int); +v2df __builtin_ia32_cmpsd (v2df,v2df,int); +v4sf __builtin_ia32_cmpss (v4sf,v4sf,int); +v4df __builtin_ia32_cvtdq2pd256 (v4si); +v8sf __builtin_ia32_cvtdq2ps256 (v8si); +v4si __builtin_ia32_cvtpd2dq256 (v4df); +v4sf __builtin_ia32_cvtpd2ps256 (v4df); +v8si __builtin_ia32_cvtps2dq256 (v8sf); +v4df __builtin_ia32_cvtps2pd256 (v4sf); +v4si __builtin_ia32_cvttpd2dq256 (v4df); +v8si __builtin_ia32_cvttps2dq256 (v8sf); +v4df __builtin_ia32_divpd256 (v4df,v4df); +v8sf __builtin_ia32_divps256 (v8sf,v8sf); +v8sf __builtin_ia32_dpps256 (v8sf,v8sf,int); +v4df __builtin_ia32_haddpd256 (v4df,v4df); +v8sf __builtin_ia32_haddps256 (v8sf,v8sf); +v4df __builtin_ia32_hsubpd256 (v4df,v4df); +v8sf __builtin_ia32_hsubps256 (v8sf,v8sf); +v32qi __builtin_ia32_lddqu256 (pcchar); +v32qi __builtin_ia32_loaddqu256 (pcchar); +v4df __builtin_ia32_loadupd256 (pcdouble); +v8sf __builtin_ia32_loadups256 (pcfloat); +v2df __builtin_ia32_maskloadpd (pcv2df,v2df); +v4df __builtin_ia32_maskloadpd256 (pcv4df,v4df); +v4sf __builtin_ia32_maskloadps (pcv4sf,v4sf); +v8sf __builtin_ia32_maskloadps256 (pcv8sf,v8sf); +void __builtin_ia32_maskstorepd (pv2df,v2df,v2df); +void __builtin_ia32_maskstorepd256 (pv4df,v4df,v4df); +void __builtin_ia32_maskstoreps (pv4sf,v4sf,v4sf); +void __builtin_ia32_maskstoreps256 (pv8sf,v8sf,v8sf); +v4df __builtin_ia32_maxpd256 (v4df,v4df); +v8sf __builtin_ia32_maxps256 (v8sf,v8sf); +v4df __builtin_ia32_minpd256 (v4df,v4df); +v8sf __builtin_ia32_minps256 (v8sf,v8sf); +v4df __builtin_ia32_movddup256 (v4df); +int __builtin_ia32_movmskpd256 (v4df); +int __builtin_ia32_movmskps256 (v8sf); +v8sf __builtin_ia32_movshdup256 (v8sf); +v8sf __builtin_ia32_movsldup256 (v8sf); +v4df __builtin_ia32_mulpd256 (v4df,v4df); +v8sf __builtin_ia32_mulps256 (v8sf,v8sf); +v4df __builtin_ia32_orpd256 (v4df,v4df); +v8sf __builtin_ia32_orps256 (v8sf,v8sf); +v2df __builtin_ia32_pd_pd256 (v4df); +v4df __builtin_ia32_pd256_pd (v2df); +v4sf __builtin_ia32_ps_ps256 (v8sf); +v8sf __builtin_ia32_ps256_ps (v4sf); +int __builtin_ia32_ptestc256 (v4di,v4di,ptest); +int __builtin_ia32_ptestnzc256 (v4di,v4di,ptest); +int __builtin_ia32_ptestz256 (v4di,v4di,ptest); +v8sf __builtin_ia32_rcpps256 (v8sf); +v4df __builtin_ia32_roundpd256 (v4df,int); +v8sf __builtin_ia32_roundps256 (v8sf,int); +v8sf __builtin_ia32_rsqrtps_nr256 (v8sf); +v8sf __builtin_ia32_rsqrtps256 (v8sf); +v4df __builtin_ia32_shufpd256 (v4df,v4df,int); +v8sf __builtin_ia32_shufps256 (v8sf,v8sf,int); +v4si __builtin_ia32_si_si256 (v8si); +v8si __builtin_ia32_si256_si (v4si); +v4df __builtin_ia32_sqrtpd256 (v4df); +v8sf __builtin_ia32_sqrtps_nr256 (v8sf); +v8sf __builtin_ia32_sqrtps256 (v8sf); +void 
__builtin_ia32_storedqu256 (pchar,v32qi); +void __builtin_ia32_storeupd256 (pdouble,v4df); +void __builtin_ia32_storeups256 (pfloat,v8sf); +v4df __builtin_ia32_subpd256 (v4df,v4df); +v8sf __builtin_ia32_subps256 (v8sf,v8sf); +v4df __builtin_ia32_unpckhpd256 (v4df,v4df); +v8sf __builtin_ia32_unpckhps256 (v8sf,v8sf); +v4df __builtin_ia32_unpcklpd256 (v4df,v4df); +v8sf __builtin_ia32_unpcklps256 (v8sf,v8sf); +v4df __builtin_ia32_vbroadcastf128_pd256 (pcv2df); +v8sf __builtin_ia32_vbroadcastf128_ps256 (pcv4sf); +v4df __builtin_ia32_vbroadcastsd256 (pcdouble); +v4sf __builtin_ia32_vbroadcastss (pcfloat); +v8sf __builtin_ia32_vbroadcastss256 (pcfloat); +v2df __builtin_ia32_vextractf128_pd256 (v4df,int); +v4sf __builtin_ia32_vextractf128_ps256 (v8sf,int); +v4si __builtin_ia32_vextractf128_si256 (v8si,int); +v4df __builtin_ia32_vinsertf128_pd256 (v4df,v2df,int); +v8sf __builtin_ia32_vinsertf128_ps256 (v8sf,v4sf,int); +v8si __builtin_ia32_vinsertf128_si256 (v8si,v4si,int); +v4df __builtin_ia32_vperm2f128_pd256 (v4df,v4df,int); +v8sf __builtin_ia32_vperm2f128_ps256 (v8sf,v8sf,int); +v8si __builtin_ia32_vperm2f128_si256 (v8si,v8si,int); +v2df __builtin_ia32_vpermil2pd (v2df,v2df,v2di,int); +v4df __builtin_ia32_vpermil2pd256 (v4df,v4df,v4di,int); +v4sf __builtin_ia32_vpermil2ps (v4sf,v4sf,v4si,int); +v8sf __builtin_ia32_vpermil2ps256 (v8sf,v8sf,v8si,int); +v2df __builtin_ia32_vpermilpd (v2df,int); +v4df __builtin_ia32_vpermilpd256 (v4df,int); +v4sf __builtin_ia32_vpermilps (v4sf,int); +v8sf __builtin_ia32_vpermilps256 (v8sf,int); +v2df __builtin_ia32_vpermilvarpd (v2df,v2di); +v4df __builtin_ia32_vpermilvarpd256 (v4df,v4di); +v4sf __builtin_ia32_vpermilvarps (v4sf,v4si); +v8sf __builtin_ia32_vpermilvarps256 (v8sf,v8si); +int __builtin_ia32_vtestcpd (v2df,v2df,ptest); +int __builtin_ia32_vtestcpd256 (v4df,v4df,ptest); +int __builtin_ia32_vtestcps (v4sf,v4sf,ptest); +int __builtin_ia32_vtestcps256 (v8sf,v8sf,ptest); +int __builtin_ia32_vtestnzcpd (v2df,v2df,ptest); +int __builtin_ia32_vtestnzcpd256 (v4df,v4df,ptest); +int __builtin_ia32_vtestnzcps (v4sf,v4sf,ptest); +int __builtin_ia32_vtestnzcps256 (v8sf,v8sf,ptest); +int __builtin_ia32_vtestzpd (v2df,v2df,ptest); +int __builtin_ia32_vtestzpd256 (v4df,v4df,ptest); +int __builtin_ia32_vtestzps (v4sf,v4sf,ptest); +int __builtin_ia32_vtestzps256 (v8sf,v8sf,ptest); +void __builtin_ia32_vzeroall (void); +void __builtin_ia32_vzeroupper (void); +v4df __builtin_ia32_xorpd256 (v4df,v4df); +v8sf __builtin_ia32_xorps256 (v8sf,v8sf); @end smallexample -@cindex pragma GCC novector -@item #pragma GCC novector - -With this pragma, the programmer asserts that the following loop should be -prevented from executing concurrently with SIMD (single instruction multiple -data) instructions. - -For example, the compiler cannot vectorize the following loop with the pragma: +The following built-in functions are available when @option{-mavx2} is +used. All of them generate the machine instruction that is part of the +name. 
@smallexample -void foo (int n, int *a, int *b, int *c) -@{ - int i, j; -#pragma GCC novector - for (i = 0; i < n; ++i) - a[i] = b[i] + c[i]; -@} +v32qi __builtin_ia32_mpsadbw256 (v32qi,v32qi,int); +v32qi __builtin_ia32_pabsb256 (v32qi); +v16hi __builtin_ia32_pabsw256 (v16hi); +v8si __builtin_ia32_pabsd256 (v8si); +v16hi __builtin_ia32_packssdw256 (v8si,v8si); +v32qi __builtin_ia32_packsswb256 (v16hi,v16hi); +v16hi __builtin_ia32_packusdw256 (v8si,v8si); +v32qi __builtin_ia32_packuswb256 (v16hi,v16hi); +v32qi __builtin_ia32_paddb256 (v32qi,v32qi); +v16hi __builtin_ia32_paddw256 (v16hi,v16hi); +v8si __builtin_ia32_paddd256 (v8si,v8si); +v4di __builtin_ia32_paddq256 (v4di,v4di); +v32qi __builtin_ia32_paddsb256 (v32qi,v32qi); +v16hi __builtin_ia32_paddsw256 (v16hi,v16hi); +v32qi __builtin_ia32_paddusb256 (v32qi,v32qi); +v16hi __builtin_ia32_paddusw256 (v16hi,v16hi); +v4di __builtin_ia32_palignr256 (v4di,v4di,int); +v4di __builtin_ia32_andsi256 (v4di,v4di); +v4di __builtin_ia32_andnotsi256 (v4di,v4di); +v32qi __builtin_ia32_pavgb256 (v32qi,v32qi); +v16hi __builtin_ia32_pavgw256 (v16hi,v16hi); +v32qi __builtin_ia32_pblendvb256 (v32qi,v32qi,v32qi); +v16hi __builtin_ia32_pblendw256 (v16hi,v16hi,int); +v32qi __builtin_ia32_pcmpeqb256 (v32qi,v32qi); +v16hi __builtin_ia32_pcmpeqw256 (v16hi,v16hi); +v8si __builtin_ia32_pcmpeqd256 (v8si,v8si); +v4di __builtin_ia32_pcmpeqq256 (v4di,v4di); +v32qi __builtin_ia32_pcmpgtb256 (v32qi,v32qi); +v16hi __builtin_ia32_pcmpgtw256 (v16hi,v16hi); +v8si __builtin_ia32_pcmpgtd256 (v8si,v8si); +v4di __builtin_ia32_pcmpgtq256 (v4di,v4di); +v16hi __builtin_ia32_phaddw256 (v16hi,v16hi); +v8si __builtin_ia32_phaddd256 (v8si,v8si); +v16hi __builtin_ia32_phaddsw256 (v16hi,v16hi); +v16hi __builtin_ia32_phsubw256 (v16hi,v16hi); +v8si __builtin_ia32_phsubd256 (v8si,v8si); +v16hi __builtin_ia32_phsubsw256 (v16hi,v16hi); +v32qi __builtin_ia32_pmaddubsw256 (v32qi,v32qi); +v16hi __builtin_ia32_pmaddwd256 (v16hi,v16hi); +v32qi __builtin_ia32_pmaxsb256 (v32qi,v32qi); +v16hi __builtin_ia32_pmaxsw256 (v16hi,v16hi); +v8si __builtin_ia32_pmaxsd256 (v8si,v8si); +v32qi __builtin_ia32_pmaxub256 (v32qi,v32qi); +v16hi __builtin_ia32_pmaxuw256 (v16hi,v16hi); +v8si __builtin_ia32_pmaxud256 (v8si,v8si); +v32qi __builtin_ia32_pminsb256 (v32qi,v32qi); +v16hi __builtin_ia32_pminsw256 (v16hi,v16hi); +v8si __builtin_ia32_pminsd256 (v8si,v8si); +v32qi __builtin_ia32_pminub256 (v32qi,v32qi); +v16hi __builtin_ia32_pminuw256 (v16hi,v16hi); +v8si __builtin_ia32_pminud256 (v8si,v8si); +int __builtin_ia32_pmovmskb256 (v32qi); +v16hi __builtin_ia32_pmovsxbw256 (v16qi); +v8si __builtin_ia32_pmovsxbd256 (v16qi); +v4di __builtin_ia32_pmovsxbq256 (v16qi); +v8si __builtin_ia32_pmovsxwd256 (v8hi); +v4di __builtin_ia32_pmovsxwq256 (v8hi); +v4di __builtin_ia32_pmovsxdq256 (v4si); +v16hi __builtin_ia32_pmovzxbw256 (v16qi); +v8si __builtin_ia32_pmovzxbd256 (v16qi); +v4di __builtin_ia32_pmovzxbq256 (v16qi); +v8si __builtin_ia32_pmovzxwd256 (v8hi); +v4di __builtin_ia32_pmovzxwq256 (v8hi); +v4di __builtin_ia32_pmovzxdq256 (v4si); +v4di __builtin_ia32_pmuldq256 (v8si,v8si); +v16hi __builtin_ia32_pmulhrsw256 (v16hi, v16hi); +v16hi __builtin_ia32_pmulhuw256 (v16hi,v16hi); +v16hi __builtin_ia32_pmulhw256 (v16hi,v16hi); +v16hi __builtin_ia32_pmullw256 (v16hi,v16hi); +v8si __builtin_ia32_pmulld256 (v8si,v8si); +v4di __builtin_ia32_pmuludq256 (v8si,v8si); +v4di __builtin_ia32_por256 (v4di,v4di); +v16hi __builtin_ia32_psadbw256 (v32qi,v32qi); +v32qi __builtin_ia32_pshufb256 (v32qi,v32qi); +v8si __builtin_ia32_pshufd256 (v8si,int); +v16hi __builtin_ia32_pshufhw256 (v16hi,int); +v16hi __builtin_ia32_pshuflw256 (v16hi,int); +v32qi __builtin_ia32_psignb256 (v32qi,v32qi); +v16hi __builtin_ia32_psignw256 (v16hi,v16hi); +v8si __builtin_ia32_psignd256 (v8si,v8si); +v4di __builtin_ia32_pslldqi256 (v4di,int); +v16hi __builtin_ia32_psllwi256 (v16hi,int); +v16hi __builtin_ia32_psllw256 (v16hi,v8hi); +v8si __builtin_ia32_pslldi256 (v8si,int); +v8si __builtin_ia32_pslld256 (v8si,v4si); +v4di __builtin_ia32_psllqi256 (v4di,int); +v4di __builtin_ia32_psllq256 (v4di,v2di); +v16hi __builtin_ia32_psrawi256 (v16hi,int); +v16hi __builtin_ia32_psraw256 (v16hi,v8hi); +v8si __builtin_ia32_psradi256 (v8si,int); +v8si __builtin_ia32_psrad256 (v8si,v4si); +v4di __builtin_ia32_psrldqi256 (v4di, int); +v16hi __builtin_ia32_psrlwi256 (v16hi,int); +v16hi __builtin_ia32_psrlw256 (v16hi,v8hi); +v8si __builtin_ia32_psrldi256 (v8si,int); +v8si __builtin_ia32_psrld256 (v8si,v4si); +v4di __builtin_ia32_psrlqi256 (v4di,int); +v4di __builtin_ia32_psrlq256 (v4di,v2di); +v32qi __builtin_ia32_psubb256 (v32qi,v32qi); +v16hi __builtin_ia32_psubw256 (v16hi,v16hi); +v8si __builtin_ia32_psubd256 (v8si,v8si); +v4di __builtin_ia32_psubq256 (v4di,v4di); +v32qi __builtin_ia32_psubsb256 (v32qi,v32qi); +v16hi __builtin_ia32_psubsw256 (v16hi,v16hi); +v32qi __builtin_ia32_psubusb256 (v32qi,v32qi); +v16hi __builtin_ia32_psubusw256 (v16hi,v16hi); +v32qi __builtin_ia32_punpckhbw256 (v32qi,v32qi); +v16hi __builtin_ia32_punpckhwd256 (v16hi,v16hi); +v8si __builtin_ia32_punpckhdq256 (v8si,v8si); +v4di __builtin_ia32_punpckhqdq256 (v4di,v4di); +v32qi __builtin_ia32_punpcklbw256 (v32qi,v32qi); +v16hi __builtin_ia32_punpcklwd256 (v16hi,v16hi); +v8si __builtin_ia32_punpckldq256 (v8si,v8si); +v4di __builtin_ia32_punpcklqdq256 (v4di,v4di); +v4di __builtin_ia32_pxor256 (v4di,v4di); +v4di __builtin_ia32_movntdqa256 (pv4di); +v4sf __builtin_ia32_vbroadcastss_ps (v4sf); +v8sf __builtin_ia32_vbroadcastss_ps256 (v4sf); +v4df __builtin_ia32_vbroadcastsd_pd256 (v2df); +v4di __builtin_ia32_vbroadcastsi256 (v2di); +v4si __builtin_ia32_pblendd128 (v4si,v4si); +v8si __builtin_ia32_pblendd256 (v8si,v8si); +v32qi __builtin_ia32_pbroadcastb256 (v16qi); +v16hi __builtin_ia32_pbroadcastw256 (v8hi); +v8si __builtin_ia32_pbroadcastd256 (v4si); +v4di __builtin_ia32_pbroadcastq256 (v2di); +v16qi __builtin_ia32_pbroadcastb128 (v16qi); +v8hi __builtin_ia32_pbroadcastw128 (v8hi); +v4si __builtin_ia32_pbroadcastd128 (v4si); +v2di __builtin_ia32_pbroadcastq128 (v2di); +v8si __builtin_ia32_permvarsi256 (v8si,v8si); +v4df __builtin_ia32_permdf256 (v4df,int); +v8sf __builtin_ia32_permvarsf256 (v8sf,v8sf); +v4di __builtin_ia32_permdi256 (v4di,int); +v4di __builtin_ia32_permti256 (v4di,v4di,int); +v4di __builtin_ia32_extract128i256 (v4di,int); +v4di __builtin_ia32_insert128i256 (v4di,v2di,int); +v8si __builtin_ia32_maskloadd256 (pcv8si,v8si); +v4di __builtin_ia32_maskloadq256 (pcv4di,v4di); +v4si __builtin_ia32_maskloadd (pcv4si,v4si); +v2di __builtin_ia32_maskloadq (pcv2di,v2di); +void __builtin_ia32_maskstored256 (pv8si,v8si,v8si); +void __builtin_ia32_maskstoreq256 (pv4di,v4di,v4di); +void __builtin_ia32_maskstored (pv4si,v4si,v4si); +void __builtin_ia32_maskstoreq (pv2di,v2di,v2di); +v8si __builtin_ia32_psllv8si (v8si,v8si); +v4si __builtin_ia32_psllv4si (v4si,v4si); +v4di __builtin_ia32_psllv4di (v4di,v4di); +v2di __builtin_ia32_psllv2di (v2di,v2di); +v8si __builtin_ia32_psrav8si (v8si,v8si); +v4si __builtin_ia32_psrav4si (v4si,v4si); +v8si __builtin_ia32_psrlv8si (v8si,v8si); +v4si
__builtin_ia32_psrlv4si (v4si,v4si); +v4di __builtin_ia32_psrlv4di (v4di,v4di); +v2di __builtin_ia32_psrlv2di (v2di,v2di); +v2df __builtin_ia32_gathersiv2df (v2df, pcdouble,v4si,v2df,int); +v4df __builtin_ia32_gathersiv4df (v4df, pcdouble,v4si,v4df,int); +v2df __builtin_ia32_gatherdiv2df (v2df, pcdouble,v2di,v2df,int); +v4df __builtin_ia32_gatherdiv4df (v4df, pcdouble,v4di,v4df,int); +v4sf __builtin_ia32_gathersiv4sf (v4sf, pcfloat,v4si,v4sf,int); +v8sf __builtin_ia32_gathersiv8sf (v8sf, pcfloat,v8si,v8sf,int); +v4sf __builtin_ia32_gatherdiv4sf (v4sf, pcfloat,v2di,v4sf,int); +v4sf __builtin_ia32_gatherdiv4sf256 (v4sf, pcfloat,v4di,v4sf,int); +v2di __builtin_ia32_gathersiv2di (v2di, pcint64,v4si,v2di,int); +v4di __builtin_ia32_gathersiv4di (v4di, pcint64,v4si,v4di,int); +v2di __builtin_ia32_gatherdiv2di (v2di, pcint64,v2di,v2di,int); +v4di __builtin_ia32_gatherdiv4di (v4di, pcint64,v4di,v4di,int); +v4si __builtin_ia32_gathersiv4si (v4si, pcint,v4si,v4si,int); +v8si __builtin_ia32_gathersiv8si (v8si, pcint,v8si,v8si,int); +v4si __builtin_ia32_gatherdiv4si (v4si, pcint,v2di,v4si,int); +v4si __builtin_ia32_gatherdiv4si256 (v4si, pcint,v4di,v4si,int); @end smallexample -@cindex pragma GCC unroll @var{n} -@item #pragma GCC unroll @var{n} - -You can use this pragma to control how many times a loop should be unrolled. -It must be placed immediately before a @code{for}, @code{while} or @code{do} -loop or a @code{#pragma GCC ivdep}, and applies only to the loop that follows. -@var{n} is an integer constant expression specifying the unrolling factor. -The values of @math{0} and @math{1} block any unrolling of the loop. - -@end table - -@node Thread-Local -@section Thread-Local Storage -@cindex Thread-Local Storage -@cindex @acronym{TLS} -@cindex @code{__thread} - -Thread-local storage (@acronym{TLS}) is a mechanism by which variables -are allocated such that there is one instance of the variable per extant -thread. The runtime model GCC uses to implement this originates -in the IA-64 processor-specific ABI, but has since been migrated -to other processors as well. It requires significant support from -the linker (@command{ld}), dynamic linker (@command{ld.so}), and -system libraries (@file{libc.so} and @file{libpthread.so}), so it -is not available everywhere. - -At the user level, the extension is visible with a new storage -class keyword: @code{__thread}. For example: +The following built-in functions are available when @option{-maes} is +used. All of them generate the machine instruction that is part of the +name. @smallexample -__thread int i; -extern __thread struct state s; -static __thread char *p; +v2di __builtin_ia32_aesenc128 (v2di, v2di); +v2di __builtin_ia32_aesenclast128 (v2di, v2di); +v2di __builtin_ia32_aesdec128 (v2di, v2di); +v2di __builtin_ia32_aesdeclast128 (v2di, v2di); +v2di __builtin_ia32_aeskeygenassist128 (v2di, const int); +v2di __builtin_ia32_aesimc128 (v2di); @end smallexample -The @code{__thread} specifier may be used alone, with the @code{extern} -or @code{static} specifiers, but with no other storage class specifier. -When used with @code{extern} or @code{static}, @code{__thread} must appear -immediately after the other storage class specifier. - -The @code{__thread} specifier may be applied to any global, file-scoped -static, function-scoped static, or static data member of a class. It may -not be applied to block-scoped automatic or non-static data member. 
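+As an illustrative sketch (the wrapper function is hypothetical, not
+part of any API): each AES built-in above performs one step of the AES
+algorithm on 128-bit state held in a vector register, assuming
+compilation with @option{-maes}.
+
+@smallexample
+typedef long long v2di __attribute__ ((vector_size (16)));
+
+/* One full AES encryption round: emits the aesenc instruction.  */
+v2di
+aes_round (v2di state, v2di round_key)
+@{
+  return __builtin_ia32_aesenc128 (state, round_key);
+@}
+@end smallexample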
- -When the address-of operator is applied to a thread-local variable, it is -evaluated at run time and returns the address of the current thread's -instance of that variable. An address so obtained may be used by any -thread. When a thread terminates, any pointers to thread-local variables -in that thread become invalid. +The following built-in function is available when @option{-mpclmul} is +used. -No static initialization may refer to the address of a thread-local variable. +@defbuiltin{v2di __builtin_ia32_pclmulqdq128 (v2di, v2di, const int)} +Generates the @code{pclmulqdq} machine instruction. +@enddefbuiltin -In C++, if an initializer is present for a thread-local variable, it must -be a @var{constant-expression}, as defined in 5.19.2 of the ANSI/ISO C++ -standard. +The following built-in functions are available when @option{-mfsgsbase} is +used. All of them generate the machine instruction that is part of the +name. -See @uref{https://www.akkadia.org/drepper/tls.pdf, -ELF Handling For Thread-Local Storage} for a detailed explanation of -the four thread-local storage addressing models, and how the runtime -is expected to function. +@smallexample +unsigned int __builtin_ia32_rdfsbase32 (void); +unsigned long long __builtin_ia32_rdfsbase64 (void); +unsigned int __builtin_ia32_rdgsbase32 (void); +unsigned long long __builtin_ia32_rdgsbase64 (void); +void _writefsbase_u32 (unsigned int); +void _writefsbase_u64 (unsigned long long); +void _writegsbase_u32 (unsigned int); +void _writegsbase_u64 (unsigned long long); +@end smallexample -@menu -* C99 Thread-Local Edits:: -* C++98 Thread-Local Edits:: -@end menu +The following built-in functions are available when @option{-mrdrnd} is +used. All of them generate the machine instruction that is part of the +name. -@node C99 Thread-Local Edits -@subsection ISO/IEC 9899:1999 Edits for Thread-Local Storage +@smallexample +unsigned int __builtin_ia32_rdrand16_step (unsigned short *); +unsigned int __builtin_ia32_rdrand32_step (unsigned int *); +unsigned int __builtin_ia32_rdrand64_step (unsigned long long *); +@end smallexample -The following are a set of changes to ISO/IEC 9899:1999 (aka C99) -that document the exact semantics of the language extension. +The following built-in functions are available when @option{-mptwrite} is +used. All of them generate the machine instruction that is part of the +name. -@itemize @bullet -@item -@cite{5.1.2 Execution environments} +@smallexample +void __builtin_ia32_ptwrite32 (unsigned); +void __builtin_ia32_ptwrite64 (unsigned long long); +@end smallexample -Add new text after paragraph 1 +The following built-in functions are available when @option{-msse4a} is used. +All of them generate the machine instruction that is part of the name. -@quotation -Within either execution environment, a @dfn{thread} is a flow of -control within a program. It is implementation defined whether -or not there may be more than one thread associated with a program. -It is implementation defined how threads beyond the first are -created, the name and type of the function called at thread -startup, and how threads may be terminated. However, objects -with thread storage duration shall be initialized before thread -startup.
-@end quotation +@smallexample +void __builtin_ia32_movntsd (double *, v2df); +void __builtin_ia32_movntss (float *, v4sf); +v2di __builtin_ia32_extrq (v2di, v16qi); +v2di __builtin_ia32_extrqi (v2di, const unsigned int, const unsigned int); +v2di __builtin_ia32_insertq (v2di, v2di); +v2di __builtin_ia32_insertqi (v2di, v2di, const unsigned int, const unsigned int); +@end smallexample -@item -@cite{6.2.4 Storage durations of objects} +The following built-in functions are available when @option{-mxop} is used. @smallexample +v2df __builtin_ia32_vfrczpd (v2df); +v4sf __builtin_ia32_vfrczps (v4sf); +v2df __builtin_ia32_vfrczsd (v2df); +v4sf __builtin_ia32_vfrczss (v4sf); +v4df __builtin_ia32_vfrczpd256 (v4df); +v8sf __builtin_ia32_vfrczps256 (v8sf); +v2di __builtin_ia32_vpcmov (v2di, v2di, v2di); +v2di __builtin_ia32_vpcmov_v2di (v2di, v2di, v2di); +v4si __builtin_ia32_vpcmov_v4si (v4si, v4si, v4si); +v8hi __builtin_ia32_vpcmov_v8hi (v8hi, v8hi, v8hi); +v16qi __builtin_ia32_vpcmov_v16qi (v16qi, v16qi, v16qi); +v2df __builtin_ia32_vpcmov_v2df (v2df, v2df, v2df); +v4sf __builtin_ia32_vpcmov_v4sf (v4sf, v4sf, v4sf); +v4di __builtin_ia32_vpcmov_v4di256 (v4di, v4di, v4di); +v8si __builtin_ia32_vpcmov_v8si256 (v8si, v8si, v8si); +v16hi __builtin_ia32_vpcmov_v16hi256 (v16hi, v16hi, v16hi); +v32qi __builtin_ia32_vpcmov_v32qi256 (v32qi, v32qi, v32qi); +v4df __builtin_ia32_vpcmov_v4df256 (v4df, v4df, v4df); +v8sf __builtin_ia32_vpcmov_v8sf256 (v8sf, v8sf, v8sf); +v16qi __builtin_ia32_vpcomeqb (v16qi, v16qi); +v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi); +v4si __builtin_ia32_vpcomeqd (v4si, v4si); +v2di __builtin_ia32_vpcomeqq (v2di, v2di); +v16qi __builtin_ia32_vpcomequb (v16qi, v16qi); +v4si __builtin_ia32_vpcomequd (v4si, v4si); +v2di __builtin_ia32_vpcomequq (v2di, v2di); +v8hi __builtin_ia32_vpcomequw (v8hi, v8hi); +v16qi __builtin_ia32_vpcomfalseb (v16qi, v16qi); +v4si __builtin_ia32_vpcomfalsed (v4si, v4si); +v2di __builtin_ia32_vpcomfalseq (v2di, v2di); +v16qi __builtin_ia32_vpcomfalseub (v16qi, v16qi); +v4si __builtin_ia32_vpcomfalseud (v4si, v4si); +v2di __builtin_ia32_vpcomfalseuq (v2di, v2di); +v8hi __builtin_ia32_vpcomfalseuw (v8hi, v8hi); +v8hi __builtin_ia32_vpcomfalsew (v8hi, v8hi); +v16qi __builtin_ia32_vpcomgeb (v16qi, v16qi); +v4si __builtin_ia32_vpcomged (v4si, v4si); +v2di __builtin_ia32_vpcomgeq (v2di, v2di); +v16qi __builtin_ia32_vpcomgeub (v16qi, v16qi); +v4si __builtin_ia32_vpcomgeud (v4si, v4si); +v2di __builtin_ia32_vpcomgeuq (v2di, v2di); +v8hi __builtin_ia32_vpcomgeuw (v8hi, v8hi); +v8hi __builtin_ia32_vpcomgew (v8hi, v8hi); +v16qi __builtin_ia32_vpcomgtb (v16qi, v16qi); +v4si __builtin_ia32_vpcomgtd (v4si, v4si); +v2di __builtin_ia32_vpcomgtq (v2di, v2di); +v16qi __builtin_ia32_vpcomgtub (v16qi, v16qi); +v4si __builtin_ia32_vpcomgtud (v4si, v4si); +v2di __builtin_ia32_vpcomgtuq (v2di, v2di); +v8hi __builtin_ia32_vpcomgtuw (v8hi, v8hi); +v8hi __builtin_ia32_vpcomgtw (v8hi, v8hi); +v16qi __builtin_ia32_vpcomleb (v16qi, v16qi); +v4si __builtin_ia32_vpcomled (v4si, v4si); +v2di __builtin_ia32_vpcomleq (v2di, v2di); +v16qi __builtin_ia32_vpcomleub (v16qi, v16qi); +v4si __builtin_ia32_vpcomleud (v4si, v4si); +v2di __builtin_ia32_vpcomleuq (v2di, v2di); +v8hi __builtin_ia32_vpcomleuw (v8hi, v8hi); +v8hi __builtin_ia32_vpcomlew (v8hi, v8hi); +v16qi __builtin_ia32_vpcomltb (v16qi, v16qi); +v4si __builtin_ia32_vpcomltd (v4si, v4si); +v2di __builtin_ia32_vpcomltq (v2di, v2di); +v16qi __builtin_ia32_vpcomltub (v16qi, v16qi); +v4si
__builtin_ia32_vpcomltud (v4si, v4si); +v2di __builtin_ia32_vpcomltuq (v2di, v2di); +v8hi __builtin_ia32_vpcomltuw (v8hi, v8hi); +v8hi __builtin_ia32_vpcomltw (v8hi, v8hi); +v16qi __builtin_ia32_vpcomneb (v16qi, v16qi); +v4si __builtin_ia32_vpcomned (v4si, v4si); +v2di __builtin_ia32_vpcomneq (v2di, v2di); +v16qi __builtin_ia32_vpcomneub (v16qi, v16qi); +v4si __builtin_ia32_vpcomneud (v4si, v4si); +v2di __builtin_ia32_vpcomneuq (v2di, v2di); +v8hi __builtin_ia32_vpcomneuw (v8hi, v8hi); +v8hi __builtin_ia32_vpcomnew (v8hi, v8hi); +v16qi __builtin_ia32_vpcomtrueb (v16qi, v16qi); +v4si __builtin_ia32_vpcomtrued (v4si, v4si); +v2di __builtin_ia32_vpcomtrueq (v2di, v2di); +v16qi __builtin_ia32_vpcomtrueub (v16qi, v16qi); +v4si __builtin_ia32_vpcomtrueud (v4si, v4si); +v2di __builtin_ia32_vpcomtrueuq (v2di, v2di); +v8hi __builtin_ia32_vpcomtrueuw (v8hi, v8hi); +v8hi __builtin_ia32_vpcomtruew (v8hi, v8hi); +v4si __builtin_ia32_vphaddbd (v16qi); +v2di __builtin_ia32_vphaddbq (v16qi); +v8hi __builtin_ia32_vphaddbw (v16qi); +v2di __builtin_ia32_vphadddq (v4si); +v4si __builtin_ia32_vphaddubd (v16qi); +v2di __builtin_ia32_vphaddubq (v16qi); +v8hi __builtin_ia32_vphaddubw (v16qi); +v2di __builtin_ia32_vphaddudq (v4si); +v4si __builtin_ia32_vphadduwd (v8hi); +v2di __builtin_ia32_vphadduwq (v8hi); +v4si __builtin_ia32_vphaddwd (v8hi); +v2di __builtin_ia32_vphaddwq (v8hi); +v8hi __builtin_ia32_vphsubbw (v16qi); +v2di __builtin_ia32_vphsubdq (v4si); +v4si __builtin_ia32_vphsubwd (v8hi); +v4si __builtin_ia32_vpmacsdd (v4si, v4si, v4si); +v2di __builtin_ia32_vpmacsdqh (v4si, v4si, v2di); +v2di __builtin_ia32_vpmacsdql (v4si, v4si, v2di); +v4si __builtin_ia32_vpmacssdd (v4si, v4si, v4si); +v2di __builtin_ia32_vpmacssdqh (v4si, v4si, v2di); +v2di __builtin_ia32_vpmacssdql (v4si, v4si, v2di); +v4si __builtin_ia32_vpmacsswd (v8hi, v8hi, v4si); +v8hi __builtin_ia32_vpmacssww (v8hi, v8hi, v8hi); +v4si __builtin_ia32_vpmacswd (v8hi, v8hi, v4si); +v8hi __builtin_ia32_vpmacsww (v8hi, v8hi, v8hi); +v4si __builtin_ia32_vpmadcsswd (v8hi, v8hi, v4si); +v4si __builtin_ia32_vpmadcswd (v8hi, v8hi, v4si); +v16qi __builtin_ia32_vpperm (v16qi, v16qi, v16qi); +v16qi __builtin_ia32_vprotb (v16qi, v16qi); +v4si __builtin_ia32_vprotd (v4si, v4si); +v2di __builtin_ia32_vprotq (v2di, v2di); +v8hi __builtin_ia32_vprotw (v8hi, v8hi); +v16qi __builtin_ia32_vpshab (v16qi, v16qi); +v4si __builtin_ia32_vpshad (v4si, v4si); +v2di __builtin_ia32_vpshaq (v2di, v2di); +v8hi __builtin_ia32_vpshaw (v8hi, v8hi); +v16qi __builtin_ia32_vpshlb (v16qi, v16qi); +v4si __builtin_ia32_vpshld (v4si, v4si); +v2di __builtin_ia32_vpshlq (v2di, v2di); +v8hi __builtin_ia32_vpshlw (v8hi, v8hi); +@end smallexample -Add new text before paragraph 3 +The following built-in functions are available when @option{-mfma4} is used. +All of them generate the machine instruction that is part of the name. -@quotation -An object whose identifier is declared with the storage-class -specifier @w{@code{__thread}} has @dfn{thread storage duration}. -Its lifetime is the entire execution of the thread, and its -stored value is initialized only once, prior to thread startup. 
-@end quotation +@smallexample +v2df __builtin_ia32_vfmaddpd (v2df, v2df, v2df); +v4sf __builtin_ia32_vfmaddps (v4sf, v4sf, v4sf); +v2df __builtin_ia32_vfmaddsd (v2df, v2df, v2df); +v4sf __builtin_ia32_vfmaddss (v4sf, v4sf, v4sf); +v2df __builtin_ia32_vfmsubpd (v2df, v2df, v2df); +v4sf __builtin_ia32_vfmsubps (v4sf, v4sf, v4sf); +v2df __builtin_ia32_vfmsubsd (v2df, v2df, v2df); +v4sf __builtin_ia32_vfmsubss (v4sf, v4sf, v4sf); +v2df __builtin_ia32_vfnmaddpd (v2df, v2df, v2df); +v4sf __builtin_ia32_vfnmaddps (v4sf, v4sf, v4sf); +v2df __builtin_ia32_vfnmaddsd (v2df, v2df, v2df); +v4sf __builtin_ia32_vfnmaddss (v4sf, v4sf, v4sf); +v2df __builtin_ia32_vfnmsubpd (v2df, v2df, v2df); +v4sf __builtin_ia32_vfnmsubps (v4sf, v4sf, v4sf); +v2df __builtin_ia32_vfnmsubsd (v2df, v2df, v2df); +v4sf __builtin_ia32_vfnmsubss (v4sf, v4sf, v4sf); +v2df __builtin_ia32_vfmaddsubpd (v2df, v2df, v2df); +v4sf __builtin_ia32_vfmaddsubps (v4sf, v4sf, v4sf); +v2df __builtin_ia32_vfmsubaddpd (v2df, v2df, v2df); +v4sf __builtin_ia32_vfmsubaddps (v4sf, v4sf, v4sf); +v4df __builtin_ia32_vfmaddpd256 (v4df, v4df, v4df); +v8sf __builtin_ia32_vfmaddps256 (v8sf, v8sf, v8sf); +v4df __builtin_ia32_vfmsubpd256 (v4df, v4df, v4df); +v8sf __builtin_ia32_vfmsubps256 (v8sf, v8sf, v8sf); +v4df __builtin_ia32_vfnmaddpd256 (v4df, v4df, v4df); +v8sf __builtin_ia32_vfnmaddps256 (v8sf, v8sf, v8sf); +v4df __builtin_ia32_vfnmsubpd256 (v4df, v4df, v4df); +v8sf __builtin_ia32_vfnmsubps256 (v8sf, v8sf, v8sf); +v4df __builtin_ia32_vfmaddsubpd256 (v4df, v4df, v4df); +v8sf __builtin_ia32_vfmaddsubps256 (v8sf, v8sf, v8sf); +v4df __builtin_ia32_vfmsubaddpd256 (v4df, v4df, v4df); +v8sf __builtin_ia32_vfmsubaddps256 (v8sf, v8sf, v8sf); -@item -@cite{6.4.1 Keywords} @end smallexample -Add @code{__thread}. +The following built-in functions are available when @option{-mlwp} is used. -@item -@cite{6.7.1 Storage-class specifiers} @smallexample +void __builtin_ia32_llwpcb16 (void *); +void __builtin_ia32_llwpcb32 (void *); +void __builtin_ia32_llwpcb64 (void *); +void * __builtin_ia32_slwpcb16 (void); +void * __builtin_ia32_slwpcb32 (void); +void * __builtin_ia32_slwpcb64 (void); +void __builtin_ia32_lwpval16 (unsigned short, unsigned int, unsigned short); +void __builtin_ia32_lwpval32 (unsigned int, unsigned int, unsigned int); +void __builtin_ia32_lwpval64 (unsigned __int64, unsigned int, unsigned int); +unsigned char __builtin_ia32_lwpins16 (unsigned short, unsigned int, unsigned short); +unsigned char __builtin_ia32_lwpins32 (unsigned int, unsigned int, unsigned int); +unsigned char __builtin_ia32_lwpins64 (unsigned __int64, unsigned int, unsigned int); +@end smallexample -Add @code{__thread} to the list of storage class specifiers in -paragraph 1. +The following built-in functions are available when @option{-mbmi} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +unsigned int __builtin_ia32_bextr_u32 (unsigned int, unsigned int); +unsigned long long __builtin_ia32_bextr_u64 (unsigned long long, unsigned long long); +@end smallexample -Change paragraph 2 to +The following built-in functions are available when @option{-mbmi2} is used. +All of them generate the machine instruction that is part of the name.
+@smallexample +unsigned int _bzhi_u32 (unsigned int, unsigned int); +unsigned int _pdep_u32 (unsigned int, unsigned int); +unsigned int _pext_u32 (unsigned int, unsigned int); +unsigned long long _bzhi_u64 (unsigned long long, unsigned long long); +unsigned long long _pdep_u64 (unsigned long long, unsigned long long); +unsigned long long _pext_u64 (unsigned long long, unsigned long long); +@end smallexample -@quotation -With the exception of @code{__thread}, at most one storage-class -specifier may be given [@dots{}]. The @code{__thread} specifier may -be used alone, or immediately following @code{extern} or -@code{static}. -@end quotation +The following built-in functions are available when @option{-mlzcnt} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +unsigned short __builtin_ia32_lzcnt_u16 (unsigned short); +unsigned int __builtin_ia32_lzcnt_u32 (unsigned int); +unsigned long long __builtin_ia32_lzcnt_u64 (unsigned long long); +@end smallexample -Add new text after paragraph 6 +The following built-in functions are available when @option{-mfxsr} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +void __builtin_ia32_fxsave (void *); +void __builtin_ia32_fxrstor (void *); +void __builtin_ia32_fxsave64 (void *); +void __builtin_ia32_fxrstor64 (void *); +@end smallexample -@quotation -The declaration of an identifier for a variable that has -block scope that specifies @code{__thread} shall also -specify either @code{extern} or @code{static}. +The following built-in functions are available when @option{-mxsave} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +void __builtin_ia32_xsave (void *, long long); +void __builtin_ia32_xrstor (void *, long long); +void __builtin_ia32_xsave64 (void *, long long); +void __builtin_ia32_xrstor64 (void *, long long); +@end smallexample -The @code{__thread} specifier shall be used only with -variables. -@end quotation -@end itemize +The following built-in functions are available when @option{-mxsaveopt} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +void __builtin_ia32_xsaveopt (void *, long long); +void __builtin_ia32_xsaveopt64 (void *, long long); +@end smallexample -@node C++98 Thread-Local Edits -@subsection ISO/IEC 14882:1998 Edits for Thread-Local Storage +The following built-in functions are available when @option{-mtbm} is used. +Both of them generate the immediate form of the @code{bextr} machine instruction. +@smallexample +unsigned int __builtin_ia32_bextri_u32 (unsigned int, + const unsigned int); +unsigned long long __builtin_ia32_bextri_u64 (unsigned long long, + const unsigned long long); +@end smallexample -The following are a set of changes to ISO/IEC 14882:1998 (aka C++98) -that document the exact semantics of the language extension. -@itemize @bullet -@item -@b{[intro.execution]} +The following built-in functions are available when @option{-m3dnow} is used. +All of them generate the machine instruction that is part of the name.
-New text after paragraph 4 +@smallexample +void __builtin_ia32_femms (void); +v8qi __builtin_ia32_pavgusb (v8qi, v8qi); +v2si __builtin_ia32_pf2id (v2sf); +v2sf __builtin_ia32_pfacc (v2sf, v2sf); +v2sf __builtin_ia32_pfadd (v2sf, v2sf); +v2si __builtin_ia32_pfcmpeq (v2sf, v2sf); +v2si __builtin_ia32_pfcmpge (v2sf, v2sf); +v2si __builtin_ia32_pfcmpgt (v2sf, v2sf); +v2sf __builtin_ia32_pfmax (v2sf, v2sf); +v2sf __builtin_ia32_pfmin (v2sf, v2sf); +v2sf __builtin_ia32_pfmul (v2sf, v2sf); +v2sf __builtin_ia32_pfrcp (v2sf); +v2sf __builtin_ia32_pfrcpit1 (v2sf, v2sf); +v2sf __builtin_ia32_pfrcpit2 (v2sf, v2sf); +v2sf __builtin_ia32_pfrsqrt (v2sf); +v2sf __builtin_ia32_pfsub (v2sf, v2sf); +v2sf __builtin_ia32_pfsubr (v2sf, v2sf); +v2sf __builtin_ia32_pi2fd (v2si); +v4hi __builtin_ia32_pmulhrw (v4hi, v4hi); +@end smallexample -@quotation -A @dfn{thread} is a flow of control within the abstract machine. -It is implementation defined whether or not there may be more than -one thread. -@end quotation +The following built-in functions are available when @option{-m3dnowa} is used. +All of them generate the machine instruction that is part of the name. -New text after paragraph 7 +@smallexample +v2si __builtin_ia32_pf2iw (v2sf); +v2sf __builtin_ia32_pfnacc (v2sf, v2sf); +v2sf __builtin_ia32_pfpnacc (v2sf, v2sf); +v2sf __builtin_ia32_pi2fw (v2si); +v2sf __builtin_ia32_pswapdsf (v2sf); +v2si __builtin_ia32_pswapdsi (v2si); +@end smallexample -@quotation -It is unspecified whether additional action must be taken to -ensure when and whether side effects are visible to other threads. -@end quotation +The following built-in functions are available when @option{-mrtm} is used. They are used for restricted transactional memory. These are the internal +low-level functions. Normally the functions in +@ref{x86 transactional memory intrinsics} should be used instead. -@item -@b{[lex.key]} +@smallexample +int __builtin_ia32_xbegin (); +void __builtin_ia32_xend (); +void __builtin_ia32_xabort (status); +int __builtin_ia32_xtest (); +@end smallexample -Add @code{__thread}. +The following built-in functions are available when @option{-mmwaitx} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +void __builtin_ia32_monitorx (void *, unsigned int, unsigned int); +void __builtin_ia32_mwaitx (unsigned int, unsigned int, unsigned int); +@end smallexample -@item -@b{[basic.start.main]} +The following built-in functions are available when @option{-mclzero} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +void __builtin_ia32_clzero (void *); +@end smallexample -Add after paragraph 5 +The following built-in functions are available when @option{-mpku} is used. +They generate reads and writes to PKRU. +@smallexample +void __builtin_ia32_wrpkru (unsigned int); +unsigned int __builtin_ia32_rdpkru (); +@end smallexample -@quotation -The thread that begins execution at the @code{main} function is called -the @dfn{main thread}. It is implementation defined how functions -beginning threads other than the main thread are designated or typed. -A function so designated, as well as the @code{main} function, is called -a @dfn{thread startup function}. It is implementation defined what -happens if a thread startup function returns. It is implementation -defined what happens to other threads when any thread calls @code{exit}. -@end quotation +The following built-in functions are available when +the @option{-mshstk} option is used.
They support shadow stack +machine instructions from Intel Control-flow Enforcement Technology (CET). +Each built-in function generates the machine instruction that is part +of the function's name. These are the internal low-level functions. +Normally the functions in @ref{x86 control-flow protection intrinsics} +should be used instead. -@item -@b{[basic.start.init]} +@smallexample +unsigned int __builtin_ia32_rdsspd (void); +unsigned long long __builtin_ia32_rdsspq (void); +void __builtin_ia32_incsspd (unsigned int); +void __builtin_ia32_incsspq (unsigned long long); +void __builtin_ia32_saveprevssp (void); +void __builtin_ia32_rstorssp (void *); +void __builtin_ia32_wrssd (unsigned int, void *); +void __builtin_ia32_wrssq (unsigned long long, void *); +void __builtin_ia32_wrussd (unsigned int, void *); +void __builtin_ia32_wrussq (unsigned long long, void *); +void __builtin_ia32_setssbsy (void); +void __builtin_ia32_clrssbsy (void *); +@end smallexample -Add after paragraph 4 +@node x86 transactional memory intrinsics +@subsection x86 Transactional Memory Intrinsics -@quotation -The storage for an object of thread storage duration shall be -statically initialized before the first statement of the thread startup -function. An object of thread storage duration shall not require -dynamic initialization. -@end quotation +These hardware transactional memory intrinsics for x86 allow you to use +memory transactions with RTM (Restricted Transactional Memory). +This support is enabled with the @option{-mrtm} option. +For using HLE (Hardware Lock Elision) see +@ref{x86 specific memory model extensions for transactional memory} instead. -@item -@b{[basic.start.term]} +A memory transaction commits all changes to memory in an atomic way, +as visible to other threads. If the transaction fails it is rolled back +and all side effects discarded. -Add after paragraph 3 +Generally there is no guarantee that a memory transaction ever succeeds +and suitable fallback code always needs to be supplied. -@quotation -The type of an object with thread storage duration shall not have a -non-trivial destructor, nor shall it be an array type whose elements -(directly or indirectly) have non-trivial destructors. -@end quotation +@deftypefn {RTM Function} {unsigned} _xbegin () +Start an RTM (Restricted Transactional Memory) transaction. +Returns @code{_XBEGIN_STARTED} when the transaction +started successfully (note this is not 0, so the constant has to be +explicitly tested). -@item -@b{[basic.stc]} +If the transaction aborts, all side effects +are undone and an abort code encoded as a bit mask is returned. +The following macros are defined: -Add ``thread storage duration'' to the list in paragraph 1. +@defmac{_XABORT_EXPLICIT} +Transaction was explicitly aborted with @code{_xabort}. The parameter passed +to @code{_xabort} is available with @code{_XABORT_CODE(status)}. +@end defmac -Change paragraph 2 +@defmac{_XABORT_RETRY} +Transaction retry is possible. +@end defmac -@quotation -Thread, static, and automatic storage durations are associated with -objects introduced by declarations [@dots{}]. -@end quotation +@defmac{_XABORT_CONFLICT} +Transaction abort due to a memory conflict with another thread. +@end defmac -Add @code{__thread} to the list of specifiers in paragraph 3. +@defmac{_XABORT_CAPACITY} +Transaction abort due to the transaction using too much memory. +@end defmac -@item -@b{[basic.stc.thread]} +@defmac{_XABORT_DEBUG} +Transaction abort due to a debug trap.
-A local variable or class data member declared both @code{static}
-and @code{__thread} gives the variable or member thread storage
-duration.
-@end quotation
+@deftypefn {RTM Function} {void} _xend ()
+Commit the current transaction. When no transaction is active, this faults.
+All memory side effects of the transaction become visible
+to other threads in an atomic manner.
+@end deftypefn
-@item
-@b{[basic.stc.static]}
+@deftypefn {RTM Function} {int} _xtest ()
+Return a nonzero value if a transaction is currently active, otherwise 0.
+@end deftypefn
-Change paragraph 1
+@deftypefn {RTM Function} {void} _xabort (status)
+Abort the current transaction. When no transaction is active, this is a no-op.
+The @var{status} is an 8-bit constant; its value is encoded in the return
+value from @code{_xbegin}.
+@end deftypefn
-@quotation
-All objects that have neither thread storage duration, dynamic
-storage duration nor are local [@dots{}].
-@end quotation
+Here is an example showing handling for @code{_XABORT_RETRY}
+and a fallback path for other failures:
-@item
-@b{[dcl.stc]}
+@smallexample
+#include <immintrin.h>
-Add @code{__thread} to the list in paragraph 1.
+int n_tries, max_tries;
+unsigned status = _XABORT_EXPLICIT;
+...
-Change paragraph 1
+for (n_tries = 0; n_tries < max_tries; n_tries++)
+  @{
+    status = _xbegin ();
+    if (status == _XBEGIN_STARTED || !(status & _XABORT_RETRY))
+      break;
+  @}
+if (status == _XBEGIN_STARTED)
+  @{
+    ... transaction code...
+    _xend ();
+  @}
+else
+  @{
+    ... non-transactional fallback path...
+  @}
+@end smallexample
-@quotation
-With the exception of @code{__thread}, at most one
-@var{storage-class-specifier} shall appear in a given
-@var{decl-specifier-seq}. The @code{__thread} specifier may
-be used alone, or immediately following the @code{extern} or
-@code{static} specifiers. [@dots{}]
-@end quotation
+@noindent
+Note that, in most cases, the transactional and non-transactional code
+must synchronize with each other to ensure consistency.
-Add after paragraph 5
+@node x86 control-flow protection intrinsics
+@subsection x86 Control-Flow Protection Intrinsics
-@quotation
-The @code{__thread} specifier can be applied only to the names of objects
-and to anonymous unions.
-@end quotation
+@deftypefn {CET Function} {ret_type} _get_ssp (void)
+Get the current value of the shadow stack pointer if shadow stack support
+from Intel CET is enabled in the hardware, or @code{0} otherwise.
+The @code{ret_type} is @code{unsigned long long} for 64-bit targets
+and @code{unsigned int} for 32-bit targets.
+@end deftypefn
-@item
-@b{[class.mem]}
+@deftypefn {CET Function} {void} _inc_ssp (unsigned int)
+Increment the current shadow stack pointer by the size specified by the
+function argument. The argument is masked to a byte value for security
+reasons, so to increment by more than 255 bytes you must call the function
+multiple times.
+@end deftypefn
-Add after paragraph 6
+The shadow stack unwind code looks like:
-@quotation
-Non-@code{static} members shall not be @code{__thread}.
-@end quotation
-@end itemize
+@smallexample
+#include <immintrin.h>
-@node OpenMP
-@section OpenMP
-@cindex OpenMP extension support
+/* Unwind the shadow stack for EH.  */
+#define _Unwind_Frames_Extra(x) \
+  do \
+    @{ \
+      _Unwind_Word ssp = _get_ssp (); \
+      if (ssp != 0) \
+        @{ \
+          _Unwind_Word tmp = (x); \
+          /* The argument to _inc_ssp is masked to a byte.  */ \
+          while (tmp > 255) \
+            @{ \
+              _inc_ssp (255); \
+              tmp -= 255; \
+            @} \
+          _inc_ssp (tmp); \
+        @} \
+    @} \
+  while (0)
+@end smallexample
-OpenMP (Open Multi-Processing) is an application programming
-interface (API) that supports multi-platform shared memory
-multiprocessing programming in C/C++ and Fortran on many
-architectures, including Unix and Microsoft Windows platforms.
-It consists of a set of compiler directives, library routines,
-and environment variables that influence run-time behavior.
+@noindent
+On 64-bit processors this code runs unconditionally; on 32-bit
+processors it runs on those that support multi-byte NOP instructions.
-GCC implements all of the @uref{https://www.openmp.org/specifications/,
-OpenMP Application Program Interface v4.5}, and many features from later
-versions of the OpenMP specification.
-@xref{OpenMP Implementation Status,,,libgomp,
-GNU Offloading and Multi Processing Runtime Library},
-for more details about currently supported OpenMP features.
+@node Target Format Checks
+@section Format Checks Specific to Particular Target Machines
-To enable the processing of OpenMP directives @samp{#pragma omp},
-@samp{[[omp::directive(...)]]}, @samp{[[omp::decl(...)]]},
-and @samp{[[omp::sequence(...)]]} in C and C++,
-GCC needs to be invoked with the @option{-fopenmp} option.
-This option also arranges for automatic linking of the OpenMP
-runtime library.
-@xref{,,,libgomp,GNU Offloading and Multi Processing Runtime Library}.
+For some target machines, GCC supports additional options to the
+format attribute
+(@pxref{Function Attributes,,Declaring Attributes of Functions}).
-@xref{OpenMP and OpenACC Options}, for additional options useful with
-@option{-fopenmp}.
+@menu
+* Solaris Format Checks::
+* Darwin Format Checks::
+@end menu
-@node OpenACC
-@section OpenACC
-@cindex OpenACC extension support
+@node Solaris Format Checks
+@subsection Solaris Format Checks
-OpenACC is an application programming interface (API) that supports
-offloading of code to accelerator devices. It consists of a set of
-compiler directives, library routines, and environment variables that
-influence run-time behavior.
+Solaris targets support the @code{cmn_err} (or @code{__cmn_err__}) format
+check. @code{cmn_err} accepts a subset of the standard @code{printf}
+conversions, and the two-argument @code{%b} conversion for displaying
+bit-fields. See the Solaris man page for @code{cmn_err} for more information.
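+
+As an illustrative sketch (the variable, the message text, and the
+bit names are invented; see the man page for the exact semantics of
+the bit-format string), a checked call might look like:
+
+@smallexample
+/* %b consumes two arguments: the value to print and a bit-format
+   string whose first character is the output base, followed by
+   octal bit positions, each with its name.  */
+cmn_err (CE_NOTE, "flags = %b", flags, "\020\1READ\2WRITE");
+@end smallexample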
-GCC strives to be compatible with the
-@uref{https://www.openacc.org/, OpenACC Application Programming
-Interface v2.6}.
+@node Darwin Format Checks
+@subsection Darwin Format Checks
-To enable the processing of OpenACC directives @samp{#pragma acc}
-in C and C++, GCC needs to be invoked with the @option{-fopenacc} option.
-This option also arranges for automatic linking of the OpenACC runtime
-library.
-@xref{,,,libgomp,GNU Offloading and Multi Processing Runtime Library}.
+In addition to the full set of format archetypes (attribute format style
+arguments such as @code{printf}, @code{scanf}, @code{strftime}, and
+@code{strfmon}), Darwin targets also support the @code{CFString} (or
+@code{__CFString__}) archetype in the @code{format} attribute.
+Declarations with this archetype are parsed for correct syntax
+and argument types. However, parsing of the format string itself and
+validating arguments against it in calls to such functions is currently
+not performed.
-@xref{OpenMP and OpenACC Options}, for additional options useful with
-@option{-fopenacc}.
+Additionally, @code{CFStringRefs} (defined by the @code{CoreFoundation}
+headers) may be used as format arguments. Note that the relevant headers
+are only likely to be available on Darwin (OSX) installations. On such
+installations, the Xcode and system documentation provide descriptions
+of @code{CFString}, @code{CFStringRefs}, and associated functions.
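+
+As an illustrative sketch (the @code{MyLog} function is hypothetical,
+and the @code{CoreFoundation} headers are assumed to be available), a
+declaration using this archetype might look like:
+
+@smallexample
+#include <CoreFoundation/CoreFoundation.h>
+
+/* The declaration is checked for correct syntax and argument types;
+   the format string itself is not validated against the calls.  */
+extern void MyLog (CFStringRef format, ...)
+  __attribute__ ((format (CFString, 1, 2)));
+@end smallexample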
@node C++ Extensions
@chapter Extensions to the C++ Language