git.ipfire.org Git - thirdparty/gcc.git/log

[Ada] Spurious link error with child unit and different Assertion modes.

gcc/ada/

* exp_util.ads (Force_Evaluation): Add formal parameter
Discr_Number, to indicate discriminant expression for which an
external name must be created.
(Remove_Side_Effects): Ditto.
* exp_util.adb (Force_Evaluation): Call Remove_Side_Effects with
added parameter.
(Remove_Side_Effects, Build_Temporary): If Discr_Number is
positive, create an external name with suffix DISCR and the
given discriminant number, analogous to what is done for
temporaries for array type bounds.
* sem_ch3.adb (Process_Discriminant_Expressions): If the
constraint is for an object or component declaration and the
corresponding entity may be visible in another unit, invoke
Force_Evaluation with the new parameter.

[Ada] Clean up Uint fields (continued)

gcc/ada/

* gen_il-internals.ads (Invalid_Val): Remove, unused and
generates warnings.

[Ada] Refine types of local constants that store Etype results

gcc/ada/

* exp_aggr.adb, exp_ch4.adb, exp_ch5.adb, sprint.adb: Refine
types of local constants.

[Ada] Implementation of Preelaborable_Initialization attribute for AI12-0409

gcc/ada/

* exp_attr.adb (Expand_N_Attribute_Reference): Fold
Preelaborable_Initialization attribute in cases where it hasn't
been folded by the analyzer.
* exp_disp.adb (Original_View_In_Visible_Part): This function is
removed and moved to sem_util.adb.
* sem_attr.adb (Attribute_22): Add
Attribute_Preelaborable_Initialization as an Ada 2022 attribute.
(Analyze_Attribute, Attribute_Preelaborable_Initialization):
Check that the prefix of the attribute is either a formal
private or derived type, or a composite type declared within the
visible part of a package or generic package.
(Eval_Attribute): Perform folding of
Preelaborable_Initialization attribute based on
Has_Preelaborable_Initialization applied to the prefix type.
* sem_ch3.adb (Resolve_Aspects): Add specialized code for
Preelaborable_Initialization used at the end of a package
visible part for setting Known_To_Have_Preelab_Init on types
that are specified with True or that have a conjunction of one
or more P_I attributes applied to formal types.
* sem_ch7.adb (Analyze_Package_Specification): On call to
Has_Preelaborable_Initialization, pass True for new formal
Formal_Types_Have_Preelab_Init, so that error checking treats
subcomponents that are declared within types in generics as
having preelaborable initialization when the subcomponents are
of formal types.
* sem_ch13.adb (Analyze_Aspects_At_Freeze_Point): Add test for
P_I to prevent calling Make_Pragma_From_Boolean_Aspect, since
this aspect is handled specially and the
Known_To_Have_Preelab_Init flag will get set on types that have
the aspect by other means.
(Analyze_Aspect_Specifications.Analyze_One_Aspect): Add test for
Aspect_Preelaborable_Initialization for allowing the aspect to
be specified on formal type declarations.
(Is_Operational_Item): Treat Attribute_Put_Image as an
operational attribute.  The need for this was encountered while
working on these changes.
* sem_util.ads (Has_Preelaborable_Initialization): Add
Formal_Types_Have_Preelab_Init as a new formal parameter that
defaults to False.
(Is_Conjunction_Of_Formal_Preelab_Init_Attributes): New
function.
(Original_View_In_Visible_Part): Moved here from exp_disp.adb,
so it can be called by Analyze_Attribute.
* sem_util.adb (Has_Preelaborable_Initialization): Return True
for formal private and derived types when new formal
Formal_Types_Have_Preelab_Init is True, and pass along the
Formal_Types_Have_Preelab_Init flag in the array component case.
(Check_Components): Pass along Formal_Types_Have_Preelab_Init
flag on call to Has_Preelaborable_Initialization.
(Is_Conjunction_Of_Formal_Preelab_Init_Attributes): New function
that returns True when passed an expression that includes one or
more attributes for Preelaborable_Initialization applied to
prefixes that denote formal types.
(Is_Formal_Preelab_Init_Attribute): New utility function nested
within Is_Conjunction_Of_Formal_Preelab_Init_Attributes that
determines whether a node is a P_I attribute applied to a
generic formal type.
(Original_View_In_Visible_Part): Moved here from exp_disp.adb,
so it can be called by Analyze_Attribute.
* snames.ads-tmpl: Add note near the start of spec giving
details about what needs to be done when adding a name that
corresponds to both an attribute and a pragma.  Delete existing
occurrence of Name_Preelaborable_Initialization, and add a note
comment in the list of Name_* constants at that place,
indicating that it's included in type Pragma_Id, etc., echoing
other such comments for names that are both an attribute and a
pragma.  Insert Name_Preelaborable_Initialization in the
alphabetized set of Name_* constants corresponding to
attributes (between First_Attribute_Name and
Last_Attribute_Name).
(type Attribute_Id): Add new literal
Attribute_Preelaborable_Initialization.
(type Pragma_Id): Move Pragma_Preelaborable_Initialization from
its current position to the end of the type, in the special set
of pragma literals that have corresponding attributes. Add to
accompanying comment, indicating that functions Get_Pragma_Id
and Is_Pragma_Name need to be updated when adding a pragma
literal to the special set.
* snames.adb-tmpl (Get_Pragma_Id): Add case alternative for
Pragma_Preelaborable_Initialization.
(Is_Pragma_Name): Add test for
Name_Preelaborable_Initialization.

[Ada] Fix condition in op interpretation resolution

gcc/ada/

* sem_ch4.adb (Find_Non_Universal_Interpretations): Fix check.

[Ada] Don't examine all discriminants when looking for the first one

gcc/ada/

* sem_ch3.adb (Build_Discriminant_Constraints): Exit once a
first discriminant is found and the Discrim_Present flag is set.

[Ada] Fix assertion in GNATprove_Mode

gcc/ada/

* gnat1drv.adb (Gnat1drv): Avoid calling List_Rep_Info in
Generate_SCIL and GNATprove_Mode.
* repinfo.adb (List_Common_Type_Info): Fix comment.

[Ada] Small cleanup in System.Dwarf_Line

gcc/ada/

* libgnat/s-dwalin.ads: Remove clause for Ada.Exceptions.Traceback,
add clause for System.Traceback_Entries and alphabetize.
(AET): Delete.
(STE): New package renaming.
(Symbolic_Traceback): Adjust.
* libgnat/s-dwalin.adb: Remove clauses for Ada.Exceptions.Traceback
and System.Traceback_Entries.
(Symbolic_Traceback): Adjust.

[Ada] Only assign type to op if compatible

gcc/ada/

* sem_ch4.adb (Find_Non_Universal_Interpretations): Check if
types are compatible before adding interpretation.

[Ada] Spurious accessibility error on allocator in generic instance

gcc/ada/

* exp_ch4.adb (Expand_N_Type_Conversion): Add guard to protect
against calculating accessibility levels against internal
compiler-generated types.

[Ada] Capitalize comment

gcc/ada/

* sem_dim.adb (Dimensions_Msg_Of): Capitalize comment.

[Ada] Refactor scan_backend_switch to share logic across backends

gcc/ada/

* adabkend.adb (Scan_Back_End_Switches): Replace switch-scanning
logic with call to Backend_Utils.Scan_Common_Back_End_Switches.
* back_end.adb (Scan_Back_End_Switches): Replace switch-scanning
logic with call to Backend_Utils.Scan_Common_Back_End_Switches.
* backend_utils.adb: New file.
* backend_utils.ads: New file.
* gcc-interface/Make-lang.in: Add ada/backend_utils.o.

[Ada] Work around CodePeer bug by declaring variable

gcc/ada/

* atree.adb (Get_32_Bit_Field): Declare result before returning.

[Ada] Move Build_And_Insert_Cuda_Initialization to Expand_CUDA_Package

gcc/ada/

* exp_ch7.adb (Expand_N_Package_Body): Replace
Build_And_Insert_Cuda_Initialization with Expand_CUDA_Package.
* gnat_cuda.adb (Expand_CUDA_Package): New procedure.
(Build_And_Insert_Cuda_Initialization): Make internal.
* gnat_cuda.ads (Expand_CUDA_Package): New procedure.
(Build_And_Insert_Cuda_Initialization): Remove from spec.

[Ada] usage.adb: make -gnatw.c description clearer

gcc/ada/

* usage.adb (Usage): Update -gnatw.c messages.

[Ada] Remove inappropriate test from Is_By_Reference_Type

gcc/ada/

* sem_aux.adb (Is_By_Reference_Type): Do not test Error_Posted.

Use the proper vectype

The following uses the SLP node vectype rather than the vectype
stored in the DR group.

2021-09-17 Richard Biener <rguenther@suse.de>

* tree-vect-stmts.c (vectorizable_load): Use the vectype
from the SLP node.

Fortran/OpenMP: unconstrained/reproducible order modifier

gcc/fortran/ChangeLog:

* gfortran.h (gfc_omp_clauses): Add order_unconstrained.
* dump-parse-tree.c (show_omp_clauses): Dump it.
* openmp.c (gfc_match_omp_clauses): Match unconstrained/reproducible
modifiers to order(concurrent).
(OMP_DISTRIBUTE_CLAUSES): Accept order clause.
(resolve_omp_clauses): Reject ordered + order on same directive.
* trans-openmp.c (gfc_trans_omp_clauses, gfc_split_omp_clauses): Pass
on unconstrained modifier of order(concurrent).

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/order-5.f90: New test.
* gfortran.dg/gomp/order-6.f90: New test.
* gfortran.dg/gomp/order-7.f90: New test.
* gfortran.dg/gomp/order-8.f90: New test.
* gfortran.dg/gomp/order-9.f90: New test.

Avoid premature alignment setting in vect_duplicate_ssa_name_ptr_info

This removes adjusting alignment based on the vectorized accesses
and instead keeps what was set on the original access.  The
code generating the actual accesses already makes sure to properly
align the vectorized accesses based on the generated pointer,
and the vectorizer's alignment is always based on the desired
alignment of a vector type and thus would reset alignment to
unknown, for example when doing strided accesses.

2021-09-20 Richard Biener <rguenther@suse.de>

* tree-vect-data-refs.c (vect_duplicate_ssa_name_ptr_info):
Do not compute alignment of the vectorized access here.

vect alignment enhance TLC

This properly marks the loop as for a runtime alias peel rather
than (pointlessly) going through DR_MISALIGNMENT.

2021-09-20 Richard Biener <rguenther@suse.de>

* tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
Store -1 for runtime alias peeling iterations.

Obsolete hppa[12]*-*-hpux10* and hppa[12]*-*-hpux11*

This obsoletes the 32bit hppa-hpux configurations which only support
STABS as debuginfo format.

2021-09-20 Richard Biener <rguenther@suse.de>

gcc/
* config.gcc: Obsolete hppa[12]*-*-hpux10* and hppa[12]*-*-hpux11*.

contrib/
* config-list.mk: --enable-obsolete for hppa2.0-hpux10.1 and
hppa2.0-hpux11.9.

testsuite: Remove .exe suffix in prune_gcc_output

When running the testsuite under Windows, we noticed failures in
testcases that attempt to match compiler error messages containing the
name of the executable.

For instance, gcc.dg/analyzer/signal-4a.c tries to match 'cc1:' which
obviously fails when the executable is called cc1.exe.

This patch removes the .exe suffix from various toolchain executables
to avoid this problem.
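
The real change is a pattern edit in lib/prune.exp (Tcl).  As a rough C analogue of the same string transformation, the sketch below strips ".exe" only where it directly follows a known tool name.  The helper strip_exe_suffix and its tool list are hypothetical, chosen for illustration; they are not the list or code that prune.exp uses.

```c
#include <string.h>

/* Illustrative tool list (an assumption, not the one in prune.exp). */
static const char *const tools[] = { "cc1", "cc1plus", "collect2", "lto1", NULL };

/* Remove ".exe" wherever it directly follows one of the listed tool names,
   editing MSG in place: "cc1.exe: error" becomes "cc1: error".  */
static void strip_exe_suffix (char *msg)
{
  for (const char *const *t = tools; *t != NULL; t++)
    {
      size_t n = strlen (*t);
      char *p = msg;
      while ((p = strstr (p, *t)) != NULL)
        {
          if (strncmp (p + n, ".exe", 4) == 0)
            /* Shift the tail (including the NUL) left over ".exe".  */
            memmove (p + n, p + n + 4, strlen (p + n + 4) + 1);
          p += n;
        }
    }
}
```

Note that "cc1" also matches the start of "cc1plus.exe", but the suffix check fails there, so the longer name is left for its own pass.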

2021-09-08 Christophe Lyon <christophe.lyon@foss.st.com>
Torbjörn SVENSSON <torbjorn.svensson@st.com>

gcc/testsuite/
* lib/prune.exp (prune_gcc_output): Remove .exe suffix from
toolchain executable names.

Don't record string concatenation data for 'RESERVED_LOCATION_P'

'RESERVED_LOCATION_P' means 'UNKNOWN_LOCATION' or 'BUILTINS_LOCATION'.
We're using 'UNKNOWN_LOCATION' as a spare value for 'Empty', so we should
make sure that we don't also use it as a key.  Similarly for
'BUILTINS_LOCATION', which we'd later like to use as a spare value for
'Deleted'.

As discussed in the source code comment added, for these we didn't have
stable behavior anyway.
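
To see why a reserved value cannot double as a key, consider a minimal open-addressing table whose slot markers reuse those values.  This is a sketch under stated assumptions: location_t, UNKNOWN_LOCATION (0), BUILTINS_LOCATION (1) and RESERVED_LOCATION_P mirror GCC's definitions, while the table, record and lookup are hypothetical stand-ins, not the string_concat_db implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for GCC's definitions: 0 is UNKNOWN_LOCATION,
   1 is BUILTINS_LOCATION, and both count as reserved.  */
typedef unsigned int location_t;
#define UNKNOWN_LOCATION ((location_t) 0)
#define BUILTINS_LOCATION ((location_t) 1)
#define RESERVED_LOCATION_COUNT 2
#define RESERVED_LOCATION_P(LOC) ((LOC) < RESERVED_LOCATION_COUNT)

/* Hypothetical table: 0 marks an empty slot (1 would mark a deleted one).  */
#define TABLE_SIZE 16
#define EMPTY_KEY UNKNOWN_LOCATION

static location_t keys[TABLE_SIZE];   /* static storage: all slots start empty */
static int values[TABLE_SIZE];

/* Record VALUE for LOC.  Reserved locations are skipped, since storing
   them would be indistinguishable from an empty or deleted slot.  */
static bool record (location_t loc, int value)
{
  if (RESERVED_LOCATION_P (loc))
    return false;
  for (size_t i = 0; i < TABLE_SIZE; i++)
    {
      size_t slot = (loc + i) % TABLE_SIZE;
      if (keys[slot] == EMPTY_KEY || keys[slot] == loc)
        {
          keys[slot] = loc;
          values[slot] = value;
          return true;
        }
    }
  return false;                       /* table full */
}

/* Look LOC up; reserved locations are never present.  */
static bool lookup (location_t loc, int *value)
{
  if (RESERVED_LOCATION_P (loc))
    return false;
  for (size_t i = 0; i < TABLE_SIZE; i++)
    {
      size_t slot = (loc + i) % TABLE_SIZE;
      if (keys[slot] == EMPTY_KEY)
        return false;
      if (keys[slot] == loc)
        {
          *value = values[slot];
          return true;
        }
    }
  return false;
}
```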

Follow-up to r239175 (commit 88fa5555a309e5d6c6171b957daaf2f800920869)
"On-demand locations within string-literals".

gcc/
* input.c (string_concat_db::record_string_concatenation)
(string_concat_db::get_string_concatenation): Skip for
'RESERVED_LOCATION_P'.
gcc/testsuite/
* gcc.dg/plugin/diagnostic-test-string-literals-1.c: Adjust
expected error diagnostics.

tree-optimization/65206 - dependence analysis on mixed pointer/array

This adds the capability to analyze the dependence of mixed
pointer/array accesses.  The motivating example is one where using a masked
load/store creates the pointer-based access when an otherwise
unconditional access is array-based.  Other examples would include
accesses to an array mixed with accesses from inlined helpers
that work on pointers.

The idea is quite simple and old - analyze the data-ref indices
as if the reference was pointer-based.  The following change does
this by changing dr_analyze_indices to work on the indices
sub-structure and storing an alternate indices substructure in
each data reference.  That alternate set of indices is analyzed
lazily by initialize_data_dependence_relation when it fails to
match up the main set of indices of two data references.
initialize_data_dependence_relation is refactored into a head
and a tail worker and changed to work on one of the indices
structures and thus away from using DR_* access macros which
continue to reference the main indices substructure.
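
The "inlined helper that works on pointers" case can be pictured with a loop like the one below: after inlining, the load is a pointer-based reference into arr while the store is array-based.  This is an illustrative sketch only; the names arr, load and scale are hypothetical, and whether a particular GCC version vectorizes it depends on target and flags (-fopt-info-loop reports what happened).

```c
#define N 1024

double arr[N];

/* Helper working on a pointer; once inlined, its load is a
   pointer-based reference to the same array that scale() indexes.  */
static inline double load (const double *p)
{
  return *p;
}

/* Array-based store arr[i] mixed with a pointer-based load of arr[i + 1].
   The forward loop reads each element before it is overwritten.  */
void scale (void)
{
  for (int i = 0; i < N - 1; i++)
    arr[i] = load (&arr[i + 1]) * 2.0;
}
```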

Quite a few vectorization and loop distribution opportunities are
unleashed in SPEC CPU 2017: notably 520.omnetpp_r, 548.exchange2_r,
510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and
544.nab_r report changes with -fopt-info-loop, while
the rest of the specrate set sees no changes there.  Measuring runtime
for the set where changes were reported reveals nothing outside noise
except 511.povray_r, which seems to regress slightly for me
(on a Zen2 machine with -Ofast -march=native).

2021-09-08  Richard Biener  <rguenther@suse.de>

PR tree-optimization/65206
* tree-data-ref.h (struct data_reference): Add alt_indices,
order it last.
* tree-data-ref.c (free_data_ref): Release alt_indices.
(dr_analyze_indices): Work on struct indices and get DR_REF as tree.
(create_data_ref): Adjust.
(initialize_data_dependence_relation): Split into head
and tail.  When the base objects fail to match up try
again with pointer-based analysis of indices.
* tree-vectorizer.c (vec_info_shared::check_datarefs): Do
not compare the lazily computed alternate set of indices.

* gcc.dg/torture/20210916.c: New testcase.
* gcc.dg/vect/pr65206.c: Likewise.

Driver: Fix bootstrap with DEFAULT_{ASSEMBLER,LINKER,DSYMUTIL}.

The patch at r12-3662-g5fee8a0a9223d factored the code for
printing the names of programs into a separate function.
However, the moved code that prints out the names of the
assembler, linker (and dsymutil on Darwin) when those are
specified at configure time was not adjusted accordingly,
leading to a bootstrap failure.

Fixed by testing specifically for execute OK, since we know
these are programs.
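
"Testing for execute OK" corresponds to the POSIX access (path, X_OK) check.  The sketch below searches a colon-separated directory list and accepts a candidate only if it is executable.  The helper find_program is hypothetical and is not the driver's find_a_program; note also that access reports X_OK for searchable directories, which a real lookup would need to rule out.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: search the colon-separated list DIRS for NAME and
   return a malloc'd path only if access (path, X_OK) succeeds.  */
static char *find_program (const char *dirs, const char *name)
{
  const char *start = dirs;
  while (*start)
    {
      const char *end = strchr (start, ':');
      size_t len = end ? (size_t) (end - start) : strlen (start);
      size_t need = len + 1 + strlen (name) + 1;
      char *path = malloc (need);
      if (!path)
        return NULL;
      snprintf (path, need, "%.*s/%s", (int) len, start, name);
      if (access (path, X_OK) == 0)
        return path;            /* exists and is executable */
      free (path);
      if (!end)
        break;
      start = end + 1;
    }
  return NULL;
}
```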

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:

* gcc.c: Test for execute OK when we find the
programs for the assembler, linker and dsymutil and those
were specified at configure-time.

Daily bump.

Correct a function pre/postcondition [PR102403].

Resolves:
PR middle-end/102403 - ICE in init_from_control_deps, at gimple-predicate-analysis.cc:2364

gcc/ChangeLog:
PR middle-end/102403
* gimple-predicate-analysis.cc (predicate::init_from_control_deps):
Correct a function pre/postcondition.

gcc/testsuite/ChangeLog:
PR middle-end/102403
* gcc.dg/uninit-pr102403.c: New test.
* gcc.dg/uninit-pr102403-c2.c: New test.

Handle null cfun [PR102243].

Resolves:
PR middle-end/102243 - ICE on placement new at global scope

gcc/ChangeLog:
PR middle-end/102243
* tree-ssa-strlen.c (get_range): Handle null cfun.

gcc/testsuite/ChangeLog:
PR middle-end/102243
* g++.dg/warn/Wplacement-new-size-10.C: New test.

libgcc, Darwin: Remove unused symlinks.

These were used on older systems to equate the FAT libgcc_s
library to single-slice equivalents. Unused for any current
system and never emitted by GCC.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
libgcc/ChangeLog:

* config/t-slibgcc-darwin: Delete unused code.

libgcc, X86, Darwin: Handle symbols for HF cases.

This reorganises the Darwin symbol vers files to include
the generic ones at the top level, allowing arch ports
to override them (via either exclusion or inclusion as needed).

We add an X86-specific vers file containing the new HF
symbols.  Note that although Darwin does not use ELF-style
symbol versioning, the parser that produces the map can
consume it.  Using the ELF-style description will help us
know at which rev the symbols were introduced.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
libgcc/ChangeLog:

* config/i386/t-darwin: Add in a vers file for X86-specific
symbols.
* config/t-darwin: Add the generic symbol maps here...
* config/t-slibgcc-darwin: ... removing from here.
* config/i386/libgcc-darwin.ver: New file.

libgcc, X86: Exclude rules for libgcc2 __{div,mul}hc3.

We want to override the libgcc2 generic version of these functions
for X86.  First exclude the original and then add in the replacements.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
libgcc/ChangeLog:

* config/i386/t-softfp: Exclude libgcc2 versions of __divhc3
and __mulhc3.

Darwin, crts: Build Darwin10 unwinder shim as a library.

We have a small unwinder shim that is only used for Darwin10
(and only then in quite specific cases). To avoid linking
this code for every executable or DSO, we can present the crt
as a convenience library (rather than a .o file).

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:

* config/darwin.h (LINK_COMMAND_SPEC_A): Use Darwin10
unwinder shim as a convenience library.

libgcc/ChangeLog:

* config.host: Use convenience library for Darwin10
unwinder shim.
* config/t-darwin: Build Darwin10 unwinder shim as a
convenience library.

[committed] Make test names unique for a couple of goacc tests

gcc/testsuite
* gfortran.dg/goacc/privatization-1-compute.f90: Make test names
unique.
* gfortran.dg/goacc/routine-external-level-of-parallelism-2.f:
Likewise.

Update the section on binutils version

LTO usage requires binutils 2.35 or newer due to
https://sourceware.org/PR25355.
This adds a note in the prerequisites page about it.

gcc/ChangeLog:

* doc/install.texi: Add note about
binutils 2.35 being required for LTO usage.

Fix PR bootstrap/102389: --with-build-config=bootstrap-lto is broken

The problem here is that the lto-plugin now requires an NM that works
with LTO, so we need to pass down NM just like we do for RANLIB
and AR.

Bootstrapped and tested with --with-build-config=bootstrap-lto on aarch64-linux-gnu.
Note you need to use binutils 2.35 or later too due to https://sourceware.org/PR25355
(I will submit another patch to improve the installation instructions too).

config/ChangeLog:

PR bootstrap/102389
* bootstrap-lto-lean.mk: Handle NM like RANLIB and AR.
* bootstrap-lto.mk: Likewise.

Minor cleanups to forward threader.

Every time we allocate a threading edge we push it onto the path in a
distinct step.  There's no need to do this in two steps, and avoiding
it keeps us from exposing the internals of the registry.

I've also done some tiny cleanups in thread_across_edge, most importantly
removing the bitmap in favor of an auto_bitmap.

There are no functional changes.

gcc/ChangeLog:

* tree-ssa-threadbackward.c
(back_threader_registry::register_path): Use push_edge.
* tree-ssa-threadedge.c
(jump_threader::thread_around_empty_blocks): Same.
(jump_threader::thread_through_normal_block): Same.
(jump_threader::thread_across_edge): Same. Also, use auto_bitmap.
Tidy up code.
* tree-ssa-threadupdate.c
(jt_path_registry::allocate_thread_edge): Remove.
(jt_path_registry::push_edge): New.
(dump_jump_thread_path): Make static.
* tree-ssa-threadupdate.h (allocate_thread_edge): Remove.
(push_edge): New.

Jit, testsuite: Amend expect processing to tolerate more platforms.

The current 'fixed_host_execute' implementation fails on Darwin
platforms for a number of reasons:

1/ If the sub-process spawn fails (e.g. because of missing or
   malformed params), rather than reporting the fail output into the
   match stream, as indicated by the expect manual, it terminates
   the script.

-  We fix this by (a) checking that the executable is valid as well
   as existing, and (b) putting the spawn into a catch block and
   reporting a failure.

2/ There is no recovery path at all for a buffer-full case (and we
   do see buffer-full events with the default sizes).

-  Added by the patch here; however, it is not as sophisticated as
   the methods used by dejagnu internally.  Here we set the process
   to be "nowait" and then close the connection, with the intent
   that this will terminate the spawned process.

3/ The expect logic assumes that 'Totals:' is a valid indicator
   for the end of the spawned process output.  This is not true
   even for the default dejagnu header (there are a number of
   additional reporting lines after it).  In addition to this, there
   are some tests that intentionally produce more output after
   the totals report (and there are tests that do not use that
   mechanism at all).

   The effect is that we might arrive at the "wait" for the spawned
   process to finish, but that process might not have completed
   all its output.  For Darwin, at least, that causes a deadlock
   between expect and the spawnee: the latter is doing a
   non-cancellable write and the former is waiting for the latter to
   terminate.  For some reason this does not seem to affect Linux;
   perhaps the pty implementation allows the write(s) to proceed
   even though there is no reader.

-  This is fixed by modifying the loop termination condition to be
   either EOF (which will be the 'correct' condition) or a timeout,
   which would represent an error either in the runtime or in the
   parsing of the output.  As added precautions, we only try to
   wait if there is a correctly-spawned process, and we are also
   specific about which process we are waiting for.

4/ Darwin appears to have a bug in either the tcl or termios
   'cooking' code that occasionally inserts an additional CR char
   into the stream, thus '\n' => '\r\r\n' instead of '\r\n'.  The
   original program output is correct (it only contains a single
   \n); the additional character is being inserted somewhere in
   the translations applied before the output reaches expect.

   The logic of this expect implementation does not tolerate single
   \r or \n characters (it will fail with a timeout or buffer-full
   if that occurs).

-  This is fixed by having a line-end match that is adjusted for
   Darwin.

5/ The default buffer size does seem to be too small in some cases,
   noting that GCC uses 10000 as the match buffer size and the
   default is 2000.

-  Fixed by increasing the size to 8192.

6/ There is a somewhat arbitrary dumping of output where we match
   ^$prefix\tSOMETHING... and then process the SOMETHING.  This
   essentially allows the match to start at any place in the buffer
   following any collection of non-line-end chars.

-  Fixed by amending the match for 'general' lines to accommodate
   these cases, and reporting such lines to the log.  At least this
   should allow debugging of any cases where output that should be
   recognized is being dropped.
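
The line-end tolerance described in item 4 can be sketched in C as a line splitter that treats any run of CRs followed by LF as a single line end, so both "\r\n" and Darwin's stray "\r\r\n" terminate a line cleanly.  This is only an analogue of the idea; the real fix is an expect pattern in jit.exp, and next_line is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the next line of BUF (starting at *POS) into OUT,
   stripping every CR that precedes the terminating LF.  Returns 0 when no
   complete line remains, i.e. the caller should wait for more output.  */
static int next_line (const char *buf, size_t *pos, char *out, size_t outsz)
{
  const char *start = buf + *pos;
  const char *nl = strchr (start, '\n');
  if (!nl)
    return 0;
  const char *end = nl;
  while (end > start && end[-1] == '\r')
    end--;                      /* drop "\r" and "\r\r" alike */
  size_t len = (size_t) (end - start);
  if (len >= outsz)
    len = outsz - 1;            /* truncate rather than overflow OUT */
  memcpy (out, start, len);
  out[len] = '\0';
  *pos = (size_t) (nl - buf) + 1;
  return 1;
}
```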

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/testsuite/ChangeLog:

* jit.dg/jit.exp (fixed_host_execute): Amend the match and
exit conditions to cater for more platforms.

Make dump_ranger routines externally visible.

There was an inline extern declaration for dump_ranger that was a bit of
a hack. I've removed it in favor of an actual prototype. There are
also some trivial changes to the dumping code in the path solver.

gcc/ChangeLog:

* gimple-range-path.cc (path_range_query::path_range_query): Add
header.
(path_range_query::dump): Remove extern declaration of dump_ranger.
* gimple-range-trace.cc (dump_ranger): Add DEBUG_FUNCTION marker.
* gimple-range-trace.h (dump_ranger): Add prototype.

[PATCH] Factor out `find_a_program` helper around `find_a_file`

gcc/
* gcc.c (find_a_program): New function, factored out of...
(find_a_file): Here.
(execute): Use find_a_program when looking for programs rather
than find_a_file.

[PATCH] avr: Add atmega324pb MCU

gcc/
* config/avr/avr-mcus.def: Add atmega324pb.
* doc/avr-mmcu.texi: Corresponding changes.

PR middle-end/88173: More constant folding of NaN comparisons.

This patch tackles PR middle-end/88173 where the order of operands in
a comparison affects constant folding.  As diagnosed by Jason Merrill,
"match.pd handles these comparisons very differently".  The history is
that the middle end typically canonicalizes comparisons to place
constants on the right, but when a comparison contains two constants
we need to check/transform both constants, i.e. on both the left and the
right.  Hence the added lines below duplicate for @0 the same transform
applied a few lines above for @1.

Whilst preparing the testcase, I noticed that this transformation is
incorrectly disabled with -fsignaling-nans even when both operands are
known not to be signaling NaNs, so I've corrected that and added a
second test case.  Unfortunately, c-c++-common/pr57371-4.c then starts
failing, as it doesn't distinguish QNaNs (which are quiet) from SNaNs
(which signal), so this patch includes a minor tweak to the expected
behaviour for QNaNs in that existing test.
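
The values these folds must produce are fixed by IEEE 754: every ordered comparison involving a quiet NaN is false, != is true, and the pair is unordered, whichever side the NaN is on.  The small C sketch below lists such two-constant comparisons; whether GCC folds each one at compile time depends on version and flags, and the results assume no -ffast-math.  The function names are illustrative only (the PR's own tests are C++).

```c
#include <math.h>
#include <stdbool.h>

/* Two constant operands in each comparison; swapping the operand order
   must not change the result.  */
static bool nan_lt_one (void)    { return NAN < 1.0; }               /* false */
static bool one_lt_nan (void)    { return 1.0 < NAN; }               /* false */
static bool nan_ge_one (void)    { return NAN >= 1.0; }              /* false */
static bool nan_eq_nan (void)    { return NAN == NAN; }              /* false */
static bool nan_ne_nan (void)    { return NAN != NAN; }              /* true */
static bool nan_unordered (void) { return isunordered (NAN, 1.0); }  /* true */
```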

2021-09-19  Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
PR middle-end/88173
* match.pd (cmp @0 REAL_CST@1): When @0 is also REAL_CST, apply
the same transformations as to @1.  For comparisons against NaN,
don't check HONOR_SNANS but confirm that neither operand is a
signaling NaN.

gcc/testsuite/ChangeLog
PR middle-end/88173
* c-c++-common/pr57371-4.c: Tweak/correct test case for QNaNs.
* g++.dg/pr88173-1.C: New test case.
* g++.dg/pr88173-2.C: New test case.

[PATCH] Remove unused function make_unique_name.

gcc/
* attribs.c (make_unique_name): Delete.
* attribs.h (make_unique_name): Delete.

Fix middle-end/102395: reg_class having only NO_REGS and ALL_REGS.

The fix is simply to add to the assert that
sclass and dclass are both greater than or equal to NO_REGS.
NO_REGS is documented as the first register class, so it should
have the value 0.

gcc/ChangeLog:

* lra-constraints.c (check_and_process_move): Assert
that dclass and sclass are greater than or equal to NO_REGS.

Daily bump.

openmp: Handle unconstrained and reproducible modifiers on order(concurrent)

This patch adds handling for unconstrained and reproducible modifiers on
order(concurrent) clause.  For all static schedules (including auto and
no schedule or dist_schedule clauses) I believe what we implement is
reproducible, so the patch doesn't do much beyond recognizing those.
Note, there is an OpenMP/spec issue that needs resolution on what
should happen with the dynamic schedules (whether it should be an error
to mix such clauses, or silently make it non-reproducible, and in which
exact cases), so it might need some follow-up.

Besides that, this patch allows the order(concurrent) clause on the distribute
construct, which is something also added in OpenMP 5.1, and finally
checks the newly added restriction that at most one order clause
can appear on a construct.

Allowing the order clause on distribute has a side effect:
order(concurrent) copyin(thrpriv) is no longer allowed on combined/composite
constructs with distribute parallel for{, simd} in them.  Previously the
order clause applied only to for/simd, so a threadprivate var could be
referenced in the construct; now it also applies to distribute, and so on
the parallel we shouldn't refer to a threadprivate var.
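
For reference, the two modifiers are spelled as prefixes inside the clause.  The sketch below is written against OpenMP 5.1 syntax; with -fopenmp it needs a compiler that understands the modifiers (such as a GCC including this patch), and without -fopenmp the pragmas are ignored and the loops run serially with the same results.  The function names are illustrative only.

```c
/* reproducible: roughly, asks that repeated executions schedule the
   iterations the same way; the default static schedule qualifies.  */
void axpy_reproducible (int n, float a, const float *x, float *y)
{
#pragma omp parallel for simd order(reproducible: concurrent)
  for (int i = 0; i < n; i++)
    y[i] = a * x[i] + y[i];
}

/* unconstrained: no such guarantee is requested.  A second order clause
   on the same construct would now be diagnosed.  */
void axpy_unconstrained (int n, float a, const float *x, float *y)
{
#pragma omp parallel for simd order(unconstrained: concurrent)
  for (int i = 0; i < n; i++)
    y[i] = a * x[i] + y[i];
}
```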

2021-09-18  Jakub Jelinek  <jakub@redhat.com>

gcc/
* tree.h (OMP_CLAUSE_ORDER_UNCONSTRAINED): Define.
* tree-pretty-print.c (dump_omp_clause): Print unconstrained:
for OMP_CLAUSE_ORDER_UNCONSTRAINED.
gcc/c-family/
* c-omp.c (c_omp_split_clauses): Split order clause also to
distribute construct.  Copy over OMP_CLAUSE_ORDER_UNCONSTRAINED.
gcc/c/
* c-parser.c (c_parser_omp_clause_order): Parse unconstrained
and reproducible modifiers.
(OMP_DISTRIBUTE_CLAUSE_MASK): Add order clause.
gcc/cp/
* parser.c (cp_parser_omp_clause_order): Parse unconstrained
and reproducible modifiers.
(OMP_DISTRIBUTE_CLAUSE_MASK): Add order clause.
gcc/testsuite/
* c-c++-common/gomp/order-1.c (f2): Add tests for distribute
with order clause.
(f3): Remove.
* c-c++-common/gomp/order-2.c: Don't expect error for distribute
with order clause.
* c-c++-common/gomp/order-5.c: New test.
* c-c++-common/gomp/order-6.c: New test.
* c-c++-common/gomp/clause-dups-1.c (f1): Add tests for
duplicated order clause.
(f9): New function.
* c-c++-common/gomp/clauses-1.c (baz, bar): Don't mix copyin and
order(concurrent) clauses on the same composite construct combined
with distribute, instead split it into two tests, one without
copyin and one without order(concurrent).  Add order(concurrent)
clauses to {,{,target} teams} distribute.
* g++.dg/gomp/attrs-1.C (baz, bar): Likewise.
* g++.dg/gomp/attrs-2.C (baz, bar): Likewise.

Fix ICE in pass_rpad.

Besides conversion instructions, pass_rpad also handles scalar
sqrt/rsqrt/rcp/round instructions, whereas r12-3614 was only meant to
handle conversion instructions; fix that.

gcc/ChangeLog:

* config/i386/i386-features.c (remove_partial_avx_dependency):
Restrict TARGET_USE_VECTOR_FP_CONVERTS and
TARGET_USE_VECTOR_CONVERTS to conversion instructions only.

openmp: Allow private or firstprivate arguments to default clause even for C/C++

OpenMP 5.1 allows default(private) or default(firstprivate) even in C/C++,
but it behaves the same way as in Fortran only for variables not declared at
namespace or file scope. For the namespace/file scope variables it instead
behaves as default(none).
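
A minimal sketch of default(firstprivate) on a local variable in C, assuming an OpenMP 5.1 aware compiler when -fopenmp is used: each thread gets its own copy of x initialised from the original, so the original is unchanged after the region.  Without -fopenmp the pragma is ignored and the increment applies to x directly.  The function name is illustrative only.

```c
/* Each thread increments its own firstprivate copy of the local x.
   Referencing a file-scope variable inside this region without an
   explicit data-sharing clause would instead be diagnosed, as with
   default(none).  */
int bump_private_copy (void)
{
  int x = 10;
#pragma omp parallel default(firstprivate) num_threads(2)
  {
    x += 1;                /* modifies this thread's copy only */
  }
  return x;                /* 10 with -fopenmp; 11 if the pragma is ignored */
}
```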

2021-09-18 Jakub Jelinek <jakub@redhat.com>

gcc/
* gimplify.c (omp_default_clause): For C/C++ default({,first}private),
if file/namespace scope variable doesn't have predetermined sharing,
treat it as if there was default(none).
gcc/c/
* c-parser.c (c_parser_omp_clause_default): Handle private and
firstprivate arguments, adjust diagnostics on unknown argument.
gcc/cp/
* parser.c (cp_parser_omp_clause_default): Handle private and
firstprivate arguments, adjust diagnostics on unknown argument.
* cp-gimplify.c (cxx_omp_finish_clause): Handle OMP_CLAUSE_PRIVATE.
gcc/testsuite/
* c-c++-common/gomp/default-2.c: New test.
* c-c++-common/gomp/default-3.c: New test.
* g++.dg/gomp/default-1.C: New test.
libgomp/
* testsuite/libgomp.c++/default-1.C: New test.
* testsuite/libgomp.c-c++-common/default-1.c: New test.
* libgomp.texi (OpenMP 5.1): Mark "private and firstprivate argument
to default clause in C and C++" as implemented.

AVX512FP16: Add testcase for scalar FMA instructions.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vfmaddXXXsh-1a.c: New test.
* gcc.target/i386/avx512fp16-vfmaddXXXsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vfmsubXXXsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vfmsubXXXsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vfnmaddXXXsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vfnmaddXXXsh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vfnmsubXXXsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vfnmsubXXXsh-1b.c: Ditto.

AVX512FP16: Add scalar fma instructions.

Add vfmadd[132,213,231]sh/vfnmadd[132,213,231]sh/
vfmsub[132,213,231]sh/vfnmsub[132,213,231]sh.
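
The 132/213/231 suffixes name which operands are multiplied and which is added, with operand 1 being the destination register: 132 computes op1*op3 + op2, 213 computes op2*op1 + op3, and 231 computes op2*op3 + op1.  The scalar model below illustrates only that ordering; it uses float in place of _Float16 and plain arithmetic, so unlike the fused instructions it is not guaranteed to round once.

```c
/* op1 is the destination (and a source); op2 and op3 are the other sources.  */
static float fmadd132 (float op1, float op2, float op3) { return op1 * op3 + op2; }
static float fmadd213 (float op1, float op2, float op3) { return op2 * op1 + op3; }
static float fmadd231 (float op1, float op2, float op3) { return op2 * op3 + op1; }

/* The fmsub, fnmadd and fnmsub forms change only the signs (231 shown).  */
static float fmsub231 (float op1, float op2, float op3)  { return op2 * op3 - op1; }
static float fnmadd231 (float op1, float op2, float op3) { return -(op2 * op3) + op1; }
static float fnmsub231 (float op1, float op2, float op3) { return -(op2 * op3) - op1; }
```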

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm_fmadd_sh):
New intrinsic.
(_mm_mask_fmadd_sh): Likewise.
(_mm_mask3_fmadd_sh): Likewise.
(_mm_maskz_fmadd_sh): Likewise.
(_mm_fmadd_round_sh): Likewise.
(_mm_mask_fmadd_round_sh): Likewise.
(_mm_mask3_fmadd_round_sh): Likewise.
(_mm_maskz_fmadd_round_sh): Likewise.
(_mm_fnmadd_sh): Likewise.
(_mm_mask_fnmadd_sh): Likewise.
(_mm_mask3_fnmadd_sh): Likewise.
(_mm_maskz_fnmadd_sh): Likewise.
(_mm_fnmadd_round_sh): Likewise.
(_mm_mask_fnmadd_round_sh): Likewise.
(_mm_mask3_fnmadd_round_sh): Likewise.
(_mm_maskz_fnmadd_round_sh): Likewise.
(_mm_fmsub_sh): Likewise.
(_mm_mask_fmsub_sh): Likewise.
(_mm_mask3_fmsub_sh): Likewise.
(_mm_maskz_fmsub_sh): Likewise.
(_mm_fmsub_round_sh): Likewise.
(_mm_mask_fmsub_round_sh): Likewise.
(_mm_mask3_fmsub_round_sh): Likewise.
(_mm_maskz_fmsub_round_sh): Likewise.
(_mm_fnmsub_sh): Likewise.
(_mm_mask_fnmsub_sh): Likewise.
(_mm_mask3_fnmsub_sh): Likewise.
(_mm_maskz_fnmsub_sh): Likewise.
(_mm_fnmsub_round_sh): Likewise.
(_mm_mask_fnmsub_round_sh): Likewise.
(_mm_mask3_fnmsub_round_sh): Likewise.
(_mm_maskz_fnmsub_round_sh): Likewise.
* config/i386/i386-builtin-types.def
(V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT): New builtin type.
* config/i386/i386-builtin.def: Add new builtins.
* config/i386/i386-expand.c: Handle new builtin type.
* config/i386/sse.md (fmai_vmfmadd_<mode><round_name>):
Adjust to support FP16.
(fmai_vmfmsub_<mode><round_name>): Ditto.
(fmai_vmfnmadd_<mode><round_name>): Ditto.
(fmai_vmfnmsub_<mode><round_name>): Ditto.
(*fmai_fmadd_<mode>): Ditto.
(*fmai_fmsub_<mode>): Ditto.
(*fmai_fnmadd_<mode><round_name>): Ditto.
(*fmai_fnmsub_<mode><round_name>): Ditto.
(avx512f_vmfmadd_<mode>_mask<round_name>): Ditto.
(avx512f_vmfmadd_<mode>_mask3<round_name>): Ditto.
(avx512f_vmfmadd_<mode>_maskz<round_expand_name>): Ditto.
(avx512f_vmfmadd_<mode>_maskz_1<round_name>): Ditto.
(*avx512f_vmfmsub_<mode>_mask<round_name>): Ditto.
(avx512f_vmfmsub_<mode>_mask3<round_name>): Ditto.
(*avx512f_vmfmsub_<mode>_maskz_1<round_name>): Ditto.
(*avx512f_vmfnmsub_<mode>_mask<round_name>): Ditto.
(*avx512f_vmfnmsub_<mode>_mask3<round_name>): Ditto.
(*avx512f_vmfnmsub_<mode>_mask<round_name>): Ditto.
(*avx512f_vmfnmadd_<mode>_mask<round_name>): Renamed to ...
(avx512f_vmfnmadd_<mode>_mask<round_name>) ... this, and
adjust to support FP16.
(avx512f_vmfnmadd_<mode>_mask3<round_name>): Ditto.
(avx512f_vmfnmadd_<mode>_maskz_1<round_name>): Ditto.
(avx512f_vmfnmadd_<mode>_maskz<round_expand_name>): New
expander.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.

AVX512FP16: Enable FP16 mask load/store.

gcc/ChangeLog:

* config/i386/sse.md (avx512fmaskmodelower): Extend to support
HF modes.
(maskload<mode><avx512fmaskmodelower>): Ditto.
(maskstore<mode><avx512fmaskmodelower>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-xorsign-1.c: New test.

AVX512FP16: Add testcase for fp16 bitwise operations.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-neg-1a.c: New test.
* gcc.target/i386/avx512fp16-neg-1b.c: Ditto.
* gcc.target/i386/avx512fp16-scalar-bitwise-1a.c: Ditto.
* gcc.target/i386/avx512fp16-scalar-bitwise-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vector-bitwise-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vector-bitwise-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-neg-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-neg-1b.c: Ditto.

AVX512FP16: Add scalar/vector bitwise operations, including

1. FP16 vector xor/ior/and/andnot/abs/neg
2. FP16 scalar abs/neg/copysign/xorsign
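
All of these operations touch only the sign bit (bit 15) of an IEEE
binary16 value, which is why they can be expanded as and/xor/andnot against a
sign-bit mask constant. The sketch below shows that bit arithmetic on raw
16-bit encodings in portable C++. It is an illustration of the technique,
not the i386 expansion itself, and the fp16_* helper names are hypothetical.
xorsign(x, y) here means x * sign(y), the semantics of GCC's xorsign optab.

```cpp
#include <cstdint>

// binary16 layout: bit 15 = sign, bits 14..10 = exponent, bits 9..0 = mantissa.
constexpr std::uint16_t kSignMask = 0x8000;
constexpr std::uint16_t kMagMask = 0x7FFF;

// neg: flip the sign bit.
std::uint16_t fp16_neg(std::uint16_t x) {
  return static_cast<std::uint16_t>(x ^ kSignMask);
}

// abs: clear the sign bit.
std::uint16_t fp16_abs(std::uint16_t x) {
  return static_cast<std::uint16_t>(x & kMagMask);
}

// copysign: magnitude of x, sign of y.
std::uint16_t fp16_copysign(std::uint16_t x, std::uint16_t y) {
  return static_cast<std::uint16_t>((x & kMagMask) | (y & kSignMask));
}

// xorsign: x * sign(y), i.e. flip x's sign when y is negative.
std::uint16_t fp16_xorsign(std::uint16_t x, std::uint16_t y) {
  return static_cast<std::uint16_t>(x ^ (y & kSignMask));
}
```

For example, 1.0 is encoded as 0x3C00 and -2.0 as 0xC000, so
fp16_copysign(0x3C00, 0xC000) yields 0xBC00 (-1.0).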

gcc/ChangeLog:

* config/i386/i386-expand.c (ix86_expand_fp_absneg_operator):
Handle HFmode.
(ix86_expand_copysign): Ditto.
(ix86_expand_xorsign): Ditto.
* config/i386/i386.c (ix86_build_const_vector): Handle HF vector
modes.
(ix86_build_signbit_mask): Ditto.
(ix86_can_change_mode_class): Ditto.
* config/i386/i386.md
(SSEMODEF): Add HFmode.
(ssevecmodef): Ditto.
(<code>hf2): New define_expand.
(*<code>hf2_1): New define_insn_and_split.
(copysign<mode>): Extend to support HFmode under AVX512FP16.
(xorsign<mode>): Ditto.
* config/i386/sse.md (VFB): New mode iterator.
(VFB_128_256): Ditto.
(VFB_512): Ditto.
(sseintvecmode2): Support HF vector mode.
(<code><mode>2): Use new mode iterator.
(*<code><mode>2): Ditto.
(copysign<mode>3): Ditto.
(xorsign<mode>3): Ditto.
(<code><mode>3<mask_name>): Ditto.
(<code><mode>3<mask_name>): Ditto.
(<sse>_andnot<mode>3<mask_name>): Adjust for HF vector mode.
(<sse>_andnot<mode>3<mask_name>): Ditto.
(*<code><mode>3<mask_name>): Ditto.
(*<code><mode>3<mask_name>): Ditto.

AVX512FP16: Add testcase for fma instructions

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vfmaddXXXph-1a.c: New test.
* gcc.target/i386/avx512fp16-vfmaddXXXph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vfmsubXXXph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vfmsubXXXph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vfnmaddXXXph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vfnmaddXXXph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vfnmsubXXXph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vfnmsubXXXph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vfmaddXXXph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vfmaddXXXph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vfmsubXXXph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vfmsubXXXph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vfnmaddXXXph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vfnmsubXXXph-1b.c: Ditto.

AVX512FP16: Add FP16 fma instructions.

Add vfmadd[132,213,231]ph/vfnmadd[132,213,231]ph/vfmsub[132,213,231]ph/
vfnmsub[132,213,231]ph.
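
In the Intel SDM naming, the digits give the operand order, with operand 1
also serving as the destination: 132 computes op1*op3 + op2, 213 computes
op2*op1 + op3, and 231 computes op2*op3 + op1. The intrinsics always express
a*b + c (with the signs of the variant), and the compiler typically chooses
whichever form suits register allocation. The sketch below is a hypothetical
per-lane reference model using float in place of _Float16.

```cpp
#include <cmath>

// Operand orderings; op1 is the destination register in the instruction.
float ref_fmadd132(float op1, float op2, float op3) { return std::fma(op1, op3, op2); }
float ref_fmadd213(float op1, float op2, float op3) { return std::fma(op2, op1, op3); }
float ref_fmadd231(float op1, float op2, float op3) { return std::fma(op2, op3, op1); }

// Sign variants written as intrinsic-style a*b +/- c (single rounding via fma).
float ref_fmsub(float a, float b, float c) { return std::fma(a, b, -c); }    // a*b - c
float ref_fnmadd(float a, float b, float c) { return std::fma(-a, b, c); }   // -(a*b) + c
float ref_fnmsub(float a, float b, float c) { return std::fma(-a, b, -c); }  // -(a*b) - c
```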

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm512_mask_fmadd_ph):
New intrinsic.
(_mm512_mask3_fmadd_ph): Likewise.
(_mm512_maskz_fmadd_ph): Likewise.
(_mm512_fmadd_round_ph): Likewise.
(_mm512_mask_fmadd_round_ph): Likewise.
(_mm512_mask3_fmadd_round_ph): Likewise.
(_mm512_maskz_fmadd_round_ph): Likewise.
(_mm512_fnmadd_ph): Likewise.
(_mm512_mask_fnmadd_ph): Likewise.
(_mm512_mask3_fnmadd_ph): Likewise.
(_mm512_maskz_fnmadd_ph): Likewise.
(_mm512_fnmadd_round_ph): Likewise.
(_mm512_mask_fnmadd_round_ph): Likewise.
(_mm512_mask3_fnmadd_round_ph): Likewise.
(_mm512_maskz_fnmadd_round_ph): Likewise.
(_mm512_fmsub_ph): Likewise.
(_mm512_mask_fmsub_ph): Likewise.
(_mm512_mask3_fmsub_ph): Likewise.
(_mm512_maskz_fmsub_ph): Likewise.
(_mm512_fmsub_round_ph): Likewise.
(_mm512_mask_fmsub_round_ph): Likewise.
(_mm512_mask3_fmsub_round_ph): Likewise.
(_mm512_maskz_fmsub_round_ph): Likewise.
(_mm512_fnmsub_ph): Likewise.
(_mm512_mask_fnmsub_ph): Likewise.
(_mm512_mask3_fnmsub_ph): Likewise.
(_mm512_maskz_fnmsub_ph): Likewise.
(_mm512_fnmsub_round_ph): Likewise.
(_mm512_mask_fnmsub_round_ph): Likewise.
(_mm512_mask3_fnmsub_round_ph): Likewise.
(_mm512_maskz_fnmsub_round_ph): Likewise.
* config/i386/avx512fp16vlintrin.h (_mm256_fmadd_ph):
New intrinsic.
(_mm256_mask_fmadd_ph): Likewise.
(_mm256_mask3_fmadd_ph): Likewise.
(_mm256_maskz_fmadd_ph): Likewise.
(_mm_fmadd_ph): Likewise.
(_mm_mask_fmadd_ph): Likewise.
(_mm_mask3_fmadd_ph): Likewise.
(_mm_maskz_fmadd_ph): Likewise.
(_mm256_fnmadd_ph): Likewise.
(_mm256_mask_fnmadd_ph): Likewise.
(_mm256_mask3_fnmadd_ph): Likewise.
(_mm256_maskz_fnmadd_ph): Likewise.
(_mm_fnmadd_ph): Likewise.
(_mm_mask_fnmadd_ph): Likewise.
(_mm_mask3_fnmadd_ph): Likewise.
(_mm_maskz_fnmadd_ph): Likewise.
(_mm256_fmsub_ph): Likewise.
(_mm256_mask_fmsub_ph): Likewise.
(_mm256_mask3_fmsub_ph): Likewise.
(_mm256_maskz_fmsub_ph): Likewise.
(_mm_fmsub_ph): Likewise.
(_mm_mask_fmsub_ph): Likewise.
(_mm_mask3_fmsub_ph): Likewise.
(_mm_maskz_fmsub_ph): Likewise.
(_mm256_fnmsub_ph): Likewise.
(_mm256_mask_fnmsub_ph): Likewise.
(_mm256_mask3_fnmsub_ph): Likewise.
(_mm256_maskz_fnmsub_ph): Likewise.
(_mm_fnmsub_ph): Likewise.
(_mm_mask_fnmsub_ph): Likewise.
(_mm_mask3_fnmsub_ph): Likewise.
(_mm_maskz_fnmsub_ph): Likewise.
* config/i386/i386-builtin.def: Add corresponding new builtins.
* config/i386/sse.md
(<avx512>_fmadd_<mode>_maskz<round_expand_name>): Adjust to
support HF vector modes.
(<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name><round_name>):
Ditto.
(*<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>_bcst_1): Ditto.
(*<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>_bcst_2): Ditto.
(*<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>_bcst_3): Ditto.
(<avx512>_fmadd_<mode>_mask<round_name>): Ditto.
(<avx512>_fmadd_<mode>_mask3<round_name>): Ditto.
(<avx512>_fmsub_<mode>_maskz<round_expand_name>): Ditto.
(<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name><round_name>):
Ditto.
(*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_1): Ditto.
(*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_2): Ditto.
(*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_3): Ditto.
(<avx512>_fmsub_<mode>_mask<round_name>): Ditto.
(<avx512>_fmsub_<mode>_mask3<round_name>): Ditto.
(<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name><round_name>):
Ditto.
(*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_1): Ditto.
(*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_2): Ditto.
(*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_3): Ditto.
(<avx512>_fnmadd_<mode>_mask<round_name>): Ditto.
(<avx512>_fnmadd_<mode>_mask3<round_name>): Ditto.
(<avx512>_fnmsub_<mode>_maskz<round_expand_name>): Ditto.
(<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name><round_name>):
Ditto.
(*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_1): Ditto.
(*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_2): Ditto.
(*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_3): Ditto.
(<avx512>_fnmsub_<mode>_mask<round_name>): Ditto.
(<avx512>_fnmsub_<mode>_mask3<round_name>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.

AVX512FP16: Add testcase for vfmaddsub[132,213,231]ph/vfmsubadd[132,213,231]ph.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vfmaddsubXXXph-1a.c: New test.
* gcc.target/i386/avx512fp16-vfmaddsubXXXph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vfmsubaddXXXph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vfmsubaddXXXph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vfmaddsubXXXph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vfmsubaddXXXph-1b.c: Ditto.

AVX512FP16: Add vfmaddsub[132,213,231]ph/vfmsubadd[132,213,231]ph.
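
fmaddsub and fmsubadd alternate between subtracting and adding c across
lanes. For fmaddsub, even-indexed lanes compute a*b - c and odd lanes
a*b + c; fmsubadd does the opposite. A hypothetical element-wise reference
model, again with float standing in for _Float16:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// fmaddsub: even lanes a*b - c, odd lanes a*b + c (inputs assumed equal length).
std::vector<float> ref_fmaddsub(const std::vector<float>& a, const std::vector<float>& b,
                                const std::vector<float>& c) {
  std::vector<float> r(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    r[i] = (i % 2 == 0) ? std::fma(a[i], b[i], -c[i]) : std::fma(a[i], b[i], c[i]);
  return r;
}

// fmsubadd: even lanes a*b + c, odd lanes a*b - c.
std::vector<float> ref_fmsubadd(const std::vector<float>& a, const std::vector<float>& b,
                                const std::vector<float>& c) {
  std::vector<float> r(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    r[i] = (i % 2 == 0) ? std::fma(a[i], b[i], c[i]) : std::fma(a[i], b[i], -c[i]);
  return r;
}
```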

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm512_fmaddsub_ph):
New intrinsic.
(_mm512_mask_fmaddsub_ph): Likewise.
(_mm512_mask3_fmaddsub_ph): Likewise.
(_mm512_maskz_fmaddsub_ph): Likewise.
(_mm512_fmaddsub_round_ph): Likewise.
(_mm512_mask_fmaddsub_round_ph): Likewise.
(_mm512_mask3_fmaddsub_round_ph): Likewise.
(_mm512_maskz_fmaddsub_round_ph): Likewise.
(_mm512_mask_fmsubadd_ph): Likewise.
(_mm512_mask3_fmsubadd_ph): Likewise.
(_mm512_maskz_fmsubadd_ph): Likewise.
(_mm512_fmsubadd_round_ph): Likewise.
(_mm512_mask_fmsubadd_round_ph): Likewise.
(_mm512_mask3_fmsubadd_round_ph): Likewise.
(_mm512_maskz_fmsubadd_round_ph): Likewise.
* config/i386/avx512fp16vlintrin.h (_mm256_fmaddsub_ph):
New intrinsic.
(_mm256_mask_fmaddsub_ph): Likewise.
(_mm256_mask3_fmaddsub_ph): Likewise.
(_mm256_maskz_fmaddsub_ph): Likewise.
(_mm_fmaddsub_ph): Likewise.
(_mm_mask_fmaddsub_ph): Likewise.
(_mm_mask3_fmaddsub_ph): Likewise.
(_mm_maskz_fmaddsub_ph): Likewise.
(_mm256_fmsubadd_ph): Likewise.
(_mm256_mask_fmsubadd_ph): Likewise.
(_mm256_mask3_fmsubadd_ph): Likewise.
(_mm256_maskz_fmsubadd_ph): Likewise.
(_mm_fmsubadd_ph): Likewise.
(_mm_mask_fmsubadd_ph): Likewise.
(_mm_mask3_fmsubadd_ph): Likewise.
(_mm_maskz_fmsubadd_ph): Likewise.
* config/i386/i386-builtin.def: Add corresponding new builtins.
* config/i386/sse.md (VFH_SF_AVX512VL): New mode iterator.
(<avx512>_fmsubadd_<mode>_maskz<round_expand_name>): New expander.
(<avx512>_fmaddsub_<mode>_maskz<round_expand_name>): Use
VFH_SF_AVX512VL.
(<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name><round_name>):
Ditto.
(<avx512>_fmaddsub_<mode>_mask<round_name>): Ditto.
(<avx512>_fmaddsub_<mode>_mask3<round_name>): Ditto.
(<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name><round_name>):
Ditto.
(<avx512>_fmsubadd_<mode>_mask<round_name>): Ditto.
(<avx512>_fmsubadd_<mode>_mask3<round_name>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.

Support embedded broadcast for AVX512FP16 instructions.

gcc/ChangeLog:

PR target/87767
* config/i386/i386.c (ix86_print_operand): Handle
V8HF/V16HF/V32HFmode.
* config/i386/i386.h (VALID_BCST_MODE_P): Add HFmode.
* config/i386/sse.md (avx512bcst): Remove.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-broadcast-1.c: New test.
* gcc.target/i386/avx512fp16-broadcast-2.c: New test.

c++: improve lookup of member-qualified names

I've been working on the resolution of CWG1835 by P1787, which among many
other things clarified that a name after -> or . is looked up first in the
class of the object expression even if it's dependent. This patch does not
make that change; this is a smaller change extracted from that work in
progress to make the lookup in the object type work better in cases where
unqualified lookup doesn't find anything.

Basically, if we see "t.foo::" we know that looking up foo in t needs to
find a type, so we build an implicit TYPENAME_TYPE for it.

This also implements the change from P1787 to assume that a name followed by
< in a type-only context names a template, since the less-than operator
can't appear in a type context. This makes some of the lines in dtor11.C
work.

I introduce the predicate 'dependentish_scope_p' for the case where the
current instantiation has dependent bases: even though we can perform
name lookup, we can't treat a lookup failure as conclusive.

gcc/cp/ChangeLog:

* cp-tree.h (dependentish_scope_p): Declare.
* pt.c (dependentish_scope_p): New.
* parser.c (cp_parser_lookup_name): Return a TYPENAME_TYPE
for lookup of a type in a dependent object.
(cp_parser_template_id): Handle TYPENAME_TYPE.
(cp_parser_template_name): If we're looking for a type,
a name followed by < names a template.

gcc/testsuite/ChangeLog:

* g++.dg/template/dtor5.C: Adjust expected error.
* g++.dg/cpp23/lookup2.C: New test.
* g++.dg/template/dtor11.C: New test.

c++: fix comment typo

gcc/cp/ChangeLog:

* cp-tree.h: Fix typo in LANG_FLAG list.

Daily bump.

Factor predicate analysis out of tree-ssa-uninit.c into its own module.

gcc/ChangeLog:

* Makefile.in (OBJS): Add gimple-predicate-analysis.o.
* tree-ssa-uninit.c (max_phi_args): Move to gimple-predicate-analysis.
(MASK_SET_BIT, MASK_TEST_BIT, MASK_EMPTY): Same.
(check_defs): Add comment.
(can_skip_redundant_opnd): Update comment.
(compute_uninit_opnds_pos): Adjust to namespace change.
(find_pdom): Move to gimple-predicate-analysis.cc.
(find_dom): Same.
(struct uninit_undef_val_t): New.
(is_non_loop_exit_postdominating): Move to gimple-predicate-analysis.cc.
(find_control_equiv_block): Same.
(MAX_NUM_CHAINS, MAX_CHAIN_LEN, MAX_POSTDOM_CHECK): Same.
(MAX_SWITCH_CASES): Same.
(compute_control_dep_chain): Same.
(find_uninit_use): Use predicate analyzer.
(struct pred_info): Move to gimple-predicate-analysis.
(convert_control_dep_chain_into_preds): Same.
(find_predicates): Same.
(collect_phi_def_edges): Same.
(warn_uninitialized_phi): Use predicate analyzer.
(find_def_preds): Move to gimple-predicate-analysis.
(dump_pred_info): Same.
(dump_pred_chain): Same.
(dump_predicates): Same.
(destroy_predicate_vecs): Remove.
(execute_late_warn_uninitialized): New.
(get_cmp_code): Move to gimple-predicate-analysis.
(is_value_included_in): Same.
(value_sat_pred_p): Same.
(find_matching_predicate_in_rest_chains): Same.
(is_use_properly_guarded): Same.
(prune_uninit_phi_opnds): Same.
(find_var_cmp_const): Same.
(use_pred_not_overlap_with_undef_path_pred): Same.
(pred_equal_p): Same.
(is_neq_relop_p): Same.
(is_neq_zero_form_p): Same.
(pred_expr_equal_p): Same.
(is_pred_expr_subset_of): Same.
(is_pred_chain_subset_of): Same.
(is_included_in): Same.
(is_superset_of): Same.
(pred_neg_p): Same.
(simplify_pred): Same.
(simplify_preds_2): Same.
(simplify_preds_3): Same.
(simplify_preds_4): Same.
(simplify_preds): Same.
(push_pred): Same.
(push_to_worklist): Same.
(get_pred_info_from_cmp): Same.
(is_degenerated_phi): Same.
(normalize_one_pred_1): Same.
(normalize_one_pred): Same.
(normalize_one_pred_chain): Same.
(normalize_preds): Same.
(can_one_predicate_be_invalidated_p): Same.
(can_chain_union_be_invalidated_p): Same.
(uninit_uses_cannot_happen): Same.
(pass_late_warn_uninitialized::execute): Define.
* gimple-predicate-analysis.cc: New file.
* gimple-predicate-analysis.h: New file.

Fortran - (large) arrays in the main shall be static

gcc/fortran/ChangeLog:

PR fortran/102366
* trans-decl.c (gfc_finish_var_decl): Disable the warning message
for variables moved from stack to static storage if they are
declared in the main program, but allow the move to happen.

gcc/testsuite/ChangeLog:

PR fortran/102366
* gfortran.dg/pr102366.f90: New test.

libstdc++: Add 'noexcept' to path::iterator members

All path::iterator operations are non-throwing.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* include/bits/fs_path.h (path::iterator): Add noexcept to all
member functions and friend functions.
(distance): Add noexcept.
(advance): Add noexcept and inline.
* include/experimental/bits/fs_path.h (path::iterator):
Add noexcept to all member functions.

libstdc++: Fix last std::tuple constructor missing 'constexpr' [PR102270]

Also rename the test so it actually runs.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

PR libstdc++/102270
* include/std/tuple (_Tuple_impl): Add constexpr to constructor
missed in previous patch.
* testsuite/20_util/tuple/cons/102270.C: Moved to...
* testsuite/20_util/tuple/cons/102270.cc: ...here.
* testsuite/util/testsuite_allocator.h (SimpleAllocator): Add
constexpr to constructor so it can be used for C++20 tests.

openacc: Remove unnecessary barriers (gimple worker partitioning/broadcast)

This is an optimisation for middle-end worker-partitioning support (used
to support multiple workers on AMD GCN).  At present, barriers may be
emitted in cases where they aren't needed and cannot be optimised away.
This patch stops the extraneous barriers from being emitted in the
first place.

One exception to the above (where the barrier is still needed) is for
predicated blocks of code that perform a write to gang-private shared
memory from one worker.  We must execute a barrier before other workers
read that shared memory location.

gcc/
* config/gcn/gcn.c (gimple.h): Include.
(gcn_fork_join): Emit barrier for worker-level joins.
* omp-oacc-neuter-broadcast.cc (find_local_vars_to_propagate): Add
writes_gang_private bitmap parameter. Set bit for blocks
containing gang-private variable writes.
(worker_single_simple): Don't emit barrier after predicated block.
(worker_single_copy): Don't emit barrier if we're not broadcasting
anything and the block contains no gang-private writes.
(neuter_worker_single): Don't predicate blocks that only contain
NOPs or internal marker functions.  Pass has_gang_private_write
argument to worker_single_copy.
(oacc_do_neutering): Add writes_gang_private bitmap handling.

openacc: Shared memory layout optimisation

This patch implements an algorithm to lay out local data-share (LDS)
space.  It currently works for AMD GCN.  At the moment, LDS is used for
three things:

  1. Gang-private variables
  2. Reduction temporaries (accumulators)
  3. Broadcasting for worker partitioning

After the patch is applied, (2) and (3) are placed at preallocated
locations in LDS, and (1) continues to be handled by the backend (as it
was before this patch). LDS now looks like this:

  +--------------+ (gang-private size + 1024, = 1536)
  | free space   |
  |    ...       |
  | - - - - - - -|
  | worker bcast |
  +--------------+
  | reductions   |
  +--------------+ <<< -mgang-private-size=<number> (def. 512)
  | gang-private |
  |    vars      |
  +--------------+ (32)
  | low LDS vars |
  +--------------+ LDS base

So, gang-private space is fixed at a constant amount at compile time
(which can be increased with -mgang-private-size= if a given program
needs more). The layout algorithm takes out a slice of the
remainder of usable space for reduction vars, and uses the rest for
worker partitioning.

The partitioning algorithm works as follows.

1. An "adjacency" set is built up for each basic block that might
    do a broadcast. This is calculated by starting at each such block,
    and doing a recursive DFS walk over successors to find the next
    block (or blocks) that *also* does a broadcast
    (dfs_broadcast_reachable_1).

2. The adjacency set is inverted to get adjacent predecessor blocks also.

3. Blocks that will perform a broadcast are sorted by size of that
    broadcast: the biggest blocks are handled first.

4. A splay tree structure is used to calculate the spans of LDS memory
    that are already allocated by the blocks adjacent to this one
    (merge_ranges{,_1}).

5. The current block's broadcast space is allocated from the first free
    span not allocated in the splay tree structure calculated above
    (first_fit_range). This seems to work quite nicely and efficiently
    with the splay tree structure.

6. Continue with the next-biggest broadcast block until we're done.

In this way, "adjacent" broadcasts will not use the same piece of
LDS memory.
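
The six steps above can be sketched in C++ as follows. This is a simplified
model under stated assumptions, not the code in
omp-oacc-neuter-broadcast.cc: block ids, sizes, and the BcastBlock and
layout_broadcasts names are hypothetical, a sorted std::vector replaces the
splay tree, and alignment is ignored.

```cpp
#include <algorithm>
#include <map>
#include <optional>
#include <set>
#include <utility>
#include <vector>

// Hypothetical input: a broadcasting block, its broadcast size in bytes, and
// the broadcasting blocks reachable from it (the result of step 1).
struct BcastBlock {
  int id;
  unsigned size;
  std::set<int> succ_adj;
};

// Returns block id -> offset within [lo, hi), or std::nullopt if a block
// does not fit.
std::optional<std::map<int, unsigned>>
layout_broadcasts(std::vector<BcastBlock> blocks, unsigned lo, unsigned hi) {
  // Step 2: make adjacency symmetric by adding predecessor edges.
  std::map<int, std::set<int>> adj;
  for (const auto& b : blocks)
    for (int s : b.succ_adj) {
      adj[b.id].insert(s);
      adj[s].insert(b.id);
    }

  // Step 3: biggest broadcasts first (stable, so ties keep input order).
  std::stable_sort(blocks.begin(), blocks.end(),
                   [](const BcastBlock& x, const BcastBlock& y) { return x.size > y.size; });

  std::map<int, unsigned> offset, size_of;
  for (const auto& b : blocks) {  // Step 6: one block at a time.
    // Step 4: ranges [start, end) already taken by adjacent, placed blocks.
    std::vector<std::pair<unsigned, unsigned>> used;
    for (int n : adj[b.id])
      if (auto it = offset.find(n); it != offset.end())
        used.emplace_back(it->second, it->second + size_of[n]);
    std::sort(used.begin(), used.end());

    // Step 5: first fit; skip past each overlapping range until a gap fits.
    unsigned cand = lo;
    for (const auto& [start, end] : used) {
      if (start >= cand + b.size) break;  // the gap before this range suffices
      cand = std::max(cand, end);
    }
    if (cand + b.size > hi) return std::nullopt;
    offset[b.id] = cand;
    size_of[b.id] = b.size;
  }
  return offset;
}
```

With blocks 1 -> 2 -> 3 of sizes 64, 32 and 64, blocks 1 and 3 are not
adjacent and can share offset 0, while block 2 is placed at 64 so it
overlaps neither neighbour.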

PR96334 "openacc: Unshare reduction temporaries for GCN" got merged in:

The GCN backend uses tree nodes like MEM((__lds TYPE *) <constant>)
for reduction temporaries. Unlike e.g. var decls and SSA names, these
nodes must not be shared during gimplification, yet in some
circumstances they were. This is detected when appropriate
--enable-checking options are used. This patch unshares such nodes when
they are used more than once.

gcc/
* config/gcn/gcn-protos.h
(gcn_goacc_create_worker_broadcast_record): Update prototype.
* config/gcn/gcn-tree.c (gcn_goacc_get_worker_red_decl): Use
preallocated block of LDS memory.  Do not cache/share decls for
reduction temporaries between invocations.
(gcn_goacc_reduction_teardown): Unshare VAR on second use.
(gcn_goacc_create_worker_broadcast_record): Add OFFSET parameter
and return temporary LDS space at that offset.  Return pointer in
"sender" case.
* config/gcn/gcn.c (acc_lds_size, gang_private_hwm, lds_allocs):
New global vars.
(ACC_LDS_SIZE): Define as acc_lds_size.
(gcn_init_machine_status): Don't initialise lds_allocated,
lds_allocs, reduc_decls fields of machine function struct.
(gcn_option_override): Handle default size for gang-private
variables and -mgang-private-size option.
(gcn_expand_prologue): Use LDS_SIZE instead of LDS_SIZE-1 when
initialising M0_REG.
(gcn_shared_mem_layout): New function.
(gcn_print_lds_decl): Update comment. Use global lds_allocs map and
gang_private_hwm variable.
(TARGET_GOACC_SHARED_MEM_LAYOUT): Define target hook.
* config/gcn/gcn.h (machine_function): Remove lds_allocated,
lds_allocs, reduc_decls. Add reduction_base, reduction_limit.
* config/gcn/gcn.opt (gang_private_size_opt): New global.
(mgang-private-size=): New option.
* doc/tm.texi.in (TARGET_GOACC_SHARED_MEM_LAYOUT): Place
documentation hook.
* doc/tm.texi: Regenerate.
* omp-oacc-neuter-broadcast.cc (targhooks.h, diagnostic-core.h):
Add includes.
(build_sender_ref): Handle sender_decl being pointer.
(worker_single_copy): Add PLACEMENT and ISOLATE_BROADCASTS
parameters.  Pass placement argument to
create_worker_broadcast_record hook invocations.  Handle
sender_decl being pointer and isolate_broadcasts inserting extra
barriers.
(blk_offset_map_t): Add typedef.
(neuter_worker_single): Add BLK_OFFSET_MAP parameter.  Pass
preallocated range to worker_single_copy call.
(dfs_broadcast_reachable_1): New function.
(idx_decl_pair_t, used_range_vec_t): New typedefs.
(sort_size_descending): New function.
(addr_range): New class.
(splay_tree_compare_addr_range, splay_tree_free_key)
(first_fit_range, merge_ranges_1, merge_ranges): New functions.
(execute_omp_oacc_neuter_broadcast): Rename to...
(oacc_do_neutering): ... this.  Add BOUNDS_LO, BOUNDS_HI
parameters.  Arrange layout of shared memory for broadcast
operations.
(execute_omp_oacc_neuter_broadcast): New function.
(pass_omp_oacc_neuter_broadcast::gate): Remove num_workers==1
handling from here.  Enable pass for all OpenACC routines in order
to call shared memory-layout hook.
* target.def (create_worker_broadcast_record): Add OFFSET
parameter.
(shared_mem_layout): New hook.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: Update.

openacc: Turn off worker partitioning if num_workers==1

This patch turns off the middle-end worker-partitioning support if the
number of workers for an outlined offload function is one. In that case,
we do not need to perform the broadcasting/neutering code transformation.

gcc/
* omp-oacc-neuter-broadcast.cc
(pass_omp_oacc_neuter_broadcast::gate): Disable if num_workers is
1.
(execute_omp_oacc_neuter_broadcast): Adjust.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>

Add 'libgomp.oacc-c-c++-common/broadcast-many.c'

libgomp/
* testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: New test.

Provide a relation oracle for paths.

This provides a path_oracle class which can optionally be used in conjunction
with another oracle to track relations on a path as it is walked.
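
The idea can be illustrated with a small C++ sketch: relations registered
while walking a path are kept in a list that reset_path() discards, and any
query the path cannot answer falls back to an optional root oracle. Only the
method names register_relation, query_relation and reset_path come from the
ChangeLog below; the Rel enum, the Oracle base, integer SSA names and the
lookup strategy are hypothetical simplifications (GCC's path_oracle also
tracks equivalences, which this sketch omits).

```cpp
#include <vector>

// Hypothetical relation kinds; SSA names are modelled as ints.
enum class Rel { None, LT, LE, GT, GE, EQ, NE };

struct Oracle {
  virtual ~Oracle() = default;
  virtual Rel query_relation(int a, int b) const = 0;
};

class PathOracle : public Oracle {
 public:
  explicit PathOracle(const Oracle* root = nullptr) : root_(root) {}

  void register_relation(int a, int b, Rel r) { path_.push_back({a, b, r}); }

  Rel query_relation(int a, int b) const override {
    // Search newest-first so later registrations on the path win.
    for (auto it = path_.rbegin(); it != path_.rend(); ++it) {
      if (it->a == a && it->b == b) return it->r;
      if (it->a == b && it->b == a) return swap_rel(it->r);
    }
    return root_ ? root_->query_relation(a, b) : Rel::None;
  }

  // Forget everything learned on the current path; the root is untouched.
  void reset_path() { path_.clear(); }

 private:
  struct Entry { int a, b; Rel r; };

  // Relation of (b, a) given the relation of (a, b).
  static Rel swap_rel(Rel r) {
    switch (r) {
      case Rel::LT: return Rel::GT;
      case Rel::LE: return Rel::GE;
      case Rel::GT: return Rel::LT;
      case Rel::GE: return Rel::LE;
      default: return r;  // EQ, NE and None are symmetric
    }
  }

  const Oracle* root_;
  std::vector<Entry> path_;
};
```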

* value-relation.cc (class equiv_chain): Move to header file.
(path_oracle::path_oracle): New.
(path_oracle::~path_oracle): New.
(path_oracle::register_relation): New.
(path_oracle::query_relation): New.
(path_oracle::reset_path): New.
(path_oracle::dump): New.
* value-relation.h (class equiv_chain): Move to here.
(class path_oracle): New.

Virtualize relation oracle and various cleanups.

Standardize equiv_oracle API onto the new relation_oracle virtual base, and
then have dom_oracle inherit from that.
equiv_set always returns an equivalency set now, never NULL.
EQ_EXPR requires symmetry now. Each SSA name must be in the other equiv set.
Shuffle some routines around, simplify.
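
A minimal sketch of two of the API rules above: equiv_set never returning
NULL (a name with no equivalences gets a cached singleton set containing
itself), and EQ registration being symmetric (each name ends up in the
other's set). The class names echo the ChangeLog below, but every signature
and the shared-set representation are hypothetical, and only equivalence is
modelled.

```cpp
#include <map>
#include <memory>
#include <set>

enum class Rel { None, EQ };  // only equivalence is modelled here

// Pure virtual base, loosely standing in for relation_oracle.
class RelationOracle {
 public:
  virtual ~RelationOracle() = default;
  virtual void register_relation(int a, Rel r, int b) = 0;
  virtual Rel query_relation(int a, int b) const = 0;
  virtual const std::set<int>& equiv_set(int name) = 0;
};

class EquivOracle : public RelationOracle {
 public:
  void register_relation(int a, Rel r, int b) override {
    if (r != Rel::EQ) return;
    // Merge both classes so every member sees every other member.
    std::set<int> merged = equiv_set(a);
    const std::set<int>& other = equiv_set(b);
    merged.insert(other.begin(), other.end());
    auto shared = std::make_shared<std::set<int>>(merged);
    for (int n : merged) sets_[n] = shared;
  }

  Rel query_relation(int a, int b) const override {
    if (a == b) return Rel::EQ;
    auto it = sets_.find(a);
    return (it != sets_.end() && it->second->count(b)) ? Rel::EQ : Rel::None;
  }

  // Never null: fall back to a cached singleton {name}.
  const std::set<int>& equiv_set(int name) override {
    auto it = sets_.find(name);
    if (it != sets_.end()) return *it->second;
    std::set<int>& self = self_[name];
    self = {name};
    return self;
  }

 private:
  std::map<int, std::shared_ptr<std::set<int>>> sets_;
  std::map<int, std::set<int>> self_;  // self-equivalence cache
};
```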

* gimple-range-cache.cc (ranger_cache::ranger_cache): Create a DOM
based oracle.
* gimple-range-fold.cc (fur_depend::register_relation): Use
register_stmt/edge routines.
* value-relation.cc (equiv_chain::find): Relocate from equiv_oracle.
(equiv_oracle::equiv_oracle): Create self equivalence cache.
(equiv_oracle::~equiv_oracle): Release same.
(equiv_oracle::equiv_set): Return entry from self equiv cache if there
are no equivalences.
(equiv_oracle::find_equiv_block): Move list find to equiv_chain.
(equiv_oracle::register_relation): Rename from register_equiv.
(relation_chain_head::find_relation): Relocate from dom_oracle.
(relation_oracle::register_stmt): New.
(relation_oracle::register_edge): New.
(dom_oracle::*): Rename from relation_oracle.
(dom_oracle::register_relation): Adjust to call equiv_oracle.
(dom_oracle::set_one_relation): Split from register_relation.
(dom_oracle::register_transitives): Consolidate 2 methods.
(dom_oracle::find_relation_block): Move core to relation_chain.
(dom_oracle::query_relation): Rename from find_relation_dom and adjust.
* value-relation.h (class relation_oracle): New pure virtual base.
(class equiv_oracle): Inherit from relation_oracle and adjust.
(class dom_oracle): Rename from old relation_oracle and adjust.

testsuite: Fix gcc.target/i386/auto-init-* tests.

This set of tests failed on many different combinations of -march and
-mtune. Some of them also failed with -fstack-protector-all or -mno-sse,
and the expected patterns differ between lp64 and ia32.

The reason for these failures is that the RTL- and assembly-level pattern
matches are only valid for -march=x86-64 -mtune=generic.

We restrict testing to -march=x86-64 and -mtune=generic, and add
-fno-stack-protector or -msse to some of the test cases.

gcc/testsuite/ChangeLog:

2021-09-17 qing zhao <qing.zhao@oracle.com>

* gcc.target/i386/auto-init-1.c: Restrict the testing only for
-march=x86-64 and -mtune=generic. Add -fno-stack-protector.
* gcc.target/i386/auto-init-2.c: Restrict the testing only for
-march=x86-64 and -mtune=generic -msse.
* gcc.target/i386/auto-init-3.c: Likewise.
* gcc.target/i386/auto-init-4.c: Likewise.
* gcc.target/i386/auto-init-5.c: Different pattern match for lp64 and
ia32.
* gcc.target/i386/auto-init-6.c: Restrict the testing only for
-march=x86-64 and -mtune=generic -msse. Add -fno-stack-protector.
* gcc.target/i386/auto-init-7.c: Likewise.
* gcc.target/i386/auto-init-8.c: Restrict the testing only for
-march=x86-64 and -mtune=generic -msse.
* gcc.target/i386/auto-init-padding-1.c: Likewise.
* gcc.target/i386/auto-init-padding-10.c: Likewise.
* gcc.target/i386/auto-init-padding-11.c: Likewise.
* gcc.target/i386/auto-init-padding-12.c: Likewise.
* gcc.target/i386/auto-init-padding-2.c: Likewise.
* gcc.target/i386/auto-init-padding-3.c: Restrict the testing only for
-march=x86-64. Different pattern match for lp64 and ia32.
* gcc.target/i386/auto-init-padding-4.c: Restrict the testing only for
-march=x86-64 and -mtune=generic -msse.
* gcc.target/i386/auto-init-padding-5.c: Likewise.
* gcc.target/i386/auto-init-padding-6.c: Likewise.
* gcc.target/i386/auto-init-padding-7.c: Restrict the testing only for
-march=x86-64 and -mtune=generic -msse. Add -fno-stack-protector.
* gcc.target/i386/auto-init-padding-8.c: Likewise.
* gcc.target/i386/auto-init-padding-9.c: Restrict the testing only for
-march=x86-64. Different pattern match for lp64 and ia32.

Better handle MIN/MAX_EXPR of unrelated objects [PR102200].

Resolves:
PR middle-end/102200 - ICE on a min of a decl and pointer in a loop

gcc/ChangeLog:

PR middle-end/102200
* pointer-query.cc (access_ref::inform_access): Handle MIN/MAX_EXPR.
(handle_min_max_size): Change argument. Store original SSA_NAME for
operands to potentially distinct (sub)objects.
(compute_objsize_r): Adjust call to the above.

gcc/testsuite/ChangeLog:

PR middle-end/102200
* gcc.dg/Wstringop-overflow-62.c: Adjust text of an expected note.
* gcc.dg/Warray-bounds-89.c: New test.
* gcc.dg/Wstringop-overflow-74.c: New test.
* gcc.dg/Wstringop-overflow-75.c: New test.
* gcc.dg/Wstringop-overflow-76.c: New test.

rs6000: Support for vectorizing built-in functions

This patch just duplicates a couple of functions and adjusts them to use the
new builtin names. There's no logical change otherwise.

2021-09-17 Bill Schmidt <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000.c (rs6000-builtins.h): New include.
(rs6000_new_builtin_vectorized_function): New function.
(rs6000_new_builtin_md_vectorized_function): Likewise.
(rs6000_builtin_vectorized_function): Call
rs6000_new_builtin_vectorized_function.
(rs6000_builtin_md_vectorized_function): Call
rs6000_new_builtin_md_vectorized_function.

rs6000: Handle some recent MMA builtin changes

Peter Bergner recently added two new builtins __builtin_vsx_lxvp and
__builtin_vsx_stxvp.  These happened to break a pattern in MMA builtins that
I had been using to automate gimple folding of MMA builtins.  Previously,
every MMA function that could be folded had an associated internal function
that it was folded into.  The LXVP/STXVP builtins are just folded directly
into memory operations.

Instead of relying on this pattern, this patch adds a new attribute to
builtins called "mmaint," which is set for all MMA builtins that have an
associated internal builtin.  The naming convention that adds _INTERNAL to
the builtin index name remains.

The rest of the patch is just duplicating Peter's patch, using the new
builtin infrastructure.

2021-09-17  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000-builtin-new.def (ASSEMBLE_ACC): Add mmaint flag.
(ASSEMBLE_PAIR): Likewise.
(BUILD_ACC): Likewise.
(DISASSEMBLE_ACC): Likewise.
(DISASSEMBLE_PAIR): Likewise.
(PMXVBF16GER2): Likewise.
(PMXVBF16GER2NN): Likewise.
(PMXVBF16GER2NP): Likewise.
(PMXVBF16GER2PN): Likewise.
(PMXVBF16GER2PP): Likewise.
(PMXVF16GER2): Likewise.
(PMXVF16GER2NN): Likewise.
(PMXVF16GER2NP): Likewise.
(PMXVF16GER2PN): Likewise.
(PMXVF16GER2PP): Likewise.
(PMXVF32GER): Likewise.
(PMXVF32GERNN): Likewise.
(PMXVF32GERNP): Likewise.
(PMXVF32GERPN): Likewise.
(PMXVF32GERPP): Likewise.
(PMXVF64GER): Likewise.
(PMXVF64GERNN): Likewise.
(PMXVF64GERNP): Likewise.
(PMXVF64GERPN): Likewise.
(PMXVF64GERPP): Likewise.
(PMXVI16GER2): Likewise.
(PMXVI16GER2PP): Likewise.
(PMXVI16GER2S): Likewise.
(PMXVI16GER2SPP): Likewise.
(PMXVI4GER8): Likewise.
(PMXVI4GER8PP): Likewise.
(PMXVI8GER4): Likewise.
(PMXVI8GER4PP): Likewise.
(PMXVI8GER4SPP): Likewise.
(XVBF16GER2): Likewise.
(XVBF16GER2NN): Likewise.
(XVBF16GER2NP): Likewise.
(XVBF16GER2PN): Likewise.
(XVBF16GER2PP): Likewise.
(XVF16GER2): Likewise.
(XVF16GER2NN): Likewise.
(XVF16GER2NP): Likewise.
(XVF16GER2PN): Likewise.
(XVF16GER2PP): Likewise.
(XVF32GER): Likewise.
(XVF32GERNN): Likewise.
(XVF32GERNP): Likewise.
(XVF32GERPN): Likewise.
(XVF32GERPP): Likewise.
(XVF64GER): Likewise.
(XVF64GERNN): Likewise.
(XVF64GERNP): Likewise.
(XVF64GERPN): Likewise.
(XVF64GERPP): Likewise.
(XVI16GER2): Likewise.
(XVI16GER2PP): Likewise.
(XVI16GER2S): Likewise.
(XVI16GER2SPP): Likewise.
(XVI4GER8): Likewise.
(XVI4GER8PP): Likewise.
(XVI8GER4): Likewise.
(XVI8GER4PP): Likewise.
(XVI8GER4SPP): Likewise.
(XXMFACC): Likewise.
(XXMTACC): Likewise.
(XXSETACCZ): Likewise.
(ASSEMBLE_PAIR_V): Likewise.
(BUILD_PAIR): Likewise.
(DISASSEMBLE_PAIR_V): Likewise.
(LXVP): New.
(STXVP): New.
* config/rs6000/rs6000-call.c (rs6000_gimple_fold_new_mma_builtin):
Handle RS6000_BIF_LXVP and RS6000_BIF_STXVP.
* config/rs6000/rs6000-gen-builtins.c (attrinfo): Add ismmaint.
(parse_bif_attrs): Handle ismmaint.
(write_decls): Add bif_mmaint_bit and bif_is_mmaint.
(write_bif_static_init): Handle ismmaint.

rs6000: Handle gimple folding of target built-ins

This is another patch that looks bigger than it really is.  Because we
have a new namespace for the builtins, allowing us to have both the old
and new builtin infrastructure supported at once, we need versions of
these functions that use the new builtin namespace.  Otherwise the code is
unchanged.

2021-09-17  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000-call.c (rs6000_gimple_fold_new_builtin): New
forward decl.
(rs6000_gimple_fold_builtin): Call rs6000_gimple_fold_new_builtin.
(rs6000_new_builtin_valid_without_lhs): New function.
(rs6000_gimple_fold_new_mma_builtin): Likewise.
(rs6000_gimple_fold_new_builtin): Likewise.

Fix 'hash_table::expand' to destruct stale Value objects

This plugs potential memory leaks if these have a non-trivial
constructor/destructor.

See
<https://stackoverflow.com/questions/6730403/how-to-delete-object-constructed-via-placement-new-operator>
and others.
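The pattern behind the fix can be sketched in C++. `RawStore` and `Tracked` are hypothetical and far simpler than GCC's hash_table. The sketch only assumes, as described above, that entries live in raw storage and are created with placement new. Under that assumption, moving them into a larger table must end each old entry's lifetime with an explicit destructor call, because freeing the raw storage never runs destructors.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Counts live objects so that "constructed but never destroyed" is observable.
struct Tracked {
  static int live;
  int payload;
  explicit Tracked(int p) : payload(p) { ++live; }
  Tracked(const Tracked& o) : payload(o.payload) { ++live; }
  ~Tracked() { --live; }
};
int Tracked::live = 0;

// Hypothetical grow-only store of raw slots, loosely modelled on
// hash_table::expand: entries are built in malloc'd memory with placement new.
struct RawStore {
  Tracked* slots = nullptr;
  std::size_t size = 0, capacity = 0;

  void push(int p) {
    if (size == capacity) expand();
    new (&slots[size++]) Tracked(p);
  }

  void expand() {
    std::size_t new_cap = capacity ? capacity * 2 : 4;
    void* mem = std::malloc(new_cap * sizeof(Tracked));
    if (!mem) throw std::bad_alloc();
    auto* fresh = static_cast<Tracked*>(mem);
    for (std::size_t i = 0; i < size; ++i) {
      new (&fresh[i]) Tracked(slots[i]);  // copy the entry into the new storage
      slots[i].~Tracked();                // destroy the stale copy (the step the fix adds)
    }
    std::free(slots);  // free() alone would not run any destructor
    slots = fresh;
    capacity = new_cap;
  }

  ~RawStore() {
    for (std::size_t i = 0; i < size; ++i) slots[i].~Tracked();
    std::free(slots);
  }
};
```

Pushing ten values and then destroying the store leaves `Tracked::live` at zero. If the `slots[i].~Tracked()` line is removed, every copy made during expansion stays "alive". For a Value that owns heap memory, such as the vec inside class_decl_loc_t, that corresponds to the "definitely lost" blocks in the valgrind output below.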

As one example, compilation of 'g++.dg/warn/Wmismatched-tags.C' per
'valgrind --leak-check=full' improves as follows:

     [...]
    -
    -104 bytes in 1 blocks are definitely lost in loss record 399 of 519
    -   at 0x483DFAF: realloc (vg_replace_malloc.c:836)
    -   by 0x223B62C: xrealloc (xmalloc.c:179)
    -   by 0xA8D848: void va_heap::reserve<class_decl_loc_t::class_key_loc_t>(vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_embed>*&, unsigned int, bool) (vec.h:290)
    -   by 0xA8B373: vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_ptr>::reserve(unsigned int, bool) (vec.h:1858)
    -   by 0xA8B277: vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_ptr>::safe_push(class_decl_loc_t::class_key_loc_t const&) (vec.h:1967)
    -   by 0xA57481: class_decl_loc_t::add_or_diag_mismatched_tag(tree_node*, tag_types, bool, bool) (parser.c:32967)
    -   by 0xA573E1: class_decl_loc_t::add(cp_parser*, unsigned int, tag_types, tree_node*, bool, bool) (parser.c:32941)
    -   by 0xA56C52: cp_parser_check_class_key(cp_parser*, unsigned int, tag_types, tree_node*, bool, bool) (parser.c:32819)
    -   by 0xA3AD12: cp_parser_elaborated_type_specifier(cp_parser*, bool, bool) (parser.c:20227)
    -   by 0xA37EF2: cp_parser_type_specifier(cp_parser*, int, cp_decl_specifier_seq*, bool, int*, bool*) (parser.c:18942)
    -   by 0xA31CDD: cp_parser_decl_specifier_seq(cp_parser*, int, cp_decl_specifier_seq*, int*) (parser.c:15517)
    -   by 0xA43C71: cp_parser_parameter_declaration(cp_parser*, int, bool, bool*) (parser.c:24242)
    -
    -168 bytes in 3 blocks are definitely lost in loss record 422 of 519
    -   at 0x483DFAF: realloc (vg_replace_malloc.c:836)
    -   by 0x223B62C: xrealloc (xmalloc.c:179)
    -   by 0xA8D848: void va_heap::reserve<class_decl_loc_t::class_key_loc_t>(vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_embed>*&, unsigned int, bool) (vec.h:290)
    -   by 0xA8B373: vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_ptr>::reserve(unsigned int, bool) (vec.h:1858)
    -   by 0xA8B277: vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_ptr>::safe_push(class_decl_loc_t::class_key_loc_t const&) (vec.h:1967)
    -   by 0xA57481: class_decl_loc_t::add_or_diag_mismatched_tag(tree_node*, tag_types, bool, bool) (parser.c:32967)
    -   by 0xA573E1: class_decl_loc_t::add(cp_parser*, unsigned int, tag_types, tree_node*, bool, bool) (parser.c:32941)
    -   by 0xA56C52: cp_parser_check_class_key(cp_parser*, unsigned int, tag_types, tree_node*, bool, bool) (parser.c:32819)
    -   by 0xA3AD12: cp_parser_elaborated_type_specifier(cp_parser*, bool, bool) (parser.c:20227)
    -   by 0xA37EF2: cp_parser_type_specifier(cp_parser*, int, cp_decl_specifier_seq*, bool, int*, bool*) (parser.c:18942)
    -   by 0xA31CDD: cp_parser_decl_specifier_seq(cp_parser*, int, cp_decl_specifier_seq*, int*) (parser.c:15517)
    -   by 0xA53385: cp_parser_single_declaration(cp_parser*, vec<deferred_access_check, va_gc, vl_embed>*, bool, bool, bool*) (parser.c:31072)
    -
    -488 bytes in 7 blocks are definitely lost in loss record 449 of 519
    -   at 0x483DFAF: realloc (vg_replace_malloc.c:836)
    -   by 0x223B62C: xrealloc (xmalloc.c:179)
    -   by 0xA8D848: void va_heap::reserve<class_decl_loc_t::class_key_loc_t>(vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_embed>*&, unsigned int, bool) (vec.h:290)
    -   by 0xA8B373: vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_ptr>::reserve(unsigned int, bool) (vec.h:1858)
    -   by 0xA8B277: vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_ptr>::safe_push(class_decl_loc_t::class_key_loc_t const&) (vec.h:1967)
    -   by 0xA57481: class_decl_loc_t::add_or_diag_mismatched_tag(tree_node*, tag_types, bool, bool) (parser.c:32967)
    -   by 0xA573E1: class_decl_loc_t::add(cp_parser*, unsigned int, tag_types, tree_node*, bool, bool) (parser.c:32941)
    -   by 0xA56C52: cp_parser_check_class_key(cp_parser*, unsigned int, tag_types, tree_node*, bool, bool) (parser.c:32819)
    -   by 0xA3AD12: cp_parser_elaborated_type_specifier(cp_parser*, bool, bool) (parser.c:20227)
    -   by 0xA37EF2: cp_parser_type_specifier(cp_parser*, int, cp_decl_specifier_seq*, bool, int*, bool*) (parser.c:18942)
    -   by 0xA31CDD: cp_parser_decl_specifier_seq(cp_parser*, int, cp_decl_specifier_seq*, int*) (parser.c:15517)
    -   by 0xA49508: cp_parser_member_declaration(cp_parser*) (parser.c:26440)
    -
    -728 bytes in 7 blocks are definitely lost in loss record 455 of 519
    -   at 0x483B7F3: malloc (vg_replace_malloc.c:309)
    -   by 0x223B63F: xrealloc (xmalloc.c:177)
    -   by 0xA8D848: void va_heap::reserve<class_decl_loc_t::class_key_loc_t>(vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_embed>*&, unsigned int, bool) (vec.h:290)
    -   by 0xA8B373: vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_ptr>::reserve(unsigned int, bool) (vec.h:1858)
    -   by 0xA57508: class_decl_loc_t::add_or_diag_mismatched_tag(tree_node*, tag_types, bool, bool) (parser.c:32980)
    -   by 0xA573E1: class_decl_loc_t::add(cp_parser*, unsigned int, tag_types, tree_node*, bool, bool) (parser.c:32941)
    -   by 0xA56C52: cp_parser_check_class_key(cp_parser*, unsigned int, tag_types, tree_node*, bool, bool) (parser.c:32819)
    -   by 0xA48BC6: cp_parser_class_head(cp_parser*, bool*) (parser.c:26090)
    -   by 0xA4674B: cp_parser_class_specifier_1(cp_parser*) (parser.c:25302)
    -   by 0xA47D76: cp_parser_class_specifier(cp_parser*) (parser.c:25680)
    -   by 0xA37E27: cp_parser_type_specifier(cp_parser*, int, cp_decl_specifier_seq*, bool, int*, bool*) (parser.c:18912)
    -   by 0xA31CDD: cp_parser_decl_specifier_seq(cp_parser*, int, cp_decl_specifier_seq*, int*) (parser.c:15517)
    -
    -832 bytes in 8 blocks are definitely lost in loss record 458 of 519
    -   at 0x483B7F3: malloc (vg_replace_malloc.c:309)
    -   by 0x223B63F: xrealloc (xmalloc.c:177)
    -   by 0xA8D848: void va_heap::reserve<class_decl_loc_t::class_key_loc_t>(vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_embed>*&, unsigned int, bool) (vec.h:290)
    -   by 0xA901ED: bool vec_safe_reserve<class_decl_loc_t::class_key_loc_t, va_heap>(vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_embed>*&, unsigned int, bool) (vec.h:697)
    -   by 0xA8F161: void vec_alloc<class_decl_loc_t::class_key_loc_t, va_heap>(vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_embed>*&, unsigned int) (vec.h:718)
    -   by 0xA8D18D: vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_embed>::copy() const (vec.h:979)
    -   by 0xA8B0C3: vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_ptr>::copy() const (vec.h:1824)
    -   by 0xA896B1: class_decl_loc_t::operator=(class_decl_loc_t const&) (parser.c:32697)
    -   by 0xA571FD: class_decl_loc_t::add(cp_parser*, unsigned int, tag_types, tree_node*, bool, bool) (parser.c:32899)
    -   by 0xA56C52: cp_parser_check_class_key(cp_parser*, unsigned int, tag_types, tree_node*, bool, bool) (parser.c:32819)
    -   by 0xA3AD12: cp_parser_elaborated_type_specifier(cp_parser*, bool, bool) (parser.c:20227)
    -   by 0xA37EF2: cp_parser_type_specifier(cp_parser*, int, cp_decl_specifier_seq*, bool, int*, bool*) (parser.c:18942)
    -
    -1,144 bytes in 11 blocks are definitely lost in loss record 466 of 519
    -   at 0x483B7F3: malloc (vg_replace_malloc.c:309)
    -   by 0x223B63F: xrealloc (xmalloc.c:177)
    -   by 0xA8D848: void va_heap::reserve<class_decl_loc_t::class_key_loc_t>(vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_embed>*&, unsigned int, bool) (vec.h:290)
    -   by 0xA901ED: bool vec_safe_reserve<class_decl_loc_t::class_key_loc_t, va_heap>(vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_embed>*&, unsigned int, bool) (vec.h:697)
    -   by 0xA8F161: void vec_alloc<class_decl_loc_t::class_key_loc_t, va_heap>(vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_embed>*&, unsigned int) (vec.h:718)
    -   by 0xA8D18D: vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_embed>::copy() const (vec.h:979)
    -   by 0xA8B0C3: vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_ptr>::copy() const (vec.h:1824)
    -   by 0xA896B1: class_decl_loc_t::operator=(class_decl_loc_t const&) (parser.c:32697)
    -   by 0xA571FD: class_decl_loc_t::add(cp_parser*, unsigned int, tag_types, tree_node*, bool, bool) (parser.c:32899)
    -   by 0xA56C52: cp_parser_check_class_key(cp_parser*, unsigned int, tag_types, tree_node*, bool, bool) (parser.c:32819)
    -   by 0xA48BC6: cp_parser_class_head(cp_parser*, bool*) (parser.c:26090)
    -   by 0xA4674B: cp_parser_class_specifier_1(cp_parser*) (parser.c:25302)
    -
    -1,376 bytes in 10 blocks are definitely lost in loss record 467 of 519
    -   at 0x483DFAF: realloc (vg_replace_malloc.c:836)
    -   by 0x223B62C: xrealloc (xmalloc.c:179)
    -   by 0xA8D848: void va_heap::reserve<class_decl_loc_t::class_key_loc_t>(vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_embed>*&, unsigned int, bool) (vec.h:290)
    -   by 0xA8B373: vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_ptr>::reserve(unsigned int, bool) (vec.h:1858)
    -   by 0xA8B277: vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_ptr>::safe_push(class_decl_loc_t::class_key_loc_t const&) (vec.h:1967)
    -   by 0xA57481: class_decl_loc_t::add_or_diag_mismatched_tag(tree_node*, tag_types, bool, bool) (parser.c:32967)
    -   by 0xA573E1: class_decl_loc_t::add(cp_parser*, unsigned int, tag_types, tree_node*, bool, bool) (parser.c:32941)
    -   by 0xA56C52: cp_parser_check_class_key(cp_parser*, unsigned int, tag_types, tree_node*, bool, bool) (parser.c:32819)
    -   by 0xA3AD12: cp_parser_elaborated_type_specifier(cp_parser*, bool, bool) (parser.c:20227)
    -   by 0xA37EF2: cp_parser_type_specifier(cp_parser*, int, cp_decl_specifier_seq*, bool, int*, bool*) (parser.c:18942)
    -   by 0xA31CDD: cp_parser_decl_specifier_seq(cp_parser*, int, cp_decl_specifier_seq*, int*) (parser.c:15517)
    -   by 0xA301E0: cp_parser_simple_declaration(cp_parser*, bool, tree_node**) (parser.c:14772)
    -
    -3,552 bytes in 33 blocks are definitely lost in loss record 483 of 519
    -   at 0x483B7F3: malloc (vg_replace_malloc.c:309)
    -   by 0x223B63F: xrealloc (xmalloc.c:177)
    -   by 0xA8D848: void va_heap::reserve<class_decl_loc_t::class_key_loc_t>(vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_embed>*&, unsigned int, bool) (vec.h:290)
    -   by 0xA901ED: bool vec_safe_reserve<class_decl_loc_t::class_key_loc_t, va_heap>(vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_embed>*&, unsigned int, bool) (vec.h:697)
    -   by 0xA8F161: void vec_alloc<class_decl_loc_t::class_key_loc_t, va_heap>(vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_embed>*&, unsigned int) (vec.h:718)
    -   by 0xA8D18D: vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_embed>::copy() const (vec.h:979)
    -   by 0xA8B0C3: vec<class_decl_loc_t::class_key_loc_t, va_heap, vl_ptr>::copy() const (vec.h:1824)
    -   by 0xA8964A: class_decl_loc_t::class_decl_loc_t(class_decl_loc_t const&) (parser.c:32689)
    -   by 0xA8F515: hash_table<hash_map<tree_decl_hash, class_decl_loc_t, simple_hashmap_traits<default_hash_traits<tree_decl_hash>, class_decl_loc_t> >::hash_entry, false, xcallocator>::expand() (hash-table.h:839)
    -   by 0xA8D4B3: hash_table<hash_map<tree_decl_hash, class_decl_loc_t, simple_hashmap_traits<default_hash_traits<tree_decl_hash>, class_decl_loc_t> >::hash_entry, false, xcallocator>::find_slot_with_hash(tree_node* const&, unsigned int, insert_option) (hash-table.h:1008)
    -   by 0xA8B1DC: hash_map<tree_decl_hash, class_decl_loc_t, simple_hashmap_traits<default_hash_traits<tree_decl_hash>, class_decl_loc_t> >::get_or_insert(tree_node* const&, bool*) (hash-map.h:200)
    -   by 0xA57128: class_decl_loc_t::add(cp_parser*, unsigned int, tag_types, tree_node*, bool, bool) (parser.c:32888)
     [...]
     LEAK SUMMARY:
    -   definitely lost: 8,440 bytes in 81 blocks
    +   definitely lost: 48 bytes in 1 blocks
        indirectly lost: 12,529 bytes in 329 blocks
          possibly lost: 0 bytes in 0 blocks
        still reachable: 1,644,376 bytes in 768 blocks

gcc/
* hash-table.h (hash_table<Descriptor, Lazy, Allocator>::expand):
Destruct stale Value objects.
* hash-map-tests.c (test_map_of_type_with_ctor_and_dtor_expand):
Update.

Fortran: Use _Float128 rather than __float128 for c_float128 kind.

The GNU Fortran manual documents that the c_float128 kind corresponds
to __float128, but in fact the implementation uses float128_type_node,
which is _Float128.  Both refer to the 128-bit IEEE/ISO encoding, but
some targets including aarch64 only define _Float128 and not __float128,
and do not provide quadmath.h.  This caused errors in some test cases
referring to __float128.

This patch changes the documentation (including code comments) and
test cases to use _Float128 to match the implementation.

2021-09-16  Sandra Loosemore  <sandra@codesourcery.com>

gcc/fortran/

* intrinsic.texi (ISO_C_BINDING): Change C_FLOAT128 to correspond
to _Float128 rather than __float128.
* iso-c-binding.def (c_float128): Update comments.
* trans-intrinsic.c (gfc_builtin_decl_for_float_kind): Likewise.
(build_round_expr): Likewise.
(gfc_build_intrinsic_lib_fndecls): Likewise.
* trans-types.h (gfc_real16_is_float128): Likewise.

gcc/testsuite/
* gfortran.dg/PR100914.c: Do not include quadmath.h.  Use
_Float128 _Complex instead of __complex128.
* gfortran.dg/PR100914.f90: Add -Wno-pedantic to suppress error
about use of _Float128.
* gfortran.dg/c-interop/typecodes-array-float128-c.c: Use
_Float128 instead of __float128.
* gfortran.dg/c-interop/typecodes-sanity-c.c: Likewise.
* gfortran.dg/c-interop/typecodes-scalar-float128-c.c: Likewise.
* lib/target-supports.exp
(check_effective_target_fortran_real_c_float128): Update comments.

libgfortran/
* ISO_Fortran_binding.h: Update comments.
* runtime/ISO_Fortran_binding.c: Likewise.

PR c/102245: Disable sign-changing optimization for shifts by zero.

Respecting Jakub's suggestion that it may be better to warn-on-valid for
"if (x << 0)" as the author might have intended "if (x < 0)" [which will
also warn when x is _Bool], the simplest way to resolve this regression
is to disable the recently added fold transformation for shifts by zero;
these will be optimized later (elsewhere). Guarding against integer_zerop
is the simplest of three alternatives; the second being to only apply
this transformation to GIMPLE and not GENERIC, and the third (potentially)
being to explicitly handle shifts by zero here, with an (if cond then else),
optimizing the expression to a convert, but awkwardly duplicating a
more general transformation earlier in match.pd's shift simplifications.
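Here is a short C++ illustration of why `if (x << 0)` deserves a warning. Shifting by zero leaves the value unchanged, so the condition only tests `x != 0`, which differs from the likely intended `x < 0`. The helper names are made up for the example. Only non-negative values are used because left-shifting a negative signed value is undefined in C, and in C++ before C++20.

```cpp
#include <cassert>

// "x << 0" is just x, so as a condition it means "x != 0".
constexpr bool shift_condition(int x) { return (x << 0) != 0; }

// What the author most likely meant to write.
constexpr bool less_than_zero(int x) { return x < 0; }

// For a positive x the two conditions disagree, which is the typo the warning catches.
static_assert(shift_condition(5) && !less_than_zero(5), "x << 0 is true for any non-zero x");
// For zero both conditions are false.
static_assert(!shift_condition(0) && !less_than_zero(0), "both are false for zero");
```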

2021-09-17 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
PR c/102245
* match.pd (shift optimizations): Disable recent sign-changing
optimization for shifts by zero, these will be folded later.

gcc/testsuite/ChangeLog
PR c/102245
* gcc.dg/Wint-in-bool-context-4.c: New test case.

rs6000: Move __builtin_mffsl to the [always] stanza

I over-restricted use of __builtin_mffsl, since I was unaware that it
automatically uses mffs when mffsl is not available. Paul Clarke pointed
this out in discussion of his SSE 4.1 compatibility patches.

2021-08-31 Bill Schmidt <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000-builtin-new.def (__builtin_mffsl): Move from
[power9] to [always].

Fortran: Prefer GCC internal macros to float.h in ISO_Fortran_binding.h.

2021-09-17 Sandra Loosemore <sandra@codesourcery.com>
Tobias Burnus <tobias@codesourcery.com>

libgfortran/
* ISO_Fortran_binding.h: Only include float.h if the C compiler
doesn't have predefined __LDBL_* and __DBL_* macros. Handle
LDBL_MANT_DIG == 53 for FreeBSD.

configure, jit: Allow for 'make check-gcc-jit'.

This is a convenience feature that allows the user to
do "make check-gcc-jit" at the top level of the build
to check that facility in isolation from others.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
ChangeLog:

* Makefile.def: Add a jit check target for the jit
language.
* Makefile.in: Regenerate.

Revert no longer needed fix for PR95539

The workaround is no longer necessary since we maintain alignment
info on the DR group leader only.

2021-09-17 Richard Biener <rguenther@suse.de>

* tree-vect-stmts.c (vectorizable_load): Do not frob
stmt_info for SLP.

libstdc++: Rename tests with incorrect extension

The libstdc++ testsuite only runs .cc files, so these two old tests have
never been run.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* testsuite/26_numerics/valarray/dr630-3.C: Moved to...
* testsuite/26_numerics/valarray/dr630-3.cc: ...here.
* testsuite/27_io/basic_iostream/cons/16251.C: Moved to...
* testsuite/27_io/basic_iostream/cons/16251.cc: ...here.

libgomp: Spelling error fix in OpenMP 5.1 conformance section

Fix spelling of OpenMP directive declare variant.

2021-09-17 Jakub Jelinek <jakub@redhat.com>

* libgomp.texi (OpenMP 5.1): Spelling fix,
declare variante -> declare variant.

openmp: Add support for OpenMP 5.1 atomics for C++

Besides the C++ FE changes, I've noticed that the C FE didn't reject
  #pragma omp atomic capture compare
  { v = x; x = y; }
and other forms of atomic swap, this patch fixes that too.  And the
c-family/ routine needed quite a few changes so that the new code
in it works fine with both FEs.
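The following small C++ example shows one OpenMP 5.1 form that the C++ front end can now parse: an `atomic compare` conditional update, used here to compute a minimum. It assumes GCC 12 or later with -fopenmp. Without -fopenmp the pragmas are ignored and the loop runs serially with the same result. The function name is illustrative, and the input must be non-empty.

```cpp
#include <cstddef>
#include <vector>

// Minimum of a non-empty vector; threads update the shared result with an
// OpenMP 5.1 "atomic compare" of the form: if (expr < x) { x = expr; }
int parallel_min(const std::vector<int>& data) {
  int result = data.front();  // shared across the parallel region
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(data.size());
  #pragma omp parallel for
  for (std::ptrdiff_t i = 0; i < n; ++i) {
    int v = data[i];  // private to each iteration
    #pragma omp atomic compare
    if (v < result) { result = v; }
  }
  return result;
}
```

A forward atomic swap such as `{ v = x; x = y; }` under `atomic capture compare` is a different construct, and the C front end now rejects it, as noted above.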

2021-09-17  Jakub Jelinek  <jakub@redhat.com>

gcc/c-family/
* c-omp.c (c_finish_omp_atomic): Avoid creating
TARGET_EXPR if test is true, use create_tmp_var_raw instead of
create_tmp_var and add a zero initializer to TARGET_EXPRs that
had NULL initializer.  When omitting operands after v = x,
use type of v rather than type of x.  Fix type of vtmp
TARGET_EXPR.
gcc/c/
* c-parser.c (c_parser_omp_atomic): Reject atomic swap if capture
is true.
gcc/cp/
* cp-tree.h (finish_omp_atomic): Add r and weak arguments.
* parser.c (cp_parser_omp_atomic): Update function comment for
OpenMP 5.1 atomics, parse OpenMP 5.1 atomics and fail, compare and
weak clauses.
* semantics.c (finish_omp_atomic): Add r and weak arguments, handle
them, handle COND_EXPRs.
* pt.c (tsubst_expr): Adjust for COND_EXPR forms that
finish_omp_atomic can now produce.
gcc/testsuite/
* c-c++-common/gomp/atomic-18.c: Expect same diagnostics in C++ as in
C.
* c-c++-common/gomp/atomic-25.c: Drop c effective target.
* c-c++-common/gomp/atomic-26.c: Likewise.
* c-c++-common/gomp/atomic-27.c: Likewise.
* c-c++-common/gomp/atomic-28.c: Likewise.
* c-c++-common/gomp/atomic-29.c: Likewise.
* c-c++-common/gomp/atomic-30.c: Likewise.  Adjust expected diagnostics
for C++ when it differs from C.
(foo): Change return type from double to void.
* g++.dg/gomp/atomic-5.C: Adjust expected diagnostics wording.
* g++.dg/gomp/atomic-20.C: New test.
libgomp/
* testsuite/libgomp.c-c++-common/atomic-19.c: Drop c effective target.
Use /* */ comments instead of //.
* testsuite/libgomp.c-c++-common/atomic-20.c: Likewise.
* testsuite/libgomp.c-c++-common/atomic-21.c: Likewise.
* testsuite/libgomp.c++/atomic-16.C: New test.
* testsuite/libgomp.c++/atomic-17.C: New test.

x86: Add TARGET_SSE_PARTIAL_REG_[FP_]CONVERTS_DEPENDENCY

1. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with
TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY in SSE FP to FP splitters.
2. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with
TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY in SSE INT to FP splitters.
3. Also check TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY and
TARGET_SSE_PARTIAL_REG_DEPENDENCY when handling avx_partial_xmm_update
attribute. Don't convert AVX partial XMM register update if there is no
partial SSE register dependency for SSE conversion.

gcc/

* config/i386/i386-features.c (remove_partial_avx_dependency):
Also check TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY and
TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY before generating
vxorps.
* config/i386/i386.h (TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY):
New.
(TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise.
* config/i386/i386.md (SSE FP to FP splitters): Replace
TARGET_SSE_PARTIAL_REG_DEPENDENCY with
TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY.
(SSE INT to FP splitter): Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY
with TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY.
* config/i386/x86-tune.def
(X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): New.
(X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise.

gcc/testsuite/

* gcc.target/i386/avx-covert-1.c: New file.
* gcc.target/i386/avx-fp-covert-1.c: Likewise.
* gcc.target/i386/avx-int-covert-1.c: Likewise.
* gcc.target/i386/sse-covert-1.c: Likewise.
* gcc.target/i386/sse-fp-covert-1.c: Likewise.
* gcc.target/i386/sse-int-covert-1.c: Likewise.

x86: Properly handle USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONVERTS

Check TARGET_USE_VECTOR_FP_CONVERTS or TARGET_USE_VECTOR_CONVERTS when
handling avx_partial_xmm_update attribute. Don't convert AVX partial
XMM register update if vector packed SSE conversion should be used.

gcc/

PR target/101900
* config/i386/i386-features.c (remove_partial_avx_dependency):
Check TARGET_USE_VECTOR_FP_CONVERTS and TARGET_USE_VECTOR_CONVERTS
before generating vxorps.

gcc/testsuite

PR target/101900
* gcc.target/i386/pr101900-1.c: New test.
* gcc.target/i386/pr101900-2.c: Likewise.
* gcc.target/i386/pr101900-3.c: Likewise.

x86: Update memcpy/memset inline strategies for -mtune=tremont

Simplify memcpy and memset inline strategies to avoid branches for
-mtune=tremont:

1. Create Tremont cost model from generic cost model.
2. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
   load and store for up to 16 * 16 (256) bytes when the data size is
   fixed and known.
3. Inline only if data size is known to be <= 256.
   a. Use "rep movsb/stosb" with simple code sequence if the data size
      is a constant.
   b. Use loop if data size is not a constant.
4. Use memcpy/memset library function if data size is unknown or > 256.
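The decision in steps 3 and 4 can be modelled as a small C++ function. This is a hypothetical sketch, not GCC's code. The `Strategy` enum, `choose_strategy`, and the use of an optional upper bound to stand for "data size is known" are assumptions. The by-pieces load/store expansion from step 2 is left out. The 256-byte limit is the 16 * 16 figure from step 2.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical labels for the three outcomes in steps 3 and 4.
enum class Strategy { RepMovsb, Loop, LibraryCall };

// 16 moves of 16 bytes each, from MOVE_RATIO/CLEAR_RATIO == 17.
constexpr std::size_t kInlineLimit = 16 * 16;  // 256 bytes

// max_size: an upper bound on the copy size if one is known, else no value.
// size_is_constant: whether the size is a compile-time constant.
Strategy choose_strategy(std::optional<std::size_t> max_size, bool size_is_constant) {
  if (!max_size || *max_size > kInlineLimit)
    return Strategy::LibraryCall;  // step 4: unknown or > 256 bytes
  return size_is_constant ? Strategy::RepMovsb  // step 3a: simple "rep movsb/stosb"
                          : Strategy::Loop;     // step 3b: loop
}
```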

* config/i386/i386-options.c (processor_cost_table): Use
tremont_cost for Tremont.
* config/i386/x86-tune-costs.h (tremont_memcpy): New.
(tremont_memset): Likewise.
(tremont_cost): Likewise.
* config/i386/x86-tune.def (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB):
Enable for Tremont.

x86: Update -mtune=tremont

Initial -mtune=tremont update

1. Use Haswell scheduling model.
2. Assume that the stack engine allows push&pop instructions to execute in
parallel.
3. Prepare for scheduling pass as -mtune=generic.
4. Use the same issue rate as -mtune=generic.
5. Enable partial_reg_dependency.
6. Disable accumulate_outgoing_args
7. Enable use_leave
8. Enable push_memory
9. Disable four_jump_limit
10. Disable opt_agu
11. Disable avoid_lea_for_addr
12. Disable avoid_mem_opnd_for_cmove
13. Enable misaligned_move_string_pro_epilogues
14. Enable use_cltd
16. Enable avoid_false_dep_for_bmi
17. Enable avoid_mfence
18. Disable expand_abs
19. Enable sse_typeless_stores
20. Enable sse_load0_by_pxor
21. Disable split_mem_opnd_for_fp_converts
22. Disable slow_pshufb
23. Enable partial_reg_dependency

This is the first patch to tune for Tremont.  With all patches applied,
performance impacts on SPEC CPU 2017 are:

500.perlbench_r         1.81%
502.gcc_r               0.57%
505.mcf_r               1.16%
520.omnetpp_r           0.00%
523.xalancbmk_r         0.00%
525.x264_r              4.55%
531.deepsjeng_r         0.00%
541.leela_r             0.39%
548.exchange2_r         1.13%
557.xz_r                0.00%
geomean for intrate     0.95%
503.bwaves_r            0.00%
507.cactuBSSN_r         6.94%
508.namd_r              12.37%
510.parest_r            1.01%
511.povray_r            3.70%
519.lbm_r               36.61%
521.wrf_r               8.79%
526.blender_r           2.91%
527.cam4_r              6.23%
538.imagick_r           0.28%
544.nab_r               21.99%
549.fotonik3d_r         3.63%
554.roms_r              -1.20%
geomean for fprate      7.50%

gcc/ChangeLog

* common/config/i386/i386-common.c: Use Haswell scheduling model
for Tremont.
* config/i386/i386.c (ix86_sched_init_global): Prepare for Tremont
scheduling pass.
* config/i386/x86-tune-sched.c (ix86_issue_rate): Change Tremont
issue rate to 4.
(ix86_adjust_cost): Handle Tremont.
* config/i386/x86-tune.def (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY):
Enable for Tremont.
(X86_TUNE_USE_LEAVE): Likewise.
(X86_TUNE_PUSH_MEMORY): Likewise.
(X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Likewise.
(X86_TUNE_USE_CLTD): Likewise.
(X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Likewise.
(X86_TUNE_AVOID_MFENCE): Likewise.
(X86_TUNE_SSE_TYPELESS_STORES): Likewise.
(X86_TUNE_SSE_LOAD0_BY_PXOR): Likewise.
(X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Disable for Tremont.
(X86_TUNE_FOUR_JUMP_LIMIT): Likewise.
(X86_TUNE_OPT_AGU): Likewise.
(X86_TUNE_AVOID_LEA_FOR_ADDR): Likewise.
(X86_TUNE_AVOID_MEM_OPND_FOR_CMOVE): Likewise.
(X86_TUNE_EXPAND_ABS): Likewise.
(X86_TUNE_SPLIT_MEM_OPND_FOR_FP_CONVERTS): Likewise.
(X86_TUNE_SLOW_PSHUFB): Likewise.

Fix PR rtl-optimization/102306

This is a duplication of volatile loads introduced during GCC 9 development
by the 2->2 mechanism of the RTL combiner. There is already substantial
checking for volatile references in can_combine_p but it implicitly assumes
that the combination reduces the number of instructions, which is of course
not the case here. So the fix teaches try_combine to abort the combination
when it is about to make a copy of volatile references to preserve them.

gcc/
PR rtl-optimization/102306
* combine.c (try_combine): Abort the combination if we are about to
duplicate volatile references.

gcc/testsuite/
* gcc.target/sparc/20210917-1.c: New test.

AVX512FP16: Add intrinsics for casting between vector float16 and vector float32/float64/integer.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm_undefined_ph):
New intrinsic.
(_mm256_undefined_ph): Likewise.
(_mm512_undefined_ph): Likewise.
(_mm_cvtsh_h): Likewise.
(_mm256_cvtsh_h): Likewise.
(_mm512_cvtsh_h): Likewise.
(_mm512_castph_ps): Likewise.
(_mm512_castph_pd): Likewise.
(_mm512_castph_si512): Likewise.
(_mm512_castph512_ph128): Likewise.
(_mm512_castph512_ph256): Likewise.
(_mm512_castph128_ph512): Likewise.
(_mm512_castph256_ph512): Likewise.
(_mm512_zextph128_ph512): Likewise.
(_mm512_zextph256_ph512): Likewise.
(_mm512_castps_ph): Likewise.
(_mm512_castpd_ph): Likewise.
(_mm512_castsi512_ph): Likewise.
* config/i386/avx512fp16vlintrin.h (_mm_castph_ps):
New intrinsic.
(_mm256_castph_ps): Likewise.
(_mm_castph_pd): Likewise.
(_mm256_castph_pd): Likewise.
(_mm_castph_si128): Likewise.
(_mm256_castph_si256): Likewise.
(_mm_castps_ph): Likewise.
(_mm256_castps_ph): Likewise.
(_mm_castpd_ph): Likewise.
(_mm256_castpd_ph): Likewise.
(_mm_castsi128_ph): Likewise.
(_mm256_castsi256_ph): Likewise.
(_mm256_castph256_ph128): Likewise.
(_mm256_castph128_ph256): Likewise.
(_mm256_zextph128_ph256): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-typecast-1.c: New test.
* gcc.target/i386/avx512fp16-typecast-2.c: Ditto.
* gcc.target/i386/avx512fp16vl-typecast-1.c: Ditto.
* gcc.target/i386/avx512fp16vl-typecast-2.c: Ditto.

AVX512FP16: Add testcase for vcvtsh2sd/vcvtsh2ss/vcvtsd2sh/vcvtss2sh.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vcvtsd2sh-1a.c: New test.
* gcc.target/i386/avx512fp16-vcvtsd2sh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsh2sd-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsh2sd-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsh2ss-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsh2ss-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtss2sh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtss2sh-1b.c: Ditto.

AVX512FP16: Add vcvtsh2ss/vcvtsh2sd/vcvtss2sh/vcvtsd2sh.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm_cvtsh_ss):
New intrinsic.
(_mm_mask_cvtsh_ss): Likewise.
(_mm_maskz_cvtsh_ss): Likewise.
(_mm_cvtsh_sd): Likewise.
(_mm_mask_cvtsh_sd): Likewise.
(_mm_maskz_cvtsh_sd): Likewise.
(_mm_cvt_roundsh_ss): Likewise.
(_mm_mask_cvt_roundsh_ss): Likewise.
(_mm_maskz_cvt_roundsh_ss): Likewise.
(_mm_cvt_roundsh_sd): Likewise.
(_mm_mask_cvt_roundsh_sd): Likewise.
(_mm_maskz_cvt_roundsh_sd): Likewise.
(_mm_cvtss_sh): Likewise.
(_mm_mask_cvtss_sh): Likewise.
(_mm_maskz_cvtss_sh): Likewise.
(_mm_cvtsd_sh): Likewise.
(_mm_mask_cvtsd_sh): Likewise.
(_mm_maskz_cvtsd_sh): Likewise.
(_mm_cvt_roundss_sh): Likewise.
(_mm_mask_cvt_roundss_sh): Likewise.
(_mm_maskz_cvt_roundss_sh): Likewise.
(_mm_cvt_roundsd_sh): Likewise.
(_mm_mask_cvt_roundsd_sh): Likewise.
(_mm_maskz_cvt_roundsd_sh): Likewise.
* config/i386/i386-builtin-types.def
(V8HF_FTYPE_V2DF_V8HF_V8HF_UQI_INT,
V8HF_FTYPE_V4SF_V8HF_V8HF_UQI_INT,
V2DF_FTYPE_V8HF_V2DF_V2DF_UQI_INT,
V4SF_FTYPE_V8HF_V4SF_V4SF_UQI_INT): Add new builtin types.
* config/i386/i386-builtin.def: Add corresponding new builtins.
* config/i386/i386-expand.c: Handle new builtin types.
* config/i386/sse.md (VF48_128): New mode iterator.
(avx512fp16_vcvtsh2<ssescalarmodesuffix><mask_scalar_name><round_saeonly_scalar_name>):
New.
(avx512fp16_vcvt<ssescalarmodesuffix>2sh<mask_scalar_name><round_scalar_name>):
Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.

AVX512FP16: Add testcase for vcvtph2pd/vcvtph2psx/vcvtpd2ph/vcvtps2phx.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-helper.h (V512): Add DF contents.
(src3f): New.
* gcc.target/i386/avx512fp16-vcvtpd2ph-1a.c: New test.
* gcc.target/i386/avx512fp16-vcvtpd2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtph2pd-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtph2pd-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtph2psx-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtph2psx-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtps2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtps2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtpd2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtpd2ph-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtph2pd-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtph2pd-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtph2psx-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtph2psx-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtps2ph-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvtps2ph-1b.c: Ditto.

AVX512FP16: Add vcvtph2pd/vcvtph2psx/vcvtpd2ph/vcvtps2phx.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm512_cvtph_pd):
New intrinsic.
(_mm512_mask_cvtph_pd): Likewise.
(_mm512_maskz_cvtph_pd): Likewise.
(_mm512_cvt_roundph_pd): Likewise.
(_mm512_mask_cvt_roundph_pd): Likewise.
(_mm512_maskz_cvt_roundph_pd): Likewise.
(_mm512_cvtxph_ps): Likewise.
(_mm512_mask_cvtxph_ps): Likewise.
(_mm512_maskz_cvtxph_ps): Likewise.
(_mm512_cvtx_roundph_ps): Likewise.
(_mm512_mask_cvtx_roundph_ps): Likewise.
(_mm512_maskz_cvtx_roundph_ps): Likewise.
(_mm512_cvtxps_ph): Likewise.
(_mm512_mask_cvtxps_ph): Likewise.
(_mm512_maskz_cvtxps_ph): Likewise.
(_mm512_cvtx_roundps_ph): Likewise.
(_mm512_mask_cvtx_roundps_ph): Likewise.
(_mm512_maskz_cvtx_roundps_ph): Likewise.
(_mm512_cvtpd_ph): Likewise.
(_mm512_mask_cvtpd_ph): Likewise.
(_mm512_maskz_cvtpd_ph): Likewise.
(_mm512_cvt_roundpd_ph): Likewise.
(_mm512_mask_cvt_roundpd_ph): Likewise.
(_mm512_maskz_cvt_roundpd_ph): Likewise.
* config/i386/avx512fp16vlintrin.h (_mm_cvtph_pd):
New intrinsic.
(_mm_mask_cvtph_pd): Likewise.
(_mm_maskz_cvtph_pd): Likewise.
(_mm256_cvtph_pd): Likewise.
(_mm256_mask_cvtph_pd): Likewise.
(_mm256_maskz_cvtph_pd): Likewise.
(_mm_cvtxph_ps): Likewise.
(_mm_mask_cvtxph_ps): Likewise.
(_mm_maskz_cvtxph_ps): Likewise.
(_mm256_cvtxph_ps): Likewise.
(_mm256_mask_cvtxph_ps): Likewise.
(_mm256_maskz_cvtxph_ps): Likewise.
(_mm_cvtxps_ph): Likewise.
(_mm_mask_cvtxps_ph): Likewise.
(_mm_maskz_cvtxps_ph): Likewise.
(_mm256_cvtxps_ph): Likewise.
(_mm256_mask_cvtxps_ph): Likewise.
(_mm256_maskz_cvtxps_ph): Likewise.
(_mm_cvtpd_ph): Likewise.
(_mm_mask_cvtpd_ph): Likewise.
(_mm_maskz_cvtpd_ph): Likewise.
(_mm256_cvtpd_ph): Likewise.
(_mm256_mask_cvtpd_ph): Likewise.
(_mm256_maskz_cvtpd_ph): Likewise.
* config/i386/i386-builtin.def: Add corresponding new builtins.
* config/i386/i386-builtin-types.def: Add corresponding builtin types.
* config/i386/i386-expand.c: Handle new builtin types.
* config/i386/sse.md
(VF4_128_8_256): New.
(VF48H_AVX512VL): Ditto.
(ssePHmode): Add HF vector modes.
(castmode): Add new convertible modes.
(qq2phsuff): Ditto.
(ph2pssuffix): New.
(avx512fp16_vcvt<castmode>2ph_<mode><mask_name><round_name>): Ditto.
(avx512fp16_vcvt<castmode>2ph_<mode>): Ditto.
(*avx512fp16_vcvt<castmode>2ph_<mode>): Ditto.
(avx512fp16_vcvt<castmode>2ph_<mode>_mask): Ditto.
(*avx512fp16_vcvt<castmode>2ph_<mode>_mask): Ditto.
(*avx512fp16_vcvt<castmode>2ph_<mode>_mask_1): Ditto.
(avx512fp16_float_extend_ph<mode>2<mask_name><round_saeonly_name>):
Ditto.
(avx512fp16_float_extend_ph<mode>2<mask_name>): Ditto.
(*avx512fp16_float_extend_ph<mode>2_load<mask_name>): Ditto.
(avx512fp16_float_extend_phv2df2<mask_name>): Ditto.
(*avx512fp16_float_extend_phv2df2_load<mask_name>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.
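
Illustration (not part of the patch): the _mask_ and _maskz_ variants
listed above differ only in what happens to lanes whose mask bit is clear.
The sketch below assumes the usual AVX-512 semantics (merge-masking keeps
the lane from the src operand, zero-masking clears it) for an 8-lane
conversion such as _mm512_mask_cvtph_pd / _mm512_maskz_cvtph_pd. The
function convert_masked is hypothetical, and the input lanes are passed
as float purely as a stand-in for _Float16.

```c
#include <stdint.h>

/* Hypothetical scalar model of write-masking for an 8-lane
   half-to-double conversion.  'a' holds the source lanes (float used
   as a stand-in for _Float16), 'src' supplies unselected lanes under
   merge-masking, and 'k' plays the role of the __mmask8 operand.  */
static void convert_masked(double dst[8], const double src[8], uint8_t k,
                           const float a[8], int zero_masking)
{
    for (int i = 0; i < 8; i++) {
        if (k & (1u << i))
            dst[i] = (double)a[i];  /* lane selected: convert it */
        else if (zero_masking)
            dst[i] = 0.0;           /* maskz form: clear the lane */
        else
            dst[i] = src[i];        /* mask form: keep the src lane */
    }
}
```

With k = 0x05 only lanes 0 and 2 are converted; the other six lanes come
from src in the merge-masking form and are zero in the zero-masking form.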

AVX512FP16: Add vcvttsh2si/vcvttsh2usi.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm_cvttsh_i32):
New intrinsic.
(_mm_cvttsh_u32): Likewise.
(_mm_cvtt_roundsh_i32): Likewise.
(_mm_cvtt_roundsh_u32): Likewise.
(_mm_cvttsh_i64): Likewise.
(_mm_cvttsh_u64): Likewise.
(_mm_cvtt_roundsh_i64): Likewise.
(_mm_cvtt_roundsh_u64): Likewise.
* config/i386/i386-builtin.def: Add corresponding new builtins.
* config/i386/sse.md
(avx512fp16_fix<fixunssuffix>_trunc<mode>2<round_saeonly_name>):
New.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vcvttsh2si-1a.c: New test.
* gcc.target/i386/avx512fp16-vcvttsh2si-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttsh2si64-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttsh2si64-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttsh2usi-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttsh2usi-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttsh2usi64-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttsh2usi64-1b.c: Ditto.
* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.
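
Illustration (not part of the patch): the extra "t" in vcvttsh2si and
_mm_cvttsh_i32 means truncation toward zero regardless of the MXCSR
rounding mode, which is why the pattern above only accepts SAE
(round_saeonly) rather than an embedded rounding mode. The non-truncating
vcvtsh2si follows the current rounding mode. The sketch below contrasts
the two on plain floats; the function names are hypothetical, only the
default round-to-nearest-even mode is modelled, and it assumes |x| < 2^31
and x is not NaN (out-of-range results are not modelled).

```c
#include <stdint.h>

/* Model of the truncating conversion: C's float-to-int conversion
   already truncates toward zero.  */
static int32_t cvt_truncate(float x)
{
    return (int32_t)x;
}

/* Model of the rounding conversion under the default MXCSR mode,
   round-to-nearest with ties to even.  Assumes |x| < 2^31.  */
static int32_t cvt_nearest_even(float x)
{
    int32_t t = (int32_t)x;        /* truncated integer part */
    float frac = x - (float)t;     /* exact here, lies in (-1, 1) */
    if (frac > 0.5f || (frac == 0.5f && (t & 1)))
        t += 1;                    /* round up, or tie away from odd */
    else if (frac < -0.5f || (frac == -0.5f && (t & 1)))
        t -= 1;                    /* same for negative values */
    return t;
}
```

So 2.5 becomes 2 either way, while 3.5 becomes 3 when truncated but 4
when rounded to nearest even, and -1.7 becomes -1 versus -2.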

AVX512FP16: Add testcase for vcvttph2w/vcvttph2uw/vcvttph2dq/vcvttph2udq/vcvttph2qq/vcvttph2uqq.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vcvttph2dq-1a.c: New test.
* gcc.target/i386/avx512fp16-vcvttph2dq-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttph2qq-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttph2qq-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttph2udq-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttph2udq-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttph2uqq-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttph2uqq-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttph2uw-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttph2uw-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttph2w-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvttph2w-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2dq-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2dq-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2qq-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2qq-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2udq-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2udq-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2uqq-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2uqq-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2uw-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2uw-1b.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2w-1a.c: Ditto.
* gcc.target/i386/avx512fp16vl-vcvttph2w-1b.c: Ditto.

AVX512FP16: Add vcvttph2w/vcvttph2uw/vcvttph2dq/vcvttph2qq/vcvttph2udq/vcvttph2uqq.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm512_cvttph_epi32):
New intrinsic.
(_mm512_mask_cvttph_epi32): Likewise.
(_mm512_maskz_cvttph_epi32): Likewise.
(_mm512_cvtt_roundph_epi32): Likewise.
(_mm512_mask_cvtt_roundph_epi32): Likewise.
(_mm512_maskz_cvtt_roundph_epi32): Likewise.
(_mm512_cvttph_epu32): Likewise.
(_mm512_mask_cvttph_epu32): Likewise.
(_mm512_maskz_cvttph_epu32): Likewise.
(_mm512_cvtt_roundph_epu32): Likewise.
(_mm512_mask_cvtt_roundph_epu32): Likewise.
(_mm512_maskz_cvtt_roundph_epu32): Likewise.
(_mm512_cvttph_epi64): Likewise.
(_mm512_mask_cvttph_epi64): Likewise.
(_mm512_maskz_cvttph_epi64): Likewise.
(_mm512_cvtt_roundph_epi64): Likewise.
(_mm512_mask_cvtt_roundph_epi64): Likewise.
(_mm512_maskz_cvtt_roundph_epi64): Likewise.
(_mm512_cvttph_epu64): Likewise.
(_mm512_mask_cvttph_epu64): Likewise.
(_mm512_maskz_cvttph_epu64): Likewise.
(_mm512_cvtt_roundph_epu64): Likewise.
(_mm512_mask_cvtt_roundph_epu64): Likewise.
(_mm512_maskz_cvtt_roundph_epu64): Likewise.
(_mm512_cvttph_epi16): Likewise.
(_mm512_mask_cvttph_epi16): Likewise.
(_mm512_maskz_cvttph_epi16): Likewise.
(_mm512_cvtt_roundph_epi16): Likewise.
(_mm512_mask_cvtt_roundph_epi16): Likewise.
(_mm512_maskz_cvtt_roundph_epi16): Likewise.
(_mm512_cvttph_epu16): Likewise.
(_mm512_mask_cvttph_epu16): Likewise.
(_mm512_maskz_cvttph_epu16): Likewise.
(_mm512_cvtt_roundph_epu16): Likewise.
(_mm512_mask_cvtt_roundph_epu16): Likewise.
(_mm512_maskz_cvtt_roundph_epu16): Likewise.
* config/i386/avx512fp16vlintrin.h (_mm_cvttph_epi32):
New intrinsic.
(_mm_mask_cvttph_epi32): Likewise.
(_mm_maskz_cvttph_epi32): Likewise.
(_mm256_cvttph_epi32): Likewise.
(_mm256_mask_cvttph_epi32): Likewise.
(_mm256_maskz_cvttph_epi32): Likewise.
(_mm_cvttph_epu32): Likewise.
(_mm_mask_cvttph_epu32): Likewise.
(_mm_maskz_cvttph_epu32): Likewise.
(_mm256_cvttph_epu32): Likewise.
(_mm256_mask_cvttph_epu32): Likewise.
(_mm256_maskz_cvttph_epu32): Likewise.
(_mm_cvttph_epi64): Likewise.
(_mm_mask_cvttph_epi64): Likewise.
(_mm_maskz_cvttph_epi64): Likewise.
(_mm256_cvttph_epi64): Likewise.
(_mm256_mask_cvttph_epi64): Likewise.
(_mm256_maskz_cvttph_epi64): Likewise.
(_mm_cvttph_epu64): Likewise.
(_mm_mask_cvttph_epu64): Likewise.
(_mm_maskz_cvttph_epu64): Likewise.
(_mm256_cvttph_epu64): Likewise.
(_mm256_mask_cvttph_epu64): Likewise.
(_mm256_maskz_cvttph_epu64): Likewise.
(_mm_cvttph_epi16): Likewise.
(_mm_mask_cvttph_epi16): Likewise.
(_mm_maskz_cvttph_epi16): Likewise.
(_mm256_cvttph_epi16): Likewise.
(_mm256_mask_cvttph_epi16): Likewise.
(_mm256_maskz_cvttph_epi16): Likewise.
(_mm_cvttph_epu16): Likewise.
(_mm_mask_cvttph_epu16): Likewise.
(_mm_maskz_cvttph_epu16): Likewise.
(_mm256_cvttph_epu16): Likewise.
(_mm256_mask_cvttph_epu16): Likewise.
(_mm256_maskz_cvttph_epu16): Likewise.
* config/i386/i386-builtin.def: Add new builtins.
* config/i386/sse.md
(avx512fp16_fix<fixunssuffix>_trunc<mode>2<mask_name><round_saeonly_name>):
New.
(avx512fp16_fix<fixunssuffix>_trunc<mode>2<mask_name>): Ditto.
(*avx512fp16_fix<fixunssuffix>_trunc<mode>2_load<mask_name>): Ditto.
(avx512fp16_fix<fixunssuffix>_truncv2di2<mask_name>): Ditto.
(avx512fp16_fix<fixunssuffix>_truncv2di2_load<mask_name>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.

AVX512FP16: Add testcase for vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-helper.h (V512): Add int32
component.
* gcc.target/i386/avx512fp16-vcvtsh2si-1a.c: New test.
* gcc.target/i386/avx512fp16-vcvtsh2si-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsh2si64-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsh2si64-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsh2usi-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsh2usi-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsh2usi64-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsh2usi64-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsi2sh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsi2sh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsi2sh64-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtsi2sh64-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtusi2sh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtusi2sh-1b.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtusi2sh64-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vcvtusi2sh64-1b.c: Ditto.

AVX512FP16: Add vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm_cvtsh_i32): New intrinsic.
(_mm_cvtsh_u32): Likewise.
(_mm_cvt_roundsh_i32): Likewise.
(_mm_cvt_roundsh_u32): Likewise.
(_mm_cvtsh_i64): Likewise.
(_mm_cvtsh_u64): Likewise.
(_mm_cvt_roundsh_i64): Likewise.
(_mm_cvt_roundsh_u64): Likewise.
(_mm_cvti32_sh): Likewise.
(_mm_cvtu32_sh): Likewise.
(_mm_cvt_roundi32_sh): Likewise.
(_mm_cvt_roundu32_sh): Likewise.
(_mm_cvti64_sh): Likewise.
(_mm_cvtu64_sh): Likewise.
(_mm_cvt_roundi64_sh): Likewise.
(_mm_cvt_roundu64_sh): Likewise.
* config/i386/i386-builtin-types.def: Add corresponding builtin types.
* config/i386/i386-builtin.def: Add corresponding new builtins.
* config/i386/i386-expand.c (ix86_expand_round_builtin):
Handle new builtin types.
* config/i386/sse.md
(avx512fp16_vcvtsh2<sseintconvertsignprefix>si<rex64namesuffix><round_name>):
New define_insn.
(avx512fp16_vcvtsh2<sseintconvertsignprefix>si<rex64namesuffix>_2): Likewise.
(avx512fp16_vcvt<floatsuffix>si2sh<rex64namesuffix><round_name>): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.