git.ipfire.org Git - thirdparty/gcc.git/log

Daily bump.

doc: Document correct -fwide-exec-charset defaults [PR41041]

As shown in the PR, the default is not UTF-32 but rather UTF-32BE or
UTF-32LE, avoiding the need for a byte order mark in literals.

gcc/ChangeLog:

PR c/41041
* doc/cppopts.texi: Document -fwide-exec-charset defaults
correctly.

(cherry picked from commit e50ea3a42f058c14ee29327d5277ab0435e3d36b)

Fix recent thinko in operand_equal_p

There is a thinko in a recent improvement made to operand_equal_p where
the code just looks at operand 2 of COMPONENT_REF, if it is present, to
compare addresses. That's wrong because operand 2 contains the number of
DECL_OFFSET_ALIGN-bit-sized words so, when DECL_OFFSET_ALIGN > 8, not all
the bytes are included and some of them are in DECL_FIELD_BIT_OFFSET, see
get_inner_reference for the model computation.

In other words, you would need to compare operand 2 and DECL_OFFSET_ALIGN
and DECL_FIELD_BIT_OFFSET in this situation, but I'm not sure this is worth
the hassle in practice so the fix just removes this alternate handling.

gcc/
* fold-const.cc (operand_compare::operand_equal_p) <COMPONENT_REF>:
Do not take into account operand 2.
(operand_compare::hash_operand) <COMPONENT_REF>: Likewise.

gcc/testsuite/
* gnat.dg/opt99.adb: New test.
* gnat.dg/opt99_pkg1.ads, gnat.dg/opt99_pkg1.adb: New helper.
* gnat.dg/opt99_pkg2.ads: Likewise.

Daily bump.

i386: Fix uninitialized register after peephole2 conversion [PR107404]

The eliminate reg-reg move by inverting the condition of
a cmove #2 peephole2 converts the following sequence:

  473: bx:DI=[r14:DI*0x8+r12:DI]
  960: r15:DI=r8:DI
  485: {flags:CCC=cmp(r15:DI+bx:DI,bx:DI);r15:DI=r15:DI+bx:DI;}
  737: r15:DI={(geu(flags:CCC,0))?r15:DI:bx:DI}

to:

1110: {flags:CCC=cmp(r8:DI+bx:DI,bx:DI);r8:DI=r8:DI+bx:DI;}
1111: r15:DI=[r14:DI*0x8+r12:DI]
1112: r15:DI={(geu(flags:CCC,0))?r8:DI:r15:DI}

Please note that(insn 1110) uses register BX, but its
initialization was eliminated.

Avoid conversion if eliminated move intialized a register, used
in the moved instruction.

2022-11-03  Uroš Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog:

PR target/107404
* config/i386/i386.md (eliminate reg-reg move by inverting the
condition of a cmove #2 peephole2): Check if eliminated move
initialized a register, used in the moved instruction.

gcc/testsuite/ChangeLog:

PR target/107404
* g++.target/i386/pr107404.C: New test.

(cherry picked from commit 553b1d3dd5b9253ebdf66ee3260c717d5b807dd1)

c, c++: Fix up excess precision handling of scalar_to_vector conversion [PR107358]

As mentioned earlier in the C++ excess precision support mail, the following
testcase is broken with excess precision both in C and C++ (though just in C++
it was triggered in real-world code).
scalar_to_vector is called in both FEs after the excess precision promotions
(or stripping of EXCESS_PRECISION_EXPR), so we can then get invalid
diagnostics that say float vector + float involves truncation (on ia32
from long double to float).

The following patch fixes that by calling scalar_to_vector on the operands
before the excess precision promotions, let scalar_to_vector just do the
diagnostics (it does e.g. fold_for_warn so it will fold
EXCESS_PRECISION_EXPR around REAL_CST to constants etc.) but will then
do the actual conversions using the excess precision promoted operands
(so say if we have vector double + (float + float) we don't actually do
vector double + (float) ((long double) float + (long double) float)
but
vector double + (double) ((long double) float + (long double) float)

2022-10-24 Jakub Jelinek <jakub@redhat.com>

PR c++/107358
gcc/c/
* c-typeck.cc (build_binary_op): Pass operands before excess precision
promotions to scalar_to_vector call.
gcc/testsuite/
* c-c++-common/pr107358.c: New test.

(cherry picked from commit 65e3274e363cb2c6bfe6b5e648916eb7696f7e2f)

c++: Fix up constexpr handling of char/signed char/short pre/post inc/decrement [PR105774]

signed char, char or short int pre/post inc/decrement are represented by
normal {PRE,POST}_{INC,DEC}REMENT_EXPRs in the FE and only gimplification
ensures that the {PLUS,MINUS}_EXPR is done in unsigned version of those
types:
    case PREINCREMENT_EXPR:
    case PREDECREMENT_EXPR:
    case POSTINCREMENT_EXPR:
    case POSTDECREMENT_EXPR:
      {
        tree type = TREE_TYPE (TREE_OPERAND (*expr_p, 0));
        if (INTEGRAL_TYPE_P (type) && c_promoting_integer_type_p (type))
          {
            if (!TYPE_OVERFLOW_WRAPS (type))
              type = unsigned_type_for (type);
            return gimplify_self_mod_expr (expr_p, pre_p, post_p, 1, type);
          }
        break;
      }
This means during constant evaluation we need to do it similarly (either
using unsigned_type_for or using widening to integer_type_node).
The following patch does the latter.

2022-10-24  Jakub Jelinek  <jakub@redhat.com>

PR c++/105774
* constexpr.cc (cxx_eval_increment_expression): For signed types
that promote to int, evaluate PLUS_EXPR or MINUS_EXPR in int type.

* g++.dg/cpp1y/constexpr-105774.C: New test.

(cherry picked from commit da8c362c4c18cff2f2dfd5c4706bdda7576899a4)

libgomp: Fix up creation of artificial teams

When not in explicit parallel/target/teams construct, we in some cases create
an artificial parallel with a single thread (either to handle target nowait
or for task reduction purposes).  In those cases, it handled again artificially
created implicit task (created by gomp_new_icv for cases where we needed to write
to some ICVs), but as the testcases show, didn't take into account possibility
of this being done from explicit task(s).  The code would destroy/free the previous
task and replace it with the new implicit task.  If task is an explicit task
(when teams is NULL, all explicit tasks behave like if (0)), it is a pointer to
a local stack variable, so freeing it doesn't work, and additionally we shouldn't
lose the explicit tasks - the new implicit task should instead replace the
ancestor task which is the first implicit one.

2022-10-12  Jakub Jelinek  <jakub@redhat.com>

* task.c (gomp_create_artificial_team): Fix up handling of invocations
from within explicit task.
* target.c (GOMP_target_ext): Likewise.
* testsuite/libgomp.c/task-7.c: New test.
* testsuite/libgomp.c/task-8.c: New test.
* testsuite/libgomp.c-c++-common/task-reduction-17.c: New test.
* testsuite/libgomp.c-c++-common/task-reduction-18.c: New test.

(cherry picked from commit a58a965eb73253759f6a3e1c7380392557da89c8)

tree-cfg: Fix a verification diagnostic typo [PR107121]

Obvious typo in diagnostics.

2022-10-02 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/107121
* tree-cfg.cc (verify_gimple_call): Fix a typo in diagnostics,
DEFFERED_INIT -> DEFERRED_INIT.

(cherry picked from commit d01bd0b0f3b8f4c33c437ff10f0b949200627f56)

openmp: Fix ICE with taskgroup at -O0 -fexceptions [PR107001]

The following testcase ICEs because with -O0 -fexceptions GOMP_taskgroup_end
call isn't directly followed by GOMP_RETURN statement, but there are some
conditionals to handle exceptions and we fail to find the correct GOMP_RETURN.

The fix is to treat taskgroup similarly to target data, both of these constructs
emit a try { body } finally { end_call } around the construct's body during
gimplification and we need to see proper construct nesting during gimplification
and omp lowering (including nesting of regions checks), but during omp expansion
we don't really need their nesting anymore, all we need is emit something at
the start of the region and the end of the region is the end API call we've
already emitted during gimplification.  For target data, we weren't adding
GOMP_RETURN statement during omp lowering, so after that pass it is treated
merely like stand-alone omp directives.  This patch does the same for
taskgroup too.

2022-09-24  Jakub Jelinek  <jakub@redhat.com>

PR c/107001
* omp-low.cc (lower_omp_taskgroup): Don't add GOMP_RETURN statement
at the end.
* omp-expand.cc (build_omp_regions_1): Clarify GF_OMP_TARGET_KIND_DATA
is not stand-alone directive.  For GIMPLE_OMP_TASKGROUP, also don't
update parent.
(omp_make_gimple_edges) <case GIMPLE_OMP_TASKGROUP>: Reset
cur_region back after new_omp_region.

* c-c++-common/gomp/pr107001.c: New test.

(cherry picked from commit ad2aab5c816a6fd56b46210c0a4a4c6243da1de9)

openmp, c: Tighten up c_tree_equal [PR106981]

This patch changes c_tree_equal to work more like cp_tree_equal, be
more strict in what it accepts.  The ICE on the first testcase was
due to INTEGER_CST wi::wide (t1) == wi::wide (t2) comparison which
ICEs if the two constants have different precision, but as the second
testcase shows, being too lenient in it can also lead to miscompilation
of valid OpenMP programs where we think certain expression is the same
even when it isn't and can be guaranteed at runtime to represent different
memory location.  So, the patch looks through only NON_LVALUE_EXPRs
and for constants as well as casts requires that the types match before
actually comparing the constant values or recursing on the cast operands.

2022-09-24  Jakub Jelinek  <jakub@redhat.com>

PR c/106981
gcc/c/
* c-typeck.cc (c_tree_equal): Only strip NON_LVALUE_EXPRs at the
start.  For CONSTANT_CLASS_P or CASE_CONVERT: return false if t1 and
t2 have different types.
gcc/testsuite/
* c-c++-common/gomp/pr106981.c: New test.
libgomp/
* testsuite/libgomp.c-c++-common/pr106981.c: New test.

(cherry picked from commit 3c5bccb608c665ac3f62adb1817c42c845812428)

openmp: Fix handling of target constructs in static member functions [PR106829]

Just calling current_nonlambda_class_type in static member functions returns
non-NULL, but something that isn't *this and if unlucky can match part of the
IL and can be added to target clauses.
      if (DECL_NONSTATIC_MEMBER_P (decl)
          && current_class_ptr)
is a guard used elsewhere (in check_accessibility_of_qualified_id).

2022-09-07  Jakub Jelinek  <jakub@redhat.com>

PR c++/106829
* semantics.cc (finish_omp_target_clauses): If current_function_decl
isn't a nonstatic member function, don't set data.current_object to
non-NULL.

* g++.dg/gomp/pr106829.C: New test.

(cherry picked from commit e90af965e5c858ba02c0cdfbac35d0a19da1c2f6)

Daily bump.

Fortran: Add missing TKR initialization to class variables [PR100097, PR100098]

gcc/fortran/ChangeLog:

PR fortran/100097
PR fortran/100098
* trans-array.cc (gfc_trans_class_array): New function to
initialize class descriptor's TKR information.
* trans-array.h (gfc_trans_class_array): Add function prototype.
* trans-decl.cc (gfc_trans_deferred_vars): Add calls to the new
function for both pointers and allocatables.

gcc/testsuite/ChangeLog:

PR fortran/100097
PR fortran/100098
* gfortran.dg/PR100097.f90: New test.
* gfortran.dg/PR100098.f90: New test.

(cherry picked from commit 4cfdaeb2755121ac1069f09898def56469b0fb51)

Daily bump.

Fortran: BOZ literal constants are not compatible to any type [PR103413]

gcc/fortran/ChangeLog:

PR fortran/103413
* symbol.cc (gfc_type_compatible): A boz-literal-constant has no type
and thus is not considered compatible to any type.

gcc/testsuite/ChangeLog:

PR fortran/103413
* gfortran.dg/illegal_boz_arg_4.f90: New test.

(cherry picked from commit f7d28818179247685f3c101f9f2f16366f56309b)

OpenACC: Don't gang-privatize artificial variables [PR90115]

This patch prevents compiler-generated artificial variables from being
treated as privatization candidates for OpenACC.

The rationale is that e.g. "gang-private" variables actually must be
shared by each worker and vector spawned within a particular gang, but
that sharing is not necessary for any compiler-generated variable (at
least at present, but no such need is anticipated either). Variables on
the stack (and machine registers) are already private per-"thread"
(gang, worker and/or vector), and that's fine for artificial variables.

We're restricting this to blocks, as we still need to understand what it
means for a 'DECL_ARTIFICIAL' to appear in a 'private' clause.

Several tests need their scan output patterns adjusted to compensate.

2022-10-14 Julian Brown <julian@codesourcery.com>

PR middle-end/90115
gcc/
* omp-low.cc (oacc_privatization_candidate_p): Artificial vars are not
privatization candidates.

libgomp/
* testsuite/libgomp.oacc-fortran/declare-1.f90: Adjust scan output.
* testsuite/libgomp.oacc-fortran/host_data-5.F90: Likewise.
* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/print-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.

Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
(cherry picked from commit 11e811d8e2f63667f60f73731bb934273f5882b8)

Daily bump.

lto: do not load LTO stream for aliases [PR107418]

PR lto/107418

gcc/lto/ChangeLog:

* lto-dump.cc (lto_main): Do not load LTO stream for aliases.

(cherry picked from commit be6c75547385c69706370f4e792b04295f708a5a)

IRA: Make sure array is big enough

In commit 081c96621da, the call to resize_reg_info() was moved before
the call to remove_scratches() and the latter one can increase the
number of regs and that would cause an out of bounds usage on the
reg_renumber global array.

Without this patch, the following testcase randomly fails with:
during RTL pass: ira
In file included from /src/gcc/testsuite/gcc.dg/compat/struct-by-value-5b_y.c:13:
/src/gcc/testsuite/gcc.dg/compat/struct-by-value-5b_y.c: In function 'checkgSf13':
/src/gcc/testsuite/gcc.dg/compat/fp-struct-test-by-value-y.h:28:1: internal compiler error: Segmentation fault
/src/gcc/testsuite/gcc.dg/compat/struct-by-value-5b_y.c:22:1: note: in expansion of macro 'TEST'

gcc/ChangeLog:

* ira.cc: Resize array after reg number increased.

Co-Authored-By: Yvan ROUX <yvan.roux@foss.st.com>
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
(cherry picked from commit 4e1d704243a4f3c4ded47cd0d02427bb7efef069)

Daily bump.

aarch64: update Ampere-1 core definition

This brings the extensions detected by -mcpu=native on Ampere-1 systems
in sync with the defaults generated for -mcpu=ampere1.

Note that some early kernel versions on Ampere1 may misreport the
presence of PAUTH and PREDRES (i.e., -mcpu=native will add 'nopauth'
and 'nopredres').

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Update
Ampere-1 core entry.

(cherry picked from commit db2f5d661239737157cf131de7d4df1c17d8d88d)

aarch64: fix off-by-one in reading cpuinfo

Fixes: 341573406b39
Don't subtract one from the result of strnlen() when trying to point
to the first character after the current string. This issue would
cause individual characters (where the 128 byte buffers are stitched
together) to be lost.

gcc/ChangeLog:

* config/aarch64/driver-aarch64.cc (readline): Fix off-by-one.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/info_18: New test.
* gcc.target/aarch64/cpunative/native_cpu_18.c: New test.

(cherry picked from commit b1cfbccc41de6aec950c0f662e7e85ab34bfff8a)

Daily bump.

Relax assertion in profiler

This assertion in branch_prob:

  if (bb == ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb)
    {
      location_t loc = DECL_SOURCE_LOCATION (current_function_decl);
      gcc_checking_assert (!RESERVED_LOCATION_P (loc));

had been correct until the fix for PR debug/101598 was installed.

gcc/
* profile.cc (branch_prob): Be prepared for ignored functions with
DECL_SOURCE_LOCATION set to UNKNOWN_LOCATION.

gcc/testsuite:
* gnat.dg/specs/coverage1.ads: New test.
* gnat.dg/specs/variant_part.ads: Minor tweak.
* gnat.dg/specs/weak1.ads: Add dg directive.

IBM zSystems: Fix function_ok_for_sibcall [PR106355]

For a parameter with BLKmode we cannot use REG_NREGS in order to
determine the number of consecutive registers. Streamlined this with
the implementation of s390_function_arg.

Fix some indentation whitespace, too.

gcc/ChangeLog:

PR target/106355
* config/s390/s390.cc (s390_call_saved_register_used): For a
parameter with BLKmode fix determining number of consecutive
registers.

gcc/testsuite/ChangeLog:

* gcc.target/s390/pr106355.h: Common code for new tests.
* gcc.target/s390/pr106355-1.c: New test.
* gcc.target/s390/pr106355-2.c: New test.
* gcc.target/s390/pr106355-3.c: New test.

(cherry picked from commit cb994acc08b67f26a54e7c5dc1f4995a2ce24d98)

i386: fix pedantic warning

PR target/107364

gcc/ChangeLog:

* common/config/i386/i386-cpuinfo.h (enum processor_vendor):
Fix pedantic warning.

(cherry picked from commit f3f000b7689ce9eb6364808072025672af1e4e1b)

x86: fix VENDOR_MAX enum value

PR target/107364

gcc/ChangeLog:

* common/config/i386/i386-cpuinfo.h (enum processor_vendor):
Reorder enum values as BUILTIN_VENDOR_MAX should not point
in the middle of the valid enum values.

(cherry picked from commit f751bf4c5d1aaa1aacfcbdec62881c5ea1175dfb)

Daily bump.

Fortran: error recovery with references of bad array constructors [PR105633]

gcc/fortran/ChangeLog:

PR fortran/105633
* expr.cc (find_array_section): Move check for NULL pointers so
that both subscript triplets and vector subscripts are covered.

gcc/testsuite/ChangeLog:

PR fortran/105633
* gfortran.dg/pr105633.f90: New test.

Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>
(cherry picked from commit ecb20df4fa6d99daa635c7fb662dc0554610777e)

Daily bump.

Add 'libgomp.oacc-c-c++-common/private-big-1.c' [PR105421]

After commit r13-3404-g7c55755d4c760de326809636531478fd7419e1e5
"amdgcn: Use FLAT addressing for all functions with pointer arguments [PR105421]",
"big" private data now works for GCN offloading, too.

PR target/105421
libgomp/
* testsuite/libgomp.oacc-c-c++-common/private-big-1.c: New.

(cherry picked from commit c7ebee2378426eeca425ca5406af213a926f154c)

amdgcn: Use FLAT addressing for all functions with pointer arguments [PR105421]

The GCN backend uses a heuristic to determine whether to use FLAT or
GLOBAL addressing in a particular (offload) function: namely, if a
function takes a pointer-to-scalar parameter, it is assumed that the
pointer may refer to "flat scratch" space, and thus FLAT addressing must
be used instead of GLOBAL.

I came up with this heuristic initially whilst working on support for
moving OpenACC gang-private variables into local-data share (scratch)
memory. The assumption that only scalar variables would be transformed in
that way turned out to be wrong.  For example, prior to the next patch in
the series, Fortran compiler-generated temporary structures were treated
as gang private and moved to LDS space, typically overflowing the region
allocated for such variables.  That will no longer happen after that
patch is applied, but there may be other cases of structs moving to LDS
space now or in the future that this patch may be needed for.

2022-10-14  Julian Brown  <julian@codesourcery.com>

PR target/105421
gcc/
* config/gcn/gcn.cc (gcn_detect_incoming_pointer_arg): Any pointer
argument forces FLAT addressing mode, not just
pointer-to-non-aggregate.

(cherry picked from commit 7c55755d4c760de326809636531478fd7419e1e5)

tree-optimization/107323 - loop distribution partition ordering issue

The following reverts part of the PR94125 fix which causes us to
use a bogus partition ordering after applying versioning for
alias to the testcase in PR107323. Instead PR94125 is fixed by
appropriately considering to be merged SCCs when skipping edges
we want to ignore because of the alias versioning.

PR tree-optimization/107323
* tree-loop-distribution.cc (pg_unmark_merged_alias_ddrs):
New function.
(loop_distribution::break_alias_scc_partitions): Revert
postorder save/restore from the PR94125 fix. Instead
make sure to not ignore edges from SCCs we are going to
merge.

* gcc.dg/tree-ssa/pr107323.c: New testcase.

(cherry picked from commit 09f9814dc02c161ed78604c6df70b19b596f7524)

Daily bump.

Make 'c-c++-common/goacc/kernels-decompose-pr100400-1-*.c' behave consistently, regardless of checking level

Fix-up for commit c14ea6a72fb1ae66e3d32ac8329558497c6e4403
"Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition
[PR100400, PR103836, PR104061]".

For C++ compilation of 'c-c++-common/goacc/kernels-decompose-pr100400-1-2.c',
we first emit a 'sorry' diagnostic, and then a 'gcc_unreachable' (or
'internal_error', see below) diagnostic, but for example, for
'--enable-checking=release' (thus, '!CHECKING_P'), the second one may actually
be turned into a 'confused by earlier errors, bailing out' diagnostic. (See
'gcc/diagnostic.cc:diagnostic_report_diagnostic': "When not checking, ICEs are
converted to fatal errors when an error has already occurred.") Thus, make
'c-c++-common/goacc/kernels-decompose-pr100400-1-2.c' behave consistently via
'-Wfatal-errors', and thus only matching the 'sorry' diagnostic.

For example, for '--enable-checking=no' (thus, '!ENABLE_ASSERT_CHECKING'), a
call to 'gcc_unreachable' cannot be assumed emit an 'internal_error'-like
diagnostic, so explicitly call 'internal_error' in
'gcc/omp-oacc-kernels-decompose.cc:visit_loops_in_gang_single_region', in the
'GIMPLE_OMP_FOR' case, to avoid regressing
'c-c++-common/goacc/kernels-decompose-pr100400-1-3.c', and
'c-c++-common/goacc/kernels-decompose-pr100400-1-4.c'.

PR middle-end/100400
gcc/
* omp-oacc-kernels-decompose.cc
(visit_loops_in_gang_single_region) <GIMPLE_OMP_FOR>: Explicitly
call 'internal_error'.
gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-pr100400-1-2.c: Specify
'-Wfatal-errors'.

(cherry picked from commit da6305558bab9e24943848e4fc5bd8738d7e8f9b)

aarch64: Prevent generation of /M BRKAS and BRKBS

Bit of a brown-paper-bag bug, but: GCC was generating
non-existent merging forms of BRKAS and BRKBS. Those
instructions only support zero predication (although
BRKA and BRKB support both).

gcc/
* config/aarch64/aarch64-sve.md (*aarch64_brk<brk_op>_cc): Remove
merging alternative.
(*aarch64_brk<brk_op>_ptest): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/brka_1.c: Expect a separate
PTEST instruction.
* gcc.target/aarch64/sve/acle/general/brkb_1.c: Likewise.

(cherry picked from commit 57675c7f92a3bd3ca8dae1faac7f2f51d40e0f9e)

aarch64: Fix matching of BRKNS

Unlike other flag-setting SVE instructions, BRKNS sets the flags
based on an all-true governing predicate, rather than the GP operand.

gcc/
* config/aarch64/iterators.md (SVE_BRKP): New iterator.
* config/aarch64/aarch64-sve.md (*aarch64_brkn_cc): New pattern.
(*aarch64_brkn_ptest): Likewise.
(*aarch64_brk<brk_op>_cc): Restrict to SVE_BRKP.
(*aarch64_brk<brk_op>_ptest): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/brkn_1.c: Expect separate
PTEST instructions.
* gcc.target/aarch64/sve/acle/general/brkn_2.c: New test.

(cherry picked from commit 6bec66640597e2604f51fc1642c7d279164cd442)

aarch64: Define __ARM_FEATURE_RCPC

https://github.com/ARM-software/acle/pull/199 adds a new feature
macro for RCPC, for use in things like inline assembly.  This patch
adds the associated support to GCC.

Also, RCPC is required for Armv8.3-A and later, but the armv8.3-a
entry didn't include it.  This was probably harmless in practice
since GCC simply ignored the extension until now.  (The GAS
definition is OK.)

gcc/
* config/aarch64/aarch64.h (AARCH64_FL_FOR_ARCH8_3): Add
AARCH64_FL_RCPC.
(AARCH64_ISA_RCPC): New macro.
* config/aarch64/aarch64-cores.def (thunderx3t110, zeus, neoverse-v1)
(neoverse-512tvb, saphira): Remove RCPC from these Armv8.3-A+ cores.
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_RCPC when appropriate.

gcc/testsuite/
* gcc.target/aarch64/pragma_cpp_predefs_1.c: Add RCPC tests.

Daily bump.

libstdc++: eh_globals: gthreads: reset _S_init before deleting key

Clear __eh_globals_init's _S_init in the dtor before deleting the
gthread key.

This ensures that, in case any code involved in deleting the key
interacts with eh_globals, the key that is being deleted won't be
used, and the non-thread-specific eh_globals fallback will.

for libstdc++-v3/ChangeLog

* libsupc++/eh_globals.cc [!_GLIBCXX_HAVE_TLS]
(__eh_globals_init::~__eh_globals_init): Clear _S_init first.

(cherry picked from commit a33dda016e5acf9c6325ce8a72a1b0238130374e)

Remove undefined behaviour from testscase.

There was a patch posted to remove the undefined behaviour from this
testcase, but it appear to never have been applied.

gcc/teststuite/
PR tree-optimization/102892
* gcc.dg/pr102892-1.c: Remove undefined behaviour.

rs6000: Fix the condition with frame_pointer_needed_indeed [PR96072]

As PR96072 shows, the code adding REG_CFA_DEF_CFA reg note
makes one assumption that we have emitted one insn which
restores the frame pointer previously. That part of code
was guarded with flag frame_pointer_needed before, it was
consistent, but it was replaced with flag
frame_pointer_needed_indeed since commit r10-7981. It
caused ICE due to unexpected NULL insn.

PR target/96072

gcc/ChangeLog:

* config/rs6000/rs6000-logue.cc (rs6000_emit_epilogue): Update the
condition for adding REG_CFA_DEF_CFA reg note with
frame_pointer_needed_indeed.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr96072.c: New test.

(cherry picked from commit 5be0950d22209f5ba69d244387228e12389a8470)

rs6000: Fix condition of define_expand vec_shr_<mode> [PR100645]

PR100645 exposes one latent bug in define_expand vec_shr_<mode>
that the current condition TARGET_ALTIVEC is too loose. The
mode iterator VEC_L contains a few modes, they are not always
supported as vector mode, VECTOR_UNIT_ALTIVEC_OR_VSX_P should
be used like some other VEC_L usages.

PR target/100645

gcc/ChangeLog:

* config/rs6000/vector.md (vec_shr_<mode>): Replace condition
TARGET_ALTIVEC with VECTOR_UNIT_ALTIVEC_OR_VSX_P.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr100645.c: New test.

(cherry picked from commit bfad7069b74c97000b698191c1945f07a6192db5)

Daily bump.

Fix register count when not splitting Complex IEEE 128-bit args.

For ABI_V4, we do not split complex args. This created a problem because
even though an arg would be passed in two VSX regs, we were only advancing the
function arg counter by one VSX register. Fixed with this patch.

PR target/99685

gcc/
* config/rs6000/rs6000-call.cc (rs6000_function_arg_advance_1): Bump
register count when not splitting IEEE 128-bit Complex.

(cherry picked from commit 2ee68beee709e48fce85b8892ff9985acc6a91a8)

tree-optimization/107254 - check and support live lanes from permutes

The following fixes an omission from adding SLP permute nodes which
is live lanes originating from those. We have to check that we
can extract the lane and have to actually code generate them.

PR tree-optimization/107254
* tree-vect-slp.cc (vect_slp_analyze_node_operations_1):
For permutes also analyze live lanes.
(vect_schedule_slp_node): For permutes also code generate
live lane extracts.

* gfortran.dg/vect/pr107254.f90: New testcase.

(cherry picked from commit 9ed4a849afb5b18b462bea311e7eee454c2c9f68)

tree-optimization/107212 - SLP reduction of reduction paths

The following fixes an issue with how we handle epilogue generation
for SLP reductions of reduction paths where the actual live lanes
are not "canonical". We need to make sure to identify all live
lanes as reductions and thus have to iterate over all participating
SLP lanes when walking the reduction SSA use-def chain. Also the
previous attempt likely to mitigate such issue in
vectorizable_live_operation is misguided and has to be removed.

PR tree-optimization/107212
* tree-vect-loop.cc (vectorizable_reduction): Make sure to
set STMT_VINFO_REDUC_DEF for all live lanes in a SLP
reduction.
(vectorizable_live_operation): Do not pun to the SLP
node representative for reduction epilogue generation.

* gcc.dg/vect/pr107212-1.c: New testcase.
* gcc.dg/vect/pr107212-2.c: Likewise.

(cherry picked from commit ee467644c53ee2f7d633a8e1f53603feafab4351)

tree-optimization/107160 - avoid reusing multiple accumulators

Epilogue vectorization is not set up to re-use a vectorized
accumulator consisting of more than one vector. For non-SLP
we always reduce to a single but for SLP that isn't happening.
In such case we currenlty miscompile the epilog so avoid this.

PR tree-optimization/107160
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Do not register accumulator if we failed to reduce it
to a single vector.

* gcc.dg/vect/pr107160.c: New testcase.

(cherry picked from commit 5cbaf84c191b9a3e3cb26545c808d208bdbf2ab5)

tree-optimization/107107 - tail-merging VN wrong-code

The following fixes an unintended(?) side-effect of the special
MODIFY_EXPR expression entries we add for tail-merging during VN.
We shouldn't value-number the virtual operand differently here.

PR tree-optimization/107107
* tree-ssa-sccvn.cc (visit_reference_op_store): Do not
affect value-numbering when doing the tail merging
MODIFY_EXPR lookup.

* gcc.dg/pr107107.c: New testcase.

(cherry picked from commit 85333b9265720fc4e49397301cb16324d2b89aa7)

tree-optimization/106922 - extend same-val clobber FRE

The following extends the skipping of same valued stores to
handle an arbitrary number of them as long as they are from the
same value (which we now record). That's an obvious extension
which allows to optimize the m_engaged member of std::optional
more reliably.

PR tree-optimization/106922
* tree-ssa-sccvn.cc (vn_reference_lookup_3): Allow
an arbitrary number of same valued skipped stores.

* g++.dg/torture/pr106922.C: New testcase.

(cherry picked from commit af611afe5fcc908a6678b5b205fb5af7d64fbcb2)

testsuite: Fix up pr106922.C test

On Thu, Sep 22, 2022 at 01:10:08PM +0200, Richard Biener via Gcc-patches wrote:
>       * g++.dg/tree-ssa/pr106922.C: Adjust.

> --- a/gcc/testsuite/g++.dg/tree-ssa/pr106922.C
> +++ b/gcc/testsuite/g++.dg/tree-ssa/pr106922.C
> @@ -87,5 +87,4 @@ void testfunctionfoo() {
>    }
>  }
>
> -// { dg-final { scan-tree-dump-times "Found fully redundant value" 4 "pre" { xfail { ! lp64 } } } }
> -// { dg-final { scan-tree-dump-not "m_initialized" "cddce3" { xfail { ! lp64 } } } }
> +// { dg-final { scan-tree-dump-not "m_initialized" "dce3" } }

I've noticed
+UNRESOLVED: g++.dg/tree-ssa/pr106922.C  -std=gnu++20  scan-tree-dump-not dce3 "m_initialized"
+UNRESOLVED: g++.dg/tree-ssa/pr106922.C  -std=gnu++2b  scan-tree-dump-not dce3 "m_initialized"
with this change, both on x86_64 and i686.
The dump is still cddce3, additionally as the last reference to the pre
dump is gone, not sure it is worth creating that dump.

With the following patch, there aren't FAILs nor UNRESOLVED tests with
GXX_TESTSUITE_STDS=98,11,14,17,20,2b make check-g++ RUNTESTFLAGS="--target_board=unix\{-m32,-m64\} dg.exp='pr106922.C'"

2022-09-23  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/106922
* g++.dg/tree-ssa/pr106922.C: Scan in cddce3 dump rather than
dce3.  Remove -fdump-tree-pre-details from dg-options.

(cherry picked from commit a0de11d0d22054b6fd76a0730a3ec807542379d0)

tree-optimization/106922 - missed FRE/PRE

The following enhances the store-with-same-value trick in
vn_reference_lookup_3 by not only looking for

  a = val;
  *ptr = val;
  .. = a;

but also

  *ptr = val;
  other = x;
  .. = a;

where the earlier store is more than one hop away.  It does this
by queueing the actual value to compare until after the walk but
as disadvantage only allows a single such skipped store from a
constant value.

Unfortunately we cannot handle defs from non-constants this way
since we're prone to pick up values from the past loop iteration
this way and we have no good way to identify values that are
invariant in the currently iterated cycle.  That's why we keep
the single-hop lookup for those cases.  gcc.dg/tree-ssa/pr87126.c
would be a testcase that's un-XFAILed when we'd handle those
as well.

PR tree-optimization/106922
* tree-ssa-sccvn.cc (vn_walk_cb_data::same_val): New member.
(vn_walk_cb_data::finish): Perform delayed verification of
a skipped may-alias.
(vn_reference_lookup_pieces): Likewise.
(vn_reference_lookup): Likewise.
(vn_reference_lookup_3): When skipping stores of the same
value also handle constant stores that are more than a
single VDEF away by delaying the verification.

* gcc.dg/tree-ssa/ssa-fre-100.c: New testcase.
* g++.dg/tree-ssa/pr106922.C: Adjust.

(cherry picked from commit 9baee6181b4e427e0b5ba417e51424c15858dce7)

Daily bump.

Fix PR target/107248

This is the infamous PR rtl-optimization/38644 rearing its ugly head for
leaf functions on SPARC more than a decade later... Richard E.'s generic
solution has never been implemented so let's do as other RISC back-ends did.

gcc/
PR target/107248
* config/sparc/sparc.cc (sparc_expand_prologue): Emit a frame
blockage for leaf functions.
(sparc_flat_expand_prologue): Emit frame instead of full blockage.
(sparc_expand_epilogue): Emit a frame blockage for leaf functions.
(sparc_flat_expand_epilogue): Emit frame instead of full blockage.

Daily bump.

c++: ICE with VEC_INIT_EXPR and defarg [PR106925]

Since r12-8066, in cxx_eval_vec_init we perform expand_vec_init_expr
while processing the default argument in this test. At this point
start_preparsed_function hasn't yet set current_function_decl.
expand_vec_init_expr then leads to maybe_splice_retval_cleanup which
checks DECL_CONSTRUCTOR_P (current_function_decl) without checking that
c_f_d is non-null first. It seems correct that c_f_d is null here, so
it seems to me that maybe_splice_retval_cleanup should check c_f_d as
in the following patch.

PR c++/106925

gcc/cp/ChangeLog:

* except.cc (maybe_splice_retval_cleanup): Check current_function_decl.
Make the bool const.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist-defarg3.C: New test.

(cherry picked from commit 3130e70dab1e64a7b014391fe941090d5f3b6b7d)

install.texi: gcn - update llvm reqirements, gcn/nvptx - newlib use version

gcc/
* doc/install.texi (Specific): Add missing items to bullet list.
(amdgcn): Update LLVM requirements, use version not date for newlib.
(nvptx): Use version not git hash for newlib.

(cherry picked from commit e886ebd17965d78f609b62479f4f48085108389c)

Daily bump.

fortran: Move clobbers after evaluation of all arguments [PR106817]

For actual arguments whose dummy is INTENT(OUT), we used to generate
clobbers on them at the same time we generated the argument reference
for the function call. This was wrong if for an argument coming
later, the value expression was depending on the value of the just-
clobbered argument, and we passed an undefined value in that case.

With this change, clobbers are collected separatedly and appended
to the procedure call preliminary code after all the arguments have been
evaluated.

PR fortran/106817

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_conv_procedure_call): Collect all clobbers
to their own separate block. Append the block of clobbers to
the procedure preliminary block after the argument evaluation
codes for all the arguments.

gcc/testsuite/ChangeLog:

* gfortran.dg/intent_optimize_4.f90: New test.

(cherry picked from commit 29919bf3b6449bafd02e795abbb1966e3990c1fc)

fortran: Fix invalid function decl clobber ICE [PR105012]

The fortran frontend, as result symbol for a function without
declared result symbol, uses the function symbol itself. This caused
an invalid clobber of a function decl to be emitted, leading to an
ICE, whereas the intended behaviour was to clobber the function result
variable. This change fixes the problem by getting the decl from the
just-retrieved variable reference after the call to
gfc_conv_expr_reference, instead of copying it from the frontend symbol.

PR fortran/105012

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_conv_procedure_call): Retrieve variable
from the just calculated variable reference.

gcc/testsuite/ChangeLog:

* gfortran.dg/intent_out_15.f90: New test.

(cherry picked from commit edaf1e005c90b311c39b46d85cea17befbece112)

fortran: Move the clobber generation code

This change inlines the clobber generation code from
gfc_conv_expr_reference to the single caller from where the add_clobber
flag can be true, and removes the add_clobber argument.

What motivates this is the standard making the procedure call a cause
for a variable to become undefined, which translates to a clobber
generation, so clobber generation should be closely related to procedure
call generation, whereas it is rather orthogonal to variable reference
generation. Thus the generation of the clobber feels more appropriate
in gfc_conv_procedure_call than in gfc_conv_expr_reference.

Behaviour remains unchanged.

gcc/fortran/ChangeLog:

* trans.h (gfc_conv_expr_reference): Remove add_clobber
argument.
* trans-expr.cc (gfc_conv_expr_reference): Ditto. Inline code
depending on add_clobber and conditions controlling it ...
(gfc_conv_procedure_call): ... to here.

(cherry picked from commit 2b393f6f83903cb836676bbd042c1b99a6e7e6f7)

Daily bump.

arm: Fix constant immediates predicates and constraints for some MVE builtins

Several MVE builtins incorrectly use the same predicate/constraint
pair for several modes, which does not match the specification.
This patch uses the appropriate iterator instead.

2022-09-06 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/mve.md (mve_vqshluq_n_s<mode>): Use
MVE_pred/MVE_constraint instead of mve_imm_7/Ra.
(mve_vqshluq_m_n_s<mode>): Likewise.
(mve_vqrshrnbq_n_<supf><mode>): Use MVE_pred3/MVE_constraint3
instead of mve_imm_8/Rb.
(mve_vqrshrunbq_n_s<mode>): Likewise.
(mve_vqrshrntq_n_<supf><mode>): Likewise.
(mve_vqrshruntq_n_s<mode>): Likewise.
(mve_vrshrnbq_n_<supf><mode>): Likewise.
(mve_vrshrntq_n_<supf><mode>): Likewise.
(mve_vqrshrnbq_m_n_<supf><mode>): Likewise.
(mve_vqrshrntq_m_n_<supf><mode>): Likewise.
(mve_vrshrnbq_m_n_<supf><mode>): Likewise.
(mve_vrshrntq_m_n_<supf><mode>): Likewise.
(mve_vqrshrunbq_m_n_s<mode>): Likewise.
(mve_vsriq_n_<supf><mode): Use MVE_pred2/MVE_constraint2 instead
of mve_imm_selective_upto_8/Rg.
(mve_vsriq_m_n_<supf><mode>): Likewise.

(cherry-picked from c3fb6658c7670e446f2fd00984404d971e416b3c)

tree-optimization/106934 - avoid BIT_FIELD_REF of bitfields

The following avoids creating BIT_FIELD_REF of bitfields in
update-address-taken. The patch doesn't implement punning to
a full precision integer type but leaves a comment according to
that.

PR tree-optimization/106934
* tree-ssa.cc (non_rewritable_mem_ref_base): Avoid BIT_FIELD_REFs
of bitfields.
(maybe_rewrite_mem_ref_base): Likewise.

* gfortran.dg/pr106934.f90: New testcase.

(cherry picked from commit 05f5c42cb42c5088187d44cc45a5f671d19ad8c5)

tree-optimization/106922 - PRE and virtual operand translation

PRE implicitely keeps virtual operands at the blocks incoming version
but the explicit updating point during PHI translation fails to trigger
when there are no PHIs at all in a block.  Later lazy updating then
fails because of a too lose block check.  A similar issues plagues
reference invalidation when checking the ANTIC_OUT to ANTIC_IN
translation.  The following fixes both and makes the lazy updating
work.

The diagnostic testcase unfortunately requires boost so the
testcase is the one I reduced for a missed optimization in PRE.
The testcase fails with -m32 on x86_64 because we optimize too
much before PRE which causes PRE to not trigger so we fail to
eliminate a full redundancy.  I'm going to open a separate bug
for this.  Hopefully the !lp64 selector is good enough.

PR tree-optimization/106922
* tree-ssa-pre.cc (translate_vuse_through_block): Only
keep the VUSE if its def dominates PHIBLOCK.
(prune_clobbered_mems): Rewrite logic so we check whether
a value dies in a block when the VUSE def doesn't dominate it.

* g++.dg/tree-ssa/pr106922.C: New testcase.

(cherry picked from commit 5edf02ed2b6de024f83a023d046a6a18f645bc83)

tree-optimization/106892 - avoid invalid pointer association in predcom

When predictive commoning builds a reference for iteration N it
prematurely associates a constant offset into the MEM_REF offset
operand which can be invalid if the base pointer then points
outside of an object which alias-analysis does not consider valid.

PR tree-optimization/106892
* tree-predcom.cc (ref_at_iteration): Do not associate the
constant part of the offset into the MEM_REF offset
operand, across a non-zero offset.

* gcc.dg/torture/pr106892.c: New testcase.

(cherry picked from commit a8b0b13da7379feb31950a9d2ad74b98a29c547f)

tree-optimization/105937 - avoid uninit diagnostics crossing iterations

The following avoids adding PHIs to the worklist for uninit processing
if we reach them following backedges.  That confuses predicate analysis
because it assumes the use is happening in the same iteration as the the
definition.  For the testcase in the PR the situation is like

void foo (int val)
{
  int uninit;
  # val = PHI <..> (B)
  for (..)
    {
      if (..)
        {
          .. = val; (C)
          val = uninit;
        }
      # val = PHI <..> (A)
    }
}

and starting from (A) with 'uninit' as argument we arrive at (B)
and from there at (C).  Predicate analysis then tries to prove
the predicate of (B) (not the backedge) can prove that the
path from (B) to (C) is unreachable which isn't really what it
necessary - that's what we'd need to do when the preheader
edge of the loop were the edge with the uninitialized def.

So the following makes those cases intentionally false negatives.

PR tree-optimization/105937
* tree-ssa-uninit.cc (find_uninit_use): Do not queue PHIs
on backedges.
(execute_late_warn_uninitialized): Mark backedges.

* g++.dg/uninit-pr105937.C: New testcase.

(cherry picked from commit c77fae1ca796d6ea06d5cd437909905c3d3d771c)

Daily bump.

Add cpplib ro.po

* ro.po: New.

Daily bump.

Fortran: Fix ICE and wrong code for assumed-rank arrays [PR100029, PR100040]

gcc/fortran/ChangeLog:

PR fortran/100040
PR fortran/100029
* trans-expr.cc (gfc_conv_class_to_class): Add code to have
assumed-rank arrays recognized as full arrays and fix the type
of the array assignment.
(gfc_conv_procedure_call): Change order of code blocks such that
the free of ALLOCATABLE dummy arguments with INTENT(OUT) occurs
first.

gcc/testsuite/ChangeLog:

PR fortran/100029
* gfortran.dg/PR100029.f90: New test.

PR fortran/100040
* gfortran.dg/PR100040.f90: New test.

(cherry picked from commit 5299155bb80e90df822e1eebc9f9a0c8e4505a46)

Daily bump.

gcc/config/t-i386: add build dependencies on i386-builtin-types.inc

i386-builtin-types.inc is included indirectly via i386-builtins.h
into 4 files: i386.cc i386-builtins.cc i386-expand.cc i386-features.cc

Only i386.cc dependency was present in gcc/config/t-i386 makefile.

As a result parallel builds occasionally fail as:

    g++ ... -o i386-builtins.o ... ../../gcc-13-20220911/gcc/config/i386/i386-builtins.cc
    In file included from ../../gcc-13-20220911/gcc/config/i386/i386-builtins.cc:92:
    ../../gcc-13-20220911/gcc/config/i386/i386-builtins.h:25:10:
     fatal error: i386-builtin-types.inc: No such file or directory
       25 | #include "i386-builtin-types.inc"
          |          ^~~~~~~~~~~~~~~~~~~~~~~~
    compilation terminated.
    make[3]: *** [../../gcc-13-20220911/gcc/config/i386/t-i386:54: i386-builtins.o]
      Error 1 shuffle=1663349189

gcc/
PR target/107064
* config/i386/t-i386: Add build-time dependencies against
i386-builtin-types.inc to i386-builtins.o, i386-expand.o,
i386-features.o.

(cherry picked from commit ef3165736d9daafba88adb2db65b2e8ebf0024ca)

Update gcc sv.po

* sv.po: Update.

Daily bump.

Fortran: Fix automatic reallocation inside select rank [PR100103]

gcc/fortran/ChangeLog:

PR fortran/100103
* trans-array.cc (gfc_is_reallocatable_lhs): Add select rank
temporary associate names as possible targets of automatic
reallocation.

gcc/testsuite/ChangeLog:

PR fortran/100103
* gfortran.dg/PR100103.f90: New test.

(cherry picked from commit 12b537b9b7fd50f4b2fbfcb7ccf45f8d66085577)

Fortran: Fix function attributes [PR100132]

gcc/fortran/ChangeLog:

PR fortran/100132
* trans-types.cc (create_fn_spec): Fix function attributes when
passing polymorphic pointers.

gcc/testsuite/ChangeLog:

PR fortran/100132
* gfortran.dg/PR100132.f90: New test.

(cherry picked from commit be60aa5b608b5f09fadfeff852a46589ac311a42)

Daily bump.

c++: fix triviality of class with unsatisfied op=

cxx20_pair is trivially copyable because it has a trivial copy constructor
and only a deleted copy assignment operator; the non-triviality of the
unsatisfied copy assignment overload is not considered.

gcc/cp/ChangeLog:

* class.cc (check_methods): Call constraints_satisfied_p.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/cond-triv3.C: New test.

Fortran: error recovery while simplifying intrinsic UNPACK [PR107054]

gcc/fortran/ChangeLog:

PR fortran/107054
* simplify.cc (gfc_simplify_unpack): Replace assert by condition
that terminates simplification when there are not enough elements
in the constructor of argument VECTOR.

gcc/testsuite/ChangeLog:

PR fortran/107054
* gfortran.dg/pr107054.f90: New test.

(cherry picked from commit 78bc6497fc61bbdacfb416ee0246a775360d9af6)

Fortran: fix ICE in generate_coarray_sym_init [PR82868]

gcc/fortran/ChangeLog:

PR fortran/82868
* trans-decl.cc (generate_coarray_sym_init): Skip symbol
if attr.associate_var.

gcc/testsuite/ChangeLog:

PR fortran/82868
* gfortran.dg/associate_26a.f90: New test.

(cherry picked from commit bc71318a91286b5f00e88f07aab818ac82510692)

Fortran: NULL pointer dereference in invalid simplification [PR106985]

gcc/fortran/ChangeLog:

PR fortran/106985
* expr.cc (gfc_simplify_expr): Avoid NULL pointer dereference.

gcc/testsuite/ChangeLog:

PR fortran/106985
* gfortran.dg/pr106985.f90: New test.

(cherry picked from commit 8dbb15bc2d019488240c1e69d93121b0347ac092)

i386: Mark XMM4-XMM6 as clobbered by encodekey128/encodekey256

encodekey128 and encodekey256 operations clear XMM4-XMM6. But it is
documented that XMM4-XMM6 are reserved for future usages and software
should not rely upon them being zeroed. Change encodekey128 and
encodekey256 to clobber XMM4-XMM6.

gcc/

PR target/107061
* config/i386/predicates.md (encodekey128_operation): Check
XMM4-XMM6 as clobbered.
(encodekey256_operation): Likewise.
* config/i386/sse.md (encodekey128u32): Clobber XMM4-XMM6.
(encodekey256u32): Likewise.

gcc/testsuite/

PR target/107061
* gcc.target/i386/keylocker-encodekey128.c: Don't check
XMM4-XMM6.
* gcc.target/i386/keylocker-encodekey256.c: Likewise.

(cherry picked from commit db288230db55dc1ff626f46c708b555847013a41)