git.ipfire.org Git - thirdparty/gcc.git/log

RISC-V: Add support for B standard extension

This patch adds support for recognizing the B standard extension to be the
collection of Zba, Zbb, Zbs extensions for consistency and conciseness
across toolchains

https://github.com/riscv/riscv-b/tags

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add imply rules for B extension
* config/riscv/arch-canonicalize: Ditto

Signed-off-by: Edwin Lu <ewlu@rivosinc.com>

internal-fn: Reuse SUBREG_PROMOTED_VAR_P handling

expand_fn_using_insn has code to handle SUBREG_PROMOTED_VAR_P
destinations.  Specifically, for:

  (subreg/v:M1 (reg:M2 R) ...)

it creates a new temporary register T, uses it for the output
operand, then sign- or zero-extends the M1 lowpart of T to M2,
storing the result in R.

This patch splits this handling out into helper routines and
uses them for other instances of:

  if (!rtx_equal_p (target, ops[0].value))
    emit_move_insn (target, ops[0].value);

It's quite probable that this doesn't help any of the other cases;
in particular, it shouldn't affect vectors.  But I think it could
be useful for the CRC work.

gcc/
* internal-fn.cc (create_call_lhs_operand, assign_call_lhs): New
functions, split out from...
(expand_fn_using_insn): ...here.
(expand_load_lanes_optab_fn): Use them.
(expand_GOMP_SIMT_ENTER_ALLOC): Likewise.
(expand_GOMP_SIMT_LAST_LANE): Likewise.
(expand_GOMP_SIMT_ORDERED_PRED): Likewise.
(expand_GOMP_SIMT_VOTE_ANY): Likewise.
(expand_GOMP_SIMT_XCHG_BFLY): Likewise.
(expand_GOMP_SIMT_XCHG_IDX): Likewise.
(expand_partial_load_optab_fn): Likewise.
(expand_vec_cond_optab_fn): Likewise.
(expand_vec_cond_mask_optab_fn): Likewise.
(expand_RAWMEMCHR): Likewise.
(expand_gather_load_optab_fn): Likewise.
(expand_while_optab_fn): Likewise.
(expand_SPACESHIP): Likewise.

c++: array new with value-initialization [PR115645]

This extends the r11-5179 fix which doesn't work with multidimensional
arrays.  In particular,

  struct S {
    explicit S() { }
  };
  auto p = new S[1][1]();

should not say "converting to S from initializer list would use
explicit constructor" because there's no {}.  However, since we
went into the block where we create a {}, we got confused.  We
should not have gotten there but we did because array_p was true.

This patch refines the check once more.

PR c++/115645

gcc/cp/ChangeLog:

* init.cc (build_new): Don't do any deduction for arrays with
bounds if it's value-initialized.

gcc/testsuite/ChangeLog:

* g++.dg/expr/anew7.C: New test.

recog: Handle some mode-changing hardreg propagations

insn_propagation would previously only replace (reg:M H) with X
for some hard register H if the uses of H were also in mode M.
This patch extends it to handle simple mode punning too.

The original motivation was to try to get rid of the execution
frequency test in aarch64_split_simd_shift_p, but doing that is
follow-up work.

I tried this on at least one target per CPU directory (as for
the late-combine patches) and it seems to be a small win for
all of them.

The patch includes a couple of updates to the ia32 results.
In pr105033.c, foo3 replaced:

       vmovq       8(%esp), %xmm1
       vpunpcklqdq %xmm1, %xmm0, %xmm0

with:

       vmovhps     8(%esp), %xmm0, %xmm0

In vect-bfloat16-2b.c, 5 of the vec_extract_v32bf_* routines
(specifically the ones with nonzero even indices) replaced
things like:

       movl    28(%esp), %eax
       vmovd   %eax, %xmm0

with:

       vpinsrw $0, 28(%esp), %xmm0, %xmm0

(These functions return a bf16, and so only the low 16 bits matter.)

gcc/
* recog.cc (insn_propagation::apply_to_rvalue_1): Handle simple
cases of hardreg propagation in which the register is set and
used in different modes.

gcc/testsuite/
* gcc.target/i386/pr105033.c: Expect vmovhps for the ia32 version
of foo.
* gcc.target/i386/vect-bfloat16-2b.c: Expect more vpinsrws.

rtl-ssa: Add replace_nondebug_insn [PR115785]

change_insns is used to change multiple instructions at once, so that
the IR on return is valid & self-consistent.  These changes can involve
moving instructions, and the new position for one instruction might
be expressed in terms of the old position of another instruction
that is changing at the same time.

change_insns therefore adds placeholder instructions to mark each
new instruction position, then replaces each placeholder with the
corresponding real instruction.  This replacement was done in two
steps: removing the old placeholder instruction and inserting the new
real instruction.  But it's more convenient for the upcoming fix for
PR115785 if we do the operation as a single step.  That should also
be slightly more efficient, since e.g. no splay tree operations are
needed.

This operation happens purely on the rtl-ssa instruction chain.
The placeholders are never represented in rtl.

gcc/
PR rtl-optimization/115785
* rtl-ssa/functions.h (function_info::replace_nondebug_insn): Declare.
* rtl-ssa/insns.h (insn_info::order_node::set_uid): New function.
(insn_info::remove_note): Declare.
* rtl-ssa/insns.cc (insn_info::remove_note): New function.
(function_info::replace_nondebug_insn): Likewise.
* rtl-ssa/changes.cc (function_info::change_insns): Use
replace_nondebug_insn instead of remove_insn + add_insn.

fixincludes: skip stdio_stdarg_h on darwin

The fix is unnecessary on macOS.

fixincludes/ChangeLog:

* fixincl.x: Regenerate.
* inclhack.def (stdio_stdarg_h): Skip on darwin.

c++, contracts: Fix ICE in create_tmp_var [PR113968]

During contract parsing, in grok_contract(), we proceed even if the
condition contains errors. This results in contracts with embedded errors
which eventually confuse gimplify. Checks for errors have been added in
grok_contract() to exit early if an error is encountered.

PR c++/113968

gcc/cp/ChangeLog:

* contracts.cc (grok_contract): Check for error_mark_node early
exit.

gcc/testsuite/ChangeLog:

* g++.dg/contracts/pr113968.C: New test.

Signed-off-by: Nina Ranns <dinka.ranns@gmail.com>

fixincludes: add bypass to darwin_objc_runtime_1

Headers are now fixed in the macOS 15 SDK, and the fix should be
bypassed there.

fixincludes/ChangeLog:

* fixincl.x: Regenerate.
* inclhack.def (darwin_objc_runtime_1): Add bypass.

PR modula2/115823 Wrong expansion of isnormal optab

The bug fix changes gcc/m2/gm2-gcc/m2builtins.c:m2builtins_BuiltinExists
to recognise both __builtin_<functionname> and functionname as a builtin.

gcc/m2/ChangeLog:

PR modula2/115823
* gm2-gcc/m2builtins.cc (struct builtin_macro_definition): New
field builtinname.
(builtin_function_match): New function.
(builtin_macro_match): Ditto.
(m2builtins_BuiltinExists): Use builtin_function_match and
builtin_macro_match.
(lookup_builtin_macro): Use builtin_macro_match.
(lookup_builtin_function): Use builtin_function_match.
(define_builtin): Assign builtinname field.

gcc/testsuite/ChangeLog:

PR modula2/115823
* gm2/builtins/run/pass/testalloa.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

middle-end: Fix stalled swapped condition code value [PR115836]

emit_store_flag_1 calculates scode (swapped condition code) at the
beginning of the function from the value of code variable. However,
code variable may change before scode usage site, resulting in
invalid stalled scode value.

Move calculation of scode value just before its only usage site to
avoid stalled scode value.

PR middle-end/115836

gcc/ChangeLog:

* expmed.cc (emit_store_flag_1): Move calculation of
scode just before its only usage site.

arm: cleanup legacy ARM_PE code

The arm 'pe' target was removed back in 2012 when the FPA support was
removed, but in a small number of places some conditional code was
accidentally left behind. It's no-longer needed, so remove it.

gcc/ChangeLog:

* config/arm/arm-protos.h (arm_dllexport_name_p): Remove prototype.
(arm_dllimport_name_p): Likewise.
(arm_pe_unique_section): Likewise.
(arm_pe_encode_section_info): Likewise.
(arm_dllexport_p): Likewise.
(arm_dllimport_p): Likewise.
(arm_mark_dllexport): Likewise.
(arm_mark_dllimport): Likewise.
(arm_change_mode_p): Likewise.
* config/arm/arm.cc (arm_gnu_attributes): Remove attributes for ARM_PE.
(TARGET_ENCODE_SECTION_INFO): Remove setting for ARM_PE.
(is_called_in_ARM_mode): Remove ARM_PE conditional code.
(thumb1_output_interwork): Remove obsolete ARM_PE code.
(arm_encode_section_info): Remove surrounding #ifndef.

[PR115394] Remove streamer_debugging and it's uses.

gcc/ChangeLog:
PR lto/115394
* lto-streamer.h: Remove streamer_debugging definition.
* lto-streamer-out.cc (stream_write_tree_ref): Remove use of streamer_debugging.
(lto_output_tree): Likewise.
* tree-streamer-in.cc (streamer_read_tree_bitfields): Likewise.
(streamer_get_pickled_tree): Likewise.
* tree-streamer-out.cc (pack_ts_base_value_fields): Likewise.

Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>

Match: Support form 2 for the .SAT_TRUNC

This patch would like to add form 2 support for the .SAT_TRUNC.  Aka:

Form 2:
  #define DEF_SAT_U_TRUC_FMT_2(NT, WT)     \
  NT __attribute__((noinline))             \
  sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \
  {                                        \
    bool overflow = x > (WT)(NT)(-1);      \
    return overflow ? (NT)-1 : (NT)x;      \
  }

DEF_SAT_U_TRUC_FMT_2(uint32, uint64)

Before this patch:
   3   │
   4   │ __attribute__((noinline))
   5   │ uint32_t sat_u_truc_uint64_t_to_uint32_t_fmt_2 (uint64_t x)
   6   │ {
   7   │   uint32_t _1;
   8   │   long unsigned int _3;
   9   │
  10   │ ;;   basic block 2, loop depth 0
  11   │ ;;    pred:       ENTRY
  12   │   _3 = MIN_EXPR <x_2(D), 4294967295>;
  13   │   _1 = (uint32_t) _3;
  14   │   return _1;
  15   │ ;;    succ:       EXIT
  16   │
  17   │ }

After this patch:
   3   │
   4   │ __attribute__((noinline))
   5   │ uint32_t sat_u_truc_uint64_t_to_uint32_t_fmt_2 (uint64_t x)
   6   │ {
   7   │   uint32_t _1;
   8   │
   9   │ ;;   basic block 2, loop depth 0
  10   │ ;;    pred:       ENTRY
  11   │   _1 = .SAT_TRUNC (x_2(D)); [tail call]
  12   │   return _1;
  13   │ ;;    succ:       EXIT
  14   │
  15   │ }

The below test suites are passed for this patch:
1. The x86 bootstrap test.
2. The x86 fully regression test.
3. The rv64gcv fully regresssion test.

gcc/ChangeLog:

* match.pd: Add form 2 for .SAT_TRUNC.
* tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children):
Add new case NOP_EXPR,  and try to match SAT_TRUNC.

Signed-off-by: Pan Li <pan2.li@intel.com>

testsuite: Tests the pattern folding x/sqrt(x) to sqrt(x) for Float16

As a follow-up to adding a pattern that folds x/sqrt(x) to sqrt(x) in match.pd, this patch adds a test case for type Float16 for armv8.2-a+fp16.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/testsuite/

* gcc.target/aarch64/sqrt_div_float16.c: New test.

testsuite: Allow matching `{_1, { 0,0,0,0 }}` for vect/slp-gap-1.c

While working on adding V4QI support to the aarch64 backend,
vect/slp-gap-1.c started to fail but only because the regex
was failing. Before it was loading use SI (int) and afterwards,
we started to use V4QI. The generated code was the same and the
generated gimple was almost the same. The regex was searching
for `zero-padding trick` and it was still doing that but instead
of directly 0, it was V4QI 0 (or rather `{ 0, 0, 0 }`).
This extends regex to support both.

Tested on x86_64-linux-gnu and aarch64-linux-gnu (with the support added).

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-gap-1.c: Support matching `{_1, { 0, 0, 0, 0 }}`
in addition to `{_1, 0}`.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Remove expanding complex EQ/NE inside a GIMPLE_RETURN [PR115721]

This code has been dead at least since the move over to tuples
in 0-88576-g726a989a8b74bf, when gimple returns could only have
a simple expression in it. So let's remove it.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/115721
* tree-complex.cc (expand_complex_comparison): Remove
support for GIMPLE_RETURN.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

RISC-V: fix zcmp popretz [PR113715]

No functional changes compared with V1, just spaces to table conversion
in testcases to pass check-function-bodies.

V1 passed regression locally but suprisingly failed in pre-commit CI, after
picking the patch from patchwork, I realize table got coverted to spaces
before sending the patch.

Root cause:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b27d323a368033f0b37e93c57a57a35fd9997864
Commit above tries in targetm.gen_epilogue () to detect if
there's li a0,0 insn at the end of insn chain, if so, cm.popret
is replaced by cm.popretz and li a0,0 insn is deleted.

Insertion of the generated epilogue sequence
into the insn chain doesn't happen at this moment.
If later shrink-wrap decides NOT to insert the epilogue sequence at the end
of insn chain, then the li a0,0 insn has already been mistakeny removed.

Fix this issue by removing generation of cm.popretz in epilogue,
leaving the assignment to a0 and use insn with cm.popret.

That's likely going to result in some kind of code size regression,
but not a correctness regression.

Optimization can be done in future.

Signed-off-by: Fei Gao <gaofei@eswincomputing.com>
gcc/ChangeLog:
PR target/113715

* config/riscv/riscv.cc (riscv_zcmp_can_use_popretz): Removed.
(riscv_gen_multi_pop_insn): Remove generation of cm.popretz.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_zcmp.c: Adapt TC.
* gcc.target/riscv/rv32i_zcmp.c: Likewise.

Daily bump.

Fix test errors after r15-1394 for sizeof(int)==sizeof(long) [PR115545]

Some tests added to test the type of redeclarations of enumerators
in r15-1394 fail on architectures where sizeof(long) == sizeof(int).
Adapt tests to use long long and/or accept that long long is selected
as type for the enumerator.

PR testsuite/115545

gcc/testsuite/

* gcc.dg/pr115109.c: Adapt test.
* gcc.dg/c23-tag-enum-6.c: Adapt test.
* gcc.dg/c23-tag-enum-7.c: Adapt test.

c: Fix ICE for redeclaration of structs with different alignment [PR114727]

For redeclarations of struct in C23, if one has an alignment attribute
that makes the alignment different, we later get an ICE in verify_types.
This patches disallows such redeclarations by declaring such types to
be different.

PR c/114727

gcc/c/
* c-typeck.cc (tagged_types_tu_compatible): Add test.

gcc/testsuite/
* gcc.dg/pr114727.c: New test.

c: Fix ICE for incorrect code in comptypes_verify [PR115696]

The new verification code produces an ICE for incorrect code. Add the
same logic as already used in comptypes to to bail out under certain
conditions.

PR c/115696

gcc/c/
* c-typeck.cc (comptypes_verify): Bail out for
identical, empty, and erroneous input types.

gcc/testsuite/
* gcc.dg/pr115696.c: New test.

rs6000, remove vector set and vector init built-ins.

The vector init built-ins:

  __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
  __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
  __builtin_vec_init_v2di, __builtin_vec_init_v2df,
  __builtin_vec_init_v1ti

perform the same operation as initializing the vector in C code.  For
example:

  result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
  result_v4si = {1, 2, 3, 4};

These two constructs were tested and verified they generate identical
assembly instructions with no optimization and -O3 optimization.

The vector set built-ins:

  __builtin_vec_set_v16qi, __builtin_vec_set_v8hi.
  __builtin_vec_set_v4si, __builtin_vec_set_v4sf,
  __builtin_vec_set_v1ti, __builtin_vec_set_v2di,
  __builtin_vec_set_v2df

perform the same operation as setting a specific element in the vector in
C code.  For example:

  src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
  src_v4si[index] = int_val;

The built-in actually generates more instructions than the inline C code
with no optimization but is identical with -O3 optimizations.

All of the above built-ins that are removed do not have test cases and
are not documented.

Built-ins __builtin_vec_set_v1ti __builtin_vec_set_v2di,
__builtin_vec_set_v2df are not removed as they are used in function
resolve_vec_insert() in file rs6000-c.cc.

The built-ins are removed as they don't provide any benefit over just
using C code.

The code to define the bif_init_bit, bif_is_init, as well as their uses
are removed.  The function altivec_expand_vec_init_builtin is also removed.

gcc/ChangeLog:
* config/rs6000/rs6000-builtin.cc (altivec_expand_vec_init_builtin):
Remove the function.
(rs6000_expand_builtin): Remove the if bif_is_int check to call
the altivec_expand_vec_init_builtin function.
* config/rs6000/rs6000-builtins.def: Remove the attribute string
comment for init.
(__builtin_vec_init_v16qi,
__builtin_vec_init_v4sf, __builtin_vec_init_v4si,
__builtin_vec_init_v8hi, __builtin_vec_init_v1ti,
__builtin_vec_init_v2df, __builtin_vec_init_v2di,
__builtin_vec_set_v16qi, __builtin_vec_set_v4sf,
__builtin_vec_set_v4si, __builtin_vec_set_v8hi): Remove
built-in definitions.
* config/rs6000/rs6000-gen-builtins.cc: Remove comment for init
attribute string.
(struct attrinfo): Remove isinit entry.
(parse_bif_attrs): Remove the if statement to check for attribute
init.
(ifdef DEBUG): Remove print for init attribute string.
(write_decls): Remove print for define bif_init_bit and
define for bif_is_init.
(write_bif_static_init): Remove if bifp->attrs.isinit statement.

rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in

The built-in __builtin_vsx_xvcmpeqsp_p is a duplicate of the overloaded
__builtin_altivec_vcmpeqfp_p built-in. The built-in is undocumented and
there are no test cases for it. The patch removes built-in
__builtin_vsx_xvcmpeqsp_p.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp_p):
Remove built-in definition.

rs6000, extend vec_xxpermdi built-in for __int128 args

Add a new signed and unsigned int128 overloaded vector instances for
vec_xxpermdi:

__int128 vec_xxpermdi (__int128, __int128, const int);
__uint128 vec_xxpermdi (__uint128, __uint128, const int);

Update the documentation to include a reference to the new vector built-in
instances of vec_xxpermdi.

Add test cases for the new overloaded instances.

gcc/ChangeLog:
* config/rs6000/rs6000-overload.def (vec_xxpermdi): Add new
overloaded built-in instances of vector signed and unsigned
int128.
* doc/extend.texi: Add documentation for built-in instances of
vector signed and unsigned int128.

gcc/testsuite/ChangeLog:gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec_perm-runnable-i128.c: New test file.

rs6000, remove __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp built-ins

The undocumented __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp are
redundant. The overloaded vec_neg built-in provides the same
functionality. The two built-ins are not documented nor are there any
test cases for them.

Remove the definitions so users will use the overloaded vec_neg built-in
which is documented in the PVIPR.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvnegdp,
__builtin_vsx_xvnegsp): Remove built-in definitions.

rs6000, remove __builtin_vsx_vperm_* built-ins

The undocumented built-ins:
  __builtin_vsx_vperm_16qi_uns,
  __builtin_vsx_vperm_1ti,
  __builtin_vsx_vperm_1ti_uns,
  __builtin_vsx_vperm_2df,
  __builtin_vsx_vperm_2di,
  __builtin_vsx_vperm_2di_uns,
  __builtin_vsx_vperm_4sf,
  __builtin_vsx_vperm_4si,
  __builtin_vsx_vperm_4si_uns

are duplicats of the __builtin_altivec_* built-ins that are used by
the overloaded vec_perm built-in that is documented in the PVIPR.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_vperm_16qi_uns,
__builtin_vsx_vperm_1ti, __builtin_vsx_vperm_1ti_uns,
__builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di,
__builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf,
__builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns): Remove
built-in definitions and comments.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_vperm_16qi_uns,
__builtin_vsx_vperm_1ti, __builtin_vsx_vperm_1ti_uns,
__builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di,
__builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf,
__builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns,
__builtin_vsx_vperm): Change call to built-in to the  overloaded
built-in vec_perm.

rs6000, remove the vec_xxsel built-ins, they are duplicates

The following undocumented built-ins are covered by the existing overloaded
vec_sel built-in definitions.

  const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc);
same as vsc __builtin_vec_sel (vsc, vsc, vuc);  (overloaded vec_sel)

  const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
same as vuc __builtin_vec_sel (vuc, vuc, vuc);  (overloaded vec_sel)

  const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
same as  vd __builtin_vec_sel (vd, vd, vull);   (overloaded vec_sel)

  const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll);
same as vsll __builtin_vec_sel (vsll, vsll, vsll);  (overloaded vec_sel)

  const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull);
same as vull __builtin_vec_sel (vull, vull, vsll);  (overloaded vec_sel)

  const vf __builtin_vsx_xxsel_4sf (vf, vf, vf);
same as vf __builtin_vec_sel (vf, vf, vsi)          (overloaded vec_sel)

  const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi);
same as vsi __builtin_vec_sel (vsi, vsi, vbi);      (overloaded vec_sel)

  const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui);
same as vui __builtin_vec_sel (vui, vui, vui);      (overloaded vec_sel)

  const vss __builtin_vsx_xxsel_8hi (vss, vss, vss);
same as vss __builtin_vec_sel (vss, vss, vbs);      (overloaded vec_sel)

  const vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus);
same as vus __builtin_vec_sel (vus, vus, vus);      (overloaded vec_sel)

This patch removed the duplicate built-in definitions so users will only
use the documented vec_sel built-in.  The __builtin_vsx_xxsel_[4si, 8hi,
16qi, 4sf, 2df] tests are also removed.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_16qi,
__builtin_vsx_xxsel_16qi_uns, __builtin_vsx_xxsel_2df,
__builtin_vsx_xxsel_2di, __builtin_vsx_xxsel_2di_uns,
__builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_4si,
__builtin_vsx_xxsel_4si_uns, __builtin_vsx_xxsel_8hi,
__builtin_vsx_xxsel_8hi_uns): Remove built-in definitions.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_xxsel_4si,
__builtin_vsx_xxsel_8hi, __builtin_vsx_xxsel_16qi,
__builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_2df,
__builtin_vsx_xxsel): Change built-in call to overloaded built-in
call vec_sel.

rs6000, add overloaded vec_sel with int128 arguments

Extend the vec_sel built-in to take three signed/unsigned/bool int128
arguments and return a signed/unsigned/bool int128 result.

Extending the vec_sel built-in makes the existing buit-ins
__builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete. The
patch removes these built-ins.

The patch adds documentation and test cases for the new overloaded
vec_sel built-ins.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti,
__builtin_vsx_xxsel_1ti_uns): Remove built-in definitions.
* config/rs6000/rs6000-overload.def (vec_sel): Add new
overloaded vector signed, unsigned and bool 128-bit definitions.
* doc/extend.texi (vec_sel): Add documentation for new instances
with signed, unsigned and bool 129-bit bool arguments.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/builtins-10-runnable.c: New runnable test
file.
* gcc.target/powerpc/builtins-10.c: New compile only test file.

rs6000, remove duplicated built-ins of vecmergl and vec_mergeh

The following undocumented built-ins are same as existing documented
overloaded builtins.

  const vf __builtin_vsx_xxmrghw (vf, vf);
same as  vf __builtin_vec_mergeh (vf, vf);     (overloaded vec_mergeh)

  const vsi __builtin_vsx_xxmrghw_4si (vsi, vsi);
same as vsi __builtin_vec_mergeh (vsi, vsi);   (overloaded vec_mergeh)

  const vf __builtin_vsx_xxmrglw (vf, vf);
same as vf __builtin_vec_mergel (vf, vf);      (overloaded vec_mergel)

  const vsi __builtin_vsx_xxmrglw_4si (vsi, vsi);
same as vsi __builtin_vec_mergel (vsi, vsi);   (overloaded vec_mergel)

This patch removes the duplicate built-in definitions so only the
documented built-ins will be available for use.  The case statements in
rs6000_gimple_fold_builtin are removed as they are no longer needed.  The
patch removes the now unused define_expands for vsx_xxmrghw_<mode> and
vsx_xxmrglw_<mode>.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxmrghw,
__builtin_vsx_xxmrghw_4si, __builtin_vsx_xxmrglw,
__builtin_vsx_xxmrglw_4si, __builtin_vsx_xxsel_16qi): Remove
built-in definition.
* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin):
remove case entries RS6000_BIF_XXMRGLW_4SI,
RS6000_BIF_XXMRGLW_4SF, RS6000_BIF_XXMRGHW_4SI,
RS6000_BIF_XXMRGHW_4SF.
* config/rs6000/vsx.md (vsx_xxmrghw_<mode>, vsx_xxmrglw_<mode>):
Remove unused define_expands.

rs6000, Remove redundant vector float/double type conversions

The following built-ins are redundant as they are covered by another
overloaded built-in.

  __builtin_vsx_xvcvspdp covered by vec_double{e,o}
  __builtin_vsx_xvcvdpsp covered by vec_float{e,o}
  __builtin_vsx_xvcvsxwdp covered by vec_double{e,o}
  __builtin_vsx_xvcvuxddp_uns covered by vec_double

Remove the redundant built-ins. They are not documented nor do they have
test cases.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspdp,
__builtin_vsx_xvcvdpsp, __builtin_vsx_xvcvsxwdp,
__builtin_vsx_xvcvuxddp_uns): Remove.

rs6000, extend the current vec_{un,}signed{e,o} built-ins

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
convert a vector of floats to a vector of signed/unsigned long long ints.
Extend the existing vec_{un,}signed{e,o} built-ins to handle the argument
vector of floats to return a vector of even/odd signed/unsigned integers.

The define expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf,
vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o}
built-ins.

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are
now for internal use only. They are not documented and they do not
have test cases.

Add testcases and update documentation.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxds,
__builtin_vsx_xvcvspuxds): Rename to __builtin_vsignede_v4sf,
__builtin_vunsignede_v4sf respectively.
(XVCVSPSXDS, XVCVSPUXDS): Rename to VEC_VSIGNEDE_V4SF,
VEC_VUNSIGNEDE_V4SF respectively.
(__builtin_vsignedo_v4sf, __builtin_vunsignedo_v4sf): New
built-in definitions.
* config/rs6000/rs6000-overload.def (vec_signede, vec_signedo,
vec_unsignede, vec_unsignedo): Add new overloaded specifications.
* config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf,
vunsignede_v4sf, vunsignedo_v4sf): New define_expands.
* doc/extend.texi (vec_signedo, vec_signede, vec_unsignedo,
vec_unsignede): Add documentation for new overloaded built-ins to
convert vector float to vector {un,}signed long long.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/builtins-3-runnable.c
(test_unsigned_int_result, test_ll_unsigned_int_result): Add
new argument.
(vec_signede, vec_signedo, vec_unsignede, vec_unsignedo): New
tests for the overloaded built-ins.

rs6000, fix error in unsigned vector float to unsigned int built-in definitions

The built-in __builtin_vsx_vunsigned_v2df is supposed to take a vector of
doubles and return a vector of unsigned long long ints.  Similarly
__builtin_vsx_vunsigned_v4sf takes a vector of floats an is supposed to
return a vector of unsinged ints.  The definitions are using the signed
version of the instructions not the unsigned version of the instruction.
The results should also be unsigned.  The built-ins are used by the
overloaded vec_unsigned built-in which has an unsigned result.

Similarly the built-ins __builtin_vsx_vunsignede_v2df and
__builtin_vsx_vunsignedo_v2df are supposed to return an unsigned result.
If the floating point argument is negative, the unsigned result is zero.
The built-ins are used in the overloaded built-in vec_unsignede and
vec_unsignedo respectively.

Add a test cases for a negative floating point arguments for each of the
above built-ins.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_vunsigned_v2df,
__builtin_vsx_vunsigned_v4sf, __builtin_vsx_vunsignede_v2df,
__builtin_vsx_vunsignedo_v2df): Change the result type to unsigned.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/builtins-3-runnable.c: Add tests for
vec_unsignede and vec_unsignedo with negative arguments.

rs6000, Remove __builtin_vsx_xvcv{sp{sx,u}ws,dpuxds_uns}

The built-in __builtin_vsx_xvcvspsxws is covered by built-in vec_signed
built-in that is documented in the PVIPR. The __builtin_vsx_xvcvspsxws
built-in is not documented and there are no test cases for it.

The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by
vec_unsigned, remove.

The __builtin_vsx_xvcvspuxws is redundant as it is covered by
vec_unsigned, remove.

The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by
vec_signed{e,o}, remove.

The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by
vec_unsigned{e,o}, remove.

This patch removes the redundant built-ins.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxws,
__builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws,
__builtin_vsx_xvcvdpsxws, __builtin_vsx_xvcvdpuxws): Remove
built-in definitions.

rs6000, Remove __builtin_vsx_cmple* builtins

The built-ins __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di,
__builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi should take
unsigned arguments and return an unsigned result. The current definitions
take signed arguments and return signed results which is incorrect.

The signed and unsigned versions of __builtin_vsx_cmple* are not
documented in extend.texi. Also there are no test cases for the
built-ins.

Users can use the existing vec_cmple as PVIPR defines instead of
__builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di,
__builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi,
__builtin_vsx_cmple_16qi, __builtin_vsx_cmple_2di,
__builtin_vsx_cmple_4si and __builtin_vsx_cmple_8hi,
__builtin_altivec_cmple_1ti, __builtin_altivec_cmple_u1ti.

Hence these built-ins are redundant and are removed by this patch.

gcc/ChangeLog:
* config/rs6000/rs6000-builtin.cc (RS6000_BIF_CMPLE_16QI,
RS6000_BIF_CMPLE_U16QI, RS6000_BIF_CMPLE_8HI,
RS6000_BIF_CMPLE_U8HI, RS6000_BIF_CMPLE_4SI, RS6000_BIF_CMPLE_U4SI,
RS6000_BIF_CMPLE_2DI, RS6000_BIF_CMPLE_U2DI, RS6000_BIF_CMPLE_1TI,
RS6000_BIF_CMPLE_U1TI): Remove case statements.
* config/rs6000/rs6000-builtins.def (__builtin_vsx_cmple_16qi,
__builtin_vsx_cmple_2di, __builtin_vsx_cmple_4si,
__builtin_vsx_cmple_8hi, __builtin_vsx_cmple_u16qi,
__builtin_vsx_cmple_u2di, __builtin_vsx_cmple_u4si,
__builtin_vsx_cmple_u8hi): Remove buit-in definitions.

i386: Implement .SAT_TRUNC for unsigned integers

The following testcase:

unsigned short foo (unsigned int x)
{
_Bool overflow = x > (unsigned int)(unsigned short)(-1);
return ((unsigned short)x | (unsigned short)-overflow);
}

currently compiles (-O2) to:

foo:
xorl %eax, %eax
cmpl $65535, %edi
seta %al
negl %eax
orl %edi, %eax
ret

We can expand through ustrunc{m}{n}2 optab to use carry flag from the
comparison and generate code using SBB:

foo:
cmpl $65535, %edi
sbbl %eax, %eax
orl %edi, %eax
ret

or CMOV instruction:

foo:
movl $65535, %eax
cmpl %eax, %edi
cmovnc %edi, %eax
ret

gcc/ChangeLog:

* config/i386/i386.md (@cmp<mode>_1): Use SWI mode iterator.
(ustruncdi<mode>2): New expander.
(ustruncsi<mode>2): Ditto.
(ustrunchiqi2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/sattrunc-1.c: New test.

diagnostics: use refs rather than pointers for diagnostic_{path,context}

Use const & rather than const * in various places where it can't be null
and can't change.

No functional change intended.

gcc/ChangeLog:
* diagnostic-path.cc: Replace "const diagnostic_path *" with
"const diagnostic_path &" throughout, and "diagnostic_context *"
with "diagnostic context &".
* diagnostic.cc (diagnostic_context::show_any_path): Pass
reference in call to print_path.
* diagnostic.h (diagnostic_context::print_path): Convert param
to a reference.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

arm: clean up some legacy FPA related cruft.

Support for the FPA on Arm was removed after gcc-4.7, but this little
bit of crufty code was left behind. In particular the code to support
the 'N' modifier in assembly code was left behind and this lead to a
trail of other code that depended on it, even though most of the
constants that it supported had been removed in the original cleanup.

This patch removes most of the remaining cruft and simplifies the one
bit that remains: to determine whether an RTL construct contains 0.0 we
don't need to convert it to a real value, we can simply compare it to
CONST0_RTX of the appropriate mode.

gcc/

* config/arm/arm.cc (fp_consts_initited): Delete variable.
(value_fp0): Likewise.
(init_fp_table): Delete function.
(fp_const_from_val): Likewise.
(arm_const_double_rtx): Rework to avoid converting to REAL_VALUE_TYPE.
(arm_print_operand, case 'N'): Make use of this case an error.

RISC-V: Fix comment/naming in attribute parsing code

Function target attributes have to be separated by semi-colons.
Let's fix the comment and variable naming to better explain what
the code does.

gcc/ChangeLog:

* config/riscv/riscv-target-attr.cc (riscv_process_target_attr):
Fix comments and variable names.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

RISC-V: Deduplicate arch subset list processing

We have a code duplication in riscv_set_arch_by_subset_list() and
riscv_parse_arch_string(), where the latter function parses an ISA string
into a subset_list before doing the same as the former function.

riscv_parse_arch_string() is used to process command line options and
riscv_set_arch_by_subset_list() processes target attributes.
So, it is obvious that both functions should do the same.
Let's deduplicate the code to enforce this.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_set_arch_by_subset_list):
Fix overlong line.
(riscv_parse_arch_string): Replace duplicated code by a call to
riscv_set_arch_by_subset_list.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

RISC-V: testsuite: Properly gate LTO tests

There are two test cases with the following skip directive:
  dg-skip-if "" { *-*-* } { "-flto -fno-fat-lto-objects" }
This reads as: skip if both '-flto' and '-fno-fat-lto-objects'
are present.  This is not the case if only '-flto' is present.

Since both tests depend on instruction sequences (one does
check-function-bodies the other tests for an assembler error
message), they won't work reliably with fat LTO objects.

Let's change the skip line to gate the test on '-flto'
to avoid failing tests like this:

FAIL: gcc.target/riscv/interrupt-misaligned.c   -O2 -flto   check-function-bodies interrupt
FAIL: gcc.target/riscv/interrupt-misaligned.c   -O2 -flto -flto-partition=none   check-function-bodies interrupt
FAIL: gcc.target/riscv/pr93202.c   -O2 -flto   (test for errors, line 10)
FAIL: gcc.target/riscv/pr93202.c   -O2 -flto   (test for errors, line 9)
FAIL: gcc.target/riscv/pr93202.c   -O2 -flto -flto-partition=none   (test for errors, line 10)
FAIL: gcc.target/riscv/pr93202.c   -O2 -flto -flto-partition=none   (test for errors, line 9)

gcc/testsuite/ChangeLog:

* gcc.target/riscv/interrupt-misaligned.c: Remove
"-fno-fat-lto-objects" from skip condition.
* gcc.target/riscv/pr93202.c: Likewise.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

i386: Correct AVX10 CPUID emulation

AVX10 Documentaion has specified ecx value as 0 for AVX10 version and
vector size under 0x24 subleaf. Although for ecx=1, the bits are all
reserved for now, we still need to specify ecx as 0 to avoid dirty
value in ecx.

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_available_features): Correct
AVX10 CPUID emulation to specify ecx value.

c: Rewrite c_parser_omp_tile_sizes to use c_parser_expr_list

The following patch simplifies c_parser_omp_tile_sizes to use
c_parser_expr_list, so that it will get CPP_EMBED parsing naturally,
without having another spot that needs to be adjusted for it.

2024-07-09 Jakub Jelinek <jakub@redhat.com>

* c-parser.cc (c_parser_omp_tile_sizes): Use c_parser_expr_list.

* c-c++-common/gomp/tile-11.c: Adjust expected diagnostics for c.
* c-c++-common/gomp/tile-12.c: Likewise.

c++: Implement C++26 CWG2819 - Allow cv void * null pointer value conversion to object types in constant expressions

The following patch implements CWG2819 (which wasn't a DR because
it changes behavior of C++26 only).

2024-07-09 Jakub Jelinek <jakub@redhat.com>

* constexpr.cc (cxx_eval_constant_expression): CWG2819 - Allow
cv void * null pointer value conversion to object types in constant
expressions.

* g++.dg/cpp26/constexpr-voidptr3.C: New test.
* g++.dg/cpp0x/constexpr-cast2.C: Adjust expected diagnostics for
C++26.
* g++.dg/cpp0x/constexpr-cast4.C: Likewise.

Rename __{float,double}_u to __x86_{float,double}_u to avoid pulluting the namespace.

I have a build failure on NetBSD as the namespace pollution avoidance causes
a direct hit with the system /usr/include/math.h
=======================================================================

In file included from /usr/src/local/gcc/obj/gcc/include/emmintrin.h:31,
                 from /usr/src/local/gcc/obj/x86_64-unknown-netbsd10.99/libstdc++-v3/include/ext/random:45,
                 from /usr/src/local/gcc/libstdc++-v3/include/precompiled/extc++.h:65:
/usr/src/local/gcc/obj/gcc/include/xmmintrin.h:75:15: error: conflicting declaration 'typedef float __float_u'
   75 | typedef float __float_u __attribute__ ((__may_alias__, __aligned__ (1)));
      |               ^~~~~~~~~
In file included from /usr/src/local/gcc/obj/x86_64-unknown-netbsd10.99/libstdc++-v3/include/cmath:47,
                 from /usr/src/local/gcc/obj/x86_64-unknown-netbsd10.99/libstdc++-v3/include/x86_64-unknown-netbsd10.99/bits/stdc++.h:114,
                 from /usr/src/local/gcc/libstdc++-v3/include/precompiled/extc++.h:32:
/usr/src/local/gcc/obj/gcc/include-fixed/math.h:49:7: note: previous declaration as 'union __float_u'
   49 | union __float_u {

gcc/ChangeLog:

PR target/115796
* config/i386/emmintrin.h (__float_u): Rename to ..
(__x86_float_u): .. this.
(_mm_load_sd): Ditto.
(_mm_store_sd): Ditto.
(_mm_loadh_pd): Ditto.
(_mm_loadl_pd): Ditto.
* config/i386/xmmintrin.h (__double_u): Rename to ..
(__x86_double_u): .. this.
(_mm_load_ss): Ditto.
(_mm_store_ss): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr115796.c: New test.

RISC-V: Add testcases for unsigned vector .SAT_ADD IMM form 2

After the middle-end supported the vector mode of .SAT_ADD,  add more
testcases to ensure the correctness of RISC-V backend for form 2.  Aka:

Form 2:
  #define DEF_VEC_SAT_U_ADD_IMM_FMT_2(T, IMM)                          \
  T __attribute__((noinline))                                          \
  vec_sat_u_add_imm##IMM##_##T##_fmt_2 (T *out, T *in, unsigned limit) \
  {                                                                    \
    unsigned i;                                                        \
    for (i = 0; i < limit; i++)                                        \
      out[i] = (T)(in[i] + IMM) < in[i] ? -1 : (in[i] + IMM);          \
  }

DEF_VEC_SAT_U_ADD_IMM_FMT_2 (uint64_t, 9)

Passed the fully rv64gcv regression tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add help
test macro.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-5.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-6.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-7.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-8.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-5.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-6.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-7.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add testcases for unsigned vector .SAT_ADD IMM form 1

After the middle-end supported the vector mode of .SAT_ADD,  add more
testcases to ensure the correctness of RISC-V backend for form 1.  Aka:

Form 1:
  #define DEF_VEC_SAT_U_ADD_IMM_FMT_1(T, IMM)                          \
  T __attribute__((noinline))                                          \
  vec_sat_u_add_imm##IMM##_##T##_fmt_1 (T *out, T *in, unsigned limit) \
  {                                                                    \
    unsigned i;                                                        \
    for (i = 0; i < limit; i++)                                        \
      out[i] = (T)(in[i] + IMM) >= in[i] ? (in[i] + IMM) : -1;         \
  }

DEF_VEC_SAT_U_ADD_IMM_FMT_1 (uint64_t, 9)

Passed the fully rv64gcv regression tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add help
test macro.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-4.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-4.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

Daily bump.

[to-be-committed][RISC-V][V3] DCE analysis for extension elimination

The pre-commit testing showed that making ext-dce only active at -O2 and above
would require minor edits to the tests.  In some cases we had specified -O1 in
the test or specified no optimization level at all. Those need to be bumped to
-O2.   In one test we had one set of dg-options overriding another.

The other approach that could have been taken would be to drop the -On
argument, add an explicit -fext-dce and add dg-skip-if options.  I originally
thought that was going to be way to go, but the dg-skip-if aspect was going to
get ugly as things like interaction between unrolling, peeling and -ftracer
would have to be accounted for and would likely need semi-regular adjustment.

Changes since V2:
  Testsuite changes to deal with pass only being enabled at -O2 or
  higher.

--

Changes since V1:

  Check flag_ext_dce before running the new pass.  I'd forgotten that
  I had removed that part of the gate to facilitate more testing.
  Turn flag_ext_dce on at -O2 and above.
  Adjust one of the riscv tests to explicitly avoid vectors
  Adjust a few aarch64 tests
    In tbz_2.c we remove an unnecessary extension which causes us to use
    "x" registers instead of "w" registers.

    In the pred_clobber tests we also remove an extension and that
    ultimately causes a reg->reg copy to change locations.

--

This was actually ack'd late in the gcc-14 cycle, but I chose not to integrate
it given how late we were in the cycle.

The basic idea here is to track liveness of subobjects within a word and if we
find an extension where the bits set aren't actually used, then we convert the
extension into a subreg.  The subreg typically simplifies away.

I've seen this help a few routines in coremark, fix one bug in the testsuite
(pr111384) and fix a couple internally reported bugs in Ventana.

The original idea and code were from Joern; Jivan and I hacked it into usable
shape.  I've had this in my tester for ~8 months, so it's been through more
build/test cycles than I care to contemplate and nearly every architecture we
support.

But just in case, I'm going to wait for it to spin through the pre-commit CI
tester.  I'll find my old ChangeLog before committing.

gcc/
* Makefile.in (OBJS): Add ext-dce.o
* common.opt (ext-dce): Document new option.
* df-scan.cc (df_get_ext_block_use_set): Delete prototype and
make extern.
* df.h (df_get_exit_block_use_set): Prototype.
* ext-dce.cc: New file/pass.
* opts.cc (default_options_table): Handle ext-dce at -O2 or higher.
* passes.def: Add ext-dce before combine.
* tree-pass.h (make_pass_ext_dce): Prototype.

gcc/testsuite
* gcc.target/aarch64/sve/pred_clobber_1.c: Update expected output.
* gcc.target/aarch64/sve/pred_clobber_2.c: Likewise.
* gcc.target/aarch64/sve/pred_clobber_3.c: Likewise.
* gcc.target/aarch64/tbz_2.c: Likewise.
* gcc.target/riscv/core_bench_list.c: New test.
* gcc.target/riscv/core_init_matrix.c: New test.
* gcc.target/riscv/core_list_init.c: New test.
* gcc.target/riscv/matrix_add_const.c: New test.
* gcc.target/riscv/mem-extend.c: New test.
* gcc.target/riscv/pr111384.c: New test.

Co-authored-by: Jivan Hakobyan <jivanhakobyan9@gmail.com>
Co-authored-by: Joern Rennecke <joern.rennecke@embecosm.com>

c-format.cc: add ctors to format_check_results and format_check_context

This is a minor cleanup I spotted whilst working on another patch.
No functional change intended.

gcc/c-family/ChangeLog:
* c-format.cc (format_check_results::format_check_results): New
ctor.
(struct format_check_context): Add ctor; add "m_" prefix to all
fields.
(check_format_info): Use above ctors.
(check_format_arg): Update for "m_" prefix to
format_check_context.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

i386: Promote {QI,HI}mode x86_mov<mode>cc_0_m1_neg to SImode

Promote HImode x86_mov<mode>cc_0_m1_neg insn to SImode to avoid
redundant prefixes. Also promote QImode insn when TARGET_PROMOTE_QImode
is set. This is similar to promotable_binary_operator splitter, where we
promote the result to SImode.

Also correct insn condition for splitters to SImode of NEG and NOT
instructions. The sizes of QImode and SImode instructions are always
the same, so there is no need for optimize_insn_for_size bypass.

gcc/ChangeLog:

* config/i386/i386.md (x86_mov<mode>cc_0_m1_neg splitter to SImode):
New splitter.
(NEG and NOT splitter to SImode): Remove optimize_insn_for_size_p
predicate from insn condition.

libstdc++: Fix _Atomic(T) macro in <stdatomic.h> [PR115807]

The definition of the _Atomic(T) macro needs to refer to ::std::atomic,
not some other std::atomic relative to the current namespace.

libstdc++-v3/ChangeLog:

PR libstdc++/115807
* include/c_compatibility/stdatomic.h (_Atomic): Ensure it
refers to std::atomic in the global namespace.
* testsuite/29_atomics/headers/stdatomic.h/115807.cc: New test.

Remove trailing whitespace from invoke.texi

gcc/ChangeLog:

* doc/invoke.texi: Remove trailing whitespace.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

x86: Support bitwise and/andnot/abs/neg/copysign/xorsign op for V8BF/V16BF/V32BF

This patch extends support for BF16 vector operations in GCC, including bitwise AND, ANDNOT, ABS, NEG, COPYSIGN, and XORSIGN for V8BF, V16BF, and V32BF modes.

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_fp_absneg_operator): Add VBF modes.
(ix86_expand_copysign): Ditto.
(ix86_expand_xorsign): Ditto.
* config/i386/i386.cc (ix86_build_const_vector): Ditto.
(ix86_build_signbit_mask): Ditto.
* config/i386/sse.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx2-bf16-vec-absneg.c: New test.
* gcc.target/i386/avx512f-bf16-vec-absneg.c: New test.

rs6000: load high and low part of 128bit vector independently [PR110040]

PR110040 exposes an issue concerning moves from vector registers to GPRs.
There are two moves, one for upper 64 bits and the other for the lower
64 bits.  In the problematic test case, we are only interested in storing
the lower 64 bits.  However, the instruction for copying the upper 64 bits
is still emitted and is dead code.  This patch adds a splitter that splits
apart the two move instructions so that DCE can remove the dead code after
splitting.

2024-07-08  Jeevitha Palanisamy  <jeevitha@linux.ibm.com>

gcc/
PR target/110040
* config/rs6000/vsx.md (split pattern for V1TI to DI move): New define.

gcc/testsuite/
PR target/110040
* gcc.target/powerpc/pr110040-1.c: New testcase.
* gcc.target/powerpc/pr110040-2.c: New testcase.

RISC-V: Implement .SAT_TRUNC for vector unsigned int

This patch would like to implement the .SAT_TRUNC for the RISC-V
backend.  With the help of the RVV Vector Narrowing Fixed-Point
Clip Instructions.  The below SEW(S) are supported:

* e64 => e32
* e64 => e16
* e64 => e8
* e32 => e16
* e32 => e8
* e16 => e8

Take below example to see the changes to asm.
Form 1:
  #define DEF_VEC_SAT_U_TRUNC_FMT_1(NT, WT)                             \
  void __attribute__((noinline))                                        \
  vec_sat_u_trunc_##NT##_##WT##_fmt_1 (NT *out, WT *in, unsigned limit) \
  {                                                                     \
    unsigned i;                                                         \
    for (i = 0; i < limit; i++)                                         \
      {                                                                 \
        WT x = in[i];                                                   \
        bool overflow = x > (WT)(NT)(-1);                               \
        out[i] = ((NT)x) | (NT)-overflow;                               \
      }                                                                 \
  }

DEF_VEC_SAT_U_TRUNC_FMT_1 (uint32_t, uint64_t)

Before this patch:
.L3:
  vsetvli      a5,a2,e64,m1,ta,ma
  vle64.v      v1,0(a1)
  vmsgtu.vv    v0,v1,v2
  vsetvli      zero,zero,e32,mf2,ta,ma
  vncvt.x.x.w  v1,v1
  vmerge.vim   v1,v1,-1,v0
  vse32.v      v1,0(a0)
  slli         a4,a5,3
  add          a1,a1,a4
  slli         a4,a5,2
  add          a0,a0,a4
  sub          a2,a2,a5
  bne          a2,zero,.L3

After this patch:
.L3:
  vsetvli      a5,a2,e32,mf2,ta,ma
  vle64.v      v1,0(a1)
  vnclipu.wi   v1,v1,0
  vse32.v      v1,0(a0)
  slli         a4,a5,3
  add          a1,a1,a4
  slli         a4,a5,2
  add          a0,a0,a4
  sub          a2,a2,a5
  bne          a2,zero,.L3

Passed the rv64gcv fully regression tests.

gcc/ChangeLog:

* config/riscv/autovec.md (ustrunc<mode><v_double_trunc>2): Add
new pattern for double truncation.
(ustrunc<mode><v_quad_trunc>2): Ditto but for quad truncation.
(ustrunc<mode><v_oct_trunc>2): Ditto but for oct truncation.
* config/riscv/riscv-protos.h (expand_vec_double_ustrunc): Add
new func decl to expand double vec ustrunc.
(expand_vec_quad_ustrunc): Ditto but for quad.
(expand_vec_oct_ustrunc): Ditto but for oct.
* config/riscv/riscv-v.cc (expand_vec_double_ustrunc): Add new
func impl to expand vector double ustrunc.
(expand_vec_quad_ustrunc): Ditto but for quad.
(expand_vec_oct_ustrunc): Ditto but for oct.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
test macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_data.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-4.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-5.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-6.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-4.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-5.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-6.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_unary_vv_run.h: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

fortran: Move definition of variable closer to its uses

No change of behaviour, this makes a variable easier to track.

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_trans_preloop_setup): Use a separate variable
for iteration. Use directly the value of variable I if it is known.
Move the definition of the variable to the branch where the
remaining uses are.

[RISC-V] add implied extension repeatly until stable

Call handle_implied_ext repeatly until there's no
new subset added into the subset list.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_subset_list::riscv_subset_list):
init m_subset_num to 0.
(riscv_subset_list::add): increase m_subset_num once a subset added.
(riscv_subset_list::finalize): call handle_implied_ext repeatly
until no change in m_subset_num.
* config/riscv/riscv-subset.h: add m_subset_num member.

Signed-off-by: Fei Gao <gaofei@eswincomputing.com>

rs6000: Replace orc with iorc [PR115659]

Since iorc optab is introduced, this patch is to update the
expander names and all the related uses like bif expanders,
gen functions accordingly.

PR tree-optimization/115659

gcc/ChangeLog:

* config/rs6000/rs6000-builtins.def: Update some bif expanders by
replacing orc<mode>3 with iorc<mode>3.
* config/rs6000/rs6000-string.cc (expand_cmp_vec_sequence): Update gen
function by replacing orc<mode>3 with iorc<mode>3.
* config/rs6000/rs6000.md (orc<mode>3): Rename to ...
(iorc<mode>3): ... this.

isel: Fold more in gimple_expand_vec_cond_expr with andc and iorc [PR115659]

As PR115659 shows, assuming c = x CMP y, there are some
folding chances for patterns r = c ? 0/z : z/-1:
  - for r = c ? 0 : z, it can be folded into r = ~c & z.
  - for r = c ? z : -1, it can be folded into r = ~c | z.

But BIT_AND/BIT_IOR applied on one BIT_NOT operand is a
compound operation, it's arguable to consider it beats
vector selection.  So this patch is to introduce new
optabs andc, iorc and its corresponding internal functions
BIT_{ANDC,IORC}, and if targets defines such optabs for
vector modes, it means targets support these hardware
insns and should be not worse than vector selection.

PR tree-optimization/115659

gcc/ChangeLog:

* doc/md.texi: Document andcm3 and iorcm3.
* gimple-isel.cc (gimple_expand_vec_cond_expr): Add more foldings for
patterns x CMP y ? 0 : z and x CMP y ? z : -1.
* internal-fn.def (BIT_ANDC): New internal function.
(BIT_IORC): Likewise.
* optabs.def (andc, iorc): New optab.

rs6000: Consider explicit VSX when masking off ALTIVEC [PR115688]

PR115688 exposes an inconsistent state in which we have VSX
enabled but ALTIVEC disabled.  There is one hunk:

  if (main_target_opt && !main_target_opt->x_rs6000_altivec_abi)
    rs6000_isa_flags &= ~((OPTION_MASK_VSX | OPTION_MASK_ALTIVEC)
  & ~rs6000_isa_flags_explicit);

which disables both VSX and ALTIVEC together only considering
them explicitly set or not.  For the given case, VSX is explicitly
specified, altivec is implicitly enabled as it's part of set
ISA_2_6_MASKS_SERVER.  When falling into the above hunk, vsx is
kept as it's explicitly enabled but altivec gets masked off, it's
unexpected.

This patch is to consider explicit VSX when masking off ALTIVEC,
not mask off it if TARGET_VSX and it's explicitly set.

PR target/115688

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_option_override_internal): Consider
explicit VSX when masking off ALTIVEC.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr115688.c: New test.

x86: Update branch hint for Redwood Cove.

According to Intel® 64 and IA-32 Architectures Optimization Reference
Manual[1], Branch Hint is updated for Redwood Cove.

--------cut from [1]-------------------------
Starting with the Redwood Cove microarchitecture, if the predictor has
no stored information about a branch, the branch has the Intel® SSE2
branch taken hint (i.e., instruction prefix 3EH), When the codec
decodes the branch, it flips the branch’s prediction from not-taken to
taken. It then flushes the pipeline in front of it and steers this
pipeline to fetch the taken path of the branch.
--------cut end -----------------------------

Split tune branch_prediction_hints into branch_prediction_hints_taken
and branch_prediction_hints_not_taken, always generate branch hint for
conditional branches, both tunes are disabled by default.

[1] https://www.intel.com/content/www/us/en/content-details/821612/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html

gcc/

* config/i386/i386.cc (ix86_print_operand): Always generate
branch hint for conditional branches.
* config/i386/i386.h (TARGET_BRANCH_PREDICTION_HINTS): Split
into ..
(TARGET_BRANCH_PREDICTION_HINTS_TAKEN): .. this, and ..
(TARGET_BRANCH_PREDICTION_HINTS_NOT_TAKEN): .. this.
* config/i386/x86-tune.def (X86_TUNE_BRANCH_PREDICTION_HINTS):
Split into ..
(X86_TUNE_BRANCH_PREDICTION_HINTS_TAKEN): .. this, and ..
(X86_TUNE_BRANCH_PREDICTION_HINTS_NOT_TAKEN): .. this.

Daily bump.

PR modula2/115804 ICE during gimplification with new isfinite optab

The calls to five m2 builtins have the incorrect return type.
This was detected when adding isfinitedf2 optab to the s390
backend which results in ICEs during gimplification in the
gm2 testsuite.

gcc/m2/ChangeLog:

PR modula2/115804
* gm2-gcc/m2builtins.cc (builtin_function_entry): Add GTY.
(DoBuiltinMemCopy): Add rettype and use rettype in the call.
(DoBuiltinAlloca): Ditto.
(DoBuiltinIsfinite): Ditto.
(DoBuiltinIsnan): Ditto.
(m2builtins_BuiltInHugeVal): Ditto.
(m2builtins_BuiltInHugeValShort): Ditto.
(m2builtins_BuiltInHugeValLong): Ditto.

Co-Authored-By: Stefan Schulze Frielinghaus <stefansf@linux.ibm.com>
Co-Authored-By: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

libstdc++: Tweak two links in configuration docs

libstdc++-v3:
* doc/xml/manual/configure.xml: Update Autobook 14 link.
Update GCC installation instructions link.
* doc/html/manual/configure.html: Regenerate.

maintainer-scripts: Switch bug reporting URL to https

maintainer-scripts:
* update_web_docs_git (BUGURL): Switch to https.

doc: Remove dubious example around bug reporting

gcc:
* doc/bugreport.texi (Bug Criteria): Remove dubious example.

c++: Simplify uses of LAMBDA_EXPR_EXTRA_SCOPE

I noticed there already exists a getter to get the scope of a lambda
from its type directly rather than needing to go via
CLASSTYPE_LAMBDA_EXPR, we may as well use it.

gcc/cp/ChangeLog:

* module.cc (trees_out::get_merge_kind): Use
LAMBDA_TYPE_EXTRA_SCOPE instead of LAMBDA_EXPR_EXTRA_SCOPE.
(trees_out::key_mergeable): Likewise.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

ada: Make the names of uninstalled cross-gnattools consistent across builds

We suffer from an inconsistency in the names of uninstalled gnattools
executables in cross-compiler configurations.  The cause is a recipe we
have:

ada.all.cross:
for tool in $(ADA_TOOLS) ; do \
  if [ -f $$tool$(exeext) ] ; \
  then \
    $(MV) $$tool$(exeext) $$tool-cross$(exeext); \
  fi; \
done

the intent of which is to give the names of gnattools executables the
'-cross' suffix, consistently with the compiler drivers: 'gcc-cross',
'g++-cross', etc.

A problem with the recipe is that this 'make' target is called too early
in the build process, before gnattools have been made.  Consequently no
renames happen and owing to that they are conditional on the presence of
the individual executables the recipe succeeds doing nothing.

However if a target is requested later on such as 'make pdf' that does
not cause gnattools executables to be rebuilt, then 'ada.all.cross' does
succeed in renaming the executables already present in the build tree.
Then if the 'gnat' testsuite is run later on which expects non-suffixed
'gnatmake' executable, it does not find the 'gnatmake-cross' executable
in the build tree and may either catastrophically fail or incorrectly
use a system-installed copy of 'gnatmake'.

Of course if a target is requested such as `make all' that does cause
gnattools executables to be rebuilt, then both suffixed and non-suffixed
uninstalled executables result.

Fix the problem by moving the renaming of gnattools to a separate 'make'
recipe, pasted into a new 'gnattools-cross-mv' target and the existing
legacy 'cross-gnattools' target.  Then invoke the new target explicitly
from the 'gnattools-cross' recipe in gnattools/.

Update the test harness accordingly, so that suffixed gnattools are used
in cross-compilation testsuite runs.

gcc/ada/
* gcc-interface/Make-lang.in (ada.all.cross): Move recipe to...
(GNATTOOLS_CROSS_MV): ... this new variable.
(cross-gnattools): Paste it here.
(gnattools-cross-mv): New target.

gnattools/
* Makefile.in (gnattools-cross): Also build 'gnattools-cross-mv'
in GCC_DIR.

gcc/testsuite/
* lib/gnat.exp (local_find_gnatmake, find_gnatclean): Use
'-cross' suffix where testing a cross-compiler.

libstdc++: Fix std::find for non-contiguous iterators [PR115799]

The r15-1857 change didn't correctly restrict the new optimization to
contiguous iterators.

libstdc++-v3/ChangeLog:

PR libstdc++/115799
* include/bits/stl_algo.h (find): Use 'if constexpr' so that
memchr optimization is a discarded statement for non-contiguous
iterators.
* testsuite/25_algorithms/find/bytes.cc: Check with input
iterators.

libstdc++: Fix memchr path in std::ranges::find for non-common range [PR115799]

The memchr optimization introduced in r15-1857 needs to advance the
start iterator instead of returning the sentinel.

libstdc++-v3/ChangeLog:

PR libstdc++/115799
* include/bits/ranges_util.h (__find_fn): Return iterator
instead of sentinel.
* testsuite/25_algorithms/find/constrained.cc: Check non-common
contiguous sized range of char.

Daily bump.

libstdc++: Remove redundant 17_intro/headers tests

We have several nearly identical tests under 17_intro/headers which only
differ in a -std option set using dg-options. Since the testsuite now
supports running tests with multiple -std options (and I test that
regularly) we don't need these duplicated tests. We can remove most of
them and let the testsuite decide which -std option to use.

In the all_attributes.cc case the content of the tests is slightly
different, but they can be combined into one test that defines macros
conditionally based on __cplusplus checks.

The stdc++.cc tests could also be combined this way, but for now I've
just kept one version for c++98 and one for all later standards.

For stdc++_multiple_inclusion.cc we can remove the body of the files and
just include stdc++.cc twice. This means we don't need to add includes
to both stdc++.cc and stdc++_multiple_inclusion.cc, we only need to
update one place.

libstdc++-v3/ChangeLog:

* testsuite/17_intro/headers/c++1998/all_attributes.cc: Add
attribute names from later standards and remove dg-options.
* testsuite/17_intro/headers/c++1998/stdc++.cc: Add c++98_only
target selector.
* testsuite/17_intro/headers/c++1998/stdc++_multiple_inclusion.cc:
Remove content and include stdc++.cc twice instead.
* testsuite/17_intro/headers/c++2011/stdc++.cc: Replace
dg-options with c++11 target selector.
* testsuite/17_intro/headers/c++2011/stdc++_multiple_inclusion.cc:
Remove content and include stdc++.cc twice instead.
* testsuite/17_intro/headers/c++2011/all_attributes.cc: Removed.
* testsuite/17_intro/headers/c++2011/all_no_exceptions.cc: Removed.
* testsuite/17_intro/headers/c++2011/all_no_rtti.cc: Removed.
* testsuite/17_intro/headers/c++2011/all_pedantic_errors.cc: Removed.
* testsuite/17_intro/headers/c++2011/charset.cc: Removed.
* testsuite/17_intro/headers/c++2011/operator_names.cc: Removed.
* testsuite/17_intro/headers/c++2014/all_attributes.cc: Removed.
* testsuite/17_intro/headers/c++2014/all_no_exceptions.cc: Removed.
* testsuite/17_intro/headers/c++2014/all_no_rtti.cc: Removed.
* testsuite/17_intro/headers/c++2014/all_pedantic_errors.cc: Removed.
* testsuite/17_intro/headers/c++2014/charset.cc: Removed.
* testsuite/17_intro/headers/c++2014/operator_names.cc: Removed.
* testsuite/17_intro/headers/c++2014/stdc++.cc: Removed.
* testsuite/17_intro/headers/c++2014/stdc++_multiple_inclusion.cc: Removed.
* testsuite/17_intro/headers/c++2017/all_attributes.cc: Removed.
* testsuite/17_intro/headers/c++2017/all_no_exceptions.cc: Removed.
* testsuite/17_intro/headers/c++2017/all_no_rtti.cc: Removed.
* testsuite/17_intro/headers/c++2017/all_pedantic_errors.cc: Removed.
* testsuite/17_intro/headers/c++2017/charset.cc: Removed.
* testsuite/17_intro/headers/c++2017/operator_names.cc: Removed.
* testsuite/17_intro/headers/c++2017/stdc++.cc: Removed.
* testsuite/17_intro/headers/c++2017/stdc++_multiple_inclusion.cc: Removed.
* testsuite/17_intro/headers/c++2020/all_attributes.cc: Removed.
* testsuite/17_intro/headers/c++2020/all_no_exceptions.cc: Removed.
* testsuite/17_intro/headers/c++2020/all_no_rtti.cc: Removed.
* testsuite/17_intro/headers/c++2020/all_pedantic_errors.cc: Removed.
* testsuite/17_intro/headers/c++2020/charset.cc: Removed.
* testsuite/17_intro/headers/c++2020/operator_names.cc: Removed.
* testsuite/17_intro/headers/c++2020/stdc++.cc: Removed.
* testsuite/17_intro/headers/c++2020/stdc++_multiple_inclusion.cc: Removed.

libstdc++: Use reserved form of [[__likely__]] in <variant>

We should not use [[unlikely]] before C++20, so use [[__unlikely__]]
instead.

libstdc++-v3/ChangeLog:

* include/std/variant (_Variant_storage::_M_reset): Use
__unlikely__ form of attribute instead of unlikely.

libstdc++: Restore support for including <name.h> in extern "C" [PR115797]

The r15-1857 change means that <type_traits> is included by <cmath> for
C++17 and up, which breaks code including <math.h> inside an extern "C"
block. Although doing that is not allowed by the C++ standard, there's
lots of existing code which incorrectly thinks it's a good idea and so
we try to support it.

libstdc++-v3/ChangeLog:

PR libstdc++/115797
* include/std/type_traits: Ensure "C++" language linkage.
* testsuite/17_intro/headers/c++2011/linkage.cc: Replace
dg-options with c++11 target selector.

[to-be-committed][v3][RISC-V] Handle bit manipulation of SImode values

Last patch in this round of bitmanip work...  At least I think I'm going to
pause here and switch gears to other projects that need attention 🙂

This patch introduces the ability to generate bitmanip instructions for rv64
when operating on SI objects when we know something about the range of the bit
position (due to masking of the position).

I've got note that the (7-pos % 8) bit position form was discovered by RAU in
500.perl.  I took that and expanded it to the simple (pos & mask) form as well
as covering bset, binv and bclr.

As far as the implementation is concerned....

This turns the recently added define_splits into define_insn_and_split
constructs.  This allows combine to "see" enough RTL to realize a sign
extension is unnecessary.  Otherwise we get undesirable sign extensions for the
new testcases.

Second it adds new patterns for the logical operations.  Two patterns for
IOR/XOR and two patterns for AND.

I think a key concept to keep in mind is that once we determine a Zbs operation
is safe to perform on a SI value, we can rewrite the RTL in 64bit form.  If we
were ever to try and use range information at expand time for this stuff (and
we probably should investigate that), that's the path I'd suggest.

This is notably cleaner than my original implementation which actually kept the
more complex RTL form through final and emitted 2/3 instructions (mask the bit
position, then the bset/bclr/binv).

Tested in my tester, but waiting for pre-commit CI to report back before taking
further action.

gcc/
* config/riscv/bitmanip.md (bset splitters): Turn into define_and_splits.
Don't depend on combine splitting the "andn with constant" form.
(bset, binv, bclr with masked bit position): New patterns.

gcc/testsuite
* gcc.target/riscv/binv-for-simode-1.c: New test.
* gcc.target/riscv/bset-for-simode-1.c: New test.
* gcc.target/riscv/bclr-for-simode-1.c: New test.

testsuite/52641 - Fix more sloppy tests.

PR testsuite/52641
gcc/testsuite/
* gcc.dg/analyzer/torture/boxed-ptr-1.c: Requires size24plus.
* gcc.dg/analyzer/torture/pr102692.c: Use intptr_t instead of long.
* gcc.dg/ipa/pr102714.c: Use uintptr_t instead of unsigned long.
* gcc.dg/torture/pr115387-1.c: Same.
* gcc.dg/torture/pr113895-1.c : Same.
* gcc.dg/ipa/pr108007.c: Require int32plus.
* gcc.dg/ipa/pr109318.c: Same.
* gcc.dg/ipa/pr96040.c: Use size_t instead of unsigned long.
* gcc.dg/torture/pr113126.c: Use vectors of same dimension.
* gcc.dg/tree-ssa/builtin-sprintf-9.c: Requires double64.

* gcc.dg/spellcheck-inttypes.c [avr]: Avoid include of inttypes.h.
* gcc.dg/analyzer/torture/pr104159.c [avr]: Skip.
* gcc.dg/torture/pr84682-2.c [avr]: Skip.
* gcc.dg/wtr-conversion-1.c [avr]: Remove avr selector since
long double is a 64-bit type by now.

[committed] Fix various sh define_insn_and_split predicates

The sh4-linux-gnu port has failed to bootstrap since the introduction of late
combine due to failures to split certain insns.

This is caused by incorrect predicates in various define_insn_and_split
patterns.  Essentially the insn's predicate is something like "TARGET_SH1".
The split predicate is "&& can_create_pseudos_p ()".  So these patterns will
match post-reload, but be un-splittable.  So at assembly output time, we get
the failure as the output template is "#".

This patch fixes the most obvious & egregious cases by bringing the split
condition into the insn's predicate and leaving "&& 1" as the split condition.
That's enough to get sh4-linux-gnu bootstrapping again and I'm hoping it does
the same for sh4eb-linux-gnu.

Pushing to the trunk.

gcc/
* config/sh/sh.md (adddi3): Only allow matching when we can
still create new pseudos.
(subdi3, *rotcl, *rotcr, *rotcr_neg_t, negdi2): Likewise.
(abs<mode>2, negabs<mode>2, negdi_cond): Likewise.
(*swapbisi2_and_shl8, *swapbhisi2, *movsi_index_disp_load): Likewise.
(*movhi_index_disp_load, *mov<mode>index_disp_store): Likewise.
(*mov_t_msb_neg, *negt_msb, clipu_one): Likewise.

AVR: Create more opportunities for -mfuse-add optimization.

avr_split_tiny_move() was only run for AVR_TINY because it has no PLUS
addressing modes.  Same applies to the X register on ordinary cores, and
also to the Z register when used with [E]LPM.  For example, without this patch

long long addLL (long long *a, long long *b)
{
  return *a + *b;
}

compiles with "-mmcu=atmgea128 -Os -dp" to:

    ...
    movw r26,r24     ;  80  [c=4 l=1]  *movhi/0
    movw r30,r22     ;  81  [c=4 l=1]  *movhi/0
    ld r18,X         ;  82  [c=4 l=1]  movqi_insn/3
    adiw r26,1   ;  83  [c=4 l=3]  movqi_insn/3
    ld r19,X
    sbiw r26,1
    adiw r26,2   ;  84  [c=4 l=3]  movqi_insn/3
    ld r20,X
    sbiw r26,2
    adiw r26,3   ;  85  [c=4 l=3]  movqi_insn/3
    ld r21,X
    sbiw r26,3
    adiw r26,4   ;  86  [c=4 l=3]  movqi_insn/3
    ld r22,X
    sbiw r26,4
    adiw r26,5   ;  87  [c=4 l=3]  movqi_insn/3
    ld r23,X
    sbiw r26,5
    adiw r26,6   ;  88  [c=4 l=3]  movqi_insn/3
    ld r24,X
    sbiw r26,6
    adiw r26,7   ;  89  [c=4 l=2]  movqi_insn/3
    ld r25,X
    ld r10,Z         ;  90  [c=4 l=1]  movqi_insn/3
    ...

whereas with this patch it becomes:

    ...
    movw r26,r24     ;  80  [c=4 l=1]  *movhi/0
    movw r30,r22     ;  81  [c=4 l=1]  *movhi/0
    ld r18,X+        ;  140 [c=4 l=1]  movqi_insn/3
    ld r19,X+        ;  142 [c=4 l=1]  movqi_insn/3
    ld r20,X+        ;  144 [c=4 l=1]  movqi_insn/3
    ld r21,X+        ;  146 [c=4 l=1]  movqi_insn/3
    ld r22,X+        ;  148 [c=4 l=1]  movqi_insn/3
    ld r23,X+        ;  150 [c=4 l=1]  movqi_insn/3
    ld r24,X+        ;  152 [c=4 l=1]  movqi_insn/3
    ld r25,X         ;  109 [c=4 l=1]  movqi_insn/3
    ld r10,Z         ;  111 [c=4 l=1]  movqi_insn/3
    ...

gcc/
* config/avr/avr.md: Also split with avr_split_tiny_move()
for non-AVR_TINY.
* config/avr/avr.cc (avr_split_tiny_move): Don't change memory
references with base regs that can do PLUS addressing.
(avr_out_lpm_no_lpmx) [POST_INC]: Don't output final ADIW when the
address register is unused after.
gcc/testsuite/
* gcc.target/avr/torture/fuse-add.c: New test.

RISC-V: fix internal error on global variable-length array

This is an ICE in the RISC-V back-end calling tree_to_uhwi on the DECL_SIZE
of a global variable-length array.

gcc/
PR target/115591
* config/riscv/riscv.cc (riscv_valid_lo_sum_p): Add missing test on
tree_fits_uhwi_p before calling tree_to_uhwi.

gcc/testsuite/
* gnat.dg/array41.ads, gnat.dg/array41.adb: New test.

PR target/115751: Avoid force_reg in ix86_expand_ternlog.

This patch fixes a problem with splitting of complex AVX512 ternlog
instructions on x86_64.  A recent change allows the ternlog pattern
to have multiple mem-like operands prior to reload, by emitting any
"reloads" as necessary during split1, before register allocation.
The issue is that this code calls force_reg to place the mem-like
operand into a register, but unfortunately the vec_duplicate (broadcast)
form of operands supported by ternlog isn't considered a "general_operand",
i.e. supported by all instructions.  This mismatch triggers an ICE in
the middle-end's force_reg, even though the x86 supports loading these
vec_duplicate operands into a vector register in a single (move)
instruction.

This patch resolves this problem by replacing force_reg with calls
to gen_reg_rtx and emit_move (as the i386 backend, unlike the middle-end,
knows these will be recognized by recog).

2024-07-06  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
PR target/115751
* config/i386/i386-expand.cc (ix86_expand_ternlog): Avoid use of
force_reg to "reload" non-register operands, as these may contain
vec_duplicate (broadcast) operands that aren't supported by
force_reg.  Use (safer) gen_reg_rtx and emit_move instead.

Daily bump.

x86, Darwin: Fix bootstrap for 32b multilibs/hosts.

r15-1735-ge62ea4fb8ffcab06ddd contained changes that altered the
codegen for 32b Darwin (whether hosted on 64b or as 32b host) such
that the per function picbase load is called multiple times in some
cases. Darwin's back end is not expecting this (and indeed some of
the handling depends on a single instance).

The fixes the issue by marking those instructions as not copyable
(as suggested by Andrew Pinski).

The change is Darwin-specific.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_cannot_copy_insn_p): New.
(TARGET_CANNOT_COPY_INSN_P): New.

Signed-off-by: Iain Sandoe <iains@gcc.gnu.org>

Fortran: switch test to use issignaling() built-in

The macro may not be present in all libc's, but the built-in
is always available.

gcc/testsuite/ChangeLog:

* gfortran.dg/ieee/signaling_2.f90: Adjust test.
* gfortran.dg/ieee/signaling_2_c.c: Adjust test.

Arm: Fix ldrd offset range [PR115153]

The valid offset range of LDRD in arm_legitimate_index_p is increased to
-1024..1020 if NEON is enabled since VALID_NEON_DREG_MODE includes DImode.
Fix this by moving the LDRD check earlier.

gcc:
PR target/115153
* config/arm/arm.cc (arm_legitimate_index_p): Move LDRD case before
NEON.
(thumb2_legitimate_index_p): Update comments.
(output_move_neon): Use DFmode for vldr/vstr and non-checking
adjust_address.

gcc/testsuite:
PR target/115153
* gcc.target/arm/pr115153.c: Add new test.
* lib/target-supports.exp: Add arm_arch_v7ve_neon target support.

libgccjit: Allow comparing array types

gcc/jit/ChangeLog:

* jit-common.h: Add array_type class.
* jit-recording.h (type::dyn_cast_array_type,
memento_of_get_aligned::dyn_cast_array_type,
array_type::dyn_cast_array_type, array_type::is_same_type_as):
New methods.

gcc/testsuite/ChangeLog:

* jit.dg/test-types.c: Add array type comparison to the test.

libgccjit: Add support for the type bfloat16

gcc/jit/ChangeLog:

PR jit/112574
* docs/topics/types.rst: Document GCC_JIT_TYPE_BFLOAT16.
* jit-common.h: Update NUM_GCC_JIT_TYPES.
* jit-playback.cc (get_tree_node_for_type): Support bfloat16.
* jit-recording.cc (recording::memento_of_get_type::get_size,
recording::memento_of_get_type::dereference,
recording::memento_of_get_type::is_int,
recording::memento_of_get_type::is_signed,
recording::memento_of_get_type::is_float,
recording::memento_of_get_type::is_bool): Support bfloat16.
* libgccjit.h (enum gcc_jit_types): Add GCC_JIT_TYPE_BFLOAT16.

gcc/testsuite/ChangeLog:

PR jit/112574
* jit.dg/all-non-failing-tests.h: New test test-bfloat16.c.
* jit.dg/test-types.c: Test GCC_JIT_TYPE_BFLOAT16.
* jit.dg/test-bfloat16.c: New test.

MAINTAINERS: Fix order in DCO

ChangeLog:

* MAINTAINERS: Fix order in Contributing under the DCO.

Signed-off-by: Filip Kastl <fkastl@suse.cz>

RISC-V: Use tu policy for first-element vec_set [PR115725].

This patch changes the tail policy for vmv.s.x from ta to tu.
By default the bug does not show up with qemu because qemu's
current vmv.s.x implementation always uses the tail-undisturbed
policy. With a local qemu version that overwrites the tail
with ones when the tail-agnostic policy is specified, the bug
shows.

gcc/ChangeLog:

* config/riscv/autovec.md: Add TU policy.
* config/riscv/riscv-protos.h (enum insn_type): Define
SCALAR_MOVE_MERGED_OP_TU.

gcc/testsuite/ChangeLog:

PR target/115725

* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: Adjust
test expectation.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: Ditto.

AVR: target/87376 - Use nop_general_operand for DImode inputs.

The avr-dimode.md expanders have code like emit_move_insn(acc_a, operands[1])
where acc_a is a hard register and operands[1] might be a non-generic
address-space memory reference. Such loads may clobber hard regs since
some of them are implemented as libgcc calls /and/ 64-moves are
expanded as eight byte-moves, so that acc_a or acc_b might be clobbered
by such a load.

This patch simply denies non-generic address-space references by using
nop_general_operand for all avr-dimode.md input predicates.
With the patch, all memory loads that require library calls are issued
before the expander codes from avr-dimode.md are run.

PR target/87376
gcc/
* config/avr/avr-dimode.md: Use "nop_general_operand" instead
of "general_operand" as predicate for all input operands.

gcc/testsuite/
* gcc.target/avr/torture/pr87376.c: New test.

libstdc++: Add dg-error for new -Wdelete-incomplete diagnostics [PR115747]

Since r15-1794-gbeb7a418aaef2e the -Wdelete-incomplete diagnostic is a
permerror instead of a (suppressed in system headers) warning. Add
dg-error directives.

libstdc++-v3/ChangeLog:

PR c++/115747
* testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc:
Add dg-error for new C++26 diagnostics.

libstdc++: Use RAII in <bits/stl_uninitialized.h>

This adds an _UninitDestroyGuard class template, similar to
ranges::_DestroyGuard used in <bits/ranges_uninitialized.h>. This allows
us to remove all the try-catch blocks and rethrows, because any required
cleanup gets done in the guard destructor.

libstdc++-v3/ChangeLog:

* include/bits/stl_uninitialized.h (_UninitDestroyGuard): New
class template and partial specialization.
(__do_uninit_copy, __do_uninit_fill, __do_uninit_fill_n)
(__uninitialized_copy_a, __uninitialized_fill_a)
(__uninitialized_fill_n_a, __uninitialized_copy_move)
(__uninitialized_move_copy, __uninitialized_fill_move)
(__uninitialized_move_fill, __uninitialized_default_1)
(__uninitialized_default_n_a, __uninitialized_default_novalue_1)
(__uninitialized_default_novalue_n_1, __uninitialized_copy_n)
(__uninitialized_copy_n_pair): Use it.

libstdc++: Use memchr to optimize std::find [PR88545]

This optimizes std::find to use memchr when searching for an integer in
a range of bytes.

libstdc++-v3/ChangeLog:

PR libstdc++/88545
PR libstdc++/115040
* include/bits/cpp_type_traits.h (__can_use_memchr_for_find):
New variable template.
* include/bits/ranges_util.h (__find_fn): Use memchr when
possible.
* include/bits/stl_algo.h (find): Likewise.
* testsuite/25_algorithms/find/bytes.cc: New test.

AArch64: lower 2 reg TBL permutes with one zero register to 1 reg TBL.

When a two reg TBL is performed with one operand being a zero vector we can
instead use a single reg TBL and map the indices for accessing the zero vector
to an out of range constant.

On AArch64 out of range indices into a TBL have a defined semantics of setting
the element to zero.  Many uArches have a slower 2-reg TBL than 1-reg TBL.

Before this change we had:

typedef unsigned int v4si __attribute__ ((vector_size (16)));

v4si f1 (v4si a)
{
  v4si zeros = {0,0,0,0};
  return __builtin_shufflevector (a, zeros, 0, 5, 1, 6);
}

which generates:

f1:
        mov     v30.16b, v0.16b
        movi    v31.4s, 0
        adrp    x0, .LC0
        ldr     q0, [x0, #:lo12:.LC0]
        tbl     v0.16b, {v30.16b - v31.16b}, v0.16b
        ret

.LC0:
        .byte   0
        .byte   1
        .byte   2
        .byte   3
        .byte   20
        .byte   21
        .byte   22
        .byte   23
        .byte   4
        .byte   5
        .byte   6
        .byte   7
        .byte   24
        .byte   25
        .byte   26
        .byte   27

and with the patch:

f1:
        adrp    x0, .LC0
        ldr     q31, [x0, #:lo12:.LC0]
        tbl     v0.16b, {v0.16b}, v31.16b
        ret

.LC0:
        .byte   0
        .byte   1
        .byte   2
        .byte   3
        .byte   -1
        .byte   -1
        .byte   -1
        .byte   -1
        .byte   4
        .byte   5
        .byte   6
        .byte   7
        .byte   -1
        .byte   -1
        .byte   -1
        .byte   -1

This sequence is generated often by openmp and aside from the
strict performance impact of this change, it also gives better
register allocation as we no longer have the consecutive
register limitation.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (struct expand_vec_perm_d): Add zero_op0_p
and zero_op_p1.
(aarch64_evpc_tbl): Implement register value remapping.
(aarch64_vectorize_vec_perm_const): Detect if operand is a zero dup
before it's forced to a reg.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/tbl_with_zero_1.c: New test.
* gcc.target/aarch64/tbl_with_zero_2.c: New test.

AArch64: remove aarch64_simd_vec_unpack<su>_lo_

The fix for PR18127 reworked the uxtl to zip optimization.
In doing so it undid the changes in aarch64_simd_vec_unpack<su>_lo_ and this now
no longer matches aarch64_simd_vec_unpack<su>_hi_. It still works because the
RTL generated by aarch64_simd_vec_unpack<su>_lo_ overlaps with the general zero
extend RTL and so because that one is listed before the lo pattern recog picks
it instead.

This removes aarch64_simd_vec_unpack<su>_lo_.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md
(aarch64_simd_vec_unpack<su>_lo_<mode>): Remove.
(vec_unpack<su>_lo_<mode): Simplify.
* config/aarch64/aarch64.cc (aarch64_gen_shareable_zero): Update
comment.

middle-end: Add debug functions to dump dominator tree in dot format

This adds debug functions to dump the dominator tree in dot format.
There are two overloads: one which takes a FILE * and another which
takes a const char *fname and wraps the first with fopen/fclose for
convenience.

gcc/ChangeLog:

* dominance.cc (dot_dominance_tree): New.

i386: Refactor ssedoublemode

ssedoublemode's double should mean double type, like SI -> DI.
And we need to refactor some patterns with <ssedoublemode> instead of
<ssedoublevecmode>.

gcc/ChangeLog:

* config/i386/sse.md (ssedoublemode): Remove mappings to twice
the number of same-sized elements. Add mappings to the same
number of double-sized elements.
(define_split for vec_concat_minus_plus): Change mode_attr from
ssedoublemode to ssedoublevecmode.
(define_split for vec_concat_plus_minus): Ditto.
(<mask_codefor>avx512dq_shuf_<shuffletype>64x2_1<mask_name>):
Ditto.
(avx512f_shuf_<shuffletype>64x2_1<mask_name>): Ditto.
(avx512vl_shuf_<shuffletype>32x4_1<mask_name>): Ditto.
(avx512f_shuf_<shuffletype>32x4_1<mask_name>): Ditto.

MIPS: Support more cases with alien mode of SHF.DF

Currently, we support the cases that strictly fit for the instructions.
For example, for V16QImode, we only support shuffle like
(0<=N0, N1, N2, N3<=3 here)
N0, N1, N2, N3
N0+4 N1+4 N2+4, N3+4
N0+8 N1+8 N2+8, N3+8
N0+12 N1+12 N2+12, N3+12

While in fact we can support more cases to try use other SHF.DF
instructions not strictly fitting the mode.

1) We can use SHF.H to support more cases for V16QImode:
(M0/M1/M2/M3 are 0 or 2 or 4 or 6)
M0 M0+1, M1, M1+1
M2 M2+1, M3, M3+1
M0+8 M0+9, M1+8, M1+9
M2+8 M2+9, M3+8, M3+9

2) We can use SHF.W to support some cases for V16QImode:
(M0/M1/M2/M3 are 0 or 4 or 8 or 12)
M0, M0+1, M0+2, M0+3
M1, M1+1, M1+2, M1+3
M2, M2+1, M2+2, M2+3
M3, M3+1, M3+2, M3+3

3) We can use SHF.W to support some cases for V8HImode:
(M0/M1/M2/M3 are 0 or 2 or 4 or 6)
M0, M0+1
M1, M1+1
M2, M2+1
M3, M3+1

4) We can also use SHF.W to swap the 2 parts of V2DF or V2DI.

gcc
* config/mips/mips-protos.h: New function mips_msa_shf_i8.
* config/mips/mips-msa.md(MSA_WHB_W): Not used anymore;
(msa_shf_<msafmt_f>): Use mips_msa_shf_i8.
* config/mips/mips.cc(mips_const_vector_shuffle_set_p):
Support more cases try to use alien mode instruction;
(mips_msa_shf_i8): New function to get the correct MSA SHF
instruction and IMM.

Testsuite/MIPS: Fix msa.c: test7_v2f64, test7_v4f32, test43_v2i64

BNEGI.W/D are used for test7_v2f64 and test7_v4f32 now. It is
an improvment since that we can save a instruction.

ILVR.D is used for test43_v2i64 now, instead of INSVE.D.

gcc/testsuite
* gcc.target/mips/msa.c: Fix test7_v2f64, test7_v4f32 and
test43_v2i64.

MIPS/testsuite: Add -mfpxx to call-clobbered-1.c

The scan-assembler-times rules only fit for -mfp32 and -mfpxx.
It fails if we are configured as FP64 by default, as it has
one less sdc1/ldc1 pair.

gcc/testsuite
* gcc.target/mips/call-clobbered-1.c: Add -mfpxx.

MIPS/testsuite: Fix umips-save-restore-1.c

With some recent optimization, -O1/-O2/-O3 can archive almost same
performace/size by stack load/store.  Thus lwm/swm will save/store
less callee-saved register.  In fact only $16 is saved with swm.

To be sure that this optimization does exist, let's add 2 more
function calls.  So that lwm/swm can be much more profitable.

If we add only once more, -O1 will still use stack load/store.

gcc/testsuite
* gcc.target/mips/umips-save-restore-1.c: Be sure lwm/swm
are used for more callee-saved registers with addtional
2 more function calls.