Daniel Cederman [Fri, 8 Dec 2023 08:49:12 +0000 (09:49 +0100)]
sparc: Char arrays are 64-bit aligned on SPARC
pr88077 fails on SPARC since char HeaderStr[1] in pr88077_1.c and
long HeaderStr in pr88077_0.c differ in alignment.
Warning printed by Binutils ld:
warning: alignment 4 of normal symbol `HeaderStr' in c_lto_pr88077_0.o is
smaller than 8 used by the common definition in c_lto_pr88077_1.o
gcc/testsuite/ChangeLog:
* gcc.dg/lto/pr88077_0.c: Change type to match alignment for SPARC.
Daniel Cederman [Thu, 4 Jan 2024 13:56:06 +0000 (14:56 +0100)]
sparc: Add errata workaround to membar patterns
LEON now uses the standard V8 membar patterns that contain an ldstub
instruction. This instruction needs to be aligned properly when the
GR712RC errata workaround is enabled.
gcc/ChangeLog:
* config/sparc/sparc.cc (atomic_insn_for_leon3_p): Treat membar_storeload as atomic.
* config/sparc/sync.md (membar_storeload): Turn into named insn
and add GR712RC errata workaround.
(membar_v8): Add GR712RC errata workaround.
Andreas Larsson [Mon, 16 Jan 2023 14:43:24 +0000 (15:43 +0100)]
sparc: Revert membar optimization that is not suitable for LEON5
LEON5 has a deeper write-buffer and hence stb is not enough to flush a
write out. For compatibility, use the default V8 approach for both
LEON3 and LEON5.
Jakub Jelinek [Tue, 16 Jan 2024 10:49:34 +0000 (11:49 +0100)]
cfgexpand: Workaround CSE of ADDR_EXPRs in VAR_DECL partitioning [PR113372]
The following patch adds a quick workaround to bugs in VAR_DECL
partitioning.
The problem is that there is no dependency between ADDR_EXPRs of local
decls and CLOBBERs of those vars, so VN can CSE uses of ADDR_EXPRs
(including ivopts integral variants thereof), which can break
add_scope_conflicts discovery of what variables are actually used
in a certain region.
E.g. we can have
ivtmp.40_3 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B];
...
uses of ivtmp.40_3
...
bitint.6 ={v} {CLOBBER(eos)};
...
ivtmp.28_43 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B];
...
uses of ivtmp.28_43
before VN (such as dom3), which the add_scope_conflicts code identifies as 2
independent uses of the bitint.6 variable (which is correct), but then VN
determines ivtmp.28_43 is the same as ivtmp.40_3 and just uses ivtmp.40_3
even in the second region; at that point add_scope_conflicts thinks the
bitint.6 variable is not used in that region anymore.
The following patch does a simple single def-stmt check for such ADDR_EXPRs
(rather than say trying to do a full propagation of what SSA_NAMEs can
contain ADDR_EXPRs of local variables), which seems to work around all 4 PRs.
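The shape of that check, roughly (a sketch only; the bitmap-of-DECL_UIDs
bookkeeping here is an assumption, not the exact code of the new
add_scope_conflicts_2):

static void
add_scope_conflicts_2 (tree use, bitmap work)
{
  /* Only SSA names of pointer or integral type can carry an address.  */
  if (TREE_CODE (use) != SSA_NAME
      || (!POINTER_TYPE_P (TREE_TYPE (use))
          && !INTEGRAL_TYPE_P (TREE_TYPE (use))))
    return;

  /* Single def-stmt check: does the definition take the address
     of a local variable?  */
  gimple *g = SSA_NAME_DEF_STMT (use);
  if (!is_gimple_assign (g))
    return;
  tree op = gimple_assign_rhs1 (g);
  if (TREE_CODE (op) == ADDR_EXPR)
    {
      tree base = get_base_address (TREE_OPERAND (op, 0));
      if (base && VAR_P (base) && !is_global_var (base))
        /* Conservatively treat the variable as used in this region.  */
        bitmap_set_bit (work, DECL_UID (base));
    }
}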
In addition to this patch I've used the attached one to gather statistics
on the total size of all variable partitions in a function, and it seems that
besides the new testcases nothing is really affected compared to no patch (I've
actually just modified the patch to == OMP_SCAN instead of == ADDR_EXPR, so
it looks the same except that it never triggers). The comparison wasn't
perfect because I've only gathered BITS_PER_WORD, main_input_filename (did
some replacement of build directories and /tmp/ccXXXXXX names of LTO to make
it more similar between the two bootstraps/regtests), current_function_name
and the total size of all variable partitions if any, because I didn't
record e.g. the optimization options and so e.g. torture tests which iterate
over options could have different partition sizes even in one compiler when
BITS_PER_WORD, main_input_filename and current_function_name are all equal.
So I had to write an awk script to check if the first triple in the second
build appeared in the first one and the quadruple in the second build
appeared in the first one too, otherwise print the result, and that only
triggered in the new tests.
Also, the cc1plus binary according to objdump -dr is identical between the
two builds except for the ADDR_EXPR vs. OMP_SCAN constant in the two spots.
2024-01-16 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113372
PR middle-end/90348
PR middle-end/110115
PR middle-end/111422
* cfgexpand.cc (add_scope_conflicts_2): New function.
(add_scope_conflicts_1): Use it.
* gcc.dg/torture/bitint-49.c: New test.
* gcc.c-torture/execute/pr90348.c: New test.
* gcc.c-torture/execute/pr110115.c: New test.
* gcc.c-torture/execute/pr111422.c: New test.
Feng Xue [Thu, 28 Dec 2023 08:55:39 +0000 (16:55 +0800)]
Do not count unused scalar use when marking STMT_VINFO_LIVE_P [PR113091]
When pattern recognition is involved, a statement whose definition is
consumed in some pattern may not be included in the final replacement
pattern statements, and would be skipped when building the SLP graph.
* Original
char a_c = *(char *) a;
char b_c = *(char *) b;
unsigned short a_s = (unsigned short) a_c;
int a_i = (int) a_s;
int b_i = (int) b_c;
int r_i = a_i - b_i;
The definitions of a_i (original statement) and b_i (pattern statement)
are related to, but actually not part of, the widen_minus pattern.
Vectorizing the pattern does not cause these definition statements to
be marked as PURE_SLP. For this case, we need to recursively check
whether their uses are all absorbed into vectorized code. But there
is an exception: some use may participate in a vectorized
operation via an external SLP node containing that use as an element.
gcc/ChangeLog:
PR tree-optimization/113091
* tree-vect-slp.cc (vect_slp_has_scalar_use): New function.
(vect_bb_slp_mark_live_stmts): New parameter scalar_use_map, check
scalar use with new function.
(vect_bb_slp_mark_live_stmts): New function as entry to the existing
overloaded functions with the same name.
(vect_slp_analyze_operations): Call new entry function to mark
live statements.
Report to users that we don't support RVV in big-endian mode, for the following reasons:
1. Big-endian in RISC-V is a pretty rare case.
2. We didn't test RVV in big-endian mode and we don't have enough time to test it since it's stage 4 now.
So naively disallow RVV in big-endian mode.
Tested with no regressions. OK for trunk?
gcc/ChangeLog:
PR target/113404
* config/riscv/riscv.cc (riscv_override_options_internal): Report sorry
for RVV in big-endian mode.
gcc/testsuite/ChangeLog:
PR target/113404
* gcc.target/riscv/rvv/base/big_endian-1.c: New test.
* gcc.target/riscv/rvv/base/big_endian-2.c: New test.
Kewen Lin [Tue, 16 Jan 2024 02:55:40 +0000 (20:55 -0600)]
testsuite: Fix vect_long_mult on Power [PR109705]
As pointed out by the discussion in PR109705, the current
vect_long_mult effective target check on Power is broken.
This patch is to fix it accordingly.
With additional change by adding a guard vect_long_mult
in gcc.dg/vect/pr25413a.c, it's tested well on Power{8,9}
LE & BE (also on Power10 LE as before).
Yanzhang Wang [Mon, 15 Jan 2024 06:00:30 +0000 (14:00 +0800)]
RISC-V: delete all the vector psabi checking.
Thanks to
https://hub.fgit.cf/riscv-non-isa/riscv-elf-psabi-doc/pull/389, we
no longer need to maintain the psabi checking.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_arg_has_vector): Delete.
(riscv_pass_in_vector_p): Delete.
(riscv_init_cumulative_args): Delete the checking.
(riscv_get_arg_info): Delete the checking.
(riscv_function_value): Delete the checking.
* config/riscv/riscv.h: Delete the member for checking.
Signed-off-by: Yanzhang Wang <yanzhang.wang@intel.com>
David Malcolm [Tue, 16 Jan 2024 00:01:21 +0000 (19:01 -0500)]
analyzer: fix false +ves from -Wanalyzer-tainted-array-index with unsigned char index [PR106229]
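A minimal illustration of the false positive (a hypothetical testcase, not
the committed one): an index of type unsigned char can only hold 0..255, so
it can never run off the end of a 256-element array, however tainted it is.

int table[256];

int
lookup (unsigned char idx)
{
  /* idx may be fully attacker-controlled, but its type bounds it to
     0..255, so this access is always within table[].  */
  return table[idx];
}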
gcc/analyzer/ChangeLog:
PR analyzer/106229
* analyzer.h (compare_constants): New decl.
* constraint-manager.cc (compare_constants): Make non-static.
* sm-taint.cc: Add include "fold-const.h".
(class concrete_range): New.
(get_possible_range): New.
(index_can_be_out_of_bounds_p): New.
(region_model::check_region_for_taint): Reject
-Wanalyzer-tainted-array-index if the type of the value makes it
impossible for it to be out-of-bounds of the array.
gcc/testsuite/ChangeLog:
PR analyzer/106229
* c-c++-common/analyzer/taint-index-pr106229.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Tue, 16 Jan 2024 00:01:16 +0000 (19:01 -0500)]
analyzer: casting all zeroes should give all zeroes [PR113333]
In particular, accessing the result of *calloc (1, SZ) (if non-NULL)
should be known to be all zeroes.
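For instance (a hypothetical example, not the committed testcase), the
analyzer should now fold the read below to zero:

#include <stdlib.h>

int
read_zero (void)
{
  int *p = (int *) calloc (1, sizeof (int));
  if (!p)
    return 0;
  int v = *p;   /* all bytes of *p are zero, so v == 0 */
  free (p);
  return v;
}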
gcc/analyzer/ChangeLog:
PR analyzer/113333
* region-model-manager.cc
(region_model_manager::maybe_fold_unaryop): Casting all zeroes
should give all zeroes.
Marek Polacek [Mon, 15 Jan 2024 14:15:59 +0000 (09:15 -0500)]
c++: ICE with auto in template arg [PR110065]
Here we started crashing with r14-1659 because that removed the
auto checking in cp_parser_template_type_arg which seemed like
dead code. But the attached test shows that the code can still
be reached because cp_parser_type_id_1 checks auto only when
auto_is_implicit_function_template_parm_p is on.
Then I noticed that we're still crashing in C++20, and that ICE
started with r12-4772. So I changed the re-added check to use
flag_concepts_ts rather than flag_concepts on the basis that
check_auto_in_tmpl_args also checks flag_concepts_ts.
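A sketch of the kind of code involved (a hypothetical reduction, not the
committed auto8.C):

template<typename T> struct A { };

/* `auto` inside a template argument list; with the checking restored
   this is diagnosed instead of ICEing.  */
A<auto> a = A<int> ();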
PR c++/110065
gcc/cp/ChangeLog:
* parser.cc (cp_parser_template_type_arg): Add auto checking.
gcc/testsuite/ChangeLog:
* g++.dg/concepts/auto8.C: New test.
* g++.dg/concepts/auto8a.C: New test.
Patrick Palka [Mon, 15 Jan 2024 22:01:33 +0000 (17:01 -0500)]
c++: access of class-scope partial spec
Since partial specializations can't be named directly, their access
when declared at class scope is irrelevant, so we shouldn't have to set
their TREE_PRIVATE / TREE_PROTECTED in maybe_new_partial_specialization
(which is used only for constrained partial specializations anyway).
This code was added by r10-4833-gcce3c9db9e6ffa for PR92078, but it
seems better to just disable the access consistency check for partial
specializations, which lets us accept the below testcase.
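A sketch of the kind of testcase now accepted (simplified; hypothetical
names):

template<typename T> struct A
{
  template<typename U> struct B { };

private:
  /* Class-scope partial specialization declared under a different
     access than the primary; it cannot be named directly, so the
     access consistency check no longer applies to it.  */
  template<typename U> struct B<U*> { };
};

A<int>::B<int*> b;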
gcc/cp/ChangeLog:
* parser.cc (cp_parser_check_access_in_redeclaration): Don't
check access for a partial or explicit specialization.
* pt.cc (maybe_new_partial_specialization): Don't set TREE_PRIVATE
or TREE_PROTECTED on the newly created partial specialization.
gcc/testsuite/ChangeLog:
* g++.dg/template/partial-specialization14.C: New test.
Patrick Palka [Mon, 15 Jan 2024 21:53:28 +0000 (16:53 -0500)]
c++: explicit inst w/ similar constrained partial specs [PR104634]
Here we neglect to emit the definitions of A<double>::f2 and A<double*>::f4
despite the explicit instantiations ultimately because TREE_PUBLIC isn't
set on the corresponding partial specializations, whose declarations are
created from maybe_new_partial_specialization which is responsible for
disambiguating them from the first and third partial specializations (which
have the same class-head but different constraints). This makes grokfndecl
in turn clear TREE_PUBLIC for f2 and f4 as if they have internal linkage.
This patch fixes this by setting TREE_PUBLIC appropriately for such partial
specializations.
PR c++/104634
gcc/cp/ChangeLog:
* pt.cc (maybe_new_partial_specialization): Propagate TREE_PUBLIC
to the newly created partial specialization.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-explicit-inst6.C: New test.
The get_target_expr call added in r12-7069-g119cea98f66476 causes us
for the below testcase to call build_vec_delete in a template context,
which builds a templated destructor call and checks expr_noexcept_p for
it, which ICEs because the call has templated form.
Much of the work of build_vec_delete however is code generation and thus
will just get discarded in a template context, and that includes the
code guarded by expr_noexcept_p. So this patch narrowly fixes this ICE
by eliding the expr_noexcept_p call when in a template context.
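For context, build_vec_delete is the internal helper that generates the
element-destruction loop behind constructs like the following (a
hypothetical illustration, not the PR's testcase):

struct S { ~S (); };

void
f (S *p)
{
  delete[] p;   /* element destructors run via build_vec_delete */
}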
PR c++/109899
gcc/cp/ChangeLog:
* init.cc (build_vec_delete_1): Assume expr_noexcept_p returns
false in a template context.
The recursively defined constraints on _Variadic_union's user-defined
destructor (used for maintaining trivial destructibility of the variant
iff all of its alternatives are) turn out to require a template
instantiation depth of 3x the number of variants in C++20 mode, with the
instantiation stack looking like
Ideally the template depth should be ~equal to the number of variants
(plus a constant). Luckily it seems we don't need to compute trivial
destructibility of the alternatives at all from _Variadic_union, since
its only user _Variant_storage already has that information. To that
end this patch removes these recursive constraints and instead passes
this information down from _Variant_storage. After this patch, the
template instantiation depth for 87619.cc in C++20 mode is ~270 instead
of ~780.
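A minimal sketch of the technique (heavily simplified relative to
_Variadic_union; Storage is a hypothetical stand-in): a bool non-type
template parameter supplied by the owner selects between a trivial and a
non-trivial destructor via a constrained defaulted destructor, with no
recursive computation over the alternatives.

#include <type_traits>

template<bool TriviallyDestructible, typename T>
union Storage
{
  T value;

  constexpr Storage () : value () { }

  // Selected when the owner says the alternatives are all trivially
  // destructible; keeps the union itself trivially destructible.
  ~Storage () requires TriviallyDestructible = default;
  // Otherwise the owner destroys `value` explicitly.
  ~Storage () { }
};

static_assert (std::is_trivially_destructible_v<Storage<true, int>>);
static_assert (!std::is_trivially_destructible_v<Storage<false, int>>);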
libstdc++-v3/ChangeLog:
* include/std/variant (__detail::__variant::_Variadic_union):
Add bool __trivially_destructible template parameter.
(__detail::__variant::_Variadic_union::~_Variadic_union):
Use __trivially_destructible in constraints instead.
(__detail::__variant::_Variant_storage): Pass
__trivially_destructible value to _Variadic_union.
H.J. Lu [Mon, 15 Jan 2024 19:18:41 +0000 (11:18 -0800)]
Remove --save-temps from some compile tests
--save-temps is needed to scan assembly outputs for assemble, link and
run tests. Not all compile tests need --save-temps unless they use it to
trigger GCC bugs. Remove --save-temps from compile tests where it is not
needed.
Jonathan Wakely [Mon, 15 Jan 2024 16:51:39 +0000 (16:51 +0000)]
libstdc++: Fix redefinition error in std::tuple [PR108822]
When using a compiler that doesn't define __cpp_conditional_explicit
there's a redefinition error for tuple::__nothrow_assignable. This is
because it's defined in different places for the pre-C++20 and C++20
implementations, which are controlled by different preprocessor
conditions. For certain combinations of C++20 feature test macros it's
possible for both __nothrow_assignable definitions to be in scope.
Move the pre-C++20 __assignable and __nothrow_assignable definitions adjacent to
their use, so that only one set of definitions is visible for any given
set of feature test macros.
libstdc++-v3/ChangeLog:
PR libstdc++/108822
* include/std/tuple (__assignable, __is_nothrow_assignable):
Move pre-C++20 definitions adjacent to their use.
Jonathan Wakely [Sat, 13 Jan 2024 12:13:33 +0000 (12:13 +0000)]
libstdc++: Use variable template to fix -fconcepts-ts error [PR113366]
There's an error for -fconcepts-ts due to using a concept where a bool
NTTP is required, which is fixed by using the variable template that
already exists in the class scope.
This doesn't fully fix the problem with -fconcepts-ts, as changes to the
placement of attributes are also needed.
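The pattern, sketched with hypothetical names (__fmt_ok, __fmt_ok_v,
_Check; the real code uses __format::__formattable_with and the
class-scope __formattable variable template):

template<typename T> concept __fmt_ok = requires (T __t) { __t; };
// Class-scope-style variable template wrapping the concept.
template<typename T> constexpr bool __fmt_ok_v = __fmt_ok<T>;

template<bool> struct _Check { };

// Under -fconcepts-ts a concept-id cannot be used where a bool NTTP
// argument is required, so pass the variable template instead:
_Check<__fmt_ok_v<int>> __c;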
libstdc++-v3/ChangeLog:
PR testsuite/113366
* include/std/format (basic_format_arg): Use __formattable
variable template instead of __format::__formattable_with
concept.
Jonathan Wakely [Fri, 12 Jan 2024 16:57:41 +0000 (16:57 +0000)]
libstdc++: Update tzdata to 2023d
Import the new 2023d tzdata.zi file. The leapseconds file was also
updated to have a new expiry (no new leap seconds were added).
libstdc++-v3/ChangeLog:
* src/c++20/tzdata.zi: Import new file from 2023d release.
* src/c++20/tzdb.cc (tzdb_list::_Node::_S_read_leap_seconds):
Update expiry date for leap seconds list.
Liao Shihua [Mon, 15 Jan 2024 08:31:35 +0000 (16:31 +0800)]
RISC-V: Add C intrinsic for Scalar Bitmanip Extension
This patch adds C intrinsics for Bitmanip Extension.
RISCV_BUILTIN_NO_PREFIX is a new riscv_builtin_description like RISCV_BUILTIN.
But it uses CODE_FOR_##INSN rather than CODE_FOR_riscv_##INSN.
Changed the orcb, clmul, and brev8 patterns' mode from X to GPR because orcbsi, clmul_si,
and brev8_si are available on both rv32 and rv64. Test them in scalar_bitmanip_intrinsic-64-emulated.c.
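Usage, for illustration (assumes a target with the relevant extension
enabled; the intrinsic name follows the riscv-c-api _32/_64 suffix
convention):

#include <stdint.h>
#include <riscv_bitmanip.h>

/* Carry-less multiply (low part) of two 32-bit values via Zbc.  */
uint32_t
clmul32 (uint32_t a, uint32_t b)
{
  return __riscv_clmul_32 (a, b);
}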
gcc/ChangeLog:
* config.gcc: Include riscv_bitmanip.h.
* config/riscv/bitmanip.md: Changed mode from X to GPR in orcb and clmul pattern.
* config/riscv/crypto.md: Changed mode from X to GPR in brev8 pattern.
* config/riscv/riscv-builtins.cc (AVAIL): Adding new bitmanip builtins.
(RISCV_BUILTIN_NO_PREFIX): New helper macro.
* config/riscv/riscv-cmo.def (RISCV_BUILTIN): Add '_32'/'_64' postfix to builtins.
* config/riscv/riscv-ftypes.def (2): New ftypes.
* config/riscv/riscv-scalar-crypto.def (RISCV_BUILTIN): New builtins.
(RISCV_BUILTIN_NO_PREFIX): Likewise.
* config/riscv/riscv_bitmanip.h: New file.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/scalar_bitmanip_intrinsic-32.c: New test.
* gcc.target/riscv/scalar_bitmanip_intrinsic-64-emulated.c: New test.
* gcc.target/riscv/scalar_bitmanip_intrinsic-64.c: New test.
Liao Shihua [Mon, 15 Jan 2024 08:31:33 +0000 (16:31 +0800)]
RISC-V: Remove the Scalar Bitmanip and Crypto Built-In function testsuites
This patch series provides a mapping from the RV intrinsics to the builtin names.
There are some duplicate testsuites between the intrinsic and built-in functions.
Remove the Scalar Bitmanip and Scalar Crypto built-in function testsuites
that are covered by the intrinsic function tests.
[PR113354][LRA]: Fixing LRA failure on building MIPS GCC
My recent patch for PR112918 triggered a hidden bug in LRA on MIPS. A
pseudo is matched to a register constraint and assigned to a hard
register at the first constraint sub-pass, but later it is matched to
the X constraint. Keeping this pseudo in the register (MD0) prevents
using the same register for another pseudo in the insn, and this results
in LRA failure. The patch fixes this by spilling the pseudo at the
constraint sub-pass when the chosen alternative constraint does not
require a hard register anymore.
gcc/ChangeLog:
PR middle-end/113354
* lra-constraints.cc (curr_insn_transform): Spill pseudo only used
in the insn if the corresponding operand does not require hard
register anymore.
Georg-Johann Lay [Mon, 15 Jan 2024 12:25:59 +0000 (13:25 +0100)]
AVR: target/107201: Make -nodevicelib work for all devices.
driver-avr.cc contains a spec that discriminates between cores
and devices by means of a mmcu=avr* spec pattern. This does not
work for new devices like AVR128* which also start with mmcu=avr
like all cores do. The patch uses a new spec function in order to
tell apart cores from devices.
gcc/
PR target/107201
* config/avr/avr.h (EXTRA_SPEC_FUNCTIONS): Add no-devlib, avr_no_devlib.
* config/avr/driver-avr.cc (avr_no_devlib): New function.
(avr_devicespecs_file): Use it to remove -nodevicelib from the
options for cores only.
* config/avr/avr-arch.h (avr_get_parch): New prototype.
* config/avr/avr-devices.cc (avr_get_parch): New function.
Lipeng Zhu [Fri, 5 Jan 2024 01:43:26 +0000 (20:43 -0500)]
libgfortran: Bugfix if not define HAVE_ATOMIC_FETCH_ADD
This patch tries to fix the bug when HAVE_ATOMIC_FETCH_ADD is
not defined in dec_waiting_unlocked function. As io.h does
not include async.h, the WRLOCK and RWUNLOCK macros are
undefined.
libgfortran/ChangeLog:
* io/io.h (dec_waiting_unlocked): Use
__gthread_rwlock_wrlock/__gthread_rwlock_unlock or
__gthread_mutex_lock/__gthread_mutex_unlock functions
to replace WRLOCK and RWUNLOCK macros.
Juzhe-Zhong [Mon, 15 Jan 2024 12:00:14 +0000 (20:00 +0800)]
RISC-V: Fix regression (GCC-14 compare with GCC-13.2) of SHA256 from coremark-pro
This patch fixes a ~70% performance drop from GCC-13.2 to GCC-14 with -march=rv64gcv on real hardware.
The root cause is that an incorrect cost model causes inefficient vectorization, which makes performance drop significantly.
So this patch does:
1. Adjust the vector-to-scalar cost by introducing a vector (v) to scalar register move cost.
2. Adjust the vec_construct cost since we do spend NUNITS instructions to construct the vector.
Tested on both RV32/RV64 with no regressions. Rebased to trunk and committed as it was approved by Robin.
Juzhe-Zhong [Mon, 15 Jan 2024 01:22:40 +0000 (09:22 +0800)]
RISC-V: Adjust loop len by costing 1 when NITER < VF
Rebase in v3: Rebased to trunk and committed as it's approved by Robin.
Update in v2: Add dynamic lmul test.
This patch fixes the regression between GCC 13.2.0 and trunk GCC (GCC-14)
GCC 13.2.0:
lui a5,%hi(a)
li a4,19
sb a4,%lo(a)(a5)
li a0,0
ret
Trunk GCC:
vsetvli a5,zero,e8,mf2,ta,ma
li a4,-32768
vid.v v1
vsetvli zero,zero,e16,m1,ta,ma
addiw a4,a4,104
vmv.v.i v3,15
lui a1,%hi(a)
li a0,19
vsetvli zero,zero,e8,mf2,ta,ma
vadd.vi v1,v1,1
sb a0,%lo(a)(a1)
vsetvli zero,zero,e16,m1,ta,ma
vzext.vf2 v2,v1
vmv.v.x v1,a4
vminu.vv v2,v2,v3
vsrl.vv v1,v1,v2
vslidedown.vi v1,v1,17
vmv.x.s a0,v1
snez a0,a0
ret
The root cause is that we vectorize the code inefficiently since we don't cost the loop len when NITERS < VF.
Leveraging the loop control used by mask targets such as rs6000 fixes the regression.
Tested with no regressions. OK for trunk?
PR target/113281
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (costs::adjust_vect_cost_per_loop): New function.
(costs::finish_cost): Adjust cost for LOOP LEN with NITERS < VF.
* config/riscv/riscv-vector-costs.h: New function.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr113281-3.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/pr113281-4.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/pr113281-5.c: New test.
Andrew Pinski [Mon, 15 Jan 2024 09:31:36 +0000 (10:31 +0100)]
AVR: target/113156 - Fix ICE due to missing "Save" on -m[long-]double= options.
Multilib options -mdouble= and -mlong-double= are not orthogonal:
TARGET_HANDLE_OPTION = avr-common.cc::avr_handle_option() sets them
such that sizeof(double) <= sizeof(long double) is always true.
Jakub Jelinek [Mon, 15 Jan 2024 08:58:37 +0000 (09:58 +0100)]
lower-bitint: Fix up handling of INTEGER_CSTs in handle_operand in right shifts or comparisons [PR113370]
The INTEGER_CST code uses the remainder bits in computations of whether to use
the whole constant or just part of it and extend it at runtime, and furthermore
uses it to avoid using all bits even when using the (almost) whole constant.
The problem is that the prec % (2 * limb_prec) computation it uses is
appropriate only for the normal lowering of mergeable operations (where
we process 2 limbs at a time in a loop starting with least significant
limbs and process the remaining 0-2 limbs after the loop (there with
constant indexes). For that case it is ok not to emit the upper
prec % (2 * limb_prec) bits into the constant, because those bits will be
extracted using INTEGER_CST idx and so will be used directly in the
statements as INTEGER_CSTs.
For other cases, where we either process just a single limb in a loop,
process it downwards (e.g. non-equality comparisons) or with some runtime
addends (some shifts), there is either just at most one limb lowered with
INTEGER_CST idx after the loop (e.g. for right shift) or before the loop
(e.g. non-equality comparisons), or all limbs are processed with
non-INTEGER_CST indexes (e.g. for left shift, when m_var_msb is set).
Now, the m_var_msb case is already handled through
if (m_var_msb)
type = TREE_TYPE (op);
else
/* If we have a guarantee the most significant partial limb
(if any) will be only accessed through handle_operand
with INTEGER_CST idx, we don't need to include the partial
limb in .rodata. */
type = build_bitint_type (prec - rem, 1);
but for the right shifts or comparisons the prec - rem when rem was
prec % (2 * limb_prec) was incorrect, so the following patch fixes it
to use remainder for 2 limbs only if m_upwards_2limb and remainder for
1 limb otherwise.
2024-01-15 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113370
* gimple-lower-bitint.cc (bitint_large_huge::handle_operand): Only
set rem to prec % (2 * limb_prec) if m_upwards_2limb, otherwise
set it to just prec % limb_prec.
Juzhe-Zhong [Mon, 15 Jan 2024 06:57:38 +0000 (14:57 +0800)]
RISC-V: Fix attributes bug configuration of ternary instructions
This patch fixes the following FAILs:
Running target riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.c-torture/execute/pr68532.c -O0 execution test
FAIL: gcc.c-torture/execute/pr68532.c -O1 execution test
FAIL: gcc.c-torture/execute/pr68532.c -O2 execution test
FAIL: gcc.c-torture/execute/pr68532.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
FAIL: gcc.c-torture/execute/pr68532.c -O3 -g execution test
FAIL: gcc.c-torture/execute/pr68532.c -Os execution test
FAIL: gcc.c-torture/execute/pr68532.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
Running target riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.dg/vect/pr60196-1.c execution test
FAIL: gcc.dg/vect/pr60196-1.c -flto -ffat-lto-objects execution test
Running target riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.dg/vect/pr60196-1.c execution test
FAIL: gcc.dg/vect/pr60196-1.c -flto -ffat-lto-objects execution test
Running target riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.dg/vect/pr60196-1.c execution test
FAIL: gcc.dg/vect/pr60196-1.c -flto -ffat-lto-objects execution test
The root cause is that the attributes of the ternary instructions are incorrect, which causes the AVL prop pass and the VSETVL pass to behave
incorrectly.
Tested with no regressions and committed.
PR target/113393
gcc/ChangeLog:
* config/riscv/vector.md: Fix ternary attributes.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr113393-1.c: New test.
* gcc.target/riscv/rvv/autovec/pr113393-2.c: New test.
* gcc.target/riscv/rvv/autovec/pr113393-3.c: New test.
Georg-Johann Lay [Thu, 11 Jan 2024 21:11:25 +0000 (22:11 +0100)]
AVR: Support .rodata in Flash for AVR64* and AVR128* Devices.
These devices see a 32 KiB block of their program memory (flash) in
the RAM address space. This can be used to support .rodata in flash
provided Binutils supports PR31124 (Add new emulations which locate
.rodata in flash). This patch does the following:
* configure checks availability of Binutils PR31124.
* Add new command line options -mrodata-in-ram and -mflmap.
While -mflmap is for internal usage (communicating hardware properties
from device-specs to the compiler proper), -mrodata-in-ram is a user
space option that allows returning to the current rodata-in-ram layout.
* Adjust gen-avr-mmcu-specs.cc so that device-specs are generated
that sanity check options, and that translate -m[no-]rodata-in-ram
to its emulation.
* Objects in .rodata don't drag in __do_copy_data.
* Document new options and built-in macros.
PR target/112944
gcc/
* configure.ac [target=avr]: Check availability of emulations
avrxmega2_flmap and avrxmega4_flmap, resulting in new config vars
HAVE_LD_AVR_AVRXMEGA2_FLMAP and HAVE_LD_AVR_AVRXMEGA4_FLMAP.
* configure: Regenerate.
* config.in: Regenerate.
* doc/invoke.texi (AVR Options): Document -mflmap, -mrodata-in-ram,
__AVR_HAVE_FLMAP__, __AVR_RODATA_IN_RAM__.
* config/avr/avr.opt (-mflmap, -mrodata-in-ram): New options.
* config/avr/avr-arch.h (enum avr_device_specific_features):
Add AVR_ISA_FLMAP.
* config/avr/avr-mcus.def (AVR_MCU) [avr64*, avr128*]: Set isa flag
AVR_ISA_FLMAP.
* config/avr/avr.cc (avr_arch_index, avr_has_rodata_p): New vars.
(avr_set_core_architecture): Set avr_arch_index.
(have_avrxmega2_flmap, have_avrxmega4_flmap)
(have_avrxmega3_rodata_in_flash): Set new static const bool according
to configure results.
(avr_rodata_in_flash_p): New function using them.
(avr_asm_init_sections): Let readonly_data_section->unnamed.callback
track avr_need_copy_data_p only if not avr_rodata_in_flash_p().
(avr_asm_named_section): Track avr_has_rodata_p.
(avr_file_end): Emit __do_copy_data also when avr_has_rodata_p
and not avr_rodata_in_flash_p ().
* config/avr/specs.h (CC1_SPEC): Add %(cc1_rodata_in_ram).
(LINK_SPEC): Add %(link_rodata_in_ram).
(LINK_ARCH_SPEC): Remove.
* config/avr/gen-avr-mmcu-specs.cc (have_avrxmega3_rodata_in_flash)
(have_avrxmega2_flmap, have_avrxmega4_flmap): Set new static
const bool according to configure results.
(diagnose_mrodata_in_ram): New function.
(print_mcu): Generate specs with the following changes:
<*cc1_misc, *asm_misc, *link_misc>: New specs so that we don't
need to extend avr/specs.h each time we add a new bell or whistle.
<*cc1_rodata_in_ram, *link_rodata_in_ram>: New specs to diagnose
-m[no-]rodata-in-ram.
<*cpp_rodata_in_ram>: New. Does -D__AVR_RODATA_IN_RAM__=0/1.
<*cpp_mcu>: Add -D__AVR_AVR_FLMAP__ if it applies.
<*cpp>: Add %(cpp_rodata_in_ram).
<*link_arch>: Use emulation avrxmega2_flmap, avrxmega4_flmap as
requested.
<*self_spec>: Add -mflmap or %<mflmap as needed.
gcc/testsuite/
* gcc.target/avr/torture/pr112944-flmap-0.c: New test.
* gcc.target/avr/torture/pr112944-flmap-1.c: New test.
Jeff Law [Sun, 14 Jan 2024 14:53:49 +0000 (07:53 -0700)]
[committed] Fix MIPS bootstrap
mips bootstraps have been broken for a while. They've been triggering an error
about mutually exclusive equal-tests always being false when building
gencondmd.
This was ultimately tracked down to the ior<mode>3_mips16_asmacro pattern. The
pattern uses the GPR mode iterator which looks like this:
(define_mode_iterator GPR [SI (DI "TARGET_64BIT")])
The pattern's condition is ISA_HAS_MIPS16E2, whose definition ends with:
/* The MIPS16e V2 instructions are available. */
... && !TARGET_64BIT)
The way the mode iterator is handled is by adding its condition to the
pattern's condition when we expand copies of the pattern, resulting in a
condition for the DI variant that requires both TARGET_64BIT and
!TARGET_64BIT, which can never be true.
Harald Anlauf [Sat, 13 Jan 2024 21:00:21 +0000 (22:00 +0100)]
Fortran: intrinsic ISHFTC and missing optional argument SIZE [PR67277]
gcc/fortran/ChangeLog:
PR fortran/67277
* trans-intrinsic.cc (gfc_conv_intrinsic_ishftc): Handle optional
dummy argument for SIZE passed to ISHFTC. Set default value to
BIT_SIZE(I) when missing.
gcc/testsuite/ChangeLog:
PR fortran/67277
* gfortran.dg/ishftc_optional_size_1.f90: New test.
hppa64: Fix fmt_f_default_field_width_3.f90 and fmt_g_default_field_width_3.f90
The hppa*64*-*-hpux* target is not included in the set of fortran_real_16
targets because it doesn't have cosl. However, these tests don't need
cosl, etc.
2024-01-13 John David Anglin <danglin@gcc.gnu.org>
Harald Anlauf [Fri, 12 Jan 2024 18:51:11 +0000 (19:51 +0100)]
Fortran: annotations for DO CONCURRENT loops [PR113305]
gcc/fortran/ChangeLog:
PR fortran/113305
* gfortran.h (gfc_loop_annot): New.
(gfc_iterator, gfc_forall_iterator): Use for annotation control.
* array.cc (gfc_copy_iterator): Adjust.
* gfortran.texi: Document annotations IVDEP, UNROLL n, VECTOR,
NOVECTOR as applied to DO CONCURRENT.
* parse.cc (parse_do_block): Parse annotations IVDEP, UNROLL n,
VECTOR, NOVECTOR as applied to DO CONCURRENT. Apply UNROLL only to
first loop control variable.
* trans-stmt.cc (iter_info): Use gfc_loop_annot.
(gfc_trans_simple_do): Adjust.
(gfc_trans_forall_loop): Annotate loops with IVDEP, UNROLL n,
VECTOR, NOVECTOR as needed for DO CONCURRENT.
(gfc_trans_forall_1): Handle loop annotations.
gcc/testsuite/ChangeLog:
PR fortran/113305
* gfortran.dg/do_concurrent_7.f90: New test.
Jonathan Wakely [Wed, 10 Jan 2024 17:29:22 +0000 (17:29 +0000)]
libstdc++: Implement P2255R2 dangling checks for std::tuple [PR108822]
This is the last part of PR libstdc++/108822 implementing P2255R2, which
makes it ill-formed to create a std::tuple that would bind a reference
to a temporary.
The dangling checks are implemented as deleted constructors for C++20
and higher, and as Debug Mode static assertions in the constructor body
for older standards. This is similar to the r13-6084-g916ce577ad109b
changes for std::pair.
As part of this change, I've reimplemented most of std::tuple for C++20,
making use of concepts to replace the enable_if constraints, and using
conditional explicit to avoid duplicating most constructors. We could
use conditional explicit for the C++11 implementation too (with pragmas
to disable the -Wc++17-extensions warnings), but that should be done as
a stage 1 change for GCC 15 rather than now.
The partial specialization for std::tuple<T1, T2> is no longer used for
C++20 (or more precisely, for a C++20 compiler that supports concepts
and conditional explicit). The additional constructors and assignment
operators that take std::pair arguments have been added to the C++20
implementation of the primary template, with sizeof...(_Elements)==2
constraints. This avoids reimplementing all the other constructors in
the std::tuple<T1, T2> partial specialization to use concepts. This way
we avoid four implementations of every constructor and only have three!
(The primary template has an implementation of each constructor for
C++11 and another for C++20, and the tuple<T1,T2> specialization has an
implementation of each for C++11, so that's three for each constructor.)
In order to make the constraints more efficient on the C++20 version of
the default constructor I've also added a variable template for the
__is_implicitly_default_constructible trait, implemented using concepts.
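A minimal example of the kind of code the new deleted constructors reject
(a sketch; the exact diagnostics differ):

#include <tuple>

std::tuple<const int&> ok (const int &i)
{
  return std::tuple<const int&> (i);   // OK: binds to an lvalue
}

std::tuple<const int&> bad ()
{
  return std::tuple<const int&> (42);  // C++20: ill-formed, the deleted
                                       // constructor rejects binding the
                                       // reference to a temporary
}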
libstdc++-v3/ChangeLog:
PR libstdc++/108822
* include/std/tuple (tuple): Add checks for dangling references.
Reimplement constraints and constant expressions using C++20
features.
* include/std/type_traits [C++20]
(__is_implicitly_default_constructible_v): Define.
(__is_implicitly_default_constructible): Use variable template.
* testsuite/20_util/tuple/dangling_ref.cc: New test.
Jakub Jelinek [Sat, 13 Jan 2024 10:29:15 +0000 (11:29 +0100)]
lower-bitint: Fix up handle_operand_addr INTEGER_CST handling [PR113361]
As the testcase shows, the INTEGER_CST handling in handle_operand_addr
(i.e. what is used when passing address of an integer to a bitint library
routine) wasn't correct. If the minimum precision to represent an
INTEGER_CST is smaller or equal to limb_prec, the code correctly uses
m_limb_type; if the minimum precision of a _BitInt INTEGER_CST is large
enough such that the bitint is middle, large or huge, everything is fine
too. But the code wasn't handling correctly e.g. __int128 constants which
need more than limb_prec bits or _BitInt constants which on the architecture
are considered small (say have DImode limb_mode, TImode abi_limb_mode and
for [65, 128] bits use TImode scalar like the proposed aarch64 patch).
Best would be to use an array of 2/3/4 limbs in that case, but we'd need to
convert the INTEGER_CST to a CONSTRUCTOR in the right endianity etc.,
so the code was using mid_min_prec to enforce a middle _BitInt precision.
Except that mid_min_prec can be 0 and not computed yet, or it doesn't have
to be the smallest middle _BitInt precision, just the smallest so far
encountered. So, on the testcase one possibility was that it used precision
65 from mid_min_prec, even when the INTEGER_CST actually needed larger
minimum precision (96 bits at least), or crashed when mid_min_prec was 0.
The patch fixes it in 2 hunks, the first makes sure we actually try to
create a BITINT_TYPE for the > limb_prec cases like __int128, and the second
instead of using mid_min_prec attempts to increase mp precision until it
isn't small anymore.
2024-01-13 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113361
* gimple-lower-bitint.cc (bitint_large_huge::handle_operand_addr):
Fix up determination of the type for > limb_prec constants.
Jakub Jelinek [Sat, 13 Jan 2024 09:46:51 +0000 (10:46 +0100)]
testsuite: Fix up vect-early-break_100-pr113287.c testcase [PR113287]
When the testcase was being adjusted for unsigned long -> unsigned long long,
two spots using long weren't changed to long long, so the testcase still warns
about UB in shifts.
2024-01-13 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113287
* gcc.dg/vect/vect-early-break_100-pr113287.c: Use long long instead
of long.
The following patch attempts to implement what apparently clang++
implemented for explicit object member function mangling, but nobody
actually proposed it in patch form in
https://github.com/itanium-cxx-abi/cxx-abi/issues/148
2024-01-13 Jakub Jelinek <jakub@redhat.com>
gcc/cp/
* mangle.cc (write_nested_name): Mangle explicit object
member functions with H as per
https://github.com/itanium-cxx-abi/cxx-abi/issues/148 non-proposal.
gcc/testsuite/
* g++.dg/abi/mangle79.C: New test.
include/
* demangle.h (enum demangle_component_type): Add
DEMANGLE_COMPONENT_XOBJ_MEMBER_FUNCTION.
libiberty/
* cp-demangle.c (FNQUAL_COMPONENT_CASE): Add case for
DEMANGLE_COMPONENT_XOBJ_MEMBER_FUNCTION.
(d_dump): Handle DEMANGLE_COMPONENT_XOBJ_MEMBER_FUNCTION.
(d_nested_name): Parse H after N in nested name.
(d_count_templates_scopes): Handle
DEMANGLE_COMPONENT_XOBJ_MEMBER_FUNCTION.
(d_print_mod): Likewise.
(d_print_function_type): Likewise.
* testsuite/demangle-expected: Add tests for explicit object
member functions.
Andrew Pinski [Sat, 13 Jan 2024 04:24:34 +0000 (20:24 -0800)]
Add a few testcases for fix missed optimization regressions
Adds a few new testcases for some missed optimization regressions.
The analysis on how each should be optimized is in the testcases
themselves (and in the bug report).
Committed as obvious after running the testsuite to make sure they pass.
* gcc.dg/tree-ssa/ssa-thread-22.c: New test.
* gcc.dg/tree-ssa/vrp-loop-1.c: New test.
* gcc.dg/tree-ssa/vrp-loop-2.c: New test.
* gcc.dg/tree-ssa/vrp-unreachable-1.c: New test.
* gcc.dg/tree-ssa/vrp-unreachable-2.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Patrick Palka [Sat, 13 Jan 2024 03:55:43 +0000 (22:55 -0500)]
libstdc++: Use C++23 deducing this in std::bind_front
This simplifies the operator() of _Bind_front using C++23 deducing
this, allowing us to condense multiple operator() overloads into one.
In passing I think we can remove _Bind_front's defaulted special member
declarations and just let the compiler implicitly generate them for us.
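A condensed sketch of the technique (hypothetical bind_front_t, not the
libstdc++ _Bind_front): one deducing-this operator() replaces the four
const/ref-qualified overloads, with std::forward_like propagating the
value category of self to the stored members.

#include <functional>
#include <tuple>
#include <utility>

template<typename F, typename... Bound>
struct bind_front_t
{
  F func;
  std::tuple<Bound...> bound;

  template<typename Self, typename... Args>
  constexpr decltype(auto)
  operator() (this Self&& self, Args&&... args)
  {
    return std::apply (
      [&] (auto&&... b) -> decltype(auto) {
        // Forward the stored callable and bound arguments with the
        // same const/ref qualification as the bind_front_t object.
        return std::invoke (std::forward_like<Self> (self.func),
                            std::forward<decltype(b)> (b)...,
                            std::forward<Args> (args)...);
      },
      std::forward_like<Self> (self.bound));
  }
};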
libstdc++-v3/ChangeLog:
* include/std/functional (_Bind_front): Remove =default special
member function declarations.
(_Bind_front::operator()): Implement using C++23 deducing this
when available.
* testsuite/20_util/function_objects/bind_front/111327.cc:
Adjust testcase to expect better errors in C++23 mode.
Patrick Palka [Sat, 13 Jan 2024 03:54:59 +0000 (22:54 -0500)]
libstdc++/ranges: Use perfect forwarding in _Pipe and _Partial ctors
This avoids redundant moves when composing and partially applying range
adaptor objects.
libstdc++-v3/ChangeLog:
* include/std/ranges (views::__adaptor::operator|): Perform
perfect forwarding of arguments.
(views::__adaptor::_RangeAdaptor::operator()): Pass dummy
first argument to _Partial.
(views::__adaptor::_Partial::_Partial): Likewise. Add dummy
first parameter.
(views::__adaptor::_Pipe::_Pipe): Perform perfect forwarding
of arguments.
(to): Pass dummy first argument to _Partial.
Jonathan Wakely [Thu, 11 Jan 2024 15:09:12 +0000 (15:09 +0000)]
libstdc++: Fix non-portable results from 64-bit std::subtract_with_carry_engine [PR107466]
I implemented the resolution of LWG 3809 in r13-4364-ga64775a0edd469 but
it was recently noted in the MSVC STL github repo that the change causes
possible truncation for 64-bit seeds. Whether the truncation occurs (and
to what value) depends on the width of uint_least32_t which is not
portable, so the output of the PRNG for 64-bit seed values is no longer
the same as in C++20, and no longer portable across platforms.
That new issue was filed as LWG 4014. I proposed a new change which
reduces the seed by the LCG's modulus before the conversion to
uint_least32_t. This ensures that 64-bit seed values are consistently
reduced by the modulus before any truncation. This removes the
platform-dependent behaviour and restores the old behaviour for
std::subtract_with_carry_engine specializations using a 64-bit result
type (such as std::ranlux48_base).
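The essence of the proposed resolution, as a sketch (the seeding LCG's
modulus is 2147483563, per the standard's subtract_with_carry seeding):
reduce the seed modulo 2147483563 before any conversion to
uint_least32_t, so the result no longer depends on that type's width.

#include <cstdint>

constexpr std::uint_least32_t
reduce_seed (unsigned long long seed)
{
  return static_cast<std::uint_least32_t> (seed % 2147483563ULL);
}

// A 64-bit seed above 2^32 now reduces identically whether
// uint_least32_t is 32 or 64 bits wide: 2^32 mod 2147483563 == 170.
static_assert (reduce_seed (1ULL << 32) == 170);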
libstdc++-v3/ChangeLog:
PR libstdc++/107466
* include/bits/random.tcc (subtract_with_carry_engine::seed):
Implement proposed resolution of LWG 4014.
* testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error
line number.
* testsuite/26_numerics/random/subtract_with_carry_engine/cons/lwg3809.cc:
Check for expected result of 64-bit engine with seed that
doesn't fit in 32-bits.
Jason Merrill [Mon, 18 Dec 2023 20:47:10 +0000 (15:47 -0500)]
c++: __class_type_info and modules [PR113038]
Doing a dynamic_cast in both TUs broke because we were declaring a new
__class_type_info in _b that conflicted with the one imported in the global
module from _a. It seems clear to me that any new class declaration in
the global module should merge with an imported definition, but for GCC 14
let's just fix this for the specific case of __class_type_info.
PR c++/113038
gcc/cp/ChangeLog:
* name-lookup.cc (lookup_elaborated_type): Look for bindings
in the global namespace in the ABI namespace.
Ezra Sitorus [Tue, 2 Jan 2024 09:23:45 +0000 (09:23 +0000)]
arm: vld1_types_x4 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vld1 intrinsic for the arm port. This patch adds the
_x4 variants of the vld1 intrinsic.
The previous vld1_x4 has been updated to vld1q_x4 to take into
account that it works with 4-word-length types. vld1_x4 is now
only for 2-word-length types.
ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/
gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1_u8_x4, vld1_u16_x4, vld1_u32_x4, vld1_u64_x4): New.
(vld1_s8_x4, vld1_s16_x4, vld1_s32_x4, vld1_s64_x4): New.
(vld1_f16_x4, vld1_f32_x4): New.
(vld1_p8_x4, vld1_p16_x4, vld1_p64_x4): New.
(vld1_bf16_x4): New.
(vld1q_types_x4): Updated to use vld1q_x4
from arm_neon_builtins.def
* config/arm/arm_neon_builtins.def
(vld1_x4): Updated entries.
(vld1q_x4): New entries; they come from the old vld1_x4.
* config/arm/neon.md
(neon_vld1q_x4<mode>): Updated from neon_vld1_x4<mode>.
Ezra Sitorus [Tue, 2 Jan 2024 09:23:44 +0000 (09:23 +0000)]
arm: vld1_types_x3 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vld1 intrinsic for the arm port. This patch adds the
_x3 variants of the vld1 intrinsic.
The previous vld1_x3 has been updated to vld1q_x3 to take into
account that it works with 4-word-length types. vld1_x3 is now
only for 2-word-length types.
ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/
gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1_u8_x3, vld1_u16_x3, vld1_u32_x3, vld1_u64_x3): New.
(vld1_s8_x3, vld1_s16_x3, vld1_s32_x3, vld1_s64_x3): New.
(vld1_f16_x3, vld1_f32_x3): New.
(vld1_p8_x3, vld1_p16_x3, vld1_p64_x3): New.
(vld1_bf16_x3): New.
(vld1q_types_x3): Updated to use vld1q_x3 from
arm_neon_builtins.def
* config/arm/arm_neon_builtins.def
(vld1_x3): Updated entries.
(vld1q_x3): New entries; they come from the old vld1_x3.
* config/arm/neon.md
(neon_vld1q_x3<mode>): Updated from neon_vld1_x3<mode>.
Ezra Sitorus [Tue, 2 Jan 2024 09:23:43 +0000 (09:23 +0000)]
arm: vld1_types_x2 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vld1 intrinsic for the arm port. This patch adds the
_x2 variants of the vld1 intrinsic.
The previous vld1_x2 has been updated to vld1q_x2 to take into
account that it works with 4-word-length types. vld1_x2 is now
only for 2-word-length types.
ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/
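Usage, for illustration (assumes an Arm target with Neon enabled;
vld1_u8_x2 is one of the new intrinsics):

#include <arm_neon.h>

/* The _x2 variant loads two consecutive 8-byte (D-register) vectors
   with a single intrinsic call.  */
uint8x8x2_t
load_two (const uint8_t *p)
{
  return vld1_u8_x2 (p);
}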
gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1_u8_x2, vld1_u16_x2, vld1_u32_x2, vld1_u64_x2): New.
(vld1_s8_x2, vld1_s16_x2, vld1_s32_x2, vld1_s64_x2): New.
(vld1_f16_x2, vld1_f32_x2): New.
(vld1_p8_x2, vld1_p16_x2, vld1_p64_x2): New.
(vld1_bf16_x2): New.
(vld1q_types_x2): Updated to use vld1q_x2 from
arm_neon_builtins.def
* config/arm/arm_neon_builtins.def
(vld1_x2): Updated entries.
(vld1q_x2): New entries; they come from the old vld1_x2.
* config/arm/neon.md
(neon_vld1<VMEMX2_q>_x2<VDQX:mode>): Updated from
neon_vld1_x2<mode>.
gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1_base_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_bf16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_fp16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_p64_xN_1.c: Add new tests.
Ezra Sitorus [Tue, 2 Jan 2024 09:23:42 +0000 (09:23 +0000)]
arm: vst1q_types_x4 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vst1q intrinsic for the arm port. This patch adds the
_x4 variants of the vst1q intrinsic.
Ezra Sitorus [Tue, 2 Jan 2024 09:23:41 +0000 (09:23 +0000)]
arm: vst1q_types_x3 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vst1q intrinsic for the arm port. This patch adds the
_x3 variants of the vst1q intrinsic.
Ezra Sitorus [Tue, 2 Jan 2024 09:23:40 +0000 (09:23 +0000)]
arm: vst1q_types_x2 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vst1q intrinsic for the arm port. This patch adds the
_x2 variants of the vst1q intrinsic.
Ezra Sitorus [Tue, 2 Jan 2024 09:23:39 +0000 (09:23 +0000)]
arm: vst1_types_x4 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vst1 intrinsic for the arm port. This patch adds the
_x4 variants of the vst1 intrinsic.
Ezra Sitorus [Tue, 2 Jan 2024 09:23:38 +0000 (09:23 +0000)]
arm: vst1_types_x3 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vst1 intrinsic for the arm port. This patch adds the
_x3 variants of the vst1 intrinsic.
Ezra Sitorus [Tue, 2 Jan 2024 09:23:37 +0000 (09:23 +0000)]
arm: vst1_types_x2 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vst1 intrinsic for the arm port. This patch adds the
_x2 variants of the vst1 intrinsic.
Ezra Sitorus [Tue, 2 Jan 2024 09:23:36 +0000 (09:23 +0000)]
arm: vld1q_types_x4 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vld1q intrinsic for the arm port. This patch adds the
_x4 variants of the vld1q intrinsic.
Ezra Sitorus [Tue, 2 Jan 2024 09:23:35 +0000 (09:23 +0000)]
arm: vld1q_types_x3 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vld1q intrinsic for the arm port. This patch adds the
_x3 variants of the vld1q intrinsic.
Ezra Sitorus [Tue, 2 Jan 2024 09:23:34 +0000 (09:23 +0000)]
arm: vld1q_types_x2 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vld1q intrinsic for the arm port. This patch adds the
_x2 variants of the vld1q intrinsic.
Jakub Jelinek [Fri, 12 Jan 2024 16:11:49 +0000 (17:11 +0100)]
c: Avoid _BitInt indexes > sizetype in ARRAY_REFs [PR113315]
When build_array_ref doesn't use ARRAY_REF, it casts the index to sizetype
already, performs POINTER_PLUS_EXPR and then dereferences.
While when emitting ARRAY_REF, we try to keep index expression as is in
whatever type it had, which is reasonable e.g. for signed or unsigned types
narrower than sizetype for loop optimizations etc.
But if the index is wider than sizetype, we are unnecessarily computing
bits beyond what is needed. For {,unsigned }__int128 on 64-bit arches
or {,unsigned }long long on 32-bit arches we've been doing that for decades,
so the following patch doesn't propose to change that (might be stage1
material), but for _BitInt at least the _BitInt lowering code doesn't expect
to see large/huge _BitInt in the ARRAY_REF indexes, I was expecting one
would see just casts of those to sizetype.
So, the following patch makes sure that large/huge _BitInt indexes don't
appear in ARRAY_REFs.
2024-01-12 Jakub Jelinek <jakub@redhat.com>
PR c/113315
* c-typeck.cc (build_array_ref): If index has BITINT_TYPE type with
precision larger than sizetype precision, convert it to sizetype.
* gcc.dg/bitint-65.c: New test.
* gcc.dg/bitint-66.c: New test.
Tamar Christina [Fri, 12 Jan 2024 15:26:29 +0000 (15:26 +0000)]
middle-end: fill in reduction PHI for all alt exits [PR113178]
When we have a loop with more than 2 exits and a reduction, I forgot to fill in
the PHI value for all alternate exits.
All alternate exits use the same PHI value so we should loop over the new
PHI elements and copy the value across since we call the reduction calculation
code only once for all exits. This was normally covered up by earlier parts of
the compiler rejecting loops incorrectly (which has been fixed now).
Note that while I can use the loop in all cases, the reason I separated out the
main and alt exit is so that if you pass the wrong edge the macro will assert.
gcc/ChangeLog:
PR tree-optimization/113178
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Fill in all
alternate exits.
gcc/testsuite/ChangeLog:
PR tree-optimization/113178
* gcc.dg/vect/vect-early-break_101-pr113178.c: New test.
* gcc.dg/vect/vect-early-break_102-pr113178.c: New test.
Tamar Christina [Fri, 12 Jan 2024 15:25:58 +0000 (15:25 +0000)]
middle-end: thread through existing LCSSA variable for alternative exits too [PR113237]
Building on top of the previous patch: similar to the single-exit case, if
we have a case where all exits are considered early exits and there are existing
non-virtual PHIs, then in order to maintain LCSSA we have to use the existing PHI
variables. We can't simply clear them and just rebuild them because the order
of the PHIs in the main exit must match the original exit for when we add the
skip_epilog guard.
But the infrastructure is already in place to maintain them, we just have to use
the right value.
gcc/ChangeLog:
PR tree-optimization/113237
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): Use
existing LCSSA variable for exit when all exits are early break.
gcc/testsuite/ChangeLog:
PR tree-optimization/113237
* gcc.dg/vect/vect-early-break_98-pr113237.c: New test.
Tamar Christina [Fri, 12 Jan 2024 15:25:34 +0000 (15:25 +0000)]
middle-end: maintain LCSSA form when peeled vector iterations have virtual operands
This patch fixes several interconnected issues.
1. When picking an exit we wanted to check for niter_desc.may_be_zero not true.
i.e. we want to pick an exit which we know will iterate at least once.
However niter_desc.may_be_zero is not a boolean. It is a tree that encodes
a boolean value. !niter_desc.may_be_zero is just checking if we have some
information, not what the information is. This leads us to pick a more
difficult to vectorize exit more often than we should.
2. Because we had this bug, we used to pick an alternative exit much more often,
which showed one issue: when the loop accesses memory and we "invert it" we
would corrupt the VUSE chain. This is because on a peeled vector iteration
every exit restarts the loop (i.e. they're all early) BUT since we may have
performed a store, the vUSE would need to be updated. This version maintains
virtual PHIs correctly in these cases. Note that we can't simply remove all
of them and recreate them because we need the PHI nodes still in the right
order for if skip_vector.
3. Since we're moving the stores to a safe location I don't think we actually
need to analyze whether the store is in range of the memref, because if we
ever get there, we know that the loads must be in range, and if the loads are
in range and we get to the store we know the early breaks were not taken and
so the scalar loop would have done the VF stores too.
4. Instead of searching for where to move stores to, they should always be in
exit belonging to the latch. We can only ever delay stores and even if we
pick a different exit than the latch one as the main one, effects still
happen in program order when vectorized. If we don't move the stores to the
latch exit but instead to whatever we pick as the "main" exit then we can
perform incorrect memory accesses (luckily these are trapped by verify_ssa).
5. We only used to analyze loads inside the same BB as an early break, and also
we'd never analyze the ones inside the block where we'd be moving memory
references to. This is obviously bogus and to fix it this patch splits apart
the two constraints. We first validate that all load memory references are
in bounds and only after that do we perform the alias checks for the writes.
This makes the code simpler to understand and more trivially correct.
gcc/ChangeLog:
PR tree-optimization/113137
PR tree-optimization/113136
PR tree-optimization/113172
PR tree-optimization/113178
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
Maintain PHIs on inverted loops.
(vect_do_peeling): Maintain virtual PHIs on inverted loops.
* tree-vect-loop.cc (vec_init_loop_exit_info): Pick exit closes to
latch.
(vect_create_loop_vinfo): Record all conds instead of only alt ones.
gcc/testsuite/ChangeLog:
PR tree-optimization/113137
PR tree-optimization/113136
PR tree-optimization/113172
PR tree-optimization/113178
* g++.dg/vect/vect-early-break_4-pr113137.cc: New test.
* g++.dg/vect/vect-early-break_5-pr113137.cc: New test.
* gcc.dg/vect/vect-early-break_95-pr113137.c: New test.
* gcc.dg/vect/vect-early-break_96-pr113136.c: New test.
* gcc.dg/vect/vect-early-break_97-pr113172.c: New test.
Tamar Christina [Fri, 12 Jan 2024 15:24:49 +0000 (15:24 +0000)]
middle-end: make memory analysis for early break more deterministic [PR113135]
Instead of searching for where to move stores to, they should always be in
exit belonging to the latch. We can only ever delay stores and even if we
pick a different exit than the latch one as the main one, effects still
happen in program order when vectorized. If we don't move the stores to the
latch exit but instead to whatever we pick as the "main" exit then we can
perform incorrect memory accesses (luckily these are trapped by verify_ssa).
We used to iterate over the conds and check the loads and stores inside them.
However this relies on the conds being ordered in program order. Additionally
if there is a basic block between two conds we would not have analyzed it.
Instead this now walks from the preds of the destination basic block up to the
loop header and analyzes every block along the way. As a later optimization we
could stop as soon as we've seen all the BBs we have conds for. For now the
header will always contain the first cond, but this can change when we support
arbitrary control flow.
Jason Merrill [Thu, 11 Jan 2024 04:18:23 +0000 (23:18 -0500)]
c++: cand_parms_match and reversed candidates
When considering whether the candidate parameters match, according to the
language we're considering the synthesized reversed candidate, so we should
compare the parameters in swapped order. In this situation it doesn't make
sense to consider whether object parameters correspond, since we're
comparing an object parameter to a non-object parameter, so I generalized
xobj_iobj_parameters_correspond accordingly.
As I refine cand_parms_match, more behaviors need to differ between its
original use to compare the original templates for two candidates, and the
later use to decide whether to compare constraints. So now there's a
parameter to select between the semantics.
gcc/cp/ChangeLog:
* call.cc (reversed_match): New.
(enum class pmatch): New enum.
(cand_parms_match): Add match_kind parm.
(object_parms_correspond): Add fn parms.
(joust): Adjust.
* class.cc (xobj_iobj_parameters_correspond): Rename to...
(iobj_parm_corresponds_to): ...this. Take the other
type instead of a second function.
(object_parms_correspond): Adjust.
* cp-tree.h (iobj_parm_corresponds_to): Declare.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-memfun4.C: Change expected
reversed handling.
Iain Sandoe [Sat, 6 Jan 2024 19:21:40 +0000 (19:21 +0000)]
Objective-C, Darwin: Fix a regression in handling bad receivers.
This is seen on 32b hosts with a 64b multilib, and is an ICE when
the build has checking enabled. The fix is to exit the routine
early if the sender or receiver are already error_mark_node.
gcc/objc/ChangeLog:
* objc-next-runtime-abi-02.cc
(build_v2_objc_method_fixup_call): Early exit for cases
where the sender or receiver are known to be in error.
Jakub Jelinek [Fri, 12 Jan 2024 12:58:07 +0000 (13:58 +0100)]
varasm: Fix up process_pending_assemble_externals [PR113182]
John reported that on HP-UX we no longer emit needed external libcalls.
The problem is that we didn't strip name encoding when looking up
the identifiers in assemble_external_libcall and
process_pending_assemble_externals, while
assemble_name_resolve does that:
const char *real_name = targetm.strip_name_encoding (name);
tree id = maybe_get_identifier (real_name);
if (id)
{
...
mark_referenced (id);
The intention is that assemble_external_libcall ensures the IDENTIFIER
exists for the external libcall, then for actually emitted calls
assemble_name_resolve sees those IDENTIFIERS and sets TREE_SYMBOL_REFERENCED
on them and finally process_pending_assemble_externals looks the
IDENTIFIER up again and checks its TREE_SYMBOL_REFERENCED.
But without the strip_name_encoding call, they can look up different
identifiers and those are likely never used.
In the PR, John was discussing whether get_identifier or
maybe_get_identifier should be used. I believe in assemble_external_libcall
we definitely want get_identifier: we need an IDENTIFIER allocated so that
it can actually be tracked. In process_pending_assemble_externals it
doesn't matter, since the IDENTIFIER should already have been created.
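The fix is therefore to apply the same stripping before the lookup;
roughly (a sketch, not the literal hunk):

  /* In assemble_external_libcall (sketch): create the IDENTIFIER
     under the stripped name that assemble_name_resolve will use.  */
  const char *real_name = targetm.strip_name_encoding (XSTR (fun, 0));
  tree id = get_identifier (real_name);

process_pending_assemble_externals then performs the same stripped lookup
before testing TREE_SYMBOL_REFERENCED.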
2024-01-12 John David Anglin <danglin@gcc.gnu.org>
Jakub Jelinek <jakub@redhat.com>
PR middle-end/113182
* varasm.cc (process_pending_assemble_externals,
assemble_external_libcall): Use targetm.strip_name_encoding
before calling get_identifier.
g:f26f92b534f9 implemented unsigned extensions using ZIPs rather than
UXTL{,2}, since the former has a higher throughput than the latter on
many cores. The optimisation worked by lowering directly to ZIP during
expand, so that the zero input could be hoisted and shared.
However, changing to ZIP means that zero extensions no longer benefit
from some existing combine patterns. The patch included new patterns
for UADDW and USUBW, but the PR shows that other patterns were affected
as well.
This patch instead introduces the ZIPs during a pre-reload split
and forcibly hoists the zero move to the outermost scope. This has
the disadvantage of executing the move even for a shrink-wrapped
function, which I suppose could be a problem if it causes a kernel
to trap and enable Advanced SIMD unnecessarily. In other circumstances,
an unused move shouldn't affect things much.
Also, the RA should be able to rematerialise the move at an
appropriate point if necessary, such as if there is an intervening
call.
In https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641948.html
I'd then tried to allow a zero to be recombined back into a solitary
ZIP. However, that relied on late-combine, which didn't make it into
GCC 14. This version instead restricts the split to cases where the
UXTL executes more frequently than the entry block (which is where we
plan to put the zero).
Also, the original optimisation contained a big-endian correction
that I don't think is needed/correct. Even on big-endian targets,
we want the ZIP to take the low half of an element from the input
vector and the high half from the zero vector. And the patterns
map directly to the underlying Advanced SIMD instructions: the use
of unspecs means that there's no need to adjust for the difference
between GCC and Arm lane numbering.
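The underlying trick is easy to see with ACLE intrinsics (an illustration,
not code from the patch); on a little-endian target:

  #include <arm_neon.h>

  /* zip1 of {x0..x7} with zeros yields {x0,0,x1,0,x2,0,x3,0}, which
     viewed as u16 lanes is the zero-extension of x0..x3.  */
  uint16x4_t zext_lo (uint8x8_t x)
  {
    uint8x8_t zero = vdup_n_u8 (0);
    return vreinterpret_u16_u8 (vzip1_u8 (x, zero));
  }

This is equivalent to vget_low_u16 (vmovl_u8 (x)), i.e. the low half of a
UXTL.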
gcc/
PR target/113196
* config/aarch64/aarch64.h (machine_function::advsimd_zero_insn):
New member variable.
* config/aarch64/aarch64-protos.h (aarch64_split_simd_shift_p):
Declare.
* config/aarch64/iterators.md (Vnarrowq2): New mode attribute.
* config/aarch64/aarch64-simd.md
(vec_unpacku_hi_<mode>, vec_unpacks_hi_<mode>): Recombine into...
(vec_unpack<su>_hi_<mode>): ...this. Move the generation of
zip2 for zero-extends to...
(aarch64_simd_vec_unpack<su>_hi_<mode>): ...a split of this
instruction. Fix big-endian handling.
(vec_unpacku_lo_<mode>, vec_unpacks_lo_<mode>): Recombine into...
(vec_unpack<su>_lo_<mode>): ...this. Move the generation of
zip1 for zero-extends to...
(<optab><Vnarrowq><mode>2): ...a split of this instruction.
Fix big-endian handling.
(*aarch64_zip1_uxtl): New pattern.
(aarch64_usubw<mode>_lo_zip, aarch64_uaddw<mode>_lo_zip): Delete.
(aarch64_usubw<mode>_hi_zip, aarch64_uaddw<mode>_hi_zip): Likewise.
* config/aarch64/aarch64.cc (aarch64_get_shareable_reg): New function.
(aarch64_gen_shareable_zero): Use it.
(aarch64_split_simd_shift_p): New function.
gcc/testsuite/
PR target/113196
* gcc.target/aarch64/pr113196.c: New test.
* gcc.target/aarch64/simd/vmovl_high_1.c: Remove double include.
Expect uxtl2 rather than zip2.
* gcc.target/aarch64/vect_mixed_sizes_8.c: Expect zip1 rather
than uxtl.
* gcc.target/aarch64/vect_mixed_sizes_9.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_10.c: Likewise.
function.cc emits a NOTE_FUNCTION_BEG after all arguments have
been copied to pseudos. It then records this note in parm_birth_insn.
Various other pieces of code use this insn as a convenient place to
insert things at the start of the function.
However, cfgexpand later changes parm_birth_insn as follows:
  /* If we emitted any instructions for setting up the variables,
     emit them before the FUNCTION_START note.  */
  if (var_seq)
    {
      emit_insn_before (var_seq, parm_birth_insn);

      /* In expand_function_end we'll insert the alloca save/restore
         before parm_birth_insn.  We've just inserted an alloca call.
         Adjust the pointer to match.  */
      parm_birth_insn = var_seq;
    }
But the FUNCTION_BEG note is still useful for things that aren't
sensitive to stack allocation, and it has the advantage that
(unlike the var_seq above) it is never deleted or combined.
This patch adds a separate variable to track it.
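In outline (a sketch; the names follow the ChangeLog below, the exact
types may differ):

  /* emit-rtl.h: a new field that always points at the
     NOTE_FUNCTION_BEG note, plus an accessor macro.  */
  rtx_insn *x_function_beg_note;
  #define function_beg_insn (crtl->x_function_beg_note)

expand_function_start then records the note in function_beg_insn at the
same point it sets up parm_birth_insn.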
gcc/
* emit-rtl.h (rtl_data::x_function_beg_note): New member variable.
(function_beg_insn): New macro.
* function.cc (expand_function_start): Initialize function_beg_insn.
aarch64: Use a global map to detect duplicated overloads [PR112989]
As explained in the covering note to the previous patch,
the fact that aarch64-sve-* is now used for multiple header
files means that function_builder::add_overloaded_function
now needs to use a global map to detect duplicated overload
functions, instead of the member variable that it used previously.
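Schematically (a sketch; the exact declaration in the patch may differ):

  /* aarch64-sve-builtins.cc: one map shared by all header files,
     keyed on the IDENTIFIER node so the name is recorded in a
     GGC-friendly way.  */
  static hash_map<tree, registered_function *> *overload_names;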
gcc/
PR target/112989
* config/aarch64/aarch64-sve-builtins.h
(function_builder::m_overload_names): Replace with...
* config/aarch64/aarch64-sve-builtins.cc (overload_names): ...this
new global.
(add_overloaded_function): Update accordingly, using get_identifier
to get a GGC-friendly record of the name.
aarch64: Use a separate group for SME builtins [PR112989]
The PR shows that we were registering the same overloaded SVE
builtins twice. This was supposed to be prevented by
function_builder::add_overloaded_function, which uses a map
to detect whether a function of the same name has already been
registered. add_overloaded_function then had some asserts to
check for consistency.
However, the map that add_overloaded_function uses was a member of
function_builder itself. That made sense when there was just one
header file, arm_sve.h, since it meant that the memory could be
reclaimed once arm_sve.h had been processed. But now we have three
header files, and in principle, it's possible for arm_sme.h to include
overloads of things that arm_sve.h also defines. We therefore need
to use a global map instead.
However, doing that meant that the consistency checks in
add_overloaded_function fired as expected, which showed some
latent issues. This preliminary patch deals with those by adding
AARCH64_FL_SME to things that require AARCH64_FL_SME2.
This inconsistency led to another problem: functions were selected
for arm_sme.h over arm_sve.h based on whether they had AARCH64_FL_SME.
So some SME2-only things were actually defined in arm_sve.h, whereas
similar SME things were defined in arm_sme.h.
Choosing based on flags was an early get-started crutch that I forgot
to clean up later :( This patch goes for the more direct approach of
having a separate table of SME builtins, as for arm_neon_sve_bridge.h.
aarch64-sve-builtins-sve2.def contains several intrinsics that are
currently SME-only but that operate entirely on vector registers.
Many of these will be extended to SVE2.1 once SVE2.1 support is added,
so the patch front-loads that by keeping the current division between
aarch64-sve-builtins-sve2.def (whose functions now go in arm_sve.h)
and aarch64-sve-builtins-sme.def (whose functions now go in arm_sme.h).
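The result is two tables; schematically (sme_function_groups is the name
from the ChangeLog below):

  /* aarch64-sve-builtins.cc: SME intrinsics now live in their own
     table, as for arm_neon_sve_bridge.h, instead of being filtered
     out of function_groups by flag checks.  */
  static const function_group_info sme_function_groups[] = {
    /* entries generated from aarch64-sve-builtins-sme.def */
  };

handle_arm_sme_h then walks sme_function_groups instead of function_groups.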
gcc/
PR target/112989
* config/aarch64/aarch64-sve-builtins.def: Don't include
aarch64-sve-builtins-sme.def.
(DEF_SME_ZA_FUNCTION_GS, DEF_SME_ZA_FUNCTION): Move to...
* config/aarch64/aarch64-sve-builtins-sme.def: ...here.
(DEF_SME_FUNCTION): New macro. Use it and DEF_SME_FUNCTION_GS
instead of DEF_SVE_*. Add AARCH64_FL_SME to anything that
requires AARCH64_FL_SME2.
* config/aarch64/aarch64-sve-builtins-sve2.def: Make same
AARCH64_FL_SME adjustment here.
* config/aarch64/aarch64-sve-builtins.cc (function_groups): Don't
include SME intrinsics.
(sme_function_groups): New array.
(handle_arm_sve_h): Remove check for AARCH64_FL_SME.
(handle_arm_sme_h): Use sme_function_groups instead of function_groups.
Jakub Jelinek [Fri, 12 Jan 2024 10:23:27 +0000 (11:23 +0100)]
lower-bitint: Fix up handling of unsigned INTEGER_CSTs operands with lots of 1s in the upper bits [PR113334]
For INTEGER_CST operands, the code decides if it should emit the whole
INTEGER_CST into memory, or if there are enough upper bits either all 0s
or all 1s to warrant an optimization, where we use memory for the lower
limbs or even just an INTEGER_CST for the least significant limb and fill
in the rest of the limbs with 0s or 1s. Unfortunately, when not using
bitint_min_cst_precision, the code was using tree_int_cst_sgn (op) < 0
to determine whether to fill in the upper bits with 1s or 0s. That is
incorrect for TYPE_UNSIGNED INTEGER_CSTs whose higher limbs are full of
ones; what we really want to check here is whether the most significant
bit is set or clear.
Fixed thusly.
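The essence of the fix, using the calls named in the ChangeLog below
(all_ones is an illustrative name, not the one in the source):

  /* Before: wrong for unsigned constants whose MSB is set.  */
  bool all_ones = tree_int_cst_sgn (op) < 0;
  /* After: test the most significant bit of the constant itself.  */
  all_ones = wi::neg_p (wi::to_wide (op));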
2024-01-12 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113334
* gimple-lower-bitint.cc (bitint_large_huge::handle_operand): Use
wi::neg_p (wi::to_wide (op)) instead of tree_int_cst_sgn (op) < 0
to determine if number should be extended by all ones rather than zero
extended.
Jakub Jelinek [Fri, 12 Jan 2024 10:22:04 +0000 (11:22 +0100)]
sra: Punt for too large _BitInt accesses [PR113330]
This is the case I was talking about in
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642423.html
and Zdenek kindly found a testcase for it.
We can only create a BITINT_TYPE with precision at most 65535, not 65536,
so we need to punt if we would otherwise have to create one.
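Schematically, the punt looks like this (a sketch; the exact condition in
create_access may differ):

  /* A BITINT_TYPE can have at most 65535 bits of precision, so give
     up on scalarizing anything that would need more.  */
  if (TREE_CODE (type) == BITINT_TYPE && size > 65535)
    return NULL;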
2024-01-12 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113330
* tree-sra.cc (create_access): Punt for BITINT_TYPE accesses with
too large size.