git.ipfire.org Git - thirdparty/gcc.git/log

testsuite: Only run test if alarm is available

Most baremetal toolchains will not have an implementation for alarm and
sigaction as they are target specific.
For arm-none-eabi with newlib, function signatures are exposed, but
there is no implmentation and thus the test cases causes a undefined
symbol link error.

gcc/testsuite/ChangeLog:

* gcc.dg/pr78185.c: Remove dg-do and replace with
with dg-require-effective-target of signal and alarm.
* gcc.dg/pr116906-1.c: Likewise.
* gcc.dg/pr116906-2.c: Likewise.
* gcc.dg/vect/pr101145inf.c: Use effective-target alarm.
* gcc.dg/vect/pr101145inf_1.c: Likewise.
* lib/target-supports.exp(check_effective_target_alarm): New.

gcc/ChangeLog:

* doc/sourcebuild.texi (Effective-Target Keywords): Document
'alarm'.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

AVR: PR117726 - Tweak 32-bit logical shifts of 25...30 for -Oz.

As it turns out, logical 32-bit shifts with an offset of 25..30 can
be performed in 7 instructions or less. This beats the 7 instruc-
tions required for the default code of a shift loop.
Plus, with zero overhead, these cases can be 3-operand.

This is only relevant for -Oz because with -Os, 3op shifts are
split with -msplit-bit-shift (which is not performed with -Oz).

PR target/117726
gcc/
* config/avr/avr.cc (avr_ld_regno_p): New function.
(ashlsi3_out) [case 25,26,27,28,29,30]: Handle and tweak.
(lshrsi3_out): Same.
(avr_rtx_costs_1) [SImode, ASHIFT, LSHIFTRT]: Adjust costs.
* config/avr/avr.md (ashlsi3, *ashlsi3, *ashlsi3_const):
Add "r,r,C4L" alternative.
(lshrsi3, *lshrsi3, *lshrsi3_const): Add "r,r,C4R" alternative.
* config/avr/constraints.md (C4R, C4L): New,
gcc/testsuite/
* gcc.target/avr/torture/avr-torture.exp (AVR_TORTURE_OPTIONS):
Turn one option variant into -Oz.

Fortran: Regression- fix ICE at fortran/trans-decl.c:1575 [PR96087]

2025-01-23 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/96087
* trans-decl.cc (gfc_get_symbol_decl): If a dummy is missing a
backend decl, it is likely that it has come from a module proc
interface. Look for the formal symbol by name in the containing
proc and use its backend decl.
* trans-expr.cc (gfc_apply_interface_mapping_to_expr): For the
same reason, match the name, rather than the symbol address to
perform the mapping.

gcc/testsuite/
PR fortran/96087
* gfortran.dg/pr96087.f90: New test.

tree-optimization/118558 - fix alignment compute with VMAT_CONTIGUOUS_REVERSE

There are calls to dr_misalignment left that do not correct for the
offset (which is vector type dependent) when the stride is negative.
Notably vect_known_alignment_in_bytes doesn't allow to pass through
such offset which the following adds (computing the offset in
vect_known_alignment_in_bytes would be possible as well, but the
offset can be shared as seen). Eventually this function could go away.

This leads to peeling for gaps not considerd, nor shortening of the
access applied which is what fixes the testcase on x86_64.

PR tree-optimization/118558
* tree-vectorizer.h (vect_known_alignment_in_bytes): Pass
through offset to dr_misalignment.
* tree-vect-stmts.cc (get_group_load_store_type): Compute
offset applied for negative stride and use it when querying
alignment of accesses.
(vectorizable_load): Likewise.

* gcc.dg/vect/pr118558.c: New testcase.

c++: Update mangling of lambdas in expressions

https://github.com/itanium-cxx-abi/cxx-abi/pull/85 clarifies that
mangling a lambda expression should use 'L' rather than "tl".

gcc/cp/ChangeLog:

* mangle.cc (write_expression): Update mangling for lambdas.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-generic-mangle1.C: Update mangling.
* g++.dg/cpp2a/lambda-generic-mangle1a.C: Likewise.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++: Fix mangling of lambdas in static data member initializers [PR107741]

This fixes an issue where lambdas declared in the initializer of a
static data member within the class body do not get a mangling scope of
that variable; this results in mangled names that do not conform to the
ABI spec.

To do this, the patch splits up grokfield for this case specifically,
allowing a declaration to be build and used in start_lambda_scope before
parsing the initializer, so that record_lambda_scope works correctly.

As a drive-by, this also fixes the issue of a static member not being
visible within its own initializer.

PR c++/107741

gcc/c-family/ChangeLog:

* c-opts.cc (c_common_post_options): Bump ABI version.

gcc/ChangeLog:

* common.opt: Add -fabi-version=20.
* doc/invoke.texi: Likewise.

gcc/cp/ChangeLog:

* cp-tree.h (start_initialized_static_member): Declare.
(finish_initialized_static_member): Declare.
* decl2.cc (start_initialized_static_member): New function.
(finish_initialized_static_member): New function.
* lambda.cc (record_lambda_scope): Support falling back to old
ABI (maybe with warning).
* parser.cc (cp_parser_member_declaration): Build decl early
when parsing an initialized static data member.

gcc/testsuite/ChangeLog:

* g++.dg/abi/macro0.C: Bump ABI version.
* g++.dg/abi/mangle74.C: Remove XFAILs.
* g++.dg/other/fold1.C: Restore originally raised error.
* g++.dg/abi/lambda-ctx2-19.C: New test.
* g++.dg/abi/lambda-ctx2-19vs20.C: New test.
* g++.dg/abi/lambda-ctx2-20.C: New test.
* g++.dg/abi/lambda-ctx2.h: New test.
* g++.dg/cpp0x/static-member-init-1.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++/modules: Fix exporting temploid friends in header units [PR118582]

When we started streaming the bit to handle merging of imported temploid
friends in r15-2807, I unthinkingly only streamed it in the
'!state->is_header ()' case.

This patch reworks the streaming logic to ensure that this data is
always streamed, including for unique entities (in case that ever comes
up somehow). This does make the streaming slightly less efficient, as
functions and types will need an extra byte, but this doesn't appear to
make a huge difference to the size of the resulting module; the 'std'
module on my machine grows by 0.2% from 30671136 to 30730144 bytes.

PR c++/118582

gcc/cp/ChangeLog:

* module.cc (trees_out::decl_value): Always stream
imported_temploid_friends information.
(trees_in::decl_value): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr118582_a.H: New test.
* g++.dg/modules/pr118582_b.H: New test.
* g++.dg/modules/pr118582_c.H: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

LoongArch: Fix invalid subregs in xorsign [PR118501]

The test case added in r15-7073 now triggers an ICE, indicating we need
the same fix as AArch64.

gcc/ChangeLog:

PR target/118501
* config/loongarch/loongarch.md (@xorsign<mode>3): Use
force_lowpart_subreg.

i386: Omit "p" for packed in intrin name for FP8 convert

gcc/ChangeLog:

* config/i386/avx10_2-512convertintrin.h:
Omit "p" for packed for FP8.
* config/i386/avx10_2convertintrin.h: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-512-convert-1.c: Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtbiasph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-convert-1.c: Ditto.

i386: Change mnemonics from VCVT[,T]NEBF162I[,U]BS to VCVT[,T]BF162I[,U]BS

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512satcvtintrin.h: Change intrin and
builtin name according to new mnemonics.
* config/i386/avx10_2satcvtintrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_VCVTBF162IBS): Rename from UNSPEC_VCVTNEBF162IBS.
(UNSPEC_VCVTBF162IUBS): Rename from UNSPEC_VCVTNEBF162IUBS.
(UNSPEC_VCVTTBF162IBS): Rename from UNSPEC_VCVTTNEBF162IBS.
(UNSPEC_VCVTTBF162IUBS): Rename from UNSPEC_VCVTTNEBF162IUBS.
(UNSPEC_CVTNE_BF16_IBS_ITER): Rename to...
(UNSPEC_CVT_BF16_IBS_ITER): ...this. Adjust UNSPEC name.
(sat_cvt_sign_prefix): Adjust UNSPEC name.
(sat_cvt_trunc_prefix): Ditto.
(avx10_2_cvt<sat_cvt_trunc_prefix>nebf162i<sat_cvt_sign_prefix>bs<mode><mask_name>):
Rename to...
(avx10_2_cvt<sat_cvt_trunc_prefix>bf162i<sat_cvt_sign_prefix>bs<mode><mask_name>):
...this. Change instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-satcvt-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-512-vcvtnebf162ibs-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvtbf162ibs-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtnebf162iubs-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvtbf162iubs-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvttnebf162ibs-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvttbf162ibs-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvttnebf162iubs-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvttbf162iubs-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-satcvt-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-vcvtnebf162ibs-2.c: Move to...
* gcc.target/i386/avx10_2-vcvtbf162ibs-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvtnebf162iubs-2.c: Move to...
* gcc.target/i386/avx10_2-vcvtbf162iubs-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvttnebf162ibs-2.c: Move to...
* gcc.target/i386/avx10_2-vcvttbf162ibs-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvttnebf162iubs-2.c: Move to...
* gcc.target/i386/avx10_2-vcvttbf162iubs-2.c: ...here.
Adjust intrin call.

i386: Change mnemonics from VCVTNEPH2[B,H]F8 to VCVTPH2[B,H]F8

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512convertintrin.h: Change intrin and
builtin name according to new mnemonics.
* config/i386/avx10_2convertintrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_VCVTPH2BF8): Rename from UNSPEC_VCVTNEPH2BF8.
(UNSPEC_VCVTPH2BF8S): Rename from UNSPEC_VCVTNEPH2BF8S.
(UNSPEC_VCVTPH2HF8): Rename from UNSPEC_VCVTNEPH2HF8.
(UNSPEC_VCVTPH2HF8S): Rename from UNSPEC_VCVTNEPH2HF8S.
(UNSPEC_CONVERTPH2FP8): Rename from UNSPEC_NECONVERTPH2FP8.
Adjust UNSPEC name.
(convertph2fp8): Rename from neconvertph2fp8. Adjust
iterator map.
(vcvt<neconvertph2fp8>v8hf): Rename to...
(vcvt<neconvertph2fp8>v8hf): ...this.
(*vcvt<neconvertph2fp8>v8hf): Rename to...
(*vcvt<neconvertph2fp8>v8hf): ...this.
(vcvt<neconvertph2fp8>v8hf_mask): Rename to...
(vcvt<neconvertph2fp8>v8hf_mask): ...this.
(*vcvt<neconvertph2fp8>v8hf_mask): Rename to...
(*vcvt<neconvertph2fp8>v8hf_mask): ...this.
(vcvt<neconvertph2fp8><mode><mask_name>): Rename to...
(vcvt<convertph2fp8><mode><mask_name>): ...this.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-convert-1.c: Adjust output
and intrin call.
* gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvtph2bf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvtph2bf8s-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvtph2hf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvtph2hf8s-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-convert-1.c: Adjust output
and intrin call.
* gcc.target/i386/avx10_2-vcvtneph2bf8-2.c: Move to...
* gcc.target/i386/avx10_2-vcvtph2bf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvtneph2hf8-2.c: Move to...
* gcc.target/i386/avx10_2-vcvtph2bf8s-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvtneph2bf8s-2.c: Move to...
* gcc.target/i386/avx10_2-vcvtph2hf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvtneph2hf8s-2.c: Move to...
* gcc.target/i386/avx10_2-vcvtph2hf8s-2.c: ...here.
Adjust intrin call.

i386: Change mnemonics from VCVTNE2PH2[B,H]F8 to VCVT2PH2[B,H]F8

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512convertintrin.h: Change intrin and
builtin name according to new mnemonics.
* config/i386/avx10_2convertintrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_VCVT2PH2BF8): Rename from UNSPEC_VCVTNE2PH2BF8.
(UNSPEC_VCVT2PH2BF8S): Rename from UNSPEC_VCVTNE2PH2BF8S.
(UNSPEC_VCVT2PH2HF8): Rename from UNSPEC_VCVTNE2PH2HF8.
(UNSPEC_VCVT2PH2HF8S): Rename from UNSPEC_VCVTNE2PH2HF8S.
(UNSPEC_CONVERTFP8_PACK): Rename from UNSPEC_NECONVERTFP8_PACK.
Adjust UNSPEC name.
(convertfp8_pack): Rename from neconvertfp8_pack. Adjust
iterator map.
(vcvt<neconvertfp8_pack><mode><mask_name>): Rename to...
(vcvt<convertfp8_pack><mode><mask_name>): ...this.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-convert-1.c: Adjust output
and intrin call.
* gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvt2ph2bf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvt2ph2bf8s-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvt2ph2hf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvt2ph2hf8s-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-convert-1.c: Adjust output
and intrin call.
* gcc.target/i386/avx10_2-vcvtne2ph2bf8-2.c: Move to...
* gcc.target/i386/avx10_2-vcvt2ph2bf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvtne2ph2hf8-2.c: Move to...
* gcc.target/i386/avx10_2-vcvt2ph2bf8s-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvtne2ph2bf8s-2.c: Move to...
* gcc.target/i386/avx10_2-vcvt2ph2hf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvtne2ph2hf8s-2.c: Move to...
* gcc.target/i386/avx10_2-vcvt2ph2hf8s-2.c: ...here.
Adjust intrin call.

i386: Change mnemonics from VCOMSBF16 to VCOMISBF16

Besides mnemonics change, this patch also use the compare
pattern instead of UNSPEC.

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2bf16intrin.h: Change intrin and builtin
name according to new mnemonics.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/i386-expand.cc
(ix86_expand_fp_compare): Adjust comments.
(ix86_expand_builtin): Adjust switch case.
* config/i386/i386.md (cmpibf): Change instruction name output.
* config/i386/sse.md (UNSPEC_VCOMSBF16): Removed.
(avx10_2_comisbf16_v8bf): New.
(avx10_2_comsbf16_v8bf): Removed.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-comibf-1.c: Adjust asm check.
* gcc.target/i386/avx10_2-comibf-3.c: Ditto.
* gcc.target/i386/avx10_2-vcomsbf16-1.c: Move to...
* gcc.target/i386/avx10_2-vcomisbf16-1.c: ...here.
Adjust output and intrin call.
* gcc.target/i386/avx10_2-vcomsbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vcomisbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/pr117495.c: Adjust asm check.

i386: Change mnemonics from V[GETEXP,FPCLASS]PBF16 to V[GETEXP,FPCLASS]BF16

Besides mnemonics change, this patch also fixed SDE test fail for
FPCLASS.

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin
name according to new mnemonics.
* config/i386/avx10_2bf16intrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_VFPCLASSBF16); Rename from UNSPEC_VFPCLASSPBF16.
(avx10_2_getexppbf16_<mode><mask_name>): Rename to...
(avx10_2_getexpbf16_<mode><mask_name>): ...this.
Change instruction name output.
(avx10_2_fpclasspbf16_<mode><mask_scalar_merge_name>):
Rename to...
(avx10_2_fpclassbf16_<mode><mask_scalar_merge_name>): ...this.
Change instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-512-vfpclasspbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vfpclassbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vgetexppbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vgetexpbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-vgetexppbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vgetexpbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vfpclasspbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vfpclassbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx-1.c: Adjust builtin call.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.

i386: Change mnemonics from V[RSQRT,SCALEF,SQRTNE]PBF16 to V[RSQRT,SCALEF,SQRT]BF16

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin
name according to new mnemonics.
* config/i386/avx10_2bf16intrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_VSCALEFBF16): Rename from UNSPEC_VSCALEFPBF16.
(avx10_2_scalefpbf16_<mode><mask_name>): Rename to...
(avx10_2_scalefbf16_<mode><mask_name>): ...this.
Change instruction name output.
(avx10_2_rsqrtpbf16_<mode><mask_name>): Rename to...
(avx10_2_rsqrtbf16_<mode><mask_name>): ...this.
Change instruction name output.
(avx10_2_sqrtnepbf16_<mode><mask_name>): Rename to...
(avx10_2_sqrtbf16_<mode><mask_name>): ...this.
Change instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-bf16-1.c: Adjust output and intrin
call.
* gcc.target/i386/avx10_2-512-vrsqrtpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vrsqrtbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vscalefpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vscalefbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vsqrtnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vsqrtbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-bf16-1.c: Adjust output and intrin
call.
* gcc.target/i386/avx10_2-vrsqrtpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vrsqrtbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vscalefpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vscalefbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vsqrtnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vsqrtbf16-2.c: ...here.
Adjust intrin call.

i386: Change mnemonics from V[GETMANT,REDUCENE,RNDSCALENE]PBF16 to V[GETMANT,REDUCE,RNDSCALE]BF16

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin
name according to new mnemonics.
* config/i386/avx10_2bf16intrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_VRNDSCALEBF16): Rename from UNSPEC_VRNDSCALENEPBF16.
(UNSPEC_VREDUCEBF16): Rename from UNSPEC_VREDUCENEPBF16.
(UNSPEC_VGETMANTBF16): Rename from UNSPEC_VGETMANTPBF16.
(BF16IMMOP): Adjust iterator due to UNSPEC name change.
(bf16immop): Ditto.
(avx10_2_<bf16immop>pbf16_<mode><mask_name>): Rename to...
(avx10_2_<bf16immop>bf16_<mode><mask_name>): ...this. Change
instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-512-vgetmantpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vgetmantbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vreducenepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vreducebf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vrndscalenepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vrndscalebf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-bf16-1.c: Adjust output and intrin
call.
* gcc.target/i386/avx10_2-vgetmantpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vgetmantbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vreducenepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vreducebf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vrndscalenepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vrndscalebf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx-1.c: Adjust builtin call.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Adjust intrin call.
* gcc.target/i386/sse-22.c: Ditto.

i386: Change mnemonics from VMINMAXNEPBF16 to VMINMAXBF16

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512minmaxintrin.h: Change intrin and
builtin name according to new mnemonics.
* config/i386/avx10_2minmaxintrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_MINMAXBF16): Rename from UNSPEC_MINMAXNEPBF16.
(avx10_2_minmaxnepbf16_<mode><mask_name>): Rename to...
(avx10_2_minmaxbf16_<mode><mask_name>): ...this. Change
instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-minmax-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-512-vminmaxnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vminmaxbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-minmax-1.c: Adjust output and intrin
call.
* gcc.target/i386/avx10_2-vminmaxnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vminmaxbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx-1.c: Adjust builtin call.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Adjust intrin call.
* gcc.target/i386/sse-22.c: Ditto.

i386: Change mnemonics from V[CMP,MAX,MIN]PBF16 to V[CMP,MAX,MIN]BF16

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin
name according to new mnemonics.
* config/i386/avx10_2bf16intrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(avx10_2_<code>pbf16_<mode><mask_name>): Rename to...
(avx10_2_<code>bf16_<mode><mask_name>): ...this.
Change instruction name output.
(avx10_2_cmppbf16_<mode><mask_scalar_merge_name>): Rename to...
(avx10_2_cmpbf16_<mode><mask_scalar_merge_name>): ...this.
Change instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-512-bf-vector-cmpp-1.c: Move to...
* gcc.target/i386/avx10_2-512-bf16-vector-cmp-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-512-bf-vector-smaxmin-1.c: Move to...
* gcc.target/i386/avx10_2-512-bf16-vector-smaxmin-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-512-vcmppbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcmpbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vmaxpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vmaxbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vminpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vminbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-bf-vector-cmpp-1.c: Move to...
* gcc.target/i386/avx10_2-bf16-vector-cmp-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-bf-vector-smaxmin-1.c: Move to...
* gcc.target/i386/avx10_2-bf16-vector-smaxmin-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-partial-bf-vector-smaxmin-1.c: Move to...
* gcc.target/i386/avx10_2-partial-bf16-vector-smaxmin-1.c: ...here.
* gcc.target/i386/avx10_2-vcmppbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vcmpbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vmaxpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vmaxbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vminpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vminbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/part-vect-vec_cmpbf.c: Adjust asm check.
* gcc.target/i386/avx-1.c: Adjust builtin call.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.

i386: Change mnemonics from VF[,N]M[ADD,SUB][132,213,231]NEPBF16 to VF[,N]M[ADD,SUB][132,213,231]BF16

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin
names according to new mnemonics.
* config/i386/avx10_2bf16intrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(avx10_2_fmaddnepbf16_<mode>_maskz): Rename to...
(avx10_2_fmaddbf16_<mode>_maskz): ...this. Adjust emit_insn.
(avx10_2_fmaddnepbf16_<mode><sd_maskz_name>): Rename to...
(avx10_2_fmaddbf16_<mode><sd_maskz_name>): ...this.
Change instruction name output.
(avx10_2_fmaddnepbf16_<mode>_mask): Rename to...
(avx10_2_fmaddbf16_<mode>_mask): ...this.
Change instruction name output.
(avx10_2_fmaddnepbf16_<mode>_mask3): Rename to...
(avx10_2_fmaddbf16_<mode>_mask3): ...this.
Change instruction name output.
(avx10_2_fnmaddnepbf16_<mode>_maskz): Rename to...
(avx10_2_fnmaddbf16_<mode>_maskz): ...this. Adjust emit_insn.
(avx10_2_fnmaddnepbf16_<mode><sd_maskz_name>): Rename to...
(avx10_2_fnmaddbf16_<mode><sd_maskz_name>): ...this.
Change instruction name output.
(avx10_2_fnmaddnepbf16_<mode>_mask): Rename to...
(avx10_2_fnmaddbf16_<mode>_mask): ...this.
Change instruction name output.
(avx10_2_fnmaddnepbf16_<mode>_mask3): Rename to...
(avx10_2_fnmaddbf16_<mode>_mask3): ...this.
Change instruction name output.
(avx10_2_fmsubnepbf16_<mode>_maskz): Rename to...
(avx10_2_fmsubbf16_<mode>_maskz): ...this. Adjust emit_insn.
(avx10_2_fmsubnepbf16_<mode><sd_maskz_name>): Rename to...
(avx10_2_fmsubbf16_<mode><sd_maskz_name>): ...this.
Change instruction name output.
(avx10_2_fmsubnepbf16_<mode>_mask): Rename to...
(avx10_2_fmsubbf16_<mode>_mask): ...this.
Change instruction name output.
(avx10_2_fmsubnepbf16_<mode>_mask3): Rename to...
(avx10_2_fmsubbf16_<mode>_mask3): ...this.
Change instruction name output.
(avx10_2_fnmsubnepbf16_<mode>_maskz): Rename to...
(avx10_2_fnmsubbf16_<mode>_maskz): ...this. Adjust emit_insn.
(avx10_2_fnmsubnepbf16_<mode><sd_maskz_name>): Rename to...
(avx10_2_fnmsubbf16_<mode><sd_maskz_name>): ...this.
Change instruction name output.
(avx10_2_fnmsubnepbf16_<mode>_mask): Rename to...
(avx10_2_fnmsubbf16_<mode>_mask): ...this.
Change instruction name output.
(avx10_2_fnmsubnepbf16_<mode>_mask3): Rename to...
(avx10_2_fnmsubbf16_<mode>_mask3): ...this.
Change instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-512-bf-vector-fma-1.c: Move to...
* gcc.target/i386/avx10_2-512-bf16-vector-fma-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-512-vfmaddXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vfmaddXXXbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vfmsubXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vfmsubXXXbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vfnmaddXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vfnmaddXXXbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vfnmsubXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vfnmsubXXXbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-bf-vector-fma-1.c: Move to...
* gcc.target/i386/avx10_2-bf16-vector-fma-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c: Move to...
* gcc.target/i386/avx10_2-partial-bf16-vector-fma-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-vfmaddXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vfmaddXXXbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vfmsubXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vfmsubXXXbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vfnmaddXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vfnmaddXXXbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vfnmsubXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vfnmsubXXXbf16-2.c: ...here.
Adjust intrin call.

i386: Change mnemonics from V[ADDNE,DIVNE,MULNE,RCP,SUBNE]PBF16 to V[ADD,DIV,MUL,RCP,SUB]BF16

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin
name according to new mnemonics.
* config/i386/avx10_2bf16intrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md (div<mode>3): Adjust emit_insn.
(avx10_2_<insn>nepbf16_<mode><mask_name>): Rename to...
(avx10_2_<insn>bf16_<mode><mask_name>): ...this. Change
instruction name output.
(avx10_2_rcppbf16_<mode><mask_name>): Rename to...
(avx10_2_rcpbf16_<mode><mask_name>):...this. Change
instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-512-bf-vector-operations-1.c: Move to ...
* gcc.target/i386/avx10_2-512-bf16-vector-operations-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-512-vaddnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vaddbf16-2.c: ...here. Adjust
intrin call.
* gcc.target/i386/avx10_2-512-vdivnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vdivbf16-2.c: ...here. Adjust
intrin call.
* gcc.target/i386/avx10_2-512-vmulnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vmulbf16-2.c: ...here. Adjust
intrin call.
* gcc.target/i386/avx10_2-512-vrcppbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vrcpbf16-2.c: ...here. Adjust
intrin call.
* gcc.target/i386/avx10_2-512-vsubnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vsubbf16-2.c: ...here. Adjust
intrin call.
* gcc.target/i386/avx10_2-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-bf-vector-operations-1.c: Move to ....
* gcc.target/i386/avx10_2-bf16-vector-operations-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-partial-bf-vector-fast-math-1.c: Move to...
* gcc.target/i386/avx10_2-partial-bf16-vector-fast-math-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-partial-bf-vector-operations-1.c: Move to...
* gcc.target/i386/avx10_2-partial-bf16-vector-operations-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-vaddnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vaddbf16-2.c: ...here. Adjust intrin call.
* gcc.target/i386/avx10_2-vdivnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vdivbf16-2.c: ...here. Adjust intrin call.
* gcc.target/i386/avx10_2-vmulnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vmulbf16-2.c: ...here. Adjust intrin call.
* gcc.target/i386/avx10_2-vrcppbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vrcpbf16-2.c: ...here. Adjust intrin call.
* gcc.target/i386/avx10_2-vsubnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vsubbf16-2.c: ...here. Adjust intrin call.
* lib/target-supports.exp (check_effective_target_avx10_2):
Adjust asm usage.
(check_effective_target_avx10_2_512): Ditto.

i386: Enhance AMX tests

After Binutils got changed, the previous usage on intrin will raise
warning for assembler. We need to change that. Besides that, there
are separate issues for both AMX-MOVRS and AMX-TRANSPOSE.

For AMX-MOVRS, t2rpntlvwrs tests wrongly used AMX-TRANSPOSE intrins
in test. Since the only difference between them is the "rs" hint,
it won't change result.

For AMX-TRANSPOSE, "t1" hint test is missing.

This patch fixed both of them. Also changing AMX-MOVRS test file
name to make it match with other AMX tests.

gcc/testsuite/ChangeLog:

PR target/118270
PR target/118609
* gcc.target/i386/amxmovrs-t2rpntlvw-2.c: Move to...
* gcc.target/i386/amxmovrs-2rpntlvwrs-2.c: ...here.
* gcc.target/i386/amxtranspose-2rpntlvw-2.c: Add "t1" hint test.

i386: Append -march=x86-64-v3 to AVX10.2/512 VNNI testcases

These two testcases are misses on previous addition for
-march=x86-64-v3 to silence warning for -march=native tests.

gcc/testsuite/ChangeLog:

* gcc.target/i386/vnniint16-auto-vectorize-4.c: Append
-march=x86-64-v3.
* gcc.target/i386/vnniint8-auto-vectorize-4.c: Ditto.

Daily bump.

d,ada/spec: only sub nostd{inc,lib} rather than nostd{inc,lib}*

This prevents the gcc driver erroneously accepting -nostdlib++ when it
should not when Ada was enabled.

Also, similarly, -nostdinc* (where * is nonempty) is unhandled by either
the Ada or D compiler, so the spec should not substitute those
either (thanks for pointing that out, Jakub).

Brought to my attention by Michał Górny <mgorny@gentoo.org>.

gcc/ada/ChangeLog:

* gcc-interface/lang-specs.h: Replace %{nostdinc*} %{nostdlib*}
with %{nostdinc} %{nostdlib}.

gcc/d/ChangeLog:

* lang-specs.h: Replace %{nostdinc*} with %{nostdinc}.

gcc/testsuite/ChangeLog:

* gcc.dg/driver-nostdlibstar.c: New test.

c++: Implement for static locals CWG 2867 - Order of initialization for structured bindings [PR115769]

On Wed, Aug 14, 2024 at 10:06:24AM +0200, Jakub Jelinek wrote:
> Though, now that I think about it again, perhaps what we could do instead
> is just make sure the _ZGVZ3barvEDC1x1y1z1wE initialization doesn't have
> a CLEANUP_POINT_EXPR in it and wrap both the _ZGVZ3barvEDC1x1y1z1wE
> and cp_finish_decomp created stuff into a single CLEANUP_POINT_EXPR.
> That way, perhaps _ZGVZ3barvEDC1x1y1z1wE could be initialized by one thread
> and _ZGVZ3barvE1x by a different, but the temporaries from _ZGVZ3barvEDC1x1y1z1wE
> initialization would be only destructed after the _ZGVZ3barvE1w guard
> was released by the thread which initialized _ZGVZ3barvEDC1x1y1z1wE.

Here is the I believe ABI compatible version, which uses the separate
guard variables, so different structured binding variables can be
initialized in different threads, but the thread that did the artificial
base initialization will keep temporaries live at least until the last
guard variable is released (i.e. when even that variable has been
initialized).

2025-01-22 Jakub Jelinek <jakub@redhat.com>

PR c++/115769
* decl.cc: Partially implement CWG 2867 - Order of initialization
for structured bindings.
(cp_finish_decl): If need_decomp_init, for function scope structure
binding bases, temporarily clear stmts_are_full_exprs_p before
calling expand_static_init, after it call cp_finish_decomp and wrap
code emitted by both into maybe_cleanup_point_expr_void and ensure
cp_finish_decomp isn't called again.

* g++.dg/DRs/dr2867-3.C: New test.
* g++.dg/DRs/dr2867-4.C: New test.

c++: further tweak to cxx_eval_outermost_constant_expr [PR118396]

This patch adds an error in a !allow_non_constant case when the
initializer/object types don't match.

PR c++/118396

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_outermost_constant_expr): Add an error call
when !allow_non_constant.

Reviewed-by: Jason Merrill <jason@redhat.com>

jit: fix startup on aarch64

libgccjit fails on startup on aarch64 (and probably other archs).

The issues are that

(a) within jit_langhook_init the call to
targetm.init_builtins can use types that aren't representable
via jit::recording::type, and

(b) targetm.init_builtins can call lang_hooks.decls.pushdecl, which
although a no-op for libgccjit has a gcc_unreachable.

Fixed thusly.

gcc/jit/ChangeLog:
* dummy-frontend.cc (tree_type_to_jit_type): For POINTER_TYPE,
bail out if the inner call to tree_type_to_jit_type fails.
Don't abort on unknown types.
(jit_langhook_pushdecl): Replace gcc_unreachable with return of
NULL_TREE.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

s390: Fix arch15 machine string for binutils

gcc/ChangeLog:

* config/s390/s390.cc: Fix arch15 machine string which must not
be empty.

aarch64: Fix aarch64_write_sysregdi predicate

While working on another MSR-related patch, I noticed that
aarch64_write_sysregdi's constraints allowed zero, but its
predicate didn't. This could in principle lead to an ICE
during or after RA, since "Z" allows the RA to rematerialise
a known zero directly into the instruction.

The usual techniques for exposing a bug like that didn't work in this
case, since the optimisers seem to make no attempt to remove redundant
zero moves (at least not for these unspec_volatiles). But the problem
still seems worth fixing pre-emptively.

gcc/
* config/aarch64/aarch64.md (aarch64_read_sysregti): Change
the source predicate to aarch64_reg_or_zero.

gcc/testsuite/
* gcc.target/aarch64/acle/rwsr-4.c: New test.
* gcc.target/aarch64/acle/rwsr-armv8p9.c: Avoid read of uninitialized
variable.

AVR: Add test cases for PR118591.

gcc/testsuite/
PR rtl-optimization/118591
* gcc.target/avr/torture/pr118591-1.c: New test.
* gcc.target/avr/torture/pr118591-2.c: New test.

c++: Clear TARGET_EXPR_ELIDING_P when forced to use a copy constructor due to __no_unique_address__ [PR118199]

We currently fail with a checking assert upon the following valid code
when using -fno-elide-constructors

=== cut here ===
struct d { ~d(); };
d &b();
struct f {
  [[__no_unique_address__]] d e;
};
struct h : f  {
  h() : f{b()} {}
} i;
=== cut here ===

The problem is that split_nonconstant_init_1 detects that it cannot
elide the copy constructor due to __no_unique_address__ but does not
clear TARGET_EXPR_ELIDING_P, and due to -fno-elide-constructors, we trip
on a checking assert in cp_gimplify_expr.

This patch fixes this by making sure that we clear TARGET_EXPR_ELIDING_P
if we determine that we have to keep the copy constructor due to
__no_unique_address__. An alternative would be to just check for
elide_constructors in that assert, but I think it'd lose most of its
value if we did so.

PR c++/118199

gcc/cp/ChangeLog:

* typeck2.cc (split_nonconstant_init_1): Clear
TARGET_EXPR_ELIDING_P if we need to use a copy constructor
because of __no_unique_address__.

gcc/testsuite/ChangeLog:

* g++.dg/init/no-elide3.C: New test.

LoongArch: Fix wrong code with <optab>_alsl_reversesi_extended

The second source register of this insn cannot be the same as the
destination register.

gcc/ChangeLog:

* config/loongarch/loongarch.md
(<optab>_alsl_reversesi_extended): Add '&' to the destination
register constraint and append '0' to the first source register
constraint to indicate the destination register cannot be same
as the second source register, and change the split condition to
reload_completed so that the insn will be split only after RA in
order to obtain allocated registers that satisfy the above
constraints.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/bitwise-shift-reassoc-clobber.c: New
test.

c++: Improve cp_parser_objc_messsage_args compile time

On Tue, Jan 21, 2025 at 06:47:53PM +0100, Jakub Jelinek wrote:
> Indeed, I've just used what it was doing without thinking too much about it,
> sorry.
> addl_args = tree_cons (NULL_TREE, arg, addl_args);
> with addl_args = nreverse (addl_args); after the loop might be better,
> can test that incrementally. sel_args is handled the same and should have
> the same treatment.

Here is incremental patch to do that.

Verified also on the 2 va-meth*.mm testcases (one without CPP_EMBED, one
with) that -fdump-tree-gimple is the same before/after the patch.

2025-01-22 Jakub Jelinek <jakub@redhat.com>

* parser.cc (cp_parser_objc_message_args): Use tree_cons with
nreverse at the end for both sel_args and addl_args, instead of
chainon with build_tree_list second argument.

c++: Introduce append_ctor_to_tree_vector

On Mon, Jan 20, 2025 at 05:14:33PM -0500, Jason Merrill wrote:
> > --- gcc/cp/call.cc.jj       2025-01-15 18:24:36.135503866 +0100
> > +++ gcc/cp/call.cc  2025-01-17 14:42:38.201643385 +0100
> > @@ -4258,11 +4258,30 @@ add_list_candidates (tree fns, tree firs
> >     /* Expand the CONSTRUCTOR into a new argument vec.  */
>
> Maybe we could factor out a function called something like
> append_ctor_to_tree_vector from the common code between this and
> make_tree_vector_from_ctor?
>
> But this is OK as is if you don't want to pursue that.

I had the previous patch already tested and wanted to avoid delaying
the large initializer speedup re-reversion any further, so I've committed
the patch as is.

Here is an incremental patch to factor that out.

2025-01-22  Jakub Jelinek  <jakub@redhat.com>

gcc/c-family/
* c-common.h (append_ctor_to_tree_vector): Declare.
* c-common.cc (append_ctor_to_tree_vector): New function.
(make_tree_vector_from_ctor): Use it.
gcc/cp/
* call.cc (add_list_candidates): Use append_ctor_to_tree_vector.

c++: 'this' capture clobbered during recursive inst [PR116756]

Here during instantiation of generic lambda's op() [with I = 0] we
substitute into the call self(self, cst<1>{}) which requires recursive
instantiation of the same op() [with I = 1] (which isn't deferred due to
lambda's deduced return type. During this recursive instantiation, the
DECL_EXPR case of tsubst_stmt clobbers LAMBDA_EXPR_THIS_CAPTURE to point
to the child op()'s specialized capture proxy instead of the parent's,
and the original value is never restored.

So later when substituting into the openSeries call in the parent op()
maybe_resolve_dummy uses the 'this' proxy belonging to the child op(),
which leads to a context mismatch ICE during gimplification of the
proxy.

An earlier version of this patch fixed this by making instantiate_body
save/restore LAMBDA_EXPR_THIS_CAPTURE during a lambda op() instantiation.
But it seems cleaner to avoid overwriting LAMBDA_EXPR_THIS_CAPTURE in the
first place by making it point to the non-specialized capture proxy, and
instead call retrieve_local_specialization as needed, which is what this
patch implements. It's natural then to not clear LAMBDA_EXPR_THIS_CAPTURE
after parsing/regenerating a lambda.

PR c++/116756

gcc/cp/ChangeLog:

* lambda.cc (lambda_expr_this_capture): Call
retrieve_local_specialization on the result of
LAMBDA_EXPR_THIS_CAPTURE for a generic lambda.
* parser.cc (cp_parser_lambda_expression): Don't clear
LAMBDA_EXPR_THIS_CAPTURE.
* pt.cc (tsubst_stmt) <case DECL_EXPR>: Don't overwrite
LAMBDA_EXPR_THIS_CAPTURE with the specialized capture.
(tsubst_lambda_expr): Don't clear LAMBDA_EXPR_THIS_CAPTURE
afterward.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/constexpr-if-lambda7.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

Daily bump.

Revert "[PATCH 1/2] RISC-V:Add intrinsic support for the CMOs extensions"

This reverts commit d2c8548e0ce51dac6bc51d37236c50f98fca82f0.

Revert "[PATCH 2/2] RISC-V:Add intrinsic cases for the CMOs extensions"

This reverts commit b22d9c8f8216d15773dee4f9677c6b26aff507fd.

match: Improve the `x ==/!= ~x` pattern [PR118483]

This improves this pattern by 2 ways:
* Allow for an optional convert, similar to how the few other
`a OP ~a` patterns also allow for an optional convert.
* Use bitwise_inverted_equal_p/maybe_bit_not instead of directly
matching bit_not. Just like the other patterns do too.

Note pr118483-2.c used to optimized for aarch64-linux-gnu with GCC 4.9.4
on the RTL level even though the gimple level was missing it.

PR tree-optimization/118483

gcc/ChangeLog:

* match.pd (`x ==/!= ~x`): Allow for an optional convert
and use itwise_inverted_equal_p/maybe_bit_not instead of
directly matching bit_not.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr118483-1.c: New test.
* gcc.dg/tree-ssa/pr118483-2.c: New test.
* gcc.dg/tree-ssa/pr118483-3.c: New test.
* gcc.dg/tree-ssa/pr118483-4.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

c++: Don't call fold from cp_fold if one of the operands is an error_mark [PR118525]

While adding a new match pattern, g++.dg/cpp2a/consteval36.C started to ICE and that was
because we would call fold even if one of the operands of the comparison was an error_mark_node.
I found a new testcase which also ICEs before this patch too so show the issue was latent.

So there is code in cp_fold to avoid calling fold when one of the operands become error_mark_node
but with the addition of consteval, the replacement of an invalid call is replaced before the call
to cp_fold and there is no way to pop up the error_mark. So this patch changes the current code to
check if the operands of the expression are error_mark_node before checking if the folded operand
is different from the previous one.

Bootstrapped and tested on x86_64-linux-gnu.

PR c++/118525

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold): Check operands of unary, binary, cond/vec_cond
and array_ref for error_mark before checking if the operands had changed.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/consteval38.C: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

testsuite: Require int32plus for test case pr117546.c

Test case is valid even if size of int is more than 32 bits.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr117546.c: Require effective target int32plus.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

libphobos: Add MIPS64 implementation of fiber_switchContext [PR118584]

Replaces the generic implementation. The `core.thread.fiber' module
already defines version=AsmExternal on mips64el-linux-gnuabi64.

PR d/118584

libphobos/ChangeLog:

* libdruntime/config/mips/switchcontext.S: Add MIPS64 N64 ABI
implementation of fiber_switchContext.

RISC-V: Unbreak bootstrap.

This fixes a wrong format specifier and an unused variable which should
re-enable bootstrap.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_file_end): Fix format string.
(riscv_lshift_subword): Mark MODE as unused.

AVR: Tweak some 16-bit shifts by using MUL.

u16 << 5 and u16 << 6 can be tweaked by using MUL instructions.
Benefit is a better speed ratio with -Os and smaller size with -O2.

gcc/
* config/avr/avr-passes.cc (avr_emit_shift) [ASHIFT,HImode]:
Allow offsets 5 and 6 as 3op provided have MUL and a scratch.
* config/avr/avr.cc (avr_optimize_size_max_p): New function.
(avr_out_ashlhi3_mul): New function.
(ashlhi3_out) [case 4, 5, 6]: Better speed for -Os.
* config/avr/avr.md (isa) <mul, no_mul>: New attr values.
(*ashlhi3_const): Add alternative for offsets 5 and 6.

c++: Handle CPP_EMBED in cp_parser_objc_message_args [PR118586]

As the following testcases show, I forgot to handle CPP_EMBED in
cp_parser_objc_message_args which is another place which can parse
possibly long valid lists of CPP_COMMA separated CPP_NUMBER tokens.

2025-01-21 Jakub Jelinek <jakub@redhat.com>

PR objc++/118586
gcc/cp/
* parser.cc (cp_parser_objc_message_args): Handle CPP_EMBED.
gcc/testsuite/
* objc.dg/embed-1.m: New test.
* obj-c++.dg/embed-1.mm: New test.
* obj-c++.dg/va-meth-2.mm: New test.

RISC-V: Add a new constraint to ensure that the vl of XTheadVector does not get a non-zero immediate

Although we have handled the vl of XTheadVector correctly in the
expand phase and predicates, the results show that the work is
still insufficient.

In the curr_insn_transform function, the insn is transformed from:
(insn 69 67 225 12 (set (mem:RVVM8SF (reg/f:DI 218 [ _77 ]) [0  S[128, 128] A32])
        (if_then_else:RVVM8SF (unspec:RVVMF4BI [
                    (const_vector:RVVMF4BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (reg:DI 209)
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (reg/v:RVVM8SF 143 [ _xx ])
            (mem:RVVM8SF (reg/f:DI 218 [ _77 ]) [0  S[128, 128] A32])))
     (expr_list:REG_DEAD (reg/v:RVVM8SF 143 [ _xx ])
        (nil)))
to
(insn 69 284 225 11 (set (mem:RVVM8SF (reg/f:DI 18 s2 [orig:218 _77 ] [218]) [0  S[128, 128] A32])
        (if_then_else:RVVM8SF (unspec:RVVMF4BI [
                    (const_vector:RVVMF4BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (const_int 1 [0x1])
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (reg/v:RVVM8SF 104 v8 [orig:143 _xx ] [143])
            (mem:RVVM8SF (reg/f:DI 18 s2 [orig:218 _77 ] [218]) [0  S[128, 128] A32])))
     (nil))

Looking at the log for the reload pass, it is found that "Changing pseudo 209 in
operand 3 of insn 69 on equiv 0x1".
It converts the vl operand in insn from the expected register(reg:DI 209) to the
constant 1(const_int 1 [0x1]).

This conversion occurs because, although the predicate for the vl operand is
restricted by "vector_length_operand" in the pattern, the constraint is still
"rK", which allows the transformation.

The issue is that changing the "rK" constraint to "rJ" for the constraint of vl
operand in the pattern would prevent this conversion, But unfortunately this will
conflict with RVV (RISC-V Vector Extension).

Based on the review's recommendations, the best solution for now is to create
a new constraint to distinguish between RVV and XTheadVector, which is exactly
what this patch does.

PR target/116593

gcc/ChangeLog:

* config/riscv/constraints.md (vl): New.
* config/riscv/thead-vector.md: Replacing rK with rvl.
* config/riscv/vector.md: Likewise.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/rvv.exp: Enable testsuite of XTheadVector.
* g++.target/riscv/rvv/xtheadvector/pr116593.C: New test.

RISC-V: Enable and adjust the testsuite for XTheadVector.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Enable testsuite of
XTheadVector.
* gcc.target/riscv/rvv/xtheadvector/pr114194.c: Adjust correctly.
* gcc.target/riscv/rvv/xtheadvector/prefix.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlwu-vsw.c: Likewise.

Use `known_ge' instead of `compare_sizes_for_sort'.

gcc/
* lra-spills.cc (assign_stack_slot_num_and_sort_pseudos): Use known_ge
to compare sizes.

testsuite: Add testcase for already fixed PR [PR118560]

The fix for this PR has been committed without a testcase.
The following testcase would take at least 15 minutes to compile
on a fast machine (powerpc64-linux both -m32 or -m64), now it takes
100ms.

2025-01-21 Jakub Jelinek <jakub@redhat.com>

PR target/118560
* gcc.dg/dfp/pr118560.c: New test.

c++: fix wrong-code with constexpr prvalue opt [PR118396]

The recent r15-6369 unfortunately caused a bad wrong-code issue.
Here we have

  TARGET_EXPR <D.2996, (void) (D.2996 = {.status=0, .data={._vptr.Foo=&_ZTV3Foo + 16}})>

and call cp_fold_r -> maybe_constant_init with object=D.2996.  In
cxx_eval_outermost_constant_expr we now take the type of the object
if present.  An object can't have type 'void' and so we continue to
evaluate the initializer.  That evaluates into a VOID_CST, meaning
we disregard the whole initializer, and terrible things ensue.

For non-simple TARGET_EXPRs, we should return ctx.ctor rather than
the result of cxx_eval_constant_expression.

PR c++/118396
PR c++/118523

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_outermost_constant_expr): For non-simple
TARGET_EXPRs, return ctx.ctor rather than the result of
cxx_eval_constant_expression.  If TYPE and the type of R don't
match, return the original expression.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-prvalue4.C: New test.
* g++.dg/cpp1y/constexpr-prvalue3.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

vect: Force alignment peeling to vectorize more early break loops [PR118211]: update 'gcc.dg/vect/vect-switch-search-line-fast.c' for GCN

PR tree-optimization/118211
PR tree-optimization/116126
gcc/testsuite/
* gcc.dg/vect/vect-switch-search-line-fast.c: Update for GCN.

MAINTAINERS: add myself to write after approval

ChangeLog:

* MAINTAINERS: Add myself to write after approval.

[RISC-V][PR target/116256] Fix incorrect return value for predicate

Another bug found while chasing paths to fix the remaining issues in pr116256.

This case is sometimes benign when the optimizers are enabled.  But could show
up in a -O0 compile with some patterns I was playing around with.

Basically we have a predicate that is meant to return true if bits set in the
operand are all consecutive.

That predicate would return the wrong value when presented with (const_int 0)
indicating it had a run of on bits when obviously no bits are on 😉

It's pretty obvious once you look at the implementation.

if (exact_log2 ((val >> ctz_hwi (val)) + 1) < 0)
  return false
return true;

The right shift is always going to produce 0.  0 + 1 = 1 which is a power of 2.
So exact_log2 returns 0 and we get a true result rather than a false result.

The fix is trivial.  "<=".  While inside we might as well fix the formatting.

Tested on rv32 and rv64 in my tester.  Waiting on upstream pre-commit testing
to render a verdict.

PR target/116256
gcc/
* config/riscv/predicates.md (consecutive_bits_operand): Properly
handle (const_int 0).

Regenerate aarch64.opt.urls

This updates aarch64.opt.urls after my patch earlier today.

Pushing directly as it's an obvious fix.

gcc/ChangeLog:

* config/aarch64/aarch64.opt.urls: Regenerate

tree-optimization/118569 - LC SSA broken after unrolling

The following amends the previous fix to mark all of the loop BBs
as need to be scanned for new LC PHI uses when its nesting parents
changed, noticing one caller of fix_loop_placement was already
doing that. So the following moves this code into fix_loop_placement,
covering both callers now.

PR tree-optimization/118569
* cfgloopmanip.cc (fix_loop_placement): When the loops
nesting parents changed, mark all blocks to be scanned
for LC PHI uses.
(fix_bb_placements): Remove code moved into fix_loop_placement.

* gcc.dg/torture/pr118569.c: New testcase.

AArch64: Add LUTI ACLE for SVE2

This patch introduces support for LUTI2/LUTI4 ACLE for SVE2.

LUTI instructions are used for efficient table lookups with 2-bit
or 4-bit indices. LUTI2 reads indexed 8-bit or 16-bit elements from
the low 128 bits of the table vector using packed 2-bit indices,
while LUTI4 can read from the low 128 or 256 bits of the table
vector or from two table vectors using packed 4-bit indices.
These instructions fill the destination vector by copying elements
indexed by segments of the source vector, selected by the vector
segment index.

The changes include the addition of a new AArch64 option
extension "lut", __ARM_FEATURE_LUT preprocessor macro, definitions
for the new LUTI instruction shapes, and implementations of the
svluti2 and svluti4 builtins.

gcc/ChangeLog:

* config/aarch64/aarch64-c.cc
(aarch64_update_cpp_builtins): Add new flag TARGET_LUT.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(struct luti_base): Shape for lut intrinsics.
(SHAPE): Specializations for lut shapes for luti2 and luti4..
* config/aarch64/aarch64-sve-builtins-shapes.h: Declare lut
intrinsics.
* config/aarch64/aarch64-sve-builtins-sve2.cc
(class svluti_lane_impl): Define expand for lut intrinsics.
(FUNCTION): Define expand for lut intrinsics.
* config/aarch64/aarch64-sve-builtins-sve2.def
(REQUIRED_EXTENSIONS): Declare lut intrinsics behind lut flag.
(svluti2_lane): Define intrinsic behind flag.
(svluti4_lane): Define intrinsic behind flag.
* config/aarch64/aarch64-sve-builtins-sve2.h: Declare lut
intrinsics.
* config/aarch64/aarch64-sve-builtins.cc
(TYPES_bh_data): New type for byte and halfword.
(bh_data): Type array for byte and halfword.
(h_data): Type array for halfword.
* config/aarch64/aarch64-sve2.md
(@aarch64_sve_luti<LUTI_BITS><mode>): Instruction patterns for
lut intrinsics.
* config/aarch64/iterators.md: Iterators and attributes for lut
intrinsics.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: New test
macro.
* lib/target-supports.exp: Add lut flag to the for loop.
* gcc.target/aarch64/sve/acle/general-c/lut_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/lut_2.c: New test.
* gcc.target/aarch64/sve/acle/general-c/lut_3.c: New test.
* gcc.target/aarch64/sve/acle/general-c/lut_4.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_bf16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_f16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_s16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_s8.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_u16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_u8.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_bf16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_bf16_x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_f16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_f16_x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_s16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_s16_x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_s8.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_u16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_u16_x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_u8.c: New test.

c++: Don't ICE in build_class_member_access_expr during error recovery [PR118225]

The invalid case in this PR trips on an assertion in
build_class_member_access_expr that build_base_path would never return
an error_mark_node, which is actually incorrect if the object involves a
tree with an error_mark_node DECL_INITIAL, like here.

This patch changes the assert to not fire if an error has been reported.

PR c++/118225

gcc/cp/ChangeLog:

* typeck.cc (build_class_member_access_expr): Let errors that
that have been reported go through.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-ice21.C: New test.

middle-end: use ncopies both when registering and reading masks [PR118273]

When registering masks for SIMD clone we end up using nmasks instead of
nvectors where nmasks seems to compute the number of input masks required for
the call given the current simdlen.

This is however wrong as vect_record_loop_mask wants to know how many masks you
want to create from the given vectype. i.e. which level of rgroups to create.

This ends up mismatching with vect_get_loop_mask which uses nvectors and if the
return type is narrower than the input types there will be a mismatch which
causes us to try to read from the given rgroup. It only happens to work if the
function had an additional argument that's wider or if all elements and return
types are the same size.

This fixes it by using nvectors during registration as well, which has already
taken into account SLP and VF.

gcc/ChangeLog:

PR middle-end/118273
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Use nvectors when
doing mask registrations.

gcc/testsuite/ChangeLog:

PR middle-end/118273
* gcc.target/aarch64/vect-simd-clone-4.c: New test.

aarch64: Drop ILP32 from default elf multilibs after deprecation

Following the deprecation of ILP32 *-elf builds fail now due to -Werror on the
deprecation warning. This is because on embedded builds ILP32 is part of the
default multilib.

This patch removed it from the default target as the build would fail anyway.

gcc/ChangeLog:

* config.gcc (aarch64-*-elf): Drop ILP32 from default multilibs.

LoongArch: Implement target pragma.

The target pragmas defined correspond to the target function attributes.

This implementation is derived from AArch64.

gcc/ChangeLog:

* config/loongarch/loongarch-protos.h
(loongarch_reset_previous_fndecl): Add function declaration.
(loongarch_save_restore_target_globals): Likewise.
(loongarch_register_pragmas): Likewise.
* config/loongarch/loongarch-target-attr.cc
(loongarch_option_valid_attribute_p): Optimize the processing
of attributes.
(loongarch_pragma_target_parse): New functions.
(loongarch_register_pragmas): Likewise.
* config/loongarch/loongarch.cc
(loongarch_reset_previous_fndecl): New functions.
(loongarch_set_current_function): When the old_tree is the same
as the new_tree, the rules for using registers, etc.,
are set according to the option values to ensure that the
pragma can be processed correctly.
* config/loongarch/loongarch.h (REGISTER_TARGET_PRAGMAS):
Define macro.
* doc/extend.texi: Supplemental Documentation.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/arch-func-attr-1.c: Add '#pragma'.
* gcc.target/loongarch/cmodel-func-attr-1.c: Likewise.
* gcc.target/loongarch/lasx-func-attr-1.c: Likewise.
* gcc.target/loongarch/lsx-func-attr-1.c: Likewise.
* gcc.target/loongarch/strict_align-func-attr-1.c: Likewise.
* gcc.target/loongarch/strict_align-func-attr-2.c: Likewise.
* gcc.target/loongarch/vector-func-attr-1.c: Likewise.
* gcc.target/loongarch/arch-pragma-attr-1.c: Likewise.
* gcc.target/loongarch/cmodel-pragma-attr-1.c: New test.
* gcc.target/loongarch/lasx-pragma-attr-1.c: New test.
* gcc.target/loongarch/lasx-pragma-attr-2.c: New test.
* gcc.target/loongarch/lsx-pragma-attr-1.c: New test.
* gcc.target/loongarch/lsx-pragma-attr-2.c: New test.
* gcc.target/loongarch/strict_align-pragma-attr-1.c: New test.
* gcc.target/loongarch/strict_align-pragma-attr-2.c: New test.
* gcc.target/loongarch/vector-pragma-attr-1.c: New test.
* gcc.target/loongarch/pragma-push-pop.c: New test.

LoongArch: Implement target attribute.

Add function attributes support for LoongArch.

Currently, the following items are supported:

        __attribute__ ((target ("{no-}strict-align")))
        __attribute__ ((target ("cmodel=")))
        __attribute__ ((target ("arch=")))
        __attribute__ ((target ("tune=")))
        __attribute__ ((target ("{no-}lsx")))
        __attribute__ ((target ("{no-}lasx")))

This implementation is derived from AArch64.

gcc/ChangeLog:

* attr-urls.def: Regenerate.
* config.gcc: Add loongarch-target-attr.o to extra_objs.
* config/loongarch/loongarch-protos.h
(loongarch_option_valid_attribute_p): Function declaration.
(loongarch_option_override_internal): Likewise.
* config/loongarch/loongarch.cc
(loongarch_option_override_internal): Delete the modifications
to target_option_default_node and target_option_current_node.
(loongarch_set_current_function): Add annotation information.
(loongarch_option_override): add assignment operations to
target_option_default_node and target_option_current_node.
(TARGET_OPTION_VALID_ATTRIBUTE_P): Define.
* config/loongarch/t-loongarch: Add compilation of target file
loongarch-target-attr.o.
* doc/extend.texi: Add description information of LoongArch
Function Attributes.
* config/loongarch/loongarch-target-attr.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/arch-func-attr-1.c: New test.
* gcc.target/loongarch/cmodel-func-attr-1.c: New test.
* gcc.target/loongarch/lasx-func-attr-1.c: New test.
* gcc.target/loongarch/lasx-func-attr-2.c: New test.
* gcc.target/loongarch/lsx-func-attr-1.c: New test.
* gcc.target/loongarch/lsx-func-attr-2.c: New test.
* gcc.target/loongarch/strict_align-func-attr-1.c: New test.
* gcc.target/loongarch/strict_align-func-attr-2.c: New test.
* gcc.target/loongarch/vector-func-attr-1.c: New test.
* gcc.target/loongarch/attr-check-error-message.c: New test.

testsuite: Fix test failing with -fimplicit-constexpr [PR118277]

While testing an unrelated C++ patch with "make check-c++-all", I
noticed that r15-6760-g38a13ea4117b96 added a test case that fails with
-fimplicit-constexpr.

The problem is that this test unconditionally expects an error stating
that a non-constexpr function is called, but that function is
auto-magically constexpr'd under -fimplicit-constexpr.

As suggested by Jakub, this patch simply passes -fno-implicit-constexpr
in that test.

PR c++/118277

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/constexpr-asm-5.C: Pass -fno-implicit-constexpr.

Add warning for non-spec compliant FMV in Aarch64

This patch adds a warning when FMV is used for Aarch64.

The reasoning for this is the ACLE [1] spec for FMV has diverged
significantly from the current implementation and we want to prevent
potential future compatability issues.

There is a patch for an ACLE compliant version of target_version and
target_clone in progress but it won't make gcc-15.

This has been bootstrap and regression tested for Aarch64.
Is this okay for master and packport to gcc-14?

[1] https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_process_target_version_attr): Add experimental warning.
* config/aarch64/aarch64.opt: Add command line option to disable
warning.
* doc/invoke.texi: Add documentation for -W[no-]experimental-fmv-target.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/mv-1.C: Add CLI flag.
* g++.target/aarch64/mv-symbols1.C: Add CLI flag.
* g++.target/aarch64/mv-symbols2.C: Add CLI flag.
* g++.target/aarch64/mv-symbols3.C: Add CLI flag.
* g++.target/aarch64/mv-symbols4.C: Add CLI flag.
* g++.target/aarch64/mv-symbols5.C: Add CLI flag.
* g++.target/aarch64/mv-warning1.C: New test.
* g++.target/aarch64/mvc-symbols1.C: Add CLI flag.
* g++.target/aarch64/mvc-symbols2.C: Add CLI flag.
* g++.target/aarch64/mvc-symbols3.C: Add CLI flag.
* g++.target/aarch64/mvc-symbols4.C: Add CLI flag.
* g++.target/aarch64/mv-pragma.C: Add CLI flag.
* g++.target/aarch64/mvc-warning1.C: New test.

c++: Speed up compilation of large char array initializers when not using #embed

The following patch (again, on top of the #embed patchset
attempts to optimize compilation of large {{{,un}signed ,}char,std::byte}
array initializers when not using #embed in the source.

Unlike the C patch which is done during the parsing of initializers this
is done when lexing tokens into an array, because C++ lexes all tokens
upfront and so by the time we parse the initializers we already have 16
bytes per token allocated (i.e. 32 extra compile time memory bytes per
one byte in the array).

The drawback is again that it can result in worse locations for diagnostics
(-Wnarrowing, -Wconversion) when initializing signed char arrays with values
128..255.  Not really sure what to do about this though unlike the C case,
the locations would need to be preserved through reshape_init* and perhaps
till template instantiation.
For #embed, there is just a single location_t (could be range of the
directive), for diagnostics perhaps we could extend it to say byte xyz of
the file embedded here or something like that, but the optimization done by
this patch, either we'd need to bump the minimum limit at which to try it,
or say temporarily allocate a location_t array for each byte and then clear
it when we no longer need it or something.
I've been using the same testcases as for C, with #embed of 100'000'000
bytes:
time ./cc1plus -quiet -O2 -o test4a.s2 test4a.c

real    0m0.972s
user    0m0.578s
sys     0m0.195s
with xxd -i alternative of the same data without this patch it consumed
around 13.2GB of RAM and
time ./cc1plus -quiet -O2 -o test4b.s4 test4b.c

real    3m47.968s
user    3m41.907s
sys     0m5.015s
and the same with this patch it consumed around 3.7GB of RAM and
time ./cc1plus -quiet -O2 -o test4b.s3 test4b.c

real    0m24.772s
user    0m23.118s
sys     0m1.495s

2025-01-21  Jakub Jelinek  <jakub@redhat.com>

* parser.cc (cp_lexer_new_main): Attempt to optimize large sequences
of CPP_NUMBER with int type and values 0-255 separated by CPP_COMMA
into CPP_EMBED with RAW_DATA_CST u.value.

c, c++: Return 1 for __has_builtin(__builtin_va_arg) and __has_builtin(__builtin_c23_va_start)

The Linux kernel uses its own copy of stdarg.h.
Now, before GCC 15, our stdarg.h had
    #if defined __STDC_VERSION__ && __STDC_VERSION__ > 201710L
    #define va_start(v, ...)        __builtin_va_start(v, 0)
    #else
    #define va_start(v,l)   __builtin_va_start(v,l)
    #endif
va_start definition but GCC 15 has:
    #if defined __STDC_VERSION__ && __STDC_VERSION__ > 201710L
    #define va_start(...) __builtin_c23_va_start(__VA_ARGS__)
    #else
    #define va_start(v,l)   __builtin_va_start(v,l)
    #endif

I wanted to suggest to the kernel people during their porting to C23
that they'd better use C23 compatible va_start macro definition,
but to make it portable, I think they really want something like
    #if defined __STDC_VERSION__ && __STDC_VERSION__ > 201710L
    #define va_start(v, ...)        __builtin_va_start(v, 0)
    #ifdef __has_builtin
    #if __has_builtin(__builtin_c23_va_start)
    #undef va_start
    #define va_start(...) __builtin_c23_va_start(__VA_ARGS__)
    #endif
    #else
    #define va_start(v,l)   __builtin_va_start(v,l)
    #endif
or so (or with >= 202311L), as GCC 13-14 and clang don't support
__builtin_c23_va_start (yet?) and one gets better user experience with
that.

Except it seems __has_builtin(__builtin_c23_va_start) doesn't actually work,
it works for most of the stdarg.h __builtin_va_*, doesn't work for
__builtin_va_arg (neither C nor C++) and didn't work for
__builtin_c23_va_start if it was available.

The following patch wires __has_builtin for those.

2025-01-21  Jakub Jelinek  <jakub@redhat.com>

gcc/c/
* c-decl.cc (names_builtin_p): Return 1 for RID_C23_VA_START and
RID_VA_ARG.
gcc/cp/
* cp-objcp-common.cc (names_builtin_p): Return 1 for RID_VA_ARG.
gcc/testsuite/
* c-c++-common/cpp/has-builtin-4.c: New test.

c++: Handle RAW_DATA_CST in add_list_candidates [PR118532]

This is the second bug discovered today with the
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673945.html
hack but then turned into proper testcases where embed-2[23].C FAILed
since introduction of optimized #embed support and the others when
optimizing large C++ initializers using RAW_DATA_CST.

The add_list_candidates problem is the same as with
make_tree_vector_from_ctor, unfortunately it can't call that
function because it can have those additional artificial arguments
that need to be pushed earlier.
When working on the patch, I've also noticed an error where we didn't
know how to dump RAW_DATA_CST, so I've added support for that too.

2025-01-21 Jakub Jelinek <jakub@redhat.com>

PR c++/118532
* call.cc (add_list_candidates): Handle RAW_DATA_CST among init_list
elts.
* error.cc (dump_expr_init_vec): Handle RAW_DATA_CST among v elts.

* g++.dg/cpp/embed-22.C: New test.
* g++.dg/cpp/embed-23.C: New test.
* g++.dg/cpp0x/pr118532.C: New test.
* g++.dg/cpp2a/explicit20.C: New test.

Daily bump.

c++/modules: Check linkage of structured binding decls

When looking at PR c++/118513 I noticed that we don't currently check
the linkage of structured binding declarations in modules. This patch
adds those checks, and corrects decl_linkage to properly recognise
structured binding declarations as potentially having linkage.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_decomposition_declaration): Check linkage
of structured bindings in modules.
* tree.cc (decl_linkage): Structured bindings don't necessarily
have no linkage.

gcc/testsuite/ChangeLog:

* g++.dg/modules/export-6.C: Add structured binding tests.
* g++.dg/modules/hdr-2.H: Likewise.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++/modules: Handle mismatching TYPE_CANONICAL when deduping partial specs [PR118101]

In r15-4862 we ensured that merging a partial specialisation would
properly update its TYPE_CANONICAL. However, this confuses the deduping
mechanism, since the canonical type has updated out from under it,
causing is_matching_decl to crash when seeing the equivalent types with
different TYPE_CANONICAL.

This patch solves the issue by forcing structural equality checking for
this case; this way mismatching TYPE_CANONICAL doesn't cause issues, but
we still can handle the case that the types are legitimately different.

PR c++/118101

gcc/cp/ChangeLog:

* module.cc (trees_in::decl_value): Use structural equality when
deduping partial specs with mismatching canonical types.

gcc/testsuite/ChangeLog:

* g++.dg/modules/partial-7.h: New test.
* g++.dg/modules/partial-7_a.C: New test.
* g++.dg/modules/partial-7_b.C: New test.
* g++.dg/modules/partial-7_c.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

[PR118560][LRA]: Fix typo in checking secondary memory mode for the reg class

The patch for PR118067 wrongly checked hard reg set subset. It worked for
the equal sets as in PR118067. But it was wrong in other cases as in
PR118560 (inordinate compile time).

gcc/ChangeLog:

PR target/118560
* lra-constraints.cc (invalid_mode_reg_p): Exchange args in
hard_reg_set_subset_p call.

[PR target/116256] Adjust expected output in a couple testcases

I've had a long standing TODO to review the RISC-V testsuite regressions from
enabling the late-combine pass (pr116256). I adjusted a few cases months ago,
this adjusts a couple more were it looks like the right thing to do.

All that's left after this are the vls/dup-? tests which regress in meaningful
ways and I'm still investigating reasonable approaches to fix them (they play
into the whole mvconst_internal pattern situation), late-combine isn't doing
anything wrong.

PR target/116256
gcc/testsuite
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-37.c: Update expected
output.
* gcc.target/riscv/rvv/vsetvl/vsetvl-15.c: Likewise.

[PR target/114442] Add reservations for all insn types to xiangshan-nanhu model

The RISC-V backend has checks to verify that every used insn has an associated
type and that every insn type maps to some reservation in the DFA model.  If
either test fails we ICE.

With the cpu/isa allowed to vary independently from the tune/scheduler model,
it's entirely possible (in fact trivial) to trigger those kinds of ICEs.

This patch "fixes" the ICEs for xiangshan-nanhu by throwing every unknown insn
type into a special bucket  I wouldn't be surprised if a few of them are
implemented (like rotates as the chip seems to have other bitmanip extensions).
But I know nothing about this design and the DFA author hasn't responded to
requests to update the DFA in ~6 months.

This should dramatically reduce the number of ICEs in the testsuite if someone
were to turn on xiangshan-nanhu scheduling.

Not strictly a regression, but a bugfix and highly isolated to the
xiangshan-nanhu tuning in the RISC-V backend.  So I'm gating this into gcc-15,
assuming pre-commit doesn't balk.

PR target/114442
gcc/
* config/riscv/xiangshan.md: Add missing insn types to a
new dummy insn reservation.

gcc/testsuite

* gcc.target/riscv/pr114442.c: New test.

[PR target/116256] Fix latent regression in pattern to associate arithmetic to simplify constants

This is something I spotted working on an outstanding issue with pr116256.
It's a latent regression.   I'm reasonably sure that with some effort I could
find a testcase that would represent a regression, probably just by adjusting
the constant index in "f" within in gcc.c-torture/execute/index-1.c

We have to define_insn_and_split patterns that potentially reassociate shifts
and adds to make a constant term cheaper to synthesize. The split code for
these two patterns assumes that if the left shifted constant is cheaper than
the right shifted constant then the left shifted constant can be loaded with a
trivial set.  That is not always the case -- and we'll ICE.

This patch simplifies the matching condition so that it only matches when the
constant can be loaded with a single instruction.

Tested in my tester on rv32 and rv64.  Will wait for precommit CI to render a
verdict.

Jeff

PR target/116256
gcc/
* config/riscv/riscv.md (reassocating constant addition): Adjust
condition to avoid creating an unrecognizable insn.

Update gcc zh_CN.po

* zh_CN.po: Update.

[PR117868][LRA]: Restrict the reuse of spill slots

This is an LRA bug derived from reuse spilling slots after frame pointer spilling.
The slot was created for QImode (1 byte) and it was reused after spilling of the
frame pointer for TImode register (16 bytes long) and it overlaps other slots.

Wrong things happened while `lra_spill ()'
---------------------------- part of lra-spills.cc ----------------------------
  n = assign_spill_hard_regs (pseudo_regnos, n);
  slots_num = 0;
  assign_stack_slot_num_and_sort_pseudos (pseudo_regnos, n);  <--- first call ---
  for (i = 0; i < n; i++)
    if (pseudo_slots[pseudo_regnos[i]].mem == NULL_RTX)
      assign_mem_slot (pseudo_regnos[i]);
  if ((n2 = lra_update_fp2sp_elimination (pseudo_regnos)) > 0)
    {
      /* Assign stack slots to spilled pseudos assigned to fp.  */
      assign_stack_slot_num_and_sort_pseudos (pseudo_regnos, n2);  <--- second call ---
      for (i = 0; i < n2; i++)
if (pseudo_slots[pseudo_regnos[i]].mem == NULL_RTX)
  assign_mem_slot (pseudo_regnos[i]);
    }
------------------------------------------------------------------------------

In a first call of `assign_stack_slot_num_and_sort_pseudos(...)' LRA allocates slot #17
for r93 (QImode - 1 byte).
In a second call of `assign_stack_slot_num_and_sort_pseudos(...)' LRA reuse slot #17 for
r114 (TImode - 16 bytes).
It's wrong. We can't reuse 1 byte slot #17 for 16 bytes register.

The code in patch does reuse slots only without allocated memory or only with equal or
smaller registers with equal or smaller alignment.
Also, a small fix for debugging output of slot width.
Print slot size as width, not a 0 as a size of (mem/c:BLK (...)).

PR rtl-optimization/117868
gcc/
* lra-spills.cc (assign_stack_slot_num_and_sort_pseudos): Reuse slots
only without allocated memory or only with equal or smaller registers
with equal or smaller alignment.
(lra_spill): Print slot size as width.

Fortran: improve error message for conflicting OpenMP clauses [PR107122]

PR fortran/107122

gcc/fortran/ChangeLog:

* openmp.cc (resolve_omp_clauses): Add 'with' to error message text.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/order-8.f90: Adjust pattern.

vect: Preserve OMP info for conditional stores [PR118348]

OMP reductions are lowered into the form:

    idx = .OMP_SIMD_LANE (simuid, 0);
    ...
    oldval = D.anon[idx];
    newval = oldval op ...;
    D.anon[idx] = newval;

So if the scalar loop has a {0, +, 1} iv i, idx = i % vf.
Despite this wraparound, the vectoriser pretends that the D.anon
accesses are linear.  It records the .OMP_SIMD_LANE's second argument
(val) in the data_reference aux field (-1 - val) and then copies this
to the stmt_vec_info simd_lane_access_p field (val + 1).

vectorizable_load and vectorizable_store use simd_lane_access_p
to detect accesses of this form and suppress the vector pointer
increments that would be used for genuine linear accesses.

The difference in this PR is that the reduction is conditional,
and so the store back to D.anon is recognised as a conditional
store pattern.  simd_lane_access_p was not being copied across
from the original stmt_vec_info to the pattern stmt_vec_info,
meaning that it was vectorised as a normal linear store.

gcc/
PR tree-optimization/118348
* tree-vectorizer.cc (vec_info::move_dr): Copy
STMT_VINFO_SIMD_LANE_ACCESS_P.

gcc/testsuite/
PR tree-optimization/118348
* gcc.target/aarch64/pr118348_1.c: New test.
* gcc.target/aarch64/pr118348_2.c: Likewise.

Revert "vect: Preserve OMP info for conditional stores [PR118384]"

This reverts commit 8edf8b552313951cb4f2f97821ee4b3820c9506b.

vect: Preserve OMP info for conditional stores [PR118384]

OMP reductions are lowered into the form:

    idx = .OMP_SIMD_LANE (simuid, 0);
    ...
    oldval = D.anon[idx];
    newval = oldval op ...;
    D.anon[idx] = newval;

So if the scalar loop has a {0, +, 1} iv i, idx = i % vf.
Despite this wraparound, the vectoriser pretends that the D.anon
accesses are linear.  It records the .OMP_SIMD_LANE's second argument
(val) in the data_reference aux field (-1 - val) and then copies this
to the stmt_vec_info simd_lane_access_p field (val + 1).

vectorizable_load and vectorizable_store use simd_lane_access_p
to detect accesses of this form and suppress the vector pointer
increments that would be used for genuine linear accesses.

The difference in this PR is that the reduction is conditional,
and so the store back to D.anon is recognised as a conditional
store pattern.  simd_lane_access_p was not being copied across
from the original stmt_vec_info to the pattern stmt_vec_info,
meaning that it was vectorised as a normal linear store.

gcc/
PR tree-optimization/118384
* tree-vectorizer.cc (vec_info::move_dr): Copy
STMT_VINFO_SIMD_LANE_ACCESS_P.

gcc/testsuite/
PR tree-optimization/118384
* gcc.target/aarch64/pr118384_1.c: New test.
* gcc.target/aarch64/pr118384_2.c: Likewise.

aarch64: Fix invalid subregs in xorsign [PR118501]

In the testcase, we try to use xorsign on:

   (subreg:DF (reg:TI R) 8)

i.e. the highpart of the TI.  xorsign wants to take a V2DF
paradoxical subreg of this, which is rightly rejected as a direct
operation.  In cases like this, we need to force the highpart into
a fresh register first.

gcc/
PR target/118501
* config/aarch64/aarch64.md (@xorsign<mode>3): Use
force_lowpart_subreg.

gcc/testsuite/
PR target/118501
* gcc.c-torture/compile/pr118501.c: New test.

aarch64: Add missing simd requirements for INS [PR118531]

In g:b096a6ebe9d9f9fed4c105f6555f724eb32af95c I'd forgotten
to gate some uses of INS on TARGET_SIMD.

gcc/
PR target/118531
* config/aarch64/aarch64.md (*insv_reg<mode>_<SUBDI_BITS>)
(*aarch64_bfi<GPI:mode><ALLX:mode>_<SUBDI_BITS>)
(*aarch64_bfidi<ALLX:mode>_subreg_<SUBDI_BITS>): Add missing
simd requirements.

gcc/testsuite/
* gcc.target/aarch64/ins_bitfield_1a.c: New test.
* gcc.target/aarch64/ins_bitfield_3a.c: Likewise.
* gcc.target/aarch64/ins_bitfield_5a.c: Likewise.

d: Fix failing test with 32-bit compiler [PR114434]

Since the introduction of gdc.test/runnable/test23514.d, it's exposed an
incorrect compilation when adding a 64-bit constant to a link-time
address. The current cast to size_t causes a loss of precision, which
can result in incorrect compilation.

PR d/114434

gcc/d/ChangeLog:

* expr.cc (ExprVisitor::visit (PtrExp *)): Get the offset as a
dinteger_t rather than a size_t.
(ExprVisitor::visit (SymOffExp *)): Likewise.

Fortran: do not copy back for parameter actual arguments [PR81978]

When an array is packed for passing as an actual argument, and the array
has the PARAMETER attribute (i.e., it is a named constant that can reside
in read-only memory), do not copy back (unpack) from the temporary.

PR fortran/81978

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_conv_array_parameter): Do not copy back data
if actual array parameter has the PARAMETER attribute.
* trans-expr.cc (gfc_conv_subref_array_arg): Likewise.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr81978.f90: New test.

c++: Handle RAW_DATA_CST in make_tree_vector_from_ctor [PR118528]

This is the first bug discovered today with the
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673945.html
hack but then turned into proper testcases where embed-21.C FAILed
since introduction of optimized #embed support and the other when
optimizing large C++ initializers using RAW_DATA_CST.

The problem is that the C++ FE calls make_tree_vector_from_ctor
and uses that as arguments vector for deduction guide handling.
The call.cc code isn't prepared to handle RAW_DATA_CST just about
everywhere, so I think it is safer to make sure RAW_DATA_CST only
appears in CONSTRUCTOR_ELTS and nowhere else.
Thus, the following patch expands the RAW_DATA_CSTs from initializers
into multiple INTEGER_CSTs in the returned vector.

2025-01-20 Jakub Jelinek <jakub@redhat.com>

PR c++/118528
* c-common.cc (make_tree_vector_from_ctor): Expand RAW_DATA_CST
elements from the CONSTRUCTOR to individual INTEGER_CSTs.

* g++.dg/cpp/embed-21.C: New test.
* g++.dg/cpp2a/class-deduction-aggr16.C: New test.

RISC-V: Correct the mode that is causing the program to fail for XTheadCondMov

For XTheadCondMov, the bit width of rs2 should always be XLEN-sized, otherwise
the program logic will be wrong.

Reference form
https://github.com/XUANTIE-RV/thead-extension-spec/releases/download/2.3.0/xthead-2023-11-10-2.3.0.pdf

Synopsis
Move if equal zero.

Mnemonic
th.mveqz rd, rs1, rs2

Description
This instruction moves the content of register rs1 into rd if the content of rs2 is 0x0.
Otherwise, the value of rd does not change.

Operation
if (reg[rs2] == 0x0)
reg[rd] := reg[rs1]

gcc/ChangeLog:

* config/riscv/thead.md (*th_cond_mov<GPR:mode><GPR2:mode>):
Change GPR2 to X.
(*th_cond_mov<GPR:mode>): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadcondmov-bug.c: New test.

inline: Purge the abnormal edges as needed in fold_marked_statements [PR118077]

While fixing PR target/117665, I had noticed that fold_marked_statements
would not purge the abnormal edges which could not be taken any more due
to folding a call (devirtualization or simplification of a [target] builtin).
Devirutalization could also cause a call that used to be able to have an
abornal edge become one not needing one too so this was needed for GCC 15.

Bootstrapped and tested on x86_64-linux-gnu

PR tree-optimization/118077
PR tree-optimization/117668

gcc/ChangeLog:

* tree-inline.cc (fold_marked_statements): Purge abnormal edges
as needed.

gcc/testsuite/ChangeLog:

* g++.dg/opt/devirt6.C: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

libstdc++: perfectly forward std::ranges::clamp arguments

As reported in PR118185, std::ranges::clamp does not correctly forward
the projected value to the comparator. Add the missing forward.

libstdc++-v3/ChangeLog:

PR libstdc++/118185
PR libstdc++/100249
* include/bits/ranges_algo.h (__clamp_fn): Correctly forward the
projected value to the comparator.
* testsuite/25_algorithms/clamp/118185.cc: New test.

Signed-off-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

arm, testsuite: fix fast-math-bb-slp-complex-mla-float.c dg-add-options

The test uses floats, not fp16 so it should use arm_v8_3a_complex_neon
instead of arm_v8_3a_fp16_complex_neon.

This makes it PASS on arm-linux-gnueabihf instead of being UNRESOLVED.

gcc/testsuite/ChangeLog:
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-float.c: Use
arm_v8_3a_complex_neon.

arm, testsuite: remove duplicate dg-add-options arm_v8_3a_complex_neon

These two testcases have twice the same dg-add-options
arm_v8_3a_complex_neon, the patch removes one of them.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/complex/complex-operations-run.c: Remove duplicate
dg-add-options arm_v8_3a_complex_neon.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c:
Likewise.

tree-optimization/117875 - missed SLP vectorization

There's a discrepancy in SLP vs non-SLP vectorization that SLP build
does not handle plain SSA copies (which should have been elimiated
earlier). But this now bites back since non-SLP happily handles them,
causing a regression with --param vect-force-slp=1 which is now default,
resulting in a big performance regression in 456.hmmer.

So the following restores parity between SLP and non-SLP here, defering
the missed copy elimination to later (PR118565).

PR tree-optimization/117875
* tree-vect-slp.cc (vect_build_slp_tree_1): Handle SSA copies.

LoongArch: Improve reassociation for bitwise operation and left shift [PR 115921]

For things like

        (x | 0x101) << 11

It's obvious to write:

        ori     $r4,$r4,257
        slli.d  $r4,$r4,11

But we are actually generating something insane:

        lu12i.w $r12,524288>>12             # 0x80000
        ori     $r12,$r12,2048
        slli.d  $r4,$r4,11
        or      $r4,$r4,$r12
        jr      $r1

It's because the target-independent canonicalization was written before
we have all the RISC targets where loading an immediate may need
multiple instructions.  So for these targets we need to handle this in
the target code.

We do the reassociation on our own (i.e. reverting the
target-independent reassociation) if "(reg [&|^] mask) << shamt" does
not need to load mask into an register, and either:
- (mask << shamt) needs to be loaded into an register, or
- shamt is a const_immalsl_operand, so the outer shift may be further
  combined with an add.

gcc/ChangeLog:

PR target/115921
* config/loongarch/loongarch-protos.h
(loongarch_reassoc_shift_bitwise): New function prototype.
* config/loongarch/loongarch.cc
(loongarch_reassoc_shift_bitwise): Implement.
* config/loongarch/loongarch.md
(*alslsi3_extend_subreg): New define_insn_and_split.
(<any_bitwise:optab>_shift_reverse<X:mode>): New
define_insn_and_split.
(<any_bitwise:optab>_alsl_reversesi_extended): New
define_insn_and_split.
(zero_extend_ashift): Remove as it's just a special case of
and_shift_reversedi, and it does not make too much sense to
write "alsl.d rd,rs,r0,shamt" instead of "slli.d rd,rs,shamt".
(bstrpick_alsl_paired): Remove as it is already done by
splitting and_shift_reversedi into and + ashift first, then
late combining the ashift and a further add.

gcc/testsuite/ChangeLog:

PR target/115921
* gcc.target/loongarch/bstrpick_alsl_paired.c (scan-rtl-dump):
Scan for and_shift_reversedi instead of the removed
bstrpick_alsl_paired.
* gcc.target/loongarch/bitwise-shift-reassoc.c: New test.

LoongArch: Simplify using bstr{ins,pick} instructions for and

For bstrins, we can merge it into and<mode>3 instead of having a
separate define_insn.

For bstrpick, we can use the constraints to ensure the first source
register and the destination register are the same hardware register,
instead of emitting a move manually.

This will simplify the next commit where we'll reassociate bitwise
and left shift for better code generation.

gcc/ChangeLog:

* config/loongarch/constraints.md (Yy): New define_constriant.
* config/loongarch/loongarch.cc (loongarch_print_operand):
For "%M", output the index of bits to be used with
bstrins/bstrpick.
* config/loongarch/predicates.md (ins_zero_bitmask_operand):
Exclude low_bitmask_operand as for low_bitmask_operand it's
always better to use bstrpick instead of bstrins.
(and_operand): New define_predicate.
* config/loongarch/loongarch.md (any_or): New
define_code_iterator.
(bitwise_operand): New define_code_attr.
(*<optab:any_or><mode:GPR>3): New define_insn.
(*and<mode:GPR>3): New define_insn.
(<optab:any_bitwise><mode:X>3): New define_expand.
(and<mode>3_extended): Remove, replaced by the 3rd alternative
of *and<mode:GPR>3.
(bstrins_<mode>_for_mask): Remove, replaced by the 4th
alternative of *and<mode:GPR>3.
(*<optab:any_bitwise>si3_internal): Remove, already covered by
the *<optab:any_or><mode:GPR>3 and *and<mode:GPR>3 templates.

testsuite: Fix name of PR116348 test case

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr116438.c: Rename to ...
* gcc.c-torture/compile/pr116348.c: ... this.

tree-optimization/118552 - failed LC SSA update after unrolling

When unrolling changes nesting relationship of loops we fail to
mark blocks as in need to change for LC SSA update. Specifically
the LC SSA PHI on a former inner loop exit might be misplaced
if that loop becomes a sibling of its outer loop.

PR tree-optimization/118552
* cfgloopmanip.cc (fix_loop_placement): Properly mark
exit source blocks as to be scanned for LC SSA update when
the loops nesting relationship changed.
(fix_loop_placements): Adjust.
(fix_bb_placements): Likewise.

* gcc.dg/torture/pr118552.c: New testcase.

nvptx: Gracefully handle '-mptx=3.1' if neither sm_30 nor sm_35 multilib variant is built

For example, for GCC/nvptx built with '--with-arch=sm_52' (current default)
and '--without-multilib-list', neither a sm_30 nor a sm_35 multilib variant
is built, and thus no '-mptx=3.1' sub-variant either.  Such a configuration
is possible as of commit 86b3a7532d56f74fcd1c362f2da7f95e8cc4e4a6
"nvptx: Support '--with-multilib-list'", but currently results in the
following bogus behavior:

    [...]/xgcc -print-multi-directory -mgomp -march=sm_52
    mgomp
    [...]/xgcc -print-multi-directory -mgomp -march=sm_35
    mgomp
    [...]/xgcc -print-multi-directory -mgomp -march=sm_30
    mgomp
    [...]/xgcc -print-multi-directory -mgomp -march=sm_35 -mptx=3.1
    .
    [...]/xgcc -print-multi-directory -mgomp -march=sm_30 -mptx=3.1
    .

The latter two '.' are unexpected; linking OpenMP/nvptx offloading code
like this fails with: 'unresolved symbol __nvptx_uni', for example.
Instead of '.', the latter two should print 'mgomp', too.  To achieve that,
we must not set up the '-mptx=3.1' multilib axis if no '-mptx=3.1'
sub-variant is built.

gcc/
* config/nvptx/t-nvptx (MULTILIB_OPTIONS): Don't add 'mptx=3.1' if
neither sm_30 nor sm_35 multilib variant is built.

tree, c++: Consider TARGET_EXPR invariant like SAVE_EXPR [PR118509]

My October PR117259 fix to get_member_function_from_ptrfunc to use a
TARGET_EXPR rather than SAVE_EXPR unfortunately caused some regressions as
well as the following testcase shows.
What happens is that
get_member_function_from_ptrfunc -> build_base_path calls save_expr,
so since the PR117259 change in mnay cases it will call save_expr on
a TARGET_EXPR.  And, for some strange reason a TARGET_EXPR is not considered
an invariant, so we get a SAVE_EXPR wrapped around the TARGET_EXPR.
That SAVE_EXPR <TARGET_EXPR <...>> gets initially added only to the second
operand of ?:, so at that point it would still work fine during expansion.
But unfortunately an expression with that subexpression is handed to the
caller also through *instance_ptrptr = instance_ptr; and gets evaluated
once again when computing the first argument to the method.
So, essentially, we end up with
(TARGET_EXPR <D.2907, ...>, (... ? ... SAVE_EXPR <TARGET_EXPR <D.2907, ...>
... : ...)) (... SAVE_EXPR <TARGET_EXPR <D.2907, ...> ..., ...);
and while D.2907 is initialized during gimplification in the code dominating
everything that uses it, the extra temporary created for the SAVE_EXPR
is initialized only conditionally (if the ?: condition is true) but then
used unconditionally, so we get
pmf-4.C: In function ‘void foo(C, B*)’:
pmf-4.C:12:11: warning: ‘<anonymous>’ may be used uninitialized [-Wmaybe-uninitialized]
   12 |   (y->*x) ();
      |   ~~~~~~~~^~
pmf-4.C:12:11: note: ‘<anonymous>’ was declared here
   12 |   (y->*x) ();
      |   ~~~~~~~~^~
diagnostic and wrong-code issue too.

The following patch fixes it by considering a TARGET_EXPR invariant
for SAVE_EXPR purposes the same as SAVE_EXPR is.  Really creating another
temporary for it is just a waste of the IL.

Unfortunately I had to tweak the omp matching code to be able to accept
TARGET_EXPR the same as SAVE_EXPR.

2025-01-20  Jakub Jelinek  <jakub@redhat.com>

PR c++/118509
gcc/
* tree.cc (tree_invariant_p_1): Return true for TARGET_EXPR too.
gcc/c-family/
* c-omp.cc (c_finish_omp_for): Handle TARGET_EXPR in first operand
of COMPOUND_EXPR incr the same as SAVE_EXPR.
gcc/testsuite/
* g++.dg/expr/pmf-4.C: New test.

tree-ssa-dce: Fix calloc handling [PR118224]

As reported by Dimitar, this should have been a multiplication, but wasn't
caught because in the test (~(__SIZE_TYPE__) 0) / 2 is the largest accepted
size and so adding 3 to it also resulted in "overflow".

The following patch adds one subtest to really verify it is a multiplication
and fixes the operation.

2025-01-20 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/118224
* tree-ssa-dce.cc (is_removable_allocation_p): Multiply a1 by a2
instead of adding it.

* gcc.dg/pr118224.c: New test.

s390: Update vec_(load,store)_len(,_r)

Reflect latest updates for vec_(load,store)_len(,_r) which means that
all types except character based types are deprecated.

gcc/ChangeLog:

* config/s390/s390-builtins.def (s390_vec_load_len): Deprecate
some overloads.
(s390_vec_store_len): Deprecate some overloads.
(s390_vec_load_len_r): Add.
(s390_vec_store_len_r): Add.
* config/s390/s390-c.cc (s390_vec_load_len_r): Add.
(s390_vec_store_len_r): Add.
* config/s390/vecintrin.h (vec_load_len_r): Redefine.
(vec_store_len_r): Redefine.

s390: Vector shift: Add 128-bit integer support

Add 128-bit vector shift support. Deprecate vector shift by byte
builtins where the shift amount is not of type unsigned character.

gcc/ChangeLog:

* config/s390/s390-builtins.def: Add 128-bit variants.
* config/s390/s390-builtin-types.def: Update accordingly.
* config/s390/vector.md (<vec_shifts_name><mode>3): Add 128-bit
variants.
* config/s390/vx-builtins.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vec-shift-10.c: New test.
* gcc.target/s390/vector/vec-shift-11.c: New test.
* gcc.target/s390/vector/vec-shift-12.c: New test.
* gcc.target/s390/vector/vec-shift-3.c: New test.
* gcc.target/s390/vector/vec-shift-4.c: New test.
* gcc.target/s390/vector/vec-shift-5.c: New test.
* gcc.target/s390/vector/vec-shift-6.c: New test.
* gcc.target/s390/vector/vec-shift-7.c: New test.
* gcc.target/s390/vector/vec-shift-8.c: New test.
* gcc.target/s390/vector/vec-shift-9.c: New test.

s390: arch15: Vector maximum/minimum: Add 128-bit integer support

For previous architectures emulate operation max/min.

gcc/ChangeLog:

* config/s390/s390-builtins.def: Add 128-bit variants and remove
bool variants.
* config/s390/s390-builtin-types.def: Update accordinly.
* config/s390/s390.md: Emulate min/max for GPR.
* config/s390/vector.md: Add min/max patterns and emulate in
case of no VXE3.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vec-max-emu.c: New test.
* gcc.target/s390/vector/vec-min-emu.c: New test.