git.ipfire.org Git - thirdparty/gcc.git/log

testsuite: AMDGCN test for vect-early-break_38.c as well to consistent architecture [PR119286]

I had missed this one during the AMDGCN test failures.

Like vect-early-break_18.c this test is also scalaring the
loads and thus leading to unexpected vectorization for this
testcase.

gcc/testsuite/ChangeLog:

PR target/119286
* gcc.dg/vect/vect-early-break_38.c: Force -march=gfx908 for amdgcn.

OpenMP: Add libgomp.fortran/target-enter-data-8.f90

Add another testcase for Fortran deep mapping of allocatable components.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/target-enter-data-8.f90: New test.

Accept allones or 0 operand for vcond_mask op1.

Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand
or vpandn.

gcc/ChangeLog:

* config/i386/predicates.md (vector_or_0_or_1s_operand): New predicate.
(nonimm_or_0_or_1s_operand): Ditto.
* config/i386/sse.md (vcond_mask_<mode><sseintvecmodelower>):
Extend the predicate of operands1 to accept 0 or allones
operands.
(vcond_mask_<mode><sseintvecmodelower>): Ditto.
(vcond_mask_v1tiv1ti): Ditto.
(vcond_mask_<mode><sseintvecmodelower>): Ditto.
* config/i386/i386.md (mov<mode>cc): Ditto for operands[2] and
operands[3].
* config/i386/i386-expand.cc (ix86_expand_sse_fp_minmax):
Force immediate_operand to register.

gcc/testsuite/ChangeLog:

* gcc.target/i386/blendv-to-maxmin.c: New test.
* gcc.target/i386/blendv-to-pand.c: New test.

Daily bump.

Fix vectorizer costs of COND_EXPR, MIN_EXPR, MAX_EXPR, ABS_EXPR, ABSU_EXPR

this patch adds special cases for vectorizer costs in COND_EXPR, MIN_EXPR,
MAX_EXPR, ABS_EXPR and ABSU_EXPR. We previously costed ABS_EXPR and ABSU_EXPR
but it was only correct for FP variant (wehre it corresponds to andss clearing
sign bit). Integer abs/absu is open coded as conditinal move for SSE2 and
SSE3 instroduced an instruction.

MIN_EXPR/MAX_EXPR compiles to minss/maxss for FP and accroding to Agner Fog
tables they costs same as sse_op on all targets. Integer translated to single
instruction since SSE3.

COND_EXPR translated to open-coded conditional move for SSE2, SSE4.1 simplified
the sequence and AVX512 introduced masked registers.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Add special cases
for COND_EXPR; make MIN_EXPR, MAX_EXPR, ABS_EXPR and ABSU_EXPR more realistic.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr89618-2.c: XFAIL.

rs6000: Ignore OPTION_MASK_SAVE_TOC_INDIRECT differences in inlining decisions [PR119327]

The following testcase FAILs because the always_inline function can't
be inlined.
The rs6000 backend has similarly to other targets a hook which rejects
inlining which would bring in new ISAs which aren't there in the caller.
And this hook rejects this because of OPTION_MASK_SAVE_TOC_INDIRECT
differences.
This flag is set if explicitly requested or by default depending on
whether the current function looks hot (or at least not cold):
  if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
      && flag_shrink_wrap_separate
      && optimize_function_for_speed_p (cfun))
    rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
The target nodes that are being compared here are actually the default
target node (which was created when cfun was NULL) vs. one that was
created for the always_inline function when it wasn't NULL, so one
doesn't have it, the other does.
In any case, this flag feels like a tuning decision rather than hard
ISA requirement and I see no problems why we couldn't inline
even explicit -msave-toc-indirect function into -mno-save-toc-indirect
or vice versa.
We already ignore OPTION_MASK_P{8,10}_FUSION which are also more
like tuning flags.

2025-04-22  Jakub Jelinek  <jakub@redhat.com>

PR target/119327
* config/rs6000/rs6000.cc (rs6000_can_inline_p): Ignore also
OPTION_MASK_SAVE_TOC_INDIRECT differences.

* g++.dg/opt/pr119327.C: New test.

Revert "libstdc++: Optimize std::projected<I, std::identity>" [PR119888]

This non-standard optimization breaks real-world code that expects the
result of std::projected to always (be a class type and) have a value_type
member, which isn't true for e.g. I=int*, so revert it for now.

PR libstdc++/119888

This reverts commit 51761c50f843d5be4e24172535e4524b5072f24c.

aarch64: Define __ARM_FEATURE_FAMINMAX

We implemented FAMINMAX ACLE support but failed to define the
associated feature macro.

gcc/
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_FAMINMAX.

gcc/testsuite/
* gcc.target/aarch64/pragma_cpp_predefs_4.c: Test
__ARM_FEATURE_FAMINMAX.

Induction vectorizer: prevent ICE for scalable types

We currently check that the target suppports PLUS_EXPR and MINUS_EXPR
with step_vectype (a fix for pr103523). However, vectorizable_induction
can emit a vectorized MULT_EXPR when calculating the step of each IV for
SLP, and both MULT_EXPR/FLOAT_EXPR when calculating VEC_INIT for float
inductions.

gcc/ChangeLog:

* tree-vect-loop.cc (vectorizable_induction): Add target support
checks for vectorized MULT_EXPR and FLOAT_EXPR where necessary for
scalable types.
Prefer target_supports_op_p over directly_supports_p for these tree
codes.
(vectorizable_nonlinear_induction): Fix a doc comment while I'm
here.

AArch64: Emit half-precision FCMP/FCMPE

Enable a target with FEAT_FP16 to emit the half-precision variants
of FCMP/FCMPE.

gcc/ChangeLog:

* config/aarch64/aarch64.md: Update cbranch, cstore, fcmp
and fcmpe to use the GPF_F16 iterator for floating-point
modes.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/_Float16_cmp_1.c: New test.
* gcc.target/aarch64/_Float16_cmp_2.c: New (negative) test.

AArch64: Define the spaceship optab [PR117013]

This expansion ensures that exactly one comparison is emitted for
spacesip-like sequences on floating-point operands, including when
the result of such sequences are compared against members of
std::<some_ordering>::<some_value>.

For both integer and floating-point types, we optimize for the case
in which the result of a spaceship-like operation is written to a GPR.
The PR highlights this issue for floating-point operands, but we also
make an improvement for integers, preferring:

cmp     w0, w1
cset w1, gt
csinv w0, w1, wzr, ge

over:

cmp w0, w1
mov     w0, 1
csinv   w0, w0, wzr, ge
csel    w0, w0, wzr, ne

to compute:

auto test(int a, int b) { return a <=> b;}

gcc/ChangeLog:
PR target/117013
* config/aarch64/aarch64-protos.h (aarch64_expand_fp_spaceship):
Declare optab expander function for floating-point types.
* config/aarch64/aarch64.cc (aarch64_expand_fp_spaceship):
Define optab expansion for floating-point types (new function).
* config/aarch64/aarch64.md (spaceship<mode>4):
Add define_expands for spaceship<mode>4 on integer and
floating-point types.

gcc/testsuite/ChangeLog:
PR target/117013
* g++.target/aarch64/spaceship_1.C: New test.
* g++.target/aarch64/spaceship_2.C: New test.
* g++.target/aarch64/spaceship_3.C: New test.

aarch64: Update FP8 dependencies for -mcpu=olympus

We had not noticed that after g:299a8e2dc667e795991bc439d2cad5ea5bd379e2 the
FP8FMA and FP8DOT4 features aren't implied by FP8FMA. The intent is for
-mcpu=olympus to support all of them.
Fix the definition to include the relevant sub-features explicitly.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/

* config/aarch64/aarch64-cores.def (olympus): Add fp8fma, fp8dot4
explicitly.

cobol: Restrict COBOL to supported Linux arches [PR119217]

The COBOL frontend is currently built on all x86_64 and aarch64 hosts
although the code contains some Linux/glibc specifics that break the build
e.g. on Solaris/amd64.

Tested on Linux/x86_64 and Solaris/amd64.

2025-03-17 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

PR cobol/119217
* configure.ac: Restrict cobol to aarch64-*-linux*,
x86_64-*-linux*.
Fix indentation.
* configure: Regenerate.

libstdc++: Update baseline symbols for m68k-linux

* config/abi/post/m68k-linux-gnu/baseline_symbols.txt: Update.

Fortran: Various fixes on F2018 teams.

gcc/fortran/ChangeLog:

* match.cc (match_exit_cycle): Allow to exit team block.
(gfc_match_end_team): Create end_team node also without
parameter list.
* trans-intrinsic.cc (conv_stat_and_team): Team and team_number
only need to be a single pointer.
* trans-stmt.cc (trans_associate_var): Create a mapping coarray
token for coarray associations or it is not addressed correctly.
* trans.h (enum gfc_coarray_regtype): Add mapping mode to
coarray register.

libgfortran/ChangeLog:

* caf/libcaf.h: Add mapping mode to coarray's register.
* caf/single.c (_gfortran_caf_register): Create a token sharing
another token's memory.
(check_team): Check team parameters to coindexed expressions are
valid.

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray/coindexed_3.f08: Add minimal test for
get_team().
* gfortran.dg/team_change_2.f90: Add test for change team with
label and exiting out of it.
* gfortran.dg/team_end_2.f90: Check parsing to labeled team
blocks is correct now.
* gfortran.dg/team_end_3.f90: Check that end_team call is
generated for labeled end_teams, too.
* gfortran.dg/coarray/coindexed_5.f90: New test.

Fortran: Add teams support in image_index and num_images for F2018

This more or less completes the set of functions that are affected by
teams.

gcc/fortran/ChangeLog:

* check.cc (gfc_check_image_index): Check for team or
team_number correctnes.
(gfc_check_num_images): Same.
* gfortran.texi: Update documentation on num_images' API
function.
* intrinsic.cc (add_functions): Update signature of image_index
and num_images. Both can take either a team handle or number.
* intrinsic.h (gfc_check_num_images): Update signature to take
either team or team_number.
(gfc_check_image_index): Can take coarray, subscripts and team
or team number now.
(gfc_simplify_image_index): Same.
(gfc_simplify_num_images): Same.
(gfc_resolve_image_index): Same.
* intrinsic.texi: Update documentation of num_images() Fortran
function.
* iresolve.cc (gfc_resolve_image_index): Update signature.
* simplify.cc (gfc_simplify_num_images): Update signature and
remove undocumented failed argument.
(gfc_simplify_image_index): Add team or team number argument.
* trans-intrinsic.cc (conv_stat_and_team): Because being
optional teams need to be a pointer to the opaque pointer.
(conv_caf_sendget): Correct call; was two arguments short.
(trans_image_index): Support team or team_number.
(trans_num_images): Same.
(conv_intrinsic_cobound): Adapt to changed signature of
num_images in call.
* trans-stmt.cc (gfc_trans_sync): Same.

libgfortran/ChangeLog:

* caf/libcaf.h (_gfortran_caf_num_images): Correct prototype.
* caf/single.c (_gfortran_caf_num_images): Default
implementation.

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray_49.f90: Adapt to changed error message.
* gfortran.dg/coarray_collectives_12.f90: Adapt to changed
function signature of num_images.
* gfortran.dg/coarray_collectives_16.f90: Same.
* gfortran.dg/coarray_lib_this_image_1.f90: Same.
* gfortran.dg/coarray_lib_this_image_2.f90: Same.
* gfortran.dg/coarray_this_image_1.f90: Adapt tests for
num_images.
* gfortran.dg/coarray_this_image_2.f90: Same.
* gfortran.dg/coarray_this_image_3.f90: Same.
* gfortran.dg/num_images_1.f90: Check that deprecated syntax is
no longer supported.

Fortran: Add team-support to this_image [PR87326]

This_image() no longer has a distance formal argument, but a team one.
The source of the distance argument could not be identified, i.e.
whether it came from a TS or standard draft. To implement only the
standard it is removed. Besides being defined, it was not used anyway.

PR fortran/87326

gcc/fortran/ChangeLog:

* check.cc (gfc_check_this_image): Check the three different
parameter lists possible for this_image and sort them correctly.
* gfortran.texi: Update documentation on this_image's API.
* intrinsic.cc (add_functions): Update this_image's signature.
(check_specific): Add specific check for this_image.
* intrinsic.h (gfc_check_this_image): Change to flexible
argument list.
* intrinsic.texi: Update documentation on this_image().
* iresolve.cc (gfc_resolve_this_image): Resolve the different
arguments.
* simplify.cc (gfc_simplify_this_image): Simplify the simplify
routine.
* trans-decl.cc (gfc_build_builtin_function_decls): Update
signature of this_image.
* trans-expr.cc (gfc_caf_get_image_index): Use correct signature
of this_image.
* trans-intrinsic.cc (trans_this_image): Adapt to correct
signature.

libgfortran/ChangeLog:

* caf/libcaf.h (_gfortran_caf_this_image): Correct prototype.
* caf/single.c (struct caf_single_team): Add new_index of image.
(_gfortran_caf_this_image): Return the image index in the given team.
(_gfortran_caf_form_team): Set new_index in team structure.

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray_10.f90: Update error messages.
* gfortran.dg/coarray_lib_this_image_1.f90: Same.
* gfortran.dg/coarray_lib_this_image_2.f90: Same.
* gfortran.dg/coarray_this_image_1.f90: Add more tests and
remove incorrect ones.
* gfortran.dg/coarray_this_image_2.f90: Test more features.
* gfortran.dg/coarray_this_image_3.f90: New test.

Fortran: Update get_team, team_number and image_status to F2018 [PR88154, PR88960, PR97210, PR103001]

Add functions get_team() and team_number() to comply with F2018
standard.

Update image_status() to comply with F2018 standard.

PR fortran/88154
PR fortran/88960
PR fortran/97210
PR fortran/103001

gcc/fortran/ChangeLog:

* check.cc (team_type_check): Check a type for being team_type
from the iso_fortran_env module.
(gfc_check_image_status): Use team_type check.
(gfc_check_get_team): Check for level argument.
(gfc_check_team_number): Use team_type check.
* expr.cc (gfc_check_assign): Add treatment for returning
team_type in caf-single mode.
* gfortran.texi: Add/Update documentation for get_team and
team_number API functions.
* intrinsic.cc (add_functions): Update get_team signature.
* intrinsic.h (gfc_resolve_get_team): Add prototype.
* intrinsic.texi: Add/Update documentation for get_team and
team_number Fortran functions.
* iresolve.cc (gfc_resolve_get_team): Resolve return type to be
of type team_type.
* iso-fortran-env.def: Update STAT_LOCK constants. They have
nothing to do with files. Add level constants for get_team.
* libgfortran.h: Add level and unlock_stat constants.
* simplify.cc (gfc_simplify_get_team): Simply to correct return
type team_type.
* trans-decl.cc (gfc_build_builtin_function_decls): Update
get_team and image_status API prototypes to correct signatures.
* trans-intrinsic.cc (conv_intrinsic_image_status): Translate
second parameter correctly.
(conv_intrinsic_team_number): Translate optional single team
argument correctly.
(gfc_conv_intrinsic_function): Add translation of get_team.

libgfortran/ChangeLog:

* caf/libcaf.h: Add constants for get_team's level argument and
update stat values for failed images.
(_gfortran_caf_team_number): Add prototype.
(_gfortran_caf_get_team): Same.
* caf/single.c (_gfortran_caf_team_number): Get the given team's
team number.
(_gfortran_caf_get_team): Get the current team or the team given
by level when the argument is present.

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray/image_status_1.f08: Correct check for
team_type.
* gfortran.dg/pr102458.f90: Adapt to multiple errors.
* gfortran.dg/coarray/get_team_1.f90: New test.
* gfortran.dg/team_get_1.f90: New test.
* gfortran.dg/team_number_1.f90: Correct Fortran syntax.

Fortran: Improve F2018 TEAM handling [PR87326, PR87556, PR88254, PR103896]

Improve the implementation of F2018 TEAM handling routines. Add
runtime-functions to caf_single to allow testing.

PR fortran/87326
PR fortran/87556
PR fortran/88254
PR fortran/103796

gcc/fortran/ChangeLog:

* coarray.cc (split_expr_at_caf_ref): Treat polymorphic types
correctly. Ensure resolve of expression after coindex.
(create_allocated_callback): Fix parameter of allocated function
for coarrays.
(coindexed_expr_callback): Improve detection of coarrays in
allocated function.
* decl.cc (gfc_match_end): Add team block matching.
* dump-parse-tree.cc (show_code_node): Dump change team block as
such.
* frontend-passes.cc (gfc_code_walker): Recognice team block.
* gfortran.texi: Add documentation for team api functions.
* intrinsic.texi: Add documentation about team_type in
iso_fortran_env module.
* iso-fortran-env.def (team_type): Use helper to get pointer
kind.
* match.cc (gfc_match_associate): Factor out matching of
association list, because it is used in change team as well.
(check_coarray_assoc): Ensure, that the association is to a
coarray.
(match_association_list): Match a list of association either in
associate or in change team.
(gfc_match_form_team): Match form team correctly include
new_index.
(gfc_match_change_team): Match change team with association
list.
(gfc_match_end_team): Match end team including stat and errmsg.
(gfc_match_return): Prevent return from team block.
* parse.cc (decode_statement): Sort team block.
(next_statement): Same.
(check_statement_label): Same.
(accept_statement): Same.
(verify_st_order): Same.
(parse_associate): Renamed to move_associates_to_block...
(move_associates_to_block): ... to enable reuse for change team.
(parse_change_team): Parse it as block.
(parse_executable): Same.
* parse.h (enum gfc_compile_state): Add team block as compiler
state.
* resolve.cc (resolve_scalar_argument): New function to resolve
an argument to a statement as a scalar.
(resolve_form_team): Resolve its members.
(resolve_change_team): Same.
(resolve_branch): Prevent branch from jumping out of team block.
(check_team): Removed.
* trans-decl.cc (gfc_build_builtin_function_decls): Add stat and
errmsg to team API functions and update their arguments.
* trans-expr.cc (gfc_trans_subcomponent_assign): Also null the
token when moving memory or an allocated() will not detect a
free.
* trans-intrinsic.cc (gfc_conv_intrinsic_caf_is_present_remote):
Adapt to signature change no longer a pointer-pointer.
* trans-stmt.cc (gfc_trans_form_team): Translate a form team
including new_index.
(gfc_trans_change_team): Translate a change team as a block.

libgfortran/ChangeLog:

* caf/libcaf.h: Remove commented block.
(_gfortran_caf_form_team): Allow for all relevant arguments.
(_gfortran_caf_change_team): Same.
(_gfortran_caf_end_team): Same.
(_gfortran_caf_sync_team): Same.
* caf/single.c (struct caf_single_team): Team handling
structures.
(_gfortran_caf_init): Initialize initial team.
(free_team_list): Free all teams and the memory they hold.
(_gfortran_caf_finalize): Free initial and sibling teams.
(_gfortran_caf_register): Add memory registered to current team.
(_gfortran_caf_deregister): Unregister memory from current team.
(_gfortran_caf_is_present_on_remote): Check token's memptr for
llocation. May have been deallocated by an end team.
(_gfortran_caf_form_team): Push a new team stub to the list.
(_gfortran_caf_change_team): Push a formed team on top of the
ctive teams stack.
(_gfortran_caf_end_team): End the active team, free all memory
allocated during its livespan.
(_gfortran_caf_sync_team): Take stat and errmsg into account.

gcc/testsuite/ChangeLog:

* gfortran.dg/team_change_2.f90: New test.
* gfortran.dg/team_change_3.f90: New test.
* gfortran.dg/team_end_2.f90: New test.
* gfortran.dg/team_end_3.f90: New test.
* gfortran.dg/team_form_2.f90: New test.
* gfortran.dg/team_form_3.f90: New test.
* gfortran.dg/team_sync_2.f90: New test.

Fortran: Unify handling of STAT= and ERRMSG= optional arguments [PR87939]

In preparing F2018 Teams handling improvements, unify handling of STAT=
and ERRMSG= optional arguments. Handling of stat and errmsg in most
teams statements is corrected in the next patch.

Implement stat and errmsg for move_alloc () to comply with F2018.

PR fortran/87939

gcc/fortran/ChangeLog:

* check.cc (gfc_check_move_alloc): Add stat and errmsg to
move_alloc.
* dump-parse-tree.cc (show_sync_stat): New helper function.
(show_code_node): Use show_sync_stat to print stat and errmsg.
* gfortran.h (struct sync_stat): New struct to unify stat and
errmsg handling.
* intrinsic.cc (add_subroutines): Correct signature of
move_alloc.
* intrinsic.h (gfc_check_move_alloc): Correct signature of
check_move_alloc.
* match.cc (match_named_arg): Match an optional argument to a
statement.
(match_stat_errmsg): Match a stat= or errmsg= named argument.
(gfc_match_critical): Use match_stat_errmsg to match the named
arguments.
(gfc_match_sync_team): Same.
* resolve.cc (resolve_team_argument): Resolve an expr to have
type TEAM_TYPE from iso_fortran_env.
(resolve_scalar_variable_as_arg): Resolve an argument as a
scalar type.
(resolve_sync_stat): Resolve stat and errmsg expressions.
(resolve_sync_team): Resolve a sync team statement using
sync_stat helper.
(resolve_end_team): Same.
(resolve_critical): Same.
* trans-decl.cc (gfc_build_builtin_function_decls): Correct
sync_team signature.
* trans-intrinsic.cc (conv_intrinsic_move_alloc): Store stat
an errmsg optional arguments in helper struct and use helper
to translate.
* trans-stmt.cc (trans_exit): Implement DRY pattern for
generating an _exit().
(gfc_trans_sync_stat): Translate stat and errmsg contents.
(gfc_trans_end_team): Use helper to translate stat and errmsg.
(gfc_trans_sync_team): Same.
(gfc_trans_critical): Same.
* trans-stmt.h (gfc_trans_sync_stat): New function.
* trans.cc (gfc_deallocate_with_status): Parameterize check at
runtime to allow unallocated (co-)array when freeing a
structure.
(gfc_deallocate_scalar_with_status): Same and also add errmsg.
* trans.h (gfc_deallocate_with_status): Signature changes.
(gfc_deallocate_scalar_with_status): Same.

libgfortran/ChangeLog:

* caf/single.c (_gfortran_caf_lock): Correct stat value, if
lock is already locked by current image.
(_gfortran_caf_unlock): Correct stat value, if lock is not
locked.

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray_critical_2.f90: New test.
* gfortran.dg/coarray_critical_3.f90: New test.
* gfortran.dg/team_sync_1.f90: New test.
* gfortran.dg/move_alloc_11.f90: New test.

[PATCH] [RISC-V]Support -mcpu for Xuantie cpu

Support -mcpu=xt-c908, xt-c908v, xt-c910, xt-c910v2, xt-c920, xt-c920v2
for Xuantie series cpu.
ref:https://www.xrvm.cn/community/download?id=4224248662731067392

without fmv_cost, vector_unaligned_access, use_divmod_expansion, overlap_op_by_pieces, fill the tune info with generic ooo for further modification.

gcc/ChangeLog:

* config/riscv/riscv-cores.def (RISCV_TUNE): Add xt-c908, xt-c908v,
xt-c910, xt-c910v2, xt-c920, xt-c920v2.
(RISCV_CORE): Add xt-c908, xt-c908v, xt-c910, xt-c910v2, xt-c920,
xt-c920v2.
* doc/invoke.texi: Add xt-c908, xt-c908v, xt-c910, xt-c910v2,
xt-c920, xt-c920v2.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/mcpu-xt-c908.c: test -mcpu=xt-c908.
* gcc.target/riscv/mcpu-xt-c910.c: test -mcpu=xt-c910.
* gcc.target/riscv/mcpu-xt-c920v2.c: test -mcpu=xt-c920v2.
* gcc.target/riscv/mcpu-xt-c908v.c: test -mcpu=xt-c908v.
* gcc.target/riscv/mcpu-xt-c910v2.c: test -mcpu=xt-c910v2.
* gcc.target/riscv/mcpu-xt-c920.c: test -mcpu=xt-c920.

libstdc++: Update baseline symbols for riscv64-linux

* config/abi/post/riscv64-linux-gnu/baseline_symbols.txt: Update.

testsuite: Add support for GCOV_UNDER_TEST

After commit r15-8947-g8ed2d5d219e999, which added new tests using
gcov, the CI noticed failures because it was calling 'gcov' instead of
$target-gcov.

This is because the CI scripts override GXX_UNDER_TEST, but still run
the testsuite in-tree, and gcc-transform-out-of-tree only depends on
TESTING_IN_BUILD_TREE but the definition of GCOV uses GCC_UNDER_TEST,
GXX_UNDER_TEST or GDC_UNDER_TEST (assuming their default values when
TESTING_IN_BUILD_TREE).

To handle such a case, this patch adds support for a new variable,
GCOV_UNDER_TEST, which overrides the current behavior if defined, and
has an effect similar to overriding GCC_UNDER_TEST etc...

Unfortunately, the change needs to be duplicated in several places,
which use either GCC_UNDER_TEST, GXX_UNDER_TEST or GDC_UNDER_TEST.

Tested g++.dg/gcov/gcov.exp and now g++.dg/gcov/gcov-22.C passes
(calling <installdir>/bin/$target-gcov as instructed by the CI
scripts). No change observed on gcc.misc-tests/gcov.exp, and I could
not test gdc.dg/gcov.exp and gnat.dg/gcov/gcov.exp.

gcc/testsuite/ChangeLog:

* g++.dg/gcov/gcov.exp: Handle GCOV_UNDER_TEST.
* gcc.misc-tests/gcov.exp: Likewise.
* gdc.dg/gcov.exp: Likewise.
* gnat.dg/gcov/gcov.exp: Likewise.

Document locality partitioning params in invoke.texi

Filip Kastl pointed out that contrib/check-params-in-docs.py complains
about params not documented in invoke.texi, so this patch adds the short
explanation from params.opt for these to the invoke.texi section.
Thanks for the reminder.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/

* doc/invoke.texi (lto-partition-locality-frequency-cutoff,
lto-partition-locality-size-cutoff, lto-max-locality-partition):
Document.

libstdc++: Update Linux/sparc64 baselines for GCC 15.1

The Linux/sparc64 libstdc++ baselines haven't been updated for years.
This patch fixes that.

Tested on sparc64-unknown-linux-gnu on both the gcc-15 branch and trunk.

2025-04-21 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

libstdc++-v3:
* config/abi/post/sparc64-linux-gnu/baseline_symbols.txt: Regenerate.
* config/abi/post/sparc64-linux-gnu/32/baseline_symbols.txt: Likewise.

libstdc++: Update Solaris baselines for GCC 15.1

This patch updates the Solaris libstdc++ baselines for GCC 15.1.

Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11 on both the
gcc-15 branch and trunk.

2025-02-11 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

libstdc++-v3:
* config/abi/post/i386-solaris/baseline_symbols.txt: Regenerate.
* config/abi/post/i386-solaris/amd64/baseline_symbols.txt:
Likewise.
* config/abi/post/sparc-solaris/baseline_symbols.txt: Likewise.
* config/abi/post/sparc-solaris/sparcv9/baseline_symbols.txt:
Likewise.

libstdc++: Update baseline_symbols.txt for {x86_64,i486,powerpc64le,s390x,aarch64}-linux

We forgot to update these timely, sorry for that, the following patch
updates them from the 15.1-rc1 builds in Fedora.

2025-04-22 Jakub Jelinek <jakub@redhat.com>

* config/abi/post/x86_64-linux-gnu/baseline_symbols.txt: Update.
* config/abi/post/x86_64-linux-gnu/32/baseline_symbols.txt: Update.
* config/abi/post/i486-linux-gnu/baseline_symbols.txt: Update.
* config/abi/post/aarch64-linux-gnu/baseline_symbols.txt: Update.
* config/abi/post/s390x-linux-gnu/baseline_symbols.txt: Update.
* config/abi/post/powerpc64le-linux-gnu/baseline_symbols.txt: Update.

libstdc++: Increase timeouts for format and flat_maps tests

Test for format parse format string and compile time,
flat_maps ones test all functionality in single test file.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/flat_map/1.cc: Add dg-timeout-factor 2.
* testsuite/23_containers/flat_multimap/1.cc: Likewise.
* testsuite/std/format/ranges/map.cc: Likewise.
* testsuite/std/format/ranges/sequence.cc: Likewise.
* testsuite/std/format/ranges/string.cc: Likewise.

libstdc++: Finalize GCC 15 ABI

Disallow adding new symbols to GLIBCXX_3.4.34 and CXXABI_1.3.16 versions.

* testsuite/util/testsuite_abi.cc (check_version): Update latestp
to use GLIBCXX_3.4.35 and CXXABI_1.3.17.

testsuite: Use sigsetjmp in gcc.misc-tests/gcov-31.c

The gcc.misc-tests/gcov-31.c test FAILs on Solaris and Darwin:

FAIL: gcc.misc-tests/gcov-31.c (test for excess errors)

Excess errors:
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.misc-tests/gcov-31.c:23:5:
error: implicit declaration of function '__sigsetjmp'; did you mean
'sigsetjmp'? [-Wimplicit-function-declaration]

__sigsetjmp is a Linux/glibc implementation detail. Other tests just
use sigsetjmp directly, so this patch follows suit.

Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11,
x86_64-pc-linux-gnu, and x86_64-apple-darwin24.4.0.

2025-04-22 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
* gcc.misc-tests/gcov-31.c (run_pending_traps): Use sigsetjmp
instead of __sigsetjmp.

c++/modules: Remove unnecessary lazy_load_pendings

This call is not necessary, as we don't access the bodies of any classes
that we instantiate here.

gcc/cp/ChangeLog:

* name-lookup.cc (lookup_imported_hidden_friend): Remove
unnecessary lazy_load_pendings.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++/modules: Find non-exported reachable decls when instantiating friend classes [PR119863]

In r15-9029-geb26b667518c95, we started checking for conflicting
declarations with any reachable decl attached to the same originating
module. This exposed the issue in the PR, where we would always create
a new type even if a matching type existed in the original module.

This patch reworks lookup_imported_hidden_friend to handle this case
better, by first checking for any reachable decl in the attached module
before looking in the mergeable decl slots.

PR c++/119863

gcc/cp/ChangeLog:

* name-lookup.cc (get_mergeable_namespace_binding): Remove
no-longer-used function.
(lookup_imported_hidden_friend): Also look for hidden imported
decls in an attached decl's module.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tpl-friend-18_a.C: New test.
* g++.dg/modules/tpl-friend-18_b.C: New test.
* g++.dg/modules/tpl-friend-18_c.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

Skip g++.dg/eh/pr119507.C on arm eabi

arm eabi emits the exception table using the handlerdata directive
and does not use a comdat section for comdat functions. So this
testcase should be skipped for arm eabi.

Pushed as obvious after a quick test.

gcc/testsuite/ChangeLog:

* g++.dg/eh/pr119507.C: Skip for arm eabi.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

[testsuite] [ppc] require ifunc for target_clones test

gcc.target/powerpc/power11-3.c uses target_clones, that depends on
ifunc. Require ifunc support.

for gcc/testsuite/ChangeLog

* gcc.target/powerpc/power11-3.c: Require ifunc support.

[riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]

pr118182-2.c fails on gcc-14 because it lacks the late_combine passes,
particularly the one that runs after register allocation.

Even in the trunk, the predicate broadcast for the add reduction is
expanded and register-allocated as _zvfh, taking up an unneeded scalar
register to hold the constant to be vec_duplicated.

It is the late combine pass after register allocation that substitutes
this unneeded scalar register into the vec_duplicate, resolving to the
_zero or _imm insns.

It's easy enough and more efficient to expand pred_broadcast to the
insns that take the already-duplicated vector constant, when the
operands satisfy the predicates of the _zero or _imm insns.

for gcc/ChangeLog

PR target/118182
* config/riscv/vector.md (@pred_broadcast<mode>): Expand to
_zero and _imm variants without vec_duplicate.

Daily bump.

c++: reorder constexpr checks

My proposed change to stop setting TREE_STATIC on constexpr heap
pseudo-variables led to a diagnostic regression because we would get the
generic "not constant" diagnostic before the "allocated storage" diagnostic.
So let's move the generic verify_constant down a bit.

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_outermost_constant_expr): Move
verify_constant later.

c++: new size folding [PR118775]

r15-7893 added a workaround for a case where we weren't registering (long)&a
as invalid in a constant-expression, because build_new_1 had folded away the
CONVERT_EXPR that we rely on to diagnose that problem. In general we want
to defer most folding until cp_fold_function, so let's fold less here. We
mainly want to expose constant size so we can treat it differently, and we
already did any constexpr evaluation when initializing cst_outer_nelts, so
fold_to_constant seems like the right choice.

PR c++/118775

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_call_expression): Add assert.
(fold_to_constant): Handle processing_template_decl.
* init.cc (build_new_1): Use fold_to_constant.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constexpr-new24.C: Adjust diagnostic.

c++: static constexpr strictness [PR99456]

r11-7740 limited constexpr rejection of conversion from pointer to integer
to manifestly constant-evaluated contexts; it should instead check whether
we're in strict mode.

The comment for that commit noted that making this change regressed other
tests, which turned out to be because maybe_constant_init_1 was not being
properly strict for variables declared constexpr/constinit.

PR c++/99456

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_constant_expression): Check strict
instead of manifestly_const_eval.
(maybe_constant_init_1): Be strict for static constexpr vars.

Fix cost of vectorized double->float conversion

In previous patch I miscomputed costs of cvtpd2pf instruction
which mistakely gets accounted as 2 (VEC_PACK_TRUNC_EXPR).
Vectorizer can produce both, but when producing VEC_PACK_TRUNC_EXPR
it use promote_demote patch. This patch thus simplifies
handling of NOP_EXPR since in that case we should always be producing
only one instruction.

PR target/119879
* config/i386/i386.cc (fp_conversion_stmt_cost): Inline to ...
(ix86_vector_costs::add_stmt_cost): ... here; fix handling of NOP_EXPR.

[PATCH 21/61] Testsuite: Modify the gcc.dg/memcpy-4.c test

From: Andrew Bennett <andrew.bennett@imgtec.com>

Firstly, remove the MIPS specific bit of the test.
Secondly, create a MIPS specific version in the gcc.target/mips.
This will only execute for a MIPS ISA less than R6.

Cherry-picked c8b051cdbb1d5b166293513b0360d3d67cf31eb9
from https://github.com/MIPS/gcc

gcc/testsuite
* gcc.dg/memcpy-4.c: Remove mips specific code.
* gcc.target/mips/memcpy-2.c: New test.

[PATCH 08/61] Testsuite: Accept jrc for clear cache intrinsic

Accept jrc for clear cache intrinsic.

gcc/testsuite
* gcc.target/mips/clear-cache-1.c: Also allow jrc.

[PATCH 30/61] MSA: Make MSA and microMIPS R5 unsupported

There are no platforms nor simulators for MSA and microMIPS R5 so
turning off this support for now.

gcc/ChangeLog:

* config/mips/mips.cc (mips_option_override): Error out for
-mmicromips -mmsa.

[PATCH 43/61] Disable ssa-dom-cse-2.c for MIPS lp64

The optimisation to reduce the result to constant 28 still happens
but only much later in combine.

gcc/testsuite/
* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Do not check output for
MIPS lp64 abi.

except: Don't use the cached value of the gcc_except_table section for comdat functions [PR119507]

This has been broken since GCC started to put the comdat functions' gcc_except_table into their
own section; r0-118218-g3e6011cfebedfb. What would happen is after a non-comdat function is processed,
the cached value would always be used even for comdat function. Instead we should create a new
section for comdat functions.

OK? Bootstrapped and tested on x86_64-linux-gnu.

PR middle-end/119507

gcc/ChangeLog:

* except.cc (switch_to_exception_section): Don't use the cached section if
the current function is in comdat.

gcc/testsuite/ChangeLog:

* g++.dg/eh/pr119507.C: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Add assert to array_slice::begin/end

So while debugging PR 118320, I found it was useful to have
an assert inside array_slice::begin/end that the array slice isvalid
rather than getting an segfault. This adds an assert that is only
enabled for checking.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* vec.h (array_slice::begin): Assert that the
slice is valid.
(array_slice::end): Likewise.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

libgcobol: Check for struct tm tm_zone

intrinsic.cc doesn't compile on Solaris:

/vol/gcc/src/hg/master/cobol/libgcobol/intrinsic.cc: In function ‘void
__gg__formatted_current_date(cblc_field_t*, cblc_field_t*, std::size_t,
std::size_t)’:
/vol/gcc/src/hg/master/cobol/libgcobol/intrinsic.cc:1480:6: error: ‘struct
std::tm’ has no member named ‘tm_zone’; did you mean ‘tm_mon’?
1480 |   tm.tm_zone = "GMT";
      |      ^~~~~~~
      |      tm_mon

struct tm.tm_zone is new in POSIX.1-2024, thus cannot be assumed to be
present universally.

This patch checks for its presence and guards the use accordingly.

Bootstrapped without regressions on amd64-pc-solaris2.11,
sparcv9-sun-solaris2.11, and x86_64-pc-solaris2.11.

2025-04-08  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

libgcobol:
* configure.ac: Check for struct tm.tm_zone.
* configure, config.h.in: Regenerate.
* intrinsic.cc (__gg__formatted_current_date): Guard tm.tm_zone
use with HAVE_STRUCT_TM_TM_ZONE.

Generate 2 FMA instructions in ix86_expand_swdivsf.

When FMA is available, N-R step can be rewritten with

a / b = (a - (rcp(b) * a * b)) * rcp(b) + rcp(b) * a

which have 2 fma generated.

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_emit_swdivsf): Generate 2
FMA instructions when TARGET_FMA.

gcc/testsuite/ChangeLog:

* gcc.target/i386/recip-vec-divf-fma.c: New test.

Daily bump.

x86: Add tests for PR target/117863

commit 546f28f83ceba74dc8bf84b0435c0159ffca971a
Author: Richard Sandiford <richard.sandiford@arm.com>
Date: Mon Apr 7 08:03:46 2025 +0100

simplify-rtx: Fix shortcut for vector eq/ne

fixed

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117863

PR target/117863
* gcc.dg/rtl/i386/vector_eq-2.c: New test.
* gcc.dg/rtl/i386/vector_eq-3.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

Daily bump.

Disable parallel testing for 'rust/compile/nr2/compile.exp' [PR119508]

..., using the standard idiom. This '*.exp' file doesn't adhere to the
parallel testing protocol as defined in 'gcc/testsuite/lib/gcc-defs.exp'.

This also restores proper behavior for '*.exp' files executing after (!) this
one, which erroneously caused hundreds or even thousands of individual test
cases get duplicated vs. skipped, randomly, depending on the '-jN' level.

PR testsuite/119508
gcc/testsuite/
* rust/compile/nr2/compile.exp: Disable parallel testing.

[RISC-V][PR target/119865] Don't free ggc allocated memory

Kaiweng's patch to stop freeing riscv_arch_string was correct, but incomplete
as there's another path that was freeing that node, which is just plain wrong
for a node allocated by the GC system.

This patch removes that call to free() which fixes the test. I've spun it in
my tester and will obviously wait for the pre-commit system to render a verdict
before moving forward.

PR target/119865
gcc/
* config/riscv/riscv.cc (parse_features_for_version): Do not
explicitly free the architecture string.

[RISC-V][PR target/118410] Improve code generation for some logical ops

I'm posting this on behalf of Shreya Munnangi who is working as an intern with
me. I've got her digging into prerequisites for removing mvconst_internal and
would prefer she focus on that rather than our patch process at this time.

--

We can use the orn, xnor, andn instructions on RISC-V to improve the code
generated logical operations when one operand is a constant C where
synthesizing ~C is cheaper than synthesizing C.

This is going to be an N -> N - 1 splitter rather than a define_insn_and_split.
A define_insn_and_split can obviously work, but has multiple undesirable
effects in general.

As a result of implementing as a simple define_split we're not supporting AND
at this time. We need to clean up the mvconst_internal situation first after
which supporting AND is trivial.

This has been tested in Ventana's CI system as well as my tester. Obviously
we'll wait for the pre-commit tester to run before moving forward.

PR target/118410
gcc/
* config/riscv/bitmanip.md (logical with constant argument): New
splitter for cases where synthesizing ~C is cheaper than synthesizing
the original constant C.

gcc/testsuite/
* gcc.target/riscv/pr118410-1.c: New test.
* gcc.target/riscv/pr118410-2.c: Likewise.

Co-authored-by: Jeff Law <jlaw@ventanamicro.com>

Add tables for SSE fp conversion costs

as disucssed, I will proceed adding costs for common SSE operations which are
currently globbed into addss cost, so we do not need to set it incorrectly for
znver5. Looking through the stats, there are quite few missing cases, so I am
starting with those that I think are more common. I plan to do it in smaller
steps so individual changes gets benchmarked by LNT and also can be bisected
to.

This patch adds costs for various SSE and AVX FP->FP conversions (extensions and
truncations). Looking through Agner Fog's tables, these are bit assymetric so I
added cost for CVTSS2SD which is also used for CVTSD2SS, CVTPS2PD and CVTPD2PS,
cost for 256bit VCVTPS2PS (also used for oposite direction) and cost for 512bit
one.

I plan to add int->int conversions next and then int->fp & fp->int which are
more tricky since they may bundle inter-unit move.

I also noticed that size tables are wrong for all SSE instructions so I updated
them. With some love I think vectorization can work as size optimization, too,
but we need more work on that.

Those values I can find in Agner Fog tables are taken from there, other are guesses
(especially for yongfeng_cost and shijidadao_cost).

gcc/ChangeLog:

* config/i386/i386.cc (vec_fp_conversion_cost): New function.
(ix86_rtx_costs): Use it for SSE/AVX FP conversoins.
(ix86_builtin_vectorization_cost): Fix indentation;
and use vec_fp_conversion_cost in vec_promote_demote.
(fp_conversion_stmt_cost): New function.
(ix86_vector_costs::add_stmt_cost): Use it to cost NOP_EXPR
and vec_promote_demote.
* config/i386/i386.h (struct processor_costs):
* config/i386/x86-tune-costs.h (struct processor_costs):

Fix pr118947-1.c and pr78408-3.c on targets where 32 bytes memcpy uses a vector

The problem here is on targets where a 32byte memcpy will use an integral (vector) type
to do the copy and the code will be optimized a different way than expected. This changes
the testcase instead to use a size of 1025 to make sure there is no target that will use an
integral (vector) type for the memcpy and be optimized via the method that was just added.

Pushed as obvious after a test run.

gcc/testsuite/ChangeLog:

* gcc.dg/pr118947-1.c: Use 1025 as the size of the buf.
* gcc.dg/pr78408-3.c: Likewise.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

combine: Better split point for `(and (not X))` [PR111949]

In a similar way find_split_point handles `a+b*C`, this adds
the split point for `~a & b`. This allows for better instruction
selection when the target has this instruction (aarch64, arm and x86_64
are examples which have this).

Built and tested for aarch64-linux-gnu.

PR rtl-optimization/111949

gcc/ChangeLog:

* combine.cc (find_split_point): Add a split point
for `(and (not X) Y)` if not in the outer set already.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/bic-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

[PATCH v2] sh: libgcc: Implement fenv rouding and exceptions for soft-fp [PR118257]

Implement fenv rouding and exceptions for soft-fp, as per SuperH
arch specification.

No new tests required, as it's already covered by many torture tests
with fenv_exceptions.

PR target/118257

libgcc/ChangeLog:

* config/sh/sfp-machine.h (_FPU_GETCW): Implement with builtin.
(_FPU_SETCW): Likewise.
(FP_EX_ENABLE_SHIFT): Derive from arch spec.
(FP_EX_CAUSE_SHIFT): Likewise.
(FP_RND_MASK): Likewise.
(FP_EX_INVALID): Likewise.
(FP_EX_DIVZERO): Likewise.
(FP_EX_ALL): Likewise.
(FP_EX_OVERFLOW): Likewise.
(FP_EX_UNDERFLOW): Likewise.
(FP_EX_INEXACT): Likewise.
(_FP_DECL_EX): Declear default FCSR value.
(FP_RND_NEAREST): Derive from arch spec.
(FP_RND_ZERO): Likewise.
(FP_INIT_ROUNDMODE): Likewise.
(FP_ROUNDMODE): Likewise.
(FP_TRAPPING_EXCEPTIONS): Likewise.
(FP_HANDLE_EXCEPTIONS): Implement with _FPU_SETCW.

[PATCH v2] sh: Correct NaN signalling bit and propagation rules [PR111814]

As per architecture, SuperH has a reversed NaN signalling bit
vs IEEE754-2008, it also has a NaN propgation rule similar to
MIPS style.

Use mips style float format and mode for all float types, and
correct sfp-machine header accordingly.

PR target/111814

gcc/ChangeLog:

* config/sh/sh-modes.def (RESET_FLOAT_FORMAT): Use mips format.
(FLOAT_MODE): Use mips mode.

libgcc/ChangeLog:

* config/sh/sfp-machine.h (_FP_NANFRAC_B): Reverse signaling bit.
(_FP_NANFRAC_H): Likewise.
(_FP_NANFRAC_S): Likewise.
(_FP_NANFRAC_D): Likewise.
(_FP_NANFRAC_Q): Likewise.
(_FP_KEEPNANFRACP): Enable for target.
(_FP_QNANNEGATEDP): Enable for target.
(_FP_CHOOSENAN): Port from MIPS.

gcc/testsuite/ChangeLog:

* gcc.target/sh/pr111814.c: New test.

c++: minor EXPR_STMT cleanup

I think it was around PR118574 that I noticed a few cases where we were
unnecessarily wrapping a statement tree in a further EXPR_STMT. Let's avoid
that and also use finish_expr_stmt in a few places in the coroutines code
that were building EXPR_STMT directly.

gcc/cp/ChangeLog:

* coroutines.cc (coro_build_expr_stmt)
(coro_build_cvt_void_expr_stmt): Remove.
(build_actor_fn): Use finish_expr_stmt.
* semantics.cc (finish_expr_stmt): Avoid wrapping statement in
EXPR_STMT.
(finish_stmt_expr_expr): Add comment.

Alpha: Fix base block alignment calculation regression

In determination of base block alignment we only examine a COMPONENT_REF
tree node at hand without ever checking if its ultimate alignment has
been reduced by the combined offset going back to the outermost object.
Consequently cases have been observed where quadword accesses have been
produced for a memory location referring a nested struct member only
aligned to the longword boundary, causing emulation to trigger.

Address this issue by recursing into COMPONENT_REF tree nodes until the
outermost one has been reached, which is supposed to be a MEM_REF one,
accumulating the offset as we go, fixing a commit e0dae4da4c45 ("Alpha:
Also use tree information to get base block alignment") regression.

Bail out and refrain from using tree information for alignment if we end
up at something different or we are unable to calculate the offset at
any point.

gcc/
* config/alpha/alpha.cc
(alpha_get_mem_rtx_alignment_and_offset): Recurse into
COMPONENT_REF nodes.

gcc/testsuite/
* gcc.target/alpha/memcpy-nested-offset-long.c: New file.
* gcc.target/alpha/memcpy-nested-offset-quad.c: New file.

Fortran: Fix checking for IMPURE in DO CONCURRENT.

PR fortran/119836

gcc/fortran/ChangeLog:

* resolve.cc (check_pure_function): Fix checking for
an impure subprogram within a DO CONCURRENT construct.
(pure_subroutine): Ditto.

gcc/testsuite/ChangeLog:

* gfortran.dg/do_concurrent_all_clauses.f90: Remove invalid
dg-error test.
* gfortran.dg/pr119836_1.f90: New test.
* gfortran.dg/pr119836_2.f90: New test.
* gfortran.dg/pr119836_3.f90: New test.
* gfortran.dg/pr119836_4.f90: New test.

Daily bump.

Fix time zone for 'cobol.dg/group2/FUNCTION_DATE___TIME_OMNIBUS.cob' [PR119818]

This progresses:

    PASS: cobol.dg/group2/FUNCTION_DATE___TIME_OMNIBUS.cob   -O0  (test for excess errors)
    [-FAIL:-]{+PASS:+} cobol.dg/group2/FUNCTION_DATE___TIME_OMNIBUS.cob   -O0  execution test
    [Etc.]

PR cobol/119818
gcc/testsuite/
* cobol.dg/group2/FUNCTION_DATE___TIME_OMNIBUS.cob:
'dg-set-target-env-var TZ UTC0'.

libstdc++: Add _GLIBCXX_DEBUG checks on unordered container local_iterator

Complete tests on _GLIBCXX_DEBUG checks in include/debug/safe_local_iterator.h.

Fix several tests not testing the container corresponding to their location in the
testsuite directory.

libstdc++-v3/ChangeLog:

* testsuite/util/debug/unordered_checks.h (fill_container): New helper method.
(use_erased_local_iterator, invalid_local_iterator_pre_increment)
(invalid_local_iterator_post_increment, invalid_local_iterator_compare)
(invalid_local_iterator_range): Use latter.
(fill_and_get_local_iterator): New, use fill_container.
(use_invalid_local_iterator): Use latter.
(invalid_local_iterator_arrow_operator): New test function.
(invalid_local_iterator_copy_instantiation): New test function.
(invalid_local_iterator_move_instantiation): New test function.
(invalid_local_iterator_copy_assignment): New test function.
(invalid_local_iterator_move_assignment): New test function.
(invalid_local_iterator_const_conversion): New test function.
* testsuite/23_containers/unordered_map/debug/invalid_local_iterator_arrow_operator_neg.cc:
New test case.
* testsuite/23_containers/unordered_map/debug/invalid_local_iterator_const_conversion_neg.cc:
New test case.
* testsuite/23_containers/unordered_map/debug/invalid_local_iterator_copy_assignment_neg.cc:
New test case.
* testsuite/23_containers/unordered_map/debug/invalid_local_iterator_copy_construction_neg.cc:
New test case.
* testsuite/23_containers/unordered_map/debug/invalid_local_iterator_move_assignment_neg.cc:
New test case.
* testsuite/23_containers/unordered_map/debug/invalid_local_iterator_move_construction_neg.cc:
New test case.
* testsuite/23_containers/unordered_map/debug/max_load_factor_neg.cc: Test unordered_map.
* testsuite/23_containers/unordered_multimap/debug/begin2_neg.cc: Test unordered_multimap.
* testsuite/23_containers/unordered_multimap/debug/bucket_size_neg.cc: Likewise.
* testsuite/23_containers/unordered_multimap/debug/cbegin_neg.cc: Likewise.
* testsuite/23_containers/unordered_multimap/debug/cend_neg.cc: Likewise.
* testsuite/23_containers/unordered_multimap/debug/end1_neg.cc: Likewise.
* testsuite/23_containers/unordered_multimap/debug/end2_neg.cc: Likewise.
* testsuite/23_containers/unordered_multimap/debug/invalid_local_iterator_arrow_operator_neg.cc:
New test case.
* testsuite/23_containers/unordered_multimap/debug/invalid_local_iterator_const_conversion_neg.cc:
New test case.
* testsuite/23_containers/unordered_multimap/debug/invalid_local_iterator_copy_assignment_neg.cc:
New test case.
* testsuite/23_containers/unordered_multimap/debug/invalid_local_iterator_copy_construction_neg.cc:
New test case.
* testsuite/23_containers/unordered_multimap/debug/invalid_local_iterator_move_assignment_neg.cc:
New test case.
* testsuite/23_containers/unordered_multimap/debug/invalid_local_iterator_move_construction_neg.cc:
New test case.
* testsuite/23_containers/unordered_multimap/debug/max_load_factor_neg.cc:
Test unordered_multimap.
* testsuite/23_containers/unordered_multiset/debug/invalid_local_iterator_arrow_operator_neg.cc:
New test case.
* testsuite/23_containers/unordered_multiset/debug/invalid_local_iterator_const_conversion_neg.cc:
New test case.
* testsuite/23_containers/unordered_multiset/debug/invalid_local_iterator_copy_assignment_neg.cc:
New test case.
* testsuite/23_containers/unordered_multiset/debug/invalid_local_iterator_copy_construction_neg.cc:
New test case.
* testsuite/23_containers/unordered_multiset/debug/invalid_local_iterator_move_assignment_neg.cc:
New test case.
* testsuite/23_containers/unordered_multiset/debug/invalid_local_iterator_move_construction_neg.cc:
New test case.
* testsuite/23_containers/unordered_set/debug/invalid_local_iterator_arrow_operator_neg.cc:
New test case.
* testsuite/23_containers/unordered_set/debug/invalid_local_iterator_const_conversion_neg.cc:
New test case.
* testsuite/23_containers/unordered_set/debug/invalid_local_iterator_copy_assignment_neg.cc:
New test case.
* testsuite/23_containers/unordered_set/debug/invalid_local_iterator_copy_construction_neg.cc:
New test case.
* testsuite/23_containers/unordered_set/debug/invalid_local_iterator_move_assignment_neg.cc:
New test case.
* testsuite/23_containers/unordered_set/debug/invalid_local_iterator_move_construction_neg.cc:
New test case.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

[RISC-V] Fix missed bext discovery

RISC-V has the ability to extract a single bit out of a register from a fixed
or variable position.

While looking at 502.gcc a little while ago I realize that we failed to use
bext inside bitmap_bit_p for its return value.

The core "problem" is that the RISC-V does not define SHIFT_COUNT_TRUNCATED
(for good reasons).  As a result the target is largely responsible for handling
elimination of shift count/bit position masking.

There's a follow-up patch I've been working on with an intern to improve
detection of bext in more cases.  This one stands independently though and is
probably the most important of the missed cases.

Will push to the trunk assuming pre-commit testing is green.  It's already been
through my tester as well as Ventana's internal testing.

gcc
* config/riscv/bitmanip.md (*bext<mode>_mask_pos): New pattern
for extracting a single bit at masked bit position.

gcc/testsuite

* gcc.target/riscv/bext-ext-2.c: New test

ref-temp1.C: Enable some tests for PE targets

Test for expected PE values.

Signed-off-by: Jonathan Yong <10walls@gmail.com>
gcc/testsuite/ChangeLog:

* g++.dg/abi/ref-temp1.C: Replicate some test based on
PE expectations.
* lib/target-supports.exp: New check_effective_target_pe.

DSE: Trim stores of 0 like triming stores of {} [PR87901]

This is the second part of the PR which comes from transformation
of memset into either stores of 0 (via an integral type) or stores
of {}. We already handle stores of `{}`, this just extends that to
handle of the constant 0 and treat it similarly.

PR tree-optimization/87901

gcc/ChangeLog:

* tree-ssa-dse.cc (maybe_trim_constructor_store): Add was_integer_cst argument.
Check for was_integer_cst instead of `{}` when was_integer_cst is true.
(maybe_trim_partially_dead_store): Handle INTEGER_CST stores of 0 as stores of `{}`.
Udpate call to maybe_trim_constructor_store for CONSTRUCTOR.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-dse-53.c: New test.
* gcc.dg/tree-ssa/ssa-dse-54.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

DSE: Support triming of some more memset [PR87901]

DSE has support for trimming memset (and memset like) statements.
In this case we have `MEM <unsigned char[17]> [(char * {ref-all})&z] = {};` in
the IR and when we go to trim it, we call build_fold_addr_expr which leaves around
a cast from one pointer type to another. This is due to build_fold_addr_expr
being generic but in gimple you don't need these casts.

PR tree-optimization/87901

gcc/ChangeLog:

* tree-ssa-dse.cc (maybe_trim_constructor_store): Strip over useless type
conversions after taking the address of the MEM_REF.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-dse-52.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

gimple: Canonical order for invariants [PR118902]

So unlike constants, address invariants are currently put first if
used with a SSA NAME.
It would be better if address invariants are consistent with constants
and this patch changes that.
gcc.dg/tree-ssa/pr118902-1.c is an example where this canonicalization
can help. In it if `p` variable was a global variable, FRE (VN) would have figured
it out that `a` could never be equal to `&p` inside the loop. But without the
canonicalization we end up with `&p == a.0_1` which VN does try to handle for conditional
VN.

Bootstrapped and tested on x86_64.

PR tree-optimization/118902
gcc/ChangeLog:

* fold-const.cc (tree_swap_operands_p): Place invariants in the first operand
if not used with constants.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr118902-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

gimple-fold: Improve optimize_memcpy_to_memset by walking back until aliasing says the ref is a may clobber. [PR118947]

The case here is we have:
```
    char buf[32] = {};
    void* ret = aaa();
    __builtin_memcpy(ret, buf, 32);
```

And buf does not escape.  But we don't prop the zeroing from buf to the memcpy statement
because optimize_memcpy_to_memset only looks back one statement. This can be fixed to look back
until we get an statement that may clobber the reference.  If we get a phi node, then we don't do
anything.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/118947

gcc/ChangeLog:

* gimple-fold.cc (optimize_memcpy_to_memset): Walk back until we get a
statement that may clobber the read.

gcc/testsuite/ChangeLog:

* gcc.dg/pr118947-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

gimple-fold: Improve optimize_memcpy_to_memset to handle STRING_CST [PR78408]

While looking into PR 118947, I noticed that optimize_memcpy_to_memset didn't
handle STRING_CST which are also used for a memset of 0 but for char arrays.
This fixes that and improves optimize_memcpy_to_memset to handle that case.

This fixes part of PR 118947 but not the whole thing; we still need to skip over
vdefs in some cases.

Boostrapped and tested on x86_64-linux-gnu.

PR tree-optimization/78408
PR tree-optimization/118947

gcc/ChangeLog:

* gimple-fold.cc (optimize_memcpy_to_memset): Handle STRING_CST case too.

gcc/testsuite/ChangeLog:

* gcc.dg/pr78408-3.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

[PATCH] c6x: Fix EHTYPE relocations

R_C6000_EHTYPE relocations are implemented as GOT-indirect relocations,
but, as specified by the C6000 EABI (SPRAB89A), 13.5.1 Relocation Types,
they are a special case of SBR (static base relocation).

gcc/
* config/c6x/c6x.h (ASM_PREFERRED_EH_DATA_FORMAT): Remove the
DW_EH_PE_indirect flag.

testsuite: Use int size instead of alignment for pr116357.c

The test case assumes that alignof(int)=sizeof(int). But for some
targets this is not valid. For example, for PRU target,
alignof(int)=1 but sizeof(int)=4.

Fix the test case to align to twice the size of int, as the expected
dg-error messages suggest.

This patch fixes the test failures for PRU target.

gcc/testsuite/ChangeLog:

* gcc.dg/pr116357.c: Use sizeof(int) instead of alignof(int).

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

tree-optimization/119858 - type mismatch with POINTER_PLUS

The recent PFA early-break vectorization fix left us with a POINTER_PLUS
and non-sizetype offset.

PR tree-optimization/119858
* tree-vect-loop.cc (vectorizable_live_operation): Convert
pointer offset to sizetype.

[PATCH] riscv: Add support for riscv*-gnu (GNU/Hurd on RISC-V)

This produces a toolchain that can successfully build binaries targeting
riscv*-gnu.

gcc/ChangeLog:
* config.gcc: Recognize riscv*-*-gnu* targets.
* config/riscv/gnu.h: New file.

[PATCH] [RISC-V] Tune for removal unnecessary sext in builtin overflows [PR108016]

It fixes one of the PR108016 mis-optimization.

The patch adjusts expanding for __builtin_add/sub_overflow() on RV64 targets
to avoid unnecessary sext.w instructions.

It replaces expanded for ADD/SUB_OVERFLOW code:
r141:SI=r139:DI#0+r140:DI#0 .. r143:DI=sign_extend(r141:SI)
to the followong kind of chain ->
r143:DI=sign_extend(r139:DI#0+r140:DI#0) .. r141:SI=r143:DI#0
so that sign_extend(a:SI+b:SI) to be emitted as addw (or subw) instruction,
while output r141:SI register will be placed at the end of chain without
extra dependencies, and thus could be easily optimized-out by further pipeline.

PR middle-end/108016
gcc/ChangeLog:

* config/riscv/riscv.md (addv<mode>4, uaddv<mode>4, subv<mode>4,
usubv<mode>4): Tunes for unnecessary sext.w elimination.

PR middle-end/108016
gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr108016.c: New test.

libstdc++: Clarify that _S_ prefix is be used for static member functions.

libstdc++-v3/ChangeLog:

* doc/xml/manual/appendix_contributing.xml: Add 'and functions'.

avoid-store-forwarding: Fix reg init on load-elimination [PR119160]

In the case that we are eliminating the load instruction, we use zero_extend
for the initialization of the base register for the zero-offset store.
This causes issues when the store and the load use the same mode,
as we are trying to generate a zero_extend with the same inner and
outer modes.

This patch fixes the issue by zero-extending the value stored in the
base register only when the load's mode is wider than the store's mode.

PR rtl-optimization/119160

gcc/ChangeLog:

* avoid-store-forwarding.cc (process_store_forwarding):
Zero-extend the value stored in the base register, in case
of load-elimination, only when the mode of the destination
is wider.

gcc/testsuite/ChangeLog:

* gcc.dg/pr119160.c: New test.

doc: Clarify REG_EH_REGION note usage

The documentation for the REG_EH_REGION could easily be read
(especially by non-native speakers) to indicate that it should be
attached to insn at the destination of an excpetion edge. Despite the
original text saying that the note "specifies the destination," it is
actually always attached to the source instruction.

This updates the documentation to make it clear that the REG_EH_REGION
note is always attached to instructions originating an exception edge
and that the value of the note specifies where the exception edge
leads to.

Co-Developed-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
gcc/ChangeLog:

* doc/cfg.texi: Update the exception handling section for the
REG_EH_REGION notes to make it clear that the note is attached
to the instruction throwing the exception.

LoongArch: Change {dg-do-what-default} save and restore logical.

The set of {dg-do-what-default} to 'run' may lead some test hang
during make check.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/loongarch-vector.exp: Change
{dg-do-what-default} save and restore logical.

Daily bump.

[PATCH] RISC-V: Do not free a riscv_arch_string when handling target-arch attribute

The build_target_option_node() function may return a cached node when
fndecl having the same effective global_options. Therefore, freeing
memory used in target nodes can lead to a use-after-free issue, as a
target node may be shared by multiple fndecl.
This issue occurs in gcc.target/riscv/target-attr-16.c, where all
functions have the same march, but the last function tries to free its
old x_riscv_arch_string (which is shared) when processing the second
target attribute.However, the behavior of this issue depends on how the
OS handles malloc. It's very likely that xstrdup returns the old address
just freed, coincidentally hiding the issue. We can verify the issue by
forcing xstrdup to return a new address, e.g.,

-  if (opts->x_riscv_arch_string != default_opts->x_riscv_arch_string)
-    free (CONST_CAST (void *, (const void *) opts->x_riscv_arch_string));
+  // Force it to use a new address, NFCI
+  const char *tmp = opts->x_riscv_arch_string;
   opts->x_riscv_arch_string = xstrdup (local_arch_str);

+  if (tmp != default_opts->x_riscv_arch_string)
+    free (CONST_CAST (void *, (const void *) tmp));

This patch replaces xstrdup with ggc_strdup and let gc to take care of
unused strings.

gcc/ChangeLog:

* config/riscv/riscv-target-attr.cc
(riscv_target_attr_parser::update_settings):
Do not manually free any arch string.

c++: constexpr virtual base diagnostic

I thought this diagnostic could be clearer that the problem is the
combination of virtual bases and constexpr constructor, not just complain
that the class has virtual bases without context.

gcc/cp/ChangeLog:

* constexpr.cc (is_valid_constexpr_fn): Improve diagnostic.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constexpr-dtor16.C: Adjust diagnostic.
* g++.dg/cpp2a/constexpr-dynamic10.C: Likewise.

Document peculiarities of BOOLEAN_TYPE

gcc/
* tree.def (BOOLEAN_TYPE): Add more details.

c++: constexpr new diagnostic location

Presenting the allocation location as the location of the outermost
expression we're trying to evaluate is inaccurate; let's provide both
locations.

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_outermost_constant_expr): Give both
expression and allocation location in allocated storage diagnostics.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-new.C: Adjust diagnostics.
* g++.dg/cpp1z/constexpr-asm-5.C: Likewise.
* g++.dg/cpp26/static_assert1.C: Likewise.
* g++.dg/cpp2a/constexpr-dtor7.C: Likewise.
* g++.dg/cpp2a/constexpr-new26.C: Likewise.
* g++.dg/cpp2a/constexpr-new3.C: Likewise.
* g++.dg/cpp2a/constinit14.C: Likewise.

c++: vec_safe_reserve usage tweaks

A couple of cleanups from noticing that the semantics of
std::vector<T>::reserve() (request the new minimum allocation) differ from
the GCC vec<...>::reserve() (request a minimum number of slots available).

In preserve_state, we were tripling the size of the vec when doubling it is
more than enough.

In get_tinfo_desc we were using vec_safe_reserve properly, but it's
simpler to use vec_safe_grow_cleared.

gcc/cp/ChangeLog:

* name-lookup.cc (name_lookup::preserve_state): Fix reserve call.
* rtti.cc (get_tinfo_desc): Use vec_safe_grow_cleared.

c++: improve pack index diagnostics

While looking at pack-indexing16.C I thought it would be helpful to print
the problematic type/value.

gcc/cp/ChangeLog:

* semantics.cc (finish_type_pack_element): Add more info
to diagnostics.

libstdc++-v3/ChangeLog:

* testsuite/20_util/tuple/element_access/get_neg.cc: Adjust
diagnostic.

gcc/testsuite/ChangeLog:

* g++.dg/cpp26/pack-indexing2.C: Adjust diagnostics.
* g++.dg/ext/type_pack_element2.C: Likewise.
* g++.dg/ext/type_pack_element4.C: Likewise.

c++: add assert to cp_make_fname_decl

In the PR118629 testcase, pushdecl_outermost_localscope was failing and
returning error_mark_node without ever actually giving an error; in addition
to my earlier fix for the failure, make sure failures aren't silent.

gcc/cp/ChangeLog:

* decl.cc (cp_make_fname_decl): Prevent silent failure.

c++: 'requires' diagnostic before C++20

We were giving a generic "not declared" error for a requires-expression
without concepts enabled; we can do better.

gcc/cp/ChangeLog:

* lex.cc (unqualified_name_lookup_error): Handle 'requires' better.

doc: say "compatible types" for -fstrict-aliasing

Include the term used in the standard to ease further research for users,
and while at it, rephrase the description of the rule entirely using
Alexander Monakov's suggestion: it was previously wrong (and imprecise) as
"the same address" may well be re-used later on, and the issue is the
access via an expression of the wrong type.

gcc/ChangeLog:

* doc/invoke.texi: Use "compatible types" term. Rephrase to be
more precise (and correct).

ada: bump Library_Version to 16.

gcc/ada/ChangeLog:

* gnatvsn.ads: Bump Library_Version to 16.

Update crontab and git_update_version.py

2025-04-17 Jakub Jelinek <jakub@redhat.com>

maintainer-scripts/
* crontab: Snapshots from trunk are now GCC 16 related.
Add GCC 15 snapshots from the respective branch.
contrib/
* gcc-changelog/git_update_version.py (active_refs): Add
releases/gcc-15.

Bump BASE-VER.

2025-04-17 Jakub Jelinek <jakub@redhat.com>

* BASE-VER: Set to 16.0.0.

libgomp: Don't test ompx::allocator::gnu_pinned_mem on non-linux targets.

The libgomp.c/alloc-pinned*.c test have
/* { dg-skip-if "Pinning not implemented on this host" { ! *-*-linux-gnu* } } */
so they are only run on Linux targets right now. Duplicating the tests or
reworking them into headers looked like too much work for me right now this
late in stage4, so I've just #ifdefed the uses at least for now.

2025-04-17 Jakub Jelinek <jakub@redhat.com>

PR libgomp/119849
* testsuite/libgomp.c++/allocator-1.C (test_inequality, main): Guard
ompx::allocator::gnu_pinned_mem uses with #ifdef __gnu_linux__.
* testsuite/libgomp.c++/allocator-2.C (main): Likewise.

libstdc++: Fixed signed comparision in _M_parse_fill_and_align [PR119840]

Explicitly cast elements of __not_fill to _CharT. Only '{' and ':'
are used as `__not_fill`, so they are never negative.

PR libstdc++/119840

libstdc++-v3/ChangeLog:

* include/std/format (_M_parse_fill_and_align): Cast elements of
__not_fill to _CharT.

middle-end: fix masking for partial vectors and early break [PR119351]

The following testcase shows an incorrect masked codegen:

#define N 512
#define START 1
#define END 505

int x[N] __attribute__((aligned(32)));

int __attribute__((noipa))
foo (void)
{
  int z = 0;
  for (unsigned int i = START; i < END; ++i)
    {
      z++;
      if (x[i] > 0)
        continue;

      return z;
    }
  return -1;
}

notice how there's a continue there instead of a break.  This means we generate
a control flow where success stays within the loop iteration:

  mask_patt_9.12_46 = vect__1.11_45 > { 0, 0, 0, 0 };
  vec_mask_and_47 = mask_patt_9.12_46 & loop_mask_41;
  if (vec_mask_and_47 == { -1, -1, -1, -1 })
    goto <bb 4>; [41.48%]
  else
    goto <bb 15>; [58.52%]

However when loop_mask_41 is a partial mask this comparison can lead to an
incorrect match.  In this case the mask is:

  # loop_mask_41 = PHI <next_mask_63(6), { 0, -1, -1, -1 }(2)>

due to peeling for alignment with masking and compiling with
-msve-vector-bits=128.

At codegen time we generate:

ptrue   p15.s, vl4
ptrue   p7.b, vl1
not     p7.b, p15/z, p7.b
.L5:
ld1w    z29.s, p7/z, [x1, x0, lsl 2]
cmpgt   p7.s, p7/z, z29.s, #0
not     p7.b, p15/z, p7.b
ptest   p15, p7.b
b.none  .L2
...<early exit>...

Here the basic blocks are rotated and a not is generated.
But the generated not is unmasked (or predicated over an ALL true mask in this
case).  This has the unintended side-effect of flipping the results of the
inactive lanes (which were zero'd by the cmpgt) into -1.  Which then incorrectly
causes us to not take the branch to .L2.

This is happening because we're not comparing against the right value for the
forall case.  This patch gets rid of the forall case by rewriting the
if(all(mask)) into if (!all(mask)) which is the same as if (any(~mask)) by
negating the masks and flipping the branches.

1. For unmasked loops we simply reduce the ~mask.
2. For masked loops we reduce (~mask & loop_mask) which is the same as
   doing (mask & loop_mask) ^ loop_mask.

For the above we now generate:

.L5:
        ld1w    z28.s, p7/z, [x1, x0, lsl 2]
        cmple   p7.s, p7/z, z28.s, #0
        ptest   p15, p7.b
        b.none  .L2

This fixes gromacs with > 1 OpenMP threads and improves performance.

gcc/ChangeLog:

PR tree-optimization/119351
* tree-vect-stmts.cc (vectorizable_early_exit): Mask both operands of
the gcond for partial masking support.

gcc/testsuite/ChangeLog:

PR tree-optimization/119351
* gcc.target/aarch64/sve/pr119351.c: New test.
* gcc.target/aarch64/sve/pr119351_run.c: New test.

libstdc++: Do not use 'not' alternative token in <format>

This fixes:
FAIL: 17_intro/headers/c++1998/operator_names.cc  -std=gnu++23 (test for excess errors)
FAIL: 17_intro/headers/c++1998/operator_names.cc  -std=gnu++26 (test for excess errors)

The purpose of 'not defined<format_kind<R>>' is to be ill-formed (as
required by [format.range.fmtkind]) and to give an error that includes
the string "not defined<format_kind<R>>". That was intended to tell you
that format_kind<R> is not defined, just like it says!

But user code can use -fno-operator-names so we can't use 'not' here,
and "! defined" in the diagnostic doesn't seem as user-friendly. It also
raises questions about whether it was intended to be the preprocessor
token 'defined' (it's not) or where 'defined' is defined (it's not).

Replace it with __primary_template_not_defined<format_kind<R>> and a
comment, which seems to give a fairly clear diagnostic with both GCC and
Clang. The diagnostic now looks like:

.../include/c++/15.0.1/format:5165:7: error: use of 'std::format_kind<int>' before deduction of 'auto'
5165 |       format_kind<_Rg> // you can specialize this for non-const input ranges
      |       ^~~~~~~~~~~~~~~~
.../include/c++/15.0.1/format:5164:35: error: '__primary_template_not_defined' was not declared in this scope
5164 |     __primary_template_not_defined(
      |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
5165 |       format_kind<_Rg> // you can specialize this for non-const input ranges
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5166 |     );
      |     ~

libstdc++-v3/ChangeLog:

* include/std/format (format_kind): Do not use 'not'
alternative token to make the primary template ill-formed. Use
the undeclared identifier __primary_template_not_defined and a
comment that will appear in diagnostics.
* testsuite/std/format/ranges/format_kind_neg.cc: New test.

s390: Use match_scratch instead of scratch in define_split [PR119834]

The following testcase ICEs since r15-1579 (addition of late combiner),
because *clrmem_short can't be split.
The problem is that the define_insn uses
   (use (match_operand 1 "nonmemory_operand" "n,a,a,a"))
   (use (match_operand 2 "immediate_operand" "X,R,X,X"))
   (clobber (match_scratch:P 3 "=X,X,X,&a"))
and define_split assumed that if operands[1] is const_int_operand,
match_scratch will be always scratch, and it will be reg only if
it was the last alternative where operands[1] is a reg.
The pattern doesn't guarantee it though, of course RA will not try to
uselessly assign a reg there if it is not needed, but during RA
on the testcase below we match the last alternative, but then comes
late combiner and propagates const_int 3 into operands[1].  And that
matches fine, match_scratch matches either scratch or reg and the constraint
in that case is X for the first variant, so still just fine.  But we won't
split that because the splitters only expect scratch.

The following patch fixes it by using match_scratch instead of scratch,
so that it accepts either.

2025-04-17  Jakub Jelinek  <jakub@redhat.com>

PR target/119834
* config/s390/s390.md (define_split after *cpymem_short): Use
(clobber (match_scratch N)) instead of (clobber (scratch)).  Use
(match_dup 4) and operands[4] instead of (match_dup 3) and operands[3]
in the last of those.
(define_split after *clrmem_short): Use (clobber (match_scratch N))
instead of (clobber (scratch)).
(define_split after *cmpmem_short): Likewise.

* g++.target/s390/pr119834.C: New test.

libstdc++: Remove dead code in range_formatter::format [PR109162]

Because the _M_format(__rg, __fc) were placed outside of if constexpr,
these method and its children where instantiated, even if
_M_format<const _Range> could be used.

To simplify the if constexpr chain, we introduce a __simply_formattable_range
(name based on simple-view) exposition only concept, that checks if range is
const and mutable formattable and uses same formatter specialization for
references in each case.

PR libstdc++/109162

libstdc++-v3/ChangeLog:

* include/std/format (__format::__simply_formattable_range): Define.
(range_formatter::format): Do not instantiate _M_format for mutable
_Rg if const _Rg can be used.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>