git.ipfire.org Git - thirdparty/gcc.git/log

fortran: Fix closing brace in comment

In a comment, fix the closing brace of the tree layout definition of the
openmp allocate clause. It was confusing vim's matching brace support.

gcc/fortran/ChangeLog:

* trans-decl.cc (gfc_trans_deferred_vars): Fix closing brace in
a comment.

Properly record SLP node when costing a vectorized store

Even when we emit scalar stores we should pass down the SLP node.

PR tree-optimization/121350
* tree-vect-stmts.cc (vectorizable_store): Pass down SLP
node when costing scalar stores in vect_body.

Avoid representing SLP mask by scalar op

The following removes the scalar mask output from vect_check_scalar_mask
and deals with the fallout, eliminating uses of it. That's mostly
replacing checks on 'mask' by checks on 'mask_node' but also realizing
PR121349 and fixing that up a bit in check_load_store_for_partial_vectors.

PR tree-optimization/121349
* tree-vect-stmts.cc (check_load_store_for_partial_vectors):
Get full SLP mask, reduce to uniform scalar_mask for further
processing if possible.
(vect_check_scalar_mask): Remove scalar mask output, remove
code conditional on slp_mask.
(vectorizable_call): Adjust.
(check_scan_store): Get and check SLP mask.
(vectorizable_store): Eliminate scalar mask variable.
(vectorizable_load): Likewise.

doc: mdocml.bsd.lv is now mandoc.bsd.lv

Along the way, switch from http to https.

gcc:
* doc/install.texi (Prerequisites): mdocml.bsd.lv is now
mandoc.bsd.lv.

Merge get_group_load_store_type into get_load_store_type

The following merges get_group_load_store_type back into
get_load_store_type, which makes it easier to follow.  I've
removed the unused ncopies parameter as well.

* tree-vect-stmts.cc (get_group_load_store_type): Remove,
inline into ...
(get_load_store_type): ... this. Remove ncopies parameter.
(vectorizable_load): Adjust.
(vectorizable_store): Likewise.

Some TLC to vectorizable_store

The following removes redundant checks and scalar operand uses.

* tree-vect-stmts.cc (get_group_load_store_type): Remove
checks performed at SLP build time.
(vect_check_store_rhs): Remove scalar RHS output.
(vectorizable_store): Remove uses of scalar RHS.

Add VMAT_UNINITIALIZED

We're using VMAT_INVARIANT as default, but we should simply have
an uninitialized state.

* tree-vectorizer.h (VMAT_UNINITIALIZED): New
vect_memory_access_type.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Use it.

tree-optimization/121338 - UBSAN error in adjust_setup_cost

The following avoids possibly overflowing adds for rounding.  We
know cost is bounded, so it's enough to do this simple test.

PR tree-optimization/121338
* tree-ssa-loop-ivopts.cc (avg_loop_niter): Return an
unsigned.
(adjust_setup_cost): When niters is so large the division
result is one or zero avoid it.
(create_new_ivs): Adjust.

Put SLP_TREE_SIMD_CLONE_INFO into type specific data

The following adds vect_simd_clone_data as a container for vect
type specific data for vectorizable_simd_clone_call and moves
SLP_TREE_SIMD_CLONE_INFO there.

* tree-vectorizer.h (vect_simd_clone_data): New.
(_slp_tree::simd_clone_info): Remove.
(SLP_TREE_SIMD_CLONE_INFO): Likewise.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Adjust.
(_slp_tree::~_slp_tree): Likewise.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Use
type specific data to store SLP_TREE_SIMD_CLONE_INFO.

Use a class hierarchy for vect specific data

The following turns the union into a class hierarchy.  Once completed,
SLP_TREE_TYPE could move into the base class.

* tree-vect-slp.cc (_slp_tree::_slp_tree): Adjust.
(_slp_tree::~_slp_tree): Likewise.
* tree-vectorizer.h (vect_data): New base class.
(_slp_tree::u): Remove.
(_slp_tree::data): Add pointer to vect_data.
(_slp_tree::get_data): New helper template.

bswap: Fix up ubsan detected UB in find_bswap_or_nop [PR121322]

The following testcase results in compiler UB as detected by ubsan.
find_bswap_or_nop first checks is_bswap_or_nop_p and if that fails
on the tmp_n value, tries some rotation of that if possible.
The discovery of which rotate count to use ignores zero bytes at
the least significant end (those are known to be zero and so can be
masked away); on the first non-zero, non-0xff byte (0xff means don't
know, 1-8 means some particular byte of the original), it computes count
(the rotation count) from that byte plus the byte index.
Now, on the following testcase we have tmp_n 0x403020105060700, i.e.
the least significant byte is zero, then the msb from the original value,
byte below it, another one below it, then the low 32 bits of the original
value.  So, we stop at count 7 with i 1, it wraps around and we get count
0.
Then we invoke UB on
          tmp_n = tmp_n >> count | tmp_n << (range - count);
because count is 0 and range is 64.
Now, of course I could fix it up by doing tmp_n << ((range - count) % range)
or something similar, but that is just wasted compile time: if count is 0,
we already know that is_bswap_or_nop_p failed on that tmp_n value, and
so it will fail again if the value is the same.  So I think it is better
to just return NULL (i.e. punt).
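A minimal C sketch of the hazard, assuming a 64-bit value and a hypothetical helper `rotate_right_or_punt` (not the code in gimple-ssa-store-merging.cc): a right rotate written as `x >> count | x << (range - count)` shifts by the full width when `count` is 0, which is undefined behavior in C. Following the reasoning above, the sketch reports failure for a zero count instead of computing a rotation that could only reproduce the value that already failed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: rotate X right by COUNT bits within a 64-bit range.
   Returns false (punts) when COUNT is 0 or out of range, because
   X << (64 - 0) would shift by the full type width, which is UB.  */
static bool
rotate_right_or_punt (uint64_t x, unsigned count, uint64_t *out)
{
  const unsigned range = 64;
  if (count == 0 || count >= range)
    return false;  /* A zero rotation cannot produce a new candidate.  */
  *out = x >> count | x << (range - count);
  return true;
}
```

For example, rotating the testcase value 0x0403020105060700 right by 8 yields 0x0004030201050607, while a count of 0 simply punts.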

2025-08-01  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/121322
* gimple-ssa-store-merging.cc (find_bswap_or_nop): Return NULL if
count is 0.

* gcc.dg/pr121322.c: New test.

MAINTAINERS: Update my e-mail address.

ChangeLog/
* MAINTAINERS: Update my e-mail address.

c++/modules: Warn for optimize attributes instead of ICEing [PR108080]

This PR is the most frequently reported modules bug for GCC 15, as the ICE
message does not indicate the issue at all and reducing to find the
underlying cause can be tricky.

I have a WIP patch to fix this issue by just reconstructing these nodes
on stream-in from any attributes applied to the functions, but since at
this stage it may still take a while to be ready, it seems useful to me
to at least make the error here more friendly and guide users to what
they could do to work around this issue.

In fact, as noted on the PR, a lot of the time it should be harmless to
just ignore the optimize etc. attribute and continue translation, at the
user's own risk; this patch therefore turns the ICE into a warning, with no
option to silence it.

PR c++/108080

gcc/cp/ChangeLog:

* module.cc (trees_out::core_vals): Warn when streaming
target/optimize node; adjust comments.
(trees_in::core_vals): Don't stream a target/optimize node.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr108080.H: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>

c++/modules: Merge PARM_DECL properties from function definitions [PR121238]

When we merge a function definition, if there already exists a forward
declaration in the importing TU we use the PARM_DECLs belonging to that
decl. This usually works fine, except as noted in the linked PR there
are some flags (such as TREE_ADDRESSABLE) that only get set on a
PARM_DECL once a definition is provided.

This patch fixes the wrong-code issues by propagating any properties on
PARM_DECLs I could find that may affect codegen.

PR c++/121238

gcc/cp/ChangeLog:

* module.cc (trees_in::fn_parms_fini): Merge properties for
definitions.

gcc/testsuite/ChangeLog:

* g++.dg/modules/merge-19.h: New test.
* g++.dg/modules/merge-19_a.H: New test.
* g++.dg/modules/merge-19_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>

Daily bump.

PR modula2/121314: quotes appearing in concatenated error strings

This patch fixes the addition of strings so that no extraneous quotes
appear in the result string. The fix is made to the bootstrap tool mc
and it has been rebuilt.

gcc/m2/ChangeLog:

PR modula2/121314
* mc-boot/GFormatStrings.cc (PerformFormatString): Rebuilt.
* mc-boot/GM2EXCEPTION.cc (M2EXCEPTION_M2Exception): Rebuilt.
* mc-boot/GSFIO.cc (SFIO_GetFileName): Rebuilt.
* mc-boot/GSFIO.h (SFIO_GetFileName): Rebuilt.
* mc-boot/Gdecl.cc: Rebuilt.
* mc-boot/GmcFileName.h: Rebuilt.
* mc/decl.mod (getStringChar): New procedure function.
(getStringContents): Call getStringChar.
(addQuotes): New procedure function.
(foldBinary): Call addQuotes to add delimiting quotes
to the new string.

gcc/testsuite/ChangeLog:

PR modula2/121314
* gm2/errors/fail/badindrtype.mod: New test.
* gm2/errors/fail/badindrtype2.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

fortran: Evaluate class function bounds in the scalarizer [PR121342]

There is code in gfc_conv_procedure_call that, for polymorphic
functions, initializes the scalarization array descriptor
information and forcefully sets loop bounds.  This code is changing
the decisions made by the scalarizer behind its back, and the test shows
an example where the consequences are (badly) visible.  In the test, for
one of the actual arguments to an elemental subroutine, an offset to the
loop variable is missing to access the array, as it was the one
originally chosen to set the loop bounds from.

This could theoretically be fixed by just clearing the array of choice
for the loop bounds.  This change takes instead the harder path of
adding the missing information to the scalarizer's knowledge so that its
decision doesn't need to be forced to something else after the fact.
The array descriptor information initialisation for polymorphic
functions is moved to gfc_add_loop_ss_code (after the function call
generation), and the loop bounds initialization to a new function called
after that.

As the array chosen to set the loop bounds from is no longer forced
to be the polymorphic function result, we have to let the scalarizer set
a delta for polymorphic function results.  For regular non-polymorphic
function result arrays, they are zero-based and the temporary creation
makes the loop zero-based as well, so we can continue to skip the delta
calculation.

In the cases where a temporary is created to store the result of the
array function, the creation of the temporary shifts the loop bounds
to be zero-based.  As there was no delta for polymorphic result arrays,
the function result descriptor offset was set to zero in that case for
a zero-based array reference to be correct.  Now that the scalarizer
sets a delta, those forced offset updates have to go because they can
make the descriptor invalid and cause erroneous array references.

PR fortran/121342

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_conv_subref_array_arg): Remove offset
update.
(gfc_conv_procedure_call): For polymorphic functions, move the
scalarizer descriptor information...
* trans-array.cc (gfc_add_loop_ss_code): ... here, and evaluate
the bounds to fresh variables.
(get_class_info_from_ss): Remove offset update.
(gfc_conv_ss_startstride): Don't set a zero value for function
result upper bounds.
(late_set_loop_bounds): New.
(gfc_conv_loop_setup): If the bounds of a function result have
been set, and no other array provided loop bounds for a
dimension, use the function result bounds as loop bounds for
that dimension.
(gfc_set_delta): Don't skip delta setting for polymorphic
function results.

gcc/testsuite/ChangeLog:

* gfortran.dg/class_elemental_1.f90: New test.

AVR: avr.opt.urls: Add -mfuse-move2

PR rtl-optimization/121340
gcc/
* config/avr/avr.opt.urls (-mfuse-move2): Add url.

AVR: Set .type of jump table label.

gcc/
* config/avr/avr.cc (avr_output_addr_vec) <labl>: Asm out its .type.

AVR: rtl-optimization/121340 - New mini-pass to undo superfluous moves from insn combine.

Insn combine may come up with superfluous reg-reg moves, where the combine
people say that these are no problem since reg-alloc is supposed to optimize
them.  The issue is that the lower-subreg pass sitting between combine and
reg-alloc may split such moves, coming up with a zoo of subregs which are
only handled poorly by the register allocator.

This patch adds a new avr mini-pass that handles such cases.

As an example, take

int f_ffssi (long x)
{
    return __builtin_ffsl (x);
}

where the two functions have the same interface, i.e. there are no extra
moves required for the argument or for the return value. However,

$ avr-gcc -S -Os -dp -mno-fuse-move ...

f_ffssi:
mov r20,r22 ;  29 [c=4 l=1]  movqi_insn/0
mov r21,r23 ;  30 [c=4 l=1]  movqi_insn/0
mov r22,r24 ;  31 [c=4 l=1]  movqi_insn/0
mov r23,r25 ;  32 [c=4 l=1]  movqi_insn/0
mov r25,r23 ;  33 [c=4 l=4]  *movsi/0
mov r24,r22
mov r23,r21
mov r22,r20
rcall __ffssi2 ;  34 [c=16 l=1]  *ffssihi2.libgcc
ret ;  37 [c=0 l=1]  return

where all the moves add up to a no-op.  The -mno-fuse-move option
stops any attempts by the avr backend to clean up that mess.
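To check that the sequence really is a no-op on the argument, the following C sketch (a simulation, not avr-gcc code) models the register file as a byte array and replays insns 29-33 in the order printed above.  It assumes, per the avr-gcc calling convention, that the 4-byte `long` argument arrives in r22-r25; r20 and r21 are only used as scratch by these moves.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Register file r0..r31 modelled as bytes; only r20..r25 matter here.  */
typedef struct { uint8_t r[32]; } regfile;

static void
mov (regfile *rf, int dst, int src)
{
  rf->r[dst] = rf->r[src];
}

/* Replays insns 29-33 from the listing (everything before the rcall).  */
static void
replay_moves (regfile *rf)
{
  /* 29-32: movqi_insn, copy r22..r25 down to r20..r23.  */
  mov (rf, 20, 22);
  mov (rf, 21, 23);
  mov (rf, 22, 24);
  mov (rf, 23, 25);
  /* 33: *movsi, copy r20..r23 back up to r22..r25, high byte first.  */
  mov (rf, 25, 23);
  mov (rf, 24, 22);
  mov (rf, 23, 21);
  mov (rf, 22, 20);
}

/* True if the argument bytes in r22..r25 are unchanged by the moves.  */
static bool
argument_preserved (const uint8_t arg[4])
{
  regfile rf;
  memset (&rf, 0, sizeof rf);
  memcpy (&rf.r[22], arg, 4);
  replay_moves (&rf);
  return memcmp (&rf.r[22], arg, 4) == 0;
}
```

Any 4-byte argument comes out of `argument_preserved` unchanged, so all eight instructions could be deleted.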

PR rtl-optimization/121340
gcc/
* config/avr/avr.opt (-mfuse-move2): New option.
* config/avr/avr-passes.def (avr_pass_2moves): Insert after combine.
* config/avr/avr-passes.cc (make_avr_pass_2moves): New function.
(pass_data avr_pass_data_2moves): New static variable.
(avr_pass_2moves): New rtl_opt_pass.
* config/avr/avr-protos.h (make_avr_pass_2moves): New proto.
* common/config/avr/avr-common.cc
(default_options avr_option_optimization_table) <-mfuse-move2>:
Set for -O1 and higher.
* doc/invoke.texi (AVR Options) <-mfuse-move2>: Document.

c++: constexpr, array, private ctor [PR120800]

Here cxx_eval_vec_init_1 wants to recreate the default constructor call that
we previously built and threw away in build_vec_init_elt, but we aren't in
the same access context at this point. Since we already checked access,
let's just suppress access control here.

Redoing overload resolution at constant evaluation time is sketchy, but
should usually be fine for a default/copy constructor.

PR c++/120800

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_vec_init_1): Suppress access control.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-array30.C: New test.

Revert "Ada: Add System.C_Time and GNAT.C_Time units to libgnat"

This reverts commit 41974d6ed349507ca1532629851b7b5d74f44abc.

Ada: Fix miscompilation of GNAT tools with -march=znver3

The throw and catch sides of the Ada exception machinery disagree about
the BIGGEST_ALIGNMENT setting.

gcc/ada/
PR ada/120440
* gcc-interface/Makefile.in (GNATLINK_OBJS): Add s-excmac.o.
(GNATMAKE_OBJS): Likewise.

Ada: Add System.C_Time and GNAT.C_Time units to libgnat

The first unit provides the time_t, timeval and timespec types corresponding
to the C types defined by the OS, as well as various conversion functions.

The second unit is a mere renaming of the first under the GNAT hierarchy.

This removes C time types and conversions under System, and from bodies and
private parts under GNAT, while keeping visible types and conversions under
GNAT as Obsolescent.

[changelog]
PR ada/114065
* Makefile.rtl (GNATRTL_NONTASKING_OBJS): Add g-c_time$(objext) and
s-c_time$(objext).
(Aarch64/Android): Do not use s-osinte__android.adb.
(SPARC/Solaris): Do not use s-osprim__solaris.adb.
(x86/Solaris): Likewise.
(LynxOS178): Do not use s-parame__posix2008.ads.
(RTEMS): Likewise.
(x32/Linux): Likewise, as well as s-linux__x32.ads. Replace
s-osprim__x32.adb with s-osprim__posix.adb.
(LIBGNAT_OBJS): Remove cal.o.
* cal.c: Delete.
* doc/gnat_rm/the_gnat_library.rst (GNAT.C_Time): New entry.
(GNAT.Calendar): Do not mention the obsolete conversion functions.
* impunit.adb (Non_Imp_File_Names_95): Add g-c_time.
* libgnarl/a-exetim__posix.adb: Add with clause for System.C_Time
(Clock): Use type and functions from System.C_Time.
* libgnarl/s-linux.ads: Remove with clause for System.Parameters.
Remove declarations of C time types.
* libgnarl/s-linux__alpha.ads: Likewise.
* libgnarl/s-linux__android-aarch64.ads: Likewise.
* libgnarl/s-linux__android-arm.ads: Likewise.
* libgnarl/s-linux__hppa.ads: Likewise.
* libgnarl/s-linux__loongarch.ads: Likewise.
* libgnarl/s-linux__mips.ads: Likewise.
* libgnarl/s-linux__riscv.ads: Likewise.
* libgnarl/s-linux__sparc.ads: Likewise.
* libgnarl/s-osinte__aix.ads: Likewise.
* libgnarl/s-osinte__android.ads: Likewise.
* libgnarl/s-osinte__cheribsd.ads: Likewise.
* libgnarl/s-osinte__darwin.ads: Likewise.
* libgnarl/s-osinte__dragonfly.ads: Likewise.
* libgnarl/s-osinte__freebsd.ads: Likewise.
* libgnarl/s-osinte__gnu.ads: Likewise.
* libgnarl/s-osinte__hpux.ads: Likewise.
* libgnarl/s-osinte__kfreebsd-gnu.ads: Likewise.
* libgnarl/s-osinte__linux.ads: Likewise.
* libgnarl/s-osinte__lynxos178e.ads: Likewise.
* libgnarl/s-osinte__qnx.ads: Likewise.
* libgnarl/s-osinte__rtems.ads: Likewise.
* libgnarl/s-osinte__solaris.ads: Likewise.
* libgnarl/s-osinte__vxworks.ads: Likewise.
* libgnarl/s-qnx.ads: Likewise.
* libgnarl/s-linux__x32.ads: Delete.
* libgnarl/s-osinte__darwin.adb (To_Duration): Remove.
(To_Timespec): Likewise.
* libgnarl/s-osinte__aix.adb: Likewise.
* libgnarl/s-osinte__dragonfly.adb: Likewise.
* libgnarl/s-osinte__freebsd.adb: Likewise.
* libgnarl/s-osinte__gnu.adb: Likewise.
* libgnarl/s-osinte__lynxos178.adb: Likewise.
* libgnarl/s-osinte__posix.adb: Likewise.
* libgnarl/s-osinte__qnx.adb: Likewise.
* libgnarl/s-osinte__rtems.adb: Likewise.
* libgnarl/s-osinte__solaris.adb: Likewise.
* libgnarl/s-osinte__vxworks.adb: Likewise.
* libgnarl/s-osinte__x32.adb: Likewise.
* libgnarl/s-taprop__solaris.adb: Add with clause for System.C_Time.
(Monotonic_Clock): Use type and functions from System.C_Time.
(RT_Resolution): Likewise.
(Timed_Sleep): Likewise.
(Timed_Delay): Likewise.
* libgnarl/s-taprop__vxworks.adb: Likewise.
* libgnarl/s-tpopmo.adb: Likewise.
* libgnarl/s-osinte__android.adb: Delete.
* libgnat/g-c_time.ads: New file.
* libgnat/g-calend.adb: Delegate to System.C_Time.
* libgnat/g-calend.ads: Likewise.
* libgnat/g-socket.adb: Likewise.
* libgnat/g-socthi.adb: Likewise.
* libgnat/g-socthi__vxworks.adb: Likewise.
* libgnat/g-sothco.ads: Likewise.
* libgnat/g-spogwa.adb: Likewise.
* libgnat/s-c_time.adb: New file.
* libgnat/s-c_time.ads: Likewise.
* libgnat/s-optide.adb: Import nanosleep here.
* libgnat/s-os_lib.ads (time_t): Remove.
(To_Ada): Adjust.
(To_C): Likewise.
* libgnat/s-os_lib.adb: Likewise.
* libgnat/s-osprim__darwin.adb: Delegate to System.C_Time.
* libgnat/s-osprim__posix.adb: Likewise.
* libgnat/s-osprim__posix2008.adb: Likewise.
* libgnat/s-osprim__rtems.adb: Likewise.
* libgnat/s-osprim__unix.adb: Likewise.
* libgnat/s-osprim__solaris.adb: Delete.
* libgnat/s-osprim__x32.adb: Likewise.
* libgnat/s-parame.ads (time_t_bits): Remove.
* libgnat/s-parame__hpux.ads: Likewise.
* libgnat/s-parame__vxworks.ads: Likewise.
* libgnat/s-parame__posix2008.ads: Delete.
* s-oscons-tmplt.c (SIZEOF_tv_nsec): New constant.

c++: consteval blocks

This patch implements consteval blocks, as specified by P2996.
They aren't very useful without define_aggregate, but having
a reviewed implementation on trunk would be great.

consteval {} can be anywhere where a member-declaration or
block-declaration can be.  The expression corresponding to it is:

  [] -> void static consteval compound-statement ()

and it must be a constant expression.

I've used cp_parser_lambda_expression to take care of most of the
parsing.  Since a consteval block can find itself in a template, we
need a vehicle to carry the block for instantiation.  Rather than
inventing a new tree, I'm using STATIC_ASSERT.

A consteval block can't return a value but that is checked by virtue
of the lambda having a void return type.

PR c++/120775

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_outermost_constant_expr): Use
extract_call_expr.
* cp-tree.h (CONSTEVAL_BLOCK_P, LAMBDA_EXPR_CONSTEVAL_BLOCK_P): Define.
(finish_static_assert): Adjust declaration.
(current_nonlambda_function): Likewise.
* lambda.cc (current_nonlambda_function): New parameter.  Only keep
iterating if the function represents a consteval block.
* parser.cc (cp_parser_lambda_expression): New parameter for
consteval blocks.  Use it.  Set LAMBDA_EXPR_CONSTEVAL_BLOCK_P.
(cp_parser_lambda_declarator_opt): Likewise.
(build_empty_string): New.
(cp_parser_next_tokens_are_consteval_block_p): New.
(cp_parser_consteval_block): New.
(cp_parser_block_declaration): Handle consteval blocks.
(cp_parser_static_assert): Use build_empty_string.
(cp_parser_member_declaration): Handle consteval blocks.
* pt.cc (tsubst_stmt): Adjust a call to finish_static_assert.
* semantics.cc (finish_fname): Warn for consteval blocks.
(finish_static_assert): New parameter for consteval blocks.  Set
CONSTEVAL_BLOCK_P.  Evaluate consteval blocks specially.

gcc/testsuite/ChangeLog:

* g++.dg/cpp26/consteval-block1.C: New test.
* g++.dg/cpp26/consteval-block2.C: New test.
* g++.dg/cpp26/consteval-block3.C: New test.
* g++.dg/cpp26/consteval-block4.C: New test.
* g++.dg/cpp26/consteval-block5.C: New test.
* g++.dg/cpp26/consteval-block6.C: New test.
* g++.dg/cpp26/consteval-block7.C: New test.
* g++.dg/cpp26/consteval-block8.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

RISC-V: Add testcases for signed avg ceil vx combine

The signed avg ceil shares the vaadd.vx for the vx combine,
so add the test case to make sure it works as expected.

The following test suites passed for this patch series:
* The rv64gcv full regression test.
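For reference, the scalar operation being combined can be sketched in C as below.  This is an assumption about the shape of the pattern, since the actual test macros in vx_binary.h are not shown here: a signed average rounded up, computed in a wider type so that `a + b + 1` cannot overflow, with one operand a loop-invariant scalar, which is what makes the loop a candidate for the `.vx` form.  The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Signed average rounded towards +infinity: ceil ((a + b) / 2).
   The sum is widened to int32_t so a + b + 1 cannot overflow, and the
   halving is a floor division written without shifting a negative value.  */
static int16_t
avg_ceil_i16 (int16_t a, int16_t b)
{
  int32_t s = (int32_t) a + b + 1;
  int32_t half = s >= 0 ? s / 2 : -((-s + 1) / 2);  /* floor (s / 2) */
  return (int16_t) half;
}

/* Loop shape with one scalar operand X: the candidate for a vx combine.  */
static void
vavg_ceil_vx_i16 (int16_t *out, const int16_t *in, int16_t x, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = avg_ceil_i16 (in[i], x);
}
```

For instance, `avg_ceil_i16 (1, 2)` is 2 and `avg_ceil_i16 (-1, -2)` is -1, i.e. halves are rounded towards positive infinity.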

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check
for signed avg ceil.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add
test data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-2-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-2-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-2-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-2-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

vect: Don't set bogus bounds on epilogues [PR120805]

The testcases in the PR are failing due to the code trying to set a vector range
on an epilogue.

However, on epilogues the range doesn't make sense.  In particular, we are setting
ranges to help niters analysis, but the epilogue doesn't iterate.

Secondly the bounds variable hasn't been adjusted to vector iterations:

In the epilogue this is calculated as

  <bb 13> [local count: 81467476]:
  # i_127 = PHI <tmp.7_131(10), 0(5)>
  # _132 = PHI <_133(10), 0(5)>
  _181 = (unsigned int) n_41(D);
  bnd.31_180 = _181 - _132;

where

  _133 = niters_vector_mult_vf.6_130;

but _132 is a phi node, and if coming from the vector loop skip edge
_181 will be <1, VF>.

But this is a range VRP or Ranger can easily report due to the guard on the
skip_vector loop.

Previously, non-const VF would skip this code entirely due to the .is_constant()
check.

Non-partial vector loops would also skip it because the bounds would fold to a
constant, so they don't enter the !gimple_value check.

When support for partial vector ranges was added, this accidentally enabled
ranges on partial vector epilogues.

This patch now makes it explicit that ranges shouldn't be set for epilogues, as
they don't seem to be useful anyway.

gcc/ChangeLog:

PR tree-optimization/120805
* tree-vect-loop-manip.cc (vect_gen_vector_loop_niters): Skip setting
bounds on epilogues.

libgcc: Update FMV features to latest ACLE spec 2024Q4

Update FMV features to latest ACLE spec of 2024Q4 - several features have been
removed or merged. Add FMV support for CSSC and MOPS. Preserve the ordering
in enum CPUFeatures.

gcc:
* common/config/aarch64/cpuinfo.h: Remove unused features, add FEAT_CSSC
and FEAT_MOPS.
* config/aarch64/aarch64-option-extensions.def: Remove FMV support
for RPRES, use PMULL rather than AES, add FMV support for CSSC and MOPS.

libgcc:
* config/aarch64/cpuinfo.c (__init_cpu_features_constructor):
Remove unused features, add support for CSSC and MOPS.

libgcc: Cleanup HWCAP defines in cpuinfo.c

Cleanup HWCAP defines - rather than including hwcap.h and then repeating it
using ifndef, just define the HWCAPs we need exactly as in hwcap.h.

libgcc:
* config/aarch64/cpuinfo.c: Cleanup HWCAP defines.

AArch64: Use correct cost for shifted halfword load/stores

Since all Armv9 cores support shifted LDRH/STRH, use the correct cost of zero
for these.

gcc:
* config/aarch64/tuning_models/generic_armv9_a.h
(generic_armv9_a_addrcost_table): Use zero cost for himode.

Fixup wrong change to get_group_load_store_type

The following fixes up the r16-2593-g6ac78317aa6adf change which
made us match up a scalar with a vector type. Oops.

Noticed when removing the gather/scatter pattern that creates the
IFNs early.

* tree-vect-stmts.cc (get_group_load_store_type): Properly
compare the scalar type of the gather/scatter offset to
the offset vector component type.

zlib: refresh version in configure

zlib/ChangeLog:

* configure: Regenerate.
* configure.ac: Set version to 1.3.1.

Extend gimple_fold_inplace API

The following allows specifying the valueization hook to be used.

* gimple-fold.h (fold_stmt_inplace): Add valueization hook
argument, defaulted to no_follow_ssa_edges.
* gimple-fold.cc (fold_stmt_inplace): Adjust.

zlib: update ChangeLog

zlib/ChangeLog:
PR other/105404
* Import zlib-1.3.1

zlib: import zlib-1.3.1

This is vanilla zlib-1.3.1 imported over the existing zlib/ dir with:
* README adjusted to add the GCC note at the top;
* GCC's ChangeLog merged with the upstream one, as before;
* Deleted the upstream Makefile, as has been done before (we use an
autoconf-generated one).

cobol: Eliminate various errors. [PR120244]

The following coding errors were located by running extended tests
through valgrind. These changes repair the errors.

gcc/cobol/ChangeLog:

PR cobol/120244
* genapi.cc (get_level_88_domain): Increase array size for final byte.
(psa_FldLiteralA): Use correct length in build_string_literal call.
* scan.l: Use a loop instead of std::transform to avoid EOF overrun.
* scan_ante.h (binary_integer_usage): Use a variable-length buffer.

i386: Fix typo in diagnostic about simultaneous regparm and thiscall use

gcc/ChangeLog:

* config/i386/i386-options.cc (ix86_handle_cconv_attribute):
Fix typo.

i386: Fix incorrect handling of simultaneous regparm and thiscall use

gcc/ChangeLog:

* config/i386/i386-options.cc (ix86_handle_cconv_attribute):
Handle simultaneous use of regparm and thiscall attributes in
case when regparm is set before thiscall.

gcc/testsuite/ChangeLog:

* gcc.target/i386/attributes-error.c: Add more attributes
combinations.

i386: Fix incorrect comment about stdcall and fastcall compatibility

gcc/ChangeLog:

* config/i386/i386-options.cc (ix86_handle_cconv_attribute):
Fix comments which state that combination of stdcall and fastcall
attributes is valid but redundant.

i386: Ignore regparm attribute and warn for it in 64-bit mode

The regparm attribute does not affect code generation on x86-64 target.
Despite this, regparm was accepted silently, unlike other calling
convention attributes handled in the ix86_handle_cconv_attribute
function.

Due to the lack of diagnostics, the Linux kernel attempted to specify regparm(0)
on vmread_error_trampoline declaration, which is supposed to be invoked
with all arguments on stack:
https://lore.kernel.org/all/20220928232015.745948-1-seanjc@google.com/

To produce a warning for regparm in 64-bit mode, simply move the block
that produces diagnostics above the block that handles the regparm
attribute.

gcc/ChangeLog:

* config/i386/i386-options.cc (ix86_handle_cconv_attribute):
Move 64-bit mode check before regparm handling.

gcc/testsuite/ChangeLog:

* g++.dg/abi/regparm1.C: Require ia32 target.
* gcc.target/i386/20020224-1.c: Likewise.
* gcc.target/i386/pr103785.c: Use regparm attribute only if
not in 64-bit mode.
* gcc.target/i386/pr36533.c: Likewise.
* gcc.target/i386/pr59099.c: Likewise.
* gcc.target/i386/sibcall-8.c: Likewise.
* gcc.target/i386/sw-1.c: Likewise.
* gcc.target/i386/pr15184-2.c: Fix invalid comment.
* gcc.target/i386/attributes-ignore.c: New test.

tree-optimization/121320 - UBSAN error in ao_ref_init_from_vn_reference

The multiplication by BITS_PER_UNIT should be done in poly_offset_int.

PR tree-optimization/121320
* tree-ssa-sccvn.cc (ao_ref_init_from_vn_reference): Convert
op->off to poly_offset_int before multiplying by
BITS_PER_UNIT.

tree-optimization/121323 - UBSAN error in ao_ref_init_from_ptr_and_range

We should check that the offset fits in a HWI when multiplied to be in bits.
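The kind of check described can be sketched with GCC's `__builtin_mul_overflow`, assuming a 64-bit signed HOST_WIDE_INT and 8-bit units.  The helper and macro names are hypothetical; this is not the code in tree-ssa-alias.cc, which works with GCC's own wide-int types.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BITS_PER_UNIT 8  /* assumption: 8-bit bytes */

/* Hypothetical helper: convert a byte offset to bits, reporting failure
   instead of overflowing when the result does not fit in int64_t.
   __builtin_mul_overflow returns true if the product overflowed.  */
static bool
byte_offset_to_bits (int64_t bytes, int64_t *bits)
{
  return !__builtin_mul_overflow (bytes, SKETCH_BITS_PER_UNIT, bits);
}
```

A caller would treat a `false` result as "offset unknown" rather than use a wrapped value.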

PR tree-optimization/121323
* tree-ssa-alias.cc (ao_ref_init_from_ptr_and_range): Check
the pointer offset fits in a HWI when represented in bits.

testsuite: Add runtime test for FMV resolvers

gcc/testsuite/ChangeLog:

* g++.target/aarch64/mv-cpu-features.C: New test.

testsuite: Add tests for __init_cpu_features_constructor

Add tests that call __init_cpu_features_resolver() directly
from an ifunc resolver, which in turn calls the function under
test, __init_cpu_features_constructor(), using synthetic parameters
for different sizes of the 2nd argument.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/ifunc-resolver.in: Add core test functions.
* gcc.target/aarch64/ifunc-resolver-0.c: New test.
* gcc.target/aarch64/ifunc-resolver-1.c: Ditto.
* gcc.target/aarch64/ifunc-resolver-2.c: Ditto.
* gcc.target/aarch64/ifunc-resolver-3.c: Ditto.
* gcc.target/aarch64/ifunc-resolver-4.c: Ditto.

aarch64: Stop using sys/ifunc.h header in libatomic and libgcc

This optional header is used to bring in the definition of the
struct __ifunc_arg_t type. Since it has been added to glibc only
recently, the previous implementation had to check whether this
header is present and, if not, provide its own definition.

This creates dead code, because one of these two parts would always
go untested. The ABI specification for ifunc resolvers allows creating
our own ABI-compatible definition for this type, which is the
right way of doing it.

In addition to improving consistency, the new approach also helps
with the addition of new fields to struct __ifunc_arg_t type without
the need to work around situations where the definition imported
from the header lacks these new fields.

The ABI allows defining as many hwcap fields in this struct as needed,
provided that at runtime we only access the fields that are permitted
by the _size value.
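A minimal sketch of that rule, using a hypothetical local struct laid out like the one described (a `_size` field followed by `unsigned long` hwcap words, with `_hwcap3` and `_hwcap4` as the new ones).  The struct and helper names are illustrative, not the definitions used in libgcc or libatomic.  A field is read only when `_size` shows the caller's struct is large enough to contain it.

```c
#include <stddef.h>

/* Illustrative ABI-compatible layout; names are hypothetical.  */
typedef struct
{
  unsigned long _size;    /* size of the struct as provided by the caller */
  unsigned long _hwcap;
  unsigned long _hwcap2;
  unsigned long _hwcap3;  /* newer field: may be absent at runtime */
  unsigned long _hwcap4;  /* newer field: may be absent at runtime */
} sketch_ifunc_arg_t;

/* True if the caller's struct of ARG->_size bytes contains the field
   that starts at OFFSET and is SIZE bytes long.  */
static int
field_present (const sketch_ifunc_arg_t *arg, size_t offset, size_t size)
{
  return arg->_size >= offset + size;
}

/* Return _hwcap3, or 0 if the caller's struct is too old to have it.  */
static unsigned long
get_hwcap3 (const sketch_ifunc_arg_t *arg)
{
  if (field_present (arg, offsetof (sketch_ifunc_arg_t, _hwcap3),
                     sizeof arg->_hwcap3))
    return arg->_hwcap3;
  return 0;
}
```

With `_size` equal to the offset of `_hwcap3` (an older, three-field struct), `get_hwcap3` returns 0 without touching memory past the caller's struct.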

gcc/
* config/aarch64/aarch64.cc (build_ifunc_arg_type):
Add new fields _hwcap3 and _hwcap4.

libatomic/
* config/linux/aarch64/host-config.h (__ifunc_arg_t):
Remove sys/ifunc.h and add new fields _hwcap3 and _hwcap4.

libgcc/
* config/aarch64/cpuinfo.c (__ifunc_arg_t): Likewise.
(__init_cpu_features): Obtain and assign values for the
fields _hwcap3 and _hwcap4.
(__init_cpu_features_constructor): Check _size in the
arg argument.

rs6000: Avoid undefined behavior caused by overflow and invalid shifts

While building GCC with --with-build-config=bootstrap-ubsan on
powerpc64le-unknown-linux-gnu, multiple UBSAN runtime errors were
encountered in rs6000.cc and rs6000.md due to undefined behavior
involving left shifts on negative values and shift exponents equal to
or exceeding the type width.

The issue was in bit pattern recognition code
(in can_be_rotated_to_negative_lis and can_be_built_by_li_and_rldic),
where signed values were shifted without handling negative inputs or
guarding against shift counts equal to the type width, causing UB.
The fix ensures shifts and rotations are done in unsigned HOST_WIDE_INT,
casting back only where needed (as for arithmetic right shifts), with
proper guards to prevent shift-by-64.
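The two UB-avoidance patterns described above can be sketched in plain
C. The typedefs hwi/uhwi stand in for HOST_WIDE_INT and its unsigned
counterpart (assumed 64-bit), and the helper names are hypothetical;
this is not the code from rs6000.cc.

```c
#include <stdint.h>

typedef int64_t hwi;    /* hypothetical stand-in for HOST_WIDE_INT */
typedef uint64_t uhwi;  /* hypothetical stand-in for unsigned HOST_WIDE_INT */

/* Left shift of a possibly negative value: shift in the unsigned type,
   where it is well defined, then convert back.  Converting an
   out-of-range value to a signed type is implementation-defined in C;
   GCC defines it as reduction modulo 2^64.  Requires n < 64.  */
static hwi
shl_no_ub (hwi v, unsigned n)
{
  return (hwi) ((uhwi) v << n);
}

/* Rotate left without ever shifting by 64: for a rotate by 0 the naive
   form would compute x >> 64, which is undefined.  */
static uhwi
rotl_no_ub (uhwi x, unsigned n)
{
  n &= 63;
  if (n == 0)
    return x;
  return (x << n) | (x >> (64 - n));
}
```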

2025-07-31 Kishan Parmar <kishan@linux.ibm.com>

gcc:
PR target/118890
* config/rs6000/rs6000.cc (can_be_rotated_to_negative_lis): Avoid left
shift of negative value and guard shift count.
(can_be_built_by_li_and_rldic): Likewise.
(rs6000_emit_set_long_const): Likewise.
* config/rs6000/rs6000.md (splitter for plus into two 16-bit parts): Fix
UB from overflow in addition.

Add checks for node in aarch64 vector cost modeling

After removing STMT_VINFO_MEMORY_ACCESS_TYPE we now ICE when costing
for scalar stmts required in the epilog since the cost model tries
to pattern-match gathers (an earlier patch tried to improve this
by introducing stmt groups, but that was on hold due to negative
feedback). The following short-cuts those attempts when node is NULL,
as that then cannot be a vector stmt. Another possibility would be
to gate on vect_body, or restructure everything.

Note we now ensure that node is NULL when m_costing_for_scalar is set.

* config/aarch64/aarch64.cc (aarch64_detect_vector_stmt_subtype):
Check for node before dereferencing.
(aarch64_vector_costs::add_stmt_cost): Likewise.

aarch64: Prevent streaming-compatible code from assembler rejection [PR121028]

Streaming-compatible functions can be compiled without SME enabled, but need
to use "SMSTART SM" and "SMSTOP SM" to temporarily switch into the streaming
state of a callee. These switches are conditional on the current mode being
opposite to the target mode, so no SME instructions are executed if SME is not
available.

However, in GAS, "SMSTART SM" and "SMSTOP SM" always require +sme. A call
from a streaming-compatible function, compiled without SME enabled, to a
non-streaming function will be rejected as:

Error: selected processor does not support `smstop sm'..

To work around this, we make use of the .inst directive to insert the literal
encodings of "SMSTART SM" and "SMSTOP SM".

gcc/ChangeLog:
PR target/121028
* config/aarch64/aarch64-sme.md (aarch64_smstart_sm): Use the .inst
directive if !TARGET_SME.
(aarch64_smstop_sm): Likewise.

gcc/testsuite/ChangeLog:
PR target/121028
* gcc.target/aarch64/sme/call_sm_switch_1.c: Tell
check-function-bodies not to ignore .inst directives, and replace the
test for "smstart sm" with one for its encoding.
* gcc.target/aarch64/sme/call_sm_switch_11.c: Likewise.
* gcc.target/aarch64/sme/pr121028.c: New test.

Remove STMT_VINFO_MEMORY_ACCESS_TYPE

This should be present only on SLP nodes now. The RISC-V changes
are mechanical along the lines of the SLP_TREE_TYPE changes.

* tree-vectorizer.h (_stmt_vec_info::memory_access_type): Remove.
(STMT_VINFO_MEMORY_ACCESS_TYPE): Likewise.
(vect_mem_access_type): Likewise.
* tree-vect-stmts.cc (vectorizable_store): Do not set
STMT_VINFO_MEMORY_ACCESS_TYPE. Fix SLP_TREE_MEMORY_ACCESS_TYPE
usage.
* tree-vect-loop.cc (update_epilogue_loop_vinfo): Remove
checking of memory access type.
* config/riscv/riscv-vector-costs.cc (costs::compute_local_live_ranges):
Use SLP_TREE_MEMORY_ACCESS_TYPE.
(costs::need_additional_vector_vars_p): Likewise.
(segment_loadstore_group_size): Get SLP node as argument,
use SLP_TREE_MEMORY_ACCESS_TYPE.
(costs::adjust_stmt_cost): Pass down SLP node.
* config/aarch64/aarch64.cc (aarch64_ld234_st234_vectors): Use
SLP_TREE_MEMORY_ACCESS_TYPE instead of vect_mem_access_type.
(aarch64_detect_vector_stmt_subtype): Likewise.
(aarch64_vector_costs::count_ops): Likewise.
(aarch64_vector_costs::add_stmt_cost): Likewise.

Do not bother with fake verifying of shared DRs

The following avoids comparing the shared DRs against their unmodified
copy for epilogues during loop transform since they are actually
modified by update_epilogue_loop_vinfo. Avoid the pointless faking
of the original DRs there.

* tree-vect-loop.cc (vect_transform_loop): Do not verify DRs
have not been modified for epilogue loops.
(update_epilogue_loop_vinfo): Do not copy modified DRs to
the originals.

change get_best_mode args int -> HOST_WIDE_INT [PR121264]

The following testcase is miscompiled, because byte 0x20000000
is bit 0x100000000 and ifcombine incorrectly combines the two loads
into a BIT_FIELD_REF even when they are very far away.
The problem is that gimple-fold.cc ifcombine uses get_best_mode heavily,
and that function has just int bitsize and int bitpos arguments, so
when called e.g. with
  if (get_best_mode (end_bit - first_bit, first_bit, 0, ll_end_region,
                     ll_align, BITS_PER_WORD, volatilep, &lnmode))
where end_bit - first_bit doesn't fit into int, it is silently truncated.
If there was just a single problematic get_best_mode call, I would probably
just check for overflows in the caller, but there are many.
And the two arguments are used solely as arguments to
bit_field_mode_iterator constructor which has HOST_WIDE_INT arguments,
so I think the easiest fix is just make the get_best_mode arguments
also HOST_WIDE_INT.
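A small C illustration of the silent truncation described above. The
two helpers are hypothetical stand-ins for a parameter declared as int
versus HOST_WIDE_INT (modelled as int64_t); they are not get_best_mode
itself. The narrowing result relies on GCC's documented modulo
behaviour for out-of-range signed conversions.

```c
#include <stdint.h>

/* Old-style parameter: an int bit position.  */
static int64_t
bitpos_via_int (int bitpos)
{
  return bitpos;
}

/* New-style parameter: a 64-bit bit position, like HOST_WIDE_INT.  */
static int64_t
bitpos_via_hwi (int64_t bitpos)
{
  return bitpos;
}
```

Byte 0x20000000 is bit 0x100000000, which needs 33 bits: passing it
through the int parameter silently yields 0, while the 64-bit parameter
preserves it.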

2025-07-31  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/121264
* machmode.h (get_best_mode): Change type of first 2 arguments
from int to HOST_WIDE_INT.
* stor-layout.cc (get_best_mode): Likewise.

* gcc.dg/tree-ssa/pr121264.c: New test.

aarch64: testsuite: Fix do-assemble tests for SME

GCC doesn't support SME without SVE2, so the -march=armv8-a+<ext> argument to
check_no_compiler_messages causes aarch64_asm_<ext>_ok to return zero for SME
and any <ext> that implies it. This patch changes the baseline architecture to
armv9-a for these extensions.

The tests for ACLE SME2 intrinsics that require FEAT_FAMINMAX were configured
to do-assemble if aarch64_asm_sme2_ok returned 1 (by default), but they really
need to check if +faminmax is supported too. The fix above exposed this, so
we also fix the do-assemble/do-compile choice for those tests here.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sme2/acle-asm/amax_f16_x2.c: Gate do-assemble on
assembler support for +faminmax and +sme2.
* gcc.target/aarch64/sme2/acle-asm/amax_f16_x4.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/amax_f32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/amax_f32_x4.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/amax_f64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/amax_f64_x4.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/amin_f16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/amin_f16_x4.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/amin_f32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/amin_f32_x4.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/amin_f64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/amin_f64_x4.c: Likewise.
* lib/target-supports.exp: Split the extensions that require SME into
a separate set, and use armv9-a as their baseline.

Fix comment typos - hanlde -> handle

2025-07-31 Jakub Jelinek <jakub@redhat.com>

* gimple-ssa-store-merging.cc (find_bswap_or_nop): Fix comment typos,
hanlde -> handle.
* config/i386/i386.cc (ix86_gimple_fold_builtin, ix86_rtx_costs):
Likewise.
* config/i386/i386-features.cc (remove_partial_avx_dependency):
Likewise.

* gcc.target/i386/apx-1.c (apx_hanlder): Rename to ...
(apx_handler): ... this.
* gcc.target/i386/uintr-2.c (UINTR_hanlder): Rename to ...
(UINTR_handler): ... this.
* gcc.target/i386/uintr-5.c (UINTR_hanlder): Rename to ...
(UINTR_handler): ... this.

Disallow scan-store vectorization in epilogues

The following disallows vectorizing epilogues containing scan-stores.
Since code generation works by walking gimple stmts it is not ready
for this when cleaning up epilogue vectorization. I believe
scan-store vectorization needs most of the work done during SLP
discovery to reflect the data flow.

* tree-vect-stmts.cc (check_scan_store): Remove redundant
slp_node check. Disallow epilogue vectorization.

Avoid passing vectype != NULL when costing scalar IL

The following makes sure to not leak a set vectype on a stmt when
doing scalar IL costing as this can confuse vector cost models
which do not look at m_costing_for_scalar most of the time.

* tree-vectorizer.h (vector_costs::costing_for_scalar): New
accessor.
(add_stmt_cost): For scalar costing force vectype to NULL.
Verify we do not pass in a SLP node.

RISC-V: Adding H to the canonical order [PR121312]

We added H to the canonical order before, but forgot to add it to
arch-canonicalize as well...

gcc/ChangeLog:

PR target/121312
* config/riscv/arch-canonicalize: Add H extension to the
canonical order.

Daily bump.

[sanitizer_common] Remove reference to obsolete termio ioctls (#138822)

Cherry picked from LLVM commit c99b1bcd505064f2e086e6b1034ce0b0c91ea5b9.

The termio ioctls are no longer used after commit 59978b21ad9c
("[sanitizer_common] Remove interceptors for deprecated struct termio
(#137403)"), remove them.  Fixes this build error:

../../../../libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp:765:27: error: invalid application of ‘sizeof’ to incomplete type ‘__sanitizer::termio’
  765 |   unsigned IOCTL_TCGETA = TCGETA;
      |                           ^~~~~~
../../../../libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp:769:27: error: invalid application of ‘sizeof’ to incomplete type ‘__sanitizer::termio’
  769 |   unsigned IOCTL_TCSETA = TCSETA;
      |                           ^~~~~~
../../../../libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp:770:28: error: invalid application of ‘sizeof’ to incomplete type ‘__sanitizer::termio’
  770 |   unsigned IOCTL_TCSETAF = TCSETAF;
      |                            ^~~~~~~
../../../../libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp:771:28: error: invalid application of ‘sizeof’ to incomplete type ‘__sanitizer::termio’
  771 |   unsigned IOCTL_TCSETAW = TCSETAW;
      |                            ^~~~~~~

Update cpplib sr.po

* sr.po: Update.

c++: Don't assume trait funcs return error_mark_node when tf_error is passed [PR121291]

For the sake of determining if there are other errors in user code to
report early, many trait functions don't always return error_mark_node
if not called in a SFINAE context (i.e., tf_error is set). This patch
removes some assumptions on this behaviour I'd made when improving
diagnostics of builtin traits.

PR c++/121291

gcc/cp/ChangeLog:

* constraint.cc (diagnose_trait_expr): Remove assumption about
failures returning error_mark_node.
* except.cc (explain_not_noexcept): Allow expr not being
noexcept.
* method.cc (build_invoke): Adjust comment.
(is_trivially_xible): Always note non-trivial components if expr
is not null or error_mark_node.
(is_nothrow_xible): Likewise for non-noexcept components.
(is_nothrow_convertible): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/is_invocable7.C: New test.
* g++.dg/ext/is_nothrow_convertible5.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>

libstdc++: Fix test when dual abi disabled

When !_GLIBCXX_USE_DUAL_ABI the old COW std::string implementation is being used,
which does not generate the expected error diagnostics.

libstdc++-v3/ChangeLog:

* testsuite/std/time/format/data_not_present_neg.cc: Remove _GLIBCXX_USE_DUAL_ABI
check.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

c++: improve non-constant template arg diagnostic

A conversation today pointed out that the current diagnostic for this case
doesn't mention the constant evaluation failure, it just says e.g.

"'p' is not a valid template argument for 'int*' because it is not the
address of a variable"

This patch improves the diagnostic in C++17 and above (post-N4268) to
diagnose failed constant-evaluation.

gcc/cp/ChangeLog:

* pt.cc (convert_nontype_argument_function): Check
cxx_constant_value on failure.
(invalid_tparm_referent_p): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/tc1/dr49.C: Adjust diagnostic.
* g++.dg/template/func2.C: Likewise.
* g++.dg/cpp1z/nontype8.C: New test.

simplify-rtx: Add `(subreg (not a))` simplification for word_mode [PR121308]

Right now in simplify_subreg, there is code to try to simplify for word_mode
with the binary bitwise operators. The unary bitwise operator is not handled,
which causes an odd mismatch that the new self-testing code added with
r16-2614-g965564eafb721f was not expecting.

The self testing code was for testing the newly added code but since there
was already code that handles word_mode, we hit the mismatch but only
for targets where word_mode is SImode (or smaller).

This adds the code to handle `not` in a similar fashion as the other
bitwise operators for word_mode.
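The identity behind this simplification can be checked in plain C:
taking the low word of ~A gives the same bits as complementing the low
word of A, just as for AND, IOR and XOR. The sketch models a 32-bit
word (as on a target whose word_mode is SImode); the function names are
hypothetical, and the RTL in the comments is only indicative.

```c
#include <stdint.h>

/* Low word of the complement: roughly (subreg:SI (not:DI a) lowpart).  */
static uint32_t
lowpart_of_not (uint64_t a)
{
  return (uint32_t) ~a;
}

/* Complement of the low word: roughly (not:SI (subreg:SI a lowpart)).  */
static uint32_t
not_of_lowpart (uint64_t a)
{
  return ~(uint32_t) a;
}
```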

Changes since v1:
* v2: add `&& SCALAR_INT_MODE_P (innermode)` to the conditional.

Bootstrapped and tested on x86_64-linux-gnu.

PR rtl-optimization/121308
gcc/ChangeLog:

* simplify-rtx.cc (simplify_context::simplify_subreg): Handle
subreg of `not` with word_mode to make it symmetric with the
other bitwise operators.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

IFCVT: Fix factor_out_operators correctly for more than 1 phi [PR121295]

r16-2590-ga51bf9e10182cf was not the correct fix for this in the end.
Instead a much simpler and more localized fix is needed: just change the phi
that is being worked on to the new result and arguments that come from the
factored-out operator.
This solves the issue of not having the result in the IR and causing issues that way.

Bootstrapped and tested on x86_64-linux-gnu.
Note this depends on reverting r16-2590-ga51bf9e10182cf.

PR tree-optimization/121236
PR tree-optimization/121295

gcc/ChangeLog:

* tree-if-conv.cc (factor_out_operators): Change the phi node
to the new result and args.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr121236-1.c: New test.
* gcc.dg/torture/pr121295-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Revert "ifcvt: Fix ifcvt for multiple phi nodes after factoring operator [PR121236]"

This reverts commit a51bf9e10182cf7ac858db0ea6c5cb11b4f12377.

Report read errors when reading auto-profile

Currently -fauto-profile will happily read a truncated file without any warning
and interpret it as a zero profile, which will in turn result in slow code.
This patch exports gcov_is_error and adds checks so that truncated files are detected.
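The sticky-error pattern the checks rely on can be sketched as follows.
This is a hypothetical in-memory reader, loosely modelled on the idea
that reads set an error flag which the caller must test (the role
gcov_is_error plays here); it is not the gcov-io.cc implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader over an in-memory buffer.  */
struct reader
{
  const unsigned char *buf;
  size_t len;
  size_t pos;
  int error;   /* Set once a read runs past the end of the data.  */
};

/* Read a little-endian 32-bit word.  On truncation return 0 and set the
   error flag, so the caller can tell "truncated" apart from "zero".  */
static uint32_t
read_u32 (struct reader *r)
{
  if (r->error || r->len - r->pos < 4)
    {
      r->error = 1;
      return 0;
    }
  uint32_t v = (uint32_t) r->buf[r->pos]
               | (uint32_t) r->buf[r->pos + 1] << 8
               | (uint32_t) r->buf[r->pos + 2] << 16
               | (uint32_t) r->buf[r->pos + 3] << 24;
  r->pos += 4;
  return v;
}
```

Without the flag, the second read of a 6-byte buffer would just look
like a counter of 0, which is exactly the "zero profile" symptom.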

gcc/ChangeLog:

* auto-profile.cc (string_table::read): Check gcov_is_error.
(read_profile): Likewise.
* gcov-io.cc (gcov_is_error): Export for gcc linkage.
* gcov-io.h (gcov_is_error): Declare.

[x86] factor out worker from ix86_builtin_vectorization_cost

The following factors out a worker that gets a mode argument
rather than a vectype argument.  That makes a difference when
we hit the fallback in add_stmt_cost for scalar stmts where
vectype might be NULL and thus mode is derived from the scalar
stmt there.  But ix86_builtin_vectorization_cost does not
have access to the stmt.  So the patch instead dispatches
to the new ix86_default_vector_cost there, passing down the mode
we derived from the stmt.

This is to avoid regressions with a patch that makes even more
scalar stmt costings have a vectype passed.

* config/i386/i386.cc (ix86_default_vector_cost): Split
out from ...
(ix86_builtin_vectorization_cost): ... this and use
mode instead of vectype as argument.
(ix86_vector_costs::add_stmt_cost): Call
ix86_default_vector_cost instead of ix86_builtin_vectorization_cost.

s390: Implement spaceship optab [PR117015]

gcc/ChangeLog:

PR target/117015
* config/s390/s390-protos.h (s390_expand_int_spaceship): New
function.
(s390_expand_fp_spaceship): New function.
* config/s390/s390.cc (s390_expand_int_spaceship): New function.
(s390_expand_fp_spaceship): New function.
* config/s390/s390.md (spaceship<mode>4): New expander.

gcc/testsuite/ChangeLog:

* gcc.target/s390/spaceship-fp-1.c: New test.
* gcc.target/s390/spaceship-fp-2.c: New test.
* gcc.target/s390/spaceship-fp-3.c: New test.
* gcc.target/s390/spaceship-fp-4.c: New test.
* gcc.target/s390/spaceship-int-1.c: New test.
* gcc.target/s390/spaceship-int-2.c: New test.
* gcc.target/s390/spaceship-int-3.c: New test.

cprop: Allow jump bypassing for single set insns

During jump bypassing also consider insns of the form

(insn 25 57 26 9 (parallel [
            (set (reg:CCZ 33 %cc)
                (compare:CCZ (reg:SI 60 [ _9 ])
                    (const_int 0 [0])))
            (clobber (scratch:SI))
        ]) "spaceship-fp-4.c":27:1 1746 {*tstsi_cconly_extimm}
     (nil))

by testing for a single set insn during bypass_conditional_jumps().
This is a requirement for test gcc.target/s390/spaceship-fp-4.c of the
subsequent commit.

In order to silence

cprop.cc:1621:40: error: 'setcc_dest' may be used uninitialized [-Werror=maybe-uninitialized]
1621 |             src = simplify_replace_rtx (src, setcc_dest, setcc_src);
      |                   ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~

initialize setcc_{dest,src} in bypass_block() although this is not
really required.

gcc/ChangeLog:

* cprop.cc (bypass_block): Extract single set.
(bypass_conditional_jumps): Ditto.

x86: Transform to "pushq $-1; popq reg" for -Oz

commit 4c80062d7b8c272e2e193b8074a8440dbb4fe588
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun May 25 07:40:29 2025 +0800

    x86: Enable *mov<mode>_(and|or) only for -Oz

disabled transformation from "movq $-1,reg" to "pushq $-1; popq reg" for
-Oz.  But for legacy integer registers, the former is 4 bytes and the
latter is 3 bytes.  Enable such transformation for -Oz.

gcc/

PR target/120427
* config/i386/i386.md (peephole2): Transform "movq $-1,reg" to
"pushq $-1; popq reg" for -Oz if reg is a legacy integer register.

gcc/testsuite/

PR target/120427
* gcc.target/i386/pr120427-5.c: New test.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

auto-profile fixes

This patch silences the warning in function_instance::match about the
profile containing records for line numbers that are not matched by the
function body. While this is a bogus profile (and we will end up losing
the profile data), create_gcov does not have enough information to output
them correctly in all contexts, since in dwarf5 we output multiple locations
per single instruction (possibly coming from different inlines) while it
can only represent one inline stack.

The patch also fixes an issue with profile scaling. By making force_nonzero
take cutoffs into account, I made the test for a counter being non-zero
before scaling too aggressive.

gcc/ChangeLog:

* auto-profile.cc (function_instance::match): Disable warning
about bogus locations since dwarf does not represent enough
info to output them correctly in all cases.
(add_scale): Use nonzero_p instead of orig.force_nonzero () == orig.
(afdo_adjust_guessed_profile): Add missing newline in dump
file.

Fix symbol_table::change_decl_assembler_name when DECL_RTL is already computed

While working on a patch assigning unique names to static symbols, I noticed that
Fortran symbols are not renamed since the frontend calls make_decl_rtl. This
gets DECL_ASSEMBLER_NAME and DECL_RTL out of sync. I think we can drop that
call, but it is also a good idea to avoid this inconsistency, so this patch makes
symbol_table::change_decl_assembler_name recompute DECL_RTL in this case.

gcc/ChangeLog:

* symtab.cc (symbol_table::change_decl_assembler_name): Recompute DECL_RTL
in case it is already computed.

Fix false profile inconsistency error

This patch fixes a false inconsistent-profile error message seen when building SPEC with
-fprofile-use -fdump-ipa-profile.
The problem is that, with dumping, tree_estimate_probability is run in dry-run
mode to report success rates of heuristics. However, it runs determine_unlikely_bbs,
which overwrites some counts with profile_count::zero, and later value profiling sees
the mismatch.

In a sane profile determine_unlikely_bbs should almost always be a no-op, since it
should only drop to 0 things that are known to be unlikely executed. What
happens here is that there is a comdat where the profile is lost, and we see a
call with non-zero count calling a function with zero count and "fix" the profile
by making the call have zero count, too.

I also extended the unlikely predicates to avoid tampering with predictions when
the prediction is believed to be reliable. This also keeps us from dropping all
EH regions to 0 count, as tested by the testcase.

gcc/ChangeLog:

* predict.cc (unlikely_executed_edge_p): Ignore EDGE_EH if profile
is reliable.
(unlikely_executed_stmt_p): Special-case builtin_trap/unreachable and
ignore other heuristics for reliable profiles.
(tree_estimate_probability): Disable unlikely bb detection when
doing a dry run.

gcc/testsuite/ChangeLog:

* g++.dg/tree-prof/eh1.C: New test.

vect: Add target hook to prefer gather/scatter instructions

For AMD GCN, the instructions available for loading/storing vectors are
always scatter/gather operations (i.e. there are separate addresses for
each vector lane), so the current heuristic to avoid gather/scatter
operations with too many elements in get_group_load_store_type is
counterproductive. Avoiding such operations in that function can
subsequently lead to a missed vectorization opportunity whereby later
analyses in the vectorizer try to use a very wide array type which is
not available on this target, and thus it bails out.

This patch adds a target hook to override the "single_element_p"
heuristic in that function, and activates it for GCN. This allows
much better code to be generated for affected loops.

Co-authored-by: Julian Brown <julian@codesourcery.com>
gcc/
* doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Add
documentation hook.
* doc/tm.texi: Regenerate.
* target.def (prefer_gather_scatter): Add target hook under vectorizer.
* hooks.cc (hook_bool_mode_int_unsigned_false): New function.
* hooks.h (hook_bool_mode_int_unsigned_false): New prototype.
* tree-vect-stmts.cc (vect_use_strided_gather_scatters_p): Add
parameters group_size and single_element_p, and rework to use
targetm.vectorize.prefer_gather_scatter.
(get_group_load_store_type): Move some of the condition into
vect_use_strided_gather_scatters_p.
* config/gcn/gcn.cc (gcn_prefer_gather_scatter): New function.
(TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Define hook.

Don't pass vector params through to offload targets

The optimization options are deliberately passed through to the LTO compiler,
but when the same mechanism is reused for offloading it ends up forcing the
host compiler settings onto the device compiler. Maybe this should be removed
completely, but this patch just fixes a few of them. In particular,
param_vect_partial_vector_usage is disabled by x86 and this really hurts amdgcn.

I also fixed an ambiguous else warning in the generated file by adding braces.

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_option_override): Add note to set default for
param_vect_partial_vector_usage to "1".
* optc-save-gen.awk: Don't pass through options marked "NoOffload".
* params.opt (-param=vect-epilogues-nomask): Add NoOffload.
(-param=vect-partial-vector-usage): Likewise.
(-param=vect-inner-loop-cost-factor): Likewise.

tree-optimization/121130 - vectorizable_call cannot handle .MASK_CALL

The following makes it correctly reject them,
vectorizable_simd_clone_call is solely responsible for them.

PR tree-optimization/121130
* tree-vect-stmts.cc (vectorizable_call): Bail out for
.MASK_CALL.

* gcc.dg/vect/vect-simd-pr121130.c: New testcase.

c++: Make __extension__ silence -Wlong-long pedwarns/warnings [PR121133]

The PR13358 r0-92909 change changed the diagnostics on long long
in C++ (either with -std=c++98 or -Wlong-long), but unlike the
C FE we unfortunately warn even in the
__extension__ long long a;
etc. cases. The C FE in that case in
disable_extension_diagnostics saves and clears not just
pedantic flag but also warn_long_long (and several others), while
C++ FE only temporarily disables pedantic.
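The save/clear/restore approach attributed to the C FE above can be
sketched as follows. The globals and helper names are hypothetical
stand-ins for pedantic and warn_long_long (the real C FE saves several
more flags), and this is not code from c-parser.cc or parser.cc.

```c
/* Hypothetical stand-ins for the global warning flags.  */
static int pedantic = 1;
static int warn_long_long = 1;

/* Saved state, packed into bitfields.  */
struct ext_saved
{
  unsigned pedantic : 1;
  unsigned warn_long_long : 1;
};

/* Entering __extension__: remember both flags, then clear them.  */
static struct ext_saved
extension_begin (void)
{
  struct ext_saved s = { pedantic != 0, warn_long_long != 0 };
  pedantic = 0;
  warn_long_long = 0;
  return s;
}

/* Leaving __extension__: restore exactly what was saved.  */
static void
extension_end (struct ext_saved s)
{
  pedantic = s.pedantic;
  warn_long_long = s.warn_long_long;
}
```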

The following patch makes it behave like the C FE in this regard,
though (__extension__ 1LL) still doesn't work because of the
separate lexing (and I must say I have no idea how to fix that).

Or do you prefer a solution closer to the C FE, cp_parser_extension_opt
saving the values into a bitfield and have another function to restore
the state (or use RAII)?

2025-07-30 Jakub Jelinek <jakub@redhat.com>

PR c++/121133
* parser.cc (cp_parser_unary_expression): Adjust
cp_parser_extension_opt caller and restore warn_long_long.
(cp_parser_declaration): Likewise.
(cp_parser_block_declaration): Likewise.
(cp_parser_member_declaration): Likewise.
(cp_parser_extension_opt): Add SAVED_LONG_LONG argument,
save previous warn_long_long state into it and clear it
for __extension__.

* g++.dg/warn/pr121133-1.C: New test.
* g++.dg/warn/pr121133-2.C: New test.
* g++.dg/warn/pr121133-3.C: New test.
* g++.dg/warn/pr121133-4.C: New test.

libcpp: Fix up comma diagnostics in preprocessor for C++ [PR120778]

The P2843R3 Preprocessing is never undefined paper contains comments
that various compilers handle comma operators in preprocessor expressions
incorrectly and I think they are right.

In both C and C++ the grammar uses constant-expression non-terminal
for #if/#elif and in both C and C++ that NT is conditional-expression,
so
  #if 1, 2
is IMHO clearly wrong in both languages.

C89 then says for constant-expression
"Constant expressions shall not contain assignment, increment, decrement,
function-call, or comma operators, except when they are contained within the
operand of a sizeof operator."
Because all the remaining identifiers in the #if/#elif expression are
replaced with 0 I think assignments, increment, decrement and function-call
aren't that big deal because (0 = 1) or ++4 etc. are all invalid, but
for comma expressions I think it matters.  In r0-56429 PR456 Joseph has
added !CPP_OPTION (pfile, c99) to handle that correctly.
Then C99 changed that to:
"Constant expressions shall not contain assignment, increment, decrement, function-call,
or comma operators, except when they are contained within a subexpression that is not
evaluated."
That made for C99+
  #if 1 || (1, 2)
etc. valid but
  #if (1, 2)
is still invalid, ditto
  #if 1 ? 1, 2 : 3

In C++ I can't find anything like that though, and as can be seen on say
int a[(1, 2)];
int b[1 ? 1, 2 : 3];
being accepted by C++ and rejected by C while
int c[1, 2];
int d[1 ? 2 : 3, 4];
being rejected in both C and C++, so I think for C++ it is indeed just the
grammar that prevents #if 1, 2.  When it is the second operand of ?: or
inside of () the grammar just uses expression and that allows comma
operator.

So, the following patch uses different decisions for C++ when to diagnose
comma operator in preprocessor expressions, for C++ tracks if it is inside
of () (obviously () around #embed clauses don't count unless one uses
limit ((1, 2)) etc.) or inside of the second ?: operand and allows comma
operator there and disallows elsewhere.
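The counter described in the ChangeLog below (comma_ok, incremented on
'(' and '?', decremented on ')' and ':', with a comma accepted only
while it is nonzero) can be sketched on plain characters. The function
name is hypothetical, and unlike libcpp it scans characters rather than
tokens, so it is only meaningful for simple numeric expressions.

```c
/* Return 1 if every comma in EXPR would be accepted for C++, 0 if some
   comma would be diagnosed.  */
static int
commas_ok_for_cxx (const char *expr)
{
  int comma_ok = 0;
  for (const char *p = expr; *p; p++)
    switch (*p)
      {
      case '(':
      case '?':
        comma_ok++;
        break;
      case ')':
      case ':':
        comma_ok--;
        break;
      case ',':
        if (comma_ok == 0)
          return 0;   /* top-level comma: diagnosed */
        break;
      default:
        break;
      }
  return 1;
}
```

With this rule "1, 2" and "1 ? 2 : 3, 4" are rejected, while "(1, 2)"
and "1 ? 1, 2 : 3" are accepted, matching the array-bound examples
above.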

BTW, I wonder if anything in the standard disallows <=> in the preprocessor
expressions.  Say
  #if (0 <=> 1) < 0
etc.
  #include <compare>
  constexpr int a = (0 <=> 1) < 0;
is valid (but not valid without #include <compare>) and the expressions
don't use any identifiers.

2025-07-30  Jakub Jelinek  <jakub@redhat.com>

PR c++/120778
* internal.h (struct lexer_state): Add comma_ok member.
* expr.cc (_cpp_parse_expr): Initialize it to 0, increment on
CPP_OPEN_PAREN and CPP_QUERY, decrement on CPP_CLOSE_PAREN
and CPP_COLON.
(num_binary_op): For C++ pedwarn on comma operator if
pfile->state.comma_ok is 0 instead of !c99 or skip_eval.

* g++.dg/cpp/if-comma-1.C: New test.

vect: Add missing skip-vector check for peeling with versioning [PR121020]

This fixes a miscompilation issue introduced by the enablement of
combined loop peeling and versioning. A test case that reproduces the
issue is included in the patch.

When performing loop peeling, GCC usually inserts a skip-vector check.
This ensures that after peeling, there are enough remaining iterations
to enter the main vectorized loop. Previously, the check was omitted if
loop versioning for alignment was applied. It was safe before because
versioning and peeling for alignment were mutually exclusive.

However, with combined peeling and versioning enabled, this is not safe
any more. A loop may be peeled and versioned at the same time. Without
the skip-vector check, the main vectorized loop can be entered even if
its iteration count is zero. This can cause the loop to run many more
iterations than needed, resulting in incorrect results.

To fix this, the patch changes the condition so that the skip-vector
check is omitted only when versioning is performed alone, without peeling.
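A hypothetical scalar model of the control flow, to show why the check
matters. The names process, VF and niters_vector are illustrative;
real vectorized code uses vector instructions rather than a counter,
and GCC's generated guards differ in detail.

```c
/* Hypothetical vectorization factor.  */
#define VF 4u

/* Model of a loop over N elements that is both peeled and vectorized.
   Returns the number of elements processed, which must equal N.  */
static unsigned
process (unsigned n, unsigned peel)
{
  unsigned done = 0;
  unsigned i = 0;

  /* Prologue: PEEL scalar iterations (e.g. to reach alignment).  */
  for (; i < peel && i < n; i++)
    done++;

  /* Skip-vector check.  The vector loop is bottom-tested, so entering
     it with zero iterations would make --k wrap around and run it
     almost UINT_MAX times; the check bypasses it in that case.  */
  unsigned niters_vector = (n - i) / VF;
  if (niters_vector != 0)
    {
      unsigned k = niters_vector;
      do
        {
          done += VF;   /* one vector iteration handles VF elements */
          i += VF;
        }
      while (--k != 0);
    }

  /* Scalar epilogue for the remaining n - i < VF elements.  */
  for (; i < n; i++)
    done++;
  return done;
}
```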

gcc/ChangeLog:

PR tree-optimization/121020
* tree-vect-loop-manip.cc (vect_do_peeling): Update the
condition of omitting the skip-vector check.
* tree-vectorizer.h (LOOP_VINFO_USE_VERSIONING_WITHOUT_PEELING):
Add a helper macro.

gcc/testsuite/ChangeLog:

PR tree-optimization/121020
* gcc.dg/vect/vect-early-break_138-pr121020.c: New test.

vect: Fix insufficient alignment requirement for speculative loads [PR121190]

This patch fixes a segmentation fault issue that can occur in vectorized
loops with an early break. When GCC vectorizes such loops, it may insert
a versioning check to ensure that data references (DRs) with speculative
loads are aligned. The check normally requires DRs to be aligned to the
vector mode size, which prevents generated vector load instructions from
crossing page boundaries.

However, this is not sufficient when a single scalar load is vectorized
into multiple loads within the same iteration. In such cases, even if
none of the vector loads crosses page boundaries, subsequent loads after
the first one may still access memory beyond the current valid page.

Consider the following loop as an example:

while (i < MAX_COMPARE) {
  if (*(p + i) != *(q + i))
    return i;
  i++;
}

When compiled with "-O3 -march=znver2" on x86, the vectorized loop may
include instructions like:

vmovdqa (%rcx,%rax), %ymm0
vmovdqa 32(%rcx,%rax), %ymm1
vpcmpeqq (%rdx,%rax), %ymm0, %ymm0
vpcmpeqq 32(%rdx,%rax), %ymm1, %ymm1

Note two speculative vector loads are generated for each DR (p and q).
The first vmovdqa and vpcmpeqq are safe due to the vector size (32-byte)
alignment, but the following ones (at offset 32) may not be safe because
they could read from the beginning of the next memory page, potentially
leading to segmentation faults.

To avoid the issue, this patch increases the alignment requirement for
speculative loads to DR_TARGET_ALIGNMENT. It ensures all vector loads in
the same vector iteration access memory within the same page.
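The page-crossing arithmetic can be checked with a tiny helper. The
4096-byte page size and the 64-byte figure (covering the two 32-byte
loads per iteration in the example above) are assumptions for
illustration; same_page and SKETCH_PAGE_SIZE are hypothetical names.

```c
#include <stdint.h>

/* Assumed page size.  */
#define SKETCH_PAGE_SIZE 4096u

/* Nonzero if the LEN bytes starting at ADDR lie within a single page.  */
static int
same_page (uintptr_t addr, uintptr_t len)
{
  return addr / SKETCH_PAGE_SIZE == (addr + len - 1) / SKETCH_PAGE_SIZE;
}
```

An address such as 4064 is 32-byte aligned, so the first 32-byte load
stays in the page, but the pair of loads (64 bytes) crosses into the
next one. Every 64-byte-aligned address keeps both loads in one page,
because 64 divides the page size.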

gcc/ChangeLog:

PR tree-optimization/121190
* tree-vect-data-refs.cc (vect_enhance_data_refs_alignment):
Increase alignment requirement for speculative loads.

gcc/testsuite/ChangeLog:

PR tree-optimization/121190
* gcc.dg/vect/vect-early-break_52.c: Update an unsafe test.
* gcc.dg/vect/vect-early-break_137-pr121190.c: New test.

aarch64: Fix sme2+faminmax intrinsic gating (PR 121300)

Fixes the feature gating for the SME2+FAMINMAX intrinsics.

PR target/121300

gcc/ChangeLog:

* config/aarch64/aarch64-sve-builtins-sme.def (svamin/svamax): Fix
arch gating.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr121300.c: New test.

tree-optimization/121304 - set memory_access_type before reading it

The following re-orders gather/scatter handling back to be before
we check for fallback situations, specifically make sure to set
memory_access_type before reading it.

* tree-vect-stmts.cc (get_group_load_store_type):
Process STMT_VINFO_GATHER_SCATTER before reading
memory_access_type.

aarch64: Add support for unpacked SVE FP conditional ternary arithmetic

This patch extends the expander for fma, fnma, fms, and fnms to support
partial SVE FP modes.

We add the missing BF16 tests, which we can now trigger now that the
conditional expander is implemented.

We also add tests for the 'merging with multiplicand' case, which this
expander canonicalizes (albeit under SVE_STRICT_GP).

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md (@cond_<optab><mode>): Extend
to support partial FP modes.
(*cond_<optab><mode>_2_strict): Extend from SVE_FULL_F to SVE_F,
use aarch64_predicate_operand.
(*cond_<optab><mode>_4_strict): Extend from SVE_FULL_F_B16B16 to
SVE_F_B16B16, use aarch64_predicate_operand.
(*cond_<optab><mode>_any_strict): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/unpacked_cond_fmla_1.c: Add test cases
for merging with multiplicand.
* gcc.target/aarch64/sve/unpacked_cond_fmls_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fnmla_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fnmls_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fmla_2.c: New test.
* gcc.target/aarch64/sve/unpacked_cond_fmls_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fnmla_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fnmls_2.c: Likewise.
* g++.target/aarch64/sve/unpacked_cond_ternary_bf16_1.C: Likewise.
* g++.target/aarch64/sve/unpacked_cond_ternary_bf16_2.C: Likewise.

aarch64: Relaxed SEL combiner patterns for unpacked SVE FP ternary arithmetic

Extend the ternary op/UNSPEC_SEL combiner patterns from SVE_FULL_F/
SVE_FULL_F_BF to SVE_F/SVE_F_BF, where the strictness value is
SVE_RELAXED_GP.

We can only reliably test the 'merging with the third input' (addend)
and 'independent value' patterns at this stage, as the canonicalisation that
reorders the multiplicands based on the second SEL input would be performed
by the conditional expander.

Another difficulty is that we can't test these fused multiply/SEL combines
without using __builtin_fma and friends. The reason for this is as follows:

We support COND_ADD, COND_SUB, and COND_MUL optabs, so match.pd will
canonicalize patterns like ADD/SUB/MUL combined with a VEC_COND_EXPR into
these conditional forms. Later, when widening_mul tries to fold these into
conditional fused multiply operations, the transformation fails - simply
because we haven’t implemented those conditional fused multiply optabs yet.

This is why this patch lacks tests for BFloat16.

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md (*cond_<optab><mode>_2_relaxed):
Extend from SVE_FULL_F to SVE_F.
(*cond_<optab><mode>_4_relaxed): Extend from SVE_FULL_F_B16B16
to SVE_F_B16B16.
(*cond_<optab><mode>_any_relaxed): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/unpacked_cond_fmla_1.c: New test.
* gcc.target/aarch64/sve/unpacked_cond_fmls_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fnmla_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fnmls_1.c: Likewise.

fortran: Remove useless elements count variable

The function gfc_array_init_size stores the number of array elements in
a variable supplied by the caller, but the only caller that supplies
such a variable never uses it.

This change removes the variable and the function arguments passing its
address down the call chain.

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_array_init_size): Remove the nelems
argument.
(gfc_array_allocate): Update caller. Remove the nelems
argument.
* trans-stmt.cc (gfc_trans_allocate): Update caller. Remove the
nelems variable.
* trans-array.h (gfc_array_allocate): Update prototype.

fortran: implement split for fortran 2023

This patch includes the implementation, documentation, and test case for SPLIT.
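
As a reading aid, the position update that SPLIT performs can be modelled in C. This is a hypothetical sketch based on our reading of the Fortran 2023 description of SPLIT. It is not the libgfortran string_split code. It uses NUL-terminated C strings where Fortran passes explicit lengths, and split_pos is an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of SPLIT (STRING, SET, POS, BACK), 1-based positions.
   Forward:  POS in [0, LEN] becomes the first position > POS whose
             character is in SET, or LEN + 1 if there is none.
   Backward: POS in [1, LEN + 1] becomes the last position < POS whose
             character is in SET, or 0 if there is none.  */
static int split_pos (const char *string, const char *set, int pos, bool back)
{
  int len = (int) strlen (string);

  if (!back)
    {
      assert (pos >= 0 && pos <= len);
      for (int i = pos + 1; i <= len; i++)
        if (strchr (set, string[i - 1]) != NULL)
          return i;
      return len + 1;
    }

  assert (pos >= 1 && pos <= len + 1);
  for (int i = pos - 1; i >= 1; i--)
    if (strchr (set, string[i - 1]) != NULL)
      return i;
  return 0;
}
```

To tokenize a string, a caller starts with POS = 0 and calls SPLIT repeatedly. The characters strictly between the old and new POS form the next token, and the loop stops when POS reaches LEN + 1.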

gcc/fortran/ChangeLog:

* check.cc (gfc_check_split): Argument check for SPLIT.
* gfortran.h (enum gfc_isym_id): Define GFC_ISYM_SPLIT.
* intrinsic.cc (add_subroutines): Register SPLIT intrinsic.
* intrinsic.h (gfc_check_split): New decl.
(gfc_resolve_split): Ditto.
* intrinsic.texi: SPLIT documentation.
* iresolve.cc (gfc_resolve_split): Add resolved_sym for SPLIT.
* trans-decl.cc (gfc_build_intrinsic_function_decls): Add decl for
SPLIT in libgfortran.
* trans-intrinsic.cc (conv_intrinsic_split): SPLIT codegen.
(gfc_conv_intrinsic_subroutine): Handle SPLIT case.
* trans.h (GTY): Declare gfor_fndecl_string_split{, _char4}.

libgfortran/ChangeLog:

* gfortran.map: Add split symbol.
* intrinsics/string_intrinsics_inc.c (string_split):
Runtime support for SPLIT.

gcc/testsuite/ChangeLog:

* gfortran.dg/split_1.f90: New test.
* gfortran.dg/split_2.f90: New test.
* gfortran.dg/split_3.f90: New test.
* gfortran.dg/split_4.f90: New test.

Signed-off-by: Yuao Ma <c8ef@outlook.com>

aarch64: Add support for unpacked SVE FP ternary arithmetic

This patch extends the expander for unconditional fma, fnma, fms, and
fnms, so that it supports partial SVE FP modes.

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md (<optab><mode>4): Extend from
SVE_FULL_F_B16B16 to SVE_F_B16B16. Use aarch64_sve_fp_pred instead
of aarch64_ptrue_reg.
(@aarch64_pred_<optab><mode>): Extend from SVE_FULL_F_B16B16 to
SVE_F_B16B16. Use aarch64_predicate_operand.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/sve/unpacked_ternary_bf16_1.C: New test.
* g++.target/aarch64/sve/unpacked_ternary_bf16_2.C: Likewise.
* gcc.target/aarch64/sve/unpacked_fmla_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fmla_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fmls_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fmls_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fnmla_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fnmla_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fnmls_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fnmls_2.c: Likewise.

Remove V64SFmode and V64SImode.

These modes were only needed by avx5124vnniw/avx5124fmaps, which were
removed by r15-656-ge1a7e2c54d52d0.

gcc/ChangeLog:

* config/i386/i386-modes.def: Remove VECTOR_MODES(FLOAT, 256)
and VECTOR_MODE (INT, SI, 64).
* config/i386/i386.cc (ix86_hard_regno_nregs): Remove related
code for V64SF/V64SImode.

Eliminate redundant vpextrq/vpinsrq when moving TI to V4SI.

r14-1902-g96c3539f2a3813 split a TImode move into two DImode moves. That
is meant to optimize TImode parameters and return values, which,
according to the psABI, are passed in two general registers.

But when the TImode value is not a parameter or return value, the split
can create redundant moves, as in the PR.

The patch adds a splitter to handle that.

i.e.
(insn 10 9 14 2 (set (subreg:V2DI (reg:V4SI 98 [ <retval> ]) 0)
(vec_concat:V2DI (subreg:DI (reg:TI 101) 0)
(subreg:DI (reg:TI 101) 8)))
8442 {vec_concatv2di}
(expr_list:REG_DEAD (reg:TI 101)
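
The splitter relies on one fact: concatenating the two DImode halves of a TImode register, low half first, reproduces the TImode value itself. The following C sketch demonstrates that equivalence. It assumes a little-endian x86-64 target and GCC's unsigned __int128 and vector_size extensions. The function names are hypothetical, and this is not the PR's testcase.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int v4si __attribute__ ((vector_size (16)));

/* The redundant route: split into two 64-bit halves, then reassemble.  */
static v4si via_halves (unsigned __int128 x)
{
  uint64_t lo = (uint64_t) x;          /* like (subreg:DI (reg:TI) 0) */
  uint64_t hi = (uint64_t) (x >> 64);  /* like (subreg:DI (reg:TI) 8) */
  uint64_t parts[2] = { lo, hi };      /* like vec_concat:V2DI lo hi  */
  v4si v;
  memcpy (&v, parts, sizeof v);
  return v;
}

/* The direct route: reinterpret the same 16 bytes as a vector.  */
static v4si direct (unsigned __int128 x)
{
  v4si v;
  memcpy (&v, &x, sizeof v);
  return v;
}
```

On a little-endian target, the two routes give identical bytes. This is why the vec_concat can become a plain reinterpretation with no vpextrq/vpinsrq round trip.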

gcc/ChangeLog:

PR target/121274
* config/i386/sse.md (*vec_concatv2di_0): Add a splitter
before it.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr121274.c: New test.

RISC-V: Add testcases for unsigned avg ceil vx combine.

The unsigned avg ceil also uses vaaddu.vx for the vx combine,
so add test cases to make sure it works as expected.
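
Per element, unsigned "avg ceil" computes (a + b + 1) >> 1 without overflow. The vx form arises when one operand is a loop-invariant scalar. The following scalar model is written under those assumptions. The function names are hypothetical and do not come from the test headers.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding-up average of two bytes, computed in a wider type.  */
static uint8_t avg_ceil_u8 (uint8_t a, uint8_t b)
{
  return (uint8_t) (((uint16_t) a + b + 1) >> 1);
}

/* No wider standard type exists for 64 bits, so use the overflow-free
   identity ceil ((a + b) / 2) == (a | b) - ((a ^ b) >> 1).  */
static uint64_t avg_ceil_u64 (uint64_t a, uint64_t b)
{
  return (a | b) - ((a ^ b) >> 1);
}

/* The vx shape: one operand is a scalar shared by every element.  */
static void avg_ceil_vx_u8 (uint8_t *out, const uint8_t *in, uint8_t x, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = avg_ceil_u8 (in[i], x);
}
```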

The following test suites passed for this patch series.
* The rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Add asm check
for unsigned avg ceil.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add
test data.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-2-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-2-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-2-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-2-u8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

Daily bump.

simplify-rtx: Fix Distribute subregs over logic ops [PR121302]

r16-2614-g965564eafb721f had a typo where it would assume byte==0
rather than use the byte (offset) that was passed.

This fixes that typo and also fixes the comment, since the transformation
is not just about lowpart subregs but applies to all non-paradoxical subregs.
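
A little-endian C model shows why the offset matters. Here, byte_at stands in for a QImode subreg of an SImode value at byte offset BYTE, with byte 0 holding the least significant bits. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of (subreg:QI v BYTE) on a little-endian target.  */
static uint8_t byte_at (uint32_t v, unsigned byte)
{
  return (uint8_t) (v >> (8 * byte));
}

/* Correct distribution over AND: use the same BYTE for both operands.  */
static uint8_t distributed (uint32_t x, uint32_t c, unsigned byte)
{
  return byte_at (x, byte) & byte_at (c, byte);
}

/* The typo's effect: byte 0 is used whatever BYTE was passed.  */
static uint8_t buggy (uint32_t x, uint32_t c, unsigned byte)
{
  (void) byte;
  return byte_at (x, 0) & byte_at (c, 0);
}
```

With x = 0x11223344 and c = 0xff00, the subreg at byte 1 of (and x c) is 0x33. The distributed form gives 0x33, while the byte-0 version gives 0.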

Pushed as obvious after bootstrap/test on x86_64-linux-gnu.

PR rtl-optimization/121302
gcc/ChangeLog:

* simplify-rtx.cc (simplify_context::simplify_subreg): Use
byte instead of 0 when calling simplify_subreg.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

testsuite: Cleanup after auto-profile testcases when auto-profile is not supported [PR121215]

The problem is that tree-prof.exp does not clean up when a testcase
requires auto-profile, auto-profile is not supported, and the testcase
uses dg-additional-sources. additional_sources is then not reset to "",
so the next testcase treats it as an additional source to be added.

Committed as obvious after testing:
make check-gcc RUNTESTFLAGS="tree-prof.exp=afdo-crossmodule-1.c tree-ssa.exp=pr67891.c"
to make sure pr67891.c now no longer uses the additional source.

PR testsuite/121215
gcc/testsuite/ChangeLog:

* lib/profopt.exp (profopt-execute): Call cleanup-after-saved-dg-test
if returning early because -fauto-profile is not supported.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

aarch64: Add support for unpacked SVE FP conditional binary arithmetic

This patch extends the expander for conditional smax, smin, add, sub, mul,
min, max, and div to support partial SVE FP modes.

If exceptions from undefined vector elements must be suppressed, this
expansion converts the container-level predicate to an element-level one, and
ensures that these elements are inactive for the operation.  In practice, this
is a predicate AND with the existing mask and a container-size PTRUE.
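
The following toy bitmask model illustrates that AND. It assumes a 256-bit vector, so the predicate is 32 bits with one bit per byte, and 16-bit elements held in 32-bit containers. PTRUE_S and the helper names are hypothetical, and the sketch does not reproduce aarch64_sve_emit_masked_fp_pred.

```c
#include <assert.h>
#include <stdint.h>

/* One set bit per 32-bit container: bits 0, 4, 8, ... of the predicate.  */
#define PTRUE_S 0x11111111u

/* AND the governing predicate with the container-size PTRUE so that only
   the first 16-bit lane of each container can be active.  */
static uint32_t mask_to_containers (uint32_t pred)
{
  return pred & PTRUE_S;
}

/* A 16-bit (.h) operation reads every second predicate bit: is lane LANE
   (0..15) active under PRED?  */
static int h_lane_active (uint32_t pred, unsigned lane)
{
  return (pred >> (2 * lane)) & 1u;
}
```

Before masking, an all-ones predicate activates the undefined upper half of every container (lane 1, lane 3, ...). After masking, only lanes 0, 2, 4, ... remain active.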

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_sve_emit_masked_fp_pred):
Declare.
* config/aarch64/aarch64-sve.md (and<mode>3):  Change this to...
(@and<mode>3): ...this, so that we can use gen_and3.
(@cond_<optab><mode>): Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16,
use aarch64_predicate_operand.
(*cond_<optab><mode>_2_strict): Likewise.
(*cond_<optab><mode>_3_strict): Likewise.
(*cond_<optab><mode>_any_strict): Likewise.
(*cond_<optab><mode>_2_const_strict): Extend from SVE_FULL_F to SVE_F,
use aarch64_predicate_operand.
(*cond_<optab><mode>_any_const_strict): Likewise.
(*cond_sub<mode>_3_const_strict): Likewise.
(*cond_sub<mode>_const_strict): Likewise.
(*vcond_mask_<mode><vpred>): Use aarch64_predicate_operand, and update
the comment here.
* config/aarch64/aarch64.cc (aarch64_sve_emit_masked_fp_pred): New
function.  Helper to mask the predicate in conditional expanders.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/sve/unpacked_cond_binary_bf16_2.C: New test.
* gcc.target/aarch64/sve/unpacked_cond_builtin_fmax_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_builtin_fmin_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fadd_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fdiv_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fmaxnm_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fminnm_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fmul_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fsubr_2.c: Likewise.

x86: Pass -mno-80387 to compile pr121208-1(a|b).c

Pass -mno-80387 to compile pr121208-1(a|b).c to silence

.../pr121208-1a.c:11:1: sorry, unimplemented: 80387 instructions aren’t allowed in a function with the ‘no_caller_saved_registers’ attribute

PR target/121208
* gcc.target/i386/pr121208-1a.c (dg-options): Add -mno-80387.
* gcc.target/i386/pr121208-1b.c (dg-options): Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

testsuite: Adjust s390x params for vector tests.

Loop peeling and the minimum loop vectorization threshold prevented
vectorization of the loops in these examples. Adjust the parameters in
the tests so that they pass.

Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
PR testsuite/121286
PR testsuite/121288

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr112325.c: Adjust parameters for s390.
* gcc.dg/vect/pr117888-1.c: Ditto.

RISC-V: Generate -mcpu and -mtune options from riscv-cores.def.

Automatically generate -mcpu and -mtune options in invoke.texi from
the unified riscv-cores.def metadata, ensuring documentation stays in sync
with definitions and reducing manual maintenance.
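
The generator idea, which keeps one .def list as the source of truth and prints documentation from it, can be sketched with an X-macro. Everything here is hypothetical. EXAMPLE_CORES stands in for riscv-cores.def, the core names are placeholders, and the @samp output format is an assumption rather than what gen-riscv-mcpu-texi.cc actually emits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a .def file; the real riscv-cores.def uses
   its own macros and fields.  */
#define EXAMPLE_CORES(X) \
  X ("core-a")           \
  X ("core-b")           \
  X ("core-c")

/* Expand the list once into an array of names.  */
static const char *const core_names[] = {
#define CORE_NAME(name) name,
  EXAMPLE_CORES (CORE_NAME)
#undef CORE_NAME
};

/* Write one Texinfo @samp entry per core into BUF, comma-separated.  */
static void gen_core_texi (char *buf, size_t size)
{
  size_t n = sizeof core_names / sizeof core_names[0];
  size_t used = 0;

  assert (size > 0);
  buf[0] = '\0';
  for (size_t i = 0; i < n; i++)
    {
      int w = snprintf (buf + used, size - used, "@samp{%s}%s\n",
                        core_names[i], i + 1 < n ? "," : ".");
      if (w < 0 || (size_t) w >= size - used)
        return;  /* Output truncated: stop rather than overrun BUF.  */
      used += (size_t) w;
    }
}
```

A real generator would more likely print to stdout and be run by a build rule like the one added to t-riscv. Writing into a buffer just keeps this sketch easy to check.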

gcc/ChangeLog:

* Makefile.in: Add riscv-mcpu.texi and riscv-mtune.texi to the list
of files to be processed by the Texinfo generator.
* config/riscv/t-riscv: Add rule for generating riscv-mcpu.texi
and riscv-mtune.texi.
* doc/invoke.texi: Replace hand‑written extension table with
`@include riscv-mcpu.texi` and `@include riscv-mtune.texi` to
pull in auto‑generated entries.
* config/riscv/gen-riscv-mcpu-texi.cc: New file.
* config/riscv/gen-riscv-mtune-texi.cc: New file.
* doc/riscv-mcpu.texi: New file.
* doc/riscv-mtune.texi: New file.

simplify-rtx: Simplify subregs of logic ops

This patch adds a new rule for distributing lowpart subregs through
ANDs, IORs, and XORs with a constant, in cases where one of the terms
then disappears.  For example:

  (lowpart-subreg:QI (and:HI x 0x100))

simplifies to zero and

  (lowpart-subreg:QI (and:HI x 0xff))

simplifies to (lowpart-subreg:QI x).
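
The integer examples can be checked exhaustively with a C model. Here, truncation to uint8_t plays the role of the QImode lowpart subreg. The model also covers the general case, where neither term disappears and the patch does not apply. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* (lowpart-subreg:QI (and:HI x c)): truncate the 16-bit AND to 8 bits.  */
static uint8_t lowpart_qi_of_and (uint16_t x, uint16_t c)
{
  return (uint8_t) (x & c);
}

/* Distributed form: only the low 8 bits of C matter.  */
static uint8_t simplified (uint16_t x, uint16_t c)
{
  uint8_t c_lo = (uint8_t) c;

  if (c_lo == 0x00)
    return 0;                 /* e.g. c = 0x100: the whole AND vanishes */
  if (c_lo == 0xff)
    return (uint8_t) x;       /* e.g. c = 0xff: the constant vanishes */
  return (uint8_t) x & c_lo;  /* neither term disappears */
}
```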

This would often be handled at some point using nonzero bits.  However,
the specific case I want the optimisation for is SVE predicates,
where nonzero bit tracking isn't currently an option.  Specifically:
the predicate modes VNx8BI, VNx4BI and VNx2BI have the same size as
VNx16BI, but treat only every second, fourth, or eighth bit as
significant.  Thus if we have:

  (subreg:VNx8BI (and:VNx16BI x C))

where C is the repeating constant { 1, 0, 1, 0, ... }, then the
AND only clears bits that are made insignificant by the subreg,
and so the result is equal to (subreg:VNx8BI x).  Later patches
rely on this.
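
The predicate case can be checked with a toy model. It assumes a 128-bit vector, so a VNx16BI value is 16 bits, with bit i holding element i. Viewed as VNx8BI, only the even bits are significant. The constant name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 0, 2, 4, ...: the significant bits of a VNx8BI view, and also the
   repeating constant { 1, 0, 1, 0, ... } from the example.  */
#define VNX8BI_SIGNIFICANT 0x5555u

/* Two VNx8BI values are equal if they agree on the significant bits.  */
static int vnx8bi_equal (uint16_t p, uint16_t q)
{
  return ((p ^ q) & VNX8BI_SIGNIFICANT) == 0;
}
```

For every x, (and x C) and x agree on all significant bits. So (subreg:VNx8BI (and:VNx16BI x C)) equals (subreg:VNx8BI x) even though the two VNx16BI values differ.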

gcc/
* simplify-rtx.cc (simplify_context::simplify_subreg): Distribute
lowpart subregs through AND/IOR/XOR, if doing so eliminates one
of the terms.
(test_scalar_int_ext_ops): Add some tests of the above for integers.
* config/aarch64/aarch64.cc (aarch64_test_sve_folding): Likewise
add tests for predicate modes.

testsuite: Generalise aarch64/saturating_arithmetic*.c

gcc.target/aarch64/saturating_arithmetic_{1,2}.c expect w0 and w1 to
be duplicated into vectors. The tests expected the duplication of w1
to happen first, but the other order would be fine too. A later
simplify-rtx.cc patch happens to change the order.

gcc/testsuite/
* gcc.target/aarch64/saturating_arithmetic_1.c: Allow w0 and w1
to be duplicated in either order.
* gcc.target/aarch64/saturating_arithmetic_2.c: Likewise.

testsuite: Make aarch64/cmpbr.c more forgiving

The 8-bit and 16-bit tests in cmpbr.c assumed an inverted operand
order ("w1, w0"), but it's possible to use the uninverted operand
order too. This patch generalises the tests to support both forms.

This is a prerequisite for a later patch that adds a new
simplify-rtx.cc rule.

gcc/testsuite/
* gcc.target/aarch64/cmpbr.c: Support both operand orders
for 8-bit and 16-bit comparisons.