git.ipfire.org Git - thirdparty/gcc.git/log

Simplify intereaved store vectorization processing

When doing interleaving we perform code generation when visiting the
last store of a chain. We keep track of this via DR_GROUP_STORE_COUNT,
the following localizes this to the caller of vectorizable_store,
also avoing redundant non-processing of the other stores.

* tree-vect-stmts.cc (vectorizable_store): Do not bump
DR_GROUP_STORE_COUNT here. Remove early out.
(vect_transform_stmt): Only call vectorizable_store on
the last element of an interleaving chain.

MAINTAINERS: Update my email address

Signed-off-by: Filip Kastl <fkastl@suse.cz>
ChangeLog:

* MAINTAINERS: Update my email address.

tree-optimization/94864 - vector insert of vector extract simplification

The PRs ask for optimizing of

  _1 = BIT_FIELD_REF <b_3(D), 64, 64>;
  result_4 = BIT_INSERT_EXPR <a_2(D), _1, 64>;

to a vector permutation.  The following implements this as
match.pd pattern, improving code generation on x86_64.

On the RTL level we face the issue that backend patterns inconsistently
use vec_merge and vec_select of vec_concat to represent permutes.

I think using a (supported) permute is almost always better
than an extract plus insert, maybe excluding the case we extract
element zero and that's aliased to a register that can be used
directly for insertion (not sure how to query that).

The patch FAILs one case in gcc.target/i386/avx512fp16-vmovsh-1a.c
where we now expand from

__A_28 = VEC_PERM_EXPR <x2.8_9, x1.9_10, { 0, 9, 10, 11, 12, 13, 14, 15 }>;

instead of

_28 = BIT_FIELD_REF <x2.8_9, 16, 0>;
__A_29 = BIT_INSERT_EXPR <x1.9_10, _28, 0>;

producing a vpblendw instruction instead of the expected vmovsh.  That's
either a missed vec_perm_const expansion optimization or even better,
an improvement - Zen4 for example has 4 ports to execute vpblendw
but only 3 for executing vmovsh and both instructions have the same size.

The patch XFAILs the sub-testcase.

PR tree-optimization/94864
PR tree-optimization/94865
PR tree-optimization/93080
* match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..): New pattern
for vector insertion from vector extraction.

* gcc.target/i386/pr94864.c: New testcase.
* gcc.target/i386/pr94865.c: Likewise.
* gcc.target/i386/avx512fp16-vmovsh-1a.c: XFAIL.
* gcc.dg/tree-ssa/forwprop-40.c: Likewise.
* gcc.dg/tree-ssa/forwprop-41.c: Likewise.

Fortran: implement vector sections in DATA statements [PR49588]

gcc/fortran/ChangeLog:

PR fortran/49588
* data.cc (gfc_advance_section): Derive next index set and next offset
into DATA variable also for array references using vector sections.
Use auxiliary array to keep track of offsets into indexing vectors.
(gfc_get_section_index): Set up initial indices also for DATA variables
with array references using vector sections.
* data.h (gfc_get_section_index): Adjust prototype.
(gfc_advance_section): Likewise.
* resolve.cc (check_data_variable): Pass vector offsets.

gcc/testsuite/ChangeLog:

PR fortran/49588
* gfortran.dg/data_vector_section.f90: New test.

VECT: Support loop len control on EXTRACT_LAST vectorization

Hi, @Richi and @Richard, base on previous disscussion, I simpily fix issuses for
powerpc and s390 with your suggestions:

-  machine_mode len_load_mode = get_len_load_store_mode
-    (loop_vinfo->vector_mode, true).require ();
-  machine_mode len_store_mode = get_len_load_store_mode
-    (loop_vinfo->vector_mode, false).require ();
+  machine_mode len_load_mode, len_store_mode;
+  if (!get_len_load_store_mode (loop_vinfo->vector_mode, true)
+        .exists (&len_load_mode))
+    return false;
+  if (!get_len_load_store_mode (loop_vinfo->vector_mode, false)
+        .exists (&len_store_mode))
+    return false;

Co-Authored-By: Kewen.Lin <linkw@linux.ibm.com>
gcc/ChangeLog:

* tree-vect-loop.cc (vect_verify_loop_lens): Add exists check.
(vectorizable_live_operation): Add live vectorization for length loop
control.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/live-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/live_run-1.c: New test.

Testcase fix.

gcc/testsuite/ChangeLog:

* gcc.target/i386/invariant-ternlog-1.c: Only scan %rdx under
TARGET_64BIT.

RISC-V: Change fnms testcases assertion to xfail

Hi,

This patch fixes inappropriate assertions in fnms testcases since
we want to generate .COND_FNMS but actually generate .FNMS + .VCOND_MASK.
A patch to do this optimization will follow.

Best,
Lehua

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-1.c: Adjust.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-6.c: Ditto.

analyzer: check format strings for null termination [PR105899]

This patch extends -fanalyzer to check the format strings of calls
to functions marked with '__attribute__ ((format...))'.

The only checking done in this patch is to check that the format string
is a valid null-terminated string; this patch doesn't attempt to check
the content of the format string.

gcc/analyzer/ChangeLog:
PR analyzer/105899
* call-details.cc (call_details::call_details): New ctor.
* call-details.h (call_details::call_details): New ctor decl.
(struct call_arg_details): Move here from region-model.cc.
* region-model.cc (region_model::check_call_format_attr): New.
(region_model::check_call_args): Call it.
(struct call_arg_details): Move it to call-details.h.
* region-model.h (region_model::check_call_format_attr): New decl.

gcc/testsuite/ChangeLog:
PR analyzer/105899
* gcc.dg/analyzer/attr-format-1.c: New test.
* gcc.dg/analyzer/sprintf-1.c: Update expected results for
now-passing tests.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: add kf_fopen

Add checking to -fanalyzer that both params of calls to "fopen" are
valid null-terminated strings.

gcc/analyzer/ChangeLog:
* kf.cc (class kf_fopen): New.
(register_known_functions): Register it.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/fopen-1.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: replace -Wanalyzer-unterminated-string with scan_for_null_terminator [PR105899]

In r14-3169-g325f9e88802daa I added check_for_null_terminated_string_arg
to -fanalyzer, calling it in various places, with a sole check for
unterminated string constants, adding -Wanalyzer-unterminated-string for
this case.

This patch adds region_model::scan_for_null_terminator, which simulates
scanning memory for a zero byte, complaining about uninitiliazed bytes
and out-of-range accesses seen before any zero byte is seen.

This more flexible approach catches the issues we saw before with
-Wanalyzer-unterminated-string, and also catches uninitialized runs
of bytes, and I believe will be a better way to build checking of C
string operations in the analyzer.

Given that the patch makes -Wanalyzer-unterminated-string redundant
and that this option was only in trunk for 10 days and has no known
users, the patch simply removes the option without a compatibility
fallback.

The patch uses custom events and notes to provide context on where
the issues are coming from.  For example, given:

null-terminated-strings-1.c: In function ‘test_partially_initialized’:
null-terminated-strings-1.c:71:3: warning: use of uninitialized value ‘buf[1]’ [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
   71 |   __analyzer_get_strlen (buf);
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~
  ‘test_partially_initialized’: events 1-3
    |
    |   69 |   char buf[16];
    |      |        ^~~
    |      |        |
    |      |        (1) region created on stack here
    |   70 |   buf[0] = 'a';
    |   71 |   __analyzer_get_strlen (buf);
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (2) while looking for null terminator for argument 1 (‘&buf’) of ‘__analyzer_get_strlen’...
    |      |   (3) use of uninitialized value ‘buf[1]’ here
    |
analyzer-decls.h:59:22: note: argument 1 of ‘__analyzer_get_strlen’ must be a pointer to a null-terminated string
   59 | extern __SIZE_TYPE__ __analyzer_get_strlen (const char *ptr);
      |                      ^~~~~~~~~~~~~~~~~~~~~

gcc/analyzer/ChangeLog:
PR analyzer/105899
* analyzer.opt (Wanalyzer-unterminated-string): Delete.
* call-details.cc
(call_details::check_for_null_terminated_string_arg): Convert
return type from void to const svalue *.  Add param "out_sval".
* call-details.h
(call_details::check_for_null_terminated_string_arg): Likewise.
* kf-analyzer.cc (kf_analyzer_get_strlen::impl_call_pre): Wire up
to result of check_for_null_terminated_string_arg.
* region-model.cc (get_strlen): Delete.
(class unterminated_string_arg): Delete.
(struct fragment): New.
(class iterable_cluster): New.
(region_model::get_store_bytes): New.
(get_tree_for_byte_offset): New.
(region_model::scan_for_null_terminator): New.
(region_model::check_for_null_terminated_string_arg): Convert
return type from void to const svalue *.  Add param "out_sval".
Reimplement in terms of scan_for_null_terminator, dropping the
special-case for -Wanalyzer-unterminated-string.
* region-model.h (region_model::get_store_bytes): New decl.
(region_model::scan_for_null_terminator): New decl.
(region_model::check_for_null_terminated_string_arg): Convert
return type from void to const svalue *.  Add param "out_sval".
* store.cc (concrete_binding::get_byte_range): New.
* store.h (concrete_binding::get_byte_range): New decl.
(store_manager::get_concrete_binding): New overload.

gcc/ChangeLog:
PR analyzer/105899
* doc/invoke.texi: Remove -Wanalyzer-unterminated-string.

gcc/testsuite/ChangeLog:
PR analyzer/105899
* gcc.dg/analyzer/error-1.c: Update expected results to reflect
reimplementation of unterminated string detection.  Add test
coverage for uninitialized buffers.
* gcc.dg/analyzer/null-terminated-strings-1.c: Likewise.
* gcc.dg/analyzer/putenv-1.c: Likewise.
* gcc.dg/analyzer/strchr-1.c: Likewise.
* gcc.dg/analyzer/strcpy-1.c: Likewise.
* gcc.dg/analyzer/strdup-1.c: Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: handle NULL inner context in region_model_context_decorator

gcc/analyzer/ChangeLog:
* region-model.cc (region_model_context_decorator::add_event):
Handle m_inner being NULL.
* region-model.h (class region_model_context_decorator): Likewise.
(annotating_context::warn): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: add ability for context to add events to a saved_diagnostic

gcc/analyzer/ChangeLog:
* diagnostic-manager.cc (saved_diagnostic::add_event): New.
(saved_diagnostic::add_any_saved_events): New.
(diagnostic_manager::add_event): New.
(dedupe_winners::emit_best): New.
(diagnostic_manager::emit_saved_diagnostic): Make "sd" param
non-const. Call saved_diagnostic::add_any_saved_events.
* diagnostic-manager.h (saved_diagnostic::add_event): New decl.
(saved_diagnostic::add_any_saved_events): New decl.
(saved_diagnostic::m_saved_events): New field.
(diagnostic_manager::add_event): New decl.
(diagnostic_manager::emit_saved_diagnostic): Make "sd" param
non-const.
* engine.cc (impl_region_model_context::add_event): New.
* exploded-graph.h (impl_region_model_context::add_event): New decl.
* region-model.cc
(noop_region_model_context::add_event): New.
(region_model_context_decorator::add_event): New.
* region-model.h (region_model_context::add_event): New vfunc.
(noop_region_model_context::add_event): New decl.
(region_model_context_decorator::add_event): New decl.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: convert note_adding_context to annotating_context

This is enabling work towards the context being able to inject
events into diagnostic paths, rather than just notes after the
warning.

gcc/analyzer/ChangeLog:
* region-model.cc
(class check_external_function_for_access_attr::annotating_ctxt):
Convert to an annotating_context.
* region-model.h (class note_adding_context): Rename to...
(class annotating_context): ...this, updating the "warn" method.
(note_adding_context::make_note): Replace with...
(annotating_context::add_annotations): ...this.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Daily bump.

RISC-V: Support RVV VFWREDUSUM.VS rounding mode intrinsic API

This patch would like to support the rounding mode API for the
VFWREDUSUM.VS as the below samples

* __riscv_vfwredusum_vs_f32m1_f64m1_rm
* __riscv_vfwredusum_vs_f32m1_f64m1_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(vfwredusum_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfwredusum_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-wredusum.c: New test.

bpf: neg instruction does not accept an immediate

The BPF virtual machine does not support neg nor neg32 instructions with
an immediate.

The erroneous instructions were removed from binutils:
https://sourceware.org/pipermail/binutils/2023-August/129135.html

Change the define_insn so that an immediate cannot be accepted.

From testing, a neg-immediate was probably never chosen over a
mov-immediate anyway.

gcc/

* config/bpf/bpf.md (neg): Second operand must be a register.

[PATCH] RISC-V: Add Types to Missing Bitmanip Instructions

This patch updates the bitmanip instructions to ensure that no insn is left
without a type attribute. Updates a total of 8 insns to have type "bitmanip"

Tested for regressions using rv32/64 multilib with newlib/linux.

gcc/Changelog:

* config/riscv/bitmanip.md: Added bitmanip type to insns
that are missing types.

Remove XFAIL from gcc/testsuite/gcc.dg/unroll-7.c

This test passes since commit e41103081bfa "Fix undefined behaviour in
profile_count::differs_from_p", so remove the xfail annotation.

Tested on aarch64-linux-gnu, armv8l-linux-gnueabihf and x86_64-linux-gnu.

gcc/testsuite/ChangeLog:
* gcc.dg/unroll-7.c: Remove xfail.

[RISCV][committed] Remove spurious newline in ztso sequence

amo-table-ztso-load-3 the coordination branch after merging up the Ztso changes
due to a spurious newline in the output causing scan-function-body to fail.
There's probably an over-zealous .* or similar regexp in the framework. I
didn't see it in a quick scan, but could have easily missed it.

Regardless, fixing the extraneous newline is easy :-)

gcc/
* config/riscv/sync-ztso.md (atomic_load_ztso<mode>): Avoid extraenous
newline.

aarch64: fix format specifier

gcc/ChangeLog:

* config/aarch64/falkor-tag-collision-avoidance.cc (dump_insn_list):
Fix format specifier.

[frange] Return false if nothing changed in union_nans().

When one operand is a known NAN, we always return TRUE from
union_nans(), even if no change occurred. This patch fixes the
oversight.

gcc/ChangeLog:

* value-range.cc (frange::union_nans): Return false if nothing
changed.
(range_tests_floats): New test.

[PATCH 2/2] RISC-V: Add quotes to #error messages (all)

From: Tsukasa OI <research_trasio@irq.a4lg.com>

In commit 1aaf3a64e92a ("[PATCH] RISC-V: Deduplicate #error messages in
testsuite"), the author made a mistake to miss the test after adding
quotes around extension names. To avoid future errors and for consistency
with other #error uses in the RISC-V testsuite, this commit quotes all
unquoted #error messages.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadba.c: Quote unquoted #error message.
* gcc.target/riscv/xtheadbb.c: Ditto.
* gcc.target/riscv/xtheadbs.c: Ditto.
* gcc.target/riscv/xtheadcmo.c: Ditto.
* gcc.target/riscv/xtheadcondmov.c: Ditto.
* gcc.target/riscv/xtheadfmemidx.c: Ditto.
* gcc.target/riscv/xtheadfmv.c: Ditto.
* gcc.target/riscv/xtheadint.c: Ditto.
* gcc.target/riscv/xtheadmac.c: Ditto.
* gcc.target/riscv/xtheadmemidx.c: Ditto.
* gcc.target/riscv/xtheadmempair.c: Ditto.
* gcc.target/riscv/xtheadsync.c: Ditto.
* gcc.target/riscv/zawrs.c: Ditto.
* gcc.target/riscv/zvbb.c: Ditto.
* gcc.target/riscv/zvbc.c: Ditto.
* gcc.target/riscv/zvkg.c: Ditto.
* gcc.target/riscv/zvkned.c: Ditto.
* gcc.target/riscv/zvknha.c: Ditto.
* gcc.target/riscv/zvknhb.c: Ditto.
* gcc.target/riscv/zvksed.c: Ditto.
* gcc.target/riscv/zvksh.c: Ditto.
* gcc.target/riscv/zvkt.c: Ditto.

[PATCH 1/2] RISC-V: Add quotes to #error messages

In commit 1aaf3a64e92a ("[PATCH] RISC-V: Deduplicate #error messages in
testsuite"), the author made a mistake to miss the test after adding
quotes around extension names. To avoid future errors and for consistency
with other #error uses in the RISC-V testsuite, this commit quotes #error
messages where necessary to avoid current test case failures.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zvkn.c: Quote #error messages.
* gcc.target/riscv/zvkn-1.c: Ditto.
* gcc.target/riscv/zvknc.c: Ditto.
* gcc.target/riscv/zvknc-1.c: Ditto.
* gcc.target/riscv/zvknc-2.c: Ditto.
* gcc.target/riscv/zvkng.c: Ditto.
* gcc.target/riscv/zvkng-1.c: Ditto.
* gcc.target/riscv/zvkng-2.c: Ditto.
* gcc.target/riscv/zvks.c: Ditto.
* gcc.target/riscv/zvks-1.c: Ditto.
* gcc.target/riscv/zvksc.c: Ditto.
* gcc.target/riscv/zvksc-1.c: Ditto.
* gcc.target/riscv/zvksc-2.c: Ditto.
* gcc.target/riscv/zvksg.c: Ditto.
* gcc.target/riscv/zvksg-1.c: Ditto.
* gcc.target/riscv/zvksg-2.c: Ditto.

Fix FAIL: gcc.target/i386/pr87007-5.c

The following fixes the gcc.target/i386/pr87007-5.c testcase which
changed code generation again after the recent sinking improvements.
We now have

        vxorps  %xmm0, %xmm0, %xmm0
        vsqrtsd d2(%rip), %xmm0, %xmm0

and a necessary xor again in one case, the other vsqrtsd has
a register source and a properly zeroing load:

        vmovsd  d3(%rip), %xmm0
        testl   %esi, %esi
        jg      .L11
.L3:
        vsqrtsd %xmm0, %xmm0, %xmm0

the following patch adjusts the scan.

* gcc.target/i386/pr87007-5.c: Update comment, adjust subtest.

Fix gcc.dg/vect/bb-slp-subgroups-2.c with 256bit vectors

The following adds vect128, vect256 and vect512 effective targets
and adjusts gcc.dg/vect/bb-slp-subgroups-2.c accordingly.

gcc/testsuite/
* lib/target-supports.exp: Add vect128, vect256 and vect512
effective targets.
* gcc.dg/vect/bb-slp-subgroups-2.c: Properly handle the
vect256 case.

Fix gcc.dg/vect/pr65947-7.c failures on aarch64.

gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr65947-7.c: Add target check aarch64*-*-* and scan vect
dump for pattern "optimizing condition reduction with FOLD_EXTRACT_LAST"
for targets that support vect_fold_extract_last.

Fix gcc.dg/vect/bb-slp-46.c FAIL

When relaxing vectorization of possibly overflowing reductions I
failed to update a testcase that will now vectorize and no longer
test for what it was written for. The following replaces the
vectorizable add with a division.

* gcc.dg/vect/bb-slp-46.c: Use division instead of addition
to avoid reduction vectorization.

Adjust testcase for Intel GDS.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512f-pr88464-2.c: Add -mgather to
options.
* gcc.target/i386/avx512f-pr88464-3.c: Ditto.
* gcc.target/i386/avx512f-pr88464-4.c: Ditto.
* gcc.target/i386/avx512f-pr88464-6.c: Ditto.
* gcc.target/i386/avx512f-pr88464-7.c: Ditto.
* gcc.target/i386/avx512f-pr88464-8.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-10.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-12.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-13.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-14.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-15.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-16.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-2.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-4.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-5.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-6.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-7.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-8.c: Ditto.

PR111048: Set arg_npatterns correctly.

In valid_mask_for_fold_vec_perm_cst we set arg_npatterns always
to VECTOR_CST_NPATTERNS (arg0) because of (q1 & 0) == 0:

     /* Ensure that the stepped sequence always selects from the same
         input pattern.  */
      unsigned arg_npatterns
        = ((q1 & 0) == 0) ? VECTOR_CST_NPATTERNS (arg0)
                          : VECTOR_CST_NPATTERNS (arg1);

resulting in wrong code-gen issues.
The patch fixes this by changing the condition to (q1 & 1) == 0.

gcc/ChangeLog:
PR tree-optimization/111048
* fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Set arg_npatterns
correctly.
(fold_vec_perm_cst): Remove workaround and again call
valid_mask_fold_vec_perm_cst_p for both VLS and VLA vectors.
(test_fold_vec_perm_cst::test_nunits_min_4): Add test-case.

tree-optimization/111082 - bogus promoted min

vectorize_slp_instance_root_stmt promotes operations with undefined
overflow to unsigned arithmetic but fails to consider operations
that do not overflow like MIN which it turned into MIN with wrong
signedness and in the case of the PR an unsupported operation.
The following rectifies this.

PR tree-optimization/111082
* tree-vect-slp.cc (vectorize_slp_instance_root_stmt): Only
pun operations that can overflow.

* gcc.dg/pr111082.c: New testcase.

libstdc++: Remove reliance on unspecified behaviour in std::rethrow_if_nested test

This test case calls std::set_terminate while there is an active
exception. Since LWG 2111 it is unspecified which terminate handler is
used when std::nested_exception::rethrow_nested() calls std::terminate.
With libsupc++ the global handler changed by std::set_terminate is used,
but libc++abi uses the active exception's handler (the one that was
current when the exception was first thrown).

Adjust the test case so that it works with either implementation choice.
So that the process doesn't exit cleanly if std::terminate happens
sooner than expected, use a global variable to control when the "clean
terminate" behaviour happens.

libstdc++-v3/ChangeLog:

* testsuite/18_support/nested_exception/rethrow_if_nested-term.cc:
Call std::set_terminate before throwing the nested exception.

LCM: Export 2 helpful functions as global for VSETVL PASS use in RISC-V backend

This patch exports 'compute_antinout_edge' and 'compute_earliest' as global scope
which is going to be used in VSETVL PASS of RISC-V backend.

The demand fusion is the fusion of VSETVL information to emit VSETVL which dominate and pre-config for most
of the RVV instructions in order to elide redundant VSETVLs.

For exmaple:

for
for
  for
    if (cond}
      VSETVL demand 1: SEW/LMUL = 16 and TU policy
    else
      VSETVL demand 2: SEW = 32

VSETVL pass should be able to fuse demand 1 and demand 2 into new demand: SEW = 32, LMUL = M2, TU policy.
Then emit such VSETVL at the outmost of the for loop to get the most optimal codegen and run-time execution.

Currenty the VSETVL PASS Phase 3 (demand fusion) is really messy and un-reliable as well as un-maintainable.
And, I recently read dragon book and morgan's book again, I found there "earliest" can allow us to do the
demand fusion in a very reliable and optimal way.

So, this patch exports these 2 functions which are very helpful for VSETVL pass.

gcc/ChangeLog:

* lcm.cc (compute_antinout_edge): Export as global use.
(compute_earliest): Ditto.
(compute_rev_insert_delete): Ditto.
* lcm.h (compute_antinout_edge): Ditto.
(compute_earliest): Ditto.

tree-optimization/111070 - fix ICE with recent ifcombine fix

We now got test coverage for non-SSA name bits so the following amends
the SSA_NAME_OCCURS_IN_ABNORMAL_PHI checks.

PR tree-optimization/111070
* tree-ssa-ifcombine.cc (ifcombine_ifandif): Check we have
an SSA name before checking SSA_NAME_OCCURS_IN_ABNORMAL_PHI.

* gcc.dg/pr111070.c: New testcase.

MATCH: [PR111002] Sink view_convert for vec_cond

Like convert we can sink view_convert into vec_cond but
we can only do it if the element types are nop_conversions.
This is to allow conversion between signed and unsigned types only.
Rather than between integer and float types which mess up the vec_cond
so that isel does not understand `a?-1:0` is still that.

OK? Bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-gnu.

PR tree-optimization/111002

gcc/ChangeLog:

* match.pd (view_convert(vec_cond(a,b,c))): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/cond_convert_8.c: New test.

Testsuite, LTO: silence warning to make test pass on Darwin

gcc/testsuite/ChangeLog:

* gcc.dg/lto/20091013-1_2.c: Add -Wno-stringop-overread.

Support -march=gracemont

Alderlake-N is E-core only, add it as an alias of Alderlake.

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_intel_cpu): Detect
Alderlake-N.
* common/config/i386/i386-common.cc (alias_table): Support
-march=gracemont as an alias of -march=alderlake.

Daily bump.

PR modula2/111085 nexttoward and nexttowardf contain incorrect definitions

The definition for procedures nexttoward and nexttowardf contain
second incorrect parameter and return types. This bug was
discovered when attempting to resolve PR 108143 and is applied
separately and prior to PR 108143.

gcc/m2/ChangeLog:

PR modula2/111085
* gm2-libs/Builtins.def (nexttoward): Alter the second
parameter to LONGREAL.
(nexttowardf): Alter the second parameter to LONGREAL.
* gm2-libs/Builtins.mod (nexttoward): Alter the second
parameter to LONGREAL.
(nexttowardf): Alter the second parameter to LONGREAL.
* gm2-libs/cbuiltin.def (nexttoward): Alter the second
parameter to LONGREAL.
(nexttowardf): Alter the second parameter to LONGREAL.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

Testsuite, darwin: account for macOS 13 and 14

gcc/testsuite/ChangeLog:

* gcc.dg/darwin-minversion-link.c: Account for macOS 13 and 14.

testsuite: Adjust g++.dg/gomp/pr58567.C to new compiler message

Commit 92d1425ca780 "c++: redundant targ coercion for var/alias tmpls"
changed the compiler error message in this testcase from

<source>: In instantiation of 'void foo() [with T = int]':
<source>:14:11:   required from here
<source>:8:22: error: 'int' is not a class, struct, or union type
<source>:8:22: error: 'int' is not a class, struct, or union type
<source>:8:22: error: 'int' is not a class, struct, or union type
<source>:8:3: error: expected iteration declaration or initialization
compiler exited with status 1

to:

<source>: In instantiation of 'void foo() [with T = int]':
<source>:14:11:   required from here
<source>:8:22: error: 'int' is not a class, struct, or union type
<source>:8:3: error: invalid type for iteration variable 'i'
compiler exited with status 1
Excess errors:
<source>:8:3: error: invalid type for iteration variable 'i'

Andrew Pinski analysed the issue in PR 110756 and considered that it was a
testsuite issue in that the error message changed slightly.  Also, it's a
better error message.

Therefore, we only need to adjust the testcase to expect the new message.

gcc/testsuite/ChangeLog:
PR testsuite/110756
* g++.dg/gomp/pr58567.C: Adjust to new compiler error message.

Testsuite, darwin: Fix analyzer testcases

On darwin, system headers are fortified by default and that defeats the
analyzer's warnings on memcpy() calls. Turn this off for testing.

gcc/testsuite/ChangeLog:

* gcc.dg/plugin/taint-CVE-2011-0521-5-fixed.c: Use
_FORTIFY_SOURCE=0 on darwin.
* gcc.dg/plugin/taint-CVE-2011-0521-5.c: Likewise.
* gcc.dg/plugin/taint-CVE-2011-0521-6.c: Likewise.

Testsuite: mark IPA test as requiring alias support

This was indicated in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85656
but never committed. Without it, the test fails on darwin.

gcc/testsuite/ChangeLog:
* gcc.dg/ipa/ipa-icf-38.c: Require alias support.

Testsuite, plugin: make testcase pattern more flexible

On Darwin, the message recorded in the sarif file contains:
"message": {"text": "Segmentation fault: 11"}
which is different from, e.g., linux:
"message": {"text": "Segmentation fault"}

Adjusting the testcase pattern to be a little more flexible.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/crash-test-write-though-null-sarif.c: Update
expected pattern.

i386: Micro-optimize ix86_expand_sse_extend

Partial vector src is forced to a register as ops[1], we can use it
instead of SRC in the call to ix86_expand_sse_cmp. This change avoids
forcing operand[1] to a register in sign/zero-extend expanders.

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_sse_extend): Use ops[1]
instead of src in the call to ix86_expand_sse_cmp.
* config/i386/sse.md (<any_extend:insn>v8qiv8hi2): Do not
force operands[1] to a register.
(<any_extend:insn>v4hiv4si2): Ditto.
(<any_extend:insn>v2siv2di2): Ditto.

d: Merge upstream dmd, druntime 26f049fb26, phobos 330d6a4fd.

D front-end changes:

- Import dmd v2.105.0-beta.1.
- Added predefined version identifier VisionOS (ignored by GDC).
- Functions can no longer have `enum` storage class.
- The deprecation of the `body` keyword has been reverted, it is
  now an obsolete feature.
- The error for `scope class` has been reverted, it is now an
  obsolete feature.

D runtime changes:

- Import druntime v2.105.0-beta.1.

Phobos changes:

- Import phobos v2.105.0-beta.1.
- AliasSeq has been removed from std.math.
- extern(C) getdelim and getline have been removed from
  std.stdio.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 26f049fb26.
* dmd/VERSION: Bump version to v2.105.0-beta.1.
* d-codegen.cc (get_frameinfo): Check useGC in condition.
* d-lang.cc (d_handle_option): Set obsolete parameter when compiling
with -Wall.
(d_post_options): Set useGC to false when compiling with
-fno-druntime.  Propagate obsolete flag to compileEnv.
* expr.cc (ExprVisitor::visit (CatExp *)): Check useGC in condition.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 26f049fb26.
* src/MERGE: Merge upstream phobos 330d6a4fd.

Testsuite: fix analyzer tests on Darwin

On macOS, system headers redefine by default some macros (memcpy,
memmove, etc) to checked versions, which defeats the analyzer. We
want to turn this off.
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104042

gcc/testsuite/ChangeLog:

PR analyzer/104042
* gcc.dg/analyzer/analyzer.exp: Pass -D_FORTIFY_SOURCE=0 on Darwin.
* gcc.dg/analyzer/fd-bind.c: Add missing <string.h> header.
* gcc.dg/analyzer/fd-datagram-socket.c: Likewise.
* gcc.dg/analyzer/fd-listen.c: Likewise.
* gcc.dg/analyzer/fd-socket-misuse.c: Likewise.
* gcc.dg/analyzer/fd-stream-socket-active-open.c: Likewise.
* gcc.dg/analyzer/fd-stream-socket-passive-open.c: Likewise.
* gcc.dg/analyzer/fd-stream-socket.c: Likewise.
* gcc.dg/analyzer/fd-symbolic-socket.c: Likewise.

MATCH: Sink convert for vec_cond

Convert be sinked into a vec_cond if both sides
fold. Unlike other unary operations, we need to check that we still can handle
this vec_cond's first operand is the same as the new truth type.

I tried a few different versions of this patch:
view_convert to the new truth_type but that does not work as we always support all vec_cond
afterwards.
using expand_vec_cond_expr_p; but that would allow too much.

I also tried to see if view_convert can be handled here but we end up with:
_3 = VEC_COND_EXPR <_2, { Nan(-1), Nan(-1), Nan(-1), Nan(-1) }, { 0.0, 0.0, 0.0, 0.0 }>;
Which isel does not know how to handle as just being a view_convert from `vector(4) <signed-boolean:32>`
to `vector(4) float` and causes a regression with `g++.target/i386/pr88152.C`

Note, in the case of the SVE testcase, we will sink negate after the convert and be able
to remove a few extra instructions in the end.
Also with this change gcc.target/aarch64/sve/cond_unary_5.c will now pass.

Committed as approved after a bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-gnu.

gcc/ChangeLog:

PR tree-optimization/111006
PR tree-optimization/110986
* match.pd: (op(vec_cond(a,b,c))): Handle convert for op.

gcc/testsuite/ChangeLog:

PR tree-optimization/111006
* gcc.target/aarch64/sve/cond_convert_7.c: New test.

fix misleading identation breaking bootstrap

Fix identation issue introduced by 966f3c13
"Fix format attribute for printf".

gcc/c-family/ChangeLog:

* c-format.cc: Fix identation.

improve error when /usr/include isn't found [PR90835]

This is a pretty simple patch that ought to help Darwin users understand
better why their build is failing when they forget to pass the
--with-sysroot= flag to configure.

gcc/ChangeLog:

PR target/90835
* Makefile.in: improve error message when /usr/include is
missing

Fix format attribute for printf

Since a long time (GCC 4.4?) GCC does support annotating functions
with either the format attribute "gnu_printf" or "ms_printf" to
distinguish between different format string interpretations.

However, it seems like the attribute is ignored for the "printf"
symbol; regardless what the function declaration says, GCC treats
it as "ms_printf". This has become an issue now that mingw-w64
supports using the UCRT instead of msvcrt.dll, and in this case
the stdio functions are declared with the gnu_printf attribute,
and inttypes.h uses the same format specifiers as in GNU mode.

A reproducible example of the problem:

$ cat format.c
__attribute__((__format__ (gnu_printf, 1, 2))) int printf (const char *__format, ...);
__attribute__((__format__ (gnu_printf, 1, 2))) int othername (const char *__format, ...);

void function(void) {
    long long unsigned x = 42;
    othername("%llu\n", x);
    printf("%llu\n", x);
}
$ x86_64-w64-mingw32-gcc -c -Wformat format.c
format.c: In function 'function':
format.c:7:15: warning: unknown conversion type character 'l' in format [-Wformat=]
    7 |     printf("%llu\n", x);
      |               ^
format.c:7:12: warning: too many arguments for format [-Wformat-extra-args]
    7 |     printf("%llu\n", x);
      |            ^~~~~~~~

Note how both functions, printf and othername, are declare with
identical gnu_printf format attributes - GCC does take this into
account for "othername" and doesn't produce a warning, but GCC
seems to disregard the attribute in the printf declaration and
behave as if it was declared as ms_printf.

If the printf function declaration is changed into a static inline
function, the actual attribute used is honored though.

gcc/c-family/ChangeLog:

PR c/95130
* c-format.cc: skip default format for printf symbol if
explicitly declared by prototype.

Signed-off-by: Tomas Kalibera <tomas.kalibera@gmail.com>
Signed-off-by: Jonathan Yong <10walls@gmail.com>

Daily bump.

omp-expand.cc: Fix wrong code with non-rectangular loop nest [PR111017]

Before commit r12-5295-g47de0b56ee455e, all gimple_build_cond in
expand_omp_for_* were inserted with
gsi_insert_before (gsi_p, cond_stmt, GSI_SAME_STMT);
except the one dealing with the multiplicative factor that was
gsi_insert_after (gsi, cond_stmt, GSI_CONTINUE_LINKING);

That commit for PR103208 fixed the issue of some missing regimplify of
operands of GIMPLE_CONDs by moving the condition handling to the new function
expand_omp_build_cond. While that function has an 'bool after = false'
argument to switch between the two variants.

However, all callers ommited this argument. This commit reinstates the
prior behavior by passing 'true' for the factor != 0 condition, fixing
the included testcase.

PR middle-end/111017
gcc/
* omp-expand.cc (expand_omp_for_init_vars): Pass after=true
to expand_omp_build_cond for 'factor != 0' condition, resulting
in pre-r12-5295-g47de0b56ee455e code for the gimple insert.

libgomp/
* testsuite/libgomp.c-c++-common/non-rect-loop-1.c: New test.

Loongarch: Fix plugin header missing install.

gcc/ChangeLog:

* config/loongarch/t-loongarch: Add loongarch-driver.h into
TM_H. Add loongarch-def.h and loongarch-tune.h into
OPTIONS_H_EXTRA.

Co-authored-by: Lulu Cheng <chenglulu@loongson.cn>

Daily bump.

libstdc++: Revert pre-C++23 support for 16-bit float types [PR111060]

In r14-3304-g1a566fddea212a and r14-3305-g6cf214b4fc97f5 I tried to
enable std::format for 16-bit float types before C++23. This causes
errors for targets where the types are defined but can't actually be
used, e.g. i686 without sse2.

Make the std::numeric_limits and std::formatter specializations for
_Float16 and __bfloat16_t depend on the __STDCPP_FLOAT16_T__ and
__STDCPP_BFLOAT16_T__ macros again, so they're only defined for C++23
when the type is fully supported. This is OK because the main point of
my earlier commits was to add better support for _Float32 and _Float64.
It seems fine for the new 16-bit types to only be supported for C++23,
as they were never present before GCC 13 anyway.

libstdc++-v3/ChangeLog:

PR target/111060
* include/std/format (formatter): Only define specializations
for 16-bit floating-point types for C++23.
* include/std/limits (numeric_limits): Likewise.

testsuite: Improve test in dg-require-python-h

If GCC is tested with a sysroot which doesn't contain a Python
installation (e.g., with a command such as
"make check-gcc-c FLAGS_UNDER_TEST="--sysroot=/some/path"), but there's
a python3-config in $PATH, then the testsuite will pick up the host's
Python.h which can't actually be used:

Executing on host: python3-config --includes    (timeout = 300)
spawn -ignore SIGHUP python3-config --includes
-I/usr/include/python3.10 -I/usr/include/python3.10
Executing on host: /some/sysroot/bin/aarch64-unknown-linux-gnu-gcc --sysroot=/some/sysroot/libc -Wl,-dynamic-linker=/some/sysroot/libc/lib/ld-linux-aarch64.so.1 -Wl,-rpath=/some/sysroot/libc/lib  /some/src/gcc.git/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c    -fdiagnostics-plain-output  -fplugin=./analyzer_cpython_plugin.so -fanalyzer -I/usr/include/python3.10 -I/usr/include/python3.10 -S -o cpython-plugin-test-2.s    (timeout = 600)
spawn -ignore SIGHUP /some/sysroot/bin/aarch64-unknown-linux-gnu-gcc --sysroot=/some/sysroot/libc -Wl,-dynamic-linker=/some/sysroot/libc/lib/ld-linux-aarch64.so.1 -Wl,-rpath=/some/sysroot/libc/lib /some/src/gcc.git/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c -fdiagnostics-plain-output -fplugin=./analyzer_cpython_plugin.so -fanalyzer -I/usr/include/python3.10 -I/usr/include/python3.10 -S -o cpython-plugin-test-2.s
In file included from /usr/include/python3.10/Python.h:8,
                 from /some/src/gcc.git/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c:8:
/usr/include/python3.10/pyconfig.h:9:12: fatal error: aarch64-linux-gnu/python3.10/pyconfig.h: No such file or directory
compilation terminated.
compiler exited with status 1

This problem causes these testsuite failures:

FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 17)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 18)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 21)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 31)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 32)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 35)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 45)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 55)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 63)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 66)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 68)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 69)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for excess errors)
Excess errors:
/usr/include/python3.10/pyconfig.h:9:12: fatal error: aarch64-linux-gnu/python3.10/pyconfig.h: No such file or directory
compilation terminated.

So try to compile a test file so that the testcase can be marked as
unsupported instead.

gcc/testsuite/ChangeLog:
* lib/target-supports.exp (dg-require-python-h): Test
whether Python.h can really be used.

i386: Use PUNPCKL?? to implement vector extend and zero_extend for TARGET_SSE2.

Implement vector extend and zero_extend functionality for TARGET_SSE2 using
PUNPCKL?? family of instructions. The code for e.g. zero-extend from V2SI to
V2DImode improves from:

        movd    %xmm0, %edx
        pshufd  $85, %xmm0, %xmm0
        movd    %xmm0, %eax
        movq    %rdx, (%rdi)
        movq    %rax, 8(%rdi)

to:
        pxor    %xmm1, %xmm1
        punpckldq       %xmm1, %xmm0
        movaps  %xmm0, (%rdi)

And the code for sign-extend from V2SI to V2DImode from:

        movd    %xmm0, %edx
        pshufd  $85, %xmm0, %xmm0
        movd    %xmm0, %eax
        movslq  %edx, %rdx
        cltq
        movq    %rdx, (%rdi)
        movq    %rax, 8(%rdi)

to:
        pxor    %xmm1, %xmm1
        pcmpgtd %xmm0, %xmm1
        punpckldq       %xmm1, %xmm0
        movaps  %xmm0, (%rdi)

PR target/111023

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_split_mmx_punpck):
Also handle V2QImode.
(ix86_expand_sse_extend): New function.
* config/i386/i386-protos.h (ix86_expand_sse_extend): New prototype.
* config/i386/mmx.md (<any_extend:insn>v4qiv4hi2): Enable for
TARGET_SSE2.  Expand through ix86_expand_sse_extend for !TARGET_SSE4_1.
(<any_extend:insn>v2hiv2si2): Ditto.
(<any_extend:insn>v2qiv2hi2): Ditto.
* config/i386/sse.md (<any_extend:insn>v8qiv8hi2): Ditto.
(<any_extend:insn>v4hiv4si2): Ditto.
(<any_extend:insn>v2siv2di2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr111023-2.c: New test.
* gcc.target/i386/pr111023-4b.c: New test.
* gcc.target/i386/pr111023-8b.c: New test.
* gcc.target/i386/pr111023.c: New test.

[irange] Return FALSE if updated bitmask is unchanged [PR110753]

The mask/value pair we track in the irange is a bit fickle in that it
can sometimes contradict the bitmask inherent in the range.  This can
happen when a series of calculations yield a combination such as:

[3, 1000] MASK 0xfffffffe VALUE 0x0

The mask/value above implies that the lowest bit is a known 0, which
would exclude the 3 in the range.  At one time we tried keeping mask
and ranges 100% consistent, but the performance penalty was too high
(5% in VRP).  Also, it's unclear whether the intersection of two
incompatible known bits should make the whole range undefined, or
just the contradicting bits.  This is all documented in
irange::get_bitmask().  We could revisit both of these assumptions
in the future.

In this testcase IPA ends up with a range where the lower 2 bits are
expected to be 0, but the range is [1,1].

[irange] long int [1, 1] MASK 0xfffffffffffffffc VALUE 0x0

This causes irange::union_bitmask() to think an update occurred, when
no semantic change happened, thus triggering an assert in IPA-cp.  We
could get rid of the assert, but it's cleaner to make
irange::{union,intersect}_bitmask always tell the truth.  Beside, the
ranger's cache also depends on union being truthful.

PR ipa/110753

gcc/ChangeLog:

* value-range.cc (irange::union_bitmask): Return FALSE if updated
bitmask is semantically equivalent to the original mask.
(irange::intersect_bitmask): Same.
(irange::get_bitmask): Add comment.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr110753.c: New test.

tree-optimization/111019 - invariant motion and aliasing

The following fixes a bad choice in representing things to the alias
oracle by LIM which while correct in pieces is inconsistent with itself.
When canonicalizing a ref to a bare deref instead of leaving the base
object and the extracted offset the same and just substituting an
alternate ref the following replaces the base and the offset as well,
avoiding the confusion that otherwise will arise in
aliasing_matching_component_refs_p.

PR tree-optimization/111019
* tree-ssa-loop-im.cc (gather_mem_refs_stmt): When canonicalizing
also scrap base and offset in case the ref is indirect.

* g++.dg/torture/pr111019.C: New testcase.

bpf: bump maximum frame size limit to 32767 bytes

This commit bumps the maximum stack frame size allowed for BPF
functions to the maximum possible value.

Tested in x86_64-linux-gnu host and target bpf-unknown-none.

gcc/ChangeLog

* config/bpf/bpf.opt (mframe-limit): Set default to 32767.

gcc/testsuite/ChangeLog

* gcc.target/bpf/frame-limit-1.c: New test.
* gcc.target/bpf/frame-limit-2.c: Likewise.

libstdc++: Replace non-type-dependent uses of wchar_t in <format> and <chrono>

This is one more piece of the rework to make wchar_t support in
std::format depend on _GLIBCXX_USE_WCHAR_T.

In <format> the __to_wstring_numeric function is called with arguments
that aren't type-dependent, so a declaration needs to be available, or
the calls need to be guarded by _GLIBCXX_USE_WCHAR_T.

In <chrono> there is a similarly non-type-dependent call to std::format
with a wchar_t format string, which is ill-formed when the wchar_t
overloads of std::format are not declared. Use _GLIBCXX_WIDEN to make it
type-dependent.

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (operator<<): Make uses of wide
strings with streams and std::format type-dependent on _CharT.
* include/std/format [!_GLIBCXX_USE_WCHAR_T] Do not use
__to_wstring_numeric.

Makefile.in: Make TM_P_H depend on $(TREE_H) [PR111021]

As PR111021 shows, the below ${port}-protos.h include tree.h
for code_helper and tree_code:

  arm/arm-protos.h:#include "tree.h"
  cris/cris-protos.h:#include "tree.h" (H-P removed this in r14-3218)
  microblaze/microblaze-protos.h:#include "tree.h"
  rl78/rl78-protos.h:#include "tree.h"
  stormy16/stormy16-protos.h:#include "tree.h"

, when compiling build/gencondmd.cc, the include hierarchy
makes it depend on tm_p.h -> ${port}-protos.h -> tree.h,
which further includes (depends on) some files that are
generated during the building, such as: all-tree.def,
tree-check.h and so on.  The previous commit r14-3215
should already force build/gencondmd.cc to depend on
${TREE_H}, so the reported build failure should be gone.

But for a long term maintenance, especially one day some
build/xxx.cc requires tm_p.h but not recog.h, the ${TREE_H}
dependence could be missed and a build failure will show
up.  So this patch is to make TM_P_H depend on $(TREE_H),
any new build/xxx.cc depending on tm_p.h will be able to
consider ${TREE_H}.

It's tested with cross-builds for the affected ports with
steps:
1) dropped the fix r14-3215;
2) reproduced the build failure with serial build;
3) applied this patch, serial built and verified all passed;
4) added back r14-3215, serial built and verified all passed;

PR bootstrap/111021

gcc/ChangeLog:

* Makefile.in (TM_P_H): Add $(TREE_H) as dependence.

vect: Factor out the handling on scatter store having gs_info.decl

Similar to the existing function vect_build_gather_load_calls,
this patch is to factor out the handling on scatter store
having gs_info.decl to vect_build_scatter_store_calls which
is a new function. It also does some minor refactoring like
moving some variables' declarations close to their uses and
restrict the scope for some of them etc.

It's a pre-patch for upcoming vectorizable_store re-structuring
for costing.

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_build_scatter_store_calls): New, factor
out from ...
(vectorizable_store): ... here.

libstdc++: Fix incomplete rework of wchar_t support in std::format

r14-3300-g023a62b77f999b left make_wformat_args and some uses of
std::wformat_context unguarded by _GLIBCXX_USE_WCHAR_T.

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (operator<<): Use __format_context.
* include/std/format (__format::__format_context): New alias
template.
[!_GLIBCXX_USE_WCHAR_T] (wformat_args, make_wformat_arg):
Disable.

tree-optimization/111048 - avoid flawed logic in fold_vec_perm

The following avoids running into somehow flawed logic in fold_vec_perm
for non-VLA vectors.

PR tree-optimization/111048
* fold-const.cc (fold_vec_perm_cst): Check for non-VLA
vectors first.

* gcc.dg/torture/pr111048.c: New testcase.

i386: Add AVX2 pragma wrapper for AVX512DQVL intrins

PR target/111051

gcc/ChangeLog:

* config/i386/avx512vldqintrin.h: Push AVX2 when AVX2 is
disabled.

gcc/testsuite/ChangeLog:

PR target/111051
* gcc.target/i386/pr111051-1.c: New test.

vect: Move VMAT_GATHER_SCATTER handlings from final loop nest

Following Richi's suggestion [1], this patch is to move the
handlings on VMAT_GATHER_SCATTER in the final loop nest
of function vectorizable_load to its own loop. Basically
it duplicates the final loop nest, clean up some useless
set up code for the case of VMAT_GATHER_SCATTER, remove some
unreachable code. Also remove the corresponding handlings
in the final loop nest.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623329.html

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_load): Move the handlings on
VMAT_GATHER_SCATTER in the final loop nest to its own loop,
and update the final nest accordingly.

RISC-V: Fix -march error of zhinxmin testcases

This little patch fixs the -march error of a zhinxmin testcase I added earlier
and an old zhinxmin testcase, since these testcases are for zhinxmin extension
and not zfhmin extension.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/_Float16-zhinxmin-3.c: Adjust.
* gcc.target/riscv/_Float16-zhinxmin-4.c: Ditto.

Document cond_neg, cond_one_cmpl, cond_len_neg and cond_len_one_cmpl standard patterns

When I added `cond_one_cmpl` (and the corresponding IFN) I had noticed cond_neg
standard named pattern was not documented and this adds the documentation for
all 4 named patterns now.

OK? Tested by building the manual.

gcc/ChangeLog:

* doc/md.texi (Standard patterns): Document cond_neg, cond_one_cmpl,
cond_len_neg and cond_len_one_cmpl.

RISC-V: Add the missed half floating-point mode patterns of local_pic_load/store when only use zfhmin or zhinxmin

Hi,

There is a new failed RISC-V testcase(testsuite/gcc.target/riscv/rvv/autovec/vls/const-4.c)
on the current trunk branch when use medany as default cmodel.
The reason is the load of half floating-point imm is convert from RTL 1 to RTL
2 as the cmodel be changed from medlow to medany. This change let insn 7 be
combineed with @pred_broadcast patterns (insn 8) at combine pass. However,
insn 6 and insn 7 are combined for SF and DF mode, but not for HF mode, and
the fail combined leads to insn 7 and insn 8 be combined. The reason of the
fail combined is the local_pic_loadhf pattern doesn't exist when only enable
zfhmin(implied by zvfh).

Therefore, when only zfhmin but not zfh is enabled, the define_insn of
*local_pic_load<ANYF:mode> must also be able to produce the pattern for
*load_pic_loadhf pattern, since the zfhmin extension also includes a
half floating-point load/store instructions. So, I added an ANFLSF Iterator
and applied it to local_pic_load/store define_insns. I have checked other ANYF
usage scenarios and feel that this is the only place that needs to be corrected.
I may have missed something, please correct. Thanks.

RTL 1:

(insn 6 3 7 2 (set (reg:DI 137)
        (high:DI (symbol_ref/u:DI ("*.LC0") [flags 0x82]))) "/work/home/lding/open-source/riscv-gnu-toolchain-push/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/const-4.c":7:1 discrim 3 179 {*movdi_64bit}
     (nil))
(insn 7 6 8 2 (set (reg:HF 136)
        (mem/u/c:HF (lo_sum:DI (reg:DI 137)
                (symbol_ref/u:DI ("*.LC0") [flags 0x82])) [0  S2 A16])) "/work/home/lding/open-source/riscv-gnu-toolchain-push/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/const-4.c":7:1 discrim 3 126 {*movhf_hardfloat}
     (expr_list:REG_EQUAL (const_double:HF 8.8828125e+0 [0x0.8e2p+4])
        (nil)))

RTL 2:

(insn 6 3 7 2 (set (reg/f:DI 137)
        (symbol_ref/u:DI ("*.LC0") [flags 0x82])) "/work/home/lding/open-source/riscv-gnu-toolchain-push/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/const-4.c":7:1 discrim 3 179 {*movdi_64bit}
     (nil))
(insn 7 6 8 2 (set (reg:HF 136)
        (mem/u/c:HF (reg/f:DI 137) [0  S2 A16])) "/work/home/lding/open-source/riscv-gnu-toolchain-push/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/const-4.c":7:1 discrim 3 126 {*movhf_hardfloat}
     (expr_list:REG_EQUAL (const_double:HF 8.8828125e+0 [0x0.8e2p+4])
        (nil)))
(insn 8 7 9 2 (set (reg:V2HF 135)
        (if_then_else:V2HF (unspec:V2BI [
                    (const_vector:V2BI [
                            (const_int 1 [0x1]) repeated x2
                        ])
                    (const_int 2 [0x2]) repeated x3
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (vec_duplicate:V2HF (reg:HF 136))
            (unspec:V2HF [
                    (reg:SI 0 zero)
                ] UNSPEC_VUNDEF))) "/work/home/lding/open-source/riscv-gnu-toolchain-push/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/const-4.c":6:1 discrim 3 1389 {*pred_broadcastv2hf}
     (nil))

Best,
Lehua

gcc/ChangeLog:

* config/riscv/iterators.md (TARGET_HARD_FLOAT || TARGET_ZFINX): New.
* config/riscv/pic.md (*local_pic_load<ANYF:mode>): Change ANYF.
(*local_pic_load<ANYLSF:mode>): To ANYLSF.
(*local_pic_load_32d<ANYF:mode>): Ditto.
(*local_pic_load_32d<ANYLSF:mode>): Ditto.
(*local_pic_store<ANYF:mode>): Ditto.
(*local_pic_store<ANYLSF:mode>): Ditto.
(*local_pic_store_32d<ANYF:mode>): Ditto.
(*local_pic_store_32d<ANYLSF:mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/_Float16-zfhmin-4.c: New test.
* gcc.target/riscv/_Float16-zhinxmin-4.c: New test.

RISC-V: Revert the convert from vmv.s.x to vmv.v.i

Hi,

This patch revert the convert from vmv.s.x to vmv.v.i and add new pattern
optimize the special case when the scalar operand is zero.

Currently, the broadcast pattern where the scalar operand is a imm
will be converted to vmv.v.i from vmv.s.x and the mask operand will be
converted from 00..01 to 11..11. There are some advantages and
disadvantages before and after the conversion after discussing
with Juzhe offline and we chose not to do this transform.

Before:

  Advantages: The vsetvli info required by vmv.s.x has better compatibility since
  vmv.s.x only required SEW and VLEN be zero or one. That mean there
  is more opportunities to combine with other vsetlv infos in vsetvl pass.

  Disadvantages: For non-zero scalar imm, one more `li rd, imm` instruction
  will be needed.

After:

  Advantages: No need `li rd, imm` instruction since vmv.v.i support imm operand.

  Disadvantages: Like before's advantages. Worse compatibility leads to more
  vsetvl instrunctions need.

Consider the bellow C code and asm after autovec.
there is an extra insn (vsetivli zero, 1, e32, m1, ta, ma)
after converted vmv.s.x to vmv.v.i.

```
int foo1(int* restrict a, int* restrict b, int *restrict c, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
      sum += a[i] * b[i];

    return sum;
}
```

asm (Before):

```
foo1:
        ble     a3,zero,.L7
        vsetvli a2,zero,e32,m1,ta,ma
        vmv.v.i v1,0
.L6:
        vsetvli a5,a3,e32,m1,tu,ma
        slli    a4,a5,2
        sub     a3,a3,a5
        vle32.v v2,0(a0)
        vle32.v v3,0(a1)
        add     a0,a0,a4
        add     a1,a1,a4
        vmacc.vv        v1,v3,v2
        bne     a3,zero,.L6
        vsetvli a2,zero,e32,m1,ta,ma
        vmv.s.x v2,zero
        vredsum.vs      v1,v1,v2
        vmv.x.s a0,v1
        ret
.L7:
        li      a0,0
        ret
```

asm (After):

```
foo1:
        ble     a3,zero,.L4
        vsetvli a2,zero,e32,m1,ta,ma
        vmv.v.i v1,0
.L3:
        vsetvli a5,a3,e32,m1,tu,ma
        slli    a4,a5,2
        sub     a3,a3,a5
        vle32.v v2,0(a0)
        vle32.v v3,0(a1)
        add     a0,a0,a4
        add     a1,a1,a4
        vmacc.vv        v1,v3,v2
        bne     a3,zero,.L3
        vsetivli        zero,1,e32,m1,ta,ma
        vmv.v.i v2,0
        vsetvli a2,zero,e32,m1,ta,ma
        vredsum.vs      v1,v1,v2
        vmv.x.s a0,v1
        ret
.L4:
        li      a0,0
        ret
```

Best,
Lehua

Co-Authored-By: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:

* config/riscv/predicates.md (vector_const_0_operand): New.
* config/riscv/vector.md (*pred_broadcast<mode>_zero): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/scalar_move-5.c: Update.
* gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.

RISC-V: Forbidden fuse vlmax vsetvl to DEMAND_NONZERO_AVL vsetvl

Hi,

This little patch fix the fail testcase
(gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c)
after apply this patch
(https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627121.html).
The specific reason is that the vsetvl pass has bug and this patch
forbidden the fuse of this case. This patch needs to be committed
before that patch to work.

Best,
Lehua

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::backward_demand_fusion):
Forbidden.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c:
Address failure due to uninitialized vtype register.

Daily bump.

Revert "libstdc++: Reuse double overload of __convert_to_v if possible"

This reverts commit aad83d61d2e92b168688f7b6bd00b8604d11fc9f.

libstdc++-v3/ChangeLog:

* config/locale/generic/c_locale.cc:

libstdc++: Replace global std::string objects in tzdb.cc

When the library is built with --disable-libstdcxx-dual-abi the only
type of std::string supported is the COW string, and the two global
std::string objects in tzdb.cc have to allocate memory. I added them
thinking they would fit in the SSO string buffer, but that's not the
case when the library only uses COW strings.

Replace them with string_view objects to avoid any allocations.

libstdc++-v3/ChangeLog:

* src/c++20/tzdb.cc (tzdata_file, leaps_file): Change type to
std::string_view.

libstdc++: Reuse double overload of __convert_to_v if possible

For targets where double and long double have the same representation we
can reuse the same __convert_to_v code for both types. This will
slightly reduce the size of the compiled code in the library.

libstdc++-v3/ChangeLog:

* config/locale/generic/c_locale.cc (__convert_to_v): Reuse
double overload for long double if possible.

libstdc++: Micro-optimize construction of named std::locale

This shaves about 100ns off the std::locale constructor for named
locales (which is only about 1% of the total time).

Using !*s instead of !strcmp(s, "") doesn't make any difference as GCC
optimizes that already even at -O1. !strcmp(s, "C") is optimized at -O2
so replacing that with s[0] == 'C' && s[1] == '\0' only matters for the
--enable-libstdcxx-debug builds. But !strcmp(s, "POSIX") always makes a
call to strcmp at any optimization level. We make that strcmp call,
maybe several times, for any locale name except for "C" (which will be
matched before we get to the check for "POSIX").

For most targets, locale names begin with a lowercase letter and the
only one that begins with 'P' is "POSIX". Replacing !strcmp(s, "POSIX")
with s[0] == 'P' && !strcmp(s+1, "OSIX") means that we avoid calling
strcmp unless the string really does match "POSIX".

Maybe more importantly, I find is_C_locale(s) easier to read than
strcmp(s, "C") == 0 || strcmp(s, "POSIX") == 0, and !is_C_locale(s)
easier to read than strcmp(s, "C") != 0 && strcmp(s, "POSIX") != 0.

libstdc++-v3/ChangeLog:

* src/c++98/localename.cc (is_C_locale): New function.
(locale::locale(const char*)): Use is_C_locale.

libstdc++: Optimize std::string::assign(Iter, Iter) [PR110945]

Calling string::assign(Iter, Iter) with "foreign" iterators (not the
string's own iterator or pointer types) currently constructs a temporary
string and then calls replace to copy the characters from it. That means
we copy from the iterators twice, and if the replace operation has to
grow the string then we also allocate twice.

By using *this = basic_string(first, last, get_allocator()) we only
perform a single allocation+copy and then do a cheap move assignment
instead of a second copy (and possible allocation). But that alternative
has to be done conditionally, so that we don't pessimize the native
iterator case (the string's own iterator and pointer types) which
currently select efficient overloads of replace which will not allocate
at all if the string already has sufficient capacity. For C++20 we can
extend that efficient case to work for any contiguous iterator with the
right value type, not just for the string's native iterators.

So the change is to inline the code that decides whether to work in
place or to allocate+copy (instead of deciding that via overload
resolution for replace), and for the allocate+copy case do a move
assignment instead of another call to replace.

For C++98 there is no change, as we can't do an efficient move
assignment anyway, so keep the current code.

We can also simplify assign(initializer_list<CharT>) because the backing
array for an initializer_list is always disjunct with *this, so most of
the code in _M_replace is not needed.

libstdc++-v3/ChangeLog:

PR libstdc++/110945
* include/bits/basic_string.h (basic_string::assign(Iter, Iter)):
Dispatch to _M_replace or move assignment from a temporary,
based on the iterator type.

libstdc++: Add std::formatter specializations for extended float types

This makes it possible to format _Float32, _Float64 etc. in C++20 mode.
Previously it was only possible to format them in C++23 when the
<stdfloat> typedefs and the std::to_chars overloads were defined.

Instead of relying on std::to_chars for those types, we can just reuse
the formatters for float, double and long double. This also avoids
template bloat by reusing the same specializations instead of
instantiating __formatter_fp for every different type.

libstdc++-v3/ChangeLog:

* include/std/format (formatter): Add partial specializations
for extended floating-point types.
* testsuite/std/format/functions/format.cc: Move test_float128()
to ...
* testsuite/std/format/formatter/ext_float.cc: New test.

libstdc++: Define std::numeric_limits<_FloatNN> before C++23

The extended floating-point types such as _Float32 are supported by GCC
prior to C++23, you just can't use the standard-conforming names from
<stdfloat> to refer to them. This change defines the specializations of
std::numeric_limits for those types for older dialects, not only for
C++23.

libstdc++-v3/ChangeLog:

* include/bits/c++config (__gnu_cxx::__bfloat16_t): Define
whenever __BFLT16_DIG__ is defined, not only for C++23.
* include/std/limits (numeric_limits<bfloat16_t>): Likewise.
(numeric_limits<_Float16>, numeric_limits<_Float32>)
(numeric_limits<_Float64>): Likewise for other extended
floating-point types.

libstdc++: Fix -Wunused-parameter in <experimental/internet>

libstdc++-v3/ChangeLog:

* include/experimental/internet (address_v4::to_string): Remove
unused parameter name.

libstdc++: Make __cmp_cat::__unseq constructor consteval

This constructor should only ever be used with a literal 0 as the
argument, so we can make it consteval. This has the nice advantage that
it is expanded immediately in the front end, and so GDB will never step
into the __cmp_cat::__unseq::__unseq(__unseq*) constructor that is
uninteresting and probably confusing to users.

libstdc++-v3/ChangeLog:

* libsupc++/compare (__cmp_cat::__unseq): Make ctor consteval.
* testsuite/18_support/comparisons/categories/zero_neg.cc: Prune
excess errors caused by invalid consteval calls.

libstdc++: Simplify chrono::__units_suffix using std::format

For std::chrono formatting we can simplify __units_suffix by using
std::format_to to generate the "[n/m]s" suffix with the correct
character type and write directly to the output iterator, so it doesn't
need to be widened using ctype. We can't remove the use of ctype::widen
for formatting a time zone abbreviation as a wide string, because that
can contain arbitrary characters that can't be widened by
__to_wstring_numeric.

This also fixes a bug in the chrono formatter for %Z which created a
dangling wstring_view.

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__units_suffix_misc): Remove.
(__units_suffix): Return a known suffix as string view, do not
write unknown suffixes to a buffer.
(__fmt_units_suffix): New function that formats the suffix using
std::format_to.
(operator<<, __chrono_formatter::_M_q): Use __fmt_units_suffix.
(__chrono_formatter::_M_Z): Correct lifetime of wstring.

libstdc++: Rework std::format support for wchar_t

This changes how std::format creates wide strings, by replacing uses of
std::ctype<wchar_t>::widen with the recently-added __to_wstring_numeric
helper function. This removes the dependency on the locale, which should
only be used for locale-specific formats such as {:Ld}.

Also disable all the wide string formatting support if the
_GLIBCXX_USE_WCHAR_T macro is not defined. This is consistent with other
wchar_t support being disabled if the library is built without that
macro defined.

libstdc++-v3/ChangeLog:

* include/std/format [_GLIBCXX_USE_WCHAR_T]: Guard all wide
string formatters with this macro.
(__formatter_int::_M_format_int, __formatter_fp::format)
(formatter<const void*, C>::format): Use __to_wstring_numeric
instead of std::ctype::widen.
(__formatter_fp::_M_localize): Use hardcoded wchar_t values
instead of std::ctype::widen.
* testsuite/std/format/functions/format.cc: Add more checks for
wstring formatting of arithmetic types.

libstdc++: Implement std::to_string in terms of std::format (P2587R3)

This change for C++26 affects std::to_string for floating-point
arguments, so that they should be formatted using std::format("{}", v)
instead of using sprintf. The modified specification in the standard
also affects integral arguments, but there's no observable difference
for them, and we already use std::to_chars for them anyway.

To avoid <string> depending on all of <format>, this change actually
just uses std::to_chars directly instead of using std::format. This is
equivalent, because the format spec "{}" doesn't use any of the other
features of std::format.

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h (to_string(floating-point-type)):
Implement using std::to_chars for C++26.
* include/bits/version.def (__cpp_lib_to_string): Define.
* include/bits/version.h: Regenerate.
* testsuite/21_strings/basic_string/numeric_conversions/char/dr1261.cc:
Adjust expected result in C++26 mode.
* testsuite/21_strings/basic_string/numeric_conversions/char/to_string.cc:
Likewise.
* testsuite/21_strings/basic_string/numeric_conversions/wchar_t/dr1261.cc:
Likewise.
* testsuite/21_strings/basic_string/numeric_conversions/wchar_t/to_wstring.cc:
Likewise.
* testsuite/21_strings/basic_string/numeric_conversions/char/to_string_float.cc:
New test.
* testsuite/21_strings/basic_string/numeric_conversions/wchar_t/to_wstring_float.cc:
New test.
* testsuite/21_strings/basic_string/numeric_conversions/version.cc:
New test.

libstdc++: Optimize std::to_string using std::string::resize_and_overwrite

This uses std::string::__resize_and_overwrite to avoid initializing the
string buffer with characters that are immediately overwritten. This
results in about 6% better performance for the std_to_string case in
int-benchmark.cc from https://github.com/fmtlib/format-benchmark

This requires a change to a testcase. The previous implementation
guaranteed that the string returned from std::to_string(integral-type)
would have no excess capacity, because it was constructed with the
correct length. The new implementation constructs an empty string and
then resizes it with resize_and_overwrite, which over-allocates. This
means that the "no-excess capacity" guarantee no longer holds.

We can also greatly improve the performance of std::to_wstring by using
std::to_string and then widening it with a new helper function, instead
of using std::swprintf to do the formatting.

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h (to_string(integral-type)): Use
resize_and_overwrite when available.
(__to_wstring_numeric): New helper functions.
(to_wstring): Use std::to_string then __to_wstring_numeric.
* testsuite/21_strings/basic_string/numeric_conversions/char/to_string_int.cc:
Remove check for no excess capacity.

libstdc++: Define std::string::resize_and_overwrite for C++11 and COW string

There are several places in the library where we can improve performance
using resize_and_overwrite so it's inconvenient only being able to use
it in C++23 mode, and only for cxx11 strings. This adds it for COW
strings, and also adds __resize_and_overwrite as an extension for C++11
mode.

The new __resize_and_overwrite is available for C++11 and later, so
within the library we can use that consistently even in C++23. In order
to avoid making a copy (which might not be possible for non-copyable,
non-movable types) the callable is passed to resize_and_overwrite as an
lvalue reference. Unlike wrapping it in std::ref(op) this ensures that
invoking it as std::move(op)(n, p) will use the correct value category.
It also avoids any overhead that would be added by wrapping it in a
lambda like [&op](auto p, auto n) { return std::move(op)(p, n); }.

Adjust std::format to use the new __resize_and_overwrite, which we can
assume exists because we only use std::basic_string<char> and
std::basic_string<wchar_t>, so no program-defined specializations.

The uses in <experimental/internet> cannot be replaced, because those
are type-dependent on an Allocator template parameter, which could mean
they use program-defined specializations of std::basic_string that don't
have the __resize_and_overwrite extension.

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h (__resize_and_overwrite): New
function.
* include/bits/basic_string.tcc (__resize_and_overwrite): New
function.
(resize_and_overwrite): Simplify by using reserve instead of
growing the string manually. Adjust for C++11 compatibility.
* include/bits/cow_string.h (resize_and_overwrite): New
function.
(__resize_and_overwrite): New function.
* include/bits/version.def (__cpp_lib_string_resize_and_overwrite):
Do not depend on cxx11abi.
* include/bits/version.h: Regenerate.
* include/std/format (__formatter_fp::_S_resize_and_overwrite):
Remove.
(__formatter_fp::format, __formatter_fp::_M_localize): Use
__resize_and_overwrite instead of _S_resize_and_overwrite.
* testsuite/21_strings/basic_string/capacity/char/resize_and_overwrite.cc:
Adjust for C++11 compatibility when included by ...
* testsuite/21_strings/basic_string/capacity/char/resize_and_overwrite_ext.cc:
New test.

Fix range-ops operator_addr.

Lack of symbolic information prevents op1_range from beig able to draw
the same conclusions as fold_range can.

PR tree-optimization/111009
gcc/
* range-op.cc (operator_addr_expr::op1_range): Be more restrictive.

gcc/testsuite/
* gcc.dg/pr111009.c: New.

RISCV: Add rotate immediate regression test

This adds new regression tests to ensure half-register rotations are
correctly optimized into rori instructions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb-rol-ror-08.c: New test.
* gcc.target/riscv/zbb-rol-ror-09.c: New test.

Co-authored-by: Charlie Jenkins <charlie@rivosinc.com>
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

libstdc++: Implement P2770R0 changes to join_view / join_with_view

This C++23 paper fixes an issue in these views when adapting a certain
kind of non-forward range, and we treat it as a DR against C++20.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* include/bits/regex.h (regex_iterator::iterator_concept):
Define for C++20 as per P2770R0.
(regex_token_iterator::iterator_concept): Likewise.
* include/std/ranges (__detail::__as_lvalue): Define.
(join_view::_Iterator): Befriend join_view.
(join_view::_Iterator::_M_satisfy): Use _M_get_outer
instead of _M_outer.
(join_view::_Iterator::_M_get_outer): Define.
(join_view::_Iterator::_Iterator): Split constructor taking
_Parent argument into two as per P2770R0.  Remove constraint on
default constructor.
(join_view::_Iterator::_M_outer): Make this data member present
only when the underlying range is forward.
(join_view::_Iterator::operator++): Use _M_get_outer instead of
_M_outer.
(join_view::_Iterator::operator--): Use __as_lvalue helper.
(join_view::_Iterator::operator==): Adjust constraints as per
P2770R0.
(join_view::_Sentinel::__equal): Use _M_get_outer instead of
_M_outer.
(join_view::_M_outer): New data member when the underlying range
is non-forward.
(join_view::begin): Adjust definition as per P2770R0.
(join_view::end): Likewise.
(join_with_view::_M_outer_it): New data member when the
underlying range is non-forward.
(join_with_view::begin): Adjust definition as per P2770R0.
(join_with_view::end): Likewise.
(join_with_view::_Iterator::_M_outer_it): Make this data member
present only when the underlying range is forward.
(join_with_view::_Iterator::_M_get_outer): Define.
(join_with_view::_Iterator::_Iterator): Split constructor
taking _Parent argument into two as per P2770R0.  Remove
constraint on default constructor.
(join_with_view::_Iterator::_M_update_inner): Adjust definition
as per P2770R0.
(join_with_view::_Iterator::_M_get_inner): Likewise.
(join_with_view::_Iterator::_M_satisfy): Adjust calls to
_M_get_inner.  Use _M_get_outer instead of _M_outer_it.
(join_with_view::_Iterator::operator==): Adjust constraints
as per P2770R0.
(join_with_view::_Sentinel::operator==): Use _M_get_outer
instead of _M_outer_it.
* testsuite/std/ranges/adaptors/p2770r0.cc: New test.

libstdc++: Convert _RangeAdaptorClosure into a CRTP base [PR108827]

Using the CRTP idiom for this base class avoids bloating the size of a
pipeline when adding distinct empty range adaptor closure objects to it,
as detailed in section 4.1 of P2387R3.

But it means we can no longer define its operator| overloads as hidden
friends, since it'd mean each instantiation of _RangeAdaptorClosure
introduces its own distinct set of hidden friends.  So e.g. for the
outer | in

  x | (views::reverse | views::join)

ADL would find 6 distinct hidden operator| friends:

  two from _RangeAdaptorClosure<_Reverse>
  two from _RangeAdaptorClosure<_Join>
  two from _RangeAdaptorClosure<_Pipe<_Reverse, _Join>>

but we really only want to consider the last two.

We avoid this issue by instead defining the operator| overloads at
namespace scope alongside _RangeAdaptorClosure.  This should be fine
because the only types defined in this namespace are _RangeAdaptorClosure,
_RangeAdaptor, _Pipe and _Partial, so we don't have to worry about
unintentional ADL.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
PR libstdc++/108827

libstdc++-v3/ChangeLog:

* include/std/ranges (__adaptor::_RangeAdaptorClosure):
Convert into a CRTP class template.  Move hidden operator|
friends into namespace scope and adjust their constraints.
(__closure::__is_range_adaptor_closure_fn): Define.
(__closure::__is_range_adaptor_closure): Define.
(__adaptor::_Partial): Adjust use of _RangeAdaptorClosure.
(__adaptor::_Pipe): Likewise.
(views::_All): Likewise.
(views::_Join): Likewise.
(views::_Common): Likewise.
(views::_Reverse): Likewise.
(views::_Elements): Likewise.
(views::_Adjacent): Likewise.
(views::_AsRvalue): Likewise.
(views::_Enumerate): Likewise.
(views::_AsConst): Likewise.
* testsuite/std/ranges/adaptors/all.cc: Reinstate assertion
expecting that adding empty range adaptor closure objects to a
pipeline doesn't increase the size of a pipeline.

[LRA]: When assigning stack slots to pseudos previously assigned to fp consider other spilled pseudos

The previous LRA patch can assign slot of conflicting pseudos to
pseudos spilled after prohibiting fp->sp elimination. This patch
fixes this problem.

gcc/ChangeLog:

* lra-spills.cc (assign_stack_slot_num_and_sort_pseudos): Moving
slots_num initialization from here ...
(lra_spill): ... to here before the 1st call of
assign_stack_slot_num_and_sort_pseudos. Add the 2nd call after
fp->sp elimination.

Add warning options -W[no-]compare-distinct-pointer-types

GCC emits pedwarns unconditionally when comparing pointers of
different types, for example:

  int xdp_context (struct xdp_md *xdp)
    {
        void *data = (void *)(long)xdp->data;
        __u32 *metadata = (void *)(long)xdp->data_meta;
        __u32 ret;

        if (metadata + 1 > data)
          return 0;
        return 1;
   }

  /home/jemarch/foo.c: In function ‘xdp_context’:
  /home/jemarch/foo.c:15:20: warning: comparison of distinct pointer types lacks a cast
         15 |   if (metadata + 1 > data)
                 |                    ^

LLVM supports an option -W[no-]compare-distinct-pointer-types that can
be used in order to enable or disable the emission of such warnings.
It is enabled by default.

This patch adds the same options to GCC.

Documentation and testsuite updated included.
Regtested in x86_64-linu-gnu.
No regressions observed.

gcc/ChangeLog:

PR c/106537
* doc/invoke.texi (Option Summary): Mention
-Wcompare-distinct-pointer-types under `Warning Options'.
(Warning Options): Document -Wcompare-distinct-pointer-types.

gcc/c-family/ChangeLog:

PR c/106537
* c.opt (Wcompare-distinct-pointer-types): New option.

gcc/c/ChangeLog:

PR c/106537
* c-typeck.cc (build_binary_op): Warning on comparing distinct
pointer types only when -Wcompare-distinct-pointer-types.

gcc/testsuite/ChangeLog:

PR c/106537
* gcc.c-torture/compile/pr106537-1.c: New test.
* gcc.c-torture/compile/pr106537-2.c: Likewise.
* gcc.c-torture/compile/pr106537-3.c: Likewise.

Fix code_helper unused argument warning for fr30

fr30 is the only target defining GO_IF_LEGITIMATE_ADDRESS right now, in
which case the `code_helper ch` argument to memory_address_addr_space_p()
is unused and emits a new warning.

gcc/ChangeLog:
* recog.cc (memory_address_addr_space_p): Mark possibly unused
argument as unused.

[PATCH] RISC-V: Deduplicate #error messages in testsuite

"#error Feature macro not defined" is required to test the existence of an
extension through the preprocessor. However, multiple occurrence of the
exact same error message will confuse the developer once an error is
encountered.

This commit replaces such error messages to
"#error Feature macro for `EXT' not defined" to make which
macro is missing.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zvkn.c: Deduplicate #error messages.
* gcc.target/riscv/zvkn-1.c: Ditto.
* gcc.target/riscv/zvknc.c: Ditto.
* gcc.target/riscv/zvknc-1.c: Ditto.
* gcc.target/riscv/zvknc-2.c: Ditto.
* gcc.target/riscv/zvkng.c: Ditto.
* gcc.target/riscv/zvkng-1.c: Ditto.
* gcc.target/riscv/zvkng-2.c: Ditto.
* gcc.target/riscv/zvks.c: Ditto.
* gcc.target/riscv/zvks-1.c: Ditto.
* gcc.target/riscv/zvksc.c: Ditto.
* gcc.target/riscv/zvksc-1.c: Ditto.
* gcc.target/riscv/zvksc-2.c: Ditto.
* gcc.target/riscv/zvksg.c: Ditto.
* gcc.target/riscv/zvksg-1.c: Ditto.
* gcc.target/riscv/zvksg-2.c: Ditto.

tree-optimization/111039 - abnormals and bit test merging

The following guards the bit test merging code in if-combine against
the appearance of SSA names used in abnormal PHIs.

PR tree-optimization/111039
* tree-ssa-ifcombine.cc (ifcombine_ifandif): Check for
SSA_NAME_OCCURS_IN_ABNORMAL_PHI.

* gcc.dg/pr111039.c: New testcase.

libgomp: call numa_available first when using libnuma

The documentation requires that numa_available() is called and only
when successful, other libnuma function may be called. Internally,
it does a syscall to get_mempolicy with flag=0 (which would return
the default policy if mode were not NULL). If this returns -1 (and
not 0) and errno == ENOSYS, the Linux kernel does not have the
get_mempolicy syscall function; if so, numa_available() returns -1
(otherwise: 0).

libgomp/

PR libgomp/111024
* allocator.c (gomp_init_libnuma): Call numa_available; if
not available or not returning 0, disable libnuma usage.

doc: Fixes to RTL-SSA sample code

This patch fixes up the code examples in the RTL-SSA documentation (the
sections on making insn changes) to reflect the current API.

The main issues are as follows:
- rtl_ssa::recog takes an obstack_watermark & as the first parameter.
   Presumably this is intended to be the change attempt, so I've updated
   the examples to pass this through.
- The variants of recog and restrict_movement that take an ignore
   predicate have been renamed with an _ignoring suffix, so I've
   updated callers to use those names.
- A couple of minor "obvious" fixes to add a missing address-of
   operator and correct a variable name.

gcc/ChangeLog:

* doc/rtl.texi: Fix up sample code for RTL-SSA insn changes.

RISC-V: Fix XPASS slp testcases

This patch fixs XPASS slp testcases on trunk by
making the conditions for xfail stricter.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/slp-1.c: Fix.
* gcc.target/riscv/rvv/autovec/partial/slp-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-6.c: Ditto.

bpf: support `naked' function attributes in BPF targets

The kernel selftests and other BPF programs make extensive use of the
`naked' function attribute with bodies written using basic inline
assembly. This patch adds support for the attribute to
bpf-unkonwn-none, makes it to inhibit warnings due to lack of explicit
`return' statement, and updates documentation and testsuite
accordingly.

Tested in x86_64-linux-gnu host and bpf-unknown-none target.

gcc/ChangeLog

PR target/111046
* config/bpf/bpf.cc (bpf_attribute_table): Add entry for the
`naked' function attribute.
(bpf_warn_func_return): New function.
(TARGET_WARN_FUNC_RETURN): Define.
(bpf_expand_prologue): Add preventive comment.
(bpf_expand_epilogue): Likewise.
* doc/extend.texi (BPF Function Attributes): Document the `naked'
function attribute.

gcc/testsuite/ChangeLog

* gcc.target/bpf/naked-1.c: New test.