git.ipfire.org Git - thirdparty/gcc.git/log

middle-end: check memory accesses in the destination block [PR113588].

When analyzing loads for early break it was always the intention that for the
exit where things get moved to we only check the loads that can be reached from
the condition.

However the main loop checks all loads and we skip the destination BB. As such
we never actually check the loads reachable from the COND in the last BB unless
this BB was also the exit chosen by the vectorizer.

This leads us to incorrectly vectorize the loop in the PR and in doing so access
out of bounds.

gcc/ChangeLog:

PR tree-optimization/113588
PR tree-optimization/113467
* tree-vect-data-refs.cc
(vect_analyze_data_ref_dependence): Choose correct dest and fix checks.
(vect_analyze_early_break_dependences): Update comments.

gcc/testsuite/ChangeLog:

PR tree-optimization/113588
PR tree-optimization/113467
* gcc.dg/vect/vect-early-break_108-pr113588.c: New test.
* gcc.dg/vect/vect-early-break_109-pr113588.c: New test.

Fix some of vect-avg-*.c testcases

The vect-avg-*.c testcases are trying to make sure
the AVG internal function are used and not
doing promotion to `vector unsigned short`
but when V4QI is implemented, `vector(2) unsigned short`
shows up in the detail dump file and causes the failure.
To fix this checking the optimized dump instead of the vect dump
for `vector unsigned short` to make sure the vectorizer does not
do the promotion.

Built and tested for aarch64-linux-gnu.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-avg-1.c: Check optimized dump
for `vector *signed short` instead of the `vect` dump.
* gcc.dg/vect/vect-avg-11.c: Likewise.
* gcc.dg/vect/vect-avg-12.c: Likewise.
* gcc.dg/vect/vect-avg-13.c: Likewise.
* gcc.dg/vect/vect-avg-14.c: Likewise.
* gcc.dg/vect/vect-avg-2.c: Likewise.
* gcc.dg/vect/vect-avg-3.c: Likewise.
* gcc.dg/vect/vect-avg-4.c: Likewise.
* gcc.dg/vect/vect-avg-5.c: Likewise.
* gcc.dg/vect/vect-avg-6.c: Likewise.
* gcc.dg/vect/vect-avg-7.c: Likewise.
* gcc.dg/vect/vect-avg-8.c: Likewise.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

d: Merge dmd, druntime bce5c1f7b5, phobos e4d0dd513.

D front-end changes:

- Import latest changes from dmd v2.107.0-beta.1.
- Keywords like `__FILE__' are now always evaluated at the
callsite.

D runtime changes:

- Import latest changes from druntime v2.107.0-beta.1.
- Added `nameSig' field to TypeInfo_Class in object.d.

Phobos changes:

- Import latest changes from phobos v2.107.0-beta.1.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd bce5c1f7b5.
* d-attribs.cc (build_attributes): Update for new front-end interface.
* d-lang.cc (d_parse_file): Likewise.
* decl.cc (DeclVisitor::visit (VarDeclaration *)): Likewise.
* expr.cc (build_lambda_tree): New function.
(ExprVisitor::visit (FuncExp *)): Use build_lambda_tree.
(ExprVisitor::visit (SymOffExp *)): Likewise.
(ExprVisitor::visit (VarExp *)): Likewise.
* typeinfo.cc (create_tinfo_types): Add two ulong fields to internal
TypeInfo representation.
(TypeInfoVisitor::visit (TypeInfoClassDeclaration *)): Emit stub data
for TypeInfo_Class.nameSig.
(TypeInfoVisitor::visit (TypeInfoStructDeclaration *)): Update for new
front-end interface.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime bce5c1f7b5.
* src/MERGE: Merge upstream phobos e4d0dd513.

d: Merge dmd, druntime d8e3976a58, phobos 7a6e95688

D front-end changes:

    - Import dmd v2.107.0-beta.1.
    - A string literal as an assert condition is deprecated.
    - Added `@standalone` for module constructors.

D runtime changes:

    - Import druntime v2.107.0-beta.1.

Phobos changes:

    - Import phobos v2.107.0-beta.1.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd d8e3976a58.
* dmd/VERSION: Bump version to v2.107.0-beta.1.
* d-lang.cc (d_parse_file): Update for new front-end interface.
* modules.cc (struct module_info): Add standalonectors.
(build_module_tree): Implement @standalone.
(register_module_decl): Likewise.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime d8e3976a58.
* src/MERGE: Merge upstream phobos 7a6e95688.

d: Merge upstream dmd, druntime f1a045928e

D front-end changes:

    - Import dmd v2.106.1-rc.1.
    - Unrecognized pragmas are no longer an error by default.

D runtime changes:

    - Import druntime v2.106.1-rc.1.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd f1a045928e.
* dmd/VERSION: Bump version to v2.106.1-rc.1.
* gdc.texi (fignore-unknown-pragmas): Update documentation.
* d-builtins.cc (covariant_with_builtin_type_p): Update for new
front-end interface.
* d-lang.cc (d_parse_file): Likewise.
* typeinfo.cc (make_frontend_typeinfo): Likewise.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime f1a045928e.
* libdruntime/Makefile.am (DRUNTIME_DSOURCES): Add
core/stdc/stdatomic.d.
* libdruntime/Makefile.in: Regenerate.

libgo: better error messages for unknown GOARCH/GOOS

PR go/113530

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/557655

compiler: export the type "any" as a builtin

Otherwise we can't tell the difference between builtin type "any" and
a locally defined type "any".

This will require updates to the gccgo export data parsers in the main
Go repo and the x/tools repo. These updates are https://go.dev/cl/537195
and https://go.dev/cl/537215.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/536715

libgcc: Fix up _BitInt division [PR113604]

The following testcase ends up with SIGFPE in __divmodbitint4.
The problem is a thinko in my attempt to implement Knuth's algorithm.

The algorithm does (where b is 65536, i.e. one larger than what
fits in their unsigned short word):
        // Compute estimate qhat of q[j].
        qhat = (un[j+n]*b + un[j+n-1])/vn[n-1];
        rhat = (un[j+n]*b + un[j+n-1]) - qhat*vn[n-1];
      again:
        if (qhat >= b || qhat*vn[n-2] > b*rhat + un[j+n-2])
        { qhat = qhat - 1;
          rhat = rhat + vn[n-1];
          if (rhat < b) goto again;
        }
The problem is that it uses a double-word / word -> double-word
division (and modulo), while all we have is udiv_qrnnd unless
we'd want to do further library calls, and udiv_qrnnd is a
double-word / word -> word division and modulo.
Now, as the algorithm description says, it can produce at most
word bits + 1 bit quotient.  And I believe that actually the
highest qhat the original algorithm can produce is
(1 << word_bits) + 1.  The algorithm performs earlier canonicalization
where both the divisor and dividend are shifted left such that divisor
has msb set.  If it has msb set already before, no shifting occurs but
we start with added 0 limb, so in the first uv1:uv0 double-word uv1
is 0 and so we can't get too high qhat, if shifting occurs, the first
limb of dividend is shifted right by UWtype bits - shift count into
a new limb, so again in the first iteration in the uv1:uv0 double-word
uv1 doesn't have msb set while vv1 does and qhat has to fit into word.
In the following iterations, previous iteration should guarantee that
the previous quotient digit is correct.  Even if the divisor was the
maximal possible vv1:all_ones_in_all_lower_limbs, if the old uv0:lower_limbs
would be larger or equal to the divisor, the previous quotient digit
would increase and another divisor would be subtracted, which I think
implies that in the next iteration in uv1:uv0 double-word uv1 <= vv1,
but uv0 could be up to all ones, e.g. in case of all lower limbs
of divisor being all ones and at least one dividend limb below uv0
being not all ones.  So, we can e.g. for 64-bit UWtype see
uv1:uv0 / vv1 0x8000000000000000UL:0xffffffffffffffffUL / 0x8000000000000000UL
or 0xffffffffffffffffUL:0xffffffffffffffffUL / 0xffffffffffffffffUL
In all these cases (when uv1 == vv1 && uv0 >= uv1), qhat is
0x10000000000000001UL, i.e. 2 more than fits into UWtype result,
if uv1 == vv1 && uv0 < uv1 it would be 0x10000000000000000UL, i.e.
1 more than fits into UWtype result.
Because we only have udiv_qrnnd which can't deal with those too large
cases (SIGFPEs or otherwise invokes undefined behavior on those), I've
tried to handle the uv1 >= vv1 case separately, but for one thing
I thought it would be at most 1 larger than what fits, and for two
have actually subtracted vv1:vv1 from uv1:uv0 instead of subtracting
0:vv1 from uv1:uv0.
For the uv1 < vv1 case, the implementation already performs roughly
what the algorithm does.
Now, let's see what happens with the two possible extra cases in
the original algorithm.
If uv1 == vv1 && uv0 < uv1, qhat above would be b, so we take
if (qhat >= b, decrement qhat by 1 (it becomes b - 1), add
vn[n-1] aka vv1 to rhat and goto again if rhat < b (but because
qhat already fits we can goto to the again label in the uv1 < vv1
code).  rhat in this case is uv0 and rhat + vv1 can but doesn't
have to overflow, say for uv0 42UL and vv1 0x8000000000000000UL
it will not (and so we should goto again), while for uv0
0x8000000000000000UL and vv1 0x8000000000000001UL it will (and
we shouldn't goto again).
If uv1 == vv1 && uv0 >= uv1, qhat above would be b + 1, so we
take if (qhat >= b, decrement qhat by 1 (it becomes b), add
vn[n-1] aka vv1 to rhat. But because vv1 has msb set and
rhat in this case is uv0 - vv1, the rhat + vv1 addition
certainly doesn't overflow, because (uv0 - vv1) + vv1 is uv0,
so in the algorithm we goto again, again take if (qhat >= b and
decrement qhat so it finally becomes b - 1, and add vn[n-1]
aka vv1 to rhat again.  But this time I believe it must always
overflow, simply because we added (uv0 - vv1) + vv1 + vv1 and
vv1 has msb set, so already vv1 + vv1 must overflow.  And
because it overflowed, it will not goto again.
So, I believe the following patch implements this correctly, by
subtracting vv1 from uv1:uv0 double-word once, then comparing
again if uv1 >= vv1.  If that is true, subtract vv1 from uv1:uv0
again and add 2 * vv1 to rhat, no __builtin_add_overflow is needed
as we know it always overflowed and so won't goto again.
If after the first subtraction uv1 < vv1, use __builtin_add_overflow
when adding vv1 to rhat, because it can but doesn't have to overflow.

I've added an extra testcase which tests the behavior of all the changed
cases, so it has a case where uv1:uv0 / vv1 is 1:1, where it is
1:0 and rhat + vv1 overflows and where it is 1:0 and rhat + vv1 does not
overflow, and includes tests also from Zdenek's other failing tests.

2024-02-02  Jakub Jelinek  <jakub@redhat.com>

PR libgcc/113604
* libgcc2.c (__divmodbitint4): If uv1 >= vv1, subtract
vv1 from uv1:uv0 once or twice as needed, rather than
subtracting vv1:vv1.

* gcc.dg/torture/bitint-53.c: New test.
* gcc.dg/torture/bitint-55.c: New test.

libgccjit: Implement sizeof operator

gcc/jit/ChangeLog:

* docs/topics/compatibility.rst (LIBGCCJIT_ABI_27): New ABI tag.
* docs/topics/expressions.rst: Document gcc_jit_context_new_sizeof.
* jit-playback.cc (new_sizeof): New method.
* jit-playback.h (new_sizeof): New method.
* jit-recording.cc (recording::context::new_sizeof,
recording::memento_of_sizeof::replay_into,
recording::memento_of_sizeof::make_debug_string,
recording::memento_of_sizeof::write_reproducer): New methods.
* jit-recording.h (class memento_of_sizeof): New class.
* libgccjit.cc (gcc_jit_context_new_sizeof): New function.
* libgccjit.h (gcc_jit_context_new_sizeof): New function.
* libgccjit.map: New function.

gcc/testsuite/ChangeLog:

* jit.dg/all-non-failing-tests.h: New test.
* jit.dg/test-sizeof.c: New test.

c++: op== defaulted outside class [PR110084]

defaulted_late_check is for checks that need to happen after the class is
complete; we shouldn't call it sooner.

PR c++/110084

gcc/cp/ChangeLog:

* pt.cc (tsubst_function_decl): Only check a function defaulted
outside the class if the class is complete.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/spaceship-synth-neg3.C: Check error message.
* g++.dg/cpp2a/spaceship-eq16.C: New test.

hppa: Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV

This change implements __builtin_get_fpsr() and __builtin_set_fpsr(x)
to get and set the floating-point status register. They are used to
implement pa_atomic_assign_expand_fenv().

2024-02-02 John David Anglin <danglin@gcc.gnu.org>

gcc/ChangeLog:

PR target/59778
* config/pa/pa.cc (enum pa_builtins): Add PA_BUILTIN_GET_FPSR
and PA_BUILTIN_SET_FPSR builtins.
* (pa_builtins_icode): Declare.
* (def_builtin, pa_fpu_init_builtins): New.
* (pa_init_builtins): Initialize FPU builtins.
* (pa_builtin_decl, pa_expand_builtin_1): New.
* (pa_expand_builtin): Handle PA_BUILTIN_GET_FPSR and
PA_BUILTIN_SET_FPSR builtins.
* (pa_atomic_assign_expand_fenv): New.
* config/pa/pa.md (UNSPECV_GET_FPSR, UNSPECV_SET_FPSR): New
UNSPECV constants.
(get_fpsr, put_fpsr): New expanders.
(get_fpsr_32, get_fpsr_64, set_fpsr_32, set_fpsr_64): New
insn patterns.

RISC-V: Expand VLMAX scalar move in reduction

This patch fixes the following:

        vsetvli a5,a1,e32,m1,tu,ma
        slli    a4,a5,2
        sub     a1,a1,a5
        vle32.v v2,0(a0)
        add     a0,a0,a4
        vadd.vv v1,v2,v1
        bne     a1,zero,.L3
        vsetivli        zero,1,e32,m1,ta,ma
        vmv.s.x v2,zero
        vsetvli a5,zero,e32,m1,ta,ma              ---> Redundant vsetvl.
        vredsum.vs      v1,v1,v2
        vmv.x.s a0,v1
        ret

VSETVL PASS is able to fuse avl = 1 of scalar move and VLMAX avl of reduction.

However, this following RTL blocks the fusion in dependence analysis in VSETVL PASS:

(insn 49 24 50 5 (set (reg:RVVM1SI 98 v2 [148])
        (if_then_else:RVVM1SI (unspec:RVVMF32BI [
                    (const_vector:RVVMF32BI [
                            (const_int 1 [0x1])
                            repeat [
                                (const_int 0 [0])
                            ]
                        ])
                    (const_int 1 [0x1])
                    (const_int 2 [0x2]) repeated x2
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (const_vector:RVVM1SI repeat [
                    (const_int 0 [0])
                ])
            (unspec:RVVM1SI [
                    (reg:DI 0 zero)
                ] UNSPEC_VUNDEF))) 3813 {*pred_broadcastrvvm1si_zero}
     (nil))
(insn 50 49 51 5 (set (reg:DI 15 a5 [151])                          ---->  It set a5, blocks the following VLMAX into the scalar move above.
        (unspec:DI [
                (const_int 32 [0x20])
            ] UNSPEC_VLMAX)) 2566 {vlmax_avldi}
     (expr_list:REG_EQUIV (unspec:DI [
                (const_int 32 [0x20])
            ] UNSPEC_VLMAX)
        (nil)))
(insn 51 50 52 5 (set (reg:RVVM1SI 97 v1 [150])
        (unspec:RVVM1SI [
                (unspec:RVVMF32BI [
                        (const_vector:RVVMF32BI repeat [
                                (const_int 1 [0x1])
                            ])
                        (reg:DI 15 a5 [151])
                        (const_int 2 [0x2])
                        (const_int 1 [0x1])
                        (reg:SI 66 vl)
                        (reg:SI 67 vtype)
                    ] UNSPEC_VPREDICATE)
                (unspec:RVVM1SI [
                        (reg:RVVM1SI 97 v1 [orig:134 vect_result_14.6 ] [134])
                        (reg:RVVM1SI 98 v2 [148])
                    ] UNSPEC_REDUC_SUM)
                (unspec:RVVM1SI [
                        (reg:DI 0 zero)
                    ] UNSPEC_VUNDEF)
            ] UNSPEC_REDUC)) 17541 {pred_redsumrvvm1si}
     (expr_list:REG_DEAD (reg:RVVM1SI 98 v2 [148])
        (expr_list:REG_DEAD (reg:SI 66 vl)
            (expr_list:REG_DEAD (reg:DI 15 a5 [151])
                (expr_list:REG_DEAD (reg:DI 0 zero)
                    (nil))))))

Such situation can only happen on auto-vectorization, never happen on intrinsic codes.
Since the reduction is passed VLMAX AVL, it should be more natural to pass VLMAX to the scalar move which initial the value of the reduction.

After this patch:

vsetvli a5,a1,e32,m1,tu,ma
slli a4,a5,2
sub a1,a1,a5
vle32.v v2,0(a0)
add a0,a0,a4
vadd.vv v1,v2,v1
bne a1,zero,.L3
vsetvli a5,zero,e32,m1,ta,ma
vmv.s.x v2,zero
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
        ret

Tested on both RV32/RV64 no regression.

PR target/113697

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_reduction): Pass VLMAX avl to scalar move.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr113697.c: New test.

testsuite, Darwin: Allow for undefined symbols in shared test.

Darwin's linker defaults to error on undefined (which makes it look as
if we do not support shared, leading to tests being marked incorrectly
as unsupported).

This fixes the issue by allowing the symbols used in the target
supports test to be undefined.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_shared):
Allow the external symbols referenced in the test to be undefined.

testsuite, ubsan: Add libstdc++ deps where required.

We use the ubsan tests from both C, C++, D and Fortran.
thee sanitizer libraries link to libstdc++.

When we are using the C/gdc/gfortran driver, and the target might
require a path to the libstdc++ (e.g. for handing -static-xxxx or
for embedded runpaths), we need to add a suitable option (or we get
fails at execution time because of the missing paths).

Conversely, we do not want to add multiple instances of these
paths (since that leads to failures on tools that report warnings
for duplicate runpaths).

This patch modifies the _init function to allow a sigle parameter
that determines whether the *asan_init should add a path for
libstdc++ (yes for C driver, no for C++ driver).
gcc/testsuite/ChangeLog:

* g++.dg/ubsan/ubsan.exp:Add a parameter to init to say that
we expect the C++ driver to provide paths for libstdc++.
* gcc.dg/ubsan/ubsan.exp: Add a parameter to init to say that
we need a path added for libstdc++.
* gdc.dg/ubsan/ubsan.exp: Likewise.
* gfortran.dg/ubsan/ubsan.exp: Likewise.
* lib/ubsan-dg.exp: Handle a single parameter to init that
requests addition of a path to libstdc++ to link flags.

testsuite, asan, hwsan: Add libstdc++ deps where required.

We use the shared asan/hwasan rom both C,C++,D and Fortran.
The sanitizer libraries link to libstdc++.

When we are using the C/gdc/gfortran driver, and the target might
require a path to the libstdc++ (e.g. for handing -static-xxxx or
for embedded runpaths), we need to add a suitable option (or we get
fails at execution time because of the missing paths).

Conversely, we do not want to add multiple instances of these
paths (since that leads to failures on tools that report warnings
for duplicate runpaths).

This patch modifies the _init function to allow a single parameter
that determines whether the *asan_init should add a path for
libstdc++ (yes for C driver, no for C++ driver).

gcc/testsuite/ChangeLog:

* g++.dg/asan/asan.exp: Add a parameter to init to say that
we expect the C++ driver to provide paths for libstdc++.
* g++.dg/hwasan/hwasan.exp: Likewise
* gcc.dg/asan/asan.exp: Add a parameter to init to say that
we need a path added for libstdc++.
* gcc.dg/hwasan/hwasan.exp: Likewise.
* gdc.dg/asan/asan.exp: Likewise.
* gfortran.dg/asan/asan.exp: Likewise.
* lib/asan-dg.exp: Handle a single parameter to init that
requests addition of a path to libstdc++ to link flags.
* lib/hwasan-dg.exp: Likewise.

[PATCH] libgcc: Include stdlib.h for abort() on mingw32

libgcc/
* config/i386/enable-execute-stack-mingw32.c: Include
stdlib.h for abort() definition.

libstdc++: Make std::function deduction guide support explicit object functions [PR113335]

This makes the deduction guides for std::function and std::packaged_task
work for explicit object member functions, i.e. "deducing this", as per
LWG 3617.

libstdc++-v3/ChangeLog:

PR libstdc++/113335
* include/bits/std_function.h (__function_guide_helper): Add
partial specialization for explicit object member functions, as
per LWG 3617.
* testsuite/20_util/function/cons/deduction_c++23.cc: Check
explicit object member functions.
* testsuite/30_threads/packaged_task/cons/deduction_c++23.cc:
Likewise.

libstdc++: Fix experimental/names.cc failure on AIX

This fails due to "u" being used in a system header.

FAIL: experimental/names.cc -std=gnu++17 (test for excess errors)
Excess errors:
/usr/include/sys/poll.h:77: error: expected unqualified-id before ';' token
/usr/include/sys/poll.h:77: error: expected ')' before ';' token

FAIL: experimental/names.cc -std=gnu++17 (test for excess errors)
Excess errors:
/usr/include/sys/poll.h:102: error: expected unqualified-id before ';' token
/usr/include/sys/poll.h:102: error: expected ')' before ';' token

libstdc++-v3/ChangeLog:

* testsuite/17_intro/names.cc [_AIX]: Undefine "u".

libgcc: Export XF, TF, HF and BFmode specific _BitInt symbols from libgcc_s.so.1 [PR113700]

Rainer pointed out that __PFX__ and __FIXPTPFX__ prefix replacement is done
solely for libgcc-std.ver.in and not for the *.ver files in config.
I've used the __PFX__ prefix even in config/i386/libgcc-glibc.ver because it
was used for similar symbols in libgcc-std.ver.in, and that results in those
symbols being STB_LOCAL in libgcc_s.so.1. Tests still work because gcc by
default uses -static-libgcc when linking (unlike g++ etc.), but would
have failed when using -shared-libgcc (but I see nothing in the testsuite
actually testing with -shared-libgcc, so am not adding tests).

With the patch, libgcc_s.so.1 now exports
__fixtfbitint@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__fixxfbitint@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitintbf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitinthf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitinttf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
__floatbitintxf@@GCC_14.0.0 FUNC GLOBAL DEFAULT
on x86_64-linux which it wasn't before.

2024-02-02 Jakub Jelinek <jakub@redhat.com>

PR target/113700
* config/i386/libgcc-glibc.ver (GCC_14.0.0): Remove __PFX prefixes
from symbol names.

doc: Fix typo in description of hardbool attribute

gcc/ChangeLog:

* doc/extend.texi (Common Type Attributes): Fix typo in
description of hardbool.

testsuite: Add another bitint testcase [PR113691]

This is fixed by the PR113692 patch.

2024-02-02 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/113691
* gcc.dg/bitint-83.c: New test.

lower-bitint: Handle casts from large/huge _BitInt to pointer/reference types [PR113692]

I thought one needs to cast first to pointer-sized integer before casting to
pointer, but apparently that is not the case.
So the following patch arranges for the large/huge _BitInt to
pointer/reference conversions to use the same code as for conversions
of them to small integral types.

2024-02-02 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/113692
* gimple-lower-bitint.cc (bitint_large_huge::lower_stmt): Handle casts
from large/huge BITINT_TYPEs to POINTER_TYPE/REFERENCE_TYPE as
final_cast_p.

* gcc.dg/bitint-82.c: New test.

lower-bitint: Handle uninitialized large/huge SSA_NAMEs as inline asm inputs [PR113699]

Similar problem to calls with uninitialized large/huge _BitInt SSA_NAME
arguments, var_to_partition will not work for those, but unlike calls
where we just create a new uninitialized SSA_NAME here we need to change
the inline asm input to be an uninitialized VAR_DECL.

2024-02-02 Jakub Jelinek <jakub@redhat.com>

PR middle-end/113699
* gimple-lower-bitint.cc (bitint_large_huge::lower_asm): Handle
uninitialized large/huge _BitInt SSA_NAME inputs.

* gcc.dg/bitint-81.c: New test.

libstdc++: Implement some missing functions for net::ip::network_v6

libstdc++-v3/ChangeLog:

* include/experimental/internet (network_v6::network): Define.
(network_v6::hosts): Finish implementing.
(network_v6::to_string): Do not concatenate std::string to
arbitrary std::basic_string specialization.
* testsuite/experimental/net/internet/network/v6/cons.cc: New
test.

libstdc++: Fix invalid order in PSTL inplace_merge test [PR90276]

This looks like a typo in the upstream test that causes a failure in
debug mode. It has been reported upstream.

libstdc++-v3/ChangeLog:

PR libstdc++/90276
* testsuite/25_algorithms/pstl/alg_merge/inplace_merge.cc: Fix
comparison function to use less-than instead of equality.

libstdc++: Avoid reusing moved-from iterators in PSTL tests [PR90276]

The reverse_invoker utility for PSTL tests uses forwarding references for
all parameters, but some of those parameters get forwarded to move
constructors which then leave the objects in a moved-from state. When
the parameters are forwarded a second time that results in making new
copies of moved-from iterators. For libstdc++ debug mode iterators, the
moved-from state is singular, which means copying them will abort at
runtime.

The fix is to make copies of iterator arguments instead of forwarding
them.

The callers of reverse_invoker::operator() also forward the iterators
multiple times, but that's OK because reverse_invoker accepts them by
forwarding reference but then breaks the chain of forwarding and copies
them as lvalues.

libstdc++-v3/ChangeLog:

PR libstdc++/90276
* testsuite/util/pstl/test_utils.h (reverse_invoker): Do not use
perfect forwarding for iterator arguments.

tree-ssa-math-opts: Fix is_widening_mult_rhs_p - unbreak bootstrap [PR113705]

On Tue, Jan 30, 2024 at 07:33:10AM -0000, Roger Sayle wrote:
+             wide_int bits = wide_int::from (tree_nonzero_bits (rhs),
+                                             prec,
+                                             TYPE_SIGN (TREE_TYPE (rhs)));
...
> +               if (gimple_assign_rhs_code (stmt) == BIT_AND_EXPR
> +                   && TREE_CODE (gimple_assign_rhs2 (stmt)) == INTEGER_CST
> +                   && wi::to_wide (gimple_assign_rhs2 (stmt))
> +                      == wi::mask (hprec, false, prec))

This change broke bootstrap on aarch64-linux.
The problem can be seen even on the reduced testcase.

The IL on the unreduced testcase before widening_mul has:
  # val_583 = PHI <val_26(13), val_164(40)>
...
  pretmp_266 = MEM[(const struct wide_int_storage *)&D.160657].len;
  _264 = pretmp_266 & 65535;
...
  _176 = (sizetype) val_583;
  _439 = (sizetype) _264;
  _284 = _439 * 8;
  _115 = _176 + _284;
where 583/266/264 have unsigned int type and 176/439/284/115 have sizetype.
widening_mul first turns that into:
  # val_583 = PHI <val_26(13), val_164(40)>
...
  pretmp_266 = MEM[(const struct wide_int_storage *)&D.160657].len;
  _264 = pretmp_266 & 65535;
...
  _176 = (sizetype) val_583;
  _439 = (sizetype) _264;
  _284 = _264 w* 8;
  _115 = _176 + _284;
and then is_widening_mult_rhs_p is called, with type sizetype (64-bit),
rhs _264, hprec 32 and prec 64.  Now tree_nonzero_bits (rhs) is
65535, so bits is 64-bit wide_int 65535, stmt is BIT_AND_EXPR,
but we ICE on the
wi::to_wide (gimple_assign_rhs2 (stmt)) == wi::mask (hprec, false, prec)
comparison because wi::to_wide on gimple_assign_rhs2 (stmt) - unsigned int
65535 gives 32-bit wide_int 65535, while wi::mask (hprec, false, prec)
gives 64-bit wide_int 0xffffffff and comparison between different precision
wide_ints is forbidden.

The following patch fixes it the same way how bits is computed earlier,
by calling wide_int::from on the wi::to_wide (gimple_assign_rhs2 (stmt)),
so we compare 64-bit 65535 with 64-bit 0xffffffff.

2024-02-02  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/113705
* tree-ssa-math-opts.cc (is_widening_mult_rhs_p): Use wide_int_from
around wi::to_wide in order to compare value in prec precision.

* g++.dg/opt/pr113705.C: New test.

testsuite: i386: Fix gcc.target/i386/pr71321.c on Solaris/x86

gcc.target/i386/pr71321.c FAILs on 64-bit Solaris/x86 with the native
assembler:

FAIL: gcc.target/i386/pr71321.c scan-assembler-not lea.*0

The problem is that /bin/as doesn't fully support cfi directives, so the
.eh_frame section is specified explicitly, which includes ".long 0".
The regular expression above includes ".*", which does multiline
matches. AFAICS those aren't needed here.

This patch changes the RE not to use multiline patches.

Tested on i386-pc-solaris2.11 (as and gas) and i686-pc-linux-gnu.

2024-02-01 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
* gcc.target/i386/pr71321.c (scan-assembler-not): Avoid multiline
matches.

libstdc++: Allow explicit conversion of string views with different traits

This was changed by LWG 3857.

libstdc++-v3/ChangeLog:

* include/std/string_view (basic_string_view(R&&)): Remove
constraint that traits_type must be the same, as per LWG 3857.
* testsuite/21_strings/basic_string_view/cons/char/range_c++20.cc:
Explicit conversion between different specializations should be
allowed.
* testsuite/21_strings/basic_string_view/cons/wchar_t/range_c++20.cc:
Likewise.

libstdc++: Remove noexcept from std::osyncstream::operator=

This should not be noexcept because its _M_syncbuf member has a
potentially-throwing move assignment operator. The noexcept was removed
by LWG 3867.

libstdc++-v3/ChangeLog:

* include/std/syncstream (basic_osyncstream::operator=): Remove
noexcept, as per LWG 3867.

libstdc++: Remove noexcept from std::generator::promise_type::yield_value

This overload of std::generator::promise_type::yield_value calls things
which might throw, so should not be noexcept. The noexcept was remove by
LWG 3894.

libstdc++-v3/ChangeLog:

* include/std/generator (promise_type::yield_value): Remove
noexcept from fourth overload, as per LWG 3894.

testsuite: i386: Fix gcc.target/i386/sse2-stv-1.c on Solaris/x86

gcc.target/i386/sse2-stv-1.c FAILs on 32-bit Solaris/x86:

FAIL: gcc.target/i386/sse2-stv-1.c scan-assembler-not %[er]sp
FAIL: gcc.target/i386/sse2-stv-1.c scan-assembler-not shldl

The test assumes the Linux/x86 default of -mno-stackrealign, while
32-bit Solaris/x86 default to -mstackrealign.

Fixed by explicitly specifying -mno-stackrealign.

Tested on i386-pc-solaris2.11 and i686-pc-linux-gnu.

2024-02-01 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
* gcc.target/i386/sse2-stv-1.c (dg-options): Add -mno-stackrealign.

testsuite: i386: Restrict gcc.target/i386/pr80569.c to gas

gcc.target/i386/pr80569.c FAILs on Solaris/x86 with the native
assembler:

FAIL: gcc.target/i386/pr80569.c (test for excess errors)

Excess errors:
Assembler: pr80569.c
        "/var/tmp//ccm4_iqb.s", line 2 : Illegal mnemonic
        Near line: "    .code16gcc"
        "/var/tmp//ccm4_iqb.s", line 2 : Syntax error
        Near line: "    .code16gcc"

.code16gcc is a gas extension, so restrict the test to gas.

Tested on i386-pc-solaris2.11 (as and gas) and i686-pc-linux-gnu.

2024-02-01  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
* gcc.target/i386/pr80569.c: Require gas.

Revert "RISC-V: Allow LICM hoist POLY_INT configuration code sequence"

This reverts commit 74489c19070703361acc20bc172f304cae845a96.

testsuite, libphobos: Update link flags [PR112864].

The regressions here are primarily from duplicated '-B' additions.

We remove the duplicate on the link line.
We also make sure that platforms with extensions other than .so for
shared libs will have the correct paths.

PR target/112864

libphobos/ChangeLog:

* testsuite/lib/libphobos.exp: Use ${shlib_ext} instead of
hard-wiring '.so'.
* testsuite/testsuite_flags.in: Remove duplicate -B option
for spec file path.

testsuite, Objective-C++: Update link flags [PR112863].

These regressions are caused by missing or duplicate runpaths which
now fire linker warnings.

We need to add options to locate libobjc (and on Darwin libobjc-gnu)
along with libstdc++.
Usually '-L' options are added to point to the relevant directories for
the uninstalled libraries.

In cases where libraries are available as both shared and convenience
some additional checks are made.

For some targets -static-xxxx options are handled by specs substitution
and need a '-B' option rather than '-L'. For Darwin, when embedded
runpaths are in use (the default for all versions after macOS 10.11),
'-B' is also needed to provide the runpath.

When '-B' is used, this results in a '-L' for each path that exists (so
that appending a '-L' as well is a needless duplicate). There are also
cases where tools warn for duplicates, leading to spurious fails.

PR target/112863

gcc/testsuite/ChangeLog:

* lib/obj-c++.exp: Decide on whether to present -B or -L to
reference the paths to uninstalled libobjc/libobjc-gnu and
libstdc++ and use that to generate the link flags.

testsuite, gfortran: Update link flags [PR112862].

The regressions here are caused by two issues:
1. In some cases there is no generated runpath for libatomic
2. In other cases there are duplicate paths.

This patch simplifies the addition of the options in the main
gfortran exp and removes the duplicates elewhere.

We need to add options to locate libgfortran and the dependent libs
libquadmath (supporting REAL*16) and libatomic (supporting operations
used by coarrays).  Usually '-L' options are added to point to the
relevant directories for the uninstalled libraries.

In cases where libraries are available as both shared and convenience
some additional checks are made.

For some targets -static-xxxx options are handled by specs substitution
and need a '-B' option rather than '-L'.  For Darwin, when embedded
runpaths are in use (the default for all versions after macOS 10.11),
'-B' is also needed to provide the runpath.

When '-B' is used, this results in a '-L' for each path that exists (so
that appending a '-L' as well is a needless duplicate).  There are also
cases where tools warn for duplicates, leading to spurious fails.

PR target/112862

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray/caf.exp: Remove duplicate additions of
libatomic handling.
* gfortran.dg/dg.exp: Likewise.
* lib/gfortran.exp: Decide on whether to present -B or -L to
reference the paths to uninstalled libgfortran, libqadmath and
libatomic and use that to generate the link flags.

RISC-V: Allow LICM hoist POLY_INT configuration code sequence

Realize in recent benchmark evaluation (coremark-pro zip-test):

        vid.v   v2
        vmv.v.i v5,0
.L9:
        vle16.v v3,0(a4)
        vrsub.vx        v4,v2,a6   ---> LICM failed to hoist it outside the loop.

The root cause is:

(insn 56 47 57 4 (set (subreg:DI (reg:HI 220) 0)
        (reg:DI 223)) "rvv.c":11:9 208 {*movdi_64bit}  -> Its result used by the following vrsub.vx then supress the hoist of the vrsub.vx
     (nil))

(insn 57 56 59 4 (set (reg:RVVMF2HI 216)
        (if_then_else:RVVMF2HI (unspec:RVVMF32BI [
                    (const_vector:RVVMF32BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (reg:DI 350)
                    (const_int 2 [0x2]) repeated x2
                    (const_int 1 [0x1])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (minus:RVVMF2HI (vec_duplicate:RVVMF2HI (reg:HI 220))
                (reg:RVVMF2HI 217))
            (unspec:RVVMF2HI [
                    (reg:DI 0 zero)
                ] UNSPEC_VUNDEF))) "rvv.c":11:9 6938 {pred_subrvvmf2hi_reverse_scalar}
     (expr_list:REG_DEAD (reg:HI 220)
        (nil)))

This patch fixes it generate (set (reg:HI) (subreg:HI (reg:DI))) instead of (set (subreg:DI (reg:DI)) (reg:DI)).

After this patch:

vid.v v2
vrsub.vx v2,v2,a7
vmv.v.i v4,0
.L3:
vle16.v v3,0(a4)

Tested on both RV32 and RV64 no regression.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_move): Fix poly_int dest generation.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/poly_licm-1.c: New test.
* gcc.target/riscv/rvv/autovec/poly_licm-2.c: New test.

testsuite: i386: Fix gcc.target/i386/pieces-memcpy-7.c etc. on Solaris/x86

gcc.target/i386/pieces-memcpy-7.c etc. FAIL on 32-bit Solaris/x86:

FAIL: gcc.target/i386/pieces-memcpy-7.c scan-assembler-not %[re]bp
FAIL: gcc.target/i386/pieces-memcpy-8.c scan-assembler-not %[re]bp
FAIL: gcc.target/i386/pieces-memcpy-9.c scan-assembler-not %[re]bp
FAIL: gcc.target/i386/pieces-memset-36.c scan-assembler-not %[re]bp
FAIL: gcc.target/i386/pieces-memset-40.c scan-assembler-not %[re]bp
FAIL: gcc.target/i386/pieces-memset-9.c scan-assembler-not %[re]bp

The problem is that the tests assume -mno-stackrealign while 32-bit
Solaris/x86 defaults to -mstackrealign.

Fixed by explicitly specifying -mno-stackrealign.

Tested on i386-pc-solaris2.11 and i686-pc-linux-gnu.

2024-02-01 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
* gcc.target/i386/pieces-memcpy-7.c (dg-additional-options): Add
-mno-stackrealign.
* gcc.target/i386/pieces-memcpy-8.c: Likewise.
* gcc.target/i386/pieces-memcpy-9.c: Likewise.
* gcc.target/i386/pieces-memset-36.c: Likewise.
* gcc.target/i386/pieces-memset-40.c: Likewise.
* gcc.target/i386/pieces-memset-9.c: Likewise.

testsuite: i386: Fix gcc.target/i386/apx-ndd-cmov.c on Solaris/x86

gcc.target/i386/apx-ndd-cmov.c FAILs on 64-bit Solaris/x86 with the
native assembler:

FAIL: gcc.target/i386/apx-ndd-cmov.c scan-assembler-times cmove[^\\n\\r]*, %eax 1
FAIL: gcc.target/i386/apx-ndd-cmov.c scan-assembler-times cmovge[^\\n\\r]*, %eax 1

The gas vs. as difference is

-       cmove   c+4(%rip), %esi, %eax
+       cmovl.e c+4(%rip), %esi, %eax

-       cmovge  %ecx, %edx, %eax
+       cmovl.ge        %ecx, %edx, %eax

This patch accounts for both forms.

Tested on i386-pc-solaris2.11 (as and gas) and i686-pc-linux-gnu.

2024-02-01  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
* gcc.target/i386/apx-ndd-cmov.c (scan-assembler-times): Allow for
cmovl.e, cmovl.ge.

MAINTAINERS: Update my e-mail address.

ChangeLog/
* MAINTAINERS: Update my e-mail address.

c++: no_unique_address and constexpr [PR112439]

Here, because we don't build a CONSTRUCTOR for an empty base, we were
wrongly marking the Foo CONSTRUCTOR as complete after initializing the Empty
member. Fixed by checking empty_base here as well.

PR c++/112439

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_store_expression): Check empty_base
before marking a CONSTRUCTOR readonly.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/no_unique_address15.C: New test.

c++: variable template array of unknown bound [PR113638]

When we added variable templates, we didn't extend the VAR_HAD_UNKNOWN_BOUND
handling for class template static data members to handle them as well.

PR c++/113638

gcc/cp/ChangeLog:

* cp-tree.h: Adjust comment.
* pt.cc (instantiate_template): Set VAR_HAD_UNKNOWN_BOUND for
variable template.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/var-templ-array1.C: New test.

RISC-V: Cleanup the comments for the psabi

This patch would like to cleanup some comments which are out of date or incorrect.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_get_arg_info): Cleanup comments.
(riscv_pass_by_reference): Ditto.
(riscv_fntype_abi): Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Remove vsetvl_pre bogus instructions in VSETVL PASS

I realize there is a RTL regression between GCC-14 and GCC-13.
https://godbolt.org/z/Ga7K6MqaT

GCC-14:
(insn 9 13 31 2 (set (reg:DI 15 a5 [138])
        (unspec:DI [
                (const_int 64 [0x40])
            ] UNSPEC_VLMAX)) "/app/example.c":5:15 2566 {vlmax_avldi}
     (expr_list:REG_EQUIV (unspec:DI [
                (const_int 64 [0x40])
            ] UNSPEC_VLMAX)
        (nil)))
(insn 31 9 10 2 (parallel [
            (set (reg:DI 15 a5 [138])
                (unspec:DI [
                        (reg:DI 0 zero)
                        (const_int 32 [0x20])
                        (const_int 7 [0x7])
                        (const_int 1 [0x1]) repeated x2
                    ] UNSPEC_VSETVL))
            (set (reg:SI 66 vl)
                (unspec:SI [
                        (reg:DI 0 zero)
                        (const_int 32 [0x20])
                        (const_int 7 [0x7])
                    ] UNSPEC_VSETVL))
            (set (reg:SI 67 vtype)
                (unspec:SI [
                        (const_int 32 [0x20])
                        (const_int 7 [0x7])
                        (const_int 1 [0x1]) repeated x2
                    ] UNSPEC_VSETVL))
        ]) "/app/example.c":5:15 3281 {vsetvldi}
     (nil))

GCC-13:
(insn 10 7 26 2 (set (reg/f:DI 11 a1 [139])
        (plus:DI (reg:DI 11 a1 [142])
            (const_int 800 [0x320]))) "/app/example.c":6:32 5 {adddi3}
     (nil))
(insn 26 10 9 2 (parallel [
            (set (reg:DI 15 a5)
                (unspec:DI [
                        (reg:DI 0 zero)
                        (const_int 32 [0x20])
                        (const_int 7 [0x7])
                        (const_int 1 [0x1]) repeated x2
                    ] UNSPEC_VSETVL))
            (set (reg:SI 66 vl)
                (unspec:SI [
                        (reg:DI 0 zero)
                        (const_int 32 [0x20])
                        (const_int 7 [0x7])
                    ] UNSPEC_VSETVL))
            (set (reg:SI 67 vtype)
                (unspec:SI [
                        (const_int 32 [0x20])
                        (const_int 7 [0x7])
                        (const_int 1 [0x1]) repeated x2
                    ] UNSPEC_VSETVL))
        ]) "/app/example.c":5:15 792 {vsetvldi}
     (nil))

GCC-13 doesn't have:
(insn 9 13 31 2 (set (reg:DI 15 a5 [138])
        (unspec:DI [
                (const_int 64 [0x40])
            ] UNSPEC_VLMAX)) "/app/example.c":5:15 2566 {vlmax_avldi}
     (expr_list:REG_EQUIV (unspec:DI [
                (const_int 64 [0x40])
            ] UNSPEC_VLMAX)
        (nil)))

vsetvl_pre doesn't emit any assembler which is just used for occupying scalar register.
It should be removed in VSETVL PASS.

Tested on both RV32 and RV64 no regression.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (vsetvl_pre_insn_p): New function.
(pre_vsetvl::cleaup): Remove vsetvl_pre.
(pre_vsetvl::remove_vsetvl_pre_insns): New function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vsetvl_pre-1.c: New test.

LoongArch: Fix incorrect return type for frecipe/frsqrte intrinsic functions

gcc/ChangeLog:

* config/loongarch/larchintrin.h
(__frecipe_s): Update function return type.
(__frecipe_d): Ditto.
(__frsqrte_s): Ditto.
(__frsqrte_d): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/larch-frecipe-intrinsic.c: New test.

LoongArch: Adjust cost of vector_stmt that match multiply-add pattern.

We found that when only 128-bit vectorization was enabled, 549.fotonik3d_r
failed to vectorize effectively. For this reason, we adjust the cost of
128-bit vector_stmt that match the multiply-add pattern to facilitate 128-bit
vectorization.
The experimental results show that after the modification, 549.fotonik3d_r
performance can be improved by 9.77% under the 128-bit vectorization option.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_multiply_add_p): New.
(loongarch_vector_costs::add_stmt_cost): Adjust.

gcc/testsuite/ChangeLog:

* gfortran.dg/vect/vect-10.f90: New test.

LoongArch: Don't split the instructions containing relocs for extreme code model.

The ABI mandates the pcalau12i/addi.d/lu32i.d/lu52i.d instructions for
addressing a symbol to be adjacent.  So model them as "one large
instruction", i.e. define_insn, with two output registers.  The real
address is the sum of these two registers.

The advantage of this approach is the RTL passes can still use ldx/stx
instructions to skip an addi.d instruction.

gcc/ChangeLog:

* config/loongarch/loongarch.md (unspec): Add
UNSPEC_LA_PCREL_64_PART1 and UNSPEC_LA_PCREL_64_PART2.
(la_pcrel64_two_parts): New define_insn.
* config/loongarch/loongarch.cc (loongarch_tls_symbol): Fix a
typo in the comment.
(loongarch_call_tls_get_addr): If -mcmodel=extreme
-mexplicit-relocs={always,auto}, use la_pcrel64_two_parts for
addressing the TLS symbol and __tls_get_addr.  Emit an REG_EQUAL
note to allow CSE addressing __tls_get_addr.
(loongarch_legitimize_tls_address): If -mcmodel=extreme
-mexplicit-relocs={always,auto}, address TLS IE symbols with
la_pcrel64_two_parts.
(loongarch_split_symbol): If -mcmodel=extreme
-mexplicit-relocs={always,auto}, address symbols with
la_pcrel64_two_parts.
(loongarch_output_mi_thunk): Clean up unreachable code.  If
-mcmodel=extreme -mexplicit-relocs={always,auto}, address the MI
thunks with la_pcrel64_two_parts.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/func-call-extreme-1.c (dg-options):
Use -O2 instead of -O0 to ensure the pcalau12i/addi/lu32i/lu52i
instruction sequences are not reordered by the compiler.
(NOIPA): Disallow interprocedural optimizations.
* gcc.target/loongarch/func-call-extreme-2.c: Remove the content
duplicated from func-call-extreme-1.c, include it instead.
(dg-options): Likewise.
* gcc.target/loongarch/func-call-extreme-3.c (dg-options):
Likewise.
* gcc.target/loongarch/func-call-extreme-4.c (dg-options):
Likewise.
* gcc.target/loongarch/cmodel-extreme-1.c: New test.
* gcc.target/loongarch/cmodel-extreme-2.c: New test.
* g++.target/loongarch/cmodel-extreme-mi-thunk-1.C: New test.
* g++.target/loongarch/cmodel-extreme-mi-thunk-2.C: New test.
* g++.target/loongarch/cmodel-extreme-mi-thunk-3.C: New test.

LoongArch: Added support for loading __get_tls_addr symbol address using call36.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_call_tls_get_addr):
Add support for call36.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/explicit-relocs-medium-call36-auto-tls-ld-gd.c: New test.

LoongArch: Enable explicit reloc for extreme TLS GD/LD with -mexplicit-relocs=auto.

Binutils does not support relaxation using four instructions to obtain
symbol addresses

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
When the code model of the symbol is extreme and -mexplicit-relocs=auto,
the macro instruction loading symbol address is not applicable.
(loongarch_call_tls_get_addr): Adjust code.
(loongarch_legitimize_tls_address): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c: New test.
* gcc.target/loongarch/explicit-relocs-medium-auto-tls-ld-gd.c: New test.

LoongArch: Add the macro implementation of mcmodel=extreme.

gcc/ChangeLog:

* config/loongarch/loongarch-protos.h (loongarch_symbol_extreme_p):
Add function declaration.
* config/loongarch/loongarch.cc (loongarch_symbolic_constant_p):
For SYMBOL_PCREL64, non-zero addend of "la.local $rd,$rt,sym+addend"
is not allowed
(loongarch_load_tls): Added macro support in extreme mode.
(loongarch_call_tls_get_addr): Likewise.
(loongarch_legitimize_tls_address): Likewise.
(loongarch_force_address): Likewise.
(loongarch_legitimize_move): Likewise.
(loongarch_output_mi_thunk): Likewise.
(loongarch_option_override_internal): Remove the code that detects
explicit relocs status.
(loongarch_handle_model_attribute): Likewise.
* config/loongarch/loongarch.md (movdi_symbolic_off64): New template.
* config/loongarch/predicates.md (symbolic_off64_operand): New predicate.
(symbolic_off64_or_reg_operand): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/attr-model-5.c: New test.
* gcc.target/loongarch/func-call-extreme-5.c: New test.
* gcc.target/loongarch/func-call-extreme-6.c: New test.
* gcc.target/loongarch/tls-extreme-macro.c: New test.

LoongArch: Merge template got_load_tls_{ld/gd/le/ie}.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_load_tls):
Load all types of tls symbols through one function.
(loongarch_got_load_tls_gd): Delete.
(loongarch_got_load_tls_ld): Delete.
(loongarch_got_load_tls_ie): Delete.
(loongarch_got_load_tls_le): Delete.
(loongarch_call_tls_get_addr): Modify the called function name.
(loongarch_legitimize_tls_address): Likewise.
* config/loongarch/loongarch.md (@got_load_tls_gd<mode>): Delete.
(@load_tls<mode>): New template.
(@got_load_tls_ld<mode>): Delete.
(@got_load_tls_le<mode>): Delete.
(@got_load_tls_ie<mode>): Delete.

LoongArch: Modify the address calculation logic for obtaining array element values through fp.

Modify address calculation logic from (((a x C) + fp) + offset) to ((fp + offset) + a x C).
Thereby modifying the register dependencies and optimizing the code.
The value of C is 2 4 or 8.

The following is the assembly code before and after a loop modification in spec2006 401.bzip:

                 old                      |                 new
735 .L71:                                |  735 .L71:
736         slli.d  $r12,$r15,2          |  736         slli.d  $r12,$r15,2
737         ldx.w   $r13,$r22,$r12       |  737         ldx.w   $r13,$r22,$r12
738         addi.d  $r15,$r15,-1         |  738         addi.d  $r15,$r15,-1
739         slli.w  $r16,$r15,0          |  739         slli.w  $r16,$r15,0
740         addi.w  $r13,$r13,-1         |  740         addi.w  $r13,$r13,-1
741         slti    $r14,$r13,0          |  741         slti    $r14,$r13,0
742         add.w   $r12,$r26,$r13       |  742         add.w   $r12,$r26,$r13
743         maskeqz $r12,$r12,$r14       |  743         maskeqz $r12,$r12,$r14
744         masknez $r14,$r13,$r14       |  744         masknez $r14,$r13,$r14
745         or      $r12,$r12,$r14       |  745         or      $r12,$r12,$r14
746         ldx.bu  $r14,$r30,$r12       |  746         ldx.bu  $r14,$r30,$r12
747         lu12i.w $r13,4096>>12        |  747         alsl.d  $r14,$r14,$r18,2
748         ori     $r13,$r13,432        |  748         ldptr.w $r13,$r14,0
749         add.d   $r13,$r13,$r3        |  749         addi.w  $r17,$r13,-1
750         alsl.d  $r14,$r14,$r13,2     |  750         stptr.w $r17,$r14,0
751         ldptr.w $r13,$r14,-1968      |  751         slli.d  $r13,$r13,2
752         addi.w  $r17,$r13,-1         |  752         stx.w   $r12,$r22,$r13
753         st.w    $r17,$r14,-1968      |  753         ldptr.w $r12,$r19,0
754         slli.d  $r13,$r13,2          |  754         blt     $r12,$r16,.L71
755         stx.w   $r12,$r22,$r13       |  755         .align  4
756         ldptr.w $r12,$r18,-2048      |  756
757         blt     $r12,$r16,.L71       |  757
758         .align  4                    |  758

This patch is ported from riscv's commit r14-3111.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (mem_shadd_or_shadd_rtx_p): New function.
(loongarch_legitimize_address): Add logical transformation code.

Daily bump.

c++: -Wdangling-reference tweak to unbreak aarch64

My recent -Wdangling-reference change to not warn on std::span-like classes
unfortunately caused a new warning: extending reference_like_class_p also
opens the door to new warnings since we use reference_like_class_p for
checking the return type of the function: either it must be a reference
or a reference_like_class_p.

We can consider even non-templates as std::span-like to get rid of the
warning here.

gcc/cp/ChangeLog:

* call.cc (reference_like_class_p): Consider even non-templates for
std::span-like classes.

gcc/ChangeLog:

* doc/invoke.texi: Update -Wdangling-reference documentation.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wdangling-reference21.C: New test.

i386: Improve *cmp<dwi>_doubleword splitter [PR113701]

The fix for PR70321 introduced a splitter that split a doubleword
comparison into a pair of XORs followed by an IOR to set the (zero)
flags register.  To help the reload, splitter forced SUBREG pieces of
double-word input values to a pseudo, but this regressed
gcc.target/i386/pr82580.c:

int f0 (U x, U y) { return x == y; }

from:
xorq    %rdx, %rdi
xorq    %rcx, %rsi
xorl    %eax, %eax
orq     %rsi, %rdi
sete    %al
ret

to:
xchgq   %rdi, %rsi
movq    %rdx, %r8
movq    %rcx, %rax
movq    %rsi, %rdx
movq    %rdi, %rcx
xorq    %rax, %rcx
xorq    %r8, %rdx
xorl    %eax, %eax
orq     %rcx, %rdx
sete    %al
ret

To mitigate the regression, remove this legacy heuristic (workaround?).
There have been many incremental changes and improvements to x86 TImode
and register allocation, so this legacy workaround is not only no longer
useful, but it actually hurts register allocation.  The patched compiler
now produces:

        xchgq   %rdi, %rsi
        xorl    %eax, %eax
        xorq    %rsi, %rdx
        xorq    %rdi, %rcx
        orq     %rcx, %rdx
        sete    %al
        ret

PR target/113701

gcc/ChangeLog:

* config/i386/i386.md (*cmp<dwi>_doubleword):
Do not force SUBREG pieces to pseudos.

libgcc: Avoid warnings on __gcc_nested_func_ptr_created [PR113402]

I'm seeing hundreds of
In file included from ../../../libgcc/libgcc2.c:56:
../../../libgcc/libgcc2.h:32:13: warning: conflicting types for built-in function ‘__gcc_nested_func_ptr_created’; expected ‘void(void *, void *, void *)’
+[-Wbuiltin-declaration-mismatch]
   32 | extern void __gcc_nested_func_ptr_created (void *, void *, void **);
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
warnings.

Either we need to add like in r14-6218
  #pragma GCC diagnostic ignored "-Wbuiltin-declaration-mismatch"
(but in that case because of the libgcc2.h prototype (why is it there?)
it would need to be also with #pragma GCC diagnostic push/pop around),
or we could go with just following how the builtins are prototyped on the
compiler side and only cast to void ** when dereferencing (which is in
a single spot in each TU).

2024-02-01  Jakub Jelinek  <jakub@redhat.com>

PR libgcc/113402
* libgcc2.h (__gcc_nested_func_ptr_created): Change type of last
argument from void ** to void *.
* config/i386/heap-trampoline.c (__gcc_nested_func_ptr_created):
Change type of dst from void ** to void * and cast dst to void **
before dereferencing it.
* config/aarch64/heap-trampoline.c (__gcc_nested_func_ptr_created):
Likewise.

libgcc: Fix up i386/t-heap-trampoline [PR113403]

I'm seeing
../../../libgcc/shared-object.mk:14: warning: overriding recipe for target 'heap-trampoline.o'
../../../libgcc/shared-object.mk:14: warning: ignoring old recipe for target 'heap-trampoline.o'
../../../libgcc/shared-object.mk:17: warning: overriding recipe for target 'heap-trampoline_s.o'
../../../libgcc/shared-object.mk:17: warning: ignoring old recipe for target 'heap-trampoline_s.o'

This patch fixes that.

2024-02-01 Jakub Jelinek <jakub@redhat.com>

PR libgcc/113403
* config/i386/t-heap-trampoline: Add to LIB2ADDEHSHARED
i386/heap-trampoline.c rather than aarch64/heap-trampoline.c.

libstdc++: Implement P2165R4 changes to std::pair/tuple/etc [PR113309]

This implements the C++23 paper P2165R4 Compatibility between tuple,
pair and tuple-like objects, which builds upon many changes from the
earlier C++23 paper P2321R2 zip.

Some declarations had to be moved around so that they're visible from
<bits/stl_pair.h> without introducing new includes and bloating the
header. In the end, the only new include is for <bits/utility.h> from
<bits/stl_iterator.h>, for tuple_element_t.

PR libstdc++/113309
PR libstdc++/109203

libstdc++-v3/ChangeLog:

* include/bits/ranges_util.h (__detail::__pair_like): Don't
define in C++23 mode.
(__detail::__pair_like_convertible_from): Adjust as per P2165R4.
(__detail::__is_subrange<subrange>): Moved from <ranges>.
(__detail::__is_tuple_like_v<subrange>): Likewise.
* include/bits/stl_iterator.h: Include <bits/utility.h> for
C++23.
(__different_from): Move to <concepts>.
(__iter_key_t): Adjust for C++23 as per P2165R4.
(__iter_val_t): Likewise.
* include/bits/stl_pair.h (pair, array): Forward declare.
(get): Forward declare all overloads relevant to P2165R4
tuple-like constructors.
(__is_tuple_v): Define for C++23.
(__is_tuple_like_v): Define for C++23.
(__tuple_like): Define for C++23 as per P2165R4.
(__pair_like): Define for C++23 as per P2165R4.
(__eligibile_tuple_like): Define for C++23.
(__eligibile_pair_like): Define for C++23.
(pair::_S_constructible_from_pair_like): Define for C++23.
(pair::_S_convertible_from_pair_like): Define for C++23.
(pair::_S_dangles_from_pair_like): Define for C++23.
(pair::pair): Define overloads taking a tuple-like type for
C++23 as per P2165R4.
(pair::_S_assignable_from_tuple_like): Define for C++23.
(pair::_S_const_assignable_from_tuple_like): Define for C++23.
(pair::operator=): Define overloads taking a tuple-like type for
C++23 as per P2165R4.
* include/bits/utility.h (ranges::__detail::__is_subrange):
Moved from <ranges>.
* include/bits/version.def (tuple_like): Define for C++23.
* include/bits/version.h: Regenerate.
* include/std/concepts (__different_from): Moved from
<bits/stl_iterator.h>.
(ranges::__swap::__adl_swap): Clarify which __detail namespace.
* include/std/map (__cpp_lib_tuple_like): Define C++23.
* include/std/ranges (__detail::__is_subrange): Moved to
<bits/utility.h>.
(__detail::__is_subrange<subrange>): Moved to <bits/ranges_util.h>
(__detail::__has_tuple_element): Adjust for C++23 as per P2165R4.
(__detail::__tuple_or_pair): Remove as per P2165R4. Replace all
uses with plain tuple as per P2165R4.
* include/std/tuple (__cpp_lib_tuple_like): Define for C++23.
(__tuple_like_tag_t): Define for C++23.
(__tuple_cmp): Forward declare for C++23.
(_Tuple_impl::_Tuple_impl): Define overloads taking
__tuple_like_tag_t and a tuple-like type for C++23.
(_Tuple_impl::_M_assign): Likewise.
(tuple::__constructible_from_tuple_like): Define for C++23.
(tuple::__convertible_from_tuple_like): Define for C++23.
(tuple::__dangles_from_tuple_like): Define for C++23.
(tuple::tuple): Define overloads taking a tuple-like type for
C++23 as per P2165R4.
(tuple::__assignable_from_tuple_like): Define for C++23.
(tuple::__const_assignable_from_tuple_like): Define for C++23.
(tuple::operator=): Define overloads taking a tuple-like type
for C++23 as per P2165R4.
(tuple::__tuple_like_common_comparison_category): Define for C++23.
(tuple::operator<=>): Define overload taking a tuple-like type
for C++23 as per P2165R4.
(array, get): Forward declarations moved to <bits/stl_pair.h>.
(tuple_cat): Constrain with __tuple_like for C++23 as per P2165R4.
(apply): Likewise.
(make_from_tuple): Likewise.
(__tuple_like_common_reference): Define for C++23.
(basic_common_reference): Adjust as per P2165R4.
(__tuple_like_common_type): Define for C++23.
(common_type): Adjust as per P2165R4.
* include/std/unordered_map (__cpp_lib_tuple_like): Define for
C++23.
* include/std/utility (__cpp_lib_tuple_like): Define for C++23.
* testsuite/std/ranges/zip/1.cc (test01): Adjust to handle pair
and 2-tuple interchangeably.
(test05): New test.
* testsuite/20_util/pair/p2165r4.cc: New test.
* testsuite/20_util/tuple/p2165r4.cc: New test.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++/pair: Factor out const-assignability helper for C++20

This is consistent with std::tuple's __const_assignable helper, and will
be useful for implementing the new pair::operator= overloads from P2165R4.

libstdc++-v3/ChangeLog:

* include/bits/stl_pair.h (pair::_S_const_assignable): Define,
factored out from ...
(pair::operator=): ... the constraints of the const overloads.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

Set num_threads to 50 on 32-bit hppa in two libgomp loop tests

We support a maximum of 50 threads on 32-bit hppa.

2024-02-01 John David Anglin <danglin@gcc.gnu.org>

libgomp/ChangeLog:

* testsuite/libgomp.c++/loop-3.C: Set num_threads to 50
on 32-bit hppa.
* testsuite/libgomp.c/omp-loop03.c: Likewise.

xfail gnat.dg/trampoline3.adb scan-assembler-not check on hppa*-*-*

We still require an executable stack for trampolines on hppa*-*-*.

2024-02-01 John David Anglin <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

* gnat.dg/trampoline3.adb: xfail scan-assembler-not
check on hppa*-*-*.

hppa: Fix bug in atomic_storedi_1 pattern

The first alternative stores the floating-point status register
in the destination. It should store zero. We need to copy %fr0
to another floating-point register to initialize it to zero.

2024-02-01 John David Anglin <danglin@gcc.gnu.org>

gcc/ChangeLog:

* config/pa/pa.md (atomic_storedi_1): Fix bug in
alternative 1.

c++: ttp TEMPLATE_DECL equivalence [PR112737]

Here during declaration matching we undesirably consider the two TT{42}
CTAD expressions to be non-equivalent ultimately because for CTAD
placeholder equivalence we compare the TEMPLATE_DECLs via pointer identity,
and here the corresponding TEMPLATE_DECLs for TT are different since
they're from different scopes.  On the other hand, the corresponding
TEMPLATE_TEMPLATE_PARMs are deemed equivalent according to cp_tree_equal
(since they have the same position and template parameters).  This turns
out to be the root cause of some of the xtreme-header modules regressions.

So this patch relaxes ttp CTAD placeholder equivalence accordingly, by
comparing the TEMPLATE_TEMPLATE_PARM instead of the TEMPLATE_DECL.  It
turns out this issue also affects function template-id equivalence as
with g<TT> in the second testcase, so it makes sense to relax TEMPLATE_DECL
equivalence more generally in cp_tree_equal.  In passing this patch
improves ctp_hasher::hash for CTAD placeholders, so that they don't
all get the same hash.

PR c++/112737

gcc/cp/ChangeLog:

* pt.cc (iterative_hash_template_arg) <case TEMPLATE_DECL>:
Adjust hashing to match cp_tree_equal.
(ctp_hasher::hash): Also hash CLASS_PLACEHOLDER_TEMPLATE.
* tree.cc (cp_tree_equal) <case TEMPLATE_DECL>: Return true
for ttp TEMPLATE_DECLs if their TEMPLATE_TEMPLATE_PARMs are
equivalent.
* typeck.cc (structural_comptypes) <case TEMPLATE_TYPE_PARM>:
Use cp_tree_equal to compare CLASS_PLACEHOLDER_TEMPLATE.

gcc/testsuite/ChangeLog:

* g++.dg/template/ttp42.C: New test.
* g++.dg/template/ttp43.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

AVR: Tabify avr.cc

gcc/
* config/avr/avr.cc: Tabify.

middle-end: Fix ICE in poly-int.h due to SLP.

Adds a check to ensure that the input vector arguments
to a function are not variable length. Previously, only the
output vector of a function was checked.

The ICE in question is within the neon-sve-bridge.c test,
and is related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111268

gcc/ChangeLog:
PR tree-optimization/111268
* tree-vect-slp.cc (vectorizable_slp_permutation_1):
Add variable-length check for vector input arguments
to a function.

libstdc++: Do not use def-file-line for each macro in <bits/version.h>

These line markers are not needed, because searching <bits/version.def>
for a macro name works fine. Removing them means that small changes to
<bits/version.def> do not result in large diffs to <bits/version.h>
because of all the changed line numbers.

libstdc++-v3/ChangeLog:

* include/bits/version.tpl: Do not use def-file-line for each
macro being defined.
* include/bits/version.h: Regenerate.

libstdc++: Update expected error for debug/constexpr*_neg.cc tests

We no longer hit a __builtin_unreachable() in these tests, so we need to
update the dg-error patterns to match _Error_formatter::_M_error().

We can also remove some dg-prune-output directives matching notes saying
"in 'constexpr' expansion" because that's done globally in prune.exp.

libstdc++-v3/ChangeLog:

* testsuite/25_algorithms/copy/debug/constexpr_neg.cc: Adjust
dg-error pattern.
* testsuite/25_algorithms/copy_backward/debug/constexpr_neg.cc:
Likewise.
* testsuite/25_algorithms/equal/debug/constexpr_neg.cc:
Likewise.
* testsuite/25_algorithms/lower_bound/debug/constexpr_partitioned_neg.cc:
Likewise.
* testsuite/25_algorithms/lower_bound/debug/constexpr_partitioned_pred_neg.cc:
Likewise.
* testsuite/25_algorithms/lower_bound/debug/constexpr_valid_range_neg.cc:
Likewise.
* testsuite/25_algorithms/upper_bound/debug/constexpr_partitioned_neg.cc:
Likewise.
* testsuite/25_algorithms/upper_bound/debug/constexpr_partitioned_pred_neg.cc:
Likewise.
* testsuite/25_algorithms/upper_bound/debug/constexpr_valid_range_neg.cc:
Likewise.

libstdc++: Fix -Wdeprecated warning about implicit capture of 'this'

In C++20 it's deprecated for a [=] lambda capture to capture the 'this'
pointer. Using resize_and_overwrite with a lambda seems like overkill to
write three chars to the string anyway. Just resize the string and
overwrite the end of it directly.

libstdc++-v3/ChangeLog:

* include/experimental/internet (network_v4::to_string()):
Remove lambda and use of resize_and_overwrite.

GCN: Don't hard-code number of SGPR/VGPR/AVGPR registers

Also add 'STATIC_ASSERT's for number of SGPR/VGPR/AVGPR registers (in
'#ifndef USED_FOR_TARGET', as otherwise 'STATIC_ASSERT' isn't available).

gcc/
* config/gcn/gcn.cc (gcn_hsa_declare_function_name): Don't
hard-code number of SGPR/VGPR/AVGPR registers.
* config/gcn/gcn.h: Add a 'STATIC_ASSERT's for number of
SGPR/VGPR/AVGPR registers.

libcpp: Stabilize the location for macros restored after PCH load [PR105608]

libcpp currently lacks the infrastructure to assign correct locations to
macros that were defined prior to loading a PCH and then restored
afterwards. While I plan to address that fully for GCC 15, this patch
improves things by using at least a valid location, even if it's not the
best one. Without this change, libcpp uses pfile->directive_line as the
location for the restored macros, but this location_t applies to the old
line map, not the one that was just restored from the PCH, so the resulting
location is unpredictable and depends on what was stored in the line maps
before. With this change, all restored macros get assigned locations at the
line of the #include that triggered the PCH restore. A future patch will
store the actual file name and line number of each definition and then
synthesize locations in the new line map pointing to the right place.

gcc/c-family/ChangeLog:

PR preprocessor/105608
* c-pch.cc (c_common_read_pch): Adjust line map so that libcpp
assigns a location to restored macros which is the same location
that triggered the PCH include.

libcpp/ChangeLog:

PR preprocessor/105608
* pch.cc (cpp_read_state): Set a valid location for restored
macros.

c++: ICE with throw inside concept [PR112437]

We crash in the loop at the end of treat_lvalue_as_rvalue_p for code
like

template <class T>
concept Throwable = requires(T x) { throw x; };

because the code assumes that we eventually reach sk_function_parms or
sk_try and bail, but in a concept we're in a sk_namespace.

We're already checking sk_try so we don't crash in a function-try-block,
but I've added a test anyway.

PR c++/112437

gcc/cp/ChangeLog:

* typeck.cc (treat_lvalue_as_rvalue_p): Bail out on sk_namespace in
the move on throw of parms loop.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-throw1.C: New test.
* g++.dg/eh/throw4.C: New test.

RISC-V: Support scheduling for sifive p600 series

Add sifive p600 series scheduler module. For more information
see https://www.sifive.com/cores/performance-p650-670.
Add sifive-p650, sifive-p670 for mcpu option will come in separate patches.

gcc/ChangeLog:

* config/riscv/riscv.md: Add "fcvt_i2f", "fcvt_f2i" type
attribute, and include sifive-p600.md.
* config/riscv/generic-ooo.md: Update type attribute.
* config/riscv/generic.md: Update type attribute.
* config/riscv/sifive-7.md: Update type attribute.
* config/riscv/sifive-p600.md: New file.
* config/riscv/riscv-cores.def (RISCV_TUNE): Add parameter.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
Add sifive_p600.
* config/riscv/riscv.cc (sifive_p600_tune_info): New.
* config/riscv/riscv.h (TARGET_SFB_ALU): Update.
* doc/invoke.texi (RISC-V Options): Add sifive-p600-series

RISC-V: Add minimal support for 7 new unprivileged extensions

The RISC-V Profiles specification here:
https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#7-new-isa-extensions

These extensions don't add any new features but
describe existing features. So this patch only adds parsing.

Za64rs: Reservation set size of 64 bytes
Za128rs: Reservation set size of 128 bytes
Ziccif: Main memory supports instruction fetch with atomicity requirement
Ziccrse: Main memory supports forward progress on LR/SC sequences
Ziccamoa: Main memory supports all atomics in A
Zicclsm: Main memory supports misaligned loads/stores
Zic64b: Cache block size isf 64 bytes

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add Za64rs, Za128rs,
Ziccif, Ziccrse, Ziccamoa, Zicclsm, Zic64b items.
* config/riscv/riscv.opt: New macro for 7 new unprivileged
extensions.
* doc/invoke.texi (RISC-V Options): Add Za64rs, Za128rs,
Ziccif, Ziccrse, Ziccamoa, Zicclsm, Zic64b extensions.
gcc/testsuite/ChangeLog:

* gcc.target/riscv/za-ext.c: New test.
* gcc.target/riscv/zi-ext.c: New test.

Link shared libasan with -z now on Solaris

g++.dg/asan/default-options-1.C FAILs on Solaris/SPARC and x86:

FAIL: g++.dg/asan/default-options-1.C   -O0  execution test
FAIL: g++.dg/asan/default-options-1.C   -O1  execution test
FAIL: g++.dg/asan/default-options-1.C   -O2  execution test
FAIL: g++.dg/asan/default-options-1.C   -O2 -flto  execution test
FAIL: g++.dg/asan/default-options-1.C   -O2 -flto -flto-partition=none  execution test
FAIL: g++.dg/asan/default-options-1.C   -O3 -g  execution test
FAIL: g++.dg/asan/default-options-1.C   -Os  execution test

The failure is always the same:

AddressSanitizer: CHECK failed: asan_rtl.cpp:397 "((!AsanInitIsRunning() && "ASan init calls itself!")) != (0)" (0x0, 0x0) (tid=1)

This happens because libasan makes unportable assumptions about
initialization order that don't hold on Solaris.  The problem has
already been fixed in clang by

[Driver] Link shared asan runtime lib with -z now on Solaris/x86
https://reviews.llvm.org/D156325

where it was way more prevalent.

This patch applies the same fix to gcc.

Tested on i386-pc-solaris2.11 (ld and gld) and sparc-sun-solaris2.11.

2024-01-30  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc:
* config/sol2.h (LIBASAN_EARLY_SPEC): Add -z now unless
-static-libasan.  Add missing whitespace.

testsuite: i386: Fix gcc.target/i386/pr38534-1.c etc. on Solaris/x86

The gcc.target/i386/pr38534-1.c etc. tests FAIL on 32 and 64-bit
Solaris/x86:

FAIL: gcc.target/i386/pr38534-1.c scan-assembler-not push
FAIL: gcc.target/i386/pr38534-2.c scan-assembler-not push
FAIL: gcc.target/i386/pr38534-3.c scan-assembler-not push
FAIL: gcc.target/i386/pr38534-4.c scan-assembler-not push

The tests assume the Linux/x86 default of -fomit-frame-pointer, while
Solaris/x86 defaults to -fno-omit-frame-pointer.

Fixed by specifying -fomit-frame-pointer explicitly.

Tested on i386-pc-solaris2.11 and i686-pc-linux-gnu.

2024-01-30 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
* gcc.target/i386/pr38534-1.c: Add -fomit-frame-pointer to
dg-options.
* gcc.target/i386/pr38534-2.c: Likewise.
* gcc.target/i386/pr38534-3.c: Likewise.
* gcc.target/i386/pr38534-4.c: Likewise.

testsuite: i386: Fix gcc.target/i386/no-callee-saved-1.c etc. on Solaris/x86

The gcc.target/i386/no-callee-saved-[12].c tests FAIL on Solaris/x86:

FAIL: gcc.target/i386/no-callee-saved-1.c scan-assembler-not push
FAIL: gcc.target/i386/no-callee-saved-2.c scan-assembler-not push

In both cases, the test expect the Linux/x86 default of
-fomit-frame-pointer, while Solaris/x86 defaults to
-fno-omit-frame-pointer.

So this patch explicitly specifies -fomit-frame-pointer.

Tested on i386-pc-solaris2.11 (as and gas) and i686-pc-linux-gnu.

2024-01-30 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
* gcc.target/i386/no-callee-saved-1.c: Add -fomit-frame-pointer to
dg-options.
* gcc.target/i386/no-callee-saved-2.c: Likewise.

testsuite: i386: Fix gcc.target/i386/avx512vl-stv-rotatedi-1.c on 32-bit Solaris/x86

gcc.target/i386/avx512vl-stv-rotatedi-1.c FAILs on 32-bit Solaris/x86
since its introduction in

commit 4814b63c3c2326cb5d7baa63882da60ac011bd97
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Mon Jul 10 09:04:29 2023 +0100

    i386: Add AVX512 support for STV of SI/DImode rotation by constant.

FAIL: gcc.target/i386/avx512vl-stv-rotatedi-1.c scan-assembler-times vpro[lr]q 29

While the test depends on -mstv, 32-bit Solaris/x86 defaults to
-mstackrealign which is incompatible.

The patch thus specifies -mstv -mno-stackrealign explicitly.

Tested on i386-pc-solaris2.11 and i686-pc-linux-gnu.

2024-01-23  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
* gcc.target/i386/avx512vl-stv-rotatedi-1.c: Add -mstv
-mno-stackrealign to dg-options.

testsuite: i386: Fix gcc.target/i386/pr70321.c on 32-bit Solaris/x86

gcc.target/i386/pr70321.c FAILs on 32-bit Solaris/x86 since its
introduction in

commit 43201f2c2173894bf7c423cad6da1c21567e06c0
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Mon May 30 21:20:09 2022 +0100

    PR target/70321: Split double word equality/inequality after STV on x86.

FAIL: gcc.target/i386/pr70321.c scan-assembler-times mov 1

The failure happens because 32-bit Solaris/x86 defaults to
-fno-omit-frame-pointer.

Fixed by specifying -fomit-frame-pointer explicitly.

Tested on i386-pc-solaris2.11 and i686-pc-linux-gnu.

2024-01-23  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
* gcc.target/i386/pr70321.c: Add -fomit-frame-pointer to
dg-options.

c++: Fix g++.dg/ext/attr-section2.C etc. with Solaris/SPARC as

The new g++.dg/ext/attr-section2*.C tests FAIL on Solaris/SPARC with the
native assembler:

+FAIL: g++.dg/ext/attr-section2.C -std=c++14 scan-assembler
.(section|csect)[ \\\\t]+.foo
+FAIL: g++.dg/ext/attr-section2.C -std=c++17 scan-assembler
.(section|csect)[ \\\\t]+.foo
+FAIL: g++.dg/ext/attr-section2.C -std=c++20 scan-assembler
.(section|csect)[ \\\\t]+.foo

The problem is that the SPARC assembler requires the section name to be
double-quoted, like

        .section        ".foo%_Z3varIiE",#alloc,#write,#progbits

This patch allows for that.  At the same time, it quotes literal dots in
the REs.

Tested on sparc-sun-solaris2.11 (as and gas) and i386-pc-solaris2.11 (as
and gas).

2024-01-18  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
* g++.dg/ext/attr-section2.C (scan-assembler): Quote dots.  Allow
for double-quoted section name.
* g++.dg/ext/attr-section2a.C: Likewise.
* g++.dg/ext/attr-section2b.C: Likewise.

Daily bump.

GCN: Remove 'FIRST_{SGPR,VGPR,AVGPR}_REG', 'LAST_{SGPR,VGPR,AVGPR}_REG' from machine description

They're not used there, and we avoid potentially out-of-sync definitions.

gcc/
* config/gcn/gcn.md (FIRST_SGPR_REG, LAST_SGPR_REG)
(FIRST_VGPR_REG, LAST_VGPR_REG, FIRST_AVGPR_REG, LAST_AVGPR_REG):
Don't 'define_constants'.

GCN: Remove 'SGPR_OR_VGPR_REGNO_P' definition

..., which was always (a) unused, and (b) bogus: always-false.

gcc/
* config/gcn/gcn.h (SGPR_OR_VGPR_REGNO_P): Remove.

GCN, RDNA 3: Adjust 'sync_compare_and_swap<mode>_lds_insn'

For OpenACC/GCN '-march=gfx1100', a lot of libgomp OpenACC test cases FAIL:

    /tmp/ccGfLJ8a.mkoffload.2.s:406:2: error: instruction not supported on this GPU
            ds_cmpst_rtn_b32 v0, v0, v4, v3
            ^

In RDNA 3, 'ds_cmpst_[...]' has been replaced by 'ds_cmpstore_[...]', and the
notes for 'ds_cmpst_[...]' in pre-RDNA 3 ISA manuals:

    Caution, the order of src and cmp are the *opposite* of the BUFFER_ATOMIC_CMPSWAP opcode.

..., have been resolved for 'ds_cmpstore_[...]' in the RDNA 3 ISA manual:

    In this architecture the order of src and cmp agree with the BUFFER_ATOMIC_CMPSWAP opcode.

..., and therefore '%2', '%3' now swapped with regards to GCC operand order.
Most of the affected libgomp OpenACC test cases then PASS their execution test.

gcc/
* config/gcn/gcn.md (sync_compare_and_swap<mode>_lds_insn)
[TARGET_RDNA3]: Adjust.

PR modula2/111627 defend against ICE

Although PR 111627 can be fixed by renaming testsuite modules it
highlighted that a possible ICE can occur if a malformed
implementation module is actually a program module. This small
patch defends against this ICE and checks to see whether the module
is a DefImp before testing IsDefinitionForC.

gcc/m2/ChangeLog:

PR modula2/111627
PR modula2/112506
* gm2-compiler/M2Comp.mod (Pass0CheckMod): Test IsDefImp before
checking IsDefinitionForC.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

tree-optimization/113693 - LC SSA and region VN

The following fixes LC SSA preserving with region VN which was broken
when availability checking was enhanced to treat not visited value
numbers as available. The following makes sure to honor availability
data we put in place for LC SSA preserving instead.

PR tree-optimization/113693
* tree-ssa-sccvn.cc (rpo_elim::eliminate_avail): Honor avail
data when available.

* gcc.dg/pr113693.c: New testcase.

aarch64: libgcc: Cleanup ELF marking in asm

Use aarch64-asm.h in asm code consistently, this was started in

  commit c608ada288ced0268bbbbc1fd4136f56c34b24d4
  Author:     Zac Walker <zacwalker@microsoft.com>
  CommitDate: 2024-01-23 15:32:30 +0000

  Ifdef `.hidden`, `.type`, and `.size` pseudo-ops for `aarch64-w64-mingw32` target

But that commit failed to remove some existing markings from asm files,
which means some objects got double marked with gnu property notes.

libgcc/ChangeLog:

* config/aarch64/crti.S: Remove stack marking.
* config/aarch64/crtn.S: Remove stack marking, include aarch64-asm.h
* config/aarch64/lse.S: Remove stack and GNU property markings.

gimple-low: Remove .ASAN_MARK calls on TREE_STATIC variables [PR113531]

Since the r14-1500-g4d935f52b0d5c0 commit we promote an initializer_list
backing array to static storage where appropriate, but this happens after
we decided to add it to asan_poisoned_variables.  As a result we add
unpoison/poison for it to the gimple.  But then sanopt removes the unpoison.
So the second time we call the function and want to load from the array asan
still considers it poisoned.

The following patch fixes it by removing the .ASAN_MARK internal calls
during gimple lowering if they refer to TREE_STATIC vars.

2024-02-01  Jakub Jelinek  <jakub@redhat.com>
    Jason Merrill  <jason@redhat.com>

PR c++/113531
* gimple-low.cc (lower_stmt): Remove .ASAN_MARK calls
on variables which were promoted to TREE_STATIC.

* g++.dg/asan/initlist1.C: New test.

Co-authored-by: Jason Merrill <jason@redhat.com>

PR target/113560: Enhance is_widening_mult_rhs_p.

This patch resolves PR113560, a code quality regression from GCC12
affecting x86_64, by enhancing the middle-end's tree-ssa-math-opts.cc
to recognize more instances of widening multiplications.

The widening multiplication perception code identifies cases like:

_1 = (unsigned __int128) x;
__res = _1 * 100;

but in the reported test case, the original input looks like:

_1 = (unsigned long long) x;
_2 = (unsigned __int128) _1;
__res = _2 * 100;

which gets optimized by constant folding during tree-ssa to:

_2 = x & 18446744073709551615;  // x & 0xffffffffffffffff
__res = _2 * 100;

where the BIT_AND_EXPR hides (has consumed) the extension operation.
This reveals the more general deficiency (missed optimization
opportunity) in widening multiplication perception that additionally
both

__int128 foo(__int128 x, __int128 y) {
  return (x & 1000) * (y & 1000)
}

and

unsigned __int128 bar(unsigned __int128 x, unsigned __int128) {
  return (x >> 80) * (y >> 80);
}

should be recognized as widening multiplications.  Hence rather than
test explicitly for BIT_AND_EXPR (as in the first version of this patch)
the more general solution is to make use of range information, as
provided by tree_non_zero_bits.

As a demonstration of the observed improvements, function foo above
currently with -O2 compiles on x86_64 to:

foo: movq    %rdi, %rsi
        movq    %rdx, %r8
        xorl    %edi, %edi
        xorl    %r9d, %r9d
        andl    $1000, %esi
        andl    $1000, %r8d
        movq    %rdi, %rcx
        movq    %r9, %rdx
        imulq   %rsi, %rdx
        movq    %rsi, %rax
        imulq   %r8, %rcx
        addq    %rdx, %rcx
        mulq    %r8
        addq    %rdx, %rcx
        movq    %rcx, %rdx
        ret

with this patch, GCC recognizes the *w and instead generates:

foo:    movq    %rdi, %rsi
        movq    %rdx, %r8
        andl    $1000, %esi
        andl    $1000, %r8d
        movq    %rsi, %rax
        imulq   %r8
        ret

which is perhaps easier to understand at the tree-level where

__int128 foo (__int128 x, __int128 y)
{
  __int128 _1;
  __int128 _2;
  __int128 _5;

  <bb 2> [local count: 1073741824]:
  _1 = x_3(D) & 1000;
  _2 = y_4(D) & 1000;
  _5 = _1 * _2;
  return _5;

}

gets transformed to:

__int128 foo (__int128 x, __int128 y)
{
  __int128 _1;
  __int128 _2;
  __int128 _5;
  signed long _7;
  signed long _8;

  <bb 2> [local count: 1073741824]:
  _1 = x_3(D) & 1000;
  _2 = y_4(D) & 1000;
  _7 = (signed long) _1;
  _8 = (signed long) _2;
  _5 = _7 w* _8;
  return _5;
}

2023-02-01  Roger Sayle  <roger@nextmovesoftware.com>
    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
PR target/113560
* tree-ssa-math-opts.cc (is_widening_mult_rhs_p): Use range
information via tree_non_zero_bits to check if this operand
is suitably extended for a widening (or highpart) multiplication.
(convert_mult_to_widen): Insert explicit casts if the RHS or LHS
isn't already of the claimed type.

gcc/testsuite/ChangeLog
PR target/113560
* g++.target/i386/pr113560.C: New test case.
* gcc.target/i386/pr113560.c: Likewise.
* gcc.dg/pr87954.c: Update test case.

Revert "RISC-V: Add non-vector types to dfa pipelines"

This reverts commit 26c34b809cd1a6249027730a8b52bbf6a1c0f4a8.

Revert "RISC-V: Add vector related pipelines"

This reverts commit e56fb037d9d265682f5e7217d8a4c12a8d3fddf8.

Revert "RISC-V: Use default cost model for insn scheduling"

This reverts commit 4b799a16ae59fc0f508c5931ebf1851a3446b707.

Revert "RISC-V: Enable assert for insn_has_dfa_reservation"

This reverts commit 23cd2961bd2ff63583f46e3499a07bd54491d45c.

RISC-V: Enable assert for insn_has_dfa_reservation

Enables assert that every typed instruction is associated with a
dfa reservation

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_sched_variable_issue): enable assert

RISC-V: Use default cost model for insn scheduling

Use default cost model scheduling on these test cases. All these tests
introduce scan dump failures with -mtune generic-ooo. Since the vector
cost models are the same across all three tunes, some of the tests
in PR113249 will be fixed with this patch series.

PR target/113249

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/base/bug-1.C: use default scheduling
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-102.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-108.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-114.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-119.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-12.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-16.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-17.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-19.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-21.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-23.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-25.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-27.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-29.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-31.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-33.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-35.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-4.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-40.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-44.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-50.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-56.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-62.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-68.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-74.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-79.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-8.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-84.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-90.c: ditto
* gcc.target/riscv/rvv/base/binop_vx_constraint-96.c: ditto
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-30.c: ditto
* gcc.target/riscv/rvv/base/pr108185-1.c: ditto
* gcc.target/riscv/rvv/base/pr108185-2.c: ditto
* gcc.target/riscv/rvv/base/pr108185-3.c: ditto
* gcc.target/riscv/rvv/base/pr108185-4.c: ditto
* gcc.target/riscv/rvv/base/pr108185-5.c: ditto
* gcc.target/riscv/rvv/base/pr108185-6.c: ditto
* gcc.target/riscv/rvv/base/pr108185-7.c: ditto
* gcc.target/riscv/rvv/base/shift_vx_constraint-1.c: ditto
* gcc.target/riscv/rvv/vsetvl/pr111037-3.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-28.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-29.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-32.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-33.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_single_block-17.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_single_block-18.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_single_block-19.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-10.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-11.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-12.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-4.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-5.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-6.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-7.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-8.c: ditto
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-9.c: ditto
* gfortran.dg/vect/vect-8.f90: ditto

Signed-off-by: Edwin Lu <ewlu@rivosinc.com>

RISC-V: Add vector related pipelines

Creates new generic vector pipeline file common to all cpu tunes.
Moves all vector related pipelines from generic-ooo to generic-vector-ooo.
Creates new vector crypto related insn reservations.

gcc/ChangeLog:

* config/riscv/generic-ooo.md (generic_ooo): Move reservation
(generic_ooo_vec_load): ditto
(generic_ooo_vec_store): ditto
(generic_ooo_vec_loadstore_seg): ditto
(generic_ooo_vec_alu): ditto
(generic_ooo_vec_fcmp): ditto
(generic_ooo_vec_imul): ditto
(generic_ooo_vec_fadd): ditto
(generic_ooo_vec_fmul): ditto
(generic_ooo_crypto): ditto
(generic_ooo_perm): ditto
(generic_ooo_vec_reduction): ditto
(generic_ooo_vec_ordered_reduction): ditto
(generic_ooo_vec_idiv): ditto
(generic_ooo_vec_float_divsqrt): ditto
(generic_ooo_vec_mask): ditto
(generic_ooo_vec_vesetvl): ditto
(generic_ooo_vec_setrm): ditto
(generic_ooo_vec_readlen): ditto
* config/riscv/riscv.md: include generic-vector-ooo
* config/riscv/generic-vector-ooo.md: New file. to here

Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
Co-authored-by: Robin Dapp <rdapp.gcc@gmail.com>

RISC-V: Add non-vector types to dfa pipelines

This patch adds non-vector related insn reservations and updates/creates
new insn reservations so all non-vector typed instructions have a reservation.

gcc/ChangeLog:

* config/riscv/generic-ooo.md (generic_ooo_sfb_alu): Add reservation
(generic_ooo_branch): ditto
* config/riscv/generic.md (generic_sfb_alu): ditto
(generic_fmul_half): ditto
* config/riscv/riscv.md: Remove cbo, pushpop, and rdfrm types
* config/riscv/sifive-7.md (sifive_7_hfma):Add reservation
(sifive_7_popcount): ditto
* config/riscv/vector.md: change rdfrm to fmove
* config/riscv/zc.md: change pushpop to load/store

Signed-off-by: Edwin Lu <ewlu@rivosinc.com>

aarch64: -mstrict-align vs __arm_data512_t [PR113657]

After r14-1187-gd6b756447cd58b, simplify_gen_subreg can return
NULL for "unaligned" memory subreg. Since V8DI has an alignment of 8 bytes,
using TImode causes simplify_gen_subreg to return NULL.
This fixes the issue by using DImode instead for the loop. And then we will have
later on the STP/LDP pass combine it back into STP/LDP if needed.
Since strict align is less important (usually used for firmware and early boot only),
not doing LDP/STP here is ok.

Built and tested for aarch64-linux-gnu with no regressions.

PR target/113657

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (split for movv8di):
For strict aligned mode, use DImode instead of TImode.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/ls64_strict_align.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

analyzer: fix skipping of debug stmts [PR113253]

PR analyzer/113253 reports a case where the analyzer output varied
with and without -g enabled.

The root cause was that debug stmts were in the
FOR_EACH_IMM_USE_FAST list for SSA names, leading to the analyzer's
state purging logic differing between the -g and non-debugging cases,
and thus leading to differences in the exploration of the user's code.

Fix by skipping such stmts in the state-purging logic, and removing
debug stmts when constructing the supergraph.

gcc/analyzer/ChangeLog:
PR analyzer/113253
* region-model.cc (region_model::on_stmt_pre): Add gcc_unreachable
for debug statements.
* state-purge.cc
(state_purge_per_ssa_name::state_purge_per_ssa_name): Skip any
debug stmts in the FOR_EACH_IMM_USE_FAST list.
* supergraph.cc (supergraph::supergraph): Don't add debug stmts
to the supernodes.

gcc/testsuite/ChangeLog:
PR analyzer/113253
* gcc.dg/analyzer/deref-before-check-pr113253.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

c: Fix ICE for nested enum redefinitions with/without fixed underlying type [PR112571]

Bug 112571 reports an ICE-on-invalid for cases where an enum is
defined, without a fixed underlying type, inside the enum type
specifier for a definition of that same enum with a fixed underlying
type.

The ultimate cause is attempting to access ENUM_UNDERLYING_TYPE in a
case where it is NULL. Avoid this by clearing
ENUM_FIXED_UNDERLYING_TYPE_P in thie case of inconsistent definitions.

Bootstrapped wth no regressions for x86_64-pc-linux-gnu.

PR c/112571

gcc/c/
* c-decl.cc (start_enum): Clear ENUM_FIXED_UNDERLYING_TYPE_P when
defining without a fixed underlying type an enumeration previously
declared with a fixed underlying type.

gcc/testsuite/
* gcc.dg/c23-enum-9.c, gcc.dg/c23-enum-10.c: New tests.