git.ipfire.org Git - thirdparty/gcc.git/log

testsuite/arm: Fix bfloat16_vector_typecheck_[12].c tests [PR 112698]

After commit r14-5617-gb8592186611, int32x[24]_t types now use
elements of 'long int' type instead of 'int' on arm-eabi (it's still
'int' on arm-linux-gnueabihf). Both are 32-bit types anyway.

This patch adjust the two tests so that they optionnally accept 'long '
before 'int' in the expected error message.

2023-11-30 Christophe Lyon <christophe.lyon@linaro.org>

PR target/112698
gcc/testsuite/
* gcc.target/arm/bfloat16_vector_typecheck_1.c: Update expected
error message.
* gcc.target/arm/bfloat16_vector_typecheck_2.c: Likewise.

Fix 'libgomp.c/declare-variant-4-*.c', add 'libgomp.c/declare-variant-4.c'

These test cases being 'dg-skip-if [...] { ! amdgcn-*-* }' meant they were
only ever considered for 'amdgcn-*-*' target -- that is, GCN *target*, not
GCN *offload target*, what is intended to be tested here, but for which they
thus always were UNSUPPORTED. Use the same style of 'dg-[...]' directives
as for the nvptx offloading test cases, 'libgomp.c/declare-variant-3*.c'.

Fix-up for commit 1fd508744eccda9ad9c6d6fcce5b2ea9c568818d
"amdgcn: Support AMD-specific 'isa' traits in OpenMP context selectors".

libgomp/
* testsuite/libgomp.c/declare-variant-4-fiji.c: Adjust.
* testsuite/libgomp.c/declare-variant-4-gfx803.c: Likewise.
* testsuite/libgomp.c/declare-variant-4-gfx900.c: Likewise.
* testsuite/libgomp.c/declare-variant-4-gfx906.c: Likewise.
* testsuite/libgomp.c/declare-variant-4-gfx908.c: Likewise.
* testsuite/libgomp.c/declare-variant-4-gfx90a.c: Likewise.
* testsuite/libgomp.c/declare-variant-4.h: Likewise.
* testsuite/libgomp.c/declare-variant-4.c: New.

Spin 'dg-do run' part of 'libgomp.c/declare-variant-3-sm30.c' off into new 'libgomp.c/declare-variant-3.c'

Having nvptx offloading configured doesn't imply being able to run nvptx
offloading test cases on the test host.

Also, make 'libgomp.c/declare-variant-3.c' work for all non-offloading and
offloading cases.

Fix-up for commit 59b8ade88774b4dcf1691a8f650cdbb86cc30862
"[libgomp, testsuite, nvptx] Add libgomp.c/declare-variant-3-sm*.c".

libgomp/
* testsuite/libgomp.c/declare-variant-3-sm30.c: Turn 'dg-do run'
into 'dg-do link'.
* testsuite/libgomp.c/declare-variant-3.c: New.
* testsuite/libgomp.c/declare-variant-3.h: Extend.

In 'libgomp.c/declare-variant-{3,4}-*.c', restrict 'scan-offload-tree-dump's to 'only_for_offload_target [...]'

... to care for the case where not just one but both of GCN and nvptx
offloading are enabled.  In that case, we currently get:

    UNRESOLVED: libgomp.c/declare-variant-3-sm30.c scan-amdgcn-amdhsa-offload-tree-dump optimized "= f30 \$\$;"

... in addition to:

    PASS: libgomp.c/declare-variant-3-sm30.c scan-nvptx-none-offload-tree-dump optimized "= f30 \$\$;"

Etc.

Fix-up for commit 59b8ade88774b4dcf1691a8f650cdbb86cc30862
"[libgomp, testsuite, nvptx] Add libgomp.c/declare-variant-3-sm*.c",
and commit 1fd508744eccda9ad9c6d6fcce5b2ea9c568818d
"amdgcn: Support AMD-specific 'isa' traits in OpenMP context selectors".

libgomp/
* testsuite/libgomp.c/declare-variant-3-sm30.c: Restrict
'scan-offload-tree-dump' to 'only_for_offload_target nvptx-none'.
* testsuite/libgomp.c/declare-variant-3-sm35.c: Likewise.
* testsuite/libgomp.c/declare-variant-3-sm53.c: Likewise.
* testsuite/libgomp.c/declare-variant-3-sm70.c: Likewise.
* testsuite/libgomp.c/declare-variant-3-sm75.c: Likewise.
* testsuite/libgomp.c/declare-variant-3-sm80.c: Likewise.
* testsuite/libgomp.c/declare-variant-4-fiji.c: Restrict
'scan-offload-tree-dump' to
'only_for_offload_target amdgcn-amdhsa'.
* testsuite/libgomp.c/declare-variant-4-gfx803.c: Likewise.
* testsuite/libgomp.c/declare-variant-4-gfx900.c: Likewise.
* testsuite/libgomp.c/declare-variant-4-gfx906.c: Likewise.
* testsuite/libgomp.c/declare-variant-4-gfx908.c: Likewise.
* testsuite/libgomp.c/declare-variant-4-gfx90a.c: Likewise.

Fix 'libgomp.c/declare-variant-3-*.c' compilation for configurations where GCN offloading is enabled in addition to nvptx

The GCN offloading compiler doesn't like '-misa=sm_30' etc.; restrict to
'-foffload=nvptx-none' compilation only.

Fix-up for commit 59b8ade88774b4dcf1691a8f650cdbb86cc30862
"[libgomp, testsuite, nvptx] Add libgomp.c/declare-variant-3-sm*.c".

libgomp/
* testsuite/libgomp.c/declare-variant-3-sm30.c:
'dg-additional-options -foffload=nvptx-none'.
* testsuite/libgomp.c/declare-variant-3-sm35.c: Likewise.
* testsuite/libgomp.c/declare-variant-3-sm53.c: Likewise.
* testsuite/libgomp.c/declare-variant-3-sm70.c: Likewise.
* testsuite/libgomp.c/declare-variant-3-sm75.c: Likewise.
* testsuite/libgomp.c/declare-variant-3-sm80.c: Likewise.

GCN: Generally enable the 'gcc.target/gcn/avgpr-[...]' test cases

... added in commit ae0d2c240213c5a7f6959c032bfc9f0703cab787
"amdgcn: Add Accelerator VGPR registers". This way, they're correctly tested
no matter what '-march=[...]' is used with 'make check'.

gcc/testsuite/
* gcc.target/gcn/avgpr-mem-double.c: Remove
'dg-skip-if "incompatible ISA" [...]'.
* gcc.target/gcn/avgpr-mem-int.c: Likewise.
* gcc.target/gcn/avgpr-mem-long.c: Likewise.
* gcc.target/gcn/avgpr-mem-short.c: Likewise.
* gcc.target/gcn/avgpr-spill-double.c: Likewise.
* gcc.target/gcn/avgpr-spill-int.c: Likewise.
* gcc.target/gcn/avgpr-spill-long.c: Likewise.
* gcc.target/gcn/avgpr-spill-short.c: Likewise.

AArch64: Fix strict-align cpymem/setmem [PR103100]

The cpymemdi/setmemdi implementation doesn't fully support strict alignment.
Block the expansion if the alignment is less than 16 with STRICT_ALIGNMENT.
Clean up the condition when to use MOPS.

gcc/ChangeLog/
PR target/103100
* config/aarch64/aarch64.md (cpymemdi): Remove pattern condition.
(setmemdi): Likewise.
* config/aarch64/aarch64.cc (aarch64_expand_cpymem): Support
strict-align. Cleanup condition for using MOPS.
(aarch64_expand_setmem): Likewise.

Fortran: fix TARGET attribute of associating entity in ASSOCIATE [PR112764]

The associating entity in an ASSOCIATE construct has the TARGET attribute
if and only if the selector is a variable and has either the TARGET or
POINTER attribute (e.g. F2018:11.1.3.3).

gcc/fortran/ChangeLog:

PR fortran/112764
* primary.cc (gfc_variable_attr): Set TARGET attribute of associating
entity dependent on TARGET or POINTER attribute of selector.

gcc/testsuite/ChangeLog:

PR fortran/112764
* gfortran.dg/associate_62.f90: New test.

tree-optimization/112767 - spurious diagnostic after sccp/loop-split swap

We are diagnosing an unreachable loop which we only manage to elide
after the copyprop pass after sccp which leaves the code open for
diagnosing by the intermittent ivcanon pass. The following makes sure
to clean things up a bit earlier, propagating constant final values
to uses immediately.

PR tree-optimization/112767
* tree-scalar-evolution.cc (final_value_replacement_loop):
Propagate constants to immediate uses immediately.

* gcc.dg/tree-ssa/pr112767.c: New testcase.
* gcc.dg/graphite/pr83255.c: Disable SCCP.

tree-optimization/112766 - improve pruning of uninit diagnostics

Uninit diagnostics has code to prune based on incoming PHI args
that prove the uninit code is never executed. But that only
looks at the first found flag candidate while in the PRs case
only the second candidate would be the one to prune on. The
following patch makes us consider all of the flag candidates
which is cycles well spent IMHO.

PR tree-optimization/112766
* gimple-predicate-analysis.cc (find_var_cmp_const):
Support continuing the iteration and report every candidate.
(uninit_analysis::overlap): Iterate over all flag var
candidates.

* g++.dg/torture/uninit-pr112766.C: New testcase.

RISC-V: Support widening register overlap for vf4/vf8

size_t
foo (char const *buf, size_t len)
{
  size_t sum = 0;
  size_t vl = __riscv_vsetvlmax_e8m8 ();
  size_t step = vl * 4;
  const char *it = buf, *end = buf + len;
  for (; it + step <= end;)
    {
      vint8m1_t v0 = __riscv_vle8_v_i8m1 ((void *) it, vl);
      it += vl;
      vint8m1_t v1 = __riscv_vle8_v_i8m1 ((void *) it, vl);
      it += vl;
      vint8m1_t v2 = __riscv_vle8_v_i8m1 ((void *) it, vl);
      it += vl;
      vint8m1_t v3 = __riscv_vle8_v_i8m1 ((void *) it, vl);
      it += vl;

      asm volatile("nop" ::: "memory");
      vint64m8_t vw0 = __riscv_vsext_vf8_i64m8 (v0, vl);
      vint64m8_t vw1 = __riscv_vsext_vf8_i64m8 (v1, vl);
      vint64m8_t vw2 = __riscv_vsext_vf8_i64m8 (v2, vl);
      vint64m8_t vw3 = __riscv_vsext_vf8_i64m8 (v3, vl);

      asm volatile("nop" ::: "memory");
      size_t sum0 = __riscv_vmv_x_s_i64m8_i64 (vw0);
      size_t sum1 = __riscv_vmv_x_s_i64m8_i64 (vw1);
      size_t sum2 = __riscv_vmv_x_s_i64m8_i64 (vw2);
      size_t sum3 = __riscv_vmv_x_s_i64m8_i64 (vw3);

      sum += sumation (sum0, sum1, sum2, sum3);
    }
  return sum;
}

Before this patch:

        add     a3,s0,s1
        add     a4,s6,s1
        add     a5,s7,s1
        vsetvli zero,s0,e64,m8,ta,ma
        vle8.v  v4,0(s1)
        vle8.v  v3,0(a3)
        mv      s1,s2
        vle8.v  v2,0(a4)
        vle8.v  v1,0(a5)
        nop
        vsext.vf8       v8,v4
        vsext.vf8       v16,v2
        vs8r.v  v8,0(sp)
        vsext.vf8       v24,v1
        vsext.vf8       v8,v3
        nop
        vmv.x.s a1,v8
        vl8re64.v       v8,0(sp)
        vmv.x.s a3,v24
        vmv.x.s a2,v16
        vmv.x.s a0,v8
        add     s2,s2,s5
        call    sumation
        add     s3,s3,a0
        bgeu    s4,s2,.L5

After this patch:

add a3,s0,s1
add a4,s6,s1
add a5,s7,s1
vsetvli zero,s0,e64,m8,ta,ma
vle8.v v15,0(s1)
vle8.v v23,0(a3)
mv s1,s2
vle8.v v31,0(a4)
vle8.v v7,0(a5)
vsext.vf8 v8,v15
vsext.vf8 v16,v23
vsext.vf8 v24,v31
vsext.vf8 v0,v7
vmv.x.s a3,v0
vmv.x.s a2,v24
vmv.x.s a1,v16
vmv.x.s a0,v8
add s2,s2,s5
call sumation
add s3,s3,a0
bgeu s4,s2,.L5

PR target/112431

gcc/ChangeLog:

* config/riscv/vector.md: Add widening overlap of vf2/vf4.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112431-16.c: New test.
* gcc.target/riscv/rvv/base/pr112431-17.c: New test.
* gcc.target/riscv/rvv/base/pr112431-18.c: New test.

RISC-V: Remove earlyclobber for wx/wf instructions.

While working on overlap for widening instructions, I realize that we set
vwadd.wx/vfwadd.wf as earlyclobber which is incorrect.

Since according to RVV ISA:
"The destination EEW equals the source EEW."

vwadd.vx widens the first source operand (i.e. 2 * source EEW = dest EEW) while
vwadd.wx only widens the second/scalar source operand.

Therefore overlap is legal for wx but not for vx.

Before this patch (heave spillings):

        csrr    a5,vlenb
        slli    a5,a5,1
        addi    a5,a5,64
        vfwadd.wf       v2,v14,fs0
        add     a5,a5,sp
        vs2r.v  v2,0(a5)
        vl2re32.v       v2,0(a1)
        vfwadd.wf       v14,v12,fs0
        vfwadd.wf       v12,v10,fs0
        vfwadd.wf       v10,v8,fs0
        vfwadd.wf       v8,v6,fs0
        vfwadd.wf       v6,v4,fs0
        vfwadd.wf       v4,v2,fs0
        vfwadd.wf       v2,v16,fs0
        vfwadd.wf       v16,v18,fs0
        vfwadd.wf       v18,v20,fs0
        vfwadd.wf       v20,v22,fs0
        vfwadd.wf       v22,v24,fs0
        vfwadd.wf       v24,v26,fs0
        vfwadd.wf       v26,v28,fs0
        vfwadd.wf       v28,v30,fs0
        vfwadd.wf       v30,v0,fs0
        nop
        vsetvli zero,zero,e32,m2,ta,ma
        csrr    a5,vlenb

After this patch (no spillings):

        vfwadd.wf v16,v16,fs0
vfwadd.wf v14,v14,fs0
vfwadd.wf v12,v12,fs0
vfwadd.wf v10,v10,fs0
vfwadd.wf v8,v8,fs0
vfwadd.wf v6,v6,fs0
vfwadd.wf v4,v4,fs0
vfwadd.wf v2,v2,fs0
vfwadd.wf v18,v18,fs0
vfwadd.wf v20,v20,fs0
vfwadd.wf v22,v22,fs0
vfwadd.wf v24,v24,fs0
vfwadd.wf v26,v26,fs0
vfwadd.wf v28,v28,fs0
vfwadd.wf v30,v30,fs0
vfwadd.wf v0,v0,fs0

Confirm the codegen above run successfully on both SPIKE/QEMU.

PR target/112431

gcc/ChangeLog:

* config/riscv/vector.md: Remove earlyclobber for wx/wf instructions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112431-19.c: New test.
* gcc.target/riscv/rvv/base/pr112431-20.c: New test.
* gcc.target/riscv/rvv/base/pr112431-21.c: New test.

ada: Rework fix for wrong finalization of qualified aggregate in allocator

The problem is that there is no easy method to insert an action after an
arbitrary node in the tree, so the original fix does not correctly work
when the allocator is nested in another expression.

Therefore this moves the burden of the insertion from Apply_Predicate_Check
to Expand_Allocator_Expression and restricts the new processing to the case
where it is really required.

gcc/ada/

* checks.ads (Apply_Predicate_Check): Add Deref boolean parameter.
* checks.adb (Apply_Predicate_Check): Revert latest change. Use
Loc local variable to hold the source location. Use a common code
path for the generic processing and make a dereference if Deref is
True.
* exp_ch4.adb (Expand_Allocator_Expression): Compute Aggr_In_Place
earlier. If it is true, do not call Apply_Predicate_Check on the
expression on entry but on the temporary on exit with a
dereference.
* sem_res.adb (Resolve_Actuals): Add explicit parameter
association in call to Apply_Predicate_Check.

ada: Support Put_Image for types in user-defined instances of predefined generics.

Predefined units do not generally support the Put_Image attribute.
There are good reasons for this in most cases. But if a user-defined
instantiation of a predefined generic occurs in Ada 2022 code, then
Put_Image can be supported for types declared therein. Add this support.

gcc/ada/

* exp_put_image.adb (Put_Image_Enabled): Return True in more
cases. In particular, when testing to see if a type occurs in a
predefined unit, test the type's code unit
(obtained by calling Get_Code_Unit). In the case of type within a
user-defined instance of a predefined generic, Is_Predefined_Unit
will return True for the type and False for the type's code unit.

ada: Remove SPARK legality checks

SPARK legality checks apply only to code with SPARK_Mode On, and are
performed again in GNATprove for detecting SPARK-compatible declarations
in code with SPARK_Mode Auto. Remove this duplication, to only perform
SPARK legality checking in GNATprove. After this patch, only a few
special SPARK legality checks are performed in the frontend, which could
be moved to GNATprove later.

gcc/ada/

* contracts.adb (Analyze_Entry_Or_Subprogram_Body_Contract):
Remove checking on volatility. Remove handling of SPARK_Mode, not
needed anymore.
(Analyze_Entry_Or_Subprogram_Contract): Remove checking on
volatility.
(Check_Type_Or_Object_External_Properties): Same.
(Analyze_Object_Contract): Same.
* freeze.adb (Freeze_Record_Type): Same. Also remove checking on
synchronized types and ghost types.
* sem_ch12.adb (Instantiate_Object): Remove checking on
volatility.
(Instantiate_Type): Same.
* sem_ch3.adb (Access_Type_Declaration): Same.
(Derived_Type_Declaration): Remove checking related to untagged
partial view.
(Process_Discriminants): Remove checking on volatility.
* sem_ch5.adb (Analyze_Loop_Parameter_Specification): Same.
* sem_ch6.adb (Analyze_Procedure_Call): Fix use of SPARK_Mode
where GNATprove_Mode was intended.
* sem_disp.adb (Inherited_Subprograms): Protect against Empty
node.
* sem_prag.adb (Analyze_Global_In_Decl_Part): Remove checking on
volatility.
(Analyze_Pragma): Same.
* sem_res.adb (Flag_Effectively_Volatile_Objects): Remove.
(Resolve_Actuals): Remove checking on volatility.
(Resolve_Entity_Name): Same.
* sem_util.adb (Check_Nonvolatile_Function_Profile): Remove.
(Check_Volatility_Compatibility): Remove.
* sem_util.ads: Same.

ada: Remove GNATcheck violations

Remove GNATcheck violations by refactoring code and also using
pragma Annotate to exempt them.

gcc/ada/

* libgnat/i-cstrin.adb (Free): Rewrite code so there is only one
return, to remove Improper_Returns violation.
(Position_Of_Nul): Add pragma to exempt Improper_Returns
violation.
(To_Chars_Ptr): Likewise.
(Value): Likewise

ada: Ignore defered compile time errors without backend

We defer some compile time warnings and errors until the
backend has added the extra information needed. However
it is not guaranteed that the backend has run by this point.
Avoid checking these errors if the backend has not been activated
and no code has been generated.

gcc/ada/

* sem_prag.adb (Validate_Compile_Time_Warning_Errors): Avoid
checking compile time warnings and errors if backend has not been
activated.

ada: Fix spelling of functions with(out) "side effects"

Correct spelling does not include an hyphen. Fix comments and one
error message.

Also fix other mispellings of "side-effect" or "side effect" depending
on the case (adjective should have hyphen), and "side-effect-free" with
double hyphen as an adjective.

gcc/ada/

* checks.adb, exp_aggr.adb, exp_ch4.ads, exp_ch5.adb,
exp_util.adb, exp_util.ads, inline.adb, sem_ch13.adb,
sem_ch6.adb, sem_ch8.adb, sem_prag.adb, sem_util.ads: Fix comments
and typos.

ada: Crash initializing component of private record type

The compiler may crash processing the full type declaration of a
private record type that initializes a component with a call to
a function instantiated in the private part of the package.

gcc/ada/

* freeze.adb (Declared_In_Expanded_Body): New subprogram.
(In_Expanded_Body): Minor code cleanup.
(Freeze_Expression): Code cleanup plus factorize in a new function
the code that identifies entities declared in the body of expander
generated subprograms, since such case must be checked also for
other node kinds when climbing the tree to locate the place to
insert the freezing node.

ada: Name resolution in expanded instances

In building the tree for an instance of a generic, expansion sets
entity fields on names that refer to things declared outside of the
instance, but leaves the entity field unset on names that should end
up referring to things declared within the instance. These will instead be
set by analysis - the idea is that if a name resolves a certain way in the
generic, then we should get corresponding results if we resolve the
corresponding name in an instance. For this to work, we have to prevent
unrelated declarations that happen to be visible at the point of the
instantiation from participating in resolution. Add code to filter out such
unwanted name resolution candidates.

gcc/ada/

* sem_ch8.adb (Find_Direct_Name): In the case of a resolving a
name that occurs within an instantiation, add code to detect and
filter out unwanted candidate resolutions. The filtering is
performed via a call to Remove_Interp.

ada: Add comment describing Partition_Elaboration_Policy dependency.

Add a comment in the spec for the default (as opposed to hie) version of
Ada.Real_Time.Timing_Events indicating that it is incompatible with a
a Partition_Elaboration_Policy specification specifying a policy other than
Concurrent.

gcc/ada/

* libgnarl/a-rttiev.ads: add a comment

ada: Too-strict conformance checking for formal discriminated type

The discriminant subtype conformance check for an actual parameter
corresponding to a generic formal discriminated type was too strict and
could incorrectly reject legal instantiations.

gcc/ada/

* sem_ch12.adb (Validate_Discriminated_Formal_Type): Replace
Entity_Id equality test with a call to Subtypes_Match. Distinct
subtypes which are statically matching should pass this test.
(Check_Discriminated_Formal): Replace Entity_Id equality test with
a call to Subtypes_Statically_Match (preceded by a check that the
preconditions for the call are satisfied).

ada: Fix predicate check failure in Expand_Allocator_Expression

The For_Special_Return_Object flag needs to be accessed on entry of the
procedure in case the allocator is rewritten during the processing.

gcc/ada/

* exp_ch4.adb (Expand_Allocator_Expression): Add Special_Return
boolean constant to hold the value of For_Special_Return_Object
for the allocator and use it throughout the procedure.

ada: Fix wrong finalization for qualified aggregate of limited type in allocator

This happens with -gnata when the limited type has controlled components
and a predicate, because the predicate check generated for the aggregate
causes the creation of a temporary that is used as the expression of the
allocator. Now this combination is illegal for a limited type, so the
compiler does not generate the deep adjustment that would be necessary
for the access value, which ultimately results in a wrong finalization.

gcc/ada/

* checks.adb (Apply_Predicate_Check): Also deal specifically with
an expression that is a qualified aggregate in an allocator.

ada: Constant_Indexing used when context requires a variable

In the case of a call with a formal parameter of mode other than "IN"
where the corresponding actual parameter is a generalized indexing
and the indexable container has both Constant_Indexing and Variable_Indexing
aspects specified, the generalized indexing must be interpreted as a
variable indexing, not as a constant indexing. In some cases involving a
call to a prefixed view of a subprogram, this was not handled correctly.
This error results in spurious compile-time error messages saying that
the actual parameter in the call "must be a variable".

gcc/ada/

* sem_ch4.adb (Constant_Indexing_OK): As a temporary stopgap,
return False in the case of an unanalyzed prefixed-view call.

libiberty: Disable hwcaps for sha1.o

This patch

commit bf4f40cc3195eb7b900bf5535cdba1ee51fdbb8e
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Nov 28 13:14:05 2023 +0100

    libiberty: Use x86 HW optimized sha1

broke Solaris/x86 bootstrap with the native as:

libtool: compile:  /var/gcc/regression/master/11.4-gcc/build/./gcc/gccgo -B/var/gcc/regression/master/11.4-gcc/build/./gcc/ -B/vol/gcc/i386-pc-solaris2.11/bin/ -B/vol/gcc/i386-pc-solaris2.11/lib/ -isystem /vol/gcc/i386-pc-solaris2.11/include -isystem /vol/gcc/i386-pc-solaris2.11/sys-include -fchecking=1 -minline-all-stringops -O2 -g -I . -c -fgo-pkgpath=internal/goarch /vol/gcc/src/hg/master/local/libgo/go/internal/goarch/goarch.go zgoarch.go
ld.so.1: go1: fatal: /var/gcc/regression/master/11.4-gcc/build/gcc/go1: hardware capability (CA_SUNW_HW_2) unsupported: 0x4000000  [ SHA1 ]
gccgo: fatal error: Killed signal terminated program go1

As is already done in a couple of other similar cases, this patches
disables hwcaps support for libiberty.

Initially, this didn't work because config/hwcaps.m4 uses target_os, but
didn't ensure it is defined.

Tested on i386-pc-solaris2.11 with as and gas.

2023-11-29  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

config:
* hwcaps.m4 (GCC_CHECK_ASSEMBLER_HWCAP): Require
AC_CANONICAL_TARGET.

libiberty:
* configure.ac (GCC_CHECK_ASSEMBLER_HWCAP): Invoke.
* configure, aclocal.m4: Regenerate.
* Makefile.in (COMPILE.c): Add HWCAP_CFLAGS.

wide-int: Fix wide_int division/remainder [PR112733]

This patch fixes the other half of the PR, where we were crashing on the
UNSIGNED widest_int modulo of 1 % -128 (where the -128 in UNSIGNED is
131072 bit 0xfffff...fffff80).
The patch actually has 2 independent parts.
The fix for the divmod buffer overflow is in the 2nd-5th and 7th-8th hunks of
the patch.  abs (remainder) <= min (abs (dividend), abs (divisor)) and the
wide-int.h callers estimate the remainder size to be the same as the
quotient size, i.e. for UNSIGNED dividend with msb set quotient (same as
remainder) precision based estimation, for SIGNED or dividend with msb
clear estimate based on divident len (+ 1).
For quotient (if any), wi_pack is called as
  quotient_len = wi_pack (quotient, b_quotient, m, dividend_prec);
where m is
  m = dividend_blocks_needed;
  while (m > 1 && b_dividend[m - 1] == 0)
    m--;
and
  dividend_blocks_needed = 2 * BLOCKS_NEEDED (dividend_prec);
where wi_pack stores to result (in_len + 1) / 2 limbs (or one more
if (in_len & 1) == 0 and in_len / 2 < BLOCKS_NEEDED (dividend_prec)).
In any case, that is never more than BLOCKS_NEEDED (dividend_prec) and
matches caller's estimations (then it is canonicalized and perhaps shrunk
that way).

The problematic case is the remainder, where wi_pack is called as
  *remainder_len = wi_pack (remainder, b_remainder, n, dividend_prec);
The last argument is again based on the dividend precision like for
quotient, but n is instead:
  n = divisor_blocks_needed;
  while (n > 1 && b_divisor[n - 1] == 0)
    n--;
so based on the divisor rather than dividend.  Now, in the
wi::multiple_of_p (1, -128, UNSIGNED) widest_int precision case,
dividend_prec is very small, the dividend is 1 and while the official
precision is 131072 bits, dividend_len is 1 and
  if (sgn == SIGNED || dividend_val[dividend_len - 1] >= 0)
    dividend_prec = MIN ((dividend_len + 1) * HOST_BITS_PER_WIDE_INT,
                         dividend_prec);
shrinks the estimate to 128 bits in this case; but divisor_prec
is huge, because the divisor is negative UNSIGNED, so 131072 bits,
2048 limbs, 4096 half-limbs (so divisor_blocks_needed and n are 4096).
Now, wi_pack relies on the penultimate argument to be smaller or equal
to 2 * BLOCKS_NEEDED of the last argument and that is not the case here,
4096 is certainly larger than 2 * BLOCKS_NEEDED (128) aka 4 and so
wi_pack overflows the array.
Unfortunately looking around, this isn't the only overflow, already
in divmod_internal_2 we have an overflow, though there not overflowing
a buffer during writing, but during reading.  When divmod_internal_2
is called with the last 2 arguments like m=1, n=4096 as in this case
(but generally any where m < n), it has a special case for n == 1 (which
won't be the problem, we never have m or n 0), then decides based on
msb of divisor whether there should be some shift canonicalization, then
performs a
  for (j = m - n; j >= 0; j--)
main loop and finally initializes b_remainder:
  if (s)
    for (i = 0; i < n; i++)
      b_remainder[i] = (b_dividend[i] >> s)
        | (b_dividend[i+1] << (HOST_BITS_PER_HALF_WIDE_INT - s));
  else
    for (i = 0; i < n; i++)
      b_remainder[i] = b_dividend[i];
In the usual case of m >= n, the main loop will iterate at least once,
b_dividend has m + 1 valid elements and the copying to b_remainder
is correct.  But for the m < n unusual case, the main loop never iterates
and we want to copy the b_dividend right into b_remainder (possibly with
shifts).  Except when it copies n elements, it copies garbage which isn't
there at all for the last n - m elements.
Because the decision whether the main loop should iterate at all or not
and the s computation should be based on the original n, the following
patch sets n to MIN (n, m) only after the main loop, such that we copy
to b_remainder at most what is initialized in b_dividend, and returns
that adjusted value to the caller which then passes it to wi_pack and
so satisfies again the requirement there, rather than trying to
do the n = MIN (n, m) before calling divmod_internal_2 or after it.

I believe the bug is mostly something I've only introduced myself in the
> 576 bit wide_int/widest_int support changes, before that there was
no attempt to decrease the precisions and so I think dividend_prec
and divisor_prec were pretty much always the same (or always?),
and even the copying of garbage wasn't the case because the only time
m or n was decreased from the same precision was if there were 0s in the
upper half-limbs.

The 1st and 6th hunks in the patch is just that while looking at the
code, I've noticed I computed wrong the sizes in the XALLOCAVEC case,
somehow the 4 * in the static arrays led me believe I need to make the
alloced buffers twice (in the multiplication) or 4 times (in the
division/modulo cases) larger than needed.

2023-11-30  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/112733
* wide-int.cc (wi::mul_internal): Don't allocate twice as much
space for u, v and r as needed.
(divmod_internal_2): Change return type from void to int, for n == 1
return 1, otherwise before writing b_dividend into b_remainder set
n to MIN (n, m) and at the end return it.
(wi::divmod_internal): Don't allocate 4 times as much space for
b_quotient, b_remainder, b_dividend and b_divisor.  Set n to
result of divmod_internal_2.
(wide_int_cc_tests): Add test for unsigned widest_int
wi::multiple_of_p of 1 and -128.

c++: Implement C++26 P2169R4 - Placeholder variables with no name [PR110349]

The following patch implements the C++26 P2169R4 paper.
As written in the PR, the patch expects that:
1) https://eel.is/c++draft/expr.prim.lambda.capture#2
   "Ignoring appearances in initializers of init-captures, an identifier
    or this shall not appear more than once in a lambda-capture."
   is adjusted such that name-independent lambda captures with initializers
   can violate this rule (but lambda captures which aren't name-independent
   can't appear after name-independent ones)
2) https://eel.is/c++draft/class.mem#general-5
   "A member shall not be declared twice in the member-specification,
    except that"
   having an exception that name-independent non-static data member
   declarations can appear multiple times (but again, if there is
   a member which isn't name-independent, it can't appear after
   name-independent ones)
3) it assumes that any name-independent declarations which weren't
   previously valid result in the _ lookups being ambiguous, not just
   if there are 2 _ declarations in the same scope, in particular the
   https://eel.is/c++draft/basic.scope#block-2 mentioned cases
4) it assumes that _ in static function/block scope structured bindings
   is never name-independent like in namespace scope structured bindings;
   it matches clang behavior and is consistent with e.g. static type _;
   not being name-independent both at namespace scope and at function/block
   scope

As you preferred in the PR, for local scope bindings, the ambiguous cases
use a TREE_LIST with the ambiguous cases which can often be directly fed
into print_candidates.  For member_vec after sorting/deduping, I chose to use
instead OVERLOAD with a new flag but only internally inside of the
member_vec, get_class_binding_direct turns it into a TREE_LIST.  There are
2 reasons for that, in order to keep the member_vec binary search fast, I
think it is better to keep OVL_NAME usable on all elements because having
to special case TREE_LIST would slow everything down, and the callers need
to be able to chain the results anyway and so need an unshared TREE_LIST
they can tweak/destroy anyway.
name-independent declarations (even in older standards) will not have
-Wunused{,-variable,-but-set-variable} or -Wshadow* warnings diagnosed, but
unlike e.g. the clang implementation, this patch does diagnose
-Wunused-parameter for parameters with _ names because they aren't
name-independent and one can just omit their name instead.

2023-11-30  Jakub Jelinek  <jakub@redhat.com>

PR c++/110349
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Predefine
__cpp_placeholder_variables=202306L for C++26.
gcc/cp/
* cp-tree.h: Implement C++26 P2169R4 - Placeholder variables with no
name.
(OVL_NAME_INDEPENDENT_DECL_P): Define.
(add_capture): Add unsigned * argument.
(name_independent_decl_p): New inline function.
* name-lookup.cc (class name_lookup): Make ambiguous and
add_value members public.
(name_independent_linear_search): New function.
(get_class_binding_direct): Handle member_vec_binary_search
returning OVL_NAME_INDEPENDENT_DECL_P OVERLOAD.  Use
name_independent_linear_search rather than fields_linear_search
for linear lookup of _ name if !want_type.
(member_name_cmp): Sort name-independent declarations first.
(member_vec_dedup): Handle name-independent declarations.
(pop_local_binding): Handle binding->value being a TREE_LIST for
ambiguous name-independent declarations.
(supplement_binding): Handle name-independent declarations.
(update_binding): Likewise.
(check_local_shadow): Return tree rather than void, normally
NULL_TREE but old for name-independent declarations which used
to conflict with outer scope declaration.  Don't emit -Wshadow*
warnings for name-independent declarations.
(pushdecl): Handle name-independent declarations.
* search.cc (lookup_field_r): Handle nval being a TREE_LIST.
* lambda.cc (build_capture_proxy): Adjust for ___.<number>
names of members.
(add_capture): Add NAME_INDEPENDENT_CNT argument.  Use ___.<number>
name rather than ___ for second and following capture with
_ name.
(add_default_capture): Adjust add_capture caller.
* decl.cc (poplevel): Don't warn about name-independent declarations.
(duplicate_decls): If in C++26 a _ named declaration conflicts with
earlier declarations, emit explaining note why the new declaration
is not name-independent.
(reshape_init_class): If field is a TREE_LIST, emit an ambiguity
error with list of candidates rather than error about non-existing
non-static data member.
* parser.cc (cp_parser_lambda_introducer): Adjust add_capture callers.
Allow name-independent capture redeclarations.
(cp_parser_decomposition_declaration): Set decl_specs.storage_class
to sc_static for static structured bindings.
* pt.cc (tsubst_lambda_expr): Adjust add_capture caller.
gcc/testsuite/
* g++.dg/cpp26/name-independent-decl1.C: New test.
* g++.dg/cpp26/name-independent-decl2.C: New test.
* g++.dg/cpp26/name-independent-decl3.C: New test.
* g++.dg/cpp26/name-independent-decl4.C: New test.
* g++.dg/cpp26/name-independent-decl5.C: New test.
* g++.dg/cpp26/name-independent-decl6.C: New test.
* g++.dg/cpp26/feat-cxx26.C: Add __cpp_placeholder_variables test.

Support sdot_prodv*qi with emulation of sdot_prodv*hi.

Currently sdot_prodv*qi is available under TARGET_AVXVNNIINT8, but it
can be emulated by

vec_unpacks_lo_v32qi
vec_unpacks_lo_v32qi
vec_unpacks_hi_v32qi
vec_unpacks_hi_v32qi
sdot_prodv16hi
sdot_prodv16hi
add3v8si

which is faster than original

  vect_patt_39.11_48 = WIDEN_MULT_LO_EXPR <vect__3.7_44, vect__7.10_47>;
  vect_patt_39.11_49 = WIDEN_MULT_HI_EXPR <vect__3.7_44, vect__7.10_47>;
  vect_patt_38.14_54 = [vec_unpack_lo_expr] vect_patt_39.11_48;
  vect_patt_38.14_55 = [vec_unpack_hi_expr] vect_patt_39.11_48;
  vect_patt_38.14_56 = [vec_unpack_lo_expr] vect_patt_39.11_49;
  vect_patt_38.14_57 = [vec_unpack_hi_expr] vect_patt_39.11_49;
  vect_sum_15.15_59 = vect_patt_38.14_54 + vect_patt_38.14_55;
  vect_sum_15.15_60 = vect_patt_38.14_56 + vect_sum_15.15_59;
  vect_sum_15.15_61 = vect_patt_38.14_57 + vect_sum_15.15_60;

gcc/ChangeLog:

* config/i386/sse.md (sdot_prodv64qi): New expander.
(sseunpackmodelower): New mode attr.
(sdot_prod<mode>): Emulate sdot_prodv*qi with sodt_prov*hi
when TARGET_VNNIINT8 is not available.

gcc/testsuite/ChangeLog:

* gcc.target/i386/sdotprodint8_emulate.c: New test.

Use vec_extact_lo instead of subreg in reduc_<code>_scal_m.

Loop vectorizer will use vec_perm to select lower part of a vector,
there could be some redundancy when using subreg in
reduc_<code>_scal_m, because rtl cse can't figure out vec_select lower
part is just subreg.

I'm trying to canonicalize vec_select to subreg like aarch64 did, but
there're so many regressions, some are easy to fix, some requires
middle-end adjustment.

So for simplicity, the patch use vec_select instead of subreg in
reduc_<code>_scal_m.

gcc/ChangeLog:

* config/i386/sse.md: (reduc_plus_scal_<mode>): Use
vec_extract_lo instead of subreg.
(reduc_<code>_scal_<mode>): Ditto.
(reduc_<code>_scal_<mode>): Ditto.
(reduc_<code>_scal_<mode>): Ditto.
(reduc_<code>_scal_<mode>): Ditto.

Revert "testsuite: analyzer: expect alignment warning with -fshort-enums"

This reverts commit 0e0e3420dfe1d302145b4eb3bbf311a4f39eeced.

c++: mark short-enums as packed

Unlike C, C++ doesn't mark enums shortened by -fshort-enums as packed.
This makes for undesirable warning differences between C and C++,
e.g. c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early*.c
triggers a warning about a type cast from a pointer to enum that, when
packed, might not be sufficiently aligned.

This change is not enough for that warning to trigger.  The tree
expression generated by the C++ front-end is also a little too
complicated for us get to the base pointer.  A separate patch takes
care of that.

for  gcc/cp/ChangeLog

* decl.cc (finish_enum_value_list): Set TYPE_PACKED if
use_short_enum, and propagate it to variants.

c++: remove LAMBDA_EXPR_MUTABLE_P

In review of the deducing 'this' patch it came up that LAMBDA_EXPR_MUTABLE_P
doesn't make sense for a lambda with an explicit object parameter. And it
was never necessary, so let's remove it.

gcc/cp/ChangeLog:

* cp-tree.h (LAMBDA_EXPR_MUTABLE_P): Remove.
* cp-tree.def: Remove documentation.
* lambda.cc (build_lambda_expr): Remove reference.
* parser.cc (cp_parser_lambda_declarator_opt): Likewise.
* pt.cc (tsubst_lambda_expr): Likewise.
* ptree.cc (cxx_print_lambda_node): Likewise.
* semantics.cc (capture_decltype): Get the object quals
from the object instead.

RISC-V: Fix 'E' extension version to test

Commit 006e90e1 ("RISC-V: Initial RV64E and LP64E support") caused a
regression (test failure) but this is caused by failing to track GCC
changes in that test case (not a true GCC bug).

This commit fixes the test case to track the latest GCC with 'E'
extension version 2.0 (ratified).

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-13.c: Fix 'E' extension version to test.

RISC-V: Support highpart overlap for floating-point widen instructions

This patch leverages the approach of vwcvt/vext.vf2 which has been approved.
Their approaches are totally the same.

Tested no regression and committed.

PR target/112431

gcc/ChangeLog:

* config/riscv/vector.md: Add widenning overlap.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112431-10.c: New test.
* gcc.target/riscv/rvv/base/pr112431-11.c: New test.
* gcc.target/riscv/rvv/base/pr112431-12.c: New test.
* gcc.target/riscv/rvv/base/pr112431-13.c: New test.
* gcc.target/riscv/rvv/base/pr112431-14.c: New test.
* gcc.target/riscv/rvv/base/pr112431-15.c: New test.
* gcc.target/riscv/rvv/base/pr112431-7.c: New test.
* gcc.target/riscv/rvv/base/pr112431-8.c: New test.
* gcc.target/riscv/rvv/base/pr112431-9.c: New test.

RISC-V: Rename vconstraint into group_overlap

Fix for Robin's suggestion.

gcc/ChangeLog:

* config/riscv/constraints.md (TARGET_VECTOR ? V_REGS : NO_REGS): Fix constraint.
* config/riscv/riscv.md (no,W21,W42,W84,W41,W81,W82): Rename vconstraint into group_overlap.
(no,yes): Ditto.
(none,W21,W42,W84,W43,W86,W87): Ditto.
* config/riscv/vector.md: Ditto.

RISC-V: Support highpart overlap for vext.vf

PR target/112431

gcc/ChangeLog:

* config/riscv/vector.md: Support highpart overlap for vext.vf2

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/unop_v_constraint-2.c: Adapt test.
* gcc.target/riscv/rvv/base/pr112431-4.c: New test.
* gcc.target/riscv/rvv/base/pr112431-5.c: New test.
* gcc.target/riscv/rvv/base/pr112431-6.c: New test.

Daily bump.

aarch64: Add support for Ampere-1B (-mcpu=ampere1b) CPU

This patch adds initial support for Ampere-1B core.

The Ampere-1B core implements ARMv8.7 with the following (compiler
visible) extensions:
- CSSC (Common Short Sequence Compression instructions),
- MTE (Memory Tagging Extension)
- SM3/SM4

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add ampere-1b
* config/aarch64/aarch64-cost-tables.h: Add ampere1b_extra_costs
* config/aarch64/aarch64-tune.md: Regenerate
* config/aarch64/aarch64.cc: Include ampere1b tuning model
* doc/invoke.texi: Document -mcpu=ampere1b
* config/aarch64/tuning_models/ampere1b.h: New file.

c++: P2280R4, Using unknown refs in constant expr [PR106650]

This patch is an attempt to implement (part of?) P2280, Using unknown
pointers and references in constant expressions.  (Note that R4 seems to
only allow References to unknown/Accesses via this, but not Pointers to
unknown.)

This patch works to the extent that the test case added in [expr.const]
works as expected, as well as the test in
<https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2280r4.html#the-this-pointer>

Most importantly, the proposal makes this compile:

  template <typename T, size_t N>
  constexpr auto array_size(T (&)[N]) -> size_t {
      return N;
  }

  void check(int const (&param)[3]) {
      constexpr auto s = array_size(param);
      static_assert (s == 3);
  }

and I think it would be a pity not to have it in GCC 14.

What still doesn't work is the test in $3.2:

  struct A2 { constexpr int f() { return 0; } };
  struct B2 : virtual A2 {};
  void f2(B2 &b) { constexpr int k = b.f(); }

where we say
error: '* & b' is not a constant expression

This will be fixed in the future.

PR c++/106650

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_constant_expression) <case PARM_DECL>: Allow
reference to unknown/this as per P2280.
<case VAR_DECL>: Allow reference to unknown as per P2280.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-array-ptr6.C: Remove dg-error.
* g++.dg/cpp0x/constexpr-ref12.C: Likewise.
* g++.dg/cpp0x/constexpr-ref2.C: Adjust dg-error.
* g++.dg/cpp0x/noexcept34.C: Remove dg-error.
* g++.dg/cpp1y/lambda-generic-const10.C: Likewise.
* g++.dg/cpp0x/constexpr-ref13.C: New test.
* g++.dg/cpp1z/constexpr-ref1.C: New test.
* g++.dg/cpp1z/constexpr-ref2.C: New test.
* g++.dg/cpp2a/constexpr-ref1.C: New test.

libbacktrace: call GetModuleFileNameA on Windows

Patch from Björn Schäpers.

* fileline.c: Include <windows.h> if available.
(windows_get_executable_path): New static function.
(fileline_initialize): Call windows_get_executable_path.
* configure.ac: Checked for windows.h
* configure: Regenerate.
* config.h.in: Regenerate.

c++: fix testcase [PR112765]

PR c++/112765

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wparentheses-33.C: Compile with -Wparentheses.

c++: bogus -Wparentheses warning [PR112765]

We need to consistently look through implicit INDIRECT_REF when
setting/checking for -Wparentheses warning suppression. In passing
use the recently introduced STRIP_REFERENCE_REF helper some more.

PR c++/112765

gcc/cp/ChangeLog:

* pt.cc (tsubst_expr) <case MODOP_EXPR>: Look through implicit
INDIRECT_REF when propagating -Wparentheses warning suppression.
* semantics.cc (maybe_warn_unparenthesized_assignment): Replace
REFERENCE_REF_P handling with STRIP_REFERENCE_REF.
(finish_parenthesized_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wparentheses-33.C: New test.

bpf: change ASM_COMMENT_START to '#'

The BPF "pseudo-C" assembly dialect uses semi-colon (;) to separate
statements, not to begin line comments. The GNU assembler was recently
changed accordingly:

https://sourceware.org/pipermail/binutils/2023-November/130867.html

This patch adapts the BPF backend in GCC accordingly, to use a hash (#)
instead of semi-colon (;) for ASM_COMMENT_START. This is supported
already in clang.

gcc/
* config/bpf/bpf.h (ASM_COMMENT_START): Change from ';' to '#'.

gcc/testsuite/
* gcc.target/bpf/core-builtin-enumvalue-opt.c: Change dg-final
scans to not assume a specific comment character.
* gcc.target/bpf/core-builtin-enumvalue.c: Likewise.
* gcc.target/bpf/core-builtin-type-based.c: Likewise.
* gcc.target/bpf/core-builtin-type-id.c: Likewise.

rs6000: Fix up c-c++-common/builtin-classify-type-1.c failure [PR112725]

The rs6000 backend (and s390 one as well) diagnoses passing vector types
to unprototyped functions, which breaks the builtin-classify-type-1.c test.
The builtin isn't really unprototyped, it is just type-generic and accepting
vector types is just fine there, all it does is categorize the vector type.
The following patch makes sure we don't diagnose it for this builtin.

2023-11-29 Jakub Jelinek <jakub@redhat.com>

PR target/112725
* config/rs6000/rs6000.cc (invalid_arg_for_unprototyped_fn): Return
NULL for __builtin_classify_type calls with vector arguments.

Check operands before invoking fold_range.

Call check_operands_p before fold_range to make sure it is a valid operation.

PR tree-optimization/111922
gcc/
* ipa-cp.cc (ipa_vr_operation_and_type_effects): Check the
operands are valid before calling fold_range.

gcc/testsuite/
* gcc.dg/pr111922.c: New.

Add operand_check_p to range-ops.

Add an optional method to verify operands are compatible, and check
the operands before all range operations.

* range-op-mixed.h (operator_equal::operand_check_p): New.
(operator_not_equal::operand_check_p): New.
(operator_lt::operand_check_p): New.
(operator_le::operand_check_p): New.
(operator_gt::operand_check_p): New.
(operator_ge::operand_check_p): New.
(operator_plus::operand_check_p): New.
(operator_abs::operand_check_p): New.
(operator_minus::operand_check_p): New.
(operator_negate::operand_check_p): New.
(operator_mult::operand_check_p): New.
(operator_bitwise_not::operand_check_p): New.
(operator_bitwise_xor::operand_check_p): New.
(operator_bitwise_and::operand_check_p): New.
(operator_bitwise_or::operand_check_p): New.
(operator_min::operand_check_p): New.
(operator_max::operand_check_p): New.
* range-op.cc (range_op_handler::fold_range): Check operand
parameter types.
(range_op_handler::op1_range): Ditto.
(range_op_handler::op2_range): Ditto.
(range_op_handler::operand_check_p): New.
(range_operator::operand_check_p): New.
(operator_lshift::operand_check_p): New.
(operator_rshift::operand_check_p): New.
(operator_logical_and::operand_check_p): New.
(operator_logical_or::operand_check_p): New.
(operator_logical_not::operand_check_p): New.
* range-op.h (range_operator::operand_check_p): New.
(range_op_handler::operand_check_p): New.

tree-sra: Avoid returns of references to SRA candidates

The enhancement to address PR 109849 contained an importsnt thinko,
and that any reference that is passed to a function and does not
escape, must also not happen to be aliased by the return value of the
function.  This has quickly transpired as bugs PR 112711 and PR
112721.

Just as IPA-modref does a good enough job to allow us to rely on the
escaped set of variables, it sems to be doing well also on updating
EAF_NOT_RETURNED_DIRECTLY call argument flag which happens to address
exactly the situation we need to avoid.  Of course, if a call
statement ignores any returned value, we also do not need to check the
flag.

Hopefully this does not pessimize things too much, I have verified
that the PR 109849 testcae remains quick and so should also the
benchmark it is derived from.

gcc/ChangeLog:

2023-11-27  Martin Jambor  <mjambor@suse.cz>

PR tree-optimization/112711
PR tree-optimization/112721
* tree-sra.cc (build_access_from_call_arg): New parameter
CAN_BE_RETURNED, disqualify any candidate passed by reference if it is
true.  Adjust leading comment.
(scan_function): Pass appropriate value to CAN_BE_RETURNED of
build_access_from_call_arg.

gcc/testsuite/ChangeLog:

2023-11-29  Martin Jambor  <mjambor@suse.cz>

PR tree-optimization/112711
PR tree-optimization/112721
* g++.dg/tree-ssa/pr112711.C: New test.
* gcc.dg/tree-ssa/pr112721.c: Likewise.

In 'libgomp.c/target-simd-clone-{1,2,3}.c', restrict 'scan-offload-ipa-dump's to 'only_for_offload_target amdgcn-amdhsa'

This gets rid of UNRESOLVEDs if nvptx offloading compilation is enabled in
addition to GCN:

     PASS: libgomp.c/target-simd-clone-1.c (test for excess errors)
     PASS: libgomp.c/target-simd-clone-1.c scan-amdgcn-amdhsa-offload-ipa-dump simdclone "Generated local clone _ZGV.*N.*_addit"
    -UNRESOLVED: libgomp.c/target-simd-clone-1.c scan-nvptx-none-offload-ipa-dump simdclone "Generated local clone _ZGV.*N.*_addit"
     PASS: libgomp.c/target-simd-clone-1.c scan-amdgcn-amdhsa-offload-ipa-dump simdclone "Generated local clone _ZGV.*M.*_addit"
    -UNRESOLVED: libgomp.c/target-simd-clone-1.c scan-nvptx-none-offload-ipa-dump simdclone "Generated local clone _ZGV.*M.*_addit"
     PASS: libgomp.c/target-simd-clone-2.c (test for excess errors)
     PASS: libgomp.c/target-simd-clone-2.c scan-amdgcn-amdhsa-offload-ipa-dump-not simdclone "Generated .* clone"
    -UNRESOLVED: libgomp.c/target-simd-clone-2.c scan-nvptx-none-offload-ipa-dump-not simdclone "Generated .* clone"
     PASS: libgomp.c/target-simd-clone-3.c (test for excess errors)
     PASS: libgomp.c/target-simd-clone-3.c scan-amdgcn-amdhsa-offload-ipa-dump simdclone "device doesn't match"
    -UNRESOLVED: libgomp.c/target-simd-clone-3.c scan-nvptx-none-offload-ipa-dump simdclone "device doesn't match"
     PASS: libgomp.c/target-simd-clone-3.c scan-amdgcn-amdhsa-offload-ipa-dump-not simdclone "Generated .* clone"
    -UNRESOLVED: libgomp.c/target-simd-clone-3.c scan-nvptx-none-offload-ipa-dump-not simdclone "Generated .* clone"

Minor fix-up for commit 309e2d95e3b930c6f15c8a5346b913158404c76d
'OpenMP: Generate SIMD clones for functions with "declare target"'.

libgomp/
* testsuite/libgomp.c/target-simd-clone-1.c: Restrict
'scan-offload-ipa-dump's to
'only_for_offload_target amdgcn-amdhsa'.
* testsuite/libgomp.c/target-simd-clone-2.c: Likewise.
* testsuite/libgomp.c/target-simd-clone-3.c: Likewise.

testsuite: Add 'only_for_offload_target' wrapper for 'scan-offload-tree-dump' etc.

This allows restricting scans to one specific offload target only.

gcc/
* doc/sourcebuild.texi (Final Actions): Document
'only_for_offload_target' wrapper.
gcc/testsuite/
* lib/scanoffload.exp (only_for_offload_target): New 'proc'.

testsuite, i386: Only check for cfi directives if supported [PR112729]

gcc.target/i386/apx-interrupt-1.c and two more tests FAIL on Solaris/x86
with the native assembler.  Like Darwin as, it doesn't support cfi
directives.  Instead of adding more and more targets in every affected
test, this patch introduces a cfi effective-target keyword to check for
the prerequisite.

Tested on i386-pc-solaris2.11 (as and gas), x86_64-pc-linux-gnu, and
x86_64-apple-darwin23.1.0.

2023-11-24  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
PR testsuite/112729
* lib/target-supports.exp (check_effective_target_cfi): New proc.
* gcc.target/i386/apx-interrupt-1.c: Require cfi instead of
skipping on *-*-darwin*.
* gcc.target/i386/apx-push2pop2_force_drap-1.c: Likewise.
* gcc.target/i386/apx-push2pop2-1.c: Likewise.

gcc:
PR testsuite/112729
* doc/sourcebuild.texi (Effective-Target Keywords, Environment
attributes): Document cfi.

Fix 'g++.dg/cpp26/static_assert1.C' for '-fno-exceptions' configurations

This test case, added in recent commit 6ce952188ab39e303e4f63e474b5cba83b5b12fd
"c++: Implement C++26 P2741R3 - user-generated static_assert messages [PR110348]",
expectedly runs into 'UNSUPPORTED: [...]: exception handling disabled', but
along the way also FAILs a few tests:

    UNSUPPORTED: g++.dg/cpp26/static_assert1.C  -std=gnu++98
    PASS: g++.dg/cpp26/static_assert1.C  -std=gnu++11  (test for warnings, line 6)
    [...]
    PASS: g++.dg/cpp26/static_assert1.C  -std=gnu++11  (test for warnings, line 51)
    FAIL: g++.dg/cpp26/static_assert1.C  -std=gnu++11  (test for errors, line 52)
    PASS: g++.dg/cpp26/static_assert1.C  -std=gnu++11  (test for warnings, line 56)
    PASS: g++.dg/cpp26/static_assert1.C  -std=gnu++11  (test for warnings, line 57)
    FAIL: g++.dg/cpp26/static_assert1.C  -std=gnu++11  at line 58 (test for errors, line 57)
    PASS: g++.dg/cpp26/static_assert1.C  -std=gnu++11  (test for warnings, line 59)
    [...]
    PASS: g++.dg/cpp26/static_assert1.C  -std=gnu++11  at line 308 (test for errors, line 307)
    UNSUPPORTED: g++.dg/cpp26/static_assert1.C  -std=gnu++11: exception handling disabled
    PASS: g++.dg/cpp26/static_assert1.C  -std=gnu++14  (test for warnings, line 6)
    [...]
    PASS: g++.dg/cpp26/static_assert1.C  -std=gnu++14  (test for warnings, line 51)
    FAIL: g++.dg/cpp26/static_assert1.C  -std=gnu++14  (test for errors, line 52)
    PASS: g++.dg/cpp26/static_assert1.C  -std=gnu++14  (test for warnings, line 56)
    PASS: g++.dg/cpp26/static_assert1.C  -std=gnu++14  (test for warnings, line 57)
    FAIL: g++.dg/cpp26/static_assert1.C  -std=gnu++14  at line 58 (test for errors, line 57)
    PASS: g++.dg/cpp26/static_assert1.C  -std=gnu++14  (test for warnings, line 59)
    [...]
    PASS: g++.dg/cpp26/static_assert1.C  -std=gnu++14  at line 257 (test for errors, line 256)
    FAIL: g++.dg/cpp26/static_assert1.C  -std=gnu++14  (test for errors, line 261)
    PASS: g++.dg/cpp26/static_assert1.C  -std=gnu++14  (test for warnings, line 262)
    [...]
    PASS: g++.dg/cpp26/static_assert1.C  -std=gnu++14  at line 308 (test for errors, line 307)
    UNSUPPORTED: g++.dg/cpp26/static_assert1.C  -std=gnu++14: exception handling disabled
    PASS: g++.dg/cpp26/static_assert1.C  -std=gnu++20  (test for warnings, line 6)
    [...]
    PASS: g++.dg/cpp26/static_assert1.C  -std=gnu++20  (test for warnings, line 51)
    FAIL: g++.dg/cpp26/static_assert1.C  -std=gnu++20  (test for errors, line 52)
    PASS: g++.dg/cpp26/static_assert1.C  -std=gnu++20  (test for warnings, line 56)
    PASS: g++.dg/cpp26/static_assert1.C  -std=gnu++20  (test for warnings, line 57)
    FAIL: g++.dg/cpp26/static_assert1.C  -std=gnu++20  at line 58 (test for errors, line 57)
    PASS: g++.dg/cpp26/static_assert1.C  -std=gnu++20  (test for warnings, line 59)
    [...]
    PASS: g++.dg/cpp26/static_assert1.C  -std=gnu++20  at line 257 (test for errors, line 256)
    FAIL: g++.dg/cpp26/static_assert1.C  -std=gnu++20  (test for errors, line 261)
    PASS: g++.dg/cpp26/static_assert1.C  -std=gnu++20  (test for warnings, line 262)
    [...]
    PASS: g++.dg/cpp26/static_assert1.C  -std=gnu++20  at line 308 (test for errors, line 307)
    UNSUPPORTED: g++.dg/cpp26/static_assert1.C  -std=gnu++20: exception handling disabled

Use an explicit '-fexceptions' to turn this front end test case all-PASS.

gcc/testsuite/
* g++.dg/cpp26/static_assert1.C: Fix for '-fno-exceptions'
configurations.

Fix '23_containers/span/at.cc' for '-fno-exceptions' configurations

Added in recent commit 1fa85dcf656e2f2c7e483c9ed3c2680bf7db6858
"libstdc++: Add std::span::at for C++26 (P2821R5)", the test case already
does use '#if __cpp_exceptions', but failed to correspondingly guard the
'dg-warning' directives, resulting in:

    FAIL: 23_containers/span/at.cc  -std=gnu++26  (test for warnings, line 15)
    FAIL: 23_containers/span/at.cc  -std=gnu++26  (test for warnings, line 26)
    PASS: 23_containers/span/at.cc  -std=gnu++26 (test for excess errors)
    PASS: 23_containers/span/at.cc  -std=gnu++26 execution test

libstdc++-v3/
* testsuite/23_containers/span/at.cc: Fix for '-fno-exceptions'
configurations.

Adjust 'g++.dg/ext/has-feature.C' for default-'-fno-exceptions', '-fno-rtti' configurations

..., where you currently get:

FAIL: g++.dg/ext/has-feature.C -std=gnu++98 (test for excess errors)
[...]

Minor fix-up for recent commit 06280a906cb3dc80cf5e07cf3335b758848d488d
"c-family: Implement __has_feature and __has_extension [PR60512]".

gcc/testsuite/
* g++.dg/ext/has-feature.C: Adjust for default-'-fno-exceptions',
'-fno-rtti' configurations.

middle-end/110237 - wrong MEM_ATTRs for partial loads/stores

The following addresses a miscompilation by RTL scheduling related
to the representation of masked stores.  For that we have

(insn 38 35 39 3 (set (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90])
                (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2]  <var_decl 0x7ffff6e28d80 b>)
                        (const_int -4 [0xfffffffffffffffc])))) [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32])
        (vec_merge:V16SI (reg:V16SI 20 xmm0 [118])
            (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90])
                    (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2]  <var_decl 0x7ffff6e28d80 b>)
                            (const_int -4 [0xfffffffffffffffc])))) [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32])

and specifically the memory attributes

  [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32]

are problematic.  They tell us the instruction stores and reads a full
vector which it if course does not.  There isn't any good MEM_EXPR
we can use here (we lack a way to just specify a pointer and restrict
info for example), and since the MEMs have a vector mode it's
difficult in general as passes do not need to look at the memory
attributes at all.

The easiest way to avoid running into the alias analysis problem is
to scrap the MEM_EXPR when we expand the internal functions for
partial loads/stores.  That avoids the disambiguation we run into
which is realizing that we store to an object of less size as
the size of the mode we appear to store.

After the patch we see just

  [1  S64 A32]

so we preserve the alias set, the alignment and the size (the size
is redundant if the MEM insn't BLKmode).  That's still not good
in case the RTL alias oracle would implement the same
disambiguation but it fends off the gimple one.

This fixes gcc.dg/torture/pr58955-2.c when built with AVX512
and --param=vect-partial-vector-usage=1.

PR middle-end/110237
* internal-fn.cc (expand_partial_load_optab_fn): Clear
MEM_EXPR and MEM_OFFSET.
(expand_partial_store_optab_fn): Likewise.

fold-const: Fix up multiple_of_p [PR112733]

We ICE on the following testcase when wi::multiple_of_p is called on
widest_int 1 and -128 with UNSIGNED.  I still need to work on the
actual wide-int.cc issue, the latest patch attached to the PR regressed
bitint-{38,39}.c, so will need to debug that, but there is a clear bug
on the fold-const.cc side as well - widest_int is a signed representation
by definition, using UNSIGNED with it certainly doesn't match what was
intended, because -128 as the second operand effectively means unsigned
131072 bit 0xfffff............ffff80 integer, not the signed char -128
that appeared in the source.

In the INTEGER_CST case a few lines above this we already use
    case INTEGER_CST:
      if (TREE_CODE (bottom) != INTEGER_CST || integer_zerop (bottom))
        return false;
      return wi::multiple_of_p (wi::to_widest (top), wi::to_widest (bottom),
                                SIGNED);
so I think using SIGNED with widest_int is best there (compared to the
other choices in the PR).

2023-11-29  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/112733
* fold-const.cc (multiple_of_p): Pass SIGNED rather than
UNSIGNED for wi::multiple_of_p on widest_int arguments.

* gcc.dg/pr112733.c: New test.

testsuite, x86: Handle a broken assembler

Earlier assembler support for complex fp16 on x86_64 Darwin is broken.
This adds an additional test to the existing target-supports that fails
for the broken assemblers but works for the newer, fixed, ones.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Test an asm line that fails on broken
Darwin assembler versions.

testsuite: Adjust g++.dg/opt/devirt2.C on SPARC

Since 20231124, g++.dg/opt/devirt2.C began to FAIL on 32 and 64-bit
Solaris/SPARC:

FAIL: g++.dg/opt/devirt2.C  -std=gnu++14  scan-assembler-times (jmp|call)[^\\n]*xyzzy 4
FAIL: g++.dg/opt/devirt2.C  -std=gnu++17  scan-assembler-times (jmp|call)[^\\n]*xyzzy 4
FAIL: g++.dg/opt/devirt2.C  -std=gnu++20  scan-assembler-times (jmp|call)[^\\n]*xyzzy 4
FAIL: g++.dg/opt/devirt2.C  -std=gnu++98  scan-assembler-times (jmp|call)[^\\n]*xyzzy 4

This is no doubt due to

commit ba0869323e1d45b1328b4cb723cb139a2e2146c3
Author: Maciej W. Rozycki <macro@embecosm.com>
Date:   Thu Nov 23 16:13:59 2023 +0000

    testsuite: Fix subexpressions with `scan-assembler-times'

which fixes exactly the double-counting the test relied on/worked around
on sparc.  Fixed by adjusting the count.

Tested on sparc-sun-solaris2.11.

2023-11-28  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
* g++.dg/opt/devirt2.C: Adjust scan-assembler-count on sparc for
removal of -inline from regexp.  Update comment.

RISC-V: Support highpart register overlap for vwcvt

Since Richard supports register filters recently, we are able to support highpart register
overlap for widening RVV instructions.

This patch support it for vwcvt intrinsics.

I leverage real application user codes for vwcvt:
https://github.com/riscv/riscv-v-spec/issues/929
https://godbolt.org/z/xoeGnzd8q

This is the real application codes that using LMUL = 8 with unrolling to gain optimal
performance for specific libraury.

You can see in the codegen, GCC has optimal codegen for such since we supported register
lowpart overlap for narrowing instructions (dest EEW < source EEW).

Now, we start to support highpart register overlap from this patch for widening instructions (dest EEW > source EEW).

Leverage this intrinsic codes above but for vwcvt:

https://godbolt.org/z/1TMPE5Wfr

size_t
foo (char const *buf, size_t len)
{
  size_t sum = 0;
  size_t vl = __riscv_vsetvlmax_e8m8 ();
  size_t step = vl * 4;
  const char *it = buf, *end = buf + len;
  for (; it + step <= end;)
    {
      vint8m4_t v0 = __riscv_vle8_v_i8m4 ((void *) it, vl);
      it += vl;
      vint8m4_t v1 = __riscv_vle8_v_i8m4 ((void *) it, vl);
      it += vl;
      vint8m4_t v2 = __riscv_vle8_v_i8m4 ((void *) it, vl);
      it += vl;
      vint8m4_t v3 = __riscv_vle8_v_i8m4 ((void *) it, vl);
      it += vl;

      asm volatile("nop" ::: "memory");
      vint16m8_t vw0 = __riscv_vwcvt_x_x_v_i16m8 (v0, vl);
      vint16m8_t vw1 = __riscv_vwcvt_x_x_v_i16m8 (v1, vl);
      vint16m8_t vw2 = __riscv_vwcvt_x_x_v_i16m8 (v2, vl);
      vint16m8_t vw3 = __riscv_vwcvt_x_x_v_i16m8 (v3, vl);

      asm volatile("nop" ::: "memory");
      size_t sum0 = __riscv_vmv_x_s_i16m8_i16 (vw0);
      size_t sum1 = __riscv_vmv_x_s_i16m8_i16 (vw1);
      size_t sum2 = __riscv_vmv_x_s_i16m8_i16 (vw2);
      size_t sum3 = __riscv_vmv_x_s_i16m8_i16 (vw3);

      sum += sumation (sum0, sum1, sum2, sum3);
    }
  return sum;
}

Before this patch:

...
csrr    t0,vlenb
...
        vwcvt.x.x.v     v16,v8
        vwcvt.x.x.v     v8,v28
        vs8r.v  v16,0(sp)               ---> spill
        vwcvt.x.x.v     v16,v24
        vwcvt.x.x.v     v24,v4
        nop
        vsetvli zero,zero,e16,m8,ta,ma
        vmv.x.s a2,v16
        vl8re16.v       v16,0(sp)      --->  reload
...
csrr    t0,vlenb
...

You can see heavy spill && reload inside the loop body.

After this patch:

...
vwcvt.x.x.v v8,v12
vwcvt.x.x.v v16,v20
vwcvt.x.x.v v24,v28
vwcvt.x.x.v v0,v4
...

Optimal codegen after this patch.

Tested on zvl128b no regression.

I am gonna to test zve64d/zvl256b/zvl512b/zvl1024b.

Ok for trunk if no regression on the testing above ?

Co-authored-by: kito-cheng <kito.cheng@sifive.com>
Co-authored-by: kito-cheng <kito.cheng@gmail.com>
PR target/112431

gcc/ChangeLog:

* config/riscv/constraints.md (TARGET_VECTOR ? V_REGS : NO_REGS): New register filters.
* config/riscv/riscv.md (no,W21,W42,W84,W41,W81,W82): Ditto.
(no,yes): Ditto.
* config/riscv/vector.md: Support highpart register overlap for vwcvt.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112431-1.c: New test.
* gcc.target/riscv/rvv/base/pr112431-2.c: New test.
* gcc.target/riscv/rvv/base/pr112431-3.c: New test.

testsuite: Handle double-quoted LTO section names [PR112728]

The gcc.dg/scantest-lto.c test FAILs on Solaris/SPARC with the native as:

FAIL: gcc.dg/scantest-lto.c scan-assembler-not ascii
FAIL: gcc.dg/scantest-lto.c scan-assembler-times ascii 0

It requires double-quoting the section name which scanasm.exp doesn't
allow for.

This patch fixes that.

Tested on sparc-sun-solaris2.11 (as and gas) and i386-pc-solaris2.11 (as
and gas).

2023-11-23 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
PR testsuite/112728
* lib/scanasm.exp (dg-scan): Allow for double-quoted LTO section names.
(scan-assembler-times): Likewise.
(scan-assembler-dem-not): Likewise.

RISC-V: Add explicit braces to eliminate  warning.

../.././gcc/gcc/config/riscv/riscv.cc: In function ‘void riscv_option_override()’:
../.././gcc/gcc/config/riscv/riscv.cc:8673:6: warning: suggest explicit braces to avoid ambiguous ‘else’ [-Wdangling-else]
   if (TARGET_RVE)
      ^

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_option_override): Eliminate warning.

testsuite: move gcc.c-torture/compile/libcall-2.c to gcc.target/i386/libcall-1.c

This patch relocates a test that is really x86 specific, and changes
it to use check_effective_target_int128.

gcc/testsuite/ChangeLog

* gcc.c-torture/compile/libcall-2.c: Remove.
* gcc.target/i386/libcall-1.c: Moved from
gcc.c-torture/compile/libcall-2.c and adapted to use
effective-target for int128_t.

c++: Fix a compile time memory leak in finish_static_assert

On Tue, Nov 28, 2023 at 11:31:48AM -0500, Jason Merrill wrote:
> Jonathan pointed out elsewhere that this gets leaked if error return
> prevents us from getting to the XDELETEVEC.

As there is a single error return in which it can leak, I've just added
a XDELETEVEC (buf); statement to that path rather than introducing some
RAII solution.

2023-11-29 Jakub Jelinek <jakub@redhat.com>

* semantics.cc (finish_static_assert): Free buf on error return.

fold-mem-offsets: Fix powerpc64le-linux profiledbootstrap [PR111601]

The introduction of the fold-mem-offsets pass breaks profiledbootstrap
on powerpc64le-linux.
From what I can see, the pass works one basic block at a time and
will punt on any non-DEBUG_INSN uses outside of the current block
(I believe because of the
          /* This use affects instructions outside of CAN_FOLD_INSNS.  */
          if (!bitmap_bit_p (&can_fold_insns, INSN_UID (use)))
            return 0;
test and can_fold_insns only set in do_analysis (when processing insns in
current bb, cleared at the end) or results of get_single_def_in_bb
(which are checked to be in the same bb).
But, while get_single_def_in_bb checks for
  if (DF_INSN_LUID (def) > DF_INSN_LUID (insn))
    return NULL;
The basic block in the PR in question has:
...
(insn 212 210 215 25 (set (mem/f:DI (reg/v/f:DI 10 10 [orig:152 last_viable ] [152]) [2 *last_viable_336+0 S8 A64])
        (reg/f:DI 9 9 [orig:155 _342 ] [155])) "pr111601.ii":50:17 683 {*movdi_internal64}
     (expr_list:REG_DEAD (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
        (nil)))
(insn 215 212 484 25 (set (reg:DI 5 5 [226])
        (const_int 0 [0])) "pr111601.ii":52:12 683 {*movdi_internal64}
     (expr_list:REG_EQUIV (const_int 0 [0])
        (nil)))
(insn 484 215 218 25 (set (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
        (reg/f:DI 9 9 [orig:155 _342 ] [155])) "pr111601.ii":52:12 683 {*movdi_internal64}
     (nil))
...
(insn 564 214 216 25 (set (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
        (plus:DI (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
            (const_int 96 [0x60]))) "pr111601.ii":52:12 66 {*adddi3}
     (nil))
(insn 216 564 219 25 (set (mem/f:DI (reg/v/f:DI 10 10 [orig:152 last_viable ] [152]) [2 _343->next+0 S8 A64])
        (reg:DI 5 5 [226])) "pr111601.ii":52:12 683 {*movdi_internal64}
     (expr_list:REG_DEAD (reg:DI 5 5 [226])
        (nil)))
...
and when asking for all uses of %r10 from def 564, it will see uses
in 216 and 212; the former is after the += 96 addition and gets changed
to load from %r10+96 with the addition being dropped, but there is
the other store which is a use across the backedge and when reached
from other edges certainly doesn't have the + 96 addition anywhere,
so the pass doesn't actually change that location.

This patch adds checks from get_single_def_in_bb to get_uses as well,
in particular check that the (regular non-debug) use only appears in the
same basic block as the definition and that it doesn't appear before it (i.e.
use across backedge).

2023-11-29  Jakub Jelinek  <jakub@redhat.com>

PR bootstrap/111601
* fold-mem-offsets.cc (get_uses): Ignore DEBUG_INSN uses.  Otherwise,
punt if use is in a different basic block from INSN or appears before
INSN in the same basic block.  Formatting fixes.
(get_single_def_in_bb): Formatting fixes.
(fold_offsets_1, pass_fold_mem_offsets::execute): Comment formatting
fixes.

* g++.dg/opt/pr111601.C: New test.

LoongArch: Use LSX for scalar FP rounding with explicit rounding mode

In LoongArch FP base ISA there is only the frint.{s/d} instruction which
reads the global rounding mode. Utilize LSX for explicit rounding mode
even if the operand is scalar. It seems wasting the CPU power, but
still much faster than calling the library function.

gcc/ChangeLog:

* config/loongarch/simd.md (LSX_SCALAR_FRINT): New int iterator.
(VLSX_FOR_FMODE): New mode attribute.
(<simd_for_scalar_frint_pattern><mode>2): New expander,
expanding to vreplvei.{w/d} + frint{rp/rz/rm/rne}.{s.d}.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-frint-scalar.c: New test.
* gcc.target/loongarch/vect-frint-scalar-no-inexact.c: New test.

LoongArch: Remove lrint_allow_inexact

No functional change, just a cleanup.

gcc/ChangeLog:

* config/loongarch/loongarch.md (lrint_allow_inexact): Remove.
(<lrint_pattern><ANYF:mode><ANYFI:mode>2): Check if <LRINT>
== UNSPEC_FTINT instead of <lrint_allow_inexact>.

LoongArch: Use standard pattern name and RTX code for LSX/LASX rotate shift

Remove unnecessary UNSPECs and make the [x]vrotr[i] instructions useful
with GNU vectors and auto vectorization.

gcc/ChangeLog:

* config/loongarch/lsx.md (bitimm): Move to ...
(UNSPEC_LSX_VROTR): Remove.
(lsx_vrotr_<lsxfmt>): Remove.
(lsx_vrotri_<lsxfmt>): Remove.
* config/loongarch/lasx.md (UNSPEC_LASX_XVROTR): Remove.
(lsx_vrotr_<lsxfmt>): Remove.
(lsx_vrotri_<lsxfmt>): Remove.
* config/loongarch/simd.md (bitimm): ... here. Expand it to
cover LASX modes.
(vrotr<mode>3): New define_insn.
(vrotri<mode>3): New define_insn.
* config/loongarch/loongarch-builtins.cc:
(CODE_FOR_lsx_vrotr_b): Use standard pattern name.
(CODE_FOR_lsx_vrotr_h): Likewise.
(CODE_FOR_lsx_vrotr_w): Likewise.
(CODE_FOR_lsx_vrotr_d): Likewise.
(CODE_FOR_lasx_xvrotr_b): Likewise.
(CODE_FOR_lasx_xvrotr_h): Likewise.
(CODE_FOR_lasx_xvrotr_w): Likewise.
(CODE_FOR_lasx_xvrotr_d): Likewise.
(CODE_FOR_lsx_vrotri_b): Define to standard pattern name.
(CODE_FOR_lsx_vrotri_h): Likewise.
(CODE_FOR_lsx_vrotri_w): Likewise.
(CODE_FOR_lsx_vrotri_d): Likewise.
(CODE_FOR_lasx_xvrotri_b): Likewise.
(CODE_FOR_lasx_xvrotri_h): Likewise.
(CODE_FOR_lasx_xvrotri_w): Likewise.
(CODE_FOR_lasx_xvrotri_d): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-rotr.c: New test.

LoongArch: Use standard pattern name and RTX code for LSX/LASX muh instructions

Removes unnecessary UNSPECs and make the muh instructions useful with
GNU vectors or auto vectorization.

gcc/ChangeLog:

* config/loongarch/simd.md (muh): New code attribute mapping
any_extend to smul_highpart or umul_highpart.
(<su>mul<mode>3_highpart): New define_insn.
* config/loongarch/lsx.md (UNSPEC_LSX_VMUH_S): Remove.
(UNSPEC_LSX_VMUH_U): Remove.
(lsx_vmuh_s_<lsxfmt>): Remove.
(lsx_vmuh_u_<lsxfmt>): Remove.
* config/loongarch/lasx.md (UNSPEC_LASX_XVMUH_S): Remove.
(UNSPEC_LASX_XVMUH_U): Remove.
(lasx_xvmuh_s_<lasxfmt>): Remove.
(lasx_xvmuh_u_<lasxfmt>): Remove.
* config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vmuh_b):
Redefine to standard pattern name.
(CODE_FOR_lsx_vmuh_h): Likewise.
(CODE_FOR_lsx_vmuh_w): Likewise.
(CODE_FOR_lsx_vmuh_d): Likewise.
(CODE_FOR_lsx_vmuh_bu): Likewise.
(CODE_FOR_lsx_vmuh_hu): Likewise.
(CODE_FOR_lsx_vmuh_wu): Likewise.
(CODE_FOR_lsx_vmuh_du): Likewise.
(CODE_FOR_lasx_xvmuh_b): Likewise.
(CODE_FOR_lasx_xvmuh_h): Likewise.
(CODE_FOR_lasx_xvmuh_w): Likewise.
(CODE_FOR_lasx_xvmuh_d): Likewise.
(CODE_FOR_lasx_xvmuh_bu): Likewise.
(CODE_FOR_lasx_xvmuh_hu): Likewise.
(CODE_FOR_lasx_xvmuh_wu): Likewise.
(CODE_FOR_lasx_xvmuh_du): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-muh.c: New test.

LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

The usage LSX and LASX frint/ftint instructions had some problems:

1. These instructions raises FE_INEXACT, which is not allowed with
   -fno-fp-int-builtin-inexact for most C2x section F.10.6 functions
   (the only exceptions are rint, lrint, and llrint).
2. The "frint" instruction without explicit rounding mode is used for
   roundM2, this is incorrect because roundM2 is defined "rounding
   operand 1 to the *nearest* integer, rounding away from zero in the
   event of a tie".  We actually don't have such an instruction.  Our
   frintrne instruction is roundevenM2 (unfortunately, this is not
   documented).
3. These define_insn's are written in a way not so easy to hack.

So I removed these instructions and created a "simd.md" file, then added
them and the corresponding expanders there.  The advantage of the
simd.md file is we don't need to duplicate the RTL template twice (in
lsx.md and lasx.md).

gcc/ChangeLog:

PR target/112578
* config/loongarch/lsx.md (UNSPEC_LSX_VFTINT_S,
UNSPEC_LSX_VFTINTRNE, UNSPEC_LSX_VFTINTRP,
UNSPEC_LSX_VFTINTRM, UNSPEC_LSX_VFRINTRNE_S,
UNSPEC_LSX_VFRINTRNE_D, UNSPEC_LSX_VFRINTRZ_S,
UNSPEC_LSX_VFRINTRZ_D, UNSPEC_LSX_VFRINTRP_S,
UNSPEC_LSX_VFRINTRP_D, UNSPEC_LSX_VFRINTRM_S,
UNSPEC_LSX_VFRINTRM_D): Remove.
(ILSX, FLSX): Move into ...
(VIMODE): Move into ...
(FRINT_S, FRINT_D): Remove.
(frint_pattern_s, frint_pattern_d, frint_suffix): Remove.
(lsx_vfrint_<flsxfmt>, lsx_vftint_s_<ilsxfmt>_<flsxfmt>,
lsx_vftintrne_w_s, lsx_vftintrne_l_d, lsx_vftintrp_w_s,
lsx_vftintrp_l_d, lsx_vftintrm_w_s, lsx_vftintrm_l_d,
lsx_vfrintrne_s, lsx_vfrintrne_d, lsx_vfrintrz_s,
lsx_vfrintrz_d, lsx_vfrintrp_s, lsx_vfrintrp_d,
lsx_vfrintrm_s, lsx_vfrintrm_d,
<FRINT_S:frint_pattern_s>v4sf2,
<FRINT_D:frint_pattern_d>v2df2, round<mode>2,
fix_trunc<mode>2): Remove.
* config/loongarch/lasx.md: Likewise.
* config/loongarch/simd.md: New file.
(ILSX, ILASX, FLSX, FLASX, VIMODE): ... here.
(IVEC, FVEC): New mode iterators.
(VIMODE): ... here.  Extend it to work for all LSX/LASX vector
modes.
(x, wu, simd_isa, WVEC, vimode, simdfmt, simdifmt_for_f,
elebits): New mode attributes.
(UNSPEC_SIMD_FRINTRP, UNSPEC_SIMD_FRINTRZ, UNSPEC_SIMD_FRINT,
UNSPEC_SIMD_FRINTRM, UNSPEC_SIMD_FRINTRNE): New unspecs.
(SIMD_FRINT): New int iterator.
(simd_frint_rounding, simd_frint_pattern): New int attributes.
(<simd_isa>_<x>vfrint<simd_frint_rounding>_<simdfmt>): New
define_insn template for frint instructions.
(<simd_isa>_<x>vftint<simd_frint_rounding>_<simdifmt_for_f>_<simdfmt>):
Likewise, but for ftint instructions.
(<simd_frint_pattern><mode>2): New define_expand with
flag_fp_int_builtin_inexact checked.
(l<simd_frint_pattern><mode><vimode>2): Likewise.
(ftrunc<mode>2): New define_expand.  It does not require
flag_fp_int_builtin_inexact.
(fix_trunc<mode><vimode>2): New define_insn_and_split.  It does
not require flag_fp_int_builtin_inexact.
(include): Add lsx.md and lasx.md.
* config/loongarch/loongarch.md (include): Include simd.md,
instead of including lsx.md and lasx.md directly.
* config/loongarch/loongarch-builtins.cc
(CODE_FOR_lsx_vftint_w_s, CODE_FOR_lsx_vftint_l_d,
CODE_FOR_lasx_xvftint_w_s, CODE_FOR_lasx_xvftint_l_d):
Remove.

gcc/testsuite/ChangeLog:

PR target/112578
* gcc.target/loongarch/vect-frint.c: New test.
* gcc.target/loongarch/vect-frint-no-inexact.c: New test.
* gcc.target/loongarch/vect-ftint.c: New test.
* gcc.target/loongarch/vect-ftint-no-inexact.c: New test.

Introduce hardbool attribute for C

This patch introduces hardened booleans in C.  The hardbool attribute,
when attached to an integral type, turns it into an enumerate type
with boolean semantics, using the named or implied constants as
representations for false and true.

Expressions of such types decay to _Bool, trapping if the value is
neither true nor false, and _Bool can convert implicitly back to them.
Other conversions go through _Bool first.

for  gcc/c-family/ChangeLog

* c-attribs.cc (c_common_attribute_table): Add hardbool.
(handle_hardbool_attribute): New.
(type_valid_for_vector_size): Reject hardbool.
* c-common.cc (convert_and_check): Skip warnings for convert
and check for hardbool.
(c_hardbool_type_attr_1): New.
* c-common.h (c_hardbool_type_attr): New.

for  gcc/c/ChangeLog

* c-typeck.cc (convert_lvalue_to_rvalue): Decay hardbools.
* c-convert.cc (convert): Convert to hardbool through
truthvalue.
* c-decl.cc (check_bitfield_type_and_width): Skip enumeral
truncation warnings for hardbool.
(finish_struct): Propagate hardbool attribute to bitfield
types.
(digest_init): Convert to hardbool.

for  gcc/ChangeLog

* doc/extend.texi (hardbool): New type attribute.
* doc/invoke.texi (-ftrivial-auto-var-init): Document
representation vs values.

for  gcc/testsuite/ChangeLog

* gcc.dg/hardbool-err.c: New.
* gcc.dg/hardbool-trap.c: New.
* gcc.dg/torture/hardbool.c: New.
* gcc.dg/torture/hardbool-s.c: New.
* gcc.dg/torture/hardbool-us.c: New.
* gcc.dg/torture/hardbool-i.c: New.
* gcc.dg/torture/hardbool-ul.c: New.
* gcc.dg/torture/hardbool-ll.c: New.
* gcc.dg/torture/hardbool-5a.c: New.
* gcc.dg/torture/hardbool-s-5a.c: New.
* gcc.dg/torture/hardbool-us-5a.c: New.
* gcc.dg/torture/hardbool-i-5a.c: New.
* gcc.dg/torture/hardbool-ul-5a.c: New.
* gcc.dg/torture/hardbool-ll-5a.c: New.

call maybe_return_this in build_clone

__dt_base doesn't get its body from a maybe_return_this caller, it's
rather cloned with the full body within build_clone, and then it's
left alone, without going through finish_function_body or
build_delete_destructor_body, that call maybe_return_this.

Now, this is correct as far as the generated code is concerned, since
the cloned body of a cdtor that returns this is also a cdtor body that
returns this.  The problem is that the decl for THIS is also cloned,
and it doesn't get the warning suppression introduced by
maybe_return_this, so Wuse-after-free3.C fails with an excess warning
at the closing brace of the dtor body.

I've split out the warning suppression from maybe_return_this, and
arranged to call that bit from the relevant build_clone case.
Unfortunately, because the warning is silenced for all uses of the
THIS decl, rather than only for the ABI-mandated return stmt, this
also silences the very warning that the testcase checks for.

I'm not revamping the warning suppression approach to overcome this,
so I'm xfailing the expected warning on ARM EABI, hoping that's the
only target with cdtor_return_this, and leaving it at that.

for  gcc/cp/ChangeLog

* decl.cc (maybe_prepare_return_this): Split out of...
(maybe_return_this): ... this.
* cp-tree.h (maybe_prepare_return_this): Declare.
* class.cc (build_clone): Call it.

for  gcc/testsuite/ChangeLog

* g++.dg/warn/Wuse-after-free3.C: xfail on arm_eabi.

c++: for contracts, cdtors never return this

When targetm.cxx.cdtor_return_this() holds, cdtors have a
non-VOID_TYPE_P result, but IMHO this ABI implementation detail
shouldn't leak to the abstract language conceptual framework, in which
cdtors don't have return values.  For contracts, specifically those
that establish postconditions on results, such a leakage is present,
and the present patch puts an end to it: with it, cdtors get an error
for result postconditions regardless of the ABI.  This fixes
g++.dg/contracts/contracts-ctor-dtor2.C on arm-eabi.

for  gcc/cp/ChangeLog

* contracts.cc (check_postcondition_result): Cope with
cdtor_return_this.

Introduce -finline-stringops

try_store_by_multiple_pieces was added not long ago, enabling
variable-sized memset to be expanded inline when the worst-case
in-range constant length would, using conditional blocks with powers
of two to cover all possibilities of length and alignment.

This patch introduces -finline-stringops[=fn] to request expansions to
start with a loop, so as to still take advantage of known alignment
even with long lengths, but without necessarily adding store blocks
for every power of two.

This makes it possible for the supported stringops (memset, memcpy,
memmove, memset) to be expanded, even if storing a single byte per
iteration.  Surely efficient implementations can run faster, with a
pre-loop to increase alignment, but that would likely be excessive for
inline expansions.

Still, in some cases, such as in freestanding environments, users
prefer to inline such stringops, especially those that the compiler
may introduce itself, even if the expansion is not as performant as a
highly optimized C library implementation could be, to avoid
depending on a C runtime library.

for  gcc/ChangeLog

* expr.cc (emit_block_move_hints): Take ctz of len.  Obey
-finline-stringops.  Use oriented or sized loop.
(emit_block_move): Take ctz of len, and pass it on.
(emit_block_move_via_sized_loop): New.
(emit_block_move_via_oriented_loop): New.
(emit_block_move_via_loop): Take incr.  Move an incr-sized
block per iteration.
(emit_block_cmp_via_cmpmem): Take ctz of len.  Obey
-finline-stringops.
(emit_block_cmp_via_loop): New.
* expr.h (emit_block_move): Add ctz of len defaulting to zero.
(emit_block_move_hints): Likewise.
(emit_block_cmp_hints): Likewise.
* builtins.cc (expand_builtin_memory_copy_args): Pass ctz of
len to emit_block_move_hints.
(try_store_by_multiple_pieces): Support starting with a loop.
(expand_builtin_memcmp): Pass ctz of len to
emit_block_cmp_hints.
(expand_builtin): Allow inline expansion of memset, memcpy,
memmove and memcmp if requested.
* common.opt (finline-stringops): New.
(ilsop_fn): New enum.
* flag-types.h (enum ilsop_fn): New.
* doc/invoke.texi (-finline-stringops): Add.

for  gcc/testsuite/ChangeLog

* gcc.dg/torture/inline-mem-cmp-1.c: New.
* gcc.dg/torture/inline-mem-cpy-1.c: New.
* gcc.dg/torture/inline-mem-cpy-cmp-1.c: New.
* gcc.dg/torture/inline-mem-move-1.c: New.
* gcc.dg/torture/inline-mem-set-1.c: New.

RISC-V: Bugfix for ICE in block move when zve32f

The exact_div requires the exactly multiple of the divider.
Unfortunately, the condition will be broken when zve32f in
some cases. For example,

potential_ew is 8
BYTES_PER_RISCV_VECTOR * lmul1 is [4, 4]

This patch would like to ensure the precondition of exact_div
when get_vec_mode.

PR target/112743

gcc/ChangeLog:

* config/riscv/riscv-string.cc (expand_block_move): Add
precondition check for exact_div.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112743-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

testsuite: fix gcc.c-torture/compile/libcall-2.c in -m32

This test relies on having __int128 in x86_64 targets, which is only
available in -m64.

gcc/testsuite/ChangeLog

* gcc.c-torture/compile/libcall-2.c: Skip test in -m32.

[i386] Fix push2pop2 test fail on non-linux target [PR112729]

On linux x86-64, -fomit-frame-pointer was by default enabled so the
push2pop2 tests cfi scans are based on it. On other target with
-fno-omit-frame-pointer the cfi scan will be wrong as the frame pointer
is pushed at first. Add -fomit-frame-pointer to these tests that related
to cfi scan.

gcc/testsuite/ChangeLog:

PR target/112729
* gcc.target/i386/apx-interrupt-1.c: Add -fomit-frame-pointer.
* gcc.target/i386/apx-push2pop2-1.c: Likewise.
* gcc.target/i386/apx-push2pop2_force_drap-1.c: Likewise.

Daily bump.

c++: prvalue array decay [PR94264]

My change for PR53220 made array to pointer decay for prvalue arrays
ill-formed to catch well-defined C code that produces a dangling pointer in
C++ due to the shorter lifetime of compound literals. This wasn't really
correct, but wasn't a problem until C++17 added prvalue arrays, at which
point it started rejecting valid C++ code.

I wanted to make sure that we still diagnose the problematic code;
-Wdangling-pointer covers the array-lit.c case, but I needed to extend
-Wreturn-local-addr to handle the return case.

PR c++/94264
PR c++/53220

gcc/c/ChangeLog:

* c-typeck.cc (array_to_pointer_conversion): Adjust -Wc++-compat
diagnostic.

gcc/cp/ChangeLog:

* call.cc (convert_like_internal): Remove obsolete comment.
* typeck.cc (decay_conversion): Allow array prvalue.
(maybe_warn_about_returning_address_of_local): Check
for returning pointer to temporary.

gcc/testsuite/ChangeLog:

* c-c++-common/array-lit.c: Adjust.
* g++.dg/cpp1z/array-prvalue1.C: New test.
* g++.dg/ext/complit17.C: New test.

ARC: Consistent use of whitespace in assembler templates.

This minor clean-up patch tweaks arc.md to use whitespace consistently
in output templates, always using a TAB between the mnemonic and its
operands, and avoiding spaces after commas betweem operands. There
should be no functional changes with this patch, though several test
cases' scan-assembler need to be updated to use \\s+ instead of testing
for a TAB or a space explicitly.

2023-11-28 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/arc/arc.md: Make output template whitespace consistent.

gcc/testsuite/ChangeLog
* gcc.target/arc/jli-1.c: Update dg-final whitespace.
* gcc.target/arc/jli-2.c: Likewise.
* gcc.target/arc/naked-1.c: Likewise.
* gcc.target/arc/naked-2.c: Likewise.
* gcc.target/arc/tmac-1.c: Likewise.
* gcc.target/arc/tmac-2.c: Likewise.

varasm.cc: refer to assemble_external_libcall only ifdef ASM_OUTPUT_EXTERNAL

This fixes boostrap in targets where ASM_OUTPUT_EXTERNAL is not
defined.

gcc/ChangeLog

* varasm.cc (assemble_external_libcall): Refer in assert only ifdef
ASM_OUTPUT_EXTERNAL.

MATCH: Fix invalid signed boolean type usage

This fixes the incorrect assumption that was done in r14-3721-ge6bcf839894783,
that being able to doing the negative after the conversion would be a valid thing
but really it is not valid for boolean types.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

PR tree-optimization/112738
* match.pd (`(nop_convert)-(convert)a`): Reject
when the outer type is boolean.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

c++: Fix up __has_extension (cxx_init_captures)

On Mon, Nov 27, 2023 at 10:58:04AM +0000, Alex Coplan wrote:
> Many thanks both for the reviews, this is now pushed (with Jason's
> above changes implemented) as g:06280a906cb3dc80cf5e07cf3335b758848d488d.

The new test FAILs everywhere with GXX_TESTSUITE_STDS=98,11,14,17,20,2b
I'm normally using for testing.
FAIL: g++.dg/ext/has-feature.C  -std=gnu++11 (test for excess errors)
Excess errors:
/home/jakub/src/gcc/gcc/testsuite/g++.dg/ext/has-feature.C:185:2: error: #error

This is on
#if __has_extension (cxx_init_captures) != CXX11
#error
#endif
Comparing the values with clang++ on godbolt and with what is actually
implemented:
void foo () { auto a = [b = 3]() { return b; }; }
both clang++ and GCC implement init captures as extension already in C++11
(and obviously not in C++98 because lambdas aren't implemented there),
unless -pedantic-errors/-Werror=pedantic, so I think we should change
the FE to match the test rather than the other way around.

Making __has_extension return __has_feature for -pedantic-errors and not
for -Werror=pedantic is just weird, but as that is what clang++ implements
and this is for compatibility with it, I can live with it (but perhaps
we should mention it in the documentation).  Note, the warnings/errors
can be changed using pragmas inside of the source, so whether one can
use an extension or not depends on where in the code it is (__extension__
to the rescue if it can be specified around it).
I wonder if the has-feature.C test shouldn't be #included in other 2 tests,
one where -pedantic-errors would be in dg-options and through some macro
tell the file that __has_extension will behave like __has_feature, and
another with -Werror=pedantic to document that the option doesn't change
it.

2023-11-28  Jakub Jelinek  <jakub@redhat.com>

* cp-objcp-common.cc (cp_feature_table): Evaluate
__has_extension (cxx_init_captures) to 1 even for -std=c++11.

Fix PR ada/111909 On Darwin, determine filesystem case sensitivity at runtime

In gcc/ada/adaint.c(__gnat_get_file_names_case_sensitive), the current
assumption for __APPLE__ is that file names are case-insensitive
unless __arm__ or __arm64__ are defined, in which case file names are
declared case-sensitive.

The associated comment is
  "By default, we suppose filesystems aren't case sensitive on
  Windows and Darwin (but they are on arm-darwin)."

This means that on aarch64-apple-darwin, file names are treated as
case-sensitive, which is not the default case.

The true default position is that macOS file systems are
case-insensitive, iOS file systems are case-sensitive.

Apple provide a header file <TargetConditionals.h> which permits a
compile-time check for the compiler target (e.g. OSX vs IOS); if
TARGET_OS_IOS is defined as 1, this is a build for iOS.

2023-11-22  Simon Wright  <simon@pushface.org>

gcc/ada/

PR ada/111909
* adaint.c
(__gnat_get_file_names_case_sensitive): Split out the __APPLE__
check and remove the checks for __arm__, __arm64__. For Apple,
file names are by default case-insensitive unless TARGET_OS_IOS is
set.

Signed-off-by: Simon Wright <simon@pushface.org>

middle-end/112741 - ICE with gimple FE and later regimplification

The GIMPLE frontend, when bypassing gimplification, doesn't set
DECL_SEEN_IN_BIND_EXPR_P given there are no such things in GIMPLE.
But it probably should set the flag anyway to avoid later ICEs
when regimplifying.

PR middle-end/112741
gcc/c/
* gimple-parser.cc (c_parser_parse_gimple_body): Also
set DECL_SEEN_IN_BIND_EXPR_Pfor locals.

gcc/testsuite/
* gcc.dg/ubsan/pr112741.c: New testcase.

middle-end/112732 - stray TYPE_ALIAS_SET in type variant

The following fixes a stray TYPE_ALIAS_SET in a type variant built
by build_opaque_vector_type which is diagnosed by type checking
enabled with -flto.

PR middle-end/112732
* tree.cc (build_opaque_vector_type): Reset TYPE_ALIAS_SET
of the newly built type.

i386: Improve cmpstrnqi_1 insn pattern [PR112494]

REPZ CMPSB instruction does not update FLAGS register when %ecx register
equals zero. Improve cmpstrnqi_1 insn pattern to set FLAGS_REG to its
previous value instead of (const_int 0) when operand 2 equals zero.

PR target/112494

gcc/ChangeLog:

* config/i386/i386.md (cmpstrnqi_1): Set FLAGS_REG to its previous
value when operand 2 equals zero.
(*cmpstrnqi_1): Ditto.
(*cmpstrnqi_1 peephole2): Ditto.

Revert "This patch enables errors when external calls are created."

Reverted commit was breaking the BPF build, because libgcc emits
libcalls to __builtin_abort.

This reverts commit faf5b148588bd7fbb60ec669aefa704044037cdc.

Fortran: fix reallocation on assignment of polymorphic variables [PR110415]

This patch fixes two bugs related to polymorphic class assignment in the
Fortran front-end. One (described in PR110415) is an issue with the malloc
and realloc calls using the size from the old vptr rather than the new one.
The other is caused by the return value from the realloc call being ignored.
Testcases are added for these issues.

2023-11-28 Andrew Jenner <andrew@codesourcery.com>

gcc/fortran/
PR fortran/110415
* trans-expr.cc (trans_class_vptr_len_assignment): Add
from_vptrp parameter. Populate it. Don't check for DECL_P
when deciding whether to create temporary.
(trans_class_pointer_fcn, gfc_trans_pointer_assignment): Add
NULL argument to trans_class_vptr_len_assignment calls.
(trans_class_assignment): Get rhs_vptr from
trans_class_vptr_len_assignment and use it for determining size
for allocation/reallocation. Use return value from realloc.

gcc/testsuite/
PR fortran/110415
* gfortran.dg/pr110415.f90: New test.
* gfortran.dg/asan/pr110415-2.f90: New test.
* gfortran.dg/asan/pr110415-3.f90: New test.

Co-Authored-By: Tobias Burnus <tobias@codesourcery.com>

Emit funcall external declarations only if actually used.

There are many places in GCC where alternative local sequences are
tried in order to determine what is the cheapest or best alternative
to use in the current target.  When any of these sequences involve a
libcall, the current implementation of emit_library_call_value_1
introduce a side-effect consisting on emitting an external declaration
for the funcall (such as __divdi3) which is thus emitted even if the
sequence that does the libcall is not retained.

This is problematic in targets such as BPF, because the kernel loader
chokes on the spurious symbol __divdi3 and makes the resulting BPF
object unloadable.  Note that BPF objects are not linked before being
loaded.

This patch changes asssemble_external_libcall to defer emitting
declarations of external libcall symbols, by saving the call tree
nodes in a temporary list pending_libcall_symbols and letting
process_pending_assembly_externals to emit them only if they have been
referenced.  Solution suggested and sketched by Richard Sandiford.

Regtested in x86_64-linux-gnu.
Tested with host x86_64-linux-gnu with target bpf-unknown-none.

gcc/ChangeLog

PR target/109253
* varasm.cc (pending_libcall_symbols): New variable.
(process_pending_assemble_externals): Process
pending_libcall_symbols.
(assemble_external_libcall): Defer emitting external libcall
symbols to process_pending_assemble_externals.

gcc/testsuite/ChangeLog

PR target/109253
* gcc.target/bpf/divmod-libcall-1.c: New test.
* gcc.target/bpf/divmod-libcall-2.c: Likewise.
* gcc.c-torture/compile/libcall-2.c: Likewise.

libsanitizer: Update LOCAL_PATCHES

2023-11-28 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

libsanitizer:
* LOCAL_PATCHES: Update.

libsanitizer: Only use assembler symbol assignment if supported [PR112563]

This patch only enables symbol assignment if the configure test determined
it's supported.

Bootstrapped without regressions on sparc-sun-solaris2.11 (as and gas) and
i386-pc-solaris2.11 (as and gas).

2023-11-23 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

libsanitizer:
PR sanitizer/112563
* sanitizer_common/sanitizer_redefine_builtins.h: Check
HAVE_AS_SYM_ASSIGN.

libsanitizer: Check assembler support for symbol assignment [PR112563]

The recent libsanitizer import broke the build on Solaris/SPARC with the
native as:

/usr/ccs/bin/as: ".libs/sanitizer_errno.s", line 4247: error: symbol
"__sanitizer_internal_memset" is used but not defined
/usr/ccs/bin/as: ".libs/sanitizer_errno.s", line 4247: error: symbol
"__sanitizer_internal_memcpy" is used but not defined
/usr/ccs/bin/as: ".libs/sanitizer_errno.s", line 4247: error: symbol
"__sanitizer_internal_memmove" is used but not defined

Since none of the alternatives considered in the PR worked out, this
patch checks if the assembler does support symbol assignment, disabling
the code otherwise. This returns the code to the way it was up to LLVM 16.

Bootstrapped without regressions on sparc-sun-solaris2.11 (as and gas) and
i386-pc-solaris2.11 (as and gas).

2023-11-23 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

libsanitizer:
PR sanitizer/112563
* configure.ac (libsanitizer_cv_as_sym_assign): Check for
assembler symbol assignment support.
* configure: Regenerate.
* asan/Makefile.am (DEFS): Add @AS_SYM_ASSIGN_DEFS@.
* Makefile.in, asan/Makefile.in, hwasan/Makefile.in,
interception/Makefile.in, libbacktrace/Makefile.in,
lsan/Makefile.in, sanitizer_common/Makefile.in, tsan/Makefile.in,
ubsan/Makefile.in: Regenerate.

Fixed problem with BTF defining smaller enums.

This patch fixes a BTF, which would become invalid when having
smaller then 4 byte definitions of enums.
For example, when using the __attribute__((mode(byte))) in the enum
definition.

Two problems were identified:
- it would incorrectly create an entry for enum64 when the size of the
enum was different then 4.
- it would allocate less then 4 bytes for the value entry in BTF, in
case the type was smaller.

BTF generated was validated against clang.

gcc/ChangeLog:
* btfout.cc (btf_calc_num_vbytes): Fixed logic for enum64.
(btf_asm_enum_const): Corrected logic for enum64 and smaller
than 4 bytes values.

gcc/testsuite/ChangeLog:
* gcc.dg/debug/btf/btf-enum-small.c: Added test.

This patch enables errors when external calls are created.

When architectural limitations or usage of builtins implies the compiler
to create function calls to external libraries that implement the
functionality, GCC will now report an error claiming that this function
calls are not compatible with eBPF target.
Examples of those are the usage of __builtin_memmove and a sign division
in BPF ISA v3 or below that will require to call __divdi3.
This is currently an eBPF limitation which does not support linking of
object files but rather "raw" non linked ones. Those object files are
loaded and relocated by libbpf and the kernel.

gcc/ChangeLog:
* config/bpf/bpf.cc (bpf_output_call): Report error in case the
function call is for a builtin.
(bpf_external_libcall): Added target hook to detect and report
error when other external calls that are not builtins.

gcc/testsuite/ChangeLog:
* gcc.target/bpf/atomic-cmpxchg-2.c: Adapted.
* gcc.target/bpf/atomic-fetch-op-3.c: Adapted.
* gcc.target/bpf/atomic-op-3.c: Adapted.
* gcc.target/bpf/atomic-xchg-2.c: Adapted.
* gcc.target/bpf/diag-sdiv.c: Adapted.
* gcc.target/bpf/diag-smod.c: Adapted.

testsuite: Fix gcc.dg/pr111409.c on Solaris/SPARC with as

gcc.dg/pr111409.c FAILs on Solaris/SPARC with the native as:

FAIL: gcc.dg/pr111409.c scan-assembler-times .section\\\\s+.debug_macro 1

Unlike most other ELF targets, that assembler requires the section name to
be double-quoted. This patch allows for that. It also expects \t to
separate .section directive and name like other similar tests, and escapes
literal dots.

Tested on sparc-sun-solaris2.11 (as and gas) and i386-pc-solaris2.11 (as
and gas).

2023-11-23 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
* gcc.dg/pr111409.c: Allow for " before .debug_macro.
Quote literals dots.

bpf: Forces __buildin_memcmp not to generate a call upto 1024 bytes.

This patch forces __builtin_memcmp calls upto data sizes of 1024 to
become inline in caller.
This is a requirement by BPF and it mimics the default behaviour of the
clang BPF implementation.

gcc/ChangeLog:
* config/bpf/bpf.cc (bpf_use_by_pieces_infrastructure_p): Added
function to bypass default behaviour.
* config/bpf/bpf.h (COMPARE_MAX_PIECES): Defined to 1024 bytes.

libstdc++: Include <stdint.h> in <bits/atomic_wait.h>

This is needed in order to compile it as a header-unit, which might be
desired because it's included by both <atomic> and <semaphore>.

libstdc++-v3/ChangeLog:

* include/bits/atomic_wait.h: Include <stdint.h>.

libstdc++: Fix typo in comment

libstdc++-v3/ChangeLog:

* include/bits/stl_uninitialized.h: Fix typo in comment.

bpf: Corrected condition in core_mark_as_access_index.

gcc/ChangeLog:
* config/bpf/core-builtins.cc (core_mark_as_access_index):
Corrected check.

bpf: Delayed the removal of the parser enum plugin handler.

The parser plugin handler that is responsible for collecting enum values
information was being removed way too early.
bpf_resolve_overloaded_core_builtin is called by the parser.
It was moved to the function execute_lower_bpf_core.

gcc/ChangeLog:
* config/bpf/core-builtins.cc
(bpf_resolve_overloaded_core_builtin): Removed call.
(execute_lower_bpf_core): Added all to remove_parser_plugin.