which uses 6 instructions to perform this particular sign extension.
It turns out that sign extensions can always be implemented using at
most three instructions on ARC (without a barrel shifter) using the
idiom ((x&mask)^msb)-msb [as described in section "2-5 Sign Extension"
of Henry Warren's book "Hacker's Delight"]. Using this, the sign
extensions above on ARC's EM both become:
bmsk_s r0,r0,4
xor r0,r0,16
sub r0,r0,16
which takes about 3 cycles, compared to the ~112 cycles for the loops
in foo.
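For reference, a minimal C++ sketch of the idiom, assuming a 5-bit
field as in the sequence above (mask = 0x1f, msb = 0x10):

  // Sign-extend the low 5 bits of x via ((x & mask) ^ msb) - msb.
  // bmsk_s r0,r0,4 performs the "& 0x1f", then the xor/sub pair
  // applies "^ 0x10" and "- 0x10".
  int sign_extend5 (unsigned x)
  {
    const unsigned mask = 0x1f;  // keep bits 0..4
    const unsigned msb = 0x10;   // sign bit of the 5-bit field
    return (int) (((x & mask) ^ msb) - msb);
  }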
2023-12-13 Roger Sayle <roger@nextmovesoftware.com>
Jeff Law <jlaw@ventanamicro.com>
gcc/ChangeLog
* config/arc/arc.md (*extvsi_n_0): New define_insn_and_split to
implement SImode sign extract using an AND, XOR and MINUS sequence.
gcc/testsuite/ChangeLog
* gcc.target/arc/extvsi-1.c: New test case.
* gcc.target/arc/extvsi-2.c: Likewise.
This fixes issues reported by David Edelsohn <dje.gcc@gmail.com>, and by
Eric Gallager <egallager@gcc.gnu.org>.
ChangeLog:
* Makefile.def (gettext): Disable (via missing)
{install-,}{pdf,html,info,dvi} and TAGS targets. Set no_install
to true. Add --disable-threads --disable-libasprintf. Drop the
lib_path (as there are no shared libs).
* Makefile.in: Regenerate.
Juzhe-Zhong [Wed, 13 Dec 2023 05:48:11 +0000 (13:48 +0800)]
RISC-V: Postpone full available optimization [VSETVL PASS]
Fix a VSETVL bug where the AVL is polluted:
.L15:
li a3,9
lui a4,%hi(s)
sw a3,%lo(j)(t2)
sh a5,%lo(s)(a4) <-- a4 holds the address of s
beq t0,zero,.L42
sw t5,8(t4)
vsetvli zero,a4,e8,m8,ta,ma <<--- a4 as avl
Actually, this vsetvl is redundant.
The root cause is that we include the full-available optimization in the
LCM local data computation; the full-available optimization should
happen after the LCM computation.
PR target/112929
PR target/112988
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc
(pre_vsetvl::compute_lcm_local_properties): Remove full available.
(pre_vsetvl::pre_global_vsetvl_info): Add full available optimization.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/pr112929.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr112988.c: New test.
For scalable vectorization on a CPU with VLEN > 128 bits, the generated code is fine when VLEN = 128,
but for any larger VLEN it wastes CPU resources: with VLEN = 256 bits, for example,
only half of the vector units are working and the other half is idle.
After investigation, I realized that I forgot to adjust the COST for SELECT_VL.
So, adjust the COST for SELECT_VL-style length vectorization from 3 to 2, since
after this patch:
foo:
ble a2,zero,.L5
.L3:
vsetvli a5,a2,e16,m1,ta,ma -----> SELECT_VL cost.
vle8.v v2,0(a0)
slli a4,a5,1 -----> additional shift of outcome SELECT_VL for memory address calculation.
vzext.vf2 v1,v2
sub a2,a2,a5
vse16.v v1,0(a1)
add a0,a0,a5
add a1,a1,a4
bne a2,zero,.L3
.L5:
ret
This patch is a simple fix that I previously forgot.
Ok for trunk?
If not, I am going to adjust the cost in the backend cost model.
PR target/111317
gcc/ChangeLog:
* tree-vect-loop.cc (vect_estimate_min_profitable_iters): Adjust COST for decrement IV.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr111317.c: New test.
Jakub Jelinek [Wed, 13 Dec 2023 10:36:27 +0000 (11:36 +0100)]
lower-bitint: Fix lowering of non-_BitInt to _BitInt cast merged with some wider cast [PR112940]
The following testcase ICEs because a PHI argument from the latch edge
uses an SSA_NAME that is set only in a conditionally executed block
inside of the loop.
This happens when we have some outer cast which lowers its operand several
times (under one condition with a variable index, under a different
condition with some constant index, otherwise something else), and then
there is an inner cast from a non-_BitInt integer (or a small/middle one).
Such a cast is in certain conditions emitted by initializing some
SSA_NAMEs in the initialization statements before the loops (say, for
casts from <= limb size precision, by computing an SSA_NAME for the first
limb and then an extension of it for the later limbs) and by using the
prepare_data_in_out function to create a PHI node. That function is
passed the value (constant or SSA_NAME) to use in the PHI argument from
the pre-header edge, but for the latch edge it always created a new
SSA_NAME, and the caller then emitted, in the following 3 spots, an extra
assignment to set that SSA_NAME to whatever value we want from the latch
edge. In all these 3 cases the argument from the latch edge is already
known before the loop, though, either a constant or an SSA_NAME also
computed in the pre-header.
But the need to emit an assignment combined with the handle_operand done
in a conditional basic block results in the SSA verification failure.
The following patch fixes it by extending the prepare_data_in_out method,
so that when the latch edge argument is known beforehand (constant or
computed in the pre-header), we can just use it directly and avoid the
extra assignment that would otherwise hopefully be optimized away later
to what we now emit directly.
2023-12-13 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112940
* gimple-lower-bitint.cc (struct bitint_large_huge): Add another
argument to prepare_data_in_out method defaulted to NULL_TREE.
(bitint_large_huge::handle_operand): Pass another argument to
prepare_data_in_out instead of emitting an assignment to set it.
(bitint_large_huge::prepare_data_in_out): Add VAL_OUT argument.
If non-NULL, use it as PHI argument instead of creating a new
SSA_NAME.
(bitint_large_huge::handle_cast): Pass rext as another argument
to 2 prepare_data_in_out calls instead of emitting assignments
to set them.
Jakub Jelinek [Wed, 13 Dec 2023 10:35:20 +0000 (11:35 +0100)]
attribs: Fix valgrind failures on -Wno-attributes* tests [PR112953]
The r14-6076 change changed the allocation of attribute tables from
table = new attribute_spec[2];
to
table = new attribute_spec { ... };
with
ignored_attributes_table.safe_push (table);
later in both cases, but didn't change the corresponding delete in
free_attr_data, which means valgrind is unhappy about that:
FAIL: c-c++-common/Wno-attributes-2.c -Wc++-compat (test for excess errors)
Excess errors:
==974681== Mismatched free() / delete / delete []
==974681== at 0x484965B: operator delete[](void*) (vg_replace_malloc.c:1103)
==974681== by 0x707434: free_attr_data() (attribs.cc:318)
==974681== by 0xCFF8A4: compile_file() (toplev.cc:454)
==974681== by 0x704D23: do_compile (toplev.cc:2150)
==974681== by 0x704D23: toplev::main(int, char**) (toplev.cc:2306)
==974681== by 0x7064BA: main (main.cc:39)
==974681== Address 0x51dffa0 is 0 bytes inside a block of size 40 alloc'd
==974681== at 0x4845FF5: operator new(unsigned long) (vg_replace_malloc.c:422)
==974681== by 0x70A040: handle_ignored_attributes_option(vec<char*, va_heap, vl_ptr>*) (attribs.cc:301)
==974681== by 0x7FA089: handle_pragma_diagnostic_impl<false, false> (c-pragma.cc:934)
==974681== by 0x7FA089: handle_pragma_diagnostic(cpp_reader*) (c-pragma.cc:1028)
==974681== by 0x75814F: c_parser_pragma(c_parser*, pragma_context, bool*) (c-parser.cc:14707)
==974681== by 0x784A85: c_parser_external_declaration(c_parser*) (c-parser.cc:2027)
==974681== by 0x785223: c_parser_translation_unit (c-parser.cc:1900)
==974681== by 0x785223: c_parse_file() (c-parser.cc:26713)
==974681== by 0x7F6331: c_common_parse_file() (c-opts.cc:1301)
==974681== by 0xCFF87D: compile_file() (toplev.cc:446)
==974681== by 0x704D23: toplev::main(int, char**) (toplev.cc:2306)
==974681== by 0x7064BA: main (main.cc:39)
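For illustration, a minimal C++ reproduction of the kind of mismatch
valgrind flags here (spec is a hypothetical stand-in for attribute_spec):

  struct spec { int kind; };

  int main ()
  {
    spec *table = new spec { 1 };  // scalar new, as after r14-6076
    // delete[] table;             // mismatched delete[]: what valgrind flags
    delete table;                  // matching scalar delete: the fix
  }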
2023-12-13 Jakub Jelinek <jakub@redhat.com>
PR middle-end/112953
* attribs.cc (free_attr_data): Use delete x rather than delete[] x.
Jakub Jelinek [Wed, 13 Dec 2023 10:34:12 +0000 (11:34 +0100)]
i386: Fix ICE on __builtin_ia32_pabsd128 without lhs [PR112962]
The following patch fixes the ICE on the testcase in a similar way to how
other folded builtins are handled in ix86_gimple_fold_builtin when
they don't have a lhs; these builtins are const or pure, so normally
DCE would remove them later, but with -O0 that isn't guaranteed to
happen, and if they are marked TREE_SIDE_EFFECTS the expander might
still attempt to expand them.
This removes them right away during the folding.
Initially I wanted to also change all gsi_replace last args in that function
to true, but Andrew pointed to PR107209, so I've kept them as is.
2023-12-13 Jakub Jelinek <jakub@redhat.com>
PR target/112962
* config/i386/i386.cc (ix86_gimple_fold_builtin): For shifts
and abs without lhs replace with nop.
Richard Biener [Wed, 13 Dec 2023 08:05:59 +0000 (09:05 +0100)]
Avoid losing MEM_REF offset in MEM_EXPR adjustment for stack slot sharing
When investigating PR111591 with respect to TBAA and stack slot sharing
I noticed we're eventually scrapping a [TARGET_]MEM_REF offset when
rewriting the VAR_DECL base of the MEM_EXPR to use a pointer to the
partition instead. The following makes sure to preserve that.
* emit-rtl.cc (set_mem_attributes_minus_bitpos): Preserve
the offset when rewriting an existing MEM_REF base for
stack slot sharing.
Richard Biener [Wed, 13 Dec 2023 07:45:58 +0000 (08:45 +0100)]
tree-optimization/112991 - re-do PR112961 fix
The following does away with the fake edge adding as in the original
PR112961 fix and instead exposes handling of entry PHIs as additional
parameter of the region VN run.
Richard Biener [Wed, 13 Dec 2023 08:38:59 +0000 (09:38 +0100)]
tree-optimization/112990 - unsupported VEC_PERM from match pattern
The following avoids creating an unsupported VEC_PERM after vector
lowering from the pattern merging a bit-insert from a bit-field-ref
to a VEC_PERM. For the already existing s390 testcase we get
TImode vectors which later ICE during attempted expansion of
a vec_perm_const.
PR tree-optimization/112990
* match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..):
Restrict to vector modes after lowering.
Richard Biener [Wed, 13 Dec 2023 07:54:49 +0000 (08:54 +0100)]
middle-end/111591 - explain why TBAA doesn't need adjustment
While tidying the prototype patch I had done for the reduced testcase
in PR111591, and in the process trying to produce a testcase that
is miscompiled by stack slot coalescing while its TBAA info remains
unaltered, I realized we do not need to adjust TBAA info.
The following documents this in the place we adjust points-to info
which we do need to adjust.
PR middle-end/111591
* cfgexpand.cc (update_alias_info_with_stack_vars): Document
why not adjusting TBAA info on accesses is OK.
Alexandre Oliva [Wed, 13 Dec 2023 04:31:41 +0000 (01:31 -0300)]
multiflags: fix doc warning properly
Rather than a dubious fix for a dubious warning, namely adding a
period after a parenthesized @xref because the warning demands it, use
@pxref that is meant for exactly this case. Thanks to Joseph Myers
for introducing me to it.
for gcc/ChangeLog
* doc/invoke.texi (multiflags): Drop extraneous period, use
@pxref instead.
aarch64: Implement the ACLE instruction/data prefetch functions.
Implement the ACLE data and instruction prefetch functions[1] with the
following signatures:
1. Data prefetch intrinsics:
----------------------------
void __pldx (/*constant*/ unsigned int /*access_kind*/,
/*constant*/ unsigned int /*cache_level*/,
/*constant*/ unsigned int /*retention_policy*/,
void const volatile *addr);
void __pld (void const volatile *addr);
2. Instruction prefetch intrinsics:
-----------------------------------
void __plix (/*constant*/ unsigned int /*cache_level*/,
/*constant*/ unsigned int /*retention_policy*/,
void const volatile *addr);
void __pli (void const volatile *addr);
`__pldx' affords the programmer more fine-grained control over the
data prefetch behaviour than the analogous GCC builtin
`__builtin_prefetch', and allows access to the "SLC" cache level.
While `__builtin_prefetch' chooses both cache-level and retention
policy automatically via the optional `locality' parameter, `__pldx'
expects 2 (mandatory) arguments to explicitly define the desired
cache-level and retention policies.
`__plix', on the other hand, generates a code prefetch instruction and
so extends functionality on aarch64 targets beyond that which is
exposed by `__builtin_prefetch'.
`__pld' and `__pli' do prefetch of data and instructions,
respectively, using default values for both cache-level and retention
policies.
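A hedged usage sketch (the numeric arguments are illustrative
placeholders, not the documented ACLE enumeration values, and the
declarations above are assumed to be in scope, e.g. via arm_acle.h):

  void warm_caches (const volatile void *addr)
  {
    __pld (addr);   // data prefetch, default cache level and policy
    __pli (addr);   // instruction prefetch, default policies
    // Explicit control: access kind, cache level, retention policy.
    __pldx (0, 1, 0, addr);
    __plix (1, 0, addr);
  }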
Bootstrapped and tested on aarch64-none-linux-gnu.
Kewen Lin [Wed, 13 Dec 2023 02:39:34 +0000 (20:39 -0600)]
range: Workaround different type precision between _Float128 and long double [PR112788]
As PR112788 shows, on rs6000 with -mabi=ieeelongdouble the type _Float128
has a different type precision (128) from that (127) of type long
double, but they actually have the same underlying mode, so they have
the same precision in the sense that the mode indicates the same real
type format ieee_quad_format.
It's not sensible to have two such types with the same mode but
different type precisions; a fix attempt was posted at [1].
As the discussion there shows, there are some historical reasons and
practical issues. Considering we passed stage 1 and it also affected
the build as reported, this patch temporarily works around the issue.
I thought about introducing a hook, but that seems a bit overkill;
assuming that scalar float types with the same mode have the same
precision looks sensible.
* value-range.h (range_compatible_p): Workaround same type mode but
different type precision issue for rs6000 scalar float types
_Float128 and long double.
Jiufu Guo [Wed, 13 Dec 2023 00:10:25 +0000 (08:10 +0800)]
rs6000: using pli for constant splitting
When building a constant such as r120=0x66666666, which does not fit 'li'
or 'lis', 'pli' is used to build the constant via 'emit_move_insn'.
But for a complicated constant, e.g. 0x6666666666666666ULL, when
'rs6000_emit_set_long_const' splits the constant recursively, it fails to
use 'pli' to build the half-part constant 0x66666666.
'rs6000_emit_set_long_const' can be updated to use 'pli' to build half
of the constant when necessary. For example, for 0x6666666666666666ULL,
"pli 3,1717986918; rldimi 3,3,32,0" can be used.
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add code to use
pli for 34-bit constants.
Jiufu Guo [Wed, 13 Dec 2023 00:10:25 +0000 (08:10 +0800)]
rs6000: accurate num_insns_constant_gpr
Trunk GCC supports building more constants via two instructions,
e.g. "li/lis; xori/xoris/rldicl/rldicr/rldic",
so num_insns_constant should also be updated.
The function "rs6000_emit_set_long_const" is used to build complicated
constants, and "num_insns_constant_gpr" is used to compute how many
instructions are needed to build a constant, so these two functions
should be kept aligned.
The idea of this patch is to reuse "rs6000_emit_set_long_const" to
compute/record the instruction count (when computing the count, no
instructions are emitted).
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add new
parameter to record number of instructions to build the constant.
(num_insns_constant_gpr): Call rs6000_emit_set_long_const to compute
num_insn.
Juzhe-Zhong [Tue, 12 Dec 2023 14:25:52 +0000 (22:25 +0800)]
RISC-V: Apply vla vs. vls mode heuristic vector COST model
This patch applies a VLA vs. VLS mode heuristic, which fixes the following FAILs:
FAIL: gcc.target/riscv/rvv/autovec/pr111751.c -O3 -ftree-vectorize
scan-assembler-not vset
FAIL: gcc.target/riscv/rvv/autovec/pr111751.c -O3 -ftree-vectorize
scan-assembler-times li\\s+[a-x0-9]+,0\\s+ret 2
The root cause of these FAILs is that we failed to pick a VLS mode for the vectorization.
The heuristic leverages the ARM SVE approach; it is fully tested, and we confirmed
that we have the same behavior as ARM SVE GCC and RVV Clang.
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (costs::analyze_loop_vinfo): New function.
(costs::record_potential_vls_unrolling): Ditto.
(costs::prefer_unrolled_loop): Ditto.
(costs::better_main_loop_than_p): Ditto.
(costs::add_stmt_cost): Ditto.
* config/riscv/riscv-vector-costs.h (enum cost_type_enum): New enum.
* config/riscv/t-riscv: Add new include files.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr111313.c: Adapt test.
* gcc.target/riscv/rvv/autovec/vls/shift-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-2.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-3.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-4.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-5.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-6.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-7.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-8.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-9.c: New test.
Jonathan Wakely [Tue, 12 Dec 2023 20:53:08 +0000 (20:53 +0000)]
libstdc++: Fix std::format("{}", 'c')
When I added a fast path for std::format("{}", x) in r14-5587-g41a5ea4cab2c59 I forgot to handle char separately from other
integral types. That caused std::format("{}", 'c') to return "99"
instead of "c".
libstdc++-v3/ChangeLog:
* include/std/format (__do_vformat_to): Handle char separately
from other integral types.
* testsuite/std/format/functions/format.cc: Check for expected
output for char and bool arguments.
* testsuite/std/format/string.cc: Check that 0 filling is
rejected for character and string formats.
Jonathan Wakely [Mon, 11 Dec 2023 15:33:59 +0000 (15:33 +0000)]
libstdc++: Fix std::format output of %C for negative years
During discussion of LWG 4022 I noticed that we do not correctly
implement floored division for the century. We were just truncating
towards zero, rather than applying the floor function. For negative
values that rounds the wrong way.
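A small C++ sketch of the difference, given that %C is the year divided
by 100 using floored division:

  #include <cstdio>

  int century (int year)
  {
    int c = year / 100;  // C++ integer division truncates toward zero
    if (year % 100 < 0)  // for negative remainders, floor is one lower
      --c;
    return c;
  }

  int main ()
  {
    // Truncation gives -1; the correct floored century is -2.
    std::printf ("%d %d\n", -101 / 100, century (-101));
  }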
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (__formatter_chrono::_M_C_y_Y): Fix
rounding for negative centuries.
* testsuite/std/time/year/io.cc: Check %C for negative years.
Jonathan Wakely [Tue, 12 Dec 2023 14:54:36 +0000 (14:54 +0000)]
libstdc++: Remove redundant -std flags from Makefile
In r14-4060-gc4baeaecbbf7d0 I moved some files from src/c++98 to
src/c++11 but I didn't remove the redundant -std=gnu++11 flags for those
files. The flags aren't needed now, because AM_CXXFLAGS for that
directory already uses -std=gnu++11. This removes them.
Martin Jambor [Tue, 12 Dec 2023 20:19:21 +0000 (21:19 +0100)]
SRA: Force gimple operand in an additional corner case (PR 112822)
PR 112822 revealed a corner case in load_assign_lhs_subreplacements
where it creates invalid gimple: an assignment where on the LHS there
is a complex variable which however is not a gimple register because
it has partial defs and on the right hand side there is a
VIEW_CONVERT_EXPR. This patch invokes force_gimple_operand_gsi on
such statements (like it already does when both sides of a generated
assignment have partial definitions).
gcc/ChangeLog:
2023-12-12 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/112822
* tree-sra.cc (load_assign_lhs_subreplacements): Invoke
force_gimple_operand_gsi also when LHS has partial stores and RHS is a
VIEW_CONVERT_EXPR.
Szabolcs Nagy [Thu, 15 Jun 2023 16:15:09 +0000 (17:15 +0100)]
aarch64,arm: Fix branch-protection= parsing
Refactor the parsing to have a single API and fix a few parsing issues:
- Different handling of "bti+none" and "none+bti": these should be
rejected because "none" can only appear alone.
- Empty strings such as "bti++pac-ret" or "bti+" were accepted; this
bug was caused by using strtok_r.
- Memory was leaked (str_root was never freed), and two buffers were
allocated when one is enough.
The callbacks now have no failure mode, only parsing can fail and
all failures are handled locally. The "-mbranch-protection=" vs
"target("branch-protection=")" difference in the error message is
handled by a separate argument to aarch_validate_mbranch_protection.
Richard Biener [Mon, 11 Dec 2023 13:39:48 +0000 (14:39 +0100)]
tree-optimization/112736 - avoid overread with non-grouped SLP load
The following avoids over/under-read of storage when vectorizing
a non-grouped load with SLP. Instead of forcing peeling for gaps
use a smaller load for the last vector which might access excess
elements. This builds upon the existing optimization avoiding
peeling for gaps, generalizing it to all gap widths leaving a
power-of-two remaining number of elements (but it doesn't replace
or improve that particular case at this point).
I wonder if the poly relational compares I set up are good enough
to guarantee /* remain should now be > 0 and < nunits. */.
There is existing test coverage that runs into /* DR will be unused. */
always when the gap is wider than nunits. Compared to the
existing gap == nunits/2 case this only adjusts the load that will
cause the overrun at the end, not every load. Apart from the
poly relational compares it should reliably cover these cases but
I'll leave it for stage1 to remove.
PR tree-optimization/112736
* tree-vect-stmts.cc (vectorizable_load): Extend optimization
to avoid peeling for gaps to handle single-element non-groups
we now allow with SLP.
Richard Biener [Mon, 11 Dec 2023 09:08:24 +0000 (10:08 +0100)]
ipa/92606 - properly handle no_icf attribute for variables
The following adds no_icf handling for variables where the attribute
was rejected. It also fixes the check for no_icf by checking both
the source and the targets decl.
PR ipa/92606
gcc/c-family/
* c-attribs.cc (handle_noicf_attribute): Also allow the
attribute on global variables.
gcc/
* ipa-icf.cc (sem_item_optimizer::merge_classes): Check
both source and alias for the no_icf attribute.
* doc/extend.texi (no_icf): Document variable attribute.
Richard Biener [Tue, 12 Dec 2023 13:01:47 +0000 (14:01 +0100)]
tree-optimization/112961 - include latch in if-conversion CSE
The following makes sure to also process the (empty) latch when
performing CSE on the if-converted loop body. That's important
to get all uses of copies propagated out on the backedge as well.
To avoid CSE on the PHI nodes itself which is prohibitive
(see PR90402) this temporarily adds a fake entry edge to the loop.
PR tree-optimization/112961
* tree-if-conv.cc (tree_if_conversion): Instead of excluding
the latch block from VN, add a fake entry edge.
Xi Ruoyao [Fri, 24 Nov 2023 03:08:19 +0000 (11:08 +0800)]
Only allow (int)trunc(x) to (int)x simplification with -ffp-int-builtin-inexact [PR107723]
With -fno-fp-int-builtin-inexact, trunc is not allowed to raise
FE_INEXACT and it should produce an integral result (if the input is not
NaN or Inf). Thus FE_INEXACT should not be raised.
But (int)x may raise FE_INEXACT when x is a non-integer, non-NaN, and
non-Inf value; C23 recommends doing so in a footnote.
Thus we should not simplify (int)trunc(x) to (int)x if
-fno-fp-int-builtin-inexact is in effect.
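A C++ sketch of the distinction (a hedged illustration; strictly
observing the FP environment would also require #pragma STDC
FENV_ACCESS ON, which GCC only partially supports):

  #include <cfenv>
  #include <cmath>
  #include <cstdio>

  int main ()
  {
    volatile double x = 1.5;

    std::feclearexcept (FE_INEXACT);
    volatile int a = (int) std::trunc (x);  // exact; must stay clean
    std::printf ("trunc path inexact: %d\n",
                 std::fetestexcept (FE_INEXACT) != 0);

    std::feclearexcept (FE_INEXACT);
    volatile int b = (int) x;  // may raise FE_INEXACT (C23 footnote)
    std::printf ("direct path inexact: %d\n",
                 std::fetestexcept (FE_INEXACT) != 0);
    (void) a; (void) b;
    return 0;
  }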
gcc/ChangeLog:
PR middle-end/107723
* convert.cc (convert_to_integer_1) [case BUILT_IN_TRUNC]: Break
early if !flag_fp_int_builtin_inexact and flag_trapping_math.
gcc/testsuite/ChangeLog:
PR middle-end/107723
* gcc.dg/torture/builtin-fp-int-inexact-trunc.c: New test.
Pan Li [Tue, 12 Dec 2023 08:19:12 +0000 (16:19 +0800)]
RISC-V: Disable RVV VCOMPRESS avl propagation
This patch disables AVL propagation for vcompress for the following
reason: according to the ISA, the first vl elements of vector register
group vs2 should be extracted and packed for vcompress, and the highest
element of the vs2 vector may be touched by the mask, an element which
could be eliminated by AVL propagation.
Jakub Jelinek [Tue, 12 Dec 2023 07:57:02 +0000 (08:57 +0100)]
libquadmath: Restore linking against -lm on most targets [PR112963]
The r14-4825 change added AC_CHECK_LIBM to libquadmath configure.ac and
replaced unconditional linking with -lm with linking with $(LIBM)
determined by that.
Unfortunately that broke bare metal targets because AC_CHECK_LIBM attempts
to link against -lm and this was after (unconditional) GCC_NO_EXECUTABLES.
Then r14-4863 partially reverted that change (no longer AC_CHECK_LIBM),
but didn't revert the Makefile.am change of -lm to $(LIBM), which had
the effect that libquadmath is not linked against -lm on any arch.
That is a serious problem though e.g. on Linux, because libquadmath calls
a few libm entrypoints and e.g. on powerpc64le the underlinking can cause
crashes in IFUNC resolvers of libm.
Instead of adding further reversion of the r14-4825 commit and use -lm
unconditionally again, this patch adds an AC_CHECK_LIBM-like substitution
with the *-ncr-sysv4.3* target handling removed (I think we don't support
such targets, especially not in libquadmath) and with the default case
replaced by simple using -lm. That is something in between using -lm
unconditionally and what AC_CHECK_LIBM does if it would work on bare metal
- we know from GCC 13 and earlier that we can link -lm on all targets
libquadmath is built for, and just whitelist a couple of targets which
we know don't have separate -lm and don't want to link against that
(like Darwin, Cygwin, ...).
2023-12-12 Jakub Jelinek <jakub@redhat.com>
PR libquadmath/112963
* configure.ac (LIBM): Readd AC_CHECK_LIBM-like check without doing
AC_CHECK_LIB in it.
* configure: Regenerated.
* Makefile.in: Regenerated.
Xi Ruoyao [Sat, 9 Dec 2023 14:08:37 +0000 (22:08 +0800)]
LoongArch: Fix warnings building libgcc
We are excluding loongarch-opts.h from target libraries, but now struct
loongarch_target and gcc_options are not declared in the target
libraries, causing:
In file included from ../.././gcc/options.h:8,
from ../.././gcc/tm.h:49,
from ../../../gcc/libgcc/fixed-bit.c:48:
../../../gcc/libgcc/../gcc/config/loongarch/loongarch-opts.h:57:41:
warning: 'struct gcc_options' declared inside parameter list will not
be visible outside of this definition or declaration
57 | struct gcc_options *opts,
| ^~~~~~~~~~~
So exclude the declarations referring to the C++ structs as well.
gcc/ChangeLog:
* config/loongarch/loongarch-opts.h (la_target): Move into #if
for loongarch-def.h.
(loongarch_init_target): Likewise.
(loongarch_config_target): Likewise.
(loongarch_update_gcc_opt_status): Likewise.
Xi Ruoyao [Thu, 7 Dec 2023 07:45:30 +0000 (15:45 +0800)]
LoongArch: Allow -mcmodel=extreme and model attribute with -mexplicit-relocs=auto
There seems to be no real reason to require -mexplicit-relocs=always for
-mcmodel=extreme or the model attribute. As the linker does not know how
to relax a 3-operand la.local or la.global pseudo instruction, just emit
explicit relocs for SYMBOL_PCREL64, and under TARGET_CMODEL_EXTREME also
for SYMBOL_GOT_DISP.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
Return true for SYMBOL_PCREL64. Return true for SYMBOL_GOT_DISP
if TARGET_CMODEL_EXTREME.
(loongarch_split_symbol): Check for la_opt_explicit_relocs !=
EXPLICIT_RELOCS_NONE instead of TARGET_EXPLICIT_RELOCS.
(loongarch_print_operand_reloc): Likewise.
(loongarch_option_override_internal): Likewise.
(loongarch_handle_model_attribute): Likewise.
* doc/invoke.texi (-mcmodel=extreme): Update the compatibility
between it and -mexplicit-relocs=.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/attr-model-3.c: New test.
* gcc.target/loongarch/attr-model-4.c: New test.
* gcc.target/loongarch/func-call-extreme-3.c: New test.
* gcc.target/loongarch/func-call-extreme-4.c: New test.
Richard Biener [Mon, 11 Dec 2023 12:00:18 +0000 (13:00 +0100)]
tree-optimization/112939 - VN PHI visiting and -ftrivial-auto-var-init
The following builds upon the last fix, making sure we only value-number
to visited (un-)defs, otherwise prefer .VN_TOP.
PR tree-optimization/112939
* tree-ssa-sccvn.cc (visit_phi): When all args are undefined
make sure we end up with a value that was visited, otherwise
fall back to .VN_TOP.
liuhongt [Thu, 7 Dec 2023 01:17:27 +0000 (09:17 +0800)]
Don't assume it's AVX_U128_CLEAN after a call_insn whose abi.mode_clobber(V4DImode) doesn't contain all SSE_REGS.
If the function desn't clobber any sse registers or only clobber
128-bit part, then vzeroupper isn't issued before the function exit.
the status not CLEAN but ANY after the function.
Also for sibling_call, it's safe to issue an vzeroupper. Also there
could be missing vzeroupper since there's no mode_exit for
sibling_call_p.
gcc/ChangeLog:
PR target/112891
* config/i386/i386.cc (ix86_avx_u128_mode_after): Return
AVX_U128_ANY if callee_abi doesn't clobber all_sse_regs to
align with ix86_avx_u128_mode_needed.
(ix86_avx_u128_mode_needed): Return AVX_U128_CLEAN for
sibling_call.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr112891.c: New test.
* gcc.target/i386/pr112891-2.c: New test.
Alexandre Oliva [Tue, 12 Dec 2023 04:12:04 +0000 (01:12 -0300)]
untyped calls: enable target switching [PR112334]
The computation of apply_args_size and apply_result_size is saved in a
static variable, so that the corresponding _mode arrays are
initialized only once. That is not compatible with switchable
targets, and ARM's arm_set_current_function, by saving and restoring
target globals, exercises this problem with a testcase such as that in
the PR, in which more than one function in the translation unit calls
__builtin_apply or __builtin_return, respectively.
This patch moves the _size statics into the target_builtins array,
with a bit of ugliness over _plus_one so that zero initialization of
the struct does the right thing.
for gcc/ChangeLog
PR target/112334
* builtins.h (target_builtins): Add fields for apply_args_size
and apply_result_size.
* builtins.cc (apply_args_size, apply_result_size): Cache
results in fields rather than in static variables.
(get_apply_args_size, set_apply_args_size): New.
(get_apply_result_size, set_apply_result_size): New.
Hongyu Wang [Mon, 11 Dec 2023 11:30:42 +0000 (19:30 +0800)]
i386: Fix missed APX_NDD check for shift/rotate expanders [PR 112943]
The ashl/lshr/ashr expanders call ix86_expand_binary_operator, and
they can also be called for some post-reload splits; TARGET_APX_NDD is
required for these calls to avoid forcing a load to memory at the
post-reload stage.
gcc/ChangeLog:
PR target/112943
* config/i386/i386.md (ashl<mode>3): Add TARGET_APX_NDD to
ix86_expand_binary_operator call.
(<insn><mode>3): Likewise for rshift.
(<insn>di3): Likewise for DImode rotate.
(<insn><mode>3): Likewise for SWI124 rotate.
gcc/testsuite/ChangeLog:
PR target/112943
* gcc.target/i386/pr112943.c: New test.
Feng Wang [Mon, 11 Dec 2023 01:14:17 +0000 (01:14 +0000)]
RISC-V: Add avail interface into function_group_info
Patch v3: Fix typo and remove the modification of rvv.exp.
Patch v2: Use a variadic macro and add the dependency to t-riscv.
In order to add other vector extensions, this patch adds
unsigned int (*avail) (void) to function_group_info, to determine
whether to register an intrinsic based on the ISA info.
gcc/ChangeLog:
Yang Yujie [Fri, 8 Dec 2023 10:01:18 +0000 (18:01 +0800)]
LoongArch: Fix eh_return epilogue for normal returns.
On LoongArch, the registers $r4 - $r7 (EH_RETURN_DATA_REGNO) will be saved
and restored in the function prologue and epilogue if the given function calls
__builtin_eh_return. This causes the return value to be overwritten on normal
return paths and breaks a rare case of libgcc's _Unwind_RaiseException.
gcc/ChangeLog:
* config/loongarch/loongarch.cc: Do not restore the saved eh_return
data registers ($r4-$r7) for a normal return of a function that calls
__builtin_eh_return elsewhere.
* config/loongarch/loongarch-protos.h: Same.
* config/loongarch/loongarch.md: Same.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/eh_return-normal-return.c: New test.
Jakub Jelinek [Mon, 11 Dec 2023 22:52:46 +0000 (23:52 +0100)]
testsuite: Disable -fstack-protector* for some strub tests
In our distro builds, we test with
RUNTESTFLAGS='--target_board=unix\{,-fstack-protector-strong\}'
because SSP is something we use widely in the distribution.
4 new strub tests FAIL with that option though, as can be
seen with a simple
make check-gcc check-g++ RUNTESTFLAGS='--target_board=unix\{,-fstack-protector-strong\} dg.exp=strub-O*'
- in particular, the expand dump regexps
\[(\]call\[^\n\]*strub_leave.*\n\[(\]code_label
see code_labels in there introduced for the stack protector.
The following patch fixes it by using -fno-stack-protector for these
explicitly.
Martin Uecker [Wed, 15 Nov 2023 08:22:55 +0000 (09:22 +0100)]
Fix regression causing ICE for structs with VLAs [PR 112488]
A previous patch that fixed several ICEs related to size expressions
of VM types (PR c/70418, ...) caused a regression for structs where
a DECL_EXPR is not generated anymore although required. We now call
add_decl_expr, introduced by the previous patch, from finish_struct.
The function is revised with a new argument to not set the TYPE_NAME
for the type to the DECL_EXPR in this specific case.
{+FAIL: gfortran.dg/gomp/allocate-5.f90 -O (internal compiler error: tree check: expected class 'type', have 'declaration' (function_decl) in gfc_omp_call_is_alloc, at fortran/trans-openmp.cc:8386)+}
[-PASS:-]{+FAIL:+} gfortran.dg/gomp/allocate-5.f90 -O (test for excess errors)
..., and similarly in 'libgomp.fortran/allocators-1.f90',
'libgomp.fortran/allocators-2.f90', 'libgomp.fortran/allocators-3.f90',
'libgomp.fortran/allocators-4.f90', 'libgomp.fortran/allocators-5.f90'.
check_asm_operands was inconsistent about how it handled "p"
after RA compared to before RA. Before RA it tested the address
with a void (unknown) memory mode:
case CT_ADDRESS:
/* Every address operand can be reloaded to fit. */
result = result || address_operand (op, VOIDmode);
break;
After RA it deferred to constrain_operands, which used the mode
of the operand:
Using the mode of the operand is necessary for special predicates,
where it is used to give the memory mode. But for asms, the operand
mode is simply the mode of the address itself (so DImode on 64-bit
targets), which doesn't say anything about the addressed memory.
This patch uses VOIDmode for asms but continues to use the operand
mode for .md insns. It's needed to avoid a regression in the
testcase with the late-combine pass.
Fixing this made me realise that recog_level2 was doing duplicate
work for asms after RA.
gcc/
* recog.cc (constrain_operands): Pass VOIDmode to
strict_memory_address_p for 'p' constraints in asms.
* rtl-ssa/changes.cc (recog_level2): Skip redundant constrain_operands
for asms.
gcc/testsuite/
* gcc.target/aarch64/prfm_imm_offset_2.c: New test.
Jason Merrill [Mon, 11 Dec 2023 19:05:48 +0000 (14:05 -0500)]
testsuite: update mangling
Since r14-6064-gc3f281a0c1ca50 this test was checking for the wrong
mangling, but it still passed on targets that support ABI compatibility
aliases. Let's avoid generating those aliases when checking mangling.
gcc/ChangeLog:
* common.opt: Add comment.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-explicit-inst1.C: Specify ABI v18.
* g++.dg/cpp2a/concepts-explicit-inst1a.C: New test.
smallest_int_mode_for_size may abort when the requested mode is not
available. Call int_mode_for_size instead, which signals the
unsatisfiable request in a more graceful way.
Alexandre Oliva [Mon, 11 Dec 2023 18:09:25 +0000 (15:09 -0300)]
-finline-stringops: check base blksize for memset [PR112778]
The recently-added logic for -finline-stringops=memset introduced an
assumption that doesn't necessarily hold, namely, that
can_store_by_pieces of a larger size implies can_store_by_pieces by
smaller sizes. Checks for all sizes the by-multiple-pieces machinery
might use before committing to an expansion pattern.
for gcc/ChangeLog
PR target/112778
* builtins.cc (can_store_by_multiple_pieces): New.
(try_store_by_multiple_pieces): Call it.
Alexandre Oliva [Mon, 11 Dec 2023 18:09:16 +0000 (15:09 -0300)]
strub: disable on rl78
rl78 allocation of virtual registers to physical registers doesn't
operate on asm statements, and strub uses asm statements in the
runtime and in the generated code, to the point that the runtime
won't build. Force strub disabled on that target.
Lipeng Zhu [Sat, 9 Dec 2023 15:39:45 +0000 (10:39 -0500)]
libgfortran: Replace mutex with rwlock
This patch introduces an rwlock and splits the read/write accesses to
the unit_root tree and unit_cache, using the rwlock instead of the mutex,
to increase CPU efficiency. In the get_gfc_unit function, the percentage
of calls that step into the insert_unit function is around 30%; in most
instances, we can get the unit in the phase of reading the unit_cache or
unit_root tree. So splitting the read/write phases with an rwlock is an
approach to make it more parallel.
BTW, the IPC metric gains around 9x on our test
server with 220 cores. The benchmark we used is
https://github.com/rwesson/NEAT
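A self-contained C++ sketch of the read-mostly locking pattern the patch
applies (a std::map stands in for the unit_root tree and unit_cache;
compile with -pthread):

  #include <pthread.h>
  #include <map>

  static pthread_rwlock_t unit_rwlock = PTHREAD_RWLOCK_INITIALIZER;
  static std::map<int, int> units;

  int *get_unit (int n)
  {
    // Fast path: concurrent readers share the lock (~70% of calls).
    pthread_rwlock_rdlock (&unit_rwlock);
    auto it = units.find (n);
    int *u = it != units.end () ? &it->second : nullptr;
    pthread_rwlock_unlock (&unit_rwlock);
    if (u)
      return u;
    // Slow path: take the exclusive write lock and insert.
    pthread_rwlock_wrlock (&unit_rwlock);
    u = &units[n];  // find-or-insert under the write lock
    pthread_rwlock_unlock (&unit_rwlock);
    return u;
  }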
libgcc/ChangeLog:
* gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
(__gthrw): New function.
(__gthread_rwlock_rdlock): New function.
(__gthread_rwlock_tryrdlock): New function.
(__gthread_rwlock_wrlock): New function.
(__gthread_rwlock_trywrlock): New function.
(__gthread_rwlock_unlock): New function.
libgfortran/ChangeLog:
* io/async.c (DEBUG_LINE): New macro.
* io/async.h (RWLOCK_DEBUG_ADD): New macro.
(CHECK_RDLOCK): New macro.
(CHECK_WRLOCK): New macro.
(TAIL_RWLOCK_DEBUG_QUEUE): New macro.
(IN_RWLOCK_DEBUG_QUEUE): New macro.
(RDLOCK): New macro.
(WRLOCK): New macro.
(RWUNLOCK): New macro.
(RD_TO_WRLOCK): New macro.
(INTERN_RDLOCK): New macro.
(INTERN_WRLOCK): New macro.
(INTERN_RWUNLOCK): New macro.
* io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
a comment.
(unit_lock): Remove including associated internal_proto.
(unit_rwlock): New declarations including associated internal_proto.
(dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
instead of __gthread_mutex_lock and __gthread_mutex_unlock on
unit_lock.
* io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
unit_rwlock instead of LOCK and UNLOCK on unit_lock.
(st_write_done_worker): Likewise.
* io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
comment. Use unit_rwlock variable instead of unit_lock variable.
(get_gfc_unit_from_unit_root): New function.
(get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
LOCK and UNLOCK on unit_lock.
(close_units): Likewise.
(newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
unit_lock.
* io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
of LOCK and UNLOCK on unit_lock.
Roger Sayle [Mon, 11 Dec 2023 17:30:20 +0000 (17:30 +0000)]
PR rtl-optimization/112380: Defend against CLOBBERs in combine.cc
This patch addresses PR rtl-optimization/112380, an ICE-on-valid regression
where a (clobber (const_int 0)) encounters a sanity checking gcc_assert
(at line 7554) in simplify-rtx.cc. These CLOBBERs are used internally
by GCC's combine pass much like error_mark_node is used by various
language front-ends.
The solutions are either to handle/accept these CLOBBERs throughout
(or in more places in) the middle-end's RTL optimizers, including functions
in simplify-rtx.cc that are used by passes other than combine, and/or
attempt to prevent these CLOBBERs escaping from try_combine into the
RTX/RTL stream. The benefit of the second approach is that it actually
allows for better optimization: when try_combine fails to simplify an
expression instead of substituting a CLOBBER to avoid the instruction
pattern being recognized, noticing the CLOBBER often allows combine
to attempt alternate simplifications/transformations looking for those
that can be recognized.
This first alternative is the minimal fix to address the CLOBBER
encountered in the bugzilla PR.
2023-12-11 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR rtl-optimization/112380
* combine.cc (expand_field_assignment): Check if gen_lowpart
returned a CLOBBER, and avoid calling simplify_gen_binary with
it if so.
gcc/testsuite/ChangeLog
PR rtl-optimization/112380
* gcc.dg/pr112380.c: New test case.
Andrew Pinski [Sun, 10 Dec 2023 04:02:24 +0000 (20:02 -0800)]
aarch64: Fix wrong code for bfloat when f16 is enabled [PR 111867]
The problem here is that when f16 is enabled, movbf_aarch64 accepts `Ufc`
as a constraint:
[ w , Ufc ; fconsts , fp16 ] fmov\t%h0, %1
But that constraint is for fmov values, and in this case the fmov
represents an f16 rather than a bfloat16 value.
This means we would get the wrong value in the register.
Built and tested for aarch64-linux-gnu with no regressions. Also tested with
`-march=armv9-a+sve2'; gcc.dg/torture/bfloat16-basic.c and
gcc.dg/torture/bfloat16-builtin.c no longer fail.
gcc/ChangeLog:
PR target/111867
* config/aarch64/aarch64.cc (aarch64_float_const_representable_p): For BFmode,
only accept +0.0.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Sat, 11 Nov 2023 23:54:10 +0000 (15:54 -0800)]
MATCH: (convert)(zero_one !=/== 0/1) for outer type and zero_one type are the same
When I moved two_value to match.pd, I removed the check for {0,+-1}
as I had placed it after the {0,+-1} case for cond in match.pd.
In the case of {0,+-1} and a non-boolean, we previously optimized those
cases to just `(convert)a`, but afterwards we would get
`(convert)(a != 0)`, which was not being further simplified to just
`(convert)a`.
So this adds a pattern to match `(convert)(zeroone != 0)` and simplify
it to `(convert)zeroone`.
Also, this optimizes (convert)(zeroone == 0) into (zeroone^1) if the
types match, removing the opposite transformation from fold.
The opposite transformation was added with
https://gcc.gnu.org/pipermail/gcc-patches/2006-February/190514.html
It is no longer considered the canonical form either; even VRP will
transform it back into `(~a) & 1`, so removing it is a good idea.
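A small C++ sketch of the two simplifications, assuming z is known to
be 0 or 1:

  int f1 (int z) { return (int) (z != 0); }  // simplifies to z
  int f2 (int z) { return (int) (z == 0); }  // simplifies to z ^ 1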
Note the testcase pr69270.c needed a slight update due to not matching
exactly a scan pattern, this update makes it more robust and will match
before and afterwards and if there are other changes in this area too.
Note the testcase gcc.target/i386/pr110790-2.c needs a slight update
for better code generation in LP64 bit mode.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
PR tree-optimization/111972
PR tree-optimization/110637
* match.pd (`(convert)(zeroone !=/== CST)`): Match
and simplify to ((convert)zeroone){,^1}.
* fold-const.cc (fold_binary_loc): Remove
transformation of `(~a) & 1` and `(a ^ 1) & 1`
into `(convert)(a == 0)`.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr110637-1.c: New test.
* gcc.dg/tree-ssa/pr110637-2.c: New test.
* gcc.dg/tree-ssa/pr110637-3.c: New test.
* gcc.dg/tree-ssa/pr111972-1.c: New test.
* gcc.dg/tree-ssa/pr69270.c: Update testcase.
* gcc.target/i386/pr110790-2.c: Update testcase.
* gcc.dg/fold-even-1.c: Removed.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Tue, 11 Jul 2023 04:53:24 +0000 (21:53 -0700)]
analyzer: Remove check of unsigned_char in maybe_undo_optimize_bit_field_compare.
The check for the type seems unnecessary and sometimes gets in the way.
Also, with a patch I am working on for match.pd, it causes a failure.
Before my patch the IR was:
_1 = BIT_FIELD_REF <s, 8, 16>;
_2 = _1 & 1;
_3 = _2 != 0;
_4 = (int) _3;
__analyzer_eval (_4);
Where _2 was an unsigned char type.
And After my patch we have:
_1 = BIT_FIELD_REF <s, 8, 16>;
_2 = (int) _1;
_3 = _2 & 1;
__analyzer_eval (_3);
But in this case, the BIT_AND_EXPR is in an int type.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/analyzer/ChangeLog:
* region-model-manager.cc (maybe_undo_optimize_bit_field_compare): Remove
the check for type being unsigned_char_type_node.
Andrew Pinski [Sat, 9 Dec 2023 21:43:23 +0000 (13:43 -0800)]
expr: catch more `a*bool` while expanding [PR 112935]
After r14-1655-g52c92fb3f40050 (and the other commits
which touch zero_one_valued_p), we end up with
`bool * a` where the bool is an SSA name that might not
have its non-zero bits set (to 0x1) even though its
non-zero bits would in fact be 0x1.
In the case of coremark, it is only phiopt4 which adds the new
SSA name, and nothing afterwards updates the nonzero bits on it.
This fixes the regression by using gimple_zero_one_valued_p
rather than tree_nonzero_bits to match the cases where the
SSA_NAME didn't have the non-zero bits set.
gimple_zero_one_valued_p handles one level of cast and also an `&`.
Bootstrapped and tested on x86_64-linux-gnu.
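A hypothetical C++ sketch of the shape of the affected code (not the
actual coremark source):

  int f (const int *p, int n, int a)
  {
    int sum = 0;
    for (int i = 0; i < n; i++)
      // phiopt can rewrite the conditional add as (p[i] > 0) * a,
      // i.e. bool * a, where the new bool SSA name may lack
      // nonzero-bits information.
      sum += (p[i] > 0) ? a : 0;
    return sum;
  }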
gcc/ChangeLog:
PR middle-end/112935
* expr.cc (expand_expr_real_2): Use
gimple_zero_one_valued_p instead of tree_nonzero_bits
to find boolean defined expressions.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
[PATCH] wrong code on m68k with -mlong-jump-table-offsets and -malign-int (PR target/112413)
On m68k the compiler assumes that the PC-relative jump-via-jump-table
instruction and the jump table are adjacent with no padding in between.
When -mlong-jump-table-offsets is combined with -malign-int, a 2-byte
nop may be inserted before the jump table, causing the jump to add the
fetched offset to the wrong PC base and thus jump to the wrong address.
Fixed by referencing the jump table via its label. On the test case
in the PR the object code change is (the moveal at 16 is the nop):
Bootstrapped and tested on m68k-linux-gnu, no regressions.
Note: I don't have commit rights, so I would need assistance applying this.
PR target/112413
gcc/
* config/m68k/linux.h (ASM_RETURN_CASE_JUMP): For
TARGET_LONG_JUMP_TABLE_OFFSETS, reference the jump table
via its label.
* config/m68k/m68kelf.h (ASM_RETURN_CASE_JUMP): Likewise.
* config/m68k/netbsd-elf.h (ASM_RETURN_CASE_JUMP): Likewise.
Andre Vieira [Mon, 11 Dec 2023 14:24:41 +0000 (14:24 +0000)]
aarch64: enable mixed-types for aarch64 simdclones
This patch enables the use of mixed-types for simd clones for AArch64, adds
aarch64 as a target_vect_simd_clones and corrects the way the simdlen is chosen
for non-specified simdlen clauses according to the 'Vector Function Application
Binary Interface Specification for AArch64'.
Additionally this patch also restricts combinations of simdlen and
return/argument types that map to vectors larger than 128 bits as we currently
do not have a way to represent these types in a way that is consistent
internally and externally.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (lane_size): New function.
(aarch64_simd_clone_compute_vecsize_and_simdlen): Determine simdlen according to NDS rule
and reject combinations of simdlen and types that lead to vectors larger than 128 bits.
Patrick Palka [Mon, 11 Dec 2023 14:48:04 +0000 (09:48 -0500)]
c++: alias CTAD and specializations table
A rewritten guide for alias CTAD isn't really a specialization of the
original guide, so we shouldn't register it as such. This avoids an ICE
in the below modules testcase for which we otherwise crash due to the
guide's empty DECL_CONTEXT when walking the specializations table. It
also preemptively avoids the same ICE in modules/concept-6 in C++23 mode
with the inherited CTAD patch.
gcc/cp/ChangeLog:
* pt.cc (alias_ctad_tweaks): Pass use_spec_table=false to
tsubst_decl.
gcc/testsuite/ChangeLog:
* g++.dg/modules/concept-8.h: New test.
* g++.dg/modules/concept-8_a.H: New test.
* g++.dg/modules/concept-8_b.C: New test.
Robin Dapp [Mon, 11 Dec 2023 13:16:04 +0000 (14:16 +0100)]
RISC-V: testsuite: Fix strcmp-run.c test.
This fixes expectations in the strcmp-run test which would sometimes
fail with newlib. The test expects libc strcmp return values and
asserts that the vectorized results match them. Therefore, hard-code
the expected results instead of relying on a strcmp call.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c: Adjust test
expectation and target selector.
* gcc.target/riscv/rvv/autovec/builtin/strlen-run.c: Adjust
target selector.
* gcc.target/riscv/rvv/autovec/builtin/strncmp-run.c: Ditto.
Tobias Burnus [Mon, 11 Dec 2023 14:19:02 +0000 (15:19 +0100)]
OpenMP: Support acquires/release in 'omp require atomic_default_mem_order'
This is an OpenMP 5.2 feature.
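A minimal C/C++ sketch of a now-accepted clause value (a hypothetical
translation unit, compiled with -fopenmp; not from the testsuite):

  #pragma omp requires atomic_default_mem_order(release)

  void publish (int *p, int v)
  {
    // Inherits release memory ordering from the requires directive.
    #pragma omp atomic write
    *p = v;
  }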
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_requires): Handle acquires/release
in atomic_default_mem_order clause.
(c_parser_omp_atomic): Update.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_requires): Handle acquires/release
in atomic_default_mem_order clause.
(cp_parser_omp_atomic): Update.
gcc/fortran/ChangeLog:
* gfortran.h (enum gfc_omp_requires_kind): Add
OMP_REQ_ATOMIC_MEM_ORDER_ACQUIRE and OMP_REQ_ATOMIC_MEM_ORDER_RELEASE.
(gfc_namespace): Add a 7th bit to omp_requires.
* module.cc (enum ab_attribute): Add AB_OMP_REQ_MEM_ORDER_ACQUIRE
and AB_OMP_REQ_MEM_ORDER_RELEASE
(mio_symbol_attribute): Handle it.
* openmp.cc (gfc_omp_requires_add_clause): Update for acquire/release.
(gfc_match_omp_requires): Likewise.
(gfc_match_omp_atomic): Handle them for atomic_default_mem_order.
* parse.cc: Likewise.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/requires-3.c: Update for now valid code.
* gfortran.dg/gomp/requires-3.f90: Likewise.
* gfortran.dg/gomp/requires-2.f90: Update dg-error.
* gfortran.dg/gomp/requires-5.f90: Likewise.
* c-c++-common/gomp/requires-5.c: New test.
* c-c++-common/gomp/requires-6.c: New test.
* c-c++-common/gomp/requires-7.c: New test.
* c-c++-common/gomp/requires-8.c: New test.
* gfortran.dg/gomp/requires-10.f90: New test.
* gfortran.dg/gomp/requires-11.f90: New test.
Rainer Orth [Mon, 11 Dec 2023 12:38:19 +0000 (13:38 +0100)]
ada: Fix Ada bootstrap on FreeBSD
Ada bootstrap on FreeBSD/amd64 was also broken by the recent warning
changes:
terminals.c: In function 'allocate_pty_desc':
terminals.c:1200:12: error: implicit declaration of function 'openpty'; did you
mean 'openat'? [-Wimplicit-function-declaration]
1200 | status = openpty (&master_fd, &slave_fd, NULL, NULL, NULL);
| ^~~~~~~
| openat
terminals.c: At top level:
terminals.c:1268:9: warning: "TABDLY" redefined
1268 | #define TABDLY 0
| ^~~~~~
In file included from /usr/include/termios.h:38,
from terminals.c:1109:
/usr/include/sys/_termios.h:111:9: note: this is the location of the previous definition
111 | #define TABDLY 0x00000004 /* tab delay mask */
| ^~~~~~
make[7]: *** [../gcc-interface/Makefile:302: terminals.o] Error 1
Fixed by including the necessary header and guarding the fallback
definition of TABDLY.
This allowed a 64-bit-only bootstrap on x86_64-unknown-freebsd14.0 to
complete successfully.
if (HARD_REGISTER_NUM_P (regno)
&& partial_subreg_p (use->mode (), mode))
The assertion failed in partial_subreg_p, which is:
inline bool
partial_subreg_p (machine_mode outermode, machine_mode innermode)
{
/* Modes involved in a subreg must be ordered. In particular, we must
always know at compile time whether the subreg is paradoxical. */
poly_int64 outer_prec = GET_MODE_PRECISION (outermode);
poly_int64 inner_prec = GET_MODE_PRECISION (innermode);
gcc_checking_assert (ordered_p (outer_prec, inner_prec)); -----> causes the ICE.
return maybe_lt (outer_prec, inner_prec);
}
RISC-V VSETVL PASS is an advanced lazy vsetvl insertion PASS after RA (register allocation).
The root cause is that we have a pattern (a reduction instruction) that includes both VLA (length-agnostic) and VLS (fixed-length) modes.
In the Linux kernel, u64/s64 are [un]signed long long, not [un]signed
long. This means that when the `arm_neon.h' header is used by the
kernel, any use of the `uint64_t' / `in64_t' types needs to be
correctly cast to the correct `__builtin_aarch64_simd_di' /
`__builtin_aarch64_simd_df' types when calling the relevant ACLE
builtins.
This patch adds the necessary fixes to ensure that `vstl1_*' and
`vldap1_*' intrinsics are correctly defined for use by the kernel.
gcc/ChangeLog:
* config/aarch64/arm_neon.h (vldap1_lane_u64): Add
`const' to `__builtin_aarch64_simd_di *' cast.
(vldap1q_lane_u64): Likewise.
(vldap1_lane_s64): Cast __src to `const __builtin_aarch64_simd_di *'.
(vldap1q_lane_s64): Likewise.
(vldap1_lane_f64): Cast __src to `const __builtin_aarch64_simd_df *'.
(vldap1q_lane_f64): Cast __src to `const __builtin_aarch64_simd_df *'.
(vldap1_lane_p64): Add `const' to `__builtin_aarch64_simd_di *' cast.
(vldap1q_lane_p64): Add `const' to `__builtin_aarch64_simd_di *' cast.
(vstl1_lane_u64): Remove stray `const'.
(vstl1_lane_s64): Cast __src to `__builtin_aarch64_simd_di *'.
(vstl1q_lane_s64): Likewise.
(vstl1_lane_f64): Cast __src to `const __builtin_aarch64_simd_df *'.
(vstl1q_lane_f64): Likewise.
Robin Dapp [Fri, 8 Dec 2023 11:50:01 +0000 (12:50 +0100)]
RISC-V: Recognize stepped series in expand_vec_perm_const.
We currently try to recognize various forms of stepped (const_vector)
sequence variants in expand_const_vector. Because of complications with
canonicalization and encoding it is easier to identify such patterns
in expand_vec_perm_const_1 already where perm.series_p () is available.
This patch introduces shuffle_series as new permutation pattern and
tries to recognize series like [base0 base1 base1 + step ...]. If such
a series is found the series is expanded by expand_vec_series and a
gather is emitted.
On top of that, the patch fixes the step recognition in
expand_const_vector for the stepped series that would previously have
ended up there.
This fixes several execution failures when running code compiled for a
scalable vector size of 128 on a target with vlen = 256 or higher.
The problem was only noticed there because the encoding for a reversed
[2 2]-element vector ("3 2 1 0") is { [1 2], [0 2], [1 4] }.
Some testcases that failed were:
vect-alias-check-18.c
vect-alias-check-1.F90
pr64365.c
On a 128-bit target, only the first two elements are used. The
third element causing the complications only comes into effect at
vlen = 256.
With this patch the testsuite results are similar with vlen = 128,
vlen = 256 as well as vlen = 512 (apart from the fixed-vlmax tests of
course).
gcc/ChangeLog:
PR target/112853
* config/riscv/riscv-v.cc (expand_const_vector): Fix step
calculation.
(modulo_sel_indices): Also perform modulo for variable-length
constants.
(shuffle_series): Recognize series permutations.
(expand_vec_perm_const_1): Add shuffle_series.
liuhongt [Thu, 9 Nov 2023 08:03:11 +0000 (16:03 +0800)]
Simplify vector ((VCE (a cmp b ? -1 : 0)) < 0) ? c : d to just (VCE ((a cmp b) ? (VCE c) : (VCE d))).
While working on PR112443, I noticed some misoptimizations:
after we fold _mm{,256}_blendv_epi8/pd/ps into gimple, the backend
fails to combine it back to v{,p}blendv{v,ps,pd} since the pattern is
too complicated, so I think maybe we should handle it at the gimple
level.
Since _7 is either -1 or 0, the selection _7 < 0 ? _8 : _9 should
be equal to _1 ? b : a as long as the TYPE_PRECISION of the component
type of the second VEC_COND_EXPR is less than or equal to that of the
first one.
The patch add a gimple pattern to handle that.
gcc/ChangeLog:
* match.pd (VCE (a cmp b ? -1 : 0) < 0) ? c : d ---> (VCE ((a
cmp b) ? (VCE:c) : (VCE:d))): New gimple simplication.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512vl-blendv-3.c: New test.
* gcc.target/i386/blendv-3.c: New test.
* config/riscv/vector.md: Support highest overlap for wv instructions.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr112431-39.c: New test.
* gcc.target/riscv/rvv/base/pr112431-40.c: New test.
* gcc.target/riscv/rvv/base/pr112431-41.c: New test.
Jakub Jelinek [Mon, 11 Dec 2023 07:34:15 +0000 (08:34 +0100)]
extend.texi: Mark builtin arguments with @var{...}
In many cases we just specify types for the builtin arguments, in other cases
types and names with @var{name} syntax, and in other cases just a name.
Shall we tweak that somehow? If the argument names are unimportant, perhaps
it is fine to leave that out, but shouldn't we always use @var{...} around
the parameter names when specified?
On Fri, Dec 01, 2023 at 10:43:57AM -0700, Sandra Loosemore wrote:
> Yup. The Texinfo manual says: "When using @deftypefn command and
> variations, you should mark parameter names with @var to distinguish these
> from data type names, keywords, and other parts of the literal syntax of the
> programming language."
Here is a patch which does that (but not adding types to where they were
missing, that will be harder to search for).
For example, a poly value [15, 16] is computed by csrr vlen plus multiple scalar integer instructions.
However, such a compile-time-unknown value only needs to be computed for scalable vectors, that is when !BYTES_PER_RISCV_VECTOR.is_constant (),
since csrr vlenb = [16, 0] with -march=rv64gcv --param=riscv-autovec-preference=fixed-vlmax and we then have no chance to compute a compile-time-unknown POLY value.
Also, we never reach the situation of computing a compile-time-unknown value for a FIXED-VLMAX vector. So disable the POLY selftest for FIXED-VLMAX.
gcc/ChangeLog:
* config/riscv/riscv-selftests.cc (riscv_run_selftests):
Remove poly self test when FIXED-VLMAX.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/poly-selftest-1.c: New test.