git.ipfire.org Git - thirdparty/gcc.git/log

[to-be-committed][v3][RISC-V] Handle bit manipulation of SImode values

Last patch in this round of bitmanip work...  At least I think I'm going to
pause here and switch gears to other projects that need attention 🙂

This patch introduces the ability to generate bitmanip instructions for rv64
when operating on SI objects when we know something about the range of the bit
position (due to masking of the position).

I've got to note that the (7-pos % 8) bit position form was discovered by RAU in
500.perl.  I took that and expanded it to the simple (pos & mask) form as well
as covering bset, binv and bclr.
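
For illustration, the source-level shapes in question look roughly like the C
below.  This is only a sketch: the function names are made up here, these are
not the new testcases, and unsigned 32-bit values are used to keep the shifts
well defined.  Whether a single bset/binv/bclr is emitted for each depends on
the compiler version and flags (rv64 with Zbs is assumed).

```c
#include <stdint.h>

/* Set bit (pos & 31).  The mask bounds the position to 0..31, so the
   shift is always in range for a 32-bit value.  */
uint32_t bset_masked(uint32_t x, unsigned pos)
{
  return x | (UINT32_C(1) << (pos & 31));
}

/* Invert bit (pos & 31).  */
uint32_t binv_masked(uint32_t x, unsigned pos)
{
  return x ^ (UINT32_C(1) << (pos & 31));
}

/* Clear bit (pos & 31).  */
uint32_t bclr_masked(uint32_t x, unsigned pos)
{
  return x & ~(UINT32_C(1) << (pos & 31));
}

/* The (7 - pos % 8) form: the position is bounded to 0..7.  */
uint32_t bset_in_byte(uint32_t x, unsigned pos)
{
  return x | (UINT32_C(1) << (7 - pos % 8));
}
```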

As far as the implementation is concerned....

This turns the recently added define_splits into define_insn_and_split
constructs.  This allows combine to "see" enough RTL to realize a sign
extension is unnecessary.  Otherwise we get undesirable sign extensions for the
new testcases.

Second it adds new patterns for the logical operations.  Two patterns for
IOR/XOR and two patterns for AND.

I think a key concept to keep in mind is that once we determine a Zbs operation
is safe to perform on a SI value, we can rewrite the RTL in 64bit form.  If we
were ever to try and use range information at expand time for this stuff (and
we probably should investigate that), that's the path I'd suggest.

This is notably cleaner than my original implementation which actually kept the
more complex RTL form through final and emitted 2/3 instructions (mask the bit
position, then the bset/bclr/binv).

Tested in my tester, but waiting for pre-commit CI to report back before taking
further action.

gcc/
* config/riscv/bitmanip.md (bset splitters): Turn into define_insn_and_splits.
Don't depend on combine splitting the "andn with constant" form.
(bset, binv, bclr with masked bit position): New patterns.

gcc/testsuite
* gcc.target/riscv/binv-for-simode-1.c: New test.
* gcc.target/riscv/bset-for-simode-1.c: New test.
* gcc.target/riscv/bclr-for-simode-1.c: New test.

testsuite/52641 - Fix more sloppy tests.

PR testsuite/52641
gcc/testsuite/
* gcc.dg/analyzer/torture/boxed-ptr-1.c: Requires size24plus.
* gcc.dg/analyzer/torture/pr102692.c: Use intptr_t instead of long.
* gcc.dg/ipa/pr102714.c: Use uintptr_t instead of unsigned long.
* gcc.dg/torture/pr115387-1.c: Same.
* gcc.dg/torture/pr113895-1.c: Same.
* gcc.dg/ipa/pr108007.c: Require int32plus.
* gcc.dg/ipa/pr109318.c: Same.
* gcc.dg/ipa/pr96040.c: Use size_t instead of unsigned long.
* gcc.dg/torture/pr113126.c: Use vectors of same dimension.
* gcc.dg/tree-ssa/builtin-sprintf-9.c: Requires double64.

* gcc.dg/spellcheck-inttypes.c [avr]: Avoid include of inttypes.h.
* gcc.dg/analyzer/torture/pr104159.c [avr]: Skip.
* gcc.dg/torture/pr84682-2.c [avr]: Skip.
* gcc.dg/wtr-conversion-1.c [avr]: Remove avr selector since
long double is a 64-bit type by now.

[committed] Fix various sh define_insn_and_split predicates

The sh4-linux-gnu port has failed to bootstrap since the introduction of late
combine due to failures to split certain insns.

This is caused by incorrect predicates in various define_insn_and_split
patterns.  Essentially the insn's predicate is something like "TARGET_SH1".
The split predicate is "&& can_create_pseudos_p ()".  So these patterns will
match post-reload, but be un-splittable.  So at assembly output time, we get
the failure as the output template is "#".

This patch fixes the most obvious & egregious cases by bringing the split
condition into the insn's predicate and leaving "&& 1" as the split condition.
That's enough to get sh4-linux-gnu bootstrapping again and I'm hoping it does
the same for sh4eb-linux-gnu.
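
The failure mode can be pictured with a toy model in C.  This is not GCC code;
the struct and function names are hypothetical, and the target test (something
like TARGET_SH1) is assumed to be true throughout.  It only shows why an insn
whose split condition is stricter than its insn condition can match and then
never split, so that the "#" template reaches assembly output.

```c
#include <stdbool.h>

/* Toy model of a define_insn_and_split whose output template is "#".  */
struct pattern_model
{
  bool insn_requires_pseudos;   /* insn condition tests can_create_pseudos_p ()  */
  bool split_requires_pseudos;  /* split condition adds that test with "&&"  */
};

/* The insn condition.  */
bool insn_matches(const struct pattern_model *p, bool can_create_pseudos)
{
  return !p->insn_requires_pseudos || can_create_pseudos;
}

/* The split condition.  A leading "&&" means the insn condition must
   hold as well.  */
bool split_applies(const struct pattern_model *p, bool can_create_pseudos)
{
  return insn_matches(p, can_create_pseudos)
         && (!p->split_requires_pseudos || can_create_pseudos);
}

/* True when the insn can match but can never be split.  */
bool stuck_at_output(const struct pattern_model *p, bool can_create_pseudos)
{
  return insn_matches(p, can_create_pseudos)
         && !split_applies(p, can_create_pseudos);
}
```

In this model the old shape is { false, true } and the new shape is
{ true, false }: once pseudos can no longer be created the new shape simply
stops matching instead of matching and getting stuck.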

Pushing to the trunk.

gcc/
* config/sh/sh.md (adddi3): Only allow matching when we can
still create new pseudos.
(subdi3, *rotcl, *rotcr, *rotcr_neg_t, negdi2): Likewise.
(abs<mode>2, negabs<mode>2, negdi_cond): Likewise.
(*swapbisi2_and_shl8, *swapbhisi2, *movsi_index_disp_load): Likewise.
(*movhi_index_disp_load, *mov<mode>index_disp_store): Likewise.
(*mov_t_msb_neg, *negt_msb, clipu_one): Likewise.

AVR: Create more opportunities for -mfuse-add optimization.

avr_split_tiny_move() was only run for AVR_TINY because it has no PLUS
addressing modes.  Same applies to the X register on ordinary cores, and
also to the Z register when used with [E]LPM.  For example, without this patch

long long addLL (long long *a, long long *b)
{
  return *a + *b;
}

compiles with "-mmcu=atmega128 -Os -dp" to:

    ...
    movw r26,r24     ;  80  [c=4 l=1]  *movhi/0
    movw r30,r22     ;  81  [c=4 l=1]  *movhi/0
    ld r18,X         ;  82  [c=4 l=1]  movqi_insn/3
    adiw r26,1   ;  83  [c=4 l=3]  movqi_insn/3
    ld r19,X
    sbiw r26,1
    adiw r26,2   ;  84  [c=4 l=3]  movqi_insn/3
    ld r20,X
    sbiw r26,2
    adiw r26,3   ;  85  [c=4 l=3]  movqi_insn/3
    ld r21,X
    sbiw r26,3
    adiw r26,4   ;  86  [c=4 l=3]  movqi_insn/3
    ld r22,X
    sbiw r26,4
    adiw r26,5   ;  87  [c=4 l=3]  movqi_insn/3
    ld r23,X
    sbiw r26,5
    adiw r26,6   ;  88  [c=4 l=3]  movqi_insn/3
    ld r24,X
    sbiw r26,6
    adiw r26,7   ;  89  [c=4 l=2]  movqi_insn/3
    ld r25,X
    ld r10,Z         ;  90  [c=4 l=1]  movqi_insn/3
    ...

whereas with this patch it becomes:

    ...
    movw r26,r24     ;  80  [c=4 l=1]  *movhi/0
    movw r30,r22     ;  81  [c=4 l=1]  *movhi/0
    ld r18,X+        ;  140 [c=4 l=1]  movqi_insn/3
    ld r19,X+        ;  142 [c=4 l=1]  movqi_insn/3
    ld r20,X+        ;  144 [c=4 l=1]  movqi_insn/3
    ld r21,X+        ;  146 [c=4 l=1]  movqi_insn/3
    ld r22,X+        ;  148 [c=4 l=1]  movqi_insn/3
    ld r23,X+        ;  150 [c=4 l=1]  movqi_insn/3
    ld r24,X+        ;  152 [c=4 l=1]  movqi_insn/3
    ld r25,X         ;  109 [c=4 l=1]  movqi_insn/3
    ld r10,Z         ;  111 [c=4 l=1]  movqi_insn/3
    ...

gcc/
* config/avr/avr.md: Also split with avr_split_tiny_move()
for non-AVR_TINY.
* config/avr/avr.cc (avr_split_tiny_move): Don't change memory
references with base regs that can do PLUS addressing.
(avr_out_lpm_no_lpmx) [POST_INC]: Don't output final ADIW when the
address register is unused after.
gcc/testsuite/
* gcc.target/avr/torture/fuse-add.c: New test.

RISC-V: fix internal error on global variable-length array

This is an ICE in the RISC-V back-end calling tree_to_uhwi on the DECL_SIZE
of a global variable-length array.

gcc/
PR target/115591
* config/riscv/riscv.cc (riscv_valid_lo_sum_p): Add missing test on
tree_fits_uhwi_p before calling tree_to_uhwi.

gcc/testsuite/
* gnat.dg/array41.ads, gnat.dg/array41.adb: New test.

PR target/115751: Avoid force_reg in ix86_expand_ternlog.

This patch fixes a problem with splitting of complex AVX512 ternlog
instructions on x86_64.  A recent change allows the ternlog pattern
to have multiple mem-like operands prior to reload, by emitting any
"reloads" as necessary during split1, before register allocation.
The issue is that this code calls force_reg to place the mem-like
operand into a register, but unfortunately the vec_duplicate (broadcast)
form of operands supported by ternlog isn't considered a "general_operand",
i.e. supported by all instructions.  This mismatch triggers an ICE in
the middle-end's force_reg, even though the x86 supports loading these
vec_duplicate operands into a vector register in a single (move)
instruction.

This patch resolves this problem by replacing force_reg with calls
to gen_reg_rtx and emit_move (as the i386 backend, unlike the middle-end,
knows these will be recognized by recog).

2024-07-06  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
PR target/115751
* config/i386/i386-expand.cc (ix86_expand_ternlog): Avoid use of
force_reg to "reload" non-register operands, as these may contain
vec_duplicate (broadcast) operands that aren't supported by
force_reg.  Use (safer) gen_reg_rtx and emit_move instead.

Daily bump.

x86, Darwin: Fix bootstrap for 32b multilibs/hosts.

r15-1735-ge62ea4fb8ffcab06ddd contained changes that altered the
codegen for 32b Darwin (whether hosted on 64b or as 32b host) such
that the per function picbase load is called multiple times in some
cases. Darwin's back end is not expecting this (and indeed some of
the handling depends on a single instance).

This fixes the issue by marking those instructions as not copyable
(as suggested by Andrew Pinski).

The change is Darwin-specific.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_cannot_copy_insn_p): New.
(TARGET_CANNOT_COPY_INSN_P): New.

Signed-off-by: Iain Sandoe <iains@gcc.gnu.org>

Fortran: switch test to use issignaling() built-in

The macro may not be present in all libc's, but the built-in
is always available.

gcc/testsuite/ChangeLog:

* gfortran.dg/ieee/signaling_2.f90: Adjust test.
* gfortran.dg/ieee/signaling_2_c.c: Adjust test.

Arm: Fix ldrd offset range [PR115153]

The valid offset range of LDRD in arm_legitimate_index_p is increased to
-1024..1020 if NEON is enabled since VALID_NEON_DREG_MODE includes DImode.
Fix this by moving the LDRD check earlier.

gcc:
PR target/115153
* config/arm/arm.cc (arm_legitimate_index_p): Move LDRD case before
NEON.
(thumb2_legitimate_index_p): Update comments.
(output_move_neon): Use DFmode for vldr/vstr and non-checking
adjust_address.

gcc/testsuite:
PR target/115153
* gcc.target/arm/pr115153.c: Add new test.
* lib/target-supports.exp: Add arm_arch_v7ve_neon target support.

libgccjit: Allow comparing array types

gcc/jit/ChangeLog:

* jit-common.h: Add array_type class.
* jit-recording.h (type::dyn_cast_array_type,
memento_of_get_aligned::dyn_cast_array_type,
array_type::dyn_cast_array_type, array_type::is_same_type_as):
New methods.

gcc/testsuite/ChangeLog:

* jit.dg/test-types.c: Add array type comparison to the test.

libgccjit: Add support for the type bfloat16

gcc/jit/ChangeLog:

PR jit/112574
* docs/topics/types.rst: Document GCC_JIT_TYPE_BFLOAT16.
* jit-common.h: Update NUM_GCC_JIT_TYPES.
* jit-playback.cc (get_tree_node_for_type): Support bfloat16.
* jit-recording.cc (recording::memento_of_get_type::get_size,
recording::memento_of_get_type::dereference,
recording::memento_of_get_type::is_int,
recording::memento_of_get_type::is_signed,
recording::memento_of_get_type::is_float,
recording::memento_of_get_type::is_bool): Support bfloat16.
* libgccjit.h (enum gcc_jit_types): Add GCC_JIT_TYPE_BFLOAT16.

gcc/testsuite/ChangeLog:

PR jit/112574
* jit.dg/all-non-failing-tests.h: New test test-bfloat16.c.
* jit.dg/test-types.c: Test GCC_JIT_TYPE_BFLOAT16.
* jit.dg/test-bfloat16.c: New test.

MAINTAINERS: Fix order in DCO

ChangeLog:

* MAINTAINERS: Fix order in Contributing under the DCO.

Signed-off-by: Filip Kastl <fkastl@suse.cz>

RISC-V: Use tu policy for first-element vec_set [PR115725].

This patch changes the tail policy for vmv.s.x from ta to tu.
By default the bug does not show up with qemu because qemu's
current vmv.s.x implementation always uses the tail-undisturbed
policy. With a local qemu version that overwrites the tail
with ones when the tail-agnostic policy is specified, the bug
shows.
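
A small C model may help to see why the policy matters for vec_set.  It is a
sketch only: the enum and function names are invented, and the tail-agnostic
case models just one permitted behaviour (overwriting the tail with all ones,
as the local qemu build described above does).

```c
#include <stddef.h>
#include <stdint.h>

enum tail_policy { TAIL_UNDISTURBED, TAIL_AGNOSTIC_ALL_ONES };

/* Model of writing scalar x to element 0 of an n-element vector.
   Elements 1..n-1 are treated as tail elements.  */
void set_first_element(int32_t *vd, size_t n, int32_t x,
                       enum tail_policy policy)
{
  if (n == 0)
    return;
  vd[0] = x;
  if (policy == TAIL_AGNOSTIC_ALL_ONES)
    for (size_t i = 1; i < n; i++)
      vd[i] = -1;               /* all bits set */
}
```

With TAIL_UNDISTURBED the other elements survive, which is what a vec_set of
the first element needs; with the all-ones behaviour they are lost.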

gcc/ChangeLog:

* config/riscv/autovec.md: Add TU policy.
* config/riscv/riscv-protos.h (enum insn_type): Define
SCALAR_MOVE_MERGED_OP_TU.

gcc/testsuite/ChangeLog:

PR target/115725

* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: Adjust
test expectation.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: Ditto.

AVR: target/87376 - Use nop_general_operand for DImode inputs.

The avr-dimode.md expanders have code like emit_move_insn(acc_a, operands[1])
where acc_a is a hard register and operands[1] might be a non-generic
address-space memory reference. Such loads may clobber hard regs since
some of them are implemented as libgcc calls /and/ 64-bit moves are
expanded as eight byte-moves, so that acc_a or acc_b might be clobbered
by such a load.

This patch simply denies non-generic address-space references by using
nop_general_operand for all avr-dimode.md input predicates.
With the patch, all memory loads that require library calls are issued
before the expander codes from avr-dimode.md are run.

PR target/87376
gcc/
* config/avr/avr-dimode.md: Use "nop_general_operand" instead
of "general_operand" as predicate for all input operands.

gcc/testsuite/
* gcc.target/avr/torture/pr87376.c: New test.

libstdc++: Add dg-error for new -Wdelete-incomplete diagnostics [PR115747]

Since r15-1794-gbeb7a418aaef2e the -Wdelete-incomplete diagnostic is a
permerror instead of a (suppressed in system headers) warning. Add
dg-error directives.

libstdc++-v3/ChangeLog:

PR c++/115747
* testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc:
Add dg-error for new C++26 diagnostics.

libstdc++: Use RAII in <bits/stl_uninitialized.h>

This adds an _UninitDestroyGuard class template, similar to
ranges::_DestroyGuard used in <bits/ranges_uninitialized.h>. This allows
us to remove all the try-catch blocks and rethrows, because any required
cleanup gets done in the guard destructor.

libstdc++-v3/ChangeLog:

* include/bits/stl_uninitialized.h (_UninitDestroyGuard): New
class template and partial specialization.
(__do_uninit_copy, __do_uninit_fill, __do_uninit_fill_n)
(__uninitialized_copy_a, __uninitialized_fill_a)
(__uninitialized_fill_n_a, __uninitialized_copy_move)
(__uninitialized_move_copy, __uninitialized_fill_move)
(__uninitialized_move_fill, __uninitialized_default_1)
(__uninitialized_default_n_a, __uninitialized_default_novalue_1)
(__uninitialized_default_novalue_n_1, __uninitialized_copy_n)
(__uninitialized_copy_n_pair): Use it.

libstdc++: Use memchr to optimize std::find [PR88545]

This optimizes std::find to use memchr when searching for an integer in
a range of bytes.
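
The idea can be sketched in plain C as below.  This is not the libstdc++
implementation; find_byte is a made-up name.  The point it illustrates is that
memchr converts the searched value to unsigned char, so an integer that does
not fit in a byte has to be rejected before calling it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the first byte equal to value in [first, last),
   or last if there is none.  */
const unsigned char *find_byte(const unsigned char *first,
                               const unsigned char *last, int value)
{
  assert(first <= last);
  /* A value outside 0..UCHAR_MAX can never equal a byte, but memchr
     would truncate it and could report a spurious match.  */
  if (value < 0 || value > UCHAR_MAX)
    return last;
  if (first == last)
    return last;
  const void *hit = memchr(first, value, (size_t)(last - first));
  return hit ? (const unsigned char *)hit : last;
}
```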

libstdc++-v3/ChangeLog:

PR libstdc++/88545
PR libstdc++/115040
* include/bits/cpp_type_traits.h (__can_use_memchr_for_find):
New variable template.
* include/bits/ranges_util.h (__find_fn): Use memchr when
possible.
* include/bits/stl_algo.h (find): Likewise.
* testsuite/25_algorithms/find/bytes.cc: New test.

AArch64: lower 2 reg TBL permutes with one zero register to 1 reg TBL.

When a two reg TBL is performed with one operand being a zero vector we can
instead use a single reg TBL and map the indices for accessing the zero vector
to an out of range constant.

On AArch64 out of range indices into a TBL have a defined semantics of setting
the element to zero.  Many uArches have a slower 2-reg TBL than 1-reg TBL.

Before this change we had:

typedef unsigned int v4si __attribute__ ((vector_size (16)));

v4si f1 (v4si a)
{
  v4si zeros = {0,0,0,0};
  return __builtin_shufflevector (a, zeros, 0, 5, 1, 6);
}

which generates:

f1:
        mov     v30.16b, v0.16b
        movi    v31.4s, 0
        adrp    x0, .LC0
        ldr     q0, [x0, #:lo12:.LC0]
        tbl     v0.16b, {v30.16b - v31.16b}, v0.16b
        ret

.LC0:
        .byte   0
        .byte   1
        .byte   2
        .byte   3
        .byte   20
        .byte   21
        .byte   22
        .byte   23
        .byte   4
        .byte   5
        .byte   6
        .byte   7
        .byte   24
        .byte   25
        .byte   26
        .byte   27

and with the patch:

f1:
        adrp    x0, .LC0
        ldr     q31, [x0, #:lo12:.LC0]
        tbl     v0.16b, {v0.16b}, v31.16b
        ret

.LC0:
        .byte   0
        .byte   1
        .byte   2
        .byte   3
        .byte   -1
        .byte   -1
        .byte   -1
        .byte   -1
        .byte   4
        .byte   5
        .byte   6
        .byte   7
        .byte   -1
        .byte   -1
        .byte   -1
        .byte   -1

This sequence is generated often by openmp and aside from the
strict performance impact of this change, it also gives better
register allocation as we no longer have the consecutive
register limitation.
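
The index remapping can be checked with a byte-level model in C.  The function
names are hypothetical and this only models the lookup semantics described
above (out of range indices produce zero); it is not the code in
aarch64_evpc_tbl.

```c
#include <stdint.h>

/* One-register TBL model: indices >= 16 are out of range and give 0.  */
void tbl1(uint8_t out[16], const uint8_t t[16], const uint8_t idx[16])
{
  for (int i = 0; i < 16; i++)
    out[i] = idx[i] < 16 ? t[idx[i]] : 0;
}

/* Two-register TBL model: indices 0..15 read t0, 16..31 read t1,
   anything larger gives 0.  */
void tbl2(uint8_t out[16], const uint8_t t0[16], const uint8_t t1[16],
          const uint8_t idx[16])
{
  for (int i = 0; i < 16; i++)
    out[i] = idx[i] < 16 ? t0[idx[i]]
           : idx[i] < 32 ? t1[idx[i] - 16] : 0;
}

/* Rewrite indices for the case where the second table is all zeros:
   every index that selected from it becomes 0xff (.byte -1).  */
void remap_for_zero_second_table(uint8_t out_idx[16], const uint8_t idx[16])
{
  for (int i = 0; i < 16; i++)
    out_idx[i] = idx[i] < 16 ? idx[i] : 0xff;
}
```

Feeding the first .LC0 table above through remap_for_zero_second_table gives
the second one, and tbl1 with the remapped indices matches tbl2 with a zero
second table.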

gcc/ChangeLog:

* config/aarch64/aarch64.cc (struct expand_vec_perm_d): Add zero_op0_p
and zero_op1_p.
(aarch64_evpc_tbl): Implement register value remapping.
(aarch64_vectorize_vec_perm_const): Detect if operand is a zero dup
before it's forced to a reg.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/tbl_with_zero_1.c: New test.
* gcc.target/aarch64/tbl_with_zero_2.c: New test.

AArch64: remove aarch64_simd_vec_unpack<su>_lo_

The fix for PR18127 reworked the uxtl to zip optimization.
In doing so it undid the changes in aarch64_simd_vec_unpack<su>_lo_ and this now
no longer matches aarch64_simd_vec_unpack<su>_hi_. It still works because the
RTL generated by aarch64_simd_vec_unpack<su>_lo_ overlaps with the general zero
extend RTL and so because that one is listed before the lo pattern recog picks
it instead.

This removes aarch64_simd_vec_unpack<su>_lo_.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md
(aarch64_simd_vec_unpack<su>_lo_<mode>): Remove.
(vec_unpack<su>_lo_<mode>): Simplify.
* config/aarch64/aarch64.cc (aarch64_gen_shareable_zero): Update
comment.

middle-end: Add debug functions to dump dominator tree in dot format

This adds debug functions to dump the dominator tree in dot format.
There are two overloads: one which takes a FILE * and another which
takes a const char *fname and wraps the first with fopen/fclose for
convenience.
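
As a rough picture of the shape of such a pair of functions, here is a
standalone C sketch.  It is not the code added to dominance.cc: the names, the
idom[] array representation and the return values are all assumptions made for
the example, and C has no overloading so the two entry points get different
names.

```c
#include <assert.h>
#include <stdio.h>

/* Write a dominator tree as a dot digraph.  idom[i] is the immediate
   dominator of block i, or -1 for the root.  Returns the number of
   edges written.  */
int dump_dom_tree_dot(FILE *f, const int *idom, int n)
{
  assert(f != NULL);
  int edges = 0;
  fprintf(f, "digraph dominator_tree {\n");
  for (int i = 0; i < n; i++)
    if (idom[i] >= 0)
      {
        fprintf(f, "  bb%d -> bb%d;\n", idom[i], i);
        edges++;
      }
  fprintf(f, "}\n");
  return edges;
}

/* Convenience wrapper: open fname, dump, close.  Returns -1 on error.  */
int dump_dom_tree_dot_file(const char *fname, const int *idom, int n)
{
  FILE *f = fopen(fname, "w");
  if (!f)
    return -1;
  int edges = dump_dom_tree_dot(f, idom, n);
  return fclose(f) == 0 ? edges : -1;
}
```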

gcc/ChangeLog:

* dominance.cc (dot_dominance_tree): New.

i386: Refactor ssedoublemode

ssedoublemode's double should mean double type, like SI -> DI.
And we need to refactor some patterns with <ssedoublemode> instead of
<ssedoublevecmode>.

gcc/ChangeLog:

* config/i386/sse.md (ssedoublemode): Remove mappings to twice
the number of same-sized elements. Add mappings to the same
number of double-sized elements.
(define_split for vec_concat_minus_plus): Change mode_attr from
ssedoublemode to ssedoublevecmode.
(define_split for vec_concat_plus_minus): Ditto.
(<mask_codefor>avx512dq_shuf_<shuffletype>64x2_1<mask_name>):
Ditto.
(avx512f_shuf_<shuffletype>64x2_1<mask_name>): Ditto.
(avx512vl_shuf_<shuffletype>32x4_1<mask_name>): Ditto.
(avx512f_shuf_<shuffletype>32x4_1<mask_name>): Ditto.

MIPS: Support more cases with alien mode of SHF.DF

Currently, we only support the cases that strictly fit the instructions.
For example, for V16QImode, we only support shuffle like
(0<=N0, N1, N2, N3<=3 here)
N0, N1, N2, N3
N0+4 N1+4 N2+4, N3+4
N0+8 N1+8 N2+8, N3+8
N0+12 N1+12 N2+12, N3+12
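
The strict V16QImode check can be sketched in C as follows.  This is an
illustration, not the code in mips.cc: shf_b_imm is a made-up name, and the
packing of N0..N3 into the immediate (N0 in the low two bits) is an assumption
of this sketch.

```c
#include <stdint.h>

/* If sel (a 16-byte selector) has the form
     N0, N1, N2, N3, N0+4, N1+4, N2+4, N3+4, ..., N3+12
   with 0 <= Nk <= 3, return an 8-bit immediate packing N0..N3 as
   2-bit fields, N0 lowest.  Return -1 otherwise.  */
int shf_b_imm(const uint8_t sel[16])
{
  int imm = 0;
  for (int k = 0; k < 4; k++)
    {
      if (sel[k] > 3)
        return -1;
      imm |= sel[k] << (2 * k);
    }
  /* Every later group of four must repeat the first one, offset by
     the group's base index.  */
  for (int g = 1; g < 4; g++)
    for (int k = 0; k < 4; k++)
      if (sel[4 * g + k] != sel[k] + 4 * g)
        return -1;
  return imm;
}
```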

In fact, we can support more cases by trying other SHF.DF
instructions that do not strictly fit the mode.

1) We can use SHF.H to support more cases for V16QImode:
(M0/M1/M2/M3 are 0 or 2 or 4 or 6)
M0 M0+1, M1, M1+1
M2 M2+1, M3, M3+1
M0+8 M0+9, M1+8, M1+9
M2+8 M2+9, M3+8, M3+9

2) We can use SHF.W to support some cases for V16QImode:
(M0/M1/M2/M3 are 0 or 4 or 8 or 12)
M0, M0+1, M0+2, M0+3
M1, M1+1, M1+2, M1+3
M2, M2+1, M2+2, M2+3
M3, M3+1, M3+2, M3+3

3) We can use SHF.W to support some cases for V8HImode:
(M0/M1/M2/M3 are 0 or 2 or 4 or 6)
M0, M0+1
M1, M1+1
M2, M2+1
M3, M3+1

4) We can also use SHF.W to swap the 2 parts of V2DF or V2DI.

gcc
* config/mips/mips-protos.h: New function mips_msa_shf_i8.
* config/mips/mips-msa.md(MSA_WHB_W): Not used anymore;
(msa_shf_<msafmt_f>): Use mips_msa_shf_i8.
* config/mips/mips.cc (mips_const_vector_shuffle_set_p):
Support more cases by trying to use alien mode instructions;
(mips_msa_shf_i8): New function to get the correct MSA SHF
instruction and IMM.

Testsuite/MIPS: Fix msa.c: test7_v2f64, test7_v4f32, test43_v2i64

BNEGI.W/D are used for test7_v2f64 and test7_v4f32 now. It is
an improvement since we can save an instruction.

ILVR.D is used for test43_v2i64 now, instead of INSVE.D.

gcc/testsuite
* gcc.target/mips/msa.c: Fix test7_v2f64, test7_v4f32 and
test43_v2i64.

MIPS/testsuite: Add -mfpxx to call-clobbered-1.c

The scan-assembler-times rules only fit for -mfp32 and -mfpxx.
It fails if we are configured as FP64 by default, as it has
one less sdc1/ldc1 pair.

gcc/testsuite
* gcc.target/mips/call-clobbered-1.c: Add -mfpxx.

MIPS/testsuite: Fix umips-save-restore-1.c

With some recent optimizations, -O1/-O2/-O3 can achieve almost the same
performance/size with stack load/store.  Thus lwm/swm will save/restore
fewer callee-saved registers.  In fact only $16 is saved with swm.

To be sure that this optimization does exist, let's add 2 more
function calls, so that lwm/swm can be much more profitable.

If we add only one more, -O1 will still use stack load/store.

gcc/testsuite
* gcc.target/mips/umips-save-restore-1.c: Be sure lwm/swm
are used for more callee-saved registers with 2 additional
function calls.

Support group size of three in SLP store permute lowering

The following implements the group-size three scheme from
vect_permute_store_chain in SLP grouped store permute lowering
and extends it to power-of-two multiples of group size three.

The scheme goes from vectors A, B and C to
{ A[0], B[0], C[0], A[1], B[1], C[1], ... } by first producing
{ A[0], B[0], X, A[1], B[1], X, ... } (with X arbitrary but chosen
to be A[n]) and then permuting in C[n] in the appropriate places.

The extension works by replacing vector elements with a
power-of-two number of lanes, so you get pairwise interleaving
until the final three input permutes happen.

The last permute step could be seen as extending C to { C[0], C[0],
C[0], ... } and then performing a blend.
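
The two steps can be written out on plain arrays in C.  This is a scalar
sketch of the data movement only, with a made-up function name; the real code
builds vector permutes in vect_build_slp_instance and is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Interleave the n-element arrays a, b and c into out (3 * n elements)
   in two steps: first a/b with a placeholder, then c blended into
   every third slot.  */
void interleave3(int *out, const int *a, const int *b, const int *c, size_t n)
{
  assert(out && a && b && c);
  /* Step 1: { a[0], b[0], X, a[1], b[1], X, ... } with X = a[i].  */
  for (size_t i = 0; i < n; i++)
    {
      out[3 * i] = a[i];
      out[3 * i + 1] = b[i];
      out[3 * i + 2] = a[i];    /* placeholder, overwritten below */
    }
  /* Step 2: put c[i] into the placeholder positions.  */
  for (size_t i = 0; i < n; i++)
    out[3 * i + 2] = c[i];
}
```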

VLA archs will want to use store-lanes here I guess, I'm not sure
if the three vector interleave operation is also available with
a register source and destination and thus available for a shuffle.

* tree-vect-slp.cc (vect_build_slp_instance): Special case
three input permute with the same number of lanes in store
permute lowering.

* gcc.dg/vect/slp-53.c: New testcase.
* gcc.dg/vect/slp-54.c: New testcase.

Daily bump.

analyzer: convert sm_context * to sm_context &

These are never nullptr and never change, so use a reference rather
than a pointer.

No functional change intended.

gcc/analyzer/ChangeLog:
* diagnostic-manager.cc
(diagnostic_manager::add_events_for_eedge): Pass sm_ctxt by
reference.
* engine.cc (impl_region_model_context::on_condition): Likewise.
(impl_region_model_context::on_bounded_ranges): Likewise.
(impl_region_model_context::on_phi): Likewise.
(exploded_node::on_stmt): Likewise.
* sm-fd.cc: Update all uses of sm_context * to sm_context &.
* sm-file.cc: Likewise.
* sm-malloc.cc: Likewise.
* sm-pattern-test.cc: Likewise.
* sm-sensitive.cc: Likewise.
* sm-signal.cc: Likewise.
* sm-taint.cc: Likewise.
* sm.h: Likewise.
* varargs.cc: Likewise.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/analyzer_gil_plugin.c: Update all uses of
sm_context * to sm_context &.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: handle <error.h> at -O0 [PR115724]

At -O0, glibc's:

__extern_always_inline void
error (int __status, int __errnum, const char *__format, ...)
{
  if (__builtin_constant_p (__status) && __status != 0)
    __error_noreturn (__status, __errnum, __format, __builtin_va_arg_pack ());
  else
    __error_alias (__status, __errnum, __format, __builtin_va_arg_pack ());
}

becomes just:

__extern_always_inline void
error (int __status, int __errnum, const char *__format, ...)
{
  if (0)
    __error_noreturn (__status, __errnum, __format, __builtin_va_arg_pack ());
  else
    __error_alias (__status, __errnum, __format, __builtin_va_arg_pack ());
}

and thus calls to "error" are calls to "__error_alias" by the
time -fanalyzer "sees" them.

Handle them with more special-casing in kf.cc.

gcc/analyzer/ChangeLog:
PR analyzer/115724
* kf.cc (register_known_functions): Add __error_alias and
__error_at_line_alias.

gcc/testsuite/ChangeLog:
PR analyzer/115724
* c-c++-common/analyzer/error-pr115724.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

[committed][RISC-V] Fix test expectations after recent late-combine changes

With the recent DCE related adjustment to late-combine the rvv/base/vcreate.c
test no longer has those undesirable vmvNr statements.

It's a bit unclear why this wasn't written as a scan-assembler-not and xfailed
given the comment says we don't want to see vmvNr instructions. I must have
missed that during review.

This patch adjusts the test to expect no vmvNr statements and if they're ever
re-introduced, we'll get a nice unexpected failure.

gcc/testsuite
* gcc.target/riscv/rvv/base/vcreate.c: Update expected output.

Skip 30_threads/future/members/poll.cc on hppa*-*-linux*

hppa*-*-linux* lacks high resolution timer support. Timer resolution
ranges from 1 to 10ms. As a result, a large number of iterations are
needed for the wait_for_0 and ready loops. This causes the
wait_until_sys_epoch and wait_until_steady_epoch loops to timeout.
There the loop wait time is determined by the timer resolution.

2024-07-04 John David Anglin <danglin@gcc.gnu.org>

libstdc++-v3/ChangeLog:
PR libstdc++/98678
* testsuite/30_threads/future/members/poll.cc: Skip on hppa*-*-linux*.

testsuite: Update test for PR115537 to use SVE.

The PR was about SVE codegen, the testcase accidentally used neoverse-n1
instead of neoverse-v1 as was the original report.

This updates the tool options.

gcc/testsuite/ChangeLog:

PR tree-optimization/115537
* gcc.dg/vect/pr115537.c: Update flag from neoverse-n1 to neoverse-v1.

c++ frontend: check for missing condition for novector [PR115623]

It looks like I forgot to check in the C++ frontend if a condition exists for the
loop being adorned with novector. This causes a segfault because cond isn't
expected to be null.

This fixes it by ignoring the pragma when there's no loop condition,
the same way we do in the C frontend.

gcc/cp/ChangeLog:

PR c++/115623
* semantics.cc (finish_for_cond): Add check for C++ cond.

gcc/testsuite/ChangeLog:

PR c++/115623
* g++.dg/vect/vect-novector-pragma_2.cc: New test.

arm: Use LDMIA/STMIA for thumb1 DI/DF loads/stores

If the address register is dead after load/store operation it looks
beneficial to use LDMIA/STMIA instead of pair of LDR/STR instructions,
at least if optimizing for size.

gcc/ChangeLog:

* config/arm/arm.cc (thumb_load_double_from_address): Emit ldmia
when address reg rewritten by load.
* config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New.
(peephole2 to rewrite DI/DF store): New.
* config/arm/iterators.md (DIDF): New.

gcc/testsuite:

* gcc.target/arm/thumb1-load-store-64bit.c: Add new test.

Signed-off-by: Siarhei Volkau <lis8215@gmail.com>

Aarch64, bugfix: Fix NEON bigendian addp intrinsic [PR114890]

This change removes code that switches the operands in bigendian mode erroneously.
This fixes the related test also.

gcc/ChangeLog:

PR target/114890
* config/aarch64/aarch64-simd.md: Remove bigendian operand swap.

gcc/testsuite/ChangeLog:

PR target/114890
* gcc.target/aarch64/vector_intrinsics_asm.c: Remove xfail.

Aarch64: Add test for non-commutative SIMD intrinsic

This adds a test for non-commutative SIMD NEON intrinsics.
Specifically addp is non-commutative and has a bug in the current big-endian implementation.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_intrinsics_asm.c: New test.

middle-end/115426 - wrong gimplification of "rm" asm output operand

When the operand is gimplified to an extract of a register or a
register we have to disallow memory as we otherwise fail to
gimplify it properly.  Instead of

  __asm__("" : "=rm" __imag <r>);

we want

  __asm__("" : "=rm" D.2772);
  _1 = REALPART_EXPR <r>;
  r = COMPLEX_EXPR <_1, D.2772>;

otherwise SSA rewrite will fail and generate wrong code with 'r'
left bare in the asm output.

PR middle-end/115426
* gimplify.cc (gimplify_asm_expr): Handle "rm" output
constraint gimplified to a register (operation).

* gcc.dg/pr115426.c: New testcase.

Use __builtin_cpu_supports instead of __get_cpuid_count.

gcc/testsuite/ChangeLog:

PR target/115748
* gcc.target/i386/avx512-check.h: Use __builtin_cpu_supports
instead of __get_cpuid_count.

i386: Add additional variant of bswaphisi2_lowpart peephole2.

This patch adds an additional variation of the peephole2 used to convert
bswaphisi2_lowpart into rotlhi3_1_slp, which converts xchgb %ah,%al into
rotw if the flags register isn't live.  The motivating example is:

void ext(int x);
void foo(int x)
{
  ext((x&~0xffff)|((x>>8)&0xff)|((x&0xff)<<8));
}

where GCC with -O2 currently produces:

foo: movl    %edi, %eax
        rolw    $8, %ax
        movl    %eax, %edi
        jmp     ext

The issue is that the original xchgb (bswaphisi2_lowpart) can only be
performed in "Q" registers that allow the %?h register to be used, so
reload generates the above two movl.  However, it's later in peephole2
where we see that CC_FLAGS can be clobbered, so we can use a rotate word,
which is more forgiving with register allocations.  With the additional
peephole2 proposed here, we now generate:

foo: rolw    $8, %di
        jmp     ext
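
To see why a 16-bit rotate by 8 is a valid replacement, the expression can be
compared against an explicit rotate in C.  The function names below are made
up, and unsigned types are used so the shifts are well defined; this is only a
check of the arithmetic, not of the peephole2 itself.

```c
#include <stdint.h>

/* The expression from the motivating example, on an unsigned value.  */
uint32_t swap_low_bytes_expr(uint32_t x)
{
  return (x & ~UINT32_C(0xffff)) | ((x >> 8) & 0xff) | ((x & 0xff) << 8);
}

/* The same result written as a 16-bit rotate by 8 of the low half.  */
uint32_t swap_low_bytes_rot(uint32_t x)
{
  uint16_t lo = (uint16_t)x;
  uint16_t rot = (uint16_t)((lo << 8) | (lo >> 8));
  return (x & ~UINT32_C(0xffff)) | rot;
}
```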

2024-07-04  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386.md (bswaphisi2_lowpart peephole2): New
peephole2 variant to eliminate register shuffling.

gcc/testsuite/ChangeLog
* gcc.target/i386/xchg-4.c: New test case.

[committed] Fix newlib build failure with rx as well as several dozen testsuite failures

The rx port has been failing to build newlib for a bit over a week.  I can't
remember if it was the late-combine work or the IRA costing twiddle, regardless
the real bug is in the rx backend.

Basically dwarf2cfi is blowing up because of inconsistent state caused by the
failure to mark a stack adjustment as frame related.  This instance in the
epilogue looks like a simple goof.

With the port building again, the testsuite would run and it showed a number of
regressions, again related to CFI handling.  The common thread was a failure to
mark a copy from FP to SP in the prologue as frame related.  The change which
introduced this bug was supposed to just be changing promotions of vector types.
It's unclear if Nick included the hunk accidentally or just goof'd on the
logic.  Regardless it looks quite incorrect.

Reverting that hunk fixes the regressions *and* fixes 94 pre-existing failures.

The net is rx-elf is regression free and has moved forward in terms of its
testsuite status.

Pushing to the trunk momentarily.

gcc/

* config/rx/rx.cc (rx_expand_prologue): Mark the copy from FP to SP
as frame related.
(rx_expand_epilogue): Mark the stack pointer adjustment as frame
related.

[APX PPX] Avoid generating unmatched pushp/popp in pro/epilogue

According to APX spec, the pushp/popp pairs should be matched,
otherwise the PPX hint cannot take effect and cause performance loss.

In the ix86_expand_epilogue, there are several optimizations that may
cause the epilogue using mov to restore the regs. Check if PPX applied
and prevent usage of mov/leave in the epilogue. Also do not use PPX
for eh_return.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_expand_prologue): Set apx_ppx_used
flag in m.fs with TARGET_APX_PPX && !crtl->calls_eh_return.
(ix86_emit_save_regs): Emit ppx only when
TARGET_APX_PPX && !crtl->calls_eh_return.
(ix86_expand_epilogue): Don't restore reg using mov when
apx_ppx_used flag is true.
* config/i386/i386.h (struct machine_frame_state):
Add apx_ppx_used flag.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ppx-2.c: New test.
* gcc.target/i386/apx-ppx-3.c: Likewise.

c++: OVERLOAD in diagnostics

In modules we can get an OVERLOAD around a non-function, so let's tail
recurse instead of falling through. As a result we start printing the
template header in this testcase.

gcc/cp/ChangeLog:

* error.cc (dump_decl) [OVERLOAD]: Recurse on single case.

gcc/testsuite/ChangeLog:

* g++.dg/warn/pr61945.C: Adjust diagnostic.

c++: CTAD and trait built-ins

While poking at 101232 I noticed that we started trying to parse
__is_invocable(_Fn, _Args...) as a functional cast to a CTAD placeholder
type; we shouldn't consider CTAD for a template that shares a name (reserved
for the implementation) with a built-in trait.

gcc/cp/ChangeLog:

* pt.cc (ctad_template_p): Return false for trait names.

vect: Fix ICE caused by missing check for TREE_CODE == SSA_NAME

Need to check if the tree's code is SSA_NAME before SSA_NAME_RANGE_INFO.

2024-07-03 Hu, Lin1 <lin1.hu@intel.com>
Andrew Pinski <quic_apinski@quicinc.com>

gcc/ChangeLog:

PR tree-optimization/115753
* tree-vect-stmts.cc (supportable_indirect_convert_operation): Add
TREE_CODE check before SSA_NAME_RANGE_INFO.

gcc/testsuite/ChangeLog:

PR tree-optimization/115753
* gcc.dg/vect/pr115753-1.c: New test.
* gcc.dg/vect/pr115753-2.c: Ditto.
* gcc.dg/vect/pr115753-3.c: Ditto.

Daily bump.

[committed] Fix previously latent bug in reorg affecting cris port

The late-combine patch has triggered a previously latent bug in reorg.

Basically we have a sequence like this in the middle of reorg before we start
relaxing delay slots (cris-elf, gcc.dg/torture/pr98289.c)

> (insn 67 49 18 (sequence [
>             (jump_insn 50 49 52 (set (pc)
>                     (if_then_else (ne (reg:CC 19 ccr)
>                             (const_int 0 [0]))
>                         (label_ref:SI 30)
>                         (pc))) "j.c":10:6 discrim 1 282 {*bnecc}
>                  (expr_list:REG_DEAD (reg:CC 19 ccr)
>                     (int_list:REG_BR_PROB 7 (nil)))
>              -> 30)
>             (insn/f 52 50 18 (set (mem:SI (reg/f:SI 14 sp) [1  S4 A8])
>                     (reg:SI 16 srp)) 37 {*mov_tomemsi}
>                  (nil))
>         ]) "j.c":10:6 discrim 1 -1
>      (nil))
>
> (note 18 67 54 [bb 3] NOTE_INSN_BASIC_BLOCK)
>
> (note 54 18 55 NOTE_INSN_EPILOGUE_BEG)
>
> (jump_insn 55 54 56 (return) "j.c":14:1 228 {*return_expanded}
>      (nil)
>  -> return)
>
> (barrier 56 55 43)
>
> (note 43 56 65 [bb 4] NOTE_INSN_BASIC_BLOCK)
>
> (note 65 43 30 NOTE_INSN_SWITCH_TEXT_SECTIONS)
>
> (code_label 30 65 8 5 6 (nil) [1 uses])
>
> (note 8 30 61 [bb 5] NOTE_INSN_BASIC_BLOCK)

So at a high level the things to note are that insn 50 conditionally jumps
around insn 55.  Second there's a SWITCH_TEXT_SECTIONS note between insn 50 and
the target label for insn 50 (code_label 30).

reorg sees the conditional jump around the unconditional jump/return and will
invert the jump and retarget the original jump to an appropriate location.  In
this case generating:

> (insn 67 49 18 (sequence [
>             (jump_insn 50 49 52 (set (pc)
>                     (if_then_else (eq (reg:CC 19 ccr)
>                             (const_int 0 [0]))
>                         (label_ref:SI 68)
>                         (pc))) "j.c":10:6 discrim 1 281 {*beqcc}
>                  (expr_list:REG_DEAD (reg:CC 19 ccr)
>                     (int_list:REG_BR_PROB 1073741831 (nil)))
>              -> 68)
>             (insn/s/f 52 50 18 (set (mem:SI (reg/f:SI 14 sp) [1  S4 A8])
>                     (reg:SI 16 srp)) 37 {*mov_tomemsi}
>                  (nil))
>         ]) "j.c":10:6 discrim 1 -1
>      (nil))
>
> (note 18 67 54 [bb 3] NOTE_INSN_BASIC_BLOCK)
>
> (note 54 18 43 NOTE_INSN_EPILOGUE_BEG)
>
> (note 43 54 65 [bb 4] NOTE_INSN_BASIC_BLOCK)
>
> (note 65 43 8 NOTE_INSN_SWITCH_TEXT_SECTIONS)
>
> (note 8 65 61 [bb 5] NOTE_INSN_BASIC_BLOCK)
[ ... ]
Where the new target of the jump is a return statement later in the IL.

Note that we now have a SWITCH_TEXT_SECTIONS note that is not immediately
preceded by a BARRIER.  That triggers an assertion in the dwarf2 code.  Removal
of the BARRIER is inherent in this optimization.

The fix is simple: we avoid this optimization when there's a
SWITCH_TEXT_SECTIONS note between the conditional jump insn and its target.
Thankfully we already have a routine to test for this in reorg, so we just need
to call it appropriately.  The other approach would be to drop the note which I
considered and discarded.

We don't have great coverage for delay slot targets.  I've tested arc, cris,
fr30, frv, h8, iq2000, microblaze, or1k, sh3 and visium in my tester as crosses
without new regressions, fixing one regression along the way.  Bootstrap &
regression testing on sh4 and hppa will take considerably longer.

gcc/

* reorg.cc (relax_delay_slots): Do not optimize a conditional
jump around an unconditional jump/return in the presence of
a text section switch.

Revert "Delete MALLOC_ABI_ALIGNMENT define from pa32-linux.h"

This reverts commit 0ee3266b3dec4d984d43c79e2b3e649256e3eaaa.

Fortran: fix associate with assumed-length character array [PR115700]

gcc/fortran/ChangeLog:

PR fortran/115700
* trans-stmt.cc (trans_associate_var): When the associate target
is an array-valued character variable, the length is known at entry
of the associate block. Move setting of string length of the
selector to the initialization part of the block.

gcc/testsuite/ChangeLog:

PR fortran/115700
* gfortran.dg/associate_69.f90: New test.

RISC-V: Describe -march behavior for dependent extensions

gcc/ChangeLog:

* doc/invoke.texi: Describe -march behavior for dependent extensions on
RISC-V.

RISC-V: Add support for Zabha extension

The Zabha extension adds support for subword Zaamo ops.

Extension: https://github.com/riscv/riscv-zabha.git
Ratification: https://jira.riscv.org/browse/RVS-1685
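
As a rough illustration of what "subword" means here, the C sketch below
performs atomic read-modify-write operations on 8-bit and 16-bit objects using
GCC's __atomic builtins.  The helper names are hypothetical and only serve the
example.  On a target with Zabha, GCC may be able to use byte/halfword AMO
instructions for operations like these instead of an lr/sc loop on the
containing word; the actual instruction selection depends on -march and on
the memory model.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: atomic fetch-and-add on a byte.  */
uint8_t
fetch_add_u8 (uint8_t *p, uint8_t v)
{
  return __atomic_fetch_add (p, v, __ATOMIC_SEQ_CST);
}

/* Hypothetical helper: atomic fetch-and-add on a halfword.  */
uint16_t
fetch_add_u16 (uint16_t *p, uint16_t v)
{
  return __atomic_fetch_add (p, v, __ATOMIC_SEQ_CST);
}

/* Hypothetical helper: atomic exchange on a byte.  */
uint8_t
exchange_u8 (uint8_t *p, uint8_t v)
{
  return __atomic_exchange_n (p, v, __ATOMIC_SEQ_CST);
}

/* Usage example: the arithmetic wraps at the subword width.  */
void
subword_atomics_example (void)
{
  uint8_t c = 250;
  assert (fetch_add_u8 (&c, 10) == 250);  /* Returns the old value.  */
  assert (c == 4);                        /* 260 wraps modulo 256.  */
}
```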

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::to_string): Skip zabha when not supported by
the assembler.
* config.in: Regenerate.
* config/riscv/arch-canonicalize: Make zabha imply zaamo.
* config/riscv/iterators.md (amobh): Add iterator for amo
byte/halfword.
* config/riscv/riscv.opt: Add zabha.
* config/riscv/sync.md (atomic_<atomic_optab><mode>): Add
subword atomic op pattern.
(zabha_atomic_fetch_<atomic_optab><mode>): Add subword
atomic_fetch op pattern.
(lrsc_atomic_fetch_<atomic_optab><mode>): Prefer zabha over lrsc
for subword atomic ops.
(zabha_atomic_exchange<mode>): Add subword atomic exchange
pattern.
(lrsc_atomic_exchange<mode>): Prefer zabha over lrsc for subword
atomic exchange ops.
* configure: Regenerate.
* configure.ac: Add zabha assembler check.
* doc/sourcebuild.texi: Add zabha documentation.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add zabha testsuite infra support.
* gcc.target/riscv/amo/inline-atomics-1.c: Remove zabha to continue to
test the lr/sc subword patterns.
* gcc.target/riscv/amo/inline-atomics-2.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-acq-rel.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-acquire.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-relaxed.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-release.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-seq-cst.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-acq-rel.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-acquire.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-relaxed.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-release.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-seq-cst.c: Ditto.
* gcc.target/riscv/amo/zabha-all-amo-ops-char-run.c: New test.
* gcc.target/riscv/amo/zabha-all-amo-ops-short-run.c: New test.
* gcc.target/riscv/amo/zabha-rvwmo-all-amo-ops-char.c: New test.
* gcc.target/riscv/amo/zabha-rvwmo-all-amo-ops-short.c: New test.
* gcc.target/riscv/amo/zabha-rvwmo-amo-add-char.c: New test.
* gcc.target/riscv/amo/zabha-rvwmo-amo-add-short.c: New test.
* gcc.target/riscv/amo/zabha-ztso-amo-add-char.c: New test.
* gcc.target/riscv/amo/zabha-ztso-amo-add-short.c: New test.

Co-Authored-By: Patrick O'Neill <patrick@rivosinc.com>
Signed-Off-By: Gianluca Guida <gianluca@rivosinc.com>
Tested-by: Andrea Parri <andrea@rivosinc.com>

[PATCH] ARC: Update gcc.target/arc/pr9001184797.c test

... to comply with new standards due to stricter analysis in
the latest GCC versions.

gcc/testsuite/ChangeLog:

* gcc.target/arc/pr9001184797.c: Fix compiler warnings.

RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW [PR115763]

According to the ISA, the zvfhmin sub-extension should only contain
conversion insns.  Thus, the vfmv insn acting on FP16 should not be
present when only the zvfhmin option is given.

This patch fixes it by splitting the pred_broadcast define_insn
into zvfhmin and zvfh parts.  Given the example below:

void test (_Float16 *dest, _Float16 bias) {
  dest[0] = bias;
  dest[1] = bias;
}

when compiled with -march=rv64gcv_zfh_zvfhmin

Before this patch:
test:
  vsetivli        zero,2,e16,mf4,ta,ma
  vfmv.v.f        v1,fa0 // should not leverage vfmv for zvfhmin
  vse16.v v1,0(a0)
  ret

After this patch:
test:
  addi     sp,sp,-16
  fsh      fa0,14(sp)
  addi     a5,sp,14
  vsetivli zero,2,e16,mf4,ta,ma
  vlse16.v v1,0(a5),zero
  vse16.v  v1,0(a0)
  addi     sp,sp,16
  jr       ra

PR target/115763

gcc/ChangeLog:

* config/riscv/vector.md (*pred_broadcast<mode>): Split into
zvfh and zvfhmin part.
(*pred_broadcast<mode>_zvfh): New define_insn for zvfh part.
(*pred_broadcast<mode>_zvfhmin): Ditto but for zvfhmin.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/scalar_move-5.c: Adjust asm check.
* gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.
* gcc.target/riscv/rvv/base/scalar_move-7.c: Ditto.
* gcc.target/riscv/rvv/base/scalar_move-8.c: Ditto.
* gcc.target/riscv/rvv/base/pr115763-1.c: New test.
* gcc.target/riscv/rvv/base/pr115763-2.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

[MAINTAINERS] Update my email address.

* MAINTAINERS: Update my email address and add myself to DCO.

Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>

Match: Allow more types truncation for .SAT_TRUNC

The .SAT_TRUNC has input and output types, i.e. it converts from
itype to otype where sizeof (otype) < sizeof (itype).  The
previous patch only allowed sizeof (otype) == sizeof (itype) / 2,
but we also have 1/4 and 1/8 truncation.

This patch supports truncation between more types whenever
sizeof (otype) < sizeof (itype).  The truncations below are
covered.

* uint64_t => uint8_t
* uint64_t => uint16_t
* uint64_t => uint32_t
* uint32_t => uint8_t
* uint32_t => uint16_t
* uint16_t => uint8_t
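
For readers unfamiliar with saturating truncation, the following scalar C
sketch shows the semantics for a few of the listed type pairs: values that do
not fit in the narrower type clamp to its maximum instead of wrapping.  The
function names are made up for this example, and the branchless form is just
one way to write it, not necessarily the only form match.pd recognizes.

```c
#include <assert.h>
#include <stdint.h>

/* Saturating truncation uint64_t => uint8_t (1/8 width).  */
uint8_t
sat_trunc_u64_to_u8 (uint64_t x)
{
  int overflow = x > (uint64_t) UINT8_MAX;
  /* (uint8_t) -overflow is 0xff on overflow and 0 otherwise.  */
  return (uint8_t) x | (uint8_t) -overflow;
}

/* Saturating truncation uint64_t => uint32_t (1/2 width).  */
uint32_t
sat_trunc_u64_to_u32 (uint64_t x)
{
  int overflow = x > (uint64_t) UINT32_MAX;
  return (uint32_t) x | (uint32_t) -overflow;
}

/* Saturating truncation uint32_t => uint16_t (1/2 width).  */
uint16_t
sat_trunc_u32_to_u16 (uint32_t x)
{
  int overflow = x > (uint32_t) UINT16_MAX;
  return (uint16_t) x | (uint16_t) -overflow;
}

/* Usage example.  */
void
sat_trunc_example (void)
{
  assert (sat_trunc_u64_to_u8 (200) == 200);  /* Fits, unchanged.  */
  assert (sat_trunc_u64_to_u8 (300) == 255);  /* Too large, clamps.  */
}
```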

The test suites below passed for this patch:
1. The rv64gcv full regression tests.
2. The rv64gcv build with glibc.
3. The x86 bootstrap tests.
4. The x86 full regression tests.

gcc/ChangeLog:

* match.pd: Allow truncation whenever otype is narrower than itype.

Signed-off-by: Pan Li <pan2.li@intel.com>

Vect: Support IFN SAT_TRUNC for unsigned vector int

This patch supports .SAT_TRUNC for unsigned vector int.  Given the
example code below:

Form 1
  #define VEC_DEF_SAT_U_TRUC_FMT_1(NT, WT)                             \
  void __attribute__((noinline))                                       \
  vec_sat_u_truc_##WT##_to_##NT##_fmt_1 (NT *x, WT *y, unsigned limit) \
  {                                                                    \
    for (unsigned i = 0; i < limit; i++)                               \
      {                                                                \
        bool overflow = y[i] > (WT)(NT)(-1);                           \
        x[i] = ((NT)y[i]) | (NT)-overflow;                             \
      }                                                                \
  }

VEC_DEF_SAT_U_TRUC_FMT_1 (uint32_t, uint64_t)

Before this patch:
void vec_sat_u_truc_uint64_t_to_uint32_t_fmt_1 (uint32_t * x, uint64_t * y, unsigned int limit)
{
  ...
  _51 = .SELECT_VL (ivtmp_49, POLY_INT_CST [2, 2]);
  ivtmp_35 = _51 * 8;
  vect__4.7_32 = .MASK_LEN_LOAD (vectp_y.5_34, 64B, { -1, ... }, _51, 0);
  mask_overflow_16.8_30 = vect__4.7_32 > { 4294967295, ... };
  vect__5.9_29 = (vector([2,2]) unsigned int) vect__4.7_32;
  vect__10.13_20 = .VCOND_MASK (mask_overflow_16.8_30, { 4294967295, ... }, vect__5.9_29);
  ivtmp_12 = _51 * 4;
  .MASK_LEN_STORE (vectp_x.14_11, 32B, { -1, ... }, _51, 0, vect__10.13_20);
  vectp_y.5_33 = vectp_y.5_34 + ivtmp_35;
  vectp_x.14_46 = vectp_x.14_11 + ivtmp_12;
  ivtmp_50 = ivtmp_49 - _51;
  if (ivtmp_50 != 0)
  ...
}

After this patch:
void vec_sat_u_truc_uint64_t_to_uint32_t_fmt_1 (uint32_t * x, uint64_t * y, unsigned int limit)
{
  ...
  _12 = .SELECT_VL (ivtmp_21, POLY_INT_CST [2, 2]);
  ivtmp_34 = _12 * 8;
  vect__4.7_31 = .MASK_LEN_LOAD (vectp_y.5_33, 64B, { -1, ... }, _12, 0);
  vect_patt_40.8_30 = .SAT_TRUNC (vect__4.7_31); // << .SAT_TRUNC
  ivtmp_29 = _12 * 4;
  .MASK_LEN_STORE (vectp_x.9_28, 32B, { -1, ... }, _12, 0, vect_patt_40.8_30);
  vectp_y.5_32 = vectp_y.5_33 + ivtmp_34;
  vectp_x.9_27 = vectp_x.9_28 + ivtmp_29;
  ivtmp_20 = ivtmp_21 - _12;
  if (ivtmp_20 != 0)
  ...
}

The test suites below passed for this patch:
* The x86 bootstrap test.
* The x86 full regression test.
* The rv64gcv full regression tests.

gcc/ChangeLog:

* tree-vect-patterns.cc (gimple_unsigned_integer_sat_trunc): Add
new decl generated by match.
(vect_recog_sat_trunc_pattern): Add new func impl to recog the
.SAT_TRUNC pattern.

Signed-off-by: Pan Li <pan2.li@intel.com>

Remove redundant vector permute dump

The following removes redundant dumping in vect permute vectorization.

* tree-vect-slp.cc (vectorizable_slp_permutation_1): Remove
redundant dump.

[PATCH] match.pd: Fold x/sqrt(x) to sqrt(x)

This patch adds a pattern in match.pd folding x/sqrt(x) to sqrt(x) for -funsafe-math-optimizations. Test cases were added for double, float, and long double.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
Ok for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/

* match.pd: Fold x/sqrt(x) to sqrt(x).

gcc/testsuite/

* gcc.dg/tree-ssa/sqrt_div.c: New test.

Deduplicate explicitly-sized types

When make_type_from_size is called with a biased type, for an entity
that isn't explicitly biased, we may refrain from reusing the given
type because it doesn't seem to match, and then proceed to create an
exact copy of that type.

Compute earlier the biased status of the expected type, early enough
for the suitability check of the given type.  Modify for_biased
instead of biased_p, so that biased_p remains with the given type's
status for the comparison.

Avoid creating unnecessary copies of types in make_type_from_size, by
caching and reusing previously-created identical types, similarly to
the caching of packable types.

While at that, fix two vaguely related issues:

- TYPE_DEBUG_TYPE's storage is shared with other sorts of references
to types, so it shouldn't be accessed unless
TYPE_CAN_HAVE_DEBUG_TYPE_P holds.

- When we choose the narrower/packed variant of a type as the main
debug info type, we fail to output its name if we fail to follow debug
type for the TYPE_NAME decl type in modified_type_die.

for  gcc/ada/ChangeLog

* gcc-interface/misc.cc (gnat_get_array_descr_info): Only follow
TYPE_DEBUG_TYPE if TYPE_CAN_HAVE_DEBUG_TYPE_P.
* gcc-interface/utils.cc (sized_type_hash): New struct.
(sized_type_hasher): New struct.
(sized_type_hash_table): New variable.
(init_gnat_utils): Allocate it.
(destroy_gnat_utils): Release it.
(sized_type_hasher::equal): New.
(hash_sized_type): New.
(canonicalize_sized_type): New.
(make_type_from_size): Use it to cache packed variants.  Fix
type reuse by combining biased_p and for_biased earlier.  Hold
the combination in for_biased, adjusting later uses.

for  gcc/ChangeLog

* dwarf2out.cc (modified_type_die): Follow name's debug type.

for  gcc/testsuite/ChangeLog

* gnat.dg/bias1.adb: Count occurrences of -7.*DW_AT_GNU_bias.

[debug] Avoid dropping bits from num/den in fixed-point types

We used to use an unsigned 128-bit type to hold the numerator and
denominator used to represent the delta of a fixed-point type in debug
information, but there are cases in which that was not enough, and
more significant bits silently overflowed and got omitted from debug
information.

Introduce a mode in which UI_to_gnu selects a wide-enough unsigned
type, and use that to convert numerator and denominator.  While at
that, avoid exceeding the maximum precision for wide ints, and for
available int modes, when selecting a type to represent very wide
constants, falling back to 0/0 for unrepresentable fractions.

for  gcc/ada/ChangeLog

* gcc-interface/cuintp.cc (UI_To_gnu): Add mode that selects a
wide enough unsigned type.  Fail if the constant exceeds the
representable numbers.
* gcc-interface/decl.cc (gnat_to_gnu_entity): Use it for
numerator and denominator of fixed-point types.  In case of
failure, fall back to an indeterminate fraction.

[i386] restore recompute to override opts after change [PR113719]

The first patch for PR113719 regressed gcc.dg/ipa/iinline-attr.c on
toolchains configured to --enable-frame-pointer, because the
optimization node created within handle_optimize_attribute had
flag_omit_frame_pointer incorrectly set, whereas
default_optimization_node didn't.  With this difference,
can_inline_edge_by_limits_p flagged an optimization mismatch and we
refused to inline the function that had a redundant optimization flag
into one that didn't, which is exactly what is tested for there.

This patch restores the calls to ix86_default_align and
ix86_recompute_optlev_based_flags that used to be, and ought to be,
issued during TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE, but preserves the
intent of the original change, of having those functions called at
different spots within ix86_option_override_internal.  To that end,
the remaining bits were refactored into a separate function, that was
in turn adjusted to operate on explicitly-passed opts and opts_set,
rather than going for their global counterparts.

for  gcc/ChangeLog

PR target/113719
* config/i386/i386-options.cc
(ix86_override_options_after_change_1): Add opts and opts_set
parms, operate on them, after factoring out of...
(ix86_override_options_after_change): ... this.  Restore calls
of ix86_default_align and ix86_recompute_optlev_based_flags.
(ix86_option_override_internal): Call the factored-out bits.

aarch64: PR target/115475 Implement missing __ARM_FEATURE_SVE_BF16 macro

The ACLE requires __ARM_FEATURE_SVE_BF16 to be enabled when SVE and BF16
and the associated intrinsics are available.
GCC does support the required intrinsics for TARGET_SVE_BF16 so define
this macro too.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/

PR target/115475
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
Define __ARM_FEATURE_SVE_BF16 for TARGET_SVE_BF16.

gcc/testsuite/

PR target/115475
* gcc.target/aarch64/acle/bf16_sve_feature.c: New test.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>

aarch64: PR target/115457 Implement missing __ARM_FEATURE_BF16 macro

The ACLE asks the user to test for __ARM_FEATURE_BF16 before using the
<arm_bf16.h> header but GCC doesn't set this up.
LLVM does, so this is an inconsistency between the compilers.

This patch enables that macro for TARGET_BF16_FP.
Bootstrapped and tested on aarch64-none-linux-gnu.
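
A minimal sketch of the user-side check the ACLE describes: test the feature
macro first and include <arm_bf16.h> only when it is defined.  The
HAVE_ARM_BF16 macro and the have_arm_bf16 function are names invented for this
example, not part of the ACLE or of GCC.

```c
#include <assert.h>

/* Only pull in the header when the compiler advertises BF16 support.  */
#if defined (__ARM_FEATURE_BF16)
# include <arm_bf16.h>
# define HAVE_ARM_BF16 1   /* Hypothetical project-local macro.  */
#else
# define HAVE_ARM_BF16 0
#endif

/* Report at run time what was detected at compile time.  */
int
have_arm_bf16 (void)
{
  return HAVE_ARM_BF16;
}

/* Usage example: the result is always a plain 0/1 flag.  */
void
bf16_feature_example (void)
{
  assert (have_arm_bf16 () == 0 || have_arm_bf16 () == 1);
}
```

Code that uses the BF16 types or intrinsics would then sit behind the same
guard, so that it still compiles on targets or option sets without BF16.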

gcc/

PR target/115457
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
Define __ARM_FEATURE_BF16 for TARGET_BF16_FP.

gcc/testsuite/

PR target/115457
* gcc.target/aarch64/acle/bf16_feature.c: New test.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>

Handle NULL stmt in SLP_TREE_SCALAR_STMTS

The following starts to handle NULL elements in SLP_TREE_SCALAR_STMTS
with the first candidate being the two-operator nodes where some
lanes are do-not-care and also do not have a scalar stmt computing
the result. I originally added SLP_TREE_SCALAR_STMTS to two-operator
nodes but this exposes PR115764, so I've split that out.

I have a patch to use NULL elements for loads from groups with gaps,
where we currently get around not doing that by having a load permutation.

* tree-vect-slp.cc (bst_traits::hash): Handle NULL elements
in SLP_TREE_SCALAR_STMTS.
(vect_print_slp_tree): Likewise.
(vect_mark_slp_stmts): Likewise.
(vect_mark_slp_stmts_relevant): Likewise.
(vect_find_last_scalar_stmt_in_slp): Likewise.
(vect_bb_slp_mark_live_stmts): Likewise.
(vect_slp_prune_covered_roots): Likewise.
(vect_bb_partition_graph_r): Likewise.
(vect_remove_slp_scalar_calls): Likewise.
(vect_slp_gather_vectorized_scalar_stmts): Likewise.
(vect_bb_slp_scalar_cost): Likewise.
(vect_contains_pattern_stmt_p): Likewise.
(vect_slp_convert_to_external): Likewise.
(vect_find_first_scalar_stmt_in_slp): Likewise.
(vect_optimize_slp_pass::remove_redundant_permutations): Likewise.
(vect_slp_analyze_node_operations_1): Likewise.
(vect_schedule_slp_node): Likewise.
* tree-vect-stmts.cc (can_vectorize_live_stmts): Likewise.
(vectorizable_shift): Likewise.
* tree-vect-data-refs.cc (vect_slp_analyze_load_dependences):
Handle NULL elements in SLP_TREE_SCALAR_STMTS.

AVR: target/98762 - Handle partial clobber in movqi output.

PR target/98762
gcc/
* config/avr/avr.cc (avr_out_movqi_r_mr_reg_disp_tiny): Properly
restore the base register when it is partially clobbered.
gcc/testsuite/
* gcc.target/avr/torture/pr98762.c: New test.

ivopts: replace constant_multiple_of with aff_combination_constant_multiple_p [PR114932]

The current implementation of constant_multiple_of is doing a more limited
version of aff_combination_constant_multiple_p.

The only non-debug usage of constant_multiple_of will proceed with the values
as affine trees. There is scope for further optimization here, namely I believe
that if constant_multiple_of returns the aff_tree after the conversion then
get_computation_aff_1 can use it instead of manually creating the aff_tree.

However I think it makes sense to first commit this smaller change and then
incrementally change things.

gcc/ChangeLog:

PR tree-optimization/114932
* tree-ssa-loop-ivopts.cc (constant_multiple_of): Use
aff_combination_constant_multiple_p instead.

ivopts: fix wide_int_constant_multiple_p when VAL and DIV are 0. [PR114932]

wide_int_constant_multiple_p checks whether, for two tree expressions a and b,
there is a multiplier c such that a == b * c.

However, this code seems to think that there is no such c when a == 0 and
b == 0, which is of course wrong.

This fixes it and also fixes the comment.
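
The zero case can be illustrated with a toy model on int64_t.  This is only a
sketch: the real routine operates on wide_int values and has a different
signature, and the function name below is made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: return true and store c in *mult if val == div * c for
   some integer c.  Not GCC's actual interface.  */
bool
toy_constant_multiple_p (int64_t val, int64_t div, int64_t *mult)
{
  if (val == 0)
    {
      /* 0 == div * 0 holds for every div, including div == 0.  */
      *mult = 0;
      return true;
    }

  /* A nonzero val is never a multiple of 0.  */
  if (div == 0)
    return false;

  /* Avoid the one quotient that is not representable.  */
  if (val == INT64_MIN && div == -1)
    return false;

  if (val % div != 0)
    return false;

  *mult = val / div;
  return true;
}

/* Usage example: 0 and 0 are multiples of each other with c == 0.  */
void
toy_constant_multiple_example (void)
{
  int64_t c = -1;
  assert (toy_constant_multiple_p (0, 0, &c) && c == 0);
}
```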

gcc/ChangeLog:

PR tree-optimization/114932
* tree-affine.cc (wide_int_constant_multiple_p): Support 0 and 0 being
multiples.

Give fast DCE a separate dirty flag

Thomas pointed out that we sometimes failed to eliminate some dead code
(specifically clobbers of otherwise unused registers) on nvptx when
late-combine is enabled.  This happens because:

- combine is able to optimise the function in a way that exposes dead code.
  This leaves the df information in a "dirty" state.

- late_combine calls df_analyze without DF_LR_RUN_DCE set.
  This updates the df information and clears the "dirty" state.

- late_combine doesn't find any extra optimisations, and so leaves
  the df information up-to-date.

- if_after_combine (ce2) calls df_analyze with DF_LR_RUN_DCE set.
  Because the df information is already up-to-date, fast DCE is
  not run.

The upshot is that running late-combine has the effect of suppressing
a DCE opportunity that would have been noticed without late_combine.

I think this shows that we should track the state of the DCE separately
from the LR problem.  Every pass updates the latter, but not all passes
update the former.
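
The sequence of events can be modelled with a small C sketch.  Everything
here (toy_df, toy_mark_changed, toy_analyze) is hypothetical and does not
correspond to GCC's df API; it only shows why a dirty flag for DCE that is
separate from the LR flag keeps the DCE opportunity alive across a pass that
refreshes LR without requesting DCE.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the dataflow state.  */
struct toy_df
{
  bool lr_dirty;   /* Liveness (LR) information is stale.  */
  bool dce_dirty;  /* The IL changed since fast DCE last ran.  */
  int dce_runs;    /* How many times fast DCE actually ran.  */
};

/* A pass changed the IL: both kinds of state become stale.  */
void
toy_mark_changed (struct toy_df *df)
{
  df->lr_dirty = true;
  df->dce_dirty = true;
}

/* Toy df_analyze: LR is always brought up to date, but fast DCE runs
   only when the caller asks for it and the DCE state is dirty.  */
void
toy_analyze (struct toy_df *df, bool run_dce)
{
  df->lr_dirty = false;
  if (run_dce && df->dce_dirty)
    {
      df->dce_runs++;
      df->dce_dirty = false;
    }
}

/* Usage example following the pass order described above.  */
void
toy_dce_example (void)
{
  struct toy_df df = { false, false, 0 };
  toy_mark_changed (&df);    /* combine exposes dead code.  */
  toy_analyze (&df, false);  /* late_combine: LR only, no DCE.  */
  assert (df.dce_dirty);     /* The DCE opportunity is still recorded.  */
  toy_analyze (&df, true);   /* ce2 asks for DCE.  */
  assert (df.dce_runs == 1); /* Fast DCE now runs.  */
}
```

With a single shared flag, the second call would have cleared the only record
that dead code might exist, and the third call would have skipped DCE.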

gcc/
* df.h (DF_LR_DCE): New df_problem_id.
(df_lr_dce): New macro.
* df-core.cc (rest_of_handle_df_finish): Check for a null free_fun.
* df-problems.cc (df_lr_finalize): Split out fast DCE handling to...
(df_lr_dce_finalize): ...this new function.
(problem_LR_DCE): New df_problem.
(df_lr_add_problem): Register LR_DCE rather than LR itself.
* dce.cc (fast_dce): Clear df_lr_dce->solutions_dirty.

Move runtime check into a separate function and guard it with target ("no-avx")

The patch avoids SIGILL on non-AVX512 machines, which occurred because
kmovd is generated in the dynamic check.

gcc/testsuite/ChangeLog:

PR target/115748
* gcc.target/i386/avx512-check.h: Move runtime check into a
separate function and guard it with target ("no-avx").

RISC-V: Fix asm check failure for truncated after SAT_SUB

It seems that the asm check is incorrect for truncation after SAT_SUB;
we should check for the vx form of vssubu instead of the vv form.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c:
Update vssubu check from vv to vx.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c:
Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

tree-optimization/115764 - testcase for BB SLP issue

The following adds a testcase for a CSE issue with BB SLP two operator
handling when we make those CSE aware by providing SLP_TREE_SCALAR_STMTS
for them. This was reduced from 526.blender_r.

PR tree-optimization/115764
* gcc.dg/vect/bb-slp-76.c: New testcase.

preprocessor: Create the parser before handling command-line includes [PR115312]

Since r14-2893, we create a parser object in preprocess-only mode for the
purpose of parsing #pragma while preprocessing. The parser object was
formerly created after calling c_finish_options(), which leads to problems
on platforms that don't use stdc-predef.h (such as MinGW, as reported in
the PR). On such platforms, the call to c_finish_options() will process
the first command-line-specified include file. If that includes a PCH, then
c-ppoutput.cc will encounter a state it did not anticipate. Fix it by
creating the parser prior to calling c_finish_options().

gcc/c-family/ChangeLog:

PR pch/115312
* c-opts.cc (c_common_init): Call c_init_preprocess() before
c_finish_options() so that a parser is available to process any
includes specified on the command line.

gcc/testsuite/ChangeLog:

PR pch/115312
* g++.dg/pch/pr115312.C: New test.
* g++.dg/pch/pr115312.Hs: New test.

Daily bump.

aarch64: Add vector popcount besides QImode [PR113859]

This patch improves GCC’s vectorization of __builtin_popcount for the aarch64
target by adding popcount patterns for vector modes besides QImode, i.e.,
HImode, SImode and DImode.

With this patch, we now generate the following for V8HI:
  cnt     v1.16b, v0.16b
  uaddlp  v2.8h, v1.16b

For V4HI, we generate:
  cnt     v1.8b, v0.8b
  uaddlp  v2.4h, v1.8b

For V4SI, we generate:
  cnt     v1.16b, v0.16b
  uaddlp  v2.8h, v1.16b
  uaddlp  v3.4s, v2.8h

For V4SI with TARGET_DOTPROD, we generate the following instead:
  movi    v0.4s, #0
  movi    v1.16b, #1
  cnt     v3.16b, v2.16b
  udot    v0.4s, v3.16b, v1.16b

For V2SI, we generate:
  cnt     v1.8b, v0.8b
  uaddlp  v2.4h, v1.8b
  uaddlp  v3.2s, v2.4h

For V2SI with TARGET_DOTPROD, we generate the following instead:
  movi    v0.8b, #0
  movi    v1.8b, #1
  cnt     v3.8b, v2.8b
  udot    v0.2s, v3.8b, v1.8b

For V2DI, we generate:
  cnt     v1.16b, v0.16b
  uaddlp  v2.8h, v1.16b
  uaddlp  v3.4s, v2.8h
  uaddlp  v4.2d, v3.4s

For V2DI with TARGET_DOTPROD, we generate the following instead:
  movi    v0.4s, #0
  movi    v1.16b, #1
  cnt     v3.16b, v2.16b
  udot    v0.4s, v3.16b, v1.16b
  uaddlp  v0.2d, v0.4s
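
To see why the cnt/uaddlp sequences compute a per-element popcount, here is a
scalar C model of the V4SI case (without TARGET_DOTPROD).  The model_*
function names are invented for this sketch; each one mimics the lane-wise
effect of the instruction named in its comment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Count the set bits of one byte.  */
static uint8_t
popcount8 (uint8_t b)
{
  uint8_t n = 0;
  for (; b != 0; b >>= 1)
    n += b & 1;
  return n;
}

/* Model of "cnt vD.16b, vN.16b": popcount of each byte lane.  */
void
model_cnt_16b (const uint8_t in[16], uint8_t out[16])
{
  for (int i = 0; i < 16; i++)
    out[i] = popcount8 (in[i]);
}

/* Model of "uaddlp vD.8h, vN.16b": add adjacent bytes into halfwords.  */
void
model_uaddlp_8h (const uint8_t in[16], uint16_t out[8])
{
  for (int i = 0; i < 8; i++)
    out[i] = (uint16_t) (in[2 * i] + in[2 * i + 1]);
}

/* Model of "uaddlp vD.4s, vN.8h": add adjacent halfwords into words.  */
void
model_uaddlp_4s (const uint16_t in[8], uint32_t out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = (uint32_t) in[2 * i] + in[2 * i + 1];
}

/* V4SI popcount as cnt + uaddlp + uaddlp.  The four bytes of each word
   are adjacent in memory, so byte order does not affect the sums.  */
void
model_popcount_v4si (const uint32_t in[4], uint32_t out[4])
{
  uint8_t bytes[16], counts[16];
  uint16_t halves[8];

  memcpy (bytes, in, sizeof bytes);
  model_cnt_16b (bytes, counts);
  model_uaddlp_8h (counts, halves);
  model_uaddlp_4s (halves, out);
}

/* Usage example.  */
void
model_popcount_example (void)
{
  const uint32_t in[4] = { 0, 0xffffffffu, 0x0f0f0f0fu, 0x80000001u };
  uint32_t out[4];
  model_popcount_v4si (in, out);
  assert (out[0] == 0 && out[1] == 32 && out[2] == 16 && out[3] == 2);
}
```

The V2DI case adds one more pairwise widening step on top of this, which
matches the extra uaddlp in the sequences above.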

PR target/113859

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_<su>addlp<mode>): Rename to...
(@aarch64_<su>addlp<mode>): ... This.
(popcount<mode>2): New define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/popcnt-udot.c: New test.
* gcc.target/aarch64/popcnt-vec.c: New test.

Signed-off-by: Pengxuan Zheng <quic_pzheng@quicinc.com>

aarch64: Add testcase for vectorconvert lowering [PR110473]

Vectorconvert lowering was changed to use the convert optab directly
starting in r15-1677-gc320a7efcd35ba. I had filed an aarch64-specific
issue for this, so it makes sense to add an aarch64-specific testcase
instead of having only x86_64-specific ones.

Pushed as obvious after testing for aarch64-linux-gnu.

PR tree-optimization/110473
PR tree-optimization/107432

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-convert-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Rename expand_powcabs pass to expand_pow

Since cabs expansion was removed from this pass,
it would be good to rename it.

Bootstrapped and tested on x86_64-linux-gnu

gcc/ChangeLog:

* passes.def (expand_pow): Renamed from expand_powcabs.
* timevar.def (TV_TREE_POWCABS): Remove.
(TV_TREE_POW): Add.
* tree-pass.h (make_pass_expand_powcabs): Rename to ...
(make_pass_expand_pow): This.
* tree-ssa-math-opts.cc (class pass_expand_powcabs): Rename to ...
(class pass_expand_pow): This.
(pass_expand_powcabs::execute): Rename to ...
(pass_expand_pow::execute): This.
(make_pass_expand_powcabs): Rename to ...
(make_pass_expand_pow): This.

gcc/testsuite/ChangeLog:

* gcc.dg/pow-sqrt-synth-1.c: Update testcase for renamed pass.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Add some optimizations to gimple_expand_builtin_cabs

While looking into the original folding code for cabs
(moved to match in r6-4111-gabcc43f5323869), I noticed that
`cabs(x+0i)` was optimized even without the need of sqrt.
I also noticed that the code generation in this case
is now worse if the target has a sqrt. So let's implement
these small optimizations in gimple_expand_builtin_cabs.
Note `cabs(x+0i)` is done without unsafe math optimizations.
This is because the definition of `cabs(x+0i)` is `hypot(x, 0)`
and the definition in the standard says that just returns `abs(x)`.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-complex.cc (gimple_expand_builtin_cabs): Add
`cabs(a+ai)`, `cabs(x+0i)` and `cabs(0+xi)` optimizations.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/cabs-3.c: New test.
* gcc.dg/tree-ssa/cabs-4.c: New test.
* gcc.dg/tree-ssa/cabs-5.c: New test.
* gcc.dg/tree-ssa/cabs-6.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Move cabs expansion from powcabs to complex lowering [PR115710]

Expanding cabs in powcabs might be too late, as forwprop might
recombine the load from memory with the complex expr. Moving it
to complex lowering instead allows us to directly use the real/imag
components from the loads. This allows for vectorization too.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/115710

gcc/ChangeLog:

* tree-complex.cc (init_dont_simulate_again): Handle CABS.
(gimple_expand_builtin_cabs): New function, moved mostly
from tree-ssa-math-opts.cc.
(expand_complex_operations_1): Call gimple_expand_builtin_cabs.
* tree-ssa-math-opts.cc (gimple_expand_builtin_cabs): Remove.
(build_and_insert_binop): Remove.
(pass_data_expand_powcabs): Update comment.
(pass_expand_powcabs::execute): Don't handle CABS.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/cabs-1.c: New test.
* gcc.dg/tree-ssa/cabs-2.c: New test.
* gfortran.dg/vect/pr115710.f90: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Small optimization for complex addition, real/imag parts the same

This is just a small optimization for the case where the real and imag
parts are the same when lowering complex addition/subtraction. We only
need to do the addition once when the real and imag parts are the same (on
both sides of the operator). This gets done later on by FRE/PRE/DOM, but
having it done sooner allows the cabs lowering to remove the sqrt and
just change it to a multiply by a constant.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-complex.cc (expand_complex_addition): If both
operands have the same real and imag parts, only
do the addition once.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/complex-8.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

c++: Fix ICE on constexpr placement new [PR115754]

The C++26 paper P2747R2 makes placement new constexpr.
While working on a patch for that, I've noticed we ICE starting with
GCC 14 on the following testcase.
The problem is that e.g. for the void * to sometype * casts checks,
we really assume the casts have their operand constant evaluated
as prvalue, but on the testcase the cast itself is evaluated with
vc_discard and that means op can end up e.g. a VAR_DECL which the
later code doesn't like and asserts on.
If the result type is void, we don't really need the cast operand
for anything, so we can use vc_discard for the recursive call.
VIEW_CONVERT_EXPR can appear on the lhs, so we need to honor the
lval there, but otherwise the patch uses vc_prvalue.
I'd like to get this patch in before the rest of P2747R2 implementation,
so that it can be backported to 14.2 later on.

2024-07-02 Jakub Jelinek <jakub@redhat.com>
Jason Merrill <jason@redhat.com>

PR c++/115754
* constexpr.cc (cxx_eval_constant_expression) <case CONVERT_EXPR>:
For conversions to void, pass vc_discard to the recursive call
and otherwise for tcode other than VIEW_CONVERT_EXPR pass vc_prvalue.

* g++.dg/cpp26/pr115754.C: New test.

c++: Implement C++26 P3144R2 - Deleting a Pointer to an Incomplete Type Should be Ill-formed [PR115747]

The following patch implements the C++26 paper which makes delete
and delete[] on incomplete class types invalid; previously it was
UB unless the class had a trivial destructor and no custom
deallocator.

The patch uses permerror_opt, so -Wno-delete-incomplete makes it
still compile without warnings like before, and -fpermissive makes
it warn but not error; in SFINAE contexts it is considered an error
in C++26 and later.
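
A minimal sketch of the affected pattern, assuming a hypothetical
Widget class (not taken from the new tests).  The delete on the
incomplete type is left commented out so that the snippet compiles in
every mode.

```cpp
#include <cassert>

struct Widget;                    // incomplete at this point

// Ill-formed in C++26 per P3144R2 (earlier: UB unless Widget turned out
// to have a trivial destructor and no custom deallocator):
// void destroy_early (Widget *p) { delete p; }

static int destroyed = 0;
struct Widget { ~Widget () { ++destroyed; } };

// Fine: Widget is complete here, so the destructor is known to run.
void destroy (Widget *p) { delete p; }

// Returns how many destructors ran for one new/delete pair.
int destroy_one ()
{
  int before = destroyed;
  destroy (new Widget);
  return destroyed - before;
}
```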

2024-07-02 Jakub Jelinek <jakub@redhat.com>
Jason Merrill <jason@redhat.com>

PR c++/115747
gcc/cp/
* init.cc: Implement C++26 P3144R2 - Deleting a Pointer to an
Incomplete Type Should be Ill-formed.
(build_vec_delete_1): Emit permerror_at and return error_mark_node
for delete [] on incomplete type.
(build_delete): Similarly for delete.
gcc/testsuite/
* g++.dg/init/delete1.C: Adjust expected diagnostics for C++26.
* g++.dg/warn/Wdelete-incomplete-1.C: Likewise.
* g++.dg/warn/incomplete1.C: Likewise.
* g++.dg/ipa/pr85607.C: Likewise.
* g++.dg/cpp26/delete1.C: New test.
* g++.dg/cpp26/delete2.C: New test.
* g++.dg/cpp26/delete3.C: New test.

c++: Implement C++26 P0963R3 - Structured binding declaration as a condition [PR115745]

This C++26 paper allows structured binding declarations in
if/while/for/switch conditions, where the structured binding shouldn't
be initialized by array (so in the standard only non-union class types;
as extension _Complex will also work and vectors will be diagnosed because
of conversion issues) and the decision variable is the artificial variable
(e in the standard) itself contextually converted to bool or converted to
some integer/enumeration type.
The standard requires that the conversion is evaluated before the get calls
in case of std::tuple* using class, so the largest part of the patch is making
sure this can be done during instantiation without duplicating too much
code.
In cp_parser_condition, creating a TARGET_EXPR to hold temporarily the
bool or int/enum result of the conversion across the get calls is easy, it
could be just added in between cp_finish_decl and cp_finish_decomp, but for
pt.cc there was no easy spot to add that.
In the end, the patch uses DECL_DECOMP_BASE for this.  That tree is used
primarily for the user vars or var proxies to point back at the
DECL_ARTIFICIAL e variable, before this patch it has been NULL_TREE on
the base.  In some places code was checking if DECL_DECOMP_BASE is NULL_TREE
to find out if it is the base or user var/var proxy.
The patch introduces DECL_DECOMP_IS_BASE macro for what used to be
!DECL_DECOMP_BASE and can stick something else in the base's
DECL_DECOMP_BASE as long as it is not a VAR_DECL.
The patch uses integer_zero_node to mark if/while/for condition structured
binding, integer_one_node to mark switch condition structured binding and
finally cp_finish_decomp sets it to TARGET_EXPR if some get method calls are
emitted and from there the callers can pick that up.  This way I also
avoided code duplication between !processing_template_decl parsing and
pt.cc.
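
To illustrate what is being tested in such a condition, here is a
sketch using a hypothetical ParseResult class (not from the new
tests).  The C++26 spelling is shown in a comment; the live code is a
roughly equivalent C++17 form, where the decision is made on the whole
object e rather than on any individual binding.

```cpp
#include <cassert>

// Hypothetical non-union class type with a contextual bool conversion.
struct ParseResult
{
  int value;
  bool ok;
  explicit operator bool () const { return ok; }
};

ParseResult parse (int x) { return {x * 2, x >= 0}; }

int use (int x)
{
  // C++26 form (needs a compiler implementing P0963R3):
  //   if (auto [value, ok] = parse (x)) return value;
  // Roughly equivalent C++17 spelling:
  if (auto e = parse (x); e)
    {
      auto &[value, ok] = e;
      (void) ok;                  // unused in this sketch
      return value;
    }
  return -1;
}
```

This class uses plain data members, so no get calls are involved; the
ordering requirement described above only matters for the
std::tuple_size/std::tuple_element case.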

2024-07-02  Jakub Jelinek  <jakub@redhat.com>

PR c++/115745
gcc/cp/
* cp-tree.h: Implement C++26 P0963R3 - Structured binding declaration
as a condition.
(DECL_DECOMP_BASE): Adjust comment.
(DECL_DECOMP_IS_BASE): Define.
* parser.cc (cp_parser_selection_statement): Adjust
cp_parser_condition caller.
(cp_parser_condition): Add KEYWORD argument.  Parse
C++26 structured bindings in conditions.
(cp_parser_c_for, cp_parser_iteration_statement): Adjust
cp_parser_condition callers.
(cp_parser_simple_declaration): Adjust
cp_parser_decomposition_declaration caller.
(cp_parser_decomposition_declaration): Add KEYWORD argument.
If it is not RID_MAX, diagnose for C++23 and older rather than C++14
and older.  Set DECL_DECOMP_BASE to integer_zero_node for structured
bindings used in if/while/for conditions or integer_one_node for
those used in switch conditions.
* decl.cc (poplevel, check_array_initializer): Use DECL_DECOMP_IS_BASE
instead of !DECL_DECOMP_BASE.
(cp_finish_decomp): Diagnose array initializer for structured bindings
used in conditions.  If using std::tuple_{size,element}, emit
conversion to bool or integer/enumeration of e into a TARGET_EXPR
before emitting get method calls.
* decl2.cc (mark_used): Use DECL_DECOMP_IS_BASE instead of
!DECL_DECOMP_BASE.
* module.cc (trees_in::tree_node): Likewise.
* typeck.cc (maybe_warn_about_returning_address_of_local): Likewise.
* semantics.cc (maybe_convert_cond): For structured bindings with
TARGET_EXPR DECL_DECOMP_BASE use that as condition.
(finish_switch_cond): Likewise.
gcc/testsuite/
* g++.dg/cpp1z/decomp16.C: Adjust expected diagnostics.
* g++.dg/cpp26/decomp3.C: New test.
* g++.dg/cpp26/decomp4.C: New test.
* g++.dg/cpp26/decomp5.C: New test.
* g++.dg/cpp26/decomp6.C: New test.
* g++.dg/cpp26/decomp7.C: New test.
* g++.dg/cpp26/decomp8.C: New test.
* g++.dg/cpp26/decomp9.C: New test.
* g++.dg/cpp26/decomp10.C: New test.

Regenerate common.opt.urls

I was not aware of the requirement to regenerate the opt urls files
when adding a new option until the autobuilder complained.

Regenerate common.opt.urls for the -gprune-btf option added in:
b8977d928a7a btf: add -gprune-btf option

gcc/
* common.opt.urls: Regenerate.

bpf,btf: enable BTF pruning by default for BPF

This patch enables -gprune-btf by default in the BPF backend when
generating BTF information, and fixes BPF CO-RE generation when using
-gprune-btf.

When generating BPF CO-RE information, we must ensure that types used
in CO-RE relocations always have sufficient BTF information emitted so
that the CO-RE relocations can be processed by a BPF loader. The BTF
pruning algorithm on its own does not have sufficient information to
determine which types are used in a BPF CO-RE relocation, so this
information must be supplied by the BPF backend, using a new
btf_mark_type_used function.

Co-authored-by: Cupertino Miranda <cupertino.miranda@oracle.com>
gcc/
* btfout.cc (btf_mark_type_used): New.
* ctfc.h (btf_mark_type_used): Declare it here.
* config/bpf/bpf.cc (bpf_option_override): Enable -gprune-btf
by default if -gbtf is enabled.
* config/bpf/core-builtins.cc (extra_fn): New typedef.
(compute_field_expr): Add callback parameter, and call it if supplied.
Fix computation for MEM_REF.
(mark_component_type_as_used): New.
(bpf_mark_types_as_used): Likewise.
(bpf_expand_core_builtin): Call here.
* doc/invoke.texi (Debugging Options): Note that -gprune-btf is
enabled by default for BPF target when generating BTF.

gcc/testsuite/
* gcc.dg/debug/btf/btf-variables-5.c: Adjust one test for bpf-*-*
target.

btf: add -gprune-btf option

This patch adds a new option, -gprune-btf, to control BTF debug info
generation.

As the name implies, this option enables a kind of "pruning" of the BTF
information before it is emitted.  When enabled, rather than emitting
all type information translated from DWARF, only information for types
directly used in the source program is emitted.

The primary purpose of this pruning is to reduce the amount of
unnecessary BTF information emitted, especially for BPF programs.  It is
very common for BPF programs to include Linux kernel internal headers in
order to have access to kernel data structures.  However, doing so often
has the side effect of also adding type definitions for a large number
of types which are not actually used by nor relevant to the program.
In these cases, -gprune-btf commonly reduces the size of the resulting
BTF information by 10x or more, as seen on average when compiling Linux
kernel BPF selftests.  This both slims down the size of the resulting
object and reduces the time required by the BPF loader to verify the
program and its BTF information.

Note that the pruning implemented in this patch follows the same rules
as the BTF pruning performed unconditionally by LLVM's BPF backend when
generating BTF.  In particular, the main sources of pruning are:

  1) Only generate BTF for types used by variables and functions at the
     file scope.

     Note that which variables are known to be "used" may differ
     slightly between LTO and non-LTO builds due to optimizations.  For
     non-LTO builds (and always for the BPF target), variables which are
     optimized away during compilation are considered to be unused, and
     they (along with their types) are pruned.  For LTO builds, such
     variables are not known to be optimized away by the time pruning
     occurs, so VAR records for them and information for their types may
     be present in the emitted BTF information.  This is a missed
     optimization that may be fixed in the future.

  2) Avoid emitting full BTF for struct and union types which are only
     pointed-to by members of other struct/union types.  In these cases,
     the full BTF_KIND_STRUCT or BTF_KIND_UNION which would normally
     be emitted is replaced with a BTF_KIND_FWD, as though the
     underlying type was a forward-declared struct or union type.
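
As a hypothetical illustration of the two rules (the type and variable
names are made up, and the expected BTF is inferred from the
description above rather than from compiler output), consider:

```cpp
#include <cassert>

// Written in the C subset, as a BPF program would be.
struct inner { int a; int b; };       // only reached via a pointer member
struct outer { struct inner *p; int n; };
struct unused_t { long pad[4]; };     // used by no variable or function

struct outer global_state;            // file-scope variable using `outer'

int get_n (void) { return global_state.n; }
```

Under rule 1, struct outer is kept because global_state uses it, while
struct unused_t would be expected to be dropped.  Under rule 2, struct
inner would be expected to appear as a BTF_KIND_FWD rather than a full
BTF_KIND_STRUCT.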

gcc/
* btfout.cc (btf_used_types): New hash set.
(struct btf_fixup): New.
(fixups, forwards): New vecs.
(btf_output): Calculate num_types depending on debug_prune_btf.
(btf_early_finish): New initialization for debug_prune_btf.
(btf_add_used_type): New function.
(btf_used_type_list_cb): Likewise.
(btf_collect_pruned_types): Likewise.
(btf_add_vars): Handle special case for variables in ".maps" section
when generating BTF for BPF CO-RE target.
(btf_late_finish): Use btf_collect_pruned_types when debug_prune_btf
is in effect.  Move some initialization to btf_early_finish.
(btf_finalize): Additional deallocation for debug_prune_btf.
* common.opt (gprune-btf): New flag.
* ctfc.cc (init_ctf_strtable): Make non-static.
* ctfc.h (init_ctf_strtable, ctfc_delete_strtab): Make extern.
* doc/invoke.texi (Debugging Options): Document -gprune-btf.

gcc/testsuite/
* gcc.dg/debug/btf/btf-prune-1.c: New test.
* gcc.dg/debug/btf/btf-prune-2.c: Likewise.
* gcc.dg/debug/btf/btf-prune-3.c: Likewise.
* gcc.dg/debug/btf/btf-prune-maps.c: Likewise.

btf: refactor and simplify implementation

This patch heavily refactors btfout.cc to take advantage of the
structural changes in the prior commits.

Now that inter-type references are internally stored as simply pointers,
all the painful, brittle, confusing infrastructure that was used in the
process of converting CTF type IDs to BTF type IDs can be thrown out.
This greatly simplifies the entire process of converting from CTF to
BTF, making the code cleaner, easier to read, and easier to maintain.

In addition, we no longer need to worry about destructive changes in
internal data structures used commonly by CTF and BTF, which allows
deleting several ancillary data structures previously used in btfout.cc.

This is nearly transparent, but a few improvements have also been made:

1) BTF_KIND_FUNC records are now _always_ constructed at early_finish,
    allowing us to construct records even for functions which are later
    inlined by optimizations. DATASEC entries for functions are only
    constructed at late_finish, to avoid incorrectly generating entries
    for functions which get inlined.

2) BTF_KIND_VAR records and DATASEC entries for them are now always
    constructed at (late) finish, which avoids cases where we could
    incorrectly create records for variables which were completely
    optimized away.  This fixes PR debug/113566 for non-LTO builds.
    In LTO builds, BTF must be emitted at early_finish, so some VAR
    records may be emitted for variables which are later optimized away.

3) Some additional assembler comments have been added with more
    information for debugging.

gcc/
* btfout.cc (struct btf_datasec_entry): New.
(struct btf_datasec): Add `id' member.  Change `entries' to use
new struct btf_datasec_entry.
(func_map): New hash_map.
(max_translated_id): New.
(btf_var_ids, btf_id_map, holes, voids, num_vars_added)
(num_types_added, num_types_created): Delete.
(btf_absolute_var_id, btf_relative_var_id, btf_absolute_func_id)
(btf_relative_func_id, btf_absolute_datasec_id, init_btf_id_map)
(get_btf_id, set_btf_id, btf_emit_id_p): Delete.
(btf_removed_type_p): Delete.
(btf_dtd_kind, btf_emit_type_p): New helpers.
(btf_fwd_to_enum_p, btf_calc_num_vbytes): Use them.
(btf_collect_datasec): Delete.
(btf_dtd_postprocess_cb, btf_dvd_emit_preprocess_cb)
(btf_dtd_emit_preprocess_cb, btf_emit_preprocess): Delete.
(btf_dmd_representable_bitfield_p): Adapt to type reference changes
and delete now-unused ctfc argument.
(btf_asm_datasec_type_ref): Delete.
(btf_asm_type_ref): Adapt to type reference changes, simplify.
(btf_asm_type): Likewise. Mark struct/union types with bitfield
members.
(btf_asm_array): Adapt to data structure changes.
(btf_asm_varent): Likewise.
(btf_asm_sou_member): Likewise. Ensure non-bitfield members are
correctly re-encoded if struct or union contains any bitfield.
(btf_asm_func_arg, btf_asm_func_type, btf_asm_datasec_entry)
(btf_asm_datasec_type): Adapt to data structure changes.
(output_btf_header): Adapt to other changes, simplify type
length calculation, add info to assembler comments.
(output_btf_vars): Adapt to other changes.
(output_btf_strs): Fix overlong lines.
(output_asm_btf_sou_fields, output_asm_btf_enum_list)
(output_asm_btf_func_args_list, output_asm_btf_vlen_bytes)
(output_asm_btf_type, output_btf_types, output_btf_func_types)
(output_btf_datasec_types): Adapt to other changes.
(btf_init_postprocess): Delete.
(btf_output): Change to only perform output.
(btf_add_const_void, btf_add_func_records): New.
(btf_early_finish): Use them here. New.
(btf_datasec_push_entry): Adapt to data structure changes.
(btf_datasec_add_func, btf_datasec_add_var): New.
(btf_add_func_datasec_entries): New.
(btf_emit_variable_p): New helper.
(btf_add_vars): Use it here. New.
(btf_type_list_cb, btf_collect_translated_types): New.
(btf_assign_func_ids, btf_late_assign_var_ids)
(btf_assign_datasec_ids): New.
(btf_finish): Remove unused argument. Call new btf_late*
functions and btf_output.
(btf_finalize): Adapt to data structure changes.
* ctfc.h (struct ctf_dtdef): Convert existing boolean flags to
BOOL_BITFIELD and reorder.
(struct ctf_dvdef): Add dvd_id member.
(btf_finish): Remove argument from prototype.
(get_btf_id): Delete prototype.
(funcs_traverse_callback, traverse_btf_func_types): Add an
explanatory comment.
* dwarf2ctf.cc (ctf_debug_finish): Remove unused argument.
* dwarf2ctf.h: Analogous change.
* dwarf2out.cc: Likewise.

ctf: use pointers instead of IDs internally

This patch replaces all inter-type references in the ctfc internal data
structures with pointers, rather than the references-by-ID which were
used previously.

A couple of small updates in the BPF backend are included to make it
compatible with the change.

This change is only to the in-memory representation of various CTF
structures to make them easier to work with in various cases. It is
outwardly transparent; there is no change in emitted CTF.

gcc/
* btfout.cc (BTF_VOID_TYPEID, BTF_INIT_TYPEID): Move defines to
include/btf.h.
(btf_dvd_emit_preprocess_cb, btf_emit_preprocess)
(btf_dmd_representable_bitfield_p, btf_asm_array, btf_asm_varent)
(btf_asm_sou_member, btf_asm_func_arg, btf_init_postprocess):
Adapt to structural changes in ctf_* structs.
* ctfc.h (struct ctf_dtdef): Add forward declaration.
(ctf_dtdef_t, ctf_dtdef_ref): Move typedefs earlier.
(struct ctf_arinfo, struct ctf_funcinfo, struct ctf_sliceinfo)
(struct ctf_itype, struct ctf_dmdef, struct ctf_func_arg)
(struct ctf_dvdef): Use pointers instead of type IDs for
references to other types and use typedefs where appropriate.
(struct ctf_dtdef): Add ref_type member.
(ctf_type_exists): Use pointer instead of type ID.
(ctf_add_reftype, ctf_add_enum, ctf_add_slice, ctf_add_float)
(ctf_add_integer, ctf_add_unknown, ctf_add_pointer)
(ctf_add_array, ctf_add_forward, ctf_add_typedef)
(ctf_add_function, ctf_add_sou, ctf_add_enumerator)
(ctf_add_variable): Likewise. Return pointer instead of ID.
(ctf_lookup_tree_type): Return pointer to type instead of ID.
* ctfc.cc: Analogous changes.
* ctfout.cc (ctf_asm_type, ctf_asm_slice, ctf_asm_varent)
(ctf_asm_sou_lmember, ctf_asm_sou_member, ctf_asm_func_arg)
(output_ctf_objt_info): Adapt to changes.
* dwarf2ctf.cc (gen_ctf_type, gen_ctf_void_type)
(gen_ctf_unknown_type, gen_ctf_base_type, gen_ctf_pointer_type)
(gen_ctf_subrange_type, gen_ctf_array_type, gen_ctf_typedef)
(gen_ctf_modifier_type, gen_ctf_sou_type, gen_ctf_function_type)
(gen_ctf_enumeration_type, gen_ctf_variable, gen_ctf_function)
(gen_ctf_type, ctf_do_die): Likewise.
* config/bpf/btfext-out.cc (struct btf_ext_core_reloc): Use
pointer instead of type ID.
(bpf_core_reloc_add, bpf_core_get_sou_member_index)
(output_btfext_core_sections): Adapt to above changes.
* config/bpf/core-builtins.cc (process_type): Likewise.

include/
* btf.h (BTF_VOID_TYPEID, BTF_INIT_TYPEID): Move defines here,
from gcc/btfout.cc.

ctf, btf: restructure CTF/BTF emission

This commit makes some structural changes to the CTF/BTF debug info
emission.  In particular:

a) CTF is now always fully generated and emitted before any
    BTF-related procedures are run.  This means that BTF-related
    functions can change, even irreversibly, the shared in-memory
    representation used by the two formats without issue.

b) BTF generation has fewer entry points, and is cleanly divided
    into early_finish and finish.

c) BTF is now always emitted at finish (called from dwarf2out_finish),
    for all targets in non-LTO builds, rather than being emitted at
    early_finish for targets other than BPF CO-RE.  In LTO builds,
    BTF is emitted at early_finish as before.

    Note that this change alone does not alter the contents of BTF at
    all, regardless of whether it would have previously been emitted at
    early_finish or finish, because the calculation of the BTF to be
    emitted is not moved by this patch, only the write-out.

The changes are transparent to both CTF and BTF emission.

gcc/
* btfout.cc (btf_init_postprocess): Rename to...
(btf_early_finish): ...this.
(btf_output): Rename to...
(btf_finish): ...this.
* ctfc.h: Analogous changes.
* dwarf2ctf.cc (ctf_debug_early_finish): Conditionally call
btf_early_finish, or ctf_finalize as appropriate.  Emit BTF
here for LTO builds.
(ctf_debug_finish): Always call btf_finish here if generating
BTF info in non-LTO builds.
(ctf_debug_finalize, ctf_debug_init_postprocess): Delete.
* dwarf2out.cc (dwarf2out_early_finish): Remove call to
ctf_debug_init_postprocess.

Arm: Fix disassembly error in Thumb-1 relaxed load/store [PR115188]

A Thumb-1 memory operand allows single-register LDMIA/STMIA. This doesn't get
printed as LDR/STR with writeback in unified syntax, resulting in strange
assembler errors if writeback is selected. To work around this, use the 'Uw'
constraint that blocks writeback. Also use a new 'mem_and_no_t1_wback_op'
which is a general memory operand that disallows writeback in Thumb-1.
A few other patterns were using 'm' for Thumb-1 in a similar way, update these
to also use 'mem_and_no_t1_wback_op' and 'Uw'.

gcc:
PR target/115188
* config/arm/arm.md (unaligned_loadsi): Use 'Uw' constraint and
'mem_and_no_t1_wback_op'.
(unaligned_loadhiu): Likewise.
(unaligned_storesi): Likewise.
(unaligned_storehi): Likewise.
* config/arm/predicates.md (mem_and_no_t1_wback_op): Add new predicate.
* config/arm/sync.md (arm_atomic_load<mode>): Use 'Uw' constraint.
(arm_atomic_store<mode>): Likewise.

gcc/testsuite:
PR target/115188
* gcc.target/arm/pr115188.c: Add new test.

build: Fix "make install" for MinGW

Since r8-4925, the "make install" recipe generates a path which can start
with "//", causing problems for some Windows environments. Fix by removing
the redundant slash.

gcc/cp/ChangeLog:

* Make-lang.in: Remove redundant slash.

gcc: docs: Fix documentation of two hooks

The `function_attribute_inlinable_p` hook documentation described it
returning the value if it is OK to inline the provided fndecl into "the
current function".  AFAICS this hook is only called when
`current_function_decl` is the same as the `fndecl` argument that the
hook is given, hence asking whether `fndecl` can be inlined into "the
current function" doesn't seem relevant.  Moreover from what I see no
existing implementation of `function_attribute_inlinable_p` uses "the
current function" in any way.
Update the documentation to match this understanding.

The `unspec_may_trap_p` documentation mentioned applying to either
`unspec` or `unspec_volatile`.  AFAICS this hook is only used for
`unspec` codes since c84a808e493a, so I removed the mention of
`unspec_volatile`.

gcc/ChangeLog:

* doc/tm.texi: Regenerated.
* target.def (function_attribute_inlinable_p,
unspec_may_trap_p): Update documentation.

tree-optimization/115741 - ICE with VMAT_CONTIGUOUS_REVERSE and gap

When we determine overrun we have to consider VMAT_CONTIGUOUS_REVERSE
the same as VMAT_CONTIGUOUS.

PR tree-optimization/115741
* tree-vect-stmts.cc (get_group_load_store_type): Also
handle VMAT_CONTIGUOUS_REVERSE when determining overrun.

ada: Use static allocation for small dynamic string concatenations in more cases

This lifts the limitation of the original implementation whereby the first
operand of the concatenation needs to have a length known at compile time
in order for the static allocation to be used.

gcc/ada/

* exp_ch4.adb (Expand_Concatenate): In the case where an operand
does not have both bounds known at compile time, use nevertheless
the low bound directly if it is known at compile time.
Fold the conditional expression giving the low bound of the result
in the general case if the low bounds of all the operands are equal.

ada: Fix generic renaming table low bound on reset

gcc/ada/

* sem_ch12.adb (Save_And_Reset): Fix value of low bound used to
reset table.

ada: Compiler accepts an illegal Unchecked_Access attribute reference

The compiler incorrectly accepts Some_Object'Unchecked_Access'Image.

gcc/ada/

* sem_attr.adb
(Analyze_Image_Attribute.Check_Image_Type): Check for
E_Access_Attribute_Type prefix type.

ada: Use clause (or use type clause) in a protected operation sometimes ignored.

In some cases, a use clause (or a use type clause) occurring within a
protected operation is incorrectly ignored.

gcc/ada/

* exp_ch9.adb
(Expand_N_Protected_Body): Declare new procedure
Unanalyze_Use_Clauses and call it before analyzing the newly
constructed subprogram body.

ada: Put_Image aspect spec ignored for null extension.

If type T1 is a tagged null record with a Put_Image aspect specification
and type T2 is a null extension of T1 (with no aspect specifications), then
evaluation of a T2'Image call should include a call to the specified procedure
(as opposed to yielding "(NULL RECORD)").

gcc/ada/

* exp_put_image.adb
(Build_Record_Put_Image_Procedure): Declare new Boolean-valued
function Null_Record_Default_Implementation_OK; call it as part of
deciding whether to generate "(NULL RECORD)" text.

ada: Allow mutably tagged types to work with qualified expressions

This patch modifies the experimental 'Size'Class feature such that objects of
mutably tagged types can be assigned qualified expressions featuring a
definite type (e.g. Mutable_Obj := Root_Child_T'(Root_T with others => <>)).

gcc/ada/

* sem_ch5.adb:
(Analyze_Assignment): Add special expansion for qualified expressions
in certain cases dealing with mutably tagged types.

ada: Bug box for expression function with list comprehension

GNAT crashes on an iterator with a filter inside an expression function
that is the completion of an earlier spec.

gcc/ada/

* freeze.adb (Freeze_Type_Refs): If Node is in N_Has_Etype,
check that it has had its Etype set, because this can be
called early for expression functions that are completions.

ada: Call memcmp instead of Compare_Array_Unsigned_8 and...

... implement support for ordering comparisons of discrete array types.

This extends the Support_Composite_Compare_On_Target feature to ordering
comparisons of discrete array types as specified by RM 4.5.2(26/3), when
the component type is a byte (unsigned).

Implement support for ordering comparisons of discrete array types
with a two-pronged approach: for types with a size known at compile time,
this lets the gimplifier generate the call to memcmp (or else an optimized
version of it); otherwise, this directly generates the call to memcmp.
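
For illustration only, here is a C++ sketch (with a hypothetical helper
name, not the gcc-interface code) of why memcmp fits a lexicographic
ordering comparison when the components are unsigned bytes: memcmp
compares bytes as unsigned char, so a result on the common prefix plus
a length tie-break gives the ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: lexicographic "<" for arrays of unsigned bytes.
bool bytes_less (const unsigned char *a, std::size_t alen,
                 const unsigned char *b, std::size_t blen)
{
  // Compare the common prefix; memcmp treats each byte as unsigned char.
  int c = std::memcmp (a, b, std::min (alen, blen));
  if (c != 0)
    return c < 0;
  // Equal prefix: the shorter array orders first.
  return alen < blen;
}
```

With signed components a byte such as 200 would need to order before
3, which memcmp does not do; this sketch only covers the unsigned case.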

gcc/ada/

* exp_ch4.adb (Expand_Array_Comparison): Remove the obsolete byte
addressability test. If Support_Composite_Compare_On_Target is true,
immediately return for a component size of 8, an unsigned component
type and aligned operands. Disable when Unnest_Subprogram_Mode is
true (for LLVM).
(Expand_N_Op_Eq): Adjust comment.
* targparm.ads (Support_Composite_Compare_On_Target): Replace bit by
byte in description and document support for ordering comparisons.
* gcc-interface/utils2.cc (compare_arrays): Rename into...
(compare_arrays_for_equality): ...this. Remove redundant lines.
(compare_arrays_for_ordering): New function.
(build_binary_op) <comparisons>: Call compare_arrays_for_ordering
to implement ordering comparisons for arrays.