git.ipfire.org Git - thirdparty/gcc.git/log

arm: vld1q_types_x3 ACLE intrinsics

This patch is part of a series of patches implementing the _xN
variants of the vld1q intrinsic for the arm port. This patch adds the
_x3 variants of the vld1q intrinsic.

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1q_u8_x3, vld1q_u16_x3, vld1q_u32_x3, vld1q_u64_x3): New.
(vld1q_s8_x3, vld1q_s16_x3, vld1q_s32_x3, vld1q_s64_x3): New.
(vld1q_f16_x3, vld1q_f32_x3): New.
(vld1q_p8_x3, vld1q_p16_x3, vld1q_p64_x3): New.
(vld1q_bf16_x3): New.
* config/arm/arm_neon_builtins.def (vld1_x3): New entries.
* config/arm/neon.md
(neon_vld1_x3<mode>): New.
(neon_vld1x3qa<mode>, neon_vld1x3qb<mode>): New.
* config/arm/unspecs.md
(UNSPEC_VLD1X3A, UNSPEC_VLD1X3B): New.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1q_base_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1q_bf16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1q_fp16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1q_p64_xN_1.c: Add new tests.

arm: vld1q_types_x2 ACLE intrinsics

This patch is part of a series of patches implementing the _xN
variants of the vld1q intrinsic for the arm port. This patch adds the
_x2 variants of the vld1q intrinsic.

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1q_u8_x2, vld1q_u16_x2, vld1q_u32_x2, vld1q_u64_x2): New.
(vld1q_s8_x2, vld1q_s16_x2, vld1q_s32_x2, vld1q_s64_x2): New.
(vld1q_f16_x2, vld1q_f32_x2): New.
(vld1q_p8_x2, vld1q_p16_x2, vld1q_p64_x2): New.
(vld1q_bf16_x2): New.
* config/arm/arm_neon_builtins.def (vld1_x2): New entries.
* config/arm/neon.md (vld1_x2<mode>): New.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1q_base_xN_1.c: Add new test.
* gcc.target/arm/simd/vld1q_bf16_xN_1.c: Add new test.
* gcc.target/arm/simd/vld1q_fp16_xN_1.c: Add new test.
* gcc.target/arm/simd/vld1q_p64_xN_1.c: Add new test.

c: Avoid _BitInt indexes > sizetype in ARRAY_REFs [PR113315]

When build_array_ref doesn't use ARRAY_REF, it casts the index to sizetype
already, performs POINTER_PLUS_EXPR and then dereferences.
While when emitting ARRAY_REF, we try to keep index expression as is in
whatever type it had, which is reasonable e.g. for signed or unsigned types
narrower than sizetype for loop optimizations etc.
But if the index is wider than sizetype, we are unnecessarily computing
bits beyond what is needed. For {,unsigned }__int128 on 64-bit arches
or {,unsigned }long long on 32-bit arches we've been doing that for decades,
so the following patch doesn't propose to change that (might be stage1
material), but for _BitInt at least the _BitInt lowering code doesn't expect
to see large/huge _BitInt in the ARRAY_REF indexes, I was expecting one
would see just casts of those to sizetype.

So, the following patch makes sure that large/huge _BitInt indexes don't
appear in ARRAY_REFs.

2024-01-12 Jakub Jelinek <jakub@redhat.com>

PR c/113315
* c-typeck.cc (build_array_ref): If index has BITINT_TYPE type with
precision larger than sizetype precision, convert it to sizetype.

* gcc.dg/bitint-65.c: New test.
* gcc.dg/bitint-66.c: New test.

testsuite: Make bitint early vect test more accurate

This changes the tests I committed for PR113287 to also
run on targets that don't support bitint.

gcc/ChangeLog:

PR tree-optimization/113287
* doc/sourcebuild.texi (check_effective_target_bitint65535): New.

gcc/testsuite/ChangeLog:

PR tree-optimization/113287
* gcc.dg/vect/vect-early-break_100-pr113287.c: Support non-bitint.
* gcc.dg/vect/vect-early-break_99-pr113287.c: Likewise.
* lib/target-supports.exp (bitint, bitint128, bitint575, bitint65535):
Document them.

middle-end: remove more usages of single_exit

This replaces two more usages of single_exit that I had missed before.
They both seem to happen when we re-use the ifcvt scalar loop for versioning.

The condition in versioning is the same as the one for when we don't re-use the
scalar loop.

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_loop_versioning): Replace single_exit.
* tree-vect-loop.cc (vect_transform_loop): Likewise.

middle-end: fill in reduction PHI for all alt exits [PR113178]

When we have a loop with more than 2 exits and a reduction I forgot to fill in
the PHI value for all alternate exits.

All alternate exits use the same PHI value so we should loop over the new
PHI elements and copy the value across since we call the reduction calculation
code only once for all exits. This was normally covered up by earlier parts of
the compiler rejecting loops incorrectly (which has been fixed now).

Note that while I can use the loop in all cases, the reason I separated out the
main and alt exit is so that if you pass the wrong edge the macro will assert.

gcc/ChangeLog:

PR tree-optimization/113178
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Fill in all
alternate exits.

gcc/testsuite/ChangeLog:

PR tree-optimization/113178
* gcc.dg/vect/vect-early-break_101-pr113178.c: New test.
* gcc.dg/vect/vect-early-break_102-pr113178.c: New test.

middle-end: thread through existing LCSSA variable for alternative exits too [PR113237]

Builing on top of the previous patch, similar to when we have a single exit if
we have a case where all exits are considered early exits and there are existing
non virtual phi then in order to maintain LCSSA we have to use the existing PHI
variables. We can't simply clear them and just rebuild them because the order
of the PHIs in the main exit must match the original exit for when we add the
skip_epilog guard.

But the infrastructure is already in place to maintain them, we just have to use
the right value.

gcc/ChangeLog:

PR tree-optimization/113237
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): Use
existing LCSSA variable for exit when all exits are early break.

gcc/testsuite/ChangeLog:

PR tree-optimization/113237
* gcc.dg/vect/vect-early-break_98-pr113237.c: New test.

middle-end: maintain LCSSA form when peeled vector iterations have virtual operands

This patch fixes several interconnected issues.

1. When picking an exit we wanted to check for niter_desc.may_be_zero not true.
   i.e. we want to pick an exit which we know will iterate at least once.
   However niter_desc.may_be_zero is not a boolean.  It is a tree that encodes
   a boolean value.  !niter_desc.may_be_zero is just checking if we have some
   information, not what the information is.  This leads us to pick a more
   difficult to vectorize exit more often than we should.

2. Because we had this bug, we used to pick an alternative exit much more ofthen
   which showed one issue, when the loop accesses memory and we "invert it" we
   would corrupt the VUSE chain.  This is because on an peeled vector iteration
   every exit restarts the loop (i.e. they're all early) BUT since we may have
   performed a store, the vUSE would need to be updated.  This version maintains
   virtual PHIs correctly in these cases.   Note that we can't simply remove all
   of them and recreate them because we need the PHI nodes still in the right
   order for if skip_vector.

3. Since we're moving the stores to a safe location I don't think we actually
   need to analyze whether the store is in range of the memref,  because if we
   ever get there, we know that the loads must be in range, and if the loads are
   in range and we get to the store we know the early breaks were not taken and
   so the scalar loop would have done the VF stores too.

4. Instead of searching for where to move stores to, they should always be in
   exit belonging to the latch.  We can only ever delay stores and even if we
   pick a different exit than the latch one as the main one, effects still
   happen in program order when vectorized.  If we don't move the stores to the
   latch exit but instead to whever we pick as the "main" exit then we can
   perform incorrect memory accesses (luckily these are trapped by verify_ssa).

5. We only used to analyze loads inside the same BB as an early break, and also
   we'd never analyze the ones inside the block where we'd be moving memory
   references to.  This is obviously bogus and to fix it this patch splits apart
   the two constraints.  We first validate that all load memory references are
   in bounds and only after that do we perform the alias checks for the writes.
   This makes the code simpler to understand and more trivially correct.

gcc/ChangeLog:

PR tree-optimization/113137
PR tree-optimization/113136
PR tree-optimization/113172
PR tree-optimization/113178
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
Maintain PHIs on inverted loops.
(vect_do_peeling): Maintain virtual PHIs on inverted loops.
* tree-vect-loop.cc (vec_init_loop_exit_info): Pick exit closes to
latch.
(vect_create_loop_vinfo): Record all conds instead of only alt ones.

gcc/testsuite/ChangeLog:

PR tree-optimization/113137
PR tree-optimization/113136
PR tree-optimization/113172
PR tree-optimization/113178
* g++.dg/vect/vect-early-break_4-pr113137.cc: New test.
* g++.dg/vect/vect-early-break_5-pr113137.cc: New test.
* gcc.dg/vect/vect-early-break_95-pr113137.c: New test.
* gcc.dg/vect/vect-early-break_96-pr113136.c: New test.
* gcc.dg/vect/vect-early-break_97-pr113172.c: New test.

middle-end: make memory analysis for early break more deterministic [PR113135]

Instead of searching for where to move stores to, they should always be in
exit belonging to the latch.  We can only ever delay stores and even if we
pick a different exit than the latch one as the main one, effects still
happen in program order when vectorized.  If we don't move the stores to the
latch exit but instead to whever we pick as the "main" exit then we can
perform incorrect memory accesses (luckily these are trapped by verify_ssa).

We used to iterate over the conds and check the loads and stores inside them.
However this relies on the conds being ordered in program order.  Additionally
if there is a basic block between two conds we would not have analyzed it.

Instead this now walks from the preds of the destination basic block up to the
loop header and analyzes every block along the way.  As a later optimization we
could stop as soon as we've seen all the BBs we have conds for.  For now the
header will always contain the first cond, but this can change when we support
arbitrary control flow.

gcc/ChangeLog:

PR tree-optimization/113135
* tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Rework
dependency analysis.

gcc/testsuite/ChangeLog:

PR tree-optimization/113135
* gcc.dg/vect/vect-early-break_103-pr113135.c: New test.

c++: cand_parms_match and reversed candidates

When considering whether the candidate parameters match, according to the
language we're considering the synthesized reversed candidate, so we should
compare the parameters in swapped order.  In this situation it doesn't make
sense to consider whether object parameters correspond, since we're
comparing an object parameter to a non-object parameter, so I generalized
xobj_iobj_parameters_correspond accordingly.

As I refine cand_parms_match, more behaviors need to differ between its
original use to compare the original templates for two candidates, and the
later use to decide whether to compare constraints.  So now there's a
parameter to select between the semantics.

gcc/cp/ChangeLog:

* call.cc (reversed_match): New.
(enum class pmatch): New enum.
(cand_parms_match): Add match_kind parm.
(object_parms_correspond): Add fn parms.
(joust): Adjust.
* class.cc (xobj_iobj_parameters_correspond): Rename to...
(iobj_parm_corresponds_to): ...this.  Take the other
type instead of a second function.
(object_parms_correspond): Adjust.
* cp-tree.h (iobj_parm_corresponds_to): Declare.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-memfun4.C: Change expected
reversed handling.

Objective-C, Darwin: Fix a regression in handling bad receivers.

This is seen on 32b hosts with a 64b multilib, and is an ICE when
the build has checking enabled. The fix is to exit the routine
early if the sender or receiver are already error_mark_node.

gcc/objc/ChangeLog:

* objc-next-runtime-abi-02.cc
(build_v2_objc_method_fixup_call): Early exit for cases
where the sender or receiver are known to be in error.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

Darwin, powerpc: Fix bootstrap.

Recent changes to the member names of the diagnostics class missed one case in
the Darwin PowerPC host code. Fixed thus.

gcc/ChangeLog:

* config/rs6000/host-darwin.cc (segv_handler): Use the revised
diagnostics class member name for abort of error.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

AVR: Work around "sequence of 3 consecutive punctuation characters" warning.

gcc/
* config/avr/avr.cc (avr_handle_addr_attribute): Move "..." from
format string to %s argument.

Bump BASE-VER to 14.0.1 now that we are in stage4.

* BASE-VER: Bump to 14.0.1.

varasm: Fix up process_pending_assemble_externals [PR113182]

John reported that on HP-UX we no longer emit needed external libcalls.

The problem is that we didn't strip name encoding when looking up
the identifiers in assemble_external_libcall and
process_pending_assemble_externals, while
assemble_name_resolve does that:
  const char *real_name = targetm.strip_name_encoding (name);
  tree id = maybe_get_identifier (real_name);

  if (id)
    {
...
      mark_referenced (id);
The intention is that assemble_external_libcall ensures the IDENTIFIER
exists for the external libcall, then for actually emitted calls
assemble_name_resolve sees those IDENTIFIERS and sets TREE_SYMBOL_REFERENCED
on them and finally process_pending_assemble_externals looks the
IDENTIFIER up again and checks its TREE_SYMBOL_REFERENCED.

But without the strip_name_encoding call, they can look up different
identifiers and those are likely never used.

In the PR, John was discussing whether get_identifier or
maybe_get_identifier should be used, I believe in assemble_external_libcall
we definitely want to use get_identifier, we need an IDENTIFIER allocated
so that it can be actually tracked, in process_pending_assemble_externals
it doesn't matter, the IDENTIFIER should be already created.

2024-01-12  John David Anglin  <danglin@gcc.gnu.org>
    Jakub Jelinek  <jakub@redhat.com>

PR middle-end/113182
* varasm.cc (process_pending_assemble_externals,
assemble_external_libcall): Use targetm.strip_name_encoding
before calling get_identifier.

aarch64: Rework uxtl->zip optimisation [PR113196]

g:f26f92b534f9 implemented unsigned extensions using ZIPs rather than
UXTL{,2}, since the former has a higher throughput than the latter on
amny cores.  The optimisation worked by lowering directly to ZIP during
expand, so that the zero input could be hoisted and shared.

However, changing to ZIP means that zero extensions no longer benefit
from some existing combine patterns.  The patch included new patterns
for UADDW and USUBW, but the PR shows that other patterns were affected
as well.

This patch instead introduces the ZIPs during a pre-reload split
and forcibly hoists the zero move to the outermost scope.  This has
the disadvantage of executing the move even for a shrink-wrapped
function, which I suppose could be a problem if it causes a kernel
to trap and enable Advanced SIMD unnecessarily.  In other circumstances,
an unused move shouldn't affect things much.

Also, the RA should be able to rematerialise the move at an
appropriate point if necessary, such as if there is an intervening
call.

In https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641948.html
I'd then tried to allow a zero to be recombined back into a solitary
ZIP.  However, that relied on late-combine, which didn't make it into
GCC 14.  This version instead restricts the split to cases where the
UXTL executes more frequently as the entry block (which is where we
plan to put the zero).

Also, the original optimisation contained a big-endian correction
that I don't think is needed/correct.  Even on big-endian targets,
we want the ZIP to take the low half of an element from the input
vector and the high half from the zero vector.  And the patterns
map directly to the underlying Advanced SIMD instructions: the use
of unspecs means that there's no need to adjust for the difference
between GCC and Arm lane numbering.

gcc/
PR target/113196
* config/aarch64/aarch64.h (machine_function::advsimd_zero_insn):
New member variable.
* config/aarch64/aarch64-protos.h (aarch64_split_simd_shift_p):
Declare.
* config/aarch64/iterators.md (Vnarrowq2): New mode attribute.
* config/aarch64/aarch64-simd.md
(vec_unpacku_hi_<mode>, vec_unpacks_hi_<mode>): Recombine into...
(vec_unpack<su>_hi_<mode>): ...this.  Move the generation of
zip2 for zero-extends to...
(aarch64_simd_vec_unpack<su>_hi_<mode>): ...a split of this
instruction.  Fix big-endian handling.
(vec_unpacku_lo_<mode>, vec_unpacks_lo_<mode>): Recombine into...
(vec_unpack<su>_lo_<mode>): ...this.  Move the generation of
zip1 for zero-extends to...
(<optab><Vnarrowq><mode>2): ...a split of this instruction.
Fix big-endian handling.
(*aarch64_zip1_uxtl): New pattern.
(aarch64_usubw<mode>_lo_zip, aarch64_uaddw<mode>_lo_zip): Delete
(aarch64_usubw<mode>_hi_zip, aarch64_uaddw<mode>_hi_zip): Likewise.
* config/aarch64/aarch64.cc (aarch64_get_shareable_reg): New function.
(aarch64_gen_shareable_zero): Use it.
(aarch64_split_simd_shift_p): New function.

gcc/testsuite/
PR target/113196
* gcc.target/aarch64/pr113196.c: New test.
* gcc.target/aarch64/simd/vmovl_high_1.c: Remove double include.
Expect uxtl2 rather than zip2.
* gcc.target/aarch64/vect_mixed_sizes_8.c: Expect zip1 rather
than uxtl.
* gcc.target/aarch64/vect_mixed_sizes_9.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_10.c: Likewise.

Keep track of the FUNCTION_BEG note

function.cc emits a NOTE_FUNCTION_BEG after all arguments have
been copied to pseudos.  It then records this note in parm_birth_insn.
Various other pieces of code use this insn as a convenient place to
insert things at the start of the function.

However, cfgexpand later changes parm_birth_insn as follows:

  /* If we emitted any instructions for setting up the variables,
     emit them before the FUNCTION_START note.  */
  if (var_seq)
    {
      emit_insn_before (var_seq, parm_birth_insn);

      /* In expand_function_end we'll insert the alloca save/restore
before parm_birth_insn.  We've just insertted an alloca call.
Adjust the pointer to match.  */
      parm_birth_insn = var_seq;
    }

But the FUNCTION_BEG note is still useful for things that aren't
sensitive to stack allocation, and it has the advantage that
(unlike the var_seq above) it is never deleted or combined.
This patch adds a separate variable to track it.

gcc/
* emit-rtl.h (rtl_data::x_function_beg_note): New member variable.
(function_beg_insn): New macro.
* function.cc (expand_function_start): Initialize function_beg_insn.

aarch64: Use a global map to detect duplicated overloads [PR112989]

As explained in the covering note to the previous patch,
the fact that aarch64-sve-* is now used for multiple header
files means that function_builder::add_overloaded_function
now needs to use a global map to detect duplicated overload
functions, instead of the member variable that it used previously.

gcc/
PR target/112989
* config/aarch64/aarch64-sve-builtins.h
(function_builder::m_overload_names): Replace with...
* config/aarch64/aarch64-sve-builtins.cc (overload_names): ...this
new global.
(add_overloaded_function): Update accordingly, using get_identifier
to get a GGC-friendly record of the name.

aarch64: Use a separate group for SME builtins [PR112989]

The PR shows that we were registering the same overloaded SVE
builtins twice.  This was supposed to be prevented by
function_builder::add_overloaded_function, which uses a map
to detect whether a function of the same name has already been
registered.  add_overloaded_function then had some asserts to
check for consistency.

However, the map that add_overloaded_function uses was a member of
function_builder itself.  That made sense when there was just one
header file, arm_sve.h, since it meant that the memory could be
reclaimed once arm_sve.h had been processed.  But now we have three
header files, and in principle, it's possible for arm_sme.h to include
overloads of things that arm_sve.h also defines.  We therefore need
to use a global map instead.

However, doing that meant that the consistency checks in
add_overloaded_function fired as expected, which showed some
latent issues.  This preliminary patch deals with those by adding
AARCH64_FL_SME to things that require AARCH64_FL_SME2.

This inconsistency led to another problem: functions were selected
for arm_sme.h over arm_sve.h based on whether they had AARCH64_FL_SME.
So some SME2-only things were actually defined in arm_sve.h, whereas
similar SME things were defined in arm_sme.h.

Choosing based on flags was an early get-started crutch that I forgot
to clean up later :(  This patch goes for the more direct approach of
having a separate table of SME builtins, as for arm_neon_sve_bridge.h.

aarch64-sve-builtins-sve2.def contains several intrinsics that are
currently SME-only but that operate entirely on vector registers.
Many of these will be extended to SVE2.1 once SVE2.1 support is added,
so the patch front-loads that by keeping the current division between
aarch64-sve-builtins-sve2.def (whose functions now go in arm_sve.h)
and aarch64-sve-builtins-sme.def (whose functions now go in arm_sme.h).

gcc/
PR target/112989
* config/aarch64/aarch64-sve-builtins.def: Don't include
aarch64-sve-builtins-sme.def.
(DEF_SME_ZA_FUNCTION_GS, DEF_SME_ZA_FUNCTION): Move to...
* config/aarch64/aarch64-sve-builtins-sme.def: ...here.
(DEF_SME_FUNCTION): New macro.  Use it and DEF_SME_FUNCTION_GS
instead of DEF_SVE_*.  Add AARCH64_FL_SME to anything that
requires AARCH64_FL_SME2.
* config/aarch64/aarch64-sve-builtins-sve2.def: Make same
AARCH64_FL_SME adjustment here.
* config/aarch64/aarch64-sve-builtins.cc (function_groups): Don't
include SME intrinsics.
(sme_function_groups): New array.
(handle_arm_sve_h): Remove check for AARCH64_FL_SME.
(handle_arm_sme_h): Use sme_function_groups instead of function_groups.

gcc/testsuite/
PR target/112989
* gcc.target/aarch64/sve/acle/general-c/clamp_1.c: Remove bogus
error test.

RISC-V: Adjust scalar_to_vec cost

1. Introduce vector regmove new tune info.
2. Adjust scalar_to_vec cost in add_stmt_cost.

We will get optimal codegen after this patch with -march=rv64gcv_zvl256b:

lui a5,%hi(a)
li a4,19
sb a4,%lo(a)(a5)
li a0,0
ret

Tested on both RV32/RV64 no regression, Ok for trunk ?

PR target/113281

gcc/ChangeLog:

* config/riscv/riscv-protos.h (struct regmove_vector_cost): New struct.
(struct cpu_vector_cost): Add regmove struct.
(get_vector_costs): Export as global.
* config/riscv/riscv-vector-costs.cc (adjust_stmt_cost): Adjust scalar_to_vec cost.
(costs::add_stmt_cost): Ditto.
* config/riscv/riscv.cc (get_common_costs): Export global function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr113209.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/pr113281-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/pr113281-2.c: New test.

lower-bitint: Fix up handling of unsigned INTEGER_CSTs operands with lots of 1s in the upper bits [PR113334]

For INTEGER_CST operands, the code decides if it should emit the whole
INTEGER_CST into memory, or if there are enough upper bits either all 0s
or all 1s to warrant an optimization, where we use memory for lower limbs
or even just an INTEGER_CST for least significant limb and fill in the
rest of limbs with 0s or 1s.  Unfortunately when not using
bitint_min_cst_precision, the code was using tree_int_cst_sgn (op) < 0
to determine whether to fill in the upper bits with 1s or 0s.  That is
incorrect for TYPE_UNSIGNED INTEGER_CSTs which have higher limbs full of
ones, we really want to check here whether the most significant bit is
set or clear.

Fixed thusly.

2024-01-12  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/113334
* gimple-lower-bitint.cc (bitint_large_huge::handle_operand): Use
wi::neg_p (wi::to_wide (op)) instead of tree_int_cst_sgn (op) < 0
to determine if number should be extended by all ones rather than zero
extended.

* gcc.dg/torture/bitint-46.c: New test.

sra: Punt for too large _BitInt accesses [PR113330]

This is the case I was talking about in
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642423.html
and Zdenek kindly found a testcase for it.
We can only create BITINT_TYPE with precision at most 65535, not 65536,
so need to punt if we'd want to create it.

2024-01-12 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/113330
* tree-sra.cc (create_access): Punt for BITINT_TYPE accesses with
too large size.

* gcc.dg/bitint-69.c: New test.

lower-bitint: Fix a typo in a condition [PR113323]

The following testcase revealed a typo in condition, as the comment
says the intent is
       /*  If lhs of stmt is large/huge _BitInt SSA_NAME not in m_names
           it means it will be handled in a loop or straight line code
           at the location of its (ultimate) immediate use, so for
           vop checking purposes check these only at the ultimate
           immediate use.  */
but the condition was using != BITINT_TYPE rather than == BITINT_TYPE,
so e.g. it used bitint_precision_kind on non-BITINT_TYPEs (e.g. on vector
types it will crash because TYPE_PRECISION means something different there,
or on say INTEGER_TYPEs the precision will never be large enough to be >=
bitint_prec_large).

The following patch fixes that.

2024-01-12  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/113323
* gimple-lower-bitint.cc (bitint_dom_walker::before_dom_children): Fix
check for lhs being large/huge _BitInt not in m_names.

* gcc.dg/bitint-68.c: New test.

lower-bitint: Fix up handling of uninitialized large/huge _BitInt call arguments [PR113316]

The code to assign large/huge _BitInt SSA_NAMEs to partitions intentionally
ignores uninitialized SSA_NAMEs:
          /* Also ignore uninitialized uses.  */
          if (SSA_NAME_IS_DEFAULT_DEF (s)
              && (!SSA_NAME_VAR (s) || VAR_P (SSA_NAME_VAR (s))))
            continue;
because there is no need to store them into memory, all we need is when
trying to extract some limb from them use uninitialized SSA_NAME for the
limb.

The following testcase shows this is a problem for call arguments though,
for those we need to create replacement SSA_NAMEs which are loaded from
the underlying variable.  For uninitialized SSA_NAMEs because we didn't
create underlying variable for them var_to_partition doesn't work, the
following patch handles it by just creating an uninitialized replacement
SSA_NAME.

2024-01-12  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/113316
* gimple-lower-bitint.cc (bitint_large_huge::lower_call): Handle
uninitialized large/huge _BitInt arguments to calls.

* gcc.dg/bitint-67.c: New test.

lower-bitint: Fix handling of casts on arches with abi_limb_mode != limb_mode

> > This patch is still work in progress, but posting to show failure with
> > bitint-7 test where handle_stmt called from lower_mergeable_stmt ICE's
> > because the idx (3) is out of range for the __BitInt(135) with a limb_prec
> > of 64.
>
> I can reproduce it, will debug it momentarily.

So, the problem was that in 2 spots I was comparing TYPE_SIZE of large/huge
BITINT_TYPEs to determine if it can be handled cheaply.
On x86_64 with limb_mode == abi_limb_mode (both DImode) that works fine,
if TYPE_SIZE is equal, it means it has the same number of limbs.
But on aarch64 TYPE_SIZE of say _BitInt(135) and _BitInt(193) is the same,
both are 256-bit storage, but because DImode is used as limb_mode, the
former actually needs just 3 limbs, while the latter needs 4 limbs.
And limb_access_type was asserting that we don't try to access 4th limb
on types which actually have a precision which needs just 3 limbs.

The following patch should fix that.

Note, for the info.extended targets (currently none, but I think arm 32-bit
in the ABI is meant like that), we'll need to do something different,
because the upper bits aren't just padding and should be zero/sign extended,
so if we say have limb_mode SImode, abi_limb_mode DImode, we'll need to
treat _BitInt(135) not as 5 SImode limbs, but 6. For !info.extended targets
I think treating _BitInt(135) as 3 DImode limbs rather than 4 is fine.

2024-01-12 Jakub Jelinek <jakub@redhat.com>

* gimple-lower-bitint.cc (mergeable_op): Instead of comparing
TYPE_SIZE (t) of large/huge BITINT_TYPEs, compare
CEIL (TYPE_PRECISION (t), limb_prec).
(bitint_large_huge::handle_cast): Likewise.

[PATCH] libgccjit: Add support for function attributes and variable attributes.

gcc/jit/ChangeLog:

* dummy-frontend.cc (handle_alias_attribute): New function.
(handle_always_inline_attribute): New function.
(handle_cold_attribute): New function.
(handle_fnspec_attribute): New function.
(handle_format_arg_attribute): New function.
(handle_format_attribute): New function.
(handle_noinline_attribute): New function.
(handle_target_attribute): New function.
(handle_used_attribute): New function.
(handle_visibility_attribute): New function.
(handle_weak_attribute): New function.
(handle_alias_ifunc_attribute): New function.
* jit-playback.cc (fn_attribute_to_string): New function.
(variable_attribute_to_string): New function.
(global_new_decl): Add attributes support.
(set_variable_attribute): New function.
(new_global): Add attributes support.
(new_global_initialized): Add attributes support.
(new_local): Add attributes support.
* jit-playback.h (fn_attribute_to_string): New function.
(set_variable_attribute): New function.
* jit-recording.cc (recording::lvalue::add_attribute): New function.
(recording::function::function): New function.
(recording::function::write_to_dump): Add attributes support.
(recording::function::add_attribute): New function.
(recording::function::add_string_attribute): New function.
(recording::function::add_integer_array_attribute): New function.
(recording::global::replay_into): Add attributes support.
(recording::local::replay_into): Add attributes support.
* jit-recording.h: Add attributes support.
* libgccjit.cc (gcc_jit_function_add_attribute): New function.
(gcc_jit_function_add_string_attribute): New function.
(gcc_jit_function_add_integer_array_attribute): New function.
(gcc_jit_lvalue_add_attribute): New function.
* libgccjit.h (enum gcc_jit_fn_attribute): New enum.
(gcc_jit_function_add_attribute): New function.
(gcc_jit_function_add_string_attribute): New function.
(gcc_jit_function_add_integer_array_attribute): New function.
(enum gcc_jit_variable_attribute): New function.
(gcc_jit_lvalue_add_string_attribute): New function.
* libgccjit.map: Declare new functions.

gcc/testsuite/ChangeLog:

* jit.dg/all-non-failing-tests.h: Add new attributes tests.
* jit.dg/jit.exp: Add `jit-verify-assembler-output-not` test command.
* jit.dg/test-restrict-attribute.c: New test.
* jit.dg/test-alias-attribute.c: New test.
* jit.dg/test-always_inline-attribute.c: New test.
* jit.dg/test-cold-attribute.c: New test.
* jit.dg/test-const-attribute.c: New test.
* jit.dg/test-noinline-attribute.c: New test.
* jit.dg/test-nonnull-attribute.c: New test.
* jit.dg/test-pure-attribute.c: New test.
* jit.dg/test-used-attribute.c: New test.
* jit.dg/test-variable-attribute.c: New test.
* jit.dg/test-weak-attribute.c: New test.

gcc/jit/ChangeLog:
* docs/topics/compatibility.rst: Add documentation for LIBGCCJIT_ABI_26.
* docs/topics/functions.rst: Add documentation for new functions.
* docs/topics/expressions.rst: Add documentation for new functions.

Co-authored-by: Antoni Boucher <bouanto@zoho.com>
Signed-off-by: Guillaume Gomez <guillaume1.gomez@gmail.com>

rs6000: Fix ASAN linker errors for Power ELF V1 ABI [PR113284]

rs6000_elf_declare_function_name () outputs Power ELF V1 ABI function
entry labels without using ASM_OUTPUT_FUNCTION_LABEL (). As a result,
.LASANPC labels are not emitted, causing linker errors.

In theory, it is possible to reuse ASM_OUTPUT_FUNCTION_LABEL () by
changing rs6000_output_function_entry () to generate label names
without outputting them, but this would be quite a large change.

Instead, factor out the .LASANPC emitting code from
ASM_OUTPUT_FUNCTION_LABEL () and call it manually.

Fixes: c659dd8bfb55 ("Implement ASM_DECLARE_FUNCTION_NAME using ASM_OUTPUT_FUNCTION_LABEL")
Suggested-by: Jakub Jelinek <jakub@redhat.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
gcc/ChangeLog:

PR sanitizer/113284
* config/rs6000/rs6000.cc (rs6000_elf_declare_function_name):
Use assemble_function_label_final () for Power ELF V1 ABI.
* output.h (assemble_function_label_final): New function.
* varasm.cc (assemble_function_label_raw): Use
assemble_function_label_final ().
(assemble_function_label_final): New function.

testsuite: Fix up preprocessor conditions in bitint-31.c test

Andre reported on IRC that this test has weird preprocessor conditions,
obviously the intent was to test whether corresponding __*_MANT_DIG__
is equal to the expected value like earlier in the function definitions,
but somehow I've ended up with a comma expression instead, which was
always true.

2024-01-12 Jakub Jelinek <jakub@redhat.com>

* gcc.dg/bitint-31.c: Fix up #if conditions checking whether
__*_MANT_DIG__ is equal to a particular precision.

libstdc++: Fix std::runtime_format deviations from the spec [PR113320]

I seem to have implemented this feature based on the P2918R0 revision,
not the final P2918R2 one that was approved for C++26. This commit fixes
it.

The runtime-format-string type should not have a publicly accessible
data member, so add a constructor and make it a friend of
basic_format_string. It should also be non-copyable, so that it can only
be constructed from a prvalue via temporary materialization. Change the
basic_format_string constructor parameter to pass by value. Also add
noexcept to the constructors and runtime_format generator functions.

libstdc++-v3/ChangeLog:

PR libstdc++/113320
* include/std/format (__format::_Runtime_format_string): Add
constructor and disable copy operations.
(basic_format_string(_Runtime_format_string)): Add noexcept and
take parameter by value not rvalue reference.
(runtime_format): Add noexcept.
* testsuite/std/format/runtime_format.cc: Check noexcept. Check
that construction is only possible from prvalues, not xvalues.

Reviewed-by: Daniel Krügler <daniel.kruegler@gmail.com>

libstdc++: Implement C++23 P1951R1 (Default Args for pair's Forwarding Ctor) [PR105505]

This was approved for C++23 at he June 2021 virtual meeting.

This allows more efficient construction of std::pair members when {} is
used as a constructor argument. The perfect forwarding constructor can
be used, so that the member variables are constructed from forwarded
arguments instead of being copy constructed from temporaries.

libstdc++-v3/ChangeLog:

PR libstdc++/105505
* include/bits/stl_pair.h (pair::pair(U1&&, U2&&)) [C++23]: Add
default template arguments, as per P1951R1.
* testsuite/20_util/pair/cons/default_tmpl_args.cc: New test.

libstdc++: Fix incorrect PR number in comment

libstdc++-v3/ChangeLog:

* include/std/format (__format::_Arg_store): Fix PR number in
comment. Simplify preprocessor code.

libgcc: Use may_alias attribute in bitint handlers

As discussed on IRC, the following patch uses may_alias attribute, so that
on targets like aarch64 where abi_limb_mode != limb_mode the library
accesses the limbs (half limbs of the ABI) in the arrays with conservative
alias set.

2024-01-12 Jakub Jelinek <jakub@redhat.com>

* libgcc2.h (UBILtype): New typedef with may_alias attribute.
(__mulbitint3, __divmodbitint4): Use UBILtype * instead of
UWtype * and const UBILtype * instead of const UWtype *.
* libgcc2.c (bitint_reduce_prec, bitint_mul_1, bitint_addmul_1,
__mulbitint3, bitint_negate, bitint_submul_1, __divmodbitint4):
Likewise.
* soft-fp/bitint.h (UBILtype): Change define into a typedef with
may_alias attribute.

middle-end/113344 - is_truth_type_for vs GENERIC tcc_comparison

On GENERIC tcc_comparison can have int type so restrict the PR113126
fix to vector types.

PR middle-end/113344
* match.pd ((double)float CMP (double)float -> float CMP float):
Perform result type check only for vectors.
* fold-const.cc (fold_binary_loc): Likewise.

i386: Remove redundant move in vnni pattern

gcc/ChangeLog:

* config/i386/sse.md (sdot_prod<mode>): Remove redundant SET.
(usdot_prod<mode>): Ditto.
(sdot_prod<mode>): Ditto.
(udot_prod<mode>): Ditto.

i386: Add AVX10.1 related macros

gcc/ChangeLog:

PR target/113288
* config/i386/i386-c.cc (ix86_target_macros_internal):
Add __AVX10_1__, __AVX10_1_256__ and __AVX10_1_512__.

RISC-V: Enhance a testcase

This test should pass no matter how we adjust cost model.

Remove -fno-vect-cost-model.

Committed.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/fold-min-poly.c: Remove -fno-vect-cost-model

target/112280 - properly guard permute query

The following adds guards avoiding code generation to
expand_perm_as_a_vlbr_vstbr_candidate when d.testing_p.

PR target/112280
* config/s390/s390.cc (expand_perm_as_a_vlbr_vstbr_candidate):
Do not generate code when d.testing_p.

libgcc, nios2: Fix exception handling on nios2 with -fpic

Exception handling on nios2-linux-gnu with -fpic has been broken since
revision 790854ea7670f11c14d431c102a49181d2915965, "Use _dl_find_object
in _Unwind_Find_FDE".  For whatever reason, this doesn't work on nios2.

Nios2 uses the GOT address as the base for DW_EH_PE_datarel
relocations in PIC; see my previous fix to make this work, revision
2d33dcfe9f0494c9b56a8d704c3d27c5a4329ebc, "Support for GOT-relative
DW_EH_PE_datarel encoding".  So this may be a horrible bug in the ABI
or in my interpretation of it or just glibc's implementation of
_dl_find_object for this target, but there's existing code out there
that does things this way; and realistically, nobody is going to
re-engineer this now that the vendor has EOL'ed the nios2
architecture.  So, just skip over the code trying to use
_dl_find_object on this target and fall back to the way that works.

I plan to backport this patch to the GCC 12 and GCC 13 branches as well.

libgcc/ChangeLog
* unwind-dw2-fde-dip.c (_Unwind_Find_FDE): Do not try to use
_dl_find_object on nios2; it doesn't work.

Update documents for fcf-protection=

After r14-2692-g1c6231c05bdcca, the option is defined as EnumSet and
-fcf-protection=branch won't unset any others bits since they're in
different groups. So to override -fcf-protection, an explicit
-fcf-protection=none needs to be added and then with
-fcf-protection=XXX

gcc/ChangeLog:

PR target/113039
* doc/invoke.texi (fcf-protection=): Update documents.

RISC-V: Update the comments of riscv_v_ext_mode_p [NFC]

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_v_ext_mode_p): Update the
comments of predicate func riscv_v_ext_mode_p.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Modify ABI-name length of vfloat16m8_t

The length of vfloat16m8_t ABI-name should be 17.
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins.def (vfloat16m8_t):
Modify ABI-name length of vfloat16m8_t

LoongArch: Redundant sign extension elimination optimization 2.

Eliminate the redundant sign extension that exists after the conditional
move when the target register is SImode.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_expand_conditional_move):
Adjust.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/sign-extend-2.c: Adjust.

LoongArch: Redundant sign extension elimination optimization.

We found that the current combine optimization pass in gcc cannot handle
the following redundant sign extension situations:

(insn 77 76 78 5 (set (reg:SI 143)
        (plus:SI (subreg/s/u:SI (reg/v:DI 104 [ len ]) 0)
            (const_int 1 [0x1]))) {addsi3}
    (expr_list:REG_DEAD (reg/v:DI 104 [ len ])
        (nil)))
(insn 78 77 82 5 (set (reg/v:DI 104 [ len ])
        (sign_extend:DI (reg:SI 143))) {extendsidi2}
        (nil))

Because reg:SI 143 is not died or set in insn 78, no replacement merge will
be performed for the insn sequence. We adjusted the add template to eliminate
redundant sign extensions during the expand pass.
Adjusted based on upstream comments:
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641988.html

gcc/ChangeLog:

* config/loongarch/loongarch.md (add<mode>3): Removed.
(*addsi3): New.
(addsi3): Ditto.
(adddi3): Ditto.
(*addsi3_extended): Removed.
(addsi3_extended): New.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/sign-extend.c: Moved to...
* gcc.target/loongarch/sign-extend-1.c: ...here.
* gcc.target/loongarch/sign-extend-2.c: New test.

Daily bump.

OpenMP: lvalue parsing for map/to/from clauses (C)

This patch adds support for parsing general lvalues ("locator list item
types") for OpenMP "map", "to" and "from" clauses to the C front-end,
similar to the previously-posted patch for C++.  Such syntax is permitted
for OpenMP 5.0 and above.  It was previously posted for mainline here:

  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/609038.html

and for the og13 branch here:

  https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623355.html

2024-01-11  Julian Brown  <julian@codesourcery.com>

gcc/c-family/
* c-pretty-print.cc (c_pretty_printer::postfix_expression,
c_pretty_printer::expression): Add OMP_ARRAY_SECTION support.

gcc/c/
* c-parser.cc (c_parser_braced_init, c_parser_conditional_expression):
Don't allow OpenMP array section.
(c_parser_postfix_expression): Don't allow array section in statement
expression.
(c_parser_postfix_expression_after_primary): Add support for OpenMP
array section parsing.
(c_parser_expr_list): Don't allow OpenMP array section here.
(c_parser_omp_variable_list): Change ALLOW_DEREF parameter to
MAP_LVALUE.  Support parsing of general lvalues in "map", "to" and
"from" clauses.
(c_parser_omp_var_list_parens): Change ALLOW_DEREF parameter to
MAP_LVALUE.  Update call to c_parser_omp_variable_list.
(c_parser_oacc_data_clause): Update calls to
c_parser_omp_var_list_parens.
(c_parser_omp_clause_reduction): Use OMP_ARRAY_SECTION tree node
instead of TREE_LIST for array sections.
(c_parser_omp_target): Allow GOMP_MAP_ATTACH.
* c-tree.h (c_omp_array_section_p): Add extern declaration.
(build_omp_array_section): Add prototype.
* c-typeck.cc (c_omp_array_section_p): Add flag.
(mark_exp_read): Support OMP_ARRAY_SECTION.
(build_omp_array_section): Add function.
(build_external_ref): Tweak error path for OpenMP array sections.
(handle_omp_array_sections_1): Use OMP_ARRAY_SECTION tree code instead
of TREE_LIST.  Handle more kinds of expressions.
(c_oacc_check_attachments): Use OMP_ARRAY_SECTION instead of TREE_LIST
for array sections.
(c_finish_omp_clauses): Use OMP_ARRAY_SECTION instead of TREE_LIST.
Check for supported expression types.

gcc/testsuite/
* gcc.dg/gomp/bad-array-section-c-1.c: New test.
* gcc.dg/gomp/bad-array-section-c-2.c: New test.
* gcc.dg/gomp/bad-array-section-c-3.c: New test.
* gcc.dg/gomp/bad-array-section-c-4.c: New test.
* gcc.dg/gomp/bad-array-section-c-5.c: New test.
* gcc.dg/gomp/bad-array-section-c-6.c: New test.
* gcc.dg/gomp/bad-array-section-c-7.c: New test.
* gcc.dg/gomp/bad-array-section-c-8.c: New test.

libgomp/
* libgomp.texi: C/C++ lvalues are supported now for map/to/from.
* testsuite/libgomp.c-c++-common/ind-base-4.c: New test.
* testsuite/libgomp.c-c++-common/unary-ptr-1.c: New test.

c++: corresponding object parms [PR113191]

As discussed, our handling of corresponding object parameters needed to
handle the using-declaration case better. And I took the opportunity to
share code between the add_method and cand_parms_match uses.

This patch specifically doesn't compare reversed parameters, but a follow-up
patch will.

PR c++/113191

gcc/cp/ChangeLog:

* class.cc (xobj_iobj_parameters_correspond): Add context parm.
(object_parms_correspond): Factor out of...
(add_method): ...here.
* method.cc (defaulted_late_check): Use it.
* call.cc (class_of_implicit_object): New.
(object_parms_correspond): Overload taking two candidates.
(cand_parms_match): Use it.
(joust): Check reversed before comparing constraints.
* cp-tree.h (object_parms_correspond): Declare.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-memfun4.C: New test.

libstdc++: Fix spelling mistake in new doc addition

libstdc++-v3/ChangeLog:

* doc/xml/manual/evolution.xml: Fix spelling.
* doc/html/manual/api.html: Regenerate.

libstdc++: Document addition of libstdc++exp.a

The API Evolution section of the manual should mention when the
libstdc++exp.a library was added.

libstdc++-v3/ChangeLog:

* doc/xml/manual/evolution.xml: Document addition of
libstdc++exp.a.
* doc/html/*: Regenerate.

libstdc++: use updated type for __unexpected_handler

Commit f4130a3eb545ab1aaf3ecb44f3d06b43e3751e04 changed the type of
__expected_handler in libsupc++/unwind-cxx.h to be a
std::terminate_handler to avoid a deprecated warning. However, the
definition in eh_unex_handler.cc still used the old type
(std::unexpected_handler) and thus causes a warning when compiling
libstdc++ with -Wdeprecated-declarations (which is the default, for
example, for clang).

Adapt the definition to match the declaration.

libstdc++-v3/ChangeLog:

* libsupc++/eh_unex_handler.cc: Adjust definition type to
declaration.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++: Removed a duplicate define directive for __glibcxx_want_ranges_iota

libstdc++-v3/ChangeLog:

* include/std/ranges (__glibcxx_want_ranges_iota): Remove
duplicate definition.

Signed-off-by: Michael Levine <mlevine55@bloomberg.net>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

RISC-V: THEAD: Fix ICE caused by split optimizations for XTheadFMemIdx.

Due to the premature split optimizations for XTheadFMemIdx, GPR
is allocated when reload allocates registers, resulting in the
following insn.

(insn 66 21 64 5 (set (reg:DF 14 a4 [orig:136 <retval> ] [136])
        (mem:DF (plus:SI (reg/f:SI 15 a5 [141])
                (ashift:SI (reg/v:SI 10 a0 [orig:137 i ] [137])
                    (const_int 3 [0x3]))) [0  S8 A64])) 218 {*movdf_hardfloat_rv32}
     (nil))

Since we currently do not support adjustments to th_m_mir/th_m_miu,
which will trigger ICE. So it is recommended to place the split
optimizations after reload to ensure FPR when registers are allocated.

gcc/ChangeLog:

* config/riscv/thead.md: Add limits for splits.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadfmemidx-medany.c: New test.

libstdc++: [_GLIBCXX_DEBUG] Fix assignment of value-initialized iterator [PR112477]

Now that _M_Detach do not reset iterator _M_version value we need to reset it when
the iterator is attached to a new sequence, even if this sequencer is null when
assigning a value-initialized iterator. In this case _M_version shall be resetted to 0.

libstdc++-v3/ChangeLog:

PR libstdc++/112477
* src/c++11/debug.cc
(_Safe_iterator_base::_M_attach): Reset _M_version to 0 if attaching to null
sequence.
(_Safe_iterator_base::_M_attach_single): Likewise.
(_Safe_local_iterator_base::_M_attach): Likewise.
(_Safe_local_iterator_base::_M_attach_single): Likewise.
* testsuite/23_containers/map/debug/112477.cc: New test case.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++/ranges: Use C++23 deducing this in _Pipe and _Partial

This simplifies the operator() of the _Pipe and _Partial range adaptor
closure objects using C++23 deducing this, allowing us to condense
multiple operator() overloads into one.

The new __like_t alias template is similar to the expositional one from
P0847R6 except it's implemented in terms of forward_like instead of vice
versa, and thus ours always yields a reference so e.g. __like_t<A, char>
is char&& instead of char. For our purposes (forwarding) this shouldn't
make a difference, I think..

libstdc++-v3/ChangeLog:

* include/bits/move.h (__like_t): Define in C++23 mode.
* include/std/ranges (views::__adaptor::Partial::operator()):
Implement using C++23 deducing this when available.
(views::__adaptor::_Pipe::operator()): Likewise.
* testsuite/std/ranges/adaptors/100577.cc: Adjust testcase to
accept new "no match for call" errors issued in C++23 mode.
* testsuite/std/ranges/adaptors/lazy_split_neg.cc: Likewise.

expr: Limit the store flag optimization for single bit to non-vectors [PR113322]

The problem here is after the recent vectorizer improvements, we end up
with a comparison against a vector bool 0 which then tries expand_single_bit_test
which is not expecting vector comparisons at all.

The IR was:
  vector(4) <signed-boolean:1> mask_patt_5.13;
  _Bool _12;

  mask_patt_5.13_44 = vect_perm_even_41 != { 0.0, 1.0e+0, 2.0e+0, 3.0e+0 };
  _12 = mask_patt_5.13_44 == { 0, 0, 0, 0 };

and we tried to call expand_single_bit_test for the last comparison.
Rejecting the vector comparison is needed.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR middle-end/113322

gcc/ChangeLog:

* expr.cc (do_store_flag): Don't try single bit tests with
comparison on vector types.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr113322-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

match: Delay folding of 1/x into `(x+1u)<2u?x:0` until late [PR113301]

Since currently ranger does not work with the complexity of COND_EXPR in
some cases so delaying the simplification of `1/x` for signed types
help code generation.
tree-ssa/divide-8.c is a new testcase where this can help.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/113301

gcc/ChangeLog:

* match.pd (`1/x`): Delay signed case until late.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/divide-8.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

libstdc++: Add GDB printer for std::integral_constant

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/printers.py (StdIntegralConstantPrinter):
Add printer for std::integral_constant.
* testsuite/libstdc++-prettyprinters/cxx11.cc: Test it.

libstdc++: Prefer posix_memalign for aligned-new [PR113258]

As described in PR libstdc++/113258 there are old versions of tcmalloc
which replace malloc and related APIs, but do not repalce aligned_alloc
because it didn't exist at the time they were released. This means that
when operator new(size_t, align_val_t) uses aligned_alloc to obtain
memory, it comes from libc's aligned_alloc not from tcmalloc. But when
operator delete(void*, size_t, align_val_t) uses free to deallocate the
memory, that goes to tcmalloc's replacement version of free, which
doesn't know how to free it.

If we give preference to the older posix_memalign instead of
aligned_alloc then we're more likely to use a function that will be
compatible with the replacement version of free. Because posix_memalign
has been around for longer, it's more likely that old third-party malloc
replacements will also replace posix_memalign alongside malloc and free.

libstdc++-v3/ChangeLog:

PR libstdc++/113258
* libsupc++/new_opa.cc: Prefer to use posix_memalign if
available.

dg-extract-results.py: Ignore case in header line

DejaGNU changed its header line from "Test Run By" to "Test run by"
around 2016. This patch makes it so that both alternatives are
correcly detected.

contrib/ChangeLog:

* dg-extract-results.py: Make the test_run regex case
insensitive.

testsuite: remove xfail

These two lines have been getting XPASS since the test was added.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/explicit-obj-diagnostics7.C: Remove xfail.

AVR: invoke.texi: Put internal options in their own @subsubsection.

gcc/
* doc/invoke.texi (AVR Options): Move -mrmw, -mn-flash, -mshort-calls
and -msp8 to...
(AVR Internal Options): ...this new @subsubsection.

testsuite: remove -save-temps from many tests [PR113319]

This removes -save-temps from the tests I've introduced to fix the LTO
mismatches.

gcc/testsuite/ChangeLog:

PR testsuite/113319
* gcc.dg/bic-bitmask-13.c: Remove -save-temps.
* gcc.dg/bic-bitmask-14.c: Likewise.
* gcc.dg/bic-bitmask-15.c: Likewise.
* gcc.dg/bic-bitmask-16.c: Likewise.
* gcc.dg/bic-bitmask-17.c: Likewise.
* gcc.dg/bic-bitmask-18.c: Likewise.
* gcc.dg/bic-bitmask-19.c: Likewise.
* gcc.dg/bic-bitmask-20.c: Likewise.
* gcc.dg/bic-bitmask-21.c: Likewise.
* gcc.dg/bic-bitmask-22.c: Likewise.
* gcc.dg/bic-bitmask-7.c: Likewise.
* gcc.dg/vect/vect-early-break-run_1.c: Likewise.
* gcc.dg/vect/vect-early-break-run_10.c: Likewise.
* gcc.dg/vect/vect-early-break-run_2.c: Likewise.
* gcc.dg/vect/vect-early-break-run_3.c: Likewise.
* gcc.dg/vect/vect-early-break-run_4.c: Likewise.
* gcc.dg/vect/vect-early-break-run_5.c: Likewise.
* gcc.dg/vect/vect-early-break-run_6.c: Likewise.
* gcc.dg/vect/vect-early-break-run_7.c: Likewise.
* gcc.dg/vect/vect-early-break-run_8.c: Likewise.
* gcc.dg/vect/vect-early-break-run_9.c: Likewise.

[PR112918][LRA]: Fixing IRA ICE on m68k

Some GCC tests on m68K port of LRA is failed on `maximum number of
generated reload insns per insn achieved`.  The problem is in that for
subreg reload LRA can not narrow reg class more from ALL_REGS to
GENERAL_REGS and then to data regs or address regs.  The patch permits
narrowing reg class from reload insns if this results in successful
matching of reg operand.  This is the second version of the patch to
fix the PR.  This version adds matching with and without narrowing reg
class and preferring match without narrowing classes.

gcc/ChangeLog:

PR rtl-optimization/112918
* lra-constraints.cc (SMALL_REGISTER_CLASS_P): Move before in_class_p.
(in_class_p): Restrict condition for narrowing class in case of
allow_all_reload_class_changes_p.
(process_alt_operands): Try to match operand without and with
narrowing reg class.  Discourage narrowing the class.  Finish insn
matching only if there is no class narrowing.
(curr_insn_transform): Pass true to in_class_p for reg operand win.

tree-optimization/112505 - bit-precision induction vectorization

Vectorization of bit-precision inductions isn't implemented but we
don't check this, instead we ICE during transform.

PR tree-optimization/112505
* tree-vect-loop.cc (vectorizable_induction): Reject
bit-precision induction.

* gcc.dg/vect/pr112505.c: New testcase.

tree-optimization/113126 - vector extension compare optimization

The following makes sure the resulting boolean type is the same
when eliding a float extension.

PR tree-optimization/113126
* match.pd ((double)float CMP (double)float -> float CMP float):
Make sure the boolean type is the same.
* fold-const.cc (fold_binary_loc): Likewise.

* gcc.dg/torture/pr113126.c: New testcase.

tree-optimization/112636 - estimate niters before header copying

The following avoids a mismatch between an early query for maximum
number of iterations of a loop and a late one when through ranger
we'd get iterations estimated. Instead make sure we compute niters
before querying the iteration bound.

PR tree-optimization/112636
* tree-ssa-loop-ch.cc (ch_base::copy_headers): Call
estimate_numbers_of_iterations before querying
get_max_loop_iterations_int.
(pass_ch::execute): Initialize SCEV and loops appropriately.

* gcc.dg/pr112636.c: New testcase.

AVR: Some minor improvements to the TEXI documentation.

gcc/
* config/avr/avr-devices.cc (avr_texinfo): Adjust documentation for
Reduced Tiny.
* config/avr/gen-avr-mmcu-texi.cc (main): Add @anchor for each core.
* doc/extend.texi (AVR Variable Attributes): Improve documentation
of io, io_low and address attributes.
* doc/invoke.texi (AVR Options): Add some anchors for external refs.
* doc/avr-mmcu.texi: Rebuild.

libstdc++: Use using instead of typedef in opts-common.h

libstdc++-v3/ChangeLog:

* src/filesystem/ops-common.h (stat_type): Use using.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++: Fix error handling in filesystem::equivalent [PR113250]

This patch made std::filesystem::equivalent correctly throw an exception
when either path does not exist as per [fs.op.equivalent]/4.

PR libstdc++/113250

libstdc++-v3/ChangeLog:

* src/c++17/fs_ops.cc (fs::equivalent): Use || instead of &&.
* src/filesystem/ops.cc (fs::equivalent): Likewise.
* testsuite/27_io/filesystem/operations/equivalent.cc: Handle
error codes.
* testsuite/experimental/filesystem/operations/equivalent.cc:
Likewise.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

LoongArch: Implement option save/restore

LTO option streaming and target attributes both require per-function
target configuration, which is achieved via option save/restore.

We implement TARGET_OPTION_{SAVE,RESTORE} to switch the la_target
context in addition to other automatically maintained option states
(via the "Save" option property in the .opt files).

Tested on loongarch64-linux-gnu without regression.

PR target/113233

gcc/ChangeLog:

* config/loongarch/genopts/loongarch.opt.in: Mark options with
the "Save" property.
* config/loongarch/loongarch.opt: Same.
* config/loongarch/loongarch-opts.cc: Refresh -mcmodel= state
according to la_target.
* config/loongarch/loongarch.cc: Implement TARGET_OPTION_{SAVE,
RESTORE} for the la_target structure; Rename option conditions
to have the same "la_" prefix.
* config/loongarch/loongarch.h: Same.

LOOP-UNROLL: Leverage HAS_SIGNED_ZERO for var expansion

The insert_var_expansion_initialization depends on the
HONOR_SIGNED_ZEROS to initialize the unrolling variables
to +0.0f when -0.0f and no-signed-option. Unfortunately,
we should always keep the -0.0f here because:

* The -0.0f is always the correct initial value.
* We need to support the target that always honor signed zero.

Thus, we need to leverage MODE_HAS_SIGNED_ZEROS when initialize
instead of HONOR_SIGNED_ZEROS. Then the target/backend can
decide to honor the no-signed-zero or not.

We also removed the testcase pr30957-1.c, as it makes undefined behavior
whether the return value is positive or negative.

The below tests are passed for this patch:

* The riscv regression tests.
* The aarch64 regression tests.
* The x86 bootstrap and regression tests.

gcc/ChangeLog:

* loop-unroll.cc (insert_var_expansion_initialization): Leverage
MODE_HAS_SIGNED_ZEROS for expansion variable initialization.

gcc/testsuite/ChangeLog:

* gcc.dg/pr30957-1.c: Remove.

Signed-off-by: Pan Li <pan2.li@intel.com>

aarch64: Fix dwarf2cfi ICEs due to recent CFI note changes [PR113077]

In r14-6604-gd7ee988c491cde43d04fe25f2b3dbad9d85ded45 we changed the CFI notes
attached to callee saves (in aarch64_save_callee_saves).  That patch changed
the ldp/stp representation to use unspecs instead of PARALLEL moves.  This meant
that we needed to attach CFI notes to all frame-related pair saves such that
dwarf2cfi could still emit the appropriate CFI (it cannot interpret the unspecs
directly).  The patch also attached REG_CFA_OFFSET notes to individual saves so
that the ldp/stp pass could easily preserve them when forming stps.

In that change I chose to use REG_CFA_OFFSET, but as the PR shows, that
choice was problematic in that REG_CFA_OFFSET requires the attached
store to be expressed in terms of the current CFA register at all times.
This means that even scheduling of frame-related insns can break this
invariant, leading to ICEs in dwarf2cfi.

The old behaviour (before that change) allowed dwarf2cfi to interpret the RTL
directly for sp-relative saves.  This change restores that behaviour by using
REG_FRAME_RELATED_EXPR instead of REG_CFA_OFFSET.  REG_FRAME_RELATED_EXPR
effectively just gives a different pattern for dwarf2cfi to look at instead of
the main insn pattern.  That allows us to attach the old-style PARALLEL move
representation in a REG_FRAME_RELATED_EXPR note and means we are free to always
express the save addresses in terms of the stack pointer.

Since the ldp/stp fusion pass can combine frame-related stores, this patch also
updates it to preserve REG_FRAME_RELATED_EXPR notes, and additionally gives it
the ability to synthesize those notes when combining sp-relative saves into an
stp (the latter always needs a note due to the unspec representation, the former
does not).

gcc/ChangeLog:

PR target/113077
* config/aarch64/aarch64-ldp-fusion.cc (filter_notes): Add
fr_expr param to extract REG_FRAME_RELATED_EXPR notes.
(combine_reg_notes): Handle REG_FRAME_RELATED_EXPR notes, and
synthesize these if needed.  Update caller ...
(ldp_bb_info::fuse_pair): ... here.
(ldp_bb_info::try_fuse_pair): Punt if either insn has writeback
and either insn is frame-related.
(find_trailing_add): Punt on frame-related insns.
* config/aarch64/aarch64.cc (aarch64_save_callee_saves): Use
REG_FRAME_RELATED_EXPR instead of REG_CFA_OFFSET.

gcc/testsuite/ChangeLog:

PR target/113077
* gcc.target/aarch64/pr113077.c: New test.

MIPS: Add ATTRIBUTE_UNUSED to mips_start_function_definition

Fix build warning:
mips.cc: warning: unused parameter 'decl'.

gcc
* config/mips/mips.cc (mips_start_function_definition):
Add ATTRIBUTE_UNUSED.

tree-optimization/111003 - new testcase

Testcase for fixed PR.

PR tree-optimization/111003
gcc/testsuite/
* gcc.dg/tree-ssa/pr111003.c: New testcase.

middle-end/112740 - vector boolean CTOR expansion issue

The optimization to expand uniform boolean vectors by sign-extension
works only for dense masks but it failed to check that.

PR middle-end/112740
* expr.cc (store_constructor): Check the integer vector
mask has a single bit per element before using sign-extension
to expand an uniform vector.

* gcc.dg/pr112740.c: New testcase.

RISC-V: VLA preempts VLS on unknown NITERS loop

This patch fixes the known issues on SLP cases:

ble a2,zero,.L11
addiw t1,a2,-1
li a5,15
bleu t1,a5,.L9
srliw a7,t1,4
slli a7,a7,7
lui t3,%hi(.LANCHOR0)
lui a6,%hi(.LANCHOR0+128)
addi t3,t3,%lo(.LANCHOR0)
li a4,128
addi a6,a6,%lo(.LANCHOR0+128)
add a7,a7,a0
addi a3,a1,37
mv a5,a0
vsetvli zero,a4,e8,m8,ta,ma
vle8.v v24,0(t3)
vle8.v v16,0(a6)
.L4:
li a6,128
vle8.v v0,0(a3)
vrgather.vv v8,v0,v24
vadd.vv v8,v8,v16
vse8.v v8,0(a5)
add a5,a5,a6
add a3,a3,a6
bne a5,a7,.L4
andi a5,t1,-16
mv t1,a5
.L3:
subw a2,a2,a5
li a4,1
beq a2,a4,.L5
slli a5,a5,32
srli a5,a5,32
addiw a2,a2,-1
slli a5,a5,3
csrr a4,vlenb
slli a6,a2,32
addi t3,a5,37
srli a3,a6,29
slli a4,a4,2
add t3,a1,t3
add a5,a0,a5
mv t5,a3
bgtu a3,a4,.L14
.L6:
li a4,50790400
addi a4,a4,1541
li a6,67633152
addi a6,a6,513
slli a4,a4,32
add a4,a4,a6
vsetvli t4,zero,e64,m4,ta,ma
vmv.v.x v16,a4
vsetvli a6,zero,e16,m8,ta,ma
vid.v v8
vsetvli zero,t5,e8,m4,ta,ma
vle8.v v20,0(t3)
vsetvli a6,zero,e16,m8,ta,ma
csrr a7,vlenb
vand.vi v8,v8,-8
vsetvli zero,zero,e8,m4,ta,ma
slli a4,a7,2
vrgatherei16.vv v4,v20,v8
vadd.vv v4,v4,v16
vsetvli zero,t5,e8,m4,ta,ma
vse8.v v4,0(a5)
bgtu a3,a4,.L15
.L7:
addw t1,a2,t1
.L5:
slliw a5,t1,3
add a1,a1,a5
lui a4,%hi(.LC2)
add a0,a0,a5
lbu a3,37(a1)
addi a5,a4,%lo(.LC2)
vsetivli zero,8,e8,mf2,ta,ma
vmv.v.x v1,a3
vle8.v v2,0(a5)
vadd.vv v1,v1,v2
vse8.v v1,0(a0)
.L11:
ret
.L15:
sub a3,a3,a4
bleu a3,a4,.L8
mv a3,a4
.L8:
li a7,50790400
csrr a4,vlenb
slli a4,a4,2
addi a7,a7,1541
li t4,67633152
add t3,t3,a4
vsetvli zero,a3,e8,m4,ta,ma
slli a7,a7,32
addi t4,t4,513
vle8.v v20,0(t3)
add a4,a5,a4
add a7,a7,t4
vsetvli a5,zero,e64,m4,ta,ma
vmv.v.x v16,a7
vsetvli a6,zero,e16,m8,ta,ma
vid.v v8
vand.vi v8,v8,-8
vsetvli zero,zero,e8,m4,ta,ma
vrgatherei16.vv v4,v20,v8
vadd.vv v4,v4,v16
vsetvli zero,a3,e8,m4,ta,ma
vse8.v v4,0(a4)
j .L7
.L14:
mv t5,a4
j .L6
.L9:
li a5,0
li t1,0
j .L3

The vectorization codegen is quite inefficient since we choose a VLS modes to vectorize the loop body
with epilogue choosing a VLA modes.

cost.c:6:21: note:  ***** Choosing vector mode V128QI
cost.c:6:21: note:  ***** Choosing epilogue vector mode RVVM4QI

As we known, in RVV side, we have VLA modes and VLS modes. VLAmodes support partial vectors wheras
VLSmodes support full vectors.  The goal we add VLSmodes is to improve the codegen of known NITERS
or SLP codes.

If NITERS is unknown, that is i < n, n is unknown. We will always have partial vectors vectorization.
It can be loop body or epilogue. In this case, It's always more efficient to apply VLA partial vectorization
on loop body which doesn't have epilogue.

After this patch:

f:
ble a2,zero,.L7
li a5,1
beq a2,a5,.L5
li a6,50790400
addi a6,a6,1541
li a4,67633152
addi a4,a4,513
csrr a5,vlenb
addiw a2,a2,-1
slli a6,a6,32
add a6,a6,a4
slli a5,a5,2
slli a4,a2,32
vsetvli t1,zero,e64,m4,ta,ma
srli a3,a4,29
neg t4,a5
addi a7,a1,37
mv a4,a0
vmv.v.x v12,a6
vsetvli t3,zero,e16,m8,ta,ma
vid.v v16
vand.vi v16,v16,-8
.L4:
minu a6,a3,a5
vsetvli zero,a6,e8,m4,ta,ma
vle8.v v8,0(a7)
vsetvli t3,zero,e8,m4,ta,ma
mv t1,a3
vrgatherei16.vv v4,v8,v16
vsetvli zero,a6,e8,m4,ta,ma
vadd.vv v4,v4,v12
vse8.v v4,0(a4)
add a7,a7,a5
add a4,a4,a5
add a3,a3,t4
bgtu t1,a5,.L4
.L3:
slliw a2,a2,3
add a1,a1,a2
lui a5,%hi(.LC0)
lbu a4,37(a1)
add a0,a0,a2
addi a5,a5,%lo(.LC0)
vsetivli zero,8,e8,mf2,ta,ma
vmv.v.x v1,a4
vle8.v v2,0(a5)
vadd.vv v1,v1,v2
vse8.v v1,0(a0)
.L7:
ret

Tested on both RV32 and RV64 no regression. Ok for trunk ?

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (costs::better_main_loop_than_p): VLA
preempt VLS on unknown NITERS loop.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/slp-1.c: Remove xfail.
* gcc.target/riscv/rvv/autovec/partial/slp-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: Ditto.

libstdc++: Optimize std::is_compound compilation performance

This patch optimizes the compilation performance of std::is_compound.

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_compound): Do not use __not_.
(is_compound_v): Use is_fundamental_v instead.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>

Add -mevex512 into invoke.texi

Hi Richard,

It seems that I send out a not updated patch. This patch should what
I want to send.

Thx,
Haochen

gcc/ChangeLog:

* doc/invoke.texi: Add -mevex512.

LoongArch: Optimized some of the symbolic expansion instructions generated during bitwise operations.

There are two mode iterators defined in the loongarch.md:
(define_mode_iterator GPR [SI (DI "TARGET_64BIT")])
and
(define_mode_iterator X [(SI "!TARGET_64BIT") (DI "TARGET_64BIT")])
Replace the mode in the bit arithmetic from GPR to X.

Since the bitwise operation instruction does not distinguish between 64-bit,
32-bit, etc., it is necessary to perform symbolic expansion if the bitwise
operation is less than 64 bits.
The original definition would have generated a lot of redundant symbolic
extension instructions. This problem is optimized with reference to the
implementation of RISCV.

Add this patch spec2017 500.perlbench performance improvement by 1.8%

gcc/ChangeLog:

* config/loongarch/loongarch.md (one_cmpl<mode>2): Replace GPR with X.
(*nor<mode>3): Likewise.
(nor<mode>3): Likewise.
(*negsi2_extended): New template.
(*<optab>si3_internal): Likewise.
(*one_cmplsi2_internal): Likewise.
(*norsi3_internal): Likewise.
(*<optab>nsi_internal): Likewise.
(bytepick_w_<bytepick_imm>_extend): Modify this template according to the
modified bit operation to make the optimization work.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/sign-extend-bitwise.c: New test.

Optimize A < B ? A : B to MIN_EXPR.

Similar for A < B ? B : A to MAX_EXPR.
There're codes in the frontend to optimize such pattern but failed to
handle testcase in the PR since it's exposed at gimple level when
folding backend builtins.

pr95906 now can be optimized to MAX_EXPR as it's commented in the
testcase.

// FIXME: this should further optimize to a MAX_EXPR
typedef signed char v16i8 __attribute__((vector_size(16)));
v16i8 f(v16i8 a, v16i8 b)

gcc/ChangeLog:

PR target/104401
* match.pd (VEC_COND_EXPR: A < B ? A : B -> MIN_EXPR): New patten match.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr104401.c: New test.
* gcc.dg/tree-ssa/pr95906.c: Adjust testcase.

PR modula2/112946 set expression type checking

This patch adds type checking for binary set operators.
It also checks the IN operator and improves the := type checking.

gcc/m2/ChangeLog:

PR modula2/112946
* gm2-compiler/M2GenGCC.mod (IsExpressionCompatible): Import.
(ExpressionTypeCompatible): Import.
(CodeStatement): Remove op1, op2, op3 parameters from CodeSetOr,
CodeSetAnd, CodeSetSymmetricDifference, CodeSetLogicalDifference.
(checkArrayElements): Rename op1 to des and op3 to expr.
Use despos and exprpos instead of CurrentQuadToken.
(checkRecordTypes): Rename op1 to des and op2 to expr.
Use virtpos instead of CurrentQuadToken.
(checkIncorrectMeta): Ditto.
(checkBecomes): Rename op1 to des and op3 to expr.
Use virtpos instead of CurrentQuadToken.
(NoWalkProcedure): New procedure stub.
(CheckBinaryExpressionTypes): New procedure function.
(CheckElementSetTypes): New procedure function.
(CodeBinarySet): Re-write.
(FoldBinarySet): Re-write.
(CodeSetOr): Remove parameters op1, op2 and op3.
(CodeSetAnd): Ditto.
(CodeSetLogicalDifference): Ditto.
(CodeSetSymmetricDifference): Ditto.
(CodeIfIn): Call CheckBinaryExpressionTypes and
CheckElementSetTypes.
* gm2-compiler/M2Quads.mod (BuildRotateFunction): Correct
parameters to MakeVirtualTok to reflect parameter block
passed to Rotate.

gcc/testsuite/ChangeLog:

PR modula2/112946
* gm2/pim/fail/badbecomes.mod: New test.
* gm2/pim/fail/badexpression.mod: New test.
* gm2/pim/fail/badexpression2.mod: New test.
* gm2/pim/fail/badifin.mod: New test.
* gm2/pim/pass/goodifin.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

config: delete unused CYG_AC_PATH_LIBERTY macro

Nothing uses this, so delete it to avoid confusion.

config/ChangeLog:

* acinclude.m4 (CYG_AC_PATH_LIBERTY): Delete.

Daily bump.

libstdc++: Use _GLIBCXX_USE_BUILTIN_TRAIT for _Nth_type

Since _Nth_type has a fallback native implementation, use
_GLIBCXX_USE_BUILTIN_TRAIT when checking for __type_pack_element
so that we can easily toggle which implementation to use.

libstdc++-v3/ChangeLog:

* include/bits/utility.h (_Nth_type): Use
_GLIBCXX_USE_BUILTIN_TRAIT instead of __has_builtin.

RISC-V: Switch RVV cost model.

This patch is preparing patch for the following cost model tweak.

Since we don't have vector cost model in default tune info (rocket),
we set the cost model default as generic cost model by default.

The reason we want to switch to generic vector cost model is the default
cost model generates inferior codegen for various benchmarks.

For example, PR113247, we have performance bug that we end up having over 70%
performance drop of SHA256. Currently, no matter how we adapt cost model,
we are not able to fix the performance bug since we always use default cost model by default.

Also, tweak the generic cost model back to default cost model since we have some FAILs in
current tests.

After this patch, we (me an Robin) can work on cost model tunning together to improve performane
in various benchmarks.

Tested on both RV32 and RV64, ok for trunk ?

gcc/ChangeLog:

* config/riscv/riscv.cc (get_common_costs): Switch RVV cost model.
(get_vector_costs): Ditto.
(riscv_builtin_vectorization_cost): Ditto.

RISC-V: Minor tweak dynamic cost model

v2 update: Robostify tests.

While working on cost model, I notice one case that dynamic lmul cost doesn't work well.

Before this patch:

foo:
        lui     a4,%hi(.LANCHOR0)
        li      a0,1953
        li      a1,63
        addi    a4,a4,%lo(.LANCHOR0)
        li      a3,64
        vsetvli a2,zero,e32,mf2,ta,ma
        vmv.v.x v5,a0
        vmv.v.x v4,a1
        vid.v   v3
.L2:
        vsetvli a5,a3,e32,mf2,ta,ma
        vadd.vi v2,v3,1
        vadd.vv v1,v3,v5
        mv      a2,a5
        vmacc.vv        v1,v2,v4
        slli    a1,a5,2
        vse32.v v1,0(a4)
        sub     a3,a3,a5
        add     a4,a4,a1
        vsetvli a5,zero,e32,mf2,ta,ma
        vmv.v.x v1,a2
        vadd.vv v3,v3,v1
        bne     a3,zero,.L2
        li      a0,0
        ret

Unexpected: Use scalable vector and LMUL = MF2 which is wasting computation resources.

Ideally, we should use LMUL = M8 VLS modes.

The root cause is the dynamic LMUL heuristic dominates the VLS heuristic.
Adapt the cost model heuristic.

After this patch:

foo:
lui a4,%hi(.LANCHOR0)
addi a4,a4,%lo(.LANCHOR0)
li a3,4096
li a5,32
li a1,2016
addi a2,a4,128
addiw a3,a3,-32
vsetvli zero,a5,e32,m8,ta,ma
li a0,0
vid.v v8
vsll.vi v8,v8,6
vadd.vx v16,v8,a1
vadd.vx v8,v8,a3
vse32.v v16,0(a4)
vse32.v v8,0(a2)
ret

Tested on both RV32/RV64 no regression.

Ok for trunk ?

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (costs::better_main_loop_than_p): Minior tweak.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: Fix test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: Ditto.

libgccjit: Fix GGC segfault when using -flto

gcc/ChangeLog:
PR jit/111396
* ipa-fnsummary.cc (ipa_fnsummary_cc_finalize): Call
ipa_free_size_summary.
* ipa-icf.cc (ipa_icf_cc_finalize): New function.
* ipa-profile.cc (ipa_profile_cc_finalize): New function.
* ipa-prop.cc (ipa_prop_cc_finalize): New function.
* ipa-prop.h (ipa_prop_cc_finalize): New function.
* ipa-sra.cc (ipa_sra_cc_finalize): New function.
* ipa-utils.h (ipa_profile_cc_finalize, ipa_icf_cc_finalize,
ipa_sra_cc_finalize): New functions.
* toplev.cc (toplev::finalize): Call ipa_icf_cc_finalize,
ipa_prop_cc_finalize, ipa_profile_cc_finalize and
ipa_sra_cc_finalize
Include ipa-utils.h.

gcc/testsuite/ChangeLog:
PR jit/111396
* jit.dg/all-non-failing-tests.h: Add note about test-ggc-bugfix.
* jit.dg/test-ggc-bugfix.c: New test.

RISC-V: T-HEAD: Add support for the XTheadInt ISA extension

The XTheadInt ISA extension provides the following instructions
to accelerate interrupt processing:
* th.ipush
* th.ipop

Ref:
https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.3.0/xthead-2023-11-10-2.3.0.pdf

gcc/ChangeLog:

* config/riscv/riscv-protos.h (th_int_get_mask): New prototype.
(th_int_get_save_adjustment): Likewise.
(th_int_adjust_cfi_prologue): Likewise.
* config/riscv/riscv.cc (BITSET_P): Moved away from here.
(TH_INT_INTERRUPT): New macro.
(riscv_expand_prologue): Add the processing of XTheadInt.
(riscv_expand_epilogue): Likewise.
* config/riscv/riscv.h (BITSET_P): Moved to here.
* config/riscv/riscv.md: New unspec.
* config/riscv/thead.cc (th_int_get_mask): New function.
(th_int_get_save_adjustment): Likewise.
(th_int_adjust_cfi_prologue): Likewise.
* config/riscv/thead.md (th_int_push): New pattern.
(th_int_pop): new pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadint-push-pop.c: New test.

middle-end: Don't apply copysign optimization if target does not implement optab [PR112468]

Currently GCC does not treat IFN_COPYSIGN the same as the copysign tree expr.
The latter has a libcall fallback and the IFN can only do optabs.

Because of this the change I made to optimize copysign only works if the
target has impemented the optab, but it should work for those that have the
libcall too.

More annoyingly if a target has vector versions of ABS and NEG but not COPYSIGN
then the change made them lose vectorization.

The proper fix for this is to treat the IFN the same as the tree EXPR and to
enhance expand_COPYSIGN to also support vector calls.

I have such a patch for GCC 15 but it's quite big and too invasive for stage-4.
As such this is a minimal fix, just don't apply the transformation and leave
targets which don't have the optab unoptimized.

Targets list for check_effective_target_ifn_copysign was gotten by grepping for
copysign and looking at the optab.

gcc/ChangeLog:

PR tree-optimization/112468
* doc/sourcebuild.texi: Document ifn_copysign.
* match.pd: Only apply transformation if target supports the IFN.

gcc/testsuite/ChangeLog:

PR tree-optimization/112468
* gcc.dg/fold-copysign-1.c: Modify tests based on if target supports
IFN_COPYSIGN.
* gcc.dg/pr55152-2.c: Likewise.
* gcc.dg/tree-ssa/abs-4.c: Likewise.
* gcc.dg/tree-ssa/backprop-6.c: Likewise.
* gcc.dg/tree-ssa/copy-sign-2.c: Likewise.
* gcc.dg/tree-ssa/mult-abs-2.c: Likewise.
* lib/target-supports.exp (check_effective_target_ifn_copysign): New.

reassoc vs uninitialized variable [PR112581]

Like r14-2293-g11350734240dba and r14-2289-gb083203f053f16,
reassociation can combine across a few bb and one of the usage
can be an uninitializated variable and if going from an conditional
usage to an unconditional usage can cause wrong code.
This uses maybe_undef_p like other passes where this can happen.

Note if-to-switch uses the function (init_range_entry) provided
by ressociation so we need to call mark_ssa_maybe_undefs there;
otherwise we assume almost all ssa names are uninitialized.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

PR tree-optimization/112581
* gimple-if-to-switch.cc (pass_if_to_switch::execute): Call
mark_ssa_maybe_undefs.
* tree-ssa-reassoc.cc (can_reassociate_op_p): Uninitialized
variables can not be reassociated.
(init_range_entry): Check for uninitialized variables too.
(init_reassoc): Call mark_ssa_maybe_undefs.

gcc/testsuite/ChangeLog:

PR tree-optimization/112581
* gcc.c-torture/execute/pr112581-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

RISC-V/testsuite: Fix comment termination in pr105314.c

Add terminating `/' character missing from one of the test harness
command clauses in pr105314.c. This causes no issue with compilation
owing to another comment immediately following, but would cause a:

pr105314.c:3:1: warning: "/*" within comment [-Wcomment]

message if warnings were enabled.

gcc/testsuite/
* gcc.target/riscv/pr105314.c: Fix comment termination.

RISC-V: Also handle sign extension in branch costing

Complement commit c1e8cb3d9f94 ("RISC-V: Rework branch costing model for
if-conversion") and also handle extraneous sign extend operations that
are sometimes produced by `noce_try_cmove_arith' instead of zero extend
operations, making branch costing consistent. It is unclear what the
condition is for the middle end to choose between the zero extend and
sign extend operation, but the test case included uses sign extension
with 64-bit targets, preventing if-conversion from triggering across all
the architectural variants.

There are further anomalies revealed by the test case, specifically the
exceedingly high branch cost of 6 required for the `-mmovcc' variant
despite that the final branchless sequence only uses 4 instructions, the
missed conversion at -O1 for 32-bit targets even though code is machine
word size agnostic, and the missed conversion at -Os and -Oz for 32-bit
Zicond targets even though the branchless sequence would be shorter than
the branched one. These will have to be handled separately.

gcc/
* config/riscv/riscv.cc (riscv_noce_conversion_profitable_p):
Also handle sign extension.

gcc/testsuite/
* gcc.target/riscv/cset-sext-sfb.c: New test.
* gcc.target/riscv/cset-sext-thead.c: New test.
* gcc.target/riscv/cset-sext-ventana.c: New test.
* gcc.target/riscv/cset-sext-zicond.c: New test.
* gcc.target/riscv/cset-sext.c: New test.

testsuite: Add testcase for already fixed PR [PR112734]

This test was already fixed by r14-6051 aka PR112770 fix.

2024-01-10 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/112734
* gcc.dg/bitint-64.c: New test.

aarch64: Make ldp/stp pass off by default

As discussed on IRC, this makes the aarch64 ldp/stp pass off by default. This
should stabilize the trunk and give some time to address the P1 regressions.

gcc/ChangeLog:

* config/aarch64/aarch64.opt (-mearly-ldp-fusion): Set default
to 0.
(-mlate-ldp-fusion): Likewise.

middle-end: correctly identify the edge taken when condition is true. [PR113287]

The vectorizer needs to know during early break vectorization whether the edge
that will be taken if the condition is true stays or leaves the loop.

This is because the code assumes that if you take the true branch you exit the
loop. If you don't exit the loop it has to generate a different condition.

Basically it uses this information to decide whether it's generating a
"any element" or an "all element" check.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
and no issues with --enable-lto --with-build-config=bootstrap-O3
--enable-checking=release,yes,rtl,extra.

gcc/ChangeLog:

PR tree-optimization/113287
* tree-vect-stmts.cc (vectorizable_early_exit): Check the flags on edge
instead of using BRANCH_EDGE to determine true edge.

gcc/testsuite/ChangeLog:

PR tree-optimization/113287
* gcc.dg/vect/vect-early-break_100-pr113287.c: New test.
* gcc.dg/vect/vect-early-break_99-pr113287.c: New test.

tree-optimization/113078 - conditional subtraction reduction vectorization

When if-conversion was changed to use .COND_ADD/SUB for conditional
reduction it was forgotten to update reduction path handling to
canonicalize .COND_SUB to .COND_ADD for vectorizable_reduction
similar to what we do for MINUS_EXPR. The following adds this
and testcases exercising this at runtime and looking for the
appropriate masked subtraction in the vectorized code on x86.

PR tree-optimization/113078
* tree-vect-loop.cc (check_reduction_path): Canonicalize
.COND_SUB to .COND_ADD.

* gcc.dg/vect/vect-reduc-cond-sub.c: New testcase.
* gcc.target/i386/vect-pr113078.c: Likewise.

c++ frontend: initialize ivdep value

Should control enter the switch from one of the cases other than
the IVDEP one then the variable remains uninitialized.

This fixes it by initializing it to false.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_pragma): Initialize to false.

gcc-urlifier: handle option prefixes such as '-fno-'

Given e.g. this missppelled option (omitting the trailing 's'):
$ LANG=C ./xgcc -B. -fno-inline-small-function
xgcc: error: unrecognized command-line option '-fno-inline-small-function'; did you mean '-fno-inline-small-functions'?

we weren't providing a documentation URL for the suggestion.

The issue is the URLification code uses find_opt, which doesn't consider
the various '-fno-' prefixes.

This patch adds a way to find the pertinent prefix remapping and uses it
when determining URLs.
With this patch, the suggestion '-fno-inline-small-functions' now gets a
documentation link (to that of '-finline-small-functions').

gcc/ChangeLog:
* gcc-urlifier.cc (gcc_urlifier::get_url_suffix_for_option):
Handle prefix mappings before calling find_opt.
(selftest::gcc_urlifier_cc_tests): Add example of urlifying a
"-fno-"-prefixed command-line option.
* opts-common.cc (get_option_prefix_remapping): New.
* opts.h (get_option_prefix_remapping): New decl.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

pretty-print: support urlification in phase 3

TL;DR: for the case when the user misspells a command-line option
and we suggest one, with this patch we now provide a documentation URL
for the suggestion.

In r14-5118-gc5db4d8ba5f3de I added a mechanism to automatically add
URLs to quoted strings in diagnostics, and in r14-6920-g9e49746da303b8
through r14-6923-g4ded42c2c5a5c9 wired this up so that any time
we mention a command-line option in a diagnostic message in quotes,
the user gets a URL to the HTML documentation for that option.

However this only worked for quoted strings that were fully within
a single "chunk" within the pretty-printer implementation, such as:

* "%<-foption%>" (handled in phase 1)
* "%qs", "-foption" (handled in phase 2)

but not where the the quoted string straddled multiple chunks, in
particular for this important case in the gcc.cc:

  error ("unrecognized command-line option %<-%s%>;"
" did you mean %<-%s%>?",
switches[i].part1, hint);

e.g. for:
$ LANG=C ./xgcc -B. -finling-small-functions
xgcc: error: unrecognized command-line option '-finling-small-functions'; did you mean '-finline-small-functions'?

which within pp_format becomes these chunks:

* chunk 0: "unrecognized command-line option `-"
* chunk 1: switches[i].part1  (e.g. "finling-small-functions")
* chunk 2: "'; did you mean `-"
* chunk 3: hint (e.g. "finline-small-functions")
* chunk 4: "'?"

where the first quoted run is in chunks 1-3 and the second in
chunks 2-4.

Hence we were not attempting to provide a URL for the two quoted runs,
and, in particular not for the hint.

This patch refactors the urlification mechanism in pretty-print.cc so
that it checks for quoted runs that appear in phase 3 (as well as in
phases 1 and 2, as before).  With this, the quoted text runs
"-finling-small-functions" and "-finline-small-functions" are passed
to the urlifier, which successfully finds a documentation URL for
the latter.

As before, the urlification code is only run if the URL escapes are
enabled, and only for messages from diagnostic.cc (error, warn, inform,
etc), not for all pretty_printer usage.

gcc/ChangeLog:
* diagnostic.cc (diagnostic_context::report_diagnostic): Pass
m_urlifier to pp_output_formatted_text.
* pretty-print.cc: Add #define of INCLUDE_VECTOR.
(obstack_append_string): New overload, taking a length.
(urlify_quoted_string): Pass in an obstack ptr, rather than using
that of the pp's buffer.  Generalize to handle trailing text in
the buffer beyond the run of quoted text.
(class quoting_info): New.
(on_begin_quote): New.
(on_end_quote): New.
(pp_format): Refactor phase 1 and phase 2 quoting support, moving
it to calls to on_begin_quote and on_end_quote.
(struct auto_obstack): New.
(quoting_info::handle_phase_3): New.
(pp_output_formatted_text): Add urlifier param.  Use it if there
is deferred urlification.  Delete m_quotes.
(selftest::pp_printf_with_urlifier): Pass urlifier to
pp_output_formatted_text.
(selftest::test_urlification): Update results for the existing
case of quoted text stradding chunks; add more such test cases.
* pretty-print.h (class quoting_info): New forward decl.
(chunk_info::m_quotes): New field.
(pp_output_formatted_text): Add optional urlifier param.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

pretty-print: add selftest coverage for numbered args

No functional change intended.

gcc/ChangeLog:
* pretty-print.cc (selftest::test_pp_format): Add selftest
coverage for numbered args.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

OpenMP: Fix g++.dg/gomp/bad-array-section-10.C for C++23 and up

This patch adjusts diagnostic output for C++23 and above for the test
case mentioned in the commit title.

2024-01-10 Julian Brown <julian@codesourcery.com>

gcc/testsuite/
* g++.dg/gomp/bad-array-section-10.C: Adjust diagnostics for C++23 and
up.