git.ipfire.org Git - thirdparty/gcc.git/log

aarch64: Re-enable ldp/stp fusion pass

Since, to the best of my knowledge, all reported regressions related to
the ldp/stp fusion pass have now been fixed, and PGO+LTO bootstrap with
--enable-languages=all is working again with the passes enabled, this
patch turns the passes back on by default, as agreed with Jakub here:

https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642478.html

gcc/ChangeLog:

* config/aarch64/aarch64.opt (-mearly-ldp-fusion): Set default
to 1.
(-mlate-ldp-fusion): Likewise.

middle-end: rename main_exit_p in reduction code.

This renamed main_exit_p to last_val_reduc_p to more accurately
reflect what the value is calculating.

gcc/ChangeLog:

* tree-vect-loop.cc (vect_get_vect_def,
vect_create_epilog_for_reduction): Rename main_exit_p to
last_val_reduc_p.

middle-end: fix epilog reductions when vector iters peeled [PR113364]

This fixes a bug where vect_create_epilog_for_reduction does not handle the
case where all exits are early exits. In this case we should do like induction
handling code does and not have a main exit.

This shows that some new miscompiles are happening (stage3 is likely miscompiled)
but that's unrelated to this patch and I'll look at it next.

gcc/ChangeLog:

PR tree-optimization/113364
* tree-vect-loop.cc (vect_create_epilog_for_reduction): If all exits all
early exits then we must reduce from the first offset for all of them.

gcc/testsuite/ChangeLog:

PR tree-optimization/113364
* gcc.dg/vect/vect-early-break_107-pr113364.c: New test.

libgomp.texi: Document omp_pause_resource{,_all} and omp_target_memcpy*

libgomp/ChangeLog:

* libgomp.texi (Runtime Library Routines): Document
omp_pause_resource, omp_pause_resource_all and
omp_target_memcpy{,_rect}{,_async}.

Co-authored-by: Sandra Loosemore <sandra@codesourcery.com>
Signed-off-by: Tobias Burnus <tburnus@baylibre.com>

libstdc++: [_Hashtable] Remove useless check for _M_before_begin node

When removing the first node of a bucket it is useless to check if this bucket
is the one containing the _M_before_begin node. The bucket before-begin node is
already transfered to the next pointed-to bucket regardeless if it is the container
before-begin node.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_Hahstable<>::_M_remove_bucket_begin): Remove
_M_before_begin check and cleanup implementation.

Co-authored-by: Théo Papadopoulo <papadopoulo@gmail.com>

RISC-V: Add regression test for vsetvl bug pr113429

The reduced testcase for pr113429 (cam4 failure) needed additional
modules so it wasn't committed.
The fuzzer found a c testcase that was also fixed with pr113429's fix.
Adding it as a regression test.

PR target/113429

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr113429.c: New test.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: Fix large memory usage of VSETVL PASS [PR113495]

SPEC 2017 wrf benchmark expose unreasonble memory usage of VSETVL PASS
that is, VSETVL PASS consume over 33 GB memory which make use impossible
to compile SPEC 2017 wrf in a laptop.

The root cause is wasting-memory variables:

unsigned num_exprs = num_bbs * num_regs;
sbitmap *avl_def_loc = sbitmap_vector_alloc (num_bbs, num_exprs);
sbitmap *m_kill = sbitmap_vector_alloc (num_bbs, num_exprs);
m_avl_def_in = sbitmap_vector_alloc (num_bbs, num_exprs);
m_avl_def_out = sbitmap_vector_alloc (num_bbs, num_exprs);

I find that compute_avl_def_data can be achieved by RTL_SSA framework.
Replace the code implementation base on RTL_SSA framework.

After this patch, the memory-hog issue is fixed.

simple vsetvl memory usage (valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out)
is 1.673 GB.

lazy vsetvl memory usage (valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out)
is 2.441 GB.

Tested on both RV32 and RV64, no regression.

gcc/ChangeLog:

PR target/113495
* config/riscv/riscv-vsetvl.cc (get_expr_id): Remove.
(get_regno): Ditto.
(get_bb_index): Ditto.
(pre_vsetvl::compute_avl_def_data): Ditto.
(pre_vsetvl::earliest_fuse_vsetvl_info): Fix large memory usage.
(pre_vsetvl::pre_global_vsetvl_info): Ditto.

gcc/testsuite/ChangeLog:

PR target/113495
* gcc.target/riscv/rvv/vsetvl/avl_single-107.c: Adapt test.

Daily bump.

testsuite: Disable new test for PR113292 on targets without TLS support

This disables the new test added by r14-8168 on machines that don't have
TLS support, such as bare-metal ARM.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr113292_c.C: Require TLS.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++: -Wdangling-reference and lambda false warning [PR109640]

-Wdangling-reference checks if a function receives a temporary as its
argument, and only warns if any of the arguments was a temporary. But
we should not warn when the temporary represents a lambda or we generate
false positives as in the attached testcases.

PR c++/113256
PR c++/111607
PR c++/109640

gcc/cp/ChangeLog:

* call.cc (do_warn_dangling_reference): Don't warn if the temporary
is of lambda type.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wdangling-reference14.C: New test.
* g++.dg/warn/Wdangling-reference15.C: New test.
* g++.dg/warn/Wdangling-reference16.C: New test.

MAINTAINERS: Update my email address

ChangeLog:

* MAINTAINERS: Update my email address.

Signed-off-by: Tobias Burnus <tburnus@baylibre.com>

c: Call c_fully_fold on __atomic_* operands in atomic_bitint_fetch_using_cas_loop [PR113518]

As the following testcase shows, I forgot to call c_fully_fold on the
__atomic_*/__sync_* operands called on _BitInt address, the expressions
are then used inside of TARGET_EXPR initializers etc. and are never fully
folded later, which means we can ICE e.g. on C_MAYBE_CONST_EXPR trees
inside of those.

The following patch fixes it, while the function currently is only called
in the C FE because C++ doesn't support BITINT_TYPE, I think guarding the
calls on !c_dialect_cxx () is safer.

2024-01-23 Jakub Jelinek <jakub@redhat.com>

PR c/113518
* c-common.cc (atomic_bitint_fetch_using_cas_loop): Call c_fully_fold
on lhs_addr, val and model for C.

* gcc.dg/bitint-77.c: New test.

aarch64/expr: Use ccmp when the outer expression is used twice [PR100942]

Ccmp is not used if the result of the and/ior is used by both
a GIMPLE_COND and a GIMPLE_ASSIGN. This improves the code generation
here by using ccmp in this case.
Two changes is required, first we need to allow the outer statement's
result be used more than once.
The second change is that during the expansion of the gimple, we need
to try using ccmp. This is needed because we don't use expand the ssa
name of the lhs but rather expand directly from the gimple.

A small note on the ccmp_4.c testcase, we should be able to get slightly
better than with this patch but it is one extra instruction compared to
before.

PR target/100942

gcc/ChangeLog:

* ccmp.cc (ccmp_candidate_p): Add outer argument.
Allow if the outer is true and the lhs is used more
than once.
(expand_ccmp_expr): Update call to ccmp_candidate_p.
* expr.h (expand_expr_real_gassign): Declare.
* expr.cc (expand_expr_real_gassign): New function, split out from...
(expand_expr_real_1): ...here.
* cfgexpand.cc (expand_gimple_stmt_1): Use expand_expr_real_gassign.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/ccmp_3.c: New test.
* gcc.target/aarch64/ccmp_4.c: New test.
* gcc.target/aarch64/ccmp_5.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Co-Authored-By: Richard Sandiford <richard.sandiford@arm.com>

Update my email in MAINTAINERS

ChangeLog:

* MAINTAINERS: Update

Signed-off-by: Andrew Stubbs <ams@baylibre.com>

aarch64: Fix up debug uses in ldp/stp pass [PR113089]

As the PR shows, we were missing code to update debug uses in the
load/store pair fusion pass.  This patch fixes that.

The patch tries to give a complete treatment of the debug uses that will
be affected by the changes we make, and in particular makes an effort to
preserve debug info where possible, e.g. when re-ordering an update of
a base register by a constant over a debug use of that register.  When
re-ordering loads over a debug use of a transfer register, we reset the
debug insn.  Likewise when re-ordering stores over debug uses of mem.

While doing this I noticed that try_promote_writeback used a strange
choice of move_range for the pair insn, in that it chose the previous
nondebug insn instead of the insn itself.  Since the insn is being
changed, these move ranges are equivalent (at least in terms of nondebug
insn placement as far as RTL-SSA is concerned), but I think it is more
natural to choose the pair insn itself.  This is needed to avoid
incorrectly updating some debug uses.

gcc/ChangeLog:

PR target/113089
* config/aarch64/aarch64-ldp-fusion.cc (reset_debug_use): New.
(fixup_debug_use): New.
(fixup_debug_uses_trailing_add): New.
(fixup_debug_uses): New. Use it ...
(ldp_bb_info::fuse_pair): ... here.
(try_promote_writeback): Call fixup_debug_uses_trailing_add to
fix up debug uses of the base register that are affected by
folding in the trailing add insn.

gcc/testsuite/ChangeLog:

PR target/113089
* gcc.c-torture/compile/pr113089.c: New test.

aarch64: Re-parent trailing nondebug base reg uses [PR113089]

While working on PR113089, I realised we where missing code to re-parent
trailing nondebug uses of the base register in the case of cancelling
writeback in the load/store pair pass. This patch fixes that.

gcc/ChangeLog:

PR target/113089
* config/aarch64/aarch64-ldp-fusion.cc (ldp_bb_info::fuse_pair):
Update trailing nondebug uses of the base register in the case
of cancelling writeback.

rtl-ssa: Provide easier access to debug uses [PR113089]

This patch adds some accessors to set_info and use_info to make it
easier to get at and iterate through uses in debug insns.

It is used by the aarch64 load/store pair fusion pass in a subsequent
patch to fix PR113089, i.e. to update debug uses in the pass.

gcc/ChangeLog:

PR target/113089
* rtl-ssa/accesses.h (use_info::next_debug_insn_use): New.
(debug_insn_use_iterator): New.
(set_info::first_debug_insn_use): New.
(set_info::debug_insn_uses): New.
* rtl-ssa/member-fns.inl (use_info::next_debug_insn_use): New.
(set_info::first_debug_insn_use): New.
(set_info::debug_insn_uses): New.

aarch64: Don't record hazards against paired insns [PR113356]

For the testcase in the PR, we try to pair insns where the first has
writeback and the second uses the updated base register.  This causes us
to record a hazard against the second insn, thus narrowing the move
range away from the end of the BB.

However, it isn't meaningful to record hazards against the other insn
in the pair, as this doesn't change which pairs can be formed, and also
doesn't change where the pair is formed (from the perspective of
nondebug insns).

To see why this is the case, consider the two cases:

- Suppoe we are finding hazards for insns[0].  If we record a hazard
   against insns[1], then range.last becomes
   insns[1]->prev_nondebug_insn (), but note that this is equivalent to
   inserting after insns[1] (since insns[1] is being changed).
- Now consider finding hazards for insns[1].  Suppose we record
   insns[0] as a hazard.  Then we set range.first = insns[0], which is a
   no-op.

As such, it seems better to never record hazards against the other insn
in the pair, as we check whether the insns themselves are suitable for
combination separately (e.g. for ldp checking that they use distinct
transfer registers).  Avoiding unnecessarily narrowing the move range
avoids unnecessarily re-ordering over debug insns.

This should also mean that we can only narrow the move range away from
the end of the BB in the case that we record a hazard for insns[0]
against insns[1]->prev_nondebug_insn () or earlier.  This means that for
the non-call-exceptions case, either the move range includes insns[1],
or we reject the pair (thus the assert tripped in the PR should always
hold).

gcc/ChangeLog:

PR target/113356
* config/aarch64/aarch64-ldp-fusion.cc (ldp_bb_info::try_fuse_pair):
Don't record hazards against the opposite insn in the pair.

gcc/testsuite/ChangeLog:

PR target/113356
* gcc.target/aarch64/pr113356.C: New test.

Update year in Gnatvsn

gcc/ada/
* gnatvsn.ads: Update year.

LoongArch: testsuite: Disable stack protector for got-load.C

When building GCC with --enable-default-ssp, the stack protector is
enabled for got-load.C, causing additional GOT loads for
__stack_chk_guard. So mem/u will be matched more than 2 times and the
test will fail.

Disable stack protector to fix this issue.

gcc/testsuite:

* g++.target/loongarch/got-load.C (dg-options): Add
-fno-stack-protector.

Ifdef `.hidden`, `.type`, and `.size` pseudo-ops for `aarch64-w64-mingw32` target

Recent
change (https://gcc.gnu.org/pipermail/gcc-cvs/2023-December/394915.html)
added a generic SME support using `.hidden`, `.type`, and ``.size`
pseudo-ops in the assembly sources, `aarch64-w64-mingw32` does not
support the pseudo-ops though. This patch wraps usage of those
pseudo-ops using macros and ifdefs them for `__ELF__` define.

libgcc/
* config/aarch64/aarch64-asm.h (HIDDEN, SYMBOL_SIZE, SYMBOL_TYPE)
(ENTRY_ALIGN, GNU_PROPERTY): New macros.
* config/aarch64/__arm_sme_state.S: Use them.
* config/aarch64/__arm_tpidr2_save.S: Likewise.
* config/aarch64/__arm_za_disable.S: Likewise.
* config/aarch64/crti.S: Likewise.
* config/aarch64/lse.S: Likewise.

gcc.dg/torture/pr113255.c: Fix ia32 test failure

Fix ia32 test failure:

FAIL: gcc.dg/torture/pr113255.c -O1 (test for excess errors)
Excess errors:
cc1: error: '-mstringop-strategy=rep_8byte' not supported for 32-bit code

PR rtl-optimization/113255
* gcc.dg/torture/pr113255.c (dg-additional-options): Add only
if not ia32.

m2: Use time_t in time and don't redefine alloca

Fix the m2 build warning and error:

[...]
../../src/gcc/m2/mc/mc.flex:32:9: warning: "alloca" redefined
   32 | #define alloca __builtin_alloca
      |         ^~~~~~
In file included from /usr/include/stdlib.h:587,
                 from <stdout>:22:
/usr/include/alloca.h:35:10: note: this is the location of the previous definition
   35 | # define alloca(size)   __builtin_alloca (size)
      |          ^~~~~~
../../src/gcc/m2/mc/mc.flex: In function 'handleDate':
../../src/gcc/m2/mc/mc.flex:333:25: error: passing argument 1 of 'time' from incompatible point
er type [-Wincompatible-pointer-types]
  333 |   time_t  clock = time ((long *)0);
      |                         ^~~~~~~~~
      |                         |
      |                         long int *
In file included from ../../src/gcc/m2/mc/mc.flex:28:
/usr/include/time.h:76:29: note: expected 'time_t *' {aka 'long long int *'} but argument is of
type 'long int *'
   76 | extern time_t time (time_t *__timer) __THROW;

PR bootstrap/113554
* mc/mc.flex (alloca): Don't redefine.
(handleDate): Replace (long *)0 with (time_t *)0 when calling
time.

aarch64: Fix up uses of mem following stp insert [PR113070]

As the PR shows (specifically #c7) we are missing updating uses of mem
when inserting an stp in the aarch64 load/store pair fusion pass.  This
patch fixes that.

RTL-SSA has a simple view of memory and by default doesn't allow stores
to be re-ordered w.r.t. other stores.  In the ldp fusion pass, we do our
own alias analysis and so can re-order stores over other accesses when
we deem this is safe.  If neither store can be re-purposed (moved into
the required position to form the stp while respecting the RTL-SSA
constraints), then we turn both the candidate stores into "tombstone"
insns (logically delete them) and insert a new stp insn.

As it stands, we implement the insert case separately (after dealing
with the candidate stores) in fuse_pair by inserting into the middle of
the vector of changes.  This is OK when we only have to insert one
change, but with this fix we would need to insert the change for the new
stp plus multiple changes to fix up uses of mem (note the number of
fix-ups is naturally bounded by the alias limit param to prevent
quadratic behaviour).  If we kept the code structured as is and inserted
into the middle of the vector, that would lead to repeated moving of
elements in the vector which seems inefficient.  The structure of the
code would also be a little unwieldy.

To improve on that situation, this patch introduces a helper class,
stp_change_builder, which implements a state machine that helps to build
the required changes directly in program order.  That state machine is
reponsible for deciding what changes need to be made in what order, and
the code in fuse_pair then simply follows those steps.

Together with the fix in the previous patch for installing new defs
correctly in RTL-SSA, this fixes PR113070.

We take the opportunity to rename the function decide_stp_strategy to
try_repurpose_store, as that seems more descriptive of what it actually
does, since stp_change_builder is now responsible for the overall change
strategy.

gcc/ChangeLog:

PR target/113070
* config/aarch64/aarch64-ldp-fusion.cc
(struct stp_change_builder): New.
(decide_stp_strategy): Reanme to ...
(try_repurpose_store): ... this.
(ldp_bb_info::fuse_pair): Refactor to use stp_change_builder to
construct stp changes.  Fix up uses when inserting new stp insns.

rtl-ssa: Ensure new defs get inserted [PR113070]

In r14-5820-ga49befbd2c783e751dc2110b544fe540eb7e33eb I added support to
RTL-SSA for inserting new insns, which included support for users
creating new defs.

However, I missed that apply_changes_to_insn needed updating to ensure
that the new defs actually got inserted into the main def chain.  This
meant that when the aarch64 ldp/stp pass inserted a new stp insn, the
stp would just get skipped over during subsequent alias analysis, as its
def never got inserted into the memory def chain.  This (unsurprisingly)
led to wrong code.

This patch fixes the issue by ensuring new user-created defs get
inserted.  I would have preferred to have used a flag internal to the
defs instead of a separate data structure to keep track of them, but since
machine_mode increased to 16 bits we're already at 64 bits in access_info,
and we can't really reuse m_is_temp as the logic in finalize_new_accesses
requires it to get cleared.

gcc/ChangeLog:

PR target/113070
* rtl-ssa.h: Include hash-set.h.
* rtl-ssa/changes.cc (function_info::finalize_new_accesses): Add
new_sets parameter and use it to keep track of new user-created sets.
(function_info::apply_changes_to_insn): Also call add_def on new sets.
(function_info::change_insns): Add hash_set to keep track of new
user-created defs.  Plumb it through.
* rtl-ssa/functions.h: Add hash_set parameter to finalize_new_accesses and
apply_changes_to_insn.

rtl-ssa: Support for creating new uses [PR113070]

This exposes an interface for users to create new uses in RTL-SSA.
This is needed for updating uses after inserting a new store pair insn
in the aarch64 load/store pair fusion pass.

gcc/ChangeLog:

PR target/113070
* rtl-ssa/accesses.cc (function_info::create_use): New.
* rtl-ssa/changes.cc (function_info::finalize_new_accesses):
Ensure new uses end up referring to permanent defs.
* rtl-ssa/functions.h (function_info::create_use): Declare.

rtl-ssa: Run finalize_new_accesses forwards [PR113070]

The next patch in this series exposes an interface for creating new uses
in RTL-SSA.  The intent is that new user-created uses can consume new
user-created defs in the same change group.  This is so that we can
correctly update uses of memory when inserting a new store pair insn in
the aarch64 load/store pair fusion pass (the affected uses need to
consume the new store pair insn).

As it stands, finalize_new_accesses is called as part of the backwards
insn placement loop within change_insns, but if we want new uses to be
able to depend on new defs in the same change group, we need
finalize_new_accesses to be called on earlier insns first.  This is so
that when we process temporary uses and turn them into permanent uses,
we can follow the last_def link on the temporary def to ensure we end up
with a permanent use consuming a permanent def.

gcc/ChangeLog:

PR target/113070
* rtl-ssa/changes.cc (function_info::change_insns): Split out the call
to finalize_new_accesses from the backwards placement loop, run it
forwards in a separate loop.

tree-optimization/113552 - fix num_call accounting in simd clone vectorization

The following avoids using exact_log2 on the number of SIMD clone calls
to be emitted when vectorizing calls since that can easily be not
a power of two in which case it will return -1. For different simd
clones the number of calls will differ by a multiply with a power of two
only so using floor_log2 is good enough here.

PR tree-optimization/113552
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Use
floor_log2 instead of exact_log2 on the number of calls.

ia64: Fix up -Wunused-parameter warning

Since r14-6945-gc659dd8bfb55e02a1b97407c1c28f7a0e8f7f09b
there is a warning
../../gcc/config/ia64/ia64.cc: In function ‘void ia64_start_function(FILE*, const char*, tree)’:
../../gcc/config/ia64/ia64.cc:3889:59: warning: unused parameter ‘decl’ [-Wunused-parameter]
3889 | ia64_start_function (FILE *file, const char *fnname, tree decl)
      |                                                      ~~~~~^~~~
which presumably for bootstraps breaks the bootstrap.
While the decl parameter is passed to the ASM_OUTPUT_FUNCTION_LABEL macro,
that macro actually doesn't use that argument, so the removal of
ATTRIBUTE_UNUSED was incorrect.
This patch reverts the first ia64.cc hunk from r14-6945.

2024-01-23  Jeff Law  <jlaw@ventanamicro.com>
    Jakub Jelinek  <jakub@redhat.com>

* config/ia64/ia64.cc (ia64_start_function): Add ATTRIBUTE_UNUSED to
decl.

Refactor exit PHI handling in vectorizer epilogue peeling

This refactors the handling of PHIs inbetween the main and the
epilogue loop. Instead of trying to handle the multiple exit
and original single exit case together the following separates
these cases resulting in much easier to understand code.

* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
Separate single and multi-exit case when creating PHIs between
the main and epilogue.

aarch64: Avoid registering duplicate C++ overloads [PR112989]

In the original fix for this PR, I'd made sure that
including <arm_sme.h> didn't reach the final return in
simulate_builtin_function_decl (which would indicate duplicate
function definitions). But it seems I forgot to do the same
thing for C++, which defines all of its overloads directly.

This patch fixes a case where we still recorded duplicate
functions for C++. Thanks to Iain for reporting the resulting
GC ICE and for help with reproducing it.

gcc/
PR target/112989
* config/aarch64/aarch64-sve-builtins-shapes.cc (build_one): Skip
MODE_single variants of functions that don't take tuple arguments.

aarch64: Don't assert recog success in ldp/stp pass [PR113114]

The PR shows two different cases where try_promote_writeback produces an
RTL pattern which isn't recognized.  Currently this leads to an ICE, as
we assert recog success, but I think it's better just to back out of the
changes gracefully if recog fails (as we do in the main fuse_pair case).

In theory since we check the ranges here recog shouldn't fail (which is
why I had the assert in the first place), but the PR shows an edge case
in the patterns where if we form a pre-writeback pair where the
writeback offset is exactly -S, where S is the size in bytes of one
transfer register, we fail to match the expected pattern as the patterns
look explicitly for plus operands in the mems.  I think fixing this
would require adding at least four new special-case patterns to
aarch64.md for what doesn't seem to be a particularly useful variant of
the insns.  Even if we were to do that, I think it would be GCC 15
material, and it's better to just punt for GCC 14.

The ILP32 case in the PR is a bit different, as that shows us trying to
combine a pair with DImode base register operands in the mems together
with an SImode trailing update of the base register.  This leads to us
forming an RTL pattern which references the base register in both SImode
and DImode, which also fails to recog.  Again, I think it's best just to
take the missed optimization for now.  If we really want to make this
(try_promote_writeback) work for ILP32, we can try to do it for GCC 15.

gcc/ChangeLog:

PR target/113114
* config/aarch64/aarch64-ldp-fusion.cc (try_promote_writeback):
Don't assert recog success, just punt if the writeback pair
isn't recognized.

gcc/testsuite/ChangeLog:

PR target/113114
* gcc.c-torture/compile/pr113114.c: New test.
* gcc.target/aarch64/pr113114.c: New test.

gcn: Fix a warning

I see
../../gcc/config/gcn/gcn.cc: In function ‘void gcn_hsa_declare_function_name(FILE*, const char*, tree)’:
../../gcc/config/gcn/gcn.cc:6568:67: warning: unused parameter ‘decl’ [-Wunused-parameter]
6568 | gcn_hsa_declare_function_name (FILE *file, const char *name, tree decl)
| ~~~~~^~~~
warning presumably since r14-6945-gc659dd8bfb55e02a1b97407c1c28f7a0e8f7f09b
Previously, the argument was anonymous, but now it is passed to a macro
which ignores it, so I think we should go with ATTRIBUTE_UNUSED.

2024-01-23 Jakub Jelinek <jakub@redhat.com>

* config/gcn/gcn.cc (gcn_hsa_declare_function_name): Add
ATTRIBUTE_UNUSED to decl.

debug/107058 - gracefully handle unexpected DIE contexts

While the bug is persisting that LTO streaming picks up a CONST_DECL
from an attribute argument on a VAR_DECL which with -fdebug-type-section
refers to a DIE in a type unit we can handle this gracefully, at least
with -fno-checking. Do so. The C++ frontend nevetheless should resolve
the CONST_DECL attribute argument to a constant.

PR debug/107058
* dwarf2out.cc (dwarf2out_die_ref_for_decl): Gracefully
handle unexpected but bogus DIE contexts when not checking
enabled.

* c-c++-common/pr107058.c: New testcase.

c++: Fix handling of extern templates in modules [PR112820]

Currently, extern templates are detected by looking for the
DECL_EXTERNAL flag on a TYPE_DECL. However, this is incorrect:
TYPE_DECLs don't actually set this flag, and it happens to work by
coincidence due to TYPE_DECL_SUPPRESS_DEBUG happening to use the same
underlying bit. This however causes issues with other TYPE_DECLs that
also happen to have suppressed debug information.

Instead, this patch reworks the logic so CLASSTYPE_INTERFACE_ONLY is
always emitted into the module BMI and can then be used to check for an
extern template correctly.

Otherwise, for other declarations we always want to redetermine this:
even for declarations from the GMF, we may change our mind on whether to
import or export depending on decisions made later in the TU after
importing so we shouldn't decide this now, or necessarily reuse what the
module we'd imported had decided.

Some of this may need to change in the future to account for
https://github.com/itanium-cxx-abi/cxx-abi/issues/170.

PR c++/112820
PR c++/102607

gcc/cp/ChangeLog:

* module.cc (trees_out::lang_type_bools): Write interface_only
and interface_unknown.
(trees_in::lang_type_bools): Read the above flags.
(trees_in::decl_value): Reset CLASSTYPE_INTERFACE_* except for
extern templates.
(trees_in::read_class_def): Remove buggy extern template
handling.

gcc/testsuite/ChangeLog:

* g++.dg/modules/debug-2_a.C: New test.
* g++.dg/modules/debug-2_b.C: New test.
* g++.dg/modules/debug-2_c.C: New test.
* g++.dg/modules/debug-3_a.C: New test.
* g++.dg/modules/debug-3_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

fold-const: Fold larger VIEW_CONVERT_EXPRs [PR113462]

On Mon, Jan 22, 2024 at 11:27:52AM +0100, Richard Biener wrote:
> We run into
>
> static tree
> native_interpret_int (tree type, const unsigned char *ptr, int len)
> {
> ...
>   if (total_bytes > len
>       || total_bytes * BITS_PER_UNIT > HOST_BITS_PER_DOUBLE_INT)
>     return NULL_TREE;
>
> OTOH using a V_C_E to "truncate" a _BitInt looks wrong?  OTOH the
> check doesn't really handle native_encode_expr using the "proper"
> wide_int encoding however that's exactly handled.  So it might be
> a pre-existing issue that's only uncovered by large _BitInts
> (__int128 might show similar issues?)

I guess the || total_bytes * BITS_PER_UNIT > HOST_BITS_PER_DOUBLE_INT
conditions make no sense, all we care is whether it fits in the buffer
or not.
But then there is
fold_view_convert_expr
(and other spots) which use
  /* We support up to 1024-bit values (for GCN/RISC-V V128QImode).  */
  unsigned char buffer[128];
or something similar.
This patch fixes even that by using a XALLOCAVEC allocated buffer
if the type size is 129 .. 8192 bytes.

2024-01-22  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/113462
* fold-const.cc (native_interpret_int): Don't punt if total_bytes
is larger than HOST_BITS_PER_DOUBLE_INT / BITS_PER_UNIT.
(fold_view_convert_expr): Use XALLOCAVEC buffers for types with
sizes between 129 and 8192 bytes.

LoongArch: Disable explicit reloc for TLS LD/GD with -mexplicit-relocs=auto

Binutils 2.42 supports TLS LD/GD relaxation which requires the assembler
macro.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
If la_opt_explicit_relocs is EXPLICIT_RELOCS_AUTO, return false
for SYMBOL_TLS_LDM and SYMBOL_TLS_GD.
(loongarch_call_tls_get_addr): Do not split symbols of
SYMBOL_TLS_LDM or SYMBOL_TLS_GD if la_opt_explicit_relocs is
EXPLICIT_RELOCS_AUTO.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c: Check
for la.tls.ld and la.tls.gd.

find_base_value part

The following adjusts find_base_value similar as to what
find_base_term was adjusted for PR113255.

* alias.cc (known_base_value_p): Remove.
(find_base_value): Remove PLUS/MINUS handling
when both operands are not CONST_INT_P.

rtl-optimization/113255 - base_alias_check vs. pointer difference

When the x86 backend generates code for cpymem with the rep_8byte
strathegy for the 8 byte aligned main rep movq it needs to compute
an adjusted pointer to the source after doing a prologue aligning
the destination.  It computes that via

  src_ptr + (dest_ptr - orig_dest_ptr)

which is perfectly fine.  On RTL this is then

    8: r134:DI=const(`g'+0x44)
    9: {r133:DI=frame:DI-0x4c;clobber flags:CC;}
      REG_UNUSED flags:CC
   56: r129:DI=const(`g'+0x4c)
   57: {r129:DI=r129:DI&0xfffffffffffffff8;clobber flags:CC;}
      REG_UNUSED flags:CC
      REG_EQUAL const(`g'+0x4c)&0xfffffffffffffff8
   58: {r118:DI=r134:DI-r129:DI;clobber flags:CC;}
      REG_DEAD r134:DI
      REG_UNUSED flags:CC
      REG_EQUAL const(`g'+0x44)-r129:DI
   59: {r119:DI=r133:DI-r118:DI;clobber flags:CC;}
      REG_DEAD r133:DI
      REG_UNUSED flags:CC

but as written find_base_term happily picks the first candidate
it finds for the MINUS which means it picks const(`g') rather
than the correct frame:DI.  This way find_base_term (but also
the unfixed find_base_value used by init_alias_analysis to
initialize REG_BASE_VALUE) performs pointer analysis isn't
sound.  The following restricts the handling of multi-operand
operations to the case we know only one can be a pointer.

This for example causes gcc.dg/tree-ssa/pr94969.c to miss some
RTL PRE (I've opened PR113395 for this).  A more drastic patch,
removing base_alias_check results in only gcc.dg/guality/pr41447-1.c
regressing (so testsuite coverage is bad).  I've looked at
gcc.dg/tree-ssa tests and mostly scheduling changes are present,
the cc1plus .text size is only 230 bytes worse.  With the this
less drastic patch below most scheduling changes are gone.

x86_64 might not the very best target to test for impact, but
test coverage on other targets is unlikely to be very much better.

PR rtl-optimization/113255
* alias.cc (find_base_term): Remove PLUS/MINUS handling
when both operands are not CONST_INT_P.

* gcc.dg/torture/pr113255.c: New testcase.

debug/112718 - reset all type units with -ffat-lto-objects

When mixing -flto, -ffat-lto-objects and -fdebug-type-section we
fail to reset all type units after early output resulting in an
ICE when attempting to add then duplicate sibling attributes.

PR debug/112718
* dwarf2out.cc (dwarf2out_finish): Reset all type units
for the fat part of an LTO compile.

* gcc.dg/debug/pr112718.c: New testcase.

LoongArch: doc: Add attribute descriptions defined in the target-supports.exp.

gcc/ChangeLog:

* doc/sourcebuild.texi: Add attributes for keywords.

compiler: don't pass iota value to lowering pass

It is no longer used. The iota value is now handled in the
determine-types pass.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/536644

Correct lists of options enabled by -Wall and -Wextra [PR90463]

gcc/ChangeLog
PR c++/90463
* doc/invoke.texi (Warning Options): Correct lists of options
enabled by -Wall and -Wextra by checking against common.opt
and c-family/c.opt.

Sort warning options in c-family/c.opt.

In spite of the plea "Please try to keep this file in ASCII collating
order" at the top of the file, the sorting of the entries for the
various -Wfoo options had increased in entropy. This made it hard to
correlate them against e.g. the list of options enabled by -Wall in
the manual (see PR90463). This patch is at least a step in the right
direction to restore order to the file.

I confirmed that no lines were added or removed by these changes, and
that the output of "gcc -Q --help=warnings" is unchanged both with or
without -Wall added to the command.

gcc/c-family/ChangeLog
* c.opt: Improve sorting of warning options.

Daily bump.

c++: extend Wdangling-reference17.C [PR109642]

This patch extends g++.dg/warn/Wdangling-reference17.C with code
from PR109642. I'm not creating a new test because this one
already #includes the required headers.

PR c++/109642

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wdangling-reference17.C: Additional testing.

Add -gno-strict-dwarf to dg-options in various btf enum tests

The -gno-strict-dwarf option is needed to ensure enum signedness
is added to type_die.

2024-01-22 John David Anglin <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

PR debug/113382
* gcc.dg/debug/btf/btf-bitfields-3.c: Add -gno-strict-dwarf
option to dg-options.
* gcc.dg/debug/btf/btf-enum-1.c: Likewise.
* gcc.dg/debug/btf/btf-enum-small.c: Likewise.
* gcc.dg/debug/btf/btf-enum64-1.c: Likewise.

arm: Fix parsecpu.awk for aliases [PR113030]

So the problem here is the 2 functions check_cpu and check_arch use
the wrong variable to check if an alias is valid for that cpu/arch.
check_cpu uses cpu_optaliases instead of cpu_opt_alias. cpu_optaliases
is an array of index'ed by the cpuname that contains all of the valid aliases
for that cpu but cpu_opt_alias is an double index array which is index'ed
by cpuname and the alias which provides what is the alias for that option.
Similar thing happens for check_arch and arch_optaliases vs arch_optaliases.

Tested by running:
```
awk -f config/arm/parsecpu.awk -v cmd="chkarch armv7-a+simd" config/arm/arm-cpus.in
awk -f config/arm/parsecpu.awk -v cmd="chkarch armv7-a+neon" config/arm/arm-cpus.in
awk -f config/arm/parsecpu.awk -v cmd="chkarch armv7-a+neon-vfpv3" config/arm/arm-cpus.in
```
And they don't return error back.

gcc/ChangeLog:

PR target/113030
* config/arm/parsecpu.awk (check_cpu): Use cpu_opt_alias
instead of cpu_optaliases.
(check_arch): Use arch_opt_alias instead of arch_optaliases.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

xfail libgomp.c/declare-variant-4-{fiji,gfx803}.c

Since r14-4734-g56ed1055b2f40ac162ae8d382280ac07a33f789f, GCC no longer
builds the Fiji (alias gfx803) libraries by default as support for it was
removed in ROCm 4.0 and will be removed in LLVM 18.

Thus, unless gfx803 is explicitly enabled, the following testcases will
fail to link as libgomp is not available for Fiji. Hence, this commit
xfails those testcases.

libgomp/ChangeLog:

* testsuite/libgomp.c/declare-variant-4-fiji.c: Xfail as fiji
support is no longer enabled by default.
* testsuite/libgomp.c/declare-variant-4-gfx803.c: Likewise.

Signed-off-by: Tobias Burnus <tburnus@baylibre.com>

RISC-V: Lower vmv.v.x (avl = 1) into vmv.s.x

Notice there is a AI benchmark, GCC vs Clang has 3% performance drop.

It's because Clang/LLVM has a simplification transform vmv.v.x (avl = 1) into vmv.s.x.

Since vmv.s.x has more flexible vsetvl demand than vmv.v.x that can allow us to have
better chances to fuse vsetvl.

Consider this following case:

void
foo (uint32_t *outputMat, uint32_t *inputMat)
{
  vuint32m1_t matRegIn0 = __riscv_vle32_v_u32m1 (inputMat, 4);
  vuint32m1_t matRegIn1 = __riscv_vle32_v_u32m1 (inputMat + 4, 4);
  vuint32m1_t matRegIn2 = __riscv_vle32_v_u32m1 (inputMat + 8, 4);
  vuint32m1_t matRegIn3 = __riscv_vle32_v_u32m1 (inputMat + 12, 4);

  vbool32_t oddMask
    = __riscv_vreinterpret_v_u32m1_b32 (__riscv_vmv_v_x_u32m1 (0xaaaa, 1));

  vuint32m1_t smallTransposeMat0
    = __riscv_vslideup_vx_u32m1_tumu (oddMask, matRegIn0, matRegIn1, 1, 4);
  vuint32m1_t smallTransposeMat2
    = __riscv_vslideup_vx_u32m1_tumu (oddMask, matRegIn2, matRegIn3, 1, 4);

  vuint32m1_t outMat0 = __riscv_vslideup_vx_u32m1_tu (smallTransposeMat0,
      smallTransposeMat2, 2, 4);

  __riscv_vse32_v_u32m1 (outputMat, outMat0, 4);
}

Before this patch:

        vsetivli        zero,4,e32,m1,ta,ma
        li      a5,45056
        addi    a2,a1,16
        addi    a3,a1,32
        addi    a4,a1,48
        vle32.v v1,0(a1)
        vle32.v v4,0(a2)
        vle32.v v2,0(a3)
        vle32.v v3,0(a4)
        addiw   a5,a5,-1366
        vsetivli        zero,1,e32,m1,ta,ma
        vmv.v.x v0,a5                         ---> Since it avl = 1, we can transform it into vmv.s.x
        vsetivli        zero,4,e32,m1,tu,mu
        vslideup.vi     v1,v4,1,v0.t
        vslideup.vi     v2,v3,1,v0.t
        vslideup.vi     v1,v2,2
        vse32.v v1,0(a0)
        ret

After this patch:

li a5,45056
addi a2,a1,16
vsetivli zero,4,e32,m1,tu,mu
addiw a5,a5,-1366
vle32.v v3,0(a2)
addi a3,a1,32
addi a4,a1,48
vle32.v v1,0(a1)
vmv.s.x v0,a5
vle32.v v2,0(a3)
vslideup.vi v1,v3,1,v0.t
vle32.v v3,0(a4)
vslideup.vi v2,v3,1,v0.t
vslideup.vi v1,v2,2
vse32.v v1,0(a0)
ret

Tested on both RV32 and RV64 no regression.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (splat_to_scalar_move_p): New function.
* config/riscv/riscv-v.cc (splat_to_scalar_move_p): Ditto.
* config/riscv/vector.md: Simplify vmv.v.x. into vmv.s.x.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/attribute-2.c: New test.
* gcc.target/riscv/rvv/vsetvl/attribute-3.c: New test.

libstdc++: Fix check in testsuite/std/time/clock/file/io.cc

The test_format() function contained an incorrect assertion but wasn't
actually being called from main.

libstdc++-v3/ChangeLog:

* testsuite/std/time/clock/file/io.cc: Fix expected result in
assertion and call test_format() from main.

RISC-V: Fix regressions due to 86de9b66480b710202a2898cf513db105d8c432f

This patch fixes the recent regression:

FAIL: gcc.dg/torture/float32-tg-2.c   -O1  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -O1  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c   -O3 -g  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -O3 -g  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c   -Os  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -Os  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -O1  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -O1  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -O2  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -O3 -g  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -O3 -g  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -Os  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -Os  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O1  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O1  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O2  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O3 -g  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O3 -g  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -Os  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -Os  (test for excess errors)

due to commit 86de9b66480b710202a2898cf513db105d8c432f.

The root cause is register_operand and reg_or_subregno are consistent so we reach the assertion fail.

We shouldn't worry about subreg:...VL_REGNUM since it's impossible that we can have such situation,
that is, we only have (set (reg) (reg:VL_REGNUM)) which generate "csrr vl" ASM for first fault load instructions (vleff).
So, using REG_P and REGNO must be totally solid and robostic.

Since we don't allow VL_RENUM involved into register allocation and we don't have such constraint, we always use this
following pattern to generate "csrr vl" ASM:

(define_insn "read_vlsi"
  [(set (match_operand:SI 0 "register_operand" "=r")
(reg:SI VL_REGNUM))]
  "TARGET_VECTOR"
  "csrr\t%0,vl"
  [(set_attr "type" "rdvl")
   (set_attr "mode" "SI")])

So the check in riscv.md is to disallow such situation fall into move pattern in riscv.md

Tested on both RV32/RV64 no regression.

PR target/109092

gcc/ChangeLog:

* config/riscv/riscv.md: Use reg instead of subreg.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr109092.c: New test.

[gcn] mkoffload: Fix linking with "-g"; fix file deletion; improve diagnostic [PR111966]

With debugging enabled, '*.mkoffload.dbg.o' files are generated. The e_flags
header of all *.o files must be the same - otherwise, the linker complains.
Since r14-4734-g56ed1055b2f40ac162ae8d382280ac07a33f789f the -march= default
is now gfx900. If compiling without any -march= flag, the default value is
used by the compiler but not passed to mkoffload. Hence, mkoffload.cc's uses
its own default for march - unfortunately, it still had gfx803/fiji as default,
leading to the linker error: 'incompatible mach'. Solution: Update the
default to gfx900.

While debugging it, I saw that /tmp/cc*.mkoffload.dbg.o kept accumulating;
there were a couple of issues with the handling:
* dbgobj was always added to files_to_cleanup
* If copy_early_debug_info returned true, dbgobj was added again
  -> pointless and in theory a race if the same file was added in the
     faction of a second.
* If copy_early_debug_info returned false,
  - In exactly one case, it already deleted the file it self
    (same potential race as above)
  - The pointer dbgobj was freed - such that files_to_cleanup contained
    a dangling pointer - probably the reason that stale files remained.
Solution: Only if copy_early_debug_info returns true, dbgobj is added to
files_to_cleanup. If it returns false, the file is unlinked before freeing
the pointer.

When compiling, GCC warned about several fatal_error messages as having
no %<...%> or %qs quotes. This patch now silences several of those warnings
by using those quotes.

gcc/ChangeLog:

PR other/111966
* config/gcn/mkoffload.cc (elf_arch): Change default to gfx900
to match the compiler default.
(simple_object_copy_lto_debug_sections): Never unlink the outfile
on error as the caller does so.
(maybe_unlink, compile_native): Use %<...%> and %qs in fatal_error.
(main): Likewise. Fix 'mkoffload.dbg.o' cleanup.

Signed-off-by: Tobias Burnus <tburnus@baylibre.com>

Update copyright years.

tree-optimization/113373 - add missing LC PHIs for live operations

The following makes reduction epilogue code generation happy by properly
adding LC PHIs to the exit blocks for multiple exit vectorized loops.

Some refactoring might make the flow easier to follow but I've refrained
from doing that with this patch.

I've kept some fixes in reduction epilogue generation from the earlier
attempt fixing this PR.

PR tree-optimization/113373
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
Create LC PHIs in the exit blocks where necessary.
* tree-vect-loop.cc (vectorizable_live_operation): Do not try
to handle missing LC PHIs.
(find_connected_edge): Remove.
(vect_create_epilog_for_reduction): Cleanup use of auto_vec.

* gcc.dg/vect/vect-early-break_104-pr113373.c: New testcase.

RISC-V: Fix vfirst/vmsbf/vmsif/vmsof ratio attributes

vfirst/vmsbf/vmsif/vmsof instructions are supposed to demand ratio instead of demanding sew_lmul.
But my previous typo makes VSETVL PASS miss honor the risc-v v spec.

Consider this following simple case:

int foo4 (void * in, void * out)
{
  vint32m1_t v = __riscv_vle32_v_i32m1 (in, 4);
  v = __riscv_vadd_vv_i32m1 (v, v, 4);
  vbool32_t mask = __riscv_vreinterpret_v_i32m1_b32(v);
  mask = __riscv_vmsof_m_b32(mask, 4);
  return __riscv_vfirst_m_b32(mask, 4);
}

Before this patch:

foo4:
        vsetivli        zero,4,e32,m1,ta,ma
        vle32.v v1,0(a0)
        vadd.vv v1,v1,v1
        vsetvli zero,zero,e8,mf4,ta,ma    ----> redundant.
        vmsof.m v2,v1
        vfirst.m        a0,v2
        ret

After this patch:

foo4:
vsetivli zero,4,e32,m1,ta,ma
vle32.v v1,0(a0)
vadd.vv v1,v1,v1
vmsof.m v2,v1
vfirst.m a0,v2
ret

Confirm RVV spec and Clang, this patch makes VSETVL PASS match the correct behavior.

Tested on both RV32/RV64, no regression.

gcc/ChangeLog:

* config/riscv/vector.md: Fix vfirst/vmsbf/vmsof ratio attributes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/attribute-1.c: New test.

RISC-V: Bugfix for resolve_overloaded_builtin[PR113420]

v2:
Avoid internal ICE for the case below.
vint8mf8_t test_vle8_v_i8mf8_m(vbool64_t vm, const int32_t *rs1, size_t vl) {
  return __riscv_vle8(vm, rs1, vl);
}

v1:
Change the hash value of overloaded intrinsic from considering
all parameter types to:
1. Encoding vector data type
2. In order to distinguish vle8_v_i8mf8_m(vbool64_t vm, const int8_t *rs1, size_t vl)
   and vle8_v_u8mf8_m(vbool64_t vm, const uint8_t *rs1, size_t vl), encode the pointer type
3. In order to distinguish vfadd_vv_f32mf2_rm(vfloat32mf2_t vs2, vfloat32mf2_t vs1, size_t vl)
   and vfadd_vv_f32mf2(vfloat32mf2_t vs2, vfloat32mf2_t vs1, size_t vl), encode the number of
   parameters. The same goes for the vxrm intrinsics.

PR target/113420

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins.cc (has_vxrm_or_frm_p):remove.
(registered_function::overloaded_hash):refactor.
(resolve_overloaded_builtin):avoid internal ICE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr113420-1.c: New test.
* gcc.target/riscv/rvv/base/pr113420-2.c: New test.

[committed] Adjust expectations for pr59533-1.c

The change for pr111267 twiddled code generation for sh/pr59533-1.c

We end up eliminating two comparisons, but require two shll instructions to do
so.  And in a couple places we're using an addc sequence rather than a subc
sequence.   This patch adjusts the expected codegen for the test as all those
are either a wash or a

The fwprop change does cause some code regressions on the same test.  I'll file
a distinct but for that issue.

gcc/testsuite
* gcc.target/sh/pr59533-1.c: Adjust expected output.

Daily bump.

[PATCH v3 2/2] RISC-V: Fix XCValu test

gcc/testsuite/ChangeLog:
* gcc.target/riscv/cv-alu-fail-compile.c: Change warning to error.

Re: [PATCH] Avoid ICE with m68k-elf -malign-int and libcalls

>> emit_library_call_value_1 calls emit_push_insn with NULL_TREE
>> for TYPE.  Sometimes emit_push_insn needs to assign a temp with
>> that TYPE, which causes a segfault.
>>
>> Fixed by computing the TYPE from MODE when needed.
>>
>> Original patch by Thorsten Otto.
>>
[ ... ]
> This really needs to happen in the two call paths which pass in
> NULL_TREE for the type.  Note how the type is used to determine padding
> earlier in emit_push_insn.  That would also make the code more
> consistent with the comment before emit_push_insn which implies that
> both MODE and TYPE are valid.
>
>
> Additionally you should bootstrap and regression test this patch on at
> least one target.

Updated as requested, and bootstrapped and tested on
{x86_64,aarch64,m68k}-linux-gnu without regressions.

gcc/

PR target/82420
PR target/111279
* calls.cc (emit_library_call_value_1): Pass valid TYPE
to emit_push_insn.
* expr.cc (emit_push_insn): Likewise.

gcc/testsuite/

PR target/82420
* gcc.target/m68k/pr82420.c: New test.

Co-authored-by: Thorsten Otto <admin@tho-otto.de>

Install right version of last change.

gcc/
* config/riscv/riscv.cc (riscv_init_cumulative_args): Install
correcction version of last change.

[committed] [NFC] Fix riscv_init_cumulative_args for unused arguments

The signature was still using ATTRIBUTE_UNUSED and actually marked one
of the used arguments with ATTRIBUTE_UNUSED.

This patch drops the decorations and instead remove the name of arguments
which are actually unused which is the preferred way to handle this now
when we can.

Bootstrapped. I didn't have test results on the platform where I
bootstrapped, so no results to compare against. Given its NFC, I
think we're OK without the regression results.

gcc/
* config/riscv/riscv.cc (riscv_init_cumulative_args): Update and
fix bugs in signature.

libstdc++: Fix std::format for floating-point chrono::time_point [PR113500]

Currently trying to use std::format with certain specializations of
std::chrono::time_point is ill-formed, due to one member function of the
__formatter_chrono type which tries to write a time_point to an ostream.
For sys_time<floating-point> or sys_time with a period greater than days
there is no operator<< that can be used.

That operator<< is only needed when using an empty chrono-specs in the
format string, like "{}", but the ill-formed expression gives an error
even if not actually used. This means it's not possible to format some
other specializations of chrono::time_point, even when using a non-empty
chrono-specs.

This fixes it by avoiding using 'os << t' for all chrono::time_point
specializations, and instead using std::format("{:L%F %T}", t). So that
we continue to reject std::format("{}", sys_time{1.0s}) a check for
empty chrono-specs is added to the formatter<sys_time<D>, C>
specialization.

While testing this I noticed that the output for %S with a
floating-point duration was incorrect, as the subseconds part was being
appended to the seconds without a decimal point, and without the correct
number of leading zeros.

libstdc++-v3/ChangeLog:

PR libstdc++/113500
* include/bits/chrono_io.h (__formatter_chrono::_M_S): Fix
printing of subseconds with floating-point rep.
(__formatter_chrono::_M_format_to_ostream): Do not write
time_point specializations directly to the ostream.
(formatter<chrono::sys_time<D>, C>::parse): Do not allow an
empty chrono-spec if the type fails to meet the constraints for
writing to an ostream with operator<<.
* testsuite/std/time/clock/file/io.cc: Check formatting
non-integral times with empty chrono-specs.
* testsuite/std/time/clock/gps/io.cc: Likewise.
* testsuite/std/time/clock/utc/io.cc: Likewise.
* testsuite/std/time/hh_mm_ss/io.cc: Likewise.

libstdc++: Fix std::chrono::file_clock conversions for low-precision times

THe std::chrono::file_clock conversions were not using common_type and
so failed to compile when converting anything that should have increased
precision after arithmetic with a std::chrono::seconds value.

libstdc++-v3/ChangeLog:

* include/bits/chrono.h (__file_clock::from_sys)
(__file_clock::to_sys, __file_clock::_S_from_sys)
(__file_clock::_S_to_sys): Use common_type for return type.
* testsuite/std/time/clock/file/members.cc: Check round trip
conversion for time with lower precision that seconds.

PR rtl-optimization/111267: Improved forward propagation.

This patch resolves PR rtl-optimization/111267 by improving RTL-level
forward propagation.  This x86_64 code quality regression was caused
(exposed) by my changes to improve how x86's (TImode) argument passing
is represented at the RTL-level (reducing the use of SUBREGs to catch
more optimization opportunities in combine).  The pitfall is that the
more complex RTL representations expose a limitation in RTL's fwprop
pass.

At the heart of fwprop, in try_fwprop_subst_pattern, the logic can
be summarized as three steps.  Step 1 is a heuristic that rejects the
propagation attempt if the expression is too complex, step 2 calls
the backend's recog to see if the propagated/simplified instruction
is recognizable/valid, and step 3 then calls set_src_cost to compare
the rtx costs of the replacement vs. the original, and accepts the
transformation if the final cost is the same of better.

The logic error (or missed optimization opportunity) is that the
step 1 heuristic that attempts to predict (second guess) the
process is flawed.  Ultimately the decision on whether to fwprop
or not should depend solely on actual improvement, as measured
by RTX costs.  Hence the prototype fix in the bugzilla PR removes
the heuristic of calling prop.profitable_p entirely, relying
entirely on the cost comparison in step 3.

Unfortunately, things are a tiny bit more complicated.  The cost
comparison in fwprop uses the older set_src_cost API and not the
newer (preffered) insn_cost API as currently used in combine.
This means that the cost improvement comparisons are only done
for single_set instructions (more complex PARALLELs etc. aren't
supported).  Hence we can only rely on skipping step 1 for that
subset of instructions actually evaluated by step 3.

The other subtlety is that to avoid potential infinite loops
in fwprop we should only reply purely on rtx costs when the
transformation is obviously an improvement.  If the replacement
has the same cost as the original, we can use the prop.profitable_p
test to preserve the current behavior.

Finally, to answer Richard Biener's remaining question about this
approach: yes, there is an asymmetry between how patterns are
handled and how REG_EQUAL notes are handled.  For example, at
the moment propagation into notes doesn't use rtx costs at all,
and ultimately when fwprop is updated to use insn_cost, this
(and recog) obviously isn't applicable to notes.  There's no reason
the logic need be identical between patterns and notes, and during
stage4 we only need update propagation into patterns to fix this
P1 regression (notes and use of cost_insn can be done for GCC 15).

For Jakub's reduced testcase:

struct S { float a, b, c, d; };
int bar (struct S x, struct S y) {
  return x.b <= y.d && x.c >= y.a;
}

On x86_64-pc-linux-gnu with -O2 gcc currently generates:

bar:    movq    %xmm2, %rdx
        movq    %xmm3, %rax
        movq    %xmm0, %rsi
        xchgq   %rdx, %rax
        movq    %rsi, %rcx
        movq    %rax, %rsi
        movq    %rdx, %rax
        shrq    $32, %rcx
        shrq    $32, %rax
        movd    %ecx, %xmm4
        movd    %eax, %xmm0
        comiss  %xmm4, %xmm0
        jb      .L6
        movd    %esi, %xmm0
        xorl    %eax, %eax
        comiss  %xmm0, %xmm1
        setnb   %al
        ret
.L6: xorl    %eax, %eax
        ret

with this simple patch to fwprop, we now generate:

bar: shufps  $85, %xmm0, %xmm0
        shufps  $85, %xmm3, %xmm3
        comiss  %xmm0, %xmm3
        jb      .L6
        xorl    %eax, %eax
        comiss  %xmm2, %xmm1
        setnb   %al
        ret
.L6: xorl    %eax, %eax
        ret

2024-01-21  Roger Sayle  <roger@nextmovesoftware.com>
    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
PR rtl-optimization/111267
* fwprop.cc (fwprop_propagation::profitabe_p): Rename
profitable_p method to likely_profitable_p.
(try_fwprop_subst_node): Update call to likely_profitable_p.
Only bail-out early when !prop.likely_profitable_p for instructions
that are not single sets.  When comparing costs, bail-out if the
cost is unchanged and !prop.likely_profitable_p.

gcc/testsuite/ChangeLog
PR rtl-optimization/111267
* gcc.target/i386/pr111267.c: New test case.

Make the manual clearer about what options -Wunused enables [PR90464]

gcc/ChangeLog
PR c++/90464
* doc/invoke.texi (Warning Options): Document that -Wunused-parameter
isn't enabled by -Wunused unless -Wextra is provided, and that
-Wunused does enable -Wunused-const-variable=1 for C. Clarify that
-Wunused doesn't enable -Wunused-* options documented as behaving
otherwise, and list them explicitly.

Fortran: passing of optional scalar arguments with VALUE attribute [PR113377]

gcc/fortran/ChangeLog:

PR fortran/113377
* trans-expr.cc (gfc_conv_procedure_call): Fix handling of optional
scalar arguments of intrinsic type with the VALUE attribute.

gcc/testsuite/ChangeLog:

PR fortran/113377
* gfortran.dg/optional_absent_9.f90: New test.

C23: Fix ICE for composite type for structs with unsigned bitfields [PR113492]

This patch fixes a bug when forming a composite type from structs that
contain an unsigned bitfield declared with int while using -funsigned-bitfields.
In such structs the unsigned integer type was not compatible to the
regular unsigned integer type used elsewhere in the C FE.

PR c/113492
gcc/c:
* c-decl.cc (grokdeclarator): Use c_common_unsigned_type instead of
unsigned_type_for to create the unsigned type for bitfields declared
with int when using -funsigned-bitfields.

gcc/testsuite:
* gcc.dg/pr113492.c: New test.

libstdc++: Fix std::format floating-point alternate forms [PR113512]

The logic for handling '#' forms was ... not good. The count of
significant figures just counted digits, instead of ignoring leading
zeros. And when moving the result from the stack buffer to a dynamic
string the exponent could get lost in some cases.

libstdc++-v3/ChangeLog:

PR libstdc++/113512
* include/std/format (__formatter_fp::format): Fix logic for
alternate forms.
* testsuite/std/format/functions/format.cc: Check buggy cases of
alternate forms with g presentation type.

Clean up examples for -Wdangling-pointer [PR109708]

gcc/ChangeLog
PR c/109708
* doc/invoke.texi (Warning Options): Fix broken example and
clean up/reorganize the others. Also describe what the short-form
options mean.

Daily bump.

Remove several xfails for 32-bit hppa*-*-*

These arise because 32-bit ELF targets were changed from
callee copies to caller copies.

2024-01-20 John David Anglin <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

* gcc.dg/ipa/iinline-4.c: Remove dg-final xfail for
32-bit hppa*-*-*.
* gcc.dg/ipa/inline-5.c: Likewise.
* gcc.dg/ipa/ipcp-cstagg-7.c: Likewise.
* gcc.dg/tree-ssa/vector-4.c: Likewise.

Increase timeout by 2 in libgomp.fortran/alloc-comp-3.f90 on hppa*-*-*

2024-01-20 John David Anglin <danglin@gcc.gnu.org>

libgomp/ChangeLog:

* testsuite/libgomp.fortran/alloc-comp-3.f90: Increase
timeout by 2 on hppa*-*-*.

Don't run libgomp.c/simd-math-1.c on hppa*-*-hpux*

hppa*-*-hpux* lacks necessary math functions.

2024-01-20 John David Anglin <danglin@gcc.gnu.org>

libgomp/ChangeLog:

* testsuite/libgomp.c/simd-math-1.c: Don't run on
hppa*-*-hpux*.

xfail scan-tree-dump-times checks on hppa*64*-*-* in gcc.dg/tree-ssa/slsr-13.c

2024-01-20 John David Anglin <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/slsr-13.c: xfail scan-tree-dump-times
checks on hppa*64*-*-*.

Require target lra in gcc.dg/torture/pr110422.c

LRA is required for asm goto.

2024-01-20 John David Anglin <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr110422.c: Require target lra.

libstdc++: suppress -Wdangling-reference with operator| [PR111410]

It seems to me that we should exclude std::ranges::views::__adaptor::operator|
from the -Wdangling-reference warning. It's commonly used when handling
ranges.

PR c++/111410

libstdc++-v3/ChangeLog:

* include/std/ranges: Add #pragma to disable -Wdangling-reference with
std::ranges::views::__adaptor::operator|.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wdangling-reference17.C: New test.

ipa: Add testcase for already fixed case [PR110705]

This testcase was fixed with r13-1695-gb0f02eeb906b63 which
added an Ada testcase for the issue but adding a C testcase
is a good idea and that is what this does.

Committed after making sure it passes on x86_64-linux-gnu.

PR ipa/110705

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr110705-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

fortran: Restore current interface info on error [PR111291]

This change is a followup to the fix for PR48776 (namely
r14-3572-gd58150452976c4ca65ddc811fac78ef956fa96b0 AKA
fortran: Restore interface to its previous state on error [PR48776]),
which cleaned up new changes from interfaces upon error.

Unfortunately, there is one case in that fix that is mishandled, visible
on unexpected_interface.f90 with valgrind or an asan-instrumented gfortran.
when an interface statement is found while parsing an interface body (which
is invalid), the current interface is replaced by the one from the new
statement, and as parsing continues, new procedures are added
to the new interface, which has been rejected and freed, instead of the
original one.

This change restores the current interface pointer to its previous value
on each rejected statement.

PR fortran/48776
PR fortran/111291

gcc/fortran/ChangeLog:

* parse.cc: Restore current interface to its previous value on error.

Correct documentation for -Warray-parameter [PR102998]

gcc/ChangeLog
PR c/102998
* doc/invoke.texi (Option Summary): Add -Warray-parameter.
(Warning Options): Correct/edit discussion of -Warray-parameter
to make the first example less confusing, and fill in missing info.

lower-bitint: Handle INTEGER_CST rhs1 in handle_cast [PR113462]

The following patch ICEs because fre3 leaves around unfolded
_1 = VIEW_CONVERT_EXPR<_BitInt(129)>(0);
statement and in handle_cast I was expecting just SSA_NAMEs for the
large/huge _BitInt to large/huge _BitInt casts; INTEGER_CST is something
we can handle in that case exactly the same, as the handle_operand recursion
handles those.

Of course, maybe we should also try to fold_stmt such cases somewhere in
bitint lowering preparation.

2024-01-20 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/113462
* gimple-lower-bitint.cc (bitint_large_huge::handle_cast):
Handle rhs1 INTEGER_CST like SSA_NAME.

* gcc.dg/bitint-76.c: New test.

tree-switch-conversion: Bugfixes for _BitInt [PR113491]

The following patch fixes various issues with large/huge _BitInt used as switch
expressions.
In particular:
1) the indexes in CONSTRUCTORs shouldn't be types with precision larger than
   sizetype precision, varasm uses wi::to_offset on those and too large
   indexes ICE; we've already checked earlier that the cluster is at most
   sizetype bits and arrays can't be larger than that anyway
2) some spots were using SCALAR_INT_TYPE_MODE or
   lang_hooks.types.type_for_mode on TYPE_MODE to determine types to use,
   that obviously doesn't work for the large/huge BITINT_TYPE
3) like the recent change in the C FE, this patch makes sure we don't create
   ARRAY_REFs with indexes with precision above sizetype precision, because
   bitint lowering isn't prepared for that and because the indexes can't be
   larger than sizetype anyway; the subtraction of the cluster minimum from
   the index obviously needs to be done in unsigned __int128 or large/huge
   BITINT_TYPE, but then we cast to sizetype if the precision is larger than
   sizetype

2024-01-20  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/113491
* tree-switch-conversion.cc (switch_conversion::build_constructors):
If elt.index has precision higher than sizetype, fold_convert it to
sizetype.
(switch_conversion::array_value_type): Return type if type is
BITINT_TYPE with precision above MAX_FIXED_MODE_SIZE or with BLKmode.
(switch_conversion::build_arrays): Use unsigned_type_for rather than
lang_hooks.types.type_for_mode if utype is BITINT_TYPE with precision
above MAX_FIXED_MODE_SIZE or with BLKmode.  If utype has precision
higher than sizetype, use sizetype as tidx type and fold_convert the
subtraction to sizetype.

* gcc.dg/torture/bitint-51.c: New test.

RISC-V: Suppress warning

../../gcc/config/riscv/riscv.cc: In function 'void riscv_init_cumulative_args(CUMULATIVE_ARGS*, tree, rtx, tree, int)':
../../gcc/config/riscv/riscv.cc:4879:34: error: unused parameter 'fndecl' [-Werror=unused-parameter]
4879 |                             tree fndecl,
      |                             ~~~~~^~~~~~
../../gcc/config/riscv/riscv.cc: In function 'bool riscv_vector_mode_supported_any_target_p(machine_mode)':
../../gcc/config/riscv/riscv.cc:10537:56: error: unused parameter 'mode' [-Werror=unused-parameter]
10537 | riscv_vector_mode_supported_any_target_p (machine_mode mode)
      |                                           ~~~~~~~~~~~~~^~~~
cc1plus: all warnings being treated as errors
make[3]: *** [Makefile:2559: riscv.o] Error 1

Suppress these warnings.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_init_cumulative_args): Suppress warning.
(riscv_vector_mode_supported_any_target_p): Ditto.

Daily bump.

[PATCH] Avoid ICE on m68k -fzero-call-used-regs -fpic [PR110934]

PR110934 is a problem on m68k where -fzero-call-used-regs -fpic ICEs
when clearing an FP register.

The generic code generates an XFmode move of zero to that register,
which becomes an XFmode load from initialized data, which due to -fpic
uses a non-constant address, which the backend rejects. The
zero-call-used-regs pass runs very late, after register allocation and
frame layout, and at that point we can't allow new uses of the PIC
register or new pseudos.

To clear an FP register on m68k it's enough to do the move in SFmode,
but the generic code can't be told to do that, so this patch updates
m68k to use its own TARGET_ZERO_CALL_USED_REGS.

Bootstrapped and regression tested on m68k-linux-gnu.

Ok for master? (I don't have commit rights.)

gcc/

PR target/110934
* config/m68k/m68k.cc (m68k_zero_call_used_regs): New function.
(TARGET_ZERO_CALL_USED_REGS): Define.

gcc/testsuite/

PR target/110934
* gcc.target/m68k/pr110934.c: New test.

[PATCH] Avoid ICE in single-bit logical RMWs on m68k-uclinux [PR108640]

When generating RMW logical operations on m68k, the backend
recognizes single-bit operations and rewrites them as bit
instructions on operands adjusted to address the intended byte.
When offsetting the addresses the backend keeps the modes as
SImode, even though the actual access will be in QImode.

The uclinux target defines M68K_OFFSETS_MUST_BE_WITHIN_SECTIONS_P
which adds a check that the adjusted operand is within the bounds
of the original object. Since the address has been offset it is
not, and the compiler ICEs.

The bug is that the modes of the adjusted operands should have been
narrowed to QImode, which is that this patch does. Nearby code
which narrows to HImode gets that right.

Bootstrapped and regression tested on m68k-linux-gnu.

Ok for master? (Note: I don't have commit rights.)

gcc/

PR target/108640
* config/m68k/m68k.cc (output_andsi3): Use QImode for
address adjusted for 1-byte RMW access.
(output_iorsi3): Likewise.
(output_xorsi3): Likewise.

gcc/testsuite/

PR target/108640
* gcc.target/m68k/pr108640.c: New test.

libgccjit: Add missing builtins needed by optimizations

gcc/jit/ChangeLog:

* jit-builtins.cc (ensure_optimization_builtins_exist): Add
popcount builtins.

gcc/testsuite/ChangeLog:

* jit.dg/all-non-failing-tests.h: New test.
* jit.dg/test-popcount.c: New test.

libgccjit: Make is_int return false on vector types

gcc/jit/ChangeLog:

* jit-recording.h (is_numeric_vector, vector_type::new_int): New
functions.
* libgccjit.cc (gcc_jit_context_new_unary_op,
gcc_jit_context_new_binary_op): add checks for
is_numeric_vector.

gcc/testsuite/ChangeLog:

* jit.dg/test-reflection.c: Add check to make sure
gcc_jit_type_is_integral returns 0 on a vector type.

Fortran: fix wrong array bounds check [PR113471]

gcc/fortran/ChangeLog:

PR fortran/113471
* trans-array.cc (array_bound_check_elemental): Array bounds check
shall apply here to elemental dimensions of an array section only.

gcc/testsuite/ChangeLog:

PR fortran/113471
* gfortran.dg/bounds_check_24.f90: New test.

c++: requires and using-decl [PR113498]

get_template_info was crashing because it assumed that any decl with
DECL_LANG_SPECIFIC could use DECL_TEMPLATE_INFO. It's more complicated than
that.

PR c++/113498

gcc/cp/ChangeLog:

* pt.cc (decl_template_info): New fn.
(get_template_info): Use it.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-using4.C: New test.

libstdc++: Do not use CTAD for _Utf32_view alias template (redux)

My change in r14-8181-g665a3ff1539ce2 was incomplete as there's a second
place using CTAD with the _Utf32_view alias template. This fixes it.

libstdc++-v3/ChangeLog:

* include/std/format (_Spec::_M_parse_fill_and_align): Do not
use CTAD for _Utf32_view.

libstdc++: Fix P2255R2 dangling checks for std::tuple in C++17 [PR108822]

I accidentally used && in a fold-expression instead of || which meant
that in C++17 the tuple(UElements&&...) constructor only failed its
debug assertion if all tuple elements were dangling references. Some
missing tests (noted as "TODO") meant this wasn't tested.

This fixes the fold expression and adds the missing tests.

libstdc++-v3/ChangeLog:

PR libstdc++/108822
* include/std/tuple (__glibcxx_no_dangling_refs) [C++17]: Fix
wrong fold-operator.
* testsuite/20_util/tuple/dangling_ref.cc: Check tuples with one
element and three elements. Check allocator-extended
constructors.

c++: alias template argument conversion [PR112632]

We've had a problem with lost conversions to template parameter types for a
while now; looking at this PR, it occurred to me that the problem is really
with alias (and concept) templates, since we do substitution of dependent
arguments into them in a way that we don't for other templates. And fixing
that specific problem is a lot simpler than adding IMPLICIT_CONV_EXPR around
all dependent template arguments the way I gave up on for 111357.

The other part of the fix was changing tsubst_expr to actually call
convert_nontype_argument instead of assuming it will eventually happen.

I waffled about stripping the forced conversion when !force_conv
vs. skipping them in iterative_hash_template_arg and
template_args_equal (like we already do for some other conversions) and
decided to go with the former, but that isn't a strong preference if it
turns out to be somehow problematic.

PR c++/112632
PR c++/112594
PR c++/111357
PR c++/104594
PR c++/67898

gcc/cp/ChangeLog:

* cp-tree.h (IMPLICIT_CONV_EXPR_FORCED): New.
* pt.cc (expand_integer_pack): Remove 111357 workaround.
(maybe_convert_nontype_argument): Add force parm.
(convert_template_argument): Handle alias template args
specially.
(tsubst_expr): Don't ignore IMPLICIT_CONV_EXPR_NONTYPE_ARG.
* error.cc (dump_expr) [CASE_CONVERT]: Handle null optype.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/alias-decl-nontype1.C: New test.
* g++.dg/cpp2a/concepts-narrowing1.C: New test.
* g++.dg/cpp2a/nontype-class63.C: New test.
* g++.dg/cpp2a/nontype-class63a.C: New test.

modula2: tidyup gcc/m2/gm2-compiler/M2GenGCC.mod remove unused parameters/vars

This patch removes unused parameters and local variables from
M2GenGCC.mod. It required ForeachScopeBlockDo2 to be implemented and
exported affecting any module indirectly calling ConvertQuadsToTree.

gcc/m2/ChangeLog:

* gm2-compiler/M2BasicBlock.mod (InitBasicBlocks): Rename
ForeachScopeBlockDo to ForeachScopeBlockDo3.
* gm2-compiler/M2Code.mod: Import ForeachScopeBlockDo2.
(OptimizeScopeBlock): Call ForeachScopeBlockDo3 for
procedures with three parameters and ForeachScopeBlockDo2
for two parameters.
(CodeBlock): Ditto.
* gm2-compiler/M2GCCDeclare.mod (DeclareTypesConstantsProcedures):
Rename ForeachScopeBlockDo to ForeachScopeBlockDo3.
* gm2-compiler/M2GenGCC.def (ConvertQuadsToTree): Remove Scope
parameter.
* gm2-compiler/M2GenGCC.mod (ConvertQuadsToTree): Remove Scope
parameter.
(MaybeDebugBuiltinMemcpy): Remove parameter tok.
(MaybeDebugBuiltinMemset): Remove.
(MakeCopyUse): Remove tokenno from call to
MaybeDebugBuiltinMemcpy.
(PerformFoldBecomes): Remove desloc and exprloc.
(checkArrayElements): Remove location. Remove virtpos
as a parameter to MaybeDebugBuiltinMemcpy.
(NoWalkProcedure): Add attribute unused.
(CheckElementSetTypes): Remove parameter p.
Remove CurrentQuadToken in call to MaybeDebugBuiltinMemcpy.
Remove NoWalkProcedure from call to CheckElementSetTypes.
Remove tokenno from call to MaybeDebugBuiltinMemcpy.
* gm2-compiler/M2Optimize.mod (RemoveProcedures): Replace
two parameter indirect procedure iterator with
ForeachScopeBlockDo2.
* gm2-compiler/M2SSA.mod: Remove ForeachScopeBlockDo.
* gm2-compiler/M2Scope.def (ForeachScopeBlockDo2): New
declaration.
(ForeachScopeBlockDo): Rename ...
(ForeachScopeBlockDo3): ... to this.
(ScopeProcedure2): New declaration.
* gm2-compiler/M2Scope.mod (ForeachScopeBlockDo2): New
procedure.
(ForeachScopeBlockDo): Rename ...
(ForeachScopeBlockDo3): ... to this.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

Limit dg-xfail-run-if for *-*-hpux11.[012]* to -O0

2024-01-19 John David Anglin <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr47917.c: Limit dg-xfail-run-if for
hpux11.[012]* to -O0.

Change dg-options for hpux to define _HPUX_SOURCE in gcc.dg/pthread-init-2.c

Pthreads on hpux needs _HPUX_SOURCE define for id_t and spu_t types.

2024-01-19 John David Anglin <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

* gcc.dg/pthread-init-2.c: Change dg-options for hpux
to define _HPUX_SOURCE.

Only xfail gcc.dg/pr84877.c on 32-bit hppa*-*-*

2024-01-19 John David Anglin <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

* gcc.dg/pr84877.c: Only xfail on 32-bit hppa*-*-*.

Skip gcc.dg/analyzer/pr94688.c on hppa*64*-*-*

2024-01-19 John David Anglin <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

PR analyzer/112705
* gcc.dg/analyzer/pr94688.c: Skip on hppa*64*-*-*.

libstdc++: Add <print> and <text_encoding> to stdc++.h

libstdc++-v3/ChangeLog:

* include/precompiled/stdc++.h [_GLIBCXX_HOSTED]: Include
<print> and <text_encoding> for C++23 and C++26 respectively.