Andrew MacLeod [Fri, 1 Nov 2024 14:56:54 +0000 (10:56 -0400)]
Make fur_edge accessible.
Move the decl of fur_edge out of the source file into the header file.
* gimple-range-fold.cc (class fur_edge): Relocate from here.
(fur_edge::fur_edge): Also move to:
* gimple-range-fold.h (class fur_edge): Relocate to here.
(fur_edge::fur_edge): Likewise.
Jakub Jelinek [Mon, 4 Nov 2024 11:29:01 +0000 (12:29 +0100)]
libstdc++: Fix up 117406.cc test [PR117406]
Christophe mentioned in bugzilla that the test FAILs on aarch64,
I'm not including <climits> and use INT_MAX.
Apparently during my testing I got it because the test preinclude
-include bits/stdc++.h
and that includes <climits>, dunno why that didn't happen on aarch64.
In any case, either I can add #include <climits>, or because the
test already has #include <limits> I've changed uses of INT_MAX
with std::numeric_limits<int>::max(), that should be the same thing.
But if you prefer
#include <climits>
I can surely add that instead.
2024-11-04 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/117406
* testsuite/26_numerics/headers/cmath/117406.cc: Use
std::numeric_limits<int>::max() instead of INT_MAX.
Jakub Jelinek [Sat, 2 Nov 2024 17:48:54 +0000 (18:48 +0100)]
libstdc++: Fix up std::{,b}float16_t std::{ilogb,l{,l}r{ound,int}} [PR117406]
These overloads incorrectly cast the result of the float __builtin_*
to _Float or __gnu_cxx::__bfloat16_t. For std::ilogb that changes
behavior for the INT_MAX return because that isn't representable in
either of the floating point formats, for the others it is I think
just a very inefficient hop from int/long/long long to std::{,b}float16_t
and back. I mean for the round/rint cases, either the argument is small
and then the return value should be representable in the floating point
format too, or it is too large that the argument is already integral
and then it should just return the argument with the round trips.
Too large value is unspecified unlike ilogb.
2024-11-02 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/117406
* include/c_global/cmath (std::ilogb(_Float16), std::llrint(_Float16),
std::llround(_Float16), std::lrint(_Float16), std::lround(_Float16)):
Don't cast __builtin_* return to _Float16.
(std::ilogb(__gnu_cxx::__bfloat16_t),
std::llrint(__gnu_cxx::__bfloat16_t),
std::llround(__gnu_cxx::__bfloat16_t),
std::lrint(__gnu_cxx::__bfloat16_t),
std::lround(__gnu_cxx::__bfloat16_t)): Don't cast __builtin_* return to
__gnu_cxx::__bfloat16_t.
* testsuite/26_numerics/headers/cmath/117406.cc: New test.
Jakub Jelinek [Thu, 31 Oct 2024 09:52:56 +0000 (10:52 +0100)]
expand: Fix up expansion of VIEW_CONVERT_EXPR to BITINT_TYPE [PR117354]
The following testcase ICEs, because when trying to expand the
VIEW_CONVERT_EXPR operand which is SSA_NAME defined to
V32QI or V4DI MEM_REF which is aligned just to 8 bytes we force
it as unaligned into a register, but then try to call extract_bit_field
from the V32QI or V4DI register to BLKmode. extract_bit_field doesn't
obviously support BLKmode extraction and so ICEs.
The second hunk fixes the ICE by not calling extract_bit_field when
it can't handle it, the last if will handle it properly by storing
it to memory and using BLKmode access to the copy.
The first hunk is an optimization, if mode is BLKmode, by setting
inner_reference_p argument to expand_expr_real we avoid the
expand_misaligned_mem_ref calls which load it from memory into a register.
2024-10-31 Jakub Jelinek <jakub@redhat.com>
PR middle-end/117354
* expr.cc (expand_expr_real_1) <case VIEW_CONVERT_EXPR>: Pass
true as inner_reference_p argument to expand_expr_real if
mode is BLKmode. Don't call extract_bit_field if mode is BLKmode.
Jakub Jelinek [Wed, 30 Oct 2024 08:59:22 +0000 (09:59 +0100)]
function: Call do_pending_stack_adjust in assign_parms [PR117296]
Functions called by assign_parms call emit_block_move in two places,
so on some targets can be expanded as calls and can result in pending
stack adjustment.
Now, during expansion we normally call do_pending_stack_adjust at the end
of expansion of each basic block or before emitting code that will branch
and/or has labels, and when emitting labels we assert that there are no
pending stack adjustments.
assign_parms is expanded before the first basic block and if the first
basic block starts with a label and at least one of those emit_block_move
calls resulted in the need of pending stack adjustments, we ICE when
emitting that label.
The following patch fixes that by calling do_pending_stack_adjust after
after the assign_parms potential emit_block_move calls.
Jakub Jelinek [Tue, 29 Oct 2024 10:14:12 +0000 (11:14 +0100)]
libstdc++: Use if consteval rather than if (std::__is_constant_evaluated()) for {,b}float16_t nextafter [PR117321]
The nextafter_c++23.cc testcase fails to link at -O0.
The problem is that eventhough std::__is_constant_evaluated() has
always_inline attribute, that at -O0 just means that we inline the
call, but its result is still assigned to a temporary which is tested
later, nothing at -O0 propagates that false into the if and optimizes
away the if body. And the __builtin_nextafterf16{,b} calls are meant
to be used solely for constant evaluation, the C libraries don't
define nextafterf16 these days.
As __STDCPP_FLOAT16_T__ and __STDCPP_BFLOAT16_T__ are predefined right
now only by GCC, not by clang which doesn't implement the extended floating
point types paper, and as they are predefined in C++23 and later modes only,
I think we can just use if consteval which is folded already during the FE
and the body isn't included even at -O0. I've added a feature test for
that just in case clang implements those and implements those in some weird
way. Note, if (__builtin_is_constant_evaluted()) would work correctly too,
that is also folded to false at gimplification time and the corresponding
if block not emitted at all. But for -O0 it can't be wrapped into a helper
inline function.
2024-10-29 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/117321
* include/c_global/cmath (nextafter(_Float16, _Float16)): Use
if consteval rather than if (std::__is_constant_evaluated()) around
the __builtin_nextafterf16 call.
(nextafter(__gnu_cxx::__bfloat16_t, __gnu_cxx::__bfloat16_t)): Use
if consteval rather than if (std::__is_constant_evaluated()) around
the __builtin_nextafterf16b call.
* testsuite/26_numerics/headers/cmath/117321.cc: New test.
Eric Botcazou [Fri, 16 Aug 2024 14:03:30 +0000 (16:03 +0200)]
ada: Fix internal error on concatenation of discriminant-dependent component
This only occurs with optimization enabled, but the expanded code is always
wrong because it reuses the formal parameter of an initialization procedure
associated with a discriminant (a discriminal in GNAT parlance) outside of
the initialization procedure.
gcc/ada/
* checks.adb (Selected_Length_Checks.Get_E_Length): For a
component of a record with discriminants and if the expression is
a selected component, try to build an actual subtype from its
prefix instead of from the discriminal.
Paul Thomas [Fri, 25 Oct 2024 16:59:03 +0000 (17:59 +0100)]
Fortran: Fix ICE with structure constructor in data statement [PR79685]
2024-10-25 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/79685
* decl.cc (match_data_constant): Find the symtree instead of
the symbol so the use renamed symbols are found. Pass this and
the derived type to gfc_match_structure_constructor.
* match.h: Update prototype of gfc_match_structure_contructor.
* primary.cc (gfc_match_structure_constructor): Remove call to
gfc_get_ha_sym_tree and use caller supplied symtree instead.
gcc/testsuite/
PR fortran/79685
* gfortran.dg/use_rename_13.f90: New test.
Hongyu Wang [Wed, 7 Feb 2024 06:42:58 +0000 (14:42 +0800)]
[APX PPX] Avoid generating unmatched pushp/popp in pro/epilogue
According to APX spec, the pushp/popp pairs should be matched,
otherwise the PPX hint cannot take effect and cause performance loss.
In the ix86_expand_epilogue, there are several optimizations that may
cause the epilogue using mov to restore the regs. Check if PPX applied
and prevent usage of mov/leave in the epilogue. Also do not use PPX
for eh_return.
gcc/ChangeLog:
* config/i386/i386.cc (ix86_expand_prologue): Set apx_ppx_used
flag in m.fs with TARGET_APX_PPX && !crtl->calls_eh_return.
(ix86_emit_save_regs): Emit ppx is available only when
TARGET_APX_PPX && !crtl->calls_eh_return.
(ix86_expand_epilogue): Don't restore reg using mov when
apx_ppx_used flag is true.
* config/i386/i386.h (struct machine_frame_state):
Add apx_ppx_used flag.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-ppx-2.c: New test.
* gcc.target/i386/apx-ppx-3.c: Likewise.
The current code was based on an early version of the SME spec,
which allowed the .Q forms of TRN1, TRN2, UZP1, UZP2, ZIP1, and ZIP2
to be used in streaming mode. We should now forbid them instead;
see https://developer.arm.com/documentation/ddi0602/2024-09/SVE-Instructions/TRN1--TRN2--vectors---Interleave-even-or-odd-elements-from-two-vectors-?lang=en
and the corresponding entries for the others.
Yangyu Chen [Thu, 31 Oct 2024 19:52:45 +0000 (19:52 +0000)]
Fix function multiversioning dispatcher link error with LTO
We forgot to apply DECL_EXTERNAL to __init_cpu_features_resolver decl. When
building with LTO, the linker cannot find the
__init_cpu_features_resolver.lto_priv* symbol, causing the link error.
This patch gets this fixed by adding DECL_EXTERNAL to the decl. To avoid used
but never defined warning for this symbol, we also mark TREE_PUBLIC to the decl.
We should also mark the decl having hidden visibility. And fix the attribute in
the same way for __aarch64_cpu_features identifier.
* config/aarch64/aarch64.cc (dispatch_function_versions): Adding
DECL_EXTERNAL, TREE_PUBLIC and hidden DECL_VISIBILITY to
__init_cpu_features_resolver and __aarch64_cpu_features.
David Malcolm [Wed, 30 Oct 2024 20:11:41 +0000 (16:11 -0400)]
jit: fix leak of pending_assemble_externals_set [PR117275]
My recent r15-4580-g779c0390e3b57d fix for resetting state in
varasm.cc introduced some noise to "make selftest-valgrind" and,
presumably, a memory leak in libgccjit:
==2462086== 160 (56 direct, 104 indirect) bytes in 1 blocks are definitely lost in loss record 248 of 352
==2462086== at 0x5270E7D: operator new(unsigned long) (vg_replace_malloc.c:342)
==2462086== by 0x1D1EB89: init_varasm_once() (varasm.cc:6806)
==2462086== by 0x181C845: backend_init() (toplev.cc:1826)
==2462086== by 0x181D41A: do_compile() (toplev.cc:2193)
==2462086== by 0x181D99C: toplev::main(int, char**) (toplev.cc:2371)
==2462086== by 0x378391D: main (main.cc:39)
Fixed thusly.
gcc/ChangeLog:
PR jit/117275
* varasm.cc (process_pending_assemble_externals): Reset
pending_assemble_externals_set to nullptr after deleting it.
(varasm_cc_finalize): Delete pending_assemble_externals_set.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
(cherry picked from commit 7f41203f08b9948c1c636dc9d66571121c6c7793) Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Wed, 30 Oct 2024 20:11:40 +0000 (16:11 -0400)]
jit: reset state in varasm.cc [PR117275]
PR jit/117275 reports various jit test failures seen on
powerpc64le-unknown-linux-gnu due to hitting this assertion
in varasm.cc on the 2nd compilation in a process:
#2 0x00007ffff63e67d0 in assemble_external_libcall (fun=0x7ffff2a4b1d8)
at ../../src/gcc/varasm.cc:2650
2650 gcc_assert (!pending_assemble_externals_processed);
(gdb) p pending_assemble_externals_processed
$1 = true
We're not properly resetting state in varasm.cc after a compile
for libgccjit.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
(cherry picked from commit 779c0390e3b57d1eebd41bbfe43d1f329c91de6c) Signed-off-by: David Malcolm <dmalcolm@redhat.com>
jit.dg/test-error-pr63969-missing-driver.c tries to break PATH and
verify that an error is generated when using an external driver.
However it does this by unsetting PATH, and so the test could
accidentally find the driver if the system supplies a default and the
driver happens to be installed in that path (reported as rhbz#2318021).
Fix the test by instead setting PATH to a bogus value.
gcc/testsuite/ChangeLog:
* jit.dg/test-error-pr63969-missing-driver.c (create_code): When
breaking PATH, use setenv with a bogus value, rather than
unsetenv, in case the system uses a default path that contains
the driver binary.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
(cherry picked from commit f8dcb559e615dbb4557a23363f9532a3544a7241) Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Alex Coplan [Wed, 30 Oct 2024 13:46:12 +0000 (13:46 +0000)]
aarch64: Assume alias conflict if common address reg changes [PR116783]
As the PR shows, pair fusion was tricking memory_modified_in_insn_p into
returning false when a common base register (in this case, x1) was
modified between the mem and the store insn. This lead to wrong code as
the accesses really did alias.
To avoid this sort of problem, this patch avoids invoking RTL alias
analysis altogether (and assume an alias conflict) if the two insns to
be compared share a common address register R, and the insns see different
definitions of R (i.e. it was modified in between).
PR rtl-optimization/116783
* config/aarch64/aarch64-ldp-fusion.cc
(def_walker::cand_addr_uses): New.
(def_walker::def_walker): Add parameter for candidate address
uses.
(def_walker::alias_conflict_p): Declare.
(def_walker::addr_reg_conflict_p): New.
(def_walker::conflict_p): New.
(store_walker::store_walker): Add parameter for candidate
address uses and pass to base ctor.
(store_walker::conflict_p): Rename to ...
(store_walker::alias_conflict_p): ... this.
(load_walker::load_walker): Add parameter for candidate
address uses and pass to base ctor.
(load_walker::conflict_p): Rename to ...
(load_walker::alias_conflict_p): ... this.
(ldp_bb_info::try_fuse_pair): Collect address register
uses for candidate insns and pass down to alias walkers.
gcc/testsuite/ChangeLog:
PR rtl-optimization/116783
* g++.dg/torture/pr116783.C: New test.
liuhongt [Tue, 29 Oct 2024 09:09:39 +0000 (02:09 -0700)]
Fix ICE due to subreg:us_truncate.
Force_operand issues an ICE when input
is (subreg:DI (us_truncate:V8QI)), it's probably because it's an
invalid rtx, So refine backend patterns for that.
gcc/ChangeLog:
PR target/117318
* config/i386/sse.md (*avx512vl_<code>v2div2qi2_mask_store_1):
Rename to ..
(avx512vl_<code>v2div2qi2_mask_store_1): .. this.
(avx512vl_<code>v2div2qi2_mask_store_2): Change to
define_expand.
(*avx512vl_<code><mode>v4qi2_mask_store_1): Rename to ..
(avx512vl_<code><mode>v4qi2_mask_store_1): .. this.
(avx512vl_<code><mode>v4qi2_mask_store_2): Change to
define_expand.
(*avx512vl_<code><mode>v8qi2_mask_store_1): Rename to ..
(avx512vl_<code><mode>v8qi2_mask_store_1): .. this.
(avx512vl_<code><mode>v8qi2_mask_store_2): Change to
define_expand.
(*avx512vl_<code><mode>v4hi2_mask_store_1): Rename to ..
(avx512vl_<code><mode>v4hi2_mask_store_1): .. this.
(avx512vl_<code><mode>v4hi2_mask_store_2): Change to
define_expand.
(*avx512vl_<code>v2div2hi2_mask_store_1): Rename to ..
(avx512vl_<code>v2div2hi2_mask_store_1): .. this.
(avx512vl_<code>v2div2hi2_mask_store_2): Change to
define_expand.
(*avx512vl_<code>v2div2si2_mask_store_1): Rename to ..
(avx512vl_<code>v2div2si2_mask_store_1): .. this.
(avx512vl_<code>v2div2si2_mask_store_2): Change to
define_expand.
(*avx512f_<code>v8div16qi2_mask_store_1): Rename to ..
(avx512f_<code>v8div16qi2_mask_store_1): .. this.
(avx512f_<code>v8div16qi2_mask_store_2): Change to
define_expand.
Eric Botcazou [Tue, 29 Oct 2024 20:40:34 +0000 (21:40 +0100)]
Fix miscompilation of function containing __builtin_unreachable
This is a wrong-code generation on the SPARC for a function containing
a call to __builtin_unreachable caused by the delay slot scheduling pass,
and more specifically the find_end_label function which has these lines:
/* Otherwise, see if there is a label at the end of the function. If there
is, it must be that RETURN insns aren't needed, so that is our return
label and we don't have to do anything else. */
The comment was correct 20 years ago but no longer is nowadays in the
presence of RTL epilogues and calls to __builtin_unreachable, so the
patch just removes the associated two lines of code:
else if (LABEL_P (insn))
*plabel = as_a <rtx_code_label *> (insn);
and otherwise contains just adjustments to the commentary.
gcc/
PR rtl-optimization/117327
* reorg.cc (find_end_label): Do not return a dangling label at the
end of the function and adjust commentary.
gcc/testsuite/
* gcc.c-torture/execute/20241029-1.c: New test.
Peter Bergner [Fri, 23 Aug 2024 16:45:40 +0000 (11:45 -0500)]
rs6000: Fix PTImode handling in power8 swap optimization pass [PR116415]
Our power8 swap optimization pass has some special handling for optimizing
swaps of TImode variables. The test case reported in bugzilla uses a call
to __atomic_compare_exchange, which introduces a variable of PTImode and
that does not get the same treatment as TImode leading to wrong code
generation. The simple fix is to treat PTImode identically to TImode.
2024-08-23 Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/116415
* config/rs6000/rs6000.h (TI_OR_PTI_MODE): New define.
* config/rs6000/rs6000-p8swap.cc (rs6000_analyze_swaps): Use it to
handle PTImode identically to TImode.
gcc/testsuite/
PR target/116415
* gcc.target/powerpc/pr116415.c: New test.
Jakub Jelinek [Fri, 25 Oct 2024 12:09:42 +0000 (14:09 +0200)]
Assorted --disable-checking fixes [PR117249]
We have currently 3 different definitions of gcc_assert macro, one used most
of the time (unless --disable-checking) which evaluates the condition at
runtime and also checks it at runtime, then one for --disable-checking GCC 4.5+
which looks like
((void)(UNLIKELY (!(EXPR)) ? __builtin_unreachable (), 0 : 0))
and a fallback one
((void)(0 && (EXPR)))
Now, the last one actually doesn't evaluate any of the side-effects in the
argument, just quiets up unused var/parameter warnings.
I've tried to replace the middle definition with
({ [[assume (EXPR)]]; (void) 0; })
for compilers which support assume attribute and statement expressions
(surprisingly quite a few spots use gcc_assert inside of comma expressions),
but ran into PR117287, so for now such a change isn't being proposed.
The following patch attempts to move important side-effects from gcc_assert
arguments.
Bootstrapped/regtested on x86_64-linux and i686-linux with normal
--enable-checking=yes,rtl,extra, plus additionally I've attempted to do
x86_64-linux bootstrap with --disable-checking and gcc_assert changed to the
((void)(0 && (EXPR)))
version when --disable-checking. That version ran into spurious middle-end
warnings
../../gcc/../include/libiberty.h:733:36: error: argument to ‘alloca’ is too large [-Werror=alloca-larger-than=]
../../gcc/tree-ssa-reassoc.cc:5659:20: note: in expansion of macro ‘XALLOCAVEC’
int op_num = ops.length ();
int op_normal_num = op_num;
gcc_assert (op_num > 0);
int stmt_num = op_num - 1;
gimple **stmts = XALLOCAVEC (gimple *, stmt_num);
where we have gcc_assert exactly to work-around middle-end warnings.
Guess I'd need to also disable -Werror for this experiment, which actually
isn't a problem with unmodified system.h, because even for
--disable-checking we use the __builtin_unreachable at least in
stage2/stage3 and so the warnings aren't emitted, and even if it used
[[assume ()]]; it would work too because in stage2/stage3 we could again
rely on assume and statement expression support.
2024-10-25 Jakub Jelinek <jakub@redhat.com>
PR middle-end/117249
* tree-ssa-structalias.cc (insert_vi_for_tree): Move put calls out of
gcc_assert.
* lto-cgraph.cc (lto_symtab_encoder_delete_node): Likewise.
* gimple-ssa-strength-reduction.cc (get_alternative_base,
add_cand_for_stmt): Likewise.
* tree-eh.cc (add_stmt_to_eh_lp_fn): Likewise.
* except.cc (duplicate_eh_regions_1): Likewise.
* tree-ssa-reassoc.cc (insert_operand_rank): Likewise.
* config/nvptx/nvptx.cc (nvptx_expand_call): Use == rather than = in
gcc_assert.
* opts-common.cc (jobserver_info::disconnect): Call close outside of
gcc_assert and only check result in it.
(jobserver_info::return_token): Call write outside of gcc_assert and
only check result in it.
* genautomata.cc (output_default_latencies): Move j++ side-effect
outside of gcc_assert.
* tree-ssa-loop-ivopts.cc (get_alias_ptr_type_for_ptr_address): Use
== rather than = in gcc_assert.
* cgraph.cc (symbol_table::create_edge): Move ++edges_max_uid
side-effect outside of gcc_assert.
Jakub Jelinek [Thu, 24 Oct 2024 10:56:19 +0000 (12:56 +0200)]
c++: Further fix for get_member_function_from_ptrfunc [PR117259]
The following testcase shows that the previous get_member_function_from_ptrfunc
changes weren't sufficient and we still have cases where
-fsanitize=undefined with pointers to member functions can cause wrong code
being generated and related false positive warnings.
The problem is that save_expr doesn't always create SAVE_EXPR, it can skip
some invariant arithmetics and in the end it could be really large
expressions which would be evaluated several times (and what is worse, with
-fsanitize=undefined those expressions then can have SAVE_EXPRs added to
their subparts for -fsanitize=bounds or -fsanitize=null or
-fsanitize=alignment instrumentation). Tried to just build1 a SAVE_EXPR
+ add TREE_SIDE_EFFECTS instead of save_expr, but that doesn't work either,
because cp_fold happily optimizes those SAVE_EXPRs away when it sees
SAVE_EXPR operand is tree_invariant_p.
So, the following patch instead of using save_expr or building SAVE_EXPR
manually builds a TARGET_EXPR. Both types are pointers, so it doesn't need
to be destroyed in any way, but TARGET_EXPR is what doesn't get optimized
away immediately.
2024-10-24 Jakub Jelinek <jakub@redhat.com>
PR c++/117259
* typeck.cc (get_member_function_from_ptrfunc): Use force_target_expr
rather than save_expr for instance_ptr and function. Don't call it
for TREE_CONSTANT.
Jakub Jelinek [Thu, 24 Oct 2024 10:45:34 +0000 (12:45 +0200)]
asan: Fix up build_check_stmt gsi handling [PR117209]
gsi_safe_insert_before properly updates gsi_bb in gimple_stmt_iterator
in case it splits objects, but unfortunately build_check_stmt was in
some places (but not others) using a copy of the iterator rather than
the iterator passed from callers and so didn't propagate that to callers.
I guess it didn't matter much before when it was just using
gsi_insert_before as that really didn't change the iterator.
The !before_p case is apparently dead code, nothing is calling it with
before_p=false since around 4.9.
2024-10-24 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/117209
* asan.cc (maybe_cast_to_ptrmode): Formatting fix.
(build_check_stmt): Don't copy *iter into gsi, perform all
the updates on iter directly.
Eric Botcazou [Wed, 11 Sep 2024 17:53:12 +0000 (19:53 +0200)]
ada: Fix internal error on bit-packed array type with Volatile_Full_Access
The problem occurs when the component type is a record type with default
values for the initialization procedure of the (base) array type, because
the compiler is trying to generate a full access for a parameter of the
base array type, which does not make sense.
gcc/ada/ChangeLog:
PR ada/116551
* gcc-interface/trans.cc (node_is_atomic) <N_Identifier>: Return
false if the type of the entity is an unconstrained array type.
(node_is_volatile_full_access) <N_Identifier>: Likewise.
Georg-Johann Lay [Tue, 22 Oct 2024 09:51:44 +0000 (11:51 +0200)]
AVR: target/116953 - Restore recog_data after calling jump_over_one_insn_p.
The previous fix for PR116953 is incomplete because references to
recog_data are escaping avr_out_sbxx_branch() in the form of %-operands
in the returned asm code template. This patch reverts the previous fix,
and re-extracts the operands by means of extract_constrain_insn_cached()
after the call of jump_over_one_insn_p().
PR target/116953
gcc/
* config/avr/avr.cc (avr_out_sbxx_branch): Revert previous fix
for PR116953 (r15-4078). Run extract_constrain_insn_cached
on the current insn after calling jump_over_one_insn_p.
Paul Thomas [Tue, 16 Jul 2024 14:56:44 +0000 (15:56 +0100)]
Fortran: Simplify len_trim with array ref and fix mapping bug[PR84868].
2024-07-16 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/84868
* simplify.cc (gfc_simplify_len_trim): If the argument is an
element of a parameter array, simplify all the elements and
build a new parameter array to hold the result, after checking
that it doesn't already exist.
* trans-expr.cc (gfc_get_interface_mapping_array) if a string
length is available, use it for the typespec.
(gfc_add_interface_mapping): Supply the se string length.
gcc/testsuite/
PR fortran/84868
* gfortran.dg/pr84868.f90: New test.
Jakub Jelinek [Tue, 22 Oct 2024 18:30:41 +0000 (20:30 +0200)]
c-family: Fix up -Wsizeof-pointer-memaccess ICEs [PR117230]
In the following testcases, we ICE on all 4 function calls.
The problem is using TYPE_PRECISION on vector types (but guess it
would be similarly problematic on structures/unions/arrays).
The test only differentiates between suggestion what to do, whether
to supply explicit size because sizeof (*p) for
{,{,un}signed }char *p is not very likely what the user want, or
dereferencing the pointer, so I think limiting that suggestion
to integral types is ok.
2024-10-22 Jakub Jelinek <jakub@redhat.com>
PR c/117230
* c-warn.cc (sizeof_pointer_memaccess_warning): Only compare
TYPE_PRECISION of TREE_TYPE (type) to precision of char if
TREE_TYPE (type) is integral type.
* c-c++-common/Wsizeof-pointer-memaccess5.c: New test.
Jakub Jelinek [Tue, 15 Oct 2024 17:38:46 +0000 (19:38 +0200)]
match.pd: Further fma negation fixes [PR116891]
On Mon, Oct 14, 2024 at 08:53:29AM +0200, Jakub Jelinek wrote:
> > PR middle-end/116891
> > * match.pd ((negate (IFN_FNMS@3 @0 @1 @2)) -> (IFN_FMA @0 @1 @2)):
> > Only enable for !HONOR_SIGN_DEPENDENT_ROUNDING.
>
> Guess it would be nice to have a testcase which FAILs without the patch and
> PASSes with it, but it can be added later.
I've added such a testcase now, and additionally found the fix only fixed
one of the 4 problematic similar cases.
Here is a patch which fixes the others too and adds the testcases.
fma-pr116891.c FAILed without your patch, FAILs with your patch too (but
only due to the bar/baz/qux checks) and PASSes with the patch.
Nathaniel Shead [Thu, 23 May 2024 12:50:58 +0000 (22:50 +1000)]
c++/modules: Fix treatment of unnamed types [PR116929]
In r14-9530 we relaxed "depending on type with no-linkage" errors for
declarations that could actually be accessed from different TUs anyway.
However, this also enabled it for unnamed types, which never work.
In a normal module interface, an unnamed type is TU-local by
[basic.link] p15.2, and so cannot be exposed or the program is
ill-formed. We don't yet implement this checking but we should assume
that we will later; currently supporting this actually causes ICEs when
attempting to create the mangled name in some situations.
For a header unit, by [module.import] p5.3 it is unspecified whether two
TUs importing a header unit providing such a declaration are importing
the same header unit. In this case, we would require name mangling
changes to somehow allow the (anonymous) type exported by such a header
unit to correspond across different TUs in the presence of other
anonymous declarations, so for this patch just assume that this case
would be an ODR violation instead.
PR c++/116929
gcc/cp/ChangeLog:
* tree.cc (no_linkage_check): Anonymous types can't be accessed
in a different TU.
gcc/testsuite/ChangeLog:
* g++.dg/modules/linkage-1_a.C: Remove anonymous type test.
* g++.dg/modules/linkage-1_b.C: Likewise.
* g++.dg/modules/linkage-1_c.C: Likewise.
* g++.dg/modules/linkage-2.C: Add note about anonymous types.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit 0173dcce92baa62a74929814a75edb75eeab1a54)
testsuite: arm: Use check-function-bodies in cmse-5 tests
Converted the tests to use check-function-bodies in order to ensure that
the sequence is correct.
This also allows both APSR_nzcvq and APSR_nzcvqg as target selector does
not work when the -march and/or -mcpu overrides the target to test.
testsuite: Skip pr112305.c for -O[01] on simulators
gcc.dg/torture/pr112305.c contains an inner loop that executes
0x8000_0014 times and an outer loop that executes 5 times, giving about
10 billion total executions of the inner loop body. At -O2 and above we
are able to remove the inner loop, but at -O1 we keep a no-op loop:
dls lr, r3
.L3:
subs r3, r3, #1
le lr, .L3
and at -O0 we of course don't optimise.
This can lead to long execution times on simulators, possibly
triggering a timeout.
gcc/testsuite
* gcc.dg/torture/pr112305.c: Skip at -O0 and -O1 for simulators.
Patrick Palka [Fri, 4 Oct 2024 14:01:39 +0000 (10:01 -0400)]
libstdc++/ranges: Implement various small LWG issues
This implements the following small LWG issues:
3848. adjacent_view, adjacent_transform_view and slide_view missing base accessor
3851. chunk_view::inner-iterator missing custom iter_move and iter_swap
3947. Unexpected constraints on adjacent_transform_view::base()
4001. iota_view should provide empty
4012. common_view::begin/end are missing the simple-view check
4013. lazy_split_view::outer-iterator::value_type should not provide default constructor
4035. single_view should provide empty
4053. Unary call to std::views::repeat does not decay the argument
4054. Repeating a repeat_view should repeat the view
libstdc++-v3/ChangeLog:
* include/std/ranges (single_view::empty): Define as per LWG 4035.
(iota_view::empty): Define as per LWG 4001.
(lazy_split_view::_OuterIter::value_type): Remove default
constructor and make other constructor private as per LWG 4013.
(common_view::begin): Disable non-const overload for simple
views as per LWG 4012.
(common_view::end): Likewise.
(adjacent_view::base): Define as per LWG 3848.
(adjacent_transform_view::base): Likewise.
(chunk_view::_InnerIter::iter_move): Define as per LWG 3851.
(chunk_view::_InnerIter::itep_swap): Likewise.
(slide_view::base): Define as per LWG 3848.
(repeat_view): Adjust deduction guide as per LWG 4053.
(_Repeat::operator()): Adjust single-parameter overload as per
LWG 4054.
* testsuite/std/ranges/adaptors/adjacent/1.cc: Verify existence
of base member function.
* testsuite/std/ranges/adaptors/adjacent_transform/1.cc: Likewise.
* testsuite/std/ranges/adaptors/chunk/1.cc: Test LWG 3851 example.
* testsuite/std/ranges/adaptors/slide/1.cc: Verify existence of
base member function.
* testsuite/std/ranges/iota/iota_view.cc: Test LWG 4001 example.
* testsuite/std/ranges/repeat/1.cc: Test LWG 4053/4054 examples.
Patrick Palka [Thu, 22 Aug 2024 15:25:10 +0000 (11:25 -0400)]
libstdc++: Add some missing ranges feature-test macro tests
libstdc++-v3/ChangeLog:
* testsuite/25_algorithms/contains/1.cc: Verify value of
__cpp_lib_ranges_contains.
* testsuite/25_algorithms/find_last/1.cc: Verify value of
__cpp_lib_ranges_find_last.
* testsuite/25_algorithms/iota/1.cc: Verify value of
__cpp_lib_ranges_iota.
Patrick Palka [Thu, 22 Aug 2024 13:24:20 +0000 (09:24 -0400)]
libstdc++: Implement P2997R1 changes to the indirect invocability concepts
This implements the changes of this C++26 paper as a DR against C++20.
In passing this patch removes the std/ranges/version_c++23.cc test which
is now mostly obsolete after the version.def FTM refactoring, and instead
expands the __cpp_lib_ranges checks in another test so that it verifies
the exact value of the FTM on a per language version basis.
libstdc++-v3/ChangeLog:
* include/bits/iterator_concepts.h (indirectly_unary_invocable):
Relax as per P2997R1.
(indirectly_regular_unary_invocable): Likewise.
(indirect_unary_predicate): Likewise.
(indirect_binary_predicate): Likewise.
(indirect_equivalence_relation): Likewise.
(indirect_strict_weak_order): Likewise.
* include/bits/version.def (ranges): Update value for C++26.
* include/bits/version.h: Regenerate.
* testsuite/24_iterators/indirect_callable/p2997r1.cc: New test.
* testsuite/std/ranges/version_c++23.cc: Remove.
* testsuite/std/ranges/headers/ranges/synopsis.cc: Refine the
__cpp_lib_ranges checks.
Patrick Palka [Thu, 22 Aug 2024 13:24:11 +0000 (09:24 -0400)]
libstdc++: Implement P2609R3 changes to the indirect invocability concepts
This implements the changes of this C++23 paper as a DR against C++20.
Note that after the later P2538R1 "ADL-proof std::projected" (which we
already implement), we can't use a simple partial specialization to match
specializations of the 'projected' alias template. So instead we identify
such specializations using a pair of distinguishing member aliases.
libstdc++-v3/ChangeLog:
* include/bits/iterator_concepts.h (__detail::__indirect_value):
Define.
(__indirect_value_t): Define as per P2609R3.
(iter_common_reference_t): Adjust as per P2609R3.
(indirectly_unary_invocable): Likewise.
(indirectly_regular_unary_invocable): Likewise.
(indirect_unary_predicate): Likewise.
(indirect_binary_predicate): Likewise.
(indirect_equivalence_relation): Likewise.
(indirect_strict_weak_order): Likewise.
(__detail::__projected::__type): Define member aliases
__projected_Iter and __projected_Proj providing the
template arguments of the current specialization.
* include/bits/version.def (ranges): Update value.
* include/bits/version.h: Regenerate.
* testsuite/24_iterators/indirect_callable/p2609r3.cc: New test.
* testsuite/std/ranges/version_c++23.cc: Update expected value
of __cpp_lib_ranges macro.
Richard Biener [Sat, 12 Oct 2024 12:51:37 +0000 (14:51 +0200)]
tree-optimization/117104 - add missed guards to max(a,b) != a simplification
For vector types we have to make sure the comparison result is a vector
type and the resulting compare operation is supported. As the resulting
compare is never an equality compare I didn't bother to check for the
cbranch case.
Richard Biener [Mon, 7 Oct 2024 09:05:17 +0000 (11:05 +0200)]
tree-optimization/116982 - analyze scalar loop exit early
The following makes sure to discover the scalar loop IV exit during
analysis as failure to do so (if DCE and friends are disabled this
can happen due to if-conversion doing DCE and FRE on the if-converted
loop) would ICE later.
I refrained from larger refactoring to be able to eventually backport.
PR tree-optimization/116982
* tree-vectorizer.h (vect_analyze_loop): Pass in .LOOP_VECTORIZED
call.
(vect_analyze_loop_form): Likewise.
* tree-vect-loop.cc (vect_analyze_loop_form): Reject loops where we
cannot determine a IV exit for the scalar loop.
(vect_analyze_loop): Adjust.
* tree-vectorizer.cc (try_vectorize_loop_1): Likewise.
* tree-parloops.cc (gather_scalar_reductions): Likewise.
Richard Biener [Sun, 13 Oct 2024 10:44:04 +0000 (12:44 +0200)]
tree-optimization/116907 - stale BLOCK reference from DECL_VALUE_EXPR
When we remove unused BLOCKs we fail to clean references to them
from DECL_VALUE_EXPRs of variables in other BLOCKs which in the
PR causes LTO streaming to walk into pointers to GGC freed blocks.
There's the question of whether such DECL_VALUE_EXPRs should keep
variables and blocks referenced live (it doesn't seem to do that)
and whether such DECL_VALUE_EXPRs should have survived in the
first place.
PR tree-optimization/116907
* tree-ssa-live.cc (clear_unused_block_pointer_in_block): New
helper.
(clear_unused_block_pointer): Call it.
Richard Biener [Sun, 13 Oct 2024 09:42:27 +0000 (11:42 +0200)]
tree-optimization/116481 - avoid building function_type[]
The following avoids building an array type with function or method
element type during diagnosing an array bound violation as this
will result in an error, rejecting a program with a not too useful
error message. Instead build such array type manually.
PR tree-optimization/116481
* pointer-query.cc (build_printable_array_type):
Build an array types with function or method element type
manually to avoid bogus diagnostic.
Richard Biener [Sun, 13 Oct 2024 13:12:44 +0000 (15:12 +0200)]
tree-optimization/116290 - fix compare-debug issue in ldist
Loop distribution does different analysis with -g0/-g due to counting
a debug stmt starting a BB against a limit which will everntually
lead to different IVOPTs choices. I've fixed a possible IVOPTs
issue on the way even though it doesn't make a difference here.
PR tree-optimization/116290
* tree-loop-distribution.cc (determine_reduction_stmt_1): PHIs
have no debug variants. Start with first non-debug real stmt.
* tree-ssa-loop-ivopts.cc (find_givs_in_bb): Do not analyze
debug stmts.
Richard Biener [Fri, 17 May 2024 09:02:29 +0000 (11:02 +0200)]
middle-end/115110 - Fix view_converted_memref_p
view_converted_memref_p was checking the reference type against the
pointer type of the offset operand rather than its pointed-to type
which leads to all refs being subject to view-convert treatment
in get_alias_set causing numerous testsuite fails but with its
new uses from r15-512-g9b7cad5884f21c is also a wrong-code issue.
liuhongt [Wed, 16 Oct 2024 05:43:48 +0000 (13:43 +0800)]
Refine splitters related to "combine vpcmpuw + zero_extend to vpcmpuw"
r12-6103-g1a7ce8570997eb combines vpcmpuw + zero_extend to vpcmpuw
with the pre_reload splitter, but the splitter transforms the
zero_extend into a subreg which make reload think the upper part is
garbage, it's not correct.
The patch adjusts the zero_extend define_insn_and_split to
define_insn to keep zero_extend.
gcc/ChangeLog:
PR target/117159
* config/i386/sse.md
(*<avx512>_cmp<V48H_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
Change from define_insn_and_split to define_insn.
(*<avx512>_cmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
Ditto.
(*<avx512>_ucmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
Ditto.
(*<avx512>_ucmp<VI48_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
Ditto.
(*<avx512>_cmp<V48H_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
Split to the zero_extend pattern.
(*<avx512>_cmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
Ditto.
(*<avx512>_ucmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
Ditto.
(*<avx512>_ucmp<VI48_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
Ditto.
Martin Jambor [Fri, 18 Oct 2024 19:32:16 +0000 (21:32 +0200)]
ipa: Treat static constructors and destructors as non-local (PR 115815)
In PR 115815, IPA-SRA thought it had control over all invocations of a
(recursive) static destructor but it did not see the implied
invocation which led to the original being left behind and the
clean-up code encountering uses of SSAs that definitely should have
been dead.
Fixed by teaching cgraph_node::can_be_local_p about static
constructors and destructors. Similar test is missing in
cgraph_node::local_p so I added the check there as well.
In addition to the commit with the fix, this backport also contains
squashed commit 1a458bdeb223ffa501bac8e76182115681967094 which fixes
dejagnu directives in the testcase.
gcc/ChangeLog:
2024-07-25 Martin Jambor <mjambor@suse.cz>
PR ipa/115815
* cgraph.cc (cgraph_node_cannot_be_local_p_1): Also check
DECL_STATIC_CONSTRUCTOR and DECL_STATIC_DESTRUCTOR.
* ipa-visibility.cc (non_local_p): Likewise.
(cgraph_node::local_p): Delete extraneous line of tabs.
gcc/testsuite/ChangeLog:
2024-07-25 Martin Jambor <mjambor@suse.cz>
PR ipa/115815
* gcc.dg/lto/pr115815_0.c: New test.
Li Xu [Thu, 10 Oct 2024 14:51:19 +0000 (08:51 -0600)]
RISC-V:Bugfix for C++ code compilation failure with rv32imafc_zve32f[pr116883]
From: xuli <xuli1@eswincomputing.com>
Example as follows:
int main()
{
unsigned long arraya[128], arrayb[128], arrayc[128];
for (int i = 0; i < 128; i++)
{
arraya[i] = arrayb[i] + arrayc[i];
}
return 0;
}
Compiled with -march=rv32imafc_zve32f -mabi=ilp32f, it will cause a compilation issue:
riscv_vector.h:40:25: error: ambiguating new declaration of 'vint64m4_t __riscv_vle64(vbool16_t, const long long int*, unsigned int)'
40 | #pragma riscv intrinsic "vector"
| ^~~~~~~~
riscv_vector.h:40:25: note: old declaration 'vint64m1_t __riscv_vle64(vbool64_t, const long long int*, unsigned int)'
With zvl=32b, vbool16_t is registered in init_builtins() with
type_common.precision=0x101 (nunits=2), mode_nunits[E_RVVMF16BI]=[2,2].
Normally, vbool64_t is only valid when TARGET_MIN_VLEN > 32, so vbool64_t
is not registered in init_builtins(), meaning vbool64_t=null.
In order to implement __attribute__((target("arch=+v"))), we must register
all vector types and all RVV intrinsics. Therefore, vbool64_t will be registered
by default with zvl=128b in reinit_builtins(), resulting in
type_common.precision=0x101 (nunits=2) and mode_nunits[E_RVVMF64BI]=[2,2].
We then get TYPE_VECTOR_SUBPARTS(vbool16_t) == TYPE_VECTOR_SUBPARTS(vbool64_t),
calculated using type_common.precision, resulting in 2. Since vbool16_t and
vbool64_t have the same element type (boolean_type), the compiler treats them
as the same type, leading to a re-declaration conflict.
After all types and intrinsics have been registered, processing
__attribute__((target("arch=+v"))) will update the parameters option and
init_adjust_machine_modes. Therefore, to avoid conflicts, we can choose
zvl=4096b for the null type reinit_builtins().
Patrick Palka [Tue, 15 Oct 2024 17:13:15 +0000 (13:13 -0400)]
c++: checking ICE w/ constexpr if and lambda as def targ [PR117054]
Here we're tripping over the assert in extract_locals_r which enforces
that an extra-args tree appearing inside another extra-args tree doesn't
actually have extra args. This invariant doesn't always hold for lambdas
(which recently gained the extra-args mechanism) but that should be
harmless since cp_walk_subtrees doesn't walk LAMBDA_EXPR_EXTRA_ARGS and
so should be immune to the PR114303 issue for now. So let's just disable
this assert for lambdas.
PR c++/117054
gcc/cp/ChangeLog:
* pt.cc (extract_locals_r): Disable tree_extra_args assert
for LAMBDA_EXPR.
Marek Polacek [Thu, 29 Aug 2024 14:40:50 +0000 (10:40 -0400)]
c++: ICE with -Wtautological-compare in template [PR116534]
Pre r14-4793, we'd call warn_tautological_cmp -> operand_equal_p
with operands wrapped in NON_DEPENDENT_EXPR, which works, since
o_e_p bails for codes it doesn't know. But now we pass operands
not encapsulated in NON_DEPENDENT_EXPR, and crash, because the
template tree for &a[x] has null DECL_FIELD_OFFSET.
This patch extends r12-7797 to cover the case when DECL_FIELD_OFFSET
is null.
PR c++/116534
gcc/ChangeLog:
* fold-const.cc (operand_compare::operand_equal_p): If either
field's DECL_FIELD_OFFSET is null, compare the fields with ==.
Marek Polacek [Wed, 28 Aug 2024 19:45:49 +0000 (15:45 -0400)]
c++: wrong error due to std::initializer_list opt [PR116476]
Here maybe_init_list_as_array gets elttype=field, init={NON_LVALUE_EXPR <2>}
and it tries to convert the init's element type (int) to field
using implicit_conversion, which works, so overall maybe_init_list_as_array
is successful.
But it constifies init_elttype so we end up with "const int". Later,
when we actually perform the conversion and invoke field::field(T&&),
we end up with this error:
error: binding reference of type 'int&&' to 'const int' discards qualifiers
So I think maybe_init_list_as_array should try to perform the conversion,
like it does below with fc.
PR c++/116476
gcc/cp/ChangeLog:
* call.cc (maybe_init_list_as_array): Try convert_like and see if it
worked.
We cannot elide the TARGET_EXPR because we're taking its address.
It is set as eliding in massage_init_elt. I've tried to not set
TARGET_EXPR_ELIDING_P when the context is not direct-initialization.
That didn't work: even when it's not direct-initialization now, it
can become one later, for instance, after split_nonconstant_init.
One problem is that replace_placeholders_for_class_temp_r will replace
placeholders in non-eliding TARGET_EXPRs with the slot, but if we then
elide the TARGET_EXPR, we end up with a "stray" VAR_DECL and crash.
(Only some TARGET_EXPRs are handled by replace_decl.)
I thought I'd have to go back to
<https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651163.html> but
then I realized that this problem occurrs only with ()-init but not
{}-init. With {}-init, there is no problem, because we are clearing
TARGET_EXPR_ELIDING_P in process_init_constructor_record:
/* We can't actually elide the temporary when initializing a
potentially-overlapping field from a function that returns by
value. */
if (ce->index
&& TREE_CODE (next) == TARGET_EXPR
&& unsafe_copy_elision_p (ce->index, next))
TARGET_EXPR_ELIDING_P (next) = false;
But that does not happen for ()-init because we have no ce->index.
()-init doesn't allow brace elision so we don't really reshape them.
But I can just move the clearing a few lines down and then it handles
both ()-init and {}-init.
PR c++/116424
gcc/cp/ChangeLog:
* typeck2.cc (process_init_constructor_record): Move the clearing of
TARGET_EXPR_ELIDING_P down.
Uros Bizjak [Tue, 15 Oct 2024 14:51:33 +0000 (16:51 +0200)]
i386: Fix expand_vector_set for VEC_MERGE/VEC_DUPLICATE RTX [PR117116]
Middle end can generate SYMBOL_REF RTX as a value "val" in the call
to expand_vector_set, but SYMBOL_REF RTX is not accepted in
<sse2p4_1>_pinsr<ssemodesuffix> insn pattern, generated via
VEC_MERGE/VEC_DUPLICATE RTX path.
Force the value into a register before VEC_MERGE/VEC_DUPLICATE RTX
is generated if it doesn't satisfy nonimmediate_operand predicate.
PR target/117116
gcc/ChangeLog:
* config/i386/i386-expand.cc (expand_vector_set): Force "val"
into a register before VEC_MERGE/VEC_DUPLICATE RTX is generated
if it doesn't satisfy nonimmediate_operand predicate.
Jonathan Wakely [Wed, 16 Oct 2024 08:22:37 +0000 (09:22 +0100)]
libstdc++: Fix Python deprecation warning in printers.py
python/libstdcxx/v6/printers.py:1355: DeprecationWarning: 'count' is passed as positional argument
The Python docs say:
Deprecated since version 3.13: Passing count and flags as positional
arguments is deprecated. In future Python versions they will be
keyword-only parameters.
Using a keyword argument for count only became possible with Python 3.1
so introduce a new function to do the substitution.
libstdc++-v3/ChangeLog:
* python/libstdcxx/v6/printers.py (strip_fundts_namespace): New.
(StdExpAnyPrinter, StdExpOptionalPrinter): Use it.
Jonathan Wakely [Sun, 13 Oct 2024 20:47:14 +0000 (21:47 +0100)]
libstdc++: Implement LWG 3564 for ranges::transform_view
The _Iterator<true> type returned by begin() const uses const F& to
transform the elements, so it should use const F& to determine the
iterator's value_type and iterator_category as well.
This was accepted into the WP in July 2022.
libstdc++-v3/ChangeLog:
* include/std/ranges (transform_view:_Iterator): Use const F&
to determine value_type and iterator_category of
_Iterator<true>, as per LWG 3564.
* testsuite/std/ranges/adaptors/transform.cc: Check value_type
and iterator_category.
Jonathan Wakely [Fri, 11 Oct 2024 08:40:38 +0000 (09:40 +0100)]
libstdc++: Fix localized %c formatting for <chrono> [PR117085]
When formatting a time point with %c we call std::vformat_to using the
formatting locale's D_T_FMT string, but we weren't adding the L option
to the format string. This meant we always interpreted D_T_FMT in the C
locale, instead of using the formatting locale as obviously intended
when %c is used.
libstdc++-v3/ChangeLog:
PR libstdc++/117085
* include/bits/chrono_io.h (__formatter_chrono::_M_c): Add L
option to format string.
* testsuite/std/time/format.cc: Move to...
* testsuite/std/time/format/format.cc: ...here.
* testsuite/std/time/format/pr117085.cc: New test.
Jonathan Wakely [Fri, 27 Sep 2024 15:54:31 +0000 (16:54 +0100)]
libstdc++: Tweak %c formatting for chrono types
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (__formatter_chrono::_M_c): Add
[[unlikely]] attribute to condition for missing %c format in
locale. Use %T instead of %H:%M:%S in fallback.
Jonathan Wakely [Tue, 24 Sep 2024 22:20:56 +0000 (23:20 +0100)]
libstdc++: Populate generic std::time_get's wide %c format [PR117135]
I missed out the __timepunct<wchar_t> specialization for the "generic"
implementation when defining the %c format in r15-4016-gc534e37faccf48.
libstdc++-v3/ChangeLog:
PR libstdc++/117135
* config/locale/generic/time_members.cc
(__timepunct<wchar_t>::_M_initialize_timepunc): Set
_M_date_time_format for C locale. Set %Ex formats to the same
values as the %x formats.
Jonathan Wakely [Sun, 13 Oct 2024 21:48:43 +0000 (22:48 +0100)]
libstdc++: Use std::move for iterator in ranges::fill [PR117094]
Input iterators aren't required to be copyable.
libstdc++-v3/ChangeLog:
PR libstdc++/117094
* include/bits/ranges_algobase.h (__fill_fn): Use std::move for
iterator that might not be copyable.
* testsuite/25_algorithms/fill/constrained.cc: Check
non-copyable iterator with sized sentinel.
middle-end: Fix ifcvt predicate generation for masked function calls
Up until now, due to a latent bug in the code for the ifcvt pass,
irrespective of the branch taken in a conditional statement, the
original condition for the if statement was used in masking the
function call.
This patch ensures that the correct predicate mask generation is
carried out such that, upon autovectorization, the correct vector
lanes are selected in the vectorized function call.
gcc/ChangeLog:
* tree-if-conv.cc (predicate_statements): Fix handling of
predicated function calls.
Jan Hubicka [Mon, 22 Jul 2024 21:01:50 +0000 (23:01 +0200)]
Fix handling of ICF_NOVOPS in ipa-modref
As shown in somewhat convoluted testcase, ipa-modref is mistreating
ECF_NOVOPS as "having no side effects". This come from time when
modref cared only about memory accesses and thus it was possible to
shortcut on it.
This patch removes (hopefully) all those bad shortcuts.
Bootstrapped/regtested x86_64-linux, comitted.
The testcase contains a VNx2QImode pseudo that is live across a call
and that cannot be allocated a call-preserved register. LRA quite
reasonably tried to save it before the call and restore it afterwards.
Unfortunately, the target told it to do that in SImode, even though
punning between SImode and VNx2QImode is disallowed by both
TARGET_CAN_CHANGE_MODE_CLASS and TARGET_MODES_TIEABLE_P.
The natural class to use for SImode is GENERAL_REGS, so this led
to an unsalvageable situation in which we had:
(set (subreg:VNx2QI (reg:SI A) 0) (reg:VNx2QI B))
where A needed GENERAL_REGS and B needed FP_REGS. We therefore ended
up in a reload loop.
The hooks above should ensure that this situation can never occur
for incoming subregs. It only happened here because the target
explicitly forced it.
The decision to use SImode for modes smaller than 4 bytes dates
back to the beginning of the port, before 16-bit floating-point
modes existed. I'm not sure whether promoting to SImode really
makes sense for any FPR, but that's a separate performance/QoI
discussion. For now, this patch just disallows using SImode
when it is wrong for correctness reasons, since that should be
safer to backport.
gcc/
PR testsuite/116238
* config/aarch64/aarch64.cc (aarch64_hard_regno_caller_save_mode):
Only return SImode if we can convert to and from it.
gcc/testsuite/
PR testsuite/116238
* gcc.target/aarch64/sve/pr116238.c: New test.
Steve Baird [Mon, 8 Jul 2024 21:45:55 +0000 (14:45 -0700)]
ada: Type conversion in instance incorrectly rejected.
In some cases, a legal type conversion in a generic package is correctly
accepted but the corresponding type conversion in an instance of the generic
is incorrectly rejected.
gcc/ada/
PR ada/114593
* sem_res.adb (Valid_Conversion): Test In_Instance instead of
In_Instance_Body.
According to Intel SOM[1], For Crestmont, most 256-bit Intel AVX2
instructions can be decomposed into two independent 128-bit
micro-operations, except for a subset of Intel AVX2 instructions,
known as cross-lane operations, can only compute the result for an
element by utilizing one or more sources belonging to other elements.
The 256-bit instructions listed below use more operand sources than
can be natively supported by a single reservation station within these
microarchitectures. They are decomposed into two μops, where the first
μop resolves a subset of operand dependencies across two cycles. The
dependent second μop executes the 256-bit operation by using a single
128-bit execution port for two consecutive cycles with a five-cycle
latency for a total latency of seven cycles.
Instead of setting tune avx128_optimal for SRF, the patch add a new
tune avx256_avoid_vec_perm for it. so by default, vectorizer still
uses 256-bit VF if cost is profitable, but lowers to 128-bit whenever
256-bit vec_perm is needed for auto-vectorization. w/o vec_perm,
performance of 256-bit vectorization should be similar as 128-bit
ones(some benchmark results show it's even better than 128-bit
vectorization since it enables more parallelism for convert cases.)
* config/i386/i386.cc (ix86_vector_costs::ix86_vector_costs):
Add new member m_num_avx256_vec_perm.
(ix86_vector_costs::add_stmt_cost): Record 256-bit vec_perm.
(ix86_vector_costs::finish_cost): Prevent vectorization for
TAREGT_AVX256_AVOID_VEC_PERM when there's 256-bit vec_perm
instruction.
* config/i386/i386.h (TARGET_AVX256_AVOID_VEC_PERM): New
Macro.
* config/i386/x86-tune.def (X86_TUNE_AVX256_SPLIT_REGS): Add
m_CORE_ATOM.
(X86_TUNE_AVX256_AVOID_VEC_PERM): New tune.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx256_avoid_vec_perm.c: New test.
For Crestmont, 4-operand vex blendv instructions come from MSROM and
is slower than 3-instructions sequence (op1 & mask) | (op2 & ~mask).
legacy blendv instruction can still be handled by the decoder.
The patch add a new tune which is enabled for all processors except
for SRF/CWF. It will use vpand + vpandn + vpor instead of
vpblendvb(similar for vblendvps/vblendvpd) for SRF/CWF.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_sse_movcc): Guard
instruction blendv generation under new tune.
* config/i386/i386.h (TARGET_SSE_MOVCC_USE_BLENDV): New Macro.
* config/i386/x86-tune.def (X86_TUNE_SSE_MOVCC_USE_BLENDV):
New tune.
Richard Biener [Wed, 9 Oct 2024 09:42:59 +0000 (11:42 +0200)]
tree-optimization/117041 - fix load classification of former grouped load
When we first detect a grouped load but later dis-associate it we
only set DR_GROUP_FIRST_ELEMENT to NULL, indicating it is not a
STMT_VINFO_GROUPED_ACCESS but leave DR_GROUP_NEXT_ELEMENT set. This
causes a stray DR_GROUP_NEXT_ELEMENT access in get_group_load_store_type
to go wrong, indicating a load isn't single_element_p when it actually
is, leading to wrong classification and an ICE.
PR tree-optimization/117041
* tree-vect-stmts.cc (get_group_load_store_type): Only
check DR_GROUP_NEXT_ELEMENT for STMT_VINFO_GROUPED_ACCESS.
Richard Biener [Mon, 30 Sep 2024 11:38:28 +0000 (13:38 +0200)]
tree-optimization/116879 - failure to recognize non-empty latch
When we relaxed the vectorizers constraint on loop structure verifying
the emptiness of the latch became too lose as can be seen in the case
for PR116879 where the latch effectively contains two basic-blocks
which one being an unmerged forwarder that's not empty.
PR tree-optimization/116879
* tree-vect-loop.cc (vect_analyze_loop_form): Scan all
blocks that form the latch.