Richard Biener [Wed, 22 Mar 2023 08:29:49 +0000 (09:29 +0100)]
rtl-optimization/109237 - quadraticness in delete_trivially_dead_insns
The following addresses quadraticness in processing debug insns
in delete_trivially_dead_insns and insn_live_p by using TREE_VISITED
on the INSN_VAR_LOCATION_DECL to indicate a later debug bind
with the same decl and no intervening real insn or debug marker.
That gets rid of the NEXT_INSN walk in insn_live_p in favor of
first clearing TREE_VISITED in the first loop over insns and
book-keeping the decls on which we set the bit, since we need to clear
them when visiting a real or debug marker insn.
That improves the time spent in delete_trivially_dead_insns from
10.6s to 2.2s for the testcase.
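The marking scheme described above can be sketched in a few lines. This is a simplified model, not the cse.cc code: the instruction stream is a hypothetical list of tuples, a Python set stands in for TREE_VISITED on the decls, and the walk is assumed to run from the last insn to the first.

```python
# Hypothetical, simplified instruction stream: each entry is
# ("debug_bind", decl), ("real",) or ("debug_marker",).
def dead_debug_binds(insns):
    """Return indices of debug binds superseded by a later bind of the same decl."""
    visited = set()   # stands in for TREE_VISITED on the decls
    dead = []
    # Walk backwards so that "a later bind exists" is already known
    # when an earlier bind is reached; no forward scan is needed.
    for idx in range(len(insns) - 1, -1, -1):
        insn = insns[idx]
        if insn[0] == "debug_bind":
            decl = insn[1]
            if decl in visited:
                dead.append(idx)
            else:
                visited.add(decl)
        else:
            # A real insn or debug marker ends the run: clear all marks.
            visited.clear()
    return sorted(dead)
```

Each insn is looked at once, which is where the linear behaviour comes from; the earlier scheme had to scan forward from every debug bind.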
PR rtl-optimization/109237
* cse.cc (insn_live_p): Remove NEXT_INSN walk, instead check
TREE_VISITED on INSN_VAR_LOCATION_DECL.
(delete_trivially_dead_insns): Maintain TREE_VISITED on
active debug bind INSN_VAR_LOCATION_DECL.
For the testcase bb_is_just_return is on top of the profile, changing
it to walk BB insns backwards puts it off the profile. That's because
in the forward walk you have to process possibly many debug insns
but in a backward walk you very likely run into control insns first.
PR rtl-optimization/109237
* cfgcleanup.cc (bb_is_just_return): Walk insns backwards.
Jakub Jelinek [Wed, 19 Apr 2023 08:01:04 +0000 (10:01 +0200)]
testsuite: Fix up pr109524.C for -std=c++23 [PR109524]
This testcase was reduced such that it isn't valid C++23, so with my
usual testing with GXX_TESTSUITE_STDS=98,11,14,17,20,2b it fails:
FAIL: g++.dg/pr109524.C -std=gnu++2b (test for excess errors)
.../gcc/testsuite/g++.dg/pr109524.C: In function 'nn hh(nn)':
.../gcc/testsuite/g++.dg/pr109524.C:35:12: error: cannot bind non-const lvalue reference of type 'nn&' to an rvalue of type 'nn'
.../gcc/testsuite/g++.dg/pr109524.C:17:6: note: initializing argument 1 of 'nn::nn(nn&)'
The following patch fixes that and I've verified it doesn't change
anything on what the test was testing, it still ICEs in r13-7198 and
passes in r13-7203, now in all language modes (except for 98 where
it is intentionally UNSUPPORTED).
2023-04-19 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109524
* g++.dg/pr109524.C (nn::nn): Change argument type from nn & to
const nn &.
Check hard_regno_mode_ok before setting lowest memory move cost for the mode with different reg classes.
There's a potential performance issue when the backend returns some
unreasonable value for a mode which can never be allocated with the
reg class.
gcc/ChangeLog:
PR rtl-optimization/109351
* ira.cc (setup_class_subset_and_memory_move_costs): Check
hard_regno_mode_ok before setting lowest memory move cost for
the mode with different reg classes.
Jonathan Wakely [Tue, 18 Apr 2023 23:07:36 +0000 (00:07 +0100)]
libstdc++: Adjust uses of null pointer constants in docs
libstdc++-v3/ChangeLog:
* doc/xml/manual/extensions.xml: Fix example to declare and
qualify std::free, and use NULL instead of 0.
* doc/html/manual/ext_demangling.html: Regenerate.
* libsupc++/cxxabi.h: Adjust doxygen comments.
ifcvt.cc: Prevent excessive if-conversion for conditional moves
gcc/
* ifcvt.cc (cond_move_process_if_block): Consider the result of
targetm.noce_conversion_profitable_p() when replacing the original
sequence with the converted one.
Andrew Pinski [Fri, 31 Mar 2023 00:00:20 +0000 (00:00 +0000)]
PHIOPT: Move tree_ssa_cs_elim into pass_cselim::execute.
This moves around the code for tree_ssa_cs_elim slightly
improving code readability and removing declarations that
are no longer needed.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Remove declaration.
(make_pass_phiopt): Make execute out of line.
(tree_ssa_cs_elim): Move code into ...
(pass_cselim::execute): here.
i386: Improve permutations with INSERTPS instruction [PR94908]
INSERTPS can select any element from src and insert into any place
of the dest. For SSE4.1 targets, compiler can generate e.g.
insertps $64, %xmm0, %xmm1
to insert element 1 from %xmm1 to element 0 of %xmm0.
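To see why the immediate is $64, it helps to decode it. The sketch below models the register-source form of INSERTPS on plain Python lists, assuming the usual immediate layout (bits 7:6 select the source element, bits 5:4 the destination element, bits 3:0 are a zero mask); the function names are illustrative only.

```python
def insertps_imm(src_elem, dst_elem, zero_mask=0):
    """Build the INSERTPS immediate from its three fields."""
    assert 0 <= src_elem < 4 and 0 <= dst_elem < 4 and 0 <= zero_mask < 16
    return (src_elem << 6) | (dst_elem << 4) | zero_mask


def insertps(dst, src, imm):
    """Model INSERTPS on two 4-element lists (register source form)."""
    result = list(dst)
    # Copy the selected source element into the selected destination slot.
    result[(imm >> 4) & 3] = src[(imm >> 6) & 3]
    # Then clear every element whose bit is set in the zero mask.
    for i in range(4):
        if imm & (1 << i):
            result[i] = 0.0
    return result
```

With source element 1 and destination element 0 the immediate is `1 << 6`, which is 64.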
gcc/ChangeLog:
PR target/94908
* config/i386/i386-builtin.def (__builtin_ia32_insertps128):
Use CODE_FOR_sse4_1_insertps_v4sf.
* config/i386/i386-expand.cc (expand_vec_perm_insertps): New.
(expand_vec_perm_1): Call expand_vec_perm_insertps.
* config/i386/i386.md ("unspec"): Declare UNSPEC_INSERTPS here.
* config/i386/mmx.md (mmxscalarmode): New mode attribute.
(@sse4_1_insertps_<mode>): New insn pattern.
* config/i386/sse.md (@sse4_1_insertps_<mode>): Macroize insn
pattern from sse4_1_insertps using VI4F_128 mode iterator.
gcc/testsuite/ChangeLog:
PR target/94908
* gcc.target/i386/pr94908.c: New test.
* gcc.target/i386/sse4_1-insertps-5.c: New test.
* gcc.target/i386/vperm-v4sf-2-sse4.c: New test.
Jonathan Wakely [Tue, 18 Apr 2023 16:22:40 +0000 (17:22 +0100)]
libstdc++: Fix preprocessor condition in linker script [PR108969]
The linker script is preprocessed with $(top_builddir)/config.h not the
include/$target/bits/c++config.h version, which means that configure
macros do not have the _GLIBCXX_ prefix yet.
The _GLIBCXX_SYMVER_GNU and _GLIBCXX_SHARED checks are redundant,
because the gnu.ver file is only used for _GLIBCXX_SYMVER_GNU and the
linker script is only used for the shared library. Remove those.
Aldy Hernandez [Thu, 23 Feb 2023 08:10:16 +0000 (09:10 +0100)]
Add GTY support for vrange.
IPA currently puts *some* irange's in GC memory. When I contribute
support for generic ranges in IPA, we'll need to change this to
vrange. This patch adds GTY support for both vrange and frange.
constraint: fix relaxed memory and repeated constraint handling
The function `constrain_operands' lacked the logic to consider relaxed
memory constraints when "traditional" memory constraints were not
satisfied, creating potential issues as observed during the reload
compilation pass.
In addition, it was observed that while `constrain_operands' chooses
to disregard constraints when more than one alternative is provided,
e.g. "m,r" using CONSTRAINT__UNKNOWN, it has no checks in place to
determine whether the multiple constraints in a given string are in
fact repetitions of the same constraint and should thus in fact be
treated as a single constraint, as ought to be the case for something
like "m,m".
Both of these issues are dealt with here, thus ensuring that we get
appropriate pattern matching.
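The repeated-constraint check described above amounts to asking whether all comma-separated alternatives are the same string. A minimal sketch, with a hypothetical helper name and no attempt to model the rest of `constrain_operands':

```python
def single_effective_constraint(constraint):
    """Return the constraint if every comma-separated alternative is
    identical, otherwise None (the genuinely multi-alternative case)."""
    alternatives = constraint.split(",")
    first = alternatives[0]
    if all(alt == first for alt in alternatives):
        return first
    return None
```

Under this model "m,m" collapses to "m" and can be matched like a single constraint, while "m,r" still counts as having more than one alternative.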
Jonathan Wakely [Tue, 18 Apr 2023 13:37:38 +0000 (14:37 +0100)]
libstdc++: Export global iostreams with GLIBCXX_3.4.31 symver [PR108969]
Since GCC 13 the global iostream objects are only initialized once in
libstdc++, and not by a std::ios::Init object in every translation unit
that includes <iostream>. To avoid using uninitialized streams defined
in an older libstdc++.so, translation units using the global iostreams
should depend on the GLIBCXX_3.4.31 symver.
Define std::cin as std::__io::cin and then export it as
std::cin@@GLIBCXX_3.4.31 so that references to std::cin bind to the new
symver. Also export it as @GLIBCXX_3.4 for backwards compatibility.
libstdc++-v3/ChangeLog:
PR libstdc++/108969
* src/Makefile.am: Move globals_io.cc to here.
* src/Makefile.in: Regenerate.
* src/c++98/Makefile.am: Remove globals_io.cc from here.
* src/c++98/Makefile.in: Regenerate.
* src/c++98/globals_io.cc [_GLIBCXX_SYMVER_GNU] (cin): Adjust
symbol name and then export with GLIBCXX_3.4.31 symver.
(cout, cerr, clog, wcin, wcout, wcerr, wclog): Likewise.
* config/abi/post/aarch64-linux-gnu/baseline_symbols.txt:
Regenerate.
* config/abi/post/i486-linux-gnu/baseline_symbols.txt:
Regenerate.
* config/abi/post/m68k-linux-gnu/baseline_symbols.txt:
Regenerate.
* config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt:
Regenerate.
* config/abi/post/riscv64-linux-gnu/baseline_symbols.txt:
Regenerate.
* config/abi/post/x86_64-linux-gnu/32/baseline_symbols.txt:
Regenerate.
* config/abi/post/s390x-linux-gnu/baseline_symbols.txt:
Regenerate.
* config/abi/post/x86_64-linux-gnu/baseline_symbols.txt:
Regenerate.
* config/abi/pre/gnu.ver: Add iostream objects to new symver.
Kito Cheng [Tue, 18 Apr 2023 10:07:06 +0000 (18:07 +0800)]
Docs: Add doc for RISC-V vector intrinsics
Document which version of RISC-V vector intrinsics has been implemented in
GCC.
gcc/ChangeLog:
* doc/extend.texi (Target Builtins): Add RISC-V Vector
Intrinsics.
(RISC-V Vector Intrinsics): Document which version of the RISC-V
vector intrinsics GCC implements, and its reference.
This adds bitmap_clear_first_set_bit and uses it where previously
bitmap_clear_bit followed bitmap_first_set_bit. The advantage
is speeding up the search and avoiding clobbering ->current.
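The combined operation can be illustrated on a Python integer used as a bit set. This is only a model of the idea: GCC's bitmaps are not plain integers, and the return convention shown here (index plus updated set) is an assumption made for the sketch.

```python
def clear_first_set_bit(bits):
    """Return (index of the lowest set bit, bits with that bit cleared).

    `bits` is a positive integer used as a bit set.
    """
    if bits <= 0:
        raise ValueError("bit set is empty")
    lowest = bits & -bits          # isolates the lowest set bit
    return lowest.bit_length() - 1, bits ^ lowest


def drain(bits):
    """Typical worklist use: pop set bits in increasing order."""
    order = []
    while bits:
        index, bits = clear_first_set_bit(bits)
        order.append(index)
    return order
```

Finding and clearing in one step means the position located by the search is reused for the clear, instead of being looked up a second time.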
Richard Biener [Fri, 10 Mar 2023 11:23:09 +0000 (12:23 +0100)]
Shrink points-to analysis dumps when not dumping with -details
The following allows getting PTA stats with -stats without blowing
up your filesystem by guarding constraint and solution dumping
with TDF_DETAILS and the SSA points-to info with TDF_DETAILS
or TDF_ALIAS.
* tree-ssa-structalias.cc (dump_sa_stats): Split out from...
(dump_sa_points_to_info): ... this function.
(compute_points_to_sets): Guard large dumps with TDF_DETAILS,
and call dump_sa_stats guarded with TDF_STATS.
(ipa_pta_execute): Likewise.
(compute_may_aliases): Guard dump_alias_info with
TDF_DETAILS|TDF_ALIAS.
Andrew Pinski [Tue, 4 Apr 2023 00:09:27 +0000 (00:09 +0000)]
PHIOPT: add folding/simplification detail to the dump
While debugging PHI-OPT with match-and-simplify,
I found that adding more dumping to the debug dumps made
it easier to understand what was going on rather than stepping in
the debugger so this adds them. Note I used TDF_FOLDING rather
than TDF_DETAILS as these debug messages can be chatty and
only needed if you are debugging match and simplify
with PHI-OPT and match and simplify uses TDF_FOLDING as
its check.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (gimple_simplify_phiopt): Dump
the expression that is being tried when TDF_FOLDING
is true.
(phiopt_worker::match_simplify_replacement): Dump
the sequence which was created by gimple_simplify_phiopt
when TDF_FOLDING is true.
Andrew Pinski [Sun, 9 Apr 2023 02:03:38 +0000 (19:03 -0700)]
PHIOPT: small cleanup in match_simplify_replacement
We know that the statement we are moving already
has an SSA_NAME on the lhs so we don't need to
check that and can also just call reset_flow_sensitive_info
with the name we already got.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (match_simplify_replacement):
Simplify code that does the movement slightly.
aarch64: Use standard RTL codes for __rev16 intrinsic expansion
I noticed for the expansion of the __rev16* arm_acle.h intrinsics we don't need to use an unspec just because it doesn't match neatly to a bswap code.
We have organic combine patterns for it that we can reuse.
This patch removes the define_insn using UNSPEC_REV (should it have been an UNSPEC_REV16?) and adds an expander to emit
the patterns we have for rev16 using standard RTL codes.
Bootstrapped and tested on aarch64-none-linux-gnu.
Declare dconstm0 to go along with dconst0 and friends.
Negating dconst0 is getting pretty old, and we will keep adding copies
of the same idiom. Fixed by adding a dconstm0 constant to go along
with dconst1, dconstm1, etc.
gcc/ChangeLog:
* emit-rtl.cc (init_emit_once): Initialize dconstm0.
* gimple-range-op.cc (class cfn_signbit): Remove dconstm0
declaration.
* range-op-float.cc (zero_range): Use dconstm0.
(zero_to_inf_range): Same.
* real.h (dconstm0): New.
* value-range.cc (frange::flush_denormals_to_zero): Use dconstm0.
(frange::set_zero): Do not declare dconstm0.
Richard Biener [Mon, 6 Mar 2023 10:06:38 +0000 (11:06 +0100)]
RAII auto_mpfr and auto_mpz
The following adds two RAII classes, one for mpz_t and one for mpfr_t
making object lifetime management easier. Both formerly required
explicit initialization with {mpz,mpfr}_init and release with
{mpz,mpfr}_clear.
I've converted two example places (where lifetime is trivial).
aarch64: Use intrinsic flags information rather than hardcoding FLAG_AUTO_FP
We record the flags to use for the intrinsics in aarch64_simd_intrinsic_data, so use it when initialising them
rather than using a hardcoded FLAG_AUTO_FP. The current vreinterpret intrinsics use FLAG_AUTO_FP anyway so this
patch is an NFC but this will be needed as we migrate more builtins into the intrinsics infrastructure.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (aarch64_init_simd_intrinsics): Take
builtin flags from intrinsic data rather than hardcoded FLAG_AUTO_FP.
Aldy Hernandez [Thu, 26 Jan 2023 03:46:54 +0000 (04:46 +0100)]
Return true from operator== for two identical ranges containing NAN.
The == operator for ranges signifies that two ranges contain the same
thing, not that they are ultimately equal. So [2,4] == [2,4], even
though one may be a 2 and the other may be a 3. Similarly with two
VARYING ranges.
There is an oversight in frange::operator== where we are returning
false for two identical NANs. This is causing us to never cache NANs
in sbr_sparse_bitmap::set_bb_range.
gcc/ChangeLog:
* value-range.cc (frange::operator==): Adjust for NAN.
(range_tests_nan): Remove some NAN tests.
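The oversight is easy to reproduce in any language where NaN compares unequal to itself. The toy class below is not frange (which tracks NAN separately from its endpoints); it only shows why an endpoint comparison needs an explicit NAN case for two identical NAN ranges to compare equal.

```python
import math


class FloatRange:
    """Toy stand-in for a floating-point range: NAN-only or [lo, hi]."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def is_nan(self):
        return math.isnan(self.lo) and math.isnan(self.hi)

    def __eq__(self, other):
        # Comparing endpoints directly would fail here, since nan != nan.
        if self.is_nan() or other.is_nan():
            return self.is_nan() and other.is_nan()
        return (self.lo, self.hi) == (other.lo, other.hi)
```

As in the commit message, equality here means "describes the same set of values", so [2,4] == [2,4] holds even though two values drawn from the ranges may differ.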
Aldy Hernandez [Thu, 23 Feb 2023 07:48:28 +0000 (08:48 +0100)]
Add inchash support for vrange.
This patch provides inchash support for vrange. It is along the lines
of the streaming support I just posted and will be used for IPA
hashing of ranges.
Richard Biener [Tue, 18 Apr 2023 09:49:48 +0000 (11:49 +0200)]
tree-optimization/109539 - restrict PHI handling in access diagnostics
Access diagnostics visits the SSA def-use chains to diagnose things like
dangling pointer uses. When that runs into PHIs it tries to prove
all incoming pointers of which one is the currently visited use are
related to decide whether to keep looking for the PHI def uses.
That turns out to be overly optimistic and thus costly. The following
scraps the existing handling for simply requiring that we eventually
visit all incoming pointers of the PHI during the def-use chain
analysis and only then process uses of the PHI def.
Note this handles backedges of natural loops optimistically, diagnosing
the first iteration. There's gcc.dg/Wuse-after-free-2.c containing
a testcase requiring this.
PR tree-optimization/109539
* gimple-ssa-warn-access.cc (pass_waccess::check_pointer_uses):
Re-implement pointer relatedness for PHIs.
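The new rule (continue past a PHI only once every incoming pointer has been visited) can be sketched as a worklist walk. The graph encoding and names below are hypothetical, and the optimistic treatment of loop backedges mentioned above is deliberately left out.

```python
def reachable_uses(start, uses, phi_args):
    """Walk def-use chains from `start`.

    uses:     name -> list of names that use it
    phi_args: PHI result name -> list of its incoming names
    A PHI result is only visited once all its incoming names were visited.
    """
    visited, worklist, order = {start}, [start], []
    while worklist:
        name = worklist.pop()
        order.append(name)
        for user in uses.get(name, []):
            if user in visited:
                continue
            if user in phi_args and not all(a in visited for a in phi_args[user]):
                # Some incoming pointer has not been seen yet. The PHI is
                # reconsidered when that pointer is reached, because the
                # PHI is one of its uses too.
                continue
            visited.add(user)
            worklist.append(user)
    return order
```

No relatedness proof is attempted at the PHI; the walk simply stops there unless the rest of the traversal reaches the remaining arguments.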
Patrick Palka [Tue, 18 Apr 2023 11:21:13 +0000 (07:21 -0400)]
libstdc++: Implement range_adaptor_closure from P2387R3 [PR108827]
PR libstdc++/108827
libstdc++-v3/ChangeLog:
* include/bits/ranges_cmp.h (__cpp_lib_ranges): Bump value
for C++23.
* include/std/ranges (range_adaptor_closure): Define for C++23.
* include/std/version (__cpp_lib_ranges): Bump value for
C++23.
* testsuite/std/ranges/version_c++23.cc: Bump expected value
of __cpp_lib_ranges.
* testsuite/std/ranges/range_adaptor_closure.cc: New test.
Andrew Stubbs [Fri, 14 Apr 2023 16:05:15 +0000 (17:05 +0100)]
amdgcn: HardFP divide
Implement FP division using hardware instructions. This replaces both the
softfp library calls, and the -ffast-math inaccurate division we had previously.
The GCN architecture does not have a single divide instruction, but it does
have a number of support instructions designed to make multiply-by-reciprocal
sufficiently accurate for non-fast-math usage.
gcc/ChangeLog:
* config/gcn/gcn-valu.md (SV_SFDF): New iterator.
(SV_FP): New iterator.
(scalar_mode, SCALAR_MODE): Add identity mappings for scalar modes.
(recip<mode>2): Unify the two patterns using SV_FP.
(div_scale<mode><exec_vcc>): New insn.
(div_fmas<mode><exec>): New insn.
(div_fixup<mode><exec>): New insn.
(div<mode>3): Unify the two expanders and rewrite using hardfp.
* config/gcn/gcn.cc (gcn_md_reorg): Support "vccwait" attribute.
* config/gcn/gcn.md (unspec): Add UNSPEC_DIV_SCALE, UNSPEC_DIV_FMAS,
and UNSPEC_DIV_FIXUP.
(vccwait): New attribute.
gcc/testsuite/ChangeLog:
* gcc.target/gcn/fpdiv.c: Remove the -ffast-math requirement.
This patch is a straightforward extension of the zero-extending LDAPR
pattern to represent QI -> HI load-extends. This maps down to a LDAPRB-W
instruction.
This lets us remove a redundant zero-extend in the new test function.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/atomics.md
(*aarch64_atomic_load<ALLX:mode>_rcpc_zext):
Use SD_HSDI for destination mode iterator.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/ldapr-zext.c: Add test for u8 to u16
extension.
Jin Ma [Tue, 18 Apr 2023 09:26:49 +0000 (17:26 +0800)]
RISC-V: Adjust the parsing order of extensions to be consistent with riscv-spec and binutils.
The current order of gcc and binutils parsing extensions is inconsistent.
According to latest risc-v spec, the canonical order in which extension names must
appear in the name string specified in Table 29.1 is different from before.
In the latest table, non-standard extensions must be listed after all standard
extensions. To keep consistent, we now change the parsing order.
Related llvm patch links:
https://reviews.llvm.org/D148315
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (multi_letter_subset_rank): Swap the order
of z-extensions and s-extensions.
(riscv_subset_list::parse): Likewise.
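As a rough illustration of the new ordering, the sketch below ranks multi-letter extensions by prefix class. Only the z/s swap comes from the ChangeLog; placing x-extensions last follows the "non-standard after standard" rule quoted above, the alphabetical tie-break inside a class is a simplification, and `xfoo` is a made-up extension name.

```python
# Assumed ranks after the change: z-extensions, then s-extensions,
# then non-standard x-extensions.
_PREFIX_RANK = {"z": 0, "s": 1, "x": 2}


def order_multi_letter(exts):
    """Sort multi-letter extension names by class, then alphabetically."""
    return sorted(exts, key=lambda e: (_PREFIX_RANK[e[0]], e))
```

For example, `order_multi_letter(["xfoo", "svinval", "zba"])` puts `zba` first and `xfoo` last.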
match.pd has mostly for AArch64 an optimization in which it optimizes
certain forms of __builtin_shuffle of x + y and x - y vectors into
fneg using twice as wide element type so that every other sign is changed,
followed by fadd.
The following patch extends that optimization, so that it can handle
other forms as well, using the same fneg but fsub instead of fadd.
As the plus is commutative and minus is not and I want to handle
vec_perm with plus minus and minus plus order preferably in one
pattern, I had to do the matching operand checks by hand.
2023-04-18 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109240
* match.pd (fneg/fadd): Rewrite such that it handles both plus as
first vec_perm operand and minus as second using fneg/fadd and
minus as first vec_perm operand and plus as second using fneg/fsub.
* gcc.target/aarch64/simd/addsub_2.c: New test.
* gcc.target/aarch64/sve/addsub_2.c: New test.
In upcoming patches I will contribute code to stream out frange's as
well as vrange's. This patch abstracts out the REAL_VALUE_TYPE
streaming into their own functions, so that they may be used elsewhere.
Aldy Hernandez [Fri, 10 Feb 2023 11:52:24 +0000 (12:52 +0100)]
Abstract out calculation of max HWIs per wide int.
I'm about to add one more use of the same snippet of code, for a total
of 4 identical calculations in the code base.
gcc/ChangeLog:
* wide-int.h (WIDE_INT_MAX_HWIS): New.
(class fixed_wide_int_storage): Use it.
(trailing_wide_ints <N>::set_precision): Use it.
(trailing_wide_ints <N>::extra_size): Use it.
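The commit message does not show the repeated snippet. A plausible reading, assumed here, is a ceiling division of a precision in bits by the number of bits in a host wide int, with 64 bits taken as the host word size for the sketch.

```python
HOST_BITS_PER_WIDE_INT = 64  # assumed host word size for this sketch


def max_hwis(precision):
    """Number of host wide ints needed to hold `precision` bits
    (ceiling division)."""
    return (precision + HOST_BITS_PER_WIDE_INT - 1) // HOST_BITS_PER_WIDE_INT
```

Giving the calculation a single name means a fourth user does not need a fourth copy of the rounding arithmetic.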
Xi Ruoyao [Sun, 2 Apr 2023 13:37:49 +0000 (21:37 +0800)]
LoongArch: Optimize additions with immediates
1. Use addu16i.d for TARGET_64BIT and suitable immediates.
2. Split one addition with immediate into two addu16i.d or addi.{d/w}
instructions if possible. This can avoid using a temp register without
increasing the count of instructions.
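The two points above can be modelled as a small immediate-splitting routine. It is a sketch, not loongarch_split_plus_constant: it assumes addi takes a signed 12-bit immediate and addu16i.d adds a signed 16-bit immediate shifted left by 16, it ignores the split into two addu16i.d instructions, and the particular way the constant is divided between two addi instructions is my own choice.

```python
def fits_signed(value, bits):
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def split_add_immediate(c):
    """Return (mnemonic, immediate) pairs that add up to c, or None if this
    simplified model would need a temporary register."""
    if fits_signed(c, 12):
        return [("addi", c)]
    if c % 65536 == 0 and fits_signed(c >> 16, 16):
        return [("addu16i.d", c >> 16)]
    # Two addi instructions: saturate the first, the second takes the rest.
    first = 2047 if c > 0 else -2048
    if fits_signed(c - first, 12):
        return [("addi", first), ("addi", c - first)]
    # addu16i.d for the upper part plus addi for a signed 12-bit remainder.
    low = ((c & 0xFFFF) ^ 0x8000) - 0x8000   # sign-extended low 16 bits
    high = (c - low) >> 16
    if fits_signed(low, 12) and fits_signed(high, 16):
        return [("addu16i.d", high), ("addi", low)]
    return None


def total(seq):
    """Value added by a sequence returned from split_add_immediate."""
    return sum(imm << 16 if op == "addu16i.d" else imm for op, imm in seq)
```

Every returned sequence has at most two instructions, the same count as loading the constant into a temporary and adding it, but without needing the temporary.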
Inspired by https://reviews.llvm.org/D143710 and
https://reviews.llvm.org/D147222.
Bootstrapped and regtested on loongarch64-linux-gnu. Ok for GCC 14?
gcc/ChangeLog:
* config/loongarch/loongarch-protos.h
(loongarch_addu16i_imm12_operand_p): New function prototype.
(loongarch_split_plus_constant): Likewise.
* config/loongarch/loongarch.cc
(loongarch_addu16i_imm12_operand_p): New function.
(loongarch_split_plus_constant): Likewise.
* config/loongarch/loongarch.h (ADDU16I_OPERAND): New macro.
(DUAL_IMM12_OPERAND): Likewise.
(DUAL_ADDU16I_OPERAND): Likewise.
* config/loongarch/constraints.md (La, Lb, Lc, Ld, Le): New
constraint.
* config/loongarch/predicates.md (const_dual_imm12_operand): New
predicate.
(const_addu16i_operand): Likewise.
(const_addu16i_imm12_di_operand): Likewise.
(const_addu16i_imm12_si_operand): Likewise.
(plus_di_operand): Likewise.
(plus_si_operand): Likewise.
(plus_si_extend_operand): Likewise.
* config/loongarch/loongarch.md (add<mode>3): Convert to
define_insn_and_split. Use plus_<mode>_operand predicate
instead of arith_operand. Add alternatives for La, Lb, Lc, Ld,
and Le constraints.
(*addsi3_extended): Convert to define_insn_and_split. Use
plus_si_extend_operand instead of arith_operand. Add
alternatives for La and Le constraints.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/add-const.c: New test.
* gcc.target/loongarch/stack-check-cfa-1.c: Adjust for stack
frame size change.
* gcc.target/loongarch/stack-check-cfa-2.c: Likewise.
libsanitizer, darwin: Unsupport Darwin >= 22 for now.
The mechanism for locating dyld has changed as of Darwin22, since dyld is now
in the shared cache. The implemented mechanism for walking the cache uses
Apple Blocks which GCC does not yet support, and the fallback to the original
mechanism does not work there.
Until a suitable work-around can be found, unsupport Darwin22+.
Aldy Hernandez [Thu, 2 Mar 2023 12:12:45 +0000 (13:12 +0100)]
Constify invariant fields of vrange and irange.
The discriminator in vrange cannot change after construction,
similarly the number of allocated ranges in an irange. It's best to
make them constant to avoid invalid changes.
Patrick Palka [Mon, 17 Apr 2023 22:52:07 +0000 (18:52 -0400)]
c++: bound ttp level lowering [PR109531]
Here when level lowering the bound ttp TT<typename T::type> via the
substitution T=C, we're neglecting to canonicalize (and thereby strip
of simple typedefs) the substituted template arguments {A<int>} before
determining the new canonical type via hash table lookup. This leads to
a hash mismatch ICE for the two equivalent types TT<int> and TT<A<int>>
since iterative_hash_template_arg assumes type arguments are already
canonicalized.
We can fix this by canonicalizing or coercing the substituted arguments
directly, but seeing as creation and ordinary substitution of bound ttps
both go through lookup_template_class, which in turn performs the desired
coercion/canonicalization, it seems preferable to make this code path go
through lookup_template_class as well.
PR c++/109531
gcc/cp/ChangeLog:
* pt.cc (tsubst) <case BOUND_TEMPLATE_TEMPLATE_PARM>:
In the level-lowering case just use lookup_template_class
to rebuild the bound ttp.
gcc/testsuite/ChangeLog:
* g++.dg/template/canon-type-20.C: New test.
* g++.dg/template/ttp36.C: New test.
RISC-V: optimize stack manipulation in save-restore
The stack space that save-restore reserves is not folded into the function's own stack allocation and deallocation.
This patch allows fewer instructions to be used for stack allocation and deallocation when save-restore is enabled.
before patch:
bar:
call t0,__riscv_save_4
addi sp,sp,-64
...
li t0,-12288
addi t0,t0,-1968 # optimized out after patch
add sp,sp,t0 # prologue
...
li t0,12288 # epilogue
addi t0,t0,2000 # optimized out after patch
add sp,sp,t0
...
addi sp,sp,32
tail __riscv_restore_4
after patch:
bar:
call t0,__riscv_save_4
addi sp,sp,-2032
...
li t0,-12288
add sp,sp,t0 # prologue
...
li t0,12288 # epilogue
add sp,sp,t0
...
addi sp,sp,2032
tail __riscv_restore_4
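A quick arithmetic check of the two listings: the stack-pointer adjustments made by the function body add up to the same frame size before and after the patch. The numbers are taken from the listings; the remark about `li` rests on the assumption that a multiple of 4096 can be loaded without a follow-up addi.

```python
# Stack-pointer adjustments made by the function body, from the listings.
before_prologue = [-64, -12288 - 1968]
before_epilogue = [12288 + 2000, 32]
after_prologue = [-2032, -12288]
after_epilogue = [12288, 2032]


def net(adjustments):
    """Net change of sp over a list of adjustments."""
    return sum(adjustments)


def li_needs_addi(value):
    """Assumption: a constant that is a multiple of 4096 needs no addi."""
    return value % 4096 != 0
```

Both versions move sp by 14320 bytes in total; the patch shifts more of that into the first addi so that the remainder, 12288, is a multiple of 4096.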
gcc/
* config/riscv/riscv.cc (riscv_expand_prologue): Consider save-restore in
stack allocation.
(riscv_expand_epilogue): Consider save-restore in stack deallocation.
gcc/testsuite
* gcc.target/riscv/stack_save_restore.c: New test.
A warning pass should not be exporting global ranges it finds along
the way, because that will alter the behavior of future passes.
The reason the present behavior was there was because of some long ago
forgotten regression in another pass. This regression is no longer
there, and if there's ever any fallout from cleaning this up, we can
address it in the pass that is missing some information.
gcc/ChangeLog:
* gimple-ssa-warn-alloca.cc (pass_walloca::execute): Do not export
global ranges.
The RVV test harness currently sets the ISA according to the target
tuple, but doesn't also set the ABI. This just sets the ABI to match
the ISA, though we should really also be respecting the user's specific
ISA to test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp (gcc_mabi): New variable.
The test case that was added is rv64i-specific, as there are better ways
to generate this code on rv32i (where the long/int cast is a NOP) and on
rv64i_zba (where we have word shifts). This renames the original test
case and adds two more for those targets.
gcc/testsuite/ChangeLog:
PR target/106602
* gcc.target/riscv/pr106602.c: Moved to...
* gcc.target/riscv/pr106602-rv64i.c: ...here.
* gcc.target/riscv/pr106602-rv32i.c: New test.
* gcc.target/riscv/pr106602-rv64i_zba.c: New test.
RISC-V: add a new parameter in riscv_first_stack_step.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_first_stack_step): Add a new function
parameter remaining_size.
(riscv_compute_frame_info): Adapt new riscv_first_stack_step interface.
(riscv_expand_prologue): Likewise.
(riscv_expand_epilogue): Likewise.
Feng Wang [Sat, 15 Apr 2023 16:11:15 +0000 (10:11 -0600)]
RISC-V: Optimize the reverse conditions of rotate shift
gcc/ChangeLog:
* config/riscv/bitmanip.md (rotrsi3_sext): Support generating
roriw for constant counts.
* rtl.h (reverse_rotate_by_imm_p): Add function declaration.
* simplify-rtx.cc (reverse_rotate_by_imm_p): New function.
(simplify_context::simplify_binary_operation_1): Use it.
* expmed.cc (expand_shift_1): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/zbb-rol-ror-04.c: New test.
* gcc.target/riscv/zbb-rol-ror-05.c: New test.
* gcc.target/riscv/zbb-rol-ror-06.c: New test.
* gcc.target/riscv/zbb-rol-ror-07.c: New test.
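The identity behind reversing a rotate is that a left rotate by n equals a right rotate by (width - n). The sketch below checks it for 32-bit values; it says nothing about when reverse_rotate_by_imm_p chooses to reverse, which the log does not spell out.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate the low 32 bits of x left by n."""
    x &= MASK32
    n %= 32
    if n == 0:
        return x
    return ((x << n) | (x >> (32 - n))) & MASK32


def rotr32(x, n):
    """Rotate the low 32 bits of x right by n."""
    x &= MASK32
    n %= 32
    if n == 0:
        return x
    return ((x >> n) | (x << (32 - n))) & MASK32
```

So a constant left rotate can be emitted as a right-rotate-immediate instruction such as roriw with the complementary count.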
Jakub Jelinek [Mon, 17 Apr 2023 13:12:49 +0000 (15:12 +0200)]
Update crontab and git_update_version.py
2023-04-17 Jakub Jelinek <jakub@redhat.com>
maintainer-scripts/
* crontab: Snapshots from trunk are now GCC 14 related.
Add GCC 13 snapshots from the respective branch.
contrib/
* gcc-changelog/git_update_version.py (active_refs): Add
releases/gcc-13.
Martin Jambor [Mon, 17 Apr 2023 10:59:51 +0000 (12:59 +0200)]
ipa: Fix double reference-count decrements for the same edge (PR 107769, PR 109318)
It turns out that since addition of the code that can identify globals
which are only read from, the code that keeps track of the references
can decrement their count for the same calls, once during IPA-CP and
then again during inlining. Fixed by adding a special flag to the
pass-through variant and simply wiping out the reference to the
refdesc structure from the constant ones.
Moreover, during debugging of the issue I have discovered that the
code removing references could remove a reference associated with the
same statement but of a wrong type. In all cases it wanted to remove
an IPA_REF_ADDR reference so removing a lesser one instead should do
no harm in practice, but we should try to be consistent and so this
patch extends symtab_node::find_reference so that it searches for a
reference of a given type only.
gcc/ChangeLog:
2023-04-14 Martin Jambor <mjambor@suse.cz>
PR ipa/107769
PR ipa/109318
* cgraph.h (symtab_node::find_reference): Add parameter use_type.
* ipa-prop.h (ipa_pass_through_data): New flag refdesc_decremented.
(ipa_zap_jf_refdesc): New function.
(ipa_get_jf_pass_through_refdesc_decremented): Likewise.
(ipa_set_jf_pass_through_refdesc_decremented): Likewise.
* ipa-cp.cc (ipcp_discover_new_direct_edges): Provide a value for
the new parameter of find_reference.
(adjust_references_in_caller): Likewise. Make sure the constant jump
function is not used to decrement a refdesc counter again. Only
decrement refdesc counters when the pass_through jump function allows
it. Added a detailed dump when decrementing refdesc counters.
* ipa-prop.cc (ipa_print_node_jump_functions_for_edge): Dump new flag.
(ipa_set_jf_simple_pass_through): Initialize the new flag.
(ipa_set_jf_unary_pass_through): Likewise.
(ipa_set_jf_arith_pass_through): Likewise.
(remove_described_reference): Provide a value for the new parameter of
find_reference.
(update_jump_functions_after_inlining): Zap refdesc of new jfunc if
the previous pass_through had a flag mandating that we do so.
(propagate_controlled_uses): Likewise. Only decrement refdesc
counters when the pass_through jump function allows it.
(ipa_edge_args_sum_t::duplicate): Provide a value for the new
parameter of find_reference.
(ipa_write_jump_function): Assert the new flag does not have to be
streamed.
* symtab.cc (symtab_node::find_reference): Add parameter use_type, use
it in searching.
Philipp Tomsich [Thu, 23 Mar 2023 18:47:57 +0000 (19:47 +0100)]
aarch64: disable LDP via tuning structure for -mcpu=ampere1
AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
Given the chance that this causes instructions to slip into the next
decoding cycle and the additional overheads when handling
cacheline-crossing LDP instructions, we disable the generation of LDP
instructions through the tuning structure from instruction combining
(such as in peephole2).
Given the code-density benefits in builtins and prologue/epilogue
expansion, we allow LDPs there.
This commit:
* adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
* allows -moverride=tune=... to override this
These changes are benchmark-driven, yielding the following changes
(with a net-overall improvement):
    503.bwaves_r       -0.88%
    507.cactuBSSN_r     0.35%
    508.namd_r          3.09%
    510.parest_r       -2.99%
    511.povray_r        5.54%
    519.lbm_r          15.83%
    521.wrf_r           0.56%
    526.blender_r       2.47%
    527.cam4_r          0.70%
    538.imagick_r       0.00%
    544.nab_r          -0.33%
    549.fotonik3d_r    -0.42%
    554.roms_r          0.00%
    -------------------------
    = total             1.79%
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Co-Authored-By: Di Zhao <di.zhao@amperecomputing.com>
gcc/ChangeLog:
* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
Add AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
* config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
Check for the above tuning option when processing loads.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/ampere1-no_ldp_combine.c: New test.
Jakub Jelinek [Mon, 17 Apr 2023 09:45:53 +0000 (11:45 +0200)]
testsuite: Fix up vect-simd-clone-1[678]f.c tests some more
With
make check-gcc check-g++ -j32 -k RUNTESTFLAGS='--target_board=unix\{-m32,-m32/-mavx,-m32/-mavx512f,-m32/-march=cascadelake,-m64,-m64/-mavx,-m64/-mavx512f,-m64/-march=cascadelake\} vect.exp=vect-simd-clone*'
the vect-simd-clone-1[678]f.c tests fail with -m32/-mavx512f and -m32/-march=cascadelake,
in that case there are zero matches rather than the 4 expected for ia32.
-m64/-mavx512f and -m64/-march=cascadelake works fine though (2 expected
matches).
So, the following patch just adds -mno-avx512f for x86 non-lp64.
Richard Biener [Mon, 17 Apr 2023 07:22:57 +0000 (09:22 +0200)]
tree-optimization/109524 - ICE with VRP edge removal
VRP queues edges to process late for updating global ranges for
__builtin_unreachable. But this interferes with edge removal
from substitute_and_fold. The following deals with this by
looking up the edge with source/dest block indices which do not
become stale.
PR tree-optimization/109524
* tree-vrp.cc (remove_unreachable::m_list): Change to a
vector of pairs of block indices.
(remove_unreachable::maybe_register_block): Adjust.
(remove_unreachable::remove_and_update_globals): Likewise.
Deal with removed blocks.
As PR108809 mentioned, vec_xl_len_r and vec_xst_len_r are tested
in gcc.target/powerpc/builtins-5-p9-runnable.c.
The vector operands of these two bifs differ between BE and LE when
viewed as v16_int8, even though they are the same when viewed as
128 bits (uint128/V1TI).
The test case gcc.target/powerpc/builtins-5-p9-runnable.c was
written for an LE environment; this patch updates it for BE.
Tested on ppc64 BE and LE.
Is this ok for trunk?
BR,
Jeff (Jiufu)
gcc/testsuite/ChangeLog:
PR testsuite/108809
* gcc.target/powerpc/builtins-5-p9-runnable.c: Update for BE.
Pan Li [Fri, 14 Apr 2023 03:25:11 +0000 (11:25 +0800)]
RISC-V: Add test cases for the RVV mask insn shortcut.
There are several kinds of shortcut codegen for the RVV mask insns. For
example:
vmxor vd, va, va => vmclr vd.
We would like to add more optimizations like this, but first of all
we must add tests for the existing shortcut optimizations, to
ensure that further work on them does not break what is already
optimized.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/mask_insn_shortcut.c: New test.
Jeff Law [Sun, 16 Apr 2023 15:55:32 +0000 (09:55 -0600)]
[committed] [PR target/109508] Adjust conditional move expansion for SFB
Recently the conditional move expander's predicates were loosened for the
benefit of the THEAD processors. In particular one operand that was
previously "register_operand" is now "reg_or_0_operand". That's fine for
THEAD, but breaks for SFB which requires a register for that operand.
This results in an ICE when compiling the testcase on an SFB target such as
the sifive s76.
This change adjusts the expansion code slightly to copy the value into
a register for SFB.
Bootstrapped and regression tested (c,c++,fortran only) with a toolchain
configured to enable SFB by default.
PR target/109508
gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): For
TARGET_SFB_ALU, force the true arm into a register.
gcc/testsuite
* gcc.target/riscv/pr109508.c: New test.
Roger Sayle [Sun, 16 Apr 2023 12:03:10 +0000 (13:03 +0100)]
[Committed] New test case gcc.target/avr/pr54816.c
PR target/54816 is now fixed on mainline. This adds a test case to
check that it doesn't regress in future. Tested with a cross compiler
to avr-elf. Committed as obvious.
2023-04-16 Roger Sayle <roger@nextmovesoftware.com>
gcc/testsuite/ChangeLog
PR target/54816
* gcc.target/avr/pr54816.c: New test case.
Eric Botcazou [Sat, 15 Apr 2023 17:35:02 +0000 (19:35 +0200)]
Fix fallout of previous change on x86/Linux
gcc/ada/
PR bootstrap/109510
* gcc-interface/decl.cc (gnat_to_gnu_entity) <types>: Do not reset
align to zero in any case. Set TYPE_USER_ALIGN on the type only if
it is an aggregate type, or else a type whose default alignment is
specifically capped on selected platforms.
Jason Merrill [Sat, 15 Apr 2023 02:40:43 +0000 (22:40 -0400)]
c++: constexpr aggregate destruction [PR109357]
We were assuming that the result of evaluation of TARGET_EXPR_INITIAL would
always be the new value of the temporary, but that's not necessarily true
when the initializer is complex (i.e. target_expr_needs_replace). In that
case evaluating the initializer initializes the temporary as a side-effect.
PR c++/109357
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_constant_expression) [TARGET_EXPR]:
Check for complex initializer.
Jakub Jelinek [Sat, 15 Apr 2023 10:08:45 +0000 (12:08 +0200)]
if-conv: Small improvement for expansion of complex PHIs [PR109154]
The following patch is just a dumb improvement, gets rid of 2 unnecessary
instructions on both the PR's original testcase and on the two reduced ones,
both on -mcpu=neoverse-v1 and -mavx512f.
The thing is, if we have args_len (args_len >= 2) unique PHI arguments,
we need only args_len - 1 COND_EXPRs to expand the PHI, because first
COND_EXPR can merge 2 unique arguments and all the following ones merge
another unique argument with the previously merged arguments,
while the code for mysterious reasons was always emitting args_len
COND_EXPRs, where the first COND_EXPR merged the first and second unique
arguments, the second COND_EXPR merged the second unique argument with
result of merging the first and second unique arguments and the rest was
already expectable, nth COND_EXPR for n > 2 merged the nth unique argument
with result of merging the previous unique arguments.
Now, in my understanding, the bb_predicates for bb's predecessors need to
form a disjoint set which together creates the successor's bb_predicate,
so I don't see why we'd need to check all the bb_predicates, if we check
all but one then when all those other ones are false the last bb_predicate
is necessarily true. Given that the code attempts to sort the argument with
the most occurrences (so likely the most complex combined predicate) last,
I chose not to test that last argument's predicate.
So e.g. on the testcase from comment 47 in the PR:
void
foo (int *f, int d, int e)
{
for (int i = 0; i < 1024; i++)
{
int a = f[i];
int t;
if (a < 0)
t = 1;
else if (a < e)
t = 1 - a * d;
else
t = 0;
f[i] = t;
}
}
we used to emit:
_7 = a_10 < 0;
_21 = a_10 >= 0;
_22 = a_10 < e_11(D);
_23 = _21 & _22;
_26 = a_10 >= e_11(D);
_27 = _21 & _26;
_ifc__42 = _7 ? 1 : t_13;
_ifc__43 = _23 ? t_13 : _ifc__42;
t_6 = _27 ? 0 : _ifc__43;
while the following patch changes it to:
_7 = a_10 < 0;
_21 = a_10 >= 0;
_22 = a_10 < e_11(D);
_23 = _21 & _22;
_ifc__42 = _23 ? t_13 : 0;
t_6 = _7 ? 1 : _ifc__42;
which I believe should be sufficient for a PHI <1, t_13, 0>.
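To make the args_len - 1 argument concrete, here is a small sketch in plain
C rather than GIMPLE. The helper names select_n and foo_body are
hypothetical; the sketch assumes the predicates are mutually exclusive and
exhaustive, as described above, so the predicate of the last alternative
never needs to be evaluated.

```c
#include <assert.h>

/* Select among N mutually exclusive, exhaustive alternatives using only
   N - 1 tests.  VALUES has N entries and PREDS has N - 1; the predicate
   of the last alternative is never evaluated, since it must hold
   whenever all the others are false.  */
int
select_n (const int *values, const int *preds, int n)
{
  assert (n >= 2);
  int result = values[n - 1];
  for (int i = n - 2; i >= 0; i--)
    result = preds[i] ? values[i] : result;
  return result;
}

/* One iteration of the testcase's loop body, i.e. PHI <1, t, 0>.
   T is computed unconditionally here, so small inputs are assumed in
   order to avoid signed overflow.  */
int
foo_body (int a, int d, int e)
{
  int t = 1 - a * d;
  int values[3] = { 1, t, 0 };
  /* Only two predicates for three values; "a >= e" is implied.  */
  int preds[2] = { a < 0, a >= 0 && a < e };
  return select_n (values, preds, 3);
}

/* Self-check against the original if/else-if/else semantics.  */
void
check_foo_body (void)
{
  assert (foo_body (-5, 3, 10) == 1);    /* a < 0 */
  assert (foo_body (2, 3, 10) == -5);    /* 0 <= a < e: 1 - 2 * 3 */
  assert (foo_body (20, 3, 10) == 0);    /* a >= e */
}
```

With three values the loop in select_n performs two selections, which
corresponds to the two COND_EXPRs in the new output above.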
I've gathered some statistics and on x86_64-linux and i686-linux
bootstraps/regtests, this code triggers:
92 4 4
112 2 4
141 3 4
4046 3 3
(where the first number is the count from sort | uniq -c | sort -n, the
second is args_len and the third is EDGE_COUNT (bb->preds)).
In all these cases the patch should squeeze out one extra COND_EXPR and
its associated predicate (the latter only if it wasn't used elsewhere).
Incrementally, I think we should try to perform some analysis on which
predicates depend on inverses of other predicates and if possible try
to sort the arguments better and omit testing unnecessary predicates.
So essentially for the above testcase deconstruct it back to:
_7 = a_10 < 0;
_22 = a_10 < e_11(D);
_ifc__42 = _22 ? t_13 : 0;
t_6 = _7 ? 1 : _ifc__42;
which is like what this patch produces, but with the & a_10 >= 0 part
removed, because the last predicate is a_10 < 0 and so testing a_10 >= 0
on what appears on the false branch doesn't make sense.
But I'm afraid that will take more work than is doable in stage4 right now.
2023-04-15 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109154
* tree-if-conv.cc (predicate_scalar_phi): For complex PHIs, emit just
args_len - 1 COND_EXPRs rather than args_len. Formatting fix.
rs6000: don't expect __ibm128 with 64-bit long double [PR99708]
When long double is 64-bit wide, as on vxworks, the rs6000 backend
defines neither the __ibm128 type nor the __SIZEOF_IBM128__ macro, but
pr99708.c expected both to be always defined. Adjust the test to
match the implementation.
Co-Authored-By: Kewen Lin <linkw@linux.ibm.com>
for gcc/testsuite/ChangeLog
PR target/99708
* gcc.target/powerpc/pr99708.c: Accept lack of
__SIZEOF_IBM128__ when long double is 64-bit wide.
Here we hit the MEM_REF case, with its arg an ADDR_EXPR, but had no handling
for that and wrongly assumed it would be a reference to a local variable.
This patch overhauls the logic for deciding whether the target is something
to warn about so that we only warn if we specifically recognize the target
as non-local. None of the existing tests regress as a result.
Harald Anlauf [Fri, 14 Apr 2023 18:45:19 +0000 (20:45 +0200)]
Fortran: fix compile-time simplification of SET_EXPONENT [PR109511]
gcc/fortran/ChangeLog:
PR fortran/109511
* simplify.cc (gfc_simplify_set_exponent): Fix implementation of
compile-time simplification of intrinsic SET_EXPONENT for argument
X < 1 and for I < 0.
gcc/testsuite/ChangeLog:
PR fortran/109511
* gfortran.dg/set_exponent_1.f90: New test.
Eric Botcazou [Fri, 14 Apr 2023 18:14:07 +0000 (20:14 +0200)]
Fix build failure of Ada runtime for Aarch64 targets
The Aarch64 back-end now asserts that the main variant of scalar types
has TYPE_USER_ALIGN cleared, and that's not the case for scalar types
declared with a confirming alignment clause in Ada.
gcc/ada/
PR bootstrap/109510
* gcc-interface/decl.cc (gnat_to_gnu_entity) <types>: Reset align
to zero if its value is equal to TYPE_ALIGN and the type is scalar.
Set TYPE_USER_ALIGN on the type only if align is positive.
Patrick Palka [Fri, 14 Apr 2023 14:31:54 +0000 (10:31 -0400)]
libstdc++: Move down definitions of ranges::cbegin/cend/cetc
This moves down the definitions of the range const-access CPOs to after
the definition of input_range in preparation for implementing P2278R4
which redefines these CPOs in a way that indirectly uses input_range.
libstdc++-v3/ChangeLog:
* include/bits/ranges_base.h (__cust_access::__as_const)
(__cust_access::_CBegin, __cust::cbegin)
(__cust_access::_CEnd, __cust::cend)
(__cust_access::_CRBegin, __cust::crbegin)
(__cust_access::_CREnd, __cust::crend)
(__cust_access::_CData, __cust::cdata): Move down definitions to
shortly after the definition of input_range.
Jonathan Wakely [Thu, 13 Apr 2023 15:34:51 +0000 (16:34 +0100)]
libstdc++: Improve diagnostics for invalid std::format calls
Add a static_assert and a comment so that calling std::format for
unformattable argument types will now show:
/home/jwakely/gcc/13/include/c++/13.0.1/format:3563:22: error: static assertion failed: std::formatter must be specialized for each format arg
3563 | static_assert((is_default_constructible_v<formatter<_Args, _CharT>> && ...),
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
and:
140 | formatter() = delete; // No std::formatter specialization for this type.
libstdc++-v3/ChangeLog:
* include/std/format (formatter): Add comment to deleted default
constructor of primary template.
(_Checking_scanner): Add static_assert.
Paul Thomas [Fri, 14 Apr 2023 10:14:00 +0000 (11:14 +0100)]
Fortran: Fix an excess finalization during allocation [PR104272]
2023-04-14 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/104272
* gfortran.h : Add expr3_not_explicit bit field to gfc_code.
* resolve.cc (resolve_allocate_expr): Set bit field when the
default initializer is applied to expr3.
* trans-stmt.cc (gfc_trans_allocate): If expr3_not_explicit is
set, do not deallocate expr3.
gcc/testsuite/
PR fortran/104272
* gfortran.dg/class_result_8.f90 : Number of builtin_frees down
from 6 to 5 without memory leaks.
* gfortran.dg/finalize_52.f90: New test
Richard Biener [Fri, 14 Apr 2023 07:55:27 +0000 (09:55 +0200)]
tree-optimization/109502 - vector conversion between mask and non-mask
The following fixes a check that should have rejected vectorizing
a conversion between a mask and non-mask type. Those should be
done via pattern statements.
PR tree-optimization/109502
* tree-vect-stmts.cc (vectorizable_assignment): Fix
check for conversion between mask and non-mask types.
Richard Biener [Fri, 14 Apr 2023 09:35:58 +0000 (11:35 +0200)]
Fix vect-simd-clone testcase dump scanning
This replaces i686*-*-* && { ! lp64 } with the appropriate
{ i?86-*-* x86_64-*-* } && { ! lp64 } for the testcases and
also amends the e variants checking last variant for avx.
I've used avx in the dump scanning, not avx_runtime, since
the dumps get produced when one would not execute but only
compile them. The f variants lack AVX checking; I didn't
rectify this with this patch.
* gcc.dg/vect/vect-simd-clone-16e.c: Fix x86 lp64 checking
and add missing avx guard.
* gcc.dg/vect/vect-simd-clone-17e.c: Likewise.
* gcc.dg/vect/vect-simd-clone-18e.c: Likewise.
* gcc.dg/vect/vect-simd-clone-16f.c: Fix x86 lp64 checking.
* gcc.dg/vect/vect-simd-clone-17f.c: Likewise.
* gcc.dg/vect/vect-simd-clone-18f.c: Likewise.
Jakub Jelinek [Fri, 14 Apr 2023 07:20:49 +0000 (09:20 +0200)]
combine: Fix AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
The following testcase is miscompiled on riscv since the addition
of *mvconst_internal define_insn_and_split.
We have:
(insn 36 35 39 2 (set (mem/c:SI (plus:SI (reg/f:SI 65 frame)
(const_int -64 [0xffffffffffffffc0])) [2 S4 A128])
(reg:SI 166)) "pr109040.c":9:11 178 {*movsi_internal}
(expr_list:REG_DEAD (reg:SI 166)
(nil)))
(insn 39 36 40 2 (set (reg:SI 171)
(zero_extend:SI (mem/c:HI (plus:SI (reg/f:SI 65 frame)
(const_int -64 [0xffffffffffffffc0])) [0 S2 A128]))) "pr109040.c":9:11 111 {*zero_extendhisi2}
(nil))
and RTL DSE's replace_read since r0-86337-g18b526e806ab6455 handles
even different modes like in the above case, and so it optimizes it into:
(insn 47 35 39 2 (set (reg:HI 175)
(subreg:HI (reg:SI 166) 0)) "pr109040.c":9:11 179 {*movhi_internal}
(expr_list:REG_DEAD (reg:SI 166)
(nil)))
(insn 39 47 40 2 (set (reg:SI 171)
(zero_extend:SI (reg:HI 175))) "pr109040.c":9:11 111 {*zero_extendhisi2}
(expr_list:REG_DEAD (reg:HI 175)
(nil)))
Pseudo 166 is the result of an AND with the 0x8084c constant (forced into a register).
Combine attempts to combine the AND with the insn 47 above created by DSE,
and turns it because of WORD_REGISTER_OPERATIONS and its assumption that all
the subword operations are actually done on word mode into:
(set (subreg:SI (reg:HI 175) 0)
(and:SI (reg:SI 167 [ m ])
(reg:SI 168)))
and later on the ZERO_EXTEND is thrown away.
We then see
(and:SI (subreg:SI (reg:HI 175) 0) (const_int 0x84c))
and optimize that into
(subreg:SI (and:HI (reg:HI 175) (const_int 0x84c)) 0)
which is still fine, in WORD_REGISTER_OPERATIONS the AND in HImode
will set all upper bits up to BITS_PER_WORD to zeros.
But later on simplify_binary_operation_1 or simplify_and_const_int_1
sees that because nonzero_bits ((reg:HI 175), HImode) == 0x84c, we can
optimize the AND into (reg:HI 175). That isn't correct, because while
the low 16 bits of that REG are known to have all bits but 0x84c cleared,
we don't know that all the upper 16 bits are all clear as well.
So, for WORD_REGISTER_OPERATIONS for integral modes smaller than word mode,
we need to check all bits from word_mode in nonzero_bits for the optimizations.
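The difference between the two checks can be modelled in a few lines of C.
This is only a sketch under simplifying assumptions: a 32-bit word, a
concrete register value standing in for compile-time nonzero_bits
information, and hypothetical helper names that do not exist in GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Model: a 32-bit word register whose low 16 bits are accessed as an
   HImode subreg.  A concrete value stands in for nonzero_bits.  */

/* Nonzero-bits information restricted to the HImode view.  */
uint32_t
nonzero_bits_himode (uint32_t reg)
{
  return reg & 0xffffu;
}

/* Unsafe check: claim the AND with MASK is a no-op when only the low
   16 bits are consulted.  */
int
and_is_noop_himode (uint32_t reg, uint32_t mask)
{
  return (nonzero_bits_himode (reg) & ~mask) == 0;
}

/* Safe check: consult every bit of the word.  */
int
and_is_noop_word_mode (uint32_t reg, uint32_t mask)
{
  return (reg & ~mask) == 0;
}

/* Self-check using the constants from the description above.  */
void
check_pr109040_constants (void)
{
  uint32_t reg = 0x8084c;   /* Result of the original AND.  */
  uint32_t mask = 0x84c;

  assert (and_is_noop_himode (reg, mask));       /* Wrongly says no-op.  */
  assert (!and_is_noop_word_mode (reg, mask));   /* Correctly rejects.  */
  assert ((reg & mask) != reg);                  /* Bit 19 would survive.  */
}
```

Bit 19 (0x80000) lies outside the HImode view, so the narrow check cannot
see it, yet it is still set in the word that later uses observe.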
2023-04-14 Jeff Law <jlaw@ventanamicro.com>
Jakub Jelinek <jakub@redhat.com>
PR target/108947
PR target/109040
* combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
smaller than word_mode.
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
<case AND>: Likewise.
* gcc.dg/pr108947.c: New test.
* gcc.c-torture/execute/pr109040.c: New test.
Jakub Jelinek [Fri, 14 Apr 2023 07:19:25 +0000 (09:19 +0200)]
loop-iv: Fix up bounds computation
On Thu, Apr 13, 2023 at 06:35:07AM -0600, Jeff Law wrote:
> Bootstrap was successful with v3, but there's hundreds of testsuite failures
> due to the simplify-rtx hunk. compile/20070520-1.c for example when
> compiled with: -O3 -funroll-loops -march=rv64gc -mabi=lp64d
>
> Thursdays are my hell day. It's unlikely I'd be able to look at this at all
> today.
So, seems to me this is because loop-iv.cc asks for invalid RTL to be
simplified, it calls simplify_gen_binary (AND, SImode,
(subreg:SI (plus:DI (reg:DI 289 [ ivtmp_312 ])
(const_int 4294967295 [0xffffffff])) 0),
(const_int 4294967295 [0xffffffff]))
but 0xffffffff is not a valid SImode CONST_INT, and unlike previously,
on WORD_REGISTER_OPERATIONS targets which have DImode word_mode we no
longer optimize that into op0, so the invalid constant is emitted
into the IL and checking fails.
The following patch fixes that (and we optimize that & -1 away even earlier
with that).
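For readers unfamiliar with why 0xffffffff is invalid here: CONST_INT values
are kept sign-extended from the precision of their mode, so the canonical
SImode form of that bit pattern is -1. The sketch below shows that
canonicalization in isolation; trunc_for_width is a hypothetical stand-in
written for this illustration, not GCC's implementation, and it assumes a
64-bit host integer.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low BITS bits of VALUE to 64 bits.  This mimics, in
   simplified form, canonicalizing a constant for a narrower mode.  */
int64_t
trunc_for_width (int64_t value, unsigned bits)
{
  assert (bits >= 1 && bits <= 64);
  if (bits == 64)
    return value;
  uint64_t mask = ((uint64_t) 1 << bits) - 1;
  uint64_t low = (uint64_t) value & mask;
  uint64_t sign = (uint64_t) 1 << (bits - 1);
  /* (low ^ sign) - sign propagates the sign bit into the upper bits.  */
  return (int64_t) ((low ^ sign) - sign);
}

/* Self-check with the constant from the description above.  */
void
check_simode_all_ones (void)
{
  /* 0xffffffff as a 32-bit constant canonicalizes to -1 ...  */
  assert (trunc_for_width (INT64_C (4294967295), 32) == -1);
  /* ... while it is already canonical as a 64-bit constant.  */
  assert (trunc_for_width (INT64_C (4294967295), 64)
          == INT64_C (4294967295));
}
```

Once the constant is -1 in SImode, the AND with it is trivially recognizable
as "& -1", which matches the remark that it gets optimized away earlier.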
2023-04-14 Jakub Jelinek <jakub@redhat.com>
* loop-iv.cc (iv_number_of_iterations): Use gen_int_mode instead
of GEN_INT.
testsuite: filter out warning noise for CWE-1341 test
The case file-CWE-1341-example.c checks [CWE-1341](`double-fclose`).
On some systems, besides [CWE-1341], a [CWE-415] message is
also reported. On those systems, attribute `malloc` may be attached to
fopen:
```
# 258 "/usr/include/stdio.h" 3 4
extern FILE *fopen (const char *__restrict __filename,
const char *__restrict __modes)
__attribute__ ((__malloc__)) __attribute__ ((__malloc__ (fclose, 1))) ;
or say: __attribute_malloc__ __attr_dealloc_fclose __wur;
```
See (PR analyzer/108722) for future fix in the analyzer.
This workaround patch adds -Wno-analyzer-double-free to this case.
Patrick Palka [Thu, 13 Apr 2023 20:02:21 +0000 (16:02 -0400)]
c++: 'typename T::X' vs 'struct T::X' lookup [PR109420]
r13-6098-g46711ff8e60d64 made make_typename_type no longer ignore
non-types during the lookup, unless the TYPENAME_TYPE in question was
followed by the :: scope resolution operator. But there is another
exception to this rule: we need to ignore non-types during the lookup
also if the TYPENAME_TYPE was named with a tag other than 'typename',
such as 'struct' or 'enum', since in that case we're dealing with an
elaborated-type-specifier and so [basic.lookup.elab] applies. This
patch implements this additional exception.
PR c++/109420
gcc/cp/ChangeLog:
* decl.cc (make_typename_type): Also ignore non-types during the
lookup if tag_type corresponds to an elaborated-type-specifier.
* pt.cc (tsubst) <case TYPENAME_TYPE>: Pass class_type or
enum_type as tag_type to make_typename_type accordingly instead
of always passing typename_type.
Jason Merrill [Tue, 4 Apr 2023 03:20:13 +0000 (23:20 -0400)]
c++: make trait of incomplete type a permerror [PR109277]
An incomplete type argument to several traits is specified to be undefined
behavior in the library; since it's a compile-time property, we diagnose
it. But apparently some code was relying on the previous behavior of not
diagnosing. So let's make it a permerror.
The assert in cxx_incomplete_type_diagnostic didn't like that, and I don't
see the point of having the assert, so let's just remove it.