Jonathan Wakely [Mon, 23 Dec 2024 21:51:24 +0000 (21:51 +0000)]
libstdc++: Use preprocessor conditions in std module [PR118177]
The std-clib.cc module definition file assumes that all names are
available unconditionally, but that's not true for all targets. Use the
same preprocessor conditions as are present in the <cxxx> headers.
A similar change is needed in std.cc.in for the <chrono> features that
depend on the SSO std::string, guarded with a __cpp_lib_chrono value
indicating full C++20 support.
The conditions for <cmath> are omitted from this change, as there are a
large number of them. That probably needs to be fixed.
libstdc++-v3/ChangeLog:
PR libstdc++/118177
* src/c++23/std-clib.cc.in: Use preprocessor conditions for
names which are not always defined.
* src/c++23/std.cc.in: Likewise.
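As a rough illustration of the approach described above (not the actual contents of std.cc.in), the sketch below guards using-declarations with a __cpp_lib_chrono condition. A plain namespace named `exported` stands in for the module's export block so that the snippet compiles without module support; that namespace name and the particular names chosen for the guarded group are assumptions made for illustration.

```cpp
#include <chrono>

// Hypothetical stand-in for the module's export block.
namespace exported
{
  // Names that are available on every target.
  using std::chrono::duration;
  using std::chrono::system_clock;

#if defined(__cpp_lib_chrono) && __cpp_lib_chrono >= 201907L
  // Only named when the library reports full C++20 <chrono> support.
  using std::chrono::tzdb;
  using std::chrono::zoned_time;
#endif
}
```

On a target where the macro has an older value, the guarded using-declarations are simply skipped instead of producing an error.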
libstdc++: add initializer_list constructor to std::span (P2447R6)
This commit implements P2447R6. The code is straightforward (just one
extra constructor, with constraints and conditional explicit).
I decided to suppress -Winit-list-lifetime because otherwise it would
give too many false positives. The new constructor is meant to be used
as a parameter-passing interface (this is a design choice, see
P2447R6/§2) and, as such, the initializer_list won't dangle despite
GCC's warnings.
The new constructor isn't 100% backwards compatible. A couple of
examples are included in Annex C, but I have also lifted some more
from R4. A new test checks for the old and the new behaviors.
libstdc++-v3/ChangeLog:
* include/bits/version.def: Add the new feature-testing macro.
* include/bits/version.h: Regenerate.
* include/std/span: Add constructor from initializer_list.
* testsuite/23_containers/span/init_list_cons.cc: New test.
* testsuite/23_containers/span/init_list_cons_neg.cc: New test.
Signed-off-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
Jonathan Wakely [Wed, 11 Dec 2024 22:56:08 +0000 (22:56 +0000)]
libstdc++: Avoid redundant assertions in std::span constructors
Any std::span<T, N> constructor with a runtime length has a precondition
that the length is equal to N (except when N == std::dynamic_extent).
Currently every constructor with a runtime length does:
  if constexpr (extent != dynamic_extent)
    __glibcxx_assert(n == extent);
We can move those assertions into the __detail::__extent_storage<N>
constructor so they are only done in one place. To avoid checking the
assertions when we have a constant length we can add a second
constructor which is consteval and takes an integral_constant<size_t, N>
argument. The std::span constructors can pass a size_t for runtime
lengths and a std::integral_constant<size_t, N> for constant lengths
that don't need to be checked.
The __detail::__extent_storage<dynamic_extent> specialization only needs
one constructor, as a std::integral_constant<size_t, N> argument can
implicitly convert to size_t.
For the member functions that return a subspan with a constant extent we
return std::span<T,C>(ptr, C) which is redundant in two ways. Repeating
the constant length C when it's already a template argument is
redundant, and using the std::span(T*, size_t) constructor implies a
runtime length which will do a redundant assertion check. Even though
that assertion won't fail and should be optimized away, it's still
unnecessary code that doesn't need to be instantiated and then optimized
away again. We can avoid that by adding a new private constructor that
only takes a pointer (wrapped in a custom tag struct to avoid
accidentally using that constructor) and automatically sets _M_extent to
the correct value.
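The split between checked runtime lengths and unchecked constant lengths can be sketched as follows. This is a simplified model, not the libstdc++ code: the names `extent_storage` and `dyn_extent` are hypothetical, and a static_assert in a constructor template stands in for the consteval and deleted constructors so that the sketch also compiles as C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for std::dynamic_extent.
constexpr std::size_t dyn_extent = static_cast<std::size_t>(-1);

template<std::size_t Extent>
class extent_storage
{
public:
  // Runtime length: the precondition is checked here, in one place.
  constexpr explicit extent_storage(std::size_t n) noexcept
  { assert(n == Extent); (void)n; }

  // Constant length: validated at compile time, no runtime check.
  template<std::size_t N>
  constexpr explicit
  extent_storage(std::integral_constant<std::size_t, N>) noexcept
  { static_assert(N == Extent, "constant length must equal the extent"); }

  constexpr std::size_t extent() const noexcept { return Extent; }
};

// Dynamic extent: one constructor is enough, because an
// integral_constant<size_t, N> converts implicitly to size_t.
template<>
class extent_storage<dyn_extent>
{
  std::size_t n_;

public:
  constexpr explicit extent_storage(std::size_t n) noexcept : n_(n) { }

  constexpr std::size_t extent() const noexcept { return n_; }
};
```

A caller that knows the length statically passes std::integral_constant<std::size_t, N>{} and never instantiates the runtime check.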
libstdc++-v3/ChangeLog:
* include/std/span (__detail::__extent_storage): Check
precondition in constructor. Add consteval constructor for valid
lengths and deleted constructor for invalid constant lengths.
Make member functions always_inline.
(__detail::__span_ptr): New class template.
(span): Adjust constructors to use a std::integral_constant
value for constant lengths. Declare all specializations of
std::span as friends.
(span::first<C>, span::last<C>, span::subspan<O,C>): Use new
private constructor.
(span(__span_ptr<T>)): New private constructor for constant
lengths.
Jonathan Wakely [Wed, 18 Dec 2024 12:57:14 +0000 (12:57 +0000)]
libstdc++: Handle errors from strxfrm in std::collate::transform [PR85824]
std::regex builds a cache of equivalence classes by calling
std::regex_traits<char>::transform_primary(c) for every char, which then
calls std::collate<char>::transform which calls strxfrm. On several
targets strxfrm fails for non-ASCII characters. Because strxfrm has no
return value reserved to indicate an error, some implementations return
INT_MAX or SIZE_MAX. This causes std::collate::transform to try to
allocate a huge buffer, which is either very slow or throws
std::bad_alloc. We should check errno after calling strxfrm to detect
errors and then throw a more appropriate exception instead of trying to
allocate a huge buffer.
Unfortunately the std::collate<C>::_M_transform function has a
non-throwing exception specifier, so we can't do the error handling
there.
As well as checking errno, this patch changes std::collate::do_transform
to use __builtin_alloca for small inputs, and to use RAII to deallocate
the buffers used for large inputs.
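A minimal sketch of the errno check, assuming a free function rather than the facet member. The names `errno_guard` and `checked_transform` are hypothetical, std::runtime_error is only a placeholder for whatever exception the library actually throws, and the small-input alloca path is left out.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Saves errno on construction and restores it on destruction, so the
// caller never observes the value we set or the one strxfrm leaves behind.
struct errno_guard
{
  int saved;
  errno_guard() : saved(errno) { }
  ~errno_guard() { errno = saved; }
  errno_guard(const errno_guard&) = delete;
  errno_guard& operator=(const errno_guard&) = delete;
};

// Hypothetical helper: transform a null-terminated string, reporting
// strxfrm failures instead of trusting a huge return value.
std::string checked_transform(const std::string& in)
{
  errno_guard guard;

  errno = 0;
  // With a zero-length buffer strxfrm only reports the size it needs.
  const std::size_t len = std::strxfrm(nullptr, in.c_str(), 0);
  if (errno != 0)
    throw std::runtime_error("strxfrm failed");

  std::string out(len + 1, '\0');  // room for the terminating null
  errno = 0;
  std::strxfrm(&out[0], in.c_str(), out.size());
  if (errno != 0)
    throw std::runtime_error("strxfrm failed");

  out.resize(len);
  return out;
}
```

The std::string buffer plays the role of the RAII-managed allocation: it is released on every exit path, including the throwing ones.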
This change isn't sufficient to fix the three std::regex bugs caused by
the lack of error handling in std::collate::do_transform, we also need
to make std::regex_traits::transform_primary handle exceptions. This
change also attempts to make transform_primary closer to the effects
described in the standard, by not even attempting to use std::collate if
the locale's std::collate facet has been replaced (see PR 118105).
Implementing the correct effects for transform_primary requires RTTI, so
that we don't use some user-defined std::collate facet with unknown
semantics. When -fno-rtti is used transform_primary just returns an
empty string, making equivalence classes unusable in std::basic_regex.
That's not ideal, but I don't have any better ideas.
I'm unsure if std::regex_traits<C>::transform_primary is supposed to
convert the string to lower case or not. The general regex traits
requirements ([re.req] p20) do say "when character case is not
considered" but the specification for the std::regex_traits<char> and
std::regex_traits<wchar_t> specializations ([re.traits] p7) don't say
anything about that.
With the r15-6317-geb339c29ee42aa change, transform_primary is not
called unless the regex actually uses an equivalence class. But using an
equivalence class would still fail (or be incredibly slow) on some
targets. With this commit, equivalence classes should be usable on all
targets, without excessive memory allocations.
Arguably, we should not even try to call transform_primary for any char
values over 127, since they're never valid in locales that use UTF-8 or
7-bit ASCII, and probably for other charsets too. Handling 128
exceptions for every std::regex compilation is very inefficient, but at
least it now works instead of failing with std::bad_alloc, and no longer
allocates 128 x 2GB. Maybe for C++26 we could check the locale's
std::text_encoding and use that to decide whether to cache equivalence
classes for char values over 127.
libstdc++-v3/ChangeLog:
PR libstdc++/85824
PR libstdc++/94409
PR libstdc++/98723
PR libstdc++/118105
* include/bits/locale_classes.tcc (collate::do_transform): Check
errno after calling _M_transform. Use RAII type to manage the
buffer and to restore errno.
* include/bits/regex.h (regex_traits::transform_primary): Handle
exceptions from std::collate::transform and do not try to use
std::collate for user-defined facets.
Jonathan Wakely [Tue, 17 Dec 2024 21:32:19 +0000 (21:32 +0000)]
libstdc++: Fix std::future::wait_until for subsecond negative times [PR118093]
The current check for negative times (i.e. before the epoch) only checks
for a negative number of seconds. For a time 1ms before the epoch the
seconds part will be zero, but the futex syscall will still fail with an
EINVAL error. Extend the check to handle this case.
This change adds a redundant check in the headers too, so that we avoid
even calling into the library for negative times. Both checks can be
marked [[unlikely]]. The check in the headers avoids the cost of
splitting the time into seconds and nanoseconds and then making a PLT
call. The check inside the library matches where we were checking
already, and fixes existing binaries that were compiled against older
headers but use a newer libstdc++.so.6 at runtime.
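The subsecond case can be reproduced with plain <chrono> arithmetic. In this sketch the helper names `split`, `old_check` and `new_check` are hypothetical and only model the two conditions described above; the real code works on the futex timeout values.

```cpp
#include <chrono>

using std::chrono::nanoseconds;
using std::chrono::seconds;

struct split_time
{
  seconds s;
  nanoseconds ns;
};

// Split a time since the epoch into whole seconds and the remainder.
// duration_cast truncates toward zero, so -1ms yields 0s and -1000000ns.
inline split_time split(nanoseconds since_epoch)
{
  const seconds s = std::chrono::duration_cast<seconds>(since_epoch);
  return { s, since_epoch - s };
}

// Old condition: only the seconds part is inspected.
inline bool old_check(nanoseconds t)
{ return split(t).s.count() < 0; }

// Extended condition: a negative subsecond part also means "before epoch".
inline bool new_check(nanoseconds t)
{
  const split_time p = split(t);
  return p.s.count() < 0 || p.ns.count() < 0;
}
```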
libstdc++-v3/ChangeLog:
PR libstdc++/118093
* include/bits/atomic_futex.h (_M_load_and_test_until_impl):
Return false for times before the epoch.
* src/c++11/futex.cc (_M_futex_wait_until): Extend check for
negative times to check for subsecond times. Add unlikely
attribute.
(_M_futex_wait_until_steady): Likewise.
* testsuite/30_threads/future/members/118093.cc: New test.
We have several overloads of std::deque::_M_insert_aux, one of which is
variadic and called by std::deque::emplace. With a suitable set of
arguments to emplace, it's possible for one of the non-variadic
_M_insert_aux overloads to be selected by overload resolution, making
emplace ill-formed.
Rename the variadic _M_insert_aux to _M_emplace_aux so that calls to
emplace never select an _M_insert_aux overload. Also add an inline
_M_insert_aux for the const lvalue overload that is called from
insert(const_iterator, const value_type&).
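The overload-resolution hazard can be shown with a toy class. This is a sketch of the general mechanism only: `Seq` and its members are hypothetical, the helpers return a label instead of inserting anything, and the argument list is just one combination for which the non-template overload ties with the variadic one and therefore wins.

```cpp
#include <cstddef>
#include <string>
#include <utility>

struct Seq
{
  // Non-variadic helper, meant for insert(pos, n, value).
  std::string insert_aux(int, std::size_t, const int&)
  { return "fill"; }

  // Variadic helper sharing the same name, meant for emplace(pos, args...).
  template<typename... Args>
  std::string insert_aux(int, Args&&...)
  { return "emplace"; }

  // The same variadic helper under a distinct name.
  template<typename... Args>
  std::string emplace_aux(int, Args&&...)
  { return "emplace"; }

  // Before the rename: overload resolution may pick the non-template.
  template<typename... Args>
  std::string emplace_before(int pos, Args&&... args)
  { return insert_aux(pos, std::forward<Args>(args)...); }

  // After the rename: only the variadic helper is a candidate.
  template<typename... Args>
  std::string emplace_after(int pos, Args&&... args)
  { return emplace_aux(pos, std::forward<Args>(args)...); }
};
```

With a std::size_t lvalue and a const int lvalue as arguments, both insert_aux candidates are exact matches, and the non-template function is preferred on the tie.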
Richard Biener [Wed, 8 Jan 2025 08:25:52 +0000 (09:25 +0100)]
tree-optimization/117979 - failed irreducible loop update from DCE
When CD-DCE creates forwarders to reduce false control dependences
it fails to update the irreducible state of edge and the forwarder
block in case the forwarder groups both normal (entry) and edges
from an irreducible region (necessarily backedges). This is because
when we split the first edge, if that's a normal edge, the forwarder
and its edge to the original block will not be marked as part
of the irreducible region but when we then redirect an edge from
within the region it becomes so.
The following fixes this up.
Note I think creating a forwarder that includes backedges is
likely not going to help, but at this stage I don't want to change
the CFG going into DCE. For regular loops we'll have a single
entry and a single backedge by means of loop init and will never
create a forwarder - so this is solely happening for irreducible
regions where it's harder to prove that such forwarder doesn't help.
PR tree-optimization/117979
* tree-ssa-dce.cc (make_forwarders_with_degenerate_phis):
Properly update the irreducible region state.
DWARF has recently voted in https://dwarfstd.org/issues/241209.1.html ,
which is basically just a guarantee that the DWARF 6 draft
DW_AT_language_{name,version} attribute codes and content of
https://dwarfstd.org/languages-v6.html can be used as an extension
in DWARF 5 and won't be changed.
So, this patch is an alternative to the
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669671.html
patch, which had the major problem that it required changing all the
DWARF consumers to be able to debug C17 or later or C++17 or later
sources.
This patch still uses DWARF 5 DW_LANG_C11 or DW_LANG_C_plus_plus_14,
the latest code in DWARF 5 proper, so all DWARF 5 capable consumers
should be able to deal with that, but additionally emits the
DWARF 6 attributes so that newer DWARF consumers can see it isn't
just C++14 but say C++23 or C11 but C23. Consumers which don't know
those DWARF 6 attributes would just ignore them. This is like any other
-gno-strict-dwarf extension, except that normally we emit say DWARF 5
codes where possible only after DWARF 5 is released, while in this case
there is a guarantee it can be used before DWARF 6 is released.
2025-01-08 Jakub Jelinek <jakub@redhat.com>
include/
* dwarf2.h (enum dwarf_source_language): Fix comment pasto.
(enum dwarf_source_language_name): New type.
* dwarf2.def (DW_AT_language_name, DW_AT_language_version): New
DWARF 6 codes.
gcc/
* dwarf2out.cc (break_out_comdat_types): Copy over
DW_AT_language_{name,version} if present.
(output_skeleton_debug_sections): Remove also
DW_AT_language_{name,version}.
(gen_compile_unit_die): For C17, C23, C2Y, C++17, C++20, C++23
and C++26 emit for -gdwarf-5 -gno-strict-dwarf also
DW_AT_language_{name,version} attributes.
gcc/testsuite/
* g++.dg/debug/dwarf2/lang-cpp17.C: Add -gno-strict-dwarf to
dg-options. Check also for DW_AT_language_{name,version} values.
* g++.dg/debug/dwarf2/lang-cpp20.C: Likewise.
* g++.dg/debug/dwarf2/lang-cpp23.C: New test.
Richard Biener [Tue, 7 Jan 2025 10:15:43 +0000 (11:15 +0100)]
tree-optimization/118269 - SLP reduction chain and early breaks
When we create the SLP reduction chain epilogue for the PHIs for
the early exit we fail to properly classify the reduction as SLP
reduction chain. The following fixes the corresponding checks.
PR tree-optimization/118269
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Use the correct stmt for the REDUC_GROUP_FIRST_ELEMENT lookup.
* gcc.dg/vect/vect-early-break_131-pr118269.c: New testcase.
Jeevitha [Wed, 8 Jan 2025 07:03:12 +0000 (01:03 -0600)]
testsuite: Simplify target test and dg-options for AMO tests
Removed powerpc*-*-* from the target test as it is always true. Simplified
options by removing -mpower9-misc and -mvsx, which are enabled by default with
-mdejagnu-cpu=power9. The has_arch_pwr9 check is also true with
-mdejagnu-cpu=power9, so it has been removed.
* gcc.target/powerpc/amo1.c: Removed powerpc*-*-* from the target and
simplified dg-options.
* gcc.target/powerpc/amo2.c: Simplified dg-options and added powerpc_vsx
target check.
Hongyu Wang [Thu, 2 Jan 2025 02:29:27 +0000 (10:29 +0800)]
i386: Add br_mispredict_scale in cost table.
On later processors the pipeline is deeper, so the penalty for a
mispredicted branch can be larger than before. Add a new parameter
br_mispredict_scale to describe the penalty, and use it in the
noce_max_ifcvt_seq_cost hook to allow longer sequences to be
converted with cmove.
This improves cpu2017 544 with -Ofast -march=native by 14% on P-core
SPR, and by 8% on E-core SRF. No other regression observed.
gcc/ChangeLog:
* config/i386/i386.cc (ix86_noce_max_ifcvt_seq_cost): Adjust
cost with ix86_tune_cost->br_mispredict_scale.
* config/i386/i386.h (processor_costs): Add br_mispredict_scale.
* config/i386/x86-tune-costs.h: Add new br_mispredict_scale to
all processor_costs, in which icelake_cost/alderlake_cost
with value COSTS_N_INSNS (2) + 3 and other processor with value
COSTS_N_INSNS (2).
Pan Li [Thu, 12 Dec 2024 02:48:08 +0000 (10:48 +0800)]
Match: Refactor the signed SAT_* match for saturated value [NFC]
This patch refactors all the signed SAT_* patterns for the saturated
value, i.e. overflow to INT_MAX when > 0 and underflow to INT_MIN
when < 0. This lets us remove several duplicated expressions from the
different patterns.
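For reference, the saturated value described above can be written as ordinary C++. This sketch shows the semantics only and says nothing about the match.pd pattern forms; the names `sat_value`, `sat_add` and `sat_sub` are hypothetical, and a 64-bit intermediate is used so that the code itself never overflows.

```cpp
#include <cstdint>
#include <limits>

// The saturated value shared by the signed operations: the direction of
// saturation follows the sign of the first operand.
constexpr std::int32_t sat_value(std::int32_t x)
{
  return x < 0 ? std::numeric_limits<std::int32_t>::min()
               : std::numeric_limits<std::int32_t>::max();
}

// True if a 64-bit result is representable as a 32-bit signed integer.
constexpr bool fits_i32(std::int64_t v)
{
  return v >= std::numeric_limits<std::int32_t>::min()
         && v <= std::numeric_limits<std::int32_t>::max();
}

constexpr std::int32_t sat_add(std::int32_t x, std::int32_t y)
{
  return fits_i32(std::int64_t{x} + y)
         ? static_cast<std::int32_t>(std::int64_t{x} + y)
         : sat_value(x);
}

constexpr std::int32_t sat_sub(std::int32_t x, std::int32_t y)
{
  return fits_i32(std::int64_t{x} - y)
         ? static_cast<std::int32_t>(std::int64_t{x} - y)
         : sat_value(x);
}
```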
The following test suites pass with this patch.
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.
gcc/ChangeLog:
* match.pd: Extract saturated value match for signed SAT_*.
Pan Li [Wed, 11 Dec 2024 11:37:06 +0000 (19:37 +0800)]
Match: Refactor the signed SAT_TRUNC match patterns [NFC]
This patch refactors all the signed SAT_TRUNC patterns, namely:
* Extract type check outside.
* Re-arrange the related match pattern forms together.
The following test suites pass with this patch.
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.
gcc/ChangeLog:
* match.pd: Refactor sorts of signed SAT_TRUNC match patterns.
Pan Li [Wed, 11 Dec 2024 11:09:08 +0000 (19:09 +0800)]
Match: Refactor the signed SAT_SUB match patterns [NFC]
This patch refactors all the signed SAT_SUB patterns, namely:
* Extract type check outside.
* Re-arrange the related match pattern forms together.
The following test suites pass with this patch.
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.
gcc/ChangeLog:
* match.pd: Refactor sorts of signed SAT_SUB match patterns.
This improves codegen for x264 sum of absolute difference routines.
The insn count is same, but we avoid double widening ops and ensuing
whole register moves.
Also, for more general applicability, we chose to implement the abs diff
variant rather than the sum of abs diff variant.
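For context, a sum of absolute differences routine has roughly the following shape. This is a generic sketch, not the x264 source: the names `abs_diff` and `sad` are hypothetical, and whether a given compiler maps the inner operation to an absolute-difference instruction is not something the snippet can guarantee.

```cpp
#include <cstddef>
#include <cstdint>

// Absolute difference of two unsigned bytes; the result fits in a byte,
// so no widening is needed for this step.
inline std::uint8_t abs_diff(std::uint8_t a, std::uint8_t b)
{
  return static_cast<std::uint8_t>(a > b ? a - b : b - a);
}

// Sum of absolute differences over n pixels; only the accumulation
// needs a wider type.
inline unsigned sad(const std::uint8_t* a, const std::uint8_t* b,
                    std::size_t n)
{
  unsigned sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += abs_diff(a[i], b[i]);
  return sum;
}
```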
Suggested-by: Robin Dapp <rdapp@ventanamicro.com>
Co-authored-by: Pan Li <pan2.li@intel.com>
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
PR target/117722
Keith Packard [Tue, 7 Jan 2025 21:54:11 +0000 (14:54 -0700)]
[PATCH] libgcc/m68k: More fixes for soft float
Fix __extenddfxf2:
* Remove bogus denorm handling block which would never execute --
the converted exp value is always positive as EXCESSX > EXCESSD.
* Compute the whole significand in dl instead of doing part of it in
ldl.
* Mask off exponent from dl.l.upper so the denorm shift test
works.
* Insert the hidden one bit into dl.l.upper as needed.
Fix __truncxfdf2 denorm handling. All that is required is to shift the
significand right by the correct amount; it already has all of the
necessary bits set including the explicit one. Compute the shift
amount, then perform the wide shift across both elements of the
significand.
Fix __fixxfsi:
* The value was off by a factor of two as the significand contains
32 bits, not 31 so we need to shift by one more than the equivalent
code in __fixdfsi.
* Simplify the code having realized that the lower 32 bits of the
significand can never appear in the results.
Return positive qNaN instead of negative. For floats, qNaN is 0x7fff_ffff. For
doubles, qNaN is 0x7fff_ffff_ffff_ffff.
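The two bit patterns can be checked on any host with IEEE 754 binary32 and binary64 types. The helpers `float_from_bits` and `double_from_bits` below are hypothetical and exist only for this check; they are not part of libgcc.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a float (assumes IEEE 754 binary32).
inline float float_from_bits(std::uint32_t bits)
{
  static_assert(sizeof(float) == sizeof(bits), "float must be 32 bits");
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Reinterpret a 64-bit pattern as a double (assumes IEEE 754 binary64).
inline double double_from_bits(std::uint64_t bits)
{
  static_assert(sizeof(double) == sizeof(bits), "double must be 64 bits");
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}
```

With the sign bit clear, both patterns are NaNs for which std::signbit returns false; setting the top bit gives the negative NaN that was returned before.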
Return correctly signed zero on float and double divide underflow. This means
that Ld$underflow now expects d7 to contain the sign bit, just like the other
return paths.
libgcc/
* config/m68k/fpgnulib.c (extenddfxf2): Simplify code by removing code
that should never execute. Fix denorm shift test and insert hidden bit
as needed.
(__truncxfdf2): Properly compute and shift the significand right.
* config/m68k/lb1sf68.S (__fixxfsi): Correct shift counts and simplify.
(QUIET_NAN): Make it a positive quiet NaN and fix return values to inject
sign properly.
Jeff Law [Tue, 7 Jan 2025 21:27:28 +0000 (14:27 -0700)]
Fix testsuite expectations for RVV after recent change
Tamar's recent improvement to affine unsigned folding for exchange2
twiddles code generation for a couple of tests in the RVV testsuite just
enough to cause testsuite failures.
I've looked at both tests before/after Tamar's change and the code is clearly
better -- essentially tighter vector loops due to improvements in address
arithmetic. Additionally we have fewer vsetvls after Tamar's patch.
Given that, I'm just making the obvious adjustments to the expected assembly
and pushing to the trunk.
Jeff Law [Tue, 7 Jan 2025 19:20:15 +0000 (12:20 -0700)]
Fix regression in ft32 port after recent switch table adjustments
This is a trivial bug that showed up after Mark W's recent patch to not apply
the size limit on jump tables.
The ft32 port has limited immediate ranges on comparisons and the casesi
expander didn't honor those. It'd blindly pass along an out of range constant.
This patch adds the trivial adjustment to force an out of range constant into a
register. It fixes these regressions:
> Tests that now fail, but worked before (3 tests):
>
> ft32-sim: gcc: gcc.c-torture/compile/pr34093.c -O1 (test for excess errors)
> ft32-sim: gcc: gcc.dg/torture/pr106809.c -O1 (test for excess errors)
> ft32-sim: gcc: gcc.dg/torture/pr106809.c -O1 (test for excess errors)
Tested in my tester. No other tests were fixed.
gcc/
* config/ft32/ft32.md (casesi expander): Force operands[2] into
a register if it's not a suitable rimm operand.
Dimitar Dimitrov [Sun, 24 Nov 2024 10:22:13 +0000 (12:22 +0200)]
testsuite: RISC-V: Skip tests providing -march for ILP32E/ILP64E ABIs
Many test cases explicitly set -march with extensions which are not
compatible with the E ABI variants. This leads to spurious errors
when toolchain has been configured for RV32E base ISA and ILP32E ABI:
spawn ... -march=rv32gc_zbb ...
cc1: error: ILP32E ABI does not support the 'D' extension
Fix by skipping those tests if toolchain's default ABI is E.
testsuite: RISC-V: Skip tests using -mcpu= for ILP32E/ILP64E ABIs
The tests are specifying -mcpu with D extension, which is not compatible
with the ILP32E and ILP64E ABIs. Fix by skipping the tests if toolchain's
default ABI is an E variant.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr109508.c: Skip for E ABI.
* gcc.target/riscv/pr114139.c: Ditto.
Dimitar Dimitrov [Mon, 25 Nov 2024 18:48:00 +0000 (20:48 +0200)]
testsuite: RISC-V: Skip V and Zvbb tests for ILP32E/ILP64E ABIs
Some tests add options for V and Zvbb extensions, but those extensions
are not compatible with the E ABI variants. This leads to spurious test
failures when toolchain's default ABI is ILP32E or ILP64E:
spawn ... -march=rv32ecv_zvbb ...
cc1: error: ILP32E ABI does not support the 'D' extension
cc1: sorry, unimplemented: Currently the 'V' implementation requires the 'M' extension
Fix by skipping the tests when toolchain's default ABI is E variant.
Dimitar Dimitrov [Thu, 12 Dec 2024 18:22:59 +0000 (20:22 +0200)]
testsuite: RISC-V: Add effective target for E ABI variant
Add new effective target check for either ILP32E or ILP64E ABI variants.
Initial implementation only checks for RV32E or RV64E ISA, which in turn
implies that ILP32E/ILP64E ABI is used. The RV32I+ILP32E and
RV64I+ILP64E combinations are not yet caught by the check, but they
do not seem to be widely used currently.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (check_effective_target_riscv_abi_e):
New procedure.
Thomas Koenig [Tue, 7 Jan 2025 14:23:29 +0000 (15:23 +0100)]
Document unsigned constants in intrinsic modules.
gcc/fortran/ChangeLog:
* intrinsic.texi (ISO_FORTRAN_ENV): Also mention INT8 in the
text. Document UINT8, UINT16, UINT32 and UINT64.
(ISO_C_BINDING): New table for unsigned KIND numbers.
Wilco Dijkstra [Fri, 1 Nov 2024 14:40:26 +0000 (14:40 +0000)]
AArch64: Switch off early scheduling
The early scheduler takes up ~33% of the total build time, however it doesn't
provide a meaningful performance gain. This is partly because modern OoO cores
need far less scheduling, partly because the scheduler tends to create many
unnecessary spills by increasing register pressure. Building applications
56% faster is far more useful than ~0.1% improvement on SPEC, so switch off
early scheduling on AArch64. Codesize reduces by ~0.2%.
Fix various tests that depend on scheduling by explicitly adding -fschedule-insns.
gcc:
* common/config/aarch64/aarch64-common.cc: Switch off fschedule_insns.
Wilco Dijkstra [Fri, 1 Nov 2024 14:44:56 +0000 (14:44 +0000)]
AArch64: Block combine_and_move from creating FP literal loads
The IRA combine_and_move pass runs if the scheduler is disabled and aggressively
combines moves. The movsf/df patterns allow all FP immediates since they rely
on a split pattern. However splits do not happen during IRA, so the result is
extra literal loads. To avoid this, split early during expand and block
creation of FP immediates that need this split. Mark a few testcases that
rely on late splitting as xfail.
Tobias Burnus [Tue, 7 Jan 2025 15:43:30 +0000 (16:43 +0100)]
libgomp.texi: Minor update to omp_get_num_devices/omp_get_initial_device
libgomp/ChangeLog:
* libgomp.texi (OpenMP 6.0): Fix typo.
(omp_get_default_device): Update the wording as the value
returned by omp_get_initial_device is now ambiguous.
(omp_get_num_devices): Minor wording tweak.
(omp_get_initial_device): Note that the function may also
return omp_initial_device since OpenMP 6.
Tamar Christina [Mon, 6 Jan 2025 17:52:14 +0000 (17:52 +0000)]
perform affine fold to unsigned on non address expressions. [PR114932]
When the patch for PR114074 was applied we saw a good boost in exchange2.
This boost was partially caused by a simplification of the addressing modes.
With the patch applied IV opts saw the following form for the base addressing;
This is because the patch promoted multiplies where one operand is a constant
from a signed multiply to an unsigned one, to attempt to fold away the constant.
This patch attempts the same but due to the various problems with SCEV and
niters not being able to analyze the resulting forms (i.e. PR114322) we can't
do it during SCEV or in the general form like in fold-const like extract_muldiv
attempts.
Instead this applies the simplification during IVopts initialization when we
create the IV. This allows IV opts to see the simplified form without
influencing the rest of the compiler.
As mentioned in PR114074, it would be good to fix the missed optimization in the
other passes so we can perform this in general.
The reason this has a big impact on Fortran code is that Fortran doesn't seem to
have unsigned integer types. As such, all its addressing is created with
signed types and folding does not happen on them due to the possible overflow.
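Why the unsigned form is safe to fold can be seen from a small identity. The expression below is an assumed example, not one taken from the patch: in unsigned arithmetic, distributing a constant multiplication is exact modulo 2^64, so both forms agree even when an intermediate value wraps.

```cpp
#include <cstdint>

// (i + 1) * 4 as written, computed in an unsigned type.
constexpr std::uint64_t unfolded(std::uint64_t i)
{
  return (i + 1u) * 4u;
}

// The same value with the constant folded out: i * 4 + 4.
constexpr std::uint64_t folded(std::uint64_t i)
{
  return i * 4u + 4u;
}

// Unsigned arithmetic wraps, so the two forms agree for every input,
// including the ones where a signed computation would overflow.
static_assert(unfolded(7) == folded(7), "small value");
static_assert(unfolded(UINT64_MAX) == folded(UINT64_MAX), "wrapping value");
```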
Concretely, on AArch64 this changes the results from generation:
The two patches together result in a 10% performance increase in exchange2 in
SPECCPU 2017, a 4% reduction in binary size and a 5% improvement in compile
time. There's also a 5% performance improvement in fotonik3d and a similar
reduction in binary size.
The patch folds every IV to unsigned to canonicalize them. At the end of the
pass, match.pd will then remove unneeded conversions.
Note that we cannot force everything to unsigned: IVopts requires that array
address expressions remain as such. Folding them results in them becoming
pointer expressions for which some optimizations in IVopts do not run.
PR tree-optimization/114932
* gcc.dg/tree-ssa/pr64705.c: Update dump file scan.
* gcc.target/i386/pr115462.c: The testcase shares 3 IVs which calculates
the same thing but with a slightly different increment offset. The test
checks for 3 complex addressing loads, one for each IV. But with this
change they now all share one IV. That is the loop now only has one
complex addressing. This is ultimately driven by the backend costing
and the current costing says this is preferred so updating the testcase.
* gfortran.dg/addressing-modes_1.f90: New test.
Andrew Pinski [Sat, 16 Nov 2024 04:22:04 +0000 (20:22 -0800)]
cfgexpand: Handle integral vector types and constructors for scope conflicts [PR105769]
This is an expansion of the last patch to also track pointers via vector types and the
constructor that are used with vector types.
In this case we had:
```
_15 = (long unsigned int) &bias;
_10 = (long unsigned int) &cov_jn;
_12 = {_10, _15};
...
...
MEM <vector(2) long unsigned int> [(void *)&D.6172 + 32B] = _12;
MEM[(struct function *)&D.6157] ={v} {CLOBBER(bob)};
```
Tracking the pointers via vector types, so that they are considered alive
at the point where the store of the vector happens, fixes the bug by saying
the variable is alive at the same time as another variable is alive.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/105769
gcc/ChangeLog:
* cfgexpand.cc (vars_ssa_cache::operator()): For constructors
walk over the elements.
gcc/testsuite/ChangeLog:
* g++.dg/torture/pr105769-1.C: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Sat, 16 Nov 2024 04:22:03 +0000 (20:22 -0800)]
cfgexpand: Rewrite add_scope_conflicts_2 to use cache and look back further [PR111422]
After fixing loop-im to do the correct overflow rewriting
for pointer types too, we end up with code like:
```
_9 = (unsigned long) &g;
_84 = _9 + 18446744073709551615;
_11 = _42 + _84;
_44 = (signed char *) _11;
...
*_44 = 10;
g ={v} {CLOBBER(eos)};
...
n[0] = &f;
*_44 = 8;
g ={v} {CLOBBER(eos)};
```
This was not being recognized by the scope conflicts code, because it
only walked back one level rather than multiple levels.
This fixes the issue by having a cache which records all references to addresses
of stack variables.
Unlike the previous patch, this only records and looks at addresses of stack variables.
The cache uses a bitmap and uses the index as the bit to look at.
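The multi-level walk with a cache can be modelled outside the compiler. The sketch below is a toy under stated assumptions, not the cfgexpand.cc code: `Def`, `AddrCache` and `kMaxVars` are hypothetical, SSA names are plain integers, and each definition just lists the SSA names it uses and the stack variables whose address it takes.

```cpp
#include <bitset>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <vector>

constexpr std::size_t kMaxVars = 64;
using VarSet = std::bitset<kMaxVars>;  // bit i set: address of stack var i

// Toy model of an SSA definition.
struct Def
{
  std::vector<int> ssa_operands;  // other SSA names used by the definition
  std::vector<int> addr_of_vars;  // stack variable indices taken by address
};

class AddrCache
{
  const std::unordered_map<int, Def>& defs_;
  std::unordered_map<int, VarSet> cache_;

public:
  explicit AddrCache(const std::unordered_map<int, Def>& defs)
  : defs_(defs) { }

  // Which stack variables can the SSA name refer to, walking back
  // through as many definitions as necessary.
  const VarSet& lookup(int name)
  {
    auto hit = cache_.find(name);
    if (hit != cache_.end())
      return hit->second;

    VarSet result;
    std::vector<int> worklist{name};
    std::unordered_set<int> seen{name};
    while (!worklist.empty())
      {
        const int cur = worklist.back();
        worklist.pop_back();
        auto it = defs_.find(cur);
        if (it == defs_.end())
          continue;  // no visible definition, nothing to record
        for (int v : it->second.addr_of_vars)
          result.set(static_cast<std::size_t>(v));
        for (int op : it->second.ssa_operands)
          if (seen.insert(op).second)
            worklist.push_back(op);
      }
    return cache_.emplace(name, result).first->second;
  }
};
```

Fed the chain from the example above (_9 takes the address of g, _84 uses _9, _11 uses _84, _44 uses _11), a lookup of _44 reports g even though the address is three definitions away.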
* cfgexpand.cc (struct vars_ssa_cache): New class.
(vars_ssa_cache::vars_ssa_cache): New constructor.
(vars_ssa_cache::~vars_ssa_cache): New destructor.
(vars_ssa_cache::create): New method.
(vars_ssa_cache::exists): New method.
(vars_ssa_cache::add_one): New method.
(vars_ssa_cache::update): New method.
(vars_ssa_cache::dump): New method.
(add_scope_conflicts_2): Factor mostly out to
vars_ssa_cache::operator(). New cache argument.
Walk the bitmap cache for the stack variables addresses.
(vars_ssa_cache::operator()): New method factored out from
add_scope_conflicts_2. Rewrite to be a full walk of all operands
and use a worklist.
(add_scope_conflicts_1): Add cache new argument for the addr cache.
Just call add_scope_conflicts_2 for the phi result instead of calling
for the uses and don't call walk_stmt_load_store_addr_ops for phis.
Update call to add_scope_conflicts_2 to add cache argument.
(add_scope_conflicts): Add cache argument and update calls to
add_scope_conflicts_1.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr117426-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Sat, 16 Nov 2024 04:22:02 +0000 (20:22 -0800)]
cfgexpand: Factor out getting the stack decl index
This is the first patch in improving this code.
Since there are a few places which get the index and they
check the same thing, let's factor that out into one function.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* cfgexpand.cc (INVALID_STACK_INDEX): New define.
(decl_stack_index): New function.
(visit_op): Use decl_stack_index.
(visit_conflict): Likewise.
(add_scope_conflicts_1): Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Richard Biener [Tue, 7 Jan 2025 12:18:27 +0000 (13:18 +0100)]
rtl-optimization/118298 - constant iteration loops and #pragma unroll
When the RTL unroller handles constant iteration loops it bails out
prematurely when heuristics wouldn't apply any unrolling before
checking #pragma unroll.
PR rtl-optimization/118298
* loop-unroll.cc (decide_unroll_constant_iterations): Honor
loop->unroll even if the loop is too big for heuristics.
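For reference, a constant iteration loop with an explicit unroll request looks like this at the source level. The function `sum16` is a hypothetical example. `#pragma GCC unroll` is GCC's spelling of the request, and whether a particular loop is handled by the RTL unroller rather than an earlier pass depends on the optimization options in effect.

```cpp
// Sums 16 elements; the trip count is a compile-time constant.
int sum16(const int* v)
{
  int total = 0;
#pragma GCC unroll 4
  for (int i = 0; i < 16; ++i)
    total += v[i];
  return total;
}
```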
Richard Biener [Tue, 7 Jan 2025 14:07:12 +0000 (15:07 +0100)]
Fixup convert-dfp*.c
The testcases use -save-temps which doesn't play nice with -flto
and multilib testing resulting in spurious UNRESOLVED like
/usr/lib64/gcc/x86_64-suse-linux/14/../../../../x86_64-suse-linux/bin/ld: i386:x86-64 architecture of input file `./convert-dfp-2.ltrans0.ltrans.o' is incompatible with i386 output
The following skips the testcases when using -flto.
* gcc.dg/torture/convert-dfp-2.c: Skip with -flto.
* gcc.dg/torture/convert-dfp.c: Likewise.
Alexandre Oliva [Wed, 11 Dec 2024 13:16:58 +0000 (10:16 -0300)]
ada: Drop g-cpp* units not needed by the compiler
Having moved __gnat_convert_caught_object to g-cstyin.o, we can drop
other g-cpp* units that are now only needed by programs that actually use
their APIs to get more information about C++ exceptions and type_info
objects.
gcc/ada/ChangeLog:
* gcc-interface/Make-lang.in (GNAT_ADA_OBJS, GNATBIND_OBJS):
Drop g-cpp, g-cppexc and g-cppstd.
Eric Botcazou [Tue, 10 Dec 2024 09:24:47 +0000 (10:24 +0100)]
ada: Do not create temporaries for initialization statements
Assignment statements marked with the No_Ctrl_Actions or No_Finalize_Actions
flag are initialization statements and, therefore, no temporaries are needed
to hold the value of the right-hand side for them.
gcc/ada/ChangeLog:
* gcc-interface/trans.cc (Call_to_gnu): Always use the return slot
optimization if the parent node is an initialization statement.
(gnat_to_gnu) <N_Assignment_Statement>: Build an INIT_EXPR instead
of a MODIFY_EXPR if this is an initialization statement.
Eric Botcazou [Fri, 20 Dec 2024 15:49:50 +0000 (16:49 +0100)]
ada: Do not raise exceptions from Exp_Aggr.Packed_Array_Aggregate_Handled
An exception is now raised during bootstrap and this causes compatibility
issues with older compilers.
gcc/ada/ChangeLog:
* exp_aggr.adb (Packed_Array_Aggregate_Handled): Remove declaration
and handler for Not_Handled local exception. Check the return value
of Get_Component_Val instead.
(Get_Component_Val): Return No_Uint instead of raising Not_Handled.
Javier Miranda [Thu, 19 Dec 2024 10:41:59 +0000 (10:41 +0000)]
ada: Cleanup preanalysis of static expressions (part 2)
According to RM 13.14(8/4), a static expression in an aspect specification
does not cause freezing; however, the frontend performs many calls to
Preanalyze_Spec_Expression made during the analysis of aspects. This
patch, suggested by Eric Botcazou, takes care of this additional code
cleanup which requires also replacing many occurrences of the global
variable In_Spec_Expression by calls to Preanalysis_Active.
gcc/ada/ChangeLog:
* exp_util.adb (Insert_Actions): Document behavior under strict
preanalysis.
* sem.ads (In_Strict_Preanalysis): New subprogram.
(Preanalysis_Active): Replace 'and' operator by 'and then'.
* sem.adb (In_Strict_Preanalysis): Ditto.
* sem_attr.adb (Check_Dereference): Replace In_Spec_Expression
occurrence by call to Preanalysis_Active, and document it.
(Resolve_Attribute [Attribute_Access]): Ditto.
(Eval_Attribute): No evaluation under strict preanalysis.
(Validate_Static_Object_Name): No action under strict preanalysis.
* sem_ch13.adb (Check_Aspect_At_End_Of_Declarations): Replace
calls to Preanalyze_Spec_Expression by calls to Preanalyze_And_Resolve.
(Check_Aspect_At_Freeze_Point): Ditto.
(Resolve_Aspect_Expressions [Dynamic/Static/Predicate aspects]): Code
cleanup adjusting the code to emulate Preanalyze_And_Resolve, instead
of Preanalyze_Spec_Expression.
(Resolve_Aspect_Expressions [CPU/Interrupt_Priority/Priority/
Storage_Size aspects]): Replace calls to Preanalyze_Spec_Expression
by call to Preanalyze_And_Resolve.
* sem_ch3.adb (Analyze_Object_Declaration): Replace In_Spec_Expression
occurrence by call to Preanalysis_Active.
(Find_Type_Of_Object): Add documentation.
* sem_ch4.adb (Analyze_Case_Expression): Replace In_Spec_Expression
occurrence by call to Preanalysis_Active.
* sem_ch6.adb (Analyze_Expression_Function): Minor code reorganization
moving the code preanalyzing the expression after the new body has
been inserted in the tree to ensure that its Parent attribute is
available for preanalysis.
* sem_cat.adb (Validate_Static_Object_Name): No action under strict
preanalysis.
* sem_elab.adb (Check_For_Eliminated_Subprogram): Replace In_Spec_Expression
occurrence by call to Preanalysis_Active.
* sem_eval.adb (Eval_Intrinsic_Call [Name_Enclosing_Entity]): Ditto.
* sem_elim.adb (Check_For_Eliminated_Subprogram): Ditto.
* sem_res.adb (Resolve_Entity_Name): Ditto.
Piotr Trojanek [Thu, 19 Dec 2024 23:09:15 +0000 (00:09 +0100)]
ada: Improve protection against wrong use from GDB
A code cleanup in a routine intended to be used from GDB, suggested by running
GNATcheck rule Boolean_Negations. However, this code can be tuned to protect
against more illegal uses.
gcc/ada/ChangeLog:
* exp_disp.adb (Write_DT): Add guards that prevent crashes on illegal
node numbers.
Piotr Trojanek [Thu, 19 Dec 2024 14:32:56 +0000 (15:32 +0100)]
ada: Remove dead code in detection of null record definitions
Code cleanup; behavior is unaffected.
gcc/ada/ChangeLog:
* sem_util.adb (Is_Null_Record_Definition): Remove check for
Component_List being present after using it; replace check for
component item being a component declaration with an assertion;
fix style in comment.
This patch fixes two problems with how abort was deferred in finally
parts. First, calls to runtime subprograms are now omitted when
aborting is disallowed by active restrictions. Second, Abort_Undefer is
now correctly called when the finally part propagates an exception.
Steve Baird [Tue, 17 Dec 2024 21:27:04 +0000 (13:27 -0800)]
ada: Improved checking of uses of package renamings
In some cases, the RM 8.5.1(3.1) legality rule about uses of renamings of
limited views of packages was implemented incorrectly, resulting in rejecting
legal uses.
gcc/ada/ChangeLog:
* gen_il-fields.ads: add new Renames_Limited_View field.
* gen_il-gen-gen_entities.adb: add Renames_Limited_View flag for
packages.
* einfo.ads: add comment documenting Renames_Limited_View flag.
* sem_ch8.adb (Analyze_Package_Renaming): Set new Renames_Limited_View
flag. Test new Renames_Limited_View flag instead of calling
Has_Limited_With. If Has_Limited_With is True, that just means
that somebody, sometime during this compilation needed to
reference the limited view of the package; so that function
returns True too often to be used here.
(Find_Expanded_Name): Test new Renames_Limited_View flag instead of
calling Has_Limited_With.
Piotr Trojanek [Tue, 30 Jan 2024 00:10:17 +0000 (01:10 +0100)]
ada: Remove flag Is_Inherited_Pragma which is only set and never used
Code cleanup; behavior is unaffected. Flag Is_Inherited_Pragma is only set in
GNAT, but is not actually used, neither by the compiler nor by any backend.
gcc/ada/ChangeLog:
* contracts.adb (Inherit_Pragma): Don't set flag Is_Inherited_Pragma.
* gen_il-fields.ads (Opt_Field_Enum): Remove field identifier.
* gen_il-gen-gen_nodes.adb (N_Pragma): Remove field from node.
* sinfo.ads (Is_Inherited_Pragma): Remove field description.
(N_Pragma): Remove field reference.
Piotr Trojanek [Tue, 26 Mar 2024 15:23:41 +0000 (16:23 +0100)]
ada: Handle attributes related to Ada 2012 iterators as internal
Use existing machinery for internal attributes to handle attributes
related to Ada 2012 iterators. All these attributes exist exclusively
as a means to delay processing.
Code cleanup. The only change in behavior is the wording of the error
emitted when one of the internal attributes appears in source code:
from "illegal attribute" (which used to be emitted in the analysis)
to "unrecognized attribute" (which is emitted by the parser).
gcc/ada/ChangeLog:
* exp_attr.adb (Expand_N_Attribute_Reference): Remove explicit
handling of attributes related to Ada 2012 iterators.
* sem_attr.adb (Analyze_Attribute, Eval_Attribute): Likewise;
move attribute Reduce according to alphabetic order.
* snames.adb-tmpl (Get_Attribute_Id): Add support for new internal
attributes.
* snames.ads-tmpl: Recognize names of new internal attributes.
(Attribute_Id): Recognize new internal attributes.
(Internal_Attribute_Id): Likewise.
(Is_Internal_Attribute_Name): Avoid duplication in comment.
Piotr Trojanek [Thu, 2 Mar 2023 21:43:12 +0000 (22:43 +0100)]
ada: Remove unnecessary qualifiers for First/Next list operations
Code cleanup related to work on expression functions for GNATprove
(which require accessibility checks even when they are not expanded
and thus have no explicit return statements).
gcc/ada/ChangeLog:
* accessibility.adb (First_Selector): Remove redundant and locally
inconsistent parentheses.
(Check_Return_Construct_Accessibility): Remove qualifier from list
operation.
* sem_util.adb (Is_Prim_Of_Abst_Type_With_Nonstatic_CW_Pre_Post):
Likewise.
Eric Botcazou [Wed, 18 Dec 2024 09:16:15 +0000 (10:16 +0100)]
ada: Fix internal error on container aggregate for bounded vectors
The problem is that we analyze references to an object before the actual
subtype of the object is established, thus creating a type mismatch that
is flagged by the code generator.
gcc/ada/ChangeLog:
* exp_ch7.ads (Store_After_Actions_In_Scope_Without_Analysis): New
procedure declaration.
* exp_ch7.adb (Store_New_Actions_In_Scope): New procedure.
(Store_Actions_In_Scope): Call Store_New_Actions_In_Scope when the
target list is empty.
(Store_After_Actions_In_Scope_Without_Analysis): New procedure body.
* exp_aggr.adb (Expand_Container_Aggregate): For a declaration that
is wrapped in a transient scope, also defer the analysis of the new
code until after the declaration is analyzed.
Eric Botcazou [Tue, 17 Dec 2024 19:00:38 +0000 (20:00 +0100)]
ada: Add guard to System.Val_Real.Large_Powfive against pathological input
There is no need to keep multiplying the result once it saturates to +Inf.
gcc/ada/ChangeLog:
* libgnat/s-powflt.ads (Maxpow_Exact): Minor comment fix.
* libgnat/s-powlfl.ads (Maxpow_Exact): Likewise.
* libgnat/s-powllf.ads (Maxpow_Exact): Likewise.
* libgnat/s-valrea.adb (Large_Powfive) [1 parameter]: Exit the loop
as soon as the result saturates to +Inf.
(Large_Powfive) [2 parameters]: Likewise.
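The early exit described above can be sketched in C++ as follows. This is only an illustration of the idea, not the Ada code in s-valrea.adb: the function name `powfive`, the chunk size of 5**13 and the use of `double` are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch: compute 5**exp in double precision by repeated
// multiplication, leaving the loop as soon as the result saturates to +Inf.
double powfive(unsigned long exp)
{
  const double chunk = 1220703125.0;   // 5**13, exactly representable
  const unsigned long chunk_exp = 13;
  double result = 1.0;

  while (exp >= chunk_exp) {
    result *= chunk;
    exp -= chunk_exp;
    if (std::isinf(result))
      return result;                   // multiplying +Inf again changes nothing
  }

  // Handle the remaining exponent (fewer than 13 factors of 5).
  while (exp-- > 0)
    result *= 5.0;
  return result;
}
```

Without the guard, a pathological exponent would keep the first loop running for a number of iterations proportional to the exponent even though the result can no longer change.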
Piotr Trojanek [Mon, 16 Dec 2024 13:15:57 +0000 (14:15 +0100)]
ada: Move checks for consequences of Exceptional_Cases to GNAT
Previously checks for consequence expressions of Exceptional_Cases aspects were
done in GNATprove backend. However, we can do them in the frontend, where they
will apply to all subprograms, regardless of the SPARK_Mode aspect.
gcc/ada/ChangeLog:
* sem_prag.adb (Analyze_Exceptional_Cases_In_Decl_Part): Move check
from GNATprove backend to GNAT frontend.
Piotr Trojanek [Mon, 16 Dec 2024 12:52:43 +0000 (13:52 +0100)]
ada: Fix comments about Subprogram_Variant and Exceptional_Cases
The comment about Subprogram_Variant was outdated after more types have been
allowed by the corresponding SPARK RM rule; the comment about Exceptional_Cases
was incorrect, after being copy-pasted.
Steve Baird [Fri, 13 Dec 2024 01:06:00 +0000 (17:06 -0800)]
ada: Put_Image spec incorrectly ignored for Fixed_Point_Type'Base'Image call.
If a Put_Image aspect specification (introduced in Ada 2022) is given for a
fixed point type Fx, then in some cases a call to Fx'Base'Image would
incorrectly ignore the aspect specification and would instead return the
pre-Ada2022 version of the image. However, a call to Fx'Image would do the
right thing.
gcc/ada/ChangeLog:
* exp_put_image.adb (Image_Should_Call_Put_Image): Cope with the case
where the attribute prefix for an Image attribute reference
denotes an Itype constructed for a fixed point type. Calling
Has_Aspect with such an Itype misses applicable aspect
specifications; we need to look on the right list. This comes up
if the prefix of the attribute reference is
Some_Fixed_Point_Type'Base.
Gary Dismukes [Fri, 13 Dec 2024 23:36:05 +0000 (23:36 +0000)]
ada: Error on instantiation with defaulted formal type referencing other formal type
The compiler wasn't accounting for default subtypes on generic formal types
that reference other formal types of the same generic, leading to errors
about invalid subtypes. Several other problems that could lead to blowups
or incorrect errors were noticed through testing related cases and fixed
along the way.
gcc/ada/ChangeLog:
* sem_ch12.adb (Analyze_One_Association): In the case of a formal type
that has a Default_Subtype_Mark that does not have its Entity field set,
this means the default refers to another formal type of the same generic
formal part, so locate the matching subtype in the Result_Renamings and
set Match's Entity to that subtype prior to the call to Instantiate_Type.
(Validate_Formal_Type_Default.Reference_Formal): Add test of Entity being
Present, to prevent blowups on End_Label ids (which don't have Entity set).
(Validate_Formal_Type_Default.Validate_Derived_Type_Default): Apply
Base_Type to Formal.
(Validate_Formal_Type_Default): Guard interface-related semantic checks
with a test of Is_Tagged_Type.
Eric Botcazou [Mon, 16 Dec 2024 07:59:26 +0000 (08:59 +0100)]
ada: Restrict previous change made to expansion of allocators
There is no need to build a cleanup if exceptions cannot be propagated.
gcc/ada/ChangeLog:
* exp_ch4.adb (Expand_Allocator_Expression): Do not build a cleanup
if restriction No_Exception_Propagation is active.
* exp_ch6.adb (Make_Build_In_Place_Call_In_Allocator): Likewise.
Deng Jianbo [Tue, 31 Dec 2024 11:33:23 +0000 (19:33 +0800)]
LoongArch: Optimize initializing fp register to zero
LoongArch currently uses the instruction movgr2fr.{d|w} to move zero
from a fixed-point register to a floating-point register when initializing
an fp register to zero. When LSX or LASX is enabled, we can use the instruction
vxor.v, which has lower latency than movgr2fr.{d|w}, to set the fp
register to zero directly.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_output_move):
Optimize instructions for initializing fp register to zero.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/mov-zero-1.c: New test.
* gcc.target/loongarch/mov-zero-2.c: New test.
Gaius Mulley [Tue, 7 Jan 2025 11:20:45 +0000 (11:20 +0000)]
[PR modula2/118010, modula2/118183] Unable to rebuild the bootstrap tools and Wtypemismatch
This patch combines fixes for both PR-118010 (Wtypemismatch) and PR-118183
(unable to rebuild the bootstrap tools). PR-118010 required a new data
type (COFF_T) to be exported from SYSTEM and used in all return
types for libc.lseek. The patch also includes COFF_T implemented in mc
and this data type has been propagated through the translated versions
of pge and mc. Finally the patch adjusts the modula-2 declaration of
location_t to reflect the new gcc 64 bit type.
A new command line option -fm2-file-offset-bits= has been implemented to
override the default 64 bit declaration of COFF_T.
gcc/ChangeLog:
PR modula2/118010
* doc/gm2.texi (Compiler options): New option
-fm2-file-offset-bits=.
Fortran: Extend cyclic type detection for deallocate [PR116669]
Using cycles in derived/class types led to the compiler doing an endless
recursion in several locations, when the cycle was not immediate.
An immediate cyclic dependency is present in, for example T T::comp.
Cyclic dependencies of the form T T2::comp; T2 T::comp2; are now
detected and the recursive bit in the derived type's attr is set.
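As a rough illustration of detecting a non-immediate cycle, the sketch below models derived types as a graph and walks it with a set of already visited types, which is what keeps the walk from recursing endlessly. The names `TypeGraph`, `reaches` and `is_cyclic` are hypothetical and do not correspond to gfortran's internal data structures.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each derived type name maps to the type names
// of its components.
using TypeGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first walk from `from`; `visited` records types already explored
// so that a cycle elsewhere in the graph cannot cause endless recursion.
static bool reaches(const TypeGraph &g, const std::string &from,
                    const std::string &target, std::set<std::string> &visited)
{
  if (!visited.insert(from).second)
    return false;                      // already explored this type
  auto it = g.find(from);
  if (it == g.end())
    return false;                      // type without components
  for (const std::string &comp : it->second)
    if (comp == target || reaches(g, comp, target, visited))
      return true;
  return false;
}

// A type is cyclic if it can be reached again through its own components,
// either immediately (T T::comp) or through other types.
bool is_cyclic(const TypeGraph &g, const std::string &type)
{
  std::set<std::string> visited;
  return reaches(g, type, type, visited);
}
```

With the graph `T -> T2`, `T2 -> T`, both `T` and `T2` are reported as cyclic, while a type that merely contains `T` is not.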
gcc/fortran/ChangeLog:
PR fortran/116669
* class.cc (gfc_find_derived_vtab): Use attr to determine cyclic
type dependencies.
* expr.cc (gfc_has_default_initializer): Prevent endless
recursion by storing already visited derived types.
* resolve.cc (resolve_cyclic_derived_type): Determine if a type
is used in its hierarchy in a cyclic way.
(resolve_fl_derived0): Call resolve_cyclic_derived_type.
(resolve_fl_derived): Ensure vtab is generated when cyclic
derived types have allocatable components.
* trans-array.cc (structure_alloc_comps): Prevent endless loop
for derived type cycles.
* trans-expr.cc (gfc_get_ultimate_alloc_ptr_comps_caf_token):
Off topic, just prevent memory leaks.
gcc/testsuite/ChangeLog:
* gfortran.dg/class_array_15.f03: Freeing more memory.
* gfortran.dg/recursive_alloc_comp_6.f90: New test.
This patch removes the AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS tunable and
use_new_vector_costs entry in aarch64-tuning-flags.def and makes the
AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS paths in the backend the
default. To that end, the function aarch64_use_new_vector_costs_p and its uses
were removed. To prevent costing vec_to_scalar operations with 0, as
described in
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665481.html,
we adjusted vectorizable_store such that the variable n_adjacent_stores
also covers vec_to_scalar operations. This way vec_to_scalar operations
are not costed individually, but as a group.
As suggested by Richard Sandiford, the "known_ne" in the multilane-check
was replaced by "maybe_ne" in order to treat nunits==1+1X as a vector
rather than a scalar.
Two tests were adjusted due to changes in codegen. In both cases, the
old code performed loop unrolling once, but the new code does not:
Example from gcc.target/aarch64/sve/strided_load_2.c (compiled with
-O2 -ftree-vectorize -march=armv8.2-a+sve -mtune=generic -moverride=tune=none):
f_int64_t_32:
cbz w3, .L92
mov x4, 0
uxtw x3, w3
+ cntd x5
+ whilelo p7.d, xzr, x3
+ mov z29.s, w5
mov z31.s, w2
- whilelo p6.d, xzr, x3
- mov x2, x3
- index z30.s, #0, #1
- uqdecd x2
- ptrue p5.b, all
- whilelo p7.d, xzr, x2
+ index z30.d, #0, #1
+ ptrue p6.b, all
.p2align 3,,7
.L94:
- ld1d z27.d, p7/z, [x0, #1, mul vl]
- ld1d z28.d, p6/z, [x0]
- movprfx z29, z31
- mul z29.s, p5/m, z29.s, z30.s
- incw x4
- uunpklo z0.d, z29.s
- uunpkhi z29.d, z29.s
- ld1d z25.d, p6/z, [x1, z0.d, lsl 3]
- ld1d z26.d, p7/z, [x1, z29.d, lsl 3]
- add z25.d, z28.d, z25.d
+ ld1d z27.d, p7/z, [x0, x4, lsl 3]
+ movprfx z28, z31
+ mul z28.s, p6/m, z28.s, z30.s
+ ld1d z26.d, p7/z, [x1, z28.d, uxtw 3]
add z26.d, z27.d, z26.d
- st1d z26.d, p7, [x0, #1, mul vl]
- whilelo p7.d, x4, x2
- st1d z25.d, p6, [x0]
- incw z30.s
- incb x0, all, mul #2
- whilelo p6.d, x4, x3
+ st1d z26.d, p7, [x0, x4, lsl 3]
+ add z30.s, z30.s, z29.s
+ incd x4
+ whilelo p7.d, x4, x3
b.any .L94
.L92:
ret
Alexandre Oliva [Fri, 20 Dec 2024 21:02:08 +0000 (18:02 -0300)]
expand: drop stack adjustments after barrier [PR118006]
A gimple block with __builtin_unreachable () can't have code after it,
and gimple optimizers ensure there isn't any, even without
optimization. But if the block requires stack adjustments,
e.g. because of a call that passes arguments on the stack, expand will
emit that after the barrier, and then rtl checkers rightfully
complain. Arrange to discard adjustments after a barrier.
Strub expanders seem to be necessary to bring about the exact
conditions that require stack adjustments after the block that ends
with a __builtin_unreachable call.
for gcc/ChangeLog
PR middle-end/118006
* cfgexpand.cc (expand_gimple_basic_block): Do not emit
pending stack adjustments after a barrier.
Akram Ahmad [Mon, 6 Jan 2025 20:09:30 +0000 (20:09 +0000)]
aarch64: remove extra XTN in vector concatenation
GIMPLE code which performs a narrowing truncation on the result of a
vector concatenation currently results in an unnecessary XTN being
emitted following a UZP1 to concatenate the operands. In cases such as this,
UZP1 should instead use a smaller arrangement specifier to replace the
XTN instruction. This is seen in cases such as in this GIMPLE example:
int32x2_t foo (svint64_t a, svint64_t b)
{
vector(2) int vect__2.8;
long int _1;
long int _3;
vector(2) long int _12;
bar:
ptrue p3.b, all
uaddv d0, p3, z0.d
uaddv d1, p3, z1.d
uzp1 v0.2d, v0.2d, v1.2d
xtn v0.2s, v0.2d
ret
This patch therefore defines the *aarch64_trunc_concat<mode> insn which
truncates the concatenation result, rather than concatenating the
truncated operands (such as in *aarch64_narrow_trunc<mode>), resulting
in the following optimised assembly being emitted:
bar:
ptrue p3.b, all
uaddv d0, p3, z0.d
uaddv d1, p3, z1.d
uzp1 v0.2s, v0.2s, v1.2s
ret
This patch passes all regression tests on aarch64 with no new failures.
A supporting test for this optimisation is also written and passes.
OK for master? I do not have commit rights so I cannot push the patch
myself.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md: (*aarch64_trunc_concat)
new insn definition.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/truncated_concatenation_1.c: new test
for the above example and other modes covered by insn
definitions.
```
In file included from /gcc/src/libsanitizer/interception/interception.h:18,
from /gcc/src/libsanitizer/interception/interception_type_test.cpp:14:
/gcc/src/libsanitizer/interception/interception_type_test.cpp:30:61: error: static assertion failed
30 | COMPILER_CHECK((__sanitizer::is_same<::SSIZE_T, ::ssize_t>::value));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
/gcc/src/libsanitizer/sanitizer_common/sanitizer_internal_defs.h:363:44: note: in definition of macro 'COMPILER_CHECK'
363 | #define COMPILER_CHECK(pred) static_assert(pred, "")
| ^~~~
make[8]: *** [Makefile:469: interception_type_test.lo] Error 1
```
The culprit seems to be that we don't check for equality of type sizes
anymore but rather whether the types are indeed the same. On s390 -m31
we have that `sizeof(int)==sizeof(long)` holds which is why previously
the checks succeeded. They fail now because
```
size_t => unsigned long
ssize_t => long
ptrdiff_t => int
::SSIZE_T => __sanitizer::sptr => int
::PTRDIFF_T => __sanitizer::sptr => int
```
This is fixed by mapping `SSIZE_T` to `long` in the end.
For some targets uptr is mapped to unsigned int and size_t to unsigned
long and sizeof(int)==sizeof(long) holds. Still, these are distinct
types and type checking may fail. Therefore, replace uptr by
usize/SIZE_T wherever a size_t is expected.
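The difference between the two kinds of check can be shown with a small standalone example. The aliases `sptr_like` and `ssize_like` are hypothetical stand-ins for the typedefs involved, not the libsanitizer definitions.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for the two typedefs being compared. On a target
// such as s390 -m31 both underlying types are 32 bits wide.
using sptr_like = int;
using ssize_like = long;

// Size-based check: holds on targets where sizeof(int) == sizeof(long).
constexpr bool same_size = sizeof(sptr_like) == sizeof(ssize_like);

// Type-based check: requires the two types to be identical.
constexpr bool same_type = std::is_same<sptr_like, ssize_like>::value;

// int and long are distinct types in C++ regardless of their sizes,
// so a type-identity assertion fails even when the sizes agree.
static_assert(!same_type, "int and long are always distinct types");
```

Whether `same_size` is true depends on the target, whereas `same_type` is false everywhere, which is why a check that used to pass on such targets can start failing.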
Stafford Horne [Mon, 6 Jan 2025 12:12:40 +0000 (12:12 +0000)]
or1k: add .note.GNU-stack section on linux
In the OpenRISC build we get the following warning:
ld: warning: __modsi3_s.o: missing .note.GNU-stack section implies executable stack
ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
Fix this by adding a .note.GNU-stack to indicate the stack does not need to be
executable for the lib1funcs.
Note, this is also needed for the upcoming glibc 2.41.
libgcc/
* config/or1k/lib1funcs.S: Add .note.GNU-stack section on linux.
SVE intrinsics: Fold svmul by -1 to svneg for unsigned types
As follow-up to
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
this patch implements folding of svmul by -1 to svneg for
unsigned SVE vector types. The key idea is to reuse the existing code that
does this fold for signed types and feed it as callback to a helper function
that adds the necessary type conversions.
For example, for the test case
svuint64_t foo (svuint64_t x, svbool_t pg)
{
return svmul_n_u64_x (pg, x, -1);
}
the following gimple sequence is emitted (-O2 -mcpu=grace):
svuint64_t foo (svuint64_t x, svbool_t pg)
{
svint64_t D.12921;
svint64_t D.12920;
svuint64_t D.12919;
In general, the new helper gimple_folder::convert_and_fold
- takes a target type and a function pointer,
- converts the lhs and all non-boolean vector types to the target type,
- passes the converted lhs and arguments to the callback,
- receives the new gimple statement from the callback function,
- adds the necessary view converts to the gimple sequence,
- and returns the new call.
Because all arguments are converted to the same target types, the helper
function is only suitable for folding calls whose arguments are all of
the same type. If necessary, this could be extended to convert the
arguments to different types differentially.
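The fold is valid for unsigned types because unsigned arithmetic is modular: multiplying by the all-ones value that -1 converts to gives the same result as negation. The scalar sketch below only illustrates that identity. It uses plain `uint64_t` rather than SVE vectors, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Multiply by -1 converted to the unsigned type (all bits set).
std::uint64_t mul_by_minus_one(std::uint64_t x)
{
  return x * static_cast<std::uint64_t>(-1);
}

// Negation in modular (wrap-around) unsigned arithmetic.
std::uint64_t negate(std::uint64_t x)
{
  return std::uint64_t{0} - x;
}
```

Both functions return the same value for every input, for example `UINT64_MAX` for an input of 1.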
The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/ChangeLog:
* config/aarch64/aarch64-sve-builtins-base.cc
(svmul_impl::fold): Wrap code for folding to svneg in lambda
function and pass to gimple_folder::convert_and_fold to enable
the transform for unsigned types.
* config/aarch64/aarch64-sve-builtins.cc
(gimple_folder::convert_and_fold): New function that converts
operands to target type before calling callback function, adding the
necessary conversion statements.
(gimple_folder::redirect_call): Set fntype of redirected call.
(get_vector_type): Move from here to aarch64-sve-builtins.h.
* config/aarch64/aarch64-sve-builtins.h
(gimple_folder::convert_and_fold): Declare function.
(get_vector_type): Move here as inline function.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Adjust expected outcome.
* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
expected outcome.
Martin Jambor [Mon, 6 Jan 2025 10:58:29 +0000 (11:58 +0100)]
ipa-cp: Make dumping of bit masks representing -1 nicer
Dumps of the lattices representing bit-values and of propagation
results of bit-values can print a really long hexadecimal value when
the bit-value represents -1 (all bits set). This patch simply detects
that situation and prints the string "-1" in that case, making the
dumps somewhat nicer.
gcc/ChangeLog:
2025-01-03 Martin Jambor <mjambor@suse.cz>
* ipa-cp.cc (ipcp_print_widest_int): New function.
(ipcp_store_vr_results): Use it.
(ipcp_bits_lattice::print): Likewise. Fix formatting.
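A minimal sketch of the dumping idea, assuming a 64-bit mask in place of GCC's widest_int. The function name `format_mask` is hypothetical and is not the signature of ipcp_print_widest_int.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Format a bit mask for a dump: a mask with all bits set is printed as
// "-1", anything else as hexadecimal. uint64_t stands in for widest_int.
std::string format_mask(std::uint64_t mask)
{
  if (mask == UINT64_MAX)
    return "-1";
  std::ostringstream out;
  out << "0x" << std::hex << mask;
  return out.str();
}
```

The special case matters more for a wide integer type than for this 64-bit stand-in, because the all-ones value there prints as a much longer run of hexadecimal digits.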