gcc/configure: Fix check for assembler section merging support on Arm
In 32-bit Arm assembly, the @ character is the start of a comment so
the section type needs to use the % character instead.
configure.ac attempts to account for this difference by doing a second
try when checking the assembler for section merging support.
Unfortunately there is a bug: because the gcc_GAS_CHECK_FEATURE macro
has a call to AC_CACHE_CHECK, it will actually skip the second try
because the gcc_cv_as_shf_merge variable has already been set:
checking assembler for section merging support... no
checking assembler for section merging support... (cached) no
Fix by using a separate variable for the second try, as is done in the
check for COMDAT group support.
This problem was noticed because the recent binutils commit
d5cbf916be4a ("gas/ELF: also reject merge entity size being zero") caused
gas to be stricter about mergeable sections without an entity size:
configure:27013: checking assembler for section merging support
configure:27022: /path/to/as --fatal-warnings -o conftest.o conftest.s >&5
conftest.s: Assembler messages:
conftest.s:1: Warning: invalid merge / string entity size
conftest.s: Error: 1 warning, treating warnings as errors
configure:27025: $? = 1
configure: failed program was
.section .rodata.str, "aMS", @progbits, 1
configure:27036: result: no
In previous versions of gas the conftest.s program above was accepted
and configure detected support for section merging.
See also:
https://linaro.atlassian.net/browse/GNU-1427
https://sourceware.org/bugzilla/show_bug.cgi?id=32491
Tested on armv8l-linux-gnueabihf.
gcc/ChangeLog:
* configure.ac: Fix check for HAVE_GAS_SHF_MERGE on Arm targets.
* configure: Regenerate.
Jason Merrill [Tue, 24 Dec 2024 00:56:43 +0000 (19:56 -0500)]
c++: add ref checks in conversion code
While looking at another patch I noticed that on a few tests we were doing
nonsensical things like building a reference to a reference. Make sure we
catch that sooner. But let's be friendly in can_convert, since it doesn't
return a conversion that could be wrongly applied to a reference.
gcc/cp/ChangeLog:
* call.cc (implicit_conversion): Check that FROM isn't a reference
if we also got an EXPR argument.
(convert_like_internal): Check that EXPR isn't a reference.
(can_convert_arg): convert_from_reference if needed.
Jason Merrill [Mon, 23 Dec 2024 17:32:54 +0000 (12:32 -0500)]
c++: print stub object as std::declval
If the result of build_stub_object gets printed by %E it looks something
like '(A&&)1', which seems confusing. Let's instead print it as
'std::declval<A>()' since that's how the library writes the same idea.
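As a rough illustration of why the library spelling expresses the same idea
as the cast form, the sketch below checks that std::declval<A>() is an
unevaluated expression of type A&&. The class A and the takes overloads are
hypothetical names used only for this example.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical class used only for illustration.
struct A { int value; };

// Hypothetical overloads, declared but never defined: they are only used
// inside decltype to observe which value category gets selected.
std::true_type  takes(A&&);
std::false_type takes(const A&);

// std::declval<A>() has type A&& and may only appear in unevaluated
// contexts such as decltype or sizeof.
static_assert(std::is_same<decltype(std::declval<A>()), A&&>::value,
              "declval<A>() is an A&&");

// Overload resolution treats it as an rvalue, just like a cast to A&&.
constexpr bool declval_is_rvalue = decltype(takes(std::declval<A>()))::value;
```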
gcc/cp/ChangeLog:
* method.cc (is_stub_object): New.
* cp-tree.h (is_stub_object): Declare.
* error.cc (dump_expr): Use it.
Jason Merrill [Tue, 24 Dec 2024 00:57:56 +0000 (19:57 -0500)]
c++: fix conversion issues
Some issues caught by a check from another patch:
In the convert_like_internal bad_p handling, we are iterating from outside
to inside, so once we recurse into convert_like we need to stop looping.
In build_ramp_function, we're assigning REFERENCE_TYPE things, so we need to
build the assignment directly rather than rely on functions that implement
C++ semantics.
In omp_declare_variant_finalize_one, the parameter object building failed to
handle reference parms, and it seems simpler to just use build_stub_object
like other parts of the compiler.
Jakub Jelinek [Wed, 8 Jan 2025 19:07:47 +0000 (20:07 +0100)]
fortran: Bump MOD_VERSION to "16" [PR118337]
As mentioned in the PR, there is a *.mod incompatibility between GCC 14 and
GCC 15, at least when using iso_c_binding or iso_fortran_env intrinsic
modules, because new entries have been added to those modules in the middle,
causing changes in the constants emitted in the *.mod files.
Also, I fear modules produced with GCC 15 with -funsigned and using UNSIGNED
in the modules will be unreadable by GCC 14.
The following patch just bumps MOD_VERSION for this.
Note, a patch for accepting also MOD_VERSION "15" has been posted
incrementally.
2025-01-08 Jakub Jelinek <jakub@redhat.com>
PR fortran/118337
* module.cc (MOD_VERSION): Bump to "16".
aarch64_function_ok_for_sibcall required the caller and callee
to use the same PCS variant. However, it should be enough for the
callee to preserve at least as much register state as the caller;
preserving more state is fine.
ARM_PCS_AAPCS64, ARM_PCS_SIMD, and ARM_PCS_SVE agree on what
GPRs should be preserved. For the others:
Thus it's ok for something earlier in the list to tail call something
later in the list.
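The rule "the callee must preserve at least as much as the caller" can be
sketched as a subset test. This is only a model: the bitmask representation
and the names PreservedSet and ok_for_sibcall are assumptions made for the
example, not the code in aarch64.cc.

```cpp
#include <cstdint>

// Hypothetical model: bit i set means "register i is preserved across calls".
using PreservedSet = std::uint64_t;

// A tail call is acceptable when the callee preserves every register the
// caller is required to preserve; preserving extra registers is harmless.
constexpr bool ok_for_sibcall(PreservedSet caller_must_preserve,
                              PreservedSet callee_preserves) {
  return (caller_must_preserve & ~callee_preserves) == 0;
}
```

With this model, a caller that must preserve fewer registers can tail call a
callee that preserves more, but not the other way round.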
gcc/
PR target/107102
* config/aarch64/aarch64.cc (aarch64_function_ok_for_sibcall): Only
reject calls with different PCSes if the callee clobbers register
state that the caller must preserve.
gcc/testsuite/
PR target/107102
* gcc.target/aarch64/sve/sibcall_1.c: New test.
* gimplify.cc (gimplify_call_expr): Disable variant function's
append_args in 'omp dispatch' when invoking the variant directly
and not through the base function.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/append-args-4.c: New test.
* c-c++-common/gomp/append-args-5.c: New test.
the last of which wasn't converted to void and so we, since r15-6369,
do not take the "if (VOID_TYPE_P (type))" path, and try to set
D.2912 to false.
The last statement comes from build_disable_temp_cleanup.
convert_to_void is typically called from finish_expr_stmt, but we are
adding the cleanup statement via add_stmt which doesn't convert to void.
So I think we can use finish_expr_stmt instead.
PR c++/118169
gcc/cp/ChangeLog:
* typeck2.cc (split_nonconstant_init): Call finish_expr_stmt instead
of add_stmt.
Thomas Schwinge [Mon, 28 Nov 2022 09:37:26 +0000 (10:37 +0100)]
nvptx: Re-enable "Stack alignment causes use of alloca" test cases
These generally PASS nowadays, without requiring 'alloca'.
There were two exceptions: 'gcc.dg/torture/stackalign/pr16660-2.c',
'gcc.dg/torture/stackalign/pr16660-3.c', where variants specifying
'-O0' or '-fpic' FAILed with 'ptxas' of, for example, CUDA 10.0 due to:
nvptx-as: ptxas terminated with signal 11 [Segmentation fault], core dumped
That however is gone with 'ptxas' of, for example, CUDA 11.5 and later.
Jonathan Wakely [Mon, 23 Dec 2024 21:51:24 +0000 (21:51 +0000)]
libstdc++: Use preprocessor conditions in std module [PR118177]
The std-clib.cc module definition file assumes that all names are
available unconditionally, but that's not true for all targets. Use the
same preprocessor conditions as are present in the <cxxx> headers.
A similar change is needed in std.cc.in for the <chrono> features that
depend on the SSO std::string, guarded with a __cpp_lib_chrono value
indicating full C++20 support.
The conditions for <cmath> are omitted from this change, as there are a
large number of them. That probably needs to be fixed.
libstdc++-v3/ChangeLog:
PR libstdc++/118177
* src/c++23/std-clib.cc.in: Use preprocessor conditions for
names which are not always defined.
* src/c++23/std.cc.in: Likewise.
libstdc++: add initializer_list constructor to std::span (P2447R6)
This commit implements P2447R6. The code is straightforward (just one
extra constructor, with constraints and conditional explicit).
I decided to suppress -Winit-list-lifetime because otherwise it would
give too many false positives. The new constructor is meant to be used
as a parameter-passing interface (this is a design choice, see
P2447R6/§2) and, as such, the initializer_list won't dangle despite
GCC's warnings.
The new constructor isn't 100% backwards compatible. A couple of
examples are included in Annex C, but I have also lifted some more
from R4. A new test checks for the old and the new behaviors.
libstdc++-v3/ChangeLog:
* include/bits/version.def: Add the new feature-testing macro.
* include/bits/version.h: Regenerate.
* include/std/span: Add constructor from initializer_list.
* testsuite/23_containers/span/init_list_cons.cc: New test.
* testsuite/23_containers/span/init_list_cons_neg.cc: New test.
Signed-off-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
Jonathan Wakely [Wed, 11 Dec 2024 22:56:08 +0000 (22:56 +0000)]
libstdc++: Avoid redundant assertions in std::span constructors
Any std::span<T, N> constructor with a runtime length has a precondition
that the length is equal to N (except when N == std::dynamic_extent).
Currently every constructor with a runtime length does:
if constexpr (extent != dynamic_extent)
__glibcxx_assert(n == extent);
We can move those assertions into the __detail::__extent_storage<N>
constructor so they are only done in one place. To avoid checking the
assertions when we have a constant length we can add a second
constructor which is consteval and takes an integral_constant<size_t, N>
argument. The std::span constructors can pass a size_t for runtime
lengths and a std::integral_constant<size_t, N> for constant lengths
that don't need to be checked.
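A simplified sketch of the idea follows. It is not the libstdc++ code: the
class name ExtentStorage is made up, and a static_assert in a constructor
template stands in for the consteval and deleted constructors so that the
example also builds as C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Simplified stand-in for a fixed-extent storage class (hypothetical name).
template <std::size_t N>
class ExtentStorage {
public:
  // Runtime length: the precondition is checked once, in this one place.
  explicit ExtentStorage(std::size_t n) { assert(n == N); (void)n; }

  // Constant length: validated at compile time, so no runtime check is
  // needed. A mismatching constant is rejected by the static_assert.
  template <std::size_t M>
  constexpr explicit ExtentStorage(std::integral_constant<std::size_t, M>) {
    static_assert(M == N, "constant length must equal the static extent");
  }

  constexpr std::size_t extent() const { return N; }
};
```

A caller that knows the length statically passes
std::integral_constant<std::size_t, N>{} and selects the second constructor;
a caller with a runtime length passes a size_t and gets the assertion.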
The __detail::__extent_storage<dynamic_extent> specialization only needs
one constructor, as a std::integral_constant<size_t, N> argument can
implicitly convert to size_t.
For the member functions that return a subspan with a constant extent we
return std::span<T,C>(ptr, C) which is redundant in two ways. Repeating
the constant length C when it's already a template argument is
redundant, and using the std::span(T*, size_t) constructor implies a
runtime length which will do a redundant assertion check. Even though
that assertion won't fail and should be optimized away, it's still
unnecessary code that doesn't need to be instantiated and then optimized
away again. We can avoid that by adding a new private constructor that
only takes a pointer (wrapped in a custom tag struct to avoid
accidentally using that constructor) and automatically sets _M_extent to
the correct value.
libstdc++-v3/ChangeLog:
* include/std/span (__detail::__extent_storage): Check
precondition in constructor. Add consteval constructor for valid
lengths and deleted constructor for invalid constant lengths.
Make member functions always_inline.
(__detail::__span_ptr): New class template.
(span): Adjust constructors to use a std::integral_constant
value for constant lengths. Declare all specializations of
std::span as friends.
(span::first<C>, span::last<C>, span::subspan<O,C>): Use new
private constructor.
(span(__span_ptr<T>)): New private constructor for constant
lengths.
Jonathan Wakely [Wed, 18 Dec 2024 12:57:14 +0000 (12:57 +0000)]
libstdc++: Handle errors from strxfrm in std::collate::transform [PR85824]
std::regex builds a cache of equivalence classes by calling
std::regex_traits<char>::transform_primary(c) for every char, which then
calls std::collate<char>::transform which calls strxfrm. On several
targets strxfrm fails for non-ASCII characters. Because strxfrm has no
return value reserved to indicate an error, some implementations return
INT_MAX or SIZE_MAX. This causes std::collate::transform to try to
allocate a huge buffer, which is either very slow or throws
std::bad_alloc. We should check errno after calling strxfrm to detect
errors and then throw a more appropriate exception instead of trying to
allocate a huge buffer.
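The errno check can be sketched as below. This is a minimal standalone
example, not the libstdc++ implementation: ErrnoGuard and transform_checked
are hypothetical names, and the choice of std::system_error as the exception
type is an assumption.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <string>
#include <system_error>

// Clears errno on entry and restores the caller's value on exit.
struct ErrnoGuard {
  int saved = errno;
  ErrnoGuard() { errno = 0; }
  ~ErrnoGuard() { errno = saved; }
};

// Returns the strxfrm-transformed form of s, or throws if strxfrm reports
// an error through errno instead of trusting a possibly huge return value.
std::string transform_checked(const char* s) {
  assert(s != nullptr);
  ErrnoGuard guard;

  // First call only asks for the required length.
  const std::size_t len = std::strxfrm(nullptr, s, 0);
  if (errno != 0)
    throw std::system_error(errno, std::generic_category(), "strxfrm");

  std::string out(len + 1, '\0');
  errno = 0;
  std::strxfrm(&out[0], s, out.size());
  if (errno != 0)
    throw std::system_error(errno, std::generic_category(), "strxfrm");

  out.resize(len);  // drop the terminating null
  return out;
}
```

Because the length is checked before any allocation, a failing strxfrm
produces an exception rather than an attempt to allocate a huge buffer.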
Unfortunately the std::collate<C>::_M_transform function has a
non-throwing exception specifier, so we can't do the error handling
there.
As well as checking errno, this patch changes std::collate::do_transform
to use __builtin_alloca for small inputs, and to use RAII to deallocate
the buffers used for large inputs.
This change isn't sufficient to fix the three std::regex bugs caused by
the lack of error handling in std::collate::do_transform, we also need
to make std::regex_traits::transform_primary handle exceptions. This
change also attempts to make transform_primary closer to the effects
described in the standard, by not even attempting to use std::collate if
the locale's std::collate facet has been replaced (see PR 118105).
Implementing the correct effects for transform_primary requires RTTI, so
that we don't use some user-defined std::collate facet with unknown
semantics. When -fno-rtti is used transform_primary just returns an
empty string, making equivalence classes unusable in std::basic_regex.
That's not ideal, but I don't have any better ideas.
I'm unsure if std::regex_traits<C>::transform_primary is supposed to
convert the string to lower case or not. The general regex traits
requirements ([re.req] p20) do say "when character case is not
considered" but the specification for the std::regex_traits<char> and
std::regex_traits<wchar_t> specializations ([re.traits] p7) don't say
anything about that.
With the r15-6317-geb339c29ee42aa change, transform_primary is not
called unless the regex actually uses an equivalence class. But using an
equivalence class would still fail (or be incredibly slow) on some
targets. With this commit, equivalence classes should be usable on all
targets, without excessive memory allocations.
Arguably, we should not even try to call transform_primary for any char
values over 127, since they're never valid in locales that use UTF-8 or
7-bit ASCII, and probably for other charsets too. Handling 128
exceptions for every std::regex compilation is very inefficient, but at
least it now works instead of failing with std::bad_alloc, and no longer
allocates 128 x 2GB. Maybe for C++26 we could check the locale's
std::text_encoding and use that to decide whether to cache equivalence
classes for char values over 127.
libstdc++-v3/ChangeLog:
PR libstdc++/85824
PR libstdc++/94409
PR libstdc++/98723
PR libstdc++/118105
* include/bits/locale_classes.tcc (collate::do_transform): Check
errno after calling _M_transform. Use RAII type to manage the
buffer and to restore errno.
* include/bits/regex.h (regex_traits::transform_primary): Handle
exceptions from std::collate::transform and do not try to use
std::collate for user-defined facets.
Jonathan Wakely [Tue, 17 Dec 2024 21:32:19 +0000 (21:32 +0000)]
libstdc++: Fix std::future::wait_until for subsecond negative times [PR118093]
The current check for negative times (i.e. before the epoch) only checks
for a negative number of seconds. For a time 1ms before the epoch the
seconds part will be zero, but the futex syscall will still fail with an
EINVAL error. Extend the check to handle this case.
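The subsecond case can be reproduced with a small sketch. SplitTime, split
and before_epoch are hypothetical helpers written for this example; they
only mirror the seconds/nanoseconds split described above.

```cpp
#include <chrono>

// A time since the epoch split into whole seconds and leftover nanoseconds,
// as a timespec-style pair would hold it.
struct SplitTime {
  long long sec;
  long long nsec;
};

constexpr SplitTime split(std::chrono::nanoseconds since_epoch) {
  // duration_cast truncates toward zero, so -1ms yields 0 seconds.
  const auto s = std::chrono::duration_cast<std::chrono::seconds>(since_epoch);
  const auto ns = since_epoch - s;
  return SplitTime{s.count(), ns.count()};
}

// Checking only sec < 0 misses times less than one second before the epoch;
// the nanoseconds part has to be tested as well.
constexpr bool before_epoch(SplitTime t) {
  return t.sec < 0 || t.nsec < 0;
}
```

For a time 1ms before the epoch, split gives sec == 0 and nsec == -1000000,
which the extended check classifies as negative.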
This change adds a redundant check in the headers too, so that we avoid
even calling into the library for negative times. Both checks can be
marked [[unlikely]]. The check in the headers avoids the cost of
splitting the time into seconds and nanoseconds and then making a PLT
call. The check inside the library matches where we were checking
already, and fixes existing binaries that were compiled against older
headers but use a newer libstdc++.so.6 at runtime.
libstdc++-v3/ChangeLog:
PR libstdc++/118093
* include/bits/atomic_futex.h (_M_load_and_test_until_impl):
Return false for times before the epoch.
* src/c++11/futex.cc (_M_futex_wait_until): Extend check for
negative times to check for subsecond times. Add unlikely
attribute.
(_M_futex_wait_until_steady): Likewise.
* testsuite/30_threads/future/members/118093.cc: New test.
We have several overloads of std::deque::_M_insert_aux, one of which is
variadic and called by std::deque::emplace. With a suitable set of
arguments to emplace, it's possible for one of the non-variadic
_M_insert_aux overloads to be selected by overload resolution, making
emplace ill-formed.
Rename the variadic _M_insert_aux to _M_emplace_aux so that calls to
emplace never select an _M_insert_aux overload. Also add an inline
_M_insert_aux for the const lvalue overload that is called from
insert(const_iterator, const value_type&).
Richard Biener [Wed, 8 Jan 2025 08:25:52 +0000 (09:25 +0100)]
tree-optimization/117979 - failed irreducible loop update from DCE
When CD-DCE creates forwarders to reduce false control dependences
it fails to update the irreducible state of edge and the forwarder
block in case the forwarder groups both normal (entry) and edges
from an irreducible region (necessarily backedges). This is because
when we split the first edge, if that's a normal edge, the forwarder
and its edge to the original block will not be marked as part
of the irreducible region but when we then redirect an edge from
within the region it becomes so.
The following fixes this up.
Note I think creating a forwarder that includes backedges is
likely not going to help, but at this stage I don't want to change
the CFG going into DCE. For regular loops we'll have a single
entry and a single backedge by means of loop init and will never
create a forwarder - so this is solely happening for irreducible
regions where it's harder to prove that such forwarder doesn't help.
PR tree-optimization/117979
* tree-ssa-dce.cc (make_forwarders_with_degenerate_phis):
Properly update the irreducible region state.
DWARF has voted in recently https://dwarfstd.org/issues/241209.1.html ,
which is basically just a guarantee that the DWARF 6 draft
DW_AT_language_{name,version} attribute codes and content of
https://dwarfstd.org/languages-v6.html can be used as an extension
in DWARF 5 and won't be changed.
So, this patch is an alternative to the
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669671.html
patch, which had the major problem that it required changing all the
DWARF consumers to be able to debug C17 or later or C++17 or later
sources.
This patch still uses DWARF 5 DW_LANG_C11 or DW_LANG_C_plus_plus_14,
the latest code in DWARF 5 proper, so all DWARF 5 capable consumers
should be able to deal with that, but additionally emits the
DWARF 6 attributes so that newer DWARF consumers can see it isn't
just C++14 but say C++23 or C11 but C23. Consumers which don't know
those DWARF 6 attributes would just ignore them. This is like any other
-gno-strict-dwarf extension, except that normally we emit say DWARF 5
codes where possible only after DWARF 5 is released, while in this case
there is a guarantee it can be used before DWARF 6 is released.
2025-01-08 Jakub Jelinek <jakub@redhat.com>
include/
* dwarf2.h (enum dwarf_source_language): Fix comment pasto.
(enum dwarf_source_language_name): New type.
* dwarf2.def (DW_AT_language_name, DW_AT_language_version): New
DWARF 6 codes.
gcc/
* dwarf2out.cc (break_out_comdat_types): Copy over
DW_AT_language_{name,version} if present.
(output_skeleton_debug_sections): Remove also
DW_AT_language_{name,version}.
(gen_compile_unit_die): For C17, C23, C2Y, C++17, C++20, C++23
and C++26 emit for -gdwarf-5 -gno-strict-dwarf also
DW_AT_language_{name,version} attributes.
gcc/testsuite/
* g++.dg/debug/dwarf2/lang-cpp17.C: Add -gno-strict-dwarf to
dg-options. Check also for DW_AT_language_{name,version} values.
* g++.dg/debug/dwarf2/lang-cpp20.C: Likewise.
* g++.dg/debug/dwarf2/lang-cpp23.C: New test.
Richard Biener [Tue, 7 Jan 2025 10:15:43 +0000 (11:15 +0100)]
tree-optimization/118269 - SLP reduction chain and early breaks
When we create the SLP reduction chain epilogue for the PHIs for
the early exit we fail to properly classify the reduction as SLP
reduction chain. The following fixes the corresponding checks.
PR tree-optimization/118269
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Use the correct stmt for the REDUC_GROUP_FIRST_ELEMENT lookup.
* gcc.dg/vect/vect-early-break_131-pr118269.c: New testcase.
Jeevitha [Wed, 8 Jan 2025 07:03:12 +0000 (01:03 -0600)]
testsuite: Simplify target test and dg-options for AMO tests
Removed powerpc*-*-* from the target test as it is always true. Simplified
options by removing -mpower9-misc and -mvsx, which are enabled by default with
-mdejagnu-cpu=power9. The has_arch_pwr9 check is also true with
-mdejagnu-cpu=power9, so it has been removed.
* gcc.target/powerpc/amo1.c: Removed powerpc*-*-* from the target and
simplified dg-options.
* gcc.target/powerpc/amo2.c: Simplified dg-options and added powerpc_vsx
target check.
Hongyu Wang [Thu, 2 Jan 2025 02:29:27 +0000 (10:29 +0800)]
i386: Add br_mispredict_scale in cost table.
On later processors the pipeline is deeper, so the penalty for an
untaken branch can be larger than before. Add a new parameter
br_mispredict_scale to describe the penalty, and use it in the
noce_max_ifcvt_seq_cost hook to allow longer sequences to be
converted with cmove.
This improves cpu2017 544 with -Ofast -march=native by 14% on P-core
SPR and by 8% on E-core SRF. No other regressions were observed.
gcc/ChangeLog:
* config/i386/i386.cc (ix86_noce_max_ifcvt_seq_cost): Adjust
cost with ix86_tune_cost->br_mispredict_scale.
* config/i386/i386.h (processor_costs): Add br_mispredict_scale.
* config/i386/x86-tune-costs.h: Add new br_mispredict_scale to
all processor_costs, in which icelake_cost/alderlake_cost
with value COSTS_N_INSNS (2) + 3 and other processor with value
COSTS_N_INSNS (2).
Pan Li [Thu, 12 Dec 2024 02:48:08 +0000 (10:48 +0800)]
Match: Refactor the signed SAT_* match for saturated value [NFC]
This patch refactors all the signed SAT_* patterns for the saturated
value, that is, overflow to INT_MAX when > 0 and underflow to INT_MIN
when < 0. This lets us remove several duplicated expressions from the
different patterns.
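For reference, the saturated value described above behaves like the
following sketch of a signed saturating add. The function name sat_add and
the use of a 64-bit intermediate are choices made for this example; the
match.pd patterns recognize several equivalent source forms.

```cpp
#include <cstdint>
#include <limits>

// Signed saturating add: clamp to INT32_MAX on positive overflow and to
// INT32_MIN on negative overflow instead of wrapping.
constexpr std::int32_t sat_add(std::int32_t a, std::int32_t b) {
  const std::int64_t wide = static_cast<std::int64_t>(a) + b;
  if (wide > std::numeric_limits<std::int32_t>::max())
    return std::numeric_limits<std::int32_t>::max();
  if (wide < std::numeric_limits<std::int32_t>::min())
    return std::numeric_limits<std::int32_t>::min();
  return static_cast<std::int32_t>(wide);
}
```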
The following test suites pass with this patch.
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.
gcc/ChangeLog:
* match.pd: Extract saturated value match for signed SAT_*.
Pan Li [Wed, 11 Dec 2024 11:37:06 +0000 (19:37 +0800)]
Match: Refactor the signed SAT_TRUNC match patterns [NFC]
This patch refactors all the signed SAT_TRUNC patterns:
* Extract type check outside.
* Re-arrange the related match pattern forms together.
The following test suites pass with this patch.
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.
gcc/ChangeLog:
* match.pd: Refactor sorts of signed SAT_TRUNC match patterns
Pan Li [Wed, 11 Dec 2024 11:09:08 +0000 (19:09 +0800)]
Match: Refactor the signed SAT_SUB match patterns [NFC]
This patch refactors all the signed SAT_SUB patterns:
* Extract type check outside.
* Re-arrange the related match pattern forms together.
The following test suites pass with this patch.
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.
gcc/ChangeLog:
* match.pd: Refactor sorts of signed SAT_SUB match patterns.
This improves codegen for x264 sum of absolute difference routines.
The insn count is same, but we avoid double widening ops and ensuing
whole register moves.
Also for more general applicability, we chose to implement abs diff
vs. the sum of abs diff variant.
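As a reminder of the source-level shape involved, a sum of absolute
differences loop built on a narrow abs diff looks roughly like this. The
function names are hypothetical and this is not the x264 code.

```cpp
#include <cstddef>
#include <cstdint>

// Absolute difference of two unsigned bytes; the result always fits in
// eight bits, so no widened element type is needed for this step.
constexpr std::uint8_t abs_diff(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(a > b ? a - b : b - a);
}

// Sum of absolute differences over n bytes; only the accumulation is wide.
constexpr std::uint32_t sad(const std::uint8_t* p, const std::uint8_t* q,
                            std::size_t n) {
  std::uint32_t sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += abs_diff(p[i], q[i]);
  return sum;
}
```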
Suggested-by: Robin Dapp <rdapp@ventanamicro.com>
Co-authored-by: Pan Li <pan2.li@intel.com>
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
PR target/117722
Keith Packard [Tue, 7 Jan 2025 21:54:11 +0000 (14:54 -0700)]
[PATCH] libgcc/m68k: More fixes for soft float
Fix __extenddfxf2:
* Remove bogus denorm handling block which would never execute --
the converted exp value is always positive as EXCESSX > EXCESSD.
* Compute the whole significand in dl instead of doing part of it in
ldl.
* Mask off exponent from dl.l.upper so the denorm shift test
works.
* Insert the hidden one bit into dl.l.upper as needed.
Fix __truncxfdf2 denorm handling. All that is required is to shift the
significand right by the correct amount; it already has all of the
necessary bits set including the explicit one. Compute the shift
amount, then perform the wide shift across both elements of the
significand.
Fix __fixxfsi:
* The value was off by a factor of two as the significand contains
32 bits, not 31 so we need to shift by one more than the equivalent
code in __fixdfsi.
* Simplify the code having realized that the lower 32 bits of the
significand can never appear in the results.
Return positive qNaN instead of negative. For floats, qNaN is 0x7fff_ffff. For
doubles, qNaN is 0x7fff_ffff_ffff_ffff.
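The single-precision bit pattern can be checked with a few mask tests. The
sketch assumes the IEEE 754 binary32 layout (1 sign bit, 8 exponent bits,
23 fraction bits, quiet bit in the top fraction bit); the helper names are
made up for the example.

```cpp
#include <cstdint>

// Sign bit is bit 31.
constexpr bool sign_bit(std::uint32_t bits) {
  return (bits >> 31) != 0;
}

// NaN: exponent all ones and a non-zero fraction.
constexpr bool is_nan_bits(std::uint32_t bits) {
  return (bits & 0x7f800000u) == 0x7f800000u && (bits & 0x007fffffu) != 0;
}

// Quiet NaN: a NaN whose most significant fraction bit is set.
constexpr bool is_quiet_nan_bits(std::uint32_t bits) {
  return is_nan_bits(bits) && (bits & 0x00400000u) != 0;
}
```

Under these assumptions 0x7fffffff is a quiet NaN with the sign bit clear,
while 0xffffffff is the negative counterpart.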
Return correctly signed zero on float and double divide underflow. This means
that Ld$underflow now expects d7 to contain the sign bit, just like the other
return paths.
libgcc/
* config/m68k/fpgnulib.c (extenddfxf2): Simplify code by removing code
that should never execute. Fix denorm shift test and insert hidden bit
as needed.
(__truncxfdf2): Properly compute and shift the significand right.
* config/m68k/lb1sf68.S (__fixxfsi): Correct shift counts and simplify.
(QUIET_NAN): Make it a positive quiet NaN and fix return values to inject
sign properly.
Jeff Law [Tue, 7 Jan 2025 21:27:28 +0000 (14:27 -0700)]
Fix testsuite expectations for RVV after recent change
Tamar's recent change to improve affine unsigned folding for exchange2
twiddles code generation for a couple of tests in the RVV testsuite just enough
to cause testsuite failures.
I've looked at both tests before/after Tamar's change and the code is clearly
better -- essentially tighter vector loops due to improvements in address
arithmetic. Additionally we have fewer vsetvls after Tamar's patch.
Given that, I'm just making the obvious adjustments to the expected assembly
and pushing to the trunk.
Jeff Law [Tue, 7 Jan 2025 19:20:15 +0000 (12:20 -0700)]
Fix regression in ft32 port after recent switch table adjustments
This is a trivial bug that showed up after Mark W's recent patch to not apply
the size limit on jump tables.
The ft32 port has limited immediate ranges on comparisons and the casesi
expander didn't honor those. It'd blindly pass along an out of range constant.
This patch adds the trivial adjustment to force an out of range constant into a
register. It fixes these regressions:
> Tests that now fail, but worked before (3 tests):
>
> ft32-sim: gcc: gcc.c-torture/compile/pr34093.c -O1 (test for excess errors)
> ft32-sim: gcc: gcc.dg/torture/pr106809.c -O1 (test for excess errors)
> ft32-sim: gcc: gcc.dg/torture/pr106809.c -O1 (test for excess errors)
Tested in my tester. No other tests were fixed.
gcc/
* config/ft32/ft32.md (casesi expander): Force operands[2] into
a register if it's not a suitable rimm operand.
Dimitar Dimitrov [Sun, 24 Nov 2024 10:22:13 +0000 (12:22 +0200)]
testsuite: RISC-V: Skip tests providing -march for ILP32E/ILP64E ABIs
Many test cases explicitly set -march with extensions which are not
compatible with the E ABI variants. This leads to spurious errors
when toolchain has been configured for RV32E base ISA and ILP32E ABI:
spawn ... -march=rv32gc_zbb ...
cc1: error: ILP32E ABI does not support the 'D' extension
Fix by skipping those tests if toolchain's default ABI is E.
testsuite: RISC-V: Skip tests using -mcpu= for ILP32E/ILP64E ABIs
The tests are specifying -mcpu with D extension, which is not compatible
with the ILP32E and ILP64E ABIs. Fix by skipping the tests if toolchain's
default ABI is an E variant.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr109508.c: Skip for E ABI.
* gcc.target/riscv/pr114139.c: Ditto.
Dimitar Dimitrov [Mon, 25 Nov 2024 18:48:00 +0000 (20:48 +0200)]
testsuite: RISC-V: Skip V and Zvbb tests for ILP32E/ILP64E ABIs
Some tests add options for V and Zvbb extensions, but those extensions
are not compatible with the E ABI variants. This leads to spurious test
failures when toolchain's default ABI is ILP32E or ILP64E:
spawn ... -march=rv32ecv_zvbb ...
cc1: error: ILP32E ABI does not support the 'D' extension
cc1: sorry, unimplemented: Currently the 'V' implementation requires the 'M' extension
Fix by skipping the tests when toolchain's default ABI is E variant.
Dimitar Dimitrov [Thu, 12 Dec 2024 18:22:59 +0000 (20:22 +0200)]
testsuite: RISC-V: Add effective target for E ABI variant
Add new effective target check for either ILP32E or ILP64E ABI variants.
Initial implementation only checks for RV32E or RV64E ISA, which in turn
implies that ILP32E/ILP64E ABI is used. The RV32I+ILP32E and
RV64I+ILP64E combinations are not yet caught by the check, but they
do not seem to be widely used currently.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (check_effective_target_riscv_abi_e):
New procedure.
Thomas Koenig [Tue, 7 Jan 2025 14:23:29 +0000 (15:23 +0100)]
Document unsigned constants in intrinsic modules.
gcc/fortran/ChangeLog:
* intrinsic.texi (ISO_FORTRAN_ENV): Also mention INT8 in the
text. Document UINT8, UINT16, UINT32 and UINT64.
(ISO_C_BINDING): New table for unsigned KIND numbers.
Wilco Dijkstra [Fri, 1 Nov 2024 14:40:26 +0000 (14:40 +0000)]
AArch64: Switch off early scheduling
The early scheduler takes up ~33% of the total build time, however it doesn't
provide a meaningful performance gain. This is partly because modern OoO cores
need far less scheduling, partly because the scheduler tends to create many
unnecessary spills by increasing register pressure. Building applications
56% faster is far more useful than ~0.1% improvement on SPEC, so switch off
early scheduling on AArch64. Codesize reduces by ~0.2%.
Fix various tests that depend on scheduling by explicitly adding -fschedule-insns.
gcc:
* common/config/aarch64/aarch64-common.cc: Switch off fschedule_insns.
Wilco Dijkstra [Fri, 1 Nov 2024 14:44:56 +0000 (14:44 +0000)]
AArch64: Block combine_and_move from creating FP literal loads
The IRA combine_and_move pass runs if the scheduler is disabled and aggressively
combines moves. The movsf/df patterns allow all FP immediates since they rely
on a split pattern. However splits do not happen during IRA, so the result is
extra literal loads. To avoid this, split early during expand and block
creation of FP immediates that need this split. Mark a few testcases that
rely on late splitting as xfail.
Tobias Burnus [Tue, 7 Jan 2025 15:43:30 +0000 (16:43 +0100)]
libgomp.texi: Minor update to omp_get_num_devices/omp_get_initial_device
libgomp/ChangeLog:
* libgomp.texi (OpenMP 6.0): Fix typo.
(omp_get_default_device): Update the wording as the value
returned by omp_get_initial_device is now ambiguous.
(omp_get_num_devices): Minor wording tweak.
(omp_get_initial_device): Note that the function may also
return omp_initial_device since OpenMP 6.
Tamar Christina [Mon, 6 Jan 2025 17:52:14 +0000 (17:52 +0000)]
perform affine fold to unsigned on non address expressions. [PR114932]
When the patch for PR114074 was applied we saw a good boost in exchange2.
This boost was partially caused by a simplification of the addressing modes.
With the patch applied IV opts saw the following form for the base addressing;
This is because the patch promoted multiplies where one operand is a constant
from a signed multiply to an unsigned one, to attempt to fold away the constant.
This patch attempts the same but due to the various problems with SCEV and
niters not being able to analyze the resulting forms (i.e. PR114322) we can't
do it during SCEV or in the general form like in fold-const like extract_muldiv
attempts.
Instead this applies the simplification during IVopts initialization when we
create the IV. This allows IV opts to see the simplified form without
influencing the rest of the compiler.
As mentioned in PR114074 it would be good to fix the missed optimization in the
other passes so we can perform this in general.
The reason this has a big impact on Fortran code is that Fortran doesn't seem to
have unsigned integer types. As such all its addressing expressions are created
with signed types and folding does not happen on them due to the possible overflow.
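To see why the unsigned form is foldable, consider the sketch below. The
constants 1 and 8 and the function names are arbitrary choices for the
example. In unsigned arithmetic the two forms agree for every input because
wraparound is well defined; with signed types the distributed form can
overflow where the original does not, so the rewrite is not valid in general.

```cpp
#include <cstdint>

// Original form: add the constant offset first, then scale.
constexpr std::uint64_t offset_then_scale(std::uint64_t i) {
  return (i + 1u) * 8u;
}

// Folded form: scale the index and add the pre-multiplied constant.
constexpr std::uint64_t scale_then_offset(std::uint64_t i) {
  return i * 8u + 8u;
}
```

For example, with a 32-bit signed i equal to -268435457, (i + 1) * 8 is
exactly INT_MIN, but i * 8 on its own would already overflow.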
Concretely, on AArch64 this changes the results from generation:
The two patches together results in a 10% performance increase in exchange2 in
SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in compile
time. There's also a 5% performance improvement in fotonik3d and similar
reduction in binary size.
The patch folds every IV to unsigned to canonicalize them. At the end of the
pass match.pd will then remove unneeded conversions.
Note that we cannot force everything to unsigned; IVopts requires that array
address expressions remain as such. Folding them results in them becoming
pointer expressions for which some optimizations in IVopts do not run.
PR tree-optimization/114932
* gcc.dg/tree-ssa/pr64705.c: Update dump file scan.
* gcc.target/i386/pr115462.c: The testcase shares 3 IVs which calculate
the same thing but with a slightly different increment offset. The test
checks for 3 complex addressing loads, one for each IV. But with this
change they now all share one IV. That is, the loop now only has one
complex addressing. This is ultimately driven by the backend costing
and the current costing says this is preferred, so update the testcase.
* gfortran.dg/addressing-modes_1.f90: New test.
Andrew Pinski [Sat, 16 Nov 2024 04:22:04 +0000 (20:22 -0800)]
cfgexpand: Handle integral vector types and constructors for scope conflicts [PR105769]
This is an expansion of the last patch to also track pointers via vector types and the
constructors that are used with vector types.
In this case we had:
```
_15 = (long unsigned int) &bias;
_10 = (long unsigned int) &cov_jn;
_12 = {_10, _15};
...
...
MEM <vector(2) long unsigned int> [(void *)&D.6172 + 32B] = _12;
MEM[(struct function *)&D.6157] ={v} {CLOBBER(bob)};
```
Anyway, tracking the pointers via vector types to say they are alive
at the point where the store of the vector happens fixes the bug by saying
the variable is alive at the same time as another variable is alive.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/105769
gcc/ChangeLog:
* cfgexpand.cc (vars_ssa_cache::operator()): For constructors
walk over the elements.
gcc/testsuite/ChangeLog:
* g++.dg/torture/pr105769-1.C: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Sat, 16 Nov 2024 04:22:03 +0000 (20:22 -0800)]
cfgexpand: Rewrite add_scope_conflicts_2 to use cache and look back further [PR111422]
After fixing loop-im to do the correct overflow rewriting
for pointer types too, we end up with code like:
```
_9 = (unsigned long) &g;
_84 = _9 + 18446744073709551615;
_11 = _42 + _84;
_44 = (signed char *) _11;
...
*_44 = 10;
g ={v} {CLOBBER(eos)};
...
n[0] = &f;
*_44 = 8;
g ={v} {CLOBBER(eos)};
```
This was not being recognized by the scope conflicts code,
because it only walked back one level rather than multiple levels.
This fixes the issue by having a cache which records all references to addresses
of stack variables.
Unlike the previous patch, this only records and looks at addresses of stack variables.
The cache uses a bitmap and uses the index as the bit to look at.
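The idea can be sketched in plain C as follows. This is a toy model, not the cfgexpand.cc implementation: `struct def`, `MAX_NAMES`, `NO_OPERAND` and `collect_addrs` are hypothetical names, each SSA name has at most two operands, and a 64-bit integer stands in for the bitmap. A definition either takes the address of a stack variable (recorded as the bit with that variable's index) or is computed from other names, which are walked transitively with a worklist instead of a single look-back step.

```c
#include <stdint.h>

#define MAX_NAMES 16
#define NO_OPERAND (-1)

/* Toy SSA definition: either the address of a stack variable
   (addr_of >= 0) or a computation over up to two other names.  */
struct def
{
  int addr_of;      /* Stack variable index (< 64), or NO_OPERAND.  */
  int ops[2];       /* Operand SSA names (< MAX_NAMES), or NO_OPERAND.  */
};

/* Return a bitmap with bit I set if the value of NAME may be derived
   from the address of stack variable I.  All operands are walked
   transitively using an explicit worklist; each name is pushed at
   most once, so the worklist cannot overflow.  */
uint64_t
collect_addrs (const struct def *defs, int name)
{
  int worklist[MAX_NAMES];
  unsigned char visited[MAX_NAMES] = { 0 };
  int top = 0;
  uint64_t bits = 0;

  worklist[top++] = name;
  visited[name] = 1;
  while (top > 0)
    {
      const struct def *d = &defs[worklist[--top]];
      if (d->addr_of != NO_OPERAND)
        bits |= UINT64_C (1) << d->addr_of;
      for (int i = 0; i < 2; i++)
        {
          int op = d->ops[i];
          if (op != NO_OPERAND && !visited[op])
            {
              visited[op] = 1;
              worklist[top++] = op;
            }
        }
    }
  return bits;
}
```

Modelling the dump above with `_9` as name 0 (the address of stack variable 0), `_84` as name 1, `_42` as name 2, `_11` as name 3 and `_44` as name 4, the walk from `_44` reaches `&g` through three levels of definitions.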
* cfgexpand.cc (struct vars_ssa_cache): New class.
(vars_ssa_cache::vars_ssa_cache): New constructor.
(vars_ssa_cache::~vars_ssa_cache): New destructor.
(vars_ssa_cache::create): New method.
(vars_ssa_cache::exists): New method.
(vars_ssa_cache::add_one): New method.
(vars_ssa_cache::update): New method.
(vars_ssa_cache::dump): New method.
(add_scope_conflicts_2): Factor mostly out to
vars_ssa_cache::operator(). New cache argument.
Walk the bitmap cache for the stack variables addresses.
(vars_ssa_cache::operator()): New method factored out from
add_scope_conflicts_2. Rewrite to be a full walk of all operands
and use a worklist.
(add_scope_conflicts_1): Add cache new argument for the addr cache.
Just call add_scope_conflicts_2 for the phi result instead of calling
for the uses and don't call walk_stmt_load_store_addr_ops for phis.
Update call to add_scope_conflicts_2 to add cache argument.
(add_scope_conflicts): Add cache argument and update calls to
add_scope_conflicts_1.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr117426-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Sat, 16 Nov 2024 04:22:02 +0000 (20:22 -0800)]
cfgexpand: Factor out getting the stack decl index
This is the first patch in improving this code.
Since there are a few places which get the index and they
check the same thing, let's factor that out into one function.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* cfgexpand.cc (INVALID_STACK_INDEX): New define.
(decl_stack_index): New function.
(visit_op): Use decl_stack_index.
(visit_conflict): Likewise.
(add_scope_conflicts_1): Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Richard Biener [Tue, 7 Jan 2025 12:18:27 +0000 (13:18 +0100)]
rtl-optimization/118298 - constant iteration loops and #pragma unroll
When the RTL unroller handles constant iteration loops it bails out
prematurely when heuristics wouldn't apply any unrolling before
checking #pragma unroll.
PR rtl-optimization/118298
* loop-unroll.cc (decide_unroll_constant_iterations): Honor
loop->unroll even if the loop is too big for heuristics.
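For illustration, a loop of the affected kind might look like the sketch below (a hypothetical function, not the testcase from the PR). It has a constant iteration count and carries an explicit unroll request, written here with GCC's `#pragma GCC unroll` spelling. The pragma only affects code generation, not the computed result.

```c
#define N 16

/* Sum N elements; the iteration count is a compile-time constant and
   the pragma explicitly asks for an unroll factor of 4.  */
int
sum16 (const int *a)
{
  int s = 0;
#pragma GCC unroll 4
  for (int i = 0; i < N; i++)
    s += a[i];
  return s;
}
```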
Richard Biener [Tue, 7 Jan 2025 14:07:12 +0000 (15:07 +0100)]
Fixup convert-dfp*.c
The testcases use -save-temps which doesn't play nice with -flto
and multilib testing resulting in spurious UNRESOLVED like
/usr/lib64/gcc/x86_64-suse-linux/14/../../../../x86_64-suse-linux/bin/ld: i386:x86-64 architecture of input file `./convert-dfp-2.ltrans0.ltrans.o' is incompatible with i386 output
The following skips the testcases when using -flto.
* gcc.dg/torture/convert-dfp-2.c: Skip with -flto.
* gcc.dg/torture/convert-dfp.c: Likewise.
Alexandre Oliva [Wed, 11 Dec 2024 13:16:58 +0000 (10:16 -0300)]
ada: Drop g-cpp* units not needed by the compiler
Having moved __gnat_convert_caught_object to g-cstyin.o, we can drop
other g-cpp* units that are now only needed by programs that actually use
their APIs to get more information about C++ exceptions and type_info
objects.
gcc/ada/ChangeLog:
* gcc-interface/Make-lang.in (GNAT_ADA_OBJS, GNATBIND_OBJS):
Drop g-cpp, g-cppexc and g-cppstd.
Eric Botcazou [Tue, 10 Dec 2024 09:24:47 +0000 (10:24 +0100)]
ada: Do not create temporaries for initialization statements
Assignment statements marked with the No_Ctrl_Actions or No_Finalize_Actions
flag are initialization statements and, therefore, no temporaries are needed
to hold the value of the right-hand side for them.
gcc/ada/ChangeLog:
* gcc-interface/trans.cc (Call_to_gnu): Always use the return slot
optimization if the parent node is an initialization statement.
(gnat_to_gnu) <N_Assignment_Statement>: Build an INIT_EXPR instead
of a MODIFY_EXPR if this is an initialization statement.
Eric Botcazou [Fri, 20 Dec 2024 15:49:50 +0000 (16:49 +0100)]
ada: Do not raise exceptions from Exp_Aggr.Packed_Array_Aggregate_Handled
An exception is now raised during bootstrap and this causes compatibility
issues with older compilers.
gcc/ada/ChangeLog:
* exp_aggr.adb (Packed_Array_Aggregate_Handled): Remove declaration
and handler for Not_Handled local exception. Check the return value
of Get_Component_Val instead.
(Get_Component_Val): Return No_Uint instead of raising Not_Handled.
Javier Miranda [Thu, 19 Dec 2024 10:41:59 +0000 (10:41 +0000)]
ada: Cleanup preanalysis of static expressions (part 2)
According to RM 13.14(8/4), a static expression in an aspect specification
does not cause freezing; however, the frontend makes many calls to
Preanalyze_Spec_Expression during the analysis of aspects. This
patch, suggested by Eric Botcazou, takes care of this additional code
cleanup, which also requires replacing many occurrences of the global
variable In_Spec_Expression by calls to Preanalysis_Active.
gcc/ada/ChangeLog:
* exp_util.adb (Insert_Actions): Document behavior under strict
preanalysis.
* sem.ads (In_Strict_Preanalysis): New subprogram.
(Preanalysis_Active): Replace 'and' operator by 'and then'.
* sem.adb (In_Strict_Preanalysis): Ditto.
* sem_attr.adb (Check_Dereference): Replace In_Spec_Expression
occurrence by call to Preanalysis_Active, and document it.
(Resolve_Attribute [Attribute_Access]): Ditto.
(Eval_Attribute): No evaluation under strict preanalysis.
(Validate_Static_Object_Name): No action under strict preanalysis.
* sem_ch13.adb (Check_Aspect_At_End_Of_Declarations): Replace
calls to Preanalyze_Spec_Expression by calls to Preanalyze_And_Resolve.
(Check_Aspect_At_Freeze_Point): Ditto.
(Resolve_Aspect_Expressions [Dynamic/Static/Predicate aspects]): Code
cleanup adjusting the code to emulate Preanalyze_And_Resolve, instead
of Preanalyze_Spec_Expression.
(Resolve_Aspect_Expressions [CPU/Interrupt_Priority/Priority/
Storage_Size aspects]): Replace calls to Preanalyze_Spec_Expression
by call to Preanalyze_And_Resolve.
* sem_ch3.adb (Analyze_Object_Declaration): Replace In_Spec_Expression
occurrence by call to Preanalysis_Active.
(Find_Type_Of_Object): Add documentation.
* sem_ch4.adb (Analyze_Case_Expression): Replace In_Spec_Expression
occurrence by call to Preanalysis_Active.
* sem_ch6.adb (Analyze_Expression_Function): Minor code reorganization
moving the code preanalyzing the expression after the new body has
been inserted in the tree to ensure that its Parent attribute is
available for preanalysis.
* sem_cat.adb (Validate_Static_Object_Name): No action under strict
preanalysis.
* sem_elab.adb (Check_For_Eliminated_Subprogram): Replace In_Spec_Expression
occurrence by call to Preanalysis_Active.
* sem_eval.adb (Eval_Intrinsic_Call [Name_Enclosing_Entity]): Ditto.
* sem_elim.adb (Check_For_Eliminated_Subprogram): Ditto.
* sem_res.adb (Resolve_Entity_Name): Ditto.
Piotr Trojanek [Thu, 19 Dec 2024 23:09:15 +0000 (00:09 +0100)]
ada: Improve protection against wrong use from GDB
A code cleanup in a routine intended to be used from GDB, suggested by running
GNATcheck rule Boolean_Negations. However, this code can be tuned to protect
against more illegal uses.
gcc/ada/ChangeLog:
* exp_disp.adb (Write_DT): Add guards that prevent crashes on illegal
node numbers.
Piotr Trojanek [Thu, 19 Dec 2024 14:32:56 +0000 (15:32 +0100)]
ada: Remove dead code in detection of null record definitions
Code cleanup; behavior is unaffected.
gcc/ada/ChangeLog:
* sem_util.adb (Is_Null_Record_Definition): Remove check for
Component_List being present after using it; replace check for
component item being a component declaration with an assertion;
fix style in comment.
This patch fixes two problems with how abort was deferred in finally
parts. First, calls to runtime subprograms are now omitted when
aborting is disallowed by active restrictions. Second, Abort_Undefer is
now correctly called when the finally part propagates an exception.
Steve Baird [Tue, 17 Dec 2024 21:27:04 +0000 (13:27 -0800)]
ada: Improved checking of uses of package renamings
In some cases, the RM 8.5.1(3.1) legality rule about uses of renamings of
limited views of packages was implemented incorrectly, resulting in rejecting
legal uses.
gcc/ada/ChangeLog:
* gen_il-fields.ads: Add new Renames_Limited_View field.
* gen_il-gen-gen_entities.adb: Add Renames_Limited_View flag for
packages.
* einfo.ads: Add comment documenting Renames_Limited_View flag.
* sem_ch8.adb (Analyze_Package_Renaming): Set new Renames_Limited_View
flag. Test new Renames_Limited_View flag instead of calling
Has_Limited_With. If Has_Limited_With is True, that just means
that somebody, sometime during this compilation needed to
reference the limited view of the package; so that function
returns True too often to be used here.
(Find_Expanded_Name): Test new Renames_Limited_View flag instead of
calling Has_Limited_With.
Piotr Trojanek [Tue, 30 Jan 2024 00:10:17 +0000 (01:10 +0100)]
ada: Remove flag Is_Inherited_Pragma which is only set and never used
Code cleanup; behavior is unaffected. Flag Is_Inherited_Pragma is only set in
GNAT, but is not actually used, neither by the compiler nor by any backend.
gcc/ada/ChangeLog:
* contracts.adb (Inherit_Pragma): Don't set flag Is_Inherited_Pragma.
* gen_il-fields.ads (Opt_Field_Enum): Remove field identifier.
* gen_il-gen-gen_nodes.adb (N_Pragma): Remove field from node.
* sinfo.ads (Is_Inherited_Pragma): Remove field description.
(N_Pragma): Remove field reference.
Piotr Trojanek [Tue, 26 Mar 2024 15:23:41 +0000 (16:23 +0100)]
ada: Handle attributes related to Ada 2012 iterators as internal
Use existing machinery for internal attributes to handle attributes
related to Ada 2012 iterators. All these attributes exist exclusively
as a means to delay processing.
Code cleanup. The only change in behavior is the wording of the error
emitted when one of the internal attributes appears in source code:
from "illegal attribute" (which used to be emitted in the analysis)
to "unrecognized attribute" (which is emitted by the parser).
gcc/ada/ChangeLog:
* exp_attr.adb (Expand_N_Attribute_Reference): Remove explicit
handling of attributes related to Ada 2012 iterators.
* sem_attr.adb (Analyze_Attribute, Eval_Attribute): Likewise;
move attribute Reduce according to alphabetic order.
* snames.adb-tmpl (Get_Attribute_Id): Add support for new internal
attributes.
* snames.ads-tmpl: Recognize names of new internal attributes.
(Attribute_Id): Recognize new internal attributes.
(Internal_Attribute_Id): Likewise.
(Is_Internal_Attribute_Name): Avoid duplication in comment.
Piotr Trojanek [Thu, 2 Mar 2023 21:43:12 +0000 (22:43 +0100)]
ada: Remove unnecessary qualifiers for First/Next list operations
Code cleanup related to work on expression functions for GNATprove
(which require accessibility checks even when they are not expanded
and thus have no explicit return statements).
gcc/ada/ChangeLog:
* accessibility.adb (First_Selector): Remove redundant and locally
inconsistent parentheses.
(Check_Return_Construct_Accessibility): Remove qualifier from list
operation.
* sem_util.adb (Is_Prim_Of_Abst_Type_With_Nonstatic_CW_Pre_Post):
Likewise.
Eric Botcazou [Wed, 18 Dec 2024 09:16:15 +0000 (10:16 +0100)]
ada: Fix internal error on container aggregate for bounded vectors
The problem is that we analyze references to an object before the actual
subtype of the object is established, thus creating a type mismatch that
is flagged by the code generator.
gcc/ada/ChangeLog:
* exp_ch7.ads (Store_After_Actions_In_Scope_Without_Analysis): New
procedure declaration.
* exp_ch7.adb (Store_New_Actions_In_Scope): New procedure.
(Store_Actions_In_Scope): Call Store_New_Actions_In_Scope when the
target list is empty.
(Store_After_Actions_In_Scope_Without_Analysis): New procedure body.
* exp_aggr.adb (Expand_Container_Aggregate): For a declaration that
is wrapped in a transient scope, also defer the analysis of the new
code until after the declaration is analyzed.
Eric Botcazou [Tue, 17 Dec 2024 19:00:38 +0000 (20:00 +0100)]
ada: Add guard to System.Val_Real.Large_Powfive against pathological input
There is no need to keep multiplying the result once it saturates to +Inf.
gcc/ada/ChangeLog:
* libgnat/s-powflt.ads (Maxpow_Exact): Minor comment fix.
* libgnat/s-powlfl.ads (Maxpow_Exact): Likewise.
* libgnat/s-powllf.ads (Maxpow_Exact): Likewise.
* libgnat/s-valrea.adb (Large_Powfive) [1 parameter]: Exit the loop
as soon as the result saturates to +Inf.
(Large_Powfive) [2 parameters]: Likewise.
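The guard can be sketched in C as follows. This is only an analogy to the Ada code, which is not reproduced here: `pow5` is a hypothetical stand-in for Large_Powfive, it multiplies by 5.0 one step at a time rather than in chunks of a maximal exact power, and the `iterations` output parameter exists only to make the early exit observable.

```c
#include <math.h>

/* Return 5.0**EXP as a double.  Once the result has saturated to +Inf
   further multiplications cannot change it, so leave the loop early.
   The number of multiplications performed is stored in *ITERATIONS.  */
double
pow5 (unsigned long exp, unsigned long *iterations)
{
  double r = 1.0;
  unsigned long n = 0;

  while (n < exp)
    {
      r *= 5.0;
      n++;
      if (isinf (r))
        break;
    }
  *iterations = n;
  return r;
}
```

With a pathological exponent such as 1000000000 the loop stops after a few hundred multiplications instead of running to completion.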
Piotr Trojanek [Mon, 16 Dec 2024 13:15:57 +0000 (14:15 +0100)]
ada: Move checks for consequences of Exceptional_Cases to GNAT
Previously checks for consequence expressions of Exceptional_Cases aspects were
done in GNATprove backend. However, we can do them in the frontend, where they
will apply to all subprograms, regardless of the SPARK_Mode aspect.
gcc/ada/ChangeLog:
* sem_prag.adb (Analyze_Exceptional_Cases_In_Decl_Part): Move check
from GNATprove backend to GNAT frontend.
Piotr Trojanek [Mon, 16 Dec 2024 12:52:43 +0000 (13:52 +0100)]
ada: Fix comments about Subprogram_Variant and Exceptional_Cases
The comment about Subprogram_Variant was outdated after more types have been
allowed by the corresponding SPARK RM rule; the comment about Exceptional_Cases
was incorrect, after being copy-pasted.
Steve Baird [Fri, 13 Dec 2024 01:06:00 +0000 (17:06 -0800)]
ada: Put_Image spec incorrectly ignored for Fixed_Point_Type'Base'Image call.
If a Put_Image aspect specification (introduced in Ada 2022) is given for a
fixed point type Fx, then in some cases a call to Fx'Base'Image would
incorrectly ignore the aspect specification and would instead return the
pre-Ada2022 version of the image. However, a call to Fx'Image would do the
right thing.
gcc/ada/ChangeLog:
* exp_put_image.adb (Image_Should_Call_Put_Image): Cope with the case
where the attribute prefix for an Image attribute reference
denotes an Itype constructed for a fixed point type. Calling
Has_Aspect with such an Itype misses applicable aspect
specifications; we need to look on the right list. This comes up
if the prefix of the attribute reference is
Some_Fixed_Point_Type'Base.
Gary Dismukes [Fri, 13 Dec 2024 23:36:05 +0000 (23:36 +0000)]
ada: Error on instantiation with defaulted formal type referencing other formal type
The compiler wasn't accounting for default subtypes on generic formal types
that reference other formal types of the same generic, leading to errors
about invalid subtypes. Several other problems that could lead to blowups
or incorrect errors were noticed through testing related cases and fixed
along the way.
gcc/ada/ChangeLog:
* sem_ch12.adb (Analyze_One_Association): In the case of a formal type
that has a Default_Subtype_Mark that does not have its Entity field set,
this means the default refers to another formal type of the same generic
formal part, so locate the matching subtype in the Result_Renamings and
set Match's Entity to that subtype prior to the call to Instantiate_Type.
(Validate_Formal_Type_Default.Reference_Formal): Add test of Entity being
Present, to prevent blowups on End_Label ids (which don't have Entity set).
(Validate_Formal_Type_Default.Validate_Derived_Type_Default): Apply
Base_Type to Formal.
(Validate_Formal_Type_Default): Guard interface-related semantic checks
with a test of Is_Tagged_Type.
Eric Botcazou [Mon, 16 Dec 2024 07:59:26 +0000 (08:59 +0100)]
ada: Restrict previous change made to expansion of allocators
There is no need to build a cleanup if exceptions cannot be propagated.
gcc/ada/ChangeLog:
* exp_ch4.adb (Expand_Allocator_Expression): Do not build a cleanup
if restriction No_Exception_Propagation is active.
* exp_ch6.adb (Make_Build_In_Place_Call_In_Allocator): Likewise.
Deng Jianbo [Tue, 31 Dec 2024 11:33:23 +0000 (19:33 +0800)]
LoongArch: Optimize initializing fp register to zero
LoongArch currently uses the instruction movgr2fr.{d|w} to move zero
from a fixed-point register to a floating-point register in order to
initialize an fp register to zero. When LSX or LASX is enabled, we can use
the instruction vxor.v, which has lower latency than movgr2fr.{d|w}, to set
the fp register to zero directly.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_output_move):
Optimize instructions for initializing fp register to zero.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/mov-zero-1.c: New test.
* gcc.target/loongarch/mov-zero-2.c: New test.
Gaius Mulley [Tue, 7 Jan 2025 11:20:45 +0000 (11:20 +0000)]
[PR modula2/118010, modula2/118183] Unable to rebuild the bootstrap tools and Wtypemismatch
This patch combines fixes for both PR-118010 (Wtypemismatch) and PR-118183
(unable to rebuild the bootstrap tools). PR-118010 required a new data
type (COFF_T) to be exported from SYSTEM and used in all return
types for libc.lseek. The patch also includes COFF_T implemented in mc
and this data type has been propagated through the translated versions
of pge and mc. Finally, the patch adjusts the modula-2 declaration of
location_t to reflect the new gcc 64 bit type.
A new command line option -fm2-file-offset-bits= has been implemented to
override the default 64 bit declaration of COFF_T.
gcc/ChangeLog:
PR modula2/118010
* doc/gm2.texi (Compiler options): New option
-fm2-file-offset-bits=.
Fortran: Extend cyclic type detection for deallocate [PR116669]
Cycles in derived/class types led the compiler into endless
recursion in several locations when the cycle was not immediate.
An immediate cyclic dependency is present in, for example, T T::comp.
Cyclic dependencies of the form T T2::comp; T2 T::comp2; are now
detected and the recursive bit in the derived type's attr is set.
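A non-immediate cycle of this kind can be found with a depth-first walk over component types that remembers which types are already being visited. The following is a sketch in C over a toy type table; `struct type`, `MAX_COMPS`, `reaches` and `type_is_cyclic` are hypothetical names and do not mirror the gfortran data structures or the resolve.cc code.

```c
#include <stddef.h>

#define MAX_COMPS 4

/* Toy derived type: component types (unused slots are NULL) and a
   flag marking types that are on the current walk path.  */
struct type
{
  struct type *comps[MAX_COMPS];
  int on_path;
};

/* Return 1 if ROOT is reachable from T through component types.  */
static int
reaches (struct type *t, const struct type *root)
{
  int found = 0;

  if (t->on_path)
    return 0;   /* Already being visited: do not recurse endlessly.  */
  t->on_path = 1;
  for (size_t i = 0; i < MAX_COMPS && t->comps[i] && !found; i++)
    found = t->comps[i] == root || reaches (t->comps[i], root);
  t->on_path = 0;
  return found;
}

/* Return 1 if T is used in its own component hierarchy, either
   immediately (T T::comp) or through other types.  */
int
type_is_cyclic (struct type *t)
{
  return reaches (t, t);
}
```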
gcc/fortran/ChangeLog:
PR fortran/116669
* class.cc (gfc_find_derived_vtab): Use attr to determine cyclic
type dependencies.
* expr.cc (gfc_has_default_initializer): Prevent endless
recursion by storing already visited derived types.
* resolve.cc (resolve_cyclic_derived_type): Determine if a type
is used in its hierarchy in a cyclic way.
(resolve_fl_derived0): Call resolve_cyclic_derived_type.
(resolve_fl_derived): Ensure vtab is generated when cyclic
derived types have allocatable components.
* trans-array.cc (structure_alloc_comps): Prevent endless loop
for derived type cycles.
* trans-expr.cc (gfc_get_ultimate_alloc_ptr_comps_caf_token):
Off topic, just prevent memory leaks.
gcc/testsuite/ChangeLog:
* gfortran.dg/class_array_15.f03: Freeing more memory.
* gfortran.dg/recursive_alloc_comp_6.f90: New test.
This patch removes the AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS tunable and
use_new_vector_costs entry in aarch64-tuning-flags.def and makes the
AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS paths in the backend the
default. To that end, the function aarch64_use_new_vector_costs_p and its uses
were removed. To prevent costing vec_to_scalar operations with 0, as
described in
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665481.html,
we adjusted vectorizable_store such that the variable n_adjacent_stores
also covers vec_to_scalar operations. This way vec_to_scalar operations
are not costed individually, but as a group.
As suggested by Richard Sandiford, the "known_ne" in the multilane-check
was replaced by "maybe_ne" in order to treat nunits==1+1X as a vector
rather than a scalar.
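The difference between the two predicates can be sketched with a toy model of a two-coefficient polynomial a + b*X, where X >= 0 is only known at run time. This is an illustrative C model with hypothetical names (`struct poly2`, `poly_maybe_eq`, `poly_maybe_ne`, `poly_known_ne`), not GCC's poly-int.h implementation. For nunits == 1+1X the value equals 1 when X is 0, so it is not known to differ from 1, but it may differ from 1:

```c
/* Toy two-coefficient polynomial a + b*X, with X >= 0 unknown until
   run time.  */
struct poly2
{
  long a, b;
};

/* True if P and Q are equal for some integer X >= 0.  */
static int
poly_maybe_eq (struct poly2 p, struct poly2 q)
{
  long da = q.a - p.a;   /* Equality needs (p.b - q.b) * X == da.  */
  long db = p.b - q.b;

  if (db == 0)
    return da == 0;
  return da % db == 0 && da / db >= 0;
}

/* True if P and Q differ for at least one X >= 0.  */
int
poly_maybe_ne (struct poly2 p, struct poly2 q)
{
  return p.a != q.a || p.b != q.b;
}

/* True if P and Q differ for every X >= 0.  */
int
poly_known_ne (struct poly2 p, struct poly2 q)
{
  return !poly_maybe_eq (p, q);
}
```

In this model a check written with the "known" predicate rejects 1+1X, while the "maybe" predicate accepts it, which matches the motivation given above for treating it as a vector.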
Two tests were adjusted due to changes in codegen. In both cases, the
old code performed loop unrolling once, but the new code does not:
Example from gcc.target/aarch64/sve/strided_load_2.c (compiled with
-O2 -ftree-vectorize -march=armv8.2-a+sve -mtune=generic -moverride=tune=none):
f_int64_t_32:
cbz w3, .L92
mov x4, 0
uxtw x3, w3
+ cntd x5
+ whilelo p7.d, xzr, x3
+ mov z29.s, w5
mov z31.s, w2
- whilelo p6.d, xzr, x3
- mov x2, x3
- index z30.s, #0, #1
- uqdecd x2
- ptrue p5.b, all
- whilelo p7.d, xzr, x2
+ index z30.d, #0, #1
+ ptrue p6.b, all
.p2align 3,,7
.L94:
- ld1d z27.d, p7/z, [x0, #1, mul vl]
- ld1d z28.d, p6/z, [x0]
- movprfx z29, z31
- mul z29.s, p5/m, z29.s, z30.s
- incw x4
- uunpklo z0.d, z29.s
- uunpkhi z29.d, z29.s
- ld1d z25.d, p6/z, [x1, z0.d, lsl 3]
- ld1d z26.d, p7/z, [x1, z29.d, lsl 3]
- add z25.d, z28.d, z25.d
+ ld1d z27.d, p7/z, [x0, x4, lsl 3]
+ movprfx z28, z31
+ mul z28.s, p6/m, z28.s, z30.s
+ ld1d z26.d, p7/z, [x1, z28.d, uxtw 3]
add z26.d, z27.d, z26.d
- st1d z26.d, p7, [x0, #1, mul vl]
- whilelo p7.d, x4, x2
- st1d z25.d, p6, [x0]
- incw z30.s
- incb x0, all, mul #2
- whilelo p6.d, x4, x3
+ st1d z26.d, p7, [x0, x4, lsl 3]
+ add z30.s, z30.s, z29.s
+ incd x4
+ whilelo p7.d, x4, x3
b.any .L94
.L92:
ret