The issue is the same as 12383255fe4e82c31f5e42c72a8fbcb1b5dea35d.
Neither is .REDUC_PLUS set for V2SImode on LoongArch, so add it
to the list of targets not expecting BB vectorization.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/bb-slp-77.c: Add loongarch*-*-* to the list
of expected failing targets.
Jeff Law [Sun, 9 Mar 2025 20:25:37 +0000 (14:25 -0600)]
[rtl-optimization/117467] Mark FP destinations as dead
The next step in improving ext-dce is to clean up a minor wart in the
set/clobber handling code.
In that code the safe thing to do is to not process a destination at all. That
will leave bits set in the live bitmaps for objects that may no longer be live.
Of course with extraneous bits set we use more memory and do more work managing
the bitmaps, but it's safe from a code correctness standpoint.
One case that is slipping through that we need to fix is scalar fp
destinations. Essentially the code never tried to handle those and as a result
would leave those entities live and bubble them up through the CFG.
In the testcase at hand this takes us from ~10k live objects at entry to ~4k
live objects at entry. Time spent in ext-dce goes from 2.14s to 0.64s.
Jeff Law [Sun, 9 Mar 2025 19:28:10 +0000 (13:28 -0600)]
[rtl-optimization/117467] Avoid unnecessarily marking things live in ext-dce
This is the first of what I expect to be a few patches to improve memory
consumption and performance of ext-dce.
While I haven't been able to reproduce the insane memory usage that Richi saw,
I can certainly see how we might get there. I instrumented ext-dce to dump the
size of liveness sets, removed the memory allocation limiter, then compiled the
appropriate file from specfp on rv64.
In my test I saw the liveness sets growing to absurd sizes as we worked from
the last block back to the first. Think 125k entries by the time we got back
to the entry block which would mean ~30k live registers. Simply no way that's
correct.
The use handling is the primary source of problems and the code that I most
want to rewrite for gcc-16. It's just a fugly mess. I'm not terribly inclined
to do that rewrite for gcc-15 though. So these will be spot adjustments.
The most important thing to know about use processing is it sets up an iterator
and walks that. When a SET is encountered we actually manually
dive into the SRC/DEST and ideally terminate the iterator.
If during that SET processing we encounter something unexpected we let the
iterator continue normally, which causes iteration down into the SET_DEST
object. That's safe behavior, though it can lead to too many objects being
marked live.
We can refine that behavior by trivially realizing that we need not process the
SET_DEST if it is a naked REG (and probably for other cases too, but they're
not expected to be terribly important). So once we see the SET with a simple
REG destination, we can bump the iterator to avoid having it dive into the
SET_DEST if something unexpected is seen on the SET_SRC side.
Fixing this alone takes us from 125k live objects to 10k live objects at the
entry block. Time in ext-dce for rv64 on the testcase goes from 10.81s to
2.14s.
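The effect of not walking into a naked REG destination can be sketched with a toy worklist walker. Everything here (the `Code` enum, the `Expr` type, `collect_uses`) is hypothetical and far simpler than RTL and the subrtx iterator used in ext-dce.cc. It only shows why skipping the destination of `(set (reg 1) ...)` keeps the written register out of the live set, while an unusual destination shape is still walked conservatively.

```cpp
#include <set>
#include <vector>

// Hypothetical, heavily simplified stand-in for RTL; not GCC's data structure.
enum class Code { Reg, Plus, Set, Unknown };

struct Expr {
  Code code;
  int regno = -1;                // only meaningful for Code::Reg
  std::vector<const Expr*> ops;  // operands; for Code::Set: {dest, src}
};

// Collect the registers an expression reads.  For a SET whose destination
// is a naked REG, only the source is walked: that register is written, not
// read.  Any other destination shape is walked too, which is safe but may
// mark more registers live than necessary.
std::set<int> collect_uses(const Expr* root) {
  std::set<int> live;
  std::vector<const Expr*> work{root};
  while (!work.empty()) {
    const Expr* e = work.back();
    work.pop_back();
    switch (e->code) {
    case Code::Reg:
      live.insert(e->regno);
      break;
    case Code::Set:
      if (e->ops[0]->code == Code::Reg) {
        work.push_back(e->ops[1]);  // skip the destination entirely
        break;
      }
      [[fallthrough]];
    default:
      for (const Expr* op : e->ops)  // conservative: visit every operand
        work.push_back(op);
      break;
    }
  }
  return live;
}
```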
Given this reduces the things considered live, this could easily result in
finding more cases for ext-dce to improve. In fact a missed optimization issue
for rv64 I've been poking at needs this patch as a prerequisite.
Bootstrapped and regression tested on x86_64.
Pushing to the trunk.
PR rtl-optimization/117467
gcc
* ext-dce.cc (ext_dce_process_uses): When trivially possible advance
the iterator over the destination of a SET.
Andrew Pinski [Sun, 9 Mar 2025 06:43:54 +0000 (22:43 -0800)]
phiopt: Fix value_replacement for middle bb having phi nodes [PR118922]
After r12-5300-gf98f373dd822b3, value_replacement would be able to look at the
following cfg structure:
```
<bb 5> [local count: 1014686024]:
  if (h_6 != 0)
    goto <bb 7>; [94.50%]
  else
    goto <bb 6>; [5.50%]
```
value_replacement would incorrectly think the middle bb (6) was empty, so it decided
to remove the condition in bb5 and replace it with 0, as the function thought it was `h_6 ? 0 : h_6`.
But since there is an incoming phi node to bb6 defining h_6, that is incorrect.
The fix is to check if there are phi nodes in the middle bb and, if so, set empty_or_with_defined_p to false.
This was not needed before r12-5300-gf98f373dd822b3 because the phi would have been dead otherwise due to
other checks.
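A source-level model of why the fold is only valid when the value reaching the join from the middle block is the same h_6 that was tested. The function names are hypothetical and this abstracts away the CFG; it does not reproduce the testcase from the PR.

```cpp
// What value_replacement assumed: the else path yields the tested h itself,
// so `h != 0 ? 0 : h` is 0 on every path and folding it to 0 is valid.
constexpr int assumed_shape(int h) { return h != 0 ? 0 : h; }

// What can actually happen: a PHI in the middle block supplies a value v
// that need not equal the tested h, so folding to 0 would be wrong.
constexpr int actual_shape(int h, int v) { return h != 0 ? 0 : v; }
```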
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/118922
gcc/ChangeLog:
* tree-ssa-phiopt.cc (value_replacement): Set empty_or_with_defined_p
to false when there are phi nodes in the middle bb.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr118922-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
OpenMP: Integrate dynamic selectors with dispatch argument handling [PR118457]
Support for dynamic selectors in "declare variant" was developed in
parallel with support for the adjust_args/append_args clauses and the
dispatch construct; they collided in a bad way. This patch fixes the
"sorry" for calls that need both by removing the adjust_args/append_args
code from gimplify_call_expr and invoking it from the new variant
substitution code instead. It's handled as a tree -> tree transformation
rather than tree -> gimple because eventually this code may end up being
invoked from the front ends instead of the gimplifier (see PR115076).
gcc/ChangeLog
PR middle-end/118457
* gimplify.cc (modify_call_for_omp_dispatch): New, containing
code split from gimplify_call_expr and modified to emit tree
instead of gimple. Remove the error for falling through to a call
to the base function.
(expand_variant_call_expr): New, split from gimplify_variant_call_expr.
Call modify_call_for_omp_dispatch on calls to
variants in a dispatch construct context.
(gimplify_variant_call_expr): Make it call expand_variant_call_expr
to do the actual work.
(gimplify_call_expr): Remove sorry for calls involving both
dynamic/late selectors and adjust_args/append_args, and adjust
for new interface. Move adjust_args/append_args code to
modify_call_for_omp_dispatch.
(gimplify_omp_dispatch): Add some comments.
Thomas Koenig [Sat, 8 Mar 2025 15:13:41 +0000 (16:13 +0100)]
Fix regression with -Wexternal-argument-mismatch.
The attached patch fixes an ICE regression where undo state was not
handled properly when generating formal from actual arguments, which
occurred under certain conditions with the newly introduced
-Wexternal-argument-mismatch option.
The fix is simple: When we are generating these symbols, we no
longer need to undo anything, so we can just remove them.
I had considered adding an extra optional argument, but decided
against it on code clarity grounds.
While looking at the code, I also saw that a member of gfc_symbol
introduced with my patch should be a bitfield of width 1.
gcc/fortran/ChangeLog:
PR fortran/119157
* gfortran.h (gfc_symbol): Make ext_dummy_arglist_mismatch a
one-bit bitfield.
(gfc_pop_undo_symbol): Declare prototype.
* symbol.cc (gfc_pop_undo_symbol): New function.
* interface.cc (gfc_get_formal_from_actual_arglist): Call it
for artificially introduced formal variables.
gcc/testsuite/ChangeLog:
PR fortran/119157
* gfortran.dg/interface_57.f90: New test.
This commit implements the proposed resolution to LWG4169, which is
to constrain std::atomic<T>'s default constructor based on whether
T itself is default constructible.
At the moment, std::atomic<T>'s primary template in libstdc++ has a
defaulted default constructor. Value-initialization of the T member
(since C++20 / P0883R2) is done via an NSDMI (= T()).
GCC already considers the defaulted constructor constrained/deleted,
however this behavior is non-standard (see the discussion in PR116769):
the presence of an NSDMI should not make the constructor unavailable to
overload resolution/deleted ([class.default.ctor]/2.5 does not apply).
When using libstdc++ on Clang, this causes build issues as the
constructor is *not* deleted there -- the interpretation of
[class.default.ctor]/4 seems to match Clang's behavior.
Therefore, although there would be "nothing to do" with GCC+libstdc++,
this commit changes the code so as to stop relying on the GCC language
extension. In C++ >= 20 modes, std::atomic's defaulted default
constructor is changed to be a non-defaulted one, with a constraint
added as per LWG4169; value-initialization of the data member is moved
from the NSDMI to the member init list. The new signature matches the
one in the Standard as per [atomics.types.operations]/1.
In pre-C++20 modes, the constructor is left defaulted. This ensures
compatibility with C++11/14/17 behavior. In other words: we are not
backporting P0883R2 to earlier language modes here.
Amend an existing test to check that a std::atomic wrapping a
non-default constructible type is always non-default constructible:
from C++20, because of the constraint; before C++20, because we
are removing the NSDMI, and therefore [class.default.ctor]/2.5
applies.
Add another test that checks that std::atomic is trivially default
constructible in pre-C++20 modes, and it isn't afterwards.
libstdc++-v3/ChangeLog:
* include/bits/version.def (atomic_value_initialization):
Guard the FTM with the language concepts FTM.
* include/bits/version.h: Regenerate.
* include/std/atomic (atomic): When atomic value init is
defined, change the defaulted default constructor to
a non-defaulted one, constraining it as per LWG4169.
Otherwise, keep the existing constructor.
Remove the NSDMI for the _M_i member.
(_GLIBCXX20_INIT): Drop the macro, as it is not needed any more.
* testsuite/29_atomics/atomic/69301.cc: Test that
an atomic wrapping a non-default-constructible type is
always itself non-default-constructible (in all language
modes).
* testsuite/29_atomics/atomic/cons/trivial.cc: New test.
inline-asm: Improve documentation of "asm constexpr".
While working on an adjacent documentation fix, I noticed that the
documentation for the gnu++11 "asm constexpr" feature was very
confusing, in some cases being attached to parts of the asm syntax
that are not otherwise required to be string literals, and missing from
other parts of the syntax that are. I've checked what the C++ parser
actually does and fixed the documentation to match, also improving it
to use correct markup and to be more explicit and less implementor-speaky.
gcc/cp/ChangeLog
* parser.cc (cp_parser_asm_definition): Make comment more explicit.
(cp_parser_asm_operand_list): Likewise. Also correct the comment
block at the top of the function to reflect reality.
gcc/ChangeLog
* doc/extend.texi (Basic Asm): Document that AssemblerInstructions
can be an asm constexpr.
(Extended Asm): Move the notes about asm constexprs for
AssemblerTemplate and Clobbers to the corresponding subsections.
Remove the notes for OutputOperands and InputOperands and reword
misleading descriptions of the list item syntax. Note that
constraint strings can be asm constexprs.
(Asm constexprs): Use "title case" for subsection name. Be
explicit about what parts of the asm syntax this applies to and
that the parentheses are required. Correct markup and terminology.
Jason Merrill [Thu, 6 Mar 2025 17:39:36 +0000 (12:39 -0500)]
c++/modules: purview of explicit instantiations [PR114630]
When calling instantiate_pending_templates at end of parsing, any new
functions that are instantiated from this point have their module
purview set based on the current value of module_kind.
This is not ideal, however, as the modules code will then treat these
instantiations as reachable and cause large swathes of the GMF to be
emitted into the module CMI, despite no code in the actual module
purview referencing it.
This patch fixes this by setting DECL_MODULE_PURVIEW_P as appropriate when
we see an explicit instantiation, and adjusting module_kind accordingly
during deferred instantiation, meaning that GMF entities won't be counted
as reachable unless referenced by an actually reachable entity.
Note that purviewness, attachment, etc. are generally determined only
by the base template: this is purely for determining whether an
explicit instantiation is in the module purview and hence whether it
should be streamed out. See the comment on 'set_instantiating_module'.
Incidentally, since the "xtreme" testcases are deliberately large (and this
commit adds another one), let's make sure we only run them once.
PR c++/114630
PR c++/114795
gcc/cp/ChangeLog:
* pt.cc (reopen_tinst_level): Set or clear MK_PURVIEW.
(mark_decl_instantiated): Call set_instantiating_module.
(instantiate_pending_templates): Save and restore module_kind so
it isn't affected by reopen_tinst_level.
gcc/testsuite/ChangeLog:
* g++.dg/modules/modules.exp: Run xtreme tests once.
* g++.dg/modules/gmf-3.C: New test.
* g++.dg/modules/gmf-4.C: New test.
* g++.dg/modules/gmf-xtreme.C: New test.
My P3349R1 paper clarifies that we should be able to lower contiguous
iterators to pointers, without worrying about side effects of individual
increment or dereference operations.
We do need to advance the iterators, and we need to use std::to_address
on the result of advancing them. This ensures that iterators with error
detection get a chance to diagnose bugs. If we don't use std::to_address
on the advanced iterator, it would be possible for a memcpy on the
pointers to overflow a buffer. By performing the += or -= operations and
also using std::to_address, we give the iterator a chance to abort,
throw, or call a violation handler before the buffer overflow happens.
The new tests only check the std::copy* algorithms, because std::move
and std::move_backward use the same implementation details.
libstdc++-v3/ChangeLog:
* include/bits/stl_algobase.h (__nothrow_contiguous_iterator):
Remove.
(__memcpyable_iterators): Simplify.
(__copy_move_a2, __copy_n_a, __copy_move_backward_a2): Call
std::to_address on the iterators after advancing them.
* testsuite/25_algorithms/copy/contiguous.cc: New test.
* testsuite/25_algorithms/copy_backward/contiguous.cc: New test.
* testsuite/25_algorithms/copy_n/contiguous.cc: New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
inline-asm: Clarify documentation of operand syntax [PR67301]
gcc/ChangeLog
PR c/67301
* doc/extend.texi (Extended Asm): Clarify that the square brackets
around the asmSymbolicName of operands are a required part of
the syntax.
Jerry DeLisle [Sat, 8 Mar 2025 02:33:29 +0000 (18:33 -0800)]
Fortran: Fix ICE in resolve.cc with -pedantic
Fixes an ICE in gfc_resolve_code when passing an
optional array to an elemental procedure with `-pedantic` enabled.
PR95446 added the original check, this patch fixes the case where the
other actual argument is an array literal (or something else other
than a variable).
PR fortran/119054
gcc/fortran/ChangeLog:
* resolve.cc (resolve_elemental_actual): When checking other
actual arguments to elemental procedures, don't check
attributes of literals and function calls.
gcc/testsuite/ChangeLog:
* gfortran.dg/pr95446.f90: Expand test case to literals and
function calls.
Jakub Jelinek [Fri, 7 Mar 2025 23:50:13 +0000 (00:50 +0100)]
c-family, tree: Allow nonstring attribute on multidimensional arrays [PR117178]
As requested in the PR117178 thread, the following patch allows nonstring
attribute also on multi-dimensional arrays (with cv char/signed char/unsigned
char as innermost element type) and pointers to such multi-dimensional arrays
or pointers to single-dimensional cv char/signed char/unsigned char arrays.
Given that (unfortunately) nonstring is a decl attribute rather than a type
attribute, I think restricting it to single-dimensional arrays makes no
sense; even multi-dimensional ones can be used for storage of
non-NUL-terminated strings.
I really don't know what the kernel plans are, whether
they'll go with -Wno-unterminated-string-initialization added in Makefiles,
or whether the plan is to use nonstring attributes to quiet the warning.
In the latter case, some of the nonstring attributes will need to be
conditional on gcc version, because gcc before this patch will reject it
on multidimensional arrays.
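One way to make such attributes conditional on the compiler version is to wrap them in a macro. `NONSTRING_2D` and `codes` are hypothetical names, and the GCC 15 cutoff is an assumption based on when this patch landed; other compilers simply get an empty macro. The example uses brace initialization so it is valid C++ as well as C.

```cpp
#include <cstring>

// Hypothetical macro: only newer GCC accepts nonstring on multi-dimensional
// arrays, so expand to nothing elsewhere.
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 15
#  define NONSTRING_2D __attribute__((nonstring))
#else
#  define NONSTRING_2D
#endif

// Fixed-width 4-character codes that are not NUL-terminated.
NONSTRING_2D static char codes[2][4] = {{'a', 'b', 'c', 'd'},
                                        {'e', 'f', 'g', 'h'}};
```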
2025-03-08 Jakub Jelinek <jakub@redhat.com>
PR c/117178
gcc/
* tree.cc (get_attr_nonstring_decl): Look through all ARRAY_REFs, not
just one and handle COMPONENT_REF and MEM_REF after skipping those
rather than only when there wasn't ARRAY_REF. Formatting fix.
gcc/c-family/
* c-attribs.cc (handle_nonstring_attribute): Allow the attribute also
on multi-dimensional arrays with char/signed char/unsigned char
element type or pointers to such single and multi-dimensional arrays.
gcc/testsuite/
* c-c++-common/attr-nonstring-7.c: Remove one xfail.
* c-c++-common/attr-nonstring-9.c: New test.
* c-c++-common/attr-nonstring-10.c: New test.
* c-c++-common/attr-nonstring-11.c: New test.
* c-c++-common/attr-nonstring-12.c: New test.
* c-c++-common/attr-nonstring-13.c: New test.
* c-c++-common/attr-nonstring-14.c: New test.
* c-c++-common/attr-nonstring-15.c: New test.
* c-c++-common/attr-nonstring-16.c: New test.
Jakub Jelinek [Fri, 7 Mar 2025 22:59:34 +0000 (23:59 +0100)]
c: do not warn about truncating NUL char when initializing nonstring arrays [PR117178]
When initializing a nonstring char array when compiled with
-Wunterminated-string-initialization the warning trips even when
truncating the trailing NUL character from the string constant. Only
warn about this when running under -Wc++-compat since under C++ we should
not initialize nonstrings from C strings.
This patch separates the -Wunterminated-string-initialization and
-Wc++-compat warnings; they are now independent options, the former implied
by -Wextra, the latter not implied by anything. If -Wc++-compat is in effect,
it takes precedence over -Wunterminated-string-initialization and warns regardless
of nonstring attribute, otherwise if -Wunterminated-string-initialization is
enabled, it warns only if there isn't nonstring attribute.
In all cases, the warnings and also pedwarn_init for even larger sizes now
provide details on the lengths.
2025-03-07 Kees Cook <kees@kernel.org>
Jakub Jelinek <jakub@redhat.com>
PR c/117178
gcc/
* doc/invoke.texi (Wunterminated-string-initialization): Document
the new interaction between this warning and -Wc++-compat and that
initialization of decls with nonstring attribute aren't warned about.
gcc/c-family/
* c.opt (Wunterminated-string-initialization): Don't depend on
-Wc++-compat.
gcc/c/
* c-typeck.cc (digest_init): Add DECL argument. Adjust wording of
pedwarn_init for too long strings and provide details on the lengths,
for string literals where just the trailing NUL doesn't fit warn for
warn_cxx_compat with OPT_Wc___compat, wording which mentions "for C++"
and provides details on lengths, otherwise for
warn_unterminated_string_initialization adjust the warning, provide
details on lengths and don't warn if get_attr_nonstring_decl (decl).
(build_c_cast, store_init_value, output_init_element): Adjust
digest_init callers.
gcc/testsuite/
* gcc.dg/Wunterminated-string-initialization.c: Add additional test
coverage.
* gcc.dg/Wcxx-compat-14.c: Check in dg-warning for "for C++" part of
the diagnostics.
* gcc.dg/Wcxx-compat-23.c: New test.
* gcc.dg/Wcxx-compat-24.c: New test.
Sanitizer: Mention -g option in documentation [PR56682]
gcc/ChangeLog
PR sanitizer/56682
* doc/invoke.texi (Instrumentation Options): Document that -g
is useful with -fsanitize=thread and -fsanitize=address.
Also mention -fno-omit-frame-pointer per the asan wiki.
Andrew Pinski [Fri, 7 Mar 2025 21:18:43 +0000 (21:18 +0000)]
Fix testcases up after recent -Wreturn-type change
I missed these two testcases in the diff when looking for testcases
that fail. The change is the same as what was done for
gcc.dg/Wreturn-mismatch-2.c.
Pushed as obvious after a quick test.
gcc/testsuite/ChangeLog:
* gcc.dg/Wreturn-mismatch-2a.c: Change dg-warning
for the last -Wreturn-type to dg-bogus.
* gcc.dg/Wreturn-mismatch-6.c: Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Jonathan Wakely [Thu, 6 Mar 2025 21:18:21 +0000 (21:18 +0000)]
libstdc++: Make std::erase for linked lists convert to bool
LWG 4135 (approved in Wrocław, November 2024) fixes the lambda
expressions used by std::erase for std::list and std::forward_list.
Previously they attempted to copy something that isn't required to be
copyable. Instead they should convert it to bool right away.
The issue resolution also changes the lambda's parameter to be const, so
that it can't modify the elements while comparing them.
libstdc++-v3/ChangeLog:
* include/std/forward_list (erase): Change lambda to have
explicit return type and const parameter type.
* include/std/list (erase): Likewise.
* testsuite/23_containers/forward_list/erasure.cc: Check lambda
is correct.
* testsuite/23_containers/list/erasure.cc: Likewise.
Jonathan Wakely [Thu, 6 Mar 2025 20:23:29 +0000 (20:23 +0000)]
libstdc++: Add poison pill for chrono::from_stream
LWG 3956 (approved in Hagenberg, February 2025) decided that from_stream
should be found *only* by ADL, not ordinary unqualified lookup. Add a
poison pill overload to chrono::__detail where the __parsable concept
and operator>>(basic_istream&, _Parser) are defined. This ensures that
when they use from_stream unqualified, ordinary lookup finds the poison
pill, which is deleted, so overload resolution can only succeed with
candidates found by ADL.
We already have the std/time/parse/parse.cc test checking that ADL
works, so this doesn't add a new test.
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (chrono::__detail::from_stream): Add
deleted function as poison pill for unqualified lookup.
This patch removes TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE and
replaces it with two hooks: one that controls the cost of using an
extra callee-saved register and one that controls the cost of allocating
a frame for the first spill.
(The patch does not attempt to address the shrink-wrapping part of
the thread above.)
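A toy model of the callee-save part of this idea. Everything here (`ToyCostState`, `toy_callee_save_cost`, `toy_record_allocation`, and the frequencies and costs) is hypothetical and is not the interface of the new `TARGET_CALLEE_SAVE_COST` hook. It only shows the bookkeeping the ChangeLog describes for assign_hard_reg: the save/restore cost is charged the first time a callee-saved register is used, with the save weighted by the entry frequency and the restore by the exit frequency, and nothing extra is charged once the register is already in use.

```cpp
#include <bitset>

// Hypothetical toy state, not IRA's data structures.
struct ToyCostState {
  std::bitset<64> used_callee_saved;  // callee-saved regs already allocated
  int entry_freq;                     // frequency of the prologue (save)
  int exit_freq;                      // frequency of the epilogue (restore)
  int mem_cost;                       // cost of one save or one restore
};

// Extra cost of allocating callee-saved register REGNO.
inline int toy_callee_save_cost(const ToyCostState& s, unsigned regno) {
  if (s.used_callee_saved.test(regno))
    return 0;  // already saved and restored for an earlier allocation
  return s.mem_cost * s.entry_freq + s.mem_cost * s.exit_freq;
}

// Remember that REGNO is now in use, so later uses are free.
inline void toy_record_allocation(ToyCostState& s, unsigned regno) {
  s.used_callee_saved.set(regno);
}
```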
On AArch64, this is enough to fix PR117477, as verified by the new tests.
The patch does not change the SPEC2017 scores significantly. (I saw a
slight improvement in fotonik3d and roms, but I'm not convinced that
the improvements are real.)
The patch makes IRA use caller saves for gcc.target/aarch64/pr103350-1.c,
which is a scan-dump correctness test that relies on not using
caller saves. The decision to use caller saves looks appropriate,
and saves an instruction, so I've just added -fno-caller-saves
to the test options.
The x86 parts were written by Honza. ix86_callee_save_cost is updated
by H.J. to replace gcc_checking_assert with returning 1 if mem_cost <= 2.
gcc/
PR rtl-optimization/117477
* config/aarch64/aarch64.cc (aarch64_count_saves): New function.
(aarch64_count_above_hard_fp_saves, aarch64_callee_save_cost)
(aarch64_frame_allocation_cost): Likewise.
(TARGET_CALLEE_SAVE_COST): Define.
(TARGET_FRAME_ALLOCATION_COST): Likewise.
* config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale):
Replace with...
(ix86_callee_save_cost): ...this new hook.
(TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete.
(TARGET_CALLEE_SAVE_COST): Define.
* target.h (spill_cost_type, frame_cost_type): New enums.
* target.def (callee_save_cost, frame_allocation_cost): New hooks.
(ira_callee_saved_register_cost_scale): Delete.
* doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete.
(TARGET_CALLEE_SAVE_COST, TARGET_FRAME_ALLOCATION_COST): New hooks.
* doc/tm.texi: Regenerate.
* hard-reg-set.h (hard_reg_set_popcount): New function.
* ira-color.cc (allocated_memory_p): New variable.
(allocated_callee_save_regs): Likewise.
(record_allocation): New function.
(assign_hard_reg): Use targetm.frame_allocation_cost to model
the cost of the first spill or first caller save. Use
targetm.callee_save_cost to model the cost of using new callee-saved
registers. Apply the exit rather than entry frequency to the cost
of restoring a register or deallocating the frame. Update the
new variables above.
(improve_allocation): Use record_allocation.
(color): Initialize allocated_callee_save_regs.
(ira_color): Initialize allocated_memory_p.
* targhooks.h (default_callee_save_cost): Declare.
(default_frame_allocation_cost): Likewise.
* targhooks.cc (default_callee_save_cost): New function.
(default_frame_allocation_cost): Likewise.
Andrew Pinski [Fri, 7 Mar 2025 00:07:02 +0000 (16:07 -0800)]
c: Fix warning after an error on a return statement [PR60440]
Like r5-6912-g3dbb84276aca10 but this is for the C front-end.
Basically, when there is an error on a return statement, we just return
error_mark_node, and then the -Wreturn-type warning fires because there
is no return statement. Instead, mark the current function for
suppression of the warning.
PR c/60440
gcc/c/ChangeLog:
* c-typeck.cc (c_finish_return): Mark the current function
for suppression of the -Wreturn-type if there was an error
on the return statement.
gcc/testsuite/ChangeLog:
* gcc.dg/Wreturn-mismatch-2.c: Change dg-warning
for the last -Wreturn-type to dg-bogus.
* gcc.dg/pr60440-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Marek Polacek [Tue, 11 Feb 2025 20:43:40 +0000 (15:43 -0500)]
c++: ICE with operator new[] in constexpr [PR118775]
Here we ICE since r11-7740 because we no longer say that (long)&a
(where a is a global var) is non_constant_p. So VERIFY_CONSTANT
does not return and we crash on tree_to_uhwi. We should check
tree_fits_uhwi_p before calling tree_to_uhwi.
Martin Jambor [Fri, 7 Mar 2025 16:17:24 +0000 (17:17 +0100)]
ipa-cp: Avoid ICE when redistributing nodes among edges to recursive clones (PR 118318)
PR 118318 reported an ICE during a PGO build of Firefox in IPA-CP, in
the final stages of update_counts_for_self_gen_clones, where it
attempts to guess how to distribute profile counts among clones created
for recursive edges and the various edges that are created in the
process. If one such edge has a profile count of kind GUESSED_GLOBAL0,
the compatibility check in operator+ leads to an ICE. After
discussing the situation with Honza, we concluded that there is little
more we can do other than check for this situation before touching the
edge count, so this is what this patch does.
gcc/ChangeLog:
2025-02-28 Martin Jambor <mjambor@suse.cz>
PR ipa/118318
* ipa-cp.cc (adjust_clone_incoming_counts): Add a compatible_p check.
arm: testsuite: improve guard checks for arm_neon.h
The header file arm_neon.h provides the Advanced SIMD intrinsics that
are available on armv7 or later A & R profile cores. However, they
are not compatible with M-profile and we also need to ensure that the
FP instructions are enabled (with -mfloat-abi=softfp/hard). That
leads to some complicated checking as arm_neon.h includes stdint.h
and, at least on linux, that can require that the appropriate ABI
bits/ headers are also installed.
This patch adds a new check to target-supports.exp to establish the
minimal set of option overrides needed to enable use of this header in
a test.
gcc/testsuite:
* lib/target-supports.exp
(check_effective_target_arm_neon_h_ok_nocache): New function.
(check_effective_target_arm_neon_h_ok): Likewise.
(add_options_for_arm_neon_h): Likewise.
(check_effective_target_arm_libc_fp_abi_ok_nocache): Allow any
Arm target, not just arm32.
* gcc.target/arm/attr-neon-builtin-fail.c: Use it.
* gcc.target/arm/attr-neon-builtin-fail2.c: Likewise.
* gcc.target/arm/attr-neon-fp16.c: Likewise.
* gcc.target/arm/attr-neon2.c: Likewise.
arm: make arm_neon.h compatible with '-march=<base> -mfloat-abi=softfp'
With -mfpu set to auto, an architecture specification that lacks
floating-point but has -mfloat-abi=softfp will cause a misleading
error. Specifically, if we have
We can therefore distinguish between the soft and softfp ABIs by
temporarily forcing VFP instructions into the ISA. If __ARM_FP is
still zero after doing this then we must be using the soft ABI.
gcc:
* config/arm/arm_neon.h: Try harder to detect if we have
the softfp ABI enabled.
Jakub Jelinek [Fri, 7 Mar 2025 15:35:11 +0000 (16:35 +0100)]
docs: Attempt to clarify complex literal suffixes [PR112960]
This attempts to clarify Complex literal suffixes in the documentation.
2025-03-07 Jakub Jelinek <jakub@redhat.com>
PR c/112960
PR c/117029
* doc/extend.texi (Complex): Add I and J suffixes to the list of
complex suffixes, adjust for all of those being part of ISO C2Y,
clarify that for -fno-ext-numeric-literals none of those are
recognized as GNU extensions and for C++14 i is considered UDL
even for -fext-numeric-literals when <complex> is included.
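For the C++ case mentioned above, a small example of the standard C++14 literal suffix `i` from `<complex>` (`std::complex_literals`). `make_z` is a hypothetical name. Note that the value would come out the same even if `3.0i` were read as a GNU imaginary constant, so this only illustrates the spelling, not which rule the compiler applied.

```cpp
#include <complex>

// std::complex_literals provides the C++14 user-defined literal suffix `i`.
using namespace std::complex_literals;

// Builds 2 + 3i using the literal suffix from <complex>.
inline std::complex<double> make_z() { return 2.0 + 3.0i; }
```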
Tamar Christina [Fri, 7 Mar 2025 13:46:41 +0000 (13:46 +0000)]
middle-end: delay checking for alignment to load [PR118464]
This fixes two PRs on Early break vectorization by delaying the safety checks to
vectorizable_load when the VF, VMAT and vectype are all known.
This patch does add two new restrictions:
1. On LOAD_LANES targets, where the buffer size is known, we reject non-power-of-two
group sizes, as they are unaligned every other iteration and so may
cross a page unwittingly. For those cases we require partial masking support.
2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization if
we cannot peel for alignment, as the alignment requirement is quite large at
GROUP_SIZE * vectype_size. This is unlikely to ever be beneficial so we
don't support it for now.
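The page-crossing concern behind restriction 1 is plain address arithmetic. The sketch assumes a 4096-byte page and a page-aligned base address; both are assumptions, the helper names are hypothetical, and this is not the vectorizer's actual check. Back-to-back 16-byte accesses never straddle a page boundary, while back-to-back 12-byte accesses (a group of three 4-byte elements) eventually do, for example the access starting at byte 4092.

```cpp
#include <cstdint>

// Does the byte range [addr, addr + bytes) touch more than one page?
constexpr bool crosses_page(std::uintptr_t addr, std::uintptr_t bytes,
                            std::uintptr_t page) {
  return addr / page != (addr + bytes - 1) / page;
}

// Does any of COUNT back-to-back accesses of BYTES bytes, starting at BASE,
// straddle a page boundary?
constexpr bool any_access_crosses(std::uintptr_t base, std::uintptr_t bytes,
                                  std::uintptr_t count, std::uintptr_t page) {
  for (std::uintptr_t i = 0; i < count; ++i)
    if (crosses_page(base + i * bytes, bytes, page))
      return true;
  return false;
}
```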
There are other steps documented inside the code itself so that the reasoning
is next to the code.
As a fall-back, when the alignment fails we require partial vector support.
For VLA targets like SVE we return element alignment as the desired vector
alignment. This means that the loads are never misaligned and so, annoyingly,
it won't ever need to peel.
So what I think needs to happen in GCC 16 is the following:
1. during vect_compute_data_ref_alignment we need to take the max of
POLY_VALUE_MIN and vector_alignment.
2. vect_do_peeling define skip_vector when PFA for VLA, and in the guard add a
check that ncopies * vectype does not exceed POLY_VALUE_MAX which we use as a
proxy for pagesize.
3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
vect_determine_partial_vectors_and_peeling since the first iteration has to
be partial. Require LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P otherwise we have
to fail to vectorize.
4. Create a default mask to be used, so that vect_use_loop_mask_for_alignment_p
becomes true and we generate the peeled check through loop control for
partial loops. From what I can tell this won't work for
LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling support at
all in the compiler. That would need to be done independently from the
above.
In any case, not GCC 15 material so I've kept the WIP patches I have downstream.
Bootstrapped and regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
-m32, -m64 with no issues.
gcc/ChangeLog:
PR tree-optimization/118464
PR tree-optimization/116855
* doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
* tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
checks.
(vect_compute_data_ref_alignment): Remove alignment checks and move to
get_load_store_type, increase group access alignment.
(vect_enhance_data_refs_alignment): Add note to comment needing
investigating.
(vect_analyze_data_refs_alignment): Likewise.
(vect_supportable_dr_alignment): For group loads look at first DR.
* tree-vect-stmts.cc (get_load_store_type):
Perform safety checks for early break pfa.
* tree-vectorizer.h (dr_set_safe_speculative_read_required,
dr_safe_speculative_read_required, DR_SCALAR_KNOWN_BOUNDS): New.
(need_peeling_for_alignment): Renamed to...
(safe_speculative_read_required): ...this.
(class dr_vec_info): Add scalar_access_known_in_bounds.
gcc/testsuite/ChangeLog:
PR tree-optimization/118464
PR tree-optimization/116855
* gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
load type is relaxed later.
* gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
* gcc.dg/vect/vect-early-break_22.c: Require partial vectors.
* gcc.dg/vect/vect-early-break_128.c: Likewise.
* gcc.dg/vect/vect-early-break_26.c: Likewise.
* gcc.dg/vect/vect-early-break_43.c: Likewise.
* gcc.dg/vect/vect-early-break_44.c: Likewise.
* gcc.dg/vect/vect-early-break_2.c: Require load_lanes.
* gcc.dg/vect/vect-early-break_7.c: Likewise.
* gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa11.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
* gcc.dg/vect/vect-early-break_39.c: Update testcase for misalignment.
* gcc.dg/vect/vect-early-break_18.c: Likewise.
* gcc.dg/vect/vect-early-break_20.c: Likewise.
* gcc.dg/vect/vect-early-break_21.c: Likewise.
* gcc.dg/vect/vect-early-break_38.c: Likewise.
* gcc.dg/vect/vect-early-break_6.c: Likewise.
* gcc.dg/vect/vect-early-break_53.c: Likewise.
* gcc.dg/vect/vect-early-break_56.c: Likewise.
* gcc.dg/vect/vect-early-break_57.c: Likewise.
* gcc.dg/vect/vect-early-break_81.c: Likewise.
Jonathan Wakely [Wed, 5 Mar 2025 21:08:21 +0000 (21:08 +0000)]
libstdc++: Add missing static_assert to std::expected<void,E>::value()&&
The r15-2326-gea435261ad58ea change missed a static_assert for
is_move_constructible_v in expected<cv void, E>::value()&&. When
exceptions are enabled, the program is ill-formed if the error type is
not move constructible, because we can't construct the
std::bad_expected_access. But prior to r15-7856-gd87c0d5443ba86, using
-fno-exceptions meant that we never constructed an exception, so didn't
need to copy/move the error value.
So that we don't rely on the r15-7856-gd87c0d5443ba86 change to the
_GLIBCXX_THROW_OR_ABORT macro to consistently enforce the Mandates:
conditions whether exceptions are enabled or not, we should check the
requirement explicitly.
This adds the missing static_assert. It also adds a test that verifies
the Mandates: conditions added by LWG 3843 and 3490 are enforced even
with -fno-exceptions.
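A minimal sketch of where such a Mandates: check sits, assuming a simplified, hypothetical `expected_void_sketch` class (not the libstdc++ type). The `static_assert` fires whenever `value()&&` is instantiated with a non-move-constructible `E`, independently of how the error path is compiled:

```cpp
#include <cassert>      // assert, used by the usage checks
#include <cstdio>
#include <cstdlib>
#include <type_traits>
#include <utility>

// Hypothetical, simplified stand-in for expected<void, E>; not the
// libstdc++ class.  For brevity it also requires E to be default
// constructible, which the real class does not.
template <typename E>
class expected_void_sketch {
public:
  expected_void_sketch() : has_value_(true), err_() {}
  explicit expected_void_sketch(E e) : has_value_(false), err_(std::move(e)) {}

  bool has_value() const noexcept { return has_value_; }

  void value() && {
    // Checked whenever value()&& is instantiated, so enforcement does not
    // depend on whether the error path below can throw.
    static_assert(std::is_move_constructible_v<E>,
                  "value()&& requires a move constructible error type");
    if (!has_value_) {
      // The real member throws bad_expected_access<E> built from the moved
      // error value; this sketch just aborts, like a -fno-exceptions build.
      std::fputs("bad expected access\n", stderr);
      std::abort();
    }
  }

private:
  bool has_value_;
  E err_;
};
```

Calling `std::move(e).value()` on an `expected_void_sketch<int>` holding a value compiles and returns normally; instantiating it with a non-movable error type fails to compile.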
libstdc++-v3/ChangeLog:
* include/std/expected (expected<cv void,E>::value()&&):
Add missing static_assert for LWG 3940.
* testsuite/20_util/expected/lwg3843.cc: New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Richard Biener [Fri, 7 Mar 2025 09:15:20 +0000 (10:15 +0100)]
tree-optimization/119145 - avoid stray .MASK_CALL after vectorization
When we BB vectorize an if-converted loop body we make sure to not
leave around .MASK_LOAD or .MASK_STORE created by if-conversion but
we failed to check for .MASK_CALL.
PR tree-optimization/119145
* tree-vectorizer.cc (try_vectorize_loop_1): Avoid BB
vectorizing an if-converted loop body when there's a .MASK_CALL
in the loop body.
Christophe Lyon [Mon, 3 Mar 2025 11:12:18 +0000 (11:12 +0000)]
arm: Handle fixed PIC register in require_pic_register (PR target/115485)
Commit r9-4307-g89d7557202d25a forgot to accept a fixed PIC register
when extending the assert in require_pic_register.
arm_pic_register can be set explicitly by the user
(e.g. -mpic-register=r9) or implicitly as the default value with
-fpic/-fPIC/-fPIE and -mno-pic-data-is-text-relative -mlong-calls, and
we want to use/accept it when recording cfun->machine->pic_reg as used
to be the case.
tree-data-ref.cc uses alignment information to try to optimise
the code generated for alias checks. The assumption for "normal"
non-grouped, full-width scalar accesses was that the access size
would be a multiple of the alignment. As Richi notes in the PR,
this is a documented precondition of dr_with_seg_len:
/* The minimum common alignment of DR's start address, SEG_LEN and
ACCESS_SIZE. */
unsigned int align;
PR115192 was a case in which this assumption didn't hold. The access
was part of an aligned 4-element group, but only the first 2 elements
of the group were accessed. The alignment was therefore double the
access size.
In r15-820-ga0fe4fb1c8d78045 I'd "fixed" that by capping the
alignment in one of the output routines. But I think that was
misconceived. The precondition means that we should cap the
alignment at source instead.
Failure to do that caused a similar wrong code bug in this PR,
where the alignment comes from a short bitfield access rather
than from a group access.
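Capping at source means the recorded alignment can never exceed the largest power of two dividing the access size. A minimal sketch with hypothetical helper names (not the vect_prune_runtime_alias_test_list code), assuming 4-byte elements for the PR115192 shape: a 16-byte-aligned group of which only 2 elements (8 bytes) are accessed. SEG_LEN could be folded in the same way.

```cpp
#include <algorithm>
#include <cassert>

// Largest power of two that divides X (X must be nonzero).
static unsigned known_pow2_factor(unsigned x) {
  assert(x != 0);
  return x & (0u - x);  // isolate the lowest set bit
}

// Alignment that is valid for both the start address (PTR_ALIGN) and the
// access size, i.e. the "minimum common alignment" precondition.
static unsigned common_alignment(unsigned ptr_align, unsigned access_size) {
  return std::min(ptr_align, known_pow2_factor(access_size));
}
```

For the group case above, `common_alignment(16, 8)` yields 8 rather than 16.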
gcc/
PR tree-optimization/116125
* tree-vect-data-refs.cc (vect_prune_runtime_alias_test_list): Make
the dr_with_seg_len alignment fields describe the access sizes as
well as the pointer alignment.
* tree-data-ref.cc (create_intersect_range_checks): Don't compensate
for invalid alignment fields here.
gcc/testsuite/
PR tree-optimization/116125
* gcc.dg/vect/pr116125.c: New test.
aarch64: Use force_lowpart_subreg in a BFI splitter [PR119133]
lowpart_subreg ICEs are the gift that keeps giving. This is another
case where we need to use force_lowpart_subreg instead, to handle
cases where the input is already a subreg and where the combined
subreg is not allowed as a single operation.
We don't need to check can_create_pseudo_p since the input should
be a hard register rather than a subreg if !can_create_pseudo_p.
gcc/
PR target/119133
* config/aarch64/aarch64.md
(*aarch64_bfi<GPI:mode><ALLX:mode>_<SUBDI_BITS>): Use
force_lowpart_subreg.
gcc/testsuite/
PR target/119133
* gcc.dg/torture/pr119133.c: New test.
The following addresses the fact that we keep an excessive amount of
redundant DEBUG BEGIN_STMTs - in the testcase it sums up to 99.999%
of all stmts, sucking up compile-time in IL walks. The patch amends
the GIMPLE DCE code that elides redundant DEBUG BIND stmts, also
pruning uninterrupted sequences of DEBUG BEGIN_STMTs, keeping only
the last of each set of DEBUG BEGIN_STMT with unique location.
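One reading of this pruning rule, sketched on a hypothetical flattened statement list rather than the GIMPLE IL: within each uninterrupted run of debug-begin markers, only the last marker for each location survives, and every other statement is kept.

```cpp
#include <cassert>      // assert, used by the usage checks
#include <cstddef>
#include <vector>

// Hypothetical model of a statement; not a GIMPLE statement.
struct stmt {
  bool is_debug_begin;  // models a DEBUG BEGIN_STMT
  int loc;              // its source location
};

static std::vector<stmt> prune_debug_begin_runs(const std::vector<stmt>& in) {
  std::vector<stmt> out;
  std::size_t i = 0;
  while (i < in.size()) {
    if (!in[i].is_debug_begin) {
      out.push_back(in[i++]);  // ordinary statements are always kept
      continue;
    }
    // Find the end of this uninterrupted run of debug-begin markers.
    std::size_t end = i;
    while (end < in.size() && in[end].is_debug_begin)
      ++end;
    // Keep a marker only if no later marker in the run has its location.
    for (std::size_t k = i; k < end; ++k) {
      bool later_dup = false;
      for (std::size_t m = k + 1; m < end && !later_dup; ++m)
        later_dup = in[m].loc == in[k].loc;
      if (!later_dup)
        out.push_back(in[k]);
    }
    i = end;
  }
  return out;
}
```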
PR middle-end/118801
* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Prune
sequences of uninterrupted DEBUG BEGIN_STMTs, keeping only
the last of a set with unique location.
This option can warn about things other than string and memory functions.
Say so explicitly, and give an example. I also did some copy-editing
of the text and added some paragraph breaks.
Haochen Jiang [Wed, 5 Mar 2025 02:35:11 +0000 (10:35 +0800)]
i386: Correct mask width for bf8->fp16 intrin on 256/512 bit
For bf8 -> fp16 convert, when dst is 256 bit, the mask should be
16 bit since 16*16=256, not the 8 bit in the current intrin. In
512 bit intrin, the mask size is also halved. This patch will fix
both of them.
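The corrected widths follow from needing one mask bit per 16-bit destination element, which corresponds to the AVX-512 `__mmask16` and `__mmask32` types. A trivial sketch of the arithmetic (the helper name is hypothetical):

```cpp
#include <cassert>

// One mask bit per destination element: lanes = vector bits / element bits.
constexpr unsigned mask_bits_for(unsigned vector_bits, unsigned elem_bits) {
  return vector_bits / elem_bits;
}

static_assert(mask_bits_for(256, 16) == 16, "256-bit fp16 result: 16-bit mask");
static_assert(mask_bits_for(512, 16) == 32, "512-bit fp16 result: 32-bit mask");
```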
[PR rtl-optimization/119099] Avoid infinite loop in ext-dce.
This fixes the ping-ponging of live sets in ext-dce which, if left
unresolved, can lead to infinite loops in the ext-dce pass, as seen in the
P1 regression 119099.
At its core instead of replacing the livein set with the just recomputed
data, we IOR in the just recomputed data to the existing livein set.
That ensures the existing livein set never shrinks.
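Why IOR guarantees termination can be shown with a small model, assuming (hypothetically) that a live-in set is a 64-bit mask; this is not the ext_dce_rd_transfer_n code. A transfer function that alternates between two sets would loop forever if its result replaced the old set, but with IOR every change adds at least one bit, so the iteration must stop.

```cpp
#include <cassert>
#include <cstdint>

// Merge freshly recomputed data into LIVEIN with IOR so the set can only
// grow.  Returns true if the set changed.
static bool merge_livein(std::uint64_t& livein, std::uint64_t recomputed) {
  const std::uint64_t merged = livein | recomputed;
  const bool changed = merged != livein;
  livein = merged;
  return changed;
}

// A transfer function whose result alternates between {r0} (mask 1) and
// {r1} (mask 2).  Replacing the old set with its result never settles.
static std::uint64_t flip_transfer(std::uint64_t in) {
  return (in & 1u) ? 2u : 1u;
}

// Iterate to a fixed point and return the number of changing steps.
// Each change adds at least one bit, so there can be at most 64.
static int solve(std::uint64_t& livein,
                 std::uint64_t (*transfer)(std::uint64_t)) {
  int steps = 0;
  while (merge_livein(livein, transfer(livein))) {
    ++steps;
    assert(steps <= 64);
  }
  return steps;
}
```

Starting from an empty set, the model settles after two changes at {r0, r1}.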
Bootstrapped and regression tested on x86. I've also thrown this into
my tester to verify it across multiple targets and that we aren't
regressing the (limited) tests we have in place for ext-dce's
optimization behavior.
While it's a generic patch, I'll wait for the RISC-V tester to run its
course before committing.
PR rtl-optimization/119099
gcc/
* ext-dce.cc (ext_dce_rd_transfer_n): Do not allow the livein
set to shrink.
gcc/testsuite/
* gcc.dg/torture/pr119099.c: New test.
Harald Anlauf [Thu, 6 Mar 2025 20:45:42 +0000 (21:45 +0100)]
Fortran: improve checking of substring bounds [PR119118]
After the fix for pr98490 no substring bounds check was generated if the
substring start was not a variable. While the purpose of that fix was to
suppress a premature check before implied-do indices were substituted, this
prevented a check if the substring start was an expression or a constant.
A better solution is to defer the check until implied-do indices have been
substituted in the start and end expressions.
PR fortran/119118
gcc/fortran/ChangeLog:
* dependency.cc (gfc_contains_implied_index_p): Helper function to
determine if an expression has a dependence on an implied-do index.
* dependency.h (gfc_contains_implied_index_p): Add prototype.
* trans-expr.cc (gfc_conv_substring): Adjust logic to not generate
substring bounds checks before implied-do indices have been
substituted.
gcc/testsuite/ChangeLog:
* gfortran.dg/bounds_check_23.f90: Generalize test.
* gfortran.dg/bounds_check_26.f90: New test.
Simon Martin [Thu, 6 Mar 2025 19:36:26 +0000 (20:36 +0100)]
Fix comment typos
While investigating PR c++/99538 I noticed two comment typos: "delared"
and "paramter". The first has a single occurrence, but the second a few
more. This patch fixes all of them.
Wilco Dijkstra [Mon, 3 Mar 2025 16:47:32 +0000 (16:47 +0000)]
AArch64: Enable early scheduling for -O3 and higher (PR118351)
Enable the early scheduler on AArch64 for O3/Ofast. This means GCC15 benefits
from much faster build times with -O2, but avoids the regressions in lbm which
is very sensitive to minor scheduling changes due to long FMA chains.
gcc:
PR target/118351
PR other/38768
* common/config/aarch64/aarch64-common.cc: Enable early scheduling with
-O3 and higher.
* doc/invoke.texi (-fschedule-insns): Update comment.
Jakub Jelinek [Thu, 6 Mar 2025 17:26:37 +0000 (18:26 +0100)]
c++: Update TYPE_FIELDS of variant types if cp_parser_late_parsing_default_args etc. modify it [PR98533]
The following testcases ICE during type verification, because TYPE_FIELDS
of e.g. S RECORD_TYPE in pr119123.C is different from TYPE_FIELDS of const S.
Various decls are added to S's TYPE_FIELDS first, then finish_struct
indirectly calls fixup_type_variants to sync the variant copies.
But later on cp_parser_class_specifier calls
cp_parser_late_parsing_default_args and that apparently adds a lambda
type (from default argument) to TYPE_FIELDS of S.
Dunno if that is right or not, assuming it is right, the following
patch fixes it by updating TYPE_FIELDS of variant types if there were
any changes in the various functions cp_parser_class_specifier defers and
calls on the outermost enclosing class.
There was quite a lot of code repetition already before, so the patch
uses a lambda to avoid the repetitions.
To my surprise, in some of the contract testcases (
g++.dg/contracts/contracts-friend1.C
g++.dg/contracts/contracts-nested-class1.C
g++.dg/contracts/contracts-nested-class2.C
g++.dg/contracts/contracts-redecl7.C
g++.dg/contracts/contracts-redecl8.C
) it is actually setting class_type and pushing TRANSLATION_UNIT_DECL
rather than some class types in some cases.
Or should the lambda pushing into the containing class be somehow avoided?
2025-03-06 Jakub Jelinek <jakub@redhat.com>
PR c++/98533
PR c++/119123
* parser.cc (cp_parser_class_specifier): Update TYPE_FIELDS of
variant types in case cp_parser_late_parsing_default_args etc. change
TYPE_FIELDS on the main variant. Add switch_to_class lambda and
use it to simplify repeated class switching code.
* g++.dg/cpp0x/pr98533.C: New test.
* g++.dg/cpp0x/pr119123.C: New test.
Jakub Jelinek [Thu, 6 Mar 2025 16:58:14 +0000 (17:58 +0100)]
c++: Fix up instantiation of pointer/reference/array types with attributes [PR119138]
My r15-7822 PR118787 change unfortunately broke build on x86_64-w64-mingw32.
The reduced testcase below shows what is going on.
va_list on this target is char * with extra (non-dependent) attributes
on it.
Before my r15-7822 change, instantiation of such type used the fast path and
just returned t, but as it has non-NULL TYPE_ATTRIBUTES, it now falls
through, builds a pointer type and then calls
apply_late_template_attributes. That triggers a bug in that function: it
was written for types with RECORD_TYPE/UNION_TYPE (or ENUMERAL_TYPE?)
in mind, where we call apply_late_template_attributes with
ATTR_FLAG_TYPE_IN_PLACE and can just apply the non-dependent attributes
directly to TYPE_ATTRIBUTES. That is wrong for shared types like
{POINTER,REFERENCE,ARRAY}_TYPE etc.; for those we should just force
cp_build_type_attribute_variant to build a variant type for the
non-dependent attributes and then process dependent attributes (which
given attr_flag will DTRT already).
The second change in the patch is an optimization, we can actually return
back to returning t even when TYPE_ATTRIBUTES is non-NULL, as long as it
is non-dependent (dependent attributes are stored first, so it is enough
to check the first attribute).
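The ordering invariant is what makes the single check sufficient. A sketch on a hypothetical singly linked attribute list (not GCC's TREE_LIST), with a debug-only check that dependent entries really come first:

```cpp
#include <cassert>

// Hypothetical attribute list node.
struct attr {
  bool dependent;
  const attr* next;
};

// Invariant: all dependent attributes precede all non-dependent ones.
static bool dependent_first_p(const attr* a) {
  bool seen_nondependent = false;
  for (; a != nullptr; a = a->next) {
    if (!a->dependent)
      seen_nondependent = true;
    else if (seen_nondependent)
      return false;
  }
  return true;
}

// Under that invariant, looking at the head is enough.
static bool any_dependent_attr_p(const attr* a) {
  assert(dependent_first_p(a));  // debug-only check of the invariant
  return a != nullptr && a->dependent;
}
```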
2025-03-06 Jakub Jelinek <jakub@redhat.com>
PR c++/119138
* pt.cc (apply_late_template_attributes): Set p to NULL if
ATTR_FLAG_TYPE_IN_PLACE is not set in attr_flags.
(tsubst) <case POINTER_TYPE, case REFERENCE_TYPE, case ARRAY_TYPE>:
Reuse original type even if TYPE_ATTRIBUTES is non-NULL, but all
the attributes are non-dependent.
Jonathan Wakely [Thu, 6 Mar 2025 13:29:41 +0000 (13:29 +0000)]
libstdc++: Make std::unique_lock self-move-assignable
LWG 4172 was approved in Hagenberg, February 2025, fixing
std::unique_lock and std::shared_lock to work correctly for
self-move-assignment.
Our std::shared_lock was already doing the right thing (contradicting
the standard) so just add a comment there. Our std::unique_lock needs to
be fixed to do the right thing.
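A common way to make move assignment safe for self-assignment is to move into a temporary and swap. A minimal sketch on a hypothetical `lock_sketch` wrapper (not `std::unique_lock`): on self-move, the temporary takes the state and immediately swaps it back, so the lock is neither released nor lost.

```cpp
#include <cassert>   // assert, used by the usage checks
#include <mutex>
#include <utility>

template <typename Mutex>
class lock_sketch {
public:
  explicit lock_sketch(Mutex& m) : m_(&m), owns_(true) { m_->lock(); }

  lock_sketch(lock_sketch&& u) noexcept : m_(u.m_), owns_(u.owns_) {
    u.m_ = nullptr;
    u.owns_ = false;
  }

  lock_sketch& operator=(lock_sketch&& u) noexcept {
    // The temporary releases whatever *this held before (if anything)
    // when it is destroyed; for self-move it ends up empty.
    lock_sketch(std::move(u)).swap(*this);
    return *this;
  }

  ~lock_sketch() {
    if (owns_)
      m_->unlock();
  }

  void swap(lock_sketch& other) noexcept {
    std::swap(m_, other.m_);
    std::swap(owns_, other.owns_);
  }

  bool owns_lock() const noexcept { return owns_; }

private:
  Mutex* m_;
  bool owns_;
};
```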
libstdc++-v3/ChangeLog:
* include/bits/unique_lock.h (unique_lock::operator=): Fix for
self-move-assignment.
* include/std/shared_mutex (shared_lock::operator=): Add
comment.
* testsuite/30_threads/shared_lock/cons/lwg4172.cc: New test.
* testsuite/30_threads/unique_lock/cons/lwg4172.cc: New test.
Jonathan Wakely [Thu, 27 Feb 2025 21:59:41 +0000 (21:59 +0000)]
libstdc++: Add assertions to std::list::pop_{front,back}
The recently-approved Standard Library Hardening proposal (P3471R4)
gives pop_front and pop_back member functions hardened preconditions,
but std::list was missing assertions on them. Our other sequence
containers do have assertions on those members.
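A sketch of the precondition being enforced, written as free functions around `std::list` with a hypothetical check macro. libstdc++ uses its own internal assertion macros, which are not reproduced here; this macro is deliberately not tied to NDEBUG.

```cpp
#include <cassert>   // assert, used by the usage checks
#include <cstdio>
#include <cstdlib>
#include <list>

// Hypothetical hardened-precondition check; not a libstdc++ macro.
#define HARDENED_PRECONDITION_SKETCH(cond)                          \
  do {                                                              \
    if (!(cond)) {                                                  \
      std::fprintf(stderr, "hardened precondition failed: %s\n",   \
                   #cond);                                          \
      std::abort();                                                 \
    }                                                               \
  } while (false)

template <typename T>
void checked_pop_front(std::list<T>& l) {
  HARDENED_PRECONDITION_SKETCH(!l.empty());  // pop_front on empty is UB
  l.pop_front();
}

template <typename T>
void checked_pop_back(std::list<T>& l) {
  HARDENED_PRECONDITION_SKETCH(!l.empty());  // pop_back on empty is UB
  l.pop_back();
}
```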
We need to include <bits/stl_pair.h> in C++23 and later, so that
__pair_like_convertible_from can use __pair_like, and so that
__is_tuple_like_v is declared before we define a partial specialization.
libstdc++-v3/ChangeLog:
* include/bits/ranges_util.h: Include <bits/stl_pair.h>.
Jonathan Wakely [Thu, 6 Mar 2025 16:04:05 +0000 (16:04 +0000)]
libstdc++: Fix failures in new std::complex test [PR119144]
This test fails due to duplicate explicit instantiations on targets
where size_t and unsigned int are the same type. It also fails with
-D_GLIBCXX_USE_CXX11_ABI=0 due to using std::string in constexpr
functions, and with --disable-libstdcxx-pch due to not including
<algorithm> for ranges::fold_left.
libstdc++-v3/ChangeLog:
PR libstdc++/119144
* testsuite/26_numerics/complex/tuple_like.cc: Include
<algorithm>, replace std::string with std::string_view,
instantiate tests for long instead of size_t.
Richard Biener [Thu, 6 Mar 2025 12:48:16 +0000 (13:48 +0100)]
lto/114501 - missed free-lang-data for CONSTRUCTOR index
The following makes sure to also walk CONSTRUCTOR element indexes
which can be FIELD_DECLs, referencing otherwise unused types we
need to clean. walk_tree only walks CONSTRUCTOR element data.
PR lto/114501
* ipa-free-lang-data.cc (find_decls_types_r): Explicitly
handle CONSTRUCTORs as walk_tree handling of those is
incomplete.
Thomas Schwinge [Wed, 19 Feb 2025 19:39:25 +0000 (20:39 +0100)]
libstdc++: Avoid '-Wunused-parameter' for 'out' in member function 'std::codecvt_base::result std::__format::{anonymous}::__encoding::conv(std::string_view, std::string&) const'
In a newlib configuration:
../../../../../source-gcc/libstdc++-v3/src/c++20/format.cc: In member function ‘std::codecvt_base::result std::__format::{anonymous}::__encoding::conv(std::string_view, std::string&) const’:
../../../../../source-gcc/libstdc++-v3/src/c++20/format.cc:100:35: error: unused parameter ‘out’ [-Werror=unused-parameter]
100 | conv(string_view input, string& out) const
| ~~~~~~~~^~~
libstdc++-v3/
* src/c++20/format.cc (conv): Tag 'out' as '[[maybe_unused]]'.
Thomas Schwinge [Wed, 19 Feb 2025 19:18:52 +0000 (20:18 +0100)]
libstdc++: Avoid '-Wunused-parameter' for 'is_directory' in member function 'bool std::filesystem::__cxx11::_Dir::do_unlink(bool, std::error_code&) const'
In a newlib configuration:
../../../../../source-gcc/libstdc++-v3/src/c++17/fs_dir.cc: In member function ‘bool std::filesystem::__cxx11::_Dir::do_unlink(bool, std::error_code&) const’:
../../../../../source-gcc/libstdc++-v3/src/c++17/fs_dir.cc:147:18: error: unused parameter ‘is_directory’ [-Werror=unused-parameter]
147 | do_unlink(bool is_directory, error_code& ec) const noexcept
| ~~~~~^~~~~~~~~~~~
libstdc++-v3/
* src/c++17/fs_dir.cc (do_unlink): Tag 'is_directory' as
'[[maybe_unused]]'.
Thomas Schwinge [Wed, 19 Feb 2025 19:15:30 +0000 (20:15 +0100)]
libstdc++: Avoid '-Wunused-parameter' for 'nofollow' in static member function 'static std::filesystem::__gnu_posix::DIR* std::filesystem::_Dir_base::openat(const _At_path&, bool)'
In a newlib configuration:
In file included from ../../../../../source-gcc/libstdc++-v3/src/c++17/fs_dir.cc:37,
from ../../../../../source-gcc/libstdc++-v3/src/c++17/cow-fs_dir.cc:26:
../../../../../source-gcc/libstdc++-v3/src/c++17/../filesystem/dir-common.h: In static member function ‘static std::filesystem::__gnu_posix::DIR* std::filesystem::_Dir_base::openat(const _At_path&, bool)’:
../../../../../source-gcc/libstdc++-v3/src/c++17/../filesystem/dir-common.h:210:36: error: unused parameter ‘nofollow’ [-Werror=unused-parameter]
210 | openat(const _At_path& atp, bool nofollow)
| ~~~~~^~~~~~~~
libstdc++-v3/
* src/filesystem/dir-common.h (openat): Tag 'nofollow' as
'[[maybe_unused]]'.
Thomas Schwinge [Wed, 19 Feb 2025 19:34:25 +0000 (20:34 +0100)]
libstdc++: Avoid '-Wunused-parameter' for '__what' in function 'void std::__throw_format_error(const char*)'
In a '-fno-exceptions' configuration:
In file included from ../../../../../source-gcc/libstdc++-v3/src/c++20/format.cc:29:
[...]/build-gcc/[...]/libstdc++-v3/include/format: In function ‘void std::__throw_format_error(const char*)’:
[...]/build-gcc/[...]/libstdc++-v3/include/format:200:36: error: unused parameter ‘__what’ [-Werror=unused-parameter]
200 | __throw_format_error(const char* __what)
| ~~~~~~~~~~~~^~~~~~
The PR claims that pair-fusion has invalid uses of gcc_assert (such that
the pass will misbehave with --disable-checking). As noted in the
comments, in the case of the calls to restrict_movement, the only way we
can possibly depend on the side effects is if we call it with a
non-singleton move range. However, the intent is that we always have a
singleton move range here, and thus we do not rely on the side effects.
This patch therefore adds asserts to check for a singleton move range
before calling restrict_movement, thus clarifying the intent and
hopefully dispelling any concerns that having the calls wrapped in
asserts is problematic here.
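The argument can be modelled with a hypothetical range-restricting call (not the pair-fusion code). Restricting a non-singleton range has a real side effect, but restricting a singleton range either succeeds without changing it or fails, so it does not matter if an assert that wraps the call is compiled out.

```cpp
#include <cassert>

// Narrows [lo, hi] to its intersection with [lo2, hi2].  Returns false,
// and leaves the range untouched, if the intersection is empty.
static bool restrict_range(int& lo, int& hi, int lo2, int hi2) {
  const int nlo = lo > lo2 ? lo : lo2;
  const int nhi = hi < hi2 ? hi : hi2;
  if (nlo > nhi)
    return false;
  lo = nlo;
  hi = nhi;
  return true;
}

// Checking the singleton property first makes explicit that the wrapped
// call has no side effect anyone relies on.
static void restrict_singleton(int& lo, int& hi, int lo2, int hi2) {
  assert(lo == hi);
  assert(restrict_range(lo, hi, lo2, hi2));
  // Silence unused-parameter warnings when NDEBUG removes the asserts.
  (void)lo; (void)hi; (void)lo2; (void)hi2;
}
```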
gcc/ChangeLog:
PR rtl-optimization/114492
* pair-fusion.cc (pair_fusion_bb_info::fuse_pair): Check for singleton
move range before calling restrict_movement.
(pair_fusion::try_promote_writeback): Likewise.
libstdc++: implement tuple protocol for std::complex (P2819R2)
This commit implements P2819R2 for C++26, making std::complex
destructurable and tuple-like (see [complex.tuple]).
std::get needs to be forward-declared in stl_pair.h (following the
existing precedent for the implementation of P2165R4, cf. r14-8710-g65b4cba9d6a9ff), and implemented in <complex>.
Also, std::get(complex<T>) needs to return *references* to the real and
imaginary parts of a std::complex object, honoring the value category
and constness of the argument. In principle a straightforward task, it
gets a bit convoluted by the fact that:
1) std::complex does not have existing getters that one can use for this
(real() and imag() return values, not references);
2) there are specializations for language/extended floating-point types,
which requires some duplication -- need to amend the primary and all
the specializations;
3) these specializations use a `__complex__ T`, but the primary template
uses two non-static data members, making generic code harder to write.
The implementation choice used here is to add the overloads of std::get
for complex as declared in [complex.tuple]. In turn they dispatch to a
newly added getter that extracts references to the real/imaginary parts
of a complex<T>. This getter is private API, and the implementation
depends on whether it's the primary (bind the data member) or a
specialization (use the GCC language extensions for __complex__).
To avoid duplication and minimize template instantiations, the getter
uses C++23's deducing this (this avoids const overloads). The value
category is dealt with by the std::get overloads.
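The tuple protocol pieces can be sketched in C++17 on a hypothetical two-member `my_complex` type (not `std::complex`). Instead of a private getter using deducing this, this sketch spells out the four reference-returning `get` overloads, which is the duplication the approach above avoids.

```cpp
#include <cassert>      // assert, used by the usage checks
#include <cstddef>
#include <tuple>        // std::tuple_size, std::tuple_element
#include <type_traits>
#include <utility>

struct my_complex {
  double re = 0.0;
  double im = 0.0;
};

namespace std {
template <>
struct tuple_size<my_complex> : integral_constant<size_t, 2> {};

template <size_t I>
struct tuple_element<I, my_complex> {
  static_assert(I < 2, "my_complex has two elements");
  using type = double;
};
}  // namespace std

// get<I> returns references, honouring constness and value category.
template <std::size_t I>
constexpr double& get(my_complex& z) noexcept {
  static_assert(I < 2, "index out of range");
  if constexpr (I == 0) return z.re; else return z.im;
}

template <std::size_t I>
constexpr const double& get(const my_complex& z) noexcept {
  static_assert(I < 2, "index out of range");
  if constexpr (I == 0) return z.re; else return z.im;
}

template <std::size_t I>
constexpr double&& get(my_complex&& z) noexcept {
  return std::move(get<I>(z));
}

template <std::size_t I>
constexpr const double&& get(const my_complex&& z) noexcept {
  return std::move(get<I>(z));
}
```

With this in place, `auto& [r, i] = z;` binds `r` and `i` directly to `z.re` and `z.im`.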
Add a test that covers the aspects of the tuple protocol, as well as the
tuple-like interface. While at it, add a test for the existing
tuple-like feature-testing macro.
PR libstdc++/113310
libstdc++-v3/ChangeLog:
* include/bits/stl_pair.h (get): Forward-declare std::get for
std::complex.
* include/bits/version.def (tuple_like): Bump the value of
the feature-testing macro in C++26.
* include/bits/version.h: Regenerate.
* include/std/complex: Implement the tuple protocol for
std::complex.
(tuple_size): Specialize for std::complex.
(tuple_element): Ditto.
(__is_tuple_like_v): Ditto.
(complex): Add a private getter to obtain references to the real
and the imaginary part, on the primary class template and on its
specializations.
(get): Add overloads of std::get for std::complex.
* testsuite/20_util/tuple/tuple_like_ftm.cc: New test.
* testsuite/26_numerics/complex/tuple_like.cc: New test.
this patch removes TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE and
replaces it with two hooks: one that controls the cost of using an
extra callee-saved register and one that controls the cost of allocating
a frame for the first spill.
(The patch does not attempt to address the shrink-wrapping part of
the thread above.)
On AArch64, this is enough to fix PR117477, as verified by the new tests.
The patch does not change the SPEC2017 scores significantly. (I saw a
slight improvement in fotonik3d and roms, but I'm not convinced that
the improvements are real.)
The patch makes IRA use caller saves for gcc.target/aarch64/pr103350-1.c,
which is a scan-dump correctness test that relies on not using
caller saves. The decision to use caller saves looks appropriate,
and saves an instruction, so I've just added -fno-caller-saves
to the test options.
The x86 parts were written by Honza.
gcc/
PR rtl-optimization/117477
* config/aarch64/aarch64.cc (aarch64_count_saves): New function.
(aarch64_count_above_hard_fp_saves, aarch64_callee_save_cost)
(aarch64_frame_allocation_cost): Likewise.
(TARGET_CALLEE_SAVE_COST): Define.
(TARGET_FRAME_ALLOCATION_COST): Likewise.
* config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale):
Replace with...
(ix86_callee_save_cost): ...this new hook.
(TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete.
(TARGET_CALLEE_SAVE_COST): Define.
* target.h (spill_cost_type, frame_cost_type): New enums.
* target.def (callee_save_cost, frame_allocation_cost): New hooks.
(ira_callee_saved_register_cost_scale): Delete.
* doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete.
(TARGET_CALLEE_SAVE_COST, TARGET_FRAME_ALLOCATION_COST): New hooks.
* doc/tm.texi: Regenerate.
* hard-reg-set.h (hard_reg_set_popcount): New function.
* ira-color.cc (allocated_memory_p): New variable.
(allocated_callee_save_regs): Likewise.
(record_allocation): New function.
(assign_hard_reg): Use targetm.frame_allocation_cost to model
the cost of the first spill or first caller save. Use
targetm.callee_save_cost to model the cost of using new callee-saved
registers. Apply the exit rather than entry frequency to the cost
of restoring a register or deallocating the frame. Update the
new variables above.
(improve_allocation): Use record_allocation.
(color): Initialize allocated_callee_save_regs.
(ira_color): Initialize allocated_memory_p.
* targhooks.h (default_callee_save_cost): Declare.
(default_frame_allocation_cost): Likewise.
* targhooks.cc (default_callee_save_cost): New function.
(default_frame_allocation_cost): Likewise.
Michal Jires [Thu, 6 Mar 2025 05:49:20 +0000 (06:49 +0100)]
lto: Fix missing cleanup with incremental LTO.
Incremental LTO disabled cleanup of output_files since they have to
persist in ltrans cache.
This unintentionally also kept temporary early debug "*.debug.temp.o"
files.
Bootstrapped/regtested on x86_64-linux.
lto-plugin/ChangeLog:
* lto-plugin.c (cleanup_handler): Keep only files in ltrans
cache.
Richard Biener [Thu, 6 Mar 2025 08:08:07 +0000 (09:08 +0100)]
middle-end/119119 - re-gimplification of empty CTOR assignments
The following testcase runs into a re-gimplification issue during
inlining when processing
MEM[(struct e *)this_2(D)].a = {};
where re-gimplification does not handle assignments in the same
way as the gimplifier but instead relies on rhs_predicate_for
and gimplifying the RHS standalone. This fails to handle
special-casing of CTORs. The is_gimple_mem_rhs_or_call predicate
already handles clobbers but not empty CTORs so we end up in
the fallback code trying to force the CTOR into a separate stmt
using a temporary - but as we have a non-copyable type here that ICEs.
The following generalizes empty CTORs in is_gimple_mem_rhs_or_call
since those need no additional re-gimplification.
PR middle-end/119119
* gimplify.cc (is_gimple_mem_rhs_or_call): All empty CTORs
are OK when not a register type.
=== cut here ===
struct span {
span (const int (&__first)[1]) : _M_ptr (__first) {}
int operator[] (long __i) { return _M_ptr[__i]; }
const int *_M_ptr;
};
void foo () {
constexpr int a_vec[]{1};
auto vec{[&a_vec]() -> span { return a_vec; }()};
}
=== cut here ===
The problem is that perform_implicit_conversion_flags (via
mark_rvalue_use) replaces "a_vec" in the return statement by a
CONSTRUCTOR representing a_vec's constant value, and then takes its
address when invoking span's constructor. So we end up with an instance
that points to garbage instead of a_vec's storage.
As per Jason's suggestion, this patch simply removes the calls to
mark_*_use from perform_implicit_conversion_flags, which fixes the PR.
Jeff Law [Thu, 6 Mar 2025 05:24:05 +0000 (22:24 -0700)]
Improve coverage of ext-dce tests in risc-v testsuite
Inspired by Liao Shihua, this adjusts two tests in the RISC-V testsuite
to get more coverage. Drop the -O1 argument and replace it with -fext-dce.
That way the test gets run across the full set of flags. We just need to
make sure to skip -O0.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/core_list_init.c: Use -fext-dce rather than
-O1. Skip for -O0.
* gcc.target/riscv/pr111384.c: Ditto.
Gaius Mulley [Wed, 5 Mar 2025 23:01:45 +0000 (23:01 +0000)]
PR modula2/118998 Rotate of a packedset causes different types to binary operator error
This patch allows a packedset to be rotated by the system module intrinsic
procedure function. It ensures that both operands to the tree rotate are
of the same type. In turn the result will be the same type and the
assignment into the designator (of the same set type) will succeed.
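A C++ model of the rotate semantics described in the ChangeLog entries (hypothetical names, not the m2expr code): the count is converted to the set's own unsigned type, and a negative count is negated and turned into a rotate right.

```cpp
#include <cassert>   // assert, used by the usage checks
#include <cstdint>

// Rotates SET left by N positions; a negative N is negated and the set is
// rotated right instead.  The count is reduced modulo the width so that
// every shift amount stays in range.
static std::uint32_t rotate_set(std::uint32_t set, int n) {
  if (n < 0) {
    const std::uint32_t count =
        static_cast<std::uint32_t>(-static_cast<std::int64_t>(n) % 32);
    return count == 0 ? set : (set >> count) | (set << (32 - count));
  }
  const std::uint32_t count = static_cast<std::uint32_t>(n % 32);
  return count == 0 ? set : (set << count) | (set >> (32 - count));
}
```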
gcc/m2/ChangeLog:
PR modula2/118998
* gm2-gcc/m2expr.cc (m2expr_BuildLRotate): Convert nBits
to the return type.
(m2expr_BuildRRotate): Ditto.
(m2expr_BuildLogicalRotate): Convert op3 to an integer type.
Replace op3 with rotateCount.
Negate rotateCount if it is negative and call rotate right.
* gm2-gcc/m2pp.cc (m2pp_bit_and_expr): New function.
(m2pp_binary_function): Ditto.
(m2pp_simple_expression): BIT_AND_EXPR new case clause.
LROTATE_EXPR ditto.
RROTATE_EXPR ditto.
gcc/testsuite/ChangeLog:
PR modula2/118998
* gm2/iso/pass/testrotate.mod: New test.
* gm2/pim/fail/tinyconst.mod: New test.
* gm2/sets/run/pass/simplepacked.mod: New test.
Jonathan Wakely [Mon, 3 Mar 2025 13:36:54 +0000 (13:36 +0000)]
libstdc++: Move new functions to separate files [PR119110]
The new test functions I added in r15-7765-g3866ca796d5281 are causing
those tests to FAIL on Solaris and arm-thumb due to the linker
complaining about undefined functions. The new test functions are not
called, so it shouldn't matter that they call undefined member
functions, but it does.
Move those functions to separate { dg-do compile } files so the linker
isn't used and won't complain.
libstdc++-v3/ChangeLog:
PR libstdc++/119110
* testsuite/25_algorithms/move/constrained.cc: Move test06
function to ...
* testsuite/25_algorithms/move/105609.cc: New test.
* testsuite/25_algorithms/move_backward/constrained.cc: Move
test04 function to ...
* testsuite/25_algorithms/move_backward/105609.cc: New test.
This is a C++ >= 26 codepath for supporting constexpr stable_sort, so we
know that we have if consteval available; it just needs protection with
the feature-testing macro. Also merge the return into the same statement.
Amends r15-7708-gff43f9853d3b10.
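A sketch of guarding `if consteval` with its feature-testing macro, falling back to `std::is_constant_evaluated` where only that is available. The function is a hypothetical illustration, not the `__stable_sort` code.

```cpp
#include <cassert>      // assert, used by the usage checks
#include <type_traits>  // std::is_constant_evaluated (C++20)

// Returns 1 during constant evaluation and 2 at run time, when the
// compiler lets us tell the difference.
constexpr int evaluation_context() {
#if defined(__cpp_if_consteval) && __cpp_if_consteval >= 202106L
  if consteval {
    return 1;
  } else {
    return 2;
  }
#elif defined(__cpp_lib_is_constant_evaluated)
  return std::is_constant_evaluated() ? 1 : 2;
#else
  return 2;  // pre-C++20: no portable way to detect constant evaluation
#endif
}

#if defined(__cpp_if_consteval) || defined(__cpp_lib_is_constant_evaluated)
static_assert(evaluation_context() == 1, "constant evaluation is detected");
#endif
```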
libstdc++-v3/ChangeLog:
* include/bits/stl_algo.h (__stable_sort): Use if consteval
instead of is_constant_evaluated.
Jason Merrill [Wed, 5 Mar 2025 13:45:34 +0000 (08:45 -0500)]
c++: coroutines and return in registers [PR118874]
Because coroutines insert a call to the resumer between the initialization
of the return value and the actual return to the caller, we need to
duplicate the work of gimplify_return_expr for the !aggregate_value_p case.
PR c++/117364
PR c++/118874
gcc/cp/ChangeLog:
* coroutines.cc (cp_coroutine_transform::build_ramp_function): For
!aggregate_value_p return force return value into a local temp.
Hannes Braun [Thu, 20 Feb 2025 14:09:41 +0000 (15:09 +0100)]
arm: Fix signedness of vld1q intrinsic parms [PR118942]
vld1q_s8_x3, vld1q_s16_x3, vld1q_s8_x4 and vld1q_s16_x4 were expecting
pointers to unsigned integers. These parameters should be pointers to
signed integers.
gcc/ChangeLog:
PR target/118942
* config/arm/arm_neon.h (vld1q_s8_x3): Use int8_t instead of
uint8_t.
(vld1q_s16_x3): Use int16_t instead of uint16_t.
(vld1q_s8_x4): Use int8_t instead of uint8_t.
(vld1q_s16_x4): Use int16_t instead of uint16_t.
Patrick Palka [Wed, 5 Mar 2025 16:11:35 +0000 (11:11 -0500)]
libstdc++: Some concat_view bugfixes [PR115215, PR115218, LWG 4082]
- Use __builtin_unreachable to suppress a false-positive "control
reaches end of non-void function" warning in the recursive lambda
(which the existing tests failed to notice since test01 wasn't
being called at runtime)
- Relax the constraints on views::concat in the single-argument case
as per PR115215
- Add an input_range requirement to that same case as per LWG 4082
- In the const-converting constructor of concat_view's iterator,
don't require the first iterator to be default constructible
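The first item can be illustrated with a hypothetical helper (not the concat_view code) that maps a runtime index to a compile-time one through a recursive generic lambda. Every valid index returns inside the lambda, but the last instantiation has no trailing `return`, so `__builtin_unreachable` (a GCC/Clang builtin) tells the compiler that point cannot be reached.

```cpp
#include <cassert>      // assert, used by the usage checks
#include <cstddef>
#include <type_traits>

// Calls F with std::integral_constant<size_t, I> where I == index.
// The caller must guarantee index < N.
template <std::size_t N, typename F>
decltype(auto) invoke_with_runtime_index(std::size_t index, F&& f) {
  static_assert(N > 0, "need at least one alternative");
  auto recurse = [&](auto self, auto ic) -> decltype(auto) {
    constexpr std::size_t I = decltype(ic)::value;
    if (index == I)
      return f(ic);
    if constexpr (I + 1 < N)
      return self(self, std::integral_constant<std::size_t, I + 1>{});
    // Only reachable if index >= N, which the caller rules out; this
    // avoids a "control reaches end of non-void function" warning.
    __builtin_unreachable();
  };
  return recurse(recurse, std::integral_constant<std::size_t, 0>{});
}
```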
PR libstdc++/115215
PR libstdc++/115218
libstdc++-v3/ChangeLog:
* include/std/ranges
(concat_view::iterator::_S_invoke_with_runtime_index): Use
__builtin_unreachable in recursive lambda to certify it always
exits via 'return'.
(concat_view::iterator::iterator): In the const-converting
constructor, direct initialize _M_it.
(views::_Concat::operator()): Adjust constraints in the
single-argument case as per LWG 4082.
* testsuite/std/ranges/concat/1.cc (test01): Call it at runtime
too.
(test04): New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Da Xie [Sun, 2 Mar 2025 06:45:11 +0000 (14:45 +0800)]
c++: Check invalid use of constrained auto with trailing return type [PR100589]
Add check for constrained auto type specifier in function declaration or
function type declaration with trailing return type. Issue error if such
usage is detected.
Test file renamed, and added a new test for type declaration.
Successfully bootstrapped and regtested on x86_64-pc-linux-gnu:
Added 6 passed and 4 unsupported tests.
PR c++/100589
gcc/cp/ChangeLog:
* decl.cc (grokdeclarator): Issue an error for a declarator with
constrained auto type specifier and trailing return types. Include
function names if available.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-pr100589.C: New test.
Signed-off-by: Da Xie <xxie_xd@163.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Kyrylo Tkachov [Wed, 5 Mar 2025 11:03:52 +0000 (03:03 -0800)]
PR rtl-optimization/119046: aarch64: Fix PARALLEL mode for vec_perm DUP expansion
The PARALLEL created in aarch64_evpc_dup is used to hold the lane number.
It is not appropriate for it to have a vector mode.
Other such uses use VOIDmode.
Do this here as well.
This avoids the risk of generic code treating the PARALLEL as trapping when it
has floating-point mode.
Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
PR rtl-optimization/119046
* config/aarch64/aarch64.cc (aarch64_evpc_dup): Use VOIDmode for
PARALLEL.
Kyrylo Tkachov [Thu, 27 Feb 2025 17:00:25 +0000 (09:00 -0800)]
PR rtl-optimization/119046: Don't mark PARALLEL RTXes with floating-point mode as trapping
In this testcase late-combine was failing to merge:
dup v31.4s, v31.s[3]
fmla v30.4s, v31.4s, v29.4s
into the lane-wise fmla form.
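At the source level, this dup-plus-fmla shape can be written with GNU vector extensions, as in the sketch below (the function name `fma_by_lane3` is hypothetical). Whether a given GCC version and option set emits the separate DUP and FMLA or the combined by-element FMLA is not verified here; the block only shows the computation being discussed.

```cpp
#include <cassert>

// GCC/Clang vector extension: four floats in a 128-bit vector.
typedef float v4sf __attribute__((vector_size(16)));

// Broadcast lane 3 of B (the DUP), then multiply-add into ACC (the FMLA).
// On AArch64 the two steps are candidates for a single by-element FMLA,
// subject to FMA contraction being enabled.
v4sf fma_by_lane3(v4sf acc, v4sf a, v4sf b)
{
  v4sf dup = {b[3], b[3], b[3], b[3]};
  return acc + a * dup;
}
```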
This is because late-combine checks may_trap_p under the hood on the dup insn.
This ended up returning true for the insn:
(set (reg:V4SF 152 [ _32 ])
    (vec_duplicate:V4SF (vec_select:SF (reg:V4SF 111 [ rhs_panel.8_31 ])
            (parallel:V4SF [
                    (const_int 3 [0x3])
                ]))))
Although may_trap_p_1 correctly reasoned that vec_duplicate and vec_select of
floating-point modes can't trap, it assumed that the V4SF PARALLEL can trap.
The correct behaviour is to recurse into the vector of sub-expressions inside
the PARALLEL and check each of them. This patch adjusts may_trap_p_1 to do
just that.
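The change can be modelled with a toy expression tree. The sketch below is hypothetical and only loosely follows the description: it assumes trapping floating-point math, treats FP-mode expressions in the default case as possibly trapping, and otherwise recurses into sub-expressions. The `fixed` flag switches PARALLEL between the old behaviour (falling into the FP-mode test) and the new one (checking only its elements). None of these names are GCC's rtx API.

```cpp
#include <cassert>
#include <vector>

// Toy expression tree loosely modelled on RTL (hypothetical).
enum class Code { ConstInt, Reg, VecSelect, VecDuplicate, Parallel, Plus };
enum class Mode { Void, SI, SF, V4SF };

struct Expr {
  Code code;
  Mode mode;
  std::vector<Expr> ops;  // operands, or the elements of a PARALLEL
};

bool float_mode_p(Mode m) { return m == Mode::SF || m == Mode::V4SF; }

bool may_trap(const Expr& x, bool fixed)
{
  switch (x.code) {
    case Code::ConstInt:
    case Code::Reg:
      return false;
    case Code::VecSelect:
    case Code::VecDuplicate:
      break;  // cannot trap themselves; check the operands
    case Code::Parallel:
      if (fixed)
        break;  // only a container: check each element
      [[fallthrough]];
    default:
      if (float_mode_p(x.mode))
        return true;  // treated like floating-point arithmetic
      break;
  }
  for (const Expr& op : x.ops)
    if (may_trap(op, fixed))
      return true;
  return false;
}

// Builds (vec_duplicate:V4SF (vec_select:SF (reg:V4SF)
//                                           (parallel:PMODE [(const_int 3)])))
Expr dup_lane3(Mode parallel_mode)
{
  Expr lane{Code::ConstInt, Mode::Void, {}};
  Expr par{Code::Parallel, parallel_mode, {lane}};
  Expr reg{Code::Reg, Mode::V4SF, {}};
  Expr sel{Code::VecSelect, Mode::SF, {reg, par}};
  return Expr{Code::VecDuplicate, Mode::V4SF, {sel}};
}
```

With the V4SF PARALLEL, the old behaviour reports the insn above as possibly trapping and the new one does not; a VOIDmode PARALLEL, as produced by the aarch64_evpc_dup change, is safe either way.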
With this check the above insn is not deemed to be trapping and is propagated
into the FMLA giving:
fmla vD.4s, vA.4s, vB.s[3]
Bootstrapped and tested on aarch64-none-linux-gnu.
Apparently this also fixes a regression in
gcc.target/aarch64/vmul_element_cost.c that I observed.
Jakub Jelinek [Wed, 5 Mar 2025 13:30:35 +0000 (14:30 +0100)]
value-range: Fix up irange::union_bitmask [PR118953]
The following testcase is miscompiled during evrp.
Before vrp, we have (from ccp):
# RANGE [irange] long long unsigned int [0, +INF] MASK 0xffffffffffffc000 VALUE 0x2d
_3 = _2 + 18446744073708503085;
...
# RANGE [irange] long long unsigned int [0, +INF] MASK 0xffffffffffffc000 VALUE 0x59
_6 = (long long unsigned int) _5;
# RANGE [irange] int [-INF, +INF] MASK 0xffffc000 VALUE 0x34
_7 = k_11 + -1048524;
switch (_7) <default: <L5> [33.33%], case 8: <L7> [33.33%], case 24: <L6> [33.33%], case 32: <L6> [33.33%]>
...
# RANGE [irange] long long unsigned int [0, +INF] MASK 0xffffffffffffc07d VALUE 0x0
# i_20 = PHI <_3(4), 0(3), _6(2)>
and evrp is now trying to figure out the range for i_20 in range_of_phi.
All the ranges and MASK/VALUE pairs above are correct for the testcase:
k_11, and _2 which is based on it, are the result of multiplying by a
constant with the low 14 bits cleared and then adding some numbers.
There is an obvious missed optimization for which I've filed PR119039,
simplify_switch_using_ranges could see that all the labels but default
are unreachable because the controlling expression has
MASK 0xffffc000 VALUE 0x34 and none of 8, 24 and 32 satisfy that.
Anyway, during range_of_phi for i_20, we process the PHI arguments
in order. For the _3(4) argument, we figure out that it is reachable
only through the case 24: and case 32: labels of the switch and that
0x34 - 0x2d is 7, so we derive
[irange] long long unsigned int [17, 17][25, 25] MASK 0xffffffffffffc000 VALUE 0x2d
(the MASK/VALUE was just inherited from the earlier range of _3).
Now (not surprisingly, because those labels aren't actually reachable)
that range is inconsistent: 0x2d is 45, so there is a conflict between the
values and the irange_bitmask.
The value-range.{h,cc} code differentiates between the actually stored
irange_bitmask, which here is MASK 0xffffffffffffc000 VALUE 0x2d, and the
semantic bitmask, which is what get_bitmask returns:
// The mask inherent in the range is calculated on-demand. For
// example, [0,255] does not have known bits set by default. This
// saves us considerable time, because setting it at creation incurs
// a large penalty for irange::set. At the time of writing there
// was a 5% slowdown in VRP if we kept the mask precisely up to date
// at all times. Instead, we default to -1 and set it when
// explicitly requested. However, this function will always return
// the correct mask.
//
// This also means that the mask may have a finer granularity than
// the range and thus contradict it. Think of the mask as an
// enhancement to the range. For example:
//
// [3, 1000] MASK 0xfffffffe VALUE 0x0
//
// 3 is in the range endpoints, but is excluded per the known 0 bits
// in the mask.
//
// See also the note in irange_bitmask::intersect.
irange_bitmask bm
  = get_bitmask_from_range (type (), lower_bound (), upper_bound ());
if (!m_bitmask.unknown_p ())
  bm.intersect (m_bitmask);
Here get_bitmask_from_range returns MASK 0x1f VALUE 0x0, which is then
intersected with the stored MASK 0xffffffffffffc000 VALUE 0x2d.
That triggers the ugly special case in irange_bitmask::intersect:
// If we have two known bits that are incompatible, the resulting
// bit is undefined. It is unclear whether we should set the entire
// range to UNDEFINED, or just a subset of it. For now, set the
// entire bitmask to unknown (VARYING).
if (wi::bit_and (~(m_mask | src.m_mask),
                 m_value ^ src.m_value) != 0)
  {
    unsigned prec = m_mask.get_precision ();
    m_mask = wi::minus_one (prec);
    m_value = wi::zero (prec);
  }
so the semantic bitmask is actually MASK 0xffffffffffffffff VALUE 0x0.
Next, range_of_phi attempts to union it with the 0(3) PHI argument;
irange::union_ first adds [0, 0] to the subranges, giving
[irange] long long unsigned int [0, 0][17, 17][25, 25] MASK 0xffffffffffffc000 VALUE 0x2d
and then goes on to irange::union_bitmask which does
if (m_bitmask == r.m_bitmask)
  return false;
irange_bitmask bm = get_bitmask ();
irange_bitmask save = bm;
bm.union_ (r.get_bitmask ());
if (save == bm)
  return false;
m_bitmask = bm;
if (save == get_bitmask ())
  return false;
m_bitmask MASK 0xffffffffffffc000 VALUE 0x2d isn't the same as
r.m_bitmask MASK 0x0 VALUE 0x0, so we compute the semantic bitmask (note,
though, that it is computed not from the original range before the union
but from the modified one; I'm not sure whether that is a problem as well).
It is still the VARYING/unknown_p one; union_ with MASK 0x0 VALUE 0x0 still
gives MASK 0xffffffffffffffff VALUE 0x0, so nothing is updated: the semantic
bitmask didn't change, so we are fine (not!, see later).
Except that we then try to union with the third PHI argument. Because the
edge for it comes only from the case 8: label and there is a known
difference between the two, that argument has actually already been
replaced earlier by the constant 45(2). So irange::union_ adds the [45, 45]
range to the list of subranges, but voila, 45 is 0x2d and satisfies the
stored MASK 0xffffffffffffc000 VALUE 0x2d, so that addition changed the
semantic bitmask from MASK 0xffffffffffffffff VALUE 0x0 to
MASK 0xffffffffffffc000 VALUE 0x2d. Eventually we just optimize this to
[irange] long long unsigned int [45, 45], because that is the only range
which satisfies the bitmask. And that is wrong: at runtime i_20 has
value 0.
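Membership in a MASK/VALUE pair comes down to comparing the known bits, which is why adding [45, 45] revived the stored bitmask. The hypothetical helpers below (not GCC's API) keep only the values compatible with a bitmask: of the PHI values 0, 17, 25 and 45, only 45 matches the stored MASK 0xffffffffffffc000 VALUE 0x2d, and, as noted for PR119039, none of the case labels 8, 24 and 32 matches MASK 0xffffc000 VALUE 0x34.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Does X agree with every known bit?  A set bit in MASK means "unknown".
bool satisfies(std::uint64_t x, std::uint64_t value, std::uint64_t mask)
{
  return (x & ~mask) == (value & ~mask);
}

// Keeps only the candidates compatible with the MASK/VALUE pair; this is
// how a bitmask can shrink a union of singleton subranges.
std::vector<std::uint64_t> compatible_values(
    const std::vector<std::uint64_t>& xs, std::uint64_t value,
    std::uint64_t mask)
{
  std::vector<std::uint64_t> out;
  for (std::uint64_t x : xs)
    if (satisfies(x, value, mask))
      out.push_back(x);
  return out;
}
```

The 32-bit mask 0xffffc000 is zero-extended when passed here; that is harmless for these small non-negative labels because only the low 14 bits decide the outcome.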
The following patch attempts to detect the case where get_bitmask turns a
non-VARYING m_bitmask into a VARYING one because of a conflict, and in that
case makes sure m_bitmask is actually updated rather than left unmodified,
so that a later union_ doesn't cause problems.
I also wonder whether get_bitmask could have a special case for this:
if bm.intersect (m_bitmask) yields unknown_p from something that was not
originally unknown_p, it could just use the get_bitmask_from_range value
and ignore the stored m_bitmask. Though I don't know how union_bitmask
would then figure out that it needs to update m_bitmask.
2025-03-05 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/118953
* value-range.cc (irange::union_bitmask): Update m_bitmask if
get_bitmask () is unknown_p and m_bitmask is not, even when the
semantic bitmask didn't change and returning false.
* include/bits/ranges_util.h (__detail::__pair_like_convertible_from):
Use `_Tp` in `is_reference_v` check.
* testsuite/std/ranges/subrange/tuple_like.cc: New tests for
pair-like conversion.
Richard Biener [Tue, 4 Mar 2025 15:13:09 +0000 (16:13 +0100)]
middle-end/97323 - TYPE_CANONICAL vs. ARRAY_TYPE modes
For strict-alignment targets we can end up with BLKmode single-element
array types when the element type is unaligned. This confuses
type checking since the canonical type would have an aligned
element type and a non-BLKmode mode. The following simply ignores
the mode we assign to array types for this purpose, like we already
do for record and union types.
PR middle-end/97323
* tree.cc (gimple_canonical_types_compatible_p): Ignore
TYPE_MODE also for ARRAY_TYPE.
(verify_type): Likewise.
This commit adds support for C++26's constexpr specialized memory
algorithms, introduced by P2283R2, P3508R0, P3369R0.
The uninitialized_default, value, copy, move and fill algorithms are
affected, in all of their variants (iterator-based, range-based and _n
versions).
The changes are mostly mechanical -- add `constexpr` to a number of
signatures when compiling in C++26 and above modes. The internal helper
guard class used by the range algorithms can instead be marked constexpr
unconditionally.
uninitialized_default_construct is implemented in terms of the
_Construct_novalue helper, which requires support for C++26's constexpr
placement new from the compiler (P2747R2, which GCC implements). We can
simply mark it as constexpr in C++26 language modes, even if the
compiler does not support P2747R2 (e.g. Clang 17/18), because C++23's
P2448R2 makes it OK to mark functions as constexpr even if they never
qualify, and other compilers implement this.
The only "real" change to the implementation of the algorithms is that
during constant evaluation I need to dispatch to a constexpr-friendly
version of them.
For each algorithm family I've added only one test to cover it and its
variants; the idea is to avoid too much repetition and simplify future
maintenance.
libstdc++-v3/ChangeLog:
* include/bits/ranges_uninitialized.h: Mark the specialized
memory algorithms as constexpr in C++26. Also mark the members
of the _DestroyGuard helper class.
* include/bits/stl_uninitialized.h: Ditto.
* include/bits/stl_construct.h (_Construct_novalue): Mark it
as constexpr in C++26.
* include/bits/version.def (raw_memory_algorithms): Bump the
feature-testing macro for C++26.
* include/bits/version.h: Regenerate.
* testsuite/20_util/headers/memory/synopsis.cc: Add constexpr to
the uninitialized_* algorithms (when in C++26) in the test.
* testsuite/20_util/specialized_algorithms/feature_test_macro.cc:
New test.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/constexpr.cc:
New test.
* testsuite/20_util/specialized_algorithms/uninitialized_default_construct/constexpr.cc:
New test.
* testsuite/20_util/specialized_algorithms/uninitialized_fill/constexpr.cc:
New test.
* testsuite/20_util/specialized_algorithms/uninitialized_move/constexpr.cc:
New test.
* testsuite/20_util/specialized_algorithms/uninitialized_value_construct/constexpr.cc:
New test.
Fortran: Add view convert to pointer assign when only pointer/alloc attr differs [PR104684]
PR fortran/104684
gcc/fortran/ChangeLog:
* trans-array.cc (gfc_conv_expr_descriptor): Look at the
lang-specific akind and do a view convert when only the akind
attribute differs between pointer and allocatable array.
Simon Martin [Wed, 5 Mar 2025 08:08:57 +0000 (09:08 +0100)]
c++: Fix checking assert upon invalid class definition [PR116740]
A checking assert triggers upon the following invalid code since
GCC 11:
=== cut here ===
class { a (struct b;
} struct b
=== cut here ===
The problem is that during error recovery, we call
set_identifier_type_value_with_scope for B in the global namespace, and
the checking assert added via r11-7228-g8f93e1b892850b fails.
This patch relaxes that assert so that it does not fail if we've seen a parser
error (it is a generalization of another fix made to that checking assert via r11-7266-g24bf79f1798ad1).
PR c++/116740
gcc/cp/ChangeLog:
* name-lookup.cc (set_identifier_type_value_with_scope): Don't
fail assert with ill-formed input.
Jakub Jelinek [Wed, 5 Mar 2025 06:47:52 +0000 (07:47 +0100)]
openmp, c++: Fix up OpenMP/OpenACC handling in C++ modules [PR119102]
module.cc apparently has support for extensions and attempts to ensure
that if a module is compiled with those extensions enabled, sources which
use the module are compiled with the same extensions.
The only extension supported right now is SE_OPENMP,
and use of the extension is keyed on streaming an OMP_CLAUSE tree out
or in.
This is undesirable for several reasons.
OMP_CLAUSE is the only tree which can appear in the IL even without
-fopenmp/-fopenmp-simd/-fopenacc (when simd ("notinbranch") or
simd ("inbranch") attributes are used), and it can appear also in all
the 3 modes mentioned above. On the other hand, with the exception of
arguments of attributes added e.g. for declare simd where no harm should
be done if -fopenmp/-fopenmp-simd isn't enabled later on, OMP_CLAUSE appears
in OMP_*_CLAUSES of OpenMP/OpenACC construct trees. And those construct
trees often have no clauses at all, so keying the extension on OMP_CLAUSE
doesn't catch many cases that should be caught.
Furthermore, for OpenMP we have 2 modes, -fopenmp-simd which parses some
OpenMP but constructs from that mostly OMP_SIMD and a few other cases,
and -fopenmp which includes that and far more on top of that; and there is
also -fopenacc.
So, this patch stops setting/requesting the extension on OMP_CLAUSE,
introduces 3 extensions rather than one (SE_OPENMP_SIMD, SE_OPENMP and
SE_OPENACC) and keys those on OpenMP constructs from the -fopenmp-simd
subset, other OpenMP constructs and OpenACC constructs.
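The new keying can be sketched as follows. The enumerator names follow the ChangeLog, but their values, `TreeKind`, `Options` and both functions are hypothetical, and the assumption that -fopenmp also satisfies the -fopenmp-simd subset is taken from the description above rather than from module.cc.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical bit values; the real enum in module.cc may differ.
enum streamed_extension : unsigned {
  SE_OPENMP_SIMD = 1u << 0,
  SE_OPENMP = 1u << 1,
  SE_OPENACC = 1u << 2,
};

// Illustrative categories of streamed trees.
enum class TreeKind { OmpSimdConstruct, OmpOtherConstruct, OaccConstruct,
                      OmpClause, Other };

// Extension recorded when a tree of kind K is streamed out.  A bare
// OMP_CLAUSE no longer records anything.
unsigned extension_for(TreeKind k)
{
  switch (k) {
    case TreeKind::OmpSimdConstruct: return SE_OPENMP_SIMD;
    case TreeKind::OmpOtherConstruct: return SE_OPENMP;
    case TreeKind::OaccConstruct: return SE_OPENACC;
    default: return 0;
  }
}

struct Options {
  bool fopenmp = false;
  bool fopenmp_simd = false;
  bool fopenacc = false;
};

// Options an importer lacks for a module that recorded EXTS; -fopenmp is
// assumed to cover the -fopenmp-simd subset.
std::vector<std::string> missing_options(unsigned exts, const Options& o)
{
  std::vector<std::string> missing;
  if ((exts & SE_OPENMP_SIMD) && !o.fopenmp && !o.fopenmp_simd)
    missing.push_back("-fopenmp-simd");
  if ((exts & SE_OPENMP) && !o.fopenmp)
    missing.push_back("-fopenmp");
  if ((exts & SE_OPENACC) && !o.fopenacc)
    missing.push_back("-fopenacc");
  return missing;
}
```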
2025-03-05 Jakub Jelinek <jakub@redhat.com>
PR c++/119102
gcc/cp/
* module.cc (enum streamed_extensions): Add SE_OPENMP_SIMD
and SE_OPENACC, change value of SE_OPENMP and SE_BITS.
(CASE_OMP_SIMD_CODE, CASE_OMP_CODE, CASE_OACC_CODE): Define.
(trees_out::start): Don't set SE_OPENMP extension for OMP_CLAUSE.
Set SE_OPENMP_SIMD extension for CASE_OMP_SIMD_CODE, SE_OPENMP
for CASE_OMP_CODE and SE_OPENACC for CASE_OACC_CODE.
(trees_in::start): Don't fail for OMP_CLAUSE with missing
SE_OPENMP extension. Do fail for CASE_OMP_SIMD_CODE and missing
SE_OPENMP_SIMD extension, or CASE_OMP_CODE and missing SE_OPENMP
extension, or CASE_OACC_CODE and missing SE_OPENACC extension.
(module_state::write_readme): Write all of SE_OPENMP_SIMD, SE_OPENMP
and SE_OPENACC extensions.
(module_state::read_config): Diagnose missing -fopenmp, -fopenmp-simd
and/or -fopenacc depending on extensions used.
gcc/testsuite/
* g++.dg/modules/pr119102_a.H: New test.
* g++.dg/modules/pr119102_b.C: New test.
* g++.dg/modules/omp-3_a.C: New test.
* g++.dg/modules/omp-3_b.C: New test.
* g++.dg/modules/omp-3_c.C: New test.
* g++.dg/modules/omp-3_d.C: New test.
* g++.dg/modules/oacc-1_a.C: New test.
* g++.dg/modules/oacc-1_b.C: New test.
* g++.dg/modules/oacc-1_c.C: New test.
Jakub Jelinek [Wed, 5 Mar 2025 05:41:00 +0000 (06:41 +0100)]
c++: Apply/diagnose attributes when instantiating ARRAY/POINTER/REFERENCE_TYPE [PR118787]
The following testcase, IMO in violation of the P2552R3 paper, doesn't
pedwarn on alignas applied to dependent types or on alignas with a
dependent argument, because tsubst was just ignoring TYPE_ATTRIBUTES.
The following patch fixes it for the POINTER/REFERENCE_TYPE and
ARRAY_TYPE cases, but perhaps we need to do the same also for other
types (INTEGER_TYPE/REAL_TYPE and the like). I guess I'll need to
construct more testcases.
2025-03-05 Jakub Jelinek <jakub@redhat.com>
PR c++/118787
* pt.cc (tsubst) <case ARRAY_TYPE>: Use return t; only if it doesn't
have any TYPE_ATTRIBUTES. Call apply_late_template_attributes.
<case POINTER_TYPE, case REFERENCE_TYPE>: Likewise. Formatting fix.
Xi Ruoyao [Sun, 2 Mar 2025 11:02:50 +0000 (19:02 +0800)]
LoongArch: Fix incorrect reorder of __lsx_vldx and __lasx_xvldx [PR119084]
They could be incorrectly reordered with store instructions like st.b
because the RTL expression does not have a memory_operand or a (mem)
expression. The incorrect reorder has been observed in openh264 LTO
build.
Expand them to a (mem) expression instead of unspec to fix the issue.
Then we need to make loongarch_address_insns return 1 for
ADDRESS_REG_REG because the constraint "R" expects this behavior;
otherwise the register allocation pass would consider the vldx
instruction invalid and turn it into add.d + vld. Apply the
ADDRESS_REG_REG penalty in loongarch_address_cost instead;
loongarch_rtx_costs should then also call loongarch_address_cost
instead of loongarch_address_insns.