Steve Baird [Fri, 10 Jan 2025 21:15:18 +0000 (13:15 -0800)]
ada: Avoid calling Resolve with Stand.Any_Fixed as the expected type
When we call Resolve for an expression, we pass in the expected type
for that expression. In the absence of semantic errors, that expected type
should never be any of the "Any_xxx" types declared in stand.ads (e.g.,
Any_Array, Any_Numeric, Any_Real). In particular, it should never be Any_Fixed.
Fix a case in which this rule was being violated.
gcc/ada/ChangeLog:
* sem_res.adb
(Set_Mixed_Mode_Operand): If we are about to call Resolve
passing in Any_Fixed as the expected type, then instead pass in
the fixed point type of the other operand (i.e., B_Typ).
Gary Dismukes [Fri, 10 Jan 2025 22:39:52 +0000 (22:39 +0000)]
ada: Compiler crash on array aggregate association iterating over function result
The compiler triggers a bug box when compiling an array aggregate with
an iterated_component_association that iterates over another array object,
failing when trying to retrieve a Choices field, which isn't an allowed
field for N_Iterated_Component_Association nodes. This occurs in procedure
Check_Function_Writable_Actuals, which wasn't accounting for the iterated
association forms.
gcc/ada/ChangeLog:
* sem_util.adb (Check_Function_Writable_Actuals): Add handling for
N_Iterated_Component_Association and N_Iterated_Element_Association.
Fix a typo in an RM reference (6.4.1(20/3) => 6.4.1(6.20/3)).
(Collect_Expression_Ids): New procedure factoring code for collecting
identifiers from expressions of aggregate associations.
(Handle_Association_Choices): New procedure factoring code for handling
id collection for expressions of aggregate associations with multiple
choices. Removed redundant test of Box_Present from original code.
Hongyu Wang [Thu, 5 Jun 2025 06:45:08 +0000 (14:45 +0800)]
tree-sra: Use MOVE_MAX for sra size limit [PR112824]
Currently SRA uses UNITS_PER_WORD to define the maximum scalarization
size, but targets like x86 allow operations on larger sizes, so
components such as vector variables in an aggregate can be larger than
UNITS_PER_WORD. Use MOVE_MAX instead of UNITS_PER_WORD to allow
SRA for aggregates with vector components.
gcc/ChangeLog:
PR middle-end/112824
* tree-sra.cc (sra_get_max_scalarization_size): Use MOVE_MAX
instead of UNITS_PER_WORD to define max_scalarization_size.
We do not need to generate this code early, since it does not affect
any of the analysis. Lowering it later takes less code, and avoids
modifying the initial await expression, which will simplify changes
to analysis to deal with open PRs.
gcc/cp/ChangeLog:
* coroutines.cc (expand_one_await_expression): Set the
initial_await_resume_called flag here.
(build_actor_fn): Populate the frame accessor for the
initial_await_resume_called flag.
(cp_coroutine_transform::wrap_original_function_body): Do
not modify the initial_await expression to include the
initial_await_resume_called flag here.
Hu, Lin1 [Tue, 27 May 2025 11:09:04 +0000 (19:09 +0800)]
i386: Fix vmovddup's mem attribute
Some vmovddup patterns have type attribute sselog1, which makes their mem
attribute "both". Change the type attribute to match the other vmovddup
patterns.
gcc/ChangeLog:
* config/i386/sse.md
(avx512f_movddup512<mask_name>): Change sselog1 to ssemov.
(avx_movddup256<mask_name>): Ditto.
(*vec_dupv2di): Change alternative 4's type attribute from sselog1
to ssemov.
This patch introduces a new testcase to verify the merging of profiles
is performed for cloned functions.
Since this is invoked very early, before the pass manager, we need to
set up the dumping explicitly. This is similar to the handling in
finish_optimization_passes.
OpenMP: Fix regressions in metadirective-target-device-2.c [PR120518]
My previous patch that added a CLEANUP_POINT_EXPR around the device_num
selector expression in the C++ front end broke the testcase
c-c++-common/gomp/metadirective-target-device-2.c on offload targets.
It confused the code in omp_device_num_check that tries to bypass error
checking and do early resolution when the expression is a call to one
of the OpenMP library functions. The solution is to make that code smart
enough to look inside a CLEANUP_POINT_EXPR.
gcc/ChangeLog
PR c++/120518
* omp-general.cc (omp_device_num_check): Look inside a
CLEANUP_POINT_EXPR when trying to optimize special cases.
Jonathan Wakely [Wed, 28 May 2025 14:19:18 +0000 (15:19 +0100)]
libstdc++: Make system_clock::to_time_t always_inline [PR99832]
For some 32-bit targets Glibc supports changing the size of time_t to be
64 bits by defining _TIME_BITS=64. That causes an ABI change which
would affect std::chrono::system_clock::to_time_t. Because to_time_t is
not a function template, its mangled name does not depend on the return
type, so it has the same mangled name whether it returns a 32-bit time_t
or a 64-bit time_t. On targets where the size of time_t can be selected
at preprocessing time, that can cause ODR violations, e.g. the linker
selects a definition of to_time_t that returns a 32-bit value but a
caller expects 64-bit and so reads 32 bits of garbage from the stack.
This commit adds always_inline to to_time_t so that all callers inline
the conversion to time_t, and will do so using whatever type time_t
happens to be in that translation unit.
Existing objects compiled before this change will either have inlined
the function anyway (which is likely if compiled with any optimization
enabled) or will contain a COMDAT definition of the inline function and
so still be able to find it at link-time.
The attribute is also added to system_clock::from_time_t, because that's
an equally simple function and it seems reasonable for them to both be
always inlined.
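As a rough illustration of the approach described above, here is a minimal
sketch of an always-inlined conversion. The name to_time_t_sketch is
hypothetical and this is not the libstdc++ source; it only assumes a compiler
that accepts the GNU always_inline attribute. Because the body is inlined into
every caller, the width of time_t is whatever that translation unit sees.

```cpp
#include <chrono>
#include <ctime>

// Hypothetical stand-in for system_clock::to_time_t (not the libstdc++ code).
// always_inline means no out-of-line symbol is needed, so callers built with
// different time_t widths never share one definition.
[[gnu::always_inline]] inline std::time_t
to_time_t_sketch(std::chrono::system_clock::time_point tp) noexcept
{
  using namespace std::chrono;
  // Truncate to whole seconds, then convert to this TU's time_t.
  return static_cast<std::time_t>(
      duration_cast<seconds>(tp.time_since_epoch()).count());
}
```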
libstdc++-v3/ChangeLog:
PR libstdc++/99832
* include/bits/chrono.h (system_clock::to_time_t): Add
always_inline attribute to be agnostic to the underlying type of
time_t.
(system_clock::from_time_t): Add always_inline for consistency
with to_time_t.
* testsuite/20_util/system_clock/99832.cc: New test.
Nathan Myers [Wed, 4 Jun 2025 18:52:29 +0000 (14:52 -0400)]
libstdc++: sstream from string_view (P2495R3) [PR119741]
Add constructors to stringbuf, stringstream, istringstream, and ostringstream,
and a matching overload of str(sv) in each, that take anything convertible to
a string_view in places where the existing ctors and function take a string.
Note this change omits the constraint applied to the istringstream constructor
from string cited as a "drive-by" in P2495R3, as we have determined it is
redundant.
libstdc++-v3/ChangeLog:
PR libstdc++/119741
* include/std/sstream: full implementation, really just
decls, requires clause and plumbing.
* include/bits/version.def, include/bits/version.h:
new preprocessor symbol
__cpp_lib_sstream_from_string_view.
* testsuite/27_io/basic_stringbuf/cons/char/string_view.cc:
New tests.
* testsuite/27_io/basic_istringstream/cons/char/string_view.cc:
New tests.
* testsuite/27_io/basic_ostringstream/cons/char/string_view.cc:
New tests.
* testsuite/27_io/basic_stringstream/cons/char/string_view.cc:
New tests.
* testsuite/27_io/basic_stringbuf/cons/wchar_t/string_view.cc:
New tests.
* testsuite/27_io/basic_istringstream/cons/wchar_t/string_view.cc:
New tests.
* testsuite/27_io/basic_ostringstream/cons/wchar_t/string_view.cc:
New tests.
* testsuite/27_io/basic_stringstream/cons/wchar_t/string_view.cc:
New tests.
This implements the library changes in P0849R8 "auto(x): decay-copy
in the language" which consist of replacing most uses of the
exposition-only function decay-copy with auto(x) throughout the library
wording. We implement this as a DR against C++20 since there should be
no behavior change in practice (especially in light of LWG 3724 which
makes decay-copy SFINAE-friendly).
The main difference between decay-copy and auto(x) is that decay-copy
materializes its argument unlike auto(x), and so the latter is a no-op
when its argument is a prvalue. Effectively the former could introduce
an unnecessary move constructor call in some contexts. In C++20 and
earlier we could emulate auto(x) with decay_t<decltype((x))>(x).
After this paper the only remaining uses of decay-copy in the standard
are in the specification of some range adaptors. In our implementation
of those range adaptors I believe decay-copy is already implied which is
why we don't use __decay_copy explicitly there. So since it's apparently
no longer needed this patch goes ahead and removes __decay_copy.
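The difference between the two forms can be sketched as follows, assuming
C++17 (guaranteed copy elision). The names Counted, decay_copy_sketch and
AUTO_SKETCH are hypothetical and only serve the illustration; the macro uses
the decay_t<decltype((x))>(x) emulation mentioned above rather than auto(x)
itself.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical type that counts move constructions.
struct Counted
{
  static inline int moves = 0;
  Counted() = default;
  Counted(const Counted&) = default;
  Counted(Counted&&) noexcept { ++moves; }
};

// Function-style decay-copy: binding the argument to T&& materializes it,
// and the return statement then moves from that temporary.
template<typename T>
std::decay_t<T> decay_copy_sketch(T&& t)
{ return std::forward<T>(t); }

// Emulation of auto(x): for a prvalue operand this is a cast to the same
// type, so no temporary is materialized and nothing is moved.
#define AUTO_SKETCH(x) std::decay_t<decltype((x))>(x)

int moves_with_decay_copy()
{
  Counted::moves = 0;
  Counted c = decay_copy_sketch(Counted{});
  (void) c;
  return Counted::moves;
}

int moves_with_cast()
{
  Counted::moves = 0;
  Counted c = AUTO_SKETCH(Counted{});
  (void) c;
  return Counted::moves;
}
```

With a prvalue argument the function form costs one move construction, while
the cast form costs none.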
Jason Merrill [Wed, 4 Jun 2025 17:31:02 +0000 (13:31 -0400)]
c++: constexpr prvalues vs genericize [PR120502]
Here constexpr evaluation was getting confused by the result of
split_nonconstant_init, which leaves an INIT_EXPR from an empty CONSTRUCTOR
to be followed by member initialization. As a result
CONSTRUCTOR_NO_CLEARING was set for the time_zone, and
cxx_eval_store_expression didn't set it again for the initial clobber in the
basic_string constructor, so when cxx_fold_indirect_ref wants to check
whether the anonymous union active member had type non_trivial_if, we see
that we don't currently have a value for the anonymous union, try to add
one, and fail.
So let's do constexpr evaluation before split_nonconstant_init.
PR c++/120502
gcc/cp/ChangeLog:
* cp-gimplify.cc (cp_fold_r) [TARGET_EXPR]: Do constexpr evaluation
before genericize.
* constexpr.cc (cxx_eval_store_expression): Add comment.
Thomas Schwinge [Mon, 26 May 2025 11:31:54 +0000 (13:31 +0200)]
Avoid SIGSEGV in nvptx 'mkoffload' for voluminous PTX code
In commit 50be486dff4ea2676ed022e9524ef190b92ae2b1
"nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup", some
additional tracking of the PTX code was added, and this assumes that
potentially every single character of PTX code needs to be tracked as a new
chunk of PTX code. That's problematic if we're dealing with voluminous PTX
code (for example, non-trivial C++ code), and the 'file_idx' 'alloca'tion then
causes stack overflow. For example:
FAIL: libgomp.c++/target-std__valarray-1.C (test for excess errors)
UNRESOLVED: libgomp.c++/target-std__valarray-1.C compilation failed to produce executable
lto-wrapper: fatal error: [...]/build-gcc/gcc//accel/nvptx-none/mkoffload terminated with signal 11 [Segmentation fault], core dumped
gcc/
* config/nvptx/mkoffload.cc (process): Use an 'auto_vec' for
'file_idx'.
Andrew Pinski [Fri, 21 Feb 2025 06:05:38 +0000 (22:05 -0800)]
gimple-fold: Implement simple copy propagation for aggregates [PR14295]
This implements a simple copy propagation for aggregates in a similar
fashion to what we already do for copy prop of zeroing.
Right now this only looks at the previous vdef statement, but this allows us
to catch a lot of cases that show up in C++ code.
This used to delete aggregate copies that are to the same location (PR57361),
but that was found to delete statements that are needed for aliasing-marker reasons,
so we need to keep them around until that is solved. Note DSE will delete the statements
anyway, so there is no testcase added since we expose the latent bug in the same way.
See https://gcc.gnu.org/pipermail/gcc-patches/2025-May/685003.html for the testcase and
explanation there.
Also adds a variant of pr22237.c which was found while working on this patch.
Changes since v1:
* v2: change check for vuse to use default definition.
Remove dest/src arguments for optimize_agr_copyprop
Changed dump messages slightly.
Added stats
Don't delete `a = a` until aliasing markers are added.
* tree-ssa-forwprop.cc (optimize_agr_copyprop): New function.
(pass_forwprop::execute): Call optimize_agr_copyprop for load/store statements.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/20031106-6.c: Un-xfail. Add scan for forwprop1.
* g++.dg/opt/pr66119.C: Disable forwprop since that does
the copy prop now.
* gcc.dg/tree-ssa/pr108358-a.c: New test.
* gcc.dg/tree-ssa/pr114169-1.c: New test.
* gcc.c-torture/execute/builtins/pr22237-1-lib.c: New test.
* gcc.c-torture/execute/builtins/pr22237-1.c: New test.
* gcc.dg/tree-ssa/pr57361.c: Disable forwprop1.
* gcc.dg/tree-ssa/pr57361-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Pengfei Li [Wed, 4 Jun 2025 15:59:44 +0000 (16:59 +0100)]
match.pd: Fold (x + y) >> 1 into IFN_AVG_FLOOR (x, y) for vectors
This patch folds vector expressions of the form (x + y) >> 1 into
IFN_AVG_FLOOR (x, y), reducing instruction count on platforms that
support averaging operations. For example, it can help improve the
codegen on AArch64 from:
add v0.4s, v0.4s, v31.4s
ushr v0.4s, v0.4s, 1
to:
uhadd v0.4s, v0.4s, v31.4s
As this folding is only valid when the most significant bit of each
element in both x and y is known to be zero, this patch checks leading
zero bits of elements in x and y, and extends get_nonzero_bits_1() to
handle uniform vectors. When the input is a uniform vector, the function
now returns the nonzero bits of its element.
Additionally, this patch adds more checks to reject vector types in bit
constant propagation (tree-bit-ccp), since tree-bit-ccp was designed for
scalar values only, and the new vector logic in get_nonzero_bits_1()
could lead to incorrect propagation results.
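To see why the most significant bit of each element must be known to be zero,
here is a scalar sketch on 8-bit values. The helper names are hypothetical;
shift_avg models a single vector lane whose addition wraps at the element
width, while avg_floor computes the floor average without overflow.

```cpp
#include <cstdint>

// One lane of (x + y) >> 1, with the sum wrapping at 8 bits.
constexpr std::uint8_t shift_avg(std::uint8_t x, std::uint8_t y)
{
  return static_cast<std::uint8_t>(static_cast<std::uint8_t>(x + y) >> 1);
}

// Floor average computed in a wider type, so the sum cannot overflow.
constexpr std::uint8_t avg_floor(std::uint8_t x, std::uint8_t y)
{
  return static_cast<std::uint8_t>((static_cast<unsigned>(x) + y) >> 1);
}

// MSB of both inputs is zero: the sum fits and both forms agree.
static_assert(shift_avg(100, 27) == avg_floor(100, 27), "equal without overflow");
// MSB of one input is set: the wrapped sum gives a different result.
static_assert(shift_avg(200, 100) != avg_floor(200, 100), "differ on overflow");
```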
The result was many hundreds of warnings. The vast bulk of them were
recommendations for declaring variables as const, recommendations for
changing C-style casts to C++ casts, cheery notes about shadowed
variables, and complaints that malloc() results weren't being checked
for errors.
Two and a half days of applied OCD on my part has reduced the number of
warnings down to zero.
Xi Ruoyao [Sun, 11 May 2025 08:44:31 +0000 (16:44 +0800)]
ext-dce: Don't refine live width with SUBREG mode if !TRULY_NOOP_TRUNCATION_MODES_P [PR 120050]
If we see a promoted subreg and TRULY_NOOP_TRUNCATION says the
truncation is not a noop, then all bits of the inner reg are live. We
cannot reduce the live mask to that of the mode of the subreg.
gcc/ChangeLog:
PR rtl-optimization/120050
* ext-dce.cc (ext_dce_process_uses): Break early if a SUBREG in
rhs is promoted and the truncation from the inner mode to the
outer mode is not a noop when handling SETs.
Jakub Jelinek [Wed, 4 Jun 2025 15:22:58 +0000 (17:22 +0200)]
ranger: Some parameter formatting fixes
When reading the code, I've noticed various function definitions
with misaligned parameters. They should IMHO always align below the first
character after the opening ( and in most cases they do, but in some
cases they were indented more or less. Perhaps the functions changed
name or something.
Jakub Jelinek [Wed, 4 Jun 2025 15:21:51 +0000 (17:21 +0200)]
ranger: Add support for float <-> float casts [PR120231]
I've noticed we don't even support say float -> double and other
scalar floating point to scalar floating point conversions in the
ranger, we just end up with VARYING for those.
The following patch attempts to fix that.
The reverse cast case uses float_binary_op_range_finish e.g. because
if the result isn't infinite, then the source couldn't be infinite
either even if the reverse fold_range would suggest that.
And special cases the case of guaranteed widening cast (where
we have assurance that all the source type values are exactly
representable in the destination type; using ieee_bits for that).
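The guaranteed-widening case can be sketched outside the compiler as follows.
This is only an illustration: it uses std::numeric_limits in place of GCC's
internal format data, and the names exactly_widens_sketch, range_f, range_d
and fold_widening_cast are hypothetical. When every source value is exactly
representable in the destination type, converting the two endpoints yields an
exact range for the result.

```cpp
#include <limits>

// True if every value of From (including subnormals) is exactly
// representable in To. Assumes both are IEC 559 binary formats.
template<typename From, typename To>
constexpr bool exactly_widens_sketch =
  std::numeric_limits<From>::is_iec559
  && std::numeric_limits<To>::is_iec559
  && std::numeric_limits<To>::digits >= std::numeric_limits<From>::digits
  && std::numeric_limits<To>::max_exponent
       >= std::numeric_limits<From>::max_exponent
  && (std::numeric_limits<To>::min_exponent - std::numeric_limits<To>::digits
      <= std::numeric_limits<From>::min_exponent
           - std::numeric_limits<From>::digits);

struct range_f { float lo, hi; };
struct range_d { double lo, hi; };

// Forward fold of a float -> double cast over a closed range.
constexpr range_d fold_widening_cast(range_f r)
{
  static_assert(exactly_widens_sketch<float, double>,
                "float must widen exactly to double");
  return { static_cast<double>(r.lo), static_cast<double>(r.hi) };
}
```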
2025-06-04 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/120231
* range-op-mixed.h (operator_cast::fold_range): Add overload
with 3 {,const} frange & operands. Change parameter names and
add final override keywords for float <-> integer cast overloads.
(operator_cast::op1_range): Likewise.
* range-op-float.cc (operator_cast::fold_range): New overload
with 3 {,const} frange & operands.
(operator_cast::op1_range): Likewise.
Tomasz Kamiński [Tue, 3 Jun 2025 09:40:17 +0000 (11:40 +0200)]
libstdc++: Test for formatting with empty spec for time points.
Add tests for the behavior of the ostream operator and formatting
with an empty chrono-spec for the chrono types. Current coverage is:
* time point, zoned_time and local_time_format in this commit,
* duration and hh_mm_ss in r16-1099-gac0a04b7a254fb,
* calendar types in r16-1016-g28a17985dd34b7.
libstdc++-v3/ChangeLog:
* testsuite/std/time/format/empty_spec.cc: New tests.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Patrick Palka [Wed, 4 Jun 2025 14:29:47 +0000 (10:29 -0400)]
libstdc++: Implement C++23 P1659R3 starts_with and ends_with
This implements ranges::starts_with and ranges::ends_with from the C++23
paper P1659R3. The corresponding _S_impl member functions take
optional size parameters __n1 and __n2 of the two ranges, where -1 means
the corresponding size is not known.
Dongyan Chen [Wed, 4 Jun 2025 14:03:31 +0000 (08:03 -0600)]
[PATCH] RISC-V: Imply zicsr for svade and svadu extensions.
This patch makes the svade and svadu extensions imply zicsr.
According to the riscv-privileged spec, svade and svadu are privileged
extensions, so they should imply zicsr.
Dongyan Chen [Wed, 4 Jun 2025 13:57:01 +0000 (07:57 -0600)]
[PATCH v2] RISC-V: Add svbare extension.
This patch adds support for the svbare extension, which is part of the RVA23
profile, so that GCC can recognize and process it correctly at compile time.
Jonathan Wakely [Mon, 2 Jun 2025 22:01:40 +0000 (23:01 +0100)]
libstdc++: Refactor __semaphore_base member functions
Replace the _S_get_current and _S_do_try_acquire static member functions
with non-static member functions _M_get_current and _M_do_try_acquire.
This means they don't need the address of _M_counter passed in.
libstdc++-v3/ChangeLog:
* include/bits/semaphore_base.h (_S_get_current): Replace with
non-static _M_get_current.
(_S_do_try_acquire): Replace with non-static _M_do_try_acquire.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
There's a deadlock in std::counting_semaphore that occurs when the
semaphore is under contention. The bug happens when one thread tries to
acquire the semaphore, calling __semaphore_base::_S_do_try_acquire to
atomically decrement the counter using compare_exchange_strong. If the
counter is non-zero (and so should be possible to decrement) but another
thread changes it (either incrementing or decrementing it) then the
compare_exchange fails and _S_do_try_acquire returns false. Because that
function is used by the predicate passed to __atomic_wait_address, when
it returns false the thread does a futex wait until the value changes.
However, when the predicate is false because the compare_exchange failed
due to not matching the expected value, waiting for the value to change
is incorrect. The correct behaviour would be to retry the
compare_exchange using the new value (as long as it's still non-zero).
Waiting for the value to change again means we can block forever,
because it might never change again.
The predicate should only test the value, not also attempt to alter it,
and its return value should mean only one thing, not conflate a busy
semaphore that cannot be acquired with a contended one that can be
acquired by retrying.
The correct behaviour of __semaphore_base::_M_acquire would be to
attempt the decrement, and to retry immediately if it failed due to
contention on the variable (i.e. due to the variable not having the
expected value). It should only wait for the value to change when the
value is zero, because that's the only time we can't decrement it.
This commit moves the _S_do_try_acquire call out of the predicate and
loops while it is false, only doing an atomic wait when the counter's
value is zero. The predicate used for the atomic wait now only checks
whether the value is decrementable (non-zero), without also trying to
perform that decrement.
In order for the caller to tell whether it should retry a failed
_S_do_try_acquire or should wait for the value to be non-zero, the value
obtained by a failed compare_exchange needs to be passed back to the
caller. _S_do_try_acquire is changed to take its parameter by reference,
so that the caller gets the new value and can check whether it's zero.
In order to avoid doing another atomic load after returning from an
atomic wait, the predicate is also changed to capture the local __val by
reference, and then assign to __val when it sees a non-zero value. That
makes the new value available to _M_acquire, so it can be passed to
_S_do_try_acquire as the expected value of the compare_exchange.
Although this means that the predicate is modifying data again, not just
checking a value, this modification is safe. It's not changing the
semaphore's counter, only changing a local variable in the caller to
avoid a redundant atomic load.
Equivalent changes are made to _M_try_acquire_until and
_M_try_acquire_for. They have the same bug, although they can escape the
deadlock if the wait is interrupted by timing out. For _M_acquire
there's no time out so it potentially waits forever.
_M_try_acquire also has the same bug, but can be simplified to just
calling _M_try_acquire_for(0ns). A timeout of zero results in calling
__wait_impl with the __spin_only flag set, so that the value is loaded
and checked in a spin loop but there is no futex wait. This means that
_M_try_acquire can still succeed under light contention if the counter
is being changed concurrently, at the cost of a little extra overhead.
It would be possible to implement _M_try_acquire as nothing more than an
atomic load and a compare_exchange, but it would fail when there is any
contention.
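The retry-versus-wait logic described above can be sketched like this. It is
a simplified illustration and not the libstdc++ implementation: the class and
member names are hypothetical, and a yield loop stands in for the futex-based
atomic wait.

```cpp
#include <atomic>
#include <thread>

class semaphore_sketch
{
  std::atomic<int> counter_;

public:
  explicit semaphore_sketch(int initial) : counter_(initial) { }

  // Attempt one decrement. 'old' is taken by reference so that a failed
  // compare_exchange hands the freshly observed value back to the caller.
  bool do_try_acquire(int& old) noexcept
  {
    if (old == 0)
      return false;
    return counter_.compare_exchange_strong(old, old - 1,
                                            std::memory_order_acquire,
                                            std::memory_order_relaxed);
  }

  void acquire() noexcept
  {
    int val = counter_.load(std::memory_order_relaxed);
    while (!do_try_acquire(val))
      {
        // Wait only when the counter is zero. The wait condition just
        // reads the counter and stores it into the local 'val'.
        if (val == 0)
          while ((val = counter_.load(std::memory_order_relaxed)) == 0)
            std::this_thread::yield();
        // Otherwise the failure was contention: retry at once with the
        // updated 'val' instead of waiting for another change.
      }
  }

  void release() noexcept
  { counter_.fetch_add(1, std::memory_order_release); }

  int value() const noexcept
  { return counter_.load(std::memory_order_relaxed); }
};
```

A failed do_try_acquire with a non-zero value in 'old' signals contention and
is retried; only a zero value leads to waiting.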
libstdc++-v3/ChangeLog:
PR libstdc++/104928
* include/bits/semaphore_base.h (_S_do_try_acquire): Take old
value by reference.
(_M_acquire): Move _S_do_try_acquire call out of the predicate
and loop on its result. Make the predicate capture and update
the local copy of the value.
(_M_try_acquire_until, _M_try_acquire_for): Likewise.
(_M_try_acquire): Just call _M_try_acquire_for.
* testsuite/30_threads/semaphore/104928-2.cc: New test.
* testsuite/30_threads/semaphore/104928.cc: New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
In the comment trail for PR119966, I'd said that the validate_subreg
condition:
/* The outer size must be ordered wrt the register size, otherwise
we wouldn't know at compile time how many registers the outer
mode occupies. */
if (!ordered_p (osize, regsize))
return false;
"is also potentially relevant" for paradoxical subregs. But I'd
forgotten an important caveat. If the inner size is smaller than
a register, we know that the inner value will only occupy a single
register. Although the paradoxical subreg might extend that single
register to multiple registers by padding with undefined bits,
the register size that matters for the extension is:
REGMODE_NATURAL_SIZE (omode)
rather than regsize's:
REGMODE_NATURAL_SIZE (imode)
The ordered check is still relevant if the inner value spans
multiple registers.
Enabling the check above for paradoxical subregs led to an ICE in the
testcase, where we tried to generate a VNx4QI paradoxical subreg of a
QI scalar. This was previously allowed, and AFAIK worked correctly.
The patch doesn't have the effect of relaxing the condition for
non-paradoxical subregs, since:
So even before the patch for PR119966, the condition only existed for
the maybe_gt (isize, regsize) case.
The term "block" used in the comment is taken from the rtl.texi
documentation of subregs.
gcc/
PR rtl-optimization/120447
* emit-rtl.cc (validate_subreg): Restrict ordered_p test
between osize and regsize to cases where the inner value
occupies multiple blocks.
gcc/testsuite/
PR rtl-optimization/120447
* gcc.dg/pr120447.c: New test.
Tobias Burnus [Wed, 4 Jun 2025 11:25:05 +0000 (13:25 +0200)]
libgomp.texi (omp_interop_*): Add note about 5.2-to-6.0 incompatibility
GCC uses the 6.0 types - which are unfortunately not quite compatible with
code expecting 5.1/5.2 data types. Therefore, this commit adds a note to
hopefully reduce surprises. Namely:
For C/C++: while OpenMP 5.1 and 5.2 used 'int *ret_code', OpenMP 6.0 uses
'omp_interop_rc_t *ret_code' in omp_interop_{int,ptr,str} and 'int' instead
of 'omp_interop_rc_t ret_code' in omp_get_interop_rc_desc.
Neither C nor C++ likes passing the wrong pointer type, albeit for C, GCC < 14
and clang only warn (gcc >= r14-6037-g9715c545d33b3a has an error) and
using -fpermissive turns it into a warning and -Wno-incompatible-pointer-types
silences it for C.
C++ also dislikes passing an int to an enum, albeit -fpermissive turns the
error into a warning with g++ (but not clang++). And, here, using an enum
on the caller side works with both int and enum on the callee side.
libgomp/ChangeLog:
* libgomp.texi (omp_interop_{int,ptr,str,rc_desc}): Add note about
the 'ret_code' type change in OpenMP 6.
Tomasz Kamiński [Wed, 4 Jun 2025 09:05:11 +0000 (11:05 +0200)]
libstdc++: Fix format call and test formatting with empty specs for durations.
This patch fixes an obvious error, where the output iterator argument was
missing from the call to format_to when durations with custom representation
types are used.
It also adds tests for the behavior of the ostream operator and formatting
with an empty chrono-spec for the chrono types. Current coverage is:
* duration and hh_mm_ss in this commit,
* calendar types in r16-1016-g28a17985dd34b7.
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (__formatter_chrono::_M_s): Add missing
__out argument to format_to call.
* testsuite/std/time/format/empty_spec.cc: New test.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Program received signal SIGSEGV, Segmentation fault.
0x000000000131174f in prepare_call_arguments (
bb=<basic_block 0x7fffe99dfba0 (2)>, insn=0x7fffe980cc60)
at /export/gnu/import/git/sources/gcc/gcc/var-tracking.cc:6277
6277 fndecl = MEM_EXPR (XEXP (call, 0));
(gdb) bt
bb=<basic_block 0x7fffe99dfba0 (2)>, insn=0x7fffe980cc60)
at /export/gnu/import/git/sources/gcc/gcc/var-tracking.cc:6277
at /export/gnu/import/git/sources/gcc/gcc/var-tracking.cc:10297
at /export/gnu/import/git/sources/gcc/gcc/var-tracking.cc:10526
at /export/gnu/import/git/sources/gcc/gcc/var-tracking.cc:10579
at /export/gnu/import/git/sources/gcc/gcc/var-tracking.cc:10616
Update prepare_call_arguments to check MEM_P before using MEM_EXPR.
gcc/
PR debug/120525
* var-tracking.cc (prepare_call_arguments): Use MEM_EXPR only
if MEM_P is true.
Fortran: Fix missing substring ref for allocatable saved vars [PR120483]
Compute a substring ref on an allocatable static character array
using pointer arithmetic. Using an array type corrupts the type
layout and crashes omp generation.
PR fortran/120483
gcc/fortran/ChangeLog:
* trans-expr.cc (gfc_conv_substring): Use pointer arithmetic on
static allocatable char arrays.
Hu, Lin1 [Mon, 10 Mar 2025 08:52:22 +0000 (16:52 +0800)]
i386: Add more peephole2 for APX NDD
The patch aims to optimize
movb (%rdi), %al
movq %rdi, %rbx
xorl %esi, %eax, %edx
movb %dl, (%rdi)
cmpb %sil, %al
jne
to
xorb %sil, (%rdi)
movq %rdi, %rbx
jne
This eliminates two mov instructions and one cmp instruction.
Because APX NDD allows the destination and source registers to differ,
some of the original peephole2 patterns no longer apply. Add new peephole2
patterns for APX NDD.
gcc/ChangeLog:
* config/i386/i386.md (define_peephole2): Define some new peephole2 for
APX NDD.
Hu, Lin1 [Wed, 19 Feb 2025 07:51:40 +0000 (15:51 +0800)]
i386: Add more forms peephole2 for adc/sbb
Enabling -mapxf changes some of the adc/sbb patterns, so GCC emits an
extra mov, for example
movq 8(%rdi), %rax
adcq %rax, 8(%rsi), %rax
movq %rax, 8(%rdi)
rather than
movq 8(%rsi), %rax
adcq %rax, 8(%rdi)
The patch adds more kinds of peephole2 to eliminate the extra mov.
gcc/ChangeLog:
* config/i386/i386.md: Add 4 new peephole2 by swapping the original
peephole2's operands' order to support new pattern.
Martin Uecker [Sun, 1 Jun 2025 18:34:52 +0000 (20:34 +0200)]
c: Move checking assertions from recursion when forming composite types to avoid ICE.
The checking assertion in composite_type_internal for structures and unions may
fail if there are self-referential types. To avoid this, we move them out of
the recursion. This should also be more efficient and covers other types.
We have to ignore some cases where we form composite types with qualifiers
not matching (PR120510).
Pan Li [Mon, 2 Jun 2025 08:56:59 +0000 (16:56 +0800)]
RISC-V: Combine vec_duplicate + vdiv.vv to vdiv.vx on GR2VR cost
This patch combines vec_duplicate + vdiv.vv into vdiv.vx, as in the
example code below. The related pattern depends on the cost of
vec_duplicate from GR2VR: late-combine performs the combination if the
GR2VR cost is zero, and rejects it if the GR2VR cost is greater than zero.
Assume we have example code like below, GR2VR cost is 0.
#define DEF_VX_BINARY(T, OP) \
void \
test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
{ \
for (unsigned i = 0; i < n; i++) \
out[i] = in[i] OP x; \
}
The following test suites pass with this patch.
* The full rv64gcv regression test.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vx_binary_vec_vec_dup): Add new
case for DIV op.
* config/riscv/riscv.cc (get_vector_binary_rtx_cost): Add new func
to get the cost of vector binary.
(riscv_rtx_costs): Add div rtx match and leverage above wrap to
get cost.
* config/riscv/vector-iterators.md: Add new op div to no_shift_vx_op.
Richard Biener [Tue, 3 Jun 2025 12:09:22 +0000 (14:09 +0200)]
tree-optimization/120517 - fix dataref group split math
DR_INIT is already measured in bytes, so there's no need to multiply
the DR_INIT difference of two DRs by the size of one of the DRs when
comparing that difference against MAX_BITSIZE_MODE_ANY_MODE.
PR tree-optimization/120517
* tree-vect-data-refs.cc (vect_analyze_data_ref_accesses):
Fix math in dataref group split.
Jonathan Wakely [Mon, 2 Jun 2025 10:24:32 +0000 (11:24 +0100)]
libstdc++: Fix errors and incorrect returns in atomic timed waits
The __detail::__wait_until function has a comment that should have been
removed when r16-1000-g225622398a9631 changed the return type from a
std::pair to a struct with three members.
The __atomic_wait_address_until_v and __atomic_wait_address_for_v
function templates are apparently never used or instantiated, because
they don't compile. This fixes them, but they're still unused. I plan
to make use of them in a later commit.
In __atomic_wait_address_until_v, __res.first in the return statement
should have also been changed when r16-1000-g225622398a9631 changed
__wait_result_type, and &__args should have been changed to just __args
by r16-988-g219bb905a60d95.
In __atomic_wait_address_for_v, the parameter is a copy & paste error
and should use chrono::duration, not chrono::time_point.
Fix __spin_until_impl so that the _M_has_val member of the result is
accurate. If the deadline has passed then it never enters the loop and
so never loads a fresh value, so _M_has_val should be false. There's
also a redundant clock::now() call in __spin_until_impl which can be
removed; we can reuse the call immediately before it.
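The intended behaviour of the spin loop can be sketched as below. This is a
guess at the shape of the logic, not the code in src/c++20/atomic.cc: the
struct, its members and the function name are all hypothetical. The point is
that has_val only becomes true once the loop body has actually loaded a value,
and that one clock reading per iteration is enough.

```cpp
#include <atomic>
#include <chrono>

// Hypothetical result type; the real __wait_result_type may differ.
struct spin_result_sketch
{
  bool has_val;    // true only if a fresh value was loaded
  bool timed_out;  // true if the deadline passed without a change
  int  val;        // last value seen (or 'old' if none was loaded)
};

template<typename Clock>
spin_result_sketch
spin_until_sketch(const std::atomic<int>& a, int old,
                  typename Clock::time_point deadline)
{
  spin_result_sketch res{false, true, old};
  // A single Clock::now() per iteration serves both the loop condition
  // and the deadline check.
  for (auto now = Clock::now(); now < deadline; now = Clock::now())
    {
      res.val = a.load(std::memory_order_acquire);
      res.has_val = true;
      if (res.val != old)
        {
          res.timed_out = false;
          break;
        }
    }
  return res;
}
```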
libstdc++-v3/ChangeLog:
* include/bits/atomic_timed_wait.h (__detail::__wait_until):
Remove incorrect comment.
(__atomic_wait_address_until_v): Do not take address of __args in
call to __detail::__wait_until. Fix return statement to refer to
member of __wait_result_type.
(__atomic_wait_address_for_v): Change parameter type from
time_point to duration.
* src/c++20/atomic.cc (__spin_until_impl): Fix incorrect
return value. Reuse result of first call to clock.
Jonathan Wakely [Thu, 29 May 2025 10:40:59 +0000 (11:40 +0100)]
libstdc++: Replace some implicit conversions in std::vector
This replaces two implicit conversions from ptrdiff_t to size_t with
explicit conversions that include unreachable hints for the ptrdiff_t
value not being negative.
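A minimal sketch of such an explicit conversion, assuming a GCC-compatible
compiler that provides __builtin_unreachable. The helper names are
hypothetical and are not the ones used in stl_vector.h.

```cpp
#include <cstddef>

// Convert a pointer difference to size_t, telling the optimizer that the
// value is never negative. Calling this with n < 0 is undefined behaviour.
inline std::size_t to_size_sketch(std::ptrdiff_t n) noexcept
{
  if (n < 0)
    __builtin_unreachable();
  return static_cast<std::size_t>(n);
}

// Example use: capacity of a buffer delimited by two pointers.
inline std::size_t
capacity_sketch(const int* start, const int* end_of_storage) noexcept
{ return to_size_sketch(end_of_storage - start); }
```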
libstdc++-v3/ChangeLog:
* include/bits/stl_vector.h (~_Vector_base): Add unreachable
hint for negative capacity and cast to size_t explicitly.
* include/bits/vector.tcc (vector::_M_realloc_append): Use
size() instead of end() - begin().
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jonathan Wakely [Tue, 27 May 2025 15:54:52 +0000 (16:54 +0100)]
libstdc++: Remove redundant macro checks in std.cc.in
__cpp_lib_any and __cpp_lib_chrono are defined unconditionally in C++20
and __cpp_lib_three_way_comparison and __cpp_lib_concepts depend on
front-end features which are definitely supported by GCC trunk and all
non-GCC compilers we care about.
libstdc++-v3/ChangeLog:
* src/c++23/std.cc.in: Remove redundant checks for feature test
macros that are always true.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
RISC-V: Use helper function to get FPR to VR move cost
Since the last patch introduced get_fr2vr_cost () to get the correct cost to move
data from a floating-point to a vector register, this patch replaces existing
uses of the constant FR2VR.
RISC-V: Add pattern for vector-scalar multiply-add/sub [PR119100]
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a vec_duplicate into a plus-mult or minus-mult RTL instruction.
Before this patch, we have two instructions, e.g.:
vfmv.v.f v6,fa0
vfmadd.vv v9,v6,v7
After, we get only one:
vfmadd.vf v9,fa0,v7
On SPEC2017's 503.bwaves_r, depending on the workload, the reduction in dynamic
instruction count varies from -4.66% to -4.75%.
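A source-level loop of the kind this pattern is aimed at might look like the sketch below. The function name is hypothetical, and whether the vectorizer actually emits vfmadd.vf for it depends on the target flags and the cost model.

```cpp
#include <cstddef>

// Hypothetical kernel: a scalar multiplied into a vector operand, then added
// to another vector operand. The scalar 'a' is what would otherwise be
// broadcast with a separate vec_duplicate.
void scale_add(float* out, const float* b, const float* c, float a,
               std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    out[i] = a * b[i] + c[i];
}
```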
PR target/119100
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*<optab>_vf_<mode>): Add new pattern to
combine vec_duplicate + vfm{add,sub}.vv into vfm{add,sub}.vf.
* config/riscv/riscv-opts.h (FPR2VR_COST_UNPROVIDED): Define.
* config/riscv/riscv-protos.h (get_fr2vr_cost): Declare function.
* config/riscv/riscv.cc (riscv_rtx_costs): Add cost model for MULT with
VEC_DUPLICATE.
(get_fr2vr_cost): New function.
* config/riscv/riscv.opt: Add new option --param=fpr2vr-cost.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop_data.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop_run.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f64.c: New test.
Jakub Jelinek [Tue, 3 Jun 2025 05:54:37 +0000 (07:54 +0200)]
libgomp: Fix up omp_target_memset-3.c test for C++ [PR120444]
The test PASSes for C, but FAILs for C++:
.../libgomp.c-c++-common/omp_target_memset-3.c: In function 'void test_it(void*, int, size_t)':
.../libgomp.c-c++-common/omp_target_memset-3.c:31:7: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith]
.../libgomp.c-c++-common/omp_target_memset-3.c:33:13: error: invalid conversion from 'void*' to 'int8_t*' {aka 'signed char*'} [-fpermissive]
.../libgomp.c-c++-common/omp_target_memset-3.c:10:19: note: initializing argument 1 of 'void init_val(int8_t*, int, size_t)'
.../libgomp.c-c++-common/omp_target_memset-3.c:37:14: error: invalid conversion from 'void*' to 'int8_t*' {aka 'signed char*'} [-fpermissive]
.../libgomp.c-c++-common/omp_target_memset-3.c:17:20: note: initializing argument 1 of 'void check_val(int8_t*, int, size_t)'
.../libgomp.c-c++-common/omp_target_memset-3.c:38:18: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith]
.../libgomp.c-c++-common/omp_target_memset-3.c:38:18: error: invalid conversion from 'void*' to 'int8_t*' {aka 'signed char*'} [-fpermissive]
.../libgomp.c-c++-common/omp_target_memset-3.c:17:20: note: initializing argument 1 of 'void check_val(int8_t*, int, size_t)'
.../libgomp.c-c++-common/omp_target_memset-3.c: In function 'int main()':
.../libgomp.c-c++-common/omp_target_memset-3.c:46:7: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith]
The following two-liner fixes that, tested on x86_64-linux and i686-linux.
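The two C++ problems in the diagnostics above (arithmetic on void * and the missing implicit conversion from void * to int8_t *) can be shown with a small sketch. alloc_bytes and fill_bytes are hypothetical names, and malloc stands in for the device allocator used by the real test.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical allocator wrapper: C converts void * implicitly, C++ needs
// the explicit cast.
inline int8_t* alloc_bytes(std::size_t n)
{
  return static_cast<int8_t*>(std::malloc(n));
}

// Hypothetical helper taking a typed pointer, so that 'ptr + offset' at the
// call site is ordinary pointer arithmetic instead of a GNU extension.
inline void fill_bytes(int8_t* ptr, int val, std::size_t count)
{
  for (std::size_t i = 0; i < count; ++i)
    ptr[i] = static_cast<int8_t>(val);
}
```

Keeping the variable typed as int8_t * from the point of allocation avoids both the warning and the error without casts at every use.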
2025-06-03 Jakub Jelinek <jakub@redhat.com>
PR libgomp/120444
* testsuite/libgomp.c-c++-common/omp_target_memset-3.c (test_it):
Change ptr argument type from void * to int8_t *.
(main): Change ptr variable type from void * to int8_t * and cast
omp_target_alloc result to the latter type.
Andrew Pinski [Mon, 2 Jun 2025 22:56:20 +0000 (15:56 -0700)]
switch-conversion: Mark CSWTCH as mergeable [PR120451]
When we have a smallish CSWTCH, it could be placed in the rodata.cst16
section so it can be merged with other constants across TUs.
The fix is simple; just mark the decl as mergeable (DECL_MERGEABLE).
DECL_MERGEABLE was added with r14-1500-g4d935f52b0d5c0 specifically
to improve these kinds of decls.
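For context, a switch of the following shape is the kind of code that switch conversion may turn into a load from a constant array (the CSWTCH decl). The function is a hypothetical example; whether the conversion happens depends on the optimization level and the case values.

```cpp
// Hypothetical lookup: dense case labels mapping to constants, a typical
// candidate for conversion into a small read-only table.
int days_in_month(int month_index)
{
  switch (month_index)
    {
    case 0: return 31;
    case 1: return 28;
    case 2: return 31;
    case 3: return 30;
    case 4: return 31;
    case 5: return 30;
    default: return 0;
    }
}
```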
PR tree-optimization/120451
gcc/ChangeLog:
* tree-switch-conversion.cc (switch_conversion::build_one_array): Mark
the newly created decl as mergeable.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/cswtch-6.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Alexandre Oliva [Mon, 2 Jun 2025 23:21:45 +0000 (20:21 -0300)]
[lra] force reg update after spilling to memory [PR120424]
In the added C++ testcase, a stack slot at a negative sp offset is
used to hold a value across a call.
There are a couple of causes that directly lead to this outcome:
- the -fstack-clash-protection and -fnon-call-exceptions options, which
cause arm_frame_pointer_required to flip from false to true when the
first pseudo gets spilled to memory;
- when the affected pseudo is spilled to memory, we fail to update lra
regno info, because the insns that reference it are already on the
lra_constraint_insn_stack.
There is another potentially-related issue:
- when we notice that the frame pointer can no longer be eliminated to
the stack pointer, we immediately clear can_eliminate, and also
prev_can_eliminate, but update_reg_eliminate relied on the latter to
tell that it needs to propagate a previous_offset to the
newly-selected elimination, or restore the original offsets.
This patch ensures that we update insn register info after spilling a
pseudo to memory, and enables update_reg_eliminate to recognize the
case in which a previously-preferred elimination is disabled
regardless of prev_can_eliminate.
for gcc/ChangeLog
PR rtl-optimization/120424
PR middle-end/118939
* lra-spills.cc (spill_pseudos): Update insn regno info.
* lra-eliminations.cc (update_reg_eliminate): Recognize
disabling of active elimination regardless of
prev_can_eliminate.
Robert Dubner [Mon, 2 Jun 2025 19:55:20 +0000 (15:55 -0400)]
cobol: Honor HAVE_CLOCK_GETTIME and HAVE_GETTIMEOFDAY. [PR119975]
These changes cause genapi.cc to use whichever of clock_gettime() or
gettimeofday() is available. This prevents compilation errors on
systems where clock_gettime() is not available.
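The selection described above can be sketched as below. This is not the genapi.cc code: now_nanoseconds is a hypothetical name, the sketch assumes a POSIX system, and the HAVE_* macros would normally come from a configure-generated header rather than being defined locally.

```cpp
#include <cstdint>

// Assumption for a self-contained sketch: if configure provided neither
// macro, pretend clock_gettime() was detected.
#if !defined(HAVE_CLOCK_GETTIME) && !defined(HAVE_GETTIMEOFDAY)
# define HAVE_CLOCK_GETTIME 1
#endif

#if defined(HAVE_CLOCK_GETTIME)
# include <time.h>
#elif defined(HAVE_GETTIMEOFDAY)
# include <sys/time.h>
#endif

// Hypothetical helper: wall-clock time in nanoseconds from whichever
// interface is available.
inline std::uint64_t now_nanoseconds()
{
#if defined(HAVE_CLOCK_GETTIME)
  timespec ts;
  clock_gettime(CLOCK_REALTIME, &ts);
  return std::uint64_t(ts.tv_sec) * 1000000000u + std::uint64_t(ts.tv_nsec);
#elif defined(HAVE_GETTIMEOFDAY)
  timeval tv;
  gettimeofday(&tv, nullptr);
  return std::uint64_t(tv.tv_sec) * 1000000000u
         + std::uint64_t(tv.tv_usec) * 1000u;
#else
# error "no supported time source"
#endif
}
```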
Iain Sandoe [Sat, 31 May 2025 15:13:40 +0000 (16:13 +0100)]
c++, coroutines: Some cleanups in build_actor.
We were incorrectly guarding all the frame cleanups on the
basis of frame_needs_free (which is always set for the present
code-gen since we have no allocation elision). The net result
was that the (incorrect) code was behaving as expected.
We built, but never used, a label for the frame destruction;
in practice it is never triggered independently of the promise
and argument copy destruction.
Finally there are a few instances where we had been building
expressions manually rather than using higher-level APIs.
gcc/cp/ChangeLog:
* coroutines.cc (build_actor_fn): Remove an unused
label, guard the frame deallocation correctly, use
simpler APIs to build if and return statements.
Jonathan Wakely [Wed, 21 May 2025 19:12:50 +0000 (20:12 +0100)]
libstdc++: Implement LWG 2439 for std::unique_copy [PR120386]
The current overload set for __unique_copy handles three cases:
- The input range uses forward iterators, the output range does not.
This is the simplest case, and can just compare adjacent elements of
the input range.
- Neither the input range nor output range use forward iterators.
This requires a local variable copied from the input range and updated
by assigning each element to the local variable.
- The output range uses forward iterators.
For this case we compare the current element from the input range with
the element just written to the output range.
There are two problems with this implementation. Firstly, the third case
assumes that the value type of the output range can be compared to the
value type of the input range, which might not be possible at all, or
might be possible but give different results to comparing elements of
the input range. This is the problem identified in LWG 2439.
Secondly, the third case is used when both ranges use forward iterators,
even though the first case could (and should) be used. This means that
we compare elements from the output range instead of the input range,
with the problems described above (either not well-formed, or might give
the wrong results).
The cause of the second problem is that the overload for the first case
looks like:
When the output range uses forward iterators this overload cannot be
used, because forward_iterator_tag does not inherit from
output_iterator_tag, so is not convertible to it.
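The overload-resolution effect can be reproduced with a reduced sketch (C++17). The three pick overloads are hypothetical stand-ins for the three cases listed earlier, assuming each one dispatches on a pair of tags as the text describes.

```cpp
#include <iterator>
#include <type_traits>

// forward_iterator_tag derives from input_iterator_tag, but not from
// output_iterator_tag.
static_assert(!std::is_convertible_v<std::forward_iterator_tag,
                                     std::output_iterator_tag>);
static_assert(std::is_convertible_v<std::forward_iterator_tag,
                                    std::input_iterator_tag>);

// Hypothetical stand-ins for the three original cases.
inline int pick(std::forward_iterator_tag, std::output_iterator_tag)
{ return 1; }  // forward input, non-forward output
inline int pick(std::input_iterator_tag, std::output_iterator_tag)
{ return 2; }  // neither range is forward
inline int pick(std::input_iterator_tag, std::forward_iterator_tag)
{ return 3; }  // forward output
```

Calling pick with two forward_iterator_tag arguments selects the third overload, because it is the only viable one, even though the first case would be the better algorithm.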
To fix these problems we need to implement the resolution of LWG 2439 so
that the third case is only used when the value types of the two ranges
are the same. This ensures that the comparisons are well behaved. We
also need to ensure that the first case is used when both ranges use
forward iterators.
This change replaces a single step of tag dispatching to choose between
three overloads with two steps of tag dispatching, choosing between two
overloads at each step. The first step dispatches based on the iterator
category of the input range, ignoring the category of the output range.
The second step only happens when the input range uses non-forward
iterators, and dispatches based on the category of the output range and
whether the value type of the two ranges is the same. So now the cases
that are handled are:
- The input range uses forward iterators.
- The output range uses non-forward iterators or a different value type.
- The output range uses forward iterators and has the same value type.
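A simplified sketch of the two-step dispatch, written for C++17, is given below. It is not the stl_algo.h code: the names in namespace sketch are hypothetical, the concept checks and debug assertions are omitted, and the predicate is always called as pred(previous, current).

```cpp
#include <iterator>
#include <type_traits>

namespace sketch {

// Step 1, forward input: compare adjacent elements of the input range.
template<typename FwdIt, typename OutIt, typename Pred>
OutIt unique_copy_impl(FwdIt first, FwdIt last, OutIt out, Pred pred,
                       std::forward_iterator_tag)
{
  FwdIt prev = first;
  *out = *first;
  while (++first != last)
    if (!pred(*prev, *first))
      {
        prev = first;
        *++out = *first;
      }
  return ++out;
}

// Step 2a: output is non-forward or has a different value type, so keep a
// local copy of the last element written.
template<typename InIt, typename OutIt, typename Pred>
OutIt unique_copy_1(InIt first, InIt last, OutIt out, Pred pred,
                    std::false_type)
{
  typename std::iterator_traits<InIt>::value_type value = *first;
  *out = value;
  while (++first != last)
    if (!pred(value, *first))
      {
        value = *first;
        *++out = value;
      }
  return ++out;
}

// Step 2b: output is forward with the same value type, so the element just
// written can be read back and compared.
template<typename InIt, typename FwdOut, typename Pred>
FwdOut unique_copy_1(InIt first, InIt last, FwdOut out, Pred pred,
                     std::true_type)
{
  *out = *first;
  while (++first != last)
    if (!pred(*out, *first))
      *++out = *first;
  return ++out;
}

// Step 1, non-forward input: dispatch again on the output range.
template<typename InIt, typename OutIt, typename Pred>
OutIt unique_copy_impl(InIt first, InIt last, OutIt out, Pred pred,
                       std::input_iterator_tag)
{
  using in_val = typename std::iterator_traits<InIt>::value_type;
  using out_val = typename std::iterator_traits<OutIt>::value_type;
  using out_cat = typename std::iterator_traits<OutIt>::iterator_category;
  using reread = std::bool_constant<
    std::is_base_of_v<std::forward_iterator_tag, out_cat>
    && std::is_same_v<in_val, out_val>>;
  return unique_copy_1(first, last, out, pred, reread{});
}

// Entry point: only the input range category is passed to the first step.
template<typename InIt, typename OutIt, typename Pred>
OutIt unique_copy(InIt first, InIt last, OutIt out, Pred pred)
{
  if (first == last)
    return out;
  return unique_copy_impl(first, last, out, pred,
    typename std::iterator_traits<InIt>::iterator_category{});
}

} // namespace sketch
```

Because the first step looks only at the input category, a call with forward iterators on both sides always reaches the adjacent-element overload.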
For the second case, the old code used __gnu_cxx::__ops::__iter_comp_val
to wrap the predicate in another level of indirection. That seems
unnecessary, as we can just use a pointer to the local variable instead
of an iterator referring to it.
During review of this patch, it was discovered that all known
implementations of std::unique_copy and ranges::unique_copy (except
cmcstl2) disagree with the specification. The standard (and the SGI STL
documentation) say that it uses pred(*i, *(i-1)) but everybody uses
pred(*(i-1), *i) instead, and apparently always has done. This patch
adjusts ranges::unique_copy to be consistent.
In the first __unique_copy overload, the local copy of the iterator is
changed to be the previous position not the next one, so that we use
++first as the "next" iterator, consistent with the logic used in the
other overloads. This makes it easier to compare them, because we aren't
using pred(*first, *next) in one and pred(something, *first) in the
others. Instead it's always pred(something, *first).
libstdc++-v3/ChangeLog:
PR libstdc++/120386
* include/bits/ranges_algo.h (__unique_copy_fn): Reorder
arguments for third case to match the first two cases.
* include/bits/stl_algo.h (__unique_copy): Replace three
overloads with two, depending only on the iterator category of
the input range. Dispatch to __unique_copy_1 for the
non-forward case.
(__unique_copy_1): New overloads for the case where the input
range uses non-forward iterators.
(unique_copy): Only pass the input range category to
__unique_copy.
* testsuite/25_algorithms/unique_copy/lwg2439.cc: New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jason Merrill [Mon, 2 Jun 2025 14:09:07 +0000 (10:09 -0400)]
c++: __is_destructible fixes [PR107600]
destructible_expr was wrongly assuming that TO is a class type.
When is_xible_helper was added in r8-742 it returned early for abstract
class types, which is correct for __is_constructible, but not
__is_assignable or (now) __is_destructible.
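The expected trait results for the two situations mentioned above (non-class types and abstract classes) can be written down with the standard library traits in C++17. Shape is a hypothetical type; the sketch uses the std:: traits rather than the built-ins directly.

```cpp
#include <type_traits>

// Hypothetical abstract class with a public virtual destructor.
struct Shape
{
  virtual double area() const = 0;
  virtual ~Shape() = default;
};

// Non-class types: scalars and references are destructible, void and
// arrays of unknown bound are not.
static_assert(std::is_destructible_v<int>);
static_assert(std::is_destructible_v<int&>);
static_assert(!std::is_destructible_v<void>);
static_assert(!std::is_destructible_v<int[]>);

// Abstract class: cannot be constructed, but can still be assigned to
// and destroyed.
static_assert(std::is_abstract_v<Shape>);
static_assert(!std::is_constructible_v<Shape>);
static_assert(std::is_assignable_v<Shape&, const Shape&>);
static_assert(std::is_destructible_v<Shape>);
```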
PR c++/107600
gcc/cp/ChangeLog:
* method.cc (destructible_expr): Handle non-classes.
(constructible_expr): Check for abstract class here...
(is_xible_helper): ...not here.
Tomasz Kamiński [Wed, 28 May 2025 09:16:22 +0000 (11:16 +0200)]
libstdc++: Pass small trivial types by value in polymorphic wrappers
This patch adjusts the passing of parameters for move_only_function,
copyable_function and function_ref. Types that are declared as being passed
by value in the signature template argument are passed by value to the invoker
when they are small (at most two pointers), trivially move constructible and
trivially destructible. The latter guarantees that passing them by value has no
user-visible side effects.
In particular, this extends the set of types forwarded by value, which was
previously limited to scalars, to also include specializations of std::span and
std::string_view, and similar standard and program-defined types.
Checking the suitability of the parameter types requires the types to be complete.
As a consequence, the implementation imposes requirements on instantiation of
move_only_function and copyable_function. To avoid producing the errors from
the implementation details, a static assertion was added to partial
specializations of copyable_function, move_only_function and function_ref.
The static assertion uses the existing __is_complete_or_unbounded, as array type
parameters automatically decay in a function type.
The standard already specifies in [res.on.functions] p2.5 that instantiating these
partial specializations with incomplete types leads to undefined behavior.
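The selection rule can be approximated by the C++17 sketch below. pass_by_value_v and param_t are hypothetical names, not the funcwrap.h internals, and the sketch assumes the usual layout of std::string_view as a pointer plus a length.

```cpp
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical trait: small, trivially move constructible and trivially
// destructible types can be passed by value. sizeof requires T to be
// complete, which is why completeness has to be checked first.
template<typename T>
inline constexpr bool pass_by_value_v =
  sizeof(T) <= 2 * sizeof(void*)
  && std::is_trivially_move_constructible_v<T>
  && std::is_trivially_destructible_v<T>;

// Hypothetical parameter type for a by-value signature parameter T.
template<typename T>
using param_t = std::conditional_t<pass_by_value_v<T>, T, T&&>;

static_assert(std::is_same_v<param_t<int>, int>);
static_assert(std::is_same_v<param_t<std::string_view>, std::string_view>);
static_assert(std::is_same_v<param_t<std::string>, std::string&&>);
```

std::string falls back to an rvalue reference because its move constructor is not trivial, so copying it into the invoker would be observable.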
libstdc++-v3/ChangeLog:
* include/bits/funcwrap.h (__polyfunc::__pass_by_rref): Define.
(__polyfunc::__param_t): Update to use __pass_by_rref.
* include/bits/cpyfunc_impl.h: Assert that all parameter types
are complete.
* include/bits/funcref_impl.h: Likewise.
* include/bits/mofunc_impl.h: Likewise.
* testsuite/20_util/copyable_function/call.cc: New test.
* testsuite/20_util/function_ref/call.cc: New test.
* testsuite/20_util/move_only_function/call.cc: New test.
* testsuite/20_util/copyable_function/conv.cc: New test.
* testsuite/20_util/function_ref/conv.cc: New test.
* testsuite/20_util/move_only_function/conv.cc: New test.
* testsuite/20_util/copyable_function/incomplete_neg.cc: New test.
* testsuite/20_util/function_ref/incomplete_neg.cc: New test.
* testsuite/20_util/move_only_function/incomplete_neg.cc: New test.
Reviewed-by: Patrick Palka <ppalka@redhat.com> Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
This patch implements C++26 std::polymorphic as specified in P3019 with
the amendment to move assignment from LWG 4251.
The implementation always allocates the stored object on the heap. The manager
function (_M_manager) is similarly kept with the object (polymorphic::_Obj),
which reduces the size of the polymorphic to the size of a single pointer plus
the allocator (which is declared with [[no_unique_address]]).
The implementation does not use small-object optimization (SSO). We may
consider adding this in the future, as SSO is allowed by the standard. However,
storing any polymorphic object will require providing space for two pointers
(manager function and vtable pointer) and user-declared data members.
PR libstdc++/119152
libstdc++-v3/ChangeLog:
* include/bits/indirect.h (std::polymorphic, pmr::polymorphic)
[__glibcxx_polymorphic]: Define.
* include/bits/version.def (polymorphic): Define.
* include/bits/version.h: Regenerate.
* include/std/memory: Define __cpp_lib_polymorphic.
* testsuite/std/memory/polymorphic/copy.cc: New test.
* testsuite/std/memory/polymorphic/copy_alloc.cc: New test.
* testsuite/std/memory/polymorphic/ctor.cc: New test.
* testsuite/std/memory/polymorphic/ctor_poly.cc: New test.
* testsuite/std/memory/polymorphic/incomplete.cc: New test.
* testsuite/std/memory/polymorphic/invalid_neg.cc: New test.
* testsuite/std/memory/polymorphic/move.cc: New test.
* testsuite/std/memory/polymorphic/move_alloc.cc: New test.
Co-authored-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Stafford Horne [Sat, 31 May 2025 05:54:58 +0000 (06:54 +0100)]
or1k: Fix struct return test
In or1k, structs are returned from functions using the memory address
passed in r3. In the current version of GCC the struct stores changed
from using r11 (the return value) to using r3 (the incoming memory address).
Both are valid.
Adjust the test to match what GCC is producing now.
Stafford Horne [Mon, 12 May 2025 20:47:21 +0000 (21:47 +0100)]
or1k: Support long jump offsets with -mcmodel=large
The -mcmodel=large option was originally added to handle generation of
large binaries with large PLTs. However, when compiling the Linux
kernel with allyesconfig the output binary is so large that the jump
instruction 26-bit immediate is not large enough to store the jump
offset to some symbols when linking. Example error:
relocation truncated to fit: R_OR1K_INSN_REL_26 against symbol `do_fpe_trap' defined in .text section in arch/openrisc/kernel/traps.o
We fix this by forcing jump offsets to registers when -mcmodel=large.
Note, to get the Linux kernel allyesconfig config to work with OpenRISC,
this patch is needed along with some other patches to the Linux hand
coded assembly bits.
gcc/ChangeLog:
* config/or1k/predicates.md (call_insn_operand): Add condition
to not allow symbol_ref operands with TARGET_CMODEL_LARGE.
* config/or1k/or1k.opt: Document new -mcmodel=large
implications.
* doc/invoke.texi: Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/or1k/call-1.c: New test.
* gcc.target/or1k/got-1.c: New test.
Some tests have 'dg-do link' but currently require 'tls' which is a
compile-only check.
In some configurations of arm-none-eabi, the 'tls' effective-target
can be successful although these tests fail to link with
undefined reference to `__aeabi_read_tp'
This patch adds a new tls_link effective target which makes sure we can
build an executable.
Kito Cheng [Wed, 28 May 2025 09:59:11 +0000 (17:59 +0800)]
RISC-V: Adjust build rule for gen-riscv-ext-opt and gen-riscv-ext-texi
Separate the build rules into compile and link stages to make sure
BUILD_LINKERFLAGS and BUILD_LDFLAGS are applied correctly.
We hit this issue when trying to build GCC with a non-system-default g++
that uses a newer libstdc++: we then got an error from using the older
libstdc++ from the system, which should not happen if we link with
-static-libgcc and -static-libstdc++.
gcc/ChangeLog:
* config/riscv/t-riscv: Adjust build rule for gen-riscv-ext-opt
and gen-riscv-ext-texi.
Kito Cheng [Tue, 27 May 2025 02:10:15 +0000 (10:10 +0800)]
c++tools: Don't check --enable-default-pie.
`--enable-default-pie` is an option to specify whether to enable
position-independent executables by default for `target`.
However, c++tools is built for `host`, so it should just follow
`--enable-host-pie` option to determine whether to build with
position-independent executables or not.
NOTE:
I checked PR 98324 and built with the same configure options
(`--enable-default-pie` and lto bootstrap) on x86-64 linux to make sure
it won't cause the same problem.
OpenMP: Handle more cases in user/condition selector
Tobias had noted that the C front end was not treating C23 constexprs
as constant in the user/condition selector property, which led to
missed opportunities to resolve metadirectives at parse time.
Additionally neither C nor C++ was permitting the expression to have
pointer or floating-point type -- the former being a common idiom in
other C/C++ conditional expressions. By using the existing front-end
hooks for the implicit conversion to bool in conditional expressions,
we also get free support for using a C++ class object that has a bool
conversion operator in the user/condition selector.
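The conversions in question are the ordinary contextual conversions to bool that C++ already applies in an if condition. The C++14 sketch below shows them outside of any OpenMP directive; Toggle and count_true are hypothetical names, and no metadirective syntax is shown.

```cpp
// Hypothetical class with a bool conversion operator.
struct Toggle
{
  bool on;
  explicit constexpr operator bool() const { return on; }
};

// Each condition below relies on the implicit conversion to bool.
constexpr int count_true(const int* ptr, double scale, Toggle t)
{
  int n = 0;
  if (ptr)    // pointer: null converts to false
    ++n;
  if (scale)  // floating point: zero converts to false
    ++n;
  if (t)      // class type: uses the conversion operator
    ++n;
  return n;
}

constexpr int one = 1;
static_assert(count_true(&one, 0.0, Toggle{true}) == 2, "pointer and class");
static_assert(count_true(nullptr, 1.5, Toggle{false}) == 1, "float only");
```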
gcc/c/ChangeLog
* c-parser.cc (c_parser_omp_context_selector): Call
convert_lvalue_to_rvalue and c_objc_common_truthvalue_conversion
on the expression for OMP_TRAIT_PROPERTY_BOOL_EXPR.
gcc/cp/ChangeLog
* cp-tree.h (maybe_convert_cond): Declare.
* parser.cc (cp_parser_omp_context_selector): Call
maybe_convert_cond and fold_build_cleanup_point_expr on the
expression for OMP_TRAIT_PROPERTY_BOOL_EXPR.
* pt.cc (tsubst_omp_context_selector): Likewise.
* semantics.cc (maybe_convert_cond): Remove static declaration.
Jerry DeLisle [Sat, 31 May 2025 15:57:22 +0000 (08:57 -0700)]
Fortran: Fix handling of parsed format strings.
Previously, parsed format strings with errors were being cached, so
that subsequent uses of the format string were not being
checked for errors.
PR libfortran/119856
libgfortran/ChangeLog:
* io/format.c (parse_format_list): Set the fmt->error
message for missing comma.
(parse_format): Do not cache the parsed format string
if a previous error occurred.
Andrew Pinski [Sat, 31 May 2025 22:10:14 +0000 (15:10 -0700)]
forwprop: Manually rename the virtual mem op for complex and vector loads prop
There are two places where forwprop replaces an original load with a few different loads.
Both can set the vuse manually instead of relying on update_ssa.
One is a complex load followed by REAL/IMAG_PART only,
and the other is very similar but for vector loads followed by BIT_FIELD_REF.
Since this was the last place that needed to handle updating the ssa form,
also remove TODO_update_ssa from the pass.
gcc/ChangeLog:
* tree-ssa-forwprop.cc (optimize_vector_load): Set the vuse manually
on the new load statements. Also remove forward declaration since
the definition is before the first use.
(pass_forwprop::execute): Likewise for complex loads.
(pass_data_forwprop): Remove TODO_update_ssa.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Martin Uecker [Thu, 29 May 2025 15:17:12 +0000 (17:17 +0200)]
c: fix ICE related to tagged types with attributes in diagnostics [PR120380]
get_aka_type will create a new type for diagnostics, but for tagged types
attributes will then be ignored with a warning. This can lead to reentering
warning code which leads to an ICE. Fix this by ignoring the attributes
for tagged types.
PR c/120380
gcc/c/ChangeLog:
* c-objc-common.cc (get_aka_type): Ignore attributes for tagged types.
gcc/testsuite/ChangeLog:
* gcc.dg/pr120380.c: New test.
xtensa: Remove an unnecessary constraint modifier from movsf_internal insn pattern
In this case, there is no need to consider reloading when memory is the
destination. On the other hand, when memory is the source, reloading a
read from the constant pool becomes a double indirection and should obviously
be avoided.
gcc/ChangeLog:
* config/xtensa/xtensa.md (movsf_internal):
Remove destination side constraint modifier '^' in the third
alternative.
Implement TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS in order to avoid using
the ALL_REGS rclass, as is done on other targets, instead of overestimating
the move cost between integer and FP registers.
gcc/ChangeLog:
* config/xtensa/xtensa.cc
(xtensa_ira_change_pseudo_allocno_class):
New prototype and function.
(TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS): Define macro.
(xtensa_register_move_cost):
Change the move cost between integer and FP registers to a value
based on actual behavior, i.e. 2, the default and the same as
the move cost between integer registers.