Patrick Palka [Tue, 28 Jan 2025 14:27:02 +0000 (09:27 -0500)]
c++: friend vs inherited guide confusion [PR117855]
We recently started using the lang_decl_fn::context field to track
inheritedness of a deduction guide (for C++23 inherited CTAD). This
new overloading of the field accidentally made DECL_FRIEND_CONTEXT
return non-NULL for inherited guides, which breaks the below testcase
during overload resolution with an inherited guide.
This patch fixes this by refining DECL_FRIEND_CONTEXT appropriately.
Tamar Christina [Thu, 16 Jan 2025 19:23:50 +0000 (19:23 +0000)]
AArch64: don't override march to assembler with mcpu if march is specified [PR110901]
When both -mcpu and -march are specified, the value of -march wins out.
This is done correctly for the calls to cc1 and for the assembler directives we
put out in assembly files.
However, in the call to as we don't do this and instead use the arch from the
cpu. As a result, GCC cannot reliably be used to compile
assembly files which don't have a .arch directive.
This is quite common with .S files which use macros to selectively enable
code paths based on what the preprocessor sees.
The fix is to change MCPU_TO_MARCH_SPEC to not override the march if one
is already specified.
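The intended precedence can be sketched as follows. This is only an illustration of the rule, not the driver spec itself: `assembler_arch` and its parameters are hypothetical names, and the real change is made in the MCPU_TO_MARCH_SPEC spec string.

```cpp
#include <string_view>

// Hypothetical helper: choose the architecture string handed to the assembler.
// `march` is the value of -march (empty if the option was not given);
// `arch_from_mcpu` is the architecture implied by -mcpu.
constexpr std::string_view assembler_arch(std::string_view march,
                                          std::string_view arch_from_mcpu)
{
    // An explicit -march wins; -mcpu only supplies the default.
    return march.empty() ? arch_from_mcpu : march;
}
```

With this rule, the architecture derived from -mcpu is used only when no -march was given.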
gcc/ChangeLog:
PR target/110901
* config/aarch64/aarch64.h (MCPU_TO_MARCH_SPEC): Don't override if
march is set.
gcc/testsuite/ChangeLog:
PR target/110901
* gcc.target/aarch64/options_set_29.c: New test.
This has worked great but was only added for homogeneous systems.
However, the same thing works for big.LITTLE, as in such systems the cores must
have the same extensions; otherwise it fundamentally doesn't work,
i.e. task migration from one core to the other wouldn't work.
This extends the same handling to non-homogeneous systems.
gcc/ChangeLog:
PR target/113257
* config/aarch64/driver-aarch64.cc (get_cpu_from_id, DEFAULT_CPU): New.
(host_detect_local_cpu): Use it.
gcc/testsuite/ChangeLog:
PR target/113257
* gcc.target/aarch64/cpunative/info_34: New test.
* gcc.target/aarch64/cpunative/native_cpu_34.c: New test.
* gcc.target/aarch64/cpunative/info_35: New test.
* gcc.target/aarch64/cpunative/native_cpu_35.c: New test.
The FakeStack flag is not zeroed out when can_store_by_pieces()
returns false. Over time, this causes FakeStack::Allocate() to perform
the maximum number of loop iterations, significantly slowing down the
instrumented program.
As reported in PR118185, std::ranges::clamp does not correctly forward
the projected value to the comparator. Add the missing forward.
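A minimal sketch of what forwarding the projected value means, assuming a simplified clamp. `clamp_sketch`, `RvalueLess` and `Twice` are hypothetical names; this is not the library's `__clamp_fn` and it imposes none of the constraints of std::ranges::clamp.

```cpp
#include <utility>

// Simplified clamp: the projected value is bound once and then forwarded to
// the comparator with its original value category.
template <class T, class Comp, class Proj>
constexpr const T& clamp_sketch(const T& val, const T& low, const T& high,
                                Comp comp, Proj proj)
{
    auto&& proj_val = proj(val);
    if (comp(std::forward<decltype(proj_val)>(proj_val), proj(low)))
        return low;
    if (comp(proj(high), std::forward<decltype(proj_val)>(proj_val)))
        return high;
    return val;
}

// Comparator that only accepts rvalues; it would not be callable if the
// projected value were passed as a plain lvalue.
struct RvalueLess {
    constexpr bool operator()(int&& a, int&& b) const { return a < b; }
};

// Projection that returns by value.
struct Twice {
    constexpr int operator()(const int& x) const { return 2 * x; }
};
```

Passing `proj_val` directly instead of forwarding it would make the calls to `RvalueLess` fail to compile, since an lvalue cannot bind to `int&&`.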
libstdc++-v3/ChangeLog:
PR libstdc++/118185
PR libstdc++/100249
* include/bits/ranges_algo.h (__clamp_fn): Correctly forward the
projected value to the comparator.
* testsuite/25_algorithms/clamp/118185.cc: New test.
Signed-off-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
(cherry picked from commit b342614139c0a981b369176980663941b9c27f39)
Patrick Palka [Thu, 16 Jan 2025 21:40:08 +0000 (16:40 -0500)]
c++: explicit spec of constrained member tmpl [PR107522]
When defining an explicit specialization of a constrained member template
(of a class template) such as f and g in the below testcase, the
DECL_TEMPLATE_PARMS of the corresponding TEMPLATE_DECL are partially
instantiated, whereas its associated constraints are carried over
from the original template and thus are in terms of the original
DECL_TEMPLATE_PARMS. So during normalization for such an explicit
specialization we need to consider the (parameters of) the most general
template, since that's what the constraints are in terms of and since we
always use the full set of template arguments during satisfaction.
PR c++/107522
gcc/cp/ChangeLog:
* constraint.cc (get_normalized_constraints_from_decl): Use the
most general template for an explicit specialization of a
member template.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-explicit-spec7.C: New test.
Harald Anlauf [Sun, 19 Jan 2025 20:06:56 +0000 (21:06 +0100)]
Fortran: do not copy back for parameter actual arguments [PR81978]
When an array is packed for passing as an actual argument, and the array
has the PARAMETER attribute (i.e., it is a named constant that can reside
in read-only memory), do not copy back (unpack) from the temporary.
PR fortran/81978
gcc/fortran/ChangeLog:
* trans-array.cc (gfc_conv_array_parameter): Do not copy back data
if actual array parameter has the PARAMETER attribute.
* trans-expr.cc (gfc_conv_subref_array_arg): Likewise.
Marek Polacek [Mon, 25 Nov 2024 14:45:13 +0000 (09:45 -0500)]
c++: ICE with nested anonymous union [PR117153]
In a template, for
union {
union {
T d;
};
};
build_anon_union_vars creates a malformed COMPONENT_REF: we have no
DECL_NAME for the nested anon union so we create something like "object.".
Most of the front end doesn't seem to care, but if such a tree gets to
potential_constant_expression, it can cause a crash.
We can use FIELD directly for the COMPONENT_REF's member. tsubst_stmt
should build up a proper one in:
if (VAR_P (decl) && !DECL_NAME (decl)
&& ANON_AGGR_TYPE_P (TREE_TYPE (decl)))
/* Anonymous aggregates are a special case. */
finish_anon_union (decl);
PR c++/117153
gcc/cp/ChangeLog:
* decl2.cc (build_anon_union_vars): Use FIELD for the second operand
of a COMPONENT_REF.
gcc/testsuite/ChangeLog:
* g++.dg/other/anon-union6.C: New test.
* g++.dg/other/anon-union7.C: New test.
Peter Bergner [Thu, 16 Jan 2025 16:49:45 +0000 (10:49 -0600)]
rs6000: Fix loop limit for built-in constant checking
The loop checking for built-in constant operand restrictions was missing
some operands due to the loop limit being too small. Fixing that exposed
a testsuite failure which is caused by a typo in the pmxvi4ger8pp definition
where we had made the PMASK field too small.
2025-01-16 Peter Bergner <bergner@linux.ibm.com>
gcc/
* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin): Use correct
array size for the loop limit.
* config/rs6000/rs6000-builtins.def: Fix field size for PMASK operand.
as a nop. This PR shows that that isn't always correct.
The compare in the set above is between two 0/1 booleans (at least
on STORE_FLAG_VALUE==1 targets), whereas the unknown comparison that
produced the incoming (reg:CC cc) is unconstrained; it could be between
arbitrary integers, or even floats. The fold is therefore replacing a
cc that is valid for both signed and unsigned comparisons with one that
is only known to be valid for signed comparisons.
(gt (compare (gt cc 0) (lt cc 0)) 0)
does simplify to:
(gt cc 0)
but:
(gtu (compare (gt cc 0) (lt cc 0)) 0)
does not simplify to:
(gtu cc 0)
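The difference can be checked with a small model in C++. This is a sketch that assumes cc came from comparing two integers x and y on a STORE_FLAG_VALUE==1 target; the function names are hypothetical.

```cpp
// (gt (compare (gt cc 0) (lt cc 0)) 0): a signed comparison of two 0/1 values.
constexpr bool outer_gt(int x, int y)
{
    int gt = x > y, lt = x < y;
    return gt > lt;
}

// (gtu (compare (gt cc 0) (lt cc 0)) 0): an unsigned comparison of the same
// two 0/1 values, which still encodes the signed ordering of x and y.
constexpr bool outer_gtu(int x, int y)
{
    unsigned gt = x > y, lt = x < y;
    return gt > lt;
}

// (gtu cc 0): an unsigned comparison of the original operands.
constexpr bool inner_gtu(int x, int y)
{
    return static_cast<unsigned>(x) > static_cast<unsigned>(y);
}
```

For x = -1 and y = 1, `outer_gtu` is false while `inner_gtu` is true, so the two forms are not interchangeable.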
The optimisation didn't come with a testcase, but it was added for
i386's cmpstrsi, now cmpstrnsi. That probably doesn't matter as much
as it once did, since it's now conditional on -minline-all-stringops.
But the patch is almost 25 years old, so whatever the original
motivation was, it seems likely that other things now rely on it.
It therefore seems better to try to preserve the optimisation on rtl
rather than get rid of it. To do that, we need to look at how the
result of the outer compare is used. We'd therefore be looking at four
instructions (the gt, the lt, the compare, and the use of the compare),
but combine already allows that for 3-instruction combinations thanks
to:
/* If the source is a COMPARE, look for the use of the comparison result
and try to simplify it unless we already have used undobuf.other_insn. */
When applied to boolean inputs, a comparison operator is
effectively a boolean logical operator (AND, ANDNOT, XOR, etc.).
simplify_logical_relational_operation already had code to simplify
logical operators between two comparison results, but:
* It only handled IOR, which doesn't cover all the cases needed here.
The others are easily added.
* It treated comparisons of integers as having an ORDERED/UNORDERED result.
Therefore:
* it would not treat "true for LT + EQ + GT" as "always true" for
comparisons between integers, because the mask excluded the UNORDERED
condition.
* it would try to convert "true for LT + GT" into LTGT even for comparisons
between integers. To prevent an ICE later, the code used:
/* Many comparison codes are only valid for certain mode classes. */
if (!comparison_code_valid_for_mode (code, mode))
return 0;
However, this used the wrong mode, since "mode" is here the integer
result of the comparisons (and the mode of the IOR), not the mode of
the things being compared. Thus the effect was to reject all
floating-point-only codes, even when comparing floats.
I think instead the code should detect whether the comparison is between
integer values and remove UNORDERED from consideration if so. It then
always produces a valid comparison (or an always true/false result),
and so comparison_code_valid_for_mode is not needed. In particular,
"true for LT + GT" becomes NE for comparisons between integers but
remains LTGT for comparisons between floats.
* There was a missing check for whether the comparison inputs had
side effects.
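The mask-based reasoning for the IOR case can be sketched as follows. The bit values, the `Folded` enum and `fold_ior` are assumptions made for illustration and do not reproduce the code in simplify-rtx.cc.

```cpp
// Illustrative encoding: one bit per outcome that makes a comparison true.
constexpr unsigned kUnordered = 1, kEq = 2, kGt = 4, kLt = 8;

enum class Folded { AlwaysFalse, AlwaysTrue, NE, LTGT, Other };

// Combine two comparison masks with IOR. For integer operands the UNORDERED
// outcome cannot occur, so it is removed from consideration.
constexpr Folded fold_ior(unsigned mask0, unsigned mask1, bool integer_operands)
{
    const unsigned all = integer_operands ? (kLt | kGt | kEq)
                                          : (kLt | kGt | kEq | kUnordered);
    const unsigned mask = (mask0 | mask1) & all;
    if (mask == 0) return Folded::AlwaysFalse;
    if (mask == all) return Folded::AlwaysTrue;
    if (mask == (kLt | kGt)) return integer_operands ? Folded::NE : Folded::LTGT;
    return Folded::Other;
}
```

Under this model "true for LT + EQ + GT" folds to always true for integers but not for floats, and "true for LT + GT" becomes NE for integers and LTGT for floats.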
While there, it also seemed worth extending
simplify_logical_relational_operation to unsigned comparisons, since
that makes the testing easier.
As far as that testing goes: the patch exhaustively tests all
combinations of integer comparisons in:
(cmp1 (cmp2 X Y) (cmp3 X Y))
for the 10 integer comparisons, giving 1000 fold attempts in total.
It then tries all combinations of (X in {-1,0,1} x Y in {-1,0,1})
on the result of the fold, giving 9 checks per fold, or 9000 in total.
That's probably more than is typical for self-tests, but it seems to
complete in negligible time, even for -O0 builds.
gcc/
PR rtl-optimization/117186
* rtl.h (simplify_context::simplify_logical_relational_operation): Add
an invert0_p parameter.
* simplify-rtx.cc (unsigned_comparison_to_mask): New function.
(mask_to_unsigned_comparison): Likewise.
(comparison_code_valid_for_mode): Delete.
(simplify_context::simplify_logical_relational_operation): Add
an invert0_p parameter. Handle AND and XOR. Handle unsigned
comparisons. Handle always-false results. Ignore the low bit
of the mask if the operands are always ordered and remove the
then-redundant check of comparison_code_valid_for_mode. Check
for side-effects in the operands before simplifying them away.
(simplify_context::simplify_binary_operation_1): Remove
simplification of (compare (gt ...) (lt ...)) and instead...
(simplify_context::simplify_relational_operation_1): ...handle
comparisons of comparisons here.
(test_comparisons): New function.
(test_scalar_ops): Call it.
gcc/testsuite/
PR rtl-optimization/117186
* gcc.dg/torture/pr117186.c: New test.
* gcc.target/aarch64/pr117186.c: Likewise.
aarch64: Detect word-level modification in early-ra [PR118184]
REGMODE_NATURAL_SIZE is set to 64 bits for everything except
VLA SVE modes. This means that it's possible to modify (say)
the highpart of a TI pseudo or a V2DI pseudo independently
of the lowpart. Modifying such highparts requires a reload
if the highpart ends up in the upper 64 bits of an FPR,
since RTL semantics do not allow the highpart of a single
hard register to be modified independently of the lowpart.
early-ra missed a check for this case, which meant that it
effectively treated an assignment to (subreg:DI (reg:TI R) 0)
as an assignment to the whole of R.
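Why such an assignment is a read-modify-write rather than a full definition can be modelled as below. This is a sketch that assumes a little-endian target where subreg offset 0 is the low word; `TIValue` and `set_low_word` are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical model of a 128-bit (TI) pseudo as two 64-bit words.
struct TIValue {
    std::uint64_t low;
    std::uint64_t high;
};

// Models an assignment to (subreg:DI (reg:TI R) 0): the low word is replaced
// and the high word keeps its previous contents.
constexpr TIValue set_low_word(TIValue r, std::uint64_t value)
{
    r.low = value;
    return r;
}
```

Because the old high word survives the assignment, R is still live before it; treating the assignment as a definition of the whole of R would lose that fact.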
gcc/
PR target/118184
* config/aarch64/aarch64-early-ra.cc (allocno_assignment_is_rmw):
New function.
(early_ra::record_insn_defs): Mark the live range information as
untrustworthy if an assignment would change part of an allocno
but preserve the rest.
gcc/testsuite/
* gcc.dg/torture/pr118184.c: New test.
Jakub Jelinek [Tue, 21 Jan 2025 23:18:24 +0000 (00:18 +0100)]
c++: Wrap force_target_expr in get_member_function_from_ptrfunc with save_expr [PR118509]
My October PR117259 fix to get_member_function_from_ptrfunc to use a
TARGET_EXPR rather than SAVE_EXPR unfortunately caused some regressions as
well as the following testcase shows.
What happens is that
get_member_function_from_ptrfunc -> build_base_path calls save_expr,
so since the PR117259 change in many cases it will call save_expr on
a TARGET_EXPR. And, for some strange reason a TARGET_EXPR is not considered
an invariant, so we get a SAVE_EXPR wrapped around the TARGET_EXPR.
That SAVE_EXPR <TARGET_EXPR <...>> gets initially added only to the second
operand of ?:, so at that point it would still work fine during expansion.
But unfortunately an expression with that subexpression is handed to the
caller also through *instance_ptrptr = instance_ptr; and gets evaluated
once again when computing the first argument to the method.
So, essentially, we end up with
(TARGET_EXPR <D.2907, ...>, (... ? ... SAVE_EXPR <TARGET_EXPR <D.2907, ...>
... : ...)) (... SAVE_EXPR <TARGET_EXPR <D.2907, ...> ..., ...);
and while D.2907 is initialized during gimplification in the code dominating
everything that uses it, the extra temporary created for the SAVE_EXPR
is initialized only conditionally (if the ?: condition is true) but then
used unconditionally, so we get
pmf-4.C: In function ‘void foo(C, B*)’:
pmf-4.C:12:11: warning: ‘<anonymous>’ may be used uninitialized [-Wmaybe-uninitialized]
12 | (y->*x) ();
| ~~~~~~~~^~
pmf-4.C:12:11: note: ‘<anonymous>’ was declared here
12 | (y->*x) ();
| ~~~~~~~~^~
diagnostic and wrong-code issue too.
As the trunk fix to just treat TARGET_EXPR as invariant seems a little bit risky
and I'd like to get it tested on the trunk for a while, for 14.2.1 this patch
instead wraps those TARGET_EXPRs into SAVE_EXPRs. Eventually that can be reverted
and the trunk fix backported.
2025-01-21 Jakub Jelinek <jakub@redhat.com>
PR c++/118509
* typeck.cc (get_member_function_from_ptrfunc): Wrap force_target_expr
with save_expr.
Nathaniel Shead [Fri, 17 Jan 2025 10:29:08 +0000 (21:29 +1100)]
c++/modules: Propagate FNDECL_USED_AUTO when propagating deduced return types [PR118049]
In the linked testcase, we're erroring because the declared return types
of the functions do not appear to match. This is because when merging
the deduced return types for 'foo' in 'auto-5_b.C', we overwrote the
return type for the declaration with the deduced return type from
'auto-5_a.C' but neglected to track that we were originally declared
with 'auto'.
As a drive-by improvement to QOI, also add checks for if the deduced
return types do not match; this is currently useful because we do not
check the equivalence of the bodies of functions yet.
PR c++/118049
gcc/cp/ChangeLog:
* module.cc (trees_in::is_matching_decl): Propagate
FNDECL_USED_AUTO as well.
gcc/testsuite/ChangeLog:
* g++.dg/modules/auto-5_a.C: New test.
* g++.dg/modules/auto-5_b.C: New test.
* g++.dg/modules/auto-5_c.C: New test.
* g++.dg/modules/auto-6_a.H: New test.
* g++.dg/modules/auto-6_b.C: New test.
Iain Buclaw [Mon, 20 Jan 2025 19:01:03 +0000 (20:01 +0100)]
d: Fix failing test with 32-bit compiler [PR114434]
Since the introduction of gdc.test/runnable/test23514.d, it's exposed an
incorrect compilation when adding a 64-bit constant to a link-time
address. The current cast to size_t causes a loss of precision, which
can result in incorrect compilation.
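The loss of precision can be illustrated with fixed-width types. In this sketch std::uint32_t stands in for size_t on a 32-bit host and std::uint64_t stands in for dinteger_t; both mappings and the function names are assumptions.

```cpp
#include <cstdint>

// Offset passed through a 32-bit type: the upper 32 bits are dropped.
constexpr std::uint64_t offset_via_narrow_cast(std::uint64_t offset)
{
    return static_cast<std::uint32_t>(offset);
}

// Offset kept in a 64-bit type: the value is preserved.
constexpr std::uint64_t offset_kept_wide(std::uint64_t offset)
{
    return offset;
}
```

A constant such as 0x100000004 comes back as 4 from the narrowing version, so the address computation would use the wrong offset.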
PR d/114434
gcc/d/ChangeLog:
* expr.cc (ExprVisitor::visit (PtrExp *)): Get the offset as a
dinteger_t rather than a size_t.
(ExprVisitor::visit (SymOffExp *)): Likewise.
Uros Bizjak [Fri, 20 Dec 2024 15:16:15 +0000 (16:16 +0100)]
i386: Disable SImode/DImode moves from/to mask regs without avx512bw [PR118067]
SImode and DImode moves from/to mask registers are valid only with AVX512BW,
so mark relevant alternatives in *movsi_internal and *movdi_internal as such.
Even with the patch, the testcase still fails, but now with:
pr118067.c: In function ‘foo’:
pr118067.c:13:1: internal compiler error: maximum number of generated reload insns per insn achieved (90)
13 | }
| ^
0x2c3b581 internal_error(char const*, ...)
../../git/gcc/gcc/diagnostic-global-context.cc:517
0xb68938 lra_constraints(bool)
../../git/gcc/gcc/lra-constraints.cc:5411
0xb51a0d lra(_IO_FILE*, int)
../../git/gcc/gcc/lra.cc:2449
0xaf9f4d do_reload
../../git/gcc/gcc/ira.cc:5977
0xafa462 execute
../../git/gcc/gcc/ira.cc:6165
Simon Martin [Sun, 5 Jan 2025 09:36:47 +0000 (10:36 +0100)]
c++: Friend classes don't shadow enclosing template class parameter [PR118255]
We currently reject the following code
=== code here ===
template <int non_template> struct S { friend class non_template; };
class non_template {};
S<0> s;
=== code here ===
While EDG agrees with the current behaviour, clang and MSVC don't (see
https://godbolt.org/z/69TGaabhd), and I believe that this code is valid,
since the friend clause does not actually declare a type, so it cannot
shadow anything. The fact that we didn't error out if the non_template
class was declared before S backs this up as well.
This patch fixes this by skipping the call to check_template_shadow for
hidden bindings.
PR c++/118255
gcc/cp/ChangeLog:
* name-lookup.cc (pushdecl): Don't call check_template_shadow
for hidden bindings.
gcc/testsuite/ChangeLog:
* g++.dg/lookup/pr99116-1.C: Adjust test expectation.
* g++.dg/template/friend84.C: New test.
Nathaniel Shead [Fri, 20 Dec 2024 11:09:39 +0000 (22:09 +1100)]
c++: Allow pragmas in NSDMIs [PR118147]
This patch removes the (unnecessary) CPP_PRAGMA_EOL case from
cp_parser_cache_defarg, which currently has the result that any pragmas
in the NSDMI cause an error.
PR c++/118147
gcc/cp/ChangeLog:
* parser.cc (cp_parser_cache_defarg): Don't error when
CPP_PRAGMA_EOL.
Simon Martin [Thu, 16 Jan 2025 15:27:06 +0000 (16:27 +0100)]
c++: Make sure fold_sizeof_expr returns the correct type [PR117775]
We currently ICE upon the following code, that is valid under
-Wno-pointer-arith:
=== cut here ===
int main() {
decltype( [](auto) { return sizeof(void); } ) x;
return x.operator()(0);
}
=== cut here ===
The problem is that "fold_sizeof_expr (sizeof(void))" returns
size_one_node, that has a different TREE_TYPE from that of the sizeof
expression, which later triggers an assert in cxx_eval_store_expression.
This patch makes sure that fold_sizeof_expr always returns a tree with
the size_type_node type.
PR c++/117775
gcc/cp/ChangeLog:
* decl.cc (fold_sizeof_expr): Make sure the folded result has
type size_type_node.
Eugene Rozenfeld [Sat, 11 Jan 2025 03:48:52 +0000 (19:48 -0800)]
Fix setting of call graph node AutoFDO count
We are initializing both the call graph node count and
the entry block count of the function with the head_count value
from the profile.
Count propagation algorithm may refine the entry block count
and we may end up with a case where the call graph node count
is set to zero but the entry block count is non-zero. That becomes
a problem because we have this code in execute_fixup_cfg:
profile_count num = node->count;
profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
bool scale = num.initialized_p () && !(num == den);
Here if num is 0 but den is not 0, scale becomes true and we
lose the counts in
if (scale)
bb->count = bb->count.apply_scale (num, den);
This is what happened in the issue reported in PR116743
(a 10% regression in MySQL HAMMERDB tests). 3d9e6767939e9658260e2506e81ec32b37cba041 made an improvement in
AutoFDO count propagation, which caused a mismatch between
the call graph node count (zero) and the entry block count (non-zero)
and subsequent loss of counts as described above.
The fix is to update the call graph node count once we've done count propagation.
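The effect of the mismatch can be sketched with plain integers. `scaled_count` is a hypothetical simplification of the scaling step: it uses long instead of profile_count and assumes num is initialized.

```cpp
// Simplified model of the scaling in execute_fixup_cfg.
constexpr long scaled_count(long bb_count, long num, long den)
{
    const bool scale = !(num == den);
    if (!scale || den == 0)
        return bb_count;            // counts agree (or nothing to scale by)
    return bb_count * num / den;    // rescale the block count by num / den
}
```

With a node count of 0 and an entry count of 100, every block count is scaled to 0; once the node count is updated to match the entry count, the block counts are left alone.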
Tested on x86_64-pc-linux-gnu.
gcc/ChangeLog:
PR gcov-profile/116743
* auto-profile.cc (afdo_annotate_cfg): Fix mismatch between the call graph node count
and the entry block count.
Iain Buclaw [Thu, 16 Jan 2025 23:23:45 +0000 (00:23 +0100)]
d: Fix record layout of compiler-generated TypeInfo_Class [PR115249]
In r14-8766, the layout of TypeInfo_Class changed in the runtime
library, but didn't get reflected in the compiler-generated data,
causing a corruption of runtime type introspection on big-endian targets.
This adjusts the size of the `ClassFlags' field from uint to ushort, and
adds a new ushort `depth' field in the space that ClassFlags used to
occupy.
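One plausible reading of why only big-endian targets were affected is sketched below. The struct names and `first_half` are hypothetical, and only the field widths come from the description above.

```cpp
#include <cstdint>

// Field groups before and after the runtime library change (widths only).
struct OldFlags { std::uint32_t flags; };
struct NewFlags { std::uint16_t flags; std::uint16_t depth; };

// Value seen by a reader of the first 16-bit field when the writer stored a
// single 32-bit `flags` value in the same four bytes.
constexpr std::uint16_t first_half(std::uint32_t stored, bool big_endian)
{
    return static_cast<std::uint16_t>(big_endian ? (stored >> 16)
                                                 : (stored & 0xFFFFu));
}
```

Under this model a small flags value written as 32 bits still reads back correctly on a little-endian target, but reads back as 0 on a big-endian one.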
Jonathan Wakely [Fri, 27 Sep 2024 20:01:46 +0000 (21:01 +0100)]
libstdc++: Fix more pedwarns in headers for C++98
Some tests e.g. 17_intro/headers/c++1998/all_pedantic_errors.cc FAIL
with GLIBCXX_TESTSUITE_STDS=98 due to numerous C++11 extensions still in
use in the library headers. The recent changes to not make them system
headers means we get warnings now.
This change adds more diagnostic pragmas to suppress those warnings.
libstdc++-v3/ChangeLog:
* include/bits/istream.tcc: Add diagnostic pragmas around uses
of long long and extern template.
* include/bits/locale_facets.h: Likewise.
* include/bits/locale_facets.tcc: Likewise.
* include/bits/locale_facets_nonio.tcc: Likewise.
* include/bits/ostream.tcc: Likewise.
* include/bits/stl_algobase.h: Likewise.
* include/c_global/cstdlib: Likewise.
* include/ext/pb_ds/detail/resize_policy/hash_prime_size_policy_imp.hpp:
Likewise.
* include/ext/pointer.h: Likewise.
* include/ext/stdio_sync_filebuf.h: Likewise.
* include/std/istream: Likewise.
* include/std/ostream: Likewise.
* include/tr1/cmath: Likewise.
* include/tr1/type_traits: Likewise.
* include/tr1/functional_hash.h: Likewise. Remove semi-colons
at namespace scope that aren't needed after macro expansion.
* include/tr1/tuple: Remove semi-colon at namespace scope.
* include/bits/vector.tcc: Change LL suffix to just L.
We have several overloads of std::deque::_M_insert_aux, one of which is
variadic and called by std::deque::emplace. With a suitable set of
arguments to emplace, it's possible for one of the non-variadic
_M_insert_aux overloads to be selected by overload resolution, making
emplace ill-formed.
Rename the variadic _M_insert_aux to _M_emplace_aux so that calls to
emplace never select an _M_insert_aux overload. Also add an inline
_M_insert_aux for the const lvalue overload that is called from
insert(const_iterator, const value_type&).
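The overload resolution problem can be reproduced in miniature. `DequeSketch` and its members are hypothetical stand-ins rather than the std::deque internals; the integer return values only identify which overload was chosen.

```cpp
#include <cstddef>

struct DequeSketch {
    // Stand-in for a non-variadic insert helper (insert n copies of x).
    static constexpr int insert_aux(int /*pos*/, std::size_t /*n*/, const int& /*x*/)
    { return 1; }

    // Stand-in for the variadic helper before the rename.
    template <class... Args>
    static constexpr int insert_aux(int /*pos*/, Args&&... /*args*/)
    { return 2; }

    // After the rename the variadic helper no longer shares an overload set.
    template <class... Args>
    static constexpr int emplace_aux(int /*pos*/, Args&&... /*args*/)
    { return 2; }
};

// A const lvalue argument, as emplace might receive from its caller.
constexpr int sample_value = 42;
```

With the arguments `(0, std::size_t{3}, sample_value)` both `insert_aux` candidates are exact matches, so the non-template overload is preferred and the variadic one is never called; `emplace_aux` has no such competitor.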
Match-and-simplified .COND_IOR (_41, d_lsm.7_11, _46, d_lsm.7_11) to 1
when _46 == 1. This happens by removing the conditional and applying
a | 1 = 1. Normally we re-introduce the conditional and its else value
if needed but that does not happen here as we're not dealing with a
vector type. For correctness's sake, we must not remove the conditional
even for non-vector types.
This patch re-introduces a COND_EXPR in such cases. For PR118140 this
results in a non-vectorized loop.
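A scalar model of the problem, assuming boolean operands. The three functions are hypothetical and only mirror the semantics described above, not the match-and-simplify machinery.

```cpp
// .COND_IOR (cond, a, b, else_value): a | b where cond holds, else_value otherwise.
constexpr bool cond_ior(bool cond, bool a, bool b, bool else_value)
{
    return cond ? (a | b) : else_value;
}

// Result of folding a | 1 to 1 and dropping the condition.
constexpr bool folded_without_cond(bool, bool, bool, bool)
{
    return true;
}

// Result of folding a | 1 to 1 while keeping the condition and else value.
constexpr bool folded_with_cond(bool cond, bool, bool, bool else_value)
{
    return cond ? true : else_value;
}
```

The two folded forms agree when cond is true, but when cond is false only the form that keeps the condition returns the else value.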
PR middle-end/118140
gcc/ChangeLog:
* gimple-match-exports.cc (maybe_resimplify_conditional_op): Add
COND_EXPR when we simplified to a scalar gimple value but still
have an else value.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr118140.c: New test.
* gcc.target/riscv/rvv/autovec/pr118140.c: New test.
testsuite: The expect framework might introduce CR in output
When running tests using the "sim" config, the command is launched in
non-readonly mode and the text retrieved from the expect command will
then replace all LF with CRLF. (The problem can be found in sim_load
where it calls remote_spawn without an input file).
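One way for a test to accept both line endings is to compare output while skipping the CR of each CRLF pair. This is a sketch of that idea with a hypothetical helper; the actual tests may handle it differently.

```cpp
#include <cstddef>
#include <string_view>

// Compare `got` against `want` (which uses LF), treating CRLF in `got` as LF.
constexpr bool equal_modulo_crlf(std::string_view got, std::string_view want)
{
    std::size_t i = 0, j = 0;
    while (i < got.size() && j < want.size()) {
        // Skip a CR only when it is immediately followed by LF.
        if (got[i] == '\r' && i + 1 < got.size() && got[i + 1] == '\n')
            ++i;
        if (got[i] != want[j])
            return false;
        ++i;
        ++j;
    }
    return i == got.size() && j == want.size();
}
```

A lone CR that is not followed by LF is still treated as a mismatch.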
libstdc++-v3/ChangeLog:
* testsuite/27_io/print/1.cc: Allow both LF and CRLF in test.
* testsuite/27_io/print/3.cc: Likewise.
Nathaniel Shead [Sat, 11 Jan 2025 17:35:08 +0000 (04:35 +1100)]
testsuite: Fix flag used for modules test
GCC14 doesn't have the new spelling '-fmodules' for enabling modules;
use the old '-fmodules-ts' spelling instead.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr114630_a.C: Use -fmodules-ts instead of
-fmodules in testcase.
* g++.dg/modules/pr114630_b.C: Likewise.
* g++.dg/modules/pr114630_c.C: Likewise.
Nathaniel Shead [Thu, 9 Jan 2025 14:06:37 +0000 (01:06 +1100)]
c++/modules: Handle chaining already-imported local types [PR114630]
In the linked testcase, an ICE occurs because when reading the
(duplicate) function definition for _M_do_parse from module Y, the local
type definitions have already been streamed from module X and setup as
regular backreferences, rather than being found with find_duplicate,
causing issues with managing DECL_CHAIN.
It is tempting to just skip setting up the DECL_CHAIN for this case.
However, for the future it would be best to ensure that the block vars
for the duplicate definition are accurate, so that we could implement
ODR checking on function definitions at some point.
So to solve this, this patch creates a copy of the streamed-in local
type and chains that; it will be discarded along with the rest of the
duplicate function after we've finished processing.
A couple of suggested implementations from the discussion on the PR that
don't work:
- Replacing the `DECL_CHAIN` assertion with `(*chain && *chain != decl)`
doesn't handle the case where type definitions are followed by regular
local variables, since those won't have been imported as separate
backreferences and so the chains will diverge.
- Correcting the purviewness of GMF template instantiations to force Y
to emit copies of the local types rather than backreferences into X is
insufficient, as it's still possible that the local types got streamed
in a separate cluster to the function definition, and so will be again
referred to via regular backreferences when importing.
- Likewise, preventing the emission of function definitions where an
import has already provided that same definition also is insufficient,
for much the same reason.
PR c++/114630
gcc/cp/ChangeLog:
* module.cc (trees_in::core_vals) <BLOCK>: Chain a new node if
DECL_CHAIN already is set.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr114630.h: New test.
* g++.dg/modules/pr114630_a.C: New test.
* g++.dg/modules/pr114630_b.C: New test.
* g++.dg/modules/pr114630_c.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
In GCC 12 there was a ~40% regression in the performance of hashmap->find.
This regression came about accidentally:
Before GCC 12 the find function was small enough that IPA would inline it even
though it wasn't marked inline. In GCC-12 an optimization was added to perform
a linear search when the entries in the hashmap are small.
This increased the size of the function enough that IPA would no longer inline.
Inlining had two benefits:
1. The return value is a reference, so it has to be returned and dereferenced
even though the search loop may have already dereferenced it.
2. The pattern is a hard pattern to track for branch predictors. This causes
a large number of branch misses if the value is immediately checked and
branched on. i.e. if (a != m.end()) which is a common pattern.
The patch fixes both these issues by adding the inline keyword to _M_locate
to allow the inliner to consider inlining again.
This and the other patches have been run through several benchmarks where
the size, number of elements searched for and type (reference vs value) etc
were tested.
The change shows no statistical regression, but an average find improvement of
~27% and a range between ~10-60% improvements.
testsuite: arm: Add pattern for armv8-m.base to cmse-15.c test
Since armv8-m.base uses thumb1 that does not support sibcall/tailcall,
a pattern is needed that uses PUSH/BL/POP sequence instead of a single
B instruction to reuse an already existing function in the compile unit.
gcc/testsuite/ChangeLog:
* gcc.target/arm/cmse/cmse-15.c: Added pattern for armv8-m.base.
Andrew Carlotti [Tue, 7 Jan 2025 18:32:23 +0000 (18:32 +0000)]
Disable a broken multiversioning optimisation
This patch skips redirect_to_specific clone for aarch64 and riscv,
because the optimisation has two flaws:
1. It checks the value of the "target" attribute, even on targets that
don't use this attribute for multiversioning.
2. The algorithm used is too aggressive, and will eliminate the
indirection in some cases where the runtime choice of callee version
can't be determined statically at compile time. A correct implementation would need to
verify that:
- if the current caller version were selected at runtime, then the
chosen callee version would be eligible for selection.
- if any higher priority callee version were selected at runtime, then
a higher priority caller version would have been eligible for
selection (and hence the current caller version wouldn't have been
selected).
The current checks only verify a more restrictive version of the first
condition, and don't check the second condition at all.
Fixing the optimisation properly would require implementing target hooks
to check for implications between version attributes, which is too
complicated for this stage. However, I would like to see this hook
implemented in the future, since it could also help deduplicate other
multiversioning code.
Since this behavior has existed for x86 and powerpc for a while, I
think it's best to preserve the existing behavior on those targets,
unless any maintainer for those targets disagrees.
gcc/ChangeLog:
* multiple_target.cc
(redirect_to_specific_clone): Assert that "target" attribute is
used for FMV before checking it.
(ipa_target_clone): Skip redirect_to_specific_clone on some
targets.
Richard Biener [Thu, 5 Dec 2024 09:47:13 +0000 (10:47 +0100)]
tree-optimization/117912 - bogus address equivalences for __builtin_object_size
VN again is the culprit for exploiting address equivalences before
__builtin_object_size got the chance to do its job. This time
it isn't about union members but adjacent structure fields where
an address to one after the last element of an array field can
spill over to the next field.
The following protects all out-of-bound accesses on the upper bound
side (singling out TYPE_MAX_VALUE + 1 is more expensive). It
ignores other out-of-bound addresses that would invoke UB.
Zero-sized arrays are a bit awkward because the C++ front end represents them
with a -1U upper bound.
There's a similar issue for zero-sized components whose address can
be the same as the adjacent field in C.
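The address coincidence behind the problem can be shown with a small struct. `Adjacent` is a hypothetical example and assumes no padding between the two char arrays, which holds on common ABIs.

```cpp
#include <cstddef>

// An array field followed directly by another field.
struct Adjacent {
    char a[4];
    char b[4];
};

// Byte offset of the one-past-the-end element of a, i.e. of &s.a[4].
constexpr std::size_t one_past_a = offsetof(Adjacent, a) + sizeof(Adjacent::a);
```

Since `one_past_a` equals the offset of `b`, value numbering can see `&s.a[4]` and `&s.b[0]` as the same address, even though the object size seen through each expression differs.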
PR tree-optimization/117912
* tree-ssa-sccvn.cc (copy_reference_ops_from_ref): For addresses
of zero-sized components do not set ->off if the object size pass
didn't run.
For OOB ARRAY_REF accesses in address expressions avoid setting
->off if the object size pass didn't run.
(valueize_refs_1): Likewise.
* c-c++-common/torture/pr117912-1.c: New testcase.
* c-c++-common/torture/pr117912-2.c: Likewise.
* c-c++-common/torture/pr117912-3.c: Likewise.
This member function was previously deprecated, but that was reverted by
P2875R4, approved earlier this year in Tokyo. Since it's not going to be
deprecated in C++26, and so presumably not removed, there is no point in
giving deprecated warnings for C++23 mode.
Jonathan Wakely [Thu, 11 Apr 2024 18:12:48 +0000 (19:12 +0100)]
libstdc++: Give std::memory_order a fixed underlying type [PR89624]
Prior to C++20 this enum type doesn't have a fixed underlying type,
which means it can be modified by -fshort-enums, which then means the
HLE bits are outside the range of valid values for the type.
As it has a fixed type of int in C++20 and later, do the same for
earlier standards too. This is technically a change for C++17 down,
because the implicit underlying type (without -fshort-enums) was
unsigned before. I doubt it matters in practice. That incompatibility
already exists between C++17 and C++20 and nobody has noticed or
complained. Now at least the underlying type will be int for all -std
modes.
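What a fixed underlying type buys can be sketched with a stand-in enum. The enum, its enumerators and the modifier value are hypothetical and chosen only for illustration.

```cpp
#include <type_traits>

// Stand-in for an enum with a fixed underlying type of int.
enum order_sketch : int {
    relaxed_s, consume_s, acquire_s, release_s, acq_rel_s, seq_cst_s
};

// A modifier bit well above the enumerator values (illustrative value).
constexpr int hle_bit_sketch = 0x10000;

// With a fixed underlying type every int value is a valid value of the enum,
// so OR-ing in a high bit stays within its range.
constexpr order_sketch with_modifier(order_sketch m, int modifier)
{
    return static_cast<order_sketch>(static_cast<int>(m) | modifier);
}
```

Without the `: int`, the range of valid values would depend on the enumerators and on options such as -fshort-enums, which is the situation the commit message describes.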
libstdc++-v3/ChangeLog:
PR libstdc++/89624
* include/bits/atomic_base.h (memory_order): Use int as
underlying type.
* testsuite/29_atomics/atomic/89624.cc: New test.
The standard says that std::exclusive_scan can be used to work in
place, i.e. where the output range is the same as the input range. This
means that the first sum cannot be written to the output until after
reading the first input value, otherwise we'll already have overwritten
the first input value.
While writing a new testcase I also realised that the serial version of
std::exclusive_scan uses copy construction for the accumulator variable,
but the standard only requires Cpp17MoveConstructible. We also require
move assignable, which is missing from the standard's requirements, but
we should at least use move construction not copy construction.
A similar problem exists for some other new C++17 numeric algos, but
I'll fix the others in a subsequent commit.
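The ordering constraint can be sketched as follows. This is a simplified serial loop, not the libstdc++ or PSTL code: exclusive_scan_sketch and scan_in_place are hypothetical names, operator+ stands in for the binary operation, and C++17 is assumed for constexpr std::array iterators. Each input element is read before the matching output element is written, and the accumulator is move constructed and move assigned.

```cpp
#include <array>
#include <utility>

// Hypothetical serial exclusive scan that is safe when result == first.
template<typename InIt, typename OutIt, typename T>
constexpr OutIt
exclusive_scan_sketch(InIt first, InIt last, OutIt result, T init)
{
  T sum = std::move(init);        // move construct the accumulator
  for (; first != last; ++first, ++result)
    {
      T next = sum + *first;      // read the input element first ...
      *result = std::move(sum);   // ... and only then overwrite it
      sum = std::move(next);      // move assign the running total
    }
  return result;
}

// In-place use: the output range is the same as the input range.
constexpr std::array<int, 4> scan_in_place(std::array<int, 4> a)
{
  exclusive_scan_sketch(a.begin(), a.end(), a.begin(), 0);
  return a;
}

constexpr std::array<int, 4> scanned
  = scan_in_place(std::array<int, 4>{1, 2, 3, 4});
static_assert(scanned[0] == 0 && scanned[1] == 1
              && scanned[2] == 3 && scanned[3] == 6,
              "each output is the sum of the preceding inputs");
```

Writing *result before reading *first would clobber the first input when the two ranges alias, which is the failure described above.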
libstdc++-v3/ChangeLog:
PR libstdc++/108236
* include/pstl/glue_numeric_impl.h (exclusive_scan): Pass __init
as rvalue.
* include/pstl/numeric_impl.h (__brick_transform_scan): Do not
write through __result until after reading through __first. Move
__init into return value.
(__pattern_transform_scan): Pass __init as rvalue.
* include/std/numeric (exclusive_scan): Move construct instead
of copy constructing.
* testsuite/26_numerics/exclusive_scan/2.cc: New test.
* testsuite/26_numerics/pstl/numeric_ops/108236.cc: New test.
Jonathan Wakely [Mon, 9 Dec 2024 10:52:10 +0000 (10:52 +0000)]
libstdc++: Fix debug containers for constant evaluation [PR117962]
Using a stateful allocator with std::vector would fail in Debug Mode,
because the allocator-extended move constructor tries to swap all the
attached safe iterators, but that uses a non-inline function which isn't
constexpr. We don't actually need to swap any iterators in constant
expressions, because we never attach them to the container in the first
place.
This bug went unnoticed because the tests for constexpr std::vector were
using a stateful allocator with a std::allocator base class, but were
failing to override the inherited is_always_equal trait from
std::allocator. That meant that the allocators took the always-equal
code paths, and didn't try to use the buggy constructor. In C++26 the
std::allocator::is_always_equal trait goes away, and so the tests
changed behaviour, revealing the bug.
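A minimal sketch of the no-op approach, assuming the GCC/Clang builtin __builtin_is_constant_evaluated and C++14 or later. The type safe_container_sketch and the function swap_attached_iterators_sketch are hypothetical and only mimic the shape of the problem: a constexpr member that would otherwise call a function which is not constexpr.

```cpp
// Hypothetical bookkeeping routine; deliberately not constexpr, like the
// non-inline function described above.
void swap_attached_iterators_sketch(int& swaps) { ++swaps; }

struct safe_container_sketch
{
  int swaps = 0;

  constexpr void take_iterators_from_other()
  {
    // Nothing is ever attached during constant evaluation, so there is
    // nothing to swap and the non-constexpr call can be skipped.
    if (__builtin_is_constant_evaluated())
      return;
    swap_attached_iterators_sketch(swaps);
  }
};

constexpr int swaps_during_constant_evaluation()
{
  safe_container_sketch c{};
  c.take_iterators_from_other();
  return c.swaps;
}

static_assert(swaps_during_constant_evaluation() == 0,
              "the bookkeeping is skipped in constant expressions");
```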
libstdc++-v3/ChangeLog:
PR libstdc++/117962
* include/debug/safe_container.h: Make allocator-extended move
constructor a no-op during constant evaluation.
Jonathan Wakely [Tue, 10 Dec 2024 10:56:41 +0000 (10:56 +0000)]
libstdc++: Disable __gnu_debug::__is_singular(T*) in constexpr [PR109517]
Because of PR c++/85944 we have several bugs where _GLIBCXX_DEBUG causes
errors for constexpr code. Although Bug 117966 could be fixed by
avoiding redundant debug checks in std::span, and Bug 106212 could be
fixed by avoiding redundant debug checks in std::array, there are many
more cases where similar __glibcxx_requires_valid_range checks fail to
compile and removing the checks everywhere isn't desirable.
This just disables the __gnu_debug::__check_singular(T*) check during
constant evaluation. Attempting to dereference a null pointer will
certainly fail during constant evaluation (if it doesn't fail then it's
a compiler bug and not the library's problem). Disabling this check
during constant evaluation shouldn't do any harm.
libstdc++-v3/ChangeLog:
PR libstdc++/109517
PR libstdc++/109976
* include/debug/helper_functions.h (__valid_range_aux): Treat
all input iterator ranges as valid during constant evaluation.
Jonathan Wakely [Mon, 9 Dec 2024 17:35:24 +0000 (17:35 +0000)]
libstdc++: Skip redundant assertions in std::array equality [PR106212]
As PR c++/106212 shows, the Debug Mode checks cause a compilation error
for equality comparisons involving std::array prvalues in constant
expressions. Those Debug Mode checks are redundant when
comparing two std::array objects, because we already know we have a
valid range. We can also avoid the unnecessary step of using
std::__niter_base to do __normal_iterator unwrapping, which isn't needed
because our std::array iterators are just pointers. Using
std::__equal_aux1 instead of std::equal avoids the redundant checks in
std::equal and std::__equal_aux.
libstdc++-v3/ChangeLog:
PR libstdc++/106212
* include/std/array (operator==): Use std::__equal_aux1 instead
of std::equal.
* testsuite/23_containers/array/comparison_operators/106212.cc:
New test.
Jonathan Wakely [Mon, 9 Dec 2024 17:35:24 +0000 (17:35 +0000)]
libstdc++: Skip redundant assertions in std::span construction [PR117966]
As PR c++/117966 shows, the Debug Mode checks cause a compilation error
for a global constexpr std::span. Those debug checks are redundant when
constructing from an array or a range, because we already know we have a
valid range and we know its size. Instead of delegating to the
std::span(contiguous_iterator, contiguous_iterator) constructor, just
initialize the data members directly.
libstdc++-v3/ChangeLog:
PR libstdc++/117966
* include/std/span (span(T (&)[N])): Do not delegate to
constructor that performs redundant checks.
(span(array<T, N>&), span(const array<T, N>&)): Likewise.
(span(Range&&), span(const span<T, N>&)): Likewise.
* testsuite/23_containers/span/117966.cc: New test.
Inserting an empty range into a std::deque results in undefined calls to
either std::copy, std::copy_backward, std::move, or std::move_backward.
We call those algos with invalid arguments where the output range is the
same as the input range, e.g. std::copy(first, last, first) which
violates the preconditions for the algorithms.
This fix simply returns early if there's nothing to insert. Most callers
already ensure that we don't even call _M_range_insert_aux with an empty
range, but some callers don't. Rather than checking for n == 0 in each
of the callers, this just does the check once and uses __builtin_expect
to treat empty insertions as unlikely.
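The early return can be sketched on a toy container. buffer_sketch and insert_range_sketch are hypothetical and have nothing to do with the real std::deque layout; the caller is assumed to guarantee enough capacity. The point is only that an empty range returns before any element is shifted or copied, with __builtin_expect marking that path as unlikely.

```cpp
#include <cstddef>

// Hypothetical fixed-capacity buffer standing in for the container.
struct buffer_sketch
{
  int data[8] = {};
  std::size_t size = 0;
};

// Insert [first, last) before index pos.  Returns false if nothing was
// inserted.  Assumes size + (last - first) <= 8 and pos <= size.
constexpr bool
insert_range_sketch(buffer_sketch& b, std::size_t pos,
                    const int* first, const int* last)
{
  const std::size_t n = static_cast<std::size_t>(last - first);
  if (__builtin_expect(n == 0, 0))
    return false;                       // empty range: touch nothing
  for (std::size_t i = b.size; i > pos; --i)
    b.data[i + n - 1] = b.data[i - 1];  // shift the tail right by n
  for (std::size_t i = 0; i < n; ++i)
    b.data[pos + i] = first[i];         // copy the new elements in
  b.size += n;
  return true;
}

constexpr buffer_sketch demo_sketch()
{
  buffer_sketch b{};
  const int src[] = {1, 2, 3};
  insert_range_sketch(b, 0, src, src + 3);  // b holds 1 2 3
  insert_range_sketch(b, 1, src, src);      // empty range: b is unchanged
  return b;
}

static_assert(demo_sketch().size == 3, "empty insertion changes nothing");
static_assert(demo_sketch().data[1] == 2, "elements are left in place");
```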
libstdc++-v3/ChangeLog:
PR libstdc++/118035
* include/bits/deque.tcc (_M_range_insert_aux): Return
immediately if inserting an empty range.
* testsuite/23_containers/deque/modifiers/insert/118035.cc: New
test.
Patrick Palka [Thu, 9 Jan 2025 15:50:19 +0000 (10:50 -0500)]
c++: ICE during requires-expr partial subst [PR118060]
Here during partial substitution of the requires-expression (as part of
CTAD constraint rewriting) we segfault from the INDIRECT_REF case of
convert_to_void due to *f(u) being type-dependent. We should just defer
checking convert_to_void until satisfaction.
PR c++/118060
gcc/cp/ChangeLog:
* constraint.cc (tsubst_valid_expression_requirement): Don't
check convert_to_void during partial substitution.
Patrick Palka [Thu, 9 Jan 2025 15:50:12 +0000 (10:50 -0500)]
c++: constexpr potentiality of CAST_EXPR [PR117925]
We're incorrectly treating the templated callee (FnPtr)fnPtr, represented
as CAST_EXPR with TREE_LIST operand, as potentially constant here due to
neglecting to look through the TREE_LIST in the CAST_EXPR case of p_c_e_1.
PR c++/117925
gcc/cp/ChangeLog:
* constexpr.cc (potential_constant_expression_1) <case CAST_EXPR>:
Fix check for class conversion to literal type to properly look
through the TREE_LIST operand of a CAST_EXPR.
Patrick Palka [Thu, 9 Jan 2025 15:50:08 +0000 (10:50 -0500)]
c++: relax ICE for unexpected trees during constexpr [PR117925]
When we encounter an unexpected (likely templated) tree code during
constexpr evaluation we currently ICE even in release mode. But it
seems more user-friendly to just gracefully treat the expression as
non-constant, which will be harmless most of the time (e.g. in the case
of warning-specific or speculative constexpr folding as in the PR), and
at worst will transform an ICE-on-valid bug into a rejects-valid bug.
This is also what e.g. tsubst_expr does when it encounters an unexpected
tree code.
PR c++/117925
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_constant_expression) <default>:
Relax ICE when encountering an unexpected tree code into a
checking ICE guarded by flag_checking.
Patrick Palka [Thu, 9 Jan 2025 15:49:45 +0000 (10:49 -0500)]
c++: template-id dependence wrt local static arg [PR117792]
Here we end up ICEing at instantiation time for the call to
f<local_static> ultimately because we wrongly consider the call to be
non-dependent, and so we specialize f ahead of time and then get
confused when fully substituting this specialization.
The call is dependent due to [temp.dep.temp]/3 and we miss that because
function template-id arguments aren't coerced until overload resolution,
and so the local static template argument lacks an implicit cast to
reference type that value_dependent_expression_p looks for before
considering dependence of the address. Other kinds of template-ids aren't
affected since they're coerced ahead of time.
So when considering dependence of a function template-id, we need to
conservatively consider dependence of the address of each argument (if
applicable).
PR c++/117792
gcc/cp/ChangeLog:
* pt.cc (type_dependent_expression_p): Consider the dependence
of the address of each template argument of a function
template-id.
Patrick Palka [Fri, 13 Dec 2024 18:17:29 +0000 (13:17 -0500)]
libstdc++: Avoid unnecessary copies in ranges::min/max [PR112349]
Use a local reference for the (now possibly lifetime extended) result of
*__first so that we copy it only when necessary.
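Roughly, the idea looks like the sketch below. This is not the ranges::min/max implementation: max_value_sketch is a hypothetical helper over a plain non-empty iterator range, with no projection or comparator. The local tmp is a forwarding reference, so dereferencing does not copy, and a copy (or move) happens only when a new maximum is found.

```cpp
#include <utility>

// Hypothetical helper; the range [first, last) must be non-empty.
template<typename It>
constexpr auto max_value_sketch(It first, It last)
{
  auto result = *first;       // the one copy that cannot be avoided
  while (++first != last)
    {
      // Bind a reference instead of copying.  If *first yields a
      // prvalue, its lifetime is extended to that of tmp.
      auto&& tmp = *first;
      if (result < tmp)
        result = std::forward<decltype(tmp)>(tmp);
    }
  return result;
}

constexpr int values_sketch[] = {3, 9, 4};
static_assert(max_value_sketch(values_sketch, values_sketch + 3) == 9,
              "the largest element is returned");
```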
PR libstdc++/112349
libstdc++-v3/ChangeLog:
* include/bits/ranges_algo.h (__max_fn::operator()): Turn local
object __tmp into a reference.
* include/bits/ranges_util.h (__min_fn::operator()): Likewise.
* testsuite/25_algorithms/max/constrained.cc (test04): New test.
* testsuite/25_algorithms/min/constrained.cc (test04): New test.
Patrick Palka [Tue, 29 Oct 2024 13:26:19 +0000 (09:26 -0400)]
libstdc++: Fix complexity of drop_view::begin() const [PR112641]
Views are required to have an amortized O(1) begin(), but our drop_view's
const begin overload is O(n) for non-common ranges with a non-sized
sentinel. This patch reimplements it so that it's O(1) always. See
also LWG 4009.
PR libstdc++/112641
libstdc++-v3/ChangeLog:
* include/std/ranges (drop_view::begin): Reimplement const
overload so that it's O(1) always.
* testsuite/std/ranges/adaptors/drop.cc (test10): New test.
When the test was initially created, -fcommon was the default, but in
commit r10-4867-g6271dd984d7 the default value changed to -fno-common.
This change made the test start failing. To counter the over-alignment
caused by 'a' no longer being common, use -Os.
gcc/testsuite/ChangeLog:
* gcc.target/arm/memset-inline-8.c: Use -Os and prefix assembler
instructions with a tab to improve test stability.
* gcc.target/arm/memset-inline-8-exe.c: Use -Os.
Marek Polacek [Thu, 12 Dec 2024 19:56:07 +0000 (14:56 -0500)]
c++: ICE initializing array of aggrs [PR117985]
This crash started with my r12-7803 but I believe the problem lies
elsewhere.
build_vec_init has cleanup_flags whose purpose is -- if I grok this
correctly -- to avoid destructing an object multiple times. Let's
say we are initializing an array of A. Then we might end up in
a scenario similar to initlist-eh1.C:
try
  {
    call A::A in a loop
    // #0
    try
      {
        call a fn using the array
      }
    finally
      {
        // #1
        call A::~A in a loop
      }
  }
catch
  {
    // #2
    call A::~A in a loop
  }
cleanup_flags makes us emit a statement like
D.3048 = 2;
at #0 to disable performing the cleanup at #2, since #1 will take
care of the destruction of the array.
But if we are not emitting the loop because we can use a constant
initializer (and use a single { a, b, ...}), we shouldn't generate
the statement resetting the iterator to its initial value. Otherwise
we crash in gimplify_var_or_parm_decl because it gets the stray decl
D.3048.
PR c++/117985
gcc/cp/ChangeLog:
* init.cc (build_vec_init): Pop CLEANUP_FLAGS if we're not
generating the loop.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/initlist-array23.C: New test.
* g++.dg/cpp0x/initlist-array24.C: New test.
Marek Polacek [Tue, 25 Jun 2024 21:42:01 +0000 (17:42 -0400)]
c++: unresolved overload with comma op [PR115430]
This works:
template<typename T>
int Func(T);
typedef int (*funcptrtype)(int);
funcptrtype fp0 = &Func<int>;
but this doesn't:
funcptrtype fp2 = (0, &Func<int>);
because we only call resolve_nondeduced_context on the LHS (via
convert_to_void) but not on the RHS, so cp_build_compound_expr's
type_unknown_p check issues an error.
PR c++/115430
gcc/cp/ChangeLog:
* typeck.cc (cp_build_compound_expr): Call resolve_nondeduced_context
on RHS.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/noexcept41.C: Remove dg-error.
* g++.dg/overload/addr3.C: New test.
Marek Polacek [Tue, 3 Sep 2024 17:04:09 +0000 (13:04 -0400)]
c++: noexcept and pointer to member function type [PR113108]
We ICE in nothrow_spec_p because it got a DEFERRED_NOEXCEPT.
This DEFERRED_NOEXCEPT was created in implicitly_declare_fn
when declaring
Foo& operator=(Foo&&) = default;
in the test. The problem is that in resolve_overloaded_unification
we call maybe_instantiate_noexcept before try_one_overload only in
the TEMPLATE_ID_EXPR case.
Marek Polacek [Thu, 5 Sep 2024 20:45:32 +0000 (16:45 -0400)]
c++: ICE with structured bindings and m-d array [PR102594]
We ICE in decay_conversion with this test:
struct S {
S() {}
};
S arr3[1][1];
auto [m](arr3);
But not when the last line is:
auto [n] = arr3;
Therefore the difference is between copy- and direct-init. In
particular, in build_vec_init we have:
if (direct_init)
from = build_tree_list (NULL_TREE, from);
and then we call build_vec_init again with init==from. Then
decay_conversion gets the TREE_LIST and it crashes.
build_aggr_init has:
/* Wrap the initializer in a CONSTRUCTOR so that build_vec_init
recognizes it as direct-initialization. */
init = build_constructor_single (init_list_type_node,
NULL_TREE, init);
CONSTRUCTOR_IS_DIRECT_INIT (init) = true;
so I propose to do the same in build_vec_init.
PR c++/102594
gcc/cp/ChangeLog:
* init.cc (build_vec_init): Build up a CONSTRUCTOR to signal
direct-initialization rather than a TREE_LIST.
Marek Polacek [Thu, 29 Aug 2024 19:13:03 +0000 (15:13 -0400)]
c++: mutable temps in rodata [PR116369]
Here we wrongly mark the reference temporary for g TREE_READONLY,
so it's put in .rodata and so we can't modify its subobject even
when the subobject is marked mutable. This is so since r9-869.
r14-1785 fixed a similar problem, but not in set_up_extended_ref_temp.
PR c++/116369
gcc/cp/ChangeLog:
* call.cc (set_up_extended_ref_temp): Don't mark a temporary
TREE_READONLY if its type is TYPE_HAS_MUTABLE_P.
Marek Polacek [Thu, 15 Aug 2024 22:47:29 +0000 (18:47 -0400)]
c++: ICE with enum and conversion fn in template [PR115657]
Here we initialize an enumerator with a class prvalue with a conversion
function. When we fold it in build_enumerator, we create a TARGET_EXPR
for the object, and subsequently crash in tsubst_expr, which should not
see such a code.
Normally, we fix similar problems by using an IMPLICIT_CONV_EXPR but here
I may get away with not using the result of fold_non_dependent_expr unless
the result is a constant. A TARGET_EXPR is not constant.
PR c++/115657
gcc/cp/ChangeLog:
* decl.cc (build_enumerator): Call maybe_fold_non_dependent_expr
instead of fold_non_dependent_expr.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/constexpr-recursion2.C: New test.
* g++.dg/template/conv21.C: New test.
Marek Polacek [Wed, 8 May 2024 19:43:58 +0000 (15:43 -0400)]
c++: ICE with reference NSDMI [PR114854]
Here we crash on a cp_gimplify_expr/TARGET_EXPR assert:
/* A TARGET_EXPR that expresses direct-initialization should have been
elided by cp_gimplify_init_expr. */
gcc_checking_assert (!TARGET_EXPR_DIRECT_INIT_P (*expr_p));
the TARGET_EXPR in question is created for the NSDMI in:
class Vector { int m_size; };
struct S {
const Vector &vec{};
};
where we first need to create a Vector{} temporary, and then bind the
vec reference to it. The temporary is represented by a TARGET_EXPR
and it cannot be elided. When we create an object of type S, we get
Marek Polacek [Wed, 18 Sep 2024 19:44:31 +0000 (15:44 -0400)]
c++: concept in default argument [PR109859]
1) We're hitting the assert in cp_parser_placeholder_type_specifier.
It says that if it turns out to be false, we should do error() instead.
Do so, then.
2) lambda-targ8.C should compile fine, though. The problem was that
local_variables_forbidden_p wasn't cleared when we're about to parse
the optional template-parameter-list for a lambda in a default argument.
PR c++/109859
gcc/cp/ChangeLog:
* parser.cc (cp_parser_lambda_declarator_opt): Temporarily clear
local_variables_forbidden_p.
(cp_parser_placeholder_type_specifier): Turn an assert into an
error.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-defarg3.C: New test.
* g++.dg/cpp2a/lambda-targ8.C: New test.
Christophe Lyon [Sun, 24 Nov 2024 18:08:48 +0000 (18:08 +0000)]
arm: [MVE intrinsics] Fix support for predicate constants [PR target/114801]
This backport is a cherry pick of commit 2089009210a1774c37e527ead8bbcaaa1a7a9d2d, with a small change needed
because force_lowpart_subreg does not exist in gcc-14: the patch
replaces it with the equivalent:
- x = force_lowpart_subreg (mode, x, GET_MODE (x));
+ {
+ auto byte = subreg_lowpart_offset (mode, GET_MODE (x));
+ x = force_subreg (mode, x, GET_MODE (x), byte);
+ }
In this PR, we have to handle a case where MVE predicates are supplied
as a const_int, where individual predicates have illegal boolean
values (such as 0xc for a 4-bit boolean predicate). To avoid the ICE,
fix the constant (any non-zero value is converted to all 1s) and emit
a warning.
On MVE, V8BI and V4BI multi-bit masks are interpreted byte-by-byte at
instruction level, but end-users should describe lanes rather than
bytes (so all bytes of a true-predicated lane should be '1'), see the
section on MVE intrinsics in the Arm ACLE specification.
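The constant fix-up can be pictured with the sketch below. It is plain C++ and not the backend code: normalize_predicate_sketch is a hypothetical helper, the 16-bit predicate width is an assumption made for the illustration, and bits_per_lane is assumed to be a non-zero divisor of 16. Every lane that has any bit set is widened to all 1s, and all-zero lanes are left alone.

```cpp
#include <cstdint>

// Hypothetical helper: canonicalize a predicate constant lane by lane.
// Assumes a 16-bit predicate and bits_per_lane in {1, 2, 4, 8, 16}.
constexpr std::uint16_t
normalize_predicate_sketch(std::uint16_t pred, unsigned bits_per_lane)
{
  const std::uint16_t lane_mask = (1u << bits_per_lane) - 1;
  std::uint16_t out = 0;
  for (unsigned shift = 0; shift < 16; shift += bits_per_lane)
    if ((pred >> shift) & lane_mask)
      out |= lane_mask << shift;   // any non-zero lane becomes all 1s
  return out;
}

static_assert(normalize_predicate_sketch(0x000c, 4) == 0x000f,
              "0xc in a 4-bit lane is fixed up to 0xf");
static_assert(normalize_predicate_sketch(0x0c01, 4) == 0x0f0f,
              "all-zero lanes are left alone");
```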
Since force_lowpart_subreg cannot handle const_int (because they have VOID mode),
use gen_lowpart on them, force_lowpart_subreg otherwise.
2024-11-20 Christophe Lyon <christophe.lyon@linaro.org>
Jakub Jelinek <jakub@redhat.com>
Jonathan Wakely [Tue, 17 Dec 2024 21:32:19 +0000 (21:32 +0000)]
libstdc++: Fix std::future::wait_until for subsecond negative times [PR118093]
The current check for negative times (i.e. before the epoch) only checks
for a negative number of seconds. For a time 1ms before the epoch the
seconds part will be zero, but the futex syscall will still fail with an
EINVAL error. Extend the check to handle this case.
This change adds a redundant check in the headers too, so that we avoid
even calling into the library for negative times. Both checks can be
marked [[unlikely]]. The check in the headers avoids the cost of
splitting the time into seconds and nanoseconds and then making a PLT
call. The check inside the library matches where we were checking
already, and fixes existing binaries that were compiled against older
headers but use a newer libstdc++.so.6 at runtime.
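The gap in the old check can be reproduced with std::chrono alone. This is a sketch, not the code in futex.cc or atomic_futex.h: seconds_part_negative and before_epoch_sketch are hypothetical names, and the argument is the time since the epoch in nanoseconds. duration_cast truncates toward zero, so 1ms before the epoch splits into zero seconds and a negative nanosecond remainder.

```cpp
#include <chrono>

// The older style of check: looks only at the whole-second part.
constexpr bool seconds_part_negative(std::chrono::nanoseconds since_epoch)
{
  return std::chrono::duration_cast<std::chrono::seconds>(since_epoch)
           .count() < 0;
}

// Extended check: also looks at the subsecond remainder.
constexpr bool before_epoch_sketch(std::chrono::nanoseconds since_epoch)
{
  const auto s
    = std::chrono::duration_cast<std::chrono::seconds>(since_epoch);
  const auto ns = since_epoch - s;   // remainder, same sign as the input
  return s.count() < 0 || ns.count() < 0;
}

static_assert(!seconds_part_negative(std::chrono::milliseconds(-1)),
              "the seconds part of -1ms is zero, so the old check misses it");
static_assert(before_epoch_sketch(std::chrono::milliseconds(-1)),
              "the extended check catches subsecond negative times");
static_assert(!before_epoch_sketch(std::chrono::milliseconds(1)),
              "times after the epoch are unaffected");
```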
libstdc++-v3/ChangeLog:
PR libstdc++/118093
* include/bits/atomic_futex.h (_M_load_and_test_until_impl):
Return false for times before the epoch.
* src/c++11/futex.cc (_M_futex_wait_until): Extend check for
negative times to check for subsecond times. Add unlikely
attribute.
(_M_futex_wait_until_steady): Likewise.
* testsuite/30_threads/future/members/118093.cc: New test.