Iain Sandoe [Thu, 31 Oct 2024 08:40:08 +0000 (08:40 +0000)]
c++/coroutines: Fix awaiter var creation [PR116506]
Awaiters always need to have a coroutine state frame copy since
they persist across potential supensions. It simplifies the later
analysis considerably to assign these early which we do when
building co_await expressions.
The cleanups in r15-3146-g47dbd69b1, unfortunately elided some of
processing used to cater for cases where the var created from an
xvalue, or is a pointer/reference type.
Corrected thus.
PR c++/116506
PR c++/116880
gcc/cp/ChangeLog:
* coroutines.cc (build_co_await): Ensure that xvalues are
materialised. Handle references/pointer values in awaiter
access expressions.
(is_stable_lvalue): New.
* decl.cc (cxx_maybe_build_cleanup): Handle null arg.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/pr116506.C: New test.
* g++.dg/coroutines/pr116880.C: New test.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk> Co-authored-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit 4c743798b1d4530b327dad7c606c610f3811fdbf)
Jason Merrill [Tue, 24 Dec 2024 00:57:56 +0000 (19:57 -0500)]
c++: fix conversion issues
back-port the coroutine part of this.
Some issues caught by a check from another patch:
In build_ramp_function, we're assigning REFERENCE_TYPE things, so we need to
build the assignment directly rather than rely on functions that implement
C++ semantics.
Iain Sandoe [Fri, 1 Nov 2024 23:30:58 +0000 (23:30 +0000)]
c++, coroutines:Ensure bind exprs are visited once [PR98935].
Recent changes in the C++ FE and the coroutines implementation have
exposed a latent issue in which a bind expression containing a var
that we need to capture in the coroutine state gets visited twice.
This causes an ICE (from a checking assert). Fixed by adding a pset
to the relevant tree walk. Exit the callback early when the tree is
not a BIND_EXPR.
PR c++/98935
gcc/cp/ChangeLog:
* coroutines.cc (register_local_var_uses): Add a pset to the
tree walk to avoid visiting the same BIND_EXPR twice. Make
an early exit for cases that the callback does not apply.
(cp_coroutine_transform::apply_transforms): Add a pset to the
tree walk for register_local_vars.
c++/coro: ignore cleanup_point_exprs while expanding awaits [PR116793]
If we reach a CLEANUP_POINT_EXPR while trying to walk statements, we
actually care about the statement or statement list contained within it.
Indeed, such a construction started happening with r15-3513-g964577c31df206, after temporary promotion. In the test case
presented in PR116793, the compiler generated:
... i.e. a statement list within a cleanup point. In such a case, we
don't actually care about the cleanup point, but we do care about the
statement inside, so, we can just walk down into the CLEANUP_POINT_EXPR.
PR c++/116793
gcc/cp/ChangeLog:
* coroutines.cc (await_statement_expander): Just process
subtrees if encountering a CLEANUP_POINT_EXPR.
c++: simplify handling implicit INDIRECT_REF and co_await in convert_to_void
convert_to_void has, so far, when converting a co_await expression to
void altered the await_resume expression of a co_await so that it is
also converted to void. This meant that the type of the await_resume
expression, which is also supposed to be the type of the whole co_await
expression, was not the same as the type of the CO_AWAIT_EXPR tree.
While this has not caused problems so far, it is unexpected, I think.
Also, convert_to_void had a special case when an INDIRECT_REF wrapped a
CALL_EXPR. In this case, we also diagnosed maybe_warn_nodiscard. This
was a duplication of logic related to converting call expressions to
void.
Instead, we can generalize a bit, and rather discard the expression that
was implicitly dereferenced instead.
warning: indirection will not access object of incomplete type
'volatile S' in statement
... to:
warning: implicit dereference will not access object of type
‘volatile S’ in statement
... but should have no impact in other cases.
gcc/cp/ChangeLog:
* coroutines.cc (co_await_get_resume_call): Return a tree
directly, rather than a tree pointer.
* cp-tree.h (co_await_get_resume_call): Adjust signature
accordingly.
* cvt.cc (convert_to_void): Do not alter CO_AWAIT_EXPRs when
discarding them. Simplify handling implicit INDIRECT_REFs.
Arsen Arsenović [Wed, 28 Aug 2024 19:59:18 +0000 (21:59 +0200)]
c++/coro: prevent ICV_STATEMENT diagnostics in temp promotion [PR116502]
If such a diagnostic is necessary, it has already been emitted,
otherwise, it is not correct and emitting it here is inactionable by the
user, and bogus.
PR c++/116502
gcc/cp/ChangeLog:
* coroutines.cc (maybe_promote_temps): Convert temporary
initializers to void without complaining.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/maybe-unused-1.C: New test.
* g++.dg/coroutines/pr116502.C: New test.
In examining the coroutine testcases for unexpected diagnostic output
for 'Wall', I found a 'statement has no effect' warning for the promise
construction in one case. In particular, the case is where the users
promise type has an implicit CTOR but a user-provided DTOR. Further, the
type does not actually need constructing.
In very early versions of the coroutines code we used to check
TYPE_NEEDS_CONSTRUCTING() to determine whether to attempt to build
a constructor call for the promise. During review, it was suggested
to use type_build_ctor_call () instead.
This latter call checks the constructors in the type (both user-defined
and implicit) and returns true, amongst other cases if any of the found
CTORs are marked as deprecated.
In a number of places (for example [class.copy.ctor] / 6) the standard
says that some version of an implicit CTOR is deprecated when the user
provides a DTOR.
Thus, for this specific arrangement of promise type, type_build_ctor_call
returns true, because of (for example) a deprecated implicit copy CTOR.
We are not going to use any of the deprecated CTORs and thus will not
see warnings from this - however, since the call returned true, we have
now determined that we should attempt to build a constructor call.
Note as above, the type does not actually require construction and thus
one might expect either a NULL_TREE or error_mark_node in response to
the build_special_member_call (). However, in practice the function
returns the original instance object instead of a call or some error.
When we add that as a statement it triggers the 'statement has no effect'
warning.
The patch here rearranges the promise construction/destruction code to
allow for the case that a DTOR is required independently of a CTOR. In
addition, we check that the return from build_special_member_call ()
has side effects before we add it as a statement.
gcc/cp/ChangeLog:
* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Separate the
build of promise constructor and destructor. When evaluating
the constructor, check that build_special_member_call returns
an expression with side effects before adding it.
c++, coroutines: Fix handling of bool await_suspend() [PR115905].
As noted in the PR the action of the existing implementation was to
treat a false value from await_suspend () as equivalent to "do not
suspend". Actually it needs to be the equivalent of "resume" - and
we need to restart the dispatcher - since the await_suspend() body
could have already resumed the coroutine.
See also https://github.com/cplusplus/CWG/issues/601 (NAD) for more
discussion.
Since we need to amend the await expansion and the actor build, take
the opportunity to clean up and modernise the code there. Note that
we need to make the jump back to the dispatcher without any scope
exit cleanups (so we have to use the .CO_SUSPN IFN to do this).
PR c++/115905
gcc/cp/ChangeLog:
* coroutines.cc (struct coro_aw_data): Add a member for the
restart dispatch label.
(expand_one_await_expression): Rework to modernise and to
handle the boolean await_suspend() case.
(build_actor_fn): Rework the dispatcher and allow for a jump
back to the dispatcher.
We rely on .CO_YIELD calls being followed by an assignment (optionally)
and then a switch/if in the same basic block. This implies that a
.CO_YIELD can never end a block. However, since a call to .CO_YIELD is
still a call, if the function containing it calls setjmp, GCC thinks
that the .CO_YIELD can introduce abnormal control flow, and generates an
edge for the call.
We know this is not the case; .CO_YIELD calls get removed quite early on
and have no effect, and result in no other calls, so .CO_YIELD can be
considered a leaf function, preventing generating an edge when calling
it.
Arsen Arsenović [Fri, 16 Aug 2024 17:07:01 +0000 (19:07 +0200)]
c++: don't remove labels during coro-early-expand-ifns [PR105104]
In some scenarios, it is possible for the CFG cleanup to cause one of
the labels mentioned in CO_YIELD, which coro-early-expand-ifns intends
to remove, to become part of some statement. As a result, when that
label is removed, the statement it became part of becomes invalid,
crashing the compiler.
There doesn't appear to be a reason to remove the labels (anymore, at
least), so let's not do that.
PR c++/105104
gcc/ChangeLog:
* coroutine-passes.cc (execute_early_expand_coro_ifns): Don't
remove any labels.
Iain Sandoe [Mon, 26 Aug 2024 13:09:40 +0000 (14:09 +0100)]
c++, coroutines: The frame pointer is used in the helpers [PR116482].
We have a bogus warning about the coroutine state frame pointers
being apparently unused in the resume and destroy functions. Fixed
by making the parameters DECL_ARTIFICIAL.
PR c++/116482
gcc/cp/ChangeLog:
* coroutines.cc
(coro_build_actor_or_destroy_function): Make the parameter
decls DECL_ARTIFICIAL.
Arsen Arsenović [Fri, 23 Aug 2024 18:19:05 +0000 (20:19 +0200)]
c++/coros: do not assume coros don't nest [PR113457]
In the testcase presented in the PR, during template expansion, an
tsubst of an operand causes a lambda coroutine to be processed, causing
it to get an initial suspend and final suspend. The code for assigning
awaitable var names (get_awaitable_var) assumed that the sequence Is ->
Is -> Fs -> Fs is impossible (i.e. that one could only 'open' one
coroutine before closing it at a time), and reset the counter used for
unique numbering each time a final suspend occured. This assumption is
false in a few cases, usually when lambdas are involved.
Instead of storing this counter in a static-storage variable, we can
store it in coroutine_info. This struct is local to each function, so
we don't need to worry about "cross-contamination" nor resetting.
PR c++/113457
gcc/cp/ChangeLog:
* coroutines.cc (struct coroutine_info): Add integer field
awaitable_number. This is a counter used for assigning unique
names to awaitable temporaries.
(get_awaitable_var): Use awaitable_number from coroutine_info
instead of the static int awn.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/pr113457-1.C: New test.
* g++.dg/coroutines/pr113457.C: New test.
Iain Sandoe [Mon, 19 Aug 2024 19:50:54 +0000 (20:50 +0100)]
c++, coroutines: Look through initial_await target exprs [PR110635].
In the case that the initial awaiter returns an object, the initial await
can be a target expression and we need to look at its initializer to cast
the await_resume() to void and to wrap in a compound expression that sets
the initial_await_resume_called flag.
PR c++/110635
gcc/cp/ChangeLog:
* coroutines.cc
(cp_coroutine_transform::wrap_original_function_body): Look through
initial await target expressions to find the actual co_await_expr
that we need to update.
Iain Sandoe [Sun, 18 Aug 2024 21:54:50 +0000 (22:54 +0100)]
c++, coroutines: Rework handling of throwing_cleanups [PR102051].
In the fix for PR95822 (r11-7402) we set throwing_cleanup false in the top
level of the coroutine transform code. However, as the current PR shows,
that is not sufficient.
Any use of cxx_maybe_build_cleanup() can reset the flag, which causes the
check_return_expr () logic to try to add a guard variable and set it.
For the coroutine code, we need to handle the cleanups separately, since
the responsibility for them changes after the first resume point, which
we handle in the ramp exception processing.
Fix this by forcing the "throwing_cleanup" flag false right before the
processing of the return expression.
PR c++/102051
gcc/cp/ChangeLog:
* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Handle
"throwing_cleanup" here instead of ...
(cp_coroutine_transform::apply_transforms): ... here.
We have been requiring the get_return_on_allocation_fail() call to have the
same type as the ramp. This is not intended by the standard, so relax that
to allow anything convertible to the ramp return.
PR c++/109682
gcc/cp/ChangeLog:
* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Allow for cases where
get_return_on_allocation_fail has a type convertible to the ramp
return type.
Iain Sandoe [Sat, 17 Aug 2024 14:47:58 +0000 (15:47 +0100)]
c++, coroutines: Only allow void get_return_object if the ramp is void [PR100476].
Require that the value returned by get_return_object is convertible to
the ramp return. This means that the only time we allow a void
get_return_object, is when the ramp is also a void function.
We diagnose this early to allow us to exit the ramp build if the return
values are incompatible.
PR c++/100476
gcc/cp/ChangeLog:
* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Remove special
handling of void get_return_object expressions.
c++, coroutines: check for members we use in handle_types [PR105475]
Currently, it is possible to ICE GCC by giving it sufficiently broken
code, where sufficiently broken means a std::coroutine_handle missing a
default on the promise_type template argument, and missing members.
As the code generator relies on lookups in the coroutine_handle never
failing (and has no way to signal that error), lets do it ahead of time,
save the result, and use that. This saves us some lookups and allows us
to propagate an error.
PR c++/105475 - coroutines: ICE in coerce_template_parms, at cp/pt.cc:9183
PR c++/105475
* coroutines.cc (struct coroutine_info): Add from_address.
Carries the from_address member we looked up earlier.
(coro_resume_identifier): Remove. Unused.
(coro_init_identifiers): Do not initialize the above.
(struct susp_frame_data): Remove unused members, provide a CTOR.
(void_coro_handle_address): New variable. Contains the baselink
for the std::coroutine_handle<void>::address() instance method.
(get_handle_type_address): New function. Looks up and validates
handle_type::address in a given handle_type.
(get_handle_type_from_address): New function. Looks up and
validates handle_type::from_address in a given handle_type.
(coro_promise_type_found_p): Remove reliance on
coroutine_handle<> defaulting the promise type to void. Store
get_handle_type_* results where appropriate.
(struct local_vars_frame_data): Add a CTOR.
(replace_continue): Look up expression type.
(get_coroutine_from_address): New helper. Gets the
handle_type::from_address BASELINK from a coroutine_info.
(morph_fn_to_coro): Use susp_frame_data CTOR, and make the suspend
state hash map local to the morph function. Use CTOR for
local_vars_frame_data instead of brace init.
(build_actor_fn): Use the get_coroutine_from_address helper and
void_coro_handle_address.
gcc/testsuite/ChangeLog:
PR c++/105475
* g++.dg/coroutines/pr103868.C: Add std::coroutine_handle
members we check for now.
* g++.dg/coroutines/pr105287.C: Ditto.
* g++.dg/coroutines/pr105301.C: Ditto.
* g++.dg/coroutines/pr94528.C: Ditto.
* g++.dg/coroutines/pr94879-folly-1.C: Ditto.
* g++.dg/coroutines/pr94883-folly-2.C: Ditto.
* g++.dg/coroutines/pr98118.C: Ditto.
* g++.dg/coroutines/pr105475.C: New test.
* g++.dg/coroutines/pr105475-1.C: New test.
* g++.dg/coroutines/pr105475-2.C: New test.
* g++.dg/coroutines/pr105475-3.C: New test.
* g++.dg/coroutines/pr105475-4.C: New test.
* g++.dg/coroutines/pr105475-5.C: New test.
* g++.dg/coroutines/pr105475-6.C: New test.
* g++.dg/coroutines/pr105475-broken-spec.C: New test.
* g++.dg/coroutines/pr105475-broken-spec-2.C: New test.
... the coroutine promise type in each statement is always
std::coroutine_handle<task>::promise_type, and all of the operands are
not type-dependent, so we can always compute the resulting types (and
expected types) of these expressions and statements.
Also, when we do not know the type of the CO_AWAIT_EXPR or
CO_YIELD_EXPR, we now return NULL_TREE as the type rather than
unknown_type_node. This is more correct, since the type is not unknown,
it just isn't determined yet. This also means we can remove the
CO_AWAIT_EXPR and CO_YIELD_EXPR special-cases from
type_dependent_expression_p.
PR c++/112341 - error: insufficient contextual information to determine type on co_await result in function template
gcc/cp/ChangeLog:
PR c++/112341
* coroutines.cc (struct coroutine_info): Also cache the
traits type.
(ensure_coro_initialized): New function. Makes sure we have
initialized the coroutine state successfully, or informs the
caller should it fail to do so. Extracted from
coro_promise_type_found_p.
(coro_get_traits_class): New function. Gets the (cached)
coroutine traits type for a given coroutine. Extracted from
coro_promise_type_found_p and refactored to cache the result.
(coro_promise_type_found_p): Use the two functions above.
(build_template_co_await_expr): New function. Builds a
CO_AWAIT_EXPR representing a CO_AWAIT_EXPR in a template
declaration.
(build_co_await): Use the above if processing_template_decl, and
give it a proper type.
(coro_dependent_p): New function. Returns true iff its
argument is a type-dependent expression OR the current functions
traits class is type dependent.
(finish_co_await_expr): Defer expansion only in the case
coro_dependent_p returns true.
(finish_co_yield_expr): Ditto.
(finish_co_return_stmt): Ditto.
* pt.cc (type_dependent_expression_p): Do not treat
CO_AWAIT/CO_YIELD specially.
gcc/testsuite/ChangeLog:
PR c++/112341
* g++.dg/coroutines/pr112341-2.C: New test.
* g++.dg/coroutines/pr112341-3.C: New test.
* g++.dg/coroutines/torture/co-yield-03-tmpl-nondependent.C: New
test.
* g++.dg/coroutines/pr112341.C: New test.
c++: diagnose usage of co_await and co_yield in default args [PR115906]
This is a partial fix for PR115906. Per [expr.await] 2s3, "An
await-expression shall not appear in a default argument
([dcl.fct.default])". This patch introduces the diagnostic in that
case, and in the case of a co_yield (as co_yield is defined in terms of
co_await, so prerequisites of co_await hold).
PR c++/115906 - [coroutines] missing diagnostic and ICE when co_await used as default argument in function declaration
gcc/cp/ChangeLog:
PR c++/115906
* parser.cc (cp_parser_unary_expression): Reject await
expressions if use of local variables is currently forbidden.
(cp_parser_yield_expression): Reject yield expressions if use of
local variables is currently forbidden.
gcc/testsuite/ChangeLog:
PR c++/115906
* g++.dg/coroutines/pr115906-yield.C: New test.
* g++.dg/coroutines/pr115906.C: New test.
* g++.dg/coroutines/co-await-syntax-02-outside-fn.C: Don't rely
on default arguments.
* g++.dg/coroutines/co-yield-syntax-01-outside-fn.C: Ditto.
c++: fix ICE on FUNCTION_DECLs inside coroutines [PR115906]
When register_local_var_uses iterates a BIND_EXPRs BIND_EXPR_VARS, it
fails to account for the fact that FUNCTION_DECLs might be present, and
later passes it to DECL_HAS_VALUE_EXPR_P. This leads to a tree check
failure in DECL_HAS_VALUE_EXPR_P:
tree check: expected var_decl or parm_decl or result_decl, have
function_decl in register_local_var_uses
We only care about PARM_DECL and VAR_DECL, so select only those.
PR c++/115906 - [coroutines] missing diagnostic and ICE when co_await used as default argument in function declaration
gcc/cp/ChangeLog:
PR c++/115906
* coroutines.cc (register_local_var_uses): Only process
PARM_DECL and VAR_DECLs.
gcc/testsuite/ChangeLog:
PR c++/115906
* g++.dg/coroutines/coro-function-decl.C: New test.
cp/coroutines: do not rewrite parameters in unevaluated contexts
It is possible to use parameters of a parent function of a lambda in
unevaluated contexts without capturing them. By not capturing them, we
work around the usual mechanism we use to prevent rewriting captured
parameters. Prevent this by simply skipping rewrites in unevaluated
contexts. Those won't mind the value not being present anyway.
This prevents an ICE during parameter substitution. In the testcase
from the PR, the rewriting machinery finds a param in the body of the
coroutine, which it did not previously encounter while processing the
coroutine declaration, and that does not have a DECL_VALUE_EXPR, and
fails.
Iain Sandoe [Sat, 15 Jun 2024 16:47:33 +0000 (17:47 +0100)]
c++, coroutines, contracts: Handle coroutine and void functions [PR110871,PR110872,PR115434].
The current implementation of contracts emits the checks into function
bodies in three places; for pre-conditions at the start of the body,
for asserts in-line in the function body and for post-conditions as an
addition to return statements.
In general (at least with existing "2a" contract semantics) the in-line
contract asserts behave as expected.
However, the mechanism is not applicable to:
* Handling pre conditions in coroutines since, for those, the standard
specifies a wrapping of the original function body by functionality
implementing initial and final suspends (along with some housekeeping
to route exceptions). Thus for such transformed function bodies, the
preconditions then get actioned after the initial suspend, which does
not behave as intended.
* Handling post conditions in functions that do not have return
statements (which applies to coroutines and void functions).
In the following, we identify a potentially transformed function body
(in the case of coroutines, this is usually called the "ramp()" function).
The patch here re-implements the code insertion in one of the two
following ways (code for exposition only):
* For functions with no post-conditions we wrap the potentially
transformed function as follows:
This implements the intent that the preconditions are processed after
the function parameters are initialised but before any other actions.
* For functions with post-conditions:
if (preconditions_exist)
handle_pre_condition_checking ();
try
{
potentially_transformed_function_body ();
}
finally
{
handle_post_condition_checking ();
}
else [only if the function is not marked noexcept(true) ]
{
;
}
In this, post-conditions [that might apply to the return value etc.]
are evaluated on every non-exceptional edge out of the function.
At present, the model here is that exceptions thrown by the function
propagate upwards as if there were no contracts present. If the desired
semantic becomes that an exception is counted as equivalent to a contract
violation - then we can add a second handler in place of the empty
statement.
This patch specifically does not address changes to code-gen and constexpr
handling that are contained in P2900.
PR c++/115434
PR c++/110871
PR c++/110872
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_constant_expression): Handle EH_ELSE_EXPR.
* contracts.cc (finish_contract_attribute): Remove excess line.
(build_contract_condition_function): Post condition handlers are
void now.
(emit_postconditions_cleanup): Remove.
(emit_postconditions): New.
(add_pre_condition_fn_call): New.
(add_post_condition_fn_call): New.
(apply_preconditions): New.
(apply_postconditions): New.
(maybe_apply_function_contracts): New.
(apply_postcondition_to_return): Remove.
* contracts.h (apply_postcondition_to_return): Remove.
(maybe_apply_function_contracts): Add.
* coroutines.cc (coro_build_actor_or_destroy_function): Do not
copy contracts to coroutine helpers.
* decl.cc (finish_function): Handle wrapping a possibly
transformed function body in contract checks.
* typeck.cc (check_return_expr): Remove handling of post
conditions on return expressions.
gcc/ChangeLog:
* gimplify.cc (struct gimplify_ctx): Add a flag to show we are
expending a handler.
(gimplify_expr): When we are expanding a handler, and the body
transforms might have re-written DECL_RESULT into a gimple var,
ensure that hander references to DECL_RESULT are also re-written
to refer to the gimple var. When we are processing an EH_ELSE
expression, then add it if either of the cleanup slots is in
use.
gcc/testsuite/ChangeLog:
* g++.dg/contracts/pr115434.C: New test.
* g++.dg/coroutines/pr110871.C: New test.
* g++.dg/coroutines/pr110872.C: New test.
Iain Sandoe [Sun, 29 Dec 2024 23:06:54 +0000 (23:06 +0000)]
includes, Darwin: Handle modular use for macOS SDKs [PR116827].
Recent changes to the OS SDKs have altered the way in which include guards
are used for a number of headers when C++ modules are enabled. Instead of
placing the guards in the included header, they are being placed in the
including header. This breaks the assumptions in the current GCC stddef.h
specifically, that the presence of __PTRDIFF_T and __SIZE_T means that the
relevant defs are already made. However in the case of the module-enabled
C++ with these SDKs, that is no longer true.
stddef.h has a large body of special-cases already, but it seems that the
only viable solution here is to add a new one specifically for __APPLE__
and modular code.
This fixes around 280 new fails in the modules test-suite; it is needed on
all open branches that support modules.
PR target/116827
gcc/ChangeLog:
* ginclude/stddef.h: Undefine __PTRDIFF_T and __SIZE_T for module-
enabled c++ on Darwin/macOS platforms.
Iain Sandoe [Mon, 10 Mar 2025 08:44:41 +0000 (08:44 +0000)]
testsuite, gm2: Use -B option for libstdc++ where required.
We need to add testsuite options to locate gm2 libs and libstdc++.
Usually '-L' options are added to point to the relevant directories for
the uninstalled libraries.
In cases where libraries are available as both shared and convenience some
additional checks are made.
For some targets -static-xxxx options are handled by specs substitution and
need a '-B' option rather than '-L'. For Darwin, when embedded runpaths are
in use (the default for all versions after macOS 10.11), '-B' is also needed
to provide the runpath.
When '-B' is used, this results in a '-L' for each path that exists (so that
appending a '-L' as well is a needless duplicate). There are also cases
where tools warn for duplicates, leading to spurious fails.
Therefore the objective of the code here is to add just one '-L' or '-B' for
each of the libraries.
Currently, we are forcing the full paths to each of the gm2 convenience libs
onto the link line and therefore the B/L logic is not needed there. It would
need to be added if/when gm2 is tested with shared libraries
gcc/testsuite/ChangeLog:
* lib/gm2.exp: Arrange for a '-B' option to be added for the
libstdc++ paths on targets that need it.
Iain Sandoe [Sun, 9 Mar 2025 09:24:34 +0000 (09:24 +0000)]
Darwin: Pass -macos_version_min to the linker [PR119172].
For binaries to be notarised, the SDK version must be available.
Since we do not, at present, parse this information we have been
passing "0.0" to ld64. This now results in a warning and a fail
to notarise. As a quick-fix, we can fall back to letting ld64
figure out the SDK version (which it does for -macos_version_min).
* config.in: Regenerate.
* config/darwin.h (DARWIN_PLATFORM_ID): Add the option to
use -macos_version_min where available.
* configure: Regenerate.
* configure.ac: Check for ld64 support of -macos_version_min.
Co-authored-by: Andrew Pinski <quic_apinski@quicinc.com> Signed-off-by: Iain Sandoe <iain@sandoe.co.uk> Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Mark Mentovai [Tue, 24 Sep 2024 20:11:14 +0000 (16:11 -0400)]
libgcc, Darwin: Drop the legacy library build for macOS >= 10.12 [PR116809].
From macOSX15 SDK, the unwinder no longer exports some of the symbols used
in that library which (a) causes bootstrap fail and (b) means that the
legacy library is no longer useful.
No open branch of GCC emits references to this library - and any already
-built code that depends on the symbols would need rework anyway.
We have been asked to extend this back to the earliest OS vesion supported
by the SDK (10.12).
PR target/116809
libgcc/ChangeLog:
* config.host: Build legacy libgcc_s.1 on hosts before macOS 10.12.
* config/i386/t-darwin: Remove reference to legacy libgcc_s.1
* config/rs6000/t-darwin: Likewise.
* config/t-darwin-libgccs1: New file.
Harald Anlauf [Tue, 15 Apr 2025 18:43:05 +0000 (20:43 +0200)]
Fortran: pure subroutine with pure procedure as dummy [PR106948]
PR fortran/106948
gcc/fortran/ChangeLog:
* resolve.cc (gfc_pure_function): If a function has been resolved,
but esym is not yet set, look at its attributes to see whether it
is pure or elemental.
In r15-123 and r14-11434 we unconditionally set processing_template_decl
when substituting the context of an UNBOUND_CLASS_TEMPLATE, in order to
handle instantiation of the dependently scoped friend declaration
template<int N>
template<class T>
friend class A<N>::B;
where the scope A<N> remains dependent after instantiation. But this
turns out to misbehave for the UNBOUND_CLASS_TEMPLATE in the below
testcase representing
g<[]{}>::template fn
since with the flag set substituting the args of test3 into the lambda
causes us to defer the substitution and yield a lambda that still looks
dependent, which in turn makes g<[]{}> still dependent and not suitable
for qualified name lookup.
This patch restricts setting processing_template_decl during
UNBOUND_CLASS_TEMPLATE substitution to the case where there are multiple
levels of introduced template parameters, as in the friend declaration.
(This means we need to substitute the template parameter list(s) first,
which makes sense since they lexically appear first.)
PR c++/119981
PR c++/119378
gcc/cp/ChangeLog:
* pt.cc (tsubst) <case UNBOUND_CLASS_TEMPLATE>: Substitute
into template parameter list first. When substituting the
context, only set processing_template_decl if there's more
than one level of introduced template parameters.
Eric Botcazou [Wed, 30 Apr 2025 10:41:36 +0000 (12:41 +0200)]
Fix GNAT build failure for x86/FreeBSD
gcc/ada/
PR ada/112958
* Makefile.rtl (LIBGNAT_TARGET_PAIRS) [x86 FreeBSD]: Add specific
version of s-dorepr.adb.
* libgnat/s-dorepr__freebsd.adb: New file.
AVR: target/119989 - Add missing clobbers to xload_<mode>_libgcc.
libgcc's __xload_1...4 is clobbering Z (and also R21 is some cases),
but avr.md had clobbers of respective GPRs only up to reload.
Outcome was that code reading from the same __memx address twice
could be wrong. This patch adds respective clobbers.
gcc/
* config/avr/avr.md (xload_<mode>_libgcc): Clobber R21, Z.
gcc/testsuite/
* gcc.target/avr/torture/pr119989.h: New file.
* gcc.target/avr/torture/pr119989-memx-1.c: New test.
* gcc.target/avr/torture/pr119989-memx-2.c: New test.
* gcc.target/avr/torture/pr119989-memx-3.c: New test.
* gcc.target/avr/torture/pr119989-memx-4.c: New test.
Adapt testsuite v3_target_compile to strip version namespace from compiler
output so that dg-error and dg-warning directives do not need to consider it.
libstdc++-v3/ChangeLog:
* testsuite/lib/libstdc++.exp (v3_target_compile): Strip version namespace
from compiler output.
* testsuite/20_util/function/cons/70692.cc: Remove now useless __8 namespace
pattern.
* testsuite/23_containers/map/48101_neg.cc: Likewise.
* testsuite/23_containers/multimap/48101_neg.cc: Likewise.
RISC-V: vsetvl: elide abnormal edges from LCM computations [PR119533]
vsetvl phase4 uses LCM guided info to insert VSETVL insns, including a
straggler loop for "mising vsetvls" on certain edges. Currently it
asserts on encountering EDGE_ABNORMAL.
When enabling go frontend with V enabled, libgo build hits the assert.
The solution is to prevent abnormal edges from getting into LCM at all
(my prior attempt at this just ignored them after LCM which is not
right). Existing invalid_opt_bb_p () current does this for BB predecessors
but not for successors which is what the patch adds.
Crucially, the ICE/fix also depends on avoiding vsetvl hoisting past
non-transparent blocks: That is taken care of by Robin's patch
"RISC-V: Do not lift up vsetvl into non-transparent blocks [PR119547]"
for a different yet related issue.
Reported-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
PR target/119533
Robin Dapp [Fri, 4 Apr 2025 15:06:44 +0000 (17:06 +0200)]
RISC-V: Do not lift up vsetvl into non-transparent blocks [PR119547].
When lifting up a vsetvl into a block we currently don't consider the
block's transparency with respect to the vsetvl as in other parts of the
pass. This patch does not perform the lift when transparency is not
guaranteed.
This condition is more restrictive than necessary as we can still
perform a vsetvl lift if the conflicting register is only every used
in vsetvls and no regular insns but given how late we are in the GCC 15
cycle it seems better to defer this. Therefore
gcc.target/riscv/rvv/vsetvl/avl_single-68.c is XFAILed for now.
This issue was found in OpenCV where it manifests as a runtime error.
Zhijin Zeng debugged PR119547 and provided an initial patch.
It turns out the reason the behavior of this testcase changed after CWG
2369 is because validity of the substituted return type is now checked
later, after constraints. So a more reliable workaround for this issue
is to add a constraint to check the validity of the return type earlier,
matching the pre-CWG 2369 semantics.
PR libstdc++/104606
libstdc++-v3/ChangeLog:
* include/std/optional (operator<=>): Revert r14-9771 change.
Add constraint checking the validity of the return type
compare_three_way_result_t before the three_way_comparable_with
constraint.
Patrick Palka [Tue, 25 Feb 2025 18:35:04 +0000 (13:35 -0500)]
libstdc++: Implement LWG 4027 change to possibly-const-range [PR118083]
LWG 4027 effectively makes the const range access CPOs ranges::cfoo behave
more consistently across C++23 and C++20 (pre-P2278R4) and also more
consistently with the std::cfoo range accessors, as the below testcase
adjustments demonstrate (which mostly consist of reverting workarounds
added by r14-3771-gf12e26f3496275 and r13-7186-g0d94c6df183375).
In passing fix PR118083 which reports that the input_range constraint on
possibly-const-range is missing in our implementation. A consequence of
this is that the const range access CPOs now consistently reject a non-range
argument, and so in some our of tests we need to introduce otherwise
unused begin/end members.
PR libstdc++/118083
libstdc++-v3/ChangeLog:
* include/bits/ranges_base.h
(ranges::__access::__possibly_const_range): Adjust logic as per
LWG 4027. Add missing input_range constraint.
* testsuite/std/ranges/access/cbegin.cc (test05): Verify LWG
4027 testcases.
* testsuite/std/ranges/access/cdata.cc: Adjust, simplify and
consolidate some tests after the above.
* testsuite/std/ranges/access/cend.cc: Likewise.
* testsuite/std/ranges/access/crbegin.cc: Likewise.
* testsuite/std/ranges/access/crend.cc: Likewise.
* testsuite/std/ranges/adaptors/join.cc: Likewise.
* testsuite/std/ranges/adaptors/take_while.cc: Likewise.
* testsuite/std/ranges/adaptors/transform.cc: Likewise.
When remapping existing specializations of a hidden template friend from
a previous declaration to the new definition, we must remap only those
specializations that match this new definition, but currently we
remap all specializations (since they all appear in the same
DECL_TEMPLATE_INSTANTIATIONS list of the most general template).
Concretely, in the first testcase below, we form two specializations of
the friend A::f, one with arguments {{0},{bool}} and another with
arguments {{1},{bool}}. Later when instantiating B, we need to remap
these specializations. During the B<0> instantiation we only want to
remap the first specialization, and during the B<1> instantiation only
the second specialization, but currently we remap both specializations
twice.
tsubst_friend_function needs to determine if an existing specialization
matches the shape of the new definition, which is tricky in general,
e.g. if the outer template parameters may not match up. Fortunately we
don't have to reinvent the wheel here since is_specialization_of_friend
seems to do exactly what we need. We can check this unconditionally,
but I think it's only necessary when dealing with specializations formed
from a class template scope previous declaration, hence the
TMPL_ARGS_HAVE_MULTIPLE_LEVELS check.
PR c++/119807
PR c++/112288
gcc/cp/ChangeLog:
* pt.cc (tsubst_friend_function): Skip remapping an
existing specialization if it doesn't match the shape of
the new friend definition.
gcc/testsuite/ChangeLog:
* g++.dg/template/friend86.C: New test.
* g++.dg/template/friend87.C: New test.
Jason Merrill [Wed, 16 Apr 2025 15:15:14 +0000 (11:15 -0400)]
c++: format attribute redeclaration [PR116954]
Here when merging the two decls, remove_contract_attributes loses
ATTR_IS_DEPENDENT on the format attribute, so apply_late_template_attributes
just returns, so the attribute doesn't get propagated to the type where the
warning looks for it.
Fixed by using copy_node instead of tree_cons to preserve flags.
PR c++/116954
gcc/cp/ChangeLog:
* contracts.cc (remove_contract_attributes): Preserve flags
on the attribute list.
Jason Merrill [Sat, 12 Apr 2025 15:35:18 +0000 (11:35 -0400)]
c++: shortcut constexpr vector ctor [PR113835]
Since std::vector became usable in constant evaluation in C++20, a vector
variable with static storage duration might be manifestly
constant-evaluated, so we properly try to constant-evaluate its initializer.
But it can never succeed since the result will always refer to the result of
operator new, so trying is a waste of time. Potentially a large waste of
time for a large vector, as in the testcase in the PR.
So, let's recognize this case and skip trying constant-evaluation. I do
this only for the case of an integer argument, as that's the case that's
easy to write but slow to (fail to) evaluate.
In the test, I use dg-timeout-factor to lower the default timeout from 300
seconds to 15; on my laptop, compilation without the patch takes about 20
seconds versus about 2 with the patch.
is_std_class comes from r15-4953.
PR c++/113835
gcc/cp/ChangeLog:
* cp-tree.h (is_std_class): Declare.
* constexpr.cc (is_std_class): New function.
(is_std_allocator): Use it.
(cxx_eval_outermost_constant_expr): Bail out early
for std::vector(N).
Here the basic blocks are rotated and a not is generated.
But the generated not is unmasked (or predicated over an ALL true mask in this
case). This has the unintended side-effect of flipping the results of the
inactive lanes (which were zero'd by the cmpgt) into -1. Which then incorrectly
causes us to not take the branch to .L2.
This is happening because we're not comparing against the right value for the
forall case. This patch gets rid of the forall case by rewriting the
if(all(mask)) into if (!all(mask)) which is the same as if (any(~mask)) by
negating the masks and flipping the branches.
1. For unmasked loops we simply reduce the ~mask.
2. For masked loops we reduce (~mask & loop_mask) which is the same as
doing (mask & loop_mask) ^ loop_mask.
Tamar Christina [Mon, 28 Apr 2025 11:58:37 +0000 (12:58 +0100)]
aarch64: force operand to fresh register to avoid subreg issues [PR118892]
When the input is already a subreg and we try to make a paradoxical
subreg out of it for copysign this can fail if it violates the subreg
relationship.
Use force_lowpart_subreg instead of lowpart_subreg to then force the
results to a register instead of ICEing.
gcc/ChangeLog:
PR target/118892
* config/aarch64/aarch64.md (copysign<GPF:mode>3): Use
force_lowpart_subreg instead of lowpart_subreg.
gcc/testsuite/ChangeLog:
PR target/118892
* gcc.target/aarch64/copysign-pr118892.c: New test.
Martin Jambor [Mon, 7 Apr 2025 11:32:10 +0000 (13:32 +0200)]
sra: Clear grp_same_access_path of acesses created by total scalarization (PR118924)
During analysis of PR 118924 it was discussed that total scalarization
invents access paths (strings of COMPONENT_REFs and possibly even
ARRAY_REFs) which did not exist in the program before which can have
unintended effects on subsequent AA queries. Although not doing that
does not mean that SRA cannot create such situations (see the bug for
more info), it has been agreed that not doing this is generally better.
This patch therfore makes SRA fall back on creating simple MEM_REFs when
accessing components of an aggregate corresponding to what a SRA
variable now represents.
gcc/ChangeLog:
2025-03-26 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/118924
* tree-sra.cc (create_total_scalarization_access): Set
grp_same_access_path flag to zero.
Martin Jambor [Mon, 7 Apr 2025 11:32:09 +0000 (13:32 +0200)]
sra: Avoid creating TBAA hazards (PR118924)
The testcase in PR 118924, when compiled on Aarch64, contains an
gimple aggregate assignment statement in between different types which
are types_compatible_p but behave differently for the purposes of
alias analysis.
SRA replaces the statement with a series of scalar assignments which
however have LHSs access chains modeled on the RHS type and so do not
alias with a subsequent reads and so are DSEd.
SRA clearly gets its "same_access_path" logic subtly wrong. One issue
is that the same_access_path_p function probably should be implemented
more along the lines of (parts of ao_compare::compare_ao_refs) instead
of internally relying on operand_equal_p. That is however not the
problem in the PR and so I will deal with it only later.
The issue here is that even when the access path is the same, it must
not be bolted on an aggregate type that does not match. This patch
does that, taking just one simple function from the
ao_compare::compare_ao_refs machinery and using it to detect the
situation. The rest is just merging the information in between
accesses of the same access group.
I looked at how many times we come across such assignment during
"make stage2-bubble" of GCC (configured with only c and C++ and
without multilib and libsanitizers) and on an x86_64 there were 87924
such assignments (though now I realize not all of them had to be
aggregate), so they do happen. The patch leads to about 5% increase
of cases where we don't use an "access path" but resort to a
MEM_REF (from 90209 to 95204). On an Aarch64, there were 92268 such
assignments and the increase of falling back to MEM_REFs was by
4% (but from a bigger base 132983 to 107991).
gcc/ChangeLog:
2025-04-04 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/118924
* tree-ssa-alias-compare.h (types_equal_for_same_type_for_tbaa_p):
Declare.
* tree-ssa-alias.cc: Include ipa-utils.h.
(types_equal_for_same_type_for_tbaa_p): New public overloaded variant.
* tree-sra.cc: Include tree-ssa-alias-compare.h.
(create_access): Initialzie grp_same_access_path to true.
(build_accesses_from_assign): Detect tbaa hazards and clear
grp_same_access_path fields of involved accesses when they occur.
(sort_and_splice_var_accesses): Take previous values of
grp_same_access_path into account.
gcc/testsuite/ChangeLog:
2025-03-25 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/118924
* g++.dg/tree-ssa/pr118924.C: New test.
Since r12-5426 apply_late_template_attributes suppresses various global
state to avoid applying active pragmas to earlier declarations; we also
need to override target_option_current_node.
PR c++/114772
PR c++/101180
gcc/cp/ChangeLog:
* pt.cc (apply_late_template_attributes): Also override
target_option_current_node.
Jeremy Bettis [Thu, 17 Apr 2025 04:00:00 +0000 (21:00 -0700)]
libcpp: Fix incorrect line numbers in large files [PR108900]
This patch addresses an issue in the C preprocessor where incorrect
line number information is generated when processing files with a
large number of lines. The problem arises from improper handling
of location intervals in the line map, particularly when locations
exceed LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES.
By ensuring that the highest location is not decremented if it
would move to a different ordinary map, this fix resolves
the line number discrepancies observed in certain test cases.
This change improves the accuracy of line number reporting, benefiting
users relying on precise code coverage and debugging information.
Tested x86_64-linux.
libcpp/ChangeLog:
PR preprocessor/108900
* files.cc (_cpp_stack_file): Do not decrement highest_location
across distinct maps.
Signed-off-by: Jeremy Bettis <jbettis@google.com> Signed-off-by: Yash Shinde <Yash.Shinde@windriver.com>
(cherry picked from commit d9b56c65a2697e0d7a6c0f15f1977803dc94579b)
Jason Merrill [Tue, 15 Apr 2025 15:23:57 +0000 (11:23 -0400)]
c++: constexpr, trivial, and non-alias target [PR111075]
On Darwin and other targets with !can_alias_cdtor, we instead go to
maybe_thunk_ctor, which builds a thunk function that calls the general
constructor. And then cp_fold tries to constant-evaluate that call, and we
ICE because we don't expect to ever be asked to constant-evaluate a call to
a trivial function.
No new test because this fixes g++.dg/torture/tail-padding1.C on affected
targets.
PR c++/111075
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_call_expression): Allow trivial
call from a thunk.
Richard Biener [Mon, 14 Apr 2025 09:42:18 +0000 (11:42 +0200)]
tree-optimization/119778 - properly mark abnormal edge sources during inlining
When inlining a call that abnormally transfers control-flow we make
all inlined calls that can possibly transfer abnormal control-flow
do so as well. But we failed to mark the calls as altering
control-flow. This results in inconsistent behavior later and
possibly wrong-code (we'd eventually prune those edges).
PR tree-optimization/119778
* tree-inline.cc (copy_edges_for_bb): Mark calls that are
source of abnormal edges as altering control-flow.
Jennifer Schmitz [Thu, 10 Apr 2025 13:46:15 +0000 (06:46 -0700)]
aarch64: Add test case.
This patch adds a test case to the testsuite for PR119706.
The bug was already fixed by
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680573.html.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/testsuite/
PR tree-optimization/119706
* g++.target/aarch64/sve/pr119706.C: New test.
Richard Biener [Thu, 10 Apr 2025 11:30:42 +0000 (13:30 +0200)]
middle-end/119706 - allow POLY_INT_CST as is_gimple_mem_ref_addr
We currently only INTEGER_CST, but not POLY_INT_CST, which leads
to the situation that when the POLY_INT_CST is only indrectly
present via a SSA def the IL is valid but when propagated it's not.
That's unsustainable.
PR middle-end/119706
* gimple-expr.cc (is_gimple_mem_ref_addr): Also allow
POLY_INT_CST.
Richard Biener [Mon, 31 Mar 2025 12:56:25 +0000 (14:56 +0200)]
target/119549 - fixup handling of -mno-sse4 in target attribute
The following fixes ix86_valid_target_attribute_inner_p to properly
handle target("no-sse4") via OPT_mno_sse4 rather than as unset OPT_msse4.
I've added asserts to ix86_handle_option that RejectNegative is honored
for both.
PR target/119549
* common/config/i386/i386-common.cc (ix86_handle_option):
Assert that both OPT_msse4 and OPT_mno_sse4 are never unset.
* config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
Process negated OPT_msse4 as OPT_mno_sse4.
d: Fix ICE: type variant differs by TYPE_MAX_VALUE with -g [PR119826]
Forward referenced enum types were never fixed up after the main
ENUMERAL_TYPE was finished. All flags set are now propagated to all
variants after its mode, size, and alignment has been calculated.
PR d/119826
gcc/d/ChangeLog:
* types.cc (TypeVisitor::visit (TypeEnum *)): Propagate flags of main
enum types to all forward-referenced variants.
gcc/testsuite/ChangeLog:
* gdc.dg/debug/imports/pr119826b.d: New test.
* gdc.dg/debug/pr119826.d: New test.
d: Fix ICE in dwarf2out_imported_module_or_decl, at dwarf2out.cc:27676 [PR119817]
The ImportVisitor method for handling the importing of overload sets was
pushing NULL_TREE to the array of import decls, which in turn got passed
to `debug_hooks->imported_module_or_decl', triggering the observed
internal compiler error.
NULL_TREE is returned from `build_import_decl' when the symbol was
ignored for being non-trivial to represent in debug, for example,
template or tuple declarations. So similarly "skip" adding the symbol
when this is the case for overload sets too.
PR d/119817
gcc/d/ChangeLog:
* imports.cc (ImportVisitor::visit (OverloadSet *)): Don't push
NULL_TREE to vector of import symbols.
gcc/testsuite/ChangeLog:
* gdc.dg/debug/imports/m119817/a.d: New test.
* gdc.dg/debug/imports/m119817/b.d: New test.
* gdc.dg/debug/imports/m119817/package.d: New test.
* gdc.dg/debug/pr119817.d: New test.
d: Fix forward referenced enums missing type names in debug info [PR118309]
Calling `rest_of_type_compilation' as the D types were built meant that
debug info was being emitted before all forward references were
resolved, resulting in DW_AT_name's to be missing.
Instead, defer outputting type debug information until all modules have
been parsed and generated in `d_finish_compilation'.
PR d/118309
gcc/d/ChangeLog:
* modules.cc: Include debug.h
(d_finish_compilation): Call debug_hooks->type_decl on all TYPE_DECLs.
* types.cc: Remove toplev.h include.
(finish_aggregate_type): Don't call rest_of_type_compilation or
rest_of_decl_compilation on type.
(TypeVisitor::visit (TypeEnum *)): Likewise.
Jakub Jelinek [Wed, 16 Apr 2025 15:22:49 +0000 (17:22 +0200)]
libatomic: Fix up libat_{,un}lock_n for mingw [PR119796]
Here is just a port of the previously posted patch to mingw which
clearly has the same problems.
2025-04-16 Jakub Jelinek <jakub@redhat.com>
PR libgcc/101075
PR libgcc/119796
* config/mingw/lock.c (libat_lock_n, libat_unlock_n): Start with
computing how many locks will be needed and take into account
((uintptr_t)ptr % WATCH_SIZE). If some locks from the end of the
locks array and others from the start of it will be needed, first
lock the ones from the start followed by ones from the end.
Jakub Jelinek [Wed, 16 Apr 2025 15:21:39 +0000 (17:21 +0200)]
libatomic: Fix up libat_{,un}lock_n [PR119796]
As mentioned in the PR (and I think in PR101075 too), we can run into
deadlock with libat_lock_n calls with larger n.
As mentioned in PR66842, we use multiple locks (normally 64 mutexes
for each 64 byte cache line in 4KiB page) and currently can lock more
than one lock, in particular for n [0, 64] a single lock, for n [65, 128]
2 locks, for n [129, 192] 3 locks etc.
There are two problems with this:
1) we can deadlock if there is some wrap-around, because the locks are
acquired always in the order from addr_hash (ptr) up to
locks[NLOCKS-1].mutex and then if needed from locks[0].mutex onwards;
so if e.g. 2 threads perform libat_lock_n with n = 2048+64, in one
case at pointer starting at page boundary and in another case at
page boundary + 2048 bytes, the first thread can lock the first
32 mutexes, the second thread can lock the last 32 mutexes and
then first thread wait for the lock 32 held by second thread and
second thread wait for the lock 0 held by the first thread;
fixed below by always locking the locks in order of increasing
index, if there is a wrap-around, by locking in 2 loops, first
locking some locks at the start of the array and second at the
end of it
2) the number of locks seems to be determined solely depending on the
n value, I think that is wrong, we don't know the structure alignment
on the libatomic side, it could very well be 1 byte aligned struct,
and so how many cachelines are actually (partly or fully) occupied
by the atomic access depends not just on the size, but also on
ptr % WATCH_SIZE, e.g. 2 byte structure at address page_boundary+63
should IMHO lock 2 locks because it occupies the first and second
cacheline
Note, before this patch it locked exactly one lock for n = 0, while
with this patch it could lock either no locks at all (if it is at cacheline
boundary) or 1 (otherwise).
Dunno of libatomic APIs can be called for zero sizes and whether
we actually care that much how many mutexes are locked in that case,
because one can't actually read/write anything into zero sized memory.
If you think it is important, I could add else if (nlocks == 0) nlocks = 1;
in both spots.
2025-04-16 Jakub Jelinek <jakub@redhat.com>
PR libgcc/101075
PR libgcc/119796
* config/posix/lock.c (libat_lock_n, libat_unlock_n): Start with
computing how many locks will be needed and take into account
((uintptr_t)ptr % WATCH_SIZE). If some locks from the end of the
locks array and others from the start of it will be needed, first
lock the ones from the start followed by ones from the end.
Jakub Jelinek [Wed, 16 Apr 2025 07:11:06 +0000 (09:11 +0200)]
bitintlower: Fix interaction of gimple_assign_copy_p stmts vs. has_single_use [PR119808]
The following testcase is miscompiled, because we emit a CLOBBER in a place
where it shouldn't be emitted.
Before lowering we have:
b_5 = 0;
b.0_6 = b_5;
b.1_1 = (unsigned _BitInt(129)) b.0_6;
...
<retval> = b_5;
The bitint coalescing assigns the same partition/underlying variable
for both b_5 and b.0_6 (possible because there is a copy assignment)
and of course a different one for b.1_1 (and other SSA_NAMEs in between).
This is -O0 so stmts aren't DCEd and aren't propagated that much etc.
It is -O0 so we also don't try to optimize and omit some names from m_names
and handle multiple stmts at once, so the expansion emits essentially
bitint.4 = {};
bitint.4 = bitint.4;
bitint.2 = cast of bitint.4;
bitint.4 = CLOBBER;
...
<retval> = bitint.4;
and the CLOBBER is the problem because bitint.4 is still live afterwards.
We emit the clobbers to improve code generation, but do it only for
(initially) has_single_use SSA_NAMEs (remembered in m_single_use_names)
being used, if they don't have the same partition on the lhs and a few
other conditions.
The problem above is that b.0_6 which is used in the cast has_single_use
and so was in m_single_use_names bitmask and the lhs in that case is
bitint.2, so a different partition. But there is gimple_assign_copy_p
with SSA_NAME rhs1 and the partitioning special cases those and while
b.0_6 is single use, b_5 has multiple uses. I believe this ought to be
a problem solely in the case of such copy stmts and its special case
by the partitioning, if instead of b.0_6 = b_5; there would be
b.0_6 = b_5 + 1; or whatever other stmts that performs or may perform
changes on the value, partitioning couldn't assign the same partition
to b.0_6 and b_5 if b_5 is used later, it couldn't have two different
(or potentially different) values in the same bitint.N var. With
copy that is possible though.
So the following patch fixes it by being more careful when we set
m_single_use_names, don't set it if it is a has_single_use SSA_NAME
but SSA_NAME_DEF_STMT of it is a copy stmt with SSA_NAME rhs1 and that
rhs1 doesn't have single use, or has_single_use but SSA_NAME_DEF_STMT of it
is a copy stmt etc.
Just to make sure it doesn't change code generation too much, I've gathered
statistics how many times
if (m_first
&& m_single_use_names
&& m_vars[p] != m_lhs
&& m_after_stmt
&& bitmap_bit_p (m_single_use_names, SSA_NAME_VERSION (op)))
{
tree clobber = build_clobber (TREE_TYPE (m_vars[p]),
CLOBBER_STORAGE_END);
g = gimple_build_assign (m_vars[p], clobber);
gimple_stmt_iterator gsi = gsi_for_stmt (m_after_stmt);
gsi_insert_after (&gsi, g, GSI_SAME_STMT);
}
emits a clobber on
make check-gcc GCC_TEST_RUN_EXPENSIVE=1 RUNTESTFLAGS="--target_board=unix\{-m64,-m32\} GCC_TEST_RUN_EXPENSIVE=1 dg.exp='*bitint* pr112673.c builtin-stdc-bit-*.c pr112566-2.c pr112511.c pr116588.c pr116003.c pr113693.c pr113602.c flex-array-counted-by-7.c' dg-torture.exp='*bitint* pr116480-2.c pr114312.c pr114121.c' dfp.exp=*bitint* i386.exp='pr118017.c pr117946.c apx-ndd-x32-2a.c' vect.exp='vect-early-break_99-pr113287.c' tree-ssa.exp=pr113735.c"
and before this patch it was 41010 clobbers and after it is 40968,
so difference is 42 clobbers, 0.1% fewer.
2025-04-16 Jakub Jelinek <jakub@redhat.com>
PR middle-end/119808
* gimple-lower-bitint.cc (gimple_lower_bitint): Don't set
m_single_use_names bits for SSA_NAMEs which have single use but
their SSA_NAME_DEF_STMT is a copy from another SSA_NAME which doesn't
have a single use, or single use which is such a copy etc.
Jakub Jelinek [Mon, 14 Apr 2025 17:34:22 +0000 (19:34 +0200)]
expmed: Always use QImode for init_expmed set_zero_cost [PR119785]
This is a regression on some targets introduced I believe by r6-2055
which added mode argument to set_src_cost.
The problem here is that in the first iteration, mode is always QImode
and we get as -Os zero cost set_src_cost (const0_rtx, QImode, false).
But then we use the mode variable for iterating over int, partial int
and vector int modes, so for the second iteration we call set_src_cost
with mode which is at that time (machine_mode) (MAX_MODE_VECTOR_INT + 1).
In the x86 case that happens to be V2HFmode and we don't crash (and
compute the same 0 cost as we would for QImode).
But e.g. in the SPARC case (machine_mode) (MAX_MODE_VECTOR_INT + 1) is
MAX_MACHINE_MODE and that does all kinds of weird things especially
when doing ubsan bootstrap.
Fixed by always using QImode.
2025-04-14 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/119785
* expmed.cc (init_expmed): Always pass QImode rather than mode to
set_src_cost passed to set_zero_cost.
Jakub Jelinek [Mon, 14 Apr 2025 08:18:13 +0000 (10:18 +0200)]
driver: On linux hosts disable ASLR during -freport-bug [PR119727]
Andi had a useful comment that even with the PR119727 workaround to
ignore differences in libbacktrace printed addresses, it is still better
to turn off ASLR when easily possible, e.g. in case some address leaks
in somewhere in the ICE message elsewhere, or to verify the ICE doesn't
depend on a particular library/binary load addresses.
The following patch adds a configure check and uses personality syscall
to turn off randomization for further -freport-bug subprocesses.
2025-04-14 Jakub Jelinek <jakub@redhat.com>
PR driver/119727
* configure.ac (HOST_HAS_PERSONALITY_ADDR_NO_RANDOMIZE): New check.
* gcc.cc: Include sys/personality.h if
HOST_HAS_PERSONALITY_ADDR_NO_RANDOMIZE is defined.
(try_generate_repro): Call
personality (personality (0xffffffffU) | ADDR_NO_RANDOMIZE)
if HOST_HAS_PERSONALITY_ADDR_NO_RANDOMIZE is defined.
* config.in: Regenerate.
* configure: Regenerate.
Jakub Jelinek [Sat, 12 Apr 2025 11:15:13 +0000 (13:15 +0200)]
driver: Fix up -freport-bug for ASLR [PR119727]
With --enable-host-pie -freport-bug almost never prepares preprocessed
source and instead emits
The bug is not reproducible, so it is likely a hardware or OS problem.
message even for bogus which are 100% reproducible.
The way -freport-bug works is that it reruns it 3 times, capturing stdout
and stderr from each and then tries to compare the outputs in between
different runs.
The libbacktrace emitted hexadecimal addresses at the start of the lines
can differ between runs due to ASLR, either of the PIE executable, or
even if not PIE if there is some frame with e.g. libc function (say
crash in strlen/memcpy etc.).
The following patch fixes it by ignoring such differences at the start of
the lines.
2025-04-12 Jakub Jelinek <jakub@redhat.com>
PR driver/119727
* gcc.cc (files_equal_p): Rewritten using fopen/fgets/fclose instead
of open/fstat/read/close. At the start of lines, ignore lowercase
hexadecimal addresses followed by space.
Jakub Jelinek [Sat, 12 Apr 2025 11:13:53 +0000 (13:13 +0200)]
bitintlower: Fix up handling of SSA_NAME copies in coalescing [PR119722]
The following patch is miscompiled, because during the limited
SSA name coalescing the bitintlower pass does we incorrectly don't
register a conflict.
This is on
<bb 4> [local count: 1073741824]:
# b_17 = PHI <b_19(3), 8(2)>
g.4_13 = g;
_14 = g.4_13 >> 50;
_15 = (unsigned int) _14;
_21 = b_17;
_16 = (unsigned int) _21;
s_22 = _15 + _16;
return s_22;
basic block where in the map->bitint bitmap we track 14, 17 and 19.
The build_bitint_stmt_ssa_conflicts "hook" has special code where
it tracks uses at the final statements of mergeable operations, so
e.g. the
_16 = (unsigned int) _21;
statement is considered to be use of b_17 because _21 is not in
map->bitmap (or large_huge.m_names), i.e. is mergeable.
The problem is that build_ssa_conflict_graph has special code to handle
SSA_NAME copies and _21 = b_17; is gimple_assign_copy_p. In such cases
it calls live_track_clear_var on the rhs1. The problem is that
on the above bb, after we note in the _16 = (unsigned int) _21;
stmt we need b_17 the generic code makes us forget that because
of the copy statement, and then build_bitint_stmt_ssa_conflicts
ignores it completely (because _21 is large/huge bitint and is
not in map->bitint, so assumed to be handled by a later stmt in the
bb, for backwards walk like this before this one).
As the b_17 use is ignored, the coalescing thinks it can put
all of b_17, b_19 and _14 into the same partition, which is wrong,
while we can and should coalesce b_17 and b_19, _14 needs to be a different
temporary because b_17 is set before and used after _14 has been written.
The following patch fixes it by handling gimple_assign_copy_p in two
separate spots, move the generic coalesce handling of it after
build_ssa_conflict_graph (where build_ssa_conflict_graph handling
doesn't fall through to that, it does continue after the call) and
inside of build_ssa_conflict_graph it performs it too, but only if
the lhs is not mergeable large/huge bitint.
2025-04-12 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/119722
* gimple-lower-bitint.h (build_bitint_stmt_ssa_conflicts): Add
CLEAR argument.
* gimple-lower-bitint.cc (build_bitint_stmt_ssa_conflicts): Add
CLEAR argument. Call clear on gimple_assign_copy_p rhs1 if lhs
is large/huge bitint unless lhs is not in names.
* tree-ssa-coalesce.cc (build_ssa_conflict_graph): Adjust
build_bitint_stmt_ssa_conflicts caller. Move gimple_assign_copy_p
handling to after the build_bitint_stmt_ssa_conflicts call.
Jakub Jelinek [Fri, 11 Apr 2025 06:27:55 +0000 (08:27 +0200)]
bitintlower: Fix up handling of nested casts in m_upward_2limbs cases [PR119707]
The following testcase is miscompiled I believe starting with
PR112941 r14-6742. That commit fixed the bitint-55.c testcase.
The m_first initialization for such conversion initializes 2 SSA_NAMEs,
one is PHI result on the loop (m_data[save_data_cnt]) and the other
(m_data[save_data_cnt+1]) is the argument of that PHI from the latch
edge initialized somewhere in the loop. Both of these are used to
propagate sign extension (i.e. either 0 or all ones limb) from the
iteration with the sign bit of a narrower type to following iterations.
The bitint-55.c testcase was ICEing with invalid SSA forms as it was
using unconditionally the PHI argument SSA_NAME even in places which
weren't dominated by that. And the code which was touched is about
handling constant idx, so if e.g. there are nested casts and the
outer one does conditional code based on index comparison with
a particular constant index.
In the following testcase there are 2 nested casts, one from signed
_BitInt(129) to unsigned _BitInt(255) and the outer from unsigned
_BitInt(255) to unsigned _BitInt(256). The m_upward_2limbs case which
is used for handling mergeable arithmetics (like +-|&^ and casts etc.)
one loop iteration handles 2 limbs, the first half the even ones, the
second half the odd ones.
And for these 2 conversions, the special one for the inner conversion
on x86_64 is with index 2 where the sign bit of _BitInt(129) is present,
while for the outer one index 3 where we need to mask off the most
significant bit.
The r15-6742 change started using m_data[save_data_cnt] for all constant
indexes if it is still inside of the loop (and it is sign extension).
But that doesn't work correctly for the case where the inner conversion
produces the sign extension limb in the loop for an even index and
the outer conversion needs to special case the immediately next conversion,
because in that case using the PHI result will see still 0 there rather
than the updated value from the handling of previous limb.
So the following patch special cases this and uses the other SSA_NAME.
Commented IL, trying to lower
_1 = (unsigned _BitInt(255)) y_4(D);
_2 = (unsigned _BitInt(256)) _1;
_3 = _2 + x_5(D);
<retval> = _3;
we were emitting
<bb 3> [local count: 1073741824]:
# _8 = PHI <0(2), _9(12)> // This is the limb index
# _10 = PHI <0(2), _11(12)> // Sign extension limb from inner cast (0 or ~0UL)
# _22 = PHI <0(2), _23(12)> // Overflow bit from addition of previous limb
if (_8 <= 2)
goto <bb 4>; [80.00%]
else
goto <bb 7>; [20.00%]
<bb 11> [local count: 214748360]:
// And HERE is the actual bug. Using _10 for idx 3 will mean it is always
// zero there and doesn't contain the _18 value propagated to it.
// It should be
// _30 = (<unnamed-unsigned:63>) _11;
// Now if the outer conversion had special iteration say 5, we could
// have used _10 fine here, by that time it already propagates through
// the PHI.
_30 = (<unnamed-unsigned:63>) _10;
_31 = (unsigned long) _30;
PR tree-optimization/119707
* gimple-lower-bitint.cc (bitint_large_huge::handle_cast): Only use
m_data[save_data_cnt] instead of m_data[save_data_cnt + 1] if
idx is odd and equal to low + 1. Remember tree_to_uhwi (idx) in
a temporary instead of calling the function multiple times.
Jakub Jelinek [Wed, 9 Apr 2025 20:01:30 +0000 (22:01 +0200)]
libquadmath: Fix up THREEp96 constant in expq
Here is a cherry-pick from glibc [BZ #32411] fix.
As mentioned by the reporter in a pull request against gcc-mirror,
the THREEp96 constant in e_expl.c is incorrect, it is actually 0x3.p+94f128
rather than 0x3.p+96f128.
The algorithm uses that to compute the t2 integer (tval2), by whose
delta it adjusts the x+xl pair and then in the result uses the precomputed
exp value for that entry.
Using 0x3.p+94f128 rather than 0x3.p+96f128 results in tval2 sometimes
being one smaller, sometimes one larger than the desired value, thus can mean
the x+xl pair after adjustment will be larger in absolute value than it
should be.
DesWursters created a test program for this
https://github.com/DesWurstes/comparefloats
and his results were
total: 1135000000 not_equal: 4322 earlier_score: 674 later_score: 3648
I've modified this so with
https://sourceware.org/bugzilla/show_bug.cgi?id=32411#c3
so that it actually tests pseudo-random _Float128 values with range
(-16384.,16384) with strong bias on values larger than 0.0002 in absolute
value (so that tval1/tval2 aren't zero most of the time) and that gave
total: 10000000000 not_equal: 29861 earlier_score: 4606 later_score: 25255
So, in both cases, in most cases the change doesn't result in any differences,
and in those rare cases where does, about 85% have smaller ulp than without
the patch.
Additionally I've tried
https://sourceware.org/bugzilla/show_bug.cgi?id=32411#c4
and in 2 billion iterations it didn't find any case where x+xl after the
adjustments without this change would be smaller in absolute value compared
to x+xl after the adjustments with this change.
Jakub Jelinek [Fri, 4 Apr 2025 18:57:09 +0000 (20:57 +0200)]
lto: lto-opts fixes [PR119625]
I can reproduce a really weird error in our distro i686 trunk gcc
(but haven't managed to reproduce it with vanilla trunk yet).
echo 'void foo (void) {}' > a.c; gcc -O2 -flto=auto -m32 -march=i686 -ffat-lto-objects -fhardened -o a.o -c a.c; gcc -O2 -flto=auto -m32 -march=i686 -r -o a.lo a.o
lto1: fatal error: open failed: No such file or directory
compilation terminated.
lto-wrapper: fatal error: gcc returned 1 exit status
The error is because
cat ./a.lo.lto.o-args.0
""
a.o
My suspicion is that this "" in there is caused by weird .gnu.lto_.opts
section content during
gcc -O2 -flto=auto -m32 -march=i686 -ffat-lto-objects -fhardened -S -o a.s -c a.c
compilation (and I can reproduce that one with vanilla trunk).
The above results in
.section .gnu.lto_.opts,"e",@progbits
.string "'-fno-openmp' '-fno-openacc' '-fPIC' '' '-m32' '-march=i686' '-O2' '-flto=auto' '-ffat-lto-objects'"
There are two weird things, one (IMHO the cause of the "" later on) is
the '' part, I think it comes from lto_write_options doing
append_to_collect_gcc_options (&temporary_obstack, &first_p, "");
IMHO it shouldn't call append_to_collect_gcc_options at all for that case.
The -fhardened option causes global_options.x_flag_cf_protection
to be set to CF_FULL and later on the backend option processing
sets it to CF_FULL | CF_SET (i.e. 7, a value not handled in
lto_write_options).
The following patch fixes it by not emitting anything there if
flag_cf_protection is one of the unhandled values.
Perhaps it could incrementally use
switch (global_options.x_flag_cf_protection & ~CF_SET)
instead, dunno.
And the other problem is that the -fPIC in there is really weird.
Our distro compiler or vanilla configured trunk certainly doesn't
default to -fPIC and -fhardened uses -fPIE when
-fPIC/-fpic/-fno-pie/-fno-pic is not specified, so I was expecting
-fPIE in there.
The thing is that the -fpie option causes setting of both
global_options.x_flag_pi{c,e} to 1, -fPIE both to 2:
/* If -fPIE or -fpie is used, turn on PIC. */
if (opts->x_flag_pie)
opts->x_flag_pic = opts->x_flag_pie;
else if (opts->x_flag_pic == -1)
opts->x_flag_pic = 0;
if (opts->x_flag_pic && !opts->x_flag_pie)
opts->x_flag_shlib = 1;
so checking first for flag_pic == 2 and then flag_pic == 1
and only afterwards for flag_pie means we never print
-fPIE/-fpie.
Or do you want something further (like
switch (global_options.x_flag_cf_protection & ~CF_SET)
)?
2025-04-04 Jakub Jelinek <jakub@redhat.com>
PR lto/119625
* lto-opts.cc (lto_write_options): If neither flag_pic nor
flag_pie are set, check first for flag_pie and only later
for flag_pic rather than the other way around, use a temporary
variable. If flag_cf_protection is not set, don't append anything
if flag_cf_protection is none of CF_{NONE,FULL,BRANCH,RETURN} and
use a temporary variable.
Jakub Jelinek [Wed, 2 Apr 2025 17:28:20 +0000 (19:28 +0200)]
c: Fix ICEs with -fsanitize=pointer-{subtract,compare} [PR119582]
The following testcase ICEs because c_fully_fold isn't performed on the
arguments of __sanitizer_ptr_{sub,cmp} builtins and so e.g.
C_MAYBE_CONST_EXPR can leak into the gimplifier where it ICEs.
2025-04-02 Jakub Jelinek <jakub@redhat.com>
PR c/119582
* c-typeck.cc (pointer_diff, build_binary_op): Call c_fully_fold on
__sanitizer_ptr_sub or __sanitizer_ptr_cmp arguments.
Jakub Jelinek [Tue, 1 Apr 2025 14:40:55 +0000 (16:40 +0200)]
combine: Use reg_used_between_p rather than modified_between_p in two spots [PR119291]
The following testcase is miscompiled on x86_64-linux at -O2 by the combiner.
We have from earlier combinations
(insn 22 21 23 4 (set (reg:SI 104 [ _7 ])
(const_int 0 [0])) "pr119291.c":25:15 96 {*movsi_internal}
(nil))
(insn 23 22 24 4 (set (reg/v:SI 117 [ e ])
(reg/v:SI 116 [ e ])) 96 {*movsi_internal}
(expr_list:REG_DEAD (reg/v:SI 116 [ e ])
(nil)))
(note 24 23 25 4 NOTE_INSN_DELETED)
(insn 25 24 26 4 (parallel [
(set (reg:CCZ 17 flags)
(compare:CCZ (neg:SI (reg:SI 104 [ _7 ]))
(const_int 0 [0])))
(set (reg/v:SI 116 [ e ])
(neg:SI (reg:SI 104 [ _7 ])))
]) "pr119291.c":26:13 977 {*negsi_2}
(expr_list:REG_DEAD (reg:SI 104 [ _7 ])
(nil)))
(note 26 25 27 4 NOTE_INSN_DELETED)
(insn 27 26 28 4 (set (reg:DI 128 [ _9 ])
(ne:DI (reg:CCZ 17 flags)
(const_int 0 [0]))) "pr119291.c":26:13 1447 {*setcc_di_1}
(expr_list:REG_DEAD (reg:CCZ 17 flags)
(nil)))
and try_combine is called on i3 25 and i2 22 (second time)
and reach the hunk being patched with simplified i3
(insn 25 24 26 4 (parallel [
(set (pc)
(pc))
(set (reg/v:SI 116 [ e ])
(const_int 0 [0]))
]) "pr119291.c":28:13 977 {*negsi_2}
(expr_list:REG_DEAD (reg:SI 104 [ _7 ])
(nil)))
and
(insn 22 21 23 4 (set (reg:SI 104 [ _7 ])
(const_int 0 [0])) "pr119291.c":27:15 96 {*movsi_internal}
(nil))
Now, the try_combine code there attempts to split two independent
sets in newpat by moving one of them to i2.
And among other tests it checks
!modified_between_p (SET_DEST (set1), i2, i3)
which is certainly needed, if there would be say
(set (reg/v:SI 116 [ e ]) (const_int 42 [0x2a]))
in between i2 and i3, we couldn't do that, as that set would overwrite
the value set by set1 we want to move to the i2 position.
But in this case pseudo 116 isn't set in between i2 and i3, but used
(and additionally there is a REG_DEAD note for it).
This is equally bad for the move, because while the i3 insn
and later will see the pseudo value that we set, the insn in between
which uses the value will see a different value from the one that
it should see.
As we don't check for that, in the end try_combine succeeds and
changes the IL to:
(insn 22 21 23 4 (set (reg/v:SI 116 [ e ])
(const_int 0 [0])) "pr119291.c":27:15 96 {*movsi_internal}
(nil))
(insn 23 22 24 4 (set (reg/v:SI 117 [ e ])
(reg/v:SI 116 [ e ])) 96 {*movsi_internal}
(expr_list:REG_DEAD (reg/v:SI 116 [ e ])
(nil)))
(note 24 23 25 4 NOTE_INSN_DELETED)
(insn 25 24 26 4 (set (pc)
(pc)) "pr119291.c":28:13 2147483647 {NOOP_MOVE}
(nil))
(note 26 25 27 4 NOTE_INSN_DELETED)
(insn 27 26 28 4 (set (reg:DI 128 [ _9 ])
(const_int 0 [0])) "pr119291.c":28:13 95 {*movdi_internal}
(nil))
(note, the i3 got turned into a nop and try_combine also modified insn 27).
The following patch replaces the modified_between_p
tests with reg_used_between_p, my understanding is that
modified_between_p is a subset of reg_used_between_p, so one
doesn't need both.
Looking at this some more today, I think we should special case
set_noop_p because that can be put into i2 (except for the JUMP_P
violations), currently both modified_between_p (pc_rtx, i2, i3)
and reg_used_between_p (pc_rtx, i2, i3) returns false.
I'll post a patch incrementally for that (but that feels like
new optimization, so probably not something that should be backported).
On Tue, Apr 01, 2025 at 11:27:25AM +0200, Richard Biener wrote:
> Can we constrain SET_DEST (set1/set0) to a REG_P in combine? Why
> does the comment talk about memory?
I was worried about making too risky changes this late in stage4
(and especially also for backports). Most of this code is 1992-ish.
I think many of the functions are just misnamed, the reg_ in there doesn't
match what those functions do (bet they initially supported just REGs
and later on support for other kinds of expressions was added, but haven't
done git archeology to prove that).
What we know for sure is:
&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != ZERO_EXTRACT
&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != STRICT_LOW_PART
&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT
&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != STRICT_LOW_PART
that is checked earlier in the condition.
Then it calls
&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)),
XVECEXP (newpat, 0, 0))
&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)),
XVECEXP (newpat, 0, 1))
While it has reg_* in it, that function mostly calls reg_overlap_mentioned_p
which is also misnamed, that function handles just fine all of
REG, MEM, SUBREG of REG, (SUBREG of MEM not, see below), ZERO_EXTRACT,
STRICT_LOW_PART, PC and even some further cases.
So, IMHO SET_DEST (set0) or SET_DEST (set0) can be certainly a REG, SUBREG
of REG, PC (at least the REG and PC cases are triggered on the testcase)
and quite possibly also MEM (SUBREG of MEM not, see below).
Now, the code uses !modified_between_p (SET_SRC (set{1,0}), i2, i3) where that
function for constants just returns false, for PC returns true, for REG
returns reg_set_between_p, for MEM recurses on the address, for
MEM_READONLY_P otherwise returns false, otherwise checks using alias.cc code
whether the memory could have been modified in between, for all other
rtxes recurses on the subrtxes. This part didn't change in my patch.
I've only changed those
- && !modified_between_p (SET_DEST (set{1,0}), i2, i3)
+ && !reg_used_between_p (SET_DEST (set{1,0}), i2, i3)
where the former has been described above and clearly handles all of
REG, SUBREG of REG, PC, MEM and SUBREG of MEM among other things.
The replacement reg_used_between_p calls reg_overlap_mentioned_p on each
instruction in between i2 and i3. So, there is clearly a difference
in behavior if SET_DEST (set{1,0}) is pc_rtx, in that case modified_between_p
returns unconditionally true even if there are no instructions in between,
but reg_used_between_p if there are no non-debug insns in between returns
false. Sorry for missing that, guess I should check for that (with the
exception of the noop moves which are often (set (pc) (pc)) and handled
by the incremental patch). In fact not just that, reg_used_between_p
will only return true for PC if it is mentioned anywhere in the insns
in between.
Anyway, except for that, for REG it calls refers_to_regno_p
and so should find any occurrences of any of the REG or parts of it for hard
registers, for MEM returns true if it sees any MEMs in insns in between
(conservatively), for SUBREGs apparently it relies on it being SUBREG of REG
(so doesn't handle SUBREG of MEM) and handles SUBREG of REG like the
SUBREG_REG, PC I've already described.
Now, because reg_overlap_mentioned_p doesn't handle SUBREG of MEM, I think
already the initial
&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)),
XVECEXP (newpat, 0, 0))
&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)),
XVECEXP (newpat, 0, 1))
calls would have failed --enable-checking=rtl or would have misbehaved, so
I think there is no need to check for it further.
To your question why I don't use reg_referenced_p, that is because
reg_referenced_p is something to call on one insn pattern, while
reg_used_between_p is pretty much that on all insns in between two
instructions (excluding the boundaries).
So, I think it would be safer to add && SET_DEST (set{1,0} != pc_rtx
checks to preserve former behavior, like in the following version.
2025-04-01 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/119291
* combine.cc (try_combine): For splitting of PARALLEL with
2 independent SETs into i2 and i3 sets check reg_used_between_p
of the SET_DESTs rather than just modified_between_p.
Tomasz Kamiński [Tue, 11 Mar 2025 10:59:36 +0000 (11:59 +0100)]
libstdc++: Correct preprocessing checks for floatX_t and bfloat_16 formatting
Floating points types _Float16, _Float32, _Float64, and bfloat16,
can be formatted only if std::to_chars overloads for such types
were provided. Currently this is only the case for architectures
where float and double are 32-bits and 64-bits IEEE floating points types.
This patch updates the preprocessing checks for formatters
for above types to check _GLIBCXX_FLOAT_IS_IEEE_BINARY32
and _GLIBCXX_DOUBLE_IS_IEEE_BINARY64. Making them non-formattable
on non-IEEE architectures.
Remove a potential UB, where we could produce basic_format_arg
with _M_type set to _Arg_fp32 or _Arg_fp64, that was later not
handled by `_M_visit`.
libstdc++-v3/ChangeLog:
* include/std/format (formatter<_Float16, _CharT>): Define only if
_GLIBCXX_FLOAT_IS_IEEE_BINARY32 macro is defined.
(formatter<_Float16, _CharT>): As above.
(formatter<__gnu_cxx::__bfloat16_t, _CharT>): As above.
(formatter<_Float64, _CharT>): Define only if
_GLIBCXX_DOUBLE_IS_IEEE_BINARY64 is defined.
(basic_format_arg::_S_to_arg_type): Normalize _Float32 and _Float64
only to float and double respectivelly.
(basic_format_arg::_S_to_enum): Remove handling of _Float32 and _Float64.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit 445128c12cf22081223f7385196ee3889ef4c4b2)
i386: Enable -mnop-mcount for -fpic with PLTs [PR119386]
-mnop-mcount can be trivially enabled for -fPIC codegen as long as PLTs
are being used, given that the instruction encodings are identical, only
the target may resolve differently depending on how the linker decides
to incorporate the object file.
So relax the option check, and add a test to ensure that 5-byte NOPs are
emitted when -mnop-mcount is being used.
i386: Prefer PLT indirection for __fentry__ calls under -fPIC [PR119386]
Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling
__fentry__") updated the logic that emits mcount() / __fentry__() calls
into function prologues when profiling is enabled, to avoid GOT-based
indirect calls when a direct call would suffice.
There are two problems with that change:
- it relies on -mdirect-extern-access rather than -fno-plt to decide
whether or not a direct [PLT based] call is appropriate;
- for the PLT case, it falls through to x86_print_call_or_nop(), which
does not emit the @PLT suffix, resulting in the wrong relocation to be
used (R_X86_64_PC32 instead of R_X86_64_PLT32)
Fix this by testing flag_plt instead of ix86_direct_extern_access, and
updating x86_print_call_or_nop() to take flag_pic and flag_plt into
account. This also ensures that -mnop-mcount works as expected when
emitting the PLT based profiling calls.
While at it, fix the 32-bit logic as well, and issue a PLT call unless
PLTs are explicitly disabled.
PR target/119386
* config/i386/i386.cc (x86_print_call_or_nop): Add @PLT suffix
where appropriate.
(x86_function_profiler): Fall through to x86_print_call_or_nop()
for PIC codegen when flag_plt is set.
gcc/testsuite/ChangeLog:
PR target/119386
* gcc.target/i386/pr119386-1.c: New test.
* gcc.target/i386/pr119386-2.c: New test.
Eric Botcazou [Wed, 16 Apr 2025 20:01:31 +0000 (22:01 +0200)]
Fix wrong optimization of conditional expression with enumeration type
This is a regression introduced on the mainline and 14 branch by:
https://gcc.gnu.org/pipermail/gcc-cvs/2023-October/391658.html
The change bypasses int_fits_type_p (essentially) to work around the
signedness constraints, but in doing so disregards the peculiarities
of boolean types whose precision is not 1 dealt with by the predicate,
leading to the creation of a problematic conversion here.
Fixed by special-casing boolean types whose precision is not 1, as done
in several other places.
gcc/
* tree-ssa-phiopt.cc (factor_out_conditional_operation): Do not
bypass the int_fits_type_p test for boolean types whose precision
is not 1.
gcc/testsuite/
* gnat.dg/opt105.adb: New test.
* gnat.dg/opt105_pkg.ads, gnat.dg/opt105_pkg.adb: New helper.
Alice Carlotti [Tue, 15 Apr 2025 16:36:25 +0000 (17:36 +0100)]
aarch64: Disable sysreg feature gating
This applies to the sysreg read/write intrinsics __arm_[wr]sr*. It does
not depend on changes to Binutils, because GCC converts recognised
sysreg names to an encoding based form, which is already ungated in Binutils.
We have, however, agreed to make an equivalent change in Binutils (which
would then disable feature gating for sysreg accesses in inline
assembly), but this has not yet been posted upstream.
In the future we may introduce a new flag to renable some checking,
but these checks could not be comprehensive because many system
registers depend on architecture features that don't have corresponding
GCC/GAS --march options. This would also depend on addressing numerous
inconsistencies in the existing list of sysreg feature dependencies.