Jeff Law [Thu, 3 Jul 2025 12:44:31 +0000 (06:44 -0600)]
[RISC-V][PR target/118886] Refine when two insns are signaled as fusion candidates
A number of folks have had their fingers in this code and it's going to take a
few submissions to do everything we want to do.
This patch is primarily concerned with avoiding signaling that fusion can occur
in cases where it obviously should not be signaling fusion.
Every DEC based fusion I'm aware of requires the first instruction to set a
destination register that is both used and set again by the second instruction.
If the two instructions set different registers, then the destination of the
first instruction was not dead and would need to have a result produced.
This is complicated by the fact that we have pseudo registers prior to reload.
So the approach we take is to signal fusion prior to reload even if the
destination registers don't match. Post reload we require them to match.
That allows us to clean up the code ever-so-slightly.
Second, we sometimes signaled fusion into loads that weren't scalar integer
loads. I'm not aware of a design that's fusing into FP loads or vector loads.
So those get rejected explicitly.
Third, the store pair "fusion" code is cleaned up a little. We use fusion to
model store pair commits since the basic properties for detection are the same.
The point where they "fuse" is different. Also this code liked to "return
false" at each step along the way if fusion wasn't possible. Future work for
additional fusion cases makes that behavior undesirable. So the logic gets
reworked a little bit to be more friendly to future work.
Fourth, if we already fused the previous instruction, then we can't fuse it
again. Signaling fusion in that case is, umm, bad as it creates an atomic blob
of code from a scheduling standpoint.
Hopefully I got everything correct with extracting this work out of a larger
set of changes 🙂 We will contribute some instrumentation & testing code so if
I botched things in a major way we'll soon have a way to test that and I'll be
on the hook to fix any goofs.
From a correctness standpoint this should be a big fat nop. We've seen this
make measurable differences in pico benchmarks, but obviously as you scale up
to bigger stuff the gains largely disappear into the noise.
This has been through Ventana's internal CI and my tester. I'll obviously wait
for a verdict from the pre-commit tester.
PR target/118886
gcc/
* config/riscv/riscv.cc (riscv_macro_fusion_pair_p): Check
for fusion being disabled earlier. If PREV is already fused,
then it can't be fused again. Be more selective about fusing
when the destination registers do not match. Don't fuse into
loads that aren't scalar integer modes. Revamp store pair
commit support.
Co-authored-by: Daniel Barboza <dbarboza@ventanamicro.com>
Co-authored-by: Shreya Munnangi <smunnangi1@ventanamicro.com>
Karl Meakin [Thu, 3 Jul 2025 11:48:34 +0000 (12:48 +0100)]
AArch64: make rules for CBZ/TBZ higher priority
Move the rules for CBZ/TBZ to be above the rules for
CBB<cond>/CBH<cond>/CB<cond>. We want them to have higher priority
because they can express larger displacements.
gcc/ChangeLog:
* config/aarch64/aarch64.md (aarch64_cbz<optab><mode>1): Move
above rules for CBB<cond>/CBH<cond>/CB<cond>.
(*aarch64_tbz<optab><mode>1): Likewise.
Karl Meakin [Thu, 3 Jul 2025 11:48:32 +0000 (12:48 +0100)]
AArch64: precommit test for CMPBR instructions
Commit the test file `cmpbr.c` before rules for generating the new
instructions are added, so that the changes in codegen are more obvious
in the next commit.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Add `cmpbr` to the list of extensions.
* gcc.target/aarch64/cmpbr.c: New test.
Karl Meakin [Thu, 3 Jul 2025 11:48:31 +0000 (12:48 +0100)]
AArch64: recognize `+cmpbr` option
Add the `+cmpbr` option to enable the FEAT_CMPBR architectural
extension.
gcc/ChangeLog:
* config/aarch64/aarch64-option-extensions.def (cmpbr): New
option.
* config/aarch64/aarch64.h (TARGET_CMPBR): New macro.
* doc/invoke.texi (cmpbr): New option.
Karl Meakin [Thu, 3 Jul 2025 11:48:28 +0000 (12:48 +0100)]
AArch64: rename branch instruction rules
Give the `define_insn` rules used in lowering `cbranch<mode>4` to RTL
more descriptive and consistent names: from now on, each rule is named
after the AArch64 instruction that it generates. Also add comments to
document each rule.
gcc/ChangeLog:
* config/aarch64/aarch64.md (condjump): Rename to ...
(aarch64_bcond): ...here.
(*compare_condjump<GPI:mode>): Rename to ...
(*aarch64_bcond_wide_imm<GPI:mode>): ...here.
(aarch64_cb<optab><mode>): Rename to ...
(aarch64_cbz<optab><mode>1): ...here.
(*cb<optab><mode>1): Rename to ...
(*aarch64_tbz<optab><mode>1): ...here.
(@aarch64_tb<optab><ALLI:mode><GPI:mode>): Rename to ...
(@aarch64_tbz<optab><ALLI:mode><GPI:mode>): ...here.
(restore_stack_nonlocal): Handle rename.
(stack_protect_combined_test): Likewise.
* config/aarch64/aarch64-simd.md (cbranch<mode>4): Likewise.
* config/aarch64/aarch64-sme.md (aarch64_restore_za): Likewise.
* config/aarch64/aarch64.cc (aarch64_gen_test_and_branch): Likewise.
Karl Meakin [Thu, 3 Jul 2025 11:48:27 +0000 (12:48 +0100)]
AArch64: place branch instruction rules together
The rules for conditional branches were spread throughout `aarch64.md`.
Group them together so it is easier to understand how `cbranch<mode>4`
is lowered to RTL.
Nathan Myers [Mon, 30 Jun 2025 22:55:48 +0000 (18:55 -0400)]
libstdc++: construct bitset from string_view (P2697) [PR119742]
Add a bitset constructor from string_view, per P2697. Fix existing
tests that would fail to detect incorrect exception behavior.
Argument checks that result in exceptions guarded by "#if HOSTED"
are made unguarded because the functions called to throw just call
terminate() in free-standing builds. Improve readability in Doxygen
comments. Generalize a private member argument-checking function
to work with string and string_view without mentioning either,
obviating the need for guards.
The version.h symbol is not "hosted" because string_view, though
not specified to be available in free-standing builds, is defined
there and the feature is useful there.
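A minimal usage sketch of the new constructor (illustrative only; it assumes a
compiler and library recent enough to provide the P2697 change):
#include <bitset>
#include <string_view>
int main()
{
  constexpr std::string_view bits = "110010";
  std::bitset<8> b(bits);          // new explicit constructor from string_view
  return b.count() == 3 ? 0 : 1;   // "110010" contains three set bits
}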
libstdc++-v3/ChangeLog:
PR libstdc++/119742
* include/bits/version.def: Add preprocessor symbol.
* include/bits/version.h: Add preprocessor symbol.
* include/std/bitset: Add constructor.
* testsuite/20_util/bitset/cons/1.cc: Fix.
* testsuite/20_util/bitset/cons/6282.cc: Fix.
* testsuite/20_util/bitset/cons/string_view.cc: Test new ctor.
* testsuite/20_util/bitset/cons/string_view_wide.cc: Test new ctor.
tree-optimization/120780: Support object size for containing objects
MEM_REF cast of a subobject to its containing object has negative
offsets, which objsz sees as an invalid access. Support this use case
by peeking into the structure to validate that the containing object
indeed contains the subobject's type at that offset and, if so, adjusting
the wholesize for the object to allow the negative offset.
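A hedged illustration of the access shape in question (reduced for exposition;
this is not the PR testcase and the struct layout here is made up):
#include <cstddef>
struct outer { int header; char payload[16]; };
// p is known to point at outer::payload; recover the containing object.
static std::size_t remaining(char *p)
{
  // The cast back to the containing object becomes a MEM_REF with a negative
  // offset; the change lets objsz check that 'outer' really has a member of
  // the subobject's type at that offset and adjust the whole size instead of
  // treating the access as invalid.
  outer *o = reinterpret_cast<outer *>(p - offsetof(outer, payload));
  return __builtin_object_size(o->payload, 1);
}
int main()
{
  outer buf{};
  // With the fix (and enough optimization for objsz to track 'buf'), the
  // negative-offset access is no longer forced to a reported size of 0.
  return remaining(buf.payload) == 0;
}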
If __mcount_loc wasn't used, we got an unused label. Update
x86_function_profiler to emit the label only when the __mcount_loc section
is used.
gcc/
PR target/120936
* config/i386/i386.cc (x86_print_call_or_nop): Add a label
argument and use it to print label.
(x86_function_profiler): Emit label only when __mcount_loc
section is used.
Jan Hubicka [Thu, 3 Jul 2025 10:05:45 +0000 (12:05 +0200)]
Add -Wauto-profile warning
This patch adds a new warning, -Wauto-profile, which warns about mismatches between
profile data and function bodies. This is implemented during the offline pass
where every function instance is compared with the actual gimple body (if
available) and we verify that the statement locations in the profile data can
be matched with statements in the function.
Currently it is mostly useful to find bugs, but eventually I hope it will be
useful for users to verify that auto-profile works as expected or to evaluate
how much of old auto-profile data can still be applied to current sources.
There will probably always be some corner cases we cannot handle with the
auto-profile format (such as functions with bodies in multiple files) that can be
patched in the compiled program.
I also added logic to fix up missing discriminators in the function callsites.
I am not sure how those happen (they seem to go away with -fno-crossjumping)
and will dig into it.
Another problem is that without -flto at the train run, inlined functions have
DWARF names rather than symbol names. LLVM solves this with a
-gdebug-for-autoprofile flag that we could also have. With this flag we could
output assembler names as well as multiplicities of statements.
Building SPECint there are approximately 7k profile mismatches.
Bootstrapped/regtested x86_64-linux. Plan to commit it after some extra testing.
gcc/ChangeLog:
* auto-profile.cc (get_combined_location): Handle negative
offsets; output better diagnostics.
(get_relative_location_for_locus): Return -1 for unknown location.
(function_instance::get_cgraph_node): New member function.
(match_with_target): New function.
(dump_stmt): New function.
(function_instance::lookup_count): New function.
(mark_expr_locations): New function.
(function_instance::match): New function.
(autofdo_source_profile::offline_external_functions): Do
not repeat renaming; manage two worklists and do matching.
(autofdo_source_profile::offline_unrealized_inlines): Simplify.
(afdo_set_bb_count): Do not look for lost discriminators.
(auto_profile): Do not ICE when profile reading failed.
* common.opt (Wauto-profile): New warning flag.
* doc/invoke.texi (-Wauto-profile): Document.
Jan Hubicka [Thu, 3 Jul 2025 10:00:05 +0000 (12:00 +0200)]
Make inliner loop hints more aggressive
This patch makes loop inline hints more aggressive. If we know the iteration
count or stride, we currently assume an improvement in time relative to the
preheader count. I changed it to the header count, since this knowledge
is likely to help unrolling and vectorization, which bring
benefits relative to that.
* ipa-fnsummary.cc (analyze_function_body): For loop
heuristics use header count instead of preheader count.
Jan Hubicka [Thu, 3 Jul 2025 09:56:28 +0000 (11:56 +0200)]
Fix division by zero in ipa-cp.cc:update_profiling_info
This ICE has triggered for me during autoprofiledbootstrap. The
code already takes the possible range into account, so I think in this case
we can just push the value to one side of it.
Bootstrapped/regtested x86_64-linux, OK?
gcc/ChangeLog:
* ipa-cp.cc (update_profiling_info): Watch for division by zero.
Remove the checks on corank conformability in expressions,
because there is nothing in the standard about it. When a coarray
has no coindexes it is treated like a non-coarray, and when it has
a full-corank coindex its result is a regular array. So there is nothing
to check for corank conformability.
PR fortran/120843
gcc/fortran/ChangeLog:
* resolve.cc (resolve_operator): Remove conformability check,
because it is not in the standard.
gcc/testsuite/ChangeLog:
* gfortran.dg/coarray/coindexed_6.f90: Enhance test to have
coarray components covered.
Jonathan Wakely [Wed, 2 Jul 2025 20:54:06 +0000 (21:54 +0100)]
libstdc++: Fix regression in std::uninitialized_fill for C++98 [PR120931]
A typo in r15-4473-g3abe751ea86e34 made it ill-formed to use
std::uninitialized_fill with iterators that aren't pointers (or pointers
wrapped in our __normal_iterator) when the value type is a narrow
character type.
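A hedged reproduction of the affected call shape (illustrative; the testcase
added below may differ): a non-pointer iterator with a narrow character value
type, compiled in C++98 mode:
#include <deque>
#include <memory>
int main()
{
  std::deque<char> d(4);
  // deque iterators are not raw pointers (nor __normal_iterator wrappers of
  // pointers), so the library cannot use the pointer fast path and falls
  // back to the code path that contained the typo.  Filling already
  // constructed char storage is harmless and only exercises the overload.
  std::uninitialized_fill(d.begin(), d.end(), 'x');
  return 0;
}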
libstdc++-v3/ChangeLog:
PR libstdc++/120931
* include/bits/stl_uninitialized.h (__uninitialized_fill<true>):
Fix typo resulting in call to __do_uninit_copy instead of
__do_uninit_fill.
* testsuite/20_util/specialized_algorithms/uninitialized_fill/120931.cc:
New test.
Alex Coplan [Thu, 19 Jun 2025 11:38:11 +0000 (12:38 +0100)]
aarch64: Drop const_int from aarch64_maskload_else_operand
The "else operand" to maskload should always be a const_vector, never a
const_int.
This was just an issue I noticed while looking through the code; I don't
have a testcase which shows a concrete problem due to this.
Testing of that change alone showed ICEs with load lanes vectorization
and SVE. That turned out to be because the backend pattern was missing
a mode for the else operand (causing the middle-end to choose a
const_int during expansion), fixed thusly. That in turn exposed an
issue with the unpredicated load lanes expander which was using the
wrong mode for the else operand, so fixed that too.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md
(vec_load_lanes<mode><vsingle>): Expand else operand in
subvector mode, as per optab documentation.
(vec_mask_load_lanes<mode><vsingle>): Add missing mode for
operand 3.
* config/aarch64/predicates.md (aarch64_maskload_else_operand):
Remove const_int.
Alex Coplan [Mon, 30 Jun 2025 14:06:03 +0000 (15:06 +0100)]
doc: Clarify mode of else operand for vec_mask_load_lanesmn
This extends the documentation of the vec_mask_load_lanes<m><n> optab to
explicitly state that the mode of the else operand is n, i.e. the mode
of a single subvector.
gcc/ChangeLog:
* doc/md.texi (Standard Names): Clarify mode of else operand for
vec_mask_load_lanesmn optab.
Jan Hubicka [Thu, 3 Jul 2025 08:25:39 +0000 (10:25 +0200)]
Enable ipa-cp cloning for cold wrappers of hot functions
ipa-cp cloning disables itself for all functions not passing opt_for_fn
(node->decl, optimize_size), which disables it for cold wrappers of hot
functions where we want to propagate. Since we later want the time saved
to be considered hot, we do not need to make this early test.
The patch also fixes a few other places where AFDO 0 disables ipa-cp.
gcc/ChangeLog:
* ipa-cp.cc (cs_interesting_for_ipcp_p): Handle
correctly GLOBAL0 afdo counts.
(ipcp_cloning_candidate_p): Do not rule out nodes
!node->optimize_for_size_p ().
(good_cloning_opportunity_p): Handle afdo counts
as non-zero.
Jan Hubicka [Thu, 3 Jul 2025 08:19:31 +0000 (10:19 +0200)]
Fix overflow in ipa-cp heuristics
ipa-cp converts sreal times to int, while the point of sreal is to accommodate very
large values that can happen for loops with a large number of iterations and also
when the profile is inconsistent. This happens with AFDO in the testsuite, where a loop
preheader is estimated to have 0 executions while the loop body has a large number of
executions.
Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:
* ipa-cp.cc (hint_time_bonus): Return sreal and avoid
conversions to integer.
(good_cloning_opportunity_p): Avoid sreal to integer
conversions.
(perform_estimation_of_a_value): Update.
Jan Hubicka [Tue, 1 Jul 2025 06:32:56 +0000 (08:32 +0200)]
Auto-FDO/FDO profile comparator
The patch I sent from the airport only worked if you produced the gcda files with
an unpatched compiler. For some reason auto-profile reading is intertwined with
gcov reading, which is not necessary. Here is a cleaner version which also
makes the format a bit more convenient to grep.
The dump lists basic blocks that have large counts with normal profile feedback
but to which autofdo gives a small count (so they become cold). These seem to be
indeed mostly basic blocks controlling loops.
gcc/ChangeLog:
* auto-profile.cc (afdo_hot_bb_threshold): New global
variable.
(maybe_hot_afdo_count_p): New function.
(autofdo_source_profile::read): Do not set up dump file;
set afdo_hot_bb_threshold.
(afdo_annotate_cfg): Handle partial training.
(afdo_callsite_hot_enough_for_early_inline):
Use maybe_hot_afdo_count_p.
(auto_profile_offline::execute): Read autofdo file.
* auto-profile.h (maybe_hot_afdo_count_p): Declare.
(afdo_hot_bb_threshold): Declare.
* coverage.cc (read_counts_file): Also set gcov_profile_info.
(coverage_init): Do not read autofdo file.
* opts.cc (enable_fdo_optimizations): Add autofdo parameter;
do not set flag_branch_probabilities and flag_profile_values
with it.
(common_handle_option): Update.
* passes.cc (finish_optimization_passes): Do not end branch
prob here.
(pass_manager::dump_profile_report): Also mark change after
autofdo pass.
* profile.cc: Include auto-profile.h
(gcov_profile_info): New global variable.
(struct afdo_fdo_record): New structure.
(compute_branch_probabilities): Record afdo profile.
(end_branch_prob): Dump afdo/fdo profile comparison.
* profile.h (gcov_profile_info): Declare.
* tree-profile.cc (tree_profiling): Call end_branch_prob.
(pass_ipa_tree_profile::gate): Also enable with autoFDO.
Eric Botcazou [Thu, 12 Jun 2025 20:31:06 +0000 (22:31 +0200)]
ada: Enforce alignment constraint for large Object_Size clauses
The constraint is that the Object_Size must be a multiple of the alignment
in bits. But it's enforced only when the value of the clause is lower than
the Value_Size rounded up to the alignment in bits, not for larger values.
gcc/ada/ChangeLog:
* gcc-interface/decl.cc (gnat_to_gnu_entity): Use default messages
for errors reported for Object_Size clauses.
(validate_size): Give an error for stand-alone objects of composite
types if the specified size is not a multiple of the alignment.
Eric Botcazou [Mon, 26 May 2025 07:25:57 +0000 (09:25 +0200)]
ada: Fix alignment violation for mix of aligned and misaligned composite types
This happens when the chain of initialization procedures is called on the
subcomponents and causes the creation of temporaries along the way out of
alignment considerations. Now these temporaries are not necessary in the
context and were not created until recently, so this gets rid of them.
gcc/ada/ChangeLog:
* gcc-interface/trans.cc (addressable_p): Add COMPG third parameter.
<COMPONENT_REF>: Do not return true out of alignment considerations
for non-strict-alignment targets if COMPG is set.
(Call_to_gnu): Pass true as COMPG in the call to the addressable_p
predicate if the called subprogram is an initialization procedure.
Eric Botcazou [Tue, 6 May 2025 17:14:40 +0000 (19:14 +0200)]
ada: Fix wrong finalization of constrained subtype of unconstrained array type
This implements the Is_Constr_Array_Subt_With_Bounds flag for allocators.
gcc/ada/ChangeLog:
* gcc-interface/trans.cc (gnat_to_gnu) <N_Allocator>: Allocate the
bounds alongside the data if the Is_Constr_Array_Subt_With_Bounds
flag is set on the designated type.
<N_Free_Statement>: Take into account the allocated bounds if the
Is_Constr_Array_Subt_With_Bounds flag is set on the designated type.
Eric Botcazou [Thu, 1 May 2025 23:30:56 +0000 (01:30 +0200)]
ada: Fix missing error on too large Component_Size not multiple of storage unit
This is a small regression introduced a few years ago.
gcc/ada/ChangeLog:
* gcc-interface/decl.cc (gnat_to_gnu_component_type): Validate the
Component_Size like the size of a type only if the component type
is actually packed.
In addition, the checks for "Is_Composite_Type" in
Tbuild.Unchecked_Convert_To are narrowed to "not Is_Scalar_Type";
that way, useless duplicate unchecked conversions of access types will
be removed as for composite types.
Insert_Actions performs a sanity check when it goes through an
expression with actions while going up the tree. That check was not
perfectly right before this patch and spuriously failed when inserting
range checks in some situations. This patch makes the check more robust.
Daniel King [Thu, 12 Jun 2025 09:03:53 +0000 (10:03 +0100)]
ada: Port System.Stack_Usage to CHERI
This unit performed integer to address conversions to calculate stack addresses
which, on a CHERI target, result in an invalid capability that triggers a
capability tag fault when dereferenced during stack filling. This patch updates
the unit to preserve addresses (capabilities) during the calculations.
The method used to determine the stack base address is also updated for CHERI.
The current method tries to get the stack base from the compiler info for the
current task. If no info is found, then as a fallback it estimates the base by
taking the address of a variable on the stack. This address is then derived to
calculate the range of addresses to fill the stack.
This fallback does not work on CHERI since taking the 'Address of a stack variable
will result in a capability with bounds restricted to that object and attempting to
write outside those bounds triggers a capability bounds fault. Instead, we add a
new function Get_Stack_Base which, on CHERI, gets the exact stack base from the
upper bound of the capability stack pointer (CSP) register. On non-CHERI platforms,
Get_Stack_Base returns the stack base from the compiler info, resulting in the same
behaviour as before on those platforms.
gcc/ada/ChangeLog:
* Makefile.rtl (LIBGNAT_TARGET_PAIRS): New unit s-tsgsba__cheri.adb for morello-freebsd.
* libgnarl/s-tassta.adb (Get_Stack_Base): New function.
* libgnarl/s-tsgsba__cheri.adb: New file for CHERI targets.
* libgnarl/s-tsgsba.adb: New default file for non-CHERI targets.
* libgnat/s-stausa.adb (Fill_Stack, Compute_Result): Port to CHERI.
* libgnat/s-stausa.ads (Initialize_Analyzer, Stack_Analyzer): Port to CHERI.
Piotr Trojanek [Wed, 11 Jun 2025 14:41:00 +0000 (16:41 +0200)]
ada: Improve retrieval of nominal unconstrained type in extended return
When an extended return statement declares its object using an explicit subtype
indication, it is better to recover the original unconstrained type from
the explicit subtype indication. This appears to be necessary for subtypes with
predicates.
gcc/ada/ChangeLog:
* sem_ch3.adb (Check_Return_Subtype_Indication): Use type from
explicit subtype indication, when possible.
Piotr Trojanek [Tue, 10 Jun 2025 22:20:13 +0000 (00:20 +0200)]
ada: Adjust message about statically compatible result subtype
Ada RM 6.5(5.3/5) is about "result SUBTYPE of the function", while the error
message was saying "result TYPE of the function". Now use the exact RM wording
in the error message for this rule.
gcc/ada/ChangeLog:
* sem_ch3.adb (Check_Return_Subtype_Indication): Adjust error message
to match the RM wording.
Piotr Trojanek [Tue, 10 Jun 2025 14:29:30 +0000 (16:29 +0200)]
ada: Fix constraint-related legality checks in extended return statements
Legality checks in extended return statements were (almost) literally
implementing the RM rules, but when analyzing the return object declaration
we replace the nominal subtype of that object with its constrained subtype.
(It is a bit odd to have such an expansion activity in analysis, but we already
rely on this particular expansion in quite a few places).
gcc/ada/ChangeLog:
* sem_ch3.adb (Check_Return_Subtype_Indication): Use the nominal
subtype of a return object; literally implement the RM rule about
elementary types; check for static subtype compatibility both when
the subtype is given as a subtype mark and a subtype indication.
Denis Mazzucato [Fri, 6 Jun 2025 07:53:00 +0000 (07:53 +0000)]
ada: Fix node copy with functions as actual parameters in dispatching DIC
When dispatching in a Default_Initial_Condition, copying the condition
node crashes if there is a, possibly nested, parameterless function as
actual parameter; there were two issues:
1. Subp_Entity in Check_Dispatching_call was uninitialized, a GNAT SAS
finding.
2. The controlling argument update logic only tried to propagate the
update by traversing the actual parameters, leading to a crash in
case of parameterless functions.
This patch initializes Subp_Entity and allows the update of controlling
argument to succeed even when no traversal happened.
gcc/ada/ChangeLog:
* sem_disp.adb (Check_Dispatching_call): Fix uninitialized Subp_Entity.
* sem_util.adb (Update_Controlling_Argument): No need to replace controlling argument
in case of functions.
The Finalizable aspect introduced controlled types for which not all the
finalization primitives exist. This patch makes Make_Deep_Record_Body
handle this case correctly.
gcc/ada/ChangeLog:
* exp_ch7.adb (Make_Deep_Record_Body): Fix case of absent Initialize
primitive.
Since the introduction of the Finalizable aspect, there can be types
for which Is_Controlled returns True but that don't have all three
finalization primitives. Generate_Finalization_Actions raised an
exception in that case before this patch, which fixes the problem.
gcc/ada/ChangeLog:
* exp_aggr.adb (Generate_Finalization_Actions): Stop assuming that
the Initialize primitive exists.
Gary Dismukes [Tue, 3 Jun 2025 01:01:12 +0000 (01:01 +0000)]
ada: Enforce visibility of unit used as a parent instance of a child instance
In cases involving instantiation of a generic child unit, the visibility
of the parent unit was mishandled, allowing the parent to be referenced
in another compilation unit that has visibility of the child instance
but no with_clause for the parent of the instance.
gcc/ada/ChangeLog:
* sem_ch12.adb (Install_Spec): Remove "not Is_Generic_Instance (Par)"
in test for setting Instance_Parent_Unit. Revise comment to no longer
say "noninstance", plus remove "???".
(Remove_Parent): Restructure if_statement to allow for both "elsif"
parts to be executed (by changing them to be separate if_statements
within an "else" part).
Piotr Trojanek [Wed, 4 Jun 2025 10:08:58 +0000 (12:08 +0200)]
ada: Cleanup in type support subprograms code
Code cleanup; semantics is unaffected.
gcc/ada/ChangeLog:
* exp_tss.adb (TSS): Refactor IF condition to make code smaller.
* lib.adb (Increment_Serial_Number, Synchronize_Serial_Number):
Use type of renamed object when creating renaming.
* lib.ads (Unit_Record): Refine subtype of dependency number.
Eric Botcazou [Tue, 3 Jun 2025 16:54:03 +0000 (18:54 +0200)]
ada: Fix spurious Constraint_Error raised by 'Value of fixed-point types
This happens for very large Smalls with regard to the size of the mantissa,
because the prerequisites of the implementation used in this case are not
met, although they are documented in the head comment of Integer_To_Fixed.
This change documents them at the beginning of the body of System.Value_F
and adjusts the compiler interface accordingly.
gcc/ada/ChangeLog:
* libgnat/s-valuef.adb: Document the prerequisites more precisely.
* libgnat/a-tifiio.adb (OK_Get_32): Adjust to the prerequisites.
(OK_Get_64): Likewise.
* libgnat/a-tifiio__128.adb (OK_Get_32): Likewise.
(OK_Get_64): Likewise.
(OK_Get_128): Likewise.
* libgnat/a-wtfiio.adb (OK_Get_32): Likewise.
(OK_Get_64): Likewise.
* libgnat/a-wtfiio__128.adb (OK_Get_32): Likewise.
(OK_Get_64): Likewise.
(OK_Get_128): Likewise.
* libgnat/a-ztfiio.adb (OK_Get_32): Likewise.
(OK_Get_64): Likewise.
* libgnat/a-ztfiio__128.adb (OK_Get_32): Likewise.
(OK_Get_64): Likewise.
(OK_Get_128): Likewise.
* exp_imgv.adb (Expand_Value_Attribute): Adjust the conditions under
which the RE_Value_Fixed{32,64,128} routines are called for ordinary
fixed-point types.
ada: Fix assertion failure on finalizable aggregate
The Finalizable aspect makes it possible that
Insert_Actions_In_Scope_Around is entered with an empty list of after
actions. This patch fixes a condition that was not quite right in this
case.
Bob Duff [Fri, 30 May 2025 13:38:04 +0000 (09:38 -0400)]
ada: Correct documentation of policy_identifiers for Assertion_Policy
Follow-on to gnat-945.
Change Ignore to Disable; Ignore is defined by the language,
Disable is the implementation-defined one.
Also minor code cleanup.
gcc/ada/ChangeLog:
* doc/gnat_rm/implementation_defined_characteristics.rst:
Change Ignore to Disable.
* sem_ch13.ads (Analyze_Aspect_Specifications):
Minor: Remove incorrect comment; there is no need to check
Has_Aspects (N) at the call site.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.
Bob Duff [Fri, 30 May 2025 12:07:43 +0000 (08:07 -0400)]
ada: Remove Empty_Or_Error
Minor stylistic improvement: Remove Empty_Or_Error, and replace
comparisons with Empty_Or_Error with "[not] in Empty | Error".
(Found while working on VAST.)
Viljar Indus [Thu, 29 May 2025 07:54:30 +0000 (10:54 +0300)]
ada: Call Semantics when analyzing a renamed package
Calling Semantics here will additionally update Current_Sem_Unit
to refer to the renamed unit, so that we will not receive
bogus visibility errors when checking for self-referential with clauses.
gcc/ada/ChangeLog:
* sem_ch10.adb (Analyze_With_Clause): Call Semantics instead
of Analyze to bring Current_Sem_Unit up to date.
Eric Botcazou [Tue, 27 May 2025 11:32:18 +0000 (13:32 +0200)]
ada: Fix wrong conversion of controlled array with representation change
The problem is that a temporary is created for the conversion because of the
representation change, and it is finalized without having been initialized.
gcc/ada/ChangeLog:
* exp_ch4.adb (Handle_Changed_Representation): Alphabetize local
variables. Set the No_Finalize_Actions flag on the assignment.
Joffrey Huguet [Thu, 12 Dec 2024 14:40:46 +0000 (15:40 +0100)]
ada: Support Potentially_Invalid aspect in the frontend
The Potentially_Invalid aspect is used to disable the SPARK assumption
that all read data is valid on a case-by-case basis in GNATprove.
gcc/ada/ChangeLog:
* aspects.ads: Define an identifier for Potentially_Invalid.
* doc/gnat_rm/implementation_defined_aspects.rst: Add section for Potentially_Invalid.
* sem_attr.adb (Analyze_Attribute_Old_Result): Attribute Old is allowed to occur in a
Potentially_Invalid aspect.
* sem_ch13.adb (Analyze_Aspect_Specifications): Handle Potentially_Invalid.
* sem_util.adb (Has_Potentially_Invalid): Returns True iff an entity is subject to the
Potentially_Invalid aspect.
* sem_util.ads (Has_Potentially_Invalid): Idem.
* snames.ads-tmpl: New name for Potentially_Invalid.
* gnat_rm.texi: Regenerate.
Piotr Trojanek [Tue, 27 May 2025 10:17:06 +0000 (12:17 +0200)]
ada: Fix ALI elaboration flags for ghost compilation units (cont.)
When GNAT was compiling a ghost unit, the ALI file wrongly suggested that this
unit required elaboration counters, which caused linking errors to non-existing
objects.
gcc/ada/ChangeLog:
* sem_ch10.adb (Analyze_Compilation_Unit): Ignored ghost units need no
elaboration checks.
The vectorizer tracks alignment of datarefs with dr_aligned
and dr_unaligned_supported, but that is alignment with respect to
the target alignment, which can be smaller than the alignment of the mode
used for the access. The following fixes this discrepancy
for vectorizing loads and stores. The issue is visible for
aarch64 SVE and RISC-V, where VLA vector modes have larger-than-element
alignment but the target handles element alignment
just fine.
H.J. Lu [Thu, 3 Jul 2025 02:54:39 +0000 (10:54 +0800)]
x86-64: Add RDI clobber to 64-bit dynamic TLS patterns
*tls_global_dynamic_64_largepic, *tls_local_dynamic_64_<mode> and
*tls_local_dynamic_base_64_largepic use RDI as the __tls_get_addr
argument. Add RDI clobber to these patterns to show it.
gcc/
PR target/120908
* config/i386/i386.cc (legitimize_tls_address): Pass RDI to
gen_tls_local_dynamic_64.
* config/i386/i386.md (*tls_global_dynamic_64_largepic): Add
RDI clobber and use it to generate LEA.
(*tls_local_dynamic_64_<mode>): Likewise.
(*tls_local_dynamic_base_64_largepic): Likewise.
(@tls_local_dynamic_64_<mode>): Add a clobber.
gcc/testsuite/
PR target/120908
* gcc.target/i386/pr120908.c: New test.
Jason Merrill [Wed, 2 Jul 2025 22:03:57 +0000 (18:03 -0400)]
c++: uninitialized TARGET_EXPR and constexpr [PR120684]
In r15-7532 for PR118856 I introduced a TARGET_EXPR with a
TARGET_EXPR_INITIAL of void_node to express that no initialization is done.
And indeed evaluating that doesn't store a value for the TARGET_EXPR_SLOT
variable.
But then at the end of the full-expression, destroy_value stores void_node
to express that its lifetime has ended. If we evaluate the same
full-expression again, global_ctx->values still holds the void_node, causing
confusion when we try to destroy it again. So clear out any value before
evaluating a TARGET_EXPR_INITIAL of void_type.
PR c++/120684
PR c++/118856
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_constant_expression) [TARGET_EXPR]: Clear
the value first if is_complex.
H.J. Lu [Tue, 1 Jul 2025 09:17:06 +0000 (17:17 +0800)]
x86-64: Add RDI clobber to tls_global_dynamic_64 patterns
*tls_global_dynamic_64_<mode> uses RDI as the __tls_get_addr argument.
Add RDI clobber to tls_global_dynamic_64 patterns to show it.
PR target/120908
* config/i386/i386.cc (legitimize_tls_address): Pass RDI to
gen_tls_global_dynamic_64.
* config/i386/i386.md (*tls_global_dynamic_64_<mode>): Add RDI
clobber and use it to generate LEA.
(@tls_global_dynamic_64_<mode>): Add a clobber.
Dimitar Dimitrov [Fri, 20 Jun 2025 17:57:15 +0000 (20:57 +0300)]
RISC-V: testsuite: Skip tests providing -march/-mcpu for ILP32E/ILP64E ABIs
Some test cases explicitly set -march or -mcpu with extensions which
are not compatible with the E ABI variants. This leads to spurious
errors when the toolchain has been configured for the RV32E base ISA and
ILP32E ABI:
cc1: error: ILP32E ABI does not support the 'D' extension
Also, test gcc.target/riscv/rvv/base/pr119164.c implicitly requires
rv64 since it explicitly selects -march=rv64gcv_zvl256b:
cc1: error: ABI requires '-march=rv32'
Testing done:
- Ensured cross riscv64-unknown-linux-gnu has no difference in test
output with and without the patch.
- For riscv32-unknown-elf there are no new failures. Test case pr119164.c
no longer fails and is now marked as unsupported.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/mcpu-xt-c908.c: Disable for E ABI variants.
* gcc.target/riscv/mcpu-xt-c908v.c: Ditto.
* gcc.target/riscv/mcpu-xt-c910.c: Ditto.
* gcc.target/riscv/mcpu-xt-c910v2.c: Ditto.
* gcc.target/riscv/mcpu-xt-c920.c: Ditto.
* gcc.target/riscv/mcpu-xt-c920v2.c: Ditto.
* gcc.target/riscv/pr118241.c: Ditto.
* gcc.target/riscv/pr120223.c: Ditto.
* gcc.target/riscv/rvv/base/pr119164.c: Disable for E ABI variants
and for 32-bit ISA.
[PATCH] [RISC-V] Fix shift type for RVV interleaved stepped patterns [PR120356]
This corrects the shift type used for interleaved stepped patterns when
expanding const vectors in LRA. The shift instruction was originally LSHIFTRT,
and it should remain that type for both the LRA and non-LRA cases.
PR target/120356
gcc/ChangeLog:
* config/riscv/riscv-v.cc
(expand_const_vector_interleaved_stepped_npatterns):
Fix ASHIFT to LSHIFTRT insn.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr120356.c: New test.
Jonathan Wakely [Wed, 27 Nov 2024 20:58:29 +0000 (20:58 +0000)]
libstdc++: Use hidden friends for __normal_iterator operators
As suggested by Jason, this makes all __normal_iterator operators into
friends so they can be found by ADL and don't need to be separately
exported in module std.
The operator<=> comparing two iterators of the same type is removed
entirely, instead of being made a hidden friend. That overload was added
by r12-5882-g2c7fb16b5283cf to deal with unconstrained operator
overloads found by ADL, as defined in the testsuite_greedy_ops.h header.
We don't actually test that case as there's no unconstrained <=> in that
header, and it doesn't seem reasonable for anybody to define such an
operator<=> in C++20 when they should constrain their overloads properly
(e.g. using a requires-clause). The homogeneous operator<=> overloads
added for reverse_iterator and move_iterator could also be removed, but
that's not part of this commit.
I also had to reorder the __attribute__((always_inline)) and
[[nodiscard]] attributes on the pre-c++20 operators, because Clang won't
allow [[foo]] after __attribute__((bar)) on a friend function:
<source>:4:36: error: an attribute list cannot appear here
4 | __attribute__((always_inline)) [[nodiscard]] friend bool
| ^~~~~~~~~~~~~
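A self-contained sketch of the hidden-friend pattern and the attribute order
that both compilers accept (greatly simplified; the real __normal_iterator has
many more operators, and the names below are illustrative):
template<typename It>
struct wrapped_iterator
{
  It base;
  // Hidden friend: a non-member operator found only via ADL on
  // wrapped_iterator, so it does not need to be exported separately from
  // module std.  Note the order: [[nodiscard]] before
  // __attribute__((always_inline)), which avoids the Clang error quoted above.
  [[nodiscard]] __attribute__((always_inline))
  friend bool operator==(const wrapped_iterator& x, const wrapped_iterator& y)
  { return x.base == y.base; }
};
int main()
{
  int a[2] = {};
  wrapped_iterator<int*> i{a}, j{a};
  return i == j ? 0 : 1;   // operator== found via ADL
}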
libstdc++-v3/ChangeLog:
* include/bits/stl_iterator.h (__normal_iterator): Make all
non-member operators hidden friends, except ...
(operator<=>(__normal_iterator<I,C>, __normal_iterator<I,C>)):
Remove.
* src/c++11/string-inst.cc: Remove explicit instantiations of
operators that are no longer templates.
* src/c++23/std.cc.in (__gnu_cxx): Do not export operators for
__normal_iterator.
Richard Biener [Wed, 2 Jul 2025 11:44:59 +0000 (13:44 +0200)]
Do not query further vector epilogues after a masked epilogue
When doing --param vect-partial-vector-usage=1 we'd continue querying
the target whether it wants more vector epilogues, but when it comes
back with a suggestion we then might iterate endlessly. Do not
even ask the target when we decided for the last epilogue to be
one with partial vectors.
PR tree-optimization/120927
* tree-vect-loop.cc (vect_analyze_loop): Stop querying
further epilogues after one with partial vectors.
libstdc++: make range view ctors explicit (P2711) [PR119744]
Make range view constructors explicit, per P2711. Technically, this
is a breaking change, but it is unlikely to break any production
code, as reliance on non-explicit construction is unidiomatic.
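A hedged illustration of the kind of code affected (the types and values here
are only for exposition):
#include <ranges>
#include <vector>
int main()
{
  std::vector<int> v{1, 2, 3, 4};
  // Unaffected: direct construction and the pipe syntax keep working.
  auto ok1 = std::ranges::take_view(std::views::all(v), 2);
  auto ok2 = v | std::views::take(2);
  // Ill-formed after P2711: copy-list-initialization relies on the
  // multi-argument view constructor being implicit.
  // std::ranges::take_view<std::ranges::ref_view<std::vector<int>>> bad
  //   = {std::views::all(v), 2};
  return ok1.size() == ok2.size() ? 0 : 1;
}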
libstdc++-v3/ChangeLog:
PR libstdc++/119744
* include/std/ranges: View ctors become explicit.
Qing Zhao [Mon, 16 Jun 2025 21:08:32 +0000 (21:08 +0000)]
Use the counted_by attribute of pointers in array bound checker.
The current array bound checker only instruments ARRAY_REF, and the INDEX
information is the 2nd operand of the ARRAY_REF.
When extending the array bound checker to pointer references with
counted_by attributes, the hardest part is to get the INDEX of the
corresponding array ref from the offset computation expression of
the pointer ref. I.e., given an OFFSET expression and the ELEMENT_SIZE,
get the index expression from the OFFSET.
For example:
OFFSET:
((long unsigned int) m * (long unsigned int) SAVE_EXPR <n>) * 4
ELEMENT_SIZE:
(sizetype) SAVE_EXPR <n> * 4
get the index as (long unsigned int) m.
gcc/c-family/ChangeLog:
* c-gimplify.cc (is_address_with_access_with_size): New function.
(ubsan_walk_array_refs_r): Instrument an INDIRECT_REF whose base
address is .ACCESS_WITH_SIZE or an address computation whose base
address is .ACCESS_WITH_SIZE.
* c-ubsan.cc (ubsan_instrument_bounds_pointer_address): New function.
(struct factor_t): New structure.
(get_factors_from_mul_expr): New function.
(get_index_from_offset): New function.
(get_index_from_pointer_addr_expr): New function.
(is_instrumentable_pointer_array_address): New function.
(ubsan_array_ref_instrumented_p): Change prototype.
Handle MEM_REF in addition to ARRAY_REF.
(ubsan_maybe_instrument_array_ref): Handle MEM_REF in addition
to ARRAY_REF.
gcc/testsuite/ChangeLog:
* gcc.dg/ubsan/pointer-counted-by-bounds-2.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-3.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-4.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-5.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds.c: New test.
Qing Zhao [Mon, 16 Jun 2025 20:58:40 +0000 (20:58 +0000)]
Use the counted_by attribute of pointers in builtin-object-size.
gcc/ChangeLog:
* tree-object-size.cc (access_with_size_object_size): Update comments
for pointers with .ACCESS_WITH_SIZE.
(collect_object_sizes_for): Propagate size info through GIMPLE_ASSIGN
for pointers with .ACCESS_WITH_SIZE.
gcc/testsuite/ChangeLog:
* gcc.dg/pointer-counted-by-4-char.c: New test.
* gcc.dg/pointer-counted-by-4-float.c: New test.
* gcc.dg/pointer-counted-by-4-struct.c: New test.
* gcc.dg/pointer-counted-by-4-union.c: New test.
* gcc.dg/pointer-counted-by-4.c: New test.
* gcc.dg/pointer-counted-by-5.c: New test.
* gcc.dg/pointer-counted-by-6.c: New test.
* gcc.dg/pointer-counted-by-7.c: New test.
specifies that the "array2" is an array that is pointed by the
pointer field, and its number of elements is given by the field
"count2" in the same structure.
gcc/c-family/ChangeLog:
* c-attribs.cc (handle_counted_by_attribute): Accept counted_by
attribute for pointer fields.
gcc/c/ChangeLog:
* c-decl.cc (verify_counted_by_attribute): Change the 2nd argument
to a vector of fields with counted_by attribute. Verify all fields
in this vector.
(finish_struct): Collect all the fields with counted_by attribute
to a vector and pass this vector to verify_counted_by_attribute.
* c-typeck.cc (build_counted_by_ref): Handle pointers with counted_by.
Add one more argument, issue error when the pointee type is a structure
or union including a flexible array member.
(build_access_with_size_for_counted_by): Handle pointers with counted_by.
(handle_counted_by_for_component_ref): Call build_counted_by_ref
with the new prototype.
gcc/ChangeLog:
* doc/extend.texi: Extend counted_by attribute to pointer fields in
structures. Add one more requirement to pointers with counted_by
attribute.
gcc/testsuite/ChangeLog:
* gcc.dg/flex-array-counted-by.c: Update test.
* gcc.dg/pointer-counted-by-1.c: New test.
* gcc.dg/pointer-counted-by-2.c: New test.
* gcc.dg/pointer-counted-by-3.c: New test.
* gcc.dg/pointer-counted-by.c: New test.
Harald Anlauf [Tue, 1 Jul 2025 19:41:53 +0000 (21:41 +0200)]
Fortran: fix minor issues with coarrays
gcc/fortran/ChangeLog:
* coarray.cc (check_add_new_component): Treat pure and elemental
intrinsic functions the same as non-intrinsic ones.
(create_caf_add_data_parameter_type): Fix front-end memleaks.
* trans-intrinsic.cc (conv_caf_func_index): Likewise.
Patrick Palka [Tue, 1 Jul 2025 17:43:12 +0000 (13:43 -0400)]
libstdc++: Use ranges::iter_move in ranges::remove_if [PR120789]
PR libstdc++/120789
libstdc++-v3/ChangeLog:
* include/bits/ranges_algo.h (__remove_if_fn::operator()): Use
ranges::iter_move(iter) instead of std::move(*iter).
* testsuite/25_algorithms/remove_if/120789.cc: New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Patrick Palka [Tue, 1 Jul 2025 17:43:09 +0000 (13:43 -0400)]
libstdc++: Use ranges::iter_move in ranges::unique [PR120789]
PR libstdc++/120789
libstdc++-v3/ChangeLog:
* include/bits/ranges_algo.h (__unique_fn::operator()): Use
ranges::iter_move(iter) instead of std::move(*iter).
* testsuite/25_algorithms/unique/120789.cc: New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Jakub Jelinek [Tue, 1 Jul 2025 17:37:39 +0000 (19:37 +0200)]
testsuite: Fix up gcc.target/powerpc/builtin_altivec_tr_stxvr_runnable.c test (test UB) [PR120919]
In my reading of the test and the instructions emitted by the
builtins, it invokes UB 4 times, each time overwriting one byte
after some variable (sc, then ss, then si and then sll).
If we are lucky, like at -O0 -mcpu=power10, there is just padding
there or something that doesn't make the tests fail, if unlucky
like with -O0 -mcpu=power10 -fstack-protector-strong,
&sc + 1 == &expected_sc
and so it overwrites the expected_sc variable.
The test fails when testing with
RUNTESTFLAGS="--target_board=unix/'{,-fstack-protector-strong}'"
on power10.
The following patch fixes that by using arrays of 2 elements, so that
the overwriting of 1 byte happens within the same variable.
2025-07-01 Jakub Jelinek <jakub@redhat.com>
PR testsuite/120919
* gcc.target/powerpc/builtin_altivec_tr_stxvr_runnable.c (main): Change
sc, ss, si and sll vars from scalars to arrays of 2 elements,
initialize and test just the first one though.
Jakub Jelinek [Tue, 1 Jul 2025 15:33:32 +0000 (17:33 +0200)]
s390: Add -fno-stack-protector to 3 tests
In Fedora/RHEL we usually test with
make check RUNTESTFLAGS="--target_board=unix/'{,-fstack-protector-strong}'"
because -fstack-protector-strong is used when building pretty much all the
packages.
In the past Marek Polacek has committed tweaks to various tests to make
them PASS in such testing, see e.g. r14-6276 or r14-2200.
These 3 tests FAIL with -fstack-protector-strong on s390x because they
use check-function-bodies and aren't prepared for the extra
-fstack-protector-{strong,all} code in the prologue/epilogue.
Jakub Jelinek [Tue, 1 Jul 2025 13:28:10 +0000 (15:28 +0200)]
c++: Fix up cp_build_array_ref COND_EXPR handling [PR120471]
The following testcase is miscompiled since the introduction of UBSan:
cp_build_array_ref's COND_EXPR handling replaces
(cond ? a : b)[idx] with cond ? a[idx] : b[idx], but if there are
SAVE_EXPRs inside of idx, they will be evaluated in just one of the
branches and the other branch uses uninitialized temporaries.
Fixed by keeping the existing behavior if idx doesn't have side effects
and is invariant. Otherwise if op1/op2 are ARRAY_TYPE arrays with
invariant addresses or pointers with invariant values, use
SAVE_EXPR <op0>, SAVE_EXPR <idx>, SAVE_EXPR <op0> as a new condition
and SAVE_EXPR <idx> instead of idx for the recursive calls.
Otherwise punt, but if op1/op2 are ARRAY_TYPE, furthermore call
cp_default_conversion on array, so that COND_EXPR with ARRAY_TYPE doesn't
survive in the IL until expansion.
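A hedged reduction of the problematic shape (not the actual PR testcase): the
index has side effects and, e.g. under -fsanitize=bounds, gets wrapped in
SAVE_EXPRs, so it must be evaluated exactly once whichever array the condition
selects:
int a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8};
int n;
int f(bool cond)
{
  // Before the fix this could be rewritten as cond ? a[n++] : b[n++], with
  // the SAVE_EXPRs for the index evaluated in only one branch while the
  // other branch read uninitialized temporaries.
  return (cond ? a : b)[n++];
}
int main()
{
  int r = f(true);
  r += f(false);
  return (n == 2 && r == 1 + 6) ? 0 : 1;   // a[0] then b[1]
}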
2025-07-01 Jakub Jelinek <jakub@redhat.com>
PR c++/120471
gcc/
* tree.h (address_invariant_p): New function.
* tree.cc (address_invariant_p): New function.
(tree_invariant_p_1): Use it for ADDR_EXPR handling. Formatting
tweak.
gcc/cp/
* typeck.cc (cp_build_array_ref) <case COND_EXPR>: If idx is not
INTEGER_CST, don't optimize the case (but cp_default_conversion on
array early if it has ARRAY_TYPE) or use
SAVE_EXPR <op0>, SAVE_EXPR <idx>, SAVE_EXPR <op0> as new op0 depending
on flag_strong_eval_order and whether op1 and op2 are arrays with
invariant address or tree invariant pointers. Formatting fixes.
gcc/testsuite/
* g++.dg/ubsan/pr120471.C: New test.
* g++.dg/parse/pr120471.C: New test.
Add an aarch64 SIMD optimization converting mvn+shrn into mvni+subhn when
possible; by using a constant, this allows better optimization when the code
is inside a loop.
The conversion is based on the fact that for an unsigned integer:
-x = ~x + 1 => ~x = -1 - x
thus '(u8)(~x >> imm)' is equivalent to '(u8)(((u16)-1 - x) >> imm)'.
For the following function:
uint8x8_t neg_narrow_v8hi(uint16x8_t a) {
uint16x8_t b = vmvnq_u16(a);
return vshrn_n_u16(b, 8);
}
Without this patch the assembly looks like:
not v0.16b, v0.16b
shrn v0.8b, v0.8h, 8
After the patch it becomes:
mvni v31.4s, 0
subhn v0.8b, v31.8h, v0.8h
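As a quick, purely illustrative scalar check of the identity the transformation
relies on (independent of the intrinsics above):
#include <cassert>
#include <cstdint>
int main()
{
  for (unsigned v = 0; v <= 0xffff; ++v)
  {
    std::uint16_t x = static_cast<std::uint16_t>(v);
    std::uint8_t via_not =
      static_cast<std::uint8_t>(static_cast<std::uint16_t>(~x) >> 8);
    std::uint8_t via_subhn =
      static_cast<std::uint8_t>(static_cast<std::uint16_t>(0xffffu - x) >> 8);
    // (u8)(~x >> 8) == (u8)(((u16)-1 - x) >> 8) for every 16-bit x.
    assert(via_not == via_subhn);
  }
  return 0;
}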
Fortran: Ensure arguments in coarray call get unique components in add_data [PR120847]
PR fortran/120847
gcc/fortran/ChangeLog:
* coarray.cc (check_add_new_comp_handle_array): Make the count
of components static to be able to create more than one. Create
an array component only for array expressions.
Jakub Jelinek [Tue, 1 Jul 2025 09:58:28 +0000 (11:58 +0200)]
testsuite: Fix up pr119318.c test for big-endian [PR120082]
The test is not endianness clean: x[0] is supposed to be ((__int128)0x19)<<32
on little endian - 0x19 is in the second vector elt - but ((__int128)0x19)<<64
on big endian. I've also added verification of int and __int128 sizes just
in case we have, say, a 16-bit or 64-bit int target with the __int128 type, or
PDP endian gets __int128 support.
2025-07-01 Jakub Jelinek <jakub@redhat.com>
PR ipa/119318
PR testsuite/120082
* gcc.dg/ipa/pr119318.c (main): Expect a different result on big endian
than on little endian; on unexpected endianness or int/int128 sizes don't
test anything. Formatting fixes.
Jakub Jelinek [Tue, 1 Jul 2025 09:26:45 +0000 (11:26 +0200)]
tailc: Handle musttail in case of non-cleaned-up cleanups, especially ASan related [PR120608]
The following testcases FAIL at -O0 -fsanitize=address. The problem is
we end up with something like
_26 = foo (x_24(D)); [must tail call]
// predicted unlikely by early return (on trees) predictor.
finally_tmp.3_27 = 0;
goto <bb 5>; [INV]
...
<bb 5> :
# _6 = PHI <_26(3), _23(D)(4)>
# finally_tmp.3_8 = PHI <finally_tmp.3_27(3), finally_tmp.3_22(4)>
.ASAN_MARK (POISON, &c, 4);
if (finally_tmp.3_8 == 1)
goto <bb 7>; [INV]
else
goto <bb 6>; [INV]
<bb 11> :
<L11>:
return _7;
before the sanopt pass. This is -O0; we don't try to do forward
propagation, jump threading, etc. And what is worse, the sanopt
pass lowers the .ASAN_MARK calls that the tailc/musttail passes
already handle into something that they can't easily pattern match.
The following patch fixes that by
1) moving the musttail pass 2 passes earlier (this is mostly just
for -O0/-Og, for normal optimization levels musttail calls are
handled in the tailc pass), i.e. across the sanopt and cleanup_eh
passes
2) recognizes these finally_tmp SSA_NAME assignments, PHIs using those
and GIMPLE_CONDs deciding based on those both on the backwards
walk (when we start from the edges to EXIT) and forwards walk
(when we find a candidate tail call and process assignments
after those up to the return statement). For backwards walk,
ESUCC argument has been added which is either NULL for the
noreturn musttail case, or the succ edge through which we've
reached bb and if it sees GIMPLE_COND with such comparison,
based on the ESUCC and comparison it will remember which later
edges to ignore later on and which bb must be walked up to the
start during tail call discovery (the one with the PHI).
3) the move of musttail pass across cleanup_eh pass resulted in
g++.dg/opt/pr119613.C regressions but moving cleanup_eh before
sanopt doesn't work too well, so I've extended
empty_eh_cleanup to also handle resx which doesn't throw
externally
I know moving a pass on release branches feels risky, though the
musttail pass is only relevant to functions with musttail calls,
so something quite rare and only at -O0/-Og (unless one e.g.
disables the tailc pass).
2025-07-01 Jakub Jelinek <jakub@redhat.com>
PR middle-end/120608
* passes.def (pass_musttail): Move before pass_sanopt.
* tree-tailcall.cc (empty_eh_cleanup): Handle GIMPLE_RESX
which doesn't throw externally through recursion on single
eh edge (if any and cnt still allows that).
(find_tail_calls): Add ESUCC, IGNORED_EDGES and MUST_SEE_BBS
arguments. Handle GIMPLE_CONDs for non-simplified cleanups with
finally_tmp temporaries both on backward and forward walks, adjust
recursive call.
(tree_optimize_tail_calls_1): Adjust find_tail_calls callers.
* c-c++-common/asan/pr120608-3.c: New test.
* c-c++-common/asan/pr120608-4.c: New test.
* g++.dg/asan/pr120608-3.C: New test.
* g++.dg/asan/pr120608-4.C: New test.