REGNO_REG_CLASS (regno) [Macro]
A C expression whose value is a register class containing hard
register regno. In general there is more than one such class;
choose a class which is minimal, meaning that no smaller class
also contains the register.
riscv_regno_to_class[] currently maps every FP hard register to
RVC_FP_REGS, but RVC_FP_REGS only contains f8-f15. The entries for
f0-f7 and f16-f31 therefore violate the "containing hard register
regno" half of the contract: the returned class does not contain the
register at all.
The mismatch corrupts IRA's cost model. setup_allocno_cost_vector
indexes the per-hard-reg cost slot via REGNO_REG_CLASS:
After setup_regno_cost_classes_by_mode adds RVC_FP_REGS to the cost
classes, the cost for e.g. f16 is silently read from the RVC_FP_REGS
slot.
The new fp-reg-class.c testcase puts eight "cf"- and sixteen "f"-
constrained doubles live across a call. In the buggy state IRA
places the cf pseudos outside the cf class and LRA recovers with
sixteen fmv.d to fs* registers; with the fix IRA spills those values
honestly and the IRA "+++Costs" line reports a non-zero "mem"
component.
Fix it by giving each FP hard register its minimal class: FP_REGS for
f0-f7 and f16-f31, RVC_FP_REGS for f8-f15. As a companion change,
switch riscv_secondary_memory_needed from class-equality tests to
reg_class_subset_p so it still recognises the FP side regardless of
which subclass the table returns.
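As a rough illustration of the contract described above, here is a minimal
sketch of a two-class model. It is not the riscv.cc code: the SK_/sk_ names
are hypothetical stand-ins, and only the f8-f15 versus f0-f31 membership is
taken from the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model: only the two FP classes are shown.  */
enum sk_reg_class { SK_NO_REGS, SK_RVC_FP_REGS, SK_FP_REGS };

/* Does class CL contain FP register f<N>?  RVC_FP_REGS is f8-f15,
   FP_REGS is f0-f31.  */
bool
sk_class_contains_fpr (enum sk_reg_class cl, int n)
{
  if (n < 0 || n > 31)
    return false;
  switch (cl)
    {
    case SK_RVC_FP_REGS:
      return n >= 8 && n <= 15;
    case SK_FP_REGS:
      return true;
    default:
      return false;
    }
}

/* Is every register of A also in B?  This mirrors the idea behind
   reg_class_subset_p for this small model.  */
bool
sk_class_subset_p (enum sk_reg_class a, enum sk_reg_class b)
{
  for (int n = 0; n < 32; n++)
    if (sk_class_contains_fpr (a, n) && !sk_class_contains_fpr (b, n))
      return false;
  return true;
}

/* Minimal class containing f<N>: the smaller class wins when it
   actually contains the register.  */
enum sk_reg_class
sk_fpr_minimal_class (int n)
{
  assert (n >= 0 && n <= 31);
  return (n >= 8 && n <= 15) ? SK_RVC_FP_REGS : SK_FP_REGS;
}
```

With this model the returned class always contains the register, and a
subset test against SK_FP_REGS accepts either FP class, which is why a
subset check is more robust than a class-equality check.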
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_regno_to_class): Use the minimal
class containing each FP hard register: FP_REGS for f0-f7 and
f16-f31, RVC_FP_REGS for f8-f15.
(riscv_secondary_memory_needed): Use reg_class_subset_p to
detect FP classes.
Zhongyao Chen [Thu, 28 May 2026 11:27:25 +0000 (19:27 +0800)]
RISC-V: Support VLS LMUL cost scaling
Make VLS (fixed-length) vector modes use the same LMUL cost scaling as
VLA modes. This sometimes makes the vectorizer pick smaller LMULs.
The testsuite failures seen in regression testing are handled as follows:
- dyn-lmul-conv-[1-2].c: The cost model now prefers smaller LMULs,
so update expectations.
- pr123414.c: This test relies on large LMULs to trigger a specific bug;
it can be fixed by adding -fno-vect-cost-model.
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (get_lmul_cost_scaling):
Enable scaling for all vector modes (VLA and VLS).
Tomasz Kamiński [Mon, 25 May 2026 13:15:09 +0000 (15:15 +0200)]
libstdc++: Optimize operator<< for piecewise distributions.
This avoids creating a temporary vector and uses the _M_int and _M_den
members of _M_param. Empty _M_int (default) values are handled by
printing the values directly.
libstdc++-v3/ChangeLog:
* include/bits/random.h (piecewise_constant_distribution::param_type)
(piecewise_linear_distribution::param_type): Befriend operator<<.
* include/bits/random.tcc
(operator<<(basic_ostream&, piecewise_linear_distribution))
(operator<<(basic_ostream&, piecewise_constant_distribution)):
Use __x._M_param._M_int and __x._M_param._M_den instead of accessors.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Tomasz Kamiński [Mon, 25 May 2026 12:53:43 +0000 (14:53 +0200)]
libstdc++: Expand serialization test for piecewise distributions.
Due to the variability of the results, the tests are currently limited
to x86_64 architectures. The float/double tests are disabled for -m32
because the results were unstable there.
libstdc++-v3/ChangeLog:
* testsuite/26_numerics/random/piecewise_constant_distribution/operators/serialize2.cc:
New test.
* testsuite/26_numerics/random/piecewise_linear_distribution/operators/serialize2.cc:
New test.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
This can be simplified into a single dup instruction going to an SVE
register directly from a scalar (or a smaller vector) value:
mov z0.s, s0
ret
To facilitate this, this patch adds a pattern that combine can use to
merge two vec_duplicate instructions (scalar -> AdvSIMD and AdvSIMD ->
SVE) into a single one (scalar -> SVE).
To demonstrate the effect of this patch, the vec-init-23.c test from
AdvSIMD was reused as a new SVE test (vec_init_5.c).
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md
(*aarch64_vec_duplicate_subvector<vconsv><vconq><mode>):
New pattern.
* config/aarch64/iterators.md (VCONSV): New mode attribute.
(vconsv): Likewise.
Artemiy Volkov [Thu, 26 Feb 2026 08:45:08 +0000 (08:45 +0000)]
aarch64: implement vec_concat support for sub-64-bit types
This patch improves handling of 2-element vec_concats in
aarch64_vector_init_fallback (); where previously the aarch64_vec_concat
insn was emitted only for pairs of vectors, we now allow scalar operands
as well. Furthermore, if the two operands are the same, we can now emit a
vec_duplicate instead of a vec_concat, leading to better code generation.
This is backed by the new combine{z,_internal}{,_be} insn patterns, that
were each split between integral 16- and 32-bit modes (only involving GPRs
and memory), and the rest (requiring the "w" alternatives as well).
The effect of the changes is illustrated by the changes to vec-init-23.c,
introduced in the previous patch (and a handful of other vector-init
related tests).
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (*aarch64_combine_internal<mode>):
New insn pattern.
(*aarch64_combine_internal_be<mode>): Likewise.
(*aarch64_combinez<mode>): Likewise.
(*aarch64_combinez_be<mode>): Likewise.
(@aarch64_vec_concat<mode>): Support smaller vector and scalar modes.
* config/aarch64/aarch64.cc (aarch64_expand_vector_init_fallback):
Handle the case of two scalar elements.
* config/aarch64/iterators.md (SSUB64): New mode iterator.
(VSSUB64): Likewise.
(VSSUB32_I): Likewise.
(VSSUB64_F): Likewise.
(VS32_I_SUB64_F): Likewise.
(single_wx): Define attribute for sub-64-bit vector and scalar modes.
(bitsize): Likewise.
(VDBL): Likewise.
(single_dwx): New mode attribute.
Artemiy Volkov [Thu, 26 Feb 2026 09:01:30 +0000 (09:01 +0000)]
aarch64: initialize vectors from starting subsequence
Now that we have 2- and 4-element vector modes for all the sub-word scalar
modes, we can emit more efficient code when the elements of a vector
constructor can be generated from a common starting subsequence of length
power of two. To do this, first detect the shortest possible starting
subsequence by repeatedly folding the initial constructor element array
in half, as long as the left and the right halves are equal. Afterwards,
after emitting the subsequence, duplicate it by generating a
vec_duplicate with the correct source mode.
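The folding step can be sketched as follows. This is only an illustration
on a plain int array, not the aarch64.cc code; the function name is
hypothetical, and the element count is assumed to be a power of two, as it
is for the vector modes discussed here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the length of the shortest starting subsequence of ELTS[0..N-1]
   that regenerates the whole array by repeated duplication.  The array is
   folded in half for as long as the left and right halves are equal.  */
size_t
sk_shortest_starting_subseq (const int *elts, size_t n)
{
  assert (elts != NULL && n > 0);
  while (n % 2 == 0
         && memcmp (elts, elts + n / 2, (n / 2) * sizeof *elts) == 0)
    n /= 2;
  return n;
}
```

For a constructor of the form {x, y, z, w, x, y, z, w} with distinct
values this returns 4, so only the first four elements need to be built
before the duplication step.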
On the MD side, this requires implementing the vec_duplicate optab to
duplicate an arbitrary sub-128-bit value into a full 64- or a 128-bit
AdvSIMD register, as well as the vec_set insn for the VSUB64 modes (needed
as fallback for the divide-and-conquer approach). The latter uses a
properly scaled and shifted "bfi" for integer values, and a properly
indexed "ins" for FP elements.
This change allows us to get rid of long chains of inserts and compile
things like:
int16x8_t f (int16_t x, int16_t y, int16_t z, int16_t w)
{
  return (int16x8_t) {x, y, z, w, x, y, z, w};
}
Artemiy Volkov [Mon, 18 May 2026 10:21:18 +0000 (10:21 +0000)]
aarch64: introduce partial AdvSIMD vector modes
In addition to V2HF that already exists, this patch adds 4 more partial
(16- and 32-bit) AdvSIMD vector modes: V4QI, V2QI, V2HI, and V2BF. For
now, these are intended only for duplication into full-sized (32-, 64-,
and 128-bit) registers. As a minimal closure required to bootstrap the
compiler, this also implements the "mov" expand and the "aarch64_simd_mov"
insn_and_split for the new modes (gathered under the VSUB64 iterator).
This patch also adds the new aarch64_advsimd_sub_dword_mode_p () helper to
facilitate detecting the new modes; that is then used (a) to disable
vec_perm_const vectorization for those modes, (b) in the "mov" expander
for those modes, and (c) to define the new "Da" constraint.
Some existing testcases were adjusted where needed. (The _Float16
testcase in sve/slp_1.c temporarily expects GPRs to be used for V2HF,
which is corrected to FPRs by the succeeding patch; and the half-float
complex tests now recognize some of the patterns, but check that V2BF
still can't be used for vectorization.)
gcc/ChangeLog:
* config/aarch64/aarch64-modes.def (VECTOR_MODE): Remove V2HF.
(VECTOR_MODES): Define V2QI, V4QI, V2HI, V2HF, V2BF.
* config/aarch64/aarch64-protos.h
(aarch64_advsimd_sub_dword_mode_p): Declare new predicate.
* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<mode>): New
define_insn_and_split pattern.
(mov<mode>): Add sub-64-bit vector modes to the VALL_F16 expander.
Forego const vector expansion for those modes.
* config/aarch64/aarch64.cc (aarch64_classify_vector_mode):
Handle 16- and 32-bit vector modes.
(aarch64_advsimd_sub_dword_mode_p): Define new predicate.
(aarch64_vectorize_vec_perm_const): Refuse for partial vector modes.
* config/aarch64/constraints.md (Da): New constraint.
* config/aarch64/iterators.md (VSUB64): New iterator.
(VALL_F16_SUB64): Likewise.
(size): Define attribute for sub-64-bit vector modes.
(VSC): New mode attribute.
(vstype): Likewise.
Kewen Lin [Thu, 28 May 2026 11:22:57 +0000 (11:22 +0000)]
i386: Refine c86-4g fdiv scheduling model
Commit r17-258 introduced separate c86-4g fdiv units to avoid the
automaton explosion caused by modeling the whole divider latency on
normal FPU pipes. But the real hardware may keep the associated FPU
pipe occupied for some cycles at both the beginning and the end of
an fdiv or sqrt operation. Following Alexander's suggestion in [1],
this patch still keeps the long-latency part on the dedicated fdiv
unit but models only a bounded part of the FPU pipe occupancy. It
makes the first four cycles reserve both the selected FPU pipe and
the fdiv unit, then keep only the fdiv unit for the remaining cycles.
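The bounded occupancy described above can be pictured with a small
per-cycle table. This is a sketch only and not how genautomata represents
reservations: the unit masks and the function name are hypothetical, N
stands for the total latency and K for the number of leading cycles that
also hold the FPU pipe.

```c
#include <assert.h>

/* Hypothetical unit masks for this sketch.  */
enum { SK_UNIT_FPU = 1 << 0, SK_UNIT_FDIV = 1 << 1 };

/* Fill RESV[0..N-1] with the units reserved on each cycle: the first K
   cycles hold both the FPU pipe and the fdiv unit, the remaining N-K
   cycles hold only the fdiv unit.  */
void
sk_fill_fdiv_reservation (unsigned *resv, int n, int k)
{
  assert (resv != 0 && n > 0 && k >= 0 && k <= n);
  for (int cycle = 0; cycle < n; cycle++)
    resv[cycle] = cycle < k ? (SK_UNIT_FPU | SK_UNIT_FDIV) : SK_UNIT_FDIV;
}
```

With K = 4, cycles 0 to 3 block other FPU work on the selected pipe, while
later cycles only prevent another divide or square root from starting.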
Taking r17-258 as baseline, I tried K = 1,2,3,4 for
fpu,divider*N -> (fpu+divider)*K, divider*(N-K)
and measured the time for build/genautomata and the top 100 symbol
sizes of insn-automata.o (baseline normalized as 100) as below:
1) without any other changes:
          time   size
baseline  100    100
r17-203   340.0  629.3
K1        100.3  100
K2        105.5  112.5
K3        112.8  129
K4        119.4  141
2) Splitting fpu0/fpu2 and fpu1/fpu3 to paired automatons:
          time   size
baseline  100    100
r17-203   340.0  629.3
KS1       79.6   43.3
KS2       79.8   43.3
KS3       79.6   43.3
KS4       79.4   43.3
It turns out that if we want to model the FPU occupancy for some
beginning cycles, separating the involved fpu1/fpu3 from the
original fpu looks better. So this patch splits fpu0/fpu2 and
fpu1/fpu3 into two paired automata and this extra coupling does
not grow the main FPU automata significantly.
This patch also corrects some other modeling omissions like:
- Fix c86_4g_fp_op_idiv_load latency typo by one cycle.
- Merge the old c86_4g_m7 idiv DI/SI/HI reservations after
aligning their latency and divider unit occupancy (with
updated values), while keeping QI separate.
- Adjust reservation units in templates like
c86_4g_m7_avx_vpinsr_reg_load and c86_4g_m7_avx512_sseadd_xy
etc.
- Add missing reservation units and unit occupancy in templates
like c86_4g_m7_avx512_permi2_ymm and
c86_4g_m7_sse_sseiadd_hplus_load etc.
- Adjust reservation units and unit occupancy in templates like
c86_4g_m7_avx512_perm_zmm_imm, c86_4g_m7_avx512_expand and
c86_4g_m7_avx512_ssemul etc.
It also introduces some reusable reservation aliases to simplify
parts of the model.
I tested build time for i686 bootstrapping in a docker container:
- r17-202: 2437s (before c86-4g support)
- r17-203: 7291s (c86-4g support)
- r17-258: 2646s (tweaking for build time)
- this: 2358s
It looks like this patch improves build time (it is even better than
r17-202, though such a small gap can be due to jitter).
The symbol sizes are improved as below:
nm -CS -t d --defined-only gcc/insn-automata.o \
| sed 's/^[0-9]* 0*//' \
| sort -n | tail -20
with r17-258:
20068 r bdver1_fp_transitions
22354 r c86_4g_m7_ieu_min_issue_delay
26208 r slm_min_issue_delay
26580 t internal_min_issue_delay(int, DFA_chip*)
26869 t internal_state_transition(int, DFA_chip*)
27244 r bdver1_fp_min_issue_delay
28518 r glm_check
28518 r glm_transitions
33690 r geode_min_issue_delay
33728 r c86_4g_fp_transitions
45436 r znver4_fpu_min_issue_delay
46980 r bdver3_fp_min_issue_delay
49428 r glm_min_issue_delay
53730 r btver2_fp_min_issue_delay
53760 r znver1_fp_transitions
89414 r c86_4g_m7_ieu_transitions
93960 r bdver3_fp_transitions
181744 r znver4_fpu_transitions
326322 r c86_4g_m7_fpu_min_issue_delay
1305288 r c86_4g_m7_fpu_transitions
with this:
17872 r print_reservation(_IO_FILE*, rtx_insn*)::...
20068 r bdver1_fp_check
20068 r bdver1_fp_transitions
22016 r c86_4g_m7_fpu02_transitions
22354 r c86_4g_m7_ieu_min_issue_delay
26208 r slm_min_issue_delay
27244 r bdver1_fp_min_issue_delay
28199 t internal_min_issue_delay(int, DFA_chip*)
28362 t internal_state_transition(int, DFA_chip*)
28518 r glm_check
28518 r glm_transitions
33690 r geode_min_issue_delay
45436 r znver4_fpu_min_issue_delay
46980 r bdver3_fp_min_issue_delay
49428 r glm_min_issue_delay
53730 r btver2_fp_min_issue_delay
53760 r znver1_fp_transitions
89414 r c86_4g_m7_ieu_transitions
93960 r bdver3_fp_transitions
181744 r znver4_fpu_transitions
Based on random sampling of SPEC2017 benchmarks 525.x264_r and
521.wrf_r, I verified that the new modeling introduces no
significant compilation overhead. Testing with a single job on a
c86-4g-m7 machine revealed no impact on x264 and a tiny increase
for wrf (~0.3%).
Zhongyao Chen [Wed, 20 May 2026 09:30:22 +0000 (17:30 +0800)]
RISC-V: Add RISC-V RVV main-loop overhead comparison in cost model
Add an RVV-specific loop-overhead comparison in the RISC-V cost model and
use it after inside-loop cost comparison.
The RISC-V implementation prefers the RVV mode that eliminates the main
loop, and otherwise compares the main-loop head overhead of the candidates.
Local testing shows no regressions. This is likely because few testcases
have equal inside-loop cost, especially before VLS lmul cost scaling support.
I also ran regression tests with temporary VLS lmul cost scaling support.
Only 3 regressions were found:
- dyn-lmul-conv-1.c & dyn-lmul-conv-2.c: The cost model now prefers smaller LMULs
due to VLS lmul scaling, which is reasonable; the expectations just need updating.
- pr123414.c: This test relies on large LMULs to trigger a specific bug,
so this is reasonable too; it can be fixed by adding -fno-vect-cost-model.
The VLS LMUL cost scaling patch will be updated after this is pushed.
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc
(estimated_loop_iters): New function.
(compare_loop_overhead): New function.
(costs::better_main_loop_than_p): Compare RVV loop overhead after
inside-loop cost.
Alex Coplan [Wed, 27 May 2026 20:26:44 +0000 (21:26 +0100)]
aarch64: Make more use of UINTVAL
I noticed while reviewing some other code that we have existing code of
the form (unsigned HOST_WIDE_INT) INTVAL (X). Such expressions are (by
definition of UINTVAL) equivalent to UINTVAL (x), and the latter is both
more succinct and (IMO) more readable, so this patch replaces those
instances in the aarch64 backend accordingly.
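The equivalence can be shown with stand-in definitions. The sketch below
does not use GCC's rtl.h: the SK_ macros, the sk_rtx type and the two
helper functions are hypothetical, and SK_UINTVAL is simply defined as the
cast of SK_INTVAL, which is the relationship the message relies on.

```c
#include <assert.h>

/* Stand-ins for GCC's types and accessors (assumptions for this sketch).  */
typedef long long sk_hwi;               /* plays the role of HOST_WIDE_INT */
struct sk_rtx { sk_hwi value; };

#define SK_INTVAL(X)  ((X)->value)
#define SK_UINTVAL(X) ((unsigned long long) SK_INTVAL (X))

/* Old spelling: explicit cast at the use site.  */
int
sk_fits_in_byte_old (const struct sk_rtx *x)
{
  assert (x != 0);
  return (unsigned long long) SK_INTVAL (x) < 256;
}

/* New spelling: same value, with the cast hidden in the accessor.  */
int
sk_fits_in_byte_new (const struct sk_rtx *x)
{
  assert (x != 0);
  return SK_UINTVAL (x) < 256;
}
```

Both helpers reject negative values, because the conversion to an unsigned
type turns them into very large numbers.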
There are also many occurrences of this outside of aarch64, I see:
Georg-Johann Lay [Thu, 28 May 2026 09:44:21 +0000 (11:44 +0200)]
AVR: Support [[len=<words>]] notes in inline asm to specify its size.
This patch adds support for [[len=<words>]] in (the comments of) inline
asm constructs. It serves several purposes:
- Cases where the expanded asm is longer than determined from the number
of physical and logical line breaks. Such cases can lead to errors
when a jump that uses a too optimistic jump offset is crossing an asm.
- Better code generation for jumps that are crossing an asm. The default
length of an asm is (1 + NL) * 2 words, where NL denotes the sum of
physical and logical line breaks. However, almost all AVR instructions
occupy only one 16-bit word.
The feature is implemented in ADJUST_INSN_LENGTH. The length of
an asm is the sum over all [[len=<words>]] notes, except when an
unrecognized construct is found or an error occurred. In the latter
case, the default insn length is used. These <words> are supported:
<words> = [0-9]+
Specifies a non-negative decimal integer.
<words> = %[0-9]+
<words> = %[<name>] # Already resolved to %[0-9]+ by the middle-end.
Refers to the respective asm operand, which must be CONST_INT.
<words> = lds
<words> = sts
Specifies the length of a LDS or STS instruction, i.e.
1 word if AVR_TINY, and 2 words otherwise.
<words> = %~
<words> = %~call
<words> = %~jmp
Specifies the length of a %~call resp. %~jmp instruction, i.e.
2 words if AVR_HAVE_JMP_CALL, and 1 word otherwise.
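To make the summing rule concrete, here is a small sketch of a note
scanner. It is not avr_length_of_asm: the function name and the avr_tiny
parameter are hypothetical, and only the decimal form and the lds/sts forms
are handled. Any other form, a malformed note, or a template without notes
yields -1, standing for "use the default length".

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Sum all [[len=<words>]] notes found in TEMPL.  Returns -1 when no note
   is present or when a note is not understood by this sketch.  */
long
sk_asm_len_from_notes (const char *templ, bool avr_tiny)
{
  static const char tag[] = "[[len=";
  long total = 0;
  bool seen = false;

  assert (templ != NULL);
  for (const char *p = strstr (templ, tag); p; p = strstr (p, tag))
    {
      long words;

      p += strlen (tag);
      if (isdigit ((unsigned char) *p))
        {
          /* <words> = [0-9]+ : a non-negative decimal integer.  */
          char *end;
          words = strtol (p, &end, 10);
          p = end;
        }
      else if (strncmp (p, "lds", 3) == 0 || strncmp (p, "sts", 3) == 0)
        {
          /* 1 word on AVR_TINY, 2 words otherwise.  */
          words = avr_tiny ? 1 : 2;
          p += 3;
        }
      else
        return -1;              /* form not handled here */

      if (strncmp (p, "]]", 2) != 0)
        return -1;              /* malformed note */
      p += 2;
      total += words;
      seen = true;
    }
  return seen ? total : -1;
}
```

Because the result is a plain sum, a template assembled from several
pieces can carry one note per piece and no arithmetic syntax is needed
inside a note.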
In order to observe the assigned lengths, see -fdump-rtl-shorten or the
";; ADDR = ..." insn addresses in the asm output with -mlog=insn_addresses.
The benefits of using magic comments are:
- The feature is backwards compatible, and the target code can use
the same asm syntax since only asm comments have to be adjusted.
No #ifdef feature test macros are needed. The only case where the
feature is not fully backwards compatible is when asm templates
already contain invalid "[[len=" notes for some reason. In that
case, -mno-asm-len-notes restores the old behavior.
- Since the asm size is the sum over all notes, the final size can
be stitched together from multiple annotations / parts of an asm
template, and there is no need to support operations like plus.
gcc/
* config/avr/avr.cc (avr_read_number, avr_length_of_asm)
(avr_maybe_length_of_asm): New static functions.
(avr_adjust_insn_length): Call avr_maybe_length_of_asm on
unrecognized insns.
* config/avr/avr.opt (-masm-len-notes, -Wasm-len-notes): New
options.
* doc/invoke.texi (AVR Options): Add -masm-len-notes,
-Wasm-len-notes.
* doc/extend.texi (Size of an asm): Add @subsubheading
"Specifying the size of an asm on AVR".
libgcc/config/avr/libf7/
* libf7.h: Add "[[len=...]]" notes to all non-empty inline asm's.
* libf7.c: Ditto.
Jakub Jelinek [Thu, 28 May 2026 08:28:12 +0000 (10:28 +0200)]
i386: Fix up *add<mode>_1<nf_name> [PR125469]
The following testcase ICEs, because combine matches
(set (reg:DI 108) (plus:DI (reg:DI 104 [ s ]) (subreg:DI (reg:TI 103 [ _2 ]) 8)))
Now, because ix86_validate_address_register has:
/* Don't allow SUBREGs that span more than a word. It can
lead to spill failures when the register is one word out
of a two word structure. */
if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
return NULL_RTX;
this isn't recognized as *leadi, but is recognized as *adddi_1_nf pattern
instead. Now, later on the RA turns it into:
(set (reg:DI 2 cx [108]) (plus:DI (reg:DI 0 ax [orig:104 s ] [104]) (reg:DI 5 di [ _2+8 ])))
which would be valid *leadi, but given that INSN_CODE is already set to the
*adddi_1_nf and that also satisfies it, nothing re-recognizes it as *leadi.
But in that case without TARGET_APX_NDD the pattern has return "#";
That is a bug, because there is no splitter to split that
(set (reg:DI 2 cx [108]) (plus:DI (reg:DI 0 ax [orig:104 s ] [104]) (reg:DI 5 di [ _2+8 ])))
into itself so that it is re-recognized as *leadi, so it just ICEs.
I think having a splitter to split to the same thing would be just weird, so
this just outputs lea insn directly.
2026-05-28 Jakub Jelinek <jakub@redhat.com>
PR target/125469
* config/i386/i386.md (*add<mode>_1<nf_name>): Don't return "#" for
the lea non-TARGET_APX_NDD case, instead emit a lea directly.
Eric Botcazou [Tue, 10 Mar 2026 09:14:20 +0000 (10:14 +0100)]
ada: Fix iterator for Iterable aspect rejected without subtype indication
Iterator specifications of the In form without subtype indication are parsed
as a choice list, and later turned during semantic analysis into a bona-fide
N_Iterator_Specification node when there is a single choice with an iterator
type, but the case of the GNAT Iterable aspect is overlooked in the process.
gcc/ada/ChangeLog:
* sem_aggr.adb (Resolve_Array_Aggregate): Also rewrite a choice list
with a single choice as an iterator specification when the choice's
type has the GNAT Iterable aspect specified.
Eric Botcazou [Fri, 6 Mar 2026 13:30:23 +0000 (14:30 +0100)]
ada: Fix assertion failure on invalid String_Literal aspect
The root cause is that a subprogram declared in the body is incorrectly
considered as a primitive operation of a type declared in a package spec.
gcc/ada/ChangeLog:
* einfo.ads (In_Package_Body): Update description.
(In_Private_Part): Likewise.
* sem_ch3.adb (Analyze_Object_Declaration): Compute In_Package_Body
along with In_Private_Part for the object if its scope is a package.
* sem_ch6.adb (Analyze_Expression_Function): Do not compute
In_Private_Part here.
(Enter_Overloaded_Entity): Compute In_Package_Body & In_Private_Part
for the entity if its scope is a package.
* sem_util.adb (Collect_Primitive_Operations): Skip the subprograms
declared in the body for types declared in a package specification.
Eric Botcazou [Wed, 4 Mar 2026 19:43:02 +0000 (20:43 +0100)]
ada: Reject non-primitive operations in Finalizable aspect
The implementation does not support them and allowing them would not bring
any significant benefit.
gcc/ada/ChangeLog:
* doc/gnat_rm/gnat_language_extensions.rst
(Generalized Finalization): Document the new restriction.
* sem_ch13.adb (Resolve_Finalizable_Argument): Adjust wording of
error message.
(Resolve_Finalization_Procedure.Is_Finalizable_Primitive): Require
the procedure to be a primitive operation.
* gnat_rm.texi: Regenerate.
Piotr Trojanek [Wed, 4 Mar 2026 16:05:33 +0000 (17:05 +0100)]
ada: Remove .EXE suffix from GNAT.Command_Line error messages
The .EXE suffix in GNAT.Command_Line output causes diffs in testsuite results
that run on different platforms.
gcc/ada/ChangeLog:
* libgnat/g-comlin.adb
(Command_Name): New routine to strip platform-specific suffix.
(Display_Help, Get_Opt): Use new routine.
(Try_Help): Remove hardcoded ".exe" suffix; use new routine.
Eric Botcazou [Wed, 4 Mar 2026 13:36:13 +0000 (14:36 +0100)]
ada: Fix bogus visibility error for inherited operator of null extension
This occurs when the operator has a heterogeneous profile and the extension
is declared in the same scope as the type of a non-controlling parameter of
the operator, because Find_Dispatching_Type incorrectly returns this type.
gcc/ada/ChangeLog:
* exp_ch3.adb (Make_Controlling_Function_Wrappers): Manually set the
Has_Controlling_Result flag on the wrappers.
* sem_disp.ads (Override_Dispatching_Operation): Move to...
* sem_disp.adb (Override_Dispatching_Operation): ...here.
(Find_Dispatching_Type): Return the (controlling) result type for a
controlling function wrapper.
Eric Botcazou [Tue, 3 Mar 2026 10:35:36 +0000 (11:35 +0100)]
ada: Fix unresolved symbols with partial -gnatVo compilation
This happens when the units of a program using the standard containers are
not uniformly compiled with the -gnatVo switch. This is the fallout of an
internal confusion as to what validity checks must be applied to.
gcc/ada/ChangeLog:
* exp_ch4.adb (Expand_N_Op_Eq): Do not expand an array comparison
for validity checking purposes when the component type is covered
by the suppression of validity checks.
Javier Miranda [Mon, 2 Mar 2026 16:24:01 +0000 (16:24 +0000)]
ada: Incorrect error message on use of 'Result with wrong prefix
gcc/ada/ChangeLog:
* sem_util.ads (Is_Access_Subprogram_Wrapper): Renamed as
Is_Access_To_Subprogram_Wrapper.
* sem_util.adb (Is_Access_Subprogram_Wrapper): Ditto plus add
assertion.
* sem_disp.adb (Is_Access_To_Subprogram_Wrapper): Removed.
* sem_prag.adb (Find_Related_Declaration_Or_Body): Replace call to
Is_Access_Subprogram_Wrapper by call to Is_Access_To_Subprogram_Wrapper.
* exp_ch6.adb (Expand_Call): Ditto.
* sem_attr.adb (Analyze_Attribute [Attribute_Result]): For access to
subprogram wrappers, report that the expected prefix is the name of
the access type.
Bob Duff [Sun, 1 Mar 2026 18:29:50 +0000 (13:29 -0500)]
ada: Rewrite Analyze_Aspect_Specifications
Misc cleanup of Sem_Ch13.Analyze_Aspect_Specifications.
Split out procedures, remove gratuitous gotos, make various
things somewhat more uniform, etc.
Change type of E parameter of Analyze_Aspect_Specifications
from Entity_Id to N_Entity_Id; the latter has a predicate to
make sure we only pass entities. Modify one place in
Sem_Ch12.Analyze_Formal_Subprogram_Declaration that violates
the predicate, by skipping Analyze_Aspect_Specifications in
case of error.
Consolidate computation of Delay_Required into a single function.
Unfortunately, it is still necessary to modify Delay_Required
later, so it can't be constant.
Aspect_Invariant was set to Always_Delay, and then we did
"Delay_Required := False;" unconditionally. Better to set it
to Never_Delay in the first place. Similar for some other aspects.
Aspect_Implicit_Dereference was set to Always_Delay, but we create an
Aitem and insert it without delay and then do a "goto" to skip the
delay-related code. Better to set it to Never_Delay. Similar for some
other aspects, including ones previously set to Rep_Aspect. This is
probably wrong, but it was already wrong -- it doesn't introduce new
bugs.
Move Set_Aspect_On_Partial_View so it gets called for all
aspects when appropriate; "goto Continue;" was skipping this
call in some cases.
Make Boolean_Aspects include Library_Unit_Aspects, because all
Library_Unit_Aspects really are Boolean_Aspects. This allows
to change "Boolean_Aspects | Library_Unit_Aspects" to just
"Boolean_Aspects" in several places. There were just 3 uses
of Boolean_Aspects without Library_Unit_Aspects; the one in
Sem_Util seems harmless, and the two in Delay_Aspect have
a new assertion that makes sure we're not changing anything.
gcc/ada/ChangeLog:
* sem_ch13.adb (Analyze_Aspect_Specifications):
Major rewrite.
* sem_ch13.ads: Minor comment improvements.
* aspects.ads: Change some aspects to be Never_Delay.
Make Boolean_Aspects include Library_Unit_Aspects.
* exp_ch9.adb (Build_Corresponding_Record):
When copying aspects, set Aspect_Rep_Item to Empty,
so Asp_Copy looks like an unanalyzed tree.
* sem_ch12.adb (Analyze_Formal_Subprogram_Declaration):
Skip Analyze_Aspect_Specifications in case of error.
* sem_ch6.adb (Analyze_Expression_Function): Likewise.
* sinfo.ads: Minor comment improvement.
Steve Baird [Thu, 26 Feb 2026 23:59:07 +0000 (15:59 -0800)]
ada: Compiler hangs on a semantically incorrect program.
A homonyms list should be acyclic. Do not introduce a cycle in an error case.
gcc/ada/ChangeLog:
* sem_ch6.adb (Install_Entity): If the entity to be installed is
already installed, assert that an error has already been flagged
and then return without introducing a cycle in the entity's
Homonyms list.
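The hazard can be illustrated with a plain singly linked chain. This is a
sketch in C rather than the Ada code in sem_ch6.adb, and the type and
function names are hypothetical: pushing an entity that is already on its
chain would make the chain point back into itself, so the sketch checks
first and leaves the chain untouched in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sk_entity
{
  struct sk_entity *homonym;    /* next entity with the same name */
};

/* Push E onto the homonym chain headed by *CURRENT.  If E is already on
   the chain, change nothing and return false, so no cycle is created.  */
bool
sk_install_entity (struct sk_entity **current, struct sk_entity *e)
{
  assert (current != NULL && e != NULL);
  for (struct sk_entity *p = *current; p != NULL; p = p->homonym)
    if (p == e)
      return false;
  e->homonym = *current;
  *current = e;
  return true;
}
```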
Viljar Indus [Wed, 25 Feb 2026 12:29:46 +0000 (14:29 +0200)]
ada: Create the SARIF file in the current cwd
Previously the SARIF file was created next to the specified
source file, e.g. "<Specified_Source_File_Path>.gnat.sarif".
Now the SARIF file is always generated in the cwd as
"<Source_File_Name>.gnat.sarif", similarly to how gcc handles its SARIF
files.
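The naming rule can be sketched as below. This is an illustration in C,
not the Ada routine in osint.adb: the function names are hypothetical, only
'/' is treated as a directory separator, and the suffix is assumed to be
appended to the complete file name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the part of PATH after the last '/', or PATH itself when it
   contains no directory component.  */
const char *
sk_strip_directory (const char *path)
{
  const char *slash;

  assert (path != NULL);
  slash = strrchr (path, '/');
  return slash ? slash + 1 : path;
}

/* Write "<file name>.gnat.sarif" into BUF.  Returns 0 on success and -1
   when BUF is too small.  */
int
sk_sarif_file_name (const char *source_path, char *buf, size_t size)
{
  int n = snprintf (buf, size, "%s.gnat.sarif",
                    sk_strip_directory (source_path));
  return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

Since the result has no directory part, opening it with a relative path
places the file in the current working directory.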
gcc/ada/ChangeLog:
* errout.adb (Output_Messages): Use the source file name without
the directory path when constructing the name of the SARIF file.
* osint.adb (Strip_Directory): New method for extracting the file name
from a given path.
* osint.ads (Strip_Directory): Likewise.
Marc Poulhiès [Tue, 24 Feb 2026 09:03:52 +0000 (10:03 +0100)]
ada: Fix VAST check on aspect consistency
Currently, N_Attribute_Definition_Clause nodes don't have a
Corresponding_Aspect field. As hinted by a comment, it's something we
would like to do in the future, but adding the check was premature.
gcc/ada/ChangeLog:
* vast.adb (Do_Node_Pass_2): Adjust check for aspect consistency.
Eric Botcazou [Mon, 23 Feb 2026 16:29:44 +0000 (17:29 +0100)]
ada: Adjust 'Constrained for formal parameters of unchecked union types
GNAT has historically never added extra formal parameters alongside formal
parameters of unchecked union types, even when they have convention Ada,
so it cannot compute the 'Constrained attribute for In Out or Out formal
parameters. This changes the compiler to raise Program_Error in this case.
gcc/ada/ChangeLog:
* exp_attr.adb (Expand_N_Attribute_Reference) <Constrained>: If the
prefix is a non-In formal parameter of an unchecked union type, give
a warning and insert a raise statement for Program_Error.
Eric Botcazou [Mon, 23 Feb 2026 08:43:17 +0000 (09:43 +0100)]
ada: Fix spurious discriminant check failure for unconstrained actual parameter
This happens when the unconstrained variable passed as actual parameter is
initialized by a conditional expression, because its declaration is wrongly
distributed into the dependent expressions of the conditional expression.
gcc/ada/ChangeLog:
* exp_util.ads (Is_Distributable_Declaration): New predicate.
* exp_util.adb (Is_Distributable_Declaration): New predicate coming
from Expand_N_Case_Expression and Expand_N_If_Expression. Return
False for variables of an unconstrained definite nonlimited subtype.
* exp_ch4.adb (Expand_N_Case_Expression): Replace calls to local
Is_Optimizable_Declaration by calls to Is_Distributable_Declaration.
(Expand_N_If_Expression): Likewise.
* exp_ch6.adb (Expand_Ctrl_Function_Call): Likewise.
Bob Duff [Fri, 20 Feb 2026 15:07:05 +0000 (10:07 -0500)]
ada: Cleanup Analyze_Aspect_Specifications
Comment cleanup: Change incorrect uses of "erroneous"
(which is Ada jargon) to be "illegal".
Remove long list of aspects for Insert_Pragma;
it seems useless, and might be incorrect, and is certainly
incorrect after this change.
Change Insert_Pragma to be more general, and use it more
instead of ad-hoc code. It now supports N_Attribute_Definition_Clauses,
so should be renamed (in a future change).
The previous code sometimes used Ins_Node to preserve order;
the order of pragmas is the same as the order of aspects.
But sometimes, Ins_Node was not used. (With Ins_Node,
"with Foo => ..., Bar => ..." produces pragma Foo then pragma Bar,
for example. Without Ins_Node, it produces pragma Bar then pragma Foo.)
We are trying to use Insert_Pragma for more cases (DRY).
The new code uses Ins_Node to preserve order in case of Annotate,
and not otherwise. The Compilation_Unit case also does not
preserve order. This code is marked "???" to be cleaned up later.
One goal of this change (not yet done) is to avoid having
so many "goto Continue;"s, which are confusing, especially
since <<Continue>> is misnamed (it's not at the end of a
loop body). We will probably also split out Analyze_One_Aspect
as a separate procedure. When we get to the code after the
giant case statement, if Aitem is present, we can insert it.
(Current code inserts it as we go along.)
Move code dealing with Boolean_Aspects and Library_Unit_Aspects of
library units to where other Boolean_Aspects and Library_Unit_Aspects
are handled. This seems simpler.
Marc Poulhiès [Thu, 19 Feb 2026 10:18:23 +0000 (11:18 +0100)]
ada: Fix freezing of nested discriminated type
Simply creating the freeze node for the base type of a discriminated type
without adjusting the scope and the visible declarations leads to an
incorrect tree that crashes the compiler when unnesting the predicate
function.
gcc/ada/ChangeLog:
* sem_ch3.adb (Find_Type_Of_Object): Adjust freezing of the base
type of a discriminated type.
Co-authored-by: Eric Botcazou <botcazou@adacore.com>
Eric Botcazou [Thu, 19 Feb 2026 18:36:18 +0000 (19:36 +0100)]
ada: Fix oversight in latest accessibility change
The oversight is that the dynamic accessibility checks should be generated
neither when accessibility checks are disabled, for example by means of the
-gnatp switch, nor when the GNAT restriction No_Dynamic_Accessibility_Checks
is enabled.
gcc/ada/ChangeLog:
* accessibility.adb
(Apply_Accessibility_Check_For_Class_Wide_Return): Do not test if
accessibility checks are suppressed here but...
(Apply_Accessibility_Check_For_Return): ...here instead.
This improves the documentation comments of Is_Immutably_Limited_Type and
Is_Inherently_Limited_Type and rewrites the body of
Is_Inherently_Limited_Type to leverage Is_Immutably_Limited_Type.
gcc/ada/ChangeLog:
* sem_aux.ads (Is_Immutably_Limited_Type, Is_Inherently_Limited_Type):
Improve documentation comments.
* sem_aux.adb (Is_Inherently_Limited_Type): Replace inline code with
call to Is_Immutably_Limited_Type.
Javier Miranda [Wed, 28 Jan 2026 11:19:45 +0000 (11:19 +0000)]
ada: Crash when using address clause on declare-expression constant
gcc/ada/ChangeLog:
* gen_il-fields.ads (Scope_Link): New field.
* gen_il-gen-gen_nodes.adb (N_Expression_With_Actions): Added Scope_Link.
* sinfo.ads (N_Expression_With_Actions): Add field Scope_Link.
* sem_ch4.adb (Analyze_Expression_With_Actions): Set field Scope_Link.
* sem_ch5.ads (Has_Sec_Stack_Call): Declaration moved to the package spec.
* sem_ch5.adb (Has_Sec_Stack_Call): ditto.
* sem_res.adb (Resolve_Declare_Expression): Push/Pop internally created
scope to provide proper visibility of the declare_items.
Denis Mazzucato [Wed, 18 Feb 2026 13:35:55 +0000 (14:35 +0100)]
ada: Fix crash evaluating class-wide preconditions with missing completion
This patch fixes a crash occurring when evaluating a class-wide precondition of a
non-primitive subprogram where accessing the class-wide type of its dispatching
type is not possible. The bug occurs when the type is abstract and is missing its
completion; a proper error should be given instead.
gcc/ada/ChangeLog:
* sem_prag.adb (Check_References): Don't call Class_Wide_Type if the
subprogram is a non-primitive procedure as the dispatching type may be
empty.
Piotr Trojanek [Mon, 16 Feb 2026 14:23:41 +0000 (15:23 +0100)]
ada: Remove spurious error on attribute Count with expansion disabled
When expansion is disabled, e.g. because of GNAT switch -gnatc or because GNAT
is operating in the GNATprove mode, then attribute Count is not expanded and
can legitimately appear in a barrier of a protected entry, even if restriction
Pure_Barriers is enabled.
Eric Botcazou [Tue, 17 Feb 2026 16:27:43 +0000 (17:27 +0100)]
ada: Fix small irregularity for Master_Id of anonymous access result type
The Master_Id of access types whose designated type contains tasks is set to
a renaming of the current _Master variable by means of Build_Master_Renaming.
The exception is for anonymous access result types, whose Master_Id is set
to the current _Master variable by Check_Anonymous_Access_Return_With_Tasks.
This is fully correct because the entity of the variable is preresolved, but
is ambiguous in the .dg file because there can be several _Master variables
in the subprogram, which effectively represent distinct masters. Therefore
this makes the case of anonymous access result types also use a renaming.
gcc/ada/ChangeLog:
* exp_ch9.ads (Build_Master_Declaration): Minor tweaks in comment.
(Build_Master_Entity): Likewise.
(Build_Master_Renaming): Likewise.
(Build_Master_Renaming_Declaration): New function declaration.
* exp_ch9.adb (Build_Master_Declaration): Move around.
(Build_Master_Renaming_Declaration): New function.
(Build_Master_Renaming): Call Build_Master_Renaming_Declaration
to build the renaming declaration.
* sem_ch6.adb (Check_Anonymous_Access_Return_With_Tasks): Remove
useless guard on Declarations (N). Create a renaming declaration
for the current _Master variable and set it as the Master_Id of
the access result type.
Piotr Trojanek [Fri, 13 Feb 2026 09:07:28 +0000 (10:07 +0100)]
ada: Fix for illegal deep delta array aggregate with others
Do not try to apply a scalar range check to "others" choice in deep delta array
aggregate. This choice is illegal, but we still need to handle it in expansion.
gcc/ada/ChangeLog:
* exp_spark.adb (Expand_SPARK_N_Delta_Aggregate): Special case for
"others" clause.
On x86_64-w64-mingw32, PR target/54412 is triggered when AVX and
AVX512 values are passed or returned indirectly and GCC materializes
under-aligned stack storage for them.
Add run tests for the original by-value AVX cases, for isolated hidden
sret allocation and callee by-reference parameter materialization, and
for a dedicated aligned(64) AVX512 case.
gcc/testsuite/ChangeLog:
PR target/54412
* gcc.target/i386/pr54412-v4d-o0-aligned-locals.c: New test.
* gcc.target/i386/pr54412-o2-by-value-cases.c: New test.
* gcc.target/i386/pr54412-sret-no-args.c: New test.
* gcc.target/i386/pr54412-callee-byref-param.c: New test.
* gcc.target/i386/pr54412-avx512-aligned64.c: New test.
Signed-off-by: oltolm <oleg.tolmatcev@gmail.com>
Signed-off-by: Jonathan Yong <10walls@gmail.com>
Jeff Law [Thu, 28 May 2026 02:00:34 +0000 (20:00 -0600)]
[RISC-V] Drop compromised scan-asm test after recent vectorized loop epilogue changes
Tamar's recent change to elide vectorized loop epilogues compromised the
scan-asm part of this test for RISC-V. Essentially the test is looking for a
specific insn that appears in the unnecessary epilogue.
The original motivation for these tests was an ICE. So I'm just dropping the
scan-asm parts of this test so that we still verify that we're not triggering
an ICE.
gcc/testsuite
* gcc.target/riscv/rvv/base/pr115456-3.c: Drop compromised scan-asm
part of the test.
Jerry DeLisle [Sun, 24 May 2026 03:28:43 +0000 (20:28 -0700)]
Fortran: [PR93727] Add EX format rounding for truncated hex mantissa
Implement proper rounding of the hex mantissa in write_ex when the
user specifies a d smaller than full precision. All Fortran ROUND=
modes are supported: ROUND_NEAREST (ties-to-even), ROUND_COMPATIBLE
(ties away from zero), ROUND_UP, ROUND_DOWN, and ROUND_ZERO.
ROUND_PROCDEFINED and ROUND_UNSPECIFIED default to ROUND_NEAREST on
IEEE 754 systems, consistent with the decimal format behaviour.
Carry propagation handles the case where incrementing a string of
trailing F hex digits reaches the integer digit; if that overflows
(F → 16) the output is normalized by setting the integer digit to 8
and incrementing the binary exponent by one.
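The carry propagation and normalization described above can be sketched as follows. This is only an illustration of the ROUND_NEAREST (ties-to-even) case, not the code in write_ex: the HexValue struct, the helper names and the uppercase-digit representation are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical representation: lead.frac * 2^exp2, digits in uppercase hex.
struct HexValue {
  int lead;          // integer hex digit, 0..15
  std::string frac;  // fractional hex digits
  int exp2;          // binary exponent
};

static int hex_val(char c) { return c <= '9' ? c - '0' : c - 'A' + 10; }
static char hex_chr(int v) { return "0123456789ABCDEF"[v]; }

// Round the fraction to d hex digits using ties-to-even.
HexValue round_hex_nearest_even(HexValue v, std::size_t d) {
  assert(v.lead >= 0 && v.lead <= 15);
  if (d >= v.frac.size())
    return v;  // nothing to truncate

  const int first_dropped = hex_val(v.frac[d]);
  bool rest_nonzero = false;
  for (std::size_t i = d + 1; i < v.frac.size(); ++i)
    if (v.frac[i] != '0')
      rest_nonzero = true;

  // The last kept digit decides exact ties (first dropped digit 8, rest 0).
  const int last_kept = d > 0 ? hex_val(v.frac[d - 1]) : v.lead;
  const bool round_up =
      first_dropped > 8 ||
      (first_dropped == 8 && (rest_nonzero || (last_kept & 1)));

  v.frac.resize(d);
  if (!round_up)
    return v;

  // Carry propagation: trailing F digits wrap to 0 and pass the carry left.
  for (std::size_t i = d; i > 0; --i) {
    const int digit = hex_val(v.frac[i - 1]);
    if (digit < 15) {
      v.frac[i - 1] = hex_chr(digit + 1);
      return v;
    }
    v.frac[i - 1] = '0';
  }

  // The carry reached the integer digit.
  if (v.lead < 15) {
    ++v.lead;
    return v;
  }
  // F + 1 = 16: since 16 * 2^e == 8 * 2^(e+1), renormalize.
  v.lead = 8;
  ++v.exp2;
  return v;
}
```

For instance, rounding F.FFF * 2^3 to two fractional digits carries all the way through and yields 8.00 * 2^4 in this sketch.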
Assisted by: Claude Sonnet 4.6
PR fortran/93727
libgfortran/ChangeLog:
* io/write.c (write_ex): Replace simple truncation with
rounding-aware logic respecting dtp round_status. Add carry
propagation and integer-digit normalization.
* io/write_float.def: Change use of GFC_UINTEGER_8 to
long long unsigned.
gcc/testsuite/ChangeLog:
* gfortran.dg/EXformat_3.F90: New test covering rounding for
KIND=4, 8, 10, and 16: clear round-up, ties-to-even (truncate
and round-up cases), carry propagation, and normalization.
* gfortran.dg/EXrounding.F90: New test checking the various
rounding modes for all kinds.
Philipp Tomsich [Fri, 20 Mar 2026 16:14:15 +0000 (17:14 +0100)]
ext-dce: narrow sign-extending loads to zero-extending when upper bits are dead
The ext-dce pass tracks bit-level liveness and can replace sign extensions
with zero extensions when the upper bits are dead. However,
ext_dce_try_optimize_extension bails out when the inner operand is MEM
rather than REG, missing the opportunity to narrow sign-extending loads
(e.g. lh -> lhu on RISC-V, ldrsh -> ldrh on AArch64).
Add handling for SIGN_EXTEND of MEM: when the liveness analysis has
already determined the sign bits are dead, replace the sign-extending
load with a zero-extending load via validate_change, which ensures the
target has a matching instruction pattern.
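To see why the narrowing is safe, consider a small model of the two load flavours. The function names below are hypothetical and only model register contents; they are not part of ext-dce.cc. The two loads differ only in bits 16-63, so a use that reads just bits 0-15 cannot tell them apart.

```cpp
#include <cstdint>

// Models a sign-extending 16-bit load (lh-like) into a 64-bit register.
constexpr std::uint64_t load_sext16(std::uint16_t raw) {
  return (raw & 0x8000u) ? (0xFFFFFFFFFFFF0000ull | raw) : raw;
}

// Models a zero-extending 16-bit load (lhu-like) into a 64-bit register.
constexpr std::uint64_t load_zext16(std::uint16_t raw) {
  return raw;
}

// A use that only reads bits 0-15 of the register, so bits 16-63 are dead.
constexpr std::uint16_t use_low16(std::uint64_t reg) {
  return static_cast<std::uint16_t>(reg + 1);
}

// The full registers differ for a negative halfword...
static_assert(load_sext16(0xFFFE) != load_zext16(0xFFFE),
              "upper bits differ");
// ...but the only live bits are identical, so either load may be used.
static_assert(use_low16(load_sext16(0xFFFE)) == use_low16(load_zext16(0xFFFE)),
              "low 16 bits agree");
```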
gcc/ChangeLog:
* ext-dce.cc (ext_dce_try_optimize_extension): Handle
SIGN_EXTEND of MEM by replacing with ZERO_EXTEND of MEM
when upper bits are dead.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/ext-dce-1.c: New test.
* gcc.target/riscv/ext-dce-3.c: New test.
* gcc.target/riscv/ext-dce-4.c: New test.
Philipp Tomsich [Fri, 20 Mar 2026 16:14:00 +0000 (17:14 +0100)]
ext-dce: fix off-by-one in subreg liveness for 32-bit modes
ext_dce_process_uses uses `size >= 32` to decide whether group 3
(bits 32-63) is live for a lowpart subreg source. For SImode subregs
(size == 32), this incorrectly marks bits 32-63 as live, preventing
the pass from recognizing that the upper half of a DImode register is
dead. This blocks lw -> lwu narrowing on RV64.
Change the condition to `size > 32`, consistent with the other
thresholds in the same block (size > 8, size > 16). The size > 32
case is still reachable via SUBREG_PROMOTED_VAR_P which widens size
beyond the outer mode.
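The effect of the threshold can be illustrated with a simplified model. It assumes the groups cover bits 0-7, 8-15, 16-31 and 32-63, and that group 0 is always live for a lowpart subreg; the function and its `fixed` parameter are hypothetical and do not mirror the real ext_dce_process_uses.

```cpp
// Returns a bitmask of live groups for a lowpart subreg source of the
// given size in bits. Bit n of the result means "group n is live".
constexpr unsigned live_groups(unsigned size, bool fixed) {
  unsigned mask = 1u << 0;   // group 0: bits 0-7
  if (size > 8)
    mask |= 1u << 1;         // group 1: bits 8-15
  if (size > 16)
    mask |= 1u << 2;         // group 2: bits 16-31
  // Old condition: size >= 32.  New condition: size > 32.
  if (fixed ? size > 32 : size >= 32)
    mask |= 1u << 3;         // group 3: bits 32-63
  return mask;
}

// An SImode (32-bit) subreg wrongly kept bits 32-63 live before the fix.
static_assert(live_groups(32, false) == 0xFu, "old: group 3 live");
static_assert(live_groups(32, true) == 0x7u, "new: group 3 dead");
```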
gcc/ChangeLog:
* ext-dce.cc (ext_dce_process_uses): Fix off-by-one: use
size > 32 instead of size >= 32 for group 3 liveness.
Kishan Parmar [Wed, 27 May 2026 16:24:57 +0000 (21:54 +0530)]
testsuite: Restrict mpc860_no_lwsync.c to Power ilp32 targets [PR125448]
The recently added mpc860_no_lwsync.c test case fails on 64-bit PowerPC
targets (such as powerpc64le-linux-gnu) because the default -m64 option
conflicts with the 32-bit legacy processor specified by -mcpu=860,
resulting in the error: "cc1: error: '-m64' requires a PowerPC64 cpu".
Fix this by restricting the test case execution to 32-bit PowerPC
targets using the 'ilp32' target requirement.
Tobias Burnus [Wed, 27 May 2026 14:16:05 +0000 (16:16 +0200)]
libgomp: Fix ipr_vendor for OpenMP's interop
omp_ipr_vendor and omp_ipr_vendor_name denote the vendor of the
implementation (GCC / GNU compiler) and not the vendor of the
foreign runtime (like Nvidia for CUDA or AMD for ROCm/HSA/HIP).
Thus, use 5 / "gnu" per "Additional Definitions" document
("1.2 Supported vendor-name Values"), cf.
https://www.openmp.org/specifications/
[See also OpenMP Spec Issue #4766.]
libgomp/ChangeLog:
* libgomp.texi (Foreign-runtime support for AMD GPUs,
Foreign-runtime support for Nvidia GPUs): Fix vendor
value to match compiler not GPU vendor.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_interop_int,
GOMP_OFFLOAD_get_interop_str): Return 5/"gnu" as ipr_vendor.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_interop_int,
GOMP_OFFLOAD_get_interop_str): Likewise.
* testsuite/libgomp.c/append-args-fr.h: Update expected
value.
* testsuite/libgomp.c/interop-cuda-full.c: Likewise.
* testsuite/libgomp.c/interop-fr-1.c: Likewise.
* testsuite/libgomp.c/interop-hip.h: Likewise.
* testsuite/libgomp.fortran/interop-hip.h: Likewise.
Tamar Christina [Wed, 27 May 2026 13:31:49 +0000 (14:31 +0100)]
vect: drop prefetches during if-cvt [PR120164]
PR114061 added support for dropping prefetches during vectorization but that
version doesn't work when the prefetch is conditional.
The conditionality introduces a non-if-convertible block in the CFG. The
vectorizer removes the prefetch later on but it can't modify the CFG and as
such the block where the prefetch was in remains with just a VUSEs chain.
This change now drops them during if-conversion. Although the patch currently
just removes them, handling them here makes later work easier: should we want
to start vectorizing these instead, only predicate_statements needs updating.
For now this follows the same approach as PR114061 and just drops them.
gcc/ChangeLog:
PR tree-optimization/120164
* tree-if-conv.cc (if_convertible_stmt_p): Detect prefetches.
(predicate_statements): Drop them during predication.
gcc/testsuite/ChangeLog:
PR tree-optimization/120164
* gcc.dg/vect/vect-prefetch-drop_2.c: New test.
Fix x86 caller/callee handling for over-aligned indirect arguments/returns
On x86_64-w64-mingw32, TARGET_SEH limits MAX_SUPPORTED_STACK_ALIGNMENT
to 128 bits, but 256-bit AVX values are still passed and returned indirectly.
Some caller/callee stack-slot paths still used generic allocators that cap
requested alignment to MAX_SUPPORTED_STACK_ALIGNMENT, producing slots that are
under-aligned for later vmovapd/vmovaps accesses.
Fix caller-side paths by using dynamically allocated stack space for:
Fix callee-side paths by overallocating the local stack slot, then aligning the
effective address within that slot when required alignment exceeds
MAX_SUPPORTED_STACK_ALIGNMENT.
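The over-allocate-then-align idea can be sketched on plain integers. This is not the GCC implementation, which operates on RTL in function.cc; the helper names are hypothetical, and alignments are assumed to be powers of two expressed in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round addr up to the next multiple of align (a power of two).
inline std::uintptr_t align_up(std::uintptr_t addr, std::size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (addr + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
}

// Bytes to reserve so that an object of `size` bytes aligned to `align`
// always fits in a slot that is only guaranteed `guaranteed` alignment.
inline std::size_t overallocated_size(std::size_t size, std::size_t align,
                                      std::size_t guaranteed) {
  assert(guaranteed != 0 && (guaranteed & (guaranteed - 1)) == 0);
  // Worst case the slot starts `align - guaranteed` bytes short of the
  // next aligned address.
  return align > guaranteed ? size + (align - guaranteed) : size;
}
```

With a 16-byte guaranteed alignment, a 32-byte AVX value needing 32-byte alignment would reserve 48 bytes in this model, and the effective address is align_up of the slot start.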
This preserves ABI behavior while ensuring alignment-sensitive AVX accesses are
correctly aligned in both caller and callee paths.
Use a target hook to control when this over-aligned stack-slot handling is
required, instead of hardcoding target conditionals in generic code.
gcc/ChangeLog:
PR target/54412
* target.def (overaligned_stack_slot_required): New calls hook.
* calls.cc (allocate_call_dynamic_stack_space): New helper.
(initialize_argument_information): Use
targetm.calls.overaligned_stack_slot_required for over-aligned
by-reference argument copies.
(expand_call): Use
targetm.calls.overaligned_stack_slot_required for over-aligned
hidden return slots.
* function.cc (assign_stack_local_aligned): New helper.
(assign_parm_setup_block): Use
targetm.calls.overaligned_stack_slot_required for over-aligned
stack parm slots.
(assign_parm_setup_reg): Likewise.
* config/i386/i386.cc (ix86_overaligned_stack_slot_required): New.
(TARGET_OVERALIGNED_STACK_SLOT_REQUIRED): Define for i386.
* doc/tm.texi.in: Add hook placement.
* doc/tm.texi: Regenerate.
Signed-off-by: oltolm <oleg.tolmatcev@gmail.com>
Signed-off-by: Jonathan Yong <10walls@gmail.com>
oltolm [Tue, 19 May 2026 17:34:42 +0000 (19:34 +0200)]
i386: return 256/512-bit vectors in registers for x86_64 MS ABI [PR89597]
On x86_64 Windows targets using MS ABI, GCC classified 256-bit and
512-bit vector returns as memory returns. That caused hidden sret
pointer returns where YMM0/ZMM0 returns are expected.
Teach MS ABI return classification to keep 32-byte and 64-byte vector
returns in registers when AVX/AVX512F is enabled, matching the return
register selection path.
Also extend function_value_ms_64 so 32-byte and 64-byte eligible vector
returns are mapped to the SSE register class (YMM0/ZMM0 lanes).
Add tests for x86_64-*-mingw* that verify 256-bit and 512-bit vector
returns use YMM0/ZMM0 codegen.
gcc:
PR target/89597
* config/i386/i386.cc (function_value_ms_64): Handle 32-byte and
64-byte vector returns in registers when supported.
(ix86_return_in_memory): Do not force 32-byte/64-byte eligible
vector returns to memory for MS ABI.
gcc/testsuite:
* gcc.target/i386/pr89597-1.c: New test.
* gcc.target/i386/pr89597-2.c: New test.
Signed-off-by: Oleg Tolmatcev <oleg.tolmatcev@gmail.com>
Signed-off-by: Jonathan Yong <10walls@gmail.com>
Which when we find an element, in order to return 1 we still go to scalar.
Obviously the scalar code is completely unneeded.
This patch teaches the vectorizer that when
1. We have no live values
2. We only have one exit (this is a restriction that will be lifted in a later
patch and is there because we need masking to avoid false positives, but see
testcase vect-early-break-no-epilog_11.c)
3. The loop has no side-effects
PR tree-optimization/120352
* gcc.dg/vect/vect-early-break-no-epilog_1.c: New test.
* gcc.dg/vect/vect-early-break-no-epilog_10.c: New test.
* gcc.dg/vect/vect-early-break-no-epilog_11.c: New test.
* gcc.dg/vect/vect-early-break-no-epilog_2.c: New test.
* gcc.dg/vect/vect-early-break-no-epilog_3.c: New test.
* gcc.dg/vect/vect-early-break-no-epilog_4.c: New test.
* gcc.dg/vect/vect-early-break-no-epilog_5.c: New test.
* gcc.dg/vect/vect-early-break-no-epilog_6.c: New test.
* gcc.dg/vect/vect-early-break-no-epilog_7.c: New test.
* gcc.dg/vect/vect-early-break-no-epilog_8.c: New test.
* gcc.dg/vect/vect-early-break-no-epilog_9.c: New test.
* gcc.target/aarch64/noeffect.c: New test.
* gcc.target/aarch64/noeffect10.c: New test.
* gcc.target/aarch64/noeffect11.c: New test.
* gcc.target/aarch64/noeffect2.c: New test.
* gcc.target/aarch64/noeffect3.c: New test.
* gcc.target/aarch64/noeffect4.c: New test.
* gcc.target/aarch64/noeffect5.c: New test.
* gcc.target/aarch64/noeffect6.c: New test.
* gcc.target/aarch64/noeffect7.c: New test.
* gcc.target/aarch64/noeffect8.c: New test.
* gcc.target/aarch64/noeffect9.c: New test.
* gcc.target/aarch64/sve/noeffect.c: New test.
* gcc.target/aarch64/sve/noeffect10.c: New test.
* gcc.target/aarch64/sve/noeffect11.c: New test.
* gcc.target/aarch64/sve/noeffect2.c: New test.
* gcc.target/aarch64/sve/noeffect3.c: New test.
* gcc.target/aarch64/sve/noeffect4.c: New test.
* gcc.target/aarch64/sve/noeffect5.c: New test.
* gcc.target/aarch64/sve/noeffect6.c: New test.
* gcc.target/aarch64/sve/noeffect7.c: New test.
* gcc.target/aarch64/sve/noeffect8.c: New test.
* gcc.target/aarch64/sve/noeffect9.c: New test.
Tamar Christina [Wed, 27 May 2026 09:52:27 +0000 (10:52 +0100)]
vect: refactor loop peeling to support explicit flag to redirect early exits [PR120352]
This patch is the first in a series to optimize early break vectorization.
The first one addresses that certain loops don't require an epilog at all.
An example is
int a[N] = {0,0,0,1};
int b[N] = {0,0,0,1};
__attribute__((noipa, noinline))
int foo ()
{
for (int i = 0; i < N; i++)
{
if (a[i] > b[i])
return 1;
}
return 0;
}
where we have no value or side-effect to compute. Naturally there's no need to
redo any work to just return 1 or 0.
Teaching the vectorizer this however re-enabled epilogue nomask for early break
and so we still need to be able to peel for the epilogues. This peeling however
should not redirect all the alternative exits to the epilog. To understand when
this has to happen peeling now gets an extra parameter to indicate how to handle
the multiple exits.
This had an unfortunate interaction with uncounted loops, because uncounted
loops re-used the layout (with the intermediate merge block) but with it just
being a fall-through block. When it did this it didn't put all PHI nodes in the
final merge block and as such relied on fixups later.
This made the actual changes needed for not needing epilogs more fragile than
necessary, so I first refactored peeling to be more consistent between early
break and uncounted loops and ensure that all BBs now explicitly mention and use
all PHI nodes from the exits.
The code should hopefully be a bit more robust now wrt needed optimizations.
gcc/ChangeLog:
PR tree-optimization/120352
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): Add
redirect_exits.
(vect_do_peeling): Use it.
* tree-vectorizer.h (slpeel_tree_duplicate_loop_to_edge_cfg): Update
prototype.
which was dumb, but valid and the above optimization now gets the load
eliminated and the constants folded. However, in particular for scalars,
AArch64 has an optimization that has been around for ages in which scalar FPR
constants are created using vector broadcasting operations. It assumes scalars
are accessed as scalars (as in, in the mode that created them).
So the above gets optimized to
movi v30.8b, 0x1
which is invalid. The original load requires the inactive elements to be zero,
whereas by using the paradoxical subreg it's relying on the implicit (as in,
not modelled in RTL) assumption that the load zeros the top bits, but doesn't
keep in mind that the load can be optimized away.
This patch fixes it by creating a full SVE vector of 0s and writing only the
values we want to set using an INSR. (i.e. using VL2 of bytes writes a short).
It then provides patterns to optimize this:
1. if it's still following a load, just emit the load.
2. if it's not, then optimize it to a zeroing operation, so e.g. HI mode
issues an fmov h0, h0 and so clears the top bits to zero.
I chose this representation because even without the above operations it is
semantically valid and will generate correct code.
The alternative would be to delay this optimization to e.g. combine however we
have two problems there:
1. It's quite late, so the above constant cases for instance don't get optimized
and we keep the pointless store and loads.
2. Our RTX costs don't model predicates, and so it may not accept the
combination since the replacement is more expensive.
So I chose to keep the optimization early, but just replace the paradoxical
subreg with a zero-extend.
Richard Earnshaw [Tue, 19 May 2026 17:38:55 +0000 (18:38 +0100)]
MAINTAINERS: Add additional checking to check-MAINTAINERS.py
In order to maintain the new sorting, add a number of additional checks
to check-MAINTAINERS.py. In summary, these are:
- arrange to sort names by surname and then forename(s).
- rework the code to use regex matches for the fields to accommodate
some fields that spill over into the next column.
- arrange to sort by more than one field.
contrib/ChangeLog:
* check-MAINTAINERS.py (get_surname): Rename to ...
(get_name_for_sort): ... this. Add the forenames after the
surname.
(check_group): Match against regexes and support additional
fields for secondary sorting.
(sections): Rework to use regexes, add rules for the other
sections in the MAINTAINERS file.
Richard Earnshaw [Tue, 19 May 2026 17:55:22 +0000 (18:55 +0100)]
MAINTAINERS: Sort Various Maintainers
Sort the Various Maintainers by area and then name. In a minor change
to the formatting, I've introduced the convention where if a field
overflows its allotted space, it must be terminated by exactly two
spaces. This makes it possible for a parser to separate the component
from the subsequent maintainer name.
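A parser can rely on that convention roughly as follows. The actual checker is the Python script check-MAINTAINERS.py; this C++ sketch with a hypothetical function name and a made-up sample line only illustrates splitting at the two-space terminator. It needs C++17 for std::string_view.

```cpp
#include <cstddef>
#include <string_view>
#include <utility>

// Split a line at the first run of two or more spaces: everything before
// it is the (possibly overflowing) field, everything after is the rest.
constexpr std::pair<std::string_view, std::string_view>
split_overflowing_field(std::string_view line) {
  const std::size_t gap = line.find("  ");
  if (gap == std::string_view::npos)
    return {line, std::string_view{}};  // no terminator found
  const std::size_t rest = line.find_first_not_of(' ', gap);
  if (rest == std::string_view::npos)
    return {line.substr(0, gap), std::string_view{}};
  return {line.substr(0, gap), line.substr(rest)};
}

// Made-up sample: single spaces inside the field do not end it.
static_assert(split_overflowing_field("some very long area  Jane Doe").first
                  == "some very long area",
              "field keeps its single spaces");
static_assert(split_overflowing_field("some very long area  Jane Doe").second
                  == "Jane Doe",
              "name follows the two-space terminator");
```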
Evgeny Karpov [Tue, 19 May 2026 16:41:32 +0000 (16:41 +0000)]
aarch64: mingw: Enable init priority order
The patch enables init priority, which is needed for winpthreads and
supported now by aarch64-w64-mingw32.
This change allows building aarch64-w64-mingw32 from upstream
binutils/gcc/mingw repos.
gcc/ChangeLog:
* config/aarch64/aarch64-coff.h (SUPPORTS_INIT_PRIORITY):
Enable SUPPORTS_INIT_PRIORITY by default.
Pietro Monteiro [Tue, 26 May 2026 23:09:23 +0000 (19:09 -0400)]
libffi: Use correct include path for tests [PR125417]
Libffi testsuite uses relative directories for include paths. For
multilibbed targets we run the tests from the main target build
directory, so using relative paths leads to the wrong files being
included by the tests. Fix by using variables that point into the
build dir for the current multilib variant being tested.
libffi/ChangeLog:
PR libffi/125417
* testsuite/lib/libffi.exp (libffi_target_compile): Use
${libffi_include} and ${blddirffi} instead of "../include" and
".." for include paths.
Signed-off-by: Pietro Monteiro <pietro@sociotechnical.xyz>
Without the change gcc fails to build on master in
--enable-checking=release mode as:
gcc/tree-vect-stmts.cc:10408:47: error: ‘stride_step’
may be used uninitialized [-Werror=maybe-uninitialized]
This happens due to a limit hit in uninit analysis. To avoid intermittent
build failures dependent on code size let's unconditionally initialize
the stride_step.
PR bootstrap/125318
* tree-vect-stmts.cc (vectorizable_load): Explicitly
initialize stride_step to work around
-Werror=maybe-uninitialized build failure.
Here bb 15 predecessor is a normal edge.
So when merging the forwarder bb 15 into bb14 we end up with:
<bb 12> [local count: 206998870]:
# arr1$0_42(ab) = PHI <arr1$0_24(ab)(11), 9(9)>
# sj12_43(ab) = PHI <sj12_30(ab)(11), arr1$0_35(ab)(9)>
f8 ();
and now there is an overlap of live range of arr1$0_35 and arr1$0_42.
So we need to reject the case where the forwarder block has PHIs whose
arguments reference SSA names that have abnormal uses.
Changes since v1:
* v2: Look at phi arguments of the forwarder block rather than the dest bb
having an abnormal edge out.
* v3: Fix bb_phis_references_abnormal_uses to use the gimple_phi_num_args to
search over the phi arguments. Also fix the commit message which was wrong.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/125396
gcc/ChangeLog:
* tree-cfgcleanup.cc (bb_phis_references_abnormal_uses): New function.
(maybe_remove_forwarder_block): Check to make sure the
forwarder block does not have a phi that references ssa name that has
abnormal uses.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr125396-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Tomasz Kamiński [Tue, 26 May 2026 16:01:40 +0000 (18:01 +0200)]
libstdc++: Revert making ref_view<R> statically sized.
This patch reverts all changes except introduction of
ranges::__static_size from r17-810-g7239744d25dadf.
In addition to expected errors from breaking inplace_vector
preconditions, this led to a change in the return type of
define_static_array when applied to an (adapted) ref_view:
int a[10];
auto x = define_static_array(a | views::transform(...));
The type of x changed from span<const ...> to span<const ..., 10>,
due to the size being statically known.
This was considered beyond the scope of implementation freedom,
and we should wait for acceptance of P3928R0 instead.
libstdc++-v3/ChangeLog:
* include/std/ranges (ref_view::size()): Only call ranges::size(*_M_r).
(ref_view::empty): Only call ranges::empty(*_M_r).
* testsuite/23_containers/inplace_vector/cons/from_iota_neg.cc:
Expect no errors from ref_view uses.
* testsuite/23_containers/inplace_vector/cons/from_range_neg.cc:
Likewise.
Fortran: Add debug functions for OpenMP data structures
show_omp_namelist and show_omp_clauses cannot be called from GDB because
dumpfile is NULL at debug time. Add debug wrappers that temporarily set it to
stderr.
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (debug): Add debug functions for gfc_omp_namelist
and gfc_omp_clauses.
ada: Fix System.Interrupt_Names generation on VxWorks
The spec of Ada.Interrupts.Names for VxWorks (and RTEMS) contains a
subtype declaration. This is a deviation from the Ada reference manual
and the sed script used to generate System.Interrupt_Names failed to
handle it. This patch fixes this.
Tomasz Kamiński [Fri, 24 Apr 2026 03:26:13 +0000 (05:26 +0200)]
libstdc++: Make ref_view<R> statically sized if R has static size
This patch introduces the ranges::__static_size<_Range> helper function,
which returns ranges::size(__rg) for __statically_sized_range.
This function is then used for ref_view<R>::size if R has static size,
avoiding dereference of pointer value that is not known at compile time.
Similarly for ref_view<R>::empty() we compare the size with zero, if it
is known statically.
This implements relevant part of P3928R0: static_sized_range by
Hewill Kang. As the standard does not specify when constexpr functions
are usable at compile time, such implementations are allowed (but not
mandated) by the current draft.
libstdc++-v3/ChangeLog:
* include/bits/ranges_base.h (ranges::__static_size): Define.
* include/std/ranges (ref_view::size()): For ranges with static
size return ranges::__static_size<_Range>.
(ref_view::empty): For ranges with static size, compare size
against zero.
* testsuite/23_containers/inplace_vector/cons/from_iota_neg.cc:
Expect errors from ref_view uses.
* testsuite/23_containers/inplace_vector/cons/from_range_neg.cc:
Expect errors from ref_view uses.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
The test case illustrates that views applied to span<T, N> are still
statically sized, but once applied to array<T, N> are not. This is
caused by the fact that ref_view stores a pointer to the array, and
dereferencing an unknown pointer does not produce a reference to unknown
but simply yields a non-constant expression (an unknown pointer is not
equivalent to a pointer to unknown, as it may be null).
libstdc++-v3/ChangeLog:
* include/std/inplace_vector (inplace_vector(std::from_range, __Rg&&)):
Add static_asserts checking range size.
* testsuite/23_containers/inplace_vector/cons/from_iota_neg.cc:
New test.
* testsuite/23_containers/inplace_vector/cons/from_range_neg.cc:
New test.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jonathan Wakely [Thu, 21 May 2026 16:09:47 +0000 (17:09 +0100)]
libstdc++: Deprecate std::memory_order::consume
This implements P3475R2, "Defang and deprecate memory_order::consume",
approved in Hagenberg, 2025.
It looks like the using-declaration for memory_order_consume in
<stdatomic.h> was not deprecated by the paper, but I don't think we can
implement that if we warn for the name in <atomic>. It doesn't make
sense to me for it to be deprecated in C++ but still usable in the C/C++
compatibility header. It's still just as useless in common C/C++
headers, so we should warn.
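For code that trips the new deprecation warning, one possible migration is to request acquire ordering explicitly. The snippet below is a minimal sketch under that assumption; Config, g_config, g_default_config and both functions are hypothetical names, not part of libstdc++.

```cpp
#include <atomic>
#include <cassert>

struct Config { int value; };

std::atomic<Config*> g_config{nullptr};  // hypothetical shared pointer
Config g_default_config{42};             // hypothetical payload

// Writer: make the object visible to readers.
void publish(Config* c) {
  assert(c != nullptr);
  g_config.store(c, std::memory_order_release);
}

// Reader: previously this might have used std::memory_order_consume;
// std::memory_order_acquire is the non-deprecated spelling used here.
int read_value_or(int fallback) {
  Config* c = g_config.load(std::memory_order_acquire);
  return c ? c->value : fallback;
}
```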
Jonathan Wakely [Thu, 21 May 2026 17:58:14 +0000 (18:58 +0100)]
libstdc++: Add missing constraints to vector and deque deduction guides
The standard requires that these deduction guides are constrained to
only accept a type that qualifies as an allocator for the second
template argument.
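The kind of constraint involved can be illustrated with a toy container. MiniVec, is_allocator_like and can_deduce are hypothetical names for this sketch; the real guides in stl_vector.h and stl_deque.h use libstdc++'s own internal helpers, which are not reproduced here. The example needs C++17.

```cpp
#include <cstddef>
#include <iterator>
#include <memory>
#include <type_traits>
#include <utility>

// Stand-in for "qualifies as an allocator": names a value_type and
// provides allocate(n).
template<class A, class = void>
struct is_allocator_like : std::false_type {};

template<class A>
struct is_allocator_like<A, std::void_t<
    typename A::value_type,
    decltype(std::declval<A&>().allocate(std::size_t{}))>>
  : std::true_type {};

// Toy container: T cannot be deduced from the constructor alone.
template<class T, class Alloc = std::allocator<T>>
struct MiniVec {
  template<class It>
  MiniVec(It, It, const Alloc& = Alloc()) {}
};

// Deduction guide constrained on the second template argument.
template<class It,
         class Alloc =
           std::allocator<typename std::iterator_traits<It>::value_type>,
         class = std::enable_if_t<is_allocator_like<Alloc>::value>>
MiniVec(It, It, Alloc = Alloc())
  -> MiniVec<typename std::iterator_traits<It>::value_type, Alloc>;

// Detects whether class template argument deduction succeeds.
template<class, class... Args>
struct can_deduce_impl : std::false_type {};

template<class... Args>
struct can_deduce_impl<
    std::void_t<decltype(MiniVec(std::declval<Args>()...))>, Args...>
  : std::true_type {};

template<class... Args>
using can_deduce = can_deduce_impl<void, Args...>;

static_assert(can_deduce<int*, int*, std::allocator<int>>::value,
              "a real allocator is accepted");
static_assert(!can_deduce<int*, int*, int>::value,
              "int does not qualify as an allocator");
```

Without the enable_if constraint on the guide, the second static_assert would fail in this sketch because Alloc would simply be deduced as int.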
libstdc++-v3/ChangeLog:
* include/bits/stl_deque.h: Add missing constraint on allocator
type in deduction guide.
* include/bits/stl_vector.h: Likewise.
* include/debug/deque: Likewise.
* include/debug/vector: Likewise.
* testsuite/23_containers/deque/cons/deduction_c++23.cc: Check
that deduction fails for a type which does not qualify as an
allocator.
* testsuite/23_containers/vector/cons/deduction_c++23.cc:
Likewise.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jonathan Wakely [Tue, 19 May 2026 15:51:14 +0000 (16:51 +0100)]
libstdc++: Split <iosfwd> and only include it in <ios> [PR125371]
The standard explicitly requires the <ios> header to include <iosfwd>,
which has declarations of all the standard stream buffers and stream
types. Because we include <ios> in <istream> and <ostream>, this means
that <iosfwd> is included everywhere, and so every iostream header has a
forward declaration of every iostream type. This means that for example,
<fstream> has a declaration (but not the definition) of std::stringbuf.
This leads to a poor user experience, because the compiler's fixit hints
for undeclared types do not trigger if the type _has_ been declared,
instead users get an error about using an incomplete type. See the
example in PR 125371, where using std::istringstream after including
<fstream> fails to suggest including <sstream>.
If we stop including <ios> in <istream> and <ostream>, and instead
include _most_ of the same things that <ios> provides, then we can avoid
the unhelpful declarations of the entire family of iostream types in
every header. Users who really do want a declaration of std::filebuf
or std::istringstream (but don't want the full definition) can still
explicitly include <iosfwd> to get those declarations. But they won't
get them as a side effect of <fstream> etc.
Various headers currently include <iosfwd> because they really do want
the declaration of e.g. std::ostream or std::streambuf_iterator. We can
split <iosfwd> into five smaller headers and then only include the relevant
one where required, e.g. <fstream> only needs to include iosfwd_file.h
and not iosfwd_string.h.
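From the user's side, the distinction between declarations and definitions looks roughly like this. The example uses only the public headers, since the new bits/ headers are internal; write_id and format_id are hypothetical names.

```cpp
// A header-like part: only a declaration of std::ostream is needed to
// declare a function taking a reference, so <iosfwd> is sufficient.
#include <iosfwd>

std::ostream& write_id(std::ostream& os, int id);

// An implementation-like part (normally a separate .cc file): using the
// streams requires the full definitions, so include them explicitly
// rather than relying on another iostream header to provide them.
#include <ostream>
#include <sstream>
#include <string>

std::ostream& write_id(std::ostream& os, int id) {
  return os << "id=" << id;
}

std::string format_id(int id) {
  std::ostringstream out;  // complete type needed: comes from <sstream>
  write_id(out, id);
  return out.str();
}
```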
We need to add an explicit include of <ios> in <iostream>. The standard
requires it there, and after this change we no longer get it via
<istream> and <ostream>.
libstdc++-v3/ChangeLog:
PR libstdc++/125371
* config/io/basic_file_stdio.h: Include <bits/ios_base.h>
instead of <ios>.
* include/Makefile.am: Add new headers.
* include/Makefile.in: Regenerate.
* include/bits/fs_path.h: Include <bits/iosfwd.h> instead of
<iosfwd>.
* include/bits/locale_facets.h: Remove unused <iosfwd> and
<streambuf> includes.
* include/bits/localefwd.h: Include <bits/iosfwd.h> instead of
<iosfwd>.
* include/bits/ostream.h: Replace <ios> with its constituent
parts, except for <iosfwd>.
* include/bits/ostream_insert.h: Include <bits/iosfwd.h> instead
of <iosfwd>.
* include/bits/shared_ptr.h: Likewise.
* include/bits/std_thread.h: Likewise.
* include/bits/stream_iterator.h: Likewise.
* include/std/fstream: Include <bits/iosfwd_file.h>.
* include/std/iomanip: Include <bits/iosfwd.h> instead of
<iosfwd>.
* include/std/ios: Do not include <exception> or
<bits/char_traits.h>.
* include/std/iosfwd: Move declarations to new headers and
include those new headers. Tweak Doxygen comment.
* include/std/iostream: Include <ios>.
* include/std/istream: Replace <ios> with its constituent
parts, except for <iosfwd>.
* include/std/random: Include <bits/iosfwd.h> instead of
<iosfwd>.
* include/std/spanstream: Include <bits/iosfwd_span.h>.
* include/std/sstream: Include <bits/iosfwd_string.h>.
* include/std/streambuf: Include <bits/iosfwd.h> instead of
<iosfwd>.
* include/std/string_view: Likewise.
* include/std/syncstream: Include <bits/iosfwd_sync.h>.
* include/std/system_error: Include <bits/iosfwd.h> instead of
<iosfwd>.
* include/bits/iosfwd.h: New file.
* include/bits/iosfwd_file.h: New file.
* include/bits/iosfwd_span.h: New file.
* include/bits/iosfwd_string.h: New file.
* include/bits/iosfwd_sync.h: New file.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>