Jonathan Wakely [Thu, 10 Jul 2025 13:12:44 +0000 (14:12 +0100)]
libstdc++: Ensure that ranges::destroy destroys in constexpr [PR121024]
The new test is currently marked as XFAIL because PR c++/102284 means
that GCC doesn't notice that the lifetimes have ended.
libstdc++-v3/ChangeLog:
PR libstdc++/121024
* include/bits/ranges_uninitialized.h (ranges::destroy): Do not
optimize away trivial destructors during constant evaluation.
(ranges::destroy_n): Likewise.
* testsuite/20_util/specialized_algorithms/destroy/121024.cc:
New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jonathan Wakely [Sun, 13 Jul 2025 14:05:52 +0000 (15:05 +0100)]
libstdc++: Ensure std::make_unsigned<Enum> works for 128-bit enum
Another follow-up to r16-2190-g4faa42ac0dee2c, ensuring that make_signed
and make_unsigned work on enumeration types with 128-bit integers as
their underlying type.
libstdc++-v3/ChangeLog:
* include/std/type_traits (__make_unsigned_selector): Add
unsigned __int128 to type list.
* testsuite/20_util/make_unsigned/int128.cc: New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
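As a rough illustration of what the trait does for enumeration types, the sketch below uses an ordinary small enum, since a 128-bit underlying type needs a compiler with this fix. The enum and alias names are hypothetical and are not taken from the new test.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical enumeration used only for illustration.
enum class Small : signed char { a = -1, b = 1 };

// For an enumeration, make_unsigned names an unsigned integer type
// with the same size as the enumeration.
using USmall = std::make_unsigned<Small>::type;

static_assert(std::is_unsigned<USmall>::value, "result is unsigned");
static_assert(sizeof(USmall) == sizeof(Small), "same size as the enum");
```

With the patch applied, the same properties are expected to hold for an enum whose underlying type is __int128.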
Jonathan Wakely [Fri, 16 May 2025 12:33:23 +0000 (13:33 +0100)]
libstdc++: Ensure std::hash<__int128> is defined [PR96710]
This is a follow-up to r16-2190-g4faa42ac0dee2c which ensures that
std::hash is always enabled for signed and unsigned __int128. The
standard requires std::hash to be enabled for all arithmetic types.
libstdc++-v3/ChangeLog:
PR libstdc++/96710
* include/bits/functional_hash.h (hash<__int128>): Define for
strict modes.
(hash<unsigned __int128>): Likewise.
* testsuite/20_util/hash/int128.cc: New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Tomasz Kamiński [Fri, 27 Jun 2025 13:04:19 +0000 (15:04 +0200)]
libstdc++: Format %a/%A/%b/%h/%B/%p without using locale::classic [PR110739]
With the changes in r16-2063-g8ad5968a8dcb47, the _M_a_A, _M_b_B and _M_p
functions are called only if the locale is equal to locale::classic(), for
which the behavior is known. This patch changes their implementation so that,
instead of referring to __timepunct facet members, they use hardcoded lists of
English weekday and month names. Only one list of each is needed, because for
locale::classic() the abbreviated name is the first three letters of the full name.
For _M_p and _M_r we use a new _S_fill_ampm helper that fills the provided
buffer with "AM"/"PM" depending on the hours value.
In _M_S we no longer guard the numpunct facet query with a check that requires
a potentially equally expensive construction of locale::classic. We also mark
the localized path as unlikely.
The _M_locale method is no longer used in __formatter_chrono, and thus was
moved to __formatter_duration.
PR libstdc++/110739
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (__formatter_chrono::_S_weekdays)
(__formatter_chrono::_S_months, __formatter_chrono::_S_fill_ampm):
Define.
(__formatter_chrono::_M_format_to): Do not pass context parameter
to functions listed below.
(__formatter_chrono::_M_a_A, __formatter_chrono::_M_b_B): Implement
using hardcoded list of names, and remove format context parameter.
(__formatter_chrono::_M_p, __formatter_chrono::_M_r): Implement
using _S_fill_ampm.
(__formatter_chrono::_M_c): Removed format context parameter.
(__formatter_chrono::_M_subsecs): Call __ctx.locale() directly,
instead of _M_locale and do not compare with locale::classic().
Add [[unlikely]] attributes.
(__formatter_chrono::_M_locale): Move to __formatter_duration.
(__formatter_duration::_M_locale): Moved from __formatter_chrono.
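The single-list idea described in this commit can be sketched as follows. This is a minimal illustration under assumptions, not the libstdc++ code: the names weekday_names, weekday_text, fill_ampm and ampm_text are hypothetical, and std::string is used for brevity.

```cpp
#include <cassert>
#include <string>

// Full English weekday names, indexed 0 (Sunday) to 6 (Saturday).
// The array name is hypothetical; it is not the libstdc++ member.
static const char* const weekday_names[7] = {
  "Sunday", "Monday", "Tuesday", "Wednesday",
  "Thursday", "Friday", "Saturday"
};

// Return the full (%A) or abbreviated (%a) spelling for a weekday index.
// In the classic locale the abbreviation is the first three letters.
inline std::string weekday_text(unsigned wd, bool full)
{
  std::string name = weekday_names[wd % 7];
  return full ? name : name.substr(0, 3);
}

// Write "AM" or "PM" into buf according to a 24-hour value.
inline void fill_ampm(char (&buf)[2], int hours)
{
  buf[0] = hours < 12 ? 'A' : 'P';
  buf[1] = 'M';
}

// Convenience wrapper that returns the two characters as a string.
inline std::string ampm_text(int hours)
{
  char buf[2];
  fill_ampm(buf, hours);
  return std::string(buf, 2);
}
```

Keeping only the full names avoids a second table, since the abbreviated form can be derived by truncation.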
aarch64: AND/BIC combines for unpacked SVE FP comparisons
This patch extends the splitting patterns for combining FP comparisons
with predicated logical operations such that they cover all of SVE_F.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (*fcm<cmp_op><mode>_and_combine):
Extend from SVE_FULL_F to SVE_F.
(*fcmuo<mode>_and_combine): Likewise.
(*fcm<cmp_op><mode>_bic_combine): Likewise.
(*fcm<cmp_op><mode>_nor_combine): Likewise.
(*fcmuo<mode>_bic_combine): Likewise.
(*fcmuo<mode>_nor_combine): Likewise. Move the comment here to
above fcmuo<mode>_bic_combine, since it applies to both patterns.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/unpacked_fcm_combines_1.c: New test.
* gcc.target/aarch64/sve/unpacked_fcm_combines_2.c: Likewise.
Mikael Morin [Tue, 15 Jul 2025 07:58:44 +0000 (09:58 +0200)]
fortran: Amend descriptor bounds init if unallocated
Always generate the conditional initialization of unallocated variables
regardless of the basic variable allocation tracking done in the
frontend and with an additional always false condition.
The scalarizer used to always evaluate array bounds, including in the
case of unallocated arrays on the left hand side of an assignment. This
was (correctly) causing uninitialized warnings, even if the
uninitialized values were in the end discarded.
Since the fix for PR fortran/108889, an initialization of the descriptor
bounds is added to silence the uninitialized warnings, conditional on the
array being unallocated. This initialization is not useful in the
execution of the program, and it is removed if the compiler can prove
that the variable is unallocated (in the case of a local variable for
example). Unfortunately, the compiler is not always able to prove it
and the useless initialization may remain in the final code.
Moreover, the generated code that was causing the evaluation of
uninitialized variables has been changed to avoid them, so we can try to
remove or revisit that unallocated variable bounds initialization tweak.
Unfortunately, just removing the extra initialization restores the
warnings at -O0, as there is no dead code removal at that optimization
level. Instead, this change keeps the initialization and modifies its
guarding condition with an extra always false variable, so that if
optimizations are enabled the whole initialization block is removed, and
if they are disabled it remains and is sufficient to prevent the
warning.
The new variable requires the code generation to be done earlier in the
function so that the variable declaration and usage are in the same
scope.
As the modified condition guarantees the removal of the block with
optimizations, we can emit it more broadly and remove the basic
allocation tracking that was done in the frontend to limit its emission.
gcc/fortran/ChangeLog:
* gfortran.h (gfc_symbol): Remove field allocated_in_scope.
* trans-array.cc (gfc_array_allocate): Don't set it.
(gfc_alloc_allocatable_for_assignment): Likewise.
Generate the unallocated descriptor bounds initialisation
before the opening of the reallocation code block. Create a
variable and use it as additional condition to the unallocated
descriptor bounds initialisation.
so the bounds etc are evaluated first to variables, and the reallocation
code takes care to update the variables during the reallocation. This
is problematic because the variables' initialization references the
array bounds, which for unallocated arrays are uninitialized at the
evaluation point. This used to (correctly) cause uninitialized warnings
(see PR fortran/108889), and a workaround for variables was found, that
initializes the bounds of arrays variables to some value beforehand if
they are unallocated. For allocatable components, there is no warning
but the problem remains, some uninitialized values are used, even if
discarded later.
so the scalarizer avoids storing the values to variables at the time it
evaluates them, if the array is reallocatable on assignment. Instead,
it keeps expressions with references to the array descriptor fields,
expressions that remain valid through reallocation. After the
reallocation code has been generated, the expressions stored by the
scalarizer are evaluated in place to variables.
The decision to delay evaluation is based on the existing field
is_alloc_lhs, which requires a few tweaks to be always correct with respect
to what its name suggests. Namely, it should be set even if the assignment
right hand side is an intrinsic function, and it should not be set if
the right hand side is a scalar, nor if the -fno-realloc-lhs flag
is passed to the compiler.
gcc/fortran/ChangeLog:
* trans-array.cc (gfc_conv_ss_descriptor): Don't evaluate
offset and data to a variable if is_alloc_lhs is set. Move the
existing evaluation decision condition for data...
(save_descriptor_data): ... here as a new predicate.
(evaluate_bound): Add argument save_value. Omit the evaluation
of the value to a variable if that argument isn't set.
(gfc_conv_expr_descriptor): Update caller.
(gfc_conv_section_startstride): Update caller. Set save_value
if is_alloc_lhs is not set. Omit the evaluation of stride to a
variable if save_value isn't set.
(gfc_set_delta): Omit the evaluation of delta to a variable
if is_alloc_lhs is set.
(gfc_is_reallocatable_lhs): Return false if flag_realloc_lhs
isn't set.
(gfc_alloc_allocatable_for_assignment): Don't update
the variables that may be stored in saved_offset, delta, and
data. Call instead...
(update_reallocated_descriptor): ... this new procedure.
* trans-expr.cc (gfc_trans_assignment_1): Don't omit setting the
is_alloc_lhs flag if the right hand side is an intrinsic
function. Clear the flag if the right hand side is scalar.
Mikael Morin [Tue, 15 Jul 2025 07:58:26 +0000 (09:58 +0200)]
fortran: Generate array reallocation out of loops
Generate the array reallocation on assignment code before entering the
scalarization loops. This doesn't move the generated code itself,
which was already put before the outermost loop, but only changes the
current scope at the time the code is generated. This is a prerequisite
for a followup patch that makes the reallocation code create new
variables. Without this change the new variables would be declared in
the innermost loop body and couldn't be used outside of it.
gcc/fortran/ChangeLog:
* trans-expr.cc (gfc_trans_assignment_1): Generate array
reallocation code before entering the scalarisation loops.
In file included from ./tm_p.h:4,
from /vol/gcc/src/hg/master/local/gcc/tree.cc:35:
/vol/gcc/src/hg/master/local/gcc/config/sparc/sparc-protos.h:46:47: error: use of enum ‘memmodel’ without previous declaration
46 | extern void sparc_emit_membar_for_model (enum memmodel, int, int);
|
Fixed by including memmodel.h.
Bootstrapped without regressions on sparc-sun-solaris2.11 and
i386-pc-solaris2.11.
Robert Dubner [Mon, 14 Jul 2025 20:41:35 +0000 (16:41 -0400)]
cobol: Eliminate cppcheck warnings in gcc/cobol .cc files.
These changes eliminate various cppcheck warnings, mostly involving C-Style
casting and applying "const" to various variables and formal parameters.
Some tab characters were eliminated, and some lines were trimmed to
seventy-nine characters.
Jonathan Wakely [Sun, 13 Jul 2025 14:34:15 +0000 (15:34 +0100)]
libstdc++: Add comments to deleted std::swap overloads for LWG 2766
We pre-emptively implemented part of LWG 2766, which still hasn't been
approved. Add comments to the deleted swap overloads saying why they're
there, because the standard doesn't require them.
Andrew Stubbs [Wed, 9 Jul 2025 14:59:20 +0000 (14:59 +0000)]
amdgcn: fix vec_ucmp infinite recursion
I suppose this pattern doesn't get used much! The unsigned compare was meant to
be defined using the signed compare pattern, but actually ended up trying to
recursively call itself. This patch fixes the issue in the obvious way.
gcc/ChangeLog:
* config/gcn/gcn-valu.md (vec_cmpu<mode>di_exec): Call gen_vec_cmp*,
not gen_vec_cmpu*.
* config/s390/vector.md (reduc_plus_scal_<mode>): Implement.
(reduc_plus_scal_v2df): Implement.
(reduc_plus_scal_v4sf): Implement.
(REDUC_FMINMAX): New int iterator.
(reduc_fminmax_name): New int attribute.
(reduc_minmax): New code iterator.
(reduc_minmax_name): New code attribute.
(reduc_<reduc_fminmax_name>_scal_v2df): Implement.
(reduc_<reduc_fminmax_name>_scal_v4sf): Implement.
(reduc_<reduc_minmax_name>_scal_v2df): Implement.
(reduc_<reduc_minmax_name>_scal_v4sf): Implement.
(REDUCBIN): New code iterator.
(reduc_bin_insn): New code attribute.
(reduc_<reduc_bin_insn>_scal_v2di): Implement.
(reduc_<reduc_bin_insn>_scal_v4si): Implement.
(reduc_<reduc_bin_insn>_scal_v8hi): Implement.
(reduc_<reduc_bin_insn>_scal_v16qi): Implement.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Add s390 to vect_logical_reduc targets.
* gcc.target/s390/vector/reduc-binops-1.c: New test.
* gcc.target/s390/vector/reduc-minmax-1.c: New test.
* gcc.target/s390/vector/reduc-plus-1.c: New test.
The default setting of s390 for the parameter min-vect-loop-bound was
set to 2 to prevent certain epilogue loop vectorizations in the past.
Reevaluation of this parameter shows that this setting is no longer
needed and is sometimes even harmful. Remove the override to
align s390 with other backends.
Andrew Stubbs [Fri, 11 Jul 2025 13:41:19 +0000 (13:41 +0000)]
amdgcn: Don't clobber VCC if we don't need to
This is a hold-over from GCN3 where v_add always wrote to the condition
register, whether you wanted it or not. This hasn't been true since GCN5, and
we dropped support for GCN3 a little while ago, so let's fix it.
There was actually a latent bug here because some other post-reload splitters
were generating v_add instructions without declaring the VCC clobber (at least
mul did this), so this should fix some wrong-code bugs also.
gcc/ChangeLog:
* config/gcn/gcn-valu.md (add<mode>3<exec_clobber>): Rename ...
(add<mode>3<exec>): ... to this, remove the clobber, and change the
instruction from v_add_co_u32 to v_add_u32.
(add<mode>3_dup<exec_clobber>): Rename ...
(add<mode>3_dup<exec>): ... to this, and likewise.
(sub<mode>3<exec_clobber>): Rename ...
(sub<mode>3<exec>): ... to this, and likewise
* config/gcn/gcn.md (addsi3): Remove the DI clobber, and change the
instruction from v_add_co_u32 to v_add_u32.
(addsi3_scc): Likewise.
(subsi3): Likewise, but for v_sub_co_u32.
(muldi3): Likewise.
Richard Biener [Mon, 14 Jul 2025 12:09:28 +0000 (14:09 +0200)]
tree-optimization/121059 - record loop mask when required
For loop masking we need to mask a mask AND operation with the loop
mask. The following makes sure we have a corresponding mask
available. There's no good way to distinguish loop masking from
len masking here, so assume we have recorded a mask for the operands
mask producers.
PR tree-optimization/121059
* tree-vect-stmts.cc (vectorizable_operation): Record a
loop mask for mask AND operations.
Pan Li [Fri, 11 Jul 2025 00:58:31 +0000 (08:58 +0800)]
RISC-V: Add testcase for rv32 SAT_MUL from uint64
Add the run and asm testcase for rv32 SAT_MUL, widen mul from
uint8_t, uint16_t, uint32_t to uint64_t.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_u_mul-1-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-1-u32-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-1-u8-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-1-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-1-u32-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-1-u8-from-u64.c: New test.
Pan Li [Fri, 11 Jul 2025 00:38:09 +0000 (08:38 +0800)]
Match: Refine the widen mul check for SAT_MUL pattern
A widening multiply takes N-bit source operands and produces a
2N-bit result. The previous check only considered
HOST_WIDE_INT and did not work for QI => HI, HI => SI
and SI => DI. Thus, refine the widen mul precision
check to require that the dest has twice the bits of the input.
gcc/ChangeLog:
* match.pd: Make sure widen mul has twice bitsize
of the inputs in SAT_MUL pattern.
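For readers unfamiliar with the idiom, an unsigned saturating multiply computed through a product twice as wide as the inputs might look like the sketch below. The function name is hypothetical, and this is not necessarily the exact source form that match.pd recognizes.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Saturating unsigned multiply of two 32-bit values computed through a
// 64-bit product. The function name is hypothetical.
inline uint32_t sat_u_mul_u32(uint32_t a, uint32_t b)
{
  // Widening multiply: 32-bit inputs, 64-bit (twice as wide) result.
  uint64_t wide = static_cast<uint64_t>(a) * b;
  const uint32_t max = std::numeric_limits<uint32_t>::max();
  // Clamp to the largest representable 32-bit value on overflow.
  return wide > max ? max : static_cast<uint32_t>(wide);
}
```

Because the product of two N-bit values always fits in 2N bits, the comparison against the N-bit maximum is exact.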
instead of mcount, which is placed before the prologue so that -pg can
be used with -fshrink-wrap-separate enabled at -O1. This option is
64-bit only because __fentry__ doesn't support PIC in 32-bit mode. The
default is to enable -mfentry when targeting glibc.
Also warn about -pg without -mfentry when shrink wrapping is enabled. The
warning is disabled for PIC in 32-bit mode.
gcc/
PR target/120881
* config.in: Regenerated.
* configure: Likewise.
* configure.ac: Add --enable-x86-64-mfentry.
* config/i386/i386-options.cc (ix86_option_override_internal):
Enable __fentry__ in 64-bit mode if ENABLE_X86_64_MFENTRY is set
to 1. Warn -pg without -mfentry with shrink wrapping enabled.
* doc/install.texi: Document --enable-x86-64-mfentry.
darwin25 will be named macOS 26 (codename Tahoe). This is a change from
darwin24, which was macOS 15. We need to adapt the driver to this new
numbering scheme.
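The numbering change can be sketched as a small mapping function. This is a hypothetical helper, not the driver code. The darwin20 to darwin24 range follows the established "minus 9" relation, and treating releases after darwin25 as "plus 1" is an assumption.

```cpp
#include <cassert>

// Map a Darwin kernel major version to a macOS major version.
// Hypothetical helper; the real driver logic lives elsewhere in GCC.
// Versions before darwin20 (macOS 10.x) are out of scope and return 0.
inline int macos_major_from_darwin(int darwin_major)
{
  if (darwin_major >= 25)
    return darwin_major + 1;   // darwin25 -> macOS 26 (assumed to continue)
  if (darwin_major >= 20)
    return darwin_major - 9;   // darwin20 -> macOS 11, ..., darwin24 -> macOS 15
  return 0;
}
```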
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a float_extend'ed vec_duplicate into a plus-mult or minus-mult RTL
instruction.
Before this patch, we have three instructions, e.g.:
fcvt.s.h fa5,fa5
vfmv.v.f v24,fa5
vfmadd.vv v8,v24,v16
After, we get only one:
vfwmacc.vf v8,fa5,v16
PR target/119100
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*vfwmacc_vf_<mode>): New pattern to
handle both vfwmacc and vfwmsac.
(*extend_vf_<mode>): New pattern that serves as an intermediate combine
step.
* config/riscv/vector-iterators.md (vsubel): New mode attribute. This is
just the lower-case version of VSUBEL.
* config/riscv/vector.md (@pred_widen_mul_<optab><mode>_scalar): Reorder
and swap operands to match the RTL emitted by expand, i.e. first
float_extend then vec_duplicate.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfwmacc and
vfwmsac.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. Also check
for fcvt and vfmv.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Add vfwmacc and
vfwmsac.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. Also check
for fcvt and vfmv.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop.h: Add support for
widening variants.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop_widen_run.h: New test
helper.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmacc-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmacc-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmsac-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmsac-run-1-f32.c: New test.
Jonathan Wakely [Fri, 11 Jul 2025 22:49:27 +0000 (23:49 +0100)]
libstdc++: Correct value of __cpp_lib_constexpr_exceptions [PR117785]
Only P3068R6 (Allowing exception throwing in constant-evaluation) is
implemented in the library so far, so the value of the
constexpr_exceptions feature test macro should be 202411L. Once we
support the library changes in P3378R2 (constexpr exception types) then
we can set the value to 202502L again.
Jonathan Wakely [Fri, 11 Jul 2025 11:40:14 +0000 (12:40 +0100)]
libstdc++: Fix constexpr exceptions for -fno-exceptions
The if-consteval branches in std::make_exception_ptr and
std::exception_ptr_cast use a try-catch block, which gives an error for
-fno-exceptions. Just make them return a null pointer at compile-time
when -fno-exceptions is used, because there's no way to get an active
exception with -fno-exceptions.
For both functions we have a runtime-only branch that depends on RTTI,
and a fallback using try-catch which works for runtime and consteval.
Rearrange both functions to express this logic more clearly.
Also adjust some formatting and whitespace elsewhere in the file.
libstdc++-v3/ChangeLog:
* libsupc++/exception_ptr.h (make_exception_ptr): Return null
for consteval when -fno-exceptions is used.
(exception_ptr_cast): Likewise. Allow consteval path to work
with -fno-rtti.
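A runtime-only sketch of the same fallback shape follows: use try-catch when exceptions are available, otherwise return a null pointer. The names have_exceptions and make_exception_ptr_sketch are hypothetical, and this is not the libstdc++ implementation, which additionally handles constant evaluation and the RTTI-based path.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// True when the translation unit is compiled with exception support.
#ifdef __cpp_exceptions
constexpr bool have_exceptions = true;
#else
constexpr bool have_exceptions = false;
#endif

// Hypothetical stand-in for the try-catch fallback described above.
template<typename E>
std::exception_ptr make_exception_ptr_sketch(E e) noexcept
{
#ifdef __cpp_exceptions
  try { throw e; }
  catch (...) { return std::current_exception(); }
#else
  (void) e;
  return nullptr;  // nothing can be thrown, so there is nothing to capture
#endif
}
```

Guarding the whole try-catch block with the feature macro keeps the function well-formed under -fno-exceptions.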
Eric Botcazou [Mon, 14 Jul 2025 10:11:44 +0000 (12:11 +0200)]
Ada: Add missing guard before accessing the Underlying_Record_View field
It is necessary when GNAT extensions are enabled (-gnatX switch).
gcc/ada/
PR ada/121056
* sem_ch4.adb (Try_Object_Operation.Try_Primitive_Operation): Add
test on Is_Record_Type before accessing Underlying_Record_View.
gcc/testsuite/
* gnat.dg/deref4.adb: New test.
* gnat.dg/deref4_pkg.ads: New helper.
Implements the sme2+faminmax svamin and svamax intrinsics.
gcc/ChangeLog:
* config/aarch64/aarch64-sme.md (@aarch64_sme_<faminmax_uns_op><mode>):
New patterns.
* config/aarch64/aarch64-sve-builtins-sme.def (svamin): New intrinsics.
(svamax): New intrinsics.
* config/aarch64/aarch64-sve-builtins-sve2.cc (class faminmaximpl): New
class.
(svamin): New function.
(svamax): New function.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sme2/acle-asm/amax_f16_x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/amax_f16_x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/amax_f32_x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/amax_f32_x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/amax_f64_x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/amax_f64_x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/amin_f16_x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/amin_f16_x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/amin_f32_x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/amin_f32_x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/amin_f64_x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/amin_f64_x4.c: New test.
i386: Remove KEYLOCKER related feature since Panther Lake and Clearwater Forest
According to the July 2025 SDM, Key Locker will no longer be supported on
hardware from 2025 onwards. This means the feature will not be enabled for
Panther Lake and Clearwater Forest. Remove it from those two platforms.
RISC-V: Add testcases for unsigned vector SAT_SUB form 11 and form 12
This patch adds testcases for form 11 and form 12, as shown below:
void __attribute__((noinline)) \
vec_sat_u_sub_##T##_fmt_11 (T *out, T *op_1, T *op_2, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
T x = op_1[i]; \
T y = op_2[i]; \
T ret; \
T overflow = __builtin_sub_overflow (x, y, &ret); \
out[i] = overflow ? 0 : ret; \
} \
}
void __attribute__((noinline)) \
vec_sat_u_sub_##T##_fmt_12 (T *out, T *op_1, T *op_2, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
T x = op_1[i]; \
T y = op_2[i]; \
T ret; \
T overflow = __builtin_sub_overflow (x, y, &ret); \
out[i] = !overflow ? ret : 0; \
} \
}
Passed the rv64gcv regression test.
Signed-off-by: Ciyan Pan <panciyan@eswincomputing.com>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/sat/vec_sat_arith.h: Unsigned vector SAT_SUB form11 form12.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_data.h: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-1-u16.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-1-u32.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-1-u64.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-1-u8.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-10-u16.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-10-u32.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-10-u64.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-10-u8.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-2-u16.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-2-u32.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-2-u64.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-2-u8.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-3-u16.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-3-u32.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-3-u64.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-3-u8.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-4-u16.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-4-u32.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-4-u64.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-4-u8.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-5-u16.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-5-u32.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-5-u64.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-5-u8.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-6-u16.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-6-u32.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-6-u64.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-6-u8.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-7-u16.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-7-u32.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-7-u64.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-7-u8.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-8-u16.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-8-u32.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-8-u64.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-8-u8.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-9-u16.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-9-u32.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-9-u64.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-9-u8.c: Use ussub instead of usub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-11-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-11-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-11-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-11-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-12-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-12-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-12-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-12-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-11-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-11-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-11-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-11-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-12-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-12-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-12-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-12-u8.c: New test.
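A scalar version of the form 11 loop body, instantiated for uint8_t, may help when checking expected values by hand. The function name is hypothetical, and __builtin_sub_overflow is a GCC/Clang built-in rather than standard C++.

```cpp
#include <cassert>
#include <cstdint>

// Scalar version of the "form 11" loop body for uint8_t:
// subtract, and clamp to 0 when the result would wrap around.
inline uint8_t sat_u_sub_u8(uint8_t x, uint8_t y)
{
  uint8_t ret;
  bool overflow = __builtin_sub_overflow(x, y, &ret);
  return overflow ? 0 : ret;
}
```

Form 12 differs only in testing !overflow and swapping the two arms of the conditional.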
Andrew Pinski [Sun, 13 Jul 2025 18:56:03 +0000 (11:56 -0700)]
tree: Add include to tm_p.h to tree.cc [PR120866]
After r16-1738-g0337e3c2743ca0, a call to ASM_GENERATE_INTERNAL_LABEL
was done without including tm_p.h. This does not break most targets
as ASM_GENERATE_INTERNAL_LABEL macro function does not call target
specific functions from it; mostly just sprintf. It does however
break pdp11-aout and powerpc-aix* because those two call a target
specific function to create the internal label.
Pushed as obvious after a build of gcc for pdp11-aout and x86_64-linux-gnu.
PR middle-end/120866
gcc/ChangeLog:
* tree.cc: Add include to tm_p.h.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Robert Dubner [Fri, 11 Jul 2025 21:11:21 +0000 (17:11 -0400)]
cobol: Minor changes to genapi.cc to eliminate CPPCHECK warnings.
Several hundred cppcheck warnings were eliminated.
Most of these changes were replacing C-style casts, checking for NULL
pointers, establishing some variables and formal parameters as const,
and moving some variables around to tidy up their scopes.
One memory leak was found and eliminated as a result of the cppcheck.
Jan Hubicka [Sat, 12 Jul 2025 15:57:25 +0000 (17:57 +0200)]
Fix some auto-profile issues
This patch fixes minor things that have accumulated in my tree. Apart from
formatting fixes, an important change is that the seen set is now kept up to date.
The original code first populated it for all strings in the string table, but now
gimple matching may introduce new ones that need to be checked for a match with
the symbol table as well.
This makes imagemagick of spec2017 faster with auto-fdo than without, at
least when trained with the ref run. The train run has a problem since it does not
train the innermost loop at all, so even with normal PGO it is slower than without.
* auto-profile.cc (function_instance::~function_instance):
Move down in source.
(string_table::get_cgraph_node): New member function with
logic broken out from ...
(function_instance::get_cgraph_node): ... here.
(match_with_target): Fix formatting.
(function_instance::match): Fix formatting; do not use iterators
after modifying map; remove incorrect set of warned flag.
(autofdo_source_profile::offline_external_functions): Keep
seen set up to date.
(function_instance::read_function_instance): Fix formatting.
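The "do not use iterators after modifying map" item reflects a general hazard with hash-based containers. A minimal sketch of one way to avoid it is shown below, using std::unordered_map as a stand-in. GCC's own hash_map is a different class, and the function name is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// For every key in the map, add a derived "<key>.copy" entry.
// Inserting into std::unordered_map may rehash and invalidate
// iterators, so the keys are snapshotted before the map is modified.
inline std::unordered_map<std::string, int>
with_copies(std::unordered_map<std::string, int> m)
{
  std::vector<std::string> keys;
  keys.reserve(m.size());
  for (const auto& kv : m)
    keys.push_back(kv.first);

  for (const auto& k : keys)
    {
      int value = m.at(k);      // read before inserting
      m[k + ".copy"] = value;   // may rehash; no live iterators here
    }
  return m;
}
```

Iterating over the snapshot rather than the map itself means no iterator into the map is held across an insertion.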
MMX allows only direct moves from zero, so correct V_32:mode and v2qi
move patterns to allow only nonimm_or_0_operand as their input operand.
gcc/ChangeLog:
* config/i386/mmx.md (mov<V_32:mode>):
Use nonimm_or_0_operand predicate for operand 1.
(*mov<V_32:mode>_internal): Ditto.
(movv2qi): Ditto.
(*movv2qi_internal): Ditto. Use ix86_hardreg_mov_ok
in insn condition.
Xi Ruoyao [Tue, 8 Jul 2025 06:39:11 +0000 (14:39 +0800)]
lra: Reallow reloading user hard registers if the insn is not asm [PR 120983]
The PR 87600 fix has disallowed reloading user hard registers to resolve
earlyclobber-induced conflict.
However before reload, recog completely ignores the constraints of
insns, so the RTL passes may produce insns where some user hard
registers violate an earlyclobber. Then we'll get an ICE without
reloading them, like what we are recently encountering in LoongArch test
suite.
IIUC "recog does not look at constraints until reload" has been a
well-established rule in GCC for years and I don't have enough skill to
challenge it. So reallow reloading user hard registers (but still
disallow doing so for asm) to fix the ICE.
gcc/ChangeLog:
PR rtl-optimization/120983
* lra-constraints.cc (process_alt_operands): Allow reloading
user hard registers unless the insn is an asm.
Xi Ruoyao [Tue, 8 Jul 2025 06:07:21 +0000 (14:07 +0800)]
testsuite: Enable the PR 87600 tests for LoongArch
I'm going to refine a part of the PR 87600 fix which seems to trigger
PR 120983, from which LoongArch is particularly suffering. Enable the
PR 87600 tests so I'll not regress PR 87600.
Fortran/OpenACC: Permit PARAMETER as 'var' in clauses (+ ignore)
It turned out that other compilers permit (require?) named constants
to appear in clauses - and programs actually use this. OpenACC 3.4
therefore added the following:
In this spec, a _var_ (in italics) is one of the following:
...
* a named constant in Fortran.
plus
If during an optimization phase _var_ is removed by the compiler,
appearances of var in data clauses are ignored.
Thus, all errors related to PARAMETER are now downgraded, most
to a -Wsurprising warning, but for 'acc declare device_resident'
(which kind of makes sense), no warning is printed.
In trans-openmp.cc, those are ignored, unless I missed some code
path. (If so, I hope the middle end removes them; but before
removing them for the covered cases, the program just compiled &
linked fine.)
Note that 'ignore PARAMETER inside clauses' in trans-openmp.cc
would in principle also apply to expressions ('if (var)') but
those should be evaluated during 'resolve.cc' + 'openmp.cc' to
their (numeric, logical, string) value such that there should
be no issue.
gcc/fortran/ChangeLog:
* invoke.texi (-Wsurprising): Note about OpenACC warning
related to PARAMETER.
* openmp.cc (resolve_omp_clauses, gfc_resolve_oacc_declare):
Accept PARAMETER for OpenACC but add surprising warning.
* trans-openmp.cc (gfc_trans_omp_variable_list,
gfc_trans_omp_clauses): Ignore PARAMETER inside clauses.
gcc/testsuite/ChangeLog:
* gfortran.dg/goacc/parameter.f95: Add -Wsurprising flag
and update expected diagnostic.
* gfortran.dg/goacc/parameter-3.f90: New test.
* gfortran.dg/goacc/parameter-4.f90: New test.
David Malcolm [Fri, 11 Jul 2025 18:58:21 +0000 (14:58 -0400)]
diagnostics: add support for directed graphs; use them for state graphs
In r16-1631-g2334d30cd8feac I added support for capturing state
information from -fanalyzer in XML form, and a way to visualize
these states in HTML output. The data was optionally captured in SARIF
output (with "xml-state=yes"), stashing the XML in string form in
a property bag.
This worked, but there was no way to round-trip the stored data back
from SARIF without adding an XML parser to GCC, which I don't want to
do.
SARIF supports capturing directed graphs, so this patch:
(a) adds a new namespace diagnostics::digraphs, with classes digraph,
node, and edge, representing directed graphs in a form similar to
what SARIF can serialize
(b) adds support to GCC's diagnostic subsystem for reporting graphs,
either "globally" or as part of a diagnostic. An example in a testsuite
plugin emits an error that has a couple of dummy graphs associated with
it, and captures the optimization passes as a digraph "globally".
Graphs are ignored by text sinks, but are captured by sarif sinks,
and the "experimental-html" sink gains SVG-based rendering of any graphs
using dot. This HTML output is rather crude; an example can be seen
here:
https://dmalcolm.fedorapeople.org/gcc/2025-07-10/diagnostic-test-graphs-html.c.html
(c) adds support to libgdiagnostics for the above
(d) adds support to sarif-replay for the above (round-tripping any
graph information)
(e) replaces the XML representation of state with a representation
based on the above directed graphs, using property bags to stash
additional information (e.g. "this is an on-stack buffer")
(f) implements round-tripping of this information in sarif-replay
To summarize:
- previously we could generate HTML diagrams for debugging
-fanalyzer directly from gcc, but not from stored .sarif output.
- with this patch, we can generate such HTML diagrams both directly
*and* from stored .sarif output (provided the SARIF sink was created
with "state-graphs=yes")
Examples of HTML output can be seen here:
https://dmalcolm.fedorapeople.org/gcc/2025-07-10/
where, as before, j/k can be used to cycle through the events. This is
almost identical to the output from the old XML-based
implementation seen at:
https://dmalcolm.fedorapeople.org/gcc/2025-06-23/
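To make the shape described in (a) and (e) concrete, here is a minimal
sketch of a directed graph with id-addressed nodes, edges, and a
per-node property bag. All names in it (sketch::digraph, add_node,
add_edge, get_node_by_id) are hypothetical and chosen for illustration
only; they are not the classes declared in diagnostic-digraphs.h. The
two failure cases mirror the duplicate-node-id and unrecognized-node-id
conditions named by the new sarif-replay tests.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical node: an id unique within its graph, an optional label,
// and a string-to-string "property bag" for additional information.
struct node
{
  std::string id;
  std::string label;
  std::map<std::string, std::string> properties;
};

// Hypothetical edge between two nodes of the same graph.
struct edge
{
  const node *src;
  const node *dst;
};

class digraph
{
public:
  // Add a node; returns nullptr if ID is already in use.
  node *add_node (const std::string &id)
  {
    if (m_nodes_by_id.count (id))
      return nullptr;
    m_nodes.push_back (std::make_unique<node> ());
    node *n = m_nodes.back ().get ();
    n->id = id;
    m_nodes_by_id[id] = n;
    return n;
  }

  // Look up a node by id; returns nullptr if there is no such node.
  node *get_node_by_id (const std::string &id) const
  {
    auto it = m_nodes_by_id.find (id);
    return it == m_nodes_by_id.end () ? nullptr : it->second;
  }

  // Add an edge; returns false if either endpoint id is unknown.
  bool add_edge (const std::string &src_id, const std::string &dst_id)
  {
    node *src = get_node_by_id (src_id);
    node *dst = get_node_by_id (dst_id);
    if (!src || !dst)
      return false;
    m_edges.push_back (edge {src, dst});
    return true;
  }

  std::size_t num_nodes () const { return m_nodes.size (); }
  std::size_t num_edges () const { return m_edges.size (); }

private:
  std::vector<std::unique_ptr<node>> m_nodes;
  std::map<std::string, node *> m_nodes_by_id;
  std::vector<edge> m_edges;
};

} // namespace sketch
```

A state graph in this model would be one such digraph per event, with
facts like "this is an on-stack buffer" stored as entries in the
relevant node's properties map.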
gcc/ChangeLog:
* Makefile.in (OBJS-libcommon): Add diagnostic-digraphs.o and
diagnostic-state-graphs.o.
gcc/ChangeLog:
* diagnostic-format-html.cc: Include "diagnostic-format-sarif.h",
Replace include of "diagnostic-state.h" with includes of
"diagnostic-digraphs.h" and "diagnostic-state-graphs.h".
(html_generation_options::html_generation_options): Update for
field renaming.
(html_builder::m_body_element): New field.
(html_builder::html_builder): Initialize m_body_element.
(html_builder::maybe_make_state_diagram): Port from XML
implementation to state graph implementation.
(html_builder::make_element_for_diagnostic): Add any
per-diagnostic graphs.
(html_builder::add_graph): New.
(html_builder::emit_global_graph): New.
(html_output_format::report_global_digraph): New.
* diagnostic-format-html.h
(html_generation_options::m_show_state_diagram_xml): Replace
with...
(html_generation_options::m_show_state_diagrams_sarif): ...this.
(html_generation_options::m_show_state_diagram_dot_src): Rename
to...
(html_generation_options::m_show_state_diagrams_dot_src): ...this.
* diagnostic-format-sarif.cc: Include "diagnostic-digraphs.h" and
"diagnostic-state-graphs.h".
(sarif_builder::m_run_graphs): New field.
(sarif_result::on_nested_diagnostic): Update call to
make_location_object to pass arg by pointer.
(sarif_builder::sarif_builder): Initialize m_run_graphs.
(sarif_builder::report_global_digraph): New.
(sarif_builder::make_result_object): Add any graphs to
the result object.
(sarif_builder::make_locations_arr): Update call to
make_location_object to pass arg by pointer.
(sarif_builder::make_location_object): Pass param "loc_mgr" by
pointer rather than by reference so that it can be null, and
handle this case.
(copy_any_property_bag): New.
(make_sarif_graph): New.
(make_sarif_node): New.
(make_sarif_edge): New.
(sarif_property_bag::set_graph): New.
(populate_thread_flow_location_object): Port from XML
implementation to state graph implementation.
(make_run_object): Store any graphs.
(sarif_output_format::report_global_digraph): New.
(sarif_generation_options::sarif_generation_options): Rename
m_xml_state to m_state_graph.
(selftest::test_make_location_object): Update for change to
make_location_object.
* diagnostic-format-sarif.h:
(sarif_generation_options::m_xml_state): Replace with...
(sarif_generation_options::m_state_graph): ...this.
(class sarif_location_manager): Add forward decl.
(diagnostics::digraphs::digraph): New forward decl.
(diagnostics::digraphs::node): New forward decl.
(diagnostics::digraphs::edge): New forward decl.
(sarif_property_bag::set_graph): New decl.
(class sarif_graph): New.
(class sarif_node): New.
(class sarif_edge): New.
(make_sarif_graph): New decl.
(make_sarif_node): New decl.
(make_sarif_edge): New decl.
* diagnostic-format-text.h
(diagnostic_text_output_format::report_global_digraph): New.
* diagnostic-format.h
(diagnostic_output_format::report_global_digraph): New vfunc.
* diagnostic-digraphs.cc: New file.
* diagnostic-digraphs.h: New file.
* diagnostic-metadata.h (diagnostics::digraphs::lazy_digraphs):
New forward decl.
(diagnostic_metadata::diagnostic_metadata): Initialize
m_lazy_digraphs.
(diagnostic_metadata::set_lazy_digraphs): New.
(diagnostic_metadata::get_lazy_digraphs): New.
(diagnostic_metadata::m_lazy_digraphs): New field.
* diagnostic-output-spec.cc (sarif_scheme_handler::make_sink):
Update for XML to state graph changes.
(sarif_scheme_handler::make_sarif_gen_opts): Likewise.
(html_scheme_handler::make_sink): Rename "show-state-diagram-xml"
to "show-state-diagrams-sarif" and use pluralization consistently.
* diagnostic-path.cc: Replace include of "xml.h" with
"diagnostic-state-graphs.h".
(diagnostic_event::maybe_make_xml_state): Replace with...
(diagnostic_event::maybe_make_diagnostic_state_graph): ...this.
* diagnostic-path.h (diagnostics::digraphs::digraph): New forward
decl.
(diagnostic_event::maybe_make_xml_state): Replace with...
(diagnostic_event::maybe_make_diagnostic_state_graph): ...this.
* diagnostic-state-graphs.cc: New file.
* diagnostic-state-graphs.h: New file.
* diagnostic-state-to-dot.cc: Port implementation from XML to
state graphs.
* diagnostic-state.h: Deleted file.
* diagnostic.cc (diagnostic_context::report_global_digraph): New.
* diagnostic.h (diagnostics::digraphs::lazy_digraph): New forward
decl.
(diagnostic_context::report_global_digraph): New decl.
* doc/analyzer.texi (Debugging the Analyzer): Update to reflect
change from XML to state graphs.
* doc/invoke.texi ("sarif" diagnostics sink): Replace "xml-state"
with "state-graphs".
("experimental-html" diagnostics sink): Replace
"show-state-diagrams-xml" with "show-state-diagrams-sarif"
* doc/libgdiagnostics/topics/compatibility.rst
(LIBGDIAGNOSTICS_ABI_3): New.
* doc/libgdiagnostics/topics/graphs.rst: New file.
* doc/libgdiagnostics/topics/index.rst: Add graphs.rst.
* graphviz.h (node_id::operator=): New.
* json.h (json::value::dyn_cast_string): New.
(json::object::get_num_keys): New accessor.
(json::object::get_key): New accessor.
(json::string::dyn_cast_string): New.
* libgdiagnostics++.h (class libgdiagnostics::graph): New.
(class libgdiagnostics::node): New.
(class libgdiagnostics::edge): New.
(class libgdiagnostics::diagnostic::take_graph): New.
(class libgdiagnostics::manager::take_global_graph): New.
(class libgdiagnostics::graph::set_description): New.
(class libgdiagnostics::graph::get_node_by_id): New.
(class libgdiagnostics::graph::get_edge_by_id): New.
(class libgdiagnostics::graph::add_edge): New.
(class libgdiagnostics::node::set_label): New.
(class libgdiagnostics::node::set_location): New.
(class libgdiagnostics::node::set_logical_location): New.
* libgdiagnostics-private.h: New file.
* libgdiagnostics.cc: Define INCLUDE_STRING. Include
"diagnostic-digraphs.h", "diagnostic-state-graphs.h", and
"libgdiagnostics-private.h".
(struct diagnostic_graph): New.
(struct diagnostic_node): New.
(struct diagnostic_edge): New.
(libgdiagnostics_path_event::libgdiagnostics_path_event): Add
state_graph param.
(libgdiagnostics_path_event::maybe_make_diagnostic_state_graph):
New.
(libgdiagnostics_path_event::m_state_graph): New field.
(diagnostic_execution_path::add_event_va): Add state_graph param.
(class prebuilt_digraphs): New.
(diagnostic::diagnostic): Use m_graphs in m_metadata.
(diagnostic::take_graph): New.
(diagnostic::get_graphs): New accessor.
(diagnostic::m_graphs): New field.
(diagnostic_manager::take_global_graph): New.
(diagnostic_execution_path_add_event): Update for new param to
add_event_va.
(diagnostic_execution_path_add_event_va): Likewise.
(diagnostic_graph::add_node_with_id): New public entrypoint.
(diagnostic_graph::add_edge_with_label): New public entrypoint.
(diagnostic_manager_new_graph): New public entrypoint.
(diagnostic_manager_take_global_graph): New public entrypoint.
(diagnostic_take_graph): New public entrypoint.
(diagnostic_graph_release): New public entrypoint.
(diagnostic_graph_set_description): New public entrypoint.
(diagnostic_graph_add_node): New public entrypoint.
(diagnostic_graph_add_edge): New public entrypoint.
(diagnostic_graph_get_node_by_id): New public entrypoint.
(diagnostic_graph_get_edge_by_id): New public entrypoint.
(diagnostic_node_set_location): New public entrypoint.
(diagnostic_node_set_label): New public entrypoint.
(diagnostic_node_set_logical_location): New public entrypoint.
(private_diagnostic_execution_path_add_event_2): New private
entrypoint.
(private_diagnostic_graph_set_property_bag): New private
entrypoint.
(private_diagnostic_node_set_property_bag): New private
entrypoint.
(private_diagnostic_edge_set_property_bag): New private
entrypoint.
* libgdiagnostics.h (diagnostic_graph): New typedef.
(diagnostic_node): New typedef.
(diagnostic_edge): New typedef.
(diagnostic_manager_new_graph): New decl.
(diagnostic_manager_take_global_graph): New decl.
(diagnostic_take_graph): New decl.
(diagnostic_graph_release): New decl.
(diagnostic_graph_set_description): New decl.
(diagnostic_graph_add_node): New decl.
(diagnostic_graph_add_edge): New decl.
(diagnostic_graph_get_node_by_id): New decl.
(diagnostic_graph_get_edge_by_id): New decl.
(diagnostic_node_set_label): New decl.
(diagnostic_node_set_location): New decl.
(diagnostic_node_set_logical_location): New decl.
* libgdiagnostics.map (LIBGDIAGNOSTICS_ABI_3): New.
* libsarifreplay.cc: Include "libgdiagnostics-private.h".
(id_map): New "using".
(sarif_replayer::report_invalid_sarif): Update for change to
report_problem params.
(sarif_replayer::report_unhandled_sarif): Likewise.
(sarif_replayer::report_note): New.
(sarif_replayer::report_problem): Pass param "ref" by
pointer rather than reference and handle it being null.
(sarif_replayer::maybe_get_property_bag): New.
(sarif_replayer::maybe_get_property_bag_value): New.
(sarif_replayer::handle_run_obj): Handle run-level "graphs" as per
§3.14.20.
(sarif_replayer::handle_result_obj): Handle result-level "graphs"
as per §3.27.19.
(handle_thread_flow_location_object): Optionally handle graphs
stored in property "gcc/diagnostic_event/state_graph" as state
graphs.
(sarif_replayer::handle_graph_object): New.
(sarif_replayer::handle_node_object): New.
(sarif_replayer::handle_edge_object): New.
(sarif_replayer::get_graph_node_by_id_property): New.
* selftest-run-tests.cc (selftest::run_tests): Call
selftest::diagnostic_graph_cc_tests and
selftest::diagnostic_state_graph_cc_tests.
* selftest.h (selftest::diagnostic_graph_cc_tests): New decl.
(selftest::diagnostic_state_graph_cc_tests): New decl.
gcc/analyzer/ChangeLog:
* ana-state-to-diagnostic-state.cc: Reimplement, replacing
XML-based implementation with one based on state graphs.
* ana-state-to-diagnostic-state.h: Likewise.
* checker-event.cc: Replace include of "xml.h" with include of
"diagnostic-state-graphs.h".
(checker_event::maybe_make_xml_state): Replace with...
(checker_event::maybe_make_diagnostic_state_graph): ...this.
* checker-event.h: Add include of "diagnostic-digraphs.h".
(checker_event::maybe_make_xml_state): Replace decl with...
(checker_event::maybe_make_diagnostic_state_graph): ...this.
* engine.cc (exploded_node::on_stmt_pre): Replace
"_analyzer_dump_xml" with "__analyzer_dump_sarif".
* program-state.cc: Replace include of "diagnostic-state.h" with
"diagnostic-state-graphs.h".
(program_state::dump_dot): Port from XML to state graphs.
* program-state.h: Drop redundant forward decl of xml::document.
(program_state::make_xml): Replace decl with...
(program_state::make_diagnostic_state_graph): ...this.
(program_state::dump_xml_to_pp): Drop decl.
(program_state::dump_xml_to_file): Drop decl.
(program_state::dump_xml): Drop decl.
(program_state::dump_dump_sarif): New decl.
* sm-malloc.cc (get_dynalloc_state_for_state): New.
(malloc_state_machine::add_state_to_xml): Replace with...
(malloc_state_machine::add_state_to_state_graph): ...this.
* sm.cc (state_machine::add_state_to_xml): Replace with...
(state_machine::add_state_to_state_graph): ...this.
(state_machine::add_global_state_to_xml): Replace with...
(state_machine::add_global_state_to_state_graph): ...this.
* sm.h (class xml_state): Drop forward decl.
(class analyzer_state_graph): New forward decl.
(state_machine::add_state_to_xml): Replace decl with...
(state_machine::add_state_to_state_graph): ...this.
(state_machine::add_global_state_to_xml): Replace decl with...
(state_machine::add_global_state_to_state_graph): ...this.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/state-diagram-1-sarif.py (test_xml_state):
Rename to...
(test_state_graph): ...this. Port from XML to SARIF graphs.
* gcc.dg/analyzer/state-diagram-1.c: Update sink option
from "sarif:xml-state=yes" to "sarif:state-graphs=yes".
* gcc.dg/analyzer/state-diagram-5-sarif.c: Likewise.
* gcc.dg/analyzer/state-diagram-5-sarif.py: Drop import of ET.
(test_nested_types_in_xml_state): Rename to...
(test_nested_types_in_state_graph): ...this. Port from XML to
SARIF graphs.
* gcc.dg/plugin/diagnostic-test-graphs-html.c: New test.
* gcc.dg/plugin/diagnostic-test-graphs-html.py: New test script.
* gcc.dg/plugin/diagnostic-test-graphs-sarif.c: New test.
* gcc.dg/plugin/diagnostic-test-graphs-sarif.py: New test script.
* gcc.dg/plugin/diagnostic-test-graphs.c: New test.
* gcc.dg/plugin/diagnostic_plugin_test_graphs.cc: New test plugin.
* gcc.dg/plugin/plugin.exp (plugin_test_list): Add the above.
* lib/sarif.py (get_xml_state): Delete.
(get_state_graph): New.
(def get_state_node_attr): New.
(get_state_node_kind): New.
(get_state_node_name): New.
(get_state_node_type): New.
(get_state_node_value): New.
* sarif-replay.dg/2.1.0-invalid/3.40.2-duplicate-node-id.sarif:
New test.
* sarif-replay.dg/2.1.0-invalid/3.41.4-unrecognized-node-id.sarif:
New test.
* sarif-replay.dg/2.1.0-valid/graphs-check-html.py: New test
script.
* sarif-replay.dg/2.1.0-valid/graphs-check-sarif-roundtrip.py: New
test script.
* sarif-replay.dg/2.1.0-valid/graphs.sarif: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Fri, 11 Jul 2025 18:58:20 +0000 (14:58 -0400)]
json: add json::value::clone
gcc/ChangeLog:
* json.cc (json::object::clone): New.
(json::object::clone_as_object): New.
(json::array::clone): New.
(json::float_number::clone): New.
(json::integer_number::clone): New.
(json::string::clone): New.
(json::literal::clone): New.
(selftest::test_cloning): New test.
(selftest::json_cc_tests): Call it.
* json.h (json::value::clone): New vfunc.
(json::object::clone): New decl.
(json::object::clone_as_object): New decl.
(json::array::clone): New decl.
(json::float_number::clone): New decl.
(json::integer_number::clone): New decl.
(json::string::clone): New decl.
(json::literal::clone): New decl.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Fri, 11 Jul 2025 18:58:20 +0000 (14:58 -0400)]
json: fix null-termination of json::string
gcc/ChangeLog:
* json.cc (string::string): When constructing from pointer and
length, ensure the new buffer is null-terminated.
(selftest::test_strcmp): New.
(selftest::json_cc_tests): Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
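As an illustration of the kind of fix described for string::string, the
sketch below copies a (pointer, length) pair into a buffer that is one
byte larger and writes a terminator, so that the result can safely be
handed to strcmp. The class owned_string and its accessors are
hypothetical stand-ins, not the json::string class itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a string value that owns a copy of its bytes.
class owned_string
{
public:
  // Copy LEN bytes from UTF8 (which need not be null-terminated) and
  // append a terminator so that get_string () is usable with strcmp.
  owned_string (const char *utf8, std::size_t len)
  : m_utf8 (new char[len + 1]), m_len (len)
  {
    std::memcpy (m_utf8, utf8, len);
    m_utf8[len] = '\0';
  }
  ~owned_string () { delete[] m_utf8; }

  // Copying is disabled to keep ownership of the buffer simple.
  owned_string (const owned_string &) = delete;
  owned_string &operator= (const owned_string &) = delete;

  const char *get_string () const { return m_utf8; }
  std::size_t get_length () const { return m_len; }

private:
  char *m_utf8;
  std::size_t m_len;
};
```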
David Malcolm [Fri, 11 Jul 2025 18:58:20 +0000 (14:58 -0400)]
libgdiagnostics: doc fixes
gcc/ChangeLog:
* doc/libgdiagnostics/topics/compatibility.rst
(_LIBGDIAGNOSTICS_ABI_2): Add missing anchor.
* doc/libgdiagnostics/topics/diagnostic-manager.rst
(diagnostic_manager_add_sink_from_spec): Add links to GCC's
documentation of "-fdiagnostics-add-output=". Fix parameter
markup.
(diagnostic_manager_set_analysis_target): Fix parameter markup.
Add link to SARIF spec.
* doc/libgdiagnostics/topics/logical-locations.rst: Markup fix.
* doc/libgdiagnostics/tutorial/02-physical-locations.rst: Clarify
wording of what "the source file" means, and that a range can't
have multiple files.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
[PR121007, LRA]: Fall back to reload of whole inner address in PR case and constrain iteration number of address reloads
gcc/ChangeLog:
* lra-constraints.cc (process_address_1): When changing base reg
on a reg of the base class, fall back to reload of whole inner address.
(process_address): Constrain the iteration number.
The following patch implements the compiler side of the C++26 paper
P2786R13 - Trivial Relocatability.
Based on the https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119064#c3
feedback, the patch enables the new conditional keywords
trivially_relocatable_if_eligible and replaceable_if_eligible only
for C++26, for older versions those conditional keywords yield
-Wc++26-compat warning and are treated as normal identifiers.
Plus __trivially_relocatable_if_eligible and __replaceable_if_eligible
are handled as conditional keywords always without diagnostics (similarly
to __final in C++98).
The patch uses the __builtin_ prefix on the new traits, but unlike clang,
which for some weird reason chose to name one __builtin_is_replaceable
and another __builtin_is_cpp_trivially_relocatable, this one uses
__builtin_is_replaceable and __builtin_is_trivially_relocatable.
I'll try to convince clang to change; they've only implemented it
recently.
The patch computes these properties on demand, only when something needs
them (at the expense of eating 2 more bits per lang_type, but I've recently
saved 64 bits and a patch to save another 64 bits is pending; and even
4 bits wouldn't fit).
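The "on demand" scheme with two bits per property can be pictured as
follows. This is only a sketch of the caching pattern: class_info,
has_trivial_move_and_dtor and the counter are hypothetical, and the
placeholder predicate stands in for the real eligibility rules, which
are considerably more involved.

```cpp
#include <cassert>

// Hypothetical per-class record: one bit caches the answer and a second
// bit records whether the answer has been computed yet.
struct class_info
{
  unsigned trivially_relocatable : 1;
  unsigned trivially_relocatable_computed : 1;
  // Placeholder for whatever the real analysis inspects.
  bool has_trivial_move_and_dtor;
};

// Counts how often the (potentially expensive) analysis runs.
int analysis_runs = 0;

// Placeholder analysis; the real rules are not modelled here.
bool compute_trivially_relocatable (const class_info &ci)
{
  ++analysis_runs;
  return ci.has_trivial_move_and_dtor;
}

// Query that computes the property on first use and caches it.
bool trivially_relocatable_p (class_info &ci)
{
  if (!ci.trivially_relocatable_computed)
    {
      ci.trivially_relocatable = compute_trivially_relocatable (ci);
      ci.trivially_relocatable_computed = 1;
    }
  return ci.trivially_relocatable;
}
```

Classes that are never queried pay only for the two bits, and repeated
queries on the same class run the analysis once.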
The patch doesn't add the __builtin_trivially_relocate builtin that clang
has; std::trivially_relocate is not constexpr and I think we don't need it
for now. At least until we implement some kind of vtable pointer signing,
__builtin_memmove should do the job. Especially if libstdc++ will, for clang
compatibility, use the builtin if available and __builtin_memmove otherwise,
we can switch any time.
I've cross-tested all testcases also against the clang++ trunk
implementation, and both compilers agreed in everything except for
https://github.com/llvm/llvm-project/issues/143599
where clang++ was changed already and
https://github.com/llvm/llvm-project/issues/144232
where I believe clang++ got it wrong too.
The first testcase comes from
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p2786r13.html#simple-worked-examples
just tweaked so that the classes are named differently each time and that it
compiles. There are 3 differences between the paper and the g++ and
clang++ implementations; I've added comments to
trivially-relocatable1.C, but I think either that part of the paper wasn't
updated for the later changes or it got it wrong.
2025-07-11 Jakub Jelinek <jakub@redhat.com>
PR c++/119064
gcc/
* doc/invoke.texi (Wc++26-compat): Document.
gcc/c-family/
* c.opt (Wc++26-compat): New option.
* c.opt.urls: Regenerate.
* c-opts.cc (c_common_post_options): Clear warn_cxx26_compat for
C++26 or later.
* c-cppbuiltin.cc (c_cpp_builtins): For C++26 predefine
__cpp_trivial_relocatability=202502L.
gcc/cp/
* cp-tree.h: Implement C++26 P2786R13 - Trivial Relocatability.
(struct lang_type): Add trivially_relocatable,
trivially_relocatable_computed, replaceable and replaceable_computed
bitfields. Change width of dummy from 2 to 30.
(CLASSTYPE_TRIVIALLY_RELOCATABLE_BIT,
CLASSTYPE_TRIVIALLY_RELOCATABLE_COMPUTED, CLASSTYPE_REPLACEABLE_BIT,
CLASSTYPE_REPLACEABLE_COMPUTED): Define.
(enum virt_specifier): Add VIRT_SPEC_TRIVIALLY_RELOCATABLE_IF_ELIGIBLE
and VIRT_SPEC_REPLACEABLE_IF_ELIGIBLE enumerators.
(trivially_relocatable_type_p, replaceable_type_p): Declare.
* cp-trait.def (IS_NOTHROW_RELOCATABLE, IS_REPLACEABLE,
IS_TRIVIALLY_RELOCATABLE): New traits.
* parser.cc (cp_parser_class_property_specifier_seq_opt): Handle
trivially_relocatable_if_eligible,
__trivially_relocatable_if_eligible, replaceable_if_eligible and
__replaceable_if_eligible.
(cp_parser_class_head): Set CLASSTYPE_REPLACEABLE_BIT
and/or CLASSTYPE_TRIVIALLY_RELOCATABLE_BIT if corresponding
conditional keywords were parsed and assert corresponding *_COMPUTED
macro is false.
* pt.cc (instantiate_class_template): Copy over also
CLASSTYPE_TRIVIALLY_RELOCATABLE_{BIT,COMPUTED} and
CLASSTYPE_REPLACEABLE_{BIT,COMPUTED} bits.
* semantics.cc (referenceable_type_p): Move definition earlier.
(trait_expr_value): Handle CPTK_IS_NOTHROW_RELOCATABLE,
CPTK_IS_REPLACEABLE and CPTK_IS_TRIVIALLY_RELOCATABLE.
(finish_trait_expr): Likewise.
* tree.cc (default_movable_type_p): New function.
(union_with_no_declared_special_member_fns): Likewise.
(trivially_relocatable_type_p): Likewise.
(replaceable_type_p): Likewise.
* constraint.cc (diagnose_trait_expr): Handle
CPTK_IS_NOTHROW_RELOCATABLE, CPTK_IS_REPLACEABLE and
CPTK_IS_TRIVIALLY_RELOCATABLE.
gcc/testsuite/
* g++.dg/cpp26/feat-cxx26.C: Add test for
__cpp_trivial_relocatability.
* g++.dg/cpp26/trivially-relocatable1.C: New test.
* g++.dg/cpp26/trivially-relocatable2.C: New test.
* g++.dg/cpp26/trivially-relocatable3.C: New test.
* g++.dg/cpp26/trivially-relocatable4.C: New test.
* g++.dg/cpp26/trivially-relocatable5.C: New test.
* g++.dg/cpp26/trivially-relocatable6.C: New test.
* g++.dg/cpp26/trivially-relocatable7.C: New test.
* g++.dg/cpp26/trivially-relocatable8.C: New test.
* g++.dg/cpp26/trivially-relocatable9.C: New test.
* g++.dg/cpp26/trivially-relocatable10.C: New test.
* g++.dg/cpp26/trivially-relocatable11.C: New test.
aarch64: Tweak handling of general SVE permutes [PR121027]
This PR is partly about a code quality regression that was triggered
by g:caa7a99a052929d5970677c5b639e1fa5166e334. That patch taught the
gimple optimisers to fold two VEC_PERM_EXPRs into one, conditional
upon either (a) the original permutations not being "native" operations
or (b) the combined permutation being a "native" operation.
Whether something is a "native" operation is tested by calling
can_vec_perm_const_p with allow_variable_p set to false. This requires
the permutation to be supported directly by TARGET_VECTORIZE_VEC_PERM_CONST,
rather than falling back to the general vec_perm optab.
This exposed a problem with the way that we handled general 2-input
permutations for SVE. Unlike Advanced SIMD, base SVE does not have
an instruction to do general 2-input permutations. We do still implement
the vec_perm optab for SVE, but only when the vector length is known at
compile time. The general expansion is pretty expensive: an AND, a SUB,
two TBLs, and an ORR. It certainly couldn't be considered a "native"
operation.
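The cost of the general expansion can be seen in a small scalar model.
The sketch below assumes that a single-vector table lookup returns zero
for out-of-range indices, and that the five operations are combined in
the way shown; that mapping is my reading of the list above rather than
a transcription of the aarch64 expander. The names N, vec, tbl and perm2
are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t N = 4;  // elements per vector (illustrative)
using vec = std::array<std::uint8_t, N>;

// Model of a single-vector table lookup: out-of-range indices give 0.
vec tbl (const vec &data, const vec &idx)
{
  vec out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = idx[i] < N ? data[idx[i]] : 0;
  return out;
}

// Two-input permute built from an AND, a SUB, two TBLs and an ORR.
vec perm2 (const vec &a, const vec &b, vec sel)
{
  vec sel_b{};
  for (std::size_t i = 0; i < N; ++i)
    {
      // AND: wrap the selector into [0, 2N).
      sel[i] = static_cast<std::uint8_t> (sel[i] & (2 * N - 1));
      // SUB: rebase so that [N, 2N) indexes B; smaller values wrap
      // around to large, out-of-range indices.
      sel_b[i] = static_cast<std::uint8_t> (sel[i] - N);
    }
  vec from_a = tbl (a, sel);    // zero where the selector picks B
  vec from_b = tbl (b, sel_b);  // zero where the selector picks A
  vec out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = static_cast<std::uint8_t> (from_a[i] | from_b[i]);  // ORR
  return out;
}
```

For example, with a = {10, 11, 12, 13}, b = {20, 21, 22, 23} and
selector {0, 5, 2, 7}, perm2 yields {10, 21, 12, 23}.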
However, if a VEC_PERM_EXPR has a constant selector, the indices can
be wider than the elements being permuted. This is not true for the
vec_perm optab, where the indices and permuted elements must have the
same precision.
This leads to one case where we cannot leave a general 2-input permutation
to be handled by the vec_perm optab: when permuting bytes on a target
with 2048-bit vectors. In that case, the indices of the elements in
the second vector are in the range [256, 511], which cannot be stored
in a byte index.
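The arithmetic behind this case is easy to check. The helper names below
are illustrative only; the sketch just computes the largest selector
index of a two-input byte permute for a given vector length and compares
it with the range of a byte.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Number of byte elements in a vector of VECTOR_BITS bits.
constexpr unsigned bytes_per_vector (unsigned vector_bits)
{
  return vector_bits / 8;
}

// Largest selector index for a two-input permute: the second input's
// elements are numbered [nunits, 2 * nunits - 1].
constexpr unsigned max_two_input_index (unsigned nunits)
{
  return 2 * nunits - 1;
}

// True if every index of a two-input byte permute fits in a byte.
constexpr bool indices_fit_in_byte (unsigned vector_bits)
{
  return max_two_input_index (bytes_per_vector (vector_bits))
         <= std::numeric_limits<std::uint8_t>::max ();
}

static_assert (indices_fit_in_byte (1024), "128 bytes: indices up to 255");
static_assert (!indices_fit_in_byte (2048), "256 bytes: indices up to 511");
```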
TARGET_VECTORIZE_VEC_PERM_CONST therefore has to handle 2-input SVE
permutations for one specific case. Rather than check for that
specific case, the code went ahead and used the vec_perm expansion
whenever it worked. But that undermines the !allow_variable_p
handling in can_vec_perm_const_p; it becomes impossible for
target-independent code to distinguish "native" operations from
the worst-case fallback.
This patch instead limits TARGET_VECTORIZE_VEC_PERM_CONST to the
cases that it has to handle. It fixes the PR for all vector lengths
except 2048 bits.
A better fix would be to introduce some sort of costing mechanism,
which would allow us to reject the new VEC_PERM_EXPR even for
2048-bit targets. But that would be a significant amount of work
and would not be backportable.
gcc/
PR target/121027
* config/aarch64/aarch64.cc (aarch64_evpc_sve_tbl): Punt on 2-input
operations that can be handled by vec_perm.
gcc/testsuite/
PR target/121027
* gcc.target/aarch64/sve/acle/general/perm_1.c: New test.
To handle DImode BCAX operations we want to do them on the SIMD side only if
the incoming arguments don't require a cross-bank move.
This means we need to split back the combination to separate GP BIC+EOR
instructions if the operands are expected to be in GP regs through reload.
The split happens pre-reload if we already know that the destination will be
a GP reg. Otherwise, if reload decides to use the "=r,r" alternative, we ensure
operand 0 is early-clobber.
This scheme is similar to how we handle the BSL operations elsewhere in
aarch64-simd.md.
Thus, for the functions:
uint64_t bcax_d_gp (uint64_t a, uint64_t b, uint64_t c) { return BCAX (a, b, c); }
uint64x1_t bcax_d (uint64x1_t a, uint64x1_t b, uint64x1_t c) { return BCAX (a, b, c); }
we now generate the desired:
bcax_d_gp:
bic x1, x1, x2
eor x0, x1, x0
ret
bcax_d:
bcax v0.16b, v0.16b, v1.16b, v2.16b
ret
When the inputs are in SIMD regs we use BCAX and when they are in GP regs we
don't force them to SIMD with extra moves.
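Reading the operation off the bic/eor sequence above, the fused form and
the split form compute the same value. The sketch below spells that out
in portable code; the function names are illustrative and the BCAX macro
from the testcase is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// BIC: AND of the first operand with the complement of the second.
constexpr std::uint64_t bic (std::uint64_t x, std::uint64_t y)
{
  return x & ~y;
}

// EOR: bitwise exclusive OR.
constexpr std::uint64_t eor (std::uint64_t x, std::uint64_t y)
{
  return x ^ y;
}

// The fused operation, written directly.
constexpr std::uint64_t bcax_fused (std::uint64_t a, std::uint64_t b,
                                    std::uint64_t c)
{
  return a ^ (b & ~c);
}

// The GP-register split: the same two steps as the bic/eor sequence.
constexpr std::uint64_t bcax_split (std::uint64_t a, std::uint64_t b,
                                    std::uint64_t c)
{
  return eor (bic (b, c), a);
}

static_assert (bcax_fused (0xF0F0, 0xFF00, 0x0FF0) == 0x00F0, "worked example");
```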
Bootstrapped and tested on aarch64-none-linux-gnu.
aarch64: Allow 64-bit vector modes in pattern for BCAX instruction
The BCAX instruction from TARGET_SHA3 only operates on the full .16b form
of the inputs, but as it's a pure bitwise operation we can use it for the 64-bit
modes as well, since there we don't care about the upper 64 bits. This patch
extends the relevant pattern in aarch64-simd.md to accept the 64-bit vector modes.
Thus, for the input:
uint32x2_t
bcax_s (uint32x2_t a, uint32x2_t b, uint32x2_t c)
{
return BCAX (a, b, c);
}
we can now generate:
bcax_s:
bcax v0.16b, v0.16b, v1.16b, v2.16b
ret
instead of the current:
bcax_s:
bic v1.8b, v1.8b, v2.8b
eor v0.8b, v1.8b, v0.8b
ret
This patch doesn't cover the DI/V1DI modes as that would require extending
the bcaxqdi4 pattern with =r,r alternatives and adding splitting logic to
handle the cases where the operands arrive in GP regs. It is doable, but can
be a separate patch. As is, this patch should always be a straightforward
improvement.
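Why the full-width instruction is safe for 64-bit modes can be shown
with a small model: each result bit depends only on the same bit of the
inputs, so whatever sits in the upper halves cannot leak into the lower
half. The type v128 and both function names are illustrative
assumptions, not anything from aarch64-simd.md.

```cpp
#include <cassert>
#include <cstdint>

// Model of a 128-bit SIMD register as two 64-bit halves.
struct v128
{
  std::uint64_t lo;
  std::uint64_t hi;
};

// Full-width operation, computed independently on every bit.
v128 bcax_q (v128 a, v128 b, v128 c)
{
  return v128 {a.lo ^ (b.lo & ~c.lo), a.hi ^ (b.hi & ~c.hi)};
}

// A 64-bit operation done by the full-width one: the upper halves hold
// arbitrary values derived from UPPER and the upper half of the result
// is simply ignored.
std::uint64_t bcax_d_via_q (std::uint64_t a, std::uint64_t b,
                            std::uint64_t c, std::uint64_t upper)
{
  v128 r = bcax_q (v128 {a, upper}, v128 {b, ~upper}, v128 {c, upper * 3});
  return r.lo;
}
```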
Bootstrapped and tested on aarch64-none-linux-gnu.
The following fixes the loop following the reduction chain to
properly visit all SLP nodes involved and makes the stmt info
and the SLP node we track match.
PR tree-optimization/121034
* tree-vect-loop.cc (vectorizable_reduction): Cleanup
reduction chain following code.
Jakub Jelinek [Fri, 11 Jul 2025 11:50:07 +0000 (13:50 +0200)]
libstdc++: Implement C++26 P3748R0 - Inspecting exception_ptr should be constexpr
The following patch makes std::exception_ptr_cast constexpr.
The paper suggests using dynamic_cast, but that only works for
polymorphic exceptions; it doesn't work if they are scalar or
non-polymorphic classes.
Furthermore, the patch adds some static_asserts for
"Mandates: E is a cv-unqualified complete object type. E is not an array type.
E is not a pointer or pointer-to-member type."
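The rethrow-and-catch approach can be sketched in ordinary (non-constexpr)
code. The helper below is hypothetical and is not the libstdc++
implementation; it also assumes that std::rethrow_exception rethrows the
same exception object rather than a copy, which the standard leaves
unspecified, so the returned pointer is only meaningful on
implementations where that holds and only while the exception_ptr keeps
the object alive.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper: returns a pointer to the held exception if it can
// be caught as E, otherwise nullptr.
template <typename E>
const E *sketch_exception_ptr_cast (const std::exception_ptr &p) noexcept
{
  // Approximations of the Mandates quoted above (completeness of E is
  // not checked here).
  static_assert (std::is_object<E>::value
                 && !std::is_const<E>::value
                 && !std::is_volatile<E>::value,
                 "E must be a cv-unqualified object type");
  static_assert (!std::is_array<E>::value, "E must not be an array type");
  static_assert (!std::is_pointer<E>::value
                 && !std::is_member_pointer<E>::value,
                 "E must not be a pointer or pointer-to-member type");

  if (!p)
    return nullptr;  // rethrowing a null exception_ptr is undefined
  try
    {
      std::rethrow_exception (p);
    }
  catch (const E &e)
    {
      return &e;  // matches by the usual handler rules, no RTTI cast
    }
  catch (...)
    {
      return nullptr;
    }
}
```

Because matching is done by a catch handler, this works for scalars such
as int and for non-polymorphic classes, which is the case dynamic_cast
cannot cover.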
2025-07-11 Jakub Jelinek <jakub@redhat.com>
* libsupc++/exception_ptr.h: Implement C++26 P3748R0 - Inspecting
exception_ptr should be constexpr.
(std::exception_ptr_cast): Make constexpr, remove inline keyword. Add
static_asserts for Mandates. For if consteval use std::rethrow_exception,
catch it and return its address or nullptr.
* testsuite/18_support/exception_ptr/exception_ptr_cast.cc (E::~E): Add
constexpr.
(G::G): Likewise.
(test01): Likewise. Return bool and take bool argument, throw if the
argument is true. Add static_assert(test01(false)).
(main): Call test01(true) in try.
Jakub Jelinek [Fri, 11 Jul 2025 11:43:58 +0000 (13:43 +0200)]
testsuite: Add testcase for already fixed PR [PR120954]
This was a regression introduced by r16-1893 (and its backports) for C++,
though for C it has had a false positive warning for years. Fixed by r16-2000
(and its backports).
2025-07-11 Jakub Jelinek <jakub@redhat.com>
PR c++/120954
* c-c++-common/Warray-bounds-11.c: New test.
Jan Hubicka [Fri, 11 Jul 2025 11:01:13 +0000 (13:01 +0200)]
Rewrite assign_discriminators
To assign debug locations to corresponding statements auto-fdo uses
discriminators. Documentation says that if a given statement belongs to multiple
basic blocks, the discriminator distinguishes them.
The current implementation, however, only works for statements that expand into
a sequence of gimple statements which forms a linear sequence, since it
essentially tracks a current location and renews it each time a new BB is found.
This is commonly not true for C++ code as in:
371 if (this!=simulation.getContextModule())
372 throw cRuntimeError("send()/sendDelayed() of module (%s)%s called in the context of "
373 "module (%s)%s: method called from the latter module "
374 "lacks Enter_Method() or Enter_Method_Silent()? "
375 "Also, if message to be sent is passed from that module, "
376 "you'll need to call take(msg) after Enter_Method() as well",
377 getClassName(), getFullPath().c_str(),
378 simulation.getContextModule()->getClassName(),
379 simulation.getContextModule()->getFullPath().c_str());
Notice that 379:85 is interleaved with 377:45 and the pass does not assign a new discriminator.
With patch we get:
There are earlier statements with line number 379, which is why there is discriminator 7 for the call.
After that the discriminator is increased. There are two reasons for it:
1) AFDO requires every callsite to have a unique lineno:discriminator pair
2) the call may not terminate and thus the profile of the first statement
may be higher than the rest.
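The behaviour described above can be modelled roughly as follows. This
is a deliberately simplified sketch, not the code in tree-cfg.cc: it
keys only on line numbers, and the types stmt and line_state as well as
the function assign_discriminators_sketch are hypothetical. It keeps one
entry per line, gives a line a fresh discriminator when it reappears in
a different basic block, and also bumps the discriminator after a call.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical statement: its basic block, source line, and whether it
// is a call.
struct stmt
{
  int bb;
  int line;
  bool is_call;
};

// Hypothetical per-line state.
struct line_state
{
  int discriminator;  // last discriminator handed out for this line
  int last_bb;        // basic block in which it was handed out
  bool bump_next;     // previous statement on this line was a call
};

// Returns one discriminator per statement, in statement order.
std::vector<int> assign_discriminators_sketch (const std::vector<stmt> &stmts)
{
  std::map<int, line_state> per_line;
  std::vector<int> out;
  for (const stmt &s : stmts)
    {
      auto it = per_line.find (s.line);
      if (it == per_line.end ())
        // First time this line is seen: start at discriminator 0.
        it = per_line.emplace (s.line, line_state {0, s.bb, false}).first;
      else if (it->second.last_bb != s.bb || it->second.bump_next)
        {
          // Same line in another block, or right after a call on this
          // line: hand out a fresh discriminator.
          ++it->second.discriminator;
          it->second.last_bb = s.bb;
        }
      it->second.bump_next = s.is_call;
      out.push_back (it->second.discriminator);
    }
  return out;
}
```

Because the state is kept per line rather than as a single "current
location", a line that is interleaved with another line (as 379 and 377
are in the example) still gets a new discriminator when it shows up
again in a later block.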
The old pass also contained logic to skip debug statements. This is not a good
idea since we output them to the debug output and if the AFDO tool picks these
locations up they will be misplaced in basic blocks.
Debug statements are naturally quite useful to track back the AFDO profiles
and in the meantime LLVM folks implemented something similar called pseudoprobe.
I think it makes sense to enable debug statements with -fauto-profile even if
debug info is off and make use of them as done in this patch.
Sadly the AFDO tool is quite broken and built around the assumption that every
address has at most one debug location assigned to it (i.e. debug info before
debug statements were introduced). I have a WIP patch fixing this.
Note that LLVM also has -fdebug-info-for-auto-profile (on by default it seems)
that controls discriminator production and some other little bits. I wonder if
we want to have something similar. Should it be -gdebug-info-for-auto-profile
instead?
gcc/ChangeLog:
* opts.cc (finish_options): Enable debug_nonbind_markers_p for
auto-profile.
* tree-cfg.cc (struct locus_discrim_map): Remove.
(struct locus_discrim_hasher): Remove.
(locus_discrim_hasher::hash): Remove.
(locus_discrim_hasher::equal): Remove.
(first_non_label_nondebug_stmt): Remove.
(build_gimple_cfg): Do not allocate discriminator tables.
(next_discriminator_for_locus): Remove.
(same_line_p): Remove.
(struct discrim_entry): New structure.
(assign_discriminator): Rewrite.
(assign_discriminators): Rewrite.
Jan Hubicka [Fri, 11 Jul 2025 10:37:24 +0000 (12:37 +0200)]
Fix ICE in speculative devirtualization
This patch fixes ICE building lto1 with autoprofiledbootstrap and in pr114790.
What happens is that auto-fdo speculatively devirtualizes to a wrong target.
This is due to a bug where it mixes up dwarf names and linkage names of inline
functions, which I need to fix as well.
Later we clone at WPA time. At ltrans time clone is materialized and call is
turned into a direct call (this optimization is missed by ipa-cp propagation).
At this time we should resolve speculation but we don't. As a result we get
error from verifier after inlining complaining that there is speculative call
with corresponding direct call lacking speculative flag.
This seems to be a long-standing problem in cgraph_update_edges_for_call_stmt_node but
I suppose it does not trigger since we usually speculate correctly or notice
the direct call at WPA time already.
Bootstrapped/regtested x86_64-linux.
gcc/ChangeLog:
PR ipa/114790
* cgraph.cc (cgraph_update_edges_for_call_stmt_node): Resolve devirtualization
if call statement was optimized out or turned to direct call.
gcc/testsuite/ChangeLog:
* g++.dg/lto/pr114790_0.C: New test.
* g++.dg/lto/pr114790_1.C: New test.
Jakub Jelinek [Fri, 11 Jul 2025 10:09:44 +0000 (12:09 +0200)]
ipa: Disallow signature changes in fun->has_musttail functions [PR121023]
As the following testcase shows e.g. on ia32, letting IPA opts change
signature of functions which have [[{gnu,clang}::musttail]] calls
can turn programs that would be compiled normally into something
that is rejected because the caller has fewer argument stack slots
than the function being tail called.
The following patch prevents signature changes for such functions.
It is perhaps too big a hammer in some cases, but it might be hard
to try to figure out what signature changes are still acceptable and which
are not at IPA time.
2025-07-11 Jakub Jelinek <jakub@redhat.com>
Martin Jambor <mjambor@suse.cz>
Richard Biener [Thu, 10 Jul 2025 11:30:30 +0000 (13:30 +0200)]
properly compute fp/mode for scalar ops for vectorizer costing
The x86 add_stmt_hook relies on the passed vectype to determine
the mode and whether it is FP for a scalar operation. This is
unreliable now for stmts involving patterns and in the future when
there is no vector type passed for scalar operations.
To be least disruptive I've kept using the vector type if it is passed.
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Use
the LHS of a scalar stmt to determine mode and whether it is FP.
Bootstrapping trunk with 32-bit-default on Mac OS X 10.11
(i386-apple-darwin15) fails:
/vol/gcc/src/hg/master/local/gcc/cobol/lexio.cc: In static member function 'static void cdftext::process_file(filespan_t, int, bool)':
/vol/gcc/src/hg/master/local/gcc/cobol/lexio.cc:1859:14: error: format '%u' expects argument of type 'unsigned int', but argument 4 has type 'size_t' {aka 'long unsigned int'} [-Werror=format=]
1859 | dbgmsg("%s:%d: line " HOST_SIZE_T_PRINT_UNSIGNED ", opening %s on fd %d",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1860 | __func__, __LINE__,mfile.lineno(),
| ~~~~~~~~~~~~~~
| |
| size_t {aka long unsigned int}
In file included from /vol/gcc/src/hg/master/local/gcc/system.h:1244,
from /vol/gcc/src/hg/master/local/gcc/cobol/cobol-system.h:61,
from /vol/gcc/src/hg/master/local/gcc/cobol/lexio.cc:33:
/vol/gcc/src/hg/master/local/gcc/hwint.h:135:51: note: format string is defined here
135 | #define HOST_SIZE_T_PRINT_UNSIGNED "%" GCC_PRISZ "u"
| ~~~~~~~~~~~~~~^
| |
| unsigned int
| %" GCC_PRISZ "lu
On Darwin, size_t is always long unsigned int. However, unsigned int
and long unsigned int are both 32-bit, so hwint.h selects %u for the
format. As documented there, the arg needs to be cast to fmt_size_t to
avoid the error.
This isn't an issue on other 32-bit platforms like Solaris/i386 or
Linux/i686 since they use unsigned int for size_t.
/vol/gcc/src/hg/master/local/gcc/cobol/parse.y: In function 'int yyparse()':
/vol/gcc/src/hg/master/local/gcc/cobol/parse.y:10215:36: error: format '%zu' expects argument of type 'size_t', but argument 4 has type 'int' [-Werror=format=]
10215 | error_msg(loc, "FUNCTION %qs has "
| ^~~~~~~~~~~~~~~~~~~
10216 | "inconsistent parameter type %zu (%qs)",
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
10217 | keyword_str($1), p - args.data(), name_of(p->field) );
| ~~~~~~~~~~~~~~~
| |
| int
The arg (p - args.data()) is ptrdiff_t (int on 32-bit Darwin), while
the %zu format expects size_t (long unsigned int). The patch therefore
casts the ptrdiff_t arg to long and prints it as such.
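The idea behind both fixes, stated outside of GCC's own macros, is to convert the argument to one fixed type and use the conversion specifier that matches exactly that type. The following is a minimal sketch under that assumption; printable_size, printable_diff and describe_sketch are hypothetical helpers and do not correspond to fmt_size_t or to anything in the cobol front end.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: always hand printf an unsigned long, whatever the
// target uses as the underlying type of size_t.
constexpr unsigned long printable_size(std::size_t n) {
  return static_cast<unsigned long>(n);
}

// Hypothetical helper: same idea for pointer differences, printed as long.
constexpr long printable_diff(std::ptrdiff_t d) {
  return static_cast<long>(d);
}

// %lu and %ld now match the argument types on every target, so
// -Wformat has nothing to complain about.
inline std::string describe_sketch(std::size_t lineno,
                                   const int *begin, const int *pos) {
  char buf[96];
  std::snprintf(buf, sizeof buf, "line %lu, parameter %ld",
                printable_size(lineno), printable_diff(pos - begin));
  return std::string(buf);
}
```

The casts assume the values fit in long, which holds for line numbers and argument indices like the ones in the diagnostics above.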
There are two more instances of the same problem:
/vol/gcc/src/hg/master/local/gcc/cobol/util.cc: In member function 'void cbl_field_t::report_invalid_initial_value(const YYLTYPE&) const':
/vol/gcc/src/hg/master/local/gcc/cobol/util.cc:905:80: error: format '%zu' expects argument of type 'size_t', but argument 6 has type 'int' [-Werror=format=]
905 | error_msg(loc, "%s cannot represent VALUE %qs exactly (max %c%zu)",
| ~~^
| |
| long unsigned int
| %u
906 | name, data.initial, '.', pend - p);
| ~~~~~~~~
| |
| int
In file included from /vol/gcc/src/hg/master/local/gcc/cobol/scan.l:48:
/vol/gcc/src/hg/master/local/gcc/cobol/scan_ante.h: In function 'int numstr_of(const char*, radix_t)':
/vol/gcc/src/hg/master/local/gcc/cobol/scan_ante.h:152:25: error: format '%zu' expects argument of type 'size_t', but argument 4 has type 'int' [-Werror=format=]
152 | error_msg(yylloc, "significand of %s has more than 36 digits (%zu)", input, nx);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~
| |
| int
Fixed in the same way.
Bootstrapped without regressions on i386-apple-darwin15,
x86_64-apple-darwin, i386-pc-solaris2.11, amd64-pc-solaris2.11,
i686-pc-linux-gnu, and x86_64-pc-linux-gnu.
Jonathan Wakely [Wed, 2 Jul 2025 20:16:30 +0000 (21:16 +0100)]
libstdc++: Always treat __float128 as a floating-point type
Similar to the previous commit that made is_integral_v<__int128>
unconditionally true, this makes is_floating_point_v<__float128>
unconditionally true. With the new extended floating-point types in
C++23 (std::float64_t etc.) it seems unhelpful for is_floating_point_v
to be true for them, but not for __float128. Especially as it is true on
some targets, because __float128 is just a typedef for long double.
This change makes is_floating_point_v<__float128> true whenever the type
is defined, giving less surprising and more portable behaviour.
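A small probe can show what a given toolchain reports. This is only a sketch: the variable names are made up, and it assumes that the predefined macro __SIZEOF_FLOAT128__ indicates that __float128 is usable on the target. The value of float128_is_floating_point depends on the library version and, before this change, on whether a strict -std= mode is in use.

```cpp
#include <type_traits>

#ifdef __SIZEOF_FLOAT128__
// __float128 exists on this target: ask the library what it thinks of it.
constexpr bool float128_available = true;
constexpr bool float128_is_floating_point =
    std::is_floating_point<__float128>::value;
#else
// No __float128 here, so there is nothing to query.
constexpr bool float128_available = false;
constexpr bool float128_is_floating_point = false;
#endif
```

With this change applied, float128_is_floating_point is expected to be true whenever float128_available is true, regardless of -std=c++NN versus -std=gnu++NN.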
libstdc++-v3/ChangeLog:
* include/bits/cpp_type_traits.h (__is_floating<__float128>):
Do not depend on __STRICT_ANSI__.
* include/bits/stl_algobase.h (__size_to_integer(__float128)):
Likewise.
* include/std/type_traits (__is_floating_point_helper<__float128>):
Likewise.
Jonathan Wakely [Fri, 16 May 2025 12:33:23 +0000 (13:33 +0100)]
libstdc++: Treat __int128 as a real integral type [PR96710]
Since LWG 3828 (included in C++23) implementations are allowed to have
extended integer types that are wider than intmax_t. This means we no
longer have to make is_integral_v<__int128> false for strict -std=c++23
mode, removing the confusing inconsistency with -std=gnu++23 (where
is_integral_v<__int128> is true).
This change makes __int128 a true integral type for all modes, treating
LWG 3828 as a DR for previous standards. Most of the change just
involves removing special cases where we wanted to treat __int128 and
unsigned __int128 as integral types even when is_integral_v was false.
There are still some preprocessor conditionals needed, because on some
targets the compiler defines the macro __GLIBCXX_TYPE_INT_N_0 as
__int128 in non-strict modes. Because we define explicit specializations
of templates such as is_integral for all the INT_N types, we already
have a specialization of is_integral<__int128> in non-strict modes, and
so to avoid a redefinition we must only define
is_integral<__int128> for strict modes.
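The user-visible effect can be checked with a probe like the one below. It is a sketch with made-up variable names, and it assumes __SIZEOF_INT128__ is predefined whenever __int128 is supported. On older releases the reported values differ between strict and GNU modes, which is the inconsistency this change removes.

```cpp
#include <type_traits>

#ifdef __SIZEOF_INT128__
// Query the trait for both signednesses of the 128-bit type.
constexpr bool int128_available = true;
constexpr bool int128_is_integral = std::is_integral<__int128>::value;
constexpr bool uint128_is_integral =
    std::is_integral<unsigned __int128>::value;
#else
constexpr bool int128_available = false;
constexpr bool int128_is_integral = false;
constexpr bool uint128_is_integral = false;
#endif
```

Generic code constrained on std::is_integral therefore starts accepting __int128 in strict modes too once this change is in place.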
libstdc++-v3/ChangeLog:
PR libstdc++/96710
* include/bits/cpp_type_traits.h (__is_integer): Define explicit
specializations for __int128.
(__memcpyable_integer): Remove explicit specializations for
__int128.
* include/bits/iterator_concepts.h (incrementable_traits):
Likewise.
(__is_signed_int128, __is_unsigned_int128, __is_int128): Remove.
(__is_integer_like, __is_signed_integer_like): Remove check for
__int128.
* include/bits/max_size_type.h: Remove all uses of __is_int128
in constraints.
* include/bits/ranges_base.h (__to_unsigned_like): Remove
overloads for __int128.
(ranges::ssize): Remove special case for __int128.
* include/bits/stl_algobase.h (__size_to_integer): Define
__int128 overloads for strict modes.
* include/ext/numeric_traits.h (__is_integer_nonstrict): Remove
explicit specializations for __int128.
* include/std/charconv (to_chars): Define overloads for
__int128.
* include/std/format (__format::make_unsigned_t): Remove.
(__format::to_chars): Remove.
* include/std/limits (numeric_limits): Define explicit
specializations for __int128.
* include/std/type_traits (__is_integral_helper): Likewise.
(__make_unsigned, __make_signed): Likewise.
gcc/fortran
PR fortran/106135
* decl.cc (build_sym): Emit an error if a symbol associated by
an IMPORT, ONLY or IMPORT, all statement is being redeclared.
(gfc_match_import): Parse and check the F2018 versions of the
IMPORT statement. For scopes other than an interface body, if
the symbol cannot be found in the host scope, generate it and
set it up such that gfc_fixup_sibling_symbols can transfer its
'imported attribute' if it turns out to be a not yet parsed
procedure. Test for violations of C897-8100.
* gfortran.h : Add 'import_only' to the gfc_symtree structure.
Add the enum, 'importstate', which is used for values of the new
field 'import_state' in gfc_namespace.
* parse.cc (gfc_fixup_sibling_symbols): Transfer the attribute
'imported' to the new symbol.
* resolve.cc (check_sym_import_status, check_import_status):
New functions to test symbols and expressions for violations of
F2018:C8102.
(resolve_call): Test the 'resolved_sym' against C8102 by a call
to 'check_sym_import_status'.
(gfc_resolve_expr): If the expression is OK and an IMPORT
statement has been registered in the current scope, test C102
by calling 'check_import_status'.
(resolve_select_type): Test the declared derived type in TYPE
IS and CLASS IS statements.
gcc/testsuite/
PR fortran/106135
* gfortran.dg/import3.f90: Use -std=f2008 and comment on change
in error message texts with f2018.
* gfortran.dg/import12.f90: New test.
Jakub Jelinek [Thu, 10 Jul 2025 22:05:23 +0000 (00:05 +0200)]
c++: Save 8 further bytes from lang_type allocations
The following patch implements the
/* FIXME reuse another field? */
comment on the lambda_expr member.
I think (and asserts in the patch seem to confirm) CLASSTYPE_KEY_METHOD
is only ever non-NULL for TYPE_POLYMORPHIC_P and on the other side
CLASSTYPE_LAMBDA_EXPR is only used on closure types which are never
polymorphic.
So, the patch just uses one member for both, with the accessor macros
changed to be no longer lvalues and adding SET_* variants of the macros
for setters.
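The shape of that change can be sketched in isolation. Nothing below is taken from cp-tree.h: ClassInfoSketch, tree_sketch and the accessor names are hypothetical stand-ins for lang_type, tree and the CLASSTYPE_* and SET_CLASSTYPE_* macros, and functions are used instead of macros to keep the example short.

```cpp
// Hypothetical stand-in for 'tree': a pointer to some node.
using tree_sketch = const int *;

// Hypothetical stand-in for lang_type: one slot with two meanings,
// selected by whether the class is polymorphic.
struct ClassInfoSketch {
  bool polymorphic;
  tree_sketch shared_slot;
};

// Getters are no longer lvalues: each one yields null when the slot
// currently holds the other kind of value.
constexpr tree_sketch key_method(const ClassInfoSketch &c) {
  return c.polymorphic ? c.shared_slot : nullptr;
}
constexpr tree_sketch lambda_expr(const ClassInfoSketch &c) {
  return c.polymorphic ? nullptr : c.shared_slot;
}

// Separate setters replace assignment through the old lvalue macros.
constexpr void set_key_method(ClassInfoSketch &c, tree_sketch m) {
  c.shared_slot = m;
}
constexpr void set_lambda_expr(ClassInfoSketch &c, tree_sketch l) {
  c.shared_slot = l;
}

// Sample node and two sample classes for illustration.
constexpr int example_node = 0;

constexpr ClassInfoSketch example_polymorphic() {
  ClassInfoSketch c{true, nullptr};
  set_key_method(c, &example_node);
  return c;
}

constexpr ClassInfoSketch example_closure() {
  ClassInfoSketch c{false, nullptr};
  set_lambda_expr(c, &example_node);
  return c;
}
```

Because a closure type is never polymorphic, the two getters can never both want the slot, which is what makes dropping the second member safe.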
2025-07-11 Jakub Jelinek <jakub@redhat.com>
* cp-tree.h (struct lang_type): Add comment before key_method.
Remove lambda_expr.
(CLASSTYPE_KEY_METHOD): Give NULL_TREE if not TYPE_POLYMORPHIC_P.
(SET_CLASSTYPE_KEY_METHOD): Define.
(CLASSTYPE_LAMBDA_EXPR): Give NULL_TREE if TYPE_POLYMORPHIC_P.
Use key_method member instead of lambda_expr.
(SET_CLASSTYPE_LAMBDA_EXPR): Define.
* class.cc (determine_key_method): Use SET_CLASSTYPE_KEY_METHOD
macro.
* decl.cc (xref_tag): Use SET_CLASSTYPE_LAMBDA_EXPR macro.
* lambda.cc (begin_lambda_type): Likewise.
* module.cc (trees_in::read_class_def): Use SET_CLASSTYPE_LAMBDA_EXPR
and SET_CLASSTYPE_KEY_METHOD macros, assert lambda is NULL if
TYPE_POLYMORPHIC_P and otherwise assert key_method is NULL.
Jakub Jelinek [Thu, 10 Jul 2025 21:47:42 +0000 (23:47 +0200)]
c++: Fix up final handling in C++98 [PR120628]
The following patch is on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2025-June/686210.html
patch which stopped treating override as conditional keyword in
class properties.
This PR mentions another problem; we emit a bogus warning on code like
struct C {}; struct C final = {};
in C++98. In this case we parse final as conditional keyword in C++
(including pedwarn) but the caller then immediately aborts the tentative
parse because it isn't followed by { nor (in some cases) : .
I think we certainly shouldn't pedwarn on it, but I think we even shouldn't
warn for it say for -Wc++11-compat, because we don't actually treat the
identifier as conditional keyword even in C++11 and later.
The patch only does this if final is the only class property conditional
keyword, if one uses
struct S __final final __final = {};
one gets the warning and duplicate diagnostics and later parsing errors.
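For reference, the declaration from the PR as a complete translation unit looks like this. It is a sketch for experimenting with different -std= options; whether a warning appears in C++98 mode depends on whether the compiler includes this fix.

```cpp
struct C {};

// 'final' is followed by '=', not by '{' or ':', so this cannot be a
// class head.  It declares a variable named 'final' of type C.
struct C final = {};
```

After these two lines `final` is an ordinary variable, so for instance `sizeof(final)` equals `sizeof(C)`.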
2025-07-10 Jakub Jelinek <jakub@redhat.com>
PR c++/120628
* parser.cc (cp_parser_elaborated_type_specifier): Use
cp_parser_nth_token_starts_class_definition_p with extra argument 1
instead of cp_parser_next_token_starts_class_definition_p.
(cp_parser_class_property_specifier_seq_opt): For final conditional
keyword in C++98 check if the token after it isn't
cp_parser_nth_token_starts_class_definition_p nor CPP_NAME and in
that case break without consuming it nor warning.
(cp_parser_class_head): Use
cp_parser_nth_token_starts_class_definition_p with extra argument 1
instead of cp_parser_next_token_starts_class_definition_p.
(cp_parser_next_token_starts_class_definition_p): Renamed to ...
(cp_parser_nth_token_starts_class_definition_p): ... this. Add N
argument. Use cp_lexer_peek_nth_token instead of cp_lexer_peek_token.
* g++.dg/cpp0x/final1.C: New test.
* g++.dg/cpp0x/final2.C: New test.
* g++.dg/cpp0x/override6.C: New test.
Jakub Jelinek [Thu, 10 Jul 2025 21:41:56 +0000 (23:41 +0200)]
c++: Don't incorrectly reject override after class head name [PR120569]
While the
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p2786r13.html#c03-compatibility-changes-for-annex-c-diff.cpp03.dcl.dcl
hunk was dropped because
struct C {}; struct C final {};
is actually not valid C++98 (which didn't have list initialization), we
actually also reject
struct D {}; struct D override {};
and that IMHO is valid all the way from C++11 onwards.
Especially in the light of P2786R13 adding new contextual keywords, I think
it is better to use a separate routine for parsing the
class-virt-specifier-seq (in C++11, there was export next to final),
class-virt-specifier (in C++14 to C++23) and
class-property-specifier-seq (in C++26) instead of using the same function
for virt-specifier-seq and class-property-specifier-seq.
2025-07-10 Jakub Jelinek <jakub@redhat.com>
PR c++/120569
* parser.cc (cp_parser_class_property_specifier_seq_opt): New
function.
(cp_parser_class_head): Use it instead of
cp_parser_property_specifier_seq_opt. Don't diagnose
VIRT_SPEC_OVERRIDE here. Formatting fix.
* g++.dg/cpp0x/override2.C: Expect different diagnostics with
override or duplicate final.
* g++.dg/cpp0x/override5.C: New test.
* g++.dg/cpp0x/duplicate1.C: Expect different diagnostics with
duplicate final.
The following patch implements the C++26 P3068R5 - constexpr exceptions
paper.
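From the user's side, the feature means that an exception may be thrown and caught during constant evaluation. The sketch below is an illustration, not one of the new tests: checked_div and div_or_minus_one are made-up names, and the try block is only compiled when the __cpp_constexpr_exceptions feature-test macro reports support, with a plain conditional as fallback for older compilers and standards.

```cpp
// Throws for a zero divisor; otherwise usable in constant expressions.
constexpr int checked_div(int a, int b) {
  if (b == 0)
    throw 42;
  return a / b;
}

constexpr int div_or_minus_one(int a, int b) {
#if defined(__cpp_constexpr_exceptions) && __cpp_constexpr_exceptions >= 202411L
  // With constexpr exceptions the throw can be caught at compile time.
  try {
    return checked_div(a, b);
  } catch (int) {
    return -1;
  }
#else
  // Without the feature, avoid ever reaching the throw.
  return b == 0 ? -1 : checked_div(a, b);
#endif
}

static_assert(div_or_minus_one(6, 3) == 2, "ordinary division");
static_assert(div_or_minus_one(6, 0) == -1, "division by zero is reported");
```

Both static_asserts are meant to hold in either configuration; only the path taken to reach -1 differs.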
As the IL cxx_eval_constant* functions process already contains the low
level calls like __cxa_{allocate,free}_exception, __cxa_{,re}throw etc.,
the patch just makes 10 extern "C" __cxa_* functions magic builtins which
during constant evaluation pretend to be constexpr even when not declared
so and handle them directly, plus does the same for 3 std namespace
functions - std::uncaught_exceptions, std::current_exception and
std::rethrow_exception and adds one new FE builtin -
__builtin_eh_ptr_adjust_ref which the library can use instead of the
_M_addref and _M_release out of line methods (this one instead of
recognizing _M_* as magic too because those are clearly specific to
libstdc++ and e.g. libc++ could use something else).
The patch uses magic VAR_DECLs with heap_{uninit_,,deleted_}identifier
DECL_NAME like for operator new/delete for objects allocated with
__cxa_allocate_exception, just sets their DECL_LANG_SPECIFIC so that
we can track their reference count as well (with std::exception_ptr
the same exception object can be referenced multiple times and we want
to destruct and free only when it reaches zero refcount).
For uncaught exceptions being propagated, the patch uses new kind of
*jump_target, which is that magic VAR_DECL described above.
The largest change in the patch is making jump_target argument non-optional
in cxx_eval_constant_expression and all functions it calls that need it.
This is because exceptions can be thrown from pretty much everywhere, e.g.
binary expression can throw in either operand. And the patch also adds
if (*jump_target) return NULL_TREE; or similar in many spots, so that we
don't crash because cxx_eval_constant_expression returned NULL_TREE
somewhere before actually trying to use it and so that we don't uselessly
dive into other operands etc.
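The pattern of checking *jump_target after every recursive call can be shown on a toy evaluator. This is only a sketch of the control flow: Expr, eval_sketch and run_sketch are hypothetical, they use a pointer to the throwing node where the patch uses a magic VAR_DECL, and 0 stands in for NULL_TREE.

```cpp
// Toy expression tree: constants, additions and a node that "throws".
struct Expr {
  enum Kind { Const, Add, Throw } kind;
  int value;         // constant value, or the thrown value
  const Expr *lhs;   // operands of Add, otherwise null
  const Expr *rhs;
};

// Evaluates E.  If evaluation throws, *jump_target is set to the
// throwing node and the returned value is meaningless.
constexpr int eval_sketch(const Expr *e, const Expr **jump_target) {
  switch (e->kind) {
  case Expr::Const:
    return e->value;
  case Expr::Throw:
    *jump_target = e;
    return 0;
  case Expr::Add: {
    int l = eval_sketch(e->lhs, jump_target);
    if (*jump_target)
      return 0;  // left operand threw: do not evaluate the right one
    int r = eval_sketch(e->rhs, jump_target);
    if (*jump_target)
      return 0;  // right operand threw: do not use l + r
    return l + r;
  }
  }
  return 0;
}

struct Outcome {
  bool threw;
  int value;  // result, or the thrown value when threw is true
};

// Outermost driver: starts with a cleared jump target.
constexpr Outcome run_sketch(const Expr *e) {
  const Expr *jump_target = nullptr;
  int v = eval_sketch(e, &jump_target);
  return jump_target ? Outcome{true, jump_target->value} : Outcome{false, v};
}

constexpr Expr one{Expr::Const, 1, nullptr, nullptr};
constexpr Expr two{Expr::Const, 2, nullptr, nullptr};
constexpr Expr boom{Expr::Throw, 7, nullptr, nullptr};
constexpr Expr sum_ok{Expr::Add, 0, &one, &two};
constexpr Expr throws_left{Expr::Add, 0, &boom, &two};
constexpr Expr throws_right{Expr::Add, 0, &one, &boom};
```

Every caller has to make the same check, which is why the argument can no longer be optional: a callee that throws has nowhere to record it if the caller passed no jump target.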
Note, with statement expressions actually this was something we just didn't
handle correctly before, one can validly have:
a = ({ if (x) return 42; 12; }) + b;
or in the other operand, or break/continue instead of return if it is
somewhere in a loop/switch; and it isn't ok to branch from one operand to
another one through some kind of goto.
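Wrapped in a function, the statement-expression case reads as follows. Statement expressions are a GNU extension, so this sketch assumes GCC or a compatible compiler; add_or_bail is a made-up name, and the function is deliberately not constexpr so it does not depend on the fix described here.

```cpp
// Returns 42 straight out of the statement expression when x is true;
// otherwise the statement expression yields 12 and b is added to it.
int add_or_bail(bool x, int b) {
  int a = ({ if (x) return 42; 12; }) + b;
  return a;
}
```

Calling add_or_bail(true, 5) gives 42 without ever evaluating the addition, while add_or_bail(false, 5) gives 17.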
On the potential_constant_expression_1 side, important change was to
set *jump_target conservatively on calls that could throw for C++26 (the
patch uses magic void_node for potential_constant_expression* instead of
VAR_DECL, so that we don't have to create new VAR_DECLs there uselessly).
Without that change, several methods in libstdc++ wouldn't work correctly.
I'm not sure what exactly potential_constant_expression_1 maps to in the
C++26 standard wording now and whether doing that is ok, because basically
after the first call to non-noexcept function it stops checking stuff.
And, in some spots where I know potential_constant_expression_1 didn't
check some subexpressions (e.g. the EH only cleanups or TRY_BLOCK handlers)
I've added *potential_constant_expression* calls during cxx_eval_constant*,
not sure if I need to do that because potential_constant_expression_1 is
very conservative and just doesn't recurse on subexpressions in many cases.
2025-07-10 Jakub Jelinek <jakub@redhat.com>
PR c++/117785
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Predefine
__cpp_constexpr_exceptions=202411L for C++26.
gcc/cp/
* constexpr.cc: Implement C++26 P3068R5 - constexpr exceptions.
(class constexpr_global_ctx): Add caught_exceptions and
uncaught_exceptions members.
(constexpr_global_ctx::constexpr_global_ctx): Initialize
uncaught_exceptions.
(returns, breaks, continues, switches): Move earlier.
(throws): New function.
(exception_what_str, diagnose_std_terminate,
diagnose_uncaught_exception): New functions.
(enum cxa_builtin): New type.
(cxx_cxa_builtin_fn_p, cxx_eval_cxa_builtin_fn): New functions.
(cxx_eval_builtin_function_call): Add jump_target argument. Call
cxx_eval_cxa_builtin_fn for __builtin_eh_ptr_adjust_ref. Adjust
cxx_eval_constant_expression calls, if it results in jmp_target,
set *jump_target to it and return.
(cxx_bind_parameters_in_call): Add jump_target argument. Pass
it through to cxx_eval_constant_expression. If it sets *jump_target,
break.
(fold_operand): Adjust cxx_eval_constant_expression caller.
(cxx_eval_assert): Likewise. If it set jmp_target, return true.
(cxx_eval_internal_function): Add jump_target argument. Pass it
through to cxx_eval_constant_expression. Return early if *jump_target
after recursing on args.
(cxx_eval_dynamic_cast_fn): Likewise. Don't set reference_p for
C++26 with -fexceptions.
(cxx_eval_thunk_call): Add jump_target argument. Pass it through
to cxx_eval_constant_expression.
(cxx_set_object_constness): Likewise. Don't set TREE_READONLY if
throws (jump_target).
(cxx_eval_call_expression): Add jump_target argument. Pass it
through to cxx_eval_internal_function, cxx_eval_builtin_function_call,
cxx_eval_thunk_call, cxx_eval_dynamic_cast_fn and
cxx_set_object_constness. Pass it through also
cxx_eval_constant_expression on arguments, cxx_bind_parameters_in_call
and cxx_fold_indirect_ref and for those cases return early
if *jump_target. Call cxx_eval_cxa_builtin_fn for cxx_cxa_builtin_fn_p
functions. For cxx_eval_constant_expression on body, pass address of
cleared jmp_target automatic variable, if it throws propagate
to *jump_target and make it non-cacheable. For C++26 don't diagnose
calls to non-constexpr functions before cxx_bind_parameters_in_call
could report some argument throwing an exception.
(cxx_eval_unary_expression): Add jump_target argument. Pass it
through to cxx_eval_constant_expression and return early
if *jump_target after the call.
(cxx_fold_pointer_plus_expression): Likewise.
(cxx_eval_binary_expression): Likewise and similarly for
cxx_fold_pointer_plus_expression call.
(cxx_eval_conditional_expression): Pass jump_target to
cxx_eval_constant_expression on first operand and return early
if *jump_target after the call.
(cxx_eval_vector_conditional_expression): Add jump_target argument.
Pass it through to cxx_eval_constant_expression for all 3 arguments
and return early if *jump_target after any of those calls.
(get_array_or_vector_nelts): Add jump_target argument. Pass it
through to cxx_eval_constant_expression.
(eval_and_check_array_index): Add jump_target argument. Pass it
through to cxx_eval_constant_expression calls and return early after
each of them if *jump_target.
(cxx_eval_array_reference): Likewise.
(cxx_eval_component_reference): Likewise.
(cxx_eval_bit_field_ref): Likewise.
(cxx_eval_bit_cast): Likewise. Assert CHECKING_P call doesn't
throw or return.
(cxx_eval_logical_expression): Add jump_target argument. Pass it
through to cxx_eval_constant_expression calls and return early after
each of them if *jump_target.
(cxx_eval_bare_aggregate): Likewise.
(cxx_eval_vec_init_1): Add jump_target argument. Pass it through
to cxx_eval_bare_aggregate and recursive call. Pass it through
to get_array_or_vector_nelts and cxx_eval_constant_expression
and return early after it if *jump_target.
(cxx_eval_vec_init): Add jump_target argument. Pass it through
to cxx_eval_constant_expression and cxx_eval_vec_init_1.
(cxx_union_active_member): Add jump_target argument. Pass it
through to cxx_eval_constant_expression and return early after it
if *jump_target.
(cxx_fold_indirect_ref_1): Add jump_target argument. Pass it
through to cxx_union_active_member and recursive calls.
(cxx_eval_indirect_ref): Add jump_target argument. Pass it through
to cxx_fold_indirect_ref_1 calls and to recursive call, in which
case return early after it if *jump_target.
(cxx_fold_indirect_ref): Add jump_target argument. Pass it through
to cxx_fold_indirect_ref and cxx_eval_constant_expression calls and
return early after those if *jump_target.
(cxx_eval_trinary_expression): Add jump_target argument. Pass it
through to cxx_eval_constant_expression calls and return early after
those if *jump_target.
(cxx_eval_store_expression): Add jump_target argument. Pass it
through to cxx_eval_constant_expression and eval_and_check_array_index
calls and return early after those if *jump_target.
(cxx_eval_increment_expression): Add jump_target argument. Pass it
through to cxx_eval_constant_expression calls and return early after
those if *jump_target.
(label_matches): Handle VAR_DECL case.
(cxx_eval_statement_list): Remove local_target variable and
!jump_target handling. Handle throws (jump_target) like returns or
breaks.
(cxx_eval_loop_expr): Remove local_target variable and !jump_target
handling. Pass it through to cxx_eval_constant_expression. Handle
throws (jump_target) like returns.
(cxx_eval_switch_expr): Pass jump_target through to
cxx_eval_constant_expression on cond, return early after it
if *jump_target.
(build_new_constexpr_heap_type): Add jump_target argument. Pass it
through to cxx_eval_constant_expression calls, return early after
those if *jump_target.
(merge_jump_target): New function.
(cxx_eval_constant_expression): Make jump_target argument no longer
defaulted, don't test jump_target for NULL. Pass jump_target
through to recursive calls, cxx_eval_call_expression,
cxx_eval_store_expression, cxx_eval_indirect_ref,
cxx_eval_unary_expression, cxx_eval_binary_expression,
cxx_eval_logical_expression, cxx_eval_array_reference,
cxx_eval_component_reference, cxx_eval_bit_field_ref,
cxx_eval_vector_conditional_expression, cxx_eval_bare_aggregate,
cxx_eval_vec_init, cxx_eval_trinary_expression, cxx_fold_indirect_ref,
build_new_constexpr_heap_type, cxx_eval_increment_expression,
cxx_eval_bit_cast and return early after some of those
if *jump_target as needed.
(cxx_eval_constant_expression) <case TARGET_EXPR>: For C++26 push
also CLEANUP_EH_ONLY cleanups, with NULL_TREE marker after them.
(cxx_eval_constant_expression) <case RETURN_EXPR>: Don't
override *jump_target if throws (jump_target).
(cxx_eval_constant_expression) <case TRY_CATCH_EXPR, case TRY_BLOCK,
case MUST_NOT_THROW_EXPR, case TRY_FINALLY_EXPR, case CLEANUP_STMT>:
Handle C++26 constant expressions.
(cxx_eval_constant_expression) <case CLEANUP_POINT_EXPR>: For C++26
with throws (jump_target) evaluate the CLEANUP_EH_ONLY cleanups as
well, and if not throws (jump_target) skip those. Set *jump_target
if some of the cleanups threw.
(cxx_eval_constant_expression) <case THROW_EXPR>: Recurse on operand
for C++26.
(cxx_eval_outermost_constant_expr): Diagnose uncaught exceptions both
from main expression and cleanups, diagnose also
break/continue/returns from the main expression. Handle
CLEANUP_EH_ONLY cleanup markers. Don't diagnose mutable poison stuff
if non_constant_p. Use different diagnostics for non-deleted heap
allocations if they were allocated by __cxa_allocate_exception.
(callee_might_throw): New function.
(struct check_for_return_continue_data): Add could_throw field.
(check_for_return_continue): Handle AGGR_INIT_EXPR and CALL_EXPR and
set d->could_throw if they could throw.
(potential_constant_expression_1): For CALL_EXPR allow
cxx_dynamic_cast_fn_p calls. For C++26 set *jump_target to void_node
for calls that could throw. For C++26 if call to non-constexpr call
is seen, try to evaluate arguments first and if they could throw,
don't diagnose call to non-constexpr function nor return false.
Adjust check_for_return_continue_data initializers and
set *jump_target to void_node if data.could_throw_p. For C++26
recurse on THROW_EXPR argument. Add comment explaining TRY_BLOCK
handling with C++26 exceptions. Handle throws like returns in some
cases.
* cp-tree.h (MUST_NOT_THROW_NOEXCEPT_P, MUST_NOT_THROW_THROW_P,
MUST_NOT_THROW_CATCH_P, DECL_EXCEPTION_REFCOUNT): Define.
(DECL_LOCAL_DECL_P): Fix comment typo, VARIABLE_DECL -> VAR_DECL.
(enum cp_built_in_function): Add CP_BUILT_IN_EH_PTR_ADJUST_REF,
(handler_match_for_exception_type): Declare.
* call.cc (handler_match_for_exception_type): New function.
* except.cc (initialize_handler_parm): Set MUST_NOT_THROW_CATCH_P
on newly created MUST_NOT_THROW_EXPR.
(begin_eh_spec_block): Set MUST_NOT_THROW_NOEXCEPT_P.
(wrap_cleanups_r): Set MUST_NOT_THROW_THROW_P.
(build_throw): Add another TARGET_EXPR whose scope spans
until after the __cxa_throw call and copy pointer value from ptr
to it and use it in __cxa_throw argument.
* tree.cc (builtin_valid_in_constant_expr_p): Handle
CP_BUILT_IN_EH_PTR_ADJUST_REF.
* decl.cc (cxx_init_decl_processing): Initialize
__builtin_eh_ptr_adjust_ref FE builtin.
* pt.cc (tsubst_stmt) <case MUST_NOT_THROW_EXPR>: Copy the
MUST_NOT_THROW_NOEXCEPT_P, MUST_NOT_THROW_THROW_P and
MUST_NOT_THROW_CATCH_P flags.
* cp-gimplify.cc (cp_gimplify_expr) <case CALL_EXPR>: Error on
non-folded CP_BUILT_IN_EH_PTR_ADJUST_REF calls.
gcc/testsuite/
* g++.dg/cpp0x/constexpr-ellipsis2.C: Expect different diagnostics for
C++26.
* g++.dg/cpp0x/constexpr-throw.C: Likewise.
* g++.dg/cpp1y/constexpr-84192.C: Expect different diagnostics.
* g++.dg/cpp1y/constexpr-throw.C: Expect different diagnostics for
C++26.
* g++.dg/cpp1z/constexpr-asm-5.C: Likewise.
* g++.dg/cpp26/constexpr-eh1.C: New test.
* g++.dg/cpp26/constexpr-eh2.C: New test.
* g++.dg/cpp26/constexpr-eh3.C: New test.
* g++.dg/cpp26/constexpr-eh4.C: New test.
* g++.dg/cpp26/constexpr-eh5.C: New test.
* g++.dg/cpp26/constexpr-eh6.C: New test.
* g++.dg/cpp26/constexpr-eh7.C: New test.
* g++.dg/cpp26/constexpr-eh8.C: New test.
* g++.dg/cpp26/constexpr-eh9.C: New test.
* g++.dg/cpp26/constexpr-eh10.C: New test.
* g++.dg/cpp26/constexpr-eh11.C: New test.
* g++.dg/cpp26/constexpr-eh12.C: New test.
* g++.dg/cpp26/constexpr-eh13.C: New test.
* g++.dg/cpp26/constexpr-eh14.C: New test.
* g++.dg/cpp26/constexpr-eh15.C: New test.
* g++.dg/cpp26/feat-cxx26.C: Change formatting in __cpp_pack_indexing
and __cpp_pp_embed test. Add __cpp_constexpr_exceptions test.
* g++.dg/cpp26/static_assert1.C: Expect different diagnostics for
C++26.
* g++.dg/cpp2a/consteval34.C: Likewise.
* g++.dg/cpp2a/consteval-memfn1.C: Likewise.
* g++.dg/cpp2a/constexpr-dynamic4.C: For C++26 add std::exception and
std::bad_cast definitions and expect different diagnostics.
* g++.dg/cpp2a/constexpr-dynamic6.C: Likewise.
* g++.dg/cpp2a/constexpr-dynamic7.C: Likewise.
* g++.dg/cpp2a/constexpr-dynamic8.C: Likewise.
* g++.dg/cpp2a/constexpr-dynamic9.C: Likewise.
* g++.dg/cpp2a/constexpr-dynamic11.C: Likewise.
* g++.dg/cpp2a/constexpr-dynamic14.C: Likewise.
* g++.dg/cpp2a/constexpr-dynamic18.C: Likewise.
* g++.dg/cpp2a/constexpr-new27.C: New test.
* g++.dg/cpp2a/constexpr-typeid5.C: New test.
libstdc++-v3/
* include/bits/version.def (constexpr_exceptions): New.
* include/bits/version.h: Regenerate.
* libsupc++/exception (std::bad_exception::bad_exception): Add
_GLIBCXX26_CONSTEXPR.
(std::bad_exception::~bad_exception, std::bad_exception::what): For
C++26 add constexpr and define inline.
* libsupc++/exception.h (std::exception::exception,
std::exception::operator=): Add _GLIBCXX26_CONSTEXPR.
(std::exception::~exception, std::exception::what): For C++26 add
constexpr and define inline.
* libsupc++/exception_ptr.h (std::make_exception_ptr): Add
_GLIBCXX26_CONSTEXPR. For if consteval use just throw with
current_exception() in catch.
(std::exception_ptr::exception_ptr(void*)): For C++26 add constexpr
and define inline.
(std::exception_ptr::exception_ptr()): Add _GLIBCXX26_CONSTEXPR.
(std::exception_ptr::exception_ptr(const exception_ptr&)): Likewise.
Use __builtin_eh_ptr_adjust_ref if consteval and compiler has it
instead of _M_addref.
(std::exception_ptr::exception_ptr(nullptr_t)): Add
_GLIBCXX26_CONSTEXPR.
(std::exception_ptr::exception_ptr(exception_ptr&&)): Likewise.
(std::exception_ptr::operator=): Likewise.
(std::exception_ptr::~exception_ptr): Likewise. Use
__builtin_eh_ptr_adjust_ref if consteval and compiler has it
instead of _M_release.
(std::exception_ptr::swap): Add _GLIBCXX26_CONSTEXPR.
(std::exception_ptr::operator bool): Likewise.
(std::exception_ptr::operator==): Likewise.
* libsupc++/nested_exception.h
(std::nested_exception::nested_exception): Add _GLIBCXX26_CONSTEXPR.
(std::nested_exception::operator=): Likewise.
(std::nested_exception::~nested_exception): For C++26 add constexpr
and define inline.
(std::nested_exception::rethrow_if_nested): Add _GLIBCXX26_CONSTEXPR.
(std::nested_exception::nested_ptr): Likewise.
(std::_Nested_exception::_Nested_exception): Likewise.
(std::throw_with_nested, std::rethrow_if_nested): Likewise.
* libsupc++/new (std::bad_alloc::bad_alloc): Likewise.
(std::bad_alloc::operator=): Likewise.
(std::bad_alloc::~bad_alloc): For C++26 add constexpr and define
inline.
(std::bad_alloc::what): Likewise.
(std::bad_array_new_length::bad_array_new_length): Add
_GLIBCXX26_CONSTEXPR.
(std::bad_array_new_length::~bad_array_new_length): For C++26 add
constexpr and define inline.
(std::bad_array_new_length::what): Likewise.
* libsupc++/typeinfo (std::bad_cast::bad_cast): Add
_GLIBCXX26_CONSTEXPR.
(std::bad_cast::~bad_cast): For C++26 add constexpr and define inline.
(std::bad_cast::what): Likewise.
(std::bad_typeid::bad_typeid): Add _GLIBCXX26_CONSTEXPR.
(std::bad_typeid::~bad_typeid): For C++26 add constexpr and define
inline.
(std::bad_typeid::what): Likewise.
aarch64: Guard VF-based costing with !m_costing_for_scalar
g:4b47acfe2b626d1276e229a0cf165e934813df6c caused a segfault
in aarch64_vector_costs::analyze_loop_vinfo when costing scalar
code, since we'd end up dividing by a zero VF.
Much of the structure of the aarch64 costing code dates from
a stage 4 patch, when we had to work within the bounds of what
the target-independent code did. Some of it could do with a
rework now that we're not so constrained.
This patch is therefore an emergency fix rather than the best
long-term solution. I'll revisit when I have more time to think
about it.
gcc/
* config/aarch64/aarch64.cc (aarch64_vector_costs::add_stmt_cost):
Guard VF-based costing with !m_costing_for_scalar.
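The guard described above can be sketched in plain C. This is only an illustration: struct cost_state and per_iteration_count are hypothetical names, not the aarch64 backend's data structures, and the real code works on vectorizer state rather than a two-field struct.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the costing state; not GCC's real type.  */
struct cost_state {
  bool costing_for_scalar;  /* true when costing the scalar loop */
  unsigned vf;              /* vectorization factor; 0 when costing scalar code */
};

/* Scale COUNT by the vectorization factor, but only when costing vector
   code.  When costing scalar code the VF is zero, so the division is
   skipped and COUNT is returned unchanged.  */
unsigned per_iteration_count(const struct cost_state *s, unsigned count)
{
  if (!s->costing_for_scalar) {
    assert(s->vf != 0);     /* vector costing is assumed to have a VF */
    return count / s->vf;
  }
  return count;
}
```

With costing_for_scalar set, the function never reads vf, which mirrors the intent of guarding the VF-based path.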
Reduce the # of arguments of .ACCESS_WITH_SIZE from 6 to 4.
This is an improvement to the design of internal function .ACCESS_WITH_SIZE.
Currently, the .ACCESS_WITH_SIZE is designed as:
ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE,
TYPE_OF_SIZE, ACCESS_MODE, TYPE_SIZE_UNIT for element)
which returns the REF_TO_OBJ same as the 1st argument;
1st argument REF_TO_OBJ: The reference to the object;
2nd argument REF_TO_SIZE: The reference to the size of the object,
3rd argument CLASS_OF_SIZE: The size referenced by the REF_TO_SIZE represents
0: the number of bytes.
1: the number of the elements of the object type;
4th argument TYPE_OF_SIZE: A constant 0 with its TYPE being the same as the
TYPE of the object referenced by REF_TO_SIZE
5th argument ACCESS_MODE:
-1: Unknown access semantics
0: none
1: read_only
2: write_only
3: read_write
6th argument: The TYPE_SIZE_UNIT of the element TYPE of the FAM when 3rd
argument is 1. NULL when 3rd argument is 0.
Among the 6 arguments:
A. The 3rd argument CLASS_OF_SIZE is not needed. If the REF_TO_SIZE represents
the number of bytes, simply pass 1 to the TYPE_SIZE_UNIT argument.
B. The 4th and the 5th arguments can be combined into 1 argument, whose TYPE
represents the TYPE_OF_SIZE, and the constant value represents the
ACCESS_MODE.
As a result, the new design of the .ACCESS_WITH_SIZE is:
ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE,
TYPE_OF_SIZE + ACCESS_MODE, TYPE_SIZE_UNIT for element)
which returns the REF_TO_OBJ same as the 1st argument;
1st argument REF_TO_OBJ: The reference to the object;
2nd argument REF_TO_SIZE: The reference to the size of the object,
3rd argument TYPE_OF_SIZE + ACCESS_MODE: An integer constant with a pointer
TYPE.
The pointee TYPE of the pointer TYPE is the TYPE of the object referenced
by REF_TO_SIZE.
The integer constant value represents the ACCESS_MODE:
0: none
1: read_only
2: write_only
3: read_write
4th argument: The TYPE_SIZE_UNIT of the element TYPE of the array.
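As a rough illustration of point A above, the four-argument form can be modelled in plain C. The struct and function names below are hypothetical, and the size is assumed to be a size_t; the real internal function operates on GCC trees and encodes TYPE_OF_SIZE in the pointer type of the 3rd argument.

```c
#include <assert.h>
#include <stddef.h>

/* Access modes as listed in the commit message.  */
enum access_mode {
  ACCESS_NONE = 0,
  ACCESS_READ_ONLY = 1,
  ACCESS_WRITE_ONLY = 2,
  ACCESS_READ_WRITE = 3
};

/* Hypothetical plain-C model of the four arguments.  */
struct access_with_size_args {
  void *ref_to_obj;           /* 1st: reference to the object */
  const size_t *ref_to_size;  /* 2nd: reference to the size of the object */
  enum access_mode mode;      /* 3rd: constant value carries the ACCESS_MODE */
  size_t elem_size_unit;      /* 4th: TYPE_SIZE_UNIT of the element */
};

/* Size of the object in bytes.  When the referenced size already counts
   bytes, the caller passes 1 as elem_size_unit, so no separate
   CLASS_OF_SIZE flag is needed.  Overflow is ignored in this sketch.  */
size_t object_size_in_bytes(const struct access_with_size_args *a)
{
  assert(a->ref_to_size != NULL);
  return *a->ref_to_size * a->elem_size_unit;
}
```

An element count of 10 with a 4-byte element gives 40 bytes, while a byte count of 24 with an element size of 1 stays at 24.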
gcc/c-family/ChangeLog:
* c-ubsan.cc (get_bound_from_access_with_size): Adjust the position
of the arguments per the new design.
gcc/c/ChangeLog:
* c-typeck.cc (build_access_with_size_for_counted_by): Update comments.
Adjust the arguments per the new design.
gcc/ChangeLog:
* internal-fn.cc (expand_ACCESS_WITH_SIZE): Update comments.
* internal-fn.def (ACCESS_WITH_SIZE): Update comments.
* tree-object-size.cc (access_with_size_object_size): Update comments.
Adjust the arguments per the new design.
Passing TYPE_SIZE_UNIT of the element as the 6th argument to .ACCESS_WITH_SIZE (PR121000)
The size of the element of the FAM _cannot_ reliably depend on the original
TYPE of the FAM that we passed as the 6th parameter to .ACCESS_WITH_SIZE
when the element of the FAM has a variable length type. Since the variable
that represents TYPE_SIZE_UNIT has no explicit usage in the original IL,
compiler transformations (such as DSE) that are applied before object_size
phase might eliminate the whole definition to the variable that represents
the TYPE_SIZE_UNIT of the element of the FAM.
In order to resolve this issue, instead of passing the original TYPE of the
FAM as the 6th argument to .ACCESS_WITH_SIZE, we should explicitly pass the
original TYPE_SIZE_UNIT of the element TYPE of the FAM as the 6th argument
to the call to .ACCESS_WITH_SIZE.
PR middle-end/121000
gcc/c/ChangeLog:
* c-typeck.cc (build_access_with_size_for_counted_by): Update comments.
Pass TYPE_SIZE_UNIT of the element as the 6th argument.
gcc/ChangeLog:
* internal-fn.cc (expand_ACCESS_WITH_SIZE): Update comments.
* internal-fn.def (ACCESS_WITH_SIZE): Update comments.
* tree-object-size.cc (access_with_size_object_size): Update comments.
Get the element_size from the 6th argument directly.
gcc/testsuite/ChangeLog:
* gcc.dg/flex-array-counted-by-pr121000.c: New test.
aarch64: Fix LD1Q and ST1Q failures for big-endian
LD1Q gathers and ST1Q scatters are unusual in that they operate
on 128-bit blocks (effectively VNx1TI). However, we don't have
modes or ACLE types for 128-bit integers, and 128-bit integers
are not the intended use case. Instead, the instructions are
intended to be used in "hybrid VLA" operations, where each 128-bit
block is an Advanced SIMD vector.
The normal SVE modes therefore capture the intended use case better
than VNx1TI would. For example, VNx2DI is effectively N copies
of V2DI, VNx4SI N copies of V4SI, etc.
Since there is only one LD1Q instruction and one ST1Q instruction,
the ACLE support used a single pattern for each, with the loaded or
stored data having mode VNx2DI. The ST1Q pattern was generated by:
where the force_lowpart_subreg bitcast the stored data to VNx2DI.
But such subregs require an element reverse on big-endian targets
(see the comment at the head of aarch64-sve.md), which wasn't the
intention. The code should have used aarch64_sve_reinterpret instead.
which always returns a VNx2DI value, leaving the caller to bitcast
that to the correct mode. That bitcast again uses subregs and has
the same problem as above.
However, for the reasons explained in the comment, using
aarch64_sve_reinterpret does not work well for LD1Q. The patch
instead parameterises the LD1Q based on the required data mode.
gcc/
* config/aarch64/aarch64-sve2.md (aarch64_gather_ld1q): Replace with...
(@aarch64_gather_ld1q<mode>): ...this, parameterizing based on mode.
* config/aarch64/aarch64-sve-builtins-sve2.cc
(svld1q_gather_impl::expand): Update accordingly.
(svst1q_scatter_impl::expand): Use aarch64_sve_reinterpret
instead of force_lowpart_subreg.
Jan Hubicka [Thu, 10 Jul 2025 14:56:21 +0000 (16:56 +0200)]
Fixes to auto-profile and Gimple matching.
This patch fixes several issues I noticed in gimple matching and -Wauto-profile
warning. One problem is that we mismatched symbols with user names, such as
"*strlen" instead of "strlen". I added raw_symbol_name to strip extra '*' which
is ok on ELF targets, which are the only targets we support with auto-profile, but
eventually we will want to add the user prefix. There is a sorry about this.
Also I think dwarf2out is wrong:
/* Mimic what assemble_name_raw does with a leading '*'. */
if (name[0] == '*')
name = &name[1];
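The leading-'*' handling quoted above can be written as a small standalone helper. This is a minimal sketch: strip_leading_star is a hypothetical name, not the raw_symbol_name function added by the patch, and it does nothing about user label prefixes.

```c
#include <assert.h>
#include <stddef.h>

/* Return NAME without a single leading '*', if it has one.  The result
   points into NAME, so no memory is allocated.  */
const char *strip_leading_star(const char *name)
{
  assert(name != NULL);
  return name[0] == '*' ? name + 1 : name;
}
```

Under this sketch "*strlen" and "strlen" both map to "strlen", which is the kind of match the profile lookup needs.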
The patch also fixes the locations of warnings. I used the location of the problematic
statement as the warning_at parameter but also included info about the containing
function. This made warning_at ignore the first location, which is fixed now.
I also fixed the ICE with -Wno-auto-profile discussed earlier.
Bootstrapped/regtested x86_64-linux. Autoprofiled bootstrap now fails for
weird reasons for me (it does not build the training stage), so I will try to
debug this before committing.
gcc/ChangeLog:
* auto-profile.cc: Include output.h.
(function_instance::set_call_location): Also sanity check
that location is known.
(raw_symbol_name): Two new static functions.
(dump_inline_stack): Use it.
(string_table::get_index_by_decl): Likewise.
(function_instance::get_cgraph_node): Likewise.
(function_instance::get_function_instance_by_decl): Fix typo
in warning; use raw names; fix lineno decoding.
(match_with_target): Add containing function parameter;
correctly output function and call location in warning.
(function_instance::lookup_count): Fix warning locations.
(function_instance::match): Fix warning locations; avoid
crash with mismatched callee; do not warn about broken callsites
twice.
(autofdo_source_profile::offline_external_functions): Use
raw_assembler_name.
(walk_block): Use raw_assembler_name.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-prof/afdo-inline.c: Add user symbol names.
Robin Dapp [Wed, 9 Jul 2025 13:58:05 +0000 (15:58 +0200)]
expand: ICE if asked to expand RDIV with non-float type.
This patch adds asserts that ensure we only expand an RDIV_EXPR with
actual float mode. It also replaces the RDIV_EXPR in setting a
vectorized loop's length by EXACT_DIV_EXPR. The code in question is
only used with length-control targets (riscv, powerpc, s390).
Robin Dapp [Thu, 10 Jul 2025 07:41:48 +0000 (09:41 +0200)]
RISC-V: Make zero-stride load broadcast a tunable.
This patch makes the zero-stride load broadcast idiom dependent on a
uarch-tunable "use_zero_stride_load". Right now we have quite a few
paths that reach a strided load and some of them are not exactly
straightforward.
While broadcast is relatively rare on rv64 targets it is more common on
rv32 targets that want to vectorize 64-bit elements.
While the patch is more involved than I would have liked, it could have
even touched more places. The whole broadcast-like insn path feels a
bit hackish due to the several optimizations we employ. Some of the
complications stem from the fact that we lump together real broadcasts,
vector single-element sets, and strided broadcasts. The strided-load
alternatives currently require a memory_constraint to work properly
which causes more complications when trying to disable just these.
In short, the whole pred_broadcast handling in combination with the
sew64_scalar_helper could use work in the future. I was about to start
with it in this patch but soon realized that it would only distract from
the original intent. What can help in the future is to split strided and
non-strided broadcast entirely, as well as the single-element sets.
It is still unclear whether we need to pay special attention to misaligned
strided loads (PR120782).
I regtested on rv32 and rv64 with strided_load_broadcast_p forced to
true and false. With either I didn't observe any new execution failures
but obviously there are new scan failures with strided broadcast turned
off.
PR target/118734
gcc/ChangeLog:
* config/riscv/constraints.md (Wdm): Use tunable for Wdm
constraint.
* config/riscv/riscv-protos.h (emit_avltype_insn): Declare.
(can_be_broadcasted_p): Rename to...
(can_be_broadcast_p): ...this.
* config/riscv/predicates.md: Use renamed function.
(strided_load_broadcast_p): Declare.
* config/riscv/riscv-selftests.cc (run_broadcast_selftests):
Only run broadcast selftest if strided broadcasts are OK.
* config/riscv/riscv-v.cc (emit_avltype_insn): New function.
(sew64_scalar_helper): Only emit a pred_broadcast if the new
tunable says so.
(can_be_broadcasted_p): Rename to...
(can_be_broadcast_p): ...this and use new tunable.
* config/riscv/riscv.cc (struct riscv_tune_param): Add strided
broad tunable.
(strided_load_broadcast_p): Implement.
* config/riscv/vector.md: Use strided_load_broadcast_p () and
work around 64-bit broadcast on rv32 targets.
Jan Dubiec [Thu, 10 Jul 2025 13:41:08 +0000 (07:41 -0600)]
[PATCH] libgcc: PR target/116363 Fix SFtype to UDWtype conversion
This patch fixes SFtype to UDWtype (aka float to unsigned long long)
conversion on targets without DFmode like e.g. H8/300H. It solely relies
on SFtype->UWtype and UWtype->UDWtype conversions/casts. The existing code
in line 2218 (counter = a) assigns/casts a float which is *always* not less
than Wtype_MAXp1_F to an UWtype int, which of course does not have enough
capacity.
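One way to convert using only float-to-32-bit and 32-bit-to-64-bit steps is to split the value into a high and a low word. The sketch below is an illustration under stated assumptions, not the code in libgcc2.c: float_to_u64 and TWO_POW_32_F are hypothetical names, float is assumed to be IEEE single precision, and the input is assumed to be below 2^64.

```c
#include <assert.h>
#include <stdint.h>

/* 2^32 as a float; exactly representable.  */
#define TWO_POW_32_F 4294967296.0f

/* Convert A to a 64-bit unsigned integer without any double arithmetic.
   Negative values, zero and NaN yield 0.  */
uint64_t float_to_u64(float a)
{
  if (!(a > 0.0f))
    return 0;
  assert(a < TWO_POW_32_F * TWO_POW_32_F);  /* assumed: A < 2^64 */

  /* High word: dividing by a power of two is exact, and the conversion
     truncates toward zero.  */
  uint32_t hi = (uint32_t)(a / TWO_POW_32_F);

  /* Low word: HI has at most 24 significant bits, so the product and the
     subtraction are exact and the remainder is below 2^32.  */
  float rem = a - (float)hi * TWO_POW_32_F;
  uint32_t lo = (uint32_t)rem;

  return ((uint64_t)hi << 32) | lo;
}
```

The point of the split is that no single conversion ever has to hold a value that is too large for a 32-bit word.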
PR target/116363
libgcc/ChangeLog:
* libgcc2.c (__fixunssfDI): Fix SFtype to UDWtype conversion for targets
without LIBGCC2_HAS_DF_MODE defined.
Daniel Barboza [Thu, 10 Jul 2025 13:28:38 +0000 (07:28 -0600)]
[RISC-V] Detect new fusions for RISC-V
This is primarily Daniel's work... He's chasing things in QEMU & LLVM right
now so I'm doing a bit of clean-up and shepherding this patch forward.
--
Instruction fusion is a reasonably common way to improve the performance of
code on many architectures/designs. A few years ago we submitted (via VRULL I
suspect) fusion support for a number of cases in the RISC-V space.
We made each type of fusion selectable independently in the tuning structure so
that designs which implemented some particular set of fusions could select just
the ones their design implemented. This patch adds to that generic
infrastructure.
In particular we're introducing additional load fusions, store pair fusions,
bitfield extractions and a few B extension related fusions.
Conceptually for the new load fusions we're adding the ability to fuse most
add/shNadd instructions with a subsequent load. There's a couple of
exceptions, but in general the expectation is that if we have add/shNadd for
address computation, then they can potentially fuse with the load where the
address gets used.
We've had limited forms of store pair fusion for a while. Essentially we
required both stores to be 64 bits wide and land on opposite sides of a 128 bit
cache line. That was enough to help prologues and a few other things, but was
fairly restrictive. The new cases capture store pairs where the two stores
have the same size and hit consecutive memory locations. For example, storing
consecutive bytes with sb+sb is fusible.
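The generalized store pair condition (same size, consecutive locations) can be sketched as a predicate. The names here are hypothetical and the check is deliberately simplified: it assumes both stores use the same base register and accepts either ordering of the two addresses, neither of which is confirmed by the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a store: base register number, byte
   offset from that base, and access width in bytes.  */
struct store_desc {
  int base_reg;
  int64_t offset;
  unsigned size;
};

/* True if the two stores have the same width and touch adjacent,
   non-overlapping memory off the same base register.  */
bool stores_may_fuse(const struct store_desc *first,
                     const struct store_desc *second)
{
  assert(first->size != 0 && second->size != 0);
  if (first->base_reg != second->base_reg || first->size != second->size)
    return false;
  return second->offset == first->offset + (int64_t)first->size
         || first->offset == second->offset + (int64_t)second->size;
}
```

For example, sb at offsets 0 and 1 from the same base passes the check, while two word stores at offsets 0 and 8 do not.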
For bitfield extractions we can fuse together a shift left followed by a shift
right for arbitrary shift counts, whereas previously we restricted the shift
counts to those implementing sign/zero extensions of 8 and 16 bit objects.
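The shift pair being fused is the usual way to extract a bitfield. A minimal sketch for the zero-extending case on 64-bit values follows; extract_bits is a hypothetical helper name and the logical right shift corresponds to srli rather than srai.

```c
#include <assert.h>
#include <stdint.h>

/* Extract WIDTH bits of X starting at bit LSB, zero-extended, using a
   shift left followed by a logical shift right.  */
uint64_t extract_bits(uint64_t x, unsigned lsb, unsigned width)
{
  assert(width >= 1 && lsb + width <= 64);
  unsigned left = 64 - (lsb + width);  /* discard bits above the field */
  unsigned right = 64 - width;         /* move the field down to bit 0 */
  return (x << left) >> right;
}
```

With lsb = 0 and width = 8 or 16 this reduces to the zero extensions that were already handled; other values of lsb and width give the arbitrary shift counts the new fusion covers.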
Finally some B extension fusions. orc.b+not which shows up in string
comparisons, ctz+andi (deepsjeng?), neg+max (synthesized abs).
I hope these prove to be useful to other RISC-V designs. I wouldn't be
surprised if we have to break down the new load fusions further for some
designs. If we need to do that it wouldn't be hard.
FWIW, our data indicates the generalized store fusions followed by the expanded
load fusions are the most important cases for the new code.
These have been tested with crosses and bootstrapped on the BPI.
Waiting on pre-commit CI before moving forward (though it has been failing to
pick up some patches recently...)
gcc/
* config/riscv/riscv.cc (riscv_fusion_pairs): Add new cases.
(riscv_set_is_add): New function.
(riscv_set_is_addi, riscv_set_is_adduw, riscv_set_is_shNadd): Likewise.
(riscv_set_is_shNadduw): Likewise.
(riscv_macro_fusion_pair_p): Add new fusion cases.
testsuite: Add -funwind-tables to sve*/pfalse* tests
The SVE svpfalse folding tests use CFI directives to delimit the
function bodies. That requires -funwind-tables to be enabled,
which is true by default for *-linux-gnu targets, but not for *-elf.
Richard Biener [Thu, 10 Jul 2025 09:26:04 +0000 (11:26 +0200)]
Handle failed gcond pattern gracefully
SLP analysis of early break conditions asserts pattern recognition
canonicalized all of them. But the pattern can fail, for example
when vector types cannot be computed. So be graceful here, so
we don't ICE when we didn't yet compute vector types.
* tree-vect-slp.cc (vect_analyze_slp): Fail for non-canonical
gconds.
Richard Biener [Thu, 10 Jul 2025 09:23:59 +0000 (11:23 +0200)]
Adjust reduction with conversion SLP build
The following adjusts how we set SLP_TREE_VECTYPE for the conversion
node we build when fixing up the reduction with conversion SLP instance.
This should probably see more TLC, but the following avoids relying
on STMT_VINFO_VECTYPE for this.
* tree-vect-slp.cc (vect_build_slp_instance): Do not use
SLP_TREE_VECTYPE to determine the conversion back to the
reduction IV.
Richard Biener [Thu, 10 Jul 2025 09:21:26 +0000 (11:21 +0200)]
Avoid vect_is_simple_use call from vectorizable_reduction
When analyzing the reduction cycle we look to determine the
reduction input vector type, for lane-reducing ops we look
at the input but instead of using vect_is_simple_use which
is problematic for SLP we should simply get at the SLP
operands vector type. If that's not set and we make up one
we should also ensure it stays so.
* tree-vect-loop.cc (vectorizable_reduction): Avoid
vect_is_simple_use and record a vector type if we come
up with one.
Richard Biener [Thu, 10 Jul 2025 08:25:03 +0000 (10:25 +0200)]
Avoid vect_is_simple_use call from get_load_store_type
This isn't the required refactoring of vect_check_gather_scatter
but it avoids a now unnecessary call to vect_is_simple_use which
is problematic because it looks at STMT_VINFO_VECTYPE which we
want to get rid of. SLP build already ensures vect_is_simple_use
on all lane defs, so all we need is to populate the offset_vectype
and offset_dt which is not always set by vect_check_gather_scatter.
Both are easy to get from the SLP child directly.
* tree-vect-stmts.cc (get_load_store_type): Do not use
vect_is_simple_use to fill gather/scatter offset operand
vectype and dt.
Richard Biener [Thu, 10 Jul 2025 08:08:23 +0000 (10:08 +0200)]
Pass SLP node down to cost hook for reduction cost
The following arranges vector reduction costs to hand down the
SLP node (of the reduction stmt) to the cost hooks, not only the
stmt_info. This also avoids accessing STMT_VINFO_VECTYPE of an
unrelated stmt to the node that is subject to code generation.
* tree-vect-loop.cc (vect_model_reduction_cost): Get SLP
node instead of stmt_info and use that when recording costs.
aarch64: PR target/120999: Adjust operands for movprfx alternative of NBSL implementation of NOR
While the SVE2 NBSL instruction accepts MOVPRFX to add more flexibility
due to its tied operands, the destination of the movprfx cannot be also
a source operand. But the offending pattern in aarch64-sve2.md tries
to do exactly that for the "=?&w,w,w" alternative and gas warns for the
attached testcase.
This patch adjusts that alternative to avoid taking operand 0 as an input
in the NBSL again.
So for the testcase in the patch we now generate:
nor_z:
movprfx z0, z1
nbsl z0.d, z0.d, z2.d, z1.d
ret
instead of the previous:
nor_z:
movprfx z0, z1
nbsl z0.d, z0.d, z2.d, z0.d
ret
which generated a gas warning.
Bootstrapped and tested on aarch64-none-linux-gnu.
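To see why the corrected operand order still computes NOR, one 64-bit lane can be modelled in C. This sketch assumes the usual reading of NBSL as an inverted bitwise select, with the last operand acting as the selector; nbsl_lane and nor_via_nbsl are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 64-bit lane of NBSL: take bits of ZDN where ZK is
   set and bits of ZM where ZK is clear, then invert the result.  */
uint64_t nbsl_lane(uint64_t zdn, uint64_t zm, uint64_t zk)
{
  return ~((zdn & zk) | (zm & ~zk));
}

/* Model of the fixed sequence:
     movprfx z0, z1
     nbsl    z0.d, z0.d, z2.d, z1.d  */
uint64_t nor_via_nbsl(uint64_t z1, uint64_t z2)
{
  uint64_t z0 = z1;                    /* movprfx copies z1 into z0 */
  uint64_t r = nbsl_lane(z0, z2, z1);  /* selector is z1, not z0 */
  assert(r == ~(z1 | z2));             /* same value as NOR */
  return r;
}
```

Since z0 holds a copy of z1 after the movprfx, selecting on z1 gives ~(z1 | (z2 & ~z1)), which simplifies to ~(z1 | z2) without reading the movprfx destination as the selector.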
TARGET_VECTORIZE_VEC_PERM_CONST has code to match the SVE2.1
"hybrid VLA" DUPQ, EXTQ, UZPQ{1,2}, and ZIPQ{1,2} instructions.
This matching was conditional on !BYTES_BIG_ENDIAN.
The ACLE code also lowered the associated SVE2.1 intrinsics into
suitable VEC_PERM_EXPRs. This lowering was not conditional on
!BYTES_BIG_ENDIAN.
The mismatch led to lots of ICEs in the ACLE tests on big-endian
targets: we lowered to VEC_PERM_EXPRs that are not supported.
I think the !BYTES_BIG_ENDIAN restriction was unnecessary.
SVE maps the first memory element to the least significant end of
the register for both endiannesses, so no endian correction or lane
number adjustment is necessary.
This is in some ways a bit counterintuitive. ZIPQ1 is conceptually
"apply Advanced SIMD ZIP1 to each 128-bit block" and endianness does
matter when choosing between Advanced SIMD ZIP1 and ZIP2. For example,
the V4SI permute selector { 0, 4, 1, 5 } corresponds to ZIP1 for little-
endian and ZIP2 for big-endian. But the difference between the hybrid
VLA and Advanced SIMD permute selectors is a consequence of the
difference between the SVE and Advanced SIMD element orders.
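For readers unfamiliar with permute selectors, the { 0, 4, 1, 5 } example can be modelled as a two-input permute on arrays. This sketch only shows how a selector indexes the concatenation of two 4-element vectors in array order; vec_perm4 is a hypothetical name, and the mapping from array order to register lanes (where endianness enters for Advanced SIMD) is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define NELTS 4

/* Two-input permute: selector value I picks A[I] for I < NELTS and
   B[I - NELTS] otherwise.  */
void vec_perm4(const int32_t a[NELTS], const int32_t b[NELTS],
               const unsigned sel[NELTS], int32_t out[NELTS])
{
  for (int i = 0; i < NELTS; i++) {
    assert(sel[i] < 2 * NELTS);
    out[i] = sel[i] < NELTS ? a[sel[i]] : b[sel[i] - NELTS];
  }
}
```

Applying the selector { 0, 4, 1, 5 } interleaves the first two elements of each input, which is the zip-low pattern in array order.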
The same thing applies to ACLE intrinsics. The current lowering of
svzipq1 etc. is correct for both endiannesses. If ACLE code does:
2x svld1_s32 + svzipq1_s32 + svst1_s32
then the byte-for-byte result is the same for both endiannesses.
On big-endian targets, this is different from using the Advanced SIMD
sequence below for each 128-bit block:
depends on endianness, since the quadword gathers and scatters use
Advanced SIMD byte ordering for each 128-bit block. This gather/scatter
sequence behaves in the same way as the Advanced SIMD LDR+ZIP1+STR
sequence for both endiannesses.
Programmers writing ACLE code have to be aware of this difference
if they want to support both endiannesses.
The patch includes some new execution tests to verify the expansion
of the VEC_PERM_EXPRs.
Jakub Jelinek [Thu, 10 Jul 2025 08:23:31 +0000 (10:23 +0200)]
Comment spelling fix: tunning -> tuning
Kyrylo noticed another spelling bug and, as usual, the same mistake
happens in multiple places.
2025-07-10 Jakub Jelinek <jakub@redhat.com>
* config/i386/x86-tune.def: Change "Tunning the" to "tuning" in
comment and use semicolon instead of dot in comment.
* loop-unroll.cc (decide_unroll_stupid): Comment spelling fix,
tunning -> tuning.
Richard Biener [Wed, 9 Jul 2025 13:04:12 +0000 (15:04 +0200)]
Remove vect_analyze_loop_operations
This removes the remains of vect_analyze_loop_operations. All the
checks it does still on LC PHIs of inner loops in outer loop
vectorization should be handled by vectorizable_lc_phi.
* tree-vect-loop.cc (vect_active_double_reduction_p): Remove.
(vect_analyze_loop_operations): Remove.
(vect_analyze_loop_2): Do not call it.
Richard Biener [Wed, 9 Jul 2025 10:53:45 +0000 (12:53 +0200)]
Remove non-SLP vectorization factor determining
The following removes the VF determining step from non-SLP stmts.
For now we keep setting STMT_VINFO_VECTYPE for all stmts; there are
too many places to fix, including some more complicated ones, so
this is deferred to a followup.
Along with this, the patch removes vect_update_vf_for_slp, merging the check for
present hybrid SLP stmts into vect_detect_hybrid_slp and failing analysis
early. This also removes the need to essentially duplicate this check in
the stmt walk of vect_analyze_loop_operations. Getting rid of that,
and performing some other checks earlier, is also deferred to a followup.
* tree-vect-loop.cc (vect_determine_vf_for_stmt_1): Rename
to ...
(vect_determine_vectype_for_stmt_1): ... this and only set
STMT_VINFO_VECTYPE. Fail for single-element vector types.
(vect_determine_vf_for_stmt): Rename to ...
(vect_determine_vectype_for_stmt): ... this and only set
STMT_VINFO_VECTYPE. Fail for single-element vector types.
(vect_determine_vectorization_factor): Rename to ...
(vect_set_stmts_vectype): ... this and only set STMT_VINFO_VECTYPE.
(vect_update_vf_for_slp): Remove.
(vect_analyze_loop_operations): Remove walk over stmts.
(vect_analyze_loop_2): Call vect_set_stmts_vectype instead of
vect_determine_vectorization_factor. Set vectorization factor
from LOOP_VINFO_SLP_UNROLLING_FACTOR. Fail if vect_detect_hybrid_slp
detects hybrid stmts or when vect_make_slp_decision finds
nothing to SLP.
* tree-vect-slp.cc (vect_detect_hybrid_slp): Move check
whether we have any hybrid stmts here from vect_update_vf_for_slp
* tree-vect-stmts.cc (vect_analyze_stmt): Remove loop over
stmts.
* tree-vectorizer.h (vect_detect_hybrid_slp): Update.