Jason Merrill [Fri, 1 Aug 2025 02:58:43 +0000 (22:58 -0400)]
c++: constexpr, array, private ctor [PR120800]
Here cxx_eval_vec_init_1 wants to recreate the default constructor call that
we previously built and threw away in build_vec_init_elt, but we aren't in
the same access context at this point. Since we already checked access,
let's just suppress access control here.
Redoing overload resolution at constant evaluation time is sketchy, but
should usually be fine for a default/copy constructor.
aarch64: Prevent streaming-compatible code from assembler rejection [PR121028]
Streaming-compatible functions can be compiled without SME enabled, but need
to use "SMSTART SM" and "SMSTOP SM" to temporarily switch into the streaming
state of a callee. These switches are conditional on the current mode being
opposite to the target mode, so no SME instructions are executed if SME is not
available.
However, in GAS, "SMSTART SM" and "SMSTOP SM" always require +sme. A call
from a streaming-compatible function, compiled without SME enabled, to a non
-streaming function will be rejected as:
Error: selected processor does not support `smstop sm'..
To work around this, we make use of the .inst directive to insert the literal
encodings of "SMSTART SM" and "SMSTOP SM".
gcc/ChangeLog:
PR target/121028
* config/aarch64/aarch64-sme.md (aarch64_smstart_sm): Use the .inst
directive if !TARGET_SME.
(aarch64_smstop_sm): Likewise.
gcc/testsuite/ChangeLog:
PR target/121028
* gcc.target/aarch64/sme/call_sm_switch_1.c: Tell check-function
-bodies not to ignore .inst directives, and replace the test for
"smstart sm" with one for it's encoding.
* gcc.target/aarch64/sme/call_sm_switch_11.c: Likewise.
* gcc.target/aarch64/sme/pr121028.c: New test.
testsuite: Fix gcc.target/powerpc/vsx-builtin-7.c test [PR119382]
The test vsx-builtin-7.c failed on powerpc64le-linux due to Identical
Code Folding (ICF) merging the functions insert_di_0_v2 and insert_di_0.
This behavior was introduced by commit r15-7961-gdc47161c1f32c3, which
enhanced alias analysis in ao_compare::compare_ao_refs, enabling the
compiler to identify and optimize structurally identical functions. As a
result, the compiler replaced insert_di_0_v2 with a tail call to
insert_di_0, altering the expected test behavior.
This patch adds -fno-ipa-icf to the test's dg-options to disable ICF,
avoiding function merging and ensuring the test executes correctly.
disabled transformation from "movq $-1,reg" to "pushq $-1; popq reg" for
-Oz. But for legacy integer registers, the former is 4 bytes and the
latter is 3 bytes. Enable such transformation for -Oz.
gcc/
PR target/120427
* config/i386/i386.md (peephole2): Transform "movq $-1,reg" to
"pushq $-1; popq reg" for -Oz if reg is a legacy integer register.
gcc/testsuite/
PR target/120427
* gcc.target/i386/pr120427-5.c: New test.
Eliminate redundant vpextrq/vpinsrq when move TI to V4SI.
r14-1902-g96c3539f2a3813 split TImode move with 2 DImode move, it's
supposed to optimize TImode in parameter/return since accoring to
psABI it's stored into 2 general registers.
But when TImode is not in parameter/return, it could create redundancy
in the PR.
The termio ioctls are no longer used after commit 59978b21ad9c
("[sanitizer_common] Remove interceptors for deprecated struct termio
(#137403)"), remove them. Fixes this build error:
../../../../libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp:765:27: error: invalid application of ‘sizeof’ to incomplete type ‘__sanitizer::termio’
765 | unsigned IOCTL_TCGETA = TCGETA;
| ^~~~~~
../../../../libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp:769:27: error: invalid application of ‘sizeof’ to incomplete type ‘__sanitizer::termio’
769 | unsigned IOCTL_TCSETA = TCSETA;
| ^~~~~~
../../../../libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp:770:28: error: invalid application of ‘sizeof’ to incomplete type ‘__sanitizer::termio’
770 | unsigned IOCTL_TCSETAF = TCSETAF;
| ^~~~~~~
../../../../libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp:771:28: error: invalid application of ‘sizeof’ to incomplete type ‘__sanitizer::termio’
771 | unsigned IOCTL_TCSETAW = TCSETAW;
| ^~~~~~~
x86: PR target/103773: Fix wrong-code with -Oz from pop to memory.
added "*mov<mode>_and" and extended "*mov<mode>_or" to transform
"mov $0,mem" to the shorter "and $0,mem" and "mov $-1,mem" to the shorter
"or $-1,mem" for -Oz. But the new pattern:
aren't guarded for -Oz. As a result, "and $0,mem" and "or $-1,mem" are
generated without -Oz.
1. Change *mov<mode>_and" to define_insn_and_split and split it to
"mov $0,mem" if not -Oz.
2. Change "*mov<mode>_or" to define_insn_and_split and split it to
"mov $-1,mem" if not -Oz.
3. Don't transform "mov $-1,reg" to "push $-1; pop reg" for -Oz since it
should be transformed to "or $-1,reg".
gcc/
PR target/120427
* config/i386/i386.md (*mov<mode>_and): Changed to
define_insn_and_split. Split it to "mov $0,mem" if not -Oz.
(*mov<mode>_or): Changed to define_insn_and_split. Split it
to "mov $-1,mem" if not -Oz.
(peephole2): Don't transform "mov $-1,reg" to "push $-1; pop reg"
for -Oz since it will be transformed to "or $-1,reg".
gcc/testsuite/
PR target/120427
* gcc.target/i386/cold-attribute-4.c: Compile with -Oz.
* gcc.target/i386/pr120427-1.c: New test.
* gcc.target/i386/pr120427-2.c: Likewise.
* gcc.target/i386/pr120427-3.c: Likewise.
* gcc.target/i386/pr120427-4.c: Likewise.
Xi Ruoyao [Mon, 14 Jul 2025 19:01:12 +0000 (03:01 +0800)]
LoongArch: Fix wrong code generated by TARGET_VECTORIZE_VEC_PERM_CONST [PR121064]
When TARGET_VECTORIZE_VEC_PERM_CONST is called, target may be the
same pseudo as op0 and/or op1. Loading the selector into target
would clobber the input, producing wrong code like
vld $vr0, $t0
vshuf.w $vr0, $vr0, $vr1
So don't load the selector into d->target, use a new pseudo to hold the
selector instead. The reload pass will load the pseudo for selector and
the pseudo for target into the same hard register (following our
constraint '0' on the shuf instructions) anyway.
gcc/ChangeLog:
PR target/121064
* config/loongarch/lsx.md (lsx_vshuf_<lsxfmt_f>): Add '@' to
generate a mode-aware helper. Use <VIMODE> as the mode of the
operand 1 (selector).
* config/loongarch/lasx.md (lasx_xvshuf_<lasxfmt_f>): Likewise.
* config/loongarch/loongarch.cc
(loongarch_try_expand_lsx_vshuf_const): Create a new pseudo for
the selector. Use the mode-aware helper to simplify the code.
(loongarch_expand_vec_perm_const): Likewise.
gcc/testsuite/ChangeLog:
PR target/121064
* gcc.target/loongarch/pr121064.c: New test.
aarch64: Tweak handling of general SVE permutes [PR121027]
This PR is partly about a code quality regression that was triggered
by g:caa7a99a052929d5970677c5b639e1fa5166e334. That patch taught the
gimple optimisers to fold two VEC_PERM_EXPRs into one, conditional
upon either (a) the original permutations not being "native" operations
or (b) the combined permutation being a "native" operation.
Whether something is a "native" operation is tested by calling
can_vec_perm_const_p with allow_variable_p set to false. This requires
the permutation to be supported directly by TARGET_VECTORIZE_VEC_PERM_CONST,
rather than falling back to the general vec_perm optab.
This exposed a problem with the way that we handled general 2-input
permutations for SVE. Unlike Advanced SIMD, base SVE does not have
an instruction to do general 2-input permutations. We do still implement
the vec_perm optab for SVE, but only when the vector length is known at
compile time. The general expansion is pretty expensive: an AND, a SUB,
two TBLs, and an ORR. It certainly couldn't be considered a "native"
operation.
However, if a VEC_PERM_EXPR has a constant selector, the indices can
be wider than the elements being permuted. This is not true for the
vec_perm optab, where the indices and permuted elements must have the
same precision.
This leads to one case where we cannot leave a general 2-input permutation
to be handled by the vec_perm optab: when permuting bytes on a target
with 2048-bit vectors. In that case, the indices of the elements in
the second vector are in the range [256, 511], which cannot be stored
in a byte index.
TARGET_VECTORIZE_VEC_PERM_CONST therefore has to handle 2-input SVE
permutations for one specific case. Rather than check for that
specific case, the code went ahead and used the vec_perm expansion
whenever it worked. But that undermines the !allow_variable_p
handling in can_vec_perm_const_p; it becomes impossible for
target-independent code to distinguish "native" operations from
the worst-case fallback.
This patch instead limits TARGET_VECTORIZE_VEC_PERM_CONST to the
cases that it has to handle. It fixes the PR for all vector lengths
except 2048 bits.
A better fix would be to introduce some sort of costing mechanism,
which would allow us to reject the new VEC_PERM_EXPR even for
2048-bit targets. But that would be a significant amount of work
and would not be backportable.
gcc/
PR target/121027
* config/aarch64/aarch64.cc (aarch64_evpc_sve_tbl): Punt on 2-input
operations that can be handled by vec_perm.
gcc/testsuite/
PR target/121027
* gcc.target/aarch64/sve/acle/general/perm_1.c: New test.
aarch64: Fix general permutes of svbfloat16_ts [PR121027]
Testing gcc.target/aarch64/sve/permute_2.c without the associated GCC
patch triggered an unrecognisable insn ICE for the svbfloat16_t tests.
This was because the implementation of general two-vector permutes
requires two TBLs and an ORR, with the ORR being represented as an
unspec for floating-point modes. The associated pattern did not
cover VNx8BF.
gcc/
PR target/121027
* config/aarch64/iterators.md (SVE_I): Move further up file.
(SVE_F): New mode iterator.
(SVE_ALL): Redefine in terms of SVE_I and SVE_F.
* config/aarch64/aarch64-sve.md (*<LOGICALF:optab><mode>3): Extend
to all SVE_F.
gcc/testsuite/
PR target/121027
* gcc.target/aarch64/sve/permute_5.c: New test.
aarch64: Fix ZIP1 order in aarch64_expand_vector_init [PR118891]
aarch64_expand_vector_init contains some divide-and-conquer code
that tries to load the odd and even elements into 64-bit registers
and then ZIP them together. On big-endian targets, the even elements
are more significant than the odd elements and so should come second
in the ZIP.
This fixes many execution failures on aarch64_be-elf, including
gcc.c-torture/execute/pr28982a.c.
gcc/
PR target/118891
* config/aarch64/aarch64.cc (aarch64_expand_vector_init): Fix the
ZIP1 operand order for big-endian targets.
aarch64: Fix neon-sve-bridge.c failures for big-endian
Lowpart subregs are generally disallowed on big-endian SVE vector
registers, since the first memory element is stored at the least
significant end of the register, rather than the most significant end.
(See the comment at the head of aarch64-sve.md for details,
and aarch64_modes_compatible_p for the implementation.)
This means that arm_sve_neon_bridge.h needs to use custom define_insns
for big-endian targets, in lieu of using lowpart subregs. However,
one of those define_insns relied on the prohibited lowparts internally,
to convert an Advanced SIMD register to an SVE register. Since the
lowpart is not allowed, the lowpart_subreg would return null, leading
to a later ICE.
The simplest fix seems to be to use %Z instead, to force the Advanced
SIMD register to be written as an SVE register.
gcc/
* config/aarch64/aarch64-sve.md (@aarch64_sve_set_neonq_<mode>):
Use %Z instead of lowpart_subreg. Tweak formatting.
vect: Fix VEC_WIDEN_PLUS_HI/LO choice for big-endian [PR118891]
In the tree codes and optabs, the "hi" in a vector hi/lo pair means
"most significant" and the "lo" means "least significant", with
sigificance following GCC's normal endian expectations. Thus on
big-endian targets, the hi part handles the first half of the elements
in memory order and the lo part handles the second half.
For tree codes, supportable_widening_operation first chooses hi/lo
pairs based on little-endian order and then uses:
if (BYTES_BIG_ENDIAN && c1 != VEC_WIDEN_MULT_EVEN_EXPR)
std::swap (c1, c2);
to adjust. However, the handling for internal functions was missing
an equivalent fixup. This led to several execution failures in vect.exp
on aarch64_be-elf.
If the hi/lo code fails, the internal function handling goes on to try
even/odd. But I couldn't see anything obvious that would put the even/
odd results back into the right order later, so there might be a latent
bug there too.
gcc/
PR tree-optimization/118891
* tree-vect-stmts.cc (supportable_widening_operation): Swap the
hi and lo internal functions on big-endian targets.
Jason Merrill [Sat, 26 Jul 2025 00:49:17 +0000 (20:49 -0400)]
c++: constexpr uninitialized union [PR120577]
This was failing for two reasons:
1) We were wrongly treating the basic_string constructor as
zero-initializing the object, which it doesn't.
2) Given that, when we went to look for a value for the anonymous union,
we concluded that it was value-initialized, and trying to evaluate that
broke because we weren't setting ctx->ctor for it.
This patch fixes both issues, #1 by setting CONSTRUCTOR_NO_CLEARING and #2
by inserting a new CONSTRUCTOR for the member rather than evaluate it out of
context, which is consistent with cxx_eval_store_expression.
PR c++/120577
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_call_expression): Set
CONSTRUCTOR_NO_CLEARING on initial value for ctor.
(cxx_eval_component_reference): Make value-initialization
of an aggregate member explicit.
Jakub Jelinek [Fri, 18 Jul 2025 07:20:30 +0000 (09:20 +0200)]
gimple-fold: Fix up big endian _BitInt adjustment [PR121131]
The following testcase ICEs because SCALAR_INT_TYPE_MODE of course
doesn't work for large BITINT_TYPE types which have BLKmode.
native_encode* as well as e.g. r14-8276 use in cases like these
GET_MODE_SIZE (SCALAR_INT_TYPE_MODE ()) and TREE_INT_CST_LOW (TYPE_SIZE_UNIT
()) for the BLKmode ones.
In this case, it wants bits rather than bytes, so I've used
GET_MODE_BITSIZE like before and TYPE_SIZE otherwise.
Furthermore, the patch only computes encoding_size for big endian
targets, for little endian we don't really adjust anything, so there
is no point computing it.
2025-07-18 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/121131
* gimple-fold.cc (fold_nonarray_ctor_reference): Use
TREE_INT_CST_LOW (TYPE_SIZE ()) instead of
GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE ()) for BLKmode BITINT_TYPEs.
Don't compute encoding_size at all for little endian targets.
H.J. Lu [Thu, 3 Jul 2025 02:54:39 +0000 (10:54 +0800)]
x86-64: Add RDI clobber to 64-bit dynamic TLS patterns
*tls_global_dynamic_64_largepic, *tls_local_dynamic_64_<mode> and
*tls_local_dynamic_base_64_largepic use RDI as the __tls_get_addr
argument. Add RDI clobber to these patterns to show it.
gcc/
PR target/120908
* config/i386/i386.cc (legitimize_tls_address): Pass RDI to
gen_tls_local_dynamic_64.
* config/i386/i386.md (*tls_global_dynamic_64_largepic): Add
RDI clobber and use it to generate LEA.
(*tls_local_dynamic_64_<mode>): Likewise.
(*tls_local_dynamic_base_64_largepic): Likewise.
(@tls_local_dynamic_64_<mode>): Add a clobber.
gcc/testsuite/
PR target/120908
* gcc.target/i386/pr120908.c: New test.
H.J. Lu [Tue, 1 Jul 2025 09:17:06 +0000 (17:17 +0800)]
x86-64: Add RDI clobber to tls_global_dynamic_64 patterns
*tls_global_dynamic_64_<mode> uses RDI as the __tls_get_addr argument.
Add RDI clobber to tls_global_dynamic_64 patterns to show it.
PR target/120908
* config/i386/i386.cc (legitimize_tls_address): Pass RDI to
gen_tls_global_dynamic_64.
* config/i386/i386.md (*tls_global_dynamic_64_<mode>): Add RDI
clobber and use it to generate LEA.
(@tls_global_dynamic_64_<mode>): Add a clobber.
i386: Remove KEYLOCKER related feature since Panther Lake and Clearwater Forest
According to July 2025 SDM, Key locker will no longer be supported on
hardware 2025 onwards. This means for Panther Lake and Clearwater Forest,
the feature will not be enabled. Remove them from those two platforms.
Eric Botcazou [Thu, 1 May 2025 23:30:56 +0000 (01:30 +0200)]
ada: Fix missing error on too large Component_Size not multiple of storage unit
This is a small regression introduced a few years ago.
gcc/ada/ChangeLog:
* gcc-interface/decl.cc (gnat_to_gnu_component_type): Validate the
Component_Size like the size of a type only if the component type
is actually packed.
Eric Botcazou [Tue, 27 May 2025 11:32:18 +0000 (13:32 +0200)]
ada: Fix wrong conversion of controlled array with representation change
The problem is that a temporary is created for the conversion because of the
representation change, and it is finalized without having been initialized.
gcc/ada/ChangeLog:
* exp_ch4.adb (Handle_Changed_Representation): Alphabetize local
variables. Set the No_Finalize_Actions flag on the assignment.
Jakub Jelinek [Tue, 1 Jul 2025 13:28:10 +0000 (15:28 +0200)]
c++: Fix up cp_build_array_ref COND_EXPR handling [PR120471]
The following testcase is miscompiled since the introduction of UBSan,
cp_build_array_ref COND_EXPR handling replaces
(cond ? a : b)[idx] with cond ? a[idx] : b[idx], but if there are
SAVE_EXPRs inside of idx, they will be evaluated just in one of the
branches and the other uses uninitialized temporaries.
Fixed by keeping doing what it did if idx doesn't have side effects
and is invariant. Otherwise if op1/op2 are ARRAY_TYPE arrays with
invariant addresses or pointers with invariant values, use
SAVE_EXPR <op0>, SAVE_EXPR <idx>, SAVE_EXPR <op0> as a new condition
and SAVE_EXPR <idx> instead of idx for the recursive calls.
Otherwise punt, but if op1/op2 are ARRAY_TYPE, furthermore call
cp_default_conversion on array, so that COND_EXPR with ARRAY_TYPE doesn't
survive in the IL until expansion.
2025-07-01 Jakub Jelinek <jakub@redhat.com>
PR c++/120471
gcc/cp/
* typeck.cc (cp_build_array_ref) <case COND_EXPR>: If idx is not
INTEGER_CST, don't optimize the case (but cp_default_conversion on
array early if it has ARRAY_TYPE) or use
SAVE_EXPR <op0>, SAVE_EXPR <idx>, SAVE_EXPR <op0> as new op0 depending
on flag_strong_eval_order and whether op1 and op2 are arrays with
invariant address or tree invariant pointers. Formatting fixes.
gcc/testsuite/
* g++.dg/ubsan/pr120471.C: New test.
* g++.dg/parse/pr120471.C: New test.
aarch64: Incorrect removal of ZA restore [PR120624]
The PCS defines a lazy save scheme for managing ZA across normal
"private-ZA" functions. GCC currently uses this scheme for calls
to all private-ZA functions (rather than using caller-save).
Therefore, before a sequence of calls to private-ZA functions, GCC emits
code to set up a lazy save. After the sequence of calls, GCC emits code
to check whether lazy save was committed and restore the ZA contents
if so.
These sequences are emitted by the mode-switching pass, in an attempt
to reduce the number of redundant saves and restores.
The lazy save scheme also means that, before a function can use ZA,
it must first conditionally store the old contents of ZA to the caller's
lazy save buffer, if any.
This all creates some relatively complex dependencies between
setup code, save/restore code, and normal reads from and writes to ZA.
These dependencies are modelled using special fake hard registers:
;; Sometimes we use placeholder instructions to mark where later
;; ABI-related lowering is needed. These placeholders read and
;; write this register. Instructions that depend on the lowering
;; read the register.
(LOWERING_REGNUM 87)
;; Represents the contents of the current function's TPIDR2 block,
;; in abstract form.
(TPIDR2_BLOCK_REGNUM 88)
;; Holds the value that the current function wants PSTATE.ZA to be.
;; The actual value can sometimes vary, because it does not track
;; changes to PSTATE.ZA that happen during a lazy save and restore.
;; Those effects are instead tracked by ZA_SAVED_REGNUM.
(SME_STATE_REGNUM 89)
;; Instructions write to this register if they set TPIDR2_EL0 to a
;; well-defined value. Instructions read from the register if they
;; depend on the result of such writes.
;;
;; The register does not model the architected TPIDR2_ELO, just the
;; current function's management of it.
(TPIDR2_SETUP_REGNUM 90)
;; Represents the property "has an incoming lazy save been committed?".
(ZA_FREE_REGNUM 91)
;; Represents the property "are the current function's ZA contents
;; stored in the lazy save buffer, rather than in ZA itself?".
(ZA_SAVED_REGNUM 92)
;; Represents the contents of the current function's ZA state in
;; abstract form. At various times in the function, these contents
;; might be stored in ZA itself, or in the function's lazy save buffer.
;;
;; The contents persist even when the architected ZA is off. Private-ZA
;; functions have no effect on its contents.
(ZA_REGNUM 93)
Every normal read from ZA and write to ZA depends on SME_STATE_REGNUM,
in order to sequence the code with the initial setup of ZA and
with the lazy save scheme.
The code to restore ZA after a call involves several instructions,
including conditional control flow. It is initially represented as
a single define_insn and is split late, after shrink-wrapping and
prologue/epilogue insertion.
The split form of the restore instruction includes a conditional call
to __arm_tpidr2_restore:
The write to SME_STATE_REGNUM indicates the end of the region where
ZA_REGNUM might differ from the real contents of ZA. In other words,
it is the point at which normal reads from ZA and writes to ZA
can safely take place.
To finally get to the point, the problem in this PR was that the
unsplit aarch64_restore_za pattern was missing this change to
SME_STATE_REGNUM. It could therefore be deleted as dead before
it had chance to be split. The split form had the correct dataflow,
but the unsplit form didn't.
Unfortunately, the tests for this code tended to use calls and asms
to model regions of ZA usage, and those don't seem to be affected
in the same way.
gcc/
PR target/120624
* config/aarch64/aarch64.md (SME_STATE_REGNUM): Expand on comments.
* config/aarch64/aarch64-sme.md (aarch64_restore_za): Also set
SME_STATE_REGNUM
gcc/testsuite/
PR target/120624
* gcc.target/aarch64/sme/za_state_7.c: New test.
Eric Botcazou [Fri, 27 Jun 2025 21:47:49 +0000 (23:47 +0200)]
Fix misoptimization of CONSTRUCTOR with reverse SSO
fold_ctor_reference already punts on a CONSTRUCTOR whose type has reverse
storage order, but it can be invoked in a couple of places on a CONSTRUCTOR
with native storage order that has been wrapped in a VIEW_CONVERT_EXPR to a
type with reverse storage order; this would require a post adjustment that
does not currently exist, thus yield wrong code for this admittedly quite
pathological (but supported) case.
gcc/
* gimple-fold.cc (fold_const_aggregate_ref_1) <COMPONENT_REF>:
Bail out immediately if the reference has reverse storage order.
* tree-ssa-sccvn.cc (fully_constant_vn_reference_p): Likewise.
gcc/testsuite/
* gnat.dg/sso20.adb: New test.
Haochen Jiang [Tue, 17 Jun 2025 06:08:38 +0000 (14:08 +0800)]
i386: Remove CLDEMOTE for clients
CLDEMOTE is not enabled on clients according to SDM. SDM only mentioned
it will be enabled on Xeon and Atom servers, not clients. Remove them
since Alder Lake (where it is introduced).
gcc/ChangeLog:
* config/i386/i386.h (PTA_ALDERLAKE): Use PTA_GOLDMONT_PLUS
as base to remove PTA_CLDEMOTE.
(PTA_SIERRAFOREST): Add PTA_CLDEMOTE since PTA_ALDERLAKE
does not include that anymore.
* doc/invoke.texi: Update texi file.
Richard Biener [Wed, 11 Sep 2024 11:54:33 +0000 (13:54 +0200)]
tree-optimization/116674 - vectorizable_simd_clone_call and re-analysis
When SLP analysis scraps an instance because it fails to analyze we
can end up calling vectorizable_* in analysis mode on a node that
was analyzed during the analysis of that instance again.
vectorizable_simd_clone_call wasn't expecting that and instead
guarded analysis/transform code on populated data structures.
The following changes it so it survives re-analysis.
PR tree-optimization/116674
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Support
re-analysis.
Jakub Jelinek [Thu, 19 Jun 2025 06:57:27 +0000 (08:57 +0200)]
dfp: Further decimal_real_to_integer fixes [PR120631]
Unfortunately, the following further testcase shows that there aren't
problems only with very large precisions and large exponents, but pretty
much anything larger than 64-bits. After all, before _BitInt support dfp
didn't even have {,unsigned }__int128 <-> _Decimal{32,64,128,64x} support,
and the testcase again shows some of the conversions yielding zeros.
While the pr120631.c test worked even without the earlier patch.
So, this patch assumes 64-bit precision at most is ok and for anything
larger it just uses exponent 0 and multiplies afterwards.
2025-06-19 Jakub Jelinek <jakub@redhat.com>
PR middle-end/120631
* dfp.cc (decimal_real_to_integer): Use result multiplication not just
when precision > 128 and dn.exponent > 19, but when precision > 64
and dn.exponent > 0.
* gcc.dg/dfp/bitint-10.c: New test.
* gcc.dg/dfp/pr120631.c: New test.
Jakub Jelinek [Wed, 18 Jun 2025 06:07:22 +0000 (08:07 +0200)]
dfp, real: Fix up FLOAT_EXPR/FIX_TRUNC_EXPR constant folding between dfp and large _BitInt [PR120631]
The following testcase shows that while at runtime we handle conversions
between _Decimal{64,128} and large _BitInt correctly, at compile time we
mishandle them in both directions, in one direction we end up in ICE in
decimal_from_integer callee because the char buffer is too short for the
needed number of decimal digits, in the conversion of dfp to large _BitInt
we return 0 in the wide_int.
The following patch fixes the ICE by using larger buffer (XALLOCAVEC
allocated, it will be never larger than 65536 / 3 bytes) in the larger
_BitInt case, and the other direction by setting exponent to exp % 19
and instead multiplying the result by needed powers of 10^19 (10^19 chosen
as largest power of ten that can fit into UHWI).
2025-06-18 Jakub Jelinek <jakub@redhat.com>
PR middle-end/120631
* real.cc (decimal_from_integer): Add digits argument, if larger than
256, use XALLOCAVEC allocated buffer.
(real_from_integer): Pass val_in's precision divided by 3 to
decimal_from_integer.
* dfp.cc (decimal_real_to_integer): For precision > 128 if finite
and exponent is large, decrease exponent and multiply resulting
wide_int by powers of 10^19.
Richard Earnshaw [Thu, 20 Mar 2025 14:42:59 +0000 (14:42 +0000)]
opcodes: fix wrong code in expand_binop_directly [PR117811]
If expand_binop_directly fails to add a REG_EQUAL note it tries to
unwind and restart. But it can unwind too far if expand_binop changed
some of the operands before calling it. We don't need to unwind that
far anyway since we should end up taking exactly the same route next
time, just without a target rtx.
To fix this we remove LAST from the argument list and let the callers
(all in expand_binop) do their own unwinding if the call fails.
Instead we unwind just as far as the entry to expand_binop_directly
and recurse within this function instead of all the way back up.
gcc/ChangeLog:
PR middle-end/117811
* optabs.cc (expand_binop_directly): Remove LAST as an argument,
instead record the last insn on entry. Only delete insns if
we need to restart and restart by calling ourself, not expand_binop.
(expand_binop): Update callers to expand_binop_directly. If it
fails to expand the operation, delete back to LAST.
gcc/testsuite:
PR middle-end/117811
* gcc.dg/torture/pr117811.c: New test.
Jakub Jelinek [Thu, 12 Jun 2025 18:22:39 +0000 (20:22 +0200)]
recip: Reset range info when replacing sqrt with rsqrt [PR120638]
This pass reuses a SSA_NAME on the lhs of sqrt etc. call as lhs
of .RSQRT etc. call. The following testcase is miscompiled since my recent
ranger cast changes, because we compute (correct) range for sqrtf argument
as well as result but then recip pass keeps using that range for the .RQSRT
call which returns 1. / sqrt, so the function then returns 0.5f
unconditionally.
Note, on foo this is a regression from GCC 15, but on bar it regressed
already with the r14-536 change.
2025-06-12 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/120638
* tree-ssa-math-opts.cc (pass_cse_reciprocals::execute): Call
reset_flow_sensitive_info on arg1.
Jakub Jelinek [Thu, 5 Jun 2025 13:47:19 +0000 (15:47 +0200)]
real: Fix up real_from_integer [PR120547]
The function has 2 problems, one is _BitInt specific and the other is
most likely also reproduceable only with it.
The first issue is that I've missed updating the function for _BitInt,
maxbitlen as MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT
obviously isn't guaranteed to be larger than any integral type we might
want to convert at compile time from wide_int to REAL_VALUE_FORMAT.
Just using len instead of it works fine, at least when used after
HOST_BITS_PER_WIDE_INT is added to it and it is truncated to multiples
of HOST_BITS_PER_WIDE_INT.
The other bug is that if the value has too many significant bits (formerly
maxbitlen - cnt_l_z, now len - cnt_l_z), the code just shifts it right and
adds the shift count to the future exponent. That isn't correct for
rounding as the testcase attempts to show, the internal real format has more
bits than any precision in supported format, but we still need to
distinguish bewtween values exactly half way between representable floating
point values (those should be rounded to even) and the case when we've
shifted away some non-zero bits, so the value was tiny bit larger than half
way and then we should round up.
The patch uses something like e.g. soft-fp uses in these cases, right shift
with sticky bit in the least significant bit.
2025-06-05 Jakub Jelinek <jakub@redhat.com>
PR middle-end/120547
* real.cc (real_from_integer): Remove maxbitlen variable, use
len instead of that. When shifting right, or in 1 if any of the
shifted away bits are non-zero. Formatting fix.
Jakub Jelinek [Tue, 20 May 2025 06:21:14 +0000 (08:21 +0200)]
tree-chrec: Use signed_type_for in convert_affine_scev
On s390x-linux I've run into the gcc.dg/torture/bitint-27.c test ICEing in
build_nonstandard_integer_type called from convert_affine_scev (not sure
why it doesn't trigger on x86_64/aarch64).
The problem is clear, when ct is a BITINT_TYPE with some large
TYPE_PRECISION, build_nonstandard_integer_type won't really work on it.
The patch fixes it similarly what has been done for GCC 14 in various
other spots.
2025-05-20 Jakub Jelinek <jakub@redhat.com>
* tree-chrec.cc (convert_affine_scev): Use signed_type_for instead of
build_nonstandard_integer_type.
Jonathan Wakely [Wed, 4 Jun 2025 17:22:28 +0000 (18:22 +0100)]
libstdc++: Fix std::format thousands separators when sign present [PR120548]
The leading sign character should be skipped when deciding whether to
insert thousands separators into a floating-point format.
libstdc++-v3/ChangeLog:
PR libstdc++/120548
* include/std/format (__formatter_fp::_M_localize): Do not
include a leading sign character in the string to be grouped.
* testsuite/std/format/functions/format.cc: Check grouping when
sign is present in the output.
Jonathan Wakely [Wed, 28 May 2025 14:19:18 +0000 (15:19 +0100)]
libstdc++: Make system_clock::to_time_t always_inline [PR99832]
For some 32-bit targets Glibc supports changing the size of time_t to be
64 bits by defining _TIME_BITS=64. That causes an ABI change which
would affect std::chrono::system_clock::to_time_t. Because to_time_t is
not a function template, its mangled name does not depend on the return
type, so it has the same mangled name whether it returns a 32-bit time_t
or a 64-bit time_t. On targets where the size of time_t can be selected
at preprocessing time, that can cause ODR violations, e.g. the linker
selects a definition of to_time_t that returns a 32-bit value but a
caller expects 64-bit and so reads 32 bits of garbage from the stack.
This commit adds always_inline to to_time_t so that all callers inline
the conversion to time_t, and will do so using whatever type time_t
happens to be in that translation unit.
Existing objects compiled before this change will either have inlined
the function anyway (which is likely if compiled with any optimization
enabled) or will contain a COMDAT definition of the inline function and
so still be able to find it at link-time.
The attribute is also added to system_clock::from_time_t, because that's
an equally simple function and it seems reasonable for them to both be
always inlined.
libstdc++-v3/ChangeLog:
PR libstdc++/99832
* include/bits/chrono.h (system_clock::to_time_t): Add
always_inline attribute to be agnostic to the underlying type of
time_t.
(system_clock::from_time_t): Add always_inline for consistency
with to_time_t.
* testsuite/20_util/system_clock/99832.cc: New test.
Jonathan Wakely [Tue, 20 May 2025 09:53:41 +0000 (10:53 +0100)]
libstdc++: Fix incorrect links to archived SGI STL docs
In r8-7777-g25949ee33201f2 I updated some URLs to point to copies of the
SGI STL docs in the Wayback Machine, because the original pags were no
longer hosted on sgi.com. However, I incorrectly assumed that if one
archived page was at https://web.archive.org/web/20171225062613/... then
all the other pages would be too. Apparently that's not how the Wayback
Machine works, and each page is archived on a different date. That meant
that some of our links were redirecting to archived copies of the
announcement that the SGI STL docs have gone away.
This fixes each URL to refer to a correctly archived copy of the
original docs.