testsuite: handle-multiline-outputs must allow both cc1 and cc1.exe
Prior to 14-2027-g985d6480fe5, the input text had the file extensions
pruned. Since 14-2027-g985d6480fe5, due to the move of the call, the
pruning is no longer done. This change restores the pruning of the file
extension so that multiline tests pass both on Windows and on other
platforms such as Linux.
Patrick Palka [Mon, 4 Aug 2025 20:43:33 +0000 (16:43 -0400)]
c++: constexpr evaluation of abi::__dynamic_cast [PR120620]
r13-3299 changed our internal declaration of __dynamic_cast to reside
inside the abi/__cxxabiv1 namespace instead of the global namespace,
matching the real declaration. This inadvertently made us attempt
constexpr evaluation of user-written calls to abi::__dynamic_cast as
well, since cxx_dynamic_cast_fn_p now also returns true for them, but
we're not prepared to handle arbitrary calls to __dynamic_cast and
therefore ICE.
This patch restores cxx_dynamic_cast_fn_p to return true only for
synthesized calls to __dynamic_cast, which can be distinguished by
DECL_ARTIFICIAL, since apparently the synthesized declaration of
__dynamic_cast doesn't get merged with the actual declaration.
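A hedged sketch (not the committed testcases) of the kind of user-written
call that cxx_dynamic_cast_fn_p started to match; it assumes only the
usual Itanium ABI declaration from <cxxabi.h>:

#include <cxxabi.h>
#include <typeinfo>

struct B { virtual ~B () {} };
struct D : B {};

const void *
f (const B *p)
{
  /* Explicit call with the ABI signature (src_ptr, src_type, dst_type,
     src2dst_offset); this used to be treated like the compiler-synthesized
     helper call behind a dynamic_cast expression.  */
  return abi::__dynamic_cast
    (p, static_cast<const abi::__class_type_info *> (&typeid (B)),
     static_cast<const abi::__class_type_info *> (&typeid (D)), -1);
}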
PR c++/120620
gcc/cp/ChangeLog:
* constexpr.cc (cxx_dynamic_cast_fn_p): Return true only
for synthesized __dynamic_cast.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/constexpr-dynamic19.C: New test.
* g++.dg/cpp2a/constexpr-dynamic1a.C: New test.
Jakub Jelinek [Wed, 6 Aug 2025 09:30:08 +0000 (11:30 +0200)]
bitint: Fix up INTEGER_CST PHI handling [PR121413]
The following testcase is miscompiled on aarch64-linux.
The problem is in the optimization to shorten large constants
in PHI arguments.
In a couple of places during bitint lowering we compute the minimal
precision of a constant and, if it is significantly smaller than the
precision of the type, store the smaller constant in memory and extend
it at runtime (with zeros or all ones).
In most places that works fine: we handle the stored number of limbs by
loading them from memory and then extend the rest. The PHI INTEGER_CST
argument handling is different; we don't form any loops there (because
we insert stmt sequences on the edges).
The problem is that we copy the whole _BitInt variable from memory to
the PHI VAR_DECL and initialize the rest with = {} or a memset to -1.
The copy has
min_prec = CEIL (min_prec, limb_prec) * limb_prec;
precision, so e.g. on x86_64 there is no padding and it works just
fine. But on aarch64, which has abi_limb_mode TImode and limb_mode
DImode, it doesn't work in some cases.
In the testcase the constant has a minimal precision of 408 bits, which
rounded up to limb_prec (64) is 448, i.e. 7 limbs. But aarch64 with
TImode abi_limb_mode will actually allocate 8 limbs, and the most
significant limb is solely padding. As we want to extend the constant
with all ones, copying the padding (from memory, so zeros) results in
64 zero bits where one bits were needed.
The following patch fixes it by detecting this case and setting
min_prec to a multiple of abi limb precision so that it has
no padding.
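A hedged numeric sketch of that adjustment (illustrative values;
abi_limb_prec mirrors the new variable added by the patch):

void
round_min_prec (void)
{
  unsigned min_prec = 408;       /* minimal precision of the constant */
  unsigned limb_prec = 64;       /* DImode limbs                      */
  unsigned abi_limb_prec = 128;  /* TImode ABI limbs on aarch64       */
  unsigned prec = 512;           /* precision of the _BitInt type     */

  min_prec = (min_prec + limb_prec - 1) / limb_prec * limb_prec;  /* 448, 7 limbs */
  /* 448 bits is 3.5 ABI limbs, so the stored constant would include a
     padding limb; round up so no padding is copied as zero bits.  */
  if (min_prec % abi_limb_prec)
    min_prec = (min_prec + abi_limb_prec - 1) / abi_limb_prec * abi_limb_prec;
  /* min_prec is now 512, i.e. prec itself, matching the "either prec or
     a multiple of abi_limb_prec" rule in the ChangeLog below.  */
  (void) prec;
}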
2025-08-06 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/121413
* gimple-lower-bitint.cc (abi_limb_prec): New variable.
(bitint_precision_kind): Initialize it.
(gimple_lower_bitint): Clear it at the start. For
min_prec > limb_prec decreased precision vars for
INTEGER_CST PHI arguments ensure min_prec is either
prec or a multiple of abi_limb_prec.
Jakub Jelinek [Wed, 6 Aug 2025 09:28:37 +0000 (11:28 +0200)]
bitint: Fix up handling of uninitialized mul/div/float cast operands [PR121127]
handle_operand_addr (used for the cases where we use libgcc APIs, i.e.
multiplication, division, modulo and casts of _BitInt to float/dfp), when
it sees the default definition of an SSA_NAME which is not a PARM_DECL
(i.e. an uninitialized one), just allocates a single uninitialized limb;
there is no need to waste more memory on it, it can just tell libgcc that
it has 64-bit precision rather than, say, 1024 bits.
Unfortunately, doing this runs into some asserts when we have a narrowing
cast of the uninitialized SSA_NAME (but still a large/huge _BitInt).
The following patch fixes that by using a magic value (0) in *prec_stored
for the uninitialized cases and not doing any *prec tweaks for narrowing
casts from that. precs still needs to be maintained as before, since that
one is used for the big-endian adjustment.
2025-08-06 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/121127
* gimple-lower-bitint.cc (bitint_large_huge::handle_operand_addr): For
uninitialized SSA_NAME, set *prec_stored to 0 rather than *prec.
Handle that case in narrowing casts. If prec_stored is non-NULL,
set *prec_stored to prec_stored_val.
Jakub Jelinek [Fri, 1 Aug 2025 06:41:54 +0000 (08:41 +0200)]
bswap: Fix up ubsan detected UB in find_bswap_or_nop [PR121322]
The following testcase results in compiler UB as detected by ubsan.
find_bswap_or_nop first checks is_bswap_or_nop_p and, if that fails
on the tmp_n value, tries some rotation of it if possible.
The discovery of what rotate count to use ignores zero bytes from
the least significant end (those mean zero bytes and so can be masked
away) and, on the first non-zero non-0xff byte (0xff means don't know,
1-8 means some particular byte of the original value), computes count
(the rotation count) from that byte + the byte index.
On the following testcase we have tmp_n 0x403020105060700, i.e.
the least significant byte is zero, then the msb of the original value,
the byte below it, another one below it, then the low 32 bits of the
original value. So we stop at count 7 with i 1; it wraps around and we
get count 0.
Then we invoke UB on
tmp_n = tmp_n >> count | tmp_n << (range - count);
because count is 0 and range is 64.
Of course, I could fix it up by doing tmp_n << ((range - count) % range)
or something similar, but that is just wasted compile time: if count is 0,
we already know that is_bswap_or_nop_p failed on that tmp_n value, and
so it will fail again if the value is the same. So I think it is better
to just return NULL (i.e. punt).
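A hedged sketch in plain C (not the actual gimple-ssa-store-merging.cc
code) of the guard being added: only rotate when 0 < count < range,
since shifting a 64-bit value by range - 0 == 64 is undefined behaviour.

#include <stdint.h>

static int
try_rotate (uint64_t *tmp_n, int count, int range)
{
  if (count == 0)
    return 0;   /* punt, corresponding to find_bswap_or_nop returning NULL */
  /* Both shift amounts are now in [1, range - 1], so well defined.  */
  *tmp_n = *tmp_n >> count | *tmp_n << (range - count);
  return 1;
}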
2025-08-01 Jakub Jelinek <jakub@redhat.com>
PR middle-end/121322
* gimple-ssa-store-merging.cc (find_bswap_or_nop): Return NULL if
count is 0.
Jakub Jelinek [Fri, 11 Jul 2025 11:43:58 +0000 (13:43 +0200)]
testsuite: Add testcase for already fixed PR [PR120954]
This was a regression introduced by r16-1893 (and its backports) for C++,
though for C it had emitted a false positive warning for years. Fixed by r16-2000
(and its backports).
2025-07-11 Jakub Jelinek <jakub@redhat.com>
PR c++/120954
* c-c++-common/Warray-bounds-11.c: New test.
Jakub Jelinek [Fri, 4 Jul 2025 05:50:12 +0000 (07:50 +0200)]
c-family: Tweak ptr +- (expr +- cst) FE optimization [PR120837]
The following testcase is miscompiled with -fsanitize=undefined, but we
introduce UB into the IL even without that flag.
The optimization of ptr +- (expr +- cst), when expr/cst have undefined
overflow, into (ptr +- cst) +- expr is sometimes simply not valid:
without careful analysis of what ptr points to we don't know if it
is valid to do (ptr +- cst) pointer arithmetic.
E.g. on the testcase, ptr points to the start of an array (actually
conditionally one or another) and cst is -1, so ptr - 1 is invalid
pointer arithmetic, while ptr + (expr - 1) can be valid if expr
is at runtime always > 1 and smaller than the size of the array ptr
points to + 1.
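A hedged illustration (not the PR testcase) of why the reassociation is
unsafe:

int a[8];

int *
f (int *ptr, int i)     /* assume ptr == &a[0] and 1 <= i <= 8 at runtime */
{
  /* Valid: the result stays within the array (or one past its end).
     The old folding could turn this into (ptr - 1) + i, and ptr - 1 is
     already undefined pointer arithmetic when ptr points to a[0].  */
  return ptr + (i - 1);
}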
Unfortunately, removing this 1992-ish optimization altogether causes
FAIL: c-c++-common/restrict-2.c -Wc++-compat scan-tree-dump-times lim2 "Moving statement" 11
FAIL: gcc.dg/tree-ssa/copy-headers-5.c scan-tree-dump ch2 "is now do-while loop"
FAIL: gcc.dg/tree-ssa/copy-headers-5.c scan-tree-dump-times ch2 " if " 3
FAIL: gcc.dg/vect/pr57558-2.c scan-tree-dump vect "vectorized 1 loops"
FAIL: gcc.dg/vect/pr57558-2.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
regressions (restrict-2.c also for C++ in all std modes). I've been
thinking about some match.pd optimization for signed integer
addition/subtraction of a constant followed by a widening integral
conversion followed by a multiplication or left shift, but that
wouldn't help 32-bit arches.
So, instead, at least for now, the following patch keeps doing the
optimization, it just doesn't perform it in pointer arithmetic.
pointer_int_sum itself actually adds the multiplication by size_exp,
so ptr + expr is turned into ptr p+ expr * size_exp;
this patch will therefore try to optimize
ptr + (expr +- cst)
into
ptr p+ ((sizetype)expr * size_exp +- (sizetype)cst * size_exp)
and
ptr - (expr +- cst)
into
ptr p+ -((sizetype)expr * size_exp +- (sizetype)cst * size_exp)
2025-07-04 Jakub Jelinek <jakub@redhat.com>
PR c/120837
* c-common.cc (pointer_int_sum): Rewrite the intop PLUS_EXPR or
MINUS_EXPR optimization into extension of both intop operands,
their separate multiplication and then addition/subtraction followed
by rest of pointer_int_sum handling after the multiplication.
Jakub Jelinek [Tue, 1 Jul 2025 17:37:39 +0000 (19:37 +0200)]
testsuite: Fix up gcc.target/powerpc/builtin_altivec_tr_stxvr_runnable.c test (test UB) [PR120919]
In my reading of the test and the instructions emitted by the
builtins, it invokes UB 4 times, each time overwriting one byte
after some variable (sc, then ss, then si and then sll).
If we are lucky, like at -O0 -mcpu=power10, there is just padding
there or something else that doesn't make the test fail; if unlucky,
like with -O0 -mcpu=power10 -fstack-protector-strong,
&sc + 1 == &expected_sc
and so it overwrites the expected_sc variable.
The test fails when testing with
RUNTESTFLAGS="--target_board=unix/'{,-fstack-protector-strong}'"
on power10.
The following patch fixes that by using arrays of 2 elements, so that
the 1-byte overwrite stays within the same variable.
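A hedged sketch of the shape of the fix (not the exact test source),
assuming the sc/ss/si/sll variables and checks of the original test:

signed char sc[2] = { 0 };   /* was: signed char sc; */
short ss[2] = { 0 };         /* was: short ss;       */
int si[2] = { 0 };           /* was: int si;         */
long long sll[2] = { 0 };    /* was: long long sll;  */
/* Only sc[0], ss[0], si[0] and sll[0] are initialized and compared; the
   second element just absorbs the extra byte written by the builtin.  */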
2025-07-01 Jakub Jelinek <jakub@redhat.com>
PR testsuite/120919
* gcc.target/powerpc/builtin_altivec_tr_stxvr_runnable.c (main): Change
sc, ss, si and sll vars from scalars to arrays of 2 elements,
initialize and test just the first one though.
Jason Merrill [Fri, 1 Aug 2025 02:58:43 +0000 (22:58 -0400)]
c++: constexpr, array, private ctor [PR120800]
Here cxx_eval_vec_init_1 wants to recreate the default constructor call that
we previously built and threw away in build_vec_init_elt, but we aren't in
the same access context at this point. Since we already checked access,
let's just suppress access control here.
Redoing overload resolution at constant evaluation time is sketchy, but
should usually be fine for a default/copy constructor.
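A hedged sketch of the shape of code involved (not the committed
testcase): the element type's default constructor is private but
reachable through friendship, and constant evaluation of the array init
re-creates that constructor call outside the befriended context.

class A
{
  constexpr A () : x (0) {}   /* private default ctor */
  int x;
  friend struct B;
};

struct B
{
  A a[2];
  constexpr B () : a () {}    /* value-init of the array elements */
};

constexpr B b;                /* forces constexpr evaluation of B::B() */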
aarch64: Prevent streaming-compatible code from assembler rejection [PR121028]
Streaming-compatible functions can be compiled without SME enabled, but need
to use "SMSTART SM" and "SMSTOP SM" to temporarily switch into the streaming
state of a callee. These switches are conditional on the current mode being
opposite to the target mode, so no SME instructions are executed if SME is not
available.
However, in GAS, "SMSTART SM" and "SMSTOP SM" always require +sme. A call
from a streaming-compatible function, compiled without SME enabled, to a
non-streaming function will be rejected as:
Error: selected processor does not support `smstop sm'
To work around this, we make use of the .inst directive to insert the literal
encodings of "SMSTART SM" and "SMSTOP SM".
gcc/ChangeLog:
PR target/121028
* config/aarch64/aarch64-sme.md (aarch64_smstart_sm): Use the .inst
directive if !TARGET_SME.
(aarch64_smstop_sm): Likewise.
gcc/testsuite/ChangeLog:
PR target/121028
* gcc.target/aarch64/sme/call_sm_switch_1.c: Tell
check-function-bodies not to ignore .inst directives, and replace the
test for "smstart sm" with one for its encoding.
* gcc.target/aarch64/sme/call_sm_switch_11.c: Likewise.
* gcc.target/aarch64/sme/pr121028.c: New test.
testsuite: Fix gcc.target/powerpc/vsx-builtin-7.c test [PR119382]
The test vsx-builtin-7.c failed on powerpc64le-linux due to Identical
Code Folding (ICF) merging the functions insert_di_0_v2 and insert_di_0.
This behavior was introduced by commit r15-7961-gdc47161c1f32c3, which
enhanced alias analysis in ao_compare::compare_ao_refs, enabling the
compiler to identify and optimize structurally identical functions. As a
result, the compiler replaced insert_di_0_v2 with a tail call to
insert_di_0, altering the expected test behavior.
This patch adds -fno-ipa-icf to the test's dg-options to disable ICF,
avoiding function merging and ensuring the test executes correctly.
disabled transformation from "movq $-1,reg" to "pushq $-1; popq reg" for
-Oz. But for legacy integer registers, the former is 4 bytes and the
latter is 3 bytes. Enable such a transformation for -Oz.
gcc/
PR target/120427
* config/i386/i386.md (peephole2): Transform "movq $-1,reg" to
"pushq $-1; popq reg" for -Oz if reg is a legacy integer register.
gcc/testsuite/
PR target/120427
* gcc.target/i386/pr120427-5.c: New test.
Eliminate redundant vpextrq/vpinsrq when moving TI to V4SI.
r14-1902-g96c3539f2a3813 split a TImode move into 2 DImode moves; it's
supposed to optimize TImode in parameters/returns since, according to
the psABI, it's stored in 2 general registers.
But when TImode is not in a parameter/return, it can create redundancy,
as shown in the PR.
The termio ioctls are no longer used after commit 59978b21ad9c
("[sanitizer_common] Remove interceptors for deprecated struct termio
(#137403)"), so remove them. This fixes the following build error:
../../../../libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp:765:27: error: invalid application of ‘sizeof’ to incomplete type ‘__sanitizer::termio’
765 | unsigned IOCTL_TCGETA = TCGETA;
| ^~~~~~
../../../../libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp:769:27: error: invalid application of ‘sizeof’ to incomplete type ‘__sanitizer::termio’
769 | unsigned IOCTL_TCSETA = TCSETA;
| ^~~~~~
../../../../libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp:770:28: error: invalid application of ‘sizeof’ to incomplete type ‘__sanitizer::termio’
770 | unsigned IOCTL_TCSETAF = TCSETAF;
| ^~~~~~~
../../../../libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp:771:28: error: invalid application of ‘sizeof’ to incomplete type ‘__sanitizer::termio’
771 | unsigned IOCTL_TCSETAW = TCSETAW;
| ^~~~~~~
x86: PR target/103773: Fix wrong-code with -Oz from pop to memory.
added "*mov<mode>_and" and extended "*mov<mode>_or" to transform
"mov $0,mem" to the shorter "and $0,mem" and "mov $-1,mem" to the shorter
"or $-1,mem" for -Oz. But the new pattern:
aren't guarded for -Oz. As a result, "and $0,mem" and "or $-1,mem" are
generated without -Oz.
1. Change "*mov<mode>_and" to define_insn_and_split and split it to
"mov $0,mem" if not -Oz.
2. Change "*mov<mode>_or" to define_insn_and_split and split it to
"mov $-1,mem" if not -Oz.
3. Don't transform "mov $-1,reg" to "push $-1; pop reg" for -Oz since it
should be transformed to "or $-1,reg".
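A hedged source-level illustration of the size trade-off behind the -Oz
guard (the byte counts are my reading of the encodings; the pr120427-*.c
tests are authoritative). Storing 0 or -1 to memory can use the shorter
read-modify-write forms only under -Oz, since and/or also read the
destination:

void
store_consts (int *p, int *q)
{
  *p = 0;    /* -Oz: andl $0, (%rdi), 3 bytes; else movl $0, (%rdi), 6 bytes  */
  *q = -1;   /* -Oz: orl $-1, (%rsi), 3 bytes; else movl $-1, (%rsi), 6 bytes */
}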
gcc/
PR target/120427
* config/i386/i386.md (*mov<mode>_and): Changed to
define_insn_and_split. Split it to "mov $0,mem" if not -Oz.
(*mov<mode>_or): Changed to define_insn_and_split. Split it
to "mov $-1,mem" if not -Oz.
(peephole2): Don't transform "mov $-1,reg" to "push $-1; pop reg"
for -Oz since it will be transformed to "or $-1,reg".
gcc/testsuite/
PR target/120427
* gcc.target/i386/cold-attribute-4.c: Compile with -Oz.
* gcc.target/i386/pr120427-1.c: New test.
* gcc.target/i386/pr120427-2.c: Likewise.
* gcc.target/i386/pr120427-3.c: Likewise.
* gcc.target/i386/pr120427-4.c: Likewise.
Xi Ruoyao [Mon, 14 Jul 2025 19:01:12 +0000 (03:01 +0800)]
LoongArch: Fix wrong code generated by TARGET_VECTORIZE_VEC_PERM_CONST [PR121064]
When TARGET_VECTORIZE_VEC_PERM_CONST is called, target may be the
same pseudo as op0 and/or op1. Loading the selector into target
would clobber the input, producing wrong code like
vld $vr0, $t0
vshuf.w $vr0, $vr0, $vr1
So don't load the selector into d->target; use a new pseudo to hold the
selector instead. The reload pass will load the pseudo for the selector
and the pseudo for the target into the same hard register (following our
constraint '0' on the shuf instructions) anyway.
gcc/ChangeLog:
PR target/121064
* config/loongarch/lsx.md (lsx_vshuf_<lsxfmt_f>): Add '@' to
generate a mode-aware helper. Use <VIMODE> as the mode of
operand 1 (the selector).
* config/loongarch/lasx.md (lasx_xvshuf_<lasxfmt_f>): Likewise.
* config/loongarch/loongarch.cc
(loongarch_try_expand_lsx_vshuf_const): Create a new pseudo for
the selector. Use the mode-aware helper to simplify the code.
(loongarch_expand_vec_perm_const): Likewise.
gcc/testsuite/ChangeLog:
PR target/121064
* gcc.target/loongarch/pr121064.c: New test.
aarch64: Tweak handling of general SVE permutes [PR121027]
This PR is partly about a code quality regression that was triggered
by g:caa7a99a052929d5970677c5b639e1fa5166e334. That patch taught the
gimple optimisers to fold two VEC_PERM_EXPRs into one, conditional
upon either (a) the original permutations not being "native" operations
or (b) the combined permutation being a "native" operation.
Whether something is a "native" operation is tested by calling
can_vec_perm_const_p with allow_variable_p set to false. This requires
the permutation to be supported directly by TARGET_VECTORIZE_VEC_PERM_CONST,
rather than falling back to the general vec_perm optab.
This exposed a problem with the way that we handled general 2-input
permutations for SVE. Unlike Advanced SIMD, base SVE does not have
an instruction to do general 2-input permutations. We do still implement
the vec_perm optab for SVE, but only when the vector length is known at
compile time. The general expansion is pretty expensive: an AND, a SUB,
two TBLs, and an ORR. It certainly couldn't be considered a "native"
operation.
However, if a VEC_PERM_EXPR has a constant selector, the indices can
be wider than the elements being permuted. This is not true for the
vec_perm optab, where the indices and permuted elements must have the
same precision.
This leads to one case where we cannot leave a general 2-input permutation
to be handled by the vec_perm optab: when permuting bytes on a target
with 2048-bit vectors. In that case, the indices of the elements in
the second vector are in the range [256, 511], which cannot be stored
in a byte index.
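A hedged bit of arithmetic behind that one remaining case (only the
numbers from the paragraph above):

void
byte_permute_indices (void)
{
  unsigned vl_bits = 2048;
  unsigned elems_per_input = vl_bits / 8;        /* 256 byte elements       */
  unsigned max_index = 2 * elems_per_input - 1;  /* 511, does not fit in a
                                                    byte element (max 255)  */
  (void) max_index;
}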
TARGET_VECTORIZE_VEC_PERM_CONST therefore has to handle 2-input SVE
permutations for one specific case. Rather than check for that
specific case, the code went ahead and used the vec_perm expansion
whenever it worked. But that undermines the !allow_variable_p
handling in can_vec_perm_const_p; it becomes impossible for
target-independent code to distinguish "native" operations from
the worst-case fallback.
This patch instead limits TARGET_VECTORIZE_VEC_PERM_CONST to the
cases that it has to handle. It fixes the PR for all vector lengths
except 2048 bits.
A better fix would be to introduce some sort of costing mechanism,
which would allow us to reject the new VEC_PERM_EXPR even for
2048-bit targets. But that would be a significant amount of work
and would not be backportable.
gcc/
PR target/121027
* config/aarch64/aarch64.cc (aarch64_evpc_sve_tbl): Punt on 2-input
operations that can be handled by vec_perm.
gcc/testsuite/
PR target/121027
* gcc.target/aarch64/sve/acle/general/perm_1.c: New test.
aarch64: Fix general permutes of svbfloat16_ts [PR121027]
Testing gcc.target/aarch64/sve/permute_2.c without the associated GCC
patch triggered an unrecognisable insn ICE for the svbfloat16_t tests.
This was because the implementation of general two-vector permutes
requires two TBLs and an ORR, with the ORR being represented as an
unspec for floating-point modes. The associated pattern did not
cover VNx8BF.
gcc/
PR target/121027
* config/aarch64/iterators.md (SVE_I): Move further up file.
(SVE_F): New mode iterator.
(SVE_ALL): Redefine in terms of SVE_I and SVE_F.
* config/aarch64/aarch64-sve.md (*<LOGICALF:optab><mode>3): Extend
to all SVE_F.
gcc/testsuite/
PR target/121027
* gcc.target/aarch64/sve/permute_5.c: New test.
aarch64: Fix ZIP1 order in aarch64_expand_vector_init [PR118891]
aarch64_expand_vector_init contains some divide-and-conquer code
that tries to load the odd and even elements into 64-bit registers
and then ZIP them together. On big-endian targets, the even elements
are more significant than the odd elements and so should come second
in the ZIP.
This fixes many execution failures on aarch64_be-elf, including
gcc.c-torture/execute/pr28982a.c.
gcc/
PR target/118891
* config/aarch64/aarch64.cc (aarch64_expand_vector_init): Fix the
ZIP1 operand order for big-endian targets.
aarch64: Fix neon-sve-bridge.c failures for big-endian
Lowpart subregs are generally disallowed on big-endian SVE vector
registers, since the first memory element is stored at the least
significant end of the register, rather than the most significant end.
(See the comment at the head of aarch64-sve.md for details,
and aarch64_modes_compatible_p for the implementation.)
This means that arm_neon_sve_bridge.h needs to use custom define_insns
for big-endian targets, in lieu of using lowpart subregs. However,
one of those define_insns relied on the prohibited lowparts internally,
to convert an Advanced SIMD register to an SVE register. Since the
lowpart is not allowed, the lowpart_subreg would return null, leading
to a later ICE.
The simplest fix seems to be to use %Z instead, to force the Advanced
SIMD register to be written as an SVE register.
gcc/
* config/aarch64/aarch64-sve.md (@aarch64_sve_set_neonq_<mode>):
Use %Z instead of lowpart_subreg. Tweak formatting.
vect: Fix VEC_WIDEN_PLUS_HI/LO choice for big-endian [PR118891]
In the tree codes and optabs, the "hi" in a vector hi/lo pair means
"most significant" and the "lo" means "least significant", with
significance following GCC's normal endian expectations. Thus on
big-endian targets, the hi part handles the first half of the elements
in memory order and the lo part handles the second half.
For tree codes, supportable_widening_operation first chooses hi/lo
pairs based on little-endian order and then uses:
if (BYTES_BIG_ENDIAN && c1 != VEC_WIDEN_MULT_EVEN_EXPR)
std::swap (c1, c2);
to adjust. However, the handling for internal functions was missing
an equivalent fixup. This led to several execution failures in vect.exp
on aarch64_be-elf.
If the hi/lo code fails, the internal function handling goes on to try
even/odd. But I couldn't see anything obvious that would put the even/
odd results back into the right order later, so there might be a latent
bug there too.
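A hedged sketch of the missing fixup, mirroring the tree-code swap quoted
above (the local names are hypothetical; the committed change in
supportable_widening_operation is authoritative):
if (BYTES_BIG_ENDIAN)
  std::swap (ifn_hi, ifn_lo);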
gcc/
PR tree-optimization/118891
* tree-vect-stmts.cc (supportable_widening_operation): Swap the
hi and lo internal functions on big-endian targets.
Jason Merrill [Sat, 26 Jul 2025 00:49:17 +0000 (20:49 -0400)]
c++: constexpr uninitialized union [PR120577]
This was failing for two reasons:
1) We were wrongly treating the basic_string constructor as
zero-initializing the object, which it doesn't.
2) Given that, when we went to look for a value for the anonymous union,
we concluded that it was value-initialized, and trying to evaluate that
broke because we weren't setting ctx->ctor for it.
This patch fixes both issues, #1 by setting CONSTRUCTOR_NO_CLEARING and #2
by inserting a new CONSTRUCTOR for the member rather than evaluating it out
of context, which is consistent with cxx_eval_store_expression.
PR c++/120577
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_call_expression): Set
CONSTRUCTOR_NO_CLEARING on initial value for ctor.
(cxx_eval_component_reference): Make value-initialization
of an aggregate member explicit.
Jakub Jelinek [Fri, 18 Jul 2025 07:20:30 +0000 (09:20 +0200)]
gimple-fold: Fix up big endian _BitInt adjustment [PR121131]
The following testcase ICEs because SCALAR_INT_TYPE_MODE of course
doesn't work for large BITINT_TYPE types, which have BLKmode.
native_encode* as well as e.g. r14-8276 use
GET_MODE_SIZE (SCALAR_INT_TYPE_MODE ()) in cases like these and
TREE_INT_CST_LOW (TYPE_SIZE_UNIT ()) for the BLKmode ones.
In this case we want bits rather than bytes, so I've used
GET_MODE_BITSIZE as before and TYPE_SIZE for the BLKmode case.
Furthermore, the patch only computes encoding_size for big-endian
targets; for little endian we don't really adjust anything, so there
is no point computing it.
2025-07-18 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/121131
* gimple-fold.cc (fold_nonarray_ctor_reference): Use
TREE_INT_CST_LOW (TYPE_SIZE ()) instead of
GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE ()) for BLKmode BITINT_TYPEs.
Don't compute encoding_size at all for little endian targets.
H.J. Lu [Thu, 3 Jul 2025 02:54:39 +0000 (10:54 +0800)]
x86-64: Add RDI clobber to 64-bit dynamic TLS patterns
*tls_global_dynamic_64_largepic, *tls_local_dynamic_64_<mode> and
*tls_local_dynamic_base_64_largepic use RDI as the __tls_get_addr
argument. Add RDI clobber to these patterns to show it.
gcc/
PR target/120908
* config/i386/i386.cc (legitimize_tls_address): Pass RDI to
gen_tls_local_dynamic_64.
* config/i386/i386.md (*tls_global_dynamic_64_largepic): Add
RDI clobber and use it to generate LEA.
(*tls_local_dynamic_64_<mode>): Likewise.
(*tls_local_dynamic_base_64_largepic): Likewise.
(@tls_local_dynamic_64_<mode>): Add a clobber.
gcc/testsuite/
PR target/120908
* gcc.target/i386/pr120908.c: New test.
H.J. Lu [Tue, 1 Jul 2025 09:17:06 +0000 (17:17 +0800)]
x86-64: Add RDI clobber to tls_global_dynamic_64 patterns
*tls_global_dynamic_64_<mode> uses RDI as the __tls_get_addr argument.
Add RDI clobber to tls_global_dynamic_64 patterns to show it.
PR target/120908
* config/i386/i386.cc (legitimize_tls_address): Pass RDI to
gen_tls_global_dynamic_64.
* config/i386/i386.md (*tls_global_dynamic_64_<mode>): Add RDI
clobber and use it to generate LEA.
(@tls_global_dynamic_64_<mode>): Add a clobber.
i386: Remove KEYLOCKER related feature since Panther Lake and Clearwater Forest
According to the July 2025 SDM, Key Locker will no longer be supported on
hardware from 2025 onwards. This means that for Panther Lake and
Clearwater Forest the feature will not be enabled. Remove it from those
two platforms.
Eric Botcazou [Thu, 1 May 2025 23:30:56 +0000 (01:30 +0200)]
ada: Fix missing error on too large Component_Size not multiple of storage unit
This is a small regression introduced a few years ago.
gcc/ada/ChangeLog:
* gcc-interface/decl.cc (gnat_to_gnu_component_type): Validate the
Component_Size like the size of a type only if the component type
is actually packed.
Eric Botcazou [Tue, 27 May 2025 11:32:18 +0000 (13:32 +0200)]
ada: Fix wrong conversion of controlled array with representation change
The problem is that a temporary is created for the conversion because of the
representation change, and it is finalized without having been initialized.
gcc/ada/ChangeLog:
* exp_ch4.adb (Handle_Changed_Representation): Alphabetize local
variables. Set the No_Finalize_Actions flag on the assignment.
Jakub Jelinek [Tue, 1 Jul 2025 13:28:10 +0000 (15:28 +0200)]
c++: Fix up cp_build_array_ref COND_EXPR handling [PR120471]
The following testcase has been miscompiled since the introduction of
UBSan: cp_build_array_ref COND_EXPR handling replaces
(cond ? a : b)[idx] with cond ? a[idx] : b[idx], but if there are
SAVE_EXPRs inside idx, they will be evaluated in just one of the
branches and the other branch uses uninitialized temporaries.
This is fixed by keeping doing what it did if idx doesn't have side
effects and is invariant. Otherwise, if op1/op2 are ARRAY_TYPE arrays
with invariant addresses or pointers with invariant values, use
SAVE_EXPR <op0>, SAVE_EXPR <idx>, SAVE_EXPR <op0> as a new condition
and SAVE_EXPR <idx> instead of idx for the recursive calls.
Otherwise punt, but if op1/op2 are ARRAY_TYPE, furthermore call
cp_default_conversion on the array, so that a COND_EXPR with ARRAY_TYPE
doesn't survive in the IL until expansion.
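A hedged sketch (not the committed tests) of the shape being handled;
the index comes from a call, so it is neither side-effect free nor
invariant:

extern int a[4], b[4];
extern int g ();        /* hypothetical side-effecting index expression */

int
f (bool cond)
{
  /* The old rewrite produced cond ? a[<idx>] : b[<idx>] shaped trees,
     with any SAVE_EXPRs inside the index evaluated on only one branch.  */
  return (cond ? a : b)[g ()];
}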
2025-07-01 Jakub Jelinek <jakub@redhat.com>
PR c++/120471
gcc/cp/
* typeck.cc (cp_build_array_ref) <case COND_EXPR>: If idx is not
INTEGER_CST, don't optimize the case (but cp_default_conversion on
array early if it has ARRAY_TYPE) or use
SAVE_EXPR <op0>, SAVE_EXPR <idx>, SAVE_EXPR <op0> as new op0 depending
on flag_strong_eval_order and whether op1 and op2 are arrays with
invariant address or tree invariant pointers. Formatting fixes.
gcc/testsuite/
* g++.dg/ubsan/pr120471.C: New test.
* g++.dg/parse/pr120471.C: New test.
aarch64: Incorrect removal of ZA restore [PR120624]
The PCS defines a lazy save scheme for managing ZA across normal
"private-ZA" functions. GCC currently uses this scheme for calls
to all private-ZA functions (rather than using caller-save).
Therefore, before a sequence of calls to private-ZA functions, GCC emits
code to set up a lazy save. After the sequence of calls, GCC emits code
to check whether the lazy save was committed and restore the ZA contents
if so.
These sequences are emitted by the mode-switching pass, in an attempt
to reduce the number of redundant saves and restores.
The lazy save scheme also means that, before a function can use ZA,
it must first conditionally store the old contents of ZA to the caller's
lazy save buffer, if any.
This all creates some relatively complex dependencies between
setup code, save/restore code, and normal reads from and writes to ZA.
These dependencies are modelled using special fake hard registers:
;; Sometimes we use placeholder instructions to mark where later
;; ABI-related lowering is needed. These placeholders read and
;; write this register. Instructions that depend on the lowering
;; read the register.
(LOWERING_REGNUM 87)
;; Represents the contents of the current function's TPIDR2 block,
;; in abstract form.
(TPIDR2_BLOCK_REGNUM 88)
;; Holds the value that the current function wants PSTATE.ZA to be.
;; The actual value can sometimes vary, because it does not track
;; changes to PSTATE.ZA that happen during a lazy save and restore.
;; Those effects are instead tracked by ZA_SAVED_REGNUM.
(SME_STATE_REGNUM 89)
;; Instructions write to this register if they set TPIDR2_EL0 to a
;; well-defined value. Instructions read from the register if they
;; depend on the result of such writes.
;;
;; The register does not model the architected TPIDR2_EL0, just the
;; current function's management of it.
(TPIDR2_SETUP_REGNUM 90)
;; Represents the property "has an incoming lazy save been committed?".
(ZA_FREE_REGNUM 91)
;; Represents the property "are the current function's ZA contents
;; stored in the lazy save buffer, rather than in ZA itself?".
(ZA_SAVED_REGNUM 92)
;; Represents the contents of the current function's ZA state in
;; abstract form. At various times in the function, these contents
;; might be stored in ZA itself, or in the function's lazy save buffer.
;;
;; The contents persist even when the architected ZA is off. Private-ZA
;; functions have no effect on its contents.
(ZA_REGNUM 93)
Every normal read from ZA and write to ZA depends on SME_STATE_REGNUM,
in order to sequence the code with the initial setup of ZA and
with the lazy save scheme.
The code to restore ZA after a call involves several instructions,
including conditional control flow. It is initially represented as
a single define_insn and is split late, after shrink-wrapping and
prologue/epilogue insertion.
The split form of the restore instruction includes a conditional call
to __arm_tpidr2_restore.
The write to SME_STATE_REGNUM indicates the end of the region where
ZA_REGNUM might differ from the real contents of ZA. In other words,
it is the point at which normal reads from ZA and writes to ZA
can safely take place.
To finally get to the point, the problem in this PR was that the
unsplit aarch64_restore_za pattern was missing this change to
SME_STATE_REGNUM. It could therefore be deleted as dead before
it had a chance to be split. The split form had the correct dataflow,
but the unsplit form didn't.
Unfortunately, the tests for this code tended to use calls and asms
to model regions of ZA usage, and those don't seem to be affected
in the same way.
gcc/
PR target/120624
* config/aarch64/aarch64.md (SME_STATE_REGNUM): Expand on comments.
* config/aarch64/aarch64-sme.md (aarch64_restore_za): Also set
SME_STATE_REGNUM.
gcc/testsuite/
PR target/120624
* gcc.target/aarch64/sme/za_state_7.c: New test.
Eric Botcazou [Fri, 27 Jun 2025 21:47:49 +0000 (23:47 +0200)]
Fix misoptimization of CONSTRUCTOR with reverse SSO
fold_ctor_reference already punts on a CONSTRUCTOR whose type has reverse
storage order, but it can be invoked in a couple of places on a CONSTRUCTOR
with native storage order that has been wrapped in a VIEW_CONVERT_EXPR to a
type with reverse storage order; this would require a post adjustment that
does not currently exist, thus yielding wrong code for this admittedly quite
pathological (but supported) case.
gcc/
* gimple-fold.cc (fold_const_aggregate_ref_1) <COMPONENT_REF>:
Bail out immediately if the reference has reverse storage order.
* tree-ssa-sccvn.cc (fully_constant_vn_reference_p): Likewise.
gcc/testsuite/
* gnat.dg/sso20.adb: New test.
Haochen Jiang [Tue, 17 Jun 2025 06:08:38 +0000 (14:08 +0800)]
i386: Remove CLDEMOTE for clients
CLDEMOTE is not enabled on clients according to the SDM. The SDM only
mentions that it will be enabled on Xeon and Atom servers, not clients.
Remove it starting from Alder Lake (where it was introduced).
gcc/ChangeLog:
* config/i386/i386.h (PTA_ALDERLAKE): Use PTA_GOLDMONT_PLUS
as base to remove PTA_CLDEMOTE.
(PTA_SIERRAFOREST): Add PTA_CLDEMOTE since PTA_ALDERLAKE
does not include that anymore.
* doc/invoke.texi: Update texi file.
Richard Biener [Wed, 11 Sep 2024 11:54:33 +0000 (13:54 +0200)]
tree-optimization/116674 - vectorizable_simd_clone_call and re-analysis
When SLP analysis scraps an instance because it fails to analyze, we
can end up calling vectorizable_* in analysis mode again on a node that
was already analyzed during the analysis of that instance.
vectorizable_simd_clone_call wasn't expecting that and instead
guarded analysis/transform code on populated data structures.
The following change makes it survive re-analysis.
PR tree-optimization/116674
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Support
re-analysis.
Jakub Jelinek [Thu, 19 Jun 2025 06:57:27 +0000 (08:57 +0200)]
dfp: Further decimal_real_to_integer fixes [PR120631]
Unfortunately, the following further testcase shows that the problems
aren't limited to very large precisions and large exponents, but affect
pretty much anything larger than 64 bits. After all, before _BitInt
support dfp didn't even have {,unsigned }__int128 <-> _Decimal{32,64,128,64x}
support, and the testcase again shows some of the conversions yielding
zeros, while the pr120631.c test worked even without the earlier patch.
So, this patch assumes at most 64-bit precision is ok and for anything
larger it just uses exponent 0 and multiplies afterwards.
2025-06-19 Jakub Jelinek <jakub@redhat.com>
PR middle-end/120631
* dfp.cc (decimal_real_to_integer): Use result multiplication not just
when precision > 128 and dn.exponent > 19, but when precision > 64
and dn.exponent > 0.
* gcc.dg/dfp/bitint-10.c: New test.
* gcc.dg/dfp/pr120631.c: New test.
Jakub Jelinek [Wed, 18 Jun 2025 06:07:22 +0000 (08:07 +0200)]
dfp, real: Fix up FLOAT_EXPR/FIX_TRUNC_EXPR constant folding between dfp and large _BitInt [PR120631]
The following testcase shows that while at runtime we handle conversions
between _Decimal{64,128} and large _BitInt correctly, at compile time we
mishandle them in both directions: in one direction we end up with an ICE
in a decimal_from_integer callee because the char buffer is too short for
the needed number of decimal digits, and in the conversion of dfp to large
_BitInt we return 0 in the wide_int.
The following patch fixes the ICE by using a larger buffer (XALLOCAVEC
allocated, it will never be larger than 65536 / 3 bytes) in the larger
_BitInt case, and the other direction by setting the exponent to exp % 19
and instead multiplying the result by the needed powers of 10^19 (10^19
chosen as the largest power of ten that fits into a UHWI).
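A hedged numeric illustration of that splitting (the variable names are
mine, not dfp.cc's):

void
split_exponent (void)
{
  unsigned long long p10_19 = 10000000000000000000ULL;  /* 10^19 < 2^64 */
  int exp = 40;                 /* example decimal exponent             */
  int small_exp = exp % 19;     /* 2: kept in the conversion itself     */
  int extra_mults = exp / 19;   /* 2: later multiplications of the wide
                                   result by 10^19                      */
  (void) p10_19; (void) small_exp; (void) extra_mults;
}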
2025-06-18 Jakub Jelinek <jakub@redhat.com>
PR middle-end/120631
* real.cc (decimal_from_integer): Add digits argument, if larger than
256, use XALLOCAVEC allocated buffer.
(real_from_integer): Pass val_in's precision divided by 3 to
decimal_from_integer.
* dfp.cc (decimal_real_to_integer): For precision > 128 if finite
and exponent is large, decrease exponent and multiply resulting
wide_int by powers of 10^19.