Eric Botcazou [Thu, 1 May 2025 23:30:56 +0000 (01:30 +0200)]
ada: Fix missing error on too large Component_Size not multiple of storage unit
This is a small regression introduced a few years ago.
gcc/ada/ChangeLog:
* gcc-interface/decl.cc (gnat_to_gnu_component_type): Validate the
Component_Size like the size of a type only if the component type
is actually packed.
Eric Botcazou [Tue, 27 May 2025 11:32:18 +0000 (13:32 +0200)]
ada: Fix wrong conversion of controlled array with representation change
The problem is that a temporary is created for the conversion because of the
representation change, and it is finalized without having been initialized.
gcc/ada/ChangeLog:
* exp_ch4.adb (Handle_Changed_Representation): Alphabetize local
variables. Set the No_Finalize_Actions flag on the assignment.
Jakub Jelinek [Tue, 1 Jul 2025 13:28:10 +0000 (15:28 +0200)]
c++: Fix up cp_build_array_ref COND_EXPR handling [PR120471]
The following testcase is miscompiled since the introduction of UBSan:
cp_build_array_ref's COND_EXPR handling replaces
(cond ? a : b)[idx] with cond ? a[idx] : b[idx], but if there are
SAVE_EXPRs inside of idx, they will be evaluated in just one of the
branches and the other branch uses uninitialized temporaries.
This is fixed by keeping the old behavior if idx doesn't have side effects
and is invariant. Otherwise, if op1/op2 are ARRAY_TYPE arrays with
invariant addresses or pointers with invariant values, use
SAVE_EXPR <op0>, SAVE_EXPR <idx>, SAVE_EXPR <op0> as a new condition
and SAVE_EXPR <idx> instead of idx for the recursive calls.
Otherwise punt, but if op1/op2 have ARRAY_TYPE, furthermore call
cp_default_conversion on array, so that a COND_EXPR with ARRAY_TYPE doesn't
survive in the IL until expansion.
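A hypothetical illustration of the problem (the committed pr120471.C tests
differ): an index with side effects is enough to get SAVE_EXPRs into idx.
// Sketch only; compile with -fsanitize=bounds.  The old rewrite could
// evaluate the SAVE_EXPRs for the index on only one COND_EXPR branch.
int a[4], b[4];

int
f (bool cond, int i)
{
  return (cond ? a : b)[i++];
}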
2025-07-01 Jakub Jelinek <jakub@redhat.com>
PR c++/120471
gcc/cp/
* typeck.cc (cp_build_array_ref) <case COND_EXPR>: If idx is not
INTEGER_CST, don't optimize the case (but cp_default_conversion on
array early if it has ARRAY_TYPE) or use
SAVE_EXPR <op0>, SAVE_EXPR <idx>, SAVE_EXPR <op0> as new op0 depending
on flag_strong_eval_order and whether op1 and op2 are arrays with
invariant address or tree invariant pointers. Formatting fixes.
gcc/testsuite/
* g++.dg/ubsan/pr120471.C: New test.
* g++.dg/parse/pr120471.C: New test.
aarch64: Incorrect removal of ZA restore [PR120624]
The PCS defines a lazy save scheme for managing ZA across normal
"private-ZA" functions. GCC currently uses this scheme for calls
to all private-ZA functions (rather than using caller-save).
Therefore, before a sequence of calls to private-ZA functions, GCC emits
code to set up a lazy save. After the sequence of calls, GCC emits code
to check whether the lazy save was committed and restore the ZA contents
if so.
These sequences are emitted by the mode-switching pass, in an attempt
to reduce the number of redundant saves and restores.
The lazy save scheme also means that, before a function can use ZA,
it must first conditionally store the old contents of ZA to the caller's
lazy save buffer, if any.
This all creates some relatively complex dependencies between
setup code, save/restore code, and normal reads from and writes to ZA.
These dependencies are modelled using special fake hard registers:
;; Sometimes we use placeholder instructions to mark where later
;; ABI-related lowering is needed. These placeholders read and
;; write this register. Instructions that depend on the lowering
;; read the register.
(LOWERING_REGNUM 87)
;; Represents the contents of the current function's TPIDR2 block,
;; in abstract form.
(TPIDR2_BLOCK_REGNUM 88)
;; Holds the value that the current function wants PSTATE.ZA to be.
;; The actual value can sometimes vary, because it does not track
;; changes to PSTATE.ZA that happen during a lazy save and restore.
;; Those effects are instead tracked by ZA_SAVED_REGNUM.
(SME_STATE_REGNUM 89)
;; Instructions write to this register if they set TPIDR2_EL0 to a
;; well-defined value. Instructions read from the register if they
;; depend on the result of such writes.
;;
;; The register does not model the architected TPIDR2_EL0, just the
;; current function's management of it.
(TPIDR2_SETUP_REGNUM 90)
;; Represents the property "has an incoming lazy save been committed?".
(ZA_FREE_REGNUM 91)
;; Represents the property "are the current function's ZA contents
;; stored in the lazy save buffer, rather than in ZA itself?".
(ZA_SAVED_REGNUM 92)
;; Represents the contents of the current function's ZA state in
;; abstract form. At various times in the function, these contents
;; might be stored in ZA itself, or in the function's lazy save buffer.
;;
;; The contents persist even when the architected ZA is off. Private-ZA
;; functions have no effect on its contents.
(ZA_REGNUM 93)
Every normal read from ZA and write to ZA depends on SME_STATE_REGNUM,
in order to sequence the code with the initial setup of ZA and
with the lazy save scheme.
The code to restore ZA after a call involves several instructions,
including conditional control flow. It is initially represented as
a single define_insn and is split late, after shrink-wrapping and
prologue/epilogue insertion.
The split form of the restore instruction includes a conditional call
to __arm_tpidr2_restore and a write to SME_STATE_REGNUM.
The write to SME_STATE_REGNUM indicates the end of the region where
ZA_REGNUM might differ from the real contents of ZA. In other words,
it is the point at which normal reads from ZA and writes to ZA
can safely take place.
To finally get to the point: the problem in this PR was that the
unsplit aarch64_restore_za pattern was missing this change to
SME_STATE_REGNUM. It could therefore be deleted as dead before
it had a chance to be split. The split form had the correct dataflow,
but the unsplit form didn't.
Unfortunately, the tests for this code tended to use calls and asms
to model regions of ZA usage, and those don't seem to be affected
in the same way.
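A hypothetical shape of the failure (the committed za_state_7.c differs;
this assumes ACLE SME keyword attributes, compiled with e.g.
-march=armv9-a+sme): ZA is written, a private-ZA call intervenes, and ZA is
read again, so the restore after the call must not be deleted as dead.
void private_za (void);                /* private-ZA function */
void out_za (void) __arm_out ("za");   /* writes ZA */
void in_za (void) __arm_in ("za");     /* reads ZA */

__arm_new ("za") void
test (void)
{
  out_za ();      /* define the ZA contents */
  private_za ();  /* a lazy save is set up around this call */
  in_za ();       /* ZA must be restored before this call */
}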
gcc/
PR target/120624
* config/aarch64/aarch64.md (SME_STATE_REGNUM): Expand on comments.
* config/aarch64/aarch64-sme.md (aarch64_restore_za): Also set
SME_STATE_REGNUM.
gcc/testsuite/
PR target/120624
* gcc.target/aarch64/sme/za_state_7.c: New test.
Eric Botcazou [Fri, 27 Jun 2025 21:47:49 +0000 (23:47 +0200)]
Fix misoptimization of CONSTRUCTOR with reverse SSO
fold_ctor_reference already punts on a CONSTRUCTOR whose type has reverse
storage order, but it can be invoked in a couple of places on a CONSTRUCTOR
with native storage order that has been wrapped in a VIEW_CONVERT_EXPR to a
type with reverse storage order; this would require a post adjustment that
does not currently exist, thus yielding wrong code for this admittedly quite
pathological (but supported) case.
gcc/
* gimple-fold.cc (fold_const_aggregate_ref_1) <COMPONENT_REF>:
Bail out immediately if the reference has reverse storage order.
* tree-ssa-sccvn.cc (fully_constant_vn_reference_p): Likewise.
gcc/testsuite/
* gnat.dg/sso20.adb: New test.
Haochen Jiang [Tue, 17 Jun 2025 06:08:38 +0000 (14:08 +0800)]
i386: Remove CLDEMOTE for clients
CLDEMOTE is not enabled on clients according to the SDM; the SDM only
mentions that it will be enabled on Xeon and Atom servers, not clients.
Remove it starting with Alder Lake (where it was introduced).
gcc/ChangeLog:
* config/i386/i386.h (PTA_ALDERLAKE): Use PTA_GOLDMONT_PLUS
as base to remove PTA_CLDEMOTE.
(PTA_SIERRAFOREST): Add PTA_CLDEMOTE since PTA_ALDERLAKE
does not include that anymore.
* doc/invoke.texi: Update texi file.
Richard Biener [Wed, 11 Sep 2024 11:54:33 +0000 (13:54 +0200)]
tree-optimization/116674 - vectorizable_simd_clone_call and re-analysis
When SLP analysis scraps an instance because it fails to analyze, we
can end up calling vectorizable_* in analysis mode again on a node that
was already analyzed during the analysis of that instance.
vectorizable_simd_clone_call wasn't expecting that and instead
guarded analysis/transform code on populated data structures.
The following changes it so that it survives re-analysis.
PR tree-optimization/116674
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Support
re-analysis.
Jakub Jelinek [Thu, 19 Jun 2025 06:57:27 +0000 (08:57 +0200)]
dfp: Further decimal_real_to_integer fixes [PR120631]
Unfortunately, the following further testcase shows that the problems aren't
limited to very large precisions and large exponents; pretty much anything
larger than 64 bits is affected. After all, before _BitInt support dfp
didn't even have {,unsigned }__int128 <-> _Decimal{32,64,128,64x} support,
and the testcase again shows some of the conversions yielding zeros,
while the pr120631.c test worked even without the earlier patch.
So, this patch assumes at most 64-bit precision is ok, and for anything
larger it just uses exponent 0 and multiplies afterwards.
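A hypothetical C illustration of the shape (the committed tests differ):
/* A compile-time conversion to a type wider than 64 bits could fold to 0
   when the decimal exponent was positive.  */
unsigned __int128
f (void)
{
  return (unsigned __int128) 1e20DD;  /* expect 10^20, not 0 */
}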
2025-06-19 Jakub Jelinek <jakub@redhat.com>
PR middle-end/120631
* dfp.cc (decimal_real_to_integer): Use result multiplication not just
when precision > 128 and dn.exponent > 19, but when precision > 64
and dn.exponent > 0.
gcc/testsuite/
* gcc.dg/dfp/bitint-10.c: New test.
* gcc.dg/dfp/pr120631.c: New test.
Jakub Jelinek [Wed, 18 Jun 2025 06:07:22 +0000 (08:07 +0200)]
dfp, real: Fix up FLOAT_EXPR/FIX_TRUNC_EXPR constant folding between dfp and large _BitInt [PR120631]
The following testcase shows that while at runtime we handle conversions
between _Decimal{64,128} and large _BitInt correctly, at compile time we
mishandle them in both directions, in one direction we end up in ICE in
decimal_from_integer callee because the char buffer is too short for the
needed number of decimal digits, in the conversion of dfp to large _BitInt
we return 0 in the wide_int.
The following patch fixes the ICE by using a larger buffer (XALLOCAVEC
allocated; it will never be larger than 65536 / 3 bytes) in the large
_BitInt case, and the other direction by setting the exponent to exp % 19
and instead multiplying the result by the needed powers of 10^19 (chosen
as the largest power of ten that can fit into a UHWI).
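A hypothetical C illustration of both directions (the committed testcase
differs):
_Decimal128
to_dec (void)
{
  /* Folding a large _BitInt constant to _Decimal128 ICEd because the
     digit buffer in decimal_from_integer was too short.  */
  return (_Decimal128) (((unsigned _BitInt(2048)) 1) << 2000);
}

unsigned _BitInt(2048)
from_dec (void)
{
  /* The reverse direction folded to 0 in the wide_int.  */
  return (unsigned _BitInt(2048)) 1e100DL;
}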
2025-06-18 Jakub Jelinek <jakub@redhat.com>
PR middle-end/120631
* real.cc (decimal_from_integer): Add digits argument, if larger than
256, use XALLOCAVEC allocated buffer.
(real_from_integer): Pass val_in's precision divided by 3 to
decimal_from_integer.
* dfp.cc (decimal_real_to_integer): For precision > 128 if finite
and exponent is large, decrease exponent and multiply resulting
wide_int by powers of 10^19.
Richard Earnshaw [Thu, 20 Mar 2025 14:42:59 +0000 (14:42 +0000)]
optabs: fix wrong code in expand_binop_directly [PR117811]
If expand_binop_directly fails to add a REG_EQUAL note it tries to
unwind and restart. But it can unwind too far if expand_binop changed
some of the operands before calling it. We don't need to unwind that
far anyway since we should end up taking exactly the same route next
time, just without a target rtx.
To fix this we remove LAST from the argument list and let the callers
(all in expand_binop) do their own unwinding if the call fails.
Instead we unwind just as far as the entry to expand_binop_directly
and recurse within this function instead of all the way back up.
gcc/ChangeLog:
PR middle-end/117811
* optabs.cc (expand_binop_directly): Remove LAST as an argument,
instead record the last insn on entry. Only delete insns if
we need to restart and restart by calling ourself, not expand_binop.
(expand_binop): Update callers to expand_binop_directly. If it
fails to expand the operation, delete back to LAST.
gcc/testsuite:
PR middle-end/117811
* gcc.dg/torture/pr117811.c: New test.
Jakub Jelinek [Thu, 12 Jun 2025 18:22:39 +0000 (20:22 +0200)]
recip: Reset range info when replacing sqrt with rsqrt [PR120638]
This pass reuses an SSA_NAME on the lhs of a sqrt etc. call as the lhs
of the .RSQRT etc. call. The following testcase is miscompiled since my
recent ranger cast changes, because we compute a (correct) range for the
sqrtf argument as well as the result, but then the recip pass keeps using
that range for the .RSQRT call, which returns 1. / sqrt, so the function
then returns 0.5f unconditionally.
Note, on foo this is a regression from GCC 15, but on bar it regressed
already with the r14-536 change.
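A hypothetical sketch of the shape of the miscompile (the committed foo/bar
tests differ; this assumes -Ofast so the division is turned into .RSQRT):
#include <math.h>

float
foo (float x)
{
  if (x < 4.0f)
    x = 4.0f;                   /* sqrtf (x) is now in [2, inf)      */
  float r = 1.0f / sqrtf (x);   /* real range of r is (0, 0.5]       */
  return r >= 0.5f ? 0.5f : r;  /* a stale [2, inf) range on r folds
                                   this to return 0.5f always        */
}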
2025-06-12 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/120638
* tree-ssa-math-opts.cc (pass_cse_reciprocals::execute): Call
reset_flow_sensitive_info on arg1.
Jakub Jelinek [Thu, 5 Jun 2025 13:47:19 +0000 (15:47 +0200)]
real: Fix up real_from_integer [PR120547]
The function has 2 problems; one is _BitInt specific and the other is
most likely also reproducible only with it.
The first issue is that I've missed updating the function for _BitInt:
maxbitlen as MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT
obviously isn't guaranteed to be larger than any integral type we might
want to convert at compile time from wide_int to REAL_VALUE_FORMAT.
Just using len instead of it works fine, at least when used after
HOST_BITS_PER_WIDE_INT is added to it and it is truncated to a multiple
of HOST_BITS_PER_WIDE_INT.
The other bug is that if the value has too many significant bits (formerly
maxbitlen - cnt_l_z, now len - cnt_l_z), the code just shifts it right and
adds the shift count to the future exponent. That isn't correct for
rounding, as the testcase attempts to show: the internal real format has
more bits than the precision of any supported format, but we still need to
distinguish between values exactly half way between representable floating
point values (those should be rounded to even) and the case when we've
shifted away some non-zero bits, so the value was a tiny bit larger than
half way and we should then round up.
The patch uses the same approach as e.g. soft-fp in these cases: a right
shift with a sticky bit in the least significant bit.
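A hypothetical C illustration of the rounding problem (the committed test
differs; this assumes the internal significand is narrower than 201 bits,
so low bits get shifted away): 2^200 + 2^147 is exactly half way between
the representable doubles 2^200 and 2^200 + 2^148 and must round to even,
i.e. 2^200, while 2^200 + 2^147 + 1 must round up to 2^200 + 2^148;
shifting the excess bits away without a sticky bit conflates the two.
double
halfway (void)
{
  return (double) (((unsigned _BitInt(256)) 1 << 200)
                   + ((unsigned _BitInt(256)) 1 << 147));
}

double
above_halfway (void)
{
  return (double) (((unsigned _BitInt(256)) 1 << 200)
                   + ((unsigned _BitInt(256)) 1 << 147) + 1);
}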
2025-06-05 Jakub Jelinek <jakub@redhat.com>
PR middle-end/120547
* real.cc (real_from_integer): Remove maxbitlen variable, use
len instead of that. When shifting right, or in 1 if any of the
shifted away bits are non-zero. Formatting fix.
Jakub Jelinek [Tue, 20 May 2025 06:21:14 +0000 (08:21 +0200)]
tree-chrec: Use signed_type_for in convert_affine_scev
On s390x-linux I've run into the gcc.dg/torture/bitint-27.c test ICEing in
build_nonstandard_integer_type called from convert_affine_scev (not sure
why it doesn't trigger on x86_64/aarch64).
The problem is clear: when ct is a BITINT_TYPE with some large
TYPE_PRECISION, build_nonstandard_integer_type won't really work on it.
The patch fixes it similarly to what has been done for GCC 14 in various
other spots.
2025-05-20 Jakub Jelinek <jakub@redhat.com>
* tree-chrec.cc (convert_affine_scev): Use signed_type_for instead of
build_nonstandard_integer_type.
Jonathan Wakely [Wed, 4 Jun 2025 17:22:28 +0000 (18:22 +0100)]
libstdc++: Fix std::format thousands separators when sign present [PR120548]
The leading sign character should be skipped when deciding whether to
insert thousands separators into a floating-point format.
libstdc++-v3/ChangeLog:
PR libstdc++/120548
* include/std/format (__formatter_fp::_M_localize): Do not
include a leading sign character in the string to be grouped.
* testsuite/std/format/functions/format.cc: Check grouping when
sign is present in the output.
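A minimal sketch of the fixed behaviour (not the committed test; it uses a
hand-written numpunct facet so no named locale is needed):
#include <cassert>
#include <format>
#include <locale>

struct grouped : std::numpunct<char>
{
  std::string do_grouping () const override { return "\3"; }
  char do_thousands_sep () const override { return ','; }
};

int
main ()
{
  std::locale loc (std::locale::classic (), new grouped);
  std::string s = std::format (loc, "{:L}", -1234567.5);
  assert (s == "-1,234,567.5");  // the leading '-' must not be grouped
}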
Jonathan Wakely [Wed, 28 May 2025 14:19:18 +0000 (15:19 +0100)]
libstdc++: Make system_clock::to_time_t always_inline [PR99832]
For some 32-bit targets Glibc supports changing the size of time_t to be
64 bits by defining _TIME_BITS=64. That causes an ABI change which
would affect std::chrono::system_clock::to_time_t. Because to_time_t is
not a function template, its mangled name does not depend on the return
type, so it has the same mangled name whether it returns a 32-bit time_t
or a 64-bit time_t. On targets where the size of time_t can be selected
at preprocessing time, that can cause ODR violations, e.g. the linker
selects a definition of to_time_t that returns a 32-bit value but a
caller expects 64-bit and so reads 32 bits of garbage from the stack.
This commit adds always_inline to to_time_t so that all callers inline
the conversion to time_t, and will do so using whatever type time_t
happens to be in that translation unit.
Existing objects compiled before this change will either have inlined
the function anyway (which is likely if compiled with any optimization
enabled) or will contain a COMDAT definition of the inline function and
so still be able to find it at link-time.
The attribute is also added to system_clock::from_time_t, because that's
an equally simple function and it seems reasonable for them to both be
always inlined.
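A minimal usage sketch: after this change the conversion below is always
inlined, so it uses whatever type time_t happens to be in this translation
unit.
#include <chrono>
#include <ctime>

std::time_t
stamp ()
{
  return std::chrono::system_clock::to_time_t
    (std::chrono::system_clock::now ());
}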
libstdc++-v3/ChangeLog:
PR libstdc++/99832
* include/bits/chrono.h (system_clock::to_time_t): Add
always_inline attribute to be agnostic to the underlying type of
time_t.
(system_clock::from_time_t): Add always_inline for consistency
with to_time_t.
* testsuite/20_util/system_clock/99832.cc: New test.
Jonathan Wakely [Tue, 20 May 2025 09:53:41 +0000 (10:53 +0100)]
libstdc++: Fix incorrect links to archived SGI STL docs
In r8-7777-g25949ee33201f2 I updated some URLs to point to copies of the
SGI STL docs in the Wayback Machine, because the original pages were no
longer hosted on sgi.com. However, I incorrectly assumed that if one
archived page was at https://web.archive.org/web/20171225062613/... then
all the other pages would be too. Apparently that's not how the Wayback
Machine works, and each page is archived on a different date. That meant
that some of our links were redirecting to archived copies of the
announcement that the SGI STL docs have gone away.
This fixes each URL to refer to a correctly archived copy of the
original docs.
libstdc++: fix compile error when converting std::weak_ptr<T[]>
A std::weak_ptr<T[]> can be converted to a compatible
std::weak_ptr<U[]>. This is implemented by having suitable converting
constructors to std::weak_ptr which dispatch to the __weak_ptr base
class (implementation detail).
In __weak_ptr<T[]>, lock() is supposed to return a __shared_ptr<T[]>,
not a __shared_ptr<element_type> (that is, __shared_ptr<T>).
Unfortunately the return type of lock() and the type of the returned
__shared_ptr did not match, and that caused a compile error: when
converting a __weak_ptr<T[]> to a __weak_ptr<U[]> through __weak_ptr's
converting constructor, the code calls lock(), and that simply fails to
build.
Fix it by removing the usage of element_type inside lock(), and using
_Tp instead.
Note that std::weak_ptr::lock() itself was already correct; the one in
__weak_ptr was faulty (and that is the one called by __weak_ptr's
converting constructors).
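A minimal sketch of the conversion that failed to build:
#include <memory>

int
main ()
{
  std::shared_ptr<int[]> sp (new int[3] ());
  std::weak_ptr<int[]> wp = sp;
  std::weak_ptr<const int[]> wp2 = wp;  // converting ctor calls lock ()
}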
libstdc++-v3/ChangeLog:
* include/bits/shared_ptr_base.h (lock): Fix a compile error
when calling lock() on a weak_ptr<T[]> by removing an
erroneous usage of element_type from within lock().
* testsuite/20_util/shared_ptr/requirements/explicit_instantiation/1.cc:
Add more tests for array types.
* testsuite/20_util/weak_ptr/requirements/explicit_instantiation/1.cc:
Likewise.
* testsuite/20_util/shared_ptr/requirements/1.cc: New test.
* testsuite/20_util/weak_ptr/requirements/1.cc: New test.
What happens is that symtab_remove_unreachable_nodes leaves the last symbol
in kind of a limbo state: in .remove_symbols, we have:
opt7_pkg__enum_name_table/13 (Opt7_Pkg.Enum_Name_Table)
Type: variable
Body removed by symtab_remove_unreachable_nodes
Visibility: externally_visible semantic_interposition external public
References:
Referring: opt7_pkg__image/2 (read) opt7_pkg__image/2 (read)
Availability: not_available
Varpool flags: initialized read-only const-value-known
This means that the "body" (DECL_INITIAL) of the symbol has been disregarded
during reachability analysis, causing the first two symbols to be discarded;
but the DECL_INITIAL is explicitly preserved for later constant folding,
which makes it possible to retrofit the DECLs corresponding to the first
two symbols in the GIMPLE IR and ultimately leads to the crash.
gcc/
* tree-vect-data-refs.cc (vect_can_force_dr_alignment_p): Return
false if the variable has no symtab node.
gcc/testsuite/
* gnat.dg/specs/opt7.ads: New test.
* gnat.dg/specs/opt7_pkg.ads: New helper.
* gnat.dg/specs/opt7_pkg.adb: Likewise.
Jakub Jelinek [Thu, 17 Apr 2025 08:57:18 +0000 (10:57 +0200)]
s390: Use match_scratch instead of scratch in define_split [PR119834]
The following testcase ICEs since r15-1579 (addition of late combiner),
because *clrmem_short can't be split.
The problem is that the define_insn uses
(use (match_operand 1 "nonmemory_operand" "n,a,a,a"))
(use (match_operand 2 "immediate_operand" "X,R,X,X"))
(clobber (match_scratch:P 3 "=X,X,X,&a"))
and the define_split assumed that if operands[1] is const_int_operand,
match_scratch will always be scratch, and that it will be a reg only if
it was the last alternative where operands[1] is a reg.
The pattern doesn't guarantee that though; of course, RA will not try to
uselessly assign a reg there if it is not needed, but during RA
on the testcase below we match the last alternative, and then the late
combiner comes and propagates const_int 3 into operands[1]. That matches
fine: match_scratch matches either scratch or reg, and the constraint
in that case is X for the first variant, so it is still just fine. But we
won't split it because the splitters only expect scratch.
The following patch fixes it by using match_scratch instead of scratch,
so that it accepts either.
2025-04-17 Jakub Jelinek <jakub@redhat.com>
PR target/119834
* config/s390/s390.md (define_split after *cpymem_short): Use
(clobber (match_scratch N)) instead of (clobber (scratch)). Use
(match_dup 4) and operands[4] instead of (match_dup 3) and operands[3]
in the last of those.
(define_split after *clrmem_short): Use (clobber (match_scratch N))
instead of (clobber (scratch)).
(define_split after *cmpmem_short): Likewise.
Patrick Palka [Thu, 15 May 2025 15:07:53 +0000 (11:07 -0400)]
c++: unifying specializations of non-primary tmpls [PR120161]
Here unification of P=Wrap<int>::type, A=Wrap<long>::type wrongly
succeeds ever since r14-4112 which made the RECORD_TYPE case of unify
no longer recurse into template arguments for non-primary templates
(since they're a non-deduced context) and so the int/long mismatch that
makes the two types distinct goes unnoticed.
In the case of (comparing specializations of) a non-primary template,
unify should still go on to compare the types directly before returning
success.
PR c++/120161
gcc/cp/ChangeLog:
* pt.cc (unify) <case RECORD_TYPE>: When comparing specializations
of a non-primary template, still perform a type comparison.
Insn tf_to_fprx2 moves a TF value into a floating-point register pair.
For alternative 0 the input is a vector register; however, in the else
case an ldr instruction is emitted, which expects floating-point register
operands only. Thus, this works only for vector registers which overlap
with floating-point registers. Replace ldr with vlr so that the
remaining vector registers are dealt with, too. Emitting a vlr instead
of an ldr is fine since the destination register %v0 is part of a
floating-point register pair, which means that the low half of %v0 is
ignored in the end anyway and therefore may be clobbered.
gcc/ChangeLog:
* config/s390/vector.md: Fix tf_to_fprx2 by using vlr instead of
ldr.
Jakub Jelinek [Thu, 27 Feb 2025 07:48:18 +0000 (08:48 +0100)]
alias: Perform offset arithmetic in poly_offset_int rather than poly_int64 [PR118819]
This PR is about ubsan error on the c - cx1 + cy1 evaluation in the first
hunk.
The following patch hopefully fixes that by doing the additions/subtractions
in poly_offset_int rather than poly_int64 and then converting back to poly_int64.
If it doesn't fit, -1 is returned (which means it is unknown if there is a conflict
or not).
2025-02-27 Jakub Jelinek <jakub@redhat.com>
PR middle-end/118819
* alias.cc (memrefs_conflict_p): Perform arithmetic on c, xsize and
ysize in poly_offset_int and return -1 if it is not representable in
poly_int64.
Jakub Jelinek [Wed, 5 Feb 2025 13:06:42 +0000 (14:06 +0100)]
cselib: Fix up previous patch for SPARC [PR117239]
Sorry, our CI bot just notified me that I broke the SPARC build. There are
two
#ifdef STACK_ADDRESS_OFFSET
guarded snippets, and the macro is only defined on the SPARC target, so I
didn't notice there was a syntax error.
Fixed thusly.
2025-02-05 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/117239
* cselib.cc (cselib_init): Remove spurious closing paren in
the #ifdef STACK_ADDRESS_OFFSET specific code.
Jakub Jelinek [Wed, 5 Feb 2025 12:16:17 +0000 (13:16 +0100)]
cselib: For CALL_INSNs to const/pure fns invalidate memory below sp [PR117239]
The following testcase is miscompiled on x86_64 during postreload.
After reload (with IPA-RA figuring out the calls don't modify any
registers but %rax for return value) postreload sees
(insn 14 12 15 2 (set (mem:DI (plus:DI (reg/f:DI 7 sp)
(const_int 16 [0x10])) [0 S8 A64])
(reg:DI 1 dx [orig:105 q+16 ] [105])) "pr117239.c":18:7 95 {*movdi_internal}
(nil))
(call_insn/i 15 14 16 2 (set (reg:SI 0 ax)
(call (mem:QI (symbol_ref:DI ("baz") [flags 0x3] <function_decl 0x7ffb2e2bdf00 r>) [0 baz S1 A8])
(const_int 24 [0x18]))) "pr117239.c":18:7 1476 {*call_value}
(expr_list:REG_CALL_DECL (symbol_ref:DI ("baz") [flags 0x3] <function_decl 0x7ffb2e2bdf00 baz>)
(expr_list:REG_EH_REGION (const_int 0 [0])
(nil)))
(nil))
(insn 16 15 18 2 (parallel [
(set (reg/f:DI 7 sp)
(plus:DI (reg/f:DI 7 sp)
(const_int 24 [0x18])))
(clobber (reg:CC 17 flags))
]) "pr117239.c":18:7 285 {*adddi_1}
(expr_list:REG_ARGS_SIZE (const_int 0 [0])
(nil)))
...
(call_insn/i 19 18 21 2 (set (reg:SI 0 ax)
(call (mem:QI (symbol_ref:DI ("foo") [flags 0x3] <function_decl 0x7ffb2e2bdb00 l>) [0 foo S1 A8])
(const_int 0 [0]))) "pr117239.c":19:3 1476 {*call_value}
(expr_list:REG_CALL_DECL (symbol_ref:DI ("foo") [flags 0x3] <function_decl 0x7ffb2e2bdb00 foo>)
(expr_list:REG_EH_REGION (const_int 0 [0])
(nil)))
(nil))
(insn 21 19 26 2 (parallel [
(set (reg/f:DI 7 sp)
(plus:DI (reg/f:DI 7 sp)
(const_int -24 [0xffffffffffffffe8])))
(clobber (reg:CC 17 flags))
]) "pr117239.c":19:3 discrim 1 285 {*adddi_1}
(expr_list:REG_ARGS_SIZE (const_int 24 [0x18])
(nil)))
(insn 26 21 24 2 (set (mem:DI (plus:DI (reg/f:DI 7 sp)
(const_int 16 [0x10])) [0 S8 A64])
(reg:DI 1 dx [orig:105 q+16 ] [105])) "pr117239.c":19:3 discrim 1 95 {*movdi_internal}
(nil))
i.e.
movq %rdx, 16(%rsp)
call baz
addq $24, %rsp
...
call foo
subq $24, %rsp
movq %rdx, 16(%rsp)
Now, postreload uses cselib and cselib remembered that %rdx value has been
stored into 16(%rsp). Both baz and foo are pure calls. If they weren't,
when processing those CALL_INSNs cselib would invalidate all MEMs
if (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)
|| !(RTL_CONST_OR_PURE_CALL_P (insn)))
cselib_invalidate_mem (callmem);
where callmem is (mem:BLK (scratch)). But they are pure, so instead the
code just invalidates the argument slots from CALL_INSN_FUNCTION_USAGE.
The calls actually clobber more than that, even const/pure calls clobber
all memory below the stack pointer. And that is something that hasn't been
invalidated. In this failing testcase, the call to baz is not a big deal;
we don't have anything remembered in memory below %rsp at that call.
But then we increment %rsp by 24, so the %rsp+16 slot is now 8 bytes below
the stack pointer, and do the call to foo. And that call now actually, not
just in theory, clobbers the memory below the stack pointer (in particular
it overwrites it with the return value). But cselib does not invalidate it.
Then %rsp
is decremented again (in preparation for another call, to bar) and cselib
is processing store of %rdx (which IPA-RA says has not been modified by
either baz or foo calls) to %rsp + 16, and it sees the memory already has
that value, so the store is useless, let's remove it.
But it is not, the call to foo has changed it, so it needs to be stored
again.
The following patch adds targeted invalidation of memory below the stack
pointer (or on SPARC memory below stack pointer + 2047 when stack bias is
used, or on PA memory above the stack pointer instead).
It does so only in !ACCUMULATE_OUTGOING_ARGS or cfun->calls_alloca functions,
because in other functions the stack pointer should be constant from
the end of prologue till start of epilogue and so nothing should be stored
within the function below the stack pointer.
Now, memory below stack pointer is special, except for functions using
alloca/VLAs I believe no addressable memory should be there, it should be
purely outgoing function argument area, if we take address of some automatic
variable, it should live all the time above the outgoing function argument
area. So on top of just trying to flush memory below stack pointer
(represented by %rsp - PTRDIFF_MAX with PTRDIFF_MAX size on most arches),
the patch tries to optimize and only invalidate memory that has address
clearly derived from stack pointer (memory with other bases is not
invalidated) and if we can prove (we see same SP_DERIVED_VALUE_P bases in
both VALUEs) it is above current stack, also don't call
canon_anti_dependence which might just give up in certain cases.
I've gathered statistics from x86_64-linux and i686-linux
bootstraps/regtests. During -m64 compilations from those, there were 3718396 + 42634 + 27761 cases of processing MEMs in cselib_invalidate_mem
(callmem[1]) calls, the first number is number of MEMs not invalidated
because of the optimization, i.e.
+ if (sp_derived_base == NULL_RTX)
+ {
+ has_mem = true;
+ num_mems++;
+ p = &(*p)->next;
+ continue;
+ }
in the patch, the second number is number of MEMs not invalidated because
canon_anti_dependence returned false and finally the last number is number
of MEMs actually invalidated (so that is what hasn't been invalidated
before). During -m32 compilations the numbers were 1422412 + 39354 + 16509 with the same meaning.
Note, when there is no red zone, in theory even the sp = sp + incr
instruction invalidates memory below the new stack pointer, as a signal
can arrive and overwrite the memory. So maybe we should be invalidating
something at those instructions as well. But in leaf functions we certainly
can have even addressable automatic vars in the red zone (which would make
it harder to distinguish); on the other hand, we aren't normally storing
anything below the red zone, and in non-leaf functions it should normally
be just the outgoing arguments area.
2025-02-05 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/117239
* cselib.cc: Include predict.h.
(callmem): Change type from rtx to rtx[2].
(cselib_preserve_only_values): Use callmem[0] rather than callmem.
(cselib_invalidate_mem): Optimize and don't try to invalidate
for the mem_rtx == callmem[1] case MEMs which clearly can't be
below the stack pointer.
(cselib_process_insn): Use callmem[0] rather than callmem.
For const/pure calls also call cselib_invalidate_mem (callmem[1])
in !ACCUMULATE_OUTGOING_ARGS or cfun->calls_alloca functions.
(cselib_init): Initialize callmem[0] rather than callmem and also
initialize callmem[1].
Jakub Jelinek [Thu, 28 Nov 2024 09:51:16 +0000 (10:51 +0100)]
gimple-fold: Avoid ICEs with bogus declarations like const attribute on snprintf [PR117358]
When one puts incorrect const or pure attributes on declarations of various
C APIs which have corresponding builtins (vs. what they actually do), we can
get tons of ICEs in gimple-fold.cc.
The following patch fixes it by giving up gimple_fold_builtin_* folding
if the functions don't have gimple_vdef (or for pure functions like
bcmp/strchr/strstr gimple_vuse) when in SSA form (during gimplification
they will surely have both of those NULL even when declared correctly,
yet it is highly desirable to fold them).
Or shall I replace
!gimple_vdef (stmt) && gimple_in_ssa_p (cfun)
tests with
(gimple_call_flags (stmt) & (ECF_CONST | ECF_PURE | ECF_NOVOPS)) != 0
and
!gimple_vuse (stmt) && gimple_in_ssa_p (cfun)
with
(gimple_call_flags (stmt) & (ECF_CONST | ECF_NOVOPS)) != 0
?
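A hypothetical sketch of the kind of bogus declaration involved (the
committed testcase differs):
/* snprintf wrongly declared const: calls to it then have no vdef in SSA
   form, which the gimple_fold_builtin_* routines used to assume.  */
extern int snprintf (char *, __SIZE_TYPE__, const char *, ...)
  __attribute__ ((const));

char buf[64];

int
f (void)
{
  return snprintf (buf, 64, "%d", 42);
}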
2024-11-28 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/117358
* gimple-fold.cc (gimple_fold_builtin_memory_op): Punt if stmt has no
vdef in ssa form.
(gimple_fold_builtin_bcmp): Punt if stmt has no vuse in ssa form.
(gimple_fold_builtin_bcopy): Punt if stmt has no vdef in ssa form.
(gimple_fold_builtin_bzero): Likewise.
(gimple_fold_builtin_memset): Likewise. Use return false instead of
return NULL_TREE.
(gimple_fold_builtin_strcpy): Punt if stmt has no vdef in ssa form.
(gimple_fold_builtin_strncpy): Likewise.
(gimple_fold_builtin_strchr): Punt if stmt has no vuse in ssa form.
(gimple_fold_builtin_strstr): Likewise.
(gimple_fold_builtin_strcat): Punt if stmt has no vdef in ssa form.
(gimple_fold_builtin_strcat_chk): Likewise.
(gimple_fold_builtin_strncat): Likewise.
(gimple_fold_builtin_strncat_chk): Likewise.
(gimple_fold_builtin_string_compare): Likewise.
(gimple_fold_builtin_fputs): Likewise.
(gimple_fold_builtin_memory_chk): Likewise.
(gimple_fold_builtin_stxcpy_chk): Likewise.
(gimple_fold_builtin_stxncpy_chk): Likewise.
(gimple_fold_builtin_stpcpy): Likewise.
(gimple_fold_builtin_snprintf_chk): Likewise.
(gimple_fold_builtin_sprintf_chk): Likewise.
(gimple_fold_builtin_sprintf): Likewise.
(gimple_fold_builtin_snprintf): Likewise.
(gimple_fold_builtin_fprintf): Likewise.
(gimple_fold_builtin_printf): Likewise.
(gimple_fold_builtin_realloc): Likewise.
Harald Anlauf [Thu, 15 May 2025 19:07:07 +0000 (21:07 +0200)]
Fortran: default-initialization and functions returning derived type [PR85750]
Functions with a non-pointer, non-allocatable result of derived type did
not always get initialized although the type had default-initialization
and a derived type component had the allocatable or pointer attribute.
Rearrange the logic for when to apply default-initialization.
PR fortran/85750
gcc/fortran/ChangeLog:
* resolve.cc (resolve_symbol): Reorder conditions when to apply
default-initializers.
Martin Jambor [Wed, 14 May 2025 10:08:24 +0000 (12:08 +0200)]
tree-sra: Do not create stores into const aggregates (PR111873)
This patch fixes (hopefully the) one remaining place where gimple SRA
was still creating a store into const aggregates. It occurs when there
is a replacement for a load but that replacement is not type
compatible - typically because it is a single field structure.
I have used testcases from duplicates because the original test-case
no longer reproduces for me.
gcc/ChangeLog:
2025-05-13 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/111873
* tree-sra.cc (sra_modify_expr): When processing a load which has
a type-incompatible replacement, do not store the contents of the
replacement into the original aggregate when that aggregate is
const.
gcc/testsuite/ChangeLog:
2025-05-13 Martin Jambor <mjambor@suse.cz>
* gcc.dg/ipa/pr120044-1.c: New test.
* gcc.dg/ipa/pr120044-2.c: Likewise.
* gcc.dg/tree-ssa/pr114864.c: Likewise.
Jakub Jelinek [Mon, 10 Feb 2025 09:40:22 +0000 (10:40 +0100)]
i386: Change RTL representation of bt[lq] [PR118623]
The following testcase is miscompiled because the RTL representation
of the bt{l,q} insn followed by e.g. j{c,nc} is misleading as to what it
actually does.
Let's look e.g. at
(define_insn_and_split "*jcc_bt<mode>"
[(set (pc)
(if_then_else (match_operator 0 "bt_comparison_operator"
[(zero_extract:SWI48
(match_operand:SWI48 1 "nonimmediate_operand")
(const_int 1)
(match_operand:QI 2 "nonmemory_operand"))
(const_int 0)])
(label_ref (match_operand 3))
(pc)))
(clobber (reg:CC FLAGS_REG))]
"(TARGET_USE_BT || optimize_function_for_size_p (cfun))
&& (CONST_INT_P (operands[2])
? (INTVAL (operands[2]) < GET_MODE_BITSIZE (<MODE>mode)
&& INTVAL (operands[2])
>= (optimize_function_for_size_p (cfun) ? 8 : 32))
: !memory_operand (operands[1], <MODE>mode))
&& ix86_pre_reload_split ()"
"#"
"&& 1"
[(set (reg:CCC FLAGS_REG)
(compare:CCC
(zero_extract:SWI48
(match_dup 1)
(const_int 1)
(match_dup 2))
(const_int 0)))
(set (pc)
(if_then_else (match_op_dup 0 [(reg:CCC FLAGS_REG) (const_int 0)])
(label_ref (match_dup 3))
(pc)))]
{
operands[0] = shallow_copy_rtx (operands[0]);
PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0])));
})
The define_insn part in RTL describes exactly what it does,
jumps to op3 if bit op2 in op1 is set (for op0 NE) or not set (for op0 EQ).
The problem is with what it splits into.
put_condition_code %C1 for CCCmode comparisons emits c for EQ and LTU,
nc for NE and GEU and ICEs otherwise.
CCCmode is used mainly for carry out of add/adc, borrow out of sub/sbb,
in those cases e.g. for add we have
(set (reg:CCC flags) (compare:CCC (plus:M x y) x))
and use (ltu (reg:CCC flags) (const_int 0)) for carry set and
(geu (reg:CCC flags) (const_int 0)) for carry not set. These cases
model in RTL what is actually happening, compare in infinite precision
x from the result of finite precision addition in M mode and if it is
less than unsigned (i.e. overflow happened), carry is set.
Another use of CCCmode is in UNSPEC_* patterns, those are used with
(eq (reg:CCC flags) (const_int 0)) for carry set and ne for unset,
given the UNSPEC no big deal, the middle-end doesn't know what means
set or unset.
But for the bt{l,q}; j{c,nc} case the above splits it into
(set (reg:CCC flags) (compare:CCC (zero_extract) (const_int 0)))
for bt and
(set (pc) (if_then_else (eq (reg:CCC flags) (const_int 0)) (label_ref) (pc)))
for the bit set case (so that the jump expands to jc) and ne for
the bit not set case (so that the jump expands to jnc).
Similarly for the different splitters for cmov and set{c,nc} etc.
The problem is that when the middle-end reads this RTL, it means
the exact opposite. If zero_extract is 1, flags is set
to the comparison of 1 and 0, and that would mean using ne in the
if_then_else, and vice versa.
So, in order to better describe in RTL what is actually happening,
one possibility would be to swap the behavior of put_condition_code
and use NE + LTU -> c and EQ + GEU -> nc rather than the current
EQ + LTU -> c and NE + GEU -> nc; and adjust everything. The
following patch uses a more limited approach, instead of representing
bt{l,q}; j{c,nc} case as written above it uses
(set (reg:CCC flags) (compare:CCC (const_int 0) (zero_extract)))
and
(set (pc) (if_then_else (ltu (reg:CCC flags) (const_int 0)) (label_ref) (pc)))
which uses the existing put_condition_code but describes what the
insns actually do in RTL clearly. If zero_extract is 1,
then flags are LTU, 0U < 1U, if zero_extract is 0, then flags are GEU,
0U >= 0U. The patch adjusts the *bt<mode> define_insn and all the
splitters to it and its comparisons/conditional moves/setXX.
2025-02-10 Jakub Jelinek <jakub@redhat.com>
PR target/118623
* config/i386/i386.md (*bt<mode>): Represent bt as
compare:CCC of const0_rtx and zero_extract rather than
zero_extract and const0_rtx.
(*bt<SWI48:mode>_mask): Likewise.
(*jcc_bt<mode>): Likewise. Use LTU and GEU as flags test
instead of EQ and NE.
(*jcc_bt<mode>_mask): Likewise.
(*jcc_bt<SWI48:mode>_mask_1): Likewise.
(Help combine recognize bt followed by cmov splitter): Likewise.
(*bt<mode>_setcqi): Likewise.
(*bt<mode>_setncqi): Likewise.
(*bt<mode>_setnc<mode>): Likewise.
(*bt<mode>_setncqi_2): Likewise.
(*bt<mode>_setc<mode>_mask): Likewise.
Jakub Jelinek [Thu, 13 Feb 2025 09:21:29 +0000 (10:21 +0100)]
c++: Fix up regressions caused by for/while loops with declarations [PR118822]
The recent PR86769 r15-7426 changes regressed the following two testcases;
the first one is more important as it is derived from real-world code.
The first problem is that the chosen
prep = do_pushlevel (sk_block);
// emit something
body = push_stmt_list ();
// emit further stuff
body = pop_stmt_list (body);
prep = do_poplevel (prep);
way of constructing the {FOR,WHILE}_COND_PREP and {FOR,WHILE}_BODY
isn't reliable. If during parsing a label is seen in the body and then
some decl with destructors, an sk_cleanup transparent scope is added, but
the corresponding result from push_stmt_list is saved in
*current_binding_level; pop_stmt_list then pops even that statement list,
but only do_poplevel actually attempts to pop the sk_cleanup scope, and so
we ICE.
The reason for not doing do_pushlevel (sk_block); do_pushlevel (sk_block);
is that variables should be in the same scope (otherwise various e.g.
redeclaration*.C tests FAIL) and doing do_pushlevel (sk_block); do_pushlevel
(sk_cleanup); wouldn't work either as do_poplevel would silently unwind even
the cleanup one.
So, the following patch changes the earlier approach. Nothing is removed
from the {FOR,WHILE}_COND_PREP subtrees while doing adjust_loop_decl_cond,
push_stmt_list isn't called either; all it does is remember as an integer
the number of cleanups (CLEANUP_STMT at the end of the STATEMENT_LISTs)
from querying stmt_list_stack and finding the initial *body_p in there
(that integer is stored into {FOR,WHILE}_COND_CLEANUP), and temporarily
{FOR,WHILE}_BODY is set to the last statement (if any) in the innermost
STATEMENT_LIST at the adjust_loop_decl_cond time; then at
finish_{for,while}_stmt a new finish_loop_cond_prep routine takes care of
do_poplevel for the scope (which is in {FOR,WHILE}_COND_PREP) and, given
the {FOR,WHILE}_COND_CLEANUP number and the {FOR,WHILE}_BODY tree, finds
the right spot where the body statements start and moves those into
{FOR,WHILE}_BODY.
Finally genericize_c_loop then inserts the cond, body, continue label, expr
into the right subtree of {FOR,WHILE}_COND_PREP.
The constexpr evaluation unfortunately had to be changed as well, because
we don't want to evaluate everything in BIND_EXPR_BODY (*_COND_PREP ())
right away; we want to evaluate it with the exception of the CLEANUP_STMT
cleanups at the end (given {FOR,WHILE}_COND_CLEANUP levels) and defer
the evaluation of the cleanups until after cond, body and expr are evaluated.
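A hypothetical sketch of the shape that broke (g++.dg/expr/for9.C differs):
a condition declaration, then a label in the body followed by a declaration
with a destructor, which adds the sk_cleanup transparent scope.
struct A { ~A (); operator bool () const; };

void
f ()
{
  while (A a = A ())
    {
    again:
      A b;     // label + decl with dtor -> sk_cleanup scope
      if (b)
        goto again;
    }
}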
2025-02-13 Jakub Jelinek <jakub@redhat.com>
PR c++/118822
gcc/
* tree-iterator.h (tsi_split_stmt_list): Declare.
* tree-iterator.cc (tsi_split_stmt_list): New function.
gcc/c-family/
* c-common.h (WHILE_COND_CLEANUP): Change description in comment.
(FOR_COND_CLEANUP): Likewise.
* c-gimplify.cc (genericize_c_loop): Adjust for COND_CLEANUP
being CLEANUP_STMT/TRY_FINALLY_EXPR trailing nesting depth
instead of actual cleanup.
gcc/cp/
* semantics.cc (adjust_loop_decl_cond): Allow multiple trailing
CLEANUP_STMT levels in *BODY_P. Set *CLEANUP_P to the number
of levels rather than one particular cleanup, keep the cleanups
in *PREP_P. Set *BODY_P to the last stmt in the cur_stmt_list
or NULL if *CLEANUP_P and the innermost cur_stmt_list is empty.
(finish_loop_cond_prep): New function.
(finish_while_stmt, finish_for_stmt): Use it. Don't call
set_one_cleanup_loc.
* constexpr.cc (cxx_eval_loop_expr): Adjust handling of
{FOR,WHILE}_COND_{PREP,CLEANUP}.
gcc/testsuite/
* g++.dg/expr/for9.C: New test.