Richard Earnshaw [Thu, 20 Mar 2025 14:42:59 +0000 (14:42 +0000)]
opcodes: fix wrong code in expand_binop_directly [PR117811]
If expand_binop_directly fails to add a REG_EQUAL note it tries to
unwind and restart. But it can unwind too far if expand_binop changed
some of the operands before calling it. We don't need to unwind that
far anyway since we should end up taking exactly the same route next
time, just without a target rtx.
To fix this we remove LAST from the argument list and let the callers
(all in expand_binop) do their own unwinding if the call fails.
Instead we unwind just as far as the entry to expand_binop_directly
and recurse within this function instead of all the way back up.
gcc/ChangeLog:
PR middle-end/117811
* optabs.cc (expand_binop_directly): Remove LAST as an argument,
instead record the last insn on entry. Only delete insns if
we need to restart and restart by calling ourself, not expand_binop.
(expand_binop): Update callers to expand_binop_directly. If it
fails to expand the operation, delete back to LAST.
gcc/testsuite:
PR middle-end/117811
* gcc.dg/torture/pr117811.c: New test.
Jakub Jelinek [Thu, 12 Jun 2025 18:22:39 +0000 (20:22 +0200)]
recip: Reset range info when replacing sqrt with rsqrt [PR120638]
This pass reuses a SSA_NAME on the lhs of sqrt etc. call as lhs
of .RSQRT etc. call. The following testcase is miscompiled since my recent
ranger cast changes, because we compute (correct) range for sqrtf argument
as well as result but then recip pass keeps using that range for the .RQSRT
call which returns 1. / sqrt, so the function then returns 0.5f
unconditionally.
Note, on foo this is a regression from GCC 15, but on bar it regressed
already with the r14-536 change.
2025-06-12 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/120638
* tree-ssa-math-opts.cc (pass_cse_reciprocals::execute): Call
reset_flow_sensitive_info on arg1.
Jakub Jelinek [Thu, 5 Jun 2025 13:47:19 +0000 (15:47 +0200)]
real: Fix up real_from_integer [PR120547]
The function has 2 problems, one is _BitInt specific and the other is
most likely also reproduceable only with it.
The first issue is that I've missed updating the function for _BitInt,
maxbitlen as MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT
obviously isn't guaranteed to be larger than any integral type we might
want to convert at compile time from wide_int to REAL_VALUE_FORMAT.
Just using len instead of it works fine, at least when used after
HOST_BITS_PER_WIDE_INT is added to it and it is truncated to multiples
of HOST_BITS_PER_WIDE_INT.
The other bug is that if the value has too many significant bits (formerly
maxbitlen - cnt_l_z, now len - cnt_l_z), the code just shifts it right and
adds the shift count to the future exponent. That isn't correct for
rounding as the testcase attempts to show, the internal real format has more
bits than any precision in supported format, but we still need to
distinguish bewtween values exactly half way between representable floating
point values (those should be rounded to even) and the case when we've
shifted away some non-zero bits, so the value was tiny bit larger than half
way and then we should round up.
The patch uses something like e.g. soft-fp uses in these cases, right shift
with sticky bit in the least significant bit.
2025-06-05 Jakub Jelinek <jakub@redhat.com>
PR middle-end/120547
* real.cc (real_from_integer): Remove maxbitlen variable, use
len instead of that. When shifting right, or in 1 if any of the
shifted away bits are non-zero. Formatting fix.
Jakub Jelinek [Tue, 20 May 2025 06:21:14 +0000 (08:21 +0200)]
tree-chrec: Use signed_type_for in convert_affine_scev
On s390x-linux I've run into the gcc.dg/torture/bitint-27.c test ICEing in
build_nonstandard_integer_type called from convert_affine_scev (not sure
why it doesn't trigger on x86_64/aarch64).
The problem is clear, when ct is a BITINT_TYPE with some large
TYPE_PRECISION, build_nonstandard_integer_type won't really work on it.
The patch fixes it similarly what has been done for GCC 14 in various
other spots.
2025-05-20 Jakub Jelinek <jakub@redhat.com>
* tree-chrec.cc (convert_affine_scev): Use signed_type_for instead of
build_nonstandard_integer_type.
Jonathan Wakely [Wed, 4 Jun 2025 17:22:28 +0000 (18:22 +0100)]
libstdc++: Fix std::format thousands separators when sign present [PR120548]
The leading sign character should be skipped when deciding whether to
insert thousands separators into a floating-point format.
libstdc++-v3/ChangeLog:
PR libstdc++/120548
* include/std/format (__formatter_fp::_M_localize): Do not
include a leading sign character in the string to be grouped.
* testsuite/std/format/functions/format.cc: Check grouping when
sign is present in the output.
Jonathan Wakely [Wed, 28 May 2025 14:19:18 +0000 (15:19 +0100)]
libstdc++: Make system_clock::to_time_t always_inline [PR99832]
For some 32-bit targets Glibc supports changing the size of time_t to be
64 bits by defining _TIME_BITS=64. That causes an ABI change which
would affect std::chrono::system_clock::to_time_t. Because to_time_t is
not a function template, its mangled name does not depend on the return
type, so it has the same mangled name whether it returns a 32-bit time_t
or a 64-bit time_t. On targets where the size of time_t can be selected
at preprocessing time, that can cause ODR violations, e.g. the linker
selects a definition of to_time_t that returns a 32-bit value but a
caller expects 64-bit and so reads 32 bits of garbage from the stack.
This commit adds always_inline to to_time_t so that all callers inline
the conversion to time_t, and will do so using whatever type time_t
happens to be in that translation unit.
Existing objects compiled before this change will either have inlined
the function anyway (which is likely if compiled with any optimization
enabled) or will contain a COMDAT definition of the inline function and
so still be able to find it at link-time.
The attribute is also added to system_clock::from_time_t, because that's
an equally simple function and it seems reasonable for them to both be
always inlined.
libstdc++-v3/ChangeLog:
PR libstdc++/99832
* include/bits/chrono.h (system_clock::to_time_t): Add
always_inline attribute to be agnostic to the underlying type of
time_t.
(system_clock::from_time_t): Add always_inline for consistency
with to_time_t.
* testsuite/20_util/system_clock/99832.cc: New test.
Jonathan Wakely [Tue, 20 May 2025 09:53:41 +0000 (10:53 +0100)]
libstdc++: Fix incorrect links to archived SGI STL docs
In r8-7777-g25949ee33201f2 I updated some URLs to point to copies of the
SGI STL docs in the Wayback Machine, because the original pags were no
longer hosted on sgi.com. However, I incorrectly assumed that if one
archived page was at https://web.archive.org/web/20171225062613/... then
all the other pages would be too. Apparently that's not how the Wayback
Machine works, and each page is archived on a different date. That meant
that some of our links were redirecting to archived copies of the
announcement that the SGI STL docs have gone away.
This fixes each URL to refer to a correctly archived copy of the
original docs.
libstdc++: fix compile error when converting std::weak_ptr<T[]>
A std::weak_ptr<T[]> can be converted to a compatible
std::weak_ptr<U[]>. This is implemented by having suitable converting
constructors to std::weak_ptr which dispatch to the __weak_ptr base
class (implementation detail).
In __weak_ptr<T[]>, lock() is supposed to return a __shared_ptr<T[]>,
not a __shared_ptr<element_type> (that is, __shared_ptr<T>).
Unfortunately the return type of lock() and the type of the returned
__shared_ptr were mismatching and that was causing a compile error: when
converting a __weak_ptr<T[]> to a __weak_ptr<U[]> through __weak_ptr's
converting constructor, the code calls lock(), and that simply fails to
build.
Fix it by removing the usage of element_type inside lock(), and using
_Tp instead.
Note that std::weak_ptr::lock() itself was already correct; the one in
__weak_ptr was faulty (and that is the one called by __weak_ptr's
converting constructors).
libstdc++-v3/ChangeLog:
* include/bits/shared_ptr_base.h (lock): Fixed a compile error
when calling lock() on a weak_ptr<T[]>, by removing an
erroneous usage of element_type from within lock().
* testsuite/20_util/shared_ptr/requirements/explicit_instantiation/1.cc:
Add more tests for array types.
* testsuite/20_util/weak_ptr/requirements/explicit_instantiation/1.cc:
Likewise.
* testsuite/20_util/shared_ptr/requirements/1.cc: New test.
* testsuite/20_util/weak_ptr/requirements/1.cc: New test.
What happens is that symtab_remove_unreachable_nodes leaves the last symbol
in kind of a limbo state: in .remove_symbols, we have:
opt7_pkg__enum_name_table/13 (Opt7_Pkg.Enum_Name_Table)
Type: variable
Body removed by symtab_remove_unreachable_nodes
Visibility: externally_visible semantic_interposition external public
References:
Referring: opt7_pkg__image/2 (read) opt7_pkg__image/2 (read)
Availability: not_available
Varpool flags: initialized read-only const-value-known
This means that the "body" (DECL_INITIAL) of the symbol has been disregarded
during reachability analysis, causing the first two symbols to be discarded:
but the DECL_INITIAL is explicitly preserved for later constant folding,
which makes it possible to retrofit the DECLs corresponding to the first
two symbols in the GIMPLE IR and ultimately leads to the crash.
gcc/
* tree-vect-data-refs.cc (vect_can_force_dr_alignment_p): Return
false if the variable has no symtab node.
gcc/testsuite/
* gnat.dg/specs/opt7.ads: New test.
* gnat.dg/specs/opt7_pkg.ads: New helper.
* gnat.dg/specs/opt7_pkg.adb: Likewise.
Jakub Jelinek [Thu, 17 Apr 2025 08:57:18 +0000 (10:57 +0200)]
s390: Use match_scratch instead of scratch in define_split [PR119834]
The following testcase ICEs since r15-1579 (addition of late combiner),
because *clrmem_short can't be split.
The problem is that the define_insn uses
(use (match_operand 1 "nonmemory_operand" "n,a,a,a"))
(use (match_operand 2 "immediate_operand" "X,R,X,X"))
(clobber (match_scratch:P 3 "=X,X,X,&a"))
and define_split assumed that if operands[1] is const_int_operand,
match_scratch will be always scratch, and it will be reg only if
it was the last alternative where operands[1] is a reg.
The pattern doesn't guarantee it though, of course RA will not try to
uselessly assign a reg there if it is not needed, but during RA
on the testcase below we match the last alternative, but then comes
late combiner and propagates const_int 3 into operands[1]. And that
matches fine, match_scratch matches either scratch or reg and the constraint
in that case is X for the first variant, so still just fine. But we won't
split that because the splitters only expect scratch.
The following patch fixes it by using match_scratch instead of scratch,
so that it accepts either.
2025-04-17 Jakub Jelinek <jakub@redhat.com>
PR target/119834
* config/s390/s390.md (define_split after *cpymem_short): Use
(clobber (match_scratch N)) instead of (clobber (scratch)). Use
(match_dup 4) and operands[4] instead of (match_dup 3) and operands[3]
in the last of those.
(define_split after *clrmem_short): Use (clobber (match_scratch N))
instead of (clobber (scratch)).
(define_split after *cmpmem_short): Likewise.
Patrick Palka [Thu, 15 May 2025 15:07:53 +0000 (11:07 -0400)]
c++: unifying specializations of non-primary tmpls [PR120161]
Here unification of P=Wrap<int>::type, A=Wrap<long>::type wrongly
succeeds ever since r14-4112 which made the RECORD_TYPE case of unify
no longer recurse into template arguments for non-primary templates
(since they're a non-deduced context) and so the int/long mismatch that
makes the two types distinct goes unnoticed.
In the case of (comparing specializations of) a non-primary template,
unify should still go on to compare the types directly before returning
success.
PR c++/120161
gcc/cp/ChangeLog:
* pt.cc (unify) <case RECORD_TYPE>: When comparing specializations
of a non-primary template, still perform a type comparison.
Insn tf_to_fprx2 moves a TF value into a floating-point register pair.
For alternative 0, the input is a vector register, however, in the else
case instruction ldr is emitted which expects floating-point register
operands only. Thus, this works only for vector registers which overlap
with floating-point registers. Replace ldr with vlr so that the
remaining vector registers are dealt with, too. Emitting a vlr instead
of a ldr is fine since the destination register %v0 is part of a
floating-point register pair which means that the low half of %v0 is
ignored in the end anyway and therefore may be clobbered.
gcc/ChangeLog:
* config/s390/vector.md: Fix tf_to_fprx2 by using vlr instead of
ldr.
Jakub Jelinek [Thu, 27 Feb 2025 07:48:18 +0000 (08:48 +0100)]
alias: Perform offset arithmetics in poly_offset_int rather than poly_int64 [PR118819]
This PR is about ubsan error on the c - cx1 + cy1 evaluation in the first
hunk.
The following patch hopefully fixes that by doing the additions/subtractions
in poly_offset_int rather than poly_int64 and then converting back to poly_int64.
If it doesn't fit, -1 is returned (which means it is unknown if there is a conflict
or not).
2025-02-27 Jakub Jelinek <jakub@redhat.com>
PR middle-end/118819
* alias.cc (memrefs_conflict_p): Perform arithmetics on c, xsize and
ysize in poly_offset_int and return -1 if it is not representable in
poly_int64.
Jakub Jelinek [Wed, 5 Feb 2025 13:06:42 +0000 (14:06 +0100)]
cselib: Fix up previous patch for SPARC [PR117239]
Sorry, our CI bot just notified me I broke SPARC build. There are two
#ifdef STACK_ADDRESS_OFFSET
guarded snippets and the macro is only defined on SPARC target, so I didn't
notice there was a syntax error.
Fixed thusly.
2025-02-05 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/117239
* cselib.cc (cselib_init): Remove spurious closing paren in
the #ifdef STACK_ADDRESS_OFFSET specific code.
Jakub Jelinek [Wed, 5 Feb 2025 12:16:17 +0000 (13:16 +0100)]
cselib: For CALL_INSNs to const/pure fns invalidate memory below sp [PR117239]
The following testcase is miscompiled on x86_64 during postreload.
After reload (with IPA-RA figuring out the calls don't modify any
registers but %rax for return value) postreload sees
(insn 14 12 15 2 (set (mem:DI (plus:DI (reg/f:DI 7 sp)
(const_int 16 [0x10])) [0 S8 A64])
(reg:DI 1 dx [orig:105 q+16 ] [105])) "pr117239.c":18:7 95 {*movdi_internal}
(nil))
(call_insn/i 15 14 16 2 (set (reg:SI 0 ax)
(call (mem:QI (symbol_ref:DI ("baz") [flags 0x3] <function_decl 0x7ffb2e2bdf00 r>) [0 baz S1 A8])
(const_int 24 [0x18]))) "pr117239.c":18:7 1476 {*call_value}
(expr_list:REG_CALL_DECL (symbol_ref:DI ("baz") [flags 0x3] <function_decl 0x7ffb2e2bdf00 baz>)
(expr_list:REG_EH_REGION (const_int 0 [0])
(nil)))
(nil))
(insn 16 15 18 2 (parallel [
(set (reg/f:DI 7 sp)
(plus:DI (reg/f:DI 7 sp)
(const_int 24 [0x18])))
(clobber (reg:CC 17 flags))
]) "pr117239.c":18:7 285 {*adddi_1}
(expr_list:REG_ARGS_SIZE (const_int 0 [0])
(nil)))
...
(call_insn/i 19 18 21 2 (set (reg:SI 0 ax)
(call (mem:QI (symbol_ref:DI ("foo") [flags 0x3] <function_decl 0x7ffb2e2bdb00 l>) [0 foo S1 A8])
(const_int 0 [0]))) "pr117239.c":19:3 1476 {*call_value}
(expr_list:REG_CALL_DECL (symbol_ref:DI ("foo") [flags 0x3] <function_decl 0x7ffb2e2bdb00 foo>)
(expr_list:REG_EH_REGION (const_int 0 [0])
(nil)))
(nil))
(insn 21 19 26 2 (parallel [
(set (reg/f:DI 7 sp)
(plus:DI (reg/f:DI 7 sp)
(const_int -24 [0xffffffffffffffe8])))
(clobber (reg:CC 17 flags))
]) "pr117239.c":19:3 discrim 1 285 {*adddi_1}
(expr_list:REG_ARGS_SIZE (const_int 24 [0x18])
(nil)))
(insn 26 21 24 2 (set (mem:DI (plus:DI (reg/f:DI 7 sp)
(const_int 16 [0x10])) [0 S8 A64])
(reg:DI 1 dx [orig:105 q+16 ] [105])) "pr117239.c":19:3 discrim 1 95 {*movdi_internal}
(nil))
i.e.
movq %rdx, 16(%rsp)
call baz
addq $24, %rsp
...
call foo
subq $24, %rsp
movq %rdx, 16(%rsp)
Now, postreload uses cselib and cselib remembered that %rdx value has been
stored into 16(%rsp). Both baz and foo are pure calls. If they weren't,
when processing those CALL_INSNs cselib would invalidate all MEMs
if (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)
|| !(RTL_CONST_OR_PURE_CALL_P (insn)))
cselib_invalidate_mem (callmem);
where callmem is (mem:BLK (scratch)). But they are pure, so instead the
code just invalidates the argument slots from CALL_INSN_FUNCTION_USAGE.
The calls actually clobber more than that, even const/pure calls clobber
all memory below the stack pointer. And that is something that hasn't been
invalidated. In this failing testcase, the call to baz is not a big deal,
we don't have anything remembered in memory below %rsp at that call.
But then we increment %rsp by 24, so the %rsp+16 is now 8 bytes below stack
and do the call to foo. And that call now actually, not just in theory,
clobbers the memory below the stack pointer (in particular overwrites it
with the return value). But cselib does not invalidate. Then %rsp
is decremented again (in preparation for another call, to bar) and cselib
is processing store of %rdx (which IPA-RA says has not been modified by
either baz or foo calls) to %rsp + 16, and it sees the memory already has
that value, so the store is useless, let's remove it.
But it is not, the call to foo has changed it, so it needs to be stored
again.
The following patch adds targetted invalidation of memory below stack
pointer (or on SPARC memory below stack pointer + 2047 when stack bias is
used, or on PA memory above stack pointer instead).
It does so only in !ACCUMULATE_OUTGOING_ARGS or cfun->calls_alloca functions,
because in other functions the stack pointer should be constant from
the end of prologue till start of epilogue and so nothing should be stored
within the function below the stack pointer.
Now, memory below stack pointer is special, except for functions using
alloca/VLAs I believe no addressable memory should be there, it should be
purely outgoing function argument area, if we take address of some automatic
variable, it should live all the time above the outgoing function argument
area. So on top of just trying to flush memory below stack pointer
(represented by %rsp - PTRDIFF_MAX with PTRDIFF_MAX size on most arches),
the patch tries to optimize and only invalidate memory that has address
clearly derived from stack pointer (memory with other bases is not
invalidated) and if we can prove (we see same SP_DERIVED_VALUE_P bases in
both VALUEs) it is above current stack, also don't call
canon_anti_dependence which might just give up in certain cases.
I've gathered statistics from x86_64-linux and i686-linux
bootstraps/regtests. During -m64 compilations from those, there were 3718396 + 42634 + 27761 cases of processing MEMs in cselib_invalidate_mem
(callmem[1]) calls, the first number is number of MEMs not invalidated
because of the optimization, i.e.
+ if (sp_derived_base == NULL_RTX)
+ {
+ has_mem = true;
+ num_mems++;
+ p = &(*p)->next;
+ continue;
+ }
in the patch, the second number is number of MEMs not invalidated because
canon_anti_dependence returned false and finally the last number is number
of MEMs actually invalidated (so that is what hasn't been invalidated
before). During -m32 compilations the numbers were 1422412 + 39354 + 16509 with the same meaning.
Note, when there is no red zone, in theory even the sp = sp + incr
instruction invalidates memory below the new stack pointer, as signal
can come and overwrite the memory. So maybe we should be invalidating
something at those instructions as well. But in leaf functions we certainly
can have even addressable automatic vars in the red zone (which would make
it harder to distinguish), on the other side aren't normally storing
anything below the red zone, and in non-leaf it should normally be just the
outgoing arguments area.
2025-02-05 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/117239
* cselib.cc: Include predict.h.
(callmem): Change type from rtx to rtx[2].
(cselib_preserve_only_values): Use callmem[0] rather than callmem.
(cselib_invalidate_mem): Optimize and don't try to invalidate
for the mem_rtx == callmem[1] case MEMs which clearly can't be
below the stack pointer.
(cselib_process_insn): Use callmem[0] rather than callmem.
For const/pure calls also call cselib_invalidate_mem (callmem[1])
in !ACCUMULATE_OUTGOING_ARGS or cfun->calls_alloca functions.
(cselib_init): Initialize callmem[0] rather than callmem and also
initialize callmem[1].
Jakub Jelinek [Thu, 28 Nov 2024 09:51:16 +0000 (10:51 +0100)]
gimple-fold: Avoid ICEs with bogus declarations like const attribute no snprintf [PR117358]
When one puts incorrect const or pure attributes on declarations of various
C APIs which have corresponding builtins (vs. what they actually do), we can
get tons of ICEs in gimple-fold.cc.
The following patch fixes it by giving up gimple_fold_builtin_* folding
if the functions don't have gimple_vdef (or for pure functions like
bcmp/strchr/strstr gimple_vuse) when in SSA form (during gimplification
they will surely have both of those NULL even when declared correctly,
yet it is highly desirable to fold them).
Or shall I replace
!gimple_vdef (stmt) && gimple_in_ssa_p (cfun)
tests with
(gimple_call_flags (stmt) & (ECF_CONST | ECF_PURE | ECF_NOVOPS)) != 0
and
!gimple_vuse (stmt) && gimple_in_ssa_p (cfun)
with
(gimple_call_flags (stmt) & (ECF_CONST | ECF_NOVOPS)) != 0
?
2024-11-28 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/117358
* gimple-fold.cc (gimple_fold_builtin_memory_op): Punt if stmt has no
vdef in ssa form.
(gimple_fold_builtin_bcmp): Punt if stmt has no vuse in ssa form.
(gimple_fold_builtin_bcopy): Punt if stmt has no vdef in ssa form.
(gimple_fold_builtin_bzero): Likewise.
(gimple_fold_builtin_memset): Likewise. Use return false instead of
return NULL_TREE.
(gimple_fold_builtin_strcpy): Punt if stmt has no vdef in ssa form.
(gimple_fold_builtin_strncpy): Likewise.
(gimple_fold_builtin_strchr): Punt if stmt has no vuse in ssa form.
(gimple_fold_builtin_strstr): Likewise.
(gimple_fold_builtin_strcat): Punt if stmt has no vdef in ssa form.
(gimple_fold_builtin_strcat_chk): Likewise.
(gimple_fold_builtin_strncat): Likewise.
(gimple_fold_builtin_strncat_chk): Likewise.
(gimple_fold_builtin_string_compare): Likewise.
(gimple_fold_builtin_fputs): Likewise.
(gimple_fold_builtin_memory_chk): Likewise.
(gimple_fold_builtin_stxcpy_chk): Likewise.
(gimple_fold_builtin_stxncpy_chk): Likewise.
(gimple_fold_builtin_stpcpy): Likewise.
(gimple_fold_builtin_snprintf_chk): Likewise.
(gimple_fold_builtin_sprintf_chk): Likewise.
(gimple_fold_builtin_sprintf): Likewise.
(gimple_fold_builtin_snprintf): Likewise.
(gimple_fold_builtin_fprintf): Likewise.
(gimple_fold_builtin_printf): Likewise.
(gimple_fold_builtin_realloc): Likewise.
Harald Anlauf [Thu, 15 May 2025 19:07:07 +0000 (21:07 +0200)]
Fortran: default-initialization and functions returning derived type [PR85750]
Functions with non-pointer, non-allocatable result and of derived type did
not always get initialized although the type had default-initialization,
and a derived type component had the allocatable or pointer attribute.
Rearrange the logic when to apply default-initialization.
PR fortran/85750
gcc/fortran/ChangeLog:
* resolve.cc (resolve_symbol): Reorder conditions when to apply
default-initializers.
Martin Jambor [Wed, 14 May 2025 10:08:24 +0000 (12:08 +0200)]
tree-sra: Do not create stores into const aggregates (PR111873)
This patch fixes (hopefully the) one remaining place where gimple SRA
was still creating a load into const aggregates. It occurs when there
is a replacement for a load but that replacement is not type
compatible - typically because it is a single field structure.
I have used testcases from duplicates because the original test-case
no longer reproduces for me.
gcc/ChangeLog:
2025-05-13 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/111873
* tree-sra.cc (sra_modify_expr): When processing a load which has
a type-incompatible replacement, do not store the contents of the
replacement into the original aggregate when that aggregate is
const.
gcc/testsuite/ChangeLog:
2025-05-13 Martin Jambor <mjambor@suse.cz>
* gcc.dg/ipa/pr120044-1.c: New test.
* gcc.dg/ipa/pr120044-2.c: Likewise.
* gcc.dg/tree-ssa/pr114864.c: Likewise.
Jakub Jelinek [Mon, 10 Feb 2025 09:40:22 +0000 (10:40 +0100)]
i386: Change RTL representation of bt[lq] [PR118623]
The following testcase is miscompiled because of RTL represententation
of bt{l,q} insn followed by e.g. j{c,nc} being misleading to what it
actually does.
Let's look e.g. at
(define_insn_and_split "*jcc_bt<mode>"
[(set (pc)
(if_then_else (match_operator 0 "bt_comparison_operator"
[(zero_extract:SWI48
(match_operand:SWI48 1 "nonimmediate_operand")
(const_int 1)
(match_operand:QI 2 "nonmemory_operand"))
(const_int 0)])
(label_ref (match_operand 3))
(pc)))
(clobber (reg:CC FLAGS_REG))]
"(TARGET_USE_BT || optimize_function_for_size_p (cfun))
&& (CONST_INT_P (operands[2])
? (INTVAL (operands[2]) < GET_MODE_BITSIZE (<MODE>mode)
&& INTVAL (operands[2])
>= (optimize_function_for_size_p (cfun) ? 8 : 32))
: !memory_operand (operands[1], <MODE>mode))
&& ix86_pre_reload_split ()"
"#"
"&& 1"
[(set (reg:CCC FLAGS_REG)
(compare:CCC
(zero_extract:SWI48
(match_dup 1)
(const_int 1)
(match_dup 2))
(const_int 0)))
(set (pc)
(if_then_else (match_op_dup 0 [(reg:CCC FLAGS_REG) (const_int 0)])
(label_ref (match_dup 3))
(pc)))]
{
operands[0] = shallow_copy_rtx (operands[0]);
PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0])));
})
The define_insn part in RTL describes exactly what it does,
jumps to op3 if bit op2 in op1 is set (for op0 NE) or not set (for op0 EQ).
The problem is with what it splits into.
put_condition_code %C1 for CCCmode comparisons emits c for EQ and LTU,
nc for NE and GEU and ICEs otherwise.
CCCmode is used mainly for carry out of add/adc, borrow out of sub/sbb,
in those cases e.g. for add we have
(set (reg:CCC flags) (compare:CCC (plus:M x y) x))
and use (ltu (reg:CCC flags) (const_int 0)) for carry set and
(geu (reg:CCC flags) (const_int 0)) for carry not set. These cases
model in RTL what is actually happening, compare in infinite precision
x from the result of finite precision addition in M mode and if it is
less than unsigned (i.e. overflow happened), carry is set.
Another use of CCCmode is in UNSPEC_* patterns, those are used with
(eq (reg:CCC flags) (const_int 0)) for carry set and ne for unset,
given the UNSPEC no big deal, the middle-end doesn't know what means
set or unset.
But for the bt{l,q}; j{c,nc} case the above splits it into
(set (reg:CCC flags) (compare:CCC (zero_extract) (const_int 0)))
for bt and
(set (pc) (if_then_else (eq (reg:CCC flags) (const_int 0)) (label_ref) (pc)))
for the bit set case (so that the jump expands to jc) and ne for
the bit not set case (so that the jump expands to jnc).
Similarly for the different splitters for cmov and set{c,nc} etc.
The problem is that when the middle-end reads this RTL, it feels
the exact opposite to it. If zero_extract is 1, flags is set
to comparison of 1 and 0 and that would mean using ne ne in the
if_then_else, and vice versa.
So, in order to better describe in RTL what is actually happening,
one possibility would be to swap the behavior of put_condition_code
and use NE + LTU -> c and EQ + GEU -> nc rather than the current
EQ + LTU -> c and NE + GEU -> nc; and adjust everything. The
following patch uses a more limited approach, instead of representing
bt{l,q}; j{c,nc} case as written above it uses
(set (reg:CCC flags) (compare:CCC (const_int 0) (zero_extract)))
and
(set (pc) (if_then_else (ltu (reg:CCC flags) (const_int 0)) (label_ref) (pc)))
which uses the existing put_condition_code but describes what the
insns actually do in RTL clearly. If zero_extract is 1,
then flags are LTU, 0U < 1U, if zero_extract is 0, then flags are GEU,
0U >= 0U. The patch adjusts the *bt<mode> define_insn and all the
splitters to it and its comparisons/conditional moves/setXX.
2025-02-10 Jakub Jelinek <jakub@redhat.com>
PR target/118623
* config/i386/i386.md (*bt<mode>): Represent bt as
compare:CCC of const0_rtx and zero_extract rather than
zero_extract and const0_rtx.
(*bt<SWI48:mode>_mask): Likewise.
(*jcc_bt<mode>): Likewise. Use LTU and GEU as flags test
instead of EQ and NE.
(*jcc_bt<mode>_mask): Likewise.
(*jcc_bt<SWI48:mode>_mask_1): Likewise.
(Help combine recognize bt followed by cmov splitter): Likewise.
(*bt<mode>_setcqi): Likewise.
(*bt<mode>_setncqi): Likewise.
(*bt<mode>_setnc<mode>): Likewise.
(*bt<mode>_setncqi_2): Likewise.
(*bt<mode>_setc<mode>_mask): Likewise.
Jakub Jelinek [Thu, 13 Feb 2025 09:21:29 +0000 (10:21 +0100)]
c++: Fix up regressions caused by for/while loops with declarations [PR118822]
The recent PR86769 r15-7426 changes regressed the following two testcases,
the first one is more important as it is derived from real-world code.
The first problem is that the chosen
prep = do_pushlevel (sk_block);
// emit something
body = push_stmt_list ();
// emit further stuff
body = pop_stmt_list (body);
prep = do_poplevel (prep);
way of constructing the {FOR,WHILE}_COND_PREP and {FOR,WHILE}_BODY
isn't reliable. If during parsing a label is seen in the body and then
some decl with destructors, sk_cleanup transparent scope is added, but
the correspondiong result from push_stmt_list is saved in
*current_binding_level and pop_stmt_list then pops even that statement list
but only do_poplevel actually attempts to pop the sk_cleanup scope and so we
ICE.
The reason for not doing do_pushlevel (sk_block); do_pushlevel (sk_block);
is that variables should be in the same scope (otherwise various e.g.
redeclaration*.C tests FAIL) and doing do_pushlevel (sk_block); do_pushlevel
(sk_cleanup); wouldn't work either as do_poplevel would silently unwind even
the cleanup one.
So, the following patch changes the earlier approach. Nothing is removed
from the {FOR,WHILE}_COND_PREP subtrees while doing adjust_loop_decl_cond,
push_stmt_list isn't called either; all it does is remember as an integer
the number of cleanups (CLEANUP_STMT at the end of the STATEMENT_LISTs)
from querying stmt_list_stack and finding the initial *body_p in there
(that integer is stored into {FOR,WHILE}_COND_CLEANUP), and temporarily
{FOR,WHILE}_BODY is set to the last statement (if any) in the innermost
STATEMENT_LIST at the adjust_loop_decl_cond time; then at
finish_{for,while}_stmt a new finish_loop_cond_prep routine takes care of
do_poplevel for the scope (which is in {FOR,WHILE}_COND_PREP) and finds
given {FOR,WHILE}_COND_CLEANUP number and {FOR,WHILE}_BODY tree the right
spot where body statements start and moves that into {FOR,WHILE}_BODY.
Finally genericize_c_loop then inserts the cond, body, continue label, expr
into the right subtree of {FOR,WHILE}_COND_PREP.
The constexpr evaluation unfortunately had to be changed as well, because
we don't want to evaluate everything in BIND_EXPR_BODY (*_COND_PREP ())
right away, we want to evaluate it with the exception of the CLEANUP_STMT
cleanups at the end (given {FOR,WHILE}_COND_CLEANUP levels), and defer
the evaluation of the cleanups until after cond, body, expr are evaluated.
2025-02-13 Jakub Jelinek <jakub@redhat.com>
PR c++/118822
gcc/
* tree-iterator.h (tsi_split_stmt_list): Declare.
* tree-iterator.cc (tsi_split_stmt_list): New function.
gcc/c-family/
* c-common.h (WHILE_COND_CLEANUP): Change description in comment.
(FOR_COND_CLEANUP): Likewise.
* c-gimplify.cc (genericize_c_loop): Adjust for COND_CLEANUP
being CLEANUP_STMT/TRY_FINALLY_EXPR trailing nesting depth
instead of actual cleanup.
gcc/cp/
* semantics.cc (adjust_loop_decl_cond): Allow multiple trailing
CLEANUP_STMT levels in *BODY_P. Set *CLEANUP_P to the number
of levels rather than one particular cleanup, keep the cleanups
in *PREP_P. Set *BODY_P to the last stmt in the cur_stmt_list
or NULL if *CLEANUP_P and the innermost cur_stmt_list is empty.
(finish_loop_cond_prep): New function.
(finish_while_stmt, finish_for_stmt): Use it. Don't call
set_one_cleanup_loc.
* constexpr.cc (cxx_eval_loop_expr): Adjust handling of
{FOR,WHILE}_COND_{PREP,CLEANUP}.
gcc/testsuite/
* g++.dg/expr/for9.C: New test.
Jakub Jelinek [Fri, 7 Feb 2025 16:07:23 +0000 (17:07 +0100)]
c++: Fix up handling of for/while loops with declarations in condition [PR86769]
As the following testcase show (note, just for-{3,4,6,7,8}.C, constexpr-86769.C
and stmtexpr27.C FAIL without the patch, the rest is just that I couldn't
find coverage for some details and so added tests we don't regress or for5.C
is from Marek's attempt in the PR), we weren't correctly handling for/while
loops with declarations as conditions.
The C++ FE has the simplify_loop_decl_cond function which transforms
such loops as mentioned in the comment:
while (A x = 42) { }
for (; A x = 42;) { }
becomes
while (true) { A x = 42; if (!x) break; }
for (;;) { A x = 42; if (!x) break; }
For for loops this is not enough, as the x declaration should be
still in scope when expr (if any) is executed, and injecting the
expr expression into the body in the FE needs to have the continue
label in between, something normally added by the c-family
genericization. One of my thoughts was to just add there an artificial
label plus the expr expression in the FE and tell c-family about that
label, so that it doesn't create it but uses what has been emitted.
Unfortunately break/continue are resolved to labels only at c-family
genericization time and by moving the condition (and its preparation
statements such as the DECL_EXPR) into the body (and perhaps by also
moving there the (increment) expr as well) we resolve incorrectly any
break/continue statement appearing in cond (or newly perhaps also expr)
expression(s). While in standard C++ one can't have something like that
there, with statement expressions they are possible there, and we actually
have testsuite coverage that when they appear outside of the body of the
loop they bind to an outer loop rather than the inner one. When the FE
moves everything into the body, c-family can't distinguish any more between
the user body vs. the condition/preparation statements vs. expr expression.
So, the following patch instead keeps them separate and does the merging
only at the c-family loop genericization time. For that the patch
introduces two new operands of FOR_STMT and WHILE_STMT, *_COND_PREP
which is forced to be a BIND_EXPR which contains the preparation statements
like DECL_EXPR, and the initialization of that variable, so basically what
{FOR,WHILE}_BODY has when we encounter the function dealing with this,
except one optional CLEANUP_STMT at the end which holds cleanups for the
variable if it needs to be destructed. This CLEANUP_STMT is removed and
the actual cleanup saved into another new operand, *_COND_CLEANUP.
The c-family loop genericization handles such loops roughly the way
https://eel.is/c++draft/stmt.for and https://eel.is/c++draft/stmt.while
specifies, so the body is (if *_COND_CLEANUP is non-NULL)
{ A x = 42; try { if (!x) break; body; cont_label: expr; } finally { cleanup; } }
and otherwise
{ A x = 42; if (!x) break; body; cont_label: expr; }
i.e. the *_COND, *_BODY, optional continue label, FOR_EXPR are appended
into the body of the *_COND_PREP BIND_EXPR.
And when doing constexpr evaluation of such FOR/WHILE loops, we treat
it similarly, first evaluate *_COND_PREP except the
for (tree decl = BIND_EXPR_VARS (t); decl; decl = DECL_CHAIN (decl))
destroy_value_checked (ctx, decl, non_constant_p);
part of BIND_EXPR handling for it, then evaluate *_COND (and decide based
on whether it was false or true like before), then *_BODY, then FOR_EXPR,
then *_COND_CLEANUP (similarly to the way how CLEANUP_STMT handling handles
that) and finally do those destroy_value_checked.
Note, the constexpr-86769.C testcase FAILs with both clang++ and MSVC (note,
the rest of tests PASS with clang++) but I believe it must be just a bug
in those compilers, new int is done in all the constructors and delete is
done in the destructor, so when clang++ reports one of the new int weren't
deallocated during constexpr evaluation I don't see how that would be
possible. When the same test has all the constexpr stuff, all the new int
are properly deleted at runtime when compiled by both compilers and valgrind
is happy about it, no leaks.
2025-02-07 Jakub Jelinek <jakub@redhat.com>
Jason Merrill <jason@redhat.com>
PR c++/86769
gcc/c-family/
* c-common.def (FOR_STMT): Add 2 operands and document them.
(WHILE_STMT): Likewise.
* c-common.h (WHILE_COND_PREP, WHILE_COND_CLEANUP): Define.
(FOR_COND_PREP, FOR_COND_CLEANUP): Define.
* c-gimplify.cc (genericize_c_loop): Add COND_PREP and COND_CLEANUP
arguments, handle them if they are non-NULL.
(genericize_for_stmt, genericize_while_stmt, genericize_do_stmt):
Adjust callers.
gcc/c/
* c-parser.cc (c_parser_while_statement): Add 2 further NULL_TREE
operands to build_stmt.
(c_parser_for_statement): Likewise.
gcc/cp/
* semantics.cc (set_one_cleanup_loc): New function.
(set_cleanup_locs): Use it.
(simplify_loop_decl_cond): Remove.
(adjust_loop_decl_cond): New function.
(begin_while_stmt): Add 2 further NULL_TREE operands to build_stmt.
(finish_while_stmt_cond): Call adjust_loop_decl_cond instead of
simplify_loop_decl_cond.
(finish_while_stmt): Call do_poplevel also on WHILE_COND_PREP if
non-NULL and also use pop_stmt_list rather than do_poplevel for
WHILE_BODY in that case. Call set_one_cleanup_loc.
(begin_for_stmt): Add 2 further NULL_TREE operands to build_stmt.
(finish_for_cond): Call adjust_loop_decl_cond instead of
simplify_loop_decl_cond.
(finish_for_stmt): Call do_poplevel also on FOR_COND_PREP if non-NULL
and also use pop_stmt_list rather than do_poplevel for FOR_BODY in
that case. Call set_one_cleanup_loc.
* constexpr.cc (cxx_eval_loop_expr): Handle
{WHILE,FOR}_COND_{PREP,CLEANUP}.
(check_for_return_continue): Handle {WHILE,FOR}_COND_PREP.
(potential_constant_expression_1): RECUR on
{WHILE,FOR}_COND_{PREP,CLEANUP}.
gcc/testsuite/
* g++.dg/diagnostic/redeclaration-7.C: New test.
* g++.dg/expr/for3.C: New test.
* g++.dg/expr/for4.C: New test.
* g++.dg/expr/for5.C: New test.
* g++.dg/expr/for6.C: New test.
* g++.dg/expr/for7.C: New test.
* g++.dg/expr/for8.C: New test.
* g++.dg/ext/stmtexpr27.C: New test.
* g++.dg/cpp2a/constexpr-86769.C: New test.
* g++.dg/cpp26/name-independent-decl7.C: New test.
* g++.dg/cpp26/name-independent-decl8.C: New test.
Iain Sandoe [Mon, 12 May 2025 15:57:54 +0000 (16:57 +0100)]
c++, coroutines: Fix handling of early exceptions [PR113773].
This is a GCC-14 version of the same strategy as used on trunk, but
with the more wide-ranging code cleanups elided. Since the return
expression could throw, but after the frame is destroyed, we must
also account for this, in addition to whether initial await_resume
has been called.
PR c++/113773
gcc/cp/ChangeLog:
* coroutines.cc (coro_rewrite_function_body): Do not set
initial_await_resume_called here.
(morph_fn_to_coro): Set it here, and introduce a new flag
that indicates we have not yet reached the ramp return.
Gate the EH cleanups on both of these flags.
Nathaniel Shead [Wed, 14 May 2025 10:49:22 +0000 (20:49 +1000)]
c++: Partially revert "Support lambdas attached to more places in modules" [PR118245]
r14-9232-g3685fae23bb008 broke the ABI for lambdas in base classes,
causing ICEs when different lambdas got given the same mangled name.
This patch reverts the parser.cc changes from that patch to restore the
old behaviour. The properly fixed behaviour is available in GCC 15.1
with r15-7202-g8990070b4297b9, but that change was not suitable for
backporting.
PR c++/118245
gcc/cp/ChangeLog:
* parser.cc (cp_parser_class_head): Remove lambda scope when
parsing base classes.
gcc/testsuite/ChangeLog:
* g++.dg/modules/lambda-7_a.H: Expect the test to fail.
* g++.dg/modules/lambda-7_b.C: Likewise.
* g++.dg/modules/lambda-7_c.C: Likewise.
* g++.dg/cpp2a/lambda-uneval23.C: New test.
Joseph Myers [Mon, 18 Nov 2024 22:24:48 +0000 (22:24 +0000)]
c: Allow bool and enum null pointer constants [PR112556]
As reported in bug 112556, GCC wrongly rejects conversion of null
pointer constants with bool or enum type to pointers in
convert_for_assignment (assignment, initialization, argument passing,
return). Fix the code there to allow BOOLEAN_TYPE and ENUMERAL_TYPE;
it already allowed INTEGER_TYPE and BITINT_TYPE.
This bug (together with -std=gnu23 meaning false has type bool rather
than int) has in turn resulted in people thinking they need to fix
code using false as a null pointer constant for C23 compatibility.
While such a usage is certainly questionable, it has nothing to do
with C23 compatibility and the right place for warnings about such
usage is -Wzero-as-null-pointer-constant. I think it would be
appropriate to extend -Wzero-as-null-pointer-constant to cover
BOOLEAN_TYPE, ENUMERAL_TYPE and BITINT_TYPE (in all the various
contexts in which that option generates warnings), though this patch
doesn't do anything about that option.
Bootstrapped with no regressions for x86-64-pc-linux-gnu.
PR c/112556
gcc/c/
* c-typeck.cc (convert_for_assignment): Allow conversion of
ENUMERAL_TYPE and BOOLEAN_TYPE null pointer constants to pointers.
gcc/testsuite/
* gcc.dg/c11-null-pointer-constant-1.c,
gcc.dg/c23-null-pointer-constant-1.c: New tests.
Kyle Huey [Wed, 14 May 2025 03:26:26 +0000 (20:26 -0700)]
dwarf2out: Propagate dtprel into the .debug_addr table in resolve_addr_in_expr
For a debugger to display statically-allocated[0] TLS variables the compiler
must communicate information[1] that can be used in conjunction with knowledge
of the runtime enviroment[2] to calculate a location for the variable for
each thread. That need gives rise to dw_loc_dtprel in dwarf2out, a flag tracking
whether the location description is dtprel, or relative to the
"dynamic thread pointer". Location descriptions in the .debug_info section for
TLS variables need to be relocated by the static linker accordingly, and
dw_loc_dtprel controls emission of the needed relocations.
This is further complicated by -gsplit-dwarf. -gsplit-dwarf is designed to allow
as much debugging information as possible to bypass the static linker to improve
linking performance. One of the ways that is done is by introducing a layer of
indirection for relocatable values[3]. That gives rise to addr_index_table which
ultimately results in the .debug_addr section.
While the code handling addr_index_table clearly contemplates the existence of
dtprel entries[4] resolve_addr_in_expr does not, and the result is that when
using -gsplit-dwarf the DWARF for TLS variables contains an address[5] rather
than an offset, and debuggers can't work with that.
This is visible on a trivial example. Compile
```
static __thread int tls_var;
int main(void) {
tls_var = 42;
return 0;
}
```
with -g and -g -gsplit-dwarf. Run the program under gdb. When examining the
value of tls_var before and after the assignment, -g behaves as one would
expect but -g -gsplit-dwarf does not. If the user is lucky and the miscalculated
address is not mapped, gdb will print "Cannot access memory at address ...".
If the user is unlucky and the miscalculated address is mapped, gdb will simply
give the wrong value. You can further confirm that the issue is the address
calculation by asking gdb for the address of tls_var and comparing that to what
one would expect.[6]
Thankfully this is trivial to fix by modifying resolve_addr_in_expr to propagate
the dtprel character of the location where necessary. gdb begins working as
expected and the diff in the generated assembly is clear.
[0] Referring to e.g. __thread as statically-allocated vs. e.g. a
dynamically-allocated pthread_key_create() call.
[1] Generally an offset in a TLS block.
[2] With glibc, provided by libthread_db.so.
[3] Relocatable values are moved to a table in the .debug_addr section, those
values in .debug_info are replaced with special values that look up indexes
in that table, and then the static linker elsewhere assigns a single per-CU
starting index in the .debug_addr section, allowing those special values to
remain permanently fixed and the resulting data to be ignored by the linker.
[4] ate_kind_rtx_dtprel exists, after all, and new_addr_loc_descr does produce
it where appropriate.
[5] e.g. an address in the .tbss/.tdata section.
[6] e.g. on x86-64 by examining %fsbase and the offset in the assembly
2025-05-01 Kyle Huey <me@kylehuey.com>
* dwarf2out.cc (resolve_addr_in_expr): Propagate dtprel into the address
table when appropriate.
where the return type of value should be 'int &' since '(val)' is an
expression, not a name, and decltype(auto) performs the type deduction
using the decltype rules.
The problem is that we weren't propagating REF_PARENTHESIZED_P
correctly: the return value of finish_non_static_data_member in this
test was a REFERENCE_REF_P, so we didn't set the flag. We should
use force_paren_expr like below.
PR c++/116379
gcc/cp/ChangeLog:
* pt.cc (tsubst_expr) <COMPONENT_REF>: Use force_paren_expr to set
REF_PARENTHESIZED_P.
Marek Polacek [Wed, 29 Jan 2025 20:58:38 +0000 (15:58 -0500)]
c++: auto in trailing-return-type in parameter [PR117778]
This PR describes a few issues, both ICE and rejects-valid, but
ultimately the problem is that we don't properly synthesize the
second auto in:
int
g (auto fp() -> auto)
{
return fp ();
}
since r12-5860, which disabled auto_is_implicit_function_template_parm_p
in cp_parser_parameter_declaration after parsing the decl-specifier-seq.
If there is no trailing auto, there is no problem.
So we have to make sure auto_is_implicit_function_template_parm_p is
properly set when parsing the trailing auto. A complication is that
one can write:
auto f (auto fp(auto fp2() -> auto) -> auto) -> auto;
~~~~~~~
where only the underlined auto should be synthesized. So when we
parse a parameter-declaration-clause inside another
parameter-declaration-clause, we should not enable the flag. We
have no flags to keep track of such nesting, but I think I can walk
current_binding_level to see if we find ourselves in such an unlikely
scenario.
Marek Polacek [Wed, 14 May 2025 14:34:44 +0000 (10:34 -0400)]
c++: fix reporting routines re-entered [PR119303]
We crash while we call warning_at ("inline function used but never defined")
since it invokes dump_template_bindings -> tsubst -> ... -> convert_like ->
... -> c_common_truthvalue_conversion -> warning_at ("enum constant in boolean
context")
cp_truthvalue_conversion correctly gets complain=0 but it calls
c_common_truthvalue_conversion from c-family which doesn't have
a similar parameter.
We can fix this by tweaking diagnostic_context::report_diagnostic to
check for recursion after checking if the diagnostic was enabled.
PR c++/116960
PR c++/119303
gcc/ChangeLog:
* diagnostic.cc (diagnostic_context::report_diagnostic): Check for
non-zero m_lock later, after checking diagnostic_enabled.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/lambda-uneval26.C: New test.
* g++.dg/warn/undefined2.C: New test.
Marek Polacek [Tue, 11 Feb 2025 20:43:40 +0000 (15:43 -0500)]
c++: ICE with operator new[] in constexpr [PR118775]
Here we ICE since r11-7740 because we no longer say that (long)&a
(where a is a global var) is non_constant_p. So VERIFY_CONSTANT
does not return and we crash on tree_to_uhwi. We should check
tree_fits_uhwi_p before calling tree_to_uhwi.
Nathaniel Shead [Thu, 8 May 2025 13:06:13 +0000 (23:06 +1000)]
c++/modules: Fix handling of -fdeclone-ctor-dtor with explicit instantiations [PR120125]
The attached testcase ICEs in maybe_thunk_body because we haven't
created a node in the cgraph for an imported explicit instantiation yet.
We in fact really shouldn't be emitting calls at all, since an imported
explicit instantiation always exists in the TU we imported it from. But
the required logic for that doesn't exist in this branch, so we'll just
xfail the relevant check.
PR c++/120125
gcc/cp/ChangeLog:
* optimize.cc (maybe_thunk_body): Don't assume 'fn' has a cgraph
node created.
gcc/testsuite/ChangeLog:
* g++.dg/modules/clone-4_a.C: New test.
* g++.dg/modules/clone-4_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
Jakub Jelinek [Tue, 13 May 2025 12:20:22 +0000 (14:20 +0200)]
libfortran: Fix up _gfortran_{,m,s}findloc2_s{1,4} [PR120196]
As mentioned in the PR, _gfortran_{,m,s}findloc2_s{1,4} iterate too many
times in the back case if nothing is found.
For !back, the loops are for (i = 1; i <= extent; i++) so i is in the
body [1, extent] if nothing is found, but for back it is
for (i = extent; i >= 0; i--) so i is in the body [0, extent] and compares
one element before the start of the array.
Note, findloc1_s{1,4} uses
for (n = len; n > 0; n--, src -= delta * len_array)
for the back loop and
for (n = 1; n <= len; n++, src += delta * len_array)
for !back. This patch fixes that.
The testcase fails under valgrind without the libgfortran changes and
succeeds with those.
2025-05-13 Jakub Jelinek <jakub@redhat.com>
PR libfortran/120196
* m4/ifindloc2.m4 (header1, header2): For back use i > 0 rather than
i >= 0 as for condition.
* generated/findloc2_s1.c: Regenerate.
* generated/findloc2_s4.c: Regenerate.
Jakub Jelinek [Tue, 13 May 2025 12:19:25 +0000 (14:19 +0200)]
libfortran: Fix up _gfortran_s{max,min}loc1_{4,8,16}_s{1,4} [PR120191]
There is a bug in _gfortran_s{max,min}loc1_{4,8,16}_s{1,4} which the
following testcase shows.
The functions return but then crash in the caller.
Seems that is because buffer overflows, I believe those functions for
if (mask == NULL || *mask) condition being false are supposed to fill in
the result array with all zeros (or allocate it and fill it with zeros).
My understanding is the result array in that case is integer(kind={4,8,16})
and should have the extents the character input array has.
The problem is that it uses * string_len in the extent multiplication:
extent[n] = GFC_DESCRIPTOR_EXTENT(array,n) * string_len;
and
extent[n] =
GFC_DESCRIPTOR_EXTENT(array,n + 1) * string_len;
which is I guess fine and desirable for the extents of the character array,
but not for the extents of the destination array. Yet the code uses
that extent array for that purpose (and no other purposes).
Here it uses it to set the dimensions for the case where it needs to
allocate (as well as size):
for (n = 0; n < rank; n++)
{
if (n == 0)
str = 1;
else
str = GFC_DESCRIPTOR_STRIDE(retarray,n-1) * extent[n-1];
GFC_DIMENSION_SET(retarray->dim[n], 0, extent[n] - 1, str);
}
Here it uses it for bounds checking of the destination:
if (unlikely (compile_options.bounds_check))
{
for (n=0; n < rank; n++)
{
index_type ret_extent;
ret_extent = GFC_DESCRIPTOR_EXTENT(retarray,n);
if (extent[n] != ret_extent)
runtime_error ("Incorrect extent in return value of"
" MAXLOC intrinsic in dimension %ld:"
" is %ld, should be %ld", (long int) n + 1,
(long int) ret_extent, (long int) extent[n]);
}
}
and here to find out how many retarray elements to actually fill in each
dimension:
while(1)
{
*dest = 0;
count[0]++;
dest += dstride[0];
n = 0;
while (count[n] == extent[n])
{
/* When we get to the end of a dimension, reset it and increment
the next dimension. */
count[n] = 0;
/* We could precalculate these products, but this is a less
frequently used path so probably not worth it. */
dest -= dstride[n] * extent[n];
Seems maxloc1s.m4 and minloc1s.m4 are the only users of ifunction-s.m4,
so we can change SCALAR_ARRAY_FUNCTION in there without breaking anything
else.
Jakub Jelinek [Tue, 13 May 2025 12:18:10 +0000 (14:18 +0200)]
libfortran: Fix up _gfortran_s{max,min}loc2_{4,8,16}_s{1,4} [PR120191]
I've tried to write a testcase for the BT_CHARACTER maxloc/minloc with named
or unnamed arguments and indeed the just posted patch fixed the arguments
in there in multiple cases to match what the library expects.
But the testcase still fails, due to library problems.
One dealt with in this patch are _gfortran_s{max,min}loc2_{4,8,16}_s{1,4}
functions. Those are trivial wrappers around
_gfortrani_{max,min}loc2_{4,8,16}_s{1,4} which should call those functions
if the scalar mask is true and just return 0 otherwise.
The two bugs I see there is that the back, len arguments are swapped,
which means that it always acts as back=.true. and for len will use
character length of 1 or 0 instead of the desired one.
The _gfortrani_{max,min}loc2_{4,8,16}_s{1,4} functions have prototypes like
GFC_INTEGER_4
maxloc2_4_s1 (gfc_array_s1 * const restrict array, GFC_LOGICAL_4 back, gfc_charlen_type len)
so back comes before len, ditto for the
GFC_INTEGER_4
smaxloc2_4_s1 (gfc_array_s1 * const restrict array,
GFC_LOGICAL_4 *mask, GFC_LOGICAL_4 back, gfc_charlen_type len)
The other problem is that it was just testing if (mask). In my limited
Fortran understanding that means that the optional argument mask was
supplied but nothing about its actual value. Other scalar mask generated
routines use if (mask == NULL || *mask) as the condition when to call the
non-masked function, i.e. when mask is not supplied (then it should act like
.true. mask) or when it is supplied and evaluates to .true.).
2025-05-13 Jakub Jelinek <jakub@redhat.com>
PR fortran/120191
* m4/maxloc2s.m4: For smaxloc2 call maxloc2 if mask is NULL or *mask.
Swap back and len arguments.
* m4/minloc2s.m4: Likewise.
* generated/maxloc2_4_s1.c: Regenerate.
* generated/maxloc2_4_s4.c: Regenerate.
* generated/maxloc2_8_s1.c: Regenerate.
* generated/maxloc2_8_s4.c: Regenerate.
* generated/maxloc2_16_s1.c: Regenerate.
* generated/maxloc2_16_s4.c: Regenerate.
* generated/minloc2_4_s1.c: Regenerate.
* generated/minloc2_4_s4.c: Regenerate.
* generated/minloc2_8_s1.c: Regenerate.
* generated/minloc2_8_s4.c: Regenerate.
* generated/minloc2_16_s1.c: Regenerate.
* generated/minloc2_16_s4.c: Regenerate.
Jakub Jelinek [Tue, 13 May 2025 12:14:55 +0000 (14:14 +0200)]
fortran: Fix up minloc/maxloc lowering [PR120191]
We need to drop the kind argument from what is passed to the
library, but need to do it not only when one uses the argument name
for it (so kind=4 etc.) but also when one passes all the arguments
to the intrinsics.
The following patch uses what gfc_conv_intrinsic_findloc uses,
which looks more efficient and cleaner, we already set automatic
vars to point to the kind and back actual arguments, so we can just
free/clear expr on the former and set name to "%VAL" on the latter.
And similarly clears dim argument for the BT_CHARACTER case when using
maxloc2/minloc2, again regardless of whether it was named or not.
2025-05-13 Jakub Jelinek <jakub@redhat.com>
Daniil Kochergin <daniil2472s@gmail.com>
Tobias Burnus <tburnus@baylibre.com>
PR fortran/120191
* trans-intrinsic.cc (strip_kind_from_actual): Remove.
(gfc_conv_intrinsic_minmaxloc): Don't call strip_kind_from_actual.
Free and clear kind_arg->expr if non-NULL. Set back_arg->name to
"%VAL" instead of a loop looking for last argument. Remove actual
variable, use array_arg instead. Free and clear dim_arg->expr if
non-NULL for BT_CHARACTER cases instead of using a loop.
Jonathan Wakely [Fri, 9 May 2025 16:50:52 +0000 (17:50 +0100)]
libstdc++: Restore std::scoped_lock for non-gthreads targets [PR120198]
This was a regression introduced with using version.def to define
feature test macros (r14-3248-g083b7f2833d71d). std::scoped_lock doesn't
need to depend on gthreads and so can be defined unconditionally, even
for freestanding.
libstdc++-v3/ChangeLog:
PR libstdc++/120198
* include/bits/version.def (scoped_lock): Do not depend on
gthreads or hosted.
* include/bits/version.h: Regenerate.
* include/std/mutex (scoped_lock): Update comment.
* testsuite/30_threads/scoped_lock/requirements/typedefs.cc:
Remove dg-require-gthreads and use custom lockable type instead
of std::mutex. Check that typedef is only present for a single
template argument.
Jakub Jelinek [Thu, 23 Jan 2025 10:17:47 +0000 (11:17 +0100)]
c++: Fix build_omp_array_section for type dependent array_expr [PR118590]
As can be seen on the testcase, when array_expr is type dependent, assuming
it has non-NULL TREE_TYPE is just wrong, it can often have NULL type, and even
if not, blindly assuming it is a pointer or array type is also wrong.
So, like in many other spots in the C++ FE, for type dependent expressions
we want to create something which will survive until instantiation and can be
redone at that point.
Unfortunately, build_omp_array_section is called before we actually do any
kind of checking what array_expr really is, and on invalid code it can be e.g.
a TYPE_DECL on which type_dependent_expression_p ICEs (as can be seen on the
pr67522.C testcase). So, I've hacked this by checking it is not TYPE_DECL,
I hope a TYPE_P can't make it through there when we just lookup an identifier.
Anyway, this patch is not enough, we can ICE e.g. on __uint128_t[0:something]
during instantiation, so I think something needs to be done for this in pt.cc
as well.
2025-01-23 Jakub Jelinek <jakub@redhat.com>
PR c++/118590
* typeck.cc (build_omp_array_section): If array_expr is type dependent
or a TYPE_DECL, build OMP_ARRAY_SECTION with NULL type.
Andrew Pinski [Sat, 11 Jan 2025 04:04:09 +0000 (20:04 -0800)]
final: Fix get_attr_length for asm goto [PR118411]
The problem is for inline-asm goto, the outer rtl insn type
is a jump_insn and get_attr_length does not handle ASM specially
unlike if the outer rtl insn type was just insn.
This fixes the issue by adding support for both CALL_INSN and JUMP_INSN
with asm.
OK? Bootstrapped and tested on x86_64-linux-gnu.
PR middle-end/118411
gcc/ChangeLog:
* final.cc (get_attr_length_1): Handle asm for CALL_INSN
and JUMP_INSNs.
Harald Anlauf [Sat, 3 May 2025 18:35:57 +0000 (20:35 +0200)]
Fortran: array subreferences and components of derived types [PR119986]
PR fortran/119986
gcc/fortran/ChangeLog:
* expr.cc (is_subref_array): When searching for array references,
do not terminate early so that inquiry references to complex
components work.
* primary.cc (gfc_variable_attr): A substring reference can refer
to either a scalar or array character variable. Adjust search
accordingly.
Martin Jambor [Tue, 6 May 2025 15:28:43 +0000 (17:28 +0200)]
ipa: Do not emit info about temporary clones to ipa-clones dump (PR119852)
As described in PR 119852, the output of -fdump-ipa-clones can contain
"(null)" as the suffix/reason for cloning when we need to create a
clone to hold the original function during recursive inlining. Such
clone is never output and so should not be part of the dump output
either.
gcc/ChangeLog:
2025-04-23 Martin Jambor <mjambor@suse.cz>
PR ipa/119852
* cgraphclones.cc (dump_callgraph_transformation): Document the
function. Do not dump if suffix is NULL.
Martin Jambor [Tue, 6 May 2025 15:28:42 +0000 (17:28 +0200)]
Document option -fdump-ipa-clones
I have noticed that the option -fdump-ipa-clones is not documented
although there are users who depend on it. This patch adds the
missing documentation along with the description of the information it
dumps and the format it uses.
I am never quite sure which of the texinfo mark-ups is the most
appropriate in which situation, I'll of course incorporate any
feedback on this as well as the general wording of the text.
After we settle on a version, I'd like to backport the documentation
also at least to GCC 15, 14 and 13.
Is it perhaps OK for master and the branches or what would better be
changed?
Pan Li [Tue, 30 Apr 2024 01:42:39 +0000 (09:42 +0800)]
DSE: Fix ICE after allow vector type in get_stored_val
We allowed vector type for get_stored_val when read is less than or
equal to store in previous. Unfortunately, the valididate_subreg
treats the vector type's size is less than vector register as
invalid. Then we will have ICE here.
This patch would like to fix it by filter-out the invalid type size,
and make sure the subreg is valid for both the read_mode and store_mode
before perform the real gen_lowpart.
The below test suites are passed for this patch:
* The x86 bootstrap test.
* The x86 regression test.
* The riscv rv64gcv regression test.
* The riscv rv64gc regression test.
* The aarch64 regression test.
gcc/ChangeLog:
* dse.cc (get_stored_val): Make sure read_mode/write_mode
is valid subreg before gen_lowpart.
Dimitar Dimitrov [Thu, 24 Oct 2024 16:59:42 +0000 (19:59 +0300)]
testsuite: Require effective target pie for pr113197
The test for PR113197 explicitly enables PIE. But targets without PIE
emit warnings when -fpie is passed (e.g. pru and avr), which causes the
test to fail.
Fix by adding an effective target requirement for PIE.
With this patch, the test now is marked as unsupported for
pru-unknown-elf. Testing for x86_64-pc-linux-gnu passes with current
mainline, and fails if the fix from r15-4018-g02f4efe3c12cf7 is
reverted.
Jakub Jelinek [Fri, 31 Jan 2025 11:39:34 +0000 (12:39 +0100)]
testsuite: Add testcase for already fixed PR [PR117498]
This wrong-code issue has been fixed with r15-7249.
We still emit warnings which are questionable and perhaps we'd
get better generated code if niters determined the loop has only a single
iteration without UB and we'd punt on vectorizing it (or unrolling).
2025-01-31 Jakub Jelinek <jakub@redhat.com>
PR middle-end/117498
* gcc.c-torture/execute/pr117498.c: New test.
but stretch_14 is a 8-bit boolean so the two forms are not equivalent, that
is to say dropping the "& 1" is wrong. It's another instance of the issue:
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558537.html
Here it's the reverse case: the bitwise NOT (~) is treated as logical by the
machinery in range-op.cc but the bitwise AND (&) is *not* treated as logical
by that of vr-values.cc, leading to the same problematic outcome.
gcc/
* vr-values.cc (simplify_using_ranges::simplify) <BIT_AND_EXPR>:
Do not call simplify_bit_ops_using_ranges for boolean types whose
precision is not 1.
gcc/testsuite/
* gnat.dg/opt106.adb: New test.
* gnat.dg/opt106_pkg1.ads, gnat.dg/opt106_pkg1.adb: New helper.
* gnat.dg/opt106_pkg2.ads, gnat.dg/opt106_pkg2.adb: Likewise.
Richard Biener [Tue, 14 May 2024 09:13:51 +0000 (11:13 +0200)]
tree-optimization/120156 - ICE in ptr_derefs_may_alias_p
This picks the ptr_derefs_may_alias_p fix from the PR99954 fix
which said: This makes us run into a latent issue in
ptr_deref_may_alias_decl_p when the pointer is something like &MEM[0].a
in which case we fail to handle non-SSA name pointers. Add code
similar to what we have in ptr_derefs_may_alias_p.
PR tree-optimization/120156
* tree-ssa-alias.cc (ptr_deref_may_alias_decl_p): Verify
the pointer is an SSA name.
aarch64: Fix CFA offsets in non-initial stack probes [PR119610]
PR119610 is about incorrect CFI output for a stack probe when that
probe is not the initial allocation. The main aarch64 stack probe
function, aarch64_allocate_and_probe_stack_space, implicitly assumed
that the incoming stack pointer pointed to the top of the frame,
and thus held the CFA.
aarch64_save_callee_saves and aarch64_restore_callee_saves use a
parameter called bytes_below_sp to track how far the stack pointer
is above the base of the static frame. This patch does the same
thing for aarch64_allocate_and_probe_stack_space.
Also, I noticed that the SVE path was attaching the first CFA note
to the wrong instruction: it was attaching the note to the calculation
of the stack size, rather than to the r11<-sp copy.
gcc/
PR target/119610
* config/aarch64/aarch64.cc (aarch64_allocate_and_probe_stack_space):
Add a bytes_below_sp parameter and use it to calculate the CFA
offsets. Attach the first SVE CFA note to the move into the
associated temporary register.
(aarch64_allocate_and_probe_stack_space): Update calls accordingly.
Start out with bytes_per_sp set to the frame size and decrement
it after each allocation.
gcc/testsuite/
PR target/119610
* g++.dg/torture/pr119610.C: New test.
* g++.target/aarch64/sve/pr119610-sve.C: Likewise.
Jakub Jelinek [Tue, 22 Apr 2025 19:27:28 +0000 (21:27 +0200)]
rs6000: Ignore OPTION_MASK_SAVE_TOC_INDIRECT differences in inlining decisions [PR119327]
The following testcase FAILs because the always_inline function can't
be inlined.
The rs6000 backend has similarly to other targets a hook which rejects
inlining which would bring in new ISAs which aren't there in the caller.
And this hook rejects this because of OPTION_MASK_SAVE_TOC_INDIRECT
differences.
This flag is set if explicitly requested or by default depending on
whether the current function looks hot (or at least not cold):
if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
&& flag_shrink_wrap_separate
&& optimize_function_for_speed_p (cfun))
rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
The target nodes that are being compared here are actually the default
target node (which was created when cfun was NULL) vs. one that was
created for the always_inline function when it wasn't NULL, so one
doesn't have it, the other does.
In any case, this flag feels like a tuning decision rather than hard
ISA requirement and I see no problems why we couldn't inline
even explicit -msave-toc-indirect function into -mno-save-toc-indirect
or vice versa.
We already ignore OPTION_MASK_P{8,10}_FUSION which are also more
like tuning flags.
2025-04-22 Jakub Jelinek <jakub@redhat.com>
PR target/119327
* config/rs6000/rs6000.cc (rs6000_can_inline_p): Ignore also
OPTION_MASK_SAVE_TOC_INDIRECT differences.