Jakub Jelinek [Wed, 5 Feb 2025 13:06:42 +0000 (14:06 +0100)]
cselib: Fix up previous patch for SPARC [PR117239]
Sorry, our CI bot just notified me I broke SPARC build. There are two
#ifdef STACK_ADDRESS_OFFSET
guarded snippets and the macro is only defined on SPARC target, so I didn't
notice there was a syntax error.
Fixed thusly.
2025-02-05 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/117239
* cselib.cc (cselib_init): Remove spurious closing paren in
the #ifdef STACK_ADDRESS_OFFSET specific code.
Jakub Jelinek [Wed, 5 Feb 2025 12:16:17 +0000 (13:16 +0100)]
cselib: For CALL_INSNs to const/pure fns invalidate memory below sp [PR117239]
The following testcase is miscompiled on x86_64 during postreload.
After reload (with IPA-RA figuring out the calls don't modify any
registers but %rax for return value) postreload sees
(insn 14 12 15 2 (set (mem:DI (plus:DI (reg/f:DI 7 sp)
(const_int 16 [0x10])) [0 S8 A64])
(reg:DI 1 dx [orig:105 q+16 ] [105])) "pr117239.c":18:7 95 {*movdi_internal}
(nil))
(call_insn/i 15 14 16 2 (set (reg:SI 0 ax)
(call (mem:QI (symbol_ref:DI ("baz") [flags 0x3] <function_decl 0x7ffb2e2bdf00 r>) [0 baz S1 A8])
(const_int 24 [0x18]))) "pr117239.c":18:7 1476 {*call_value}
(expr_list:REG_CALL_DECL (symbol_ref:DI ("baz") [flags 0x3] <function_decl 0x7ffb2e2bdf00 baz>)
(expr_list:REG_EH_REGION (const_int 0 [0])
(nil)))
(nil))
(insn 16 15 18 2 (parallel [
(set (reg/f:DI 7 sp)
(plus:DI (reg/f:DI 7 sp)
(const_int 24 [0x18])))
(clobber (reg:CC 17 flags))
]) "pr117239.c":18:7 285 {*adddi_1}
(expr_list:REG_ARGS_SIZE (const_int 0 [0])
(nil)))
...
(call_insn/i 19 18 21 2 (set (reg:SI 0 ax)
(call (mem:QI (symbol_ref:DI ("foo") [flags 0x3] <function_decl 0x7ffb2e2bdb00 l>) [0 foo S1 A8])
(const_int 0 [0]))) "pr117239.c":19:3 1476 {*call_value}
(expr_list:REG_CALL_DECL (symbol_ref:DI ("foo") [flags 0x3] <function_decl 0x7ffb2e2bdb00 foo>)
(expr_list:REG_EH_REGION (const_int 0 [0])
(nil)))
(nil))
(insn 21 19 26 2 (parallel [
(set (reg/f:DI 7 sp)
(plus:DI (reg/f:DI 7 sp)
(const_int -24 [0xffffffffffffffe8])))
(clobber (reg:CC 17 flags))
]) "pr117239.c":19:3 discrim 1 285 {*adddi_1}
(expr_list:REG_ARGS_SIZE (const_int 24 [0x18])
(nil)))
(insn 26 21 24 2 (set (mem:DI (plus:DI (reg/f:DI 7 sp)
(const_int 16 [0x10])) [0 S8 A64])
(reg:DI 1 dx [orig:105 q+16 ] [105])) "pr117239.c":19:3 discrim 1 95 {*movdi_internal}
(nil))
i.e.
movq %rdx, 16(%rsp)
call baz
addq $24, %rsp
...
call foo
subq $24, %rsp
movq %rdx, 16(%rsp)
Now, postreload uses cselib and cselib remembered that %rdx value has been
stored into 16(%rsp). Both baz and foo are pure calls. If they weren't,
when processing those CALL_INSNs cselib would invalidate all MEMs
if (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)
|| !(RTL_CONST_OR_PURE_CALL_P (insn)))
cselib_invalidate_mem (callmem);
where callmem is (mem:BLK (scratch)). But they are pure, so instead the
code just invalidates the argument slots from CALL_INSN_FUNCTION_USAGE.
The calls actually clobber more than that, even const/pure calls clobber
all memory below the stack pointer. And that is something that hasn't been
invalidated. In this failing testcase, the call to baz is not a big deal,
we don't have anything remembered in memory below %rsp at that call.
But then we increment %rsp by 24, so the old %rsp+16 slot is now 8 bytes below
the stack pointer, and do the call to foo. And that call now actually, not just in theory,
clobbers the memory below the stack pointer (in particular overwrites it
with the return value). But cselib does not invalidate. Then %rsp
is decremented again (in preparation for another call, to bar) and cselib
is processing store of %rdx (which IPA-RA says has not been modified by
either baz or foo calls) to %rsp + 16, and it sees the memory already has
that value, so the store is useless, let's remove it.
But it is not, the call to foo has changed it, so it needs to be stored
again.
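The sequence above can be replayed on a toy value cache to see why the
second store gets dropped. This is only an illustrative sketch, not cselib's
data structures: ToyCache, record_store, store_is_redundant,
invalidate_below and second_store_deleted are hypothetical names, addresses
are plain integers, and the starting stack pointer and the value in %rdx are
arbitrary.

```cpp
#include <cstdint>
#include <map>

// Toy stand-in for a value-tracking cache: absolute stack address -> value
// believed to be stored there.  Hypothetical; not how cselib represents MEMs.
struct ToyCache {
  std::map<std::int64_t, std::int64_t> known;

  void record_store(std::int64_t addr, std::int64_t value) {
    known[addr] = value;
  }

  // A store is redundant only if the cache still believes the slot holds
  // exactly this value.
  bool store_is_redundant(std::int64_t addr, std::int64_t value) const {
    auto it = known.find(addr);
    return it != known.end() && it->second == value;
  }

  // Forget every slot strictly below the stack pointer (stack grows down),
  // i.e. the area a call may clobber even when the callee is const/pure.
  void invalidate_below(std::int64_t sp) {
    known.erase(known.begin(), known.lower_bound(sp));
  }
};

// Replays the instruction sequence from the dump.  Returns whether the
// second store would be treated as redundant (and hence deleted).
bool second_store_deleted(bool invalidate_on_pure_call) {
  ToyCache cache;
  std::int64_t sp = 1000;       // arbitrary starting stack pointer
  const std::int64_t rdx = 42;  // arbitrary value that survives both calls

  cache.record_store(sp + 16, rdx);                         // movq %rdx, 16(%rsp)
  if (invalidate_on_pure_call) cache.invalidate_below(sp);  // call baz
  sp += 24;                                                 // addq $24, %rsp
  if (invalidate_on_pure_call) cache.invalidate_below(sp);  // call foo
  sp -= 24;                                                 // subq $24, %rsp
  return cache.store_is_redundant(sp + 16, rdx);            // movq %rdx, 16(%rsp)
}
```

Without the invalidation the model still believes the slot holds %rdx and
deletes the store; with it, the slot is forgotten at the call to foo and the
store is kept.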
The following patch adds targeted invalidation of memory below stack
pointer (or on SPARC memory below stack pointer + 2047 when stack bias is
used, or on PA memory above stack pointer instead).
It does so only in !ACCUMULATE_OUTGOING_ARGS or cfun->calls_alloca functions,
because in other functions the stack pointer should be constant from
the end of prologue till start of epilogue and so nothing should be stored
within the function below the stack pointer.
Now, memory below stack pointer is special, except for functions using
alloca/VLAs I believe no addressable memory should be there, it should be
purely outgoing function argument area, if we take address of some automatic
variable, it should live all the time above the outgoing function argument
area. So on top of just trying to flush memory below stack pointer
(represented by %rsp - PTRDIFF_MAX with PTRDIFF_MAX size on most arches),
the patch tries to optimize and only invalidate memory that has address
clearly derived from stack pointer (memory with other bases is not
invalidated) and if we can prove (we see same SP_DERIVED_VALUE_P bases in
both VALUEs) it is above current stack, also don't call
canon_anti_dependence which might just give up in certain cases.
I've gathered statistics from x86_64-linux and i686-linux
bootstraps/regtests. During -m64 compilations from those, there were 3718396 + 42634 + 27761 cases of processing MEMs in cselib_invalidate_mem
(callmem[1]) calls, the first number is number of MEMs not invalidated
because of the optimization, i.e.
+ if (sp_derived_base == NULL_RTX)
+ {
+ has_mem = true;
+ num_mems++;
+ p = &(*p)->next;
+ continue;
+ }
in the patch, the second number is number of MEMs not invalidated because
canon_anti_dependence returned false and finally the last number is number
of MEMs actually invalidated (so that is what hasn't been invalidated
before). During -m32 compilations the numbers were 1422412 + 39354 + 16509 with the same meaning.
Note, when there is no red zone, in theory even the sp = sp + incr
instruction invalidates memory below the new stack pointer, as signal
can come and overwrite the memory. So maybe we should be invalidating
something at those instructions as well. But in leaf functions we certainly
can have even addressable automatic vars in the red zone (which would make
it harder to distinguish), but on the other hand we aren't normally storing
anything below the red zone, and in non-leaf it should normally be just the
outgoing arguments area.
2025-02-05 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/117239
* cselib.cc: Include predict.h.
(callmem): Change type from rtx to rtx[2].
(cselib_preserve_only_values): Use callmem[0] rather than callmem.
(cselib_invalidate_mem): Optimize and don't try to invalidate
for the mem_rtx == callmem[1] case MEMs which clearly can't be
below the stack pointer.
(cselib_process_insn): Use callmem[0] rather than callmem.
For const/pure calls also call cselib_invalidate_mem (callmem[1])
in !ACCUMULATE_OUTGOING_ARGS or cfun->calls_alloca functions.
(cselib_init): Initialize callmem[0] rather than callmem and also
initialize callmem[1].
Richard Earnshaw [Thu, 19 Dec 2024 16:00:48 +0000 (16:00 +0000)]
arm: Use POP {pc} to return when returning [PR118089]
When generating thumb2 code,
LDM SP!, {PC}
is a two-byte instruction, whereas
LDR PC, [SP], #4
needs 4 bytes. When optimizing for size, or when there's no obvious
performance benefit, prefer the former.
gcc/ChangeLog:
PR target/118089
* config/arm/arm.cc (thumb2_expand_return): Use LDM SP!, {PC}
when optimizing for size, or when there's no performance benefit over
LDR PC, [SP], #4.
(arm_expand_epilogue): Likewise.
Richard Earnshaw [Thu, 19 Dec 2024 15:54:16 +0000 (15:54 +0000)]
arm: remove constraints from *pop_multiple_with_writeback_and_return
This pattern is intended to be used only by the epilogue generation
code and will always use fixed hard registers. As such, it does not
need any register constraints, which might be misleading if a
post-reload pass wanted to try renumbering various registers. So
remove the constraints.
Furthermore, to permit this pattern to match when popping just the PC
(which is not a valid register_operand), remove the match on the first
transfer register: pop_multiple_return will validate everything it
needs to.
gcc/ChangeLog:
* config/arm/arm.md (*pop_multiple_with_writeback_and_return): Remove
constraints. Don't validate the first transfer register here.
Richard Earnshaw [Thu, 19 Dec 2024 15:32:36 +0000 (15:32 +0000)]
arm: cleanup code in ldm_stm_operation_p; relax limits on ldm/stm
I needed to make some adjustments to this function to permit a push or
pop of a single register in thumb2 code, since ldm/stm can be a
two-byte instruction instead of 4. Trying to read the code as it was
made me scratch my head as the logic was not very clear. So this
patch cleans up the code somewhat, fixes a couple of minor bugs and
removes the limit of having to use multiple registers when using this
form of the instruction (the shape of this pattern is such that I
can't see it being generated automatically by the compiler, so there
should be no adverse effects of this).
Buglets fixed:
- Validate that the first element contains RETURN if we're matching
a return instruction.
- Don't allow the base address register to be stored if saving regs
and the address is being updated (this is unpredictable in the
architecture).
- Verify that the last register loaded in a RETURN insn is the PC.
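Two of the checks listed above can be sketched on a plain register bitmask.
This is a hedged illustration only, not the code in ldm_stm_operation_p:
the function names and the bitmask encoding (bit i set means register ri is
transferred) are assumptions made for the example.

```cpp
// Register number of the program counter on Arm.
constexpr unsigned kPC = 15;

// Store-multiple check: storing the base register while also updating it
// (writeback) is unpredictable, so reject that combination.
bool store_multiple_ok(unsigned base, unsigned reg_mask, bool writeback) {
  const bool base_in_list = (reg_mask & (1u << base)) != 0;
  return !(writeback && base_in_list);
}

// Return check: registers are transferred in ascending order, and the PC is
// the highest-numbered register, so "the last register loaded is the PC" is
// the same as "the PC is in the list".  A list holding only the PC is fine.
bool return_load_ok(unsigned reg_mask) {
  return (reg_mask & (1u << kPC)) != 0;
}
```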
gcc/
* config/arm/arm.cc (decompose_addr_for_ldm_stm): New function.
(ldm_stm_operation_p): Rework to clarify logic. Allow single
registers to be pushed or popped using LDM/STM.
It's still correct. But then when the SAD pattern is recognized:
patt_29 = SAD_EXPR <a_14, b_16, r_23>;
This is not correct. This only happens for targets with both uabd and
sabd but not vec_widen_{s,u}abd, currently LoongArch is the only target
affected.
The problem is vect_look_through_possible_promotion will throw away a
series of conversions if the effect is equivalent to a sign change and a
promotion, but here the sign change is definitely relevant, and the
promotion is also relevant for "mixed sign" cases like
r += abs((unsigned int)(unsigned char) a - (signed int)(signed char) b);
(we need to promote to HImode as the difference can exceed the range of
QImode).
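A small worked example of why both the sign change and the promotion matter
in the mixed-sign case. The helper names are hypothetical and the arithmetic
is done in int for simplicity; this only illustrates the value ranges, not
the vectorizer's code.

```cpp
#include <cstdlib>

// Mixed-sign absolute difference: a is zero-extended, b is sign-extended.
int mixed_abs_diff(unsigned char a, signed char b) {
  return std::abs(static_cast<int>(a) - static_cast<int>(b));
}

// What is computed if the sign information is thrown away and both inputs
// are treated as unsigned bytes.
int same_sign_abs_diff(unsigned char a, unsigned char b) {
  return std::abs(static_cast<int>(a) - static_cast<int>(b));
}
```

With a = 255 and b = -128 the mixed-sign difference is 383, which does not
fit in 8 bits, while reinterpreting b as the unsigned byte 128 gives 127.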
If there were any redundant promotion, it should have been stripped in
vect_recog_abd_pattern (i.e. when patt_31 = .ABD (a.0_4, b.1_6) is
recognized) instead of in vect_recog_sad_pattern, or we'd have a
missed-optimization if the ABD output is not summed. So anyway
vect_recog_sad_pattern is just not a proper location to call
vect_look_through_possible_promotion for the ABD inputs, remove the
calls to fix the issue.
gcc/ChangeLog:
PR tree-optimization/118727
* tree-vect-patterns.cc (vect_recog_sad_pattern): Don't call
vect_look_through_possible_promotion on ABD inputs.
gcc/testsuite/ChangeLog:
PR tree-optimization/118727
* gcc.dg/pr108692.c: Mention PR 118727 in the comment.
* gcc.dg/pr118727.c: New test case.
testsuite: Revert to the original version of pr100056.c
r15-268-g9dbff9c05520 restored the original GCC 11 output for
pr100056.c, so this patch reverts the changes made to the test
in r12-7259-g25332d2325c7. (The code parts of r12-7259 still
seem useful, as a belt-and-braces thing.)
gcc/testsuite/
* gcc.target/aarch64/pr100056.c: Restore the original version of
the scan-assemblers.
libstdc++: correct symbol version of typeinfo for bfloat16_t on RISC-V
broke the libstdc++-abi/abi_check test on Solaris: the log shows
1 incompatible symbols
0
Argument "{CXXABI_1.3.15}" isn't numeric in numeric eq (==) at /vol/gcc/src/hg/master/local/libstdc++-v3/scripts/extract_symvers.pl line 129.
version status: incompatible
type: uncategorized
status: added
The problem has two parts:
* The patch above introduced a new version in libstdc++.so,
CXXABI_1.3.16, which everywhere but on RISC-V contains no symbols (a
weak version). This is the first time this happened in libstdc++.
* Solaris uses scripts/extract_symvers.pl to determine the version info.
The script currently chokes on the pvs output for weak versions:
While this patch hardens the script to cope with weak versions, there's
no reason to introduce them in the first place. So the new version is
only created on __riscv.
Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and
x86_64-pc-linux-gnu.
2025-01-29 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3:
PR libstdc++/118701
* config/abi/pre/gnu.ver (CXXABI_1.3.16): Move __riscv guard
around version.
* scripts/extract_symvers.pl: Allow for weak versions.
* testsuite/util/testsuite_abi.cc (check_version): Wrap
CXXABI_1.3.16 in __riscv.
Jerry DeLisle [Wed, 5 Feb 2025 01:21:42 +0000 (17:21 -0800)]
Fortran: Fix PR 47485.
The -MT and -MQ options should replace the default target in the
generated dependency file. deps_add_target needs to be called before
cpp_read_main_file, otherwise the original object name is added.
Contributed by Vincent Vanlaer <vincenttc@volkihar.be>
PR fortran/47485
gcc/fortran/ChangeLog:
* cpp.cc: fix -MT/-MQ adding additional target instead of
replacing the default.
gcc/testsuite/ChangeLog:
* gfortran.dg/dependency_generation_1.f90: New test.
Signed-off-by: Vincent Vanlaer <vincenttc@volkihar.be>
Gaius Mulley [Tue, 4 Feb 2025 23:21:52 +0000 (23:21 +0000)]
PR modula2/115112 Incorrect line debugging information occurs during INC builtin
This patch fixes location bugs in BuildDecProcedure,
BuildIncProcedure, BuildInclProcedure, BuildExclProcedure and
BuildThrow. All these procedure functions use the token position
passed as a parameter (rather than from the quad stack). It also
fixes location bugs in CheckRangeIncDec to ensure that the token
position is stored on the quad stack before calling subsidiary
procedure functions.
gcc/m2/ChangeLog:
PR modula2/115112
* gm2-compiler/M2Quads.mod (BuildPseudoProcedureCall): Pass
tokno to each build procedure.
(BuildThrowProcedure): New parameter functok.
(BuildIncProcedure): New parameter proctok.
Pass proctok on the quad stack during every push.
(BuildDecProcedure): Ditto.
(BuildInclProcedure): New parameter proctok.
(BuildExclProcedure): New parameter proctok.
gcc/testsuite/ChangeLog:
PR modula2/115112
* gm2/pim/run/pass/dectest.mod: New test.
* gm2/pim/run/pass/inctest.mod: New test.
Jakub Jelinek [Tue, 4 Feb 2025 20:16:57 +0000 (21:16 +0100)]
c++: Fix ICE with #embed/RAW_DATA_CST after list conversion [PR118671]
The following testcases ICE with RAW_DATA_CSTs (so the first one since
introduction of #embed C++ optimizations and the latter since optimization
of large sequences of comma separated literals).
I've missed the fact that implicit_conversion can embed the exact expression
passed to it into stuff pointed out by conversion * (e.g. for user
conversions in sub->cand->args).
So, it isn't enough in convert_like_internal to pass the right INTEGER_CST
for each element of the RAW_DATA_CST because the whole RAW_DATA_CST might be
in sub->cand->args etc.
Either I'd need to chase for wherever the RAW_DATA_CST is found and update
those for each element processed, or, as implemented in the following patch,
build_list_conv detects the easy optimizable case where
convert_like_internal can be kept as the whole RAW_DATA_CST with changed
type and possibly narrowing diagnostics, and otherwise instead of having
a single subconversion it has RAW_DATA_CST separate subconversions.
Instead of trying to reallocate the subconvs array when we detect that case,
the patch instead uses an artificial ck_list inside of the u.list array
to hold the individual subconversions.
Seems the only places where u.list is used are build_list_conv and
convert_like_internal.
2025-02-04 Jakub Jelinek <jakub@redhat.com>
PR c++/118671
* call.cc (build_list_conv): For RAW_DATA_CST, call
implicit_conversion with INTEGER_CST representing first byte instead
of the whole RAW_DATA_CST. If it is an optimizable trivial
conversion, just save that to subconvs, otherwise allocate an
artificial ck_list for all the RAW_DATA_CST bytes and create
subsubconv for each of them.
(convert_like_internal): For ck_list with RAW_DATA_CST, instead of
doing all the checks for optimizable conversion just check kind and
assert everything else, otherwise use subsubconversions instead of
the subconversion for each element.
* g++.dg/cpp/embed-25.C: New test.
* g++.dg/cpp0x/pr118671.C: New test.
arm: testsuite: Adapt mve-vabs.c to improved codegen
Since commit r15-491-gc290e6a0b7a9de this failure happens on
armv8l-linux-gnueabihf and arm-eabi:
Running gcc:gcc.target/arm/simd/simd.exp ...
gcc.target/arm/simd/mve-vabs.c: memmove found 0 times
FAIL: gcc.target/arm/simd/mve-vabs.c scan-assembler-times memmove 3
In PR target/116010, Andrew Pinski noted that
"gcc.target/arm/simd/mve-vabs.c now calls memcpy because of the restrict
instead of memmove. That should be a simple fix there."
Therefore change the test to expect memcpy rather than memmove.
Another change is that memcpy is inlined rather than called, so also change
the test to check the optimized tree dump rather than the generated
assembly.
Tested on armv8l-linux-gnueabihf and arm-eabi.
gcc/testsuite/ChangeLog:
PR target/116010
* gcc.target/arm/simd/mve-vabs.c: Test tree dump and adjust to new
code.
Suggested-by: Andrew Pinski <quic_apinski@quicinc.com>
Marek Polacek [Wed, 29 Jan 2025 20:58:38 +0000 (15:58 -0500)]
c++: auto in trailing-return-type in parameter [PR117778]
This PR describes a few issues, both ICE and rejects-valid, but
ultimately the problem is that we don't properly synthesize the
second auto in:
int
g (auto fp() -> auto)
{
return fp ();
}
since r12-5860, which disabled auto_is_implicit_function_template_parm_p
in cp_parser_parameter_declaration after parsing the decl-specifier-seq.
If there is no trailing auto, there is no problem.
So we have to make sure auto_is_implicit_function_template_parm_p is
properly set when parsing the trailing auto. A complication is that
one can write:
auto f (auto fp(auto fp2() -> auto) -> auto) -> auto;
~~~~~~~
where only the underlined auto should be synthesized. So when we
parse a parameter-declaration-clause inside another
parameter-declaration-clause, we should not enable the flag. We
have no flags to keep track of such nesting, but I think I can walk
current_binding_level to see if we find ourselves in such an unlikely
scenario.
Richard Biener [Tue, 4 Feb 2025 09:54:48 +0000 (10:54 +0100)]
c/118742 - gimple FE parsing of unary operators of C promoted args
The GIMPLE FE currently invokes parser_build_unary_op to build
unary GENERIC which has the operand subject to C promotion rules
which does not match GIMPLE. The following adds a wrapper around
the build_unary_op worker which conveniently has an argument to
indicate whether to skip such promotion.
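For reference, this is what the C promotion rules do to a unary operand in
ordinary source code, and hence what the GIMPLE FE must avoid. The sketch
below only demonstrates the language rule; the function names are
hypothetical and nothing here is taken from gimple-parser.cc.

```cpp
#include <type_traits>

// In C and C++ source, unary operators apply the integer promotions first,
// so complementing or negating a narrow operand yields an int.
static_assert(std::is_same<decltype(~static_cast<unsigned char>(0)), int>::value,
              "~ on unsigned char yields int after promotion");
static_assert(std::is_same<decltype(-static_cast<short>(1)), int>::value,
              "unary minus on short yields int after promotion");

// Promotion changes the value, not just the type: the promoted complement
// of 0 is the int -1 ...
int complement_promoted(unsigned char x) { return ~x; }

// ... whereas an operation carried out in the operand's own type gives 0xff.
unsigned char complement_narrow(unsigned char x) {
  return static_cast<unsigned char>(~x);
}
```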
PR c/118742
gcc/c/
* gimple-parser.cc (gimple_parser_build_unary_op): New
wrapper around build_unary_op.
(c_parser_gimple_unary_expression): Use it.
gcc/testsuite/
* gcc.dg/gimplefe-56.c: New testcase.
Ilya Leoshkevich [Thu, 13 Oct 2022 00:54:52 +0000 (02:54 +0200)]
IBM zSystems: Do not use @PLT with larl
Commit 0990d93dd8a4 ("IBM Z: Use @PLT symbols for local functions in
64-bit mode") made GCC call both static and non-static functions and
load both static and non-static function addresses with the @PLT
suffix. This made it difficult for linkers to distinguish calling and
address taking instructions [1]. It is currently assumed that the
R_390_PLT32DBL relocation, corresponding to the @PLT suffix, is used
only for calling, and the R_390_PC32DBL relocation, corresponding to
the empty suffix, is used only for address taking.
Linkers need to make this distinction in order to decide whether to
ask ld.so to use canonical PLT entries. Normally GOT entries in shared
objects contain addresses of the respective functions, with one notable
exception: when a no-pie executable calls the respective function and
also takes its address. Such executables assume that all addresses are
known in advance, so they use addresses of the respective PLT entries.
For consistency reasons, all respective GOT entries in the process must
also use them.
When a linker sees that a no-pie executable both calls a function and
also takes its address, it creates a PLT entry and asks ld.so to
consider it canonical by setting the respective undefined symbol's
address, which is normally 0, to the address of this PLT entry.
Improve the situation by not using @PLT with larl.
Now that @PLT is not used with larl, also drop the 31-bit handling,
which was required because 31-bit PLT entries require %r12 to point to
the respective object's GOT, and this requirement is not satisfied when
calling them by pointer from another object.
Also drop the weak symbol handling, which was required because it is
not possible to load an undefined weak symbol address (0) using larl.
* config/s390/s390.cc (print_operand): Remove the no longer
necessary 31-bit and weak symbol handling.
* config/s390/s390.md (*movdi_64): Do not use @PLT with larl.
(*movsi_larl): Likewise.
(main_base_64): Likewise.
(reload_base_64): Likewise.
Simon Martin [Tue, 4 Feb 2025 09:58:17 +0000 (10:58 +0100)]
c++: Fix overeager Woverloaded-virtual with conversion operators [PR109918]
We currently emit an incorrect -Woverloaded-virtual warning upon the
following test case
=== cut here ===
struct A {
virtual operator int() { return 42; }
virtual operator char() = 0;
};
struct B : public A {
operator char() { return 'A'; }
};
=== cut here ===
The problem is that when iterating over ovl_range (fns), warn_hidden
gets confused by the conversion operator marker, concludes that
seen_non_override is true and therefore emits a warning for all
conversion operators in A that do not convert to char, even if
-Woverloaded-virtual is 1 (e.g. with -Wall, the case reported).
A second set of problems is highlighted when -Woverloaded-virtual is 2.
First, with the same test case, since base_fndecls contains all
conversion operators in A (except the one to char, that's been removed
when iterating over ovl_range (fns)), we emit a spurious warning for
the conversion operator to int, even though it's unrelated.
Second, in case there are several conversion operators with different
cv-qualifiers to the same type in A, we rightfully emit a warning,
however the note uses the location of the conversion operator marker
instead of the right one; location_of should go over conv_op_marker.
This patch fixes all these by explicitly keeping track of (1) base
methods that are overridden, as well as (2) base methods that are hidden
but not overridden (and by what), and warning about methods that are in
(2) but not (1). It also ignores non virtual base methods, per
"definition" of -Woverloaded-virtual.
Co-authored-by: Jason Merrill <jason@redhat.com>
PR c++/117114
PR c++/109918
gcc/cp/ChangeLog:
* class.cc (warn_hidden): Keep track of overloaded and of hidden
base methods.
* error.cc (location_of): Skip over conv_op_marker.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Woverloaded-virt1.C: Check that no warning is
emitted for non virtual base methods.
* g++.dg/warn/Woverloaded-virt10.C: New test.
* g++.dg/warn/Woverloaded-virt11.C: New test.
* g++.dg/warn/Woverloaded-virt12.C: New test.
* g++.dg/warn/Woverloaded-virt13.C: New test.
* g++.dg/warn/Woverloaded-virt5.C: New test.
* g++.dg/warn/Woverloaded-virt6.C: New test.
* g++.dg/warn/Woverloaded-virt7.C: New test.
* g++.dg/warn/Woverloaded-virt8.C: New test.
* g++.dg/warn/Woverloaded-virt9.C: New test.
Richard Biener [Mon, 3 Feb 2025 14:12:52 +0000 (15:12 +0100)]
tree-optimization/117113 - ICE with unroll-and-jam
When there's an inner loop without a virtual header PHI but the outer
loop has one the fusion process cannot handle the need to create
an inner loop virtual header PHI. Punt in this case.
PR tree-optimization/117113
* gimple-loop-jam.cc (unroll_jam_possible_p): Detect when
we cannot handle virtual SSA update.
Simon Martin [Tue, 4 Feb 2025 09:44:10 +0000 (10:44 +0100)]
c++: Properly detect calls to digest_init in build_vec_init [PR114619]
We currently ICE in checking mode with cxx_dialect < 17 on the following
valid code
=== cut here ===
struct X {
X(const X&) {}
};
extern X x;
void foo () {
new X[1]{x};
}
=== cut here ===
We trip on a gcc_checking_assert in cp_gimplify_expr due to a
TARGET_EXPR that is not TARGET_EXPR_ELIDING_P. As pointed out by Jason, the
problem is that build_vec_init does not recognize that digest_init has
been called, and we end up calling the copy constructor twice.
This happens because the detection in build_vec_init assumes that BASE
is a reference to the array, while it's a pointer to its first element
here. This patch makes sure that the detection works in both cases.
PR c++/114619
gcc/cp/ChangeLog:
* init.cc (build_vec_init): Properly determine whether
digest_init has been called.
Jakub Jelinek [Tue, 4 Feb 2025 08:23:15 +0000 (09:23 +0100)]
c++: Fix up pedwarn for capturing structured bindings in lambdas [PR118719]
As mentioned in the PR, this pedwarn is desirable for the implicit or
explicit capturing of structured bindings in C++17, but in the case of
init-captures the initializer is just some expression and that can include
structured bindings.
So, the following patch limits the warning to non-explicit_init_p.
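A minimal sketch of the init-capture case, assuming this is the shape of
code the PR is about: the structured bindings only appear inside the
initializer expressions, so no binding is captured as such. The function
name is hypothetical.

```cpp
#include <utility>

int sum_via_init_capture() {
  std::pair<int, int> p{1, 2};
  auto [a, b] = p;
  // Init-captures: x and y are new variables initialized from ordinary
  // expressions that happen to name the structured bindings a and b.
  auto f = [x = a, y = b] { return x + y; };
  return f();
}
```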
2025-02-04 Jakub Jelinek <jakub@redhat.com>
PR c++/118719
* lambda.cc (add_capture): Only pedwarn about capturing structured
binding if !explicit_init_p.
Andrew Pinski [Tue, 4 Feb 2025 03:58:45 +0000 (19:58 -0800)]
optabs: Fix widening optabs for vec-mode -> scalar-mode [PR116926]
r15-4317-ga6f4404689f12 tried to add support for widening optabs
for vec-mode -> scalar-mode but it misunderstood how FOR_EACH_MODE worked:
the limit in this case is not inclusive, which means setting the limit to from
would cause the loop not to be executed at all. Fix this by setting the
limit to be the next mode after the from mode.
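The off-by-one can be reproduced with any half-open iteration. The sketch
below uses a hypothetical ToyMode enumeration and a hypothetical
modes_until helper; it is not the FOR_EACH_MODE macro, only a model of a
loop whose limit is excluded.

```cpp
#include <vector>

// Hypothetical stand-in for a chain of machine modes ordered by width.
enum ToyMode { QI, HI, SI, DI, NUM_TOY_MODES };

// Visit the modes in [from, limit): the limit itself is excluded, mirroring
// the non-inclusive behaviour described above.
std::vector<ToyMode> modes_until(ToyMode from, ToyMode limit) {
  std::vector<ToyMode> visited;
  for (int m = from; m < limit; ++m)
    visited.push_back(static_cast<ToyMode>(m));
  return visited;
}
```

Passing the same mode as both from and limit visits nothing, while passing
the next mode as the limit visits exactly the from mode.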
Note the original version that added the widening optabs for vec-mode -> scalar-mode
(https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665021.html) didn't have this
bug, only the second version with suggested change
(https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665068.html) dud. The suggested
change missed this issue with FOR_EACH_MODE.
Bootstrapped and tested on x86_64-linux-gnu.
PR middle-end/116926
gcc/ChangeLog:
* optabs-query.cc (find_widening_optab_handler_and_mode): Fix
limit for `vec-mode -> scalar-mode` case.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Thomas Koenig [Mon, 27 Jan 2025 17:43:44 +0000 (18:43 +0100)]
Add modular exponentiation for UNSIGNED.
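As a rough illustration of what modular exponentiation means for a
fixed-width unsigned type, here is exponentiation by squaring on a 32-bit
value, where every multiplication wraps modulo 2**32. The function name is
hypothetical and this is not the generated libgfortran code; the 32-bit
width is chosen only for the example.

```cpp
#include <cstdint>

// Exponentiation by squaring on a fixed-width unsigned type.  Unsigned
// arithmetic wraps, so the result is x**n modulo 2**32.
std::uint32_t pow_mod_2_32(std::uint32_t x, std::uint32_t n) {
  std::uint32_t result = 1;
  while (n != 0) {
    if (n & 1u)
      result *= x;  // fold in the current power of x for this exponent bit
    x *= x;         // square for the next bit; wraps on overflow
    n >>= 1;
  }
  return result;
}
```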
gcc/fortran/ChangeLog:
* arith.cc (arith_power): Handle modular arithmetic for
BT_UNSIGNED.
(eval_intrinsic): Error for unsigned exponentiation with
-pedantic.
* expr.cc (gfc_type_convert_binary): Use type of first
argument for unsigned exponentiation.
* gfortran.texi: Mention arithmetic exponentiation.
* resolve.cc (resolve_operator): Allow unsigned exponentiation.
* trans-decl.cc (gfc_build_intrinsic_function_decls): Build
declarations for unsigned exponentiation.
* trans-expr.cc (gfc_conv_cst_uint_power): New function.
(gfc_conv_power_op): Call it. Handle unsigned exponentiation.
* trans.h (gfor_fndecl_unsigned_pow_list): Add declaration.
libgfortran/ChangeLog:
* Makefile.am: Add files for unsigned exponentiation.
* Makefile.in: Regenerate.
* gfortran.map: Add functions for unsigned exponentiation.
* generated/pow_m16_m1.c: New file.
* generated/pow_m16_m16.c: New file.
* generated/pow_m16_m2.c: New file.
* generated/pow_m16_m4.c: New file.
* generated/pow_m16_m8.c: New file.
* generated/pow_m1_m1.c: New file.
* generated/pow_m1_m16.c: New file.
* generated/pow_m1_m2.c: New file.
* generated/pow_m1_m4.c: New file.
* generated/pow_m1_m8.c: New file.
* generated/pow_m2_m1.c: New file.
* generated/pow_m2_m16.c: New file.
* generated/pow_m2_m2.c: New file.
* generated/pow_m2_m4.c: New file.
* generated/pow_m2_m8.c: New file.
* generated/pow_m4_m1.c: New file.
* generated/pow_m4_m16.c: New file.
* generated/pow_m4_m2.c: New file.
* generated/pow_m4_m4.c: New file.
* generated/pow_m4_m8.c: New file.
* generated/pow_m8_m1.c: New file.
* generated/pow_m8_m16.c: New file.
* generated/pow_m8_m2.c: New file.
* generated/pow_m8_m4.c: New file.
* generated/pow_m8_m8.c: New file.
* m4/powu.m4: New file.
gcc/testsuite/ChangeLog:
* gfortran.dg/unsigned_15.f90: Adjust error messages.
* gfortran.dg/unsigned_43.f90: New test.
* gfortran.dg/unsigned_44.f90: New test.
Richard Biener [Mon, 3 Feb 2025 13:27:01 +0000 (14:27 +0100)]
lto/113207 - fix free_lang_data_in_type
When we process function types we strip volatile and const qualifiers
after building a simplified type variant (which preserves those).
The qualified type handling of both isn't really compatible, so avoid
bad interaction by swapping this, first dropping const/volatile
qualifiers and then building the simplified type thereof.
PR lto/113207
* ipa-free-lang-data.cc (free_lang_data_in_type): First drop
const/volatile qualifiers from function argument types,
then build a simplified type.
Nathaniel Shead [Sat, 1 Feb 2025 11:55:22 +0000 (22:55 +1100)]
c++: Improve contracts support in modules [PR108205]
Modules makes some assumptions about types that currently aren't
fulfilled by the types created in contracts logic. This patch ensures
that exporting inline functions using contracts works again with
modules.
PR c++/108205
gcc/cp/ChangeLog:
* contracts.cc (get_pseudo_contract_violation_type): Give names
to generated FIELD_DECLs.
(declare_handle_contract_violation): Mark contract_violation
type as external linkage.
(build_contract_handler_call): Ensure any builtin declarations
created here aren't treated as attached to the current module.
gcc/testsuite/ChangeLog:
* g++.dg/modules/contracts-5_a.C: New test.
* g++.dg/modules/contracts-5_b.C: New test.
Nathaniel Shead [Sat, 1 Feb 2025 10:21:37 +0000 (21:21 +1100)]
c++: Modularise start_cleanup_fn [PR98893]
'start_cleanup_fn' is not currently viable in modules, due to generating
functions relying on the 'start_cleanup_cnt' counter which is reset to 0
with each new TU. This means that cleanup functions declared in a TU
will conflict with any imported cleanup functions.
This patch mitigates the problem by using the mangled name of the decl
we're destroying as part of the name of the function. This should avoid
clashes unless the decls would have clashed anyway.
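The difference between the two naming schemes can be sketched as follows.
The "__tcf" prefix, the helper names and the sample mangled names are
assumptions made for illustration; only the idea (per-TU counter versus
mangled name of the decl) comes from the description above.

```cpp
#include <string>

// Counter-based scheme: the name depends only on a per-TU counter.
std::string counter_based_name(int& counter) {
  return "__tcf_" + std::to_string(counter++);
}

// Simulates the first cleanup function emitted by a fresh TU, whose counter
// starts again at 0.
std::string first_counter_name() {
  int counter = 0;
  return counter_based_name(counter);
}

// Mangled-name scheme: the name is derived from the decl being destroyed.
std::string mangled_based_name(const std::string& mangled_decl) {
  return "__tcf" + mangled_decl;
}
```

Two TUs each produce the same first counter-based name, whereas names built
from distinct mangled decl names stay distinct.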
PR c++/98893
gcc/cp/ChangeLog:
* decl.cc (start_cleanup_fn): Make name from the mangled name of
the passed-in decl.
(register_dtor_fn): Pass decl to start_cleanup_fn.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr98893_a.H: New test.
* g++.dg/modules/pr98893_b.C: New test.
c++: find A pack from B in <typename...A,Class<A>...B> [PR118265]
When unifying the arguments for non-type parameter packs,
unify_pack_expansion iterates over the associated packs of a param so
that, when it recursively unifies the param with the arguments, it knows
which targs have been populated with parameter pack arguments that it can
then collect up. This change adds a tree walk so that in the example above
it reaches ...A and adds it to the associated packs for ...B; it therefore
knows ...A will have been set in targs in unify_pack_expansion and processes
it as per other pack arguments.
PR c++/118265
gcc/cp/ChangeLog:
* pt.cc (find_parameter_packs_r) <case TEMPLATE_PARM_INDEX>:
Walk into the type of a parameter pack.
Signed-off-by: Adam J Ryan <gcc.gnu.org@ajryansolutions.co.uk>
Iain Sandoe [Thu, 31 Oct 2024 08:40:08 +0000 (08:40 +0000)]
c++/coroutines: Fix awaiter var creation [PR116506]
Awaiters always need to have a coroutine state frame copy since
they persist across potential supensions. It simplifies the later
analysis considerably to assign these early which we do when
building co_await expressions.
The cleanups in r15-3146-g47dbd69b1 unfortunately elided some of the
processing used to cater for cases where the var is created from an
xvalue, or is a pointer/reference type.
Corrected thus.
PR c++/116506
PR c++/116880
gcc/cp/ChangeLog:
* coroutines.cc (build_co_await): Ensure that xvalues are
materialised. Handle references/pointer values in awaiter
access expressions.
(is_stable_lvalue): New.
* decl.cc (cxx_maybe_build_cleanup): Handle null arg.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/pr116506.C: New test.
* g++.dg/coroutines/pr116880.C: New test.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Co-authored-by: Jason Merrill <jason@redhat.com>
Jason Merrill [Fri, 31 Jan 2025 17:31:43 +0000 (12:31 -0500)]
c++: coroutines and range for [PR118491]
The implementation of extended range-for temporaries in r15-3840 confused
coroutines, because await_statement_walker and the like get confused by the
EXPR_STMT into thinking that the whole for-loop is a single expression
statement and try to process it accordingly. Fixing this seems to be a
simple matter of dropping the EXPR_STMT.
Harald Anlauf [Sat, 1 Feb 2025 18:14:21 +0000 (19:14 +0100)]
Fortran: different character lengths in array constructor [PR93289]
PR fortran/93289
gcc/fortran/ChangeLog:
* decl.cc (gfc_set_constant_character_len): Downgrade different
string lengths in character array constructor to legacy extension.
gcc/testsuite/ChangeLog:
* gfortran.dg/unlimited_polymorphic_1.f03: Pad element in character
array constructor to correct length.
* gfortran.dg/char_array_constructor_5.f90: New test.
Uros Bizjak [Mon, 3 Feb 2025 20:01:51 +0000 (21:01 +0100)]
i386: Fix and improve TARGET_INDIRECT_BRANCH_REGISTER handling some more
gcc/ChangeLog:
* config/i386/i386.md (*sibcall_pop_memory):
Disable for TARGET_INDIRECT_BRANCH_REGISTER.
* config/i386/predicates.md (call_insn_operand): Enable when
"satisfies_constraint_Bw (op)" is true, instead of open-coding
constraint here.
(sibcall_insn_operand): Ditto with "satisfies_constraint_Bs (op)".
This patch fixes the dupq_* testsuite failures. The tests were
introduced with r15-3669-ga92f54f580c3 (which was a nice improvement)
and Pengxuan originally had a follow-on patch to recognise INDEX
constants during vec_init.
I'd originally wanted to solve this a different way, using wildcards
when building a vector and letting vector_builder::finalize find the
best way of filling them in. I no longer think that's the best
approach though. Stepped constants are likely to be more expensive
than unstepped constants, so we should first try finding an unstepped
constant that is valid, even if it has a longer representation than
the stepped version.
This patch therefore uses a variant of Pengxuan's idea.
While there, I noticed that the (old) code for finding an unstepped
constant only tried varying one bit at a time. So for index 0 in a
16-element constant, the code would try picking a constant from index 8,
4, 2, and then 1. But since the goal is to create "fewer, larger,
repeating parts", it would be better to iterate over a bit-reversed
increment, so that after trying an XOR with 0 and 8, we try adding 4
to each previous attempt, then 2 to each previous attempt, and so on.
In the previous example this would give 8, 4, 12, 2, 10, 6, 14, ...
The test shows an example of this for 8 shorts.
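The bit-reversed ordering described above can be sketched as follows. This is only an illustration of the ordering, not the code in aarch64.cc; the names reverse_bits and bit_reversed_offsets are hypothetical, and the element count is assumed to be a power of two.

```cpp
#include <cassert>
#include <vector>

// Reverse the low `bits` bits of `value`.
unsigned reverse_bits(unsigned value, unsigned bits) {
  unsigned result = 0;
  for (unsigned i = 0; i < bits; ++i) {
    result = (result << 1) | (value & 1u);
    value >>= 1;
  }
  return result;
}

// Offsets to XOR with an index, in bit-reversed counting order.
// The trivial offset 0 is skipped.  `nelts` must be a power of two
// (an assumption of this sketch).
std::vector<unsigned> bit_reversed_offsets(unsigned nelts) {
  assert(nelts != 0 && (nelts & (nelts - 1)) == 0);
  unsigned bits = 0;
  while ((1u << bits) < nelts)
    ++bits;
  std::vector<unsigned> order;
  for (unsigned i = 1; i < nelts; ++i)
    order.push_back(reverse_bits(i, bits));
  return order;
}
```

For 16 elements this yields 8, 4, 12, 2, 10, 6, 14, 1, ..., so the candidates that give the largest repeating parts are tried first.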
gcc/
* config/aarch64/aarch64.cc (aarch64_choose_vector_init_constant):
New function, split out from...
(aarch64_expand_vector_init_fallback): ...here. Use a bit-
reversed increment to find a constant index. Add support for
stepped constants.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/dupq_12.c: New test.
hppa: Revise various millicode insn patterns to use match_operand
LRA does not correctly support hard-register input operands that
are clobbered. This is needed to support millicode calls on hppa.
The operand setup is sometimes deleted.
This problem can be avoided by hiding hard-register input operands
using match_operand. This also potentially allows for constraints
that specify the operand is both read and written.
2025-02-03 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
PR rtl-optimization/117248
* config/pa/predicates.md (r25_operand): New predicate.
(r26_operand): Likewise.
* config/pa/pa.md: Use match_operand for r25 and r26 hard
register operands in mult, div, udiv, mod and umod millicode
patterns.
Richard Biener [Mon, 3 Feb 2025 08:55:50 +0000 (09:55 +0100)]
tree-optimization/118717 - store commoning vs. abnormals
When we sink common stores in cselim or the sink pass we have to
make sure to not introduce overlapping lifetimes for abnormals
used in the ref. The easiest is to avoid sinking stmts which
reference abnormals at all which is what the following does.
PR tree-optimization/118717
* tree-ssa-phiopt.cc (cond_if_else_store_replacement_1):
Do not common stores referencing abnormal SSA names.
* tree-ssa-sink.cc (sink_common_stores_to_bb): Likewise.
Andi Kleen [Thu, 26 Dec 2024 21:05:57 +0000 (13:05 -0800)]
Size input line cache based on file size
While the input line cache size is now tunable, it's better if the compiler
auto tunes it. Otherwise large files needing random file access will
still have to search many lines to find the right lines.
Add support for allocating one line anchor per hundred input lines.
This means an overhead of ~235k per 1M input lines on 64bit, which
seems reasonable.
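As a rough check of that figure, here is a small sketch of the arithmetic. The LineAnchor layout (three 64-bit fields) is an assumption made for illustration, not the actual record in input.cc.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical anchor record: three 64-bit fields, 24 bytes in total.
struct LineAnchor {
  std::uint64_t line_number;
  std::uint64_t start_offset;
  std::uint64_t end_offset;
};
static_assert(sizeof(LineAnchor) == 24, "sketch assumes a 24-byte anchor");

// One anchor per hundred input lines, as described above.
constexpr std::size_t lines_per_anchor = 100;

constexpr std::size_t anchors_needed(std::size_t lines) {
  return (lines + lines_per_anchor - 1) / lines_per_anchor;
}

constexpr std::size_t anchor_bytes(std::size_t lines) {
  return anchors_needed(lines) * sizeof(LineAnchor);
}
```

Under that assumption 1M lines need 10000 anchors, i.e. 240000 bytes or about 234 KiB, which is in line with the ~235k estimate.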
gcc/ChangeLog:
PR preprocessor/118168
* input.cc (file_cache_slot::get_next_line): Implement
dynamic sizing of m_line_record based on input length.
* params.opt: (param_file_cache_lines): Set to 0 to size
dynamically.
Andi Kleen [Wed, 25 Dec 2024 22:41:49 +0000 (14:41 -0800)]
Rebalance file_cache input line cache dynamically
The input context file_cache maintains an array of anchors
to speed up accessing lines before the previous line.
The array has a fixed upper size and the algorithm relies
on the linemap reporting the maximum number of lines in the file
in advance to compute the position of each anchor in the cache.
This doesn't work for C which doesn't know the maximum number
of lines before the file has finished parsing. The code
has a fallback for this, but it is quite inefficient and
effectively defeats the cache, so many accesses have to
go through most of the input buffer to compute line
boundaries. For large files this can be very costly
as demonstrated in PR118168.
Use a different algorithm to maintain the cache without
needing the maximum number of lines in advance. When the cache
runs out of entries and the gap to the last line anchor gets
too large, prune every second entry in the cache. This maintains
even spacing of the line anchors without requiring the maximum
index.
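A minimal sketch of the pruning idea, assuming a simplified cache that only stores the line numbers that have an anchor. AnchorCache, note_line and nearest_anchor are hypothetical names, and the exact pruning condition in input.cc may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for the anchor table (not the GCC data structure).
struct AnchorCache {
  std::size_t max_entries;           // fixed upper bound on anchors
  std::size_t spacing = 1;           // current gap, in lines, between anchors
  std::vector<std::size_t> anchors;  // anchored line numbers, ascending

  explicit AnchorCache(std::size_t max) : max_entries(max) {
    assert(max >= 1);
  }

  // Called for each line as the file is read, in increasing order.
  void note_line(std::size_t line) {
    if (!anchors.empty() && line < anchors.back() + spacing)
      return;  // still close enough to the last anchor
    if (anchors.size() >= max_entries) {
      // Out of entries: keep every second anchor and double the spacing.
      std::size_t kept = 0;
      for (std::size_t i = 0; i < anchors.size(); i += 2)
        anchors[kept++] = anchors[i];
      anchors.resize(kept);
      spacing *= 2;
      if (line < anchors.back() + spacing)
        return;
    }
    anchors.push_back(line);
  }

  // Binary search for the closest anchor at or before `line`;
  // returns 0 when there is none.
  std::size_t nearest_anchor(std::size_t line) const {
    auto it = std::upper_bound(anchors.begin(), anchors.end(), line);
    return it == anchors.begin() ? 0 : *(it - 1);
  }
};
```

With four entries and lines 1 to 20 fed in order, the anchors end up at lines 1, 9 and 17, evenly spaced, without the total line count ever being known in advance.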
For the original PR this moves the overhead of enabling
-Wmisleading-indentation to 32% with the default cache size.
With a 10k entry cache it becomes noise.
cc1 -O0 -fsyntax-only mypy.c -quiet ran
1.03 ± 0.05 times faster than cc1 -O0 -fsyntax-only mypy.c -quiet -Wmisleading-indentation --param=file-cache-lines=10000
1.09 ± 0.08 times faster than cc1 -O0 -fsyntax-only mypy.c -quiet -Wmisleading-indentation --param=file-cache-lines=1000
1.32 ± 0.07 times faster than cc1 -O0 -fsyntax-only mypy.c -quiet -Wmisleading-indentation
The code could be further optimized, e.g. use the vectorized
line search functions the preprocessor uses.
Also it seems the input cache always reads the whole file into
memory, so perhaps it should just be using file mmap if possible.
gcc/ChangeLog:
PR preprocessor/118168
* input.cc (file_cache_slot::get_next_line): Use new algorithm
to maintain
(file_cache_slot::read_line_num): Use binary search for lookup.
Andi Kleen [Wed, 25 Dec 2024 19:54:13 +0000 (11:54 -0800)]
Add tunables for input buffer
The input machinery to read the source code independent of the lexer
has a range of hard coded maximum array sizes that can impact performance.
Make them tunable.
input.cc is part of libcommon so it cannot direct access params
without a level of indirection.
gcc/ChangeLog:
PR preprocessor/118168
* input.cc (file_cache::tune): New function.
* input.h (class file_cache): Make tunables non const.
* params.opt: Add new tunables.
* toplev.cc (toplev::main): Initialize input buffer context
tunables.
Gaius Mulley [Sun, 2 Feb 2025 16:02:27 +0000 (16:02 +0000)]
PR modula2/117411 Request for documentation to include exception example
This patch adds a new section to the gm2 documentation and new
corresponding testcode to the regression testsuite.
gcc/ChangeLog:
PR modula2/117411
* doc/gm2.texi (Exception handling): New section.
(The ISO system module): Add description of COFF_T.
(Assembler language): Tidy up last sentence.
gcc/testsuite/ChangeLog:
PR modula2/117411
* gm2/iso/run/pass/except9.mod: New test.
* gm2/iso/run/pass/lazyunique.mod: New test.
Lewis Hyatt [Sun, 26 Jan 2025 23:57:00 +0000 (18:57 -0500)]
options: Adjust cl_optimization_compare to avoid checking ICE [PR115913]
At the end of a sequence like:
#pragma GCC push_options
...
#pragma GCC pop_options
the handler for pop_options calls cl_optimization_compare() (as generated by
optc-save-gen.awk) to make sure that all global state has been restored to
the value it had prior to the push_options call. The verification is
performed for almost all entries in the global_options struct. This leads to
unexpected checking asserts, as discussed in the PR, in case the state of
warnings-related options has been intentionally modified in between
push_options and pop_options via a call to #pragma GCC diagnostic. Address
that by skipping the verification for CL_WARNING-flagged options.
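A sequence of the kind described might look like the sketch below. The function and the chosen warning are illustrative only and are not taken from the PR's testcase.

```cpp
#pragma GCC push_options
// Warning state is changed on purpose between push_options and pop_options.
#pragma GCC diagnostic ignored "-Wunused-variable"

int answer() {
  int unused = 0;  // would be diagnosed by -Wunused-variable otherwise
  return 42;
}

// The pop_options handler is where cl_optimization_compare() runs.
#pragma GCC pop_options
```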
gcc/ChangeLog:
PR middle-end/115913
* optc-save-gen.awk (cl_optimization_compare): Skip options with
CL_WARNING flag.
gcc/testsuite/ChangeLog:
PR middle-end/115913
* c-c++-common/cpp/pr115913.c: New test.
This patch builds access to the gcc builtins clz, clzl, clzll,
ctz, ctzl and ctzll within m2builtins.cc. The patch provides
modula2 api access to clz, clzll, ctz and ctzll through the
Builtins definition module. This PR was raised because of
PR118689.
H.J. Lu [Fri, 31 Jan 2025 04:29:04 +0000 (12:29 +0800)]
x86: Handle TARGET_INDIRECT_BRANCH_REGISTER for -fno-plt
If TARGET_INDIRECT_BRANCH_REGISTER is true, indirect call and jump should
use register, not memory. Update Bs, Bw and Bz constraints to disable
indirect call over memory if TARGET_INDIRECT_BRANCH_REGISTER is true, change
x32 call over GOT slot to call over register and also disable sibcall
over memory.
gcc/
PR target/118713
* config/i386/constraints.md (Bs): Always disable if
TARGET_INDIRECT_BRANCH_REGISTER is true.
(Bw): Likewise.
* config/i386/i386-expand.cc (ix86_expand_call): Force indirect
call via register for x32 GOT slot call if
TARGET_INDIRECT_BRANCH_REGISTER is true.
* config/i386/i386-protos.h (ix86_nopic_noplt_attribute_p): New.
* config/i386/i386.cc (ix86_nopic_noplt_attribute_p): Make it
global.
* config/i386/i386.md (*call_got_x32): Disable indirect call via
memory for TARGET_INDIRECT_BRANCH_REGISTER.
(*call_value_got_x32): Likewise.
(*sibcall_value_pop_memory): Likewise.
* config/i386/predicates.md (constant_call_address_operand):
Return false if both TARGET_INDIRECT_BRANCH_REGISTER and
ix86_nopic_noplt_attribute_p are true.
David Malcolm [Sat, 1 Feb 2025 13:38:13 +0000 (08:38 -0500)]
sarif-replay: support "cached" logical locations [§3.33.3]
Some SARIF files offload most of the properties within logical locations
in the results to an array of "cached" instances in
theRun.logicalLocations, so the information can be consolidated (and to
support the "parentIndex" property, which is PR 116176).
Support such files in sarif-replay.
gcc/ChangeLog:
* libsarifreplay.cc (sarif_replayer::handle_run_obj): Pass run to
handle_result_obj.
(sarif_replayer::handle_result_obj): Add run_obj param and pass it
to handle_location_object and handle_thread_flow_object.
(sarif_replayer::handle_thread_flow_object): Add run_obj param and
pass it to handle_thread_flow_location_object.
(sarif_replayer::handle_thread_flow_location_object): Add run_obj
param and pass it to handle_location_object.
(sarif_replayer::handle_location_object): Add run_obj param and
pass it to handle_logical_location_object.
(sarif_replayer::handle_logical_location_object): Add run_obj
param. If the run_obj is non-null and has "logicalLocations",
then use these "cached" logical locations if we see an "index"
property, as per §3.33.3
gcc/testsuite/ChangeLog:
* sarif-replay.dg/2.1.0-invalid/3.33.3-index-out-of-range.sarif:
New test.
* sarif-replay.dg/2.1.0-valid/spec-example-4.sarif: Update expected
output to reflect that we now find the function name for the
events in the path.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
H.J. Lu [Sat, 1 Feb 2025 10:06:33 +0000 (18:06 +0800)]
x86: Add a -mstack-protector-guard=global test
Verify that -mstack-protector-guard=global works on x86. Default stack
protector uses TLS. -mstack-protector-guard=global uses a global variable,
__stack_chk_guard, instead of TLS.
Jeff Law [Fri, 31 Jan 2025 23:59:35 +0000 (16:59 -0700)]
[committed][PR tree-optimization/114277] Fix missed optimization for multiplication against boolean value
Andrew, Raphael and I have all poked at it in various ways over the last year
or so. I think when Raphael and I first looked at it I sent us down a bit of
a rathole.
In particular it's odd that we're using a multiply to implement a select and it
seemed like recognizing the idiom and rewriting into a conditional move was the
right path. That looked reasonably good for the test, but runs into problems
with min/max detection elsewhere.
I think that initial investigation somewhat polluted our thinking. The
regression can be fixed with a fairly simple match.pd pattern.
Essentially we want to handle
x * (x || b) -> x
x * !(x || b) -> 0
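Both identities can be checked at the source level with a small sketch (the helper names are hypothetical). If x is nonzero, (x || b) is 1 and the product is x. If x is zero, the product is 0, which is again x. The negated form is 0 in both cases.

```cpp
// Multiply-by-boolean forms from the description above.
int mul_by_or(int x, int b) { return x * (x || b); }
int mul_by_not_or(int x, int b) { return x * !(x || b); }

// Check both identities for every pair of values in [lo, hi].
bool identities_hold(int lo, int hi) {
  for (int x = lo; x <= hi; ++x)
    for (int b = lo; b <= hi; ++b)
      if (mul_by_or(x, b) != x || mul_by_not_or(x, b) != 0)
        return false;
  return true;
}
```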
There are simplifications that can be made for "&&" cases, but I haven't seen
them in practice. Rather than drop in untested patterns, I'm leaving that as a
future todo.
My original was two match.pd patterns. Andrew combined them into a single
pattern. I've made this conditional on GIMPLE as an earlier version that
simplified to a conditional move showed that when applied on GENERIC we could
drop an operand with a side effect which is clearly not good.
I've bootstrapped and regression tested this on x86. I've also tested on the
various embedded targets in my tester.
PR tree-optimization/114277
gcc/
* match.pd (a * (a || b) -> a): New pattern.
(a * !(a || b) -> 0): Likewise.
gcc/testsuite
* gcc.target/i386/pr114277.c: New test.
* gcc.target/riscv/pr114277.c: Likewise.
Co-author: Andrew Pinski <quic_apinski@quicinc.com>
Jakub Jelinek [Fri, 31 Jan 2025 23:50:24 +0000 (00:50 +0100)]
icf: Compare call argument types in certain cases and asm operands [PR117432]
compare_operand uses operand_equal_p under the hood, which e.g. for
INTEGER_CSTs will just match the values regardless of their types.
Now, in many cases comparing the type is redundant, if we have
x_2 = y_3 + 1;
we've already compared the type for the lhs and also for rhs1, there won't
be any surprises on rhs2.
As noted in the PR, there are cases where the type of the operand is the
sole place of information and we don't want to ICF merge functions if the
types differ.
One case is stdarg functions, arguments passed to ..., it is different
if we pass 1, 1L, 1LL.
Another case are the K&R unprototyped functions (sure, gone in C23).
And yet another case are inline asm operands, "r" (1) is different from "r"
(1L) from "r" (1LL).
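To illustrate why the type of a variadic argument matters, here is a small sketch with hypothetical names. The callee has to read each argument with the type it was passed with, so callers that pass 1, 1L and 1LL are not interchangeable even though the values are equal. In this sketch the callers also differ in the tag, purely to keep the example well defined; the ICF hazard is the case where the constant's type is the only difference.

```cpp
#include <cstdarg>

// `width` says which type the caller passed: 0 = int, 1 = long,
// anything else = long long.  Reading with a mismatched type is undefined.
long long read_vararg(int width, ...) {
  va_list ap;
  va_start(ap, width);
  long long value = 0;
  if (width == 0)
    value = va_arg(ap, int);
  else if (width == 1)
    value = va_arg(ap, long);
  else
    value = va_arg(ap, long long);
  va_end(ap);
  return value;
}

long long pass_int() { return read_vararg(0, 1); }
long long pass_long() { return read_vararg(1, 1L); }
long long pass_long_long() { return read_vararg(2, 1LL); }
```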
So, the following patch determines based on lack of fntype (e.g. for
internal functions), or on !prototype_p, or on stdarg_p (in that case
using number of named arguments) which arguments need to have type checked
and does that, plus compares types on inline asm operands (maybe it would be
enough to do that just for input operands but we have just a routine to
handle both and I didn't feel we need to differentiate).
Furthermore, I've noticed fntype{1,2} isn't actually compared if it is a
direct call (gimple_call_fndecl is non-NULL). That is wrong too, we could
have
void (*fn) (int, long long) = (void (*) (int, long long)) foo;
fn (1, 1LL);
in one case and
void (*fn) (long long, int) = (void (*) (long long, int)) foo;
fn (1LL, 1);
in another, both folded into a direct call of foo with different
gimple_call_fntype. Sure, one of them would be UB at runtime (or both), but
what if we ICF merge it into the one that is UB at runtime
and the program actually calls the correct one only?
2025-02-01 Jakub Jelinek <jakub@redhat.com>
PR ipa/117432
* ipa-icf-gimple.cc (func_checker::compare_asm_inputs_outputs):
Also return_false if operands have incompatible types.
(func_checker::compare_gimple_call): Check fntype1 vs. fntype2
compatibility for all non-internal calls and assume fntype1 and
fntype2 are non-NULL for those. For calls to non-prototyped
calls or for stdarg_p functions after the last named argument (if any)
check type compatibility of call arguments.
* gcc.c-torture/execute/pr117432.c: New test.
* gcc.target/i386/pr117432.c: New test.
Jakub Jelinek [Fri, 31 Jan 2025 23:48:21 +0000 (00:48 +0100)]
c++: check_flexarray fixes [PR117516]
On the pr117516.C testcase check_flexarrays and its helper functions
have exponential complexity, plus it reports the same bug over and over
again in some cases instead of reporting perhaps other bugs.
The functions want to diagnose flexible array member (and strangely [0]
arrays too) followed by some other non-empty or array members in the same
structure, or followed by other non-empty or array members in a containing
structure (any of them), or flexible array members/[0] arrays in structures
with no other non-empty members, or those nested in other structures.
Strangely it doesn't complain if flexible array member is in a structure
used in an array.
As can be seen on e.g. the flexary41.C test, it keeps reporting the
same bug over and over:
flexary41.C:5:24: error: flexible array member ‘A::b’ not at end of ‘struct A’
flexary41.C:5:24: error: flexible array member ‘A::b’ not at end of ‘struct B’
flexary41.C:5:24: error: flexible array member ‘A::b’ not at end of ‘struct C’
flexary41.C:5:24: error: flexible array member ‘A::b’ not at end of ‘struct D’
flexary41.C:13:39: error: flexible array member ‘E::<unnamed struct>::n’ not at end of ‘struct E’
flexary41.C:18:23: error: flexible array member ‘H::t’ not at end of ‘struct K’
flexary41.C:25:36: note: next member ‘int K::ab’ declared here
flexary41.C:25:8: note: in the definition of ‘struct K’
The bug that A::b is followed by A::c is one bug reported 4 times, while it
doesn't report the other bugs, that B::e flexarray is followed by B::f
and that C::h flexarray is followed by C::i.
That is because it always walks all the structures/unions of all the members
and just finds the first flexarray in there.
Now, this has horrible complexity plus it doesn't seem really useful to
users. So, for cases where a flexible array member is followed by a
non-empty other member in the same structure, the following patch just
reports it once when finalizing that structure, and otherwise just recurses
in structures solely into the last member, so that it can report cases like
struct X { int a; int b[]; };
struct Y { X c; int d; };
or
struct Z { X c; };
i.e. correct use of flexarray in X but following it by another member in Y
or just nesting it (the former is error, the latter pedwarn as before).
By only looking at the last member for structures we get rid of the complexity.
Note, the patch doesn't do anything about unions, I think we still could
spend a lot of time compiling.
struct S { char s; };
union U0 { S a, b; };
union U1 { union U0 a, b; };
union U2 { union U1 a, b; };
...
union U32 { union U31 a, b; };
struct T { union U32 a; int b; };
Not really sure what we could do about that, all the elements are "last"
(but admittedly I haven't studied in detail how the original code worked
in union, there is fmem->after[pun] where pun is whether it is somewhere
inside of a union). Perhaps in a hash table marking unions which don't have
any flexarrays at the end, nested or not, so that we don't walk them again?
Plus if we find some with flexarray at the end, maybe there is no point
to look other union members? In any case, I think that is less severe,
because people usually don't nest unions deeply.
2025-02-01 Jakub Jelinek <jakub@redhat.com>
PR c++/117516
* class.cc (field_nonempty_p): Formatting fixes. Use
integer_zerop instead of tree_int_cst_equal with size_zero_node.
(struct flexmems_t): Change type of first member from tree to bool.
(find_flexarrays): Add nested_p argument. Change pun argument type
from tree to bool, adjust uses. Formatting fixes. If BASE_P or
NESTED_P and T is RECORD_TYPE, start looking only at the last
non-empty or array FIELD_DECL. Adjust recursive call, set first
if it was a nested call and found an array.
(diagnose_invalid_flexarray, diagnose_flexarrays, check_flexarrays):
Formatting fixes.
* g++.dg/ext/flexary9.C: Expect different wording of one of the
warnings and at a different line.
* g++.dg/ext/flexary19.C: Likewise.
* g++.dg/ext/flexary42.C: New test.
* g++.dg/other/pr117516.C: New test.
Patrick Palka [Fri, 31 Jan 2025 20:53:12 +0000 (15:53 -0500)]
libstdc++: Fix flat_foo::insert_range for non-common ranges [PR118156]
This fixes flat_map/multimap::insert_range by just generalizing the
insert implementation to handle heterogenous iterator/sentinel pair.
I'm not sure we can do better than this, e.g. we can't implement it in
terms of the adapted containers' insert_range because that'd require two
passes over the range.
For flat_set/multiset, we can implement insert_range directly in terms
of the adapted container's insert_range. A fallback implementation
is also provided if insert_range isn't available, as is the case for
std::deque currently.
PR libstdc++/118156
libstdc++-v3/ChangeLog:
* include/std/flat_map (_Flat_map_impl::_M_insert): Generalized
version of insert taking heterogenous iterator/sentinel pair.
(_Flat_map_impl::insert): Dispatch to _M_insert.
(_Flat_map_impl::insert_range): Likewise.
(flat_map): Export _Flat_map_impl::insert_range.
(flat_multimap): Likewise.
* include/std/flat_set (_Flat_set_impl::insert_range):
Reimplement directly, not in terms of insert.
(flat_set): Export _Flat_set_impl::insert_range.
(flat_multiset): Likewise.
* testsuite/23_containers/flat_map/1.cc (test06): New test.
* testsuite/23_containers/flat_multimap/1.cc (test06): New test.
* testsuite/23_containers/flat_multiset/1.cc (test06): New test.
* testsuite/23_containers/flat_set/1.cc (test06): New test.
Harald Anlauf [Thu, 30 Jan 2025 21:21:19 +0000 (22:21 +0100)]
Fortran: host association issue with symbol in COMMON block [PR108454]
When resolving a flavorless symbol that is already registered with a COMMON
block, and which neither has the intrinsic, generic, or external attribute,
skip searching among interfaces to avoid false resolution to a derived type
of the same name.
PR fortran/108454
gcc/fortran/ChangeLog:
* resolve.cc (resolve_common_blocks): Initialize variable.
(resolve_symbol): If a symbol is already registered with a COMMON
block, do not search for an interface with the same name.
[PR116234][LRA]: Check debug insn when looking at one insn pseudo occurrence
LRA can change reg class to NO_REGS when a pseudo is referred to in one
insn. Checking the references did not take into account that the referring
insn can be a debug insn. This resulted in different code generation
with and without debug info generation. The patch fixes this pitfall.
gcc/ChangeLog:
PR rtl-optimization/116234
* lra-constraints.cc (multiple_insn_refs_p): New function.
(curr_insn_transform): Use it.
Eric Botcazou [Fri, 31 Jan 2025 11:41:19 +0000 (12:41 +0100)]
Fix wrong elaboration for allocator at library level of dynamic library
The problem was preexisting for class-wide allocators, but now occurs for
allocators of controlled types too, because of the recent overhaul of the
finalization machinery.
gcc/ada/
* gcc-interface/utils.cc (gnat_pushdecl): Clear TREE_PUBLIC on
functions really nested in another function.
Jakub Jelinek [Fri, 31 Jan 2025 11:39:34 +0000 (12:39 +0100)]
testsuite: Add testcase for already fixed PR [PR117498]
This wrong-code issue has been fixed with r15-7249.
We still emit warnings which are questionable and perhaps we'd
get better generated code if niters determined the loop has only a single
iteration without UB and we'd punt on vectorizing it (or unrolling).
2025-01-31 Jakub Jelinek <jakub@redhat.com>
PR middle-end/117498
* gcc.c-torture/execute/pr117498.c: New test.
Richard Biener [Fri, 31 Jan 2025 07:56:39 +0000 (08:56 +0100)]
debug/100530 - Revert QUAL_ADDR_SPACE handling from dwarf2out.cc
The bug clearly shows that r8-4385-ga297ccb52e0c89 was wrong in
enabling handling of address-space qualification as DWARF type
qualifiers as the code isn't prepared for it actually not being handled
and ends up changing a lesser qualified (without address-space)
type DIE in ways tripping asserts. The following reverts that
part which then causes the DIE for the same type with address-space
qualifiers removed to be re-used since there's currently no code
to encode address-spaces within dwarf2out.cc or in the DWARF spec.
r8-4385-ga297ccb52e0c89 did not come with a testcase nor a good
description of the bug fixed - I've verified const qualification
mixed with address-spaces creates the expected DWARF.
PR debug/100530
* dwarf2out.cc (modified_type_die): Do not claim we handle
address-space qualification with dwarf_qual_info[].
Jakub Jelinek [Fri, 31 Jan 2025 10:02:41 +0000 (11:02 +0100)]
niter: Make build_cltz_expr more robust [PR118689]
Since my r15-7223 the niter analysis can recognize one loop during bootstrap
as being ctz like.
The patch just turned
@@ -2173,7 +2173,7 @@ PROC m2pim_NumberIO_BinToStr (CARDINAL x
_T535_44 = &buf[i.40_2]{lb: 1 sz: 4};
_T536_45 = x_21 & 1;
*_T535_44 = _T536_45;
- _T537_47 = x_21 / 2;
+ _T537_47 = x_21 >> 1;
x_48 = _T537_47;
# DEBUG x => x_48
if (x_48 != 0)
which is not a big deal for the number_of_iterations_cltz optimization, it
recognizes both right shift by 1 and unsigned division by 2 (and similarly
for clz left shift by 1 or multiplication by 2).
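The equivalence of the two loop forms can be sketched at the source level as below. The function names are hypothetical and the closed form is only what follows arithmetically for this loop shape; the sketch does not claim to mirror what niter emits. It relies on __builtin_clz, which GCC and Clang provide.

```cpp
#include <limits>

// Iterations needed to shift x down to zero.
unsigned iterations_shift(unsigned x) {
  unsigned n = 0;
  while (x != 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Same loop written with an unsigned division by 2.
unsigned iterations_divide(unsigned x) {
  unsigned n = 0;
  while (x != 0) {
    x /= 2;
    ++n;
  }
  return n;
}

// Closed form: the bit length of x.  __builtin_clz is undefined for 0,
// so that case is handled separately.
unsigned iterations_closed_form(unsigned x) {
  if (x == 0)
    return 0;
  return static_cast<unsigned>(std::numeric_limits<unsigned>::digits
                               - __builtin_clz(x));
}
```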
But starting with forwprop1 that change also resulted in
@@ -1875,9 +1875,9 @@ PROC m2pim_NumberIO_BinToStr (CARDINAL x
i.40_2 = (INTEGER) _T530_34;
_T536_45 = x_21 & 1;
MEM <CARDINAL[1:64]> [(CARDINAL *)&buf][i.40_2]{lb: 1 sz: 4} = _T536_45;
- _T537_47 = x_21 / 2;
+ _T537_47 = x_21 >> 1;
# DEBUG x => _T537_47
- if (x_21 > 1)
+ if (_T537_47 != 0)
goto <bb 3>; [INV]
else
goto <bb 8>; [INV]
and apparently it is only the latter form that number_of_iterations_cltz
pattern matches, not the former (after all, that was the exact reason
for r15-7223).
The problem is that build_cltz_expr assumes if IFN_C[LT]Z can't be used it
can use the __builtin_c[lt]z{,l,ll} builtins, and while most of the FEs do
create them, modula 2 does not.
The following patch just lets us punt if the FE doesn't build those builtins.
I've filed a PR against modula2 so that they add the builtins too.
2025-01-31 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/118689
PR modula2/115032
* tree-ssa-loop-niter.cc (build_cltz_expr): Return NULL_TREE if fn is
NULL and use_ifn is false.
Richard Biener [Thu, 30 Jan 2025 13:52:14 +0000 (14:52 +0100)]
Do not rely on non-SLP analysis for SLP outer loop vectorization
We end up relying on non-SLP analysis of the inner loop LC PHI to
set the vectorization method for SLP since vectorizable_reduction
claims responsibility. The following fixes this.
* tree-vect-loop.cc (vect_analyze_loop_operations): Only
call vectorizable_lc_phi when not PURE_SLP.
(vectorizable_reduction): Do not claim having handled
the inner loop LC PHI for outer loop vectorization.
Ian Lance Taylor [Thu, 30 Jan 2025 23:20:23 +0000 (15:20 -0800)]
libbacktrace: add casts to avoid undefined shifts
Patch from pgerell@github.
* elf.c (elf_fetch_bits): Add casts to avoid potentially shifting
a value farther than its type size.
(elf_fetch_bits_backward): Likewise.
(elf_uncompress_lzma_block): Likewise.
(elf_uncompress_lzma): Likewise.
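The kind of cast involved can be sketched as follows. This is an illustration of the general pattern, not the code from elf.c, and load_le is a hypothetical helper. Without the cast, an unsigned char operand is promoted to int, so shifting it by 32 or more bits is undefined.

```cpp
#include <cstddef>
#include <cstdint>

// Assemble up to eight bytes into a 64-bit value, least significant first.
// The operand is widened to 64 bits *before* the shift, so the shift count
// never reaches the width of the shifted type.
std::uint64_t load_le(const unsigned char *p, std::size_t count) {
  std::uint64_t val = 0;
  for (std::size_t i = 0; i < count && i < 8; ++i)
    val |= static_cast<std::uint64_t>(p[i]) << (8 * i);
  return val;
}
```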
Georg-Johann Lay [Thu, 30 Jan 2025 11:16:50 +0000 (12:16 +0100)]
AVR: Provide built-ins for strlen where the string lives in some AS.
This patch adds built-in functions __builtin_avr_strlen_flash,
__builtin_avr_strlen_flashx and __builtin_avr_strlen_memx.
Purpose is that higher-level functions can use __builtin_constant_p
on strlen without raising a diagnostic due to -Waddr-space-convert.
gcc/
* config/avr/builtins.def (STRLEN_FLASH, STRLEN_FLASHX)
(STRLEN_MEMX): New DEF_BUILTIN's.
* config/avr/avr.cc (avr_ftype_strlen): New static function.
(avr_builtin_supported_p): New built-ins are not for AVR_TINY.
(avr_init_builtins) <strlen_flash_node, strlen_flashx_node,
strlen_memx_node>: Provide new fntypes.
(avr_fold_builtin) [AVR_BUILTIN_STRLEN_FLASH]
[AVR_BUILTIN_STRLEN_FLASHX, AVR_BUILTIN_STRLEN_MEMX]: Fold if
possible.
* doc/extend.texi (AVR Built-in Functions): Document
__builtin_avr_strlen_flash, __builtin_avr_strlen_flashx,
__builtin_avr_strlen_memx.
libgcc/
* config/avr/t-avr (LIB1ASMFUNCS): Add _strlen_memx.
* config/avr/lib1funcs.S <L_strlen_memx, __strlen_memx>: Implement.
Georg-Johann Lay [Thu, 30 Jan 2025 11:05:19 +0000 (12:05 +0100)]
AVR: Only provide a built-in when it is available.
Some built-ins are not available for C++ since they are using
named address-spaces or fixed-point types.
gcc/
* config/avr/builtins.def (AVR_FIRST_C_ONLY_BUILTIN_ID): New macro.
* config/avr/avr-protos.h (avr_builtin_supported_p): New.
* config/avr/avr.cc (avr_builtin_supported_p): New function.
(avr_init_builtins): Only provide a built-in when it is supported.
* config/avr/avr-c.cc (avr_cpu_cpp_builtins): Only define the
__BUILTIN_AVR_<NAME> built-in defines when the associated built-in
function is supported.
* doc/extend.texi (AVR Built-in Functions): Add a note that
following built-ins are supported only for GNU-C.
Sandra Loosemore [Thu, 30 Jan 2025 16:45:27 +0000 (16:45 +0000)]
OpenMP: Update documentation of metadirective implementation status.
libgomp/ChangeLog
* libgomp.texi (OpenMP 5.0): Mark metadirective and declare variant
as implemented.
(OpenMP 5.1): Mark target_device as supported.
Add changed interaction between declare target and OpenMP context
and dynamic selector support.
(OpenMP 5.2): Mark otherwise clause as supported, note that
default is also still accepted.
Jakub Jelinek [Thu, 30 Jan 2025 17:30:10 +0000 (18:30 +0100)]
s390: Fix up *vec_cmpgt{,u}<mode><mode>_nocc_emu splitters [PR118696]
The following testcase is miscompiled on s390x-linux with e.g. -march=z13
(both -O0 and -O2) starting with r15-7053.
The problem is in the splitters which emulate TImode/V1TImode GT and GTU
comparisons.
For GT we want to do
(ior (gt (hi op1) (hi op2))
(and (eq (hi op1) (hi op2)) (gtu (lo op1) (lo op2))))
and for GTU similarly except for gtu instead of gt in there.
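In scalar terms the two formulas can be sketched like this, with a hypothetical DoubleWord struct standing in for the TImode value split into hi and lo halves. Only the comparison of the hi halves differs between the signed and unsigned variants; the lo halves are always compared unsigned.

```cpp
#include <cstdint>

// A double-word value split into two 64-bit halves.
struct DoubleWord {
  std::uint64_t hi;
  std::uint64_t lo;
};

// GT: (hi1 > hi2, signed) || (hi1 == hi2 && lo1 > lo2, unsigned)
bool gt_signed(DoubleWord a, DoubleWord b) {
  const auto ahi = static_cast<std::int64_t>(a.hi);
  const auto bhi = static_cast<std::int64_t>(b.hi);
  return ahi > bhi || (a.hi == b.hi && a.lo > b.lo);
}

// GTU: same shape, but the hi halves are compared unsigned as well.
bool gt_unsigned(DoubleWord a, DoubleWord b) {
  return a.hi > b.hi || (a.hi == b.hi && a.lo > b.lo);
}
```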
Now, the splitter emulation is using V2DImode comparisons where on s390x
the hi part is in the first element of the vector, lo part in the second,
and for the gtu case it swaps the elements of the vector.
So, we get the right result in the first element of the result vector.
But vrepg was then broadcasting the second element of the result vector
rather than the first, and the value of the second element of the vector
is instead
(ior (gt (lo op1) (lo op2))
(and (eq (lo op1) (lo op2)) (gtu (hi op1) (hi op2))))
so something not really usable for the emulated comparison.
The following patch fixes that. The testcase tries to test behavior of
double-word smin/smax/umin/umax with various cases of the halves of both
operands (one that is sometimes EQ, sometimes GT, sometimes LT, sometimes
GTU, sometimes LTU).
2025-01-30 Jakub Jelinek <jakub@redhat.com>
Stefan Schulze Frielinghaus <stefansf@gcc.gnu.org>
PR target/118696
* config/s390/vector.md (*vec_cmpgt<mode><mode>_nocc_emu,
*vec_cmpgtu<mode><mode>_nocc_emu): Duplicate the first rather than
second V2DImode element.
* gcc.dg/pr118696.c: New test.
* gcc.target/s390/vector/pr118696.c: New test.
* gcc.target/s390/vector/vec-abs-emu.c: Expect vrepg with 0 as last
operand rather than 1.
* gcc.target/s390/vector/vec-max-emu.c: Likewise.
* gcc.target/s390/vector/vec-min-emu.c: Likewise.
Patrick Palka [Thu, 30 Jan 2025 15:30:56 +0000 (10:30 -0500)]
c++: remove LAMBDA_EXPR_CAPTURES_THIS_P
This unused accessor is just a simple alias of LAMBDA_EXPR_THIS_CAPTURE
and contrary to its documentation doesn't use TREE_LANG_FLAG_0. Might
as well remove it.
Richard Biener [Thu, 30 Jan 2025 10:22:37 +0000 (11:22 +0100)]
middle-end/118695 - missed misalign handling in MEM_REF expansion
When MEM_REF expansion of a non-MEM falls back to a stack temporary
we fail to handle the case where the offset adjusted reference to
the temporary is not aligned according to the requirement of the
mode. We have to go through bitfield extraction or movmisalign
in this case. Fortunately there's a helper for this.
This fixes an ICE observed on arm which has sanity checks in its
move patterns for this.
PR middle-end/118695
* expr.cc (expand_expr_real_1): When expanding a MEM_REF
to a non-MEM by committing it to a stack temporary make
sure to handle misaligned accesses correctly.
Jonathan Wakely [Thu, 9 Jan 2025 21:50:31 +0000 (21:50 +0000)]
libstdc++: Use safe integer comparisons in std::latch [PR98749]
The std::latch::max() function assumes that the returned value can be
represented by ptrdiff_t, which is true when __platform_wait_t is int
(e.g. on Linux) but not when it's unsigned long, which is the case for
most other 64-bit targets. We should use the smaller of PTRDIFF_MAX and
std::numeric_limits<__platform_wait_t>::max(). Use std::cmp_less to do a
safe comparison that works for all types. We can also use std::cmp_less
and std::cmp_equal in std::latch::count_down so that we don't need to
deal with comparisons between signed and unsigned.
Also add a missing precondition check to the constructor and fix the
existing check in count_down, which was duplicated by mistake.
libstdc++-v3/ChangeLog:
PR libstdc++/98749
* include/std/latch (latch::max()): Ensure the return value is
representable as the return type.
(latch::latch(ptrdiff_t)): Add assertion.
(latch::count_down): Fix copy & pasted duplicate assertion. Use
std::cmp_equal to compare __platform_wait_t and ptrdiff_t
values.
(latch::_M_a): Use defined constant for alignment.
* testsuite/30_threads/latch/1.cc: Check max(). Check constant
initialization works for values in the valid range. Check
alignment.
Tobias Burnus [Thu, 30 Jan 2025 10:28:50 +0000 (11:28 +0100)]
OpenMP: append_args clause fixes + Fortran support
This fixes a large number of smaller and larger issues with the append_args
clause to 'declare variant' and adds Fortran support for it; it also contains
a larger number of testcases.
In particular, for Fortran, it also handles passing allocatable, pointer,
optional arguments to an interop dummy argument with or without value
attribute. And it changes the internal representation such that dumping the
tree does not lead to an ICE.
gcc/c/ChangeLog:
* c-parser.cc (c_finish_omp_declare_variant): Modify how
append_args is saved internally.
gcc/cp/ChangeLog:
* parser.cc (cp_finish_omp_declare_variant): Modify how append_args
is saved internally.
* pt.cc (tsubst_attribute): Likewise.
(tsubst_omp_clauses): Remove C_ORT_OMP_DECLARE_SIMD from interop
handling as no longer called for it.
* decl.cc (omp_declare_variant_finalize_one): Update append_args
changes; fixes for ADL input.
gcc/fortran/ChangeLog:
* gfortran.h (gfc_omp_declare_variant): Add append_args_list.
* openmp.cc (gfc_parser_omp_clause_init_modifiers): New;
split off from ...
(gfc_match_omp_init): ... here; call it.
(gfc_match_omp_declare_variant): Update to handle append_args
clause; some syntax handling fixes.
* trans-openmp.cc (gfc_trans_omp_declare_variant): Handle
append_args clause; add some diagnostic.
gcc/ChangeLog:
* gimplify.cc (gimplify_call_expr): For OpenMP's append_args clause
processed by 'omp dispatch', update for internal-representation
changes; fix handling of hidden arguments, add some comments and
handle Fortran's value dummy and optional/pointer/allocatable actual
args.
libgomp/ChangeLog:
* libgomp.texi (Impl. Status): Update for accumulated changes
related to 'dispatch' and interop.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/append-args-1.c: Update dg-*.
* c-c++-common/gomp/append-args-3.c: Likewise.
* g++.dg/gomp/append-args-1.C: Likewise.
* gfortran.dg/gomp/adjust-args-1.f90: Likewise.
* gfortran.dg/gomp/adjust-args-3.f90: Likewise.
* gfortran.dg/gomp/declare-variant-2.f90: Likewise.
* c-c++-common/gomp/append-args-6.c: New test.
* c-c++-common/gomp/append-args-7.c: New test.
* c-c++-common/gomp/append-args-8.c: New test.
* c-c++-common/gomp/append-args-9.c: New test.
* g++.dg/gomp/append-args-4.C: New test.
* g++.dg/gomp/append-args-5.C: New test.
* g++.dg/gomp/append-args-6.C: New test.
* g++.dg/gomp/append-args-7.C: New test.
* gcc.dg/gomp/append-args-1.c: New test.
* gfortran.dg/gomp/append_args-1.f90: New test.
* gfortran.dg/gomp/append_args-2.f90: New test.
* gfortran.dg/gomp/append_args-3.f90: New test.
* gfortran.dg/gomp/append_args-4.f90: New test.
Richard Biener [Wed, 29 Jan 2025 14:09:35 +0000 (15:09 +0100)]
middle-end/118692 - ICE with out-of-bound ref expansion
The following guards the BIT_FIELD_REF expansion fallback for
MEM_REFs of entities expanded to register (or constant) further,
avoiding large out-of-bound offsets by, when the access does not
overlap the base object, expanding the offset as if it were zero.
PR middle-end/118692
* expr.cc (expand_expr_real_1): When expanding a MEM_REF
as BIT_FIELD_REF avoid large offsets for accesses not
overlapping the base object.
Richard Biener [Wed, 29 Jan 2025 12:25:14 +0000 (13:25 +0100)]
tree-optimization/114052 - consider infinite sub-loops when lowering iter bound
When we walk stmts to find always executed stmts with UB in the last
iteration to be able to reduce the iteration count by one we fail
to consider infinite subloops in the last iteration that would make
such stmt not execute. The following adds this.
PR tree-optimization/114052
* tree-ssa-loop-niter.cc (maybe_lower_iteration_bound): Check
for infinite subloops we might not exit.
pair-fusion: Check for invalid use arrays [PR118320]
As Andrew says in the bugzilla comments, this PR is about a case where
we tried to fuse two stores of x0, one in which x0 was defined and one
in which it was undefined. merge_access_arrays failed on the conflict,
but the failure wasn't caught.
Normally the hazard detection code would fail if the instructions
had incompatible uses. However, an undefined use doesn't impose
many restrictions on movements. I think this is likely to be the
only case where hazard detection isn't enough.
As Andrew notes in bugzilla, it might be possible to allow uses
of defined and undefined values to be merged to the defined value.
But that sounds dangerous in the general case, as an rtl-ssa-level
decision. We might run the risk of turning conditional UB into
unconditional UB. And LLVM proves that the definition of "undef"
isn't simple.
gcc/
PR rtl-optimization/118320
* pair-fusion.cc (pair_fusion_bb_info::fuse_pair): Commonize
the merge of input_uses and return early if it fails.
gcc/testsuite/
PR rtl-optimization/118320
* g++.dg/torture/pr118320.C: New test.
Jeff Law [Thu, 30 Jan 2025 02:42:11 +0000 (19:42 -0700)]
[PR testsuite/116860] Testsuite adjustment for recently added tests
There are two new tests that depend on logical-op-non-short-circuit
settings. The BZ is reported against ppc64 and ppc64le, but also applies to a
goodly number of the other targets.
The "regression" fix is trivial, just add the appropriate param to force the
behavior we're expecting. I'm committing that fix momentarily. It's been
verified on ppc64, ppc64le and x86_64 as well as the various embedded targets
in my tester where many FAILS flip to PASS.
I'm leaving the bug open without the regression marker as Jakub has noted a
couple of improvements that we can and probably should make.
PR target/116860
gcc/testsuite
* gcc.dg/tree-ssa/fold-xor-and-or.c: Set logical-op-non-short-circuit.
* gcc.dg/tree-ssa/fold-xor-or.c: Similarly.
Gaius Mulley [Wed, 29 Jan 2025 20:32:07 +0000 (20:32 +0000)]
PR modula2/116073 invalid rtl sharing compiling FileSystem.mod caused by ext-dce
The bug fixes to PR modula2/118010 and PR modula2/118183 uncovered a bug
in the procedure interface to lseek which uses SYSTEM.COFF_T rather than
SYSTEM.CSSIZE_T. This patch sets the default size for COFF_T to the same
as CSSIZE_T.
gcc/ChangeLog:
PR modula2/118010
PR modula2/118183
PR modula2/116073
* doc/gm2.texi (-fm2-file-offset-bits=): Change the default size
description to CSSIZE_T.
Add COFF_T to the list of data types exported by SYSTEM.def.
gcc/m2/ChangeLog:
PR modula2/118010
PR modula2/118183
PR modula2/116073
* gm2-compiler/M2Options.mod (OffTBits): Assign to 0.
* gm2-gcc/m2type.cc (build_m2_specific_size_type): Ensure that
layout_type is called before returning c.
(build_m2_offt_type_node): If GetFileOffsetBits returns 0 then
use the type size of ssize_t.
gcc/testsuite/ChangeLog:
PR modula2/118010
PR modula2/118183
PR modula2/116073
* gm2/pim/run/pass/printtypesize.mod: New test.
Arsen Arsenović [Wed, 29 Jan 2025 20:14:33 +0000 (21:14 +0100)]
d: give dependency files better filenames [PR118477]
Currently, the dependency files for root-file.o and common-file.o were
both d/.deps/file.Po, which would cause parallel builds to fail
sometimes with:
make[3]: Leaving directory '/var/tmp/portage/sys-devel/gcc-14.1.1_p20240511/work/build/gcc'
make[3]: Entering directory '/var/tmp/portage/sys-devel/gcc-14.1.1_p20240511/work/build/gcc'
mv: cannot stat 'd/.deps/file.TPo': No such file or directory
make[3]: *** [/var/tmp/portage/sys-devel/gcc-14.1.1_p20240511/work/gcc-14-20240511/gcc/d/Make-lang.in:421: d/root-file.o] Error 1 shuffle=131581365
Also, this means that dependencies of one of root-file or common-file
are missing when developing. After this patch, those two files get
assigned dependency files d/.deps/root-file.Po and
d/.deps/common-file.Po respectively, so that they match the actual object
files in the d/ subdirectory.
There are other files with similar conflicts (mangle-package.o,
visitor-package.o for instance).
2025-01-29 Arsen Arsenović <arsen@aarsen.me>
Jakub Jelinek <jakub@redhat.com>
PR d/118477
* Make-lang.in (DCOMPILE, DPOSTCOMPILE): Use $(basename $(@F))
instead of $(*F).
pair-fusion: A couple of fixes for sp updates [PR118429]
The PR showed two issues with pair-fusion. The first is that the pass
treated stack pointer deallocations as ordinary register updates, and so
might move them earlier than another stack access (through a different
base register) that doesn't alias the pair candidate.
The simplest fix for that seems to be to prevent the stack deallocation
from being moved. This part might (or might not) be a latent source of
wrong code and so worth backporting in some form. (The patch as-is
won't work for GCC 14.)
The second issue only started with r15-6551, which added a memory
write to stack allocations and deallocations. We should use the
existing tombstone mechanism to preserve the associated memory
definition. (Deleting definitions immediately would have quadratic
complexity in the worst case.)
gcc/
PR rtl-optimization/118429
* pair-fusion.cc (latest_hazard_before): Add an extra parameter
to say whether the instruction is a load or a store. If the
instruction is not a load or store and has memory side effects,
prevent it from being moved earlier.
(pair_fusion::find_trailing_add): Update call accordingly.
(pair_fusion_bb_info::fuse_pair): If the trailing addition had
a memory side-effect, use a tombstone to preserve it.
gcc/testsuite/
PR rtl-optimization/118429
* gcc.c-torture/compile/pr118429.c: New test.