d: Fix ICE in dwarf2out_imported_module_or_decl, at dwarf2out.cc:27676 [PR119817]
The ImportVisitor method for handling the importing of overload sets was
pushing NULL_TREE to the array of import decls, which in turn got passed
to `debug_hooks->imported_module_or_decl', triggering the observed
internal compiler error.
NULL_TREE is returned from `build_import_decl' when the symbol was
ignored for being non-trivial to represent in debug, for example,
template or tuple declarations. So similarly "skip" adding the symbol
when this is the case for overload sets too.
PR d/119817
gcc/d/ChangeLog:
* imports.cc (ImportVisitor::visit (OverloadSet *)): Don't push
NULL_TREE to vector of import symbols.
gcc/testsuite/ChangeLog:
* gdc.dg/debug/imports/m119817/a.d: New test.
* gdc.dg/debug/imports/m119817/b.d: New test.
* gdc.dg/debug/imports/m119817/package.d: New test.
* gdc.dg/debug/pr119817.d: New test.
d: Fix forward referenced enums missing type names in debug info [PR118309]
Calling `rest_of_type_compilation' as the D types were built meant that
debug info was being emitted before all forward references were
resolved, resulting in DW_AT_name's to be missing.
Instead, defer outputting type debug information until all modules have
been parsed and generated in `d_finish_compilation'.
PR d/118309
gcc/d/ChangeLog:
* modules.cc: Include debug.h
(d_finish_compilation): Call debug_hooks->type_decl on all TYPE_DECLs.
* types.cc: Remove toplev.h include.
(finish_aggregate_type): Don't call rest_of_type_compilation or
rest_of_decl_compilation on type.
(TypeVisitor::visit (TypeEnum *)): Likewise.
Jakub Jelinek [Wed, 16 Apr 2025 15:22:49 +0000 (17:22 +0200)]
libatomic: Fix up libat_{,un}lock_n for mingw [PR119796]
Here is just a port of the previously posted patch to mingw which
clearly has the same problems.
2025-04-16 Jakub Jelinek <jakub@redhat.com>
PR libgcc/101075
PR libgcc/119796
* config/mingw/lock.c (libat_lock_n, libat_unlock_n): Start with
computing how many locks will be needed and take into account
((uintptr_t)ptr % WATCH_SIZE). If some locks from the end of the
locks array and others from the start of it will be needed, first
lock the ones from the start followed by ones from the end.
Jakub Jelinek [Wed, 16 Apr 2025 15:21:39 +0000 (17:21 +0200)]
libatomic: Fix up libat_{,un}lock_n [PR119796]
As mentioned in the PR (and I think in PR101075 too), we can run into
deadlock with libat_lock_n calls with larger n.
As mentioned in PR66842, we use multiple locks (normally 64 mutexes
for each 64 byte cache line in 4KiB page) and currently can lock more
than one lock, in particular for n [0, 64] a single lock, for n [65, 128]
2 locks, for n [129, 192] 3 locks etc.
There are two problems with this:
1) we can deadlock if there is some wrap-around, because the locks are
acquired always in the order from addr_hash (ptr) up to
locks[NLOCKS-1].mutex and then if needed from locks[0].mutex onwards;
so if e.g. 2 threads perform libat_lock_n with n = 2048+64, in one
case at pointer starting at page boundary and in another case at
page boundary + 2048 bytes, the first thread can lock the first
32 mutexes, the second thread can lock the last 32 mutexes and
then first thread wait for the lock 32 held by second thread and
second thread wait for the lock 0 held by the first thread;
fixed below by always locking the locks in order of increasing
index, if there is a wrap-around, by locking in 2 loops, first
locking some locks at the start of the array and second at the
end of it
2) the number of locks seems to be determined solely depending on the
n value, I think that is wrong, we don't know the structure alignment
on the libatomic side, it could very well be 1 byte aligned struct,
and so how many cachelines are actually (partly or fully) occupied
by the atomic access depends not just on the size, but also on
ptr % WATCH_SIZE, e.g. 2 byte structure at address page_boundary+63
should IMHO lock 2 locks because it occupies the first and second
cacheline
Note, before this patch it locked exactly one lock for n = 0, while
with this patch it could lock either no locks at all (if it is at cacheline
boundary) or 1 (otherwise).
Dunno of libatomic APIs can be called for zero sizes and whether
we actually care that much how many mutexes are locked in that case,
because one can't actually read/write anything into zero sized memory.
If you think it is important, I could add else if (nlocks == 0) nlocks = 1;
in both spots.
2025-04-16 Jakub Jelinek <jakub@redhat.com>
PR libgcc/101075
PR libgcc/119796
* config/posix/lock.c (libat_lock_n, libat_unlock_n): Start with
computing how many locks will be needed and take into account
((uintptr_t)ptr % WATCH_SIZE). If some locks from the end of the
locks array and others from the start of it will be needed, first
lock the ones from the start followed by ones from the end.
Jakub Jelinek [Wed, 16 Apr 2025 07:11:06 +0000 (09:11 +0200)]
bitintlower: Fix interaction of gimple_assign_copy_p stmts vs. has_single_use [PR119808]
The following testcase is miscompiled, because we emit a CLOBBER in a place
where it shouldn't be emitted.
Before lowering we have:
b_5 = 0;
b.0_6 = b_5;
b.1_1 = (unsigned _BitInt(129)) b.0_6;
...
<retval> = b_5;
The bitint coalescing assigns the same partition/underlying variable
for both b_5 and b.0_6 (possible because there is a copy assignment)
and of course a different one for b.1_1 (and other SSA_NAMEs in between).
This is -O0 so stmts aren't DCEd and aren't propagated that much etc.
It is -O0 so we also don't try to optimize and omit some names from m_names
and handle multiple stmts at once, so the expansion emits essentially
bitint.4 = {};
bitint.4 = bitint.4;
bitint.2 = cast of bitint.4;
bitint.4 = CLOBBER;
...
<retval> = bitint.4;
and the CLOBBER is the problem because bitint.4 is still live afterwards.
We emit the clobbers to improve code generation, but do it only for
(initially) has_single_use SSA_NAMEs (remembered in m_single_use_names)
being used, if they don't have the same partition on the lhs and a few
other conditions.
The problem above is that b.0_6 which is used in the cast has_single_use
and so was in m_single_use_names bitmask and the lhs in that case is
bitint.2, so a different partition. But there is gimple_assign_copy_p
with SSA_NAME rhs1 and the partitioning special cases those and while
b.0_6 is single use, b_5 has multiple uses. I believe this ought to be
a problem solely in the case of such copy stmts and its special case
by the partitioning, if instead of b.0_6 = b_5; there would be
b.0_6 = b_5 + 1; or whatever other stmts that performs or may perform
changes on the value, partitioning couldn't assign the same partition
to b.0_6 and b_5 if b_5 is used later, it couldn't have two different
(or potentially different) values in the same bitint.N var. With
copy that is possible though.
So the following patch fixes it by being more careful when we set
m_single_use_names, don't set it if it is a has_single_use SSA_NAME
but SSA_NAME_DEF_STMT of it is a copy stmt with SSA_NAME rhs1 and that
rhs1 doesn't have single use, or has_single_use but SSA_NAME_DEF_STMT of it
is a copy stmt etc.
Just to make sure it doesn't change code generation too much, I've gathered
statistics how many times
if (m_first
&& m_single_use_names
&& m_vars[p] != m_lhs
&& m_after_stmt
&& bitmap_bit_p (m_single_use_names, SSA_NAME_VERSION (op)))
{
tree clobber = build_clobber (TREE_TYPE (m_vars[p]),
CLOBBER_STORAGE_END);
g = gimple_build_assign (m_vars[p], clobber);
gimple_stmt_iterator gsi = gsi_for_stmt (m_after_stmt);
gsi_insert_after (&gsi, g, GSI_SAME_STMT);
}
emits a clobber on
make check-gcc GCC_TEST_RUN_EXPENSIVE=1 RUNTESTFLAGS="--target_board=unix\{-m64,-m32\} GCC_TEST_RUN_EXPENSIVE=1 dg.exp='*bitint* pr112673.c builtin-stdc-bit-*.c pr112566-2.c pr112511.c pr116588.c pr116003.c pr113693.c pr113602.c flex-array-counted-by-7.c' dg-torture.exp='*bitint* pr116480-2.c pr114312.c pr114121.c' dfp.exp=*bitint* i386.exp='pr118017.c pr117946.c apx-ndd-x32-2a.c' vect.exp='vect-early-break_99-pr113287.c' tree-ssa.exp=pr113735.c"
and before this patch it was 41010 clobbers and after it is 40968,
so difference is 42 clobbers, 0.1% fewer.
2025-04-16 Jakub Jelinek <jakub@redhat.com>
PR middle-end/119808
* gimple-lower-bitint.cc (gimple_lower_bitint): Don't set
m_single_use_names bits for SSA_NAMEs which have single use but
their SSA_NAME_DEF_STMT is a copy from another SSA_NAME which doesn't
have a single use, or single use which is such a copy etc.
Jakub Jelinek [Mon, 14 Apr 2025 17:34:22 +0000 (19:34 +0200)]
expmed: Always use QImode for init_expmed set_zero_cost [PR119785]
This is a regression on some targets introduced I believe by r6-2055
which added mode argument to set_src_cost.
The problem here is that in the first iteration, mode is always QImode
and we get as -Os zero cost set_src_cost (const0_rtx, QImode, false).
But then we use the mode variable for iterating over int, partial int
and vector int modes, so for the second iteration we call set_src_cost
with mode which is at that time (machine_mode) (MAX_MODE_VECTOR_INT + 1).
In the x86 case that happens to be V2HFmode and we don't crash (and
compute the same 0 cost as we would for QImode).
But e.g. in the SPARC case (machine_mode) (MAX_MODE_VECTOR_INT + 1) is
MAX_MACHINE_MODE and that does all kinds of weird things especially
when doing ubsan bootstrap.
Fixed by always using QImode.
2025-04-14 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/119785
* expmed.cc (init_expmed): Always pass QImode rather than mode to
set_src_cost passed to set_zero_cost.
Jakub Jelinek [Mon, 14 Apr 2025 08:18:13 +0000 (10:18 +0200)]
driver: On linux hosts disable ASLR during -freport-bug [PR119727]
Andi had a useful comment that even with the PR119727 workaround to
ignore differences in libbacktrace printed addresses, it is still better
to turn off ASLR when easily possible, e.g. in case some address leaks
in somewhere in the ICE message elsewhere, or to verify the ICE doesn't
depend on a particular library/binary load addresses.
The following patch adds a configure check and uses personality syscall
to turn off randomization for further -freport-bug subprocesses.
2025-04-14 Jakub Jelinek <jakub@redhat.com>
PR driver/119727
* configure.ac (HOST_HAS_PERSONALITY_ADDR_NO_RANDOMIZE): New check.
* gcc.cc: Include sys/personality.h if
HOST_HAS_PERSONALITY_ADDR_NO_RANDOMIZE is defined.
(try_generate_repro): Call
personality (personality (0xffffffffU) | ADDR_NO_RANDOMIZE)
if HOST_HAS_PERSONALITY_ADDR_NO_RANDOMIZE is defined.
* config.in: Regenerate.
* configure: Regenerate.
Jakub Jelinek [Sat, 12 Apr 2025 11:15:13 +0000 (13:15 +0200)]
driver: Fix up -freport-bug for ASLR [PR119727]
With --enable-host-pie -freport-bug almost never prepares preprocessed
source and instead emits
The bug is not reproducible, so it is likely a hardware or OS problem.
message even for bogus which are 100% reproducible.
The way -freport-bug works is that it reruns it 3 times, capturing stdout
and stderr from each and then tries to compare the outputs in between
different runs.
The libbacktrace emitted hexadecimal addresses at the start of the lines
can differ between runs due to ASLR, either of the PIE executable, or
even if not PIE if there is some frame with e.g. libc function (say
crash in strlen/memcpy etc.).
The following patch fixes it by ignoring such differences at the start of
the lines.
2025-04-12 Jakub Jelinek <jakub@redhat.com>
PR driver/119727
* gcc.cc (files_equal_p): Rewritten using fopen/fgets/fclose instead
of open/fstat/read/close. At the start of lines, ignore lowercase
hexadecimal addresses followed by space.
Jakub Jelinek [Sat, 12 Apr 2025 11:13:53 +0000 (13:13 +0200)]
bitintlower: Fix up handling of SSA_NAME copies in coalescing [PR119722]
The following patch is miscompiled, because during the limited
SSA name coalescing the bitintlower pass does we incorrectly don't
register a conflict.
This is on
<bb 4> [local count: 1073741824]:
# b_17 = PHI <b_19(3), 8(2)>
g.4_13 = g;
_14 = g.4_13 >> 50;
_15 = (unsigned int) _14;
_21 = b_17;
_16 = (unsigned int) _21;
s_22 = _15 + _16;
return s_22;
basic block where in the map->bitint bitmap we track 14, 17 and 19.
The build_bitint_stmt_ssa_conflicts "hook" has special code where
it tracks uses at the final statements of mergeable operations, so
e.g. the
_16 = (unsigned int) _21;
statement is considered to be use of b_17 because _21 is not in
map->bitmap (or large_huge.m_names), i.e. is mergeable.
The problem is that build_ssa_conflict_graph has special code to handle
SSA_NAME copies and _21 = b_17; is gimple_assign_copy_p. In such cases
it calls live_track_clear_var on the rhs1. The problem is that
on the above bb, after we note in the _16 = (unsigned int) _21;
stmt we need b_17 the generic code makes us forget that because
of the copy statement, and then build_bitint_stmt_ssa_conflicts
ignores it completely (because _21 is large/huge bitint and is
not in map->bitint, so assumed to be handled by a later stmt in the
bb, for backwards walk like this before this one).
As the b_17 use is ignored, the coalescing thinks it can put
all of b_17, b_19 and _14 into the same partition, which is wrong,
while we can and should coalesce b_17 and b_19, _14 needs to be a different
temporary because b_17 is set before and used after _14 has been written.
The following patch fixes it by handling gimple_assign_copy_p in two
separate spots, move the generic coalesce handling of it after
build_ssa_conflict_graph (where build_ssa_conflict_graph handling
doesn't fall through to that, it does continue after the call) and
inside of build_ssa_conflict_graph it performs it too, but only if
the lhs is not mergeable large/huge bitint.
2025-04-12 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/119722
* gimple-lower-bitint.h (build_bitint_stmt_ssa_conflicts): Add
CLEAR argument.
* gimple-lower-bitint.cc (build_bitint_stmt_ssa_conflicts): Add
CLEAR argument. Call clear on gimple_assign_copy_p rhs1 if lhs
is large/huge bitint unless lhs is not in names.
* tree-ssa-coalesce.cc (build_ssa_conflict_graph): Adjust
build_bitint_stmt_ssa_conflicts caller. Move gimple_assign_copy_p
handling to after the build_bitint_stmt_ssa_conflicts call.
Jakub Jelinek [Fri, 11 Apr 2025 06:27:55 +0000 (08:27 +0200)]
bitintlower: Fix up handling of nested casts in m_upward_2limbs cases [PR119707]
The following testcase is miscompiled I believe starting with
PR112941 r14-6742. That commit fixed the bitint-55.c testcase.
The m_first initialization for such conversion initializes 2 SSA_NAMEs,
one is PHI result on the loop (m_data[save_data_cnt]) and the other
(m_data[save_data_cnt+1]) is the argument of that PHI from the latch
edge initialized somewhere in the loop. Both of these are used to
propagate sign extension (i.e. either 0 or all ones limb) from the
iteration with the sign bit of a narrower type to following iterations.
The bitint-55.c testcase was ICEing with invalid SSA forms as it was
using unconditionally the PHI argument SSA_NAME even in places which
weren't dominated by that. And the code which was touched is about
handling constant idx, so if e.g. there are nested casts and the
outer one does conditional code based on index comparison with
a particular constant index.
In the following testcase there are 2 nested casts, one from signed
_BitInt(129) to unsigned _BitInt(255) and the outer from unsigned
_BitInt(255) to unsigned _BitInt(256). The m_upward_2limbs case which
is used for handling mergeable arithmetics (like +-|&^ and casts etc.)
one loop iteration handles 2 limbs, the first half the even ones, the
second half the odd ones.
And for these 2 conversions, the special one for the inner conversion
on x86_64 is with index 2 where the sign bit of _BitInt(129) is present,
while for the outer one index 3 where we need to mask off the most
significant bit.
The r15-6742 change started using m_data[save_data_cnt] for all constant
indexes if it is still inside of the loop (and it is sign extension).
But that doesn't work correctly for the case where the inner conversion
produces the sign extension limb in the loop for an even index and
the outer conversion needs to special case the immediately next conversion,
because in that case using the PHI result will see still 0 there rather
than the updated value from the handling of previous limb.
So the following patch special cases this and uses the other SSA_NAME.
Commented IL, trying to lower
_1 = (unsigned _BitInt(255)) y_4(D);
_2 = (unsigned _BitInt(256)) _1;
_3 = _2 + x_5(D);
<retval> = _3;
we were emitting
<bb 3> [local count: 1073741824]:
# _8 = PHI <0(2), _9(12)> // This is the limb index
# _10 = PHI <0(2), _11(12)> // Sign extension limb from inner cast (0 or ~0UL)
# _22 = PHI <0(2), _23(12)> // Overflow bit from addition of previous limb
if (_8 <= 2)
goto <bb 4>; [80.00%]
else
goto <bb 7>; [20.00%]
<bb 11> [local count: 214748360]:
// And HERE is the actual bug. Using _10 for idx 3 will mean it is always
// zero there and doesn't contain the _18 value propagated to it.
// It should be
// _30 = (<unnamed-unsigned:63>) _11;
// Now if the outer conversion had special iteration say 5, we could
// have used _10 fine here, by that time it already propagates through
// the PHI.
_30 = (<unnamed-unsigned:63>) _10;
_31 = (unsigned long) _30;
PR tree-optimization/119707
* gimple-lower-bitint.cc (bitint_large_huge::handle_cast): Only use
m_data[save_data_cnt] instead of m_data[save_data_cnt + 1] if
idx is odd and equal to low + 1. Remember tree_to_uhwi (idx) in
a temporary instead of calling the function multiple times.
Jakub Jelinek [Wed, 9 Apr 2025 20:01:30 +0000 (22:01 +0200)]
libquadmath: Fix up THREEp96 constant in expq
Here is a cherry-pick from glibc [BZ #32411] fix.
As mentioned by the reporter in a pull request against gcc-mirror,
the THREEp96 constant in e_expl.c is incorrect, it is actually 0x3.p+94f128
rather than 0x3.p+96f128.
The algorithm uses that to compute the t2 integer (tval2), by whose
delta it adjusts the x+xl pair and then in the result uses the precomputed
exp value for that entry.
Using 0x3.p+94f128 rather than 0x3.p+96f128 results in tval2 sometimes
being one smaller, sometimes one larger than the desired value, thus can mean
the x+xl pair after adjustment will be larger in absolute value than it
should be.
DesWursters created a test program for this
https://github.com/DesWurstes/comparefloats
and his results were
total: 1135000000 not_equal: 4322 earlier_score: 674 later_score: 3648
I've modified this so with
https://sourceware.org/bugzilla/show_bug.cgi?id=32411#c3
so that it actually tests pseudo-random _Float128 values with range
(-16384.,16384) with strong bias on values larger than 0.0002 in absolute
value (so that tval1/tval2 aren't zero most of the time) and that gave
total: 10000000000 not_equal: 29861 earlier_score: 4606 later_score: 25255
So, in both cases, in most cases the change doesn't result in any differences,
and in those rare cases where does, about 85% have smaller ulp than without
the patch.
Additionally I've tried
https://sourceware.org/bugzilla/show_bug.cgi?id=32411#c4
and in 2 billion iterations it didn't find any case where x+xl after the
adjustments without this change would be smaller in absolute value compared
to x+xl after the adjustments with this change.
Jakub Jelinek [Fri, 4 Apr 2025 18:57:09 +0000 (20:57 +0200)]
lto: lto-opts fixes [PR119625]
I can reproduce a really weird error in our distro i686 trunk gcc
(but haven't managed to reproduce it with vanilla trunk yet).
echo 'void foo (void) {}' > a.c; gcc -O2 -flto=auto -m32 -march=i686 -ffat-lto-objects -fhardened -o a.o -c a.c; gcc -O2 -flto=auto -m32 -march=i686 -r -o a.lo a.o
lto1: fatal error: open failed: No such file or directory
compilation terminated.
lto-wrapper: fatal error: gcc returned 1 exit status
The error is because
cat ./a.lo.lto.o-args.0
""
a.o
My suspicion is that this "" in there is caused by weird .gnu.lto_.opts
section content during
gcc -O2 -flto=auto -m32 -march=i686 -ffat-lto-objects -fhardened -S -o a.s -c a.c
compilation (and I can reproduce that one with vanilla trunk).
The above results in
.section .gnu.lto_.opts,"e",@progbits
.string "'-fno-openmp' '-fno-openacc' '-fPIC' '' '-m32' '-march=i686' '-O2' '-flto=auto' '-ffat-lto-objects'"
There are two weird things, one (IMHO the cause of the "" later on) is
the '' part, I think it comes from lto_write_options doing
append_to_collect_gcc_options (&temporary_obstack, &first_p, "");
IMHO it shouldn't call append_to_collect_gcc_options at all for that case.
The -fhardened option causes global_options.x_flag_cf_protection
to be set to CF_FULL and later on the backend option processing
sets it to CF_FULL | CF_SET (i.e. 7, a value not handled in
lto_write_options).
The following patch fixes it by not emitting anything there if
flag_cf_protection is one of the unhandled values.
Perhaps it could incrementally use
switch (global_options.x_flag_cf_protection & ~CF_SET)
instead, dunno.
And the other problem is that the -fPIC in there is really weird.
Our distro compiler or vanilla configured trunk certainly doesn't
default to -fPIC and -fhardened uses -fPIE when
-fPIC/-fpic/-fno-pie/-fno-pic is not specified, so I was expecting
-fPIE in there.
The thing is that the -fpie option causes setting of both
global_options.x_flag_pi{c,e} to 1, -fPIE both to 2:
/* If -fPIE or -fpie is used, turn on PIC. */
if (opts->x_flag_pie)
opts->x_flag_pic = opts->x_flag_pie;
else if (opts->x_flag_pic == -1)
opts->x_flag_pic = 0;
if (opts->x_flag_pic && !opts->x_flag_pie)
opts->x_flag_shlib = 1;
so checking first for flag_pic == 2 and then flag_pic == 1
and only afterwards for flag_pie means we never print
-fPIE/-fpie.
Or do you want something further (like
switch (global_options.x_flag_cf_protection & ~CF_SET)
)?
2025-04-04 Jakub Jelinek <jakub@redhat.com>
PR lto/119625
* lto-opts.cc (lto_write_options): If neither flag_pic nor
flag_pie are set, check first for flag_pie and only later
for flag_pic rather than the other way around, use a temporary
variable. If flag_cf_protection is not set, don't append anything
if flag_cf_protection is none of CF_{NONE,FULL,BRANCH,RETURN} and
use a temporary variable.
Jakub Jelinek [Wed, 2 Apr 2025 17:28:20 +0000 (19:28 +0200)]
c: Fix ICEs with -fsanitize=pointer-{subtract,compare} [PR119582]
The following testcase ICEs because c_fully_fold isn't performed on the
arguments of __sanitizer_ptr_{sub,cmp} builtins and so e.g.
C_MAYBE_CONST_EXPR can leak into the gimplifier where it ICEs.
2025-04-02 Jakub Jelinek <jakub@redhat.com>
PR c/119582
* c-typeck.cc (pointer_diff, build_binary_op): Call c_fully_fold on
__sanitizer_ptr_sub or __sanitizer_ptr_cmp arguments.
Jakub Jelinek [Tue, 1 Apr 2025 14:40:55 +0000 (16:40 +0200)]
combine: Use reg_used_between_p rather than modified_between_p in two spots [PR119291]
The following testcase is miscompiled on x86_64-linux at -O2 by the combiner.
We have from earlier combinations
(insn 22 21 23 4 (set (reg:SI 104 [ _7 ])
(const_int 0 [0])) "pr119291.c":25:15 96 {*movsi_internal}
(nil))
(insn 23 22 24 4 (set (reg/v:SI 117 [ e ])
(reg/v:SI 116 [ e ])) 96 {*movsi_internal}
(expr_list:REG_DEAD (reg/v:SI 116 [ e ])
(nil)))
(note 24 23 25 4 NOTE_INSN_DELETED)
(insn 25 24 26 4 (parallel [
(set (reg:CCZ 17 flags)
(compare:CCZ (neg:SI (reg:SI 104 [ _7 ]))
(const_int 0 [0])))
(set (reg/v:SI 116 [ e ])
(neg:SI (reg:SI 104 [ _7 ])))
]) "pr119291.c":26:13 977 {*negsi_2}
(expr_list:REG_DEAD (reg:SI 104 [ _7 ])
(nil)))
(note 26 25 27 4 NOTE_INSN_DELETED)
(insn 27 26 28 4 (set (reg:DI 128 [ _9 ])
(ne:DI (reg:CCZ 17 flags)
(const_int 0 [0]))) "pr119291.c":26:13 1447 {*setcc_di_1}
(expr_list:REG_DEAD (reg:CCZ 17 flags)
(nil)))
and try_combine is called on i3 25 and i2 22 (second time)
and reach the hunk being patched with simplified i3
(insn 25 24 26 4 (parallel [
(set (pc)
(pc))
(set (reg/v:SI 116 [ e ])
(const_int 0 [0]))
]) "pr119291.c":28:13 977 {*negsi_2}
(expr_list:REG_DEAD (reg:SI 104 [ _7 ])
(nil)))
and
(insn 22 21 23 4 (set (reg:SI 104 [ _7 ])
(const_int 0 [0])) "pr119291.c":27:15 96 {*movsi_internal}
(nil))
Now, the try_combine code there attempts to split two independent
sets in newpat by moving one of them to i2.
And among other tests it checks
!modified_between_p (SET_DEST (set1), i2, i3)
which is certainly needed, if there would be say
(set (reg/v:SI 116 [ e ]) (const_int 42 [0x2a]))
in between i2 and i3, we couldn't do that, as that set would overwrite
the value set by set1 we want to move to the i2 position.
But in this case pseudo 116 isn't set in between i2 and i3, but used
(and additionally there is a REG_DEAD note for it).
This is equally bad for the move, because while the i3 insn
and later will see the pseudo value that we set, the insn in between
which uses the value will see a different value from the one that
it should see.
As we don't check for that, in the end try_combine succeeds and
changes the IL to:
(insn 22 21 23 4 (set (reg/v:SI 116 [ e ])
(const_int 0 [0])) "pr119291.c":27:15 96 {*movsi_internal}
(nil))
(insn 23 22 24 4 (set (reg/v:SI 117 [ e ])
(reg/v:SI 116 [ e ])) 96 {*movsi_internal}
(expr_list:REG_DEAD (reg/v:SI 116 [ e ])
(nil)))
(note 24 23 25 4 NOTE_INSN_DELETED)
(insn 25 24 26 4 (set (pc)
(pc)) "pr119291.c":28:13 2147483647 {NOOP_MOVE}
(nil))
(note 26 25 27 4 NOTE_INSN_DELETED)
(insn 27 26 28 4 (set (reg:DI 128 [ _9 ])
(const_int 0 [0])) "pr119291.c":28:13 95 {*movdi_internal}
(nil))
(note, the i3 got turned into a nop and try_combine also modified insn 27).
The following patch replaces the modified_between_p
tests with reg_used_between_p, my understanding is that
modified_between_p is a subset of reg_used_between_p, so one
doesn't need both.
Looking at this some more today, I think we should special case
set_noop_p because that can be put into i2 (except for the JUMP_P
violations), currently both modified_between_p (pc_rtx, i2, i3)
and reg_used_between_p (pc_rtx, i2, i3) returns false.
I'll post a patch incrementally for that (but that feels like
new optimization, so probably not something that should be backported).
On Tue, Apr 01, 2025 at 11:27:25AM +0200, Richard Biener wrote:
> Can we constrain SET_DEST (set1/set0) to a REG_P in combine? Why
> does the comment talk about memory?
I was worried about making too risky changes this late in stage4
(and especially also for backports). Most of this code is 1992-ish.
I think many of the functions are just misnamed, the reg_ in there doesn't
match what those functions do (bet they initially supported just REGs
and later on support for other kinds of expressions was added, but haven't
done git archeology to prove that).
What we know for sure is:
&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != ZERO_EXTRACT
&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != STRICT_LOW_PART
&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT
&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != STRICT_LOW_PART
that is checked earlier in the condition.
Then it calls
&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)),
XVECEXP (newpat, 0, 0))
&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)),
XVECEXP (newpat, 0, 1))
While it has reg_* in it, that function mostly calls reg_overlap_mentioned_p
which is also misnamed, that function handles just fine all of
REG, MEM, SUBREG of REG, (SUBREG of MEM not, see below), ZERO_EXTRACT,
STRICT_LOW_PART, PC and even some further cases.
So, IMHO SET_DEST (set0) or SET_DEST (set0) can be certainly a REG, SUBREG
of REG, PC (at least the REG and PC cases are triggered on the testcase)
and quite possibly also MEM (SUBREG of MEM not, see below).
Now, the code uses !modified_between_p (SET_SRC (set{1,0}), i2, i3) where that
function for constants just returns false, for PC returns true, for REG
returns reg_set_between_p, for MEM recurses on the address, for
MEM_READONLY_P otherwise returns false, otherwise checks using alias.cc code
whether the memory could have been modified in between, for all other
rtxes recurses on the subrtxes. This part didn't change in my patch.
I've only changed those
- && !modified_between_p (SET_DEST (set{1,0}), i2, i3)
+ && !reg_used_between_p (SET_DEST (set{1,0}), i2, i3)
where the former has been described above and clearly handles all of
REG, SUBREG of REG, PC, MEM and SUBREG of MEM among other things.
The replacement reg_used_between_p calls reg_overlap_mentioned_p on each
instruction in between i2 and i3. So, there is clearly a difference
in behavior if SET_DEST (set{1,0}) is pc_rtx, in that case modified_between_p
returns unconditionally true even if there are no instructions in between,
but reg_used_between_p if there are no non-debug insns in between returns
false. Sorry for missing that, guess I should check for that (with the
exception of the noop moves which are often (set (pc) (pc)) and handled
by the incremental patch). In fact not just that, reg_used_between_p
will only return true for PC if it is mentioned anywhere in the insns
in between.
Anyway, except for that, for REG it calls refers_to_regno_p
and so should find any occurrences of any of the REG or parts of it for hard
registers, for MEM returns true if it sees any MEMs in insns in between
(conservatively), for SUBREGs apparently it relies on it being SUBREG of REG
(so doesn't handle SUBREG of MEM) and handles SUBREG of REG like the
SUBREG_REG, PC I've already described.
Now, because reg_overlap_mentioned_p doesn't handle SUBREG of MEM, I think
already the initial
&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)),
XVECEXP (newpat, 0, 0))
&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)),
XVECEXP (newpat, 0, 1))
calls would have failed --enable-checking=rtl or would have misbehaved, so
I think there is no need to check for it further.
To your question why I don't use reg_referenced_p, that is because
reg_referenced_p is something to call on one insn pattern, while
reg_used_between_p is pretty much that on all insns in between two
instructions (excluding the boundaries).
So, I think it would be safer to add && SET_DEST (set{1,0} != pc_rtx
checks to preserve former behavior, like in the following version.
2025-04-01 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/119291
* combine.cc (try_combine): For splitting of PARALLEL with
2 independent SETs into i2 and i3 sets check reg_used_between_p
of the SET_DESTs rather than just modified_between_p.
Tomasz Kamiński [Tue, 11 Mar 2025 10:59:36 +0000 (11:59 +0100)]
libstdc++: Correct preprocessing checks for floatX_t and bfloat_16 formatting
Floating points types _Float16, _Float32, _Float64, and bfloat16,
can be formatted only if std::to_chars overloads for such types
were provided. Currently this is only the case for architectures
where float and double are 32-bits and 64-bits IEEE floating points types.
This patch updates the preprocessing checks for formatters
for above types to check _GLIBCXX_FLOAT_IS_IEEE_BINARY32
and _GLIBCXX_DOUBLE_IS_IEEE_BINARY64. Making them non-formattable
on non-IEEE architectures.
Remove a potential UB, where we could produce basic_format_arg
with _M_type set to _Arg_fp32 or _Arg_fp64, that was later not
handled by `_M_visit`.
libstdc++-v3/ChangeLog:
* include/std/format (formatter<_Float16, _CharT>): Define only if
_GLIBCXX_FLOAT_IS_IEEE_BINARY32 macro is defined.
(formatter<_Float16, _CharT>): As above.
(formatter<__gnu_cxx::__bfloat16_t, _CharT>): As above.
(formatter<_Float64, _CharT>): Define only if
_GLIBCXX_DOUBLE_IS_IEEE_BINARY64 is defined.
(basic_format_arg::_S_to_arg_type): Normalize _Float32 and _Float64
only to float and double respectivelly.
(basic_format_arg::_S_to_enum): Remove handling of _Float32 and _Float64.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit 445128c12cf22081223f7385196ee3889ef4c4b2)
i386: Enable -mnop-mcount for -fpic with PLTs [PR119386]
-mnop-mcount can be trivially enabled for -fPIC codegen as long as PLTs
are being used, given that the instruction encodings are identical, only
the target may resolve differently depending on how the linker decides
to incorporate the object file.
So relax the option check, and add a test to ensure that 5-byte NOPs are
emitted when -mnop-mcount is being used.
i386: Prefer PLT indirection for __fentry__ calls under -fPIC [PR119386]
Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling
__fentry__") updated the logic that emits mcount() / __fentry__() calls
into function prologues when profiling is enabled, to avoid GOT-based
indirect calls when a direct call would suffice.
There are two problems with that change:
- it relies on -mdirect-extern-access rather than -fno-plt to decide
whether or not a direct [PLT based] call is appropriate;
- for the PLT case, it falls through to x86_print_call_or_nop(), which
does not emit the @PLT suffix, resulting in the wrong relocation to be
used (R_X86_64_PC32 instead of R_X86_64_PLT32)
Fix this by testing flag_plt instead of ix86_direct_extern_access, and
updating x86_print_call_or_nop() to take flag_pic and flag_plt into
account. This also ensures that -mnop-mcount works as expected when
emitting the PLT based profiling calls.
While at it, fix the 32-bit logic as well, and issue a PLT call unless
PLTs are explicitly disabled.
PR target/119386
* config/i386/i386.cc (x86_print_call_or_nop): Add @PLT suffix
where appropriate.
(x86_function_profiler): Fall through to x86_print_call_or_nop()
for PIC codegen when flag_plt is set.
gcc/testsuite/ChangeLog:
PR target/119386
* gcc.target/i386/pr119386-1.c: New test.
* gcc.target/i386/pr119386-2.c: New test.
Eric Botcazou [Wed, 16 Apr 2025 20:01:31 +0000 (22:01 +0200)]
Fix wrong optimization of conditional expression with enumeration type
This is a regression introduced on the mainline and 14 branch by:
https://gcc.gnu.org/pipermail/gcc-cvs/2023-October/391658.html
The change bypasses int_fits_type_p (essentially) to work around the
signedness constraints, but in doing so disregards the peculiarities
of boolean types whose precision is not 1 dealt with by the predicate,
leading to the creation of a problematic conversion here.
Fixed by special-casing boolean types whose precision is not 1, as done
in several other places.
gcc/
* tree-ssa-phiopt.cc (factor_out_conditional_operation): Do not
bypass the int_fits_type_p test for boolean types whose precision
is not 1.
gcc/testsuite/
* gnat.dg/opt105.adb: New test.
* gnat.dg/opt105_pkg.ads, gnat.dg/opt105_pkg.adb: New helper.
Alice Carlotti [Tue, 15 Apr 2025 16:36:25 +0000 (17:36 +0100)]
aarch64: Disable sysreg feature gating
This applies to the sysreg read/write intrinsics __arm_[wr]sr*. It does
not depend on changes to Binutils, because GCC converts recognised
sysreg names to an encoding based form, which is already ungated in Binutils.
We have, however, agreed to make an equivalent change in Binutils (which
would then disable feature gating for sysreg accesses in inline
assembly), but this has not yet been posted upstream.
In the future we may introduce a new flag to renable some checking,
but these checks could not be comprehensive because many system
registers depend on architecture features that don't have corresponding
GCC/GAS --march options. This would also depend on addressing numerous
inconsistencies in the existing list of sysreg feature dependencies.
H.J. Lu [Tue, 27 Aug 2024 14:03:22 +0000 (07:03 -0700)]
Extend check-function-bodies to allow label and directives
As PR target/116174 shown, we may need to verify labels and the directive
order. Extend check-function-bodies to support matched output lines to
allow label and directives.
gcc/
* doc/sourcebuild.texi (check-function-bodies): Add an optional
argument for matched output lines.
gcc/testsuite/
* gcc.target/i386/pr116174.c: Use check-function-bodies.
* lib/scanasm.exp (parse_function_bodies): Append the line if
$up_config(matched) matches the line.
(check-function-bodies): Add an argument for matched. Set
up_config(matched) to $matched. Append the expected line without
$config(line_prefix) to function_regexp if it starts with ".L".
Since r15-9062-g70391e3958db79 we perform vector bitmask initialization
via the vec_duplicate expander directly. This triggered a latent bug in
ours where we missed to mask out the single bit which resulted in an
execution FAIL of pr119114.c
The attached patch adds the 1-masking of the broadcast operand.
aarch64: Split aarch64_combinev16qi before RA [PR115258]
Two-vector TBL instructions are fed by an aarch64_combinev16qi, whose
purpose is to put the two input data vectors into consecutive registers.
This aarch64_combinev16qi was then split after reload into individual
moves (from the first input to the first half of the output, and from
the second input to the second half of the output).
In the worst case, the RA might allocate things so that the destination
of the aarch64_combinev16qi is the second input followed by the first
input. In that case, the split form of aarch64_combinev16qi uses three
eors to swap the registers around.
This PR is about a test where this worst case occurred. And given the
insn description, that allocation doesn't semm unreasonable.
early-ra should (hopefully) mean that we're now better at allocating
subregs of vector registers. The upcoming RA subreg patches should
improve things further. The best fix for the PR therefore seems
to be to split the combination before RA, so that the RA can see
the underlying moves.
Perhaps it even makes sense to do this at expand time, avoiding the need
for aarch64_combinev16qi entirely. That deserves more experimentation
though.
gcc/
PR target/115258
* config/aarch64/aarch64-simd.md (aarch64_combinev16qi): Allow
the split before reload.
* config/aarch64/aarch64.cc (aarch64_split_combinev16qi): Generalize
into a form that handles pseudo registers.
gcc/testsuite/
PR target/115258
* gcc.target/aarch64/pr115258.c: New test.
aarch64: Avoid unnecessary use of 2-input TBLs [PR115258]
When using TBL for (say) a V4SI permutation, the aarch64 port first
asks target-independent code to lower to a V16QI permutation.
Then, during code generation, an input like:
But subregs (unlike regs) are not shared, so the op0 == op1 check
always failed for this case. We'd then force each subreg into a
fresh register, meaning that during the later:
there is no way for aarch64_expand_vec_perm_1 to realise that
d->op0 and d->op1 are the same value. It would therefore generate
a two-input TBL in the testcase, even though a single-input TBL
is enough.
I'm not sure forcing subregs to a fresh regiter is a good idea --
it caused problems for copysign & co. -- but that's not something
to fiddle with during stage 4. Using op0 == op1 for rtx equality
is independently wrong, so we might as well just fix that for now.
The patch gets rid of extra MOVs that are a regression from GCC 14.
The testcase is based on one from Kugan, itself based on TSVC.
gcc/
PR target/115258
* config/aarch64/aarch64.cc (aarch64_vectorize_vec_perm_const): Use
d.one_vector_p to decide whether op1 should be a copy of op0.
gcc/testsuite/
PR target/115258
* gcc.target/aarch64/pr115258_2.c: New test.
tree-data-refs.cc uses alignment information to try to optimise
the code generated for alias checks. The assumption for "normal"
non-grouped, full-width scalar accesses was that the access size
would be a multiple of the alignment. As Richi notes in the PR,
this is a documented precondition of dr_with_seg_len:
/* The minimum common alignment of DR's start address, SEG_LEN and
ACCESS_SIZE. */
unsigned int align;
PR115192 was a case in which this assumption didn't hold. The access
was part of an aligned 4-element group, but only the first 2 elements
of the group were accessed. The alignment was therefore double the
access size.
In r15-820-ga0fe4fb1c8d78045 I'd "fixed" that by capping the
alignment in one of the output routines. But I think that was
misconceived. The precondition means that we should cap the
alignment at source instead.
Failure to do that caused a similar wrong code bug in this PR,
where the alignment comes from a short bitfield access rather
than from a group access.
gcc/
PR tree-optimization/116125
* tree-vect-data-refs.cc (vect_prune_runtime_alias_test_list): Make
the dr_with_seg_len alignment fields describe tha access sizes as
well as the pointer alignment.
* tree-data-ref.cc (create_intersect_range_checks): Don't compensate
for invalid alignment fields here.
gcc/testsuite/
PR tree-optimization/116125
* gcc.dg/vect/pr116125.c: New test.
aarch64: Fix invalid subregs in xorsign [PR118501]
In the testcase, we try to use xorsign on:
(subreg:DF (reg:TI R) 8)
i.e. the highpart of the TI. xorsign wants to take a V2DF
paradoxical subreg of this, which is rightly rejected as a direct
operation. In cases like this, we need to force the highpart into
a fresh register first.
gcc/
PR target/118501
* config/aarch64/aarch64.md (@xorsign<mode>3): Use
force_lowpart_subreg.
gcc/testsuite/
PR target/118501
* gcc.c-torture/compile/pr118501.c: New test.
aarch64: Use force_lowpart_subreg in a BFI splitter [PR119133]
lowpart_subreg ICEs are the gift that keeps giving. This is another
case where we need to use force_lowpart_subreg instead, to handle
cases where the input is already a subreg and where the combined
subreg is not allowed as a single operation.
We don't need to check can_create_pseudo_p since the input should
be a hard register rather than a subreg if !can_create_pseudo_p.
gcc/
PR target/119133
* config/aarch64/aarch64.md
(*aarch64_bfi<GPI:mode><ALLX:mode>_<SUBDI_BITS>): Use
force_lowpart_subreg.
gcc/testsuite/
PR target/119133
* gcc.dg/torture/pr119133.c: New test.
Avoid using POINTER_DIFF_EXPR for overlap checks [PR119399]
In r10-4803-g8489e1f45b50600c I'd used POINTER_DIFF_EXPR to subtract
the two pointers involved in an overlap test. I'm not sure whether
I'd specifically chosen that over MINUS_EXPR or not; if so, the only
reason I can think of is that it is probably faster on targets with
PSImode pointers. Regardless, as the PR points out, subtracting
unrelated pointers using POINTER_DIFF_EXPR is undefined behaviour.
gcc/
PR tree-optimization/119399
* tree-data-ref.cc (create_waw_or_war_checks): Use a MINUS_EXPR
on two converted pointers, rather than converting a POINTER_DIFF_EXPR
on the pointers.
gcc/testsuite/
PR tree-optimization/119399
* gcc.dg/vect/pr119399.c: New test.
optabs had a local function called lowpart_subreg_maybe_copy
that is very similar to the lowpart version of force_subreg.
This patch adds a force_lowpart_subreg wrapper around
force_subreg.
The only difference between the old and new functions is that
the old one asserted success while the new one doesn't.
It's common not to assert elsewhere when taking subregs;
normally a null result is enough.
Later patches will make more use of the new function.
gcc/
* explow.h (force_lowpart_subreg): Declare.
* explow.cc (force_lowpart_subreg): New function.
While adding more uses of force_subreg, I realised that it should
be more careful to emit no instructions on failure. This kind of
failure should be very rare, so I don't think it's a case worth
optimising for.
gcc/
* explow.cc (force_subreg): Emit no instructions on failure.
Jonathan Wakely [Tue, 15 Apr 2025 13:00:23 +0000 (14:00 +0100)]
libstdc++: Do not define __cpp_lib_ranges_iota in <ranges>
In r14-7153-gadbc46942aee75 we removed a duplicate definition of
__glibcxx_want_range_iota from <ranges>, but __cpp_lib_ranges_iota
should be defined in <ranges> at all.
libstdc++-v3/ChangeLog:
* include/std/ranges (__glibcxx_want_ranges_iota): Do not
define.
Michael Levine [Fri, 7 Jun 2024 08:54:38 +0000 (09:54 +0100)]
libstdc++: Fix std::ranges::iota is not included in numeric [PR108760]
Before this patch, using std::ranges::iota required including
<algorithm> when it should have been sufficient to only include
<numeric>.
For the backport to the release branch ranges::iota is defined in
<bits/ranges_algobase.h> so that it's available in both <numeric> and
<algorithm>. This avoids breaking code that compiles successfully using
existing releases where <algorithm> defines ranges::iota.
libstdc++-v3/ChangeLog:
PR libstdc++/108760
* include/bits/ranges_algo.h (ranges::out_value_result)
(ranges::iota_result, ranges::__iota_fn, ranges::iota): Move to
<bits/ranges_algobase.h>.
* include/bits/ranges_algobase.h (ranges::out_value_result):
(ranges::iota_result, ranges::__iota_fn, ranges::iota): Move to
here.
* include/std/numeric: Include <bits/ranges_algobase.h>.
* testsuite/25_algorithms/iota/1.cc: Renamed to ...
* testsuite/26_numerics/iota/2.cc: ... here.
Jonathan Wakely [Thu, 27 Feb 2025 13:27:17 +0000 (13:27 +0000)]
libstdc++: Fix ranges::move and ranges::move_backward to use iter_move [PR105609]
The ranges::move and ranges::move_backward algorithms are supposed to
use ranges::iter_move(iter) instead of std::move(*iter), which matters
for an iterator type with an iter_move overload findable by ADL.
Currently those algorithms use std::__assign_one which uses std::move,
so define a new ranges::__detail::__assign_one helper function that uses
ranges::iter_move.
libstdc++-v3/ChangeLog:
PR libstdc++/105609
* include/bits/ranges_algobase.h (__detail::__assign_one): New
helper function.
(__copy_or_move, __copy_or_move_backward): Use new function
instead of std::__assign_one.
* testsuite/25_algorithms/move/constrained.cc: Check that
ADL iter_move is used in preference to std::move.
* testsuite/25_algorithms/move_backward/constrained.cc:
Likewise.
Jonathan Wakely [Mon, 14 Oct 2024 22:34:20 +0000 (23:34 +0100)]
libstdc++: Reuse std::__assign_one in <bits/ranges_algobase.h>
Use std::__assign_one instead of ranges::__assign_one. Adjust the uses,
because std::__assign_one has the arguments in the opposite order (the
same order as an assignment expression).
libstdc++-v3/ChangeLog:
* include/bits/ranges_algobase.h (ranges::__assign_one): Remove.
(__copy_or_move, __copy_or_move_backward): Use std::__assign_one
instead of ranges::__assign_one.
Jonathan Wakely [Sun, 13 Oct 2024 18:14:04 +0000 (19:14 +0100)]
libstdc++: Fix ranges::copy_backward for a single memcpyable element [PR117121]
The result iterator needs to be decremented before writing to it.
Improve the PR 108846 tests for all of std::copy, std::copy_n,
std::copy_backward, and the std::ranges versions.
libstdc++-v3/ChangeLog:
PR libstdc++/117121
* include/bits/ranges_algobase.h (copy_backward): Decrement
output iterator before assigning one element through it.
* testsuite/25_algorithms/copy/108846.cc: Ensure the algorithm's
effects are correct for a single memcpyable element.
* testsuite/25_algorithms/copy_backward/108846.cc: Likewise.
* testsuite/25_algorithms/copy_n/108846.cc: Likewise.
libstdc++: Do not use use memmove for 1-element ranges [PR108846,PR116471]
This commit ports the fixes already applied by r13-6372-g822a11a1e642e0
to the range-based versions of copy/move algorithms.
When doing so, a further bug (PR116471) was discovered in the
implementation of the range-based algorithms: although the algorithms
are already constrained by the indirectly_copyable/movable concepts,
there was a failing static_assert in the memmove path.
This static_assert checked that iterator's value type was assignable by
using the is_copy_assignable (move) type traits. However, this is a
problem, because the traits are too strict when checking for constness;
a type like
struct S { S& operator=(S &) = default; };
is trivially copyable (and thus could benefit of the memmove path),
but it does not satisfy is_copy_assignable because the operator takes
by non-const reference.
Now, the reason for the check to be there is because a type with
a deleted assignment operator like
struct E { E& operator=(const E&) = delete; };
is still trivially copyable, but not assignable. We don't want
algorithms like std::ranges::copy to compile because they end up
selecting the memmove path, "ignoring" the fact that E isn't even
copy assignable.
But the static_assert isn't needed here any longer: as noted before,
the ranges algorithms already have the appropriate constraints; and
even if they didn't, there's now a non-discarded codepath to deal with
ranges of length 1 where there is an explicit assignment operation.
Therefore, this commit removes it. (In fact, r13-6372-g822a11a1e642e0
removed the same static_assert from the non-ranges algorithms.)
libstdc++-v3/ChangeLog:
PR libstdc++/108846
PR libstdc++/116471
* include/bits/ranges_algobase.h (__assign_one): New helper
function.
(__copy_or_move): Remove a spurious static_assert; use
__assign_one for memcpyable ranges of length 1.
(__copy_or_move_backward): Likewise.
* testsuite/25_algorithms/copy/108846.cc: Extend to range-based
algorithms, and cover both memcpyable and non-memcpyable
cases.
* testsuite/25_algorithms/copy_backward/108846.cc: Likewise.
* testsuite/25_algorithms/copy_n/108846.cc: Likewise.
* testsuite/25_algorithms/move/108846.cc: Likewise.
* testsuite/25_algorithms/move_backward/108846.cc: Likewise.
RISC-V: revert pr114194 tests on gcc-14 [PR118601]
The gcc-14 backport that split the pr114194 testcase for rv32 and rv64
would only generate the expected rv32 sequence if commit 6b315907c0353f71169a7555e653d29a981fef67 had also been backported, but
it wasn't. Without it, we get the same code as before on both rv32
and rv64, so revert to the original test.
The pr118182-2.c testcase backported from gcc-15 depended on the late
combine pass after register allocation to substitute the zero constant
into the pred_broadcast to get to the expected vmv.s.x instruction.
Without that pass, we get a mfmv.s.f instead. Expect that on gcc-14.
Andrew Pinski [Sat, 15 Mar 2025 23:37:41 +0000 (16:37 -0700)]
discriminators: Fix assigning discriminators on edge [PR113546]
The problem here is there was a compare debug since the discriminators
would still take into account debug statements. For the edge we would look
at the first statement after the labels and that might have been a debug statement.
So we need to skip over debug statements otherwise we could get different
discriminators # with and without -g.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR middle-end/113546
gcc/ChangeLog:
* tree-cfg.cc (first_non_label_stmt): Rename to ...
(first_non_label_nondebug_stmt): This and use gsi_start_nondebug_after_labels_bb.
(assign_discriminators): Update call to first_non_label_nondebug_stmt.
Patrick Palka [Mon, 14 Apr 2025 15:20:13 +0000 (11:20 -0400)]
c++: wrong targs in satisfaction diagnostic context line [PR99214]
In the three-parameter version of satisfy_declaration_constraints, when
't' isn't the most general template, then 't' won't correspond with
'args' after we augment the latter via add_outermost_template_args, and
so the instantiation context that we push via push_tinst_level isn't
quite correct: 'args' is a complete set of template arguments, but 't'
is not necessarily the most general template. This manifests as
misleading diagnostic context lines when issuing a satisfaction failure
error, e.g. the below testcase without this patch we emit:
In substitution of '... void A<int>::f<U>() ... [with U = int]'
and with this patch we emit:
In substitution of '... void A<int>::f<U>() ... [with U = char]'.
This patch fixes this by passing the original 'args' to push_tinst_level,
which ought to properly correspond to 't'.
PR c++/99214
gcc/cp/ChangeLog:
* constraint.cc (satisfy_declaration_constraints): Pass the
original ARGS to push_tinst_level.
Jonathan Wakely [Fri, 11 Apr 2025 10:08:34 +0000 (11:08 +0100)]
libstdc++: Document thread-safety for COW std::string [PR21334]
The gcc4-compatible copy-on-write std::string does not conform to the
C++11 requirements on data race avoidance in standard containers.
Specifically, calling non-const member functions such as begin() and
data() needs to do the "copy on write" operation and so is most
definitely a modification of the object. As such, those non-const
members must not be called concurrently with any other uses of the
string object.
libstdc++-v3/ChangeLog:
PR libstdc++/21334
* doc/xml/manual/using.xml: Document that container data race
avoidance rules do not apply to COW std::string.
* doc/html/*: Regenerate.
Andrew Pinski [Mon, 2 Dec 2024 16:35:23 +0000 (08:35 -0800)]
phiopt: Reset the number of iterations information of a loop when changing an exit from the loop [PR117243]
After r12-5300-gf98f373dd822b3, phiopt could get the following bb structure:
|
middle-bb -----|
| |
| |----| |
phi<1, 2> | |
cond | |
| | |
|--------+---|
Which was considered 2 loops. The inner loop had esimtate of upper_bound to be 8,
due to the original `for (b = 0; b <= 7; b++)`. The outer loop was already an
infinite one.
So phiopt would come along and change the condition to be unconditionally true,
we change the inner loop to being an infinite one but don't reset the estimate
on the loop and cleanup cfg comes along and changes it into one loop but also
does not reset the estimate of the loop. Then the loop unrolling uses the old estimate
and decides to add an unreachable there.o
So the fix is when phiopt changes an exit to a loop, reset the estimates, similar to
how cleanupcfg does it when merging some basic blocks.
Andrew Pinski [Sun, 9 Mar 2025 06:43:54 +0000 (22:43 -0800)]
phiopt: Fix value_replacement for middle bb having phi nodes [PR118922]
After r12-5300-gf98f373dd822b3, value_replacement would be able to look at the
following cfg structure:
```
<bb 5> [local count: 1014686024]:
if (h_6 != 0)
goto <bb 7>; [94.50%]
else
goto <bb 6>; [5.50%]
value_replacement would incorrectly think the middle bb (6) was empty and so it decides
to remove condition in bb5 and replacing it with 0 as the function thought it was `h_6 ? 0 : h_6`.
But since the there is an incoming phi node to bb6 defining h_6 that is incorrect.
The fix is to check if there is phi nodes in the middle bb and set empty_or_with_defined_p to false.
This was not needed before r12-5300-gf98f373dd822b3 because the phi would have been dead otherwise due to
other checks.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/118922
gcc/ChangeLog:
* tree-ssa-phiopt.cc (value_replacement): Set empty_or_with_defined_p
to false when there is phi nodes for the middle bb.
Eric Botcazou [Mon, 14 Apr 2025 21:35:43 +0000 (23:35 +0200)]
Revert very recent backport of changes to the type system
The backport of the change made for PR c/113688 onto the 14 branch a couple
of weeks ago has seriously broken the LTO compiler for the Ada language on
the 14 branch, because it changes the GCC type system for the sake of C in
a way that is not compatible with simple discriminated types in Ada. To be
more precise, useless_type_conversion_p now returns true for some (view-)
conversions that are needed by the rest of the compiler.
gcc/
PR lto/119792
Revert
Backported from master:
2024-12-12 Martin Uecker <uecker@tugraz.at>
Andrew Pinski [Mon, 14 Apr 2025 15:40:24 +0000 (08:40 -0700)]
testcase: Add testcase for already fixed PR [PR118476]
This testcase was fixed by r15-3052-gc7b76a076cb2c6ded but is
a testcase that failed in a different fashion and a much older
failure than the one added with r15-3052.
Andrew Pinski [Mon, 19 Aug 2024 15:06:36 +0000 (08:06 -0700)]
match: Reject non-ssa name/min invariants in gimple_extract [PR116412]
After the conversion for phiopt's conditional operand
to use maybe_push_res_to_seq, it was found that gimple_extract
will extract out from REALPART_EXPR/IMAGPART_EXPR/VCE and BIT_FIELD_REF,
a memory load. But that extraction was not needed as memory loads are not
simplified in match and simplify. So gimple_extract should return false
in those cases.
Changes since v1:
* Move the rejection to gimple_extract from factor_out_conditional_operation.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/116412
gcc/ChangeLog:
* gimple-match-exports.cc (gimple_extract): Return false if op0
was not a SSA name nor a min invariant for REALPART_EXPR/IMAGPART_EXPR/VCE
and BIT_FIELD_REF.
Andrew Pinski [Sun, 27 Oct 2024 20:16:22 +0000 (13:16 -0700)]
vec-lowering: Fix ABSU lowering [PR111285]
ABSU_EXPR lowering incorrectly used the resulting type
for the new expression but in the case of ABSU the resulting
type is an unsigned type and with ABSU is folded away. The fix
is to use a signed type for the expression instead.
Bootstrapped and tested on x86_64-linux-gnu.
PR middle-end/111285
gcc/ChangeLog:
* tree-vect-generic.cc (do_unop): Use a signed type for the
operand if the operation was ABSU_EXPR.
Andrew Pinski [Tue, 1 Oct 2024 21:48:19 +0000 (14:48 -0700)]
backprop: Fix deleting of a phi node [PR116922]
The problem here is remove_unused_var is called on a name that is
defined by a phi node but it deletes it like removing a normal statement.
remove_phi_node should be called rather than gsi_remove for phinodes.
Note there is a possibility of using simple_dce_from_worklist instead
but that is for another day.
Andrew Pinski [Wed, 2 Oct 2024 21:21:24 +0000 (14:21 -0700)]
aarch64: Fix early ra for -fno-delete-dead-exceptions [PR116927]
Early-RA was considering throwing instructions as being dead and removing
them even if -fno-delete-dead-exceptions was in use. This fixes that oversight.
Built and tested for aarch64-linux-gnu.
PR target/116927
gcc/ChangeLog:
* config/aarch64/aarch64-early-ra.cc (early_ra::is_dead_insn): Insns
that throw are not dead with -fno-delete-dead-exceptions.
Andrew Pinski [Tue, 1 Oct 2024 18:34:00 +0000 (18:34 +0000)]
phiopt: Fix VCE moving by rewriting it into cast [PR116098]
Phiopt match_and_simplify might move a well defined VCE assign statement
from being conditional to being uncondtitional; that VCE might no longer
being defined. It will need a rewrite into a cast instead.
This adds the rewriting code to move_stmt for the VCE case.
This is enough to fix the issue at hand. It should also be using rewrite_to_defined_overflow
but first I need to move the check to see a rewrite is needed into its own function
and that is causing issues (see https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663938.html).
Plus this version is easiest to backport.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/116098
gcc/ChangeLog:
* tree-ssa-phiopt.cc (move_stmt): Rewrite VCEs from integer to integer
types to case.
gcc/testsuite/ChangeLog:
* c-c++-common/torture/pr116098-2.c: New test.
* g++.dg/torture/pr116098-1.C: New test.
Harald Anlauf [Tue, 8 Apr 2025 20:30:15 +0000 (22:30 +0200)]
Fortran: fix issue with impure elemental subroutine and interface [PR119656]
PR fortran/119656
gcc/fortran/ChangeLog:
* interface.cc (gfc_compare_actual_formal): Fix front-end memleak
when searching for matching interfaces.
* trans-expr.cc (gfc_conv_procedure_call): If there is a formal
dummy corresponding to an absent argument, use its type, and only
fall back to inferred type otherwise.
We've been miscompiling the following since r0-51314-gd6b4ea8592e338 (I
did not go compile something that old, and identified this change via
git blame, so might be wrong)
=== cut here ===
struct Foo { int x; };
Foo& get (Foo &v) { return v; }
void bar () {
Foo v; v.x = 1;
(true ? get (v) : get (v)).*(&Foo::x) = 2;
// v.x still equals 1 here...
}
=== cut here ===
The problem lies in build_m_component_ref, that computes the address of
the COND_EXPR using build_address to build the representation of
(true ? get (v) : get (v)).*(&Foo::x);
and gets something like
&(true ? get (v) : get (v)) // #1
instead of
(true ? &get (v) : &get (v)) // #2
and the write does not go where want it to, hence the miscompile.
This patch replaces the call to build_address by a call to
cp_build_addr_expr, which gives #2, that is properly handled.
PR c++/114525
gcc/cp/ChangeLog:
* typeck2.cc (build_m_component_ref): Call cp_build_addr_expr
instead of build_address.
Richard Biener [Wed, 9 Apr 2025 12:36:19 +0000 (14:36 +0200)]
rtl-optimization/119689 - compare-debug failure with LRA
The previous change to fix LRA rematerialization broke compare-debug
for i586 bootstrap. Fixed by using prev_nonnote_nondebug_insn
instead of prev_nonnote_insn.
PR rtl-optimization/119689
PR rtl-optimization/115568
* lra-remat.cc (create_cands): Use prev_nonnote_nondebug_insn
to check whether insn2 is directly before insn.
[PR115568][LRA]: Use more strict output reload check in rematerialization
In this PR case LRA rematerialized a value from inheritance insn
instead of output reload one. This resulted in considering a
rematerilization candidate value available when it was actually
not. As a consequence an insn after rematerliazation used the
unexpected value and this use resulted in fp exception. The patch
fixes this bug.
gcc/ChangeLog:
PR rtl-optimization/115568
* lra-remat.cc (create_cands): Check that output reload insn is
adjacent to given insn. Update a comment.
Jason Merrill [Thu, 10 Apr 2025 22:16:37 +0000 (18:16 -0400)]
c++: avoid ARM -Wunused-value [PR114970]
Because of the __builtin_is_constant_evaluated, maybe_constant_init in
expand_default_init fails, so the constexpr constructor isn't folded until
cp_fold, which builds a COMPOUND_EXPR in case the enclosing expression is
relying on the ARM behavior of returning 'this'.
As in other places, avoid -Wunused-value on artificial COMPOUND_EXPR.
PR c++/114970
gcc/cp/ChangeLog:
* cp-gimplify.cc (cp_fold): Suppress warnings on
return_this COMPOUND_EXPR.
Jason Merrill [Thu, 10 Apr 2025 18:34:35 +0000 (14:34 -0400)]
c++: nested lambda capture pack [PR119345]
tsubst_stmt already registers a local capture proxy as a
local_specialization of both an outer capture proxy and the captured
variable; we also need to do that in add_extra_args.
PR c++/119345
gcc/cp/ChangeLog:
* pt.cc (add_extra_args): Also register a specialization
of the captured variable.
Jason Merrill [Wed, 9 Apr 2025 17:22:56 +0000 (13:22 -0400)]
c++: lambda in constraint of lambda [PR119175]
Here when we went to mangle the constraints of from<0>, the outer lambda has
no mangling scope, but the inner one was treated as having the outer one as
its scope. And mangling the outer one means mangling its constraints, which
include the inner one. So infinite recursion.
But a lambda closure type isn't a scope that anything should have for
mangling, the inner lambda should also have no mangling scope.
PR c++/119175
gcc/cp/ChangeLog:
* mangle.cc (decl_mangling_context): Look through lambda type.
Jason Merrill [Mon, 7 Apr 2025 18:35:14 +0000 (14:35 -0400)]
c++: self-dependent alias template [PR117530]
Here, instantiating B<short> means instantiating A<short>, which means
instantiating B<short>. And then when we go to register the initial
instantiation, it conflicts with the inner one. Fixed by checking after
tsubst whether there's already something in the hash table. We already did
something much like this in tsubst_decl, but that doesn't handle this case.
PR c++/117530
gcc/cp/ChangeLog:
* pt.cc (instantiate_template): Check retrieve_specialization after
tsubst.
With inherited CTAD the set of guides may be a two-dimensional overload
set (i.e. OVERLOADs of OVERLOADs) so alias_ctad_tweaks (which also does
the inherited CTAD transformation) needs to use the 2D-aware lkp_iterator
instead of ovl_iterator, or better yet use the more idiomatic lkp_range.
PR c++/119687
gcc/cp/ChangeLog:
* pt.cc (alias_ctad_tweaks): Use lkp_range / lkp_iterator
instead of ovl_iterator.
gcc/testsuite/ChangeLog:
* g++.dg/cpp23/class-deduction-inherited8.C: New test.
Jonathan Wakely [Tue, 5 Nov 2024 17:19:06 +0000 (17:19 +0000)]
libstdc++: Fix conversions to key/value types for hash table insertion [PR115285]
The conversions to key_type and value_type that are performed when
inserting into _Hashtable need to be fixed to do any required
conversions explicitly. The current code assumes that conversions from
the parameter to the key_type or value_type can be done implicitly,
which isn't necessarily true.
Remove the _S_forward_key function which doesn't handle all cases and
either forward the parameter if it already has type cv key_type, or
explicitly construct a temporary of type key_type.
Similarly, the _ConvertToValueType specialization for maps doesn't
handle all cases either, for std::pair arguments only some value
categories are handled. Remove _ConvertToValueType and for the _M_insert
function for unique keys, either forward the argument unchanged or
explicitly construct a temporary of type value_type.
For the _M_insert overload for non-unique keys we don't need any
conversion at all, we can just forward the argument directly to where we
construct a node.
libstdc++-v3/ChangeLog:
PR libstdc++/115285
* include/bits/hashtable.h (_Hashtable::_S_forward_key): Remove.
(_Hashtable::_M_insert_unique_aux): Replace _S_forward_key with
a static_cast to a type defined using conditional_t.
(_Hashtable::_M_insert): Replace _ConvertToValueType with a
static_cast to a type defined using conditional_t.
* include/bits/hashtable_policy.h (_ConvertToValueType): Remove.
* testsuite/23_containers/unordered_map/insert/115285.cc: New test.
* testsuite/23_containers/unordered_set/insert/115285.cc: New test.
* testsuite/23_containers/unordered_set/96088.cc: Adjust
expected number of allocations.
François Dumont [Tue, 22 Oct 2024 17:13:34 +0000 (19:13 +0200)]
libstdc++: Always instantiate key_type to compute hash code [PR115285]
Even if it is possible to compute a hash code from the inserted arguments
we need to instantiate the key_type to guaranty hash code consistency.
Preserve the lazy instantiation of the mapped_type in the context of
associative containers.
libstdc++-v3/ChangeLog:
PR libstdc++/115285
* include/bits/hashtable.h (_S_forward_key<_Kt>): Always return a temporary
key_type instance.
* testsuite/23_containers/unordered_map/96088.cc: Adapt to additional instanciation.
Also check that mapped_type is not instantiated when there is no insertion.
* testsuite/23_containers/unordered_multimap/96088.cc: Adapt to additional
instanciation.
* testsuite/23_containers/unordered_multiset/96088.cc: Likewise.
* testsuite/23_containers/unordered_set/96088.cc: Likewise.
* testsuite/23_containers/unordered_set/pr115285.cc: New test case.
Jin Ma [Mon, 7 Apr 2025 06:21:50 +0000 (14:21 +0800)]
RISC-V: Disable unsupported vsext/vzext patterns for XTheadVector.
XThreadVector does not support the vsext/vzext instructions; however,
due to the reuse of RVV optimizations, it may generate these instructions
in certain cases. To prevent the error "Unknown opcode 'th.vsext.vf2',"
we should disable these patterns.
V2:
Change the value of dg-do in the test case from assemble to compile, and
remove the -save-temps option.
gcc/ChangeLog:
* config/riscv/vector.md: Disable vsext/vzext for XTheadVector.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/xtheadvector/vsext.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vzext.c: New test.
Looking at the log for the reload pass, it is found that "Changing pseudo 209 in
operand 3 of insn 69 on equiv 0x1".
It converts the vl operand in insn from the expected register(reg:DI 209) to the
constant 1(const_int 1 [0x1]).
This conversion occurs because, although the predicate for the vl operand is
restricted by "vector_length_operand" in the pattern, the constraint is still
"rK", which allows the transformation.
The issue is that changing the "rK" constraint to "rJ" for the constraint of vl
operand in the pattern would prevent this conversion, But unfortunately this will
conflict with RVV (RISC-V Vector Extension).
Based on the review's recommendations, the best solution for now is to create
a new constraint to distinguish between RVV and XTheadVector, which is exactly
what this patch does.
Kito Cheng [Thu, 10 Apr 2025 08:58:49 +0000 (16:58 +0800)]
RISC-V: Fix the behavior for multilib-generator with --cmodel=large on rv32
Large code model is only supported on RV64, so we don't need to
generate the multilibs for RV32 with --cmodel=large. And the compact
code model is something we don't supported on upstream (which is
accidentally added in the past), so we need to remove it.
gcc/ChangeLog:
* config/riscv/multilib-generator: Remove the compact code model
and check large code model for RV32.
Patrick Palka [Wed, 9 Apr 2025 21:48:05 +0000 (17:48 -0400)]
libstdc++: Fix constraint recursion in basic_const_iterator operator- [PR115046]
It was proposed in PR112490 to also adjust basic_const_iterator's friend
operator-(sent, iter) overload alongside the r15-7757-g4342c50ca84ae5
adjustments to its comparison operators, but we lacked a concrete
testcase demonstrating fixable constraint recursion there. It turns out
Hewill Kang's PR115046 is such a testcase! So this patch makes the same
adjustments to that overload as well, fixing PR115046. The LWG 4218 P/R
will need to get adjusted too.
PR libstdc++/115046
PR libstdc++/112490
libstdc++-v3/ChangeLog:
* include/bits/stl_iterator.h (basic_const_iterator::operator-):
Replace non-dependent basic_const_iterator function parameter with
a dependent one of type basic_const_iterator<_It2> where _It2
matches _It.
* testsuite/std/ranges/adaptors/as_const/1.cc (test04): New test.
Patrick Palka [Wed, 9 Apr 2025 21:55:36 +0000 (17:55 -0400)]
c++: ICE with nested default targ lambdas [PR119574]
In GCC 14 we fixed PR116567 in a more conservative way that doesn't
distinguish between the two kinds of deferred substitutions, and so
for PR119574 we instead ICE from get_innermost_template_args due to
TMPL_PARMS_DEPTH of the lambda, 2, being greater than the depth of the
augmented args, 1.
This patch works around the ICE in a best effort kind of way by guarding
the get_innermost_template_args call appropriately; I don't think it's
possible to get this completely right in GCC 14 without backporting the
proper fix for PR116567.
Note that lambda-targ13b.C present in the GCC 15 version of this patch[1]
never worked in GCC 14, and still doesn't work, which is why it's not
present in this patch.
xuli [Tue, 12 Nov 2024 02:31:28 +0000 (02:31 +0000)]
RISC-V: Bugfix for max_sew_overlap_and_next_ratio_valid_for_prev_sew_p[pr117483]
This patch fixs https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117483
If prev and next satisfy the following rules, we should forbid the case
(next.get_sew() < prev.get_sew() && (!next.get_ta() || !next.get_ma()))
in the compatible function max_sew_overlap_and_next_ratio_valid_for_prev_sew_p.
Otherwise, the tail elements of next will be polluted.
Andreas noted we were getting an uninit warning after the recent constant
synthesis changes. Essentially there's no way for the uninit analysis code to
know the first entry in the CODES array is a UNKNOWN which will set X before
its first use.
So trivial initialization with NULL_RTX is the obvious fix.
Robin Dapp [Wed, 24 Jul 2024 07:08:00 +0000 (09:08 +0200)]
RISC-V: Error early with V and no M extension.
For calculating the value of a poly_int at runtime we use a
multiplication instruction that requires the M extension.
Instead of just asserting and ICEing this patch emits an early
error at option-parsing time.
gcc/ChangeLog:
PR target/116036
* config/riscv/riscv.cc (riscv_override_options_internal): Error
with TARGET_VECTOR && !TARGET_MUL.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-31.c: Add m to arch string and expect it.
* gcc.target/riscv/arch-32.c: Ditto.
* gcc.target/riscv/predef-14.c: Ditto.
* gcc.target/riscv/predef-15.c: Ditto.
* gcc.target/riscv/predef-16.c: Ditto.
* gcc.target/riscv/predef-26.c: Ditto.
* gcc.target/riscv/predef-27.c: Ditto.
* gcc.target/riscv/predef-32.c: Ditto.
* gcc.target/riscv/predef-33.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr111486.c: Add m to arch string.
* gcc.target/riscv/compare-debug-1.c: Ditto.
* gcc.target/riscv/compare-debug-2.c: Ditto.
* gcc.target/riscv/rvv/base/pr116036.c: New test.
Robin Dapp [Wed, 31 Jul 2024 14:54:03 +0000 (16:54 +0200)]
RISC-V: Correct mode_idx attribute for viwalu wx variants [PR116149].
In PR116149 we choose a wrong vector length which causes wrong values in
a reduction. The problem happens in avlprop where we choose the
number of units in the instruction's mode as vector length. For the
non-scalar variants the respective operand has the correct non-widened
mode. For the scalar variants, however, the same operand has a scalar
mode which obviously only has one unit. This makes us choose VL = 1
leaving three elements undisturbed (so potentially -1). Those end up
in the reduction causing the wrong result.
This patch adjusts the mode_idx just for the scalar variants of the
affected instruction patterns.
gcc/ChangeLog:
PR target/116149
* config/riscv/vector.md: Fix mode_idx attribute of scalar
widen add/sub variants.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr116149.c: New test.
Jeff Law [Thu, 8 Aug 2024 13:42:26 +0000 (07:42 -0600)]
[RISC-V][PR target/116240] Ensure object is a comparison before extracting arguments
This was supposed to go out the door yesterday, but I kept getting interrupted.
The target bits for rtx costing can't assume the rtl they're given actually
matches a target pattern. It's just kind of inherent in how the costing
routines get called in various places.
In this particular case we're trying to cost a conditional move:
(set (dest) (if_then_else (cond) (true) (false))
On the RISC-V port the backend only allows actual conditionals for COND. So
something like (eq (reg) (const_int 0)). In the costing code for if-then-else
we did something like
(XEXP (XEXP (cond, 0), 0)))
Which fails miserably if COND is a terminal node like (reg) rather than (ne
(reg) (const_int 0)
So this patch tightens up the RTL scanning to ensure that we have a comparison
before we start looking at the comparison's arguments.
Run through my tester without incident, but I'll wait for the pre-commit tester
to run through a cycle before pushing to the trunk.
Jeff
ps. We probably could support a naked REG for the condition and internally convert it to (ne (reg) (const_int 0)), but I don't think it likely happens with any regularity.
PR target/116240
gcc/
* config/riscv/riscv.cc (riscv_rtx_costs): Ensure object is a
comparison before looking at its arguments.
gcc/testsuite
* gcc.target/riscv/pr116240.c: New test.
曾治金 [Wed, 14 Aug 2024 06:06:23 +0000 (14:06 +0800)]
RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]
This patch is to fix the bug (BugId:116305) introduced by the commit
bd93ef for risc-v target.
The commit bd93ef changes the chunk_num from 1 to TARGET_MIN_VLEN/128
if TARGET_MIN_VLEN is larger than 128 in riscv_convert_vector_bits. So
it changes the value of BYTES_PER_RISCV_VECTOR. For example, before
merging the commit bd93ef and if TARGET_MIN_VLEN is 256, the value
of BYTES_PER_RISCV_VECTOR should be [8, 8], but now [16, 16]. The value
of riscv_bytes_per_vector_chunk and BYTES_PER_RISCV_VECTOR are no longer
equal.
Prologue will use BYTES_PER_RISCV_VECTOR.coeffs[1] to estimate the vlenb
register value in riscv_legitimize_poly_move, and dwarf2cfi will also
get the estimated vlenb register value in riscv_dwarf_poly_indeterminate_value
to calculate the number of times to multiply the vlenb register value.
So need to change the factor from riscv_bytes_per_vector_chunk to
BYTES_PER_RISCV_VECTOR, otherwise we will get the incorrect dwarf
information. The incorrect example as follow:
The sequence '0x92,0xa2,0x38,0' means the vlenb register, '0x34' means
the literal 4, '0x1e' means the multiply operation. But in fact, the
vlenb register value just need to multiply the literal 2.
PR target/116305
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value): Take
BYTES_PER_RISCV_VECTOR for *factor instead of riscv_bytes_per_vector_chunk.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/scalable_vector_cfi.c: New test.