git.ipfire.org Git - thirdparty/gcc.git/log

d: Fix ICE in dwarf2out_imported_module_or_decl, at dwarf2out.cc:27676 [PR119817]

The ImportVisitor method for handling the importing of overload sets was
pushing NULL_TREE to the array of import decls, which in turn got passed
to `debug_hooks->imported_module_or_decl', triggering the observed
internal compiler error.

NULL_TREE is returned from `build_import_decl' when the symbol was
ignored for being non-trivial to represent in debug, for example,
template or tuple declarations. So similarly "skip" adding the symbol
when this is the case for overload sets too.

PR d/119817

gcc/d/ChangeLog:

* imports.cc (ImportVisitor::visit (OverloadSet *)): Don't push
NULL_TREE to vector of import symbols.

gcc/testsuite/ChangeLog:

* gdc.dg/debug/imports/m119817/a.d: New test.
* gdc.dg/debug/imports/m119817/b.d: New test.
* gdc.dg/debug/imports/m119817/package.d: New test.
* gdc.dg/debug/pr119817.d: New test.

(cherry picked from commit f5ed7d19c965de9ccb158d77e929b17459bf65b5)

d: Fix forward referenced enums missing type names in debug info [PR118309]

Calling `rest_of_type_compilation' as the D types were built meant that
debug info was being emitted before all forward references were
resolved, resulting in DW_AT_name attributes being missing.

Instead, defer outputting type debug information until all modules have
been parsed and generated in `d_finish_compilation'.

PR d/118309

gcc/d/ChangeLog:

* modules.cc: Include debug.h.
(d_finish_compilation): Call debug_hooks->type_decl on all TYPE_DECLs.
* types.cc: Remove toplev.h include.
(finish_aggregate_type): Don't call rest_of_type_compilation or
rest_of_decl_compilation on type.
(TypeVisitor::visit (TypeEnum *)): Likewise.

gcc/testsuite/ChangeLog:

* gdc.dg/debug/dwarf2/pr118309.d: New test.

(cherry picked from commit cee353c2653d274768a67677c8ea37fd23422b3c)

libatomic: Fix up libat_{,un}lock_n for mingw [PR119796]

Here is just a port of the previously posted patch to mingw which
clearly has the same problems.

2025-04-16 Jakub Jelinek <jakub@redhat.com>

PR libgcc/101075
PR libgcc/119796
* config/mingw/lock.c (libat_lock_n, libat_unlock_n): Start with
computing how many locks will be needed and take into account
((uintptr_t)ptr % WATCH_SIZE). If some locks from the end of the
locks array and others from the start of it will be needed, first
lock the ones from the start followed by ones from the end.

(cherry picked from commit 34fe8e90007afbc87941df9b01ffcf8747c11497)

libatomic: Fix up libat_{,un}lock_n [PR119796]

As mentioned in the PR (and I think in PR101075 too), we can run into
deadlock with libat_lock_n calls with larger n.
As mentioned in PR66842, we use multiple locks (normally 64 mutexes
for each 64 byte cache line in 4KiB page) and currently can lock more
than one lock, in particular for n [0, 64] a single lock, for n [65, 128]
2 locks, for n [129, 192] 3 locks etc.
There are two problems with this:
1) we can deadlock if there is some wrap-around, because the locks are
   acquired always in the order from addr_hash (ptr) up to
   locks[NLOCKS-1].mutex and then if needed from locks[0].mutex onwards;
   so if e.g. 2 threads perform libat_lock_n with n = 2048+64, in one
   case at pointer starting at page boundary and in another case at
   page boundary + 2048 bytes, the first thread can lock the first
   32 mutexes, the second thread can lock the last 32 mutexes and
   then the first thread waits for lock 32 held by the second thread and
   the second thread waits for lock 0 held by the first thread;
   fixed below by always locking the locks in order of increasing
   index, if there is a wrap-around, by locking in 2 loops, first
   locking some locks at the start of the array and second at the
   end of it
2) the number of locks seems to be determined solely depending on the
   n value, I think that is wrong, we don't know the structure alignment
   on the libatomic side, it could very well be 1 byte aligned struct,
   and so how many cachelines are actually (partly or fully) occupied
   by the atomic access depends not just on the size, but also on
   ptr % WATCH_SIZE, e.g. 2 byte structure at address page_boundary+63
   should IMHO lock 2 locks because it occupies the first and second
   cacheline

Note, before this patch it locked exactly one lock for n = 0, while
with this patch it could lock either no locks at all (if it is at cacheline
boundary) or 1 (otherwise).
Dunno if libatomic APIs can be called for zero sizes and whether
we actually care that much how many mutexes are locked in that case,
because one can't actually read/write anything into zero sized memory.
If you think it is important, I could add else if (nlocks == 0) nlocks = 1;
in both spots.
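
The lock selection described above can be sketched as follows.  This is a
minimal model, not the libatomic source: plan_locks is a hypothetical helper
that only records which lock indexes would be taken and in which order, and
the constants (64-byte WATCH_SIZE, 64 locks) are assumptions based on the
"64 mutexes for each 64 byte cache line in 4KiB page" description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WATCH_SIZE 64   /* assumed cache line size */
#define NLOCKS 64       /* assumed: one lock per cache line of a 4KiB page */

/* Hypothetical helper: store in ORDER the lock indexes that would be
   acquired for an access of N bytes at ADDR, in acquisition order, and
   return how many there are.  */
static size_t
plan_locks (uintptr_t addr, size_t n, size_t order[NLOCKS])
{
  size_t h = (addr / WATCH_SIZE) % NLOCKS;
  size_t nlocks, cnt = 0;

  /* Never need more than all the locks; this also avoids overflow below.  */
  if (n > (size_t) NLOCKS * WATCH_SIZE)
    n = (size_t) NLOCKS * WATCH_SIZE;
  /* Count touched cache lines, including the offset within the first.  */
  nlocks = (n + addr % WATCH_SIZE + WATCH_SIZE - 1) / WATCH_SIZE;
  if (nlocks > NLOCKS)
    nlocks = NLOCKS;
  if (h + nlocks > NLOCKS)
    {
      /* Wrap-around: take the locks at the start of the array first.  */
      size_t j = h + nlocks - NLOCKS;
      for (size_t i = 0; i < j; i++)
        order[cnt++] = i;
      nlocks -= j;
    }
  for (size_t i = 0; i < nlocks; i++)
    order[cnt++] = h + i;
  assert (cnt <= NLOCKS);
  return cnt;
}
```

In this model a 2 byte object at page_boundary+63 yields indexes 0 and 1,
and n = 2048+64 at page_boundary+2048 yields 0 followed by 32 through 63,
so every thread acquires its locks in increasing index order.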

2025-04-16  Jakub Jelinek  <jakub@redhat.com>

PR libgcc/101075
PR libgcc/119796
* config/posix/lock.c (libat_lock_n, libat_unlock_n): Start with
computing how many locks will be needed and take into account
((uintptr_t)ptr % WATCH_SIZE).  If some locks from the end of the
locks array and others from the start of it will be needed, first
lock the ones from the start followed by ones from the end.

(cherry picked from commit 61dfb0747afcece3b7a690807b83b366ff34f329)

bitintlower: Fix interaction of gimple_assign_copy_p stmts vs. has_single_use [PR119808]

The following testcase is miscompiled, because we emit a CLOBBER in a place
where it shouldn't be emitted.
Before lowering we have:
  b_5 = 0;
  b.0_6 = b_5;
  b.1_1 = (unsigned _BitInt(129)) b.0_6;
...
  <retval> = b_5;
The bitint coalescing assigns the same partition/underlying variable
for both b_5 and b.0_6 (possible because there is a copy assignment)
and of course a different one for b.1_1 (and other SSA_NAMEs in between).
This is -O0 so stmts aren't DCEd and aren't propagated that much etc.
It is -O0 so we also don't try to optimize and omit some names from m_names
and handle multiple stmts at once, so the expansion emits essentially
  bitint.4 = {};
  bitint.4 = bitint.4;
  bitint.2 = cast of bitint.4;
  bitint.4 = CLOBBER;
...
  <retval> = bitint.4;
and the CLOBBER is the problem because bitint.4 is still live afterwards.
We emit the clobbers to improve code generation, but do it only for
(initially) has_single_use SSA_NAMEs (remembered in m_single_use_names)
being used, if they don't have the same partition on the lhs and a few
other conditions.
The problem above is that b.0_6 which is used in the cast has_single_use
and so was in m_single_use_names bitmask and the lhs in that case is
bitint.2, so a different partition.  But there is gimple_assign_copy_p
with SSA_NAME rhs1 and the partitioning special cases those and while
b.0_6 is single use, b_5 has multiple uses.  I believe this ought to be
a problem solely in the case of such copy stmts and its special case
by the partitioning, if instead of b.0_6 = b_5; there would be
b.0_6 = b_5 + 1; or whatever other stmt that performs or may perform
changes on the value, partitioning couldn't assign the same partition
to b.0_6 and b_5 if b_5 is used later, it couldn't have two different
(or potentially different) values in the same bitint.N var.  With
copy that is possible though.

So the following patch fixes it by being more careful when we set
m_single_use_names, don't set it if it is a has_single_use SSA_NAME
but SSA_NAME_DEF_STMT of it is a copy stmt with SSA_NAME rhs1 and that
rhs1 doesn't have single use, or has_single_use but SSA_NAME_DEF_STMT of it
is a copy stmt etc.

Just to make sure it doesn't change code generation too much, I've gathered
statistics how many times
      if (m_first
          && m_single_use_names
          && m_vars[p] != m_lhs
          && m_after_stmt
          && bitmap_bit_p (m_single_use_names, SSA_NAME_VERSION (op)))
        {
          tree clobber = build_clobber (TREE_TYPE (m_vars[p]),
                                        CLOBBER_STORAGE_END);
          g = gimple_build_assign (m_vars[p], clobber);
          gimple_stmt_iterator gsi = gsi_for_stmt (m_after_stmt);
          gsi_insert_after (&gsi, g, GSI_SAME_STMT);
        }
emits a clobber on
make check-gcc GCC_TEST_RUN_EXPENSIVE=1 RUNTESTFLAGS="--target_board=unix\{-m64,-m32\} GCC_TEST_RUN_EXPENSIVE=1 dg.exp='*bitint* pr112673.c builtin-stdc-bit-*.c pr112566-2.c pr112511.c pr116588.c pr116003.c pr113693.c pr113602.c flex-array-counted-by-7.c' dg-torture.exp='*bitint* pr116480-2.c pr114312.c pr114121.c' dfp.exp=*bitint* i386.exp='pr118017.c pr117946.c apx-ndd-x32-2a.c' vect.exp='vect-early-break_99-pr113287.c' tree-ssa.exp=pr113735.c"
and before this patch it was 41010 clobbers and after it is 40968,
so difference is 42 clobbers, 0.1% fewer.

2025-04-16  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/119808
* gimple-lower-bitint.cc (gimple_lower_bitint): Don't set
m_single_use_names bits for SSA_NAMEs which have single use but
their SSA_NAME_DEF_STMT is a copy from another SSA_NAME which doesn't
have a single use, or single use which is such a copy etc.

(cherry picked from commit 5a48e7732d6aa0aaf12b508fa640125e6c4d14b9)

expmed: Always use QImode for init_expmed set_zero_cost [PR119785]

This is a regression on some targets introduced I believe by r6-2055
which added mode argument to set_src_cost.

The problem here is that in the first iteration, mode is always QImode
and we get as -Os zero cost set_src_cost (const0_rtx, QImode, false).
But then we use the mode variable for iterating over int, partial int
and vector int modes, so for the second iteration we call set_src_cost
with mode which is at that time (machine_mode) (MAX_MODE_VECTOR_INT + 1).

In the x86 case that happens to be V2HFmode and we don't crash (and
compute the same 0 cost as we would for QImode).
But e.g. in the SPARC case (machine_mode) (MAX_MODE_VECTOR_INT + 1) is
MAX_MACHINE_MODE and that does all kinds of weird things especially
when doing ubsan bootstrap.

Fixed by always using QImode.
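
The pattern behind the bug can be illustrated with a toy model.  Everything
here is hypothetical (the toy_mode enum and zero_cost_modes helper are not
GCC code); it only shows how a variable used for the zero-cost query and then
reused as a loop iterator holds a one-past-the-end value on the second
iteration.

```c
#include <assert.h>

/* Toy stand-ins for machine modes; the real enum is target specific.  */
enum toy_mode
{
  TOY_QI, TOY_HI, TOY_SI, TOY_V2SI,
  TOY_MAX_VECTOR_INT = TOY_V2SI,
  TOY_AFTER   /* plays the role of MAX_MODE_VECTOR_INT + 1 */
};

/* Record which mode each of the two speed iterations would pass to the
   zero-cost query.  FIXED selects the patched behaviour.  */
static void
zero_cost_modes (int fixed, enum toy_mode used[2])
{
  enum toy_mode mode = TOY_QI;

  for (int speed = 0; speed < 2; speed++)
    {
      used[speed] = fixed ? TOY_QI : mode;
      /* The same variable is then reused as the loop iterator.  */
      for (mode = TOY_QI; mode <= TOY_MAX_VECTOR_INT;
           mode = (enum toy_mode) (mode + 1))
        continue;
      assert (mode == TOY_AFTER);
    }
}
```

Without the fix the second query sees TOY_AFTER, which is whatever mode
happens to follow the last vector integer mode on the given target.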

2025-04-14 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/119785
* expmed.cc (init_expmed): Always pass QImode rather than mode to
set_src_cost passed to set_zero_cost.

(cherry picked from commit f96a54350afcf7f3c90d0ecb51d7683d826acc00)

driver: On linux hosts disable ASLR during -freport-bug [PR119727]

Andi had a useful comment that even with the PR119727 workaround to
ignore differences in libbacktrace printed addresses, it is still better
to turn off ASLR when easily possible, e.g. in case some address leaks
into the ICE message somewhere else, or to verify the ICE doesn't
depend on a particular library/binary load addresses.

The following patch adds a configure check and uses personality syscall
to turn off randomization for further -freport-bug subprocesses.
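
A minimal Linux-only sketch of that call sequence, assuming
<sys/personality.h> is available (which is what the configure check is
for).  The disable_aslr_for_children name is hypothetical, and the error
handling is an illustrative choice, not taken from gcc.cc.

```c
#include <assert.h>
#include <sys/personality.h>

/* Hypothetical helper: ask the kernel to disable address space
   randomization for programs executed later by this process.
   Returns 0 on success, -1 if the kernel or a sandbox refuses.  */
static int
disable_aslr_for_children (void)
{
  /* 0xffffffff queries the current persona without changing it.  */
  int cur = personality (0xffffffffU);

  if (cur == -1)
    return -1;
  if (personality ((unsigned long) cur | ADDR_NO_RANDOMIZE) == -1)
    return -1;
  /* The flag is inherited across fork and exec.  */
  assert ((personality (0xffffffffU) & ADDR_NO_RANDOMIZE) != 0);
  return 0;
}
```

Some container seccomp profiles reject this persona value, so a caller
should treat failure as non-fatal and simply continue with ASLR enabled.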

2025-04-14 Jakub Jelinek <jakub@redhat.com>

PR driver/119727
* configure.ac (HOST_HAS_PERSONALITY_ADDR_NO_RANDOMIZE): New check.
* gcc.cc: Include sys/personality.h if
HOST_HAS_PERSONALITY_ADDR_NO_RANDOMIZE is defined.
(try_generate_repro): Call
personality (personality (0xffffffffU) | ADDR_NO_RANDOMIZE)
if HOST_HAS_PERSONALITY_ADDR_NO_RANDOMIZE is defined.
* config.in: Regenerate.
* configure: Regenerate.

(cherry picked from commit 5a32e85810d33dc46b1b5fe2803ee787d40709d5)

driver: Fix up -freport-bug for ASLR [PR119727]

With --enable-host-pie -freport-bug almost never prepares preprocessed
source and instead emits
The bug is not reproducible, so it is likely a hardware or OS problem.
message even for bugs which are 100% reproducible.
The way -freport-bug works is that it reruns it 3 times, capturing stdout
and stderr from each and then tries to compare the outputs in between
different runs.
The libbacktrace emitted hexadecimal addresses at the start of the lines
can differ between runs due to ASLR, either of the PIE executable, or
even if not PIE if there is some frame with e.g. libc function (say
crash in strlen/memcpy etc.).

The following patch fixes it by ignoring such differences at the start of
the lines.
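
The comparison can be sketched per line as below.  The helper names are
hypothetical and the "0x" prefix is an assumption about the libbacktrace
output format; the real files_equal_p works on whole files read with
fgets.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: if LINE starts with "0x", lowercase hex digits
   and a space, return the text after that prefix, otherwise LINE.  */
static const char *
skip_leading_address (const char *line)
{
  const char *p = line, *digits;

  if (p[0] != '0' || p[1] != 'x')
    return line;
  p += 2;
  digits = p;
  while ((*p >= '0' && *p <= '9') || (*p >= 'a' && *p <= 'f'))
    p++;
  if (p == digits || *p != ' ')
    return line;
  return p + 1;
}

/* Return nonzero if A and B match once leading addresses are ignored.  */
static int
lines_equal_p (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);
  return strcmp (skip_leading_address (a), skip_leading_address (b)) == 0;
}
```

With this, "0x7f12ab34 strlen" and "0x55aa0010 strlen" compare equal,
while lines that differ after the address still compare unequal.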

2025-04-12 Jakub Jelinek <jakub@redhat.com>

PR driver/119727
* gcc.cc (files_equal_p): Rewritten using fopen/fgets/fclose instead
of open/fstat/read/close. At the start of lines, ignore lowercase
hexadecimal addresses followed by space.

(cherry picked from commit 8b2ceb421f045ee8b39d7941f39f1e9a67217583)

bitintlower: Fix up handling of SSA_NAME copies in coalescing [PR119722]

The following testcase is miscompiled, because during the limited
SSA name coalescing the bitintlower pass does, we incorrectly don't
register a conflict.
This is on
  <bb 4> [local count: 1073741824]:
  # b_17 = PHI <b_19(3), 8(2)>
  g.4_13 = g;
  _14 = g.4_13 >> 50;
  _15 = (unsigned int) _14;
  _21 = b_17;
  _16 = (unsigned int) _21;
  s_22 = _15 + _16;
  return s_22;
basic block where in the map->bitint bitmap we track 14, 17 and 19.
The build_bitint_stmt_ssa_conflicts "hook" has special code where
it tracks uses at the final statements of mergeable operations, so
e.g. the
  _16 = (unsigned int) _21;
statement is considered to be use of b_17 because _21 is not in
map->bitint (or large_huge.m_names), i.e. is mergeable.
The problem is that build_ssa_conflict_graph has special code to handle
SSA_NAME copies and _21 = b_17; is gimple_assign_copy_p.  In such cases
it calls live_track_clear_var on the rhs1.  The problem is that
on the above bb, after we note in the _16 = (unsigned int) _21;
stmt we need b_17 the generic code makes us forget that because
of the copy statement, and then build_bitint_stmt_ssa_conflicts
ignores it completely (because _21 is large/huge bitint and is
not in map->bitint, so assumed to be handled by a later stmt in the
bb, for backwards walk like this before this one).
As the b_17 use is ignored, the coalescing thinks it can put
all of b_17, b_19 and _14 into the same partition, which is wrong,
while we can and should coalesce b_17 and b_19, _14 needs to be a different
temporary because b_17 is set before and used after _14 has been written.

The following patch fixes it by handling gimple_assign_copy_p in two
separate spots, move the generic coalesce handling of it after
build_ssa_conflict_graph (where build_ssa_conflict_graph handling
doesn't fall through to that, it does continue after the call) and
inside of build_ssa_conflict_graph it performs it too, but only if
the lhs is not mergeable large/huge bitint.

2025-04-12  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/119722
* gimple-lower-bitint.h (build_bitint_stmt_ssa_conflicts): Add
CLEAR argument.
* gimple-lower-bitint.cc (build_bitint_stmt_ssa_conflicts): Add
CLEAR argument.  Call clear on gimple_assign_copy_p rhs1 if lhs
is large/huge bitint unless lhs is not in names.
* tree-ssa-coalesce.cc (build_ssa_conflict_graph): Adjust
build_bitint_stmt_ssa_conflicts caller.  Move gimple_assign_copy_p
handling to after the build_bitint_stmt_ssa_conflicts call.

(cherry picked from commit 3f9dfb94eab1ab1bbf9a2b5e20d1f61e36516063)

bitintlower: Fix up handling of nested casts in m_upward_2limbs cases [PR119707]

The following testcase is miscompiled I believe starting with
PR112941 r14-6742.  That commit fixed the bitint-55.c testcase.
The m_first initialization for such conversion initializes 2 SSA_NAMEs,
one is PHI result on the loop (m_data[save_data_cnt]) and the other
(m_data[save_data_cnt+1]) is the argument of that PHI from the latch
edge initialized somewhere in the loop.  Both of these are used to
propagate sign extension (i.e. either 0 or all ones limb) from the
iteration with the sign bit of a narrower type to following iterations.
The bitint-55.c testcase was ICEing with invalid SSA forms as it was
using unconditionally the PHI argument SSA_NAME even in places which
weren't dominated by that.  And the code which was touched is about
handling constant idx, so if e.g. there are nested casts and the
outer one does conditional code based on index comparison with
a particular constant index.
In the following testcase there are 2 nested casts, one from signed
_BitInt(129) to unsigned _BitInt(255) and the outer from unsigned
_BitInt(255) to unsigned _BitInt(256).  The m_upward_2limbs case which
is used for handling mergeable arithmetics (like +-|&^ and casts etc.)
one loop iteration handles 2 limbs, the first half the even ones, the
second half the odd ones.
And for these 2 conversions, the special one for the inner conversion
on x86_64 is with index 2 where the sign bit of _BitInt(129) is present,
while for the outer one index 3 where we need to mask off the most
significant bit.
The r14-6742 change started using m_data[save_data_cnt] for all constant
indexes if it is still inside of the loop (and it is sign extension).
But that doesn't work correctly for the case where the inner conversion
produces the sign extension limb in the loop for an even index and
the outer conversion needs to special case the immediately next conversion,
because in that case using the PHI result will see still 0 there rather
than the updated value from the handling of previous limb.
So the following patch special cases this and uses the other SSA_NAME.

Commented IL, trying to lower
  _1 = (unsigned _BitInt(255)) y_4(D);
  _2 = (unsigned _BitInt(256)) _1;
  _3 = _2 + x_5(D);
  <retval> = _3;
we were emitting
  <bb 3> [local count: 1073741824]:
  # _8 = PHI <0(2), _9(12)>     // This is the limb index
  # _10 = PHI <0(2), _11(12)>   // Sign extension limb from inner cast (0 or ~0UL)
  # _22 = PHI <0(2), _23(12)>   // Overflow bit from addition of previous limb
  if (_8 <= 2)
    goto <bb 4>; [80.00%]
  else
    goto <bb 7>; [20.00%]

  <bb 4> [local count: 1073741824]:
  if (_8 == 2)
    goto <bb 6>; [20.00%]
  else
    goto <bb 5>; [80.00%]

  <bb 5> [local count: 1073741824]:
  _12 = VIEW_CONVERT_EXPR<unsigned long[3]>(y)[_8];     // Full limbs in y
  goto <bb 7>; [100.00%]

  <bb 6> [local count: 214748360]:
  _13 = MEM <unsigned long> [(_BitInt(129) *)&y + 16B]; // y[2] which
  _14 = (<unnamed-signed:1>) _13;                       // needs to be
  _15 = (unsigned long) _14;                            // sign extended
  _16 = (signed long) _15;                              // to full
  _17 = _16 >> 63;                                      // limb
  _18 = (unsigned long) _17;

  <bb 7> [local count: 1073741824]:
  # _19 = PHI <_12(5), _10(3), _15(6)>  // Limb to add for result of casts
  # _20 = PHI <0(5), _10(3), _18(6)>    // Sign extension limb from previous limb
  _11 = _20;                            // PHI _10 argument above
  _21 = VIEW_CONVERT_EXPR<unsigned long[4]>(x)[_8];
  _24 = .UADDC (_19, _21, _22);
  _25 = IMAGPART_EXPR <_24>;
  _26 = REALPART_EXPR <_24>;
  VIEW_CONVERT_EXPR<unsigned long[4]>(<retval>)[_8] = _26;
  _27 = _8 + 1;
  if (_27 == 3)                 // For the outer cast limb 3 is special
    goto <bb 11>; [20.00%]
  else
    goto <bb 8>; [80.00%]

  <bb 8> [local count: 1073741824]:
  if (_27 < 2)
    goto <bb 9>; [80.00%]
  else
    goto <bb 10>; [20.00%]

  <bb 9> [local count: 1073741824]:
  _28 = VIEW_CONVERT_EXPR<unsigned long[3]>(y)[_27];    // These are used in full

  <bb 10> [local count: 1073741824]:
  # _29 = PHI <_28(9), _11(8)>
  goto <bb 12>; [100.00%]

  <bb 11> [local count: 214748360]:
// And HERE is the actual bug.  Using _10 for idx 3 will mean it is always
// zero there and doesn't contain the _18 value propagated to it.
// It should be
// _30 = (<unnamed-unsigned:63>) _11;
// Now if the outer conversion had special iteration say 5, we could
// have used _10 fine here, by that time it already propagates through
// the PHI.
  _30 = (<unnamed-unsigned:63>) _10;
  _31 = (unsigned long) _30;

  <bb 12> [local count: 1073741824]:
  # _32 = PHI <_29(10), _31(11)>
  _33 = VIEW_CONVERT_EXPR<unsigned long[4]>(x)[_27];
  _34 = .UADDC (_32, _33, _25);
  _23 = IMAGPART_EXPR <_34>;
  _35 = REALPART_EXPR <_34>;
  VIEW_CONVERT_EXPR<unsigned long[4]>(<retval>)[_27] = _35;
  _9 = _8 + 2;
  if (_9 != 4)
    goto <bb 3>; [0.05%]
  else
    goto <bb 13>; [99.95%]

2025-04-11  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/119707
* gimple-lower-bitint.cc (bitint_large_huge::handle_cast): Only use
m_data[save_data_cnt] instead of m_data[save_data_cnt + 1] if
idx is odd and equal to low + 1.  Remember tree_to_uhwi (idx) in
a temporary instead of calling the function multiple times.

* gcc.dg/torture/bitint-76.c: New test.

(cherry picked from commit b57d7ef4bdda8f939d804bfe40123cb9e4b447b3)

libquadmath: Fix up THREEp96 constant in expq

Here is a cherry-pick from glibc [BZ #32411] fix.

As mentioned by the reporter in a pull request against gcc-mirror,
the THREEp96 constant in e_expl.c is incorrect, it is actually 0x3.p+94f128
rather than 0x3.p+96f128.

The algorithm uses that to compute the t2 integer (tval2), by whose
delta it adjusts the x+xl pair and then in the result uses the precomputed
exp value for that entry.
Using 0x3.p+94f128 rather than 0x3.p+96f128 results in tval2 sometimes
being one smaller, sometimes one larger than the desired value, thus can mean
the x+xl pair after adjustment will be larger in absolute value than it
should be.
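
The effect of a too-small constant can be shown with a double precision
analogue.  This is only a sketch and assumes the constant is used in the
usual add-then-subtract rounding idiom followed by scaling to an integer;
index_from, GOOD_MAGIC, BAD_MAGIC and the 2^15 scale are illustrative
choices, not the expq code.

```c
#include <assert.h>

/* For double, 0x3p+36 has an ulp of 2^-15, so adding and subtracting it
   rounds to a multiple of 2^-15.  0x3p+34 only rounds to 2^-17.  */
static const double GOOD_MAGIC = 0x3p+36;
static const double BAD_MAGIC = 0x3p+34;

/* Round X using MAGIC, then scale to an integer index.  The volatile
   keeps the compiler from folding the add and subtract away.  */
static int
index_from (double x, double magic)
{
  volatile double t;

  assert (magic > 0.0);
  t = x + magic;
  t -= magic;
  return (int) (t * 32768.0);
}
```

For x = 0x3p-17 (three quarters of 2^-15) the correct constant rounds up
and gives index 1, while the too-small one leaves x unrounded and the
truncating conversion gives index 0, i.e. one smaller than desired.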

DesWurstes created a test program for this
https://github.com/DesWurstes/comparefloats
and his results were
total: 1135000000 not_equal: 4322 earlier_score: 674 later_score: 3648
I've modified this with
https://sourceware.org/bugzilla/show_bug.cgi?id=32411#c3
so that it actually tests pseudo-random _Float128 values with range
(-16384.,16384) with strong bias on values larger than 0.0002 in absolute
value (so that tval1/tval2 aren't zero most of the time) and that gave
total: 10000000000 not_equal: 29861 earlier_score: 4606 later_score: 25255
So, in both cases, in most cases the change doesn't result in any differences,
and in those rare cases where it does, about 85% have smaller ulp than without
the patch.
Additionally I've tried
https://sourceware.org/bugzilla/show_bug.cgi?id=32411#c4
and in 2 billion iterations it didn't find any case where x+xl after the
adjustments without this change would be smaller in absolute value compared
to x+xl after the adjustments with this change.

2025-04-09 Jakub Jelinek <jakub@redhat.com>

* math/expq.c (C): Fix up THREEp96 constant.

(cherry picked from commit e081ced345c45581a4891361c08e50e07720239e)

lto: lto-opts fixes [PR119625]

I can reproduce a really weird error in our distro i686 trunk gcc
(but haven't managed to reproduce it with vanilla trunk yet).
echo 'void foo (void) {}' > a.c; gcc -O2 -flto=auto -m32 -march=i686 -ffat-lto-objects -fhardened -o a.o -c a.c; gcc -O2 -flto=auto -m32 -march=i686 -r -o a.lo a.o
lto1: fatal error: open  failed: No such file or directory
compilation terminated.
lto-wrapper: fatal error: gcc returned 1 exit status
The error is because
cat ./a.lo.lto.o-args.0
""
a.o
My suspicion is that this "" in there is caused by weird .gnu.lto_.opts
section content during
gcc -O2 -flto=auto -m32 -march=i686 -ffat-lto-objects -fhardened -S -o a.s -c a.c
compilation (and I can reproduce that one with vanilla trunk).
The above results in
        .section        .gnu.lto_.opts,"e",@progbits
        .string "'-fno-openmp' '-fno-openacc' '-fPIC' '' '-m32' '-march=i686' '-O2' '-flto=auto' '-ffat-lto-objects'"
There are two weird things, one (IMHO the cause of the "" later on) is
the '' part, I think it comes from lto_write_options doing
append_to_collect_gcc_options (&temporary_obstack, &first_p, "");
IMHO it shouldn't call append_to_collect_gcc_options at all for that case.

The -fhardened option causes global_options.x_flag_cf_protection
to be set to CF_FULL and later on the backend option processing
sets it to CF_FULL | CF_SET (i.e. 7, a value not handled in
lto_write_options).

The following patch fixes it by not emitting anything there if
flag_cf_protection is one of the unhandled values.

Perhaps it could incrementally use
switch (global_options.x_flag_cf_protection & ~CF_SET)
instead, dunno.
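
A sketch of the "emit nothing for unhandled values" approach.  The toy_cf
enum values are assumptions chosen so that CF_FULL | CF_SET is 7 as stated
above, and cf_protection_option is a hypothetical helper, not the
lto_write_options code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed encoding of the flag_cf_protection values.  */
enum toy_cf
{
  TOY_CF_NONE = 0, TOY_CF_BRANCH = 1, TOY_CF_RETURN = 2,
  TOY_CF_FULL = 3, TOY_CF_SET = 4
};

/* Return the option to record for FLAG, or NULL if FLAG is not one of
   the handled values, in which case nothing should be appended.  */
static const char *
cf_protection_option (int flag)
{
  assert (flag >= 0);
  switch (flag)
    {
    case TOY_CF_NONE:
      return "-fcf-protection=none";
    case TOY_CF_FULL:
      return "-fcf-protection=full";
    case TOY_CF_BRANCH:
      return "-fcf-protection=branch";
    case TOY_CF_RETURN:
      return "-fcf-protection=return";
    default:
      return NULL;
    }
}
```

Calling it as cf_protection_option (flag & ~TOY_CF_SET) would correspond to
the incremental alternative mentioned above.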

And the other problem is that the -fPIC in there is really weird.
Our distro compiler or vanilla configured trunk certainly doesn't
default to -fPIC and -fhardened uses -fPIE when
-fPIC/-fpic/-fno-pie/-fno-pic is not specified, so I was expecting
-fPIE in there.
The thing is that the -fpie option causes setting of both
global_options.x_flag_pi{c,e} to 1, -fPIE both to 2:
      /* If -fPIE or -fpie is used, turn on PIC.  */
      if (opts->x_flag_pie)
        opts->x_flag_pic = opts->x_flag_pie;
      else if (opts->x_flag_pic == -1)
        opts->x_flag_pic = 0;
      if (opts->x_flag_pic && !opts->x_flag_pie)
        opts->x_flag_shlib = 1;
so checking first for flag_pic == 2 and then flag_pic == 1
and only afterwards for flag_pie means we never print
-fPIE/-fpie.
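
The ordering problem can be sketched like this.  Both helpers are
hypothetical and return NULL when neither flag is set; they only model the
order of the checks, not what lto_write_options emits in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Old order: flag_pic is tested first, so -fPIE/-fpie is unreachable
   whenever flag_pie also implies flag_pic.  */
static const char *
pic_option_old_order (int flag_pic, int flag_pie)
{
  if (flag_pic == 2)
    return "-fPIC";
  if (flag_pic == 1)
    return "-fpic";
  if (flag_pie == 2)
    return "-fPIE";
  if (flag_pie == 1)
    return "-fpie";
  return NULL;
}

/* New order: test flag_pie first, then fall back to flag_pic.  */
static const char *
pic_option_new_order (int flag_pic, int flag_pie)
{
  /* Invariant from the option handling quoted above.  */
  assert (flag_pie == 0 || flag_pic == flag_pie);
  if (flag_pie == 2)
    return "-fPIE";
  if (flag_pie == 1)
    return "-fpie";
  if (flag_pic == 2)
    return "-fPIC";
  if (flag_pic == 1)
    return "-fpic";
  return NULL;
}
```

With flag_pic = flag_pie = 2, as set by -fPIE, the old order reports -fPIC
and the new order reports -fPIE.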

Or do you want something further (like
switch (global_options.x_flag_cf_protection & ~CF_SET)
)?

2025-04-04  Jakub Jelinek  <jakub@redhat.com>

PR lto/119625
* lto-opts.cc (lto_write_options): If neither flag_pic nor
flag_pie are set, check first for flag_pie and only later
for flag_pic rather than the other way around, use a temporary
variable.  If flag_cf_protection is not set, don't append anything
if flag_cf_protection is none of CF_{NONE,FULL,BRANCH,RETURN} and
use a temporary variable.

(cherry picked from commit d25728c98682c058bfda79333c94b0a8cf2a3f49)

c: Fix ICEs with -fsanitize=pointer-{subtract,compare} [PR119582]

The following testcase ICEs because c_fully_fold isn't performed on the
arguments of __sanitizer_ptr_{sub,cmp} builtins and so e.g.
C_MAYBE_CONST_EXPR can leak into the gimplifier where it ICEs.

2025-04-02 Jakub Jelinek <jakub@redhat.com>

PR c/119582
* c-typeck.cc (pointer_diff, build_binary_op): Call c_fully_fold on
__sanitizer_ptr_sub or __sanitizer_ptr_cmp arguments.

* gcc.dg/asan/pr119582.c: New test.

(cherry picked from commit 29bc904cb827615ed9f36bc3742ccc4ac77515ec)

combine: Use reg_used_between_p rather than modified_between_p in two spots [PR119291]

The following testcase is miscompiled on x86_64-linux at -O2 by the combiner.
We have from earlier combinations
(insn 22 21 23 4 (set (reg:SI 104 [ _7 ])
        (const_int 0 [0])) "pr119291.c":25:15 96 {*movsi_internal}
     (nil))
(insn 23 22 24 4 (set (reg/v:SI 117 [ e ])
        (reg/v:SI 116 [ e ])) 96 {*movsi_internal}
     (expr_list:REG_DEAD (reg/v:SI 116 [ e ])
        (nil)))
(note 24 23 25 4 NOTE_INSN_DELETED)
(insn 25 24 26 4 (parallel [
            (set (reg:CCZ 17 flags)
                (compare:CCZ (neg:SI (reg:SI 104 [ _7 ]))
                    (const_int 0 [0])))
            (set (reg/v:SI 116 [ e ])
                (neg:SI (reg:SI 104 [ _7 ])))
        ]) "pr119291.c":26:13 977 {*negsi_2}
     (expr_list:REG_DEAD (reg:SI 104 [ _7 ])
        (nil)))
(note 26 25 27 4 NOTE_INSN_DELETED)
(insn 27 26 28 4 (set (reg:DI 128 [ _9 ])
        (ne:DI (reg:CCZ 17 flags)
            (const_int 0 [0]))) "pr119291.c":26:13 1447 {*setcc_di_1}
     (expr_list:REG_DEAD (reg:CCZ 17 flags)
        (nil)))
and try_combine is called on i3 25 and i2 22 (second time)
and reach the hunk being patched with simplified i3
(insn 25 24 26 4 (parallel [
            (set (pc)
                (pc))
            (set (reg/v:SI 116 [ e ])
                (const_int 0 [0]))
        ]) "pr119291.c":28:13 977 {*negsi_2}
     (expr_list:REG_DEAD (reg:SI 104 [ _7 ])
        (nil)))
and
(insn 22 21 23 4 (set (reg:SI 104 [ _7 ])
        (const_int 0 [0])) "pr119291.c":27:15 96 {*movsi_internal}
     (nil))
Now, the try_combine code there attempts to split two independent
sets in newpat by moving one of them to i2.
And among other tests it checks
!modified_between_p (SET_DEST (set1), i2, i3)
which is certainly needed, if there would be say
(set (reg/v:SI 116 [ e ]) (const_int 42 [0x2a]))
in between i2 and i3, we couldn't do that, as that set would overwrite
the value set by set1 we want to move to the i2 position.
But in this case pseudo 116 isn't set in between i2 and i3, but used
(and additionally there is a REG_DEAD note for it).

This is equally bad for the move, because while the i3 insn
and later will see the pseudo value that we set, the insn in between
which uses the value will see a different value from the one that
it should see.

As we don't check for that, in the end try_combine succeeds and
changes the IL to:
(insn 22 21 23 4 (set (reg/v:SI 116 [ e ])
        (const_int 0 [0])) "pr119291.c":27:15 96 {*movsi_internal}
     (nil))
(insn 23 22 24 4 (set (reg/v:SI 117 [ e ])
        (reg/v:SI 116 [ e ])) 96 {*movsi_internal}
     (expr_list:REG_DEAD (reg/v:SI 116 [ e ])
        (nil)))
(note 24 23 25 4 NOTE_INSN_DELETED)
(insn 25 24 26 4 (set (pc)
        (pc)) "pr119291.c":28:13 2147483647 {NOOP_MOVE}
     (nil))
(note 26 25 27 4 NOTE_INSN_DELETED)
(insn 27 26 28 4 (set (reg:DI 128 [ _9 ])
        (const_int 0 [0])) "pr119291.c":28:13 95 {*movdi_internal}
     (nil))
(note, the i3 got turned into a nop and try_combine also modified insn 27).

The following patch replaces the modified_between_p
tests with reg_used_between_p, my understanding is that
modified_between_p is a subset of reg_used_between_p, so one
doesn't need both.

Looking at this some more today, I think we should special case
set_noop_p because that can be put into i2 (except for the JUMP_P
violations), currently both modified_between_p (pc_rtx, i2, i3)
and reg_used_between_p (pc_rtx, i2, i3) return false.
I'll post a patch incrementally for that (but that feels like
new optimization, so probably not something that should be backported).

On Tue, Apr 01, 2025 at 11:27:25AM +0200, Richard Biener wrote:
> Can we constrain SET_DEST (set1/set0) to a REG_P in combine?  Why
> does the comment talk about memory?

I was worried about making too risky changes this late in stage4
(and especially also for backports).  Most of this code is 1992-ish.
I think many of the functions are just misnamed, the reg_ in there doesn't
match what those functions do (bet they initially supported just REGs
and later on support for other kinds of expressions was added, but haven't
done git archeology to prove that).

What we know for sure is:
           && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != ZERO_EXTRACT
           && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != STRICT_LOW_PART
           && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT
           && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != STRICT_LOW_PART
that is checked earlier in the condition.
Then it calls
           && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)),
                                  XVECEXP (newpat, 0, 0))
           && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)),
                                  XVECEXP (newpat, 0, 1))
While it has reg_* in it, that function mostly calls reg_overlap_mentioned_p
which is also misnamed, that function handles just fine all of
REG, MEM, SUBREG of REG, (SUBREG of MEM not, see below), ZERO_EXTRACT,
STRICT_LOW_PART, PC and even some further cases.
So, IMHO SET_DEST (set0) or SET_DEST (set1) can certainly be a REG, SUBREG
of REG, PC (at least the REG and PC cases are triggered on the testcase)
and quite possibly also MEM (SUBREG of MEM not, see below).

Now, the code uses !modified_between_p (SET_SRC (set{1,0}), i2, i3) where that
function for constants just returns false, for PC returns true, for REG
returns reg_set_between_p, for MEM recurses on the address, for
MEM_READONLY_P otherwise returns false, otherwise checks using alias.cc code
whether the memory could have been modified in between, for all other
rtxes recurses on the subrtxes.  This part didn't change in my patch.

I've only changed those
-         && !modified_between_p (SET_DEST (set{1,0}), i2, i3)
+         && !reg_used_between_p (SET_DEST (set{1,0}), i2, i3)
where the former has been described above and clearly handles all of
REG, SUBREG of REG, PC, MEM and SUBREG of MEM among other things.

The replacement reg_used_between_p calls reg_overlap_mentioned_p on each
instruction in between i2 and i3.  So, there is clearly a difference
in behavior if SET_DEST (set{1,0}) is pc_rtx, in that case modified_between_p
returns unconditionally true even if there are no instructions in between,
but reg_used_between_p if there are no non-debug insns in between returns
false.  Sorry for missing that, guess I should check for that (with the
exception of the noop moves which are often (set (pc) (pc)) and handled
by the incremental patch).  In fact not just that, reg_used_between_p
will only return true for PC if it is mentioned anywhere in the insns
in between.
Anyway, except for that, for REG it calls refers_to_regno_p
and so should find any occurrences of any of the REG or parts of it for hard
registers, for MEM returns true if it sees any MEMs in insns in between
(conservatively), for SUBREGs apparently it relies on it being SUBREG of REG
(so doesn't handle SUBREG of MEM) and handles SUBREG of REG like the
SUBREG_REG, PC I've already described.
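
The distinction between "modified in between" and "used in between" can be
sketched with a toy model.  This is not GCC's RTL: ToyInsn and the toy_*
helpers are hypothetical names invented for illustration, "modified" is
modelled as "written" and "used" as "read".  The PC special cases discussed
above are not modelled.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy stand-in for an instruction: the registers it reads and writes.
// Hypothetical type, not GCC's rtx_insn.
struct ToyInsn {
  std::set<int> reads;
  std::set<int> writes;
};

// True if any instruction in BETWEEN writes REG.
bool toy_modified_between(int reg, const std::vector<ToyInsn> &between) {
  for (const ToyInsn &insn : between)
    if (insn.writes.count(reg))
      return true;
  return false;
}

// True if any instruction in BETWEEN reads REG.
// With no instructions in between this is trivially false.
bool toy_used_between(int reg, const std::vector<ToyInsn> &between) {
  for (const ToyInsn &insn : between)
    if (insn.reads.count(reg))
      return true;
  return false;
}

// In this model, moving "set REG" from the later position to the earlier
// one is only safe if REG is neither written nor read in between.
bool toy_can_hoist_set(int reg, const std::vector<ToyInsn> &between) {
  return !toy_modified_between(reg, between)
         && !toy_used_between(reg, between);
}
```

In the model, an intervening instruction that only reads the register
passes the "modified" test but fails the "used" test, so the set must not
be moved above it.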

Now, because reg_overlap_mentioned_p doesn't handle SUBREG of MEM, I think
already the initial
           && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)),
                                  XVECEXP (newpat, 0, 0))
           && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)),
                                  XVECEXP (newpat, 0, 1))
calls would have failed --enable-checking=rtl or would have misbehaved, so
I think there is no need to check for it further.

To your question why I don't use reg_referenced_p, that is because
reg_referenced_p is something to call on one insn pattern, while
reg_used_between_p is pretty much that on all insns in between two
instructions (excluding the boundaries).

So, I think it would be safer to add && SET_DEST (set{1,0}) != pc_rtx
checks to preserve former behavior, like in the following version.

2025-04-01  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/119291
* combine.cc (try_combine): For splitting of PARALLEL with
2 independent SETs into i2 and i3 sets check reg_used_between_p
of the SET_DESTs rather than just modified_between_p.

* gcc.c-torture/execute/pr119291.c: New test.

(cherry picked from commit 19ba913517b5e2a001fa9c0f060a1ac74430c027)

LoongArch: Change {dg-do-what-default} save and restore logic.

Setting {dg-do-what-default} to 'run' may cause some tests to hang
during make check.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/loongarch-vector.exp: Change
{dg-do-what-default} save and restore logic.

(cherry picked from commit dd982198656d914a4958bf86356a4c996c728b9d)

Daily bump.

libstdc++: Correct preprocessing checks for floatX_t and bfloat_16 formatting

Floating-point types _Float16, _Float32, _Float64, and bfloat16
can be formatted only if std::to_chars overloads for such types
are provided. Currently this is only the case for architectures
where float and double are 32-bit and 64-bit IEEE floating-point types.

This patch updates the preprocessing checks for the formatters
for the above types to check _GLIBCXX_FLOAT_IS_IEEE_BINARY32
and _GLIBCXX_DOUBLE_IS_IEEE_BINARY64, making them non-formattable
on non-IEEE architectures.

This also removes potential UB, where we could produce a basic_format_arg
with _M_type set to _Arg_fp32 or _Arg_fp64 that was later not
handled by `_M_visit`.

libstdc++-v3/ChangeLog:

* include/std/format (formatter<_Float16, _CharT>): Define only if
_GLIBCXX_FLOAT_IS_IEEE_BINARY32 macro is defined.
(formatter<_Float32, _CharT>): As above.
(formatter<__gnu_cxx::__bfloat16_t, _CharT>): As above.
(formatter<_Float64, _CharT>): Define only if
_GLIBCXX_DOUBLE_IS_IEEE_BINARY64 is defined.
(basic_format_arg::_S_to_arg_type): Normalize _Float32 and _Float64
only to float and double respectively.
(basic_format_arg::_S_to_enum): Remove handling of _Float32 and _Float64.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit 445128c12cf22081223f7385196ee3889ef4c4b2)

i386: Enable -mnop-mcount for -fpic with PLTs [PR119386]

-mnop-mcount can be trivially enabled for -fPIC codegen as long as PLTs
are being used, given that the instruction encodings are identical, only
the target may resolve differently depending on how the linker decides
to incorporate the object file.

So relax the option check, and add a test to ensure that 5-byte NOPs are
emitted when -mnop-mcount is being used.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
gcc/ChangeLog:

PR target/119386
* config/i386/i386-options.cc: Permit -mnop-mcount when
using -fpic with PLTs.

gcc/testsuite/ChangeLog:

PR target/119386
* gcc.target/i386/pr119386-3.c: New test.

(cherry picked from commit 6b4569a3ebdd0df44d87d67a18272ec0b878f2ee)

i386: Prefer PLT indirection for __fentry__ calls under -fPIC [PR119386]

Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling
__fentry__") updated the logic that emits mcount() / __fentry__() calls
into function prologues when profiling is enabled, to avoid GOT-based
indirect calls when a direct call would suffice.

There are two problems with that change:
- it relies on -mdirect-extern-access rather than -fno-plt to decide
  whether or not a direct [PLT based] call is appropriate;
- for the PLT case, it falls through to x86_print_call_or_nop(), which
  does not emit the @PLT suffix, resulting in the wrong relocation being
  used (R_X86_64_PC32 instead of R_X86_64_PLT32).

Fix this by testing flag_plt instead of ix86_direct_extern_access, and
updating x86_print_call_or_nop() to take flag_pic and flag_plt into
account. This also ensures that -mnop-mcount works as expected when
emitting the PLT based profiling calls.

While at it, fix the 32-bit logic as well, and issue a PLT call unless
PLTs are explicitly disabled.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119386

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
gcc/ChangeLog:

PR target/119386
* config/i386/i386.cc (x86_print_call_or_nop): Add @PLT suffix
where appropriate.
(x86_function_profiler): Fall through to x86_print_call_or_nop()
for PIC codegen when flag_plt is set.

gcc/testsuite/ChangeLog:

PR target/119386
* gcc.target/i386/pr119386-1.c: New test.
* gcc.target/i386/pr119386-2.c: New test.

(cherry picked from commit 9b0ae0a8d70603960f3c578d261efd18c02b803f)

Daily bump.

Fix wrong optimization of conditional expression with enumeration type

This is a regression introduced on the mainline and 14 branch by:
https://gcc.gnu.org/pipermail/gcc-cvs/2023-October/391658.html

The change bypasses int_fits_type_p (essentially) to work around the
signedness constraints, but in doing so disregards the peculiarities
of boolean types whose precision is not 1 dealt with by the predicate,
leading to the creation of a problematic conversion here.

Fixed by special-casing boolean types whose precision is not 1, as done
in several other places.

gcc/
* tree-ssa-phiopt.cc (factor_out_conditional_operation): Do not
bypass the int_fits_type_p test for boolean types whose precision
is not 1.

gcc/testsuite/
* gnat.dg/opt105.adb: New test.
* gnat.dg/opt105_pkg.ads, gnat.dg/opt105_pkg.adb: New helper.

aarch64: Disable sysreg feature gating

This applies to the sysreg read/write intrinsics __arm_[wr]sr*. It does
not depend on changes to Binutils, because GCC converts recognised
sysreg names to an encoding based form, which is already ungated in Binutils.

We have, however, agreed to make an equivalent change in Binutils (which
would then disable feature gating for sysreg accesses in inline
assembly), but this has not yet been posted upstream.

In the future we may introduce a new flag to re-enable some checking,
but these checks could not be comprehensive because many system
registers depend on architecture features that don't have corresponding
GCC/GAS --march options. This would also depend on addressing numerous
inconsistencies in the existing list of sysreg feature dependencies.

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_valid_sysreg_name_p): Remove feature check.
(aarch64_retrieve_sysreg): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/rwsr-ungated.c: New test.

x86: Update gcc.target/i386/apx-interrupt-1.c

ix86_add_cfa_restore_note omits the REG_CFA_RESTORE REG note for registers
pushed in red-zone.  Since

commit 0a074b8c7e79f9d9359d044f1499b0a9ce9d2801
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Apr 13 12:20:42 2025 -0700

    APX: Don't use red-zone with 32 GPRs and no caller-saved registers

disabled red-zone, update gcc.target/i386/apx-interrupt-1.c to expect
31 .cfi_restore directives.

PR target/119784
* gcc.target/i386/apx-interrupt-1.c: Expect 31 .cfi_restore
directives.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 5ed2fa4768f3d318b8ace5bd4a095596e06fad7b)

APX: Don't use red-zone with 32 GPRs and no caller-saved registers

Don't use red-zone when there are no caller-saved registers with 32 GPRs
since 128-byte red-zone is too small for 31 GPRs.
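
The arithmetic behind that statement can be checked with a short sketch.
The 128-byte red-zone size and the count of 31 registers come from the
text above; 8 bytes per GPR is the size of a 64-bit register.  The constant
and function names are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kRedZoneBytes = 128;  // red-zone size, from the text
constexpr std::size_t kGprBytes = 8;        // one 64-bit general register
constexpr std::size_t kSavedGprs = 31;      // registers to save, from the text

// Bytes needed to save N general-purpose registers.
constexpr std::size_t bytes_needed(std::size_t n) { return n * kGprBytes; }

// True if saving N registers would fit entirely within the red-zone.
constexpr bool fits_in_red_zone(std::size_t n) {
  return bytes_needed(n) <= kRedZoneBytes;
}

static_assert(bytes_needed(kSavedGprs) == 248, "31 * 8 bytes");
static_assert(!fits_in_red_zone(kSavedGprs), "248 bytes exceed 128");
```

With these figures at most 16 registers fit, so saving 31 needs 120 bytes
more than the red-zone provides.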

gcc/

PR target/119784
* config/i386/i386.cc (ix86_using_red_zone): Don't use red-zone
with 32 GPRs and no caller-saved registers.

gcc/testsuite/

PR target/119784
* gcc.target/i386/pr119784a.c: New test.
* gcc.target/i386/pr119784b.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 0a074b8c7e79f9d9359d044f1499b0a9ce9d2801)

Extend check-function-bodies to allow labels and directives

As PR target/116174 showed, we may need to verify labels and the directive
order.  Extend check-function-bodies to support matched output lines,
allowing labels and directives.

gcc/

* doc/sourcebuild.texi (check-function-bodies): Add an optional
argument for matched output lines.

gcc/testsuite/

* gcc.target/i386/pr116174.c: Use check-function-bodies.
* lib/scanasm.exp (parse_function_bodies): Append the line if
$up_config(matched) matches the line.
(check-function-bodies): Add an argument for matched.  Set
up_config(matched) to $matched.  Append the expected line without
$config(line_prefix) to function_regexp if it starts with ".L".

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit d6bb1e257fc414d21bc31faa7ddecbc93a197e3c)

RISC-V: Put jump table in text for large code model

The large code model assumes that data or rodata may be placed far away
from the text section, so we need to put the jump table in the text
section for the large code model.

gcc/ChangeLog:

* config/riscv/riscv.h (JUMP_TABLES_IN_TEXT_SECTION): Check if
large code model.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/jump-table-large-code-model.c: New test.

(cherry picked from commit 1d9e02bb7e0af4f3d3eaaa1a0f4961970aba5560)

RISC-V: Fix vec_duplicate[bimode] expander [PR119572].

Since r15-9062-g70391e3958db79 we perform vector bitmask initialization
via the vec_duplicate expander directly. This triggered a latent bug in
our expander, where we failed to mask out the single bit, which resulted
in an execution FAIL of pr119114.c.

The attached patch adds the 1-masking of the broadcast operand.
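
The general idea of masking a boolean operand down to one bit before
broadcasting it can be sketched as follows.  This is a toy model on a
64-bit integer, not the RISC-V expander; toy_broadcast_mask is a
hypothetical helper and the lane count is assumed to be at most 64.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: broadcast bit 0 of VALUE to the low NLANES bits.
// Assumes NLANES <= 64.
std::uint64_t toy_broadcast_mask(std::uint64_t value, unsigned nlanes) {
  std::uint64_t bit = value & 1u;  // keep only the single mask bit
  std::uint64_t all = nlanes >= 64 ? ~0ull : ((1ull << nlanes) - 1);
  return bit ? all : 0;
}
```

Without the `& 1` step, an operand such as 2 (non-zero, but with bit 0
clear) would be treated as "true" by a plain non-zero test.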

PR target/119572

gcc/ChangeLog:

* config/riscv/autovec.md: Mask broadcast value.

(cherry picked from commit 716d39f0a248c1003033e6a312c736180790ef70)

aarch64: Split aarch64_combinev16qi before RA [PR115258]

Two-vector TBL instructions are fed by an aarch64_combinev16qi, whose
purpose is to put the two input data vectors into consecutive registers.
This aarch64_combinev16qi was then split after reload into individual
moves (from the first input to the first half of the output, and from
the second input to the second half of the output).

In the worst case, the RA might allocate things so that the destination
of the aarch64_combinev16qi is the second input followed by the first
input.  In that case, the split form of aarch64_combinev16qi uses three
eors to swap the registers around.

This PR is about a test where this worst case occurred.  And given the
insn description, that allocation doesn't seem unreasonable.

early-ra should (hopefully) mean that we're now better at allocating
subregs of vector registers.  The upcoming RA subreg patches should
improve things further.  The best fix for the PR therefore seems
to be to split the combination before RA, so that the RA can see
the underlying moves.

Perhaps it even makes sense to do this at expand time, avoiding the need
for aarch64_combinev16qi entirely.  That deserves more experimentation
though.

gcc/
PR target/115258
* config/aarch64/aarch64-simd.md (aarch64_combinev16qi): Allow
the split before reload.
* config/aarch64/aarch64.cc (aarch64_split_combinev16qi): Generalize
into a form that handles pseudo registers.

gcc/testsuite/
PR target/115258
* gcc.target/aarch64/pr115258.c: New test.

(cherry picked from commit 39263ed2d39ac1cebde59bc5e72ddcad5dc7a1ec)

aarch64: Avoid unnecessary use of 2-input TBLs [PR115258]

When using TBL for (say) a V4SI permutation, the aarch64 port first
asks target-independent code to lower to a V16QI permutation.
Then, during code generation, an input like:

  (reg:V4SI R)

gets converted to:

  (subreg:V16QI (reg:V4SI R) 0)

aarch64_vectorize_vec_perm_const had:

  d.op0 = op0 ? force_reg (op_mode, op0) : NULL_RTX;
  if (op0 == op1)
    d.op1 = d.op0;
  else
    d.op1 = op1 ? force_reg (op_mode, op1) : NULL_RTX;

But subregs (unlike regs) are not shared, so the op0 == op1 check
always failed for this case.  We'd then force each subreg into a
fresh register, meaning that during the later:

  aarch64_expand_vec_perm_1 (d->target, d->op0, d->op1, sel);

there is no way for aarch64_expand_vec_perm_1 to realise that
d->op0 and d->op1 are the same value.  It would therefore generate
a two-input TBL in the testcase, even though a single-input TBL
is enough.

I'm not sure forcing subregs to a fresh register is a good idea --
it caused problems for copysign & co. -- but that's not something
to fiddle with during stage 4.  Using op0 == op1 for rtx equality
is independently wrong, so we might as well just fix that for now.
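
Why pointer comparison fails for unshared nodes can be shown with a toy
sketch.  ToyExpr and both helpers are hypothetical and are not GCC's rtx
types; the patch itself uses d.one_vector_p rather than a structural
comparison.

```cpp
#include <cassert>

// Toy expression node; hypothetical, not GCC's rtx.
struct ToyExpr {
  int code;   // e.g. 0 = register, 1 = subreg wrapper
  int regno;  // register the expression refers to
};

// Identity: true only when both pointers name the very same object.
bool same_pointer(const ToyExpr *a, const ToyExpr *b) { return a == b; }

// Structural equality: true when both nodes describe the same value.
bool same_value(const ToyExpr *a, const ToyExpr *b) {
  return a->code == b->code && a->regno == b->regno;
}
```

Two separately built wrappers around the same register compare unequal by
pointer but equal by value, which is the situation described for subregs.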

The patch gets rid of extra MOVs that are a regression from GCC 14.

The testcase is based on one from Kugan, itself based on TSVC.

gcc/
PR target/115258
* config/aarch64/aarch64.cc (aarch64_vectorize_vec_perm_const): Use
d.one_vector_p to decide whether op1 should be a copy of op0.

gcc/testsuite/
PR target/115258
* gcc.target/aarch64/pr115258_2.c: New test.

Co-authored-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
(cherry picked from commit 31dcf941ac78c4b1b01dc4b2ce9809f0209153b8)

vect: Enforce dr_with_seg_len::align precondition [PR116125]

tree-data-refs.cc uses alignment information to try to optimise
the code generated for alias checks.  The assumption for "normal"
non-grouped, full-width scalar accesses was that the access size
would be a multiple of the alignment.  As Richi notes in the PR,
this is a documented precondition of dr_with_seg_len:

  /* The minimum common alignment of DR's start address, SEG_LEN and
     ACCESS_SIZE.  */
  unsigned int align;

PR115192 was a case in which this assumption didn't hold.  The access
was part of an aligned 4-element group, but only the first 2 elements
of the group were accessed.  The alignment was therefore double the
access size.

In r15-820-ga0fe4fb1c8d78045 I'd "fixed" that by capping the
alignment in one of the output routines.  But I think that was
misconceived.  The precondition means that we should cap the
alignment at source instead.

Failure to do that caused a similar wrong code bug in this PR,
where the alignment comes from a short bitfield access rather
than from a group access.
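
Capping the alignment "at source" so that the access size stays a multiple
of it can be sketched like this.  The helper names are hypothetical and the
code is only an illustration of the precondition, not the code in
tree-vect-data-refs.cc; a non-zero access size is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Largest power of two that divides X (0 when X is 0).
constexpr std::uint64_t lowest_set_bit(std::uint64_t x) {
  return x & (~x + 1);
}

// Hypothetical helper: the common alignment of a pointer aligned to
// PTR_ALIGN bytes and an access of ACCESS_SIZE bytes (assumed non-zero).
std::uint64_t toy_common_alignment(std::uint64_t ptr_align,
                                   std::uint64_t access_size) {
  return std::min(ptr_align, lowest_set_bit(access_size));
}
```

For a 16-byte-aligned pointer with an 8-byte access, the result is 8, so
the access size is again a multiple of the recorded alignment.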

gcc/
PR tree-optimization/116125
* tree-vect-data-refs.cc (vect_prune_runtime_alias_test_list): Make
the dr_with_seg_len alignment fields describe the access sizes as
well as the pointer alignment.
* tree-data-ref.cc (create_intersect_range_checks): Don't compensate
for invalid alignment fields here.

gcc/testsuite/
PR tree-optimization/116125
* gcc.dg/vect/pr116125.c: New test.

(cherry picked from commit e8651b80aeb86da935035e218747a6b41b611497)

LoongArch: Fix invalid subregs in xorsign [PR118501]

The test case added in r15-7073 now triggers an ICE, indicating we need
the same fix as AArch64.

gcc/ChangeLog:

PR target/118501
* config/loongarch/loongarch.md (@xorsign<mode>3): Use
force_lowpart_subreg.

(cherry picked from commit 9ddf4a6cc650360e620c8fd97f550bf833cc177a)

aarch64: Fix invalid subregs in xorsign [PR118501]

In the testcase, we try to use xorsign on:

   (subreg:DF (reg:TI R) 8)

i.e. the highpart of the TI.  xorsign wants to take a V2DF
paradoxical subreg of this, which is rightly rejected as a direct
operation.  In cases like this, we need to force the highpart into
a fresh register first.

gcc/
PR target/118501
* config/aarch64/aarch64.md (@xorsign<mode>3): Use
force_lowpart_subreg.

gcc/testsuite/
PR target/118501
* gcc.c-torture/compile/pr118501.c: New test.

(cherry picked from commit 6612b8e55471fabd2071a9637a06d3ffce2b05a6)

aarch64: Use force_lowpart_subreg in a BFI splitter [PR119133]

lowpart_subreg ICEs are the gift that keeps giving. This is another
case where we need to use force_lowpart_subreg instead, to handle
cases where the input is already a subreg and where the combined
subreg is not allowed as a single operation.

We don't need to check can_create_pseudo_p since the input should
be a hard register rather than a subreg if !can_create_pseudo_p.

gcc/
PR target/119133
* config/aarch64/aarch64.md
(*aarch64_bfi<GPI:mode><ALLX:mode>_<SUBDI_BITS>): Use
force_lowpart_subreg.

gcc/testsuite/
PR target/119133
* gcc.dg/torture/pr119133.c: New test.

(cherry picked from commit 5ae621e2e86c00d1fb13ef6839d0c3bace762ac8)

Avoid using POINTER_DIFF_EXPR for overlap checks [PR119399]

In r10-4803-g8489e1f45b50600c I'd used POINTER_DIFF_EXPR to subtract
the two pointers involved in an overlap test. I'm not sure whether
I'd specifically chosen that over MINUS_EXPR or not; if so, the only
reason I can think of is that it is probably faster on targets with
PSImode pointers. Regardless, as the PR points out, subtracting
unrelated pointers using POINTER_DIFF_EXPR is undefined behaviour.
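
The same distinction exists at the source level: subtracting two unrelated
pointers directly is undefined, while converting both to an unsigned
integer type and subtracting is well defined (modulo wrap-around).  A
minimal sketch, assuming std::uintptr_t is available and that addresses map
linearly to integers; toy_ranges_overlap is a hypothetical helper, not the
code generated by create_waw_or_war_checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical overlap test for two LEN-byte ranges starting at A and B.
// The pointers are converted first, then subtracted as unsigned integers.
bool toy_ranges_overlap(const void *a, const void *b, std::size_t len) {
  std::uintptr_t ua = reinterpret_cast<std::uintptr_t>(a);
  std::uintptr_t ub = reinterpret_cast<std::uintptr_t>(b);
  std::uintptr_t n = len;
  std::uintptr_t diff = ua - ub;  // wraps modulo 2^N, never undefined
  // Overlap iff -(n - 1) <= a - b <= n - 1, tested with one unsigned
  // comparison after biasing the difference by n - 1.
  return n != 0 && diff + (n - 1) < 2 * n - 1;
}
```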

gcc/
PR tree-optimization/119399
* tree-data-ref.cc (create_waw_or_war_checks): Use a MINUS_EXPR
on two converted pointers, rather than converting a POINTER_DIFF_EXPR
on the pointers.

gcc/testsuite/
PR tree-optimization/119399
* gcc.dg/vect/pr119399.c: New test.

(cherry picked from commit 4c8c373495d7d863dfb7102726ac3b4b41685df4)

Add force_lowpart_subreg

optabs had a local function called lowpart_subreg_maybe_copy
that is very similar to the lowpart version of force_subreg.
This patch adds a force_lowpart_subreg wrapper around
force_subreg.

The only difference between the old and new functions is that
the old one asserted success while the new one doesn't.
It's common not to assert elsewhere when taking subregs;
normally a null result is enough.

Later patches will make more use of the new function.

gcc/
* explow.h (force_lowpart_subreg): Declare.
* explow.cc (force_lowpart_subreg): New function.

(cherry picked from commit 5f40d1c0cc6ce91ef28d326b8707b3f05e6f239c)

Make force_subreg emit nothing on failure

While adding more uses of force_subreg, I realised that it should
be more careful to emit no instructions on failure. This kind of
failure should be very rare, so I don't think it's a case worth
optimising for.

gcc/
* explow.cc (force_subreg): Emit no instructions on failure.

(cherry picked from commit 01044471ea39f9be4803c583ef2a946abc657f99)

libstdc++: Adjust comment in <numeric>

We don't need to mention ranges::out_value_result in this comment,
because <numeric> doesn't care about that name.

libstdc++-v3/ChangeLog:

* include/std/numeric: Only mention ranges::iota in comment.

libstdc++: Add test for not using reserved name 'ranges' before C++20

This is a test for a bug that was present on trunk, because 'ranges' is
not a reserved name before C++20.

libstdc++-v3/ChangeLog:

* testsuite/17_intro/names.cc: Check ranges is not used as an
identifier before C++20.

libstdc++: Do not define __cpp_lib_ranges_iota in <ranges>

In r14-7153-gadbc46942aee75 we removed a duplicate definition of
__glibcxx_want_ranges_iota from <ranges>, but __cpp_lib_ranges_iota
should not be defined in <ranges> at all.

libstdc++-v3/ChangeLog:

* include/std/ranges (__glibcxx_want_ranges_iota): Do not
define.

(cherry picked from commit 25775e73ea4d40a55a26b71c42cc6509caf4845f)

libstdc++: Fix std::ranges::iota is not included in numeric [PR108760]

Before this patch, using std::ranges::iota required including
<algorithm> when it should have been sufficient to only include
<numeric>.

For the backport to the release branch ranges::iota is defined in
<bits/ranges_algobase.h> so that it's available in both <numeric> and
<algorithm>. This avoids breaking code that compiles successfully using
existing releases where <algorithm> defines ranges::iota.

libstdc++-v3/ChangeLog:

PR libstdc++/108760
* include/bits/ranges_algo.h (ranges::out_value_result)
(ranges::iota_result, ranges::__iota_fn, ranges::iota): Move to
<bits/ranges_algobase.h>.
* include/bits/ranges_algobase.h (ranges::out_value_result)
(ranges::iota_result, ranges::__iota_fn, ranges::iota): Move to
here.
* include/std/numeric: Include <bits/ranges_algobase.h>.
* testsuite/25_algorithms/iota/1.cc: Renamed to ...
* testsuite/26_numerics/iota/2.cc: ... here.

Signed-off-by: Michael Levine <mlevine55@bloomberg.net>
(cherry picked from commit 0bb1db32ccf54a9de59bea718f7575f7ef22abf5)

libstdc++: Fix ranges::move and ranges::move_backward to use iter_move [PR105609]

The ranges::move and ranges::move_backward algorithms are supposed to
use ranges::iter_move(iter) instead of std::move(*iter), which matters
for an iterator type with an iter_move overload findable by ADL.

Currently those algorithms use std::__assign_one which uses std::move,
so define a new ranges::__detail::__assign_one helper function that uses
ranges::iter_move.

libstdc++-v3/ChangeLog:

PR libstdc++/105609
* include/bits/ranges_algobase.h (__detail::__assign_one): New
helper function.
(__copy_or_move, __copy_or_move_backward): Use new function
instead of std::__assign_one.
* testsuite/25_algorithms/move/constrained.cc: Check that
ADL iter_move is used in preference to std::move.
* testsuite/25_algorithms/move_backward/constrained.cc:
Likewise.

(cherry picked from commit 3866ca796d5281d33f25b4165badacf8f198c6d1)

libstdc++: Reuse std::__assign_one in <bits/ranges_algobase.h>

Use std::__assign_one instead of ranges::__assign_one. Adjust the uses,
because std::__assign_one has the arguments in the opposite order (the
same order as an assignment expression).

libstdc++-v3/ChangeLog:

* include/bits/ranges_algobase.h (ranges::__assign_one): Remove.
(__copy_or_move, __copy_or_move_backward): Use std::__assign_one
instead of ranges::__assign_one.

Reviewed-by: Patrick Palka <ppalka@redhat.com>
(cherry picked from commit d0a9ae1321f01c33b7ee377249cad30187061c0c)

libstdc++: Fix ranges::copy_backward for a single memcpyable element [PR117121]

The result iterator needs to be decremented before writing to it.
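
For the single-element case the required order of operations can be
sketched as below.  toy_copy_backward_one is a hypothetical helper written
for illustration, not the libstdc++ implementation; RESULT is the
past-the-end position of the destination range.

```cpp
#include <cassert>

// Hypothetical single-element helper.  RESULT points one past the last
// destination slot, as for the copy_backward family of algorithms.
template <typename In, typename Out>
Out toy_copy_backward_one(In first, Out result) {
  --result;          // step back onto the last destination slot first
  *result = *first;  // then assign through it
  return result;     // iterator to the element that was written
}
```

Writing through RESULT before decrementing it would store one element past
the end of the destination range.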

Improve the PR 108846 tests for all of std::copy, std::copy_n,
std::copy_backward, and the std::ranges versions.

libstdc++-v3/ChangeLog:

PR libstdc++/117121
* include/bits/ranges_algobase.h (copy_backward): Decrement
output iterator before assigning one element through it.
* testsuite/25_algorithms/copy/108846.cc: Ensure the algorithm's
effects are correct for a single memcpyable element.
* testsuite/25_algorithms/copy_backward/108846.cc: Likewise.
* testsuite/25_algorithms/copy_n/108846.cc: Likewise.

(cherry picked from commit 27f6b376e8e196c7c85c8b47436cd2f2993768da)

libstdc++: Do not use memmove for 1-element ranges [PR108846,PR116471]

This commit ports the fixes already applied by r13-6372-g822a11a1e642e0
to the range-based versions of copy/move algorithms.

When doing so, a further bug (PR116471) was discovered in the
implementation of the range-based algorithms: although the algorithms
are already constrained by the indirectly_copyable/movable concepts,
there was a failing static_assert in the memmove path.

This static_assert checked that the iterator's value type was assignable by
using the is_copy_assignable (move) type traits. However, this is a
problem, because the traits are too strict when checking for constness;
a type like

struct S { S& operator=(S &) = default; };

is trivially copyable (and thus could benefit from the memmove path),
but it does not satisfy is_copy_assignable because the operator takes
by non-const reference.

Now, the reason for the check to be there is because a type with
a deleted assignment operator like

struct E { E& operator=(const E&) = delete; };

is still trivially copyable, but not assignable. We don't want
algorithms like std::ranges::copy to compile because they end up
selecting the memmove path, "ignoring" the fact that E isn't even
copy assignable.

But the static_assert isn't needed here any longer: as noted before,
the ranges algorithms already have the appropriate constraints; and
even if they didn't, there's now a non-discarded codepath to deal with
ranges of length 1 where there is an explicit assignment operation.

Therefore, this commit removes it. (In fact, r13-6372-g822a11a1e642e0
removed the same static_assert from the non-ranges algorithms.)

libstdc++-v3/ChangeLog:

PR libstdc++/108846
PR libstdc++/116471
* include/bits/ranges_algobase.h (__assign_one): New helper
function.
(__copy_or_move): Remove a spurious static_assert; use
__assign_one for memcpyable ranges of length 1.
(__copy_or_move_backward): Likewise.
* testsuite/25_algorithms/copy/108846.cc: Extend to range-based
algorithms, and cover both memcpyable and non-memcpyable
cases.
* testsuite/25_algorithms/copy_backward/108846.cc: Likewise.
* testsuite/25_algorithms/copy_n/108846.cc: Likewise.
* testsuite/25_algorithms/move/108846.cc: Likewise.
* testsuite/25_algorithms/move_backward/108846.cc: Likewise.

Signed-off-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
(cherry picked from commit 5938e0681c3907b2771ce6717988416b0ddd2f54)

libstdc++: Add missing header to <bits/ranges_algobase.h> for std::__memcmp

As noticed by Michael Levine.

libstdc++-v3/ChangeLog:

* include/bits/ranges_algobase.h: Include <bits/stl_algobase.h>.

(cherry picked from commit 674d213ab91871652e96dc2de06e6f50682eebe0)

RISC-V: revert pr114194 tests on gcc-14 [PR118601]

The gcc-14 backport that split the pr114194 testcase for rv32 and rv64
would only generate the expected rv32 sequence if commit
6b315907c0353f71169a7555e653d29a981fef67 had also been backported, but
it wasn't. Without it, we get the same code as before on both rv32
and rv64, so revert to the original test.

for gcc/testsuite/ChangeLog

PR target/118601
* gcc.target/riscv/rvv/xtheadvector/pr114194.c: Restore.
* gcc.target/riscv/rvv/xtheadvector/pr114194-rv64.c: Remove.
* gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c: Likewise.

RISC-V: adjust testcase for gcc-14 [PR118182]

The pr118182-2.c testcase backported from gcc-15 depended on the late
combine pass after register allocation to substitute the zero constant
into the pred_broadcast to get to the expected vmv.s.x instruction.
Without that pass, we get a vfmv.s.f instead. Expect that on gcc-14.

for gcc/testsuite/ChangeLog

PR target/118182
* gcc.target/riscv/rvv/autovec/pr118182-2.c: Adjust.

Daily bump.

discriminators: Fix assigning discriminators on edge [PR113546]

The problem here is that there was a compare-debug failure, since the
discriminators would still take debug statements into account. For the edge
we would look at the first statement after the labels, and that might have
been a debug statement. So we need to skip over debug statements; otherwise
we could get different discriminator numbers with and without -g.
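
The effect of skipping debug statements can be sketched with a toy basic
block.  ToyStmt, ToyKind and toy_first_non_label_nondebug are hypothetical
names for illustration only and do not reflect GIMPLE data structures.

```cpp
#include <cassert>
#include <vector>

enum class ToyKind { Label, Debug, Real };

// Toy statement: its kind and the source line it came from.
struct ToyStmt {
  ToyKind kind;
  int line;
};

// Return the first statement that is neither a label nor a debug
// statement, or nullptr if the block contains none.
const ToyStmt *toy_first_non_label_nondebug(const std::vector<ToyStmt> &bb) {
  for (const ToyStmt &s : bb)
    if (s.kind != ToyKind::Label && s.kind != ToyKind::Debug)
      return &s;
  return nullptr;
}
```

A block built with debug statements and the same block without them then
yield the same statement, so any decision based on it does not depend on -g.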

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR middle-end/113546

gcc/ChangeLog:

* tree-cfg.cc (first_non_label_stmt): Rename to ...
(first_non_label_nondebug_stmt): This and use gsi_start_nondebug_after_labels_bb.
(assign_discriminators): Update call to first_non_label_nondebug_stmt.

gcc/testsuite/ChangeLog:

* c-c++-common/torture/pr113546-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit c5ca45b8069229b6ad9bc845f03f46340f6316d7)

c++: wrong targs in satisfaction diagnostic context line [PR99214]

In the three-parameter version of satisfy_declaration_constraints, when
't' isn't the most general template, then 't' won't correspond with
'args' after we augment the latter via add_outermost_template_args, and
so the instantiation context that we push via push_tinst_level isn't
quite correct: 'args' is a complete set of template arguments, but 't'
is not necessarily the most general template.  This manifests as
misleading diagnostic context lines when issuing a satisfaction failure
error, e.g. for the testcase below, without this patch we emit:
  In substitution of '... void A<int>::f<U>() ... [with U = int]'
and with this patch we emit:
  In substitution of '... void A<int>::f<U>() ... [with U = char]'.

This patch fixes this by passing the original 'args' to push_tinst_level,
which ought to properly correspond to 't'.

PR c++/99214

gcc/cp/ChangeLog:

* constraint.cc (satisfy_declaration_constraints): Pass the
original ARGS to push_tinst_level.

gcc/testsuite/ChangeLog:

* g++.dg/concepts/diagnostic20.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit 00966a7fdb1478b3af5254ff3a80a3ef336c5a94)

libstdc++: Document thread-safety for COW std::string [PR21334]

The gcc4-compatible copy-on-write std::string does not conform to the
C++11 requirements on data race avoidance in standard containers.
Specifically, calling non-const member functions such as begin() and
data() needs to do the "copy on write" operation and so is most
definitely a modification of the object. As such, those non-const
members must not be called concurrently with any other uses of the
string object.

libstdc++-v3/ChangeLog:

PR libstdc++/21334
* doc/xml/manual/using.xml: Document that container data race
avoidance rules do not apply to COW std::string.
* doc/html/*: Regenerate.

(cherry picked from commit dd35f66287b7cca196a720c9641e463255dceb1c)

phiopt: Reset the number of iterations information of a loop when changing an exit from the loop [PR117243]

After r12-5300-gf98f373dd822b3, phiopt could get the following bb structure:
      |
    middle-bb -----|
      |            |
      |   |----|   |
    phi<1, 2>  |   |
    cond       |   |
      |        |   |
      |--------+---|

Which was considered 2 loops. The inner loop had an estimated upper_bound of 8,
due to the original `for (b = 0; b <= 7; b++)`. The outer loop was already an
infinite one.
So phiopt would come along and change the condition to be unconditionally true;
we change the inner loop to being an infinite one but don't reset the estimate
on the loop, and cleanup cfg comes along and changes it into one loop but also
does not reset the estimate of the loop. Then the loop unrolling uses the old estimate
and decides to add an unreachable there.
So the fix is when phiopt changes an exit to a loop, reset the estimates, similar to
how cleanupcfg does it when merging some basic blocks.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/117243
PR tree-optimization/116749

gcc/ChangeLog:

* tree-ssa-phiopt.cc (replace_phi_edge_with_variable): Reset loop
estimates if the cond_block was an exit to a loop.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr117243-1.c: New test.
* gcc.dg/torture/pr117243-2.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit b7c69cc072ef0da36439ebc55c513b48e68391b7)

phiopt: Fix value_replacement for middle bb having phi nodes [PR118922]

After r12-5300-gf98f373dd822b3, value_replacement would be able to look at the
following cfg structure:
```
  <bb 5> [local count: 1014686024]:
  if (h_6 != 0)
    goto <bb 7>; [94.50%]
  else
    goto <bb 6>; [5.50%]

  <bb 6> [local count: 114863530]:
  # h_6 = PHI <0(4), 1(5)>

  <bb 7> [local count: 1073741824]:
  # f_8 = PHI <0(5), h_6(6)>
  _9 = f_8 ^ 1;
  a.0_10 = a;
  _11 = _9 + a.0_10;
  if (_11 != -117)
    goto <bb 5>; [94.50%]
  else
    goto <bb 8>; [5.50%]
```

value_replacement would incorrectly think the middle bb (6) was empty and so it decides
to remove the condition in bb5 and replace it with 0, as the function thought it was `h_6 ? 0 : h_6`.
But since there is an incoming phi node in bb6 defining h_6, that is incorrect.

The fix is to check whether there are phi nodes in the middle bb and, if so, set empty_or_with_defined_p to false.
This was not needed before r12-5300-gf98f373dd822b3 because the phi would have been dead otherwise due to
other checks.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/118922

gcc/ChangeLog:

* tree-ssa-phiopt.cc (value_replacement): Set empty_or_with_defined_p
to false when there are phi nodes for the middle bb.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr118922-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit 7232c005afb5002cdfd0a2dbd0e8b8f2d80250ce)

Daily bump.

Revert very recent backport of changes to the type system

The backport of the change made for PR c/113688 onto the 14 branch a couple
of weeks ago has seriously broken the LTO compiler for the Ada language on
the 14 branch, because it changes the GCC type system for the sake of C in
a way that is not compatible with simple discriminated types in Ada.  To be
more precise, useless_type_conversion_p now returns true for some (view-)
conversions that are needed by the rest of the compiler.

gcc/
PR lto/119792
Revert

Backported from master:
    2024-12-12  Martin Uecker  <uecker@tugraz.at>

PR c/113688
PR c/114014
PR c/114713
PR c/117724
* tree.cc (gimple_canonical_types_compatible_p): Add exception.
(verify_type): Add exception.

gcc/lto/
PR lto/119792
Revert

Backported from master:
    2024-12-12  Martin Uecker  <uecker@tugraz.at>
* lto-common.cc (hash_canonical_type): Add exception.

gcc/testsuite/
* gcc.dg/pr113688.c: Delete.
* gcc.dg/pr114014.c: Likewise.
* gcc.dg/pr114713.c: Likewise.
* gcc.dg/pr117724.c: Likewise.

testcase: Add testcase for already fixed PR [PR118476]

This testcase was fixed by r15-3052-gc7b76a076cb2c6ded but is
a testcase that failed in a different fashion and a much older
failure than the one added with r15-3052.

Pushed as obvious after a quick test.

PR tree-optimization/118476

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr118476-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit d45a6502d1ec87d43f1a39f87cca58f1e28369c8)

match: Reject non-ssa name/min invariants in gimple_extract [PR116412]

After the conversion for phiopt's conditional operand
to use maybe_push_res_to_seq, it was found that gimple_extract
will extract out from REALPART_EXPR/IMAGPART_EXPR/VCE and BIT_FIELD_REF,
a memory load. But that extraction was not needed as memory loads are not
simplified in match and simplify. So gimple_extract should return false
in those cases.

Changes since v1:
* Move the rejection to gimple_extract from factor_out_conditional_operation.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/116412

gcc/ChangeLog:

* gimple-match-exports.cc (gimple_extract): Return false if op0
was not a SSA name nor a min invariant for REALPART_EXPR/IMAGPART_EXPR/VCE
and BIT_FIELD_REF.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr116412-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit c7b76a076cb2c6ded7ae208464019b04cb0531a2)

vec-lowering: Fix ABSU lowering [PR111285]

ABSU_EXPR lowering incorrectly used the resulting type
for the new expression, but in the case of ABSU the resulting
type is an unsigned type, and with that type the ABSU is folded away.
The fix is to use a signed type for the expression instead.
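
As an illustration only (plain C++, not the GCC lowering code; the function
names are hypothetical), the sketch below shows why the operand type matters.
ABSU takes a signed operand and produces an unsigned result; if the operand is
given the unsigned result type instead, there is nothing left to negate and
the operation degenerates into the identity.

```cpp
#include <cassert>
#include <cstdint>

// ABSU semantics: signed operand, unsigned result. The negation is done in
// unsigned arithmetic so it is well defined even for INT32_MIN.
std::uint32_t absu_signed_operand(std::int32_t x) {
  std::uint32_t ux = static_cast<std::uint32_t>(x);
  return x < 0 ? 0u - ux : ux;
}

// Model of the faulty lowering: the operand already has the unsigned result
// type, an unsigned value is never negative, so "abs" does nothing.
std::uint32_t absu_unsigned_operand(std::uint32_t ux) {
  return ux;
}

// Small usage check comparing both variants on the same bit pattern.
void absu_example() {
  assert(absu_signed_operand(-5) == 5u);
  assert(absu_unsigned_operand(static_cast<std::uint32_t>(-5)) == 4294967291u);
}
```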

Bootstrapped and tested on x86_64-linux-gnu.

PR middle-end/111285

gcc/ChangeLog:

* tree-vect-generic.cc (do_unop): Use a signed type for the
operand if the operation was ABSU_EXPR.

gcc/testsuite/ChangeLog:

* g++.dg/torture/vect-absu-1.C: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit ad0084337e901ddaedd48c14e7a5dad9fc2a093e)

backprop: Fix deleting of a phi node [PR116922]

The problem here is remove_unused_var is called on a name that is
defined by a phi node but it deletes it like removing a normal statement.
remove_phi_node should be called rather than gsi_remove for phinodes.

Note there is a possibility of using simple_dce_from_worklist instead
but that is for another day.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/116922

gcc/ChangeLog:

* gimple-ssa-backprop.cc (remove_unused_var): Handle phi
nodes correctly.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr116922.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit cea87c84eacdb422caeada734ba5138c994d7022)

aarch64: Fix early ra for -fno-delete-dead-exceptions [PR116927]

Early-RA was considering throwing instructions as being dead and removing
them even if -fno-delete-dead-exceptions was in use. This fixes that oversight.

Built and tested for aarch64-linux-gnu.

PR target/116927

gcc/ChangeLog:

* config/aarch64/aarch64-early-ra.cc (early_ra::is_dead_insn): Insns
that throw are not dead with -fno-delete-dead-exceptions.

gcc/testsuite/ChangeLog:

* g++.dg/torture/pr116927-1.C: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit edec4bfc99744b48da3ffde1e4f39c9aceecfd42)

phiopt: Fix VCE moving by rewriting it into cast [PR116098]

Phiopt match_and_simplify might move a well-defined VCE assign statement
from being conditional to being unconditional; that VCE might then no longer
be defined. It needs to be rewritten into a cast instead.

This adds the rewriting code to move_stmt for the VCE case.
This is enough to fix the issue at hand. It should also be using rewrite_to_defined_overflow
but first I need to move the check to see a rewrite is needed into its own function
and that is causing issues (see https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663938.html).
Plus this version is easiest to backport.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/116098

gcc/ChangeLog:

* tree-ssa-phiopt.cc (move_stmt): Rewrite VCEs from integer to integer
types to cast.

gcc/testsuite/ChangeLog:

* c-c++-common/torture/pr116098-2.c: New test.
* g++.dg/torture/pr116098-1.C: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit 1f619fe25925a5f79b9c33962e7a72e1f9fa4444)

Fortran: fix issue with impure elemental subroutine and interface [PR119656]

PR fortran/119656

gcc/fortran/ChangeLog:

* interface.cc (gfc_compare_actual_formal): Fix front-end memleak
when searching for matching interfaces.
* trans-expr.cc (gfc_conv_procedure_call): If there is a formal
dummy corresponding to an absent argument, use its type, and only
fall back to inferred type otherwise.

gcc/testsuite/ChangeLog:

* gfortran.dg/optional_absent_13.f90: New test.

(cherry picked from commit 334545194d9023fb9b2f72ee0dcde8af94930f25)

Add testcase for PR lto/119792

It demonstrates a serious LTO breakage for the Ada language.

gcc/testsuite/
PR lto/119792
* gnat.dg/lto29.adb: New test.
* gnat.dg/lto29_pkg.ads: New helper.

c++: Properly fold <COND_EXPR>.*<COMPONENT> [PR114525]

We've been miscompiling the following since r0-51314-gd6b4ea8592e338 (I
did not go compile something that old, and identified this change via
git blame, so might be wrong)

=== cut here ===
struct Foo { int x; };
Foo& get (Foo &v) { return v; }
void bar () {
  Foo v; v.x = 1;
  (true ? get (v) : get (v)).*(&Foo::x) = 2;
  // v.x still equals 1 here...
}
=== cut here ===

The problem lies in build_m_component_ref, that computes the address of
the COND_EXPR using build_address to build the representation of
  (true ? get (v) : get (v)).*(&Foo::x);
and gets something like
  &(true ? get (v) : get (v))  // #1
instead of
  (true ? &get (v) : &get (v)) // #2
and the write does not go where we want it to, hence the miscompile.

This patch replaces the call to build_address by a call to
cp_build_addr_expr, which gives #2, that is properly handled.
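
For reference, form #2 can be written by hand in source code. The sketch
below is only an illustration (the function names other than `Foo` and `get`
are hypothetical, and it does not exercise the compiler path that was fixed):
it takes the address in each arm of the conditional and then applies the
pointer to member with `->*`, so the store reaches `v`.

```cpp
#include <cassert>

struct Foo { int x; };
Foo& get(Foo& v) { return v; }

// Hand-written equivalent of form #2: the conditional selects between two
// addresses, and the member is then accessed through the selected pointer.
int write_through_arm_addresses() {
  Foo v;
  v.x = 1;
  (true ? &get(v) : &get(v))->*(&Foo::x) = 2;
  return v.x;
}

// Usage check: the write must be visible in v afterwards.
void cond_member_example() {
  assert(write_through_arm_addresses() == 2);
}
```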

PR c++/114525

gcc/cp/ChangeLog:

* typeck2.cc (build_m_component_ref): Call cp_build_addr_expr
instead of build_address.

gcc/testsuite/ChangeLog:

* g++.dg/expr/cond18.C: New test.

(cherry picked from commit 35ce9afc84a63fb647a90cbecb2adf3e748178be)

Daily bump.

rtl-optimization/119689 - compare-debug failure with LRA

The previous change to fix LRA rematerialization broke compare-debug
for i586 bootstrap. Fixed by using prev_nonnote_nondebug_insn
instead of prev_nonnote_insn.

PR rtl-optimization/119689
PR rtl-optimization/115568
* lra-remat.cc (create_cands): Use prev_nonnote_nondebug_insn
to check whether insn2 is directly before insn.

* g++.target/i386/pr119689.C: New testcase.

(cherry picked from commit 088887de7717a22b1503760e9b79dfbe22a0f428)

[PR115568][LRA]: Use more strict output reload check in rematerialization

In this PR case LRA rematerialized a value from an inheritance insn
instead of an output reload one.  This resulted in considering a
rematerialization candidate value available when it was actually
not.  As a consequence an insn after rematerialization used the
unexpected value and this use resulted in an fp exception.  The patch
fixes this bug.

gcc/ChangeLog:

PR rtl-optimization/115568
* lra-remat.cc (create_cands): Check that output reload insn is
adjacent to given insn.  Update a comment.

gcc/testsuite/ChangeLog:

PR rtl-optimization/115568
* gcc.target/i386/pr115568.c: New.

(cherry picked from commit 98545441308c2ae4d535f14b108ad6551fd927d5)

Daily bump.

c++: avoid ARM -Wunused-value [PR114970]

Because of the __builtin_is_constant_evaluated, maybe_constant_init in
expand_default_init fails, so the constexpr constructor isn't folded until
cp_fold, which builds a COMPOUND_EXPR in case the enclosing expression is
relying on the ARM behavior of returning 'this'.

As in other places, avoid -Wunused-value on artificial COMPOUND_EXPR.

PR c++/114970

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold): Suppress warnings on
return_this COMPOUND_EXPR.

gcc/testsuite/ChangeLog:

* g++.dg/opt/is_constant_evaluated4.C: New test.

(cherry picked from commit 4acdfb71d4fdaa43c2707ad7b2fb7b2b7bddfc42)

[PATCH v2] RISC-V: Fixbug for slli + addw + zext.w into sh[123]add + zext.w

Assuming we have the following variables:

unsigned long long a0, a1;
unsigned int a2;

For the expression:

a0 = (a0 << 50) >> 49;  // slli a0, a0, 50 + srli a0, a0, 49
a2 = a1 + a0;           // addw a2, a1, a0 + slli a2, a2, 32 + srli a2, a2, 32

In the optimization process of ZBA (combine pass), it would be optimized to:

a2 = (a0 << 1) + a1;    // sh1add a2, a0, a1 + zext.w a2, a2

This is clearly incorrect, as it overlooks the fact that a0=a0&0x7ffe, meaning
that the bits a0[32:14] are set to zero.
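
The difference can be checked with concrete values. The following is a small
model in plain C++ (the function names are hypothetical and it models only the
arithmetic, not the RTL patterns): `reference` follows the two source
statements, `faulty_sh1add` follows the combined sh1add + zext.w form that
drops the masking effect of the shift pair.

```cpp
#include <cassert>
#include <cstdint>

// Semantics of the two source statements.
std::uint32_t reference(std::uint64_t a0, std::uint64_t a1) {
  a0 = (a0 << 50) >> 49;  // only the low 14 bits of a0 survive, shifted left by 1
  return static_cast<std::uint32_t>(a1 + a0);
}

// Model of the faulty combine result: shift by one, add, zero-extend the
// low 32 bits. The high bits of a0 are not cleared first.
std::uint32_t faulty_sh1add(std::uint64_t a0, std::uint64_t a1) {
  return static_cast<std::uint32_t>((a0 << 1) + a1);
}

// Usage check: the results agree while a0 fits in 14 bits and diverge
// as soon as bit 14 is set.
void sh1add_example() {
  assert(reference(0x3fff, 0) == faulty_sh1add(0x3fff, 0));
  assert(reference(0x4000, 0) != faulty_sh1add(0x4000, 0));
}
```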

gcc/ChangeLog:

* config/riscv/bitmanip.md: The optimization can only be applied if
the high bit of operands[3] is set to 1.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zba-shNadd-09.c: New test.
* gcc.target/riscv/zba-shNadd-10.c: New test.

(cherry picked from commit dd6ebc0a3473a830115995bdcaf8f797ebd085a3)

Daily bump.

c++: nested lambda capture pack [PR119345]

tsubst_stmt already registers a local capture proxy as a
local_specialization of both an outer capture proxy and the captured
variable; we also need to do that in add_extra_args.

PR c++/119345

gcc/cp/ChangeLog:

* pt.cc (add_extra_args): Also register a specialization
of the captured variable.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ14.C: New test.

(cherry picked from commit 5957b9919c9ecda6e4ca198086f8bb9ea215232c)

c++: lambda in constraint of lambda [PR119175]

Here when we went to mangle the constraints of from<0>, the outer lambda has
no mangling scope, but the inner one was treated as having the outer one as
its scope. And mangling the outer one means mangling its constraints, which
include the inner one. So infinite recursion.

But a lambda closure type isn't a scope that anything should have for
mangling, the inner lambda should also have no mangling scope.

PR c++/119175

gcc/cp/ChangeLog:

* mangle.cc (decl_mangling_context): Look through lambda type.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-lambda23.C: New test.

(cherry picked from commit 39892d9618ee0f06dd09271589878b0df7b1e75d)

c++: self-dependent alias template [PR117530]

Here, instantiating B<short> means instantiating A<short>, which means
instantiating B<short>.  And then when we go to register the initial
instantiation, it conflicts with the inner one.  Fixed by checking after
tsubst whether there's already something in the hash table.  We already did
something much like this in tsubst_decl, but that doesn't handle this case.

PR c++/117530

gcc/cp/ChangeLog:

* pt.cc (instantiate_template): Check retrieve_specialization after
tsubst.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-uneval27.C: New test.

(cherry picked from commit d034c78c7be613db3c25fddec1dd50222327117b)

c++: alias_ctad_tweaks ICE w/ inherited CTAD [PR119687]

With inherited CTAD the set of guides may be a two-dimensional overload
set (i.e. OVERLOADs of OVERLOADs) so alias_ctad_tweaks (which also does
the inherited CTAD transformation) needs to use the 2D-aware lkp_iterator
instead of ovl_iterator, or better yet use the more idiomatic lkp_range.

PR c++/119687

gcc/cp/ChangeLog:

* pt.cc (alias_ctad_tweaks): Use lkp_range / lkp_iterator
instead of ovl_iterator.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/class-deduction-inherited8.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit 493974aa0ad8b94dbeb61f00d2acc57c94fd4809)

libstdc++: Fix conversions to key/value types for hash table insertion [PR115285]

The conversions to key_type and value_type that are performed when
inserting into _Hashtable need to be fixed to do any required
conversions explicitly. The current code assumes that conversions from
the parameter to the key_type or value_type can be done implicitly,
which isn't necessarily true.

Remove the _S_forward_key function which doesn't handle all cases and
either forward the parameter if it already has type cv key_type, or
explicitly construct a temporary of type key_type.

Similarly, the _ConvertToValueType specialization for maps doesn't
handle all cases either, for std::pair arguments only some value
categories are handled. Remove _ConvertToValueType and for the _M_insert
function for unique keys, either forward the argument unchanged or
explicitly construct a temporary of type value_type.

For the _M_insert overload for non-unique keys we don't need any
conversion at all, we can just forward the argument directly to where we
construct a node.
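
The "forward or explicitly construct" idea can be sketched outside the
library as follows. This is not the libstdc++ code: `Key`, `Source`,
`forward_or_construct_t` and `as_key` are hypothetical names used only to
show a static_cast to a type selected with conditional_t.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct Key { int id; };

// Only explicitly convertible to Key, so an implicit conversion would fail.
struct Source {
  int id;
  explicit operator Key() const { return Key{id}; }
};

// Arg&& when the argument already is a (cv) Key, otherwise Key by value.
template<typename K, typename Arg>
using forward_or_construct_t = std::conditional_t<
  std::is_same<std::remove_cv_t<std::remove_reference_t<Arg>>, K>::value,
  Arg&&, K>;

// Either forwards the argument unchanged or explicitly constructs a temporary.
template<typename K, typename Arg>
forward_or_construct_t<K, Arg> as_key(Arg&& arg) {
  return static_cast<forward_or_construct_t<K, Arg>>(std::forward<Arg>(arg));
}

// Usage check: a Key is passed through by reference, a Source is converted.
void as_key_example() {
  static_assert(!std::is_convertible<Source, Key>::value, "explicit only");
  Key k{7};
  Source s{9};
  assert(&as_key<Key>(k) == &k);
  assert(as_key<Key>(s).id == 9);
}
```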

libstdc++-v3/ChangeLog:

PR libstdc++/115285
* include/bits/hashtable.h (_Hashtable::_S_forward_key): Remove.
(_Hashtable::_M_insert_unique_aux): Replace _S_forward_key with
a static_cast to a type defined using conditional_t.
(_Hashtable::_M_insert): Replace _ConvertToValueType with a
static_cast to a type defined using conditional_t.
* include/bits/hashtable_policy.h (_ConvertToValueType): Remove.
* testsuite/23_containers/unordered_map/insert/115285.cc: New test.
* testsuite/23_containers/unordered_set/insert/115285.cc: New test.
* testsuite/23_containers/unordered_set/96088.cc: Adjust
expected number of allocations.

(cherry picked from commit 90c578654a2c96032aa6621449859243df5f641b)

libstdc++: Define __is_pair variable template for C++11

libstdc++-v3/ChangeLog:

* include/bits/stl_pair.h (__is_pair): Define for C++11 and
C++14 as well.

(cherry picked from commit dd08cdccc36d084eda0e2748c772f6bf9a7f412f)

libstdc++: Fix test broken when using COW std::string

libstdc++-v3/ChangeLog:

* testsuite/23_containers/unordered_map/96088.cc (test03): Fix increments
value when _GLIBCXX_USE_CXX11_ABI is equal to 0.

(cherry picked from commit d01dc97a26d2f5034ca135f46094aa52c44cc90a)

libstdc++: Always instantiate key_type to compute hash code [PR115285]

Even if it is possible to compute a hash code from the inserted arguments
we need to instantiate the key_type to guarantee hash code consistency.

Preserve the lazy instantiation of the mapped_type in the context of
associative containers.

libstdc++-v3/ChangeLog:

PR libstdc++/115285
* include/bits/hashtable.h (_S_forward_key<_Kt>): Always return a temporary
key_type instance.
* testsuite/23_containers/unordered_map/96088.cc: Adapt to additional instantiation.
Also check that mapped_type is not instantiated when there is no insertion.
* testsuite/23_containers/unordered_multimap/96088.cc: Adapt to additional
instantiation.
* testsuite/23_containers/unordered_multiset/96088.cc: Likewise.
* testsuite/23_containers/unordered_set/96088.cc: Likewise.
* testsuite/23_containers/unordered_set/pr115285.cc: New test case.

(cherry picked from commit ee030b28004eade3da872e7ae62a526a2940a705)

RISC-V: Disable unsupported vsext/vzext patterns for XTheadVector.

XTheadVector does not support the vsext/vzext instructions; however,
due to the reuse of RVV optimizations, it may generate these instructions
in certain cases. To prevent the error "Unknown opcode 'th.vsext.vf2',"
we should disable these patterns.

V2:
Change the value of dg-do in the test case from assemble to compile, and
remove the -save-temps option.

gcc/ChangeLog:

* config/riscv/vector.md: Disable vsext/vzext for XTheadVector.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/vsext.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vzext.c: New test.

(cherry picked from commit 196b45caca0aae57a95bffcdd5c188994317de08)

RISC-V: unrecognizable insn ICE in xtheadvector/pr114194.c on 32bit targets

This is a follow-up to the patch below to avoid generating unrecognized
vsetivl instructions for XTheadVector.

https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674185.html

PR target/118601

gcc/ChangeLog:

* config/riscv/riscv-string.cc (expand_block_move): Check with new
constraint 'vl' instead of 'K'.
(expand_vec_setmem): Likewise.
(expand_vec_cmpmem): Likewise.
* config/riscv/riscv-v.cc (force_vector_length_operand): Likewise.
(expand_load_store): Likewise.
(expand_strided_load): Likewise.
(expand_strided_store): Likewise.
(expand_lanes_load_store): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/pr114194.c: Move to...
* gcc.target/riscv/rvv/xtheadvector/pr114194-rv64.c: ...here.
* gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c: New test.
* gcc.target/riscv/rvv/xtheadvector/pr118601.c: New test.

Reported-by: Edwin Lu <ewlu@rivosinc.com>
(cherry picked from commit 580f571be6ce80aa71fb80e7b16e01824f088229)

RISC-V: Add a new constraint to ensure that the vl of XTheadVector does not get a non-zero immediate

Although we have handled the vl of XTheadVector correctly in the
expand phase and predicates, the results show that the work is
still insufficient.

In the curr_insn_transform function, the insn is transformed from:
(insn 69 67 225 12 (set (mem:RVVM8SF (reg/f:DI 218 [ _77 ]) [0  S[128, 128] A32])
        (if_then_else:RVVM8SF (unspec:RVVMF4BI [
                    (const_vector:RVVMF4BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (reg:DI 209)
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (reg/v:RVVM8SF 143 [ _xx ])
            (mem:RVVM8SF (reg/f:DI 218 [ _77 ]) [0  S[128, 128] A32])))
     (expr_list:REG_DEAD (reg/v:RVVM8SF 143 [ _xx ])
        (nil)))
to
(insn 69 284 225 11 (set (mem:RVVM8SF (reg/f:DI 18 s2 [orig:218 _77 ] [218]) [0  S[128, 128] A32])
        (if_then_else:RVVM8SF (unspec:RVVMF4BI [
                    (const_vector:RVVMF4BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (const_int 1 [0x1])
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (reg/v:RVVM8SF 104 v8 [orig:143 _xx ] [143])
            (mem:RVVM8SF (reg/f:DI 18 s2 [orig:218 _77 ] [218]) [0  S[128, 128] A32])))
     (nil))

Looking at the log for the reload pass, it is found that "Changing pseudo 209 in
operand 3 of insn 69 on equiv 0x1".
It converts the vl operand in insn from the expected register(reg:DI 209) to the
constant 1(const_int 1 [0x1]).

This conversion occurs because, although the predicate for the vl operand is
restricted by "vector_length_operand" in the pattern, the constraint is still
"rK", which allows the transformation.

The issue is that changing the "rK" constraint to "rJ" for the vl operand
in the pattern would prevent this conversion, but unfortunately this would
conflict with RVV (RISC-V Vector Extension).

Based on the review's recommendations, the best solution for now is to create
a new constraint to distinguish between RVV and XTheadVector, which is exactly
what this patch does.

PR target/116593

gcc/ChangeLog:

* config/riscv/constraints.md (vl): New.
* config/riscv/thead-vector.md: Replacing rK with rvl.
* config/riscv/vector.md: Likewise.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/rvv.exp: Enable testsuite of XTheadVector.
* g++.target/riscv/rvv/xtheadvector/pr116593.C: New test.

(cherry picked from commit 3024b12f2cde5db3bf52b49b07e32ef3065929fb)

RISC-V: Enable and adjust the testsuite for XTheadVector.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Enable testsuite of
XTheadVector.
* gcc.target/riscv/rvv/xtheadvector/pr114194.c: Adjust correctly.
* gcc.target/riscv/rvv/xtheadvector/prefix.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlwu-vsw.c: Likewise.

(cherry picked from commit ab24171d237a9138714f0e6d2bb38fd357ccaed9)

[PATCH] RISC-V: Bugfix for unrecognizable insn for XTheadVector

error: unrecognizable insn:

(insn 35 34 36 2 (set (subreg:RVVM1SF (reg/v:RVVM1x4SF 142 [ _r ]) 0)
        (unspec:RVVM1SF [
                (const_vector:RVVM1SF repeat [
                        (const_double:SF 0.0 [0x0.0p+0])
                    ])
                (reg:DI 0 zero)
                (const_int 1 [0x1])
                (reg:SI 66 vl)
                (reg:SI 67 vtype)
            ] UNSPEC_TH_VWLDST)) -1
     (nil))
during RTL pass: mode_sw

PR target/116591

gcc/ChangeLog:

* config/riscv/vector.md: Add restriction to call pred_th_whole_mov.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/pr116591.c: New test.

(cherry picked from commit 8564d0948c72df0a66d7eb47e15c6ab43e9b25ce)

RISC-V: Fix the behavior for multilib-generator with --cmodel=large on rv32

The large code model is only supported on RV64, so we don't need to
generate the multilibs for RV32 with --cmodel=large. And the compact
code model is something we don't support upstream (it was
accidentally added in the past), so we need to remove it.

gcc/ChangeLog:

* config/riscv/multilib-generator: Remove the compact code model
and check large code model for RV32.

(cherry picked from commit 72dff34bcdd6f05b64bbf07739ab815e673b5946)

Daily bump.

libstdc++: Fix constraint recursion in basic_const_iterator operator- [PR115046]

It was proposed in PR112490 to also adjust basic_const_iterator's friend
operator-(sent, iter) overload alongside the r15-7757-g4342c50ca84ae5
adjustments to its comparison operators, but we lacked a concrete
testcase demonstrating fixable constraint recursion there.  It turns out
Hewill Kang's PR115046 is such a testcase!  So this patch makes the same
adjustments to that overload as well, fixing PR115046.  The LWG 4218 P/R
will need to get adjusted too.

PR libstdc++/115046
PR libstdc++/112490

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h (basic_const_iterator::operator-):
Replace non-dependent basic_const_iterator function parameter with
a dependent one of type basic_const_iterator<_It2> where _It2
matches _It.
* testsuite/std/ranges/adaptors/as_const/1.cc (test04): New test.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
(cherry picked from commit d69f73c0334486f3c66937388f02008736809e87)

c++: ICE with nested default targ lambdas [PR119574]

In GCC 14 we fixed PR116567 in a more conservative way that doesn't
distinguish between the two kinds of deferred substitutions, and so
for PR119574 we instead ICE from get_innermost_template_args due to
TMPL_PARMS_DEPTH of the lambda, 2, being greater than the depth of the
augmented args, 1.

This patch works around the ICE in a best effort kind of way by guarding
the get_innermost_template_args call appropriately; I don't think it's
possible to get this completely right in GCC 14 without backporting the
proper fix for PR116567.

Note that lambda-targ13b.C present in the GCC 15 version of this patch[1]
never worked in GCC 14, and still doesn't work, which is why it's not
present in this patch.

[1]: r15-9350-gf3862ab07943d1

PR c++/119574

gcc/cp/ChangeLog:

* pt.cc (tsubst_lambda_expr): Don't call
get_innermost_template_args if we're requesting too many
levels.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ13.C: New test.
* g++.dg/cpp2a/lambda-targ13a.C: New test.

RISC-V: Fix vid const vector expander for non-npatterns size steps

Prior to this patch the expander would emit vectors like:
{ 0, 0, 5, 5, 10, 10, ...}
as:
{ 0, 0, 2, 2, 4, 4, ...}

This patch sets the step size to the requested value.
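
A minimal model of the two series above, in plain C++ rather than the RTL
expander (`stepped_series` and its parameters are hypothetical names): each
value is repeated `npatterns` times and consecutive groups differ by `step`.
The faulty output matches what this model produces when the step equals the
repeat count, which is one reading of the subject line.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element i has the value (i / npatterns) * step.
std::vector<int> stepped_series(int count, int npatterns, int step) {
  assert(count >= 0 && npatterns > 0);
  std::vector<int> out;
  out.reserve(static_cast<std::size_t>(count));
  for (int i = 0; i < count; ++i)
    out.push_back((i / npatterns) * step);
  return out;
}
```

With these assumptions, `stepped_series(6, 2, 5)` gives the requested
{ 0, 0, 5, 5, 10, 10 } and `stepped_series(6, 2, 2)` gives the faulty
{ 0, 0, 2, 2, 4, 4 }.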

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Fix STEP size in
expander.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V:Bugfix for vlmul_ext and vlmul_trunc with NULL return value[pr117286]

This patch fixes following ICE:

test.c: In function 'func':
test.c:37:24: internal compiler error: Segmentation fault
   37 |     vfloat16mf2_t vc = __riscv_vlmul_trunc_v_f16m1_f16mf2(vb);
      |                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The root cause is that vlmul_trunc has a null return value.
gimple_call <__riscv_vlmul_trunc_v_f16m1_f16mf2, NULL, vb_13>
                                                 ^^^

Passed the rv64gcv_zvfh regression test.

Signed-off-by: Li Xu <xuli1@eswincomputing.com>
PR target/117286

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Do not expand NULL return.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr117286.c: New test.

RISC-V: Bugfix for max_sew_overlap_and_next_ratio_valid_for_prev_sew_p[pr117483]

This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117483

If prev and next satisfy the following rules, we should forbid the case
(next.get_sew() < prev.get_sew() && (!next.get_ta() || !next.get_ma()))
in the compatible function max_sew_overlap_and_next_ratio_valid_for_prev_sew_p.
Otherwise, the tail elements of next will be polluted.

DEF_SEW_LMUL_RULE (ge_sew, ratio_and_ge_sew, ratio_and_ge_sew,
max_sew_overlap_and_next_ratio_valid_for_prev_sew_p,
always_false, use_max_sew_and_lmul_with_next_ratio)

Passed the rv64gcv full regression test.

Signed-off-by: Li Xu <xuli1@eswincomputing.com>
PR target/117483

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc: Fix bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr117483.c: New test.

[committed] [RISC-V] Fix false-positive uninitialized variable

Andreas noted we were getting an uninit warning after the recent constant
synthesis changes. Essentially there's no way for the uninit analysis code to
know the first entry in the CODES array is an UNKNOWN which will set X before
its first use.

So trivial initialization with NULL_RTX is the obvious fix.

Pushed to the trunk.

gcc/

* config/riscv/riscv.cc (riscv_move_integer): Initialize "x".

RISC-V: Error early with V and no M extension.

For calculating the value of a poly_int at runtime we use a
multiplication instruction that requires the M extension.
Instead of just asserting and ICEing this patch emits an early
error at option-parsing time.

gcc/ChangeLog:

PR target/116036

* config/riscv/riscv.cc (riscv_override_options_internal): Error
with TARGET_VECTOR && !TARGET_MUL.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-31.c: Add m to arch string and expect it.
* gcc.target/riscv/arch-32.c: Ditto.
* gcc.target/riscv/predef-14.c: Ditto.
* gcc.target/riscv/predef-15.c: Ditto.
* gcc.target/riscv/predef-16.c: Ditto.
* gcc.target/riscv/predef-26.c: Ditto.
* gcc.target/riscv/predef-27.c: Ditto.
* gcc.target/riscv/predef-32.c: Ditto.
* gcc.target/riscv/predef-33.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr111486.c: Add m to arch string.
* gcc.target/riscv/compare-debug-1.c: Ditto.
* gcc.target/riscv/compare-debug-2.c: Ditto.
* gcc.target/riscv/rvv/base/pr116036.c: New test.

RISC-V: Reject 'd' extension with ILP32E ABI

Also add a testcase for -mabi=lp64d where 'd' is required.

gcc/ChangeLog:

PR target/116111
* config/riscv/riscv.cc (riscv_option_override): Add error.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-41.c: New test.
* gcc.target/riscv/pr116111.c: New test.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: Correct mode_idx attribute for viwalu wx variants [PR116149].

In PR116149 we choose a wrong vector length which causes wrong values in
a reduction.  The problem happens in avlprop where we choose the
number of units in the instruction's mode as vector length.  For the
non-scalar variants the respective operand has the correct non-widened
mode.  For the scalar variants, however, the same operand has a scalar
mode which obviously only has one unit.  This makes us choose VL = 1
leaving three elements undisturbed (so potentially -1).  Those end up
in the reduction causing the wrong result.

This patch adjusts the mode_idx just for the scalar variants of the
affected instruction patterns.

gcc/ChangeLog:

PR target/116149

* config/riscv/vector.md: Fix mode_idx attribute of scalar
widen add/sub variants.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr116149.c: New test.

[RISC-V][PR target/116240] Ensure object is a comparison before extracting arguments

This was supposed to go out the door yesterday, but I kept getting interrupted.

The target bits for rtx costing can't assume the rtl they're given actually
matches a target pattern.   It's just kind of inherent in how the costing
routines get called in various places.

In this particular case we're trying to cost a conditional move:

(set (dest) (if_then_else (cond) (true) (false)))

On the RISC-V port the backend only allows actual conditionals for COND.  So
something like (eq (reg) (const_int 0)).  In the costing code for if-then-else
we did something like

(XEXP (XEXP (cond, 0), 0))

Which fails miserably if COND is a terminal node like (reg) rather than (ne
(reg) (const_int 0)).

So this patch tightens up the RTL scanning to ensure that we have a comparison
before we start looking at the comparison's arguments.
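
The shape of the check can be shown with a toy expression tree. This is a
sketch only: `Code`, `Node`, `is_comparison` and `comparison_lhs` are
hypothetical stand-ins, not the GCC rtx API. The point is that the operands
are looked at only after the node is known to be a comparison.

```cpp
#include <cassert>

enum class Code { Reg, ConstInt, Eq, Ne };

// Toy stand-in for an RTL node; terminal nodes have no operands.
struct Node {
  Code code;
  const Node* op0 = nullptr;
  const Node* op1 = nullptr;
};

bool is_comparison(const Node& n) {
  return n.code == Code::Eq || n.code == Code::Ne;
}

// Returns the first operand of a comparison, or nullptr when the condition
// is not a comparison (for example a naked register).
const Node* comparison_lhs(const Node& cond) {
  if (!is_comparison(cond))
    return nullptr;  // guard first, never touch operands of a terminal
  assert(cond.op0 != nullptr);
  return cond.op0;
}
```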

Run through my tester without incident, but I'll wait for the pre-commit tester
to run through a cycle before pushing to the trunk.

Jeff

ps.   We probably could support a naked REG for the condition and internally convert it to (ne (reg) (const_int 0)), but I don't think it likely happens with any regularity.

PR target/116240
gcc/
* config/riscv/riscv.cc (riscv_rtx_costs): Ensure object is a
comparison before looking at its arguments.

gcc/testsuite
* gcc.target/riscv/pr116240.c: New test.

RISC-V: Delete duplicate '#define RISCV_DWARF_VLENB'

gcc/ChangeLog:

* config/riscv/riscv.h (RISCV_DWARF_VLENB): Delete.

RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]

This patch fixes the bug (BugId:116305) introduced by the commit
bd93ef for the RISC-V target.

The commit bd93ef changes the chunk_num from 1 to TARGET_MIN_VLEN/128
if TARGET_MIN_VLEN is larger than 128 in riscv_convert_vector_bits. So
it changes the value of BYTES_PER_RISCV_VECTOR. For example, before
merging the commit bd93ef and if TARGET_MIN_VLEN is 256, the value
of BYTES_PER_RISCV_VECTOR was [8, 8], but it is now [16, 16]. The values
of riscv_bytes_per_vector_chunk and BYTES_PER_RISCV_VECTOR are no longer
equal.

Prologue will use BYTES_PER_RISCV_VECTOR.coeffs[1] to estimate the vlenb
register value in riscv_legitimize_poly_move, and dwarf2cfi will also
get the estimated vlenb register value in riscv_dwarf_poly_indeterminate_value
to calculate the number of times to multiply the vlenb register value.

So we need to change the factor from riscv_bytes_per_vector_chunk to
BYTES_PER_RISCV_VECTOR, otherwise we will get incorrect dwarf
information. An incorrect example follows:

```
csrr    t0,vlenb
slli    t1,t0,1
sub     sp,sp,t1

.cfi_escape 0xf,0xb,0x72,0,0x92,0xa2,0x38,0,0x34,0x1e,0x23,0x50,0x22
```

The sequence '0x92,0xa2,0x38,0' means the vlenb register, '0x34' means
the literal 4, '0x1e' means the multiply operation. But in fact, the
vlenb register value only needs to be multiplied by the literal 2.
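
The byte values quoted above can be decoded with a short helper. This is an
illustrative sketch (the helper names are hypothetical); it relies only on the
standard DWARF encodings: ULEB128 for the register operand of DW_OP_bregx
(0x92) and the DW_OP_lit0..DW_OP_lit31 range starting at 0x30. The decoded
register number 7202 equals 4096 + 0xc22, which should correspond to the
vlenb CSR in the RISC-V DWARF register numbering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode one ULEB128 value starting at pos and advance pos past it.
std::uint64_t read_uleb128(const std::vector<std::uint8_t>& bytes,
                           std::size_t& pos) {
  std::uint64_t value = 0;
  unsigned shift = 0;
  while (true) {
    assert(pos < bytes.size() && shift < 64);
    std::uint8_t byte = bytes[pos++];
    value |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0)
      break;  // high bit clear marks the last byte
    shift += 7;
  }
  return value;
}

// Value pushed by a DW_OP_lit<n> opcode (0x30 .. 0x4f).
int dw_op_lit_value(std::uint8_t opcode) {
  assert(opcode >= 0x30 && opcode <= 0x4f);
  return opcode - 0x30;
}
```

Decoding `0xa2,0x38` yields 7202 and `dw_op_lit_value(0x34)` yields 4, which
matches the reading of the escape sequence given in the text.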

PR target/116305

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value): Take
BYTES_PER_RISCV_VECTOR for *factor instead of riscv_bytes_per_vector_chunk.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/scalable_vector_cfi.c: New test.

Signed-off-by: Zhijin Zeng <zhijin.zeng@spacemit.com>