git.ipfire.org Git - thirdparty/gcc.git/log

libstdc++: Fix attribute order on __normal_iterator friends [PR120949]

In r16-1911-g6596f5ab746533 I claimed to have reordered some attributes
for compatibility with Clang, but it looks like I got the Clang
restriction backwards and put them all in the wrong order. Clang trunk
accepts either order (probably since the llvm/llvm-project#133107 fix)
but released versions still require a particular order.

There were also some cases where the attributes were after the friend
keyword, which Clang trunk still rejects.

libstdc++-v3/ChangeLog:

PR libstdc++/120949
* include/bits/stl_iterator.h (__normal_iterator): Fix order of
always_inline and nodiscard attributes for Clang compatibility.

libstdc++: Make VERIFY a variadic macro

This defines the testsuite assertion macro VERIFY so that it allows
un-parenthesized expressions containing commas. This matches how assert
is defined in C++26, following the approval of P2264R7.

The primary motivation is to allow expressions that the preprocessor
splits into multiple arguments, e.g.
VERIFY( vec == std::vector<int>{1,2,3,4} );

To achieve this, VERIFY is redefined as a variadic macro and then the
arguments are grouped together again through the use of __VA_ARGS__.

The implementation is complex due to the following points:

- The arguments __VA_ARGS__ are contextually-converted to bool, so that
  scoped enums and types that are not contextually convertible to bool
  cannot be used with VERIFY.
- bool(__VA_ARGS__) is used so that multiple arguments (i.e. those which
  are separated by top-level commas) are ill-formed. Nested commas are
  allowed, but likely mistakes such as VERIFY( cond, "some string" ) are
  ill-formed.
- The bool(__VA_ARGS__) expression needs to be unevaluated, so that we
  don't evaluate __VA_ARGS__ more than once. The simplest way to do that
  would be just sizeof bool(__VA_ARGS__), without parentheses to avoid a
  vexing parse for VERIFY(bool(i)). However that wouldn't work for e.g.
  VERIFY( []{ return true; }() ), because lambda expressions are not
  allowed in unevaluated contexts until C++20. So we use another
  conditional expression with bool(__VA_ARGS__) as the unevaluated
  operand.

libstdc++-v3/ChangeLog:

* testsuite/util/testsuite_hooks.h (VERIFY): Define as variadic
macro.
* testsuite/ext/verify_neg.cc: New test.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>

libstdc++: Use template keyword in __mapping_of alias template

This is needed to fix an error with Clang 19:

include/c++/16.0.0/mdspan:512:30: error: use 'template' keyword to treat 'mapping' as a dependent template name
512 | is_same_v<typename _Layout::mapping<typename _Mapping::extents_type>,
| ^

libstdc++-v3/ChangeLog:

* include/std/mdspan (__mapping_of): Add template keyword.

Revert "Extend "counted_by" attribute to pointer fields of structures. Convert a pointer reference with counted_by attribute to .ACCESS_WITH_SIZE." due to PR120929.

This reverts commit 687727375769dd41971bad369f3553f1163b3e7a.

Revert "Use the counted_by attribute of pointers in builtinin-object-size." due to PR120929

This reverts commit 7165ca43caf47007f5ceaa46c034618d397d42ec.

Revert "Use the counted_by attribute of pointers in array bound checker." due to PR120929

This reverts commit 9d579c522d551eaa807e438206e19a91a3def67f.

check-function-bodies: Support "^[0-9]+:"

While working on

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120936

I tried to use check-function-bodies to verify that label for mcount
and __fentry__ is only generated by "-pg" if it is used by __mcount_loc
section:

1: call mcount
.section __mcount_loc, "a",@progbits
.quad 1b
.previous

Add "^[0-9]+:" to check-function-bodies to allow:

1: call mcount

PR testsuite/120881
* lib/scanasm.exp (check-function-bodies): Allow "^[0-9]+:".

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

Ignore more clang warnings in contrib/filter-clang-warnings.py

in contrib we have a script filter-clang-warnings.py which supposedly
filters out uninteresting warnings emitted by clang when it compiles
GCC.  I'm not sure if anyone else uses it but our internal SUSE
testing infrastructure does.

Since Martin Liška left, I have mostly ignored the warnings and so
they have multiplied.  In an effort to improve the situation, I have
tried to fix those warnings which I think are worth it and would like
to adjust the filtering script so that we get to zero "interesting"
warnings again.

The changes are the following:

1. Ignore -Woverloaded-shift-op-parentheses warnings.  IIUC, those
   make some sense when << and >> are used for I/O but since that is
   not the case in GCC they are not really interesting.

2. Ignore -Wunused-function and -Wunneeded-internal-declaration.  I
   think it is OK to occasionally prepare APIs before they are used
   (and with our LTO we should be able to get rid of them).

3. Ignore -Wvla-cxx-extension and -Wunused-command-line-argument which
   just don't seem to be useful.

4. Ignore -Wunused-private-field warning in diagnostic-path-output.cc
   which can only be correct if quite a few functions are removed and
   looks like it is just not an oversight:

     gcc/diagnostic-path-output.cc:271:35: warning: private field 'm_logical_loc_mgr' is not used [-Wunused-private-field]

5. Ignore a case in -Wunused-but-set-variable about named_args which
   is used in a piece of code behind an ifdef in ipa-strub.cc.

6. Adjust the gimple-match and generic-match filters to the fact that
   we now have multiple such files.

7. Ignore warnings about using memcpy to copy around wide_ints, like
   the one below.  I seem to remember wide-int has undergone fairly
   rigorous review and TBH I just hope I know what we are doing.

     gcc/wide-int.h:1198:11: warning: first argument in call to 'memcpy' is a pointer to non-trivially copyable type 'wide_int_storage' [-Wnontrivial-memcall]

8. Ignore -Wc++11-narrowing warning reported in omp-builtins.def when
   it is included from JIT.  The code probably has a bigger issue
   described in PR 120960.

9. Since the patch number 14 in the original series did not get
   approved, I assume that private member field m_wanted_type of class
   element_expected_type_with_indirection in c-family/c-format.cc will
   get a use sooner or later, so I ignore a warning about it being
   unused.

10. I have decided to ignore warnings in m2/gm2-compiler-boot about
    unused stuff (all reported unused stuff are variables).  These
    sources are in the build directory so I assume they are somehow
    generated and so warnings about unused things are a bit expected
    and probably not too bad.

11. On the Zulip chat, I have informed Rust folks they have a bunch of
    -Wunused-private-field cases in the FE.  Until they sort it out
    I'm ignoring these.  I might add the missing explicit type-cast
    case here too if it takes time for the patch I'm posting in this
    series to reach master.

12. I ignore warning about use of offsetof in libiberty/sha1.c which is
    apparently only a "C23 extension:"

      libiberty/sha1.c:239:11: warning: defining a type within 'offsetof' is a C23 extension [-Wc23-extensions]
      libiberty/sha1.c:460:11: warning: defining a type within 'offsetof' is a C23 extension [-Wc23-extensions]

13. I have enlarged the list of .texi files where warnings somehow got
    reported.  Not sure why that happens.

14. In analyzer/sm.cc there are several "no-op" methods which have
    named but unused parameters.  It seems this is deliberate and so I
    have filtered the -Wunused-parameter warning for this file.

I have also re-arranged the entries in a way which hopefully makes
somewhat more sense.

Thanks,

Martin

contrib/ChangeLog:

2025-07-07  Martin Jambor  <mjambor@suse.cz>

* filter-clang-warnings.py (skip_warning): Also ignore
-Woverloaded-shift-op-parentheses, -Wunused-function,
-Wunneeded-internal-declaration, -Wvla-cxx-extension', and
-Wunused-command-line-argument everywhere and a warning about
m_logical_loc_mgr in diagnostic-path-output.cc.  Adjust gimple-match
and generic-match "filenames."  Ignore -Wnontrivial-memcall warnings
in wide-int.h, all warnings about unused stuff in files under
m2/gm2-compiler-boot, all -Wunused-private-field in rust FE, in
analyzer/ana-state-to-diagnostic-state.h and c-family/c-format.cc, all
Warnings in avr-mmcu.texi, install.texi and libgccjit.texi and all
-Wc23-extensions warnings in libiberty/sha1.c. Ignore
-Wunused-parameter in analyzer/sm.cc.  Reorder entries.

ranger: Mark three occurrences of verify_range with overide

In line with my previous patches introducing override where clang
warnings indicate that they are missing, this patch adds it to three
new member functions overriding ancestor virtual functions that do not
have them.

Since Andrew has pre-approved such changes for ranger, I am going to
push it to master after bootstrapping it on x86_64-linux.

Thanks,

Martin

gcc/ChangeLog:

2025-07-07 Martin Jambor <mjambor@suse.cz>

* value-range.h (class irange): Mark member function verify_range
with override.
(class prange): Mark member function verify_range with final override.
(class frange): Mark member function verify_range with override.

xtensa: Remove TARGET_PROMOTE_PROTOTYPES

xtensa ABI requires sign extension of signed 8/16-bit arguments to 32
bits and zero extension of unsigned 8/16-bit arguments to 32 bits.
TARGET_PROMOTE_PROTOTYPES is an optimization, not an ABI requirement.
Remove TARGET_PROMOTE_PROTOTYPES and define xtensa_promote_function_mode
to properly extend 8/16-bit arguments to 32 bits.

gcc/

PR target/120888
* config/xtensa/xtensa.cc (xtensa_promote_function_mode): New.
(TARGET_PROMOTE_FUNCTION_MODE): Use.
(TARGET_PROMOTE_PROTOTYPES): Removed.

gcc/testsuite/

PR target/120888
* gcc.target/xtensa/pr120888-1.c: New test.
* gcc.target/xtensa/pr120888-2.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

s390: Optimize fmin/fmax.

On VXE targets, we can directly use the fp min/max instruction instead of
calling into libm for fmin/fmax etc.

Provide fmin/fmax versions also for vectors even though it cannot be
called directly. This will be exploited with a follow-up patch when
reductions are introduced.

gcc/ChangeLog:

* config/s390/s390.md: Update UNSPECs
* config/s390/vector.md (fmax<mode>3): New expander.
(fmin<mode>3): New expander.
* config/s390/vx-builtins.md (*fmin<mode>): New insn.
(vfmin<mode>): Redefined to use new insn.
(*fmax<mode>): New insn.
(vfmax<mode>): Redefined to use new insn.

gcc/testsuite/ChangeLog:

* gcc.target/s390/fminmax-1.c: New test.
* gcc.target/s390/fminmax-2.c: New test.

Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>

testsuite: add sve hw check to testcase [PR120817]

Drop down from SVE2 to SVE1 as that's the minimum
required for the test, and since it's a mid-end test
add the aarch64_sve_hw check.

gcc/testsuite/ChangeLog:

PR tree-optimization/120817
* gcc.dg/vect/pr120817.c: Add SVE HW check.

c++: Fix FMV return type ambiguation

Add logic for the case of two FMV annotated functions with identical
signature other than the return type.

Previously this was ignored, this changes the behavior to emit a diagnostic.

gcc/cp/ChangeLog:
PR c++/119498
* decl.cc (duplicate_decls): Change logic to not always exclude FMV
annotated functions in cases of return type non-ambiguation.

gcc/testsuite/ChangeLog:
PR c++/119498
* g++.target/aarch64/pr119498.C: New test.

c++: -Wno-abbreviated-auto-in-template-arg [PR120917]

In r14-1659 I added a missing error for a Concepts TS feature that we were
failing to diagnose, but this PR requests a way to disable that error for
code written thinking it was valid. Which seems reasonable, since it
doesn't require any work beyond that and is a plausible extension by itself.

While looking at this, I also noticed we were still not giving the
diagnostic in a few cases, and fixing that affected a few of our old
concepts testcases.

PR c++/120917

gcc/ChangeLog:

* doc/invoke.texi: Add -Wno-abbreviated-auto-in-template-arg.

gcc/c-family/ChangeLog:

* c.opt: Add -Wno-abbreviated-auto-in-template-arg.
* c.opt.urls: Regenerate.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_simple_type_specifier): Attach
auto in targ in parameter to -Wabbreviated-auto-in-template-arg.
(cp_parser_placeholder_type_specifier): Diagnose constrained auto in
template arg.

gcc/testsuite/ChangeLog:

* g++.dg/concepts/auto7a.C: Add diagnostic.
* g++.dg/concepts/auto7b.C: New test.
* g++.dg/concepts/auto7c.C: New test.
* g++.dg/cpp1y/pr85076.C: Expect 'auto' error.
* g++.dg/concepts/pr67249.C: Likewise.
* g++.dg/cpp1y/lambda-generic-variadic.C: Likewise.
* g++.dg/cpp2a/concepts-pr67210.C: Likewise.
* g++.dg/concepts/pr67249a.C: New test.
* g++.dg/cpp1y/lambda-generic-variadic-a.C: New test.
* g++.dg/cpp2a/concepts-pr67210a.C: New test.

aarch64: Improve popcountti2 with SVE

The TImode popcount sequence can be slightly improved with SVE.
If we generate:
        ldr     q31, [x0]
        ptrue   p7.b, vl16
        cnt     z31.d, p7/m, z31.d
        addp    d31, v31.2d
        fmov    x0, d31
        ret

instead of:
h128:
        ldr     q31, [x0]
        cnt     v31.16b, v31.16b
        addv    b31, v31.16b
        fmov    w0, s31
        ret

we use the ADDP instruction for reduction, which is cheaper on all CPUs AFAIK,
as it is only a single 64-bit addition vs the tree of additions for ADDV.
For example, on a CPU like Grace we get a latency and throughput of 2,4 vs 4,1
for ADDV.
We do generate one more instruction due to the PTRUE being materialised, but that
is cheap itself and can be scheduled away from the critical path or even CSE'd
with other PTRUE constants.
As this sequence is larger code size-wise it is avoided for -Os.

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/

* config/aarch64/aarch64.md (popcountti2): Add TARGET_SVE path.

gcc/testsuite/

* gcc.target/aarch64/popcnt9.c: Add +nosve to target pragma.
* gcc.target/aarch64/popcnt13.c: New test.

libstdc++: Format chrono %a/%A/%b/%h/%B/%p using locale's time_put [PR117214]

C++ formatting locale could have a custom time_put that performs
differently from the C locale, so do not use __timepunct directly,
instead all of above specifiers use _M_locale_fmt.

For %a/%A/%b/%h/%B, the code handling the exception is now moved
to the _M_check_ok function, that is invoked before handling of the
conversion specifier. For time_points the values of months/weekday
are computed, and thus are always ok(), this information is indicated
by new _M_time_point member of the _ChronoSpec.

The different behavior of j specifier for durations and time_points/calendar
types, is now handled using only _ChronoParts, and _M_time_only in _ChronoSpec
is no longer needed, thus it was removed.

PR libstdc++/117214

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (_ChronoSpec::_M_time_only): Remove.
(_ChronoSpec::_M_time_point): Define.
(__formatter_chrono::_M_parse): Use __parts to determine
interpretation of j.
(__formatter_chrono::_M_check_ok): Define.
(__formatter_chrono::_M_format_to): Invoke _M_check_ok.
(__formatter_chrono::_M_a_A, __formatter_chrono::_M_b_B): Move
exception throwing to _M_check_ok.
(__formatter_chrono::_M_j): Use _M_needs to define interpretation.
(__formatter_duration::_S_spec_for): Set _M_time_point.
* testsuite/std/time/format/format.cc: Test for exception for !ok()
months/weekday.
* testsuite/std/time/format/pr117214_custom_timeput.cc: New
test.

Co-authored-by: Tomasz Kaminski <tkaminsk@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: XU Kailiang <xu2k3l4@outlook.com>
Signed-off-by: Tomasz Kaminski <tkaminsk@redhat.com>

tree-optimization/120817 - bogus DSE of .MASK_STORE

DSE used ao_ref_init_from_ptr_and_size for .MASK_STORE but
alias-analysis will use the specified size to disambiguate
against smaller objects. For .MASK_STORE we instead have to
make the access size unspecified but we can still constrain
the access extent based on the maximum size possible.

PR tree-optimization/120817
* tree-ssa-dse.cc (initialize_ao_ref_for_dse): Use
ao_ref_init_from_ptr_and_range with unknown size for
.MASK_STORE and .MASK_LEN_STORE.

* gcc.dg/vect/pr120817.c: New testcase.

RISC-V: Add test cases for unsigned scalar SAT_MUL from uint128_t

Add run and tree-optimized check for unsigned scalar SAT_MUL from
uint128_t.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat/sat_arith_data.h: Add test data for
run test.
* gcc.target/riscv/sat/sat_u_mul-1-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-1-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-1-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-1-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-1-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-1-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-1-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-1-u8-from-u128.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Implement unsigned scalar SAT_MUL from uint128_t

This patch would like to implement the SAT_MUL scalar unsigned from
uint128_t, aka:

  NT __attribute__((noinline))
  sat_u_mul_##NT##_fmt_1 (NT a, NT b)
  {
    uint128_t x = (uint128_t)a * (uint128_t)b;
    NT max = -1;
    if (x > (uint128_t)(max))
      return max;
    else
      return (NT)x;
  }

Take uint64_t and uint8_t as example:

Before this patch for uint8_t:
  10   │ sat_u_mul_uint8_t_from_uint128_t_fmt_1:
  11   │     mulhu   a5,a0,a1
  12   │     mul a0,a0,a1
  13   │     bne a5,zero,.L3
  14   │     li  a5,255
  15   │     bleu    a0,a5,.L4
  16   │ .L3:
  17   │     li  a0,255
  18   │ .L4:
  19   │     andi    a0,a0,0xff
  20   │     ret

After this patch for uint8_t:
  10   │ sat_u_mul_uint8_t_from_uint128_t_fmt_1:
  11   │     mul a0,a0,a1
  12   │     li  a5,255
  13   │     sltu    a5,a5,a0
  14   │     neg a5,a5
  15   │     or  a0,a0,a5
  16   │     andi    a0,a0,0xff
  17   │     ret

Before this patch for uint64_t:
  10   │ sat_u_mul_uint64_t_from_uint128_t_fmt_1:
  11   │     mulhu   a5,a0,a1
  12   │     mul a0,a0,a1
  13   │     beq a5,zero,.L4
  14   │     li  a0,-1
  15   │ .L4:
  16   │     ret

After this patch for uint64_t:
  10   │ sat_u_mul_uint64_t_from_uint128_t_fmt_1:
  11   │     mulhsu  a5,a1,a0
  12   │     mul a0,a0,a1
  13   │     snez    a5,a5
  14   │     neg a5,a5
  15   │     or  a0,a0,a5
  16   │     ret

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_usmul): Add new func
decl.
* config/riscv/riscv.cc (riscv_expand_xmode_usmul): Add new func
to expand Xmode SAT_MUL.
(riscv_expand_non_xmode_usmul): Ditto but for non-Xmode.
(riscv_expand_usmul): Add new func to implment SAT_MUL.
* config/riscv/riscv.md (usmul<mode>3): Add new pattern to match
standard name usmul.

Signed-off-by: Pan Li <pan2.li@intel.com>

Widening-Mul: Support unsigned scalar SAT_MUL form 1

This patch would like to try to match the SAT_MUL during
widening-mul pass, aka below pattern.

  NT __attribute__((noinline))
  sat_u_mul_##NT##_fmt_1 (NT a, NT b)
  {
    uint128_t x = (uint128_t)a * (uint128_t)b;
    NT max = -1;
    if (x > (uint128_t)(max))
      return max;
    else
      return (NT)x;
  }

while the NT can be uint8_t, uint16_t, uint32_t and uint64_t.

gcc/ChangeLog:

* match.pd: Add new match pattern for unsigned SAT_MUL.
* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_mul):
new decl for pattern match func.
(match_unsigned_saturation_mul): Add new func to match unsigned
SAT_MUL.
(math_opts_dom_walker::after_dom_children): Try to match
unsigned SAT_MUL on NOP.

Signed-off-by: Pan Li <pan2.li@intel.com>

Internal-fn: Introduce new IFN_SAT_MUL for unsigned int

This patch would like to add the middle-end presentation for the
unsigend saturation mul.  Aka set the result of mul to the max
when overflow.

Take uint8_t as example, we will have:

* SAT_MUL (1, 127)   => 127.
* SAT_MUL (2, 127)   => 254.
* SAT_MUL (3, 127)   => 255.
* SAT_MUL (255, 127) => 255.

Given below example for uint16_t from uint128_t

  #define DEF_SAT_U_MUL_FMT_1(NT, WT)             \
  NT __attribute__((noinline))                    \
  sat_u_mul_##NT##_from_##WT##_fmt_1 (NT a, NT b) \
  {                                               \
    WT x = (WT)a * (WT)b;                         \
    NT max = -1;                                  \
    if (x > (WT)(max))                            \
      return max;                                 \
    else                                          \
      return (NT)x;                               \
  }

  DEF_SAT_U_MUL_FMT_1(uint16_t, uint128_t)

Before this patch:
  15   │   <bb 2> [local count: 1073741824]:
  16   │   _1 = (__int128 unsigned) a_4(D);
  17   │   _2 = (__int128 unsigned) b_5(D);
  18   │   _9 = (unsigned long) _1;
  19   │   _10 = (unsigned long) _2;
  20   │   x_6 = _9 w* _10;
  21   │   _7 = MIN_EXPR <x_6, 255>;
  22   │   _3 = (uint8_t) _7;
  23   │   return _3;

After this patch:
   9   │   <bb 2> [local count: 1073741824]:
  10   │   _3 = .SAT_MUL (a_4(D), b_5(D)); [tail call]
  11   │   return _3;

gcc/ChangeLog:

* internal-fn.cc (commutative_binary_fn_p): Add new case
for SAT_MUL.
* internal-fn.def (SAT_MUL): Add new IFN_SAT_MUL.
* optabs.def (OPTAB_NL): Remove fixed point limitation.

Signed-off-by: Pan Li <pan2.li@intel.com>

Ada: Reapply tweaks to delay statements in ACATS 3&4 testsuites

They had originally been applied to the ACATS 2 testsuite and I forgot to
reapply them to the ACATS 4 testsuite altogether.

gcc/testsuite/
* ada/acats-3/tests/c9/c94001c.ada: Tweak delay statements.
* ada/acats-4/tests/c9/c94001c.ada: Likewise.
* ada/acats-4/tests/c9/c94006a.ada: Likewise.
* ada/acats-4/tests/c9/c94008c.ada: Likewise.
* ada/acats-4/tests/c9/c951002.a: Likewise.
* ada/acats-4/tests/c9/c954a01.a: Likewise.
* ada/acats-4/tests/c9/c940005.a: Tweak duration constants.
* ada/acats-4/tests/c9/c940007.a: Likewise.
* ada/acats-4/tests/c9/c96001a.ada: Likewise.

libstdc++: Format __float128 as _Float128 only when long double is not 128 IEEE [PR120976]

For powerpc64 and sparc architectures that both have __float128 and 128bit long double,
the __float128 is same type as long double/__ieee128 and already formattable.

The remaining specialization makes __float128 formattable on x86_64 via _Float128,
however __float128 is now not formattable on x86_32 (-m32) with -mlong-double-128,
where __float128 is distinct type from long double that is 128bit IEEE.

PR libstdc++/120976

libstdc++-v3/ChangeLog:

* include/std/format (formatter<__float128, _Char_T): Define if
_GLIBCXX_FORMAT_F128 == 2.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

s390: Add some missing vector patterns.

Some patterns that are detected by the autovectorizer can be supported by
s390. Add expanders such that autovectorization of these patterns works.

RTL for the builtins used unspec to represent highpart multiplication.
Replace this by the correct RTL to allow further simplification.

gcc/ChangeLog:

* config/s390/s390.md: Removed unused unspecs.
* config/s390/vector.md (avg<mode>3_ceil): New expander.
(uavg<mode>3_ceil): New expander.
(smul<mode>3_highpart): New expander.
(umul<mode>3_highpart): New expander.
* config/s390/vx-builtins.md (vec_umulh<mode>): Remove unspec.
(vec_smulh<mode>): Remove unspec.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/pattern-avg-1.c: New test.
* gcc.target/s390/vector/pattern-mulh-1.c: New test.

Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>

Update maintainers file

Update MAINTAINERS file to include myself in AArch64 port.

ChangeLog:

* MAINTAINERS: Add myself to AArch64 pot.

aarch64: Add support for unpacked SVE FP comparisons

This patch extends our vec_cmp expander to support partial FP modes.

We use a predicate mode that is narrower the operation's VPRED to govern
unpacked FP operations under flag_trapping_math, so the expansion must
handle cases where the comparison's target and governing predicates have
different modes.

While such predicates enable all of the defined part of the operation, they
are not all-true. Their false bits contribute to the (trapping) behavior of
the operation, so we cannot have SVE_KNOWN_PTRUE.

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md (vec_cmp<mode><vpred>): Extend
to handle partial FP modes.
(@aarch64_pred_fcm<cmp_op><mode>): Likewise.
(@aarch64_pred_fcmuo<mode>): Likewise.
(*one_cmpl<mode>3): Rename to...
(@aarch64_pred_one_cmpl<mode>_z): ... this.
* config/aarch64/aarch64.cc (aarch64_emit_sve_fp_cond): Allow the
target and governing predicates to have different modes.
(aarch64_emit_sve_or_fp_conds): Likewise.
(aarch64_emit_sve_invert_fp_cond): Likewise.
(aarch64_expand_sve_vec_cmp_float): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/unpacked_fcm_1.c: New test.
* gcc.target/aarch64/sve/unpacked_fcm_2.c: Likewise.

vect: Fix VEC_WIDEN_PLUS_HI/LO choice for big-endian [PR118891]

In the tree codes and optabs, the "hi" in a vector hi/lo pair means
"most significant" and the "lo" means "least significant", with
sigificance following GCC's normal endian expectations.  Thus on
big-endian targets, the hi part handles the first half of the elements
in memory order and the lo part handles the second half.

For tree codes, supportable_widening_operation first chooses hi/lo
pairs based on little-endian order and then uses:

  if (BYTES_BIG_ENDIAN && c1 != VEC_WIDEN_MULT_EVEN_EXPR)
    std::swap (c1, c2);

to adjust.  However, the handling for internal functions was missing
an equivalent fixup.  This led to several execution failures in vect.exp
on aarch64_be-elf.

If the hi/lo code fails, the internal function handling goes on to try
even/odd.  But I couldn't see anything obvious that would put the even/
odd results back into the right order later, so there might be a latent
bug there too.

gcc/
PR tree-optimization/118891
* tree-vect-stmts.cc (supportable_widening_operation): Swap the
hi and lo internal functions on big-endian targets.

ext-dce: Fix subreg_lsb is_constant assumption

ext-dce had:

  if (SUBREG_P (dst) && SUBREG_BYTE (dst).is_constant ())
    {
      bit = subreg_lsb (dst).to_constant ();
      if (bit >= HOST_BITS_PER_WIDE_INT)
bit = HOST_BITS_PER_WIDE_INT - 1;
      dst = SUBREG_REG (dst);

But a constant SUBREG_BYTE doesn't guarantee a constant subreg_lsb.
If the SUBREG_REG is a pair of N-bit registers on a big-endian target,
the most significant end has a SUBREG_BYTE of 0 but a subreg_lsb of N.
This N would then be non-constant for variable-length registers.

The patch fixes gcc.dg/torture/pr120276.c and other failures on
aarch64_be-elf.

gcc/
* ext-dce.cc (ext_dce_process_uses): Apply is_constant directly
to the subreg_lsb.

aarch64: Fix neon-sve-bridge.c failures for big-endian

Lowpart subregs are generally disallowed on big-endian SVE vector
registers, since the first memory element is stored at the least
significant end of the register, rather than the most significant end.
(See the comment at the head of aarch64-sve.md for details,
and aarch64_modes_compatible_p for the implementation.)

This means that arm_sve_neon_bridge.h needs to use custom define_insns
for big-endian targets, in lieu of using lowpart subregs.  However,
one of those define_insns relied on the prohibited lowparts internally,
to convert an Advanced SIMD register to an SVE register.  Since the
lowpart is not allowed, the lowpart_subreg would return null, leading
to a later ICE.

The simplest fix seems to be to use %Z instead, to force the Advanced
SIMD register to be written as an SVE register.

gcc/
* config/aarch64/aarch64-sve.md (@aarch64_sve_set_neonq_<mode>):
Use %Z instead of lowpart_subreg.  Tweak formatting.

aarch64: Fix ZIP1 order in aarch64_expand_vector_init [PR118891]

aarch64_expand_vector_init contains some divide-and-conquer code
that tries to load the odd and even elements into 64-bit registers
and then ZIP them together. On big-endian targets, the even elements
are more significant than the odd elements and so should come second
in the ZIP.

This fixes many execution failures on aarch64_be-elf, including
gcc.c-torture/execute/pr28982a.c.

gcc/
PR target/118891
* config/aarch64/aarch64.cc (aarch64_expand_vector_init): Fix the
ZIP1 operand order for big-endian targets.

Print discriminators in dump_scope_block

gcc/ChangeLog:

* tree-ssa-live.cc (dump_scope_block): Print discriminators
of inlined functions.

x86: Improve vector_loop/unrolled_loop for memset/memcpy

1. Don't generate the loop if the loop count is 1.
2. For memset with vector on small size, use vector if small size supports
vector, otherwise use the scalar value.
3. Always expand vector-version of memset for vector_loop.
4. Always duplicate the promoted scalar value for vector_loop if not 0 nor
-1.
5. Use misaligned prologue if alignment isn't needed.  When misaligned
prologue is used, check if destination is actually aligned and update
destination alignment if aligned.
6. Use move_by_pieces and store_by_pieces for memcpy and memset epilogues
with the fixed epilogue size to enable overlapping moves and stores.

The included tests show that codegen of vector_loop/unrolled_loop for
memset/memcpy are significantly improved.  For

void
foo (void *p1, size_t len)
{
  __builtin_memset (p1, 0, len);
}

with

-O2 -minline-all-stringops -mmemset-strategy=vector_loop:256:noalign,libcall:-1:noalign -march=x86-64

we used to generate

foo:
.LFB0:
.cfi_startproc
movq %rdi, %rax
pxor %xmm0, %xmm0
cmpq $64, %rsi
jnb .L18
.L2:
andl $63, %esi
je .L1
xorl %edx, %edx
testb $1, %sil
je .L5
movl $1, %edx
movb $0, (%rax)
cmpq %rsi, %rdx
jnb .L19
.L5:
movb $0, (%rax,%rdx)
movb $0, 1(%rax,%rdx)
addq $2, %rdx
cmpq %rsi, %rdx
jb .L5
.L1:
ret
.p2align 4,,10
.p2align 3
.L18:
movq %rsi, %rdx
xorl %eax, %eax
andq $-64, %rdx
.L3:
movups %xmm0, (%rdi,%rax)
movups %xmm0, 16(%rdi,%rax)
movups %xmm0, 32(%rdi,%rax)
movups %xmm0, 48(%rdi,%rax)
addq $64, %rax
cmpq %rdx, %rax
jb .L3
addq %rdi, %rax
jmp .L2
.L19:
ret
.cfi_endproc

with very poor prologue/epilogue.  With this patch, we now generate:

foo:
.LFB0:
.cfi_startproc
pxor %xmm0, %xmm0
cmpq $64, %rsi
jnb .L2
testb $32, %sil
jne .L19
testb $16, %sil
jne .L20
testb $8, %sil
jne .L21
testb $4, %sil
jne .L22
testq %rsi, %rsi
jne .L23
.L1:
ret
.p2align 4,,10
.p2align 3
.L2:
movups %xmm0, -64(%rdi,%rsi)
movups %xmm0, -48(%rdi,%rsi)
movups %xmm0, -32(%rdi,%rsi)
movups %xmm0, -16(%rdi,%rsi)
subq $1, %rsi
cmpq $64, %rsi
jb .L1
andq $-64, %rsi
xorl %eax, %eax
.L9:
movups %xmm0, (%rdi,%rax)
movups %xmm0, 16(%rdi,%rax)
movups %xmm0, 32(%rdi,%rax)
movups %xmm0, 48(%rdi,%rax)
addq $64, %rax
cmpq %rsi, %rax
jb .L9
ret
.p2align 4,,10
.p2align 3
.L23:
movb $0, (%rdi)
testb $2, %sil
je .L1
xorl %eax, %eax
movw %ax, -2(%rdi,%rsi)
ret
.p2align 4,,10
.p2align 3
.L19:
movups %xmm0, (%rdi)
movups %xmm0, 16(%rdi)
movups %xmm0, -32(%rdi,%rsi)
movups %xmm0, -16(%rdi,%rsi)
ret
.p2align 4,,10
.p2align 3
.L20:
movups %xmm0, (%rdi)
movups %xmm0, -16(%rdi,%rsi)
ret
.p2align 4,,10
.p2align 3
.L21:
movq $0, (%rdi)
movq $0, -8(%rdi,%rsi)
ret
.p2align 4,,10
.p2align 3
.L22:
movl $0, (%rdi)
movl $0, -4(%rdi,%rsi)
ret
.cfi_endproc

gcc/

PR target/120670
PR target/120683
* config/i386/i386-expand.cc (expand_set_or_cpymem_via_loop):
Don't generate the loop if the loop count is 1.
(expand_cpymem_epilogue): Use move_by_pieces.
(setmem_epilogue_gen_val): New.
(expand_setmem_epilogue): Use store_by_pieces.
(expand_small_cpymem_or_setmem): Choose cpymem mode from MOVE_MAX.
For memset with vector and the size is smaller than the vector
size, first try the narrower vector, otherwise, use the scalar
value.
(promote_duplicated_reg): Duplicate the scalar value for vector.
(ix86_expand_set_or_cpymem): Always expand vector-version of
memset for vector_loop.  Use misaligned prologue if alignment
isn't needed and destination isn't aligned.  Always initialize
vec_promoted_val from the promoted scalar value for vector_loop.

gcc/testsuite/

PR target/120670
PR target/120683
* gcc.target/i386/auto-init-padding-9.c: Updated.
* gcc.target/i386/memcpy-strategy-12.c: Likewise.
* gcc.target/i386/memset-strategy-25.c: Likewise.
* gcc.target/i386/memset-strategy-29.c: Likewise.
* gcc.target/i386/memset-strategy-30.c: Likewise.
* gcc.target/i386/memset-strategy-31.c: Likewise.
* gcc.target/i386/memcpy-pr120683-1.c: New test.
* gcc.target/i386/memcpy-pr120683-2.c: Likewise.
* gcc.target/i386/memcpy-pr120683-3.c: Likewise.
* gcc.target/i386/memcpy-pr120683-4.c: Likewise.
* gcc.target/i386/memcpy-pr120683-5.c: Likewise.
* gcc.target/i386/memcpy-pr120683-6.c: Likewise.
* gcc.target/i386/memcpy-pr120683-7.c: Likewise.
* gcc.target/i386/memset-pr120683-1.c: Likewise.
* gcc.target/i386/memset-pr120683-2.c: Likewise.
* gcc.target/i386/memset-pr120683-3.c: Likewise.
* gcc.target/i386/memset-pr120683-4.c: Likewise.
* gcc.target/i386/memset-pr120683-5.c: Likewise.
* gcc.target/i386/memset-pr120683-6.c: Likewise.
* gcc.target/i386/memset-pr120683-7.c: Likewise.
* gcc.target/i386/memset-pr120683-8.c: Likewise.
* gcc.target/i386/memset-pr120683-9.c: Likewise.
* gcc.target/i386/memset-pr120683-10.c: Likewise.
* gcc.target/i386/memset-pr120683-11.c: Likewise.
* gcc.target/i386/memset-pr120683-12.c: Likewise.
* gcc.target/i386/memset-pr120683-13.c: Likewise.
* gcc.target/i386/memset-pr120683-14.c: Likewise.
* gcc.target/i386/memset-pr120683-15.c: Likewise.
* gcc.target/i386/memset-pr120683-16.c: Likewise.
* gcc.target/i386/memset-pr120683-17.c: Likewise.
* gcc.target/i386/memset-pr120683-18.c: Likewise.
* gcc.target/i386/memset-pr120683-19.c: Likewise.
* gcc.target/i386/memset-pr120683-20.c: Likewise.
* gcc.target/i386/memset-pr120683-21.c: Likewise.
* gcc.target/i386/memset-pr120683-22.c: Likewise.
* gcc.target/i386/memset-pr120683-23.c: Likewise.

c++: Pedwarn on invalid decl specifiers for for-range-declaration [PR84009]

https://eel.is/c++draft/stmt.ranged#2
says that in for-range-declaration only type-specifier or constexpr
can appear. As the following testcases show, we've emitted some
diagnostics in most cases, but not for static/thread_local (the patch
handles __thread too) and register in the non-sb case.
For extern there was an error that it is both extern and has an
initializer (again, non-sb only, sb errors on extern).

The following patch diagnoses those cases with pedwarn.
I've used for-range-declaration in the diagnostics wording (there was
already a case of that for the typedef), so that in the future
we don't need to differentiate it between range for and expansion
statements.

2025-07-07 Jakub Jelinek <jakub@redhat.com>

PR c++/84009
* parser.cc (cp_parser_decomposition_declaration): Pedwarn
on thread_local, __thread or static in decl_specifiers for
for-range-declaration.
(cp_parser_init_declarator): Likewise, and also for extern
or register.

* g++.dg/cpp0x/range-for40.C: New test.
* g++.dg/cpp0x/range-for41.C: New test.
* g++.dg/cpp0x/range-for42.C: New test.
* g++.dg/cpp0x/range-for43.C: New test.

fortran: Add the preliminary code of MOVE_ALLOC arguments

Add the preliminary code produced for the evaluation of the FROM and TO
arguments of the MOVE_ALLOC intrinsic before using their values.
Before this change, the preliminary code was ignored and dropped,
limiting the validity of the implementation of MOVE_ALLOC to simple
cases without preliminary code.

This change also adds the cleanup code of the same arguments.  It
doesn't make any difference on the testcase though.  Because of the
limited set of arguments that are allowed (variables or components
without subreference), it is possible that the cleanup code is actually
guaranteed to be empty.  At least adding the cleanup code makes the
array case consistent with the scalar case.

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (conv_intrinsic_move_alloc): Add pre and
post code for the FROM and TO arguments.

gcc/testsuite/ChangeLog:

* gfortran.dg/move_alloc_20.f03: New test.

crc: Error out on non-constant poly arguments for the crc builtins [PR120709]

These builtins requires a constant integer for the third argument but currently
there is assert rather than error. This fixes that and updates the documentation too.
Uses the same terms as was being used for the __builtin_prefetch arguments.

Bootstrapped and tested on x86_64-linux-gnu.

PR middle-end/120709

gcc/ChangeLog:

* builtins.cc (expand_builtin_crc_table_based): Error out
instead of asserting the 3rd argument is an integer constant.
* internal-fn.cc (expand_crc_optab_fn): Likewise.
* doc/extend.texi (crc): Document requirement of the poly argument
being a constant.

gcc/testsuite/ChangeLog:

* gcc.dg/crc-non-cst-poly-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Daily bump.

libstdc++: Implement ranges::shift_left/right from P2440R1

The implementation is just a copy of std::shift_left/right with the
following changes:

- check bidirectional_iterator instead of iterator_category
- cope with __last being a distinct sentinel type
- for shift_left, return the subrange {__first, X} instead of X
- for shift_right, return the subrange {X, ranges::next(__first, __last)}
   instead of X
- use the ranges:: versions of move_backward, move and iter_swap
- don't bother std::move'ing any iterators, it's unnecessary since all
   iterators are forward, it's visually noisy, and in earlier versions
   of this patch it introduced subtle use-after-move bugs

In passing also use the __glibcxx_shift macro to guard the
std::shift_left/right implementations.

libstdc++-v3/ChangeLog:

* include/bits/ranges_algo.h (shift_left, shift_right): Guard
with __glibcxx_shift >= 201806L.
(ranges::__shift_left_fn, ranges::shift_left): Define for C++23.
(ranges::__shift_right_fn, ranges::shift_right): Likewise.
* include/bits/version.def (shift): Update for C++23.
* include/bits/version.h: Regenerate.
* src/c++23/std.cc.in: Add ranges::shift_left/right.
* testsuite/25_algorithms/shift_left/constrained.cc: New test,
based off of 1.cc.
* testsuite/25_algorithms/shift_right/constrained.cc: New test,
based off of 1.cc.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

AVR: Fix a typo in avr-mcus.def.

gcc/
* config/avr/avr-mcus.def: -mmcu= takes lower case MCU names.
* doc/avr-mmcu.texi: Rebuild.

AVR: Add support for AVR32DAxxS, AVR64DAxxS, AVR128DAxxS devices.

gcc/
* config/avr/avr-mcus.def (avr32da28S, avr32da32S, avr32da48S)
(avr64da28S, avr64da32S, avr64da48S avr64da64S)
(avr128da28S, avr128da32S, avr128da48S, avr128da64S): Add devices.
* doc/avr-mmcu.texi: Rebuild.

cdce: Fix non-call exceptions with signaling nans [PR120951]

The cdce code introduces a test for a NaN using the EQ_EXPR code.
The problem is EQ_EXPR can cause an exception with non-call exceptions
and signaling nans turned on. This is now correctly rejected by the verfier
since r16-241-g4c40e3d7b9152f.
The fix is seperate out the comparison into its own statement from the GIMPLE_COND.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/120951

gcc/ChangeLog:

* tree-call-cdce.cc (use_internal_fn): For non-call exceptions
with EQ_EXPR can throw for floating point types, then create
the EQ_EXPR seperately.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr120951-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

tree-cfg: Reject constants and addr on lhs for assign single [PR120921]

For GIMPLE_SINGLE_RHS, we don't currently test LHS for some invalid cases.
In this case all constants and ADDR_EXPR should be invalid on the LHS.
Also for vector (non-empty) constructors, the LHS needs to be an is_gimple_reg.

This adds the checks.
Also this fixes the following gimple testcase so it no longer ICEs, all functions are correctly rejected:
```
typedef vector int vi;

void __GIMPLE b1(int t) {
  1 = t;
}
void __GIMPLE b2() {
  1 = 2;
}
void __GIMPLE b3(int t, int t1) {
  &t1 = &t;
}
void __GIMPLE b4(vi t, vi t1) {
  _Literal(vi){0} = _Literal(vi){1};
}
void __GIMPLE b5(vi t, vi t1) {
  _Literal(vi){t} = _Literal(vi){t1};
}
```

Bootstrapped and tested on x86_64-linux-gnu.

PR middle-end/120921
gcc/ChangeLog:

* tree-cfg.cc (verify_gimple_assign_single): Reject constant and address expression LHS.
For non-empty vector constructors, make sure the LHS is an is_gimple_reg.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Add cutoff information to profile_info and use it when forcing non-zero value

Main difference between normal profile feedback and auto-fdo is that with profile
feedback every basic block with non-zero profile has an incomming edge with non-zero
profile.  With auto-profile it is possible that none of predecessors was sampled
and also the tool has cutoff parameter which makes it to ignore small counts.

This becomes a problem when one tries to specialize code and scale profile.
For exmaple if inline function happens to have hot loop with non-zero counts
but its entry count has zero counts and we want to inline to zero counts and we
want to inline to a call with a non-zero count X, we want to scale the body by
X/0 which we currently turn into X/1.

This is a problem since I added logic to scale up the auto-profiles (to get
some extra bits of precision) so X is often a large value and multiplying by X
is not a right answer at all.  The multiply factor should be <= 1.

Iterating this few times will make counts to cap and we will lost any useful info.
Original implementation avoided this by doing all inlines before AFDO readback,
bit this is not possible with LTO (unless we move AFDO readback to WPA or add
support for context sensitive profiles).  I think I can get the scaling work
reasonably well and then we can look into possible benefits of context sensitive
profiling which can be implemented both atop of AFDO as well as FDO.

This patch adds cutoff value to profile_info which is initialized by profile
feedback to 1 and by auto-profile to the scale factor (since we do not know the
cutoff create_gcov used; llvm's tool streams it and we probably should too).
Then force_nonzero forces every value smaller than cutoff/2 to cutoff/2 which
should keep scaling factors in reasonable ranges.

gcc/ChangeLog:

* auto-profile.cc
(autofdo_source_profile::read): Scale cutoff.
(read_autofdo_file): Initialize cutoff
* coverage.cc (read_counts_file): Initialize cutoff to 1.
* gcov-io.h (struct gcov_summary): Add cutoff field.
* ipa-inline.cc (inline_small_functions): mac_count can be non-zero
also with auto_profile.
* lto-cgraph.cc (output_profile_summary): Write cutoff
and sum_max.
(input_profile_summary): Read cutoff and sum max.
(merge_profile_summaries): Initialize and scale global cutoffs
and sum max.
* profile-count.cc: Include profile.h
(profile_count::force_nonzero): move here from ...; use cutoff.
* profile-count.h: (profile_count::force_nonzero): ... here.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-prof/clone-merge-1.c:

Fix overflow check in profile_count::operator* (const sreal &num).

gcc/ChangeLog:

* profile-count.cc (profile_count::operator*): fix overflow check.

Daily bump.

[vxworks] [ppc] match TARGET_VXWORKS64 to TARGET_64BIT

Configuring gcc for --target=powerpc-wrs-vxworks7r2 sets things up for
a 64-bit compiler, just like powerpc64-wrs-vxworks7r2, except that
TARGET_VXWORKS64 is only defined as 1 for targets that match
*64-*-vxworks*.

With !TARGET_VXWORKS64, we get a 64-bit toolchain that defines
SIZE_TYPE, PTRDIFF_TYPE, and WCHAR_TYPE as 32-bit types, and that
breaks GCC passes that expect SIZE_TYPE and PTRDIFF_TYPE to be as wide
as pointers.

Arrange for TARGET_VXWORKS64 on ppc to match TARGET_64BIT, after using
it to select the default word size with driver self specs.

for gcc/ChangeLog

* config/rs6000/vxworks.h (SUBTARGET_DRIVER_SELF_SPECS):
Redefine to select word size matching TARGET_VXWORKS64.
(TARGET_VXWORKS64): Redefine in terms of TARGET_64BIT.

Daily bump.

RISC-V: prefetch: fix LRA failing to allocate reg [PR118241]

prefetch was recently fixed/tightened (with Q reg constraint) to only
support right address patterns (REG or REG+D with lower 5 bits clear).
However in some cases that's too restrictive for LRA and it fails to
allocate a reg resulting in following ICE...

| gcc/testsuite/gcc.target/riscv/pr118241-b.cc:31:19: error: unable to generate reloads for:
|   31 | void m() { a.l(); }
|      |                   ^
|(insn 26 25 27 7 (prefetch (mem/f:DI (plus:DI (reg/f:DI 143 [ _5 ])
|                (const_int 56 [0x38])) [5 _5->batch[6]+0 S8 A64])
|        (const_int 0 [0])
|        (const_int 3 [0x3])) "gcc/testsuite/gcc.target/riscv/pr118241-b.cc":18:29 498 {prefetch}
|     (expr_list:REG_DEAD (reg/f:DI 142 [ _5->batch[6] ])
|        (nil)))
|during RTL pass: reload

Fix that by providing a fallback alternative register constraint to reload the address.

PR target/118241

gcc/ChangeLog:

* config/riscv/riscv.md (prefetch): Add alternative "r".

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr118241-b.cc: New test.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

RISC-V: prefetch: const offset needs to have 5 bits zero, not 4

Spotted this by chance as I saw a similar fixup in comment.
From comments, I think this is needed, but I've not hit any issues due
to this.

gcc/ChangeLog:

* config/riscv/predicates.md (prefetch_operand): mack 5 bits.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

libstdc++: Avoid -Wswitch warning from chrono formatters

Add a default case to the switch to suppress warnings about unhandled
enumeration values. This is a consteval function, so if the default case
is ever reached it will be an error not silent miscompilation.

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_duration::_S_spec_for):
Add default case to switch and use __builtin_unreachable.

libstdc++: Fix typo in __size_to_integer(__GLIBCXX_TYPE_INT_N_3)

The overload taking a signed type was returning unsigned and the
overload taking an unsigned type was returning signed.

libstdc++-v3/ChangeLog:

* include/bits/stl_algobase.h (__size_to_integer): Move
misplaced unsigned keyword on __size_to_integer overloads for
__GLIBCXX_TYPE_INT_N_3 integer type.

sh: Recognize >> 31 in treg_set_expr_not_const01

A right shift of 31 will become 0 or 1, this can be checked for
treg_set_expr_not_const01 to avoid matching addc_t_r as this
can expand to a 3 insn sequence instead.
This improves tests 023 to 026 from gcc.target/sh/pr54236-2.c, e.g.:
test_023:
shll    r5
mov     #0,r1
mov     r4,r0
rts
addc    r1,r0

With this change:
test_023:
shll    r5
movt    r0
rts
add     r4,r0

We noticed this while evaluating a patch to improve how we handle
selecting between two constants based on the output of a LT/GE 0
test.

gcc/ChangeLog:
* config/sh/predicates.md
(treg_set_expr_not_const01): call sh_recog_treg_set_expr_not_01
* config/sh/sh-protos.h
(sh_recog_treg_set_expr_not_01): New function
* config/sh/sh.cc (sh_recog_treg_set_expr_not_01): Likewise

gcc/testsuite/ChangeLog:
* gcc.target/sh/pr54236-2.c: Fix comments and expected output

Fortran: Silence a clang warning (suggesting a brace) in io.cc

When GCC is built with clang, it suggests that we add a brace to the
initialization of format_asterisk:

gcc/fortran/io.cc:32:16: warning: suggest braces around initialization of subobject [-Wmissing-braces]

So this patch does that to silence it.

gcc/fortran/ChangeLog:

2025-06-24 Martin Jambor <mjambor@suse.cz>

* io.cc (format_asterisk): Add a brace around static initialization
location part of the field locus.

fold: Change comparison of error_mark_node to use error_operand_p in tree_expr_nonnegative_warnv_p [PR118948]

This is an obvious fix for this small regression. Basically after r15-328-g5726de79e2154a,
there is a call to tree_expr_nonnegative_warnv_p where the type of the expression is now
error_mark_node. Though there was only a check if the expression was error_mark_node.

Bootstrapped and tested on x86_64-linux-gnu.

PR c/118948

gcc/ChangeLog:

* fold-const.cc (tree_expr_nonnegative_warnv_p): Use
error_operand_p instead of checking for error_mark_node directly.

gcc/testsuite/ChangeLog:

* gcc.dg/pr118948-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

MAINTAINERS: replace tabs with spaces

I didn't realize the white spaces in this file was just spaces
and not tabs. This replaces the 2 tabs that I added for the aarch64
port reviewer to be spaces rather than tabs.

Pushed as obvious.

ChangeLog:

* MAINTAINERS: Replace tabs with spaces.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

c++: -Wtemplate-body and tentative parsing [PR120575]

Here we were asserting non-zero errorcount, which is not the case if the
parse error was reduced to a warning (or silenced) in a template body. So
check seen_error instead.

PR c++/120575
PR c++/116064

gcc/cp/ChangeLog:

* parser.cc (cp_parser_abort_tentative_parse): Check seen_error
instead of errorcount.

gcc/testsuite/ChangeLog:

* g++.dg/template/permissive-error3.C: New test.

RISC-V: Add test for vec_duplicate + vsadd.vv combine case 1 with GR2VR cost 0, 1 and 2

Add asm dump check test for vec_duplicate + vsadd.vv combine to
vsadd.vx, with the GR2VR cost is 0, 1 and 2.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add test for vec_duplicate + vsadd.vv combine case 0 with GR2VR cost 0, 2 and 15

Add asm dump check and run test for vec_duplicate + vsadd.vv
combine to vsadd.vx, with the GR2VR cost is 0, 2 and 15.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vsadd-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vsadd-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vsadd-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vsadd-run-1-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Combine vec_duplicate + vsadd.vv to vsadd.vx on GR2VR cost

This patch would like to combine the vec_duplicate + vsadd.vv to the
vsadd.vx.  From example as below code.  The related pattern will depend
on the cost of vec_duplicate from GR2VR.  Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if the GR2VR cost is greater than zero.

Assume we have example code like below, GR2VR cost is 0.

  #define DEF_SAT_S_ADD(T, UT, MIN, MAX) \
  T                                      \
  test_##T##_sat_add (T x, T y)          \
  {                                      \
    T sum = (UT)x + (UT)y;               \
    return (x ^ y) < 0                   \
      ? sum                              \
      : (sum ^ x) >= 0                   \
        ? sum                            \
        : x < 0 ? MIN : MAX;             \
  }

  DEF_SAT_S_ADD(int32_t, uint32_t, INT32_MIN, INT32_MAX)
  DEF_VX_BINARY_CASE_2_WRAP(T, SAT_S_ADD_FUNC(T), sat_add)

Before this patch:
  10   │ test_vx_binary_or_int32_t_case_0:
  11   │     beq a3,zero,.L8
  12   │     vsetvli a5,zero,e32,m1,ta,ma
  13   │     vmv.v.x v2,a2
  14   │     slli    a3,a3,32
  15   │     srli    a3,a3,32
  16   │ .L3:
  17   │     vsetvli a5,a3,e32,m1,ta,ma
  18   │     vle32.v v1,0(a1)
  19   │     slli    a4,a5,2
  20   │     sub a3,a3,a5
  21   │     add a1,a1,a4
  22   │     vsadd.vv v1,v1,v2
  23   │     vse32.v v1,0(a0)
  24   │     add a0,a0,a4
  25   │     bne a3,zero,.L3

After this patch:
  10   │ test_vx_binary_or_int32_t_case_0:
  11   │     beq a3,zero,.L8
  12   │     slli    a3,a3,32
  13   │     srli    a3,a3,32
  14   │ .L3:
  15   │     vsetvli a5,a3,e32,m1,ta,ma
  16   │     vle32.v v1,0(a1)
  17   │     slli    a4,a5,2
  18   │     sub a3,a3,a5
  19   │     add a1,a1,a4
  20   │     vsadd.vx v1,v1,a2
  21   │     vse32.v v1,0(a0)
  22   │     add a0,a0,a4
  23   │     bne a3,zero,.L3

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add
new case SS_PLUS.
(expand_vx_binary_vec_vec_dup): Ditto.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op ss_plus.

Signed-off-by: Pan Li <pan2.li@intel.com>

MAINTAINERS: Add myself as an aarch64 port reviewer

Following on from the announcement here:
https://gcc.gnu.org/pipermail/gcc/2025-July/246267.html
adding myself as an aarch64 port reviewer.

ChangeLog:

* MAINTAINERS (Reviewers): Add myself for the aarch64 port.

tree-optimization/120944 - bogus VN with volatile copies

The following avoids translating expressions through volatile
copies.

PR tree-optimization/120944
* tree-ssa-sccvn.cc (vn_reference_lookup_3): Gate optimizations
invalid when volatile is involved.

* gcc.dg/torture/pr120944.c: New testcase.

Ada: Switch from ACATS 2.6 to ACATS 4.2 testsuite

This effectively adds 250 new tests, i.e. around 10% more tests.

gcc/ada/
* gcc-interface/Make-lang.in (ACATSDIR): Change to acats-4.

ada: Fix alignment violation for chain of aligned and misaligned composite types

This happens when aggressive optimizations are enabled (i.e. -O2 and above)
because the ivopts pass fails to properly mark the new memory accesses it is
creating as misaligned by means of the build_aligned_type function.

gcc/ada/ChangeLog:

* gcc-interface/utils.cc (make_packable_type): Clear the TYPE_PACKED
flag in the case where the alignment is bumped.

ada: Remove strange elaboration code generated for Cluster type in System.Pack_NN

Initialization procedures are turned into functions under the hood and, even
when they are null (empty), the compiler may generate a convoluted sequence
of instructions that return uninitialized data and, therefore, is useless.

gcc/ada/ChangeLog:

* gcc-interface/trans.cc (Subprogram_Body_to_gnu): Do not generate
a block-copy out for a null initialization procedure when the _Init
parameter is not passed in.

ada: Disable previous change for enumeration types

The debugger cannot correctly interpret the return value in this case.

gcc/ada/ChangeLog:

* gcc-interface/decl.cc (gnat_to_gnu_subprog_type): Only apply the
transformation to integer types.

ada: Add missing guards to previous change

We need to make sure that an integer type exists for the given size.

gcc/ada/ChangeLog:

* gcc-interface/decl.cc (gnat_to_gnu_subprog_type): Add guards.

ada: Improve code generated for return of Out parameter with access type

The second problem occurs on 64-bit platforms where there is a second Out
parameter that is smaller than the access parameter, creating a hole in the
return structure.

gcc/ada/ChangeLog:

* gcc-interface/decl.cc (gnat_to_gnu_subprog_type): In the case of a
subprogram using the Copy-In/Copy-Out mechanism, deal specially with
the case of 2 parameters of differing sizes.
* gcc-interface/trans.cc (Subprogram_Body_to_gnu): In the case of a
subprogram using the Copy-In/Copy-Out mechanism, make sure the types
are consistent on the two sides for all the parameters.

ada: Do not generate incorrect warning about redundant type conversion

If -gnatwr is enabled, then in some cases a type conversion between two
different Boolean types incorrectly results in a warning that the conversion
is redundant.

gcc/ada/ChangeLog:

* sem_res.adb (Resolve_Type_Conversion): Replace code for
detecting a similar case with a more comprehensive test.

ada: Pragma Short_Circuit_And_Or

Improve documentation of pragma Short_Circuit_And_Or.
Also disallow renamings, because the semantics as currently
implemented is confusing.

gcc/ada/ChangeLog:

* doc/gnat_rm/implementation_defined_pragmas.rst
(Short_Circuit_And_Or): Add more documentation.
* sem_ch8.adb (Analyze_Subprogram_Renaming):
Disallow renamings.
* gnat_rm.texi: Regenerate.

ada: Fix selection of Finalize subprogram in untagged case

The newly introduced Finalizable aspect makes it possible to derive from
a type that is not tagged but has a Finalize primitive. This patch fixes
problems where overridings of the Finalize primitive were ignored.

gcc/ada/ChangeLog:

* exp_ch7.adb (Make_Final_Call): Tweak search of Finalize primitive.
* exp_util.adb (Finalize_Address): Likewise.

ada: Fix inefficient Unchecked_Conversion to large array type

We fail to use the implementation permission given by RM 13.9(12) because
the array type does not have the Size_Known_At_Compile_Time flag set.

gcc/ada/ChangeLog:

* freeze.adb (Check_Compile_Time_Size): Try harder to see whether
the bounds of array types are known at compile time.

ada: Fix style in comment

Cleanup; technical commit meant to trigger a GNAT continuous builder.

gcc/ada/ChangeLog:

* sem_aux.ads (First_Discriminant): Remove space before period.

ada: Missing component clause warning for discriminant of Unchecked_Union type

Even when -gnatw.c is enabled, no warning about a missing component clause
should be generated if the placement of a discriminant of an Unchecked_Union
type is left unspecified in a record representation clause (such a discriminant
occupies no storage). In determining whether to generate such a warning, in
some cases the compiler would incorrectly ignore an Unchecked_Union pragma
occurring after the record representation clause. This could result in a
spurious warning.

gcc/ada/ChangeLog:

* sem_ch13.adb (Analyze_Record_Representation_Clause): In deciding
whether to generate a warning about a missing component clause, in
addition to calling Is_Unchecked_Union also call a new local
function, Unchecked_Union_Pragma_Pending, which checks for the
case of a not-yet-analyzed Unchecked_Union pragma occurring later
in the declaration list.

ada: Improved error message when size of descendant type exceeds Size'Class limit

Improve the error message that is generated when the size of tagged type
exceeds a Size'Class limit specified for an ancestor type.

gcc/ada/ChangeLog:

* mutably_tagged.adb (Make_CW_Size_Compile_Check): Include the
value of the Size'Class limit in the message generated via a
Compile_Time_Error pragma.

ada: Remove leftover from rework of aspect representation

This patch removes some comments and object definitions that referred to
a hacky use of the Entity field that had been removed by the latest
rework of the internal representation of aspects.

gcc/ada/ChangeLog:

* sem_ch13.adb (Check_Aspect_At_Freeze_Point): Remove obsolete bits.

ada: Fix error on Designated_Storage_Model with extensions disabled

The format string used for the error in that case requires setting the
Error_Msg_Name_1 global variable. This was not done so this patch adds
the missing assignment.

gcc/ada/ChangeLog:

* sem_ch13.adb (Analyze_Aspect_Specifications): Fix error emission.

Stop doing GCC 12 snapshots

In preparation for the final release from the GCC 12 branch stop
doing snapshots from it.

maintainer-scripts/
* crontab: Stop doing GCC 12 snapshots.

Regenerate common.opt.urls and add period into common.opt

gcc/ChangeLog:

* common.opt: Add period.
* common.opt.urls: Regenerate.

tree-optimization/120927 - 510.parest_r segfault with masked epilog

The following fixes bad alignment computaton for epilog vectorization
when as in this case for 510.parest_r and masked epilog vectorization
with AVX512 we end up choosing AVX to vectorize the main loop and
masked AVX512 (sic!) to vectorize the epilog.  In that case alignment
analysis for the epilog tries to force alignment of the base to 64,
but that cannot possibly help the epilog when the main loop had used
a vector mode with smaller alignment requirement.

There's another issue, that the check whether the step preserves
alignment needs to consider possibly previously involved VFs
(here, the main loops smaller VF) as well.

These might not be the only case with problems for such a mode mix
but at least there it seems wise to never use DR alignment forcing
when analyzing an epilog.

We get to chose this mode setup because the iteration over epilog
modes doesn't prevent this, the maybe_ge (cached_vf_per_mode[0],
first_vinfo_vf) skip is conditional on !supports_partial_vectors
and it is also conditional on having a cached VF.  Further nothing
in vect_analyze_loop_1 rejects this setup - it might be conceivable
that a target can do masking only for larger modes.  There is a
second reason we end up with this mode setup, which is that
vect_need_peeling_or_partial_vectors_p says we do not need
peeling or partial vectors when analyzing the main loop with
AVX512 (if it would say so we'd have chosen a masked AVX512
epilog-only vectorization).  It does that because it looks at
LOOP_VINFO_COST_MODEL_THRESHOLD (which is not yet computed, so
always zero at this point), and compares max_niter (5) against
the VF (8), but not with equality as the comment says but with
greater.  This also needs looking at, PR120939.

PR tree-optimization/120927
* tree-vect-data-refs.cc (vect_compute_data_ref_alignment):
Do not force a DRs base alignment when analyzing an
epilog loop.  Check whether the step preserves alignment
for all VFs possibly involved sofar.

* gcc.dg/vect/vect-pr120927.c: New testcase.
* gcc.dg/vect/vect-pr120927-2.c: Likewise.

c-family: Tweak ptr +- (expr +- cst) FE optimization [PR120837]

The following testcase is miscompiled with -fsanitize=undefined but we
introduce UB into the IL even without that flag.

The optimization ptr +- (expr +- cst) when expr/cst have undefined
overflow into (ptr +- cst) +- expr is sometimes simply not valid,
without careful analysis on what ptr points to we don't know if it
is valid to do (ptr +- cst) pointer arithmetics.
E.g. on the testcase, ptr points to start of an array (actually
conditionally one or another) and cst is -1, so ptr - 1 is invalid
pointer arithmetics, while ptr + (expr - 1) can be valid if expr
is at runtime always > 1 and smaller than size of the array ptr points
to + 1.

Unfortunately, removing this 1992-ish optimization altogether causes
FAIL: c-c++-common/restrict-2.c  -Wc++-compat   scan-tree-dump-times lim2 "Moving statement" 11
FAIL: gcc.dg/tree-ssa/copy-headers-5.c scan-tree-dump ch2 "is now do-while loop"
FAIL: gcc.dg/tree-ssa/copy-headers-5.c scan-tree-dump-times ch2 "  if " 3
FAIL: gcc.dg/vect/pr57558-2.c scan-tree-dump vect "vectorized 1 loops"
FAIL: gcc.dg/vect/pr57558-2.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
regressions (restrict-2.c also for C++ in all std modes).  I've been thinking
about some match.pd optimization for signed integer addition/subtraction of
constant followed by widening integral conversion followed by multiplication
or left shift, but that wouldn't help 32-bit arches.

So, instead at least for now, the following patch keeps doing the
optimization, just doesn't perform it in pointer arithmetics.
pointer_int_sum itself actually adds the multiplication by size_exp,
so ptr + expr is turned into ptr p+ expr * size_exp,
so this patch will try to optimize
ptr + (expr +- cst)
into
ptr p+ ((sizetype)expr * size_exp +- (sizetype)cst * size_exp)
and
ptr - (expr +- cst)
into
ptr p+ -((sizetype)expr * size_exp +- (sizetype)cst * size_exp)

2025-07-04  Jakub Jelinek  <jakub@redhat.com>

PR c/120837
* c-common.cc (pointer_int_sum): Rewrite the intop PLUS_EXPR or
MINUS_EXPR optimization into extension of both intop operands,
their separate multiplication and then addition/subtraction followed
by rest of pointer_int_sum handling after the multiplication.

* gcc.dg/ubsan/pr120837.c: New test.

testsuite: Rename a test

I mistyped the file name :(.

gcc/testsuite/ChangeLog:

PR target/120807
* gcc.c-torture/compile/pr120708.c: Rename to ...
* gcc.c-torture/compile/pr120807.c: ... here.

LoongArch: Prevent subreg of subreg in CRC

The register_operand predicate can match subreg, then we'd have a subreg
of subreg and it's invalid. Use lowpart_subreg to avoid the nested
subreg.

gcc/ChangeLog:

* config/loongarch/loongarch.md (crc_combine): Avoid nested
subreg.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr120708.c: New test.

[RISC-V] Add basic instrumentation to fusion detection

We were looking to evaluate some changes from Artemiy that improve GCC's
ability to discover fusible instruction pairs.  There was no good way to get
any static data out of the compiler about what kinds of fusions were happening.
Yea, you could grub around the .sched dumps looking for the magic '+'
annotation, then look around at the slim RTL representation and make an
educated guess about what fused.  But boy that was inconvenient.

All we really needed was a quick note in the dump file that the target hook
found a fusion pair and what kind was discovered.  That made it easy to spot
invalid fusions, evaluate the effectiveness of Artemiy's work, write/discover
testcases for existing fusions and implement new fusions.

So from a codegen standpoint this is NFC, it only affects dump file output.

It's gone through the usual testing and I'll wait for pre-commit CI to churn
through it before moving forward.

gcc/
* config/riscv/riscv.cc (riscv_macro_fusion_pair_p): Add basic
instrumentation to all cases where fusion is detected.  Fix
minor formatting goofs found in the process.

RISC-V: Add testcases for signed scalar SAT_ADD IMM form 2

This patch adds testcase for form2, as shown below:

T __attribute__((noinline))                                  \
sat_s_add_imm_##T##_fmt_2##_##INDEX (T x)                    \
{                                                            \
  T sum = (T)((UT)x + (UT)IMM);                                   \
  return ((x ^ sum) < 0 && (x ^ IMM) >= 0) ?                 \
    (-(T)(x < 0) ^ MAX) : sum;                         \
}

Passed the rv64gcv regression test.

Signed-off-by: Ciyan Pan <panciyan@eswincomputing.com>
gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat/sat_arith.h: Add signed scalar SAT_ADD IMM form2.
* gcc.target/riscv/sat/sat_s_add_imm-2-i16.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm-2-i32.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm-2-i64.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm-2-i8.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm-run-2-i16.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm-run-2-i32.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm-run-2-i64.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm-run-2-i8.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm_type_check-2-i16.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm_type_check-2-i32.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm_type_check-2-i8.c: New test.

Match: Support for signed scalar SAT_ADD IMM form 2

This patch would like to support signed scalar SAT_ADD IMM form 2

Form2:
T __attribute__((noinline))                                  \
sat_s_add_imm_##T##_fmt_2##_##INDEX (T x)                    \
{                                                            \
  T sum = (T)((UT)x + (UT)IMM);                                   \
  return ((x ^ sum) < 0 && (x ^ IMM) >= 0) ?                 \
    (-(T)(x < 0) ^ MAX) : sum;                         \
}

Take below form1 as example:
DEF_SAT_S_ADD_IMM_FMT_2(0, int8_t, uint8_t, 9, INT8_MIN, INT8_MAX)

Before this patch:
__attribute__((noinline))
int8_t sat_s_add_imm_int8_t_fmt_2_0 (int8_t x)
{
  int8_t sum;
  unsigned char x.0_1;
  unsigned char _2;
  signed char _3;
  signed char _4;
  _Bool _5;
  signed char _6;
  int8_t _7;
  int8_t _10;
  signed char _11;
  signed char _13;
  signed char _14;

  <bb 2> [local count: 1073741822]:
  x.0_1 = (unsigned char) x_8(D);
  _2 = x.0_1 + 9;
  sum_9 = (int8_t) _2;
  _3 = x_8(D) ^ sum_9;
  _4 = x_8(D) ^ 9;
  _13 = ~_3;
  _14 = _4 | _13;
  if (_14 >= 0)
    goto <bb 3>; [59.00%]
  else
    goto <bb 4>; [41.00%]

  <bb 3> [local count: 259738146]:
  _5 = x_8(D) < 0;
  _11 = (signed char) _5;
  _6 = -_11;
  _10 = _6 ^ 127;

  <bb 4> [local count: 1073741824]:
  # _7 = PHI <sum_9(2), _10(3)>
  return _7;

}

After this patch:
__attribute__((noinline))
int8_t sat_s_add_imm_int8_t_fmt_2_0 (int8_t x)
{
  int8_t _7;

  <bb 2> [local count: 1073741824]:
  _7 = .SAT_ADD (x_8(D), 9); [tail call]
  return _7;

}

The below test suites are passed for this patch:
1. The rv64gcv fully regression tests.
2. The x86 bootstrap tests.
3. The x86 fully regression tests.

Signed-off-by: Ciyan Pan <panciyan@eswincomputing.com>
gcc/ChangeLog:

* match.pd: Add signed scalar SAT_ADD IMM form2 matching.

Daily bump.

libstdc++: fix bits/version.def typo

bits/version.def was missing a ';'.

libstdc++-v3/Changelog:
* include/bits/version.def: Fix typo.
* include/bits/version.h: Rebuilt.

c++: trivial lambda pruning [PR120716]

In this testcase there is nothing in the lambda except a static_assert which
mentions a variable from the enclosing scope but does not odr-use it, so we
want prune_lambda_captures to remove its capture. Since the lambda is so
empty, there's nothing in the body except the DECL_EXPR of the capture
proxy, so pop_stmt_list moves that into the enclosing STATEMENT_LIST and
passes the 'body' STATEMENT_LIST to free_stmt_list. As a result, passing
'body' to prune_lambda_captures is wrong; we should instead pass the
enclosing scope, i.e. cur_stmt_list.

PR c++/120716

gcc/cp/ChangeLog:

* lambda.cc (finish_lambda_function): Pass cur_stmt_list to
prune_lambda_captures.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-constexpr3.C: New test.
* g++.dg/cpp0x/lambda/lambda-constexpr3a.C: New test.

c++: ICE with 'this' in lambda signature [PR120748]

This testcase was crashing from infinite recursion in the diagnostic
machinery, trying to print the lambda signature, which referred to the
__this capture field in the lambda, which wanted to print the lambda again.

But we don't want the signature to refer to the capture field; 'this' in an
unevaluated context refers to the 'this' from the enclosing function, not
the capture.

After fixing that, we still wrongly rejected the B case because
THIS_FORBIDDEN is set in a default (template) argument. Since we don't
distinguish between THIS_FORBIDDEN being set for a default argument and it
being set for a static member function, let's just ignore it if
cp_unevaluated_operand; we'll give a better diagnostic for the static memfn
case in finish_this_expr.

PR c++/120748

gcc/cp/ChangeLog:

* lambda.cc (lambda_expr_this_capture): Don't return a FIELD_DECL.
* parser.cc (cp_parser_primary_expression): Ignore THIS_FORBIDDEN
if cp_unevaluated_operand.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ16.C: New test.
* g++.dg/cpp0x/this1.C: Adjust diagnostics.

c++: Fix a pasto in the PR120471 fix [PR120940]

No idea how this slipped in, I'm terribly sorry.
Strangely nothing in the testsuite has caught this, so I've added
a new test for that.

2025-07-03 Jakub Jelinek <jakub@redhat.com>

PR c++/120940
* typeck.cc (cp_build_array_ref): Fix a pasto.

* g++.dg/parse/pr120940.C: New test.
* g++.dg/warn/Wduplicated-branches9.C: New test.

Ada: Remove left-overs of front-end exception mechanism

It was removed from the compiler a few releases ago.

gcc/ada/
* gcc-interface/Makefile.in (gnatlib-sjlj): Delete.
(gnatlib-zcx): Do not modify Frontend_Exceptions constant.
* libgnat/system-linux-loongarch.ads (Frontend_Exceptions): Delete.

s390: More vec-perm-const cases.

s390 missed constant vector permutation cases based on the vector pack
instruction or changing the size of the vector elements during vector
merge. This enables some more patterns that do not need to load a
constant vector for permutation.

gcc/ChangeLog:

* config/s390/s390.cc (expand_perm_with_merge): Add size change cases.
(expand_perm_with_pack): New function.
(vectorize_vec_perm_const_1): Wire up new function.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vec-perm-merge-1.c: New test.
* gcc.target/s390/vector/vec-perm-pack-1.c: New test.

Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>

Add myself as an aarch64 port reviewer

As mentioned in https://inbox.sourceware.org/gcc/EA828262-8F8F-4362-9CA8-312F7C20E2F9@nvidia.com/T/#m6e7e8e11656189598c759157d5d49cbd0ac9ba7c.
Adding myself as an aarch64 port reviewer.

ChangeLog:

* MAINTAINERS: Add myself as an aarch64 port reviewer.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

libstdc++: Update LWG 4166 changes to concat_view::end() [PR120934]

In r15-4555-gf191c830154565 we proactively implemented the initial
proposed resolution for LWG 4166 which later turned out to be
insufficient, since we must also require equality_comparable of the
underlying iterators before concat_view could be a common range.

This patch implements the updated P/R, requiring all underlying
iterators to be forward (which implies equality_comparable) before
making concat_view common, which fixes the testcase from this PR.

PR libstdc++/120934

libstdc++-v3/ChangeLog:

* include/std/ranges (concat_view::end): Refine condition
for returning an iterator instead of default_sentinel as
per the updated P/R for LWG 4166.
* testsuite/std/ranges/concat/1.cc (test05): New test.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

OpenMP: Add omp_get_initial_device/omp_get_num_devices builtins: Fix test cases

With this fix-up for commit 387209938d2c476a67966c6ddbdbf817626f24a2
"OpenMP: Add omp_get_initial_device/omp_get_num_devices builtins", we progress:

     PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c (test for excess errors)
     PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c scan-tree-dump-not optimized "abort"
    -FAIL: c-c++-common/gomp/omp_get_num_devices_initial_device.c scan-tree-dump-times optimized "omp_get_num_devices;" 1
    +PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c scan-tree-dump-times optimized "omp_get_num_devices" 1
     PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c scan-tree-dump optimized "_1 = __builtin_omp_get_num_devices \$\$;[\\r\\n]+[ ]+return _1;"

... etc. for offloading configurations.

gcc/testsuite/
* c-c++-common/gomp/omp_get_num_devices_initial_device.c: Fix.
* gfortran.dg/gomp/omp_get_num_devices_initial_device.f90: Likewise.

[RISC-V][PR target/118886] Refine when two insns are signaled as fusion candidates

A number of folks have had their fingers in this code and it's going to take a
few submissions to do everything we want to do.

This patch is primarily concerned with avoiding signaling that fusion can occur
in cases where it obviously should not be signaling fusion.

Every DEC based fusion I'm aware of requires the first instruction to set a
destination register that is both used and set again by the second instruction.
If the two instructions set different registers, then the destination of the
first instruction was not dead and would need to have a result produced.

This is complicated by the fact that we have pseudo registers prior to reload.
So the approach we take is to signal fusion prior to reload even if the
destination registers don't match.  Post reload we require them to match.

That allows us to clean up the code ever-so-slightly.

Second, we sometimes signaled fusion into loads that weren't scalar integer
loads.  I'm not aware of a design that's fusing into FP loads or vector loads.
So those get rejected explicitly.

Third, the store pair "fusion" code is cleaned up a little.  We use fusion to
model store pair commits since the basic properties for detection are the same.
The point where they "fuse" is different.  Also this code liked to "return
false" at each step along the way if fusion wasn't possible.  Future work for
additional fusion cases makes that behavior undesirable.  So the logic gets
reworked a little bit to be more friendly to future work.

Fourth, if we already fused the previous instruction, then we can't fuse it
again.  Signaling fusion in that case is, umm, bad as it creates an atomic blob
of code from a scheduling standpoint.

Hopefully I got everything correct with extracting this work out of a larger
set of changes 🙂  We will contribute some instrumentation & testing code so if
I botched things in a major way we'll soon have a way to test that and I'll be
on the hook to fix any goof's.

From a correctness standpoint this should be a big fat nop.  We've seen this
make measurable differences in pico benchmarks, but obviously as you scale up
to bigger stuff the gains largely disappear into the noise.

This has been through Ventana's internal CI and my tester.  I'll obviously wait
for a verdict from the pre-commit tester.

PR target/118886
gcc/
* config/riscv/riscv.cc (riscv_macro_fusion_pair_p): Check
for fusion being disabled earlier.  If PREV is already fused,
then it can't be fused again.  Be more selective about fusing
when the destination registers do not match.  Don't fuse into
loads that aren't scalar integer modes.  Revamp store pair
commit support.

Co-authored-by: Daniel Barboza <dbarboza@ventanamicro.com>
Co-authored-by: Shreya Munnangi <smunnangi1@ventanamicro.com>

testsuite: Fix gcc.dg/ipa/pr120295.c on Solaris

gcc.dg/ipa/pr120295.c FAILs on Solaris:

FAIL: gcc.dg/ipa/pr120295.c (test for excess errors)

Excess errors:
ld: warning: symbol 'glob' has differing types:
(file /var/tmp//ccsDR59c.o type=OBJT; file /lib/libc.so type=FUNC);
/var/tmp//ccsDR59c.o definition taken

Fixed by renaming the glob variable to glob_ to avoid the conflict.

Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.

gcc/testsuite:
* gcc.dg/ipa/pr120295.c (glob): Rename to glob_.

AArch64: make rules for CBZ/TBZ higher priority

Move the rules for CBZ/TBZ to be above the rules for
CBB<cond>/CBH<cond>/CB<cond>. We want them to have higher priority
because they can express larger displacements.

gcc/ChangeLog:

* config/aarch64/aarch64.md (aarch64_cbz<optab><mode>1): Move
above rules for CBB<cond>/CBH<cond>/CB<cond>.
(*aarch64_tbz<optab><mode>1): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cmpbr.c: Update tests.

AArch64: rules for CMPBR instructions

Add rules for lowering `cbranch<mode>4` to CBB<cond>/CBH<cond>/CB<cond> when
CMPBR extension is enabled.

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_cb_rhs): New function.
* config/aarch64/aarch64.cc (aarch64_cb_rhs): Likewise.
* config/aarch64/aarch64.md (cbranch<mode>4): Rename to ...
(cbranch<GPI:mode>4): ...here, and emit CMPBR if possible.
(cbranch<SHORT:mode>4): New expand rule.
(aarch64_cb<INT_CMP:code><GPI:mode>): New insn rule.
(aarch64_cb<INT_CMP:code><SHORT:mode>): Likewise.
* config/aarch64/constraints.md (Uc0): New constraint.
(Uc1): Likewise.
(Uc2): Likewise.
* config/aarch64/iterators.md (cmpbr_suffix): New mode attr.
(INT_CMP): New code iterator.
(cmpbr_imm_constraint): New code attr.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cmpbr.c:

AArch64: precommit test for CMPBR instructions

Commit the test file `cmpbr.c` before rules for generating the new
instructions are added, so that the changes in codegen are more obvious
in the next commit.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add `cmpbr` to the list of extensions.
* gcc.target/aarch64/cmpbr.c: New test.

AArch64: recognize `+cmpbr` option

Add the `+cmpbr` option to enable the FEAT_CMPBR architectural
extension.

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def (cmpbr): New
option.
* config/aarch64/aarch64.h (TARGET_CMPBR): New macro.
* doc/invoke.texi (cmpbr): New option.