git.ipfire.org Git - thirdparty/gcc.git/log

isel: Fold more in gimple_expand_vec_cond_expr [PR115659]

As PR115659 shows, assuming c = x CMP y, there are some
folding chances for patterns r = c ? -1/z : z/0.

For r = c ? -1 : z, it can be folded into:
  - r = c | z (with ior_optab supported)
  - or r = c ? c : z

while for r = c ?  z : 0, it can be foled into:
  - r = c & z (with and_optab supported)
  - or r = c ? z : c

This patch is to teach ISEL to take care of them and also
remove the redundant gsi_replace as the caller of function
gimple_expand_vec_cond_expr will handle it.

PR tree-optimization/115659

gcc/ChangeLog:

* gimple-isel.cc (gimple_expand_vec_cond_expr): Add more foldings for
patterns x CMP y ? -1 : z and x CMP y ? z : 0.

Daily bump.

c++: ICE with computed gotos [PR115469]

This is a low-prio crash on invalid code where we ICE on a VAR_DECL
with erroneous type. I thought I'd try to avoid putting such decls
into ->names and ->names_in_scope but that sounds riskier than the
following cleanup.

PR c++/115469

gcc/cp/ChangeLog:

* decl.cc (automatic_var_with_nontrivial_dtor_p): New.
(poplevel_named_label_1): Use it.
(check_goto_1): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/label17.C: New test.

testsuite: fix spaceship-narrowing1.C

I made sure that Wnarrowing22.C works fine on ILP32, but apparently
I didn't verify that spaceship-narrowing1.C works there as well. :(

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/spaceship-narrowing1.C: Use __INT64_TYPE__.

c++: unresolved overload with comma op [PR115430]

This works:

  template<typename T>
  int Func(T);
  typedef int (*funcptrtype)(int);
  funcptrtype fp0 = &Func<int>;

but this doesn't:

  funcptrtype fp2 = (0, &Func<int>);

because we only call resolve_nondeduced_context on the LHS (via
convert_to_void) but not on the RHS, so cp_build_compound_expr's
type_unknown_p check issues an error.

PR c++/115430

gcc/cp/ChangeLog:

* typeck.cc (cp_build_compound_expr): Call resolve_nondeduced_context
on RHS.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept41.C: Remove dg-error.
* g++.dg/overload/addr3.C: New test.

c++: DR2627, Bit-fields and narrowing conversions [PR94058]

This DR (https://cplusplus.github.io/CWG/issues/2627.html) says that
even if we are converting from an integer type or unscoped enumeration type
to an integer type that cannot represent all the values of the original
type, it's not narrowing if "the source is a bit-field whose width w is
less than that of its type (or, for an enumeration type, its underlying
type) and the target type can represent all the values of a hypothetical
extended integer type with width w and with the same signedness as the
original type".

DR 2627
PR c++/94058
PR c++/104392

gcc/cp/ChangeLog:

* typeck2.cc (check_narrowing): Don't warn if the conversion isn't
narrowing as per DR 2627.

gcc/testsuite/ChangeLog:

* g++.dg/DRs/dr2627.C: New test.
* g++.dg/cpp0x/Wnarrowing22.C: New test.
* g++.dg/cpp2a/spaceship-narrowing1.C: New test.
* g++.dg/cpp2a/spaceship-narrowing2.C: New test.

Preserve SSA info for more propagated copy

Besides VN and copy-prop also CCP and VRP as well as forwprop
propagate out copies and thus it's worthwhile to try to preserve
range and points-to info there when possible.

Note that this also fixes the testcase from PR115701 but that's
because we do not actually intersect info but only copy info when
there was no info present.

* tree-ssa-forwprop.cc (fwprop_set_lattice_val): Preserve
SSA info.
* tree-ssa-propagate.cc
(substitute_and_fold_dom_walker::before_dom_children): Likewise.

RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 4

This patch would like to add test cases for the unsigned scalar
.SAT_ADD IMM form 4.  Aka:

Form 4:
  #define DEF_SAT_U_ADD_IMM_FMT_4(T)                            \
  T __attribute__((noinline))                                   \
  sat_u_add_imm_##T##_fmt_4 (T x)                               \
  {                                                             \
    T ret;                                                      \
    return __builtin_add_overflow (x, 9, &ret) == 0 ? ret : -1; \
  }

DEF_SAT_U_ADD_IMM_FMT_4(uint64_t)

The below test is passed for this patch.
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper test macro.
* gcc.target/riscv/sat_u_add_imm-13.c: New test.
* gcc.target/riscv/sat_u_add_imm-14.c: New test.
* gcc.target/riscv/sat_u_add_imm-15.c: New test.
* gcc.target/riscv/sat_u_add_imm-16.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-13.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-14.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-15.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-16.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 3

This patch would like to add test cases for the unsigned scalar
.SAT_ADD IMM form 3.  Aka:

Form 3:
  #define DEF_SAT_U_ADD_IMM_FMT_3(T)                       \
  T __attribute__((noinline))                              \
  sat_u_add_imm_##T##_fmt_3 (T x)                          \
  {                                                        \
    T ret;                                                 \
    return __builtin_add_overflow (x, 8, &ret) ? -1 : ret; \
  }

DEF_SAT_U_ADD_IMM_FMT_3(uint64_t)

The below test is passed for this patch.
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper test macro.
* gcc.target/riscv/sat_u_add_imm-10.c: New test.
* gcc.target/riscv/sat_u_add_imm-11.c: New test.
* gcc.target/riscv/sat_u_add_imm-12.c: New test.
* gcc.target/riscv/sat_u_add_imm-9.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-10.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-11.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-12.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-9.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 2

This patch would like to add test cases for the unsigned scalar
.SAT_ADD IMM form 2.  Aka:

Form 2:
  #define DEF_SAT_U_ADD_IMM_FMT_2(T)      \
  T __attribute__((noinline))             \
  sat_u_add_imm_##T##_fmt_1 (T x)         \
  {                                       \
    return (T)(x + 9) < x ? -1 : (x + 9); \
  }

DEF_SAT_U_ADD_IMM_FMT_2(uint64_t)

The below test is passed for this patch.
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper test macro.
* gcc.target/riscv/sat_u_add_imm-5.c: New test.
* gcc.target/riscv/sat_u_add_imm-6.c: New test.
* gcc.target/riscv/sat_u_add_imm-7.c: New test.
* gcc.target/riscv/sat_u_add_imm-8.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-5.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-6.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-7.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 1

This patch would like to add test cases for the unsigned scalar
.SAT_ADD IMM form 1.  Aka:

Form 1:
  #define DEF_SAT_U_ADD_IMM_FMT_1(T)       \
  T __attribute__((noinline))              \
  sat_u_add_imm_##T##_fmt_1 (T x)          \
  {                                        \
    return (T)(x + 9) >= x ? (x + 9) : -1; \
  }

DEF_SAT_U_ADD_IMM_FMT_1(uint64_t)

The below test is passed for this patch.
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper test macro.
* gcc.target/riscv/sat_u_add_imm-1.c: New test.
* gcc.target/riscv/sat_u_add_imm-2.c: New test.
* gcc.target/riscv/sat_u_add_imm-3.c: New test.
* gcc.target/riscv/sat_u_add_imm-4.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-1.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-2.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-3.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-4.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

testsuite: Fix -m32 gcc.target/i386/pr102464-vrndscaleph.c on RedHat.

This patch fixes the 4 FAILs of gcc.target/i386/pr192464-vrndscaleph.c
with --target_board='unix{-m32}' on RedHat 7.x.  The issue is that this
AVX512 test includes the system math.h, and on older systems this provides
inline versions of floor, ceil and rint (for the 387).  The work around
is to define __NO_MATH_INLINES before #include <math.h> (or alternatively
use __builtin_floor, __builtin_ceil, etc.).

2024-07-01  Roger Sayle  <roger@nextmovesoftware.com>

gcc/testsuite/ChangeLog
PR middle-end/102464
* gcc.target/i386/pr102464-vrndscaleph.c: Define __NO_MATH_INLINES
to resovle FAILs with -m32 on older RedHat systems.

i386: Additional peephole2 to use lea in round-up integer division.

A common idiom for implementing an integer division that rounds upwards is
to write (x + y - 1) / y.  Conveniently on x86, the two additions to form
the numerator can be performed by a single lea instruction, and indeed gcc
currently generates a lea when both x and y are both registers.

int foo(int x, int y) {
  return (x+y-1)/y;
}

generates with -O2:

foo:    leal    -1(%rsi,%rdi), %eax // 4 bytes
        cltd
        idivl   %esi
        ret

Oddly, however, if x is a memory, gcc currently uses two instructions:

int m;
int bar(int y) {
  return (m+y-1)/y;
}

generates:

foo: movl    m(%rip), %eax
        addl    %edi, %eax // 2 bytes
        subl    $1, %eax // 3 bytes
        cltd
        idivl   %edi
        ret

This discrepancy is caused by the late decision (in peephole2) to split
an addition with a memory operand, into a load followed by a reg-reg
addition.  This patch improves this situation by adding a peephole2
to recognize consecutive additions and transform them into lea if
profitable.

My first attempt at fixing this was to use a define_insn_and_split:

(define_insn_and_split "*lea<mode>3_reg_mem_imm"
  [(set (match_operand:SWI48 0 "register_operand")
       (plus:SWI48 (plus:SWI48 (match_operand:SWI48 1 "register_operand")
                               (match_operand:SWI48 2 "memory_operand"))
                   (match_operand:SWI48 3 "x86_64_immediate_operand")))]
  "ix86_pre_reload_split ()"
  "#"
  "&& 1"
  [(set (match_dup 4) (match_dup 2))
   (set (match_dup 0) (plus:SWI48 (plus:SWI48 (match_dup 1) (match_dup 4))
                                 (match_dup 3)))]
  "operands[4] = gen_reg_rtx (<MODE>mode);")

using combine to combine instructions.  Unfortunately, this approach
interferes with (reload's) subtle balance of deciding when to use/avoid lea,
which can be observed as a code size regression in CSiBE.  The peephole2
approach (proposed here) uniformly improves CSiBE results.

2024-07-01  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386.md (peephole2): Transform two consecutive
additions into a 3-component lea if !TARGET_AVOID_LEA_FOR_ADDR.

gcc/testsuite/ChangeLog
* gcc.target/i386/lea-3.c: New test case.

AVR: target/88236, target/115726 - Fix __memx code generation.

PR target/88236
PR target/115726
gcc/
* config/avr/avr.md (mov<mode>) [avr_mem_memx_p]: Expand in such a
way that the destination does not overlap with any hard register
clobbered / used by xload8qi_A resp. xload<mode>_A.
* config/avr/avr.cc (avr_out_xload): Avoid early-clobber
situation for Z by executing just one load when the output register
overlaps with Z.
gcc/testsuite/
* gcc.target/avr/torture/pr88236-pr115726.c: New test.

testsuite/52641 - Adjust some test cases to less capable platforms.

PR testsuite/52641
gcc/testsuite/
* gcc.dg/analyzer/pr109577.c: Use __SIZE_TYPE__ instead of "unsigned long".
* gcc.dg/analyzer/pr93032-mztools-signed-char.c: Requires int32plus.
* gcc.dg/analyzer/pr93032-mztools-unsigned-char.c: Requires int32plus.
* gcc.dg/analyzer/putenv-1.c: Skip on avr.
* gcc.dg/torture/type-generic-1.c: Skip on avr.

libgomp, openmp: Add ompx_gnu_pinned_mem_alloc

This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP. This is not in the OpenMP standard so it uses the "ompx"
namespace and an independent enum baseline of 200 (selected to not clash with
other known implementations).

The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait. One motivation for having this feature is
for use by the (planned) -foffload-memory=pinned feature.

gcc/fortran/ChangeLog:

* openmp.cc (is_predefined_allocator): Update valid ranges to
incorporate ompx_gnu_pinned_mem_alloc.

libgomp/ChangeLog:

* allocator.c (ompx_gnu_min_predefined_alloc): New.
(ompx_gnu_max_predefined_alloc): New.
(predefined_alloc_mapping): Rename to ...
(predefined_omp_alloc_mapping): ... this.
(predefined_ompx_gnu_alloc_mapping): New.
(_Static_assert): Adjust for the new name, and add a new assert for the
new table.
(predefined_allocator_p): New.
(predefined_alloc_mapping): New.
(omp_aligned_alloc): Support ompx_gnu_pinned_mem_alloc.
Use predefined_allocator_p and predefined_alloc_mapping.
(omp_free): Likewise.
(omp_alligned_calloc): Likewise.
(omp_realloc): Likewise.
* env.c (parse_allocator): Add ompx_gnu_pinned_mem_alloc.
* libgomp.texi: Document ompx_gnu_pinned_mem_alloc.
* omp.h.in (omp_allocator_handle_t): Add ompx_gnu_pinned_mem_alloc.
* omp_lib.f90.in: Add ompx_gnu_pinned_mem_alloc.
* omp_lib.h.in: Add ompx_gnu_pinned_mem_alloc.
* testsuite/libgomp.c/alloc-pinned-5.c: New test.
* testsuite/libgomp.c/alloc-pinned-6.c: New test.
* testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/allocate-pinned-1.f90: New test.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>

libgomp: change alloc-pinned tests failure mode

The feature doesn't work on non-Linux hosts, at present, so skip the tests
entirely.

On Linux systems that have insufficient lockable memory configured we still
need to fail or else the feature won't be getting tested when we think it is,
but now there's a message to explain why.

libgomp/ChangeLog:

* testsuite/libgomp.c/alloc-pinned-1.c: Change dg-xfail-run-if to
dg-skip-if.
Correct spelling mistake.
Abort on insufficient lockable memory.
Use #error on non-linux hosts.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.

libffi: Fix 32-bit SPARC structure passing [PR115681]

The libffi.closures/single_entry_structs2.c test FAILs on 32-bit SPARC:

FAIL: libffi.closures/single_entry_structs2.c -W -Wall -Wno-psabi -O0
execution test

The issue has been reported, analyzed and fixed upstream:

Several tests FAIL on 32-bit Solaris/SPARC
https://github.com/libffi/libffi/issues/841

Therefore this patch imports the fix into the GCC tree.

Tested on sparc-sun-solaris2.11.

2024-07-01 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

libffi:
PR libffi/115681
* src/sparc/ffi.c (ffi_call_int): Copy structure arguments to
maintain call-by-value semantics.

tree-optimization/115723 - ICE with .COND_ADD reduction

The following fixes an ICE with a .COND_ADD discovered as reduction
even though its else value isn't the reduction chain link but a
constant. This would be wrong-code with --disable-checking I think.

PR tree-optimization/115723
* tree-vect-loop.cc (check_reduction_path): For a .COND_ADD
verify the else value also refers to the reduction chain op.

* gcc.dg/vect/pr115723.c: New testcase.

[MAINTAINERS] Update my email address

Update my email address.

ChangeLog:

* MAINTAINERS: Update claziss email address.

Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>

tree-optimization/115694 - ICE with complex store rewrite

The following adds a missed check when forwprop attempts to rewrite
a complex store.

PR tree-optimization/115694
* tree-ssa-forwprop.cc (pass_forwprop::execute): Check the
store is complex before rewriting it.

* g++.dg/torture/pr115694.C: New testcase.

Remove vcond{,u,eq}<mode> expanders since they will be obsolete.

gcc/ChangeLog:

PR target/115517
* config/i386/mmx.md (vcond<mode>v2sf): Removed.
(vcond<MMXMODE124:mode><MMXMODEI:mode>): Ditto.
(vcond<mode><mode>): Ditto.
(vcondu<MMXMODE124:mode><MMXMODEI:mode>): Ditto.
(vcondu<mode><mode>): Ditto.
* config/i386/sse.md (vcond<V_512:mode><VF_512:mode>): Ditto.
(vcond<V_256:mode><VF_256:mode>): Ditto.
(vcond<V_128:mode><VF_128:mode>): Ditto.
(vcond<VI2HFBF_AVX512VL:mode><VHF_AVX512VL:mode>): Ditto.
(vcond<V_512:mode><VI_AVX512BW:mode>): Ditto.
(vcond<V_256:mode><VI_256:mode>): Ditto.
(vcond<V_128:mode><VI124_128:mode>): Ditto.
(vcond<VI8F_128:mode>v2di): Ditto.
(vcondu<V_512:mode><VI_AVX512BW:mode>): Ditto.
(vcondu<V_256:mode><VI_256:mode>): Ditto.
(vcondu<V_128:mode><VI124_128:mode>): Ditto.
(vcondu<VI8F_128:mode>v2di): Ditto.
(vcondeq<VI8F_128:mode>v2di): Ditto.

Optimize a < 0 ? -1 : 0 to (signed)a >> 31.

Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
and x < 0 ? 1 : 0 into (unsigned) x >> 31.

Add define_insn_and_split for the optimization did in
ix86_expand_int_vcond.

gcc/ChangeLog:

PR target/115517
* config/i386/sse.md ("*ashr<mode>3_1"): New
define_insn_and_split.
(*avx512_ashr<mode>3_1): Ditto.
(*avx2_lshr<mode>3_1): Ditto.
(*avx2_lshr<mode>3_2): Ditto and add 2 combine splitter after
it.
* config/i386/mmx.md (mmxscalarsize): New mode attribute.
(*mmw_ashr<mode>3_1): New define_insn_and_split.
("mmx_<insn><mode>3): Add a combine spiltter after it.
(*mmx_ashrv2hi3_1): New define_insn_and_plit, also add a
combine splitter after it.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr111023-2.c: Adjust testcase.
* gcc.target/i386/vect-div-1.c: Ditto.

Adjust testcase for the regressed testcases after obsolete of vcond{,u,eq}.

> Richard suggests that we implement the "obvious" transforms like
> inversion in the middle-end but if for example unsigned compares
> are not supported the us_minus + eq + negative trick isn't on
> that list.
>
> The main reason to restrict vec_cmp would be to avoid
> a <= b ? c : d going with an unsupported vec_cmp but instead
> do a > b ? d : c - the alternative is trying to fix this
> on the RTL side via combine.  I understand the non-native

Yes, I have a patch which can fix most regressions via pattern match
in combine.
Still there is a situation that is difficult to deal with, mainly the
optimization w/o sse4.1 . Because pblendvb/blendvps/blendvpd only
exists under sse4.1, w/o sse4.1, it takes 3
instructions (pand,pandn,por) to simulate the vcond_mask, and the
combine matches up to 4 instructions, which makes it currently
impossible to use the combine to recover those optimizations in the
vcond{,u,eq}.i.e min/max.

In the case of sse 4.1 and above, there is basically no regression anymore.

the regression testcases w/o sse4.1

FAIL: g++.target/i386/pr100637-1b.C  -std=gnu++14  scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr100637-1b.C  -std=gnu++17  scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr100637-1b.C  -std=gnu++20  scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr100637-1b.C  -std=gnu++98  scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr100637-1w.C  -std=gnu++14  scan-assembler-times pcmpeqw 2
FAIL: g++.target/i386/pr100637-1w.C  -std=gnu++17  scan-assembler-times pcmpeqw 2
FAIL: g++.target/i386/pr100637-1w.C  -std=gnu++20  scan-assembler-times pcmpeqw 2
FAIL: g++.target/i386/pr100637-1w.C  -std=gnu++98  scan-assembler-times pcmpeqw 2
FAIL: g++.target/i386/pr103861-1.C  -std=gnu++14  scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr103861-1.C  -std=gnu++17  scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr103861-1.C  -std=gnu++20  scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr103861-1.C  -std=gnu++98  scan-assembler-times pcmpeqb 2
FAIL: gcc.target/i386/pr88540.c scan-assembler minpd

gcc/testsuite/ChangeLog:

PR target/115517
* g++.target/i386/pr100637-1b.C: Add xfail and -mno-sse4.1.
* g++.target/i386/pr100637-1w.C: Ditto.
* g++.target/i386/pr103861-1.C: Ditto.
* gcc.target/i386/pr88540.c: Ditto.
* gcc.target/i386/pr103941-2.c: Add -mno-avx512f.
* g++.target/i386/sse4_1-pr100637-1b.C: New test.
* g++.target/i386/sse4_1-pr100637-1w.C: New test.
* g++.target/i386/sse4_1-pr103861-1.C: New test.
* gcc.target/i386/sse4_1-pr88540.c: New test.

Add more splitter for mskmov with avx512 comparison.

gcc/ChangeLog:

PR target/115517
* config/i386/sse.md
(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_lt_avx512): New
define_insn_and_split.
(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_lt_avx512):
Ditto.
(*<sse2_avx2>_pmovmskb_lt_avx512): Ditto.
(*<sse2_avx2>_pmovmskb_zext_lt_avx512): Ditto.
(*sse2_pmovmskb_ext_lt_avx512): Ditto.
(*pmovsk_kmask_v16qi_avx512): Ditto.
(*pmovsk_mask_v32qi_avx512): Ditto.
(*pmovsk_mask_cmp_<mode>_avx512): Ditto.
(*pmovsk_ptest_<mode>_avx512): Ditto.

Match IEEE min/max with UNSPEC_IEEE_{MIN,MAX}.

These versions of the min/max patterns implement exactly the operations
min = (op1 < op2 ? op1 : op2)
max = (!(op1 < op2) ? op1 : op2)

gcc/ChangeLog:
PR target/115517
* config/i386/sse.md (*minmax<mode>3_1): New pre_reload
define_insn_and_split.
(*minmax<mode>3_2): Ditto.

Lower AVX512 kmask comparison back to AVX2 comparison when op_{true,false} is vector -1/0.

gcc/ChangeLog
PR target/115517
* config/i386/sse.md
(*<avx512>_cvtmask2<ssemodesuffix><mode>_not): New pre_reload
splitter.
(*<avx512>_cvtmask2<ssemodesuffix><mode>_not): Ditto.
(*avx2_pcmp<mode>3_6): Ditto.
(*avx2_pcmp<mode>3_7): Ditto.

Add more splitters to match (unspec [op1 op2 (gt op3 constm1_operand)] UNSPEC_BLENDV)

These define_insn_and_split are needed after vcond{,u,eq} is obsolete.

gcc/ChangeLog:

PR target/115517
* config/i386/sse.md
(*<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>_gt): New
define_insn_and_split.
(*<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_gtint):
Ditto.
(*<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_not_gtint):
Ditto.
(*<sse4_1_avx2>_pblendvb_gt): Ditto.
(*<sse4_1_avx2>_pblendvb_gt_subreg_not): Ditto.

Enable flate-combine.

Move pass_stv2 and pass_rpad after pre_reload pass_late_combine, also
define target_insn_cost to prevent post_reload pass_late_combine to
revert the optimziation did in pass_rpad.

Adjust testcases since pass_late_combine generates better code but
break scan assembly.

.i.e
Under 32-bit target, gcc used to generate broadcast from stack and
then do the real operation.
After flate_combine, they're combined into embeded broadcast
operations.

gcc/ChangeLog:

* config/i386/i386-features.cc (ix86_rpad_gate): New function.
* config/i386/i386-options.cc (ix86_override_options_after_change):
Don't disable flate_combine.
* config/i386/i386-passes.def: Move pass_stv2 and pass_rpad
after pre_reload pas_late_combine.
* config/i386/i386-protos.h (ix86_rpad_gate): New declare.
* config/i386/i386.cc (ix86_insn_cost): New function.
(TARGET_INSN_COST): Define.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512f-broadcast-pr87767-1.c: Adjus
testcase.
* gcc.target/i386/avx512f-broadcast-pr87767-5.c: Ditto.
* gcc.target/i386/avx512f-fmadd-sf-zmm-7.c: Ditto.
* gcc.target/i386/avx512f-fmsub-sf-zmm-7.c: Ditto.
* gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c: Ditto.
* gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c: Ditto.
* gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Ditto.
* gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Ditto.
* gcc.target/i386/pr91333.c: Ditto.
* gcc.target/i386/vect-strided-4.c: Ditto.

Extend lshifrtsi3_1_zext to ?k alternative.

late_combine will combine lshift + zero into *lshifrtsi3_1_zext which
cause extra mov between gpr and kmask, add ?k to the pattern.

gcc/ChangeLog:

PR target/115610
* config/i386/i386.md (<*insnsi3_zext): Add alternative ?k,
enable it only for lshiftrt and under avx512bw.
* config/i386/sse.md (*klshrsi3_1_zext): New define_insn, and
add corresponding define_split after it.

Define mask as extern instead of uninitialized local variables.

The testcases are supposed to scan for vpopcnt{b,w,d,q} operations
with k mask, but mask is defined as uninitialized local variable which
will be set as 0 at rtl expand phase.
And it's further simplified off by late_combine which caused scan assembly failure.
Move the definition of mask outside to make the testcases more stable.

gcc/testsuite/ChangeLog:

PR target/115610
* gcc.target/i386/avx512bitalg-vpopcntb.c: Define mask as
extern instead of uninitialized local variables.
* gcc.target/i386/avx512bitalg-vpopcntbvl.c: Ditto.
* gcc.target/i386/avx512bitalg-vpopcntw.c: Ditto.
* gcc.target/i386/avx512bitalg-vpopcntwvl.c: Ditto.
* gcc.target/i386/avx512vpopcntdq-vpopcntd.c: Ditto.
* gcc.target/i386/avx512vpopcntdq-vpopcntq.c: Ditto.

Daily bump.

hppa: Fix ICE caused by mismatched predicate and constraint in xmpyu patterns

2024-06-30 John David Anglin <danglin@gcc.gnu.org>

gcc/ChangeLog:

PR target/115691
* config/pa/pa.md: Remove incorrect xmpyu patterns.

tree-optimization/115701 - fix maybe_duplicate_ssa_info_at_copy

The following restricts copying of points-to info from defs that
might be in regions invoking UB and are never executed.

PR tree-optimization/115701
* tree-ssanames.cc (maybe_duplicate_ssa_info_at_copy):
Only copy info from within the same BB.

* gcc.dg/torture/pr115701.c: New testcase.

tree-optimization/115701 - factor out maybe_duplicate_ssa_info_at_copy

The following factors out the code that preserves SSA info of the LHS
of a SSA copy LHS = RHS when LHS is about to be eliminated to RHS.

PR tree-optimization/115701
* tree-ssanames.h (maybe_duplicate_ssa_info_at_copy): Declare.
* tree-ssanames.cc (maybe_duplicate_ssa_info_at_copy): New
function, split out from ...
* tree-ssa-copy.cc (fini_copy_prop): ... here.
* tree-ssa-sccvn.cc (eliminate_dom_walker::eliminate_stmt): ...
and here.

Harden SLP reduction support wrt STMT_VINFO_REDUC_IDX

The following makes sure that for a SLP reductions all lanes have
the same STMT_VINFO_REDUC_IDX. Once we move that info and can adjust
it we can implement swapping. It also makes the existing protection
against operand swapping trigger for all stmts participating in a
reduction, not just the final one marked as reduction-def.

* tree-vect-slp.cc (vect_build_slp_tree_1): Compare
STMT_VINFO_REDUC_IDX.
(vect_build_slp_tree_2): Prevent operand swapping for
all stmts participating in a reduction.

vect: Determine input vectype for multiple lane-reducing operations

The input vectype of reduction PHI statement must be determined before
vect cost computation for the reduction. Since lance-reducing operation has
different input vectype from normal one, so we need to traverse all reduction
statements to find out the input vectype with the least lanes, and set that to
the PHI statement.

2024-06-16 Feng Xue <fxue@os.amperecomputing.com>

gcc/
* tree-vect-loop.cc (vectorizable_reduction): Determine input vectype
during traversal of reduction statements.

vect: Fix shift-by-induction for single-lane slp

Allow shift-by-induction for slp node, when it is single lane, which is
aligned with the original loop-based handling.

2024-06-26 Feng Xue <fxue@os.amperecomputing.com>

gcc/
* tree-vect-stmts.cc (vectorizable_shift): Allow shift-by-induction
for single-lane slp node.

gcc/testsuite/
* gcc.dg/vect/vect-shift-6.c
* gcc.dg/vect/vect-shift-7.c

Daily bump.

[PR115565] cse: Don't use a valid regno for non-register in comparison_qty

Use INT_MIN rather than -1 in `comparison_qty' where a comparison is not
with a register, because the value of -1 is actually a valid reference
to register 0 in the case where it has not been assigned a quantity.

Using -1 makes `REG_QTY (REGNO (folded_arg1)) == ent->comparison_qty'
comparison in `fold_rtx' to incorrectly trigger in rare circumstances
and return true for a memory reference, making CSE consider a comparison
operation to evaluate to a constant expression and consequently make the
resulting code incorrectly execute or fail to execute conditional
blocks.

This has caused a miscompilation of rwlock.c from LinuxThreads for the
`alpha-linux-gnu' target, where `rwlock->__rw_writer != thread_self ()'
expression (where `thread_self' returns the thread pointer via a PALcode
call) has been decided to be always true (with `ent->comparison_qty'
using -1 for a reference to to `rwlock->__rw_writer', while register 0
holding the thread pointer retrieved by `thread_self') and code for the
false case has been optimized away where it mustn't have, causing
program lockups.

The issue has been observed as a regression from commit 08a692679fb8
("Undefined cse.c behaviour causes 3.4 regression on HPUX"),
<https://gcc.gnu.org/ml/gcc-patches/2004-10/msg02027.html>, and up to
commit 932ad4d9b550 ("Make CSE path following use the CFG"),
<https://gcc.gnu.org/ml/gcc-patches/2006-12/msg00431.html>, where CSE
has been restructured sufficiently for the issue not to trigger with the
original reproducer anymore. However the original bug remains and can
trigger, because `comparison_qty' will still be assigned -1 for a memory
reference and the `reg_qty' member of a `cse_reg_info_table' entry will
still be assigned -1 for register 0 where the entry has not been
assigned a quantity, e.g. at initialization.

Use INT_MIN then as noted above, so that the value remains negative, for
consistency with the REGNO_QTY_VALID_P macro (even though not used on
`comparison_qty'), and then so that it should not ever match a valid
negated register number, fixing the regression with commit 08a692679fb8.

gcc/
PR rtl-optimization/115565
* cse.cc (record_jump_cond): Use INT_MIN rather than -1 for
`comparison_qty' if !REG_P.

[to-be-committed,RISC-V,V4] movmem for RISCV with V extension

I hadn't updated my repo on the host where I handle email, so it picked
up the older version of this patch without the testsuite fix.  So, V4
with the testsuite option for lmul fixed.

--

And Sergei's movmem patch.  Just trivial testsuite adjustment for an
option name change and a whitespace fix from me.

I've spun this in my tester for rv32 and rv64.  I'll wait for pre-commit
CI before taking further action.

Just a reminder, this patch is designed to handle the case where we can
issue a single vector load/store which avoids all the complexities of
determining which direction to copy.

--

gcc/ChangeLog

* config/riscv/riscv.md (movmem<mode>): New expander.

gcc/testsuite/ChangeLog

PR target/112109
* gcc.target/riscv/rvv/base/movmem-1.c: New test

Fortran: fix ALLOCATE with SOURCE of deferred character length [PR114019]

gcc/fortran/ChangeLog:

PR fortran/114019
* trans-stmt.cc (gfc_trans_allocate): Fix handling of case of
scalar character expression being used for SOURCE.

gcc/testsuite/ChangeLog:

PR fortran/114019
* gfortran.dg/allocate_with_source_33.f90: New test.

Match: Support imm form for unsigned scalar .SAT_ADD

This patch would like to support the form of unsigned scalar .SAT_ADD
when one of the op is IMM.  For example as below:

Form IMM:
  #define DEF_SAT_U_ADD_IMM_FMT_1(T)       \
  T __attribute__((noinline))              \
  sat_u_add_imm_##T##_fmt_1 (T x)          \
  {                                        \
    return (T)(x + 9) >= x ? (x + 9) : -1; \
  }

DEF_SAT_U_ADD_IMM_FMT_1(uint64_t)

Before this patch:
__attribute__((noinline))
uint64_t sat_u_add_imm_uint64_t_fmt_1 (uint64_t x)
{
  long unsigned int _1;
  uint64_t _3;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _1 = MIN_EXPR <x_2(D), 18446744073709551606>;
  _3 = _1 + 9;
  return _3;
;;    succ:       EXIT

}

After this patch:
__attribute__((noinline))
uint64_t sat_u_add_imm_uint64_t_fmt_1 (uint64_t x)
{
  uint64_t _3;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _3 = .SAT_ADD (x_2(D), 9); [tail call]
  return _3;
;;    succ:       EXIT

}

The below test suites are passed for this patch:
1. The rv64gcv fully regression test with newlib.
2. The x86 bootstrap test.
3. The x86 fully regression test.

gcc/ChangeLog:

* match.pd: Add imm form for .SAT_ADD matching.
* tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children):
Add .SAT_ADD matching under PLUS_EXPR.

Signed-off-by: Pan Li <pan2.li@intel.com>

jit: Fix Darwin bootstrap after r15-1699.

r15-1699-g445c62ee492 contains changes that trigger two maybe-uninitialized
warnings on Darwin, which result in a bootstrap failure.

Note that the warnings are false positives, in fact the variables should be
initialized in the cases of a switch (all values of the switch condition are
covered).

Fixed here by providing default initializations for the relevant variables.

gcc/jit/ChangeLog:

* jit-recording.cc
(recording::memento_of_typeinfo::make_debug_string): Default the value
of ident.
(recording::memento_of_typeinfo::write_reproducer): Default the value
of type.

Signed-off-by: Iain Sandoe <iains@gcc.gnu.org>

[committed] Fix mcore-elf regression after recent IRA change

So the recent IRA change exposed a bug in the mcore backend.

The mcore has a special instruction (xtrb3) which can zero extend a GPR into
R1. It's useful because zextb requires a matching source/destination.
Unfortunately xtrb3 modifies CC.

The IRA changes twiddle register allocation such that we want to use xtrb3.
Unfortunately CC is live at the point where we want to use xtrb3 and clobbering
CC causes the test to fail.

Exposing the clobber in the expander and insn seems like the best path forward.
We could also drop the xtrb3 alternative, but that seems like it would hurt
codegen more than exposing the clobber.

The bitfield extraction patterns using xtrb look problematic as well, but I
didn't try to fix those.

This fixes the builtn-arith-overflow regressions and appears to fix
20010122-1.c as a side effect.

gcc/
* config/mcore/mcore.md (zero_extendqihi2): Clobber CC in expander
and matching insn.
(zero_extendqisi2): Likewise.

Daily bump.

c++: bad 'this' conversion for nullary memfn [PR106760]

Here we notice the 'this' conversion for the call f<void>() is bad, so
we correctly defer deduction for the template candidate, but we end up
never adding it to 'bad_cands' since missing_conversion_p for it returns
false (its only argument is 'this' which has already been determined to
be bad). This is not a huge deal, but it causes us to longer accept the
call with -fpermissive in release builds, and a tree check ICE in checking
builds.

So if we have a non-strictly viable template candidate that has not been
instantiated, then we need to add it to 'bad_cands' even if no argument
conversion is missing.

PR c++/106760

gcc/cp/ChangeLog:

* call.cc (add_candidates): Relax test for adding a candidate
to 'bad_cands' to also accept an uninstantiated template candidate
that has no missing conversions.

gcc/testsuite/ChangeLog:

* g++.dg/ext/conv3.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

libstdc++: Define __glibcxx_assert_fail for non-verbose build [PR115585]

When the library is configured with --disable-libstdcxx-verbose the
assertions just abort instead of calling __glibcxx_assert_fail, and so I
didn't export that function for the non-verbose build. However, that
option is documented to not change the library ABI, so we still need to
export the symbol from the library. It could be needed by programs
compiled against the headers from a verbose build.

The non-verbose definition can just call abort so that it doesn't pull
in I/O symbols, which are unwanted in a non-verbose build.

libstdc++-v3/ChangeLog:

PR libstdc++/115585
* src/c++11/assert_fail.cc (__glibcxx_assert_fail): Add
definition for non-verbose builds.

libstdc++: Extend std::equal memcmp optimization to std::byte [PR101485]

We optimize std::equal to memcmp for integers and pointers, which means
that std::byte comparisons generate bigger code than char comparisons.

We can't use memcmp for arbitrary enum types, because they could have an
overloaded operator== that has custom semantics, but we know that
std::byte doesn't do that.

libstdc++-v3/ChangeLog:

PR libstdc++/101485
* include/bits/stl_algobase.h (__equal_aux1): Check for
std::byte as well.
* testsuite/25_algorithms/equal/101485.cc: New test.

libstdc++: Do not use C++11 alignof in C++98 mode [PR104395]

When -faligned-new (or Clang's -faligned-allocation) is used our
allocators try to support extended alignments, gated on the
__cpp_aligned_new macro. However, because they use alignof(_Tp) which is
not a keyword in C++98 mode, using -std=c++98 -faligned-new results in
errors from <memory> and other headers.

We could change them to use __alignof__ instead of alignof, but that
would potentially alter the result of the conditions, because e.g.
alignof(long long) != __alignof__(long long) on some targets. That's
probably not an issue for any types with extended alignment, so maybe it
would be a safe change.

For now, it seems acceptable to just disable the extended alignment
support in C++98 mode, so that -faligned-new enables std::align_val_t
and the corresponding operator new overloads, but doesn't affect
std::allocator, __gnu_cxx::__bitmap_allocator etc.

libstdc++-v3/ChangeLog:

PR libstdc++/104395
* include/bits/new_allocator.h: Disable extended alignment
support in C++98 mode.
* include/bits/stl_tempbuf.h: Likewise.
* include/ext/bitmap_allocator.h: Likewise.
* include/ext/malloc_allocator.h: Likewise.
* include/ext/mt_allocator.h: Likewise.
* include/ext/pool_allocator.h: Likewise.
* testsuite/ext/104395.cc: New test.

libstdc++: Simplify <ext/aligned_buffer.h> class templates

As noted in a comment, the __gnu_cxx::__aligned_membuf class template
can be simplified, because alignof(T) and alignas(T) use the correct
alignment for a data member. That's true since GCC 8 and Clang 8. The
EDG front end (as used by Intel icc, aka "Intel C++ Compiler Classic")
does not implement the PR c++/69560 change, so keep using the old
implementation when __EDG__ is defined, to avoid an ABI change for icc.

For __gnu_cxx::__aligned_buffer<T> all supported compilers agree on the
value of __alignof__(T), but we can still simplify it by removing the
dependency on std::aligned_storage<sizeof(T), __alignof__(T)>.

Add a test that checks that the aligned buffer types have the expected
alignment, so that we can tell if changes like this affect their ABI
properties.

libstdc++-v3/ChangeLog:

* include/ext/aligned_buffer.h (__aligned_membuf): Use
alignas(T) directly instead of defining a struct and using 9its
alignment.
(__aligned_buffer): Remove use of std::aligned_storage.
* testsuite/abi/aligned_buffers.cc: New test.

ssa_lazy_cache takes an optional bitmap_obstack pointer.

Allow ssa_lazy cache to allocate bitmaps from a client provided obstack
if so desired.

* gimple-range-cache.cc (ssa_lazy_cache::ssa_lazy_cache): Relocate here.
Check for provided obstack.
(ssa_lazy_cache::~ssa_lazy_cache): Relocate here. Free bitmap or obstack.
* gimple-range-cache.h (ssa_lazy_cache::ssa_lazy_cache): Move.
(ssa_lazy_cache::~ssa_lazy_cache): Move.
(ssa_lazy_cache::m_ob): New.
* gimple-range.cc (dom_ranger::dom_ranger): Iniitialize obstack.
(dom_ranger::~dom_ranger): Release obstack.
(dom_ranger::pre_bb): Create ssa_lazy_cache using obstack.
* gimple-range.h (m_bitmaps): New.

i386: Cleanup tmp variable usage in ix86_expand_move

Remove extra assignment, extra temp variable and variable shadowing.

No functional changes intended.

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_move): Remove extra
assignment to tmp variable, reuse tmp variable instead of
declaring new temporary variable and remove tmp variable shadowing.

Use move-aware auto_vec in map

Using auto_vec rather than vec for means the vectors are release
automatically upon return, to stop the leak. The problem seems is that
auto_vec<T, N> is not really move-aware, only the <T, 0> specialization
is.

gcc/ChangeLog:

* tree-profile.cc (find_conditions): Use auto_vec without
embedded storage.

tree-optimization/115652 - more fixing of the fix

The following addresses the corner case of an outer loop with an empty
header where we end up asking for the BB of a NULL stmt by
special-casing this case.

PR tree-optimization/115652
* tree-vect-slp.cc (vect_schedule_slp_node): Handle the case
where the outer loop header block is empty.

i386: Fix regression after refactoring legitimize_pe_coff_symbol, ix86_GOT_alias_set and PE_COFF_LEGITIMIZE_EXTERN_DECL [PR115635]

This patch fixes 3 bugs reported after merging the "Add DLL
import/export implementation to AArch64" series.
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653955.html The
series refactors the i386 codebase to reuse it in AArch64, which
triggers some bugs.

Bug 115661 - [15 Regression] wrong code at -O{2,3} on x86_64-linux-gnu
since r15-1599-g63512c72df09b4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115661

Bug 115635 - [15 regression] Bootstrap fails with failed self-test
with the rust fe (diagnostic-path.cc:1153: test_empty_path: FAIL:
ASSERT_FALSE ((path.interprocedural_p ()))) since
r15-1599-g63512c72df09b4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115635

Issue 1. In some code, i386 has been relying on the
legitimize_pe_coff_symbol call on all platforms and should return
NULL_RTX if it is not supported.

Fix: NULL_RTX handling has been added when the target does not support
PECOFF.

Issue 2. ix86_GOT_alias_set is used on all platforms and cannot be
extracted to mingw.

Fix: ix86_GOT_alias_set has been returned as it was and is used on all
platforms for i386.

Bug 115643 - [15 regression] aarch64-w64-mingw32 support today breaks
x86_64-w64-mingw32 build cannot represent relocation type BFD_RELOC_64
since r15-1602-ged20feebd9ea31
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115643

Issue 3. PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED has been added and
used with a negative operator for a complex expression without braces.

Fix: Braces has been added, and
PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED has been renamed to
PE_COFF_LEGITIMIZE_EXTERN_DECL.

2024-06-28 Evgeny Karpov <Evgeny.Karpov@microsoft.com>

gcc/ChangeLog:
PR bootstrap/115635
PR target/115643
PR target/115661
* config/aarch64/cygming.h
(PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED): Rename to
PE_COFF_LEGITIMIZE_EXTERN_DECL.
(PE_COFF_LEGITIMIZE_EXTERN_DECL): Likewise.
* config/i386/cygming.h (GOT_ALIAS_SET): Remove the diffinition to
reuse it from i386.h.
(PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED): Rename to
PE_COFF_LEGITIMIZE_EXTERN_DECL.
(PE_COFF_LEGITIMIZE_EXTERN_DECL): Likewise.
* config/i386/i386-expand.cc (ix86_expand_move): Return
ix86_GOT_alias_set.
* config/i386/i386-expand.h (ix86_GOT_alias_set): Likewise.
* config/i386/i386.cc (ix86_GOT_alias_set): Likewise.
* config/i386/i386.h (GOT_ALIAS_SET): Likewise.
* config/mingw/winnt-dll.cc (get_dllimport_decl): Use
GOT_ALIAS_SET.
(legitimize_pe_coff_symbol): Rename to
PE_COFF_LEGITIMIZE_EXTERN_DECL.
* config/mingw/winnt-dll.h (ix86_GOT_alias_set): Declare
ix86_GOT_alias_set.

Remove unused hybrid_* operators in range-ops.

gcc/ChangeLog:

* range-op-ptr.cc (class hybrid_and_operator): Remove.
(class hybrid_or_operator): Same.
(class hybrid_min_operator): Same.
(class hybrid_max_operator): Same.

tree-optimization/115640 - outer loop vect with inner SLP permute

The following fixes wrong-code when using outer loop vectorization
and an inner loop SLP access with permutation. A wrong adjustment
to the IV increment is then applied on GCN.

PR tree-optimization/115640
* tree-vect-stmts.cc (vectorizable_load): With an inner
loop SLP access to not apply a gap adjustment.

amdgcn: Fix RDNA V32 permutations [PR115640]

There was an off-by-one error in the RDNA validation check, plus I forgot to
allow for two-to-one permute-and-merge operations.

PR target/115640

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_vectorize_vec_perm_const): Modify RDNA checks.

Add gfc_class_set_vptr.

First step to adding a general assign all class type's data members
routine. Having a general routine prevents forgetting to tackle the
edge cases, e.g. setting _len.

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_class_set_vptr): Add setting of _vptr
member.
* trans-intrinsic.cc (conv_intrinsic_move_alloc): First use
of gfc_class_set_vptr and refactor very similar code.
* trans.h (gfc_class_set_vptr): Declare the new function.

gcc/testsuite/ChangeLog:

* gfortran.dg/unlimited_polymorphic_11.f90: Remove unnecessary
casts in gd-final expression.

Use gfc_reset_vptr more consistently.

The vptr for a class type is set in various ways in different
locations. Refactor the use and simplify code.

gcc/fortran/ChangeLog:

* trans-array.cc (structure_alloc_comps): Use reset_vptr.
* trans-decl.cc (gfc_trans_deferred_vars): Same.
(gfc_generate_function_code): Same.
* trans-expr.cc (gfc_reset_vptr): Allow supplying the class
type.
(gfc_conv_procedure_call): Use reset_vptr.
* trans-intrinsic.cc (gfc_conv_intrinsic_transfer): Same.

i386: Handle sign_extend like zero_extend in *concatditi3_[346]

This patch generalizes some of the patterns in i386.md that recognize
double word concatenation, so they handle sign_extend the same way that
they handle zero_extend in appropriate contexts.

As a motivating example consider the following function:

__int128 foo(long long x, unsigned long long y)
{
  return ((__int128)x<<64) | y;
}

when compiled with -O2, x86_64 currently generates:

foo: movq    %rdi, %rdx
        xorl    %eax, %eax
        xorl    %edi, %edi
        orq     %rsi, %rax
        orq     %rdi, %rdx
        ret

with this patch we now generate (the same as if x is unsigned):

foo: movq    %rsi, %rax
        movq    %rdi, %rdx
        ret

Treating both extensions the same way using any_extend is valid as
the top (extended) bits are "unused" after the shift by 64 (or more).
In theory, the RTL optimizers might consider canonicalizing the form
of extension used in these cases, but zero_extend is faster on some
machine, whereas sign extension is supported via addressing modes on
others, so handling both in the machine description is probably best.

2024-06-28  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386.md (*concat<mode><dwi>3_3): Change zero_extend
to any_extend in first operand to left shift by mode precision.
(*concat<mode><dwi>3_4): Likewise.
(*concat<mode><dwi>3_6): Likewise.

gcc/testsuite/ChangeLog
* gcc.target/i386/concatditi-1.c: New test case.

i386: Some additional AVX512 ternlog refinements.

This patch is another round of refinements to fine tune the new ternlog
infrastructure in i386's sse.md.  This patch tweaks ix86_ternlog_idx
to allow multiple MEM/CONST_VECTOR/VEC_DUPLICATE operands prior to
splitting (before reload), when force_register is called on all but
one of these operands.  Conceptually during the dynamic programming,
registers fill the args slots in the order 0, 1, 2, and mem-like
operands fill the slots in the order 2, 0, 1 [preferring the memory
operand to come last].

This patch allows us to remove some of the legacy ternlog patterns
in sse.md without regressions [which is left to the next and final
patch in this series].  An indication that these patterns are no
longer required is shown by the necessary testsuite tweaks below,
where the output assembler for the legacy instructions used hexadecimal,
but with the new ternlog infrastructure now consistently use decimal.

2024-06-28  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_ternlog_idx) <case VEC_DUPLICATE>:
Add a "goto do_mem_operand" as this need not match memory_operand.
<case CONST_VECTOR>: Only args[2] may be volatile memory operand.
Allow MEM/VEC_DUPLICATE/CONST_VECTOR as args[0] and args[1].

gcc/testsuite/ChangeLog
* gcc.target/i386/avx512f-andn-di-zmm-2.c: Match decimal instead
of hexadecimal immediate operand to ternlog.
* gcc.target/i386/avx512f-andn-si-zmm-2.c: Likewise.
* gcc.target/i386/avx512f-orn-si-zmm-1.c: Likewise.
* gcc.target/i386/avx512f-orn-si-zmm-2.c: Likewise.
* gcc.target/i386/pr100711-3.c: Likewise.
* gcc.target/i386/pr100711-4.c: Likewise.
* gcc.target/i386/pr100711-5.c: Likewise.

Daily bump.

libgccjit: Add ability to get the alignment of a type

gcc/jit/ChangeLog:

* docs/topics/compatibility.rst (LIBGCCJIT_ABI_28): New ABI tag.
* docs/topics/expressions.rst: Document gcc_jit_context_new_alignof.
* jit-playback.cc (new_alignof): New method.
* jit-playback.h: New method.
* jit-recording.cc (recording::context::new_alignof): New
method.
(recording::memento_of_sizeof::replay_into,
recording::memento_of_typeinfo::replay_into,
recording::memento_of_sizeof::make_debug_string,
recording::memento_of_typeinfo::make_debug_string,
recording::memento_of_sizeof::write_reproducer,
recording::memento_of_typeinfo::write_reproducer): Rename.
* jit-recording.h (enum type_info_type): New enum.
(class memento_of_sizeof class memento_of_typeinfo): Rename.
* libgccjit.cc (gcc_jit_context_new_alignof): New function.
* libgccjit.h (gcc_jit_context_new_alignof): New function.
* libgccjit.map: New function.

gcc/testsuite/ChangeLog:

* jit.dg/all-non-failing-tests.h: New test.
* jit.dg/test-alignof.c: New test.

c: Error message for incorrect use of static in array declarations.

Add an explicit error messages when c99's static is
used without a size expression in an array declarator.

gcc/c:
* c-parser.cc (c_parser_direct_declarator_inner): Add
error message.

gcc/testsuite:
* gcc.dg/c99-arraydecl-4.c: New test.

fixincludes: adjust stdio fix for macOS 15 headers

fixincludes/ChangeLog:

* fixincl.x: Regenerate.
* inclhack.def (apple_local_stdio_fn_deprecation): Also apply to
_stdio.h.

Disable late-combine for -O0 [PR115677]

late-combine relies on df, which for -O0 is only initialised late
(pass_df_initialize_no_opt, after split1). Other df-based passes
cope with this by requiring optimize > 0, so this patch does the
same for late-combine.

gcc/
PR rtl-optimization/115677
* late-combine.cc (pass_late_combine::gate): New function.

s390: Check for ADDR_REGS in s390_decompose_addrstyle_without_index

An explicit check for address registers was not required so far since
during register allocation the processing of address constraints was
sufficient. However, address constraints themself do not check for
REGNO_OK_FOR_{BASE,INDEX}_P. Thus, with the newly introduced
late-combine pass in r15-1579-g792f97b44ffc5e we generate new insns with
invalid address registers which aren't fixed up afterwards.

Fixed by explicitly checking for address registers in
s390_decompose_addrstyle_without_index such that those new insns are
rejected.

gcc/ChangeLog:

PR target/115634
* config/s390/s390.cc (s390_decompose_addrstyle_without_index):
Check for ADDR_REGS in s390_decompose_addrstyle_without_index.

tree-optimization/115669 - fix SLP reduction association

The following avoids associating a reduction path as that might
get STMT_VINFO_REDUC_IDX out-of-sync with the SLP operand order.
This is a latent issue with SLP reductions but now easily exposed
as we're doing single-lane SLP reductions.

When we achieved SLP only we can move and update this meta-data.

PR tree-optimization/115669
* tree-vect-slp.cc (vect_build_slp_tree_2): Do not reassociate
chains that participate in a reduction.

* gcc.dg/vect/pr115669.c: New testcase.

libstdc++: Fix std::codecvt<wchar_t, char, mbstate_t> for empty dest [PR37475]

For the GNU locale model, codecvt::do_out and codecvt::do_in incorrectly
return 'ok' when the destination range is empty. That happens because
detecting incomplete output is done in the loop body, and the loop is
never even entered if to == to_end.

By restructuring the loop condition so that we check the output range
separately, we can ensure that for a non-empty source range, we always
enter the loop at least once, and detect if the destination range is too
small.

The loops also seem easier to reason about if we return immediately on
any error, instead of checking the result twice on every iteration. We
can use an RAII type to restore the locale before returning, which also
simplifies all the other member functions.

libstdc++-v3/ChangeLog:

PR libstdc++/37475
* config/locale/gnu/codecvt_members.cc (Guard): New RAII type.
(do_out, do_in): Return partial if the destination is empty but
the source is not. Use Guard to restore locale on scope exit.
Return immediately on any conversion error.
(do_encoding, do_max_length, do_length): Use Guard.
* testsuite/22_locale/codecvt/in/char/37475.cc: New test.
* testsuite/22_locale/codecvt/in/wchar_t/37475.cc: New test.
* testsuite/22_locale/codecvt/out/char/37475.cc: New test.
* testsuite/22_locale/codecvt/out/wchar_t/37475.cc: New test.

[libstdc++] [testsuite] defer to check_vect_support* [PR115454]

The newly-added testcase overrides the default dg-do action set by
check_vect_support_and_set_flags (in libstdc++-dg/conformance.exp), so
it attempts to run the test even if runtime vector support is not
available.

Remove the explicit dg-do directive, so that the default is honored,
and the test is run if vector support is found, and only compiled
otherwise.

for libstdc++-v3/ChangeLog

PR libstdc++/115454
* testsuite/experimental/simd/pr115454_find_last_set.cc: Defer
to check_vect_support_and_set_flags's default dg-do action.

Avoid global bitmap space in ranger.

gcc/ChangeLog:

* gimple-range-cache.cc (update_list::update_list): Add m_bitmaps.
(update_list::~update_list): Initialize m_bitmaps.
* gimple-range-cache.h (ssa_lazy_cache): Add m_bitmaps.
* gimple-range.cc (enable_ranger): Remove global bitmap
initialization.
(disable_ranger): Remove global bitmap release.

libstdc++: Fix std::format for chrono::duration with unsigned rep [PR115668]

Using std::chrono::abs is only valid if numeric_limits<rep>::is_signed
is true, so using it unconditionally made it ill-formed to format a
duration with an unsigned rep.

The duration formatter might as negate the duration itself instead of
using chrono::abs, because it already needs to check for a negative
value.

libstdc++-v3/ChangeLog:

PR libstdc++/115668
* include/bits/chrono_io.h (formatter<duration<R,P, C>::format):
Do not use chrono::abs.
* testsuite/20_util/duration/io.cc: Check formatting a duration
with unsigned rep.

libstdc++: Add debug assertions to std::vector<bool> [PR103191]

This adds debug assertions for std::vector<bool> element access.

libstdc++-v3/ChangeLog:

PR libstdc++/103191
* include/bits/stl_bvector.h (vector<bool>::operator[])
(vector<bool>::front, vector<bool>::back): Add debug assertions.
* testsuite/23_containers/vector/bool/element_access/constexpr.cc:
Remove dg-error that no longer triggers.

libstdc++: Enable more debug assertions during constant evaluation [PR111250]

Some of our debug assertions expand to nothing unless
_GLIBCXX_ASSERTIONS is defined, which means they are not checked during
constant evaluation. By making them unconditionally expand to a
__glibcxx_assert expression they will be checked during constant
evaluation. This allows us to diagnose more instances of undefined
behaviour at compile-time, such as accessing a vector past-the-end.

libstdc++-v3/ChangeLog:

PR libstdc++/111250
* include/debug/assertions.h (__glibcxx_requires_non_empty_range)
(__glibcxx_requires_nonempty, __glibcxx_requires_subscript):
Define to __glibcxx_assert expressions or to debug mode
__glibcxx_check_xxx expressions.
* testsuite/23_containers/array/element_access/constexpr_c++17.cc:
Add checks for out-of-bounds accesses in constant expressions.
* testsuite/23_containers/vector/element_access/constexpr.cc:
Likewise.

ada: Remove last uses of System.Address_Operations in runtime library

This completes the switch from using System.Address_Operations to using only
System.Storage_Elements in the runtime library. The remaining uses were for
simple optimizations that can be done by the optimizer alone.

gcc/ada/

* libgnat/s-carsi8.adb: Remove clauses for System.Address_Operations
and use only operations of System.Storage_Elements for addresses.
* libgnat/s-casi16.adb: Likewise.
* libgnat/s-casi32.adb: Likewise.
* libgnat/s-casi64.adb: Likewise.
* libgnat/s-casi128.adb: Likewise.
* libgnat/s-carun8.adb: Likewise.
* libgnat/s-caun16.adb: Likewise.
* libgnat/s-caun32.adb: Likewise.
* libgnat/s-caun64.adb: Likewise.
* libgnat/s-caun128.adb: Likewise.
* libgnat/s-geveop.adb: Likewise.

ada: Reject ambiguous function calls in interpolated string expressions

gcc/ada/

* sem_ch2.adb (Analyze_Interpolated_String_Literal): Report
interpretations of ambiguous parameterless function calls.

ada: Add missing dimension information for target names

It is computed from the Etype of N_Target_Name nodes.

gcc/ada/

* sem_ch5.adb (Analyze_Target_Name): Call Analyze_Dimension on the
node once the Etype is set.
* sem_dim.adb (OK_For_Dimension): Set to True for N_Target_Name.
(Analyze_Dimension): Call Analyze_Dimension_Has_Etype for it.

ada: Fix array-manipulating code in Mdll

This patch fixes a duo of array assigments in Mdll that were bound
to fail.

gcc/ada/

* mdll.adb (Build_Non_Reloc_DLL): Fix incorrect assignment
to array object.
(Ada_Build_Non_Reloc_DLL): Likewise.

ada: Bug using user defined string literals with interpolated strings

The frontend rejects the use of user defined string literals
using interpolated strings.

gcc/ada/

* sem_res.adb (Has_Applicable_User_Defined_Literal): Add missing
support for interpolated strings.

ada: Overridden operation field not correctly set for controlling result wrappers

Implicit wrapper overridings generated for functions with
controlling result when deriving with null extension may
have field Overridden_Operation incorrectly set, when making
several such derivations in succession. This happens because
overridings were assumed to come from source, and entities
generated by Derive_Subprograms were also assumed to be
derived from source subprograms. Overridden_Operation could
be set to the entity generated by Derive_Subprograms for the
same type, resulting in a cycle between Overriden_Operation
and Alias fields, causing non-termination in GNATprove.

gcc/ada/

* sem_ch6.adb (Check_Overriding_Indicator) Remove Comes_From_Source filter.
(New_Overloaded_Entity) Move up special case of LSP_Subprogram,
and remove Comes_From_Source filter.

ada: Implement first half of Generalized Finalization

This implements the first half of the Generalized Finalization proposal,
namely the Finalizable aspect as well as its optional relaxed semantics
for the finalization operations, but the latter part is only implemented
for dynamically allocated objects.

In accordance with the spirit, if not the letter, of the proposal, this
implements the finalizable types declared with strict semantics for the
finalization operations as a direct generalization of controlled types,
which in turn makes it possible to reimplement the latter types in terms
of the former types and ensures full interoperability between them.

The relaxed semantics for the finalization operations is also a direct
generalization of the GNAT pragma No_Heap_Finalization for dynamically
allocated objects, in that it extends the effects of the pragma to all
access types designating the finalizable type, instead of just applying
them to library-level named access types.

gcc/ada/

* aspects.ads (Aspect_Id): Add Aspect_Finalizable.
(Implementation_Defined_Aspect): Add True for Aspect_Finalizable.
(Operational_Aspect): Add True for Aspect_Finalizable.
(Aspect_Argument): Add Expression for Aspect_Finalizable.
(Is_Representation_Aspect): Add False for Aspect_Finalizable.
(Aspect_Names): Add Name_Finalizable for Aspect_Finalizable.
(Aspect_Delay): Add Always_Delay for Aspect_Finalizable.
* checks.adb: Add with and use clauses for Sem_Elab.
(Install_Primitive_Elaboration_Check): Call Is_Controlled_Procedure.
* einfo.ads (Has_Relaxed_Finalization): Document new flag.
(Is_Controlled_Active): Update documentation.
* exp_aggr.adb (Generate_Finalization_Actions): Replace Find_Prim_Op
with Find_Controlled_Prim_Op for Name_Finalize.
* exp_attr.adb (Expand_N_Attribute_Reference) <Finalization_Size>:
Return 0 if the prefix type has relaxed finalization.
* exp_ch3.adb (Build_Equivalent_Record_Aggregate): Return Empty if
the type needs finalization.
(Expand_Freeze_Record_Type): Call Find_Controlled_Prim_Op instead of
Find_Prim_Op for Name_{Adjust,Initialize,Finalize}.
Call Make_Finalize_Address_Body for all controlled types.
* exp_ch4.adb (Insert_Dereference_Action): Do not generate a call to
Adjust_Controlled_Dereference if the designated type has relaxed
finalization.
* exp_ch6.adb (Needs_BIP_Collection): Return false for an untagged
type that has relaxed finalization.
* exp_ch7.adb (Allows_Finalization_Collection): Return false if the
designated type has relaxed finalization.
(Check_Visibly_Controlled): Call Find_Controlled_Prim_Op instead of
Find_Prim_Op.
(Make_Adjust_Call): Likewise.
(Make_Deep_Record_Body): Likewise.
(Make_Final_Call): Likewise.
(Make_Init_Call): Likewise.
* exp_disp.adb (Set_All_DT_Position): Remove obsolete warning.
* exp_util.ads: Add with and use clauses for Snames.
(Find_Prim_Op): Add precondition.
(Find_Controlled_Prim_Op): New function declaration.
(Name_Of_Controlled_Prim_Op): Likewise.
* exp_util.adb: Remove with and use clauses for Snames.
(Build_Allocate_Deallocate_Proc): Do not build finalization actions
if the designated type has relaxed finalization.
(Find_Controlled_Prim_Op): New function.
(Find_Last_Init): Call Find_Controlled_Prim_Op instead of
Find_Prim_Op.
(Name_Of_Controlled_Prim_Op): New function.
* freeze.adb (Freeze_Entity.Freeze_Record_Type): Propagate the
Has_Relaxed_Finalization flag from components.
* gen_il-fields.ads (Opt_Field_Enum): Add Has_Relaxed_Finalization.
* gen_il-gen-gen_entities.adb (Entity_Kind): Likewise.
* sem_aux.adb (Is_By_Reference_Type): Return true for all controlled
types.
* sem_ch3.adb (Build_Derived_Record_Type): Do not special case types
declared in Ada.Finalization.
(Record_Type_Definition): Propagate the Has_Relaxed_Finalization
flag from components.
* sem_ch13.adb (Analyze_Aspects_At_Freeze_Point): Also process the
Finalizable aspect.
(Analyze_Aspect_Specifications): Likewise. Call Flag_Non_Static_Expr
in more cases.
(Check_Aspect_At_Freeze_Point): Likewise.
(Inherit_Aspects_At_Freeze_Point): Likewise.
(Resolve_Aspect_Expressions): Likewise.
(Resolve_Finalizable_Argument): New procedure.
(Validate_Finalizable_Aspect): Likewise.
* sem_elab.ads: Add with and use clauses for Snames.
(Is_Controlled_Procedure): New function declaration.
* sem_elab.adb: Remove with and use clauses for Snames.
(Is_Controlled_Proc): Move to...
(Is_Controlled_Procedure): ...here and rename.
(Check_A_Call): Call Find_Controlled_Prim_Op instead of
Find_Prim_Op.
(Is_Finalization_Procedure): Likewise.
* sem_util.ads (Propagate_Controlled_Flags): Update documentation.
* sem_util.adb (Is_Fully_Initialized_Type): Replace call to
Find_Optional_Prim_Op with Find_Controlled_Prim_Op.
Call Has_Null_Extension only for derived tagged types.
(Propagate_Controlled_Flags): Propagate Has_Relaxed_Finalization.
* snames.ads-tmpl (Name_Finalizable): New name.
(Name_Relaxed_Finalization): Likewise.
* libgnat/s-finroo.ads (Root_Controlled): Add Finalizable aspect.
* doc/gnat_rm/gnat_language_extensions.rst: Document implementation
of Generalized Finalization.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.

i386: Refactor vcvttps2qq/vcvtqq2ps patterns.

Refactor vcvttps2qq/vcvtqq2ps patterns for remove redundant
round_*_modev8sf_condition.

gcc/ChangeLog:

* config/i386/sse.md
(float<floatunssuffix><sselongvecmodelower><mode>2<mask_name>
<round_name>): Refactor the pattern.
(unspec_fix<vcvtt_uns_suffix>_trunc<mode><sselongvecmodelower>2
<mask_name><round_saeonly_name>): Ditto.
(fix<fixunssuffix>_trunc<mode><sselongvecmodelower>2<mask_name>
<round_saeonly_name>): Ditto.
* config/i386/subst.md (round_modev8sf_condition): Remove.
(round_saeonly_modev8sf_condition): Ditto.

vect: support direct conversion under x86-64-v3.

gcc/ChangeLog:

PR target/107432
* config/i386/i386-expand.cc (ix86_expand_trunc_with_avx2_noavx512f):
New function for generate a series of suitable insn.
* config/i386/i386-protos.h (ix86_expand_trunc_with_avx2_noavx512f):
Define new function.
* config/i386/sse.md: Extend trunc<mode><mode>2 for x86-64-v3.
(ssebytemode) Add V8HI.
(PMOV_DST_MODE_2_AVX2): New mode iterator.
(PMOV_SRC_MODE_3_AVX2): Ditto.
* config/i386/mmx.md
(trunc<mode><mmxhalfmodelower>2): Ditto.
(avx512vl_trunc<mode><mmxhalfmodelower>2): Ditto.
(truncv2si<mode>2): Ditto.
(avx512vl_truncv2si<mode>2): Ditto.
(mmxbytemode): New mode attr.

gcc/testsuite/ChangeLog:

PR target/107432
* gcc.target/i386/pr107432-8.c: New test.
* gcc.target/i386/pr107432-9.c: Ditto.
* gcc.target/i386/pr92645-4.c: Modify test.

vect: Support v4hi -> v4qi.

gcc/ChangeLog:

PR target/107432
* config/i386/mmx.md
(VI2_32_64): New mode iterator.
(mmxhalfmode): New mode atter.
(mmxhalfmodelower): Ditto.
(truncv2hiv2qi2): Extend mode v4hi and change name from
truncv2hiv2qi to trunc<mode><mmxhalfmodelower>2.

gcc/testsuite/ChangeLog:

PR target/107432
* gcc.target/i386/pr107432-1.c: Modify test.
* gcc.target/i386/pr107432-6.c: Add test.
* gcc.target/i386/pr108938-3.c: This patch supports
truncv4hiv4qi affect bswap optimization, so I added
the -mno-avx option for now, and open a bugzilla.

vect: generate suitable convert insn for int -> int, float -> float and int <-> float.

gcc/ChangeLog:

PR target/107432
* tree-vect-generic.cc
(expand_vector_conversion): Support convert for int -> int,
float -> float and int <-> float.
* tree-vect-stmts.cc (vectorizable_conversion): Wrap the
indirect convert part.
(supportable_indirect_convert_operation): New function.
* tree-vectorizer.h (supportable_indirect_convert_operation):
Define the new function.

gcc/testsuite/ChangeLog:

PR target/107432
* gcc.target/i386/pr107432-1.c: New test.
* gcc.target/i386/pr107432-2.c: Ditto.
* gcc.target/i386/pr107432-3.c: Ditto.
* gcc.target/i386/pr107432-4.c: Ditto.
* gcc.target/i386/pr107432-5.c: Ditto.
* gcc.target/i386/pr107432-6.c: Ditto.
* gcc.target/i386/pr107432-7.c: Ditto.

RISC-V: Add testcases for vector truncate after .SAT_SUB

This patch would like to add the test cases of the vector truncate after
.SAT_SUB.  Aka:

  #define DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(OUT_T, IN_T)                   \
  void __attribute__((noinline))                                       \
  vec_sat_u_sub_trunc_##OUT_T##_fmt_1 (OUT_T *out, IN_T *op_1, IN_T y, \
        unsigned limit)                 \
  {                                                                    \
    unsigned i;                                                        \
    for (i = 0; i < limit; i++)                                        \
      {                                                                \
        IN_T x = op_1[i];                                              \
        out[i] = (OUT_T)(x >= y ? x - y : 0);                          \
      }                                                                \
  }

The below 3 cases are included.

DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint8_t, uint16_t)
DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint16_t, uint32_t)
DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint32_t, uint64_t)

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
test macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_scalar.h: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-run-3.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

LoongArch: NFC: Dedup and sort the comment in loongarch_print_operand_reloc

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_print_operand_reloc):
Dedup and sort the comment describing modifiers.

LoongArch: Tweak IOR rtx_cost for bstrins

Consider

    c &= 0xfff;
    a &= ~0xfff;
    b &= ~0xfff;
    a |= c;
    b |= c;

This can be done with 2 bstrins instructions.  But we need to recognize
it in loongarch_rtx_costs or the compiler will not propagate "c & 0xfff"
forward.

gcc/ChangeLog:

* config/loongarch/loongarch.cc:
(loongarch_use_bstrins_for_ior_with_mask): Split the main logic
into ...
(loongarch_use_bstrins_for_ior_with_mask_1): ... here.
(loongarch_rtx_costs): Special case for IOR those can be
implemented with bstrins.

gcc/testsuite/ChangeLog;

* gcc.target/loongarch/bstrins-3.c: New test.

Fix wrong cost of MEM when addr is a lea.

416.gamess regressed 4-6% on x86_64 since my r15-882-g1d6199e5f8c1c0.
The commit adjust rtx_cost of mem to reduce cost of (add op0 disp).
But Cost of ADDR could be cheaper than XEXP (addr, 0) when it's a lea.
It is the case in the PR, the patch adjust rtx_cost to only handle reg
+ disp, for other forms, they're basically all LEA which doesn't have
additional cost of ADD.

gcc/ChangeLog:

PR target/115462
* config/i386/i386.cc (ix86_rtx_costs): Make cost of MEM (reg +
disp) just a little bit more than MEM (reg).

gcc/testsuite/ChangeLog:
* gcc.target/i386/pr115462.c: New test.

Internal-fn: Support new IFN SAT_TRUNC for unsigned scalar int

This patch would like to add the middle-end presentation for the
saturation truncation.  Aka set the result of truncated value to
the max value when overflow.  It will take the pattern similar
as below.

Form 1:
  #define DEF_SAT_U_TRUC_FMT_1(WT, NT) \
  NT __attribute__((noinline))         \
  sat_u_truc_##T##_fmt_1 (WT x)        \
  {                                    \
    bool overflow = x > (WT)(NT)(-1);  \
    return ((NT)x) | (NT)-overflow;    \
  }

For example, truncated uint16_t to uint8_t, we have

* SAT_TRUNC (254)   => 254
* SAT_TRUNC (255)   => 255
* SAT_TRUNC (256)   => 255
* SAT_TRUNC (65536) => 255

Given below SAT_TRUNC from uint64_t to uint32_t.

DEF_SAT_U_TRUC_FMT_1 (uint64_t, uint32_t)

Before this patch:
__attribute__((noinline))
uint32_t sat_u_truc_T_fmt_1 (uint64_t x)
{
  _Bool overflow;
  unsigned int _1;
  unsigned int _2;
  unsigned int _3;
  uint32_t _6;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  overflow_5 = x_4(D) > 4294967295;
  _1 = (unsigned int) x_4(D);
  _2 = (unsigned int) overflow_5;
  _3 = -_2;
  _6 = _1 | _3;
  return _6;
;;    succ:       EXIT

}

After this patch:
__attribute__((noinline))
uint32_t sat_u_truc_T_fmt_1 (uint64_t x)
{
  uint32_t _6;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _6 = .SAT_TRUNC (x_4(D)); [tail call]
  return _6;
;;    succ:       EXIT

}

The below tests are passed for this patch:
*. The rv64gcv fully regression tests.
*. The rv64gcv build with glibc.
*. The x86 bootstrap tests.
*. The x86 fully regression tests.

gcc/ChangeLog:

* internal-fn.def (SAT_TRUNC): Add new signed IFN sat_trunc as
unary_convert.
* match.pd: Add new matching pattern for unsigned int sat_trunc.
* optabs.def (OPTAB_CL): Add unsigned and signed optab.
* tree-ssa-math-opts.cc (gimple_unsigend_integer_sat_trunc): Add
new decl for the matching pattern generated func.
(match_unsigned_saturation_trunc): Add new func impl to match
the .SAT_TRUNC.
(math_opts_dom_walker::after_dom_children): Add .SAT_TRUNC match
function under BIT_IOR_EXPR case.

Signed-off-by: Pan Li <pan2.li@intel.com>

Vect: Support truncate after .SAT_SUB pattern in zip

The zip benchmark of coremark-pro have one SAT_SUB like pattern but
truncated as below:

void test (uint16_t *x, unsigned b, unsigned n)
{
  unsigned a = 0;
  register uint16_t *p = x;

  do {
    a = *--p;
    *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
  } while (--n);
}

It will have gimple before vect pass,  it cannot hit any pattern of
SAT_SUB and then cannot vectorize to SAT_SUB.

_2 = a_11 - b_12(D);
iftmp.0_13 = (short unsigned int) _2;
_18 = a_11 >= b_12(D);
iftmp.0_5 = _18 ? iftmp.0_13 : 0;

This patch would like to improve the pattern match to recog above
as truncate after .SAT_SUB pattern.  Then we will have the pattern
similar to below,  as well as eliminate the first 3 dead stmt.

_2 = a_11 - b_12(D);
iftmp.0_13 = (short unsigned int) _2;
_18 = a_11 >= b_12(D);
iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D));

The below tests are passed for this patch.
1. The rv64gcv fully regression tests.
2. The rv64gcv build with glibc.
3. The x86 bootstrap tests.
4. The x86 fully regression tests.

gcc/ChangeLog:

* match.pd: Add convert description for minus and capture.
* tree-vect-patterns.cc (vect_recog_build_binary_gimple_call): Add
new logic to handle in_type is incompatibile with out_type,  as
well as rename from.
(vect_recog_build_binary_gimple_stmt): Rename to.
(vect_recog_sat_add_pattern): Leverage above renamed func.
(vect_recog_sat_sub_pattern): Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

tree-optimization/115652 - amend last fix

The previous fix breaks in the degenerate case when the discovered
last_stmt is equal to the first stmt in the block since then we
undo a required stmt advancement.

PR tree-optimization/115652
* tree-vect-slp.cc (vect_schedule_slp_node): Only insert
at the start of the block if that strictly dominates
the discovered dependent stmt.

tree-optimization/115493 - complete previous fix

The following fixes the 2nd occurance of new_temp missed with the
previous fix.

PR tree-optimization/115493
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Use
first scalar result.

Daily bump.

libstdc++: Add script to update docs for a new release branch

This should be run on a release branch after branching from trunk.
Various links and references to trunk in the docs will be updated to
refer to the new release branch.

libstdc++-v3/ChangeLog:

* scripts/update_release_branch.sh: New file.

libstdc++: Remove duplicate test

We currently have 808590.cc which only runs for C++98 mode, and
808590-cxx11.cc which only runs for C++11 and later, but have almost
identical content (except for a defaulted special member in the C++11
one, to suppress a -Wdeprecated-copy warning).

This was done originally to ensure that the test ran for both C++98 mode
and C++11 mode, because the logic being tested was different enough to
need both to be tested. But it's trivial to run all tests in multiple
-std modes now, using GLIBCXX_TESTSUITE_STDS, so we don't need two
separate tests. We can remove one of the tests and allow the other one
to run in any -std mode.

libstdc++-v3/ChangeLog:

* testsuite/20_util/specialized_algorithms/uninitialized_copy/808590.cc:
Copy defaulted assignment operator from 808590-cxx11.cc to
suppress a warning.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/808590-cxx11.cc:
Removed.

libstdc++: Increase timeouts for PSTL tests in debug mode [PR90276]

These tests compile very slowly in debug mode.

libstdc++-v3/ChangeLog:

PR libstdc++/90276
* testsuite/25_algorithms/pstl/alg_modifying_operations/rotate_copy.cc:
Increase timeout for debug mode.
* testsuite/25_algorithms/pstl/alg_modifying_operations/transform_binary.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/mismatch.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/lexicographical_compare.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/minmax_element.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/set_symmetric_difference.cc:
Likewise.

libstdc++: Work around some PSTL test failures for debug mode [PR90276]

This addresses one known failure due to a bug in the upstream tests, and
a number of timeouts due to the algorithms running much more slowly with
debug mode checks enabled.

libstdc++-v3/ChangeLog:

PR libstdc++/90276
* testsuite/25_algorithms/pstl/alg_sorting/partial_sort.cc
[_GLIBCXX_DEBUG]: Add xfail-run-if for debug mode.
* testsuite/25_algorithms/pstl/alg_nonmodifying/nth_element.cc
[_GLIBCXX_DEBUG]: Reduce size of test data.
* testsuite/25_algorithms/pstl/alg_sorting/includes.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/set_util.h:
Likewise.