git.ipfire.org Git - thirdparty/gcc.git/log

Daily bump.

RISC-V: Move lmul calculation into macro

We calculate LMUL according to --param=riscv-autovec-lmul
in multiple places: int lmul = riscv_autovec_lmul == RVV_DYNAMIC ? RVV_M8 : riscv_autovec_lmul;

Create a new macro for it to ease maintenance.

gcc/ChangeLog:

* config/riscv/riscv-opts.h (TARGET_MAX_LMUL): New macro.
* config/riscv/riscv-v.cc (preferred_simd_mode): Adapt macro.
(autovectorize_vector_modes): Ditto.
(can_find_related_mode_p): Ditto.
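
As a rough sketch of what the macro factors out: the enum values below are stand-ins chosen for illustration (not copied from riscv-opts.h), and max_lmul is a hypothetical helper rather than the real TARGET_MAX_LMUL definition.

```c
#include <assert.h>

/* Stand-in for the LMUL settings; the numeric values are assumptions.  */
enum rvv_lmul { RVV_M1 = 1, RVV_M2 = 2, RVV_M4 = 4, RVV_M8 = 8, RVV_DYNAMIC = 9 };

/* Hypothetical helper mirroring the expression quoted above:
   a dynamic LMUL is treated as the largest fixed LMUL (M8).  */
static int
max_lmul (enum rvv_lmul autovec_lmul)
{
  int lmul = autovec_lmul == RVV_DYNAMIC ? RVV_M8 : autovec_lmul;
  assert (lmul >= RVV_M1 && lmul <= RVV_M8);
  return lmul;
}
```

Keeping the expression in one place means each caller only names the result instead of repeating the RVV_DYNAMIC special case.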

RISC-V: Add AVL propagation PASS for RVV auto-vectorization

This patch addresses the redundant AVL/VL toggling in RVV partial auto-vectorization,
which has been a known issue for a long time and which I have finally found the time to address.

Consider a simple vector addition operation:

https://godbolt.org/z/7hfGfEjW3

void
foo (int *__restrict a,
     int *__restrict b,
     int n)
{
  for (int i = 0; i < n; i++)
      a[i] = a[i] + b[i];
}

Optimized IR:

Loop body:
  _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, 4]);                          -> vsetvli a5,a2,e8,mf4,ta,ma
  ...
  vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, _38, 0);    -> vle32.v v2,0(a0)
  vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... }, _38, 0);   -> vle32.v v1,0(a1)
  vect__7.12_19 = vect__6.11_20 + vect__4.8_27;                              -> vsetvli a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2
  .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, vect__7.12_19);  -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4)

We can see 2 redundant vsetvls inside the loop body due to AVL/VL toggling.
The AVL/VL toggling happens because the simple PLUS_EXPR GIMPLE assignment carries no LEN information:

vect__7.12_19 = vect__6.11_20 + vect__4.8_27;

On partial vectorization, GCC applies partially predicated loads/stores and un-predicated full-vector operations.
This flow is used by all other targets such as ARM SVE (RVV uses it too):

ARM SVE:

.L3:
        ld1w    z30.s, p7/z, [x0, x3, lsl 2]   -> predicated load
        ld1w    z31.s, p7/z, [x1, x3, lsl 2]   -> predicated load
        add     z31.s, z31.s, z30.s            -> un-predicated add
        st1w    z31.s, p7, [x0, x3, lsl 2]     -> predicated store

Such a vectorization flow causes AVL/VL toggling on RVV, so we need an AVL propagation PASS for it.

Also, it is very unlikely that we can apply predicated operations to all vectorization, for the following reasons:

1. Supporting them on all vectorization is a very heavy workload, and we see no benefit if the target backend can handle the issue.
2. Changing the loop vectorizer for it would make the code base ugly and hard to maintain.
3. We would need a great many patterns covering all operations: not only COND_LEN_ADD, COND_LEN_SUB, ...,
   but also COND_LEN_EXTEND, ..., COND_LEN_CEIL, ..., over 100 patterns in total, which is an unreasonable number.

To conclude, we prefer un-predicated operations here, and design a clean AVL propagation PASS to elide the redundant vsetvls
caused by AVL/VL toggling.
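
The idea can be illustrated with a toy model. This is only a sketch under stated assumptions, not the algorithm in riscv-avlprop.cc: struct insn, count_vsetvl and propagate_avl are hypothetical names, each instruction asks for either the loop's AVL or VLMAX, and a VLMAX operation may take the AVL when every consumer of its result is itself AVL-controlled (elements past the AVL are then never read).

```c
#include <assert.h>
#include <stddef.h>

enum vl_kind { VL_AVL, VL_VLMAX };

struct insn
{
  enum vl_kind vl;   /* Which vector length the insn asks for.  */
  unsigned users;    /* Bitmask of later insns that read its result.  */
};

/* Count the vsetvli instructions a naive emitter needs: one for the
   first insn, plus one at every change of the requested length.  */
static size_t
count_vsetvl (const struct insn *insns, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (i == 0 || insns[i].vl != insns[i - 1].vl)
      count++;
  return count;
}

/* Walk backwards so that users are settled before their producers.
   A VLMAX insn takes the AVL when all of its users are AVL-controlled.  */
static void
propagate_avl (struct insn *insns, size_t n)
{
  assert (n <= 8 * sizeof (unsigned));
  for (size_t i = n; i-- > 0;)
    {
      if (insns[i].vl != VL_VLMAX || insns[i].users == 0)
        continue;
      int all_avl = 1;
      for (size_t u = i + 1; u < n; u++)
        if ((insns[i].users & (1u << u)) && insns[u].vl != VL_AVL)
          all_avl = 0;
      if (all_avl)
        insns[i].vl = VL_AVL;
    }
}
```

Modelling the loop body of foo as two AVL loads, one VLMAX add feeding the store, and one AVL store, count_vsetvl reports 3 before propagate_avl and 1 after it, matching the three vsetvli instructions in the listing above and the single one that should remain.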

The second question is why we add a separate PASS called AVL propagation rather than optimizing this in the VSETVL PASS (we definitely can optimize AVL in the VSETVL PASS).

Frankly, I was planning to address this issue in the VSETVL PASS, which is why we recently refactored it. However, I changed my mind after several
experiments.

The reasons are as follows:

1. Code base management and maintenance. The current VSETVL PASS is complicated enough and already has enough aggressive and fancy optimizations,
   which turn out to generate optimal codegen in most cases. It is not a good idea to keep adding features to the VSETVL PASS and make it
   heavier and heavier, since we would then need to refactor it again in the future.
   Actually, the VSETVL PASS is very stable and optimal after the recent refactoring. Hopefully we will not need to change it any more, except for minor
   fixes.

2. vsetvl insertion (which the VSETVL PASS does) and AVL propagation are 2 different things; I don't think we should fuse them into the same PASS.

3. The VSETVL PASS is a post-RA PASS, whereas AVL propagation should be done before RA, which can reduce register usage.

4. This patch's AVL propagation PASS only does AVL propagation for RVV partial auto-vectorization situations.
   The code is only a few hundred lines, which is very manageable.
   We can easily extend and enhance AVL propagation in a clean and separate PASS in the future. (Doing it in the VSETVL PASS would complicate
   the VSETVL PASS again, which is already so complicated.)

Here is an example to demonstrate more:

https://godbolt.org/z/bE86sv3q5

void foo2 (int *__restrict a,
          int *__restrict b,
          int *__restrict c,
          int *__restrict a2,
          int *__restrict b2,
          int *__restrict c2,
          int *__restrict a3,
          int *__restrict b3,
          int *__restrict c3,
          int *__restrict a4,
          int *__restrict b4,
          int *__restrict c4,
          int *__restrict a5,
          int *__restrict b5,
          int *__restrict c5,
          int n)
{
    for (int i = 0; i < n; i++){
      a[i] = b[i] + c[i];
      b5[i] = b[i] + c[i];
      a2[i] = b2[i] + c2[i];
      a3[i] = b3[i] + c3[i];
      a4[i] = b4[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a5[i] + b5[i]+ a[i];

      a[i] = a[i] + c[i];
      b5[i] = a[i] + c[i];
      a2[i] = a[i] + c2[i];
      a3[i] = a[i] + c3[i];
      a4[i] = a[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a[i] + b5[i]+ a[i];
    }
}

1. Loop Body:

Before this patch:                                          After this patch:

      vsetvli a4,t1,e8,mf4,ta,ma                           vsetvli a4,t1,e32,m1,ta,ma
        vle32.v v2,0(a2)                                     vle32.v v2,0(a2)
        vle32.v v4,0(a1)                                     vle32.v v3,0(t2)
        vle32.v v1,0(t2)                                     vle32.v v4,0(a1)
        vsetvli a7,zero,e32,m1,ta,ma                         vle32.v v1,0(t0)
        vadd.vv v4,v2,v4                                     vadd.vv v4,v2,v4
        vsetvli zero,a4,e32,m1,ta,ma                         vadd.vv v1,v3,v1
        vle32.v v3,0(s0)                                     vadd.vv v1,v1,v4
        vsetvli a7,zero,e32,m1,ta,ma                         vadd.vv v1,v1,v4
        vadd.vv v1,v3,v1                                     vadd.vv v1,v1,v4
        vadd.vv v1,v1,v4                                     vadd.vv v1,v1,v2
        vadd.vv v1,v1,v4                                     vadd.vv v2,v1,v2
        vadd.vv v1,v1,v4                                     vse32.v v2,0(t5)
        vsetvli zero,a4,e32,m1,ta,ma                         vadd.vv v2,v2,v1
        vle32.v v4,0(a5)                                     vadd.vv v2,v2,v1
        vsetvli a7,zero,e32,m1,ta,ma                         slli a7,a4,2
        vadd.vv v1,v1,v2                                     vadd.vv v3,v1,v3
        vadd.vv v2,v1,v2                                     vle32.v v5,0(a5)
        vadd.vv v4,v1,v4                                     vle32.v v6,0(t6)
        vsetvli zero,a4,e32,m1,ta,ma                         vse32.v v3,0(t3)
        vse32.v v2,0(t5)                                     vse32.v v2,0(a0)
        vse32.v v4,0(a3)                                     vadd.vv v3,v3,v1
        vsetvli a7,zero,e32,m1,ta,ma                         vadd.vv v2,v1,v5
        vadd.vv v3,v1,v3                                     vse32.v v3,0(t4)
        vadd.vv v2,v2,v1                                     vadd.vv v1,v1,v6
        vadd.vv v2,v2,v1                                     vse32.v v2,0(a3)
        vsetvli zero,a4,e32,m1,ta,ma                         vse32.v v1,0(a6)
        vse32.v v2,0(a0)
        vse32.v v3,0(t3)
        vle32.v v2,0(t0)
        vsetvli a7,zero,e32,m1,ta,ma
        vadd.vv v3,v3,v1
        vsetvli zero,a4,e32,m1,ta,ma
        vse32.v v3,0(t4)
        vsetvli a7,zero,e32,m1,ta,ma
        slli    a7,a4,2
        vadd.vv v1,v1,v2
        sub     t1,t1,a4
        vsetvli zero,a4,e32,m1,ta,ma
        vse32.v v1,0(a6)

Clearly, all the heavy and redundant vsetvls inside the loop body are eliminated.

2. Epilogue:
    Before this patch:                                          After this patch:

     .L5:                                                      .L5:
        ld      s0,8(sp)                                         ret
        addi    sp,sp,16
        jr      ra

This is the benefit of doing the AVL propagation before RA: we eliminate the use of the 'a7' register
by the redundant AVL/VL toggling instruction: 'vsetvli a7,zero,e32,m1,ta,ma'

The final codegen after this patch:

foo2:
lw t1,56(sp)
ld t6,0(sp)
ld t3,8(sp)
ld t0,16(sp)
ld t2,24(sp)
ld t4,32(sp)
ld t5,40(sp)
ble t1,zero,.L5
.L3:
vsetvli a4,t1,e32,m1,ta,ma
vle32.v v2,0(a2)
vle32.v v3,0(t2)
vle32.v v4,0(a1)
vle32.v v1,0(t0)
vadd.vv v4,v2,v4
vadd.vv v1,v3,v1
vadd.vv v1,v1,v4
vadd.vv v1,v1,v4
vadd.vv v1,v1,v4
vadd.vv v1,v1,v2
vadd.vv v2,v1,v2
vse32.v v2,0(t5)
vadd.vv v2,v2,v1
vadd.vv v2,v2,v1
slli a7,a4,2
vadd.vv v3,v1,v3
vle32.v v5,0(a5)
vle32.v v6,0(t6)
vse32.v v3,0(t3)
vse32.v v2,0(a0)
vadd.vv v3,v3,v1
vadd.vv v2,v1,v5
vse32.v v3,0(t4)
vadd.vv v1,v1,v6
vse32.v v2,0(a3)
vse32.v v1,0(a6)
sub t1,t1,a4
add a1,a1,a7
add a2,a2,a7
add a5,a5,a7
add t6,t6,a7
add t0,t0,a7
add t2,t2,a7
add t5,t5,a7
add a3,a3,a7
add a6,a6,a7
add t3,t3,a7
add t4,t4,a7
add a0,a0,a7
bne t1,zero,.L3
.L5:
ret

PR target/111318
PR target/111888

gcc/ChangeLog:

* config.gcc: Add AVL propagation pass.
* config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Ditto.
* config/riscv/riscv-protos.h (make_pass_avlprop): Ditto.
* config/riscv/t-riscv: Ditto.
* config/riscv/riscv-avlprop.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr111318.c: New test.
* gcc.target/riscv/rvv/autovec/pr111888.c: New test.
Tested-by: Patrick O'Neill <patrick@rivosinc.com>

libstdc++: Fix exception thrown by std::shared_lock::unlock() [PR112089]

The incorrect errc constant here looks like a copy&paste error.

libstdc++-v3/ChangeLog:

PR libstdc++/112089
* include/std/shared_mutex (shared_lock::unlock): Change errc
constant to operation_not_permitted.
* testsuite/30_threads/shared_lock/locking/112089.cc: New test.

libstdc++: Add dg-timeout-factor to <chrono> IO tests

This avoids failures due to compilation timeouts when testing with a low
tool_timeout value.

libstdc++-v3/ChangeLog:

* testsuite/20_util/duration/io.cc: Double timeout using
dg-timeout-factor.
* testsuite/std/time/day/io.cc: Likewise.
* testsuite/std/time/format.cc: Likewise.
* testsuite/std/time/hh_mm_ss/io.cc: Likewise.
* testsuite/std/time/month/io.cc: Likewise.
* testsuite/std/time/month_day/io.cc: Likewise.
* testsuite/std/time/month_day_last/io.cc: Likewise.
* testsuite/std/time/month_weekday/io.cc: Likewise.
* testsuite/std/time/month_weekday_last/io.cc: Likewise.
* testsuite/std/time/weekday/io.cc: Likewise.
* testsuite/std/time/weekday_indexed/io.cc: Likewise.
* testsuite/std/time/weekday_last/io.cc: Likewise.
* testsuite/std/time/year/io.cc: Likewise.
* testsuite/std/time/year_month/io.cc: Likewise.
* testsuite/std/time/year_month_day/io.cc: Likewise.
* testsuite/std/time/year_month_day_last/io.cc: Likewise.
* testsuite/std/time/year_month_weekday/io.cc: Likewise.
* testsuite/std/time/year_month_weekday_last/io.cc: Likewise.
* testsuite/std/time/zoned_time/io.cc: Likewise.

Add __attribute__((null_terminated_string_arg(PARAM_IDX)))

This patch adds a new function attribute to GCC for marking that an
argument is expected to be a null-terminated string.

For example, consider:

  void test_a (const char *p)
    __attribute__((null_terminated_string_arg (1)));

which would indicate to humans and compilers that argument 1 of "test_a"
is expected to be a null-terminated string, with the idea:

- we should complain if it's not valid to read from *p up to the first
  '\0' character in the buffer

- we should complain if *p is not terminated, or if it's uninitialized
  before the first '\0' character

This is independent of the nonnull-ness of the pointer: if you also want
to express that the argument must be non-null, we already have
__attribute__((nonnull (N))), so the user can write e.g.:

  void test_b (const char *p)
    __attribute__((null_terminated_string_arg (1)))
    __attribute__((nonnull (1)));

which can also be spelled as:

  void test_b (const char *p)
     __attribute__((null_terminated_string_arg (1),
                    nonnull (1)));

For a function similar to strncpy, we can use the "access" attribute to
express a maximum size of the read:

  void test_c (const char *p, size_t sz)
     __attribute__((null_terminated_string_arg (1),
                    nonnull (1),
                    access (read_only, 1, 2)));
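
A minimal sketch of how a caller might adopt the attribute while still building with compilers that lack it. The NUL_TERMINATED macro and the count_char helper are hypothetical names introduced here for illustration; only the attribute name itself comes from this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Compilers without __has_attribute get a fallback that answers "no".  */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* Hypothetical wrapper: expands to nothing when the attribute is unknown.  */
#if __has_attribute (null_terminated_string_arg)
# define NUL_TERMINATED(idx) __attribute__ ((null_terminated_string_arg (idx)))
#else
# define NUL_TERMINATED(idx)
#endif

/* Argument 1 is expected to be a null-terminated string.  */
static size_t count_char (const char *p, char c) NUL_TERMINATED (1);

/* Hypothetical helper: count the occurrences of C in the string P.  */
static size_t
count_char (const char *p, char c)
{
  assert (p != NULL);
  size_t n = 0;
  for (; *p != '\0'; p++)
    if (*p == c)
      n++;
  return n;
}
```

With the attribute in place, the analyzer can check call sites of count_char against the expectations listed above; without it, the code compiles unchanged.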

The patch implements:
(a) C/C++ frontends: recognition of this attribute
(b) analyzer: usage of this attribute

gcc/analyzer/ChangeLog:
* region-model.cc
(region_model::check_external_function_for_access_attr): Split
out, replacing with...
(region_model::check_function_attr_access): ...this new function
and...
(region_model::check_function_attrs): ...this new function.
(region_model::check_one_function_attr_null_terminated_string_arg):
New.
(region_model::check_function_attr_null_terminated_string_arg):
New.
(region_model::handle_unrecognized_call): Update for renaming of
check_external_function_for_access_attr to check_function_attrs.
(region_model::check_for_null_terminated_string_arg): Add return
value to one overload.  Make both overloads const.
* region-model.h: Include "stringpool.h" and "attribs.h".
(region_model::check_for_null_terminated_string_arg): Add return
value to one overload.  Make both overloads const.
(region_model::check_external_function_for_access_attr): Delete
decl.
(region_model::check_function_attr_access): New decl.
(region_model::check_function_attr_null_terminated_string_arg):
New decl.
(region_model::check_one_function_attr_null_terminated_string_arg):
New decl.
(region_model::check_function_attrs): New decl.

gcc/c-family/ChangeLog:
* c-attribs.cc (c_common_attribute_table): Add
"null_terminated_string_arg".
(handle_null_terminated_string_arg_attribute): New.

gcc/ChangeLog:
* doc/extend.texi (Common Function Attributes): Add
null_terminated_string_arg.

gcc/testsuite/ChangeLog:
* c-c++-common/analyzer/attr-null_terminated_string_arg-access-read_write.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-access-without-size.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-multiple.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-nonnull-2.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-nonnull-sized.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-nonnull.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-nullable-sized.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-nullable.c:
New test.
* c-c++-common/attr-null_terminated_string_arg.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

testsuite, aarch64: Normalise options to aarch64.exp.

When the compiler is configured --with-cpu= and that is different from
the baselines assumed, we see excess test fails (primarily in body code
scans which are necessarily sensitive to costs). To stabilize the
testsuite against such changes, use aarch64-with-arch-dg-options ()
to provide suitable consistent defaults.

e.g. for --with-cpu=xgene1 we see over 100 excess fails which are
removed by this change.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/aarch64.exp: Use aarch64-with-arch-dg-options
to normalise the options to the tests in aarch64.exp.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

testsuite, Darwin: Adjust target test for modern OS.

The same conditions on use of DYLD_LIBRARY_PATH apply to OS versions
11 to 14, so make the test general.

gcc/testsuite/ChangeLog:

* lib/target-libpath.exp: Skip DYLD_LIBRARY_PATH for all
current OS versions > 10.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

match: Simplify `a != C1 ? abs(a) : C2` when C2 == abs(C1) [PR111957]

This adds a match pattern for `a != C1 ? abs(a) : C2` which gets simplified
to `abs(a)`. If C1 was originally *_MIN, then change it over to use absu instead
of abs.
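
A small check of the identity behind the pattern, taking C1 = -5 and C2 = 5 as arbitrary example constants (before, after and check_range are hypothetical names). The sketch deliberately stays away from C1 = INT_MIN, where abs overflows and the pattern uses absu instead.

```c
#include <assert.h>
#include <stdlib.h>

/* The shape the pattern matches, with C1 = -5 and C2 = 5 == abs (C1).  */
static int
before (int a)
{
  return a != -5 ? abs (a) : 5;
}

/* What the pattern simplifies it to.  */
static int
after (int a)
{
  return abs (a);
}

/* Compare both forms over [LO, HI]; the range must not include INT_MIN.  */
static void
check_range (int lo, int hi)
{
  for (int a = lo; a <= hi; a++)
    assert (before (a) == after (a));
}
```

The two forms agree because the only input that takes the C2 arm is a == C1, and there abs(a) already equals C2.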

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/111957

gcc/ChangeLog:

* match.pd (`a != C1 ? abs(a) : C2`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-40.c: New test.

Add effective target to OpenMP tests

This adds an effective target DejaGnu directive to prevent these testcases from
failing on GCC configurations that do not support OpenMP.
This fixes 8d2130a4e5c.

gcc/testsuite/ChangeLog:

* gfortran.dg/c_ptr_tests_20.f90: Add "fopenmp" effective target.
* gfortran.dg/c_ptr_tests_21.f90: Add "fopenmp" effective target.

[range-op] Remove unused variable in fold_range.

gcc/ChangeLog:

* range-op-float.cc (range_operator::fold_range): Delete unused
variable.

[range-ops] Remove unneeded parameters from rv_fold.

Now that the floating point version of rv_fold calculates its result
in an frange, we can remove the superfluous LB, UB, and MAYBE_NAN
arguments.

gcc/ChangeLog:

* range-op-float.cc (range_operator::fold_range): Remove
superfluous code.
(range_operator::rv_fold): Remove unneeded arguments.
(operator_plus::rv_fold): Same.
(operator_minus::rv_fold): Same.
(operator_mult::rv_fold): Same.
(operator_div::rv_fold): Same.
* range-op-mixed.h: Remove lb, ub, and maybe_nan arguments from
rv_fold methods.
* range-op.h: Same.

[range-ops] Add frange& argument to rv_fold.

The floating point version of rv_fold returns its result in 3 pieces:
the lower bound, the upper bound, and a maybe_nan bit.  It is cleaner
to return everything in an frange, thus bringing the floating point
version of rv_fold in line with the integer version.

This first patch adds an frange argument, while keeping the current
functionality, and asserting that we get the same results.  In a
follow-up patch I will nuke the now useless 3 arguments.  Splitting
this into two patches makes it easier to bisect any problems if any
should arise.

gcc/ChangeLog:

* range-op-float.cc (range_operator::fold_range): Pass frange
argument to rv_fold.
(range_operator::rv_fold): Add frange argument.
(operator_plus::rv_fold): Same.
(operator_minus::rv_fold): Same.
(operator_mult::rv_fold): Same.
(operator_div::rv_fold): Same.
* range-op-mixed.h: Add frange argument to rv_fold methods.
* range-op.h: Same.

RISC-V: Pass abi to g++ rvv testsuite

On rv32gcv testcases like g++.target/riscv/rvv/base/bug-22.C fail with:
FAIL: g++.target/riscv/rvv/base/bug-22.C (test for excess errors)
Excess errors:
cc1plus: error: ABI requires '-march=rv32'

This patch adds the -mabi argument to g++ rvv tests.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/rvv.exp: Add -mabi argument to CFLAGS.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

libatomic: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951]

Similar to commit fb5d27be272b71fb9026224535fc73f125ce3be7
"libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]",
this is commit 5ff06d762a88077aff0fb637c931c64e6f47f93d
"libatomic/test: Fix compilation for build sysroot" done differently,
avoiding build-tree testing use of any random gunk that may appear in
build-time 'CC'.

PR testsuite/109951
libatomic/
* configure.ac: 'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'.
* Makefile.in: Regenerate.
* configure: Likewise.
* testsuite/Makefile.in: Likewise.
* testsuite/lib/libatomic.exp (libatomic_init): If
'--with-build-sysroot=[...]' was specified, use it for build-tree
testing.
* testsuite/libatomic-site-extra.exp.in (GCC_UNDER_TEST): Don't
set.
(SYSROOT_CFLAGS_FOR_TARGET): Set.

libffi: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951]

Similar to commit fb5d27be272b71fb9026224535fc73f125ce3be7
"libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]",
this is commit a0b48358cb1e70e161a87ec5deb7a4b25defba6b
"libffi/test: Fix compilation for build sysroot" done differently,
avoiding build-tree testing use of any random gunk that may appear in
build-time 'CC', 'CXX'.

PR testsuite/109951
libffi/
* configure.ac: 'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'.
<local.exp>: Don't set 'CC_FOR_TARGET', 'CXX_FOR_TARGET', instead
set 'SYSROOT_CFLAGS_FOR_TARGET'.
* Makefile.in: Regenerate.
* configure: Likewise.
* include/Makefile.in: Likewise.
* man/Makefile.in: Likewise.
* testsuite/Makefile.in: Likewise.
* testsuite/lib/libffi.exp (libffi_target_compile): If
'--with-build-sysroot=[...]' was specified, use it for build-tree
testing.

testsuite: Allow general skips/requires in PCH tests

dg-pch.exp handled dg-require-effective-target pch_supported_debug
as a special case, by grepping the source code.  This patch tries
to generalise it to other dg-require-effective-targets, and to
dg-skip-if.

There also seemed to be some errors in check-flags.  It used:

    lappend $args [list <elt>]

which treats the contents of args as a variable name.  I think
it was supposed to be "lappend args" instead.  From the later
code, the element was supposed to be <elt> itself, rather than
a singleton list containing <elt>.

We can also save some time by doing the common early-exit first.

Doing this removes the need to specify the dg-require-effective-target
in both files.  Tested by faking unsupported debug and checking that
the tests were still correctly skipped.

gcc/testsuite/
* lib/target-supports-dg.exp (check-flags): Move default argument
handling further up.  Fix a couple of issues in the lappends.
Avoid frobbing the compiler flags if the return value is already
known to be 1.
* lib/dg-pch.exp (dg-flags-pch): Process the dg-skip-if and
dg-require-effective-target directives to see whether the
assembly test should be skipped.
* gcc.dg/pch/valid-1.c: Remove dg-require-effective-target.
* gcc.dg/pch/valid-1b.c: Likewise.

arm: Use deltas for Arm switch tables

For normal optimization for the Arm state in gcc we get an uncompressed
table of jump targets. This sits in the middle of the text segment and is
far larger than necessary, especially at -Os.
This patch compresses the table to use deltas in a similar manner to
Thumb code generation.
Similar code is also used for -fpic where we currently generate a jump
to a jump. In this format the jumps are too dense for the hardware branch
predictor to handle accurately, so execution is likely to be very expensive.

Changes to switch statements for arm include a new function to handle the
assembly generation for different machine modes. This allows for more
optimisation to be performed in aout.h where arm has switched from using
ASM_OUTPUT_ADDR_VEC_ELT to using ASM_OUTPUT_ADDR_DIFF_ELT.
In ASM_OUTPUT_ADDR_DIFF_ELT new assembly generation options have been
added to utilise the different machine modes. Additional changes
made to the casesi expand and insn, CASE_VECTOR_PC_RELATIVE,
CASE_VECTOR_SHORTEN_MODE and LABEL_ALIGN_AFTER_BARRIER are all
to accommodate this new approach to switch statement generation.
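
To illustrate why deltas shrink the table, here is a simplified sketch. It does not reproduce the actual CASE_VECTOR_SHORTEN_MODE thresholds or any scaling the backend applies; delta_entry_size is a hypothetical helper that picks the narrowest unsigned entry able to hold every offset from a base label.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the bytes per table entry needed to store each delta
   (target - base) as an unsigned value.  Illustrative only.  */
static size_t
delta_entry_size (const uint32_t *targets, size_t n, uint32_t base)
{
  uint32_t max_delta = 0;
  for (size_t i = 0; i < n; i++)
    {
      assert (targets[i] >= base);
      uint32_t delta = targets[i] - base;
      if (delta > max_delta)
        max_delta = delta;
    }
  if (max_delta <= UINT8_MAX)
    return 1;
  if (max_delta <= UINT16_MAX)
    return 2;
  return 4;
}
```

An uncompressed table of absolute targets always costs 4 bytes per case on a 32-bit target, whereas a switch whose cases all lie close to the base label can get by with 1 or 2 bytes per entry.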

New tests have been added and no regressions on arm-none-eabi.

gcc/ChangeLog:

* config/arm/aout.h (ASM_OUTPUT_ADDR_DIFF_ELT): Add table output
for different machine modes for arm.
* config/arm/arm-protos.h (arm_output_casesi): New prototype.
* config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Make arm use
ASM_OUTPUT_ADDR_DIFF_ELT.
(CASE_VECTOR_SHORTEN_MODE): Change table size calculation for
TARGET_ARM.
(LABEL_ALIGN_AFTER_BARRIER): Change to accommodate .p2align 2
for TARGET_ARM.
* config/arm/arm.cc (arm_output_casesi): New function.
* config/arm/arm.md (arm_casesi_internal): Change casesi expand
and insn for arm to use new function arm_output_casesi.

gcc/testsuite/ChangeLog:

* gcc.target/arm/arm-switchstatement.c: New test.

Darwin: Make metadata symbol labels linker-visible for GNU objc.

Now that we have shifted to using the same relocation mechanism as clang for
objective-c typeinfo, the static linker needs to have a linker-visible
symbol for metadata names (this is only needed for GNU objective C; for
NeXT the names are in separate sections).

gcc/ChangeLog:

* config/darwin.h
(darwin_label_is_anonymous_local_objc_name): Make metadata names
linker-visible for GNU objective C.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

[RA]: Modify cost calculation for dealing with equivalences

RISCV target developers reported that pseudos with equivalence used in
a loop can be spilled.  Simple changes of heuristics of cost
calculation of pseudos with equivalence or even ignoring equivalences
resulted in numerous testsuite failures on different targets or worse
spec2017 performance.  This patch implements more sophisticated cost
calculations of pseudos with equivalences.  The patch does not change
RA behaviour for targets still using the old reload pass instead of
LRA.  The patch solves the reported problem and improves x86-64
specint2017 a bit (specfp2017 performance stays the same).  The patch
takes into account how the equivalence will be used: will it be
integrated into the user insns or require an input reload insn.  It
requires additional pass over insns.  To compensate RA slow down, the
patch removes a pass over insns in the reload pass used by IRA before.
This also decouples IRA from reload more and will help to remove the
reload pass in the future if it ever happens.

gcc/ChangeLog:

* dwarf2out.cc (reg_loc_descriptor): Use lra_eliminate_regs when
LRA is used.
* ira-costs.cc: Include regset.h.
(equiv_can_be_consumed_p, get_equiv_regno, calculate_equiv_gains):
New functions.
(find_costs_and_classes): Call calculate_equiv_gains and redefine
mem_cost of pseudos with equivs when LRA is used.
* var-tracking.cc: Include ira.h and lra.h.
(vt_initialize): Use lra_eliminate_regs when LRA is used.

Fortran: Fix incompatible types between INTEGER(8) and TYPE(c_ptr)

In the context of an OpenMP declare variant directive, arguments of type C_PTR
are sometimes recognised as C_PTR in the base function and as INTEGER(8) in the
variant - or the other way around, depending on the parsing order.
This patch prevents such a situation from turning into a compile error.

2023-10-20 Paul-Antoine Arras <pa@codesourcery.com>
Tobias Burnus <tobias@codesourcery.com>

gcc/fortran/ChangeLog:

* interface.cc (gfc_compare_types): Return true if one type is C_PTR
and the other is a compatible INTEGER(8).
* misc.cc (gfc_typename): Handle the case where an INTEGER(8) actually
holds a TYPE(C_PTR).

gcc/testsuite/ChangeLog:

* gfortran.dg/c_ptr_tests_20.f90: New test, checking that INTEGER(8)
and TYPE(C_PTR) are recognised as compatible.
* gfortran.dg/c_ptr_tests_21.f90: New test, exercising the error
detection for C_FUNPTR.

DOC: Update COND_LEN document

gcc/ChangeLog:

* doc/md.texi: Adapt COND_LEN pseudo code.

PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.

This patch is my proposed solution to PR rtl-optimization/91865.
Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
to a single ZERO_EXTEND, but as shown in this PR it is possible for
combine's make_compound_operation to unintentionally generate a
non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
matched by the backend.
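
The canonicalization relies on nested zero extensions being equivalent to a single one. A quick C check of that identity, using 8, 16 and 32 bit unsigned types as stand-ins for the QI, HI and PSI modes (msp430's PSI is a 20-bit mode, so the widths here are only an approximation, and the function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* QI -> HI -> wider: two nested zero extensions.  */
static uint32_t
zext_twice (uint8_t x)
{
  return (uint32_t) (uint16_t) x;
}

/* QI -> wider: the canonical single zero extension.  */
static uint32_t
zext_once (uint8_t x)
{
  return (uint32_t) x;
}

/* Exhaustively compare both forms over every 8-bit input.  */
static void
check_all_bytes (void)
{
  for (unsigned v = 0; v <= UINT8_MAX; v++)
    assert (zext_twice ((uint8_t) v) == zext_once ((uint8_t) v));
}
```

Since both forms always agree, emitting the single-extension form loses nothing and gives the backend a pattern it is more likely to match.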

For the new test case:

const int table[2] = {1, 2};
int foo (char i) { return table[i]; }

compiling with -O2 -mlarge on msp430 we currently see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Failed to match this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

which results in the following code:

foo: AND     #0xff, R12
        RLAM.A #4, R12 { RRAM.A #4, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

With this patch, we now see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Successfully matched this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
allowing combination of insns 2 and 7
original costs 4 + 8 = 12
replacement cost 8

foo: MOV.B   R12, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

2023-10-26  Roger Sayle  <roger@nextmovesoftware.com>
    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
PR rtl-optimization/91865
* combine.cc (make_compound_operation): Avoid creating a
ZERO_EXTEND of a ZERO_EXTEND.

gcc/testsuite/ChangeLog
PR rtl-optimization/91865
* gcc.target/msp430/pr91865.c: New test case.

Pass type of comparison operands instead of comparison result to truth_type_for in build_vec_cmp.

gcc/c/ChangeLog:

* c-typeck.cc (build_vec_cmp): Pass type of arg0 to
truth_type_for.

gcc/cp/ChangeLog:

* typeck.cc (build_vec_cmp): Pass type of arg0 to
truth_type_for.

LoongArch: Enable vcond_mask_mn expanders for SF/DF modes.

If the vcond_mask patterns don't support fp modes, the vector
FP comparison instructions will not be generated.

gcc/ChangeLog:

* config/loongarch/lasx.md (vcond_mask_<ILASX:mode><ILASX:mode>): Change to
(vcond_mask_<mode><mode256_i>): this.
* config/loongarch/lsx.md (vcond_mask_<ILSX:mode><ILSX:mode>): Change to
(vcond_mask_<mode><mode_i>): this.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-vcond-1.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-vcond-2.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-vcond-1.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-vcond-2.c: New test.

testsuite: Fix _BitInt in gcc.misc-tests/godump-1.c

Currently _BitInt is only supported on x86_64 which means that for other
targets all tests fail with e.g.

gcc.misc-tests/godump-1.c:237:1: sorry, unimplemented: '_BitInt(32)' is not supported on this target
237 | _BitInt(32) b32_v;
| ^~~~~~~

Instead of requiring _BitInt support for godump-1.c, move _BitInt tests
into godump-2.c such that all other tests in godump-1.c are still
executed in case of missing _BitInt support.

gcc/testsuite/ChangeLog:

* gcc.misc-tests/godump-1.c: Move _BitInt tests into godump-2.c.
* gcc.misc-tests/godump-2.c: New test.

More '#ifdef ASM_OUTPUT_DEF' -> 'if (TARGET_SUPPORTS_ALIASES)' etc.

Per commit a8b522b483ebb8c972ecfde8779a7a6ec16aecd6 (Subversion r251048)
"Introduce TARGET_SUPPORTS_ALIASES", there is the idea that a back end may or
may not provide symbol aliasing support ('TARGET_SUPPORTS_ALIASES') independent
of '#ifdef ASM_OUTPUT_DEF', and in particular, depending not just on static but
instead on dynamic (run-time) configuration. There did remain a few instances
where we currently still assume that from '#ifdef ASM_OUTPUT_DEF' follows
'TARGET_SUPPORTS_ALIASES'. Change these to 'if (TARGET_SUPPORTS_ALIASES)',
similarly, or 'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);'.
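
The difference between the two kinds of check can be sketched as follows. All names here (DEMO_ASM_OUTPUT_DEF, demo_target_supports_aliases and the three functions) are hypothetical stand-ins, not GCC's real macros or variables.

```c
#include <assert.h>
#include <stdbool.h>

/* Pretend the target headers provide an alias-emitting macro.  */
#define DEMO_ASM_OUTPUT_DEF 1

/* Pretend run-time target state, e.g. derived from command-line options.  */
static bool demo_target_supports_aliases = true;

/* Static check: decided once, when the compiler itself is built.  */
static bool
can_alias_static (void)
{
#ifdef DEMO_ASM_OUTPUT_DEF
  return true;
#else
  return false;
#endif
}

/* Dynamic check: can differ from run to run of the same binary.  */
static bool
can_alias_dynamic (void)
{
  return demo_target_supports_aliases;
}

/* Code that is only reachable when aliases are supported can assert
   that, in the spirit of 'gcc_checking_assert (TARGET_SUPPORTS_ALIASES)'.  */
static int
emit_alias (void)
{
  assert (can_alias_dynamic ());
  return 1;
}
```

If demo_target_supports_aliases is cleared at run time, can_alias_static still answers true, which is exactly the mismatch that assuming '#ifdef ASM_OUTPUT_DEF' implies alias support would hide.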

gcc/
* ipa-icf.cc (sem_item::target_supports_symbol_aliases_p):
'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);' before
'return true;'.
* ipa-visibility.cc (function_and_variable_visibility): Change
'#ifdef ASM_OUTPUT_DEF' to 'if (TARGET_SUPPORTS_ALIASES)'.
* varasm.cc (output_constant_pool_contents)
[#ifdef ASM_OUTPUT_DEF]:
'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);'.
(do_assemble_alias) [#ifdef ASM_OUTPUT_DEF]:
'if (!TARGET_SUPPORTS_ALIASES)',
'gcc_checking_assert (seen_error ());'.
(assemble_alias): Change '#if !defined (ASM_OUTPUT_DEF)' to
'if (!TARGET_SUPPORTS_ALIASES)'.
(default_asm_output_anchor):
'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);'.

set hardcmp eh probs

Set execution count of EH blocks, and probability of EH edges.

for gcc/ChangeLog

PR tree-optimization/111520
* gimple-harden-conditionals.cc
(pass_harden_compares::execute): Set EH edge probability and
EH block execution count.

for gcc/testsuite/ChangeLog

PR tree-optimization/111520
* g++.dg/torture/harden-comp-pr111520.cc: New.

rename make_eh_edges to make_eh_edge

Since make_eh_edges creates at most one edge, rename it to
make_eh_edge.

for gcc/ChangeLog

* tree-eh.h (make_eh_edges): Rename to...
(make_eh_edge): ... this.
* tree-eh.cc: Likewise. Adjust all callers...
* gimple-harden-conditionals.cc: ... here, ...
* gimple-harden-control-flow.cc: ... here, ...
* tree-cfg.cc: ... here, ...
* tree-inline.cc: ... and here.

Daily bump.

Darwin: Handle the fPIE option specially.

For Darwin, PIE requires PIC codegen, but otherwise is only a link-time
change. For almost all Darwin versions, we do not report __PIE__; the only
exception is 32-bit x86, from Darwin12 to Darwin17 (32-bit is no longer
supported after Darwin17).

gcc/ChangeLog:

* config/darwin.cc (darwin_override_options): Handle fPIE.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

config, aarch64: Use a more compatible sed invocation.

Currently, the sed commands used to parse --with-{cpu,tune,arch} rely on
a GNU-specific extension (automatically recognising extended regexes).

This fails on Darwin, which defaults to POSIX behaviour.
However, '-E' is accepted to indicate an extended RE. Strictly, this
is still not really sufficient, since we should only require a POSIX
sed.

gcc/ChangeLog:

* config.gcc: Pass -E to sed to indicate that we are using
extended REs.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

tree: update address_space comment

Mention front-end uses of the address_space bit-field, and remove the
inaccurate "only".

gcc/ChangeLog:

* tree-core.h (struct tree_base): Update address_space comment.

AArch64: Improve immediate generation

Further improve immediate generation by adding support for 2-instruction
MOV/EOR bitmask immediates. This reduces the number of 3/4-instruction
immediates in SPECCPU2017 by ~2%.
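
The identity behind a MOV/EOR pair can be sketched as follows. This is not the code in aarch64_internal_mov_immediate: is_bitmask_imm and mov_eor_pair_ok are hypothetical helpers, and the check assumes the usual definition of an AArch64 logical (bitmask) immediate, namely an element of 2, 4, 8, 16, 32 or 64 bits, replicated across the register, whose set bits form one contiguous run after some rotation. How GCC searches for a suitable pair, and when it prefers one, is not shown.

```cpp
#include <bitset>
#include <cstdint>

// True if V is encodable as a 64-bit logical (bitmask) immediate under
// the definition assumed above.
bool is_bitmask_imm(std::uint64_t v)
{
  if (v == 0 || v == ~std::uint64_t{0})
    return false;
  for (unsigned size = 2; size <= 64; size *= 2)
    {
      const std::uint64_t mask
        = size == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << size) - 1;
      const std::uint64_t elt = v & mask;
      // The element must repeat across the whole 64-bit value.
      bool replicated = true;
      for (unsigned shift = size; shift < 64; shift += size)
        if (((v >> shift) & mask) != elt)
          replicated = false;
      if (!replicated)
        continue;
      // Rotate the element by one bit.  A single circular run of ones
      // has exactly two places where neighbouring bits differ.
      const std::uint64_t rot = ((elt >> 1) | (elt << (size - 1))) & mask;
      if (std::bitset<64>(elt ^ rot).count() == 2)
        return true;
    }
  return false;
}

// True if VALUE equals A ^ B with both A and B bitmask immediates, i.e.
// it could be formed by a MOV of A followed by an EOR with B.
bool mov_eor_pair_ok(std::uint64_t value, std::uint64_t a, std::uint64_t b)
{
  return is_bitmask_imm(a) && is_bitmask_imm(b) && (a ^ b) == value;
}
```

For instance, 0xff00ff00ff0000f0 is not a bitmask immediate itself, but it is the exclusive OR of 0xff00ff00ff00ff00 and 0xfff0, which both are. Whether GCC would choose that particular sequence for this value is not claimed here.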

Reviewed-by: Richard Earnshaw <Richard.Earnshaw@arm.com>
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_internal_mov_immediate):
Add support for immediates using MOV/EOR bitmask.

gcc/testsuite:
* gcc.target/aarch64/imm_choice_comparison.c: Change tests.
* gcc.target/aarch64/moveor_imm.c: Add new test.
* gcc.target/aarch64/pr106583.c: Change tests.

c++: improve comment

It's incorrect to say that the address of an OFFSET_REF is always a
pointer-to-member; if it represents an overload set with both static and
non-static member functions that ends up resolving to a static one, the
address is a normal pointer. And let's go ahead and mention explicit object
member functions even though the patch hasn't landed yet.

gcc/cp/ChangeLog:

* cp-tree.def: Improve OFFSET_REF comment.
* cp-gimplify.cc (cp_fold_immediate): Add to comment.

i386: Narrow test instructions with immediate operands [PR111698]

Narrow test instructions with immediate operand that test memory location
for zero. E.g. testl $0x00aa0000, mem can be converted to testb $0xaa, mem+2.
Reject targets where reading (possibly unaligned) part of memory location
after a large write to the same address causes store-to-load forwarding stall.
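
The narrowing itself can be sketched in plain C++. This is an illustration of the arithmetic only, not the peephole pattern: NarrowedTest and the three functions are hypothetical, and selecting a byte by shifting models the 'mem+2' addressing of a little-endian target such as x86.

```cpp
#include <cstdint>

// Result of trying to narrow a 32-bit test mask to a single byte.
struct NarrowedTest
{
  bool ok;                 // true if all mask bits live in one byte
  unsigned byte_offset;    // little-endian offset of that byte
  std::uint8_t byte_mask;  // mask to apply to that byte
};

NarrowedTest narrow_test_mask(std::uint32_t mask)
{
  for (unsigned off = 0; off < 4; ++off)
    {
      const std::uint32_t lane = std::uint32_t{0xff} << (8 * off);
      if (mask != 0 && (mask & ~lane) == 0)
        return {true, off, static_cast<std::uint8_t>(mask >> (8 * off))};
    }
  return {false, 0, 0};
}

// Zero test on the full 32-bit word (the "testl" form).
bool wide_test_is_zero(std::uint32_t word, std::uint32_t mask)
{
  return (word & mask) == 0;
}

// Zero test on the selected byte only (the "testb" form).
bool narrow_test_is_zero(std::uint32_t word, const NarrowedTest &n)
{
  const std::uint8_t byte = static_cast<std::uint8_t>(word >> (8 * n.byte_offset));
  return (byte & n.byte_mask) == 0;
}
```

For the mask 0x00aa0000 this yields byte offset 2 and byte mask 0xaa, matching the example above, and both tests agree for any word.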

PR target/111698

gcc/ChangeLog:

* config/i386/x86-tune.def (X86_TUNE_PARTIAL_MEMORY_READ_STALL):
New tune.
* config/i386/i386.h (TARGET_PARTIAL_MEMORY_READ_STALL): New macro.
* config/i386/i386.md: New peephole pattern to narrow test
instructions with immediate operands that test memory locations
for zero.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr111698.c: New test.

Faster irange union for appending ranges.

A common pattern is to append a range to an existing range via union.
This optimizes that process.
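
The idea of such a fast path can be sketched as follows. This is not irange::union_append: Range and union_append_sketch are hypothetical, and the sketch assumes the subranges are kept sorted, disjoint and non-adjacent, so that a range starting at or after the last subrange only ever touches the tail.

```cpp
#include <algorithm>
#include <vector>

struct Range
{
  long lo, hi;  // inclusive bounds
};

// Try to union R into RANGES by touching only the last subrange.
// Returns false if R starts before the last subrange, in which case a
// general union (not shown) would be needed.
bool union_append_sketch(std::vector<Range> &ranges, Range r)
{
  if (ranges.empty())
    {
      ranges.push_back(r);
      return true;
    }
  Range &last = ranges.back();
  if (r.lo < last.lo)
    return false;
  // Overlapping or directly adjacent: extend the last subrange.
  if (r.lo <= last.hi || r.lo - 1 == last.hi)
    last.hi = std::max(last.hi, r.hi);
  else
    ranges.push_back(r);
  return true;
}
```

The fast path is constant time, whereas a general union has to walk both operands.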

* value-range.cc (irange::union_append): New.
(irange::union_): Call union_append when appropriate.
* value-range.h (irange::union_append): New prototype.

LoongArch: Fix vfrint-related comments in lsxintrin.h and lasxintrin.h

The comments on the vfrint-related intrinsic functions do not match the
return value types in their definitions. This patch fixes these comments.

gcc/ChangeLog:

* config/loongarch/lasxintrin.h (__lasx_xvftintrnel_l_s): Fix comments.
(__lasx_xvfrintrne_s): Ditto.
(__lasx_xvfrintrne_d): Ditto.
(__lasx_xvfrintrz_s): Ditto.
(__lasx_xvfrintrz_d): Ditto.
(__lasx_xvfrintrp_s): Ditto.
(__lasx_xvfrintrp_d): Ditto.
(__lasx_xvfrintrm_s): Ditto.
(__lasx_xvfrintrm_d): Ditto.
* config/loongarch/lsxintrin.h (__lsx_vftintrneh_l_s): Ditto.
(__lsx_vfrintrne_s): Ditto.
(__lsx_vfrintrne_d): Ditto.
(__lsx_vfrintrz_s): Ditto.
(__lsx_vfrintrz_d): Ditto.
(__lsx_vfrintrp_s): Ditto.
(__lsx_vfrintrp_d): Ditto.
(__lsx_vfrintrm_s): Ditto.
(__lsx_vfrintrm_d): Ditto.

LoongArch: Implement __builtin_thread_pointer for TLS.

gcc/ChangeLog:

* config/loongarch/loongarch.md (get_thread_pointer<mode>): Add the
instruction template corresponding to the __builtin_thread_pointer
function.
* doc/extend.texi: Add the __builtin_thread_pointer function support
description to the documentation.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/builtin_thread_pointer.c: New test.

c++: add fixed testcase [PR99804]

We accept the non-dependent call f(e) here ever since the
NON_DEPENDENT_EXPR removal patch r14-4793-gdad311874ac3b3.
I haven't looked closely into why but I suspect wrapping 'e'
in a NON_DEPENDENT_EXPR was causing the argument conversion
to misbehave.

PR c++/99804

gcc/testsuite/ChangeLog:

* g++.dg/template/enum9.C: New test.

jit: dump string literal initializers correctly

gcc/jit/ChangeLog:
* jit-recording.cc (recording::global::write_to_dump): Fix
dump of string literal initializers.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libstdc++: Build libstdc++_libbacktrace.a as PIC [PR111936]

In order for std::stacktrace to be used in a shared library, the
libbacktrace symbols need to be built with -fPIC. Add the libtool
-prefer-pic flag to the commands in src/libbacktrace/Makefile so that
the archive contains PIC objects.

libstdc++-v3/ChangeLog:

PR libstdc++/111936
* src/libbacktrace/Makefile.am: Add -prefer-pic to libtool
compile commands.
* src/libbacktrace/Makefile.in: Regenerate.

PR modula2/111955 introduce isnan support to Builtins.def

This patch introduces isnan, isnanf and isnanl to Builtins.def.
It requires fallback functions isnan, isnanf, isnanl to be implemented in
libgm2/libm2pim/wrapc.cc and gm2-libs-ch/wrapc.c.
Access to the GCC builtin isnan tree is provided by adding
an isnan definition and support functions to gm2-gcc/m2builtins.cc.
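
A fallback of this kind might look like the sketch below. The function names come from the ChangeLog entries; the signatures and bodies are assumptions made for illustration and are not copied from wrapc.c or wrapc.cc.

```cpp
#include <cmath>

// Assumed shape of the fallbacks: return 1 for a NaN, 0 otherwise.
extern "C" int wrapc_isnan(double x)
{
  return std::isnan(x) ? 1 : 0;
}

extern "C" int wrapc_isnanf(float x)
{
  return std::isnan(x) ? 1 : 0;
}

extern "C" int wrapc_isnanl(long double x)
{
  return std::isnan(x) ? 1 : 0;
}
```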

gcc/m2/ChangeLog:

PR modula2/111955
* gm2-gcc/m2builtins.cc (gm2_isnan_node): New tree.
(DoBuiltinIsnan): New function.
(m2builtins_BuiltInIsnan): New function.
(m2builtins_init): Initialize gm2_isnan_node.
(list_of_builtins): Add define for __builtin_isnan.
* gm2-libs-ch/wrapc.c (wrapc_isnan): New function.
(wrapc_isnanf): New function.
(wrapc_isnanl): New function.
* gm2-libs/Builtins.def (isnanf): New procedure function.
(isnan): New procedure function.
(isnanl): New procedure function.
* gm2-libs/Builtins.mod:
* gm2-libs/wrapc.def (isnan): New function.
(isnanf): New function.
(isnanl): New function.

libgm2/ChangeLog:

PR modula2/111955
* libm2pim/wrapc.cc (isnan): Export new function.
(isnanf): Export new function.
(isnanl): Export new function.

gcc/testsuite/ChangeLog:

PR modula2/111955
* gm2/pimlib/run/pass/testnan.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

rtl-ssa: Add new helper functions

This patch adds some RTL-SSA helper functions. They will be
used by the upcoming late-combine pass.

The patch contains the first non-template out-of-line function declared
in movement.h, so it adds a movement.cc. I realise it seems a bit
over-the-top to have a file with just one function, but it might grow
in future. :)

gcc/
* Makefile.in (OBJS): Add rtl-ssa/movement.o.
* rtl-ssa/access-utils.h (accesses_include_nonfixed_hard_registers)
(single_set_info): New functions.
(remove_uses_of_def, accesses_reference_same_resource): Declare.
(insn_clobbers_resources): Likewise.
* rtl-ssa/accesses.cc (rtl_ssa::remove_uses_of_def): New function.
(rtl_ssa::accesses_reference_same_resource): Likewise.
(rtl_ssa::insn_clobbers_resources): Likewise.
* rtl-ssa/movement.h (can_move_insn_p): Declare.
* rtl-ssa/movement.cc: New file.

rtl-ssa: Extend make_uses_available

The first in-tree use of RTL-SSA was fwprop, and one of the goals
was to make the fwprop rewrite preserve the old behaviour as far
as possible.  The switch to RTL-SSA was supposed to be a pure
infrastructure change.  So RTL-SSA has various FIXMEs for things
that were artificially limited to facilitate the old-fwprop vs.
new-fwprop comparison.

One of the things that fwprop wants to do is extend live ranges, and
function_info::make_use_available tried to keep within the cases that
old fwprop could handle.

Since the information is built in extended basic blocks, it's easy
to handle intra-EBB queries directly.  This patch does that, and
removes the associated FIXME.

To get a flavour for how much difference this makes, I tried compiling
the testsuite at -Os for at least one target per supported CPU and OS.
For most targets, only a handful of tests changed, but the vast majority
of changes were positive.  The only target that seemed to benefit
significantly was i686-apple-darwin.

The main point of the patch is to remove the FIXME and to enable
the upcoming post-RA late-combine pass to handle more cases.

gcc/
* rtl-ssa/functions.h (function_info::remains_available_at_insn):
New member function.
* rtl-ssa/accesses.cc (function_info::remains_available_at_insn):
Likewise.
(function_info::make_use_available): Avoid false negatives for
queries within an EBB.

rtl-ssa: Use frequency-weighted insn costs

rtl_ssa::changes_are_worthwhile used the standard approach
of summing up the individual costs of the old and new sequences
to see which one is better overall. But when optimising for
speed and changing instructions in multiple blocks, it seems
better to weight the cost of each instruction by its execution
frequency. (We already do something similar for SLP layouts.)
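
The difference between the two cost models can be sketched as follows. This is not the code in rtl_ssa::changes_are_worthwhile: InsnCost and the three functions are hypothetical, and the strict comparison is an assumption.

```cpp
#include <vector>

struct InsnCost
{
  unsigned cost;     // static cost of the instruction
  double frequency;  // estimated execution frequency of its block
};

// Plain sum of instruction costs, as used when optimizing for size.
double plain_cost(const std::vector<InsnCost> &seq)
{
  double total = 0;
  for (const InsnCost &insn : seq)
    total += insn.cost;
  return total;
}

// Sum of costs scaled by how often each instruction is expected to run.
double weighted_cost(const std::vector<InsnCost> &seq)
{
  double total = 0;
  for (const InsnCost &insn : seq)
    total += insn.cost * insn.frequency;
  return total;
}

bool worthwhile_sketch(const std::vector<InsnCost> &old_seq,
                       const std::vector<InsnCost> &new_seq,
                       bool optimize_for_speed)
{
  if (optimize_for_speed)
    return weighted_cost(new_seq) < weighted_cost(old_seq);
  return plain_cost(new_seq) < plain_cost(old_seq);
}
```

Replacing one cost-4 instruction in a hot block (frequency 10) with a cost-2 instruction there plus a cost-4 instruction in a cold block (frequency 0.1) raises the plain sum from 4 to 6 but lowers the weighted sum from 40 to 20.4, so only the weighted model accepts the change.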

gcc/
* rtl-ssa/changes.cc: Include sreal.h.
(rtl_ssa::changes_are_worthwhile): When optimizing for speed,
scale the cost of each instruction by its execution frequency.

rtl-ssa: Handle call clobbers in more places

In order to save (a lot of) memory, RTL-SSA avoids creating
individual clobber records for every call-clobbered register.
It instead maintains a list & splay tree of calls in an EBB,
grouped by ABI.

This patch takes these call clobbers into account in a couple
more routines. I don't think this will have any effect on
existing users, since it's only necessary for hard registers.

gcc/
* rtl-ssa/access-utils.h (next_call_clobbers): New function.
(is_single_dominating_def, remains_available_on_exit): Replace with...
* rtl-ssa/functions.h (function_info::is_single_dominating_def)
(function_info::remains_available_on_exit): ...these new member
functions.
(function_info::m_clobbered_by_calls): New member variable.
* rtl-ssa/functions.cc (function_info::function_info): Explicitly
initialize m_clobbered_by_calls.
* rtl-ssa/insns.cc (function_info::record_call_clobbers): Update
m_clobbered_by_calls for each call-clobber note.
* rtl-ssa/member-fns.inl (function_info::is_single_dominating_def):
New function. Check for call clobbers.
* rtl-ssa/accesses.cc (function_info::remains_available_on_exit):
Likewise.

rtl-ssa: Calculate dominance frontiers for the exit block

The exit block can have multiple predecessors, for example if the
function calls __builtin_eh_return. We might then need PHI nodes
for values that are live on exit.

RTL-SSA uses the normal dominance frontiers approach for calculating
where PHI nodes are needed. However, dominance.cc only calculates
dominators for normal blocks, not the exit block.
calculate_dominance_frontiers likewise only calculates dominance
frontiers for normal blocks.

This patch fills in the “missing” frontiers manually.

gcc/
* rtl-ssa/internals.h (build_info::exit_block_dominator): New
member variable.
* rtl-ssa/blocks.cc (build_info::build_info): Initialize it.
(bb_walker::bb_walker): Use it, moving the computation of the
dominator to...
(function_info::process_all_blocks): ...here.
(function_info::place_phis): Add dominance frontiers for the
exit block.

rtl-ssa: Handle artificial uses of deleted defs

If an optimisation removes the last real use of a definition,
there can still be artificial uses left. This patch removes
those uses too.

These artificial uses exist because RTL-SSA is only an SSA-like
view of the existing RTL IL, rather than a native SSA representation.
It effectively treats RTL registers like gimple vops, but with the
addition of an RPO view of the register's lifetime(s). Things are
structured to allow most operations to update this RPO view in
amortised sublinear time.

gcc/
* rtl-ssa/functions.h (function_info::process_uses_of_deleted_def):
New member function.
* rtl-ssa/changes.cc (function_info::process_uses_of_deleted_def):
Likewise.
(function_info::change_insns): Use it.

rtl-ssa: Fix ICE when deleting memory clobbers

Sometimes an optimisation can remove a clobber of scratch registers
or scratch memory.  We then need to update the DU chains to reflect
the removed clobber.

For registers this isn't a problem.  Clobbers of registers are just
momentary blips in the register's lifetime.  They act as a barrier for
moving uses later or defs earlier, but otherwise they have no effect on
the semantics of other instructions.  Removing a clobber is therefore a
cheap, local operation.

In contrast, clobbers of memory are modelled as full sets.
This is because (a) a clobber of memory does not invalidate
*all* memory and (b) it's a common idiom to use (clobber (mem ...))
in stack barriers.  But removing a set and redirecting all uses
to a different set is a linear operation.  Doing it for potentially
every optimisation could lead to quadratic behaviour.

This patch therefore refrains from removing sets of memory that appear
to be redundant.  There's an opportunity to clean this up in linear time
at the end of the pass, but as things stand, nothing would benefit from
that.

This is also a very rare event.  Usually we should try to optimise the
insn before the scratch memory has been allocated.

gcc/
* rtl-ssa/changes.cc (function_info::finalize_new_accesses):
If a change describes a set of memory, ensure that that set
is kept, regardless of the insn pattern.

rtl-ssa: Create REG_UNUSED notes after all pending changes

Unlike REG_DEAD notes, REG_UNUSED notes need to be kept free of
false positives by all passes. function_info::change_insns
does this by removing all REG_UNUSED notes, and then using
add_reg_unused_notes to add notes back (or create new ones)
where appropriate.

The problem was that it called add_reg_unused_notes on the fly
while updating each instruction, which meant that the information
for later instructions in the change set wasn't up to date.
This patch does it in a separate loop instead.

gcc/
* rtl-ssa/changes.cc (function_info::apply_changes_to_insn): Remove
call to add_reg_unused_notes and instead...
(function_info::change_insns): ...use a separate loop here.

rtl-ssa: Ensure global registers are live on exit

RTL-SSA mostly relies on DF for block-level register liveness
information, including artificial uses and defs at the beginning
and end of blocks.  But one case was missing.  DF does not add
artificial uses of global registers to the beginning or end
of a block.  Instead it marks them as used within every block
when computing LR and LIVE problems.

For RTL-SSA, global registers behave like memory, which in
turn behaves like gimple vops.  We need to ensure that they
are live on exit so that final definitions do not appear
to be unused.

Also, the previous live-on-exit handling only considered the exit
block itself.  It needs to consider non-local gotos as well, since
they jump directly to some code in a parent function and so do
not have a path to the exit block.

gcc/
* rtl-ssa/blocks.cc (function_info::add_artificial_accesses): Force
global registers to be live on exit.  Handle any block with zero
successors like an exit block.

Handle OpenACC 'self' clause for compute constructs in OpenACC 'kernels' decomposition

... to fix up recent commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
"OpenACC 2.7: Implement self clause for compute constructs" for that case.

gcc/
* omp-oacc-kernels-decompose.cc (omp_oacc_kernels_decompose_1):
Handle 'OMP_CLAUSE_SELF' like 'OMP_CLAUSE_IF'.
* omp-expand.cc (expand_omp_target): Handle 'OMP_CLAUSE_SELF' for
'GF_OMP_TARGET_KIND_OACC_DATA_KERNELS'.
gcc/testsuite/
* c-c++-common/goacc/self-clause-2.c: Verify
'--param=openacc-kernels=decompose'.
* gfortran.dg/goacc/kernels-tree.f95: Adjust.
libgomp/
* oacc-parallel.c (GOACC_data_start): Handle
'GOACC_FLAG_LOCAL_DEVICE'.
(GOACC_parallel_keyed): Simplify accordingly.
* testsuite/libgomp.oacc-fortran/self-1.f90: Adjust.

Extend test suite coverage for OpenACC 'self' clause for compute constructs

... on top of what was provided in recent
commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
"OpenACC 2.7: Implement self clause for compute constructs".

gcc/testsuite/
* c-c++-common/goacc/if-clause-2.c: Enhance.
* c-c++-common/goacc/self-clause-1.c: Likewise.
* c-c++-common/goacc/self-clause-2.c: Likewise.
* gfortran.dg/goacc/if.f95: Likewise.
* gfortran.dg/goacc/kernels-tree.f95: Likewise.
* gfortran.dg/goacc/parallel-tree.f95: Likewise.
* gfortran.dg/goacc/self.f95: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/if-1.c: Enhance.
* testsuite/libgomp.oacc-c-c++-common/self-1.c: Likewise.
* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
* testsuite/libgomp.oacc-c-c++-common/if-self-1.c: New.
* testsuite/libgomp.oacc-fortran/self-1.f90: Likewise.

Consistently order 'OMP_CLAUSE_SELF' right after 'OMP_CLAUSE_IF'

As noted in recent commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
"OpenACC 2.7: Implement self clause for compute constructs", the OpenACC 'self'
clause very much relates to the 'if' clause, and therefore copies a lot of the
latter's handling. Therefore it makes sense to also place this handling in
proximity to that of the 'if' clause, which was done in many but not all
instances.

gcc/
* tree-core.h (omp_clause_code): Move 'OMP_CLAUSE_SELF' after
'OMP_CLAUSE_IF'.
* tree-pretty-print.cc (dump_omp_clause): Adjust.
* tree.cc (omp_clause_num_ops, omp_clause_code_name): Likewise.
* tree.h: Likewise.

RISC-V: Export some functions from riscv-vsetvl to riscv-v[NFC]

Address kito's comments on the AVL propagation patch.

Export the functions that are used not only by the VSETVL pass but also by the AVL propagation pass.

No functionality change.
gcc/ChangeLog:

* config/riscv/riscv-protos.h (has_vl_op): Export from riscv-vsetvl to riscv-v.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
(nonvlmax_avl_type_p): Ditto.
(vlmax_avl_p): Ditto.
(get_sew): Ditto.
(enum vlmul_type): Ditto.
(count_regno_occurrences): Ditto.
* config/riscv/riscv-v.cc (has_vl_op): Ditto.
(get_default_ta): Ditto.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
(nonvlmax_avl_type_p): Ditto.
(vlmax_avl_p): Ditto.
(get_sew): Ditto.
(enum vlmul_type): Ditto.
(get_vlmul): Ditto.
(count_regno_occurrences): Ditto.
* config/riscv/riscv-vsetvl.cc (vlmax_avl_p): Ditto.
(has_vl_op): Ditto.
(get_sew): Ditto.
(get_vlmul): Ditto.
(get_default_ta): Ditto.
(tail_agnostic_p): Ditto.
(count_regno_occurrences): Ditto.
(validate_change_or_fail): Ditto.

Disentangle handling of OpenACC 'host', 'self' pragma tokens

'gcc/c-family/c-pragma.h:pragma_omp_clause' already defines
'PRAGMA_OACC_CLAUSE_SELF', but it has no longer been used for the 'update'
directive's 'self' clause as of 2018
commit 829c6349e96c5bfa8603aaef8858b38e237a2f33 (Subversion r261813)
"Update OpenACC data clause semantics to the 2.5 behavior".  That one instead
mapped the 'self' pragma token to the 'host' one (same semantics).  That means
that we're later not able to tell whether originally we had seen 'self' or
'host', which was OK as long as only the 'update' directive had a 'self'
clause.  However, as of recent commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
"OpenACC 2.7: Implement self clause for compute constructs", also OpenACC
compute constructs may have a 'self' clause -- with different semantics.  That
means, we need to know which OpenACC directive we're parsing clauses for, which
can be done in a simpler way than in that commit, similar to how the OpenMP
'to' clause is handled.

While at that, clarify that (already in OpenACC 2.0a)
"The 'host' clause is a synonym for the 'self' clause." -- not the other way
round.

gcc/c/
* c-parser.cc (c_parser_omp_clause_name): Return
'PRAGMA_OACC_CLAUSE_SELF' for "self".
(c_parser_oacc_data_clause, OACC_UPDATE_CLAUSE_MASK): Adjust.
(c_parser_oacc_all_clauses): Remove 'bool compute_p' formal
parameter, and instead locally determine whether we're called for
an OpenACC compute construct or OpenACC 'update' directive.
(c_parser_oacc_compute): Adjust.
gcc/cp/
* parser.cc (cp_parser_omp_clause_name): Return
'PRAGMA_OACC_CLAUSE_SELF' for "self".
(cp_parser_oacc_data_clause, OACC_UPDATE_CLAUSE_MASK): Adjust.
(cp_parser_oacc_all_clauses): Remove 'bool compute_p' formal
parameter, and instead locally determine whether we're called for
an OpenACC compute construct or OpenACC 'update' directive.
(cp_parser_oacc_compute): Adjust.
gcc/fortran/
* openmp.cc (omp_mask2): Split 'OMP_CLAUSE_HOST_SELF' into
'OMP_CLAUSE_SELF', 'OMP_CLAUSE_HOST'.
(gfc_match_omp_clauses, OACC_UPDATE_CLAUSES): Adjust.

Enable 'c-c++-common/goacc/{if,self}-clause-1.c' for C++

As discovered via recent
commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
"OpenACC 2.7: Implement self clause for compute constructs",
'c-c++-common/goacc/if-clause-1.c', which the new
'c-c++-common/goacc/self-clause-1.c' was copied from, was not enabled for C++.

gcc/testsuite/
* c-c++-common/goacc/if-clause-1.c: Enable for C++.
* c-c++-common/goacc/self-clause-1.c: Likewise.

OpenACC 2.7: Implement self clause for compute constructs

This patch implements the 'self' clause for compute constructs: parallel,
kernels, and serial. This clause conditionally uses the local device
(the host multi-core CPU) as the executing device of the compute region.

The actual implementation of the "local device" device type inside libgomp
(presumably using pthreads) is not yet complete, so the libgomp side is
currently implemented exactly like host-fallback mode. As of now, the clause
therefore essentially behaves like the 'if' clause with the condition inverted.
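
A use of the clause might look like the sketch below. The function name and the data clause are made up for illustration. Without -fopenacc the pragmas are ignored and the loop simply runs on the host, so the function computes the same result either way.

```cpp
// Doubles every element of A.  When compiled with -fopenacc and
// RUN_LOCALLY is true, the 'self' clause requests execution on the
// local device; per the description above this currently takes the
// same path as an 'if' clause whose condition is false.
void scale_by_two(int *a, int n, bool run_locally)
{
#pragma acc parallel self(run_locally) copy(a[0:n])
  {
#pragma acc loop
    for (int i = 0; i < n; ++i)
      a[i] *= 2;
  }
}
```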

gcc/c/ChangeLog:

* c-parser.cc (c_parser_oacc_compute_clause_self): New function.
(c_parser_oacc_all_clauses): Add new 'bool compute_p = false'
parameter, add parsing of self clause when compute_p is true.
(OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF.
(OACC_PARALLEL_CLAUSE_MASK): Likewise.
(OACC_SERIAL_CLAUSE_MASK): Likewise.
(c_parser_oacc_compute): Adjust call to c_parser_oacc_all_clauses to
set compute_p argument to true.
* c-typeck.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_oacc_compute_clause_self): New function.
(cp_parser_oacc_all_clauses): Add new 'bool compute_p = false'
parameter, add parsing of self clause when compute_p is true.
(OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF.
(OACC_PARALLEL_CLAUSE_MASK): Likewise.
(OACC_SERIAL_CLAUSE_MASK): Likewise.
(cp_parser_oacc_compute): Adjust call to c_parser_oacc_all_clauses to
set compute_p argument to true.
* pt.cc (tsubst_omp_clauses): Add OMP_CLAUSE_SELF case.
* semantics.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case, merged
with OMP_CLAUSE_IF case.

gcc/fortran/ChangeLog:

* gfortran.h (typedef struct gfc_omp_clauses): Add self_expr field.
* openmp.cc (enum omp_mask2): Add OMP_CLAUSE_SELF.
(gfc_match_omp_clauses): Add handling for OMP_CLAUSE_SELF.
(OACC_PARALLEL_CLAUSES): Add OMP_CLAUSE_SELF.
(OACC_KERNELS_CLAUSES): Likewise.
(OACC_SERIAL_CLAUSES): Likewise.
(resolve_omp_clauses): Add handling for omp_clauses->self_expr.
* trans-openmp.cc (gfc_trans_omp_clauses): Add handling of
clauses->self_expr and building of OMP_CLAUSE_SELF tree clause.
(gfc_split_omp_clauses): Add handling of self_expr field copy.

gcc/ChangeLog:

* gimplify.cc (gimplify_scan_omp_clauses): Add OMP_CLAUSE_SELF case.
(gimplify_adjust_omp_clauses): Likewise.
* omp-expand.cc (expand_omp_target): Add OMP_CLAUSE_SELF expansion code.
* omp-low.cc (scan_sharing_clauses): Add OMP_CLAUSE_SELF case.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_SELF enum.
* tree-nested.cc (convert_nonlocal_omp_clauses): Add OMP_CLAUSE_SELF
case.
(convert_local_omp_clauses): Likewise.
* tree-pretty-print.cc (dump_omp_clause): Add OMP_CLAUSE_SELF case.
* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_SELF entry.
(omp_clause_code_name): Likewise.
* tree.h (OMP_CLAUSE_SELF_EXPR): New macro.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/self-clause-1.c: New test.
* c-c++-common/goacc/self-clause-2.c: New test.
* gfortran.dg/goacc/self.f95: New test.

include/ChangeLog:

* gomp-constants.h (GOACC_FLAG_LOCAL_DEVICE): New flag bit value.

libgomp/ChangeLog:

* oacc-parallel.c (GOACC_parallel_keyed): Add code to handle
GOACC_FLAG_LOCAL_DEVICE case.
* testsuite/libgomp.oacc-c-c++-common/self-1.c: New test.

OpenMP/Fortran: Group handling of 'if' clause without and with modifier

The 'if' clause with modifier was introduced in
commit b4c3a85be96585374bf95c981ba2f602667cf5b7 (Subversion r242037)
"Partial OpenMP 4.5 fortran support", but -- in some instances -- didn't place
it next to the existing handling of 'if' clause without modifier. Unify that;
no change in behavior.

gcc/fortran/
* dump-parse-tree.cc (show_omp_clauses): Group handling of 'if'
clause without and with modifier.
* frontend-passes.cc (gfc_code_walker): Likewise.
* gfortran.h (gfc_omp_clauses): Likewise.
* openmp.cc (gfc_free_omp_clauses): Likewise.

RISC-V: Change MD attribute avl_type into avl_type_idx[NFC]

Address kito's comments on the AVL propagation patch.

Change avl_type into avl_type_idx.

No functionality change.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (vlmax_avl_type_p): New function.
* config/riscv/riscv-v.cc (vlmax_avl_type_p): Ditto.
* config/riscv/riscv-vsetvl.cc (get_avl): Adapt function.
* config/riscv/vector.md: Change avl_type into avl_type_idx.

c++: error with bit-fields and scoped enums [PR111895]

Here we issue a bogus error: invalid operands of types 'unsigned char:2'
and 'int' to binary 'operator!=' when casting a bit-field of scoped enum
type to bool.

In build_static_cast_1, perform_direct_initialization_if_possible returns
NULL_TREE, because the invented declaration T t(e) fails, which is
correct.  So we go down to ocp_convert, which has code to deal with this
case:
          /* We can't implicitly convert a scoped enum to bool, so convert
             to the underlying type first.  */
          if (SCOPED_ENUM_P (intype) && (convtype & CONV_STATIC))
            e = build_nop (ENUM_UNDERLYING_TYPE (intype), e);
but the SCOPED_ENUM_P is false since intype is <unnamed-unsigned:2>.
This could be fixed by using unlowered_expr_type.  But then
c_common_truthvalue_conversion/CASE_CONVERT has a similar problem, and
unlowered_expr_type is a C++-only function.

Rather than adding a dummy unlowered_expr_type to C, I think we should
follow [expr.static.cast]p3: "the lvalue-to-rvalue conversion is applied
to the bit-field and the resulting prvalue is used as the operand of the
static_cast."  There are no prvalue bit-fields, so the l-to-r conversion
performed in decay_conversion will get us an expression whose type is the
enum.

PR c++/111895

gcc/cp/ChangeLog:

* typeck.cc (build_static_cast_1): Call decay_conversion.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/scoped_enum12.C: New test.

Daily bump.

modula2: tidyup M2Dependent.mod

This patch tidies up M2Dependent.mod by introducing a new procedure
to initialize all fields of DependencyList.

gcc/m2/ChangeLog:

* gm2-libs/M2Dependent.mod (InitDependencyList): New
procedure.
(CreateModule): Call InitDependencyList to initialize
all fields of DependencyList.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

c++: non-dep array new-expr size [PR111929]

This PR is another instance of NON_DEPENDENT_EXPR having acted as an
"analysis barrier" for middle-end routines, and now that it's gone we're
more prone to passing weird templated trees (that have a generic tree
code) to middle-end routines which end up ICEing on such trees.

In the testcase below the non-dependent array new-expr size 'x + 42' is
expressed as an ordinary PLUS_EXPR, but whose operands have different
types (since templated trees encode just the syntactic form of an
expression devoid of e.g. implicit conversions). This type incoherency
triggers an ICE from size_binop in build_new_1 due to a wide_int assert
that expects the operand types to have the same precision.

This patch fixes this by replacing our piecemeal folding of 'size' in
build_new_1 with a single call to cp_fully_fold (which is a no-op in a
template context) once 'size' is built up.

PR c++/111929

gcc/cp/ChangeLog:

* init.cc (build_new_1): Use convert, build2, build3 and
cp_fully_fold instead of fold_convert, size_binop and
fold_build3 when building up 'size'.

gcc/testsuite/ChangeLog:

* g++.dg/template/non-dependent28.C: New test.

c++: cp_stabilize_reference and non-dep exprs [PR111919]

After the removal of NON_DEPENDENT_EXPR, cp_stabilize_reference (which
used to just exit early for NON_DEPENDENT_EXPR) is now more prone to
passing a weird templated tree to middle-end routines, which for the
testcase below leads to a crash from contains_placeholder_p. It seems
the best fix is to just exit early when in a template context, like we
do in the closely related function cp_save_expr.

PR c++/111919

gcc/cp/ChangeLog:

* tree.cc (cp_stabilize_reference): Do nothing when
processing_template_decl.

gcc/testsuite/ChangeLog:

* g++.dg/template/non-dependent27.C: New test.

libstdc++: Include cstdarg in freestanding

P1642 lists cstdarg among the headers that are to be provided in full.
This commit adds it, along with cstdalign and cstdbool, which were
left out when updating in an earlier commit.

libstdc++/Changelog

* include/Makefile.am: Move cstdarg, cstdalign and cstdbool to
freestanding.
* include/Makefile.in: Regenerate.

Signed-off-by: Paul M. Bendixen <paulbendixen@gmail.com>

modula2: gcc/m2/gm2-libs/M2Dependent.mod initialize all record fields.

Initialize all sub fields within mptr. Valgrind detected
uninitialized fields in M2Dependent.mod. CreateModule must ensure all
sub fields are initialized.

gcc/m2/ChangeLog:

* gm2-libs/M2Dependent.mod (CreateModule): Initialize all
dependency fields for DependencyList.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

recog/reload: Remove old UNARY_P operand support

reload and constrain_operands had some old code to look through unary
operators.  E.g. an operand could be (sign_extend (reg X)), and the
constraints would match the reg rather than the sign_extend.

This was previously used by the MIPS port.  But relying on it was a
recurring source of problems, so Eric and I removed it in the MIPS
rewrite from ~20 years back.  I don't know of any other port that used it.

Also, the constraints processing in LRA and IRA do not have direct
support for these embedded operators, so I think it was only ever a
reload-specific feature (and probably only a global/local+reload-specific
feature, rather than IRA+reload).

Keeping the checks caused problems for special memory constraints,
leading to:

  /* A unary operator may be accepted by the predicate, but it
     is irrelevant for matching constraints.  */
  /* For special_memory_operand, there could be a memory operand inside,
     and it would cause a mismatch for constraint_satisfied_p.  */
  if (UNARY_P (op) && op == extract_mem_from_operand (op))
    op = XEXP (op, 0);

But inline asms are another source of problems.  Asms don't have
predicates, and so we can't use recog to decide whether a given change
to an asm gives a valid match.  We instead rely on constrain_operands as
something of a recog stand-in.  For an example like:

    void
    foo (int *ptr)
    {
      asm volatile ("%0" :: "r" (-*ptr));
    }

any attempt to propagate the negation into the asm would be allowed,
because it's the negated register that would be checked against the
"r" constraint.  This would later lead to:

    error: invalid 'asm': invalid operand

The same thing happened in gcc.target/aarch64/vneg_s.c with the
upcoming late-combine pass.

Rather than add more workarounds, it seemed better just to delete
this code.

gcc/
* recog.cc (constrain_operands): Remove UNARY_P handling.
* reload.cc (find_reloads): Likewise.

gcc: fix typo in comment in gcov-io.h

gcc/ChangeLog:

* gcov-io.h: Fix record length encoding in comment.

i386: Fine tune STV register conversion costs for -Os.

The eagle-eyed may have spotted that my recent testcases for DImode shifts
on x86_64 included -mno-stv in the dg-options.  This is because the
Scalar-To-Vector (STV) pass currently transforms these shifts to use
SSE vector operations, producing larger code even with -Os.  The issue
is that compute_convert_gain currently underestimates the size of
instructions required for interunit moves, which is corrected with the
patch below.

For the simple test case:

unsigned long long shl1(unsigned long long x) { return x << 1; }

without this patch, GCC -m32 -Os -mavx2 currently generates:

shl1: push   %ebp // 1 byte
mov    %esp,%ebp // 2 bytes
vmovq  0x8(%ebp),%xmm0 // 5 bytes
pop    %ebp // 1 byte
vpaddq %xmm0,%xmm0,%xmm0 // 4 bytes
vmovd  %xmm0,%eax // 4 bytes
vpextrd $0x1,%xmm0,%edx  // 6 bytes
ret // 1 byte  = 24 bytes total

with this patch, we now generate the shorter

shl1: push   %ebp // 1 byte
mov    %esp,%ebp // 2 bytes
mov    0x8(%ebp),%eax // 3 bytes
mov    0xc(%ebp),%edx // 3 bytes
pop    %ebp // 1 byte
add    %eax,%eax // 2 bytes
adc    %edx,%edx // 2 bytes
ret // 1 byte  = 15 bytes total

Benchmarking using CSiBE shows that this patch saves 1361 bytes
when compiling with -m32 -Os, and saves 172 bytes when compiling
with -Os.

2023-10-24  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386-features.cc (compute_convert_gain): Provide
more accurate values (sizes) for inter-unit moves with -Os.

ARC: Improved SImode shifts and rotates on !TARGET_BARREL_SHIFTER.

This patch completes the ARC back-end's transition to using pre-reload
splitters for SImode shifts and rotates on targets without a barrel
shifter.  The core part is that the shift_si3 define_insn is no longer
needed, as shifts and rotates that don't require a loop are split
before reload, and then because shift_si3_loop is the only caller
of output_shift, both can be significantly cleaned up and simplified.
The output_shift function (Claudiu's "the elephant in the room") is
renamed output_shift_loop, which handles just the four instruction
zero-overhead loop implementations.

Aside from the clean-ups, the user visible changes are much improved
implementations of SImode shifts and rotates on affected targets.

For the function:
unsigned int rotr_1 (unsigned int x) { return (x >> 1) | (x << 31); }

GCC with -O2 -mcpu=em would previously generate:

rotr_1: lsr_s r2,r0
        bmsk_s r0,r0,0
        ror     r0,r0
        j_s.d   [blink]
        or_s    r0,r0,r2

with this patch, we now generate:

        j_s.d   [blink]
        ror     r0,r0

For the function:
unsigned int rotr_31 (unsigned int x) { return (x >> 31) | (x << 1); }

GCC with -O2 -mcpu=em would previously generate:

rotr_31:
        mov_s   r2,r0   ;4
        asl_s r0,r0
        add.f 0,r2,r2
        rlc r2,0
        j_s.d   [blink]
        or_s    r0,r0,r2

with this patch we now generate an add.f followed by an adc:

rotr_31:
        add.f   r0,r0,r0
        j_s.d   [blink]
        add.cs  r0,r0,1
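
The add.f/add.cs pair above relies on a rotate left by one bit being the same as doubling the value and then adding back the bit that was carried out of the top. The following is a minimal C sketch of that equivalence, not the ARC back-end code; the function names other than the rotr_31 expression are made up for illustration, and the carry flag is modelled as the old top bit.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: the rotate exactly as written in rotr_31 above.  */
static uint32_t rotr_31_ref (uint32_t x)
{
  return (x >> 31) | (x << 1);
}

/* Model of "add.f r0,r0,r0" followed by "add.cs r0,r0,1":
   double x, then add 1 only if the doubling carried out of bit 31.  */
static uint32_t rotr_31_add_carry (uint32_t x)
{
  uint32_t sum = x + x;       /* add.f: wraps modulo 2^32 */
  uint32_t carry = x >> 31;   /* carry flag equals the old top bit */
  return sum + carry;         /* add.cs: conditional +1 */
}

/* Compare both forms on a few hand-picked inputs (hypothetical helper).  */
static void check_rotr_31 (void)
{
  const uint32_t samples[] = { 0u, 1u, 0x80000000u, 0x80000001u,
                               0xdeadbeefu, 0xffffffffu };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (rotr_31_ref (samples[i]) == rotr_31_add_carry (samples[i]));
}
```

Calling check_rotr_31 () from a test driver exercises both forms on the same inputs.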

Shifts by constants requiring a loop have been improved for even counts
by performing two operations in each iteration:

int shl10(int x) { return x >> 10; }

Previously looked like:

shl10: mov.f lp_count, 10
        lpnz    2f
        asr r0,r0
        nop
2:      # end single insn loop
        j_s     [blink]

And now becomes:

shl10:
        mov     lp_count,5
        lp      2f
        asr     r0,r0
        asr     r0,r0
2:      # end single insn loop
        j_s     [blink]

So emulating ARC's SWAP on architectures that don't have it:

unsigned int rotr_16 (unsigned int x) { return (x >> 16) | (x << 16); }

previously required 10 instructions and ~70 cycles:

rotr_16:
        mov_s   r2,r0   ;4
        mov.f lp_count, 16
        lpnz    2f
        add r0,r0,r0
        nop
2:      # end single insn loop
        mov.f lp_count, 16
        lpnz    2f
        lsr r2,r2
        nop
2:      # end single insn loop
        j_s.d   [blink]
        or_s    r0,r0,r2

now becomes just 4 instructions and ~18 cycles:

rotr_16:
        mov     lp_count,8
        lp      2f
        ror     r0,r0
        ror     r0,r0
2:      # end single insn loop
        j_s     [blink]
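
As a rough C model of the new rotr_16 sequence, the loop below runs eight iterations with two single-bit rotates each and compares the result with the original expression. This is only a sketch of the arithmetic; ror1, rotr_16_loop and check_rotr_16 are illustrative names, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* One single-bit rotate right, modelling a single "ror r0,r0".  */
static uint32_t ror1 (uint32_t x)
{
  return (x >> 1) | (x << 31);
}

/* Model of the new loop: lp_count = 8, two ror instructions per iteration.  */
static uint32_t rotr_16_loop (uint32_t x)
{
  for (int lp_count = 8; lp_count > 0; lp_count--)
    {
      x = ror1 (x);
      x = ror1 (x);
    }
  return x;
}

/* Reference: the expression from rotr_16 in the commit message.  */
static uint32_t rotr_16_ref (uint32_t x)
{
  return (x >> 16) | (x << 16);
}

/* Compare both forms on a few hand-picked inputs (hypothetical helper).  */
static void check_rotr_16 (void)
{
  const uint32_t samples[] = { 0u, 1u, 0x00010000u, 0x12345678u, 0xffffffffu };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (rotr_16_loop (samples[i]) == rotr_16_ref (samples[i]));
}
```

Halving the trip count this way only works for even shift or rotate counts, which matches the "even counts" wording above.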

2023-10-24  Roger Sayle  <roger@nextmovesoftware.com>
    Claudiu Zissulescu  <claziss@gmail.com>

gcc/ChangeLog
* config/arc/arc-protos.h (output_shift): Rename to...
(output_shift_loop): Tweak API to take an explicit rtx_code.
(arc_split_ashl): Prototype new function here.
(arc_split_ashr): Likewise.
(arc_split_lshr): Likewise.
(arc_split_rotl): Likewise.
(arc_split_rotr): Likewise.
* config/arc/arc.cc (output_shift): Delete local prototype.  Rename.
(output_shift_loop): New function replacing output_shift to output
a zero overhead loop for SImode shifts and rotates on ARC targets
without barrel shifter (i.e. no hardware support for these insns).
(arc_split_ashl): New helper function to split *ashlsi3_nobs.
(arc_split_ashr): New helper function to split *ashrsi3_nobs.
(arc_split_lshr): New helper function to split *lshrsi3_nobs.
(arc_split_rotl): New helper function to split *rotlsi3_nobs.
(arc_split_rotr): New helper function to split *rotrsi3_nobs.
(arc_print_operand): Correct whitespace.
(arc_rtx_costs): Likewise.
(hwloop_optimize): Likewise.
* config/arc/arc.md (ANY_SHIFT_ROTATE): New define_code_iterator.
(define_code_attr insn): New code attribute to map to pattern name.
(<ANY_SHIFT_ROTATE>si3): New expander unifying previous ashlsi3,
ashrsi3 and lshrsi3 define_expands.  Adds rotlsi3 and rotrsi3.
(*<ANY_SHIFT_ROTATE>si3_nobs): New define_insn_and_split that
unifies the previous *ashlsi3_nobs, *ashrsi3_nobs and *lshrsi3_nobs.
We now call arc_split_<insn> in arc.cc to implement each split.
(shift_si3): Delete define_insn, all shifts/rotates are now split.
(shift_si3_loop): Rename to...
(<insn>si3_loop): define_insn to handle loop implementations of
SImode shifts and rotates, calling output_shift_loop for template.
(rotrsi3): Rename to...
(*rotrsi3_insn): define_insn for TARGET_BARREL_SHIFTER's ror.
(*rotlsi3): New define_insn_and_split to transform left rotates
into right rotates before reload.
(rotlsi3_cnt1): New define_insn_and_split to implement a left
rotate by one bit using an add.f followed by an adc.
* config/arc/predicates.md (shiftr4_operator): Delete.

testsuite: Fix gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c

The test was declaring 'int *carry;' and wrote to '*carry' without
initializing 'carry' first, leading to an attempt to write at address
zero, and a crash.

Fix by declaring 'int carry;' and passing '&carry' instead of 'carry'
as parameter.
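
The shape of the fix can be illustrated without MVE intrinsics. In the sketch below, add_with_carry_out is a hypothetical stand-in for an intrinsic that reports a carry through an out-parameter; the point is only that the caller owns real storage for the carry and passes its address.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper standing in for an intrinsic that writes a carry
   through a pointer argument.  */
static unsigned add_with_carry_out (unsigned a, unsigned b, unsigned *carry_out)
{
  unsigned sum = a + b;      /* unsigned addition wraps on overflow */
  *carry_out = sum < a;      /* 1 if the addition wrapped, else 0 */
  return sum;
}

/* Fixed caller: declare an object, not a pointer, and pass its address.
   The broken form declared a pointer and passed it without ever pointing
   it at valid storage, so the callee wrote through a null or
   indeterminate pointer.  */
static unsigned fixed_caller (unsigned a, unsigned b, unsigned *sum)
{
  unsigned carry;
  *sum = add_with_carry_out (a, b, &carry);
  return carry;
}
```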

2023-09-08 Christophe Lyon <christophe.lyon@linaro.org>

gcc/testsuite/
* gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c: Fix.

arc: Remove mpy_dest_reg_operand predicate

The mpy_dest_reg_operand is just a wrapper for
register_operand. Remove it.

gcc/

* config/arc/arc.md (mulsi3_700): Update pattern.
(mulsi3_v2): Likewise.
* config/arc/predicates.md (mpy_dest_reg_operand): Remove it.

Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>

Improve factor_out_conditional_operation for conversions and constants

In the case of a NOP conversion (precisions of the 2 types are equal),
factoring out the conversion can be done even if int_fits_type_p returns
false and even when the conversion is defined by a statement inside the
conditional. Since it is a NOP conversion, no zero/sign extension happens,
which is why it is OK to do here; the existing checks were there to prevent
an extra sign/zero extension from being moved away from its definition, and
a no-op conversion is not one.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/104376
PR tree-optimization/101541
* tree-ssa-phiopt.cc (factor_out_conditional_operation):
Allow nop conversions even if it is defined by a statement
inside the conditional.

gcc/testsuite/ChangeLog:

PR tree-optimization/101541
* gcc.dg/tree-ssa/phi-opt-39.c: New test.

match: Fix the `popcnt(a&b) + popcnt(a|b)` pattern for types [PR111913]

This pattern needs a little help on the gimple side of things to know what
the result type of popcount should be. For most builtins the result type is the
same as the input type, but that is not true of popcount and some others. When
the pattern is used inside another outer expression, genmatch needs some slight
help to know that the return type is `type` rather than the argument type.
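
For reference, the identity behind this pattern is popcount(X&Y) + popcount(X|Y) == popcount(X) + popcount(Y): each bit set in both inputs is counted once by the AND and once by the OR, and each bit set in only one input is counted once by the OR. The C sketch below checks the identity and shows the type mismatch mentioned above, using the GCC builtin __builtin_popcountll, which takes an unsigned long long but returns int. The function names are illustrative only.

```c
#include <assert.h>

/* Left-hand side of the identity; note the int result from a 64-bit input.  */
static int popcount_and_or (unsigned long long a, unsigned long long b)
{
  return __builtin_popcountll (a & b) + __builtin_popcountll (a | b);
}

/* Right-hand side: what the pattern simplifies the expression to.  */
static int popcount_separate (unsigned long long a, unsigned long long b)
{
  return __builtin_popcountll (a) + __builtin_popcountll (b);
}

/* Compare both sides on a few hand-picked inputs (hypothetical helper).  */
static void check_popcount_identity (void)
{
  const unsigned long long samples[] = { 0ULL, 1ULL, 0xF0ULL, 0x3CULL,
                                         0xdeadbeefcafef00dULL, ~0ULL };
  const unsigned n = sizeof samples / sizeof samples[0];
  for (unsigned i = 0; i < n; i++)
    for (unsigned j = 0; j < n; j++)
      assert (popcount_and_or (samples[i], samples[j])
              == popcount_separate (samples[i], samples[j]));
}
```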

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/111913

gcc/ChangeLog:

* match.pd (`popcount(X&Y) + popcount(X|Y)`): Add the resulting
type for popcount.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/fold-popcount-1.c: New test.
* gcc.dg/fold-popcount-8a.c: New test.

rtl-ssa: Avoid creating duplicated phis

If make_uses_available was called twice for the same use,
we could end up trying to create duplicate definitions for
the same extended live range.

gcc/
* rtl-ssa/blocks.cc (function_info::create_degenerate_phi): Check
whether the requested phi already exists.

rtl-ssa: Don't insert after insns that can throw

rtl_ssa::can_insert_after didn't handle insns that can throw.
Fixing that avoids a regression with a later patch.

gcc/
* rtl-ssa.h: Include cfgbuild.h.
* rtl-ssa/movement.h (can_insert_after): Replace is_jump with the
more comprehensive control_flow_insn_p.

rtl-ssa: Fix handling of deleted insns

RTL-SSA queues up some invasive changes for later. But sometimes
the insns involved in those changes can be deleted by later
optimisations, making the queued change unnecessary. This patch
checks for that case.

gcc/
* rtl-ssa/changes.cc (function_info::perform_pending_updates): Check
whether an insn has been replaced by a note.

rtl-ssa: Fix null deref in first_any_insn_use

first_any_insn_use implicitly (but contrary to its documentation)
assumed that there was at least one use.

gcc/
* rtl-ssa/member-fns.inl (first_any_insn_use): Handle null
m_first_use.

i386: Avoid paradoxical subreg dests in vector zero_extend

For the V2HI -> V2SI zero extension in:

  typedef unsigned short v2hi __attribute__((vector_size(4)));
  typedef unsigned int v2si __attribute__((vector_size(8)));
  v2si f (v2hi x) { return (v2si) {x[0], x[1]}; }

ix86_expand_sse_extend would generate:

   (set (reg:V2HI 102)
        (const_vector:V2HI [(const_int 0 [0])
                            (const_int 0 [0])]))
   (set (subreg:V8HI (reg:V2HI 101) 0)
        (vec_select:V8HI
          (vec_concat:V16HI (subreg:V8HI (reg/v:V2HI 99 [ x ]) 0)
                            (subreg:V8HI (reg:V2HI 102) 0))
          (parallel [(const_int 0 [0])
                     (const_int 8 [0x8])
                     (const_int 1 [0x1])
                     (const_int 9 [0x9])
                     (const_int 2 [0x2])
                     (const_int 10 [0xa])
                     (const_int 3 [0x3])
                     (const_int 11 [0xb])])))
  (set (reg:V2SI 100)
       (subreg:V2SI (reg:V2HI 101) 0))
    (expr_list:REG_EQUAL (zero_extend:V2SI (reg/v:V2HI 99 [ x ])))

But using (subreg:V8HI (reg:V2HI 101) 0) as the destination of
the vec_select means that only the low 4 bytes of the destination
are stored.  Only the lower half of reg 100 is well-defined.

Things tend to happen to work if the register allocator ties reg 101
to reg 100.  But it caused problems with the upcoming late-combine pass
because we propagated the set of reg 100 into its uses.

gcc/
* config/i386/i386-expand.cc (ix86_split_mmx_punpck): Allow the
destination to be wider than the sources.  Take the mode from the
first source.
(ix86_expand_sse_extend): Pass the destination directly to
ix86_split_mmx_punpck, rather than using a fresh register that
is half the size.

i386: Fix unprotected REGNO in aeswidekl_operation

I hit an ICE in aeswidekl_operation while testing the late-combine
pass on x86. The predicate tested REGNO without first testing REG_P.

gcc/
* config/i386/predicates.md (aeswidekl_operation): Protect
REGNO check with REG_P.

aarch64: Define TARGET_INSN_COST

This patch adds a bare-bones TARGET_INSN_COST.  See the comment
in the patch for the rationale.

Just to get a flavour for how much difference it makes, I tried
compiling the testsuite with -Os -fno-schedule-insns{,2} and
seeing what effect the patch had on the number of instructions.
Very few tests changed, but all the changes were positive:

  Tests   Good    Bad   Delta    Best   Worst  Median
  =====   ====    ===   =====    ====   =====  ======
     19     19      0    -177     -52      -1      -4

The change for -O2 was even smaller, but more mixed:

  Tests   Good    Bad   Delta    Best   Worst  Median
  =====   ====    ===   =====    ====   =====  ======
      6      3      3      -8      -9       6      -2

There were no obvious effects on SPEC CPU2017.

The patch is needed to avoid a regression with a later change.

gcc/
* config/aarch64/aarch64.cc (aarch64_insn_cost): New function.
(TARGET_INSN_COST): Define.

aarch64: Avoid bogus atomics match

The non-LSE pattern aarch64_atomic_exchange<mode> comes before the
LSE pattern aarch64_atomic_exchange<mode>_lse. From a recog
perspective, the only difference between the patterns is that
the non-LSE one clobbers CC and needs a scratch.

However, combine and RTL-SSA can both add clobbers to make a
pattern match. This means that if they try to rerecognise an
LSE pattern, they could end up turning it into a non-LSE pattern.
This patch adds a !TARGET_LSE test to avoid that.

This is needed to avoid a regression with later patches.

gcc/
* config/aarch64/atomics.md (aarch64_atomic_exchange<mode>): Require
!TARGET_LSE.

RISC-V: Fix ICE of RVV vget/vset intrinsic[PR111935]

Calling a vget/vset intrinsic without receiving its return value causes
a crash, because in this case e.target is null.
This patch should be backported to releases/gcc-13.

PR target/111935

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: fix bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr111935.c: New test.

libgcc: make heap-based trampolines conditional on libc presence

To build `libc` for a target one needs to build `gcc` without `libc`
support first. Commit r14-4823-g8abddb187b3348 "libgcc: support
heap-based trampolines" added unconditional `libc` dependency and broke
libc-less `gcc` builds.

An example failure on `x86_64-unknown-linux-gnu`:

    $ mkdir -p /tmp/empty
    $ ../gcc/configure \
        --disable-multilib \
        --without-headers \
        --with-newlib \
        --enable-languages=c \
        --disable-bootstrap \
        --disable-gcov \
        --disable-threads \
        --disable-shared \
        --disable-libssp \
        --disable-libquadmath \
        --disable-libgomp \
        --disable-libatomic \
        --with-build-sysroot=/tmp/empty
    $ make
    ...
    /tmp/gb/./gcc/xgcc -B/tmp/gb/./gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ -B/usr/local/x86_64-pc-linux-gnu/lib/ -isystem /usr/local/x86_64-pc-linux-gnu/include -isystem /usr/local/x86_64-pc-linux-gnu/sys-include --sysroot=/tmp/empty   -g -O2 -O2  -g -O2 -DIN_GCC   -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition  -isystem ./include  -fpic -mlong-double-80 -DUSE_ELF_SYMVER -fcf-protection -mshstk -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector -Dinhibit_libc -fpic -mlong-double-80 -DUSE_ELF_SYMVER -fcf-protection -mshstk -I. -I. -I../.././gcc -I/home/slyfox/dev/git/gcc/libgcc -I/home/slyfox/dev/git/gcc/libgcc/. -I/home/slyfox/dev/git/gcc/libgcc/../gcc -I/home/slyfox/dev/git/gcc/libgcc/../include  -DHAVE_CC_TLS  -DUSE_TLS  -o heap-trampoline.o -MT heap-trampoline.o -MD -MP -MF heap-trampoline.dep  -c .../gcc/libgcc/config/i386/heap-trampoline.c -fvisibility=hidden -DHIDE_EXPORTS
    ../gcc/libgcc/config/i386/heap-trampoline.c:3:10: fatal error: unistd.h: No such file or directory
        3 | #include <unistd.h>
          |          ^~~~~~~~~~
    compilation terminated.
    make[2]: *** [.../gcc/libgcc/static-object.mk:17: heap-trampoline.o] Error 1
    make[2]: Leaving directory '/tmp/gb/x86_64-pc-linux-gnu/libgcc'
    make[1]: *** [Makefile:13307: all-target-libgcc] Error 2

The change disables all heap-based trampoline code when libc is not present.
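
A guard of this kind can be sketched as follows. The failing command line above passes -Dinhibit_libc, so the sketch assumes that macro is the test; the exact form of the guard in the patch is not shown here, and HEAP_TRAMPOLINE_SKETCH_ENABLED and sketch_alloc_trampoline are hypothetical names used only for illustration.

```c
#ifndef inhibit_libc
/* Normal build: libc headers are available, so the code may use them.  */
#include <assert.h>
#include <stdlib.h>

#define HEAP_TRAMPOLINE_SKETCH_ENABLED 1

/* Hypothetical stand-in for the real trampoline allocation logic.  */
static void *sketch_alloc_trampoline (size_t size)
{
  return malloc (size);
}

#else
/* libc-less build: compile to nothing so no libc header is needed.  */
#define HEAP_TRAMPOLINE_SKETCH_ENABLED 0
#endif
```

Compiled with -Dinhibit_libc, the file includes no libc headers at all, which is the property the libc-less bootstrap needs.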

libgcc/

* config/aarch64/heap-trampoline.c: Disable when libc is not
present.
* config/i386/heap-trampoline.c: Ditto.

Remove obsolete debugging formats from names list

* opts.cc (debug_type_names): Remove stabs and xcoff.
(df_set_names): Adjust.

RISC-V: Fix ICE of RTL CHECK on VSETVL PASS[PR111947]

An ICE occurs on the demand info of a vsetvli a5, 8 instruction.

The AVL is const_int 8, which ICEs in the caller of REGNO.

Committed as an obvious fix.

PR target/111947

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::compute_lcm_local_properties): Add REGNO check.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr111947.c: New test.

Daily bump.

libcpp: Improve the diagnostic for poisoned identifiers [PR36887]

The PR requests an enhancement to the diagnostic issued for the use of a
poisoned identifier. Currently, we show the location of the usage, but not
the location which requested the poisoning, which would be helpful for the
user if the decision to poison an identifier was made externally, such as
in a library header.

In order to output this information, we need to remember a location_t for
each identifier that has been poisoned, and that data needs to be preserved
as well in a PCH. One option would be to add a field to struct cpp_hashnode,
but there is no convenient place to add it without increasing the size of
the struct for all identifiers. Given this facility will be needed rarely,
it seemed better to add a second hash map, which is handled PCH-wise the
same as the current one in gcc/stringpool.cc. This hash map associates a new
struct cpp_hashnode_extra with each identifier that needs one. Currently
that struct only contains the new location_t, but it could be extended in
the future if there is other ancillary data that may be convenient to put
there for other purposes.
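
For context, this is roughly how an identifier ends up poisoned by a header the user did not write. The sketch uses the existing #pragma GCC poison directive; legacy_unchecked_copy and checked_copy are made-up names, and no diagnostic text is reproduced here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical library header content: forbid a made-up legacy name.
   With this patch, the location of this pragma is what the new note
   points at.  */
#pragma GCC poison legacy_unchecked_copy

/* Replacement the hypothetical library wants callers to use instead.
   Returns the number of characters copied, or 0 if src does not fit.  */
static size_t checked_copy (char *dst, size_t dst_size, const char *src)
{
  size_t n = strlen (src);
  if (n >= dst_size)
    return 0;
  memcpy (dst, src, n + 1);
  return n;
}

/* Any later mention of the poisoned identifier in code, even just naming
   it, is rejected by the preprocessor.  */
```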

libcpp/ChangeLog:

PR preprocessor/36887
* directives.cc (do_pragma_poison): Store in the extra hash map the
location from which an identifier has been poisoned.
* lex.cc (identifier_diagnostics_on_lex): When issuing a diagnostic
for the use of a poisoned identifier, also add a note indicating the
location from which it was poisoned.
* identifiers.cc (alloc_node): Convert to template function.
(_cpp_init_hashtable): Handle the new extra hash map.
(_cpp_destroy_hashtable): Likewise.
* include/cpplib.h (struct cpp_hashnode_extra): New struct.
(cpp_create_reader): Update prototype to...
* init.cc (cpp_create_reader): ...accept an argument for the extra
hash table and pass it to _cpp_init_hashtable.
* include/symtab.h (ht_lookup): New overload for convenience.
* internal.h (struct cpp_reader): Add EXTRA_HASH_TABLE member.
(_cpp_init_hashtable): Adjust prototype.

gcc/c-family/ChangeLog:

PR preprocessor/36887
* c-opts.cc (c_common_init_options): Pass new extra hash map
argument to cpp_create_reader().

gcc/ChangeLog:

PR preprocessor/36887
* toplev.h (ident_hash_extra): Declare...
* stringpool.cc (ident_hash_extra): ...this new global variable.
(init_stringpool): Handle ident_hash_extra as well as ident_hash.
(ggc_mark_stringpool): Likewise.
(ggc_purge_stringpool): Likewise.
(struct string_pool_data_extra): New struct.
(spd2): New GC root variable.
(gt_pch_save_stringpool): Use spd2 to handle ident_hash_extra,
analogous to how spd is used to handle ident_hash.
(gt_pch_restore_stringpool): Likewise.

gcc/testsuite/ChangeLog:

PR preprocessor/36887
* c-c++-common/cpp/diagnostic-poison.c: New test.
* g++.dg/pch/pr36887.C: New test.
* g++.dg/pch/pr36887.Hs: New test.

compiler: move Selector_expression up in file

This is a mechanical change to move Selector_expression up in expressions.cc.
This will make it visible to Builtin_call_expression for later work.
This produces a very large "git diff", but "git diff --minimal" is clear.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/536642

compiler: make xx_constant_value methods non-const

This changes the Expression {numeric,string,boolean}_constant_value
methods to be non-const. This does not affect anything immediately,
but will be useful for later CLs in this series.

The only real effect is to Builtin_call_expression::do_export,
which remains const and can no longer call numeric_constant_value.
But it never needed to call it, as do_export runs after do_lower,
and do_lower replaces a constant expression with the actual constant.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/536641

compiler: pass gogo to Runtime::make_call

This is a boilerplate change to pass gogo to Runtime::make_call.
It's not currently used but will be used by later CLs in this series.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/536640

compiler: add Expression::is_untyped method

This method is not currently used by anything, but it will be used
by later CLs in this series.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/536639

syscall: add missing type conversion

The gofrontend incorrectly accepted code that was missing a type conversion.
The test case for this is bug518.go in https://go.dev/cl/536537.
Future CLs in this series will detect the type error.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/536638

vect: Allow same precision for bit-precision conversions.

In PR111794 we miss a vectorization because on riscv type precision and
mode precision differ for mask types. We can still vectorize when
allowing assignments with the same precision for dest and source which
is what this patch does.

gcc/ChangeLog:

PR tree-optimization/111794
* tree-vect-stmts.cc (vectorizable_assignment): Add
same-precision exception for dest and source.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/slp-mask-1.c: New test.
* gcc.target/riscv/rvv/autovec/slp-mask-run-1.c: New test.

RISC-V: Add popcount fallback expander.

I didn't manage to get back to the generic vectorizer fallback for
popcount so I figured I'd rather create a popcount fallback in the
riscv backend. It uses the WWG algorithm from libgcc.
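
For readers unfamiliar with the algorithm, here is a scalar C sketch of the mask-and-add popcount reduction, assuming that is what "WWG" refers to here. It is written for a single 32-bit value; the expander applies the same idea per vector element. The function name is illustrative, and this is not the expander's code.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-parallel popcount of a 32-bit value.  */
static int popcount32_parallel (uint32_t x)
{
  /* Each 2-bit field now holds the count of set bits in that field.  */
  x = x - ((x >> 1) & 0x55555555u);
  /* Sum adjacent 2-bit counts into 4-bit fields.  */
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
  /* Sum adjacent 4-bit counts into bytes.  */
  x = (x + (x >> 4)) & 0x0f0f0f0fu;
  /* The multiply adds all four byte counts into the top byte.  */
  return (int) ((x * 0x01010101u) >> 24);
}
```

The sequence uses only shifts, masks, adds and one multiply, all of which have straightforward vector equivalents, which is presumably why it suits a back-end fallback.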

gcc/ChangeLog:

* config/riscv/autovec.md (popcount<mode>2): New expander.
* config/riscv/riscv-protos.h (expand_popcount): Define.
* config/riscv/riscv-v.cc (expand_popcount): Vectorize popcount
with the WWG algorithm.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/popcount-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/popcount-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/popcount.c: New test.

tree-optimization/111916 - SRA of BIT_FIELD_REF of constant pool entries

The following adjusts a leftover BIT_FIELD_REF special-casing to only
cover the cases general code doesn't handle.

PR tree-optimization/111916
* tree-sra.cc (sra_modify_assign): Do not lower all
BIT_FIELD_REF reads that are sra_handled_bf_read_p.

* gcc.dg/torture/pr111916.c: New testcase.

tree-optimization/111915 - mixing grouped and non-grouped accesses

The change to allow SLP of non-grouped accesses failed to check
for the case of mixing with grouped accesses.

PR tree-optimization/111915
* tree-vect-slp.cc (vect_build_slp_tree_1): Check all
accesses are either grouped or not.

* gcc.dg/vect/pr111915.c: New testcase.

ipa/111914 - perform parameter init after remapping types

The following addresses a mismatch in SSA name vs. symbol when
we emit a dummy assignment when not optimizing. The temporary
we create is not remapped by initialize_inlined_parameters because
we have no easy way to get at it. The following instead emits
the additional statement after we have remapped the type of
the replacement variable.

PR ipa/111914
* tree-inline.cc (setup_one_parameter): Move code emitting
a dummy load when not optimizing ...
(initialize_inlined_parameters): ... here to after when
we remapped the parameter type.

* gcc.dg/pr111914.c: New testcase.