git.ipfire.org Git - thirdparty/gcc.git/log

Daily bump.

Fix couple of endianness issues in fold_ctor_reference

fold_ctor_reference attempts to use a recursive local processing in order
to call native_encode_expr on the leaf nodes of the constructor, before
falling back to calling native_encode_initializer if this fails.

There are a couple of issues related to endianness present in it:
  1) it does not specifically handle integral bit-fields; now these are left
justified on big-endian platforms so cannot be treated like ordinary fields.
  2) it does not check that the constructor uses the native storage order.

gcc/
* gimple-fold.cc (fold_array_ctor_reference): Fix head comment.
(fold_nonarray_ctor_reference): Likewise.  Specifically deal
with integral bit-fields.
(fold_ctor_reference): Make sure that the constructor uses the
native storage order.

gcc/testsuite/
* gcc.c-torture/execute/20230630-1.c: New test.
* gcc.c-torture/execute/20230630-2.c: Likewise.
* gcc.c-torture/execute/20230630-3.c: Likewise
* gcc.c-torture/execute/20230630-4.c: Likewise

Daily bump.

Refine maskstore patterns with UNSPEC_MASKMOV.

Similar like r14-2070-gc79476da46728e

If mem_addr points to a memory region with less than whole vector size
bytes of accessible memory and k is a mask that would prevent reading
the inaccessible bytes from mem_addr, add UNSPEC_MASKMOV to prevent
it to be transformed to any other whole memory access instructions.

gcc/ChangeLog:

PR rtl-optimization/110237
* config/i386/sse.md (<avx512>_store<mode>_mask): Refine with
UNSPEC_MASKMOV.
(maskstore<mode><avx512fmaskmodelower): Ditto.
(*<avx512>_store<mode>_mask): New define_insn, it's renamed
from original <avx512>_store<mode>_mask.

Refine maskloadmn pattern with UNSPEC_MASKLOAD.

If mem_addr points to a memory region with less than whole vector size
bytes of accessible memory and k is a mask that would prevent reading
the inaccessible bytes from mem_addr, add UNSPEC_MASKLOAD to prevent
it to be transformed to vpblendd.

gcc/ChangeLog:

PR target/110309
* config/i386/sse.md (maskload<mode><avx512fmaskmodelower>):
Refine pattern with UNSPEC_MASKLOAD.
(maskload<mode><avx512fmaskmodelower>): Ditto.
(*<avx512>_load<mode>_mask): Extend mode iterator to
VI12HF_AVX512VL.
(*<avx512>_load<mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110309.c: New test.

i386: Sync tune_string with arch_string for target attribute arch=*

For function with target attribute arch=*, current logic will set its
tune to -mtune from command line so all target_clones will get same
tuning flags which would affect the performance for each clone. Override
tune with arch if tune was not explicitly specified to get proper tuning
flags for target_clones.

gcc/ChangeLog:

* config/i386/i386-options.cc (ix86_valid_target_attribute_tree):
Override tune_string with arch_string if tune_string is not
explicitly specified.

gcc/testsuite/ChangeLog:

* gcc.target/i386/mvc17.c: New test.

(cherry picked from commit 2916278d14e9ac28c361c396a67256acbebda6e8)

Daily bump.

Support parallel testing in libgomp: fallback Perl 'flock' [PR66005]

Follow-up to commit 6c3b30ef9e0578509bdaf59c13da4a212fe6c2ba
"Support parallel testing in libgomp, part II [PR66005]"
("..., and enable if 'flock' is available for serializing execution testing"),
where we saw:

> On my Dell Precision 7530 laptop:
>
>     $ uname -srvi
>     Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64
>     $ grep '^model name' < /proc/cpuinfo | uniq -c
>          12 model name      : Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
>     $ nvidia-smi -L
>     GPU 0: Quadro P1000 (UUID: GPU-e043973b-b52a-d02b-c066-a8fdbf64e8ea)
>
> ... [...]: case (c) standard configuration, no offloading
> configured, [...]

>     $ \time make check-target-libgomp
>
> Case (c), baseline; [...]:
>
>     1180.98user 110.80system 19:36.40elapsed 109%CPU (0avgtext+0avgdata 505148maxresident)k
>     1133.22user 111.08system 19:35.75elapsed 105%CPU (0avgtext+0avgdata 505212maxresident)k
>
> Case (c), parallelized [using 'flock']:
>
> [...]
>     -j12 GCC_TEST_PARALLEL_SLOTS=12
>     2591.04user 192.64system 4:44.98elapsed 976%CPU (0avgtext+0avgdata 505216maxresident)k
>     2581.23user 195.21system 4:47.51elapsed 965%CPU (0avgtext+0avgdata 505212maxresident)k

Quite the same when instead of 'flock' using this fallback Perl 'flock':

    2565.23user 194.35system 4:46.77elapsed 962%CPU (0avgtext+0avgdata 505216maxresident)k
    2549.38user 200.20system 4:46.08elapsed 961%CPU (0avgtext+0avgdata 505216maxresident)k

PR testsuite/66005
gcc/
* doc/install.texi: Document (optional) Perl usage for parallel
testing of libgomp.
libgomp/
* testsuite/lib/libgomp.exp: 'flock' through stdout.
* testsuite/flock: New.
* configure.ac (FLOCK): Point to that if no 'flock' available, but
'perl' is.
* configure: Regenerate.

(cherry picked from commit 04abe1944d30eb18a2060cfcd9695d085f7b4752)

Support parallel testing in libgomp, part II [PR66005]

..., and enable if 'flock' is available for serializing execution testing.

Regarding the default of 19 parallel slots, this turned out to be a local
minimum for wall time when testing this on:

    $ uname -srvi
    Linux 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016 x86_64
    $ grep '^model name' < /proc/cpuinfo | uniq -c
         32 model name      : Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz

... in two configurations: case (a) standard configuration, no offloading
configured, case (b) offloading for GCN and nvptx configured but no devices
available.  For both cases, default plus '-m32' variant.

    $ \time make check-target-libgomp RUNTESTFLAGS="--target_board=unix\{,-m32\}"

Case (a), baseline:

    6432.23user 332.38system 47:32.28elapsed 237%CPU (0avgtext+0avgdata 505044maxresident)k
    6382.43user 319.21system 47:06.04elapsed 237%CPU (0avgtext+0avgdata 505172maxresident)k

This is what people have been complaining about, rightly so, in
<https://gcc.gnu.org/PR66005> "libgomp make check time is excessive" and
elsewhere.

Case (a), parallelized:

    -j12 GCC_TEST_PARALLEL_SLOTS=10
    3088.49user 267.74system 6:43.82elapsed 831%CPU (0avgtext+0avgdata 505188maxresident)k
    -j15 GCC_TEST_PARALLEL_SLOTS=15
    3308.08user 294.79system 5:56.04elapsed 1011%CPU (0avgtext+0avgdata 505360maxresident)k
    -j17 GCC_TEST_PARALLEL_SLOTS=17
    3539.93user 298.99system 5:27.86elapsed 1170%CPU (0avgtext+0avgdata 505112maxresident)k
    -j18 GCC_TEST_PARALLEL_SLOTS=18
    3697.50user 317.18system 5:14.63elapsed 1275%CPU (0avgtext+0avgdata 505360maxresident)k
    -j19 GCC_TEST_PARALLEL_SLOTS=19
    3765.94user 324.27system 5:13.22elapsed 1305%CPU (0avgtext+0avgdata 505128maxresident)k
    -j20 GCC_TEST_PARALLEL_SLOTS=20
    3684.66user 312.32system 5:15.26elapsed 1267%CPU (0avgtext+0avgdata 505100maxresident)k
    -j23 GCC_TEST_PARALLEL_SLOTS=23
    4040.59user 347.10system 5:29.12elapsed 1333%CPU (0avgtext+0avgdata 505200maxresident)k
    -j26 GCC_TEST_PARALLEL_SLOTS=26
    3973.24user 377.96system 5:24.70elapsed 1340%CPU (0avgtext+0avgdata 505160maxresident)k
    -j32 GCC_TEST_PARALLEL_SLOTS=32
    4004.42user 346.10system 5:16.11elapsed 1376%CPU (0avgtext+0avgdata 505160maxresident)k

Yay!

Case (b), baseline; 2+ h:

    7227.58user 700.54system 2:14:33elapsed 98%CPU (0avgtext+0avgdata 994264maxresident)k

Case (b), parallelized:

    -j12 GCC_TEST_PARALLEL_SLOTS=10
    7377.46user 777.52system 16:06.63elapsed 843%CPU (0avgtext+0avgdata 994344maxresident)k
    -j15 GCC_TEST_PARALLEL_SLOTS=15
    8019.18user 721.42system 12:13.56elapsed 1191%CPU (0avgtext+0avgdata 994228maxresident)k
    -j17 GCC_TEST_PARALLEL_SLOTS=17
    8530.11user 716.95system 10:45.92elapsed 1431%CPU (0avgtext+0avgdata 994176maxresident)k
    -j18 GCC_TEST_PARALLEL_SLOTS=18
    8776.79user 645.89system 10:27.20elapsed 1502%CPU (0avgtext+0avgdata 994248maxresident)k
    -j19 GCC_TEST_PARALLEL_SLOTS=19
    9332.37user 641.76system 10:15.09elapsed 1621%CPU (0avgtext+0avgdata 994260maxresident)k
    -j20 GCC_TEST_PARALLEL_SLOTS=20
    9609.54user 789.88system 10:26.94elapsed 1658%CPU (0avgtext+0avgdata 994284maxresident)k
    -j23 GCC_TEST_PARALLEL_SLOTS=23
    10362.40user 911.14system 10:44.47elapsed 1749%CPU (0avgtext+0avgdata 994208maxresident)k
    -j26 GCC_TEST_PARALLEL_SLOTS=26
    11159.44user 850.99system 11:09.25elapsed 1794%CPU (0avgtext+0avgdata 994256maxresident)k
    -j32 GCC_TEST_PARALLEL_SLOTS=32
    11453.50user 939.52system 11:00.38elapsed 1876%CPU (0avgtext+0avgdata 994240maxresident)k

On my Dell Precision 7530 laptop:

    $ uname -srvi
    Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64
    $ grep '^model name' < /proc/cpuinfo | uniq -c
         12 model name      : Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
    $ nvidia-smi -L
    GPU 0: Quadro P1000 (UUID: GPU-e043973b-b52a-d02b-c066-a8fdbf64e8ea)

... in two configurations: case (c) standard configuration, no offloading
configured, case (d) offloading for nvptx configured and device available.
For both cases, only default variant, no '-m32'.

    $ \time make check-target-libgomp

Case (c), baseline; roughly half of case (a) (just one variant):

    1180.98user 110.80system 19:36.40elapsed 109%CPU (0avgtext+0avgdata 505148maxresident)k
    1133.22user 111.08system 19:35.75elapsed 105%CPU (0avgtext+0avgdata 505212maxresident)k

Case (c), parallelized:

    -j12 GCC_TEST_PARALLEL_SLOTS=2
    1143.83user 110.76system 10:20.46elapsed 202%CPU (0avgtext+0avgdata 505216maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=6
    1737.08user 143.94system 4:59.48elapsed 628%CPU (0avgtext+0avgdata 505200maxresident)k
    1730.31user 143.02system 4:58.75elapsed 627%CPU (0avgtext+0avgdata 505152maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=8
    2192.63user 169.34system 4:52.96elapsed 806%CPU (0avgtext+0avgdata 505216maxresident)k
    2219.04user 167.67system 4:53.19elapsed 814%CPU (0avgtext+0avgdata 505152maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=10
    2463.93user 184.98system 4:48.39elapsed 918%CPU (0avgtext+0avgdata 505200maxresident)k
    2455.62user 183.68system 4:47.40elapsed 918%CPU (0avgtext+0avgdata 505216maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=12
    2591.04user 192.64system 4:44.98elapsed 976%CPU (0avgtext+0avgdata 505216maxresident)k
    2581.23user 195.21system 4:47.51elapsed 965%CPU (0avgtext+0avgdata 505212maxresident)k
    -j20 GCC_TEST_PARALLEL_SLOTS=20 [oversubscribe]
    2613.18user 199.51system 4:44.06elapsed 990%CPU (0avgtext+0avgdata 505216maxresident)k

Case (d), baseline (compared to case (b): only nvptx offloading compilation,
but also nvptx offloading execution); ~1 h:

    2841.93user 653.68system 1:02:26elapsed 93%CPU (0avgtext+0avgdata 909792maxresident)k
    2842.03user 654.39system 1:02:24elapsed 93%CPU (0avgtext+0avgdata 909880maxresident)k

Case (d), parallelized:

    -j12 GCC_TEST_PARALLEL_SLOTS=2
    2856.39user 606.87system 33:58.64elapsed 169%CPU (0avgtext+0avgdata 909948maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=6
    3444.90user 666.86system 18:37.57elapsed 367%CPU (0avgtext+0avgdata 909856maxresident)k
    3462.13user 667.13system 18:36.87elapsed 369%CPU (0avgtext+0avgdata 909872maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=8
    3929.74user 716.22system 18:02.36elapsed 429%CPU (0avgtext+0avgdata 909832maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=10
    4152.84user 736.16system 17:43.05elapsed 459%CPU (0avgtext+0avgdata 909872maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=12
    4209.60user 749.00system 17:35.20elapsed 469%CPU (0avgtext+0avgdata 909840maxresident)k
    -j20 GCC_TEST_PARALLEL_SLOTS=20 [oversubscribe]
    4255.54user 756.78system 17:29.06elapsed 477%CPU (0avgtext+0avgdata 909868maxresident)k

Worth noting is that with nvptx offloading, there is one execution test case
that times out ('libgomp.fortran/reverse-offload-5.f90').  This effectively
stalls progress for almost 5 min: quickly other executions test cases queue up
on the lock for all parallel slots.  That's working as expected; just noting
this as it accordingly does skew the wall time numbers.

PR testsuite/66005
libgomp/
* configure.ac: Look for 'flock'.
* testsuite/Makefile.am (gcc_test_parallel_slots): Enable parallel testing.
* testsuite/config/default.exp: Don't 'load_lib "standard.exp"' here...
* testsuite/lib/libgomp.exp: ... but here, instead.
(libgomp_load): Override for parallel testing.
* testsuite/libgomp-site-extra.exp.in (FLOCK): Set.
* configure: Regenerate.
* Makefile.in: Regenerate.
* testsuite/Makefile.in: Regenerate.

(cherry picked from commit 6c3b30ef9e0578509bdaf59c13da4a212fe6c2ba)

Support parallel testing in libgomp, part I [PR66005]

..., while still hard-coding the number of parallel slots to one.

PR testsuite/66005
libgomp/
* testsuite/Makefile.am (PWD_COMMAND): New variable.
(%/site.exp): New target.
(check_p_numbers0, check_p_numbers1, check_p_numbers2)
(check_p_numbers3, check_p_numbers4, check_p_numbers5)
(check_p_numbers6, check_p_numbers, gcc_test_parallel_slots)
(check_p_subdirs)
(check_DEJAGNU_libgomp_targets): New variables.
($(check_DEJAGNU_libgomp_targets)): New target.
($(check_DEJAGNU_libgomp_targets)): New dependency.
(check-DEJAGNU $(check_DEJAGNU_libgomp_targets)): New targets.
* testsuite/Makefile.in: Regenerate.
* testsuite/lib/libgomp.exp: For parallel testing,
'load_file ../libgomp-test-support.exp'.

Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
(cherry picked from commit e797db5c744f7b4e110f23a495fca8e6b8aebe83)

libgomp C++ testsuite: Use 'lang_include_flags' instead of 'libstdcxx_includes'

With nvptx offloading configured, and supported, and CUDA available:

    $ make check-target-libgomp RUNTESTFLAGS="--all c.exp=context-1.c c++.exp=context-1.c"
    [...]
    Running [...]/libgomp.oacc-c/c.exp ...
    PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  (test for excess errors)
    PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
    PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  (test for excess errors)
    PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
    UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2
    Running [...]/libgomp.oacc-c++/c++.exp ...
    PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  (test for excess errors)
    PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
    PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  (test for excess errors)
    PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
    UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2
    [...]

..., but for 'c++.exp=context-1.c' alone, we currently get all-UNSUPPORTED:

    $ make check-target-libgomp RUNTESTFLAGS_="--all c++.exp=context-1.c"
    [...]
    Running [...]/libgomp.oacc-c++/c++.exp ...
    UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
    UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2
    UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2
    [...]

That is, if 'c.exp' executes first, it does successfully evaluate
'dg-require-effective-target openacc_cublas' -- and does cache this result (so
it isn't reevaluated for 'c++.exp').  However, for 'c++.exp' alone (that is,
without the 'c.exp' result cached), we run into:

    spawn -ignore SIGHUP [xgcc] [...] -x c++ openacc_cublas2311907.c [...]
    In file included from /usr/include/cuda_fp16.h:3673,
                     from /usr/include/cublas_api.h:75,
                     from /usr/include/cublas_v2.h:65,
                     from openacc_cublas2311907.c:3:
    /usr/include/cuda_fp16.hpp:67:10: fatal error: utility: No such file or directory

We're missing include paths to C++/libstdc++ build-tree headers.

Fix this by using the mechanism introduced for Fortran in
r212268 (commit f707da16f714f7fe5a42391748212c84dfec639b) re
"libgomp.fortran/fortran.exp - add -fintrinsic-modules-path ${blddir}".

libgomp/
* testsuite/libgomp.c++/c++.exp: Use 'lang_include_flags' instead
of 'libstdcxx_includes'.
* testsuite/libgomp.oacc-c++/c++.exp: Likewise.

(cherry picked from commit 1b93b9191d073bf9e867ab8bfc8e4b59ba5af1f3)

go: Update usage of TARGET_AIX to TARGET_AIX_OS

TARGET_AIX is defined to a non-zero value on linux and maybe other
powerpc64le targets. This leads to unexpected behavior such as
dropping the .go_export section when linking a shared library
on linux/powerpc64le.

Instead, use TARGET_AIX_OS to toggle AIX specific behavior.

Fixes golang/go#60798.

2023-06-22 Paul E. Murphy <murphyp@linux.ibm.com>

gcc/go/
* go-backend.cc [TARGET_AIX]: Rename and update usage to TARGET_AIX_OS.
* go-lang.cc: Likewise.

(cherry picked from commit b76cd1ec361712e1ac9ca5e0246da24ea2b78916)

Make option mvzeroupper independent of optimization level.

pass_insert_vzeroupper is under condition

TARGET_AVX && TARGET_VZEROUPPER
&& flag_expensive_optimizations && !optimize_size

But the document of mvzeroupper doesn't mention the insertion
required -O2 and above, it may confuse users when they explicitly
use -Os -mvzeroupper.

------------
mvzeroupper
Target Mask(VZEROUPPER) Save
Generate vzeroupper instruction before a transfer of control flow out of
the function.
------------

The patch moves flag_expensive_optimizations && !optimize_size to
ix86_option_override_internal. It makes -mvzeroupper independent of
optimization level, but still keeps the behavior of architecture
tuning(emit_vzeroupper) unchanged.

gcc/ChangeLog:

* config/i386/i386-features.cc (pass_insert_vzeroupper:gate):
Move flag_expensive_optimizations && !optimize_size to ..
* config/i386/i386-options.cc (ix86_option_override_internal):
.. this, it makes -mvzeroupper independent of optimization
level, but still keeps the behavior of architecture
tuning(emit_vzeroupper) unchanged.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-vzeroupper-29.c: New testcase.
* gcc.target/i386/avx-vzeroupper-12.c: Adjust testcase.
* gcc.target/i386/avx-vzeroupper-7.c: Ditto.
* gcc.target/i386/avx-vzeroupper-9.c: Ditto.

Daily bump.

Mark asm goto with outputs as volatile

The manual references asm goto as being implicitly volatile already
and that was done when asm goto could not have outputs. When outputs
were added to `asm goto`, only asm goto without outputs were still being
marked as volatile. Now some parts of GCC decide, removing the `asm goto`
is ok if the output is not used, though not updating the CFG (this happens
on both the RTL level and the gimple level). Since the biggest user of `asm goto`
is the Linux kernel and they expect them to be volatile (they use them to
copy to/from userspace), we should just mark the inline-asm as volatile.

OK? Bootstrapped and tested on x86_64-linux-gnu.

PR middle-end/110420
PR middle-end/103979
PR middle-end/98619

gcc/ChangeLog:

* gimplify.cc (gimplify_asm_expr): Mark asm with labels as volatile.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/asmgoto-6.c: New test.

(cherry picked from commit 478840a2ca491fbff44371caee4983d1e7b7b7cf)

Daily bump.

d: Suboptimal codegen for __builtin_expect(cond, false)

Since PR96435, both boolean objects and expressions have been evaluated
in the following way.

(*(ubyte*)&obj_or_expr) & 1

It has been noted that sometimes this can cause the back-end to optimize
in non-obvious ways - in particular with __builtin_expect.

This @safe feature is now restricted to just when reading the value of a
bool field that comes from a union.

PR d/110359

gcc/d/ChangeLog:

* d-convert.cc (convert_for_rvalue): Only apply the @safe boolean
conversion to boolean fields of a union.
(convert_for_condition): Call convert_for_rvalue in the default case.

gcc/testsuite/ChangeLog:

* gdc.dg/pr110359.d: New test.

(cherry picked from commit ab98db1e8c1b997414539f41b7fb814019497d8d)

d: Fix crash in d/dmd/root/aav.d:127 dmd_aaGetRvalue from DsymbolTable::lookup

Backports patch from upstream dmd mainline for fixing PR110113.

The data being Mem.xrealloc'd contains many Array(T) fields, some of
which have self references in their data.ptr field thanks to the
smallarray optimization used by Array.

Naturally then, the memcpy from old GC data to new retains those self
referenced addresses, and the GC marks the old data as "free". Some time
later GC.malloc will return a pointer to said "free" data. So now we
have two GC references to the same memory. One that is treating the data
as an Array(VarDeclaration) in dmd.escape.escapeByStorage, and the other
as an AA in the symtab of a dmd.dsymbol.ScopeDsymbol.

Fix this memory corruption by not storing the data in a global variable
for reuse. If there are no more live references, the GC will free it.

PR d/110113

gcc/d/ChangeLog:

* dmd/escape.d (checkMutableArguments): Always allocate new buffer for
computing escapeBy.

gcc/testsuite/ChangeLog:

* gdc.test/compilable/test23978.d: New test.

Reviewed-on: https://github.com/dlang/dmd/pull/15302
(cherry picked from commit ae3a4cefd855512b10b833a56f275b701bacdb34)

Daily bump.

compiler, libgo: support bootstrapping gc compiler

In the Go 1.21 release the package internal/profile imports
internal/lazyregexp.  That works when bootstrapping with Go 1.17,
because that compiler has internal/lazyregep and permits importing it.
We also have internal/lazyregexp in libgo, but since it is not installed
it is not available for importing.  This CL adds internal/lazyregexp
to the list of internal packages that are installed for bootstrapping.

The Go 1.21, and earlier, releases have a couple of functions in
the internal/abi package that are always fully intrinsified.
The gofrontend recognizes and intrinsifies those functions as well.
However, the gofrontend was also building function descriptors
for references to the functions without calling them, which
failed because there was nothing to refer to.  That is OK for the
gc compiler, which guarantees that the functions are only called,
not referenced.  This CL arranges to not generate function descriptors
for these functions.

For golang/go#60913

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/504798

libstdc++: Document removal of implicit allocator rebinding extensions

Traditionally libstdc++ allowed containers to be
instantiated with allocator's that have the wrong value type, implicitly
rebinding the allocator to the container's value type. Since C++20 that
has been explicitly ill-formed, so the extension is no longer supported
in strict modes (e.g. -std=c++17) and in C++20 and later.

libstdc++-v3/ChangeLog:

* doc/xml/manual/evolution.xml: Document removal of implicit
allocator rebinding extensions in strict mode and for C++20.
* doc/html/*: Regenerate.

(cherry picked from commit 8cbaf679a3c1875c5475bd1cb0fb86fb9d03b2d4)

tree-optimization/110298 - CFG cleanup and stale nb_iterations

When unrolling we eventually kill nb_iterations info since it may
refer to removed SSA names. But we do this only after cleaning
up the CFG which in turn can end up accessing it. Fixed by
swapping the two.

PR tree-optimization/110298
* tree-ssa-loop-ivcanon.cc (tree_unroll_loops_completely):
Clear number of iterations info before cleaning up the CFG.

* gcc.dg/torture/pr110298.c: New testcase.

(cherry picked from commit 916add3bf6e46467e4391e358b11ecfbc4daa275)

middle-end/110182 - TYPE_PRECISION on VECTOR_TYPE causes wrong-code

When folding two conversions in a row we use TYPE_PRECISION but
that's invalid for VECTOR_TYPE. The following fixes this by
using element_precision instead.

PR middle-end/110182
* match.pd (two conversions in a row): Use element_precision
to DTRT for VECTOR_TYPE.

(cherry picked from commit 3e12669a0eb968cfcbe9242b382fd8020935edf8)

Daily bump.

aarch64: Allow compiler to define ls64 builtins [PR110132]

This patch refactors the ls64 builtins to allow the compiler to define them
directly instead of having wrapper functions in arm_acle.h. This should be not
only easier to maintain, but it makes two important correctness fixes:
- It fixes PR110132, where the builtins ended up getting declared with
invisible bindings in the C FE, so the FE ended up synthesizing
incompatible implicit definitions for these builtins.
- It allows the builtins to be used with LTO, which didn't work previously.

We also take the opportunity to add test coverage from C++ for these
builtins.

gcc/ChangeLog:

PR target/110132
* config/aarch64/aarch64-builtins.cc (aarch64_general_simulate_builtin):
New. Use it ...
(aarch64_init_ls64_builtins): ... here. Switch to declaring public ACLE
names for builtins.
(aarch64_general_init_builtins): Ensure we invoke the arm_acle.h
setup if in_lto_p, just like we do for SVE.
* config/aarch64/arm_acle.h: (__arm_ld64b): Delete.
(__arm_st64b): Delete.
(__arm_st64bv): Delete.
(__arm_st64bv0): Delete.

gcc/testsuite/ChangeLog:

PR target/110132
* lib/target-supports.exp (check_effective_target_aarch64_asm_FUNC_ok):
Extend to ls64.
* g++.target/aarch64/acle/acle.exp: New.
* g++.target/aarch64/acle/ls64.C: New test.
* g++.target/aarch64/acle/ls64_lto.C: New test.
* gcc.target/aarch64/acle/ls64_lto.c: New test.
* gcc.target/aarch64/acle/pr110132.c: New test.

(cherry picked from commit 9963029a24f2d2510b82e7106fae3f364da33c5d)

aarch64: Fix wrong code with st64b builtin [PR110100]

The st64b pattern incorrectly had an output constraint on the register
operand containing the destination address for the store, leading to
wrong code. This patch fixes that.

gcc/ChangeLog:

PR target/110100
* config/aarch64/aarch64-builtins.cc (aarch64_expand_builtin_ls64):
Use input operand for the destination address.
* config/aarch64/aarch64.md (st64b): Fix constraint on address
operand.

gcc/testsuite/ChangeLog:

PR target/110100
* gcc.target/aarch64/acle/pr110100.c: New test.

(cherry picked from commit 737a0b749a7bc3e7cb904ea2d4b18dc130514b85)

aarch64: Fix whitespace in ls64 builtin implementation [PR110100]

The ls64 builtin code was using incorrect GNU style with eight spaces where
there should be a tab. Fixed thusly.

gcc/ChangeLog:

PR target/110100
* config/aarch64/aarch64-builtins.cc (aarch64_init_ls64_builtins_types):
Replace eight consecutive spaces with tabs.
(aarch64_init_ls64_builtins): Likewise.
(aarch64_expand_builtin_ls64): Likewise.
* config/aarch64/aarch64.md (ld64b): Likewise.
(st64b): Likewise.
(st64bv): Likewise
(st64bv0): Likewise.

(cherry picked from commit 713613541254039a34e1dd8fd4a613a299af1fd6)

Daily bump.

libstdc++: avoid bogus -Wrestrict [PR105651]

PR tree-optimization/105651

libstdc++-v3/ChangeLog:

* include/bits/basic_string.tcc (_M_replace): Add an assert
to avoid -Wrestrict false positive.

Daily bump.

testsuite: Check int128 effective target for pr109932-{1,2}.c [PR110230]

This patch is to make newly added test cases pr109932-{1,2}.c
check int128 effective target to avoid unsupported type error
on 32-bit. I did hit this failure during testing and fixed
it, but made a stupid mistake not updating the local formatted
patch which was actually out of date.

PR testsuite/110230
PR target/109932

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr109932-1.c: Adjust with int128 effective target.
* gcc.target/powerpc/pr109932-2.c: Ditto.

(cherry picked from commit 16eb9d69079d769b2aa2c07ce54aca20f5547c14)

rs6000: Guard __builtin_{un,}pack_vector_int128 with vsx [PR109932]

As PR109932 shows, builtins __builtin_{un,}pack_vector_int128
should be guarded under vsx rather than power7, as their
corresponding bif patterns have the conditions TARGET_VSX
and VECTOR_MEM_ALTIVEC_OR_VSX_P (V1TImode). This patch is to
move __builtin_{un,}pack_vector_int128 to stanza vsx to ensure
their supports.

PR target/109932

gcc/ChangeLog:

* config/rs6000/rs6000-builtins.def (__builtin_pack_vector_int128,
__builtin_unpack_vector_int128): Move from stanza power7 to vsx.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr109932-1.c: New test.
* gcc.target/powerpc/pr109932-2.c: New test.

(cherry picked from commit ff83d1b47aadcdaf80a4fda84b0dc00bb2cd3641)

rs6000: Don't use TFmode for 128 bits fp constant in toc [PR110011]

As PR110011 shows, when encoding 128 bits fp constant into
toc, we adopts REAL_VALUE_TO_TARGET_LONG_DOUBLE which is
to find the first float mode with LONG_DOUBLE_TYPE_SIZE
bits of precision, it would be TFmode here. But the 128
bits fp constant can be with mode IFmode or KFmode, which
doesn't necessarily have the same underlying float format
as the one of TFmode, like this PR exposes, with option
-mabi=ibmlongdouble TFmode has ibm_extended_format while
KFmode has ieee_quad_format, mixing up the formats (the
encoding/decoding ways) would cause unexpected results.

This patch is to make it use constant's own mode instead
of TFmode for real_to_target call.

PR target/110011

gcc/ChangeLog:

* config/rs6000/rs6000.cc (output_toc): Use the mode of the 128-bit
floating constant itself for real_to_target call.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr110011.c: New test.

(cherry picked from commit 388809f2afde874180da0669c669e241037eeba0)

Daily bump.

aarch64: testsuite: disable stack protector for tests relying on stack offset

Stack protector needs a guard value on the stack and change the stack
layout. So we need to disable it for those tests, to avoid test failure
with --enable-default-ssp.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shrink_wrap_1.c (dg-options): Add
-fno-stack-protector.
* gcc.target/aarch64/stack-check-cfa-1.c (dg-options): Add
-fno-stack-protector.
* gcc.target/aarch64/stack-check-cfa-2.c (dg-options): Add
-fno-stack-protector.
* gcc.target/aarch64/test_frame_17.c (dg-options): Add
-fno-stack-protector.

(cherry picked from commit 59a72acbccf4c81a04b4d09760fc8b16992de106)

aarch64: testsuite: disable stack protector for pr104005.c

Storing stack guarding variable need one stp instruction, breaking the
scan-assembler-not pattern in the test. Disable stack protector to
avoid a test failure with --enable-default-ssp.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr104005.c (dg-options): Add
-fno-stack-protector.

(cherry picked from commit 5937cfb981debb2aeb72a1ed255fc3ed5a5835c4)

aarch64: testsuite: disable stack protector for auto-init-7.c

The test scans for "const_int 0" in the RTL dump, but stack protector
can produce more "const_int 0". To avoid a failure with
--enable-default-ssp, disable stack protector for this.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/auto-init-7.c (dg-options): Add
-fno-stack-protector.

(cherry picked from commit 4c59cfc4a4da579d60bfd82404e3ff51c72aca79)

aarch64: testsuite: disable stack protector for pr103147-10 tests

Stack protector influence code generation and cause function body checks
fail.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr103147-10.c (dg-options): Add
-fno-stack-protector.
* g++.target/aarch64/pr103147-10.C: Likewise.

(cherry picked from commit 2fa31207ea602d48b8f69982e0bde1d143e54ecb)

aarch64: testsuite: disable stack protector for sve-pcs tests

If GCC is configured with --enable-default-ssp, the stack protector can
make many sve-pcs tests fail.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp (sve_flags):
Add -fno-stack-protector.

(cherry picked from commit edb336cc575a82cfc2a883e4c3453c36267482c1)

aarch64: testsuite: disable PIE for fuse_adrp_add_1.c [PR70150]

In PIE, symbol "fixed_regs" is addressed via GOT. It will break the
scan-assembler pattern and cause test failure with --enable-default-pie.

gcc/testsuite/ChangeLog:

PR testsuite/70150
* gcc.target/aarch64/fuse_adrp_add_1.c (dg-options): Add
-fno-pie.

(cherry picked from commit 7e8a3dbbb26f66ce8ea60be48962022b5fb2ef55)

aarch64: testsuite: disable PIE for tests with large code model [PR70150]

These tests set large code model with -mcmodel=large or target pragma for
AArch64. But if GCC is configured with --enable-default-pie, it triggers
"sorry: unimplemented: code model large with -fpic". Disable PIE to make
avoid the issue.

gcc/testsuite/ChangeLog:

PR testsuite/70150
* gcc.dg/tls/pr78796.c (dg-additional-options): Add -fno-pie
-no-pie for aarch64-*-*.
* gcc.target/aarch64/pr63304_1.c (dg-options): Add -fno-pie.
* gcc.target/aarch64/pr70120-2.c (dg-options): Add -fno-pie.
* gcc.target/aarch64/pr78733.c (dg-options): Add -fno-pie.
* gcc.target/aarch64/pr79041-2.c (dg-options): Add -fno-pie.
* gcc.target/aarch64/pr94530.c (dg-options): Add -fno-pie.
* gcc.target/aarch64/pr94577.c (dg-options): Add -fno-pie.
* gcc.target/aarch64/reload-valid-spoff.c (dg-options): Add
-fno-pie.

(cherry picked from commit a1ccb4583dfaa267648110aa7da7275acc3000f8)

aarch64: testsuite: disable PIE for aapcs64 tests [PR70150]

If GCC is built with --enable-default-pie, a lot of aapcs64 tests fail
because relocation unsupported in PIE is used.

gcc/testsuite/ChangeLog:

PR testsuite/70150
* gcc.target/aarch64/aapcs64/aapcs64.exp (additional_flags):
Add -fno-pie -no-pie.

(cherry picked from commit f30f04b1fbd4b4e13a7535fad8e698c7b24db9b8)

LoongArch: Avoid non-returning indirect jumps through $ra [PR110136]

Micro-architecture unconditionally treats a "jr $ra" as "return from subroutine",
hence doing "jr $ra" would interfere with both subroutine return prediction and
the more general indirect branch prediction.

Therefore, a problem like PR110136 can cause a significant increase in branch error
prediction rate and affect performance. The same problem exists with "indirect_jump".

gcc/ChangeLog:

PR target/110136
* config/loongarch/loongarch.md: Modify the register constraints for template
"jumptable" and "indirect_jump" from "r" to "e".

Co-authored-by: Andrew Pinski <apinski@marvell.com>
(cherry picked from commit 5430c86e71927492399129f3df80824c6c334ddf)

Daily bump.

middle-end/110200 - genmatch force-leaf and convert interaction

The following fixes code GENERIC generation for (convert! ...)
which currently generates

  if (TREE_TYPE (_o1[0]) != type)
    _r1 = fold_build1_loc (loc, NOP_EXPR, type, _o1[0]);
    if (EXPR_P (_r1))
      goto next_after_fail867;
  else
    _r1 = _o1[0];

where obviously braces are missing.

PR middle-end/110200
* genmatch.cc (expr::gen_transform): Put braces around
the if arm for the (convert ...) short-cut.

(cherry picked from commit 820d1aec89c43dbbc70d3d0b888201878388454c)

Daily bump.

target/109650: Fix wrong code after cc0 -> CCmode transition.

This patch fixes a wrong-code bug in the wake of PR92729, the transition that
turned the AVR backend from cc0 to CCmode.  In cc0, the insn that uses cc0 like
a conditional branch always follows the cc0 setter, which is no more the case
with CCmode where set and use of REG_CC might be in different basic blocks.

This patch removes the machine-dependent reorg pass in avr_reorg entirely.

It is replaced by a new, AVR specific mini-pass that runs prior to split2.
Canonicalization of comparisons away from the "difficult" codes GT[U] and LE[U]
is now mostly performed by implementing TARGET_CANONICALIZE_COMPARISON.

Moreover:

* Text peephole conditions get "dead_or_set_regno_p (*, REG_CC)" as needed.

* RTL peephole conditions get "peep2_regno_dead_p (*, REG_CC)" as needed.

* Conditional branches no more clobber REG_CC.

* insn output for compares looks ahead to determine the branch mode in use.
  This needs also "dead_or_set_regno_p (*, REG_CC)".

* Add RTL peepholes for decrement-and-branch detection.

* Some of the patterns like "*cmphi.zero-extend.0" lost their
  combine-ational part wit PR92729.  Restore them.

Finally, it fixes some of the many indentation glitches left over from PR92729.

gcc/
PR target/109650
PR target/92729
Backport from 2023-05-10 master r14-1688.
* config/avr/avr-passes.def (avr_pass_ifelse): Insert new pass.
* config/avr/avr.cc (avr_pass_ifelse): New RTL pass.
(avr_pass_data_ifelse): New pass_data for it.
(make_avr_pass_ifelse, avr_redundant_compare, avr_cbranch_cost)
(avr_canonicalize_comparison, avr_out_plus_set_ZN)
(avr_out_cmp_ext): New functions.
(compare_condtition): Make sure REG_CC dies in the branch insn.
(avr_rtx_costs_1): Add computation of cbranch costs.
(avr_adjust_insn_length) [ADJUST_LEN_ADD_SET_ZN, ADJUST_LEN_CMP_ZEXT]:
[ADJUST_LEN_CMP_SEXT]Handle them.
(TARGET_CANONICALIZE_COMPARISON): New define.
(avr_simplify_comparison_p, compare_diff_p, avr_compare_pattern)
(avr_reorg_remove_redundant_compare, avr_reorg): Remove functions.
(TARGET_MACHINE_DEPENDENT_REORG): Remove define.
* config/avr/avr-protos.h (avr_simplify_comparison_p): Remove proto.
(make_avr_pass_ifelse, avr_out_plus_set_ZN, cc_reg_rtx)
(avr_out_cmp_zext): New Protos
* config/avr/avr.md (branch, difficult_branch): Don't split insns.
(*cbranchhi.zero-extend.0", *cbranchhi.zero-extend.1")
(*swapped_tst<mode>, *add.for.eqne.<mode>): New insns.
(*cbranch<mode>4): Rename to cbranch<mode>4_insn.
(define_peephole): Add dead_or_set_regno_p(insn,REG_CC) as needed.
(define_deephole2): Add peep2_regno_dead_p(*,REG_CC) as needed.
Add new RTL peepholes for decrement-and-branch and *swapped_tst<mode>.
Rework signtest-and-branch peepholes for *sbrx_branch<mode>.
(adjust_len) [add_set_ZN, cmp_zext]: New.
(QIPSI): New mode iterator.
(ALLs1, ALLs2, ALLs4, ALLs234): New mode iterators.
(gelt): New code iterator.
(gelt_eqne): New code attribute.
(rvbranch, *rvbranch, difficult_rvbranch, *difficult_rvbranch)
(branch_unspec, *negated_tst<mode>, *reversed_tst<mode>)
(*cmpqi_sign_extend): Remove insns.
(define_c_enum "unspec") [UNSPEC_IDENTITY]: Remove.
* config/avr/avr-dimode.md (cbranch<mode>4): Canonicalize comparisons.
* config/avr/predicates.md (scratch_or_d_register_operand): New.
* config/avr/constraints.md (Yxx): New constraint.

gcc/testsuite/
PR target/109650
Backport from 2023-05-10 master r14-1688.
* gcc.target/avr/torture/pr109650-1.c: New test.
* gcc.target/avr/torture/pr109650-2.c: New test.

Daily bump.

rs6000: Remove duplicate expression [PR106907]

PR106907 has few warnings spotted from cppcheck. In that addressing duplicate
expression issue here. Here the same expression is used twice in logical
AND(&&) operation which result in same result so removing that.

2023-06-06 Jeevitha Palanisamy <jeevitha@linux.ibm.com>

gcc/
PR target/106907
* config/rs6000/rs6000.cc (vec_const_128bit_to_bytes): Remove
duplicate expression.

(cherry picked from commit c4deccd44655c5d748dfed200a37f2b678c32fe8)

Darwin, PPC: Fix struct layout with pragma pack [PR110044].

This bug was essentially that darwin_rs6000_special_round_type_align()
was ignoring externally-imposed capping of field alignment.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR target/110044

gcc/ChangeLog:

* config/rs6000/rs6000.cc (darwin_rs6000_special_round_type_align):
Make sure that we do not have a cap on field alignment before altering
the struct layout based on the type alignment of the first entry.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/darwin-abi-13-0.c: New test.
* gcc.target/powerpc/darwin-abi-13-1.c: New test.
* gcc.target/powerpc/darwin-abi-13-2.c: New test.
* gcc.target/powerpc/darwin-structs-0.h: New test.

(cherry picked from commit 84d080a29a780973bef47171ba708ae2f7b4ee47)

fortran: Fix ICE on pr96024.f90 on big-endian hosts [PR96024]

The pr96024.f90 testcase ICEs on big-endian hosts. The problem is
that length->val.integer is accessed after checking
length->expr_type == EXPR_CONSTANT, but it is a CHARACTER constant
which uses length->val.character union member instead and on big-endian
we end up reading constant 0x100000000 rather than some small number
on little-endian and if target doesn't have enough memory for 4 times
that (i.e. 16GB allocation), it ICEs.

2023-06-09 Jakub Jelinek <jakub@redhat.com>

PR fortran/96024
* primary.cc (gfc_convert_to_structure_constructor): Only do
constant string ctor length verification and truncation/padding
if constant length has INTEGER type.

(cherry picked from commit 4cf6e322adc19f927859e0a5edfa93cec4b8c844)

Explicitly view_convert_expr mask to signed type when folding pblendvb builtins.

Since mask < 0 will be always false for vector char when
-funsigned-char, but vpblendvb needs to check the most significant
bit. The patch explicitly VCE to vector signed char.

gcc/ChangeLog:

PR target/110108
* config/i386/i386.cc (ix86_gimple_fold_builtin): Explicitly
view_convert_expr mask to signed type when folding pblendvb
builtins.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110108-2.c: New test.

Daily bump.

arm: Fix ICE due to infinite splitting [PR109800]

In r11-966-g9a182ef9ee011935d827ab5c6c9a7cd8e22257d8 we introduce a
simplification to emit_move_insn that attempts to simplify moves of the form:

(set (subreg:M1 (reg:M2 ...)) (constant C))

where M1 and M2 are of equal mode size. That is problematic for the splitter
vfp.md:no_literal_pool_df_immediate in the arm backend, which tries to pun an
lvalue DFmode pseudo into DImode and assign a constant to it with
emit_move_insn, as the new transformation simply undoes this, and we end up
splitting indefinitely.

This patch changes things around in the arm backend so that we use a
DImode temporary (instead of DFmode) and first load the DImode constant
into the pseudo, and then pun the pseudo into DFmode as an rvalue in a
reg -> reg move. I believe this should be semantically equivalent but
avoids the pathalogical behaviour seen in the PR.

gcc/ChangeLog:

PR target/109800
* config/arm/arm.md (movdf): Generate temporary pseudo in DImode
instead of DFmode.
* config/arm/vfp.md (no_literal_pool_df_immediate): Rather than punning an
lvalue DFmode pseudo into DImode, use a DImode pseudo and pun it into
DFmode as an rvalue.

gcc/testsuite/ChangeLog:

PR target/109800
* gcc.target/arm/pure-code/pr109800.c: New test.

(cherry picked from commit f5298d9969b4fa34ff3aecd54b9630e22b2984a5)

arm: PR target/109939 Correct signedness of return type of __ssat intrinsics

As the PR says we shouldn't be using qualifier_unsigned for the return type of the __ssat intrinsics.
UNSIGNED_SAT_BINOP_UNSIGNED_IMM_QUALIFIERS already exists for that.
This was just a thinko.
This patch fixes this and the warning with -Wconversion goes away.

Bootstrapped and tested on arm-none-linux-gnueabihf.

gcc/ChangeLog:

PR target/109939
* config/arm/arm-builtins.cc (SAT_BINOP_UNSIGNED_IMM_QUALIFIERS): Use
qualifier_none for the return operand.

gcc/testsuite/ChangeLog:

PR target/109939
* gcc.target/arm/pr109939.c: New test.

(cherry picked from commit 95542a6ec4b350c653b793b7c36a8210b0e9a89d)

Daily bump.

d: Merge upstream dmd 316b89f1e3, phobos 8e8aaae50.

Updates D language version to v2.100.2.

Phobos changes:

    - Fix instantiating std.container.array.Array!T where T is a
      shared class.
    - Fix calling toString on a const std.typecons.Nullable type.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 316b89f1e3.
* dmd/VERSION: Bump version to v2.100.2.

libphobos/ChangeLog:

* src/MERGE: Merge upstream phobos 8e8aaae50.

Daily bump.

Fortran: fix diagnostics for SELECT RANK [PR100607]

gcc/fortran/ChangeLog:

PR fortran/100607
* resolve.cc (resolve_select_rank): Remove duplicate error.
(resolve_fl_var_and_proc): Prevent NULL pointer dereference and
suppress error message for temporary.

gcc/testsuite/ChangeLog:

PR fortran/100607
* gfortran.dg/select_rank_6.f90: New test.

(cherry picked from commit fae09dfc0e6bf4cfe35d817558827aea78c6426f)

Daily bump.

target/110088: Improve operation of l-reg with const after move from d-reg.

After reload, there may be sequences like
   lreg = dreg
   lreg = lreg <op> const
with an LD_REGS dreg, non-LD_REGS lreg, and <op> in PLUS, IOR, AND.
If dreg dies after the first insn, it is possible to use
   dreg = dreg <op> const
   lreg = dreg
instead which is more efficient.

gcc/
PR target/110088
* config/avr/avr.md: Add an RTL peephole to optimize operations on
non-LD_REGS after a move from LD_REGS.
(piaop): New code iterator.

Daily bump.

doc: Fix description of x86 -m32 option [PR109954]

This option does not imply -march=i386 so it's incorrect to say it
generates code that will run on "any i386 system".

gcc/ChangeLog:

PR target/109954
* doc/invoke.texi (x86 Options): Fix description of -m32 option.

(cherry picked from commit eeb92704967875411416b0b9508aa6f49e8192fd)

Daily bump.

[libstdc++] [testsuite] xfail double-prec from_chars for x86_64 ldbl

When long double is wider than double, but from_chars is implemented
in terms of double, tests that involve the full precision of long
double are expected to fail. Mark them as such on x86_64-*-vxworks*.

for libstdc++-v3/ChangeLog

* testsuite/20_util/from_chars/4.cc: Skip long double test06
on x86_64-vxworks.
* testsuite/20_util/to_chars/long_double.cc: Xfail run on
x86_64-vxworks.

(cherry picked from commit 282e4e745981c5c6e3edaae315e1f499a45402df)

[libstdc++] [testsuite] xfail to_chars/long_double on x86-vxworks

Just as on aarch64, x86's wider long double experiences loss of
precision with from_chars implemented in terms of double. Expect the
execution fail.

for libstdc++-v3/ChangeLog

* testsuite/20_util/to_chars/long_double.cc: Expect execution
fail on x86-vxworks.

(cherry picked from commit 7daa166fe89fca4ff1baa063c00a5a690f7e462f)

[libstdc++] [testsuite] xfail double-prec from_chars for ldbl

When long double is wider than double, but from_chars is implemented
in terms of double, tests that involve the full precision of long
double are expected to fail. Mark them as such on aarch64-*-vxworks.

for libstdc++-v3/ChangeLog

* testsuite/20_util/from_chars/4.cc: Skip long double test06
on aarch64-vxworks.
* testsuite/20_util/to_chars/long_double.cc: Xfail run on
aarch64-vxworks.

(cherry picked from commit e383fc69d2a3eab37319ea41543ee09c8cdd6e57)

libstdc++: Correct NTTP and simd_mask ctor call

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

PR libstdc++/109822
* include/experimental/bits/simd.h (to_native): Use int NTTP
as specified in PTS2.
(to_compatible): Likewise. Add missing tag to call mask
generator ctor.
* testsuite/experimental/simd/pr109822_cast_functions.cc: New
test.

(cherry picked from commit 668d43502f465d48adbc1fe2956b979f36657e5f)

libstdc++: Simplify calculation of expected value in simd test

This avoids a failure on PR109964.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

* testsuite/experimental/simd/tests/integer_operators.cc:
Compute expected value differently to avoid getting turned into
a vector shift.

(cherry picked from commit 3e2689e568425f14d6728504ad6f5d32b90320ad)

libstdc++: Fix test assumptions on long and long double

Expect that long might not fit into the long double mantissa bits.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

* testsuite/experimental/simd/tests/operator_cvt.cc: Make long
double <-> (u)long conversion tests conditional on sizeof(long
double) and sizeof(long).

(cherry picked from commit 291549d43e823f163fa9961e42a751b5ce0d57fb)

libstdc++: Resolve -Wsign-compare issue

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

* include/experimental/bits/simd_ppc.h (_S_bit_shift_left):
Negative __y is UB, so prefer signed compare.

(cherry picked from commit 1a1abec1d618cde709c585fcce89330bb33b07ac)

testsuite: make mve_intrinsic_type_overloads-int.c libc-agnostic

Glibc defines int32_t as 'int' while newlib defines it as 'long int'.

Although these correspond to the same size, g++ complains when using the
'wrong' version:
  invalid conversion from 'long int*' to 'int32_t*' {aka 'int*'} [-fpermissive]
or
  invalid conversion from 'int*' to 'int32_t*' {aka 'long int*'} [-fpermissive]

when calling vst1q(int32*, int32x4_t) with a first parameter of type
'long int *' (resp. 'int *')

To make this test pass with any type of toolchain, this patch defines
'word_type' according to which libc is in use.

2023-05-23  Christophe Lyon  <christophe.lyon@linaro.org>

gcc/testsuite/
* gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c:
Support both definitions of int32_t.

(cherry picked from commit d12d2aa4fccc76a9a08c8120c5e37d9cab8683e8)

riscv: update riscv_asan_shadow_offset

gcc/
PR target/110036
* config/riscv/riscv.cc (riscv_asan_shadow_offset): Update to
match libsanitizer.

Daily bump.

target/104327: Allow more inlining between different optimization levels.

avr-common.cc introduces the following options that are set depending
on optimization level: -mgas-isr-prologues, -mmain-is-OS-task and
-fsplit-wide-types-early. The inliner thinks that different options
disallow cross-optimization inlining, so provide can_inline_p.

gcc/
PR target/104327
* config/avr/avr.cc (avr_can_inline_p): New static function.
(TARGET_CAN_INLINE_P): Define to that function.

target/82931: Make a pattern more generic to match more bit-transfers.

There is already a pattern in avr.md that matches single-bit transfers
from one register to another one, but it only handled bit 0 of 8-bit
registers. This change makes that pattern more generic so it matches
more of similar single-bit transfers.

gcc/
PR target/82931
* config/avr/avr.md (*movbitqi.0): Rename to *movbit<mode>.0-6.
Handle any bit position and use mode QISI.
* config/avr/avr.cc (avr_rtx_costs_1) [IOR]: Return a cost
of 2 insns for bit-transfer of respective style.

gcc/testsuite/
PR target/82931
* gcc.target/avr/pr82931.c: New test.

Daily bump.

libstdc++: Fix type of first argument to vec_cntm call

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

PR libstdc++/109949
* include/experimental/bits/simd.h (__intrinsic_type): If
__ALTIVEC__ is defined, map gnu::vector_size types to their
corresponding __vector T types without losing unsignedness of
integer types. Also prefer long long over long.
* include/experimental/bits/simd_ppc.h (_S_popcount): Cast mask
object to the expected unsigned vector type.

(cherry picked from commit efd2b55d8562c6e80cb7ee8b9b1f9418f0c00cd9)

libstdc++: Fix SFINAE for __is_intrinsic_type on ARM

On ARM NEON doesn't support double, so __is_intrinsic_type_v<double,
whatever> should say false (instead of being ill-formed).

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

PR libstdc++/109261
* include/experimental/bits/simd.h (__intrinsic_type):
Specialize __intrinsic_type<double, 8> and
__intrinsic_type<double, 16> in any case, but provide the member
type only with __aarch64__.

(cherry picked from commit aa8b363171a95b8f867a74f29c75f9577e9087e1)

libstdc++: Add missing constexpr to simd_neon

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

PR libstdc++/109261
* include/experimental/bits/simd_neon.h (_S_reduce): Add
constexpr and make NEON implementation conditional on
not __builtin_is_constant_evaluated.

(cherry picked from commit b0a483b0a011f9cbc8b25053eae809c77dae2a12)

Daily bump.

Improve cost computation for single-bit bit insertions.

Some miscomputation of rtx_costs lead to sub-optimal code for
single-bit bit insertions. This patch implements TARGET_INSN_COST,
which has a chance to see the whole insn during insn combination;
in particular the SET_DEST of (set (zero_extract (...) ...)).

gcc/
* config/avr/avr.cc (avr_insn_cost): New static function.
(TARGET_INSN_COST): Define to that function.

libstdc++: Add missing constexpr to simd

The constexpr API is only available with -std=gnu++XX (and proposed for
C++26). The proposal is to have the complete simd API usable in constant
expressions.

This patch resolves several issues with using simd in constant
expressions.

Issues why constant_evaluated branches are necessary:
* subscripting vector builtins is not allowed in constant expressions
* if the implementation needs/uses memcpy
* if the implementation would otherwise call SIMD intrinsics/builtins

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

PR libstdc++/109261
* include/experimental/bits/simd.h (_SimdWrapper::_M_set):
Avoid vector builtin subscripting in constant expressions.
(resizing_simd_cast): Avoid memcpy if constant_evaluated.
(const_where_expression, where_expression, where)
(__extract_part, simd_mask, _SimdIntOperators, simd): Add either
_GLIBCXX_SIMD_CONSTEXPR (on public APIs), or constexpr (on
internal APIs).
* include/experimental/bits/simd_builtin.h (__vector_permute)
(__vector_shuffle, __extract_part, _GnuTraits::_SimdCastType1)
(_GnuTraits::_SimdCastType2, _SimdImplBuiltin)
(_MaskImplBuiltin::_S_store): Add constexpr.
(_CommonImplBuiltin::_S_store_bool_array)
(_SimdImplBuiltin::_S_load, _SimdImplBuiltin::_S_store)
(_SimdImplBuiltin::_S_reduce, _MaskImplBuiltin::_S_load): Add
constant_evaluated case.
* include/experimental/bits/simd_fixed_size.h
(_S_masked_load): Reword comment.
(__tuple_element_meta, __make_meta, _SimdTuple::_M_apply_r)
(_SimdTuple::_M_subscript_read, _SimdTuple::_M_subscript_write)
(__make_simd_tuple, __optimize_simd_tuple, __extract_part)
(__autocvt_to_simd, _Fixed::__traits::_SimdBase)
(_Fixed::__traits::_SimdCastType, _SimdImplFixedSize): Add
constexpr.
(_SimdTuple::operator[], _M_set): Add constexpr and add
constant_evaluated case.
(_MaskImplFixedSize::_S_load): Add constant_evaluated case.
* include/experimental/bits/simd_scalar.h: Add constexpr.

* include/experimental/bits/simd_x86.h (_CommonImplX86): Add
constexpr and add constant_evaluated case.
(_SimdImplX86::_S_equal_to, _S_not_equal_to, _S_less)
(_S_less_equal): Value-initialize to satisfy constexpr
evaluation.
(_MaskImplX86::_S_load): Add constant_evaluated case.
(_MaskImplX86::_S_store): Add constexpr and constant_evaluated
case. Value-initialize local variables.
(_MaskImplX86::_S_logical_and, _S_logical_or, _S_bit_not)
(_S_bit_and, _S_bit_or, _S_bit_xor): Add constant_evaluated
case.
* testsuite/experimental/simd/pr109261_constexpr_simd.cc: New
test.

(cherry picked from commit da579188807ede4ee9466d0b5bf51559c96a0b51)