git.ipfire.org Git - thirdparty/gcc.git/log

Document/verify another aspect of OpenACC 'async' semantics in 'libgomp.oacc-c-c++-common/data-3.c'

... that I almost broke with later implementation changes.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/data-3.c: Document/verify
another aspect of OpenACC 'async' semantics.

(cherry picked from commit 442d51a20ef13a8e6c080ca30bc37fc93b6bfac4)

Fix OpenACC/GCN 'acc_ev_enqueue_launch_end' position

For an OpenACC compute construct, we've currently got:

  - [...]
  - acc_ev_enqueue_launch_start
  - launch kernel
  - free memory
  - acc_ev_free
  - acc_ev_enqueue_launch_end

This confused another thing that I'm working on, so I adjusted that to:

  - [...]
  - acc_ev_enqueue_launch_start
  - launch kernel
  - acc_ev_enqueue_launch_end
  - free memory
  - acc_ev_free

Correspondingly, verify 'acc_ev_alloc', 'acc_ev_free' in
'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'.

libgomp/
* plugin/plugin-gcn.c (gcn_exec): Fix 'acc_ev_enqueue_launch_end'
position.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c:
Verify 'acc_ev_alloc', 'acc_ev_free'.

(cherry picked from commit 649f1939baf11f45fd3579b8b9601c7840a097b3)

libgomp: Merge 'gomp_map_vars_openacc' into 'goacc_map_vars' [PR76739]

Upstream has 'goacc_map_vars'; merge the new 'gomp_map_vars_openacc' into it.
(Maybe the latter didn't exist yet when the former was originally added?)
No functional change.

Clean-up for og12 commit 15d0f61a7fecdc8fd12857c40879ea3730f6d99f
"Merge non-contiguous array support patches".

PR other/76739
libgomp/
* libgomp.h (goacc_map_vars): Add 'struct goacc_ncarray_info *'
formal parameter.
(gomp_map_vars_openacc): Remove.
* target.c (goacc_map_vars): Adjust.
(gomp_map_vars_openacc): Remove.
* oacc-mem.c (acc_map_data, goacc_enter_datum)
(goacc_enter_data_internal): Adjust.
* oacc-parallel.c (GOACC_parallel_keyed, GOACC_data_start):
Adjust.

Revert "OpenACC profiling-interface fixes for asynchronous operations"

There is occasional execution failure; these changes need to be reviewed.

This reverts og12 commit 719f93c8618a134f90b5b661ab70c918d659ad05.

libgomp/
* oacc-host.c: Revert
"OpenACC profiling-interface fixes for asynchronous operations"
changes.
* oacc-mem.c: Likewise.
* oacc-parallel.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c:
Likewise.

Revert "Revert changes to acc_prof-init-1.c and acc_prof-parallel-1.c"

... as a prerequisite for reverting
"OpenACC profiling-interface fixes for asynchronous operations".

This reverts og12 commit b845d2f62e7da1c4cfdfee99690de94b648d076d.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c: Revert
"Revert changes to acc_prof-init-1.c and acc_prof-parallel-1.c"
changes.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c:
Likewise.

amdgcn: Add instruction patterns for conditional min/max operations

gcc/ChangeLog:

* config/gcn/gcn-valu.md (<expander><mode>3_exec): Add patterns for
{s|u}{max|min} in QI, HI and DI modes.
(<expander><mode>3): Add pattern for {s|u}{max|min} in DI mode.
(cond_<fexpander><mode>): Add pattern for cond_f{max|min}.
(cond_<expander><mode>): Add pattern for cond_{s|u}{max|min}.
* config/gcn/gcn.cc (gcn_spill_class): Allow the exec register to be
saved in SGPRs.

gcc/testsuite/ChangeLog:

* gcc.target/gcn/cond_fmaxnm_1.c: New test.
* gcc.target/gcn/cond_fmaxnm_1_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_2.c: New test.
* gcc.target/gcn/cond_fmaxnm_2_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_3.c: New test.
* gcc.target/gcn/cond_fmaxnm_3_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_4.c: New test.
* gcc.target/gcn/cond_fmaxnm_4_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_5.c: New test.
* gcc.target/gcn/cond_fmaxnm_5_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_6.c: New test.
* gcc.target/gcn/cond_fmaxnm_6_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_7.c: New test.
* gcc.target/gcn/cond_fmaxnm_7_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_8.c: New test.
* gcc.target/gcn/cond_fmaxnm_8_run.c: New test.
* gcc.target/gcn/cond_fminnm_1.c: New test.
* gcc.target/gcn/cond_fminnm_1_run.c: New test.
* gcc.target/gcn/cond_fminnm_2.c: New test.
* gcc.target/gcn/cond_fminnm_2_run.c: New test.
* gcc.target/gcn/cond_fminnm_3.c: New test.
* gcc.target/gcn/cond_fminnm_3_run.c: New test.
* gcc.target/gcn/cond_fminnm_4.c: New test.
* gcc.target/gcn/cond_fminnm_4_run.c: New test.
* gcc.target/gcn/cond_fminnm_5.c: New test.
* gcc.target/gcn/cond_fminnm_5_run.c: New test.
* gcc.target/gcn/cond_fminnm_6.c: New test.
* gcc.target/gcn/cond_fminnm_6_run.c: New test.
* gcc.target/gcn/cond_fminnm_7.c: New test.
* gcc.target/gcn/cond_fminnm_7_run.c: New test.
* gcc.target/gcn/cond_fminnm_8.c: New test.
* gcc.target/gcn/cond_fminnm_8_run.c: New test.
* gcc.target/gcn/cond_smax_1.c: New test.
* gcc.target/gcn/cond_smax_1_run.c: New test.
* gcc.target/gcn/cond_smin_1.c: New test.
* gcc.target/gcn/cond_smin_1_run.c: New test.
* gcc.target/gcn/cond_umax_1.c: New test.
* gcc.target/gcn/cond_umax_1_run.c: New test.
* gcc.target/gcn/cond_umin_1.c: New test.
* gcc.target/gcn/cond_umin_1_run.c: New test.
* gcc.target/gcn/smax_1.c: New test.
* gcc.target/gcn/smax_1_run.c: New test.
* gcc.target/gcn/smin_1.c: New test.
* gcc.target/gcn/smin_1_run.c: New test.
* gcc.target/gcn/umax_1.c: New test.
* gcc.target/gcn/umax_1_run.c: New test.
* gcc.target/gcn/umin_1.c: New test.
* gcc.target/gcn/umin_1_run.c: New test.

(cherry picked from commit 553ff2524f412be4e02e2ffb1a0a3dc3e2280742)

Merge branch 'releases/gcc-12' into devel/omp/gcc-12

Merge up to r12-9210-gb3f9d2cf7dd5488800f867a6aae076465ecb391b (2nd Mar 2023)

Daily bump.

OpenMP/Fortran: Fix handling of optional is_device_ptr + bind(C) [PR108546]

For is_device_ptr, optional checks should only be done before calling
libgomp, afterwards they are NULL either because of absent or, by
chance, because it is unallocated or unassociated (for pointers/allocatables).

Additionally, it fixes an issue with explicit mapping for 'type(c_ptr)'.

PR middle-end/108546

gcc/fortran/ChangeLog:

* trans-openmp.cc (gfc_trans_omp_clauses): Fix mapping of
type(C_ptr) variables.

gcc/ChangeLog:

* omp-low.cc (lower_omp_target): Remove optional handling
on the receiver side, i.e. inside target (data), for
use_device_ptr.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/is_device_ptr-3.f90: New test.
* testsuite/libgomp.fortran/use_device_ptr-optional-4.f90: New test.

(cherry picked from commit 96ff97ff6574666a5509ae9fa596e7f2b6ad4f88)

Daily bump.

Merge branch 'releases/gcc-12' into devel/omp/gcc-12

Merge up to r12-9207-gb8e496d132ec087c9db5951fea23551dcc831d8c (27th Feb 2023)

Update dg-dump-scan for "Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings"

Follow-up to commit 55a18d4744258e3909568e425f9f473c49f9d13f
"Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings"
updating the dumps.
* For the goacc testcase, 'to' changed to 'release' and due to 'finally' then
  to 'delete', which can be regarded as bugfix.
* For pr78260-2.f90, the calculation moved inside the 'if(...->data == NULL)'
  block to handle deferred-string length vars better, esp. when 'optional'.

gcc/testsuite/:
        * gfortran.dg/goacc/finalize-1.f: Update scan-tree-dump-times for
        mapping changes.
        * gfortran.dg/gomp/pr78260-2.f90: Likewise.

asan: adjust module name for global variables

As mentioned in the PR, when we use LTO, we wrongly use ltrans output
file name as a module name of a global variable. That leads to a
non-reproducible output.

After the suggested change, we emit context name of normal global
variables. And for artificial variables (like .Lubsan_data3), we use
aux_base_name (e.g. "./a.ltrans0.ltrans").

PR sanitizer/108834

gcc/ChangeLog:

* asan.cc (asan_add_global): Use proper TU name for normal
global variables (and aux_base_name for the artificial one).

gcc/testsuite/ChangeLog:

* c-c++-common/asan/global-overflow-1.c: Test line and column
info for a global variable.

(cherry picked from commit 94c9b1bb79f63d000ebb05efc155c149325e332d)

rs6000/test: Adjust some test cases on partial vector [PR96373]

As Richard pointed out in [1] and the testing on Power10, the
proposed fix for PR96373 requires some updates on a few rs6000
test cases which adopt partial vector. This patch is to fix
all of them with one extra option "-fno-trapping-math" as
Richard suggested.

Besides, the original test case also failed on Power10 without
Richard's proposed fix, this patch adds it together for a bit
better testing coverage.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610728.html

PR target/96373

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/p9-vec-length-epil-1.c: Add -fno-trapping-math.
* gcc.target/powerpc/p9-vec-length-epil-2.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-3.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-4.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-5.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-6.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-8.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-1.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-2.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-3.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-4.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-5.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-6.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-8.c: Likewise.
* gcc.target/powerpc/pr96373.c: New test.

(cherry picked from commit 4f5a1198065dc078f8099db628da7b06a2666f34)

Daily bump.

RTEMS: Tune multilib selection

gcc/ChangeLog:

* config/riscv/t-rtems: Keep only -mcmodel=medany 64-bit multilibs.
Add non-compact 32-bit multilibs.

(cherry picked from commit 35a067020e41d97bc3be15b518b3dc2a64b4aae2)

Daily bump.

tree-optimization/108888 - call if-conversion

The following makes sure to only predicate calls necessary.

PR tree-optimization/108888
* tree-if-conv.cc (if_convertible_stmt_p): Set PLF_2 on
calls to predicate.
(predicate_statements): Only predicate calls with PLF_2.

* g++.dg/torture/pr108888.C: New testcase.

(cherry picked from commit 31cc5821223a096ef61743bff520f4a0dbba5872)

vect: inbranch SIMD clones

There has been support for generating "inbranch" SIMD clones for a long time,
but nothing actually uses them (as far as I can see).

This patch add supports for a sub-set of possible cases (those using
mask_mode == VOIDmode). The other cases fail to vectorize, just as before,
so there should be no regressions.

The sub-set of support should cover all cases needed by amdgcn, at present.

gcc/ChangeLog:

* internal-fn.cc (expand_MASK_CALL): New.
* internal-fn.def (MASK_CALL): New.
* internal-fn.h (expand_MASK_CALL): New prototype.
* omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type
for mask arguments also.
* tree-if-conv.cc: Include cgraph.h.
(if_convertible_stmt_p): Do if conversions for calls to SIMD calls.
(predicate_statements): Convert functions to IFN_MASK_CALL.
* tree-vect-loop.cc (vect_get_datarefs_in_loop): Recognise
IFN_MASK_CALL as a SIMD function call.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Handle
IFN_MASK_CALL as an inbranch SIMD function call.
Generate the mask vector arguments.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-simd-clone-16.c: New test.
* gcc.dg/vect/vect-simd-clone-16b.c: New test.
* gcc.dg/vect/vect-simd-clone-16c.c: New test.
* gcc.dg/vect/vect-simd-clone-16d.c: New test.
* gcc.dg/vect/vect-simd-clone-16e.c: New test.
* gcc.dg/vect/vect-simd-clone-16f.c: New test.
* gcc.dg/vect/vect-simd-clone-17.c: New test.
* gcc.dg/vect/vect-simd-clone-17b.c: New test.
* gcc.dg/vect/vect-simd-clone-17c.c: New test.
* gcc.dg/vect/vect-simd-clone-17d.c: New test.
* gcc.dg/vect/vect-simd-clone-17e.c: New test.
* gcc.dg/vect/vect-simd-clone-17f.c: New test.
* gcc.dg/vect/vect-simd-clone-18.c: New test.
* gcc.dg/vect/vect-simd-clone-18b.c: New test.
* gcc.dg/vect/vect-simd-clone-18c.c: New test.
* gcc.dg/vect/vect-simd-clone-18d.c: New test.
* gcc.dg/vect/vect-simd-clone-18e.c: New test.
* gcc.dg/vect/vect-simd-clone-18f.c: New test.

(cherry picked from commit 3da77f217c8b2089ecba3eb201e727c3fcdcd19d)

libstdc++: Simplify three helper functions into one

Broadcast is a very common function. This should reduce compile-time
effort.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

PR libstdc++/108030
* include/experimental/bits/simd.h (__vector_broadcast):
Implement via __vector_broadcast_impl instead of
__call_with_n_evaluations + 2 lambdas.
(__vector_broadcast_impl): New.

(cherry picked from commit 2e29e2fbeb8936e5c85cefaf547cba42e17e137b)

libstdc++: Fix -Wsign-compare issue

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

* include/experimental/bits/simd_builtin.h (_S_set): Compare as
int. The actual range of these indexes is very small.

(cherry picked from commit ffa39f7120f6e83a567d7a83ff4437f6b41036ea)

libstdc++: Add missing constexpr on simd shift implementation

Resolves -Wtautological-compare warnings about `if
(__builtin_is_constant_evaluated())` in the implementations of these
functions.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

* include/experimental/bits/simd_x86.h (_S_bit_shift_left)
(_S_bit_shift_right): Declare constexpr. The implementation was
already expecting constexpr evaluation.

(cherry picked from commit fa37ac2b59ed1c379b35dbf9bd58f7849f9fd5b5)

libgomp: no need to attach USM pointers

Fix a bug in which Fortran pointers inside derived types caused a runtime
error when Unified Shared Memory was active.

libgomp/ChangeLog:

* target.c (gomp_attach_pointer): Check for USM.
* testsuite/libgomp.fortran/usm-3.f90: New test.

libstdc++: Fix uses of non-reserved names in simd header

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

* include/experimental/bits/simd.h (__extract_part, split):
Use reserved name for template parameter.

(cherry picked from commit bb920f561e983c64d146f173dc4ebc098441a962)

Daily bump.

Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings

Previously, array descriptors might have been mapped as 'alloc'
instead of 'to' for 'alloc', not updating the array bounds. The
'alloc' could also appear for 'data exit', failing with a libgomp
assert. In some cases, either array descriptors or deferred-length
string's length variable was not mapped. And, finally, some offset
calculations with array-sections mappings went wrong.

The testcases contain some comment-out tests which require follow-up
work and for which PR exist. Those mostly relate to deferred-length
strings which have several issues beyong OpenMP support.

This is the OG12 variant of the submitted but unreviewed GCC 13/mainline
patch at https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612387.html

gcc/fortran/ChangeLog:

* trans-decl.cc (gfc_get_symbol_decl): Add attributes
such as 'declare target' also to hidden artificial
variable for deferred-length character variables.
* trans-openmp.cc (gfc_trans_omp_array_section,
gfc_trans_omp_clauses, gfc_trans_omp_target_exit_data):
Improve mapping of array descriptors and deferred-length
string variables.

gcc/ChangeLog:

* gimplify.cc (gimplify_scan_omp_clauses): Remove Fortran
special case.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/target-enter-data-3.f90: Uncomment
'target exit data'.
* testsuite/libgomp.fortran/target-enter-data-4.f90: New test.
* testsuite/libgomp.fortran/target-enter-data-5.f90: New test.
* testsuite/libgomp.fortran/target-enter-data-6.f90: New test.
* testsuite/libgomp.fortran/target-enter-data-7.f90: New test.

Fix: Fortran/OpenMP: align/allocator modifiers to the allocate clause

When merging r13-4584-gb2e1c49b4a4 to OG12 as commit 58e0579ed87,
the 'align' handling seemingly ended up in the wrong clause.
(Result: libgomp.fortran/allocate-2a.f90 FAILED; now fixed.)

gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_clauses): Move align modifier
handling from OMP_LIST_ALLOCATOR to OMP_LIST_ALLOCATE.

Daily bump.

libstdc++: Add noexcept-specifier to std::reference_wrapper::operator()

This isn't required by the standard, but there's an LWG issue suggesting
to add it.

Also use __invoke_result instead of result_of, to match the spec in
recent standards.

libstdc++-v3/ChangeLog:

* include/bits/refwrap.h (reference_wrapper::operator()): Add
noexcept-specifier and use __invoke_result instead of result_of.
* testsuite/20_util/reference_wrapper/invoke-noexcept.cc: New test.

(cherry picked from commit e47df5eb56c4e7aca0d3e50826e5aaa1887fa446)

libstdc++: Fix std::filesystem errors with -fkeep-inline-functions [PR108636]

With -fkeep-inline-functions there are linker errors when including
<filesystem>. This happens because there are some filesystem::path
constructors defined inline which call non-exported functions defined in
the library. That's usually not a problem, because those constructors
are only called by code that's also inside the library. But when the
header is compiled with -fkeep-inline-functions those inline functions
are emitted even though they aren't called. That then creates an
undefined reference to the other library internsl. The fix is to just
move the private constructors into the library where they are called.
That way they are never even seen by users, and so not compiled even if
-fkeep-inline-functions is used.

libstdc++-v3/ChangeLog:

PR libstdc++/108636
* include/bits/fs_path.h (path::path(string_view, _Type))
(path::_Cmpt::_Cmpt(string_view, _Type, size_t)): Move inline
definitions to ...
* src/c++17/fs_path.cc: ... here.
* testsuite/27_io/filesystem/path/108636.cc: New test.

(cherry picked from commit db8d6fc572ec316ccfcf70b1dffe3be0b1b37212)

Daily bump.

c++: ICE with redundant capture [PR108829]

Here we crash in is_capture_proxy:

  /* Location wrappers should be stripped or otherwise handled by the
     caller before using this predicate.  */
  gcc_checking_assert (!location_wrapper_p (decl));

We only crash with the redundant capture:

  int abyPage = [=, abyPage] { ... }

because prune_lambda_captures is only called when there was a default
capture, and with [=] only abyPage won't be in LAMBDA_EXPR_CAPTURE_LIST.

The problem is that LAMBDA_CAPTURE_EXPLICIT_P wasn't propagated
correctly and so var_to_maybe_prune proceeded where it shouldn't.

Co-Authored by: Patrick Palka <ppalka@redhat.com>

PR c++/108829

gcc/cp/ChangeLog:

* pt.cc (prepend_one_capture): Set LAMBDA_CAPTURE_EXPLICIT_P.
(tsubst_lambda_expr): Pass LAMBDA_CAPTURE_EXPLICIT_P to
prepend_one_capture.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-108829-2.C: New test.
* g++.dg/cpp0x/lambda/lambda-108829.C: New test.

(cherry picked from commit 02d8ab3e4e2f3d9dc12157a98c976d6698e71e29)

Prototype 'GOMP_enable_pinned_mode'

Fix-up for og12 commit 842df187487f5b16ae29bbe7e9acd79661a9df48
"openmp: -foffload-memory=pinned". No functional change.

libgomp/
* libgomp_g.h (GOMP_enable_pinned_mode): New.

Attempt to not just register but allocate OpenMP pinned memory using a device: ChangeLog

... forgotten in og12 commit 4bd844f3e0202b3d083f0784f4343570c88bb86c
"Attempt to not just register but allocate OpenMP pinned memory using a device".

Attempt to not just register but allocate OpenMP pinned memory using a device

... instead of 'mmap' plus attempting to register using a device.

Implemented for nvptx offloading via 'cuMemHostAlloc'.

This re-works og12 commit a5a4800e92773da7126c00a9c79b172494d58ab5
"Attempt to register OpenMP pinned memory using a device instead of 'mlock'".

include/
* cuda/cuda.h (cuMemHostRegister, cuMemHostUnregister): Remove.
libgomp/
* config/linux/allocator.c (linux_memspace_alloc): Add 'init0'
formal parameter.  Adjust all users.
(linux_memspace_alloc, linux_memspace_free): Attempt to allocate
OpenMP pinned memory using a device instead of 'mmap' plus
attempting to register using a device.
* libgomp-plugin.h (GOMP_OFFLOAD_register_page_locked)
(GOMP_OFFLOAD_unregister_page_locked): Remove.
(GOMP_OFFLOAD_page_locked_host_alloc)
(GOMP_OFFLOAD_page_locked_host_free): New.
* libgomp.h (gomp_register_page_locked)
(gomp_unregister_page_locked): Remove.
(gomp_page_locked_host_alloc, gomp_page_locked_host_free): New.
(struct gomp_device_descr): Remove 'register_page_locked_func',
'unregister_page_locked_func'.  Add 'page_locked_host_alloc_func',
'page_locked_host_free_func'.
* plugin/cuda-lib.def (cuMemHostRegister_v2, cuMemHostRegister)
(cuMemHostUnregister): Remove.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_register_page_locked)
(GOMP_OFFLOAD_unregister_page_locked): Remove.
(GOMP_OFFLOAD_page_locked_host_alloc)
(GOMP_OFFLOAD_page_locked_host_free): New.
* target.c (gomp_register_page_locked)
(gomp_unregister_page_locked): Remove.
(gomp_page_locked_host_alloc, gomp_page_locked_host_free): Add.
(gomp_load_plugin_for_device): Don't handle
'register_page_locked', 'unregister_page_locked'.  Handle
'page_locked_host_alloc', 'page_locked_host_free'.

Suggested-by: Andrew Stubbs <ams@codesourcery.com>

aarch64: Fix up bfmlal lane pattern [PR104921]

As the testcase shows, this pattern had an incorrect constraint leading
to GCC's output getting rejected by the assembler.

This patch fixes the constraint accordingly.

The test is split into two: one that can run without bf16 support from
the assembler and another that checks that the output actually assembles
when such support is available.

gcc/ChangeLog:

PR target/104921
* config/aarch64/aarch64-simd.md (aarch64_bfmlal<bt>_lane<q>v4sf):
Use correct constraint for operand 3.

gcc/testsuite/ChangeLog:

PR target/104921
* gcc.target/aarch64/pr104921-1.c: New test.
* gcc.target/aarch64/pr104921-2.c: New test.
* gcc.target/aarch64/pr104921.x: Include file for new tests.

(cherry picked from commit 277e1f30a5e4e634304a7b8a532825119f0ea47f)

Merge branch 'releases/gcc-12' into devel/omp/gcc-12

Merge up to r12-9189-gc6e3ecca0e3dcf567d0c843a4987e52591041372 (20th Feb 2023)

Daily bump.

LoongArch: Fix multiarch tuple canonization

Multiarch tuple will be coded in file or directory names in
multiarch-aware distros, so one ABI should have only one multiarch
tuple.  For example, "--target=loongarch64-linux-gnu --with-abi=lp64s"
and "--target=loongarch64-linux-gnusf" should both set multiarch tuple
to "loongarch64-linux-gnusf".  Before this commit,
"--target=loongarch64-linux-gnu --with-abi=lp64s --disable-multilib"
will produce wrong result (loongarch64-linux-gnu).

A recent LoongArch psABI revision mandates "loongarch64-linux-gnu" to be
used for -mabi=lp64d (instead of "loongarch64-linux-gnuf64") for some
non-technical reason [1].  Note that we cannot make
"loongarch64-linux-gnuf64" an alias for "loongarch64-linux-gnu" because
to implement such an alias, we must create thousands of symlinks in the
distro and doing so would be completely unpractical.  This commit also
aligns GCC with the revision.

Tested by building cross compilers with --enable-multiarch and multiple
combinations of --target=loongarch64-linux-gnu*, --with-abi=lp64{s,f,d},
and --{enable,disable}-multilib; and run "xgcc --print-multiarch" then
manually verify the result with eyesight.

[1]: https://github.com/loongson/LoongArch-Documentation/pull/80

gcc/ChangeLog:

* config.gcc (triplet_abi): Set its value based on $with_abi,
instead of $target.
(la_canonical_triplet): Set it after $triplet_abi is set
correctly.
* config/loongarch/t-linux (MULTILIB_OSDIRNAMES): Make the
multiarch tuple for lp64d "loongarch64-linux-gnu" (without
"f64" suffix).

(cherry picked from commit 017849d9d88f021770a90f12fffec9aa2425ed27)

Daily bump.

Attempt to register OpenMP pinned memory using a device instead of 'mlock'

Implemented for nvptx offloading via 'cuMemHostRegister'. This means: (a) not
running into 'mlock' limitations, and (b) the device is aware of this and may
optimize host <-> device memory transfers.

This re-works og12 commit ab7520b3b4cd9fdabfd63652badde478955bd3b5
"libgomp: pinned memory".

include/
* cuda/cuda.h (cuMemHostRegister, cuMemHostUnregister): New.
libgomp/
* config/linux/allocator.c (linux_memspace_alloc)
(linux_memspace_free, linux_memspace_realloc): Attempt to register
OpenMP pinned memory using a device instead of 'mlock'.
* libgomp-plugin.h (GOMP_OFFLOAD_register_page_locked)
(GOMP_OFFLOAD_unregister_page_locked): New.
* libgomp.h (gomp_register_page_locked)
(gomp_unregister_page_locked): New
(struct gomp_device_descr): Add 'register_page_locked_func',
'unregister_page_locked_func'.
* plugin/cuda-lib.def (cuMemHostRegister_v2, cuMemHostRegister)
(cuMemHostUnregister): New.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_register_page_locked)
(GOMP_OFFLOAD_unregister_page_locked): New.
* target.c (gomp_register_page_locked)
(gomp_unregister_page_locked): New.
(gomp_load_plugin_for_device): Handle 'register_page_locked',
'unregister_page_locked'.
* testsuite/libgomp.c/alloc-pinned-1.c: Adjust.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-3.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-4.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-5.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-6.c: Likewise.

In 'libgomp/allocator.c:omp_realloc', route 'free' through 'MEMSPACE_FREE'

... to not run into a SIGSEGV if a non-'malloc'-based allocation is 'free'd
here.

Fix-up for og12 commit c5d1d7651297a273321154a5fe1b01eba9dcf604
"libgomp, nvptx: low-latency memory allocator".

libgomp/
* allocator.c (omp_realloc): Route 'free' through 'MEMSPACE_FREE'.

Clarify/verify OpenMP 'omp_calloc' zero-initialization for pinned memory

Clarification for og12 commit ab7520b3b4cd9fdabfd63652badde478955bd3b5
"libgomp: pinned memory". No functional change.

libgomp/
* config/linux/allocator.c (linux_memspace_alloc)
(linux_memspace_calloc): Clarify zero-initialization for pinned
memory.
* testsuite/libgomp.c/alloc-pinned-1.c: Verify zero-initialization
for pinned memory.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-3.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-4.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-5.c: Likewise.

Miscellaneous clean-up re OpenMP 'ompx_unified_shared_mem_space', 'ompx_host_mem_space'

Clean-up for og12 commit 84914e197d91a67b3d27db0e4c69a433462983a5
"openmp, nvptx: ompx_unified_shared_mem_alloc". No functional change.

libgomp/
* config/linux/allocator.c (linux_memspace_calloc): Elide
(innocuous) duplicate 'if' condition.
* config/nvptx/allocator.c (nvptx_memspace_free): Explicitly
handle 'memspace == ompx_host_mem_space'.
* libgomp.h (gomp_is_usm_ptr): Remove.

Un-break nvptx libgomp build

    In file included from [...]/libgomp/config/nvptx/allocator.c:49:
    [...]/libgomp/config/nvptx/../../basic-allocator.c:52:2: error: invalid preprocessing directive #deine; did you mean #define?
       52 | #deine BASIC_ALLOC_YIELD
          |  ^~~~~
          |  define

Yes, indeed.

Fix-up for og12 commit 9583738a62a33a276b2aad980a27e77097f95924
"nvptx, libgomp: Move the low-latency allocator code".

libgomp/
* basic-allocator.c (BASIC_ALLOC_YIELD): instead of '#deine',
'#define' it.

libstdc++: Fix incorrect function call in -ffast-math optimization

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

* include/experimental/bits/simd_math.h (__hypot): Bitcasting
between scalars requires the __bit_cast helper function instead
of simd_bit_cast.

(cherry picked from commit a5de17d9120dde7e6598a05ea4d1556c2783c69b)

libstdc++: Fix incorrect __builtin_is_constant_evaluated calls

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

* include/experimental/bits/simd_x86.h
(_SimdImplX86::_S_not_equal_to, _SimdImplX86::_S_less)
(_SimdImplX86::_S_less_equal): Do not call
__builtin_is_constant_evaluated in constexpr-if.

(cherry picked from commit 1fd3836463c65f695831ef04c7dbda1e7a1794ba)

libstdc++: printf format string fix in testsuite

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

* testsuite/experimental/simd/tests/bits/verify.h
(verify::verify): Use %zx for size_t in format string.

(cherry picked from commit 07e4648b4ab86a76ab040ea18a756e388652c882)

libstdc++: Document timeout and timeout-factor of simd tests

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

* testsuite/experimental/simd/README.md: Document the timeout
and timeout-factor directives. Minor typo fixed.

(cherry picked from commit b0f4b166ada3b92da5f2917ac3f4397e99d1b58f)

libstdc++: Ensure __builtin_constant_p isn't lost on the way

The more expensive code path should only be taken if it can be optimized
away.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

* include/experimental/bits/simd.h
(_SimdWrapper::_M_is_constprop_none_of)
(_SimdWrapper::_M_is_constprop_all_of): Return false unless the
computed result still satisfies __builtin_constant_p.

(cherry picked from commit fea34ee491104f325682cc5fb75683b7d74a0a3b)

'libgomp.c/usm-{1,2,3,4}.c': Re-enable non-GCN offloading compilation

Change '-foffload=amdgcn-amdhsa=[...]' to
'-foffload-options=amdgcn-amdhsa=[...]', so that non-GCN offloading compilation
doesn't get disabled.

Fix-up for og12 commit 6ec2c29dbbc19e7d2a8f991a5848e10c65c7c74c
"amdgcn, libgomp: USM allocation update".

libgomp/
* testsuite/libgomp.c/usm-1.c: Re-enable non-GCN offloading
compilation.
* testsuite/libgomp.c/usm-2.c: Likewise.
* testsuite/libgomp.c/usm-3.c: Likewise.
* testsuite/libgomp.c/usm-4.c: Likewise.

amdgcn, libgomp: low-latency allocator

This implements the OpenMP low-latency memory allocator for AMD GCN using the
small per-team LDS memory (Local Data Store).

Since addresses can now refer to LDS space, the "Global" address space is
no-longer compatible. This patch therefore switches the backend to use
entirely "Flat" addressing (which supports both memories). A future patch
will re-enable "global" instructions for cases where it is known to be safe
to do so.

gcc/ChangeLog:

* config/gcn/gcn-builtins.def (DISPATCH_PTR): New built-in.
* config/gcn/gcn.cc (gcn_init_machine_status): Disable global
addressing.
(gcn_expand_builtin_1): Implement GCN_BUILTIN_DISPATCH_PTR.

libgomp/ChangeLog:

* config/gcn/libgomp-gcn.h (TEAM_ARENA_START): Move to here.
(TEAM_ARENA_FREE): Likewise.
(TEAM_ARENA_END): Likewise.
(GCN_LOWLAT_HEAP): New.
* config/gcn/team.c (LITTLEENDIAN_CPU): New, and import hsa.h.
(__gcn_lowlat_init): New prototype.
(gomp_gcn_enter_kernel): Initialize the low-latency heap.
* libgomp.h (TEAM_ARENA_START): Move to libgomp.h.
(TEAM_ARENA_FREE): Likewise.
(TEAM_ARENA_END): Likewise.
* plugin/plugin-gcn.c (lowlat_size): New variable.
(print_kernel_dispatch): Label the group_segment_size purpose.
(init_environment_variables): Read GOMP_GCN_LOWLAT_POOL.
(create_kernel_dispatch): Pass low-latency head allocation to kernel.
(run_kernel): Use shadow; don't assume values.
* testsuite/libgomp.c/allocators-7.c: Enable for amdgcn.
* config/gcn/allocator.c: New file.

nvptx, libgomp: Move the low-latency allocator code

There shouldn't be a functionality change; this is just so AMD can share
the code.

The new basic-allocator.c is designed to be included so it can be used as a
template multiple times and inlined.

libgomp/ChangeLog:

* config/nvptx/allocator.c (BASIC_ALLOC_PREFIX): New define, and
include basic-allocator.c.
(__nvptx_lowlat_heap_root): Remove.
(heapdesc): Remove.
(nvptx_memspace_alloc): Move implementation to basic-allocator.c.
(nvptx_memspace_calloc): Likewise.
(nvptx_memspace_free): Likewise.
(nvptx_memspace_realloc): Likewise.
* config/nvptx/team.c (__nvptx_lowlat_heap_root): Remove.
(gomp_nvptx_main): Call __nvptx_lowlat_init.
* basic-allocator.c: New file.

Fortran: error recovery on invalid assumed size reference [PR104554]

gcc/fortran/ChangeLog:

PR fortran/104554
* resolve.cc (check_assumed_size_reference): Avoid NULL pointer
dereference.

gcc/testsuite/ChangeLog:

PR fortran/104554
* gfortran.dg/pr104554.f90: New test.

(cherry picked from commit a418129273725fd02e881e6fb5e0877287a1356c)

libgomp: Fix up some typos in libgomp.texi

I decided to check for repeated the the in libgomp and noticed
there are several occurrences of a typo theads rather than threads
in libgomp.texi.

2023-02-16 Jakub Jelinek <jakub@redhat.com>

* libgomp.texi: Fix typos - theads -> threads.

(cherry picked from commit 0b9bd33d69d0c30330a465e6bad262d90c94d4ea)

libgomp: Fix comment typo

I saw
FAIL: libgomp.fortran/target-nowait-array-section.f90   -O  execution test
in my last x86_64-linux bootstrap.  From quick skimming, it might be just
unreliable test, which assumes that asynchronous execution wouldn't produce
ordered sequence, but can't it happen even with asynchronous execution?

That said, while skimming the test, I've noticed a comment typo and
this patch fixes that up.

2023-02-16  Jakub Jelinek  <jakub@redhat.com>

* testsuite/libgomp.fortran/target-nowait-array-section.f90: Fix
comment typo and improve its wording.

(cherry picked from commit 9d71955f383058f17362c1960bdf8ba2b8c74857)

Daily bump.

Fix PR target/90458

This is the incompatibility of -fstack-clash-protection with Windows SEH.
Now the Windows ports always enable TARGET_STACK_PROBE, which means that
the stack is always probed (out of line) so -fstack-clash-protection does
nothing more.

gcc/
PR target/90458
* config/i386/i386.cc (ix86_compute_frame_layout): Disable the
effects of -fstack-clash-protection for TARGET_STACK_PROBE.
(ix86_expand_prologue): Likewise.

Fix 'libgomp.{c-c++-common,fortran}/target-present-*' test cases

Their execution isn't expected to error out if we've been *compiling for any
offload target*, but rather if they're *executing on a non-shared memory
offload device*.  For example, if (any) offloading is configured but not
effective (no device available, for example), you'd get:

    PASS: libgomp.c/../libgomp.c-c++-common/target-present-1.c (test for excess errors)
    FAIL: libgomp.c/../libgomp.c-c++-common/target-present-1.c execution test
    PASS: libgomp.c/../libgomp.c-c++-common/target-present-2.c (test for excess errors)
    FAIL: libgomp.c/../libgomp.c-c++-common/target-present-2.c execution test
    PASS: libgomp.c/../libgomp.c-c++-common/target-present-3.c (test for excess errors)
    FAIL: libgomp.c/../libgomp.c-c++-common/target-present-3.c execution test

    PASS: libgomp.c++/../libgomp.c-c++-common/target-present-1.c (test for excess errors)
    FAIL: libgomp.c++/../libgomp.c-c++-common/target-present-1.c execution test
    PASS: libgomp.c++/../libgomp.c-c++-common/target-present-2.c (test for excess errors)
    FAIL: libgomp.c++/../libgomp.c-c++-common/target-present-2.c execution test
    PASS: libgomp.c++/../libgomp.c-c++-common/target-present-3.c (test for excess errors)
    FAIL: libgomp.c++/../libgomp.c-c++-common/target-present-3.c execution test

    PASS: libgomp.fortran/target-present-1.f90   -O0  (test for excess errors)
    FAIL: libgomp.fortran/target-present-1.f90   -O0  execution test
    [...]
    PASS: libgomp.fortran/target-present-2.f90   -O0  (test for excess errors)
    FAIL: libgomp.fortran/target-present-2.f90   -O0  execution test
    [...]
    PASS: libgomp.fortran/target-present-3.f90   -O0  (test for excess errors)
    FAIL: libgomp.fortran/target-present-3.f90   -O0  execution test
    [...]

Also, verify reaching a checkpoint before the expected error condition -- and
fix up one case where that didn't happen; missing OpenMP 'map' clauses
('libgomp.fortran/target-present-2.f90').

Fix-up for recent og12 commit 229b705862c1d7f9634f72272b77c22970baf821
"openmp: Add support for the 'present' modifier"

libgomp/
* testsuite/libgomp.c-c++-common/target-present-1.c: Fix.
* testsuite/libgomp.c-c++-common/target-present-2.c: Likewise.
* testsuite/libgomp.c-c++-common/target-present-3.c: Likewise.
* testsuite/libgomp.fortran/target-present-1.f90: Likewise.
* testsuite/libgomp.fortran/target-present-2.f90: Likewise.
* testsuite/libgomp.fortran/target-present-3.f90: Likewise.

warn-access: wrong -Wdangling-pointer with labels [PR106080]

-Wdangling-pointer warns when the address of a label escapes.  This
causes grief in OCaml (<https://github.com/ocaml/ocaml/issues/11358>) as
well as in the kernel:
<https://bugzilla.kernel.org/show_bug.cgi?id=215851> because it uses

  #define _THIS_IP_  ({ __label__ __here; __here: (unsigned long)&&__here; })

to get the PC.  -Wdangling-pointer is documented to warn about pointers
to objects.  However, it uses is_auto_decl which checks DECL_P, but DECL_P
is also true for a label/enumerator/function declaration, none of which is
an object.  Rather, it should use auto_var_p which correctly checks VAR_P
and PARM_DECL.

PR middle-end/106080

gcc/ChangeLog:

* gimple-ssa-warn-access.cc (is_auto_decl): Remove.  Use auto_var_p
instead.

gcc/testsuite/ChangeLog:

* c-c++-common/Wdangling-pointer-10.c: New test.
* c-c++-common/Wdangling-pointer-9.c: New test.

(cherry picked from commit d482b20fd346482635a770281a164a09d608b058)

OpenMP/Fortran: Fix loop-iter var privatization with !$OMP LOOP [PR108512]

For 'parallel', loop-iteration variables are marked are marked as 'private',
unless they either appear in an omp do/simd loop or an data-sharing clause
already exists for those on 'parallel'. 'omp loop' wasn't handled, leading
to (potentially) multiple data-sharing clauses in gfc_resolve_do_iterator
as omp_current_ctx pointed to the 'parallel' directive, ignoring the
in-betwen 'loop' directive.

The latter lead to a bogus diagnostic - or rather an ICE as the source
location var contained only '\0'.

Additionally, several 'case EXEC_OMP...LOOP' have been added to call the
right resolution function and likewise for '{masked,master} taskloop'.

gcc/fortran/ChangeLog:

PR fortran/108512
* openmp.cc (gfc_resolve_omp_parallel_blocks): Handle combined 'loop'
directives.
(gfc_resolve_do_iterator): Set a source location for added
'private'-clause arguments.
* resolve.cc (gfc_resolve_code): Call gfc_resolve_omp_do_blocks
also for EXEC_OMP_LOOP and gfc_resolve_omp_parallel_blocks for
combined directives with loop + '{masked,master} taskloop (simd)'.

gcc/testsuite/ChangeLog:

PR fortran/108512
* gfortran.dg/gomp/loop-5.f90: New test.
* gfortran.dg/gomp/loop-2.f90: Update dg-error.
* gfortran.dg/gomp/taskloop-2.f90: Update dg-error.

(cherry picked from commit 7a8cada824c5e45ea729c112f3d1d29956067b7b)

libgomp: Fix reverse-offload for GOMP_MAP_TO_PSET

libgomp/
* target.c (gomp_target_rev): Dereference ptr
to get device address.
* testsuite/libgomp.fortran/reverse-offload-5.f90: Add test
for unallocated allocatable.

(cherry picked from commit edaf1d607860075c5ff4ade20c58b3f578ed5489)

libgomp: Fix 'target enter data' with always pointer

As GOMP_MAP_ALWAYS_POINTER operates on the previous map item, ensure that
with 'target enter data' both are passed together to gomp_map_vars_internal.

libgomp/ChangeLog:

* target.c (gomp_map_vars_internal): Add 'i > 0' before doing a
kind check.
(GOMP_target_enter_exit_data): If the next map item is
GOMP_MAP_ALWAYS_POINTER map it together with the current item.
* testsuite/libgomp.fortran/target-enter-data-4.f90: New test.

(cherry picked from commit c7a9655be60cb4f224d1e5906bfe8ae227b5a3a0)
(Testfile added as target-enter-data-4.f90 as ...-3.f90 already existed.)

Daily bump.

c++: fix ICE in joust_maybe_elide_copy [PR106675]

joust_maybe_elide_copy checks that the last conversion in the ICS for
the first argument is ck_ref_bind, which is reasonable, because we've
checked that we're dealing with a copy/move constructor.  But it can
also happen that we couldn't figure out which conversion function is
better to convert the argument, as in this testcase: joust couldn't
decide if we should go with

  operator foo &()

or

  operator foo const &()

so we get a ck_ambig, which then upsets joust_maybe_elide_copy.  Since
a ck_ambig can validly occur, I think we should just return early, as
in the patch below.

PR c++/106675

gcc/cp/ChangeLog:

* call.cc (joust_maybe_elide_copy): Return false for ck_ambig.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/overload-conv-5.C: New test.

(cherry picked from commit cce62625025380c2ea2a220deb10f8f355f83abf)

openmp: Add support for 'present' modifier in the Fortran parse tree dump

2023-02-14 Kwok Cheung Yeung <kcy@codesourcery.com>

gcc/fortran/
* dump-parse-tree.cc (show_omp_namelist): Display 'present' map
modifier.
(show_omp_clauses): Display 'present' motion modifier for 'to'
and 'from' clauses.

Address cast to pointer from integer of different size in 'libgomp/target.c:gomp_target_rev'

For example, for '-m32' multilib of x86_64-pc-linux-gnu:

    [...]/libgomp/target.c: In function ‘gomp_target_rev’:
    [...]/libgomp/target.c:3699:33: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
     3699 |                                 (void *) devaddrs[i],
          |                                 ^

Fix-up for recent og12 commit 229b705862c1d7f9634f72272b77c22970baf821
"openmp: Add support for the 'present' modifier".

libgomp/
* target.c (gomp_target_rev): Address cast to pointer from integer
of different size.

Fix small regression in Ada

gcc/
* gimplify.cc (gimplify_save_expr): Add missing guard.
gcc/testsuite/
* gnat.dg/shift2.adb: New test.

libstdc++: Add missing free functions for atomic_flag [PR103934]

This patch adds -
atomic_flag_test
atomic_flag_test_explicit

Which were missed when commit 491ba6 introduced C++20 atomic flag
test.

libstdc++-v3/ChangeLog:

PR libstdc++/103934
* include/std/atomic (atomic_flag_test): Add.
(atomic_flag_test_explicit): Add.
* testsuite/29_atomics/atomic_flag/test/explicit.cc: Add
test case to cover missing atomic_flag free functions.
* testsuite/29_atomics/atomic_flag/test/implicit.cc:
Likewise.

(cherry picked from commit a8d769045b43e8509490362865a85cb31a855ccf)

Daily bump.

gomp/openmp-simd-8.f90: Remove .ASSUME tree-dump check

While mainline (GCC 13) converts assumptions to the ASSUME internal
function, OG12 has only parsing-only support. Thus, remove the
failing dump scan from the testcase.

gcc/testsuite/
* gfortran.dg/gomp/openmp-simd-8.f90: Remove dump test.

Merge branch 'releases/gcc-12' into devel/omp/gcc-12

Merge up to r12-9170-gcb6861acc4074fd2c30a96b52d68c2cd33b9e94d (13th Feb 2023)

rs6000: Fix typo on vec_vsubcuq in rs6000-overload.def [PR108396]

As Andrew pointed out in PR108396, there is one typo in
rs6000-overload.def on built-in function vec_vsubcuq:

[VEC_VSUBCUQ, vec_vsubcuqP, __builtin_vec_vsubcuq]

"vec_vsubcuqP" should be "vec_vsubcuq", this typo caused
us to define vec_vsubcuqP in rs6000-vecdefines.h instead
of vec_vsubcuq, so that compiler is not able to realize
the built-in function name vec_vsubcuq any more.

Co-authored-By: Andrew Pinski <apinski@marvell.com>
PR target/108396

gcc/ChangeLog:

* config/rs6000/rs6000-overload.def (VEC_VSUBCUQ): Fix typo
vec_vsubcuqP with vec_vsubcuq.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr108396.c: New test.

(cherry picked from commit aaf29ae6cdbaad58b709a77784375d15138174b3)

rs6000: Teach rs6000_opaque_type_invalid_use_p about gcall [PR108348]

PR108348 shows one special case that MMA opaque types are
used in function arguments and treated as pass by reference,
it results in one copying from argument to a temp variable,
since this copying happens before rs6000_function_arg check,
it can cause ICE without MMA support then. This patch is to
teach function rs6000_opaque_type_invalid_use_p to check if
any function argument in a gcall stmt has the invalid use of
MMA opaque types.

btw, I checked the handling on return value, it doesn't have
this kind of issue as its checking and error emission is quite
early, so this doesn't handle function return value.

PR target/108348

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_opaque_type_invalid_use_p): Add the
support for invalid uses of MMA opaque type in function arguments.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr108348-1.c: New test.
* gcc.target/powerpc/pr108348-2.c: New test.

(cherry picked from commit 5d9529687deb9ed009361a16c02a7f6c3e2ebbf3)

rs6000: Teach rs6000_opaque_type_invalid_use_p about inline asm [PR108272]

As PR108272 shows, there are some invalid uses of MMA opaque
types in inline asm statements. This patch is to teach the
function rs6000_opaque_type_invalid_use_p for inline asm,
check and error any invalid use of MMA opaque types in input
and output operands.

PR target/108272

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_opaque_type_invalid_use_p): Add the
support for invalid uses in inline asm, factor out the checking and
erroring to lambda function check_and_error_invalid_use.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr108272-1.c: New test.
* gcc.target/powerpc/pr108272-2.c: New test.
* gcc.target/powerpc/pr108272-3.c: New test.
* gcc.target/powerpc/pr108272-4.c: New test.

(cherry picked from commit 074b0c03eabeb8e9c8de813c81bf87a1f88fdb65)

Daily bump.

Suppress -fstack-protector warning on hppa.

Some package builds enable -fstack-protector and -Werror. Since
-fstack-protector is not supported on hppa because the stack grows
up, these packages must check for the warning generated by
-fstack-protector and suppress it on hppa. This is problematic
since hppa is the only significant architecture where the stack
grows up.

2022-12-16 John David Anglin <danglin@gcc.gnu.org>

gcc/ChangeLog:

* config/pa/pa.cc (pa_option_override): Disable -fstack-protector.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_static): Return 0
on hppa*-*-*.

Daily bump.

i386: Fix up -Wuninitialized warnings in avx512erintrin.h [PR105593]

As reported in the PR, there are some -Wuninitialized warnings in
avx512erintrin.h.  One can see that by compiling sse-23.c testcase with
-Wuninitialized (or when actually using those intrinsics).
Those 6 spots use an uninitialized variable and pass it as one of the
argument to a builtin with constant mask -1, because there is no unmasked
builtin.  It is true that expansion of those builtins into RTL will see
mask is all ones and ignore the unneeded argument, but -Wuninitialized
is diagnosed on GIMPLE and on GIMPLE these builtins are just builtin calls.
avx512fintrin.h and other headers use in these cases the _mm*_undefined_* ()
intrinsics, like:
  return (__m512i) __builtin_ia32_psrav8di_mask ((__v8di) __X,
                                                 (__v8di) __Y,
                                                 (__v8di)
                                                 _mm512_undefined_epi32 (),
                                                 (__mmask8) -1);
etc. and the following patch does the same for avx512erintrin.h.
With the recent changes in C++ FE and the _mm*_undefined_* intrinsics,
we don't emit -Wuninitialized warnings for those (previously we didn't
just in C due to self-initialization).  Of course we could also
just self-initialize these uninitialized vars and add the #pragma GCC
diagnostic dances around it, but using the intrinsics is consistent with
the rest and IMHO cleaner.

2023-01-31  Jakub Jelinek  <jakub@redhat.com>

PR c++/105593
* config/i386/avx512erintrin.h (_mm512_exp2a23_round_pd,
_mm512_exp2a23_round_ps, _mm512_rcp28_round_pd, _mm512_rcp28_round_ps,
_mm512_rsqrt28_round_pd, _mm512_rsqrt28_round_ps): Use
_mm512_undefined_pd () or _mm512_undefined_ps () instead of using
uninitialized automatic variable __W.

* gcc.target/i386/sse-23.c: Add -Wuninitialized to dg-options.

(cherry picked from commit 41602390456901c14ecdfa2fa64c3cebd5b6ff09)

x86: Avoid -Wuninitialized warnings on _mm*_undefined_* in C++ [PR105593]

In https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609844.html
I've posted a patch to allow ignoring -Winit-self using GCC diagnostic
pragmas, such that one can mark self-initialization as intentional
disabling of -Wuninitialized warnings.

The following incremental patch uses that in the x86 intrinsic
headers.

2023-01-16 Jakub Jelinek <jakub@redhat.com>

PR c++/105593
gcc/
* config/i386/xmmintrin.h (_mm_undefined_ps): Temporarily
disable -Winit-self using pragma GCC diagnostic ignored.
* config/i386/emmintrin.h (_mm_undefined_pd, _mm_undefined_si128):
Likewise.
* config/i386/avxintrin.h (_mm256_undefined_pd, _mm256_undefined_ps,
_mm256_undefined_si256): Likewise.
* config/i386/avx512fintrin.h (_mm512_undefined_pd,
_mm512_undefined_ps, _mm512_undefined_epi32): Likewise.
* config/i386/avx512fp16intrin.h (_mm_undefined_ph,
_mm256_undefined_ph, _mm512_undefined_ph): Likewise.
gcc/testsuite/
* g++.target/i386/pr105593.C: New test.

(cherry picked from commit 6b0907b4fc455377e5f8109f427d97da02b6aec9)

c, c++: Allow ignoring -Winit-self through pragmas [PR105593]

As mentioned in the PR, various x86 intrinsics need to return
an uninitialized vector. Currently they use self initialization
to avoid -Wuninitialized warnings, which works fine in C, but
doesn't work in C++ where -Winit-self is enabled in -Wall.
We don't have an attribute to mark a variable as knowingly
uninitialized (the uninitialized attribute exists but means
something else, only in the -ftrivial-auto-var-init context),
and trying to suppress either -Wuninitialized or -Winit-self
inside of the _mm_undefined_ps etc. intrinsic definitions
doesn't work, one needs to currently disable through pragmas
-Wuninitialized warning at the point where _mm_undefined_ps etc.
result is actually used, but that goes against the intent of
those intrinsics.

The -Winit-self warning option actually doesn't do any warning,
all we do is record a suppression for -Winit-self if !warn_init_self
on the decl definition and later look that up in uninit pass.

The following patch changes those !warn_init_self tests which
are true only based on the command line option setting, not based
on GCC diagnostic pragma overrides to
!warning_enabled_at (DECL_SOURCE_LOCATION (decl), OPT_Winit_self)
such that it takes them into account.

2023-01-16 Jakub Jelinek <jakub@redhat.com>

PR c++/105593
gcc/c/
* c-parser.cc (c_parser_initializer): Check warning_enabled_at
at the DECL_SOURCE_LOCATION (decl) for OPT_Winit_self instead
of warn_init_self.
gcc/cp/
* decl.cc (cp_finish_decl): Check warning_enabled_at
at the DECL_SOURCE_LOCATION (decl) for OPT_Winit_self instead
of warn_init_self.
gcc/testsuite/
* c-c++-common/Winit-self3.c: New test.
* c-c++-common/Winit-self4.c: New test.
* c-c++-common/Winit-self5.c: New test.

(cherry picked from commit 98b41fd4045b7856e7b85dd58d67c600bd909379)

c-family: Honor -Wno-init-self for cv-qual vars [PR102633]

Since r11-5188-g32934a4f45a721, we drop qualifiers during l-to-r
conversion by creating a NOP_EXPR.  For e.g.

  const int i = i;

that means that the DECL_INITIAL is '(int) i' and not 'i' anymore.
Consequently, we don't suppress_warning here:

711     case DECL_EXPR:
715       if (VAR_P (DECL_EXPR_DECL (*expr_p))
716           && !DECL_EXTERNAL (DECL_EXPR_DECL (*expr_p))
717           && !TREE_STATIC (DECL_EXPR_DECL (*expr_p))
718           && (DECL_INITIAL (DECL_EXPR_DECL (*expr_p)) == DECL_EXPR_DECL (*expr_p))
719           && !warn_init_self)
720         suppress_warning (DECL_EXPR_DECL (*expr_p), OPT_Winit_self);

because of the check on line 718 -- (int) i is not i.  So -Wno-init-self
doesn't disable the warning as it's supposed to.

The following patch fixes it by moving the suppress_warning call from
c_gimplify_expr to the front ends, at points where we haven't created
the NOP_EXPR yet.

PR middle-end/102633

gcc/c-family/ChangeLog:

* c-gimplify.cc (c_gimplify_expr) <case DECL_EXPR>: Don't call
suppress_warning here.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_initializer): Add new tree parameter.  Use it.
Call suppress_warning.
(c_parser_declaration_or_fndef): Pass d down to c_parser_initializer.
(c_parser_omp_declare_reduction): Pass omp_priv down to
c_parser_initializer.

gcc/cp/ChangeLog:

* decl.cc (cp_finish_decl): Call suppress_warning.

gcc/testsuite/ChangeLog:

* c-c++-common/Winit-self1.c: New test.
* c-c++-common/Winit-self2.c: New test.

(cherry picked from commit 04ce2400b35225302e0d6883bb0817378180f5d7)

c++: Handle structured bindings like anon unions in initializers [PR108474]

As reported by Andrew Pinski, structured bindings (with the exception
of the ones using std::tuple_{size,element} and get which are really
standalone variables in addition to the binding one) also use
DECL_VALUE_EXPR and needs the same treatment in static initializers.

On Sun, Jan 22, 2023 at 07:19:07PM -0500, Jason Merrill wrote:
> Though, actually, why not instead fix expand_expr_real_1 (and staticp) to
> look through DECL_VALUE_EXPR?

Doing it when emitting the initializers seems to be too late to me,
we in various spots try to put parts of the static var DECL_INITIAL expressions
into the IL, or e.g. for varpool purposes remember which vars are referenced
there.

This patch moves it to record_reference, which is called from varpool_node::analyze
and so about the same time as gimplification of the bodies which also
replaces DECL_VALUE_EXPRs.

2023-01-24 Jakub Jelinek <jakub@redhat.com>

PR c++/108474
* cp-gimplify.cc (cp_fold_r): Handle structured bindings
vars like anon union artificial vars.

* g++.dg/cpp1z/decomp57.C: New test.
* g++.dg/cpp1z/decomp58.C: New test.

(cherry picked from commit b84e21115700523b4d0ac44275443f7b9c670344)

forwprop: Further fixes for simplify_rotate [PR108440]

As mentioned in the simplify_rotate comment, for e.g.
   ((T) ((T2) X << (Y & (B - 1)))) | ((T) ((T2) X >> ((-Y) & (B - 1))))
we already emit
   X r<< (Y & (B - 1))
as replacement.  This PR is about the
   ((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
   ((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
forms if T2 is wider than T.  Unlike e.g.
   (X << Y) OP (X >> (B - Y))
which is valid just for Y in [1, B - 1], the above 2 forms are actually
valid and do the rotates for Y in [0, B] - for Y 0 the X value is preserved
by the left shift and right logical shift by B adds just zeros (but because
the shift is in wider precision B is still valid shift count), while for
Y equal to B X is preserved through the latter shift and the former adds
just zeros.
Now, it is unclear if we in the middle-end treat rotates with rotate count
equal or larger than precision as UB or not, unlike shifts there are less
reasons to do so, but e.g. expansion of X r<< Y if there is no rotate optab
for the mode is emitted as (X << Y) | (((unsigned) X) >> ((-Y) & (B - 1)))
and so with UB on Y == B.

The following patch does multiple things:
1) for the above 2, asks the ranger if Y could be equal to B and if so,
   instead of using X r<< Y uses X r<< (Y & (B - 1))
2) for the
   ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
   ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
   forms that were fixed 2 days ago it only punts if Y might be in the
   [B,B2-1] range but isn't known to be in the
   [0,B][2*B,2*B][3*B,3*B]... range.  Because for Y which is a multiple
   of B but smaller than B2 it acts as a rotate too, left shift provides
   0 and (-Y) & (B - 1) is 0 and so preserves X.  Though, for the cases
   where Y is not known to be in [0,B-1] the patch also uses
   X r<< (Y & (B - 1)) rather than X r<< Y
3) as discussed with Aldy, instead of using global ranger it uses a pass
   specific copy but lazily created on first simplify_rotate that needs it;
   this e.g. handles rotate inside of if body where the guarding condition
   limits the shift count to some range which will not work with the
   global ranger (unless there is some SSA_NAME to attach the range to).

Note, e.g. on x86 X r<< (Y & (B - 1)) and X r<< Y actually emit the
same assembly because rotates work the same even for larger rotate counts,
but that is handled only during combine.

2023-01-19  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/108440
* tree-ssa-forwprop.cc: Include gimple-range.h.
(simplify_rotate): For the forms with T2 wider than T and shift counts of
Y and B - Y add & (B - 1) masking for the rotate count if Y could be equal
to B.  For the forms with T2 wider than T and shift counts of
Y and (-Y) & (B - 1), don't punt if range could be [B, B2], but only if
range doesn't guarantee Y < B or Y = N * B.  If range doesn't guarantee
Y < B, also add & (B - 1) masking for the rotate count.  Use lazily created
pass specific ranger instead of get_global_range_query.
(pass_forwprop::execute): Disable that ranger at the end of pass if it has
been created.

* c-c++-common/rotate-10.c: New test.
* c-c++-common/rotate-11.c: New test.

(cherry picked from commit 05b9868b182bb9ed2013b39a0bc6297354a0db49)

forwprop: Fix up rotate pattern matching [PR106523]

The comment above simplify_rotate roughly describes what patterns
are matched into what:
   We are looking for X with unsigned type T with bitsize B, OP being
   +, | or ^, some type T2 wider than T.  For:
   (X << CNT1) OP (X >> CNT2)                           iff CNT1 + CNT2 == B
   ((T) ((T2) X << CNT1)) OP ((T) ((T2) X >> CNT2))     iff CNT1 + CNT2 == B

   transform these into:
   X r<< CNT1

   Or for:
   (X << Y) OP (X >> (B - Y))
   (X << (int) Y) OP (X >> (int) (B - Y))
   ((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
   ((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
   (X << Y) | (X >> ((-Y) & (B - 1)))
   (X << (int) Y) | (X >> (int) ((-Y) & (B - 1)))
   ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
   ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))

   transform these into (last 2 only if ranger can prove Y < B):
   X r<< Y

   Or for:
   (X << (Y & (B - 1))) | (X >> ((-Y) & (B - 1)))
   (X << (int) (Y & (B - 1))) | (X >> (int) ((-Y) & (B - 1)))
   ((T) ((T2) X << (Y & (B - 1)))) | ((T) ((T2) X >> ((-Y) & (B - 1))))
   ((T) ((T2) X << (int) (Y & (B - 1)))) \
     | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))

   transform these into:
   X r<< (Y & (B - 1))

The following testcase shows that 2 of these are problematic.
If T2 is wider than T, then the 2 which yse (-Y) & (B - 1) on one
of the shift counts but Y on the can do something different from
rotate.  E.g.:
__attribute__((noipa)) unsigned char
f7 (unsigned char x, unsigned int y)
{
  unsigned int t = x;
  return (t << y) | (t >> ((-y) & 7));
}
if y is [0, 7], then it is a normal rotate, and if y is in [32, ~0U]
then it is UB, but for y in [9, 31] the left shift in this case
will never leave any bits in the result, while in a rotate they are
left there.  Say for y 5 and x 0xaa the expression gives
0x55 which is the same thing as rotate, while for y 19 and x 0xaa
0x5, which is different.
Now, I believe the
   ((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
   ((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
forms are ok, because B - Y still needs to be a valid shift count,
and if Y > B then B - Y should be either negative or very large
positive (for unsigned types).
And similarly the last 2 cases above which use & (B - 1) on both
shift operands are definitely ok.

The following patch disables the
   ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
   ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
unless ranger says Y is not in [B, B2 - 1] range.

And, looking at it again this morning, actually the Y equal to B
case is still fine, if Y is equal to 0, then it is
(T) (((T2) X << 0) | ((T2) X >> 0))
and so X, for Y == B it is
(T) (((T2) X << B) | ((T2) X >> 0))
which is the same as
(T) (0 | ((T2) X >> 0))
which is also X.  So instead of the [B, B2 - 1] range we could use
[B + 1, B2 - 1].  And, if we wanted to go further, even multiplies
of B are ok if they are smaller than B2, so we could construct a detailed
int_range_max if we wanted.

2023-01-17  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/106523
* tree-ssa-forwprop.cc (simplify_rotate): For the
patterns with (-Y) & (B - 1) in one operand's shift
count and Y in another, if T2 has wider precision than T,
punt if Y could have a value in [B, B2 - 1] range.

* c-c++-common/rotate-2.c (f5, f6, f7, f8, f13, f14, f15, f16,
f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using
__builtin_unreachable about shift count.
* c-c++-common/rotate-2b.c: New test.
* c-c++-common/rotate-4.c (f5, f6, f7, f8, f13, f14, f15, f16,
f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using
__builtin_unreachable about shift count.
* c-c++-common/rotate-4b.c: New test.
* gcc.c-torture/execute/pr106523.c: New test.

(cherry picked from commit 001121e8921d5d1a439ce0e64ab04c5959b0bfd8)

c++: Avoid incorrect shortening of divisions [PR108365]

The following testcase is miscompiled, because we shorten the division
in a case where it should not be shortened.
Divisions (and modulos) can be shortened if it is unsigned division/modulo,
or if it is signed division/modulo where we can prove the dividend will
not be the minimum signed value or divisor will not be -1, because e.g.
on sizeof(long long)==sizeof(int)*2 && __INT_MAX__ == 0x7fffffff targets
(-2147483647 - 1) / -1 is UB
but
(int) (-2147483648LL / -1LL) is not, it is -2147483648.
The primary aim of both the C and C++ FE division/modulo shortening I assume
was for the implicit integral promotions of {,signed,unsigned} {char,short}
and because at this point we have no VRP information etc., the shortening
is done if the integral promotion is from unsigned type for the divisor
or if the dividend is an integer constant other than -1.
This works fine for char/short -> int promotions when char/short have
smaller precision than int - unsigned char -> int or unsigned short -> int
will always be a positive int, so never the most negative.

Now, the C FE checks whether orig_op0 is TYPE_UNSIGNED where op0 is either
the same as orig_op0 or that promoted to int, I think that works fine,
if it isn't promoted, either the division/modulo common type will have the
same precision as op0 but then the division/modulo is unsigned and so
without UB, or it will be done in wider precision (e.g. because op1 has
wider precision), but then op0 can't be minimum signed value. Or it has
been promoted to int, but in that case it was again from narrower type and
so never minimum signed int.

But the C++ FE was checking if op0 is a NOP_EXPR from TYPE_UNSIGNED.
First of all, not sure if the operand of NOP_EXPR couldn't be non-integral
type where TYPE_UNSIGNED wouldn't be meaningful, but more importantly,
even if it is a cast from unsigned integral type, we only know it can't be
minimum signed value if it is a widening cast, if it is same precision or
narrowing cast, we know nothing.

So, the following patch for the NOP_EXPR cases checks just in case that
it is from integral type and more importantly checks it is a widening
conversion.

2023-01-14 Jakub Jelinek <jakub@redhat.com>

PR c++/108365
* typeck.cc (cp_build_binary_op): For integral division or modulo,
shorten if type0 is unsigned, or op0 is cast from narrower unsigned
integral type or stripped_op1 is INTEGER_CST other than -1.

* g++.dg/opt/pr108365.C: New test.
* g++.dg/warn/pr108365.C: New test.

(cherry picked from commit 5b3a88640f962d4ffca31ae651bed2d8672f1a8c)

libstdc++: Another merge from fast_float upstream [PR107468]

Upstream fast_float came up with a cheaper test for
fegetround () == FE_TONEAREST using one float addition, one subtraction
and one comparison. If we know we are rounding to nearest, we can use
fast path in more cases as before.
The following patch merges those changes into libstdc++.

2022-11-24 Jakub Jelinek <jakub@redhat.com>

PR libstdc++/107468
* src/c++17/fast_float/fast_float.h: Partial merge from fast_float
2ef9abbcf6a11958b6fa685a89d0150022e82e78 commit.

(cherry picked from commit ec73b55c75baa16c1cf7482fa65928a8d45598d4)

libstdc++: Update from latest fast_float [PR107468]

The following patch partially updates from fast_float trunk.
That way it gets fix for the incorrect rounding case,
PR107468 aka https://github.com/fastfloat/fast_float/issues/149
Using std::fegetround showed in benchmarks too slow, so instead of
doing that the patch limits the fast path where it uses floating
point multiplication rather than integral to cases where we can
prove there will be no rounding (the multiplication will be exact, not
just that the two multiplication or division operation arguments are
exactly representable).

2022-11-07 Jakub Jelinek <jakub@redhat.com>

PR libstdc++/107468
* src/c++17/fast_float/fast_float.h: Partial merge from fast_float
662497742fea7055f0e0ee27e5a7ddc382c2c38e commit.
* testsuite/20_util/from_chars/pr107468.cc: New test.

(cherry picked from commit cb0ceeaee9e041aaac3edd089b07b439621d0f29)

match.pd: When simplifying BFR of an insert, require a mode precision integral type [PR108688]

The same problem as PR 88739 has crept in but
this time in match.pd when simplifying bit_field_ref of
an bit_insert. That is we are generating a BIT_FIELD_REF
of a non-mode-precision integral type.

PR tree-optimization/108688
* match.pd (bit_field_ref [bit_insert]): Avoid generating
BIT_FIELD_REFs of non-mode-precision integral operands.

* gcc.c-torture/compile/pr108688-1.c: New test.

(cherry picked from commit 44f308e59bfa0f93ae05b17e257d8563c12399fd)

vect-patterns: Fix up vect_widened_op_tree [PR108692]

The following testcase is miscompiled on aarch64-linux since r11-5160.
Given
  <bb 3> [local count: 955630225]:
  # i_22 = PHI <i_20(6), 0(5)>
  # r_23 = PHI <r_19(6), 0(5)>
...
  a.0_5 = (unsigned char) a_15;
  _6 = (int) a.0_5;
  b.1_7 = (unsigned char) b_17;
  _8 = (int) b.1_7;
  c_18 = _6 - _8;
  _9 = ABS_EXPR <c_18>;
  r_19 = _9 + r_23;
...
where SSA_NAMEs 15/17 have signed char, 5/7 unsigned char and rest is int
we first pattern recognize c_18 as
patt_34 = (a.0_5) w- (b.1_7);
which is still correct, 5/7 are unsigned char subtracted in wider type,
but then vect_recog_sad_pattern turns it into
SAD_EXPR <a_15, b_17, r_23>
which is incorrect, because 15/17 are signed char and so it is
sum of absolute signed differences rather than unsigned sum of
absolute unsigned differences.
The reason why this happens is that vect_recog_sad_pattern calls
vect_widened_op_tree with MINUS_EXPR, WIDEN_MINUS_EXPR on the
patt_34 = (a.0_5) w- (b.1_7); statement's vinfo and vect_widened_op_tree
calls vect_look_through_possible_promotion on the operands of the
WIDEN_MINUS_EXPR, which looks through the further casts.
vect_look_through_possible_promotion has careful code to stop when there
would be nested casts that need to be preserved, but the problem here
is that the WIDEN_*_EXPR operation itself has an implicit cast on the
operands already - in this case of WIDEN_MINUS_EXPR the unsigned char
5/7 SSA_NAMEs are widened to unsigned short before the subtraction,
and vect_look_through_possible_promotion obviously isn't told about that.

Now, I think when we see those WIDEN_{MULT,MINUS,PLUS}_EXPR codes, we had
to look through possible promotions already when creating those and so
vect_look_through_possible_promotion again isn't really needed, all we need
to do is arrange what that function will do if the operand isn't result
of any cast.  Other option would be let vect_look_through_possible_promotion
know about the implicit promotion from the WIDEN_*_EXPR, but I'm afraid
that would be much harder.

2023-02-08  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/108692
* tree-vect-patterns.cc (vect_widened_op_tree): If rhs_code is
widened_code which is different from code, don't call
vect_look_through_possible_promotion but instead just check op is
SSA_NAME with integral type for which vect_is_simple_use is true
and call set_op on this_unprom.

* gcc.dg/pr108692.c: New test.

(cherry picked from commit 6ad1c1027628f094260037536f6b6fcdb63b5add)

fortran: Fix up hash table usage in gfc_trans_use_stmts [PR108451]

The first testcase in the PR (which I haven't included in the patch because
it is unclear to me if it is supposed to be valid or not) ICEs since extra
hash table checking has been added recently.  The problem is that
gfc_trans_use_stmts does
          tree *slot = entry->decls->find_slot_with_hash (rent->use_name, hash,
                                                          INSERT);
          if (*slot == NULL)
and later on doesn't store anything into *slot and continues.  Another spot
a few lines later correctly clears the slot if it decides not to use the
slot, so the following patch does the same.

2023-02-03  Jakub Jelinek  <jakub@redhat.com>

PR fortran/108451
* trans-decl.cc (gfc_trans_use_stmts): Call clear_slot before
doing continue.

(cherry picked from commit 76f7f0eddcb7c418d1ec3dea3e2341ca99097301)

nested, openmp: Wrap OMP_CLAUSE_*_GIMPLE_SEQ into GIMPLE_BIND for declare_vars [PR108435]

When gimplifying OMP_CLAUSE_{LASTPRIVATE,LINEAR}_STMT, we wrap it always
into a GIMPLE_BIND, but when putting statements directly into
OMP_CLAUSE_{LASTPRIVATE,LINEAR}_GIMPLE_SEQ, we do it only if needed (there
are any temporaries that need to be declared in the sequence).
convert_nonlocal_omp_clauses was relying on the GIMPLE_BIND to be there always
because it called declare_vars on it.

The following patch wraps it into GIMPLE_BIND in tree-nested if we need to
declare_vars on it on demand.

2023-02-02 Jakub Jelinek <jakub@redhat.com>

PR middle-end/108435
* tree-nested.cc (convert_nonlocal_omp_clauses)
<case OMP_CLAUSE_LASTPRIVATE>: If info->new_local_var_chain and *seq
is not a GIMPLE_BIND, wrap the sequence into a new GIMPLE_BIND
before calling declare_vars.
(convert_nonlocal_omp_clauses) <case OMP_CLAUSE_LINEAR>: Merge
with the OMP_CLAUSE_LASTPRIVATE handling except for whether
seq is initialized to &OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (clause)
or &OMP_CLAUSE_LINEAR_GIMPLE_SEQ (clause).

* gcc.dg/gomp/pr108435.c: New test.

(cherry picked from commit 0f349928e16fdc7dba52561e8d40347909f9f0ff)

ree: Fix -fcompare-debug issues in combine_reaching_defs [PR108573]

The PR78437 r7-4871 changes made combine_reaching_defs punt on
WORD_REGISTER_OPERATIONS targets if a setter of smaller than word
register has wider uses.  This unfortunately breaks -fcompare-debug,
because if such a use appears only in DEBUG_INSN(s), while all other
uses aren't wider than the setter, we can REE optimize it without -g
and not with -g.

Such decisions shouldn't be based on debug instructions.  We could try
to reset them or adjust in some other way after we decide to perform the
change, but at least on the testcase which used to fail on riscv64-linux
the
(debug_insn 8 7 9 2 (var_location:HI s (minus:HI (subreg:HI (and:DI (reg:DI 10 a0 [160])
                (const_int 1 [0x1])) 0)
        (subreg:HI (ashiftrt:DI (reg/v:DI 9 s1 [orig:151 l ] [151])
                (debug_expr:SI D#1)) 0))) "pr108573.c":12:5 -1
     (nil))
clearly doesn't care about the upper bits and I have hard time imaging how
could one end up with DEBUG_INSN which actually cares about those upper
bits.

So, the following patch just ignores uses on DEBUG_INSNs in this case,
if we run into something where we'd need to do something further later on,
let's deal with it when we have a testcase for it.

2023-02-01  Jakub Jelinek  <jakub@redhat.com>

PR debug/108573
* ree.cc (combine_reaching_defs): Don't return false for paradoxical
subregs in DEBUG_INSNs.

* gcc.dg/pr108573.c: New test.

(cherry picked from commit e4473d7cf871c8ddf8f22d105c5af6375ebe37bf)

c++, openmp: Handle some OMP_*/OACC_* constructs during constant expression evaluation [PR108607]

While potential_constant_expression_1 handled most of OMP_* codes (by saying that
they aren't potential constant expressions), OMP_SCOPE was missing in that list.
I've also added OMP_SCAN, though that is less important (similarly to OMP_SECTION
it ought to appear solely inside of OMP_{FOR,SIMD} resp. OMP_SECTIONS).
As the testcase shows, it isn't enough, potential_constant_expression_1
can catch only some cases, as soon as one uses switch or ifs where at least
one of the possible paths could be constant expression, we can run into the
same codes during cxx_eval_constant_expression, so this patch handles those
there as well.

2023-02-01 Jakub Jelinek <jakub@redhat.com>

PR c++/108607
* constexpr.cc (cxx_eval_constant_expression): Handle OMP_*
and OACC_* constructs as non-constant.
(potential_constant_expression_1): Handle OMP_SCAN and OMP_SCOPE.

* g++.dg/gomp/pr108607.C: New test.

(cherry picked from commit bfc070595bfb00abef88a002eee5d9117f5b86a7)