git.ipfire.org Git - thirdparty/gcc.git/log

Daily bump.

c++, coroutines: Fix block nests when the function has no top-level bind.

When the function contains no local vars and also no nested scopes, there
is no top-level bind expression. Because the rewritten coroutine body will
require both local vars and contain nested scopes, we add a bind expression
to such functions. When this was done the necessary scope blocks were
omitted which leads to disconnected function content.

Fixed by adding a new block to the added bind expression.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/cp/ChangeLog:

* coroutines.cc (coro_rewrite_function_body): Ensure that added
bind expressions have scope blocks.

(cherry picked from commit a8d7631d333c22e38a067d32d11fd2b60cf1d960)

c++,coroutines: Stabilize names of promoted slot vars [PR101118].

When we need to 'promote' a value (i.e. store it in the coroutine frame) it
is given a frame entry name. This was based on the DECL_UID for slot vars.
However, when LTO is used, the names from multiple TUs become visible at the
same time, and the DECL_UIDs usually differ between units. This leads to a
"ODR mismatch" warning for the frame type.

The fix here is to use the current promoted temporaries count to produce
the name, this is stable between TUs and computed per coroutine.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR c++/101118

gcc/cp/ChangeLog:

* coroutines.cc (flatten_await_stmt): Use the current count of
promoted temporaries to build a unique name for the frame entries.

(cherry picked from commit fc4cde2e6aa4d6ebdf7f70b7b4359fb59a1915ae)

libsanitizer, darwin: Unsupport Darwin >= 22 for now.

The mechanism for location dyld has altered from Darwin22 since dyld is now
in the shared cache. The implemented mechanism for walking the cache uses
Apple Blocks which GCC does not yet support, and the fallback to the original
mechanism does not work there.

Until a suitable work-around can be found, unsupport Darwin22+.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
libsanitizer/ChangeLog:

* configure.tgt: Unsupport Darwin22+ until a mechanism can be found
to locate dyld in the shared cache.

(cherry picked from commit e722a1f42b28092c9f709a3f758fc4fe57db32b0)

Darwin, fixincludes: Handle Apple Blocks in objc/runtime.h.

The macOS 13 SDK has unguarded Apple Blocks use in objc/runtime.h which
causes most of the objective-c tests to fail.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
fixincludes/ChangeLog:

* fixincl.x: Regenerate.
* inclhack.def (darwin_objc_runtime_1): New hack.
* tests/base/objc/runtime.h: New file.

(cherry picked from commit 046dc9d0d4683bab99d28983d8841ba3c56ef744)

Darwin, fixincludes: Handle MacOS13 SDK Apple-specific deprecations [PR107568].

The SDK for MacOS13 includes Apple-specific deprecations of some functions that
are not deprecated in Posix, C or C++ and widely used in GCC.

The fix makes the deprecation conditional on __APPLE_LOCAL_DEPRECATIONS so that
end users may still observe them but they are hidden from normal compilations.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR target/107568

fixincludes/ChangeLog:

* fixincl.x: Regenerate.
* inclhack.def: Add a fix for MacOS13 SDK function deprecations
in stdio.h.
* tests/base/stdio.h (__deprecated_msg): New test.

(cherry picked from commit 442d2bdc1d2a98aba0b18aeaa3e87fa946ac8031)

libgcc, Darwin: No early install for the compatibility libgcc_s.1.dylib.

On Darwin, GCC now uses a libgcc_s.1.1 for builtins and forwards the system
unwinder. We do, however, build a backwards compatibility libgcc_s.1.dylib.
However, this is not needed by GCC and can cause incorrect operation when
DYLD_LIBRARY_PATH is in use.

Since we do not need or use it during the build, the solution is to skip the
installation into the $build/gcc directory.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
libgcc/ChangeLog:

* config/t-slibgcc-darwin (install-darwin-libgcc-stubs): Skip the
install of libgcc_s.1.dylib when the installation is into the build
gcc directory.

(cherry picked from commit 163f0f2267370575a9950e7e30a2c9cd72f559f0)

configure, Darwin: Correct a pasto in host-shared processing.

We do, of course, mean $host not $target in this case. Corrected thus.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
ChangeLog:

* configure: Regenerate.
* configure.ac: Correct use of $host.

(cherry picked from commit 1edfc8f2d3307a3ffa077a605f432832d7715462)

Darwin: Truncate kernel-provided version to OS major for Darwin >= 20.

In common with system tools, GCC uses a version obtained from the kernel as
the prevailing macOS target, when that is not overridden by command line or
environment versions (i.e. mmacosx-version-min=, MACOSX_DEPLOYMENT_TARGET).

Presently, GCC assumes that if the OS version is >= 20, the value used should
include both major and minium version identifiers. However the system tools
(for those versions) truncate the value to the major version - this leads to
link errors when combining objects built with clang and GCC for example:

ld: warning: object file (null.o) was built for newer macOS version (12.2)
than being linked (12.0)

The change here truncates the values GCC uses to the major version.

gcc/ChangeLog:

PR target/104871
* config/darwin-driver.c (darwin_find_version_from_kernel): If the OS
version is darwin20 (macOS 11) or greater, truncate the version to the
major number.

(cherry picked from commit add1adaa17a294ea25918ffb4fdd40f115362632)

Darwin: Future-proof -mmacosx-version-min

f18cbc1ee1f4 (2021-12-18) updated various parts of gcc to not impose a
Darwin or macOS version maximum of the current known release. Different
parts of gcc accept, variously, Darwin version numbers matching
darwin2*, and macOS major version numbers up to 99. The current released
version is Darwin 21 and macOS 12, with Darwin 22 and macOS 13 expected
for public release later this year. With one major OS release per year,
this strategy is expected to provide another 8 years of headroom.

However, f18cbc1ee1f4 missed config/darwin-c.c (now .cc), which
continued to impose a maximum of macOS 12 on the -mmacosx-version-min
compiler driver argument. This was last updated from 11 to 12 in
11b967577483 (2021-10-27), but kicking the can down the road one year at
a time is not a viable strategy, and is not in line with the more recent
technique from f18cbc1ee1f4.

Prior to 556ab5125912 (2020-11-06), config/darwin-c.c did not impose a
maximum that needed annual maintenance, as at that point, all macOS
releases had used a major version of 10. The stricter approach imposed
since then was valuable for a time until the particulars of the new
versioning scheme were established and understood, but now that they
are, it's prudent to restore a more permissive approach.

gcc/ChangeLog:

* config/darwin-c.c: Make -mmacosx-version-min more future-proof.

Signed-off-by: Mark Mentovai <mark@mentovai.com>
(cherry picked from commit 6725f186cb70d48338f69456864bf469a12ee5be)

Darwin: Fix empty g++ command lines [PR105599].

An empty g++ command line should produce a diagnostic that there are no
inputs. The PR is that currently Darwin produces a dignostic about missing
link items instead - this is because (errnoeously), for this driver, we are
creating a link job for empty command lines.

The problem occurs in four stages:

The g++ driver appends -shared-libgcc to the command line.

The Darwin driver_init code in the backend does not see this (it sees an
empty command line).

When the back end driver code driver sees an empty command line, it does not
add any supplementary flags (e.g. asm-macosx-version-min) - precisely to
avoid anything being claimed as an input_file and therefore triggering a link
line.

Since we do not have a value for asm-macosx-version-min when processing the
driver specs, we unconditionally inject 'multiply_defined suppress' which is
used with shared libgcc (but only intended on very old Darwin). This then
causes the generation of a link job.

The solution, for the present, is to move version-specific link params to the
LINK_SPEC so that they are only processed when a link job has already been
decided.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR target/105599

gcc/ChangeLog:

* config/darwin.h: Move versions-specific handling of multiply_defined
from SUBTARGET_DRIVER_SELF_SPECS to LINK_SPEC.

(cherry picked from commit 794737976b9a6418eab817f143bb4eb2d0c834d2)

Darwin: Update rules for handling alignment of globals.

The current rule was too strict and has not been required since Darwin11.

This relaxes the constraint to allow up to 2^28 alignment for non-common
entities.  Common is still restricted to a maximum aligment of 2^15.

When the host is an older version of Darwin ( earlier that 11 ) then the
existing constraint is still applied.  Note that this is a host constraint
not a target one (so that a compilation on 10.7 targeting 10.6 is allowed
to use a greater alignment than the tools on 10.6 support).  This matches
the behaviour of clang.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:

* config.gcc: Emit L2_MAX_OFILE_ALIGNMENT with suitable
values for the host.
* config/darwin.c (darwin_emit_common): Error for alignment
values > 32768.
* config/darwin.h (MAX_OFILE_ALIGNMENT): Rework to use the
configured L2_MAX_OFILE_ALIGNMENT.

gcc/testsuite/ChangeLog:

* gcc.dg/darwin-aligned-globals.c: New test.
* gcc.dg/darwin-comm-1.c: New test.
* gcc.dg/attr-aligned.c: Amend for new alignment values on
Darwin.
* gcc.target/i386/pr89261.c: Likewise.

(cherry picked from commit 19bf83a9a068f2d5293b63c9300f99172b2d278d)

Darwin: Future-proof and homogeneize detection of darwin versions

The current GCC branch will become 12.1.0, which will be the stable
version of GCC when the next macOS version is released. There are some
places in GCC that don’t handle darwin22 as a version, so we need to
future-proof it (gcc/config.gcc and gcc/config/darwin-driver.c). We
align that code with what Apple clang does, i.e. accept all potential
major macOS versions until 99.

This patch also homogenises the handling of darwin version numbers,
where the majority of places use darwin2*, but some used darwin2[0-9]*.
Since there never was a darwin2.x version, the two are equivalent, and
we prefer the simpler darwin2*

gcc/ChangeLog:

* config/darwin-driver.c: Make version code more future-proof.
* config.gcc: Homogeneize darwin versions.
* configure.ac: Homogeneize darwin versions.
* configure: Regenerate.

gcc/testsuite/ChangeLog:

* gcc.dg/darwin-minversion-link.c: Test darwin21.
* obj-c++.dg/cxx-ivars-3.mm: Homogeneize darwin versions.
* obj-c++.dg/objc-gc-3.mm: Homogeneize darwin versions.
* objc.dg/objc-gc-4.m: Homogeneize darwin versions.

(cherry picked from commit f18cbc1ee1f421a0dd79dc389bef9a23dd4a761d)

libstdc++: Fix src/c++17/memory_resource for H8 targets [PR107801]

This fixes compilation failures for H8 multilibs. For the normal
multilib (ILP16L32?), the chunk struct does not have the expected size,
because uint32_t is type long and has alignment 4 (by default). This
forces sizeof(chunk) to be 12 instead of the expected 10. We can fix
that by using bitset::size_type instead of uint32_t, so that we only use
a 16-bit size when size_t and pointers are 16-bit types.

For the IL32P16 multilibs that use -mint32, int is wider than size_t
and so arithmetic expressions involving size_t promote to int. This
means we need some explicit casts back to size_t.

libstdc++-v3/ChangeLog:

PR libstdc++/107801
* src/c++17/memory_resource.cc (chunk::_M_bytes): Change type
from uint32_t to bitset::size_type. Adjust static assertion.
(__pool_resource::_Pool::replenish): Cast to size_t after
multiplication instead of before.
(__pool_resource::_M_alloc_pools): Ensure both arguments to
std::max have type size_t.

(cherry picked from commit 75e562d2c4303d3918be9d1563284b0c580c5e45)

libstdc++: Fix pool resource build errors for H8 [PR107801]

The array of pool sizes was previously adjusted to work for msp430-elf
which has 16-bit int and either 16-bit size_t or 20-bit size_t. The
largest pool sizes were disabled unless size_t has more than 20 bits.

The H8 family has 16-bit int but 32-bit size_t, which means that the
largest sizes are enabled, but 1<<15 produces a negative number that
then cannot be narrowed to size_t.

Replace the test for 32-bit size_t with a test for 32-bit int, which
means we won't use the 4kiB to 4MiB pools for targets with 16-bit int
even if they have a wider size_t.

libstdc++-v3/ChangeLog:

PR libstdc++/107801
* src/c++17/memory_resource.cc (pool_sizes): Disable large pools
for targets with 16-bit int.

(cherry picked from commit 0f9659e770304d3c44cfa0e793833a461bc487aa)

libstdc++: Fix std::is_nothrow_invocable_r for uncopyable prvalues [PR91456]

This is the last missing piece of PR 91456.

This also removes the only use of the C++11 version of
std::is_nothrow_invocable.

libstdc++-v3/ChangeLog:

PR libstdc++/91456
* include/std/type_traits (__is_nothrow_invocable): Remove.
(__is_invocable_impl::__nothrow_type): New member type which
checks if the conversion can throw.
(__is_nt_invocable_impl): Replace class template with alias
template to __is_nt_invocable_impl::__nothrow_type.
* testsuite/20_util/is_nothrow_invocable/91456.cc: New test.
* testsuite/20_util/is_nothrow_convertible/value.cc: Remove
macro used by value_ext.cc test.
* testsuite/20_util/is_nothrow_convertible/value_ext.cc: Remove
test for non-standard __is_nothrow_invocable trait.

(cherry picked from commit 71c828f84572d933979468baf2cf744180258ee4)

libstdc++: Add always_inline attribute to std::byte operators

libstdc++-v3/ChangeLog:

* include/c_global/cstddef (byte): Add always_inline attribute
to all operator overloads.
(to_integer): Add always_inline attribute.

(cherry picked from commit 4977507e329ee3ed410fc7338b9aaada81b68361)

libstdc++: Add nodiscard attribute to filesystem operations

Some of these are not truly "pure" because they access the file system,
e.g. exists and file_size, but they do not modify anything and are only
useful for the return value.

If you really want to use one of those functions just to check whether
an error is reported (either via an exception or an error_code&
argument) you can still do so, but you need to cast the discarded result
to void. Several tests need such a change, because they were indeed
only calling the functions to check for expected errors.

libstdc++-v3/ChangeLog:

* include/bits/fs_ops.h: Add nodiscard to all pure functions.
* include/experimental/bits/fs_ops.h: Likewise.
* testsuite/27_io/filesystem/operations/all.cc: Do not discard
results of absolute and canonical.
* testsuite/27_io/filesystem/operations/absolute.cc: Cast
discarded result to void.
* testsuite/27_io/filesystem/operations/canonical.cc: Likewise.
* testsuite/27_io/filesystem/operations/exists.cc: Likewise.
* testsuite/27_io/filesystem/operations/is_empty.cc: Likewise.
* testsuite/27_io/filesystem/operations/read_symlink.cc:
Likewise.
* testsuite/27_io/filesystem/operations/status.cc: Likewise.
* testsuite/27_io/filesystem/operations/symlink_status.cc:
Likewise.
* testsuite/27_io/filesystem/operations/temp_directory_path.cc:
Likewise.
* testsuite/experimental/filesystem/operations/canonical.cc:
Likewise.
* testsuite/experimental/filesystem/operations/exists.cc:
Likewise.
* testsuite/experimental/filesystem/operations/is_empty.cc:
Likewise.
* testsuite/experimental/filesystem/operations/read_symlink.cc:
Likewise.
* testsuite/experimental/filesystem/operations/temp_directory_path.cc:
Likewise.

(cherry picked from commit f7a148304a71f3d3ad6845b7966fdc3af88c9e45)

libstdc++: Use emplace in std::variant::operator=(T&&) as per LWG 3585

This was approved at the October 2021 plenary.

libstdc++-v3/ChangeLog:

* include/std/variant (variant::operator=): Implement resolution
of LWG 3585.
* testsuite/20_util/variant/lwg3585.cc: New test.

(cherry picked from commit 1395415fdc2d60e5346dbcf476749daf42d5b724)

libstdc++: Fix tests with non-const operator==

These tests fail in strict -std=c++20 mode but their equality ops don't
need to be non-const, it looks like an accident.

This fixes two FAILs with -std=c++20:
FAIL: 20_util/tuple/swap.cc (test for excess errors)
FAIL: 26_numerics/valarray/87641.cc (test for excess errors)

libstdc++-v3/ChangeLog:

* testsuite/20_util/tuple/swap.cc (MoveOnly::operator==): Add
const qualifier.
* testsuite/26_numerics/valarray/87641.cc (X::operator==):
Likewise.

(cherry picked from commit fbad7a74aaaddea3d7b39045a09dd3860603658e)

libstdc++: Fix reading UTF-8 characters for 16-bit targets [PR104875]

The current code in read_utf8_code_point assumes that integer promotion
will create a 32-bit int, but that's not true for 16-bit targets like
msp430 and avr. This changes the intermediate variables used for each
octet from unsigned char to char32_t, so that (c << N) works correctly
when N > 8.

libstdc++-v3/ChangeLog:

PR libstdc++/104875
* src/c++11/codecvt.cc (read_utf8_code_point): Use char32_t to
hold octets that will be left-shifted.

(cherry picked from commit 8f7b7c1495f92c72da154d32317943a2cc276ca8)

libstdc++: Add missing move in ranges::copy

This is needed to support a move-only output iterator when the input
iterators are specializations of __normal_iterator.

libstdc++-v3/ChangeLog:

* include/bits/ranges_algobase.h (__detail::__copy_or_move):
Move output iterator.
* testsuite/25_algorithms/copy/constrained.cc: Check copying to
move-only output iterator.

(cherry picked from commit 2ff0e62275b1c322a8b65f38f8336f37d31c30e4)

libstdc++: Fix self-move for std::weak_ptr [PR108118]

I think an alternative fix would be something like:

_M_ptr = std::exchange(rhs._M_ptr, nullptr);
_M_refcount = std::move(rhs._M_refcount);

The standard's move-and-swap implementation generates smaller code at
all levels except -O0 and -Og, so it seems simplest to just do what the
standard says.

libstdc++-v3/ChangeLog:

PR libstdc++/108118
* include/bits/shared_ptr_base.h (weak_ptr::operator=):
Implement as move-and-swap exactly as specified in the standard.
* testsuite/20_util/weak_ptr/cons/self_move.cc: New test.

(cherry picked from commit 92eb0adc14a5f84acce7e5bc780b81b1544b24aa)

libstdc++: Fix std::chrono::hh_mm_ss with unsigned rep [PR108265]

libstdc++-v3/ChangeLog:

PR libstdc++/108265
* include/std/chrono (hh_mm_ss): Do not use chrono::abs if
duration rep is unsigned. Remove incorrect noexcept-specifier.
* testsuite/std/time/hh_mm_ss/1.cc: Check unsigned rep. Check
floating-point representations. Check default construction.

(cherry picked from commit e36e57b032b2d70eaa1294d5921e4fd8ce12a74d)

Daily bump.

Darwin: Fix bootstrap for X86.

r11-10764 has a typo that misplaces a trailing '\' to the start
of the following line.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:

* config/i386/darwin.h (ENDFILE_SPEC): Fix trailing '\'.

x86: Add a new option -mdaz-ftz to enable FTZ and DAZ flags in MXCSR.

    if (mdaz-ftz)
      link crtfastmath.o
    else if ((Ofast || ffast-math || funsafe-math-optimizations)
             && !mno-daz-ftz)
      link crtfastmath.o
    else
      Don't link crtfastmath.o

gcc/ChangeLog:

* config/i386/cygwin.h (ENDFILE_SPEC): Link crtfastmath.o
whenever -mdaz-ftz is specified. Don't link crtfastmath.o
when -mno-daz-ftz is specified.
* config/i386/darwin.h (ENDFILE_SPEC): Ditto.
* config/i386/gnu-user-common.h
(GNU_USER_TARGET_MATHFILE_SPEC): Ditto.
* config/i386/mingw32.h (ENDFILE_SPEC): Ditto.
* config/i386/i386.opt (mdaz-ftz): New option.
* doc/invoke.texi (x86 options): Document mftz-daz.

Daily bump.

libstdc++: Fix __max_diff_type::operator>>= for negative values

This patch fixes sign bit propagation when right-shifting a negative
__max_diff_type value by more than one, a bug that our existing test
coverage didn't expose until r14-159-g03cebd304955a6 fixed the front
end's 'signed typedef-name' handling that the test relies on (which is
a non-standard extension to the language grammar).

libstdc++-v3/ChangeLog:

* include/bits/max_size_type.h (__max_diff_type::operator>>=):
Fix propagation of sign bit.
* testsuite/std/ranges/iota/max_size_type.cc: Avoid using the
non-standard 'signed typedef-name'. Add some compile-time tests
for right-shifting a negative __max_diff_type value by more than
one.

(cherry picked from commit 83470a5cd4c3d233e1d55b5e5553e1b9c553bf28)

Daily bump.

syscall: add prlimit

As of https://go.dev/cl/476695 golang.org/x/sys/unix can call
syscall.prlimit, so we need such a function in libgo.

Daily bump.

testsuite/108985 - missing vect_simd_clones target requirement on test

Added.

PR testsuite/108985
* gcc.dg/vect/pr108950.c: Require vect_simd_clones.

(cherry picked from commit a2926653ebbc88e8bba335563fa86b44651598d6)

Daily bump.

testsuite: Add further testcase for already fixed PR [PR109778]

I came up with a testcase which reproduces all the way to r10-7469.
LTO to avoid early inlining it, so that ccp handles rotates and not
shifts before they are turned into rotates.

2023-05-09 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/109778
* gcc.dg/lto/pr109778_0.c: New test.
* gcc.dg/lto/pr109778_1.c: New file.

(cherry picked from commit c2cf2dc988eb93551fa1c01d3f8d73ef21f39dc5)

tree-ssa-ccp, wide-int: Fix up handling of [LR]ROTATE_EXPR in bitwise ccp [PR109778]

The following testcase is miscompiled, because bitwise ccp2 handles
a rotate with a signed type incorrectly.
Seems tree-ssa-ccp.cc has the only callers of wi::[lr]rotate with 3
arguments, all other callers just rotate in the right precision and
I think work correctly.  ccp works with widest_ints and so rotations
by the excessive precision certainly don't match what it wants
when it sees a rotate in some specific bitsize.  Still, if it is
unsigned rotate and the widest_int is zero extended from width,
the functions perform left shift and logical right shift on the value
and then at the end zero extend the result of left shift and uselessly
also the result of logical right shift and return | of that.
On the testcase we the signed char rrotate by 4 argument is
CONSTANT -75 i.e. 0xffffffff....fffffb5 with mask 2.
The mask is correctly rotated to 0x20, but because the 8-bit constant
is sign extended to 192-bit one, the logical right shift by 4 doesn't
yield expected 0xb, but gives 0xfffffffffff....ffffb, and then
return wi::zext (left, width) | wi::zext (right, width); where left is
0xfffffff....fb50, so we return 0xfb instead of the expected
0x5b.

The following patch fixes that by doing the zero extension in case of
the right variable before doing wi::lrshift rather than after it.

Also, wi::[lr]rotate widht width < precision always zero extends
the result.  I'm afraid it can't do better because it doesn't know
if it is done for an unsigned or signed type, but the caller in this
case knows that very well, so I've done the extension based on sgn
in the caller.  E.g. 0x5b rotated right (or left) by 4 with width 8
previously gave 0xb5, but sgn == SIGNED in widest_int it should be
0xffffffff....fffb5 instead.

2023-05-09  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/109778
* wide-int.h (wi::lrotate, wi::rrotate): Call wi::lrshift on
wi::zext (x, width) rather than x if width != precision, rather
than using wi::zext (right, width) after the shift.
* tree-ssa-ccp.c (bit_value_binop): Call wi::ext on the results
of wi::lrotate or wi::rrotate.

* gcc.c-torture/execute/pr109778.c: New test.

(cherry picked from commit a8302d2a4669984c7c287d12ef5b37cde6699c80)

tree-optimization/108950 - widen-sum reduction ICE

When we end up with a widen-sum with an invariant smaller operand
the reduction code uses a wrong vector type for it, causing
IL checking ICEs. The following fixes that and the inefficiency
of using a widen-sum with a widenend invariant operand as well
by actually performing the check the following comment wants.

PR tree-optimization/108950
* tree-vect-patterns.c (vect_recog_widen_sum_pattern):
Check oprnd0 is defined in the loop.
* tree-vect-loop.c (vectorizable_reduction): Record all
operands vector types, compute that of invariants and
properly update their SLP nodes.

* gcc.dg/vect/pr108950.c: New testcase.

(cherry picked from commit e3837b6f6c28a1d2cea3a69efbda795ea3fb8816)

c++: non-templated friends [PR105852]

The previous patch for 105852 avoids copying DECL_TEMPLATE_INFO from a
non-templated friend, but it really shouldn't have it in the first place.

PR c++/106740
PR c++/105852

gcc/cp/ChangeLog:

* decl.c (duplicate_decls): Change non-templated friend
check to an assert.
* pt.c (tsubst_function_decl): Don't set DECL_TEMPLATE_INFO
on non-templated friends.
(tsubst_friend_function): Adjust.

gcc/testsuite/ChangeLog:

* g++.dg/template/friend78.C: New test.

Daily bump.

tree-optimization/107898 - ICE with -Walloca-larger-than

The following avoids ICEing with a mismatched prototype for alloca
and -Walloca-larger-than using irange for checks which doesn't
like mismatched types.

PR tree-optimization/107898
* gimple-ssa-warn-alloca.c (alloca_call_type): Check
the type of the alloca argument is compatible with size_t
before querying ranges.

(cherry picked from commit 9948daa4fd0f0ea0a9d56c2fefe1bca478554d27)

Daily bump.

tree-optimization/109724 - new testcase

The following adds a testcase for PR109724 which was caused by
backporting r13-2375-gbe1b42de9c151d and fixed by r11-199-g2b42509f8b7bdf.

PR tree-optimization/109724
* g++.dg/torture/pr109724.C: New testcase.

(cherry picked from commit ee99aaae4aeecd55f1d945a959652cf07e3b2e9e)

Daily bump.

extend.texi: replace @itemx not preceded by @item.

gcc/ChangeLog:

* doc/extend.texi: Replace @itemx not being preceded by @item.

call_summary: add missing template keyword

Without the 'template', this function template compares 'traverse' to 'f',
and then compares the result to 'a'. Evidently it hasn't been instantiated
yet.

gcc/ChangeLog:

* symbol-summary.h: Added missing template keyword.

(cherry picked from commit fccd5b48adf568f0aabe5d5f51206a9d42da095a)

libstdc++: Fix broken backport for std::lcm [PR105844]

The backport for gcc-11 was incomplete and should have replaced all uses
of std::__is_constant_evaluated with __builtin_is_constant_evaluated.

We also need to prune some additional output for the new tests, because
the r12-4425-g1595fe44e11a96 change to always prune those lines is not
present on the branch.

libstdc++-v3/ChangeLog:

PR libstdc++/105844
* include/experimental/numeric (lcm): Use built-in instead of
__is_constant_evaluated.
* include/std/numeric (__abs_r, lcm): Likewise.
* testsuite/26_numerics/gcd/105844.cc: Add dg-prune-output.
* testsuite/26_numerics/lcm/105844.cc: Likewise.

libstdc++: Make std::lcm and std::gcd detect overflow [PR105844]

When I fixed PR libstdc++/92978 I introduced a regression whereby
std::lcm(INT_MIN, 1) and std::lcm(50000, 49999) would no longer produce
errors during constant evaluation. Those calls are undefined, because
they violate the preconditions that |m| and the result can be
represented in the return type (which is int in both those cases). The
regression occurred because __absu<unsigned>(INT_MIN) is well-formed,
due to the explicit casts to unsigned in that new helper function, and
the out-of-range multiplication is well-formed, because unsigned
arithmetic wraps instead of overflowing.

To fix 92978 I made std::gcm and std::lcm calculate |m| and |n|
immediately, yielding a common unsigned type that was used to calculate
the result. That was partly correct, but there's no need to use an
unsigned type. Doing so only suppresses the overflow errors so the
compiler can't detect them. This change replaces __absu with __abs_r
that returns the common type (not its corresponding unsigned type). This
way we can detect overflow in __abs_r when required, while still
supporting the most-negative value when it can be represented in the
result type. To detect LCM results that are out of range of the result
type we still need explicit checks, because neither constant evaluation
nor UBsan will complain about unsigned wrapping for cases such as
std::lcm(500000u, 499999u). We can detect those overflows efficiently by
using __builtin_mul_overflow and asserting.

libstdc++-v3/ChangeLog:

PR libstdc++/105844
* include/experimental/numeric (experimental::gcd): Simplify
assertions. Use __abs_r instead of __absu.
(experimental::lcm): Likewise. Remove use of __detail::__lcm so
overflow can be detected.
* include/std/numeric (__detail::__absu): Rename to __abs_r and
change to allow signed result type, so overflow can be detected.
(__detail::__lcm): Remove.
(gcd): Simplify assertions. Use __abs_r instead of __absu.
(lcm): Likewise. Remove use of __detail::__lcm so overflow can
be detected.
* testsuite/26_numerics/gcd/gcd_neg.cc: Adjust dg-error lines.
* testsuite/26_numerics/lcm/lcm_neg.cc: Likewise.
* testsuite/26_numerics/gcd/105844.cc: New test.
* testsuite/26_numerics/lcm/105844.cc: New test.

(cherry picked from commit 671970a5621e18e7079b4ca113e56434c858db66)

libstdc++: Strip absolute paths from files shown in Doxygen docs

This avoids showing absolute paths from the expansion of
@srcdir@/libsupc++/ in the doxygen File List view.

libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in (STRIP_FROM_PATH): Remove prefixes
from header paths.

(cherry picked from commit 975e8e836ead0e9055a125a2a23463db5d847cb3)

tree-optimization/109473 - fix type mismatch in reduction vectorization

When backporting the PR109473 fix I failed to realize its testcase
will run into an unrelated similar bug. With GCC 12 the code has
seen substantial refactoring so the following applies a local fix
to make sure we are using the correct types when building initial
values for reductions.

PR tree-optimization/109473
* tree-vect-loop.c (get_initial_def_for_reduction):
Convert the scalar values to the vector component type
before using it to build the vector for the initial value.

Daily bump.

reassoc: Fix up another ICE with returns_twice call [PR109410]

The following testcase ICEs in reassoc, unlike the last case I've fixed
there here SSA_NAME_USED_IN_ABNORMAL_PHI is not the case anywhere.
build_and_add_sum places new statements after the later appearing definition
of an operand but if both operands are default defs or constants, we place
statement at the start of the function.

If the very first statement of a function is a call to returns_twice
function, this doesn't work though, because that call has to be the first
thing in its basic block, so the following patch splits the entry successor
edge such that the new statements are added into a different block from the
returns_twice call.

I think we should in stage1 reconsider such placements, I think it
unnecessarily enlarges the lifetime of the new lhs if its operand(s)
are used more than once in the function.  Unless something sinks those
again.  Would be nice to place it closer to the actual uses (or where
they will be placed).

2023-04-12  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/109410
* tree-ssa-reassoc.c (build_and_add_sum): Split edge from entry
block if first statement of the function is a call to returns_twice
function.

* gcc.dg/pr109410.c: New test.

(cherry picked from commit 51856718a82ce60f067910d9037ca255645b37eb)

c++: Fix Solaris bootstraps across midnight

When working on the PR109040 fix, I wanted to test it on some
WORD_REGISTER_OPERATIONS target and tried sparc-solaris on GCC Farm.
My bootstrap failed in comparison failure on cp/module.o, because
Solaris date doesn't support the -r option and one stage's cp/module.o
was built before midnight and next stage's cp/module.o after midnight,
so they had different -DMODULE_VERSION= value.

Now, I think the advice (don't bootstrap at midnight) is something
we shouldn't have, so the following patch stores the module version
(still generated through the same way, date -r cp/module.cc
if it works, otherwise just date) into a temporary file, makes sure
that temporary file is updated when cp/module.cc source is updated
and when date -r doesn't work copies file from previous stage
if it is newer than cp/module.cc.

2023-04-12 Jakub Jelinek <jakub@redhat.com>

* Make-lang.in (s-cp-module-version): New target.
(cp/module.o): Depend on it.
(MODULE_VERSION): Remove variable.
(CFLAGS-cp/module.o): For -DMODULE_VERSION= argument just
cat s-cp-module-version.

(cherry picked from commit 56529056cb42baa382c40de7d239d02dbf72c94f)

libiberty: Make strstr.c in libiberty ANSI compliant

On Fri, Nov 13, 2020 at 11:53:43AM -0700, Jeff Law via Gcc-patches wrote:
>
> On 5/1/20 6:06 PM, Seija Kijin via Gcc-patches wrote:
> > The original code in libiberty says "FIXME" and then says it has not been
> > validated to be ANSI compliant. However, this patch changes the function to
> > match implementations that ARE compliant, and such code is in the public
> > domain.
> >
> > I ran the test results, and there are no test failures.
>
> Thanks.  This seems to be the standard "simple" strstr implementation.
> There's significantly faster implementations available, but I doubt it's
> worth the effort as the version in this file only gets used if there is
> no system strstr.c.

Except that PR109306 says the new version is non-compliant and
is certainly slower than what we used to have.  The only problem I see
on the old version (sure, it is not very fast version) is that for
strstr ("abcd", "") it returned "abcd"+4 rather than "abcd" because
strchr in that case changed p to point to the last character and then
strncmp returned 0.

The question reported in PR109306 is whether memcmp is required not to
access characters beyond the first difference or not.
For all of memcmp/strcmp/strncmp, C17 says:
"The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp
is determined by the sign of the difference between the values of the first pair of characters (both
interpreted as unsigned char) that differ in the objects being compared."
but then in memcmp description says:
"The memcmp function compares the first n characters of the object pointed to by s1 to the first n
characters of the object pointed to by s2."
rather than something similar to strncmp wording:
"The strncmp function compares not more than n characters (characters that follow a null character
are not compared) from the array pointed to by s1 to the array pointed to by
s2."
So, while for strncmp it seems clearly well defined when there is zero
terminator before reaching the n, for memcmp it is unclear if say
int
memcmp (const void *s1, const void *s2, size_t n)
{
  int ret = 0;
  size_t i;
  const unsigned char *p1 = (const unsigned char *) s1;
  const unsigned char *p2 = (const unsigned char *) s2;

  for (i = n; i; i--)
    if (p1[i - 1] != p2[i - 1])
      ret = p1[i - 1] < p2[i - 1] ? -1 : 1;
  return ret;
}
wouldn't be valid implementation (one which always compares all characters
and just returns non-zero from the first one that differs).

So, shouldn't we just revert and handle the len == 0 case correctly?

I think almost nothing really uses it, but still, the old version
at least worked nicer with a fast strchr.
Could as well strncmp (p + 1, s2 + 1, len - 1) if that is preferred
because strchr already compared the first character.

2023-04-02  Jakub Jelinek  <jakub@redhat.com>

PR other/109306
* strstr.c: Revert the 2020-11-13 changes.
(strstr): Return s1 if len is 0.

(cherry picked from commit 1719fa40c4ee4def60a2ce2f27e17f8168cf28ba)

sanopt: Return TODO_cleanup_cfg if any .{UB,HWA,A}SAN_* calls were lowered [PR106190]

The following testcase ICEs, because without optimization eh lowering
decides not to duplicate finally block of try/finally and so we end up
with variable guarded cleanup.  The sanopt pass creates a cfg that ought
to be cleaned up (some IFN_UBSAN_* functions are lowered in this case with
constant conditions in gcond and when not allowing recovery some bbs which
end with noreturn calls actually have successor edges), but the cfg cleanup
is actually (it is -O0) done only during the optimized pass.  We notice
there that the d[1][a] = 0; statement which has an EH edge is unreachable
(because ubsan would always abort on the out of bounds d[1] access), remove
the EH landing pad and block, but because that block just sets a variable
and jumps to another one which tests that variable and that one is reachable
from normal control flow, the __builtin_eh_pointer (1) later in there is
kept in the IL and we ICE during expansion of that statement because the
EH region has been removed.

The following patch fixes it by doing the cfg cleanup already during
sanopt pass if we create something that might need it, while the EH
landing pad is then removed already during sanopt pass, there is ehcleanup
later and we don't ICE anymore.

2023-03-28  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/106190
* sanopt.c (pass_sanopt::execute): Return TODO_cleanup_cfg if any
of the IFN_{UB,HWA,A}SAN_* internal fns are lowered.

* gcc.dg/asan/pr106190.c: New test.

(cherry picked from commit 39a43dc336561e0eba0de477b16c7355f19d84ee)

i386: Require just 32-bit alignment for SLOT_FLOATxFDI_387 -m32 -mpreferred-stack-boundary=2 DImode temporaries [PR109276]

The following testcase ICEs since r11-2259 because assign_386_stack_local
-> assign_stack_local -> ix86_local_alignment now uses 64-bit alignment
for DImode temporaries rather than 32-bit as before.
Most of the spots in the backend which ask for DImode temporaries are during
expansion and those apparently are handled fine with -m32
-mpreferred-stack-boundary=2, we dynamically realign the stack in that case
(most of the spots actually don't need that alignment but at least one
does), then 2 spots are in STV which I assume also work correctly.
But during splitting we can create a DImode slot which doesn't need to be
64-bit alignment (it is nicer for performance though), when we apparently
aren't able to detect it for dynamic stack realignment purposes.

The following patch just makes the slot 32-bit aligned in that rare case.

2023-03-28 Jakub Jelinek <jakub@redhat.com>

PR target/109276
* config/i386/i386.c (assign_386_stack_local): For DImode
with SLOT_FLOATxFDI_387 and -m32 -mpreferred-stack-boundary=2 pass
align 32 rather than 0 to assign_stack_local.

* gcc.target/i386/pr109276.c: New test.

(cherry picked from commit 4b5ef857f5faf09f274c0a95c67faaa80d198124)

predict: Don't emit -Wsuggest-attribute=cold warning for functions which already have that attribute [PR105685]

In the following testcase, we predict baz to have cold
entry regardless of the user supplied attribute (as it call
unconditionally a cold function), but still issue
a -Wsuggest-attribute=cold warning despite it having that attribute
already.

The following patch avoids that.

2023-03-26 Jakub Jelinek <jakub@redhat.com>

PR ipa/105685
* predict.c (compute_function_frequency): Don't call
warn_function_cold if function already has cold attribute.

* c-c++-common/cold-2.c: New test.

(cherry picked from commit 7eca91d4781bb3df941f25c30b971dac66ba1b3d)

tree-vect-generic: Fix up expand_vector_condition [PR109176]

The following testcase ICEs on aarch64-linux, because
expand_vector_condition attempts to piecewise lower SVE
  d_3 = a_1(D) < b_2(D);
  _5 = VEC_COND_EXPR <d_3, c_4(D), d_3>;
which isn't possible - nunits_for_known_piecewise_op ICEs but
the rest of the code assumes constant number of elements too.

expand_vector_condition attempts to find if a (rhs1) is a SSA_NAME
for comparison and calls expand_vec_cond_expr_p (type, TREE_TYPE (a1), code)
where a1 is one of the operands of the comparison and code is the comparison
code.  That one indeed isn't supported here, but what aarch64 SVE supports
are the individual statements, comparison (expand_vec_cmp_expr_p) and
expand_vec_cond_expr_p (type, TREE_TYPE (a), SSA_NAME), the latter because
that function starts with
  if (VECTOR_BOOLEAN_TYPE_P (cmp_op_type)
      && get_vcond_mask_icode (TYPE_MODE (value_type),
                               TYPE_MODE (cmp_op_type)) != CODE_FOR_nothing)
    return true;

In an earlier version of the patch (in the PR), we did this
  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (a))
      && expand_vec_cond_expr_p (type, TREE_TYPE (a), ERROR_MARK))
    return true;
before the code == SSA_NAME handling plus some further tweaks later.
While that fixed the ICE, it broke quite a few tests on x86 and some on
aarch64 too.  The problem is that expand_vector_comparison doesn't lower
comparisons which aren't supported and only feed VEC_COND_EXPR first operand
and expand_vector_condition succeeds for those, so with the above mentioned
change we'd verify the VEC_COND_EXPR is implementable using optab alone,
but nothing would verify the tcc_comparison which relied on
expand_vector_condition to verify.

So, the following patch instead queries whether optabs can handle the
comparison and VEC_COND_EXPR together (if a (rhs1) is a comparison;
otherwise as before it checks only the VEC_COND_EXPR) and if that fails,
also checks whether the two operations could be supported individually
and only if even that fails does the piecewise lowering.

2023-03-23  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/109176
* tree-vect-generic.c (expand_vector_condition): If a has
vector boolean type and is a comparison, also check if both
the comparison and VEC_COND_EXPR could be successfully expanded
individually.

* gcc.target/aarch64/sve/pr109176.c: New test.

(cherry picked from commit 484c41c747d95f9cee15a33b75b32ae2e7eb45f3)

c++: Drop TREE_READONLY on vars (possibly) initialized by tls wrapper [PR109164]

The following two testcases are miscompiled, because we keep TREE_READONLY
on the vars even when they are (possibly) dynamically initialized by a TLS
wrapper function.  Normally cp_finish_decl drops TREE_READONLY from vars
which need dynamic initialization, but for TLS we do this kind of
initialization upon every access to those variables.  Keeping them
TREE_READONLY means e.g. PRE can hoist loads from those before loops
which contain the TLS wrapper calls, so we can access the TLS variables
before they are initialized.

2023-03-20  Jakub Jelinek  <jakub@redhat.com>

PR c++/109164
* cp-tree.h (var_needs_tls_wrapper): Declare.
* decl2.c (var_needs_tls_wrapper): No longer static.
* decl.c (cp_finish_decl): Clear TREE_READONLY on TLS variables
for which a TLS wrapper will be needed.

* g++.dg/tls/thread_local13.C: New test.
* g++.dg/tls/thread_local13-aux.cc: New file.
* g++.dg/tls/thread_local14.C: New test.
* g++.dg/tls/thread_local14-aux.cc: New file.

(cherry picked from commit 0a846340b99675d57fc2f2923a0412134eed09d3)

tree-inline: Fix up multiversioning with vector arguments [PR105554]

The following testcase ICEs, because we call tree_function_versioning from
old_decl which has target attributes not supporting V4DImode and so
DECL_MODE of DECL_ARGUMENTS is BLKmode, while new_decl supports those.
tree_function_versioning initially copies DECL_RESULT and DECL_ARGUMENTS
from old_decl to new_decl, then calls initialize_cfun to create cfun
and only when the cfun is created it can later actually remap_decl
DECL_RESULT and DECL_ARGUMENTS etc.
The problem is that initialize_cfun -> push_struct_function ->
allocate_struct_function calls relayout_decl on DECL_RESULT and
DECL_ARGUMENTS, which clobbers DECL_MODE of old_decl and we then ICE because
of it.
In particular, allocate_struct_function does:
      if (!abstract_p)
        {
          /* Now that we have activated any function-specific attributes
             that might affect layout, particularly vector modes, relayout
             each of the parameters and the result.  */
          relayout_decl (result);
          for (tree parm = DECL_ARGUMENTS (fndecl); parm;
               parm = DECL_CHAIN (parm))
            relayout_decl (parm);

          /* Similarly relayout the function decl.  */
          targetm.target_option.relayout_function (fndecl);
        }

      if (!abstract_p && aggregate_value_p (result, fndecl))
        {
#ifdef PCC_STATIC_STRUCT_RETURN
          cfun->returns_pcc_struct = 1;
#endif
          cfun->returns_struct = 1;
        }
Now, in the case of tree_function_versioning, I believe all that we need
from these is possibly the
targetm.target_option.relayout_function (fndecl);
call (arm only), we will remap DECL_RESULT and DECL_ARGUMENTS later on
and copy_decl_for_dup_finish in that case will handle all we need:
  /* For vector typed decls make sure to update DECL_MODE according
     to the new function context.  */
  if (VECTOR_TYPE_P (TREE_TYPE (copy)))
    SET_DECL_MODE (copy, TYPE_MODE (TREE_TYPE (copy)));
We don't need the cfun->returns_*struct either, because we override it
in initialize_cfun a few lines later:
  /* Copy items we preserve during cloning.  */
...
  cfun->returns_struct = src_cfun->returns_struct;
  cfun->returns_pcc_struct = src_cfun->returns_pcc_struct;

So, to avoid the clobbering of DECL_RESULT/DECL_ARGUMENTS of old_decl,
the following patch arranges allocate_struct_function to be called with
abstract_p true and calls targetm.target_option.relayout_function (fndecl);
by hand.

The removal of DECL_RESULT/DECL_ARGUMENTS copying at the start of
initialize_cfun is removed because the only caller -
tree_function_versioning, does that unconditionally before.

2023-03-17  Jakub Jelinek  <jakub@redhat.com>

PR target/105554
* function.h (push_struct_function): Add ABSTRACT_P argument defaulted
to false.
* function.c (push_struct_function): Add ABSTRACT_P argument, pass it
to allocate_struct_function instead of false.
* tree-inline.c (initialize_cfun): Don't copy DECL_ARGUMENTS
nor DECL_RESULT here.  Pass true as ABSTRACT_P to
push_struct_function.  Call targetm.target_option.relayout_function
after it.
(tree_function_versioning): Formatting fix.

* gcc.target/i386/pr105554.c: New test.

(cherry picked from commit 24c06560a7fa39049911eeb8777325d112e0deb9)

c, ubsan: Instrument even shortened divisions [PR109151]

On the following testcase, the C FE decides to shorten the division because
it has a guarantee that INT_MIN / -1 division won't be encountered, the
first operand is widened from narrower unsigned and/or the second operand is
a constant other than all ones (in this case both are true).
The problem is that the narrower type in this case is _Bool and
ubsan_instrument_division only instruments it if op0's type is INTEGER_TYPE
or REAL_TYPE.  Strangely this doesn't happen in C++ FE.
Anyway, we only shorten divisions if the INT_MIN / -1 case is impossible,
so I think we should be fine even with -fstrict-enums in C++ in case it
shortened to ENUMERAL_TYPEs.

The following patch just instruments those on the ubsan_instrument_division
side.  Perhaps only the first hunk and testcase might be needed because
we shouldn't shorten if the other case could be triggered.

2023-03-17  Jakub Jelinek  <jakub@redhat.com>

PR c/109151
* c-ubsan.c (ubsan_instrument_division): Handle all scalar integral
types rather than just INTEGER_TYPE.

* c-c++-common/ubsan/div-by-zero-8.c: New test.

(cherry picked from commit 103d423f6ce72ccb03d55b7b1dfa2dabd5854371)

openmp: Fix up handling of doacross loops with noreturn body in loops [PR108685]

The following patch fixes an ICE with doacross loops which have a single entry
no exit body, at least one of the ordered > collapse loops isn't guaranteed to
have at least one iteration and the whole doacross loop is inside some other loop.
The OpenMP constructs aren't represented by struct loop until the omp expansions,
so for a normal doacross loop which doesn't have a noreturn body the entry_bb
with the GOMP_FOR statement and the first bb of the body typically have the
same loop_father, and if the doacross loop isn't inside of some other loop
and the body is noreturn as well, both are part of loop 0.  The problematic
case is when the entry_bb is inside of some deeper loop, but the body, because
it falls through into EXIT, has loop 0 as loop_father.  l0_bb is created by
splitting the entry_bb fallthru edge into l1_bb, and because the two basic blocks
have different loop_father, a common loop is found for those (which is loop 0).
Now, if the doacross loop has collapse == ordered or all the ordered > collapse
loops are guaranteed to iterate at least once, all is still fine, because all
enter the l1_bb (body), which doesn't return and so doesn't loop further either.
But, if one of those loops could loop 0 times, the user written body wouldn't be
reached at all, so unlike the expectations the whole construct actually wouldn't
be noreturn if entry_bb is encountered and decides to handle at least one
iteration.

In this case, we need to fix up, move the l0_bb into the same loop as entry_bb
(initially) and for the extra added loops put them as children of that same
loop, rather than of loop 0.

2023-03-17  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/108685
* omp-expand.c (expand_omp_for_ordered_loops): Add L0_BB argument,
use its loop_father rather than BODY_BB's loop_father.
(expand_omp_for_generic): Adjust expand_omp_for_ordered_loops caller.
If broken_loop with ordered > collapse and at least one of those
extra loops aren't guaranteed to have at least one iteration, change
l0_bb's loop_father to entry_bb's loop_father.  Set cont_bb's
loop_father to l0_bb's loop_father rather than l1_bb's.

* c-c++-common/gomp/doacross-8.c: New test.

(cherry picked from commit 713fa5db8ceb4ba8783a0d690ceb4c07f2ff03d0)

c++: Treat unnamed bitfields as padding for __has_unique_object_representations [PR109096]

As reported in the PR, for __has_unique_object_representations we
were treating unnamed bitfields as named ones, which is wrong, they
are actually padding.

THe following patch fixes that.

2023-03-14 Jakub Jelinek <jakub@redhat.com>

PR c++/109096
* tree.c (record_has_unique_obj_representations): Ignore unnamed
bitfields.

* g++.dg/cpp1z/has-unique-obj-representations3.C: New test.

(cherry picked from commit c35cf160a0ed81570cff6600dba465cf95fa80fa)

c++: Don't clear TREE_READONLY for -fmerge-all-constants for non-aggregates [PR107558]

The following testcase ICEs, because OpenMP lowering for shared clause
on l variable with REFERENCE_TYPE creates POINTER_TYPE to REFERENCE_TYPE.
The reason is that the automatic variable has non-trivial construction
(reference to a lambda) and -fmerge-all-constants is on and so TREE_READONLY
isn't set - omp-low will handle automatic TREE_READONLY vars in shared
specially and only copy to the construct and not back, while !TREE_READONLY
are assumed to be changeable.
The PR91529 change rationale was that the gimplification can change
some non-addressable automatic variables to TREE_STATIC with
-fmerge-all-constants and therefore TREE_READONLY on them is undesirable.
But, the gimplifier does that only for aggregate variables:
  switch (TREE_CODE (type))
    {
    case RECORD_TYPE:
    case UNION_TYPE:
    case QUAL_UNION_TYPE:
    case ARRAY_TYPE:
and not for anything else.  So, I think clearing TREE_READONLY for
automatic integral or reference or pointer etc. vars for
-fmerge-all-constants only is unnecessary.

2023-03-10  Jakub Jelinek  <jakub@redhat.com>

PR c++/107558
* decl.c (cp_finish_decl): Don't clear TREE_READONLY on
automatic non-aggregate variables just because of
-fmerge-all-constants.

* g++.dg/gomp/pr107558.C: New test.

(cherry picked from commit 60b6f5c0a334db3f8f6dffaf0b9aab42fd5c54a2)

c-family: Incremental fix for -Wsign-compare BIT_NOT_EXPR handling [PR107465]

There can be too many extensions and seems I didn't get everything right in
the previously posted patch.

The following incremental patch ought to fix that.
The code can deal with quite a few sign/zero extensions at various spots
and it is important to deal with all of them right.
On the argument that contains BIT_NOT_EXPR we have:
MSB bits#4 bits#3 BIT_NOT_EXPR bits#2 bits#1 LSB
where bits#1 is one or more bits (TYPE_PRECISION (TREE_TYPE (arg0))
at the end of the function) we don't know anything about, for the purposes
of this warning it is VARYING that is inverted with BIT_NOT_EXPR to some other
VARYING bits;
bits#2 is one or more bits (TYPE_PRECISION (TREE_TYPE (op0)) -
TYPE_PRECISION (TREE_TYPE (arg0)) at the end of the function)
which are known to be 0 before the BIT_NOT_EXPR and 1 after it.
bits#3 is zero or more bits from the TYPE_PRECISION (TREE_TYPE (op0))
at the end of function to the TYPE_PRECISION (TREE_TYPE (op0)) at the
end of the function to TYPE_PRECISION (TREE_TYPE (op0)) at the start
of the function, which are either zero extension or sign extension.
And bits#4 is zero or more bits from the TYPE_PRECISION (TREE_TYPE (op0))
at the start of the function to TYPE_PRECISION (result_type), which
again can be zero or sign extension.

Now, vanilla trunk as well as the previously posted patch mishandles the
case where bits#3 are sign extended (as bits#2 are known to be all set,
that means bits#3 are all set too) but bits#4 are zero extended and are
thus all 0.

The patch fixes it by tracking the lowest bit which is known to be clear
above the known to be set bits (if any, otherwise it is precision of
result_type).

2023-03-04 Jakub Jelinek <jakub@redhat.com>

PR c/107465
* c-warn.c (warn_for_sign_compare): Don't warn for unset bits
above innermost zero extension of BIT_NOT_EXPR result.

* c-c++-common/Wsign-compare-2.c (f18): New test.

(cherry picked from commit 3ec9a8728086ad86a2d421e067329f305f40e005)

c-family: Fix up -Wsign-compare BIT_NOT_EXPR handling [PR107465]

The following patch fixes multiple bugs in warn_for_sign_compare related to
the BIT_NOT_EXPR related warnings.
My understanding is that what those 3 warnings are meant to warn (since 1995
apparently) is the case where we have BIT_NOT_EXPR of a zero-extended
value, so in result_type the value is something like:
0b11111111XXXXXXXX (e.g. ~ of a 8->16 bit zero extension)
0b000000000000000011111111XXXXXXXX (e.g. ~ of a 8->16 bit zero extension
then zero extended to 32 bits)
0b111111111111111111111111XXXXXXXX (e.g. ~ of a 8->16 bit zero extension
then sign extended to 32 bits)
and the intention of the warning is to warn when this is compared against
something that has some 0 bits at the place where the above has guaranteed
1 bits, either ensured through comparison against constant where we know
the bits exactly, or through zero extension from some narrower type where
again we know at least some upper bits are zero extended.
The bugs in the warning code are:
1) misunderstanding of the {,c_common_}get_narrower APIs - the unsignedp
   it sets is only meaningful if the function actually returns something
   narrower (in that case it says whether the narrower value is then
   sign (0) or zero (1) extended to the originally passed value.
   Though op0 or op1 at this point might be already narrower than
   result_type, and if the function doesn't return anything narrower,
   it all depends on whether the passed in op{0,1} had TYPE_UNSIGNED
   type or not
2) the code didn't check at all whether the BIT_NOT_EXPR operand
   was actually zero extended (i.e. that it was narrower and unsignedp
   was set to 1 for it), all it did is check that unsignedp from the
   call was 1.  But that isn't well defined thing, if the argument
   is returned as is, the function sets unsignedp to 0, but if there
   is e.g. a useless cast to the same or compatible type in between,
   it can return 1 if the cast is unsigned; now, if BIT_NOT_EXPR
   operand is not zero extended, we know nothing at all about any bits
   in the operand containing BIT_NOT_EXPR, so there is nothing to warn
   about
3) the code was actually testing both operands after calling
   c_common_get_narrower on them and on the one with BIT_NOT_EXPR
   again for constants; I think that is just wrong in case the BIT_NOT_EXPR
   operand wouldn't be fully folded, the warning makes sense only if the
   other operand not having BIT_NOT_EXPR in it is constant
4) as can be seen from the above bit pattern examples, the upper bits above
   (in the patch arg0) aren't always all 1s, there could be some zero extension
   above it and from it one would have 0s, so that needs to be taken into
   account for the choice which constant bits to test for being always set
   otherwise warning is emitted, or for the zero extension guaranteed zero
   bits
5) the patch also simplifies the handling, we only do it if one but not
   both operands are BIT_NOT_EXPR after first {,c_common_}get_narrower,
   so we can just use std::swap to ensure it is the first one
6) the code compared bits against HOST_BITS_PER_LONG, which made sense
   back in 1995 when the values were stored into long, but now that they
   are HOST_WIDE_INT should test HOST_BITS_PER_WIDE_INT (or we could rewrite
   the stuff to wide_int, not done in the patch)

2023-03-04  Jakub Jelinek  <jakub@redhat.com>

PR c/107465
* c-warn.c (warn_for_sign_compare): If c_common_get_narrower
doesn't return a narrower result, use TYPE_UNSIGNED to set unsignedp0
and unsignedp1.  For the one BIT_NOT_EXPR case vs. one without,
only check for constant in the non-BIT_NOT_EXPR operand, use std::swap
to simplify the code, only warn if BIT_NOT_EXPR operand is extended
from narrower unsigned, fix up computation of mask for the constant
cases and for unsigned other operand case handle differently
BIT_NOT_EXPR result being sign vs. zero extended.

* c-c++-common/Wsign-compare-2.c: New test.
* c-c++-common/pr107465.c: New test.

(cherry picked from commit daaf74a714c41c8dbaf9954bcc58462c63062b4f)

diagnostics: Fix up selftests with $COLUMNS < 42 [PR108973]

As mentioned in the PR, GCC's diagnostics self-tests fail if $COLUMNS < 42.
Guarding each self-test with if (get_terminal_width () > 41) or similar
would be a maintainance nightmare (PR has a patch to do so without
reformatting to make it work for $COLUMNS in [30, 41] inclusive, but
I'm afraid going down to $COLUMNS 1 would mean marking everything).
Furthermore, the self-tests don't really emit stuff to the terminal,
but into a buffer, so using get_terminal_width () for it seems
inappropriate. The following patch makes sure test_diagnostic_context
constructor uses exactly 80 columns wide caret max width, of course
some tests override it already if they want to test for behavior in narrower
cases.

2023-03-04 Jakub Jelinek <jakub@redhat.com>

PR testsuite/108973
* selftest-diagnostic.c
(test_diagnostic_context::test_diagnostic_context): Set
caret_max_width to 80.

(cherry picked from commit 739e7ebb3d378ece25d64b39baae47c584253498)

libquadmath: Assorted libquadmath strtoflt128 fixes [PR87204, PR94756]

This patch cherry-pickx 8 commits from glibc which fix various strtod_l
bugs.

2023-03-03 niXman <i.nixman@autistici.org>
Jakub Jelinek <jakub@redhat.com>

PR libquadmath/87204
PR libquadmath/94756
* strtod/strtod_l.c (round_and_return): Cherry-pick glibc
9310c284ae9 BZ #16151, 4406c41c1d6 BZ #16965 and fcd6b5ac36a
BZ #23279 fixes.
(____STRTOF_INTERNAL): Cherry-pick glibc b0debe14fcf BZ #23007,
5556d30caee BZ #18247, 09555b9721d and c6aac3bf366 BZ #26137 and
d84f25c7d87 fixes.

(cherry picked from commit df63f4162c78ef799d4ea9dec3443d5e9c51e5aa)

c++, debug: Fix up locus of DW_TAG_imported_module [PR108716]

Before IMPORTED_DECL has been introduced in PR37410, we used to emit correct
DW_AT_decl_line on DW_TAG_imported_module on the testcase below, after that
change we haven't emitted it at all for a while and after some time
started emitting incorrect locus, in particular the location of } closing
the function.

The problem is that while we have correct EXPR_LOCATION on the USING_STMT,
when genericizing that USING_STMT into IMPORTED_DECL we don't copy the
location to DECL_SOURCE_LOCATION, so it gets whatever input_location happens
to be when it is created.

2023-03-02 Jakub Jelinek <jakub@redhat.com>

PR debug/108716
* cp-gimplify.c (cp_genericize_r) <case USING_STMT>: Set
DECL_SOURCE_LOCATION on IMPORTED_DECL to expression location
of USING_STMT or input_location.

* g++.dg/debug/dwarf2/pr108716.C: New test.

(cherry picked from commit 4d82022bfd15d36717bf60a11e75e9ea02204269)

cfgexpand: Handle WIDEN_{PLUS,MINUS}_EXPR and VEC_WIDEN_{PLUS,MINUS}_{HI,LO}_EXPR in expand_debug_expr [PR108967]

When these tree codes were introduced, expand_debug_expr hasn't been
updated to handle them.  For the VEC_*_EXPR we currently mostly punt, the
non-vector ones can be handled easily.  In release compilers this doesn't
ICE, but with checking we ICE so that we make sure to handle all the needed
tree codes there.

2023-03-01  Jakub Jelinek  <jakub@redhat.com>

PR debug/108967
* cfgexpand.c (expand_debug_expr): Handle WIDEN_{PLUS,MINUS}_EXPR
and VEC_WIDEN_{PLUS,MINUS}_{HI,LO}_EXPR.

* g++.dg/debug/pr108967.C: New test.

(cherry picked from commit 520403f324a6ed8b527f39239709dd0841b992e2)

cgraphclones: Don't share DECL_ARGUMENTS between thunk and its artificial thunk [PR108854]

The following testcase ICEs on x86_64-linux with -m32.  The problem is
we create an artificial thunk and because of -fPIC, ia32 and thunk
destination which doesn't bind locally can't use a mi thunk.
The ICE is because during expansion to RTL we see SSA_NAME for a PARM_DECL,
but the PARM_DECL doesn't have DECL_CONTEXT of the current function.
This is because duplicate_thunk_for_node creates a new DECL_ARGUMENTS chain
only if some arguments need modification.

The following patch fixes it by copying the DECL_ARGUMENTS list even if
the arguments can stay as is, to update DECL_CONTEXT on them.  While for
mi thunks it doesn't really matter because we don't use those arguments
in any way, for other thunks it is important.

2023-02-23  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/108854
* cgraphclones.c (duplicate_thunk_for_node): If no parameter
changes are needed, copy at least DECL_ARGUMENTS PARM_DECL
nodes and adjust their DECL_CONTEXT.

* g++.dg/opt/pr108854.C: New test.

(cherry picked from commit 2f1691be517fcdcabae9cd671ab511eb0e08b1d5)

i386: Fix up builtins used in avx512bf16vlintrin.h [PR108881]

The builtins used in avx512bf16vlintrin.h implementation need both
avx512bf16 and avx512vl ISAs, which the header ensures for them, but
the builtins weren't actually requiring avx512vl, so when used by hand
with just -mavx512bf16 -mno-avx512vl it resulted in ICEs.

Fixed by adding OPTION_MASK_ISA_AVX512VL to their BDESC.

2023-02-24 Jakub Jelinek <jakub@redhat.com>

PR target/108881
* config/i386/i386-builtin.def (__builtin_ia32_cvtne2ps2bf16_v16hi,
__builtin_ia32_cvtne2ps2bf16_v16hi_mask,
__builtin_ia32_cvtne2ps2bf16_v16hi_maskz,
__builtin_ia32_cvtne2ps2bf16_v8hi,
__builtin_ia32_cvtne2ps2bf16_v8hi_mask,
__builtin_ia32_cvtne2ps2bf16_v8hi_maskz,
__builtin_ia32_cvtneps2bf16_v8sf_mask,
__builtin_ia32_cvtneps2bf16_v8sf_maskz,
__builtin_ia32_cvtneps2bf16_v4sf_mask,
__builtin_ia32_cvtneps2bf16_v4sf_maskz,
__builtin_ia32_dpbf16ps_v8sf, __builtin_ia32_dpbf16ps_v8sf_mask,
__builtin_ia32_dpbf16ps_v8sf_maskz, __builtin_ia32_dpbf16ps_v4sf,
__builtin_ia32_dpbf16ps_v4sf_mask,
__builtin_ia32_dpbf16ps_v4sf_maskz): Require also
OPTION_MASK_ISA_AVX512VL.

* gcc.target/i386/avx512bf16-pr108881.c: New test.

(cherry picked from commit 0ccfa3884f638816af0f5a3f0ee2695e0771ef6d)

libgomp: Fix up some typos in libgomp.texi

I decided to check for repeated the the in libgomp and noticed
there are several occurrences of a typo theads rather than threads
in libgomp.texi.

2023-02-16 Jakub Jelinek <jakub@redhat.com>

* libgomp.texi: Fix typos - theads -> threads.

(cherry picked from commit 0b9bd33d69d0c30330a465e6bad262d90c94d4ea)

i386: Call get_available_features for all CPUs with max_level >= 1 [PR100758]

get_available_features doesn't depend on cpu_model2->__cpu_{family,model}
and just sets stuff up based on CPUID leaf 1, or some extended ones,
so I wonder why are we calling it separately for Intel, AMD and Zhaoxin
and not for all other CPUs too?  I think various programs in the wild
which aren't using __builtin_cpu_{is,supports} just check the various CPUID
leafs and query bits in there, without blacklisting unknown CPU vendors,
so I think even __builtin_cpu_supports ("sse2") etc. should be reliable
if those VENDOR_{CENTAUR,CYRIX,NSC,OTHER} CPUs set those bits in CPUID leaf
1 or some extended ones.  Calling it for all CPUs also means it can be
inlined because there will be just a single caller.

I have tested it on Intel and Martin tested it on AMD, but can't test it
on non-Intel/AMD; for Intel/AMD/Zhaoxin it should be really no change in
behavior.

2023-02-09  Jakub Jelinek  <jakub@redhat.com>

PR target/100758
* common/config/i386/cpuinfo.h (cpu_indicator_init): Call
get_available_features for all CPUs with max_level >= 1, rather
than just Intel or AMD.

(cherry picked from commit b24e9c083093a9e1b1007933a184c02f7ff058db)

c++: Handle structured bindings like anon unions in initializers [PR108474]

As reported by Andrew Pinski, structured bindings (with the exception
of the ones using std::tuple_{size,element} and get which are really
standalone variables in addition to the binding one) also use
DECL_VALUE_EXPR and needs the same treatment in static initializers.

On Sun, Jan 22, 2023 at 07:19:07PM -0500, Jason Merrill wrote:
> Though, actually, why not instead fix expand_expr_real_1 (and staticp) to
> look through DECL_VALUE_EXPR?

Doing it when emitting the initializers seems to be too late to me,
we in various spots try to put parts of the static var DECL_INITIAL expressions
into the IL, or e.g. for varpool purposes remember which vars are referenced
there.

This patch moves it to record_reference, which is called from varpool_node::analyze
and so about the same time as gimplification of the bodies which also
replaces DECL_VALUE_EXPRs.

2023-01-24 Jakub Jelinek <jakub@redhat.com>

PR c++/108474
* cp-gimplify.c (cp_fold_r): Handle structured bindings
vars like anon union artificial vars.

* g++.dg/cpp1z/decomp57.C: New test.
* g++.dg/cpp1z/decomp58.C: New test.

(cherry picked from commit b84e21115700523b4d0ac44275443f7b9c670344)

c++: Avoid incorrect shortening of divisions [PR108365]

The following testcase is miscompiled, because we shorten the division
in a case where it should not be shortened.
Divisions (and modulos) can be shortened if it is unsigned division/modulo,
or if it is signed division/modulo where we can prove the dividend will
not be the minimum signed value or divisor will not be -1, because e.g.
on sizeof(long long)==sizeof(int)*2 && __INT_MAX__ == 0x7fffffff targets
(-2147483647 - 1) / -1 is UB
but
(int) (-2147483648LL / -1LL) is not, it is -2147483648.
The primary aim of both the C and C++ FE division/modulo shortening I assume
was for the implicit integral promotions of {,signed,unsigned} {char,short}
and because at this point we have no VRP information etc., the shortening
is done if the integral promotion is from unsigned type for the divisor
or if the dividend is an integer constant other than -1.
This works fine for char/short -> int promotions when char/short have
smaller precision than int - unsigned char -> int or unsigned short -> int
will always be a positive int, so never the most negative.

Now, the C FE checks whether orig_op0 is TYPE_UNSIGNED where op0 is either
the same as orig_op0 or that promoted to int, I think that works fine,
if it isn't promoted, either the division/modulo common type will have the
same precision as op0 but then the division/modulo is unsigned and so
without UB, or it will be done in wider precision (e.g. because op1 has
wider precision), but then op0 can't be minimum signed value. Or it has
been promoted to int, but in that case it was again from narrower type and
so never minimum signed int.

But the C++ FE was checking if op0 is a NOP_EXPR from TYPE_UNSIGNED.
First of all, not sure if the operand of NOP_EXPR couldn't be non-integral
type where TYPE_UNSIGNED wouldn't be meaningful, but more importantly,
even if it is a cast from unsigned integral type, we only know it can't be
minimum signed value if it is a widening cast, if it is same precision or
narrowing cast, we know nothing.

So, the following patch for the NOP_EXPR cases checks just in case that
it is from integral type and more importantly checks it is a widening
conversion.

2023-01-14 Jakub Jelinek <jakub@redhat.com>

PR c++/108365
* typeck.c (cp_build_binary_op): For integral division or modulo,
shorten if type0 is unsigned, or op0 is cast from narrower unsigned
integral type or stripped_op1 is INTEGER_CST other than -1.

* g++.dg/opt/pr108365.C: New test.
* g++.dg/warn/pr108365.C: New test.

(cherry picked from commit 5b3a88640f962d4ffca31ae651bed2d8672f1a8c)

match.pd: When simplifying BFR of an insert, require a mode precision integral type [PR108688]

The same problem as PR 88739 has crept in but
this time in match.pd when simplifying bit_field_ref of
an bit_insert. That is we are generating a BIT_FIELD_REF
of a non-mode-precision integral type.

PR tree-optimization/108688
* match.pd (bit_field_ref [bit_insert]): Avoid generating
BIT_FIELD_REFs of non-mode-precision integral operands.

* gcc.c-torture/compile/pr108688-1.c: New test.

(cherry picked from commit 44f308e59bfa0f93ae05b17e257d8563c12399fd)

vect-patterns: Fix up vect_widened_op_tree [PR108692]

The following testcase is miscompiled on aarch64-linux since r11-5160.
Given
  <bb 3> [local count: 955630225]:
  # i_22 = PHI <i_20(6), 0(5)>
  # r_23 = PHI <r_19(6), 0(5)>
...
  a.0_5 = (unsigned char) a_15;
  _6 = (int) a.0_5;
  b.1_7 = (unsigned char) b_17;
  _8 = (int) b.1_7;
  c_18 = _6 - _8;
  _9 = ABS_EXPR <c_18>;
  r_19 = _9 + r_23;
...
where SSA_NAMEs 15/17 have signed char, 5/7 unsigned char and rest is int
we first pattern recognize c_18 as
patt_34 = (a.0_5) w- (b.1_7);
which is still correct, 5/7 are unsigned char subtracted in wider type,
but then vect_recog_sad_pattern turns it into
SAD_EXPR <a_15, b_17, r_23>
which is incorrect, because 15/17 are signed char and so it is
sum of absolute signed differences rather than unsigned sum of
absolute unsigned differences.
The reason why this happens is that vect_recog_sad_pattern calls
vect_widened_op_tree with MINUS_EXPR, WIDEN_MINUS_EXPR on the
patt_34 = (a.0_5) w- (b.1_7); statement's vinfo and vect_widened_op_tree
calls vect_look_through_possible_promotion on the operands of the
WIDEN_MINUS_EXPR, which looks through the further casts.
vect_look_through_possible_promotion has careful code to stop when there
would be nested casts that need to be preserved, but the problem here
is that the WIDEN_*_EXPR operation itself has an implicit cast on the
operands already - in this case of WIDEN_MINUS_EXPR the unsigned char
5/7 SSA_NAMEs are widened to unsigned short before the subtraction,
and vect_look_through_possible_promotion obviously isn't told about that.

Now, I think when we see those WIDEN_{MULT,MINUS,PLUS}_EXPR codes, we had
to look through possible promotions already when creating those and so
vect_look_through_possible_promotion again isn't really needed, all we need
to do is arrange what that function will do if the operand isn't result
of any cast.  Other option would be let vect_look_through_possible_promotion
know about the implicit promotion from the WIDEN_*_EXPR, but I'm afraid
that would be much harder.

2023-02-08  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/108692
* tree-vect-patterns.c (vect_widened_op_tree): If rhs_code is
widened_code which is different from code, don't call
vect_look_through_possible_promotion but instead just check op is
SSA_NAME with integral type for which vect_is_simple_use is true
and call set_op on this_unprom.

* gcc.dg/pr108692.c: New test.

(cherry picked from commit 6ad1c1027628f094260037536f6b6fcdb63b5add)

fortran: Fix up hash table usage in gfc_trans_use_stmts [PR108451]

The first testcase in the PR (which I haven't included in the patch because
it is unclear to me if it is supposed to be valid or not) ICEs since extra
hash table checking has been added recently.  The problem is that
gfc_trans_use_stmts does
          tree *slot = entry->decls->find_slot_with_hash (rent->use_name, hash,
                                                          INSERT);
          if (*slot == NULL)
and later on doesn't store anything into *slot and continues.  Another spot
a few lines later correctly clears the slot if it decides not to use the
slot, so the following patch does the same.

2023-02-03  Jakub Jelinek  <jakub@redhat.com>

PR fortran/108451
* trans-decl.c (gfc_trans_use_stmts): Call clear_slot before
doing continue.

(cherry picked from commit 76f7f0eddcb7c418d1ec3dea3e2341ca99097301)

nested, openmp: Wrap OMP_CLAUSE_*_GIMPLE_SEQ into GIMPLE_BIND for declare_vars [PR108435]

When gimplifying OMP_CLAUSE_{LASTPRIVATE,LINEAR}_STMT, we wrap it always
into a GIMPLE_BIND, but when putting statements directly into
OMP_CLAUSE_{LASTPRIVATE,LINEAR}_GIMPLE_SEQ, we do it only if needed (there
are any temporaries that need to be declared in the sequence).
convert_nonlocal_omp_clauses was relying on the GIMPLE_BIND to be there always
because it called declare_vars on it.

The following patch wraps it into GIMPLE_BIND in tree-nested if we need to
declare_vars on it on demand.

2023-02-02 Jakub Jelinek <jakub@redhat.com>

PR middle-end/108435
* tree-nested.c (convert_nonlocal_omp_clauses)
<case OMP_CLAUSE_LASTPRIVATE>: If info->new_local_var_chain and *seq
is not a GIMPLE_BIND, wrap the sequence into a new GIMPLE_BIND
before calling declare_vars.
(convert_nonlocal_omp_clauses) <case OMP_CLAUSE_LINEAR>: Merge
with the OMP_CLAUSE_LASTPRIVATE handling except for whether
seq is initialized to &OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (clause)
or &OMP_CLAUSE_LINEAR_GIMPLE_SEQ (clause).

* gcc.dg/gomp/pr108435.c: New test.

(cherry picked from commit 0f349928e16fdc7dba52561e8d40347909f9f0ff)

ree: Fix -fcompare-debug issues in combine_reaching_defs [PR108573]

The PR78437 r7-4871 changes made combine_reaching_defs punt on
WORD_REGISTER_OPERATIONS targets if a setter of smaller than word
register has wider uses.  This unfortunately breaks -fcompare-debug,
because if such a use appears only in DEBUG_INSN(s), while all other
uses aren't wider than the setter, we can REE optimize it without -g
and not with -g.

Such decisions shouldn't be based on debug instructions.  We could try
to reset them or adjust in some other way after we decide to perform the
change, but at least on the testcase which used to fail on riscv64-linux
the
(debug_insn 8 7 9 2 (var_location:HI s (minus:HI (subreg:HI (and:DI (reg:DI 10 a0 [160])
                (const_int 1 [0x1])) 0)
        (subreg:HI (ashiftrt:DI (reg/v:DI 9 s1 [orig:151 l ] [151])
                (debug_expr:SI D#1)) 0))) "pr108573.c":12:5 -1
     (nil))
clearly doesn't care about the upper bits and I have hard time imaging how
could one end up with DEBUG_INSN which actually cares about those upper
bits.

So, the following patch just ignores uses on DEBUG_INSNs in this case,
if we run into something where we'd need to do something further later on,
let's deal with it when we have a testcase for it.

2023-02-01  Jakub Jelinek  <jakub@redhat.com>

PR debug/108573
* ree.c (combine_reaching_defs): Don't return false for paradoxical
subregs in DEBUG_INSNs.

* gcc.dg/pr108573.c: New test.

(cherry picked from commit e4473d7cf871c8ddf8f22d105c5af6375ebe37bf)

c++, openmp: Handle some OMP_*/OACC_* constructs during constant expression evaluation [PR108607]

While potential_constant_expression_1 handled most of OMP_* codes (by saying that
they aren't potential constant expressions), OMP_SCOPE was missing in that list.
I've also added OMP_SCAN, though that is less important (similarly to OMP_SECTION
it ought to appear solely inside of OMP_{FOR,SIMD} resp. OMP_SECTIONS).
As the testcase shows, it isn't enough, potential_constant_expression_1
can catch only some cases, as soon as one uses switch or ifs where at least
one of the possible paths could be constant expression, we can run into the
same codes during cxx_eval_constant_expression, so this patch handles those
there as well.

2023-02-01 Jakub Jelinek <jakub@redhat.com>

PR c++/108607
* constexpr.c (cxx_eval_constant_expression): Handle OMP_*
and OACC_* constructs as non-constant.
(potential_constant_expression_1): Handle OMP_SCAN.

* g++.dg/gomp/pr108607.C: New test.

(cherry picked from commit bfc070595bfb00abef88a002eee5d9117f5b86a7)

bbpart: Fix up ICE on asm goto [PR108596]

On the following testcase we have asm goto in hot block with 2 successors,
one cold to which it both falls through and has one of the label
pointing to it and another hot successor with another label.

Now, during bbpart we want to ensure that no blocks from one partition fall
through into a block in a different partition.  fix_up_fall_thru_edges
does that by temporarily clearing the EDGE_CROSSING on the fallthrough edge,
calling force_nonfallthru and then depending on whether it created a new
bb either set EDGE_CROSSING on the single successor edge from the new bb
(the new bb is kept in the same partition as the predecessor block), or
if no new bb has been created setting EDGE_CROSSING back on the fallthru
edge which has been forced non-EDGE_FALLTHRU.
For asm goto this doesn't always work, force_nonfallthru can create a new bb
and change the fallthrough edge to point to that, but if the original
fallthru destination block has its label referenced among the asm goto
labels, it will create a new non-fallthru edge for the label(s).
But because we've temporarily cheated and cleared EDGE_CROSSING on the edge,
it is cleared on the new edge as well, then the caller sees we've created
a new bb and just sets EDGE_CROSSING on the single fallthru edge from the
new bb.  But the direct edge from cur_bb to fallthru edge's destination
isn't handled and fails afterwards consistency checks, because it crosses
partitions.

The following patch notes the case and sets EDGE_CROSSING on that edge too.

2023-01-31  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/108596
* bb-reorder.c (fix_up_fall_thru_edges): Handle the case where cur_bb
ends with asm goto and has a crossing fallthrough edge to the same bb
that contains at least one of its labels by restoring EDGE_CROSSING
flag even on possible edge from cur_bb to new_bb successor.

* gcc.c-torture/compile/pr108596.c: New test.

(cherry picked from commit 603a6fbcaac1e80aa90d1d26318c881a53473066)

doc: Fix up return type of __builtin_va_arg_pack_len [PR108560]

__builtin_va_arg_pack_len as implemented returned int since its introduction
in 2007. The initial documentation didn't mention any return type,
which changed in 2010 in r0-103077-gab940b73bfabe2cec4 during some
documentation formatting cleanups
https://gcc.gnu.org/legacy-ml/gcc-patches/2010-09/msg01632.html
I can understand that for formatting some type was needed there
but what exactly hasn't been really discussed.

So, I think we should change documentation to match the implementation,
rather than change implementation to match the documentation.
Most people don't use more than 2147483647 arguments to inline functions,
and on poor targets with 16-bit ints I bet even having more than 65535
arguments to inline functions would be highly unexpected.

2023-01-27 Jakub Jelinek <jakub@redhat.com>

PR other/108560
* doc/extend.texi: Fix up return type of __builtin_va_arg_pack_len
from size_t to int.

(cherry picked from commit 16f30680f403891556da2ad6329fcef9dc9b47db)

store-merging: Disable string_concatenate mode if start or end aren't byte aligned [PR108498]

The first of the following testcases is miscompiled on powerpc64-linux -O2
-m64 at least, the latter at least on x86_64-linux -m32/-m64.
Since GCC 11 store-merging has a separate string_concatenation mode which
turns stores into setting a MEM_REF from a STRING_CST.
This mode is triggered if at least one of the to be merged stores
is a STRING_CST store and either the first store (to earliest address)
is that STRING_CST store or the first store is 8-bit INTEGER_CST store
and then there are some rules when to turn that mode off or not merge
further stores into it.

The problem with these 2 testcases is that the actual implementation
relies on start/width of the store to be at byte boundaries, as it
simply creates a char array, MEM_REF can be only on byte boundaries
and the char array too, plus obviously STRING_CST as well.
But as can be easily seen in the second testcase, nothing verifies this,
while the first store has to be a STRING_CST (which will be aligned)
or 8-bit INTEGER_CST, that 8-bit INTEGER_CST store could be a bitfield
store, nothing verifies any stores in between whether they actually are
8-bit and aligned, the only major requirement is that all the stores
are consecutive.

For GCC 14 I think we should reconsider this, simply treat STRING_CST
stores during the merging like INTEGER_CST stores and deal with it only
during split_group where we can create multiple parts, this part
would be a normal store, this part would be STRING_CST store, this part
another normal store etc. But that is quite a lot of work, the following
patch just disables the string_concatenate mode if boundaries aren't byte
aligned in the spot where we disable it if it is too short too.
If that happens, we'll just try to do the merging using normal 1/2/4/8 etc.
byte stores as usually with RMW masking for any bits that shouldn't be
touched or punt if we end up with too many stores compared to the original.

Note, an original STRING_CST store will count as one store in that case,
something we might want to reconsider later too (but, after all, CONSTRUCTOR
stores (aka zeroing) already have the same problem, they can be large and
expensive and we still count them as one store).

2023-01-25 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/108498
* gimple-ssa-store-merging.c (class store_operand_info):
End coment with full stop rather than comma.
(split_group): Likewise.
(merged_store_group::apply_stores): Clear string_concatenation if
start or end aren't on a byte boundary.

* gcc.c-torture/execute/pr108498-1.c: New test.
* gcc.c-torture/execute/pr108498-2.c: New test.

(cherry picked from commit 617be7ba436bcbf9d7b883968c6b3c011206b56c)

options: fix cl_target_option_print_diff() with strings

Fix an obvious copy-and-paste error where ptr1 was used instead of ptr2.
This bug caused the dump file produced by -fdump-ipa-inline-details to
not correctly show the difference in target options when a function
could not be inlined due to a target option mismatch.

gcc/ChangeLog:
PR bootstrap/90543
* optc-save-gen.awk: Fix copy-and-paste error.

Signed-off-by: Eric Biggers <ebiggers@google.com>
(cherry picked from commit 9f0cb3368af735e95776769c4f28fa9cbb60eaf8)

c++: Fix up handling of references to anon union members in initializers [PR53932]

For anonymous union members we create artificial VAR_DECLs which
have DECL_VALUE_EXPR for the actual COMPONENT_REF.  That works
just fine inside of functions (including global dynamic constructors),
because during gimplification such VAR_DECLs are gimplified as
their DECL_VALUE_EXPR.  This is also done during regimplification.

But references to these artificial vars in DECL_INITIAL expressions
aren't ever replaced by the DECL_VALUE_EXPRs, so we end up either
with link failures like on the testcase below, or worse ICEs with
LTO.

The following patch fixes those during cp_fully_fold_init where we
already walk all the trees (!data->genericize means that
function rather than cp_fold_function).

2023-01-19  Jakub Jelinek  <jakub@redhat.com>

PR c++/53932
* cp-gimplify.c (cp_fold_r): During cp_fully_fold_init replace
DECL_ANON_UNION_VAR_P VAR_DECLs with their corresponding
DECL_VALUE_EXPR.

* g++.dg/init/pr53932.C: New test.

(cherry picked from commit 9b9a989adc042b304572fd6d4ade46b47be6ccb8)

openmp: Fix up OpenMP expansion of non-rectangular loops [PR108459]

expand_omp_for_init_counts was using for the case where collapse(2)
inner loop has init expression dependent on non-constant multiple of
the outer iterator and the condition upper bound expression doesn't
depend on the outer iterator fold_unary (NEGATE_EXPR, ...). This
will just return NULL if it can't be folded, we need fold_build1
instead.

2023-01-19 Jakub Jelinek <jakub@redhat.com>

PR middle-end/108459
* omp-expand.c (expand_omp_for_init_counts): Use fold_build1 rather
than fold_unary for NEGATE_EXPR.

* testsuite/libgomp.c/pr108459.c: New test.

(cherry picked from commit 46644ec99cb355845b23bb1d02775c057ed8ee88)

fortran: Fix up function types for realloc and sincos{,f,l} builtins [PR108349]

As reported in the PR, the FUNCTION_TYPE for __builtin_realloc in the
Fortran FE is wrong since r0-100026-gb64fca63690ad which changed
-  tmp = tree_cons (NULL_TREE, pvoid_type_node, void_list_node);
-  tmp = tree_cons (NULL_TREE, size_type_node, tmp);
-  ftype = build_function_type (pvoid_type_node, tmp);
+  ftype = build_function_type_list (pvoid_type_node,
+                                    size_type_node, pvoid_type_node,
+                                    NULL_TREE);
   gfc_define_builtin ("__builtin_realloc", ftype, BUILT_IN_REALLOC,
                      "realloc", false);
The return type is correct, void *, but the first argument should be
void * too and only second one size_t, while the above change changed
realloc to be void *__builtin_realloc (size_t, void *);
I went through all other changes from that commit and found that
__builtin_sincos{,f,l} got broken as well, instead of the former
void __builtin_sincos{,f,l} (ftype, ftype *, ftype *);
where ftype is {double,float,long double} it is now incorrectly
void __builtin_sincos{,f,l} (ftype *, ftype *);

The following patch fixes that, plus some formatting issues around
the spots I've changed.

2023-01-11  Jakub Jelinek  <jakub@redhat.com>

PR fortran/108349
* f95-lang.c (gfc_init_builtin_function): Fix up function types
for BUILT_IN_REALLOC and BUILT_IN_SINCOS{F,,L}.  Formatting fixes.

(cherry picked from commit 0986c351aa8a9f08b3cb614baec13564dd62c114)

generic-match-head: Don't assume GENERIC folding is done only early [PR108237]

We ICE on the following testcase, because a valid V2DImode
!= comparison is folded into an unsupported V2DImode > comparison.
The match.pd pattern which does this looks like:
/* Transform comparisons of the form (X & Y) CMP 0 to X CMP2 Z
   where ~Y + 1 == pow2 and Z = ~Y.  */
(for cst (VECTOR_CST INTEGER_CST)
(for cmp (eq ne)
      icmp (le gt)
  (simplify
   (cmp (bit_and:c@2 @0 cst@1) integer_zerop)
    (with { tree csts = bitmask_inv_cst_vector_p (@1); }
     (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
      (with { auto optab = VECTOR_TYPE_P (TREE_TYPE (@1))
                         ? optab_vector : optab_default;
              tree utype = unsigned_type_for (TREE_TYPE (@1)); }
       (if (target_supports_op_p (utype, icmp, optab)
            || (optimize_vectors_before_lowering_p ()
                && (!target_supports_op_p (type, cmp, optab)
                    || !target_supports_op_p (type, BIT_AND_EXPR, optab))))
        (if (TYPE_UNSIGNED (TREE_TYPE (@1)))
         (icmp @0 { csts; })
         (icmp (view_convert:utype @0) { csts; })))))))))
and that optimize_vectors_before_lowering_p () guarded stuff there
already deals with this problem, not trying to fold a supported comparison
into a non-supported one.  The reason it doesn't work in this case is that
it isn't GIMPLE folding which does this, but GENERIC folding done during
forwprop4 - forward_propagate_into_comparison -> forward_propagate_into_comparison_1
-> combine_cond_expr_cond -> fold_binary_loc -> generic_simplify
and we simply assumed that GENERIC folding happens only before
gimplification.

The following patch fixes that by checking cfun properties instead of
always returning true in those cases.

2023-01-04  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/108237
* generic-match-head.c: Include tree-pass.h.
(canonicalize_math_p, optimize_vectors_before_lowering_p): Define
to false if cfun and cfun->curr_properties has PROP_gimple_opt_math
resp. PROP_gimple_lvec property set.

* gcc.c-torture/compile/pr108237.c: New test.

(cherry picked from commit 345dffd0d4ebff7e705dfff1a8a72017a167120a)

expr: Fix up store_expr into SUBREG_PROMOTED_* target [PR108264]

The following testcase ICEs on s390x-linux (e.g. with -march=z13).
The problem is that target is (subreg/s/u:SI (reg/v:DI 66 [ x+-4 ]) 4)
and we call convert_move from temp to the SUBREG_REG of that, expecting
to extend the value properly. That works nicely if temp has some
scalar integer mode (or partial one), but ICEs when temp has V4QImode
on the assertion that from and to modes have the same bitsize.
store_expr generally allows say store from V4QI to SI target because
they have the same size and if temp is a CONST_INT, we already have code
to convert the constant properly, so the following patch just adds handling
of non-scalar integer modes by converting them to the mode of target
first before convert_move extends them.

2023-01-03 Jakub Jelinek <jakub@redhat.com>

PR middle-end/108264
* expr.c (store_expr): For stores into SUBREG_PROMOTED_* targets
from source which doesn't have scalar integral mode first convert
it to outer_mode.

* gcc.dg/pr108264.c: New test.

(cherry picked from commit 226a498733e7919de72eb6f1bf3e16883ad159f6)

tree-ssa-dom: can_infer_simple_equiv fixes [PR108068]

As reported in the PR, tree-ssa-dom.cc uses real_zerop call to find
if a floating point constant is zero and it shouldn't try to infer
equivalences from comparison against it if signed zeros are honored.
This doesn't work at all for decimal types, because real_zerop always
returns false for them (one can have different representations of decimal
zero beyond -0/+0), and it doesn't work for vector compares either,
as real_zerop checks if all elements are zero, while we need to avoid
infering equivalences from comparison against vector constants which have
at least one zero element in it (if signed zeros are honored).
Furthermore, as mentioned by Joseph, for decimal types many other values
aren't singleton.

So, this patch stops infering anything if element mode is decimal, and
otherwise uses instead of real_zerop a new function, real_maybe_zerop,
which will work even for decimal types and for complex or vector will
return true if any element is or might be zero (so it returns true
for anything but constants for now).

2022-12-23 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/108068
* tree.h (real_maybe_zerop): Declare.
* tree.c (real_maybe_zerop): Define.
* tree-ssa-dom.c (record_edge_info): Use it instead of
real_zerop or TREE_CODE (op1) == SSA_NAME || real_zerop. Always set
can_infer_simple_equiv to false for decimal floating point types.

* gcc.dg/dfp/pr108068.c: New test.

(cherry picked from commit fd1b0aefda5b65f3f841ca6e61ccea6a72daa060)

cse: Fix up CSE const_anchor handling [PR108193]

The following testcase ICEs on aarch64, because insert_const_anchor
inserts invalid CONST_INT into the CSE tables - 0x80000000 for SImode.
The second hunk of the patch fixes that, the first one is to avoid
triggering undefined behavior at compile time during compute_const_anchors
computations - performing those additions and subtractions in
HOST_WIDE_INT means it can overflow for certain constants.

2022-12-22 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/108193
* cse.c (compute_const_anchors): Change n type to
unsigned HOST_WIDE_INT, adjust comparison against it to avoid
warnings. Formatting fix.
(insert_const_anchor): Use gen_int_mode instead of GEN_INT.

* gfortran.dg/pr108193.f90: New test.

(cherry picked from commit 0cb5d7cdbab8e5f8359764ef5f62d93c2bc88552)

openmp: Don't try to destruct DECL_OMP_PRIVATIZED_MEMBER vars [PR108180]

DECL_OMP_PRIVATIZED_MEMBER vars are artificial vars with DECL_VALUE_EXPR
of this->field used just during gimplification and omp lowering/expansion
to privatize individual fields in methods when needed.
As the following testcase shows, when not in templates, they were handled
right, but in templates we actually called cp_finish_decl on them and
that can result in their destruction, which is obviously undesirable,
we should only destruct the privatized copies of them created in omp
lowering.

Fixed thusly.

2022-12-21 Jakub Jelinek <jakub@redhat.com>

PR c++/108180
* pt.c (tsubst_expr): Don't call cp_finish_decl on
DECL_OMP_PRIVATIZED_MEMBER vars.

* testsuite/libgomp.c++/pr108180.C: New test.

(cherry picked from commit 1119902b6c7c1c50123ed85ec1def8be4772d68c)

testsuite: Fix up pr64536.c for LLP64 targets [PR108151]

Apparently llp64 had 2 further warnings, fixed thusly.

2022-12-19 Jakub Jelinek <jakub@redhat.com>

PR testsuite/108151
* gcc.dg/pr64536.c (bar): Cast long to __INTPTR_TYPE__
before casting to long *.

(cherry picked from commit 6e85f89a7d59a99a3395b6e153b99262a58b2f6c)

testsuite: Fix up pr64536.c for LLP64 targets [PR108151]

The test casts a pointer to long, which is ok for ilp32 and lp64
targets but not for llp64 targets. Nothing reads the values later,
it is a link test, so all we care about is that it is the same
cast on s390x-linux where it used to fail before the PR64536 fix,
and that we don't warn about it.

2022-12-19 Jakub Jelinek <jakub@redhat.com>

PR testsuite/108151
* gcc.dg/pr64536.c (bar): Use casts to __INTPTR_TYPE__ rather than
long when casting pointer to integral type.

(cherry picked from commit ea37e96a37b50dad17b91d46edc518bbb9132d8e)