Fortran: set shape of initializers of zero-sized arrays [PR95374,PR104352]

gcc/fortran/ChangeLog:

PR fortran/95374
PR fortran/104352
* decl.cc (add_init_expr_to_sym): Set shape of initializer also for
zero-sized arrays, so that bounds violations can be detected later.

gcc/testsuite/ChangeLog:

PR fortran/95374
PR fortran/104352
* gfortran.dg/zero_sized_13.f90: New test.

libstdc++: Fix up some <cmath> templates [PR109883]

As can be seen on the following testcase, for
std::{atan2,fmod,pow,copysign,fdim,fmax,fmin,hypot,nextafter,remainder,remquo,fma}
if one operand type is std::float{16,32,64,128}_t or std::bfloat16_t and
another one some integral type or some other floating point type which
promotes to the other operand's type, we can end up with endless recursion.
This is because of a declaration ordering problem in <cmath>, where the
float, double and long double overloads of those functions come before
the templates which use __gnu_cxx::__promote_{2,3}, but the
std::float{16,32,64,128}_t and std::bfloat16_t overloads come later in the
file. If the result of those promotions is _Float{16,32,64,128} or
__gnu_cxx::__bfloat16_t, say std::pow(_Float64, int) calls
std::pow(_Float64, _Float64) and the latter calls itself.

The following patch fixes that by moving those templates later in the file,
so that the calls from those templates see also the other overloads.

I think other templates in the file like e.g. isgreater etc. shouldn't be
a problem, because those just use __builtin_isgreater etc. in their bodies.
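
The declaration-ordering problem described above can be reproduced outside of <cmath>. The following is a minimal sketch, not the libstdc++ code: the names `which`, `Picked`, `broken` and `fixed` are made up for illustration, `std::common_type_t` stands in for `__gnu_cxx::__promote_2`, and a depth counter stands in for the endless recursion so the example terminates.

```cpp
#include <cassert>
#include <type_traits>

enum class Picked { DoubleOverload, FloatOverload, SelfRecursion };

namespace broken {
  static int depth = 0;  // guard that stands in for the endless recursion

  // Early overload: declared before the template, so the template sees it.
  inline Picked which(double, double) { return Picked::DoubleOverload; }

  // Promoting template: converts both arguments to their common type and
  // re-dispatches through unqualified lookup.
  template<typename T, typename U>
  Picked which(T x, U y) {
    using P = std::common_type_t<T, U>;
    if (++depth > 4) { depth = 0; return Picked::SelfRecursion; }
    return which(static_cast<P>(x), static_cast<P>(y));
  }

  // Late overload: declared after the template. Fundamental types have no
  // associated namespaces, so ADL cannot find it from inside the template.
  inline Picked which(float, float) { return Picked::FloatOverload; }
}

namespace fixed {
  // Same functions, but every non-template overload precedes the template.
  inline Picked which(double, double) { return Picked::DoubleOverload; }
  inline Picked which(float, float) { return Picked::FloatOverload; }

  template<typename T, typename U>
  Picked which(T x, U y) {
    using P = std::common_type_t<T, U>;
    return which(static_cast<P>(x), static_cast<P>(y));
  }
}

inline void check_overload_order() {
  // (float, int) promotes to (float, float); only `fixed` reaches the overload.
  assert(broken::which(1.0f, 2) == Picked::SelfRecursion);
  assert(fixed::which(1.0f, 2) == Picked::FloatOverload);
}
```

In `broken`, the inner call with two `float` arguments can only choose between the `double` overload and the template itself, and the template is the better match, so it calls itself. Moving the template after the overloads, as in `fixed`, lets the non-template `float` overload win.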

2023-05-17 Jakub Jelinek <jakub@redhat.com>

PR libstdc++/109883
* include/c_global/cmath (atan2, fmod, pow): Move
__gnu_cxx::__promote_2 using templates after _Float{16,32,64,128} and
__gnu_cxx::__bfloat16_t overloads.
(copysign, fdim, fmax, fmin, hypot, nextafter, remainder, remquo):
Likewise.
(fma): Move __gnu_cxx::__promote_3 using template after
_Float{16,32,64,128} and __gnu_cxx::__bfloat16_t overloads.

* testsuite/26_numerics/headers/cmath/constexpr_std_c++23.cc: New test.

libstdc++: Uncomment checks for <limits> enumeration types

I don't know why these checks are disabled.

libstdc++-v3/ChangeLog:

* testsuite/18_support/headers/limits/synopsis.cc: Uncomment
checks for float_round_style and float_denorm_style.

RISC-V: Remove masking third operand of rotate instructions

    Rotate instructions do not need to mask the third operand.
    For example, on RV64 the following code:

    unsigned long foo1(unsigned long rs1, unsigned long rs2)
    {
        long shamt = rs2 & (64 - 1);
        return (rs1 << shamt) | (rs1 >> ((64 - shamt) & (64 - 1)));
    }

    Compiles to:
    foo1:
            andi    a1,a1,63
            rol     a0,a0,a1
            ret

    This patch removes the unnecessary masking.
    It also merges the masking insns for shifts that were written earlier.
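
Why the `andi` is redundant can be sketched in portable code. This is an illustration only: `rotl_idiom` restates the source-level idiom from the example, and `rol_model` is a hypothetical model of an RV64 `rol`, which by the Zbb definition reads only the low six bits of rs2. Neither function is part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Source-level rotate-left idiom, with explicit masks so that no shift
// count ever reaches 64 (which would be undefined behaviour).
inline std::uint64_t rotl_idiom(std::uint64_t rs1, std::uint64_t rs2) {
  std::uint64_t shamt = rs2 & (64 - 1);
  return (rs1 << shamt) | (rs1 >> ((64 - shamt) & (64 - 1)));
}

// Hypothetical model of RV64 `rol`: the instruction itself looks only at
// the low six bits of rs2, then rotates one bit at a time.
inline std::uint64_t rol_model(std::uint64_t rs1, std::uint64_t rs2) {
  for (unsigned i = 0; i < (rs2 & 63); ++i)
    rs1 = (rs1 << 1) | (rs1 >> 63);
  return rs1;
}

inline void check_rotate() {
  const std::uint64_t x = 0x0123456789ABCDEFULL;
  // Pre-masking rs2 changes nothing, because the model masks it anyway.
  assert(rol_model(x, 77) == rol_model(x, 77 & 63));
  assert(rol_model(x, 13) == rotl_idiom(x, 13));
}
```

Since the instruction already ignores the upper bits of the shift amount, a preceding `andi a1,a1,63` cannot change the result.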

gcc/ChangeLog:
* config/riscv/riscv.md (*<optab><GPR:mode>3_mask): New pattern,
combined from ...
(*<optab>si3_mask, *<optab>di3_mask): Here.
(*<optab>si3_mask_1, *<optab>di3_mask_1): And here.
* config/riscv/bitmanip.md (*<bitmanip_optab><GPR:mode>3_mask): New
pattern.
(*<bitmanip_optab>si3_sext_mask): Likewise.
* config/riscv/iterators.md (shiftm1): Use const_si_mask_operand
and const_di_mask_operand.
(bitmanip_rotate): New iterator.
(bitmanip_optab): Add rotates.
* config/riscv/predicates.md (const_si_mask_operand): Renamed
from const31_operand.  Generalize to handle more mask constants.
(const_di_mask_operand): Similarly.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/shift-and-2.c: Fixed test
* gcc.target/riscv/zbb-rol-ror-01.c: New test
* gcc.target/riscv/zbb-rol-ror-02.c: New test
* gcc.target/riscv/zbb-rol-ror-03.c: New test
* gcc.target/riscv/zbb-rol-ror-04.c: New test
* gcc.target/riscv/zbb-rol-ror-05.c: New test
* gcc.target/riscv/zbb-rol-ror-06.c: New test
* gcc.target/riscv/zbb-rol-ror-07.c: New test

libstdc++: Add system_header pragma to <bits/c++config.h>

Without this change many tests that depend on an effective-target will
fail when compiled with -pedantic -std=c++98. This happens because the
preprocessor check done by v3_check_preprocessor_condition uses -Werror
and includes <bits/c++config.h> directly (rather than via another header
like <string>). If <bits/c++config.h> is not a system header then this
pedwarn is not suppressed, and the effective-target check fails:

bits/c++config.h:220: error: anonymous variadic macros were introduced in C++11 [-Werror=variadic-macros]
cc1plus: all warnings being treated as errors
compiler exited with status 1
UNSUPPORTED: 18_support/headers/limits/synopsis.cc

We could consider also changing proc v3_check_preprocessor_condition so
that it includes a real header, rather than just <bits/c++config.h>, but
that's not necessary for now.

libstdc++-v3/ChangeLog:

* include/bits/c++config: Add system_header pragma.

libstdc++: Implement LWG 3877 for std::expected monadic ops

This was approved in Issaquah 2023. As well as fixing the value
categories, this fixes the fact that we were incorrectly testing E
instead of T in the or_else constraints.

libstdc++-v3/ChangeLog:

* include/std/expected (expected::and_then, expected::or_else)
(expected::transform, expected::transform_error): Fix exception
specifications as per LWG 3877.
(expected<void, E>::and_then, expected<void, E>::transform):
Likewise.
* testsuite/20_util/expected/lwg3877.cc: New test.

i386: Fix up types in __builtin_{inf,huge_val,nan{,s},fabs,copysign}q builtins [PR109884]

When _Float128 support was added to C++ for 13.1, a float128t_type_node
tree was added as well. In C, float128_type_node and float128t_type_node
are the same node and represent both _Float128 and __float128, but in C++
they are distinct types which are handled differently in the FEs.
When making that change, I forgot to change the FLOAT128 primitive
type, which is used for the results and some of the arguments of the
__builtin_{inf,huge_val,nan{,s},fabs,copysign}q builtins (and nothing else).

The following patch fixes that.
On ia64 we already use float128t_type_node for those builtins. On pa,
__float128 exists but is the same type as long double, so those builtins
have long double types. On powerpc it seems we don't have these builtins
but instead define macros which map them to __builtin_*f128.  That will
not work properly in C++; perhaps we should change those macros to be
function-like and cast to __float128.

2023-05-17  Jakub Jelinek  <jakub@redhat.com>

PR c++/109884
* config/i386/i386-builtin-types.def (FLOAT128): Use
float128t_type_node rather than float128_type_node.

* c-c++-common/pr109884.c: New test.

tree-ssa-math-opts: correct -ffp-contract= check

Since tree-ssa-math-opts may freely contract across statement boundaries
we should enable it only for -ffp-contract=fast instead of disabling it
for -ffp-contract=off.

No functional change, since -ffp-contract=on is not exposed yet.
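
Contraction matters because a fused multiply-add rounds once where a separate multiply and add round twice. The sketch below illustrates that difference at the source level; it does not model the pass itself, and the function names are made up for the example.

```cpp
#include <cassert>
#include <cmath>

// Product rounded to double first, then added: two roundings.
inline double mul_then_add(double a, double b, double c) {
  volatile double product = a * b;  // volatile keeps the rounded product separate
  return product + c;
}

// Fused multiply-add: a single rounding of the exact a*b + c.
inline double fused(double a, double b, double c) {
  return std::fma(a, b, c);
}

inline void check_contraction() {
  const double a = 1.0 + std::ldexp(1.0, -27);
  const double b = 1.0 - std::ldexp(1.0, -27);
  // The exact product is 1 - 2^-54, which rounds to 1.0 as a double.
  assert(mul_then_add(a, b, -1.0) == 0.0);
  assert(fused(a, b, -1.0) == -std::ldexp(1.0, -54));
}
```

The two results differ, which is why contracting `a * b` from one statement with an addition in another is only appropriate under -ffp-contract=fast.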

gcc/ChangeLog:

* tree-ssa-math-opts.cc (convert_mult_to_fma): Enable only for
FP_CONTRACT_FAST (no functional change).

i386: Adjust emulated integer vector mode multiplication costs

The integer vector mode costs that ix86_multiplication_cost returns for
emulated modes are wrong and do not reflect the generated instruction
sequences. Rewrite the handling of different integer vector modes and
different target ABIs to return real instruction counts in order to
calculate better costs of various emulated modes.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_multiplication_cost): Correct
calculation of integer vector mode costs to reflect generated
instruction sequences of different integer vector modes and
different target ABIs.

WriteInt in the ISO libraries should not emit '+' for positive values

This trivial patch changes the default behaviour for WriteInt so that
'+' is not emitted when writing positive values.

gcc/m2/ChangeLog:

* gm2-libs-iso/LongWholeIO.mod (WriteInt): Only request a
sign if the value is < 0.
* gm2-libs-iso/ShortWholeIO.mod (WriteInt): Only request a
sign if the value is < 0.
* gm2-libs-iso/WholeIO.mod (WriteInt): Only request a sign
if the value is < 0.
* gm2-libs-iso/WholeStr.mod (WriteInt): Only request a sign
if the value is < 0.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

libstdc++: use __bool_constant instead of integral_constant

In the type_traits header, both integral_constant<bool> and __bool_constant
are used. This patch unifies those usages into __bool_constant.
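
The two spellings are interchangeable because the alias names the same type. A minimal sketch, assuming a hypothetical alias `my_bool_constant` in place of the internal `__bool_constant`, and two made-up traits:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for libstdc++'s internal __bool_constant alias.
template<bool B>
using my_bool_constant = std::integral_constant<bool, B>;

// Both spellings name the same type.
static_assert(std::is_same<my_bool_constant<true>, std::true_type>::value,
              "alias and integral_constant<bool, true> are the same type");

// The same made-up trait written both ways.
template<typename T>
struct is_small_a : std::integral_constant<bool, (sizeof(T) <= 2)> {};

template<typename T>
struct is_small_b : my_bool_constant<(sizeof(T) <= 2)> {};

inline void check_traits() {
  assert(is_small_a<char>::value && is_small_b<char>::value);
  assert(!is_small_a<long long>::value && !is_small_b<long long>::value);
}
```

Because the base class is identical in both traits, switching spellings is purely a consistency change.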

libstdc++-v3/ChangeLog:

* include/std/type_traits: Use __bool_constant instead of
integral_constant.

Signed-off-by: Ken Matsui <kmatsui@cs.washington.edu>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>

RISC-V: Add mode switching target hook to insert rounding mode config for fixed-point instructions

This patch supports the upcoming fixed-point intrinsics:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222

Insert the fixed-point rounding mode configuration through the mode
switching target hook.

Mode switching is implemented using LCM (Lazy Code Motion), so both
performance and correctness can be trusted.

Here is an example:

void f (void * in, void *out, int32_t x, int n, int m)
{
  for (int i = 0; i < n; i++) {
    vint32m1_t v = __riscv_vle32_v_i32m1 (in + i, 4);
    vint32m1_t v2 = __riscv_vle32_v_i32m1_tu (v, in + 100 + i, 4);
    vint32m1_t v3 = __riscv_vaadd_vx_i32m1 (v2, 0, VXRM_RDN, 4);
    v3 = __riscv_vaadd_vx_i32m1 (v3, 3, VXRM_RDN, 4);
    __riscv_vse32_v_i32m1 (out + 100 + i, v3, 4);
  }

  for (int i = 0; i < n; i++) {
    vint32m1_t v = __riscv_vle32_v_i32m1 (in + i + 1000, 4);
    vint32m1_t v2 = __riscv_vle32_v_i32m1_tu (v, in + 100 + i + 1000, 4);
    vint32m1_t v3 = __riscv_vaadd_vx_i32m1 (v2, 0, VXRM_RDN, 4);
    v3 = __riscv_vaadd_vx_i32m1 (v3, 3, VXRM_RDN, 4);
    __riscv_vse32_v_i32m1 (out + 100 + i + 1000, v3, 4);
  }
}

ASM:

...
csrwi   vxrm,2
vsetivli        zero,4,e32,m1,tu,ma
...
Loop 1
...
Loop 2

Mode switching recognizes globally that both Loop 1 and Loop 2 use the RDN
rounding mode and hoists a single "csrwi vxrm,2" so that it dominates both
loops.

This patch also adds sanity tests that check correctness.

OK for trunk?

gcc/ChangeLog:

* config/riscv/riscv-opts.h (enum riscv_entity): New enum.
* config/riscv/riscv.cc (riscv_emit_mode_set): New function.
(riscv_mode_needed): Ditto.
(riscv_mode_after): Ditto.
(riscv_mode_entry): Ditto.
(riscv_mode_exit): Ditto.
(riscv_mode_priority): Ditto.
(TARGET_MODE_EMIT): New target hook.
(TARGET_MODE_NEEDED): Ditto.
(TARGET_MODE_AFTER): Ditto.
(TARGET_MODE_ENTRY): Ditto.
(TARGET_MODE_EXIT): Ditto.
(TARGET_MODE_PRIORITY): Ditto.
* config/riscv/riscv.h (OPTIMIZE_MODE_SWITCHING): Ditto.
(NUM_MODES_FOR_MODE_SWITCHING): Ditto.
* config/riscv/riscv.md: Add csrwvxrm.
* config/riscv/vector.md (rnu,rne,rdn,rod,none): New attribute.
(vxrmsi): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/vxrm-10.c: New test.
* gcc.target/riscv/rvv/base/vxrm-6.c: New test.
* gcc.target/riscv/rvv/base/vxrm-7.c: New test.
* gcc.target/riscv/rvv/base/vxrm-8.c: New test.
* gcc.target/riscv/rvv/base/vxrm-9.c: New test.

RISC-V: Introduce rounding mode operand into fixed-point intrinsics

According to the upcoming fixed-point API:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222

Introduce vxrm argument:
- vint32m1_t __riscv_vsadd_vv_i32m1 (vint32m1_t op1, vint32m1_t op2, size_t vl);
+ vint32m1_t __riscv_vsadd_vv_i32m1 (vint32m1_t op1, vint32m1_t op2, size_t vxrm, size_t vl);

This patch doesn't insert the csrw instruction that configures vxrm yet.
Automatic insertion of the csrw vxrm instruction will be supported in the
next patch.

This patch does the following:
1. Only extends the intrinsics with the vxrm argument.
2. Checks whether the vxrm argument is a valid immediate and reports an error message if it is not.

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Introduce rounding mode.
* config/riscv/riscv-vector-builtins-shapes.cc (struct alu_def): Ditto.
(struct narrow_alu_def): Ditto.
* config/riscv/riscv-vector-builtins.cc (function_builder::apply_predication): Ditto.
(function_expander::use_exact_insn): Ditto.
* config/riscv/riscv-vector-builtins.h (function_checker::arg_num): New function.
(function_base::has_rounding_mode_operand_p): New function.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/base/bug-11.C: Adapt testcase.
* g++.target/riscv/rvv/base/bug-12.C: Ditto.
* g++.target/riscv/rvv/base/bug-14.C: Ditto.
* g++.target/riscv/rvv/base/bug-15.C: Ditto.
* g++.target/riscv/rvv/base/bug-16.C: Ditto.
* g++.target/riscv/rvv/base/bug-17.C: Ditto.
* g++.target/riscv/rvv/base/bug-18.C: Ditto.
* g++.target/riscv/rvv/base/bug-19.C: Ditto.
* g++.target/riscv/rvv/base/bug-20.C: Ditto.
* g++.target/riscv/rvv/base/bug-21.C: Ditto.
* g++.target/riscv/rvv/base/bug-22.C: Ditto.
* g++.target/riscv/rvv/base/bug-23.C: Ditto.
* g++.target/riscv/rvv/base/bug-3.C: Ditto.
* g++.target/riscv/rvv/base/bug-5.C: Ditto.
* g++.target/riscv/rvv/base/bug-6.C: Ditto.
* g++.target/riscv/rvv/base/bug-8.C: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-100.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-101.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-102.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-103.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-104.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-105.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-106.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-107.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-108.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-109.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-110.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-111.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-112.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-113.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-114.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-115.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-116.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-117.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-118.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-119.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-122.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-97.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-98.c: Ditto.
* gcc.target/riscv/rvv/base/merge_constraint-1.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-6.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-7.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-8.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-9.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-2.c: New test.
* gcc.target/riscv/rvv/base/vxrm-3.c: New test.
* gcc.target/riscv/rvv/base/vxrm-4.c: New test.
* gcc.target/riscv/rvv/base/vxrm-5.c: New test.

Fix PR 106900: array-bounds warning inside simplify_builtin_call

The problem here is that VRP cannot figure out that isize cannot be 0
because the check uses integer_zerop. This patch removes the use of integer_zerop
and instead checks for 0 directly after converting the tree to
an unsigned HOST_WIDE_INT. This allows VRP to figure out isize is not 0
and `isize - 1` will always be >= 0.

This patch is just to avoid the warning that GCC could produce sometimes
and does not change any code generation or even VRP.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-forwprop.cc (simplify_builtin_call): Check
against 0 instead of calling integer_zerop.

RISC-V: Add rounding mode enum for fixed-point intrinsics

Since fixed-point intrinsics that model the rounding mode are coming:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222

this patch exposes the vxrm rounding mode enum to users first, before the
intrinsic API itself.

This patch is simple and obvious.

OK for trunk?

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins.cc (register_vxrm): New function.
(DEF_RVV_VXRM_ENUM): New macro.
(handle_pragma_vector): Add vxrm enum register.
* config/riscv/riscv-vector-builtins.def (DEF_RVV_VXRM_ENUM): New macro.
(RNU): Ditto.
(RNE): Ditto.
(RDN): Ditto.
(ROD): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/vxrm-1.c: New test.

Add Value_Range::operator=.

gcc/ChangeLog:

* value-range.h (Value_Range::operator=): New.

Provide support for copying unsupported ranges.

The unsupported_range class is provided for completeness' sake.  It is a
way to set VARYING/UNDEFINED ranges for unsupported ranges (currently
anything not float, integer, or pointer).  You can't do anything with
them except set_varying and set_undefined.  We will trap on any
other operation.

This patch provides a way to copy them, just in case they creep in.
This could happen in IPA under certain circumstances.

gcc/ChangeLog:

* value-range.cc (vrange::operator=): Add a stub to copy
unsupported ranges.
* value-range.h (is_a <unsupported_range>): New.
(Value_Range::operator=): Support copying unsupported ranges.

Add support for vrange streaming.

I think it's time for the ranger folk to start owning range streaming
instead of passes (IPA, etc) doing their own thing. I have plans for
overhauling the IPA code later this cycle to support generic ranges,
and I'd like to start cleaning up the streaming and hashing interface.

This patch adds generic streaming support for vrange.

gcc/ChangeLog:

* data-streamer-in.cc (streamer_read_real_value): New.
(streamer_read_value_range): New.
* data-streamer-out.cc (streamer_write_real_value): New.
(streamer_write_vrange): New.
* data-streamer.h (streamer_write_vrange): New.
(streamer_read_value_range): New.

doc: Describe behaviour of enums with fixed underlying type [PR109532]

gcc/ChangeLog:

PR c++/109532
* doc/invoke.texi (Code Gen Options): Note that -fshort-enums
is ignored for a fixed underlying type.
(C++ Dialect Options): Likewise for -fstrict-enums.

Reviewed-by: Marek Polacek <polacek@redhat.com>
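
What "fixed underlying type" means for size and value range can be sketched as follows. The enum names are made up for illustration; the properties checked follow from the C++ standard and do not depend on either option.

```cpp
#include <cassert>
#include <type_traits>

// The enumerators would fit in a byte, but the underlying type is fixed.
enum Tiny : long long { TinyZero, TinyOne };

enum class Flag : unsigned char { Off, On };

// The size always matches the fixed underlying type.
static_assert(sizeof(Tiny) == sizeof(long long),
              "size follows the fixed underlying type");
static_assert(std::is_same<std::underlying_type<Flag>::type,
                           unsigned char>::value,
              "underlying type is exactly what was written");

// Every value of the underlying type is a valid value of the enum,
// not only the named enumerators.
inline unsigned round_trip(unsigned char v) {
  return static_cast<unsigned>(static_cast<Flag>(v));
}

inline void check_enums() {
  assert(round_trip(200) == 200u);
}
```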

Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings

Previously, array descriptors might have been mapped as 'alloc'
instead of 'to' for 'alloc', not updating the array bounds. The
'alloc' could also appear for 'data exit', failing with a libgomp
assert. In some cases, either array descriptors or deferred-length
string's length variable was not mapped. And, finally, some offset
calculations with array-sections mappings went wrong.

Additionally, the patch now unmaps for scalar allocatables/pointers
the GOMP_MAP_POINTER, avoiding stale mappings.

The testcases contain some commented-out tests which require follow-up
work and for which PRs exist. Those mostly relate to deferred-length
strings, which have several issues beyond OpenMP support.

gcc/fortran/ChangeLog:

* trans-decl.cc (gfc_get_symbol_decl): Add attributes
such as 'declare target' also to hidden artificial
variable for deferred-length character variables.
* trans-openmp.cc (gfc_trans_omp_array_section,
gfc_trans_omp_clauses, gfc_trans_omp_target_exit_data):
Improve mapping of array descriptors and deferred-length
string variables.

gcc/ChangeLog:

* gimplify.cc (gimplify_scan_omp_clauses): Remove Fortran
special case.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/target-enter-data-3.f90: Uncomment
'target exit data'.
* testsuite/libgomp.fortran/target-enter-data-4.f90: New test.
* testsuite/libgomp.fortran/target-enter-data-5.f90: New test.
* testsuite/libgomp.fortran/target-enter-data-6.f90: New test.
* testsuite/libgomp.fortran/target-enter-data-7.f90: New test.

gcc/testsuite/
* gfortran.dg/goacc/finalize-1.f: Update dg-tree; shows a fix
for 'finalize' as a ptr is now 'delete' instead of 'release'.
* gfortran.dg/gomp/pr78260-2.f90: Likewise as elem-size calc moved
to if (allocated) block.
* gfortran.dg/gomp/target-exit-data.f90: Likewise as a var is now
replaced by a MEM< _25 > expression.
* gfortran.dg/gomp/map-9.f90: Update dg-scan-tree-dump.
* gfortran.dg/gomp/map-10.f90: New test.

libstdc++: Regenerate configure

I added a comment to configure.ac and forgot to regenerate configure.

libstdc++-v3/ChangeLog:

* configure: Regenerate.

s390: Implement TARGET_ATOMIC_ALIGN_FOR_MODE

So far atomic objects are aligned according to their default alignment.
For 128-bit scalar types like int128 or long double this results in an
8-byte alignment, which is wrong; it must be 16 bytes.

libstdc++ already computes the correct alignment. Still, add a test
case to make sure that both implementations are compatible.

gcc/ChangeLog:

* config/s390/s390.cc (TARGET_ATOMIC_ALIGN_FOR_MODE):
New.
(s390_atomic_align_for_mode): New.

gcc/testsuite/ChangeLog:

* g++.target/s390/atomic-align-1.C: New test.
* gcc.target/s390/atomic-align-1.c: New test.
* gcc.target/s390/atomic-align-2.c: New test.

wide-int: Fix up function comment

When looking into _BitInt support, I've noticed unterminated parens in
a function comment.
Fixing thusly.

2023-05-17 Jakub Jelinek <jakub@redhat.com>

* wide-int.cc (wi::from_array): Add missing closing paren in function
comment.

c++: Don't try to initialize zero width bitfields in zero initialization [PR109868]

My GCC 12 change to avoid removing zero-sized bitfields as they are
important for ABI and are needed for layout compatibility traits
apparently causes zero sized bitfields to be initialized in the IL,
which at least in 13+ results in ICEs in the ranger which is upset
about zero precision types.

I think we could even avoid initializing other unnamed bitfields, but
unfortunately !CONSTRUCTOR_NO_CLEARING doesn't mean in the middle-end
clearing of padding bits and until we have some new flag that represents
the request to clear padding bits, I think it is better to keep zeroing
non-zero sized unnamed bitfields.

In addition to skipping those fields, I have changed the logic how
UNION_TYPEs are handled, the current code was a little bit weird in that
e.g. if first non-static data member had error_mark_node type, we'd happily
zero initialize the second non-static data member, etc.
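
For orientation, this is the kind of type involved. It is a minimal sketch, not the PR testcase; the struct and function names are made up.

```cpp
#include <cassert>

struct WithZeroWidth {
  int a : 3;
  int : 0;   // zero-width unnamed bit-field: affects layout, holds no value
  int b : 5;
};

// Value-initialization of this type performs zero-initialization, which
// has to cover the named members but has nothing to store for `int : 0`.
inline WithZeroWidth make_zeroed() {
  return WithZeroWidth();
}

inline void check_zero_init() {
  WithZeroWidth w = make_zeroed();
  assert(w.a == 0 && w.b == 0);
}
```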

2023-05-17 Jakub Jelinek <jakub@redhat.com>

PR c++/109868
* init.cc (build_zero_init_1): Don't initialize zero-width bitfields.
For unions only initialize the first FIELD_DECL.

* g++.dg/init/pr109868.C: New test.

vect: Don't retry if the previous analysis fails

When working on a cost tweaking patch, I found that a newly
added test case produces different dumps with stage-1 and
bootstrapped gcc.  Looking into it, the apparent reason is
that vect_analyze_loop_2 doesn't set slp_done_for_suggested_uf
as expected, so the following retry uses a garbage
slp_done_for_suggested_uf instead.  In fact,
slp_done_for_suggested_uf is only set when the previous
analysis succeeds.  For the mentioned test case the previous
analysis fails, so the value of slp_done_for_suggested_uf
should not be used any more.

In function vect_analyze_loop_1, we only return success when
res is true, which is the result of the first analysis.  It means
we never try to vectorize with unroll_vinfo if the previous
analysis fails.  So this patch shouldn't break anything and
just stops some useless analysis early.

gcc/ChangeLog:

* tree-vect-loop.cc (vect_analyze_loop_1): Don't retry analysis with
suggested unroll factor once the previous analysis fails.

RISC-V: Support RVV VREINTERPRET from v{u}int*_t to vbool1_t

This patch supports RVV VREINTERPRET from integer types to vbool1_t, i.e.:

vbool1_t __riscv_vreinterpret_xx_xx(v{u}int[8|16|32|64]_t);

These APIs help users convert a vector LMUL=1 integer to vbool1_t.
According to the RVV intrinsic spec linked below, the reinterpret intrinsics
only change the types of the underlying contents.

https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#reinterpret-vbool-o-vintm1

For example, given the code below:
vbool1_t test_vreinterpret_v_i8m1_b1(vint8m1_t src) {
  return __riscv_vreinterpret_v_i8m1_b1(src);
}

it generates assembly code similar to the following:
vsetvli a5,zero,e8,m8,ta,ma
vlm.v   v1,0(a1)
vsm.v   v1,0(a0)
ret

The intrinsic APIs for the remaining bool sizes will be prepared in other patches.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/genrvv-type-indexer.cc (BOOL_SIZE_LIST): New
macro.
(main): Add bool1 to the type indexer.
* config/riscv/riscv-vector-builtins-functions.def
(vreinterpret): Register vbool1 interpret function.
* config/riscv/riscv-vector-builtins-types.def
(DEF_RVV_BOOL1_INTERPRET_OPS): New macro.
(vint8m1_t): Add the type to bool1_interpret_ops.
(vint16m1_t): Ditto.
(vint32m1_t): Ditto.
(vint64m1_t): Ditto.
(vuint8m1_t): Ditto.
(vuint16m1_t): Ditto.
(vuint32m1_t): Ditto.
(vuint64m1_t): Ditto.
* config/riscv/riscv-vector-builtins.cc
(DEF_RVV_BOOL1_INTERPRET_OPS): New macro.
(required_extensions_p): Add bool1 interpret case.
* config/riscv/riscv-vector-builtins.def
(bool1_interpret): Add bool1 interpret to base type.
* config/riscv/vector.md (@vreinterpret<mode>): Add new expand
with VB dest for vreinterpret.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c: New test.

Disable warnings as errors for STAGEautofeedback.

Compilation during STAGEautofeedback produces additional warnings
since inlining decisions with -fauto-profile are different from
other builds.

This patch disables warnings as errors for STAGEautofeedback.

Tested on x86_64-pc-linux-gnu.

ChangeLog:

* Makefile.in: Disable warnings as errors for STAGEautofeedback

rs6000: use lis;xoris to build constant

For a constant c:
if '(c & 0xFFFFFFFF0000FFFFULL) == 0xFFFFFFFF00000000', or in other words
c has the form 32(1) || 1(0) || 15(x) || 16(0), we can use "lis; xoris" to
build it.

Here N(M) means N consecutive bits of value M, x for M means that either
1 or 0 is fine, and '||' means concatenation.
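
The bit pattern can be checked with a small simulation. This is a sketch under stated assumptions, not the code in rs6000_emit_set_long_const: `lis` and `xoris` here are simplified models of the instructions, the helper names are made up, the immediates chosen in `build` are one possible sequence, and the mask in `fits_lis_xoris` follows the 32(1) || 1(0) || 15(x) || 16(0) description, so it also tests bit 31.

```cpp
#include <cassert>
#include <cstdint>

// True for 64-bit constants of the form 32(1) || 1(0) || 15(x) || 16(0).
inline bool fits_lis_xoris(std::uint64_t c) {
  return (c & 0xFFFFFFFF8000FFFFULL) == 0xFFFFFFFF00000000ULL;
}

// Model of `lis rD, imm`: rD = (imm << 16), sign-extended to 64 bits.
inline std::uint64_t lis(std::uint16_t imm) {
  std::int32_t v =
      static_cast<std::int32_t>(static_cast<std::uint32_t>(imm) << 16);
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(v));
}

// Model of `xoris rA, rS, imm`: rA = rS ^ (imm << 16), imm zero-extended.
inline std::uint64_t xoris(std::uint64_t rs, std::uint16_t imm) {
  return rs ^ (static_cast<std::uint64_t>(imm) << 16);
}

// Assumed sequence: set bit 31 in the lis immediate so that sign extension
// fills the upper 32 bits with ones, then flip bit 31 back with xoris.
inline std::uint64_t build(std::uint64_t c) {
  std::uint16_t hi = static_cast<std::uint16_t>(c >> 16);
  return xoris(lis(static_cast<std::uint16_t>(hi | 0x8000)), 0x8000);
}

inline void check_lis_xoris() {
  const std::uint64_t c = 0xFFFFFFFF12340000ULL;
  assert(fits_lis_xoris(c));
  assert(build(c) == c);
}
```

A constant with bit 31 already set, such as 0xFFFFFFFF92340000, does not match the pattern; in this model a single `lis` already produces it.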

This patch updates rs6000_emit_set_long_const to support those constants.

Compared with the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608292.html
this patch updates test function names only.

Bootstrap and regtest pass on ppc64{,le}.

PR target/106708

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Support building
constants through "lis; xoris".

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr106708.c: Add test function.

Daily bump.

c: Remove restrictions on declarations in 'for' loops for C2X

C2X removes a restriction that the only declarations in the
declaration part of a 'for' loop are declarations of objects with
storage class auto or register. Implement this change, making the
diagnostics into pedwarn_c11 calls instead of errors (as usual for
features added in a new standard version that were invalid code in a
previous version), so now pedwarn-if-pedantic for older standards and
diagnosed also with -Wc11-c2x-compat.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/c/
* c-decl.cc (check_for_loop_decls): Use pedwarn_c11 for
diagnostics.

gcc/testsuite/
* gcc.dg/c11-fordecl-1.c, gcc.dg/c11-fordecl-2.c,
gcc.dg/c11-fordecl-3.c, gcc.dg/c11-fordecl-4.c,
gcc.dg/c2x-fordecl-1.c, gcc.dg/c2x-fordecl-2.c,
gcc.dg/c2x-fordecl-3.c, gcc.dg/c2x-fordecl-4.c: New tests.
* gcc.dg/c99-fordecl-2.c: Test diagnostic for typedef declaration
in for loop here.
* gcc.dg/pr67784-2.c, gcc.dg/pr68320.c, objc.dg/foreach-7.m: Do
not expect errors for typedef declaration in for loop.

PR modula2/109879 WholeIO.ReadCard and ReadInt should consume leading space

The Read{TYPE} procedures in LongIO, LongWholeIO, RealIO, ShortWholeIO and
WholeIO all require skip space functionality. A new module TextUtil
is supplied with this functionality and the previous modules have been
changed to call SkipSpaces.

gcc/m2/ChangeLog:

PR modula2/109879
* gm2-libs-iso/LongIO.mod (ReadReal): Call SkipSpaces.
* gm2-libs-iso/LongWholeIO.mod (ReadInt): Call SkipSpaces.
(ReadCard): Call SkipSpaces.
* gm2-libs-iso/RealIO.mod (ReadReal): Call SkipSpaces.
* gm2-libs-iso/ShortWholeIO.mod (ReadInt): Call SkipSpaces.
(ReadCard): Call SkipSpaces.
* gm2-libs-iso/TextIO.mod: Import SkipSpaces.
* gm2-libs-iso/WholeIO.mod (ReadInt): Call SkipSpaces.
(ReadCard): Call SkipSpaces.
* gm2-libs-iso/TextUtil.def: New file.
* gm2-libs-iso/TextUtil.mod: New file.

libgm2/ChangeLog:

PR modula2/109879
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* libm2cor/Makefile.in: Regenerate.
* libm2iso/Makefile.am (M2DEFS): Add TextUtil.def.
(M2MODS): Add TextUtil.mod.
* libm2iso/Makefile.in: Regenerate.
* libm2log/Makefile.in: Regenerate.
* libm2min/Makefile.in: Regenerate.
* libm2pim/Makefile.in: Regenerate.

gcc/testsuite/ChangeLog:

PR modula2/109879
* gm2/isolib/run/pass/testreadint.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

c++: -Wdangling-reference not suppressed in template [PR109774]

In check_return_expr, we suppress the -Wdangling-reference warning when
we're sure it would be a false positive. It wasn't working in a
template, though, because the suppress_warning call was never reached.

PR c++/109774

gcc/cp/ChangeLog:

* typeck.cc (check_return_expr): In a template, return only after
suppressing -Wdangling-reference.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wdangling-reference13.C: New test.

libstdc++: Disable cacheline alignment for DJGPP [PR109741]

DJGPP (and maybe other targets) uses MAX_OFILE_ALIGNMENT=16 which means
that globals (and static objects) can't have alignment greater than 16.
This causes an error for the locks defined in src/c++11/shared_ptr.cc
because we try to align them to the cacheline size, to avoid false
sharing.

Add a configure check for the increased alignment, and live with false
sharing where we can't increase the alignment.
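
The trade-off can be sketched as follows. This is an illustration, not the code in src/c++11/shared_ptr.cc: the type names are made up, and the fallback of 64 when `__GCC_DESTRUCTIVE_SIZE` is not defined is an assumption of the example.

```cpp
#include <cassert>
#include <cstddef>

#ifdef __GCC_DESTRUCTIVE_SIZE
constexpr std::size_t kLine = __GCC_DESTRUCTIVE_SIZE;
#else
constexpr std::size_t kLine = 64;  // assumed fallback for other compilers
#endif

// Each slot occupies a full cacheline, so neighbouring locks in a table
// never share a line. This needs the target to support the alignment.
struct alignas(kLine) PaddedSlot { unsigned char lock_word; };

// Without the alignment, slots are packed and may share a cacheline.
struct PlainSlot { unsigned char lock_word; };

inline void check_slots() {
  assert(alignof(PaddedSlot) == kLine);
  assert(sizeof(PaddedSlot[4]) == 4 * kLine);
  assert(sizeof(PlainSlot[4]) == 4);
}
```

On a target whose object files cannot express alignment above 16, a static object of a type like `PaddedSlot` is rejected, which is why the packed layout is accepted there.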

libstdc++-v3/ChangeLog:

PR libstdc++/109741
* acinclude.m4 (GLIBCXX_CHECK_ALIGNAS_CACHELINE): Define.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use GLIBCXX_CHECK_ALIGNAS_CACHELINE.
* src/c++11/shared_ptr.cc (__gnu_internal::get_mutex): Do not
align lock table if not supported. Use __GCC_DESTRUCTIVE_SIZE
instead of hardcoded 64.

c++: desig init in presence of list ctor [PR109871]

add_list_candidates has logic to reject designated initialization of a
non-aggregate type, but this is inadvertently being suppressed if the type
has a list constructor due to the order of case analysis, which in the
below testcase leads to us incorrectly treating the initializer list as if
it's non-designated. This patch fixes this by making us check for invalid
designated initialization sooner.

PR c++/109871

gcc/cp/ChangeLog:

* call.cc (add_list_candidates): Check for invalid designated
initialization sooner and even for types that have a list
constructor.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/desig27.C: New test.

rs6000: Enable REE pass by default

Add ree pass as a default pass for rs6000 target for
O2 and above.

2023-05-16 Ajit Kumar Agarwal <aagarwa1@linux.ibm.com>

gcc/ChangeLog:

* common/config/rs6000/rs6000-common.cc: Add REE pass as a
default rs6000 target pass for O2 and above.
* doc/invoke.texi: Document -free

RISC-V: Fix wrong select_kind in riscv_compute_multilib

It seems I broke bare-metal toolchain multilib selection while
fixing Linux multilib selection...

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_compute_multilib):
Fix wrong select_kind...

rs6000: Fix test int_128bit-runnable.c instruction counts

The test reports two failures on Power 10LE:

FAIL: .../int_128bit-runnable.c scan-assembler-times \\\\mvdivsq\\\\M 1
FAIL: .../int_128bit-runnable.c scan-assembler-times \\\\mvextsd2q\\\\M 6

The current counts are:

  vdivsq   3
  vextsd2q 4

The counts changed with commit:

  commit 852b11da11a181df517c0348df044354ff0656d6
  Author: Michael Meissner <meissner@linux.ibm.com>
  Date:   Wed Jul 7 21:55:38 2021 -0400

      Generate 128-bit int divide/modulus on power10.

      This patch adds support for the VDIVSQ, VDIVUQ, VMODSQ, and VMODUQ
      instructions to do 128-bit arithmetic.

      2021-07-07  Michael Meissner  <meissner@linux.ibm.com>

The code generation changed significantly.  There are two places where
the vextsd2q is "replaced" by a vdivsq instruction thus increasing the
vdivsq count from 1 to 3.  The first case is:

expected_result = vec_arg1[0]/4;
    10000af8:   60 01 df e8     ld      r6,352(r31)
    10000afc:   68 01 ff e8     ld      r7,360(r31)
    10000b00:   76 fe e9 7c     sradi   r9,r7,63
    10000b04:   67 4b 00 7c     mtvsrdd vs32,0,r9
    10000b08:   02 06 1b 10     vextsd2q v0,v0         <----
    10000b0c:   03 00 40 39     li      r10,3
    10000b10:   00 00 60 39     li      r11,0
    10000b14:   67 00 09 7c     mfvrd   r9,v0
    10000b18:   67 02 08 7c     mfvsrld r8,vs32
    10000b1c:   38 50 08 7d     and     r8,r8,r10
    10000b20:   38 58 29 7d     and     r9,r9,r11
    10000b24:   78 4b 2b 7d     mr      r11,r9
    10000b28:   78 43 0a 7d     mr      r10,r8
    10000b2c:   14 30 4a 7f     addc    r26,r10,r6
    10000b30:   14 39 6b 7f     adde    r27,r11,r7
    10000b34:   46 f0 69 7b     sldi    r9,r27,62
    10000b38:   82 f0 58 7b     srdi    r24,r26,2
    10000b3c:   78 c3 38 7d     or      r24,r9,r24
    10000b40:   74 16 79 7f     sradi   r25,r27,2
    10000b44:   30 00 1f fb     std     r24,48(r31)
    10000b48:   38 00 3f fb     std     r25,56(r31)

To:

   expected_result = vec_arg1[0]/4;
    10000af8:   69 01 1f f4     lxv     vs32,352(r31)
    10000afc:   04 00 20 39     li      r9,4
    10000b00:   00 00 40 39     li      r10,0
    10000b04:   67 4b 2a 7c     mtvsrdd vs33,r10,r9
    10000b08:   0b 09 00 10     vdivsq  v0,v0,v1       <----
    10000b0c:   3d 00 1f f4     stxv    vs32,48(r31)

The second case where a vextsd2q instruction is replaced with vdivsq:

From:

  expected_result = arg1/16;
    10000c24:   40 00 df e8     ld      r6,64(r31)
    10000c28:   48 00 ff e8     ld      r7,72(r31)
    10000c2c:   76 fe e9 7c     sradi   r9,r7,63
    10000c30:   67 4b 00 7c     mtvsrdd vs32,0,r9
    10000c34:   02 06 1b 10     vextsd2q v0,v0        <---
    10000c38:   0f 00 40 39     li      r10,15
    10000c3c:   00 00 60 39     li      r11,0
    10000c40:   67 00 09 7c     mfvrd   r9,v0
    10000c44:   67 02 08 7c     mfvsrld r8,vs32
    10000c48:   38 50 08 7d     and     r8,r8,r10
    10000c4c:   38 58 29 7d     and     r9,r9,r11
    10000c50:   78 4b 2b 7d     mr      r11,r9
    10000c54:   78 43 0a 7d     mr      r10,r8
    10000c58:   14 30 ca 7e     addc    r22,r10,r6
    10000c5c:   14 39 eb 7e     adde    r23,r11,r7
    10000c60:   c6 e0 e9 7a     sldi    r9,r23,60
    10000c64:   02 e1 d4 7a     srdi    r20,r22,4
    10000c68:   78 a3 34 7d     or      r20,r9,r20
    10000c6c:   74 26 f5 7e     sradi   r21,r23,4
    10000c70:   30 00 9f fa     std     r20,48(r31)
    10000c74:   38 00 bf fa     std     r21,56(r31)

To:

  expected_result = arg1/16;
    10000be8:   49 00 1f f4     lxv     vs32,64(r31)
    10000bec:   10 00 20 39     li      r9,16
    10000bf0:   00 00 40 39     li      r10,0
    10000bf4:   67 4b 2a 7c     mtvsrdd vs33,r10,r9
    10000bf8:   0b 09 00 10     vdivsq  v0,v0,v1       <---
    10000bfc:   3d 00 1f f4     stxv    vs32,48(r31)

The patch has been tested on Power10LE with no regressions.

gcc/testsuite/
* gcc.target/powerpc/int_128bit-runnable.c: Update expected
instruction counts.
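
For readers following the disassembly above, the older sequence is the usual shift-based form of a signed division by a power of two: take the sign, mask it with (divisor - 1) to get a bias, add the bias, then shift right arithmetically. The sketch below shows that arithmetic on 64-bit values rather than 128-bit ones, purely for portability. The function names are hypothetical, and the code assumes that >> on a negative signed value is an arithmetic shift, which GCC guarantees but ISO C does not.

```c
#include <stdint.h>

/* Signed division by 2^k, truncating toward zero, computed with a bias
   and an arithmetic shift.  Assumes 0 < k < 63 and that >> on a negative
   value shifts in copies of the sign bit (implementation-defined in C,
   arithmetic in GCC). */
int64_t div_pow2_shift(int64_t x, unsigned k)
{
    int64_t sign = x >> 63;                          /* 0 for x >= 0, -1 for x < 0 */
    int64_t bias = sign & (((int64_t)1 << k) - 1);   /* 2^k - 1 only when x < 0 */
    return (x + bias) >> k;                          /* biased arithmetic shift */
}

/* The same operation written as a plain division, leaving the choice of
   instruction sequence to the compiler. */
int64_t div_pow2_plain(int64_t x, unsigned k)
{
    return x / ((int64_t)1 << k);
}
```

Both functions return the same value for every x, so either code sequence is a valid expansion of the division; only the instruction counts the test scans for differ.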

rs6000: Fix test gcc.target/powerpc/rs6000-fpint.c test options

The test compile option rs6000-*-* is outdated and no longer supported.
The powerpc*-*-* is the default, so it doesn't need to be specified.
The dg-options need to specify an older processor to get the desired
behavior on recent processors: gfxopt is only off for very old CPUs, and
we don't guard stfiwx under it for recent processors and don't want to.

This patch updates the test specifications so the test will run properly on
Power10LE. Tested on Power10 LE system with no regression test failures.

gcc/testsuite/:
* gcc.target/powerpc/rs6000-fpint.c: Update dg-options, drop dg-do
compile specifier.

PR modula2/108344 disable default opening of /dev/tty

This patch removes the static initialisation code for KeyBoardLEDs.cc.
The module is only initialised if one of the exported functions is called.
This is useful as the module will access /dev/tty which might not be
available. TimerHandler.mod has also been changed to disable the scroll
lock LED as a sign of life.

gcc/m2/ChangeLog:

PR modula2/108344
* gm2-libs-coroutines/TimerHandler.mod (EnableLED): New constant.
(Timer): Test EnableLED before switching on the scroll LED.

libgm2/ChangeLog:

PR modula2/108344
* libm2cor/KeyBoardLEDs.cc (initialize_module): New function.
(SwitchScroll): Call initialize_module.
(SwitchNum): Call initialize_module.
(SwitchCaps): Call initialize_module.
(SwitchLEDs): Call initialize_module.
(M2EXPORT): Remove initialization code.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

aarch64: Allow moves after tied-register intrinsics (2nd edition)

I missed these two in g:4ff89f10ca0d41f9cfa76 because I was
testing on a system that didn't support big-endian compilation.
Testing on aarch64_be-elf shows no other related failures
(although the overall results are worse than for little-endian).

gcc/testsuite/
* gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c: Allow moves
to occur after the intrinsic instruction, rather than requiring
them to happen before.
* gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c: Likewise.

libstdc++: Stop using TR1 macros in <cctype> and <cfenv>

As with the two commits before this, the _GLIBCXX_USE_C99_CTYPE_TR1 and
_GLIBCXX_USE_C99_FENV_TR1 macros are misleading when they are also used
for <cctype> and <cfenv>, not only for TR1 headers. It is also wrong,
because the configure checks for TR1 use -std=c++98 and a target might
define the C99 features for C++11 but not for C++98.

Add separate configure checks for the <ctype.h> and <fenv.h> features using -std=c++11
for the checks. Use the new macros defined by those checks in the
C++11-specific parts of <cctype>, <cfenv>, and <fenv.h>.

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_USE_C99): Check for isblank in C++11
mode and define _GLIBCXX_USE_C99_CTYPE. Check for <fenv.h>
functions in C++11 mode and define _GLIBCXX_USE_C99_FENV.
* config.h.in: Regenerate.
* configure: Regenerate.
* include/c_compatibility/fenv.h: Check _GLIBCXX_USE_C99_FENV
instead of _GLIBCXX_USE_C99_FENV_TR1.
* include/c_global/cfenv: Likewise.
* include/c_global/cctype: Check _GLIBCXX_USE_C99_CTYPE instead
of _GLIBCXX_USE_C99_CTYPE_TR1.

libstdc++: Stop using _GLIBCXX_USE_C99_STDINT_TR1 in <cstdint>

The _GLIBCXX_USE_C99_STDINT_TR1 macro (and the comments about it in
acinclude.m4 and config.h) are misleading when it is also used for
<cstdint>, not only <tr1/cstdint>. It is also wrong, because the
configure checks for TR1 use -std=c++98 and a target might define
uint32_t etc. for C++11 but not for C++98.

Add a separate configure check for the <stdint.h> types using -std=c++11
for the checks. Use the result of that separate check in <cstdint> and
most other places that still depend on the macro (many uses of that
macro have been removed already). The remaining uses of the STDINT_TR1
macro are really for TR1, or are in the src/c++11/compatibility-*.cc
files, where we don't want/need to change the condition they depend on
(if those symbols were only exported when <stdint.h> types were
available for -std=c++98, then that's the condition we should continue
to use for whether to export the compat symbols now).

Make similar changes for the related _GLIBCXX_USE_C99_INTTYPES_TR1 and
_GLIBCXX_USE_C99_INTTYPES_WCHAR_T_TR1 macros, adding new macros for
non-TR1 uses.

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_USE_C99): Check for <stdint.h> types in
C++11 mode and define _GLIBCXX_USE_C99_STDINT. Check for
<inttypes.h> features in C++11 mode and define
_GLIBCXX_USE_C99_INTTYPES and _GLIBCXX_USE_C99_INTTYPES_WCHAR_T.
* config.h.in: Regenerate.
* configure: Regenerate.
* doc/doxygen/user.cfg.in (PREDEFINED): Add new macros.
* include/bits/chrono.h: Check _GLIBCXX_USE_C99_STDINT instead
of _GLIBCXX_USE_C99_STDINT_TR1.
* include/c_compatibility/inttypes.h: Check
_GLIBCXX_USE_C99_INTTYPES and _GLIBCXX_USE_C99_INTTYPES_WCHAR_T
instead of _GLIBCXX_USE_C99_INTTYPES_TR1 and
_GLIBCXX_USE_C99_INTTYPES_WCHAR_T_TR1.
* include/c_compatibility/stdatomic.h: Check
_GLIBCXX_USE_C99_STDINT instead of _GLIBCXX_USE_C99_STDINT_TR1.
* include/c_compatibility/stdint.h: Likewise.
* include/c_global/cinttypes: Check _GLIBCXX_USE_C99_INTTYPES
and _GLIBCXX_USE_C99_INTTYPES_WCHAR_T instead of
_GLIBCXX_USE_C99_INTTYPES_TR1 and
_GLIBCXX_USE_C99_INTTYPES_WCHAR_T_TR1.
* include/c_global/cstdint: Check _GLIBCXX_USE_C99_STDINT
instead of _GLIBCXX_USE_C99_STDINT_TR1.
* include/std/atomic: Likewise.
* src/c++11/cow-stdexcept.cc: Likewise.
* testsuite/29_atomics/headers/stdatomic.h/c_compat.cc:
Likewise.
* testsuite/lib/libstdc++.exp (check_v3_target_cstdint):
Likewise.

libstdc++: Stop using _GLIBCXX_USE_C99_COMPLEX_TR1 in <complex>

The _GLIBCXX_USE_C99_COMPLEX_TR1 macro (and the comments about it in
acinclude.m4 and config.h) are misleading when it is also used for
<complex>, not only <tr1/complex>. It is also wrong, because the
configure checks for TR1 use -std=c++98 and a target might define cacos
etc. for C++11 but not for C++98.

Add a separate configure check for the inverse trigonometric functions
that are covered by _GLIBCXX_USE_C99_COMPLEX_TR1, but using -std=c++11
for the checks. Use the result of that separate check in <complex>.

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_USE_C99): Check for complex inverse trig
functions in C++11 mode and define _GLIBCXX_USE_C99_COMPLEX_ARC.
* config.h.in: Regenerate.
* configure: Regenerate.
* doc/doxygen/user.cfg.in (PREDEFINED): Add new macro.
* include/std/complex: Check _GLIBCXX_USE_C99_COMPLEX_ARC
instead of _GLIBCXX_USE_C99_COMPLEX_TR1.

libstdc++: Add assertion to debug_allocator test

libstdc++-v3/ChangeLog:

* testsuite/ext/debug_allocator/check_deallocate_null.cc: Add
assertion to ensure expected exception is thrown.

libstdc++: Require tzdb support for chrono::zoned_time printer test

libstdc++-v3/ChangeLog:

* testsuite/libstdc++-prettyprinters/chrono.cc: Only test
printer for chrono::zoned_time for cxx11 ABI and tzdb effective
target.

libstdc++: Do not use pthread_mutex_clocklock with ThreadSanitizer

As noted in https://github.com/llvm/llvm-project/issues/62623 there are
no tsan interceptors for some of the new POSIX-1:202x APIs added by
https://austingroupbugs.net/view.php?id=1216 so tsan gives false
positive warnings for try_lock_for on timed mutexes.

Disable the uses of the new pthread_mutex_clocklock API when tsan is
active. This changes the semantics of the try_lock_for functions,
because it can change which clock is used for the wait. This means those
functions might be affected by system clock adjustments when tsan is
used, when they would not be affected otherwise.

Reviewed-by: Thomas Rodgers <trodgers@redhat.com>
Reviewed-by: Mike Crowe <mac@mcrowe.com>
libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_CHECK_PTHREAD_MUTEX_CLOCKLOCK): Define
_GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK in terms of _GLIBCXX_TSAN.
* configure: Regenerate.

ada: Add "gnat --help-ada" text for new switches.

The output generated by "gnat --help-ada" should include descriptions for
the newly added -gnatw_s and -gnatw_S switches.

gcc/ada/

* usage.adb: Generate output text describing the -gnatw_s switch
(and the corresponding -gnatw_S switch).

ada: Use accumulator type in expansion of 'Reduce attribute

The current expansion of the 'Reduce attribute uses the resolution type of
the expression for the accumulator. Now this type can be unresolved or set
to a universal type, for example if it is itself the prefix of the 'Image
attribute, and this may yield a spurious type mismatch error in that case.

This changes the expansion to use the accumulator type instead as defined
by the RM 4.5.10 clause, albeit only in the prefixed case for now.

gcc/ada/

* exp_attr.adb (Expand_N_Attribute_Reference) <Attribute_Reduce>:
Use the canonical accumulator type as the type of the accumulator
in the prefixed case.

ada: Fix missing warning on aggregate with iterated component

This happens when the iterated component does not really iterate.

gcc/ada/

* exp_aggr.adb (Expand_Array_Aggregate): Do not set Warnings_Off on
the temporary created when in-place expansion is not possible.

ada: Fix crash on iterated component in expression function

The problem is that the freeze node generated for the type of a static
subexpression present in the expression function is incorrectly placed
inside instead of outside the function.

gcc/ada/

* freeze.adb (Freeze_Expression): When the freezing is to be done
outside the current scope, skip any scope that is an internal loop.

ada: Fix internal error on 'Image applied to array component

This happens because the array component depends on a discriminant.

gcc/ada/

* exp_imgv.adb (Rewrite_Object_Image): If the prefix is a component
that depends on a discriminant, create an actual subtype for it.

ada: Fix internal error on chain of predicated record types

The preanalysis of a predicate set on one of the record types was causing
premature freezing of another record type.

gcc/ada/

* sem_ch13.adb: Add with and use clauses for Expander.
(Resolve_Aspect_Expressions) <Aspect_Predicate>: Emulate a
bona-fide preanalysis setup before calling
Resolve_Aspect_Expression.

ada: Update proof of runtime units

Following changes in GNATprove, proofs need to be amended.

gcc/ada/

* libgnat/s-aridou.adb (Lemma_Div_Pow2): Add assertion.
* libgnat/s-arit32.adb (Lemma_Abs_Div_Commutation): Simplify.
* libgnat/s-expmod.adb (Lemma_Exp_Mod): Add assertions.
(Lemma_Euclidean_Mod): Add body to lemma.
(Lemma_Mult_Mod): Add assertion.
* libgnat/s-valueu.adb (Scan_Raw_Unsigned): Modify assertion.
* libgnat/s-vauspe.ads (Raw_Unsigned_Last_Ghost): Add
postcondition.
* libgnat/s-widthi.adb: Use more precise types.

ada: Implement inheritance of user-defined literal aspects for untagged types

In Ada 2022, user-defined literal aspects are nonoverridable but the named
subprograms present in them can be overridden, including for untagged types.

gcc/ada/

* sem_res.adb (Has_Applicable_User_Defined_Literal): Apply the
same processing for derived untagged types as for tagged types.
* sem_util.ads (Corresponding_Primitive_Op): Adjust description.
* sem_util.adb (Corresponding_Primitive_Op): Handle untagged
types.

ada: Spurious error analyzing 'old or 'result in class-wide conditions

gcc/ada/

* sem_attr.adb
(Analyze_Attribute_Old_Result): When preanalyzing a class-wide
condition, search in the scopes stack for the subprogram that has
the condition. This is required because returning the current
scope causes reporting spurious errors when the occurrence of the
attribute is found, for example, in a quantified expression.

ada: Spurious error on function returning CPP type

gcc/ada/

* exp_ch6.adb
(Needs_BIP_Alloc_Form): Return False for functions with foreign
convention since we never use build-in-place for such functions.

ada: Apply range checks to preanalyzed aggregate expressions

When preanalyzing expressions in GNATprove mode, e.g. Pre/Post
contracts, we apply checks, because these expressions will never
be expanded. This didn't happen for aggregate expressions, most
likely because of an oversight.

gcc/ada/

* sem_util.adb (Aggregate_Constraint_Checks): Don't exit early
when preanalysing in GNATprove mode. Now the condition is
consistent with other similar conditions in other code.

ada: usage.adb: document -gnatyD switch

-gnatyD was documented in the user guide but not in `gnat --help-ada`.

gcc/ada/

* usage.adb (Usage): Document -gnatyD.

ada: Fix Ada representation of r_debug and link_map types

Both record types need to have their components 'aliased' to match their
C version. The mismatch could be observed when using LTO:

  warning: type of 'r_debug' does not match original declaration
       [-Wlto-type-mismatch]

  /usr/include/link.h:66:23: note: type 'struct r_debug' should match
  type 'struct  system__traceback__symbolic__module_name__build_...
   ...cache_for_all_modules__r_debug_type'

gcc/ada/

* libgnat/s-tsmona__linux.adb (link_map, r_debug_type): Add
'aliased' on all components.

ada: Enable Support_Atomic_Primitives on PPC Linux

gcc/ada/

* libgnat/system-linux-ppc.ads: Add Support_Atomic_Primitives.
* libgnat/s-atopri__32.ads: Add 32 bit version of s-atopri.ads.
* Makefile.rtl: Use s-atopri__32.ads for ppc-linux.

ada: Follow-up improvement to implementation of storage models

This avoids recreating an actual subtype for an explicit dereference.

gcc/ada/

* sem_util.adb (Get_Actual_Subtype): For an explicit dereference,
return the Actual_Designated_Subtype if it is present.
(Get_Actual_Subtype_If_Available): Likewise.

ada: Add tags on style messages

Similar to tags on warnings [-gnatwx], we add tags on style messages
[-gnatyx] when -gnatw.d is enabled.

gcc/ada/

* errout.ads: Update comment.
* errout.adb (Skip_Msg_Insertion_Warning): Update to take e.g.
-gnatyM into account.
* erroutc.adb (Get_Warning_Option, Get_Warning_Tag)
(Prescan_Message): Add support for Style tags.
* par-ch5.adb, par-ch6.adb, par-ch7.adb, par-endh.adb,
par-util.adb, style.adb, styleg.adb: Set tag on all style
messages.

ada: Fix typo in "pattern"

I found a couple of spots using the typo "patterm" rather than the
correct "pattern".

gcc/ada/

* doc/gnat_ugn/building_executable_programs_with_gnat.rst
(Switches_for_gnatbind): Fix typo.
* libgnat/g-spipat.ads: Fix typo.
* gnat_ugn.texi: Regenerate.

ada: Adjust semantics and implementation of storage models

This makes the following adjustments to the semantics and implementation of
storage models in the compiler:

  1. By-copy semantics in subprogram calls: when an object accessed with a
     nonnative storage model is passed as an actual parameter in a call to
     a subprogram, an intermediate copy made on the host is passed instead.

  2. More generally, any additional temporary required on the host by the
     semantics of nonnative storage models is now created by the front-end
     instead of the code generator.

  3. All the temporaries created on the host for nonnative storage models
     are allocated on the secondary stack instead of the primary stack.

As a result, this should simplify the implementation in code generators.

gcc/ada/

* exp_aggr.adb (Build_Assignment_With_Temporary): Adjust comment
and fix type of second parameter. Create the temporary on the
secondary stack by calling Build_Temporary_On_Secondary_Stack.
(Convert_Array_Aggr_In_Allocator): Adjust formatting.
(Expand_Array_Aggregate): Likewise.
* exp_ch4.adb (Expand_N_Allocator): Set Actual_Designated_Subtype
on the dereference in the initialization for all composite types.
* exp_ch5.adb (Expand_N_Assignment_Statement): Create a temporary
on the host for an assignment between nonnative storage models.
Suppress more checks when Suppress_Assignment_Checks is set.
* exp_ch6.adb (Add_Simple_Call_By_Copy_Code): Deal with actuals
that are dereferences with an Actual_Designated_Subtype. Add
support for nonnative storage models.
(Expand_Actuals): Create a copy if the actual is a dereference
with a nonnative storage model.
* exp_util.ads (Build_Temporary_On_Secondary_Stack): Declare.
* exp_util.adb (Build_Temporary_On_Secondary_Stack): New function.
* sem_ch5.adb (Analyze_Assignment.Set_Assignment_Type): Do not
build an actual subtype for dereferences with an
Actual_Designated_Subtype.
* sinfo.ads (Actual_Designated_Subtype): Adjust documentation.
(Suppress_Assignment_Checks): Likewise.

ada: Build invariant procedure while freezing in GNATprove mode

Invariant procedure bodies are created either by expansion of freezing
nodes (but only in ordinary compilation mode) or at the end of package
private declarations (but not when there are private types in the type
derivation chain).

In GNATprove mode we didn't create invariant procedure bodies in
lightweight expansion, so we didn't create them at all when there were
private types in the type derivation chain.

This patch copies the relevant freezing part from ordinary to
lightweight expansion. This obviously involves code duplication,
but it seems better to duplicate whole sections that work properly
instead of small pieces that are incomplete. There are other pieces
of freezing that are similarly duplicated, so this patch doesn't make
the code substantially worse.

gcc/ada/

* exp_spark.adb (SPARK_Freeze_Type): Copy whole handling of DIC
and Type_Invariant from Freeze_Type.

ada: Get name from entity if that's what's passed to Subprogram_Name

gcc/ada/

* sem_util.adb (Subprogram_Name): If what's passed is already an
entity, use that for the name.

ada: Document examples of No_Dependence restriction for code generation

gcc/ada/

* doc/gnat_rm/standard_and_implementation_defined_restrictions.rst
(No_Dependence): Give examples of new No_Dependence restrictions.
* gnat_rm.texi: Regenerate.

ada: Bad handling of ASCII with -gnatyn

ASCII is special cased but this wasn't taking into account all cases
such as Standard.ASCII.

gcc/ada/

* snames.ads-tmpl (Name_ASCII): New.
* style.adb (Check_Identifier): Fix handling of ASCII.

ada: Introduce Cannot_Be_Superflat flag on N_Range nodes

The support of superflat arrays in the language generates an overhead that
the code generator attempts to minimize, but it cannot handle too complex
cases and it would be helpful if the front-end could lend a hand.

This change introduces the Cannot_Be_Superflat flag on N_Range nodes for
this purpose, and sets it on the result of string concatenations when it
is guaranteed to be nonnull.

gcc/ada/

* gen_il-fields.ads (Opt_Field_Enum): Add Cannot_Be_Superflat.
* gen_il-gen-gen_nodes.adb (N_Range): Add Cannot_Be_Superflat as
semantical flag and change Includes_Infinities to semantical.
* sinfo.ads (Cannot_Be_Superflat): Document it for N_Range.
* exp_ch4.adb (Expand_Concatenate): Set Cannot_Be_Superflat on the
range of the result if the result cannot be null.

ada: Change Present_Expr field type to Uint

We want the field to be initialized to No_Uint because we want to be
able to test in GNAT LLVM whether we've already set it so we can be
sure we only set it once.

gcc/ada/

* gen_il-gen-gen_nodes.adb (Present_Expr): Type is now Uint.

ada: Simplify dramatically ghost code for proof of System.Arith_Double

Using Inline_For_Proof annotation on key expression functions makes
it possible to remove hundreds of lines of ghost code that were
previously needed to guide provers.

gcc/ada/

* libgnat/s-aridou.adb (Big3, Is_Mult_Decomposition)
(Is_Scaled_Mult_Decomposition): Add annotation for inlining.
(Double_Divide, Scaled_Divide): Simplify and remove ghost code.
(Prove_Multiplication): Add calls to lemmas to make proof go
through.
* libgnat/s-aridou.ads (Big, In_Double_Int_Range): Add annotation
for inlining.

ada: Add intermediate assertions for proof of Super_Tail

Proof of Superbounded internal unit requires a little more help.

gcc/ada/

* libgnat/a-strsup.adb: Add intermediate assertions.

ada: Missing dependency with -gnatc

When using -gnatc, dependencies on preprocessor and config files
were not recorded.

gcc/ada/

* gnat1drv.adb: Ensure all dependencies are recorded even when not
generating code.

ada: Set Loop_Variant assertion policy to Ignore in both

Set Loop_Variant assertion policy to Ignore in both.

gcc/ada/

* libgnat/a-strsup.adb: Set assertion policy for Loop_Variant.

ada: Trivial refactoring in Instantiate_*_Body

Factor out Par_Vis/Install_Parent/Par_Installed in Instantiate_Package_Body
and Instantiate_Subprogram_Body.

gcc/ada/

* sem_ch12.adb (Instantiate_Package_Body): Simplify if/then/else.
(Instantiate_Subprogram_Body): Likewise.

ada: Restore proof of System.Arith_Double

Use Assert_And_Cut to simplify proof of second part of the Scaled_Divide.
Add intermediate assertions and simplify where necessary.

gcc/ada/

* libgnat/s-aridou.adb:
(Big3): Remove override made useless.
(Lemma_Quot_Rem): Add new lemma and justify it, as no prover
manages to prove it.
(Lemma_Div_Pow2): Use new lemma Lemma_Quot_Rem.
(Prove_Scaled_Mult_Decomposition_Regroup3): Retype for
simplification.
(Scaled_Divide): Remove useless assertions. Decompose some
assertions with cut operations. Use Assert_And_Cut for second
half. Add assertions.

RISC-V: Adjust stdint.h to stdint-gcc.h for rvv tests

This patch would like to align the stdint.h to the stdint-gcc.h for all
the RVV test files. Aka:

stdint.h => stdint-gcc.h

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/shift-scalar-template.h:
Replace stdint.h with stdint-gcc.h.
* gcc.target/riscv/rvv/autovec/binop/shift-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vadd-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vand-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmax-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmin-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmul-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vor-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vrem-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vxor-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/series-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmv-imm-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmv-imm-template.h: Ditto.

s390: Refactor block operation setmem

Vectorize memset with a constant length of less than or equal to 64
bytes.

Do not perform a libc function call into memset in case the size is not
a compile-time constant but bounded and the upper bound is less than or
equal to 256 bytes.

gcc/ChangeLog:

* config/s390/s390-protos.h (s390_expand_setmem): Change
function signature.
* config/s390/s390.cc (s390_expand_setmem): For memset's less
than or equal to 256 byte do not perform a libc call.
* config/s390/s390.md: Change expander into a version which
takes 8 operands.

gcc/testsuite/ChangeLog:

* gcc.target/s390/memset-1.c: Test case memset1 makes use of
vst, now.
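
As a rough illustration of the two cases the message distinguishes, the sketch below has one memset whose length is a constant of 64 bytes and one whose length is only known at run time but is clamped to at most 256 bytes. The function names and the MAX_CLEAR constant are hypothetical. Whether a given compiler proves the bound and expands the call inline depends on the target, the options, and the optimizer, so this shows the source-level shape only.

```c
#include <stddef.h>
#include <string.h>

enum { MAX_CLEAR = 256 };   /* hypothetical bound, chosen to match the message */

/* Constant length of at most 64 bytes: the case the message describes as
   being vectorized. */
void clear_block64(unsigned char *buf)
{
    memset(buf, 0, 64);
}

/* Length is not a compile-time constant, but the clamp gives it an upper
   bound of 256 bytes: the case where a libc call may be avoided. */
size_t clear_prefix(unsigned char *buf, size_t n)
{
    if (n > MAX_CLEAR)
        n = MAX_CLEAR;
    memset(buf, 0, n);
    return n;               /* number of bytes actually cleared */
}
```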

s390: Add block operation movmem

gcc/ChangeLog:

* config/s390/s390-protos.h (s390_expand_movmem): New.
* config/s390/s390.cc (s390_expand_movmem): New.
* config/s390/s390.md (movmem<mode>): New.
(*mvcrl): New.
(mvcrl): New.

s390: Refactor block operation cpymem

Do not perform a libc function call into memcpy in case the size is not
a compile-time constant but bounded and the upper bound is less than or
equal to 256 bytes.

gcc/ChangeLog:

* config/s390/s390-protos.h (s390_expand_cpymem): Change
function signature.
* config/s390/s390.cc (s390_expand_cpymem): For memcpy's less
than or equal to 256 byte do not perform a libc call.
(s390_expand_insv): Adapt new function signature of
s390_expand_cpymem.
* config/s390/s390.md: Change expander into a version which
takes 8 operands.

Fortran: Fix an assortment of bugs

2023-05-16 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/105152
* interface.cc (gfc_compare_actual_formal): Emit an error if an
unlimited polymorphic actual is not matched either to an
unlimited or assumed type formal argument.

PR fortran/100193
* resolve.cc (resolve_ordinary_assign): Emit an error if the
var expression of an ordinary assignment is a proc pointer
component.

PR fortran/87946
* trans-array.cc (gfc_walk_array_ref): Provide assumed shape
arrays coming from interface mapping with a viable arrayspec.

PR fortran/103389
* trans-expr.cc (gfc_conv_intrinsic_to_class): Tidy up flagging
of unlimited polymorphic 'class_ts'.
(gfc_conv_gfc_desc_to_cfi_desc): Assumed type is unlimited
polymorphic and should accept any actual type.

PR fortran/104429
(gfc_conv_procedure_call): Replace dreadful kludge with a call
to gfc_finalize_tree_expr. Avoid dereferencing a void pointer
by giving it the pointer type of the actual argument.

PR fortran/82774
(alloc_scalar_allocatable_subcomponent): Shorten the function
name and replace the symbol argument with the se string length.
If a deferred length character length is either not present or
is not a variable, give the typespec a variable and assign the
string length to that. Use gfc_deferred_strlen to find the
hidden string length component.
(gfc_trans_subcomponent_assign): Convert the expression before
the call to alloc_scalar_allocatable_subcomponent so that a
good string length is provided.
(gfc_trans_structure_assign): Remove the unneeded derived type
symbol from calls to gfc_trans_subcomponent_assign.

gcc/testsuite/
PR fortran/105152
* gfortran.dg/pr105152.f90 : New test

PR fortran/100193
* gfortran.dg/pr100193.f90 : New test

PR fortran/87946
* gfortran.dg/pr87946.f90 : New test

PR fortran/103389
* gfortran.dg/pr103389.f90 : New test

PR fortran/104429
* gfortran.dg/pr104429.f90 : New test

PR fortran/82774
* gfortran.dg/pr82774.f90 : New test

Skip -fdelete-null-pointer-check tests if target keeps_null_pointer_checks

A bunch of tests explicitly pass in -fdelete-null-pointer-checks and
fail if the target keeps null pointer checks. Skip such tests by
adding a dg-skip-if for keeps_null_pointer_checks.

gcc/testsuite/ChangeLog:

* gcc.dg/attr-returns-nonnull.c: Skip if
keeps_null_pointer_checks.
* gcc.dg/init-compare-1.c: Likewise.
* gcc.dg/ipa/pr85734.c: Likewise.
* gcc.dg/ipa/propmalloc-1.c: Likewise.
* gcc.dg/ipa/propmalloc-2.c: Likewise.
* gcc.dg/ipa/propmalloc-3.c: Likewise.
* gcc.dg/ipa/propmalloc-4.c: Likewise.
* gcc.dg/tree-ssa/evrp11.c: Likewise.
* gcc.dg/tree-ssa/pr83648.c: Likewise.
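
A minimal sketch of what such a test header might look like follows. The directive layout is an assumption based on the usual DejaGnu form and is not copied from the changed files. The function names are hypothetical, and returns_nonnull is a GNU attribute extension.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -fdelete-null-pointer-checks" } */
/* Assumed form of the added directive: skip the test on targets that
   keep null pointer checks. */
/* { dg-skip-if "" { keeps_null_pointer_checks } } */

static int storage;

/* The attribute promises callers that the result is never null. */
__attribute__((returns_nonnull, noinline))
int *get_storage(void)
{
    return &storage;
}

/* With -fdelete-null-pointer-checks the compiler may fold this comparison
   to 0; on targets that keep null pointer checks it may not, which is why
   tests scanning for the folded form get skipped there. */
int result_is_null(void)
{
    return get_storage() == 0;
}
```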

MATCH: [PR109424] Simplify min/max of boolean arguments

This is version 2 of https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577394.html
which does not depend on adding gimple_truth_valued_p at this point.
Instead it uses zero_one_valued_p, which is already used for mult simplifications,
to make sure that we only have [0,1] rather than making the mistake of maybe having [-1,0]
as the range for signed bools.

This shows up in a few places in GCC itself, but only at -O1; we miss the min/max conversion
because of PR 107888 (which I will be testing separately).

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

PR tree-optimization/109424

gcc/ChangeLog:

* match.pd: Add patterns for min/max of zero_one_valued
values to `&`/`|`.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bool-12.c: New test.
* gcc.dg/tree-ssa/bool-13.c: New test.
* gcc.dg/tree-ssa/minmax-20.c: New test.
* gcc.dg/tree-ssa/minmax-21.c: New test.
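
The identity behind the new patterns can be checked exhaustively in a few lines. The sketch below uses hypothetical helper names and is not taken from the new tests. It confirms that min is `&` and max is `|` when both operands are in [0,1], and that the same rewrite would be wrong for operands in [-1,0], which is the signed-bool case the message warns about.

```c
/* Plain min and max on ints. */
int min_int(int a, int b) { return a < b ? a : b; }
int max_int(int a, int b) { return a > b ? a : b; }

/* Returns 1 if min(a,b) == (a & b) and max(a,b) == (a | b) for every
   pair of operands in the closed range [lo, hi], else 0. */
int minmax_is_and_or(int lo, int hi)
{
    for (int a = lo; a <= hi; a++)
        for (int b = lo; b <= hi; b++)
            if (min_int(a, b) != (a & b) || max_int(a, b) != (a | b))
                return 0;
    return 1;
}
```

For the range [-1,0] the check fails on the pair (-1, 0), where min gives -1 but `&` gives 0.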

RISC-V: Add FRM and rounding mode operand into floating point intrinsics

This patch is adding rounding mode operand and FRM_REGNUM dependency
into floating-point instructions.

The floating-point instructions to which we added the FRM and rounding mode operand:
1. vfadd/vfsub
2. vfwadd/vfwsub
3. vfmul
4. vfdiv
5. vfwmul
6. vfwmacc/vfwnmacc/vfwmsac/vfwnmsac
7. vfsqrt
8. floating-point conversions.
9. floating-point reductions.
10. floating-point ternary.

The floating-point instructions to which we did NOT add the FRM and
rounding mode operand:
1. vfabs/vfneg/vfsqrt7/vfrec7
2. vfmin/vfmax
3. comparisons
4. vfclass
5. vfsgnj/vfsgnjn/vfsgnjx
6. vfmerge
7. vfmv.v.f

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum frm_field_enum): New enum.
* config/riscv/riscv-vector-builtins.cc
(function_expander::use_ternop_insn): Add default rounding mode.
(function_expander::use_widen_ternop_insn): Ditto.
* config/riscv/riscv.cc (riscv_hard_regno_nregs): Add FRM REGNUM.
(riscv_hard_regno_mode_ok): Ditto.
(riscv_conditional_register_usage): Ditto.
* config/riscv/riscv.h (DWARF_FRAME_REGNUM): Ditto.
(FRM_REG_P): Ditto.
(RISCV_DWARF_FRM): Ditto.
* config/riscv/riscv.md: Ditto.
* config/riscv/vector-iterators.md: Split no frm and has frm operations.
* config/riscv/vector.md (@pred_<optab><mode>_scalar): New pattern.
(@pred_<optab><mode>): Ditto.

Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>

Daily bump.

c: Ignore _Atomic on function return type for C2x

For C2x it was decided that _Atomic would be completely ignored on
function return types (just as was done for qualifiers in C11 DR#423),
to eliminate the potential for an rvalue returned by a function having
_Atomic-qualified type when an rvalue resulting from lvalue-to-rvalue
conversion could not have such a type. Implement this for GCC.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/c/
* c-decl.cc (grokdeclarator): Ignore _Atomic on function return
type for C2x.

gcc/testsuite/
* gcc.dg/qual-return-9.c, gcc.dg/qual-return-10.c: New tests.

c: Update __has_c_attribute values for C2x

WG14 decided that __has_c_attribute should return the same value
(equal to the intended __STDC_VERSION__ value) for all standard
attributes in C2x, with values associated with when an attribute was
added to the working draft (or had semantics added or changed in the
working draft) only being used in earlier stages of development of
that draft. The intent is that the values for existing attributes
increase in future standard versions only if there are new features /
semantic changes for those attributes. Implement this change for GCC.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/c-family/
* c-lex.cc (c_common_has_attribute): Use 202311 as
__has_c_attribute return for all C2x attributes.

gcc/testsuite/
* gcc.dg/c2x-has-c-attribute-2.c: Expect 202311L return value from
__has_c_attribute for all C2x attributes.

Fortran: CLASS pointer function result in variable definition context [PR109846]

gcc/fortran/ChangeLog:

PR fortran/109846
* expr.cc (gfc_check_vardef_context): Check appropriate pointer
attribute for CLASS vs. non-CLASS function result in variable
definition context.

gcc/testsuite/ChangeLog:

PR fortran/109846
* gfortran.dg/ptr-func-5.f90: New test.

Add auto-resizing capability to irange's [PR109695]

<tldr>
We can now have int_range<N, RESIZABLE=false> for automatically
resizable ranges.  int_range_max is now int_range<3, true>
for a 69X reduction in size from current trunk, and 6.9X reduction from
GCC12.  This incurs a 5% performance penalty for VRP that is more than
covered by our > 13% improvements recently.
</tldr>

int_range_max is the temporary range object we use in the ranger for
integers.  With the conversion to wide_int, this structure bloated up
significantly because wide_ints are huge (80 bytes a piece) and are
about 10 times as big as a plain tree.  Since the temporary object
requires 255 sub-ranges, that's 255 * 80 * 2, plus the control word.
This means the structure grew from 4112 bytes to 40912 bytes.

This patch adds the ability to resize ranges as needed, defaulting to
no resizing, while int_range_max now defaults to 3 sub-ranges (instead
of 255) and grows to 255 when the range being calculated does not fit.

For example:

int_range<1> foo; // 1 sub-range with no resizing.
int_range<5> foo; // 5 sub-ranges with no resizing.
int_range<5, true> foo; // 5 sub-ranges with resizing.

I ran some tests and found that 3 sub-ranges cover 99% of cases, so
I've set the int_range_max default to that:

typedef int_range<3, /*RESIZABLE=*/true> int_range_max;

We don't bother growing incrementally, since the default covers most
cases and we have a 255 hard-limit.  This hard limit could be reduced
to 128, since my tests never saw a range needing more than 124, but we
could do that as a follow-up if needed.
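
The grow-once policy can be sketched roughly as follows.  This is a
simplified stand-in, not the irange implementation: range_store, push and
the pair-of-long representation are invented for illustration, and a
non-resizable store here simply reports failure when it is full, which is
not necessarily how irange behaves in that situation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical container: N inline sub-ranges, optionally growing once,
// straight to the hard limit, when the inline storage is exhausted.
template <std::size_t N, bool Resizable = false>
class range_store {
public:
    using pair_t = std::pair<long, long>;
    static constexpr std::size_t hard_limit = 255;

    range_store() = default;
    // m_data may point into this object, so copying is disabled here.
    range_store(const range_store&) = delete;
    range_store& operator=(const range_store&) = delete;

    // Append a sub-range; returns false if it cannot be stored.
    bool push(long lo, long hi) {
        if (m_size == m_capacity && !grow())
            return false;
        m_data[m_size++] = pair_t(lo, hi);
        return true;
    }
    std::size_t size() const { return m_size; }
    std::size_t capacity() const { return m_capacity; }
    const pair_t& operator[](std::size_t i) const { return m_data[i]; }

private:
    // No incremental growth: jump directly to the hard limit.
    bool grow() {
        if (!Resizable || m_capacity == hard_limit)
            return false;
        m_heap = std::make_unique<pair_t[]>(hard_limit);
        for (std::size_t i = 0; i < m_size; ++i)
            m_heap[i] = m_inline[i];
        m_data = m_heap.get();
        m_capacity = hard_limit;
        return true;
    }

    pair_t m_inline[N] {};
    std::unique_ptr<pair_t[]> m_heap;
    pair_t* m_data = m_inline;
    std::size_t m_size = 0;
    std::size_t m_capacity = N;
};
```

A range_store<3, true> stays at three inline slots until a fourth sub-range
is pushed, at which point its capacity becomes 255 in a single step.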

With 3-subranges, int_range_max is now 592 bytes versus 40912 for
trunk, and versus 4112 bytes for GCC12!  The penalty is 5.04% for VRP
and 3.02% for threading, with no noticeable change in overall
compilation (0.27%).  This is more than covered by our 13.26%
improvements for the legacy removal + wide_int conversion.
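
The size figures can be cross-checked with a little arithmetic.  The sketch
below uses only numbers quoted in this message; the constant and function
names are invented, and overhead_bytes is simply what remains after
subtracting the bounds from the quoted total.

```cpp
#include <cassert>

constexpr int wide_int_bytes = 80;   // size of one wide_int, per the message
constexpr int bounds_per_range = 2;  // lower and upper bound

// Bytes taken by the bounds of COUNT sub-ranges.
constexpr int bounds_bytes(int count) {
    return count * bounds_per_range * wide_int_bytes;
}

constexpr int trunk_bytes = 40912;   // int_range_max with 255 sub-ranges
constexpr int gcc12_bytes = 4112;    // int_range_max in GCC12
constexpr int resized_bytes = 592;   // int_range_max with 3 sub-ranges

// Whatever is left once the bounds are accounted for.
constexpr int overhead_bytes = trunk_bytes - bounds_bytes(255);

// How many times smaller AFTER is than BEFORE.
constexpr double shrink_factor(int before, int after) {
    return static_cast<double>(before) / after;
}

static_assert(bounds_bytes(255) == 40800, "255 * 80 * 2");
static_assert(overhead_bytes == 112, "remainder of the quoted total");
static_assert(bounds_bytes(3) + overhead_bytes == resized_bytes,
              "3 sub-ranges plus the same remainder gives 592");
```

The ratios come out at roughly 69 (40912 / 592) and roughly 6.9 (4112 / 592),
matching the reductions quoted in the tldr.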

I think this approach is a good alternative, while providing us with
flexibility going forward.  For example, we could try defaulting to
8 sub-ranges for a noticeable improvement in VRP.  We could also use
large sub-ranges for switch analysis to avoid resizing.

Another approach I tried was always resizing.  With this, we could
drop the whole int_range<N> nonsense, and have irange just hold a
resizable range.  This simplified things, but incurred a 7% penalty on
ipa_cp.  This was hard to pinpoint, and I'm not entirely convinced
this wasn't some artifact of valgrind.  However, until we're sure,
let's avoid massive changes, especially since IPA changes are coming
up.

For the curious, a particular hot spot for IPA in this area was:

ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
{
...
...
  value_range save (m_vr);
  m_vr.union_ (*other_vr);
  return m_vr != save;
}

The problem isn't the resizing (since we do that at most once) but the
fact that for some functions with lots of callers we end up with a huge
range that gets copied and compared for every meet operation.  Maybe
the IPA algorithm could be adjusted somehow?

Anywhooo... for now there is nothing to worry about, since value_range
still has 2 subranges and is not resizable.  But we should probably
think what if anything we want to do here, as I envision IPA using
infinite ranges here (well, int_range_max) and handling frange's, etc.

gcc/ChangeLog:

PR tree-optimization/109695
* value-range.cc (irange::operator=): Resize range.
(irange::union_): Same.
(irange::intersect): Same.
(irange::invert): Same.
(int_range_max): Default to 3 sub-ranges and resize as needed.
* value-range.h (irange::maybe_resize): New.
(~int_range): New.
(int_range::int_range): Adjust for resizing.
(int_range::operator=): Same.

Only return changed=true in union_nonzero when appropriate.

irange::union_ was being overly pessimistic in its return value.  It
was returning true when the nonzero mask was possibly the same.

The reason for this is that the nonzero mask is not entirely kept
up to date.  We avoid setting it up when a new range is set (from a
set, intersect, union, etc), because calculating a mask from a range
is measurably expensive.  However, irange::get_nonzero_bits() will
always return the correct mask because it will calculate the nonzero
mask inherent in the range on the fly and bitwise or it with the saved
mask.  This was an optimization because last release it was a big
penalty to keep the mask up to date.  This may not necessarily be the
case with the conversion to wide_int's.  We should investigate.

Just to be clear, the result from get_nonzero_bits() is always correct
as seen by the user, but the wide_int in the irange does not contain
all the information, since part of the nonzero bits can be determined
by the range itself, on the fly.

The fix here is to make sure that the result the user sees (callers of
get_nonzero_bits()) changed when unioning bits.  This allows
ipcp_vr_lattice::meet_with_1 to avoid unnecessary copies when
determining if a range changed.
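
The pattern can be sketched on a plain bit mask.  This is a simplified
stand-in, not the irange or ipcp_vr_lattice code: bits_lattice and both meet
functions are invented for illustration, and the on-the-fly part of the mask
is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a lattice value holding a nonzero-bits mask.
struct bits_lattice {
    std::uint64_t mask = 0;

    // Union in OTHER and report whether the mask actually changed.
    bool union_(std::uint64_t other) {
        const std::uint64_t merged = mask | other;
        const bool changed = merged != mask;
        mask = merged;
        return changed;
    }
};

// Copy-and-compare style: does not rely on union_'s return value.
bool meet_with_copy(bits_lattice& lattice, std::uint64_t other) {
    const bits_lattice save = lattice;
    lattice.union_(other);
    return lattice.mask != save.mask;
}

// Once the return value is exact, the copy and comparison can be dropped.
bool meet_direct(bits_lattice& lattice, std::uint64_t other) {
    return lattice.union_(other);
}
```

Both functions report the same answer here; the second one avoids the copy,
which matters when the copied object is a large range rather than one word.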

This patch yields a 6.89% improvement to the ipa_cp pass.  I'm
including the IPA changes in this patch, as it's a testcase of sorts for
the change.

gcc/ChangeLog:

* ipa-cp.cc (ipcp_vr_lattice::meet_with_1): Avoid unnecessary
range copying.
* value-range.cc (irange::union_nonzero_bits): Return TRUE only
when range changed.

c++: add feature-test macro for auto(x)

This adds the feature-test macro for P0849R8, as per
https://github.com/cplusplus/CWG/issues/281.
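
User code could test the macro along these lines.  This is only a sketch:
decay_copy_fallback and DECAY_COPY are names invented here, and the fallback
is one possible way to get a decayed copy on compilers without auto(x).

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical fallback: returns a decayed prvalue copy of its argument.
template <class T>
typename std::decay<T>::type decay_copy_fallback(T&& value) {
    return std::forward<T>(value);
}

// Use the language feature when the compiler announces it.
#ifdef __cpp_auto_cast
#  define DECAY_COPY(x) auto(x)
#else
#  define DECAY_COPY(x) decay_copy_fallback(x)
#endif
```

Either branch turns an int[3] argument into an int* and a const int into an
int, so callers do not need to know which one was selected.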

gcc/c-family/ChangeLog:

* c-cppbuiltin.cc (c_cpp_builtins): Predefine __cpp_auto_cast
for C++23.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/feat-cxx2b.C: Test __cpp_auto_cast.

RISC-V: Add rounding mode operand for fixed-point patterns

Since we are going to have fixed-point intrinsics that model the
rounding mode:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222

we should have an operand to specify the rounding mode in fixed-point instructions.
We don't support these rounding-mode intrinsics yet, but we will
definitely support them later.

This is a preparatory patch for the upcoming intrinsics.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum vxrm_field_enum): New enum.
* config/riscv/riscv-vector-builtins.cc
(function_expander::use_exact_insn): Add default rounding mode operand.
* config/riscv/riscv.cc (riscv_hard_regno_nregs): Add VXRM_REGNUM.
(riscv_hard_regno_mode_ok): Ditto.
(riscv_conditional_register_usage): Ditto.
* config/riscv/riscv.h (DWARF_FRAME_REGNUM): Ditto.
(VXRM_REG_P): Ditto.
(RISCV_DWARF_VXRM): Ditto.
* config/riscv/riscv.md: Ditto.
* config/riscv/vector.md: Ditto.

Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>

OPTABS: Extend the number of expanding instructions pattern

We (RVV) are going to add a rounding mode operand to floating-point
instructions, which then have 11 operands.

We are going to have intrinsics that add a rounding mode argument:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226

This is the patch that adds the rounding mode operand in the RISC-V port:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618573.html
You can see there are 11 operands in these patterns.

gcc/ChangeLog:

* optabs.cc (maybe_gen_insn): Add case to generate instruction
that has 11 operands.

Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>

fix assert in non-atomic path

The non-atomic path does not have range information, so
we have to adjust the assert to handle that case, too.

libgcc/ChangeLog:
* unwind-dw2-fde.c: Fix assert in non-atomic path.

aarch64: Cost vector comparisons more accurately

We are missing cases for combining FACGE/FACGT instructions. In the testcase of the patch we generate:
foo:
        fabs    v3.4s, v0.4s
        fabs    v0.4s, v1.4s
        fabs    v1.4s, v2.4s
        fcmgt   v0.4s, v3.4s, v0.4s
        fcmgt   v1.4s, v3.4s, v1.4s
        b       g

This is because combine is rejecting the pattern due to costs:
Successfully matched this instruction:
(set (reg:V4SI 106)
    (neg:V4SI (lt:V4SI (abs:V4SF (reg:V4SF 113))
            (abs:V4SF (reg:V4SF 111)))))
rejecting combination of insns 8, 9 and 10
original costs 8 + 8 + 12 = 28
replacement costs 8 + 28 = 36

It is obviously recursing in the various arms of the RTX and such.
This patch teaches the aarch64 rtx costs routine that our vector comparisons are represented as a NEG of
compare operators, with the FACGE/FACGT operations in particular having ABS on each arm. With this patch we get
the much more reasonable dump:
original costs 8 + 8 + 8 = 24
replacement costs 8 + 8 = 16
and generate the optimal assembly:
foo:
        mov     v31.16b, v0.16b
        facgt   v0.4s, v0.4s, v1.4s
        facgt   v1.4s, v31.4s, v2.4s
        b       g
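
For readers unfamiliar with the representation, the following scalar sketch
mirrors the "NEG of a compare with ABS on each arm" shape for a greater-than
comparison.  It is an illustration only, not code from the patch or the
testcase; the type aliases and the function name are invented.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

using v4sf = std::array<float, 4>;
using v4si = std::array<std::int32_t, 4>;

// Per-lane |a| > |b|, written as the negation of a 0/1 comparison result,
// so that a true lane becomes -1 (all bits set) and a false lane becomes 0.
v4si abs_greater_mask(const v4sf& a, const v4sf& b) {
    v4si out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        const std::int32_t is_greater = std::fabs(a[i]) > std::fabs(b[i]);
        out[i] = -is_greater;
    }
    return out;
}
```

The whole function body corresponds to one absolute compare per vector,
which is why costing the ABS and NEG wrappers separately overestimates the
combined pattern.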

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_rtx_costs, NEG case): Add costing
logic for vector modes.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/facg_1.c: New test.

Support parallel testing in libgomp, part II [PR66005]

..., and enable if 'flock' is available for serializing execution testing.

Regarding the default of 19 parallel slots, this turned out to be a local
minimum for wall time when testing this on:

    $ uname -srvi
    Linux 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016 x86_64
    $ grep '^model name' < /proc/cpuinfo | uniq -c
         32 model name      : Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz

... in two configurations: case (a) standard configuration, no offloading
configured, case (b) offloading for GCN and nvptx configured but no devices
available.  For both cases, default plus '-m32' variant.

    $ \time make check-target-libgomp RUNTESTFLAGS="--target_board=unix\{,-m32\}"

Case (a), baseline:

    6432.23user 332.38system 47:32.28elapsed 237%CPU (0avgtext+0avgdata 505044maxresident)k
    6382.43user 319.21system 47:06.04elapsed 237%CPU (0avgtext+0avgdata 505172maxresident)k

This is what people have been complaining about, rightly so, in
<https://gcc.gnu.org/PR66005> "libgomp make check time is excessive" and
elsewhere.

Case (a), parallelized:

    -j12 GCC_TEST_PARALLEL_SLOTS=10
    3088.49user 267.74system 6:43.82elapsed 831%CPU (0avgtext+0avgdata 505188maxresident)k
    -j15 GCC_TEST_PARALLEL_SLOTS=15
    3308.08user 294.79system 5:56.04elapsed 1011%CPU (0avgtext+0avgdata 505360maxresident)k
    -j17 GCC_TEST_PARALLEL_SLOTS=17
    3539.93user 298.99system 5:27.86elapsed 1170%CPU (0avgtext+0avgdata 505112maxresident)k
    -j18 GCC_TEST_PARALLEL_SLOTS=18
    3697.50user 317.18system 5:14.63elapsed 1275%CPU (0avgtext+0avgdata 505360maxresident)k
    -j19 GCC_TEST_PARALLEL_SLOTS=19
    3765.94user 324.27system 5:13.22elapsed 1305%CPU (0avgtext+0avgdata 505128maxresident)k
    -j20 GCC_TEST_PARALLEL_SLOTS=20
    3684.66user 312.32system 5:15.26elapsed 1267%CPU (0avgtext+0avgdata 505100maxresident)k
    -j23 GCC_TEST_PARALLEL_SLOTS=23
    4040.59user 347.10system 5:29.12elapsed 1333%CPU (0avgtext+0avgdata 505200maxresident)k
    -j26 GCC_TEST_PARALLEL_SLOTS=26
    3973.24user 377.96system 5:24.70elapsed 1340%CPU (0avgtext+0avgdata 505160maxresident)k
    -j32 GCC_TEST_PARALLEL_SLOTS=32
    4004.42user 346.10system 5:16.11elapsed 1376%CPU (0avgtext+0avgdata 505160maxresident)k

Yay!

Case (b), baseline; 2+ h:

    7227.58user 700.54system 2:14:33elapsed 98%CPU (0avgtext+0avgdata 994264maxresident)k

Case (b), parallelized:

    -j12 GCC_TEST_PARALLEL_SLOTS=10
    7377.46user 777.52system 16:06.63elapsed 843%CPU (0avgtext+0avgdata 994344maxresident)k
    -j15 GCC_TEST_PARALLEL_SLOTS=15
    8019.18user 721.42system 12:13.56elapsed 1191%CPU (0avgtext+0avgdata 994228maxresident)k
    -j17 GCC_TEST_PARALLEL_SLOTS=17
    8530.11user 716.95system 10:45.92elapsed 1431%CPU (0avgtext+0avgdata 994176maxresident)k
    -j18 GCC_TEST_PARALLEL_SLOTS=18
    8776.79user 645.89system 10:27.20elapsed 1502%CPU (0avgtext+0avgdata 994248maxresident)k
    -j19 GCC_TEST_PARALLEL_SLOTS=19
    9332.37user 641.76system 10:15.09elapsed 1621%CPU (0avgtext+0avgdata 994260maxresident)k
    -j20 GCC_TEST_PARALLEL_SLOTS=20
    9609.54user 789.88system 10:26.94elapsed 1658%CPU (0avgtext+0avgdata 994284maxresident)k
    -j23 GCC_TEST_PARALLEL_SLOTS=23
    10362.40user 911.14system 10:44.47elapsed 1749%CPU (0avgtext+0avgdata 994208maxresident)k
    -j26 GCC_TEST_PARALLEL_SLOTS=26
    11159.44user 850.99system 11:09.25elapsed 1794%CPU (0avgtext+0avgdata 994256maxresident)k
    -j32 GCC_TEST_PARALLEL_SLOTS=32
    11453.50user 939.52system 11:00.38elapsed 1876%CPU (0avgtext+0avgdata 994240maxresident)k
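
To put the elapsed times above into perspective, the wall-time speedup can
be computed directly from the quoted figures.  The helper names below are
invented for this sketch; the inputs are the first case (a) baseline, the
case (b) baseline, and the respective -j19 runs.

```cpp
#include <cassert>

// Convert an h:mm:ss.ss elapsed reading into seconds.
constexpr double to_seconds(int hours, int minutes, double seconds) {
    return hours * 3600.0 + minutes * 60.0 + seconds;
}

// Ratio of baseline wall time to parallelized wall time.
constexpr double speedup(double baseline_seconds, double parallel_seconds) {
    return baseline_seconds / parallel_seconds;
}

// Case (a): 47:32.28 baseline versus 5:13.22 with 19 slots.
constexpr double case_a = speedup(to_seconds(0, 47, 32.28),
                                  to_seconds(0, 5, 13.22));
// Case (b): 2:14:33 baseline versus 10:15.09 with 19 slots.
constexpr double case_b = speedup(to_seconds(2, 14, 33.0),
                                  to_seconds(0, 10, 15.09));
```

That works out to roughly a 9x reduction in wall time for case (a) and
roughly 13x for case (b).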

On my Dell Precision 7530 laptop:

    $ uname -srvi
    Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64
    $ grep '^model name' < /proc/cpuinfo | uniq -c
         12 model name      : Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
    $ nvidia-smi -L
    GPU 0: Quadro P1000 (UUID: GPU-e043973b-b52a-d02b-c066-a8fdbf64e8ea)

... in two configurations: case (c) standard configuration, no offloading
configured, case (d) offloading for nvptx configured and device available.
For both cases, only default variant, no '-m32'.

    $ \time make check-target-libgomp

Case (c), baseline; roughly half of case (a) (just one variant):

    1180.98user 110.80system 19:36.40elapsed 109%CPU (0avgtext+0avgdata 505148maxresident)k
    1133.22user 111.08system 19:35.75elapsed 105%CPU (0avgtext+0avgdata 505212maxresident)k

Case (c), parallelized:

    -j12 GCC_TEST_PARALLEL_SLOTS=2
    1143.83user 110.76system 10:20.46elapsed 202%CPU (0avgtext+0avgdata 505216maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=6
    1737.08user 143.94system 4:59.48elapsed 628%CPU (0avgtext+0avgdata 505200maxresident)k
    1730.31user 143.02system 4:58.75elapsed 627%CPU (0avgtext+0avgdata 505152maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=8
    2192.63user 169.34system 4:52.96elapsed 806%CPU (0avgtext+0avgdata 505216maxresident)k
    2219.04user 167.67system 4:53.19elapsed 814%CPU (0avgtext+0avgdata 505152maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=10
    2463.93user 184.98system 4:48.39elapsed 918%CPU (0avgtext+0avgdata 505200maxresident)k
    2455.62user 183.68system 4:47.40elapsed 918%CPU (0avgtext+0avgdata 505216maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=12
    2591.04user 192.64system 4:44.98elapsed 976%CPU (0avgtext+0avgdata 505216maxresident)k
    2581.23user 195.21system 4:47.51elapsed 965%CPU (0avgtext+0avgdata 505212maxresident)k
    -j20 GCC_TEST_PARALLEL_SLOTS=20 [oversubscribe]
    2613.18user 199.51system 4:44.06elapsed 990%CPU (0avgtext+0avgdata 505216maxresident)k

Case (d), baseline (compared to case (b): only nvptx offloading compilation,
but also nvptx offloading execution); ~1 h:

    2841.93user 653.68system 1:02:26elapsed 93%CPU (0avgtext+0avgdata 909792maxresident)k
    2842.03user 654.39system 1:02:24elapsed 93%CPU (0avgtext+0avgdata 909880maxresident)k

Case (d), parallelized:

    -j12 GCC_TEST_PARALLEL_SLOTS=2
    2856.39user 606.87system 33:58.64elapsed 169%CPU (0avgtext+0avgdata 909948maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=6
    3444.90user 666.86system 18:37.57elapsed 367%CPU (0avgtext+0avgdata 909856maxresident)k
    3462.13user 667.13system 18:36.87elapsed 369%CPU (0avgtext+0avgdata 909872maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=8
    3929.74user 716.22system 18:02.36elapsed 429%CPU (0avgtext+0avgdata 909832maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=10
    4152.84user 736.16system 17:43.05elapsed 459%CPU (0avgtext+0avgdata 909872maxresident)k
    -j12 GCC_TEST_PARALLEL_SLOTS=12
    4209.60user 749.00system 17:35.20elapsed 469%CPU (0avgtext+0avgdata 909840maxresident)k
    -j20 GCC_TEST_PARALLEL_SLOTS=20 [oversubscribe]
    4255.54user 756.78system 17:29.06elapsed 477%CPU (0avgtext+0avgdata 909868maxresident)k

Worth noting is that with nvptx offloading, there is one execution test case
that times out ('libgomp.fortran/reverse-offload-5.f90').  This effectively
stalls progress for almost 5 min: other execution test cases quickly queue up
on the lock for all parallel slots.  That's working as expected; just noting
this as it accordingly does skew the wall time numbers.

PR testsuite/66005
libgomp/
* configure.ac: Look for 'flock'.
* testsuite/Makefile.am (gcc_test_parallel_slots): Enable parallel testing.
* testsuite/config/default.exp: Don't 'load_lib "standard.exp"' here...
* testsuite/lib/libgomp.exp: ... but here, instead.
(libgomp_load): Override for parallel testing.
* testsuite/libgomp-site-extra.exp.in (FLOCK): Set.
* configure: Regenerate.
* Makefile.in: Regenerate.
* testsuite/Makefile.in: Regenerate.

Support parallel testing in libgomp, part I [PR66005]

..., while still hard-coding the number of parallel slots to one.

PR testsuite/66005
libgomp/
* testsuite/Makefile.am (PWD_COMMAND): New variable.
(%/site.exp): New target.
(check_p_numbers0, check_p_numbers1, check_p_numbers2)
(check_p_numbers3, check_p_numbers4, check_p_numbers5)
(check_p_numbers6, check_p_numbers, gcc_test_parallel_slots)
(check_p_subdirs)
(check_DEJAGNU_libgomp_targets): New variables.
($(check_DEJAGNU_libgomp_targets)): New target.
($(check_DEJAGNU_libgomp_targets)): New dependency.
(check-DEJAGNU $(check_DEJAGNU_libgomp_targets)): New targets.
* testsuite/Makefile.in: Regenerate.
* testsuite/lib/libgomp.exp: For parallel testing,
'load_file ../libgomp-test-support.exp'.

Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>

libgomp testsuite: As appropriate, use the 'gcc', 'g++', 'gfortran' driver [PR91884]

..., that is, 'GCC_UNDER_TEST', 'GXX_UNDER_TEST', 'GFORTRAN_UNDER_TEST' instead
of 'GCC_UNDER_TEST' for all of them. No need anymore for 'gcc -lstdc++ -x c++'
for C++ code, or 'gcc -lgfortran' plus conditional '-lquadmath' for Fortran
code. (Getting rid of explicit '-foffload=-lgfortran' is for another day.)

PR testsuite/91884
libgomp/
* configure.ac: 'AC_SUBST(CXX)'.
* configure: Regenerate.
* Makefile.in: Likewise.
* testsuite/Makefile.in: Likewise.
* testsuite/libgomp-site-extra.exp.in (GXX_UNDER_TEST)
(GFORTRAN_UNDER_TEST): Set.
* testsuite/lib/libgomp.exp (libgomp_init): Adjust.
* testsuite/libgomp.c++/c++.exp: Use 'GXX_UNDER_TEST'.
* testsuite/libgomp.oacc-c++/c++.exp: Likewise.
* testsuite/libgomp.fortran/fortran.exp: Use
'GFORTRAN_UNDER_TEST'.
* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.

libgomp testsuite: Have each '*.exp' file specify the compiler to use [PR91884]

..., which is still 'GCC_UNDER_TEST' for all of them; no change in behavior.

PR testsuite/91884
libgomp/
* testsuite/lib/libgomp.exp (libgomp_target_compile): Don't
specify compiler.
* testsuite/libgomp.c++/c++.exp (ALWAYS_CFLAGS): Specify compiler.
* testsuite/libgomp.c/c.exp (ALWAYS_CFLAGS): Likewise.
* testsuite/libgomp.fortran/fortran.exp (ALWAYS_CFLAGS): Likewise.
* testsuite/libgomp.graphite/graphite.exp (ALWAYS_CFLAGS):
Likewise.
* testsuite/libgomp.oacc-c++/c++.exp (ALWAYS_CFLAGS): Likewise.
* testsuite/libgomp.oacc-c/c.exp (ALWAYS_CFLAGS): Likewise.
* testsuite/libgomp.oacc-fortran/fortran.exp (ALWAYS_CFLAGS):
Likewise.

fix assert in __deregister_frame_info_bases

The assertion in __deregister_frame_info_bases assumes that for every
frame something was inserted into the lookup data structure by
__register_frame_info_bases. Unfortunately, this does not necessarily
hold true as the btree_insert call in __register_frame_info_bases will
not insert anything for empty ranges. Therefore, we need to explicitly
account for such empty ranges in the assertion as `ob` will be a null
pointer for such ranges, hence causing the assertion to fail.
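
The situation can be modelled with a small sketch.  This is not the libgcc
code: the std::map stands in for the b-tree, and object, register_frame and
deregister_frame are simplified, invented names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for a registered unwind object.
struct object {
    std::uintptr_t begin;
    std::uintptr_t size;
};

// Hypothetical stand-in for the lookup data structure.
std::map<std::uintptr_t, object*> registry;

void register_frame(object* ob) {
    if (ob->size == 0)
        return;  // nothing is inserted for an empty range
    registry[ob->begin] = ob;
}

// Returns the removed object, or nullptr if nothing had been inserted.
object* deregister_frame(std::uintptr_t begin, std::uintptr_t size) {
    object* ob = nullptr;
    const auto it = registry.find(begin);
    if (it != registry.end()) {
        ob = it->second;
        registry.erase(it);
    }
    // A plain assert(ob) would fire for empty ranges, which were never
    // inserted; accept a null result in exactly that case.
    assert(ob != nullptr || size == 0);
    return ob;
}
```

Deregistering an object with a non-empty range returns it as before, while
deregistering an empty range now returns a null pointer without tripping
the assertion.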

Signed-off-by: Sören Tempel <soeren@soeren-tempel.net>
libgcc/ChangeLog:
* unwind-dw2-fde.c: Accept empty ranges when deregistering frames.