Harald Anlauf [Wed, 17 May 2023 18:39:18 +0000 (20:39 +0200)]
Fortran: set shape of initializers of zero-sized arrays [PR95374,PR104352]
gcc/fortran/ChangeLog:
PR fortran/95374
PR fortran/104352
* decl.cc (add_init_expr_to_sym): Set shape of initializer also for
zero-sized arrays, so that bounds violations can be detected later.
gcc/testsuite/ChangeLog:
PR fortran/95374
PR fortran/104352
* gfortran.dg/zero_sized_13.f90: New test.
Jakub Jelinek [Wed, 17 May 2023 19:21:23 +0000 (21:21 +0200)]
libstdc++: Fix up some <cmath> templates [PR109883]
As can be seen on the following testcase, for
std::{atan2,fmod,pow,copysign,fdim,fmax,fmin,hypot,nextafter,remainder,remquo,fma}
if one operand type is std::float{16,32,64,128}_t or std::bfloat16_t and
another one some integral type or some other floating point type which
promotes to the other operand's type, we can end up with endless recursion.
This is because of a declaration ordering problem in <cmath>, where the
float, double and long double overloads of those functions come before
the templates which use __gnu_cxx::__promote_{2,3}, but the
std::float{16,32,64,128}_t and std::bfloat16_t overloads come later in the
file. If the result of those promotions is _Float{16,32,64,128} or
__gnu_cxx::__bfloat16_t, say std::pow(_Float64, int) calls
std::pow(_Float64, _Float64) and the latter calls itself.
The following patch fixes that by moving those templates later in the file,
so that the calls from those templates see also the other overloads.
I think other templates in the file, e.g. isgreater etc., shouldn't be
a problem, because those just use __builtin_isgreater etc. in their bodies.
2023-05-17 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/109883
* include/c_global/cmath (atan2, fmod, pow): Move
__gnu_cxx::__promote_2 using templates after _Float{16,32,64,128} and
__gnu_cxx::__bfloat16_t overloads.
(copysign, fdim, fmax, fmin, hypot, nextafter, remainder, remquo):
Likewise.
(fma): Move __gnu_cxx::__promote_3 using template after
_Float{16,32,64,128} and __gnu_cxx::__bfloat16_t overloads.
* testsuite/26_numerics/headers/cmath/constexpr_std_c++23.cc: New test.
Jivan Hakobyan [Wed, 17 May 2023 19:00:28 +0000 (13:00 -0600)]
RISC-V: Remove masking third operand of rotate instructions
Rotate instructions do not need to mask the third operand.
For example, on RV64 the following code:
unsigned long foo1(unsigned long rs1, unsigned long rs2)
{
long shamt = rs2 & (64 - 1);
return (rs1 << shamt) | (rs1 >> ((64 - shamt) & (64 - 1)));
}
Compiles to:
foo1:
andi a1,a1,63
rol a0,a0,a1
ret
This patch removes the unnecessary masking.
In addition, it merges the previously separate masking patterns for shifts.
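The masking is redundant because the Zbb rol/ror instructions only use the low log2(XLEN) bits of rs2 (6 bits on RV64). The portable C sketch below mirrors foo1 with uint64_t in place of unsigned long. Masking the count with 63 just reduces it modulo 64, which is what rol already does in hardware, so rotating by 65 gives the same result as rotating by 1.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left written like foo1 above: the rotate count is reduced to
   0..63, and the complementary right shift is masked too, so a count of 0
   never shifts by 64 (which would be undefined behaviour in C). */
uint64_t rotl64(uint64_t x, unsigned n)
{
    unsigned s = n & 63;
    return (x << s) | (x >> ((64 - s) & 63));
}
```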
gcc/ChangeLog:
* config/riscv/riscv.md (*<optab><GPR:mode>3_mask): New pattern,
combined from ...
(*<optab>si3_mask, *<optab>di3_mask): Here.
(*<optab>si3_mask_1, *<optab>di3_mask_1): And here.
* config/riscv/bitmanip.md (*<bitmanip_optab><GPR:mode>3_mask): New
pattern.
(*<bitmanip_optab>si3_sext_mask): Likewise.
* config/riscv/iterators.md (shiftm1): Use const_si_mask_operand
and const_di_mask_operand.
(bitmanip_rotate): New iterator.
(bitmanip_optab): Add rotates.
* config/riscv/predicates.md (const_si_mask_operand): Renamed
from const31_operand. Generalize to handle more mask constants.
(const_di_mask_operand): Similarly.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/shift-and-2.c: Fixed test
* gcc.target/riscv/zbb-rol-ror-01.c: New test
* gcc.target/riscv/zbb-rol-ror-02.c: New test
* gcc.target/riscv/zbb-rol-ror-03.c: New test
* gcc.target/riscv/zbb-rol-ror-04.c: New test
* gcc.target/riscv/zbb-rol-ror-05.c: New test
* gcc.target/riscv/zbb-rol-ror-06.c: New test
* gcc.target/riscv/zbb-rol-ror-07.c: New test
Jonathan Wakely [Wed, 17 May 2023 12:47:54 +0000 (13:47 +0100)]
libstdc++: Add system_header pragma to <bits/c++config.h>
Without this change many tests that depend on an effective-target will
fail when compiled with -pedantic -std=c++98. This happens because the
preprocessor check done by v3_check_preprocessor_condition uses -Werror
and includes <bits/c++config.h> directly (rather than via another header
like <string>). If <bits/c++config.h> is not a system header then this
pedwarn is not suppressed, and the effective-target check fails:
bits/c++config.h:220: error: anonymous variadic macros were introduced in C++11 [-Werror=variadic-macros]
cc1plus: all warnings being treated as errors
compiler exited with status 1
UNSUPPORTED: 18_support/headers/limits/synopsis.cc
We could consider also changing proc v3_check_preprocessor_condition so
that it includes a real header, rather than just <bits/c++config.h>, but
that's not necessary for now.
Jonathan Wakely [Tue, 16 May 2023 21:40:42 +0000 (22:40 +0100)]
libstdc++: Implement LWG 3877 for std::expected monadic ops
This was approved in Issaquah 2023. As well as fixing the value
categories, this fixes the fact that we were incorrectly testing E
instead of T in the or_else constraints.
libstdc++-v3/ChangeLog:
* include/std/expected (expected::and_then, expected::or_else)
(expected::transform, expected::transform_error): Fix exception
specifications as per LWG 3877.
(expected<void, E>::and_then, expected<void, E>::transform):
Likewise.
* testsuite/20_util/expected/lwg3877.cc: New test.
Jakub Jelinek [Wed, 17 May 2023 18:59:54 +0000 (20:59 +0200)]
i386: Fix up types in __builtin_{inf,huge_val,nan{,s},fabs,copysign}q builtins [PR109884]
When _Float128 support was added to C++ for 13.1, the float128t_type_node
tree was added. In C, float128_type_node and float128t_type_node are the
same and represent both _Float128 and __float128, but in C++ they are
distinct types which are handled differently in the FEs.
When making that change, I forgot to change the FLOAT128 primitive
type, which is used for the __builtin_{inf,huge_val,nan{,s},fabs,copysign}q
builtins' results and some of their arguments (and nothing else).
The following patch fixes that.
On ia64 we already use float128t_type_node for those builtins. On pa,
__float128 is the same type as long double, so those builtins have
long double types. On powerpc it seems we don't have these builtins,
but instead define macros which map them to __builtin_*f128. That will
not work properly in C++; perhaps we should change those macros to be
function-like and cast to __float128.
2023-05-17 Jakub Jelinek <jakub@redhat.com>
PR c++/109884
* config/i386/i386-builtin-types.def (FLOAT128): Use
float128t_type_node rather than float128_type_node.
Since tree-ssa-math-opts may freely contract across statement boundaries
we should enable it only for -ffp-contract=fast instead of disabling it
for -ffp-contract=off.
No functional change, since -ffp-contract=on is not exposed yet.
gcc/ChangeLog:
* tree-ssa-math-opts.cc (convert_mult_to_fma): Enable only for
FP_CONTRACT_FAST (no functional change).
Returned integer vector mode costs of emulated modes in
ix86_multiplication_cost are wrong and do not reflect generated
instruction sequences. Rewrite handling of different integer vector
modes and different target ABIs to return real instruction
counts in order to calculate better costs of various emulated modes.
gcc/ChangeLog:
* config/i386/i386.cc (ix86_multiplication_cost): Correct
calculation of integer vector mode costs to reflect generated
instruction sequences of different integer vector modes and
different target ABIs.
Gaius Mulley [Wed, 17 May 2023 16:42:03 +0000 (17:42 +0100)]
WriteInt in the ISO libraries should not emit '+' for positive values
This trivial patch changes the default behaviour for WriteInt so that
'+' is not emitted when writing positive values.
gcc/m2/ChangeLog:
* gm2-libs-iso/LongWholeIO.mod (WriteInt): Only request a
sign if the value is < 0.
* gm2-libs-iso/ShortWholeIO.mod (WriteInt): Only request a
sign if the value is < 0.
* gm2-libs-iso/WholeIO.mod (WriteInt): Only request a sign
if the value is < 0.
* gm2-libs-iso/WholeStr.mod (WriteInt): Only request a sign
if the value is < 0.
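For readers more familiar with C, the behaviour change to WriteInt resembles switching from printf's "%+ld" (always emit a sign) to plain "%ld" (emit '-' only for negative values). The sketch below is a hypothetical C analogue, not the Modula-2 implementation; format_int and its old_behaviour flag are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical C analogue of the WriteInt change. "%+ld" always writes a
   sign (so 42 becomes "+42"); "%ld" requests a sign only for negative
   values, matching the new behaviour. */
int format_int(char *buf, size_t size, long value, bool old_behaviour)
{
    if (old_behaviour)
        return snprintf(buf, size, "%+ld", value);
    return snprintf(buf, size, "%ld", value);
}
```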
As a result, both performance and correctness can be relied upon.
Here is the example:
void f (void * in, void *out, int32_t x, int n, int m)
{
for (int i = 0; i < n; i++) {
vint32m1_t v = __riscv_vle32_v_i32m1 (in + i, 4);
vint32m1_t v2 = __riscv_vle32_v_i32m1_tu (v, in + 100 + i, 4);
vint32m1_t v3 = __riscv_vaadd_vx_i32m1 (v2, 0, VXRM_RDN, 4);
v3 = __riscv_vaadd_vx_i32m1 (v3, 3, VXRM_RDN, 4);
__riscv_vse32_v_i32m1 (out + 100 + i, v3, 4);
}
for (int i = 0; i < n; i++) {
vint32m1_t v = __riscv_vle32_v_i32m1 (in + i + 1000, 4);
vint32m1_t v2 = __riscv_vle32_v_i32m1_tu (v, in + 100 + i + 1000, 4);
vint32m1_t v3 = __riscv_vaadd_vx_i32m1 (v2, 0, VXRM_RDN, 4);
v3 = __riscv_vaadd_vx_i32m1 (v3, 3, VXRM_RDN, 4);
__riscv_vse32_v_i32m1 (out + 100 + i + 1000, v3, 4);
}
}
Mode switching can globally recognize that both Loop 1 and Loop 2 use the RDN
rounding mode and hoist a single "csrwi vxrm,2" so that it dominates both Loop 1
and Loop 2.
I have also added correctness sanity tests in this patch.
* gcc.target/riscv/rvv/base/vxrm-10.c: New test.
* gcc.target/riscv/rvv/base/vxrm-6.c: New test.
* gcc.target/riscv/rvv/base/vxrm-7.c: New test.
* gcc.target/riscv/rvv/base/vxrm-8.c: New test.
* gcc.target/riscv/rvv/base/vxrm-9.c: New test.
This patch does not insert the csrw vxrm configuration instruction yet;
automatic insertion of csrw vxrm will be supported in the next patch.
This patch does the following:
1. Only extends the vxrm argument.
2. Checks whether the vxrm argument is an invalid immediate and reports an error message if it is.
The problem here is that VRP cannot figure out that isize cannot be 0,
because the check uses integer_zerop. This patch removes the use of
integer_zerop and instead checks for 0 directly after converting the tree to
an unsigned HOST_WIDE_INT. This allows VRP to figure out that isize is not 0
and that `isize - 1` is always >= 0.
This patch only avoids a warning that GCC could sometimes produce;
it does not change code generation or even VRP.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-forwprop.cc (simplify_builtin_call): Check
against 0 instead of calling integer_zerop.
Aldy Hernandez [Tue, 16 May 2023 20:20:54 +0000 (22:20 +0200)]
Provide support for copying unsupported ranges.
The unsupported_range class is provided for completeness' sake. It is a
way to set VARYING/UNDEFINED ranges for unsupported ranges (currently
anything not float, integer, or pointer). You can't do anything with
them, except set_varying, and set_undefined. We will trap on any
other operation.
This patch provides a way to copy them, just in case they creep in.
This could happen in IPA under certain circumstances.
gcc/ChangeLog:
* value-range.cc (vrange::operator=): Add a stub to copy
unsupported ranges.
* value-range.h (is_a <unsupported_range>): New.
(Value_Range::operator=): Support copying unsupported ranges.
I think it's time for the ranger folk to start owning range streaming
instead of passes (IPA, etc) doing their own thing. I have plans for
overhauling the IPA code later this cycle to support generic ranges,
and I'd like to start cleaning up the streaming and hashing interface.
This patch adds generic streaming support for vrange.
Jonathan Wakely [Thu, 27 Apr 2023 11:02:38 +0000 (12:02 +0100)]
doc: Describe behaviour of enums with fixed underlying type [PR109532]
gcc/ChangeLog:
PR c++/109532
* doc/invoke.texi (Code Gen Options): Note that -fshort-enums
is ignored for a fixed underlying type.
(C++ Dialect Options): Likewise for -fstrict-enums.
Tobias Burnus [Wed, 17 May 2023 10:28:14 +0000 (12:28 +0200)]
Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings
Previously, for an 'alloc' map, array descriptors might have been mapped
as 'alloc' instead of 'to', so the array bounds were not updated. The
'alloc' could also appear for 'data exit', failing with a libgomp
assert. In some cases, either array descriptors or the length variable
of deferred-length strings were not mapped. Finally, some offset
calculations for array-section mappings went wrong.
Additionally, the patch now unmaps the GOMP_MAP_POINTER for scalar
allocatables/pointers, avoiding stale mappings.
The testcases contain some commented-out tests which require follow-up
work and for which PRs exist. Those mostly relate to deferred-length
strings, which have several issues beyond OpenMP support.
gcc/fortran/ChangeLog:
* trans-decl.cc (gfc_get_symbol_decl): Add attributes
such as 'declare target' also to hidden artificial
variable for deferred-length character variables.
* trans-openmp.cc (gfc_trans_omp_array_section,
gfc_trans_omp_clauses, gfc_trans_omp_target_exit_data):
Improve mapping of array descriptors and deferred-length
string variables.
gcc/ChangeLog:
* gimplify.cc (gimplify_scan_omp_clauses): Remove Fortran
special case.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/target-enter-data-3.f90: Uncomment
'target exit data'.
* testsuite/libgomp.fortran/target-enter-data-4.f90: New test.
* testsuite/libgomp.fortran/target-enter-data-5.f90: New test.
* testsuite/libgomp.fortran/target-enter-data-6.f90: New test.
* testsuite/libgomp.fortran/target-enter-data-7.f90: New test.
gcc/testsuite/
* gfortran.dg/goacc/finalize-1.f: Update dg-tree; shows a fix
for 'finalize' as a ptr is now 'delete' instead of 'release'.
* gfortran.dg/gomp/pr78260-2.f90: Likewise, as the elem-size calculation
moved to the if (allocated) block.
* gfortran.dg/gomp/target-exit-data.f90: Likewise, as a var is now
replaced by a MEM< _25 > expression.
* gfortran.dg/gomp/map-9.f90: Update dg-scan-tree-dump.
* gfortran.dg/gomp/map-10.f90: New test.
So far, atomic objects have been aligned according to their default
alignment. For 128-bit scalar types like int128 or long double this results
in an 8-byte alignment, which is wrong; it must be 16 bytes.
libstdc++ already computes a correct alignment; a test case is nevertheless
added to make sure that both implementations are compatible.
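One way to observe the alignment the compiler actually assigns is to apply C11 alignof to an _Atomic-qualified type. This sketch assumes a 64-bit GCC target that provides __int128 (for example x86_64 or AArch64 GNU/Linux); on such targets the expected value is 16, the alignment the commit message says is required.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Alignment the compiler uses for 128-bit atomic scalars on the host
   target. The commit message states this must be 16 bytes rather than
   the 8 bytes that default alignment can give on some targets. */
size_t atomic_int128_align(void)      { return alignof(_Atomic(__int128)); }
size_t atomic_long_double_align(void) { return alignof(_Atomic(long double)); }
```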
Jakub Jelinek [Wed, 17 May 2023 08:15:50 +0000 (10:15 +0200)]
c++: Don't try to initialize zero width bitfields in zero initialization [PR109868]
My GCC 12 change to avoid removing zero-sized bitfields as they are
important for ABI and are needed for layout compatibility traits
apparently causes zero sized bitfields to be initialized in the IL,
which at least in 13+ results in ICEs in the ranger which is upset
about zero precision types.
I think we could even avoid initializing other unnamed bitfields, but
unfortunately !CONSTRUCTOR_NO_CLEARING doesn't mean in the middle-end
clearing of padding bits and until we have some new flag that represents
the request to clear padding bits, I think it is better to keep zeroing
non-zero sized unnamed bitfields.
In addition to skipping those fields, I have changed how UNION_TYPEs
are handled. The current code was a little weird in that, e.g., if the
first non-static data member had error_mark_node type, we'd happily
zero initialize the second non-static data member, etc.
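The fix is in the C++ front end (build_zero_init_1), but the two language rules involved can be illustrated in C. An unnamed zero-width bit-field occupies no storage and only forces the next bit-field into a new allocation unit, so there is nothing for zero initialization to write. Initializing a union with {0} initializes only its first named member. The sizeof check in this sketch assumes an ABI such as x86_64 or AArch64 SysV, where the zero-width field moves b into the next 4-byte unit.

```c
#include <assert.h>
#include <stddef.h>

/* a lives in the first unsigned unit; the zero-width field has no storage
   and only pushes b into the next unsigned unit. */
struct flags {
    unsigned a : 3;
    unsigned   : 0;   /* zero-width: nothing to initialize */
    unsigned b : 5;
};

/* {0} initializes only the first named member, i. */
union value {
    int i;
    double d;
};

struct flags zero_flags(void) { struct flags f = {0}; return f; }
union value zero_value(void)  { union value v = {0}; return v; }
```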
2023-05-17 Jakub Jelinek <jakub@redhat.com>
PR c++/109868
* init.cc (build_zero_init_1): Don't initialize zero-width bitfields.
For unions only initialize the first FIELD_DECL.
Kewen Lin [Wed, 17 May 2023 07:48:40 +0000 (02:48 -0500)]
vect: Don't retry if the previous analysis fails
When working on a cost tweaking patch, I found that a newly
added test case has different dumps with stage-1 and
bootstrapped gcc. Looking into it, the apparent reason is
that vect_analyze_loop_2 doesn't set slp_done_for_suggested_uf
as expected, so the subsequent retry uses the garbage value of
slp_done_for_suggested_uf instead. In fact,
slp_done_for_suggested_uf is only set when the previous
analysis succeeds; for the mentioned test case the previous
analysis fails, so the value of slp_done_for_suggested_uf
should not be used.
In function vect_analyze_loop_1, we only return success when
res is true, which is the result of the 1st analysis. This means
we never try to vectorize with unroll_vinfo if the previous
analysis fails. So this patch shouldn't break anything; it
just stops some useless analysis early.
gcc/ChangeLog:
* tree-vect-loop.cc (vect_analyze_loop_1): Don't retry analysis with
suggested unroll factor once the previous analysis fails.
These APIs help users convert an LMUL=1 integer vector to vbool1_t.
According to the RVV intrinsic spec, the reinterpret intrinsics
only change the type of the underlying contents.
For example, given the code below:
vbool1_t test_vreinterpret_v_i8m1_b1(vint8m1_t src) {
return __riscv_vreinterpret_v_i8m1_b1(src);
}
It generates assembly code similar to the following:
vsetvli a5,zero,e8,m8,ta,ma
vlm.v v1,0(a1)
vsm.v v1,0(a0)
ret
The intrinsic APIs for the remaining bool sizes will be prepared in another patch.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/genrvv-type-indexer.cc (BOOL_SIZE_LIST): New
macro.
(main): Add bool1 to the type indexer.
* config/riscv/riscv-vector-builtins-functions.def
(vreinterpret): Register vbool1 interpret function.
* config/riscv/riscv-vector-builtins-types.def
(DEF_RVV_BOOL1_INTERPRET_OPS): New macro.
(vint8m1_t): Add the type to bool1_interpret_ops.
(vint16m1_t): Ditto.
(vint32m1_t): Ditto.
(vint64m1_t): Ditto.
(vuint8m1_t): Ditto.
(vuint16m1_t): Ditto.
(vuint32m1_t): Ditto.
(vuint64m1_t): Ditto.
* config/riscv/riscv-vector-builtins.cc
(DEF_RVV_BOOL1_INTERPRET_OPS): New macro.
(required_extensions_p): Add bool1 interpret case.
* config/riscv/riscv-vector-builtins.def
(bool1_interpret): Add bool1 interpret to base type.
* config/riscv/vector.md (@vreinterpret<mode>): Add new expand
with VB dest for vreinterpret.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c: New test.
Joseph Myers [Tue, 16 May 2023 23:44:56 +0000 (23:44 +0000)]
c: Remove restrictions on declarations in 'for' loops for C2X
C2X removes a restriction that the only declarations in the
declaration part of a 'for' loop are declarations of objects with
storage class auto or register. Implement this change, making the
diagnostics into pedwarn_c11 calls instead of errors (as usual for
features added in a new standard version that were invalid code in a
previous version), so now pedwarn-if-pedantic for older standards and
diagnosed also with -Wc11-c2x-compat.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
gcc/c/
* c-decl.cc (check_for_loop_decls): Use pedwarn_c11 for
diagnostics.
gcc/testsuite/
* gcc.dg/c11-fordecl-1.c, gcc.dg/c11-fordecl-2.c,
gcc.dg/c11-fordecl-3.c, gcc.dg/c11-fordecl-4.c,
gcc.dg/c2x-fordecl-1.c, gcc.dg/c2x-fordecl-2.c,
gcc.dg/c2x-fordecl-3.c, gcc.dg/c2x-fordecl-4.c: New tests.
* gcc.dg/c99-fordecl-2.c: Test diagnostic for typedef declaration
in for loop here.
* gcc.dg/pr67784-2.c, gcc.dg/pr68320.c, objc.dg/foreach-7.m: Do
not expect errors for typedef declaration in for loop.
Gaius Mulley [Tue, 16 May 2023 23:18:56 +0000 (00:18 +0100)]
PR modula2/109879 WholeIO.ReadCard and ReadInt should consume leading space
The Read{TYPE} procedures in LongIO, LongWholeIO, RealIO, ShortWholeIO and
WholeIO all require skip space functionality. A new module TextUtil
is supplied with this functionality and the previous modules have been
changed to call SkipSpaces.
Marek Polacek [Tue, 16 May 2023 18:12:06 +0000 (14:12 -0400)]
c++: -Wdangling-reference not suppressed in template [PR109774]
In check_return_expr, we suppress the -Wdangling-reference warning when
we're sure it would be a false positive. It wasn't working in a
template, though, because the suppress_warning call was never reached.
PR c++/109774
gcc/cp/ChangeLog:
* typeck.cc (check_return_expr): In a template, return only after
suppressing -Wdangling-reference.
Jonathan Wakely [Tue, 16 May 2023 14:09:20 +0000 (15:09 +0100)]
libstdc++: Disable cacheline alignment for DJGPP [PR109741]
DJGPP (and maybe other targets) uses MAX_OFILE_ALIGNMENT=16 which means
that globals (and static objects) can't have alignment greater than 16.
This causes an error for the locks defined in src/c++11/shared_ptr.cc
because we try to align them to the cacheline size, to avoid false
sharing.
Add a configure check for the increased alignment, and live with false
sharing where we can't increase the alignment.
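The approach amounts to aligning each lock to a cache line only when a configure check says the target allows it, and otherwise falling back to ordinary alignment. This sketch is written in C: HAVE_ALIGNAS_CACHELINE is a hypothetical macro standing in for the result of a check like GLIBCXX_CHECK_ALIGNAS_CACHELINE, and 64 is an assumed cache line size. It does not reproduce the real shared_ptr.cc code.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHELINE_SIZE 64                 /* assumed cache line size */

#ifdef HAVE_ALIGNAS_CACHELINE             /* hypothetical configure result */
#  define LOCK_ALIGN CACHELINE_SIZE       /* one lock per cache line */
#else
#  define LOCK_ALIGN alignof(int)         /* accept possible false sharing */
#endif

struct padded_lock {
    alignas(LOCK_ALIGN) int locked;       /* hypothetical lock word */
};

/* A global table: its alignment must not exceed what the object file
   format supports (MAX_OFILE_ALIGNMENT), hence the configure check. */
struct padded_lock lock_table[16];
```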
libstdc++-v3/ChangeLog:
PR libstdc++/109741
* acinclude.m4 (GLIBCXX_CHECK_ALIGNAS_CACHELINE): Define.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use GLIBCXX_CHECK_ALIGNAS_CACHELINE.
* src/c++11/shared_ptr.cc (__gnu_internal::get_mutex): Do not
align lock table if not supported. Use __GCC_DESTRUCTIVE_SIZE
instead of hardcoded 64.
Patrick Palka [Tue, 16 May 2023 16:39:16 +0000 (12:39 -0400)]
c++: desig init in presence of list ctor [PR109871]
add_list_candidates has logic to reject designated initialization of a
non-aggregate type, but this is inadvertently being suppressed if the type
has a list constructor due to the order of case analysis, which in the
below testcase leads to us incorrectly treating the initializer list as if
it's non-designated. This patch fixes this by making us check for invalid
designated initialization sooner.
PR c++/109871
gcc/cp/ChangeLog:
* call.cc (add_list_candidates): Check for invalid designated
initialization sooner and even for types that have a list
constructor.
This patch adds support for the VDIVSQ, VDIVUQ, VMODSQ, and VMODUQ
instructions to do 128-bit arithmetic.
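These instructions target source-level 128-bit division and modulus like the functions below. Whether GCC actually emits vdivsq/vdivuq/vmodsq/vmoduq depends on the target options (Power10, e.g. -mcpu=power10). On other targets the same C code is compiled differently; on x86_64, for instance, it typically becomes calls to libgcc helpers. The test values rely on C's truncation toward zero for signed division.

```c
#include <assert.h>

/* Signed and unsigned 128-bit division and modulus. */
__int128 sdiv128(__int128 a, __int128 b) { return a / b; }
__int128 smod128(__int128 a, __int128 b) { return a % b; }

unsigned __int128 udiv128(unsigned __int128 a, unsigned __int128 b) { return a / b; }
unsigned __int128 umod128(unsigned __int128 a, unsigned __int128 b) { return a % b; }
```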
2021-07-07 Michael Meissner <meissner@linux.ibm.com>
The code generation changed significantly. There are two places where
the vextsd2q is "replaced" by a vdivsq instruction thus increasing the
vdivsq count from 1 to 3. The first case is:
Carl Love [Tue, 28 Mar 2023 16:57:25 +0000 (12:57 -0400)]
rs6000: Fix test gcc.target/powerpc/rs6000-fpint.c test options
The test's dg-do compile target, rs6000-*-*, is outdated and no longer
supported. powerpc*-*-* is the default, so it doesn't need to be specified.
The dg-options need to specify an older processor to get the desired
behavior on recent processors: gfxopt is only off for very old CPUs,
and we don't guard stfiwx under it for recent processors (and don't want to).
This patch updates the test specifications so the test will run properly on
Power10LE. Tested on Power10 LE system with no regression test failures.
gcc/testsuite/:
* gcc.target/powerpc/rs6000-fpint.c: Update dg-options, drop dg-do
compile specifier.
Gaius Mulley [Tue, 16 May 2023 14:51:53 +0000 (15:51 +0100)]
PR modula2/108344 disable default opening of /dev/tty
This patch removes the static initialisation code for KeyBoardLEDs.cc.
The module is only initialised if one of the exported functions is called.
This is useful as the module accesses /dev/tty, which might not be
available. TimerHandler.mod has also been changed to disable the use of
the scroll lock LED as a sign of life.
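Initialisation on first use can be sketched in C as follows. This is a hypothetical illustration, not the KeyBoardLEDs.cc code: ensure_initialised, leds_initialised and leds_available are made-up names. /dev/tty is opened only when an exported routine is first called, so a program that never uses the LEDs never touches the device. If no terminal is available, open fails and the routine reports that cleanly.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

static bool initialised = false;
static int tty_fd = -1;

/* Open the terminal lazily, at most once. */
static void ensure_initialised(void)
{
    if (!initialised) {
        tty_fd = open("/dev/tty", O_RDWR);   /* -1 if no terminal is available */
        initialised = true;
    }
}

bool leds_initialised(void) { return initialised; }

/* Hypothetical exported routine: reports whether the LEDs can be driven. */
bool leds_available(void)
{
    ensure_initialised();
    return tty_fd >= 0;
}
```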
gcc/m2/ChangeLog:
PR modula2/108344
* gm2-libs-coroutines/TimerHandler.mod (EnableLED): New constant.
(Timer): Test EnableLED before switching on the scroll LED.
aarch64: Allow moves after tied-register intrinsics (2nd edition)
I missed these two in g:4ff89f10ca0d41f9cfa76 because I was
testing on a system that didn't support big-endian compilation.
Testing on aarch64_be-elf shows no other related failures
(although the overall results are worse than for little-endian).
gcc/testsuite/
* gcc.target/aarch64/advsimd-intrinsics/bfdot-2.c: Allow moves
to occur after the intrinsic instruction, rather than requiring
them to happen before.
* gcc.target/aarch64/advsimd-intrinsics/vdot-3-2.c: Likewise.
Jonathan Wakely [Fri, 12 May 2023 20:36:56 +0000 (21:36 +0100)]
libstdc++: Stop using TR1 macros in <cctype> and <cfenv>
As with the two commits before this, the _GLIBCXX_USE_C99_CTYPE_TR1 and
_GLIBCXX_USE_C99_FENV_TR1 macros are misleading when they are also used
for <cctype> and <cfenv>, not only for TR1 headers. It is also wrong,
because the configure checks for TR1 use -std=c++98 and a target might
define the C99 features for C++11 but not for C++98.
Add separate configure checks for the <ctype.h> and <fenv.h> features using -std=c++11
for the checks. Use the new macros defined by those checks in the
C++11-specific parts of <cctype>, <cfenv>, and <fenv.h>.
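As a rough illustration of what such a check verifies, the following C program compiles only if <ctype.h> declares the C99 isblank function. It is an illustrative probe, not the actual acinclude.m4 test; the real checks compile C++ code in C++11 mode instead of C++98.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Illustrative probe: requires the C99 isblank declaration. */
bool probe_isblank(void)
{
    return isblank(' ') && isblank('\t') && !isblank('x');
}
```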
libstdc++-v3/ChangeLog:
* acinclude.m4 (GLIBCXX_USE_C99): Check for isblank in C++11
mode and define _GLIBCXX_USE_C99_CTYPE. Check for <fenv.h>
functions in C++11 mode and define _GLIBCXX_USE_C99_FENV.
* config.h.in: Regenerate.
* configure: Regenerate.
* include/c_compatibility/fenv.h: Check _GLIBCXX_USE_C99_FENV
instead of _GLIBCXX_USE_C99_FENV_TR1.
* include/c_global/cfenv: Likewise.
* include/c_global/cctype: Check _GLIBCXX_USE_C99_CTYPE instead
of _GLIBCXX_USE_C99_CTYPE_TR1.
Jonathan Wakely [Fri, 12 May 2023 17:53:55 +0000 (18:53 +0100)]
libstdc++: Stop using _GLIBCXX_USE_C99_STDINT_TR1 in <cstdint>
The _GLIBCXX_USE_C99_STDINT_TR1 macro (and the comments about it in
acinclude.m4 and config.h) are misleading when it is also used for
<cstdint>, not only <tr1/stdint>. It is also wrong, because the
configure checks for TR1 use -std=c++98 and a target might define
uint32_t etc. for C++11 but not for C++98.
Add a separate configure check for the <stdint.h> types using -std=c++11
for the checks. Use the result of that separate check in <cstdint> and
most other places that still depend on the macro (many uses of that
macro have been removed already). The remaining uses of the STDINT_TR1
macro are really for TR1, or are in the src/c++11/compatibility-*.cc
files, where we don't want/need to change the condition they depend on
(if those symbols were only exported when <stdint.h> types were
available for -std=c++98, then that's the condition we should continue
to use for whether to export the compat symbols now).
Make similar changes for the related _GLIBCXX_USE_C99_INTTYPES_TR1 and
_GLIBCXX_USE_C99_INTTYPES_WCHAR_T_TR1 macros, adding new macros for
non-TR1 uses.
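The features these checks look for can be illustrated with a small C program using <stdint.h> types together with the <inttypes.h> format macros and strtoimax. This is an illustrative probe, not the real configure test, which is compiled as C++11.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative probe: exact-width types, PRI* macros and strtoimax. */
int probe_inttypes(char *buf, size_t size)
{
    uint32_t u = UINT32_MAX;
    intmax_t m = strtoimax("-42", NULL, 10);
    return snprintf(buf, size, "%" PRIu32 " %" PRIdMAX, u, m);
}
```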
libstdc++-v3/ChangeLog:
* acinclude.m4 (GLIBCXX_USE_C99): Check for <stdint.h> types in
C++11 mode and define _GLIBCXX_USE_C99_STDINT. Check for
<inttypes.h> features in C++11 mode and define
_GLIBCXX_USE_C99_INTTYPES and _GLIBCXX_USE_C99_INTTYPES_WCHAR_T.
* config.h.in: Regenerate.
* configure: Regenerate.
* doc/doxygen/user.cfg.in (PREDEFINED): Add new macros.
* include/bits/chrono.h: Check _GLIBCXX_USE_C99_STDINT instead
of _GLIBCXX_USE_C99_STDINT_TR1.
* include/c_compatibility/inttypes.h: Check
_GLIBCXX_USE_C99_INTTYPES and _GLIBCXX_USE_C99_INTTYPES_WCHAR_T
instead of _GLIBCXX_USE_C99_INTTYPES_TR1 and
_GLIBCXX_USE_C99_INTTYPES_WCHAR_T_TR1.
* include/c_compatibility/stdatomic.h: Check
_GLIBCXX_USE_C99_STDINT instead of _GLIBCXX_USE_C99_STDINT_TR1.
* include/c_compatibility/stdint.h: Likewise.
* include/c_global/cinttypes: Check _GLIBCXX_USE_C99_INTTYPES
and _GLIBCXX_USE_C99_INTTYPES_WCHAR_T instead of
_GLIBCXX_USE_C99_INTTYPES_TR1 and
_GLIBCXX_USE_C99_INTTYPES_WCHAR_T_TR1.
* include/c_global/cstdint: Check _GLIBCXX_USE_C99_STDINT
instead of _GLIBCXX_USE_C99_STDINT_TR1.
* include/std/atomic: Likewise.
* src/c++11/cow-stdexcept.cc: Likewise.
* testsuite/29_atomics/headers/stdatomic.h/c_compat.cc:
Likewise.
* testsuite/lib/libstdc++.exp (check_v3_target_cstdint):
Likewise.
Jonathan Wakely [Fri, 12 May 2023 11:44:03 +0000 (12:44 +0100)]
libstdc++: Stop using _GLIBCXX_USE_C99_COMPLEX_TR1 in <complex>
The _GLIBCXX_USE_C99_COMPLEX_TR1 macro (and the comments about it in
acinclude.m4 and config.h) are misleading when it is also used for
<complex>, not only <tr1/complex>. It is also wrong, because the
configure checks for TR1 use -std=c++98 and a target might define cacos
etc. for C++11 but not for C++98.
Add a separate configure check for the inverse trigonometric functions
that are covered by _GLIBCXX_USE_C99_COMPLEX_TR1, but using -std=c++11
for the checks. Use the result of that separate check in <complex>.
libstdc++-v3/ChangeLog:
* acinclude.m4 (GLIBCXX_USE_C99): Check for complex inverse trig
functions in C++11 mode and define _GLIBCXX_USE_C99_COMPLEX_ARC.
* config.h.in: Regenerate.
* configure: Regenerate.
* doc/doxygen/user.cfg.in (PREDEFINED): Add new macro.
* include/std/complex: Check _GLIBCXX_USE_C99_COMPLEX_ARC
instead of _GLIBCXX_USE_C99_COMPLEX_TR1.
Jonathan Wakely [Tue, 9 May 2023 08:30:48 +0000 (09:30 +0100)]
libstdc++: Do not use pthread_mutex_clocklock with ThreadSanitizer
As noted in https://github.com/llvm/llvm-project/issues/62623 there are
no tsan interceptors for some of the new POSIX-1:202x APIs added by
https://austingroupbugs.net/view.php?id=1216 so tsan gives false
positive warnings for try_lock_for on timed mutexes.
Disable the uses of the new pthread_mutex_clocklock API when tsan is
active. This changes the semantics of the try_lock_for functions,
because it can change which clock is used for the wait. This means those
functions might be affected by system clock adjustments when tsan is
used, when they would not be affected otherwise.
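The semantic difference can be sketched in C. The helper try_lock_for_ms and the macro USE_PTHREAD_MUTEX_CLOCKLOCK are hypothetical stand-ins for the library code and its configure result. With pthread_mutex_clocklock (a GNU extension in recent glibc versions), the deadline can be measured on CLOCK_MONOTONIC. Without it, the deadline must be on CLOCK_REALTIME, the clock pthread_mutex_timedlock uses, so a system clock adjustment can change how long the call actually waits.

```c
#ifdef USE_PTHREAD_MUTEX_CLOCKLOCK
#  define _GNU_SOURCE               /* pthread_mutex_clocklock is a GNU extension */
#else
#  define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical try_lock_for: wait at most rel_ms milliseconds for *m. */
bool try_lock_for_ms(pthread_mutex_t *m, long rel_ms)
{
#ifdef USE_PTHREAD_MUTEX_CLOCKLOCK
    const clockid_t clk = CLOCK_MONOTONIC;   /* unaffected by clock adjustments */
#else
    const clockid_t clk = CLOCK_REALTIME;    /* the clock timedlock uses */
#endif
    struct timespec deadline;
    clock_gettime(clk, &deadline);
    deadline.tv_sec += rel_ms / 1000;
    deadline.tv_nsec += (rel_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {   /* normalise nanoseconds */
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
#ifdef USE_PTHREAD_MUTEX_CLOCKLOCK
    return pthread_mutex_clocklock(m, clk, &deadline) == 0;
#else
    return pthread_mutex_timedlock(m, &deadline) == 0;
#endif
}
```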
Reviewed-by: Thomas Rodgers <trodgers@redhat.com>
Reviewed-by: Mike Crowe <mac@mcrowe.com>
libstdc++-v3/ChangeLog:
* acinclude.m4 (GLIBCXX_CHECK_PTHREAD_MUTEX_CLOCKLOCK): Define
_GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK in terms of _GLIBCXX_TSAN.
* configure: Regenerate.
Eric Botcazou [Thu, 26 Jan 2023 14:59:37 +0000 (15:59 +0100)]
ada: Use accumulator type in expansion of 'Reduce attribute
The current expansion of the 'Reduce attribute uses the resolution type of
the expression for the accumulator. Now this type can be unresolved or set
to a universal type, for example if it is itself the prefix of the 'Image
attribute, and this may yield a spurious type mismatch error in that case.
This changes the expansion to use the accumulator type instead as defined
by the RM 4.5.10 clause, albeit only in the prefixed case for now.
gcc/ada/
* exp_attr.adb (Expand_N_Attribute_Reference) <Attribute_Reduce>:
Use the canonical accumulator type as the type of the accumulator
in the prefixed case.
Eric Botcazou [Sun, 29 Jan 2023 23:05:42 +0000 (00:05 +0100)]
ada: Fix crash on iterated component in expression function
The problem is that the freeze node generated for the type of a static
subexpression present in the expression function is incorrectly placed
inside instead of outside the function.
gcc/ada/
* freeze.adb (Freeze_Expression): When the freezing is to be done
outside the current scope, skip any scope that is an internal loop.
Eric Botcazou [Fri, 27 Jan 2023 14:13:07 +0000 (15:13 +0100)]
ada: Fix internal error on chain of predicated record types
The preanalysis of a predicate set on one of the record types was causing
premature freezing of another record type.
gcc/ada/
* sem_ch13.adb: Add with and use clauses for Expander.
(Resolve_Aspect_Expressions) <Aspect_Predicate>: Emulate a
bona-fide preanalysis setup before calling
Resolve_Aspect_Expression.
Eric Botcazou [Fri, 27 Jan 2023 23:08:24 +0000 (00:08 +0100)]
ada: Implement inheritance of user-defined literal aspects for untagged types
In Ada 2022, user-defined literal aspects are nonoverridable but the named
subprograms present in them can be overridden, including for untagged types.
gcc/ada/
* sem_res.adb (Has_Applicable_User_Defined_Literal): Apply the
same processing for derived untagged types as for tagged types.
* sem_util.ads (Corresponding_Primitive_Op): Adjust description.
* sem_util.adb (Corresponding_Primitive_Op): Handle untagged
types.
Javier Miranda [Thu, 26 Jan 2023 19:39:31 +0000 (19:39 +0000)]
ada: Spurious error analyzing 'old or 'result in class-wide conditions
gcc/ada/
* sem_attr.adb
(Analyze_Attribute_Old_Result): When preanalyzing a class-wide
condition, search in the scopes stack for the subprogram that has
the condition. This is required because returning the current
scope causes reporting spurious errors when the occurrence of the
attribute is found, for example, in a quantified expression.
Piotr Trojanek [Thu, 26 Jan 2023 14:56:04 +0000 (15:56 +0100)]
ada: Apply range checks to preanalyzed aggregate expressions
When preanalyzing expressions in GNATprove mode, e.g. Pre/Post
contracts, we apply checks, because these expressions will never
be expanded. This didn't happen for aggregate expressions, most
likely because of an oversight.
gcc/ada/
* sem_util.adb (Aggregate_Constraint_Checks): Don't exit early
when preanalyzing in GNATprove mode. Now the condition is
consistent with other similar conditions in other code.
Marc Poulhiès [Thu, 12 Jan 2023 15:13:45 +0000 (16:13 +0100)]
ada: Fix Ada representation of r_debug and link_map types
Both record types need to have their components 'aliased' to match their
C version. The mismatch could be observed when using LTO:
warning: type of 'r_debug' does not match original declaration
[-Wlto-type-mismatch]
/usr/include/link.h:66:23: note: type 'struct r_debug' should match
type 'struct system__traceback__symbolic__module_name__build_...
...cache_for_all_modules__r_debug_type'
gcc/ada/
* libgnat/s-tsmona__linux.adb (link_map, r_debug_type): Add
'aliased' on all components.
ada: Enable Support_Atomic_Primitives on PPC Linux
gcc/ada/
* libgnat/system-linux-ppc.ads: Add Support_Atomic_Primitives.
* libgnat/s-atopri__32.ads: Add 32-bit version of s-atopri.ads.
* Makefile.rtl: Use s-atopri__32.ads for ppc-linux.
Eric Botcazou [Wed, 25 Jan 2023 14:55:34 +0000 (15:55 +0100)]
ada: Follow-up improvement to implementation of storage models
It avoids recreating an actual subtype for an explicit dereference.
gcc/ada/
* sem_util.adb (Get_Actual_Subtype): For an explicit dereference,
return the Actual_Designated_Subtype if it is present.
(Get_Actual_Subtype_If_Available): Likewise.
Arnaud Charlet [Thu, 19 Jan 2023 08:43:47 +0000 (08:43 +0000)]
ada: Add tags on style messages
Similar to tags on warnings [-gnatwx], we add tags on style messages
[-gnatyx] when -gnatw.d is enabled.
gcc/ada/
* errout.ads: Update comment.
* errout.adb (Skip_Msg_Insertion_Warning): Update to take e.g.
-gnatyM into account.
* erroutc.adb (Get_Warning_Option, Get_Warning_Tag)
(Prescan_Message): Add support for Style tags.
* par-ch5.adb, par-ch6.adb, par-ch7.adb, par-endh.adb,
par-util.adb, style.adb, styleg.adb: Set tag on all style
messages.
Eric Botcazou [Mon, 23 Jan 2023 12:06:26 +0000 (13:06 +0100)]
ada: Adjust semantics and implementation of storage models
This makes the following adjustments to the semantics and implementation of
storage models in the compiler:
1. By-copy semantics in subprogram calls: when an object accessed with a
nonnative storage model is passed as an actual parameter in a call to
a subprogram, an intermediate copy made on the host is passed instead.
2. More generally, any additional temporary required on the host by the
semantics of nonnative storage models is now created by the front-end
instead of the code generator.
3. All the temporaries created on the host for nonnative storage models
are allocated on the secondary stack instead of the primary stack.
As a result, this should simplify the implementation in code generators.
gcc/ada/
* exp_aggr.adb (Build_Assignment_With_Temporary): Adjust comment
and fix type of second parameter. Create the temporary on the
secondary stack by calling Build_Temporary_On_Secondary_Stack.
(Convert_Array_Aggr_In_Allocator): Adjust formatting.
(Expand_Array_Aggregate): Likewise.
* exp_ch4.adb (Expand_N_Allocator): Set Actual_Designated_Subtype
on the dereference in the initialization for all composite types.
* exp_ch5.adb (Expand_N_Assignment_Statement): Create a temporary
on the host for an assignment between nonnative storage models.
Suppress more checks when Suppress_Assignment_Checks is set.
* exp_ch6.adb (Add_Simple_Call_By_Copy_Code): Deal with actuals
that are dereferences with an Actual_Designated_Subtype. Add
support for nonnative storage models.
(Expand_Actuals): Create a copy if the actual is a dereference
with a nonnative storage model.
* exp_util.ads (Build_Temporary_On_Secondary_Stack): Declare.
* exp_util.adb (Build_Temporary_On_Secondary_Stack): New function.
* sem_ch5.adb (Analyze_Assignment.Set_Assignment_Type): Do not
build an actual subtype for dereferences with an
Actual_Designated_Subtype.
* sinfo.ads (Actual_Designated_Subtype): Adjust documentation.
(Suppress_Assignment_Checks): Likewise.
Piotr Trojanek [Thu, 19 Jan 2023 23:52:49 +0000 (00:52 +0100)]
ada: Build invariant procedure while freezing in GNATprove mode
Invariant procedure bodies are created either by expansion of freezing
nodes (but only in ordinary compilation mode) or at the end of package
private declarations (but not when there are private types in the type
derivation chain).
In GNATprove mode we didn't create invariant procedure bodies in
lightweight expansion, so we didn't create them at all when there were
private types in the type derivation chain.
This patch copies the relevant freezing part from ordinary to
lightweight expansion. This obviously involves code duplication,
but it seems better to duplicate whole sections that work properly
instead of small pieces that are incomplete. There are other pieces
of freezing that are similarly duplicated, so this patch doesn't make
the code substantially worse.
gcc/ada/
* exp_spark.adb (SPARK_Freeze_Type): Copy whole handling of DIC
and Type_Invariant from Freeze_Type.
Eric Botcazou [Fri, 20 Jan 2023 11:48:16 +0000 (12:48 +0100)]
ada: Document examples of No_Dependence restriction for code generation
gcc/ada/
* doc/gnat_rm/standard_and_implementation_defined_restrictions.rst
(No_Dependence): Give examples of new No_Dependence restrictions.
* gnat_rm.texi: Regenerate.
Eric Botcazou [Wed, 18 Jan 2023 19:52:03 +0000 (20:52 +0100)]
ada: Introduce Cannot_Be_Superflat flag on N_Range nodes
The support of superflat arrays in the language generates an overhead that
the code generator attempts to minimize, but it cannot handle overly complex
cases, and it would be helpful if the front-end could lend a hand.
This change introduces the Cannot_Be_Superflat flag on N_Range nodes for
this purpose, and sets it on the result of string concatenations when it
is guaranteed to be nonnull.
gcc/ada/
* gen_il-fields.ads (Opt_Field_Enum): Add Cannot_Be_Superflat.
* gen_il-gen-gen_nodes.adb (N_Range): Add Cannot_Be_Superflat as
semantical flag and change Includes_Infinities to semantical.
* sinfo.ads (Cannot_Be_Superflat): Document it for N_Range.
* exp_ch4.adb (Expand_Concatenate): Set Cannot_Be_Superflat on the
range of the result if the result cannot be null.
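Assuming "superflat" means a null range whose upper bound is below the lower bound minus one (for example 1 .. -5), the overhead mentioned above is the clamp needed when computing a length from arbitrary bounds. The hypothetical C sketch below (not GNAT or gigi code; `range_length` is an invented name) shows how a guarantee in the spirit of Cannot_Be_Superflat lets the clamp be dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Length of an Ada-style index range lo .. hi. A range with hi < lo is
   null (length 0); a superflat one has hi < lo - 1, so hi - lo + 1 would
   be negative without a clamp. */
static int64_t range_length(int64_t lo, int64_t hi, bool cannot_be_superflat)
{
    if (cannot_be_superflat) {
        /* Guarantee: hi >= lo - 1, so hi - lo + 1 is already >= 0. */
        assert(hi >= lo - 1);
        return hi - lo + 1;
    }
    /* General case: must clamp superflat ranges such as 1 .. -5 to 0. */
    return hi >= lo ? hi - lo + 1 : 0;
}
```

With the flag set, the length is a single subtraction and addition; without it, the code generator has to keep the comparison.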
Richard Kenner [Wed, 18 Jan 2023 22:45:15 +0000 (17:45 -0500)]
ada: Change Present_Expr field type to Uint
We want the field to be initialized to No_Uint because we want to be
able to test in GNAT LLVM whether we've already set it so we can be
sure we only set it once.
gcc/ada/
* gen_il-gen-gen_nodes.adb (Present_Expr): Type is now Uint.
Yannick Moy [Wed, 18 Jan 2023 08:40:40 +0000 (08:40 +0000)]
ada: Dramatically simplify ghost code for proof of System.Arith_Double
Using Inline_For_Proof annotation on key expression functions makes
it possible to remove hundreds of lines of ghost code that were
previously needed to guide provers.
gcc/ada/
* libgnat/s-aridou.adb (Big3, Is_Mult_Decomposition)
(Is_Scaled_Mult_Decomposition): Add annotation for inlining.
(Double_Divide, Scaled_Divide): Simplify and remove ghost code.
(Prove_Multiplication): Add calls to lemmas to make proof go
through.
* libgnat/s-aridou.ads (Big, In_Double_Int_Range): Add annotation
for inlining.
Yannick Moy [Tue, 17 Jan 2023 08:06:54 +0000 (08:06 +0000)]
ada: Restore proof of System.Arith_Double
Use Assert_And_Cut to simplify proof of second part of the Scaled_Divide.
Add intermediate assertions and simplify where necessary.
gcc/ada/
* libgnat/s-aridou.adb:
(Big3): Remove override made useless.
(Lemma_Quot_Rem): Add new lemma and justify it, as no prover
manages to prove it.
(Lemma_Div_Pow2): Use new lemma Lemma_Quot_Rem.
(Prove_Scaled_Mult_Decomposition_Regroup3): Retype for
simplification.
(Scaled_Divide): Remove useless assertions. Decompose some
assertions with cut operations. Use Assert_And_Cut for second
half. Add assertions.
Vectorize memset with a constant length of less than or equal to 64
bytes.
Do not perform a libc function call into memset in case the size is not
a compile-time constant but bounded and the upper bound is less than or
equal to 256 bytes.
gcc/ChangeLog:
* config/s390/s390-protos.h (s390_expand_setmem): Change
function signature.
* config/s390/s390.cc (s390_expand_setmem): For memsets of at
most 256 bytes, do not perform a libc call.
* config/s390/s390.md: Change expander into a version which
takes 8 operands.
gcc/testsuite/ChangeLog:
* gcc.target/s390/memset-1.c: Test case memset1 now makes use
of vst.
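The two memset shapes this change targets can be illustrated with a small C sketch. The function names are hypothetical, and whether GCC actually emits vector stores or inline code instead of a libc call depends on the GCC version, optimization level and `-march` setting on s390. The reliable check is to inspect the generated assembly (for example with `-S`).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Constant length of at most 64 bytes: a candidate for inline vector
   stores instead of a call to memset. */
void clear_record(unsigned char *rec)
{
    memset(rec, 0, 48);
}

/* Non-constant but bounded length: n & 0xff is at most 255, so the size
   is known to be <= 256 bytes and no libc call should be needed. */
void fill_bounded(unsigned char *buf, int c, size_t n)
{
    memset(buf, c, n & 0xff);
}
```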
Do not perform a libc function call into memcpy in case the size is not
a compile-time constant but bounded and the upper bound is less than or
equal to 256 bytes.
gcc/ChangeLog:
* config/s390/s390-protos.h (s390_expand_cpymem): Change
function signature.
* config/s390/s390.cc (s390_expand_cpymem): For memcpys of at
most 256 bytes, do not perform a libc call.
(s390_expand_insv): Adapt new function signature of
s390_expand_cpymem.
* config/s390/s390.md: Change expander into a version which
takes 8 operands.
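A source-level memcpy with a bounded but non-constant size might look like the hypothetical sketch below. Here the clamp gives the compiler an upper bound of 256 on `n`. This assumes the bound is propagated to the expander through value-range information; the actual code generated depends on the GCC version and options.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* n is not a compile-time constant, but after the clamp it is known to
   be at most 256, which is the case the s390 expander now handles
   without calling memcpy. */
void copy_bounded(unsigned char *dst, const unsigned char *src, size_t n)
{
    if (n > 256)
        n = 256;
    memcpy(dst, src, n);
}
```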
Paul Thomas [Tue, 16 May 2023 05:35:40 +0000 (06:35 +0100)]
Fortran: Fix an assortment of bugs
2023-05-16 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/105152
* interface.cc (gfc_compare_actual_formal): Emit an error if an
unlimited polymorphic actual is not matched either to an
unlimited or assumed type formal argument.
PR fortran/100193
* resolve.cc (resolve_ordinary_assign): Emit an error if the
var expression of an ordinary assignment is a proc pointer
component.
PR fortran/87496
* trans-array.cc (gfc_walk_array_ref): Provide assumed shape
arrays coming from interface mapping with a viable arrayspec.
PR fortran/103389
* trans-expr.cc (gfc_conv_intrinsic_to_class): Tidy up flagging
of unlimited polymorphic 'class_ts'.
(gfc_conv_gfc_desc_to_cfi_desc): Assumed type is unlimited
polymorphic and should accept any actual type.
PR fortran/104429
(gfc_conv_procedure_call): Replace dreadful kludge with a call
to gfc_finalize_tree_expr. Avoid dereferencing a void pointer
by giving it the pointer type of the actual argument.
PR fortran/82774
(alloc_scalar_allocatable_subcomponent): Shorten the function
name and replace the symbol argument with the se string length.
If a deferred length character length is either not present or
is not a variable, give the typespec a variable and assign the
string length to that. Use gfc_deferred_strlen to find the
hidden string length component.
(gfc_trans_subcomponent_assign): Convert the expression before
the call to alloc_scalar_allocatable_subcomponent so that a
good string length is provided.
(gfc_trans_structure_assign): Remove the unneeded derived type
symbol from calls to gfc_trans_subcomponent_assign.
gcc/testsuite/
PR fortran/105152
* gfortran.dg/pr105152.f90: New test.
PR fortran/100193
* gfortran.dg/pr100193.f90: New test.
PR fortran/87946
* gfortran.dg/pr87946.f90: New test.
PR fortran/103389
* gfortran.dg/pr103389.f90: New test.
PR fortran/104429
* gfortran.dg/pr104429.f90: New test.
PR fortran/82774
* gfortran.dg/pr82774.f90: New test.
Skip -fdelete-null-pointer-checks tests if target keeps_null_pointer_checks
A bunch of tests explicitly pass in -fdelete-null-pointer-checks and
fail if the target keeps null pointer checks. Skip such tests by
adding a dg-skip-if for keeps_null_pointer_checks.
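The skip uses the DejaGnu `dg-skip-if "comment" { selector }` directive with the `keeps_null_pointer_checks` effective target. A hypothetical test file of the affected kind (not one of the tests touched by the commit) could look like the sketch below. Once `*p` has been read, `-fdelete-null-pointer-checks` lets GCC assume `p` is non-null and drop the later check. On targets where `keeps_null_pointer_checks` holds, that deletion does not happen, so the test is skipped there.

```c
/* Hypothetical test file, for illustration only. */
/* { dg-do compile } */
/* { dg-options "-O2 -fdelete-null-pointer-checks" } */
/* { dg-skip-if "" { keeps_null_pointer_checks } } */

#include <assert.h>

/* The dereference comes first, so the p == 0 test is redundant
   once null pointer checks may be deleted. */
int read_then_check(const int *p)
{
    int v = *p;
    if (p == 0)
        return -1;
    return v;
}
```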
Andrew Pinski [Mon, 15 May 2023 21:44:27 +0000 (21:44 +0000)]
MATCH: [PR109424] Simplify min/max of boolean arguments
This is version 2 of https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577394.html
which does not depend on adding gimple_truth_valued_p at this point.
Instead, it uses zero_one_valued_p, which is already used for mult simplifications,
to make sure that we only have [0,1] rather than mistakenly having [-1,0]
as the range for signed bools.
This shows up in a few places in GCC itself, but only at -O1; we miss the min/max
conversion because of PR 107888 (which I will be testing separately).
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Thanks,
Andrew Pinski
PR tree-optimization/109424
gcc/ChangeLog:
* match.pd: Add patterns for min/max of zero_one_valued
values to `&`/`|`.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/bool-12.c: New test.
* gcc.dg/tree-ssa/bool-13.c: New test.
* gcc.dg/tree-ssa/minmax-20.c: New test.
* gcc.dg/tree-ssa/minmax-21.c: New test.
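The identities behind these patterns can be checked exhaustively. For operands known to be 0 or 1, MIN equals bitwise AND and MAX equals bitwise OR. For a signed one-bit bool with values {-1, 0} both identities break, which is why the patterns are restricted to zero_one_valued_p operands. The C below is a hypothetical model of that reasoning, not the match.pd patterns themselves.

```c
#include <assert.h>
#include <stdbool.h>

static int min_i(int a, int b) { return a < b ? a : b; }
static int max_i(int a, int b) { return a > b ? a : b; }

/* Check min(a,b) == (a & b) and max(a,b) == (a | b) for all a, b in {0,1}. */
static bool identities_hold_for_zero_one(void)
{
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            if (min_i(a, b) != (a & b) || max_i(a, b) != (a | b))
                return false;
    return true;
}
```

With {-1, 0}: min(-1, 0) is -1 but -1 & 0 is 0, and max(-1, 0) is 0 but -1 | 0 is -1.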
Joseph Myers [Mon, 15 May 2023 23:17:48 +0000 (23:17 +0000)]
c: Ignore _Atomic on function return type for C2x
For C2x it was decided that _Atomic would be completely ignored on
function return types (just as was done for qualifiers in C11 DR#423),
to eliminate the potential for an rvalue returned by a function having
_Atomic-qualified type when an rvalue resulting from lvalue-to-rvalue
conversion could not have such a type. Implement this for GCC.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
gcc/c/
* c-decl.cc (grokdeclarator): Ignore _Atomic on function return
type for C2x.
gcc/testsuite/
* gcc.dg/qual-return-9.c, gcc.dg/qual-return-10.c: New tests.
Joseph Myers [Mon, 15 May 2023 21:27:33 +0000 (21:27 +0000)]
c: Update __has_c_attribute values for C2x
WG14 decided that __has_c_attribute should return the same value
(equal to the intended __STDC_VERSION__ value) for all standard
attributes in C2x, with values associated with when an attribute was
added to the working draft (or had semantics added or changed in the
working draft) only being used in earlier stages of development of
that draft. The intent is that the values for existing attributes
increase in future standard versions only if there are new features /
semantic changes for those attributes. Implement this change for GCC.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
gcc/c-family/
* c-lex.cc (c_common_has_attribute): Use 202311 as
__has_c_attribute return for all C2x attributes.
gcc/testsuite/
* gcc.dg/c2x-has-c-attribute-2.c: Expect 202311L return value from
__has_c_attribute for all C2x attributes.
Harald Anlauf [Sun, 14 May 2023 19:53:51 +0000 (21:53 +0200)]
Fortran: CLASS pointer function result in variable definition context [PR109846]
gcc/fortran/ChangeLog:
PR fortran/109846
* expr.cc (gfc_check_vardef_context): Check appropriate pointer
attribute for CLASS vs. non-CLASS function result in variable
definition context.
gcc/testsuite/ChangeLog:
PR fortran/109846
* gfortran.dg/ptr-func-5.f90: New test.
Aldy Hernandez [Mon, 15 May 2023 10:25:58 +0000 (12:25 +0200)]
Add auto-resizing capability to irange's [PR109695]
<tldr>
We can now have int_range<N, RESIZABLE=false> for automatically
resizable ranges. int_range_max is now int_range<3, true>
for a 69X reduction in size from current trunk, and 6.9X reduction from
GCC12. This incurs a 5% performance penalty for VRP that is more than
covered by our > 13% improvements recently.
</tldr>
int_range_max is the temporary range object we use in the ranger for
integers. With the conversion to wide_int, this structure bloated up
significantly because wide_ints are huge (80 bytes a piece) and are
about 10 times as big as a plain tree. Since the temporary object
requires 255 sub-ranges, that's 255 * 80 * 2, plus the control word.
This means the structure grew from 4112 bytes to 40912 bytes.
This patch adds the ability to resize ranges as needed, defaulting to
no resizing, while int_range_max now defaults to 3 sub-ranges (instead
of 255) and grows to 255 when the range being calculated does not fit.
For example:
int_range<1> foo; // 1 sub-range with no resizing.
int_range<5> foo; // 5 sub-ranges with no resizing.
int_range<5, true> foo; // 5 sub-ranges with resizing.
I ran some tests and found that 3 sub-ranges cover 99% of cases, so
I've set the int_range_max default to that:
We don't bother growing incrementally, since the default covers most
cases and we have a 255 hard-limit. This hard limit could be reduced
to 128, since my tests never saw a range needing more than 124, but we
could do that as a follow-up if needed.
With 3-subranges, int_range_max is now 592 bytes versus 40912 for
trunk, and versus 4112 bytes for GCC12! The penalty is 5.04% for VRP
and 3.02% for threading, with no noticeable change in overall
compilation (0.27%). This is more than covered by our 13.26%
improvements for the legacy removal + wide_int conversion.
I think this approach is a good alternative, while providing us with
flexibility going forward. For example, we could try defaulting to
8 sub-ranges for a noticeable improvement in VRP. We could also use
ranges with more sub-ranges for switch analysis to avoid resizing.
Another approach I tried was always resizing. With this, we could
drop the whole int_range<N> nonsense, and have irange just hold a
resizable range. This simplified things, but incurred a 7% penalty on
ipa_cp. This was hard to pinpoint, and I'm not entirely convinced
this wasn't some artifact of valgrind. However, until we're sure,
let's avoid massive changes, especially since IPA changes are coming
up.
For the curious, a particular hot spot for IPA in this area was:
The problem isn't the resizing (since we do that at most once) but the
fact that for some functions with lots of callers we end up with a huge
range that gets copied and compared for every meet operation. Maybe
the IPA algorithm could be adjusted somehow?
Anywhooo... for now there is nothing to worry about, since value_range
still has 2 subranges and is not resizable. But we should probably
think what if anything we want to do here, as I envision IPA using
infinite ranges here (well, int_range_max) and handling frange's, etc.
gcc/ChangeLog:
PR tree-optimization/109695
* value-range.cc (irange::operator=): Resize range.
(irange::union_): Same.
(irange::intersect): Same.
(irange::invert): Same.
(int_range_max): Default to 3 sub-ranges and resize as needed.
* value-range.h (irange::maybe_resize): New.
(~int_range): New.
(int_range::int_range): Adjust for resizing.
(int_range::operator=): Same.
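The resizing scheme described above can be sketched in plain C. It uses inline storage for a small default number of sub-ranges, jumps once to the 255 hard limit when a resizable range overflows, and does no incremental growth. This is a hypothetical illustration, not GCC's irange code. `small_range` and the `sr_*` functions are invented names, bounds are plain `long` instead of `wide_int`, and a full non-resizable range simply reports failure where the real class handles overflow in its own way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_SUBRANGES 3   /* inline capacity, like int_range_max's default */
#define HARD_LIMIT 255        /* maximum number of sub-ranges */

typedef struct { long lo, hi; } pair;

typedef struct {
    unsigned count;                      /* sub-ranges in use */
    unsigned capacity;                   /* current capacity */
    bool resizable;
    pair *pairs;                         /* inline_buf or heap storage */
    pair inline_buf[DEFAULT_SUBRANGES];
} small_range;                           /* not safe to copy by value */

static void sr_init(small_range *r, bool resizable)
{
    r->count = 0;
    r->capacity = DEFAULT_SUBRANGES;
    r->resizable = resizable;
    r->pairs = r->inline_buf;
}

/* Append [lo, hi]; returns false if it does not fit and cannot grow. */
static bool sr_push(small_range *r, long lo, long hi)
{
    if (r->count == r->capacity) {
        if (!r->resizable || r->capacity == HARD_LIMIT)
            return false;
        /* Grow straight to the hard limit: at most one heap allocation. */
        pair *heap = malloc(HARD_LIMIT * sizeof *heap);
        if (!heap)
            return false;
        memcpy(heap, r->pairs, r->count * sizeof *heap);
        r->pairs = heap;
        r->capacity = HARD_LIMIT;
    }
    r->pairs[r->count++] = (pair){ lo, hi };
    return true;
}

/* Free heap storage, if any, and reset to the empty inline state. */
static void sr_release(small_range *r)
{
    if (r->pairs != r->inline_buf)
        free(r->pairs);
    sr_init(r, r->resizable);
}
```

Most ranges never leave the inline buffer, so the common case stays small and allocation-free.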
Aldy Hernandez [Mon, 15 May 2023 13:10:11 +0000 (15:10 +0200)]
Only return changed=true in union_nonzero when appropriate.
irange::union_ was being overly pessimistic in its return value. It
was returning true (changed) even when the nonzero mask was possibly
the same. The reason is that the nonzero mask is not entirely kept
up to date. We avoid setting it up when a new range is set (from a
set, intersect, union, etc.), because calculating a mask from a range
is measurably expensive. However, irange::get_nonzero_bits() will
always return the correct mask, because it calculates the nonzero
mask inherent in the range on the fly and combines it bitwise with the
saved mask. This was an optimization, because in the last release it was
a big penalty to keep the mask up to date. This may not necessarily be
the case with the conversion to wide_ints. We should investigate.
Just to be clear, the result from get_nonzero_bits() is always correct
as seen by the user, but the wide_int in the irange does not contain
all the information, since part of the nonzero bits can be determined
by the range itself, on the fly.
The fix here is to return true only when the result the user sees
(callers of get_nonzero_bits()) actually changed when unioning bits.
This allows ipcp_vr_lattice::meet_with_1 to avoid unnecessary copies
when determining if a range changed.
This patch yields a 6.89% improvement to the ipa_cp pass. I'm
including the IPA changes in this patch, as it's a testcase of sorts for
the change.
gcc/ChangeLog:
* ipa-cp.cc (ipcp_vr_lattice::meet_with_1): Avoid unnecessary
range copying.
* value-range.cc (irange::union_nonzero_bits): Return TRUE only
when range changed.
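A toy model with 8-bit unsigned values illustrates why comparing the user-visible mask, rather than the stored one, avoids reporting spurious changes. All names are hypothetical. The sketch assumes the range-derived mask is combined with the saved mask by bitwise AND, since each is a conservative bound on the bits that may be set, which is our understanding of irange::get_nonzero_bits().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t lo, hi;        /* value range [lo, hi] */
    uint8_t saved_mask;    /* possibly stale: not updated on every change */
} toy_range;

/* Bits that may be nonzero given only the bounds: lo plus every bit at
   or below the highest bit where lo and hi differ. */
static uint8_t mask_from_range(uint8_t lo, uint8_t hi)
{
    uint8_t diff = (uint8_t)(lo ^ hi);
    uint8_t spread = 0;
    while (spread < diff)                  /* smear diff's top bit down */
        spread = (uint8_t)((spread << 1) | 1);
    return (uint8_t)(lo | spread);
}

/* What callers see: the stored mask refined by the range itself. */
static uint8_t get_nonzero_bits(const toy_range *r)
{
    return (uint8_t)(r->saved_mask & mask_from_range(r->lo, r->hi));
}

/* Union another mask in; report a change only if the visible mask moved. */
static bool union_nonzero_bits(toy_range *r, uint8_t other_mask)
{
    uint8_t before = get_nonzero_bits(r);
    r->saved_mask |= other_mask;
    return get_nonzero_bits(r) != before;
}
```

For [0, 3] with a saved mask of 0x01, unioning 0xF0 changes the stored mask to 0xF1, but the visible mask stays 0x01, so no change is reported.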
Juzhe-Zhong [Mon, 15 May 2023 14:23:45 +0000 (22:23 +0800)]
RISC-V: Add rounding mode operand for fixed-point patterns
Since we are going to have fixed-point intrinsics that model the
rounding mode
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222
we should have an operand to specify the rounding mode in fixed-point instructions.
We don't support these rounding-mode intrinsics yet, but we will
definitely support them later.
This is a preparatory patch for the upcoming intrinsics.
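The RVV specification defines four fixed-point rounding modes through the `vxrm` CSR: 0 = rnu (round-to-nearest-up), 1 = rne (round-to-nearest-even), 2 = rdn (round-down) and 3 = rod (round-to-odd). A scalar C model of the specification's rounding step, applied to an unsigned averaging add in the style of `vaaddu`, shows why the result depends on that operand. Function and enum names are hypothetical and are not the intrinsics' API.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding modes, numbered as in the vxrm CSR. */
enum vxrm { VXRM_RNU = 0, VXRM_RNE = 1, VXRM_RDN = 2, VXRM_ROD = 3 };

/* Shift v right by d bits (1 <= d <= 63) and add the rounding
   increment r selected by the mode. */
static uint64_t roundoff(uint64_t v, unsigned d, enum vxrm mode)
{
    assert(d >= 1 && d <= 63);
    uint64_t half  = (v >> (d - 1)) & 1u;                   /* v[d-1]   */
    uint64_t below = v & ((UINT64_C(1) << (d - 1)) - 1u);   /* v[d-2:0] */
    uint64_t lsb   = (v >> d) & 1u;                         /* v[d]     */
    uint64_t rest  = v & ((UINT64_C(1) << d) - 1u);         /* v[d-1:0] */
    uint64_t r = 0;
    switch (mode) {
    case VXRM_RNU: r = half; break;
    case VXRM_RNE: r = half & ((below != 0) | lsb); break;
    case VXRM_RDN: r = 0; break;
    case VXRM_ROD: r = !lsb & (rest != 0); break;
    }
    return (v >> d) + r;
}

/* (a + b) / 2 with the given rounding, computed without overflow. */
static uint32_t avg_add_u32(uint32_t a, uint32_t b, enum vxrm mode)
{
    return (uint32_t)roundoff((uint64_t)a + b, 1, mode);
}
```

For example, averaging 2 and 3 (2.5) gives 3 under rnu and rod but 2 under rne and rdn. The same instruction therefore needs the mode as an explicit input.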
Pan Li [Mon, 15 May 2023 14:05:44 +0000 (22:05 +0800)]
OPTABS: Extend the number of expanding instructions pattern
We (RVV) are going to add a rounding mode operand to floating-point
instructions, giving patterns with 11 operands.
This is because we are going to have intrinsics that add a rounding mode argument:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226
This is the patch that adds the rounding mode operand in the RISC-V port:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618573.html
You can see there are 11 operands in these patterns.
gcc/ChangeLog:
* optabs.cc (maybe_gen_insn): Add case to generate instruction
that has 11 operands.
Kyrylo Tkachov [Mon, 15 May 2023 11:05:35 +0000 (12:05 +0100)]
aarch64: Cost vector comparisons more accurately
We are missing cases for combining FACGE/FACGT instructions. For the testcase in this patch, we generate:
foo:
fabs v3.4s, v0.4s
fabs v0.4s, v1.4s
fabs v1.4s, v2.4s
fcmgt v0.4s, v3.4s, v0.4s
fcmgt v1.4s, v3.4s, v1.4s
b g
This is because combine is rejecting the pattern due to costs:
Successfully matched this instruction:
(set (reg:V4SI 106)
(neg:V4SI (lt:V4SI (abs:V4SF (reg:V4SF 113))
(abs:V4SF (reg:V4SF 111)))))
rejecting combination of insns 8, 9 and 10
original costs 8 + 8 + 12 = 28
replacement costs 8 + 28 = 36
It is obviously recursing in the various arms of the RTX and such.
This patch teaches the aarch64 rtx costs routine that our vector comparisons are represented as a NEG of
compare operators, with the FACGE/FACGT operations in particular having ABS on each arm. With this patch we get
the much more reasonable dump:
original costs 8 + 8 + 8 = 24
replacement costs 8 + 8 = 16
and generate the optimal assembly:
foo:
mov v31.16b, v0.16b
facgt v0.4s, v0.4s, v1.4s
facgt v1.4s, v31.4s, v2.4s
b g
Bootstrapped and tested on aarch64-none-linux-gnu.
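The RTL that combine matched, `(neg (lt (abs b) (abs a)))`, is a lane-wise "absolute greater than" producing an all-ones or all-zeros mask. That is what a single FACGT computes. The scalar C model below is a hypothetical illustration of those semantics for four lanes (the function names are invented). It is not the aarch64 cost code.

```c
#include <assert.h>
#include <stdint.h>

/* |x| for one lane, without relying on <math.h>. */
static float abs_f(float x) { return x < 0.0f ? -x : x; }

/* Lane-wise model of FACGT: -1 (all ones) when |a| > |b|, else 0.
   The comparison yields 1 or 0, and the negation turns 1 into the
   all-ones lane mask, mirroring (neg (lt (abs b) (abs a))). */
static void facgt_model(const float a[4], const float b[4], int32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = -(int32_t)(abs_f(b[i]) < abs_f(a[i]));
}
```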
Thomas Schwinge [Tue, 25 Apr 2023 21:53:12 +0000 (23:53 +0200)]
Support parallel testing in libgomp, part II [PR66005]
..., and enable if 'flock' is available for serializing execution testing.
Regarding the default of 19 parallel slots, this turned out to be a local
minimum for wall time when testing this on:
$ uname -srvi
Linux 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016 x86_64
$ grep '^model name' < /proc/cpuinfo | uniq -c
32 model name : Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
... in two configurations: case (a) standard configuration, no offloading
configured, case (b) offloading for GCN and nvptx configured but no devices
available. For both cases, default plus '-m32' variant.
$ \time make check-target-libgomp RUNTESTFLAGS="--target_board=unix\{,-m32\}"
$ uname -srvi
Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64
$ grep '^model name' < /proc/cpuinfo | uniq -c
12 model name : Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
$ nvidia-smi -L
GPU 0: Quadro P1000 (UUID: GPU-e043973b-b52a-d02b-c066-a8fdbf64e8ea)
... in two configurations: case (c) standard configuration, no offloading
configured, case (d) offloading for nvptx configured and device available.
For both cases, only default variant, no '-m32'.
$ \time make check-target-libgomp
Case (c), baseline; roughly half of case (a) (just one variant):
Worth noting is that with nvptx offloading, there is one execution test case
that times out ('libgomp.fortran/reverse-offload-5.f90'). This effectively
stalls progress for almost 5 min: other execution test cases quickly queue up
on the lock for all parallel slots. That's working as expected; just noting
this as it accordingly does skew the wall time numbers.
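The serialization idea is that compilation runs in parallel slots while execution tests queue up on one exclusive lock. It can be sketched with the flock(2) system call. This is an illustration only: the commit conditions the feature on 'flock' being available and drives it from the testsuite, not from C. `run_serialized`, `fake_execution_test` and the lock path are hypothetical, and the sketch assumes a Linux/BSD system.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Run fn(arg) while holding an exclusive flock() on lock_path, so that
   callers sharing the same lock file run fn one at a time.
   Returns fn's result, or -1 if the lock could not be taken. */
static int run_serialized(const char *lock_path, int (*fn)(void *), void *arg)
{
    int fd = open(lock_path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) != 0) {   /* blocks until the lock is free */
        close(fd);
        return -1;
    }
    int result = fn(arg);
    flock(fd, LOCK_UN);
    close(fd);
    return result;
}

/* Stand-in for "execute one test binary": counts its invocations. */
static int fake_execution_test(void *arg)
{
    int *runs = arg;
    ++*runs;
    return 0;
}
```

A test that hangs while holding the lock blocks every other slot waiting on it, which matches the stall described above.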
Thomas Schwinge [Wed, 10 May 2023 13:01:55 +0000 (15:01 +0200)]
libgomp testsuite: As appropriate, use the 'gcc', 'g++', 'gfortran' driver [PR91884]
..., that is, 'GCC_UNDER_TEST', 'GXX_UNDER_TEST', 'GFORTRAN_UNDER_TEST' instead
of 'GCC_UNDER_TEST' for all of them. No need anymore for 'gcc -lstdc++ -x c++'
for C++ code, or 'gcc -lgfortran' plus conditional '-lquadmath' for Fortran
code. (Getting rid of explicit '-foffload=-lgfortran' is for another day.)
Sören Tempel [Sun, 14 May 2023 17:30:21 +0000 (19:30 +0200)]
fix assert in __deregister_frame_info_bases
The assertion in __deregister_frame_info_bases assumes that for every
frame something was inserted into the lookup data structure by
__register_frame_info_bases. Unfortunately, this does not necessarily
hold true as the btree_insert call in __register_frame_info_bases will
not insert anything for empty ranges. Therefore, we need to explicitly
account for such empty ranges in the assertion as `ob` will be a null
pointer for such ranges, hence causing the assertion to fail.
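The shape of the fix can be shown with a toy registry. Registering an empty range inserts nothing, so deregistering it finds nothing, and the assertion must accept that case. The sketch below is a hypothetical model (a flat array instead of libgcc's btree, and `begin == end` as the emptiness test). It is not the unwinder's code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBJECTS 8

struct object { const char *begin, *end; };

static struct object registry[MAX_OBJECTS];
static size_t n_registered;

/* Like btree_insert in the description: empty ranges are not inserted. */
static void register_frame(const char *begin, const char *end)
{
    if (begin == end)
        return;
    assert(n_registered < MAX_OBJECTS);
    registry[n_registered++] = (struct object){ begin, end };
}

/* Returns 1 if an object was found and removed, 0 otherwise. */
static int deregister_frame(const char *begin, const char *end)
{
    int found = 0;
    for (size_t i = 0; i < n_registered; i++) {
        if (registry[i].begin == begin) {
            registry[i] = registry[--n_registered];
            found = 1;
            break;
        }
    }
    /* A plain assert(found) would fire for empty ranges; allow them. */
    assert(found || begin == end);
    return found;
}
```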