From what I can see, this has been voted in as a DR and as it means
we warn less often than before in -std={gnu,c}++2{0,3} modes or with
-Wvolatile, I wonder if it shouldn't be backported to affected release
branches as well.
2022-08-16 Jakub Jelinek <jakub@redhat.com>
* typeck.c (cp_build_modify_expr): Implement
P2327R1 - De-deprecating volatile compound operations. Don't warn
for |=, &= or ^= with volatile lhs.
* expr.c (mark_use) <case MODIFY_EXPR>: Adjust warning wording,
leave out simple.
* g++.dg/cpp2a/volatile1.C: Adjust for de-deprecation of volatile
compound |=, &= and ^= operations.
* g++.dg/cpp2a/volatile3.C: Likewise.
* g++.dg/cpp2a/volatile5.C: Likewise.
Jakub Jelinek [Wed, 27 Jul 2022 10:06:22 +0000 (12:06 +0200)]
cgraphunit: Don't emit asm thunks for -dx [PR106261]
When -dx option is used (didn't know we have it and no idea what is it
useful for), we just expand functions to RTL and then omit all further
RTL passes, so the normal functions aren't actually emitted into assembly,
just variables.
The following testcase ICEs, because we don't emit the methods, but do
emit thunks pointing to that and those thunks have unwind info and rely on
at least some real functions to be emitted (which is normally the case,
thunks are only emitted for locally defined functions) because otherwise
there are no CIEs, only FDEs and dwarf2out is upset about it.
The following patch fixes that by not emitting assembly thunks for -dx
either.
Jakub Jelinek [Fri, 1 Jul 2022 09:17:41 +0000 (11:17 +0200)]
wide-int: Fix up wi::shifted_mask [PR106144]
As the following self-test testcase shows, wi::shifted_mask sometimes
doesn't create canonicalized wide_ints, which then fail to compare equal
to canonicalized wide_ints with the same value.
In particular, wi::mask (128, false, 128) gives { -1 } with len 1 and prec 128,
while wi::shifted_mask (0, 128, false, 128) gives { -1, -1 } with len 2
and prec 128.
The problem is that the code is written with the assumption that there are
3 bit blocks (or 2 if start is 0), but doesn't consider the possibility
where there are 2 bit blocks (or 1 if start is 0) where the highest block
isn't present. In that case, there is the optional block of negate ? 0 : -1
elts, followed by just one elt (either one from the if (shift) or just
negate ? -1 : 0) and the rest is implicit sign-extension.
Only if end < prec there is 1 or more bits above it that have different bit
value and so we need to emit all the elts till end and then one more elt.
if (end == prec) would work too, because we have:
if (width > prec - start)
width = prec - start;
unsigned int end = start + width;
so end is guaranteed to be end <= prec, dunno what is preferred.
2022-07-01 Jakub Jelinek <jakub@redhat.com>
PR middle-end/106144
* wide-int.cc (wi::shifted_mask): If end >= prec, return right after
emitting element for shift or if shift is 0 first element after start.
(wide_int_cc_tests): Add tests for equivalency of wi::mask and
wi::shifted_mask with 0 start.
Jakub Jelinek [Tue, 21 Jun 2022 09:40:16 +0000 (11:40 +0200)]
ifcvt: Don't introduce trapping or faulting reads in noce_try_sign_mask [PR106032]
noce_try_sign_mask as documented will optimize
if (c < 0)
x = t;
else
x = 0;
into x = (c >> bitsm1) & t;
The optimization is done if either t is unconditional
(e.g. for
x = t;
if (c >= 0)
x = 0;
) or if it is cheap. We already check that t doesn't have side-effects,
but if t is conditional, we need to punt also if it may trap or fault,
as we make it unconditional.
I've briefly skimmed other noce_try* optimizations and didn't find one that
would suffer from the same problem.
2022-06-21 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/106032
* ifcvt.c (noce_try_sign_mask): Punt if !t_unconditional, and
t may_trap_or_fault_p, even if it is cheap.
Jakub Jelinek [Tue, 21 Jun 2022 09:38:59 +0000 (11:38 +0200)]
expand: Fix up expand_cond_expr_using_cmove [PR106030]
If expand_cond_expr_using_cmove can't find a cmove optab for a particular
mode, it tries to promote the mode and perform the cmove in the promoted
mode.
The testcase in the patch ICEs on arm because in that case we pass temp which
has the promoted mode (SImode) as target to expand_operands where the
operands have the non-promoted mode (QImode).
Later on the function uses paradoxical subregs:
if (GET_MODE (op1) != mode)
op1 = gen_lowpart (mode, op1);
if (GET_MODE (op2) != mode)
op2 = gen_lowpart (mode, op2);
to change the operand modes.
The following patch fixes it by passing NULL_RTX as target if it has
promoted mode.
2022-06-21 Jakub Jelinek <jakub@redhat.com>
PR middle-end/106030
* expr.c (expand_cond_expr_using_cmove): Pass NULL_RTX instead of
temp to expand_operands if mode has been promoted.
Jakub Jelinek [Tue, 21 Jun 2022 15:51:08 +0000 (17:51 +0200)]
libgomp: Fix up target-31.c test [PR106045]
The i variable is used inside of the parallel in:
#pragma omp simd safelen(32) private (v)
for (i = 0; i < 64; i++)
{
v = 3 * i;
ll[i] = u1 + v * u2[0] + u2[1] + x + y[0] + y[1] + v + h[0] + u3[i];
}
where i is predetermined linear (so while inside of the body
it is safe, private per SIMD lane var) the final value is written to
the shared variable, and in:
for (i = 0; i < 64; i++)
if (ll[i] != u1 + 3 * i * u2[0] + u2[1] + x + y[0] + y[1] + 3 * i + 13 + 14 + i)
#pragma omp atomic write
err = 1;
which is a normal loop and so it isn't in any way privatized there.
So we have a data race, fixed by adding private (i) clause to the
parallel.
2022-06-21 Jakub Jelinek <jakub@redhat.com>
Paul Iannetta <piannetta@kalrayinc.com>
Jonathan Wakely [Tue, 24 Nov 2020 12:48:31 +0000 (12:48 +0000)]
libstdc++: Throw instead of segfaulting in std::thread constructor [PR 67791]
This turns a mysterious segfault into an exception with a more useful
message. If the exception isn't caught, the user sees this instead of
just a segfault:
terminate called after throwing an instance of 'std::system_error'
what(): Enable multithreading to use std::thread: Operation not permitted
Aborted (core dumped)
libstdc++-v3/ChangeLog:
PR libstdc++/67791
* src/c++11/thread.cc (thread::_M_start_thread(_State_ptr, void (*)())):
Check that gthreads is available before calling __gthread_create.
Jonathan Wakely [Tue, 4 Apr 2023 11:04:14 +0000 (12:04 +0100)]
libstdc++: Fix outdated docs about demangling exception messages
The string returned by std::bad_exception::what() hasn't been a mangled
name since PR libstdc++/14493 was fixed for GCC 4.2.0, so remove the
docs showing how to demangle it.
libstdc++-v3/ChangeLog:
* doc/xml/manual/extensions.xml: Remove std::bad_exception from
example program.
* doc/html/manual/ext_demangling.html: Regenerate.
Jonathan Wakely [Wed, 26 Apr 2023 11:27:59 +0000 (12:27 +0100)]
libstdc++: Reduce Doxygen output for PDF
Including the header source code in the doxygen-generated PDF file makes
it too large, and causes pdflatex to run out of memory. If we only set
SOURCE_BROWSER=YES for the HTML docs then we won't include the sources
in the PDF file.
There are several macros defined for std::valarray that are only used to
generate repetitive code and then #undef'd. Those aren't useful in the
doxygen docs, especially the ones that reuse the same name in different
files. Omitting them avoids warnings about duplicate labels in the
refman.tex file.
libstdc++-v3/ChangeLog:
* doc/doxygen/user.cfg.in (SOURCE_BROWSER): Only set to YES for
HTML docs.
* include/bits/gslice_array.h (_DEFINE_VALARRAY_OPERATOR): Omit
from doxygen docs.
* include/bits/indirect_array.h (_DEFINE_VALARRAY_OPERATOR):
Likewise.
* include/bits/mask_array.h (_DEFINE_VALARRAY_OPERATOR):
Likewise.
* include/bits/slice_array.h (_DEFINE_VALARRAY_OPERATOR):
Likewise.
* include/std/valarray (_DEFINE_VALARRAY_UNARY_OPERATOR)
(_DEFINE_VALARRAY_AUGMENTED_ASSIGNMENT)
(_DEFINE_VALARRAY_EXPR_AUGMENTED_ASSIGNMENT)
(_DEFINE_BINARY_OPERATOR): Likewise.
This fixes the following conformance problems reported in the PR:
- Move constructor and move assignment should be defined.
- Copy assignment from a valueless object should be allowed.
Assignment is completely rewritten by this patch, as the previous
version had a number of problems. The converting assignment failed to
handle the case of assigning a new value to a valueless object, which
should work. It only accepted lvalue arguments, so wasn't usable to
implement the move assignment operator. Finally, it enforced the
precondition that the argument is not valueless, which is correct for
the converting assignment but not for the copy assignment.
A new _M_assign member is added to handle all cases of assignment
(copying from an lvalue, moving from an rvalue, and converting from a
different type). The not valueless precondition is checked in the
converting assignment before calling _M_assign, so isn't enforced for
copy and move assignment. The new function no longer uses a switch, so
handles valueless objects as the LHS or RHS of the assignment.
libstdc++-v3/ChangeLog:
PR libstdc++/100823
* include/bits/stl_iterator.h (common_iterator): Define move
constructor and move assignment operator.
(common_iterator::_M_assign): New function implementing
assignment.
(common_iterator::operator=): Use _M_assign.
(common_iterator::_S_valueless): New constant.
* testsuite/24_iterators/common_iterator/100823.cc: New test.
Jonathan Wakely [Wed, 20 Jul 2022 11:49:28 +0000 (12:49 +0100)]
libstdc++: Fix minor bugs in std::common_iterator
The noexcept-specifier for some std::common_iterator constructors was
incorrectly using an rvalue as the first argument of
std::is_nothrow_assignable_v. This gave the wrong answer for some types,
e.g. std::common_iterator<int*, S>, because an rvalue of scalar type
cannot be assigned to.
Also fix the friend declaration to use the same constraints as on the
definition of the class template. G++ fails to diagnose this error, due
to PR c++/96830.
Finally, the copy constructor was using std::move for its argument
in some cases, which should be removed.
libstdc++-v3/ChangeLog:
* include/bits/stl_iterator.h (common_iterator): Fix incorrect
uses of is_nothrow_assignable_v. Fix inconsistent constraints on
friend declaration. Do not move argument in copy constructor.
* testsuite/24_iterators/common_iterator/1.cc: Check for
noexcept constructibnle/assignable.
Jason Merrill [Thu, 23 Mar 2023 19:57:39 +0000 (15:57 -0400)]
c-family: -Wsequence-point and COMPONENT_REF [PR107163]
The patch for PR91415 fixed -Wsequence-point to treat shifts and ARRAY_REF
as sequenced in C++17, and COMPONENT_REF as well. But this is unnecessary
for COMPONENT_REF, since the RHS is just a FIELD_DECL with no actual
evaluation, and in this testcase handling COMPONENT_REF as sequenced blows
up fast in a deep inheritance tree. Instead, look through it.
PR c++/107163
gcc/c-family/ChangeLog:
* c-common.c (verify_tree): Don't use sequenced handling
for COMPONENT_REF.
Jason Merrill [Thu, 23 Mar 2023 20:50:09 +0000 (16:50 -0400)]
c++: constexpr PMF conversion [PR105996]
Here, we were calling build_reinterpret_cast regardless of whether there was
actually a cast, and that now sets REINTERPRET_CAST_P. But that
optimization seems dodgy anyway, as it involves NOP_EXPR from one
RECORD_TYPE to another and we try to reserve NOP_EXPR for fundamental types.
And the generated code seems the same, so let's drop it. And also strip
location wrappers.
PR c++/105996
gcc/cp/ChangeLog:
* typeck.c (build_ptrmemfunc): Drop 0-offset optimization
and location wrappers.
Jason Merrill [Fri, 17 Mar 2023 21:26:40 +0000 (17:26 -0400)]
c++: constant, array, lambda, template [PR108975]
When a lambda refers to a constant local variable in the enclosing scope, we
tentatively capture it, but if we end up pulling out its constant value, we
go back at the end of the lambda and prune any unneeded captures. Here
while parsing the template we decided that the dim capture was unneeded,
because we folded it away, but then we brought back the use in the template
trees that try to preserve the source representation with added type info.
So then when we tried to instantiate that use, we couldn't find what it was
trying to use, and crashed.
Fixed by not trying to prune when parsing a template; we'll prune at
instantiation time.
PR c++/108975
gcc/cp/ChangeLog:
* lambda.c (prune_lambda_captures): Don't bother in a template.
Jason Merrill [Fri, 17 Mar 2023 13:43:48 +0000 (09:43 -0400)]
c++: namespace-scoped friend in local class [PR69410]
do_friend was only considering class-qualified identifiers for the
qualified-id case, but we also need to skip local scope when there's an
explicit namespace scope.
PR c++/69410
gcc/cp/ChangeLog:
* friend.c (do_friend): Handle namespace as scope argument.
* decl.c (grokdeclarator): Pass down in_namespace.
Philipp Tomsich [Mon, 30 Jan 2023 22:40:26 +0000 (23:40 +0100)]
PR target/108589 - Check REG_P for AARCH64_FUSE_ADDSUB_2REG_CONST1
This adds a check for REG_P on SET_DEST for the new idiom recognizer
for AARCH64_FUSE_ADDSUB_2REG_CONST1. The reported ICE is only
observable with checking=rtl.
Bootstrapped/regtested aarch64-linux, committed.
PR target/108589
gcc/ChangeLog:
* config/aarch64/aarch64.c (aarch_macro_fusion_pair_p): Check
REG_P on SET_DEST.
Philipp Tomsich [Thu, 23 Mar 2023 18:47:57 +0000 (19:47 +0100)]
aarch64: disable LDP via tuning structure for -mcpu=ampere1
AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
Given the chance that this causes instructions to slip into the next
decoding cycle and the additional overheads when handling
cacheline-crossing LDP instructions, we disable the generation of LDP
isntructions through the tuning structure from instruction combining
(such as in peephole2).
Given the code-density benefits in builtins and prologue/epilogue
expansion, we allow LDPs there.
This commit:
* adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
* allows -moverride=tune=... to override this
These changes are benchmark-driven, yielding the following changes
(with a net-overall improvement):
503.bwaves_r. -0.88%
507.cactuBSSN_r 0.35%
508.namd_r 3.09%
510.parest_r -2.99%
511.povray_r 5.54%
519.lbm_r 15.83%
521.wrf_r 0.56%
526.blender_r 2.47%
527.cam4_r 0.70%
538.imagick_r 0.00%
544.nab_r -0.33%
549.fotonik3d_r. -0.42%
554.roms_r 0.00%
-------------------------
= total 1.79%
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Co-Authored-By: Di Zhao <di.zhao@amperecomputing.com>
gcc/ChangeLog:
* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
Add AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
* config/aarch64/aarch64.c (aarch64_operands_ok_for_ldpstp):
Check for the above tuning option when processing loads.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/ampere1-no_ldp_combine.c: New test.
Philipp Tomsich [Mon, 7 Nov 2022 13:22:21 +0000 (14:22 +0100)]
aarch64: Add support for Ampere-1A (-mcpu=ampere1a) CPU
This patch adds support for Ampere-1A CPU:
- recognize the name of the core and provide detection for -mcpu=native,
- updated extra_costs,
- adds a new fusion pair for (A+B+1 and A-B-1).
Ampere-1A and Ampere-1 have more timing difference than the extra
costs indicate, but these don't propagate through to the headline
items in our extra costs (e.g. the change in latency for scalar sqrt
doesn't have a corresponding table entry).
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add ampere1a.
* config/aarch64/aarch64-cost-tables.h: Add ampere1a_extra_costs.
* config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSION_PAIR):
Define a new fusion pair for A+B+1/A-B-1 (i.e., add/subtract two
registers and then +1/-1).
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.c (aarch_macro_fusion_pair_p): Implement
idiom-matcher for the new fusion pair.
* doc/invoke.texi: Add ampere1a.
Philipp Tomsich [Sun, 7 Aug 2022 22:30:52 +0000 (00:30 +0200)]
aarch64: update Ampere-1 core definition
This brings the extensions detected by -mcpu=native on Ampere-1 systems
in sync with the defaults generated for -mcpu=ampere1.
Note that some early kernel versions on Ampere1 may misreport the
presence of PAUTH and PREDRES (i.e., -mcpu=native will add 'nopauth'
and 'nopredres').
Philipp Tomsich [Mon, 3 Oct 2022 19:59:50 +0000 (21:59 +0200)]
aarch64: fix off-by-one in reading cpuinfo
Fixes: 341573406b39
Don't subtract one from the result of strnlen() when trying to point
to the first character after the current string. This issue would
cause individual characters (where the 128 byte buffers are stitched
together) to be lost.
Philipp Tomsich [Thu, 20 May 2021 19:57:48 +0000 (21:57 +0200)]
aarch64: enable Ampere-1 CPU
This adds support and a basic tuning model for the Ampere Computing
"Ampere-1" CPU.
The Ampere-1 implements the ARMv8.6 architecture in A64 mode and is
modelled as a 4-wide issue (as with all modern micro-architectures,
the chosen issue rate is a compromise between the maximum dispatch
rate and the maximum rate of uops issued to the scheduler).
This adds the -mcpu=ampere1 command-line option and the relevant cost
information/tuning tables for the Ampere-1.
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (AARCH64_CORE): New Ampere-1
core.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64-cost-tables.h: Add extra costs for
Ampere-1.
* config/aarch64/aarch64.c: Add tuning structures for Ampere-1.
* doc/invoke.texi: Add documentation for Ampere-1 core.
Kewen Lin [Tue, 4 Apr 2023 02:47:44 +0000 (21:47 -0500)]
rs6000: Fix vector parity support [PR108699]
The failures on the original failed case builtin-bitops-1.c
and the associated test case pr108699.c here show that the
current support of parity vector mode is wrong on Power.
The hardware insns vprtyb[wdq] which operate on the least
significant bit of each byte per element, they doesn't match
what RTL opcode parity needs, but the current implementation
expands it with them wrongly.
This patch is to fix the handling with one more insn vpopcntb.
PR target/108699
gcc/ChangeLog:
* config/rs6000/altivec.md (*p9v_parity<mode>2): Rename to ...
(rs6000_vprtyb<mode>2): ... this.
* config/rs6000/rs6000-builtin.def (VPRTYBD): Replace parityv2di2 with
rs6000_vprtybv2di2.
(VPRTYBW): Replace parityv4si2 with rs6000_vprtybv4si2.
(VPRTYBQ): Replace parityv1ti2 with rs6000_vprtybv1ti2.
* config/rs6000/vector.md (parity<mode>2 with VEC_IP): Expand with
popcountv16qi2 and the corresponding rs6000_vprtyb<mode>2.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/p9-vparity.c: Add scan-assembler-not for vpopcntb
to distinguish parity byte from parity.
* gcc.target/powerpc/pr108699.c: New test.
Harald Anlauf [Fri, 14 Apr 2023 18:45:19 +0000 (20:45 +0200)]
Fortran: fix compile-time simplification of SET_EXPONENT [PR109511]
gcc/fortran/ChangeLog:
PR fortran/109511
* simplify.c (gfc_simplify_set_exponent): Fix implementation of
compile-time simplification of intrinsic SET_EXPONENT for argument
X < 1 and for I < 0.
gcc/testsuite/ChangeLog:
PR fortran/109511
* gfortran.dg/set_exponent_1.f90: New test.
Harald Anlauf [Sat, 11 Mar 2023 14:37:37 +0000 (15:37 +0100)]
Fortran: fix bounds check for copying of class expressions [PR106945]
In the bounds check for copying of class expressions, the number of elements
determined from a descriptor, returned as type gfc_array_index_type (i.e. a
signed type), should be converted to the type of the passed element count,
which is of type size_type_node (i.e. unsigned), for use in comparisons.
gcc/fortran/ChangeLog:
PR fortran/106945
* trans-expr.c (gfc_copy_class_to_class): Convert element counts in
bounds check to common type for comparison.
gcc/testsuite/ChangeLog:
PR fortran/106945
* gfortran.dg/pr106945.f90: New test.
Jonathan Wakely [Thu, 2 Feb 2023 14:06:40 +0000 (14:06 +0000)]
libstdc++: Fix std::filesystem errors with -fkeep-inline-functions [PR108636]
With -fkeep-inline-functions there are linker errors when including
<filesystem>. This happens because there are some filesystem::path
constructors defined inline which call non-exported functions defined in
the library. That's usually not a problem, because those constructors
are only called by code that's also inside the library. But when the
header is compiled with -fkeep-inline-functions those inline functions
are emitted even though they aren't called. That then creates an
undefined reference to the other library internals. The fix is to just
move the private constructors into the library where they are called.
That way they are never even seen by users, and so not compiled even if
-fkeep-inline-functions is used.
Harald Anlauf [Mon, 20 Feb 2023 20:28:09 +0000 (21:28 +0100)]
Fortran: improve checking of character length specification [PR96025]
gcc/fortran/ChangeLog:
PR fortran/96025
* parse.c (check_function_result_typed): Improve type check of
specification expression for character length and return status.
(parse_spec): Use status from above.
* resolve.c (resolve_fntype): Prevent use of invalid specification
expression for character length.
gcc/testsuite/ChangeLog:
PR fortran/96025
* gfortran.dg/pr96025.f90: New test.