Jason Merrill [Thu, 23 Mar 2023 19:57:39 +0000 (15:57 -0400)]
c-family: -Wsequence-point and COMPONENT_REF [PR107163]
The patch for PR91415 fixed -Wsequence-point to treat shifts and ARRAY_REF
as sequenced in C++17, and COMPONENT_REF as well. But this is unnecessary
for COMPONENT_REF, since the RHS is just a FIELD_DECL with no actual
evaluation, and in this testcase handling COMPONENT_REF as sequenced blows
up fast in a deep inheritance tree. Instead, look through it.
PR c++/107163
gcc/c-family/ChangeLog:
* c-common.c (verify_tree): Don't use sequenced handling
for COMPONENT_REF.
Jason Merrill [Thu, 23 Mar 2023 20:50:09 +0000 (16:50 -0400)]
c++: constexpr PMF conversion [PR105996]
Here, we were calling build_reinterpret_cast regardless of whether there was
actually a cast, and that now sets REINTERPRET_CAST_P. But that
optimization seems dodgy anyway, as it involves NOP_EXPR from one
RECORD_TYPE to another and we try to reserve NOP_EXPR for fundamental types.
And the generated code seems the same, so let's drop it. And also strip
location wrappers.
PR c++/105996
gcc/cp/ChangeLog:
* typeck.c (build_ptrmemfunc): Drop 0-offset optimization
and location wrappers.
Jason Merrill [Fri, 17 Mar 2023 21:26:40 +0000 (17:26 -0400)]
c++: constant, array, lambda, template [PR108975]
When a lambda refers to a constant local variable in the enclosing scope, we
tentatively capture it, but if we end up pulling out its constant value, we
go back at the end of the lambda and prune any unneeded captures. Here
while parsing the template we decided that the dim capture was unneeded,
because we folded it away, but then we brought back the use in the template
trees that try to preserve the source representation with added type info.
So then when we tried to instantiate that use, we couldn't find what it was
trying to use, and crashed.
Fixed by not trying to prune when parsing a template; we'll prune at
instantiation time.
PR c++/108975
gcc/cp/ChangeLog:
* lambda.c (prune_lambda_captures): Don't bother in a template.
Jason Merrill [Fri, 17 Mar 2023 13:43:48 +0000 (09:43 -0400)]
c++: namespace-scoped friend in local class [PR69410]
do_friend was only considering class-qualified identifiers for the
qualified-id case, but we also need to skip local scope when there's an
explicit namespace scope.
PR c++/69410
gcc/cp/ChangeLog:
* friend.c (do_friend): Handle namespace as scope argument.
* decl.c (grokdeclarator): Pass down in_namespace.
Philipp Tomsich [Mon, 30 Jan 2023 22:40:26 +0000 (23:40 +0100)]
PR target/108589 - Check REG_P for AARCH64_FUSE_ADDSUB_2REG_CONST1
This adds a check for REG_P on SET_DEST for the new idiom recognizer
for AARCH64_FUSE_ADDSUB_2REG_CONST1. The reported ICE is only
observable with checking=rtl.
Bootstrapped/regtested aarch64-linux, committed.
PR target/108589
gcc/ChangeLog:
* config/aarch64/aarch64.c (aarch_macro_fusion_pair_p): Check
REG_P on SET_DEST.
Philipp Tomsich [Thu, 23 Mar 2023 18:47:57 +0000 (19:47 +0100)]
aarch64: disable LDP via tuning structure for -mcpu=ampere1
AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
Given the chance that this causes instructions to slip into the next
decoding cycle and the additional overheads when handling
cacheline-crossing LDP instructions, we disable the generation of LDP
isntructions through the tuning structure from instruction combining
(such as in peephole2).
Given the code-density benefits in builtins and prologue/epilogue
expansion, we allow LDPs there.
This commit:
* adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
* allows -moverride=tune=... to override this
These changes are benchmark-driven, yielding the following changes
(with a net-overall improvement):
503.bwaves_r. -0.88%
507.cactuBSSN_r 0.35%
508.namd_r 3.09%
510.parest_r -2.99%
511.povray_r 5.54%
519.lbm_r 15.83%
521.wrf_r 0.56%
526.blender_r 2.47%
527.cam4_r 0.70%
538.imagick_r 0.00%
544.nab_r -0.33%
549.fotonik3d_r. -0.42%
554.roms_r 0.00%
-------------------------
= total 1.79%
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Co-Authored-By: Di Zhao <di.zhao@amperecomputing.com>
gcc/ChangeLog:
* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
Add AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
* config/aarch64/aarch64.c (aarch64_operands_ok_for_ldpstp):
Check for the above tuning option when processing loads.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/ampere1-no_ldp_combine.c: New test.
Philipp Tomsich [Mon, 7 Nov 2022 13:22:21 +0000 (14:22 +0100)]
aarch64: Add support for Ampere-1A (-mcpu=ampere1a) CPU
This patch adds support for Ampere-1A CPU:
- recognize the name of the core and provide detection for -mcpu=native,
- updated extra_costs,
- adds a new fusion pair for (A+B+1 and A-B-1).
Ampere-1A and Ampere-1 have more timing difference than the extra
costs indicate, but these don't propagate through to the headline
items in our extra costs (e.g. the change in latency for scalar sqrt
doesn't have a corresponding table entry).
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add ampere1a.
* config/aarch64/aarch64-cost-tables.h: Add ampere1a_extra_costs.
* config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSION_PAIR):
Define a new fusion pair for A+B+1/A-B-1 (i.e., add/subtract two
registers and then +1/-1).
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.c (aarch_macro_fusion_pair_p): Implement
idiom-matcher for the new fusion pair.
* doc/invoke.texi: Add ampere1a.
Philipp Tomsich [Sun, 7 Aug 2022 22:30:52 +0000 (00:30 +0200)]
aarch64: update Ampere-1 core definition
This brings the extensions detected by -mcpu=native on Ampere-1 systems
in sync with the defaults generated for -mcpu=ampere1.
Note that some early kernel versions on Ampere1 may misreport the
presence of PAUTH and PREDRES (i.e., -mcpu=native will add 'nopauth'
and 'nopredres').
Philipp Tomsich [Mon, 3 Oct 2022 19:59:50 +0000 (21:59 +0200)]
aarch64: fix off-by-one in reading cpuinfo
Fixes: 341573406b39
Don't subtract one from the result of strnlen() when trying to point
to the first character after the current string. This issue would
cause individual characters (where the 128 byte buffers are stitched
together) to be lost.
Philipp Tomsich [Thu, 20 May 2021 19:57:48 +0000 (21:57 +0200)]
aarch64: enable Ampere-1 CPU
This adds support and a basic tuning model for the Ampere Computing
"Ampere-1" CPU.
The Ampere-1 implements the ARMv8.6 architecture in A64 mode and is
modelled as a 4-wide issue (as with all modern micro-architectures,
the chosen issue rate is a compromise between the maximum dispatch
rate and the maximum rate of uops issued to the scheduler).
This adds the -mcpu=ampere1 command-line option and the relevant cost
information/tuning tables for the Ampere-1.
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (AARCH64_CORE): New Ampere-1
core.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64-cost-tables.h: Add extra costs for
Ampere-1.
* config/aarch64/aarch64.c: Add tuning structures for Ampere-1.
* doc/invoke.texi: Add documentation for Ampere-1 core.
Kewen Lin [Tue, 4 Apr 2023 02:47:44 +0000 (21:47 -0500)]
rs6000: Fix vector parity support [PR108699]
The failures on the original failed case builtin-bitops-1.c
and the associated test case pr108699.c here show that the
current support of parity vector mode is wrong on Power.
The hardware insns vprtyb[wdq] which operate on the least
significant bit of each byte per element, they doesn't match
what RTL opcode parity needs, but the current implementation
expands it with them wrongly.
This patch is to fix the handling with one more insn vpopcntb.
PR target/108699
gcc/ChangeLog:
* config/rs6000/altivec.md (*p9v_parity<mode>2): Rename to ...
(rs6000_vprtyb<mode>2): ... this.
* config/rs6000/rs6000-builtin.def (VPRTYBD): Replace parityv2di2 with
rs6000_vprtybv2di2.
(VPRTYBW): Replace parityv4si2 with rs6000_vprtybv4si2.
(VPRTYBQ): Replace parityv1ti2 with rs6000_vprtybv1ti2.
* config/rs6000/vector.md (parity<mode>2 with VEC_IP): Expand with
popcountv16qi2 and the corresponding rs6000_vprtyb<mode>2.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/p9-vparity.c: Add scan-assembler-not for vpopcntb
to distinguish parity byte from parity.
* gcc.target/powerpc/pr108699.c: New test.
Harald Anlauf [Fri, 14 Apr 2023 18:45:19 +0000 (20:45 +0200)]
Fortran: fix compile-time simplification of SET_EXPONENT [PR109511]
gcc/fortran/ChangeLog:
PR fortran/109511
* simplify.c (gfc_simplify_set_exponent): Fix implementation of
compile-time simplification of intrinsic SET_EXPONENT for argument
X < 1 and for I < 0.
gcc/testsuite/ChangeLog:
PR fortran/109511
* gfortran.dg/set_exponent_1.f90: New test.
Harald Anlauf [Sat, 11 Mar 2023 14:37:37 +0000 (15:37 +0100)]
Fortran: fix bounds check for copying of class expressions [PR106945]
In the bounds check for copying of class expressions, the number of elements
determined from a descriptor, returned as type gfc_array_index_type (i.e. a
signed type), should be converted to the type of the passed element count,
which is of type size_type_node (i.e. unsigned), for use in comparisons.
gcc/fortran/ChangeLog:
PR fortran/106945
* trans-expr.c (gfc_copy_class_to_class): Convert element counts in
bounds check to common type for comparison.
gcc/testsuite/ChangeLog:
PR fortran/106945
* gfortran.dg/pr106945.f90: New test.
Jonathan Wakely [Thu, 2 Feb 2023 14:06:40 +0000 (14:06 +0000)]
libstdc++: Fix std::filesystem errors with -fkeep-inline-functions [PR108636]
With -fkeep-inline-functions there are linker errors when including
<filesystem>. This happens because there are some filesystem::path
constructors defined inline which call non-exported functions defined in
the library. That's usually not a problem, because those constructors
are only called by code that's also inside the library. But when the
header is compiled with -fkeep-inline-functions those inline functions
are emitted even though they aren't called. That then creates an
undefined reference to the other library internals. The fix is to just
move the private constructors into the library where they are called.
That way they are never even seen by users, and so not compiled even if
-fkeep-inline-functions is used.
Harald Anlauf [Mon, 20 Feb 2023 20:28:09 +0000 (21:28 +0100)]
Fortran: improve checking of character length specification [PR96025]
gcc/fortran/ChangeLog:
PR fortran/96025
* parse.c (check_function_result_typed): Improve type check of
specification expression for character length and return status.
(parse_spec): Use status from above.
* resolve.c (resolve_fntype): Prevent use of invalid specification
expression for character length.
gcc/testsuite/ChangeLog:
PR fortran/96025
* gfortran.dg/pr96025.f90: New test.
This is a follow-up to commit a4c6bd0821099f6b8c0f64a96ffd9d01a025c413
introducing a runtime check for alignment for 16 byte atomic
compare-exchange, load, and store.
libatomic/ChangeLog:
* config/s390/cas_n.c: New file.
* config/s390/load_n.c: New file.
* config/s390/store_n.c: New file.
Alex Coplan [Mon, 6 Feb 2023 14:32:21 +0000 (14:32 +0000)]
aarch64: Fix up bfmlal lane pattern [PR104921]
As the testcase shows, this pattern had an incorrect constraint leading
to GCC's output getting rejected by the assembler.
This patch fixes the constraint accordingly.
The test is split into two: one that can run without bf16 support from
the assembler and another that checks that the output actually assembles
when such support is available.
gcc/ChangeLog:
PR target/104921
* config/aarch64/aarch64-simd.md (aarch64_bfmlal<bt>_lane<q>v4sf):
Use correct constraint for operand 3.
gcc/testsuite/ChangeLog:
PR target/104921
* gcc.target/aarch64/pr104921-1.c: New test.
* gcc.target/aarch64/pr104921-2.c: New test.
* gcc.target/aarch64/pr104921.x: Include file for new tests.
Nelson Chu [Mon, 29 Nov 2021 12:48:20 +0000 (04:48 -0800)]
RISC-V: jal cannot refer to a default visibility symbol for shared object. [PR 108339]
This is the original binutils bugzilla report,
https://sourceware.org/bugzilla/show_bug.cgi?id=28509
And this is the first version of the proposed binutils patch,
https://sourceware.org/pipermail/binutils/2021-November/118398.html
After applying the binutils patch, I get the the unexpected error when
building libgcc,
/scratch/nelsonc/riscv-gnu-toolchain/riscv-gcc/libgcc/config/riscv/div.S:42:
/scratch/nelsonc/build-upstream/rv64gc-linux/build-install/riscv64-unknown-linux-gnu/bin/ld: relocation R_RISCV_JAL against `__udivdi3' which may bind externally can not be used when making a shared object; recompile with -fPIC
Therefore, this patch add an extra hidden alias symbol for __udivdi3, and
then use HIDDEN_JUMPTARGET to target a non-preemptible symbol instead.
The solution is similar to glibc as follows,
https://sourceware.org/git/?p=glibc.git;a=commit;h=68389203832ab39dd0dbaabbc4059e7fff51c29b
libgcc/ChangeLog:
PR target/108339
* config/riscv/div.S: Add the hidden alias symbol for __udivdi3, and
then use HIDDEN_JUMPTARGET to target it since it is non-preemptible.
* config/riscv/riscv-asm.h: Added new macros HIDDEN_JUMPTARGET and
HIDDEN_DEF.
Kewen Lin [Wed, 18 Jan 2023 08:34:19 +0000 (02:34 -0600)]
rs6000: Teach rs6000_opaque_type_invalid_use_p about gcall [PR108348]
PR108348 shows one special case that MMA opaque types are
used in function arguments and treated as pass by reference,
it results in one copying from argument to a temp variable,
since this copying happens before rs6000_function_arg check,
it can cause ICE without MMA support then. This patch is to
teach function rs6000_opaque_type_invalid_use_p to check if
any function argument in a gcall stmt has the invalid use of
MMA opaque types.
btw, I checked the handling on return value, it doesn't have
this kind of issue as its checking and error emission is quite
early, so this doesn't handle function return value.
PR target/108348
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_opaque_type_invalid_use_p): Add the
support for invalid uses of MMA opaque type in function arguments.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr108348-1.c: New test.
* gcc.target/powerpc/pr108348-2.c: New test.
Kewen Lin [Sun, 12 Feb 2023 15:35:27 +0000 (09:35 -0600)]
rs6000: Teach rs6000_opaque_type_invalid_use_p about inline asm [PR108272]
As PR108272 shows, there are some invalid uses of MMA opaque
types in inline asm statements. This patch is to teach the
function rs6000_opaque_type_invalid_use_p for inline asm,
check and error any invalid use of MMA opaque types in input
and output operands.
PR target/108272
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_opaque_type_invalid_use_p): Add the
support for invalid uses in inline asm, factor out the checking and
erroring to lambda function check_and_error_invalid_use.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr108272-1.c: New test.
* gcc.target/powerpc/pr108272-2.c: New test.
* gcc.target/powerpc/pr108272-3.c: New test.
* gcc.target/powerpc/pr108272-4.c: New test.