Iain Buclaw [Mon, 26 Jun 2023 01:24:27 +0000 (03:24 +0200)]
d: Suboptimal codegen for __builtin_expect(cond, false)
Since PR96435, both boolean objects and expressions have been evaluated
in the following way.
(*(ubyte*)&obj_or_expr) & 1
It has been noted that sometimes this can cause the back-end to optimize
in non-obvious ways - in particular with __builtin_expect.
This @safe feature is now restricted to just when reading the value of a
bool field that comes from a union.
PR d/110359
gcc/d/ChangeLog:
* d-convert.cc (convert_for_rvalue): Only apply the @safe boolean
conversion to boolean fields of a union.
(convert_for_condition): Call convert_for_rvalue in the default case.
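As an illustration of the conversion quoted above, the following C++ sketch models `(*(ubyte*)&obj_or_expr) & 1` on a raw storage byte. The helper name `bool_from_storage_byte` is hypothetical and is not part of the D front end; the sketch only shows why a bool field that overlaps other union data is masked down to its lowest bit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: model reading a bool whose storage byte may hold
// any bit pattern (for example, because it shares a union with other data).
// Only the lowest bit is kept, mirroring (*(ubyte*)&obj_or_expr) & 1.
inline bool bool_from_storage_byte(std::uint8_t storage)
{
    return (storage & 1u) != 0;
}

// Usage: every byte value yields a well-formed bool.
inline void bool_from_storage_byte_demo()
{
    assert(bool_from_storage_byte(0x00) == false);
    assert(bool_from_storage_byte(0x01) == true);
    assert(bool_from_storage_byte(0xFE) == false);  // lowest bit clear
    assert(bool_from_storage_byte(0xFF) == true);   // lowest bit set
}
```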
Iain Buclaw [Mon, 26 Jun 2023 00:29:46 +0000 (02:29 +0200)]
d: Fix crash in d/dmd/root/aav.d:127 dmd_aaGetRvalue from DsymbolTable::lookup
Backports patch from upstream dmd mainline for fixing PR110113.
The data being Mem.xrealloc'd contains many Array(T) fields, some of
which have self references in their data.ptr field thanks to the
smallarray optimization used by Array.
Naturally then, the memcpy from old GC data to new retains those self
referenced addresses, and the GC marks the old data as "free". Some time
later GC.malloc will return a pointer to said "free" data. So now we
have two GC references to the same memory. One that is treating the data
as an Array(VarDeclaration) in dmd.escape.escapeByStorage, and the other
as an AA in the symtab of a dmd.dsymbol.ScopeDsymbol.
Fix this memory corruption by not storing the data in a global variable
for reuse. If there are no more live references, the GC will free it.
PR d/110113
gcc/d/ChangeLog:
* dmd/escape.d (checkMutableArguments): Always allocate new buffer for
computing escapeBy.
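The self-reference hazard described above can be sketched in C++. `SmallArray` is a hypothetical stand-in for an array type with a small-buffer optimization, not the dmd `Array(T)` itself; the sketch assumes only that the data pointer may point into the object's own storage, and shows that a byte-wise copy keeps pointing at the old block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical array with a small-buffer optimization: while the contents
// fit, `ptr` points at the object's own `small` storage.
struct SmallArray
{
    int *ptr;
    std::size_t length;
    int small[4];
};

inline void small_array_init(SmallArray &a)
{
    a.ptr = a.small;  // self reference into the object itself
    a.length = 0;
}

// Byte-copy `from` into `to`, as a realloc-style move of the block would.
// Returns true if the copy still points into the *old* object's storage.
inline bool copy_keeps_stale_pointer(const SmallArray &from, SmallArray &to)
{
    assert(from.ptr == from.small);  // precondition: small buffer in use
    std::memcpy(&to, &from, sizeof(SmallArray));
    return to.ptr == from.small && to.ptr != to.small;
}
```

Once the old block is handed back to the allocator, the copied pointer aliases whatever is allocated there next, which matches the double use of the same memory described in the commit message.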
Ian Lance Taylor [Fri, 23 Jun 2023 23:16:06 +0000 (16:16 -0700)]
compiler, libgo: support bootstrapping gc compiler
In the Go 1.21 release the package internal/profile imports
internal/lazyregexp. That works when bootstrapping with Go 1.17,
because that compiler has internal/lazyregexp and permits importing it.
We also have internal/lazyregexp in libgo, but since it is not installed
it is not available for importing. This CL adds internal/lazyregexp
to the list of internal packages that are installed for bootstrapping.
The Go 1.21, and earlier, releases have a couple of functions in
the internal/abi package that are always fully intrinsified.
The gofrontend recognizes and intrinsifies those functions as well.
However, the gofrontend was also building function descriptors
for references to the functions without calling them, which
failed because there was nothing to refer to. That is OK for the
gc compiler, which guarantees that the functions are only called,
not referenced. This CL arranges to not generate function descriptors
for these functions.
Jonathan Wakely [Mon, 15 May 2023 20:41:56 +0000 (21:41 +0100)]
libstdc++: Document removal of implicit allocator rebinding extensions
Traditionally libstdc++ allowed containers to be
instantiated with allocators that have the wrong value type, implicitly
rebinding the allocator to the container's value type. Since C++20 that
has been explicitly ill-formed, so the extension is no longer supported
in strict modes (e.g. -std=c++17) and in C++20 and later.
libstdc++-v3/ChangeLog:
* doc/xml/manual/evolution.xml: Document removal of implicit
allocator rebinding extensions in strict mode and for C++20.
* doc/html/*: Regenerate.
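A minimal sketch of code that no longer relies on the extension: the allocator is rebound explicitly with `std::allocator_traits` before it is handed to the container. The alias names (`LongAlloc`, `IntAlloc`, `IntVector`) and the helper `make_and_count` are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

// An allocator whose value_type does not match the container's element type.
using LongAlloc = std::allocator<long>;

// Rebind explicitly instead of relying on the removed extension.
using IntAlloc = std::allocator_traits<LongAlloc>::rebind_alloc<int>;
static_assert(std::is_same<IntAlloc::value_type, int>::value,
              "rebound allocator must allocate int");

using IntVector = std::vector<int, IntAlloc>;

// std::vector<int, LongAlloc> v;  // mismatched value_type: no longer
//                                 // accepted in strict modes or C++20

inline std::size_t make_and_count()
{
    IntVector v{1, 2, 3};
    assert(v.get_allocator() == IntAlloc());
    return v.size();
}
```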
Richard Biener [Mon, 19 Jun 2023 07:52:45 +0000 (09:52 +0200)]
tree-optimization/110298 - CFG cleanup and stale nb_iterations
When unrolling we eventually kill nb_iterations info since it may
refer to removed SSA names. But we do this only after cleaning
up the CFG which in turn can end up accessing it. Fixed by
swapping the two.
PR tree-optimization/110298
* tree-ssa-loop-ivcanon.cc (tree_unroll_loops_completely):
Clear number of iterations info before cleaning up the CFG.
Richard Biener [Fri, 9 Jun 2023 07:29:09 +0000 (09:29 +0200)]
middle-end/110182 - TYPE_PRECISION on VECTOR_TYPE causes wrong-code
When folding two conversions in a row we use TYPE_PRECISION but
that's invalid for VECTOR_TYPE. The following fixes this by
using element_precision instead.
PR middle-end/110182
* match.pd (two conversions in a row): Use element_precision
to DTRT for VECTOR_TYPE.
Alex Coplan [Tue, 6 Jun 2023 14:19:03 +0000 (15:19 +0100)]
aarch64: Allow compiler to define ls64 builtins [PR110132]
This patch refactors the ls64 builtins to allow the compiler to define them
directly instead of having wrapper functions in arm_acle.h. This should not
only be easier to maintain, but it also makes two important correctness fixes:
- It fixes PR110132, where the builtins ended up getting declared with
invisible bindings in the C FE, so the FE ended up synthesizing
incompatible implicit definitions for these builtins.
- It allows the builtins to be used with LTO, which didn't work previously.
We also take the opportunity to add test coverage from C++ for these
builtins.
gcc/ChangeLog:
PR target/110132
* config/aarch64/aarch64-builtins.cc (aarch64_general_simulate_builtin):
New. Use it ...
(aarch64_init_ls64_builtins): ... here. Switch to declaring public ACLE
names for builtins.
(aarch64_general_init_builtins): Ensure we invoke the arm_acle.h
setup if in_lto_p, just like we do for SVE.
* config/aarch64/arm_acle.h: (__arm_ld64b): Delete.
(__arm_st64b): Delete.
(__arm_st64bv): Delete.
(__arm_st64bv0): Delete.
gcc/testsuite/ChangeLog:
PR target/110132
* lib/target-supports.exp (check_effective_target_aarch64_asm_FUNC_ok):
Extend to ls64.
* g++.target/aarch64/acle/acle.exp: New.
* g++.target/aarch64/acle/ls64.C: New test.
* g++.target/aarch64/acle/ls64_lto.C: New test.
* gcc.target/aarch64/acle/ls64_lto.c: New test.
* gcc.target/aarch64/acle/pr110132.c: New test.
Alex Coplan [Tue, 6 Jun 2023 10:52:19 +0000 (11:52 +0100)]
aarch64: Fix wrong code with st64b builtin [PR110100]
The st64b pattern incorrectly had an output constraint on the register
operand containing the destination address for the store, leading to
wrong code. This patch fixes that.
gcc/ChangeLog:
PR target/110100
* config/aarch64/aarch64-builtins.cc (aarch64_expand_builtin_ls64):
Use input operand for the destination address.
* config/aarch64/aarch64.md (st64b): Fix constraint on address
operand.
gcc/testsuite/ChangeLog:
PR target/110100
* gcc.target/aarch64/acle/pr110100.c: New test.
Kewen Lin [Tue, 13 Jun 2023 08:04:54 +0000 (03:04 -0500)]
testsuite: Check int128 effective target for pr109932-{1,2}.c [PR110230]
This patch is to make newly added test cases pr109932-{1,2}.c
check int128 effective target to avoid unsupported type error
on 32-bit. I did hit this failure during testing and fixed
it, but made a stupid mistake not updating the local formatted
patch which was actually out of date.
PR testsuite/110230
PR target/109932
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr109932-1.c: Adjust with int128 effective target.
* gcc.target/powerpc/pr109932-2.c: Ditto.
Kewen Lin [Mon, 12 Jun 2023 06:08:22 +0000 (01:08 -0500)]
rs6000: Guard __builtin_{un,}pack_vector_int128 with vsx [PR109932]
As PR109932 shows, builtins __builtin_{un,}pack_vector_int128
should be guarded under vsx rather than power7, as their
corresponding bif patterns have the conditions TARGET_VSX
and VECTOR_MEM_ALTIVEC_OR_VSX_P (V1TImode). This patch moves
__builtin_{un,}pack_vector_int128 to stanza vsx to ensure
they are supported.
PR target/109932
gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_pack_vector_int128,
__builtin_unpack_vector_int128): Move from stanza power7 to vsx.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr109932-1.c: New test.
* gcc.target/powerpc/pr109932-2.c: New test.
Kewen Lin [Mon, 12 Jun 2023 06:07:52 +0000 (01:07 -0500)]
rs6000: Don't use TFmode for 128 bits fp constant in toc [PR110011]
As PR110011 shows, when encoding a 128-bit fp constant into the
toc, we use REAL_VALUE_TO_TARGET_LONG_DOUBLE, which finds
the first float mode with LONG_DOUBLE_TYPE_SIZE bits of
precision; that would be TFmode here. But the 128-bit fp
constant can have mode IFmode or KFmode, which doesn't
necessarily have the same underlying float format as TFmode.
As this PR exposes, with option -mabi=ibmlongdouble TFmode
has ibm_extended_format while KFmode has ieee_quad_format,
and mixing up the formats (the encoding/decoding ways)
causes unexpected results.
This patch makes it use the constant's own mode instead
of TFmode for the real_to_target call.
PR target/110011
gcc/ChangeLog:
* config/rs6000/rs6000.cc (output_toc): Use the mode of the 128-bit
floating constant itself for real_to_target call.
Xi Ruoyao [Sat, 24 Sep 2022 07:16:57 +0000 (15:16 +0800)]
aarch64: testsuite: disable stack protector for tests relying on stack offset
Stack protector needs a guard value on the stack and changes the stack
layout. So we need to disable it for those tests, to avoid test failure
with --enable-default-ssp.
Xi Ruoyao [Sat, 24 Sep 2022 06:50:03 +0000 (14:50 +0800)]
aarch64: testsuite: disable stack protector for pr104005.c
Storing the stack guard variable needs one stp instruction, breaking the
scan-assembler-not pattern in the test. Disable stack protector to
avoid a test failure with --enable-default-ssp.
Xi Ruoyao [Sat, 24 Sep 2022 06:38:31 +0000 (14:38 +0800)]
aarch64: testsuite: disable stack protector for auto-init-7.c
The test scans for "const_int 0" in the RTL dump, but stack protector
can produce more "const_int 0". To avoid a failure with
--enable-default-ssp, disable stack protector for this.
Xi Ruoyao [Sat, 24 Sep 2022 06:28:44 +0000 (14:28 +0800)]
aarch64: testsuite: disable PIE for tests with large code model [PR70150]
These tests set large code model with -mcmodel=large or target pragma for
AArch64. But if GCC is configured with --enable-default-pie, it triggers
"sorry: unimplemented: code model large with -fpic". Disable PIE to
avoid the issue.
Lulu Cheng [Wed, 7 Jun 2023 02:21:58 +0000 (10:21 +0800)]
LoongArch: Avoid non-returning indirect jumps through $ra [PR110136]
The micro-architecture unconditionally treats a "jr $ra" as "return from subroutine",
hence doing "jr $ra" would interfere with both subroutine return prediction and
the more general indirect branch prediction.
Therefore, a problem like PR110136 can cause a significant increase in the branch
misprediction rate and affect performance. The same problem exists with "indirect_jump".
gcc/ChangeLog:
PR target/110136
* config/loongarch/loongarch.md: Modify the register constraints for template
"jumptable" and "indirect_jump" from "r" to "e".
Georg-Johann Lay [Sat, 10 Jun 2023 19:47:53 +0000 (21:47 +0200)]
target/109650: Fix wrong code after cc0 -> CCmode transition.
This patch fixes a wrong-code bug in the wake of PR92729, the transition that
turned the AVR backend from cc0 to CCmode. In cc0, the insn that uses cc0 like
a conditional branch always follows the cc0 setter, which is no longer the case
with CCmode where set and use of REG_CC might be in different basic blocks.
This patch removes the machine-dependent reorg pass in avr_reorg entirely.
It is replaced by a new, AVR specific mini-pass that runs prior to split2.
Canonicalization of comparisons away from the "difficult" codes GT[U] and LE[U]
is now mostly performed by implementing TARGET_CANONICALIZE_COMPARISON.
Moreover:
* Text peephole conditions get "dead_or_set_regno_p (*, REG_CC)" as needed.
* RTL peephole conditions get "peep2_regno_dead_p (*, REG_CC)" as needed.
* Conditional branches no longer clobber REG_CC.
* insn output for compares looks ahead to determine the branch mode in use.
This needs also "dead_or_set_regno_p (*, REG_CC)".
* Add RTL peepholes for decrement-and-branch detection.
* Some of the patterns like "*cmphi.zero-extend.0" lost their
combine-ational part with PR92729. Restore them.
Finally, it fixes some of the many indentation glitches left over from PR92729.
gcc/
PR target/109650
PR target/92729
Backport from 2023-05-10 master r14-1688.
* config/avr/avr-passes.def (avr_pass_ifelse): Insert new pass.
* config/avr/avr.cc (avr_pass_ifelse): New RTL pass.
(avr_pass_data_ifelse): New pass_data for it.
(make_avr_pass_ifelse, avr_redundant_compare, avr_cbranch_cost)
(avr_canonicalize_comparison, avr_out_plus_set_ZN)
(avr_out_cmp_ext): New functions.
(compare_condtition): Make sure REG_CC dies in the branch insn.
(avr_rtx_costs_1): Add computation of cbranch costs.
(avr_adjust_insn_length) [ADJUST_LEN_ADD_SET_ZN, ADJUST_LEN_CMP_ZEXT]
[ADJUST_LEN_CMP_SEXT]: Handle them.
(TARGET_CANONICALIZE_COMPARISON): New define.
(avr_simplify_comparison_p, compare_diff_p, avr_compare_pattern)
(avr_reorg_remove_redundant_compare, avr_reorg): Remove functions.
(TARGET_MACHINE_DEPENDENT_REORG): Remove define.
* config/avr/avr-protos.h (avr_simplify_comparison_p): Remove proto.
(make_avr_pass_ifelse, avr_out_plus_set_ZN, cc_reg_rtx)
(avr_out_cmp_zext): New protos.
* config/avr/avr.md (branch, difficult_branch): Don't split insns.
(*cbranchhi.zero-extend.0, *cbranchhi.zero-extend.1)
(*swapped_tst<mode>, *add.for.eqne.<mode>): New insns.
(*cbranch<mode>4): Rename to cbranch<mode>4_insn.
(define_peephole): Add dead_or_set_regno_p(insn,REG_CC) as needed.
(define_peephole2): Add peep2_regno_dead_p(*,REG_CC) as needed.
Add new RTL peepholes for decrement-and-branch and *swapped_tst<mode>.
Rework signtest-and-branch peepholes for *sbrx_branch<mode>.
(adjust_len) [add_set_ZN, cmp_zext]: New.
(QIPSI): New mode iterator.
(ALLs1, ALLs2, ALLs4, ALLs234): New mode iterators.
(gelt): New code iterator.
(gelt_eqne): New code attribute.
(rvbranch, *rvbranch, difficult_rvbranch, *difficult_rvbranch)
(branch_unspec, *negated_tst<mode>, *reversed_tst<mode>)
(*cmpqi_sign_extend): Remove insns.
(define_c_enum "unspec") [UNSPEC_IDENTITY]: Remove.
* config/avr/avr-dimode.md (cbranch<mode>4): Canonicalize comparisons.
* config/avr/predicates.md (scratch_or_d_register_operand): New.
* config/avr/constraints.md (Yxx): New constraint.
gcc/testsuite/
PR target/109650
Backport from 2023-05-10 master r14-1688.
* gcc.target/avr/torture/pr109650-1.c: New test.
* gcc.target/avr/torture/pr109650-2.c: New test.
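One common way to move a comparison away from GT and LE is to bump the constant, rewriting `x > C` as `x >= C + 1` and `x <= C` as `x < C + 1` when `C + 1` is representable. The C++ sketch below models that idea on 16-bit signed values. It is an assumption about the general technique, not a transcription of avr_canonicalize_comparison, and the names `Code`, `Comparison`, `evaluate`, `canonicalize` and `check_same_result` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

enum class Code { GT, GE, LE, LT };

struct Comparison
{
    Code code;
    std::int16_t constant;
};

// Evaluate `x <code> constant`.
inline bool evaluate(std::int16_t x, const Comparison &c)
{
    switch (c.code)
    {
    case Code::GT: return x > c.constant;
    case Code::GE: return x >= c.constant;
    case Code::LE: return x <= c.constant;
    case Code::LT: return x < c.constant;
    }
    return false;
}

// Rewrite GT as GE and LE as LT by bumping the constant, unless that
// would overflow. Returns the (possibly unchanged) comparison.
inline Comparison canonicalize(Comparison c)
{
    if (c.constant == std::numeric_limits<std::int16_t>::max())
        return c;  // constant + 1 is not representable; leave as is
    if (c.code == Code::GT)
        return {Code::GE, static_cast<std::int16_t>(c.constant + 1)};
    if (c.code == Code::LE)
        return {Code::LT, static_cast<std::int16_t>(c.constant + 1)};
    return c;
}

// The rewrite must not change the outcome for any x.
inline void check_same_result(std::int16_t x, Comparison c)
{
    assert(evaluate(x, c) == evaluate(x, canonicalize(c)));
}
```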
PR106907 has a few warnings spotted by cppcheck. This patch addresses the
duplicate-expression issue among them: the same expression was used twice in a
logical AND (&&) operation, which gives the same result, so the duplicate is removed.
* config/rs6000/rs6000.cc (darwin_rs6000_special_round_type_align):
Make sure that we do not have a cap on field alignment before altering
the struct layout based on the type alignment of the first entry.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/darwin-abi-13-0.c: New test.
* gcc.target/powerpc/darwin-abi-13-1.c: New test.
* gcc.target/powerpc/darwin-abi-13-2.c: New test.
* gcc.target/powerpc/darwin-structs-0.h: New test.
Jakub Jelinek [Fri, 9 Jun 2023 07:10:29 +0000 (09:10 +0200)]
fortran: Fix ICE on pr96024.f90 on big-endian hosts [PR96024]
The pr96024.f90 testcase ICEs on big-endian hosts. The problem is
that length->val.integer is accessed after checking
length->expr_type == EXPR_CONSTANT, but it is a CHARACTER constant
which uses length->val.character union member instead. On big-endian
we end up reading constant 0x100000000, rather than the small number
read on little-endian, and if the target doesn't have enough memory for
4 times that (i.e. a 16GB allocation), it ICEs.
2023-06-09 Jakub Jelinek <jakub@redhat.com>
PR fortran/96024
* primary.cc (gfc_convert_to_structure_constructor): Only do
constant string ctor length verification and truncation/padding
if constant length has INTEGER type.
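The shape of the fix, checking the type tag before touching a union member, can be sketched in C++. `Constant`, `BasicType` and `integer_length` are hypothetical simplifications and do not reproduce the gfortran expression structures.

```cpp
#include <cassert>

enum class BasicType { Integer, Character };

// Hypothetical, simplified constant expression: which union member is
// valid depends on `type`.
struct Constant
{
    BasicType type;
    union
    {
        long integer;
        const char *character;
    } val;
};

// Fetch the length only if the constant really holds an integer.
inline bool integer_length(const Constant &c, long &out)
{
    if (c.type != BasicType::Integer)
        return false;  // val.integer is not the active member
    out = c.val.integer;
    return true;
}

inline void integer_length_demo()
{
    Constant i;
    i.type = BasicType::Integer;
    i.val.integer = 5;
    long n = 0;
    assert(integer_length(i, n) && n == 5);

    Constant s;
    s.type = BasicType::Character;
    s.val.character = "abc";
    n = -1;
    assert(!integer_length(s, n) && n == -1);  // out is left untouched
}
```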
liuhongt [Mon, 5 Jun 2023 04:38:41 +0000 (12:38 +0800)]
Explicitly view_convert_expr mask to signed type when folding pblendvb builtins.
For vector char with -funsigned-char, mask < 0 is always false,
but vpblendvb needs to check the most significant bit. The patch
explicitly VCEs the mask to vector signed char.
gcc/ChangeLog:
PR target/110108
* config/i386/i386.cc (ix86_gimple_fold_builtin): Explicitly
view_convert_expr mask to signed type when folding pblendvb
builtins.
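A scalar C++ model of one byte lane shows why the signed view matters. The function names are hypothetical. The conversion to `std::int8_t` is modular, which is guaranteed since C++20 and is what GCC does in earlier modes.

```cpp
#include <cassert>
#include <cstdint>

// With an unsigned element type the sign test can never succeed.
inline bool msb_via_unsigned_compare(std::uint8_t m)
{
    return static_cast<int>(m) < 0;  // m is 0..255, so always false
}

// Viewing the same bits as signed makes "< 0" test the top bit.
inline bool msb_via_signed_view(std::uint8_t m)
{
    return static_cast<std::int8_t>(m) < 0;
}

// Scalar model of one byte lane of a variable blend: pick `b` when the
// mask's most significant bit is set, otherwise `a`.
inline std::uint8_t blend_lane(std::uint8_t a, std::uint8_t b, std::uint8_t mask)
{
    return msb_via_signed_view(mask) ? b : a;
}

inline void blend_lane_demo()
{
    assert(!msb_via_unsigned_compare(0x80));
    assert(msb_via_signed_view(0x80));
    assert(blend_lane(1, 2, 0x80) == 2);
    assert(blend_lane(1, 2, 0x7F) == 1);
}
```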
where M1 and M2 are of equal mode size. That is problematic for the splitter
vfp.md:no_literal_pool_df_immediate in the arm backend, which tries to pun an
lvalue DFmode pseudo into DImode and assign a constant to it with
emit_move_insn, as the new transformation simply undoes this, and we end up
splitting indefinitely.
This patch changes things around in the arm backend so that we use a
DImode temporary (instead of DFmode) and first load the DImode constant
into the pseudo, and then pun the pseudo into DFmode as an rvalue in a
reg -> reg move. I believe this should be semantically equivalent but
avoids the pathological behaviour seen in the PR.
gcc/ChangeLog:
PR target/109800
* config/arm/arm.md (movdf): Generate temporary pseudo in DImode
instead of DFmode.
* config/arm/vfp.md (no_literal_pool_df_immediate): Rather than punning an
lvalue DFmode pseudo into DImode, use a DImode pseudo and pun it into
DFmode as an rvalue.
gcc/testsuite/ChangeLog:
PR target/109800
* gcc.target/arm/pure-code/pr109800.c: New test.
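A source-level analogue of the new ordering is to materialise the constant as a 64-bit integer first and only then reinterpret the bits as a double. The C++ sketch below assumes an IEEE 754 binary64 `double`; the helper name `double_from_bits` is hypothetical and this is not the RTL the splitter emits.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit double");
static_assert(std::numeric_limits<double>::is_iec559,
              "sketch assumes IEEE 754 doubles");

// Load the integer constant first, then view its bits as a double.
inline double double_from_bits(std::uint64_t bits)
{
    double d = 0.0;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

inline void double_from_bits_demo()
{
    assert(double_from_bits(0x3FF0000000000000ULL) == 1.0);
    assert(double_from_bits(0x4000000000000000ULL) == 2.0);
}
```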
Kyrylo Tkachov [Wed, 24 May 2023 08:33:04 +0000 (09:33 +0100)]
arm: PR target/109939 Correct signedness of return type of __ssat intrinsics
As the PR says we shouldn't be using qualifier_unsigned for the return type of the __ssat intrinsics.
UNSIGNED_SAT_BINOP_UNSIGNED_IMM_QUALIFIERS already exists for that.
This was just a thinko.
This patch fixes this and the warning with -Wconversion goes away.
Bootstrapped and tested on arm-none-linux-gnueabihf.
gcc/ChangeLog:
PR target/109939
* config/arm/arm-builtins.cc (SAT_BINOP_UNSIGNED_IMM_QUALIFIERS): Use
qualifier_none for the return operand.
gcc/testsuite/ChangeLog:
PR target/109939
* gcc.target/arm/pr109939.c: New test.
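A portable C++ model of signed saturation makes the point about the return type: the result can be negative, so it has to be signed. `ssat_model` is a hypothetical name and is not the ACLE intrinsic itself.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Saturate x to the signed range representable in `bits` bits,
// i.e. [-2^(bits-1), 2^(bits-1) - 1].
inline std::int32_t ssat_model(std::int32_t x, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    const std::int64_t hi = (std::int64_t{1} << (bits - 1)) - 1;
    const std::int64_t lo = -hi - 1;
    if (x > hi)
        return static_cast<std::int32_t>(hi);
    if (x < lo)
        return static_cast<std::int32_t>(lo);
    return x;
}

// The result type is signed, so no sign-changing conversion is needed
// when the value is assigned to a signed variable.
static_assert(std::is_signed<decltype(ssat_model(0, 8))>::value,
              "saturated result must be signed");
```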
target/110088: Improve operation of l-reg with const after move from d-reg.
After reload, there may be sequences like
lreg = dreg
lreg = lreg <op> const
with an LD_REGS dreg, non-LD_REGS lreg, and <op> in PLUS, IOR, AND.
If dreg dies after the first insn, it is possible to use
dreg = dreg <op> const
lreg = dreg
instead which is more efficient.
gcc/
PR target/110088
* config/avr/avr.md: Add an RTL peephole to optimize operations on
non-LD_REGS after a move from LD_REGS.
(piaop): New code iterator.
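The two sequences can be checked against each other with a small C++ model of 8-bit registers. `Regs`, `Op`, `apply_op` and the two sequence functions are hypothetical names; the sketch only demonstrates that lreg ends up with the same value either way, provided the final value of dreg is not needed.

```cpp
#include <cassert>
#include <cstdint>

enum class Op { Plus, Ior, And };

inline std::uint8_t apply_op(Op op, std::uint8_t a, std::uint8_t c)
{
    switch (op)
    {
    case Op::Plus: return static_cast<std::uint8_t>(a + c);
    case Op::Ior:  return static_cast<std::uint8_t>(a | c);
    case Op::And:  return static_cast<std::uint8_t>(a & c);
    }
    return a;
}

struct Regs
{
    std::uint8_t dreg;
    std::uint8_t lreg;
};

// lreg = dreg; lreg = lreg <op> const
inline Regs original_sequence(Regs r, Op op, std::uint8_t c)
{
    r.lreg = r.dreg;
    r.lreg = apply_op(op, r.lreg, c);
    return r;
}

// dreg = dreg <op> const; lreg = dreg  (valid only if dreg is dead after)
inline Regs rewritten_sequence(Regs r, Op op, std::uint8_t c)
{
    r.dreg = apply_op(op, r.dreg, c);
    r.lreg = r.dreg;
    return r;
}

inline void peephole_demo()
{
    Regs r{0x12, 0};
    assert(original_sequence(r, Op::Plus, 0x34).lreg
           == rewritten_sequence(r, Op::Plus, 0x34).lreg);
}
```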
Alexandre Oliva [Tue, 30 May 2023 21:46:26 +0000 (18:46 -0300)]
[libstdc++] [testsuite] xfail double-prec from_chars for x86_64 ldbl
When long double is wider than double, but from_chars is implemented
in terms of double, tests that involve the full precision of long
double are expected to fail. Mark them as such on x86_64-*-vxworks*.
for libstdc++-v3/ChangeLog
* testsuite/20_util/from_chars/4.cc: Skip long double test06
on x86_64-vxworks.
* testsuite/20_util/to_chars/long_double.cc: Xfail run on
x86_64-vxworks.
Alexandre Oliva [Fri, 5 May 2023 11:34:23 +0000 (08:34 -0300)]
[libstdc++] [testsuite] xfail double-prec from_chars for ldbl
When long double is wider than double, but from_chars is implemented
in terms of double, tests that involve the full precision of long
double are expected to fail. Mark them as such on aarch64-*-vxworks.
for libstdc++-v3/ChangeLog
* testsuite/20_util/from_chars/4.cc: Skip long double test06
on aarch64-vxworks.
* testsuite/20_util/to_chars/long_double.cc: Xfail run on
aarch64-vxworks.
PR libstdc++/109822
* include/experimental/bits/simd.h (to_native): Use int NTTP
as specified in PTS2.
(to_compatible): Likewise. Add missing tag to call mask
generator ctor.
* testsuite/experimental/simd/pr109822_cast_functions.cc: New
test.
* testsuite/experimental/simd/tests/operator_cvt.cc: Make long
double <-> (u)long conversion tests conditional on sizeof(long
double) and sizeof(long).
Christophe Lyon [Tue, 23 May 2023 14:30:53 +0000 (14:30 +0000)]
testsuite: make mve_intrinsic_type_overloads-int.c libc-agnostic
Glibc defines int32_t as 'int' while newlib defines it as 'long int'.
Although these correspond to the same size, g++ complains when using the
'wrong' version:
invalid conversion from 'long int*' to 'int32_t*' {aka 'int*'} [-fpermissive]
or
invalid conversion from 'int*' to 'int32_t*' {aka 'long int*'} [-fpermissive]
when calling vst1q(int32*, int32x4_t) with a first parameter of type
'long int *' (resp. 'int *')
To make this test pass with any type of toolchain, this patch defines
'word_type' according to which libc is in use.
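The underlying issue is that int and long are distinct types even when they have the same size, so pointers to them do not convert implicitly. The C++ sketch below shows that, and shows one libc-agnostic way to name the pointee. Note the assumption: the patch picks 'word_type' per libc, whereas this sketch simply aliases whatever `std::int32_t` is; `store_word` and `roundtrip` are hypothetical helpers standing in for an intrinsic that takes an `int32_t *`.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Same size or not, these are different types to the C++ type system.
static_assert(!std::is_same<int, long>::value, "int and long are distinct");
static_assert(!std::is_convertible<long *, int *>::value,
              "no implicit conversion between the pointer types");

// Alias whichever type the C library chose for int32_t.
using word_type = std::int32_t;

// Stand-in for a function that takes int32_t *.
inline void store_word(std::int32_t *dst, std::int32_t value)
{
    *dst = value;
}

inline std::int32_t roundtrip(std::int32_t v)
{
    word_type slot = 0;
    store_word(&slot, v);  // always the right pointer type
    return slot;
}

inline void word_type_demo()
{
    assert(roundtrip(42) == 42);
}
```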
Georg-Johann Lay [Tue, 23 May 2023 12:54:12 +0000 (14:54 +0200)]
target/104327: Allow more inlining between different optimization levels.
avr-common.cc introduces the following options that are set depending
on optimization level: -mgas-isr-prologues, -mmain-is-OS-task and
-fsplit-wide-types-early. The inliner thinks that different options
disallow cross-optimization inlining, so provide can_inline_p.
gcc/
PR target/104327
* config/avr/avr.cc (avr_can_inline_p): New static function.
(TARGET_CAN_INLINE_P): Define to that function.
Georg-Johann Lay [Thu, 25 May 2023 17:02:34 +0000 (19:02 +0200)]
target/82931: Make a pattern more generic to match more bit-transfers.
There is already a pattern in avr.md that matches single-bit transfers
from one register to another one, but it only handled bit 0 of 8-bit
registers. This change makes that pattern more generic so it matches
more of similar single-bit transfers.
gcc/
PR target/82931
* config/avr/avr.md (*movbitqi.0): Rename to *movbit<mode>.0-6.
Handle any bit position and use mode QISI.
* config/avr/avr.cc (avr_rtx_costs_1) [IOR]: Return a cost
of 2 insns for bit-transfer of respective style.
gcc/testsuite/
PR target/82931
* gcc.target/avr/pr82931.c: New test.
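For reference, a single-bit transfer of the kind the generalized pattern is meant to match can be written in C++ as follows. `transfer_bit` is a hypothetical helper; whether a given source form is actually matched by the insn pattern depends on the RTL the compiler produces.

```cpp
#include <cassert>
#include <cstdint>

// Copy bit `src_bit` of `src` into bit `dst_bit` of `dst`, leaving all
// other bits of `dst` unchanged.
inline std::uint32_t transfer_bit(std::uint32_t dst, unsigned dst_bit,
                                  std::uint32_t src, unsigned src_bit)
{
    assert(dst_bit < 32 && src_bit < 32);
    const std::uint32_t bit = (src >> src_bit) & 1u;
    return (dst & ~(std::uint32_t{1} << dst_bit)) | (bit << dst_bit);
}
```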
PR libstdc++/109949
* include/experimental/bits/simd.h (__intrinsic_type): If
__ALTIVEC__ is defined, map gnu::vector_size types to their
corresponding __vector T types without losing unsignedness of
integer types. Also prefer long long over long.
* include/experimental/bits/simd_ppc.h (_S_popcount): Cast mask
object to the expected unsigned vector type.
PR libstdc++/109261
* include/experimental/bits/simd.h (__intrinsic_type):
Specialize __intrinsic_type<double, 8> and
__intrinsic_type<double, 16> in any case, but provide the member
type only with __aarch64__.
PR libstdc++/109261
* include/experimental/bits/simd_neon.h (_S_reduce): Add
constexpr and make NEON implementation conditional on
not __builtin_is_constant_evaluated.
Georg-Johann Lay [Tue, 23 May 2023 16:49:19 +0000 (18:49 +0200)]
Improve cost computation for single-bit bit insertions.
Some miscomputation of rtx_costs led to sub-optimal code for
single-bit bit insertions. This patch implements TARGET_INSN_COST,
which has a chance to see the whole insn during insn combination;
in particular the SET_DEST of (set (zero_extract (...) ...)).
gcc/
* config/avr/avr.cc (avr_insn_cost): New static function.
(TARGET_INSN_COST): Define to that function.
Matthias Kretz [Thu, 23 Mar 2023 08:32:58 +0000 (09:32 +0100)]
libstdc++: Add missing constexpr to simd
The constexpr API is only available with -std=gnu++XX (and proposed for
C++26). The proposal is to have the complete simd API usable in constant
expressions.
This patch resolves several issues with using simd in constant
expressions.
Issues why constant_evaluated branches are necessary:
* subscripting vector builtins is not allowed in constant expressions
* if the implementation needs/uses memcpy
* if the implementation would otherwise call SIMD intrinsics/builtins
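The second bullet can be sketched in C++ with `__builtin_is_constant_evaluated`, a GCC and Clang builtin. `Vec4` and `first_lane` are hypothetical names and the sketch is far simpler than the simd implementation; it only shows a function that takes a constexpr-friendly path during constant evaluation and a memcpy-based path at run time.

```cpp
#include <cassert>
#include <cstring>

struct Vec4
{
    int lane[4];
};

constexpr int first_lane(const Vec4 &v)
{
    if (__builtin_is_constant_evaluated())
        return v.lane[0];  // plain element access works in constant expressions
    int out = 0;
    std::memcpy(&out, v.lane, sizeof out);  // run-time path; memcpy is not constexpr
    return out;
}

constexpr Vec4 kVec = {{7, 1, 2, 3}};
static_assert(first_lane(kVec) == 7, "compile-time path");

inline void first_lane_demo()
{
    Vec4 v = {{9, 8, 7, 6}};
    assert(first_lane(v) == 9);
}
```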
Matthias Kretz [Thu, 23 Feb 2023 13:55:08 +0000 (14:55 +0100)]
libstdc++: Fix simd compilation with Clang
Clang fails to compile some constant expressions involving simd.
Therefore, just disable this non-conforming extension for clang.
Fix AVX512 blend implementation for Clang. It was converting the bitmask
to bool before, which is obviously wrong. Instead use a Clang builtin to
convert the bitmask to vector-mask before using a vector blend ?:. A
similar change is required for the masked unary implementation, because
the GCC builtins do not exist on Clang.
* include/experimental/bits/simd_detail.h: Don't declare the
simd API as constexpr with Clang.
* include/experimental/bits/simd_x86.h (__movm): New.
(_S_blend_avx512): Resolve FIXME. Implement blend using __movm
and ?:.
(_SimdImplX86::_S_masked_unary): Clang does not implement the
same builtins. Implement the function using __movm, ?:, and -
operators on vector_size types instead.
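A scalar C++ model of the blend fix: each bit of the bitmask is first expanded into an all-ones or all-zeros lane, and the lanes then drive a per-lane `?:`. The names `Lanes`, `mask_from_bits`, `blend` and `blend_with_bool` are hypothetical; this is not the Clang builtin or the libstdc++ `__movm` helper.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Lanes = std::array<std::int32_t, 8>;

// Bit i of k becomes an all-ones (-1) or all-zeros lane.
inline Lanes mask_from_bits(std::uint8_t k)
{
    Lanes m{};
    for (unsigned i = 0; i < m.size(); ++i)
        m[i] = ((k >> i) & 1u) ? -1 : 0;
    return m;
}

// Lane-wise m ? b : a.
inline Lanes blend(const Lanes &m, const Lanes &a, const Lanes &b)
{
    Lanes r{};
    for (unsigned i = 0; i < r.size(); ++i)
        r[i] = m[i] ? b[i] : a[i];
    return r;
}

// The mistake described above: collapsing the bitmask to a single bool
// selects all of b or all of a, never a per-lane mix.
inline Lanes blend_with_bool(std::uint8_t k, const Lanes &a, const Lanes &b)
{
    return k ? b : a;
}

inline void blend_demo()
{
    Lanes a{};
    Lanes b;
    b.fill(1);
    const Lanes r = blend(mask_from_bits(0x05), a, b);
    assert(r[0] == 1 && r[1] == 0 && r[2] == 1 && r[3] == 0);
    assert(blend_with_bool(0x05, a, b) == b);  // every lane taken from b
}
```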
Matthias Kretz [Mon, 20 Feb 2023 16:49:37 +0000 (17:49 +0100)]
libstdc++: Always-inline most of non-cmath fixed_size implementation
For simd, the inlining behavior should be similar to builtin types. (No
operator on builtin types is ever translated into a function call.)
Therefore, always_inline is the right choice (i.e. inline on -O0 as
well).
PR libstdc++/108856
* include/experimental/bits/simd_builtin.h
(_SimdImplBuiltin::_S_masked_unary): More efficient
implementation of masked inc-/decrement for integers and floats
without AVX2.
* include/experimental/bits/simd_x86.h
(_SimdImplX86::_S_masked_unary): New. Use AVX512 masked subtract
builtins for masked inc-/decrement.
Matthias Kretz [Sat, 14 Jan 2023 16:07:59 +0000 (17:07 +0100)]
libstdc++: Annotate most lambdas with always_inline
All of the annotated lambdas are simply a necessary means for
implementing these functions and should never result in an actual
function call. Many of these lambdas would go away if C++ had better
language support for packs.
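A minimal sketch of an annotated lambda, using the GNU attribute placement after the parameter list. The macro `SKETCH_ALWAYS_INLINE_LAMBDA` and both function names are hypothetical and chosen for this example only.

```cpp
#include <cassert>

#if defined(__GNUC__)
#define SKETCH_ALWAYS_INLINE_LAMBDA __attribute__((__always_inline__))
#else
#define SKETCH_ALWAYS_INLINE_LAMBDA
#endif

template <typename F>
inline int apply_to_each_and_sum(const int (&values)[4], F f)
{
    int sum = 0;
    for (int v : values)
        sum += f(v);
    return sum;
}

inline int sum_of_squares(const int (&values)[4])
{
    // The lambda is only a building block; it should never show up as a call.
    return apply_to_each_and_sum(
        values, [](int v) SKETCH_ALWAYS_INLINE_LAMBDA { return v * v; });
}

inline void sum_of_squares_demo()
{
    const int values[4] = {1, 2, 3, 4};
    assert(sum_of_squares(values) == 30);
}
```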
Michael Meissner [Mon, 22 May 2023 15:08:13 +0000 (11:08 -0400)]
Do not generate vmaddfp and vnmsubfp
This is version 3 of the patch. This is essentially version 1 with the removal
of changes to altivec.md, and cleanup of the comments.
Version 2 generated the vmaddfp and vnmsubfp instructions if -Ofast was used,
and those changes are deleted in this patch.
The Altivec instructions vmaddfp and vnmsubfp have different rounding behaviors
than the VSX xvmaddsp and xvnmsubsp instructions. In particular, generating
these instructions seems to break Eigen on big endian systems.
I have done bootstrap builds on power9 little endian (with both IEEE long
double and IBM long double). I have also done the builds and test on a power8
big endian system (testing both 32-bit and 64-bit code generation). Chip has
verified that it fixes the problem that Eigen encountered. Can I check this
into the master GCC branch? After a burn-in period, can I check this patch
into the active GCC branches?
Thanks in advance.
2023-05-22 Michael Meissner <meissner@linux.ibm.com>
gcc/
PR target/70243
* config/rs6000/vsx.md (vsx_fmav4sf4): Do not generate vmaddfp.
(vsx_nfmsv4sf4): Do not generate vnmsubfp. Back port from master
04/10/2023 change.
gcc/testsuite/
PR target/70243
* gcc.target/powerpc/pr70243.c: New test. Back port from master
04/10/2023 change.
Jakub Jelinek [Sun, 21 May 2023 11:36:56 +0000 (13:36 +0200)]
match.pd: Ensure (op CONSTANT_CLASS_P CONSTANT_CLASS_P) is simplified [PR109505]
On the following testcase we hang, because POLY_INT_CST is CONSTANT_CLASS_P,
but BIT_AND_EXPR with it and INTEGER_CST doesn't simplify and the
(x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2)
simplification actually relies on the (CST1 & CST2) simplification,
otherwise it is a deoptimization, trading 2 ops for 3 and furthermore
running into
/* Given a bit-wise operation CODE applied to ARG0 and ARG1, see if both
operands are another bit-wise operation with a common input. If so,
distribute the bit operations to save an operation and possibly two if
constants are involved. For example, convert
(A | B) & (A | C) into A | (B & C)
Further simplification will occur if B and C are constants. */
simplification which simplifies that
(x & CST2) | (CST1 & CST2) back to
CST2 & (x | CST1).
I went through all other places I could find where we have a simplification
with 2 CONSTANT_CLASS_P operands and perform some operation on those two,
while the other spots aren't that severe (just trade 2 operations for
another 2 if the two constants don't simplify, rather than as in the above
case trading 2 ops for 3), I still think all those spots really intend
to optimize only if the 2 constants simplify.
So, the following patch adds to those a ! modifier to ensure that,
even at GENERIC that modifier means !EXPR_P which is exactly what we want
IMHO.
2023-05-21 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109505
* match.pd ((x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2),
Combine successive equal operations with constants,
(A +- CST1) +- CST2 -> A + CST3, (CST1 - A) +- CST2 -> CST3 - A,
CST1 - (CST2 - A) -> CST3 + A): Use ! on ops with 2 CONSTANT_CLASS_P
operands.
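The identity behind the first listed simplification can be checked in C++. The function names are hypothetical; the sketch shows that the distributed form is equivalent, and that it saves work only when `cst1 & cst2` folds to a single constant.

```cpp
#include <cassert>
#include <cstdint>

// (x | CST1) & CST2: two operations.
inline std::uint32_t original_form(std::uint32_t x, std::uint32_t cst1,
                                   std::uint32_t cst2)
{
    return (x | cst1) & cst2;
}

// (x & CST2) | (CST1 & CST2): three operations, unless the second
// operand of | folds to one constant, in which case it is two.
inline std::uint32_t distributed_form(std::uint32_t x, std::uint32_t cst1,
                                      std::uint32_t cst2)
{
    const std::uint32_t folded = cst1 & cst2;
    return (x & cst2) | folded;
}

inline void distribution_demo()
{
    assert(original_form(0x1234u, 0x00F0u, 0x0FF0u)
           == distributed_form(0x1234u, 0x00F0u, 0x0FF0u));
}
```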
Kewen Lin [Wed, 17 May 2023 07:48:40 +0000 (02:48 -0500)]
vect: Don't retry if the previous analysis fails
When working on a cost tweaking patch, I found that a newly
added test case has different dumps with stage-1 and
bootstrapped gcc. Looking into it, the apparent reason
is that vect_analyze_loop_2 doesn't set slp_done_for_suggested_uf
as expected, so the following retry uses the garbage
slp_done_for_suggested_uf instead. In fact,
slp_done_for_suggested_uf is only set when the previous
analysis succeeds; for the mentioned test case the previous
analysis fails, so the value of slp_done_for_suggested_uf
should not be used any more.
In function vect_analyze_loop_1, we only return success when
res is true, which is the result of the 1st analysis. It means
we never try to vectorize with unroll_vinfo if the previous
analysis fails. So this patch shouldn't break anything, and
just stops some useless analysis early.
gcc/ChangeLog:
* tree-vect-loop.cc (vect_analyze_loop_1): Don't retry analysis with
suggested unroll factor once the previous analysis fails.
* config.host: Arrange to set min Darwin OS versions from
the configured host version.
* config/darwin10-unwind-find-enc-func.c: Do not use current
headers, but declare the necessary structures locally to the
versions in use for Mac OSX 10.6.
* config/t-darwin: Amend to handle configured min OS
versions.
* config/t-darwin-min-1: New.
* config/t-darwin-min-5: New.
* config/t-darwin-min-8: New.