Roger Sayle [Mon, 13 Nov 2023 09:16:59 +0000 (09:16 +0000)]
ARC: Improved DImode rotates and right shifts by one bit.
This patch improves the code generated for DImode right shifts (both
arithmetic and logical) by a single bit, and also for DImode rotates
(both left and right) by a single bit. In approach, this is similar
to the recently added DImode left shift by a single bit patch, but
also builds upon the x86's UNSPEC carry flag representation:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632169.html
The benefits can be seen from the four new test cases:
On CPUs without a barrel shifter the improvements are even better.
2023-11-13 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/arc/arc.md (UNSPEC_ARC_CC_NEZ): New UNSPEC that
represents the carry flag being set if the operand is non-zero.
(adc_f): New define_insn representing adc with updated flags.
(ashrdi3): New define_expand that only handles shifts by 1.
(ashrdi3_cnt1): New pre-reload define_insn_and_split.
(lshrdi3): New define_expand that only handles shifts by 1.
(lshrdi3_cnt1): New pre-reload define_insn_and_split.
(rrcsi2): New define_insn for rrc (SImode rotate right through carry).
(rrcsi2_carry): Likewise for rrc.f, as above but updating flags.
(rotldi3): New define_expand that only handles rotates by 1.
(rotldi3_cnt1): New pre-reload define_insn_and_split.
(rotrdi3): New define_expand that only handles rotates by 1.
(rotrdi3_cnt1): New pre-reload define_insn_and_split.
(lshrsi3_cnt1_carry): New define_insn for lsr.f.
(ashrsi3_cnt1_carry): New define_insn for asr.f.
(btst_0_carry): New define_insn for asr.f without result.
gcc/testsuite/ChangeLog
* gcc.target/arc/ashrdi3-1.c: New test case.
* gcc.target/arc/lshrdi3-1.c: Likewise.
* gcc.target/arc/rotldi3-1.c: Likewise.
* gcc.target/arc/rotrdi3-1.c: Likewise.
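The carry-based approach described above can be modelled in portable C. This is only a sketch, assuming a 64-bit value is held in two 32-bit words; the names di_pair, rrc32, lshrdi_1, ashrdi_1 and rotrdi_1 are hypothetical and are not the patterns in arc.md. Each function shifts one word while capturing the bit that falls out (as lsr.f/asr.f would), then rotates the other word through that carry (as rrc would). The left rotate is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a DImode value held in two SImode registers.  */
typedef struct { uint32_t lo, hi; } di_pair;

/* Models a rotate right through carry: *carry is the carry-in on entry
   and receives the bit shifted out of X on exit.  */
static uint32_t rrc32 (uint32_t x, unsigned *carry)
{
  uint32_t r = (x >> 1) | ((uint32_t) *carry << 31);
  *carry = x & 1u;
  return r;
}

/* Logical right shift by one: shift the high word, keeping its bit 0 as
   the carry, then rotate the low word through the carry.  */
static di_pair lshrdi_1 (di_pair x)
{
  unsigned carry = x.hi & 1u;
  di_pair r;
  r.hi = x.hi >> 1;
  r.lo = rrc32 (x.lo, &carry);
  return r;
}

/* Arithmetic right shift by one: as above, but the sign bit of the
   high word is preserved.  */
static di_pair ashrdi_1 (di_pair x)
{
  unsigned carry = x.hi & 1u;
  di_pair r;
  r.hi = (x.hi >> 1) | (x.hi & 0x80000000u);
  r.lo = rrc32 (x.lo, &carry);
  return r;
}

/* Rotate right by one: bit 0 of the low word is the carry-in for the
   high word, whose bit 0 in turn feeds the low word.  */
static di_pair rotrdi_1 (di_pair x)
{
  unsigned carry = x.lo & 1u;
  di_pair r;
  r.hi = rrc32 (x.hi, &carry);
  r.lo = rrc32 (x.lo, &carry);
  return r;
}

static di_pair di_split (uint64_t v)
{
  di_pair p = { (uint32_t) v, (uint32_t) (v >> 32) };
  return p;
}

static uint64_t di_join (di_pair x)
{
  return ((uint64_t) x.hi << 32) | x.lo;
}

/* Usage: compare the two-word model with plain 64-bit arithmetic.  */
static void demo (void)
{
  uint64_t v = UINT64_C (0x8000000100000003);
  assert (di_join (lshrdi_1 (di_split (v))) == (v >> 1));
  assert (di_join (rotrdi_1 (di_split (v))) == ((v >> 1) | (v << 63)));
}
```

In this model every operation costs two word-sized steps, which is the shape the patch aims for instead of a generic double-word shift sequence.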
Roger Sayle [Mon, 13 Nov 2023 09:11:42 +0000 (09:11 +0000)]
ARC: Provide a TARGET_FOLD_BUILTIN target hook.
This patch implements an arc_fold_builtin target hook to allow ARC
builtins to be folded at the tree-level. Currently this function
converts __builtin_arc_swap into a LROTATE_EXPR at the tree-level,
and evaluates __builtin_arc_norm and __builtin_arc_normw of integer
constant arguments at compile-time. Because ARC_BUILTIN_SWAP is
now handled at the tree-level, UNSPEC_ARC_SWAP is no longer used,
allowing it and the "swap" define_insn to be removed.
An example benefit of folding things at compile-time is that
calling __builtin_arc_swap on the result of __builtin_arc_swap
now eliminates both and generates no code, and likewise calling
__builtin_arc_swap of a constant integer argument is evaluated
at compile-time.
2023-11-13 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/arc/arc.cc (TARGET_FOLD_BUILTIN): Define to
arc_fold_builtin.
(arc_fold_builtin): New function. Convert ARC_BUILTIN_SWAP
into a rotate. Evaluate ARC_BUILTIN_NORM and
ARC_BUILTIN_NORMW of constant arguments.
* config/arc/arc.md (UNSPEC_ARC_SWAP): Delete.
(normw): Make output template/assembler whitespace consistent.
(swap): Remove define_insn, only use of SWAP UNSPEC.
* config/arc/builtins.def: Tweak indentation.
(SWAP): Expand using rotlsi2_cnt16 instead of using swap.
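Why folding the swap into a rotate helps can be seen from a plain C model. This is a sketch only: swap_halves is a hypothetical stand-in for __builtin_arc_swap, written as a rotate left by 16 bits as the message describes. Two 16-bit rotates compose to a 32-bit rotate, i.e. the identity, which is why a swap of a swap can be eliminated.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the swap builtin: rotate a 32-bit word left by
   16 bits, which exchanges its two 16-bit halves.  */
static uint32_t swap_halves (uint32_t x)
{
  return (x << 16) | (x >> 16);
}

/* Usage: applying the swap twice gives back the original value.  */
static void demo (void)
{
  assert (swap_halves (0x12345678u) == 0x56781234u);
  assert (swap_halves (swap_halves (0xdeadbeefu)) == 0xdeadbeefu);
}
```

Once the operation is expressed as an ordinary rotate, generic simplifications of rotates apply to it without any ARC-specific knowledge.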
Roger Sayle [Mon, 13 Nov 2023 09:05:16 +0000 (09:05 +0000)]
i386: Improve reg pressure of double word right shift then truncate.
This patch improves register pressure during reload, inspired by PR 97756.
Normally, a double-word right-shift by a constant produces a double-word
result, the highpart of which is dead when followed by a truncation.
The dead code calculating the high part gets cleaned up post-reload, so
the issue isn't normally visible, except for the increased register
pressure during reload, sometimes leading to odd register assignments.
Providing a post-reload splitter, which clobbers a single wordmode
result register instead of a doubleword result register, helps (a bit).
An example demonstrating this effect is:
#define MASK60 ((1ul << 60) - 1)  /* 1152921504606846975, as loaded below */
unsigned long foo (__uint128_t n)
{
  unsigned long a = n & MASK60;
  unsigned long b = (n >> 60);
  b = b & MASK60;
  unsigned long c = (n >> 120);
  return a+b+c;
}
With this patch, we generate one fewer mov (12 instructions):
foo: movabsq $1152921504606846975, %rcx
xchgq %rdi, %rsi
movq %rdi, %rdx
movq %rsi, %rax
movq %rdi, %rsi
shrdq $60, %rdi, %rdx
andq %rcx, %rax
shrq $56, %rsi
addq %rsi, %rax
andq %rcx, %rdx
addq %rdx, %rax
ret
The significant difference is easier to see via diff:
< shrdq $60, %rdi, %rax
< movq %rax, %rdx
---
> shrdq $60, %rdi, %rdx
Admittedly a single "mov" isn't much of a saving on modern architectures,
but as demonstrated by the PR, people still track the number of them.
2023-11-13 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.md (<insn><dwi>3_doubleword_lowpart): New
define_insn_and_split to optimize register usage of doubleword
right shifts followed by truncation.
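The observation behind the splitter can be written out in C on two explicit 64-bit halves. This is a sketch under assumptions: shr128_lowpart and foo_model are hypothetical names, and the 128-bit argument of foo is passed as separate lo and hi words. Only the low word of the shifted value is computed, which is what a single shrd produces; the high word of the double-word result is never materialised.

```c
#include <assert.h>
#include <stdint.h>

/* Low 64 bits of ((hi:lo) >> n), valid for 0 < n < 64.  */
static uint64_t shr128_lowpart (uint64_t lo, uint64_t hi, unsigned n)
{
  return (lo >> n) | (hi << (64 - n));
}

/* Model of foo from the message, operating on the two halves of n.  */
static uint64_t foo_model (uint64_t lo, uint64_t hi)
{
  const uint64_t mask60 = (UINT64_C (1) << 60) - 1;
  uint64_t a = lo & mask60;
  uint64_t b = shr128_lowpart (lo, hi, 60) & mask60;
  uint64_t c = hi >> 56;		/* n >> 120 */
  return a + b + c;
}

/* Usage: the top four bits of lo and the low bit of hi end up in b.  */
static void demo (void)
{
  assert (shr128_lowpart (UINT64_C (0xF000000000000000), 1, 60) == 0x1F);
  assert (foo_model (UINT64_C (0xF000000000000000), 1) == 0x1F);
}
```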
Jakub Jelinek [Mon, 13 Nov 2023 08:49:09 +0000 (09:49 +0100)]
i386: Remove j constraint letter from list of unused letters
I've noticed the list of unused letters still lists j, even when that
constraint letter is now the first letter of jr, jR, jm, j<, j>, jo, jV, jp,
ja, jb and jc constraints.
2023-11-13 Jakub Jelinek <jakub@redhat.com>
* config/i386/constraints.md: Remove j constraint letter from list of
unused letters.
Florian Weimer [Mon, 13 Nov 2023 07:54:11 +0000 (08:54 +0100)]
C99 testsuite readiness: Cleanup of execute tests
This change updates the gcc.c-torture/execute/ tests to avoid obsolete
language constructs. In the changed tests, use of the features
appears to be accidental, and updating allows the tests to run with
the default compiler flags.
gcc/testsuite/
* gcc.c-torture/execute/20000112-1.c (main): Add missing
int and void types.
* gcc.c-torture/execute/20000113-1.c (foobar): Add missing
void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/20000314-2.c (main): Likewise.
* gcc.c-torture/execute/20000402-1.c (main): Likewise.
* gcc.c-torture/execute/20000403-1.c (main): Likewise.
* gcc.c-torture/execute/20000503-1.c (main): Likewise.
* gcc.c-torture/execute/20000605-2.c (main): Likewise.
* gcc.c-torture/execute/20000717-1.c (main): Likewise.
* gcc.c-torture/execute/20000717-5.c (main): Likewise.
* gcc.c-torture/execute/20000726-1.c (main): Likewise.
* gcc.c-torture/execute/20000914-1.c (blah): Add missing
void types.
(main): Add missing int and void types.
* gcc.c-torture/execute/20001009-1.c (main): Likewise.
* gcc.c-torture/execute/20001013-1.c (main): Likewise.
* gcc.c-torture/execute/20001031-1.c (main): Likewise.
* gcc.c-torture/execute/20010221-1.c (main): Likewise.
* gcc.c-torture/execute/20010723-1.c (main): Likewise.
* gcc.c-torture/execute/20010915-1.c (s): Call
__builtin_strcmp instead of strcmp.
* gcc.c-torture/execute/20010924-1.c (main): Add missing
int and void types.
* gcc.c-torture/execute/20011128-1.c (main): Likewise.
* gcc.c-torture/execute/20020226-1.c (main): Likewise.
* gcc.c-torture/execute/20020328-1.c (foo): Add missing
void types.
* gcc.c-torture/execute/20020406-1.c (DUPFFexgcd): Call
__builtin_printf instead of printf.
(main): Likewise.
* gcc.c-torture/execute/20020508-1.c (main): Add missing
int and void types.
* gcc.c-torture/execute/20020508-2.c (main): Likewise.
* gcc.c-torture/execute/20020508-3.c (main): Likewise.
* gcc.c-torture/execute/20020611-1.c (main): Likewise.
* gcc.c-torture/execute/20021010-2.c (main): Likewise.
* gcc.c-torture/execute/20021113-1.c (foo): Add missing
void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/20021120-3.c (foo): Call
__builtin_sprintf instead of sprintf.
* gcc.c-torture/execute/20030125-1.c (main): Add missing
int and void types.
* gcc.c-torture/execute/20030216-1.c (main): Likewise.
* gcc.c-torture/execute/20030404-1.c (main): Likewise.
* gcc.c-torture/execute/20030606-1.c (main): Likewise.
Call __builtin_memset instead of memset.
* gcc.c-torture/execute/20030828-1.c (main): Add missing int
and void types.
* gcc.c-torture/execute/20030828-2.c (main): Likewise.
* gcc.c-torture/execute/20031012-1.c: Call __builtin_strlen
instead of strlen.
* gcc.c-torture/execute/20031211-1.c (main): Add missing int
and void types.
* gcc.c-torture/execute/20040319-1.c (main): Likewise.
* gcc.c-torture/execute/20040411-1.c (sub1): Call
__builtin_memcpy instead of memcpy.
* gcc.c-torture/execute/20040423-1.c (sub1): Likewise.
* gcc.c-torture/execute/20040917-1.c (main): Add missing int
and void types.
* gcc.c-torture/execute/20050131-1.c (main): Likewise.
* gcc.c-torture/execute/20051113-1.c (main): Likewise.
* gcc.c-torture/execute/20121108-1.c (main): Call
__builtin_printf instead of printf.
* gcc.c-torture/execute/20170401-2.c (main): Add missing int
and void types.
* gcc.c-torture/execute/900409-1.c (main): Likewise.
* gcc.c-torture/execute/920202-1.c (f): Add int return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/920302-1.c (execute): Add void return
type.
(main): Add missing int and void types.
* gcc.c-torture/execute/920410-1.c (main): Likewise.
* gcc.c-torture/execute/920501-2.c (main): Likewise.
* gcc.c-torture/execute/920501-3.c (execute): Add void return
type.
(main): Add missing int and void types.
* gcc.c-torture/execute/920501-5.c (x): Add int return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/920501-6.c (main): Add int return
type.
* gcc.c-torture/execute/920501-8.c (main): Add missing
int and void types. Call __builtin_strcmp instead of strcmp.
* gcc.c-torture/execute/920506-1.c (main): Add missing
int and void types.
* gcc.c-torture/execute/920612-2.c (main): Likewise.
* gcc.c-torture/execute/920618-1.c (main): Likewise.
* gcc.c-torture/execute/920625-1.c (main): Likewise.
* gcc.c-torture/execute/920710-1.c (main): Likewise.
* gcc.c-torture/execute/920721-1.c (main): Likewise.
* gcc.c-torture/execute/920721-4.c (main): Likewise.
* gcc.c-torture/execute/920726-1.c (first, second): Call
__builtin_strlen instead of strlen.
(main): Add missing int and void types. Call __builtin_strcmp
instead of strcmp.
* gcc.c-torture/execute/920810-1.c (main): Add missing
int and void types.
* gcc.c-torture/execute/920829-1.c (main): Likewise.
* gcc.c-torture/execute/920908-1.c (main): Likewise.
* gcc.c-torture/execute/920922-1.c (main): Likewise.
* gcc.c-torture/execute/920929-1.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/921006-1.c (main): Likewise. Call
__builtin_strcmp instead of strcmp.
* gcc.c-torture/execute/921007-1.c (main): Add missing
int and void types.
* gcc.c-torture/execute/921016-1.c (main): Likewise.
* gcc.c-torture/execute/921019-1.c (main): Likewise.
* gcc.c-torture/execute/921019-2.c (main): Likewise.
* gcc.c-torture/execute/921029-1.c (main): Likewise.
* gcc.c-torture/execute/921104-1.c (main): Likewise.
* gcc.c-torture/execute/921112-1.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/921113-1.c (w, f1, f2, gitter): Add
void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/921117-1.c (check): Call
__builtin_strcmp instead of strcmp.
(main): Add missing int and void types. Call __builtin_strcpy
instead of strcpy.
* gcc.c-torture/execute/921123-2.c (main): Add missing
int and void types.
* gcc.c-torture/execute/921202-2.c (main): Likewise.
* gcc.c-torture/execute/921204-1.c (main): Likewise.
* gcc.c-torture/execute/921208-1.c (main): Likewise.
* gcc.c-torture/execute/930123-1.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/930126-1.c (main): Likewise.
* gcc.c-torture/execute/930406-1.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/930408-1.c (p, f): Add missing void
types.
(main): Add missing int and void types.
* gcc.c-torture/execute/930429-1.c (main): Likewise.
* gcc.c-torture/execute/930603-2.c (f): Add missing void
types.
(main): Add missing int and void types.
* gcc.c-torture/execute/930608-1.c (main): Likewise.
* gcc.c-torture/execute/930614-1.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/930614-2.c (main): Likewise.
* gcc.c-torture/execute/930622-2.c (main): Likewise.
* gcc.c-torture/execute/930628-1.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/930725-1.c (main): Likewise. Call
__builtin_strcmp instead of strcmp.
* gcc.c-torture/execute/930930-2.c (main): Add missing
int and void types.
* gcc.c-torture/execute/931002-1.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/931004-1.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/931004-10.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/931004-11.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/931004-12.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/931004-13.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/931004-14.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/931004-2.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/931004-3.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/931004-4.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/931004-5.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/931004-6.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/931004-7.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/931004-8.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/931004-9.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/931005-1.c (main): Likewise.
* gcc.c-torture/execute/931110-1.c (main): Likewise.
* gcc.c-torture/execute/931110-2.c (main): Likewise.
* gcc.c-torture/execute/941014-1.c (main): Likewise.
* gcc.c-torture/execute/941014-2.c (main): Likewise.
* gcc.c-torture/execute/941015-1.c (main): Likewise.
* gcc.c-torture/execute/941021-1.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/941025-1.c (main): Likewise.
* gcc.c-torture/execute/941031-1.c (main): Likewise.
* gcc.c-torture/execute/950221-1.c (g2): Add int return type.
(f): Add missing void types. Call __builtin_strcpy instead
of strcpy.
(main): Add missing int and void types.
* gcc.c-torture/execute/950426-2.c (main): Likewise.
* gcc.c-torture/execute/950503-1.c (main): Likewise.
* gcc.c-torture/execute/950511-1.c (main): Likewise.
* gcc.c-torture/execute/950607-1.c (main): Likewise.
* gcc.c-torture/execute/950607-2.c (main): Likewise.
* gcc.c-torture/execute/950612-1.c (main): Likewise.
* gcc.c-torture/execute/950628-1.c (main): Likewise.
* gcc.c-torture/execute/950704-1.c (main): Likewise.
* gcc.c-torture/execute/950706-1.c (main): Likewise.
* gcc.c-torture/execute/950710-1.c (main): Likewise.
* gcc.c-torture/execute/950714-1.c (main): Likewise.
* gcc.c-torture/execute/950809-1.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/950906-1.c (g, f): Add void return
type.
(main): Add missing int and void types.
* gcc.c-torture/execute/950915-1.c (main): Likewise.
* gcc.c-torture/execute/950929-1.c (main): Likewise.
* gcc.c-torture/execute/951003-1.c (f): Add missing int
parameter type.
(main): Add missing int and void types.
* gcc.c-torture/execute/951115-1.c (g, f): Add void return
type.
(main): Add missing int and void types.
* gcc.c-torture/execute/951204-1.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/960116-1.c (p): Add int return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/960117-1.c (main): Likewise.
* gcc.c-torture/execute/960209-1.c (main): Likewise.
* gcc.c-torture/execute/960215-1.c (main): Likewise.
* gcc.c-torture/execute/960219-1.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/960301-1.c (main): Likewise.
* gcc.c-torture/execute/960302-1.c (foo, main): Add missing
int and void types.
* gcc.c-torture/execute/960311-1.c (main): Likewise.
* gcc.c-torture/execute/960311-2.c (main): Likewise.
* gcc.c-torture/execute/960311-3.c (main): Likewise.
* gcc.c-torture/execute/960312-1.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/960317-1.c (main): Likewise.
* gcc.c-torture/execute/960321-1.c (main): Likewise.
* gcc.c-torture/execute/960326-1.c (main): Likewise.
* gcc.c-torture/execute/960327-1.c (g, main): Add missing
int and void types.
(f): Add missing void types.
* gcc.c-torture/execute/960405-1.c (main): Add missing
int and void types.
* gcc.c-torture/execute/960416-1.c (main): Likewise.
* gcc.c-torture/execute/960419-1.c (main): Likewise.
* gcc.c-torture/execute/960419-2.c (main): Likewise.
* gcc.c-torture/execute/960512-1.c (main): Likewise.
* gcc.c-torture/execute/960513-1.c (main): Likewise.
* gcc.c-torture/execute/960521-1.c (f): Add missing void
types.
(main): Add missing int and void types.
* gcc.c-torture/execute/960608-1.c (f): Add int return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/960801-1.c (main): Likewise.
* gcc.c-torture/execute/960802-1.c (main): Likewise.
* gcc.c-torture/execute/960909-1.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/961004-1.c (main): Likewise.
* gcc.c-torture/execute/961017-1.c (main): Likewise.
* gcc.c-torture/execute/961017-2.c (main): Likewise.
* gcc.c-torture/execute/961026-1.c (main): Likewise.
* gcc.c-torture/execute/961122-1.c (addhi, subhi): Add void
return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/961122-2.c (main): Likewise.
* gcc.c-torture/execute/961125-1.c (main): Likewise.
* gcc.c-torture/execute/961206-1.c (main): Likewise.
* gcc.c-torture/execute/961213-1.c (main): Likewise.
* gcc.c-torture/execute/970214-1.c (main): Likewise.
* gcc.c-torture/execute/970214-2.c (main): Likewise.
* gcc.c-torture/execute/970217-1.c (sub): Add int return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/970923-1.c (main): Likewise.
* gcc.c-torture/execute/980223.c (main): Likewise.
* gcc.c-torture/execute/980506-1.c (main): Likewise.
* gcc.c-torture/execute/980506-2.c (main): Likewise.
* gcc.c-torture/execute/980506-3.c (build_lookup): Call
__builtin_strlen instead of strlen and __builtin_memset
instead of memset.
* gcc.c-torture/execute/980526-3.c (main): Likewise.
* gcc.c-torture/execute/980602-1.c (main): Likewise.
* gcc.c-torture/execute/980604-1.c (main): Likewise.
* gcc.c-torture/execute/980605-1.c (dummy): Add missing int
parameter type.
(main): Add missing int and void types.
* gcc.c-torture/execute/980701-1.c (ns_name_skip): Add missing
int return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/980709-1.c (main): Likewise.
* gcc.c-torture/execute/990117-1.c (main): Likewise.
* gcc.c-torture/execute/990127-1.c (main): Likewise.
* gcc.c-torture/execute/990128-1.c (main): Likewise.
* gcc.c-torture/execute/990130-1.c (main): Likewise.
* gcc.c-torture/execute/990324-1.c (main): Likewise.
* gcc.c-torture/execute/990524-1.c (main): Likewise.
* gcc.c-torture/execute/990531-1.c (main): Likewise.
* gcc.c-torture/execute/990628-1.c (fetch, load_data): Call
__builtin_memset instead of memset.
(main): Add missing int and void types.
* gcc.c-torture/execute/991019-1.c (main): Likewise.
* gcc.c-torture/execute/991023-1.c (foo, main): Likewise.
* gcc.c-torture/execute/991112-1.c (isprint): Declare.
* gcc.c-torture/execute/991118-1.c (main): Add missing
int and void types.
* gcc.c-torture/execute/alias-1.c (ptr2): Add cast to float *
in initializer.
(typepun): Add missing void types.
(main): Add missing int and void types.
* gcc.c-torture/execute/alias-2.c (main): Likewise.
* gcc.c-torture/execute/alias-3.c (inc): Add missing
void types.
* gcc.c-torture/execute/alias-4.c (main): Add missing int
return type.
* gcc.c-torture/execute/arith-1.c (main): Add missing
int and void types.
* gcc.c-torture/execute/arith-rand-ll.c (main): Likewise.
* gcc.c-torture/execute/arith-rand.c (main): Likewise.
* gcc.c-torture/execute/bf-layout-1.c (main): Likewise.
* gcc.c-torture/execute/bf-pack-1.c (foo): Add missing
void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/bf-sign-1.c (main): Likewise.
* gcc.c-torture/execute/bf-sign-2.c (main): Likewise.
* gcc.c-torture/execute/bf64-1.c (main): Likewise.
* gcc.c-torture/execute/builtin-prefetch-2.c (stat_int_arr):
Add missing int array element type.
* gcc.c-torture/execute/builtin-prefetch-3.c (stat_int_arr):
Likewise.
* gcc.c-torture/execute/cbrt.c (main): Add missing int and
void types.
* gcc.c-torture/execute/complex-1.c (main): Likewise.
* gcc.c-torture/execute/complex-2.c (main): Likewise.
* gcc.c-torture/execute/complex-3.c (main): Likewise.
* gcc.c-torture/execute/complex-4.c (main): Likewise.
* gcc.c-torture/execute/complex-5.c (main): Likewise.
* gcc.c-torture/execute/compndlit-1.c (main): Likewise.
* gcc.c-torture/execute/conversion.c (test_integer_to_float)
(test_longlong_integer_to_float, test_float_to_integer)
(test_float_to_longlong_integer): Add missing void return
type.
(main): Add missing int and void types.
* gcc.c-torture/execute/cvt-1.c (main): Likewise.
* gcc.c-torture/execute/divconst-1.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/divconst-2.c (main): Likewise.
* gcc.c-torture/execute/divconst-3.c (main): Likewise.
* gcc.c-torture/execute/enum-1.c (main): Likewise.
* gcc.c-torture/execute/func-ptr-1.c (main): Likewise.
* gcc.c-torture/execute/ieee/20011123-1.c (main): Likewise.
* gcc.c-torture/execute/ieee/920518-1.c (main): Likewise.
* gcc.c-torture/execute/ieee/920810-1.c (main): Likewise.
Call __builtin_strcmp instead of strcmp.
* gcc.c-torture/execute/ieee/930529-1.c (main): Add missing
int and void types.
* gcc.c-torture/execute/ieee/fp-cmp-1.c (main): Likewise.
* gcc.c-torture/execute/ieee/fp-cmp-2.c (main): Likewise.
* gcc.c-torture/execute/ieee/fp-cmp-3.c (main): Likewise.
* gcc.c-torture/execute/ieee/fp-cmp-6.c (main): Likewise.
* gcc.c-torture/execute/ieee/fp-cmp-9.c (main): Likewise.
* gcc.c-torture/execute/ieee/minuszero.c (main): Likewise.
* gcc.c-torture/execute/ieee/mzero2.c (expect): Call
__builtin_memcmp instead of memcmp.
(main): Add missing int and void types.
* gcc.c-torture/execute/ieee/mzero3.c (main): Likewise.
(expectd, expectf): Call __builtin_memcmp instead of memcmp.
* gcc.c-torture/execute/ieee/mzero5.c (negzero_check):
Likewise.
* gcc.c-torture/execute/ieee/rbug.c (main): Add missing
int and void types.
* gcc.c-torture/execute/index-1.c (main): Likewise.
* gcc.c-torture/execute/loop-1.c (main): Likewise.
* gcc.c-torture/execute/loop-2b.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/loop-6.c (main): Likewise.
* gcc.c-torture/execute/loop-7.c (main): Likewise.
* gcc.c-torture/execute/lto-tbaa-1.c (use_a, set_b, use_c):
Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/memcpy-1.c (main): Likewise.
* gcc.c-torture/execute/memcpy-2.c (main): Likewise.
* gcc.c-torture/execute/memcpy-bi.c (main): Likewise.
* gcc.c-torture/execute/memset-1.c (main): Likewise.
* gcc.c-torture/execute/memset-2.c: Include <string.h>.
* gcc.c-torture/execute/memset-3.c: Likewise.
* gcc.c-torture/execute/nest-stdar-1.c (main): Add missing
int and void types.
* gcc.c-torture/execute/nestfunc-1.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/packed-1.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/pr15262-1.c (main): Likewise. Call
__builtin_malloc instead of malloc.
* gcc.c-torture/execute/pr15262-2.c (foo): Add int return
type.
(main): Add missing int and void types.
* gcc.c-torture/execute/pr15262.c (main): Likewise.
* gcc.c-torture/execute/pr17252.c (main): Likewise.
* gcc.c-torture/execute/pr21331.c (main): Likewise.
* gcc.c-torture/execute/pr34176.c (foo): Add missing int
type to definition of foo.
* gcc.c-torture/execute/pr42231.c (max): Add missing int type
to definition.
* gcc.c-torture/execute/pr42614.c (expect_func): Call
__builtin_abs instead of abs.
* gcc.c-torture/execute/pr54937.c (t): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/ptr-arith-1.c (main): Likewise.
* gcc.c-torture/execute/regstack-1.c (main): Likewise.
* gcc.c-torture/execute/scope-1.c (f): Add missing void types.
(main): Add missing int and void types.
* gcc.c-torture/execute/simd-5.c (main): Call __builtin_memcmp
instead of memcmp.
* gcc.c-torture/execute/strcmp-1.c (main): Add missing
int and void types.
* gcc.c-torture/execute/strcpy-1.c (main): Likewise.
* gcc.c-torture/execute/strct-pack-1.c (main): Likewise.
* gcc.c-torture/execute/strct-pack-2.c (main): Likewise.
* gcc.c-torture/execute/strct-pack-4.c (main): Likewise.
* gcc.c-torture/execute/strct-stdarg-1.c (f): Add void return
type.
(main): Add missing int and void types.
* gcc.c-torture/execute/strct-varg-1.c (f): Add void return
type.
(main): Add missing int and void types.
* gcc.c-torture/execute/strlen-1.c (main): Likewise.
* gcc.c-torture/execute/strncmp-1.c (main): Likewise.
* gcc.c-torture/execute/struct-ini-1.c (main): Likewise.
* gcc.c-torture/execute/struct-ini-2.c (main): Likewise.
* gcc.c-torture/execute/struct-ini-3.c (main): Likewise.
* gcc.c-torture/execute/struct-ini-4.c (main): Likewise.
* gcc.c-torture/execute/struct-ret-1.c (main): Likewise.
* gcc.c-torture/execute/struct-ret-2.c (main): Likewise.
* gcc.c-torture/execute/va-arg-1.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/va-arg-10.c (main): Likewise.
* gcc.c-torture/execute/va-arg-2.c (main): Likewise.
* gcc.c-torture/execute/va-arg-4.c (main): Likewise.
* gcc.c-torture/execute/va-arg-5.c (va_double)
(va_long_double): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/va-arg-6.c (f): Add void return type.
(main): Add missing int and void types.
* gcc.c-torture/execute/va-arg-9.c (main): Likewise.
* gcc.c-torture/execute/widechar-1.c (main): Likewise.
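The kinds of edit listed above follow a small number of patterns, illustrated below. This is a made-up example, not taken from any of the listed files: f and test_body are hypothetical, with test_body standing in for a test's main. The old form is shown in a comment because it relies on implicit int and an implicit function declaration.

```c
#include <assert.h>

/* Old style, relying on implicit int and an undeclared strcmp:

     f (x) { return x + 1; }
     main () { if (strcmp ("a", "a")) abort (); exit (0); }  */

/* Explicit return and parameter types replace implicit int.  */
static int f (int x)
{
  return x + 1;
}

/* Stand-in for main; returns 0 on success.  Calling the builtin avoids
   the need for a declaration of strcmp.  */
static int test_body (void)
{
  if (__builtin_strcmp ("a", "a") != 0)
    return 1;
  return f (1) == 2 ? 0 : 1;
}

/* Usage: the updated code behaves as the old code was meant to.  */
static void demo (void)
{
  assert (test_body () == 0);
}
```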
The execute tests use abort/exit to report failure/success, but
they generally do not declare these functions (or include <stdlib.h>).
This change adds declarations as appropriate.
It would have been possible to switch to __builtin_abort and
__builtin_exit instead. Existing practice varies. Adding the
declarations makes it easier to write the GNU-style commit message
because it is not necessary to mention the function with the call
site.
Instead of this change, it would be possible to create a special
header file with the declarations that is included during the
test file compilation using -include, but that would mean that
many tests would no longer build standalone.
Florian Weimer [Mon, 13 Nov 2023 07:54:10 +0000 (08:54 +0100)]
C99 testsuite readiness: -fpermissive tests
These tests use obsolete language constructs, but they are not
clearly targeting C89, either. So use -fpermissive to keep
future errors as warnings.
The reasons why obsolete constructs are used vary from
test to test. Some tests deliberately exercise later stages
of the compiler that only occur with those constructs. Some
tests have precise expectations about warnings that will become
errors with a future change, but do not specifically test a
particular warning/error (if that is the case, the later changes
tend to duplicate them into warning/error variants). In a few
cases, use of obsolete constructs is clearly due to test case
reduction, but it was not possible to un-reduce the test due
to its size.
Jakub Jelinek [Mon, 13 Nov 2023 07:47:41 +0000 (08:47 +0100)]
gimple-range-cache: Fix ICEs when dumping details [PR111967]
The following testcase ICEs when dumping details.
When m_ssa_ranges vector is created, it is safe_grow_cleared (num_ssa_names),
but when some new SSA_NAME is added, we strangely grow it to
num_ssa_names + 1 instead and later on the 3 argument dump method
iterates from 1 to m_ssa_ranges.length () - 1 and uses ssa_name (x)
on each; but because set_bb_range grew it one too much, ssa_name
(m_ssa_ranges.length () - 1) might be after the end of the ssanames
vector and ICE.
The fix grows the vector consistently only to num_ssa_names,
doesn't waste time checking m_ssa_ranges[0] because there is no
ssa_name (0), it is always NULL, before using ssa_name (x) checks
if we'll need it at all (we check later if m_ssa_ranges[x] is non-NULL,
so we might check it earlier as well) and also in the last loop
iterates until m_ssa_ranges.length () rather than num_ssa_names, I don't
see a reason for the inconsistency and in theory some SSA_NAME could be
added without set_bb_range called for it and the vector could be shorter
than the ssanames vector.
To actually fix the ICE, either the first hunk or the last 2 hunks
would be enough, but I think it doesn't hurt to change all the spots.
2023-11-13 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/111967
* gimple-range-cache.cc (block_range_cache::set_bb_range): Grow
m_ssa_ranges to num_ssa_names rather than num_ssa_names + 1.
(block_range_cache::dump): Iterate from 1 rather than 0. Don't use
ssa_name (x) unless m_ssa_ranges[x] is non-NULL. Iterate to
m_ssa_ranges.length () rather than num_ssa_names.
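The indexing issue can be reproduced in a toy model. This is a sketch only and not the GCC data structures: NUM_NAMES, names and dump_count are hypothetical stand-ins for num_ssa_names, the ssanames vector and the dump loop. The loop starts at 1, tests the range slot before touching the name, and is bounded by the length of the range vector, so an over-long vector whose extra slot is empty is harmless.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_NAMES = 4 };		/* hypothetical stand-in for num_ssa_names */

/* Entry 0 is always NULL, mirroring the absence of ssa_name (0).  */
static const char *names[NUM_NAMES] = { NULL, "a_1", "b_2", "c_3" };

/* Count the entries a dump would print.  RANGES has LENGTH slots.  */
static int dump_count (const int *const *ranges, size_t length)
{
  int printed = 0;
  for (size_t x = 1; x < length; x++)
    {
      if (ranges[x] == NULL)
	continue;		/* nothing cached, so the name is not needed */
      assert (x < NUM_NAMES);	/* a set slot past the names would be a bug */
      if (names[x] != NULL)
	printed++;
    }
  return printed;
}

/* Usage: a vector one slot too long gives the same result as long as
   the extra slot is empty.  */
static void demo (void)
{
  int r = 7;
  const int *ranges[NUM_NAMES + 1] = { NULL, &r, NULL, &r, NULL };
  assert (dump_count (ranges, NUM_NAMES) == 2);
  assert (dump_count (ranges, NUM_NAMES + 1) == 2);
}
```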
Xi Ruoyao [Mon, 30 Oct 2023 12:24:58 +0000 (20:24 +0800)]
LoongArch: Optimize single-used address with -mexplicit-relocs=auto for fld/fst
fld and fst have the same addressing modes as ld.w and st.w, so the same
optimization as r14-4851 should be applied to them too.
gcc/ChangeLog:
* config/loongarch/loongarch.md (LD_AT_LEAST_32_BIT): New mode
iterator.
(ST_ANY): New mode iterator.
(define_peephole2): Use LD_AT_LEAST_32_BIT instead of GPR and
ST_ANY instead of QHWD for applicable patterns.
Pan Li [Mon, 13 Nov 2023 03:06:38 +0000 (11:06 +0800)]
RISC-V: Fix RVV dynamic frm tests failure
The enhancement of mode-switching performs some optimization when
emitting the frm backup insn, so some redundant fsrm insns are removed
for the following test cases.
This patch adjusts the asm checks to match the above optimization.
Pan Li [Sun, 12 Nov 2023 12:16:03 +0000 (20:16 +0800)]
RISC-V: Support FP l/ll round and rint HF mode autovec
This patch adds auto-vectorization support for the FP APIs below,
whose result type size differs between RV64 and RV32:
+------------+-----------+----------+
| API        | RV64      | RV32     |
+------------+-----------+----------+
| lrintf16   | HF => DI  | HF => SI |
| llrintf16  | HF => DI  | HF => DI |
| lroundf16  | HF => DI  | HF => SI |
| llroundf16 | HF => DI  | HF => DI |
+------------+-----------+----------+
Given below code:
void
test_lrintf16 (long *out, _Float16 *in, int count)
{
for (unsigned i = 0; i < count; i++)
out[i] = __builtin_lrintf16 (in[i]);
}
Before this patch:
.L3:
lhu a5,0(s0)
addi s0,s0,2
addi s1,s1,8
fmv.s.x fa0,a5
call lrintf16
sd a0,-8(s1)
bne s0,s2,.L3
After this patch:
.L3:
vsetvli a5,a2,e16,mf4,ta,ma
vle16.v v1,0(a1)
vfwcvt.f.f.v v2,v1
vsetvli zero,zero,e32,mf2,ta,ma
vfwcvt.x.f.v v1,v2
vse64.v v1,0(a0)
slli a4,a5,1
add a1,a1,a4
slli a4,a5,3
add a0,a0,a4
sub a2,a2,a5
bne a2,zero,.L3
gcc/ChangeLog:
* config/riscv/autovec.md: Add bridge mode to lrint and lround
pattern.
* config/riscv/riscv-protos.h (expand_vec_lrint): Add new arg
bridge machine mode.
(expand_vec_lround): Ditto.
* config/riscv/riscv-v.cc (emit_vec_widden_cvt_f_f): New helper
func impl to emit vfwcvt.f.f.
(emit_vec_rounding_to_integer): Handle the HF to DI rounding
with the bridge mode.
(expand_vec_lrint): Reorder the args.
(expand_vec_lround): Ditto.
(expand_vec_lceil): Ditto.
(expand_vec_lfloor): Ditto.
* config/riscv/vector-iterators.md: Add vector HFmode and bridge
mode for converting to DI.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/math-llrintf16-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-llroundf16-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrintf16-rv32-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrintf16-rv64-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lroundf16-rv32-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lroundf16-rv64-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-llrintf16-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-llroundf16-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lrintf16-rv32-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lrintf16-rv64-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lroundf16-rv32-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lroundf16-rv64-0.c: New test.
Eric Botcazou [Sat, 11 Nov 2023 18:43:07 +0000 (19:43 +0100)]
Handle addresses of more constants in IPA-CP
IPA-CP can handle addresses of scalar constants (CONST_DECL) so this extends
that to addresses of constants in the pool (DECL_IN_CONSTANT_POOL). Again
this is helpful for so-called fat pointers in Ada, i.e. objects that are
semantically pointers but represented by structures made up of two pointers.
This also moves the unused function print_ipcp_constant_value from ipa-cp.cc
to ipa-prop.cc and renames it.
gcc/
* ipa-cp.cc (print_ipcp_constant_value): Move to...
(values_equal_for_ipcp_p): Deal with VAR_DECLs from the
constant pool.
* ipa-prop.cc (ipa_print_constant_value): ...here. Likewise.
(ipa_print_node_jump_functions_for_edge): Call the function
ipa_print_constant_value to print IPA_JF_CONST elements.
Jin Ma [Sat, 11 Nov 2023 20:11:45 +0000 (13:11 -0700)]
[PATCH v2] In the pipeline, USE or CLOBBER should delay execution if it starts a new live range.
CLOBBER and USE do not represent real instructions, but during
pipeline scheduling they wait in the ready list like other insns,
without regard to resource conflicts or cycles. On a multi-issue CPU
architecture they can therefore be issued at any time when other
regular insns have resource conflicts or cannot be issued for other
reasons. As a result, they are moved earlier in the generated insn
sequence, which affects register allocation and often leads to more
redundant mov instructions.
A simple example:
https://github.com/majin2020/gcc-test/blob/master/test.c
This is a function in the dhrystone benchmark.
https://github.com/majin2020/gcc-test/blob/0b08c1a13de9663d7d9aba7539b960ec0607ca24/test.c.299r.sched1
This is the log of the 'sched1' pass with -mtune=rocket but issue_rate == 2.
In this log, insns 13 and 14 are scheduled much too early, which risks
generating redundant mov instructions and seems unreasonable.
Therefore, I am resubmitting the patch, based on the last review
comments, to try to solve this problem.
https://github.com/majin2020/gcc-test/commit/efcb43e3369e771bde702955048bfe3f501263dd#diff-805031b1be5092a2322852a248d0b0f92eef7cad5784a8209f4dfc6221407457L189
This is the diff of the sched1 log after the patch is applied.
The new pipeline is:
;; | insn | prio |
;; | 17 | 3 | r142=a0 alu
...
;; | 10 | 0 | [r144]=r141 alu
;; | 13 | 0 | clobber a0 nothing
;; | 14 | 0 | clobber r136 nothing
;; | 12 | 0 | a0=r136 alu
;; | 15 | 0 | use a0 nothing
gcc/ChangeLog:
* haifa-sched.cc (use_or_clobber_starts_range_p): New.
(prune_ready_list): USE or CLOBBER should delay execution
if it starts a new live range.
Jakub Jelinek [Sat, 11 Nov 2023 19:15:53 +0000 (20:15 +0100)]
tree-ssa-math-opts: Fix up gsi_remove order in match_uaddc_usubc [PR112430]
The following testcase ICEs, because the temp_stmts were removed in the
wrong order, from the ones appearing earlier in the IL to the later ones,
so insert_debug_temps_for_defs can reintroduce dead SSA_NAMEs back into the
IL.
The following patch fixes that by removing them in the order they were
pushed into the vector, which is from later ones to earlier ones.
Additionally, I've noticed I forgot to call release_defs on the removed
stmts.
2023-11-11 Jakub Jelinek <jakub@redhat.com>
PR middle-end/112430
* tree-ssa-math-opts.cc (match_uaddc_usubc): Remove temp_stmts in the
order they were pushed rather than in reverse order. Call
release_defs after gsi_remove.
This patch adds a way for targets to ask that selected mode changes
be brought forward, through a combination of:
(1) requiring a mode in blocks where the entity was previously
transparent
(2) pushing the transition at the head of a block onto incoming edges
SME has two uses for this:
- A "one-shot" entity that, for any given path of execution,
either stays off or makes exactly one transition from off to on.
This relies only on (1) above; see the hook description for more info.
The main purpose of using mode-switching for this entity is to
shrink-wrap the code that requires it.
- A second entity for which all transitions must be from known
modes, which is enforced using a combination of (1) and (2).
More specifically, (1) looks for edges B1->B2 for which:
- B2 requires a specific mode and
- B1 does not guarantee a specific starting mode
In this system, such an edge is only possible if the entity is
transparent in B1. (1) then forces B1 to require some safe common
mode. Applying this inductively means that all incoming edges are
from known modes. If different edges give different starting modes,
(2) pushes the transitions onto the edges themselves; this only
happens if the entity is not transparent in some predecessor block.
The patch also uses the back-propagation as an excuse to do a simple
on-the-fly optimisation.
Hopefully the comments in the patch explain things a bit better.
gcc/
* target.def (mode_switching.backprop): New hook.
* doc/tm.texi.in (TARGET_MODE_BACKPROP): New @hook.
* doc/tm.texi: Regenerate.
* mode-switching.cc (struct bb_info): Add single_succ.
(confluence_info): Add transp field.
(single_succ_confluence_n, single_succ_transfer): New functions.
(backprop_confluence_n, backprop_transfer): Likewise.
(optimize_mode_switching): Use them. Push mode transitions onto
a block's incoming edges, if the backprop hook requires it.
mode-switching: Add a target-configurable confluence operator
The mode-switching pass assumed that all of an entity's modes
were mutually exclusive. However, the upcoming SME changes
have an entity with some overlapping modes, so that there is
sometimes a "superunion" mode that contains two given modes.
We can use this relationship to pass something more helpful than
"don't know" to the emit hook.
This patch adds a new hook that targets can use to specify
a mode confluence operator.
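As a rough sketch of what a confluence operator for overlapping modes might compute, assume a hypothetical entity whose modes are sets of state bits. The names Mode, MODE_AB, MODE_UNKNOWN, exclusive_confluence and overlapping_confluence are illustrative only; they are not the hook's real signature or SME's real modes.

```cpp
#include <cassert>

// Hypothetical modes for one entity: each mode is a set of state bits,
// so MODE_AB contains both MODE_A and MODE_B.
enum Mode : unsigned {
  MODE_OFF = 0u,
  MODE_A = 1u,
  MODE_B = 2u,
  MODE_AB = 3u,        // a mode that contains both MODE_A and MODE_B
  MODE_UNKNOWN = ~0u   // stands in for "don't know"
};

// With mutually exclusive modes, two different incoming modes can only
// merge to "don't know".
Mode exclusive_confluence(Mode m1, Mode m2) {
  return m1 == m2 ? m1 : MODE_UNKNOWN;
}

// With overlapping modes, merge to the smallest mode containing both.
Mode overlapping_confluence(Mode m1, Mode m2) {
  if (m1 == MODE_UNKNOWN || m2 == MODE_UNKNOWN)
    return MODE_UNKNOWN;
  return static_cast<Mode>(m1 | m2);
}
```

In this model, merging MODE_A and MODE_B gives MODE_AB rather than "don't know", which is more useful information to hand to an emit hook.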
With mutually exclusive modes, it's possible to compute a block's
incoming and outgoing modes by looking at its availability sets.
With the confluence operator, we instead need to solve a full
dataflow problem.
However, when emitting a mode transition, the upcoming SME use of
mode-switching benefits from having as much information as possible
about the starting mode. Calculating this information is definitely
worth the compile time.
The dataflow problem is written to work before and after the LCM
problem has been solved. A later patch makes use of this.
While there (since git blame would ping me for the reindented code),
I used a lambda to avoid the cut-&-pasted loops.
gcc/
* target.def (mode_switching.confluence): New hook.
* doc/tm.texi (TARGET_MODE_CONFLUENCE): New @hook.
* doc/tm.texi.in: Regenerate.
* mode-switching.cc (confluence_info): New variable.
(mode_confluence, forward_confluence_n, forward_transfer): New
functions.
(optimize_mode_switching): Use them to calculate mode_in when
TARGET_MODE_CONFLUENCE is defined.
The pass used the edge aux field to record which mode change
should happen on the edge, with -1 meaning "none". It's more
convenient for later patches to leave aux zero for "none",
and use numbers based at 1 to record a change.
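A minimal sketch of the 1-based encoding, assuming a pointer-sized aux field. The helper names below are hypothetical and are not the ones used in mode-switching.cc.

```cpp
#include <cassert>
#include <cstdint>

// Store a mode change in a pointer-sized aux field, leaving zero (the
// cleared state) to mean "no change on this edge".
inline void *encode_edge_mode(int mode) {
  return reinterpret_cast<void *>(static_cast<std::intptr_t>(mode) + 1);
}

// A cleared aux field means that no mode change was recorded.
inline bool edge_has_mode_change(const void *aux) {
  return aux != nullptr;
}

// Recover the mode from an aux value produced by encode_edge_mode.
inline int decode_edge_mode(const void *aux) {
  return static_cast<int>(reinterpret_cast<std::intptr_t>(aux) - 1);
}
```

With this scheme mode 0 is still representable (it is stored as 1), and a freshly cleared aux field needs no extra initialization to mean "none".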
gcc/
* mode-switching.cc (commit_mode_sets): Use 1-based edge aux values.
mode-switching: Pass set of live registers to the needed hook
The emit hook already takes the set of live hard registers as input.
This patch passes it to the needed hook too. SME uses this to
optimise the mode choice based on whether state is live or dead.
The main caller already had access to the required info, but the
special handling of return values did not.
mode-switching: Allow targets to set the mode for EH handlers
The mode-switching pass already had hooks to say what mode
an entity is in on entry to a function and what mode it must
be in on return. For SME, we also want to say what mode an
entity is guaranteed to be in on entry to an exception handler.
gcc/
* target.def (mode_switching.eh_handler): New hook.
* doc/tm.texi.in (TARGET_MODE_EH_HANDLER): New @hook.
* doc/tm.texi: Regenerate.
* mode-switching.cc (optimize_mode_switching): Use eh_handler
to get the mode on entry to an exception handler.
An entity isn't transparent in a block that requires a specific mode.
optimize_mode_switching took that into account for normal insns,
but didn't for the exit block. Later patches misbehaved because
of this.
In contrast, an entity was correctly marked as non-transparent
in the entry block, but the reasoning seemed a bit convoluted.
It also referred to a function that no longer exists.
Since KILL = ~TRANSP, the entity is by definition not transparent
in a block that defines the entity, so I think we can make it so
without comment.
Finally, the exit handling was nested in the entry handling,
but that doesn't seem necessary. A target could say that an
entity is undefined on entry but must be defined on return,
on a "be liberal in what you accept, be conservative in what
you do" principle.
gcc/
* mode-switching.cc (optimize_mode_switching): Mark the exit
block as nontransparent if it requires a specific mode.
Handle the entry and exit mode as sibling rather than nested
concepts. Remove outdated comment.
mode-switching: Fix the mode passed to the emit hook
optimize_mode_switching passes an entity's current mode (if known)
to the emit hook. However, the mode that it passed ignored the
effect of the after hook. Instead, the mode for the first emit
call in a block was taken from the incoming mode, whereas the
mode for each subsequent emit call was taken from the result
of the previous call.
The previous pass through the insns already calculated the
correct mode, so this patch records it in the seginfo structure.
(There was a 32-bit hole on 64-bit hosts, so this doesn't increase
the size of the structure for them.)
gcc/
* mode-switching.cc (seginfo): Add a prev_mode field.
(new_seginfo): Take and initialize the prev_mode.
(optimize_mode_switching): Update calls accordingly.
Use the recorded modes during the emit phase, rather than
computing one on the fly.
add_seginfo chained insn information to the end of a list
by starting at the head of the list. This patch avoids the
quadratic behavior by keeping track of the tail pointer.
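The general technique can be sketched as follows. Node and List are hypothetical stand-ins, not the seginfo structures themselves; the point is only that appending through a pointer to the tail pointer takes constant time.

```cpp
#include <cassert>

struct Node {   // hypothetical stand-in for a list element
  int value;
  Node *next;
};

struct List {
  Node *head = nullptr;
  Node **tail = &head;   // address of the pointer the next append must fill

  List() = default;
  List(const List &) = delete;             // a copy would leave tail pointing
  List &operator=(const List &) = delete;  // into the original object
};

// O(1) append: no walk from the head of the list.
void append(List &list, Node *node) {
  node->next = nullptr;
  *list.tail = node;         // fills head for an empty list, else last->next
  list.tail = &node->next;
}
```

Because tail initially points at head, the empty list needs no special case.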
gcc/
* mode-switching.cc (add_seginfo): Replace head pointer with
a pointer to the tail pointer.
(optimize_mode_switching): Update calls accordingly.
mode-switching: Tweak the macro/hook documentation
I found the documentation for the mode-switching macros/hooks
a bit hard to follow at first. This patch tries to add the
information that I think would have made it easier to understand.
Of course, documentation preferences are personal, and so I could
be changing something that others understood to something that
seems impenetrable.
Some notes on specific changes:
- "in an optimizing compilation" didn't seem accurate; the pass
is run even at -O0, and often needs to be for correctness.
- "at run time" meant when the compiler was run, rather than when
the compiled code was run.
- Removing the list of optional macros isn't a clarification,
but it means that upcoming patches don't create an absurdly
long list.
- I don't really understand the purpose of TARGET_MODE_PRIORITY,
so I mostly left that alone.
Martin Uecker [Thu, 27 Jul 2023 11:41:33 +0000 (13:41 +0200)]
c: Synthesize nonnull attribute for parameters declared with static [PR110815]
Parameters declared with `static` are nonnull. We synthesize
an artificial nonnull attribute for such parameters to get the
same warnings and optimizations.
Bootstrapped and regression tested on x86.
PR c/110815
PR c/112428
gcc/c-family:
* c-attribs.cc (build_attr_access_from_parms): Synthesize
nonnull attribute for parameters declared with `static`.
gcc:
* gimple-ssa-warn-access.cc (pass_waccess::maybe_check_access_sizes):
Remove warning for parameters declared with `static`.
gcc/testsuite:
* gcc.dg/Wnonnull-8.c: Adapt test.
* gcc.dg/Wnonnull-9.c: New test.
Joern Rennecke [Sat, 11 Nov 2023 03:53:44 +0000 (03:53 +0000)]
Make scan-assembler* ignore LTO sections
gcc/testsuite/
* lib/scanasm.exp (scan-assembler-times): Disregard LTO sections.
(scan-assembler-dem, scan-assembler-dem-not): Likewise.
(dg-scan): Likewise, if name starts with scan-assembler.
(scan-raw-assembler): New proc.
* gcc.dg/pr61868.c: Use scan-raw-assembler.
* gcc.dg/scantest-lto.c: New test.
gcc/
* doc/sourcebuild.texi (Scan the assembly output): Document change.
Jonathan Wakely [Fri, 10 Nov 2023 21:06:15 +0000 (21:06 +0000)]
libstdc++: Do not use assume attribute for Clang [PR112467]
Clang has an 'assume' attribute, but it's a function attribute not a
statement attribute. The recently-added use of the statement form causes
an error with Clang.
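One way to guard the statement form is sketched below. The macro name SKETCH_ASSUME and the function bit_offset are hypothetical; this illustrates the idea and is not the code in stl_bvector.h.

```cpp
#include <cassert>

// Use the statement form of the assume attribute only where the compiler
// reports support for it and is not Clang; otherwise expand to a no-op.
#if defined(__has_attribute)
# if __has_attribute(__assume__) && !defined(__clang__)
#  define SKETCH_ASSUME(cond) __attribute__((__assume__(cond)))
# endif
#endif
#ifndef SKETCH_ASSUME
# define SKETCH_ASSUME(cond) ((void) 0)
#endif

// The caller promises n < 64; the assumption is only an optimization hint.
unsigned bit_offset(unsigned n) {
  SKETCH_ASSUME(n < 64);
  return n % 64;
}
```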
libstdc++-v3/ChangeLog:
PR libstdc++/112467
* include/bits/stl_bvector.h (_M_assume_normalized): Do not use
statement form of assume attribute for Clang.
LWG 3950 points out that the comparisons of std::basic_string_view can
be simplified to just a single overload of operator== and a single
overload of operator<=>. Those overloads work fine for homogeneous
comparisons of two string view objects.
Jonathan Wakely [Mon, 4 Sep 2023 14:23:23 +0000 (15:23 +0100)]
libstdc++: Fix broken tests for <complex.h>
When I added these tests I gave them .h file extensions, so they've
never been run.
They need to use the no_pch option, so that they only test the
<complex.h> header and don't get <complex> via <bits/stdc++.h>.
libstdc++-v3/ChangeLog:
* testsuite/26_numerics/headers/complex.h/std_c++11.h: Moved to...
* testsuite/26_numerics/headers/complex.h/std_c++11.cc: ...here.
* testsuite/26_numerics/headers/complex.h/std_c++98.h: Moved to...
* testsuite/26_numerics/headers/complex.h/std_c++98.cc: ...here.
Check macro first and then #undef.
* testsuite/26_numerics/headers/complex.h/std_gnu++11.h: Moved to...
* testsuite/26_numerics/headers/complex.h/std_gnu++11.cc: ...here.
Jonathan Wakely [Thu, 9 Nov 2023 21:50:34 +0000 (21:50 +0000)]
libstdc++: Fix test that fails with -ffreestanding
The -ffreestanding option disables Debug Mode, forcibly #undef'ing
_GLIBCXX_DEBUG. This means that the dangling checks in std::pair are
disabled for -ffreestanding in C++17 and earlier, because they depend on
_GLIBCXX_DEBUG. Adjust the target specifiers for the errors currently
matching c++17_down so they also require the hosted effective target.
libstdc++-v3/ChangeLog:
* testsuite/20_util/pair/dangling_ref.cc: Add hosted effective
target for specifiers using c++17_down.
Jonathan Wakely [Wed, 8 Nov 2023 13:43:04 +0000 (13:43 +0000)]
libstdc++: Deprecate std::atomic_xxx overloads for std::shared_ptr
These overloads are deprecated in C++20 (and likely to be removed for
C++26). The std::atomic<std::shared_ptr<T>> specialization should be
preferred in new code.
Jonathan Wakely [Thu, 17 Aug 2023 17:58:24 +0000 (18:58 +0100)]
libstdc++: Add [[nodiscard]] to lock types
Adding this attribute means users get a warning when they accidentally
create a temporary lock instead of creating an automatic variable with
block scope.
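The mistake that the warning is meant to catch can be sketched like this. TrackedMutex is a hypothetical lockable type used only to make the effect observable; the explicit cast to void is there so that the deliberate mistake compiles quietly.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical lockable type that records whether it is currently held.
struct TrackedMutex {
  bool held = false;
  void lock() { held = true; }
  void unlock() { held = false; }
};

// Named guard: the mutex stays locked until the end of the enclosing block.
bool held_with_named_guard(TrackedMutex &m) {
  std::lock_guard<TrackedMutex> guard(m);
  return m.held;
}

// Unnamed temporary: locked and unlocked again within one full expression,
// so the code that follows runs without the lock.
bool held_after_temporary(TrackedMutex &m) {
  (void) std::lock_guard<TrackedMutex>{m};
  return m.held;
}
```

The first function observes the mutex as held; the second does not, even though both appear to "take the lock".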
For std::lock_guard both constructors have side effects (they both take
a mutex and so both cause it to be unlocked at the end of the full
expression when a temporary is constructed). Ideally we would just put
the attribute on the class instead of the constructors, but that doesn't
work with GCC (PR c++/85973).
For std::unique_lock the default constructor and std::defer_lock_t
constructor do not cause any locking or unlocking, so do not need to
give a warning. It might still be a mistake to create a temporary using
those constructors, but it's harmless and seems unlikely anyway. For a
lock object created with one of those constructors you would expect the
lock object to be referred to later in the function, and that would not
even compile if it was constructed as an unnamed temporary.
std::scoped_lock gets the same treatment as std::lock_guard, except that
the explicit specialization for zero lockables has no side effects so
doesn't need to warn.
libstdc++-v3/ChangeLog:
* include/bits/std_mutex.h (lock_guard): Add [[nodiscard]]
attribute to constructors.
* include/bits/unique_lock.h (unique_lock): Likewise.
* include/std/mutex (scoped_lock, scoped_lock<Mutex>): Likewise.
* testsuite/30_threads/lock_guard/cons/nodiscard.cc: New test.
* testsuite/30_threads/scoped_lock/cons/nodiscard.cc: New test.
* testsuite/30_threads/unique_lock/cons/nodiscard.cc: New test.
Jonathan Wakely [Sat, 4 Nov 2023 08:30:54 +0000 (08:30 +0000)]
libstdc++: Add [[nodiscard]] to std::span members
All std::span member functions are pure functions that have no side
effects. They are only useful for their return value, so they should all
warn if that value is not used.
Jonathan Wakely [Fri, 29 Sep 2023 13:29:16 +0000 (14:29 +0100)]
libstdc++: Remove handling for underscore-prefixed libm functions [PR111638]
The checks in linkage.m4 try to support math functions prefixed with
underscores, like _acosf and _isinf. However, that doesn't work because
they're renamed to the standard names using a macro, but then <cmath>
undefines that macro again.
This simply removes everything related to those underscored functions.
libstdc++-v3/ChangeLog:
PR libstdc++/111638
* config.h.in: Regenerate.
* configure: Regenerate.
* linkage.m4 (GLIBCXX_MAYBE_UNDERSCORED_FUNCS): Remove.
(GLIBCXX_CHECK_MATH_DECL_AND_LINKAGE_1): Do not check for _foo.
(GLIBCXX_CHECK_MATH_DECLS_AND_LINKAGES_1): Likewise.
(GLIBCXX_CHECK_MATH_DECL_AND_LINKAGE_2): Likewise.
(GLIBCXX_CHECK_MATH_DECL_AND_LINKAGE_3): Likewise.
(GLIBCXX_CHECK_STDLIB_DECL_AND_LINKAGE_2): Do not use
GLIBCXX_MAYBE_UNDERSCORED_FUNCS.
Nathaniel Shead [Thu, 11 May 2023 22:02:18 +0000 (23:02 +0100)]
libstdc++: Add missing functions to <cmath> [PR79700]
This patch adds the -f and -l variants of the C99 <math.h> functions to
<cmath> under namespace std (so std::sqrtf, std::fabsl, etc.) for C++11
and up.
Marek Polacek [Sat, 11 Nov 2023 00:36:17 +0000 (19:36 -0500)]
testsuite: fix lambda-decltype3.C in C++11
This fixes
FAIL: g++.dg/cpp0x/lambda/lambda-decltype3.C -std=c++11 (test for excess errors)
due to
lambda-decltype3.C:25:6: error: lambda capture initializers only available with '-std=c++14' or '-std=gnu++14' [-Wc++14-extensions]
Keith Packard [Fri, 10 Nov 2023 23:41:19 +0000 (16:41 -0700)]
[PATCH] libgcc/m68k: Fixes for soft float
Check for non-zero denorm in __adddf3. Need to check both the upper and
lower 32-bit chunks of a 64-bit float for a non-zero value when
checking to see if the value is -0.
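The check can be illustrated in C++ as follows. This is a sketch that assumes an IEEE 754 binary64 double; is_zero_double and from_halves are hypothetical helpers, not a transcription of the assembly in lb1sf68.S.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit double");

// Build a double from its upper and lower 32-bit halves.
double from_halves(std::uint32_t upper, std::uint32_t lower) {
  std::uint64_t bits = (static_cast<std::uint64_t>(upper) << 32) | lower;
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

// Returns true only for +0.0 and -0.0.
bool is_zero_double(double d) {
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  std::uint32_t upper = static_cast<std::uint32_t>(bits >> 32);
  std::uint32_t lower = static_cast<std::uint32_t>(bits);
  // Ignore the sign bit, then require both halves to be zero.  Testing
  // only the upper half would misclassify a denormal whose nonzero
  // mantissa bits all live in the lower half.
  return (upper & 0x7fffffffu) == 0 && lower == 0;
}
```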
Fix __addsf3 when the sum exponent is exactly 0xff to ensure that it
produces infinity and not NaN.
Handle converting NaN/inf values between formats.
Handle underflow and overflow when truncating.
Write a replacement for __fixxfsi so that it does not raise extra
exceptions during an extra conversion from long double to double.
libgcc/
* config/m68k/lb1sf68.S (__adddf3): Properly check for non-zero denorm.
(__divdf3): Restore sign bit properly.
(__addsf3): Correct exponent check.
* config/m68k/fpgnulib.c (EXPMASK): Define.
(__extendsfdf2): Handle Inf and NaN properly.
(__truncdfsf2): Handle underflow and overflow correctly.
(__extenddfxf2): Handle underflow, denorms, Inf and NaN correctly.
(__truncxfdf2): Handle underflow and denorms correctly.
(__fixxfsi): Reimplement.
RISC-V: Fix indentation of "length" attribute for branches and jumps
The "length" attribute calculation expressions for branches and jumps
are incorrectly and misleadingly indented, and they overrun the 80
column limit as well, all of which makes them hard to follow.
Correct all these issues.
gcc/
* config/riscv/riscv.md (length): Fix indentation for branch and
jump length calculation expressions.
Patrick O'Neill [Thu, 2 Nov 2023 21:34:48 +0000 (14:34 -0700)]
g++: Rely on dg-do-what-default to avoid running pr102788.cc on non-vector targets
Testcases in g++.dg/vect rely on check_vect_support_and_set_flags
to set dg-do-what-default and avoid running vector tests on non-vector
targets. The testcase in this patch overwrites the default with
dg-do run.
Removing the dg-do run directive resolves this issue for non-vector
targets (while still running the tests on vector targets).
gcc/testsuite/ChangeLog:
* g++.dg/vect/pr102788.cc: Remove dg-do run directive.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Eric Botcazou [Fri, 10 Nov 2023 17:59:31 +0000 (18:59 +0100)]
Handle constant CONSTRUCTORs in operand_compare
This teaches operand_compare to compare constant CONSTRUCTORs, which is
quite helpful for so-called fat pointers in Ada, i.e. objects that are
semantically pointers but are represented by structures made up of two
pointers. This is modeled on the implementation present in the ICF pass.
gcc/
* fold-const.cc (operand_compare::operand_equal_p) <CONSTRUCTOR>:
Deal with nonempty constant CONSTRUCTORs.
(operand_compare::hash_operand) <CONSTRUCTOR>: Hash DECL_FIELD_OFFSET
and DECL_FIELD_BIT_OFFSET for FIELD_DECLs.
gcc/testsuite/
* gnat.dg/opt103.ads, gnat.dg/opt103.adb: New test.
[IRA]: Check autoinc and memory address after temporary equivalence substitution
My previous RA patches to take register equivalence into account do
temporary register equivalence substitution to find out whether the
equivalence can be consumed by insns. The insn with the substitution is
checked for validity using target-dependent code. This code expects that
autoinc operations work on a register, but this register can be substituted
by equivalent memory. The patch fixes this problem. The patch also adds
a check that the substitution can be consumed in a memory address too.
gcc/ChangeLog:
PR target/112337
* ira-costs.cc (validate_autoinc_and_mem_addr_p): New function.
(equiv_can_be_consumed_p): Use it.
Patrick Palka [Fri, 10 Nov 2023 15:58:06 +0000 (10:58 -0500)]
c++: decltype of (by-value captured reference) [PR79620]
The capture_decltype handling in finish_decltype_type wasn't looking
through implicit INDIRECT_REF (added by convert_from_reference), which
caused us to incorrectly resolve decltype((r)) to float& below. This
patch fixes this, and adds an assert to outer_automatic_var_p to help
prevent such bugs.
We still don't fully accept the example ultimately because for the
decltype inside the lambda's trailing return type, at that point we're
in lambda type scope but not yet in lambda function scope that the
capture_decltype handling looks for (which is an orthogonal bug).
PR c++/79620
gcc/cp/ChangeLog:
* cp-tree.h (STRIP_REFERENCE_REF): Define.
* semantics.cc (outer_var_p): Assert REFERENCE_REF_P is false.
(finish_decltype_type): Look through implicit INDIRECT_REF when
deciding whether to call capture_decltype.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-decltype3.C: New test.
Patrick Palka [Fri, 10 Nov 2023 15:58:04 +0000 (10:58 -0500)]
c++: decltype of capture proxy [PR79378, PR96917]
We typically don't see capture proxies in finish_decltype_type because
process_outer_var_ref is a no-op within an unevaluated context and so a
use of a captured variable within decltype resolves to the captured
variable, not the capture. But we can see them during decltype(auto)
deduction and for decltype of an init-capture, which suggests we need to
handle capture proxies specially within finish_decltype_type after all.
This patch adds such handling.
PR c++/79378
PR c++/96917
gcc/cp/ChangeLog:
* semantics.cc (finish_decltype_type): Handle an id-expression
naming a capture proxy specially.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/decltype-auto7.C: New test.
* g++.dg/cpp1y/lambda-init20.C: New test.
This patch allows an .md iterator to include the contents of
previous iterators, possibly with an extra condition attached.
Too much indirection might become hard to follow, so for the
AArch64 changes I tried to stick to things that seemed likely
to be uncontroversial:
(a) structure iterators that combine modes for different sizes
and vector counts
(b) iterators that explicitly duplicate another iterator
(for iterating over the cross product)
gcc/
* read-rtl.cc (md_reader::read_mapping): Allow iterators to
include other iterators.
* doc/md.texi: Document the change.
* config/aarch64/iterators.md (DREG2, VQ2, TX2, DX2, SX2): Include
the iterator that is being duplicated, rather than reproducing it.
(VSTRUCT_D): Redefine using VSTRUCT_[234]D.
(VSTRUCT_Q): Likewise VSTRUCT_[234]Q.
(VSTRUCT_2QD, VSTRUCT_3QD, VSTRUCT_4QD, VSTRUCT_QD): Redefine using
the individual D and Q iterators.
E.g. consider a total iteration count N = 6 and VF = 4.
The output of SELECT_VL is not guaranteed to be VF in a non-final iteration;
it depends on the hardware implementation.
Suppose we have an RVV CPU core whose vsetvl distributes the workload evenly.
It may process 3 elements in the 1st iteration and 3 elements in the last iteration.
Then the induction variable update here: _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4, 4], ... };
is wrong: it adds VF, which is 4, although we didn't actually process 4 elements.
It should add 3, which is the result of SELECT_VL.
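The arithmetic of this example can be replayed with a small model. The select_vl function below is a hypothetical model of a core that spreads the remaining work evenly, which is only one of the behaviours hardware may choose; iv_starts is likewise an illustrative helper.

```cpp
#include <cassert>
#include <vector>

// Hypothetical SELECT_VL model: spread the remaining elements evenly over
// the iterations that are still needed.
int select_vl(int remaining, int vf) {
  if (remaining <= vf)
    return remaining;
  int iterations = (remaining + vf - 1) / vf;
  return (remaining + iterations - 1) / iterations;
}

// Returns the value in lane 0 of the induction vector at the start of each
// iteration, stepping either by VF or by the selected length.
std::vector<int> iv_starts(int n, int vf, bool step_by_vf) {
  std::vector<int> starts;
  int iv = 0;
  for (int remaining = n; remaining > 0;) {
    starts.push_back(iv);
    int vl = select_vl(remaining, vf);
    iv += step_by_vf ? vf : vl;
    remaining -= vl;
  }
  return starts;
}
```

For N = 6 and VF = 4 the model runs two iterations of 3 elements each. Stepping by VF starts the second iteration at 4 and skips element 3, whereas stepping by the selected length starts it at 3.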
So, here the correct IR should be:
Note that the induction variable IR: _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4, 4], ... }; updates the induction variable
independently of VF (it doesn't care how many elements are processed in the iteration).
The update is loop invariant, so it won't be a problem even if LOOP_VINFO_USING_SELECT_VL_P is true.
Testing passed. OK for trunk?
PR tree-optimization/112438
gcc/ChangeLog:
* tree-vect-loop.cc (vectorizable_induction): Bugfix when
LOOP_VINFO_USING_SELECT_VL_P.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112438.c: New test.
Wilco Dijkstra [Fri, 10 Nov 2023 14:06:50 +0000 (14:06 +0000)]
libatomic: Improve ifunc selection on AArch64
Add support for ifunc selection based on CPUID register. Neoverse N1 supports
atomic 128-bit load/store, so use the FEAT_USCAT ifunc like newer Neoverse
cores.
Reviewed-by: Kyrylo.Tkachov@arm.com
libatomic:
* config/linux/aarch64/host-config.h (ifunc1): Use CPUID in ifunc
selection.
Juzhe-Zhong [Fri, 10 Nov 2023 03:36:51 +0000 (11:36 +0800)]
RISC-V: Add combine optimization by slideup for vec_init vectorization
This patch is a small optimization for vector initialization,
discovered while evaluating benchmarks.
Consider the following case:
void foo3 (int8_t *out, int8_t x, int8_t y)
{
v16qi v = {y, y, y, y, y, y, y, x, x, x, x, x, x, x, x, x};
*(v16qi*)out = v;
}
* config/riscv/riscv-protos.h (enum insn_type): New enum.
* config/riscv/riscv-v.cc
(rvv_builder::combine_sequence_use_slideup_profitable_p): New function.
(expand_vector_init_slideup_combine_sequence): Ditto.
(expand_vec_init): Add slideup combine optimization.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/def.h: Add combine test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/combine-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-7.c: New test.
Robin Dapp [Fri, 10 Nov 2023 07:56:18 +0000 (08:56 +0100)]
vect: Look through pattern stmt in fold_left_reduction.
It appears as if we "look through" a statement pattern in
vect_finish_replace_stmt but not before when we replace the newly
created vector statement's lhs. Then the lhs is the statement pattern's
lhs while in vect_finish_replace_stmt we assert that it's from the
statement the pattern replaced.
This patch uses vect_orig_stmt on the scalar destination's definition so
the replaced statement is used everywhere.
gcc/ChangeLog:
PR tree-optimization/112464
* tree-vect-loop.cc (vectorize_fold_left_reduction): Use
vect_orig_stmt on scalar_dest_def_info.
Jin Ma [Fri, 10 Nov 2023 07:14:31 +0000 (15:14 +0800)]
RISC-V: XTheadMemPair: Fix missing fcsr handling in ISR prologue/epilogue
The t0 register is used as a temporary register in interrupt handlers, so it
needs special treatment. Using "th.ldd" in an interrupt handler must be avoided
so that the subsequent handling of the t0 register is not skipped, so the two
code paths need to exchange positions in the function "riscv_for_each_saved_reg".
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_for_each_saved_reg): Place the interrupt
operation before the XTheadMemPair.
Richard Biener [Fri, 10 Nov 2023 11:39:11 +0000 (12:39 +0100)]
tree-optimization/110221 - SLP and loop mask/len
The following fixes the issue that when SLP stmts are internal defs
but appear invariant because they end up only using invariant defs
then they get scheduled outside of the loop. This nice optimization
breaks down when loop masks or lens are applied since those are not
explicitly tracked as dependences. The following makes sure to never
schedule internal defs outside of the vectorized loop when the
loop uses masks/lens.
PR tree-optimization/110221
* tree-vect-slp.cc (vect_schedule_slp_node): When loop
masking / len is applied make sure to not schedule
internal defs outside of the loop.
Andrew Stubbs [Fri, 20 Oct 2023 15:26:51 +0000 (16:26 +0100)]
vect: Don't set excess bits in uniform masks
AVX ignores any excess bits in the mask (at least for vector sizes >=8), but
AMD GCN magically uses a larger vector than was intended (the smaller sizes are
"fake"), leading to wrong-code.
This patch fixes amdgcn execution failures in gcc.dg/vect/pr81740-1.c,
gfortran.dg/c-interop/contiguous-1.f90,
gfortran.dg/c-interop/ff-descriptor-7.f90, and others.
gcc/ChangeLog:
* expr.cc (store_constructor): Add "and" operation to uniform mask
generation.
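The effect of the added "and" can be modelled on a plain integer mask with one bit per lane. The uniform_mask function is a hypothetical illustration, not the code in store_constructor.

```cpp
#include <cassert>
#include <cstdint>

// Build a uniform mask for nunits lanes: every lane bit is set when value
// is true and clear otherwise, and bits beyond the last lane stay zero.
std::uint64_t uniform_mask(bool value, unsigned nunits) {
  const std::uint64_t all_ones = ~std::uint64_t{0};
  const std::uint64_t lane_bits =
      nunits >= 64 ? all_ones : (std::uint64_t{1} << nunits) - 1;
  // The "and" keeps only the bits that correspond to real lanes.
  return (value ? all_ones : 0) & lane_bits;
}
```

Without the final "and", a true mask for 4 lanes would be all ones, and a target that really operates on a wider vector would also enable the lanes beyond the fourth.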
Andrew Stubbs [Fri, 10 Nov 2023 11:13:55 +0000 (11:13 +0000)]
amdgcn: Fix v_add constraints (pr112308)
The instruction doesn't allow "B" constants for the vop3b encoding (used when
the cc register isn't VCC), so fix the pattern and all the insns that might get
split to it post-reload.
Also switch to the new constraint format for ease of adding new alternatives.
gcc/ChangeLog:
PR target/112308
* config/gcn/gcn-valu.md (add<mode>3<exec_clobber>): Fix B constraint
and switch to the new format.
(add<mode>3_dup<exec_clobber>): Likewise.
(add<mode>3_vcc<exec_vcc>): Likewise.
(add<mode>3_vcc_dup<exec_vcc>): Likewise.
(add<mode>3_vcc_zext_dup): Likewise.
(add<mode>3_vcc_zext_dup_exec): Likewise.
(add<mode>3_vcc_zext_dup2): Likewise.
(add<mode>3_vcc_zext_dup2_exec): Likewise.
Juzhe-Zhong [Fri, 10 Nov 2023 03:33:16 +0000 (11:33 +0800)]
RISC-V: Robustify vec_init pattern[NFC]
Although current GCC doesn't ICE when I create an FP16 vec_init case
with -march=rv64gcv (no ZVFH), the current vec_init pattern looks wrong:
the V_VLS FP16 predicate is TARGET_VECTOR_ELEN_FP_16, whereas vec_init
needs vfslide1down/vfslide1up.
It makes more sense to robustify the vec_init patterns by splitting them
into 2 patterns (one integer, the other float) like other autovectorization patterns.
After this patch:
f_vnx16df:
vsetivli zero,16,e64,m8,ta,ma
vfmv.v.f v16,fa1
vfslide1up.vf v8,v16,fa0
vmv8r.v v16,v8
vfslide1up.vf v8,v16,fa0
vmv8r.v v16,v8
vfslide1up.vf v8,v16,fa0
vs8r.v v8,0(a0)
ret
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vector_init_trailing_same_elem):
New fun impl to expand the insn when trailing same elements.
(expand_vec_init): Try trailing same elements when vec_init.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-9.c: New test.
Brendan Shanks [Fri, 10 Nov 2023 04:01:07 +0000 (21:01 -0700)]
[PATCH v3] libiberty: Use posix_spawn in pex-unix when available.
Hi,
This patch implements pex_unix_exec_child using posix_spawn when
available.
This should especially benefit recent macOS (where vfork just calls
fork), but should have equivalent or faster performance on all
platforms.
In addition, the implementation is substantially simpler than the
vfork+exec code path.
Tested on x86_64-linux.
v2: Fix error handling (previously the function would be run twice in
case of error), and don't use a macro that changes control flow.
v3: Match file style for error-handling blocks, don't close
in/out/errdes on error, and check close() for errors.
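For readers unfamiliar with the API, here is a minimal sketch of launching a
child with posix_spawnp and collecting its exit status. It is not the
pex-unix implementation: the helper name run_child is hypothetical, and the
sketch passes no file actions or spawn attributes, whereas the real code has
to redirect stdin/stdout/stderr.

```c
#include <assert.h>
#include <stddef.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run ARGV[0] (looked up via PATH) and return its
   exit status, or -1 if it could not be spawned or did not exit normally.  */
static int
run_child (char *const argv[])
{
  pid_t pid;
  int status;

  assert (argv != NULL && argv[0] != NULL);

  /* posix_spawnp returns an error number directly instead of setting
     errno; zero means the child was created.  */
  if (posix_spawnp (&pid, argv[0], NULL, NULL, argv, environ) != 0)
    return -1;

  if (waitpid (pid, &status, 0) < 0)
    return -1;

  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Because there is no separate fork step, there is no window in which child-side
code runs in the parent's address space, which is part of why this path is
simpler than vfork+exec.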
Pan Li [Thu, 9 Nov 2023 14:04:39 +0000 (22:04 +0800)]
Internal-fn: Add FLOATN support for l/ll round and rint [PR/112432]
The functions defined by DEF_EXT_LIB_FLOATN_NX_BUILTINS should also
use DEF_INTERNAL_FLT_FLOATN_FN instead of DEF_INTERNAL_FLT_FN for
FLOATN support. According to the glibc API and the gcc builtins, the
table below shows whether FLOATN is supported.
+---------+-------+-------------------------------------+
|         | glibc | gcc: DEF_EXT_LIB_FLOATN_NX_BUILTINS |
+---------+-------+-------------------------------------+
| iceil   | N     | N                                   |
| ifloor  | N     | N                                   |
| irint   | N     | N                                   |
| iround  | N     | N                                   |
| lceil   | N     | N                                   |
| lfloor  | N     | N                                   |
| lrint   | Y     | Y                                   |
| lround  | Y     | Y                                   |
| llceil  | N     | N                                   |
| llfloor | N     | N                                   |
| llrint  | Y     | Y                                   |
| llround | Y     | Y                                   |
+---------+-------+-------------------------------------+
This patch adds FLOATN support for:
1. lrint
2. lround
3. llrint
4. llround
The following tests pass with this patch:
1. x86 bootstrap and regression test.
2. aarch64 regression test.
3. riscv regression tests.
Jeff Law [Fri, 10 Nov 2023 00:34:01 +0000 (17:34 -0700)]
[committed] Improve single bit zero extraction on H8.
When zero extracting a single bit bitfield from bits 16..31 on the H8 we
currently generate some pretty bad code.
The fundamental issue is we can't shift efficiently and there's no trivial way
to extract a single bit out of the high half word of an SImode value.
What usually happens is we use a synthesized right shift to get the single bit
into the desired position, then a bit-and to mask off everything we don't care
about.
The shifts are expensive, even using tricks like half and quarter word moves to
implement shift-by-16 and shift-by-8. Additionally a logical right shift must
clear out the upper bits which is redundant since we're going to mask things
with &1 later.
This patch provides a consistently better sequence for such extractions. The
general form moves the high half into the low half, extracts the bit into C,
clears the destination, then moves C into the destination, with a few special
cases.
This also avoids all the shenanigans for H8/SX which has a much more capable
shifter. It's not single cycle, but it is reasonably efficient.
This has been regression tested on the H8 without issues. Pushing to the trunk
momentarily.
jeff
ps. Yes, supporting zero extraction of multi-bit fields might be improvable as
well. But I've already spent more time on this than I can reasonably justify.
gcc/
* config/h8300/combiner.md (single bit sign_extract): Avoid recently
added patterns for H8/SX.
(single bit zero_extract): New patterns.
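The general form described above can be modelled in C as follows. This is
only a sketch of the data flow, not H8 code: the function names are made up
for illustration, and the "carry" variable stands in for the C flag that the
real sequence loads with a bit instruction rather than a shift.

```c
#include <assert.h>
#include <stdint.h>

/* The straightforward form: shift right, then mask with 1.  */
static uint32_t
extract_bit_shift_mask (uint32_t x, unsigned bit)
{
  return (x >> bit) & 1;
}

/* Model of the new sequence for a bit in the high half word (16..31).  */
static uint32_t
extract_bit_via_carry (uint32_t x, unsigned bit)
{
  assert (bit >= 16 && bit <= 31);

  /* Move the high half into the low half.  */
  uint16_t low = (uint16_t) (x >> 16);

  /* Extract the selected bit into the carry flag (modelled here with a
     shift and mask; no multi-bit shift is needed on the real target).  */
  unsigned carry = (low >> (bit - 16)) & 1;

  /* Clear the destination, then move the carry into it.  */
  uint32_t dest = 0;
  dest |= carry;
  return dest;
}
```

Both functions return the same value for every bit position in 16..31, so no
masking of the upper bits is required after the extraction.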
liuhongt [Thu, 9 Nov 2023 05:20:05 +0000 (13:20 +0800)]
Fix wrong code due to vec_merge + pcmp to blendvb splitter.
gcc/ChangeLog:
PR target/112443
* config/i386/sse.md (*avx2_pcmp<mode>3_4): Fix swap condition
from LT to GT since there's a NOT in the pattern.
(*avx2_pcmp<mode>3_5): Ditto.
Jose E. Marchesi [Fri, 10 Nov 2023 00:12:49 +0000 (01:12 +0100)]
bpf: fix pseudo-c asm emitted for *mulsidi3_zeroextend
This patch fixes the pseudo-c BPF assembly syntax used for
*mulsidi3_zeroextend, which was being emitted as:
rN *= wM
instead of the proper way to denote a mul32 in pseudo-C syntax:
wN *= wM
Includes test.
Tested in bpf-unknown-none-gcc target in x86_64-linux-gnu host.
gcc/ChangeLog:
* config/bpf/bpf.cc (bpf_print_register): Accept modifier code 'W'
to force emitting register names using the wN form.
* config/bpf/bpf.md (*mulsidi3_zeroextend): Force operands to
always use wN written form in pseudo-C assembly syntax.
gcc/testsuite/ChangeLog:
* gcc.target/bpf/mulsidi3-zeroextend-pseudoc.c: New test.
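To make the intended semantics of the wN form concrete, the sketch below
models a 32-bit multiply whose result is zero-extended to 64 bits. The
function name mul32_zeroextend is hypothetical and the code is plain C, not
BPF assembly; it assumes the usual BPF convention that 32-bit ALU operations
clear the upper half of the destination register.

```c
#include <stdint.h>

/* Model of "wN *= wM": only the low 32 bits of each register take part,
   and the 32-bit product is zero-extended into the 64-bit register.  */
static uint64_t
mul32_zeroextend (uint64_t rn, uint64_t rm)
{
  uint32_t wn = (uint32_t) rn;   /* wN is the low half of rN */
  uint32_t wm = (uint32_t) rm;   /* wM is the low half of rM */
  uint32_t product = wn * wm;    /* wraps modulo 2^32 */
  return (uint64_t) product;     /* upper 32 bits are zero */
}
```

With rN = 0xffffffff and rM = 2, the model returns 0xfffffffe, whereas a full
64-bit multiply would give 0x1fffffffe.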
Arsen Arsenović [Wed, 8 Nov 2023 09:22:47 +0000 (10:22 +0100)]
libstdc++: declare std::allocator in !HOSTED as an extension
This allows us to add features to freestanding which allow specifying
non-default allocators (generators, collections, ...) without having to
modify them.
libstdc++-v3/ChangeLog:
* include/bits/memoryfwd.h: Remove HOSTED check around allocator
and its specializations.
David Malcolm [Thu, 9 Nov 2023 22:22:52 +0000 (17:22 -0500)]
diagnostics: cleanups to diagnostic-show-locus.cc
Reduce implicit usage of line_table global, and move source printing to
within diagnostic_context.
gcc/ChangeLog:
* diagnostic-show-locus.cc (layout::m_line_table): New field.
(compatible_locations_p): Convert to...
(layout::compatible_locations_p): ...this, replacing uses of
line_table global with m_line_table.
(layout::layout): Convert "richloc" param from a pointer to a
const reference. Initialize m_line_table member.
(layout::maybe_add_location_range): Replace uses of line_table
global with m_line_table. Pass the latter to
linemap_client_expand_location_to_spelling_point.
(layout::print_leading_fixits): Pass m_line_table to
affects_line_p.
(layout::print_trailing_fixits): Likewise.
(gcc_rich_location::add_location_if_nearby): Update for change
to layout ctor params.
(diagnostic_show_locus): Convert to...
(diagnostic_context::maybe_show_locus): ...this, converting
richloc param from a pointer to a const reference. Make "loc"
const. Split out printing part of function to...
(diagnostic_context::show_locus): ...this.
(selftest::test_offset_impl): Update for change to layout ctor
params.
(selftest::test_layout_x_offset_display_utf8): Likewise.
(selftest::test_layout_x_offset_display_tab): Likewise.
(selftest::test_tab_expansion): Likewise.
* diagnostic.h (diagnostic_context::maybe_show_locus): New decl.
(diagnostic_context::show_locus): New decl.
(diagnostic_show_locus): Convert from a decl to an inline function.
* gdbinit.in (break-on-diagnostic): Update from a breakpoint
on diagnostic_show_locus to one on
diagnostic_context::maybe_show_locus.
* genmatch.cc (linemap_client_expand_location_to_spelling_point):
Add "set" param and use it in place of line_table global.
* input.cc (expand_location_1): Likewise.
(expand_location): Update for new param of expand_location_1.
(expand_location_to_spelling_point): Likewise.
(linemap_client_expand_location_to_spelling_point): Add "set"
param and use it in place of line_table global.
* tree-diagnostic-path.cc (event_range::print): Pass line_table
for new param of linemap_client_expand_location_to_spelling_point.
libcpp/ChangeLog:
* include/line-map.h (rich_location::get_expanded_location): Make
const.
(rich_location::get_line_table): New accessor.
(rich_location::m_line_table): Make the pointer be const.
(rich_location::m_have_expanded_location): Make mutable.
(rich_location::m_expanded_location): Likewise.
(fixit_hint::affects_line_p): Add const line_maps * param.
(linemap_client_expand_location_to_spelling_point): Likewise.
* line-map.cc (rich_location::get_expanded_location): Make const.
Pass m_line_table to
linemap_client_expand_location_to_spelling_point.
(rich_location::maybe_add_fixit): Likewise.
(fixit_hint::affects_line_p): Add set param and pass to
linemap_client_expand_location_to_spelling_point.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Patrick Palka [Thu, 9 Nov 2023 20:15:08 +0000 (15:15 -0500)]
libstdc++: Fix forwarding in __take/drop_of_repeat_view [PR112453]
We need to respect the value category of the repeat_view passed to these
two functions when accessing the view's _M_value member. This revealed
that the space-efficient partial specialization of __box lacks && overloads
of operator* to match those of the primary template (inherited from
std::optional).
PR libstdc++/112453
libstdc++-v3/ChangeLog:
* include/std/ranges (__detail::__box<_Tp>::operator*): Define
&& overloads as well.
(__detail::__take_of_repeat_view): Forward __r when accessing
its _M_value member.
(__detail::__drop_of_repeat_view): Likewise.
* testsuite/std/ranges/repeat/1.cc (test07): New test.
Robin Dapp [Thu, 9 Nov 2023 10:32:30 +0000 (11:32 +0100)]
RISC-V/testsuite: Fix several zvfh tests.
This fixes some zvfh test oversights as well as adds zfh to the target
requirements. It's not strictly necessary to have zfh but it greatly
simplifies test handling when we can just calculate the reference value
instead of working around it.
Uros Bizjak [Thu, 9 Nov 2023 18:47:57 +0000 (19:47 +0100)]
i386: Improve stack protector patterns and peephole2s even more
Improve stack protector patterns and peephole2s even more:
a. Use unrelated register clears with integer mode size <= word
mode size to clear stack protector scratch register.
b. Use unrelated register initializations in front of stack
protector sequence to clear stack protector scratch register.
c. Use unrelated register initializations using LEA instructions
to clear stack protector scratch register.
These stack protector improvements reuse 6914 unrelated register
initializations to substitute the clear of stack protector scratch
register in 12034 instances of stack protector sequence in recent linux
defconfig build.
gcc/ChangeLog:
* config/i386/i386.md (@stack_protect_set_1_<PTR:mode>_<W:mode>):
Use W mode iterator instead of SWI48. Output MOV instead of XOR
for TARGET_USE_MOV0.
(stack_protect_set_1 peephole2): Use integer modes with
mode size <= word mode size for operand 3.
(stack_protect_set_1 peephole2 #2): New peephole2 pattern to
substitute stack protector scratch register clear with unrelated
register initialization, originally in front of stack
protector sequence.
(*stack_protect_set_3_<PTR:mode>_<SWI48:mode>): New insn pattern.
(stack_protect_set_1 peephole2): New peephole2 pattern to
substitute stack protector scratch register clear with unrelated
register initialization involving LEA instruction.
[IRA]: Fixing conflict calculation from region landing pads.
The following patch fixes conflict calculation from exception landing
pads. The previous patch processed only one newly created landing pad.
Besides being wrong, it also resulted in large memory consumption by IRA.
gcc/ChangeLog:
PR rtl-optimization/110215
* ira-lives.cc (add_conflict_from_region_landing_pads): New
function.
(process_bb_node_lives): Use it.
Alexandre Oliva [Thu, 9 Nov 2023 15:26:41 +0000 (12:26 -0300)]
i386 PIE: accept @GOTOFF in load/store multi base address
Looking at the code generated for sse2-{load,store}-multi.c with PIE,
I realized we could use UNSPEC_GOTOFF as a base address, and that this
would enable the test to use the vector insns expected by the tests
even with PIC, so I extended the base + offset logic used by the SSE2
multi-load/store peepholes to accept reg + symbolic base + offset too,
so that the test generated the expected insns even with PIE.
for gcc/ChangeLog
* config/i386/i386.cc (symbolic_base_address_p,
base_address_p): New, factored out from...
(extract_base_offset_in_addr): ... here and extended to
recognize REG+GOTOFF, as in gcc.target/i386/sse2-load-multi.c
and sse2-store-multi.c with PIE enabled by default.
Alexandre Oliva [Thu, 9 Nov 2023 15:26:38 +0000 (12:26 -0300)]
testsuite: xfail scev-[35].c on ia32
These gimplefe tests never got the desired optimization on ia32, but
they only started visibly failing when the representation of MEMs in
dumps changed from printing 'symbol: a' to '&a'.
The transformation is not considered profitable on ia32, that's why it
doesn't take place. Maybe that's a bug in itself, but it's not a
regression, and not something to be noisy about.
for gcc/testsuite/ChangeLog
* gcc.dg/tree-ssa/scev-3.c: xfail on ia32.
* gcc.dg/tree-ssa/scev-5.c: Likewise.
copysign (x, -1) is effectively fneg (abs (x)) which on AArch64 can be
most efficiently done by doing an OR of the signbit.
The middle-end will optimize fneg (abs (x)) now to copysign as the
canonical form and so this optimizes the expansion.
If the target has an inclusive-OR that takes an immediate, then the transformed
instruction is both shorter and faster. For those that don't, the immediate
has to be separately constructed, but this still ends up being faster as the
immediate construction is not on the critical path.
Note that this is part of another patch series, the additional testcases
are mutually dependent on the match.pd patch. As such the tests are added
there instead of here.
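The equivalence the expansion relies on can be checked in plain C. The sketch
below assumes an IEEE 754 binary64 double with the sign in bit 63; the
function name neg_abs_via_or is made up for illustration and this is not the
AArch64 expander itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compute copysign (x, -1), i.e. fneg (abs (x)), by ORing in the sign bit.
   Assumes double is IEEE 754 binary64.  */
static double
neg_abs_via_or (double x)
{
  uint64_t bits;

  assert (sizeof bits == sizeof x);

  /* Reinterpret the double as an integer without violating aliasing.  */
  memcpy (&bits, &x, sizeof bits);

  /* Setting the sign bit forces the value negative whatever its sign was;
     the magnitude bits are untouched.  */
  bits |= UINT64_C (1) << 63;

  memcpy (&x, &bits, sizeof x);
  return x;
}
```

A single OR with a constant replaces the two-step abs followed by negate,
which is why the immediate form is both shorter and faster.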
Tamar Christina [Thu, 9 Nov 2023 14:18:48 +0000 (14:18 +0000)]
AArch64: Use SVE unpredicated LOGICAL expressions when Advanced SIMD inefficient [PR109154]
SVE has a much bigger immediate encoding range for bitmasks than Advanced SIMD
has, so on an SVE-capable system, if we need an Advanced SIMD Inclusive-OR by
immediate that would require a reload, use an unpredicated SVE ORR instead.