Jonathan Wakely [Fri, 9 Jun 2023 11:15:21 +0000 (12:15 +0100)]
libstdc++: Add preprocessor checks to <experimental/internet> [PR100285]
We can't define endpoints and resolvers without the relevant OS support.
If IPPROTO_TCP and IPPROTO_UDP are both undefined then we won't need
basic_endpoint and basic_resolver anyway, so make them depend on those
macros.
libstdc++-v3/ChangeLog:
PR libstdc++/100285
* include/experimental/internet [IPPROTO_TCP || IPPROTO_UDP]
(basic_endpoint, basic_resolver_entry, resolver_base)
(basic_resolver_results, basic_resolver): Only define if the tcp
or udp protocols will be defined.
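A minimal sketch of this kind of guard, assuming hypothetical names (sketch_endpoint, SKETCH_HAVE_ENDPOINT) rather than the real contents of <experimental/internet>:

```cpp
#include <cassert>

// Hypothetical stand-in for the guarded declarations: the type is only
// defined when at least one of the protocol macros is available.
#if defined(IPPROTO_TCP) || defined(IPPROTO_UDP)
struct sketch_endpoint
{
  unsigned short port = 0;
};
# define SKETCH_HAVE_ENDPOINT 1
#else
# define SKETCH_HAVE_ENDPOINT 0
#endif

// Code that needs the type can test the same condition.
inline bool sketch_have_endpoint()
{
  return SKETCH_HAVE_ENDPOINT != 0;
}
```

Users of the type would test the same condition, so nothing refers to a declaration that was compiled out.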
I had intended to support the P2510R3 proposal unconditionally in C++20
mode, but I left it half implemented. The parse function supported the
new extensions, but the format function didn't.
This adds the missing pieces, and makes it only enabled for C++26 and
non-strict modes.
libstdc++-v3/ChangeLog:
PR libstdc++/110149
* include/std/format (formatter<const void*, charT>::parse):
Only allow 0 and P for C++26 and non-strict modes.
(formatter<const void*, charT>::format): Use toupper for P
type, and insert zero-fill characters for 0 option.
* testsuite/std/format/functions/format.cc: Check pointer
formatting. Only check P2510R3 extensions conditionally.
* testsuite/std/format/parse_ctx.cc: Only check P2510R3
extensions conditionally.
Jonathan Wakely [Thu, 8 Jun 2023 11:24:43 +0000 (12:24 +0100)]
libstdc++: Optimize std::to_array for trivial types [PR110167]
As reported in PR libstdc++/110167, std::to_array compiles extremely
slowly for very large arrays. It needs to instantiate a very large
specialization of std::index_sequence<N...> and then create a very large
aggregate initializer from the pack expansion. For trivial types we can
simply default-initialize the std::array and then use memcpy to copy the
values. For non-trivial types we need to use the existing
implementation, despite the compilation cost.
As also noted in the PR, using a generic lambda instead of the
__to_array helper compiles faster since gcc-13. It also produces
slightly smaller code at -O1, due to additional inlining. The code at
-Os, -O2 and -O3 seems to be the same. This new implementation requires
__cpp_generic_lambdas >= 201707L (i.e. P0428R2) but that is supported
since Clang 10 and since Intel icc 2021.5.0 (and since GCC 10.1).
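The idea can be sketched roughly as follows. This is not the libstdc++ code: sketch_to_array and sketch_to_array_slow are hypothetical names, and the non-trivial path uses a plain helper function instead of the lambda described above so that the sketch stays within C++17.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <utility>

// Slow path: build the array from a pack expansion over every index.
template<typename T, std::size_t N, std::size_t... I>
std::array<std::remove_cv_t<T>, N>
sketch_to_array_slow(T (&a)[N], std::index_sequence<I...>)
{
  return {{ a[I]... }};
}

template<typename T, std::size_t N>
std::array<std::remove_cv_t<T>, N>
sketch_to_array(T (&a)[N])
{
  using U = std::remove_cv_t<T>;
  if constexpr (std::is_trivially_copyable_v<U>
                && std::is_trivially_default_constructible_v<U>)
    {
      std::array<U, N> out;                   // default-initialized
      std::memcpy(out.data(), a, sizeof(a));  // copy all N elements at once
      return out;
    }
  else
    return sketch_to_array_slow(a, std::make_index_sequence<N>{});
}
```

The fast path never names the N indices, so the compiler does not need to instantiate a large index sequence for trivial element types.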
libstdc++-v3/ChangeLog:
PR libstdc++/110167
* include/std/array (to_array): Initialize arrays of trivial
types using memcpy. For non-trivial types, use lambda
expressions instead of a separate helper function.
(__to_array): Remove.
* testsuite/23_containers/array/creation/110167.cc: New test.
Richard Biener [Fri, 9 Jun 2023 07:29:09 +0000 (09:29 +0200)]
middle-end/110182 - TYPE_PRECISION on VECTOR_TYPE causes wrong-code
When folding two conversions in a row we use TYPE_PRECISION but
that's invalid for VECTOR_TYPE. The following fixes this by
using element_precision instead.
* match.pd (two conversions in a row): Use element_precision
to DTRT for VECTOR_TYPE.
Jonathan Wakely [Thu, 8 Jun 2023 11:19:26 +0000 (12:19 +0100)]
libstdc++: Improve tests for emplace member of sequence containers
Our existing tests for std::deque::emplace, std::list::emplace and
std::vector::emplace are poor. We only have compile tests for PR 52799
and the equivalent for a const_iterator as the insertion point. This
fails to check that the value is actually inserted correctly and the
right iterator is returned.
Add new tests that cover the existing 52799.cc and const_iterator.cc
compile-only tests, as well as verifying the effects are correct.
libstdc++-v3/ChangeLog:
* testsuite/23_containers/deque/modifiers/emplace/52799.cc:
Removed.
* testsuite/23_containers/deque/modifiers/emplace/const_iterator.cc:
Removed.
* testsuite/23_containers/list/modifiers/emplace/52799.cc:
Removed.
* testsuite/23_containers/list/modifiers/emplace/const_iterator.cc:
Removed.
* testsuite/23_containers/vector/modifiers/emplace/52799.cc:
Removed.
* testsuite/23_containers/vector/modifiers/emplace/const_iterator.cc:
Removed.
* testsuite/23_containers/deque/modifiers/emplace/1.cc: New
test.
* testsuite/23_containers/list/modifiers/emplace/1.cc: New
test.
* testsuite/23_containers/vector/modifiers/emplace/1.cc: New
test.
Pan Li [Fri, 9 Jun 2023 03:19:12 +0000 (11:19 +0800)]
RISC-V: Refactor requirement of ZVFH and ZVFHMIN.
This patch refactors the requirements of both ZVFH and ZVFHMIN.
By default, ZVFHMIN enables FP16 for all the iterators of RVV.
ZVFH then uses one define attr as the gate for whether FP16 is
supported.
Please note that ZVFH covers the ZVFHMIN instructions. This patch
adds one test for this.
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Co-Authored by: Kito Cheng <kito.cheng@sifive.com>
gcc/ChangeLog:
* config/riscv/riscv.md (enabled): Move to another place, and
add fp_vector_disabled to the cond.
(fp_vector_disabled): New attr defined for disabling fp.
* config/riscv/vector-iterators.md: Fix V_WHOLE and V_FRACT.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Add vle16 test
for ZVFHMIN.
Jakub Jelinek [Fri, 9 Jun 2023 07:10:29 +0000 (09:10 +0200)]
fortran: Fix ICE on pr96024.f90 on big-endian hosts [PR96024]
The pr96024.f90 testcase ICEs on big-endian hosts. The problem is
that length->val.integer is accessed after checking
length->expr_type == EXPR_CONSTANT, but it is a CHARACTER constant
which uses length->val.character union member instead and on big-endian
we end up reading constant 0x100000000 rather than some small number
on little-endian and if target doesn't have enough memory for 4 times
that (i.e. 16GB allocation), it ICEs.
2023-06-09 Jakub Jelinek <jakub@redhat.com>
PR fortran/96024
* primary.cc (gfc_convert_to_structure_constructor): Only do
constant string ctor length verification and truncation/padding
if constant length has INTEGER type.
liuhongt [Mon, 5 Jun 2023 04:38:41 +0000 (12:38 +0800)]
Explicitly view_convert_expr mask to signed type when folding pblendvb builtins.
mask < 0 is always false for vector char with -funsigned-char,
but vpblendvb needs to check the most significant bit.
The patch explicitly VCEs the mask to vector signed char.
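A scalar sketch of one byte lane may make the signedness issue clearer. The function name sketch_blend_byte is hypothetical; it only models the selection rule and is not the i386 folding code.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one byte lane of a blend: pick `b` when the mask's
// most significant bit is set, otherwise `a`.
inline std::uint8_t
sketch_blend_byte(std::uint8_t a, std::uint8_t b, std::uint8_t mask)
{
  // Viewing the mask as signed turns "MSB set" into "mask < 0".
  // With an unsigned view, `mask < 0` could never be true.
  return static_cast<std::int8_t>(mask) < 0 ? b : a;
}
```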
gcc/ChangeLog:
PR target/110108
* config/i386/i386.cc (ix86_gimple_fold_builtin): Explicitly
view_convert_expr mask to signed type when folding pblendvb
builtins.
liuhongt [Mon, 5 Jun 2023 03:59:33 +0000 (11:59 +0800)]
Fold _mm{,256,512}_abs_{epi8,epi16,epi32,epi64} into gimple ABSU_EXPR + VCE.
r14-1145 folds the intrinsics into gimple ABS_EXPR which has UB for
TYPE_MIN, but PABSB will store unsigned result into dst. The patch
uses ABSU_EXPR + VCE instead of ABS_EXPR.
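As a scalar illustration, assuming an 8-bit element and a hypothetical helper name sketch_absu8, an absolute value with an unsigned result is well defined even for the minimum value:

```cpp
#include <cassert>
#include <cstdint>

// Absolute value with an unsigned result: defined for every input,
// including the minimum value -128, whose magnitude is 128.
inline std::uint8_t sketch_absu8(std::int8_t x)
{
  std::uint8_t u = static_cast<std::uint8_t>(x);
  // Negate in unsigned arithmetic, which wraps instead of overflowing.
  return x < 0 ? static_cast<std::uint8_t>(0u - u) : u;
}
```

Reinterpreting the unsigned result as a signed byte again models the view-convert step; for the minimum value this yields the same bit pattern as the input.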
Also don't fold _mm_abs_{pi8,pi16,pi32} w/o TARGET_64BIT since 64-bit
vector absm2 is guarded with TARGET_MMX_WITH_SSE.
gcc/ChangeLog:
PR target/110108
* config/i386/i386.cc (ix86_gimple_fold_builtin): Fold
_mm{,256,512}_abs_{epi8,epi16,epi32,epi64} into gimple
ABSU_EXPR + VCE, don't fold _mm_abs_{pi8,pi16,pi32} w/o
TARGET_64BIT.
* config/i386/i386-builtin.def: Replace CODE_FOR_nothing with
real codename for __builtin_ia32_pabs{b,w,d}.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr110108.c: New test.
* gcc.target/i386/pr110108-3.c: New test.
* gcc.target/i386/pr109900.c: Adjust testcase.
Gaius Mulley [Thu, 8 Jun 2023 23:55:50 +0000 (00:55 +0100)]
PR modula2/110126 variables are reported as unused when referenced by ASM
This patch fixes two problems with the asm statement.
gm2 -Wall -c fooasm3.mod generates an incorrect warning and
gm2 cannot concatenate strings before an ASM statement.
The asm statement now accepts a constant expression (rather than
a string) and it updates the variable read/write use lists as
appropriate.
gcc/m2/ChangeLog:
PR modula2/110126
* gm2-compiler/M2GenGCC.mod (BuildTreeFromInterface): Remove
tokenno parameter. Use object tok instead of tokenno.
(BuildTrashTreeFromInterface): Use object tok instead of
GetDeclaredMod.
(CodeInline): Remove tokenno from parameter list to BuildTreeFromInterface.
* gm2-compiler/M2Quads.def (BuildAsmElement): Exported and
defined.
* gm2-compiler/M2Quads.mod (BuildOptimizeOff): Reformatted.
(BuildInline): Reformatted.
(BuildLineNo): Reformatted.
(UseLineNote): Reformatted.
(BuildAsmElement): New procedure.
* gm2-compiler/P0SyntaxCheck.bnf (AsmOperands): Use
ConstExpression instead of string.
(AsmElement): Use ConstExpression instead of string.
(TrashList): Use ConstExpression instead of string.
* gm2-compiler/P1Build.bnf (AsmOperands): Use
ConstExpression instead of string.
(AsmElement): Use ConstExpression instead of string.
(TrashList): Use ConstExpression instead of string.
* gm2-compiler/P2Build.bnf (AsmOperands): Use
ConstExpression instead of string.
(AsmElement): Use ConstExpression instead of string.
(TrashList): Use ConstExpression instead of string.
* gm2-compiler/P3Build.bnf (AsmOperands): Rewrite.
(AsmOperandSpec): Rewrite.
(AsmOutputList): New rule.
(AsmInputList): New rule.
(TrashList): Rewrite.
* gm2-compiler/PCBuild.bnf (AsmOperands): Use
ConstExpression instead of string.
(AsmElement): Use ConstExpression instead of string.
(TrashList): Use ConstExpression instead of string.
* gm2-compiler/PHBuild.bnf (AsmOperands): Use
ConstExpression instead of string.
(AsmElement): Use ConstExpression instead of string.
(TrashList): Use ConstExpression instead of string.
* gm2-compiler/SymbolTable.def (PutRegInterface):
Rewrite interface.
(GetRegInterface): Rewrite interface.
* gm2-compiler/SymbolTable.mod (SetFirstUsed): New procedure.
(PutFirstUsed): New procedure.
(PutRegInterface): Rewrite.
(GetRegInterface): Rewrite.
gcc/testsuite/ChangeLog:
PR modula2/110126
* gm2/pim/pass/fooasm3.mod: New test.
Andrew MacLeod [Wed, 31 May 2023 17:10:31 +0000 (13:10 -0400)]
Provide a new dispatch mechanism for range-ops.
Simplify range_op_handler to have a single range_operator pointer and
provide a more flexible dispatch mechanism for calls via generic vrange
classes. This is more extensible for adding new classes of range support.
Any unsupported dispatch patterns will simply return FALSE now rather
than generating compile time exceptions, alleviating the need to
constantly check for supported types.
* gimple-range-op.cc
(gimple_range_op_handler::gimple_range_op_handler): Adjust.
(gimple_range_op_handler::maybe_builtin_call): Adjust.
* gimple-range-op.h (operand1, operand2): Use m_operator.
* range-op.cc (integral_table, pointer_table): Relocate.
(get_op_handler): Rename from get_handler and handle all types.
(range_op_handler::range_op_handler): Relocate.
(range_op_handler::set_op_handler): Relocate and adjust.
(range_op_handler::range_op_handler): Relocate.
(dispatch_trio): New.
(RO_III, RO_IFI, RO_IFF, RO_FFF, RO_FIF, RO_FII): New consts.
(range_op_handler::dispatch_kind): New.
(range_op_handler::fold_range): Relocate and Use new dispatch value.
(range_op_handler::op1_range): Ditto.
(range_op_handler::op2_range): Ditto.
(range_op_handler::lhs_op1_relation): Ditto.
(range_op_handler::lhs_op2_relation): Ditto.
(range_op_handler::op1_op2_relation): Ditto.
(range_op_handler::set_op_handler): Use m_operator member.
* range-op.h (range_op_handler::operator bool): Use m_operator.
(range_op_handler::dispatch_kind): New.
(range_op_handler::m_valid): Delete.
(range_op_handler::m_int): Delete
(range_op_handler::m_float): Delete
(range_op_handler::m_operator): New.
(range_op_table::operator[]): Relocate from .cc file.
(range_op_table::set): Ditto.
* value-range.h (class vrange): Make range_op_handler a friend.
Andrew MacLeod [Wed, 31 May 2023 16:31:53 +0000 (12:31 -0400)]
Unify range_operators to one class.
Range_operator and range_operator_float are 2 different classes, making
generalized dispatch difficult. The distinction between what is a float
operator and what is an integral operator also blurs when some methods
have multiple types, i.e. casts: INT = FLOAT and FLOAT = INT.
This patch unifies all possible invocation patterns in one class, and
switches the float table to use the general range_op_table.
* gimple-range-op.cc (cfn_constant_float_p): Change base class.
(cfn_pass_through_arg1): Adjust using statement.
(cfn_signbit): Change base class, adjust using statement.
(cfn_copysign): Ditto.
(cfn_sqrt): Ditto.
(cfn_sincos): Ditto.
* range-op-float.cc (fold_range): Change class to range_operator.
(rv_fold): Ditto.
(op1_range): Ditto
(op2_range): Ditto
(lhs_op1_relation): Ditto.
(lhs_op2_relation): Ditto.
(op1_op2_relation): Ditto.
(foperator_*): Ditto.
(class float_table): New. Inherit from range_op_table.
(floating_tree_table): Change to range_op_table pointer.
(class floating_op_table): Delete.
* range-op.cc (operator_equal): Adjust using statement.
(operator_not_equal): Ditto.
(operator_lt, operator_le, operator_gt, operator_ge): Ditto.
(operator_minus, operator_cast): Ditto.
(operator_bitwise_and, pointer_plus_operator): Ditto.
(get_float_handle): Change return type.
* range-op.h (range_operator_float): Delete. Relocate all methods
into class range_operator.
(range_op_handler::m_float): Change type to range_operator.
(floating_op_table): Delete.
(floating_tree_table): Change type.
Andrew MacLeod [Wed, 31 May 2023 14:55:28 +0000 (10:55 -0400)]
Remove tree_code from range-operator.
Range_operator had a tree code added last release to facilitate
bitmask operations. This removes the tree_code and replaces it with a
virtual routine to perform the masking. Remove any duplicate instances
which are no longer needed.
Andrew MacLeod [Wed, 7 Jun 2023 18:03:35 +0000 (14:03 -0400)]
Fix floating point bug in fold_range.
We currently do not have any floating point operators where operand 1 is
a different type than the LHS. When we eventually do there is a bug
in fold_range. If either operand is a known NAN, it returns a NAN
of the type of operand 1 instead of the result type.
* range-op-float.cc (range_operator_float::fold_range): Return
NAN of the result type.
This patch enhances -Wanalyzer-out-of-bounds so that it is no longer paired
with a -Wanalyzer-use-of-uninitialized-value on out-of-bounds-read.
This also fixes PR analyzer/109437.
Previously there could be at most one OOB-read warning per frame because
-Wanalyzer-use-of-uninitialized-value always terminates the analysis
path.
PR 109439
gcc/analyzer/ChangeLog:
* bounds-checking.cc (region_model::check_symbolic_bounds): Returns whether the BASE_REG
region access was OOB.
(region_model::check_region_bounds): Likewise.
* region-model.cc (region_model::get_store_value): Creates an
unknown svalue on OOB-read access to REG.
(region_model::check_region_access): Returns whether an unknown svalue needs be created.
(region_model::check_region_for_read): Passes check_region_access return value.
* region-model.h: Update prior function definitions.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/out-of-bounds-2.c: Cleaned test for uninitialized-value warning
* gcc.dg/analyzer/out-of-bounds-5.c: Likewise.
* gcc.dg/analyzer/pr101962.c: Likewise.
* gcc.dg/analyzer/realloc-5.c: Likewise.
* gcc.dg/analyzer/pr109439.c: New test.
Jakub Jelinek [Thu, 8 Jun 2023 08:13:23 +0000 (10:13 +0200)]
optabs: Implement double-word ctz and ffs expansion
We have had expand_doubleword_clz for a couple of years, where we emit
double-word CLZ as if (high_word == 0) return CLZ (low_word) + word_size;
else return CLZ (high_word);
We can do something similar for CTZ and FFS IMHO, just with the 2
words swapped. So if (low_word == 0) return CTZ (high_word) + word_size;
else return CTZ (low_word); for CTZ and
if (low_word == 0) return high_word ? FFS (high_word) + word_size : 0;
else return FFS (low_word); for FFS.
The following patch implements that.
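The three expansions above can be written out as a small sketch, assuming a 32-bit word. All names (kWordBits, clz_word, ctz_word, ffs_word and the *_double functions) are hypothetical stand-ins and not the optabs.cc code. The word-sized helpers return the word size for a zero input, which is a choice made only for this sketch.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kWordBits = 32;

// Word-sized helpers, written as portable loops for the sketch.
inline int clz_word(std::uint32_t w)
{
  int n = 0;
  for (std::uint32_t bit = 0x80000000u; bit != 0 && !(w & bit); bit >>= 1)
    ++n;
  return n;
}

inline int ctz_word(std::uint32_t w)
{
  int n = 0;
  for (std::uint32_t bit = 1u; bit != 0 && !(w & bit); bit <<= 1)
    ++n;
  return n;
}

inline int ffs_word(std::uint32_t w)
{
  return w ? ctz_word(w) + 1 : 0;
}

// Double-word versions built from the word-sized ones.
inline int clz_double(std::uint32_t high, std::uint32_t low)
{
  return high == 0 ? clz_word(low) + kWordBits : clz_word(high);
}

inline int ctz_double(std::uint32_t high, std::uint32_t low)
{
  return low == 0 ? ctz_word(high) + kWordBits : ctz_word(low);
}

inline int ffs_double(std::uint32_t high, std::uint32_t low)
{
  if (low == 0)
    return high ? ffs_word(high) + kWordBits : 0;
  return ffs_word(low);
}
```

Only ffs_double needs the extra test of the high word, because FFS is defined to return 0 for a zero argument.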
Note, on some targets which implement both word_mode ctz and ffs patterns,
it might be better to incrementally implement those double-word ffs expansion
patterns in md files, because we aren't able to optimize it correctly;
nothing can detect we have just made sure that argument is not 0 and so
don't need to bother with handling that case. So, on ia32 just using
CTZ patterns would be better there, but I think we can even do better and
instead of doing the comparisons of the operands against 0 do the CTZ
expansion followed by testing of flags.
2023-06-08 Jakub Jelinek <jakub@redhat.com>
* optabs.cc (expand_ffs): Add forward declaration.
(expand_doubleword_clz): Rename to ...
(expand_doubleword_clz_ctz_ffs): ... this. Add UNOPTAB argument,
handle also doubleword CTZ and FFS in addition to CLZ.
(expand_unop): Adjust caller. Also call it for doubleword
ctz_optab and ffs_optab.
* gcc.target/i386/ctzll-1.c: New test.
* gcc.target/i386/ffsll-1.c: New test.
Jakub Jelinek [Thu, 8 Jun 2023 08:11:25 +0000 (10:11 +0200)]
i386: Fix endless recursion in ix86_expand_vector_init_general with MMX [PR110152]
I'm getting
+FAIL: gcc.target/i386/3dnow-1.c (internal compiler error: Segmentation fault signal terminated program cc1)
+FAIL: gcc.target/i386/3dnow-1.c (test for excess errors)
+FAIL: gcc.target/i386/3dnow-2.c (internal compiler error: Segmentation fault signal terminated program cc1)
+FAIL: gcc.target/i386/3dnow-2.c (test for excess errors)
+FAIL: gcc.target/i386/mmx-1.c (internal compiler error: Segmentation fault signal terminated program cc1)
+FAIL: gcc.target/i386/mmx-1.c (test for excess errors)
+FAIL: gcc.target/i386/mmx-2.c (internal compiler error: Segmentation fault signal terminated program cc1)
+FAIL: gcc.target/i386/mmx-2.c (test for excess errors)
regressions on i686-linux since r14-1166. The problem is when
ix86_expand_vector_init_general is called with mmx_ok = true and
mode = V4HImode, it newly recurses with mmx_ok = false and mode = V2SImode,
but as mmx_ok is false and !TARGET_SSE, we recurse again with the same
arguments (ok, fresh new tmp and vals) infinitely.
The following patch fixes that by passing mmx_ok to that recursive call.
For n_words == 4 it isn't needed, because we only care about mmx_ok for
V2SImode or V2SFmode and no other modes.
2023-06-08 Jakub Jelinek <jakub@redhat.com>
PR target/110152
* config/i386/i386-expand.cc (ix86_expand_vector_init_general): For
n_words == 2 recurse with mmx_ok as first argument rather than false.
Paul Thomas [Thu, 8 Jun 2023 06:11:32 +0000 (07:11 +0100)]
Fortran: Fix some more blockers in associate meta-bug [PR87477]
2023-06-08 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/87477
PR fortran/99350
PR fortran/107821
PR fortran/109451
* decl.cc (char_len_param_value): Simplify a copy of the expr
and replace the original if there is no error.
* gfortran.h : Remove the redundant field 'rankguessed' from
'gfc_association_list'.
* resolve.cc (resolve_assoc_var): Remove refs to 'rankguessed'.
(resolve_variable): Associate names with constant or structure
constructor targets cannot have array refs.
* trans-array.cc (gfc_conv_expr_descriptor): Guard expression
character length backend decl before using it. Suppress the
assignment if lhs equals rhs.
* trans-io.cc (gfc_trans_transfer): Scalarize transfer of
associate variables pointing to a variable. Add comment.
* trans-stmt.cc (trans_associate_var): Remove requirement that
the character length be deferred before assigning the value
returned by gfc_conv_expr_descriptor. Also, guard the backend
decl before testing with VAR_P.
gcc/testsuite/
PR fortran/99350
* gfortran.dg/pr99350.f90 : New test.
Roger Sayle [Wed, 7 Jun 2023 22:40:56 +0000 (23:40 +0100)]
[Committed] Bug fix to new wi::bitreverse_large function.
Richard Sandiford was, of course, right to be wary of new code without
much test coverage. Converting the nvptx backend to use the BITREVERSE
rtx infrastructure, has resulted in far more exhaustive testing and
revealed a subtle bug in the new wi::bitreverse implementation. The
code needs to use HOST_WIDE_INT_1U (instead of 1) to avoid unintended
sign extension.
This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
(with a minor tweak to use BITREVERSE), where it fixes regressions of
the 32-bit test vectors in gcc.target/nvptx/brev-2.c and the 64-bit
test vectors in gcc.target/nvptx/brevll-2.c. Committed as obvious.
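A simplified 64-bit sketch of the idea, not the wide-int code itself (sketch_bitreverse64 is a hypothetical name): every single-bit mask is built from an unsigned 64-bit one, so no shift happens in a narrower signed type and nothing can be sign-extended when the mask is widened.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the bit order of a 64-bit value one bit at a time.
inline std::uint64_t sketch_bitreverse64(std::uint64_t x)
{
  std::uint64_t r = 0;
  for (int i = 0; i < 64; ++i)
    if (x & (std::uint64_t{1} << i))     // unsigned 64-bit one, not plain 1
      r |= std::uint64_t{1} << (63 - i);
  return r;
}
```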
2023-06-07 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* wide-int.cc (wi::bitreverse_large): Use HOST_WIDE_INT_1U to
avoid sign extension/undefined behaviour when setting each bit.
Roger Sayle [Wed, 7 Jun 2023 22:35:15 +0000 (23:35 +0100)]
Add support for stc and cmc instructions in i386.md
This patch is the latest revision of my patch to add support for the
STC (set carry flag) and CMC (complement carry flag) instructions to
the i386 backend, incorporating Uros' previous feedback. The significant
changes are (i) the inclusion of CMC, (ii) the use of UNSPEC for pattern,
(iii) Use of a new X86_TUNE_SLOW_STC tuning flag to use alternate
implementations on pentium4 (which has a notoriously slow STC) when
not optimizing for size.
An example of the use of the stc instruction is:
unsigned int foo (unsigned int a, unsigned int b, unsigned int *c) {
return __builtin_ia32_addcarryx_u32 (1, a, b, c);
}
with this patch now generates:
stc
adcl %esi, %edi
setc %al
movl %edi, (%rdx)
movzbl %al, %eax
ret
An example of the use of the cmc instruction (where the carry from
a first adc is inverted/complemented as input to a second adc) is:
unsigned int o1, o2;
unsigned int bar (unsigned int a, unsigned int b,
unsigned int c, unsigned int d)
{
unsigned int c1 = __builtin_ia32_addcarryx_u32 (1, a, b, &o1);
return __builtin_ia32_addcarryx_u32 (c1 ^ 1, c, d, &o2);
}
and now generates:
stc
adcl %esi, %edi
cmc
movl %edi, o1(%rip)
adcl %ecx, %edx
setc %al
movl %edx, o2(%rip)
movzbl %al, %eax
ret
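For readers unfamiliar with add-with-carry, here is a portable model of the arithmetic in the two examples. The names sketch_addcarry_u32 and sketch_bar are hypothetical; this is not the builtin, only a sketch of its carry-in and carry-out behaviour.

```cpp
#include <cassert>
#include <cstdint>

// Add a, b and an incoming carry; store the low 32 bits and
// return the outgoing carry (0 or 1).
inline unsigned
sketch_addcarry_u32(unsigned carry_in, std::uint32_t a, std::uint32_t b,
                    std::uint32_t *out)
{
  std::uint64_t wide = std::uint64_t{a} + b + (carry_in ? 1u : 0u);
  *out = static_cast<std::uint32_t>(wide);
  return static_cast<unsigned>(wide >> 32);
}

// Same shape as bar(): a constant carry-in of 1 (the stc case), then
// the complemented carry fed into a second addition (the cmc case).
inline unsigned
sketch_bar(std::uint32_t a, std::uint32_t b, std::uint32_t c,
           std::uint32_t d, std::uint32_t *o1, std::uint32_t *o2)
{
  unsigned c1 = sketch_addcarry_u32(1, a, b, o1);
  return sketch_addcarry_u32(c1 ^ 1, c, d, o2);
}
```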
This version implements Uros' suggestions/refinements. (i) Avoid the
UNSPEC_CMC by using the canonical RTL idiom for *x86_cmc, (ii) Use
peephole2s to convert x86_stc and *x86_cmc into alternate forms on
TARGET_SLOW_STC CPUs (pentium4), when a suitable QImode register is
available, (iii) Prefer the addqi_cconly_overflow idiom (addb $-1,%al)
over negqi_ccc_1 (neg %al) for setting the carry from a QImode value.
These changes required two minor edits to i386.cc: ix86_cc_mode had
to be tweaked to suggest CCCmode for the new *x86_cmc pattern, and
*x86_cmc needed to be handled/parameterized in ix86_rtx_costs so that
combine would appreciate that this complex RTL expression was actually
a fast, single byte instruction [i.e. preferable].
2022-06-07 Roger Sayle <roger@nextmovesoftware.com>
Uros Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_builtin) <handlecarry>:
Use new x86_stc instruction when the carry flag must be set.
* config/i386/i386.cc (ix86_cc_mode): Use CCCmode for *x86_cmc.
(ix86_rtx_costs): Provide accurate rtx_costs for *x86_cmc.
* config/i386/i386.h (TARGET_SLOW_STC): New define.
* config/i386/i386.md (UNSPEC_STC): New UNSPEC for stc.
(x86_stc): New define_insn.
(define_peephole2): Convert x86_stc into alternate implementation
on pentium4 without -Os when a QImode register is available.
(*x86_cmc): New define_insn.
(define_peephole2): Convert *x86_cmc into alternate implementation
on pentium4 without -Os when a QImode register is available.
(*setccc): New define_insn_and_split for a no-op CCCmode move.
(*setcc_qi_negqi_ccc_1_<mode>): New define_insn_and_split to
recognize (and eliminate) the carry flag being copied to itself.
(*setcc_qi_negqi_ccc_2_<mode>): Likewise.
* config/i386/x86-tune.def (X86_TUNE_SLOW_STC): New tuning flag.
gcc/testsuite/ChangeLog
* gcc.target/i386/cmc-1.c: New test case.
* gcc.target/i386/stc-1.c: Likewise.
Jason Merrill [Wed, 7 Jun 2023 09:15:02 +0000 (05:15 -0400)]
c++: allow NRV and non-NRV returns [PR58487]
Now that we support NRV from an inner block, we can also support non-NRV
returns from other blocks, since once the NRV is out of scope a later return
expression can't possibly alias it.
This fixes 58487 and half-fixes 53637: now one of the returns is elided, but
not the other.
Fixing the remaining xfails in these testcases will require a very different
approach, probably involving a full tree/block walk from finalize_nrv, and
check_return_expr only adding to a list of potential return variables.
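The shape of code this affects might look like the sketch below. SketchValue and sketch_make are hypothetical names. Whether the copy is actually elided is an optimization decision that the sketch does not observe; only the returned values are checked.

```cpp
#include <cassert>

struct SketchValue
{
  int value;
  explicit SketchValue(int v) : value(v) {}
};

SketchValue sketch_make(bool early)
{
  {
    SketchValue t(1);      // named return value candidate in an inner block
    if (early)
      return t;            // NRV-style return of the named variable
  }
  // `t` is out of scope here, so this return cannot alias it.
  return SketchValue(2);   // non-NRV return from another block
}
```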
PR c++/58487
PR c++/53637
gcc/cp/ChangeLog:
* cp-tree.h (INIT_EXPR_NRV_P): New.
* semantics.cc (finalize_nrv_r): Check it.
* name-lookup.h (decl_in_scope_p): Declare.
* name-lookup.cc (decl_in_scope_p): New.
* typeck.cc (check_return_expr): Allow non-NRV
returns if the NRV is no longer in scope.
gcc/testsuite/ChangeLog:
* g++.dg/opt/nrv26.C: New test.
* g++.dg/opt/nrv26a.C: New test.
* g++.dg/opt/nrv27.C: New test.
Jeff Law [Wed, 7 Jun 2023 19:40:16 +0000 (13:40 -0600)]
RISC-V: Eliminate extension after for *w instructions
This patch tries to prevent generating unnecessary sign extension
after *w instructions like "addiw" or "divw".
The main idea of it is to add SUBREG_PROMOTED fields during expanding.
I have tested on SPEC2017 and there is no regression.
Only gcc.dg/pr30957-1.c test failed.
To solve that I did some changes in loop-iv.cc, but not sure that it is
suitable.
gcc/ChangeLog:
* config/riscv/bitmanip.md (rotrdi3, rotrsi3, rotlsi3): New expanders.
(rotrsi3_sext): Expose generator.
(rotlsi3 pattern): Hide generator.
* config/riscv/riscv-protos.h (riscv_emit_binary): New function
declaration.
* config/riscv/riscv.cc (riscv_emit_binary): Removed static
* config/riscv/riscv.md (addsi3, subsi3, negsi2): Hide generator.
(mulsi3, <optab>si3): Likewise.
(addsi3, subsi3, negsi2, mulsi3, <optab>si3): New expanders.
(addv<mode>4, subv<mode>4, mulv<mode>4): Use riscv_emit_binary.
(<u>mulsidi3): Likewise.
(addsi3_extended, subsi3_extended, negsi2_extended): Expose generator.
(mulsi3_extended, <optab>si3_extended): Likewise.
(splitter for shadd feeding division): Update RTL pattern to account
for changes in how 32 bit ops are expanded for TARGET_64BIT.
* loop-iv.cc (get_biv_step_1): Process src of extension when it is PLUS.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/shift-and-2.c: New tests.
* gcc.target/riscv/shift-shift-2.c: Adjust expected output.
* gcc.target/riscv/sign-extend.c: New test.
* gcc.target/riscv/zbb-rol-ror-03.c: Adjust expected output.
Fix by following the spirit of the adjacent comment, and using the
dedicated riscv_const_insns() function to calculate cost for loading a
constant element. Infinite recursion is not possible because the first
invocation is on a CONST_VECTOR, whereas the second is on a single
element of the vector (e.g. CONST_INT or CONST_DOUBLE).
Regression tested for riscv32-none-elf. No changes in gcc.sum and
g++.sum.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_const_insns): Recursively call
for constant element of a vector.
Jakub Jelinek [Wed, 7 Jun 2023 17:27:35 +0000 (19:27 +0200)]
libstdc++: Fix up 20_util/to_chars/double.cc test for excess precision [PR110145]
This test apparently contains 3 problematic floating point constants,
1e126, 4.91e-6 and 5.547e-6. These constants suffer from double rounding
when -fexcess-precision=standard evaluates double constants in the precision
of Intel extended 80-bit long double.
As written in the PR, e.g. the first one is
0x1.7a2ecc414a03f7ff6ca1cb527787b130a97d51e51202365p+418
in the precision of GCC's internal format, 80-bit long double has
63-bit precision, so the above constant rounded to long double is
0x1.7a2ecc414a03f800p+418L
(the least significant bit in the 0 before p isn't there already).
0x1.7a2ecc414a03f800p+418L rounded to IEEE double is
0x1.7a2ecc414a040p+418.
Now, if excess precision doesn't happen and we round the GCC's internal
format number directly to double, it is
0x1.7a2ecc414a03fp+418 and that is the number the test expects.
One can see it on x86-64 (where excess precision to long double doesn't
happen) where double(1e126L) != 1e126.
The other two constants suffer from the same problem.
The following patch tweaks the testcase so that those problematic
constants are used only if FLT_EVAL_METHOD is 0 or 1 (i.e. when we have
a guarantee the constants will be evaluated in double precision),
and adds corresponding tests with hexadecimal constants, which don't
suffer from this excess precision problem: they are exact in double
and long double can hold all double values.
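A small sketch of both halves of that approach, using the first constant from this message. The function names are hypothetical, and hexadecimal floating literals assume C++17 or a compiler extension.

```cpp
#include <cassert>
#include <cfloat>

// Hexadecimal constant from the message above: exact in double.
inline double sketch_hex_1e126()
{
  return 0x1.7a2ecc414a03fp+418;
}

// Widening an exact double to long double and narrowing it back
// cannot change its value.
inline bool sketch_hex_round_trips()
{
  double d = sketch_hex_1e126();
  long double wide = d;
  return static_cast<double>(wide) == d;
}

// Only compare against the decimal constant when it is known to be
// evaluated in double precision.
inline bool sketch_decimal_matches_hex()
{
#if defined(FLT_EVAL_METHOD) && (FLT_EVAL_METHOD == 0 || FLT_EVAL_METHOD == 1)
  return 1e126 == sketch_hex_1e126();
#else
  return true;  // the decimal constant may be double-rounded; skip it
#endif
}
```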
2023-06-07 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/110145
* testsuite/20_util/to_chars/double.cc: Include <cfloat>.
(double_to_chars_test_cases,
double_scientific_precision_to_chars_test_cases_2,
double_fixed_precision_to_chars_test_cases_2): #if out 1e126, 4.91e-6
and 5.547e-6 tests if FLT_EVAL_METHOD is negative or larger than 1.
Add unconditional tests with corresponding double constants
0x1.7a2ecc414a03fp+418, 0x1.4981285e98e79p-18 and
0x1.7440bbff418b9p-18.
Jakub Jelinek [Wed, 7 Jun 2023 17:18:26 +0000 (19:18 +0200)]
match.pd: Improve zero_one_valued_p
Recently zero_one_valued_p was changed to handle integer_zerop
case specially, because tree_nonzero_bits (@0) == 1 only returns
true for non-constant values with range [0, 1] or constant 1,
constant 0 has tree_nonzero_bits (integer_zero_node) == 0.
The following patch reverts that change and instead checks
that tree_nonzero_bits is <= 1U.
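In terms of a plain bit mask, the difference between the two checks can be sketched like this. The names are hypothetical; `nonzero_bits` stands in for the mask of bits that may be set in a value.

```cpp
#include <cassert>
#include <cstdint>

// Old-style check: misses the constant 0, whose mask is 0.
inline bool sketch_old_check(std::uint64_t nonzero_bits)
{
  return nonzero_bits == 1u;
}

// New-style check: masks 0 and 1 both describe values in [0, 1].
inline bool sketch_zero_one_valued(std::uint64_t nonzero_bits)
{
  return nonzero_bits <= 1u;
}
```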
2023-06-07 Jakub Jelinek <jakub@redhat.com>
* match.pd (zero_one_valued_p): Don't handle integer_zerop specially,
instead compare tree_nonzero_bits <= 1U rather than just == 1.
Alex Coplan [Tue, 6 Jun 2023 14:19:03 +0000 (15:19 +0100)]
aarch64: Allow compiler to define ls64 builtins [PR110132]
This patch refactors the ls64 builtins to allow the compiler to define them
directly instead of having wrapper functions in arm_acle.h. This should be not
only easier to maintain, but it makes two important correctness fixes:
- It fixes PR110132, where the builtins ended up getting declared with
invisible bindings in the C FE, so the FE ended up synthesizing
incompatible implicit definitions for these builtins.
- It allows the builtins to be used with LTO, which didn't work previously.
We also take the opportunity to add test coverage from C++ for these
builtins.
gcc/ChangeLog:
PR target/110132
* config/aarch64/aarch64-builtins.cc (aarch64_general_simulate_builtin):
New. Use it ...
(aarch64_init_ls64_builtins): ... here. Switch to declaring public ACLE
names for builtins.
(aarch64_general_init_builtins): Ensure we invoke the arm_acle.h
setup if in_lto_p, just like we do for SVE.
* config/aarch64/arm_acle.h (__arm_ld64b): Delete.
(__arm_st64b): Delete.
(__arm_st64bv): Delete.
(__arm_st64bv0): Delete.
gcc/testsuite/ChangeLog:
PR target/110132
* lib/target-supports.exp (check_effective_target_aarch64_asm_FUNC_ok):
Extend to ls64.
* g++.target/aarch64/acle/acle.exp: New.
* g++.target/aarch64/acle/ls64.C: New test.
* g++.target/aarch64/acle/ls64_lto.C: New test.
* gcc.target/aarch64/acle/ls64_lto.c: New test.
* gcc.target/aarch64/acle/pr110132.c: New test.
Alex Coplan [Tue, 6 Jun 2023 10:52:19 +0000 (11:52 +0100)]
aarch64: Fix wrong code with st64b builtin [PR110100]
The st64b pattern incorrectly had an output constraint on the register
operand containing the destination address for the store, leading to
wrong code. This patch fixes that.
gcc/ChangeLog:
PR target/110100
* config/aarch64/aarch64-builtins.cc (aarch64_expand_builtin_ls64):
Use input operand for the destination address.
* config/aarch64/aarch64.md (st64b): Fix constraint on address
operand.
gcc/testsuite/ChangeLog:
PR target/110100
* gcc.target/aarch64/acle/pr110100.c: New test.
Florian Weimer [Tue, 6 Jun 2023 09:01:07 +0000 (11:01 +0200)]
libgcc: Fix eh_frame fast path in find_fde_tail
The eh_frame value is only used by linear_search_fdes, not the binary
search directly in find_fde_tail, so the bug is not immediately
apparent with most programs.
Jonathan Wakely [Tue, 6 Jun 2023 15:09:29 +0000 (16:09 +0100)]
libstdc++: Fix some tests that fail with -fexcess-precision=standard
libstdc++-v3/ChangeLog:
* testsuite/20_util/duration/cons/2.cc: Use values that aren't
affected by rounding.
* testsuite/20_util/from_chars/5.cc: Cast arithmetic result to
double before comparing for equality.
* testsuite/20_util/from_chars/6.cc: Likewise.
* testsuite/20_util/variant/86874.cc: Use values that aren't
affected by rounding.
* testsuite/25_algorithms/lower_bound/partitioned.cc: Compare to
original value instead of to floating-point-literal.
* testsuite/26_numerics/random/discrete_distribution/cons/range.cc:
Cast arithmetic result to double before comparing for equality.
* testsuite/26_numerics/random/piecewise_constant_distribution/cons/range.cc:
Likewise.
* testsuite/26_numerics/random/piecewise_linear_distribution/cons/range.cc:
Likewise.
* testsuite/26_numerics/valarray/transcend.cc (eq): Check that
the absolute difference is less than 0.01 instead of comparing
to two decimal places.
* testsuite/27_io/basic_istream/extractors_arithmetic/char/01.cc:
Cast arithmetic result to double before comparing for equality.
* testsuite/27_io/basic_istream/extractors_arithmetic/char/09.cc:
Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/char/10.cc:
Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/wchar_t/01.cc:
Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/wchar_t/09.cc:
Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/wchar_t/10.cc:
Likewise.
* testsuite/ext/random/hoyt_distribution/cons/parms.cc: Likewise.
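The two recurring fixes in the list above can be sketched as follows. This is a minimal illustration, not code from the testsuite: the helper names product_equals and approx_eq are hypothetical, and only the 0.01 tolerance is taken from the transcend.cc entry.

```cpp
// Sketch only: helper names are illustrative, not from the libstdc++ testsuite.

// With -fexcess-precision=standard, x * y may be evaluated in a format
// wider than double. The cast rounds the result to double before the
// equality comparison.
constexpr bool product_equals(double x, double y, double expected)
{
  return static_cast<double>(x * y) == expected;
}

// Compare with an absolute tolerance instead of exact equality.
constexpr bool approx_eq(double a, double b)
{
  return (a < b ? b - a : a - b) < 0.01;
}
```

A test would then call product_equals (x, y, expected) or approx_eq (a, b) where it previously compared with == directly.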
RA: Constrain class of pic offset table pseudo to general regs
On some targets an integer pseudo can be assigned to a FP reg. For
pic offset table pseudo it means we will reload the pseudo in this
case and, as a consequence, memory containing the pseudo might be
recognized as the wrong one. This patch fixes the problem.
PR target/109541
gcc/ChangeLog:
* ira-costs.cc (find_costs_and_classes): Constrain classes of pic
offset table pseudo to a general reg subset.
Kyrylo Tkachov [Wed, 7 Jun 2023 15:20:57 +0000 (16:20 +0100)]
aarch64: Represent SQXTUN with RTL operations
This patch removes UNSPEC_SQXTUN and uses organic RTL codes to represent the operation.
SQXTUN is an odd one. It's described in the architecture as "Signed saturating extract Unsigned Narrow".
It's not a straightforward ss_truncate nor a us_truncate.
It is a sort of truncating signed clamp operation with limits derived from the unsigned extrema of the narrow mode:
(truncate:N
(smin:M
(smax:M (reg:M) (const_int 0))
(const_int <unsigned-max-for-mode-N>)))
This patch implements these semantics. I've checked that the vqmovun tests in advsimd-intrinsics.exp
now get constant-folded and still pass validation, so I'm pretty confident in the semantics.
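As a scalar illustration of the RTL above, here is a sketch of one lane narrowing from 16 bits to 8 bits, so M is a 16-bit mode, N an 8-bit mode and the unsigned maximum is 255. The function name sqxtun_lane is hypothetical.

```cpp
#include <cstdint>

// Scalar model of one lane, 16-bit to 8-bit:
//   (truncate:N (smin:M (smax:M x 0) 255))
// Negative inputs clamp to 0, inputs above 255 clamp to 255.
constexpr std::uint8_t sqxtun_lane(std::int16_t x)
{
  return static_cast<std::uint8_t>(x < 0 ? 0 : (x > 255 ? 255 : x));
}
```

The input is treated as signed while the clamp limits are those of the unsigned narrow type, which is why neither ss_truncate nor us_truncate describes it.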
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_sqmovun<mode><vczle><vczbe>):
Rename to...
(*aarch64_sqmovun<mode>_insn<vczle><vczbe>): ... This. Reimplement
with RTL codes.
(aarch64_sqmovun<mode> [SD_HSDI]): Reimplement with RTL codes.
(aarch64_sqxtun2<mode>_le): Likewise.
(aarch64_sqxtun2<mode>_be): Likewise.
(aarch64_sqxtun2<mode>): Adjust for the above.
(aarch64_sqmovun<mode>): New define_expand.
* config/aarch64/iterators.md (UNSPEC_SQXTUN): Delete.
(half_mask): New mode attribute.
* config/aarch64/predicates.md (aarch64_simd_umax_half_mode):
New predicate.
Kyrylo Tkachov [Wed, 7 Jun 2023 15:18:01 +0000 (16:18 +0100)]
aarch64: Improve RTL representation of ADDP instructions
Similar to the ADDLP instructions the non-widening ADDP ones can be
represented by adding the odd lanes with the even lanes of a vector.
These instructions take two vector inputs and the architecture spec
describes the operation as concatenating them together before going
through it with pairwise additions.
This patch chooses to represent ADDP on 64-bit and 128-bit input
vectors slightly differently, reasons explained in the comments
in aarch64-simd.md.
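The architectural description (concatenate, then add pairs) can be modelled in scalar code. This is only a sketch of the semantics for two 4-lane inputs, not of the RTL patterns in the patch; the function name addp is hypothetical.

```cpp
#include <array>

// Scalar model of ADDP on two 4-lane vectors: concatenate a and b,
// then add each even lane to the odd lane that follows it.
constexpr std::array<int, 4> addp(const std::array<int, 4>& a,
                                  const std::array<int, 4>& b)
{
  return {{ a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3] }};
}
```

The low half of the result comes from pairs of the first input and the high half from pairs of the second.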
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_addp<mode><vczle><vczbe>):
Reimplement as...
(aarch64_addp<mode>_insn): ... This...
(aarch64_addp<mode><vczle><vczbe>_insn): ... And this.
(aarch64_addp<mode>): New define_expand.
Jeff Law [Wed, 7 Jun 2023 13:55:32 +0000 (07:55 -0600)]
Fix expected test output on hppa
Recent changes in the hoisting code change the optimized gimple for the
shadd-3 testcase on the PA. That in turn changes the number of expected
shadd instructions.
I'm not entirely sure the test is actually testing what we want anymore
since I don't see a CSE for postreload to discover. But I did verify
that the number of shadd instructions is sane, so I just changed the
count in the obvious way.
Tobias Burnus [Wed, 7 Jun 2023 11:22:13 +0000 (13:22 +0200)]
testsuite/libgomp.*/target-present-*.{c,f90}: Improve and fix
One of the testcases lacked variables in a map clause such that
the failure occurred too early. Additionally, it would have failed
for all those non-host devices where 'present' is always true, i.e.
non-host devices which can access all of the host memory
(shared-memory devices). [There are currently none.]
The commit now runs the code on all devices, which should succeed
for host fallback and for shared-memory devices, finding potential issues
that way. Additionally, a checkpoint (required stdout output) is used
to ensure that the execution won't fail (with the same error) before
reaching the expected fail location.
2023-06-07 Thomas Schwinge <thomas@codesourcery.com>
Tobias Burnus <tobias@codesourcery.com>
libgomp/
* testsuite/libgomp.c-c++-common/target-present-1.c: Run code
also for non-offload_device targets; check that it runs
successfully for those and for all until a checkpoint for all
* testsuite/libgomp.c-c++-common/target-present-2.c: Likewise.
* testsuite/libgomp.c-c++-common/target-present-3.c: Likewise.
* testsuite/libgomp.fortran/target-present-1.f90: Likewise.
* testsuite/libgomp.fortran/target-present-3.f90: Likewise.
* testsuite/libgomp.fortran/target-present-2.f90: Likewise;
add missing vars to map clause.
Thomas Schwinge [Wed, 7 Jun 2023 06:46:38 +0000 (08:46 +0200)]
Support 'UNSUPPORTED: [...]: exception handling disabled' for libstdc++ testing
Verbatim copy of what was added to 'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-prune'
in Subversion r279246 (Git commit a9046e9853024206bec092dd63e21e152cb5cbca)
"[MSP430] -Add fno-exceptions multilib".
This greatly improves 'make check-target-libstdc++-v3' results for, for
example, x86_64-pc-linux-gnu with:
Jakub Jelinek [Wed, 7 Jun 2023 07:45:13 +0000 (09:45 +0200)]
modula2: Fix bootstrap
internal-fn.h since yesterday includes insn-opinit.h, which is a generated
header.
One of my bootstraps today failed because some m2 sources started compiling
before insn-opinit.h has been generated.
Normally, gcc/Makefile.in has
# In order for parallel make to really start compiling the expensive
# objects from $(OBJS) as early as possible, build all their
# prerequisites strictly before all objects.
$(ALL_HOST_OBJS) : | $(generated_files)
rule which ensures that all the generated files are generated before
any $(ALL_HOST_OBJS) objects start, but use order-only dependency for
this because we don't want to rebuild most of the objects whenever one
generated header is regenerated. After the initial build in an empty
directory we'll have .deps/ files contain the detailed dependencies.
$(ALL_HOST_OBJS) includes even some FE files; I think in the m2 case
that would be m2_OBJS, but m2/Make-lang.in doesn't define those.
The following patch just adds a similar rule to m2/Make-lang.in.
Another option would be to set the m2_OBJS variable in m2/Make-lang.in to
something, but I am not really sure to what exactly, or why that isn't
done already.
2023-06-07 Jakub Jelinek <jakub@redhat.com>
* Make-lang.in: Build $(generated_files) before building
all $(GM2_C_OBJS).
Note: we need to use "vrgatherei16.vv" instead of "vrgather.vv" for SEW = 8
since "vrgatherei16.vv" can cover a larger index range than "vrgather.vv"
(whose maximum element index is only 255).
One last thing remains to be done: "Epilogue auto-vectorization",
which needs VLS modes support. I will support VLS modes for
"Epilogue auto-vectorization" in the future.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (expand_vec_perm_const): New function.
* config/riscv/riscv-v.cc
(rvv_builder::can_duplicate_repeating_sequence_p): Support POLY
handling.
(rvv_builder::single_step_npatterns_p): New function.
(rvv_builder::npatterns_all_equal_p): Ditto.
(const_vec_all_in_range_p): Support POLY handling.
(gen_const_vector_dup): Ditto.
(emit_vlmax_gather_insn): Add vrgatherei16.
(emit_vlmax_masked_gather_mu_insn): Ditto.
(expand_const_vector): Add VLA SLP const vector support.
(expand_vec_perm): Support POLY.
(struct expand_vec_perm_d): New struct.
(shuffle_generic_patterns): New function.
(expand_vec_perm_const_1): Ditto.
(expand_vec_perm_const): Ditto.
* config/riscv/riscv.cc (riscv_vectorize_vec_perm_const): Ditto.
(TARGET_VECTORIZE_VEC_PERM_CONST): New targethook.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/scalable-1.c: Adapt testcase for VLA
vectorizer.
* gcc.target/riscv/rvv/autovec/v-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-4.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-6.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-7.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-7.c: New test.
Andrew Pinski [Mon, 5 Jun 2023 02:42:08 +0000 (19:42 -0700)]
Handle const_int in expand_single_bit_test
After expanding directly to rtl instead of
creating a tree, we could end up with
a const_int which is not ready to be handled
by extract_bit_field.
So we need to do the constant folding here instead.
OK? bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR middle-end/110117
gcc/ChangeLog:
* expr.cc (expand_single_bit_test): Handle
const_int from expand_expr.
gcc/testsuite/ChangeLog:
* gcc.dg/pr110117-1.c: New test.
* gcc.dg/pr110117-2.c: New test.
Andrew Pinski [Mon, 5 Jun 2023 02:21:05 +0000 (19:21 -0700)]
Improve do_store_flag for single bit when there is no non-zero bits
In r14-1534-g908e5ab5c11c, I forgot you could turn off CCP or
turn off the bit tracking part of CCP, so we would lose out on
what TER was able to do beforehand. This moves around the
TER code so that it is used instead of just the nonzerobits.
It also makes it easier to remove the TER part of the code
later on too.
OK? Bootstrapped and tested on x86_64-linux-gnu.
Note it reintroduces PR 110117 (which was accidentally fixed after r14-1534-g908e5ab5c11c). The next patch in series will fix that.
gcc/ChangeLog:
* expr.cc (do_store_flag): Rearrange the
TER code so that it overrides the nonzero bits
info if we had `a & POW2`.
Andrew Pinski [Tue, 6 Jun 2023 02:12:43 +0000 (19:12 -0700)]
For the `-A CMP -B -> B CMP A` pattern allow EQ/NE for all integer types
I noticed while looking at some code generation issue, that forwprop
was not handling `-a == 0` for unsigned types and I was confused why
it was not. r6-1814-g66e1cacf608045 removed these from fold because they
were supposed to be already handled by the match.pd patterns
but it was missed that the match.pd patterns checked
TYPE_OVERFLOW_UNDEFINED while fold didn't do that for NE/EQ.
This patch removes the restriction on NE/EQ on TYPE_OVERFLOW_UNDEFINED.
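To see why EQ/NE are safe for types where negation wraps, the rewrites can be checked over 8-bit unsigned values. This is a sketch under that assumption; neg8 and the other function names are hypothetical and nothing here is taken from match.pd.

```cpp
// Negation of an 8-bit unsigned value, wrapping modulo 256.
constexpr unsigned char neg8(unsigned char a)
{
  return static_cast<unsigned char>(0u - a);
}

// For a fixed a, check every 8-bit b:
//   (-a == -b) must agree with (b == a), and
//   (-a == b)  must agree with (a == -b).
constexpr bool eq_rewrites_hold_for(unsigned char a)
{
  for (unsigned b = 0; b < 256; ++b)
    {
      const unsigned char ub = static_cast<unsigned char>(b);
      if ((neg8(a) == neg8(ub)) != (ub == a))
        return false;
      if ((neg8(a) == ub) != (a == neg8(ub)))
        return false;
    }
  return true;
}

// An ordering comparison does not survive wrapping negation:
// -0 < -1 is 0 < 255 (true), while 1 < 0 is false.
constexpr bool lt_rewrite_breaks()
{
  return (neg8(0) < neg8(1)) != (1 < 0);
}
```

Wrapping negation is a bijection, so it preserves equality but not ordering, which is why only the ordering comparisons need the TYPE_OVERFLOW_UNDEFINED check.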
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
PR tree-optimization/110134
* match.pd (-A CMP -B -> B CMP A): Allow EQ/NE for all integer
types.
(-A CMP CST -> B CMP (-CST)): Likewise.
gcc/testsuite/ChangeLog:
PR tree-optimization/110134
* gcc.dg/tree-ssa/negneq-1.c: New test.
* gcc.dg/tree-ssa/negneq-2.c: New test.
* gcc.dg/tree-ssa/negneq-3.c: New test.
* gcc.dg/tree-ssa/negneq-4.c: New test.
Costas Argyris [Wed, 7 Jun 2023 02:50:07 +0000 (20:50 -0600)]
libiberty: writeargv: Simplify function error mode.
You are right, this is also a remnant of the old function design
that I completely missed. Here is the follow-up patch for that.
Thanks for pointing it out.
Costas
On Tue, 6 Jun 2023 at 04:12, Jeff Law <jeffreyalaw@gmail.com> wrote:
On 6/5/23 08:37, Costas Argyris via Gcc-patches wrote:
> writeargv can be simplified by getting rid of the error exit mode
> that was only relevant many years ago when the function used
> to open the file descriptor internally.
[ ... ]
Thanks. I've pushed this to the trunk.
You could (as a follow-up) simplify it even further. There's no need
for the status variable as far as I can tell. You could just have the
final return be "return 0;" instead of "return status;".
libiberty/
* argv.c (writeargv): Constant propagate "0" for "status",
simplifying the code slightly.
Andrew Pinski [Wed, 24 May 2023 07:08:45 +0000 (07:08 +0000)]
Add match patterns for `a ? onezero : onezero` where one of the two operands are constant
This adds match patterns for boolean values
that optimize `a ? onezero : 0` to `a & onezero` and
`a ? 1 : onezero` to `a | onezero`.
This was reported a few times and I thought I would finally
add the match pattern for this.
This hits a few times in GCC itself too.
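Both rewrites can be confirmed by checking every combination of two boolean inputs. A minimal sketch, with a hypothetical function name:

```cpp
// Check both rewrites for every combination of two boolean inputs.
constexpr bool onezero_rewrites_hold()
{
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
      {
        const bool a = (i != 0);
        const bool onezero = (j != 0);
        // a ? onezero : 0  ->  a & onezero
        if ((a ? onezero : false) != static_cast<bool>(a & onezero))
          return false;
        // a ? 1 : onezero  ->  a | onezero
        if ((a ? true : onezero) != static_cast<bool>(a | onezero))
          return false;
      }
  return true;
}
```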
Notes on the testcases:
* phi-opt-2.c: This now is optimized to `a & b` in phiopt rather than ifcombine
* phi-opt-25b.c: The test part that was failing was parity which now gets `x & y` treatment.
* ssa-thread-21.c: there is no longer a threading opportunity, so need to disable phiopt.
Note PR 109957 is filed for the now missing optimization in that testcase too.
gcc/ChangeLog:
PR tree-optimization/89263
PR tree-optimization/99069
PR tree-optimization/20083
PR tree-optimization/94898
* match.pd: Add patterns to optimize `a ? onezero : onezero` where
one of the operands is constant.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/phi-opt-2.c: Adjust the testcase.
* gcc.dg/tree-ssa/phi-opt-25b.c: Adjust the testcase.
* gcc.dg/tree-ssa/ssa-thread-21.c: Disable phiopt.
* gcc.dg/tree-ssa/phi-opt-27.c: New test.
* gcc.dg/tree-ssa/phi-opt-28.c: New test.
* gcc.dg/tree-ssa/phi-opt-29.c: New test.
* gcc.dg/tree-ssa/phi-opt-30.c: New test.
* gcc.dg/tree-ssa/phi-opt-31.c: New test.
* gcc.dg/tree-ssa/phi-opt-32.c: New test.
Andrew Pinski [Tue, 6 Jun 2023 15:21:46 +0000 (08:21 -0700)]
Match: zero_one_valued_p should match 0 constants too
While working on `bool0 ? bool1 : bool2` I noticed that
zero_one_valued_p does not match on the constant zero
as in that case tree_nonzero_bits will return 0 and
that is different from 1.
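The gap can be modelled with a plain bit mask. This is a hypothetical model, not the match.pd predicate: may_be_set stands for the kind of mask tree_nonzero_bits returns, with a 1 in every bit position that could be nonzero.

```cpp
#include <cstdint>

// Hypothetical model: the constant 0 has a may_be_set mask of 0.

// Requiring the mask to be exactly 1 rejects the constant 0.
constexpr bool mask_is_one(std::uint64_t may_be_set)
{
  return may_be_set == 1;
}

// Also accepting a mask of 0 covers the constant zero.
constexpr bool mask_is_zero_or_one(std::uint64_t may_be_set)
{
  return may_be_set == 0 || may_be_set == 1;
}
```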
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* match.pd (zero_one_valued_p): Match 0 integer constant
too.
Pan Li [Wed, 7 Jun 2023 01:25:33 +0000 (09:25 +0800)]
RISC-V: Fix ICE when include riscv_vector.h with rv64gcv
This patch fixes the incorrect requirement of the vector
builtin types for the ZVFH/ZVFHMIN extension. The incorrect requirement
results in the ops mismatching the iterators, and then an ICE is
triggered if ZVFH/ZVFHMIN is not given.
Sorry for the inconvenience.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
Jason Merrill [Tue, 6 Jun 2023 16:46:26 +0000 (12:46 -0400)]
c++: Add -Wnrvo
While looking at PRs about cases where we don't perform the named return
value optimization, it occurred to me that it might be useful to have a
warning for that.
This does not fix PR58487, but might be interesting to people watching it.
Jason Merrill [Sun, 4 Jun 2023 16:09:11 +0000 (12:09 -0400)]
c++: enable NRVO from inner block [PR51571]
Our implementation of the named return value optimization has been limited
to variables declared in the outermost block of the function, to avoid
needing to handle the case where the variable needs to be destroyed due to
going out of scope. PR92407 pointed out a case we were missing, where the
variable goes out of scope due to a goto and we were failing to destroy it.
It occurred to me that this problem is the flip side of PR33799, where we
need to be sure to destroy the return value if a cleanup throws on return;
here we want to avoid destroying the return value when exiting the
variable's scope on return. We can use the same flag to indicate to both
cleanups that we're returning.
This implements the guaranteed copy elision specified by P2025 (which is not
yet part of the draft standard).
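A function of the shape this change targets might look like the sketch below. The type Counted and the function make_inner are hypothetical. Whether the copy or move is actually elided depends on the compiler and its version, so the counter value is deliberately not stated here.

```cpp
// Hypothetical type that counts copy and move constructions.
struct Counted
{
  static int transfers;
  int value;
  explicit Counted(int v) : value(v) {}
  Counted(const Counted& o) : value(o.value) { ++transfers; }
  Counted(Counted&& o) noexcept : value(o.value) { ++transfers; }
};
int Counted::transfers = 0;

// The returned variable lives in an inner block, not in the outermost
// block of the function.
Counted make_inner(bool flag)
{
  if (flag)
    {
      Counted c(1);
      return c;
    }
  return Counted(2);
}
```

Reading Counted::transfers after calling make_inner (true) shows whether the compiler in use constructed c directly in the return slot.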
PR c++/51571
PR c++/92407
gcc/cp/ChangeLog:
* decl.cc (finish_function): Simplify NRV handling.
* except.cc (maybe_set_retval_sentinel): Also set if NRV.
(maybe_splice_retval_cleanup): Don't add the cleanup region
if we don't need it.
* semantics.cc (nrv_data): Add simple field.
(finalize_nrv): Set it.
(finalize_nrv_r): Check it and retval sentinel.
* cp-tree.h (finalize_nrv): Adjust declaration.
* typeck.cc (check_return_expr): Remove named_labels check.
Jason Merrill [Sun, 4 Jun 2023 16:00:55 +0000 (12:00 -0400)]
c++: NRV and goto [PR92407]
Here our named return value optimization was breaking the required
destructor when the goto takes 'a' out of scope. The simplest fix is to
disable the optimization in the presence of user labels.
We could do better by disabling the optimization only if there is a backward
goto across the variable declaration, but we don't currently track that.
PR c++/92407
gcc/cp/ChangeLog:
* typeck.cc (check_return_expr): Prevent NRV in the presence of
named labels.
Jason Merrill [Tue, 6 Jun 2023 19:31:23 +0000 (15:31 -0400)]
c++: fix throwing cleanup with label
While looking at PR92407 I noticed that the expectations of
maybe_splice_retval_cleanup weren't being met; an sk_cleanup level was
confusing its attempt to recognize the outer block of the function. And
even if I fixed the detection, it failed to actually wrap the body of the
function because the STATEMENT_LIST it got only had the label, not anything
after it. So I moved the call after poplevel does pop_stmt_list on all the
sk_cleanup levels.
PR c++/33799
gcc/cp/ChangeLog:
* except.cc (maybe_splice_retval_cleanup): Change
recognition of function body and try scopes.
* semantics.cc (do_poplevel): Call it after poplevel.
(at_try_scope): New.
* cp-tree.h (maybe_splice_retval_cleanup): Adjust.
Jason Merrill [Tue, 6 Jun 2023 03:58:32 +0000 (23:58 -0400)]
c++: fix contracts with NRV
The NRV implementation was blindly replacing the operand of RETURN_EXPR,
clobbering anything that check_return_expr might have added on to the actual
initialization, such as checking the postcondition.
gcc/cp/ChangeLog:
* semantics.cc (finalize_nrv_r): [RETURN_EXPR]: Only replace the
INIT_EXPR.
Gaius Mulley [Wed, 7 Jun 2023 00:21:19 +0000 (01:21 +0100)]
PR modula2/110019 Reported line numbers off by 1 when cpp invoked.
Fix off by one in m2.flex when the line number is set via cpp.
gcc/m2/ChangeLog:
PR modula2/110019
* gm2-compiler/SymbolKey.mod (SearchAndDo): Reformatted.
(ForeachNodeDo): Reformatted.
* gm2-compiler/SymbolTable.mod (AddListify): Join list
with "," or "and" if more than one word is in the list.
* m2.flex: Remove -1 from atoi(yytext) line number.
gcc/testsuite/ChangeLog:
PR modula2/110019
* gm2/cpp/fail/cpp-fail.exp: New test.
* gm2/cpp/fail/foocpp.mod: New test.
Roger Sayle [Tue, 6 Jun 2023 23:32:51 +0000 (00:32 +0100)]
Add RTX codes for BITREVERSE and COPYSIGN.
An analysis of backend UNSPECs reveals that two of the most common UNSPECs
across target backends are for copysign and bit reversal. This patch
adds RTX codes for these expressions to allow their representation to
be standardized, and to allow them to be optimized by the middle-end RTL optimizers.
2023-06-07 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* doc/rtl.texi (bitreverse, copysign): Document new RTX codes.
* rtl.def (BITREVERSE, COPYSIGN): Define new RTX codes.
* simplify-rtx.cc (simplify_unary_operation_1): Optimize
NOT (BITREVERSE x) as BITREVERSE (NOT x).
Optimize POPCOUNT (BITREVERSE x) as POPCOUNT x.
Optimize PARITY (BITREVERSE x) as PARITY x.
Optimize BITREVERSE (BITREVERSE x) as x.
(simplify_const_unary_operation) <case BITREVERSE>: Evaluate
BITREVERSE of a constant integer at compile-time.
(simplify_binary_operation_1) <case COPYSIGN>: Optimize
COPYSIGN (x, x) as x. Optimize COPYSIGN (x, C) as ABS x
or NEG (ABS x) for constant C. Optimize COPYSIGN (ABS x, y)
and COPYSIGN (NEG x, y) as COPYSIGN (x, y).
Optimize COPYSIGN (x, ABS y) as ABS x.
Optimize COPYSIGN (COPYSIGN (x, y), z) as COPYSIGN (x, z).
Optimize COPYSIGN (x, COPYSIGN (y, z)) as COPYSIGN (x, z).
(simplify_const_binary_operation): Evaluate COPYSIGN of constant
arguments at compile-time.
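The BITREVERSE identities listed above can be checked with a scalar model on 32-bit values. This is a sketch with hypothetical function names, not the simplify-rtx.cc code, and it does not cover the COPYSIGN cases.

```cpp
#include <cstdint>

// Reverse the order of the 32 bits of x.
constexpr std::uint32_t bitreverse32(std::uint32_t x)
{
  std::uint32_t r = 0;
  for (int i = 0; i < 32; ++i)
    {
      r = (r << 1) | (x & 1u);
      x >>= 1;
    }
  return r;
}

// Count the set bits of x by clearing the lowest set bit each step.
constexpr int popcount32(std::uint32_t x)
{
  int n = 0;
  for (; x != 0; x &= x - 1)
    ++n;
  return n;
}

// BITREVERSE (BITREVERSE x) == x, NOT (BITREVERSE x) == BITREVERSE (NOT x),
// and POPCOUNT and PARITY are unchanged by BITREVERSE.
constexpr bool identities_hold(std::uint32_t x)
{
  return bitreverse32(bitreverse32(x)) == x
         && static_cast<std::uint32_t>(~bitreverse32(x)) == bitreverse32(~x)
         && popcount32(bitreverse32(x)) == popcount32(x)
         && (popcount32(bitreverse32(x)) & 1) == (popcount32(x) & 1);
}
```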
Uros Bizjak [Tue, 6 Jun 2023 17:11:29 +0000 (19:11 +0200)]
reload1: Change return type of predicate function from int to bool
gcc/ChangeLog:
* rtl.h (function_invariant_p): Change return type from int to bool.
* reload1.cc (function_invariant_p): Change return type from
int to bool and adjust function body accordingly.
Tobias Burnus [Tue, 6 Jun 2023 16:06:14 +0000 (18:06 +0200)]
libgomp: plugin-gcn - support 'unified_address'
Effectively, for GCN (as for nvptx) there is a common address space between
host and device, whether accessible or not. Thus, this commit
permits the use of 'omp requires unified_address' with GCN devices.
(nvptx accepts this requirement since r13-3460-g131d18e928a3ea.)
libgomp/
* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Regard
unified_address requirement as supported.
* libgomp.texi (OpenMP 5.0, AMD Radeon, nvptx): Remove
'unified_address' from the not-supported requirements.
Jonathan Wakely [Tue, 6 Jun 2023 09:37:32 +0000 (10:37 +0100)]
libstdc++: Update list of known symbol versions for abi-check
Add the recently added CXXABI_1.3.15 version. Also remove two "frozen"
versions from the latestp list, as no more symbols should be added to
those now.
libstdc++-v3/ChangeLog:
* testsuite/util/testsuite_abi.cc (check_version): Add
CXXABI_1.3.15 symver and make it the latestp. Remove
GLIBCXX_IEEE128_3.4.31 and GLIBCXX_LDBL_3.4.31 from latestp.
Jonathan Wakely [Mon, 5 Jun 2023 15:14:29 +0000 (16:14 +0100)]
libstdc++: Make std::numeric_limits<__float128> more portable [PR104772]
This redefines std::numeric_limits<__float128> so that it works with
non-GCC compilers. The previous definition didn't work with Clang, due
to it not supporting __builtin_huge_valq, __builtin_nanq, and
__builtin_nansq. It also didn't work in strict modes, due to using Q
literal suffixes.
The new definition uses the Q suffixes when supported, or calculates the
correct values using __float128 arithmetic from double values. Ideally
the values would be defined as hexadecimal-floating-point-literals, but
that won't work for C++14 and older.
The only member that can't be defined this way is signaling_NaN() which
still requires a built-in. If __builtin_nansq is not supported, try to
use __builtin_nansf128 (with a possibly-redundant bit_cast) and if that
isn't supported, return a quiet NaN and define has_signaling_NaN and
is_iec754 to be false.
libstdc++-v3/ChangeLog:
PR libstdc++/104772
* include/std/limits (numeric_limits<__float128>): Define
for __STRICT_ANSI__ as well.
* testsuite/18_support/numeric_limits/128bit.cc: Remove
check for __STRICT_ANSI__.
This patch adds a combine optimization for the following case:
__attribute__ ((noipa)) void
vwmaccsu (int16_t *__restrict dst, int8_t *__restrict a, uint8_t *__restrict b,
int n)
{
for (int i = 0; i < n; i++)
dst[i] += (int16_t) a[i] * (int16_t) b[i];
}
Before this patch:
...
vsext.vf2
vzext.vf2
vmadd.vv
...
After this patch:
...
vwmaccsu.vv
...
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*<optab>_fma<mode>): New pattern.
(*single_<optab>mult_plus<mode>): Ditto.
(*double_<optab>mult_plus<mode>): Ditto.
(*sign_zero_extend_fma): Ditto.
(*zero_sign_extend_fma): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/widen/widen-8.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-9.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-9.c: New test.
Tobias Burnus [Tue, 6 Jun 2023 14:47:16 +0000 (16:47 +0200)]
openmp: Add support for the 'present' modifier
This implements support for the OpenMP 5.1 'present' modifier, which can be
used in map clauses in the 'target', 'target data', 'target enter data' and
'target exit data' constructs, and in the 'to' and 'from' clauses of the
'target update' construct. It is also supported in defaultmap.
The modifier triggers a fatal runtime error if the data specified by the
clause is not already present on the target device. It can also be combined
with 'always' in map clauses.
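A use of the modifier might look like the following sketch. The function name sum_on_device is hypothetical. It assumes a compiler with OpenMP 5.1 'present' support when built with -fopenmp; without that option the pragmas are ignored and the loop simply runs on the host.

```cpp
#include <cassert>

// Sums n ints inside a target region.
int sum_on_device(const int* data, int n)
{
  int sum = 0;
  // Map the array first so that the 'present' check below can succeed.
  #pragma omp target data map(to: data[0:n])
  {
    // 'present' requires data[0:n] to be mapped already; if it were
    // not, the runtime would stop with a fatal error instead of
    // mapping it here.
    #pragma omp target map(present, to: data[0:n]) map(tofrom: sum)
    for (int i = 0; i < n; ++i)
      sum += data[i];
  }
  return sum;
}
```

Dropping the enclosing 'target data' region is the case the modifier is meant to catch on a device with separate memory.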
gcc/cp/
* parser.cc (cp_parser_omp_clause_defaultmap,
cp_parser_omp_clause_map): Parse 'present'.
(cp_parser_omp_clause_from_to): New; parse to/from
clauses with optional 'present' modifier.
(cp_parser_omp_all_clauses): Update call.
(cp_parser_omp_target_data, cp_parser_omp_target_enter_data,
cp_parser_omp_target_exit_data): Handle new enum value for
'present' mapping.
* semantics.cc (finish_omp_target): Likewise.
gcc/fortran/
* dump-parse-tree.cc (show_omp_namelist): Display 'present' map
modifier.
(show_omp_clauses): Display 'present' motion modifier for 'to'
and 'from' clauses.
* gfortran.h (enum gfc_omp_map_op): Add entries with 'present'
modifiers.
(struct gfc_omp_namelist): Add 'present_modifer'.
* openmp.cc (gfc_match_motion_var_list): New, handles optional
'present' modifier for to/from clauses.
(gfc_match_omp_clauses): Call it for to/from clauses; parse 'present'
in defaultmap and map clauses.
(resolve_omp_clauses): Allow 'present' modifiers on 'target',
'target data', 'target enter' and 'target exit' directives.
* trans-openmp.cc (gfc_trans_omp_clauses): Apply 'present' modifiers
to tree node for 'map', 'to' and 'from' clauses. Apply 'present' for
defaultmap.
gcc/
* gimplify.cc (omp_notice_variable): Apply GOVD_MAP_ALLOC_ONLY flag
and defaultmap flags if the defaultmap has GOVD_MAP_FORCE_PRESENT flag
set.
(omp_get_attachment): Handle map clauses with 'present' modifier.
(omp_group_base): Likewise.
(gimplify_scan_omp_clauses): Reorder present maps to come first.
Set GOVD flags for present defaultmaps.
(gimplify_adjust_omp_clauses_1): Set map kind for present defaultmaps.
* omp-low.cc (scan_sharing_clauses): Handle 'always, present' map
clauses.
(lower_omp_target): Handle map clauses with 'present' modifier.
Handle 'to' and 'from' clauses with 'present'.
* tree-core.h (enum omp_clause_defaultmap_kind): Add
OMP_CLAUSE_DEFAULTMAP_PRESENT defaultmap kind.
* tree-pretty-print.cc (dump_omp_clause): Handle 'map', 'to' and
'from' clauses with 'present' modifier. Handle present defaultmap.
* tree.h (OMP_CLAUSE_MOTION_PRESENT): New #define.
include/
* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_5): New.
(GOMP_MAP_FLAG_FORCE): Redefine.
(GOMP_MAP_FLAG_PRESENT, GOMP_MAP_FLAG_ALWAYS_PRESENT): New.
(enum gomp_map_kind): Add map kinds with 'present' modifiers.
(GOMP_MAP_COPY_TO_P, GOMP_MAP_COPY_FROM_P): Evaluate to true for
map variants with 'present'
(GOMP_MAP_ALWAYS_TO_P, GOMP_MAP_ALWAYS_FROM_P): Evaluate to true
for map variants with 'always, present' modifiers.
(GOMP_MAP_ALWAYS): Redefine.
(GOMP_MAP_FORCE_P, GOMP_MAP_PRESENT_P): New.
libgomp/
* libgomp.texi (OpenMP 5.1 Impl. status): Set 'present' support for
defaultmap to 'Y', add 'Y' entry for 'present' on to/from/map clauses.
* target.c (gomp_to_device_kind_p): Add map kinds with 'present'
modifier.
(gomp_map_vars_existing): Use new GOMP_MAP_FORCE_P macro.
(gomp_map_vars_internal, gomp_update, gomp_target_rev):
Emit runtime error if memory region not present.
* testsuite/libgomp.c-c++-common/target-present-1.c: New test.
* testsuite/libgomp.c-c++-common/target-present-2.c: New test.
* testsuite/libgomp.c-c++-common/target-present-3.c: New test.
* testsuite/libgomp.fortran/target-present-1.f90: New test.
* testsuite/libgomp.fortran/target-present-2.f90: New test.
* testsuite/libgomp.fortran/target-present-3.f90: New test.
gcc/testsuite/
* c-c++-common/gomp/map-6.c: Update dg-error, extend to test for
duplicated 'present' and extend scan-dump tests for 'present'.
* gfortran.dg/gomp/defaultmap-1.f90: Update dg-error.
* gfortran.dg/gomp/map-7.f90: Extend parse and dump test for
'present'.
* gfortran.dg/gomp/map-8.f90: Extend for duplicate 'present'
modifier checking.
* c-c++-common/gomp/defaultmap-4.c: New test.
* c-c++-common/gomp/map-9.c: New test.
* c-c++-common/gomp/target-update-1.c: New test.
* gfortran.dg/gomp/defaultmap-8.f90: New test.
* gfortran.dg/gomp/map-11.f90: New test.
* gfortran.dg/gomp/map-12.f90: New test.
* gfortran.dg/gomp/target-update-1.f90: New test.
PR libstdc++/109822
* include/experimental/bits/simd_builtin.h (_S_store): Rewrite
to avoid casts to other vector types. Implement store as
succession of power-of-2 sized memcpy to avoid PR90424.
Matthias Kretz [Fri, 2 Jun 2023 11:44:22 +0000 (13:44 +0200)]
libstdc++: Replace use of incorrect non-temporal store
The call to the base implementation sometimes didn't find a matching
signature because the _Abi parameter of _SimdImpl* was "wrong" after
conversion. It has to call into <new ABI tag>::_SimdImpl instead of the
current ABI tag's _SimdImpl. This also reduces the number of possible
template instantiations.
PR libstdc++/110054
* include/experimental/bits/simd_builtin.h (_S_masked_store):
Call into deduced ABI's SimdImpl after conversion.
* include/experimental/bits/simd_x86.h (_S_masked_store_nocvt):
Don't use _mm_maskmoveu_si128. Use the generic fall-back
implementation. Also fix masked stores without SSE2, which
were not doing anything before.
This makes the code more readable, more digestible, more maintainable,
more extensible. That kind of thing. It does that by pulling things
apart a bit, but also making what stays together more cohesive lumps.
The original function was a bunch of loops and early-outs, and then
quite a bit of stuff done per iteration, with the iterations essentially
independent of each other. This patch moves the stuff done for one
iteration to a new _one function.
The second big thing is the stuff printed to the .md file is done in
"here documents" now, which is a lot more readable than having to quote
and escape and double-escape pieces of text. Whitespace inside the
here-document is significant (will be printed as-is), which is a bit
awkward sometimes, or might take some getting used to, but it is also
one of the benefits of using them.
Local variables are declared at first use (or close to first use).
There also shouldn't be many at all; often you can write code that is
easier to read and manage by not naming something that is hard to name
in the first place.
Finally some things are done in more typical, more modern, and tighter
Perl style, for example REs in "if"s or "qw" for lists of constants.
Jonathan Wakely [Tue, 6 Jun 2023 10:38:42 +0000 (11:38 +0100)]
libstdc++: Fix ambiguous expression in std::array<T, 0>::front() [PR110139]
For 32-bit targets using -pedantic (or using Clang) makes the expression
_M_elems[0] ambiguous. The overloaded operator[] that we want to call
has a size_t parameter, but 0 is type ptrdiff_t for many ILP32 targets,
so using the implicit conversion from _M_elems to T* and then
subscripting that is also viable.
Change the 0 to (size_type)0 and also make the conversion to T*
explicit, so that it's not viable here. The latter change requires a
static_cast in data() where we really do want to convert _M_elems to a
pointer.
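A minimal sketch of the fix described above. The types zero_size_storage and tiny_array0 are hypothetical stand-ins for illustration only, not the libstdc++ internals:

```cpp
#include <cstddef>
#include <cstdlib>
#include <type_traits>

// Hypothetical stand-in for the zero-size storage type; not the libstdc++ code.
template<typename T>
struct zero_size_storage
{
  // The overload we want elems[i] to select. It must never really be called
  // for an empty array, so this sketch simply aborts.
  [[noreturn]] T& operator[](std::size_t) const noexcept { std::abort(); }

  // Being explicit, this conversion cannot be applied implicitly, so
  // elems[i] can no longer mean "convert to T*, then subscript".
  explicit constexpr operator T*() const noexcept { return nullptr; }
};

// Hypothetical stand-in for an array of size zero.
template<typename T>
struct tiny_array0
{
  using size_type = std::size_t;
  zero_size_storage<T> elems;

  // The subscript operand has exactly the parameter type of operator[].
  T& front() noexcept { return elems[(size_type)0]; }

  // Here the conversion to a pointer is wanted, so it is spelled out.
  T* data() noexcept { return static_cast<T*>(elems); }
};

// The implicit conversion is gone; the explicit one still works.
static_assert(!std::is_convertible<zero_size_storage<int>, int*>::value, "");
static_assert(std::is_constructible<int*, zero_size_storage<int>>::value, "");
```

With the conversion operator explicit, only the overloaded operator[] remains viable for the subscript in front(), while data() still obtains the pointer through the static_cast.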
libstdc++-v3/ChangeLog:
PR libstdc++/110139
* include/std/array (__array_traits<T, 0>::operator T*()): Make
conversion operator explicit.
(array::front): Use size_type as subscript operand.
(array::data): Use static_cast to make conversion explicit.
* testsuite/23_containers/array/element_access/110139.cc: New
test.
Joseph Faulls [Fri, 2 Jun 2023 15:44:48 +0000 (15:44 +0000)]
libstdc++: Do not assume existence of char8_t codecvt facet
It is not required that codecvt<char8_t, char, mbstate_t> facet be
supported by the locale, nor is it added as part of the default locale.
This can lead to dangerous behaviour when static_cast is used.
libstdc++-v3/ChangeLog:
* include/bits/locale_classes.tcc: Remove check for
codecvt<char8_t, char, mbstate_t> facet.
Jonathan Wakely [Tue, 21 Mar 2023 12:29:08 +0000 (12:29 +0000)]
libstdc++: Make std::filesystem::copy_file work for procfs [PR108178]
The size reported by stat is always zero for some special files such as
those under /proc, which means the current copy_file implementation
thinks there is nothing to copy. Instead of trusting the stat value, try
to read a character from a streambuf and check for EOF.
libstdc++-v3/ChangeLog:
PR libstdc++/108178
* src/filesystem/ops-common.h (do_copy_file): Check for empty
files by trying to read a character.
* testsuite/27_io/filesystem/operations/copy_file_108178.cc:
New test.
Jannik Glückert [Wed, 8 Mar 2023 18:37:43 +0000 (19:37 +0100)]
libstdc++: Use copy_file_range for filesystem::copy_file
copy_file_range is a recent-ish syscall for copying files. It is similar
to sendfile but allows filesystem-specific optimizations. Common ones are:
Reflinks: BTRFS, XFS, ZFS (does not implement the syscall yet)
Server-side copy: NFS, SMB, Ceph
If copy_file_range is not available for the given files, fall back to
sendfile / userspace copy.
libstdc++-v3/ChangeLog:
* acinclude.m4 (_GLIBCXX_USE_COPY_FILE_RANGE): Define.
* config.h.in: Regenerate.
* configure: Regenerate.
* src/filesystem/ops-common.h (copy_file_copy_file_range):
Define new function.
(do_copy_file): Use it.
Jannik Glückert [Mon, 6 Mar 2023 19:52:08 +0000 (20:52 +0100)]
libstdc++: Also use sendfile for big files
We were previously only using sendfile for files smaller than 2GB, as
sendfile needs to be called repeatedly for files bigger than that.
Some quick numbers, copying a 16GB file, average of 10 repetitions:
old:
real: 13.4s
user: 0.14s
sys : 7.43s
new:
real: 8.90s
user: 0.00s
sys : 3.68s
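The need to call sendfile repeatedly for big files can be sketched as a loop. This is an illustration only: copy_in_chunks and the transfer_some callback are hypothetical names, and the callback stands in for a single sendfile call that may transfer fewer bytes than requested:

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>

// Hypothetical helper. transfer_some(offset, count) stands in for one
// sendfile call: it returns the number of bytes it transferred, which may
// be less than count, or a value <= 0 on error or unexpected end of input.
inline bool copy_in_chunks(
    std::uint64_t total, std::uint64_t max_chunk,
    const std::function<std::int64_t(std::uint64_t, std::uint64_t)>& transfer_some)
{
  std::uint64_t done = 0;
  while (done < total)   // a zero-length file never enters the loop
    {
      const std::uint64_t want = std::min(max_chunk, total - done);
      const std::int64_t n = transfer_some(done, want);
      if (n <= 0)
        return false;
      done += static_cast<std::uint64_t>(n);
    }
  return true;
}
```

In this sketch a failure is reported to the caller, which could then fall back to a userspace copy.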
libstdc++-v3/ChangeLog:
* acinclude.m4 (_GLIBCXX_HAVE_LSEEK): Define.
* config.h.in: Regenerate.
* configure: Regenerate.
* src/filesystem/ops-common.h (copy_file_sendfile): Define new
function for sendfile logic. Loop to support large files. Skip
zero-length files.
(do_copy_file): Use it.
PR106907 lists a few warnings spotted by cppcheck. This patch addresses the
duplicate expression issue: the same expression was used twice in a logical
AND (&&) operation, which gives the same result as using it once, so the
duplicate is removed.
Kyrylo Tkachov [Tue, 6 Jun 2023 10:09:12 +0000 (11:09 +0100)]
aarch64: Improve representation of vpaddd intrinsics
The aarch64_addpdi pattern is redundant as the reduc_plus_scal_<mode> pattern can already generate
the required form of the ADDP instruction, and is mostly folded to GIMPLE early on so can benefit from more optimisations.
Though it turns out that we were missing the folding for the unsigned variants.
This patch adds that and wires up the vpaddd_u64 and vpaddd_s64 intrinsics through the above pattern instead
so that we can remove a redundant pattern and get more optimisation earlier.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Kyrylo Tkachov [Tue, 6 Jun 2023 09:51:34 +0000 (10:51 +0100)]
aarch64: Reimplement URSHR,SRSHR patterns with standard RTL codes
Having converted the patterns for the URSRA,SRSRA instructions to standard RTL codes we can also
easily convert the non-accumulating forms URSHR,SRSHR.
This patch does that, reusing the various helpers and predicates from that patch in a straightforward way.
This allows GCC to perform the optimisations in the testcase, matching what Clang does.
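For reference, the arithmetic of a rounding right shift can be modelled in scalar code. This is only a sketch of what one 32-bit lane of URSHR computes (widen, add the rounding constant, shift, truncate); the function name urshr32 is hypothetical and the exact RTL used by the patch is not shown here:

```cpp
#include <cstdint>

// Scalar model of one 32-bit lane of URSHR #n, valid for 1 <= n <= 32.
// The addition is done in 64 bits so the rounding constant cannot overflow.
constexpr std::uint32_t urshr32(std::uint32_t x, unsigned n)
{
  return static_cast<std::uint32_t>(
      (static_cast<std::uint64_t>(x) + (std::uint64_t{1} << (n - 1))) >> n);
}

static_assert(urshr32(7, 1) == 4, "7/2 = 3.5 rounds up to 4");
static_assert(urshr32(5, 2) == 1, "5/4 = 1.25 rounds down to 1");
static_assert(urshr32(0xFFFFFFFFu, 1) == 0x80000000u, "no overflow on carry");
```

Expressing the operation with ordinary arithmetic like this, rather than an opaque code, is what lets generic simplifications apply to it.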
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_<sur>shr_n<mode>): Delete.
(aarch64_<sra_op>rshr_n<mode><vczle><vczbe>_insn): New define_insn.
(aarch64_<sra_op>rshr_n<mode>): New define_expand.
Kyrylo Tkachov [Tue, 6 Jun 2023 08:56:52 +0000 (09:56 +0100)]
aarch64: Simplify SHRN, RSHRN expanders and patterns
Now that we've got the <vczle><vczbe> annotations we can get rid of explicit
!BYTES_BIG_ENDIAN and BYTES_BIG_ENDIAN patterns for the narrowing shift instructions.
This allows us to clean up the expanders as well.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_shrn<mode>_insn_le): Delete.
(aarch64_shrn<mode>_insn_be): Delete.
(*aarch64_<srn_op>shrn<mode>_vect): Rename to...
(*aarch64_<srn_op>shrn<mode><vczle><vczbe>): ... This.
(aarch64_shrn<mode>): Remove reference to the above deleted patterns.
(aarch64_rshrn<mode>_insn_le): Delete.
(aarch64_rshrn<mode>_insn_be): Delete.
(aarch64_rshrn<mode><vczle><vczbe>_insn): New define_insn.
(aarch64_rshrn<mode>): Remove references to the above deleted patterns.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/pr99195_5.c: Add testing for shrn_n, rshrn_n
intrinsics.
Kyrylo Tkachov [Tue, 6 Jun 2023 08:54:41 +0000 (09:54 +0100)]
aarch64: Improve representation of ADDLV instructions
We've received requests to optimise the attached intrinsics testcase.
We currently generate:
foo_1:
uaddlp v0.4s, v0.8h
uaddlv d31, v0.4s
fmov x0, d31
ret
foo_2:
uaddlp v0.4s, v0.8h
addv s31, v0.4s
fmov w0, s31
ret
foo_3:
saddlp v0.4s, v0.8h
addv s31, v0.4s
fmov w0, s31
ret
The widening pair-wise addition addlp instructions can be omitted if we're just doing an ADDV afterwards.
Making this optimisation would be quite simple if we had a standard RTL PLUS vector reduction code.
As we don't, we can use UNSPEC_ADDV as a stand in.
This patch expresses the SADDLV and UADDLV instructions as an UNSPEC_ADDV over a widened input, thus removing
the need for separate UNSPEC_SADDLV and UNSPEC_UADDLV codes.
To optimise the testcases involved we add two splitters that match a vector addition where all participating elements
are taken and widened from the same vector and then fed into an UNSPEC_ADDV. In that case we can just remove the
vector PLUS and just emit the simple RTL for SADDLV/UADDLV.
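The equivalence behind this optimisation can be checked with a scalar model of the foo_2 case. The model_* functions below are hypothetical illustrations of what the instructions compute, not GCC code:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using u16x8 = std::array<std::uint16_t, 8>;
using u32x4 = std::array<std::uint32_t, 4>;

// Models UADDLP Vd.4S, Vn.8H: add adjacent pairs, widening to 32 bits.
inline u32x4 model_uaddlp(const u16x8& v)
{
  u32x4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = static_cast<std::uint32_t>(v[2 * i]) + v[2 * i + 1];
  return r;
}

// Models ADDV Sd, Vn.4S: sum all four 32-bit lanes.
inline std::uint32_t model_addv(const u32x4& v)
{
  std::uint32_t sum = 0;
  for (std::uint32_t x : v)
    sum += x;
  return sum;
}

// Models UADDLV Sd, Vn.8H: widen each 16-bit lane and sum them all.
inline std::uint32_t model_uaddlv(const u16x8& v)
{
  std::uint32_t sum = 0;
  for (std::uint16_t x : v)
    sum += x;
  return sum;
}
```

For any input, model_addv(model_uaddlp(v)) equals model_uaddlv(v), which is why the pairwise step can be dropped when only the across-vector sum is needed.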
Bootstrapped and tested on aarch64-none-linux-gnu.
The gimplifier can elide initialized constant automatic variables
to static storage in which case TARGET_EXPR gimplification needs
to avoid emitting a CLOBBER for them since their lifetime is no
longer limited. Failing to do so causes spurious dangling-pointer
diagnostics on the added testcase for some targets.
PR middle-end/110055
* gimplify.cc (gimplify_target_expr): Do not emit
CLOBBERs for variables which have static storage duration
after gimplifying their initializers.
* g++.dg/warn/Wdangling-pointer-pr110055.C: New testcase.
Richard Biener [Wed, 31 May 2023 12:28:37 +0000 (14:28 +0200)]
tree-optimization/109143 - improve PTA compile time
The following improves solution_set_expand to require one less
iteration over the bitmap and avoid changing the bitmap we iterate
over. Plus we handle adjacent subvars in the ID space (the common case)
and use bitmap_set_range. This cuts a bit less than 10% off the PTA
time from the testcase in the PR.
PR tree-optimization/109143
* tree-ssa-structalias.cc (solution_set_expand): Avoid
one bitmap iteration and optimize bit range setting.
Costas Argyris [Tue, 6 Jun 2023 03:10:26 +0000 (21:10 -0600)]
libiberty: writeargv: Simplify function error mode.
writeargv can be simplified by getting rid of the error exit mode
that was only relevant many years ago when the function used
to open the file descriptor internally.
From 1271552baee5561fa61652f4ca7673c9667e4f8f Mon Sep 17 00:00:00 2001
From: Costas Argyris <costas.argyris@gmail.com>
Date: Mon, 5 Jun 2023 15:02:06 +0100
Subject: [PATCH] libiberty: writeargv: Simplify function error mode.
The goto-based error mode was based on a previous version
of the function where it was responsible for opening the
file, so it had to close it upon any exit:
This is no longer the case though since now the function
takes the file descriptor as input, so the exit mode on
error can be just a simple return 1 statement.
Then users can leverage the intrinsic APIs to perform the FP16 related
reduction operations. Please note that not all the intrinsic APIs are covered
in the test files; only some typical ones are picked because there are too
many. We will test the FP16 related intrinsic APIs entirely soon.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-types.def
(vfloat16mf4_t): Add vfloat16mf4_t to WF operations.
(vfloat16mf2_t): Likewise.
(vfloat16m1_t): Likewise.
(vfloat16m2_t): Likewise.
(vfloat16m4_t): Likewise.
(vfloat16m8_t): Likewise.
* config/riscv/vector-iterators.md: Add FP16 to VWF, VWF_ZVE64,
VWLMUL1, VWLMUL1_ZVE64, vwlmul1 and vwlmul1_zve64.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/zvfh-intrinsic.c: Add new test cases.
Fei Gao [Tue, 6 Jun 2023 02:09:03 +0000 (20:09 -0600)]
[RISC-V] correct machine mode in save-restore cfi RTL.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_adjust_libcall_cfi_prologue): Use Pmode
for cfi reg/mem machmode.
(riscv_adjust_libcall_cfi_epilogue): Use Pmode for cfi reg machmode.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/save-restore-cfi-2.c: New test to check machmode
for cfi reg/mem.
Andre Vieira [Mon, 5 Jun 2023 16:53:10 +0000 (17:53 +0100)]
internal-fn,vect: Refactor widen_plus as internal_fn
DEF_INTERNAL_WIDENING_OPTAB_FN and DEF_INTERNAL_NARROWING_OPTAB_FN
are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN
respectively, except that they provide convenience wrappers
for a single vector to vector conversion, a hi/lo split or an even/odd
split. Each definition for <NAME> will require either signed optabs
named <UOPTAB> and <SOPTAB> (for widening) or a single <OPTAB> (for
narrowing) for each of the five functions it creates.
For example, for widening addition the
DEF_INTERNAL_WIDENING_OPTAB_FN will create five internal functions:
IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO,
IFN_VEC_WIDEN_PLUS_EVEN and IFN_VEC_WIDEN_PLUS_ODD. Each requires two
optabs, one for signed and one for unsigned.
Aarch64 implements the hi/lo split optabs:
IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>add_hi_<mode> -> (u/s)addl2
IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>add_lo_<mode> -> (u/s)addl
This gives the same functionality as the previous
WIDEN_PLUS/WIDEN_MINUS tree codes which are expanded into
VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
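What a hi/lo split of a widening addition computes can be modelled in scalar code. The names widen_add_lo and widen_add_hi are hypothetical, the element types are chosen for illustration, and the sketch ignores endianness, treating "lo" as the lower-numbered elements:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using u8x16 = std::array<std::uint8_t, 16>;
using u16x8 = std::array<std::uint16_t, 8>;

// Widening add of the lower-numbered half: each result is 16 bits wide,
// so sums that do not fit in 8 bits are preserved.
inline u16x8 widen_add_lo(const u8x16& a, const u8x16& b)
{
  u16x8 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = static_cast<std::uint16_t>(a[i] + b[i]);
  return r;
}

// Widening add of the higher-numbered half.
inline u16x8 widen_add_hi(const u8x16& a, const u8x16& b)
{
  u16x8 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = static_cast<std::uint16_t>(a[i + 8] + b[i + 8]);
  return r;
}
```

Together the two halves cover every input element once, so the pair produces the full widened result in two vectors of the wider element type.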
2023-06-05 Andre Vieira <andre.simoesdiasvieira@arm.com>
Joel Hutton <joel.hutton@arm.com>
Tamar Christina <tamar.christina@arm.com>
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>): Rename
this ...
(vec_widen_<su>add_lo_<mode>): ... to this.
(vec_widen_<su>addl_hi_<mode>): Rename this ...
(vec_widen_<su>add_hi_<mode>): ... to this.
(vec_widen_<su>subl_lo_<mode>): Rename this ...
(vec_widen_<su>sub_lo_<mode>): ... to this.
(vec_widen_<su>subl_hi_<mode>): Rename this ...
(vec_widen_<su>sub_hi_<mode>): ... to this.
* doc/generic.texi: Document new IFN codes.
* internal-fn.cc (lookup_hilo_internal_fn): Add lookup function.
(commutative_binary_fn_p): Add widen_plus fn's.
(widening_fn_p): New function.
(narrowing_fn_p): New function.
(direct_internal_fn_optab): Change visibility.
* internal-fn.def (DEF_INTERNAL_WIDENING_OPTAB_FN): Macro to define an
internal_fn that expands into multiple internal_fns for widening.
(IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO,
IFN_VEC_WIDEN_PLUS_EVEN, IFN_VEC_WIDEN_PLUS_ODD,
IFN_VEC_WIDEN_MINUS, IFN_VEC_WIDEN_MINUS_HI,
IFN_VEC_WIDEN_MINUS_LO, IFN_VEC_WIDEN_MINUS_ODD,
IFN_VEC_WIDEN_MINUS_EVEN): Define widening plus,minus functions.
* internal-fn.h (direct_internal_fn_optab): Declare new prototype.
(lookup_hilo_internal_fn): Likewise.
(widening_fn_p): Likewise.
(narrowing_fn_p): Likewise.
* optabs.cc (commutative_optab_p): Add widening plus optabs.
* optabs.def (OPTAB_D): Define widen add, sub optabs.
* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
patterns with a hi/lo or even/odd split.
(vect_recog_sad_pattern): Refactor to use new IFN codes.
(vect_recog_widen_plus_pattern): Likewise.
(vect_recog_widen_minus_pattern): Likewise.
(vect_recog_average_pattern): Likewise.
* tree-vect-stmts.cc (vectorizable_conversion): Add support for
_HILO IFNs.
(supportable_widening_operation): Likewise.
* tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vect-widen-add.c: Test that new
IFN_VEC_WIDEN_PLUS is being used.
* gcc.target/aarch64/vect-widen-sub.c: Test that new
IFN_VEC_WIDEN_MINUS is being used.