Jonathan Wakely [Fri, 18 Aug 2023 12:10:15 +0000 (13:10 +0100)]
libstdc++: Revert pre-C++23 support for 16-bit float types [PR111060]
In r14-3304-g1a566fddea212a and r14-3305-g6cf214b4fc97f5 I tried to
enable std::format for 16-bit float types before C++23. This causes
errors for targets where the types are defined but can't actually be
used, e.g. i686 without sse2.
Make the std::numeric_limits and std::formatter specializations for
_Float16 and __bfloat16_t depend on the __STDCPP_FLOAT16_T__ and
__STDCPP_BFLOAT16_T__ macros again, so they're only defined for C++23
when the type is fully supported. This is OK because the main point of
my earlier commits was to add better support for _Float32 and _Float64.
It seems fine for the new 16-bit types to only be supported for C++23,
as they were never present before GCC 13 anyway.
libstdc++-v3/ChangeLog:
PR target/111060
* include/std/format (formatter): Only define specializations
for 16-bit floating-point types for C++23.
* include/std/limits (numeric_limits): Likewise.
If GCC is tested with a sysroot which doesn't contain a Python
installation (e.g., with a command such as
"make check-gcc-c FLAGS_UNDER_TEST="--sysroot=/some/path"), but there's
a python3-config in $PATH, then the testsuite will pick up the host's
Python.h which can't actually be used:
Executing on host: python3-config --includes (timeout = 300)
spawn -ignore SIGHUP python3-config --includes
-I/usr/include/python3.10 -I/usr/include/python3.10
Executing on host: /some/sysroot/bin/aarch64-unknown-linux-gnu-gcc --sysroot=/some/sysroot/libc -Wl,-dynamic-linker=/some/sysroot/libc/lib/ld-linux-aarch64.so.1 -Wl,-rpath=/some/sysroot/libc/lib /some/src/gcc.git/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c -fdiagnostics-plain-output -fplugin=./analyzer_cpython_plugin.so -fanalyzer -I/usr/include/python3.10 -I/usr/include/python3.10 -S -o cpython-plugin-test-2.s (timeout = 600)
spawn -ignore SIGHUP /some/sysroot/bin/aarch64-unknown-linux-gnu-gcc --sysroot=/some/sysroot/libc -Wl,-dynamic-linker=/some/sysroot/libc/lib/ld-linux-aarch64.so.1 -Wl,-rpath=/some/sysroot/libc/lib /some/src/gcc.git/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c -fdiagnostics-plain-output -fplugin=./analyzer_cpython_plugin.so -fanalyzer -I/usr/include/python3.10 -I/usr/include/python3.10 -S -o cpython-plugin-test-2.s
In file included from /usr/include/python3.10/Python.h:8,
from /some/src/gcc.git/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c:8:
/usr/include/python3.10/pyconfig.h:9:12: fatal error: aarch64-linux-gnu/python3.10/pyconfig.h: No such file or directory
compilation terminated.
compiler exited with status 1
This problem causes these testsuite failures:
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 17)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 18)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 21)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 31)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 32)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 35)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 45)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 55)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 63)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 66)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 68)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 69)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for excess errors)
Excess errors:
/usr/include/python3.10/pyconfig.h:9:12: fatal error: aarch64-linux-gnu/python3.10/pyconfig.h: No such file or directory
compilation terminated.
So try to compile a test file so that the testcase can be marked as
unsupported instead.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (dg-require-python-h): Test
whether Python.h can really be used.
Uros Bizjak [Fri, 18 Aug 2023 17:06:38 +0000 (19:06 +0200)]
i386: Use PUNPCKL?? to implement vector extend and zero_extend for TARGET_SSE2.
Implement vector extend and zero_extend functionality for TARGET_SSE2 using
PUNPCKL?? family of instructions. The code for e.g. zero-extend from V2SI to
V2DImode improves from:
* config/i386/i386-expand.cc (ix86_split_mmx_punpck):
Also handle V2QImode.
(ix86_expand_sse_extend): New function.
* config/i386/i386-protos.h (ix86_expand_sse_extend): New prototype.
* config/i386/mmx.md (<any_extend:insn>v4qiv4hi2): Enable for
TARGET_SSE2. Expand through ix86_expand_sse_extend for !TARGET_SSE4_1.
(<any_extend:insn>v2hiv2si2): Ditto.
(<any_extend:insn>v2qiv2hi2): Ditto.
* config/i386/sse.md (<any_extend:insn>v8qiv8hi2): Ditto.
(<any_extend:insn>v4hiv4si2): Ditto.
(<any_extend:insn>v2siv2di2): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr111023-2.c: New test.
* gcc.target/i386/pr111023-4b.c: New test.
* gcc.target/i386/pr111023-8b.c: New test.
* gcc.target/i386/pr111023.c: New test.
Aldy Hernandez [Fri, 18 Aug 2023 10:41:46 +0000 (12:41 +0200)]
[irange] Return FALSE if updated bitmask is unchanged [PR110753]
The mask/value pair we track in the irange is a bit fickle in that it
can sometimes contradict the bitmask inherent in the range. This can
happen when a series of calculations yield a combination such as:
[3, 1000] MASK 0xfffffffe VALUE 0x0
The mask/value above implies that the lowest bit is a known 0, which
would exclude the 3 in the range. At one time we tried keeping mask
and ranges 100% consistent, but the performance penalty was too high
(5% in VRP). Also, it's unclear whether the intersection of two
incompatible known bits should make the whole range undefined, or
just the contradicting bits. This is all documented in
irange::get_bitmask(). We could revisit both of these assumptions
in the future.
In this testcase IPA ends up with a range where the lower 2 bits are
expected to be 0, but the range is [1,1].
[irange] long int [1, 1] MASK 0xfffffffffffffffc VALUE 0x0
This causes irange::union_bitmask() to think an update occurred, when
no semantic change happened, thus triggering an assert in IPA-cp. We
could get rid of the assert, but it's cleaner to make
irange::{union,intersect}_bitmask always tell the truth. Beside, the
ranger's cache also depends on union being truthful.
PR ipa/110753
gcc/ChangeLog:
* value-range.cc (irange::union_bitmask): Return FALSE if updated
bitmask is semantically equivalent to the original mask.
(irange::intersect_bitmask): Same.
(irange::get_bitmask): Add comment.
Richard Biener [Thu, 17 Aug 2023 13:21:33 +0000 (15:21 +0200)]
tree-optimization/111019 - invariant motion and aliasing
The following fixes a bad choice in representing things to the alias
oracle by LIM which while correct in pieces is inconsistent with itself.
When canonicalizing a ref to a bare deref instead of leaving the base
object and the extracted offset the same and just substituting an
alternate ref the following replaces the base and the offset as well,
avoiding the confusion that otherwise will arise in
aliasing_matching_component_refs_p.
PR tree-optimization/111019
* tree-ssa-loop-im.cc (gather_mem_refs_stmt): When canonicalizing
also scrap base and offset in case the ref is indirect.
Jonathan Wakely [Fri, 18 Aug 2023 10:28:32 +0000 (11:28 +0100)]
libstdc++: Replace non-type-dependent uses of wchar_t in <format> and <chrono>
This is one more piece of the rework to make wchar_t support in
std::format depend on _GLIBCXX_USE_WCHAR_T.
In <format> the __to_wstring_numeric function is called with arguments
that aren't type-dependent, so a declaration needs to be available, or
the calls need to be guarded by _GLIBCXX_USE_WCHAR_T.
In <chrono> there is a similarly non-type-dependent call to std::format
with a wchar_t format string, which is ill-formed when the wchar_t
overloads of std::format are not declared. Use _GLIBCXX_WIDEN to make it
type-dependent.
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (operator<<): Make uses of wide
strings with streams and std::format type-dependent on _CharT.
* include/std/format [!_GLIBCXX_USE_WCHAR_T] Do not use
__to_wstring_numeric.
Kewen Lin [Fri, 18 Aug 2023 10:03:40 +0000 (05:03 -0500)]
Makefile.in: Make TM_P_H depend on $(TREE_H) [PR111021]
As PR111021 shows, the below ${port}-protos.h include tree.h
for code_helper and tree_code:
arm/arm-protos.h:#include "tree.h"
cris/cris-protos.h:#include "tree.h" (H-P removed this in r14-3218)
microblaze/microblaze-protos.h:#include "tree.h"
rl78/rl78-protos.h:#include "tree.h"
stormy16/stormy16-protos.h:#include "tree.h"
, when compiling build/gencondmd.cc, the include hierarchy
makes it depend on tm_p.h -> ${port}-protos.h -> tree.h,
which further includes (depends on) some files that are
generated during the building, such as: all-tree.def,
tree-check.h and so on. The previous commit r14-3215
should already force build/gencondmd.cc to depend on
${TREE_H}, so the reported build failure should be gone.
But for a long term maintenance, especially one day some
build/xxx.cc requires tm_p.h but not recog.h, the ${TREE_H}
dependence could be missed and a build failure will show
up. So this patch is to make TM_P_H depend on $(TREE_H),
any new build/xxx.cc depending on tm_p.h will be able to
consider ${TREE_H}.
It's tested with cross-builds for the affected ports with
steps:
1) dropped the fix r14-3215;
2) reproduced the build failure with serial build;
3) applied this patch, serial built and verified all passed;
4) added back r14-3215, serial built and verified all passed;
PR bootstrap/111021
gcc/ChangeLog:
* Makefile.in (TM_P_H): Add $(TREE_H) as dependence.
Kewen Lin [Fri, 18 Aug 2023 10:00:05 +0000 (05:00 -0500)]
vect: Factor out the handling on scatter store having gs_info.decl
Similar to the existing function vect_build_gather_load_calls,
this patch is to factor out the handling on scatter store
having gs_info.decl to vect_build_scatter_store_calls which
is a new function. It also does some minor refactoring like
moving some variables' declarations close to their uses and
restrict the scope for some of them etc.
It's a pre-patch for upcoming vectorizable_store re-structuring
for costing.
gcc/ChangeLog:
* tree-vect-stmts.cc (vect_build_scatter_store_calls): New, factor
out from ...
(vectorizable_store): ... here.
Jonathan Wakely [Fri, 18 Aug 2023 08:33:23 +0000 (09:33 +0100)]
libstdc++: Fix incomplete rework of wchar_t support in std::format
r14-3300-g023a62b77f999b left make_wformat_args and some uses of
std::wformat_context unguarded by _GLIBCXX_USE_WCHAR_T.
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (operator<<): Use __format_context.
* include/std/format (__format::__format_context): New alias
template.
[!_GLIBCXX_USE_WCHAR_T] (wformat_args, make_wformat_arg):
Disable.
Kewen Lin [Fri, 18 Aug 2023 07:26:52 +0000 (02:26 -0500)]
vect: Move VMAT_GATHER_SCATTER handlings from final loop nest
Following Richi's suggestion [1], this patch is to move the
handlings on VMAT_GATHER_SCATTER in the final loop nest
of function vectorizable_load to its own loop. Basically
it duplicates the final loop nest, clean up some useless
set up code for the case of VMAT_GATHER_SCATTER, remove some
unreachable code. Also remove the corresponding handlings
in the final loop nest.
* tree-vect-stmts.cc (vectorizable_load): Move the handlings on
VMAT_GATHER_SCATTER in the final loop nest to its own loop,
and update the final nest accordingly.
Lehua Ding [Fri, 18 Aug 2023 02:52:53 +0000 (10:52 +0800)]
RISC-V: Fix -march error of zhinxmin testcases
This little patch fixs the -march error of a zhinxmin testcase I added earlier
and an old zhinxmin testcase, since these testcases are for zhinxmin extension
and not zfhmin extension.
Andrew Pinski [Thu, 17 Aug 2023 19:16:25 +0000 (12:16 -0700)]
Document cond_neg, cond_one_cmpl, cond_len_neg and cond_len_one_cmpl standard patterns
When I added `cond_one_cmpl` (and the corresponding IFN) I had noticed cond_neg
standard named pattern was not documented and this adds the documentation for
all 4 named patterns now.
OK? Tested by building the manual.
gcc/ChangeLog:
* doc/md.texi (Standard patterns): Document cond_neg, cond_one_cmpl,
cond_len_neg and cond_len_one_cmpl.
Lehua Ding [Sat, 12 Aug 2023 08:12:45 +0000 (16:12 +0800)]
RISC-V: Add the missed half floating-point mode patterns of local_pic_load/store when only use zfhmin or zhinxmin
Hi,
There is a new failed RISC-V testcase(testsuite/gcc.target/riscv/rvv/autovec/vls/const-4.c)
on the current trunk branch when use medany as default cmodel.
The reason is the load of half floating-point imm is convert from RTL 1 to RTL
2 as the cmodel be changed from medlow to medany. This change let insn 7 be
combineed with @pred_broadcast patterns (insn 8) at combine pass. However,
insn 6 and insn 7 are combined for SF and DF mode, but not for HF mode, and
the fail combined leads to insn 7 and insn 8 be combined. The reason of the
fail combined is the local_pic_loadhf pattern doesn't exist when only enable
zfhmin(implied by zvfh).
Therefore, when only zfhmin but not zfh is enabled, the define_insn of
*local_pic_load<ANYF:mode> must also be able to produce the pattern for
*load_pic_loadhf pattern, since the zfhmin extension also includes a
half floating-point load/store instructions. So, I added an ANFLSF Iterator
and applied it to local_pic_load/store define_insns. I have checked other ANYF
usage scenarios and feel that this is the only place that needs to be corrected.
I may have missed something, please correct. Thanks.
Lehua Ding [Mon, 14 Aug 2023 03:34:13 +0000 (11:34 +0800)]
RISC-V: Revert the convert from vmv.s.x to vmv.v.i
Hi,
This patch revert the convert from vmv.s.x to vmv.v.i and add new pattern
optimize the special case when the scalar operand is zero.
Currently, the broadcast pattern where the scalar operand is a imm
will be converted to vmv.v.i from vmv.s.x and the mask operand will be
converted from 00..01 to 11..11. There are some advantages and
disadvantages before and after the conversion after discussing
with Juzhe offline and we chose not to do this transform.
Before:
Advantages: The vsetvli info required by vmv.s.x has better compatibility since
vmv.s.x only required SEW and VLEN be zero or one. That mean there
is more opportunities to combine with other vsetlv infos in vsetvl pass.
Disadvantages: For non-zero scalar imm, one more `li rd, imm` instruction
will be needed.
After:
Advantages: No need `li rd, imm` instruction since vmv.v.i support imm operand.
Disadvantages: Like before's advantages. Worse compatibility leads to more
vsetvl instrunctions need.
Consider the bellow C code and asm after autovec.
there is an extra insn (vsetivli zero, 1, e32, m1, ta, ma)
after converted vmv.s.x to vmv.v.i.
```
int foo1(int* restrict a, int* restrict b, int *restrict c, int n) {
int sum = 0;
for (int i = 0; i < n; i++)
sum += a[i] * b[i];
return sum;
}
```
asm (Before):
```
foo1:
ble a3,zero,.L7
vsetvli a2,zero,e32,m1,ta,ma
vmv.v.i v1,0
.L6:
vsetvli a5,a3,e32,m1,tu,ma
slli a4,a5,2
sub a3,a3,a5
vle32.v v2,0(a0)
vle32.v v3,0(a1)
add a0,a0,a4
add a1,a1,a4
vmacc.vv v1,v3,v2
bne a3,zero,.L6
vsetvli a2,zero,e32,m1,ta,ma
vmv.s.x v2,zero
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
ret
.L7:
li a0,0
ret
```
asm (After):
```
foo1:
ble a3,zero,.L4
vsetvli a2,zero,e32,m1,ta,ma
vmv.v.i v1,0
.L3:
vsetvli a5,a3,e32,m1,tu,ma
slli a4,a5,2
sub a3,a3,a5
vle32.v v2,0(a0)
vle32.v v3,0(a1)
add a0,a0,a4
add a1,a1,a4
vmacc.vv v1,v3,v2
bne a3,zero,.L3
vsetivli zero,1,e32,m1,ta,ma
vmv.v.i v2,0
vsetvli a2,zero,e32,m1,ta,ma
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
ret
.L4:
li a0,0
ret
```
Lehua Ding [Thu, 17 Aug 2023 07:42:51 +0000 (15:42 +0800)]
RISC-V: Forbidden fuse vlmax vsetvl to DEMAND_NONZERO_AVL vsetvl
Hi,
This little patch fix the fail testcase
(gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c)
after apply this patch
(https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627121.html).
The specific reason is that the vsetvl pass has bug and this patch
forbidden the fuse of this case. This patch needs to be committed
before that patch to work.
Jonathan Wakely [Thu, 17 Aug 2023 19:39:02 +0000 (20:39 +0100)]
libstdc++: Replace global std::string objects in tzdb.cc
When the library is built with --disable-libstdcxx-dual-abi the only
type of std::string supported is the COW string, and the two global
std::string objects in tzdb.cc have to allocate memory. I added them
thinking they would fit in the SSO string buffer, but that's not the
case when the library only uses COW strings.
Replace them with string_view objects to avoid any allocations.
libstdc++-v3/ChangeLog:
* src/c++20/tzdb.cc (tzdata_file, leaps_file): Change type to
std::string_view.
Jonathan Wakely [Mon, 24 Jul 2023 10:38:32 +0000 (11:38 +0100)]
libstdc++: Reuse double overload of __convert_to_v if possible
For targets where double and long double have the same representation we
can reuse the same __convert_to_v code for both types. This will
slightly reduce the size of the compiled code in the library.
libstdc++-v3/ChangeLog:
* config/locale/generic/c_locale.cc (__convert_to_v): Reuse
double overload for long double if possible.
Jonathan Wakely [Wed, 19 Jul 2023 08:56:58 +0000 (09:56 +0100)]
libstdc++: Micro-optimize construction of named std::locale
This shaves about 100ns off the std::locale constructor for named
locales (which is only about 1% of the total time).
Using !*s instead of !strcmp(s, "") doesn't make any difference as GCC
optimizes that already even at -O1. !strcmp(s, "C") is optimized at -O2
so replacing that with s[0] == 'C' && s[1] == '\0' only matters for the
--enable-libstdcxx-debug builds. But !strcmp(s, "POSIX") always makes a
call to strcmp at any optimization level. We make that strcmp call,
maybe several times, for any locale name except for "C" (which will be
matched before we get to the check for "POSIX").
For most targets, locale names begin with a lowercase letter and the
only one that begins with 'P' is "POSIX". Replacing !strcmp(s, "POSIX")
with s[0] == 'P' && !strcmp(s+1, "OSIX") means that we avoid calling
strcmp unless the string really does match "POSIX".
Maybe more importantly, I find is_C_locale(s) easier to read than
strcmp(s, "C") == 0 || strcmp(s, "POSIX") == 0, and !is_C_locale(s)
easier to read than strcmp(s, "C") != 0 && strcmp(s, "POSIX") != 0.
libstdc++-v3/ChangeLog:
* src/c++98/localename.cc (is_C_locale): New function.
(locale::locale(const char*)): Use is_C_locale.
Calling string::assign(Iter, Iter) with "foreign" iterators (not the
string's own iterator or pointer types) currently constructs a temporary
string and then calls replace to copy the characters from it. That means
we copy from the iterators twice, and if the replace operation has to
grow the string then we also allocate twice.
By using *this = basic_string(first, last, get_allocator()) we only
perform a single allocation+copy and then do a cheap move assignment
instead of a second copy (and possible allocation). But that alternative
has to be done conditionally, so that we don't pessimize the native
iterator case (the string's own iterator and pointer types) which
currently select efficient overloads of replace which will not allocate
at all if the string already has sufficient capacity. For C++20 we can
extend that efficient case to work for any contiguous iterator with the
right value type, not just for the string's native iterators.
So the change is to inline the code that decides whether to work in
place or to allocate+copy (instead of deciding that via overload
resolution for replace), and for the allocate+copy case do a move
assignment instead of another call to replace.
For C++98 there is no change, as we can't do an efficient move
assignment anyway, so keep the current code.
We can also simplify assign(initializer_list<CharT>) because the backing
array for an initializer_list is always disjunct with *this, so most of
the code in _M_replace is not needed.
libstdc++-v3/ChangeLog:
PR libstdc++/110945
* include/bits/basic_string.h (basic_string::assign(Iter, Iter)):
Dispatch to _M_replace or move assignment from a temporary,
based on the iterator type.
Jonathan Wakely [Mon, 7 Aug 2023 13:06:59 +0000 (14:06 +0100)]
libstdc++: Add std::formatter specializations for extended float types
This makes it possible to format _Float32, _Float64 etc. in C++20 mode.
Previously it was only possible to format them in C++23 when the
<stdfloat> typedefs and the std::to_chars overloads were defined.
Instead of relying on std::to_chars for those types, we can just reuse
the formatters for float, double and long double. This also avoids
template bloat by reusing the same specializations instead of
instantiating __formatter_fp for every different type.
libstdc++-v3/ChangeLog:
* include/std/format (formatter): Add partial specializations
for extended floating-point types.
* testsuite/std/format/functions/format.cc: Move test_float128()
to ...
* testsuite/std/format/formatter/ext_float.cc: New test.
Jonathan Wakely [Mon, 7 Aug 2023 11:52:57 +0000 (12:52 +0100)]
libstdc++: Define std::numeric_limits<_FloatNN> before C++23
The extended floating-point types such as _Float32 are supported by GCC
prior to C++23, you just can't use the standard-conforming names from
<stdfloat> to refer to them. This change defines the specializations of
std::numeric_limits for those types for older dialects, not only for
C++23.
libstdc++-v3/ChangeLog:
* include/bits/c++config (__gnu_cxx::__bfloat16_t): Define
whenever __BFLT16_DIG__ is defined, not only for C++23.
* include/std/limits (numeric_limits<bfloat16_t>): Likewise.
(numeric_limits<_Float16>, numeric_limits<_Float32>)
(numeric_limits<_Float64>): Likewise for other extended
floating-point types.
Jonathan Wakely [Thu, 17 Aug 2023 17:27:15 +0000 (18:27 +0100)]
libstdc++: Make __cmp_cat::__unseq constructor consteval
This constructor should only ever be used with a literal 0 as the
argument, so we can make it consteval. This has the nice advantage that
it is expanded immediately in the front end, and so GDB will never step
into the __cmp_cat::__unseq::__unseq(__unseq*) constructor that is
uninteresting and probably confusing to users.
libstdc++-v3/ChangeLog:
* libsupc++/compare (__cmp_cat::__unseq): Make ctor consteval.
* testsuite/18_support/comparisons/categories/zero_neg.cc: Prune
excess errors caused by invalid consteval calls.
Jonathan Wakely [Tue, 15 Aug 2023 15:35:22 +0000 (16:35 +0100)]
libstdc++: Simplify chrono::__units_suffix using std::format
For std::chrono formatting we can simplify __units_suffix by using
std::format_to to generate the "[n/m]s" suffix with the correct
character type and write directly to the output iterator, so it doesn't
need to be widened using ctype. We can't remove the use of ctype::widen
for formatting a time zone abbreviation as a wide string, because that
can contain arbitrary characters that can't be widened by
__to_wstring_numeric.
This also fixes a bug in the chrono formatter for %Z which created a
dangling wstring_view.
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (__units_suffix_misc): Remove.
(__units_suffix): Return a known suffix as string view, do not
write unknown suffixes to a buffer.
(__fmt_units_suffix): New function that formats the suffix using
std::format_to.
(operator<<, __chrono_formatter::_M_q): Use __fmt_units_suffix.
(__chrono_formatter::_M_Z): Correct lifetime of wstring.
Jonathan Wakely [Tue, 15 Aug 2023 15:35:22 +0000 (16:35 +0100)]
libstdc++: Rework std::format support for wchar_t
This changes how std::format creates wide strings, by replacing uses of
std::ctype<wchar_t>::widen with the recently-added __to_wstring_numeric
helper function. This removes the dependency on the locale, which should
only be used for locale-specific formats such as {:Ld}.
Also disable all the wide string formatting support if the
_GLIBCXX_USE_WCHAR_T macro is not defined. This is consistent with other
wchar_t support being disabled if the library is built without that
macro defined.
libstdc++-v3/ChangeLog:
* include/std/format [_GLIBCXX_USE_WCHAR_T]: Guard all wide
string formatters with this macro.
(__formatter_int::_M_format_int, __formatter_fp::format)
(formatter<const void*, C>::format): Use __to_wstring_numeric
instead of std::ctype::widen.
(__formatter_fp::_M_localize): Use hardcoded wchar_t values
instead of std::ctype::widen.
* testsuite/std/format/functions/format.cc: Add more checks for
wstring formatting of arithmetic types.
Jonathan Wakely [Wed, 16 Aug 2023 15:55:00 +0000 (16:55 +0100)]
libstdc++: Implement std::to_string in terms of std::format (P2587R3)
This change for C++26 affects std::to_string for floating-point
arguments, so that they should be formatted using std::format("{}", v)
instead of using sprintf. The modified specification in the standard
also affects integral arguments, but there's no observable difference
for them, and we already use std::to_chars for them anyway.
To avoid <string> depending on all of <format>, this change actually
just uses std::to_chars directly instead of using std::format. This is
equivalent, because the format spec "{}" doesn't use any of the other
features of std::format.
libstdc++-v3/ChangeLog:
* include/bits/basic_string.h (to_string(floating-point-type)):
Implement using std::to_chars for C++26.
* include/bits/version.def (__cpp_lib_to_string): Define.
* include/bits/version.h: Regenerate.
* testsuite/21_strings/basic_string/numeric_conversions/char/dr1261.cc:
Adjust expected result in C++26 mode.
* testsuite/21_strings/basic_string/numeric_conversions/char/to_string.cc:
Likewise.
* testsuite/21_strings/basic_string/numeric_conversions/wchar_t/dr1261.cc:
Likewise.
* testsuite/21_strings/basic_string/numeric_conversions/wchar_t/to_wstring.cc:
Likewise.
* testsuite/21_strings/basic_string/numeric_conversions/char/to_string_float.cc:
New test.
* testsuite/21_strings/basic_string/numeric_conversions/wchar_t/to_wstring_float.cc:
New test.
* testsuite/21_strings/basic_string/numeric_conversions/version.cc:
New test.
Jonathan Wakely [Mon, 14 Aug 2023 10:56:55 +0000 (11:56 +0100)]
libstdc++: Optimize std::to_string using std::string::resize_and_overwrite
This uses std::string::__resize_and_overwrite to avoid initializing the
string buffer with characters that are immediately overwritten. This
results in about 6% better performance for the std_to_string case in
int-benchmark.cc from https://github.com/fmtlib/format-benchmark
This requires a change to a testcase. The previous implementation
guaranteed that the string returned from std::to_string(integral-type)
would have no excess capacity, because it was constructed with the
correct length. The new implementation constructs an empty string and
then resizes it with resize_and_overwrite, which over-allocates. This
means that the "no-excess capacity" guarantee no longer holds.
We can also greatly improve the performance of std::to_wstring by using
std::to_string and then widening it with a new helper function, instead
of using std::swprintf to do the formatting.
libstdc++-v3/ChangeLog:
* include/bits/basic_string.h (to_string(integral-type)): Use
resize_and_overwrite when available.
(__to_wstring_numeric): New helper functions.
(to_wstring): Use std::to_string then __to_wstring_numeric.
* testsuite/21_strings/basic_string/numeric_conversions/char/to_string_int.cc:
Remove check for no excess capacity.
Jonathan Wakely [Wed, 16 Aug 2023 17:22:38 +0000 (18:22 +0100)]
libstdc++: Define std::string::resize_and_overwrite for C++11 and COW string
There are several places in the library where we can improve performance
using resize_and_overwrite so it's inconvenient only being able to use
it in C++23 mode, and only for cxx11 strings. This adds it for COW
strings, and also adds __resize_and_overwrite as an extension for C++11
mode.
The new __resize_and_overwrite is available for C++11 and later, so
within the library we can use that consistently even in C++23. In order
to avoid making a copy (which might not be possible for non-copyable,
non-movable types) the callable is passed to resize_and_overwrite as an
lvalue reference. Unlike wrapping it in std::ref(op) this ensures that
invoking it as std::move(op)(n, p) will use the correct value category.
It also avoids any overhead that would be added by wrapping it in a
lambda like [&op](auto p, auto n) { return std::move(op)(p, n); }.
Adjust std::format to use the new __resize_and_overwrite, which we can
assume exists because we only use std::basic_string<char> and
std::basic_string<wchar_t>, so no program-defined specializations.
The uses in <experimental/internet> cannot be replaced, because those
are type-dependent on an Allocator template parameter, which could mean
they use program-defined specializations of std::basic_string that don't
have the __resize_and_overwrite extension.
libstdc++-v3/ChangeLog:
* include/bits/basic_string.h (__resize_and_overwrite): New
function.
* include/bits/basic_string.tcc (__resize_and_overwrite): New
function.
(resize_and_overwrite): Simplify by using reserve instead of
growing the string manually. Adjust for C++11 compatibility.
* include/bits/cow_string.h (resize_and_overwrite): New
function.
(__resize_and_overwrite): New function.
* include/bits/version.def (__cpp_lib_string_resize_and_overwrite):
Do not depend on cxx11abi.
* include/bits/version.h: Regenerate.
* include/std/format (__formatter_fp::_S_resize_and_overwrite):
Remove.
(__formatter_fp::format, __formatter_fp::_M_localize): Use
__resize_and_overwrite instead of _S_resize_and_overwrite.
* testsuite/21_strings/basic_string/capacity/char/resize_and_overwrite.cc:
Adjust for C++11 compatibility when included by ...
* testsuite/21_strings/basic_string/capacity/char/resize_and_overwrite_ext.cc:
New test.
Patrick Palka [Thu, 17 Aug 2023 16:56:32 +0000 (12:56 -0400)]
libstdc++: Implement P2770R0 changes to join_view / join_with_view
This C++23 paper fixes an issue in these views when adapting a certain
kind of non-forward range, and we treat it as a DR against C++20.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/bits/regex.h (regex_iterator::iterator_concept):
Define for C++20 as per P2770R0.
(regex_token_iterator::iterator_concept): Likewise.
* include/std/ranges (__detail::__as_lvalue): Define.
(join_view::_Iterator): Befriend join_view.
(join_view::_Iterator::_M_satisfy): Use _M_get_outer
instead of _M_outer.
(join_view::_Iterator::_M_get_outer): Define.
(join_view::_Iterator::_Iterator): Split constructor taking
_Parent argument into two as per P2770R0. Remove constraint on
default constructor.
(join_view::_Iterator::_M_outer): Make this data member present
only when the underlying range is forward.
(join_view::_Iterator::operator++): Use _M_get_outer instead of
_M_outer.
(join_view::_Iterator::operator--): Use __as_lvalue helper.
(join_view::_Iterator::operator==): Adjust constraints as per
P2770R0.
(join_view::_Sentinel::__equal): Use _M_get_outer instead of
_M_outer.
(join_view::_M_outer): New data member when the underlying range
is non-forward.
(join_view::begin): Adjust definition as per P2770R0.
(join_view::end): Likewise.
(join_with_view::_M_outer_it): New data member when the
underlying range is non-forward.
(join_with_view::begin): Adjust definition as per P2770R0.
(join_with_view::end): Likewise.
(join_with_view::_Iterator::_M_outer_it): Make this data member
present only when the underlying range is forward.
(join_with_view::_Iterator::_M_get_outer): Define.
(join_with_view::_Iterator::_Iterator): Split constructor
taking _Parent argument into two as per P2770R0. Remove
constraint on default constructor.
(join_with_view::_Iterator::_M_update_inner): Adjust definition
as per P2770R0.
(join_with_view::_Iterator::_M_get_inner): Likewise.
(join_with_view::_Iterator::_M_satisfy): Adjust calls to
_M_get_inner. Use _M_get_outer instead of _M_outer_it.
(join_with_view::_Iterator::operator==): Adjust constraints
as per P2770R0.
(join_with_view::_Sentinel::operator==): Use _M_get_outer
instead of _M_outer_it.
* testsuite/std/ranges/adaptors/p2770r0.cc: New test.
Patrick Palka [Thu, 17 Aug 2023 16:40:04 +0000 (12:40 -0400)]
libstdc++: Convert _RangeAdaptorClosure into a CRTP base [PR108827]
Using the CRTP idiom for this base class avoids bloating the size of a
pipeline when adding distinct empty range adaptor closure objects to it,
as detailed in section 4.1 of P2387R3.
But it means we can no longer define its operator| overloads as hidden
friends, since it'd mean each instantiation of _RangeAdaptorClosure
introduces its own distinct set of hidden friends. So e.g. for the
outer | in
x | (views::reverse | views::join)
ADL would find 6 distinct hidden operator| friends:
two from _RangeAdaptorClosure<_Reverse>
two from _RangeAdaptorClosure<_Join>
two from _RangeAdaptorClosure<_Pipe<_Reverse, _Join>>
but we really only want to consider the last two.
We avoid this issue by instead defining the operator| overloads at
namespace scope alongside _RangeAdaptorClosure. This should be fine
because the only types defined in this namespace are _RangeAdaptorClosure,
_RangeAdaptor, _Pipe and _Partial, so we don't have to worry about
unintentional ADL.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
PR libstdc++/108827
libstdc++-v3/ChangeLog:
* include/std/ranges (__adaptor::_RangeAdaptorClosure):
Convert into a CRTP class template. Move hidden operator|
friends into namespace scope and adjust their constraints.
(__closure::__is_range_adaptor_closure_fn): Define.
(__closure::__is_range_adaptor_closure): Define.
(__adaptor::_Partial): Adjust use of _RangeAdaptorClosure.
(__adaptor::_Pipe): Likewise.
(views::_All): Likewise.
(views::_Join): Likewise.
(views::_Common): Likewise.
(views::_Reverse): Likewise.
(views::_Elements): Likewise.
(views::_Adjacent): Likewise.
(views::_AsRvalue): Likewise.
(views::_Enumerate): Likewise.
(views::_AsConst): Likewise.
* testsuite/std/ranges/adaptors/all.cc: Reinstate assertion
expecting that adding empty range adaptor closure objects to a
pipeline doesn't increase the size of a pipeline.
[LRA]: When assigning stack slots to pseudos previously assigned to fp consider other spilled pseudos
The previous LRA patch can assign slot of conflicting pseudos to
pseudos spilled after prohibiting fp->sp elimination. This patch
fixes this problem.
gcc/ChangeLog:
* lra-spills.cc (assign_stack_slot_num_and_sort_pseudos): Moving
slots_num initialization from here ...
(lra_spill): ... to here before the 1st call of
assign_stack_slot_num_and_sort_pseudos. Add the 2nd call after
fp->sp elimination.
/home/jemarch/foo.c: In function ‘xdp_context’:
/home/jemarch/foo.c:15:20: warning: comparison of distinct pointer types lacks a cast
15 | if (metadata + 1 > data)
| ^
LLVM supports an option -W[no-]compare-distinct-pointer-types that can
be used in order to enable or disable the emission of such warnings.
It is enabled by default.
This patch adds the same options to GCC.
Documentation and testsuite updated included.
Regtested in x86_64-linu-gnu.
No regressions observed.
fr30 is the only target defining GO_IF_LEGITIMATE_ADDRESS right now, in
which case the `code_helper ch` argument to memory_address_addr_space_p()
is unused and emits a new warning.
gcc/ChangeLog:
* recog.cc (memory_address_addr_space_p): Mark possibly unused
argument as unused.
Tsukasa OI [Thu, 17 Aug 2023 13:52:14 +0000 (07:52 -0600)]
[PATCH] RISC-V: Deduplicate #error messages in testsuite
"#error Feature macro not defined" is required to test the existence of an
extension through the preprocessor. However, multiple occurrence of the
exact same error message will confuse the developer once an error is
encountered.
This commit replaces such error messages to
"#error Feature macro for `EXT' not defined" to make which
macro is missing.
Tobias Burnus [Thu, 17 Aug 2023 13:20:55 +0000 (15:20 +0200)]
libgomp: call numa_available first when using libnuma
The documentation requires that numa_available() is called and only
when successful, other libnuma function may be called. Internally,
it does a syscall to get_mempolicy with flag=0 (which would return
the default policy if mode were not NULL). If this returns -1 (and
not 0) and errno == ENOSYS, the Linux kernel does not have the
get_mempolicy syscall function; if so, numa_available() returns -1
(otherwise: 0).
libgomp/
PR libgomp/111024
* allocator.c (gomp_init_libnuma): Call numa_available; if
not available or not returning 0, disable libnuma usage.
Alex Coplan [Thu, 17 Aug 2023 13:08:31 +0000 (14:08 +0100)]
doc: Fixes to RTL-SSA sample code
This patch fixes up the code examples in the RTL-SSA documentation (the
sections on making insn changes) to reflect the current API.
The main issues are as follows:
- rtl_ssa::recog takes an obstack_watermark & as the first parameter.
Presumably this is intended to be the change attempt, so I've updated
the examples to pass this through.
- The variants of recog and restrict_movement that take an ignore
predicate have been renamed with an _ignoring suffix, so I've
updated callers to use those names.
- A couple of minor "obvious" fixes to add a missing address-of
operator and correct a variable name.
gcc/ChangeLog:
* doc/rtl.texi: Fix up sample code for RTL-SSA insn changes.
Jose E. Marchesi [Thu, 17 Aug 2023 12:19:15 +0000 (14:19 +0200)]
bpf: support `naked' function attributes in BPF targets
The kernel selftests and other BPF programs make extensive use of the
`naked' function attribute with bodies written using basic inline
assembly. This patch adds support for the attribute to
bpf-unkonwn-none, makes it to inhibit warnings due to lack of explicit
`return' statement, and updates documentation and testsuite
accordingly.
Tested in x86_64-linux-gnu host and bpf-unknown-none target.
gcc/ChangeLog
PR target/111046
* config/bpf/bpf.cc (bpf_attribute_table): Add entry for the
`naked' function attribute.
(bpf_warn_func_return): New function.
(TARGET_WARN_FUNC_RETURN): Define.
(bpf_expand_prologue): Add preventive comment.
(bpf_expand_epilogue): Likewise.
* doc/extend.texi (BPF Function Attributes): Document the `naked'
function attribute.
Jonathan Wakely [Thu, 17 Aug 2023 12:02:27 +0000 (13:02 +0100)]
libstdc++: Fix std::format("{:F}", inf) to use uppercase
std::format was treating {:f} and {:F} identically on the basis that for
the fixed 1.234567 format there are no alphabetical characters that need
to be in uppercase. But that's wrong for infinities and NaNs, which
should be formatted as "INF" and "NAN" for {:F}.
libstdc++-v3/ChangeLog:
* include/std/format (__format::_Pres_type): Add _Pres_F.
(__formatter_fp::parse): Use _Pres_F for 'F'.
(__formatter_fp::format): Set __upper for _Pres_F.
* testsuite/std/format/functions/format.cc: Check formatting of
infinity and NaN for each presentation type.
The following changes the gate to perform vectorization of BB reductions
to use needs_fold_left_reduction_p which in turn requires handling
TYPE_OVERFLOW_UNDEFINED types in the epilogue code generation by
promoting any operations generated there to use unsigned arithmetic.
The following does this, there's currently only v16qi where x86
supports a .REDUC_PLUS reduction for integral modes so I had to
add a x86 specific testcase using GIMPLE IL.
* tree-vect-slp.cc (vect_slp_check_for_roots): Use
!needs_fold_left_reduction_p to decide whether we can
handle the reduction with association.
(vectorize_slp_instance_root_stmt): For TYPE_OVERFLOW_UNDEFINED
reductions perform all arithmetic in an unsigned type.
Test case g++.dg/analyzer/fanalyzer-show-events-in-system-headers.C
introduced by patch ce8cdf5bcf96a2db6d7b9f656fc9ba58d7942a83
emitted a warning for an unused dg-line variable.
This fixes up the blunder.
Signed-off-by: benjamin priour <vultkayn@gcc.gnu.org>
gcc/testsuite/ChangeLog:
* g++.dg/analyzer/fanalyzer-show-events-in-system-headers.C:
Remove dg-line var declare_a.
Rainer Orth [Thu, 17 Aug 2023 08:14:49 +0000 (10:14 +0200)]
build: Allow for Xcode 15 ld -v output
Since Xcode 15 beta 6, ld -v output differs from previous versions:
* macOS 13/Xcode 14:
@(#)PROGRAM:ld PROJECT:ld64-857.1
* macOS 14/Xcode 15:
@(#)PROGRAM:ld PROJECT:dyld-1015.1
configure cannot handle the new form, so LD64_VERSION isn't set.
This patch fixes this. The autoconf manual states that sed doesn't
portably support alternation, so I'm using two separate expressions to
extract the version number.
Jonathan Wakely [Wed, 16 Aug 2023 20:29:46 +0000 (21:29 +0100)]
libstdc++: Disable PCH for tests that rely on include order
These tests expect to be able to #undef a feature test macro and then
include <version> to get it redefined. But if <version> has already been
included by the <bits/stdc++.h> PCH then including it again does nothing
and the macro remains undefined.
Jonathan Wakely [Wed, 16 Aug 2023 20:46:05 +0000 (21:46 +0100)]
libstdc++: Fix testsuite no_pch directive
The { dg-add-options no_pch } directive is supposed to add a macro
definition that invalidates the PCH file, and ensures that the #include
directives in the test file are processed as written. But the proc that
adds the options actually removes all existing options, cancelling out
any previous dg-options directive.
This means that using no_pch will cause FAILs in a file that relies on
other options set by an earlier dg-options.
The no_pch directive was added for PR libstdc++/21769 where Janis
suggested adding it as return "$flags -D__GLIBCXX__=99999999" but what
was actually committed didn't include the $flags so replaced them.
Additionally, using no_pch only prevents the precompiled version of
<bits/stdc++.h> from being included, it doesn't prevent the
non-precompiled version being included by -include bits/stdc++.h in the
test flags. Use regsub to filter that out of the options as well.
libstdc++-v3/ChangeLog:
* testsuite/lib/dg-options.exp (add_options_for_no_pch): Remove
any "-include bits/stdc++.h" from options and add the macro to
the existing options instead of replacing them.
Haochen Jiang [Thu, 17 Aug 2023 06:15:43 +0000 (14:15 +0800)]
Emit a warning when AVX10 options conflict in vector width
gcc/ChangeLog:
* common/config/i386/i386-common.cc
(ix86_check_avx10_vector_width): New function to check isa_flags
to emit a warning when there is a conflict in AVX10 options for
vector width.
(ix86_handle_option): Add check for avx10.1-256 and avx10.1-512.
* config/i386/driver-i386.cc (host_detect_local_cpu):
Do not append -mno-avx10-max-512bit for -march=native.
Haochen Jiang [Thu, 17 Aug 2023 06:13:28 +0000 (14:13 +0800)]
Emit a warning when disabling AVX512 with AVX10 enabled or disabling AVX10 with AVX512 enabled
gcc/ChangeLog:
* common/config/i386/i386-common.cc
(ix86_check_avx10): New function to check isa_flags and
isa_flags_explicit to emit warning when AVX10 is enabled
by "-m" option.
(ix86_check_avx512): New function to check isa_flags and
isa_flags_explicit to emit warning when AVX512 is enabled
by "-m" option.
(ix86_handle_option): Do not change the flags when warning
is emitted.
* config/i386/driver-i386.cc (host_detect_local_cpu):
Do not append -mno-avx10.1 for -march=native.
Andrew Pinski [Sat, 12 Aug 2023 01:19:01 +0000 (18:19 -0700)]
Add support for vector conitional not
Like the support conditional neg (r12-4470-g20dcda98ed376cb61c74b2c71),
this just adds conditional not too.
Also we should be able to turn `(a ? -1 : 0) ^ b` into a conditional
not.
OK? Bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-gnu.
gcc/ChangeLog:
* internal-fn.def (COND_NOT): New internal function.
* match.pd (UNCOND_UNARY, COND_UNARY): Add bit_not/not
to the lists.
(`vec (a ? -1 : 0) ^ b`): New pattern to convert
into conditional not.
* optabs.def (cond_one_cmpl): New optab.
(cond_len_one_cmpl): Likewise.
gcc/testsuite/ChangeLog:
PR target/110986
* gcc.target/aarch64/sve/cond_unary_9.c: New test.
Harald Anlauf [Wed, 16 Aug 2023 20:00:49 +0000 (22:00 +0200)]
Fortran: fix memleak for character,value dummy of bind(c) procedure [PR110360]
Testcase gfortran.dg/bind_c_usage_13.f03 exhibited a memleak in the frontend
occuring when passing a character literal to a character,value dummy of a
bind(c) procedure, due to a missing cleanup in the conversion of the actual
argument expression. Reduced testcase:
program p
interface
subroutine val_c (c) bind(c)
use iso_c_binding, only: c_char
character(len=1,kind=c_char), value :: c
end subroutine val_c
end interface
call val_c ("A")
end
gcc/fortran/ChangeLog:
PR fortran/110360
* trans-expr.cc (conv_scalar_char_value): Use gfc_replace_expr to
avoid leaking replaced gfc_expr.
The callable used for resize_and_overwrite was being passed the string's
expanded capacity, which might be greater than the new size being
requested. This is not conforming, as the standard requires the same n
to be passed to the callable that the user passed to
resize_and_overwrite.
The existing tests didn't catch this because they all used a value which
was more than twice the existing capacity, so the _M_create call
allocated exactly what was requested, and the value passed to the
callable was correct. But when the requested size is greater than the
current capacity but smaller than twice the current capacity, _M_create
will allocate twice the current capacity and then that value was being
passed to the callable.
I noticed this because std::format(L"{}", 0.25) was producing L"0.25XX"
where the XX characters were whatever happened to be on the stack before
the call. When std::format used resize_and_overwrite to widen a string
it was copying too many characters into the destination and setting the
result's length too long. I've added a test for this case, and a new
test that doesn't hardcode -std=gnu++20 so can be used to test
std::format in C++23 and C++26 modes.
libstdc++-v3/ChangeLog:
* include/bits/basic_string.tcc (resize_and_overwrite): Invoke
the callable with the same size as resize_and_overwrite was
called with.
* testsuite/21_strings/basic_string/capacity/char/resize_and_overwrite.cc:
Check with small values for the new size.
* testsuite/std/format/functions/format.cc: Check wide
formatting of double values that produce small strings.
* testsuite/std/format/functions/format_c++23.cc: New test.
ira: update allocated_hardreg_p[] in improve_allocation() [PR110254]
The improve_allocation() routine does not update the
allocated_hardreg_p[] array after an allocno is assigned a register.
If the register chosen in improve_allocation() is one that already has
been assigned to a conflicting allocno, then allocated_hardreg_p[]
already has the corresponding bit set to TRUE, so nothing needs to be
done.
But improve_allocation() can also choose a register that has not been
assigned to a conflicting allocno, and also has not been assigned to any
other allocno. In this case, allocated_hardreg_p[] has to be updated.
Jonathan Wakely [Wed, 16 Aug 2023 11:24:35 +0000 (12:24 +0100)]
libstdc++: Fix comment naming upstream PSTL test file
These tests were derived from set.pass.cpp not set.pass.cc, specifically
pstl/test/std/algorithms/alg.sorting/alg.set.operations/set.pass.cpp in
the LLVM repo.
libstdc++-v3/ChangeLog:
* testsuite/25_algorithms/pstl/alg_sorting/set_difference.cc:
Fix name of upstream file this was derived from.
* testsuite/25_algorithms/pstl/alg_sorting/set_intersection.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/set_symmetric_difference.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/set_union.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/set_util.h: Likewise.
[LRA]: Spill pseudos assigned to fp when fp->sp elimination became impossible
Porting LRA to AVR revealed that creating a stack slot can make fp->sp
elimination impossible. The previous patches undoes fp assignment after
the stack slot creation but calculated wrongly live info after this. This
resulted in wrong generation by deleting some still alive insns. This
patch fixes this problem.
gcc/ChangeLog:
* lra-int.h (lra_update_fp2sp_elimination): Change the prototype.
* lra-eliminations.cc (spill_pseudos): Record spilled pseudos.
(lra_update_fp2sp_elimination): Ditto.
(update_reg_eliminate): Adjust spill_pseudos call.
* lra-spills.cc (lra_spill): Assign stack slots to pseudos spilled
in lra_update_fp2sp_elimination.
Arsen Arsenović [Mon, 27 Mar 2023 13:29:08 +0000 (15:29 +0200)]
libstdc++: Implement more maintainable <version> header
This commit replaces the ad-hoc logic in <version> with an AutoGen
database that (mostly) declaratively generates a version.h bit which
combines all of the FTM logic across all headers together.
This generated header defines macros of the form __glibcxx_foo,
equivalent to their __cpp_lib_foo variants, according to rules specified
in version.def and, optionally, if __glibcxx_want_foo or
__glibcxx_want_all are defined, also defines __cpp_lib_foo forms with
the same definition.
libstdc++-v3/ChangeLog:
* include/Makefile.am (bits_freestanding): Add version.h.
(allcreated): Add version.h.
(${bits_srcdir}/version.h): New rule. Regenerates
version.h out of version.{def,tpl}.
* include/Makefile.in: Regenerate.
* include/bits/version.def: New file. Declares a list of
all feature test macros, their values and their preconditions.
* include/bits/version.tpl: New file. Turns version.def
into a sequence of #if blocks.
* include/bits/version.h: New file. Generated from
version.def.
* include/std/version: Replace with a __glibcxx_want_all define
and bits/version.h include.
If there is no direct support, the vectorizer can synthesize the pattern
but, presumably, due to lack of narrowing operation support, won't try a
narrowing shift. Therefore, this patch implements the expanders
instead.
* gcc.target/riscv/rvv/autovec/widen/vec-avg-run.c: New test.
* gcc.target/riscv/rvv/autovec/widen/vec-avg-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/widen/vec-avg-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/widen/vec-avg-template.h: New test.
Robin Dapp [Tue, 15 Aug 2023 09:43:43 +0000 (11:43 +0200)]
internal-fn: Fix vector extraction into promoted subreg.
This patch fixes the case where vec_extract gets passed a promoted
subreg (e.g. from a return value). This is achieved by using
expand_convert_optab_fn instead of a separate expander function.
gcc/ChangeLog:
* internal-fn.cc (vec_extract_direct): Change type argument
numbers.
(expand_vec_extract_optab_fn): Call convert_optab_fn.
(direct_vec_extract_optab_supported_p): Use
convert_optab_supported_p.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1u.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2u.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3u.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4u.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c: New test.
For this case the index 2 in sel is ambiguous for len 2 + 2x:
if x = 0, runtime vector length = 2 and sel[i] will choose arg1[0]
if x > 0, runtime vector length > 2 and sel[i] choose arg0[2].
So we return NULL_TREE for this case.
This leads us to defining a constraint that a stepped sequence in sel,
should only select a particular pattern from a particular input vector.
sel contains a single pattern with stepped sequence: {0, 2, ...}.
Let, a1 = the first element of stepped part of sequence, which is 0.
Let esel = number of total elements in stepped sequence.
Thus,
esel = len / sel_npatterns
= (4 + 4x) / 1
= 4 + 4x
Let S = step of the sequence, which is 2 in this case.
Let ae = last element of the stepped sequence.
Thus,
ae = a1 + (esel - 2) * S
= 0 + (4 + 4x - 2) * 2
= 4 + 8x
To ensure that we select elements from the same input vector,
a1 /trunc len = ae /trunc len.
Let, q1 = a1 /trunc len = 0 / (4 + 4x) = 0
Let, qe = ae /trunc len = (4 + 8x) / (4 + 4x) = 1
Since q1 != qe, we cross input vectors, and return NULL_TREE for this case.
However, if sel was:
sel = {len, 0, 1, ...}
The only change in this case is S = 1.
So,
ae = a1 + (esel - 2) * S
= 0 + (4 + 4x - 2) * 1
= 2 + 4x
In this case, a1/len == ae/len == 0, and the stepped sequence chooses all elements
from arg0.
Thus,
res = {arg1[0], arg0[0], arg0[1], ...}
For VLA folding, sel has to conform to constraints imposed in
valid_mask_for_fold_vec_perm_cst_p.
test_fold_vec_perm_cst defines several unit-tests for VLA folding.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc
(BASE): New declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfwcvt_xu_frm): New intrinsic function def.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/float-point-wcvt-xu.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc
(BASE): New declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfwcvt_x_frm): New intrinsic function def.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/float-point-wcvt-x.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc (BASE): New declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfcvt_f_frm): New intrinsic function def.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/float-point-cvt-f.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc
(BASE): New declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfcvt_xu_frm): New intrinsic function def..
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/float-point-cvt-xu.c: New test.
Haochen Gui [Wed, 16 Aug 2023 06:29:36 +0000 (14:29 +0800)]
rs6000: Skip unnecessary vector extract for certain elements.
If the extracted element index is:
- for byte, 7 on BE while 8 on LE;
- for half word, 3 on BE while 4 on LE;
the element to be stored is already in the corresponding place for
stxsi[hb]x. We don't need a redundant vector extraction at all.
gcc/
PR target/110429
* config/rs6000/vsx.md (*vsx_extract_<mode>_store_p9): Skip vector
extract when the element is 7 on BE while 8 on LE for byte or 3 on
BE while 4 on LE for halfword.
Haochen Gui [Wed, 16 Aug 2023 06:21:09 +0000 (14:21 +0800)]
rs6000: Generate mfvsrwz for all platforms and remove redundant zero extend
mfvsrwz has lower latency than xxextractuw or vextuw[lr]x. So it should be
generated even with p9 vector enabled. Also the instruction is already
zero extended. A combine pattern is needed to eliminate redundant zero
extend instructions.
gcc/
PR target/106769
* config/rs6000/vsx.md (expand vsx_extract_<mode>): Set it only
for V8HI and V16QI.
(vsx_extract_v4si): New expand for V4SI extraction.
(vsx_extract_v4si_w1): New insn pattern for V4SI extraction on
word 1 from BE order.
(*mfvsrwz): New insn pattern for mfvsrwz.
(*vsx_extract_<mode>_di_p9): Assert that it won't be generated on
word 1 from BE order.
(*vsx_extract_si): Remove.
(*vsx_extract_v4si_w023): New insn and split pattern on word 0, 2,
3 from BE order.
* config/riscv/autovec.md (vec_mask_len_load_lanes<mode><vsingle>):
New pattern.
(vec_mask_len_store_lanes<mode><vsingle>): Ditto.
* config/riscv/riscv-protos.h (expand_lanes_load_store): New function.
* config/riscv/riscv-v.cc (get_mask_mode): Add tuple mask mode.
(expand_lanes_load_store): New function.
* config/riscv/vector-iterators.md: New iterator.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c:
Adapt test.
* gcc.target/riscv/rvv/autovec/partial/slp-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-6.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add lanes tests.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-1.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-2.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-3.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-4.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-5.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-6.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-7.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-1.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-2.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-3.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-4.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-5.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-6.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-7.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-1.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-10.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-11.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-12.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-13.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-14.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-15.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-16.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-17.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-18.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-2.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-3.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-4.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-5.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-6.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-7.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-8.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-9.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-13.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-14.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-15.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-16.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-17.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-18.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-9.c: New test.
Step 1 - Modifiy the LANES LOAD/STORE support function (vect_load_lanes_supported/vect_store_lanes_supported):
+/* Return FN if vec_{masked_,mask_len,}load_lanes is available for COUNT
+ vectors of type VECTYPE. MASKED_P says whether the masked form is needed. */
Instead of returning TRUE or FALSE whether target support the LANES LOAD/STORE.
I change it into return internal_fn of the LANES LOAD/STORE that target support,
If target didn't support any LANE LOAD/STORE optabs, return IFN_LAST.
if (!STMT_VINFO_STRIDED_P (first_stmt_info)
&& (can_overrun_p || !would_overrun_p)
&& compare_step_with_zero (vinfo, stmt_info) > 0)
{
/* First cope with the degenerate case of a single-element
vector. */
if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U))
;
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc
(enum frm_op_type): New type for frm.
(BASE): New declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfcvt_x_frm): New intrinsic function def.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/float-point-cvt-x.c: New test.
liuhongt [Thu, 10 Aug 2023 08:26:13 +0000 (16:26 +0800)]
Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions
Rename original use_gather to use_gather_8parts, Support
-mtune-ctrl={,^}use_gather to set/clear tune features
use_gather_{2parts, 4parts, 8parts}. Support the new option -mgather
as alias of -mtune-ctrl=, use_gather, ^use_gather.
Similar for use_scatter.
gcc/ChangeLog:
* config/i386/i386-builtins.cc
(ix86_vectorize_builtin_gather): Adjust for use_gather_8parts.
* config/i386/i386-options.cc (parse_mtune_ctrl_str):
Set/Clear tune features use_{gather,scatter}_{2parts, 4parts,
8parts} for -mtune-crtl={,^}{use_gather,use_scatter}.
* config/i386/i386.cc (ix86_vectorize_builtin_scatter): Adjust
for use_scatter_8parts
* config/i386/i386.h (TARGET_USE_GATHER): Rename to ..
(TARGET_USE_GATHER_8PARTS): .. this.
(TARGET_USE_SCATTER): Rename to ..
(TARGET_USE_SCATTER_8PARTS): .. this.
* config/i386/x86-tune.def (X86_TUNE_USE_GATHER): Rename to
(X86_TUNE_USE_GATHER_8PARTS): .. this.
(X86_TUNE_USE_SCATTER): Rename to
(X86_TUNE_USE_SCATTER_8PARTS): .. this.
* config/i386/i386.opt: Add new options mgather, mscatter.
liuhongt [Thu, 10 Aug 2023 03:41:39 +0000 (11:41 +0800)]
Software mitigation: Disable gather generation in vectorization for GDS affected Intel Processors.
For more details of GDS (Gather Data Sampling), refer to
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/gather-data-sampling.html
After microcode update, there's performance regression. To avoid that,
the patch disables gather generation in autovectorization but uses
gather scalar emulation instead.
gcc/ChangeLog:
* config/i386/i386-options.cc (m_GDS): New macro.
* config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Don't
enable for m_GDS.
(X86_TUNE_USE_GATHER_4PARTS): Ditto.
(X86_TUNE_USE_GATHER): Ditto.
Generate vmovapd instead of vmovsd for moving DFmode between SSE_REGS.
vmovapd can enable register renaming and have same code size as
vmovsd. Similar for vmovsh vs vmovaps, vmovaps is 1 byte less than
vmovsh.
When TARGET_AVX512VL is not available, still generate
vmovsd/vmovss/vmovsh to avoid vmovapd/vmovaps zmm16-31.
gcc/ChangeLog:
* config/i386/i386.md (movdf_internal): Generate vmovapd instead of
vmovsd when moving DFmode between SSE_REGS.
(movhi_internal): Generate vmovdqa instead of vmovsh when
moving HImode between SSE_REGS.
(mov<mode>_internal): Use vmovaps instead of vmovsh when
moving HF/BFmode between SSE_REGS.