Jakub Jelinek [Fri, 8 Dec 2023 08:03:18 +0000 (09:03 +0100)]
lower-bitint: Avoid merging non-mergeable stmt with cast and mergeable stmt [PR112902]
Before bitint lowering, the IL has:
b.0_1 = b;
_2 = -b.0_1;
_3 = (unsigned _BitInt(512)) _2;
a.1_4 = a;
a.2_5 = (unsigned _BitInt(512)) a.1_4;
_6 = _3 * a.2_5;
on the first function. Now, gimple_lower_bitint has an optimization
(when not -O0) that it avoids assigning underlying VAR_DECLs for certain
SSA_NAMEs where it is possible to lower it in a single loop (or straight
line code) rather than in multiple loops.
So, e.g. the multiplication above uses handle_operand_addr, which can deal
with INTEGER_CST arguments, loads but also casts, so it is fine
not to assign an underlying VAR_DECL for SSA_NAMEs a.1_4 and a.2_5, as
the multiplication can handle it fine.
The more problematic case is the other multiplication operand.
It is again a result of a (in this case narrowing) cast, so it is fine
not to assign VAR_DECL for _3. Normally we can merge the load (b.0_1)
with the negation (_2) and even with the following cast (_3). If _3
was used in a mergeable operation like addition, subtraction, negation,
&|^ or equality comparison, all of b.0_1, _2 and _3 could be without
underlying VAR_DECLs.
The problem is that the current code does that even when the cast is used
by a non-mergeable operation, and handle_operand_addr certainly can't handle
the mergeable operations feeding the rhs1 of the cast, for multiplication
we don't emit any loop in which it could appear, for other operations like
shifts or non-equality comparisons we emit loops, but either in the reverse
direction or with unpredictable indexes (for shifts).
So, in order to lower the above correctly, we need to have an underlying
VAR_DECL for either _2 or _3; if we choose _2, then the load and negation
would be done in one loop and extension handled as part of the
multiplication, if we choose _3, then the load, negation and cast are done
in one loop and the multiplication just uses the underlying VAR_DECL
computed by that.
It is far easier to do this for _3, which is what the following patch
implements.
It actually already had code for most of it, just it did that for widening
casts only (optimize unless the cast rhs1 is not SSA_NAME, or is SSA_NAME
defined in some other bb, or with more than one use, etc.).
This falls through into such code even for the narrowing or same precision
casts, unless the cast is used in a mergeable operation.
2023-12-08 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112902
* gimple-lower-bitint.cc (gimple_lower_bitint): For a narrowing
or same precision cast don't set SSA_NAME_VERSION in m_names only
if use_stmt is mergeable_op or fall through into the check that
use is a store or rhs1 is not mergeable or other reasons prevent
merging.
Jakub Jelinek [Fri, 8 Dec 2023 08:02:15 +0000 (09:02 +0100)]
vr-values: Avoid ICEs on large _BitInt cast to floating point [PR112901]
For casts from integers to floating point,
simplify_float_conversion_using_ranges uses SCALAR_INT_TYPE_MODE
and queries optabs on the optimization it wants to make.
That doesn't really work for large/huge BITINT_TYPE, those have BLKmode
which is not scalar int mode. Querying an optab is not useful for that
either.
I think it is best to just skip this optimization for those bitints,
after all, bitint lowering uses ranges already to determine minimum
precision for bitint operands of the integer to float casts.
2023-12-08 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112901
* vr-values.cc
(simplify_using_ranges::simplify_float_conversion_using_ranges):
Return false if rhs1 has BITINT_TYPE type with BLKmode TYPE_MODE.
Jakub Jelinek [Fri, 8 Dec 2023 07:56:33 +0000 (08:56 +0100)]
haifa-sched: Avoid overflows in extend_h_i_d [PR112411]
On Thu, Dec 07, 2023 at 09:36:23AM +0100, Jakub Jelinek wrote:
> Without the dg-skip-if I got on 64-bit host with
> -O3 --param min-nondebug-insn-uid=0x40000000:
> cc1: out of memory allocating 571230784744 bytes after a total of 2772992 bytes
I've looked at this and the problem is in haifa-sched.cc:
9047 h_i_d.safe_grow_cleared (3 * get_max_uid () / 2, true);
get_max_uid () is 0x4000024d with the --param min-nondebug-insn-uid=0x40000000
and so 3 * get_max_uid () / 2 actually overflows to -536870028 but as vec.h
then treats the value as unsigned, it attempts to allocate
0xe0000374U * 152UL bytes, i.e. those 532GB. If the above is fixed to do
3U * get_max_uid () / 2 instead, it will get slightly better and will only
need 0x60000373U * 152UL bytes, i.e. 228GB.
Perhaps more could be helped by making the vector indirect (contain pointers
to haifa_insn_data_def rather than the structures themselves) and pool allocate
those, but the more important question is how sparse are uids in normal
compilations without those large --param min-nondebug-insn-uid= parameters.
Because if they aren't enough, such a change would increase compile time memory
just to help the unusual case.
2023-12-08 Jakub Jelinek <jakub@redhat.com>
PR middle-end/112411
* haifa-sched.cc (extend_h_i_d): Use 3U instead of 3 in
3 * get_max_uid () / 2 calculation.
Lulu Cheng [Fri, 1 Dec 2023 03:51:51 +0000 (11:51 +0800)]
LoongArch: Remove the definition of ISA_BASE_LA64V110 from the code.
The instructions defined in LoongArch Reference Manual v1.1 are not the instruction
set v1.1 version. The CPU defined later may only support some instructions in
LoongArch Reference Manual v1.1. Therefore, the macro ISA_BASE_LA64V110 and
related definitions are removed here.
gcc/ChangeLog:
* config/loongarch/genopts/loongarch-strings: Delete STR_ISA_BASE_LA64V110.
* config/loongarch/genopts/loongarch.opt.in: Likewise.
* config/loongarch/loongarch-cpu.cc (ISA_BASE_LA64V110_FEATURES): Delete macro.
(fill_native_cpu_config): Define a new variable hw_isa_evolution record the
extended instruction set support read from cpucfg.
* config/loongarch/loongarch-def.cc: Set evolution at initialization.
* config/loongarch/loongarch-def.h (ISA_BASE_LA64V100): Delete.
(ISA_BASE_LA64V110): Likewise.
(N_ISA_BASE_TYPES): Likewise.
(defined): Likewise.
* config/loongarch/loongarch-opts.cc: Likewise.
* config/loongarch/loongarch-opts.h (TARGET_64BIT): Likewise.
(ISA_BASE_IS_LA64V110): Likewise.
* config/loongarch/loongarch-str.h (STR_ISA_BASE_LA64V110): Likewise.
* config/loongarch/loongarch.opt: Regenerate.
Xi Ruoyao [Fri, 1 Dec 2023 02:09:33 +0000 (10:09 +0800)]
LoongArch: Switch loongarch-def from C to C++ to make it possible.
We'll use HOST_WIDE_INT in LoongArch static properties in following patches.
To keep the same readability as C99 designated initializers, create a
std::array like data structure with position setter function, and add
field setter functions for structs used in loongarch-def.cc.
Remove unneeded guards #if
!defined(IN_LIBGCC2) && !defined(IN_TARGET_LIBS) && !defined(IN_RTS)
in loongarch-def.h and loongarch-opts.h.
gcc/ChangeLog:
* config/loongarch/loongarch-def.h: Remove extern "C".
(loongarch_isa_base_strings): Declare as loongarch_def_array
instead of plain array.
(loongarch_isa_ext_strings): Likewise.
(loongarch_abi_base_strings): Likewise.
(loongarch_abi_ext_strings): Likewise.
(loongarch_cmodel_strings): Likewise.
(loongarch_cpu_strings): Likewise.
(loongarch_cpu_default_isa): Likewise.
(loongarch_cpu_issue_rate): Likewise.
(loongarch_cpu_multipass_dfa_lookahead): Likewise.
(loongarch_cpu_cache): Likewise.
(loongarch_cpu_align): Likewise.
(loongarch_cpu_rtx_cost_data): Likewise.
(loongarch_isa): Add a constructor and field setter functions.
* config/loongarch/loongarch-opts.h (loongarch-defs.h): Do not
include for target libraries.
* config/loongarch/loongarch-opts.cc: Comment code that doesn't
run and causes compilation errors.
* config/loongarch/loongarch-tune.h (LOONGARCH_TUNE_H): Likewise.
(struct loongarch_rtx_cost_data): Likewise.
(struct loongarch_cache): Likewise.
(struct loongarch_align): Likewise.
* config/loongarch/t-loongarch: Compile loongarch-def.cc with the
C++ compiler.
* config/loongarch/loongarch-def-array.h: New file for a
std:array like data structure with position setter function.
* config/loongarch/loongarch-def.c: Rename to ...
* config/loongarch/loongarch-def.cc: ... here.
(loongarch_cpu_strings): Define as loongarch_def_array instead
of plain array.
(loongarch_cpu_default_isa): Likewise.
(loongarch_cpu_cache): Likewise.
(loongarch_cpu_align): Likewise.
(loongarch_cpu_rtx_cost_data): Likewise.
(loongarch_cpu_issue_rate): Likewise.
(loongarch_cpu_multipass_dfa_lookahead): Likewise.
(loongarch_isa_base_strings): Likewise.
(loongarch_isa_ext_strings): Likewise.
(loongarch_abi_base_strings): Likewise.
(loongarch_abi_ext_strings): Likewise.
(loongarch_cmodel_strings): Likewise.
(abi_minimal_isa): Likewise.
(loongarch_rtx_cost_optimize_size): Use field setter functions
instead of designated initializers.
(loongarch_rtx_cost_data): Implement default constructor.
Jakub Jelinek [Fri, 8 Dec 2023 07:29:44 +0000 (08:29 +0100)]
Add IntegerRange for -param=min-nondebug-insn-uid= and fix vector growing in LRA and vec [PR112411]
As documented, --param min-nondebug-insn-uid= is very useful in debugging
-fcompare-debug issues in RTL dumps, without it it is really hard to
find differences. With it, DEBUG_INSNs generally use low INSN_UIDs
(1+) and non-DEBUG_INSNs use INSN_UIDs from the parameter up.
For good results, the parameter should be larger than the number of
DEBUG_INSNs in all or at least problematic functions, so I typically
use --param min-nondebug-insn-uid=10000 or --param
min-nondebug-insn-uid=1000.
The PR is about using --param min-nondebug-insn-uid=2147483647 or
similar behavior can be achieved with that minus some epsilon,
INSN_UIDs for the non-debug insns then wrap around and as they are signed,
all kinds of things break. Obviously, that can happen even without that
option, but functions containing more than 2147483647 insns usually don't
compile much earlier due to getting out of memory.
As it is a debugging option, I'd prefer not to impose any drastically small
limits on it because if a function has a lot of DEBUG_INSNs, it is useful
to start still above them, otherwise the allocation of uids will DTRT
even for DEBUG_INSNs but there will be then differences in non-DEBUG_INSN
allocations.
So, the following patch uses 0x40000000 limit, half the maximum amount for
DEBUG_INSNs and half for non-DEBUG_INSNs, it will still result in very
unlikely overflows in real world.
Note, using large min-nondebug-insn-uid is very expensive for compile time
memory and compile time, because DF as well as various RTL passes use
arrays indexed by INSN_UIDs, e.g. LRA with sizeof (void *) elements,
ditto df (df->insns).
Now, in LRA I've ran into ICEs already with
--param min-nondebug-insn-uid=0x2aaaaaaa
on 64-bit host. It uses a custom vector management and wants to grow
allocation 1.5x when growing, but all this computation is done in int,
so already 0x2aaaaaab * 3 / 2 + 1 overflows to negative value. And
unlike vec.cc growing which also uses unsigned int type for the above
(and the + 1 is not there), it also doesn't make sure if there is an
overflow that it allocates at least as much as needed, vec.cc
does
if ...
else
/* Grow slower when large. */
alloc = (alloc * 3 / 2);
/* If this is still too small, set it to the right size. */
if (alloc < desired)
alloc = desired;
so even if there is overflow during the * 1.5 computation, but
desired is still representable in the range of the alloced counter
(31-bits in both vec.h and LRA), it doesn't grow exponentially but
at least works for the current value.
The patch now uses there
lra_insn_recog_data_len = index * 3U / 2;
if (lra_insn_recog_data_len <= index)
lra_insn_recog_data_len = index + 1;
basically do what vec.cc does. I thought we could do better for
both vec.cc and LRA on 64-bit hosts even without growing the allocated
counters, but now that I look at it again, perhaps we can't.
The above overflows already with original alloc or lra_insn_recog_data_len
0x55555556, where 0x5555555 * 3U / 2 is still 0x7fffffff
and so representable in the 32-bit, but 0x55555556 * 3U / 2 is
1. I thought that we could use alloc * (size_t) 3 / 2 so that on 64-bit
hosts it wouldn't overflow that quickly, but 0x55555556 * (size_t) 3 / 2
there is 0x80000001 which is still ok in unsigned, but given that vec.h
then stores the counter into unsigned m_alloc:31; bit-field, it is too much.
With the lra.cc change, one can actually compile simple function
with -O0 on 64-bit host with --param min-nondebug-insn-uid=0x40000000
(i.e. the new limit), but already needed quite a big part of my 32GB
RAM + 24GB swap.
The patch adds a dg-skip-if for that case though, because such option
is way too much for 32-bit hosts even at -O0 and empty function,
and with -O3 on a longer function it is too much for average 64-bit host
as well. Without the dg-skip-if I got on 64-bit host:
cc1: out of memory allocating 571230784744 bytes after a total of 2772992 bytes
and
cc1: out of memory allocating 1388 bytes after a total of 2002944 bytes
on 32-bit host. A test requiring more than 532GB of RAM on 64-bit hosts
is just too much for our testsuite.
2023-12-08 Jakub Jelinek <jakub@redhat.com>
PR middle-end/112411
* params.opt (-param=min-nondebug-insn-uid=): Add
IntegerRange(0, 1073741824).
* lra.cc (check_and_expand_insn_recog_data): Use 3U rather than 3
in * 3 / 2 computation and if the result is smaller or equal to
index, use index + 1.
* gcc.dg/params/blocksort-part.c: Add dg-skip-if for
--param min-nondebug-insn-uid=1073741824.
Haochen Jiang [Fri, 10 Nov 2023 02:03:37 +0000 (10:03 +0800)]
i386: Mark Xeon Phi ISAs as deprecated
Since Knight Landing and Knight Mill microarchitectures are EOL, we
would like to remove its support in GCC 15. In GCC 14, we will first
emit a warning for the usage.
gcc/ChangeLog:
* config/i386/driver-i386.cc (host_detect_local_cpu):
Do not append "-mno-" for Xeon Phi ISAs.
* config/i386/i386-options.cc (ix86_option_override_internal):
Emit a warning for KNL/KNM targets.
* config/i386/i386.opt: Emit a warning for Xeon Phi ISAs.
Hao Liu [Wed, 6 Dec 2023 06:52:19 +0000 (14:52 +0800)]
tree-optimization/112774: extend the SCEV CHREC tree with a nonwrapping flag
The flag is defined as CHREC_NOWRAP(tree), and will be dumped from
"{offset, +, 1}_1" to "{offset, +, 1}<nw>_1" (nw is short for nonwrapping).
Two SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are
added to set and check the flag respectively.
As resetting the SCEV cache (i.e., the chrec trees) may not reset the
loop->estimate_state, free_numbers_of_iterations_estimates is called
explicitly in loop vectorization to make sure the flag can be
calculated propriately by niter.
gcc/ChangeLog:
PR tree-optimization/112774
* tree-pretty-print.cc: if nonwrapping flag is set, chrec will be
printed with additional <nw> info.
* tree-scalar-evolution.cc: add record_nonwrapping_chrec and
nonwrapping_chrec_p to set and check the new flag respectively.
* tree-scalar-evolution.h: Likewise.
* tree-ssa-loop-niter.cc (idx_infer_loop_bounds,
infer_loop_bounds_from_pointer_arith, infer_loop_bounds_from_signedness,
scev_probably_wraps_p): call record_nonwrapping_chrec before
record_nonwrapping_iv, call nonwrapping_chrec_p to check the flag is
set and return false from scev_probably_wraps_p.
* tree-vect-loop.cc (vect_analyze_loop): call
free_numbers_of_iterations_estimates explicitly.
* tree-core.h: document the nothrow_flag usage in CHREC_NOWRAP
* tree.h: add CHREC_NOWRAP(NODE), base.nothrow_flag is used to
represent the nonwrapping info.
David Malcolm [Fri, 8 Dec 2023 00:42:45 +0000 (19:42 -0500)]
analyzer: fix ICE for 2 bits before the start of base region [PR112889]
Cncrete bindings were using -1 and -2 in the offset field to signify
deleted and empty hash slots, but these are valid values, leading to
assertion failures inside hash_map::put on a debug build, and probable
bugs in a release build.
Fix by using the size field rather than the offset.
gcc/analyzer/ChangeLog:
PR analyzer/112889
* store.h (concrete_binding::concrete_binding): Strengthen
assertion to require size to be be positive, rather than just
non-zero.
(concrete_binding::mark_deleted): Use size rather than start bit
offset.
(concrete_binding::mark_empty): Likewise.
(concrete_binding::is_deleted): Likewise.
(concrete_binding::is_empty): Likewise.
gcc/testsuite/ChangeLog:
PR analyzer/112889
* c-c++-common/analyzer/ice-pr112889.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Juzhe-Zhong [Thu, 7 Dec 2023 22:09:10 +0000 (06:09 +0800)]
RISC-V: Support interleave vector with different step sequence
This patch fixes 64 ICEs in full coverage testing since they happens due to same reason.
Before this patch:
internal compiler error: in expand_const_vector, at config/riscv/riscv-v.cc:1270
appears 400 times in full coverage testing report.
The root cause is we didn't support interleave vector with different steps.
Here is the story:
We already supported interleave with single same step, that is:
e.g. v = { 0, 100, 2, 102, 4, 104, ... }
This sequence can be interpreted as interleave vector by 2 seperate sequences:
sequence1 = { 0, 2, 4, ... } and sequence2 = { 100, 102, 104, ... }.
Their step are both 2.
However, we didn't support interleave vector when they have different steps which
cause ICE in such situations.
This patch support different steps interleaved vector for the following 2 situations:
1. When vector can be extended EEW:
Case 1: { 0, 0, 1, 0, 2, 0, ... }
It's interleaved by sequence1 = { 0, 1, 2, ... } and sequence1 = { 0, 0, 0, ... }
Suppose the original vector can be extended EEW, e.g. mode = RVVM1SImode.
Then such interleaved vector can be achieved with { 1, 2, 3, ... } with RVVM1DImode.
So, for this situation the codegen is pretty efficient and clean:
* config/riscv/riscv-protos.h (expand_vec_series): Adapt function.
* config/riscv/riscv-v.cc (rvv_builder::double_steps_npatterns_p): New function.
(expand_vec_series): Adapt function.
(expand_const_vector): Support new interleave vector with different step.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/slp-interleave-1.c: New test.
* gcc.target/riscv/rvv/autovec/slp-interleave-2.c: New test.
* gcc.target/riscv/rvv/autovec/slp-interleave-3.c: New test.
* gcc.target/riscv/rvv/autovec/slp-interleave-4.c: New test.
Jonathan Wakely [Thu, 7 Dec 2023 12:40:18 +0000 (12:40 +0000)]
libstdc++: Fix misleading typedef name in <format>
This local typedef for uintptr_t was accidentally named uint64_t,
probably from a careless code completion shortcut. We don't need the
typedef at all since it's only used once. Just use __UINTPTR_TYPE__
directly instead.
libstdc++-v3/ChangeLog:
* include/std/format (_Iter_sink<charT, contiguous_iterator>):
Remove uint64_t local type.
Jonathan Wakely [Thu, 7 Dec 2023 11:00:02 +0000 (11:00 +0000)]
libstdc++: Use <cstdint> instead of <stdint.h> in <bits/atomic_wait.h>
In r14-5922-g6c8f2d3a08bc01 I added <stdint.h> to <bits/atomic_wait.h>,
so that uintptr_t is declared if that header is compiled as a header
unit. I used <stdint.h> because that's what <atomic> already includes,
so it seemed simpler to be consistent. However, this means that name
lookup for uintptr_t in <bits/atomic_wait.h> depends on whether
<cstdint> has been included by another header first. Whether name lookup
finds std::uintptr_t or ::uintptr_t will depend on include order. This
causes problems when compiling modules with Clang:
bits/atomic_wait.h:251:7: error: 'std::__detail::__waiter_pool_base' has different definitions in different modules; first difference is defined here found method '_S_for' with body
_S_for(const void* __addr) noexcept
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
bits/atomic_wait.h:251:7: note: but in 'tm.<global>' found method '_S_for' with different body
_S_for(const void* __addr) noexcept
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
By including <cstdint> we would ensure that name lookup always finds the
name in namespace std. Alternatively, we can stop including <stdint.h>
for those types, so that we don't declare the entire contents of
<stdint.h> when we only need a couple of types from it. This patch does
the former, which is appropriate for backporting.
libstdc++-v3/ChangeLog:
* include/bits/atomic_wait.h: Include <cstdint> instead of
<stdint.h>.
Jonathan Wakely [Wed, 6 Dec 2023 17:21:29 +0000 (17:21 +0000)]
libstdc++: Fix recent changes to __glibcxx_assert [PR112882]
The changes in r14-6198-g5e8a30d8b8f4d7 were broken, as I used
_GLIBCXX17_CONSTEXPR for the 'if _GLIBCXX17_CONSTEXPR (true)' condition,
forgetting that it would also be used for the is_constant_evaluated()
check. Using 'if constexpr (std::is_constant_evaluated())' is a bug.
Additionally, relying on __glibcxx_assert_fail to give a "not a constant
expression" error is a problem because at -O0 an undefined reference to
__glibcxx_assert_fail is present in the compiled code. This means you
can't use libstdc++ headers without also linking to libstdc++ for the
symbol definition.
This fix rewrites the __glibcxx_assert macro again. This still avoids
doing the duplicate checks, once for constexpr and once at runtime (if
_GLIBCXX_ASSERTIONS is defined). When _GLIBCXX_ASSERTIONS is defined we
still rely on __glibcxx_assert_fail to give a "not a constant
expression" error during constant evaluation (because when assertions
are defined it's not a problem to emit a reference to the symbol). But
when that macro is not defined, we use a new inline (but not constexpr)
overload of __glibcxx_assert_fail to cause compilation to fail. That
inline function doesn't cause an undefined reference to a symbol in the
library (and will be optimized away anyway).
We can also add always_inline to the __is_constant_evaluated function,
although this doesn't actually matter for -O0 and it's always inlined
with any optimization enabled.
libstdc++-v3/ChangeLog:
PR libstdc++/112882
* include/bits/c++config (__is_constant_evaluated): Add
always_inline attribute.
(_GLIBCXX_DO_ASSERT): Remove macro.
(__glibcxx_assert): Define separately for assertions-enabled and
constexpr-only cases.
This pass adds a simple register allocator for FP & SIMD registers.
Its main purpose is to make use of SME2's strided LD1, ST1 and LUTI2/4
instructions, which require a very specific grouping structure,
and so would be difficult to exploit with general allocation.
The allocator is very simple. It gives up on anything that would
require spilling, or that it might not handle well for other reasons.
The allocator needs to track liveness at the level of individual FPRs.
Doing that fixes a lot of the PRs relating to redundant moves caused by
structure loads and stores. That particular problem is going to be
fixed more generally for GCC 15 by Lehua's RA patches.
However, the early-RA pass runs before scheduling, so it has a chance
to bag a spill-free allocation of vector code before the scheduler moves
things around. It could therefore still be useful for non-SME code
(e.g. for hand-scheduled ACLE code) even after Lehua's patches are in.
The pass is controlled by a tristate switch:
- -mearly-ra=all: run on all functions
- -mearly-ra=strided: run on functions that have access to strided registers
- -mearly-ra=none: don't run on any function
The patch makes -mearly-ra=all the default at -O2 and above for now.
We can revisit this for GCC 15 once Lehua's patches are in;
-mearly-ra=strided might then be more appropriate.
As said previously, the pass is very naive. There's much more that we
could do, such as handling invariants better. The main focus is on not
committing to a bad allocation, rather than on handling as much as
possible.
gcc/
PR rtl-optimization/106694
PR rtl-optimization/109078
PR rtl-optimization/109391
* config.gcc: Add aarch64-early-ra.o for AArch64 targets.
* config/aarch64/t-aarch64 (aarch64-early-ra.o): New rule.
* config/aarch64/aarch64-opts.h (aarch64_early_ra_scope): New enum.
* config/aarch64/aarch64.opt (mearly_ra): New option.
* doc/invoke.texi: Document it.
* common/config/aarch64/aarch64-common.cc
(aarch_option_optimization_table): Use -mearly-ra=strided by
default for -O2 and above.
* config/aarch64/aarch64-passes.def (pass_aarch64_early_ra): New pass.
* config/aarch64/aarch64-protos.h (aarch64_strided_registers_p)
(make_pass_aarch64_early_ra): Declare.
* config/aarch64/aarch64-sme.md (@aarch64_sme_lut<LUTI_BITS><mode>):
Add a stride_type attribute.
(@aarch64_sme_lut<LUTI_BITS><mode>_strided2): New pattern.
(@aarch64_sme_lut<LUTI_BITS><mode>_strided4): Likewise.
* config/aarch64/aarch64-sve-builtins-base.cc (svld1_impl::expand)
(svldnt1_impl::expand, svst1_impl::expand, svstn1_impl::expand): Handle
new way of defining multi-register loads and stores.
* config/aarch64/aarch64-sve.md (@aarch64_ld1<SVE_FULLx24:mode>)
(@aarch64_ldnt1<SVE_FULLx24:mode>, @aarch64_st1<SVE_FULLx24:mode>)
(@aarch64_stnt1<SVE_FULLx24:mode>): Delete.
* config/aarch64/aarch64-sve2.md (@aarch64_<LD1_COUNT:optab><mode>)
(@aarch64_<LD1_COUNT:optab><mode>_strided2): New patterns.
(@aarch64_<LD1_COUNT:optab><mode>_strided4): Likewise.
(@aarch64_<ST1_COUNT:optab><mode>): Likewise.
(@aarch64_<ST1_COUNT:optab><mode>_strided2): Likewise.
(@aarch64_<ST1_COUNT:optab><mode>_strided4): Likewise.
* config/aarch64/aarch64.cc (aarch64_strided_registers_p): New
function.
* config/aarch64/aarch64.md (UNSPEC_LD1_SVE_COUNT): Delete.
(UNSPEC_ST1_SVE_COUNT, UNSPEC_LDNT1_SVE_COUNT): Likewise.
(UNSPEC_STNT1_SVE_COUNT): Likewise.
(stride_type): New attribute.
* config/aarch64/constraints.md (Uwd, Uwt): New constraints.
* config/aarch64/iterators.md (UNSPEC_LD1_COUNT, UNSPEC_LDNT1_COUNT)
(UNSPEC_ST1_COUNT, UNSPEC_STNT1_COUNT): New unspecs.
(optab): Handle them.
(LD1_COUNT, ST1_COUNT): New iterators.
* config/aarch64/aarch64-early-ra.cc: New file.
gcc/testsuite/
PR rtl-optimization/106694
PR rtl-optimization/109078
PR rtl-optimization/109391
* gcc.target/aarch64/ldp_stp_16.c (cons4_4_float): Tighten expected
output test.
* gcc.target/aarch64/sve/shift_1.c: Allow reversed shifts for .s
as well as .d.
* gcc.target/aarch64/sme/strided_1.c: New test.
* gcc.target/aarch64/pr109078.c: Likewise.
* gcc.target/aarch64/pr109391.c: Likewise.
* gcc.target/aarch64/sve/pr106694.c: Likewise.
Ezra Sitorus [Thu, 7 Dec 2023 15:41:06 +0000 (15:41 +0000)]
arm: vld1_types_x4 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vld1 intrinsic for the arm port. This patch adds the
_x4 variants of the vld1 intrinsic.
The previous vld1_x4 has been updated to vld1q_x4 to take into
account that it works with 4-word-length types. vld1_x4 is now
only for 2-word-length types.
ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/
gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1_u8_x4, vld1_u16_x4, vld1_u32_x4, vld1_u64_x4): New
(vld1_s8_x4, vld1_s16_x4, vld1_s32_x4, vld1_s64_x4): New.
(vld1_f16_x4, vld1_f32_x4): New.
(vld1_p8_x4, vld1_p16_x4, vld1_p64_x4): New.
(vld1_bf16_x4): New.
(vld1q_types_x4): Updated to use vld1q_x4
from arm_neon_builtins.def
* config/arm/arm_neon_builtins.def
(vld1_x4): Updated entries.
(vld1q_x4): New entries, but comes from the old vld1_x2
* config/arm/neon.md (neon_vld1q_x4<mode>):
Updated from neon_vld1_x4<mode>.
gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1_base_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_bf16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_fp16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_p64_xN_1.c: Add new tests.
Ezra Sitorus [Thu, 7 Dec 2023 15:41:05 +0000 (15:41 +0000)]
arm: vld1_types_x3 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vld1 intrinsic for the arm port. This patch adds the
_x3 variants of the vld1 intrinsic.
The previous vld1_x3 has been updated to vld1q_x3 to take into
account that it works with 4-word-length types. vld1_x3 is now
only for 2-word-length types.
ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/
gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1_u8_x3, vld1_u16_x3, vld1_u32_x3, vld1_u64_x3): New
(vld1_s8_x3, vld1_s16_x3, vld1_s32_x3, vld1_s64_x3): New.
(vld1_f16_x3, vld1_f32_x3): New.
(vld1_p8_x3, vld1_p16_x3, vld1_p64_x3): New.
(vld1_bf16_x3): New.
(vld1q_types_x3): Updated to use vld1q_x3 from
arm_neon_builtins.def
* config/arm/arm_neon_builtins.def
(vld1_x3): Updated entries.
(vld1q_x3): New entries, but comes from the old vld1_x2
* config/arm/neon.md (neon_vld1q_x3<mode>): Updated from
neon_vld1_x3<mode>.
gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1_base_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_bf16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_fp16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_p64_xN_1.c: Add new tests.
Ezra Sitorus [Thu, 7 Dec 2023 15:41:04 +0000 (15:41 +0000)]
arm: vld1_types_x2 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vld1 intrinsic for the arm port. This patch adds the
_x2 variants of the vld1 intrinsic.
The previous vld1_x2 has been updated to vld1q_x2 to take into
account that it works with 4-word-length types. vld1_x2 is now
only for 2-word-length types.
ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/
gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1_u8_x2, vld1_u16_x2, vld1_u32_x2, vld1_u64_x2): New
(vld1_s8_x2, vld1_s16_x2, vld1_s32_x2, vld1_s64_x2): New.
(vld1_f16_x2, vld1_f32_x2): New.
(vld1_p8_x2, vld1_p16_x2, vld1_p64_x2): New.
(vld1_bf16_x2): New.
(vld1q_types_x2): Updated to use vld1q_x2 from
arm_neon_builtins.def
* config/arm/arm_neon_builtins.def
(vld1_x2): Updated entries.
(vld1q_x2): New entries, but comes from the old vld1_x2
* config/arm/neon.md
(neon_vld1<VMEMX2_q>_x2<VDQX:mode>): Updated
from neon_vld1_x2<mode>.
gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1_base_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_bf16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_fp16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_p64_xN_1.c: Add new tests.
Ezra Sitorus [Thu, 7 Dec 2023 15:36:52 +0000 (15:36 +0000)]
arm: vst1q_types_x4 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vst1q intrinsic for the arm port. This patch adds the
_x4 variants of the vst1q intrinsic.
Ezra Sitorus [Thu, 7 Dec 2023 15:36:51 +0000 (15:36 +0000)]
arm: vst1q_types_x3 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vst1q intrinsic for the arm port. This patch adds the
_x3 variants of the vst1q intrinsic.
Ezra Sitorus [Thu, 7 Dec 2023 15:36:50 +0000 (15:36 +0000)]
arm: vst1q_types_x2 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vst1q intrinsic for the arm port. This patch adds the
_x2 variants of the vst1q intrinsic.
Ezra Sitorus [Thu, 7 Dec 2023 15:28:44 +0000 (15:28 +0000)]
arm: vst1_types_x4 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vst1 intrinsic for the arm port. This patch adds the
_x4 variants of the vst1 intrinsic.
Ezra Sitorus [Thu, 7 Dec 2023 15:28:43 +0000 (15:28 +0000)]
arm: vst1_types_x3 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vst1 intrinsic for the arm port. This patch adds the
_x3 variants of the vst1 intrinsic.
Ezra Sitorus [Thu, 7 Dec 2023 15:28:42 +0000 (15:28 +0000)]
arm: vst1_types_x2 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vst1 intrinsic for the arm port. This patch adds the
_x2 variants of the vst1 intrinsic.
Ezra Sitorus [Thu, 7 Dec 2023 15:21:56 +0000 (15:21 +0000)]
arm: vld1q_types_x4 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vld1q intrinsic for the arm port. This patch adds the
_x4 variants of the vld1q intrinsic.
Ezra Sitorus [Thu, 7 Dec 2023 15:21:55 +0000 (15:21 +0000)]
arm: vld1q_types_x3 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vld1q intrinsic for the arm port. This patch adds the
_x3 variants of the vld1q intrinsic.
Ezra Sitorus [Thu, 7 Dec 2023 15:21:54 +0000 (15:21 +0000)]
arm: vld1q_types_x2 ACLE intrinsics
This patch is part of a series of patches implementing the _xN
variants of the vld1q intrinsic for the arm port. This patch adds the
_x2 variants of the vld1q intrinsic.
Alexandre Oliva [Thu, 7 Dec 2023 15:58:20 +0000 (12:58 -0300)]
strub: enable conditional support
Targets that don't expose callee stacks to callers, such as nvptx, as
well as -fsplit-stack compilations, violate fundamental assumptions of
the current strub implementation. This patch enables targets to
disable strub, and disables it when -fsplit-stack is enabled.
When strub support is disabled, the testsuite will now skip strub
tests, and libgcc will not build the strub runtime components.
for gcc/ChangeLog
* target.def (have_strub_support_for): New hook.
* doc/tm.texi.in: Document it.
* doc/tm.texi: Rebuild.
* ipa-strub.cc: Include target.h.
(strub_target_support_p): New.
(can_strub_p): Call it. Test for no flag_split_stack.
(pass_ipa_strub::adjust_at_calls_call): Check for target
support.
* config/nvptx/nvptx.cc (TARGET_HAVE_STRUB_SUPPORT_FOR):
Disable.
* doc/sourcebuild.texi (strub): Document new effective
target.
Marc Poulhiès [Mon, 6 Nov 2023 11:01:17 +0000 (12:01 +0100)]
testsuite: refine gcc.dg/analyzer/fd-4.c test for newlib
Contrary to glibc, including stdio.h from newlib defines mode_t which
conflicts with the test's type definition.
.../gcc/testsuite/gcc.dg/analyzer/fd-4.c:19:3: error: redefinition of typedef 'mode_t' with different type
...
.../include/sys/types.h:189:25: note: previous declaration of 'mode_t' with type 'mode_t' {aka 'unsigned int'}
Defining _MODE_T_DECLARED skips the type definition.
Gaius Mulley [Thu, 7 Dec 2023 13:10:49 +0000 (13:10 +0000)]
PR modula2/112893 detect procedure address incompatible with cardinal in iso
In ISO m2 the type cardinal is assignment incompatible with address (but
it is allowed in PIM). The patch also extends the type checker to include
procedures (which appear as having GetType () = address). At some point
this should be be improved to use a pointer to proc type. Perhaps in
the next stage1.
For now this will catch procedures being passed as actual parameters into
a formal cardinal parameter in ISO m2 (for example).
gcc/m2/ChangeLog:
PR modula2/112893
* gm2-compiler/M2Base.mod (Ass): Extend array to include proc row
and column. Allow PIM to assign cardinal variables to address
variables.
(Expr): Ditto.
(Comp): Ditto.
* gm2-compiler/M2Check.mod (getSType): New procedure function.
Replace all occurances of GetSType with getSType.
* gm2-compiler/M2GenGCC.mod (CodeParam): Rewrite format specifier
error message.
* gm2-compiler/M2Quads.mod (CheckProcTypeAndProcedure): Add tokno
parameter.
* gm2-compiler/M2Range.def (InitTypesParameterCheck): Add tokno
parameter.
(InitParameterRangeCheck): Add tokno parameter.
Remove EXPORT QUALIFIED list.
(InitParameterRangeCheck): Add tokno parameter.
* gm2-compiler/M2Range.mod (InitTypesParameterCheck): Add tokno
parameter and pass tokno to PutRangeParam.
(InitParameterRangeCheck): Add tokno parameter and pass tokno to
PutRangeParam.
(PutRangeParam): Add tokno parameter and assign to tokenNo.
(FoldTypeParam): Rewrite format string.
gcc/testsuite/ChangeLog:
PR modula2/112893
* gm2/iso/fail/proccard.mod: New test.
* gm2/pim/pass/proccard.mod: New test.
which is the operand change according to location. Such operand change in 2 locations instead of 1.
So regenerate pattern for such instructions AVL propagation to fix the ICEs.
gcc/ChangeLog:
* config/riscv/riscv-avlprop.cc (simplify_replace_avl): New function.
(simplify_replace_vlmax_avl): Fix bug.
* config/riscv/t-riscv: Add a new include file.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/avl_prop-2.c: New test.
RISC-V: xtheadmemidx: Document inline asm issue with memory constraint
The XTheadMemIdx support relies on the fact that memory operands that
can be expressed by XTheadMemIdx instructions, will only appear as
operands of such instructions. For internal instruction generation
this is guaranteed by the implemenation. However, in case of inline
assembly, this guarantee is not given and we cannot differentiate
these two cases when printing the operand:
If XTheadMemIdx is enabled, then the address will be printed as if an
XTheadMemIdx instruction is emitted, which is obviously wrong in the
first case.
There might be solutions to handle this (e.g. using TARGET_MEM_CONSTRAINT
or extending the mnemonics to accept the standard operands for
XTheadMemIdx instructions), but let's document this behavior for now
as a known issue by adding xfail tests until we have an acceptable fix.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadmemidx-inline-asm-1.c: New test.
Reported-by: Jin Ma <jinma@linux.alibaba.com> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
This results in the following assembler error:
Assembler messages:
Error: unrecognized opcode `th.lrd a5,a5,a0,0', extension `xtheadmemidx' required
There seems to be a (reasonable) assumption, that addressing modes
for FP registers are compatible with those of GP registers.
We already ran into a similar issue during development of the
XTheadFMemIdx support patch, where we could trace the issue down to
the optimization splitters. Back then we simply disabled them in case
XTheadMemIdx is not available. But as it turned out, that was not
enough.
To ensure, we won't see such issues anymore, let's make the support
for XTheadFMemIdx depend on XTheadMemIdx. I.e., if only XTheadFMemIdx
is available, then no instructions of this extension will be emitted.
While this looks a bit drastic at first view, it is the best practical
solution since XTheadFMemIdx without XTheadMemIdx does not exist in real
hardware and would be an odd thing to do.
gcc/ChangeLog:
* config/riscv/thead.cc (th_memidx_classify_address_index):
Require TARGET_XTHEADMEMIDX for FP modes.
* config/riscv/thead.md: Require TARGET_XTHEADMEMIDX for all
XTheadFMemIdx pattern.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadfmemidx-without-xtheadmemidx.c: New test.
Reported-by: Jin Ma <jinma@linux.alibaba.com> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Jakub Jelinek [Thu, 7 Dec 2023 08:48:57 +0000 (09:48 +0100)]
testsuite: Add testcase for already fixed PR [PR111068]
This one unfortunately can't be bisected, it ICEd until r14-3430
inclusive, but r14-3431 removed -mavx10.1-512 support and when it
was readded in r14-5607 it doesn't ICE anymore.
I'm just committing the testcase so that it doesn't reappear.
2023-12-07 Jakub Jelinek <jakub@redhat.com>
PR target/111068
* gcc.target/i386/pr111068.c: New test.
Jakub Jelinek [Thu, 7 Dec 2023 08:47:54 +0000 (09:47 +0100)]
c-family: Fix up -fno-debug-cpp [PR111965]
As can be seen in the second testcase, -fno-debug-cpp is actually
implemented the same as -fdebug-cpp and so doesn't turn the debugging
off.
The following patch fixes that.
2023-12-07 Andrew Pinski <pinskia@gmail.com>
Jakub Jelinek <jakub@redhat.com>
PR preprocessor/111965
gcc/c-family/
* c-opts.cc (c_common_handle_option) <case OPT_fdebug_cpp>: Set
cpp_opts->debug to value rather than 1.
gcc/testsuite/
* gcc.dg/cpp/pr111965-1.c: New test.
* gcc.dg/cpp/pr111965-2.c: New test.
Jakub Jelinek [Thu, 7 Dec 2023 08:47:16 +0000 (09:47 +0100)]
expr: Handle BITINT_TYPE in count_type_elements [PR112881]
The following testcaser ICEs during gimplification, because
count_type_elements doesn't handle BITINT_TYPE. It should handle it like
other integral types.
2023-12-07 Jakub Jelinek <jakub@redhat.com>
PR middle-end/112881
* expr.cc (count_type_elements): Handle BITINT_TYPE like INTEGER_TYPE.
Jakub Jelinek [Thu, 7 Dec 2023 08:46:38 +0000 (09:46 +0100)]
tree-ssa-dce: Fix up maybe_optimize_arith_overflow for BITINT_TYPE [PR112880]
The following testcase ICEs because maybe_optimize_arith_overflow
uses build_nonstandard_integer_type, which is inappropriate if
type is large BITINT_TYPE.
2023-12-07 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112880
* tree-ssa-dce.cc (maybe_optimize_arith_overflow): Use
unsigned_type_for instead of conditionally calling
build_nonstandard_integer_type.
Jakub Jelinek [Thu, 7 Dec 2023 08:45:13 +0000 (09:45 +0100)]
testsuite: Fix up gcc.target/s390/pr96127.c test for modern C [PR96127]
I've noticed this test regressed on s390x-linux with the addition of the
switch to modern C patchset. Haven't tried to reproduce the ICE, but as it
was a backend ICE and FE after warning used to add such casts before (now
errors), I think this ought to keep the testcase testing what was intended
before.
2023-12-07 Jakub Jelinek <jakub@redhat.com>
PR target/96127
* gcc.target/s390/pr96127.c (c1): Add casts to long int *.
Alexandre Oliva [Thu, 7 Dec 2023 03:38:18 +0000 (00:38 -0300)]
analyzer: deal with -fshort-enums
On platforms that enable -fshort-enums by default, various switch-enum
analyzer tests fail, because apply_constraints_for_gswitch doesn't
expect the integral promotion type cast. I've arranged for the code
to cope with those casts.
for gcc/analyzer/ChangeLog
* region-model.cc (has_nondefault_case_for_value_p): Take
enumerate type as a parameter.
(region_model::apply_constraints_for_gswitch): Cope with
integral promotion type casts.
Alexandre Oliva [Thu, 7 Dec 2023 03:38:14 +0000 (00:38 -0300)]
libsupc++: try cxa_thread_atexit_impl at runtime
g++.dg/tls/thread_local-order2.C fails when the toolchain is built for
a platform that lacks __cxa_thread_atexit_impl, even if the program is
built and run using that toolchain on a (later) platform that offers
__cxa_thread_atexit_impl.
This patch adds runtime testing for __cxa_thread_atexit_impl on select
platforms (GNU variants, for starters) that support weak symbols.
aarch64: rcpc3: Add relevant iterators to handle Neon intrinsics
The LDAP1 and STL1 Neon ACLE intrinsics, operating on 64-bit data
values, operate on single-lane (Vt.1D) or twin-lane (Vt.2D) SIMD
register configurations, either in the DI or DF modes. This leads to
the need for a mode iterator accounting for the V1DI, V1DF, V2DI and
V2DF modes.
This patch therefore introduces the new V12DIF mode iterator with
which to generate functions operating on signed 64-bit integer and
float values and V12DIUP for generating the unsigned and
polynomial-type counterparts. Along with this, we modify the
associated mode attributes accordingly in order to allow for the
implementation of the relevant backend patterns for the intrinsics.
gcc/ChangeLog:
* config/aarch64/iterators.md (V12DIF): New.
(V12DUP): Likewise.
(VEL): Add support for all V12DIF-associated modes.
(Vetype): Add support for V1DI and V1DF.
(Vel): Likewise.
Hongyu Wang [Sat, 2 Dec 2023 04:55:59 +0000 (12:55 +0800)]
[APX NDD] Support TImode shift for NDD
For TImode shifts, they are splitted by splitter functions, which assume
operands[0] and operands[1] to be the same. For the NDD alternative the
assumption may not be true so add split functions for NDD to emit the NDD
form instructions, and omit the handling of !64bit target split.
Although the NDD form allows memory src, for post-reload splitter there are
no extra register to accept NDD form shift, especially shld/shrd. So only
accept register alternative for shift src under NDD.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_split_ashl_ndd): New
function to split NDD form lshift.
(ix86_split_rshift_ndd): Likewise for l/ashiftrt.
* config/i386/i386-protos.h (ix86_split_ashl_ndd): New
prototype.
(ix86_split_rshift_ndd): Likewise.
* config/i386/i386.md (ashl<mode>3_doubleword): Add NDD
alternative, call ndd split function when operands[0]
not equal to operands[1].
(define_split for doubleword lshift): Likewise.
(define_peephole for doubleword lshift): Likewise.
(<insn><mode>3_doubleword): Likewise for l/ashiftrt.
(define_split for doubleword l/ashiftrt): Likewise.
(define_peephole for doubleword l/ashiftrt): Likewise.
Hongyu Wang [Wed, 8 Nov 2023 08:04:26 +0000 (16:04 +0800)]
[APX NDD] Support APX NDD for cmove insns
gcc/ChangeLog:
* config/i386/i386.md (*mov<mode>cc_noc): Extend with new constraints
to support NDD.
(*movsicc_noc_zext): Likewise.
(*movsicc_noc_zext_1): Likewise.
(*movqicc_noc): Likewise.
Hongyu Wang [Tue, 7 Nov 2023 08:28:28 +0000 (16:28 +0800)]
[APX NDD] Support APX NDD for shld/shrd insns
For shld/shrd insns, the old pattern use match_dup 0 as its shift src and use
+r*m as its constraint. To support NDD we added new define_insns to handle NDD
form pattern with extra input and dest operand to be fixed in register.
Hongyu Wang [Tue, 31 Oct 2023 06:21:16 +0000 (14:21 +0800)]
[APX NDD] Support APX NDD for rotate insns
gcc/ChangeLog:
* config/i386/i386.md (*<insn><mode>3_1): Extend with a new
alternative to support NDD for SI/DI rotate, and adjust output
template.
(*<insn>si3_1_zext): Likewise.
(*<insn><mode>3_1): Likewise for QI/HI modes.
(rcrsi2): Likewise, and use nonimmediate_operand for operands[1]
to accept memory input for NDD alternative.
(rcrdi2): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-ndd.c: Add test for left/right rotate.
Hongyu Wang [Wed, 25 Oct 2023 08:26:49 +0000 (16:26 +0800)]
[APX NDD] Support APX NDD for right shift insns
Similar to LSHIFT, rshift do not need to omit $1 for NDD form.
gcc/ChangeLog:
* config/i386/i386.md (ashr<mode>3_cvt): Extend with new
alternatives to support NDD, and adjust output templates.
(*ashr<mode>3_1): Likewise for SI/DI mode.
(*lshr<mode>3_1): Likewise.
(*<insn>si3_1_zext): Likewise.
(*ashr<mode>3_1): Likewise for QI/HI mode.
(*lshrqi3_1): Likewise.
(*lshrhi3_1): Likewise.
(<insn><mode>3_cmp): Likewise.
(*<insn><mode>3_cconly): Likewise.
(*ashrsi3_cvt_zext): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.
(*highpartdisi2): Likewise.
(*<insn>si3_cmp_zext): Likewise.
(<insn><mode>3_carry): Likewise.
Hongyu Wang [Wed, 25 Oct 2023 07:07:29 +0000 (15:07 +0800)]
[APX NDD] Support APX NDD for left shift insns
For left shift, there is an optimization TARGET_DOUBLE_WITH_ADD that shl
1 can be optimized to add. As NDD form of add requires src operand to
be register since NDD cannot take 2 memory src, we currently just keep
using NDD form shift instead of add.
The optimization TARGET_SHIFT1 will try to remove constant 1 to use shorter
opcode, but under NDD assembler will automatically use it whether $1 exist
or not, so do not involve NDD with it.
The doubleword insns for left shift calls ix86_expand_ashl, which assume
all shift related pattern has same operand[0] and operand[1]. For these pattern
we will support them in a standalone patch.
gcc/ChangeLog:
* config/i386/i386.md (*ashl<mode>3_1): Extend with new
alternatives to support NDD, limit the new alternative to
generate sal only, and adjust output template for NDD.
(*ashlsi3_1_zext): Likewise.
(*ashlhi3_1): Likewise.
(*ashlqi3_1): Likewise.
(*ashl<mode>3_cmp): Likewise.
(*ashlsi3_cmp_zext): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.
(*ashl<mode>3_cconly): Likewise.
(*ashl<dwi>3_doubleword_highpart): Adjust codegen for NDD.
Kong Lingling [Fri, 19 May 2023 02:50:29 +0000 (10:50 +0800)]
[APX NDD] Support APX NDD for or/xor insn
Similar to AND insn, two splitters need to be adjusted to prevent
misoptimizaiton for NDD OR/XOR.
Also adjust *one_cmplsi2_2_zext and its corresponding splitter that will
generate xor insn.
gcc/ChangeLog:
* config/i386/i386.md (<code><mode>3): Add new alternative for NDD
and adjust output templates.
(*<code><mode>_1): Likewise.
(*<code>qi_1): Likewise.
(*notxor<mode>_1): Likewise.
(*<code>si_1_zext): Likewise.
(*notxorqi_1): Likewise.
(*<code><mode>_2): Likewise.
(*<code>si_2_zext): Likewise.
(*<code>si_2_zext_imm): Likewise.
(*<code>si_1_zext_imm): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.
(*one_cmplsi2_2_zext): Likewise.
(define_split for *one_cmplsi2_2_zext): Use nonimmediate_operand for
operands[3].
(*<code><dwi>3_doubleword): Add NDD constraints, adopt '&' to NDD dest
and emit move for optimized case if operands[0] != operands[1] or
operands[4] != operands[5].
(define_split for QI highpart OR/XOR): Prohibit splitter to split NDD
form OR/XOR insn to <any_logic:code>qi_ext<mode>_3.
(define_split for QI strict_lowpart optimization): Prohibit splitter to
split NDD form AND insn to *<code><mode>3_1_slp.
Kong Lingling [Wed, 17 May 2023 09:20:37 +0000 (17:20 +0800)]
[APX NDD] Support APX NDD for and insn
For NDD form AND insn, there are three splitter fixes after extending legacy
patterns.
1. APX NDD does not support high QImode registers like ah, bh, ch, dh, so for
some optimization splitters that generates highpart zero_extract for QImode
need to be prohibited under NDD pattern.
2. Legacy AND insn will use r/qm/L constraint, and a post-reload splitter will
transform it into zero_extend move. But for NDD form AND, the splitter is not
strict enough as the splitter assum such AND will have the const_int operand
matching the constraint "L", then NDD form AND allows const_int with any QI
values. Restrict the splitter condition to match "L" constraint that strictly
matches zero-extend sematic.
3. Legacy AND insn will adopt r/0/Z constraint, a splitter will try to optimize
such form into strict_lowpart QImode AND when 7th bit is not set. But the
splitter will wronly convert non-zext form of NDD and with memory src, then the
strict_lowpart transform matches alternative 1 of *<code><mode>_slp_1 and
generates *movstrict<mode>_1 so the zext sematic was omitted. This could cause
highpart of dest not cleared and generates wrong code. Disable the splitter
when NDD adopted and operands[0] and operands[1] are not equal.
gcc/ChangeLog:
* config/i386/i386.md (and<mode>3): Add NDD alternatives and adjust
output template.
(*anddi_1): Likewise.
(*and<mode>_1): Likewise.
(*andqi_1): Likewise.
(*andsi_1_zext): Likewise.
(*anddi_2): Likewise.
(*andsi_2_zext): Likewise.
(*andqi_2_maybe_si): Likewise.
(*and<mode>_2): Likewise.
(*and<dwi>3_doubleword): Add NDD alternative, adopt '&' to NDD dest and
emit move for optimized case if operands[0] not equal to operands[1].
(define_split for QI highpart AND): Prohibit splitter to split NDD
form AND insn to <any_logic:code>qi_ext<mode>_3.
(define_split for QI strict_lowpart optimization): Prohibit splitter to
split NDD form AND insn to *<code><mode>3_1_slp.
(define_split for zero_extend and optimization): Prohibit splitter to
split NDD form AND insn to zero_extend insn.
Kong Lingling [Mon, 22 May 2023 02:08:39 +0000 (10:08 +0800)]
[APX NDD] Support APX NDD for not insn
For *one_cmplsi2_2_zext, it will be splitted to xor, so its NDD form will be
added together with xor NDD support.
gcc/ChangeLog:
* config/i386/i386.md (one_cmpl<mode>2): Add new constraints for NDD
and adjust output template.
(*one_cmpl<mode>2_1): Likewise.
(*one_cmplqi2_1): Likewise.
(*one_cmpl<dwi>2_doubleword): Likewise, and adopt '&' to NDD dest.
(*one_cmpl<mode>2_2): Likewise.
(*one_cmplsi2_1_zext): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.
Kong Lingling [Fri, 19 May 2023 09:15:52 +0000 (17:15 +0800)]
[APX NDD] Support APX NDD for neg insn
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_unary_operator): Add use_ndd
parameter and adjust for NDD.
* config/i386/i386-protos.h: Add use_ndd parameter for
ix86_unary_operator_ok and ix86_expand_unary_operator.
* config/i386/i386.cc (ix86_unary_operator_ok): Add use_ndd parameter
and adjust for NDD.
* config/i386/i386.md (neg<mode>2): Add new constraint for NDD and
adjust output template.
(*neg<mode>_1): Likewise.
(*neg<dwi>2_doubleword): Likewise and adopt '&' to NDD dest.
(*neg<mode>_2): Likewise.
(*neg<mode>_ccc_1): Likewise.
(*neg<mode>_ccc_2): Likewise.
(*negsi_1_zext): Likewise, and use nonimmediate_operand for operands[1]
to accept memory input for NDD alternatives.
(*negsi_2_zext): Likewise.
Kong Lingling [Wed, 18 Jan 2023 07:51:23 +0000 (15:51 +0800)]
[APX NDD] Support APX NDD for sbb insn
Similar to *add<dwi>3_doubleword, operands[1] may not equal to operands[0] so
extra move and earlyclobber are required.
gcc/ChangeLog:
* config/i386/i386.md (*sub<dwi>3_doubleword): Add new alternative for
NDD, adopt '&' modifier to NDD dest and emit move when operands[0] not
equal to operands[1].
(*sub<dwi>3_doubleword_zext): Likewise.
(*subv<dwi>4_doubleword): Likewise.
(*subv<dwi>4_doubleword_1): Likewise.
(*subv<mode>4_overflow_1): Add NDD alternatives and adjust output
templates.
(*subv<mode>4_overflow_2): Likewise.
(@sub<mode>3_carry): Likewise.
(*addsi3_carry_zext_0r): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.
(*subsi3_carry_zext): Likewise.
(subborrow<mode>): Parse TARGET_APX_NDD to ix86_binary_operator_ok.
(subborrow<mode>_0): Likewise.
(*sub<mode>3_eq): Likewise.
(*sub<mode>3_ne): Likewise.
(*sub<mode>3_eq_1): Likewise.
Kong Lingling [Wed, 18 Jan 2023 09:52:52 +0000 (17:52 +0800)]
[APX NDD] Support APX NDD for adc insns
Legacy adc patterns are commonly adopted to TImode add, when extending TImode
add to NDD version, operands[0] and operands[1] can be different, so extra move
should be emitted if those patterns have optimization when adding const0_rtx.
For TImode insn, there could be register overlapping between operands[0]
and operands[1] as x86 allocates TImode register sequentially like rax:rdi,
rdi:rdx. After postreload split for TImode, write to 1st highpart rdi will
be overrided by the 2nd lowpart rdi if 2nd lowpart rdi have different src as
input, then the write to 1st highpart rdi will missed and cause miscompliation.
In addition, when input operands contain memory, the address register may also
overlaps with dest register if it is marked dead after one of highpart/lowpart
operation was done.
So the earlyclobber modifier '&' should be added to NDD dest to avoid
overlapping between dest and src operands.
NDD instructions will automatically zero-extend dest register to 64bit, so for
zext patterns it can adopt all NDD form that have memory src input.
gcc/ChangeLog:
* config/i386/i386.md (*add<dwi>3_doubleword): Add ndd alternatives,
adopt '&' to ndd dest and move operands[1] to operands[0] when they are
not equal.
(*add<dwi>3_doubleword_cc_overflow_1): Likewise.
(*addv<dwi>4_doubleword): Likewise.
(*addv<dwi>4_doubleword_1): Likewise.
(*add<dwi>3_doubleword_zext): Likewise.
(addv<mode>4_overflow_1): Add ndd alternatives.
(*addv<mode>4_overflow_2): Likewise.
(@add<mode>3_carry): Likewise.
(*add<mode>3_carry_0): Likewise.
(*addsi3_carry_zext): Likewise.
(addcarry<mode>): Likewise.
(addcarry<mode>_0): Likewise.
(*addcarry<mode>_1): Likewise.
(*add<mode>3_eq): Likewise.
(*add<mode>3_ne): Likewise.
(*addsi3_carry_zext_0): Likewise, and use nonimmediate_operand for
operands[1] to accept memory input for NDD alternative.
Hongyu Wang [Mon, 13 Nov 2023 10:49:07 +0000 (18:49 +0800)]
[APX NDD] Disable seg_prefixed memory usage for NDD add
NDD uses evex prefix, so when segment prefix is also applied, the instruction
could excceed its 15byte limit, especially adding immediates. This could happen
when "e" constraint accepts any UNSPEC_TPOFF/UNSPEC_NTPOFF constant and it will
add the offset to segment register, which will be encoded using segment prefix.
Disable those *POFF constant usage in NDD add alternatives with new constraint.
gcc/ChangeLog:
* config/i386/constraints.md (je): New constraint.
* config/i386/i386-protos.h (x86_poff_operand_p): New function to
check any *POFF constant in operand.
* config/i386/i386.cc (x86_poff_operand_p): New prototype.
* config/i386/i386.md (*add<mode>_1): Split out je alternative for add.
Kong Lingling [Wed, 14 Dec 2022 02:10:19 +0000 (10:10 +0800)]
[APX NDD] Support Intel APX NDD for legacy add insn
APX NDD provides an extra destination register operand for several gpr
related legacy insns, so a new alternative can be adopted to operand1
with "r" constraint.
This first patch supports NDD for add instruction, and keeps to use lea
when all operands are registers since lea have shorter encoding. For
add operations containing mem NDD will be adopted to save an extra move.
In legacy x86 binary operation expand it will force operands[0] and
operands[1] to be the same so add a helper function to allow NDD form
pattern that operands[0] and operands[1] can be different.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_fixup_binary_operands): Add
new use_ndd flag to check whether ndd can be used for this binop
and adjust operand emit.
(ix86_binary_operator_ok): Likewise.
(ix86_expand_binary_operator): Likewise, and void postreload
expand generate lea pattern when use_ndd is explicit parsed.
* config/i386/i386-options.cc (ix86_option_override_internal):
Prohibit apx subfeatures when not in 64bit mode.
* config/i386/i386-protos.h (ix86_binary_operator_ok):
Add use_ndd flag.
(ix86_fixup_binary_operand): Likewise.
(ix86_expand_binary_operand): Likewise.
* config/i386/i386.md (*add<mode>_1): Extend with new alternatives
to support NDD, and adjust output template.
(*addhi_1): Likewise.
(*addqi_1): Likewise.
David Malcolm [Thu, 7 Dec 2023 00:25:26 +0000 (19:25 -0500)]
analyzer: fix taint false positives with UNKNOWN [PR112850]
PR analyzer/112850 reports a false positive from
-Wanalyzer-tainted-allocation-size on the Linux kernel [1] where
-fanalyzer complains that an allocation size is attacker-controlled
despite the value being correctly sanitized against upper and lower
limits.
The root cause is that the expression is sufficiently complex
to exceed the -param=analyzer-max-svalue-depth= threshold,
currently at 12, with depth 13, and so it is treated as UNKNOWN.
Hence the sanitizations are seen as comparisons of an UNKNOWN
symbolic value against constants, and these were being ignored
by the taint state machine.
The expression in question is relatively typical for those seen in
Linux kernel ioctl handlers, and I was surprised that it had exceeded
the analyzer's default expression complexity limit.
This patch addresses this problem in three ways:
(a) the default value of the threshold parameter is increased, from 12
to 18, so that such expressions are precisely handled
(b) adding a new -Wanalyzer-symbol-too-complex to warn when the symbol
complexity limit is reached. This is off by default for users, and
on by default in the test suite.
(c) the taint state machine handles comparisons against UNKNOWN svalues
by dropping all taint information on that execution path, so that if
the complexity limit has been exceeded we don't generate false positives
As well as fixing the taint false positive (PR analyzer/112850), the
patch also fixes a couple of leak false positives seen on flex-generated
scanners (PR analyzer/103546).
[1] specifically, in sound/core/rawmidi.c's handler for
SNDRV_RAWMIDI_STREAM_OUTPUT.
aarch64: Add support for GCS system registers with the +gcs modifier
Given the introduction of system registers associated with the Guarded
Control Stack extension to Armv9.4-a in Binutils and their reliance on
the `+gcs' modifier, we implement the necessary changes in GCC to
allow for them to be recognized by the compiler.
aarch64: Add march flags for +the and +d128 arch extensions
Given the introduction of optional 128-bit page table descriptor and
translation hardening extension support with the Arm9.4-a
architecture, this introduces the relevant flags to enable the reading
and writing of 128-bit system registers.
The `+d128' -march modifier enables the use of the following ACLE
builtin functions:
and defines the __ARM_FEATURE_SYSREG128 macro to 1.
Finally, the `rcwmask_el1' and `rcwsmask_el1' 128-bit system register
implementations are also reliant on the enablement of the `+the' flag,
which is thus also implemented in this patch.
Edwin Lu [Wed, 6 Dec 2023 00:15:10 +0000 (16:15 -0800)]
RISC-V: Remove xfail from ssa-fre-3.c testcase
Ran the test case at 122e7b4f9d0c2d54d865272463a1d812002d0a5c where the xfail
was introduced. The test did pass at that hash and has continued to pass since
then. Remove the xfail
Eric Gallager [Mon, 4 Dec 2023 15:13:55 +0000 (10:13 -0500)]
remove qmtest-related Makefile targets
On GitHub, Joseph Myers (@jsm28 there) says in MentorEmbedded/qmtest#1
that the qmtest-related targets should have been removed long ago. This
patch does so.
David Malcolm [Wed, 6 Dec 2023 17:35:44 +0000 (12:35 -0500)]
diagnostics: prettify JSON output formats
Previously our JSON output emitted the JSON all on one line, with
no indentation to show the structure of the values.
Although it's easy to reformat such output (e.g. with
"python -m json.tool"), I've found it's a pain to need to do so
e.g. my text editor sometimes hangs when opening a multimegabyte
json file all on one line. Similarly diff-ing is easier if the
json is already formatted.
This patch add whitespace to json output to show the structure.
It turned out to be fairly easy to implement using pretty_printer's
existing indentation machinery.
The patch uses this formatting for the various JSON-based diagnostic
output formats.
For example, with this patch, the output from
fdiagnostics-format=json-stderr looks like:
I was able to update almost all of our DejaGnu test cases for JSON to
handle this format tweak, and IMHO it improved the readability of these
test cases, but a couple were more awkward. Hence I added
-fno-diagnostics-json-formatting as an option to disable this
formatting.
The formatting does not affect the output of -fsave-optimization-record
or the JSON output from gcov (but this could be enabled if desirable).
gcc/analyzer/ChangeLog:
* engine.cc (dump_analyzer_json): Use
flag_diagnostics_json_formatting.
gcc/ChangeLog:
* common.opt (fdiagnostics-json-formatting): New.
* diagnostic-format-json.cc: Add "formatted" boolean
to json_output_format and subclasses, and to the
diagnostic_output_format_init_json_* functions. Use it when
printing JSON.
* diagnostic-format-sarif.cc: Likewise for sarif_builder,
sarif_output_format, and the various
diagnostic_output_format_init_sarif_* functions.
* diagnostic.cc (diagnostic_output_format_init): Add
"json_formatting" boolean and pass on to the various cases.
* diagnostic.h (diagnostic_output_format_init): Add
"json_formatted" param.
(diagnostic_output_format_init_json_stderr): Add "formatted" param
(diagnostic_output_format_init_json_file): Likewise.
(diagnostic_output_format_init_sarif_stderr): Likewise.
(diagnostic_output_format_init_sarif_file): Likewise.
(diagnostic_output_format_init_sarif_stream): Likewise.
* doc/invoke.texi (-fdiagnostics-format=json): Remove discussion
about JSON output needing formatting.
(-fno-diagnostics-json-formatting): Add.
* gcc.cc (driver_handle_option): Use
opts->x_flag_diagnostics_json_formatting.
* gcov.cc (generate_results): Pass "false" for new formatting
option when printing json.
* json.cc (value::dump): Add new "formatted" param.
(object::print): Likewise, using it to add whitespace to format
the JSON output.
(array::print): Likewise.
(float_number::print): Add new "formatted" param.
(integer_number::print): Likewise.
(string::print): Likewise.
(literal::print): Likewise.
(selftest::assert_print_eq): Add "formatted" param.
(ASSERT_PRINT_EQ): Add "FORMATTED" param.
(selftest::test_writing_objects): Test both formatted and
unformatted printing.
(selftest::test_writing_arrays): Likewise.
(selftest::test_writing_float_numbers): Update for new param of
ASSERT_PRINT_EQ.
(selftest::test_writing_integer_numbers): Likewise.
(selftest::test_writing_strings): Likewise.
(selftest::test_writing_literals): Likewise.
(selftest::test_formatting): New.
(selftest::json_cc_tests): Call it.
* json.h (value::print): Add "formatted" param.
(value::dump): Likewise.
(object::print): Likewise.
(array::print): Likewise.
(float_number::print): Likewise.
(integer_number::print): Likewise.
(string::print): Likewise.
(literal::print): Likewise.
* optinfo-emit-json.cc (optrecord_json_writer::write): Pass
"false" for new formatting option when printing json.
(selftest::test_building_json_from_dump_calls): Likewise.
* opts.cc (common_handle_option): Use
opts->x_flag_diagnostics_json_formatting.
gcc/ChangeLog:
* diagnostic-format-json.cc (on_begin_diagnostic): Convert param
to const reference.
(on_end_diagnostic): Likewise.
(json_output_format::on_end_diagnostic): Likewise.
* diagnostic-format-sarif.cc
(sarif_invocation::add_notification_for_ice): Likewise.
(sarif_result::on_nested_diagnostic): Likewise.
(sarif_ice_notification::sarif_ice_notification): Likewise.
(sarif_builder::end_diagnostic): Likewise.
(sarif_builder::make_result_object): Likewise.
(make_reporting_descriptor_object_for_warning): Likewise.
(sarif_builder::make_locations_arr): Likewise.
(sarif_output_format::on_begin_diagnostic): Likewise.
(sarif_output_format::on_end_diagnostic): Likewise.
* diagnostic.cc (default_diagnostic_starter): Make diagnostic_info
param const.
(default_diagnostic_finalizer): Likewise.
(diagnostic_context::report_diagnostic): Pass diagnostic by
reference to on_{begin,end}_diagnostic.
(diagnostic_text_output_format::on_begin_diagnostic): Convert
param to const reference.
(diagnostic_text_output_format::on_end_diagnostic): Likewise.
* diagnostic.h (diagnostic_starter_fn): Make diagnostic_info param
const.
(diagnostic_finalizer_fn): Likeewise.
(diagnostic_output_format::on_begin_diagnostic): Convert param to
const reference.
(diagnostic_output_format::on_end_diagnostic): Likewise.
(diagnostic_text_output_format::on_begin_diagnostic): Likewise.
(diagnostic_text_output_format::on_end_diagnostic): Likewise.
(default_diagnostic_starter): Make diagnostic_info param const.
(default_diagnostic_finalizer): Likewise.
* langhooks-def.h (lhd_print_error_function): Make diagnostic_info
param const.
* langhooks.cc (lhd_print_error_function): Likewise.
* langhooks.h (lang_hooks::print_error_function): Likewise.
* tree-diagnostic.cc (diagnostic_report_current_function):
Likewise.
(default_tree_diagnostic_starter): Likewise.
(virt_loc_aware_diagnostic_finalizer): Likewise.
* tree-diagnostic.h (diagnostic_report_current_function):
Likewise.
(virt_loc_aware_diagnostic_finalizer): Likewise.
gcc/fortran/ChangeLog:
* error.cc (gfc_diagnostic_starter): Make diagnostic_info param
const.
(gfc_diagnostic_finalizer): Likewise.
gcc/jit/ChangeLog:
* dummy-frontend.cc (jit_begin_diagnostic): Make diagnostic_info
param const.
(jit_end_diagnostic): Likewise. Pass to add_diagnostic by
reference.
* jit-playback.cc (jit::playback::context::add_diagnostic):
Convert diagnostic_info to const reference.
* jit-playback.h (jit::playback::context::add_diagnostic):
Likewise.
Andrew Stubbs [Mon, 30 Jan 2023 14:43:00 +0000 (14:43 +0000)]
amdgcn, libgomp: low-latency allocator
This implements the OpenMP low-latency memory allocator for AMD GCN using the
small per-team LDS memory (Local Data Store).
Since addresses can now refer to LDS space, the "Global" address space is
no-longer compatible. This patch therefore switches the backend to use
entirely "Flat" addressing (which supports both memories). A future patch
will re-enable "global" instructions for cases where it is known to be safe
to do so.
gcc/ChangeLog:
* config/gcn/gcn-builtins.def (DISPATCH_PTR): New built-in.
* config/gcn/gcn.cc (gcn_init_machine_status): Disable global
addressing.
(gcn_expand_builtin_1): Implement GCN_BUILTIN_DISPATCH_PTR.
libgomp/ChangeLog:
* config/gcn/libgomp-gcn.h (TEAM_ARENA_START): Move to here.
(TEAM_ARENA_FREE): Likewise.
(TEAM_ARENA_END): Likewise.
(GCN_LOWLAT_HEAP): New.
* config/gcn/team.c (LITTLEENDIAN_CPU): New, and import hsa.h.
(__gcn_lowlat_init): New prototype.
(gomp_gcn_enter_kernel): Initialize the low-latency heap.
* libgomp.h (TEAM_ARENA_START): Move to libgomp.h.
(TEAM_ARENA_FREE): Likewise.
(TEAM_ARENA_END): Likewise.
* plugin/plugin-gcn.c (lowlat_size): New variable.
(print_kernel_dispatch): Label the group_segment_size purpose.
(init_environment_variables): Read GOMP_GCN_LOWLAT_POOL.
(create_kernel_dispatch): Pass low-latency head allocation to kernel.
(run_kernel): Use shadow; don't assume values.
* testsuite/libgomp.c/omp_alloc-traits.c: Enable for amdgcn.
* config/gcn/allocator.c: New file.
* libgomp.texi: Document low-latency implementation details.
Andrew Stubbs [Thu, 27 Jan 2022 13:48:50 +0000 (13:48 +0000)]
openmp, nvptx: low-lat memory access traits
The NVPTX low latency memory is not accessible outside the team that allocates
it, and therefore should be unavailable for allocators with the access trait
"all". This change means that the omp_low_lat_mem_alloc predefined
allocator no longer works (but omp_cgroup_mem_alloc still does).
libgomp/ChangeLog:
* allocator.c (MEMSPACE_VALIDATE): New macro.
(omp_init_allocator): Use MEMSPACE_VALIDATE.
(omp_aligned_alloc): Use OMP_LOW_LAT_MEM_ALLOC_INVALID.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* config/nvptx/allocator.c (nvptx_memspace_validate): New function.
(MEMSPACE_VALIDATE): New macro.
(OMP_LOW_LAT_MEM_ALLOC_INVALID): New define.
* libgomp.texi: Document low-latency implementation details.
* testsuite/libgomp.c/omp_alloc-1.c (main): Add gnu_lowlat.
* testsuite/libgomp.c/omp_alloc-2.c (main): Add gnu_lowlat.
* testsuite/libgomp.c/omp_alloc-3.c (main): Add gnu_lowlat.
* testsuite/libgomp.c/omp_alloc-4.c (main): Add access trait.
* testsuite/libgomp.c/omp_alloc-5.c (main): Add gnu_lowlat.
* testsuite/libgomp.c/omp_alloc-6.c (main): Add access trait.
* testsuite/libgomp.c/omp_alloc-traits.c: New test.
Andrew Stubbs [Fri, 3 Dec 2021 17:46:41 +0000 (17:46 +0000)]
libgomp, nvptx: low-latency memory allocator
This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc. The memory
can be allocated, reallocated, and freed using a basic but fast algorithm,
is thread safe and the size of the low-latency heap can be configured using
the GOMP_NVPTX_LOWLAT_POOL environment variable.
The use of the PTX dynamic_smem_size feature means that low-latency allocator
will not work with the PTX 3.1 multilib.
For now, the omp_low_lat_mem_alloc allocator also works, but that will change
when I implement the access traits.
libgomp/ChangeLog:
* allocator.c (MEMSPACE_ALLOC): New macro.
(MEMSPACE_CALLOC): New macro.
(MEMSPACE_REALLOC): New macro.
(MEMSPACE_FREE): New macro.
(predefined_alloc_mapping): New array. Add _Static_assert to match.
(ARRAY_SIZE): New macro.
(omp_aligned_alloc): Use MEMSPACE_ALLOC.
Implement fall-backs for predefined allocators. Simplify existing
fall-backs.
(omp_free): Use MEMSPACE_FREE.
(omp_calloc): Use MEMSPACE_CALLOC. Implement fall-backs for
predefined allocators. Simplify existing fall-backs.
(omp_realloc): Use MEMSPACE_REALLOC, MEMSPACE_ALLOC, and MEMSPACE_FREE.
Implement fall-backs for predefined allocators. Simplify existing
fall-backs.
* config/nvptx/team.c (__nvptx_lowlat_pool): New asm variable.
(__nvptx_lowlat_init): New prototype.
(gomp_nvptx_main): Call __nvptx_lowlat_init.
* libgomp.texi: Update memory space table.
* plugin/plugin-nvptx.c (lowlat_pool_size): New variable.
(GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar.
(GOMP_OFFLOAD_run): Apply lowlat_pool_size.
* basic-allocator.c: New file.
* config/nvptx/allocator.c: New file.
* testsuite/libgomp.c/omp_alloc-1.c: New test.
* testsuite/libgomp.c/omp_alloc-2.c: New test.
* testsuite/libgomp.c/omp_alloc-3.c: New test.
* testsuite/libgomp.c/omp_alloc-4.c: New test.
* testsuite/libgomp.c/omp_alloc-5.c: New test.
* testsuite/libgomp.c/omp_alloc-6.c: New test.
Co-authored-by: Kwok Cheung Yeung <kcy@codesourcery.com> Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
Fix c-c++-common/fhardened-[12].c test fails on hppa
The -fstack-protector and -fstack-protector-strong options are
not supported on hppa since the stack grows up.
2023-12-06 John David Anglin <danglin@gcc.gnu.org>
gcc/testsuite/ChangeLog:
* c-c++-common/fhardened-1.c: Ignore __SSP_STRONG__ define
if __hppa__ is defined.
* c-c++-common/fhardened-2.c: Ignore __SSP__ define
if __hppa__ is defined.
Juzhe-Zhong [Wed, 6 Dec 2023 14:26:46 +0000 (22:26 +0800)]
RISC-V: Fix VSETVL PASS bug
As PR112855 mentioned, the VSETVL PASS insert vsetvli in unexpected location.
Due to 2 reasons:
1. incorrect transparant computation LCM data. We need to check VL operand defs and uses.
2. incorrect fusion of unrelated edge which is the edge never reach the vsetvl expression.
Jason Merrill [Tue, 5 Dec 2023 20:28:16 +0000 (15:28 -0500)]
c++: partial ordering of object parameter [PR53499]
Looks like we implemented option 1 (skip the object parameter) for CWG532
before the issue was resolved, and never updated to the final resolution of
option 2 (model it as a reference). More recently CWG2445 extended this
handling to static member functions; I think that's wrong, and have
opened CWG2834 to address that and how explicit object member functions
interact with it.
The FIXME comments are to guide how the explicit object member function
support should change the uses of DECL_NONSTATIC_MEMBER_FUNCTION_P.
The library testsuite changes are to make partial ordering work again
between the generic operator- in the testcase and
_Pointer_adapter::operator-.
DR 532
PR c++/53499
gcc/cp/ChangeLog:
* pt.cc (more_specialized_fn): Fix object parameter handling.
gcc/testsuite/ChangeLog:
* g++.dg/template/partial-order4.C: New test.
* g++.dg/template/spec26.C: Adjust for CWG532.
libstdc++-v3/ChangeLog:
* testsuite/23_containers/vector/ext_pointer/types/1.cc
* testsuite/23_containers/vector/ext_pointer/types/2.cc
(N::operator-): Make less specialized.
Marek Polacek [Tue, 5 Dec 2023 18:39:49 +0000 (13:39 -0500)]
build: unbreak bootstrap on uclinux targets [PR112762]
Currently, cross-compiling with --target=c6x-uclinux (and several other)
fails due to:
../../src/gcc/config/linux.h:221:45: error: 'linux_fortify_source_default_level' was not declared in this scope
#define TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL linux_fortify_source_default_level
In the PR Andrew mentions that another fix would be in config.gcc,
but really, here I meant to use the target hook for glibc only, not
uclibc. This trivial patch fixes the build problem. It means that
-fhardened with uclibc will use -D_FORTIFY_SOURCE=2 and not =3.
PR target/112762
gcc/ChangeLog:
* config/linux.h: Redefine TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL for
glibc only.
Thomas Schwinge [Tue, 5 Dec 2023 08:54:54 +0000 (09:54 +0100)]
Modula-2: Support '-isysroot [...]'
In GCC cross configurations (tested '--target=amdgcn-amdhsa' and
'--target=nvptx-none') with a sysroot configured, the 'gm2' driver invocations
are passed '--sysroot=[...]', which is translated into '-isysroot [...]' for
the 'cc1gm2' compiler invocation. The latter, however gets complained about:
cc1gm2: warning: command-line option ‘-isysroot [...]’ is valid for C/C++/D/Fortran/ObjC/ObjC++ but not for Modula-2
..., and therefore a ton of FAILs.
Reproducer (also for non-cross, native configurations):
$ build-gcc/gcc/gm2 -Bbuild-gcc/gcc -v --sysroot=/tmp -x modula-2 /dev/null
[...]
build-gcc/gcc/cc1gm2 [...] -isysroot [...]/tmp [...]
cc1gm2: warning: command-line option ‘-isysroot /tmp’ is valid for C/C++/D/Fortran/ObjC/ObjC++ but not for Modula-2
[...]
Jakub Jelinek [Wed, 6 Dec 2023 11:27:12 +0000 (12:27 +0100)]
libgcc: Avoid -Wbuiltin-declaration-mismatch warnings in emutls.c
When libgcc is being built in --disable-tls configuration or on
a target without native TLS support, one gets annoying warnings:
../../../../libgcc/emutls.c:61:7: warning: conflicting types for built-in function ‘__emutls_get_address’; expected ‘void *(void *)’ [-Wbuiltin-declaration-mismatch]
61 | void *__emutls_get_address (struct __emutls_object *);
| ^~~~~~~~~~~~~~~~~~~~
../../../../libgcc/emutls.c:63:6: warning: conflicting types for built-in function ‘__emutls_register_common’; expected ‘void(void *, unsigned int, unsigned int, void *)’
+[-Wbuiltin-declaration-mismatch]
63 | void __emutls_register_common (struct __emutls_object *, word, word, void *);
| ^~~~~~~~~~~~~~~~~~~~~~~~
../../../../libgcc/emutls.c:140:1: warning: conflicting types for built-in function ‘__emutls_get_address’; expected ‘void *(void *)’ [-Wbuiltin-declaration-mismatch]
140 | __emutls_get_address (struct __emutls_object *obj)
| ^~~~~~~~~~~~~~~~~~~~
../../../../libgcc/emutls.c:204:1: warning: conflicting types for built-in function ‘__emutls_register_common’; expected ‘void(void *, unsigned int, unsigned int, void *)’
+[-Wbuiltin-declaration-mismatch]
204 | __emutls_register_common (struct __emutls_object *obj,
| ^~~~~~~~~~~~~~~~~~~~~~~~
The thing is that in that case __emutls_get_address and
__emutls_register_common are builtins, and are declared with void *
arguments rather than struct __emutls_object *.
Now, struct __emutls_object is a type private to libgcc/emutls.c and the
middle-end creates on demand when calling the builtins a similar structure
(with small differences, like not having the union in there).
We have a precedent for this e.g. for fprintf or strftime builtins where
the builtins are created with magic fileptr_type_node or const_tm_ptr_type_node
types and then match it with user definition of pointers to some structure,
but I think for this case users should never define these functions
themselves nor call them and having special types for them in the compiler
would mean extra compile time spent during compiler initialization and more
GC data, so I think it is better to keep the compiler as is.
On the library side, there is an option to just follow what the
compiler is doing and do
EMUTLS_ATTR void
-__emutls_register_common (struct __emutls_object *obj,
+__emutls_register_common (void *xobj,
word size, word align, void *templ)
{
+ struct __emutls_object *obj = (struct __emutls_object *) xobj;
but that will make e.g. libabigail complain about ABI change in libgcc.
So, the patch just turns the warning off.
2023-12-06 Thomas Schwinge <thomas@codesourcery.com>
Jakub Jelinek <jakub@redhat.com>
aarch64: Add system register duplication check selftest
Add a build-time test to check whether system register data, as
imported from `aarch64-sys-reg.def' has any duplicate entries.
Duplicate entries are defined as any two SYSREG entries in the .def
file which share the same encoding values (as specified by its `CPENC'
field) and where the relationship amongst the two does not fit into
one of the following categories:
* Simple aliasing: In some cases, it is observed that one
register name serves as an alias to another. One example of
this is where TRCEXTINSELR aliases TRCEXTINSELR0.
* Expressing intent: It is possible that when a given register
serves two distinct functions depending on how it is used, it
is given two distinct names whose use should match the context
under which it is being used. Example: Debug Data Transfer
Register. When used to receive data, it should be accessed as
DBGDTRRX_EL0 while when transmitting data it should be
accessed via DBGDTRTX_EL0.
* Register depreciation: Some register names have been
deprecated and should no longer be used, but backwards-
compatibility requires that such names continue to be
recognized, as is the case for the SPSR_EL1 register, whose
access via the SPSR_SVC name is now deprecated.
* Same encoding different target: Some encodings are given
different meaning depending on the target architecture and, as
such, are given different names in each of theses contexts.
We see an example of this for CPENC(3,4,2,0,0), which
corresponds to TTBR0_EL2 for Armv8-A targets and VSCTLR_EL2
in Armv8-R targets.
A consequence of these observations is that `CPENC' duplication is
acceptable iff at least one of the `properties' or `arch_reqs' fields
of the `sysreg_t' structs associated with the two registers in
question differ and it's this condition that is checked by the new
`aarch64_test_sysreg_encoding_clashes' function.
gcc/ChangeLog:
* config/aarch64/aarch64.cc
(aarch64_test_sysreg_encoding_clashes): New.
(aarch64_run_selftests): add call to
aarch64_test_sysreg_encoding_clashes selftest.
aarch64: Add front-end argument type checking for target builtins
In implementing the ACLE read/write system register builtins it was
observed that leaving argument type checking to be done at expand-time
meant that poorly-formed function calls were being "fixed" by certain
optimization passes, meaning bad code wasn't being properly picked up
in checking.
Example:
const char *regname = "amcgcr_el0";
long long a = __builtin_aarch64_rsr64 (regname);
is reduced by the ccp1 pass to
long long a = __builtin_aarch64_rsr64 ("amcgcr_el0");
As these functions require an argument of STRING_CST type, there needs
to be a check carried out by the front-end capable of picking this up.
The introduced `check_general_builtin_call' function will be called by
the TARGET_CHECK_BUILTIN_CALL hook whenever a call to a builtin
belonging to the AARCH64_BUILTIN_GENERAL category is encountered,
carrying out any appropriate checks associated with a particular
builtin function code.
aarch64: Implement system register validation tools
Given the implementation of a mechanism of encoding system registers
into GCC, this patch provides the mechanism of validating their use by
the compiler. In particular, this involves:
1. Ensuring a supplied string corresponds to a known system
register name. System registers can be accessed either via their
name (e.g. `SPSR_EL1') or their encoding (e.g. `S3_0_C4_C0_0').
Register names are validated using a hash map, mapping known
system register names to its corresponding `sysreg_t' struct,
which is populated from the `aarch64_system_regs.def' file.
Register name validation is done via `lookup_sysreg_map', while
the encoding naming convention is validated via a parser
implemented in this patch - `is_implem_def_reg'.
2. Once a given register name is deemed to be valid, it is checked
against a further 2 criteria:
a. Is the referenced register implemented in the target
architecture? This is achieved by comparing the ARCH field
in the relevant SYSREG entry from `aarch64_system_regs.def'
against `aarch64_feature_flags' flags set at compile-time.
b. Is the register being used correctly? Check the requested
operation against the FLAGS specified in SYSREG.
This prevents operations like writing to a read-only system
register.
This patch defines the structure of a new .def file used for
representing the aarch64 system registers, what information it should
hold and the basic framework in GCC to process this file.
Entries in the aarch64-system-regs.def file should be as follows:
Where the arguments to SYSREG correspond to:
- NAME: The system register name, as used in the assembly language.
- CPENC: The system register encoding, mapping to:
s<sn>_<op1>_c<cn>_c<cm>_<op2>
- FLAG: The entries in the FLAGS field are bitwise-OR'd together to
encode extra information required to ensure proper use of
the system register. For example, a read-only system
register will have the flag F_REG_READ, while write-only
registers will be labeled F_REG_WRITE. Such flags are
tested against at compile-time.
- ARCH: The architectural features the system register is associated
with. This is encoded via one of three possible macros:
1. When a system register is universally implemented, we say
it has no feature requirements, so we tag it with the
AARCH64_NO_FEATURES macro.
2. When a register is only implemented for a single
architectural extension EXT, the AARCH64_FEATURE (EXT), is
used.
3. When a given system register is made available by any of N
possible architectural extensions, the AARCH64_FEATURES(N, ...)
macro is used to combine them accordingly.
In order to enable proper interpretation of the SYSREG entries by the
compiler, flags defining system register behavior such as `F_REG_READ'
and `F_REG_WRITE' are also defined here, so they can later be used for
the validation of system register properties.
Finally, any architectural feature flags from Binutils missing from GCC
have appropriate aliases defined here so as to ensure
cross-compatibility of SYSREG entries across the toolchain.
aarch64: Sync system register information with Binutils
This patch adds the `aarch64-sys-regs.def' file, originally written
for Binutils, to GCC. In so doing, it provides GCC with the necessary
information for teaching the compiler about system registers known to
the assembler and how these can be used.
By aligning the representation of data common to different parts of
the toolchain we can greatly reduce the duplication of work,
facilitating the maintenance of the aarch64 back-end across different
parts of the toolchain; By keeping both copies of the file in sync,
any `SYSREG (...)' that is added in one project is automatically added
to its counterpart. This being the case, no change should be made in
the GCC copy of the file. Any modifications should first be made in
Binutils and the resulting file copied over to GCC.
GCC does not implement the full range of ISA flags present in
Binutils. Where this is the case, aliases must be added to aarch64.h
with the unknown architectural extension being mapped to its
associated base architecture, such that any flag present in Binutils
and used in system register definitions is understood in GCC. Again,
this is done such that flags can be used interchangeably between
projects making use of the aarch64-system-regs.def file. This is done
in the next patch in the series.
`.arch' directives missing from the emitted assembly files as a
consequence of this aliasing are accounted for by the compiler using
the S<op0>_<op1>_<Cn>_<Cm>_<op2> encoding of system registers when
issuing mrs/msr instructions. This design choice ensures the
assembler will accept anything that was deemed acceptable by the
compiler.
Robin Dapp [Tue, 5 Dec 2023 14:24:12 +0000 (15:24 +0100)]
RISC-V: Add vec_init expander for masks [PR112854].
PR112854 shows a problem on rv32 with zvl1024b. During the course of
expand_constructor we try to overlay/subreg a 64-element mask by a
scalar (Pmode) register. This works for zvl512b and its maximum of
32 elements but fails for rv32 and 64 elements.
To circumvent this this patch adds a vec_init expander for vector masks
by initializing a QImode vector and comparing that against 0.
gcc/ChangeLog:
PR target/112854
PR target/112872
* config/riscv/autovec.md (vec_init<mode>qi): New expander.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112854.c: New test.
* gcc.target/riscv/rvv/autovec/pr112872.c: New test.
Jakub Jelinek [Wed, 6 Dec 2023 08:59:12 +0000 (09:59 +0100)]
i386: Move vzeroupper pass from after reload pass to after postreload_cse [PR112760]
Regardless of the outcome of the REG_UNUSED discussions, I think
it is a good idea to move the vzeroupper pass one pass later.
As can be seen in the multiple PRs and as postreload.cc documents,
reload/LRA is known to create dead statements quite often, which
is the reason why we have postreload_cse pass at all.
Doing vzeroupper pass before such cleanup means the pass including
df_analyze for it needs to process more instructions than needed
and because mode switching adds note problem, also higher chance of
having stale REG_UNUSED notes.
And, I really don't see why vzeroupper can't wait until those cleanups
are done.
2023-12-06 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/112760
* config/i386/i386-passes.def (pass_insert_vzeroupper): Insert
after pass_postreload_cse rather than pass_reload.
* config/i386/i386-features.cc (rest_of_handle_insert_vzeroupper):
Adjust comment for it.
Jakub Jelinek [Wed, 6 Dec 2023 08:55:30 +0000 (09:55 +0100)]
lower-bitint: Fix arithmetics followed by extension by many bits [PR112809]
A zero or sign extension from result of some upwards_2limb operation
is implemented in lower_mergeable_stmt as an extra loop which fills in
the extra bits with 0s or 1s.
If the delta of extended vs. unextended bit count is small, the code
doesn't use a loop and emits up to a couple of stores to constant indexes,
but if the delta is large, it uses
cnt = (bo_bit != 0) + 1 + (rem != 0);
statements. bo_bit is non-zero for bit-field loads and is done in that
case as straight line, the unconditional 1 in there is for a loop which
handles most of the limbs in the delta and finally (rem != 0) is for the
case when the extended precision is not a multiple of limb_prec and is
again done in straight line code (after the loop).
The testcase ICEs because the decision what idx to use was incorrect
for kind == bitint_prec_huge (i.e. when the precision delta is very large)
and rem == 0 (i.e. the extended precision is multiple of limb_prec).
In that case cnt is either 1 (if bo_bit == 0) or 2, and idx should
be either first size_int (start) and then result of create_loop (for bo_bit
!= 0) or just result of create_loop, but by mistake the last case
was size_int (end), which means when precision is multiple of limb_prec
storing above the precision (which ICEs; but also not emitting the loop
which is needed).
2023-12-06 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112809
* gimple-lower-bitint.cc (bitint_large_huge::lower_mergeable_stmt): For
separate_ext in kind == bitint_prec_huge mode if rem == 0, create for
i == cnt - 1 the loop rather than using size_int (end).
Jakub Jelinek [Wed, 6 Dec 2023 08:54:03 +0000 (09:54 +0100)]
driver: Fix bootstrap with --enable-default-pie
On IRC Iain mentioned bootstrap is broken for him presumably since
r14-5791 -fhardened addition. I think it is only a problem with
--enable-default-pie when the case OPT_pie: wants to fall through
into case OPT_r: and warns.
Before the patch validated = true; was set up if ENABLE_DEFAULT_PIE
for OPT_pie, and for -fhardened as documented I think we want to
set any_link_options_p = true; for it too:
/* True if -r, -shared, -pie, or -no-pie were specified on the command
line. */
static bool any_link_options_p;
2023-12-06 Jakub Jelinek <jakub@redhat.com>
* gcc.cc (driver_handle_option): Add /* FALLTHROUGH */ comment
between OPT_pie and OPT_r cases.