Now that the jump thread back registry has been split into the generic
copier and the custom (old) copier, it becomes trivial to remove the
FSM bits from the jump threaders.
First, there's no need for an EDGE_FSM_THREAD type. The only reason
we were looking at the threading type was to determine what type of
copier to use, and now that the copier has been split, there's no need
to even look. However, there is one check in register_jump_thread
where we verify that only the generic copier can thread through
back-edges. I've removed that check in favor of a flag passed to the
constructor.
I've also removed all the FSM references from the code and tests.
Interestingly, some tests weren't even testing the right thing. They
were testing for "FSM" which would catch jump thread paths as well as
the backward threader *failing* on registering a path. *big eye roll*
The only remaining code that was actually checking for EDGE_FSM_THREAD
was adjust_paths_after_duplication, and the checks could be written
without looking at the edge type at all. For the record, the code
there is horrible: it's convoluted, hard to read, and doesn't have any
tests. I'd smack myself if I could go back in time.
All that remains are the FSM references in the --param's themselves.
I think we should s/fsm/threader/, since I envision a day when we can
share the cost basis code between the threaders. However, I don't
know what the proper procedure is for renaming existing compiler
options.
By the way, param_fsm_maximum_phi_arguments is no longer relevant
after the rewrite. We can nuke that one right away.
Patrick Palka [Mon, 13 Sep 2021 14:29:32 +0000 (10:29 -0400)]
c++: parameter pack inside constexpr if [PR101764]
Here when partially instantiating the first pack expansion, substitution
into the condition of the constexpr if yields a still-dependent tree, so
tsubst_expr returns an IF_STMT with an unsubstituted IF_COND and with
IF_STMT_EXTRA_ARGS added to. Hence after partial instantiation the pack
expansion pattern still refers to the unlowered parameter pack 'ts' of
level 2, and it's thusly recorded in the new PACK_EXPANSION_PARAMETER_PACKS.
During the subsequent final instantiation of the regenerated lambda we
crash in tsubst_pack_expansion because it can't find an argument pack
for this unlowered 'ts', due to the level mismatch. (Likewise when the
constexpr if is replaced by a requires-expr, which also uses the extra
args mechanism for avoiding partial instantiation.)
So essentially, a pack expansion pattern that contains an "extra args"
tree doesn't play well with partial instantiation. This patch fixes
this by forcing such pack expansions to use the extra args mechanism as
well.
PR c++/101764
gcc/cp/ChangeLog:
* cp-tree.h (PACK_EXPANSION_FORCE_EXTRA_ARGS_P): New accessor
macro.
* pt.c (has_extra_args_mechanism_p): New function.
(find_parameter_pack_data::found_extra_args_tree_p): New data
member.
(find_parameter_packs_r): Set ppd->found_extra_args_tree_p
appropriately.
(make_pack_expansion): Set PACK_EXPANSION_FORCE_EXTRA_ARGS_P if
ppd.found_extra_args_tree_p.
(use_pack_expansion_extra_args_p): Return true if there were
unsubstituted packs and PACK_EXPANSION_FORCE_EXTRA_ARGS_P.
(tsubst_pack_expansion): Pass the pack expansion to
use_pack_expansion_extra_args_p.
H.J. Lu [Thu, 26 Aug 2021 12:31:50 +0000 (05:31 -0700)]
x86: Add TARGET_AVX256_[MOVE|STORE]_BY_PIECES
1. Add TARGET_AVX256_MOVE_BY_PIECES to perform move by-pieces operation
with 256-bit AVX instructions.
2. Add TARGET_AVX256_STORE_BY_PIECES to perform move and store by-pieces
operations with 256-bit AVX instructions.
They are enabled only for Intel Alder Lake and Intel processors with
AVX512.
gcc/
PR target/101935
* config/i386/i386.h (TARGET_AVX256_MOVE_BY_PIECES): New.
(TARGET_AVX256_STORE_BY_PIECES): Likewise.
(MOVE_MAX): Check TARGET_AVX256_MOVE_BY_PIECES and
TARGET_AVX256_STORE_BY_PIECES instead of
TARGET_AVX256_SPLIT_UNALIGNED_LOAD and
TARGET_AVX256_SPLIT_UNALIGNED_STORE.
(STORE_MAX_PIECES): Check TARGET_AVX256_STORE_BY_PIECES instead
of TARGET_AVX256_SPLIT_UNALIGNED_STORE.
* config/i386/x86-tune.def (X86_TUNE_AVX256_MOVE_BY_PIECES): New.
(X86_TUNE_AVX256_STORE_BY_PIECES): Likewise.
We need to use the pointer equivalence tracking from evrp in the jump
threader. Instead of moving it to some *evrp.h header, it's cleaner for
it to live in its own file, since it's completely independent and not
evrp specific.
The current restriction on folding memcpy to a single element of size
MOVE_MAX is excessively cautious on most machines and limits some
significant further optimizations. So relax the restriction provided
the copy size does not exceed MOVE_MAX * MOVE_RATIO and that a SET
insn exists for moving the value into machine registers.
Note that there were already checks in place for having misaligned
move operations when one or more of the operands were unaligned.
arm: expand handling of movmisalign for DImode [PR102125]
DImode is currently handled only for machines with vector modes
enabled, but this is unduly restrictive and is generally better done
in core registers.
gcc/ChangeLog:
PR target/102125
* config/arm/arm.md (movmisaligndi): New define_expand.
* config/arm/vec-common.md (movmisalign<mode>): Iterate over VDQ mode.
rtl: directly handle MEM in gen_highpart [PR102125]
gen_lowpart_general handles forming a lowpart of a MEM by using
adjust_address to rework and validate a new version of the MEM.
Do the same for gen_highpart rather than calling simplify_gen_subreg
for this case.
gcc/ChangeLog:
PR target/102125
* emit-rtl.c (gen_highpart): Use adjust_address to handle
MEM rather than calling simplify_gen_subreg.