Tom Tromey [Thu, 29 Jun 2023 14:52:59 +0000 (08:52 -0600)]
ada: Add typedefs to snames.h-tmpl
A future patch will change sname.h-tmpl to use enums rather than
preprocessor defines. In order to do this, first introduce some
typedefs that can be used in gcc-interface.
gcc/ada/
* snames.h-tmpl (Name_Id, Attribute_Id, Convention_Id)
(Pragma_Id): New typedefs.
(Get_Attribute_Id, Get_Pragma_Id): Use typedef.
Bob Duff [Thu, 22 Jun 2023 17:43:00 +0000 (13:43 -0400)]
ada: Documentation for mixed declarations and statements
This patch documents the new feature that allows declarations mixed with
statements, primarily by referring to the RFC.
gcc/ada/
* doc/gnat_rm/gnat_language_extensions.rst
(Local Declarations Without Block): Document the feature very
briefly, and refer the reader to the RFC for details and examples.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.
Yannick Moy [Wed, 28 Jun 2023 13:56:26 +0000 (13:56 +0000)]
ada: Adapt proof of System.Arith_Double to remove CVC4
The proof of System.Arith_Double still required the use of
CVC4, now replaced by its successor cvc5. Adapt the proof to be
able to remove CVC4 in the proof of run-time units.
gcc/ada/
* libgnat/s-aridou.adb (Lemma_Div_Mult): New simple lemma.
(Lemma_Powers_Of_2_Commutation): State post in else branch.
(Lemma_Div_Pow2): Introduce local lemma and use it.
(Scaled_Divide): Use cut operations in assertions, lemmas, new
assertions. Introduce local lemma and use it.
Roger Sayle [Mon, 10 Jul 2023 08:06:52 +0000 (09:06 +0100)]
i386: Add new insvti_lowpart_1 and insvdi_lowpart_1 patterns.
This patch implements another of Uros' suggestions, to investigate a
insvti_lowpart_1 pattern to improve TImode parameter passing on x86_64.
In PR 88873, the RTL the middle-end expands for passing V2DF in TImode
is subtly different from what it does for V2DI in TImode, sufficiently so
that my explanations for why insvti_lowpart_1 isn't required don't apply
in this case.
This patch adds an insvti_lowpart_1 pattern, complementing the existing
insvti_highpart_1 pattern, and also a 32-bit variant, insvdi_lowpart_1.
Because the middle-end represents 128-bit constants using CONST_WIDE_INT
and 64-bit constants using CONST_INT, it's easiest to treat these as
different patterns, rather than attempt <dwi> parameterization.
This patch also includes a peephole2 (actually a pair) to transform
xchg instructions into mov instructions, when one of the destinations
is unused. This optimization is required to produce the optimal code
sequences below.
For the 64-bit case:
__int128 foo(__int128 x, unsigned long long y)
{
__int128 m = ~((__int128)~0ull);
__int128 t = x & m;
__int128 r = t | y;
return r;
}
2023-07-10 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.md (peephole2): Transform xchg insn with a
REG_UNUSED note to a (simple) move.
(*insvti_lowpart_1): New define_insn_and_split.
(*insvdi_lowpart_1): Likewise.
gcc/testsuite/ChangeLog
* gcc.target/i386/insvdi_lowpart-1.c: New test case.
* gcc.target/i386/insvti_lowpart-1.c: Likewise.
Roger Sayle [Mon, 10 Jul 2023 08:04:29 +0000 (09:04 +0100)]
i386: Add AVX512 support for STV of SI/DImode rotation by constant.
Following Uros' suggestion, this patch adds support for AVX512VL's
vpro[lr][dq] instructions to the recently added scalar-to-vector (STV)
enhancements to handle DImode and SImode rotations by a constant.
For the test cases:
unsigned long long rot1(unsigned long long x) {
return (x>>1) | (x<<63);
}
void mem1(unsigned long long *p) {
*p = rot1(*p);
}
2023-07-10 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-features.cc (compute_convert_gain): Tweak
gains/costs for ROTATE/ROTATERT by integer constant on AVX512VL.
(general_scalar_chain::convert_rotate): On TARGET_AVX512F generate
avx512vl_rolv2di or avx412vl_rolv4si when appropriate.
gcc/testsuite/ChangeLog
* gcc.target/i386/avx512vl-stv-rotatedi-1.c: New test case.
Add pre_reload splitter to detect fp min/max pattern.
We have ix86_expand_sse_fp_minmax to detect min/max sematics, but
it requires rtx_equal_p for cmp_op0/cmp_op1 and if_true/if_false, for
the testcase in the PR, there's an extra move from cmp_op0 to if_true,
and it failed ix86_expand_sse_fp_minmax.
This patch adds pre_reload splitter to detect the min/max pattern.
Operands order in MINSS matters for signed zero and NANs, since the
instruction always returns second operand when any operand is NAN or
both operands are zero.
gcc/ChangeLog:
PR target/110170
* config/i386/i386.md (*ieee_max<mode>3_1): New pre_reload
splitter to detect fp max pattern.
(*ieee_min<mode>3_1): Ditto, but for fp min pattern.
gcc/testsuite/ChangeLog:
* g++.target/i386/pr110170.C: New test.
* gcc.target/i386/pr110170.c: New test.
- Import dmd v2.104.0-beta.1.
- Better error message when attribute inference fails down the
call stack.
- Using `;' as an empty statement has been turned into an error.
- Using `in' parameters with non- `extern(D)' or `extern(C++)'
functions is deprecated.
- `in ref' on parameters has been deprecated in favor of
`-preview=in'.
- Throwing `immutable', `const', `inout', and `shared' qualified
objects is now deprecated.
- User Defined Attributes now parse Template Arguments.
D runtime changes:
- Import druntime v2.104.0-beta.1.
Phobos changes:
- Import phobos v2.104.0-beta.1.
- Better static assert messages when instantiating
`std.algorithm.comparison.clamp' with wrong inputs.
- `std.typecons.Rebindable' now supports all types.
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd 28a3b24c2e.
* dmd/VERSION: Bump version to v2.104.0-beta.1.
* d-codegen.cc (build_bounds_slice_condition): Update for new
front-end interface.
* d-lang.cc (d_init_options): Likewise.
(d_handle_option): Likewise.
(d_post_options): Initialize global.compileEnv.
* expr.cc (ExprVisitor::visit (CatExp *)): Replace code generation
with new front-end lowering.
(ExprVisitor::visit (LoweredAssignExp *)): New method.
(ExprVisitor::visit (StructLiteralExp *)): Don't generate static
initializer symbols for structs defined in C sources.
* runtime.def (ARRAYCATT): Remove.
(ARRAYCATNTX): Remove.
Jan Hubicka [Sun, 9 Jul 2023 13:14:54 +0000 (15:14 +0200)]
Improve dumping of profile_count
Dumps of profile_counts are quite hard to interpret since they are 64bit fixed point
values. In many cases one looks at a single function and it is better to think of
basic block frequency, that is how many times it is executed each invocatoin. This
patch makes CFG dumps to also print this info.
For example:
main()
{
for (int i = 0; i < 10; i++)
t();
}
the -fdump-tree-optimized-blocks-details now prints:
int main ()
{
unsigned int ivtmp_1;
unsigned int ivtmp_2;
Paul Thomas [Sat, 8 Jul 2023 17:13:23 +0000 (18:13 +0100)]
Fortran: Fix default type bugs in gfortran [PR99139, PR99368]
2023-07-08 Steve Kargl <sgk@troutmask.apl.washington.edu>
gcc/fortran
PR fortran/99139
PR fortran/99368
* match.cc (gfc_match_namelist): Check for host associated or
defined types before applying default type.
(gfc_match_select_rank): Apply default type to selector of
unknown type if possible.
* resolve.cc (resolve_fl_variable): Do not apply local default
initialization to assumed rank entities.
gcc/testsuite/
PR fortran/99139
* gfortran.dg/pr99139.f90 : New test
PR fortran/99368
* gfortran.dg/pr99368.f90 : New test
Jan Hubicka [Sat, 8 Jul 2023 15:38:09 +0000 (17:38 +0200)]
Fix tree-ssa/update-cunroll.c
In this testcase the profile is misupdated before loop has two exits.
The first exit is one eliminated by complete unrolling while second exit remains.
We remove first exit but forget about fact that the source BB of other exit will
then have higher frequency making other exit more likely.
This patch fixes that in duplicate_loop_body_to_header_edge.
While looking into resulting profiles I also noticed that in some cases
scale_loop_profile may drop probabilities to 0 incorrectly either when
trying to update exit from nested loop (which has similar problem) or when the profile
was inconsistent as described in coment bellow.
gcc/ChangeLog:
PR middle-end/110590
* cfgloopmanip.cc (scale_loop_profile): Avoid scaling exits within
inner loops and be more careful about inconsistent profiles.
(duplicate_loop_body_to_header_edge): Fix profile update when eliminated
exit is followed by other exit.
Harald Anlauf [Wed, 5 Jul 2023 20:21:09 +0000 (22:21 +0200)]
Fortran: fixes for procedures with ALLOCATABLE,INTENT(OUT) arguments [PR92178]
gcc/fortran/ChangeLog:
PR fortran/92178
* trans-expr.cc (gfc_conv_procedure_call): Check procedures for
allocatable dummy arguments with INTENT(OUT) and move deallocation
of actual arguments after evaluation of argument expressions before
the procedure is executed.
gcc/testsuite/ChangeLog:
PR fortran/92178
* gfortran.dg/intent_out_16.f90: New test.
* gfortran.dg/intent_out_17.f90: New test.
* gfortran.dg/intent_out_18.f90: New test.
Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>
cprop: Change return type of predicate functions from int to bool
Also change some internal variables from int to bool.
gcc/ChangeLog:
* cprop.cc (reg_available_p): Change return type from int to bool.
(reg_not_set_p): Ditto.
(try_replace_reg): Ditto. Change "success" variable to bool.
(cprop_jump): Change return type from int to void
and adjust function body accordingly.
(constprop_register): Ditto.
(cprop_insn): Ditto. Change "changed" variable to bool.
(local_cprop_pass): Change return type from int to void
and adjust function body accordingly.
(bypass_block): Ditto. Change "change", "may_be_loop_header"
and "removed_p" variables to bool.
(bypass_conditional_jumps): Change return type from int to void
and adjust function body accordingly. Change "changed"
variable to bool.
(one_cprop_pass): Ditto.
gcse: Change return type of predicate functions from int to bool
Also change some internal variables and function arguments from int to bool.
gcc/ChangeLog:
* gcse.cc (expr_equiv_p): Change return type from int to bool.
(oprs_unchanged_p): Change return type from int to void
and adjust function body accordingly.
(oprs_anticipatable_p): Ditto.
(oprs_available_p): Ditto.
(insert_expr_in_table): Ditto. Change "antic_p" and "avail_p"
arguments to bool. Change "found" variable to bool.
(load_killed_in_block_p): Change return type from int to void and
adjust function body accordingly. Change "avail_p" argument to bool.
(pre_expr_reaches_here_p): Change return type from int to void
and adjust function body accordingly.
(pre_delete): Ditto. Change "changed" variable to bool.
(pre_gcse): Change return type from int to void
and adjust function body accordingly. Change "did_insert" and
"changed" variables to bool.
(one_pre_gcse_pass): Change return type from int to void
and adjust function body accordingly. Change "changed" variable
to bool.
(should_hoist_expr_to_dom): Change return type from int to void
and adjust function body accordingly. Change
"visited_allocated_locally" variable to bool.
(hoist_code): Change return type from int to void and adjust
function body accordingly. Change "changed" variable to bool.
(one_code_hoisting_pass): Ditto.
(pre_edge_insert): Change return type from int to void and adjust
function body accordingly. Change "did_insert" variable to bool.
(pre_expr_reaches_here_p_work): Change return type from int to void
and adjust function body accordingly.
(simple_mem): Ditto.
(want_to_gcse_p): Change return type from int to void
and adjust function body accordingly.
(can_assign_to_reg_without_clobbers_p): Update function body
for bool return type.
(hash_scan_set): Change "antic_p" and "avail_p" variables to bool.
(pre_insert_copies): Change "added_copy" variable to bool.
Eugene Rozenfeld [Fri, 30 Jun 2023 02:38:41 +0000 (19:38 -0700)]
Collect both user and kernel events for autofdo tests and autoprofiledbootstrap
When we collect just user events for autofdo with lbr we get some events where branch
sources are kernel addresses and branch targets are user addresses. Without kernel MMAP
events create_gcov can't make sense of kernel addresses. Currently create_gcov fails if
it can't map at least 95% of events. We sometimes get below this threshold with just
user events. The change is to collect both user events and kernel events.
Tested on x86_64-pc-linux-gnu.
ChangeLog:
* Makefile.in: Collect both kernel and user events for autofdo
* Makefile.tpl: Collect both kernel and user events for autofdo
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Collect both kernel and user events for autofdo
where the heavy use of SUBREG SET_DESTs creates challenges for both
combine and register allocation.
The improvement proposed here is to avoid these problematic SUBREGs
by adding (two) special cases to ix86_expand_move. For insn 4, which
sets a TImode destination from a paradoxical SUBREG, to assign the
lowpart, we can use an explicit zero extension (zero_extendditi2 was
added in July 2022), and for insn 5, which sets the highpart of a
TImode register we can use the *insvti_highpart_1 instruction (that
was added in May 2023, after being approved for stage1 in January).
This allows combine to work its magic, merging these insns into a
*concatditi3 and from there into other optimized forms.
So for the test case above, we now generate only a single movq:
But there is a little bad news. This patch causes two (minor) missed
optimization regressions on x86_64; gcc.target/i386/pr82580.c and
gcc.target/i386/pr91681-1.c. As shown in the test case above, we're
no longer generating adcq $0, but instead using xorl. For the other
FAIL, register allocation now has more freedom and is (arbitrarily)
choosing a register assignment that doesn't match what the test is
expecting. These issues are easier to explain and fix once this patch
is in the tree.
The good news is that this approach fixes a number of long standing
issues, that need to checked in bugzilla, including PR target/110533
which was just opened/reported earlier this week.
2023-07-07 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/43644
PR target/110533
* config/i386/i386-expand.cc (ix86_expand_move): Convert SETs of
TImode destinations from paradoxical SUBREGs (setting the lowpart)
into explicit zero extensions. Use *insvti_highpart_1 instruction
to set the highpart of a TImode destination.
gcc/testsuite/ChangeLog
PR target/43644
PR target/110533
* gcc.target/i386/pr110533.c: New test case.
* gcc.target/i386/pr43644-2.c: Likewise.
d: Fix PR 108842: Cannot use enum array with -fno-druntime
Restrict the generating of CONST_DECLs for D manifest constants to just
scalars without pointers. It shouldn't happen that a reference to a
manifest constant has not been expanded within a function body during
codegen, but it has been found to occur in older versions of the D
front-end (PR98277), so if the decl of a non-scalar constant is
requested, just return its initializer as an expression.
PR d/108842
gcc/d/ChangeLog:
* decl.cc (DeclVisitor::visit (VarDeclaration *)): Only emit scalar
manifest constants.
(get_symbol_decl): Don't generate CONST_DECL for non-scalar manifest
constants.
* imports.cc (ImportVisitor::visit (VarDeclaration *)): New method.
gcc/testsuite/ChangeLog:
* gdc.dg/pr98277.d: Add more tests.
* gdc.dg/pr108842.d: New test.
Jan Hubicka [Fri, 7 Jul 2023 17:16:59 +0000 (19:16 +0200)]
Fix some profile consistency testcases
Information about profile mismatches is printed only with -details-blocks for some time.
I think it should be printed even with default to make it easier to spot when someone introduces
new transform that breaks the profile, but I will send separate RFC for that.
This patch enables details in all testcases that greps for Invalid sum. There are 4 testcases
which fails:
gcc.dg/tree-ssa/loop-ch-profile-1.c
here the problem is that loop header dulication introduces loop invariant conditoinal that is later
updated by tree-ssa-dom but dom does not take care of updating profile.
Since loop-ch knows when it duplicates loop invariant, we may be able to get this right.
The test is still useful since it tests that right after ch profile is consistent.
gcc.dg/tree-prof/update-cunroll-2.c
This is about profile updating code in duplicate_loop_body_to_header_edge being wrong when optimized
out exit is not last in the loop. In that case the probability of later exits needs to be accounted in.
I will think about making this better - in general this does not seem to have easy solution, but for
special case of chained tests we can definitely account for the later exits.
gcc.dg/tree-ssa/update-unroll-1.c
This fails after aprefetch invoked unrolling. I did not look into details yet.
gcc.dg/tree-prof/update-unroll-2.c
This one seems similar as previous
I decided to xfail these tests and deal with them incrementally and filled in PR110590.
Jan Hubicka [Fri, 7 Jul 2023 16:22:11 +0000 (18:22 +0200)]
Fix epilogue loop profile
Fix two bugs in scale_loop_profile which crept in during my
cleanups and curiously enoug did not show on the testcases we have so far.
The patch also adds the missing call to cap iteration count of the vectorized
loop epilogues.
Vectorizer profile needs more work, but I am trying to chase out obvious bugs first
so the profile quality statistics become meaningful and we can try to improve on them.
Vect still introduces 77 profile mismatches (same as without this patch)
however subsequent cunroll works much better with 6 new mismatches compared to
78. Overall it reduces 229 mismatches to 160.
Also overall runtime estimate is now reduced by 6.9%.
Previously the overall runtime estimate grew by 11% which was result of the fat
that the epilogue profile was pretty much the same as profile of the original
loop.
Bootstrapped/regtested x86_64-linux, comitted.
gcc/ChangeLog:
* cfgloopmanip.cc (scale_loop_profile): Fix computation of count_in and scaling blocks
after exit.
* tree-vect-loop-manip.cc (vect_do_peeling): Scale loop profile of the epilogue if bound
is known.
For given testcase a reload pseudo happened to occur only in reload
insns created on one constraint sub-pass. Therefore its initial class
(ALL_REGS) was not refined and the reload insns were not processed on
the next constraint sub-passes. This resulted into the wrong insn.
PR rtl-optimization/110372
gcc/ChangeLog:
* lra-assigns.cc (assign_by_spills): Add reload insns involving
reload pseudos with non-refined class to be processed on the next
sub-pass.
* lra-constraints.cc (enough_allocatable_hard_regs_p): New func.
(in_class_p): Use it.
(print_curr_insn_alt): New func.
(process_alt_operands): Use it. Improve debug info.
(curr_insn_transform): Use print_curr_insn_alt. Refine reload
pseudo class if it is not refined yet.
* value-range.cc (irange::get_bitmask_from_range): Return all the
known bits for a singleton.
(irange::set_range_from_bitmask): Set a range of a singleton when
all bits are known.
Aldy Hernandez [Fri, 30 Jun 2023 18:24:38 +0000 (20:24 +0200)]
The caller to irange::intersect (wide_int, wide_int) must normalize the range.
Per the function comment, the caller to intersect(wide_int, wide_int)
must handle the mask. This means it must also normalize the range if
anything changed.
gcc/ChangeLog:
* value-range.cc (irange::intersect): Leave normalization to
caller.
Aldy Hernandez [Thu, 29 Jun 2023 09:27:22 +0000 (11:27 +0200)]
Implement value/mask tracking for irange.
Integer ranges (irange) currently track known 0 bits. We've wanted to
track known 1 bits for some time, and instead of tracking known 0 and
known 1's separately, it has been suggested we track a value/mask pair
similarly to what we do for CCP and RTL. This patch implements such a
thing.
With this we now track a VALUE integer which are the known values, and
a MASK which tells us which bits contain meaningful information. This
allows us to fix a handful of enhancement requests, such as PR107043
and PR107053.
There is a 4.48% performance penalty for VRP and 0.42% in overall
compilation for this entire patchset. It is expected and in line
with the loss incurred when we started tracking known 0 bits.
This patch just provides the value/mask tracking support. All the
nonzero users (range-op, IPA, CCP, etc), are still using the nonzero
nomenclature. For that matter, this patch reimplements the nonzero
accessors with the value/mask functionality. In follow-up patches I
will enhance these passes to use the value/mask information, and
fix the aforementioned PRs.
Jan Beulich [Fri, 7 Jul 2023 07:45:06 +0000 (09:45 +0200)]
x86: slightly correct / simplify *vec_extractv2ti
V2TImode values cannot appear in the upper 16 YMM registers without
AVX512VL being enabled. Therefore forcing 512-bit mode (also not
reflected in the "mode" attribute) is pointless.
gcc/
* config/i386/sse.md (*vec_extractv2ti): Drop g modifiers.
Jan Beulich [Fri, 7 Jul 2023 07:44:36 +0000 (09:44 +0200)]
x86: correct / simplify @vec_extract_hi_<mode> and vec_extract_hi_v32qi
The middle alternative each was unusable without enabling AVX512DQ (in
addition to AVX512VL), which is entirely unrelated here. The last
alternative is usable with AVX512VL only (due to type restrictions on
what may be put in the upper 16 YMM registers), and hence is pointlessly
forcing 512-bit mode (without actually reflecting that in the "mode"
attribute).
gcc/
* config/i386/sse.md (@vec_extract_hi_<mode>): Drop last
alternative. Switch new last alternative's "isa" attribute to
"avx512vl".
(vec_extract_hi_v32qi): Likewise.
Pan Li [Tue, 4 Jul 2023 14:05:36 +0000 (22:05 +0800)]
RISC-V: Fix one bug for floating-point static frm
This patch would like to fix one bug to align below items of spec.
RVV floating-point instructions always (implicitly) use the dynamic
rounding mode. This implies that rounding is performed according to the
rounding mode set in the FRM register. The FRM register itself
only holds proper rounding modes and never the dynamic rounding mode.
Signed-off-by: Pan Li <pan2.li@intel.com> Co-Authored-By: Robin Dapp <rdapp@ventanamicro.com>
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_emit_mode_set): Avoid emit insn
when FRM_MODE_DYN.
(riscv_mode_entry): Take FRM_MODE_DYN as entry mode.
(riscv_mode_exit): Likewise for exit mode.
(riscv_mode_needed): Likewise for needed mode.
(riscv_mode_after): Likewise for after mode.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/float-point-frm-insert-6.c: New test.
Jan Hubicka [Thu, 6 Jul 2023 16:56:22 +0000 (18:56 +0200)]
Improve profile updates after loop-ch and cunroll
Extend loop-ch and loop unrolling to fix profile in case the loop is
known to not iterate at all (or iterate few times) while profile claims it
iterates more. While this is kind of symptomatic fix, it is best we can do
incase profile was originally esitmated incorrectly.
In the testcase the problematic loop is produced by vectorizer and I think
vectorizer should know and account into its costs that vectorizer loop and/or
epilogue is not going to loop after the transformation. So it would be nice
to fix it on that side, too.
The patch avoids about half of profile mismatches caused by cunroll.
Bootstrapped/regtested x86_64-linux. Plan to commit it after the scale_loop_frequency patch.
gcc/ChangeLog:
PR middle-end/25623
* tree-ssa-loop-ch.cc (ch_base::copy_headers): Scale loop frequency to maximal number
of iterations determined.
* tree-ssa-loop-ivcanon.cc (try_unroll_loop_completely): Likewise.
gcc/testsuite/ChangeLog:
PR middle-end/25623
* gfortran.dg/pr25623-2.f90: New test.
Jan Hubicka [Thu, 6 Jul 2023 16:51:02 +0000 (18:51 +0200)]
Improve scale_loop_profile
Original scale_loop_profile was implemented to only handle very simple loops
produced by vectorizer at that time (basically loops with only one exit and no
subloops). It also has not been updated to new profile-count API very carefully.
The function does two thigs
1) scales down the loop profile by a given probability.
This is useful, for example, to scale down profile after peeling when loop
body is executed less often than before
2) update profile to cap iteration count by ITERATION_BOUND parameter.
I changed ITERATION_BOUND to be actual bound on number of iterations as
used elsewhere (i.e. number of executions of latch edge) rather then
number of iterations + 1 as it was before.
To do 2) one needs to do the following
a) scale own loop profile so frquency o header is at most
the sum of in-edge counts * (iteration_bound + 1)
b) update loop exit probabilities so their count is the same
as before scaling.
c) reduce frequencies of basic blocks after loop exit
old code did b) by setting probability to 1 / iteration_bound which is
correctly only of the basic block containing exit executes precisely one per
iteration (it is not insie other conditional or inner loop). This is fixed
now by using set_edge_probability_and_rescale_others
aldo c) was implemented only for special case when the exit was just before
latch bacis block. I now use dominance info to get right some of addional
case.
I still did not try to do anything for multiple exit loops, though the
implementatoin could be generalized.
Bootstrapped/regtested x86_64-linux. Plan to cmmit it tonight if there
are no complains.
gcc/ChangeLog:
* cfgloopmanip.cc (scale_loop_profile): Rewrite exit edge
probability update to be safe on loops with subloops.
Make bound parameter to be iteration bound.
* tree-ssa-loop-ivcanon.cc (try_peel_loop): Update call
of scale_loop_profile.
* tree-vect-loop-manip.cc (vect_do_peeling): Likewise.
Hao Liu OS [Thu, 6 Jul 2023 16:04:46 +0000 (10:04 -0600)]
Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449)
If a loop is unrolled by n times during vectoriation, two steps are used to
calculate the induction variable:
- The small step for the unrolled ith-copy: vec_1 = vec_iv + (VF/n * Step)
- The large step for the whole loop: vec_loop = vec_iv + (VF * Step)
This patch calculates an extra vec_n to replace vec_loop:
vec_n = vec_prev + (VF/n * S) = vec_iv + (VF/n * S) * n = vec_loop.
So that we can save the large step register and related operations.
gcc/ChangeLog:
PR tree-optimization/110449
* tree-vect-loop.cc (vectorizable_induction): use vec_n to replace
vec_loop for the unrolled loop.
Jan Hubicka [Thu, 6 Jul 2023 14:19:15 +0000 (16:19 +0200)]
updat_bb_profile_for_threading TLC
Apply some TLC to update_bb_profile_for_threading. The function resales
probabilities by:
FOR_EACH_EDGE (c, ei, bb->succs)
c->probability /= prob;
which is correct but in case prob is 0 (took all execution counts to the newly
constructed path), this leads to undefined results which do not sum to 100%.
In several other plpaces we need to change probability of one edge and rescale
remaining to sum to 100% so I decided to break this off to helper function
set_edge_probability_and_rescale_others
For jump threading the probability of edge is always reduced, so division is right
update, however in general case we also may want to increase probability of the edge
which needs different scalling. This is bit hard to do staying with probabilities
in range 0...1 for all temporaries.
For this reason I decided to add profile_probability::apply_scale which is symmetric
to what we already have in profile_count::apply_scale and does right thing in
both directions.
Finally I added few early exits so we do not produce confused dumps when
profile is missing and special case the common situation where edges out of BB
are precisely two. In this case we can set the other edge to inverter probability
which. Saling drop probability quality from PRECISE to ADJUSTED.
Bootstrapped/regtested x86_64-linux. The patch has no effect on in count mismatches
in tramp3d build and improves out-count. Will commit it shortly.
gcc/ChangeLog:
* cfg.cc (set_edge_probability_and_rescale_others): New function.
(update_bb_profile_for_threading): Use it; simplify the rest.
* cfg.h (set_edge_probability_and_rescale_others): Declare.
* profile-count.h (profile_probability::apply_scale): New.
Richard Biener [Thu, 6 Jul 2023 11:51:55 +0000 (13:51 +0200)]
tree-optimization/110556 - tail merging still pre-tuples
The stmt comparison function for GIMPLE_ASSIGNs for tail merging
still looks like it deals with pre-tuples IL. The following
attempts to fix this, not only comparing the first operand (sic!)
of stmts but all of them plus also compare the operation code.
PR tree-optimization/110556
* tree-ssa-tail-merge.cc (gimple_equal_p): Check
assign code and all operands of non-stores.
Claire Dross [Thu, 15 Jun 2023 14:22:11 +0000 (16:22 +0200)]
ada: Refactor the proof of the Value and Image runtime units
The aim of this refactoring is to avoid unnecessary dependencies
between Image and Value units even though they share the same
specification functions. These functions are grouped inside ghost
packages which are then withed by Image and Value units.
gcc/ada/
* libgnat/s-vs_int.ads: Instance of Value_I_Spec for Integer.
* libgnat/s-vs_lli.ads: Instance of Value_I_Spec for
Long_Long_Integer.
* libgnat/s-vsllli.ads: Instance of Value_I_Spec for
Long_Long_Long_Integer.
* libgnat/s-vs_uns.ads: Instance of Value_U_Spec for Unsigned.
* libgnat/s-vs_llu.ads: Instance of Value_U_Spec for
Long_Long_Unsigned.
* libgnat/s-vslllu.ads: Instance of Value_U_Spec for
Long_Long_Long_Unsigned.
* libgnat/s-imagei.ads: Take instances of Value_*_Spec as
parameters.
* libgnat/s-imagei.adb: Idem.
* libgnat/s-imageu.ads: Idem.
* libgnat/s-imageu.adb: Idem.
* libgnat/s-valuei.ads: Idem.
* libgnat/s-valuei.adb: Idem.
* libgnat/s-valueu.ads: Idem.
* libgnat/s-valueu.adb: Idem.
* libgnat/s-imgint.ads: Adapt instance to new ghost parameters.
* libgnat/s-imglli.ads: Adapt instance to new ghost parameters.
* libgnat/s-imgllli.ads: Adapt instance to new ghost parameters.
* libgnat/s-imglllu.ads: Adapt instance to new ghost parameters.
* libgnat/s-imgllu.ads: Adapt instance to new ghost parameters.
* libgnat/s-imguns.ads: Adapt instance to new ghost parameters.
* libgnat/s-valint.ads: Adapt instance to new ghost parameters.
* libgnat/s-vallli.ads: Adapt instance to new ghost parameters.
* libgnat/s-valllli.ads: Adapt instance to new ghost parameters.
* libgnat/s-vallllu.ads: Adapt instance to new ghost parameters.
* libgnat/s-valllu.ads: Adapt instance to new ghost parameters.
* libgnat/s-valuns.ads: Adapt instance to new ghost parameters.
* libgnat/s-vaispe.ads: Take instance of Value_U_Spec as parameter
and remove unused declaration.
* libgnat/s-vaispe.adb: Idem.
* libgnat/s-vauspe.ads: Remove unused declaration.
* libgnat/s-valspe.ads: Factor out the specification part of
Val_Util.
* libgnat/s-valspe.adb: Idem.
* libgnat/s-valuti.ads: Move specification to Val_Spec.
* libgnat/s-valuti.adb: Idem.
* libgnat/s-valboo.ads: Use Val_Spec.
* libgnat/s-valboo.adb: Idem.
* libgnat/s-imgboo.adb: Idem.
* libgnat/s-imagef.adb: Adapt instances to new ghost parameters.
* Makefile.rtl: List new files.
Viljar Indus [Wed, 21 Jun 2023 13:22:37 +0000 (16:22 +0300)]
ada: Evaluate static expressions in Range attributes
Gigi assumes that the value of range expressions is an integer literal.
Force evaluation of such expressions since static non-literal expressions
are not always evaluated to a literal form by gnat.
gcc/ada/
* sem_attr.adb (analyze_attribute.check_array_type): Replace valid
indexes with their staticly evaluated values.
Viljar Indus [Tue, 20 Jun 2023 14:29:41 +0000 (17:29 +0300)]
ada: Refer to non-Ada binding limitations in user guide
The limitation of resetting the FPU mode for non 80-bit
precision was not referenced from "Creating a Stand-alone
Library to be used in a non-Ada context". Reference it the same
way it is already referenced from "Interfacing to C".
gcc/ada/
* doc/gnat_ugn/the_gnat_compilation_model.rst: Reference "Binding
with Non-Ada Main Programs" from "Creating a Stand-alone Library
to be used in a non-Ada context".
* gnat_ugn.texi: Regenerate.
Viljar Indus [Wed, 14 Jun 2023 20:19:49 +0000 (23:19 +0300)]
ada: Avoid crash in Find_Optional_Prim_Op
Find_Optional_Prim_Op can crash when the Underlying_Type is Empty.
This can happen when you are dealing with a structure type with a
private part that does not have its Full_View set yet.
gcc/ada/
* exp_util.adb (Find_Optional_Prim_Op): Stop deriving primitive
operation if there is no underlying type to derive it from.
Steve Baird [Tue, 6 Jun 2023 19:44:00 +0000 (12:44 -0700)]
ada: Finalization not performed for component of protected type
In some cases involving a discriminated protected type with an array
component that is subject to a discriminant-dependent index constraint,
where the element type of the array requires finalization and the array
type has not yet been frozen at the point of the declaration of the protected
type, finalization of an object of the protected type may incorrectly omit
finalization of the array component. One case where this scenario can arise
is an instantiation of Ada.Containers.Bounded_Synchronized_Queues, passing in
an Element type that requires finalization.
gcc/ada/
* exp_ch7.adb (Make_Final_Call): Add assertion that if no
finalization call is generated, then the type of the object being
finalized does not require finalization.
* freeze.adb (Freeze_Entity): If freezing an already-frozen
subtype, do not assume that nothing needs to be done. In the case
of a frozen subtype of a non-frozen type or subtype (which is
possible), freeze the non-frozen entity.
The following consolidates an assert that now hits for ppc64le
with an earlier check we already do, simplifying
vect_determine_partial_vectors_and_peeling and getting rid of
its now redundant argument.
PR tree-optimization/110563
* tree-vectorizer.h (vect_determine_partial_vectors_and_peeling):
Remove second argument.
* tree-vect-loop.cc (vect_determine_partial_vectors_and_peeling):
Remove for_epilogue_p argument. Merge assert ...
(vect_analyze_loop_2): ... with check done before determining
partial vectors by moving it after.
* tree-vect-loop-manip.cc (vect_do_peeling): Adjust.
Thomas Schwinge [Wed, 5 Jul 2023 13:34:56 +0000 (15:34 +0200)]
GGC, GTY: Tighten up a few things re 'reorder' option and strings
..., which doesn't make sense in combination.
This, again, is primarily preparational for another change.
gcc/
* ggc-common.cc (gt_pch_note_reorder, gt_pch_save): Tighten up a
few things re 'reorder' option and strings.
* stringpool.cc (gt_pch_p_S): This is now 'gcc_unreachable'.
Thomas Schwinge [Tue, 4 Jul 2023 20:47:48 +0000 (22:47 +0200)]
GTY: Clean up obsolete parametrized structs remnants
Support removed in 2014 with
commit 63f5d5b818319129217e41bcb23db53f99ff11b0 (Subversion r218558)
"remove gengtype support for param_is use_param, if_marked and splay tree allocators".
Thomas Schwinge [Tue, 4 Jul 2023 20:47:48 +0000 (22:47 +0200)]
GTY: Clean up obsolete 'bool needs_cast_p' field of 'gcc/gengtype.cc:struct walk_type_data'
Last use disappeared in 2014 with
commit 63f5d5b818319129217e41bcb23db53f99ff11b0 (Subversion r218558)
"remove gengtype support for param_is use_param, if_marked and splay tree allocators".
gcc/
* gengtype.cc (struct walk_type_data): Remove 'needs_cast_p'.
Adjust all users.
--- gcc/ggc-tests.cc
+++ gcc/ggc-tests.cc
@@ -258 +258 @@ class GTY((tag("1"))) some_subclass : public example_base
-class GTY((tag("2"))) some_other_subclass : public example_base
+class GTY((tag(user))) some_other_subclass : public example_base
@@ -384 +384 @@ test_chain_next ()
-struct GTY((user)) user_struct
+struct GTY((user user)) user_struct
..., we get unexpected "have a param<N>_is option" diagnostics:
[...]
build/gengtype \
-S [...]/source-gcc/gcc -I gtyp-input.list -w tmp-gtype.state
[...]/source-gcc/gcc/ggc-tests.cc:258: parse error: expected a string constant, have a param<N>_is option
[...]/source-gcc/gcc/ggc-tests.cc:384: parse error: expected ')', have a param<N>_is option
make[2]: *** [Makefile:2888: s-gtype] Error 1
[...]
This traces back to 2012 "Support garbage-collected C++ templates", which got
incorporated in commit 0823efedd0fb8669b7e840954bc54c3b2cf08d67
(Subversion r190402), which did add 'USER_GTY' to what nowadays is known as
'enum gty_token', but didn't accordingly update
'gcc/gengtype-parse.c:token_names', leaving those out of sync. Updating
'gcc/gengtype-parse.c:token_value_format' wasn't necessary, as:
/* print_token assumes that any token >= FIRST_TOKEN_WITH_VALUE may have
a meaningful value to be printed. */
FIRST_TOKEN_WITH_VALUE = PARAM_IS
This, in turn, got further confused -- or "fixed" -- by later changes:
2014 commit 63f5d5b818319129217e41bcb23db53f99ff11b0 (Subversion r218558)
"remove gengtype support for param_is use_param, if_marked and splay tree allocators",
which reciprocally missed corresponding clean-up.
With that addressed via adding the missing '"user"' to 'token_names', and,
until that is properly fixed, a temporary 'UNUSED_PARAM_IS' (re-)added for use
with 'FIRST_TOKEN_WITH_VALUE', we then get the expected:
[...]/source-gcc/gcc/ggc-tests.cc:258: parse error: expected a string constant, have 'user'
[...]/source-gcc/gcc/ggc-tests.cc:384: parse error: expected ')', have 'user'
gcc/
* gengtype-parse.cc (token_names): Add '"user"'.
* gengtype.h (gty_token): Add 'UNUSED_PARAM_IS' for use with
'FIRST_TOKEN_WITH_VALUE'.
Thomas Schwinge [Wed, 5 Jul 2023 06:38:49 +0000 (08:38 +0200)]
GTY: Enhance 'string_length' option documentation
We're (currently) not aware of any actual use of 'ht_identifier's with NUL
characters embedded; its 'len' field appears to exist for optimization
purposes, since "forever". Before 'struct ht_identifier' was added in
commit 2a967f3d3a45294640e155381ef549e0b8090ad4 (Subversion r42334), we had in
'gcc/cpplib.h:struct cpp_hashnode': 'unsigned short len', or earlier 'length',
earlier in 'gcc/cpphash.h:struct hashnode': 'unsigned short length', earlier
'size_t length' with comment: "length of token, for quick comparison", earlier
'int length', ever since the 'gcc/cpp*' files were added in
commit 7f2935c734c36f84ab62b20a04de465e19061333 (Subversion r9191).
--- gcc/varasm.cc
+++ gcc/varasm.cc
@@ -66 +66 @@ along with GCC; see the file COPYING3. If not see
-extern GTY(()) const char *first_global_object_name;
+extern GTY((string_length("strlen(%h.first_global_object_name) + 1"))) const char *first_global_object_name;
..., we get:
[...]
build/gengtype \
-S [...]/source-gcc/gcc -I gtyp-input.list -w tmp-gtype.state
/bin/sh [...]/source-gcc/gcc/../move-if-change tmp-gtype.state gtype.state
build/gengtype \
-r gtype.state
[...]/source-gcc/gcc/varasm.cc:66: global `first_global_object_name' has unknown option `string_length'
[...]/source-gcc/gcc/c-family/c-cppbuiltin.cc:1789: field `hex_str' of global `lazy_hex_fp_values[0]' has unknown option `string_length'
make[2]: *** [Makefile:2890: s-gtype] Error 1
[...]
These errors occur when writing "GC roots", where -- per my understanding --
'string_length' isn't relevant for actual GC purposes. However, like
elsewhere, it is for PCH purposes, and simply accepting 'string_length' here
isn't sufficient: we'll still get '(gt_pointer_walker) >_pch_n_S' used in the
'struct ggc_root_tab' instances, and there's no easy way to change that to
instead use 'gt_pch_n_S2' with explicit 'size_t string_len' argument. (At
least not sufficiently easy to justify spending any further time on, given that
I don't have an actual use for that feature.)
So, until an actual need arises, and/or to avoid the next person looking into
this having to figure out the same thing again, let's just document this
limitation:
[...]/source-gcc/gcc/varasm.cc:66: option `string_length' not supported for global `first_global_object_name'
[...]/source-gcc/gcc/c-family/c-cppbuiltin.cc:1789: option `string_length' not supported for field `hex_str' of global `lazy_hex_fp_values[0]'
Roger Sayle [Thu, 6 Jul 2023 08:58:17 +0000 (09:58 +0100)]
[Committed] Handle COPYSIGN in dwarf2out.cc's mem_loc_descriptor.
Many thanks to Hans-Peter Nilsson for reminding me that new RTX codes
need to be added to dwarf2out.cc's mem_loc_descriptor, and for doing
this for BITREVERSE. This patch does the same for the recently added
COPYSIGN. I'd been testing these on a target that doesn't use DWARF
(nvptx-none) and so didn't exhibit the issue, and my additional testing
on x86_64-pc-linux-gnu to double check that changes were safe, doesn't
(yet) trigger the problematic assert in dwarf2out.cc's mem_loc_descriptor.
2023-07-06 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener [Wed, 5 Jul 2023 13:57:49 +0000 (15:57 +0200)]
tree-optimization/110515 - wrong code with LIM + PRE
In this PR we face the issue that LIM speculates a load when
hoisting it out of the loop (since it knows it cannot trap).
Unfortunately this exposes undefined behavior when the load
accesses memory with the wrong dynamic type. This later
makes PRE use that representation instead of the original
which accesses the same memory location but using a different
dynamic type leading to a wrong disambiguation of that
original access against another and thus a wrong-code transform.
Fortunately there already is code in PRE dealing with a similar
situation for code hoisting but that left a small gap which
when fixed also fixes the wrong-code transform in this bug even
if it doesn't address the underlying issue of LIM speculating
that load.
The upside is this fix is trivially safe to backport and chances
of code generation regressions are very low.
PR tree-optimization/110515
* tree-ssa-pre.cc (compute_avail): Make code dealing
with hoisting loads with different alias-sets more
robust.
Richard Biener [Thu, 6 Jul 2023 06:52:46 +0000 (08:52 +0200)]
Fix expectation on gcc.dg/vect/pr71264.c
With the recent change to more reliably not vectorize code already
using vector types we run into FAILs of gcc.dg/vect/pr71264.c
The testcase was added for fixing an ICE and possible (re-)vectorization
of the code isn't really supported and I suspect might even go
wrong for non-bitops.
The following leaves the testcase as just testing for an ICE.
PR tree-optimization/110544
* gcc.dg/vect/pr71264.c: Remove scan for vectorization.
Hongyu Wang [Fri, 30 Jun 2023 01:44:56 +0000 (09:44 +0800)]
i386: Inline function with default arch/tune to caller
For function with different target attributes, current logic rejects to
inline the callee when any arch or tune is mismatched. Relax the
condition to allow callee with default arch/tune to be inlined.
gcc/ChangeLog:
* config/i386/i386.cc (ix86_can_inline_p): If callee has
default arch=x86-64 and tune=generic, do not block the
inlining to its caller. Also allow callee with different
arch= to be inlined if it has always_inline attribute and
it's ISA is subset of caller's.
gcc/testsuite/ChangeLog:
* gcc.target/i386/inline_attr_arch.c: New test.
* gcc.target/i386/inline_target_clones.c: Ditto.
So the problem is vector generic decided to do comparisons in signed-boolean:32
types but the rest of the middle-end was not ready for that. Since we are building
the comparison which will feed into a cond_expr here, using boolean_type_node is
better and also correct. The rest of the compiler thinks the ranges for
comparison is always [0,1] too.
Note this code does not currently lowers bigger vector sizes into smaller
vector sizes so using boolean_type_node here is better.
OK? bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
PR middle-end/110554
* tree-vect-generic.cc (expand_vector_condition): For comparisons,
just build using boolean_type_node instead of the cond_type.
For non-comparisons/non-scalar-bitmask, build a ` != 0` gimple
that will feed into the COND_EXPR.
rax is used to save and restore DFmode value. In RA both GENERAL_REGS
and SSE_REGS cost zero since we didn't disparage the
alternative in movdf_internal pattern, according to register
allocation order, GENERAL_REGS is allocated. The patch add ? for
alternative (r,v) and (v,r) just like we did for movsf/hf/bf_internal
pattern, after that we get optimal RA.
PR106907 has few warnings spotted from cppcheck. In that addressing
redundant initialization issue. Here the initialized value of 'new_addr'
was overwritten before it was read. Updated the source by removing the
unnecessary initialization of 'new_addr'.
Hao Liu [Thu, 6 Jul 2023 02:03:47 +0000 (10:03 +0800)]
tree-optimization/110474 - Vect: select small VF for epilog of unrolled loop
If a loop is unrolled during vectorization (i.e. suggested_unroll_factor > 1),
the VFs of both main and epilog loop are enlarged. The epilog vect loop is
specific for a loop with small iteration counts, so a large VF may hurt
performance.
This patch unscales the main loop VF by suggested_unroll_factor while selecting
the epilog loop VF, so that it will be the same as vectorized loop without
unrolling (i.e. suggested_unroll_factor = 1).
gcc/ChangeLog:
PR tree-optimization/110474
* tree-vect-loop.cc (vect_analyze_loop_2): unscale the VF by suggested
unroll factor while selecting the epilog vect loop VF.
Andrew MacLeod [Wed, 5 Jul 2023 17:36:27 +0000 (13:36 -0400)]
Simplify compute_operand_range for op1 and op2 case.
Move the check for co-dependency between 2 operands into
compute_operand_range, resulting in a much cleaner
compute_operand1_and_operand2_range routine.
* gimple-range-gori.cc (compute_operand_range): Check for
operand interdependence when both op1 and op2 are computed.
(compute_operand1_and_operand2_range): No checks required now.
Andrew MacLeod [Tue, 4 Jul 2023 15:28:52 +0000 (11:28 -0400)]
Move relation discovery into compute_operand_range
compute_operand1_range and compute_operand2_range were both doing
relation discovery between the 2 operands... move it into a common area.
* gimple-range-gori.cc (compute_operand_range): Check for
a relation between op1 and op2 and use that instead.
(compute_operand1_range): Don't look for a relation override.
(compute_operand2_range): Ditto.
Filip Kastl [Wed, 5 Jul 2023 15:36:02 +0000 (17:36 +0200)]
value-prof.cc: Correct edge prob calculation.
The mod-subtract optimization with ncounts==1 produced incorrect edge
probabilities due to incorrect conditional probability calculation. This
patch fixes the calculation.
Signed-off-by: Filip Kastl <filip.kastl@gmail.com>
gcc/ChangeLog:
sched: Change return type of predicate functions from int to bool
Also change some internal variables to bool.
gcc/ChangeLog:
* sched-int.h (struct haifa_sched_info): Change can_schedule_ready_p,
scehdule_more_p and contributes_to_priority indirect frunction
type from int to bool.
(no_real_insns_p): Change return type from int to bool.
(contributes_to_priority): Ditto.
* haifa-sched.cc (no_real_insns_p): Change return type from
int to bool and adjust function body accordingly.
* modulo-sched.cc (try_scheduling_node_in_cycle): Change "success"
variable type from int to bool.
(ps_insn_advance_column): Change return type from int to bool.
(ps_has_conflicts): Ditto. Change "has_conflicts"
variable type from int to bool.
* sched-deps.cc (deps_may_trap_p): Change return type from int to bool.
(conditions_mutex_p): Ditto.
* sched-ebb.cc (schedule_more_p): Ditto.
(ebb_contributes_to_priority): Change return type from
int to bool and adjust function body accordingly.
* sched-rgn.cc (is_cfg_nonregular): Ditto.
(check_live_1): Ditto.
(is_pfree): Ditto.
(find_conditional_protection): Ditto.
(is_conditionally_protected): Ditto.
(is_prisky): Ditto.
(is_exception_free): Ditto.
(haifa_find_rgns): Change "unreachable" and "too_large_failure"
variables from int to bool.
(extend_rgns): Change "rescan" variable from int to bool.
(check_live): Change return type from
int to bool and adjust function body accordingly.
(can_schedule_ready_p): Ditto.
(schedule_more_p): Ditto.
(contributes_to_priority): Ditto.
Robin Dapp [Wed, 28 Jun 2023 13:48:55 +0000 (15:48 +0200)]
gimple-isel: Recognize vec_extract pattern.
In gimple-isel we already deduce a vec_set pattern from an
ARRAY_REF(VIEW_CONVERT_EXPR). This patch does the same for a
vec_extract.
The code is largely similar to the vec_set one
including the addition of a can_vec_extract_var_idx_p function
in optabs.cc to check if the backend can handle a register
operand as index. We already have can_vec_extract in
optabs-query but that one checks whether we can extract
specific modes.
With the introduction of an internal function for vec_extract
the expander must not FAIL. For vec_set this has already been
the case so adjust the documentation accordingly.
Additionally, clarify the wording of the vector-vector case for
vec_extract.
gcc/ChangeLog:
* doc/md.texi: Document that vec_set and vec_extract must not
fail.
* gimple-isel.cc (gimple_expand_vec_set_expr): Rename this...
(gimple_expand_vec_set_extract_expr): ...to this.
(gimple_expand_vec_exprs): Call renamed function.
* internal-fn.cc (vec_extract_direct): Add.
(expand_vec_extract_optab_fn): New function to expand
vec_extract optab.
(direct_vec_extract_optab_supported_p): Add.
* internal-fn.def (VEC_EXTRACT): Add.
* optabs.cc (can_vec_extract_var_idx_p): New function.
* optabs.h (can_vec_extract_var_idx_p): Declare.
Pan Li [Tue, 4 Jul 2023 12:26:11 +0000 (20:26 +0800)]
RISC-V: Use FRM_DYN when add the rounding mode operand
This patch would like to take FRM_DYN const rtx as the rounding mode
operand according to the RVV spec, which takes the dyn as the only
rounding mode for floating-point.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins.cc
(function_expander::use_exact_insn): Use FRM_DYN instead of const0. Signed-off-by: Pan Li <pan2.li@intel.com>
Robin Dapp [Wed, 5 Jul 2023 12:42:21 +0000 (14:42 +0200)]
RISC-V: Change truncate to float_truncate in narrowing patterns.
This fixes a bug in the autovect FP narrowing patterns which resulted in
a combine ICE. It would try to e.g. simplify a unary operation by
simplify_const_unary_operation which obviously expects a float_truncate
and not a truncate for a floating-point mode.
VECT: Apply LEN_MASK_GATHER_LOAD/SCATTER_STORE into vectorizer
Hi, Richard and Richi.
Address comments from Richi.
Make gs_info.ifn = LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE.
I have fully tested these 4 format:
length = vf is a dummpy length,
mask = {-1,-1, ... } is a dummy mask.
1. no length, no mask
LEN_MASK_GATHER_LOAD (..., length = vf, mask = {-1,-1,...})
2. exist length, no mask
LEN_MASK_GATHER_LOAD (..., len, mask = {-1,-1,...})
3. exist mask, no length
LEN_MASK_GATHER_LOAD (..., length = vf, mask)
4. both mask and length exist
LEN_MASK_GATHER_LOAD (..., length, mask)
All of these work fine in this patch.
Here is the example:
void
f (int *restrict a,
int *restrict b, int n,
int base, int step,
int *restrict cond)
{
for (int i = 0; i < n; ++i)
{
if (cond[i])
a[i * 4] = b[i];
}
}
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-10.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-11.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-12.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-13.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-14.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-9.c: New test.
YunQiang Su [Wed, 31 May 2023 09:55:50 +0000 (17:55 +0800)]
MIPS: Use unaligned access to expand block_move on r6
MIPSr6 support unaligned memory access with normal lh/sh/lw/sw/ld/sd
instructions, and thus lwl/lwr/ldl/ldr and swl/swr/sdl/sdr is removed.
For microarchitecture, these memory access instructions issue 2
operation if the address is not aligned, which is like what lwl family
do.
For some situation (such as accessing boundary of pages) on some
microarchitectures, the unaligned access may not be good enough,
then the kernel should trap&emu it: the kernel may need
-mno-unalgined-access option.
gcc/
* config/mips/mips.cc (mips_expand_block_move): don't expand for
r6 with -mno-unaligned-access option if one or both of src and
dest are unaligned. restruct: return directly if length is not const.
(mips_block_move_straight): emit_move if ISA_HAS_UNALIGNED_ACCESS.
gcc/testsuite/
* gcc.target/mips/expand-block-move-r6-no-unaligned.c: new test.
* gcc.target/mips/expand-block-move-r6.c: new test.
Richard Biener [Wed, 5 Jul 2023 07:59:44 +0000 (09:59 +0200)]
adjust testcase for now happening epilogue vectorization
gcc.dg/vect/slp-perm-9.c is reported to FAIL with -march=cascadelake
now which is because we now vectorize the epilogue with V2HImode
vectors after the recent change to not scrap too large vector
epilogues during transform but during analysis time.
The following adjusts the testcase to always use the existing alternate
N which avoids epilogue vectorization.
* gcc.dg/vect/slp-perm-9.c: Always use alternate N.
Jan Beulich [Wed, 5 Jul 2023 07:52:41 +0000 (09:52 +0200)]
x86: suppress avx512f-copysign.c testcase for 32-bit
The test installed by "x86: make VPTERNLOG* usable on less than 512-bit
operands with just AVX512F" won't succeed on 32-bit, for floating point
operations being done there (by default) without using SIMD insns.
gcc/testsuite/
* gcc.target/i386/avx512f-copysign.c: Suppress for 32-bit.