git.ipfire.org Git - thirdparty/gcc.git/commit

middle-end: Add new parameter to scale scalar loop costing in vectorizer

This patch adds a new param vect-scalar-cost-multiplier to scale the scalar
costing during vectorization.  If the cost is set high enough and when using
the dynamic cost model it has the effect of effectively disabling the
costing vs scalar and assumes all vectorization to be profitable.

This is similar to using the unlimited cost model, but unlike unlimited it
does not fully disable the vector cost model.  That means that we still
perform comparisons between vector modes.  And it means it also still does
costing for alias analysis.

As an example, the following:

void
foo (char *restrict a, int *restrict b, int *restrict c,
     int *restrict d, int stride)
{
    if (stride <= 1)
        return;

    for (int i = 0; i < 3; i++)
        {
            int res = c[i];
            int t = b[i * stride];
            if (a[i] != 0)
                res = t * d[i];
            c[i] = res;
        }
}

compiled with -O3 -march=armv8-a+sve -fvect-cost-model=dynamic fails to
vectorize as it assumes scalar would be faster, and with
-fvect-cost-model=unlimited it picks a vector type that's so big that the large
sequence generated is working on mostly inactive lanes:

        ...
        and     p3.b, p3/z, p4.b, p4.b
        whilelo p0.s, wzr, w7
        ld1w    z23.s, p3/z, [x3, #3, mul vl]
        ld1w    z28.s, p0/z, [x5, z31.s, sxtw 2]
        add     x0, x5, x0
        punpklo p6.h, p6.b
        ld1w    z27.s, p4/z, [x0, z31.s, sxtw 2]
        and     p6.b, p6/z, p0.b, p0.b
        punpklo p4.h, p7.b
        ld1w    z24.s, p6/z, [x3, #2, mul vl]
        and     p4.b, p4/z, p2.b, p2.b
        uqdecw  w6
        ld1w    z26.s, p4/z, [x3]
        whilelo p1.s, wzr, w6
        mul     z27.s, p5/m, z27.s, z23.s
        ld1w    z29.s, p1/z, [x4, z31.s, sxtw 2]
        punpkhi p7.h, p7.b
        mul     z24.s, p5/m, z24.s, z28.s
        and     p7.b, p7/z, p1.b, p1.b
        mul     z26.s, p5/m, z26.s, z30.s
        ld1w    z25.s, p7/z, [x3, #1, mul vl]
        st1w    z27.s, p3, [x2, #3, mul vl]
        mul     z25.s, p5/m, z25.s, z29.s
        st1w    z24.s, p6, [x2, #2, mul vl]
        st1w    z25.s, p7, [x2, #1, mul vl]
        st1w    z26.s, p4, [x2]
        ...

With -fvect-cost-model=dynamic --param vect-scalar-cost-multiplier=200
you get more reasonable code:

foo:
        cmp     w4, 1
        ble     .L1
        ptrue   p7.s, vl3
        index   z0.s, #0, w4
        ld1b    z29.s, p7/z, [x0]
        ld1w    z30.s, p7/z, [x1, z0.s, sxtw 2]
ptrue   p6.b, all
        cmpne   p7.b, p7/z, z29.b, #0
        ld1w    z31.s, p7/z, [x3]
mul     z31.s, p6/m, z31.s, z30.s
        st1w    z31.s, p7, [x2]
.L1:
        ret

This model has been useful internally for performance exploration and cost-model
validation.  It allows us to force realistic vectorization overriding the cost
model to be able to tell whether it's correct wrt to profitability.

gcc/ChangeLog:

* params.opt (vect-scalar-cost-multiplier): New.
* tree-vect-loop.cc (vect_estimate_min_profitable_iters): Use it.
* doc/invoke.texi (vect-scalar-cost-multiplier): Document it.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/cost_model_16.c: New test.

author	Tamar Christina <tamar.christina@arm.com>
	Mon, 9 Jun 2025 06:03:27 +0000 (07:03 +0100)
committer	Tamar Christina <tamar.christina@arm.com>
	Mon, 9 Jun 2025 06:05:04 +0000 (07:05 +0100)
commit	4238e3470d3fa9b9697f9cf6ad26d4ef76fdf248
tree	d8ca289254304e33570350c478c02f1070ade85b	tree
parent	845826088c942f7df2ad6fbfe4cb976f83abcd16	commit \| diff

gcc/doc/invoke.texi		diff \| blob \| blame \| history
gcc/params.opt		diff \| blob \| blame \| history
gcc/testsuite/gcc.target/aarch64/sve/cost_model_16.c	[new file with mode: 0644]	blob
gcc/tree-vect-loop.cc		diff \| blob \| blame \| history