The following patch adds big endian support to the bitintlower pass.
While the rest of the _BitInt support has been written with endianity
in mind, in the bitintlower pass I've written it solely little endian
at the start, because the pass is large and complicated and there were
no big-endian backends with _BitInt psABI at the point of writing it,
so the big-endian support would be completely untested.
Now that I got privately a patch to add s390x support, I went through
the whole pass and added the support.
Some months ago I've talked about two possibilities to do the big-endian
support, one perhaps easier would be keep the idx vars (INTEGER_CSTs
for bitint_prec_large and partially SSA_NAMEs, partially INTEGER_CSTs
for bitint_prec_huge) iterating like for little-endian from 0 upwards
and do the big-endian index correction only when accessing the limbs
(but mergeable casts between _BitInts with different number of limbs
would be a nightmare), which would have the disadvantage that we'd need
to wait until propagation and ivopts to fix stuff up (and not sure it
would be able to fix everything), or change stuff so that the idxes
used between the different bitint_large_huge class methods iterate on
big endian from highest down to 0.
The following patch implements the latter.
On s390x with the 3 patches from IBM without this patch I got on
make -j32 -k check-gcc GCC_TEST_RUN_EXPENSIVE=1 RUNTESTFLAGS="GCC_TEST_RUN_EXPENSIVE=1 dg.exp='*bitint* pr112673.c builtin-stdc-bit-*.c pr112566-2.c pr112511.c pr116588.c pr116003.c
+pr113693.c pr113602.c flex-array-counted-by-7.c' dg-torture.exp='*bitint* pr116480-2.c pr114312.c pr114121.c' dfp.exp=*bitint* vect.exp='vect-early-break_99-pr113287.c'
+tree-ssa.exp=pr113735.c"
347 FAILs, 326 out of that execution failures (and that doesn't include
some tests that happened to succeed by pure luck because e.g. comparisons
weren't implemented correctly).
With this patch (and 2 small patches I'm going to post next) I got this
down to
FAIL: gcc.dg/dfp/bitint-1.c (test for excess errors)
FAIL: gcc.dg/dfp/bitint-2.c (test for excess errors)
FAIL: gcc.dg/dfp/bitint-3.c (test for excess errors)
FAIL: gcc.dg/dfp/bitint-4.c (test for excess errors)
FAIL: gcc.dg/dfp/bitint-5.c (test for excess errors)
FAIL: gcc.dg/dfp/bitint-6.c (test for excess errors)
FAIL: gcc.dg/dfp/bitint-8.c (test for excess errors)
FAIL: gcc.dg/torture/bitint-64.c -O3 -g execution test
FAIL: gcc.dg/torture/bitint-64.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
where the dfp stuff is due to missing DPD dfp <-> _BitInt support
I'm working on next, and bitint-64.c is some expansion related
issue with _Atomic _BitInt(5) (will look at it later, but there
bitint lowering isn't involved at all).
Most of the patch just tweaks things so that it iterates in the
right direction, for casts with different number of limbs does the
needed index adjustments and unfortunately due to that (and e.g.
add/sub/mul overflow BE lowering) has some pessimizations on the
SSA conflict side; on little-endian mergeable ops have the
advantage that all the accesses iterate from index 0 up, so even
if there is e.g. overlap between the lhs and some used values, except
for mul/div where libgcc APIs require no overlap we don't need to
avoid it, all the limbs are updated in sync before going on to handle
next limb. On big-endian, that isn't the case, casts etc. can result
in index adjustments and so we could overwrite a limb that will still
need to be processed as input. So, there is a special case that looks
for different numbers of limbs among arguments and in that case marks
the lhs as conflicting with the inputs.
On little-endian, this patch shouldn't affect code generation, with
one little exception; in the separate_ext handling in lower_mergeable_stmt
the loop (if bitint_large_huge) was iterating using some idx and then
if bo_idx was non-zero, adding that constant to a new SSA_NAME and using
that to do the limb accesses. As the limb accesses are the only place
where the idx is used (apart from the loop exit test), I've changed it
to iterate on idxes with bo_idx already added to those.
P.S., would be nice to eventually also enable big-endian aarch64,
but I don't have access to those, so can't test that myself.
P.S., at least in the current s390x patches it wants info->extended_p,
this patch doesn't change anything about that. I believe most of the
time the _BitInt vars/params/return values are already extended, but
there is no testcase coverage for that, I will work on that incrementally
(and then perhaps arm 32-bit _BitInt support can be enabled too).
* gcc.dg/torture/bitint-78.c: New test.
* gcc.dg/torture/bitint-79.c: New test.
* gcc.dg/torture/bitint-80.c: New test.
* gcc.dg/torture/bitint-81.c: New test.