So I've been noticing the cycle time for a native build/test on the
Pioneer and BPI creeping up over the last several months. I've suspected
that a likely pain point is genautomata, due to long reservations in the
DFAs. Trying to describe a 30+ cycle bubble in the pipeline just isn't
useful, and it causes the DFA to blow up.
This is the time to build insn-automata.cc with an optimized genautomata
on my Skylake server, cross compiling to riscv64. The baseline is what
we have today. Then I clamped the reservations (but not the latencies)
to 7 cycles. Seven cycles is arbitrary, but it's known not to blow up
the DFA. I fixed the BPI first, then the Andes 23, and so on.
That's a significant improvement, though I probably wouldn't push this
forward on that result alone: the savings are under a minute, and
Skylake systems aren't exactly new anymore...
Let's try that with an unoptimized genautomata. I often build that way
when debugging.
Baseline: 343s
Final: 79s
So that saves ~4 minutes on my Skylake server for a common build. Since
I use ccache, those 4 minutes are often a significant fraction of the
total build time, so this feels like a better motivating example.
But I'm really after bringing down bootstrap cycle times on the BPI and
Pioneer. So let's see what the BPI does. For an optimized genautomata
we get (not testing all the intermediate steps):
Baseline: 310s
Final: 110s
Not bad. And if we look at an unoptimized genautomata:
Baseline: 2196s
Final: 553s
Now we can see why bootstrap times have crept up meaningfully. That's
~27 minutes out of a 9hr bootstrap on the BPI (pure bootstrap, no
testing). The effect is even more pronounced on the Pioneer, where the
improvement is 30+ minutes on a 4hr bootstrap (each core is slower,
but there are 8x as many cores, which doesn't help a single
genautomata run).
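As a quick sanity check of the savings quoted above, this small Python
snippet recomputes minutes saved and speedup from the reported
insn-automata.cc build times. It uses only the timings given in this
message. The 9hr figure is the BPI bootstrap time quoted above, and the
`summarize` helper is just an illustrative name.

```python
# Reported insn-automata.cc build times in seconds: (baseline, final).
timings = {
    "Skylake, unoptimized genautomata": (343, 79),
    "BPI, optimized genautomata": (310, 110),
    "BPI, unoptimized genautomata": (2196, 553),
}


def summarize(baseline_s, final_s):
    """Return (minutes saved, speedup factor) for one measurement."""
    saved_s = baseline_s - final_s
    return saved_s / 60, baseline_s / final_s


for name, (base, final) in timings.items():
    saved_min, speedup = summarize(base, final)
    print(f"{name}: saved {saved_min:.1f} min, {speedup:.1f}x faster")

# Fraction of the ~9 hour BPI bootstrap recovered in the unoptimized case.
bpi_saved_s = 2196 - 553
print(f"BPI bootstrap share: {bpi_saved_s / (9 * 3600):.1%}")
```

This gives ~4.4 minutes saved on Skylake and ~27.4 minutes on the BPI,
the latter being about 5% of the 9hr bootstrap.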
Tested on riscv{32,64}-elf and bootstrapped on the Pioneer (regression
testing in progress). I'll wait for pre-commit CI to do its thing.