It is no longer necessary to get good performance, there is only a small
speed difference between -O2 and -O3, so just stick to the default of
-O2.
This also fixes the stack space usage on parisc. The compiler was buggy
for -O3 and used ~3KB of stack space for several functions. With -O2 the
problem is completely resolved, and stack space is back to a few hundred
bytes.
Additionally, we get a large code size win on gcc: