With GCC 13, the compiler apparently applies new aliasing optimizations
when compiled with -O2 and without -fno-strict-aliasing. This caused
the application of the second padding bit, where the state was accessed
via uint8_t[], to be moved before the loop that absorbs the buffer into
the state, where the state is accessed via uint64_t[], resulting in
incorrect output. By only accessing the state via uint64_t[] here the
compiler won't reorder the instructions.