There are currently some overflow problems in the adler32_rvv
implementation, which can lead to wrong results for some inputs. These
problems are easily reproduced by running `git fsck` on a large git
repository with zlib-ng substituted for the system zlib.
The problems and their fixes are as follows:
- When the input data is long enough, v_buf32_accu can overflow as well.
Include it in the modulo step that runs every ~NMAX bytes.
- When the vector data is reduced to scalars, the resulting scalar
value (and the processed length) may cause the calculation of sum2 to
overflow. Apply mod BASE to all of these reductions and to the initial
calculation of sum2.
- When the remaining data is less than vl bytes, the code falls back to a
scalar implementation; however, the sum2 and adler values have just
been reduced from vectors and can be large enough to make sum2 overflow
in the scalar code. Reduce them modulo BASE before the scalar code to
prevent such an overflow (vl is surely much smaller than NMAX, so the
scalar loop itself cannot overflow once they are reduced).
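
For illustration only, the last fix can be sketched in plain scalar C.
This is a minimal sketch and not the actual adler32_rvv code: the
helper name adler32_tail is hypothetical, and the only assumptions are
the usual Adler-32 constants (BASE = 65521, NMAX = 5552) and 32-bit
accumulators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BASE 65521U /* largest prime smaller than 65536 */
#define NMAX 5552U  /* max bytes that can be summed before a modulo is required */

/* Hypothetical helper: finish an Adler-32 over a short tail.
 * adler and sum2 are assumed to come straight from vector reductions,
 * so they may be anywhere in the 32-bit range on entry. */
static uint32_t adler32_tail(uint32_t adler, uint32_t sum2,
                             const uint8_t *buf, size_t len) {
    /* The tail must be short enough that no modulo is needed inside the loop. */
    assert(len < NMAX);

    /* Reduce first: without this, sum2 += adler can wrap around 2^32. */
    adler %= BASE;
    sum2 %= BASE;

    while (len--) {
        adler += *buf++;
        sum2 += adler;
    }

    /* Final reduction before packing the two 16-bit halves. */
    adler %= BASE;
    sum2 %= BASE;
    return adler | (sum2 << 16);
}
```

With the two leading reductions removed, passing accumulators close to
2^32 would let sum2 wrap silently and produce a wrong checksum, which
is the failure mode described in the last item above.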