[crypto] Support direct reduction only for Montgomery constant R^2 mod N
The only remaining use case for direct reduction (outside of the unit
tests) is in calculating the constant R^2 mod N used during Montgomery
multiplication.
The current implementation of direct reduction requires a writable
copy of the modulus (to allow for shifting), and both the modulus and
the result buffer must be padded to be large enough to hold (R^2 - N),
which is twice the size of the actual values involved.
For the special case of reducing R^2 mod N (or any power of two mod
N), we can run the same algorithm without needing either a writable
copy of the modulus or a padded result buffer. The working state
required is only two bits larger than the result buffer, and these
additional bits may be held in local variables instead.
Rewrite bigint_reduce() to handle only this use case, and remove the
no longer necessary uses of double-sized big integers.