SLH-DSA spends a significant amount of time performing large
numbers of hash calculations. Initially these were done using EVP
layer calls, whose per-call overhead becomes significant when there
are thousands of calls. To reduce this overhead, the lower-level SHA
functions for KECCAK1600_CTX, SHA256_CTX and SHA512_CTX are accessed
directly instead.
Profiling showed that a significant amount of time is spent in
"WOTS+ Public-Key Generation" (FIPS 205 Section 5.1, Algorithm 6), so
this was inlined for SHAKE and SHA2 (see slh_wots_pk_gen_sha2()).
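To give a sense of scale, the following C sketch counts the hash calls made by a single wots_pkGen invocation (Algorithm 6): each of the len chains costs one PRF call plus w - 1 F calls, and the chain ends are compressed by one T_len call. The helper names are hypothetical and are not OpenSSL functions; len1 and len2 follow the FIPS 205 Section 5 definitions, and every SLH-DSA parameter set uses lg_w = 4.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of WOTS+ chains, len = len1 + len2 (FIPS 205 Section 5). */
static size_t wots_len(size_t n, unsigned lg_w)
{
    size_t len1, x, log2x = 0;

    assert(lg_w > 0 && lg_w < 16);
    len1 = (8 * n + lg_w - 1) / lg_w;               /* ceil(8n / lg_w) */
    for (x = len1 * (((size_t)1 << lg_w) - 1); x > 1; x >>= 1)
        log2x++;                                    /* floor(log2(len1 * (w - 1))) */
    return len1 + log2x / lg_w + 1;                 /* len2 = floor(log2x / lg_w) + 1 */
}

/* Hypothetical helper: hash calls in one wots_pkGen (Algorithm 6). Each chain
 * costs one PRF (secret value) plus w - 1 F calls (chain(sk, 0, w - 1)), and
 * the chain ends are compressed by a single T_len call. */
static size_t wots_pkgen_hash_calls(size_t n, unsigned lg_w)
{
    size_t len = wots_len(n, lg_w), w = (size_t)1 << lg_w;

    return len + len * (w - 1) + 1;
}
```

For n = 16 (the 128-bit parameter sets) this gives len = 35 and 561 hash calls per WOTS+ public key, so any fixed per-call cost, such as EVP dispatch, is multiplied hundreds of times per key.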
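FIPS 205 Section 11 lists the hash functions for each parameter set.
Many of the SHA2-based ones follow the pattern
Trunc_n(SHA-256(PK.seed || toByte(0, 64 - n) || ...)).
Since PK.seed is n bytes long, the prefix PK.seed || toByte(0, 64 - n)
is exactly one 64-byte SHA-256 block and is identical for every call
made with a given key (for SHAKE the shared prefix is just PK.seed).
Because these functions are called many times, the hash state after
absorbing this prefix is calculated once and stored in a low-level
SHA256_CTX or KECCAK1600_CTX.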
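For the SHA2 parameter sets, the FIPS 205 Section 11.2 definitions have the shape below (F shown; PRF uses the same SHA-256 prefix, while for security categories 3 and 5 H and T_l switch to SHA-512 with a 128-byte prefix, which presumably accounts for the SHA512_CTX use mentioned above). The compression-function symbol CF and the cached chaining value h_1 are notation introduced here for illustration only, not FIPS 205 notation.

```latex
\begin{aligned}
F(\mathrm{PK.seed}, \mathrm{ADRS}, M_1)
  &= \mathrm{Trunc}_n\bigl(\text{SHA-256}(\mathrm{PK.seed} \,\|\, \mathrm{toByte}(0, 64 - n) \,\|\, \mathrm{ADRS}^c \,\|\, M_1)\bigr) \\
H(\mathrm{PK.seed}, \mathrm{ADRS}, M_2)
  &= \mathrm{Trunc}_n\bigl(\text{SHA-512}(\mathrm{PK.seed} \,\|\, \mathrm{toByte}(0, 128 - n) \,\|\, \mathrm{ADRS}^c \,\|\, M_2)\bigr)
  \quad \text{(categories 3 and 5)} \\
\lvert \mathrm{PK.seed} \rvert + (64 - n) &= n + (64 - n) = 64 \text{ bytes} = \text{one SHA-256 block} \\
h_1 &= \mathrm{CF}_{256}\bigl(\mathrm{IV}_{256},\ \mathrm{PK.seed} \,\|\, \mathrm{toByte}(0, 64 - n)\bigr)
\end{aligned}
```

Since ADRS^c is 22 bytes and M_1 is n <= 32 bytes, the remaining input plus the minimum 9 bytes of SHA-256 padding (0x80 and the 8-byte length) is at most 22 + 32 + 9 = 63 bytes, so each F call that starts from h_1 needs a single compression instead of two.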
This stored context can then be block copied into a stack-based
KECCAK1600_CTX or SHA256_CTX, on which the remaining low-level SHA
operations are performed.
The md_len field is written directly before calling the SHA final()
function to control the output length, so the truncated digest is
written straight to the output buffer without an extra memcpy.
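The cache-copy-truncate flow can be sketched in self-contained C. To avoid depending on OpenSSL's internal or deprecated low-level APIs, this sketch uses a hypothetical toy SHA-256 context whose md_len field loosely mirrors the one described here; the struct layout and every function name (toy_init, toy_update, toy_final, prefix_init, toy_f) are assumptions for illustration, not OpenSSL's SHA256_CTX or its functions. ADRS is passed in already compressed and is not validated.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {        /* toy layout, not OpenSSL's SHA256_CTX */
    uint32_t h[8];      /* chaining value */
    uint8_t buf[64];    /* partial block; num bytes are in use */
    size_t num, md_len; /* md_len: bytes toy_final() writes, at most 32 */
    uint64_t total;     /* bytes absorbed so far */
} toy_sha256_ctx;
#define ROR(x, r) (((x) >> (r)) | ((x) << (32 - (r))))
static const uint32_t IV[8] = { 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                                0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19 };
static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

static void sha256_block(uint32_t h[8], const uint8_t *p) /* compress one 64-byte block */
{
    uint32_t w[64], s[8], t1, t2;
    for (int i = 0; i < 64; i++)        /* message schedule */
        w[i] = i < 16 ? (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16
                            | (uint32_t)p[4 * i + 2] << 8 | (uint32_t)p[4 * i + 3]
               : w[i - 16] + (ROR(w[i - 15], 7) ^ ROR(w[i - 15], 18) ^ (w[i - 15] >> 3))
                 + w[i - 7] + (ROR(w[i - 2], 17) ^ ROR(w[i - 2], 19) ^ (w[i - 2] >> 10));
    memcpy(s, h, sizeof(s));
    for (int i = 0; i < 64; i++) {
        t1 = s[7] + (ROR(s[4], 6) ^ ROR(s[4], 11) ^ ROR(s[4], 25))
             + ((s[4] & s[5]) ^ (~s[4] & s[6])) + K[i] + w[i];
        t2 = (ROR(s[0], 2) ^ ROR(s[0], 13) ^ ROR(s[0], 22))
             + ((s[0] & s[1]) ^ (s[0] & s[2]) ^ (s[1] & s[2]));
        memmove(s + 1, s, 7 * sizeof(s[0]));  /* h=g, g=f, ..., b=a */
        s[4] += t1;                           /* e = d + t1 */
        s[0] = t1 + t2;                       /* a = t1 + t2 */
    }
    for (int i = 0; i < 8; i++)
        h[i] += s[i];
}

static void toy_init(toy_sha256_ctx *c)
{
    memcpy(c->h, IV, sizeof(IV));
    c->num = c->total = 0;
    c->md_len = 32;
}

static void toy_update(toy_sha256_ctx *c, const uint8_t *p, size_t len)
{
    for (c->total += len; len > 0; len--) {
        c->buf[c->num++] = *p++;
        if (c->num == 64)
            sha256_block(c->h, c->buf);
        c->num %= 64;                         /* wrap to 0 after a full block */
    }
}

/* Pads, then writes only c->md_len bytes, so truncation needs no extra copy. */
static void toy_final(toy_sha256_ctx *c, uint8_t *out)
{
    uint8_t pad = 0x80, zero = 0, len[8];
    for (size_t i = 0; i < 8; i++)            /* big-endian bit count */
        len[i] = (uint8_t)((c->total * 8) >> (56 - 8 * i));
    toy_update(c, &pad, 1);
    while (c->num != 56)
        toy_update(c, &zero, 1);
    toy_update(c, len, 8);                    /* completes the last block */
    for (size_t i = 0; i < c->md_len; i++)
        out[i] = (uint8_t)(c->h[i / 4] >> (24 - 8 * (i % 4)));
}

/* Hypothetical helper: absorb PK.seed || toByte(0, 64 - n), one whole block. */
static void prefix_init(toy_sha256_ctx *pre, const uint8_t *pk_seed, size_t n)
{
    static const uint8_t zeros[64];
    assert(n > 0 && n <= 32);
    toy_init(pre);
    toy_update(pre, pk_seed, n);
    toy_update(pre, zeros, 64 - n);
}

/* Hypothetical F-style call: Trunc_n(SHA-256(prefix || adrs || m)), |m| = n. */
static void toy_f(const toy_sha256_ctx *pre, const uint8_t *adrs, size_t adrs_len,
                  const uint8_t *m, size_t n, uint8_t *out)
{
    toy_sha256_ctx c = *pre;                  /* stack copy of the cached state */
    toy_update(&c, adrs, adrs_len);
    toy_update(&c, m, n);
    c.md_len = n;                             /* set output length before final */
    toy_final(&c, out);
}
```

prefix_init() runs once per key; each toy_f() call then copies the cached context by value, absorbs only ADRS^c || M_1, and lets md_len truncate the output inside toy_final(). A one-shot hash over the full input, truncated to n bytes, should give the same result, which makes the optimisation easy to test.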
Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Viktor Dukhovni <viktor@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/28941)