Speed up SSL_add_{file,dir}_cert_subjects_to_stack
The X509_NAME comparison function converts its arguments to DER using
i2d_X509_NAME before comparing the results using memcmp(). For every
invocation of the comparison function (of which there are many when
loading many certificates), it allocates two buffers of the appropriate
size for the DER encoding.
Switching to static buffers (possibly of X509_NAME_MAX size as defined
in crypto/x509/x_name.c) would not work with multithreaded use, e.g.,
when two threads sort two separate STACK_OF(X509_NAME)s at the same
time. A suitable re-usable buffer could have been added to the
STACK_OF(X509_NAME) if sk_X509_NAME_compfunc did have a void* argument,
or a pointer to the STACK_OF(X509_NAME) – but it does not.
Instead, copy the solution chosen in SSL_load_client_CA_file() by
filling an LHASH_OF(X509_NAME) with all existing names in the stack and
using that to deduplicate, rather than relying on sk_X509_NAME_find(),
which ends up being very slow.
Adjust SSL_add_dir_cert_subjects_to_stack() to keep a local
LHASH_OF(X509_NAME)s over the complete directory it is processing.
In a small benchmark that calls SSL_add_dir_cert_subjects_to_stack()
twice, once on a directory with one entry, and once with a directory
with 1000 certificates, and repeats this in a loop 10 times, this change
yields a speed-up of 5.32:
| Benchmark 1: ./bench 10 dir-1 dir-1000
| Time (mean ± σ): 6.685 s ± 0.017 s [User: 6.402 s, System: 0.231 s]
| Range (min … max): 6.658 s … 6.711 s 10 runs
|
| Benchmark 2: LD_LIBRARY_PATH=. ./bench 10 dir-1 dir-1000
| Time (mean ± σ): 1.256 s ± 0.013 s [User: 1.034 s, System: 0.212 s]
| Range (min … max): 1.244 s … 1.286 s 10 runs
|
| Summary
| LD_LIBRARY_PATH=. ./bench 10 dir-1 dir-1000 ran
| 5.32 ± 0.06 times faster than ./bench 10 dir-1 dir-1000
In the worst case scenario where many entries are added to a stack that
is then repeatedly used to add more certificates, and with a larger test
size, the speedup is still very significant. With 15000 certificates,
a single pass to load them, followed by attempting to load a subset of
1000 of these 15000 certificates, followed by a single certificate, the
new approach is ~85 times faster:
| Benchmark 1: ./bench 1 dir-15000 dir-1000 dir-1
| Time (mean ± σ): 176.295 s ± 4.147 s [User: 174.593 s, System: 0.448 s]
| Range (min … max): 173.774 s … 185.594 s 10 runs
|
| Benchmark 2: LD_LIBRARY_PATH=. ./bench 1 dir-15000 dir-1000 dir-1
| Time (mean ± σ): 2.087 s ± 0.034 s [User: 1.679 s, System: 0.393 s]
| Range (min … max): 2.057 s … 2.167 s 10 runs
|
| Summary
| LD_LIBRARY_PATH=. ./bench 1 dir-15000 dir-1000 dir-1 ran
| 84.48 ± 2.42 times faster than ./bench 1 dir-15000 dir-1000 dir-1
Signed-off-by: Clemens Lang <cllang@redhat.com> Reviewed-by: Dmitry Belyavskiy <beldmit@gmail.com> Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/25056)