Align Windows RCU implementation to the pthread variant

Unlike the pthread variant, the Windows RCU implementation uses
broadcast instead of targeted signal calls in some places, which
wastes CPU cycles.

retire_qp should wake only one thread to proceed, not all of them.
Likewise, update_qp signals after incrementing writers_alloced, so
waking all threads there makes no sense either.
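
To illustrate why a single wakeup suffices when one state change lets
exactly one thread proceed, here is a minimal sketch using POSIX
threads (the variant this change aligns to). The names run_waiters,
waiter, release_one and the slots counter are hypothetical and are not
taken from OpenSSL. On Windows the analogous pair would be
WakeConditionVariable versus WakeAllConditionVariable; the sketch makes
no claim about which primitives the OpenSSL code actually calls.

```c
#include <pthread.h>

/*
 * Hypothetical shared state for illustration only; this is not
 * OpenSSL's RCU lock structure.
 */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int slots = 0;     /* threads currently allowed to proceed */
static int proceeded = 0; /* waiters that got through */

/* Block until a slot is available, then consume exactly one slot. */
static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (slots == 0)              /* re-check: wakeups may be spurious */
        pthread_cond_wait(&cond, &lock);
    slots--;
    proceeded++;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/*
 * One state change lets exactly one thread proceed, so wake one
 * waiter (pthread_cond_signal) instead of all of them
 * (pthread_cond_broadcast), which would make the others wake up
 * only to find slots == 0 and go back to sleep.
 */
static void release_one(void)
{
    pthread_mutex_lock(&lock);
    slots++;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}

/* Start n waiters (capped at 64), release them one by one, return count. */
int run_waiters(int n)
{
    pthread_t t[64];
    int i;

    if (n > 64)
        n = 64;
    proceeded = 0;
    for (i = 0; i < n; i++)
        pthread_create(&t[i], NULL, waiter, NULL);
    for (i = 0; i < n; i++)
        release_one();
    for (i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    return proceeded;
}
```

Built with -pthread, run_waiters(4) returns 4: every waiter gets
through, yet each release wakes only one sleeping thread. Because the
slots counter carries the state, a signal sent before a thread starts
waiting is not lost.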

The speedup is significant for lhash_test on machines with many CPUs
(on 32 cores, the runtime dropped from 6:20 to 1:40 minutes on the
test hardware).

Co-Authored-By: Claude Opus 4.6 Extended <noreply@anthropic.com>
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
Reviewed-by: Nikola Pajkovsky <nikolap@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
MergeDate: Fri Mar 13 17:25:47 2026
(Merged from https://github.com/openssl/openssl/pull/30388)
(cherry picked from commit 5f8fad06475fae024609cf09a1bb2ca8c74b44d6)