Attempt to fix occasional failure of quicapi test in ci
https://github.com/openssl/openssl/actions/runs/
15214054228/job/
42795224720
the theory I have for the cause of this failure is:
1. qtest_create_quic_connection_ex is called for the client
2. The client is in blocking mode, so we fall into the conditional on line 512
3. We create the server thread on line 519, which is non-blocking
4. The scheduler in the failing case, lets the server run ahead of the client
5. Server thread enters qtest_create_quic_connection_ex and iterates steps
6-9 in the do_while loop starting on line 530
6. Server calls qtest_add_time
7. Server calls ossl_quic_tserver_tick
8. Server calls ossl_quic_tserver_is_term_any, received NULL return
9. Server calls qtest_wait_for_timeout
10. Eventually qtest_wait_for_timeout returns zero, adn the server jumps to
the error label, returning zero to globservret, and the thread exits
11. Client thread regains the cpu, and attempts to call SSL_connect, which
fails, as the server is no longer listening
12. We fall into the error case on line 556, and SSL_get_error returns
SSL_ERROR_SSL, which causes clienterr to get set to 1
13. We exit the do{} while loop on line 581, and do the TEST_true check on
line 593. The server having exited wait_for_thread returns true, but
globserverret is still zero from step 10 above, and so the test fails
I can't prove this is the case, as the test only appears to fail in CI,
and we can't dump verbose logging there, lest we affect the timing of
the tests, so this is just a theory, but it seems to fit the
observations we have.
Attempting to fix this, by creating a thread interlock with a condition
variable that blocks the server from ticking the quic reactor until such
time as the client is about to call SSL_connect to prevent the race
condition
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/27704)
(cherry picked from commit
864333b455eb36ba84562d6482547bf4c8b49581)