[PATCH] mpm_motorz: performance, bug fixes, concurrency hardening, async HTTP/2
Performance:
- Accept drain loop: motorz_io_accept now drains the kernel accept queue
in one poll wakeup (do/while until EAGAIN, admission-disabled, or die_now)
instead of one connection per round-trip through apr_pollset_poll. Eliminates
O(N) poll wakeups for N burst connections.
- Hot-path log levels: 25 APLOG_DEBUG calls on the per-request path demoted to
APLOG_TRACE6/TRACE7/TRACE8, matching event MPM practice. Error and admission-
control events remain at DEBUG.
- Redundant pollset_remove removed: the defensive apr_pollset_remove in
motorz_io_process is gone from the common dispatch path (connection is already
claimed before reaching here); isolated to the clogging-filter branch where it
is actually needed.
- Admission control: added active-thread saturation check (active >= threads_per_child)
alongside the existing idle==0 and pending>=hi checks, catching slow-client /
keep-alive-heavy saturation where the task queue appears empty but all workers
are blocked in I/O. apr_size_t underflow fix: read total before idle, clamp
active = (total > idle) ? total - idle : 0, preventing spurious disable during
graceful restart.
- Hysteresis low-water mark tightened from 50% to 75% of ThreadsPerChild, so
the listener re-enables sooner after a burst subsides.
Bug fixes:
- Clogging-filter timer race (use-after-free): the SSL/clogging path in
motorz_io_process bypassed motorz_conn_claim(). A pending timer could fire
concurrently while the worker was inside ap_run_process_connection(), dispatching
a timeout worker on the same scon. Fixed by replacing the bare pollset_remove
with motorz_conn_claim(), which atomically disarms both the pollset entry and
the timer under poller->mtx.
- motorz_resume_suspended: restore c->sbh before ap_run_resume_connection().
motorz_suspend_connection() NULLs c->sbh (matching event's notify_suspend);
without this fix any module calling ap_update_child_status(c->sbh) after
resume dereferenced NULL.
- SERVER_BUSY_READ scoreboard update: add ap_update_child_status(scon->sbh,
SERVER_BUSY_READ, NULL) before ap_run_process_connection(), matching event MPM
and fixing misleading mod_status output where motorz connections showed as
SERVER_READY throughout the read/process phase.
- requests_this_child data race: written by the accepting poller thread, read
by the supervisor main thread. Declared volatile to prevent stale cached reads
across threads.
- conn_id always zero: every connection was created with conn_id=0, breaking
%{connection} log formats and any module keying state on c->id. Now derived
as ID_FROM_CHILD_THREAD(my_child_num, atomic_seq), matching worker/event MPM
formula for globally unique IDs across children and connections.
- ap_create_sb_handle hardcoded child 0: all connections reported as child 0
in the scoreboard, making mod_status show all activity in slot [0][0].
Now passes my_child_num so each child's activity appears in the correct slot.
- Worker threads not drained on exit: apr_thread_pool_destroy() is now called
in clean_child_exit() before apr_pool_destroy(pchild), joining all workers
and preventing use-after-free in ap_log_error / apr_pool_clear after the
pool is torn down.
- ThreadsPerChild 1 throughput collapse: the hysteresis low-water mark becomes
(1*3)/4=0, so listeners only re-enable when the task queue is completely
empty. Added startup/runtime warnings (APLOGNO 10555/10556) advising
ThreadsPerChild >= 4. next-number advanced to 10557.
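The low-water arithmetic in the last item, together with the admission checks and the underflow clamp described under Performance, can be sketched as below. This is a minimal illustration under assumptions: the hyp_* names, the hyp_pool_stats struct and the hi_water parameter are hypothetical, and the re-enable test (pending <= low-water) is inferred from the "completely empty" behavior noted for ThreadsPerChild 1.

```c
#include <stddef.h>

/* Hypothetical snapshot of the worker pool; total is read before idle. */
typedef struct {
    size_t total;    /* threads currently in the pool */
    size_t idle;     /* threads waiting for work */
    size_t pending;  /* tasks queued but not yet started */
} hyp_pool_stats;

/* Clamp instead of subtracting blindly: if idle momentarily exceeds total
 * (e.g. during a graceful restart) an unsigned subtraction would wrap. */
static size_t hyp_active_threads(size_t total, size_t idle)
{
    return (total > idle) ? total - idle : 0;
}

/* 75% of ThreadsPerChild, using integer division. */
static size_t hyp_low_water(size_t threads_per_child)
{
    return (threads_per_child * 3) / 4;
}

/* Disable the listener if no worker is idle, the queue is at the high-water
 * mark, or every configured worker is busy (possibly blocked in I/O). */
static int hyp_should_disable(const hyp_pool_stats *s,
                              size_t threads_per_child, size_t hi_water)
{
    size_t active = hyp_active_threads(s->total, s->idle);

    return s->idle == 0
        || s->pending >= hi_water
        || active >= threads_per_child;
}

/* Assumed re-enable rule: queue depth has fallen to the low-water mark. */
static int hyp_may_reenable(const hyp_pool_stats *s, size_t threads_per_child)
{
    return s->pending <= hyp_low_water(threads_per_child);
}
```

For ThreadsPerChild 1 the low-water mark is (1*3)/4 = 0, so re-enabling needs an empty queue; for ThreadsPerChild 4 it is 3.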
Multi-poller scale-out (PollersPerChild):
- motorz_core_t no longer holds a single pollset/timeout_ring/mtx/recycle-list;
these are moved to per-poller motorz_poller_t contexts.
- Each poller owns its pollset, skiplist timer ring, ring mutex, and lock-free
MPSC transaction-pool recycle list, so pollers never contend with each other.
- Connections are sharded round-robin to pollers at accept time (scon->poller);
pool recycling returns to the accepting poller's free-list (scon->pool_poller).
- PollersPerChild directive added (0 = auto from online CPUs, capped at 8).
- Listener admission control and the pipe-of-death/generation supervision are
isolated to poller 0 and the main-thread supervisor respectively.
- AP_MPMQ_CAN_SUSPEND / motorz_resume_suspended hook wired in for full
async-suspend support (fix for CONN_STATE_SUSPENDED lifecycle).
- Non-blocking lingering close: replaced blocking ap_lingering_close() with
motorz_start_lingering_close() / motorz_lingering_close() that hand the
draining socket back to the poll loop with a bounded linger timeout.
- Pool cleanup (motorz_conn_pool_cleanup) cancels the timer under poller->mtx
so pool destruction never leaves a dangling skiplist entry.
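The PollersPerChild resolution and the round-robin sharding at accept time might look roughly like the sketch below. All names are hypothetical, and it assumes the cap of 8 applies only to the auto-detected value (an explicit setting is taken as-is); the commit text does not say which.

```c
#include <stdatomic.h>

#define HYP_MAX_AUTO_POLLERS 8

/* Resolve the effective poller count. configured == 0 means "auto":
 * use the number of online CPUs, at least 1 and at most 8. */
static int hyp_resolve_pollers(int configured, int online_cpus)
{
    if (configured > 0) {
        return configured;          /* explicit setting (assumed uncapped) */
    }
    if (online_cpus < 1) {
        return 1;                   /* CPU count unknown: one poller */
    }
    return online_cpus > HYP_MAX_AUTO_POLLERS ? HYP_MAX_AUTO_POLLERS
                                              : online_cpus;
}

/* Shared round-robin cursor; static storage starts at zero. */
static atomic_uint hyp_next_poller;

/* Pick the poller index for a newly accepted connection. The atomic
 * increment keeps the choice safe if more than one thread accepts.
 * num_pollers must be non-zero. */
static unsigned hyp_pick_poller(unsigned num_pollers)
{
    return atomic_fetch_add(&hyp_next_poller, 1u) % num_pollers;
}
```

The index returned here would play the role of scon->poller; the connection's pool would later be returned to that same poller's free-list.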
Async HTTP/2 handoff -- ENABLED (MOTORZ_ENABLE_ASYNC 1):
- motorz reports AP_MPMQ_IS_ASYNC=1 / AP_MPMQ_CAN_WAITIO=1. motorz_io_process()
implements CONN_STATE_ASYNC_WAITIO: arm the pollset for read/write per
c->cs->sense under Timeout and re-dispatch into PROCESSING, mirroring event.
New APLOGNOs 10557-10559.
- Clogging-filter branch honors the hook-returned connection state
(WRITE_COMPLETION / ASYNC_WAITIO / SUSPENDED) and maps KEEPALIVE to
WRITE_COMPLETION instead of force-closing to LINGER. h2 c2 connections set
clogging_input_filters unconditionally, so the old behavior collapsed h2
keep-alive into one-shot connections.
- Forward-declare motorz_update_listeners() (called from motorz_io_accept
before its definition; implicit function declarations were removed in C99
and are a hard error under strict compiler settings). Replaced a dead
duplicate clean_child_exit prototype.
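The state mapping in the clogging-filter branch can be pictured with the sketch below. The enum is a hypothetical stand-in for httpd's connection states, not the real conn_state_e, and sending every other state to LINGER is an assumption; the text above only specifies the four states shown explicitly.

```c
/* Hypothetical mirror of the connection states named above. */
typedef enum {
    HYP_STATE_PROCESSING,
    HYP_STATE_KEEPALIVE,
    HYP_STATE_WRITE_COMPLETION,
    HYP_STATE_ASYNC_WAITIO,
    HYP_STATE_SUSPENDED,
    HYP_STATE_LINGER
} hyp_conn_state;

/* Decide what the MPM does with a clogging-filter connection after the
 * process_connection hook returns, given the state the hook left behind. */
static hyp_conn_state hyp_clogging_next_state(hyp_conn_state returned)
{
    switch (returned) {
    case HYP_STATE_WRITE_COMPLETION:
    case HYP_STATE_ASYNC_WAITIO:
    case HYP_STATE_SUSPENDED:
        return returned;                    /* honor the hook's choice */
    case HYP_STATE_KEEPALIVE:
        return HYP_STATE_WRITE_COMPLETION;  /* keep the connection alive */
    default:
        return HYP_STATE_LINGER;            /* assumed fallback: close */
    }
}
```

Under the old behavior every case would have returned the LINGER state, which is what turned h2 keep-alive connections into one-shot connections.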
The async-handoff churn bug -- FIXED in mod_http2 (h2_session.c):
Under async handoff mod_http2 hands the master (c1) connection back to the MPM
between requests; motorz re-dispatches it on a fresh worker. Under rapid HTTP/2
connection churn this raced mod_http2's stream lifecycle: a client's graceful
GOAWAY drove the c1 session straight to DONE -> CONN_STATE_LINGER, and the MPM
close ran m_stream_cleanup()/h2_c2_abort() on any stream whose secondary
connection (c2) had emitted its response but not yet called c2_prod_done() --
silently dropping that response (~0.2-3% under h2load -n.. -c50 -m1).
The fix establishes the invariant "a c1 connection is closed only after every
stream's c2 has finished and flushed", at two points in h2_session.c:
* h2_session_ev_remote_goaway(): a graceful GOAWAY (error code 0) with
streams still in flight no longer transits to H2_SESSION_ST_DONE. It RSTs
only the unprocessed streams and keeps the session running so the in-flight
streams complete and their c2 output is written. (An error GOAWAY, or one
with no open streams, still goes to DONE immediately.) This also matches
RFC 9113: a peer GOAWAY stops new streams, it does not abort streams at or
below its last-stream-id.
* H2_SESSION_ST_IDLE handling: once those streams drain (open_streams == 0)
and the remote has shut down, send our GOAWAY and go to DONE from IDLE.
Reaching DONE only here -- after the c2s are done and flushed -- keeps the
close from racing an in-flight c2.
This benefits mpm_event too and is a conformance improvement, not just a
motorz workaround. MOTORZ_ENABLE_ASYNC remains a single flip point (set 0 to
fall back to advertising IS_ASYNC=0) should a regression ever reappear.
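The two decision points of the fix can be modeled as a small state machine. This is a simplified sketch, not the h2_session.c code: hyp_session and its fields are hypothetical, and the RST of unprocessed streams and the sending of our own GOAWAY are reduced to comments.

```c
/* Hypothetical, reduced model of the session states involved. */
typedef enum { HYP_ST_BUSY, HYP_ST_IDLE, HYP_ST_DONE } hyp_session_state;

typedef struct {
    hyp_session_state state;
    int open_streams;     /* streams whose c2 has not finished and flushed */
    int remote_shutdown;  /* set once the peer has sent GOAWAY */
} hyp_session;

/* Peer sent GOAWAY. A graceful one (error code 0) with streams still in
 * flight keeps the session running; anything else goes straight to DONE. */
static void hyp_on_remote_goaway(hyp_session *s, unsigned error_code)
{
    s->remote_shutdown = 1;
    if (error_code == 0 && s->open_streams > 0) {
        /* Real code would RST only the unprocessed streams here. */
        return;
    }
    s->state = HYP_ST_DONE;
}

/* Called while the session is IDLE. DONE is reached only after every
 * in-flight stream has drained and the peer has shut down. */
static void hyp_on_idle(hyp_session *s)
{
    if (s->open_streams == 0 && s->remote_shutdown) {
        /* Real code would send our GOAWAY before leaving IDLE. */
        s->state = HYP_ST_DONE;
    }
}
```

In this model the close can no longer overtake a c2 that is still writing, because open_streams must reach 0 before the session enters DONE on the graceful path.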
Hardening of the fix (this change):
- motorz.c, smoke.sh: corrected stale "async DISABLED" comments that still
described the old workaround while the code already enabled async.
- h2_session.c: documented the liveness bound -- keeping the session alive on a
graceful GOAWAY cannot pin c1 open indefinitely, since a wedged c2 is bounded
by its own request Timeout, which drops open_streams and lets c1 reach IDLE.
- h2_session.h: documented the open_streams threading invariant the fix now
relies on (c1-thread only; async re-dispatch is successive not concurrent, so
no atomics/volatile needed; decrements to 0 only after each c2 has flushed).
Tests (server/mpm/motorz/test/): setup.sh configures+builds httpd; run-all.sh
runs the smoke, HTTP/1.1, and HTTP/2-over-TLS suites; bench.sh compares motorz
vs event throughput. The async assertions expect async ON
(CONN_STATE_ASYNC_WAITIO arms / "returning to mpm c1 monitoring" appears).
For the churn regression to measure the fix correctly, two pitfalls must be
avoided (both documented in MOTORZ.README and encoded in the tests):
* Assert on RESPONSE LOSS (started - succeeded), NOT on h2load's "failed"
total. "failed" also counts connection-establishment errors (ephemeral
port / accept-queue pressure on busy loopback) which are environmental and
appear with and without the fix; only started > succeeded is this bug.
* Measure at LogLevel info, NOT trace8. The bug is a Heisenbug; trace8 slows
the hot path enough to hide it, so a churn assertion run under trace8
passes even with the fix removed (vacuous). The load-bearing churn
regression is in run-http2.sh at info; smoke.sh runs at trace8 for its
state-machine traces, so its churn check is a gross-sanity pass only.
Full analysis, reproduction recipe, and the fix are in
server/mpm/motorz/MOTORZ.README ("HTTP/2 async handoff").
Official mod_http2 pytest suite (test/modules/http2/): test_h2_106_02 now
skips on MPMs that do not register ServerLimit (i.e. mpm_motorz, whose static
fixed-size process pool makes StartServers the hard daemon limit, so ServerLimit
is meaningless and unregistered). The test's ServerLimit/MaxConnectionsPerChild
config is a syntax error there; prefork/worker/event still run it unchanged.
MaxConnectionsPerChild itself IS supported by motorz (a core directive honored by
the supervisor). Validated: full http2 suite green on both event and motorz
(aside from the known-flaky proxy-backend tests), and the motorz custom suite
(smoke/http1/http2) passes with 0 churn response-loss.
Docs and packaging: add the mpm_motorz manual page (docs/manual/mod/motorz.xml
+ .meta, registered in allmodules.xml) documenting the threading model, async
handling, admission control, and the PollersPerChild directive; add a CHANGES
entry; and insert motorz into the default-MPM fallback chain (server/mpm/
config2.m4) between event and worker.
git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@1934868 13f79535-47bb-0310-9956-ffa450edef68