[PATCH] mpm_motorz: performance, bug fixes, concurrency hardening, async HTTP/2

Performance:
- Accept drain loop: motorz_io_accept now drains the kernel accept queue
  in one poll wakeup (do/while until EAGAIN, admission-disabled, or die_now)
  instead of accepting one connection per round-trip through apr_pollset_poll.
  A burst of N connections no longer costs O(N) poll wakeups.
- Hot-path log levels: 25 APLOG_DEBUG calls on the per-request path demoted to
  APLOG_TRACE6/TRACE7/TRACE8, matching event MPM practice. Error and admission-
  control events remain at DEBUG.
- Redundant pollset_remove removed: the defensive apr_pollset_remove in
  motorz_io_process is gone from the common dispatch path (connection is already
  claimed before reaching here); isolated to the clogging-filter branch where it
  is actually needed.
- Admission control: added an active-thread saturation check
  (active >= threads_per_child) alongside the existing idle==0 and pending>=hi
  checks. This catches slow-client / keep-alive-heavy saturation, where the
  task queue appears empty but all workers are blocked in I/O. Also fixes an
  apr_size_t underflow: total is now read before idle, and active is clamped as
  (total > idle) ? total - idle : 0, which prevents a spurious disable during
  graceful restart.
- Hysteresis low-water mark tightened from 50% to 75% of ThreadsPerChild, so
  the listener re-enables sooner after a burst subsides.
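
The drain loop and the clamped saturation check can be sketched together as
below. This is a minimal illustration, not the motorz code: pool_stats,
fake_listener, try_accept and drain_accept_queue are hypothetical names, and
the simulated listener stands in for a non-blocking accept that reports EAGAIN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of worker-pool counters; field names are illustrative. */
typedef struct {
    size_t total;             /* worker threads in the pool */
    size_t idle;              /* workers waiting for a task */
    size_t pending;           /* queued tasks not yet picked up */
    size_t pending_hi;        /* task-queue high-water mark */
    size_t threads_per_child; /* configured ThreadsPerChild */
} pool_stats;

/* total and idle are sampled at different instants, so idle can briefly
 * exceed total; clamping avoids an unsigned wrap to a huge "active" count. */
static size_t active_workers(size_t total, size_t idle)
{
    return (total > idle) ? total - idle : 0;
}

/* True when any of the three saturation checks trips. */
static bool admission_disabled(const pool_stats *s)
{
    assert(s != NULL);
    return s->idle == 0
        || s->pending >= s->pending_hi
        || active_workers(s->total, s->idle) >= s->threads_per_child;
}

/* Stand-in for a non-blocking listening socket: try_accept() returns false
 * once the simulated kernel queue is empty, which models EAGAIN. */
typedef struct { size_t queued; } fake_listener;

static bool try_accept(fake_listener *l)
{
    if (l->queued == 0)
        return false;
    l->queued--;
    return true;
}

/* Drain as many connections as possible in one wakeup; returns the count. */
static size_t drain_accept_queue(fake_listener *l, pool_stats *s,
                                 const bool *die_now)
{
    size_t accepted = 0;
    do {
        if (!try_accept(l))
            break;                 /* queue empty: the EAGAIN case */
        accepted++;
        if (s->idle > 0)
            s->idle--;             /* an idle worker takes the connection */
        else
            s->pending++;          /* otherwise it waits in the task queue */
    } while (!admission_disabled(s) && !*die_now);
    return accepted;
}
```

With 8 idle workers and 20 queued connections, the sketch accepts 8 and then
stops because idle reaches 0; the remaining 12 stay in the kernel queue until
admission is re-enabled.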

Bug fixes:
- Clogging-filter timer race (use-after-free): the SSL/clogging path in
  motorz_io_process bypassed motorz_conn_claim(). A pending timer could fire
  concurrently while the worker was inside ap_run_process_connection(), dispatching
  a timeout worker on the same scon. Fixed by replacing the bare pollset_remove
  with motorz_conn_claim(), which atomically disarms both the pollset entry and
  the timer under poller->mtx.
- motorz_resume_suspended: restore c->sbh before ap_run_resume_connection().
  motorz_suspend_connection() NULLs c->sbh (matching event's notify_suspend);
  without this fix any module calling ap_update_child_status(c->sbh) after
  resume dereferenced NULL.
- SERVER_BUSY_READ scoreboard update: add ap_update_child_status(scon->sbh,
  SERVER_BUSY_READ, NULL) before ap_run_process_connection(), matching event MPM
  and fixing misleading mod_status output where motorz connections showed as
  SERVER_READY throughout the read/process phase.
- requests_this_child data race: written by the accepting poller thread, read
  by the supervisor main thread. Declared volatile to prevent stale cached reads
  across threads.
- conn_id always zero: every connection was created with conn_id=0, breaking
  %{connection} log formats and any module keying state on c->id. Now derived
  as ID_FROM_CHILD_THREAD(my_child_num, atomic_seq), matching worker/event MPM
  formula for globally unique IDs across children and connections.
- ap_create_sb_handle hardcoded child 0: all connections reported as child 0
  in the scoreboard, making mod_status show all activity in slot [0][0].
  Now passes my_child_num so each child's activity appears in the correct slot.
- Worker threads not drained on exit: apr_thread_pool_destroy() is now called
  in clean_child_exit() before apr_pool_destroy(pchild), joining all workers
  and preventing use-after-free in ap_log_error / apr_pool_clear after the
  pool is torn down.
- ThreadsPerChild 1 throughput collapse: the hysteresis low-water mark becomes
  (1*3)/4=0, so listeners only re-enable when the task queue is completely
  empty. Added startup/runtime warnings (APLOGNO 10555/10556) advising
  ThreadsPerChild >= 4. next-number advanced to 10557.
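
The ThreadsPerChild 1 collapse follows directly from integer division. A
minimal sketch of the arithmetic, with hypothetical helper names; the
re-enable rule (pending <= low-water mark) and the warning condition
(ThreadsPerChild < 4) are assumptions inferred from the description above, not
copied from motorz.c:

```c
#include <assert.h>
#include <stdbool.h>

/* 75% of ThreadsPerChild, truncated by integer division. */
static unsigned low_water_mark(unsigned threads_per_child)
{
    assert(threads_per_child > 0);
    return (threads_per_child * 3) / 4;
}

/* Assumed rule: listeners re-enable once the task queue has drained to the
 * low-water mark or below. */
static bool may_reenable_listeners(unsigned pending, unsigned threads_per_child)
{
    return pending <= low_water_mark(threads_per_child);
}

/* Assumed trigger for the advisory warning (ThreadsPerChild >= 4 advised). */
static bool pool_too_small(unsigned threads_per_child)
{
    return threads_per_child < 4;
}
```

For ThreadsPerChild 1 the mark is (1*3)/4 = 0, so a single queued task keeps
the listeners disabled; for ThreadsPerChild 8 the mark is 6.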

Multi-poller scale-out (PollersPerChild):
- motorz_core_t no longer holds a single pollset/timeout_ring/mtx/recycle-list;
  these are moved to per-poller motorz_poller_t contexts.
- Each poller owns its pollset, skiplist timer ring, ring mutex, and lock-free
  MPSC transaction-pool recycle list, so pollers never contend with each other.
- Connections are sharded round-robin to pollers at accept time (scon->poller);
  pool recycling returns to the accepting poller's free-list (scon->pool_poller).
- PollersPerChild directive added (0 = auto from online CPUs, capped at 8).
- Listener admission control and the pipe-of-death/generation supervision are
  isolated to poller 0 and the main-thread supervisor respectively.
- AP_MPMQ_CAN_SUSPEND / motorz_resume_suspended hook wired in for full
  async-suspend support (fix for CONN_STATE_SUSPENDED lifecycle).
- Non-blocking lingering close: replaced blocking ap_lingering_close() with
  motorz_start_lingering_close() / motorz_lingering_close() that hand the
  draining socket back to the poll loop with a bounded linger timeout.
- Pool cleanup (motorz_conn_pool_cleanup) cancels the timer under poller->mtx
  so pool destruction never leaves a dangling skiplist entry.
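
The poller-count resolution and round-robin sharding can be sketched as
follows. The function names and MAX_AUTO_POLLERS are hypothetical; applying the
cap of 8 only to the auto-detected value, and using a C11 atomic counter for
the shard index, are assumptions made for the illustration:

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_AUTO_POLLERS 8u  /* cap on the auto-detected poller count */

/* configured == 0 means "auto": use the online CPU count, capped at 8.
 * Assumption: an explicit non-zero setting is taken as-is. */
static unsigned resolve_pollers(unsigned configured, unsigned online_cpus)
{
    if (configured != 0)
        return configured;
    if (online_cpus == 0)
        return 1;            /* CPU count unavailable: fall back to one poller */
    return online_cpus < MAX_AUTO_POLLERS ? online_cpus : MAX_AUTO_POLLERS;
}

/* Round-robin shard selection at accept time. The atomic counter keeps the
 * sequence consistent even if more than one thread were to accept. */
static unsigned next_poller(atomic_uint *counter, unsigned num_pollers)
{
    assert(num_pollers > 0);
    return atomic_fetch_add(counter, 1u) % num_pollers;
}
```

With three pollers, successive accepts land on pollers 0, 1, 2, 0, and so on;
each connection then stays on its poller for pollset, timer, and pool-recycle
operations.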

Async HTTP/2 handoff -- ENABLED (MOTORZ_ENABLE_ASYNC 1):
- motorz reports AP_MPMQ_IS_ASYNC=1 / AP_MPMQ_CAN_WAITIO=1. motorz_io_process()
  implements CONN_STATE_ASYNC_WAITIO: arm the pollset for read/write per
  c->cs->sense under Timeout and re-dispatch into PROCESSING, mirroring event.
  New APLOGNOs 10557-10559.
- Clogging-filter branch honors the hook-returned connection state
  (WRITE_COMPLETION / ASYNC_WAITIO / SUSPENDED) and maps KEEPALIVE to
  WRITE_COMPLETION instead of force-closing to LINGER. h2 c2 connections set
  clogging_input_filters unconditionally, so the old behavior collapsed h2
  keep-alive into one-shot connections.
- Forward-declare motorz_update_listeners() (called from motorz_io_accept
  before its definition; an implicit declaration is a hard error under strict
  C). Replaced a dead duplicate clean_child_exit prototype.
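
The clogging-filter state mapping described above can be sketched as a small
pure function. The enum is a hypothetical stand-in for httpd's connection
states, and sending every other state to LINGER is an assumption for the
illustration:

```c
#include <assert.h>

/* Hypothetical mirror of the connection states named above. */
typedef enum {
    ST_PROCESSING,
    ST_KEEPALIVE,
    ST_WRITE_COMPLETION,
    ST_ASYNC_WAITIO,
    ST_SUSPENDED,
    ST_LINGER
} conn_state;

/* Decide what the MPM does after the process-connection hook returns on the
 * clogging-filter path. */
static conn_state after_clogging_hook(conn_state returned)
{
    switch (returned) {
    case ST_WRITE_COMPLETION:
    case ST_ASYNC_WAITIO:
    case ST_SUSPENDED:
        return returned;              /* honor the hook's decision */
    case ST_KEEPALIVE:
        return ST_WRITE_COMPLETION;   /* keep the connection reusable */
    default:
        return ST_LINGER;             /* assumed fallback: close gracefully */
    }
}
```

The old behavior corresponds to returning ST_LINGER unconditionally, which is
why h2 keep-alive collapsed into one-shot connections.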

The async-handoff churn bug -- FIXED in mod_http2 (h2_session.c):
  Under async handoff mod_http2 hands the master (c1) connection back to the MPM
  between requests; motorz re-dispatches it on a fresh worker. Under rapid HTTP/2
  connection churn this raced mod_http2's stream lifecycle: a client's graceful
  GOAWAY drove the c1 session straight to DONE -> CONN_STATE_LINGER, and the MPM
  close ran m_stream_cleanup()/h2_c2_abort() on any stream whose secondary
  connection (c2) had emitted its response but not yet called c2_prod_done() --
  silently dropping that response (~0.2-3% under h2load -n.. -c50 -m1).

  The fix establishes the invariant "a c1 connection is closed only after every
  stream's c2 has finished and flushed" at two points in h2_session.c:
    * h2_session_ev_remote_goaway(): a graceful GOAWAY (error code 0) with
      streams still in flight no longer transits to H2_SESSION_ST_DONE. It RSTs
      only the unprocessed streams and keeps the session running so the in-flight
      streams complete and their c2 output is written. (An error GOAWAY, or one
      with no open streams, still goes to DONE immediately.) This also matches
      RFC 9113: a peer GOAWAY stops new streams, it does not abort streams at or
      below its last-stream-id.
    * H2_SESSION_ST_IDLE handling: once those streams drain (open_streams == 0)
      and the remote has shut down, send our GOAWAY and go to DONE from IDLE.
      Reaching DONE only here -- after the c2s are done and flushed -- keeps the
      close from racing an in-flight c2.
  This benefits mpm_event too and is a conformance improvement, not just a
  motorz workaround. MOTORZ_ENABLE_ASYNC remains a single flip point (set 0 to
  fall back to advertising IS_ASYNC=0) should a regression ever reappear.
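
The two decision points reduce to the predicates sketched below. The type and
function names are hypothetical and the real h2_session.c logic carries more
state; this only models the rules stated above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SESSION_KEEP_RUNNING, SESSION_GO_DONE } goaway_action;

/* Remote GOAWAY: a graceful one (error code 0) with streams in flight keeps
 * the session running; an error GOAWAY, or one with no open streams, ends it. */
static goaway_action on_remote_goaway(uint32_t error_code, size_t open_streams)
{
    if (error_code == 0 && open_streams > 0)
        return SESSION_KEEP_RUNNING;
    return SESSION_GO_DONE;
}

/* IDLE handling: go to DONE only once every stream has drained and the
 * remote side has shut down. */
static bool idle_may_finish(size_t open_streams, bool remote_shutdown)
{
    return open_streams == 0 && remote_shutdown;
}
```

Under these rules a graceful GOAWAY with one stream in flight yields
SESSION_KEEP_RUNNING, and DONE is reached later from IDLE once open_streams
drops to 0.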

Hardening of the fix (this change):
- motorz.c, smoke.sh: corrected stale "async DISABLED" comments that still
  described the old workaround while the code already enabled async.
- h2_session.c: documented the liveness bound -- keeping the session alive on a
  graceful GOAWAY cannot pin c1 open indefinitely, since a wedged c2 is bounded
  by its own request Timeout, which drops open_streams and lets c1 reach IDLE.
- h2_session.h: documented the open_streams threading invariant the fix now
  relies on (c1-thread only; async re-dispatch is successive not concurrent, so
  no atomics/volatile needed; decrements to 0 only after each c2 has flushed).

Tests (server/mpm/motorz/test/): setup.sh configures+builds httpd; run-all.sh
  runs the smoke, HTTP/1.1, and HTTP/2-over-TLS suites; bench.sh compares motorz
  vs event throughput. The async assertions expect async ON
  (CONN_STATE_ASYNC_WAITIO arms / "returning to mpm c1 monitoring" appears).

  The churn regression is written to measure the fix correctly; it avoids two
  pitfalls, both documented in MOTORZ.README and encoded in the tests:
    * Assert on RESPONSE LOSS (started - succeeded), NOT on h2load's "failed"
      total. "failed" also counts connection-establishment errors (ephemeral
      port / accept-queue pressure on busy loopback) which are environmental and
      appear with and without the fix; only started > succeeded is this bug.
    * Measure at LogLevel info, NOT trace8. The bug is a Heisenbug; trace8 slows
      the hot path enough to hide it, so a churn assertion run under trace8
      passes vacuously, even with the fix removed. The load-bearing churn
      regression is in run-http2.sh at info; smoke.sh runs at trace8 for its
      state-machine traces, so its churn check is a gross-sanity pass only.
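
  The response-loss metric can be sketched as below. The actual tests are shell
  scripts; this C version only illustrates the arithmetic. The h2load summary
  line format used here is an assumption about h2load's output, and
  h2load_counts, parse_requests_line and response_loss are hypothetical names:

```c
#include <assert.h>
#include <stdio.h>

typedef struct {
    unsigned long started;
    unsigned long succeeded;
    unsigned long failed;
} h2load_counts;

/* Parse a summary line of the assumed form
 * "requests: N total, N started, N done, N succeeded, N failed, ...".
 * Returns 0 on success, -1 if the line does not match. */
static int parse_requests_line(const char *line, h2load_counts *out)
{
    unsigned long total, started, done, succeeded, failed;
    int n;

    assert(line != NULL && out != NULL);
    n = sscanf(line,
               "requests: %lu total, %lu started, %lu done, "
               "%lu succeeded, %lu failed",
               &total, &started, &done, &succeeded, &failed);
    if (n != 5)
        return -1;
    out->started = started;
    out->succeeded = succeeded;
    out->failed = failed;
    return 0;
}

/* Responses that were requested but never completed successfully. */
static unsigned long response_loss(const h2load_counts *c)
{
    return (c->started > c->succeeded) ? c->started - c->succeeded : 0;
}
```

  A run that reports 990 started, 990 succeeded and 10 failed has a response
  loss of 0: the 10 failures never started a request, so they are environmental
  and not this bug.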

  Full analysis, reproduction recipe, and the fix are in
  server/mpm/motorz/MOTORZ.README ("HTTP/2 async handoff").

Official mod_http2 pytest suite (test/modules/http2/): test_h2_106_02 now
skips on MPMs that do not register ServerLimit (i.e. mpm_motorz, whose static
fixed-size process pool makes StartServers the hard daemon limit, so ServerLimit
is meaningless and unregistered). The test's ServerLimit/MaxConnectionsPerChild
config is a syntax error there; prefork/worker/event still run it unchanged.
MaxConnectionsPerChild itself IS supported by motorz (a core directive honored by
the supervisor). Validated: the full http2 suite is green on both event and
motorz (known-flaky proxy-backend tests aside), and the motorz custom suite
(smoke/http1/http2) passes with 0 churn response-loss.

Docs and packaging: add the mpm_motorz manual page (docs/manual/mod/motorz.xml
+ .meta, registered in allmodules.xml) documenting the threading model, async
handling, admission control, and the PollersPerChild directive; add a CHANGES
entry; and insert motorz into the default-MPM fallback chain (server/mpm/
config2.m4) between event and worker.

git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@1934868 13f79535-47bb-0310-9956-ffa450edef68

author	Jim Jagielski <jim@apache.org>
	Tue, 2 Jun 2026 10:04:18 +0000 (10:04 +0000)
committer	Jim Jagielski <jim@apache.org>
	Tue, 2 Jun 2026 10:04:18 +0000 (10:04 +0000)
commit	51651aec5397f9ba2b474ba7f39ddb7278369365
tree	489bddddc76b852513675735c730220c1a92dc65
parent	52e6887eced229f562e7cc6599184dcc6fced321

CHANGES
docs/log-message-tags/next-number
docs/manual/mod/allmodules.xml
docs/manual/mod/motorz.xml	[new file with mode: 0644]
docs/manual/mod/motorz.xml.meta	[new file with mode: 0644]
modules/http2/h2_session.c
modules/http2/h2_session.h
server/mpm/config2.m4
server/mpm/motorz/MOTORZ.README
server/mpm/motorz/motorz.c
server/mpm/motorz/motorz.h
server/mpm/motorz/test/README.md	[new file with mode: 0644]
server/mpm/motorz/test/bench.sh	[new file with mode: 0755]
server/mpm/motorz/test/lib.sh	[new file with mode: 0644]
server/mpm/motorz/test/run-all.sh	[new file with mode: 0755]
server/mpm/motorz/test/run-http1.sh	[new file with mode: 0755]
server/mpm/motorz/test/run-http2.sh	[new file with mode: 0755]
server/mpm/motorz/test/setup.sh	[new file with mode: 0755]
server/mpm/motorz/test/smoke.sh	[new file with mode: 0755]
test/modules/http2/test_106_shutdown.py