From: Michal Nowak
Date: Thu, 11 Jun 2026 08:32:13 +0000 (+0000)
Subject: Retry pipequeries on a transient EADDRINUSE in the pipelined test
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=b9cf877277d37db7148a917581a57c7702a35b33;p=thirdparty%2Fbind9.git

Retry pipequeries on a transient EADDRINUSE in the pipelined test

On FreeBSD, the TCP connect() call can transiently fail with EADDRINUSE
under parallel CI load. The netmgr already retries such connects (see
#3451), but it retries on the same socket, which is already bound to
the same ephemeral source port, so when the four-tuple is genuinely
busy (e.g. in TIME_WAIT) every retry fails the same way. pipequeries
then exits with "request event result: address in use", leaving raw.1
empty and failing the first check.

All eight requests share a single TCP dispatch, so the failed connect
means no query ever reached ns4 and its cache is still cold. It is
therefore safe to run pipequeries again: a fresh process binds a new
ephemeral port, and the out-of-order check keeps its meaning. Retry
for up to ten attempts, but only on this specific transient error.

Assisted-by: Claude:claude-fable-5
Assisted-by: Claude:claude-opus-4-8
---

diff --git a/bin/tests/system/pipelined/tests.sh b/bin/tests/system/pipelined/tests.sh
index 59dfda940ab..8b87df1fcec 100644
--- a/bin/tests/system/pipelined/tests.sh
+++ b/bin/tests/system/pipelined/tests.sh
@@ -32,7 +32,32 @@
 n=1
 ret=0
 echo_i "check pipelined TCP queries ($n)"
-pipequeries raw.$n || ret=1
+# On FreeBSD, the TCP connect() call can transiently fail with
+# EADDRINUSE even after the netmgr retried it in place: the socket is
+# already bound, so retrying on the same source port cannot help.
+# pipequeries then bails out before any query is sent, which leaves
+# the ns4 cache cold, so it is safe to simply run it again (and the
+# out-of-order check below remains meaningful on a repeated run).
+#
+# This loop is a workaround for the pipequeries.c implementation. If
+# pipequeries is ever rewritten in pure Python (using the test suite's
+# own DNS machinery, which can pick a fresh source port per attempt),
+# this retry should no longer be necessary and can be dropped.
+pq_left=10
+while :; do
+  ret=0
+  pipequeries raw.$n 2>pipequeries.err.$n || ret=1
+  cat pipequeries.err.$n >&2
+  pq_left=$((pq_left - 1))
+  if [ $ret -eq 0 ] || [ $pq_left -le 0 ]; then
+    break
+  fi
+  if ! grep "address in use" pipequeries.err.$n >/dev/null; then
+    break
+  fi
+  echo_i "retrying pipequeries after a transient connect failure"
+  sleep 1
+done
 awk '{ print $1 " " $5 }' output.$n
 sort output-sorted.$n
 diff ref output-sorted.$n || {
diff --git a/bin/tests/system/pipelined/tests_sh_pipelined.py b/bin/tests/system/pipelined/tests_sh_pipelined.py
index 7efe483da43..fe83536b42d 100644
--- a/bin/tests/system/pipelined/tests_sh_pipelined.py
+++ b/bin/tests/system/pipelined/tests_sh_pipelined.py
@@ -14,6 +14,7 @@ import pytest
 pytestmark = pytest.mark.extra_artifacts(
     [
         "output*",
+        "pipequeries.err*",
         "raw*",
         "ans*/ans.run",
     ]
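
The in-code comment notes that a pure-Python client could pick a fresh
source port per attempt. A minimal sketch of that idea follows, assuming
plain TCP sockets from the Python standard library. It is not the
pipequeries implementation and not the test suite's DNS machinery; the
names connect_fresh, make_socket and sleep are hypothetical and exist
only for illustration. Each attempt opens a new, unbound socket, so the
kernel is free to choose a different ephemeral port, and only EADDRINUSE
is retried, mirroring the shell loop's "only this specific error" rule.

```python
import errno
import socket
import time


def connect_fresh(address, attempts=10, delay=1.0, make_socket=None, sleep=time.sleep):
    """Connect over TCP, opening a brand-new socket for every attempt.

    Hypothetical helper: only EADDRINUSE is retried; any other error,
    or EADDRINUSE on the final attempt, propagates to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    if make_socket is None:
        # Default factory: an unbound IPv4 TCP socket, so the kernel
        # picks the ephemeral source port at connect() time.
        def make_socket():
            return socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    for left in range(attempts, 0, -1):
        sock = make_socket()
        try:
            sock.connect(address)
            return sock
        except OSError as exc:
            # Discard the failed socket instead of retrying on it.
            sock.close()
            if exc.errno != errno.EADDRINUSE or left == 1:
                raise
        sleep(delay)
```

A caller would use it as sock = connect_fresh(("10.53.0.4", port)), where
the address and port are placeholders. The make_socket and sleep
parameters are there so the retry logic can be exercised without a
network or real delays.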