From: Michal Nowak <mnowak@isc.org>
Date: Thu, 11 Jun 2026 08:32:13 +0000 (+0000)
Subject: Retry pipequeries on a transient EADDRINUSE in the pipelined test
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=b9cf877277d37db7148a917581a57c7702a35b33;p=thirdparty%2Fbind9.git

On FreeBSD, the TCP connect() call can transiently fail with
EADDRINUSE under parallel CI load.  The netmgr already retries such
connects (see #3451), but it retries on the same socket, which is
already bound to an ephemeral source port, so when the four-tuple is
genuinely busy (e.g. in TIME_WAIT) every retry fails the same way.
pipequeries then exits with "request event result: address in use",
leaving raw.1 empty and failing the first check.

All eight requests share a single TCP dispatch, so the failed connect
means no query ever reached ns4 and its cache is still cold.  It is
therefore safe to run pipequeries again: a fresh process binds a new
ephemeral port, and the out-of-order check keeps its meaning.  Make up
to ten attempts in total, retrying only on this specific transient
error.

Assisted-by: Claude:claude-fable-5
Assisted-by: Claude:claude-opus-4-8
---
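Note (not part of the commit): the retry policy described above can be
sketched in Python, roughly as it might look if the loop were ever moved
out of tests.sh.  This is only an illustration under stated assumptions:
run_with_retry and its parameters are hypothetical names, not part of
the BIND 9 test suite, and the only detail taken from the commit is the
"address in use" marker, the limit of ten attempts, and the one-second
pause.  Input redirection for the command is left out for brevity.

```python
import subprocess
import time

# Text that pipequeries prints on EADDRINUSE, as quoted in the commit message.
TRANSIENT_MARKER = "address in use"


def run_with_retry(cmd, attempts=10, delay=1.0, marker=TRANSIENT_MARKER,
                   run=subprocess.run, sleep=time.sleep):
    """Run cmd up to `attempts` times; retry only on the transient error.

    Hypothetical helper: `run` and `sleep` are injectable so the policy
    can be exercised without starting a real process.
    Returns (last_result, number_of_attempts_made).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    result = None
    attempt = 0
    for attempt in range(1, attempts + 1):
        # Each attempt starts a fresh process, hence a fresh source port.
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            break  # success, nothing to retry
        if marker not in result.stderr:
            break  # a different failure: do not mask it by retrying
        if attempt < attempts:
            sleep(delay)  # brief pause before the next attempt
    return result, attempt
```

A caller would treat a non-zero returncode in the returned result as a
test failure, exactly as tests.sh does with ret=1 once the loop ends.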

diff --git a/bin/tests/system/pipelined/tests.sh b/bin/tests/system/pipelined/tests.sh
index 59dfda940ab..8b87df1fcec 100644
--- a/bin/tests/system/pipelined/tests.sh
+++ b/bin/tests/system/pipelined/tests.sh
@@ -32,7 +32,32 @@ n=1
 ret=0
 
 echo_i "check pipelined TCP queries ($n)"
-pipequeries <input >raw.$n || ret=1
+# On FreeBSD, the TCP connect() call can transiently fail with
+# EADDRINUSE even after the netmgr retried it in place: the socket is
+# already bound, so retrying on the same source port cannot help.
+# pipequeries then bails out before any query is sent, which leaves
+# the ns4 cache cold, so it is safe to simply run it again (and the
+# out-of-order check below remains meaningful on a repeated run).
+#
+# This loop is a workaround for the pipequeries.c implementation.  If
+# pipequeries is ever rewritten in pure Python (using the test suite's
+# own DNS machinery, which can pick a fresh source port per attempt),
+# this retry should no longer be necessary and can be dropped.
+pq_left=10
+while :; do
+  ret=0
+  pipequeries <input >raw.$n 2>pipequeries.err.$n || ret=1
+  cat pipequeries.err.$n >&2
+  pq_left=$((pq_left - 1))
+  if [ $ret -eq 0 ] || [ $pq_left -le 0 ]; then
+    break
+  fi
+  if ! grep "address in use" pipequeries.err.$n >/dev/null; then
+    break
+  fi
+  echo_i "retrying pipequeries after a transient connect failure"
+  sleep 1
+done
 awk '{ print $1 " " $5 }' <raw.$n >output.$n
 sort <output.$n >output-sorted.$n
 diff ref output-sorted.$n || {
diff --git a/bin/tests/system/pipelined/tests_sh_pipelined.py b/bin/tests/system/pipelined/tests_sh_pipelined.py
index 7efe483da43..fe83536b42d 100644
--- a/bin/tests/system/pipelined/tests_sh_pipelined.py
+++ b/bin/tests/system/pipelined/tests_sh_pipelined.py
@@ -14,6 +14,7 @@ import pytest
 pytestmark = pytest.mark.extra_artifacts(
     [
         "output*",
+        "pipequeries.err*",
         "raw*",
         "ans*/ans.run",
     ]