Andrew Tridgell [Mon, 15 Jun 2026 21:54:07 +0000 (07:54 +1000)]
ci: run scan-build on pinned clang-18 + latest clang (informational)
Split the scan-build workflow into two non-gating jobs, each uploading
its HTML report as an artifact:
- pinned-clang18: clang-18 / clang-tools-18 on ubuntu-24.04, so the
checker set -- and thus the report -- is deterministic.
- informational-latest: whatever clang ubuntu-latest ships, to surface
what newer analyzers see.
Both are informational (no --status-bugs): the tree still has known
clang-18 findings, so the run reports without blocking the build. Once
the tree is at zero for clang-18, re-add --status-bugs to the pinned job
to turn it back into a gate. Installs libpopt-dev so configure finds
popt under the scan-build compiler wrapper.
Andrew Tridgell [Fri, 12 Jun 2026 00:53:10 +0000 (10:53 +1000)]
scan-build: fix resource leaks on error paths
clientserver.c: close the --early-input-file FILE* on the
fstat/oversize/early-EOF error returns; it was only closed on the
success path.
getgroups.c: free the gid list before returning.
Andrew Tridgell [Mon, 15 Jun 2026 20:43:20 +0000 (06:43 +1000)]
scan-build: drop dead assignments
Remove stores that are never read before being overwritten or going
out of scope. No behavior change except batch.c write_opt, which now
accumulates the leading-space write error into the return value
(consistent with the arg branch) instead of discarding it.
Andrew Tridgell [Fri, 12 Jun 2026 00:53:01 +0000 (10:53 +1000)]
scan-build: zero-init buffers the analyzer can't prove are written
clang's static analyzer doesn't model SIVAL/SIVAL64/SIVALu or
getpeername/getsockname as initializing their target bytes, so it
reports false "garbage value" reads. Zero-init the affected buffers;
the bytes are always overwritten at runtime, so this only quiets the
analyzer.
Andrew Tridgell [Sat, 13 Jun 2026 22:02:20 +0000 (08:02 +1000)]
runtests: write valgrind logs to a world-writable subdir
Under --valgrind some tests run rsync with reduced privileges: partial_nowrite
wraps it in "setpriv --inh-caps -all --bounding-set -all" to force EACCES, and
chdir-symlink-race's daemon drops to the module's uid. Such a child cannot
create valgrind's --log-file in a root-owned scratchbase, so valgrind aborts at
startup and the test fails (seen only in the root + --use-tcp cell).
Put the logs in a 1777 valgrind-logs/ subdir so a privilege-dropped child can
always write them. Scan and cleanup are unchanged; the logs just move one
directory down.
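A minimal sketch of creating such a directory, assuming a hypothetical helper
name (make_log_dir); the real runtests.py code may differ. The mode passed to
mkdir is filtered by the umask, so the sketch sets 1777 explicitly with chmod.

```python
import os
import stat
import tempfile

def make_log_dir(scratchbase):
    """Create a sticky, world-writable valgrind-logs/ subdir (mode 1777)."""
    path = os.path.join(scratchbase, "valgrind-logs")
    os.makedirs(path, exist_ok=True)
    # mkdir's mode argument is masked by the umask, so chmod explicitly.
    os.chmod(path, 0o1777)
    return path

if __name__ == "__main__":
    base = tempfile.mkdtemp()
    log_dir = make_log_dir(base)
    print(oct(stat.S_IMODE(os.stat(log_dir).st_mode)))
```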
Andrew Tridgell [Sat, 13 Jun 2026 20:44:21 +0000 (06:44 +1000)]
generator: don't read an unstat'd sx.st when creating a device/special
This fixes a valgrind error where we could read an uninitialised sx.st
field when we don't fill the stat data.
Also drop the now-obsolete testsuite/valgrind.supp stanzas for these
reads (atomic_create/delete_item, plus the rwrite strlcpy over-read that
master already fixed) -- they are no longer needed now that the reads are gone.
Thanks to a report from Michael Mess <michael@michaelmess.de>
Andrew Tridgell [Sat, 13 Jun 2026 08:36:53 +0000 (18:36 +1000)]
wildtest: don't read past the buffer when scanning a test line
main()'s line parser stepped through the fgets() buffer with `*++s` in
three places without first checking for the terminating NUL, so a test
line whose last token runs to the end of the buffer (e.g. a final line
with no trailing newline) could advance s past the NUL and read out of
bounds.
Guard the flag-separator check and rewrite the two whitespace-skip loops
so they never step past the NUL. No behaviour change for well-formed
input: the existing wildtest.txt still passes, and the crafted overflow
input is now clean under valgrind.
Andrew Tridgell [Sat, 13 Jun 2026 08:12:07 +0000 (18:12 +1000)]
log: copy forwarded message by length in rwrite(), not strlcpy()
The valgrind memcheck CI flagged 'Conditional jump depends on uninitialised
value(s)' in rwrite() -> strlcpy() (log.c) and the subsequent logit() fprintf.
rwrite()'s daemon/logfile branch did strlcpy(msg, buf, MIN(sizeof msg, len+1)),
but strlcpy() scans the whole source with strlen(); buf is the data buffer from
read_a_msg() (io.c) holding exactly len bytes of a forwarded MSG_* payload with
no NUL terminator, so strlen() reads past the message into uninitialised stack.
Copy exactly len (bounded) bytes with memcpy() and NUL-terminate, matching the
(buf, len) contract the rest of rwrite() already honours. Behaviour is
unchanged for the NUL-terminated callers; the over-read is gone.
Full testsuite under valgrind (1572 logs) now reports zero unsuppressed errors.
Andrew Tridgell [Sat, 13 Jun 2026 07:24:18 +0000 (17:24 +1000)]
ci: build test helpers before the valgrind run
`make` alone does not build the CHECK_PROGS test helpers (tls, trimslash,
t_chmod_secure, ...), so runtests.py exited immediately with "missing
test helper program(s)", produced no valgrind logs, and the scan step
failed every job with "the suite did not run". Use `make check-progs`,
which builds rsync plus the helpers and symlink fixtures without running
the suite.
Andrew Tridgell [Sat, 13 Jun 2026 06:51:24 +0000 (16:51 +1000)]
testsuite: force C locale in reverse-daemon-delta byte-count parse
rsync groups the "sent/received N bytes" summary numbers using the
locale's thousands separator (e.g. de_DE uses '.'), which broke the
[\d,]+ parser and failed the test for testers in non-C locales. Run the
peer client under LC_ALL=C so the output is deterministic.
Reported-by: Michael Mess <michael@michaelmess.de>
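A small sketch of the failure mode, using made-up summary lines (the exact
output text and the parse_sent helper are assumptions, not the test's code).
A [\d,]+ pattern accepts the C-locale grouping but not a '.' separator:

```python
import re

# Hypothetical summary lines; only the thousands separator differs.
C_LOCALE_LINE = "sent 1,234,567 bytes  received 89 bytes"
DE_LOCALE_LINE = "sent 1.234.567 bytes  received 89 bytes"

SENT_RE = re.compile(r"sent ([\d,]+) bytes")

def parse_sent(line):
    """Return the sent byte count, or None if the line does not match."""
    m = SENT_RE.search(line)
    if m is None:
        return None
    return int(m.group(1).replace(",", ""))

if __name__ == "__main__":
    print(parse_sent(C_LOCALE_LINE))
    print(parse_sent(DE_LOCALE_LINE))
```

Running the child with LC_ALL=C in its environment (for example via the env
argument of subprocess.run) keeps the output in the first form.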
Add a .github/workflows/valgrind.yml that runs the full suite under
valgrind in a 2x2 matrix (user/root x pipe/tcp transport) and gates on
memory errors. It uses --leak-check=no: rsync intentionally leaves
file-list/socket/option memory unfreed at exit, so a leak check is
inherently noisy; the gate flags uninitialised reads, invalid
read/write, bad frees and uninit syscall params instead.
Add testsuite/valgrind.supp covering the known-benign reports (rwrite
strlcpy over-read on a non-NUL-terminated peer message, atomic_create/
delete_item st_mode read under fakeroot, libfakeroot msgsnd padding,
plus popt/xxhash leaks for manual --leak-check audits). runtests.py
--valgrind now loads it automatically.
Andrew Tridgell [Sat, 13 Jun 2026 07:49:18 +0000 (17:49 +1000)]
token: allow uncompressed literal runs larger than CHUNK_SIZE
The hardening in c44c90e9 added a check in simple_recv_token() rejecting
any uncompressed literal-run length > CHUNK_SIZE (32k). That assumption
breaks interoperability: other rsync implementations -- e.g. the acrosync
library used by the iOS "PhotoBackup" app -- use a 64k block size and
send literal runs of 65536 bytes, which 3.4.3+ now rejects with
"invalid uncompressed token length 65536".
The check was unnecessary: simple_recv_token() already reads the run
CHUNK_SIZE bytes at a time via the residue loop (n = MIN(CHUNK_SIZE,
residue)), so read_buf() never writes past the static CHUNK_SIZE buffer
regardless of the wire-supplied length. Drop the check to restore
interop; the compressed-token integer-overflow fix from c44c90e9 (the
MAX_TOKEN_INDEX / rx_token caps) is left unchanged.
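The residue loop can be modelled as follows. This is an illustrative Python
model, not the C code: read_literal_run and the stream argument are
hypothetical, and only CHUNK_SIZE and the MIN(CHUNK_SIZE, residue) step come
from the description above.

```python
import io

CHUNK_SIZE = 32 * 1024  # size of the fixed receive buffer (32k)

def read_literal_run(stream, run_length):
    """Yield a literal run in pieces no larger than CHUNK_SIZE."""
    residue = run_length
    while residue > 0:
        # Never request more than the buffer can hold, whatever the wire says.
        n = min(CHUNK_SIZE, residue)
        chunk = stream.read(n)
        if len(chunk) != n:
            raise EOFError("short read in literal run")
        residue -= n
        yield chunk

if __name__ == "__main__":
    wire = io.BytesIO(bytes(65536))  # a 64k literal run from a peer
    print([len(c) for c in read_literal_run(wire, 65536)])
```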
Will Sarg [Thu, 11 Jun 2026 05:02:42 +0000 (01:02 -0400)]
testsuite: fix executability test skip on FreeBSD (EFTYPE)
FreeBSD and OpenBSD return EFTYPE (errno 79) when chmod-ing a sticky bit
onto a regular file as non-root, rather than EPERM/EACCES. Catch OSError
and check errno against the expected skip set so the test skips correctly
on those platforms instead of erroring out.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
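A sketch of the errno check, under stated assumptions: the helper names and
the exact skip set are hypothetical, and the fallback value 79 is the BSD
number quoted above (Python's errno module does not define EFTYPE on every
platform).

```python
import errno
import os
import stat

# errno.EFTYPE is missing on some platforms (Linux, for one), so fall back
# to the BSD value quoted in the commit message.
EFTYPE = getattr(errno, "EFTYPE", 79)

# Assumed skip set: errnos that mean "this platform refuses the chmod".
SKIP_ERRNOS = {errno.EPERM, errno.EACCES, EFTYPE}

def should_skip(exc):
    """True if an OSError from chmod means the test cannot run here."""
    return exc.errno in SKIP_ERRNOS

def set_sticky_or_skip(path):
    """Try to add the sticky bit; return 'ok' or 'skip', re-raise the rest."""
    try:
        mode = stat.S_IMODE(os.stat(path).st_mode)
        os.chmod(path, mode | stat.S_ISVTX)
    except OSError as exc:
        if should_skip(exc):
            return "skip"
        raise
    return "ok"
```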
Andrew Tridgell [Wed, 10 Jun 2026 22:25:50 +0000 (08:25 +1000)]
abdiff: A/B differential regression hunter for rsync
testsuite/abdiff.py runs the same benign transfer with two rsync binaries
(A = build under test, B = a baseline) and compares the OUTCOME -- exit code,
stderr, --stats "Literal data", the destination tree (content + full metadata),
the --itemize list, and (with --cost) peak process-group RSS. For benign input
the two must be indistinguishable; any divergence is a regression candidate.
It is a developer tool, NOT a runtests.py test (does not end in _test.py).
Capabilities:
- Scenario sweeps over options / path shapes / file types / sizes / modes /
selection / placement / wire / transports, plus domain-knowledge pairwise +
combo sweeps and a stochastic fuzzer/role matrix.
- Transport lanes: local, ssh split (lsh.sh), stdio-pipe daemon, a REAL TCP
daemon (bound port + greeting/handshake/auth challenge-response), and the
restricted rrsync wrapper (support/rrsh.sh; each binary paired with its own
version's rrsync via --rrsync-a/--rrsync-b, since rrsync ships in the script).
- Stability gate: each binary is run N times and escalated on a candidate diff;
nondeterministic scenarios are quarantined FLAKY, never reported as regressions.
- Parallel (-j, default 20) with a per-run findings log; --loop runs until
--timelimit (or Ctrl-C), feeding the pool a half-random / half-systematic
stream of new combinations. As root an "all" run also folds in the root-only
sweeps (priv, daemonchroot).
- General coverage levers: a cost oracle (--cost, peak RSS over the whole process
group), transport lifted as an orthogonal axis, a resume/redo sweep, and
type-transition / nanosecond-mtime / scale (--scale N) fixtures.
Andrew Tridgell [Wed, 10 Jun 2026 01:18:54 +0000 (11:18 +1000)]
testsuite: add perftest.py to compare two rsync builds' transfer speed
A standalone dev tool (run directly, not via runtests.py) for catching
performance regressions between rsync releases. Given two rsync binaries it
builds one deterministic test tree -- heavy-tailed file sizes, a directory
spine, symlinks, hard links and a spread of permission modes, modelled on the
gentestdata generator -- then runs the two binaries ALTERNATELY for N loops,
timing each transfer, and reports the mean and standard deviation per binary.
Each loop times a full copy into an emptied destination and an incremental
no-op against an already-synced one (rsync's scan/file-list/stat overhead,
where many regressions hide); --mode selects. The first run of each binary is
dropped to reduce page-cache impact, the run order alternates to cancel drift,
and a B-vs-A slowdown is flagged only when it exceeds the run-to-run noise.
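The reporting step might look roughly like this. It is a sketch, not
perftest.py itself: the function names are hypothetical, and the noise
threshold (the sum of the two standard deviations) is an assumption chosen
for illustration.

```python
import statistics

def summarize(times):
    """Drop the first (cache-warming) run; return (mean, stdev) of the rest."""
    kept = times[1:]
    return statistics.mean(kept), statistics.stdev(kept)

def slowdown_flagged(times_a, times_b):
    """Flag B as slower than A only if the gap exceeds the run-to-run noise."""
    mean_a, sd_a = summarize(times_a)
    mean_b, sd_b = summarize(times_b)
    # Assumed threshold: the combined standard deviation of both binaries.
    return (mean_b - mean_a) > (sd_a + sd_b)

if __name__ == "__main__":
    a = [5.0, 1.0, 1.1, 0.9]   # first entry is the dropped warm-up run
    b = [5.0, 2.0, 2.1, 1.9]
    print(summarize(a), summarize(b), slowdown_flagged(a, b))
```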
Andrew Tridgell [Tue, 9 Jun 2026 02:04:17 +0000 (12:04 +1000)]
lib: use .balign in md5 x86-64 asm to fix macOS over-alignment
The file used ".align 16" intending 16-byte alignment (GNU/ELF semantics).
On macOS the Mach-O assembler reads ".align N" as 2^N, so it requested
64KB alignment for __TEXT,__text, producing:
ld: warning: reducing alignment of section __TEXT,__text from 0x10000
to 0x1000 because it exceeds segment maximum alignment
The linker clamps it back, so it was harmless, but .balign 16 means
16 bytes on every target and silences the warning.
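The arithmetic behind the warning, as a tiny illustration (the function names
are made up for this sketch):

```python
def elf_align_bytes(n):
    """GNU/ELF x86 semantics: .align N requests N bytes."""
    return n

def macho_align_bytes(n):
    """Mach-O semantics: .align N requests 2**N bytes."""
    return 1 << n

if __name__ == "__main__":
    print(elf_align_bytes(16), hex(macho_align_bytes(16)))
```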
Andrew Tridgell [Tue, 9 Jun 2026 02:04:17 +0000 (12:04 +1000)]
checksum: guard the AVX2 roll-asm path with a runtime CPUID check
When built with --enable-roll-asm, get_checksum1() called the AVX2 asm
routine get_checksum1_avx2_asm() unconditionally. Unlike the intrinsic
path (get_checksum1_avx2_64), which is function-multiversioned with a
target("default") fallback and so resolves safely on any CPU, the asm
routine is a single AVX2-only symbol with no fallback. On an x86-64 host
without AVX2 (an older CPU, or a VM that does not expose AVX2) the first
block checksum executes a VEX-encoded instruction and dies with SIGILL,
which surfaces as "connection unexpectedly closed (0 bytes received so
far)" and a code-12 protocol error.
Gate the asm call on a cached __builtin_cpu_supports("avx2") check, the
same signal the intrinsic resolver uses. When AVX2 is absent we skip it
and the SSSE3/SSE2/scalar steps (safe everywhere) do the work. Apply the
same guard in the simdtest harness so it can run on non-AVX2 hosts too.
Andrew Tridgell [Sun, 7 Jun 2026 23:47:57 +0000 (09:47 +1000)]
tests: add clang scan-build static-analysis CI (informational)
Run the clang static analyzer over a check-progs build, publish the HTML report
as an artifact, and print the bug count to the run summary. INFORMATIONAL only:
it does not pass --status-bugs, so it surfaces new analyzer findings without
going red on the existing (overwhelmingly false-positive) reports.
Runs on push/PR to master and via workflow_dispatch. No cron: it is
informational and its output only changes with the code (push/PR) or the clang
version, so a daily run on an unchanged tree would add noise without value.
Andrew Tridgell [Sun, 7 Jun 2026 23:47:57 +0000 (09:47 +1000)]
tests: add ASan+UBSan CI gate
Add a clang AddressSanitizer + UndefinedBehaviorSanitizer workflow that builds
rsync with -fsanitize=address,undefined -fno-sanitize-recover=undefined -DNDEBUG
and runs the full test suite over both the stdio-pipe and TCP daemon transports.
UBSAN_OPTIONS=halt_on_error=1 together with -fno-sanitize-recover=undefined makes
any undefined behaviour fatal, so this job gates: the tree must stay UBSan-clean.
The remaining findings are fixed in code (hashtable/mdfour shifts, xattrs, and
log.c's file_struct, kept aligned via rounding.h); only byteorder.h's intentional
unaligned accessors are suppressed, with no_sanitize. -DNDEBUG builds as a release
does (assert() compiled out) so ASan covers the production code paths.
Runs on push/PR to master and via workflow_dispatch, plus a weekly cron to
catch breakage from a moving ubuntu-latest/clang toolchain (push/PR already
cover every code change, so daily would just re-run an unchanged tree).
Andrew Tridgell [Sun, 7 Jun 2026 22:09:10 +0000 (08:09 +1000)]
io: drop the dead/unnecessary read_varint UBSan guard
The cherry-picked #428 wrapped no_sanitize attributes on read_varint() and
read_varlong() in `#ifndef CAREFUL_ALIGNMENT`, but byteorder.h always
#defines CAREFUL_ALIGNMENT (to 0 or 1), so that guard is never true and the
attributes were dead code.
They are also unnecessary: both functions read the assembled value through
an aligned union member (union { char b[5]; int32 x; }), not an unaligned
cast, so UBSan's alignment check never fires there (verified: the ASan+UBSan
suite is clean without them). Remove the whole block rather than fix the
guard. (The byteorder.h annotations from #428, which are real and correctly
placed inside the !CAREFUL_ALIGNMENT branch, are kept.)
Sam James [Mon, 9 Jan 2023 06:30:28 +0000 (06:30 +0000)]
Disable UBSAN for alignment-sensitive functions when !CAREFUL_ALIGNMENT
rsync sets CAREFUL_ALIGNMENT for architectures which do not support
unaligned access. Disable UBSAN for functions which may use unaligned
accesses when CAREFUL_ALIGNMENT is not set.
Andrew Tridgell [Mon, 8 Jun 2026 10:29:40 +0000 (20:29 +1000)]
log: align the file_struct built in log_delete()
log_delete() builds a struct file_struct inside a char buffer offset by the
(EXTRA_LEN-granular) extra data. The EXTRA_ROUNDING block that rounds that
offset up to the struct's alignment (exactly as flist.c does for its pool
allocations) was dead code here: log.c never included rounding.h, so
EXTRA_ROUNDING was undefined and the rounding never ran, leaving the
file_struct pointer potentially under-aligned. That trips UBSan's alignment
check and would fault on strict-alignment arches.
Include rounding.h (and add the Makefile dependency) so the existing rounding
actually applies -- fixing the alignment at the source rather than suppressing
the sanitizer.
Andrew Tridgell [Sun, 7 Jun 2026 21:18:19 +0000 (07:18 +1000)]
xattrs: fix UBSan-detected undefined behavior
Three pre-existing issues UBSan flags during the xattr tests:
* xattr_lookup_hash(): the summed hashlittle2() values overflow the
signed int64 accumulator (UB). Accumulate in uint64_t and convert back
at return -- the key is only used for hash-table equality, so the value
is unchanged.
* rsync_xal_get(): for an empty list (count == 0) the loop init
`rxa += count-1` forms `items - 1` on a NULL `items` (UB). Guard with
`if (count)`.
* rsync_xal_store(): `memcpy(dst, xalp->items, 0)` passes a NULL source for
an empty list (UB). Guard with `if (xalp->count)`.
Andrew Tridgell [Sun, 7 Jun 2026 21:18:02 +0000 (07:18 +1000)]
hashtable, mdfour: avoid signed left-shift overflow
UBSan flags two spots that shift a value into the top bits of a word via a
signed operand:
* lib/mdfour.c copy64(): `in[i] << 24` promotes the uchar to int, so a
byte >= 128 overflows int (UB). Cast each byte to uint32.
* hashtable.c NON_ZERO_64(): `(int64)(x) << 32` overflows int64 whenever
x's high bit is set. Shift as uint64_t (covers all four call sites).
Behavior-preserving -- only the intermediate type changes; the resulting
bit pattern is identical.
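A worked check of the mdfour case in Python, whose integers do not overflow,
so the mathematical result can be compared against the int range. The helper
names are illustrative only.

```python
INT32_MAX = 2**31 - 1

def shifted_value(byte):
    """Mathematical value of byte << 24; UB in C if it exceeds INT32_MAX."""
    return byte << 24

def shifted_as_uint32(byte):
    """Model of (uint32)byte << 24: well defined, wraps modulo 2**32."""
    return (byte << 24) & 0xFFFFFFFF

if __name__ == "__main__":
    for b in (0x7F, 0x80, 0xFF):
        print(hex(b), shifted_value(b) > INT32_MAX, hex(shifted_as_uint32(b)))
```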
Andrew Tridgell [Mon, 8 Jun 2026 03:37:38 +0000 (13:37 +1000)]
release.py: accept a git worktree in require_top_of_checkout()
In a git worktree .git is a file (a gitdir pointer), not a directory,
so os.path.isdir('.git') wrongly aborted with "no .git dir" when the
release was run from a worktree. Use os.path.exists() so it works from
both a normal checkout and a linked worktree.
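A sketch of the difference, using fake trees built in a temp dir. The helper
names and the gitdir path written into the fake worktree are hypothetical.

```python
import os
import tempfile

def at_top_of_checkout(path):
    """True for a normal checkout (.git dir) or a linked worktree (.git file)."""
    return os.path.exists(os.path.join(path, ".git"))

def make_fake_trees(tmp):
    """Build one fake normal checkout and one fake linked worktree."""
    normal = os.path.join(tmp, "normal")
    os.makedirs(os.path.join(normal, ".git"))
    worktree = os.path.join(tmp, "worktree")
    os.makedirs(worktree)
    with open(os.path.join(worktree, ".git"), "w") as f:
        f.write("gitdir: /hypothetical/main/.git/worktrees/worktree\n")
    return normal, worktree

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for tree in make_fake_trees(tmp):
            git = os.path.join(tree, ".git")
            print(os.path.isdir(git), at_top_of_checkout(tree))
```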
Andrew Tridgell [Mon, 8 Jun 2026 00:00:17 +0000 (10:00 +1000)]
ci: move the daily scheduled jobs to weekly
Every platform build (the BSD/Solaris/macOS/cygwin/almalinux/ubuntu jobs),
coverage, the version-mix job and the android static build ran on a daily cron
*in addition to* push and pull_request to master. Since push/PR already cover
every code change, the cron only adds drift coverage -- catching breakage from a
moving runner image or toolchain that no commit triggers. Those images do not
change daily, so a daily run mostly re-tests an unchanged tree.
Move them all to a weekly cron (Mondays, keeping each job's existing time) to
keep that drift coverage at roughly a seventh of the Actions spend and log
noise. fleettest was already weekly. Per-change CI on push/PR is unchanged, and
workflow_dispatch still allows an on-demand run.
Andrew Tridgell [Sun, 7 Jun 2026 22:58:09 +0000 (08:58 +1000)]
fleettest: --cleanup also kills stray flippers/daemons and removes root-owned dirs
A run killed without a parent-death backstop can strand a TOCTOU path-flipper
(a busy `python -c` rename loop that pins a CPU) and an orphaned test rsyncd
(--no-detach --address=127.0.0.1) that squats its fixed port -- the wedge the
claim_ports() bind-probe now reports and points at --cleanup. Sweep both, best
effort, before removing the run dirs.
Each sweep counts the pattern, kills it (with a `sudo -n` retry for a process a
root-running test left), then re-counts after a settle: KILLED reports what
actually died, and a process that survives (pkill blocked, no passwordless sudo,
missing/limited pkill) is reported as SURVIVED and fails the run instead of
falsely claiming success.
Run-dir removal falls back to `sudo -n rm` so a dir whose contents a root test
owns is removed instead of failing with "Permission denied" (the failure mode
seen on the ubuntu/mac targets); only a dir that survives even sudo is failed.
The kill patterns use the pgrep self-exclusion trick ('r[e]name', 'det[a]ch')
so they match a real process's "rename"/"detach" but not the literal pattern in
the cleanup shell's own argv -- run_on() passes the whole script as the remote
argv, so without it --cleanup would signal itself. The patterns are host-global
(not scoped to one run), so --cleanup is documented to run between runs, not
during one.
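The self-exclusion trick can be demonstrated with Python's re module, which
treats this simple bracket pattern the same way. The command lines below are
hypothetical stand-ins for a real flipper and for the cleanup shell.

```python
import re

# Bracketed pattern: as a regex it matches the text "rename".
PATTERN = r"r[e]name"

# Hypothetical command lines, for illustration only.
FLIPPER_ARGV = "python -c import os; os.rename('a', 'b')"
CLEANUP_ARGV = "sh -c pkill -f 'r[e]name'"

def would_match(pattern, argv):
    """Model a full-command-line match: regex search anywhere in argv."""
    return re.search(pattern, argv) is not None

if __name__ == "__main__":
    print(would_match(PATTERN, FLIPPER_ARGV))
    print(would_match(PATTERN, CLEANUP_ARGV))
```

The cleanup shell's own argv contains the characters r[e]name, which do not
contain the substring "rename", so the pattern cannot match itself.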
Andrew Tridgell [Sun, 7 Jun 2026 22:49:24 +0000 (08:49 +1000)]
testsuite: verify a claimed test port is actually bindable
claim_ports() takes a POSIX byte-range lock per port, which serializes
concurrent live test runs. But the kernel drops that lock the instant the
holding process dies, even if the run left an orphaned rsync --daemon still
bound to the port -- which happens when a run is SIGKILLed on a platform with
no parent-death backstop (rsyncfns only arms PR_SET_PDEATHSIG, Linux-only, so
the BSDs/Solaris/macOS can strand a daemon). A later run then wins the freed
lock while the socket is still squatted and dies with a cryptic "bind() failed:
Address already in use" / "did not see server greeting".
After taking each lock, actually bind the port (SO_REUSEADDR, so a port merely
in TIME_WAIT is not a false positive; only a live squatter fails) and close it
immediately. On failure stop with an actionable message naming the port and the
likely orphaned daemon. Closes the gap that masked the OpenBSD daemon-auth wedge.
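A minimal sketch of such a bind probe, assuming a hypothetical helper name
(port_is_bindable) and the loopback address; the real claim_ports() code and
its error message may differ.

```python
import socket

def port_is_bindable(port, host="127.0.0.1"):
    """Return True if the port can be bound now, False if something holds it."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # SO_REUSEADDR: a port merely lingering in TIME_WAIT is not a squatter.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind((host, port))
        return True
    except OSError:
        return False
    finally:
        # Close immediately; the probe only checks that bind() succeeds.
        s.close()

if __name__ == "__main__":
    squatter = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    squatter.bind(("127.0.0.1", 0))
    squatter.listen(1)
    port = squatter.getsockname()[1]
    print(port_is_bindable(port))
    squatter.close()
    print(port_is_bindable(port))
```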
Andrew Tridgell [Sun, 7 Jun 2026 08:59:18 +0000 (18:59 +1000)]
fleettest: require runtests.py in --testsuite-repo, not the build tree
When --testsuite-repo provides the suite, the build tree (--repo) need not
carry runtests.py -- it may be an older release whose shell testsuite predates
the Python runtests.py (e.g. a 3.4.1 backport branch built and tested with the
current suite). Check runtests.py in TESTSUITE_REPO and only require the build
tree to be rsync source (rsync.h).
Andrew Tridgell [Sun, 7 Jun 2026 03:56:17 +0000 (13:56 +1000)]
fleettest: add --testsuite-repo to run another tree's suite against this build
--repo couples the built source and the test suite that exercises it.
--testsuite-repo PATH overlays runtests.py + testsuite/ from a second tree onto
the staged build tree, and sources the expected-skip workflows from it, so one
can build an older release (e.g. a 3.4.x stable branch) and run the current
comprehensive suite against that binary. Defaults to --repo, so the existing
single-tree behaviour is unchanged.
The shell testsuite was removed in 1f689ec0 (rewritten in Python); only
*_test.py remain, yet collect_tests still globbed *.test and _testbase mapped
foo.test and foo_test.py to the same canonical name. Harmless on a master tree
(no .test files), but when an older tree's *.test files are present -- e.g.
fleettest --testsuite-repo building a 3.4.x release whose shell suite still
exists -- both glob to the same test name and scratch dir and race under -j,
producing spurious failures. Drop .test discovery entirely.
Andrew Tridgell [Sat, 6 Jun 2026 08:43:06 +0000 (18:43 +1000)]
testsuite,ci: mark recv-discard-nullderef CI skip and tighten its check
The regression test honestly skips when it cannot force the receiver's
output mkstemp() to fail -- as root (root bypasses DAC) and on Cygwin
(chmod 0555 does not deny the owner write access). The ubuntu, ubuntu-22.04,
almalinux and macOS jobs run `make check` as root, and Cygwin can't
enforce the unwritable directory, so the test skips on all of them.
runtests.py fails a run on any skip-set mismatch, so add the test to
those jobs' RSYNC_EXPECT_SKIPPED lists; the BSD/Solaris jobs run as root
too but enforce no expected-skip set, so they need no change.
Also tighten the pass condition. The post-chmod writability probe already
guarantees the receiver discards (mkstemp must fail), so an exit 0 would
mean the file actually transferred and the discard path was never
exercised -- a silent false-pass. Require exactly exit 23 (the forced
discard leaves the file untransferred); 12 remains the pre-fix crash.
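The tightened pass condition amounts to the following classification. This is
an illustrative sketch; the function name and the result strings are made up.

```python
def classify_exit(code):
    """Map the client's exit code to a verdict for the discard-path test."""
    if code == 23:
        return "pass"            # forced discard: file left untransferred
    if code == 12:
        return "fail: crash"     # pre-fix receiver crash
    if code == 0:
        return "fail: false-pass"  # file transferred, discard never exercised
    return "fail: unexpected"

if __name__ == "__main__":
    for code in (23, 12, 0, 1):
        print(code, classify_exit(code))
```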
pterror [Fri, 5 Jun 2026 07:24:13 +0000 (17:24 +1000)]
testsuite: regression for the receiver discard-path NULL deref
Drives a real sender<->receiver pair (client sender -> daemon receiver,
both the binary under test in the default pipe transport) so the receiver
actually takes the recv_files discard path -- a local `rsync a b` does
not. The basis and source share a leading block so the generator emits
real sums and the receiver gets a block MATCH; the destination directory
is made unwritable so the receiver's output mkstemp() fails and it
discards the delta. Pre-fix the receiver SIGSEGVs in full_fname(NULL),
which the client sees as a protocol-data-stream error (code 12); post-fix
it drains the delta and reports a benign code 23 (or 0).
Skips (exit 77) when run as root, since root bypasses DAC and the
unwritable destination would not make mkstemp() fail -- so the discard
path, and the bug, would never be reached.
Verified red-on-buggy / green-on-fixed against the 0d0399bb receiver.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
pterror [Fri, 5 Jun 2026 07:24:05 +0000 (17:24 +1000)]
receiver: fix NULL deref on the delta discard path
receive_data() crashed a receiver that was merely DISCARDING a file's
delta stream. discard_receive_data() calls receive_data() with
fname == NULL and fd == -1, so size_r == 0 and mapbuf == NULL. A normal
block-MATCH token (against a block the basis and source share) then
reaches the !mapbuf branch added in 31fbb17d ("receiver: fix absolute
--partial-dir delta resume"), which calls full_fname(fname). full_fname()
dereferences its argument unconditionally (util1.c: `if (*fn == '/')`),
so fname == NULL faults there -> receiver SIGSEGV.
This is a normal-operation crash with a stock cooperating sender, not an
adversarial one. The generator hands the sender real block sums whenever
the basis is readable and we're in delta mode; the receiver only decides
to discard afterwards, when its output cannot be produced -- e.g. the
destination directory is not writable (mkstemp fails), the basis turns
out to be a directory, or a --partial-dir resume is skipped. A MATCH
token arriving during that discard hit the NULL deref.
The 31fbb17d branch is correct only for a REAL output transfer (fd != -1,
fname valid): there, a block match with no mapped basis is a genuine
protocol inconsistency (the generator promised a basis the receiver could
not open), and honoring it would silently omit those bytes from the
verification checksum or leave a hole, so hard-erroring -- and
full_fname(fname) -- is right. It conflated that with the discard path.
The discriminator is fd, not mapbuf: on the discard path fd == -1 always;
on the real-output inconsistency fd != -1. Scope the "no basis file"
protocol error to fd != -1 (where fname is non-NULL and full_fname is
safe) and, on the discard path (fd == -1), absorb the matched bytes
benignly (offset += len; continue) -- symmetric with the literal-token
handling just above, and restoring the pre-31fbb17d behavior. The
real-transfer inconsistency check is preserved unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Andrew Tridgell [Sat, 6 Jun 2026 05:43:02 +0000 (15:43 +1000)]
fleettest: add a per-target max_retry budget for flaky tests
A slow or heavily-loaded fleet box can occasionally flake a concurrency-
sensitive test (e.g. a daemon/lsh test under -j8 on a nested-VM Solaris box).
Rather than dropping the whole target to a lower -j, add a per-target
"max_retry" property: after a run, each failed test is re-run on its own up to
max_retry more times, and any that then pass are dropped from the failure list.
Recovered tests are listed in a new "RECOVERED" report section, so a flake is
surfaced, never silently hidden.
Applies to every pass for the target (pipe, tcp, protoNN, nonroot). Default 0
keeps the current no-retry behaviour.
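A sketch of the retry budget under stated assumptions: retry_failures and the
run_test callback are hypothetical, and run_test is assumed to return True
when the test passes on its own.

```python
def retry_failures(failed, run_test, max_retry=0):
    """Re-run each failed test alone, up to max_retry more times.

    Returns (still_failed, recovered). With max_retry=0 nothing is retried.
    """
    still_failed, recovered = [], []
    for test in failed:
        for _ in range(max_retry):
            if run_test(test):
                recovered.append(test)  # reported, never silently hidden
                break
        else:
            # No attempt passed (or no attempts were allowed).
            still_failed.append(test)
    return still_failed, recovered

if __name__ == "__main__":
    outcomes = iter([False, True])
    print(retry_failures(["daemon"], lambda t: next(outcomes), max_retry=2))
```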
A target can list older "protocols" (e.g. [30, 29]) in the fleet config;
each runs as an extra stdio-pipe pass with runtests --protocol=N, the fleet
analogue of a workflow's check30/check29 steps. The passes reuse the same
parsed RSYNC_EXPECT_SKIPPED list as the default pipe run and appear as protoNN
columns in the report and --timing breakdown. Targets without the key run only
the default protocol and show "-" there.
The example config's ubuntu-2604 target (mirroring ubuntu-build.yml, which has
check30/check29 steps) now sets protocols: [30, 29].
Andrew Tridgell [Fri, 5 Jun 2026 01:29:18 +0000 (11:29 +1000)]
testsuite: regression for #880 --mkpath --dry-run file-to-file
Covers both halves: a --mkpath file-to-file --dry-run must succeed and
match the real run (the #880 abort), and a plain file-to-file --dry-run
onto an existing differing destination must still itemize the real change
rather than report it as brand new. Both compare "--dry-run -i" output
against the real run.
A single-file --mkpath copy whose destination parent does not exist
failed under --dry-run: make_path() only *reports* the directories it
would create in a dry run, so change_dir#3 then tried to chdir into a
parent that isn't there and aborted with "change_dir#3 ... failed".
When the parent is genuinely missing in a dry run, skip the chdir and
mark the destination as not-yet-present (dry_run++), exactly as the
multi-file/dir-creation path already does, so the generator doesn't
probe the missing tree. Gating it on the missing-parent case keeps an
ordinary file-to-file dry run chdir'ing into and itemizing against an
existing destination.
Andrew Tridgell [Fri, 5 Jun 2026 00:51:06 +0000 (10:51 +1000)]
Remove obsolete DocBook manual
doc/rsync.sgml is a 1996-2002 DocBook user manual (with README-SGML
describing the docbook-utils build) that was long ago superseded by the
markdown man pages. It is unmaintained and referenced by nothing in the
build. This empties doc/.
Andrew Tridgell [Fri, 5 Jun 2026 00:49:44 +0000 (10:49 +1000)]
Remove obsolete design notes
rsync3.txt and rsyncsh.txt are Martin Pool's 2001 design proposals
("notes towards a new version of rsync", an interactive rsync shell),
neither of which reflects the current implementation. doc/profile.txt is
stale profiling notes. None are referenced by the build, tests, or docs.
Andrew Tridgell [Thu, 4 Jun 2026 23:02:32 +0000 (09:02 +1000)]
Remove obsolete testhelp/maketree.py
This Python 2 test-tree generator (print statements, string.letters,
.next()) has been broken on modern Python for years and is referenced
nowhere in the build, tests, or any script. Drop it.
Andrew Tridgell [Thu, 4 Jun 2026 05:49:14 +0000 (15:49 +1000)]
token: drain the matched-block insert deflate (#951)
send_deflated_token() adds a matched block to the compressor history with
deflate(Z_INSERT_ONLY). Our bundled zlib implements Z_INSERT_ONLY (it
produces no output and consumes the input in one call), but a build
against a system zlib lacks it and falls back to Z_SYNC_FLUSH (see the top
of the file), which emits a flush block into obuf. For a large
incompressible matched token that block exceeds AVAIL_OUT_SIZE(CHUNK_SIZE),
so deflate returned with avail_in != 0 and the transfer aborted:
"deflate on token returned 0 (N bytes left)" at token.c
The insert output is never sent -- the receiver rebuilds the matching
history itself in see_deflate_token() -- so loop, resetting the output
buffer, and discard it. Drain with the same condition as the data loop
above: until the input is consumed AND avail_out != 0. Stopping at
avail_in == 0 alone can leave pending output in the deflate stream (a
full output buffer with bytes still buffered), which would then be emitted
by the next real deflate send and corrupt the stream. A bundled-zlib
build still finishes in one iteration.
Andrew Tridgell [Thu, 4 Jun 2026 22:15:58 +0000 (08:15 +1000)]
ci: add ubuntu-latest fleettest workflow against a localhost fleet
fleettest is a developer tool meant to run on a modern Ubuntu box, so a
bitrot check belongs in its own ubuntu-latest job rather than in the
testsuite (which runs on the BSD/Solaris/macOS/Cygwin matrix, whose
older Pythons may not even parse it).
The job sets up passwordless ssh to localhost, writes a two-target
fleet config whose targets both ssh to localhost (distinct build dirs), and runs
a real fleettest pass. Two targets exercise the parallel multi-target
path and the per-run dir / port isolation; the run exits 0 only if
every cell is OK. Triggered on changes to fleettest.py or this
workflow, manually, and weekly.
Andrew Tridgell [Thu, 4 Jun 2026 21:52:49 +0000 (07:52 +1000)]
fleettest: add --timing to show per-target wall-clock
Records wall-clock per phase (push, build, each test transport, nonroot)
plus a total in TargetResult, and with --timing prints a breakdown after
the report, sorted slowest-target-first. Targets run in parallel, so the
run is gated by the slowest one; the phase columns show whether that
hold-up is the push, the build, or a test pass. A target that failed
early (no total) falls back to the sum of the phases it reached.
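The fallback and sort order could be expressed like this. The data layout (a
dict of phase times plus an optional total per target) and the function names
are assumptions for the sketch, not TargetResult's actual fields.

```python
def effective_total(phases, total=None):
    """Use the recorded total, else the sum of the phases that were reached."""
    return total if total is not None else sum(phases.values())

def slowest_first(results):
    """results: {target: (phases, total_or_None)} -> names, slowest first."""
    return sorted(results,
                  key=lambda name: effective_total(*results[name]),
                  reverse=True)

if __name__ == "__main__":
    results = {
        "failed-early": ({"push": 1.0, "build": 2.0}, None),
        "complete": ({"push": 1.0, "build": 5.0, "pipe": 10.0}, 16.5),
    }
    print(slowest_first(results))
```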
Andrew Tridgell [Thu, 4 Jun 2026 21:48:03 +0000 (07:48 +1000)]
fleettest: tighten --cleanup sweep scope and rm hardening
Address review findings on the cleanup paths:
- --cleanup no longer removes a bare <builddir>, only the suffixed
<builddir>-* run dirs it created. This keeps the sweep within its
documented scope and avoids clobbering an unrelated tree.
- Add _unsafe_builddir(): reject empty/root/$HOME and any absolute path
directly under / (e.g. a misconfigured builddir of "/tmp") before
building a destructive command, in both cleanup paths.
- Use `rm -rf --` so a path with a leading dash can't be read as options.
- Soften the docs: run-dir removal on Ctrl-C/kill is best-effort (a
signal arriving mid-push can still leave a remnant for --cleanup).
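A rough sketch of the checks listed above, assuming POSIX paths on the remote side. The function names, the home parameter and the exact rules are illustrative assumptions; the real _unsafe_builddir() may differ in detail.

```python
import posixpath
import shlex


def unsafe_builddir(builddir: str, home: str) -> bool:
    """Return True if builddir must not reach a destructive command."""
    if not builddir or not builddir.strip():
        return True  # empty
    path = posixpath.normpath(builddir.strip())
    if path.startswith("/"):
        # normpath() preserves a leading "//"; collapse it to one slash.
        path = "/" + path.lstrip("/")
    if path == "/":
        return True  # root
    if path in ("~", "$HOME") or path == posixpath.normpath(home):
        return True  # the home directory itself
    # Any absolute path directly under /, e.g. a misconfigured "/tmp".
    return path.startswith("/") and posixpath.dirname(path) == "/"


def sweep_command(builddir: str, home: str) -> str:
    """Build the --cleanup sweep for the suffixed run dirs only."""
    if unsafe_builddir(builddir, home):
        raise ValueError(f"refusing to clean up unsafe builddir {builddir!r}")
    # The glob stays outside the quoting so the shell expands it; "--"
    # stops a leading dash from being parsed as an option. The bare
    # builddir itself is never matched by "<builddir>-*".
    return f"rm -rf -- {shlex.quote(builddir)}-*"
```

Rejecting the path before the command string is built means a bad config fails loudly instead of producing a dangerous rm.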
Andrew Tridgell [Thu, 4 Jun 2026 21:39:31 +0000 (07:39 +1000)]
fleettest: isolate concurrent runs and add config/cleanup options
Each run now builds in its own randomly-named dir on every target
(<builddir>-<run_id>), so two or three fleettest runs can share the same
fleet without colliding on the pushed tree, the build, or the testtmp
scratch. Port collisions were already handled by claim_ports() locks.
The run dir is removed when the run ends -- on success, failure, or
Ctrl-C/kill (atexit + SIGINT/SIGTERM handlers); --keep retains it. A new
--cleanup mode sweeps stray <builddir>-* dirs left by a SIGKILL.
Incremental builds are dropped (every run is a fresh dir + full build):
--no-push removed, --clean removed.
Also look for the fleet config at ~/.fleettest.json first, then
testsuite/fleettest.json (still overridable with --fleet PATH).
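The run-dir naming and the config lookup order above might look roughly like this. The helper names, the 8-hex-digit run id and the injectable exists callback are assumptions made for the sketch, not details taken from fleettest.py.

```python
import os
import secrets
from typing import Callable, Optional


def run_dir(builddir: str, run_id: Optional[str] = None) -> str:
    """Name the per-run build dir as <builddir>-<run_id>."""
    if run_id is None:
        # Assumed id format: 8 random hex digits.
        run_id = secrets.token_hex(4)
    return f"{builddir}-{run_id}"


def find_fleet_config(
    explicit: Optional[str],
    home: str,
    repo: str,
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Pick the fleet config: --fleet PATH, then ~/.fleettest.json,
    then testsuite/fleettest.json inside the repo."""
    if explicit:
        return explicit
    candidates = (
        os.path.join(home, ".fleettest.json"),
        os.path.join(repo, "testsuite", "fleettest.json"),
    )
    for candidate in candidates:
        if exists(candidate):
            return candidate
    return None
```

A random suffix per run is what lets concurrent runs share one fleet without touching each other's pushed tree or build.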
Andrew Tridgell [Thu, 4 Jun 2026 06:19:31 +0000 (16:19 +1000)]
testsuite: regression for the #829 daemon --chown/--groupmap wildcard
Maps every source group to a second group the test user belongs to via a
daemon upload (--groupmap='*:GID') and checks the wildcard took effect.
Runs both arg modes: the default path (the '*' is safe_arg-escaped and the
daemon must un-backslash it -- the regression) and --secluded-args (the '*'
is sent raw over the protected channel, a guard that the fix left that path
alone). Needs no root -- a non-root receiver can chgrp to a member group --
and was verified RED on a pre-fix binary (the escaped '\*' is ignored, gid
unchanged) and GREEN after the fix.
Andrew Tridgell [Thu, 4 Jun 2026 06:19:31 +0000 (16:19 +1000)]
daemon: un-backslash escaped option args (#829)
Without --secluded-args, the client's safe_arg() backslash-escapes shell
and wildcard chars in option values before sending them to the server, so
--chown's --usermap=*:user is transmitted as --usermap=\*:user. Over ssh a
remote shell removes the backslashes before rsync parses the args, but a
daemon has no shell and read_args() stored option args verbatim -- so the
receiver saw the literal "\*", the usermap/groupmap wildcard never matched,
and the module's configured uid/gid won instead. A regression from the
secluded-args hardening; rsync 3.2.3 (protocol 31) worked.
Un-backslash option args in read_args() on the daemon's first
(non-protected) read, mirroring what the ssh-side shell does. File args
after the dot are already handled by glob_expand(); the protected (NUL,
already-unescaped) re-read and the server's stdin read pass unescape=0 so
their raw args are left untouched.
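To illustrate the transformation (the real change is in C, inside read_args()), here is a small Python sketch of un-backslashing an option arg. The function names are hypothetical; only the unescape flag mirrors the message above.

```python
def unbackslash(arg: str) -> str:
    """Drop each escaping backslash and keep the character it protects,
    as a remote shell would do for an ssh transfer."""
    out = []
    i = 0
    while i < len(arg):
        ch = arg[i]
        if ch == "\\" and i + 1 < len(arg):
            # Skip the backslash and take the escaped character literally.
            i += 1
            ch = arg[i]
        out.append(ch)
        i += 1
    return "".join(out)


def read_option_arg(arg: str, unescape: bool) -> str:
    """Un-backslash only on the first (non-protected) read; the protected
    re-read and the server's stdin read would pass unescape=False."""
    return unbackslash(arg) if unescape else arg
```

With this, the escaped --usermap=\*:user sent by the client becomes --usermap=*:user again before the option is parsed, so the wildcard can match.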
Andrew Tridgell [Thu, 4 Jun 2026 04:46:38 +0000 (14:46 +1000)]
build: fall back to do_mknod() when mknodat() is unavailable (#896)
do_mknod_at() (the symlink-race-safe variant used by a non-chrooted
daemon receiver) calls mknodat()/mkfifoat(), but the at-variant was
gated only on AT_FDCWD. Older Darwin declares AT_FDCWD without
mknodat(), so the build failed with "mknodat undeclared".
Probe mknodat()/mkfifoat() in configure and require HAVE_MKNODAT for the
at-variant; without it do_mknod_at() falls back to do_mknod(), exactly
as it already does where AT_FDCWD is missing. Linux keeps the mknodat
path since HAVE_MKNODAT is defined there.
Andrew Tridgell [Thu, 4 Jun 2026 04:43:38 +0000 (14:43 +1000)]
alloc: revert "zero all new memory from allocations" (#959)
Commit d046525d made my_alloc() calloc every fresh allocation and made
expand_item_list() memset the freshly grown tail, to hand out predictably
zeroed memory. But that forces the kernel to back pages callers never
touch: each per-directory file_list pre-allocates a FLIST_START-entry
(32768) pointer array -- 256KB -- and calloc now zeroes the whole array
even for an empty directory. With incremental recursion over many
directories the resident set explodes; 80000 empty dirs went from ~336MB
to ~10.8GB.
Restore the pre-d046525d malloc/calloc split: fresh allocations use
malloc (so untouched tails stay lazy) and only explicit do_calloc
requests (new_array0) are zeroed. Callers that need zeroed memory
already ask for it, and the full test suite passes.
Andrew Tridgell [Thu, 4 Jun 2026 04:17:12 +0000 (14:17 +1000)]
testsuite: regression for short-checksum --append-verify s2length
Forces --checksum-choice=xxh64 (an 8-byte transfer checksum) with a
corrupted-prefix --append-verify so the full-checksum redo path runs.
Before the generator capped s2length at MIN(SUM_LENGTH, xfer_sum_len)
this died with "Invalid checksum length 16 [sender]"; the test is RED on
the prior generator and GREEN with the cap. Reproduces on any build that
has xxhash, so it guards the fix without an old-libxxhash host; skips when
xxh64 is absent (a build without xxhash).
Andrew Tridgell [Thu, 4 Jun 2026 04:04:47 +0000 (14:04 +1000)]
generator: cap block s2length at the negotiated checksum length
sum_sizes_sqroot() capped the strong-sum length at SUM_LENGTH (16), the
legacy MD4/MD5 digest size. Since 0902b52f the sum2 array elements are
xfer_sum_len bytes and the sender rejects a sums header whose s2length
exceeds xfer_sum_len. When the negotiated transfer checksum is shorter
than 16 bytes -- xxh64 (8), used when the build's libxxhash lacks
xxh128/xxh3 (e.g. Ubuntu 20.04) -- the generator still emitted s2length
up to 16, so --append-verify and other full-checksum (redo) transfers
died with "Invalid checksum length 16 [sender]" (protocol incompatibility).
Cap s2length at MIN(SUM_LENGTH, xfer_sum_len): unchanged for any checksum
>= 16 bytes (md5/xxh128/sha1), corrected for short ones. Also closes a
latent over-read of the xfer_sum_len-sized digest buffer.
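The cap itself is a one-line MIN; the following Python sketch only illustrates its effect for the digest sizes named above. cap_s2length() and the DIGEST_LEN table are illustrative, not rsync code.

```python
SUM_LENGTH = 16  # legacy MD4/MD5 digest size, per the message above

# Digest sizes in bytes for the checksums mentioned in the message.
DIGEST_LEN = {"md5": 16, "xxh64": 8, "xxh128": 16, "sha1": 20}


def cap_s2length(s2length: int, xfer_sum_len: int) -> int:
    """Cap the strong-sum length at MIN(SUM_LENGTH, xfer_sum_len)."""
    return min(s2length, SUM_LENGTH, xfer_sum_len)


# A full-checksum (redo) pass asks for the longest strong sum available.
capped = {name: cap_s2length(SUM_LENGTH, n) for name, n in DIGEST_LEN.items()}
```

Only xxh64 changes (16 becomes 8); every checksum of 16 bytes or more keeps the old value, which matches the "unchanged for any checksum >= 16 bytes" note.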
Andrew Tridgell [Wed, 3 Jun 2026 23:14:52 +0000 (09:14 +1000)]
android: probe openat2 usability behind a SIGSYS handler
Android's seccomp sandbox traps openat2() with SECCOMP_RET_TRAP, which
raises SIGSYS and kills the process instead of returning ENOSYS, so the
secure resolver cannot simply try openat2() and inspect errno. Add
openat2_usable() in a new android.c: it probes openat2() once behind a
temporary SIGSYS handler and caches the result.
Gate every SYS_openat2 call on openat2_usable(): in the resolver via an
openat2_beneath() wrapper, and in t_chmod_secure's kernel probe directly,
so a blocked openat2 reports ENOSYS and the caller falls back to the
portable O_NOFOLLOW resolver. Only openat2 is gated -- a plain openat()
(e.g. opening an operator-trusted absolute basedir) is left free.
The probe body compiles only on Android -- __ANDROID__ is a Bionic target
macro, so it is set for NDK cross-builds and native Termux alike and unset
everywhere else, where openat2_usable() collapses to a constant 1. Link
android.o into the secure-resolver test helpers too so their self-tests
survive on Termux.
Andrew Tridgell [Wed, 3 Jun 2026 22:50:49 +0000 (08:50 +1000)]
configure: require <linux/openat2.h>, not just SYS_openat2
The openat2 secure resolver in syscall.c needs struct open_how and
RESOLVE_BENEATH from <linux/openat2.h>, not only the SYS_openat2 syscall
number. Some setups expose the syscall number via glibc without the
kernel header present, so probing SYS_openat2 alone still left the build
broken (#905). Exercise the header and struct in the configure check so
HAVE_OPENAT2 is defined only when both are actually usable.
Markus Mayer [Fri, 29 May 2026 17:15:13 +0000 (10:15 -0700)]
t_chmod_secure: use HAVE_OPENAT2 to check for openat2() support
To prevent using openat2() in situations where it is not supported, use
#if defined(__linux__) && defined(HAVE_OPENAT2)
in t_chmod_secure.c, just like it was already being done in syscall.c.
Markus Mayer [Thu, 28 May 2026 00:44:37 +0000 (17:44 -0700)]
build: auto-detect the presence of the openat2() syscall
Let configure detect if the openat2() syscall is supported by the kernel
headers we are building against. Do not attempt to use openat2() if
support is not present.
Users can still disable using the openat2() syscall manually if so
desired.
Andrew Tridgell [Wed, 3 Jun 2026 23:35:33 +0000 (09:35 +1000)]
testsuite: add fleettest.py fleet CI harness
fleettest.py builds the committed HEAD of a checkout on a fleet of remote
machines over ssh and runs the test suite under both the stdio-pipe and
--use-tcp transports in parallel, reporting only the unexpected results.
Each target mirrors a .github/workflows/*.yml job: its configure flags,
and the RSYNC_EXPECT_SKIPPED list parsed from the workflow.
The fleet is described by a JSON file (testsuite/fleettest.json,
git-ignored); fleettest.json.example is a worked template. Use --fleet to
point at another config and --repo to build a tree other than the current
directory.
A target with nonroot:true reruns, as the unprivileged ssh user, the tests
that declare a module-level fleet_nonroot=True (here ownership-depth and
daemon). The set lives in the test files, so new privilege-sensitive tests
join the non-root pass with no fleet-config change.
Also rename testsuite/README.testsuite to README.md and rewrite it as
markdown documenting the current testsuite: runtests.py, the make
check/check29/check30/installcheck/coverage targets, the result/exit-code
conventions, and fleettest.py.
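One way to detect a module-level fleet_nonroot=True without importing the test file is to inspect its syntax tree. This is a sketch under that assumption; how fleettest.py actually discovers the flag is not stated here, and declares_fleet_nonroot() is a hypothetical helper.

```python
import ast


def declares_fleet_nonroot(source: str) -> bool:
    """Return True if the source assigns fleet_nonroot = True at module
    level (assignments inside functions or classes do not count)."""
    tree = ast.parse(source)
    for node in tree.body:  # top-level statements only
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id == "fleet_nonroot":
                value = node.value
                return isinstance(value, ast.Constant) and value.value is True
    return False
```

Keeping the flag in the test file means a new privilege-sensitive test opts in by adding one line, with no change to the fleet config.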
Andrew Tridgell [Wed, 3 Jun 2026 10:48:10 +0000 (20:48 +1000)]
syscall/receiver: honour a relative alt-basis dir on a daemon receiver (#915)
The symlink-race hardening routed the receiver's basis open through
secure_relative_open(), which rejects any '..' -- so a sibling
--link-dest=../01 on a use-chroot=no daemon was silently ignored and every file
re-transferred (#915/#928, a regression from 3.4.1).
Narrow the confinement to the sanitizing daemon (am_daemon && !am_chrooted) and
re-anchor it at the module root, the real trust boundary: secure_relative_open()
prefixes the cwd's module-relative path (from rsync's logical curr_dir[], a
guaranteed lexical prefix of module_dir) and resolves beneath module_dir, so
RESOLVE_BENEATH permits an in-module '..' climb while still rejecting one that
escapes the module. secure_basis_open() opens with a bare do_open() in the
non-sanitizing cases. t_stub.c gains weak curr_dir[]/curr_dir_len for the
helpers (via #pragma weak on non-GNU compilers, where rsync.h erases
__attribute__).
Two tests: link-dest-relative-basis asserts the in-module '..' is honoured;
link-dest-module-escape asserts a --link-dest=../../OUTSIDE climb that leaves
the module is refused (not hard-linked to an outside file). See upstream
PR #930.
Andrew Tridgell [Wed, 3 Jun 2026 10:48:10 +0000 (20:48 +1000)]
sender: open a module-root-absolute path for a `path = /` module (#897)
A daemon module with path=/ makes F_PATHNAME absolute, so the secure_path built
for the content open starts with '/'. secure_relative_open() rejects an
absolute relpath with EINVAL, so a use-chroot=no daemon with path=/ could not
send any file ('failed to open ...: Invalid argument (22)') -- a regression
from 3.4.2. Strip leading slashes to a module-relative path; resolution stays
confined beneath module_dir.
Andrew Tridgell [Wed, 3 Jun 2026 10:47:56 +0000 (20:47 +1000)]
flist: accept the missing-args mode-0 entry in recv_file_entry (#910)
--delete-missing-args (missing_args==2) sends a missing --files-from arg as a
mode-0 entry (IS_MISSING_FILE), the generator's delete signal. The mode-type
validation in recv_file_entry() rejected mode 0 as an invalid file type,
aborting the transfer with 'invalid file mode 00 ... code 2' before the
generator could act (a regression from 3.4.1). Allow mode 0 through only when
missing_args==2 (the delete mode -- not --ignore-missing-args, which never
sends a mode-0 entry); all other modes are still rejected.
Andrew Tridgell [Wed, 3 Jun 2026 11:36:25 +0000 (21:36 +1000)]
testsuite/runtests: count XFAIL (exit 78) as expected, not a failure
The regression tests use test_xfail() (exit 78) to assert a known, documented
residual on platforms where the fix can't apply -- e.g. link-dest-relative-basis
XFAILs where the receiver has no openat2/O_RESOLVE_BENEATH and the portable
resolver rejects the '..' for safety. runtests.py counted exit 78 in the
generic else->failed branch, so a bare XFAIL failed the whole suite; tally it
separately ('N xfailed (expected)') and exclude it from the failure exit code.
Also add --race-timeout plumbing (race_timeout env) for race tests.
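The tallying change can be sketched as below. Exit code 78 for XFAIL comes from the message above; treating 77 as "skipped" is an assumption of this sketch, and classify()/tally() are illustrative names rather than the runtests.py functions.

```python
from collections import Counter
from typing import Dict, Tuple

EXIT_SKIP = 77   # assumed skip code, not stated in the message above
EXIT_XFAIL = 78  # test_xfail(): a known, documented residual


def classify(code: int) -> str:
    """Map a test's exit code to its result bucket."""
    if code == 0:
        return "passed"
    if code == EXIT_SKIP:
        return "skipped"
    if code == EXIT_XFAIL:
        return "xfailed"  # expected, so counted apart from failures
    return "failed"


def tally(exit_codes: Dict[str, int]) -> Tuple[Counter, int]:
    """Count results per bucket; only real failures fail the suite."""
    counts = Counter(classify(code) for code in exit_codes.values())
    suite_exit = 1 if counts["failed"] else 0
    return counts, suite_exit
```

Before the fix, exit 78 would have landed in the generic "failed" bucket, so a single XFAIL made suite_exit non-zero.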
Adds .github/workflows/ubuntu-version-mix.yml (ubuntu-latest) and a
per-release manifest testsuite/expect/rsync_<ver>.expect for each of the
nine peers. The workflow builds the current rsync, then runs the two-
sided suite against every old binary over both the pipe and --use-tcp
daemon transports. All peers run in a SINGLE looped job (not a matrix)
so the PR shows one check line; each peer/transport is a foldable log
group and a failure annotates which one broke.
A new phony `check-progs` target builds rsync plus the test helper
programs and check symlinks without running the suite -- the build half
of `make check` -- so the workflow's direct runtests.py invocation has
the helpers it needs.
Notable expected results encoded in the manifests:
- The four May-2026 security tests xfail against every released peer:
the suite demonstrates each release is vulnerable to those findings
while current master is fixed.
- symlink-dirlink-basis xfails on 3.4.0/3.4.1 (issue #715: their
secure_relative_open O_NOFOLLOW-confines the basedir, breaking a -K
dir-symlink update; current master fixes it with secure_basis_open).
- Older peers carry more xfails for options/negotiation they lack;
2.6.0 (protocol 27) fails most daemon tests. reverse-daemon-delta
passes against all peers, confirming backward compat down to 2004.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Andrew Tridgell [Sun, 31 May 2026 11:01:09 +0000 (21:01 +1000)]
old_versions: commit static binaries of old rsync releases
Nine statically-linked, stripped binaries for the version-mixing test
suite (and ad-hoc cross-version behaviour checks): every x.y.0 release
from 2.6.0 (2004, protocol 27) through 3.4.0, plus the 3.1.3/3.2.7/3.4.1
point releases. 2.6.0 is the practical floor; older tags need more
porting to build on a current toolchain.
build_static.sh rebuilds any release from its git tag, applying the
minimal patches needed to compile old sources on a modern toolchain:
K&R lseek64 redecl, gettimeofday, -std=gnu11, --disable-openssl, and
_FORTIFY_SOURCE disabled (modern FORTIFY=3 turns latent benign over-reads
in old rsync into aborts when it runs as a server). Pre-3.0 trees ship
configure.in, so it regenerates configure (autoheader/autoconf) after
neutralizing the dead AC_LIBOBJ replacement fallbacks, generates proto.h,
and stubs the dropped vendored lib/addrinfo.h -- all guarded to no-op on
newer versions.
.gitattributes marks the binaries binary (so the text=auto rule can't
corrupt them) and export-ignore (kept out of the release tarball).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Andrew Tridgell [Sun, 31 May 2026 11:01:09 +0000 (21:01 +1000)]
testsuite: reverse-direction smoke test (old client -> current daemon)
Every other two-sided test drives with the current binary, covering
new-client -> old-server. This adds the backward-compat direction that
matters most for a project shipping new servers to a world of old
clients: a current daemon must keep serving the installed base of old
rsync clients.
reverse-daemon-delta_test.py starts the daemon with the current build
(via start_test_daemon's rsync_cmd override) and drives it with the old
binary. It does a push and a pull, each with and without -z, with the
receiving side pre-seeded with an older version of the file so the delta
algorithm actually runs -- exercising delta encoding both ways (old->new
on push, new->old on pull) and compression negotiation both ways. It
asserts the bytes crossing the wire are far smaller than the file, so a
silent fallback to a whole-file copy is caught, and accepts both the
modern "sent/received" and the old "wrote/read" summary wording so an
old client's output parses.
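A sketch of parsing both summary wordings and checking that the delta path ran. The regular expression, the sample line layouts and the 0.5 threshold are assumptions of this example; the test's own pattern and bound are not given above.

```python
import re

# Matches the modern "sent N bytes  received M bytes" summary and the
# old "wrote N bytes  read M bytes" one; digits may carry comma grouping.
SUMMARY_RE = re.compile(
    r"(?:sent|wrote) ([\d,]+) bytes\s+(?:received|read) ([\d,]+) bytes"
)


def wire_bytes(output: str) -> int:
    """Return the total bytes that crossed the wire in both directions."""
    match = SUMMARY_RE.search(output)
    if match is None:
        raise ValueError("no transfer summary found in rsync output")
    return sum(int(group.replace(",", "")) for group in match.groups())


def delta_was_used(output: str, file_size: int, ratio: float = 0.5) -> bool:
    """Treat the transfer as a delta if the wire traffic stayed well below
    the file size (ratio is an arbitrary bound chosen for this sketch)."""
    return wire_bytes(output) < file_size * ratio
```

Checking the byte count rather than only the final file content is what catches a silent fallback to a whole-file copy.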
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Andrew Tridgell [Sun, 31 May 2026 11:00:51 +0000 (21:00 +1000)]
runtests: add --rsync-bin2 / --expect-result for version-mixing tests
Let the suite run with two rsync binaries so the current build can be
tested against the actual old code of a previous release, rather than
only forcing the current binary to speak an old protocol (check29/30).
- --rsync-bin2 PATH: exports RSYNC_PEER, the binary used for the SERVER
side of two-sided transfers (the daemon process and the remote-shell
--rsync-path target). Defaults to RSYNC, so single-binary runs are
byte-for-byte unchanged.
- --expect-result F: the manifest's listed tests ARE the run set; each
test's actual outcome (pass/skip/fail/xfail) is compared to its expected
one and any mismatch -- including an unexpected pass (xpass) -- fails the
run. --expect-skipped and the default exit logic are untouched.
rsyncfns gains the RSYNC_PEER global and launches the daemon with it
(start_rsyncd / start_test_daemon, the latter with an optional rsync_cmd
override used by the reverse-direction test); the remote-shell tests
pass --rsync-path={RSYNC_PEER}. All no-ops when no peer is selected.
Direction is fixed: the current binary always drives (only it
understands the new test scripts); the old binary is only ever the
server/daemon side.
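The --expect-result comparison could be sketched like this. The dict-based manifest and the compare_results() helper are assumptions for illustration; the on-disk manifest format is not described above.

```python
from typing import Dict, List, Tuple


def compare_results(
    expected: Dict[str, str], actual: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (test, expected, got) for every test whose outcome differs
    from the manifest. Outcomes are pass/skip/fail/xfail."""
    mismatches = []
    for test, want in sorted(expected.items()):  # the manifest IS the run set
        got = actual.get(test, "missing")
        if got == want:
            continue
        if want == "xfail" and got == "pass":
            got = "xpass"  # an unexpected pass is reported as such
        mismatches.append((test, want, got))
    return mismatches


def run_exit_code(mismatches: List[Tuple[str, str, str]]) -> int:
    """Any mismatch, including an xpass, fails the run."""
    return 1 if mismatches else 0
```

Failing on xpass keeps each manifest honest: once a peer starts passing a test, the manifest has to be updated to say so.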
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Andrew Tridgell [Mon, 1 Jun 2026 05:54:41 +0000 (15:54 +1000)]
runtests: add --exclude / RSYNC_EXCLUDE to skip tests entirely
Some tests cannot run in certain build/CI environments. In particular the
protected-regular test self-re-execs under "unshare --map-users" to exercise
fs.protected_regular handling, and that user-namespace path hangs in a
restricted buildd chroot (e.g. Launchpad/sbuild), tripping the per-test
timeout and failing the whole "make check".
Add an --exclude option (comma-separated test names/globs), with an
RSYNC_EXCLUDE environment fallback so it can be set without touching the
make/check command line. Excluded tests are dropped before running -- they
are neither executed nor reported as skipped.
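A minimal sketch of the exclusion logic, assuming shell-style globs as matched by Python's fnmatch. parse_exclude() and apply_exclude() are hypothetical names, not the runtests.py functions.

```python
import fnmatch
import os
from typing import List, Mapping, Optional


def parse_exclude(
    option: Optional[str], environ: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Split the comma-separated --exclude value, falling back to the
    RSYNC_EXCLUDE environment variable when the option is not given."""
    if environ is None:
        environ = os.environ
    raw = option if option is not None else environ.get("RSYNC_EXCLUDE", "")
    return [part.strip() for part in raw.split(",") if part.strip()]


def apply_exclude(tests: List[str], patterns: List[str]) -> List[str]:
    """Drop excluded tests before the run, so they are neither executed
    nor reported as skipped."""
    return [
        name
        for name in tests
        if not any(fnmatch.fnmatchcase(name, pat) for pat in patterns)
    ]
```

For example, a restricted buildd could export RSYNC_EXCLUDE=protected-regular and leave the make check command line untouched.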
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Andrew Tridgell [Mon, 1 Jun 2026 05:03:05 +0000 (15:03 +1000)]
docs: document the rsync-latest snapshot PPA
Add the new ppa:rsyncproject/rsync-latest (development snapshots rebuilt
from git master) alongside the existing stable PPA in INSTALL.md and the
download page. Notes that snapshot versions (3.5.0~git...) sort below the
matching stable release, so the two PPAs can coexist without a stable
release being silently replaced by a snapshot.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Andrew Tridgell [Thu, 28 May 2026 19:26:30 +0000 (05:26 +1000)]
ci: halve CI artifact retention from 90 to 45 days
GitHub Actions artifact storage is approaching our quota. Each `make`/build
job uploads its rsync binary + manpages, the coverage job uploads its full
HTML tree, and Android uploads its dist/ -- 11 jobs producing artifacts per
PR/push, all kept for the repo default of 90 days.
Set retention-days: 45 explicitly on every upload-artifact step so they
expire at half the previous lifetime; older artifacts can still be re-built
from the commit if needed. No other workflow behaviour changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Andrew Tridgell [Tue, 26 May 2026 21:07:58 +0000 (07:07 +1000)]
runtests.py: accept a relative --rsync-bin
Tests are launched with subprocess.run(..., cwd=TOOLDIR) so the
subprocess's argv[0] resolves against TOOLDIR, not the runner's
invocation cwd. A user-supplied --rsync-bin=../foo/rsync therefore
worked when invoked from inside TOOLDIR but silently failed (or
ENOENT'd inside individual tests) when invoked from a sibling
directory.
Fix: absolutize rsync_bin via os.path.abspath() at parse time, before
it propagates into build_rsync_cmd()/RSYNC. abspath() captures
os.getcwd() now, which is the operator's invocation cwd -- exactly
what the --rsync-bin=../path form expresses.
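The fix amounts to absolutizing the option as soon as it is parsed. A minimal argparse sketch, where parse_args() and its single option are simplified stand-ins for the real runtests.py parser:

```python
import argparse
import os
from typing import List


def parse_args(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--rsync-bin", default=None)
    args = parser.parse_args(argv)
    if args.rsync_bin is not None:
        # abspath() resolves against os.getcwd() at this moment, i.e. the
        # operator's invocation cwd, before any test is launched with a
        # different cwd.
        args.rsync_bin = os.path.abspath(args.rsync_bin)
    return args


args = parse_args(["--rsync-bin=../foo/rsync"])
```

Once the path is absolute, a later subprocess.run(..., cwd=TOOLDIR) can no longer change which binary it names.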
Andrew Tridgell [Tue, 26 May 2026 10:02:52 +0000 (20:02 +1000)]
ci: add actionlint workflow to lint GitHub Actions YAML
Adds .github/workflows/actionlint.yml which runs rhysd/actionlint over
.github/workflows/*.yml on push and PR to master. Triggers only when
something in .github/workflows/ (or the actionlint config) changes, so
the rest of the platform matrix isn't billed when nothing here moves.
The job downloads a pinned actionlint binary (1.7.12) via the upstream
download script (which verifies a SHA256) -- no third-party Action
dependency, matching the inline-install style of the existing
ubuntu/macos/cygwin workflows. Bump the pinned version deliberately.
actionlint catches a) GitHub Actions expression / type errors, b)
unsupported runner images, c) missing secrets / inputs, and d) the
embedded shellcheck class of issues in 'run:' scripts that the previous
commit cleaned up. Keeping it in CI prevents regressions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Andrew Tridgell [Tue, 26 May 2026 09:59:16 +0000 (19:59 +1000)]
ci: clean up workflow shellcheck nits
actionlint (rhysd/actionlint) reported a handful of shellcheck-class issues
across the GitHub Actions workflows. All are 1-line mechanical fixes:
* Replace legacy backticks in --rsync-bin=`pwd`/rsync with
--rsync-bin="$PWD/rsync" (SC2006 + SC2046; almalinux-8-build,
macos-build, ubuntu-22.04-build, ubuntu-build).
* Quote >>$GITHUB_PATH redirects as >>"$GITHUB_PATH"
(SC2086; coverage, macos-build, ubuntu-22.04-build, ubuntu-build).
After this commit `actionlint .github/workflows/*.yml` exits 0.
(Also cleaned up 6 editor backup *.yml~ files from the local working
tree; those weren't tracked -- *~ is gitignored -- so the cleanup is
local-only and not part of this commit.)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Andrew Tridgell [Sun, 24 May 2026 22:23:11 +0000 (08:23 +1000)]
testsuite: close minor assertion gaps
- symlink-dirlink-basis: assert the --backup file holds the pre-update
content, not merely that the backup file exists.
- acls-default: check that clearing the inherited default ACL actually
succeeded, so the no-default-ACL cases can't silently test against the
scratch dir's seeded default ACL.
- alt-dest: assert --copy-dest produces a distinct inode from the alt-dir
candidate (a copy, not a hard link) -- the property that distinguishes it
from --link-dest, which checkit's tree comparison alone doesn't capture.
(crtimes' "independently pin the historical create time" gap is left as-is: the
touch-trick pinning is APFS-specific and not locally verifiable, and a mistuned
probe would make the test skip on macOS and break its expected-skip set.)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>