git.ipfire.org Git - thirdparty/rsync.git/log

manpage: clarify remote-shell daemon user@ handling

The description of user@host::module transfers over a remote shell only
documented the "ssh -l ssh-user" form, which led readers to conclude that
user@ never reaches the remote shell. In fact, for the simple
`--rsh=ssh user@host::module` form the user@ prefix is used both as the
ssh login user (ssh -l user) and as the rsync-user offered to the module;
the two are the same name. rsync only omits its own -l when the remote
shell command already specifies one, in which case user@ becomes the
rsync-user alone.

Spell out the default behaviour and why the explicit -l is needed to use
a different ssh login than the rsync-user.

ci: restore default build target

ci: fix no-AT_FDCWD compile check

syscall: build without AT_SYMLINK_NOFOLLOW

match: bound the hash_search() chain walk (issue #217)

hash_search() walks the entire hash-table chain for the current rolling
checksum at every byte offset of the source file. Disk and VM images
contain large runs of identical blocks, so a single weak checksum
(get_checksum1) can collide thousands of times and pile every one of
those blocks onto one chain. When the sender then rolls across a region
whose weak checksum keeps landing on that chain without ever producing a
strong-checksum match, it re-walks the whole chain for every byte, giving
O(file_size * chain_length) behaviour. The result is rsync sitting at
100% CPU for hours with no apparent progress -- the long-standing "rsync
hangs on large files" reports.

Cap the number of same-weak-checksum candidates examined per offset at
MAX_CHAIN_LEN. Once the cap is hit we treat the offset as a non-match and
roll forward a byte; any block skipped this way is simply sent as literal
data, so the transferred result is always correct -- only the transfer
size is marginally affected. This is purely a sender-side search limit:
it changes no checksum, emitted byte, or protocol field, so a capped
sender interoperates with an unmodified receiver and vice versa.

On a synthetic 40000-block basis sharing one weak checksum, syncing a
60KB source dropped from ~18.4s to ~0.7s; the unbounded cost grows with
the square of the file size.
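
A minimal sketch of the capped chain walk, written in Python rather than rsync's C. The function name `find_match`, the dict-of-lists table, and the cap value of 4 are assumptions for illustration; the real limit and data structures live in rsync's source.

```python
MAX_CHAIN_LEN = 4  # hypothetical cap, chosen small so the effect is visible

def find_match(table, weak, strong_of_window, strong_of_block):
    """Return (matching block index or None, candidates examined).

    table maps a weak checksum to the list of basis block indices sharing it.
    """
    examined = 0
    for idx in table.get(weak, ()):
        if examined >= MAX_CHAIN_LEN:
            # Cap hit: treat this offset as a non-match. The caller rolls
            # forward one byte and the data is sent as literal bytes.
            return None, examined
        examined += 1
        if strong_of_block[idx] == strong_of_window:
            return idx, examined
    return None, examined
```

With a chain of 1000 decoys and no strong match, the walk stops after `MAX_CHAIN_LEN` candidates instead of 1000, which is what bounds the per-offset cost.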

testsuite/hashsearch-chain_test.py reproduces the pathology with a tiny
basis of weak-checksum-colliding decoy blocks and asserts, via the
existing false_alarms counter (--debug=deltasum1), that the per-hash-hit
chain walk stays bounded. The assertion is exact and machine-independent
rather than timing-based.

testsuite: C23 bool compatibility

Fix Solaris xattr retry handling

Use the remaining byte count for retry writes and avoid using a
size_t sentinel for write failures.

ci: run scan-build on pinned clang-18 + latest clang (informational)

Split the scan-build workflow into two non-gating jobs, each uploading
its HTML report as an artifact:

- pinned-clang18: clang-18 / clang-tools-18 on ubuntu-24.04, so the
  checker set -- and thus the report -- is deterministic.
- informational-latest: whatever clang ubuntu-latest ships, to surface
  what newer analyzers see.

Both are informational (no --status-bugs): the tree still has known
clang-18 findings, so the run reports without blocking the build.  Once
the tree is at zero for clang-18, re-add --status-bugs to the pinned job
to turn it back into a gate.  Installs libpopt-dev so configure finds
popt under the scan-build compiler wrapper.

scan-build: close a test-helper FILE* leak

wildtest.c: close the test file before main() returns (a real, if
exit-benign, FILE* leak flagged by scan-build).

scan-build: fix resource leaks on error paths

clientserver.c: close the --early-input-file FILE* on the
fstat/oversize/early-EOF error returns; it was only closed on the
success path.
getgroups.c: free the gid list before returning.

scan-build: drop dead assignments

Remove stores that are never read before being overwritten or going
out of scope. No behavior change except batch.c write_opt, which now
accumulates the leading-space write error into the return value
(consistent with the arg branch) instead of discarding it.

simd-checksum-x86_64.cpp, options.c, util1.c, batch.c

scan-build: zero-init buffers the analyzer can't prove are written

clang's static analyzer doesn't model SIVAL/SIVAL64/SIVALu or
getpeername/getsockname as initializing their target bytes, so it
reports false "garbage value" reads. Zero-init the affected buffers;
the bytes are always overwritten at runtime, so this only quiets the
analyzer.

io.c: write_varint/write_varlong b[]
hashtable.c: hash_search buf[]
socket.c: accepted_peer/our_local

generator: fix build warning

In sum_sizes_sqroot(), cnt is assigned a value but never read, so
remove it entirely.

fix funding github username

Create FUNDING.yml

runtests: write valgrind logs to a world-writable subdir

Under --valgrind some tests run rsync with reduced privileges: partial_nowrite
wraps it in "setpriv --inh-caps -all --bounding-set -all" to force EACCES, and
chdir-symlink-race's daemon drops to the module's uid. Such a child cannot
create valgrind's --log-file in a root-owned scratchbase, so valgrind aborts at
startup and the test fails (seen only in the root + --use-tcp cell).

Put the logs in a 1777 valgrind-logs/ subdir so a privilege-dropped child can
always write them. Scan and cleanup are unchanged; the logs just move one
directory down.
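
A sketch of creating such a log directory, assuming a hypothetical helper name `make_log_dir`; it is not the code in runtests.py. The explicit chmod matters because the mode given to mkdir is filtered by the umask.

```python
import os

def make_log_dir(scratchbase):
    """Create scratchbase/valgrind-logs with mode 1777 and return its path."""
    path = os.path.join(scratchbase, "valgrind-logs")
    os.makedirs(path, exist_ok=True)
    # Sticky + world-writable, like /tmp: any uid may create a log file,
    # but only a file's owner (or root) may remove it.
    os.chmod(path, 0o1777)
    return path
```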

generator: don't read an unstat'd sx.st when creating a device/special

This fixes a valgrind error: when the stat data was never filled in, we
could read an uninitialised sx.st field.

Also drop the now-obsolete testsuite/valgrind.supp stanzas for these
reads (atomic_create/delete_item, plus the rwrite strlcpy over-read that
master already fixed) -- they are no longer needed now the reads are gone.

Thanks to a report from Michael Mess <michael@michaelmess.de>

wildtest: don't read past the buffer when scanning a test line

main()'s line parser stepped through the fgets() buffer with `*++s` in
three places without first checking for the terminating NUL, so a test
line whose last token runs to the end of the buffer (e.g. a final line
with no trailing newline) could advance s past the NUL and read out of
bounds.

Guard the flag-separator check and rewrite the two whitespace-skip loops
so they never step past the NUL. No behaviour change for well-formed
input: the existing wildtest.txt still passes, and the crafted overflow
input is now clean under valgrind.

Fixes #776
Reported-by: vikk777 (@vikk777)

log: copy forwarded message by length in rwrite(), not strlcpy()

The valgrind memcheck CI flagged 'Conditional jump depends on uninitialised
value(s)' in rwrite() -> strlcpy() (log.c) and the subsequent logit() fprintf.
rwrite()'s daemon/logfile branch did strlcpy(msg, buf, MIN(sizeof msg, len+1)),
but strlcpy() scans the whole source with strlen(); buf is the data buffer from
read_a_msg() (io.c) holding exactly len bytes of a forwarded MSG_* payload with
no NUL terminator, so strlen() reads past the message into uninitialised stack.

Copy exactly len (bounded) bytes with memcpy() and NUL-terminate, matching the
(buf, len) contract the rest of rwrite() already honours. Behaviour is
unchanged for the NUL-terminated callers; the over-read is gone.

Full testsuite under valgrind (1572 logs) now reports zero unsuppressed errors.
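
The difference between the two copies can be modelled in Python. This is only an illustration of the (buf, len) contract, not rsync's code: `copy_by_length` and `strlen_scan` are hypothetical names, and the trailing bytes stand in for whatever follows the message in memory.

```python
def copy_by_length(buf, length, dest_size):
    """Model of a bounded memcpy plus NUL terminator for a (buf, length) message."""
    n = min(length, dest_size - 1)  # leave room for the terminator
    return bytes(buf[:n]) + b"\0"

def strlen_scan(buf):
    """Number of bytes a strlen() would examine: everything up to the first NUL."""
    return buf.index(b"\0") + 1

# A 5-byte payload followed by unrelated bytes, with no NUL inside the payload.
payload_len = 5
memory = b"hello" + b"\xaa\xbb\x00"
```

Here `strlen_scan(memory)` examines 8 bytes although the message is only 5 long, while `copy_by_length(memory, payload_len, 64)` touches exactly the payload.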

ci: build test helpers before the valgrind run

`make` alone does not build the CHECK_PROGS test helpers (tls, trimslash,
t_chmod_secure, ...), so runtests.py exited immediately with "missing
test helper program(s)", produced no valgrind logs, and the scan step
failed every job with "the suite did not run". Use `make check-progs`,
which builds rsync plus the helpers and symlink fixtures without running
the suite.

testsuite: force C locale in reverse-daemon-delta byte-count parse

rsync groups the "sent/received N bytes" summary numbers using the
locale's thousands separator (e.g. de_DE uses '.'), which broke the
[\d,]+ parser and failed the test for testers in non-C locales. Run the
peer client under LC_ALL=C so the output is deterministic.

Reported-by: Michael Mess <michael@michaelmess.de>
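
A sketch of why the parse is locale-sensitive, assuming a summary line of the usual "sent N bytes  received N bytes" shape. The regex and helper names are illustrative, not the test's actual code.

```python
import os
import re

SUMMARY = re.compile(r"sent ([\d,]+) bytes\s+received ([\d,]+) bytes")

def parse_summary(line):
    """Return (sent, received) as ints, or None if the line does not match."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    return tuple(int(g.replace(",", "")) for g in m.groups())

def c_locale_env():
    """Environment for the child process with the locale pinned to C."""
    return dict(os.environ, LC_ALL="C")
```

A line grouped with '.' (as a de_DE locale would print it) does not match `[\d,]+`, so `parse_summary` returns None; passing `env=c_locale_env()` to the subprocess call avoids the problem at the source.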

testsuite: add gating valgrind memcheck workflow + suppressions

Add a .github/workflows/valgrind.yml that runs the full suite under
valgrind in a 2x2 matrix (user/root x pipe/tcp transport) and gates on
memory errors. It uses --leak-check=no: rsync intentionally leaves
file-list/socket/option memory unfreed at exit, so a leak check is
inherently noisy; the gate flags uninitialised reads, invalid
read/write, bad frees and uninit syscall params instead.

Add testsuite/valgrind.supp covering the known-benign reports (rwrite
strlcpy over-read on a non-NUL-terminated peer message, atomic_create/
delete_item st_mode read under fakeroot, libfakeroot msgsnd padding,
plus popt/xxhash leaks for manual --leak-check audits). runtests.py
--valgrind now loads it automatically.

token: allow uncompressed literal runs larger than CHUNK_SIZE

The hardening in c44c90e9 added a check in simple_recv_token() rejecting
any uncompressed literal-run length > CHUNK_SIZE (32k). That assumption
breaks interoperability: other rsync implementations -- e.g. the acrosync
library used by the iOS "PhotoBackup" app -- use a 64k block size and
send literal runs of 65536 bytes, which 3.4.3+ now rejects with
"invalid uncompressed token length 65536".

The check was unnecessary: simple_recv_token() already reads the run
CHUNK_SIZE bytes at a time via the residue loop (n = MIN(CHUNK_SIZE,
residue)), so read_buf() never writes past the static CHUNK_SIZE buffer
regardless of the wire-supplied length. Drop the check to restore
interop; the compressed-token integer-overflow fix from c44c90e9 (the
MAX_TOKEN_INDEX / rx_token caps) is left unchanged.

Fixes #1002
Reported-by: Jack Whitham
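
The residue loop described above can be sketched as follows. This is a Python model under assumptions (a file-like `stream`, a generator interface); the real simple_recv_token() is C and reads into a static buffer.

```python
CHUNK_SIZE = 32 * 1024  # the 32k figure quoted above

def read_literal_run(stream, run_length):
    """Yield a literal run in pieces no larger than CHUNK_SIZE."""
    residue = run_length
    while residue > 0:
        n = min(CHUNK_SIZE, residue)  # never ask for more than one buffer
        chunk = stream.read(n)
        if len(chunk) != n:
            raise EOFError("stream ended inside a literal run")
        residue -= n
        yield chunk
```

A 65536-byte run arrives as two 32768-byte pieces, so the wire-supplied length never determines how much is read in one step.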

testsuite: fix executability test skip on FreeBSD (EFTYPE)

FreeBSD and OpenBSD return EFTYPE (errno 79) when chmod-ing a sticky bit
onto a regular file as non-root, rather than EPERM/EACCES. Catch OSError
and check errno against the expected skip set so the test skips correctly
on those platforms instead of erroring out.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
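
A sketch of the errno-based skip check, with a hypothetical helper name `try_set_sticky`. Python's errno module exposes EFTYPE only on platforms that define it, so the lookup is guarded rather than hard-coding a number.

```python
import errno
import os
import stat

# Errnos that mean "this platform or user cannot set the bit": skip, don't fail.
SKIP_ERRNOS = {errno.EPERM, errno.EACCES}
if hasattr(errno, "EFTYPE"):  # present on the BSDs, absent on Linux
    SKIP_ERRNOS.add(errno.EFTYPE)

def try_set_sticky(path):
    """Return True if the sticky bit was set, False if the test should skip."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    try:
        os.chmod(path, mode | stat.S_ISVTX)
    except OSError as e:
        if e.errno in SKIP_ERRNOS:
            return False
        raise  # anything else is a genuine error
    return True
```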

abdiff: A/B differential regression hunter for rsync

testsuite/abdiff.py runs the same benign transfer with two rsync binaries
(A = build under test, B = a baseline) and compares the OUTCOME -- exit code,
stderr, --stats "Literal data", the destination tree (content + full metadata),
the --itemize list, and (with --cost) peak process-group RSS. For benign input
the two must be indistinguishable; any divergence is a regression candidate.
It is a developer tool, NOT a runtests.py test (does not end in _test.py).

Capabilities:
- Scenario sweeps over options / path shapes / file types / sizes / modes /
  selection / placement / wire / transports, plus domain-knowledge pairwise +
  combo sweeps and a stochastic fuzzer/role matrix.
- Transport lanes: local, ssh split (lsh.sh), stdio-pipe daemon, a REAL TCP
  daemon (bound port + greeting/handshake/auth challenge-response), and the
  restricted rrsync wrapper (support/rrsh.sh; each binary paired with its own
  version's rrsync via --rrsync-a/--rrsync-b, since rrsync ships in the script).
- Stability gate: each binary is run N times and escalated on a candidate diff;
  nondeterministic scenarios are quarantined FLAKY, never reported as regressions.
- Parallel (-j, default 20) with a per-run findings log; --loop runs until
  --timelimit (or Ctrl-C), feeding the pool a half-random / half-systematic
  stream of new combinations. As root an "all" run also folds in the root-only
  sweeps (priv, daemonchroot).
- General coverage levers: a cost oracle (--cost, peak RSS over the whole process
  group), transport lifted as an orthogonal axis, a resume/redo sweep, and
  type-transition / nanosecond-mtime / scale (--scale N) fixtures.

Documented in testsuite/README.md.

testsuite: add perftest.py to compare two rsync builds' transfer speed

A standalone dev tool (run directly, not via runtests.py) for catching
performance regressions between rsync releases. Given two rsync binaries it
builds one deterministic test tree -- heavy-tailed file sizes, a directory
spine, symlinks, hard links and a spread of permission modes, modelled on the
gentestdata generator -- then runs the two binaries ALTERNATELY for N loops,
timing each transfer, and reports the mean and standard deviation per binary.

Each loop times a full copy into an emptied destination and an incremental
no-op against an already-synced one (rsync's scan/file-list/stat overhead,
where many regressions hide); --mode selects. The first run of each binary is
dropped to reduce page-cache impact, the run order alternates to cancel drift,
and a B-vs-A slowdown is flagged only when it exceeds the run-to-run noise.
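
The reporting step can be sketched as below. The function names and the two-sigma threshold are assumptions for illustration; the commit message does not state how perftest.py defines "exceeds the run-to-run noise".

```python
import statistics

def summarize(times, drop_first=True):
    """Return (mean, stdev) of run times, optionally dropping the warm-up run."""
    kept = times[1:] if drop_first else times
    return statistics.mean(kept), statistics.stdev(kept)

def is_slowdown(a_times, b_times, sigmas=2.0):
    """Flag B as slower than A only when the gap exceeds the observed noise."""
    a_mean, a_sd = summarize(a_times)
    b_mean, b_sd = summarize(b_times)
    noise = sigmas * max(a_sd, b_sd)  # assumed noise estimate
    return (b_mean - a_mean) > noise
```

Dropping the first sample keeps a cold page cache from inflating one binary's mean, and comparing against the spread avoids flagging differences smaller than the jitter.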

docs: clarify chmod copy special bits

chmod: clear special bits on copy assignment

chmod: support permission copy modes

lib: use .balign in md5 x86-64 asm to fix macOS over-alignment

The file used ".align 16" intending 16-byte alignment (GNU/ELF semantics).
On macOS the Mach-O assembler reads ".align N" as 2^N, so it requested
64KB alignment for __TEXT,__text, producing:

ld: warning: reducing alignment of section __TEXT,__text from 0x10000
to 0x1000 because it exceeds segment maximum alignment

The linker clamps it back, so it was harmless, but .balign 16 means
16 bytes on every target and silences the warning.
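
The arithmetic behind the warning, as a small worked example. The helper `align_bytes` is hypothetical and only models the two readings of the directive's operand.

```python
def align_bytes(n, power_of_two):
    """Bytes requested by '.align n' under the two assembler conventions."""
    # power_of_two=True models the Mach-O reading (2**n);
    # power_of_two=False models the byte-count reading.
    return 2 ** n if power_of_two else n

macho = align_bytes(16, power_of_two=True)   # 65536 bytes, i.e. 0x10000
elf = align_bytes(16, power_of_two=False)    # 16 bytes
```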

checksum: guard the AVX2 roll-asm path with a runtime CPUID check

When built with --enable-roll-asm, get_checksum1() called the AVX2 asm
routine get_checksum1_avx2_asm() unconditionally. Unlike the intrinsic
path (get_checksum1_avx2_64), which is function-multiversioned with a
target("default") fallback and so resolves safely on any CPU, the asm
routine is a single AVX2-only symbol with no fallback. On an x86-64 host
without AVX2 (an older CPU, or a VM that does not expose AVX2) the first
block checksum executes a VEX-encoded instruction and dies with SIGILL,
which surfaces as "connection unexpectedly closed (0 bytes received so
far)" and a code-12 protocol error.

Gate the asm call on a cached __builtin_cpu_supports("avx2") check, the
same signal the intrinsic resolver uses. When AVX2 is absent we skip it
and the SSSE3/SSE2/scalar steps (safe everywhere) do the work. Apply the
same guard in the simdtest harness so it can run on non-AVX2 hosts too.

tests: add clang scan-build static-analysis CI (informational)

Run the clang static analyzer over a check-progs build, publish the HTML report
as an artifact, and print the bug count to the run summary. INFORMATIONAL only:
it does not pass --status-bugs, so it surfaces new analyzer findings without
going red on the existing (overwhelmingly false-positive) reports.

Runs on push/PR to master and via workflow_dispatch. No cron: it is
informational and its output only changes with the code (push/PR) or the clang
version, so a daily run on an unchanged tree would add noise without value.

tests: add ASan+UBSan CI gate

Add a clang AddressSanitizer + UndefinedBehaviorSanitizer workflow that builds
rsync with -fsanitize=address,undefined -fno-sanitize-recover=undefined -DNDEBUG
and runs the full test suite over both the stdio-pipe and TCP daemon transports.

UBSAN_OPTIONS=halt_on_error=1 together with -fno-sanitize-recover=undefined makes
any undefined behaviour fatal, so this job gates: the tree must stay UBSan-clean.
The remaining findings are fixed in code (hashtable/mdfour shifts, xattrs, and
log.c's file_struct, kept aligned via rounding.h); only byteorder.h's intentional
unaligned accessors are suppressed, with no_sanitize. -DNDEBUG builds as a release
does (assert() compiled out) so ASan covers the production code paths.

Runs on push/PR to master and via workflow_dispatch, plus a weekly cron to
catch breakage from a moving ubuntu-latest/clang toolchain (push/PR already
cover every code change, so daily would just re-run an unchanged tree).

io: drop the dead/unnecessary read_varint UBSan guard

The cherry-picked #428 wrapped no_sanitize attributes on read_varint() and
read_varlong() in `#ifndef CAREFUL_ALIGNMENT`, but byteorder.h always
#defines CAREFUL_ALIGNMENT (to 0 or 1), so that guard is never true and the
attributes were dead code.

They are also unnecessary: both functions read the assembled value through
an aligned union member (union { char b[5]; int32 x; }), not an unaligned
cast, so UBSan's alignment check never fires there (verified: the ASan+UBSan
suite is clean without them). Remove the whole block rather than fix the
guard. (The byteorder.h annotations from #428, which are real and correctly
placed inside the !CAREFUL_ALIGNMENT branch, are kept.)

Disable UBSAN for alignment-sensitive functions when !CAREFUL_ALIGNMENT

rsync sets CAREFUL_ALIGNMENT for architectures which do not support
unaligned access. Disable UBSAN for functions which may use unaligned
accesses when CAREFUL_ALIGNMENT is not set.

Bug: https://github.com/WayneD/rsync/issues/427
Signed-off-by: Sam James <sam@gentoo.org>
(cherry picked from commit 11c1e934e8ac4a6c44fbd3d7bb171c04d928bb40)

log: align the file_struct built in log_delete()

log_delete() builds a struct file_struct inside a char buffer offset by the
(EXTRA_LEN-granular) extra data. The EXTRA_ROUNDING block that rounds that
offset up to the struct's alignment (exactly as flist.c does for its pool
allocations) was dead code here: log.c never included rounding.h, so
EXTRA_ROUNDING was undefined and the rounding never ran, leaving the
file_struct pointer potentially under-aligned. That trips UBSan's alignment
check and would fault on strict-alignment arches.

Include rounding.h (and add the Makefile dependency) so the existing rounding
actually applies -- fixing the alignment at the source rather than suppressing
the sanitizer.

xattrs: fix UBSan-detected undefined behavior

Three pre-existing issues UBSan flags during the xattr tests:

  * xattr_lookup_hash(): the summed hashlittle2() values overflow the
    signed int64 accumulator (UB).  Accumulate in uint64_t and convert back
    at return -- the key is only used for hash-table equality, so the value
    is unchanged.
  * rsync_xal_get(): for an empty list (count == 0) the loop init
    `rxa += count-1` forms `items - 1` on a NULL `items` (UB).  Guard with
    `if (count)`.
  * rsync_xal_store(): `memcpy(dst, xalp->items, 0)` passes a NULL source for
    an empty list (UB).  Guard with `if (xalp->count)`.

hashtable, mdfour: avoid signed left-shift overflow

UBSan flags two spots that shift a value into the top bits of a word via a
signed operand:

  * lib/mdfour.c copy64(): `in[i] << 24` promotes the uchar to int, so a
    byte >= 128 overflows int (UB).  Cast each byte to uint32.
  * hashtable.c NON_ZERO_64(): `(int64)(x) << 32` overflows int64 whenever
    x's high bit is set.  Shift as uint64_t (covers all four call sites).

Behavior-preserving -- only the intermediate type changes; the resulting
bit pattern is identical.
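
A worked check of the copy64() case, using Python integers to stand in for C's types. The helper names are hypothetical; Python itself never overflows, so the comparison against INT32_MAX marks where the C expression becomes undefined.

```python
INT32_MAX = 2**31 - 1

def shift_as_int(byte):
    """Mathematical value of byte << 24; undefined in C once it exceeds INT32_MAX."""
    return byte << 24

def shift_as_uint32(byte):
    """Value of (uint32)byte << 24, which is always defined and fits in 32 bits."""
    return (byte << 24) & 0xFFFFFFFF
```

For a byte of 0x80 the shifted value is 0x80000000, one more than INT32_MAX, which is why the promoted int overflows while the unsigned form yields the same bits without undefined behaviour.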

rsync-web: updates for the 3.4.4 release

release.py: accept a git worktree in require_top_of_checkout()

In a git worktree .git is a file (a gitdir pointer), not a directory,
so os.path.isdir('.git') wrongly aborted with "no .git dir" when the
release was run from a worktree. Use os.path.exists() so it works from
both a normal checkout and a linked worktree.
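
A sketch of the difference, assuming a hypothetical helper `has_git_marker`; it is not the body of require_top_of_checkout().

```python
import os

def has_git_marker(path="."):
    """True for a normal checkout (.git directory) or a linked worktree (.git file)."""
    return os.path.exists(os.path.join(path, ".git"))

def has_git_dir_only(path="."):
    """The stricter check: True only when .git is a directory."""
    return os.path.isdir(os.path.join(path, ".git"))
```

In a linked worktree `.git` is a small text file containing a `gitdir:` pointer, so `has_git_dir_only` returns False there while `has_git_marker` returns True.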

ci: move the daily scheduled jobs to weekly

Every platform build (the BSD/Solaris/macOS/cygwin/almalinux/ubuntu jobs),
coverage, the version-mix job and the android static build ran on a daily cron
*in addition to* push and pull_request to master. Since push/PR already cover
every code change, the cron only adds drift coverage -- catching breakage from a
moving runner image or toolchain that no commit triggers. Those images do not
change daily, so a daily run mostly re-tests an unchanged tree.

Move them all to a weekly cron (Mondays, keeping each job's existing time) to
keep that drift coverage at roughly a seventh of the Actions spend and log
noise. fleettest was already weekly. Per-change CI on push/PR is unchanged, and
workflow_dispatch still allows an on-demand run.

fleettest: --cleanup also kills stray flippers/daemons and root-owned dirs

A run killed without a parent-death backstop can strand a TOCTOU path-flipper
(a busy `python -c` rename loop that pins a CPU) and an orphaned test rsyncd
(--no-detach --address=127.0.0.1) that squats its fixed port -- the wedge the
claim_ports() bind-probe now reports and points at --cleanup. Sweep both, best
effort, before removing the run dirs.

Each sweep counts the pattern, kills it (with a `sudo -n` retry for a process a
root-running test left), then re-counts after a settle: KILLED reports what
actually died, and a process that survives (pkill blocked, no passwordless sudo,
missing/limited pkill) is reported as SURVIVED and fails the run instead of
falsely claiming success.

Run-dir removal falls back to `sudo -n rm` so a dir whose contents a root test
owns is removed instead of failing with "Permission denied" (the failure mode
seen on the ubuntu/mac targets); only a dir that survives even sudo is failed.

The kill patterns use the pgrep self-exclusion trick ('r[e]name', 'det[a]ch')
so they match a real process's "rename"/"detach" but not the literal pattern in
the cleanup shell's own argv -- run_on() passes the whole script as the remote
argv, so without it --cleanup would signal itself. The patterns are host-global
(not scoped to one run), so --cleanup is documented to run between runs, not
during one.
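
The self-exclusion trick can be checked with a regex model. This sketch uses Python's `re` to stand in for the pattern matching that pgrep/pkill perform on a command line; the command lines are made-up examples.

```python
import re

def matches(pattern, cmdline):
    """True if the regex matches anywhere in the command line."""
    return re.search(pattern, cmdline) is not None

flipper = "python -c 'import os; os.rename(a, b)'"  # a real target process
cleanup = "sh -c \"pkill -f 'r[e]name'\""            # the cleanup shell's own argv
```

The bracket expression `[e]` matches the letter "e", so `r[e]name` finds "rename" in the flipper; the cleanup shell's argv contains the text `r[e]name`, which has a "[" after the "r" and therefore does not match.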

testsuite: verify a claimed test port is actually bindable

claim_ports() takes a POSIX byte-range lock per port, which serializes
concurrent live test runs. But the kernel drops that lock the instant the
holding process dies, even if the run left an orphaned rsync --daemon still
bound to the port -- which happens when a run is SIGKILLed on a platform with
no parent-death backstop (rsyncfns only arms PR_SET_PDEATHSIG, Linux-only, so
the BSDs/Solaris/macOS can strand a daemon). A later run then wins the freed
lock while the socket is still squatted and dies with a cryptic "bind() failed:
Address already in use" / "did not see server greeting".

After taking each lock, actually bind the port (SO_REUSEADDR, so a port merely
in TIME_WAIT is not a false positive; only a live squatter fails) and close it
immediately. On failure stop with an actionable message naming the port and the
likely orphaned daemon. Closes the gap that masked the OpenBSD daemon-auth wedge.
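
A minimal sketch of such a bind probe on a POSIX host. The function name `port_is_bindable` and the default host are assumptions; the real claim_ports() also takes the byte-range lock first and reports a fuller message.

```python
import socket

def port_is_bindable(port, host="127.0.0.1"):
    """Return True if a TCP socket can bind host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # SO_REUSEADDR lets the bind succeed on a port that is merely in
        # TIME_WAIT; a live listener on the port still makes it fail.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
        except OSError:
            return False
    # The socket is closed on leaving the with-block, freeing the port.
    return True
```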

fleettest: require runtests.py in --testsuite-repo, not the build tree

When --testsuite-repo provides the suite, the build tree (--repo) need not
carry runtests.py -- it may be an older release whose shell testsuite predates
the Python runtests.py (e.g. a 3.4.1 backport branch built and tested with the
current suite). Check runtests.py in TESTSUITE_REPO and only require the build
tree to be rsync source (rsync.h).

fleettest: add --testsuite-repo to run another tree's suite against this build

--repo couples the built source and the test suite that exercises it.
--testsuite-repo PATH overlays runtests.py + testsuite/ from a second tree onto
the staged build tree, and sources the expected-skip workflows from it, so one
can build an older release (e.g. a 3.4.x stable branch) and run the current
comprehensive suite against that binary. Defaults to --repo, so the existing
single-tree behaviour is unchanged.

runtests: stop discovering obsolete *.test shell tests

The shell testsuite was removed in 1f689ec0 (rewritten in Python); only
*_test.py remain, yet collect_tests still globbed *.test and _testbase mapped
foo.test and foo_test.py to the same canonical name. Harmless on a master tree
(no .test files), but when an older tree's *.test files are present -- e.g.
fleettest --testsuite-repo building a 3.4.x release whose shell suite still
exists -- both glob to the same test name and scratch dir and race under -j,
producing spurious failures. Drop .test discovery entirely.

testsuite,ci: mark recv-discard-nullderef CI skip and tighten its check

The regression test honestly skips when it cannot force the receiver's
output mkstemp() to fail -- as root (root bypasses DAC) and on Cygwin
(chmod 0555 does not deny the owner a write). The ubuntu, ubuntu-22.04,
almalinux and macOS jobs run `make check` as root, and Cygwin can't
enforce the unwritable directory, so the test skips on all of them.
runtests.py fails a run on any skip-set mismatch, so add the test to
those jobs' RSYNC_EXPECT_SKIPPED lists; the BSD/Solaris jobs run as root
too but enforce no expected-skip set, so they need no change.

Also tighten the pass condition. The post-chmod writability probe already
guarantees the receiver discards (mkstemp must fail), so an exit 0 would
mean the file actually transferred and the discard path was never
exercised -- a silent false-pass. Require exactly exit 23 (the forced
discard leaves the file untransferred); 12 remains the pre-fix crash.

testsuite: regression for the receiver discard-path NULL deref

Drives a real sender<->receiver pair (client sender -> daemon receiver,
both the binary under test in the default pipe transport) so the receiver
actually takes the recv_files discard path -- a local `rsync a b` does
not. The basis and source share a leading block so the generator emits
real sums and the receiver gets a block MATCH; the destination directory
is made unwritable so the receiver's output mkstemp() fails and it
discards the delta. Pre-fix the receiver SIGSEGVs in full_fname(NULL),
which the client sees as a protocol-data-stream error (code 12); post-fix
it drains the delta and reports a benign code 23 (or 0).

Skips (exit 77) when run as root, since root bypasses DAC and the
unwritable destination would not make mkstemp() fail -- so the discard
path, and the bug, would never be reached.

Verified red-on-buggy / green-on-fixed against the 0d0399bb receiver.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

receiver: fix NULL deref on the delta discard path

receive_data() crashed a receiver that was merely DISCARDING a file's
delta stream. discard_receive_data() calls receive_data() with
fname == NULL and fd == -1, so size_r == 0 and mapbuf == NULL. A normal
block-MATCH token (against a block the basis and source share) then
reaches the !mapbuf branch added in 31fbb17d ("receiver: fix absolute
--partial-dir delta resume"), which calls full_fname(fname). full_fname()
dereferences its argument unconditionally (util1.c: `if (*fn == '/')`),
so fname == NULL faults there -> receiver SIGSEGV.

This is a normal-operation crash with a stock cooperating sender, not an
adversarial one. The generator hands the sender real block sums whenever
the basis is readable and we're in delta mode; the receiver only decides
to discard afterwards, when its output cannot be produced -- e.g. the
destination directory is not writable (mkstemp fails), the basis turns
out to be a directory, or a --partial-dir resume is skipped. A MATCH
token arriving during that discard hit the NULL deref.

The 31fbb17d branch is correct only for a REAL output transfer (fd != -1,
fname valid): there, a block match with no mapped basis is a genuine
protocol inconsistency (the generator promised a basis the receiver could
not open), and honoring it would silently omit those bytes from the
verification checksum or leave a hole, so hard-erroring -- and
full_fname(fname) -- is right. It conflated that with the discard path.

The discriminator is fd, not mapbuf: on the discard path fd == -1 always;
on the real-output inconsistency fd != -1. Scope the "no basis file"
protocol error to fd != -1 (where fname is non-NULL and full_fname is
safe) and, on the discard path (fd == -1), absorb the matched bytes
benignly (offset += len; continue) -- symmetric with the literal-token
handling just above, and restoring the pre-31fbb17d behavior. The
real-transfer inconsistency check is preserved unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

fleettest: add a per-target max_retry budget for flaky tests

A slow or heavily-loaded fleet box can occasionally flake a concurrency-
sensitive test (e.g. a daemon/lsh test under -j8 on a nested-VM Solaris box).
Rather than dropping the whole target to a lower -j, add a per-target
"max_retry" property: after a run, each failed test is re-run on its own up to
max_retry more times, and any that then pass are dropped from the failure list.
Recovered tests are listed in a new "RECOVERED" report section, so a flake is
surfaced, never silently hidden.

Applies to every pass for the target (pipe, tcp, protoNN, nonroot). Default 0
keeps the current no-retry behaviour.
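
The retry budget can be sketched like this. The function name `retry_failures` and the `run_one` callback are assumptions; fleettest's actual implementation is not shown in the commit message.

```python
def retry_failures(failed, run_one, max_retry=0):
    """Re-run each failed test alone; return (still_failed, recovered)."""
    still_failed, recovered = [], []
    for test in failed:
        for _ in range(max_retry):
            if run_one(test):
                # Passed on a retry: report it as recovered, not as a failure.
                recovered.append(test)
                break
        else:
            # No retry passed, or max_retry is 0 and none was attempted.
            still_failed.append(test)
    return still_failed, recovered
```

Returning the recovered list separately is what lets a report show flakes in their own section instead of hiding them.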

docs: fix option summary inconsistencies

ci: test uninstall targets

testsuite: correct files-from comment coverage

testsuite: cover files-from comments

docs: describe files-from comments

testsuite: cover groupmap empty source matching

docs: clarify empty name groupmap matching

docs: clarify batch compression limits

configure: avoid runtime IPv6 availability probe

docs: mention systemd rsync daemon units

build: fix rrsync manpage fallback

fleettest: add per-target protocol passes (check30/check29)

A target can list older "protocols" (e.g. [30, 29]) in the fleet config;
each runs as an extra stdio-pipe pass with runtests --protocol=N, the fleet
analogue of a workflow's check30/check29 steps. The passes reuse the same
parsed RSYNC_EXPECT_SKIPPED list as the default pipe run and appear as protoNN
columns in the report and --timing breakdown. Targets without the key run only
the default protocol and show "-" there.

The example config's ubuntu-2604 target (mirroring ubuntu-build.yml, which has
check30/check29 steps) now sets protocols: [30, 29].

rrsync: add -absolute argument to support calling rsync with an absolute path

Signed-off-by: SebMtn <102696928+SebMtn@users.noreply.github.com>

receiver: try to chmod the target file when denied opening

When the target file exists but its permission modes prevent us from
opening it for writing, try to chmod it first and then open it again.

Always clear st out and validate nanoseconds before using it

Otherwise we get errors.
Fixes: https://github.com/RsyncProject/rsync/issues/927

testsuite: regression for #880 --mkpath --dry-run file-to-file

Covers both halves: a --mkpath file-to-file --dry-run must succeed and
match the real run (the #880 abort), and a plain file-to-file --dry-run
onto an existing differing destination must still itemize the real change
rather than report it as brand new. Both compare "--dry-run -i" output
against the real run.

Co-authored-by: Stiliyan Tonev (Bark) <stiliyan21@gmail.com>

main: fix --mkpath + --dry-run file-to-file copy (#880)

A single-file --mkpath copy whose destination parent does not exist
failed under --dry-run: make_path() only *reports* the directories it
would create in a dry run, so change_dir#3 then tried to chdir into a
parent that isn't there and aborted with "change_dir#3 ... failed".

When the parent is genuinely missing in a dry run, skip the chdir and
mark the destination as not-yet-present (dry_run++), exactly as the
multi-file/dir-creation path already does, so the generator doesn't
probe the missing tree. Gating it on the missing-parent case keeps an
ordinary file-to-file dry run chdir'ing into and itemizing against an
existing destination.

Fixes: #880
Co-authored-by: Stiliyan Tonev (Bark) <stiliyan21@gmail.com>

Drop stale "redo manual as SGML" TODO entries

The SGML manual idea is long dead (man pages are markdown now, and the
DocBook source was just removed). Remove both TODO mentions.

Remove obsolete DocBook manual

doc/rsync.sgml is a 1996-2002 DocBook user manual (with README-SGML
describing the docbook-utils build) that was long ago superseded by the
markdown man pages. It is unmaintained and referenced by nothing in the
build. This empties doc/.

Remove obsolete design notes

rsync3.txt and rsyncsh.txt are Martin Pool's 2001 design proposals
("notes towards a new version of rsync", an interactive rsync shell),
neither of which reflects the current implementation. doc/profile.txt is
stale profiling notes. None are referenced by the build, tests, or docs.

Remove obsolete testhelp/maketree.py

This Python 2 test-tree generator (print statements, string.letters,
.next()) has been broken on modern Python for years and is referenced
nowhere in the build, tests, or any script. Drop it.

fix: daemon upload delete stats

token: drain the matched-block insert deflate (#951)

send_deflated_token() adds a matched block to the compressor history with
deflate(Z_INSERT_ONLY).  Our bundled zlib implements Z_INSERT_ONLY (it
produces no output and consumes the input in one call), but a build
against a system zlib lacks it and falls back to Z_SYNC_FLUSH (see the top
of the file), which emits a flush block into obuf.  For a large
incompressible matched token that block exceeds AVAIL_OUT_SIZE(CHUNK_SIZE),
so deflate returned with avail_in != 0 and the transfer aborted:

    "deflate on token returned 0 (N bytes left)"  at token.c

The insert output is never sent -- the receiver rebuilds the matching
history itself in see_deflate_token() -- so loop, resetting the output
buffer, and discard it.  Drain with the same condition as the data loop
above: until the input is consumed AND avail_out != 0.  Stopping at
avail_in == 0 alone can leave pending output in the deflate stream (a
full output buffer with bytes still buffered), which would then be emitted
by the next real deflate send and corrupt the stream.  A bundled-zlib
build still finishes in one iteration.

Fixes: #951

fix: install generated manpages out of tree

fix: update skips different file type

ci: add ubuntu-latest fleettest workflow against a localhost fleet

fleettest is a developer tool meant to run on a modern Ubuntu box, so a
bitrot check belongs in its own ubuntu-latest job rather than in the
testsuite (which runs on the BSD/Solaris/macOS/Cygwin matrix, whose
older Pythons may not even parse it).

The job sets up passwordless ssh to localhost, writes a two-target
fleet config that both ssh to localhost (distinct build dirs), and runs
a real fleettest pass. Two targets exercise the parallel multi-target
path and the per-run dir / port isolation; the run exits 0 only if
every cell is OK. Triggered on changes to fleettest.py or this
workflow, manually, and weekly.

fleettest: add --timing to show per-target wall-clock

Records wall-clock per phase (push, build, each test transport, nonroot)
plus a total in TargetResult, and with --timing prints a breakdown after
the report, sorted slowest-target-first. Targets run in parallel, so the
run is gated by the slowest one; the phase columns show whether that
hold-up is the push, the build, or a test pass. A target that failed
early (no total) falls back to the sum of the phases it reached.
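
The fallback and ordering described above can be sketched as follows. This
is only an illustration: TargetTiming, effective_total() and
slowest_first() are hypothetical names, not fleettest.py's own, and the
phase values are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TargetTiming:
    """Hypothetical per-target record: seconds per phase plus an optional total."""
    name: str
    phases: Dict[str, float] = field(default_factory=dict)
    total: Optional[float] = None  # None when the target failed before finishing


def effective_total(t: TargetTiming) -> float:
    # A target that failed early has no total, so use the phases it reached.
    if t.total is not None:
        return t.total
    return sum(t.phases.values())


def slowest_first(targets: List[TargetTiming]) -> List[TargetTiming]:
    # The run is gated by the slowest target, so list that one first.
    return sorted(targets, key=effective_total, reverse=True)


if __name__ == "__main__":
    results = [
        TargetTiming("target-a", {"push": 2.0, "build": 40.0, "pipe": 90.0}, total=132.0),
        TargetTiming("target-b", {"push": 3.0, "build": 75.0}),  # failed early
        TargetTiming("target-c", {"push": 1.0, "build": 20.0, "pipe": 30.0}, total=51.0),
    ]
    for t in slowest_first(results):
        print(f"{t.name:10s} {effective_total(t):7.1f}s")
```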

fleettest: tighten --cleanup sweep scope and rm hardening

Address review findings on the cleanup paths:

- --cleanup no longer removes a bare <builddir>, only the suffixed
  <builddir>-* run dirs it created. This keeps the sweep within its
  documented scope and avoids clobbering an unrelated tree.

- Add _unsafe_builddir(): reject empty/root/$HOME and any absolute path
  directly under / (e.g. a misconfigured builddir of "/tmp") before
  building a destructive command, in both cleanup paths.

- Use `rm -rf --` so a path with a leading dash can't be read as options.

- Soften the docs: run-dir removal on Ctrl-C/kill is best-effort (a
  signal arriving mid-push can still leave a remnant for --cleanup).
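
A minimal sketch of the guard and the hardened rm described in the list
above. It only models the rules named there (empty, root, $HOME, and an
absolute path directly under /); unsafe_builddir() and rm_command() are
illustrative names, and the real _unsafe_builddir() may check more.

```python
import posixpath
import shlex


def unsafe_builddir(builddir: str, home: str) -> bool:
    """Return True if builddir must not be used in a destructive command."""
    if not builddir.strip():
        return True  # empty
    path = posixpath.normpath(builddir)
    if home and path == posixpath.normpath(home):
        return True  # $HOME itself
    if path.startswith("/"):
        parts = [p for p in path.split("/") if p]
        # "/" has no components; "/tmp" has one (directly under /).
        if len(parts) <= 1:
            return True
    return False


def rm_command(path: str) -> str:
    # "--" ends option parsing, so a leading dash in path is not an option.
    return "rm -rf -- " + shlex.quote(path)


if __name__ == "__main__":
    for candidate in ["", "/", "/tmp", "/home/u", "/home/u/rsync-build"]:
        print(repr(candidate), unsafe_builddir(candidate, "/home/u"))
    print(rm_command("-odd dir"))
```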

fleettest: isolate concurrent runs and add config/cleanup options

Each run now builds in its own randomly-named dir on every target
(<builddir>-<run_id>), so two or three fleettest runs can share the same
fleet without colliding on the pushed tree, the build, or the testtmp
scratch. Port collisions were already handled by claim_ports() locks.

The run dir is removed when the run ends -- on success, failure, or
Ctrl-C/kill (atexit + SIGINT/SIGTERM handlers); --keep retains it. A new
--cleanup mode sweeps stray <builddir>-* dirs left by a SIGKILL.

Incremental builds are dropped (every run is a fresh dir + full build):
--no-push removed, --clean removed.

Also look for the fleet config at ~/.fleettest.json first, then
testsuite/fleettest.json (still overridable with --fleet PATH).
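
The lookup order and the per-run directory name can be sketched like this.
The function names and the 8-hex-digit run id are assumptions made for the
example; only the paths and the <builddir>-<run_id> shape come from the
text above.

```python
import os
import secrets
from typing import Optional


def find_fleet_config(fleet_arg: Optional[str], home: str, repo: str) -> Optional[str]:
    """Pick the fleet config: --fleet wins, then ~/.fleettest.json, then the repo copy."""
    if fleet_arg:
        return fleet_arg
    candidates = (
        os.path.join(home, ".fleettest.json"),
        os.path.join(repo, "testsuite", "fleettest.json"),
    )
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def run_dir(builddir: str, run_id: Optional[str] = None) -> str:
    # Each run gets its own randomly-named directory: <builddir>-<run_id>.
    if run_id is None:
        run_id = secrets.token_hex(4)  # assumed id format, 8 hex digits
    return f"{builddir}-{run_id}"


if __name__ == "__main__":
    print(find_fleet_config(None, os.path.expanduser("~"), os.getcwd()))
    print(run_dir("rsync-fleet"))
```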

testsuite: regression for the #829 daemon --chown/--groupmap wildcard

Maps every source group to a second group the test user belongs to via a
daemon upload (--groupmap='*:GID') and checks the wildcard took effect.
Runs both arg modes: the default path (the '*' is safe_arg-escaped and the
daemon must un-backslash it -- the regression) and --secluded-args (the '*'
is sent raw over the protected channel, a guard that the fix left that path
alone). Needs no root -- a non-root receiver can chgrp to a member group --
and was verified RED on a pre-fix binary (the escaped '\*' is ignored, gid
unchanged) and GREEN after the fix.

daemon: un-backslash escaped option args (#829)

Without --secluded-args, the client's safe_arg() backslash-escapes shell
and wildcard chars in option values before sending them to the server, so
--chown's --usermap=*:user is transmitted as --usermap=\*:user.  Over ssh a
remote shell removes the backslashes before rsync parses the args, but a
daemon has no shell and read_args() stored option args verbatim -- so the
receiver saw the literal "\*", the usermap/groupmap wildcard never matched,
and the module's configured uid/gid won instead.  A regression from the
secluded-args hardening; rsync 3.2.3 (protocol 31) worked.

Un-backslash option args in read_args() on the daemon's first
(non-protected) read, mirroring what the ssh-side shell does.  File args
after the dot are already handled by glob_expand(); the protected (NUL,
already-unescaped) re-read and the server's stdin read pass unescape=0 so
their raw args are left untouched.
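
As an illustration of the un-backslashing (a Python model, not the C code
in read_args()): each backslash is dropped and the character it escapes is
kept. The "." separator handling below is a simplification and both
function names are made up for the example.

```python
from typing import List


def unbackslash(arg: str) -> str:
    """Drop each backslash and keep the character that follows it."""
    out = []
    i = 0
    while i < len(arg):
        if arg[i] == "\\" and i + 1 < len(arg):
            i += 1  # skip the backslash itself
        out.append(arg[i])
        i += 1
    return "".join(out)


def read_option_args(args: List[str], unescape: bool) -> List[str]:
    """Un-backslash option args only; file args after the dot are left alone."""
    result = []
    seen_dot = False
    for arg in args:
        if arg == ".":
            seen_dot = True
        if unescape and not seen_dot:
            result.append(unbackslash(arg))
        else:
            result.append(arg)
    return result


if __name__ == "__main__":
    wire = ["--server", "--usermap=\\*:user", ".", "module/dir"]
    print(read_option_args(wire, unescape=True))
    print(read_option_args(wire, unescape=False))
```

With unescape set the receiver sees --usermap=*:user, so the wildcard can
match again; with unescape=0 the args pass through unchanged.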

Fixes: #829

build: fall back to do_mknod() when mknodat() is unavailable (#896)

do_mknod_at() (the symlink-race-safe variant used by a non-chrooted
daemon receiver) calls mknodat()/mkfifoat(), but the at-variant was
gated only on AT_FDCWD. Older Darwin declares AT_FDCWD without
mknodat(), so the build failed with "mknodat undeclared".

Probe mknodat()/mkfifoat() in configure and require HAVE_MKNODAT for the
at-variant; without it do_mknod_at() falls back to do_mknod(), exactly
as it already does where AT_FDCWD is missing. Linux keeps the mknodat
path since HAVE_MKNODAT is defined there.

Fixes: #896

alloc: revert "zero all new memory from allocations" (#959)

Commit d046525d made my_alloc() calloc every fresh allocation and made
expand_item_list() memset the freshly grown tail, to hand out predictably
zeroed memory.  But that forces the kernel to back pages callers never
touch: each per-directory file_list pre-allocates a FLIST_START-entry
(32768) pointer array -- 256KB -- and calloc now zeroes the whole array
even for an empty directory.  With incremental recursion over many
directories the resident set explodes; 80000 empty dirs went from ~336MB
to ~10.8GB.

Restore the pre-d046525d malloc/calloc split: fresh allocations use
malloc (so untouched tails stay lazy) and only explicit do_calloc
requests (new_array0) are zeroed.  Callers that need zeroed memory
already ask for it, and the full test suite passes.

Fixes: #959

testsuite: regression for short-checksum --append-verify s2length

Forces --checksum-choice=xxh64 (an 8-byte transfer checksum) with a
corrupted-prefix --append-verify so the full-checksum redo path runs.
Before the generator capped s2length at MIN(SUM_LENGTH, xfer_sum_len)
this died with "Invalid checksum length 16 [sender]"; the test is RED on
the prior generator and GREEN with the cap. Reproduces on any build that
has xxhash, so it guards the fix without an old-libxxhash host; skips when
xxh64 is absent (a build without xxhash).

generator: cap block s2length at the negotiated checksum length

sum_sizes_sqroot() capped the strong-sum length at SUM_LENGTH (16), the
legacy MD4/MD5 digest size.  Since 0902b52f the sum2 array elements are
xfer_sum_len bytes and the sender rejects a sums header whose s2length
exceeds xfer_sum_len.  When the negotiated transfer checksum is shorter
than 16 bytes -- xxh64 (8), used when the build's libxxhash lacks
xxh128/xxh3 (e.g. Ubuntu 20.04) -- the generator still emitted s2length
up to 16, so --append-verify and other full-checksum (redo) transfers
died with "Invalid checksum length 16 [sender]" (protocol incompatibility).

Cap s2length at MIN(SUM_LENGTH, xfer_sum_len): unchanged for any checksum
>= 16 bytes (md5/xxh128/sha1), corrected for short ones.  Also closes a
latent over-read of the xfer_sum_len-sized digest buffer.
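
The effect of the cap, as a small Python model of the arithmetic (the real
code is C in the generator; cap_s2length() is an illustrative name). The
digest sizes are the ones quoted above, plus 20 bytes for sha1.

```python
SUM_LENGTH = 16  # legacy MD4/MD5 digest size, per the text above


def cap_s2length(wanted: int, xfer_sum_len: int) -> int:
    """Never emit a strong-sum length longer than the negotiated digest."""
    return min(wanted, SUM_LENGTH, xfer_sum_len)


if __name__ == "__main__":
    for name, digest_len in [("xxh64", 8), ("md5", 16), ("xxh128", 16), ("sha1", 20)]:
        # A full-checksum (redo) pass asks for the longest length it can get.
        print(f"{name:7s} xfer_sum_len={digest_len:2d} -> s2length={cap_s2length(16, digest_len)}")
```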

android: probe openat2 usability behind a SIGSYS handler

Android's seccomp sandbox traps openat2() with SECCOMP_RET_TRAP, which
raises SIGSYS and kills the process instead of returning ENOSYS, so the
secure resolver cannot simply try openat2() and inspect errno.  Add
openat2_usable() in a new android.c: it probes openat2() once behind a
temporary SIGSYS handler and caches the result.

Gate every SYS_openat2 call on openat2_usable(): in the resolver via an
openat2_beneath() wrapper, and in t_chmod_secure's kernel probe directly,
so a blocked openat2 reports ENOSYS and the caller falls back to the
portable O_NOFOLLOW resolver.  Only openat2 is gated -- a plain openat()
(e.g. opening an operator-trusted absolute basedir) is left free.

The probe body compiles only on Android -- __ANDROID__ is a Bionic target
macro, so it is set for NDK cross-builds and native Termux alike and unset
everywhere else, where openat2_usable() collapses to a constant 1.  Link
android.o into the secure-resolver test helpers too so their self-tests
survive on Termux.

Adapted from PR #909.

configure: require <linux/openat2.h>, not just SYS_openat2

The openat2 secure resolver in syscall.c needs struct open_how and
RESOLVE_BENEATH from <linux/openat2.h>, not only the SYS_openat2 syscall
number. Some setups expose the syscall number via glibc without the
kernel header present, so probing SYS_openat2 alone still left the build
broken (#905). Exercise the header and struct in the configure check so
HAVE_OPENAT2 is defined only when both are actually usable.

t_chmod_secure: use HAVE_OPENAT2 to check for openat2() support

To prevent using openat2() in situations where it is not supported, use
#if defined(__linux__) && defined(HAVE_OPENAT2)
in t_chmod_secure.c, just like it was already being done in syscall.c.

Signed-off-by: Markus Mayer <mmayer@broadcom.com>

build: auto-detect the presence of the openat2() syscall

Let configure detect if the openat2() syscall is supported by the kernel
headers we are building against. Do not attempt to use openat2() if
support is not present.

Users can still disable using the openat2() syscall manually if so
desired.

Signed-off-by: Markus Mayer <mmayer@broadcom.com>

testsuite: add fleettest.py fleet CI harness

fleettest.py builds the committed HEAD of a checkout on a fleet of remote machines over ssh and runs the test suite under both the stdio-pipe and --use-tcp transports in parallel, reporting only the unexpected results. Each target mirrors a .github/workflows/*.yml job: its configure flags, and the RSYNC_EXPECT_SKIPPED list parsed from the workflow.

The fleet is described by a JSON file (testsuite/fleettest.json, git-ignored); fleettest.json.example is a worked template. Use --fleet to point at another config and --repo to build a tree other than the current directory.

A target with nonroot:true reruns, as the unprivileged ssh user, the tests that declare a module-level fleet_nonroot=True (here ownership-depth and daemon). The set lives in the test files, so new privilege-sensitive tests join the non-root pass with no fleet-config change.

Also rename testsuite/README.testsuite to README.md and rewrite it as markdown documenting the current testsuite: runtests.py, the make check/check29/check30/installcheck/coverage targets, the result/exit-code conventions, and fleettest.py.

syscall/receiver: honour a relative alt-basis dir on a daemon receiver (#915)

The symlink-race hardening routed the receiver's basis open through
secure_relative_open(), which rejects any '..' -- so a sibling
--link-dest=../01 on a use-chroot=no daemon was silently ignored and every file
re-transferred (#915/#928, a regression from 3.4.1).

Narrow the confinement to the sanitizing daemon (am_daemon && !am_chrooted) and
re-anchor it at the module root, the real trust boundary: secure_relative_open()
prefixes the cwd's module-relative path (from rsync's logical curr_dir[], a
guaranteed lexical prefix of module_dir) and resolves beneath module_dir, so
RESOLVE_BENEATH permits an in-module '..' climb while still rejecting one that
escapes the module.  secure_basis_open() opens with a bare do_open() in the
non-sanitizing cases.  t_stub.c gains weak curr_dir[]/curr_dir_len for the
helpers (via #pragma weak on non-GNU compilers, where rsync.h erases
__attribute__).

Two tests: link-dest-relative-basis asserts the in-module '..' is honoured;
link-dest-module-escape asserts a --link-dest=../../OUTSIDE climb that leaves
the module is refused (not hard-linked to an outside file).  See upstream
PR #930.
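
A purely lexical Python model of the rule the two tests assert: a '..' that
stays inside the module is allowed, one that climbs above the module root
is refused. This is a sketch only; the real check relies on the kernel's
RESOLVE_BENEATH, which also deals with symlinks, and resolve_beneath() is
a made-up name.

```python
import posixpath
from typing import Optional


def resolve_beneath(cwd_rel: str, relpath: str) -> Optional[str]:
    """Resolve relpath against the module-relative cwd.

    Returns the module-relative result, or None if the path is absolute or
    would climb above the module root.
    """
    if relpath.startswith("/"):
        return None
    stack = []
    for part in posixpath.join(cwd_rel, relpath).split("/"):
        if part in ("", "."):
            continue
        if part == "..":
            if not stack:
                return None  # would escape the module root
            stack.pop()
        else:
            stack.append(part)
    return "/".join(stack)


if __name__ == "__main__":
    print(resolve_beneath("02", "../01/file"))          # sibling dir, in module
    print(resolve_beneath("02", "../../OUTSIDE/file"))  # leaves the module
```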

sender: open a module-root-absolute path for a `path = /` module (#897)

A daemon module with path=/ makes F_PATHNAME absolute, so the secure_path built
for the content open starts with '/'. secure_relative_open() rejects an
absolute relpath with EINVAL, so a use-chroot=no daemon with path=/ could not
send any file ('failed to open ...: Invalid argument (22)') -- a regression
from 3.4.2. Strip leading slashes to a module-relative path; resolution stays
confined beneath module_dir.

flist: accept the missing-args mode-0 entry in recv_file_entry (#910)

--delete-missing-args (missing_args==2) sends a missing --files-from arg as a
mode-0 entry (IS_MISSING_FILE), the generator's delete signal. The mode-type
validation in recv_file_entry() rejected mode 0 as an invalid file type,
aborting the transfer with 'invalid file mode 00 ... code 2' before the
generator could act (a regression from 3.4.1). Allow mode 0 through only when
missing_args==2 (the delete mode -- not --ignore-missing-args, which never
sends a mode-0 entry); all other modes are still rejected.
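
A sketch of the validation rule in Python. The list of accepted file types
is an assumption made for the example, not a copy of the check in
recv_file_entry(); only the mode-0 condition is taken from the text above.

```python
import stat

DELETE_MISSING_ARGS = 2  # missing_args value for --delete-missing-args

# Assumed set of valid file types, for illustration only.
VALID_TYPES = (
    stat.S_IFREG, stat.S_IFDIR, stat.S_IFLNK,
    stat.S_IFCHR, stat.S_IFBLK, stat.S_IFIFO, stat.S_IFSOCK,
)


def file_mode_ok(mode: int, missing_args: int) -> bool:
    """Accept mode 0 only as the delete signal; otherwise require a known type."""
    if mode == 0:
        return missing_args == DELETE_MISSING_ARGS
    return stat.S_IFMT(mode) in VALID_TYPES


if __name__ == "__main__":
    print(file_mode_ok(0, 2))                      # missing-arg delete signal
    print(file_mode_ok(0, 1))                      # --ignore-missing-args
    print(file_mode_ok(stat.S_IFREG | 0o644, 0))   # ordinary file
```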

testsuite/runtests: count XFAIL (exit 78) as expected, not a failure

The regression tests use test_xfail() (exit 78) to assert a known, documented
residual on platforms where the fix can't apply -- e.g. link-dest-relative-basis
XFAILs where the receiver has no openat2/O_RESOLVE_BENEATH and the portable
resolver rejects the '..' for safety. runtests.py counted exit 78 in the
generic else->failed branch, so a bare XFAIL failed the whole suite; tally it
separately ('N xfailed (expected)') and exclude it from the failure exit code.
Also add --race-timeout plumbing (race_timeout env) for race tests.
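
The tallying change can be sketched as below. Exit 78 for XFAIL is from
the text above; exit 77 for a skipped test is an assumption here, and
tally() / suite_exit_code() are illustrative names rather than the ones
in runtests.py.

```python
from collections import Counter
from typing import Dict

EXIT_PASS = 0
EXIT_SKIP = 77   # assumed skip code
EXIT_XFAIL = 78  # known, documented residual


def tally(exit_codes: Dict[str, int]) -> Counter:
    """Count outcomes, keeping xfail separate from real failures."""
    counts = Counter()
    for code in exit_codes.values():
        if code == EXIT_PASS:
            counts["passed"] += 1
        elif code == EXIT_SKIP:
            counts["skipped"] += 1
        elif code == EXIT_XFAIL:
            counts["xfailed"] += 1
        else:
            counts["failed"] += 1
    return counts


def suite_exit_code(counts: Counter) -> int:
    # Only real failures make the suite fail; xfails are expected.
    return 1 if counts["failed"] else 0


if __name__ == "__main__":
    counts = tally({"a": 0, "link-dest-relative-basis": 78, "b": 77})
    print(f"{counts['passed']} passed, {counts['xfailed']} xfailed (expected)")
    print("exit", suite_exit_code(counts))
```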

Corrected test case broken for locales that use ',' instead of '.' as the decimal separator in human-readable numbers.

ci: version-mixing workflow, expect manifests, check-progs target

Adds .github/workflows/ubuntu-version-mix.yml (ubuntu-latest) and a
per-release manifest testsuite/expect/rsync_<ver>.expect for each of the
nine peers. The workflow builds the current rsync, then runs the
two-sided suite against every old binary over both the pipe and --use-tcp
daemon transports. All peers run in a SINGLE looped job (not a matrix)
so the PR shows one check line; each peer/transport is a foldable log
group and a failure annotates which one broke.

A new phony `check-progs` target builds rsync plus the test helper
programs and check symlinks without running the suite -- the build half
of `make check` -- so the workflow's direct runtests.py invocation has
the helpers it needs.

Notable expected results encoded in the manifests:
- The four May-2026 security tests xfail against every released peer:
   the suite demonstrates each release is vulnerable to those findings
   while current master is fixed.
- symlink-dirlink-basis xfails on 3.4.0/3.4.1 (issue #715: their
   secure_relative_open O_NOFOLLOW-confines the basedir, breaking a -K
   dir-symlink update; current master fixes it with secure_basis_open).
- Older peers carry more xfails for options/negotiation they lack;
   2.6.0 (protocol 27) fails most daemon tests. reverse-daemon-delta
   passes against all peers, confirming backward compat down to 2004.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

old_versions: commit static binaries of old rsync releases

Nine statically-linked, stripped binaries for the version-mixing test
suite (and ad-hoc cross-version behaviour checks): every x.y.0 release
from 2.6.0 (2004, protocol 27) through 3.4.0, plus the 3.1.3/3.2.7/3.4.1
point releases. 2.6.0 is the practical floor; older tags need more
porting to build on a current toolchain.

build_static.sh rebuilds any release from its git tag, applying the
minimal patches needed to compile old sources on a modern toolchain:
K&R lseek64 redecl, gettimeofday, -std=gnu11, --disable-openssl, and
_FORTIFY_SOURCE disabled (modern FORTIFY=3 turns latent benign over-reads
in old rsync into aborts when it runs as a server). Pre-3.0 trees ship
configure.in, so it regenerates configure (autoheader/autoconf) after
neutralizing the dead AC_LIBOBJ replacement fallbacks, generates proto.h,
and stubs the dropped vendored lib/addrinfo.h -- all guarded to no-op on
newer versions.

.gitattributes marks the binaries binary (so the text=auto rule can't
corrupt them) and export-ignore (kept out of the release tarball).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

testsuite: reverse-direction smoke test (old client -> current daemon)

Every other two-sided test drives with the current binary, covering
new-client -> old-server. This adds the backward-compat direction that
matters most for a project shipping new servers to a world of old
clients: a current daemon must keep serving the installed base of old
rsync clients.

reverse-daemon-delta_test.py starts the daemon with the current build
(via start_test_daemon's rsync_cmd override) and drives it with the old
binary. It does a push and a pull, each with and without -z, with the
receiving side pre-seeded with an older version of the file so the delta
algorithm actually runs -- exercising delta encoding both ways (old->new
on push, new->old on pull) and compression negotiation both ways. It
asserts the bytes crossing the wire are far smaller than the file, so a
silent fallback to a whole-file copy is caught, and accepts both the
modern "sent/received" and the old "wrote/read" summary wording so an
old client's output parses.
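
One way to accept both summary wordings and check that a delta was really
used, as a hedged sketch. The exact summary line format, the 0.5
threshold, and the helper names are assumptions for the example, not taken
from reverse-daemon-delta_test.py.

```python
import re

# Matches "sent N bytes  received M bytes" and the older
# "wrote N bytes  read M bytes"; N and M may contain thousands commas.
SUMMARY_RE = re.compile(
    r"(?:sent|wrote)\s+([\d,]+)\s+bytes\s+(?:received|read)\s+([\d,]+)\s+bytes"
)


def wire_bytes(output: str) -> int:
    """Return the total bytes that crossed the wire, from the summary line."""
    match = SUMMARY_RE.search(output)
    if match is None:
        raise ValueError("no transfer summary found")
    return sum(int(group.replace(",", "")) for group in match.groups())


def delta_was_used(output: str, file_size: int, ratio: float = 0.5) -> bool:
    # A silent whole-file copy would push roughly file_size bytes.
    return wire_bytes(output) < file_size * ratio


if __name__ == "__main__":
    new_style = "sent 1,234 bytes  received 56 bytes  860.00 bytes/sec"
    old_style = "wrote 1234 bytes  read 56 bytes  860.00 bytes/sec"
    print(wire_bytes(new_style), wire_bytes(old_style))
    print(delta_was_used(new_style, file_size=1_000_000))
```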

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

runtests: add --rsync-bin2 / --expect-result for version-mixing tests

Let the suite run with two rsync binaries so the current build can be
tested against the actual old code of a previous release, rather than
only forcing the current binary to speak an old protocol (check29/30).

  --rsync-bin2 PATH  exports RSYNC_PEER, the binary used for the SERVER
                     side of two-sided transfers (the daemon process and
                     the remote-shell --rsync-path target). Defaults to
                     RSYNC, so single-binary runs are byte-for-byte
                     unchanged.
  --expect-result F  the manifest's listed tests ARE the run set; each
                     test's actual outcome (pass/skip/fail/xfail) is
                     compared to its expected one and any mismatch --
                     including an unexpected pass (xpass) -- fails the
                     run. --expect-skipped and the default exit logic
                     are untouched.

rsyncfns gains the RSYNC_PEER global and launches the daemon with it
(start_rsyncd / start_test_daemon, the latter with an optional rsync_cmd
override used by the reverse-direction test); the remote-shell tests
pass --rsync-path={RSYNC_PEER}. All no-ops when no peer is selected.

Direction is fixed: the current binary always drives (only it
understands the new test scripts); the old binary is only ever the
server/daemon side.
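
The --expect-result comparison described above might look roughly like
this. The manifest is modelled as a plain dict of test name to expected
outcome; the on-disk .expect format and the function name are assumptions.

```python
from typing import Dict, List


def compare_outcomes(expected: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Return one message per test whose outcome differs from the manifest."""
    problems = []
    for test, want in expected.items():
        got = actual.get(test, "missing")
        if got == want:
            continue
        if want == "xfail" and got == "pass":
            # An unexpected pass is a mismatch too.
            problems.append(f"{test}: xpass (expected xfail, got pass)")
        else:
            problems.append(f"{test}: expected {want}, got {got}")
    return problems


if __name__ == "__main__":
    manifest = {"daemon": "pass", "symlink-dirlink-basis": "xfail", "acls": "skip"}
    results = {"daemon": "pass", "symlink-dirlink-basis": "pass", "acls": "skip"}
    mismatches = compare_outcomes(manifest, results)
    for line in mismatches:
        print(line)
    print("exit", 1 if mismatches else 0)
```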

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

runtests: add --exclude / RSYNC_EXCLUDE to skip tests entirely

Some tests cannot run in certain build/CI environments. In particular the
protected-regular test self-re-execs under "unshare --map-users" to exercise
fs.protected_regular handling, and that user-namespace path hangs in a
restricted buildd chroot (e.g. Launchpad/sbuild), tripping the per-test
timeout and failing the whole "make check".

Add an --exclude option (comma-separated test names/globs), with an
RSYNC_EXCLUDE environment fallback so it can be set without touching the
make/check command line. Excluded tests are dropped before running -- they
are neither executed nor reported as skipped.
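
A small sketch of the exclude handling, assuming shell-style globs matched
with fnmatch; parse_exclude() and filter_tests() are hypothetical names,
not the ones used in runtests.py.

```python
import fnmatch
from typing import List, Mapping, Optional


def parse_exclude(cli_value: Optional[str], environ: Mapping[str, str]) -> List[str]:
    """Use --exclude if given, else fall back to RSYNC_EXCLUDE."""
    raw = cli_value if cli_value is not None else environ.get("RSYNC_EXCLUDE", "")
    return [item.strip() for item in raw.split(",") if item.strip()]


def filter_tests(tests: List[str], patterns: List[str]) -> List[str]:
    # Excluded tests are dropped up front: not run, not reported as skipped.
    return [
        test for test in tests
        if not any(fnmatch.fnmatchcase(test, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    tests = ["protected-regular", "daemon", "daemon-gzip-upload", "hardlinks"]
    patterns = parse_exclude(None, {"RSYNC_EXCLUDE": "protected-regular,daemon-*"})
    print(filter_tests(tests, patterns))
```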

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>