Test #29 ('ref transaction: corrupted tables cause failure') started to
fail intermittently for me (from v2.52.0-rc0) when running the test suite
with '-j8'. (This coincided with a move to a new laptop running Windows 11
rather than Windows 10.) If the test is run by hand, or without any
parallelism, then it passes without issue.
When the test fails (e.g. 1 out of 32 parallel runs) the cause is a
permission error while corrupting a table file:
./test-lib.sh: line 1010: .git/reftable/0x000000000001-0x000000000002-d89bb8ee.ref: Permission denied
This corruption is done in a shell loop, directly after a 'test_commit';
the loop uses a ': >"$f"' expression to truncate the file. Adding a sleep
of one second after the 'test_commit' and before the shell loop fixes
the test (it is not clear why). Replacing the shell redirection with a
'test-tool truncate "$f" 0' invocation also fixes the test, although
that could simply be another way of changing the timing enough to win
the race.
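For illustration, the truncation idiom can be sketched outside the test
suite as follows. This is a minimal sketch and not the test itself: the
'truncate_tables' helper and the scratch directory are hypothetical
stand-ins for the loop in the test and for '.git/reftable'.

```shell
# truncate_tables is a hypothetical helper, named here for illustration only.
truncate_tables () {
	for f in "$1"/*.ref
	do
		# ':' is a no-op; the redirection alone truncates "$f".
		# The replacement discussed here would instead be:
		#   test-tool truncate "$f" 0
		: >"$f" || return 1
	done
}

# Scratch directory standing in for .git/reftable (an assumption).
dir=$(mktemp -d) &&
printf 'table data\n' >"$dir/a.ref" &&
printf 'table data\n' >"$dir/b.ref" &&
truncate_tables "$dir" &&
test ! -s "$dir/a.ref" &&
test ! -s "$dir/b.ref"
```

Both forms leave each file in place with zero length; only the process
that performs the truncation differs.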
During a debug session, I tried looking at the strace output for the
shell redirection:
$ rm /tmp/hello; echo hello >/tmp/hello; ls -l /tmp/hello
-rw-r--r-- 1 ramsay None 6 Nov 10 17:25 /tmp/hello
$
$ strace -o zzz bash -c ': >/tmp/hello'
$
Similarly, for the test-tool solution:
$ strace -o xxx ./t/helper/test-tool truncate /tmp/hello 0
$
When comparing the two traces, the differences were what you would
expect and, if anything, the shell redirect probably takes longer than
the test-tool solution (many fcntl() calls to dup the stdout to the
<fd>). The call to the Win32 API NtCreateFile() was identical, apart
from the first (FileHandle) parameter, of course.
In order to fix this flaky test on Cygwin, despite not knowing why it
works, replace the shell redirection with the above 'test-tool truncate'
invocation.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>