run-command: avoid `close(-1)` in `start_command()` error paths

When `start_command()` fails to set up a pipe partway through, it rolls
back by closing the pipe ends it has already opened. For descriptors
supplied by the caller rather than allocated locally, that rollback
tested `if (cmd->in)` / `if (cmd->out)` before calling close(). The
CHILD_PROCESS_INIT default of -1 ("no descriptor") is non-zero and so
passes the test, meaning a caller that sets cmd->no_stdin or
cmd->no_stdout without supplying a real fd ends up triggering close(-1)
on the error path.

The stdin-pipe failure branch a few lines above already uses the right
idiom, `if (cmd->out > 0)`, which rejects both the -1 sentinel and 0
(the parent's own standard streams). Apply it to the three remaining
rollback sites.
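
A minimal sketch of the idiom, not the actual `start_command()` code;
the helper names `close_caller_fd()` and `rollback_demo()` are
hypothetical:

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper: close a caller-supplied descriptor during rollback.
 * Returns 1 if close() was attempted, 0 if the descriptor was skipped.
 * A value of -1 means "no descriptor" and 0 is the parent's own stdin,
 * so neither must be closed.
 */
int close_caller_fd(int fd)
{
	if (fd > 0) {
		close(fd);
		return 1;
	}
	return 0;
}

/* Opens a pipe and rolls it back; returns how many fds were closed. */
int rollback_demo(void)
{
	int fds[2];

	if (pipe(fds) < 0)
		return -1;
	/* pipe() never hands out negative descriptors on success. */
	assert(fds[0] >= 0 && fds[1] >= 0);
	return close_caller_fd(fds[0]) + close_caller_fd(fds[1]);
}
```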

Reported by Coverity as CID 1049722 ("Argument cannot be negative").

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

download_https_uri_to_file(): do not leak fd upon failure

When the `git-remote-https` command fails, we do not want to leak
`child_out`.

Pointed out by Coverity.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

loose: avoid closing invalid fd on error path

`write_one_object()` opens a file at line 186 and jumps to the errout
label on failure. The errout cleanup unconditionally calls `close(fd)`,
but when `open()` itself failed, fd is -1. Calling `close(-1)` is
harmless in practice (POSIX specifies that it fails with EBADF), but it
is flagged by static analyzers and can confuse fd tracking in sanitizer
builds.

Guard the close with fd >= 0.
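
The shape of such a guarded error label can be sketched as follows.
This is an illustration only; `write_text_file()` is a hypothetical
stand-in and not the code in `write_one_object()`:

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a function with a shared error label.
 * Returns 0 on success and -1 on failure.
 */
int write_text_file(const char *path, const char *text)
{
	size_t len;
	int fd;

	assert(path && text);
	len = strlen(text);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		goto errout;
	if (write(fd, text, len) != (ssize_t)len)
		goto errout;
	if (close(fd) < 0) {
		fd = -1; /* already released, do not close again */
		goto errout;
	}
	return 0;

errout:
	/* open() may have failed, in which case fd is -1. */
	if (fd >= 0)
		close(fd);
	return -1;
}
```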

Pointed out by Coverity.

Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

load_one_loose_object_map(): fix resource leak

Pointed out by Coverity.

While at it, reduce near-duplicate clean-up code at the end of the
function.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

precompose_utf8: use a flex array for d_name

On macOS, git status may abort while reading a directory entry
whose UTF-8 name grows past NAME_MAX bytes:

  __chk_fail_overflow
  __strlcpy_chk
  precompose_utf8_readdir
  read_directory_recursive
  wt_status_collect
  cmd_status

The precompose wrapper already reallocates dirent_prec_psx for
long names, but d_name is declared as char[NAME_MAX + 1]. A
fortified libc can still see that declared object size and reject a
larger strlcpy bound, even though the allocation was grown.

Make d_name a FLEX_ARRAY and size allocations from offsetof(). That
matches the actual object layout with the dynamic allocation, so the
fortified copy sees a destination whose size can grow with max_name_len.
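
As a rough illustration of the layout, assuming a C99 flexible array
member (the struct and function names below are hypothetical and do
not match the precompose code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a dirent-like struct with a growable name. */
struct name_entry {
	unsigned char type;
	char name[]; /* C99 flexible array member; git spells this FLEX_ARRAY */
};

/* Allocate an entry whose name buffer holds `name` plus the NUL byte. */
struct name_entry *name_entry_new(const char *name)
{
	size_t len;
	struct name_entry *e;

	assert(name);
	len = strlen(name);
	/* Size the object from the offset of the array, not sizeof(). */
	e = malloc(offsetof(struct name_entry, name) + len + 1);
	if (!e)
		return NULL;
	e->type = 0;
	memcpy(e->name, name, len + 1);
	return e;
}
```

Because `name` has no declared bound, the compiler has no fixed
object size to compare a copy length against.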

Add a regression test that creates an over-NAME_MAX non-ASCII basename
and runs status with core.precomposeunicode enabled.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci(dockerized): raise the PID limit for private repositories

Every once in a while I need to verify that Microsoft Git's test suite
passes for changes that are not yet meant for public consumption, and
since it was (made) too difficult to keep up a working Azure Pipeline
definition, I have to use GitHub Actions in a private GitHub repository
for that purpose.

In these tests, basically all Dockerized CI jobs fail consistently. The
symptom is something like:

  error: cannot create async thread: Resource temporarily unavailable

in the middle of a test, typically in the t5xxx-t6xxx range. The first
such error is immediately followed by plenty more of these errors, and
not a single test succeeds afterwards.

At first, I thought that maybe the massive parallelism I enjoy there is
the problem, and I thought that the cgroups limits might be shared
between the many containers that run on essentially the same physical
machine. But even reducing the matrix to just a single one of those
Dockerized jobs runs into the very same problems.

The underlying reason seems to be a substantial difference in the hosted
runners that execute these Dockerized jobs: forcing the PID limit of the
container to a high number lets the jobs pass, even when running the
complete matrix of all 13 Dockerized jobs concurrently. But that's not
the only difference: The jobs seem to take a lot longer in these
containers than, say, in the containers made available to
https://github.com/git/git.

When forcing a PID limit of 64k in that private repository, the jobs
completed successfully, but they also took a lot longer, between 2x and
2.5x longer, i.e. painfully much longer. With the PID limit reduced to
16k, the CI jobs still passed but took about as long. Reducing the PID
limit to 8k caused the errors to reappear.

Here are the numbers from three example runs: the first forces the PID
and nproc limits to 65536, the second to 16384, and the third is from
the public git/git repository:

Job                           | 64k     | 16k     | reference
------------------------------|---------|---------|---------
almalinux-8                   | 19m 3s  | 16m 0s  | 9m 36s
debian-11                     | 20m 31s | 20m 3s  | 8m 5s
fedora-breaking-changes-meson | 16m 29s | 19m 19s | 9m 40s
linux-asan-ubsan              | 1h 10m  | 1h 11m  | 34m 36s
linux-breaking-changes        | 25m 39s | 25m 58s | 13m 15s
linux-leaks                   | 1h 9m   | 1h 10m  | 33m 30s
linux-meson                   | 28m 9s  | 27m 4s  | 13m 45s
linux-musl-meson              | 16m 32s | 13m 39s | 8m 6s
linux-reftable-leaks          | 1h 13m  | 1h 13m  | 34m 34s
linux-reftable                | 26m 2s  | 25m 48s | 13m 31s
linux-sha256                  | 26m 12s | 26m 3s  | 12m 36s
linux-TEST-vars               | 26m 5s  | 25m 21s | 13m 25s
linux32                       | 21m 16s | 19m 57s | 10m 44s

It does not look as if the PID limit is the reason for the longer
runtime, seeing as the 64k vs 16k timings deviate no more than as is
usual with GitHub workflows. So let's go for 16k.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/table: fix OOB read on truncated table

When opening a table we compute the size of its data section by
subtracting the footer size from the file size. We do not verify that
the file is actually large enough to contain both the header and the
footer though. With a truncated table the subtraction can thus
underflow, causing us to read the footer out of bounds:

  SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/pks/Development/git/build/t/unit-tests+0x2479a4) in __asan_memcpy
  Shadow bytes around the buggy address:
    0x7ccff6e0de80: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
    0x7ccff6e0df00: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
    0x7ccff6e0df80: fa fa fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x7ccff6e0e000: fd fd fd fd fa fa fa fa fa fa fa fa fd fd fd fd
    0x7ccff6e0e080: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa fa
  =>0x7ccff6e0e100: fa fa fa fa fa[fa]00 00 00 00 00 00 00 00 00 00
    0x7ccff6e0e180: 00 00 00 00 00 00 00 04 fa fa fa fa fa fa fa fa
    0x7ccff6e0e200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x7ccff6e0e280: 00 00 fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7ccff6e0e300: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7ccff6e0e380: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Partially addressable: 01 02 03 04 05 06 07
    Heap left redzone:       fa
    Freed heap region:       fd
    Stack left redzone:      f1
    Stack mid redzone:       f2
    Stack right redzone:     f3
    Stack after return:      f5
    Stack use after scope:   f8
    Global redzone:          f9
    Global init order:       f6
    Poisoned by user:        f7
    Container overflow:      fc
    Array cookie:            ac
    Intra object redzone:    bb
    ASan internal:           fe
    Left alloca redzone:     ca
    Right alloca redzone:    cb
  ==1500371==ABORTING

Verify that the file is large enough to contain both the header and the
footer before computing the table size.
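
A sketch of such a check, with a hypothetical helper name and
signature rather than the actual reftable code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the size of the data section of a table
 * file, i.e. everything except the footer. Returns 0 on success and -1
 * if the file is too small to hold both header and footer.
 */
int table_data_size(uint64_t file_size, uint64_t header_size,
		    uint64_t footer_size, uint64_t *out)
{
	assert(out);
	/* Written so that neither the check nor the subtraction can wrap. */
	if (file_size < footer_size || file_size - footer_size < header_size)
		return -1;
	*out = file_size - footer_size;
	return 0;
}
```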

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/table: fix NULL pointer access when seeking to bogus offsets

When seeking an iterator to an arbitrary offset we may return a positive
value in case the offset points beyond the block. This makes sense when
iterating through multiple blocks of the same section, as the positive
value indicates to us that we're at the end of the table.

But when the offset originates from a section or index offset it is
supposed to point at a valid block, so an out-of-bounds value means that
the table is corrupt. Treating it as a normal end-of-iteration causes us
to silently report an empty section instead of surfacing the corruption,
and we are left with a partially-initialized block. This may later on
cause a NULL pointer dereference:

  ==1486841==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55555598e02c bp 0x7fffffff4eb0 sp 0x7fffffff4e70 T0)
  ==1486841==The signal is caused by a READ memory access.
  ==1486841==Hint: address points to the zero page.
      #0 0x55555598e02c in reftable_block_type ./git/build/../reftable/block.c:392:9
      #1 0x55555598ee6e in block_iter_seek_key ./git/build/../reftable/block.c:536:35
      #2 0x5555559ae553 in table_iter_seek_linear ./git/build/../reftable/table.c:344:8
      #3 0x5555559adbff in table_iter_seek ./git/build/../reftable/table.c:450:9
      #4 0x5555559ada9c in table_iter_seek_void ./git/build/../reftable/table.c:460:9
      #5 0x555555992872 in reftable_iterator_seek_log_at ./git/build/../reftable/iter.c:281:9
      #6 0x555555992953 in reftable_iterator_seek_log ./git/build/../reftable/iter.c:287:9
      #7 0x55555583aa78 in test_reftable_table__seek_invalid_log_offset ./git/build/../t/unit-tests/u-reftable-table.c:257:20
      #8 0x5555557f684e in clar_run_test ./git/build/../t/unit-tests/clar/clar.c:335:3
      #9 0x5555557f2e69 in clar_run_suite ./git/build/../t/unit-tests/clar/clar.c:431:3
      #10 0x5555557f2882 in clar_test_run ./git/build/../t/unit-tests/clar/clar.c:636:4
      #11 0x5555557f375f in clar_test ./git/build/../t/unit-tests/clar/clar.c:687:11
      #12 0x5555557fa49d in cmd_main ./git/build/../t/unit-tests/unit-test.c:62:8
      #13 0x55555584cffa in main ./git/build/../common-main.c:9:11
      #14 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b284) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #15 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b337) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #16 0x555555694c24 in _start (./git/build/t/unit-tests+0x140c24)

  ==1486841==Register values:
  rax = 0x0000000000000000  rbx = 0x00007fffffff4ec0  rcx = 0x0000000000000000  rdx = 0x00007cfff6e2bd58
  rdi = 0x00007cfff6e2bd58  rsi = 0x00007bfff5da1020  rbp = 0x00007fffffff4eb0  rsp = 0x00007fffffff4e70
   r8 = 0x0000000000000000   r9 = 0x0000000000000002  r10 = 0x0000000000000000  r11 = 0x0000000000000017
  r12 = 0x00007fffffff5908  r13 = 0x0000000000000001  r14 = 0x00007ffff7ffd000  r15 = 0x0000555556056e90
  AddressSanitizer can not provide additional info.
  SUMMARY: AddressSanitizer: SEGV ./git/build/../reftable/block.c:392:9 in reftable_block_type
  ==1486841==ABORTING

Fix this by returning a proper error in `table_iter_seek_to()` when the
offset ranges beyond the block.
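
The distinction can be sketched like this. All names and return codes
are hypothetical, and the model is heavily simplified compared to the
real iterator:

```c
#include <assert.h>
#include <stddef.h>

enum {
	SKETCH_SEEK_OK = 0,
	SKETCH_SEEK_END = 1,      /* ran past the last block: end of iteration */
	SKETCH_SEEK_CORRUPT = -1  /* offset from an index points nowhere */
};

/* Hypothetical low-level step: position at `off` within `table_size` bytes. */
int sketch_block_at(size_t table_size, size_t off)
{
	return off >= table_size ? SKETCH_SEEK_END : SKETCH_SEEK_OK;
}

/*
 * Hypothetical seek for offsets taken from a section or index record.
 * Such offsets must name a real block, so "end" becomes an error here.
 */
int sketch_seek_to(size_t table_size, size_t off)
{
	int ret = sketch_block_at(table_size, off);

	assert(ret == SKETCH_SEEK_OK || ret == SKETCH_SEEK_END);
	if (ret > 0)
		return SKETCH_SEEK_CORRUPT;
	return ret;
}
```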

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/block: fix OOB read with bogus restart offset

Restart points encode records in a given block that do not use prefix
compression and that can thus immediately be seeked to. These offsets
are encoded in the restart table, where each offset needs to point at
one of the records of the block. We do not verify this though, so a
bogus restart offset may cause an out-of-bounds read:

  ==1472280==ERROR: AddressSanitizer: SEGV on unknown address 0x7d8ff7de5f7f (pc 0x55555599502b bp 0x7fffffff4df0 sp 0x7fffffff4d40 T0)
  ==1472280==The signal is caused by a READ memory access.
      #0 0x55555599502b in get_var_int ./git/build/../reftable/record.c:30:6
      #1 0x555555995c2a in reftable_decode_keylen ./git/build/../reftable/record.c:177:6
      #2 0x55555598e85c in restart_needle_less ./git/build/../reftable/block.c:455:6
      #3 0x55555598895f in binsearch ./git/build/../reftable/basics.c:175:9
      #4 0x55555598e189 in block_iter_seek_key ./git/build/../reftable/block.c:543:6
      #5 0x555555814aee in test_reftable_block__corrupt_restart_offset ./git/build/../t/unit-tests/u-reftable-block.c:636:20
      #6 0x5555557f684e in clar_run_test ./git/build/../t/unit-tests/clar/clar.c:335:3
      #7 0x5555557f2e69 in clar_run_suite ./git/build/../t/unit-tests/clar/clar.c:431:3
      #8 0x5555557f2882 in clar_test_run ./git/build/../t/unit-tests/clar/clar.c:636:4
      #9 0x5555557f375f in clar_test ./git/build/../t/unit-tests/clar/clar.c:687:11
      #10 0x5555557fa49d in cmd_main ./git/build/../t/unit-tests/unit-test.c:62:8
      #11 0x55555584c25a in main ./git/build/../common-main.c:9:11
      #12 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b284) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #13 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b337) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #14 0x555555694c24 in _start (./git/build/t/unit-tests+0x140c24)

  ==1472280==Register values:
  rax = 0x00007d8ff7de5f7f  rbx = 0x00007fffffff4e00  rcx = 0x00007d8ff7de5f80  rdx = 0x00007bfff5b6af60
  rdi = 0x00007bfff5b6af40  rsi = 0x00007bfff592dfa0  rbp = 0x00007fffffff4df0  rsp = 0x00007fffffff4d40
   r8 = 0x00000000ff00002b   r9 = 0x00007d8ff7de5f7f  r10 = 0x00000f7ffeb25bf0  r11 = 0xf3f30000f1f1f1f1
  r12 = 0x00007fffffff58f8  r13 = 0x0000000000000001  r14 = 0x00007ffff7ffd000  r15 = 0x0000555556055fd0
  AddressSanitizer can not provide additional info.
  SUMMARY: AddressSanitizer: SEGV ./git/build/../reftable/record.c:30:6 in get_var_int

Guard against such restart offsets and signal an error to the caller via
`args.error`.
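
A bounds check of this kind might look as follows. This is a sketch
under the assumption that records occupy the range between the block
header and the restart table; the function name is hypothetical, and
the check only rejects offsets outside that range, it does not prove
that an offset hits an exact record boundary:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical check: a restart offset is only usable if it lies inside
 * the record area [records_start, restart_start) of the block.
 * Returns 0 if the offset is plausible and -1 otherwise.
 */
int check_restart_offset(uint32_t off, uint32_t records_start,
			 uint32_t restart_start)
{
	assert(records_start <= restart_start);
	if (off < records_start || off >= restart_start)
		return -1;
	return 0;
}
```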

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/block: fix use of uninitialized memory when binsearch fails

When doing the binary search through our restart offsets we may hit an
error in case `restart_needle_less()` fails to decode the record at the
given offset. While we correctly detect this case and error out, it will
cause us to call `reftable_record_release()` on the yet-uninitialized
record.

Fix this by initializing the record earlier.
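
The general pattern, sketched with hypothetical types and function
names (this is not the reftable record API):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sketch_record {
	char *name; /* owned, may be NULL */
};

void sketch_record_init(struct sketch_record *rec)
{
	rec->name = NULL;
}

void sketch_record_release(struct sketch_record *rec)
{
	free(rec->name); /* free(NULL) is a no-op */
	rec->name = NULL;
}

/*
 * Hypothetical lookup with a single cleanup label. `fail_early` simulates
 * an error in the step that runs before the record is filled in.
 */
int sketch_lookup(int fail_early)
{
	struct sketch_record rec;
	int ret = -1;

	/* Initialize before the first `goto out`, so release is always safe. */
	sketch_record_init(&rec);

	if (fail_early)
		goto out;

	rec.name = malloc(4);
	if (!rec.name)
		goto out;
	memcpy(rec.name, "ref", 4);
	assert(strlen(rec.name) == 3);
	ret = 0;

out:
	sketch_record_release(&rec);
	return ret;
}
```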

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/block: fix OOB read with bogus restart count

The restart count is stored in the last two bytes of a block. We use it
without verification to compute the offset of the restart table. With a
bogus restart count that is large enough this computation underflows,
and the subsequent reads via the restart table access out-of-bounds
memory:

  ==129439==ERROR: AddressSanitizer: SEGV on unknown address 0x7d90f6dcd0ad (pc 0x55555598ce89 bp 0x7fffffff4ed0 sp 0x7fffffff4e80 T0)
  ==129439==The signal is caused by a READ memory access.
      #0 0x55555598ce89 in reftable_get_be24 ./git/build/../reftable/basics.h:125:9
      #1 0x55555598eabf in block_restart_offset ./git/build/../reftable/block.c:407:9
      #2 0x55555598e5d5 in restart_needle_less ./git/build/../reftable/block.c:431:17
      #3 0x5555559887e2 in binsearch ./git/build/../reftable/basics.c:165:13
      #4 0x55555598dfec in block_iter_seek_key ./git/build/../reftable/block.c:529:6
      #5 0x555555814517 in test_reftable_block__corrupt_restart_count ./git/build/../t/unit-tests/u-reftable-block.c:593:15
      #6 0x5555557f684e in clar_run_test ./git/build/../t/unit-tests/clar/clar.c:335:3
      #7 0x5555557f2e69 in clar_run_suite ./git/build/../t/unit-tests/clar/clar.c:431:3
      #8 0x5555557f2882 in clar_test_run ./git/build/../t/unit-tests/clar/clar.c:636:4
      #9 0x5555557f375f in clar_test ./git/build/../t/unit-tests/clar/clar.c:687:11
      #10 0x5555557fa49d in cmd_main ./git/build/../t/unit-tests/unit-test.c:62:8
      #11 0x55555584c12a in main ./git/build/../common-main.c:9:11
      #12 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b284) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #13 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b337) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #14 0x555555694c24 in _start (./git/build/t/unit-tests+0x140c24)

  ==129439==Register values:
  rax = 0x00007d90f6dcd0ad  rbx = 0x00007fffffff4f20  rcx = 0xf2f2f2f8f2f2f2f8  rdx = 0x0000000000000000
  rdi = 0x00007d90f6dcd0ad  rsi = 0x0000000000007fff  rbp = 0x00007fffffff4ed0  rsp = 0x00007fffffff4e80
   r8 = 0x0000000000000000   r9 = 0x0000000000000000  r10 = 0x0000000000000000  r11 = 0x0000000000000017
  r12 = 0x00007fffffff58e8  r13 = 0x0000000000000001  r14 = 0x00007ffff7ffd000  r15 = 0x00005555560550b0
  AddressSanitizer can not provide additional info.
  SUMMARY: AddressSanitizer: SEGV ./git/build/../reftable/basics.h:125:9 in reftable_get_be24

Verify that the restart table actually fits into the block.
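
A sketch of the check. It assumes that each restart offset takes three
bytes (suggested by the `reftable_get_be24` frame above) and that the
two-byte count follows the offsets; the helper name and parameters are
hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper. Assumes a block of `block_len` bytes that ends in
 * a 2-byte restart count, preceded by `restart_count` 3-byte restart
 * offsets. `records_start` is the offset of the first record. Returns 0
 * and stores the restart table offset in `*out`, or -1 if the table
 * cannot fit.
 */
int restart_table_offset(size_t block_len, size_t records_start,
			 uint16_t restart_count, size_t *out)
{
	size_t table_len = (size_t)restart_count * 3 + 2;

	assert(out);
	if (block_len < records_start || block_len - records_start < table_len)
		return -1;
	*out = block_len - table_len;
	return 0;
}
```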

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/block: fix OOB read with bogus block size

The block size is read from the block header, which is untrusted data.
We use it without verification to access the restart count at the end of
the block as well as to compute the restart table offset. With a bogus
block size that exceeds the data we have actually read this can lead to
an out-of-bounds read:

  ==2274138==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c3ff6de2e3f at pc 0x55555598c6ea bp 0x7fffffff4ee0 sp 0x7fffffff4ed8
  READ of size 1 at 0x7c3ff6de2e3f thread T0
      #0 0x55555598c6e9 in reftable_get_be16 /home/pks/Development/git/build/../reftable/basics.h:119:20
      #1 0x55555598c252 in reftable_block_init /home/pks/Development/git/build/../reftable/block.c:343:18
      #2 0x555555813c70 in test_reftable_block__corrupt_block_size /home/pks/Development/git/build/../t/unit-tests/u-reftable-block.c:531:20
      #3 0x5555557f684e in clar_run_test /home/pks/Development/git/build/../t/unit-tests/clar/clar.c:335:3
      #4 0x5555557f2e69 in clar_run_suite /home/pks/Development/git/build/../t/unit-tests/clar/clar.c:431:3
      #5 0x5555557f2882 in clar_test_run /home/pks/Development/git/build/../t/unit-tests/clar/clar.c:636:4
      #6 0x5555557f375f in clar_test /home/pks/Development/git/build/../t/unit-tests/clar/clar.c:687:11
      #7 0x5555557fa49d in cmd_main /home/pks/Development/git/build/../t/unit-tests/unit-test.c:62:8
      #8 0x55555584b8aa in main /home/pks/Development/git/build/../common-main.c:9:11
      #9 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/8kvxvr3pmsypxiypq4g8zy13glnfr7nx-glibc-2.42-67/lib/libc.so.6+0x2b284) (BuildId: 5a702452a01df1d7d50ce0663acec7be3c71fd4d)
      #10 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/8kvxvr3pmsypxiypq4g8zy13glnfr7nx-glibc-2.42-67/lib/libc.so.6+0x2b337) (BuildId: 5a702452a01df1d7d50ce0663acec7be3c71fd4d)
      #11 0x555555694c24 in _start (/home/pks/Development/git/build/t/unit-tests+0x140c24)

  0x7c3ff6de2e3f is located 0 bytes after 47-byte region [0x7c3ff6de2e10,0x7c3ff6de2e3f)
  allocated by thread T0 here:
      #0 0x55555579e95b in malloc (/home/pks/Development/git/build/t/unit-tests+0x24a95b)
      #1 0x5555559871c2 in reftable_malloc /home/pks/Development/git/build/../reftable/basics.c:24:9
      #2 0x5555559872e8 in reftable_calloc /home/pks/Development/git/build/../reftable/basics.c:54:6
      #3 0x55555598f0d3 in reftable_buf_read_data /home/pks/Development/git/build/../reftable/blocksource.c:67:2
      #4 0x55555598ea7e in block_source_read_data /home/pks/Development/git/build/../reftable/blocksource.c:41:19
      #5 0x55555598c555 in read_block /home/pks/Development/git/build/../reftable/block.c:224:9
      #6 0x55555598b69e in reftable_block_init /home/pks/Development/git/build/../reftable/block.c:258:9
      #7 0x555555813c70 in test_reftable_block__corrupt_block_size /home/pks/Development/git/build/../t/unit-tests/u-reftable-block.c:531:20
      #8 0x5555557f684e in clar_run_test /home/pks/Development/git/build/../t/unit-tests/clar/clar.c:335:3
      #9 0x5555557f2e69 in clar_run_suite /home/pks/Development/git/build/../t/unit-tests/clar/clar.c:431:3
      #10 0x5555557f2882 in clar_test_run /home/pks/Development/git/build/../t/unit-tests/clar/clar.c:636:4
      #11 0x5555557f375f in clar_test /home/pks/Development/git/build/../t/unit-tests/clar/clar.c:687:11
      #12 0x5555557fa49d in cmd_main /home/pks/Development/git/build/../t/unit-tests/unit-test.c:62:8
      #13 0x55555584b8aa in main /home/pks/Development/git/build/../common-main.c:9:11
      #14 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/8kvxvr3pmsypxiypq4g8zy13glnfr7nx-glibc-2.42-67/lib/libc.so.6+0x2b284) (BuildId: 5a702452a01df1d7d50ce0663acec7be3c71fd4d)
      #15 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/8kvxvr3pmsypxiypq4g8zy13glnfr7nx-glibc-2.42-67/lib/libc.so.6+0x2b337) (BuildId: 5a702452a01df1d7d50ce0663acec7be3c71fd4d)
      #16 0x555555694c24 in _start (/home/pks/Development/git/build/t/unit-tests+0x140c24)

  SUMMARY: AddressSanitizer: heap-buffer-overflow /home/pks/Development/git/build/../reftable/basics.h:119:20 in reftable_get_be16
  Shadow bytes around the buggy address:
    0x7c3ff6de2b80: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
    0x7c3ff6de2c00: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
    0x7c3ff6de2c80: fa fa fd fd fd fd fd fd fa fa fd fd fd fd fd fa
    0x7c3ff6de2d00: fa fa fd fd fd fd fd fd fa fa fd fd fd fd fd fa
    0x7c3ff6de2d80: fa fa 00 00 00 00 00 00 fa fa fd fd fd fd fd fd
  =>0x7c3ff6de2e00: fa fa 00 00 00 00 00[07]fa fa fa fa fa fa fa fa
    0x7c3ff6de2e80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7c3ff6de2f00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7c3ff6de2f80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7c3ff6de3000: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7c3ff6de3080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Partially addressable: 01 02 03 04 05 06 07
    Heap left redzone:       fa
    Freed heap region:       fd
    Stack left redzone:      f1
    Stack mid redzone:       f2
    Stack right redzone:     f3
    Stack after return:      f5
    Stack use after scope:   f8
    Global redzone:          f9
    Global init order:       f6
    Poisoned by user:        f7
    Container overflow:      fc
    Array cookie:            ac
    Intra object redzone:    bb
    ASan internal:           fe
    Left alloca redzone:     ca
    Right alloca redzone:    cb

Verify that the claimed block size fits into the block data before using
it.
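
Roughly, and assuming a simplified 4-byte block header made up of a
type byte followed by a 24-bit big-endian size (the function names are
hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 24-bit big-endian integer. */
uint32_t sketch_get_be24(const uint8_t *p)
{
	return (uint32_t)p[0] << 16 | (uint32_t)p[1] << 8 | (uint32_t)p[2];
}

/*
 * Hypothetical check. Assumes a 4-byte block header: one type byte
 * followed by the 24-bit block size. Returns 0 and stores the size in
 * `*out` if it is covered by the `len` bytes actually read, else -1.
 */
int sketch_block_size(const uint8_t *data, size_t len, uint32_t *out)
{
	uint32_t claimed;

	assert(data && out);
	if (len < 4)
		return -1;
	claimed = sketch_get_be24(data + 1);
	/* The size must cover the header and must not exceed what we read. */
	if (claimed < 4 || claimed > len)
		return -1;
	*out = claimed;
	return 0;
}
```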

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/block: fix OOB write with bogus inflated log size

The "log" reftable block stores reflog information. This information is
compressed using zlib. The inflated size is stored in the block header
so that callers can easily learn ahead of time how large of a buffer
they have to allocate to inflate the data in a single pass. So to
reconstruct the full inflated block we:

  - Copy over the header as-is, as it's not deflated.

  - Append the inflated data to the buffer.

The inflated block size stored in the header also includes the length of
the header itself. So to figure out the bytes that should be inflated by
zlib we need to subtract the header size, which is trusted data, from
the block size, which is untrusted data derived from the block header.

While we do verify that we were able to inflate all data as expected, we
don't verify ahead of time that the encoded block length is larger than
the header length. This can lead to an underflow, which makes zlib
assume that it can write more data into the target buffer than we have
allocated. The result is an out-of-bounds write:

  ==1422297==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c1ff6de5231 at pc 0x55555579a628 bp 0x7fffffff4f10 sp 0x7fffffff46d0
  WRITE of size 4 at 0x7c1ff6de5231 thread T0
      #0 0x55555579a627 in __asan_memcpy (./build/t/unit-tests+0x246627)
      #1 0x55555598b093 in reftable_block_init ./build/../reftable/block.c:277:3
      #2 0x555555813701 in test_reftable_block__corrupt_log_block_size ./build/../t/unit-tests/u-reftable-block.c:495:20
      #3 0x5555557f684e in clar_run_test ./build/../t/unit-tests/clar/clar.c:335:3
      #4 0x5555557f2e69 in clar_run_suite ./build/../t/unit-tests/clar/clar.c:431:3
      #5 0x5555557f2882 in clar_test_run ./build/../t/unit-tests/clar/clar.c:636:4
      #6 0x5555557f375f in clar_test ./build/../t/unit-tests/clar/clar.c:687:11
      #7 0x5555557fa49d in cmd_main ./build/../t/unit-tests/unit-test.c:62:8
      #8 0x55555584af4a in main ./build/../common-main.c:9:11
      #9 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b284) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #10 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b337) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #11 0x555555694c24 in _start (./build/t/unit-tests+0x140c24)

  0x7c1ff6de5231 is located 0 bytes after 1-byte region [0x7c1ff6de5230,0x7c1ff6de5231)
  allocated by thread T0 here:
      #0 0x55555579db1b in realloc.part.0 asan_malloc_linux.cpp.o
      #1 0x5555559868d7 in reftable_realloc ./build/../reftable/basics.c:36:9
      #2 0x55555598a98f in reftable_alloc_grow ./build/../reftable/basics.h:229:10
      #3 0x55555598ae58 in reftable_block_init ./build/../reftable/block.c:269:3
      #4 0x555555813701 in test_reftable_block__corrupt_log_block_size ./build/../t/unit-tests/u-reftable-block.c:495:20
      #5 0x5555557f684e in clar_run_test ./build/../t/unit-tests/clar/clar.c:335:3
      #6 0x5555557f2e69 in clar_run_suite ./build/../t/unit-tests/clar/clar.c:431:3
      #7 0x5555557f2882 in clar_test_run ./build/../t/unit-tests/clar/clar.c:636:4
      #8 0x5555557f375f in clar_test ./build/../t/unit-tests/clar/clar.c:687:11
      #9 0x5555557fa49d in cmd_main ./build/../t/unit-tests/unit-test.c:62:8
      #10 0x55555584af4a in main ./build/../common-main.c:9:11
      #11 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b284) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #12 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b337) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #13 0x555555694c24 in _start (./build/t/unit-tests+0x140c24)

  SUMMARY: AddressSanitizer: heap-buffer-overflow (./build/t/unit-tests+0x246627) in __asan_memcpy
  Shadow bytes around the buggy address:
    0x7c1ff6de4f80: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fd fd
    0x7c1ff6de5000: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fd fd
    0x7c1ff6de5080: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fd fd
    0x7c1ff6de5100: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fd fd
    0x7c1ff6de5180: fa fa fd fd fa fa fd fd fa fa fd fa fa fa fd fd
  =>0x7c1ff6de5200: fa fa 04 fa fa fa[01]fa fa fa fa fa fa fa fa fa
    0x7c1ff6de5280: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7c1ff6de5300: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7c1ff6de5380: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7c1ff6de5400: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7c1ff6de5480: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Partially addressable: 01 02 03 04 05 06 07
    Heap left redzone:       fa
    Freed heap region:       fd
    Stack left redzone:      f1
    Stack mid redzone:       f2
    Stack right redzone:     f3
    Stack after return:      f5
    Stack use after scope:   f8
    Global redzone:          f9
    Global init order:       f6
    Poisoned by user:        f7
    Container overflow:      fc
    Array cookie:            ac
    Intra object redzone:    bb
    ASan internal:           fe
    Left alloca redzone:     ca
    Right alloca redzone:    cb

Fix the bug by adding a sanity check and add a unit test.
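
Such a sanity check could be sketched as below; the helper name and
signature are hypothetical and only show the order of comparison and
subtraction:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of bytes zlib may write when inflating a
 * log block. `block_size` is the untrusted size from the block header
 * (it includes the header itself), `header_size` is computed locally.
 * Returns 0 and stores the payload size in `*out`, or -1 if the claimed
 * size is too small to even contain the header.
 */
int inflated_payload_size(size_t block_size, size_t header_size, size_t *out)
{
	assert(out);
	if (block_size < header_size)
		return -1; /* would wrap around to a huge size_t */
	*out = block_size - header_size;
	return 0;
}
```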

Reported-by: oxsignal <awo@kakao.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t/unit-tests: introduce test helper to write reftable blocks

Introduce a new test helper that allows us to write reftable blocks.
This helper will be used by subsequent commits.

Suggested-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/record: don't abort when decoding invalid ref value type

When decoding a ref record we read its value type from the block. In
case the type itself is invalid we call `abort()`. This is rather
heavy-handed though: the data we're reading is untrusted, so we should
treat the issue as a normal error and not as a programming error.

Fix this by handling the error gracefully. Note that this also requires
us to set the value type later, as otherwise we might store an invalid
type in the record.
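
In sketch form, with hypothetical type names and a hypothetical error
code standing in for the real ones:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical value types and error code, named for illustration only. */
enum sketch_value_type {
	SKETCH_DELETION = 0,
	SKETCH_VAL1 = 1,
	SKETCH_VAL2 = 2,
	SKETCH_SYMREF = 3
};
#define SKETCH_FORMAT_ERROR (-1)

struct sketch_ref {
	enum sketch_value_type value_type;
};

/*
 * Decode the value type from an untrusted byte. The record is only
 * updated once the type is known to be valid.
 */
int sketch_decode_value_type(struct sketch_ref *ref, uint8_t raw)
{
	assert(ref);
	switch (raw) {
	case SKETCH_DELETION:
	case SKETCH_VAL1:
	case SKETCH_VAL2:
	case SKETCH_SYMREF:
		break;
	default:
		return SKETCH_FORMAT_ERROR; /* bad input, not a bug: no abort() */
	}
	ref->value_type = (enum sketch_value_type)raw;
	return 0;
}
```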

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/basics: fix OOB read on binary search of empty range

`binsearch()` performs a binary search over a range of `sz` elements by
repeatedly calling the comparison function with indices into that range.
When the range is empty, though, there is no valid index to call the
comparison function with. We still end up executing the comparison
function with an index of 0, which of course will cause an
out-of-bounds read.

Return early when the range is empty.
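
A sketch of a binary search with that shape. It is written for
illustration and is not the reftable implementation; the `sketch_`
names are hypothetical, and the predicate is assumed to be false for a
prefix of the range and true afterwards:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Find the smallest index i in [0, sz) for which f(i, args) is true.
 * Returns sz if there is no such index.
 */
size_t sketch_binsearch(size_t sz, int (*f)(size_t k, void *args), void *args)
{
	size_t lo = 0, hi = sz;

	assert(f);
	if (!sz)
		return 0; /* empty range: there is no index to probe */

	/* Invariant: f(hi) is true or hi == sz; f(lo) is false or lo == 0. */
	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;

		if (f(mid, args))
			hi = mid;
		else
			lo = mid;
	}
	if (lo)
		return hi;
	/* Index 0 was never probed by the loop; sz >= 1 makes it valid. */
	return f(0, args) ? 0 : 1;
}

/* Example predicate that also counts how often it is called. */
struct sketch_probe {
	size_t threshold;
	size_t calls;
};

int sketch_at_least(size_t k, void *args)
{
	struct sketch_probe *p = args;

	p->calls++;
	return k >= p->threshold;
}
```

Without the early return, the final `f(0, args)` call would run even
for `sz == 0`.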

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

oss-fuzz: add fuzzer for parsing reftables

Add a new fuzzer that exercises our parsing of reftables. Fallout from
this fuzzer will be fixed over subsequent commits.
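
For readers unfamiliar with libFuzzer, a fuzz target has roughly the
following shape. `LLVMFuzzerTestOneInput()` is the entry point that
libFuzzer expects; `sketch_parse()` is a hypothetical stand-in for the
code under test and has nothing to do with the fuzzer added here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the parser under test. It only accepts
 * inputs that start with a 4-byte magic and never reads past `size`.
 * Returns 0 if the magic matches and -1 otherwise.
 */
int sketch_parse(const uint8_t *data, size_t size)
{
	static const uint8_t magic[4] = { 'R', 'E', 'F', 'T' };
	size_t i;

	if (size < sizeof(magic))
		return -1;
	for (i = 0; i < sizeof(magic); i++)
		if (data[i] != magic[i])
			return -1;
	return 0;
}

/*
 * libFuzzer entry point. The fuzzer calls this repeatedly with mutated
 * inputs; parse errors are expected and ignored, only crashes and
 * sanitizer reports count as findings.
 */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
	assert(data || !size);
	(void)sketch_parse(data, size);
	return 0;
}
```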

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

meson: support building fuzzers with libFuzzer

To support fuzzing via libFuzzer one has to pass a couple of compiler
options:

  - It is mandatory to enable the "fuzzer-no-link" sanitizer for
    coverage feedback.

  - It is recommended to enable at least one more sanitizer to catch
    issues, like the "address" sanitizer.

  - The fuzzing executables need to be linked with "-fsanitize=fuzzer"
    to wire up libFuzzer itself.

The first two items can already be achieved via the "-Db_sanitize="
option. But the last item cannot easily be achieved, as we can only
configure global link arguments.

Introduce a new "-Dfuzzers_link_args=" build option to plug this gap.
Add documentation so that users know how to set up libFuzzer.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/history: implement "drop" subcommand

A common operation when editing the commit history is to drop a specific
commit from the history entirely, but this operation is not currently
covered by git-history(1).

A couple of noteworthy bits:

  - This is the first git-history(1) command that will ultimately result
    in changes to both the index and the working tree. We thus have to
    add logic to merge resulting changes into those.

  - It is still not possible to replay merge commits, so this limitation
    is inherited for the new "drop" command.

  - For now we refuse to drop root commits. While we _can_ indeed drop
    root commits in the general case, there are edge cases where the
    resulting history would become completely empty. This is thus left
    to a subsequent patch series.

Other than that, the logic is rather straightforward as we can largely
build on the preexisting logic in git-history(1).

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/history: split handling of ref updates into two phases

The function `handle_reference_updates()` is used by git-history(1) to
update all references that refer to commits that have been rewritten. As
such, it performs two steps:

  - It gathers the references that need to be updated in the first
    place.

  - It prepares and commits the reference transaction.

In a subsequent commit we'll want to handle those two steps separately.
Prepare for this by splitting up the function into two.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

replay: expose `replay_result_queue_update()`

Expose `replay_result_queue_update()`, which is used to append another
reference update to the replay result. This function will be used in a
subsequent commit.

Suggested-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reset: stop assuming that the caller passes in a clean index

In 652bd0211d (rebase: use 'skip_cache_tree_update' option, 2022-11-10),
we updated `reset_working_tree()` to stop updating the index tree cache.
This was done as a performance optimization: the function is only called
by "sequencer.c" and "rebase.c", both of which assume a clean index
before they perform their operation, so we know that the end result will
be a clean index, too. Consequently, we can skip recomputing the cache
as we can instead use `prime_cache_tree()` directly.

In a subsequent commit we're about to add a new caller though where the
assumption doesn't hold anymore: the index may be dirty before calling
`reset_working_tree()`, and consequently we cannot prime the cache with
a given tree anymore as the index and tree will mismatch.

Adapt the logic so that we only skip the cache tree update in case we're
doing a hard reset. While we could introduce logic that only skips the
update in case the incoming index was clean already, that doesn't really
feel worth it: after all, the mentioned commit says itself that the
performance improvement was negligible anyway.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reset: allow the caller to specify the current HEAD object

When calling `reset_working_tree()` we automatically derive the commit
that the caller wants to move from by reading the HEAD commit. Some
callers may already have resolved it, or they may want to move from a
different commit that doesn't match HEAD.

Introduce a new `oid_from` option that lets the caller specify the
commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reset: introduce ability to skip updating HEAD

In a subsequent commit we'll introduce a new caller to
`reset_working_tree()` that really only wants to update the index and
working tree, without updating any references. Introduce a new flag that
makes the caller opt in to updating HEAD and adapt all callers to set
that flag.

Note that in a previous iteration we instead introduced a flag that made
callers opt out of updating any references. This was somewhat awkward
though because we already have the `UPDATE_ORIG_HEAD` flag, so the
result was somewhat inconsistent.

Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
[jc: fixed-up a typo pointed out by Christian]
Signed-off-by: Junio C Hamano <gitster@pobox.com>

meson: restore hook-list.h to builtin_sources

This fixes a racy build failure.

```
builtin/bugreport.c:12:10: fatal error: hook-list.h: No such file or directory
   12 | #include "hook-list.h"
      |          ^~~~~~~~~~~~~
```

hook-list.h must be generated before builtin/bugreport.c is compiled.

Bug: https://bugs.gentoo.org/978326
Fixes: 2eb541e8f2a9 (hook: move is_known_hook() to hook.c for wider use, 2026-04-10)
Signed-off-by: Mike Gilbert <floppym@gentoo.org>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

odb: document object info fields

Some of the fields in `struct object_info` are undocumented. Add these
missing comments.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

odb: drop `whence` field from object info

In the preceding commits we have migrated all callers that need to know
how a specific object is stored to use the new object info source
instead, and hence the field is now unused. Drop it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

treewide: convert users of `whence` to the new source field

The `whence` field has become redundant now that callers can learn about
the exact source an object has been looked up from via the `struct
object_info_source::source` field.

Adapt callers to use the new field. Note that all callsites already set
up the `info.sourcep` request pointer, so the conversion is rather
straight-forward.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

odb: add `source` field to struct object_info_source

The previous commit introduced `struct object_info_source` as an opt-in
container for backend-specific information, but for now we only moved
preexisting data into this structure. Most importantly, the caller has
no way yet to learn about which source an object was actually looked up
from. Instead, callers have to rely on the `whence` enum to distinguish
the object type, but cannot use that enum to tell the object source.

Add a `struct odb_source *source` field to the structure and populate it
from each backend's lookup path.

The `whence` enum is still set and used by callers; it will be removed
in a subsequent commit now that `sourcep->source` can identify the
backend on its own.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

odb: make backend-specific fields optional

The `struct object_info` carries two pieces of information
about how an object was looked up:

  - The `whence` enum identifying the backend.

  - The backend-tagged union `u` exposing backend-specific details
    (currently only the packed-source case, which records the owning
    pack, offset and packed object type).

The union is populated unconditionally, even though most callers don't
care about provenance at all.

Split the backend-specific union out into a new public type, `struct
object_info_source`, and make the object info structure carry it via
just another opt-in request pointer. As with all the other requestable
information, callers that need source info allocate a `struct
object_info_source` on the stack and point `sourcep` at it; callers that
don't care about it simply leave the field as a `NULL` pointer. Adapt
callers accordingly.
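
The opt-in request pointer pattern can be sketched roughly as follows.
The structs are heavily simplified stand-ins with made-up `sketch_*`
names and dummy values; only the idea of "fill in what the caller
pointed us at" is taken from the description above.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the structures named above. */
struct sketch_info_source {
	const char *pack_name;
	unsigned long offset;
};

struct sketch_object_info {
	unsigned long *sizep;			/* NULL: size not wanted */
	struct sketch_info_source *sourcep;	/* NULL: provenance not wanted */
};

/* Fill in only what the caller asked for via non-NULL request pointers. */
static int sketch_lookup(struct sketch_object_info *oi)
{
	if (oi->sizep)
		*oi->sizep = 42;
	if (oi->sourcep) {
		oi->sourcep->pack_name = "pack-sketch.pack";
		oi->sourcep->offset = 12;
	}
	return 0;
}
```

A caller that does not care about provenance leaves `sourcep` as NULL
and pays nothing for it; one that does puts a `struct
sketch_info_source` on its stack and points `sourcep` at it.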

Note that the `whence` enum is strictly speaking also backend-specific
information, so it would be another good candidate to be moved into the
`struct object_info_source`. For now though it is left alone, as it will
be replaced by a `struct odb_source` pointer in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packfile: thread odb_source_packed through packed_object_info()

Add an optional `struct odb_source_packed *source` parameter to
`packed_object_info()` and `packed_object_info_with_index_pos()`. This
parameter is unused at this point in time, but it will be used in a
follow-up commit so that we can record the source of a specific object.

Note that callers in "odb/source-packed.c" pass the already-available
source, but all other callers pass `NULL` instead. This is fine though,
as we only care about populating this info when called via the packed
store.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

hash: add platform-specific discard functions

Our git_hash_discard() is a bit hacky: it just calls git_hash_final()
into a dummy result buffer, using the side effect that each
implementation's Final() function will also free any resources.

This is probably not too terrible, since generating the final hash is
not that expensive and we'd mostly call discard on unusual or error code
paths. But we can do better by widening the platform API a bit to add an
explicit discard function.

This requires an annoying amount of boilerplate:

  - Each algorithm needs a git_$ALGO_discard() wrapper that dereferences
    the union'd git_hash_ctx into the type-safe field. So sha1 + sha256
    + sha1-unsafe, plus a BUG() for the unknown algo. And then these all
    need to be referenced in the git_hash_algo structs.

  - Platforms which don't do anything special to discard now need a
    fallback function which does nothing. And we need this for each algo
    (sha1, sha256, and sha1-unsafe).

  - Platforms which do need to discard must define their discard
    functions. This includes sha1/openssl, sha256/openssl, and
    sha256/gcrypt (no sha1-unsafe here as it sits atop the sha1/openssl
    functions).

  - Algo selection needs to point platform_*_Discard to the appropriate
    underlying macro, or indicate that the fallback should be used. We
    have a similar situation for the Clone function (where a straight
    memcpy() of the context struct is not enough for some platforms).
    I've tied Discard to the same flag used by Clone here, since they
    are basically the same problem: is the hash context a sequence of
    bytes, or does it need smart copying/discarding?

It's easy to miss a case here since we don't even compile the
implementations we aren't using. I've tested with each of:

  - no flags, which uses our internal sha1/sha256 implementations, both
    of which exercise the noop fallback function

  - OPENSSL_SHA1_UNSAFE=1, which checks that our unsafe macro
    redirections work

  - OPENSSL_SHA1=1, though you should not do that in real life!

  - OPENSSL_SHA256=1, passes tests with GIT_TEST_DEFAULT_HASH=sha256

  - GCRYPT_SHA256=1, which likewise passes

The other implementations do not set the CLONE_HELPER flag, so they
treat the context as bytes and should be fine with the fallback.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

hash: fix memory leak copying sha256 gcrypt handles

Our abstracted hash-algorithm API allows for cloning a hash context. By
default this just memcpy()s the bytes, but specific implementations can
provide a custom clone function.

Our API is based around the way that OpenSSL works, which is that you
first initialize the destination context, then copy into it. In our code
that is this:

  algo->init_fn(&dst);
  git_hash_clone(&dst, src);

and that translates into OpenSSL calls like:

  /* init_fn */
  dst->ectx = EVP_MD_CTX_new();
  EVP_DigestInit_ex(dst->ectx, EVP_sha256(), NULL);
  /* clone */
  EVP_MD_CTX_copy_ex(dst->ectx, src->ectx);

So the allocation happens in the first step, and then the clone is just
copying values (the DigestInit is initializing values that just get
overwritten, but that's not wrong, just a little inefficient).

But libgcrypt doesn't work like that! Its copy function initializes dst
from scratch. So when using the sha256 gcrypt backend, that becomes:

  /* init_fn; this allocates */
  gcry_md_open(&dst, GCRY_MD_SHA256, 0);
  /* clone; this also allocates, leaking the previous value! */
  gcry_md_copy(&dst, src);

You can see the leaks in the test suite by running:

  make \
    SANITIZE=leak \
    GCRYPT_SHA256=1 \
    GIT_TEST_DEFAULT_HASH=sha256 \
    test

which has many failures, as opposed to building with OPENSSL_SHA256,
which is leak-free.

The easy fix here is for the clone function to close the open context
we're about to overwrite. It's a little inefficient (we did a pointless
open in the init function), but probably not a big deal in practice.
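
To make the shape of the fix concrete without pulling in libgcrypt,
here is a self-contained mock. The `mock_*` functions only imitate the
"copy allocates the destination" behavior described above, and
`sketch_clone()` is a hypothetical name, not the function touched by
this patch.

```c
#include <stdlib.h>
#include <string.h>

/* Mock of a library whose copy function allocates the destination itself. */
struct mock_handle {
	unsigned char state[32];
};

static int mock_live_handles;

static void mock_open(struct mock_handle **h)
{
	*h = calloc(1, sizeof(**h));
	if (!*h)
		abort();
	mock_live_handles++;
}

static void mock_close(struct mock_handle *h)
{
	if (!h)
		return;
	free(h);
	mock_live_handles--;
}

static void mock_copy(struct mock_handle **dst, const struct mock_handle *src)
{
	mock_open(dst);	/* overwrites *dst without looking at it */
	memcpy((*dst)->state, src->state, sizeof(src->state));
}

/* Clone into an already-initialized destination without leaking it. */
static void sketch_clone(struct mock_handle **dst, const struct mock_handle *src)
{
	mock_close(*dst);
	mock_copy(dst, src);
}
```

Dropping the `mock_close()` call in `sketch_clone()` leaves one handle
unaccounted for after every clone, which mirrors the leak above.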

If our API went the other way, assuming that we're always cloning into
garbage bytes, then we could be more efficient. We'd teach OpenSSL's
clone function to do its own new(), skip the DigestInit, and then copy
into it. And gcrypt could stick with just the copy() call.

But look again at the asymmetry in the very first code example. We call
the init function straight from the git_hash_algo struct, and then
subsequent calls are dispatched through our git_hash_* wrappers. If you
wanted to clone into an uninitialized destination, you'd do something
like:

  algo->clone_fn(&dst, src);

instead. That would require changing all of the callers. There's not
that many of them, but I don't know that it's worth changing our calling
conventions to try to reclaim this tiny bit of efficiency.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

http: discard hash in dumb-http http_object_request

Usually an object request results in finish_http_object_request()
calling git_hash_final_oid(), after we've received all of the data. But
if we hit an error, we'll bail early and free the http_object_request,
dropping the git_hash_ctx entirely.  This can cause a leak for hash
implementations that allocate memory in their context, like OpenSSL >=
3.0.

The obvious fix is for abort_http_object_request() to call
git_hash_discard(), under the assumption that every request is either
finished or aborted. But that's not quite true:

  1. Not everybody calls the abort function. Sometimes they jump
     straight to release_http_object_request(). So we'd have to put it
     there.

  2. After the finish function finalizes the hash, we can still
     encounter errors! In that case we end up aborting or releasing,
     and they must not discard that hash (since that would be a
     double-free).

So we'll keep a flag marking the validity of the hash_ctx field of the
request. The lifetime is simple: it is valid immediately after creation,
up until we call finalize. And then our release function can just
conditionally discard the hash based on that flag.
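
The lifetime rule can be sketched with a mock context whose release is
deliberately not idempotent. The names (`sketch_request`,
`hash_ctx_valid`, the `mock_*` helpers) are assumptions for this
example and need not match the real http code.

```c
#include <stdlib.h>

/* Mock hash context whose release is deliberately not idempotent. */
struct mock_hash_ctx {
	void *heap_state;
};

static int mock_release_calls;

static void mock_hash_init(struct mock_hash_ctx *ctx)
{
	ctx->heap_state = malloc(16);
}

/* Stands in for both final() and discard(): frees the heap state. */
static void mock_hash_release(struct mock_hash_ctx *ctx)
{
	free(ctx->heap_state);
	mock_release_calls++;
}

struct sketch_request {
	struct mock_hash_ctx hash_ctx;
	unsigned hash_ctx_valid : 1;
};

static void sketch_request_init(struct sketch_request *req)
{
	mock_hash_init(&req->hash_ctx);
	req->hash_ctx_valid = 1;	/* valid right after creation */
}

static void sketch_request_finish(struct sketch_request *req)
{
	mock_hash_release(&req->hash_ctx);	/* finalizing frees the state */
	req->hash_ctx_valid = 0;
}

/* Safe to call after finish and after an early bail-out alike. */
static void sketch_request_release(struct sketch_request *req)
{
	if (req->hash_ctx_valid) {
		mock_hash_release(&req->hash_ctx);
		req->hash_ctx_valid = 0;
	}
}
```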

This fixes test failures in t5550 and t5619 when run with:

  make SANITIZE=leak \
       OPENSSL_SHA256=1 \
       GIT_TEST_DEFAULT_HASH=sha256 \
       test

The flag handling could be removed if the hash-discard function were
idempotent. This could be done easily-ish by having the underlying
hash functions (like the ones in sha256/openssl.h) set the context
pointer to NULL after free-ing. But it's something that every platform
implementation would have to remember to do, and the benefit for the
callers is not that huge (it would let us shave a few lines here and
probably in a few other spots).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

check_stream_oid(): discard hash on read error

The happy path of check_stream_oid() is to initialize a hash, feed the
loose object zlib stream into it, and then get the final result. But if
we hit a zlib error or see extra cruft we'll bail early with an error.

Since we never call git_hash_final() in these cases, any resources held
by the git_hash_ctx may be leaked. Our default hash algorithms don't
allocate anything in the hash_ctx, but some implementations do. For
example, running:

  make SANITIZE=leak \
       OPENSSL_SHA256=1 \
       GIT_TEST_DEFAULT_HASH=sha256 \
       test

will fail t1450, since it feeds corrupted objects that cause us to bail
from check_stream_oid(). This patch fixes it by discarding the hash in
those early return paths. Trying to jump to a common "out:" label is not
worth it here, as we must _not_ discard a hash that was already fed to
git_hash_final(). And the hash_ctx itself does not carry any information
(so we cannot check for a NULL pointer, etc).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

patch-id: discard hash when done

When computing a patch-id, we have a flush_one_hunk() helper that calls
git_hash_final() on our running hunk git_hash_ctx, and then
reinitializes that context for the next hunk.

When we run out of hunks to look at, we return, discarding the
git_hash_ctx. This can cause a leak if the hash implementation we are
using allocates any memory during its initialization. This includes
OpenSSL >= 3.0, for both SHA-1 and SHA-256. Normally we would not use
SHA-1 here at all, as we only recommend using non-DC implementations for
the "unsafe" variant (and patch-ids, though they probably _could_ use the
unsafe variant, were never taught to do so).

But it is certainly a problem for SHA-256, which you can see with:

  make SANITIZE=leak \
       OPENSSL_SHA256=1 \
       GIT_TEST_DEFAULT_HASH=sha256 \
       test

That results in leak failures of 60 scripts, 57 of which are fixed by
this patch (basically anything which runs rebase will hit this case).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

csum-file: provide a function to release checkpoints

A hashfile_checkpoint struct is basically just a copy of the hash_ctx
state at a given point in the file. As such, it contains its own
git_hash_ctx which may (depending on the underlying hash implementation)
need to be discarded when we're done with it.

Let's add a "release" function which cleans up the hash context it
holds. I chose "release" here and not "discard" because you'd use this
to clean up every checkpoint, whether you used it or not. As opposed to
git_hash_discard(), which is needed only if you didn't call
git_hash_final().

There are only two callers which use hashfile_checkpoints, and we can
add release calls to both. When built with "SANITIZE=leak
OPENSSL_SHA1_UNSAFE=1", this makes both t1050 and t9300 leak-free.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

csum-file: always finalize or discard hash

When a hashfile struct is created, we always initialize the git_hash_ctx
inside it. We usually end up in hashfile_finalize(), which passes that
ctx to git_hash_final(), cleaning it up.

But a few code paths don't do so:

  1. If we bail on the hashfile and call free_hashfile() directly rather
     than finalizing.

  2. If the skip_hash flag is set, the hashfile_finalize() call will
     never call git_hash_final(). (You might think that we should just
     avoid git_hash_init() entirely in this case, but the skip_hash flag
     is set by the caller after the hashfile is initialized).

For most hash implementations this is OK, but for ones that allocate on
initialization it causes a memory leak. You can see many failures by
running:

  make SANITIZE=leak OPENSSL_SHA1_UNSAFE=1 test

since OpenSSL >= 3.0 is such an allocating hash implementation (and
csum-file uses the "unsafe" algorithm variant).

We can solve this by calling git_hash_discard() as appropriate.

Note that free_hashfile() is used both directly by callers to abort
without finalizing, and by hashfile_finalize() to free memory. In the
latter case we _don't_ want to call git_hash_discard(), because we'll
already have either finalized or discarded it. So we'll push that to an
internal "free_memory" function, and keep free_hashfile() as the public
interface to abort a hashfile without finalizing.

This fix makes several scripts leak-free with the command above: t1600,
t1601, t2107, t7008, t9210, t9211.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

hash: add discard primitive

The usual life-cycle for a git_hash_ctx is calling git_hash_init(),
adding some data, and then using git_hash_final() to get the output
digest and free any resources.

Sometimes we decide to abort the operation without the final() call
(e.g., due to errors or other reasons). In that case we just abandon the
hash_ctx completely and let it go out of scope. For most hash
implementations this is fine; they were just holding values directly in
the struct.

But some implementations do allocate memory, and in these cases we leak
the memory. Notably OpenSSL >= 3.0 requires us to allocate the digest
context on the heap with EVP_MD_CTX_new().

Let's provide a git_hash_discard() function that can be used in these
code paths to free any resources. For now we'll implement it by just
calling git_hash_final() into a dummy output, relying on its side effect
of freeing the resources. Our view of the underlying hash implementation
is abstracted behind the platform_SHA_* macros, so that's the best we
can do without widening that interface.

It's a little inefficient, but probably not noticeably so in practice,
especially as we'd usually hit this on an error code path. And by
abstracting it in this function, we can later swap it out when the
platform_SHA interface lets us do so.
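
In miniature, and with a toy context instead of the real platform
macros, the approach looks something like this. Every `sketch_*` name
is hypothetical and the XOR "digest" is only there to have some state
to free.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_RAWSZ 32

/* Mock context that, like OpenSSL >= 3.0, keeps its state on the heap. */
struct sketch_hash_ctx {
	unsigned char *state;
};

static int sketch_live_contexts;

static void sketch_hash_init(struct sketch_hash_ctx *ctx)
{
	ctx->state = calloc(1, SKETCH_RAWSZ);
	if (!ctx->state)
		abort();
	sketch_live_contexts++;
}

/* Toy "digest": XOR the input into the state. Not a real hash. */
static void sketch_hash_update(struct sketch_hash_ctx *ctx,
			       const unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		ctx->state[i % SKETCH_RAWSZ] ^= buf[i];
}

/* Write the digest and, as a side effect, free the heap state. */
static void sketch_hash_final(unsigned char *out, struct sketch_hash_ctx *ctx)
{
	memcpy(out, ctx->state, SKETCH_RAWSZ);
	free(ctx->state);
	ctx->state = NULL;
	sketch_live_contexts--;
}

/* Abandon a context by finalizing into a throwaway buffer. */
static void sketch_hash_discard(struct sketch_hash_ctx *ctx)
{
	unsigned char dummy[SKETCH_RAWSZ];

	sketch_hash_final(dummy, ctx);
}
```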

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

csum-file: drop discard_hashfile()

Commit c3d034df16 (csum-file: introduce discard_hashfile(), 2024-07-25)
added a cleanup function that no longer has any callers. In that commit
we adjusted do_write_index() to use the new function. But a similar fix
occurred on a parallel branch, making free_hashfile() public, and the
merge resolution in 1b6b2bfae5 (Merge branch 'ps/leakfixes-part-4',
2024-08-23) took the free_hashfile() version.

So now we have two functions, discard_hashfile() and free_hashfile(),
and we only need one. Which one do we want to keep?

The only difference between them is that the discard variant also closes
the descriptors held in the struct. Let's look at the three callers:

  1. In finalize_hashfile() we've either already closed the descriptors
     (if the CSUM_CLOSE flag is passed) or the caller didn't want them
     closed (if it didn't pass that flag). So we want the more limited
     free_hashfile().

  2. In object-file.c:flush_packfile_transaction() we close the
     descriptor ourselves. So discard_hashfile() could save us a line of
     code.

  3. In do_write_index() we don't close the descriptor. This was the spot
     for which c3d034df16 added the discard function in the first place,
     but I'm skeptical that closing the descriptor here is the right
     thing. It is true that we are done with the descriptor at this
     point and closing it would be ideal. But we don't really own it!

     The descriptor comes from a tempfile struct (as part of a lock) and
     that tempfile will hold on to the descriptor and try to close it
     when it is deleted. This might happen at the end of the program, in
     which case the double-close is mostly harmless (we might
     accidentally close some other open descriptor, but at that point
     we're just closing and unlinking everything we can).

     But in theory it could also cause subtle bugs. If do_write_index()
     fails, we return the error up the stack and would eventually end up
     in write_locked_index(). There we roll back the lock file on error,
     which will close the descriptor. So now we get our double close,
     and we might actually close something else that was opened in the
     interim.

     This is probably unlikely in practice (as soon as we see the error
     we'd mostly be unwinding the stack, not opening new files). But it
     highlights a potential problem with the discard_hashfile()
     interface: the hashfile doesn't necessarily own that descriptor.
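
The "close something else that was opened in the interim" hazard is
easy to reproduce in isolation. The following POSIX sketch is unrelated
to the hashfile code itself; the function name is made up, and it
relies on open() handing out the lowest free descriptor number in a
single-threaded process.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Returns 1 if a stale second close() ended up closing an unrelated
 * descriptor, 0 if it did not, and -1 if the scenario could not be
 * set up.
 */
static int sketch_double_close_hits_other_fd(void)
{
	int owned, other;

	owned = open("/dev/null", O_RDONLY);
	if (owned < 0)
		return -1;
	close(owned);	/* first close, e.g. by a discard-style helper */

	other = open("/dev/null", O_RDONLY);	/* opened in the interim */
	if (other < 0)
		return -1;
	if (other != owned) {
		/* The number was not reused, so there is nothing to show. */
		close(other);
		return -1;
	}

	close(owned);	/* second close, e.g. by the real owner's cleanup */

	/* "other" was never closed by its own user, yet it is gone now. */
	return fcntl(other, F_GETFD) == -1;
}
```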

Note that I said "descriptors" plural above. Those callers all care
about the "fd" member of the struct. But discard_hashfile() also closes
check_fd. That is only used if the struct is initialized with
hashfd_check(), and neither of its two callers calls either discard or
free (they always "finalize" instead). So closing it is irrelevant for
the current callers.

I think we're better off sticking with the simpler free_hashfile()
interface, and the handful of callers can decide how to handle the
descriptors themselves.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reset: introduce dry-run mode

In a subsequent commit we'll add another caller to `reset_working_tree()`
that wants to perform a dry-run check of whether it would be possible to
update the index and working tree when moving to a new commit. Introduce
a new flag that lets the caller perform this operation.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reset: modernize flags passed to `reset_working_tree()`

The flags passed to `reset_working_tree()` are declared as defines. This
has fallen a bit out of practice nowadays, where we instead prefer to
use enums. Furthermore, the prefix of those flags does not match the
function name anymore after the rename in the preceding commit.

Adapt the code to follow modern best practices and adapt the flag names.
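
As a generic illustration of the defines-to-enum conversion (the flag
names below are hypothetical and are not the ones introduced by this
commit):

```c
/*
 * Before: independent preprocessor constants, e.g.
 *   #define SKETCH_OLD_HARD (1 << 1)
 * After: one enum type that function signatures can name.
 * All names here are hypothetical.
 */
enum sketch_reset_flags {
	SKETCH_RESET_DETACH = (1 << 0),
	SKETCH_RESET_HARD = (1 << 1),
	SKETCH_RESET_ORIG_HEAD = (1 << 2),
};

/* The parameter type now documents which flags are accepted. */
static int sketch_is_hard_reset(enum sketch_reset_flags flags)
{
	return !!(flags & SKETCH_RESET_HARD);
}
```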

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reset: rename `reset_head()`

In a subsequent commit we're about to adapt `reset_head()` so that the
reference update to HEAD is optional. At this point the function
starts to feel misnamed, as it doesn't necessarily have anything to do
with the HEAD reference anymore. The gist of the function then is that
we reset the working tree to a specific new commit, updating both the
index and the checked-out files.

Rename it to `reset_working_tree()` to better reflect that.

Note that we don't adjust the flags yet. This will happen in a
subsequent commit.

Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reset: drop `USE_THE_REPOSITORY_VARIABLE`

In "reset.c" we still have references to `the_repository`, even though
the only entry point into the file already receives a repository as
parameter.

Update all uses of `the_repository` to instead use the passed-in repo
and drop `USE_THE_REPOSITORY_VARIABLE`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

read-cache: split out function to drop unmerged entries to stage 0

In `repo_read_index_unmerged()` we read the index and then drop any
unmerged entries to stage 0. In a subsequent commit we'll want to
perform this operation on arbitrary indexes, not only the one of the
given repository.

Prepare for this by splitting out the functionality into a new function
that can act on an arbitrary index.

While at it, fix a signedness mismatch when iterating through the index
cache entries.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

line-log: drop extra copy of range with bloom filters

When line_log_process_ranges_arbitrary_commit() finds out from a Bloom
filter that a commit didn't touch the path in question, it can quickly
pass its range on to the parent commit.

It does so by making a copy of the range, and passing that copy to
add_line_range(). But add_line_range() already makes its own copy
(either directly, or by merging with an existing range for that parent).
So the copy we make is leaked.

We can plug the leak by just passing our range directly, without the
extra copy.

The bug goes back to f32dde8c12 (line-log: integrate with changed-path
Bloom filters, 2020-05-11). We didn't notice because the test suite
never explicitly combines these features! You can observe it by building
with SANITIZE=leak and running t4211 with some extra flags:

  GIT_TEST_COMMIT_GRAPH=1 \
  GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1 \
  ./t4211-line-log.sh

It would probably be useful to have some more targeted test coverage of
these features together. But I don't think there's much point in just
blindly copying the existing tests and adding bloom-filter support. We
already do that via the linux-TEST-vars CI job. We just don't run the
leak-checking build with those flags (so if there were a correctness
problem, we'd have noticed, just not a leak).

So I think we'd benefit from somebody clueful thinking about the
interaction of these features and testing the corner cases. But for the
purposes of this leak fix, I think we can just rely on the recipe above
(and consider running an extra leak-test job with more TEST-vars set).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

revision: avoid leaking bloom keyvecs with multiple traversals

In prepare_revision_walk(), we convert the pruning pathspecs into
bloom-filter "keyvecs" via prepare_to_use_bloom_filter(). This allocates
memory which is then freed eventually by release_revisions(), via
release_revisions_bloom_keyvecs().

But there's one case where we leak. If a caller uses the same rev_info
for multiple walks, calling prepare_revision_walk() multiple times, then
subsequent calls will overwrite the earlier keyvecs, leaking them. This
can happen with "git show foo bar", which does a separate no-walk
traversal for "foo" and "bar". Building with SANITIZE=leak and running
the test suite like:

  GIT_TEST_COMMIT_GRAPH=1 \
  GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1 \
  ./t4013-diff-various.sh

will trigger a complaint from LSan. It does not happen without those
extra flags because we don't store on-disk bloom filters by default, and
thus we optimize out the keyvec computation.

We can fix the leak by discarding the old entries before generating new
ones.

There's an alternative fix, which is that prepare_to_use_bloom_filter()
could notice that we already have keyvec entries and just reuse them.
But this is less safe; the keyvec depends on the pruning pathspec, and
we don't know if that has changed.

I think it would _probably_ work in practice, since any caller using a
rev_info for multiple traversals is probably doing so with the same
pathspec. But it would also create a very subtle bug if that assumption
is violated. So we'll do the safer thing here, and generate fresh keyvec
entries for each traversal. The efficiency difference is probably not
noticeable, and this is what was happening already (we just weren't
bothering to free the old ones!).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

bloom: make bloom-filter slab initialization idempotent

Before using any of the commit-graph bloom-filter code, somebody needs
to call init_bloom_filters(). This initializes the commit-slab we use
for storing filter information. But we don't want to call it twice
(without a matching deinit call in the middle), since it overwrites the
existing slab pointers, leaking the old values.

Usually this init call is done lazily by parse_commit_graph() when we
read a graph file that contains bloom data. But this can lead to some
oddities:

  1. We may call parse_commit_graph() multiple times when we have a
     split commit graph. I think this doesn't produce any user-visible
     bug, because we parse all of the files back-to-back. So even though
     we call init_bloom_filters() multiple times, we never look up any
     commits in between, so the slab is always empty and initializing it
     again happens to do nothing. This is a little sketchy to rely on,
     though.

  2. We call init_bloom_filters() directly in the "test-tool bloom"
     helper so we can call get_or_compute_bloom_filter(). Normally this
     is OK, as there is no bloom data in the on-disk graph file. But if
     you build with SANITIZE=leak and run:

       GIT_TEST_COMMIT_GRAPH=1 \
       GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1 \
       ./t0095-bloom.sh

     there's a leak that happens like this:

       a. Our direct init_bloom_filters() sets up the slab.

       b. In get_or_compute_bloom_filter() we look in the slab for a
          cached entry. We won't find anything yet, but since we don't
          use the read-only "peek" accessor (since we'll fill in the
          entry if not present), this actually populates the slab with
          an allocated chunk.

       c. Now we look for an entry in the graph files. So we have to
          load them and end up in parse_commit_graph(), which calls
          init_bloom_filters() again. That trashes our existing slab
          allocation, which is now leaked.

  3. There's a similar case in write_commit_graph(), which calls
     init_bloom_filters() before get_or_compute_bloom_filter(). I think
     this code path is lucky to avoid the leak because it reads the
     graph files first, then calls its init_bloom_filters(), and then
     starts filling in entries. So even though it has the same overwrite
     problem, we'd never actually allocate any slab entries between
     overwrites.

The easiest solution here is just to make initialization of the slab
idempotent using an extra flag.
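
The shape of such a flag-guarded init can be sketched as below. This is
an illustration under assumptions, not the patch itself: filter_slab,
init_filters(), deinit_filters() and touch_slab() are hypothetical
names, and the single "chunk" pointer stands in for the commit-slab's
real storage.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the commit-slab holding filter data. */
struct filter_slab {
	void *chunk;	/* lazily allocated storage, or NULL */
};

static struct filter_slab filters;
static int filters_initialized;

/* Idempotent: a second call without a deinit in between is a no-op. */
static void init_filters(void)
{
	if (filters_initialized)
		return;
	filters.chunk = NULL;
	filters_initialized = 1;
}

/* Release the storage and allow a later init to start over. */
static void deinit_filters(void)
{
	free(filters.chunk);
	filters.chunk = NULL;
	filters_initialized = 0;
}

/* Mimics a non-"peek" lookup: allocates the chunk on first access. */
static void *touch_slab(void)
{
	if (!filters.chunk)
		filters.chunk = calloc(1, 64);
	return filters.chunk;
}
```

With the guard in place, the sequence init, touch, init keeps the chunk
from the first touch reachable, so nothing is leaked by the second init.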

We could actually get away without using the extra flag, for example by
checking whether bloom_filters.stride has been set. But it's probably
better to avoid being too intimate with the commit-slab details.
Likewise we don't actually need to re-initialize after a deinit call;
the slab-clearing function leaves things in a usable state. But it
seemed less surprising to pair the init/deinit calls explicitly.

I suspect this could all be cleaned up a bit more, but it's tricky. The
only function which uses the slab is get_or_compute_bloom_filter(), so
it would be much simpler if it just lazy-initialized the slab itself.
But I think there is a subtle dependency here: we usually only
initialize the slab when we find a graph file that has bloom entries. So
if we were to lose that signal, then even repos without on-disk bloom
data would start trying to populate the slab, wasting memory that will
never get entries filled in from the disk. So we'd need some other way
of signaling "it is worth considering bloom entries at all".

This patch takes a smaller and more direct route to just dealing with
the potential leak issue.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'jk/repo-info-path-keys' into ps/setup-split-discovery-and-setup

* jk/repo-info-path-keys:
  repo: add path.gitdir with absolute and relative suffix formatting
  repo: add path.commondir with absolute and relative suffix formatting
  path: extract format_path() and use in rev-parse

Merge branch 'ps/setup-drop-global-state' into ps/setup-split-discovery-and-setup

* ps/setup-drop-global-state:
  treewide: drop USE_THE_REPOSITORY_VARIABLE
  environment: stop using `the_repository` in `is_bare_repository()`
  environment: split up concerns of `is_bare_repository_cfg`
  builtin/init: stop modifying `is_bare_repository_cfg`
  setup: remove global `git_work_tree_cfg` variable
  builtin/init: simplify logic to configure worktree
  builtin/init: stop modifying global `git_work_tree_cfg` variable

Merge branch 'ps/refs-onbranch-fixes' into ps/setup-split-discovery-and-setup

* ps/refs-onbranch-fixes:
  refs: protect against chicken-and-egg recursion
  refs/reftable: lazy-load configuration to fix chicken-and-egg
  reftable: split up write options
  refs/files: lazy-load configuration to fix chicken-and-egg
  refs: move parsing of "core.logAllRefUpdates" back into ref stores
  repository: free main reference database
  chdir-notify: drop unused `chdir_notify_reparent()`
  refs: unregister reference stores from "chdir_notify"
  setup: don't apply "GIT_REFERENCE_BACKEND" without a repository
  setup: stop applying repository format twice
  setup: inline `check_and_apply_repository_format()`

format-patch: fix leak of rev_info in prepare_bases()

In prepare_bases() we do a custom revision walk, separate from the main
format-patch walk. After we finish, we fail to call release_revisions(),
possibly leaking its contents.

We failed to notice it so far because the revision machinery doesn't
always allocate. But at least one case can trigger the leak: if a commit
graph is present, then the topo-walk allocates revs.topo_walk_info and
some associated data structures. You can see it in the test suite by
running:

  make SANITIZE=leak
  cd t
  GIT_TEST_COMMIT_GRAPH=1 ./t4014-format-patch.sh

which yields many entries like:

  ==git==3687620==ERROR: LeakSanitizer: detected memory leaks
  Direct leak of 200 byte(s) in 1 object(s) allocated from:
      #0 0x7f4ccba185cb in malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:74
      #1 0x55cd452cdd0b in do_xmalloc wrapper.c:55
      #2 0x55cd452cdd9d in xmalloc wrapper.c:76
      #3 0x55cd45255473 in init_topo_walk revision.c:3845
      #4 0x55cd45255bef in prepare_revision_walk revision.c:4017
      #5 0x55cd44ffec40 in prepare_bases builtin/log.c:1872
      #6 0x55cd450010ec in cmd_format_patch builtin/log.c:2439

The un-released rev_info has been there since the code was added in
fa2ab86d18 (format-patch: add '--base' option to record base tree info,
2016-04-26), but back then we didn't even have a way to release rev_info
resources! The actual leak probably started around f0d9cc4196
(revision.c: begin refactoring --topo-order logic, 2018-11-01), but it's
hard to bisect because there were so many other unrelated leaks back
then.

So I'm not sure exactly when the leak started beyond "long ago", but it
is easy-ish to find now (since we've plugged all those other leaks) and
the solution is clear.

I didn't add a new test since we can demonstrate it with the existing
ones, but it does require tweaking a test variable. We might consider
ways to get more automatic leak-checking coverage there, but I think it
should be done outside of this fix.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t: move LSan errors from stdout to stderr

When we find LSan errors, we dump them via "say_color", which goes to
stdout. This is mostly harmless, since stdout and stderr tend to go to
the same place (either the user's terminal, or to the ".out" file with
--verbose-log).

But when running under a TAP harness like prove, they are split and
stdout is interpreted as TAP output. Historically even this was fine, as
the extra lines on stdout would be ignored. But since 389c83025d (t: let
prove fail when parsing invalid TAP output, 2026-06-04) we instruct the
TAP reader to complain, and a leaking test will result in complaints
like this (this is a real leak which we have yet to fix):

  $ GIT_TEST_COMMIT_GRAPH=1 make SANITIZE=leak test
  [...]
  Test Summary Report
  -------------------
  t4014-format-patch.sh (Wstat: 256 (exited 1) Tests: 226 Failed: 30)
    Failed tests:  197-226
    Non-zero exit status: 1
    Parse errors: Unknown TAP token: ""
                  Unknown TAP token: "================================================================="
                  Unknown TAP token: "==git==3693658==ERROR: LeakSanitizer: detected memory leaks"
                  Unknown TAP token: ""
                  Unknown TAP token: "Direct leak of 200 byte(s) in 1 object(s) allocated from:"
  Displayed the first 5 of 1531 TAP syntax errors.
  Re-run prove with the -p option to see them all.

You still see the failing tests, so it's mostly just an annoyance. We
can fix it by redirecting to stderr (actually descriptor 4, which is our
verbose-respecting variant). I confirmed manually that the output still
appears with --verbose-log, and even with a single-test "-i
--verbose-only=197" going to the terminal.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-reach: guard !FIND_ALL early exit with generation ordering check

When paint_down_to_common() falls back to commit-date ordering (for
v1 commit graphs without corrected commit dates), the !FIND_ALL early
exit incorrectly fires. The exit assumes the queue is generation-
ordered, so the first RESULT commit found must be the shallowest.
With date ordering this is not guaranteed: a closer merge base with
a lower committer date (clock skew) may still be in the queue behind
deeper commits.

Add a gen_ordered flag that is cleared when the date fallback fires,
and require it for the early exit.
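
Reduced to its condition, the guard might look like the sketch below.
The struct and function names are hypothetical and only model the
decision; the real queue handling in paint_down_to_common() is not
reproduced here.

```c
#include <stdbool.h>

/* Hypothetical view of the state the early exit depends on. */
struct sketch_walk {
	bool find_all;		/* caller wants every merge base */
	bool gen_ordered;	/* queue is ordered by generation number */
};

/*
 * The first RESULT commit may end the walk only when the caller does
 * not want all results and the queue ordering guarantees that nothing
 * closer is still waiting behind it.
 */
static bool may_exit_early(const struct sketch_walk *w, bool found_result)
{
	return found_result && !w->find_all && w->gen_ordered;
}
```

Under the date fallback gen_ordered is false, so the walk keeps going
even after the first RESULT commit is seen.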

Update the test from the previous commit to test_expect_success.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t6600: add test for merge-base early exit with clock skew

Add a topology where the correct merge base (M2) has a lower
committer date than its ancestor (M1) due to clock skew. With a
v1 commit graph (topological levels only, no corrected commit
dates), paint_down_to_common() falls back to commit-date ordering.
In that mode, M1 pops before M2, acquires both paint sides, and
the !FIND_ALL early exit fires -- returning the wrong merge base.

Mark the test as test_expect_failure to document the bug; the next
commit will fix it.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

history: streamline message preparation and plug file stream leak

An early part of the fill_commit_message() function uses
write_file_buf() to write out what was prepared in a strbuf. That
helper is primarily meant for callers that have their message fully
prepared and call it as the last thing to flush it to the destination
file.

However, the function then opens a file stream in append mode to
further write into it. It may have been understandable if this was
a later addition, but it seems it came from a single commit,
d205234c (builtin/history: implement "reword" subcommand,
2026-01-13), which is somewhat puzzling, but anyway...

Just open the file stream upfront for writing, write the message
the function has in the strbuf, and then keep writing whatever it
wants to write to the same open file stream.

And do not forget to close the stream. We are about to pass the
resulting file to an external editor, and on some systems, notably
Windows, you are not supposed to keep a file open while expecting
another program to access it.
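
In plain stdio terms the intended flow is roughly the following sketch.
write_message_file() and its parameters are hypothetical; the actual
function works on a strbuf and writes more than two pieces.

```c
#include <stdio.h>

/*
 * Write the prepared message and any extra text through one stream,
 * then close it before anything else is expected to open the file.
 * Returns 0 on success, -1 on any I/O error.
 */
static int write_message_file(const char *path, const char *message,
			      const char *extra)
{
	FILE *fp = fopen(path, "w");
	int err = 0;

	if (!fp)
		return -1;
	if (fputs(message, fp) == EOF)
		err = -1;
	if (!err && extra && fputs(extra, fp) == EOF)
		err = -1;
	/* fclose() flushes; a failure here means data may be missing. */
	if (fclose(fp) == EOF)
		err = -1;
	return err;
}
```

The stream is closed on every path after a successful fopen(), so the
file is no longer held open by the time an editor would be launched.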

Diagnosed-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Git 2.55

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'jk/t5551-expensive-test-timeouts-fix'

The Apache timeout in HTTP tests has been increased to prevent test
failures on heavily loaded CI runners. The tests creating an
enormous number of refs have been isolated to their own repositories
to avoid slowing down subsequent tests.

* jk/t5551-expensive-test-timeouts-fix:
  t5551: put many-tags case into its own repo
  t/lib-httpd: bump apache timeout

t5551: put many-tags case into its own repo

Most of the t5551 http fetch tests use a handful of refs. But there are
a few test cases which check our handling of large numbers of refs.
These tests use the same server-side repo, so all subsequent tests end
up having to consider those extra refs, too.

The result is that the test script is a bit slower than it needs to be.
In a normal run, moving the "2,000 tags" test into its own repo drops my
runtime for the whole script from ~2.7s to ~1.9s.

This is a modest gain, but when we add the "--long" flag it gets much
bigger. There we trigger a test (marked with EXPENSIVE) that adds
100,000 tags, and the script runtime jumps to ~95s. But if we use the
same "many tags" repo for that, our runtime drops to just ~37s.

This is a pretty easy win to drop the cost of the script. It may even be
a larger gain on a heavily loaded system, since one of the main costs
here is unpacked refs, which are heavy on system time and I/O costs.

It's possible we are reducing test coverage, since all of those other
tests were inadvertently using large ref advertisements (and thus could
have uncovered some unexpected interaction). But that seems somewhat
unlikely; the tests targeted at the large number of refs are doing
roughly similar things to the other tests.

Note that the real performance culprit is the 100k-tag --long test, not
the 2k-tag one. So we could just let the 100k one use its own repo, and
keep the 2k tags in the main repo. But since these two tests are
somewhat interlinked, it's easier to just move them both (and it does
provide a small gain even for the 2000-tag test). I also notice that the
2000-tag test is gated on the CMDLINE_LIMIT prereq, and without that the
later EXPENSIVE test will fail (since we won't have a too-many-refs
clone). Nobody seems to have noticed or complained after many years, and
I left it alone for this patch.

Signed-off-by: Jeff King <peff@peff.net>
[jc: made the new "many-tags.git" bare to match the original "repo.git"]
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'js/http-https-proxy-fix'

We lost the ability to use https:// proxies during this cycle; this is
a hotfix for the regression.

* js/http-https-proxy-fix:
  http: accept https:// proxies again

reftable: fix unlikely leak on API error

If the reftable writer sees a bogus block size, we return with
REFTABLE_API_ERROR, leaking the reftable_writer struct we previously
allocated. Originally this case was a BUG(), but it became a regular
return in 445f9f4f35 (reftable: stop using `BUG()` in trivial cases,
2025-02-18).

We could obviously fix it by calling "reftable_free(wp)". But we can
observe that we never use the allocated "wp" until after we've validated
the input options. So let's just bump the allocation down. That fixes
the leak, and I think makes the flow of the function more logical
(we validate our inputs before doing any work).
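
The "validate, then allocate" ordering can be sketched like this. All
names and constants here (sketch_writer_new(), SKETCH_API_ERROR,
SKETCH_MAX_BLOCK_SIZE and so on) are hypothetical and chosen for
illustration; they are not the reftable library's definitions.

```c
#include <stdlib.h>

#define SKETCH_API_ERROR (-1)		/* hypothetical error codes */
#define SKETCH_OOM_ERROR (-2)
#define SKETCH_MAX_BLOCK_SIZE (1u << 24)	/* assumed limit */

struct sketch_opts {
	unsigned block_size;
};

struct sketch_writer {
	struct sketch_opts opts;
};

static int sketch_writer_new(struct sketch_writer **out,
			     const struct sketch_opts *opts)
{
	struct sketch_writer *w;

	/* Nothing is allocated yet, so this early return cannot leak. */
	if (opts->block_size >= SKETCH_MAX_BLOCK_SIZE)
		return SKETCH_API_ERROR;

	w = calloc(1, sizeof(*w));
	if (!w)
		return SKETCH_OOM_ERROR;
	w->opts = *opts;
	*out = w;
	return 0;
}
```

Because the check comes first, the error path needs no cleanup, and the
caller's pointer is left untouched when validation fails.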

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t/lib-httpd: bump apache timeout

Since enabling more tests with 7a094d68a2 (ci: run expensive tests on
push builds to integration branches, 2026-05-08), we sometimes see test
failures or timeouts in GitHub CI. The culprit seems to be the "enormous
ref negotiation" test in t5551, which creates ~100k tag refs in our http
server-side repo.

Iterating through the loose refs of this repo to generate a ref
advertisement can take a long time, especially on a platform with slow
I/O. On my otherwise unloaded local machine, a cold cache ref
advertisement takes ~10s. On a busy CI machine running tests in
parallel, it can presumably top 60s, which runs afoul of Apache's
default CGI timeout.

The result in t5551 is a test failure, where Apache simply hangs up the
connection and the client reports an error. But worse, t5559 runs the
same test with HTTP/2, and a bug in Apache causes the connection to hang
indefinitely! We eventually see this as a CI timeout after 6 hours.

Let's bump Apache's timeout to something much larger: 600 seconds. This
doesn't eliminate the possibility of a timeout, but it makes it much
less likely. It should eliminate both the test failures and the CI
timeouts in practice, and it protects us from running into similar
problems with other tests in the future.

There are two counter-arguments to consider.

One, could/should we just make the test faster? Probably yes. The
biggest mistake here is having such an absurd number of unpacked refs on
a system which is bottle-necked on I/O. But I think it's worth bumping
the timeout so that we can fix this (and possibly other) correctness
issues, and then consider performance separately (which we'll do in
subsequent patches).

And two, is this just papering over a problem that users might see in
the real world? We could teach Git to handle this case more gracefully
with optimizations or keep-alives. But I think it's really an artificial
situation. You need a combination of this silly number of loose refs,
plus a very heavily loaded system. If you were trying to run a real
server and it took more than 60s to generate the ref advertisement, I
don't think the timeout is your biggest problem. Your crappy service is,
and you should adjust your resources to match your load. I.e., it is
probably reasonable for Git to assume that advertisements happen
fast-ish and don't need protocol-level keepalives.

Though the patch here is small, tons of work went into analyzing the
problem. Many thanks to the contributors credited below.

Helped-by: Michael Montalbo <mmontalbo@gmail.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

http: accept https:// proxies again

Since 663d7abe07ea (http: reject unsupported proxy URL schemes,
2026-05-05), set_curl_proxy_type() returns 0 only for the "http"
and SOCKS variants via dedicated early returns, and -1 for
everything else. The "https" branch configures the CURL handle for
HTTPS proxying but then falls through to the trailing `return -1`
intended for unknown schemes, so the caller in get_curl_handle()
treats a perfectly valid https:// proxy URL as unsupported and
refuses to use it.
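
The control flow in question, stripped of all libcurl calls, looks
roughly like the sketch below. The enum and sketch_set_proxy_type()
are hypothetical; only the placement of the `return 0` in the "https"
branch mirrors the fix described above.

```c
#include <string.h>

enum sketch_proxy {
	SKETCH_PROXY_NONE,
	SKETCH_PROXY_HTTP,
	SKETCH_PROXY_HTTPS,
	SKETCH_PROXY_SOCKS5
};

/* Returns 0 for a supported scheme, -1 for anything else. */
static int sketch_set_proxy_type(const char *scheme, enum sketch_proxy *type)
{
	if (!strcmp(scheme, "http")) {
		*type = SKETCH_PROXY_HTTP;
		return 0;
	}
	if (!strcmp(scheme, "socks5")) {
		*type = SKETCH_PROXY_SOCKS5;
		return 0;
	}
	if (!strcmp(scheme, "https")) {
		*type = SKETCH_PROXY_HTTPS;
		/* TLS-related proxy setup would go here. */
		return 0;	/* without this we fall through to -1 */
	}
	return -1;	/* unknown scheme */
}
```

Dropping the `return 0` from the "https" branch reproduces the
regression: the type is set correctly, yet the caller is told the
scheme is unsupported.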

Noticed while looking into a Coverity report against the same
function; the unchecked curl_easy_setopt() return values it flags
are orthogonal to this fix.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge tag 'l10n-2.55.0-v1' of https://github.com/git-l10n/git-po

l10n-2.55.0-v1

* tag 'l10n-2.55.0-v1' of https://github.com/git-l10n/git-po:
  l10n: zh-TW.po: Update Chinese (Traditional) translation
  l10n: uk: add 2.55 translation
  l10n: ga.po: update for Git 2.55
  l10n: fr: mass fix of typos
  l10n: fr: version 2.55
  l10n: po-id for 2.55
  l10n: AGENTS.md: add quotation mark preservation guidelines
  l10n: zh_CN: updated translation for 2.55
  l10n: TEAMS: change Simplified Chinese team leader
  l10n: sv.po: Update Swedish translation
  l10n: ca.po: update Catalan translation
  l10n: tr: Update Turkish translations
  l10n: bg.po: Updated Bulgarian translation (6322t)
  l10n: it: fix italian usage messages alignment

Merge branch '2.55-uk-pr' of github.com:arkid15r/git-ukrainian-l10n

* '2.55-uk-pr' of github.com:arkid15r/git-ukrainian-l10n:
  l10n: uk: add 2.55 translation

Merge branch 'l10n-ga-2.55' of github.com:aindriu80/git-po

* 'l10n-ga-2.55' of github.com:aindriu80/git-po:
  l10n: ga.po: update for Git 2.55

Merge branch 'l10n/zh-TW/2026-06-26' of github.com:l10n-tw/git-po

* 'l10n/zh-TW/2026-06-26' of github.com:l10n-tw/git-po:
  l10n: zh-TW.po: Update Chinese (Traditional) translation

Merge branch 'ca-20260624-b' of github.com:Softcatala/git-po

* 'ca-20260624-b' of github.com:Softcatala/git-po:
  l10n: ca.po: update Catalan translation

Merge branch 'zh_CN-2.55' of github.com:lilydjwg/git-po

* 'zh_CN-2.55' of github.com:lilydjwg/git-po:
  l10n: zh_CN: updated translation for 2.55
  l10n: TEAMS: change Simplified Chinese team leader

Merge branch 'tr-l10n' of github.com:bitigchi/git-po

* 'tr-l10n' of github.com:bitigchi/git-po:
  l10n: tr: Update Turkish translations

Merge branch 'po-id' of github.com:bagasme/git-po

* 'po-id' of github.com:bagasme/git-po:
  l10n: po-id for 2.55

Merge branch 'master' of github.com:alshopov/git-po

* 'master' of github.com:alshopov/git-po:
  l10n: bg.po: Updated Bulgarian translation (6322t)

Merge branch 'fr_v2.55' of github.com:jnavila/git

* 'fr_v2.55' of github.com:jnavila/git:
  l10n: fr: mass fix of typos
  l10n: fr: version 2.55

Merge branch 'master' of github.com:nafmo/git-l10n-sv

* 'master' of github.com:nafmo/git-l10n-sv:
  l10n: sv.po: Update Swedish translation

l10n: zh-TW.po: Update Chinese (Traditional) translation

Signed-off-by: Yi-Jyun Pan <pan93412@gmail.com>

push: suggest <remote> <branch> for a slash slip

When pushing the 'main' branch to the remote 'origin', i.e.,

    $ git push origin main

it is easy to mistakenly write

    $ git push origin/main

That is parsed as the repository to push to, and since 'origin/main'
is neither a configured remote nor a path it dies with:

    fatal: 'origin/main' does not appear to be a git repository

Often 'origin/main' does not exist as a repository, so the command
fails without doing any harm, but it gives no hint that a space was
meant instead of a slash and can leave the user puzzled.

When the argument is not an existing path or configured remote but
its part before the first slash names one, suggest the intended
'<remote> <branch>' form:

    $ git push origin main

The suggestion is shown as advice so it can be silenced with
advice.pushRepoLooksLikeRef.
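
The check on "the part before the first slash" could be sketched as
follows. The helper names and the explicit array of remote names are
assumptions made for a self-contained example; the sketch also assumes
the caller has already established that the whole argument is neither
an existing path nor a configured remote.

```c
#include <stddef.h>
#include <string.h>

/* Does the first "len" bytes of "name" match one of the remotes? */
static int sketch_is_remote(const char *name, size_t len,
			    const char *const *remotes, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (strlen(remotes[i]) == len && !strncmp(remotes[i], name, len))
			return 1;
	return 0;
}

/*
 * If "arg" has the form "<remote>/<rest>" with a known remote, return
 * a pointer to "<rest>" and store the remote's length in *remote_len.
 * Otherwise return NULL.
 */
static const char *sketch_slash_slip(const char *arg,
				     const char *const *remotes, size_t nr,
				     size_t *remote_len)
{
	const char *slash = strchr(arg, '/');

	if (!slash || slash == arg || !slash[1])
		return NULL;
	if (!sketch_is_remote(arg, (size_t)(slash - arg), remotes, nr))
		return NULL;
	*remote_len = (size_t)(slash - arg);
	return slash + 1;
}
```

Only the first slash is considered, so "origin/feature/x" yields the
remote "origin" and the branch "feature/x".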

Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

branch: suggest <remote>/<branch> on upstream slip

When setting the upstream of the current branch to the 'main' branch
of the remote 'origin', i.e.,

    $ git branch --set-upstream-to origin/main

it is easy to mistakenly write

    $ git branch --set-upstream-to origin main

That is parsed as a request to set the upstream of the local branch
'main' to 'origin'. When 'main' does not exist, the command dies
with:

    fatal: branch 'main' does not exist

pointing at a branch the user never meant to name. When 'main' does
exist, it instead dies with:

    fatal: the requested upstream branch 'origin' does not exist

leaving the user equally puzzled.

When the operated-on branch is missing and '<remote>/<branch>' names
a real remote-tracking ref, suggest the intended form:

    $ git branch --set-upstream-to=origin/main

The suggestion is gated on '<remote>/<branch>' existing so it only
appears when a slipped slash is the likely explanation.

Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t3420-rebase-autostash: don't try to grep non-existing files

Several tests in 't3420-rebase-autostash.sh' start various rebase
processes that are expected to fail because of merge conflicts.  The
tests [1] checking that 'git rebase --quit' and autostash work
together as expected after such a failure then run '! grep ...' to
ensure that the dirty contents of the file are gone.  However, due to
the test repo's history and the choice of upstream branch that file
shouldn't exist in the conflicted state at all, and thus it shouldn't
exist after the subsequent 'git rebase --quit' either.  Consequently,
this 'grep' doesn't fail as expected, i.e. because it can't find the
dirty content, but instead it fails, because it can't open the file.

Tighten this check by using 'test_path_is_missing' instead, thereby
avoiding unexpected errors from 'grep' as well.

Previously 2745817028 (t3420-rebase-autostash: don't try to grep
non-existing files, 2018-08-22) fixed a couple of similar issues; this
one was added later in 9b2df3e8d0 (rebase: save autostash entry into
stash reflog on --quit, 2020-04-28).

[1] This patch modifies only a single test, but that test is run
    several times with different strategies ('--apply', '--merge', and
    '--interactive'), hence the plural "tests".

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

l10n: uk: add 2.55 translation

Co-authored-by: Kate Golovanova <kate@kgthreads.com>
Signed-off-by: Arkadii Yakovets <ark@cho.red>
Signed-off-by: Kate Golovanova <kate@kgthreads.com>

l10n: ga.po: update for Git 2.55

Signed-off-by: Aindriú Mac Giolla Eoin <aindriu80@gmail.com>

l10n: fr: mass fix of typos

Helped-by: Kévin Leprêtre <k.lepretre@houseofhr.onmicrosoft.com>
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>

l10n: fr: version 2.55

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>

l10n: po-id for 2.55

Update following components:

  * add-patch.c
  * apply.c
  * bisect.c
  * builtin/add.c
  * builtin/backfill.c
  * builtin/bisect.c
  * builtin/cat-file.c
  * builtin/checkout.c
  * builtin/config.c
  * builtin/fast-import.c
  * builtin/fetch.c
  * builtin/fsmonitor--daemon.c
  * builtin/hook.c
  * builtin/index-pack.c
  * builtin/interpret-trailers.c
  * builtin/last-modified.c
  * builtin/log.c
  * builtin/multi-pack-index.c
  * builtin/name-rev.c
  * builtin/pack-objects.c
  * builtin/push.c
  * builtin/repack.c
  * builtin/replay.c
  * builtin/repo.c
  * builtin/show-index.c
  * builtin/stash.c
  * builtin/submodule--helper.c
  * builtin/worktree.c
  * command-list.h
  * diff.c
  * fetch-pack.c
  * hook.c
  * list-objects-filter-options.c
  * lockfile.c
  * midx-write.c
  * midx.c
  * object-file.c
  * object.c
  * packfile.c
  * path-walk.c
  * pretty.c
  * promisor-remote.c
  * pseudo-merge.c
  * read-cache.c
  * refs.c
  * remote-curl.c
  * repack-midx.c
  * replay.c
  * repository.c
  * revision.c
  * sequencer.c
  * setup.c
  * submodule.c
  * t/helper/test-path-walk.c
  * t/helper/test-read-midx.c
  * trailer.c
  * git-send-email.perl

Translate following new components:

  * builtin/history.c
  * builtin/url-parse.c
  * compat/fsmonitor/fsm-listen-linux.c
  * sideband.c
  * t/helper/test-synthesize.c

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>

refs: protect against chicken-and-egg recursion

In the preceding commits we have fixed recursion when creating the
reference backends due to a chicken-and-egg situation with "onbranch"
conditions. Unfortunately, this issue has existed for a while, and we
didn't really have a good mechanism to detect this recursion.

Improve the status quo by detecting the recursion when creating the main
reference store.
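
One common way to detect this kind of re-entry is an "in progress"
flag, sketched below. This is an assumption about the general
technique, not a description of the actual patch; sketch_repo,
sketch_get_store() and the config callbacks are hypothetical, and the
sketch returns an error where real code might prefer to abort.

```c
/* Hypothetical repository state relevant to store creation. */
struct sketch_repo {
	int store_ready;	/* main store has been created */
	int creating_store;	/* creation is currently in progress */
};

typedef int (*sketch_config_fn)(struct sketch_repo *);

/* Returns 0 once the store exists, -1 if creation re-entered itself. */
static int sketch_get_store(struct sketch_repo *r, sketch_config_fn read_config)
{
	int ret;

	if (r->store_ready)
		return 0;
	if (r->creating_store)
		return -1;	/* recursion detected */
	r->creating_store = 1;
	ret = read_config ? read_config(r) : 0;
	r->creating_store = 0;
	if (ret < 0)
		return ret;
	r->store_ready = 1;
	return 0;
}

/* A config reader that needs the store, as an "onbranch" check would. */
static int sketch_config_needing_store(struct sketch_repo *r)
{
	return sketch_get_store(r, sketch_config_needing_store);
}

/* A config reader with no dependency on the store. */
static int sketch_config_plain(struct sketch_repo *r)
{
	(void)r;
	return 0;
}
```

The flag turns unbounded recursion into an immediate, diagnosable
failure at the point where the store is requested a second time.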

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/reftable: lazy-load configuration to fix chicken-and-egg

Same as with the "files" backend, the "reftable" backend also has a
chicken-and-egg problem with "onbranch" conditions. Fix this issue the
same way as we did with the "files" backend by lazy-loading configuration.

Now that both the "files" and the "reftable" backend handle this
properly, add a generic test to t1400 that verifies that the user can
configure "core.logAllRefUpdates" via an "onbranch" condition. This is
mostly a nonsensical thing to do in the first place, but it serves as a
good sanity check.

Note that we had to move `should_write_log()` around so that it can
access the new `reftable_be_write_options()` function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable: split up write options

When initializing the reftable stack the caller may optionally pass some
write options. These write options mix up two different concerns though:

  - Of course, they allow the caller to configure how new reftables are
    being written.

  - But they also allow the caller to configure the stack itself, like
    its hash ID and the `on_reload` callback.

This is somewhat awkward, as it doesn't easily give the caller the
flexibility to for example write multiple reftables with different
options. Furthermore, this requires us to eagerly parse relevant
configuration when initializing the reftable backend.

Refactor the code by splitting out those options that configure the
stack itself. Creating a new stack will thus only require this limited
set of options, whereas the caller is expected to pass write options to
all functions that end up writing tables.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/files: lazy-load configuration to fix chicken-and-egg

When initializing the "files" reference backend we read the repository's
config to parse "core.preferSymlinkRefs" and "core.logAllRefUpdates".
This results in a chicken-and-egg problem though, because parsing the
configuration may require us to have access to the reference store
already when an "onbranch" condition exists.

Luckily, all the configuration that we honor only relates to writing
references. Consequently, we don't strictly need that configuration to
be readily available at initialization time, and we can easily defer
parsing it to a later point in time.
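
Deferring the parse to first use can be sketched as below. The struct,
the accessor and the counting reader are hypothetical stand-ins; the
real backend reads its settings through Git's config API, which is not
modelled here.

```c
/* Counts how often the stand-in config reader below was invoked. */
static int sketch_reads;

/* Stand-in for a config lookup; always answers "true". */
static int sketch_read_bool(const char *key)
{
	(void)key;
	sketch_reads++;
	return 1;
}

struct sketch_store {
	int config_loaded;		/* has the config been parsed yet? */
	int prefer_symlink_refs;	/* cached value once loaded */
	int (*read_bool)(const char *key);
};

/* Initialization records how to read config but does not read it. */
static void sketch_store_init(struct sketch_store *s,
			      int (*read_bool)(const char *key))
{
	s->config_loaded = 0;
	s->prefer_symlink_refs = 0;
	s->read_bool = read_bool;
}

/* The first caller on a write path triggers the actual parse. */
static int sketch_prefer_symlinks(struct sketch_store *s)
{
	if (!s->config_loaded) {
		s->prefer_symlink_refs = s->read_bool("core.preferSymlinkRefs");
		s->config_loaded = 1;
	}
	return s->prefer_symlink_refs;
}
```

Since sketch_store_init() never touches the reader, creating the store
cannot trigger config parsing, which is what breaks the cycle.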

Implement this fix and add tests that verify that we can indeed properly
parse these config knobs via an "onbranch" condition.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs: move parsing of "core.logAllRefUpdates" back into ref stores

In cc42c88945 (refs: extract out reflog config to generic layer,
2026-05-04) we have refactored how we parse "core.logAllRefUpdates" so
that it happens in the generic layer. Unfortunately, this has worsened a
preexisting issue where we may recurse when creating the reference store
because of a chicken-and-egg problem between parsing the configuration
and evaluating "onbranch" conditions.

Prepare for a fix by essentially reverting that change so that we handle
this setting in the respective backends again. The backends are already
parsing other configuration anyway, so by moving the logic back in there
we can ensure that all backend configuration is parsed the same way.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

repository: free main reference database

While we release worktree and submodule reference databases when
clearing a repository, we don't ever release the main reference
database. This memory leak went unnoticed because its pointer is
kept alive by the "chdir_notify" subsystem.

Fix the memory leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

chdir-notify: drop unused `chdir_notify_reparent()`

With the preceding commit we've removed all callers of
`chdir_notify_reparent()`, so the function is unused now. Drop it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs: unregister reference stores from "chdir_notify"

When creating reference stores we register them with the "chdir_notify"
subsystem. This is required because some of the paths we track may be
relative paths, so we have to reparent them in case the current working
directory changes.

But while we register the reference stores, we never unregister them.
This can have multiple outcomes:

  - For a repository's main reference database we essentially keep the
    pointer alive. We never free that database, either, and our leak
    checker doesn't notice because it's still registered.

  - For submodule and worktree reference databases we do eventually free
    them in `repo_clear()`, so we may keep pointers to free'd memory
    registered. We never notice though as we don't tend to chdir around
    in the middle of the process.

We never noticed either of these symptoms, but they are obviously bad.

Partially fix those issues by unregistering the reference stores when
releasing them. The leak of the main reference database will be fixed in
a subsequent commit.

Note that this requires us to use `chdir_notify_register()` instead of
`chdir_notify_reparent()`, as there is no infrastructure to unregister the
latter.
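
The register/unregister pairing can be pictured with the small registry
sketched below. It is a simplified model with hypothetical names and a
fixed-size table, not the chdir-notify implementation; entries are
identified by their callback and data pointer.

```c
#include <stddef.h>

typedef void (*sketch_chdir_cb)(void *data);

struct sketch_entry {
	sketch_chdir_cb cb;
	void *data;
};

#define SKETCH_MAX_ENTRIES 16
static struct sketch_entry sketch_entries[SKETCH_MAX_ENTRIES];
static size_t sketch_nr;

/* Returns 0 on success, -1 when the table is full. */
static int sketch_register(sketch_chdir_cb cb, void *data)
{
	if (sketch_nr == SKETCH_MAX_ENTRIES)
		return -1;
	sketch_entries[sketch_nr].cb = cb;
	sketch_entries[sketch_nr].data = data;
	sketch_nr++;
	return 0;
}

/* Drop every entry matching both the callback and its data pointer. */
static void sketch_unregister(sketch_chdir_cb cb, void *data)
{
	size_t i, kept = 0;

	for (i = 0; i < sketch_nr; i++) {
		if (sketch_entries[i].cb == cb && sketch_entries[i].data == data)
			continue;
		sketch_entries[kept++] = sketch_entries[i];
	}
	sketch_nr = kept;
}

/* Invoke every registered callback, as a directory change would. */
static void sketch_notify(void)
{
	size_t i;

	for (i = 0; i < sketch_nr; i++)
		sketch_entries[i].cb(sketch_entries[i].data);
}

/* Example callback: count notifications in the int "data" points at. */
static void sketch_count(void *data)
{
	++*(int *)data;
}
```

An owner that unregisters before freeing its data guarantees that a
later notification never dereferences a stale pointer.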

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

setup: don't apply "GIT_REFERENCE_BACKEND" without a repository

When discovering a repository we eventually also apply the
"GIT_REFERENCE_BACKEND" environment variable to the repository. There
are two problems with that:

  - We do this unconditionally, which is rather pointless: we really
    only have to configure the repository when we have found one.

  - We have already applied the repository format at that point in time,
    so we need to manually reapply it.

Move the logic around so that we only apply the environment variable
when a repository was discovered. This also allows us to drop the
explicit call to `repo_set_ref_storage_format()` because we now adjust
the format before we apply it via `apply_repository_format()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

setup: stop applying repository format twice

When discovering the repository in "setup.c" we apply the final
repository format multiple times:

  - Once via `repository_format_configure()`, where we apply the hash
    algorithm and ref storage format to both `struct repository_format`
    and `struct repository`.

  - And once via `apply_repository_format()`, where we apply these two
    settings from `struct repository_format` to `struct repository`.

With the current flow both of these are in fact necessary. But this is
only because we call `repository_format_configure()` after we have
called `apply_repository_format()`. Consequently, if we only changed the
repository format in `repository_format_configure()` it would never
propagate to the repository.

Refactor the code so that we first configure the repository format
before applying it to the repository so that we can stop setting the
hash and reference storage format multiple times.
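
The ordering argument can be sketched in isolation. The following is an
illustrative model only, not the code in "setup.c": all `toy_*` types
and functions are hypothetical, and the override parameter merely
stands in for whatever adjusts the format during configuration. It
shows that once the configure step runs first and touches only the
format, a single apply step is enough for the change to reach the
repository.

```c
/* Hypothetical ref storage identifiers for this sketch. */
enum toy_ref_storage {
	TOY_REF_UNSET = 0,
	TOY_REF_FILES = 1,
	TOY_REF_REFTABLE = 2
};

/* Toy counterpart of a parsed repository format. */
struct toy_format {
	int hash_algo;
	enum toy_ref_storage ref_storage;
};

/* Toy counterpart of the repository the format gets applied to. */
struct toy_repo {
	int hash_algo;
	enum toy_ref_storage ref_storage;
	int apply_count; /* how often settings were copied into the repo */
};

/* Adjust only the format; the repository is deliberately not touched. */
static void toy_format_configure(struct toy_format *fmt,
				 enum toy_ref_storage override)
{
	if (override != TOY_REF_UNSET)
		fmt->ref_storage = override;
}

/* Copy both settings from the format into the repository. */
static void toy_apply_format(struct toy_repo *repo,
			     const struct toy_format *fmt)
{
	repo->hash_algo = fmt->hash_algo;
	repo->ref_storage = fmt->ref_storage;
	repo->apply_count++;
}

/* Configure first, then apply exactly once. */
static void toy_setup(struct toy_repo *repo, struct toy_format *fmt,
		      enum toy_ref_storage override)
{
	toy_format_configure(fmt, override);
	toy_apply_format(repo, fmt);
}
```

If the two calls in `toy_setup()` were swapped, the override would be
recorded in the format but never reach the repository unless the
configure step also wrote to the repository directly, which is the
duplication this commit removes.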

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

setup: inline `check_and_apply_repository_format()`

We have two callsites of `check_and_apply_repository_format()`. In a
subsequent commit we'll want to adapt one of those callsites to change
the order in which we read and apply the repository format, at which
point the helper function will not really be a good fit for us anymore.

Inline the function to both of the callsites.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'ps/setup-centralize-odb-creation' into ps/refs-onbranch-fixes

* ps/setup-centralize-odb-creation:
  setup: construct object database in `apply_repository_format()`
  repository: stop reading loose object map twice on repo init
  setup: stop initializing object database without repository
  setup: stop creating the object database in `setup_git_env()`
  repository: stop initializing the object database in `repo_set_gitdir()`
  setup: deduplicate logic to apply repository format
  setup: drop `setup_git_env()`
  t0001: plug test gaps for git-init(1) with GIT_OBJECT_DIRECTORY

Merge branch 'master' of github.com:mbeniamino/git-po

* 'master' of github.com:mbeniamino/git-po:
  l10n: it: fix italian usage messages alignment

l10n: AGENTS.md: add quotation mark preservation guidelines

Add a "Preserving Quotation Marks" section to prevent AI-assisted
translation and review from incorrectly converting language-specific
UTF-8 curly quotes (e.g., „ U+201E, “ U+201C for Bulgarian) into
ASCII straight quotes " (U+0022), which would cause PO string
truncation and syntax errors.

Also update the "Special characters" item in the Quality checklist
to reference the new section.

Signed-off-by: Jiang Xin <worldhello.net@gmail.com>

l10n: zh_CN: updated translation for 2.55

Reviewed-by: Jiang Xin <worldhello.net@gmail.com>
Reviewed-by: Fangyi Zhou <me@fangyi.io>
Signed-off-by: lilydjwg <lilydjwg@gmail.com>

l10n: TEAMS: change Simplified Chinese team leader

Signed-off-by: lilydjwg <lilydjwg@gmail.com>

Merge branch 'ps/t4216-tap-fix'

TAP output breakage fix.

* ps/t4216-tap-fix:
  t4216: fix no-op test that breaks TAP output