git.ipfire.org Git - thirdparty/git.git/log

Merge branch 'ps/ref-tests-update'

Update ref-related tests.

* ps/ref-tests-update:
  t: mark several tests that assume the files backend with REFFILES
  t7900: assert the absence of refs via git-for-each-ref(1)
  t7300: assert exact states of repo
  t4207: delete replace references via git-update-ref(1)
  t1450: convert tests to remove worktrees via git-worktree(1)
  t: convert tests to not access reflog via the filesystem
  t: convert tests to not access symrefs via the filesystem
  t: convert tests to not write references via the filesystem
  t: allow skipping expected object ID in `ref-store update-ref`

Merge branch 'jw/git-add-attr-pathspec'

"git add" and "git stash" learned to support the ":(attr:...)"
magic pathspec.

* jw/git-add-attr-pathspec:
attr: enable attr pathspec magic for git-add and git-stash

Merge branch 'jk/chunk-bounds-more'

Code clean-up for jk/chunk-bounds topic.

* jk/chunk-bounds-more:
  commit-graph: mark chunk error messages for translation
  commit-graph: drop verify_commit_graph_lite()
  commit-graph: check order while reading fanout chunk
  commit-graph: use fanout value for graph size
  commit-graph: abort as soon as we see a bogus chunk
  commit-graph: clarify missing-chunk error messages
  commit-graph: drop redundant call to "lite" verification
  midx: check consistency of fanout table
  commit-graph: handle overflow in chunk_size checks

Merge branch 'js/ci-discard-prove-state'

The way CI testing used "prove" could lead to running the test
suite twice needlessly, which has been corrected.

* js/ci-discard-prove-state:
ci: avoid running the test suite _twice_

Merge branch 'ps/ci-gitlab'

Add support for GitLab CI.

* ps/ci-gitlab:
  ci: add support for GitLab CI
  ci: install test dependencies for linux-musl
  ci: squelch warnings when testing with unusable Git repo
  ci: unify setup of some environment variables
  ci: split out logic to set up failed test artifacts
  ci: group installation of Docker dependencies
  ci: make grouping setup more generic
  ci: reorder definitions for grouping functions

Merge branch 'js/doc-unit-tests-with-cmake'

Update the base topic to work with CMake builds.

* js/doc-unit-tests-with-cmake:
  cmake: handle also unit tests
  cmake: use test names instead of full paths
  cmake: fix typo in variable name
  artifacts-tar: when including `.dll` files, don't forget the unit-tests
  unit-tests: do show relative file paths
  unit-tests: do not mistake `.pdb` files for being executable
  cmake: also build unit tests

Merge branch 'js/doc-unit-tests'

Process to add some form of low-level unit tests has started.

* js/doc-unit-tests:
  ci: run unit tests in CI
  unit tests: add TAP unit test framework
  unit tests: add a project plan document

Merge branch 'ps/httpd-tests-on-nixos'

Portability tweak.

* ps/httpd-tests-on-nixos:
  t9164: fix inability to find basename(1) in Subversion hooks
  t/lib-httpd: stop using legacy crypt(3) for authentication
  t/lib-httpd: dynamically detect httpd and modules path

Git 2.43

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge tag 'l10n-2.43.0-rnd2' of https://github.com/git-l10n/git-po

l10n-2.43.0-rnd2

* tag 'l10n-2.43.0-rnd2' of https://github.com/git-l10n/git-po:
  l10n: zh-TW: Git 2.43.0-rc1
  l10n: Update German translation
  l10n: bg.po: Updated Bulgarian translation (5579t)
  l10n: zh_CN: for git 2.43.0-rc1
  l10n: Update Catalan translation
  l10n: po-id for 2.43 (round 1)
  l10n: fr: v2.43.0 rnd 2
  l10n: update uk localization for v2.43
  l10n: sv.po: Update Swedish translation (5579t)
  l10n: tr: v2.43.0

Merge branch 'vd/glossary-dereference-peel'

"To dereference" and "to peel" were sometimes used in in-code
comments and documentation but without description in the glossary.

* vd/glossary-dereference-peel:
glossary: add definitions for dereference & peel

Merge branch 'tz/send-email-helpfix'

Typoes in "git send-email -h" have been corrected.

* tz/send-email-helpfix:
send-email: remove stray characters from usage

Merge branch 'l10n/zh-TW/2023-11-19' of github.com:l10n-tw/git-po

* 'l10n/zh-TW/2023-11-19' of github.com:l10n-tw/git-po:
l10n: zh-TW: Git 2.43.0-rc1

l10n: zh-TW: Git 2.43.0-rc1

Signed-off-by: Yi-Jyun Pan <pan93412@gmail.com>

Merge branch 'master' of github.com:alshopov/git-po

* 'master' of github.com:alshopov/git-po:
l10n: bg.po: Updated Bulgarian translation (5579t)

l10n: Update German translation

Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com>

l10n: bg.po: Updated Bulgarian translation (5579t)

Signed-off-by: Alexander Shopov <ash@kambanaria.org>

l10n: zh_CN: for git 2.43.0-rc1

Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>

Merge branch '2.43-uk-update' of github.com:arkid15r

* '2.43-uk-update' of github.com:arkid15r/git-ukrainian-l10n:
l10n: update uk localization for v2.43

Merge branch 'catalan' of github.com:Softcatala/git-po

* 'catalan' of github.com:Softcatala/git-po:
l10n: Update Catalan translation

Merge branch 'tr-l10n' of github.com:bitigchi/git-po

* 'tr-l10n' of github.com:bitigchi/git-po:
l10n: tr: v2.43.0

Merge branch 'fr_v2.43.0' of github.com:jnavila/git

* 'fr_v2.43.0' of github.com:jnavila/git:
l10n: fr: v2.43.0 rnd 2

Merge branch 'po-id' of github.com:bagasme/git-po

* 'po-id' of github.com:bagasme/git-po:
l10n: po-id for 2.43 (round 1)

Merge branch 'master' of github.com:nafmo/git-l10n-sv

* 'master' of github.com:nafmo/git-l10n-sv:
l10n: sv.po: Update Swedish translation (5579t)

send-email: remove stray characters from usage

A few stray single quotes crept into the usage string in a2ce608244
(send-email docs: add format-patch options, 2021-10-25). Remove them.

Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Git 2.43-rc2

Signed-off-by: Junio C Hamano <gitster@pobox.com>

glossary: add definitions for dereference & peel

Add 'gitglossary' definitions for "dereference" (as it used for both symrefs
and objects) and "peel". These terms are used in options and documentation
throughout Git, but they are not clearly defined anywhere and the behavior
they refer to depends heavily on context. Provide explicit definitions to
clarify existing documentation to users and help contributors to use the
most appropriate terminology possible in their additions to Git.

Update other definitions in the glossary that use the term "dereference" to
link to 'def_dereference'.

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: avoid running the test suite _twice_

This is a late amendment of 4a6e4b960263 (CI: remove Travis CI support,
2021-11-23), whereby the `.prove` file (being written by the `prove`
command that is used to run the test suite) is no longer retained
between CI builds: This feature was only ever used in the Travis CI
builds, we tried for a while to do the same in Azure Pipelines CI runs
(but I gave up on it after a while), and we never used that feature in
GitHub Actions (nor does the new GitLab CI code use it).

Retaining the Prove cache has been fragile from the start, even though
the idea seemed good at the time, the idea being that the `.prove` file
caches information about previous `prove` runs (`save`) and uses them
(`slow`) to run the tests in the order from longer-running to shorter
ones, making optimal use of the parallelism implied by `--jobs=<N>`.

However, using a Prove cache can cause some surprising behavior: When
the `prove` caches information about a test script it has run,
subsequent `prove` runs (with `--state=slow`) will run the same test
script again even if said script is not specified on the `prove`
command-line!

So far, this bug did not matter. Right until d8f416bbb87c (ci: run unit
tests in CI, 2023-11-09) did it not matter.

But starting with that commit, we invoke `prove` _twice_ in CI, once to
run the regular test suite of regression test scripts, and once to run
the unit tests. Due to the bug, the second invocation re-runs all of the
tests that were already run as part of the first invocation. This not
only wastes build minutes, it also frequently causes the `osx-*` jobs to
fail because they already take a long time and now are likely to run
into a timeout.

The worst part about it is that there is actually no benefit to keep
running with `--state=slow,save`, ever since we decided no longer to
try to reuse the Prove cache between CI runs.

So let's just drop that Prove option and live happily ever after.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

l10n: Update Catalan translation

Signed-off-by: Jordi Mas <jmas@softcatala.org>

l10n: po-id for 2.43 (round 1)

Update following components:

  * builtin/gc.c
  * builtin/interpret-trailers.c
  * builtin/merge-file.c
  * builtin/show-ref.c
  * builtin/update-index.c
  * chunk-format.c
  * parse-options.c
  * scalar.c

While at it, drop unused strings.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>

RelNotes: minor wording fixes in 2.43.0 release notes

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

l10n: fr: v2.43.0 rnd 2

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>

t9164: fix inability to find basename(1) in Subversion hooks

Hooks executed by Subversion are spawned with an empty environment. By
default, not even variables like PATH will be propagated to them. In
order to ensure that we're still able to find required executables, we
thus write the current PATH variable into the hook script itself and
then re-export it in t9164.

This happens too late in the script though, as we already tried to
execute the basename(1) utility before exporting the PATH variable. This
tends to work on most platforms as the fallback value of PATH for Bash
(see `getconf PATH`) is likely to contain this binary. But on more
exotic platforms like NixOS this is not the case, and thus the test
fails.

While we could work around this issue by simply setting PATH earlier, it
feels fragile to inject a user-controlled value into the script and have
the shell interpret it. Instead, we can refactor the hook setup to write
a `hooks-env` file that configures PATH for us. Like this, Subversion
will know to set up the environment as expected for all hooks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t/lib-httpd: stop using legacy crypt(3) for authentication

When setting up httpd for our tests, we also install a passwd and
proxy-passwd file that contain the test user's credentials. These
credentials currently use crypt(3) as the password encryption schema.

This schema can be considered deprecated nowadays as it is not safe
anymore. Quoting Apache httpd's documentation [1]:

> Unix only. Uses the traditional Unix crypt(3) function with a
> randomly-generated 32-bit salt (only 12 bits used) and the first 8
> characters of the password. Insecure.

This is starting to cause issues in modern Linux distributions. glibc
has deprecated its libcrypt library that used to provide crypt(3) in
favor of the libxcrypt library. This newer replacement provides a
compile time switch to disable insecure password encryption schemata,
which causes crypt(3) to always return `EINVAL`. The end result is that
httpd tests that exercise authentication will fail on distros that use
libxcrypt without these insecure encryption schematas.

Regenerate the passwd files to instead use the default password
encryption schema, which is md5. While it feels kind of funny that an
MD5-based encryption schema should be more secure than anything else, it
is the current default and supported by all platforms. Furthermore, it
really doesn't matter all that much given that these files are only used
for testing purposes anyway.

[1]: https://httpd.apache.org/docs/2.4/misc/password_encryptions.html

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t/lib-httpd: dynamically detect httpd and modules path

In order to set up the Apache httpd server, we need to locate both the
httpd binary and its default module path. This is done with a hardcoded
list of locations that we scan. While this works okayish with distros
that more-or-less follow the Filesystem Hierarchy Standard, it falls
apart on others like NixOS that don't.

While it is possible to specify these paths via `LIB_HTTPD_PATH` and
`LIB_HTTPD_MODULE_PATH`, it is not a nice experience for the developer
to figure out how to set those up. And in fact we can do better by
dynamically detecting both httpd and its module path at runtime:

    - The httpd binary can be located via PATH.

    - The module directory can (in many cases) be derived via the
      `HTTPD_ROOT` compile-time variable.

Amend the code to do so.

Note that the new runtime-detected paths will only be used as a fallback
in case none of the hardcoded paths are usable. For the PATH lookup this
is because httpd is typically installed into "/usr/sbin", which is often
not included in the user's PATH variable. And the module path detection
relies on a configured httpd installation and may thus not work in all
cases, either.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

l10n: update uk localization for v2.43

Co-authored-by: Kate Golovanova <kate@kgthreads.com>
Signed-off-by: Arkadii Yakovets <ark@cho.red>
Signed-off-by: Kate Golovanova <kate@kgthreads.com>

cmake: handle also unit tests

The unit tests should also be available e.g. in Visual Studio's Test
Explorer when configuring Git's source code via CMake.

Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

cmake: use test names instead of full paths

The primary purpose of Git's CMake definition is to allow developing Git
in Visual Studio. As part of that, the CTest feature allows running
individual test scripts conveniently in Visual Studio's Test Explorer.

However, this Test Explorer's design targets object-oriented languages
and therefore expects the test names in the form
`<namespace>.<class>.<testname>`. And since we specify the full path
of the test scripts instead, including the ugly `/.././t/` part, these
dots confuse the Test Explorer and it uses a large part of the path as
"namespace".

Let's just use `t.suite.<name>` instead. This presents the tests in
Visual Studio's Test Explorer in the following form by default (i.e.
unless the user changes the view via the "Group by" menu):

◢ ◈ git
◢ ◈ t
  ◢ ◈ suite
     ◈ t0000-basic
     ◈ t0001-init
     ◈ t0002-gitfile
     [...]

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

cmake: fix typo in variable name

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

artifacts-tar: when including `.dll` files, don't forget the unit-tests

As of recent, Git also builds executables in `t/unit-tests/`. For
technical reasons, when building with CMake and Visual C, the
dependencies (".dll files") need to be copied there, too, otherwise
running the executable will fail "due to missing dependencies".

The CMake definition already contains the directives to copy those
`.dll` files, but we also need to adjust the `artifacts-tar` rule in
the `Makefile` accordingly to let the `vs-test` job in the CI runs
pass successfully.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

unit-tests: do show relative file paths

Visual C interpolates `__FILE__` with the absolute _Windows_ path of
the source file. GCC interpolates it with the relative path, and the
tests even verify that.

So let's make sure that the unit tests only emit such paths.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

unit-tests: do not mistake `.pdb` files for being executable

When building the unit tests via CMake, the `.pdb` files are built.
Those are, essentially, files containing the debug information
separately from the executables.

Let's not confuse them with the executables we actually want to run.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

cmake: also build unit tests

A new, better way to run unit tests was just added to Git. This adds
support for building those unit tests via CMake.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: run unit tests in CI

Run unit tests in both Cirrus and GitHub CI. For sharded CI instances
(currently just Windows on GitHub), run only on the first shard. This is
OK while we have only a single unit test executable, but we may wish to
distribute tests more evenly when we add new unit tests in the future.

We may also want to add more status output in our unit test framework,
so that we can do similar post-processing as in
ci/lib.sh:handle_failed_tests().

Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

unit tests: add TAP unit test framework

This patch contains an implementation for writing unit tests with TAP
output. Each test is a function that contains one or more checks. The
test is run with the TEST() macro and if any of the checks fail then the
test will fail. A complete program that tests STRBUF_INIT would look
like

     #include "test-lib.h"
     #include "strbuf.h"

     static void t_static_init(void)
     {
             struct strbuf buf = STRBUF_INIT;

             check_uint(buf.len, ==, 0);
             check_uint(buf.alloc, ==, 0);
             check_char(buf.buf[0], ==, '\0');
     }

     int main(void)
     {
             TEST(t_static_init(), "static initialization works);

             return test_done();
     }

The output of this program would be

     ok 1 - static initialization works
     1..1

If any of the checks in a test fail then they print a diagnostic message
to aid debugging and the test will be reported as failing. For example a
failing integer check would look like

     # check "x >= 3" failed at my-test.c:102
     #    left: 2
     #   right: 3
     not ok 1 - x is greater than or equal to three

There are a number of check functions implemented so far. check() checks
a boolean condition, check_int(), check_uint() and check_char() take two
values to compare and a comparison operator. check_str() will check if
two strings are equal. Custom checks are simple to implement as shown in
the comments above test_assert() in test-lib.h.

Tests can be skipped with test_skip() which can be supplied with a
reason for skipping which it will print. Tests can print diagnostic
messages with test_msg().  Checks that are known to fail can be wrapped
in TEST_TODO().

There are a couple of example test programs included in this
patch. t-basic.c implements some self-tests and demonstrates the
diagnostic output for failing test. The output of this program is
checked by t0080-unit-test-output.sh. t-strbuf.c shows some example
unit tests for strbuf.c

The unit tests will be built as part of the default "make all" target,
to avoid bitrot. If you wish to build just the unit tests, you can run
"make build-unit-tests". To run the tests, you can use "make unit-tests"
or run the test binaries directly, as in "./t/unit-tests/bin/t-strbuf".

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

unit tests: add a project plan document

In our current testing environment, we spend a significant amount of
effort crafting end-to-end tests for error conditions that could easily
be captured by unit tests (or we simply forgo some hard-to-setup and
rare error conditions). Describe what we hope to accomplish by
implementing unit tests, and explain some open questions and milestones.
Discuss desired features for test frameworks/harnesses, and provide a
comparison of several different frameworks. Finally, document our
rationale for implementing a custom framework.

Co-authored-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

l10n: sv.po: Update Swedish translation (5579t)

Signed-off-by: Peter Krefting <peter@softwolves.pp.se>

commit-graph: mark chunk error messages for translation

The patches from f32af12cee (Merge branch 'jk/chunk-bounds', 2023-10-23)
added many new untranslated error messages. While it's unlikely for most
users to see these messages at all, most of the other commit-graph error
messages are translated (and likewise for the matching midx messages).

Let's mark them all for consistency (and to help any poor unfortunate
user who does manage to find a broken graph file).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-graph: drop verify_commit_graph_lite()

As we've moved all of the checks from this function directly into the
chunk-reading code used by the caller (and there is only one caller), we
can just drop it entirely.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-graph: check order while reading fanout chunk

We read the fanout chunk, storing a pointer to it, but only confirm that
the entries are monotonic in a final "lite" verification step. Let's
move that into the actual OIDF chunk callback, so that we can report
problems immediately (for all the reasons given in the previous
"commit-graph: abort as soon as we see a bogus chunk" commit).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-graph: use fanout value for graph size

Commit-graph, midx, and pack idx files all have both a lookup table of
oids and an oid fanout table. In midx and pack idx files, we take the
final entry of the fanout table as the source of truth for the number of
entries, and then verify that the size of the lookup table matches that.
But for commit-graph files, we do the opposite: we use the size of the
lookup table as the source of truth, and then check the final fanout
entry against it.

As noted in 4169d89645 (commit-graph: check consistency of fanout
table, 2023-10-09), either is correct. But there are a few reasons to
prefer the fanout table as the source of truth:

  1. The fanout entries are 32-bits on disk, and that defines the
     maximum number of entries we can store. But since the size of the
     lookup table is only bounded by the filesystem, it can be much
     larger. And hence computing it as the commit-graph does means that
     we may truncate the result when storing it in a uint32_t.

  2. We read the fanout first, then the lookup table. If we're verifying
     the chunks as we read them, then we'd want to take the fanout as
     truth (we have nothing yet to check it against) and then we can
     check that the lookup table matches what we already know.

  3. It is pointlessly inconsistent with the midx and pack idx code.
     Since the three have to do similar size and bounds checks, it is
     easier to reason about all three if they use the same approach.

So this patch moves the assignment of g->num_commits to the fanout
parser, and then we can check the size of the lookup chunk as soon as we
try to load it.

There's already a test covering this situation, which munges the final
fanout entry to 2^32-1. In the current code we complain that it does not
agree with the table size. But now that we treat the munged value as the
source of truth, we'll complain that the lookup table is the wrong size
(again, either is correct). So we'll have to update the message we
expect (and likewise for an earlier test which does similar munging).

There's a similar test for this situation on the midx side, but rather
than making a very-large fanout value, it just truncates the lookup
table. We could do that here, too, but the very-large fanout value
actually shows an interesting corner case. On a 32-bit system,
multiplying to find the expected table size would cause an integer
overflow. Using st_mult() would detect that, but cause us to die()
rather than falling back to the non-graph code path. Checking the size
using division (as we do with existing chunk-size checks) avoids the
overflow entirely, and the test demonstrates this when run on a 32-bit
system.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-graph: abort as soon as we see a bogus chunk

The code to read commit-graph files tries to read all of the required
chunks, but doesn't abort if we can't find one (or if it's corrupted).
It's only at the end of reading the file that we then do some sanity
checks for NULL entries. But it's preferable to detect the errors and
bail immediately, for a few reasons:

  1. It's less error-prone. It's easy in the reader functions to flag an
     error but still end up setting some struct fields (an error I in
     fact made while working on this patch series).

  2. It's safer. Since verifying some chunks depends on the values of
     other chunks, we may be depending on not-yet-verified data. I don't
     know offhand of any case where this can cause problems, but it's
     one less subtle thing to worry about in the reader code.

  3. It prevents the user from seeing nonsense errors. If we're missing
     an OIDL chunk, then g->num_commits will be zero. And so we may
     complain that the size of our CDAT chunk (which should have a
     fixed-size record for each commit) is wrong unless it's also zero.
     But that's misleading; the problem is the missing OIDL chunk; the
     CDAT one might be fine!

So let's just check the return value from read_chunk(). This is exactly
how the midx chunk-reading code does it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-graph: clarify missing-chunk error messages

When a required commit-graph chunk cannot be loaded, we leave its entry
in the struct NULL, and then later complain that it is missing. But
that's just one reason we might not have loaded it, as we also do some
data quality checks.

Let's switch these messages to say "missing or corrupted", which is
exactly what the midx code says for the same cases. Likewise, we'll use
the same phrasing and capitalization as those for consistency. And while
we're here, we can mark them for translation (just like the midx ones).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-graph: drop redundant call to "lite" verification

The idea of verify_commit_graph_lite() is to have cheap verification
checks both for everyday use of the graph files (to avoid out of bounds
reads, etc) as well as for doing a full check via "commit-graph verify"
(which will also check the hash, etc).

But the expensive verification checks operate on a commit_graph struct,
which we get by using the normal everyday-reader code! So any problem
we'd find by calling it would have been found before we even got to the
verify_one_commit_graph() function.

Removing it simplifies the code a bit, but also frees us up to move the
"lite" verification steps around within that everyday-reader code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: check consistency of fanout table

The commit-graph, midx, and pack idx on-disk formats all have oid fanout
tables which are fed to bsearch_hash(). If these tables do not increase
monotonically, then the binary search may not only produce bogus values,
it may cause out of bounds reads.

We fixed this for commit graphs in 4169d89645 (commit-graph: check
consistency of fanout table, 2023-10-09). That commit argued that we did
not need to do the same for midx and pack idx files, because they
already did this check. However, that is wrong. We _do_ check the fanout
table for pack idx files when we load them, but we only do so for midx
files when running "git multi-pack-index verify". So it is possible to
get an out-of-bounds read by running a normal command with a specially
crafted midx file.

Let's fix this using the same solution (and roughly the same test) we
did for the commit-graph in 4169d89645. This replaces the same check
from "multi-pack-index verify", because verify uses the same read
routines, we'd bail on reading the midx much sooner now. So let's make
sure to copy its verbose error message.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-graph: handle overflow in chunk_size checks

We check the size of chunks with fixed records by multiplying the width
of each record by the number of commits in the file. Like:

  if (chunk_size != g->num_commits * GRAPH_DATA_WIDTH)

If this multiplication overflows, we may not notice a chunk is too small
(which could later lead to out-of-bound reads).

In the current code this is only possible for the CDAT chunk, but the
reasons are quite subtle. We compute g->num_commits by dividing the size
of the OIDL chunk by the hash length (since it consists of a bunch of
hashes). So we know that any size_t multiplication that uses a value
smaller than the hash length cannot overflow. And the CDAT records are
the only ones that are larger (the others are just 4-byte records). So
it's worth fixing all of these, to make it clear that they're not
subject to overflow (without having to reason about seemingly unrelated
code).

The obvious thing to do is add an st_mult(), like:

  if (chunk_size != st_mult(g->num_commits, GRAPH_DATA_WIDTH))

And that certainly works, but it has one downside: if we detect an
overflow, we'll immediately die(). But the commit graph is an optional
file; if we run into other problems loading it, we'll generally return
an error and fall back to accessing the full objects. Using st_mult()
means a malformed file will abort the whole process.

So instead, we can do a division like this:

  if (chunk_size / GRAPH_DATA_WIDTH != g->num_commits)

where there's no possibility of overflow. We do lose a little bit of
precision; due to integer division truncation we'd allow up to an extra
GRAPH_DATA_WIDTH-1 bytes of data in the chunk. That's OK. Our main goal
here is making sure we don't have too _few_ bytes, which would cause an
out-of-bounds read (we could actually replace our "!=" with "<", but I
think it's worth being a little pedantic, as a large mismatch could be a
sign of other problems).

I didn't add a test here. We'd need to generate a very large graph file
in order to get g->num_commits large enough to cause an overflow. And a
later patch in this series will use this same division technique in a
way that is much easier to trigger in the tests.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: add support for GitLab CI

We already support Azure Pipelines and GitHub Workflows in the Git
project, but until now we do not have support for GitLab CI. While it is
arguably not in the interest of the Git project to maintain a ton of
different CI platforms, GitLab has recently ramped up its efforts and
tries to contribute to the Git project more regularly.

Part of a problem we hit at GitLab rather frequently is that our own,
custom CI setup we have is so different to the setup that the Git
project has. More esoteric jobs like "linux-TEST-vars" that also set a
couple of environment variables do not exist in GitLab's custom CI
setup, and maintaining them to keep up with what Git does feels like
wasted time. The result is that we regularly send patch series upstream
that fail to compile or pass tests in GitHub Workflows. We would thus
like to integrate the GitLab CI configuration into the Git project to
help us send better patch series upstream and thus reduce overhead for
the maintainer. Results of these pipeline runs will be made available
(at least) in GitLab's mirror of the Git project at [1].

This commit introduces the integration into our regular CI scripts so
that most of the setup continues to be shared across all of the CI
solutions. Note that as the builds on GitLab CI run as unprivileged
user, we need to pull in both sudo and shadow packages to our Alpine
based job to set this up.

[1]: https://gitlab.com/gitlab-org/git

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: install test dependencies for linux-musl

The linux-musl CI job executes tests on Alpine Linux, which is based on
musl libc instead of glibc. We're missing some test dependencies though,
which causes us to skip a subset of tests.

Install these test dependencies to increase our test coverage on this
platform. There are still some missing test dependecies, but these do
not have a corresponding package in the Alpine repositories:

    - p4 and p4d, both parts of the Perforce version control system.

    - cvsps, which generates patch sets for CVS.

    - Subversion and the SVN::Core Perl library, the latter of which is
      not available in the Alpine repositories. While the tool itself is
      available, all Subversion-related tests are skipped without the
      SVN::Core Perl library anyway.

The Apache2-based tests require a bit more care though. For one, the
module path is different on Alpine Linux, which requires us to add it to
the list of known module paths to detect it. But second, the WebDAV
module on Alpine Linux is broken because it does not bundle the default
database backend [1]. We thus need to skip the WebDAV-based tests on
Alpine Linux for now.

[1]: https://gitlab.alpinelinux.org/alpine/aports/-/issues/13112

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: squelch warnings when testing with unusable Git repo

Our CI jobs that run on Docker also use mostly the same architecture to
build and test Git via the "ci/run-build-and-tests.sh" script. These
scripts also provide some functionality to massage the Git repository
we're supposedly operating in.

In our Docker-based infrastructure we may not even have a Git repository
available though, which leads to warnings when those functions execute.
Make the helpers exit gracefully in case either there is no Git in our
PATH, or when not running in a Git repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: unify setup of some environment variables

Both GitHub Actions and Azure Pipelines set up the environment variables
GIT_TEST_OPTS, GIT_PROVE_OPTS and MAKEFLAGS. And while most values are
actually the same, the setup is completely duplicate. With the upcoming
support for GitLab CI this duplication would only extend even further.

Unify the setup of those environment variables so that only the uncommon
parts are separated. While at it, we also perform some additional small
improvements:

    - We now always pass `--state=failed,slow,save` via GIT_PROVE_OPTS.
      It doesn't hurt on platforms where we don't persist the state, so
      this further reduces boilerplate.

    - When running on Windows systems we set `--no-chain-lint` and
      `--no-bin-wrappers`. Interestingly though, we did so _after_
      already having exported the respective environment variables.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: split out logic to set up failed test artifacts

We have some logic in place to create a directory with the output from
failed tests, which will then subsequently be uploaded as CI artifacts.
We're about to add support for GitLab CI, which will want to reuse the
logic.

Split the logic into a separate function so that it is reusable.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: group installation of Docker dependencies

The output of CI jobs tends to be quite long-winded and hard to digest.
To help with this, many CI systems provide the ability to group output
into collapsible sections, and we're also doing this in some of our
scripts.

One notable omission is the script to install Docker dependencies.
Address it to bring more structure to the output for Docker-based jobs.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: make grouping setup more generic

Make the grouping setup more generic by always calling `begin_group ()`
and `end_group ()` regardless of whether we have stubbed those functions
or not. This ensures we can more readily add support for additional CI
platforms.

Furthermore, the `group ()` function is made generic so that it is the
same for both GitHub Actions and for other platforms. There is a
semantic conflict here though: GitHub Actions used to call `set +x` in
`group ()` whereas the non-GitHub case unconditionally uses `set -x`.
The latter would get overriden if we kept the `set +x` in the generic
version of `group ()`. To resolve this conflict, we simply drop the `set
+x` in the generic variant of this function. As `begin_group ()` calls
`set -x` anyway this is not much of a change though, as the only
commands that aren't printed anymore now are the ones between the
beginning of `group ()` and the end of `begin_group ()`.

Last, this commit changes `end_group ()` to also accept a parameter that
indicates _which_ group should end. This will be required by a later
commit that introduces support for GitLab CI.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: reorder definitions for grouping functions

We define a set of grouping functions that are used to group together
output in our CI, where these groups then end up as collapsible sections
in the respective pipeline platform. The way these functions are defined
is not easily extensible though as we have an up front check for the CI
_not_ being GitHub Actions, where we define the non-stub logic in the
else branch.

Reorder the conditional branches such that we explicitly handle GitHub
Actions.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

l10n: tr: v2.43.0

Signed-off-by: Emir SARI <emir_sari@icloud.com>

Git 2.43-rc1

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'tb/rev-list-unpacked-fix'

"git rev-list --unpacked --objects" failed to exclude packed
non-commit objects, which has been corrected.

* tb/rev-list-unpacked-fix:
pack-bitmap: drop --unpacked non-commit objects from results
list-objects: drop --unpacked non-commit objects from results

Merge branch 'ps/leakfixes'

Leakfix.

* ps/leakfixes:
  setup: fix leaking repository format
  setup: refactor `upgrade_repository_format()` to have common exit
  shallow: fix memory leak when registering shallow roots
  test-bloom: stop setting up Git directory twice

Prepare for -rc1

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'jc/test-i18ngrep'

Another step to deprecate test_i18ngrep.

* jc/test-i18ngrep:
tests: teach callers of test_i18ngrep to use test_grep
test framework: further deprecate test_i18ngrep

Merge branch 'la/strvec-header-fix'

Code clean-up.

* la/strvec-header-fix:
strvec: drop unnecessary include of hex.h

Merge branch 'bc/merge-file-object-input'

"git merge-file" learns a mode to read three contents to be merged
from blob objects.

* bc/merge-file-object-input:
merge-file: add an option to process object IDs
git-merge-file doc: drop "-file" from argument placeholders

Merge branch 'kn/rev-list-missing-fix'

"git rev-list --missing" did not work for missing commit objects,
which has been corrected.

* kn/rev-list-missing-fix:
  rev-list: add commit object support in `--missing` option
  rev-list: move `show_commit()` to the bottom
  revision: rename bit to `do_not_die_on_missing_objects`

Merge branch 'an/clang-format-typofix'

Typofix.

* an/clang-format-typofix:
clang-format: fix typo in comment

Merge branch 'tb/format-pack-doc-update'

Doc update.

* tb/format-pack-doc-update:
Documentation/gitformat-pack.txt: fix incorrect MIDX documentation
Documentation/gitformat-pack.txt: fix typo

Merge branch 'ps/show-ref'

Teach "git show-ref" a mode to check the existence of a ref.

* ps/show-ref:
  t: use git-show-ref(1) to check for ref existence
  builtin/show-ref: add new mode to check for reference existence
  builtin/show-ref: explicitly spell out different modes in synopsis
  builtin/show-ref: ensure mutual exclusiveness of subcommands
  builtin/show-ref: refactor options for patterns subcommand
  builtin/show-ref: stop using global vars for `show_one()`
  builtin/show-ref: stop using global variable to count matches
  builtin/show-ref: refactor `--exclude-existing` options
  builtin/show-ref: fix dead code when passing patterns
  builtin/show-ref: fix leaking string buffer
  builtin/show-ref: split up different subcommands
  builtin/show-ref: convert pattern to a local variable

Merge branch 'ps/do-not-trust-commit-graph-blindly-for-existence'

The codepath to traverse the commit-graph learned to notice that a
commit is missing (e.g., corrupt repository lost an object), even
though it knows something about the commit (like its parents) from
what is in commit-graph.

* ps/do-not-trust-commit-graph-blindly-for-existence:
commit: detect commits that exist in commit-graph but not in the ODB
commit-graph: introduce envvar to disable commit existence checks

Merge branch 'js/ci-use-macos-13'

Replace macos-12 used at GitHub CI with macos-13.

* js/ci-use-macos-13:
ci: upgrade to using macos-13

Merge branch 'jk/chunk-bounds'

Test portability fix.

* jk/chunk-bounds:
t: avoid perl's pack/unpack "Q" specifier

Merge branch 'jk/tree-name-and-depth-limit'

Further limit tree depth max to avoid Windows build running out of
the stack space.

* jk/tree-name-and-depth-limit:
max_tree_depth: lower it for MSVC to avoid stack overflows

pack-bitmap: drop --unpacked non-commit objects from results

When performing revision queries with `--objects` and
`--use-bitmap-index`, the output may incorrectly contain objects which
are packed, even when the `--unpacked` option is given. This affects
traversals, but also other querying operations, like `--count`,
`--disk-usage`, etc.

Like in the previous commit, the fix is to exclude those objects from
the result set before they are shown to the user (or, in this case,
before the bitmap containing the result of the traversal is enumerated
and its objects listed).

This is performed by a new function in pack-bitmap.c, called
`filter_packed_objects_from_bitmap()`. Note that we do not have to
inspect individual bits in the result bitmap, since we know that the
first N (where N is the number of objects in the bitmap's pack/MIDX)
bits correspond to objects which packed by definition.

In other words, for an object to have a bitmap position (not in the
extended index), it must appear in either the bitmap's pack or one of
the packs in its MIDX.

This presents an appealing optimization to us, which is that we can
simply memset() the corresponding number of `eword_t`'s to zero,
provided that we handle any objects which spill into the next word (but
don't occupy all 64 bits of the word itself).

We only have to handle objects in the bitmap's extended index. These
objects may (or may not) appear in one or more pack(s). Since these
objects are known to not appear in either the bitmap's MIDX or pack,
they may be stored as loose, appear in other pack(s), or both.

Before returning a bitmap containing the result of the traversal back to
the caller, drop any bits from the extended index which appear in one or
more packs. This implements the correct behavior for rev-list operations
which use the bitmap index to compute their result.

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

list-objects: drop --unpacked non-commit objects from results

In git-rev-list(1), we describe the `--unpacked` option as:

    Only useful with `--objects`; print the object IDs that are not in
    packs.

This is true of commits, which we discard via get_commit_action(), but
not of the objects they reach. So if we ask for an --objects traversal
with --unpacked, we may get arbitrarily many objects which are indeed
packed.

I am nearly certain this behavior dates back to the introduction of
`--unpacked` via 12d2a18780 ("git rev-list --unpacked" shows only
unpacked commits, 2005-07-03), but I couldn't get that revision of Git
to compile for me. At least as early as v2.0.0 this has been subtly
broken:

    $ git.compile --version
    git version 2.0.0

    $ git.compile rev-list --objects --all --unpacked
    72791fe96c93f9ec5c311b8bc966ab349b3b5bbe
    05713d991c18bbeef7e154f99660005311b5004d v1.0
    153ed8b7719c6f5a68ce7ffc43133e95a6ac0fdb
    8e4020bb5a8d8c873b25de15933e75cc0fc275df one
    9200b628cf9dc883a85a7abc8d6e6730baee589c two
    3e6b46e1b7e3b91acce99f6a823104c28aae0b58 unpacked.t

There, only the first, third, and sixth entries are loose, with the
remaining set of objects belonging to at least one pack.

The implications for this are relatively benign: bare 'git repack'
invocations which invoke pack-objects with --unpacked are impacted, and
at worst we'll store a few extra objects that should have been excluded.

Arguably changing this behavior is a backwards-incompatible change,
since it alters the set of objects emitted from rev-list queries with
`--objects` and `--unpacked`. But I argue that this change is still
sensible, since the existing implementation deviates from
clearly-written documentation.

The fix here is straightforward: avoid showing any non-commit objects
which are contained in packs by discarding them within list-objects.c,
before they are shown to the user. Note that similar treatment for
`list-objects.c::show_commit()` is not needed, since that case is
already handled by `revision.c::get_commit_action()`.

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

RelNotes: improve wording of credential helper notes

Offer a slightly more verbose description of the issue fixed by
7144dee3ec (credential/libsecret: erase matching creds only, 2023-07-26)
and cb626f8e5c (credential/wincred: erase matching creds only,
2023-07-26).

Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

RelNotes: minor typo fixes in 2.43.0 draft

Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

A bit more before -rc1

Merge branch 'rc/trace-upload-pack'

Trace2 update.

* rc/trace-upload-pack:
upload-pack: add tracing for fetches

Merge branch 'es/bugreport-no-extra-arg'

"git bugreport" learned to complain when it received a command line
argument that it will not use.

* es/bugreport-no-extra-arg:
bugreport: reject positional arguments
t0091-bugreport: stop using i18ngrep

Merge branch 'js/my-first-contribution-update'

Documentation update.

* js/my-first-contribution-update:
Include gettext.h in MyFirstContribution tutorial

Merge branch 'ms/send-email-validate-fix'

"git send-email" did not have certain pieces of data computed yet
when it tried to validate the outging messages and its recipient
addresses, which has been sorted out.

* ms/send-email-validate-fix:
send-email: move validation code below process_address_list

Merge branch 'rs/reflog-expire-single-worktree-fix'

"git reflog expire --single-worktree" has been broken for the past
20 months or so, which has been corrected.

* rs/reflog-expire-single-worktree-fix:
reflog: fix expire --single-worktree

Merge branch 'rs/fix-arghelp'

Doc and help update.

* rs/fix-arghelp:
am, rebase: fix arghelp syntax of --empty

Merge branch 'rs/parse-options-cmdmode'

parse-options improvements for OPT_CMDMODE options.

* rs/parse-options-cmdmode:
am: simplify --show-current-patch handling
parse-options: make CMDMODE errors more precise

Merge branch 'jc/grep-f-relative-to-cwd'

"cd sub && git grep -f patterns" tried to read "patterns" file at
the top level of the working tree; it has been corrected to read
"sub/patterns" instead.

* jc/grep-f-relative-to-cwd:
grep: -f <path> is relative to $cwd

Merge branch 'ar/submitting-patches-doc-update'

Doc update.

* ar/submitting-patches-doc-update:
SubmittingPatches: call gitk's command "Copy commit reference"

setup: fix leaking repository format

While populating the `repository_format` structure may cause us to
allocate memory, we do not call `clear_repository_format()` in some
places and thus potentially leak memory. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

setup: refactor `upgrade_repository_format()` to have common exit

The `upgrade_repository_format()` function has multiple exit paths,
which means that there is no common cleanup of acquired resources.
While this isn't much of a problem right now, we're about to fix a
memory leak that would require us to free the resource in every one of
those exit paths.

Refactor the code to have a common exit path so that the subsequent
memory leak fix becomes easier to implement.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

shallow: fix memory leak when registering shallow roots

When registering shallow roots, we unset the list of parents of the
to-be-registered commit if it's already been parsed. This causes us to
leak memory though because we never free this list. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

test-bloom: stop setting up Git directory twice

We're setting up the Git directory twice in the `test-tool bloom`
helper, once at the beginning of `cmd_bloom()` and once in the local
subcommand implementation `get_bloom_filter_for_commit()`. This can lead
to memory leaks as we'll overwrite variables of `the_repository` with
newly allocated data structures. On top of that it's simply unnecessary.

Fix this by only setting up the Git directory once.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

attr: enable attr pathspec magic for git-add and git-stash

Allow users to limit or exclude files based on file attributes
during git-add and git-stash.

For example, the chromium project would like to use

    $ git add . ':(exclude,attr:submodule)'

as submodules are managed by an external tool, forbidding end users
to record changes with "git add".  Allowing "git add" to often
records changes that users do not want in their commits.

This commit does not change any attr magic implementation. It is
only adding attr as an allowed pathspec in git-add and git-stash,
which was previously blocked by GUARD_PATHSPEC and a pathspec mask
in parse_pathspec()).

However, we fix a bug in prefix_magic() where attr values were
unintentionally removed.  This was triggerable when parse_pathspec()
is called with PATHSPEC_PREFIX_ORIGIN as a flag, which was the case
for git-stash (Bug originally filed here [*])

Furthermore, while other commands hit this code path it did not
result in unexpected behavior because this bug only impacts the
pathspec->items->original field which is NOT used to filter
paths. However, git-stash does use pathspec->items->original when
building args used to call other git commands.  (See add_pathspecs()
usage and implementation in stash.c)

It is possible that when the attr pathspec feature was first added
in b0db704652 (pathspec: allow querying for attributes, 2017-03-13),
"PATHSPEC_ATTR" was just unintentionally left out of a few
GUARD_PATHSPEC() invocations.

Later, to get a more user-friendly error message when attr was used
with git-add, PATHSPEC_ATTR was added as a mask to git-add's
invocation of parse_pathspec() 84d938b732 (add: do not accept
pathspec magic 'attr', 2018-09-18).  However, this user-friendly
error message was never added for git-stash.

[Reference]
* https://lore.kernel.org/git/CAMmZTi-0QKtj7Q=sbC5qhipGsQxJFOY-Qkk1jfkRYwfF5FcUVg@mail.gmail.com/)

Signed-off-by: Joanna Wang <jojwang@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t: avoid perl's pack/unpack "Q" specifier

The perl script introduced by 86b008ee61 (t: add library for munging
chunk-format files, 2023-10-09) uses pack("Q") and unpack("Q") to read
and write 64-bit values ("quadwords" in perl parlance) from the on-disk
chunk files. However, some builds of perl may not support 64-bit
integers at all, and throw an exception here. While some 32-bit
platforms may still support 64-bit integers in perl (such as our linux32
CI environment), others reportedly don't (the NonStop 32-bit builds).

We can work around this by treating the 64-bit values as two 32-bit
values. We can't ever combine them into a single 64-bit value, but in
practice this is OK. These are representing file offsets, and our files
are much smaller than 4GB. So the upper half of the 64-bit value will
always be 0.

We can just introduce a few helper functions which perform the
translation and double-check our assumptions.

Reported-by: Randall S. Becker <randall.becker@nexbridge.ca>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>