Jeff King [Wed, 11 Mar 2020 22:48:24 +0000 (18:48 -0400)]
fsck: detect gitmodules URLs with embedded newlines
The credential protocol can't handle values with newlines. We already
detect and block any such URLs from being used with credential helpers,
but let's also add an fsck check to detect and block gitmodules files
with such URLs. That will let us notice the problem earlier when
transfer.fsckObjects is turned on. And in particular it will prevent bad
objects from spreading, which may protect downstream users running older
versions of Git.
We'll file this under the existing gitmodulesUrl flag, which covers URLs
with option injection. There's really no need to distinguish the exact
flaw in the URL in this context. Likewise, I've expanded the description
of t7416 to cover all types of bogus URLs.
Jeff King [Thu, 12 Mar 2020 05:31:11 +0000 (01:31 -0400)]
credential: detect unrepresentable values when parsing urls
The credential protocol can't represent newlines in values, but URLs can
embed percent-encoded newlines in various components. A previous commit
taught the low-level writing routines to die() when encountering this,
but we can be a little friendlier to the user by detecting them earlier
and handling them gracefully.
This patch teaches credential_from_url() to notice such components,
issue a warning, and blank the credential (which will generally result
in prompting the user for a username and password). We blank the whole
credential in this case. Another option would be to blank only the
invalid component. However, we're probably better off not feeding a
partially-parsed URL result to a credential helper. We don't know how a
given helper would handle it, so we're better off to err on the side of
matching nothing rather than something unexpected.
The die() call in credential_write() is _probably_ impossible to reach
after this patch. Values should end up in credential structs only by URL
parsing (which is covered here), or by reading credential protocol input
(which by definition cannot read a newline into a value). But we should
definitely keep the low-level check, as it's our final and most accurate
line of defense against protocol injection attacks. Arguably it could
become a BUG(), but it probably doesn't matter much either way.
Note that the public interface of credential_from_url() grows a little
more than we need here. We'll use the extra flexibility in a future
patch to help fsck catch these cases.
Jeff King [Wed, 11 Mar 2020 22:11:37 +0000 (18:11 -0400)]
t/lib-credential: use test_i18ncmp to check stderr
The credential tests have a "check" function which feeds some input to
git-credential and checks the stdout and stderr. We look for exact
matches in the output. For stdout, this makes sense; the output is
the credential protocol. But for stderr, we may be showing various
diagnostic messages, or the prompts fed to the askpass program, which
could be translated. Let's mark them as such.
Jeff King [Wed, 11 Mar 2020 21:53:41 +0000 (17:53 -0400)]
credential: avoid writing values with newlines
The credential protocol that we use to speak to helpers can't represent
values with newlines in them. This was an intentional design choice to
keep the protocol simple, since none of the values we pass should
generally have newlines.
However, if we _do_ encounter a newline in a value, we blindly transmit
it in credential_write(). Such values may break the protocol syntax, or
worse, inject new valid lines into the protocol stream.
The most likely way for a newline to end up in a credential struct is by
decoding a URL with a percent-encoded newline. However, since the bug
occurs at the moment we write the value to the protocol, we'll catch it
there. That should leave no possibility of accidentally missing a code
path that can trigger the problem.
At this level of the code we have little choice but to die(). However,
since we'd not ever expect to see this case outside of a malicious URL,
that's an acceptable outcome.
Jonathan Nieder [Thu, 5 Dec 2019 09:30:43 +0000 (01:30 -0800)]
fsck: reject submodule.update = !command in .gitmodules
This allows hosting providers to detect whether they are being used
to attack users using malicious 'update = !command' settings in
.gitmodules.
Since ac1fbbda2013 (submodule: do not copy unknown update mode from
.gitmodules, 2013-12-02), in normal cases such settings have been
treated as 'update = none', so forbidding them should not produce any
collateral damage to legitimate uses. A quick search does not reveal
any repositories making use of this construct, either.
Reported-by: Joern Schneeweisz <jschneeweisz@gitlab.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
* maint-2.16: (31 commits)
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
...
This is a companion patch to 'mingw: handle `subst`-ed "DOS drives"':
use the DOS drive prefix handling that is already provided by
`compat/mingw.c` (and which just learned to handle non-alphabetical
"drive letters").
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
* maint-2.15: (29 commits)
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
clone --recurse-submodules: prevent name squatting on Windows
is_ntfs_dotgit(): only verify the leading segment
...
Jonathan Nieder [Thu, 5 Dec 2019 09:28:28 +0000 (01:28 -0800)]
submodule: reject submodule.update = !command in .gitmodules
Since ac1fbbda2013 (submodule: do not copy unknown update mode from
.gitmodules, 2013-12-02), Git has been careful to avoid copying
[submodule "foo"]
update = !run an arbitrary scary command
from .gitmodules to a repository's local config, copying in the
setting 'update = none' instead. The gitmodules(5) manpage documents
the intention:
The !command form is intentionally ignored here for security
reasons
Unfortunately, starting with v2.20.0-rc0 (which integrated ee69b2a9
(submodule--helper: introduce new update-module-mode helper,
2018-08-13, first released in v2.20.0-rc0)), there are scenarios where
we *don't* ignore it: if the config store contains no
submodule.foo.update setting, the submodule-config API falls back to
reading .gitmodules and the repository-supplied !command gets run
after all.
This was part of a general change over time in submodule support to
read more directly from .gitmodules, since unlike .git/config it
allows a project to change values between branches and over time
(while still allowing .git/config to override things). But it was
never intended to apply to this kind of dangerous configuration.
The behavior change was not advertised in ee69b2a9's commit message
and was missed in review.
Let's take the opportunity to make the protection more robust, even in
Git versions that are technically not affected: instead of quietly
converting 'update = !command' to 'update = none', noisily treat it as
an error. Allowing the setting but treating it as meaning something
else was just confusing; users are better served by seeing the error
sooner. Forbidding the construct makes the semantics simpler and
means we can check for it in fsck (in a separate patch).
As a result, the submodule-config API cannot read this value from
.gitmodules under any circumstance, and we can declare with confidence
For security reasons, the '!command' form is not accepted
here.
Reported-by: Joern Schneeweisz <jschneeweisz@gitlab.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
* maint-2.14: (28 commits)
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
clone --recurse-submodules: prevent name squatting on Windows
is_ntfs_dotgit(): only verify the leading segment
test-path-utils: offer to run a protectNTFS/protectHFS benchmark
...
While the only permitted drive letters for physical drives on Windows
are letters of the US-English alphabet, this restriction does not apply
to virtual drives assigned via `subst <letter>: <path>`.
To prevent targeted attacks against systems where "funny" drive letters
such as `1` or `!` are assigned, let's handle them as regular drive
letters on Windows.
This fixes CVE-2019-1351.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
On Windows, filenames cannot have trailing spaces or periods, when
opening such paths, they are stripped automatically. Read: you can open
the file `README` via the file name `README . . .`. This ambiguity can
be used in combination with other security bugs to cause e.g. remote
code execution during recursive clones. This patch series fixes that.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This patch fixes a vulnerability in the Windows-specific code where a
submodule names ending in a backslash were quoted incorrectly, and that
bug could be abused to insert command-line parameters e.g. to `ssh` in a
recursive clone.
Note: this bug is Windows-only, as we have to construct a command line
for the process-to-spawn, unlike Linux/macOS, where `execv()` accepts an
already-split command line.
While at it, other quoting issues are fixed as well.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This patch series makes it safe to use Git on Windows drives, even if
running on a mounted network share or within the Windows Subsystem for
Linux (WSL).
This topic branch addresses CVE-2019-1353.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Over a decade ago, in 25fe217b86c (Windows: Treat Windows style path
names., 2008-03-05), Git was taught to handle absolute Windows paths,
i.e. paths that start with a drive letter and a colon.
Unbeknownst to us, while drive letters of physical drives are limited to
letters of the English alphabet, there is a way to assign virtual drive
letters to arbitrary directories, via the `subst` command, which is
_not_ limited to English letters.
It is therefore possible to have absolute Windows paths of the form
`1:\what\the\hex.txt`. Even "better": pretty much arbitrary Unicode
letters can also be used, e.g. `ä:\tschibät.sch`.
While it can be sensibly argued that users who set up such funny drive
letters really seek adverse consequences, the Windows Operating System
is known to be a platform where many users are at the mercy of
administrators who have their very own idea of what constitutes a
reasonable setup.
Therefore, let's just make sure that such funny paths are still
considered absolute paths by Git, on Windows.
In addition to Unicode characters, pretty much any character is a valid
drive letter, as far as `subst` is concerned, even `:` and `"` or even a
space character. While it is probably the opposite of smart to use them,
let's safeguard `is_dos_drive_prefix()` against all of them.
Note: `[::1]:repo` is a valid URL, but not a valid path on Windows.
As `[` is now considered a valid drive letter, we need to be very
careful to avoid misinterpreting such a string as valid local path in
`url_is_local_not_ssh()`. To do that, we use the just-introduced
function `is_valid_path()` (which will label the string as invalid file
name because of the colon characters).
This fixes CVE-2019-1351.
Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This patch series plugs an attack vector we had overlooked in our
December 2014 work on `core.protectNTFS`.
Essentially, the path `.git::$INDEX_ALLOCATION/config` is interpreted as
`.git/config` when NTFS Alternate Data Streams are available (which they
are on Windows, and at least on network shares that are SMB-mounted on
macOS).
Needless to say: we don't want that.
In fact, we want to stay on the very safe side and not even special-case
the `$INDEX_ALLOCATION` stream type: let's just prevent Git from
touching _any_ explicitly specified Alternate Data Stream of `.git`.
In essence, we'll prevent Git from tracking, or writing to, any path
with a segment of the form `.git:<anything>`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
mingw: refuse to access paths with trailing spaces or periods
When creating a directory on Windows whose path ends in a space or a
period (or chains thereof), the Win32 API "helpfully" trims those. For
example, `mkdir("abc ");` will return success, but actually create a
directory called `abc` instead.
This stems back to the DOS days, when all file names had exactly 8
characters plus exactly 3 characters for the file extension, and the
only way to have shorter names was by padding with spaces.
Sadly, this "helpful" behavior is a bit inconsistent: after a successful
`mkdir("abc ");`, a `mkdir("abc /def")` will actually _fail_ (because
the directory `abc ` does not actually exist).
Even if it would work, we now have a serious problem because a Git
repository could contain directories `abc` and `abc `, and on Windows,
they would be "merged" unintentionally.
As these paths are illegal on Windows, anyway, let's disallow any
accesses to such paths on that Operating System.
For practical reasons, this behavior is still guarded by the
config setting `core.protectNTFS`: it is possible (and at least two
regression tests make use of it) to create commits without involving the
worktree. In such a scenario, it is of course possible -- even on
Windows -- to create such file names.
Among other consequences, this patch disallows submodules' paths to end
in spaces on Windows (which would formerly have confused Git enough to
try to write into incorrect paths, anyway).
While this patch does not fix a vulnerability on its own, it prevents an
attack vector that was exploited in demonstrations of a number of
recently-fixed security bugs.
The regression test added to `t/t7417-submodule-path-url.sh` reflects
that attack vector.
Note that we have to adjust the test case "prevent git~1 squatting on
Windows" in `t/t7415-submodule-names.sh` because of a very subtle issue.
It tries to clone two submodules whose names differ only in a trailing
period character, and as a consequence their git directories differ in
the same way. Previously, when Git tried to clone the second submodule,
it thought that the git directory already existed (because on Windows,
when you create a directory with the name `b.` it actually creates `b`),
but with this patch, the first submodule's clone will fail because of
the illegal name of the git directory. Therefore, when cloning the
second submodule, Git will take a different code path: a fresh clone
(without an existing git directory). Both code paths fail to clone the
second submodule, both because the the corresponding worktree directory
exists and is not empty, but the error messages are worded differently.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
quote-stress-test: offer to test quoting arguments for MSYS2 sh
It is unfortunate that we need to quote arguments differently on
Windows, depending whether we build a command-line for MSYS2's `sh` or
for other Windows executables.
We already have a test helper to verify the latter, with this patch we
can also verify the former.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
mingw: refuse to access paths with illegal characters
Certain characters are not admissible in file names on Windows, even if
Cygwin/MSYS2 (and therefore, Git for Windows' Bash) pretend that they
are, e.g. `:`, `<`, `>`, etc
Let's disallow those characters explicitly in Windows builds of Git.
Note: just like trailing spaces or periods, it _is_ possible on Windows
to create commits adding files with such illegal characters, as long as
the operation leaves the worktree untouched. To allow for that, we
continue to guard `is_valid_win32_path()` behind the config setting
`core.protectNTFS`, so that users _can_ continue to do that, as long as
they turn the protections off via that config setting.
Among other problems, this prevents Git from trying to write to an "NTFS
Alternate Data Stream" (which refers to metadata stored alongside a
file, under a special name: "<filename>:<stream-name>"). This fix
therefore also prevents an attack vector that was exploited in
demonstrations of a number of recently-fixed security bugs.
Further reading on illegal characters in Win32 filenames:
https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
A `git clone` will end with exit code 0 when `merged_entry()` returns a
positive value during a call of `unpack_trees()` to `traverse_trees()`.
The reason is that `unpack_trees()` will interpret a positive value not
to be an error.
The problem is, however, that `add_index_entry()` (which is called by
`merged_entry()` can report an error, and we really should fail the
entire clone in such a case.
Let's fix this problem, in preparation for a Windows-specific patch
disallowing `mkdir()` with directory names that contain a trailing space
(which is illegal on NTFS): we want `git clone` to abort when a path
cannot be checked out due to that condition.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
t6130/t9350: prepare for stringent Win32 path validation
On Windows, file names cannot contain asterisks nor newline characters.
In an upcoming commit, we will make this limitation explicit,
disallowing even the creation of commits that introduce such file names.
However, in the test scripts touched by this patch, we _know_ that those
paths won't be checked out, so we _want_ to allow such file names.
Happily, the stringent path validation will be guarded via the
`core.protectNTFS` flag, so all we need to do is to force that flag off
temporarily.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
quote-stress-test: accept arguments to test via the command-line
When the stress test reported a problem with quoting certain arguments,
it is helpful to have a facility to play with those arguments in order
to find out whether variations of those arguments are affected, too.
Let's allow `test-run-command quote-stress-test -- <args>` to be used
for that purpose.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
tests: add a helper to stress test argument quoting
On Windows, we have to do all the command-line argument quoting
ourselves. Worse: we have to have two versions of said quoting, one for
MSYS2 programs (which have their own dequoting rules) and the rest.
We care mostly about the rest, and to make sure that that works, let's
have a stress test that comes up with all kinds of awkward arguments,
verifying that a spawned sub-process receives those unharmed.
Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Currently it is technically possible to let a submodule's git
directory point right into the git dir of a sibling submodule.
Example: the git directories of two submodules with the names `hippo`
and `hippo/hooks` would be `.git/modules/hippo/` and
`.git/modules/hippo/hooks/`, respectively, but the latter is already
intended to house the former's hooks.
In most cases, this is just confusing, but there is also a (quite
contrived) attack vector where Git can be fooled into mistaking remote
content for file contents it wrote itself during a recursive clone.
Let's plug this bug.
To do so, we introduce the new function `validate_submodule_git_dir()`
which simply verifies that no git dir exists for any leading directories
of the submodule name (if there are any).
Note: this patch specifically continues to allow sibling modules names
of the form `core/lib`, `core/doc`, etc, as long as `core` is not a
submodule name.
This fixes CVE-2019-1387.
Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Back in the DOS days, in the FAT file system, file names always
consisted of a base name of length 8 plus a file extension of length 3.
Shorter file names were simply padded with spaces to the full 8.3
format.
Later, the FAT file system was taught to support _also_ longer names,
with an 8.3 "short name" as primary file name. While at it, the same
facility allowed formerly illegal file names, such as `.git` (empty base
names were not allowed), which would have the "short name" `git~1`
associated with it.
For backwards-compatibility, NTFS supports alternative 8.3 short
filenames, too, even if starting with Windows Vista, they are only
generated on the system drive by default.
We addressed the problem that the `.git/` directory can _also_ be
accessed via `git~1/` (when short names are enabled) in 2b4c6efc821
(read-cache: optionally disallow NTFS .git variants, 2014-12-16), i.e.
since Git v1.9.5, by introducing the config setting `core.protectNTFS`
and enabling it by default on Windows.
In the meantime, Windows 10 introduced the "Windows Subsystem for Linux"
(short: WSL), i.e. a way to run Linux applications/distributions in a
thinly-isolated subsystem on Windows (giving rise to many a "2016 is the
Year of Linux on the Desktop" jokes). WSL is getting increasingly
popular, also due to the painless way Linux application can operate
directly ("natively") on files on Windows' file system: the Windows
drives are mounted automatically (e.g. `C:` as `/mnt/c/`).
Taken together, this means that we now have to enable the safe-guards of
Git v1.9.5 also in WSL: it is possible to access a `.git` directory
inside `/mnt/c/` via the 8.3 name `git~1` (unless short name generation
was disabled manually). Since regular Linux distributions run in WSL,
this means we have to enable `core.protectNTFS` at least on Linux, too.
To enable Services for Macintosh in Windows NT to store so-called
resource forks, NTFS introduced "Alternate Data Streams". Essentially,
these constitute additional metadata that are connected to (and copied
with) their associated files, and they are accessed via pseudo file
names of the form `filename:<stream-name>:<stream-type>`.
In a recent patch, we extended `core.protectNTFS` to also protect
against accesses via NTFS Alternate Data Streams, e.g. to prevent
contents of the `.git/` directory to be "tracked" via yet another
alternative file name.
While it is not possible (at least by default) to access files via NTFS
Alternate Data Streams from within WSL, the defaults on macOS when
mounting network shares via SMB _do_ allow accessing files and
directories in that way. Therefore, we need to enable `core.protectNTFS`
on macOS by default, too, and really, on any Operating System that can
mount network shares via SMB/CIFS.
A couple of approaches were considered for fixing this:
1. We could perform a dynamic NTFS check similar to the `core.symlinks`
check in `init`/`clone`: instead of trying to create a symbolic link
in the `.git/` directory, we could create a test file and try to
access `.git/config` via 8.3 name and/or Alternate Data Stream.
2. We could simply "flip the switch" on `core.protectNTFS`, to make it
"on by default".
The obvious downside of 1. is that it won't protect worktrees that were
clone with a vulnerable Git version already. We considered patching code
paths that check out files to check whether we're running on an NTFS
system dynamically and persist the result in the repository-local config
setting `core.protectNTFS`, but in the end decided that this solution
would be too fragile, and too involved.
The obvious downside of 2. is that everybody will have to "suffer" the
performance penalty incurred from calling `is_ntfs_dotgit()` on every
path, even in setups where.
After the recent work to accelerate `is_ntfs_dotgit()` in most cases,
it looks as if the time spent on validating ten million random
file names increases only negligibly (less than 20ms, well within the
standard deviation of ~50ms). Therefore the benefits outweigh the cost.
Another downside of this is that paths that might have been acceptable
previously now will be forbidden. Realistically, though, this is an
improvement because public Git hosters already would reject any `git
push` that contains such file names.
Note: There might be a similar problem mounting HFS+ on Linux. However,
this scenario has been considered unlikely and in light of the cost (in
the aforementioned benchmark, `core.protectHFS = true` increased the
time from ~440ms to ~610ms), it was decided _not_ to touch the default
of `core.protectHFS`.
This change addresses CVE-2019-1353.
Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com> Helped-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
path: also guard `.gitmodules` against NTFS Alternate Data Streams
We just safe-guarded `.git` against NTFS Alternate Data Stream-related
attack vectors, and now it is time to do the same for `.gitmodules`.
Note: In the added regression test, we refrain from verifying all kinds
of variations between short names and NTFS Alternate Data Streams: as
the new code disallows _all_ Alternate Data Streams of `.gitmodules`, it
is enough to test one in order to know that all of them are guarded
against.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
We need to be careful to follow proper quoting rules. For example, if an
argument contains spaces, we have to quote them. Double-quotes need to
be escaped. Backslashes need to be escaped, but only if they are
followed by a double-quote character.
We need to be _extra_ careful to consider the case where an argument
ends in a backslash _and_ needs to be quoted: in this case, we append a
double-quote character, i.e. the backslash now has to be escaped!
The current code, however, fails to recognize that, and therefore can
turn an argument that ends in a single backslash into a quoted argument
that now ends in an escaped double-quote character. This allows
subsequent command-line parameters to be split and part of them being
mistaken for command-line options, e.g. through a maliciously-crafted
submodule URL during a recursive clone.
Technically, we would not need to quote _all_ arguments which end in a
backslash _unless_ the argument needs to be quoted anyway. For example,
`test\` would not need to be quoted, while `test \` would need to be.
To keep the code simple, however, and therefore easier to reason about
and ensure its correctness, we now _always_ quote an argument that ends
in a backslash.
This addresses CVE-2019-1350.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Previously, this function was written without focusing on speed,
intending to make reviewing the code as easy as possible, to avoid any
bugs in this critical code.
Turns out: we can do much better on both accounts. With this patch, we
make it as fast as this developer can make it go:
- We avoid the call to `is_dir_sep()` and make all the character
comparisons explicit.
- We avoid the cost of calling `strncasecmp()` and unroll the test for
`.git` and `git~1`, not even using `tolower()` because it is faster to
compare against two constant values.
- We look for `.git` and `.git~1` first thing, and return early if not
found.
- We also avoid calling a separate function for detecting chains of
spaces and periods.
Each of these improvements has a noticeable impact on the speed of
`is_ntfs_dotgit()`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
path: safeguard `.git` against NTFS Alternate Streams Accesses
Probably inspired by HFS' resource streams, NTFS supports "Alternate
Data Streams": by appending `:<stream-name>` to the file name,
information in addition to the file contents can be written and read,
information that is copied together with the file (unless copied to a
non-NTFS location).
These Alternate Data Streams are typically used for things like marking
an executable as having just been downloaded from the internet (and
hence not necessarily being trustworthy).
In addition to a stream name, a stream type can be appended, like so:
`:<stream-name>:<stream-type>`. Unless specified, the default stream
type is `$DATA` for files and `$INDEX_ALLOCATION` for directories. In
other words, `.git::$INDEX_ALLOCATION` is a valid way to reference the
`.git` directory!
In our work in Git v2.2.1 to protect Git on NTFS drives under
`core.protectNTFS`, we focused exclusively on NTFS short names, unaware
of the fact that NTFS Alternate Data Streams offer a similar attack
vector.
Let's fix this.
Seeing as it is better to be safe than sorry, we simply disallow paths
referring to *any* NTFS Alternate Data Stream of `.git`, not just
`::$INDEX_ALLOCATION`. This also simplifies the implementation.
This closes CVE-2019-1352.
Further reading about NTFS Alternate Data Streams:
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/c54dec26-1551-4d3a-a0ea-4fa40f848eb3
Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The config setting `core.protectNTFS` is specifically designed to work
not only on Windows, but anywhere, to allow for repositories hosted on,
say, Linux servers to be protected against NTFS-specific attack vectors.
As a consequence, `is_ntfs_dotgit()` manually splits backslash-separated
paths (but does not do the same for paths separated by forward slashes),
under the assumption that the backslash might not be a valid directory
separator on the _current_ Operating System.
However, the two callers, `verify_path()` and `fsck_tree()`, are
supposed to feed only individual path segments to the `is_ntfs_dotgit()`
function.
This causes a lot of duplicate scanning (and very inefficient scanning,
too, as the inner loop of `is_ntfs_dotgit()` was optimized for
readability rather than for speed.
Let's simplify the design of `is_ntfs_dotgit()` by putting the burden of
splitting the paths by backslashes as directory separators on the
callers of said function.
Consequently, the `verify_path()` function, which already splits the
path by directory separators, now treats backslashes as directory
separators _explicitly_ when `core.protectNTFS` is turned on, even on
platforms where the backslash is _not_ a directory separator.
Note that we have to repeat some code in `verify_path()`: if the
backslash is not a directory separator on the current Operating System,
we want to allow file names like `\`, but we _do_ want to disallow paths
that are clearly intended to cause harm when the repository is cloned on
Windows.
The `fsck_tree()` function (the other caller of `is_ntfs_dotgit()`) now
needs to look for backslashes in tree entries' names specifically when
`core.protectNTFS` is turned on. While it would be tempting to
completely disallow backslashes in that case (much like `fsck` reports
names containing forward slashes as "full paths"), this would be
overzealous: when `core.protectNTFS` is turned on in a non-Windows
setup, backslashes are perfectly valid characters in file names while we
_still_ want to disallow tree entries that are clearly designed to
exploit NTFS-specific behavior.
This simplification will make subsequent changes easier to implement,
such as turning `core.protectNTFS` on by default (not only on Windows)
or protecting against attack vectors involving NTFS Alternate Data
Streams.
Incidentally, this change allows for catching malicious repositories
that contain tree entries of the form `dir\.gitmodules` already on the
server side rather than only on the client side (and previously only on
Windows): in contrast to `is_ntfs_dotgit()`, the
`is_ntfs_dotgitmodules()` function already expects the caller to split
the paths by directory separators.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
test-path-utils: offer to run a protectNTFS/protectHFS benchmark
In preparation to flipping the default on `core.protectNTFS`, let's have
some way to measure the speed impact of this config setting reliably
(and for comparison, the `core.protectHFS` config setting).
where the arguments are an optional number of file names to test with,
optionally followed by minimum and maximum length of the random file
names. The default values are one million, 3 and 20, respectively.
Just like `sqrti()` in `bisect.c`, we introduce a very simple function
to approximation the square root of a given value, in order to avoid
having to introduce the first user of `<math.h>` in Git's source code.
Note: this is _not_ implemented as a Unix shell script in t/perf/
because we really care about _very_ precise timings here, and Unix shell
scripts are simply unsuited for precise and consistent benchmarking.
Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This patch series fixes an issue where Git could formerly have been
tricked into creating a `.git` file with an unexpected (and therefore
unprotected) NTFS short name.
Incidentally, it also fixes an issue where a tree entry containing a
backslash could be tricked into following a symbolic link, i.e. Git
could be tricked into writing files outside the worktree.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The `--export-marks` option of `git fast-import` is exposed also via the
in-stream command `feature export-marks=...` and it allows overwriting
arbitrary paths.
This topic branch prevents the in-stream version, to prevent arbitrary
file accesses by `git fast-import` streams coming from untrusted sources
(e.g. in remote helpers that are based on `git fast-import`).
This fixes CVE-2019-1348.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
mingw: disallow backslash characters in tree objects' file names
The backslash character is not a valid part of a file name on Windows.
Hence it is dangerous to allow writing files that were unpacked from
tree objects, when the stored file name contains a backslash character:
it will be misinterpreted as directory separator.
This not only causes ambiguity when a tree contains a blob `a\b` and a
tree `a` that contains a blob `b`, but it also can be used as part of an
attack vector to side-step the careful protections against writing into
the `.git/` directory during a clone of a maliciously-crafted
repository.
Let's prevent that, addressing CVE-2019-1354.
Note: we guard against backslash characters in tree objects' file names
_only_ on Windows (because on other platforms, even on those where NTFS
volumes can be mounted, the backslash character is _not_ a directory
separator), and _only_ when `core.protectNTFS = true` (because users
might need to generate tree objects for other platforms, of course
without touching the worktree, e.g. using `git update-index
--cacheinfo`).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
clone --recurse-submodules: prevent name squatting on Windows
In addition to preventing `.git` from being tracked by Git, on Windows
we also have to prevent `git~1` from being tracked, as the default NTFS
short name (also known as the "8.3 filename") for the file name `.git`
is `git~1`, otherwise it would be possible for malicious repositories to
write directly into the `.git/` directory, e.g. a `post-checkout` hook
that would then be executed _during_ a recursive clone.
When we implemented appropriate protections in 2b4c6efc821 (read-cache:
optionally disallow NTFS .git variants, 2014-12-16), we had analyzed
carefully that the `.git` directory or file would be guaranteed to be
the first directory entry to be written. Otherwise it would be possible
e.g. for a file named `..git` to be assigned the short name `git~1` and
subsequently, the short name generated for `.git` would be `git~2`. Or
`git~3`. Or even `~9999999` (for a detailed explanation of the lengths
we have to go to protect `.gitmodules`, see the commit message of e7cb0b4455c (is_ntfs_dotgit: match other .git files, 2018-05-11)).
However, by exploiting two issues (that will be addressed in a related
patch series close by), it is currently possible to clone a submodule
into a non-empty directory:
- On Windows, file names cannot end in a space or a period (for
historical reasons: the period separating the base name from the file
extension was not actually written to disk, and the base name/file
extension was space-padded to the full 8/3 characters, respectively).
Helpfully, when creating a directory under the name, say, `sub.`, that
trailing period is trimmed automatically and the actual name on disk
is `sub`.
This means that while Git thinks that the submodule names `sub` and
`sub.` are different, they both access `.git/modules/sub/`.
- While the backslash character is a valid file name character on Linux,
it is not so on Windows. As Git tries to be cross-platform, it
therefore allows backslash characters in the file names stored in tree
objects.
Which means that it is totally possible that a submodule `c` sits next
to a file `c\..git`, and on Windows, during recursive clone a file
called `..git` will be written into `c/`, of course _before_ the
submodule is cloned.
Note that the actual exploit is not quite as simple as having a
submodule `c` next to a file `c\..git`, as we have to make sure that the
directory `.git/modules/b` already exists when the submodule is checked
out, otherwise a different code path is taken in `module_clone()` that
does _not_ allow a non-empty submodule directory to exist already.
Even if we will address both issues nearby (the next commit will
disallow backslash characters in tree entries' file names on Windows,
and another patch will disallow creating directories/files with trailing
spaces or periods), it is a wise idea to defend in depth against this
sort of attack vector: when submodules are cloned recursively, we now
_require_ the directory to be empty, addressing CVE-2019-1349.
Note: the code path we patch is shared with the code path of `git
submodule update --init`, which must not expect, in general, that the
directory is empty. Hence we have to introduce the new option
`--force-init` and hand it all the way down from `git submodule` to the
actual `git submodule--helper` process that performs the initial clone.
Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Jeff King [Thu, 29 Aug 2019 19:08:42 +0000 (15:08 -0400)]
fast-import: disallow "feature import-marks" by default
As with export-marks in the previous commit, import-marks can access the
filesystem. This is significantly less dangerous than export-marks
because it only involves reading from arbitrary paths, rather than
writing them. However, it could still be surprising and have security
implications (e.g., exfiltrating data from a service that accepts
fast-import streams).
Let's lump it (and its "if-exists" counterpart) in with export-marks,
and enable the in-stream version only if --allow-unsafe-features is set.
Jeff King [Thu, 29 Aug 2019 18:37:26 +0000 (14:37 -0400)]
fast-import: disallow "feature export-marks" by default
The fast-import stream command "feature export-marks=<path>" lets the
stream write marks to an arbitrary path. This may be surprising if you
are running fast-import against an untrusted input (which otherwise
cannot do anything except update Git objects and refs).
Let's disallow the use of this feature by default, and provide a
command-line option to re-enable it (you can always just use the
command-line --export-marks as well, but the in-stream version provides
an easy way for exporters to control the process).
This is a backwards-incompatible change, since the default is flipping
to the new, safer behavior. However, since the main users of the
in-stream versions would be import/export-based remote helpers, and
since we trust remote helpers already (which are already running
arbitrary code), we'll pass the new option by default when reading a
remote helper's stream. This should minimize the impact.
Note that the implementation isn't totally simple, as we have to work
around the fact that fast-import doesn't parse its command-line options
until after it has read any "feature" lines from the stream. This is how
it lets command-line options override in-stream. But in our case, it's
important to parse the new --allow-unsafe-features first.
There are three options for resolving this:
1. Do a separate "early" pass over the options. This is easy for us to
do because there are no command-line options that allow the
"unstuck" form (so there's no chance of us mistaking an argument
for an option), though it does introduce a risk of incorrect
parsing later (e.g,. if we convert to parse-options).
2. Move the option parsing phase back to the start of the program, but
teach the stream-reading code never to override an existing value.
This is tricky, because stream "feature" lines override each other
(meaning we'd have to start tracking the source for every option).
3. Accept that we might parse a "feature export-marks" line that is
forbidden, as long we don't _act_ on it until after we've parsed
the command line options.
This would, in fact, work with the current code, but only because
the previous patch fixed the export-marks parser to avoid touching
the filesystem.
So while it works, it does carry risk of somebody getting it wrong
in the future in a rather subtle and unsafe way.
I've gone with option (1) here as simple, safe, and unlikely to cause
regressions.
Jeff King [Thu, 29 Aug 2019 17:33:48 +0000 (13:33 -0400)]
fast-import: delay creating leading directories for export-marks
When we parse the --export-marks option, we don't immediately open the
file, but we do create any leading directories. This can be especially
confusing when a command-line option overrides an in-stream one, in
which case we'd create the leading directory for the in-stream file,
even though we never actually write the file.
Let's instead create the directories just before opening the file, which
means we'll create only useful directories. Note that this could change
the handling of relative paths if we chdir() in between, but we don't
actually do so; the only permanent chdir is from setup_git_directory()
which runs before either code path (potentially we should take the
pre-setup dir into account to avoid surprising the user, but that's an
orthogonal change).
The test just adapts the existing "override" test to use paths with
leading directories. This checks both that the correct directory is
created (which worked before but was not tested), and that the
overridden one is not (our new fix here).
While we're here, let's also check the error result of
safe_create_leading_directories(). We'd presumably notice any failure
immediately after when we try to open the file itself, but we can give a
more specific error message in this case.
Jeff King [Thu, 29 Aug 2019 17:07:04 +0000 (13:07 -0400)]
fast-import: stop creating leading directories for import-marks
When asked to import marks from "subdir/file.marks", we create the
leading directory "subdir" if it doesn't exist. This makes no sense for
importing marks, where we only ever open the path for reading.
Most of the time this would be a noop, since if the marks file exists,
then the leading directories exist, too. But if it doesn't (e.g.,
because --import-marks-if-exists was used), then we'd create the useless
directory.
This dates back to 580d5f83e7 (fast-import: always create marks_file
directories, 2010-03-29). Even then it was useless, so it seems to have
been added in error alongside the --export-marks case (which _is_
helpful).
Jeff King [Thu, 29 Aug 2019 15:25:45 +0000 (11:25 -0400)]
fast-import: tighten parsing of boolean command line options
We parse options like "--max-pack-size=" using skip_prefix(), which
makes sense to get at the bytes after the "=". However, we also parse
"--quiet" and "--stats" with skip_prefix(), which allows things like
"--quiet-nonsense" to behave like "--quiet".
This was a mistaken conversion in 0f6927c229 (fast-import: put option
parsing code in separate functions, 2009-12-04). Let's tighten this to
an exact match, which was the original intent.
Jeff King [Thu, 29 Aug 2019 17:43:23 +0000 (13:43 -0400)]
t9300: create marks files for double-import-marks test
Our tests confirm that providing two "import-marks" options in a
fast-import stream is an error. However, the invoked command would fail
even without covering this case, because the marks files themselves do
not actually exist. Let's create the files to make sure we fail for the
right reason (we actually do, because the option parsing happens before
we open anything, but this future-proofs our test).
Jeff King [Mon, 24 Sep 2018 08:42:19 +0000 (04:42 -0400)]
fsck: detect submodule paths starting with dash
As with urls, submodule paths with dashes are ignored by
git, but may end up confusing older versions. Detecting them
via fsck lets us prevent modern versions of git from being a
vector to spread broken .gitmodules to older versions.
Compared to blocking leading-dash urls, though, this
detection may be less of a good idea:
1. While such paths provide confusing and broken results,
they don't seem to actually work as option injections
against anything except "cd". In particular, the
submodule code seems to canonicalize to an absolute
path before running "git clone" (so it passes
/your/clone/-sub).
2. It's more likely that we may one day make such names
actually work correctly. Even after we revert this fsck
check, it will continue to be a hassle until hosting
servers are all updated.
On the other hand, it's not entirely clear that the behavior
in older versions is safe. And if we do want to eventually
allow this, we may end up doing so with a special syntax
anyway (e.g., writing "./-sub" in the .gitmodules file, and
teaching the submodule code to canonicalize it when
comparing).
So on balance, this is probably a good protection.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Mon, 24 Sep 2018 08:37:17 +0000 (04:37 -0400)]
fsck: detect submodule urls starting with dash
Urls with leading dashes can cause mischief on older
versions of Git. We should detect them so that they can be
rejected by receive.fsckObjects, preventing modern versions
of git from being a vector by which attacks can spread.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 27 Sep 2018 18:41:02 +0000 (11:41 -0700)]
Sync with 2.16.5
* maint-2.16:
Git 2.16.5
Git 2.15.3
Git 2.14.5
submodule-config: ban submodule paths that start with a dash
submodule-config: ban submodule urls that start with dash
submodule--helper: use "--" to signal end of clone options
Junio C Hamano [Thu, 27 Sep 2018 18:35:43 +0000 (11:35 -0700)]
Sync with 2.15.3
* maint-2.15:
Git 2.15.3
Git 2.14.5
submodule-config: ban submodule paths that start with a dash
submodule-config: ban submodule urls that start with dash
submodule--helper: use "--" to signal end of clone options
Junio C Hamano [Thu, 27 Sep 2018 18:20:22 +0000 (11:20 -0700)]
Sync with Git 2.14.4
* maint-2.14:
Git 2.14.5
submodule-config: ban submodule paths that start with a dash
submodule-config: ban submodule urls that start with dash
submodule--helper: use "--" to signal end of clone options
Jeff King [Mon, 24 Sep 2018 08:39:55 +0000 (04:39 -0400)]
submodule-config: ban submodule paths that start with a dash
We recently banned submodule urls that look like
command-line options. This is the matching change to ban
leading-dash paths.
As with the urls, this should not break any use cases that
currently work. Even with our "--" separator passed to
git-clone, git-submodule.sh gets confused. Without the code
portion of this patch, the clone of "-sub" added in t7417
would yield results like:
/path/to/git-submodule: 410: cd: Illegal option -s
/path/to/git-submodule: 417: cd: Illegal option -s
/path/to/git-submodule: 410: cd: Illegal option -s
/path/to/git-submodule: 417: cd: Illegal option -s
Fetched in submodule path '-sub', but it did not contain b56243f8f4eb91b2f1f8109452e659f14dd3fbe4. Direct fetching of that commit failed.
Moreover, naively adding such a submodule doesn't work:
$ git submodule add $url -sub
The following path is ignored by one of your .gitignore files:
-sub
even though there is no such ignore pattern (the test script
hacks around this with a well-placed "git mv").
Unlike leading-dash urls, though, it's possible that such a
path _could_ be useful if we eventually made it work. So
this commit should be seen not as recommending a particular
policy, but rather temporarily closing off a broken and
possibly dangerous code-path. We may revisit this decision
later.
There are two minor differences to the tests in t7416 (that
covered urls):
1. We don't have a "./-sub" escape hatch to make this
work, since the submodule code expects to be able to
match canonical index names to the path field (so you
are free to add submodule config with that path, but we
would never actually use it, since an index entry would
never start with "./").
2. After this patch, cloning actually succeeds. Since we
ignore the submodule.*.path value, we fail to find a
config stanza for our submodule at all, and simply
treat it as inactive. We still check for the "ignoring"
message.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Mon, 24 Sep 2018 08:36:30 +0000 (04:36 -0400)]
submodule-config: ban submodule urls that start with dash
The previous commit taught the submodule code to invoke our
"git clone $url $path" with a "--" separator so that we
aren't confused by urls or paths that start with dashes.
However, that's just one code path. It's not clear if there
are others, and it would be an easy mistake to add one in
the future. Moreover, even with the fix in the previous
commit, it's quite hard to actually do anything useful with
such an entry. Any url starting with a dash must fall into
one of three categories:
- it's meant as a file url, like "-path". But then any
clone is not going to have the matching path, since it's
by definition relative inside the newly created clone. If
you spell it as "./-path", the submodule code sees the
"/" and translates this to an absolute path, so it at
least works (assuming the receiver has the same
filesystem layout as you). But that trick does not apply
for a bare "-path".
- it's meant as an ssh url, like "-host:path". But this
already doesn't work, as we explicitly disallow ssh
hostnames that begin with a dash (to avoid option
injection against ssh).
- it's a remote-helper scheme, like "-scheme::data". This
_could_ work if the receiver bends over backwards and
creates a funny-named helper like "git-remote--scheme".
But normally there would not be any helper that matches.
Since such a url does not work today and is not likely to do
anything useful in the future, let's simply disallow them
entirely. That protects the existing "git clone" path (in a
belt-and-suspenders way), along with any others that might
exist.
Our tests cover two cases:
1. A file url with "./" continues to work, showing that
there's an escape hatch for people with truly silly
repo names.
2. A url starting with "-" is rejected.
Note that we expect case (2) to fail, but it would have done
so even without this commit, for the reasons given above.
So instead of just expecting failure, let's also check for
the magic word "ignoring" on stderr. That lets us know that
we failed for the right reason.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Mon, 24 Sep 2018 08:32:15 +0000 (04:32 -0400)]
submodule--helper: use "--" to signal end of clone options
When we clone a submodule, we call "git clone $url $path".
But there's nothing to say that those components can't begin
with a dash themselves, confusing git-clone into thinking
they're options. Let's pass "--" to make it clear what we
expect.
There's no test here, because it's actually quite hard to
make these names work, even with "git clone" parsing them
correctly. And we're going to restrict these cases even
further in future commits. So we'll leave off testing until
then; this is just the minimal fix to prevent us from doing
something stupid with a badly formed entry.
Reported-by: joernchen <joernchen@phenoelit.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 22 May 2018 05:25:26 +0000 (14:25 +0900)]
Sync with Git 2.16.4
* maint-2.16:
Git 2.16.4
Git 2.15.2
Git 2.14.4
Git 2.13.7
verify_path: disallow symlinks in .gitmodules
update-index: stat updated files earlier
verify_dotfile: mention case-insensitivity in comment
verify_path: drop clever fallthrough
skip_prefix: add case-insensitive variant
is_{hfs,ntfs}_dotgitmodules: add tests
is_ntfs_dotgit: match other .git files
is_hfs_dotgit: match other .git files
is_ntfs_dotgit: use a size_t for traversing string
submodule-config: verify submodule names as paths
Junio C Hamano [Tue, 22 May 2018 05:18:06 +0000 (14:18 +0900)]
Sync with Git 2.15.2
* maint-2.15:
Git 2.15.2
Git 2.14.4
Git 2.13.7
verify_path: disallow symlinks in .gitmodules
update-index: stat updated files earlier
verify_dotfile: mention case-insensitivity in comment
verify_path: drop clever fallthrough
skip_prefix: add case-insensitive variant
is_{hfs,ntfs}_dotgitmodules: add tests
is_ntfs_dotgit: match other .git files
is_hfs_dotgit: match other .git files
is_ntfs_dotgit: use a size_t for traversing string
submodule-config: verify submodule names as paths
Junio C Hamano [Tue, 22 May 2018 05:15:14 +0000 (14:15 +0900)]
Sync with Git 2.14.4
* maint-2.14:
Git 2.14.4
Git 2.13.7
verify_path: disallow symlinks in .gitmodules
update-index: stat updated files earlier
verify_dotfile: mention case-insensitivity in comment
verify_path: drop clever fallthrough
skip_prefix: add case-insensitive variant
is_{hfs,ntfs}_dotgitmodules: add tests
is_ntfs_dotgit: match other .git files
is_hfs_dotgit: match other .git files
is_ntfs_dotgit: use a size_t for traversing string
submodule-config: verify submodule names as paths
Junio C Hamano [Tue, 22 May 2018 05:10:49 +0000 (14:10 +0900)]
Sync with Git 2.13.7
* maint-2.13:
Git 2.13.7
verify_path: disallow symlinks in .gitmodules
update-index: stat updated files earlier
verify_dotfile: mention case-insensitivity in comment
verify_path: drop clever fallthrough
skip_prefix: add case-insensitive variant
is_{hfs,ntfs}_dotgitmodules: add tests
is_ntfs_dotgit: match other .git files
is_hfs_dotgit: match other .git files
is_ntfs_dotgit: use a size_t for traversing string
submodule-config: verify submodule names as paths
Junio C Hamano [Tue, 22 May 2018 04:48:26 +0000 (13:48 +0900)]
Merge branch 'jk/submodule-fix-loose' into maint-2.13
* jk/submodule-fix-loose:
verify_path: disallow symlinks in .gitmodules
update-index: stat updated files earlier
verify_dotfile: mention case-insensitivity in comment
verify_path: drop clever fallthrough
skip_prefix: add case-insensitive variant
is_{hfs,ntfs}_dotgitmodules: add tests
is_ntfs_dotgit: match other .git files
is_hfs_dotgit: match other .git files
is_ntfs_dotgit: use a size_t for traversing string
submodule-config: verify submodule names as paths
Jeff King [Sat, 5 May 2018 00:03:35 +0000 (20:03 -0400)]
fsck: complain when .gitmodules is a symlink
We've recently forbidden .gitmodules to be a symlink in
verify_path(). And it's an easy way to circumvent our fsck
checks for .gitmodules content. So let's complain when we
see it.
Jeff King [Fri, 4 May 2018 23:45:01 +0000 (19:45 -0400)]
index-pack: check .gitmodules files with --strict
Now that the internal fsck code has all of the plumbing we
need, we can start checking incoming .gitmodules files.
Naively, it seems like we would just need to add a call to
fsck_finish() after we've processed all of the objects. And
that would be enough to cover the initial test included
here. But there are two extra bits:
1. We currently don't bother calling fsck_object() at all
for blobs, since it has traditionally been a noop. We'd
actually catch these blobs in fsck_finish() at the end,
but it's more efficient to check them when we already
have the object loaded in memory.
2. The second pass done by fsck_finish() needs to access
the objects, but we're actually indexing the pack in
this process. In theory we could give the fsck code a
special callback for accessing the in-pack data, but
it's actually quite tricky:
a. We don't have an internal efficient index mapping
oids to packfile offsets. We only generate it on
the fly as part of writing out the .idx file.
b. We'd still have to reconstruct deltas, which means
we'd basically have to replicate all of the
reading logic in packfile.c.
Instead, let's avoid running fsck_finish() until after
we've written out the .idx file, and then just add it
to our internal packed_git list.
This does mean that the objects are "in the repository"
before we finish our fsck checks. But unpack-objects
already exhibits this same behavior, and it's an
acceptable tradeoff here for the same reason: the
quarantine mechanism means that pushes will be
fully protected.
In addition to a basic push test in t7415, we add a sneaky
pack that reverses the usual object order in the pack,
requiring that index-pack access the tree and blob during
the "finish" step.
This already works for unpack-objects (since it will have
written out loose objects), but we'll check it with this
sneaky pack for good measure.
Jeff King [Fri, 4 May 2018 23:40:08 +0000 (19:40 -0400)]
unpack-objects: call fsck_finish() after fscking objects
As with the previous commit, we must call fsck's "finish"
function in order to catch any queued objects for
.gitmodules checks.
This second pass will be able to access any incoming
objects, because we will have exploded them to loose objects
by now.
This isn't quite ideal, because it means that bad objects
may have been written to the object database (and a
subsequent operation could then reference them, even if the
other side doesn't send the objects again). However, this is
sufficient when used with receive.fsckObjects, since those
loose objects will all be placed in a temporary quarantine
area that will get wiped if we find any problems.
Jeff King [Wed, 2 May 2018 21:20:35 +0000 (17:20 -0400)]
fsck: call fsck_finish() after fscking objects
Now that the internal fsck code is capable of checking
.gitmodules files, we just need to teach its callers to use
the "finish" function to check any queued objects.
With this, we can now catch the malicious case in t7415 with
git-fsck.
Jeff King [Wed, 2 May 2018 21:25:27 +0000 (17:25 -0400)]
fsck: check .gitmodules content
This patch detects and blocks submodule names which do not
match the policy set forth in submodule-config. These should
already be caught by the submodule code itself, but putting
the check here means that newer versions of Git can protect
older ones from malicious entries (e.g., a server with
receive.fsckObjects will block the objects, protecting
clients which fetch from it).
As a side effect, this means fsck will also complain about
.gitmodules files that cannot be parsed (or were larger than
core.bigFileThreshold).
Jeff King [Mon, 14 May 2018 16:22:48 +0000 (12:22 -0400)]
fsck: handle promisor objects in .gitmodules check
If we have a tree that points to a .gitmodules blob but
don't have that blob, we can't check its contents. This
produces an fsck error when we encounter it.
But in the case of a promisor object, this absence is
expected, and we must not complain. Note that this can
technically circumvent our transfer.fsckObjects check.
Imagine a client fetches a tree, but not the matching
.gitmodules blob. An fsck of the incoming objects will show
that we don't have enough information. Later, we do fetch
the actual blob. But we have no idea that it's a .gitmodules
file.
The only ways to get around this would be to re-scan all of
the existing trees whenever new ones enter (which is
expensive), or to somehow persist the gitmodules_found set
between fsck runs (which is complicated).
In practice, it's probably OK to ignore the problem. Any
repository which has all of the objects (including the one
serving the promisor packs) can perform the checks. Since
promisor packs are inherently about a hierarchical topology
in which clients rely on upstream repositories, those
upstream repositories can protect all of their downstream
clients from broken objects.
Jeff King [Wed, 2 May 2018 21:20:08 +0000 (17:20 -0400)]
fsck: detect gitmodules files
In preparation for performing fsck checks on .gitmodules
files, this commit plumbs in the actual detection of the
files. Note that unlike most other fsck checks, this cannot
be a property of a single object: we must know that the
object is found at a ".gitmodules" path at the root tree of
a commit.
Since the fsck code only sees one object at a time, we have
to mark the related objects to fit the puzzle together. When
we see a commit we mark its tree as a root tree, and when
we see a root tree with a .gitmodules file, we mark the
corresponding blob to be checked.
In an ideal world, we'd check the objects in topological
order: commits followed by trees followed by blobs. In that
case we can avoid ever loading an object twice, since all
markings would be complete by the time we get to the marked
objects. And indeed, if we are checking a single packfile,
this is the order in which Git will generally write the
objects. But we can't count on that:
1. git-fsck may show us the objects in arbitrary order
(loose objects are fed in sha1 order, but we may also
have multiple packs, and we process each pack fully in
sequence).
2. The type ordering is just what git-pack-objects happens
to write now. The pack format does not require a
specific order, and it's possible that future versions
of Git (or a custom version trying to fool official
Git's fsck checks!) may order it differently.
3. We may not even be fscking all of the relevant objects
at once. Consider pushing with transfer.fsckObjects,
where one push adds a blob at path "foo", and then a
second push adds the same blob at path ".gitmodules".
The blob is not part of the second push at all, but we
need to mark and check it.
So in the general case, we need to make up to three passes
over the objects: once to make sure we've seen all commits,
then once to cover any trees we might have missed, and then
a final pass to cover any .gitmodules blobs we found in the
second pass.
We can simplify things a bit by loosening the requirement
that we find .gitmodules only at root trees. Technically
a file like "subdir/.gitmodules" is not parsed by Git, but
it's not unreasonable for us to declare that Git is aware of
all ".gitmodules" files and make them eligible for checking.
That lets us drop the root-tree requirement, which
eliminates one pass entirely. And it makes our worst case
much better: instead of potentially queueing every root tree
to be re-examined, the worst case is that we queue each
unique .gitmodules blob for a second look.
This patch just adds the boilerplate to find .gitmodules
files. The actual content checks will come in a subsequent
commit.
Jeff King [Wed, 2 May 2018 19:44:51 +0000 (15:44 -0400)]
fsck: actually fsck blob data
Because fscking a blob has always been a noop, we didn't
bother passing around the blob data. In preparation for
content-level checks, let's fix up a few things:
1. The fsck_object() function just returns success for any
blob. Let's a noop fsck_blob(), which we can fill in
with actual logic later.
2. The fsck_loose() function in builtin/fsck.c
just threw away blob content after loading it. Let's
hold onto it until after we've called fsck_object().
The easiest way to do this is to just drop the
parse_loose_object() helper entirely. Incidentally,
this also fixes a memory leak: if we successfully
loaded the object data but did not parse it, we would
have left the function without freeing it.
3. When fsck_loose() loads the object data, it
does so with a custom read_loose_object() helper. This
function streams any blobs, regardless of size, under
the assumption that we're only checking the sha1.
Instead, let's actually load blobs smaller than
big_file_threshold, as the normal object-reading
code-paths would do. This lets us fsck small files, and
a NULL return is an indication that the blob was so big
that it needed to be streamed, and we can pass that
information along to fsck_blob().
Jeff King [Sun, 13 May 2018 16:35:37 +0000 (12:35 -0400)]
fsck: simplify ".git" check
There's no need for us to manually check for ".git"; it's a
subset of the other filesystem-specific tests. Dropping it
makes our code slightly shorter. More importantly, the
existing code may make a reader wonder why ".GIT" is not
covered here, and whether that is a bug (it isn't, as it's
also covered in the filesystem-specific tests).
Jeff King [Wed, 2 May 2018 20:37:09 +0000 (16:37 -0400)]
index-pack: make fsck error message more specific
If fsck reports an error, we say only "Error in object".
This isn't quite as bad as it might seem, since the fsck
code would have dumped some errors to stderr already. But it
might help to give a little more context. The earlier output
would not have even mentioned "fsck", and that may be a clue
that the "fsck.*" or "*.fsckObjects" config may be relevant.
Jeff King [Tue, 15 May 2018 14:15:18 +0000 (10:15 -0400)]
Merge branch 'jk/submodule-name-verify-fix' into jk/submodule-name-verify-fsck
* jk/submodule-name-verify-fix:
verify_path: disallow symlinks in .gitmodules
update-index: stat updated files earlier
verify_path: drop clever fallthrough
skip_prefix: add icase-insensitive variant
is_{hfs,ntfs}_dotgitmodules: add tests
path: match NTFS short names for more .git files
is_hfs_dotgit: match other .git files
is_ntfs_dotgit: use a size_t for traversing string
submodule-config: verify submodule names as paths
Note that this includes two bits of evil-merge:
- there's a new call to verify_path() that doesn't actually
have a mode available. It should be OK to pass "0" here,
since we're just manipulating the untracked cache, not an
actual index entry.
- the lstat() in builtin/update-index.c:update_one() needs
to be updated to handle the fsmonitor case (without this
it still behaves correctly, but does an unnecessary
lstat).
Jeff King [Sat, 5 May 2018 00:03:35 +0000 (20:03 -0400)]
verify_path: disallow symlinks in .gitmodules
There are a few reasons it's not a good idea to make
.gitmodules a symlink, including:
1. It won't be portable to systems without symlinks.
2. It may behave inconsistently, since Git may look at
this file in the index or a tree without bothering to
resolve any symbolic links. We don't do this _yet_, but
the config infrastructure is there and it's planned for
the future.
With some clever code, we could make (2) work. And some
people may not care about (1) if they only work on one
platform. But there are a few security reasons to simply
disallow it:
a. A symlinked .gitmodules file may circumvent any fsck
checks of the content.
b. Git may read and write from the on-disk file without
sanity checking the symlink target. So for example, if
you link ".gitmodules" to "../oops" and run "git
submodule add", we'll write to the file "oops" outside
the repository.
Again, both of those are problems that _could_ be solved
with sufficient code, but given the complications in (1) and
(2), we're better off just outlawing it explicitly.
Note the slightly tricky call to verify_path() in
update-index's update_one(). There we may not have a mode if
we're not updating from the filesystem (e.g., we might just
be removing the file). Passing "0" as the mode there works
fine; since it's not a symlink, we'll just skip the extra
checks.
Jeff King [Mon, 14 May 2018 15:00:56 +0000 (11:00 -0400)]
update-index: stat updated files earlier
In the update_one(), we check verify_path() on the proposed
path before doing anything else. In preparation for having
verify_path() look at the file mode, let's stat the file
earlier, so we can check the mode accurately.
This is made a bit trickier by the fact that this function
only does an lstat in a few code paths (the ones that flow
down through process_path()). So we can speculatively do the
lstat() here and pass the results down, and just use a dummy
mode for cases where we won't actually be updating the index
from the filesystem.
Jeff King [Sun, 13 May 2018 17:00:23 +0000 (13:00 -0400)]
verify_path: drop clever fallthrough
We check ".git" and ".." in the same switch statement, and
fall through the cases to share the end-of-component check.
While this saves us a line or two, it makes modifying the
function much harder. Let's just write it out.
Jeff King [Sun, 13 May 2018 16:57:14 +0000 (12:57 -0400)]
skip_prefix: add case-insensitive variant
We have the convenient skip_prefix() helper, but if you want
to do case-insensitive matching, you're stuck doing it by
hand. We could add an extra parameter to the function to
let callers ask for this, but the function is small and
somewhat performance-critical. Let's just re-implement it
for the case-insensitive version.
When we started to catch NTFS short names that clash with .git, we only
looked for GIT~1. This is sufficient because we only ever clone into an
empty directory, so .git is guaranteed to be the first subdirectory or
file in that directory.
However, even with a fresh clone, .gitmodules is *not* necessarily the
first file to be written that would want the NTFS short name GITMOD~1: a
malicious repository can add .gitmodul0000 and friends, which sorts
before `.gitmodules` and is therefore checked out *first*. For that
reason, we have to test not only for ~1 short names, but for others,
too.
It's hard to just adapt the existing checks in is_ntfs_dotgit(): since
Windows 2000 (i.e., in all Windows versions still supported by Git),
NTFS short names are only generated in the <prefix>~<number> form up to
number 4. After that, a *different* prefix is used, calculated from the
long file name using an undocumented, but stable algorithm.
For example, the short name of .gitmodules would be GITMOD~1, but if it
is taken, and all of ~2, ~3 and ~4 are taken, too, the short name
GI7EBA~1 will be used. From there, collisions are handled by
incrementing the number, shortening the prefix as needed (until ~9999999
is reached, in which case NTFS will not allow the file to be created).
We'd also want to handle .gitignore and .gitattributes, which suffer
from a similar problem, using the fall-back short names GI250A~1 and
GI7D29~1, respectively.
To accommodate for that, we could reimplement the hashing algorithm, but
it is just safer and simpler to provide the known prefixes. This
algorithm has been reverse-engineered and described at
https://usn.pw/blog/gen/2015/06/09/filenames/, which is defunct but
still available via https://web.archive.org/.
These can be recomputed by running the following Perl script:
Jeff King [Wed, 2 May 2018 19:23:45 +0000 (15:23 -0400)]
is_hfs_dotgit: match other .git files
Both verify_path() and fsck match ".git", ".GIT", and other
variants specific to HFS+. Let's allow matching other
special files like ".gitmodules", which we'll later use to
enforce extra restrictions via verify_path() and fsck.
Jeff King [Sun, 13 May 2018 16:09:42 +0000 (12:09 -0400)]
is_ntfs_dotgit: use a size_t for traversing string
We walk through the "name" string using an int, which can
wrap to a negative value and cause us to read random memory
before our array (e.g., by creating a tree with a name >2GB,
since "int" is still 32 bits even on most 64-bit platforms).
Worse, this is easy to trigger during the fsck_tree() check,
which is supposed to be protecting us from malicious
garbage.
Note one bit of trickiness in the existing code: we
sometimes assign -1 to "len" at the end of the loop, and
then rely on the "len++" in the for-loop's increment to take
it back to 0. This is still legal with a size_t, since
assigning -1 will turn into SIZE_MAX, which then wraps
around to 0 on increment.
Jeff King [Mon, 30 Apr 2018 07:25:25 +0000 (03:25 -0400)]
submodule-config: verify submodule names as paths
Submodule "names" come from the untrusted .gitmodules file,
but we blindly append them to $GIT_DIR/modules to create our
on-disk repo paths. This means you can do bad things by
putting "../" into the name (among other things).
Let's sanity-check these names to avoid building a path that
can be exploited. There are two main decisions:
1. What should the allowed syntax be?
It's tempting to reuse verify_path(), since submodule
names typically come from in-repo paths. But there are
two reasons not to:
a. It's technically more strict than what we need, as
we really care only about breaking out of the
$GIT_DIR/modules/ hierarchy. E.g., having a
submodule named "foo/.git" isn't actually
dangerous, and it's possible that somebody has
manually given such a funny name.
b. Since we'll eventually use this checking logic in
fsck to prevent downstream repositories, it should
be consistent across platforms. Because
verify_path() relies on is_dir_sep(), it wouldn't
block "foo\..\bar" on a non-Windows machine.
2. Where should we enforce it? These days most of the
.gitmodules reads go through submodule-config.c, so
I've put it there in the reading step. That should
cover all of the C code.
We also construct the name for "git submodule add"
inside the git-submodule.sh script. This is probably
not a big deal for security since the name is coming
from the user anyway, but it would be polite to remind
them if the name they pick is invalid (and we need to
expose the name-checker to the shell anyway for our
test scripts).
This patch issues a warning when reading .gitmodules
and just ignores the related config entry completely.
This will generally end up producing a sensible error,
as it works the same as a .gitmodules file which is
missing a submodule entry (so "submodule update" will
barf, but "git clone --recurse-submodules" will print
an error but not abort the clone.
There is one minor oddity, which is that we print the
warning once per malformed config key (since that's how
the config subsystem gives us the entries). So in the
new test, for example, the user would see three
warnings. That's OK, since the intent is that this case
should never come up outside of malicious repositories
(and then it might even benefit the user to see the
message multiple times).
Credit for finding this vulnerability and the proof of
concept from which the test script was adapted goes to
Etienne Stalmans.
add -p: fix 2.17.0-rc* regression due to moved code
Fix a regression in 88f6ffc1c2 ("add -p: only bind search key if
there's more than one hunk", 2018-02-13) which is present in
2.17.0-rc*, but not 2.16.0.
In Perl, regex variables like $1 always refer to the last regex
match. When the aforementioned change added a new regex match between
the old match and the corresponding code that was expecting $1, the $1
variable would always be undef, since the newly inserted regex match
doesn't have any captures.
As a result the "/" feature to search for a string in a hunk by regex
completely broke, on git.git:
$ perl -pi -e 's/Git/Tig/g' README.md
$ ./git --exec-path=$PWD add -p
[..]
Stage this hunk [y,n,q,a,d,j,J,g,/,s,e,?]? s
Split into 4 hunks.
[...]
Stage this hunk [y,n,q,a,d,j,J,g,/,s,e,?]? /Many
Use of uninitialized value $1 in string eq at /home/avar/g/git/git-add--interactive line 1568, <STDIN> line 1.
search for regex? Many
I.e. the initial "/regex" command wouldn't work, and would always emit
a warning and ask again for a regex, now it works as intended again.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jonathan Nieder [Wed, 28 Mar 2018 20:33:03 +0000 (13:33 -0700)]
upload-pack: disable object filtering when disabled by config
When upload-pack gained partial clone support (v2.17.0-rc0~132^2~12,
2017-12-08), it was guarded by the uploadpack.allowFilter config item
to allow server operators to control when they start supporting it.
That config item didn't go far enough, though: it controls whether the
'filter' capability is advertised, but if a (custom) client ignores
the capability advertisement and passes a filter specification anyway,
the server would handle that despite allowFilter being false.
This is particularly significant if a security bug is discovered in
this new experimental partial clone code. Installations without
uploadpack.allowFilter ought not to be affected since they don't
intend to support partial clone, but they would be swept up into being
vulnerable.
Simplify and limit the attack surface by making uploadpack.allowFilter
disable the feature, not just the advertisement of it.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>