Adam Iwaniuk and Borys Popławski discovered that an attacker can compromise the
runC host binary from inside a privileged runC container. As a result, this
could be exploited to gain root access on the host. runC is used as the default
runtime for containers with Docker, containerd, Podman, and CRI-O.
The attack can be made when attaching to a running container or when starting a
container running a specially crafted image. For example, when runC attaches
to a container the attacker can trick it into executing itself. This could be
done by replacing the target binary inside the container with a custom binary
pointing back at the runC binary itself. As an example, if the target binary
was /bin/bash, this could be replaced with an executable script specifying the
interpreter path #!/proc/self/exe (/proc/self/exe is a symbolic link created
by the kernel for every process which points to the binary that was executed
for that process). As a result, when /bin/bash is executed inside the
container, the target of /proc/self/exe is executed instead, and that target
is the runC binary on the host. The attacker can then proceed to write to the
target of /proc/self/exe to try to overwrite the runC binary on the host. In
general, however, this will not succeed as the kernel will not permit the
binary to be overwritten whilst runC is executing. To overcome this, the
attacker can
instead open a file descriptor to /proc/self/exe using the O_PATH flag and then
proceed to reopen the binary as O_WRONLY through /proc/self/fd/<nr> and try to
write to it in a busy loop from a separate process. Ultimately it will succeed
when the runC binary exits. After this the runC binary is compromised and can
be used to attack other containers or the host itself.
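As a small illustration of the /proc/self/exe link described above, the following sketch resolves the link for the calling process. The helper name resolve_own_exe() is hypothetical and is not part of runC or LXC; it only shows that the link points at the binary that was executed.

```c
#include <unistd.h>

/*
 * Resolve /proc/self/exe into buf. Returns the length of the path or -1 on
 * error. resolve_own_exe() is a hypothetical helper used for illustration.
 */
ssize_t resolve_own_exe(char *buf, size_t len)
{
	ssize_t n;

	if (len == 0)
		return -1;

	/* Leave room for the terminator: readlink() does not add one. */
	n = readlink("/proc/self/exe", buf, len - 1);
	if (n < 0)
		return -1;

	buf[n] = '\0';
	return n;
}
```

Any process that is started through a "#!/proc/self/exe" interpreter line resolves this link in its own context, which is why the link ends up pointing at the binary that performed the exec.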
This attack is only possible with privileged containers since it requires root
privilege on the host to overwrite the runC binary. Unprivileged containers
with a non-identity ID mapping do not have the permission to write to the host
binary and therefore are unaffected by this attack.
LXC is impacted in a similar manner by this vulnerability. However, as the
LXC project considers privileged containers to be unsafe, no CVE has been
assigned for this issue for LXC. Quoting from the project's security
information page (https://linuxcontainers.org/lxc/security/):
"As privileged containers are considered unsafe, we typically will not consider
new container escape exploits to be security issues worthy of a CVE and quick
fix. We will however try to mitigate those issues so that accidental damage to
the host is prevented."
To prevent this attack, LXC has been patched to create a temporary copy of the
calling binary itself when it starts or attaches to containers. To do this LXC
creates an anonymous, in-memory file using the memfd_create() system call and
copies itself into the temporary in-memory file, which is then sealed to
prevent further modifications. LXC then executes this sealed, in-memory file
instead of the original on-disk binary. Any compromising write operations from
a privileged container to the host LXC binary will then target the temporary
in-memory binary and not the on-disk host binary, preserving the integrity
of the host LXC binary. Moreover, as the temporary in-memory LXC binary is
sealed, writes to it will also fail.
Note: memfd_create() was added to the Linux kernel in the 3.17 release.
When we are running inside of a user namespace getuid() will return a
non-zero uid. So let's check euid as well to make sure we correctly drop
capabilities.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Stephen Barber [Tue, 18 Sep 2018 00:31:22 +0000 (17:31 -0700)]
attach: don't shutdown ipc socket in child
shutdown() affects sockets even across forked processes. The
attached child process doesn't have any interest in using the
IPC socket, so just close it in the child process and let the
intermediate process handle shutting it down.
This fixes a bug seen with lxc exec in crbug.com/884244
Signed-off-by: Stephen Barber <smbarber@chromium.org>
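The difference between shutdown() and close() in a forked child can be demonstrated with a socketpair. This is an illustrative sketch only and is not taken from the LXC attach code; the function name parent_can_send_after_child() is an assumption.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that either shuts down or closes its copy of a socket.
 * Returns 1 if the parent can still send on the socket afterwards, 0 if it
 * cannot, and -1 on setup errors. Hypothetical helper for illustration.
 */
int parent_can_send_after_child(int child_uses_shutdown)
{
	int fds[2];
	pid_t pid;
	ssize_t ret;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		if (child_uses_shutdown)
			shutdown(fds[0], SHUT_RDWR); /* affects the shared socket */
		else
			close(fds[0]); /* only drops the child's descriptor */
		_exit(0);
	}

	if (waitpid(pid, NULL, 0) < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	/* MSG_NOSIGNAL reports EPIPE instead of raising SIGPIPE. */
	ret = send(fds[0], "x", 1, MSG_NOSIGNAL);

	close(fds[0]);
	close(fds[1]);
	return ret == 1;
}
```

With close() in the child the parent's send succeeds, while with shutdown() in the child the parent's send fails, which matches the behaviour the commit message describes.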
Tobin C. Harding [Fri, 17 Aug 2018 04:40:45 +0000 (14:40 +1000)]
CODING_STYLE: Update section header format
Currently we use fourth-level markdown headings (####) for section
headings. We do not have levels two or three.
We can use standard incremental levels for heading adornments, i.e.
1) =========
2) ##
3) ###
etc.
Since this document is likely referenced by maintainers when guiding new
contributors it can save maintainer time to be able to quickly reference
a section in the coding style guide. If we add numbers to each heading
(like the kernel style guide) then maintainers can say:
Nice patch, please see section 3 of the coding style guide and ...
So, this patch makes two changes:
- Use incremental level heading adornments
- Add a number to each section heading
Tobin C. Harding [Fri, 17 Aug 2018 03:55:47 +0000 (13:55 +1000)]
CODING_STYLE: Remove sections implied by 'kernel style'
We explicitly state that LXC uses coding style based on Linux kernel
style. It is redundant to then document obvious, and well known, kernel
style rules. Identifier names certainly fall into this category as does
usage of braces.
Remove the sections implied by 'kernel style': naming conventions and brace
placement conventions.
Tobin C. Harding [Thu, 16 Aug 2018 23:38:48 +0000 (09:38 +1000)]
CODING_STYLE: Simplify explanation for use of 'extern'
The current explanation of the rules around usage of 'extern' is overly
verbose. It is not necessary to state that functions should be declared
in header files; the compiler already enforces this. These rules are
simple and are better described with simple statements. An example is
not necessary for such simple rules and serves only to make the document
longer.
Use two simple statements describing the rules that govern function
declaration and the usage of the 'extern' keyword.
Tobin C. Harding [Fri, 17 Aug 2018 03:46:16 +0000 (13:46 +1000)]
CODING_STYLE: Mention kernel style in introduction
Currently the coding style guide does not mention that we use kernel
coding style as a base style for LXC. We have just linked to
CODING_STYLE.md from CONTRIBUTING (which mentions use of kernel coding
style). We can increase documentation congruence and completeness by
mentioning kernel coding style guide in the introduction to our style
guide.
Add heading and introduction to coding style guide informing readers
that we follow kernel coding style as a base before explaining our style
additions.
Tobin C. Harding [Thu, 16 Aug 2018 23:19:32 +0000 (09:19 +1000)]
CONTRIBUTING: Direct readers to CODING_STYLE.md
Currently the 'Coding Style' section mentions only the kernel coding
style. We have additions on top of this outlined in CODING_STYLE.md.
We should direct readers to this document as well as the kernel docs.
Direct readers to CODING_STYLE.md in the 'Coding Style' section.
Tobin C. Harding [Fri, 17 Aug 2018 04:29:15 +0000 (14:29 +1000)]
CONTRIBUTING: Link to latest online kernel docs
Currently we link to a URL for v4.10 of the kernel docs. Since we
already mention the kernel tree we should link to the _latest_ kernel
docs online instead of a fixed past version.
Link to the latest online kernel docs tracking mainline instead of a fixed
past version.
Tobin C. Harding [Fri, 17 Aug 2018 04:16:47 +0000 (14:16 +1000)]
CONTRIBUTING: Update reference to kernel coding style
The kernel coding style guide filename is stale; the file has been renamed
in the kernel tree. While the old file still exists, we should use the new
filename.
Update reference to kernel coding style guide to use the new file name.
We don't want to link caps.{c,h} against utils.{c,h} for the sake of our static
builds init.lxc.static. This means lxc_write_nointr() will not be available. So
handle EINTR directly.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
We don't want to link log.{c,h} against utils.{c,h} for the sake of our static
builds init.lxc.static. This means lxc_write_nointr() will not be available. So
handle EINTR directly.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
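Handling EINTR at the call site, as the two commits above describe for caps.{c,h} and log.{c,h}, usually amounts to a short retry loop around write(). The following is a minimal sketch of that pattern; the name write_retry_on_eintr() is an assumption and the actual LXC code may be structured differently.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Call write() and retry for as long as it is interrupted by a signal.
 * Returns what write() returns once it is no longer interrupted.
 * write_retry_on_eintr() is a hypothetical name used for illustration.
 */
ssize_t write_retry_on_eintr(int fd, const void *buf, size_t count)
{
	ssize_t ret;

	do {
		ret = write(fd, buf, count);
	} while (ret < 0 && errno == EINTR);

	return ret;
}
```

Note that this only retries interrupted calls. It does not loop on short writes, so a caller that needs every byte written still has to check the return value.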
2xsec [Mon, 25 Jun 2018 13:00:43 +0000 (22:00 +0900)]
log: add lxc_log_strerror_r macro
Let's ensure that we always use the thread-safe strerror_r() function and add
an appropriate macro.
Additionally, define SYS*() macros for all log levels. They will use the new
macro and ensure thread-safe retrieval of errno values.
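One complication with strerror_r() is that glibc offers two variants with different return types. The sketch below wraps both behind a single function. It is not the lxc_log_strerror_r macro itself; the name errno_string() and the preprocessor check are assumptions made for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Thread-safe lookup of the message for errnum. The caller supplies the
 * buffer, so no shared static storage is involved.
 * errno_string() is a hypothetical name, not the LXC macro.
 */
const char *errno_string(int errnum, char *buf, size_t len)
{
#if defined(__GLIBC__) && defined(_GNU_SOURCE)
	/* GNU variant: returns a pointer that may or may not be buf. */
	return strerror_r(errnum, buf, len);
#else
	/* XSI variant: returns 0 on success and fills buf. */
	if (strerror_r(errnum, buf, len) != 0)
		snprintf(buf, len, "Unknown error %d", errnum);
	return buf;
#endif
}
```

A logging macro could declare a local buffer, call a helper like this with the saved errno value, and append the result to the log message.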