A change in kernel 4.2 caused btrfs_recursive_destroy to
fail to delete unprivileged containers. This patch restores
the pre-kernel-4.2 behaviour. Ref: Issue 935.
Niklas Eiling [Wed, 30 Mar 2016 10:32:02 +0000 (12:32 +0200)]
c/r: support for the criu pageserver
This enables LXC to perform "disk-less migrations", where memory pages are sent directly to the destination machine instead of being written to the source's filesystem first.
For this, the strings "pageserver_address" and "pageserver_port" have been added to the migrate_opts struct so that criu can be told where to look for a pageserver.
Niklas Eiling [Wed, 30 Mar 2016 18:10:21 +0000 (20:10 +0200)]
fix possible buffer overflow
strncat only returns its first argument and not the end of the written string.
Thus "buf-pos" is always 0 and consequently no range check is performed.
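The point about strncat's return value can be illustrated with a minimal sketch. This is not the patched LXC code; the helper name append_checked is hypothetical, and it assumes buf already holds a NUL-terminated string shorter than bufsize. It tracks the used length with strlen() instead of deriving it from the return value:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from LXC: append src to the NUL-terminated
 * string in buf (total capacity bufsize) only if it fits completely.
 * Returns 0 on success, -1 if the result would not fit.
 */
int append_checked(char *buf, size_t bufsize, const char *src)
{
	size_t used = strlen(buf);
	size_t need = strlen(src);

	/* Precondition: buf is a valid string inside its own capacity. */
	assert(used < bufsize);

	/* Range check based on the real length; keep room for the NUL byte. */
	if (need >= bufsize - used)
		return -1;

	/* strncat() hands back buf itself, not the new end of the string. */
	char *ret = strncat(buf, src, need);
	assert(ret == buf);
	(void)ret;
	return 0;
}
```

Because the return value always equals the destination pointer, any position computed as "return value minus buf" is 0, which is why the length has to be measured separately.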
In order to do this we make use of the MAP_FIXED flag of mmap(). MAP_FIXED
should be safe to use when it replaces an already existing mapping. To this
end, we establish an anonymous mapping that is one byte larger than the
underlying file. The pages handed to us are zero filled. Now we establish a
fixed-address mapping starting at the address we received from our anonymous
mapping and replace all bytes excluding the additional \0-byte with the file.
This allows us to use normal string-handling functions. The idea implemented
here is similar to how shared libraries are mapped.
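The mapping scheme described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the patch: the function names map_file_as_string and demo_map_file, the temporary file path and the sample file content are all hypothetical, and empty files are simply rejected.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the file behind fd so it can be read as a
 * NUL-terminated string. On success returns the mapping and stores its
 * total length (file size + 1) in *maplen; returns MAP_FAILED on error.
 */
void *map_file_as_string(int fd, size_t *maplen)
{
	struct stat st;

	/* A zero-length mmap() fails, so empty files are rejected here. */
	if (fstat(fd, &st) < 0 || st.st_size <= 0)
		return MAP_FAILED;

	size_t len = (size_t)st.st_size;

	/* 1. Anonymous, zero-filled mapping one byte larger than the file. */
	char *base = mmap(NULL, len + 1, PROT_READ,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return MAP_FAILED;

	/* 2. Replace everything except the extra byte with the file itself.
	 *    MAP_FIXED only overwrites a mapping we own, so nothing else in
	 *    the address space is clobbered. */
	void *file = mmap(base, len, PROT_READ, MAP_PRIVATE | MAP_FIXED, fd, 0);
	if (file == MAP_FAILED) {
		munmap(base, len + 1);
		return MAP_FAILED;
	}

	*maplen = len + 1;
	return base;
}

/* Example use: write a short file, map it, and treat it as a C string. */
int demo_map_file(void)
{
	char path[] = "/tmp/map-demo-XXXXXX";
	const char text[] = "key = value\n";
	size_t maplen;

	int fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);

	if (write(fd, text, sizeof(text) - 1) != (ssize_t)(sizeof(text) - 1)) {
		close(fd);
		return -1;
	}

	char *s = map_file_as_string(fd, &maplen);
	close(fd);
	if (s == MAP_FAILED)
		return -1;

	/* Ordinary string functions work because of the trailing NUL byte. */
	assert(strlen(s) == sizeof(text) - 1);
	assert(strcmp(s, text) == 0);

	munmap(s, maplen);
	return 0;
}
```

When the file size is not a multiple of the page size, the remainder of the last file-backed page is zero-filled anyway; the extra anonymous byte matters when the file ends exactly on a page boundary.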
Signed-off-by: Christian Brauner <christian.brauner@mailbox.org>
Tycho Andersen [Tue, 29 Mar 2016 00:43:20 +0000 (18:43 -0600)]
start: only use host's /dev/null when absolutely necessary
See comments for details, but basically, only use the host's /dev/null when
absolutely necessary (i.e. there is no reasonable /dev/null in the
container).
lxc-busybox: Remove warning for dynamically linked Busybox
The warning has been present since commit 32b37181ea (with no purpose stated).
Support for dynamically linked Busybox has been added since commit bf6cc73696.
Haven't encountered any issues with dynamically linked Busybox in my last
2 years' testing.
open_without_symlink: Don't SYSERROR on something else than ELOOP
The open_without_symlink routine has been specifically created to prevent
mounts with symlinks as source or destination. Keep SYSERROR'ing in that
particular scenario, but leave error handling to calling functions for the
other ones - e.g. optional bind mount when the source dir doesn't exist
throws a nasty error.
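The error-handling split can be sketched like this. It is a simplified illustration, not the LXC routine: the name open_nofollow is hypothetical, fprintf() stands in for SYSERROR, and only the final path component is checked for being a symlink.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: open path without following a symlink in its last
 * component. Only the symlink case (ELOOP) is reported here; every other
 * failure is returned silently so the caller can decide whether it matters.
 * errno is preserved for the caller in all cases.
 */
int open_nofollow(const char *path)
{
	assert(path != NULL);

	int fd = open(path, O_RDONLY | O_NOFOLLOW);
	if (fd < 0 && errno == ELOOP) {
		int saved = errno;

		fprintf(stderr, "%s is a symlink: %s\n", path, strerror(saved));
		errno = saved;
	}
	return fd;
}
```

A caller handling an optional bind mount could then treat ENOENT as "skip this entry" without any error having been logged.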
Tycho Andersen [Mon, 21 Mar 2016 22:52:02 +0000 (16:52 -0600)]
c/r: don't fail if there is no console_fd on restore
If we set lxc.console=none, this fd won't exist, so let's not fail if it
doesn't. We already partially handled this case correctly, so let's
actually handle it correctly :)
Tycho Andersen [Mon, 21 Mar 2016 22:50:39 +0000 (16:50 -0600)]
c/r: don't pass --ext-mount-map flag when console=none
We don't pass anything on the restore side since we didn't save anything,
but the restore side will expect something if we pass this. Instead, let's
not pass anything.
Tycho Andersen [Fri, 18 Mar 2016 19:13:17 +0000 (13:13 -0600)]
c/r: print criu's stdout when it fails
In particular, when CRIU fails before it has its log completely initialized
(e.g. if the log directory doesn't exist, or if the argument parser fails),
it prints this to stdout. Let's log that.
Tycho Andersen [Thu, 17 Mar 2016 11:14:43 +0000 (05:14 -0600)]
autodev: don't always create /dev/console
In particular, only create /dev/console when it is set to "none".
Otherwise, we will bind mount a pts device later, so let's just leave it.
Also, when bind mounting the pts device, let's create /dev/console if it
doesn't exist, since it may not already exist due to the above :)
v2: s/ot/to
v3: add O_EXCL so we actually get EEXIST, use the right condition for
mount_console (we want to compare against console.path, not
console.name, and console.path can be null)
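The O_EXCL remark in v3 can be illustrated with a small sketch. The helper name create_if_missing is hypothetical and the code is not taken from the patch; it only shows that open() with O_CREAT reports EEXIST for an existing file when O_EXCL is also set:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: create path as an empty file.
 * Returns 1 if the file was created, 0 if it already existed,
 * and -1 on any other error (errno is left set by open()).
 */
int create_if_missing(const char *path)
{
	assert(path != NULL);

	/* Without O_EXCL, opening an existing file simply succeeds and
	 * EEXIST is never reported. */
	int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
	if (fd >= 0) {
		close(fd);
		return 1;
	}
	return errno == EEXIST ? 0 : -1;
}
```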
Serge Hallyn [Wed, 16 Mar 2016 06:02:10 +0000 (23:02 -0700)]
cgroups: try to load cgmanager first
If cgmanager is running, use it. This allows the admin to simply
stop cgmanager if they don't want to use it. With the opposite order,
there would be no way to choose to use cgmanager.
Serge Hallyn [Wed, 16 Mar 2016 21:48:49 +0000 (14:48 -0700)]
Prevent access to pci devices
Prevent privileged containers from messing with the host's pci devices
directly. Refuse access under /proc/bus, and drop cap_sys_rawio. Some
containers may need to re-enable cap_sys_rawio (e.g. if they run an
X server).
It may be desirable to break some of this stuff into files which can be
separately included (or not included), but this patch isn't the right
place for that.
Tycho Andersen [Tue, 15 Mar 2016 18:01:36 +0000 (12:01 -0600)]
build: fix build on android (and ppc)
The problem here is that dev_t on most platforms is `long unsigned`, but on
android (and ppc?) it's `long long unsigned`. Let's just upcast to `long
long unsigned` and use that format string to keep the compilers happy.
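A minimal sketch of the upcast, assuming a hypothetical helper named format_dev (the actual call sites in LXC are not shown here):

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: print a dev_t with a format string that is valid
 * whatever the platform's underlying type is. The explicit cast makes the
 * argument match %llu everywhere.
 */
int format_dev(char *buf, size_t size, dev_t dev)
{
	assert(buf != NULL && size > 0);
	return snprintf(buf, size, "%llu", (unsigned long long)dev);
}
```

Passing a dev_t directly to %lu or %llu compiles cleanly only on platforms where the type happens to match; the cast removes that dependency.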
Tycho Andersen [Sat, 12 Mar 2016 01:10:40 +0000 (18:10 -0700)]
c/r: drop lxc.console=none config requirement
There are a few things going on in this patch.
1. /dev/console is an external mount since it is bind mounted from the
host. However, we don't want to use criu's --ext-mount-map auto handling
here, because that will bind mount exactly the same path from the host
on restore, but if the pts device is different on the target host, we'll
bind mount the wrong one.
2. We need to tell CRIU how to restore the TTY. Since we declare the tty as
--external, we need to provide it via --inherit-fd (even though we've
already fixed up the environment).
Tycho Andersen [Sat, 12 Mar 2016 02:01:43 +0000 (19:01 -0700)]
criu: hide more stuff in criu.c
Various other functions/structures are now only used in criu.c, so let's
hide stuff there so as not to pollute headers.
This commit also bumps the required CRIU version to 2.0. While we don't
*require* any features that aren't in 1.8 patchlevel 21 or above, 2.0 is a
vast improvement, and so we should use that instead.