lxc-busybox: Remove warning for dynamically linked Busybox
The warning has been present since commit 32b37181ea (with no purpose stated).
Support for dynamically linked Busybox has been added since commit bf6cc73696.
Haven't encountered any issues with dynamically linked Busybox in my last
2 years' testing.
open_without_symlink: Don't SYSERROR on something else than ELOOP
The open_without_symlink routine has been specifically created to prevent
mounts with synlinks as source or destination. Keep SYSERROR'ing in that
particular scenario, but leave error handling to calling functions for the
other ones - e.g. optional bind mount when the source dir doesn't exist
throws a nasty error.
Tycho Andersen [Mon, 21 Mar 2016 22:52:02 +0000 (16:52 -0600)]
c/r: don't fail if there is no console_fd on restore
If we set lxc.console=none, this fd won't exist, so let's not fail if it
doesn't. We already partially handled this case correctly, so let's
actually handle it correctly :)
Tycho Andersen [Mon, 21 Mar 2016 22:50:39 +0000 (16:50 -0600)]
c/r: don't pass --ext-mount-map flag when console=none
We don't pass anything on the restore side since we didn't save anything,
but the restore side will expect something if we pass this. Instead, let's
not pass anything.
Tycho Andersen [Fri, 18 Mar 2016 19:13:17 +0000 (13:13 -0600)]
c/r: print criu's stdout when it fails
In particular, when CRIU fails before it has its log completely initialized
(e.g. if the log directory doesn't exist, or if the argument parser fails),
it prints this to stdout. Let's log that.
Tycho Andersen [Thu, 17 Mar 2016 11:14:43 +0000 (05:14 -0600)]
autodev: don't always create /dev/console
In particular, only create /dev/console when it is set to "none".
Otherwise, we will bind mount a pts device later, so let's just leave it.
Also, when bind mounting the pts device, let's create /dev/console if it
doesn't exist, since it may not already exist due to the above :)
v2: s/ot/to
v3: add O_EXCL so we actually get EEXIST, use the right condition for
mount_console (we want to compare against console.path, not
console.name, and console.path can be null)
Serge Hallyn [Wed, 16 Mar 2016 06:02:10 +0000 (23:02 -0700)]
cgroups: try to load cgmanager first
If cgmanager is running, use it. This allows the admin to simply
stop cgmanager if they don't want to use it. The other way there
is no way to choose to use cgmanager.
Serge Hallyn [Wed, 16 Mar 2016 21:48:49 +0000 (14:48 -0700)]
Prevent access to pci devices
Prevent privileged containers from messing with the host's pci devices
directly. Refuse access under /proc/bus, and drop cap_sys_rawio. Some
containers may need to re-enable cap_sys_rawio (i.e. if they run an
X server).
It may be desirable to break some of this stuff into files which can be
separately included (or not included), but this patch isn't the right
place for that.
Tycho Andersen [Tue, 15 Mar 2016 18:01:36 +0000 (12:01 -0600)]
build: fix build on android (and ppc)
The problem here is that dev_t on most platforms is `long unsigned`, but on
android (and ppc?) it's `long long unsigned`. Let's just upcast to `long
long unsigned` and use that format string to keep the compilers happy.
Tycho Andersen [Sat, 12 Mar 2016 01:10:40 +0000 (18:10 -0700)]
c/r: drop lxc.console=none config requirement
There are a few things going on in this patch.
1. /dev/console is an external mount since it is bind mounted from the
host. However, we don't want to use criu's --ext-mount-map auto handling
here, because that will bind mount exactly the same path from the host
on restore, but if the pts device is different on the target host, we'll
bind mount the wrong one, which is obviously wrong.
2. We need to tell CRIU how to restore the TTY. Since we declare the tty as
--external, we need to provide it via --inherit-fd (even though we've
already fixed up the environment).
Tycho Andersen [Sat, 12 Mar 2016 02:01:43 +0000 (19:01 -0700)]
criu: hide more stuff in criu.c
Various other functions/structures are now only used in criu.c, so let's
hide stuff there so as not to pollute headers.
This commit also bumps the required CRIU versions to 2.0. While we don't
*require* any features that aren't in 1.8 patchlevel 21 or above, 2.0 is a
vast improvement, and so we should use that instead.
Tycho Andersen [Thu, 10 Mar 2016 18:10:14 +0000 (11:10 -0700)]
cgroup: cgroup_escape takes no arguments
cgroup_escape() is a slight abuse of the cgroup code: what we really want
here is to escape the *current* process, whether it happens to be the LXC
monitor or not, into the / cgroups.
In the case of dump, we can't do an lxc_init(), because:
We don't want to make this a command to send to the handler, because again,
cgroup_escape() is intended to escape the *current* task to the root
cgroups.
So, let's just have cgroup_escape() build its own handler when required.
Serge Hallyn [Wed, 9 Mar 2016 07:04:46 +0000 (23:04 -0800)]
cgfsng: fix real bug and fake libc realloc bug
read_file was using the wrong value for the string length. Also,
realloc on i386 is wonky with small sizes - so use a batch size
to avoid small reallocs.
Serge Hallyn [Tue, 8 Mar 2016 03:10:58 +0000 (19:10 -0800)]
prevent containers from reading /sys/kernel/debug
Unprivileged containers cannot read it anyway, but also prevent root
owned containers from doing so. Sadly upstart's mountall won't run
if we try to prevent it from being mounted at all.
Serge Hallyn [Mon, 7 Mar 2016 19:16:43 +0000 (11:16 -0800)]
cgfsng - remove the code checking whether devices cgroup lines are already done
We may need to revert this, but I *think* we no longer need this
with default configs. The idea iirc was that if caller cannot
write to devices.allow (i.e. is in a user namespace), then ignore
permission failures if the cgroups are already sufficiently setup.
Serge Hallyn [Thu, 3 Mar 2016 18:31:23 +0000 (10:31 -0800)]
cgfsng: next generation filesystem-backed cgroup implementation
This makes simplifying assumptions: all usable cgroups must be
mounted under /sys/fs/cgroup/controller or /sys/fs/cgroup/contr1,contr2.
Currently this will only work with cgroup namespaces, because
lxc.mount.auto = cgroup is not implemented. So cgfsng_ops_init()
returns NULL if cgroup namespaces are not enabled.