This replaces the constructor implementation of cgroup handling with a simpler,
thread-safe on-demand model of cgroup driver initialization.
Making the cgroup initialization code run in a constructor means that each time
the shared library gets mapped the cgroup parsing code gets run. That's
unnecessary overhead.
It also feels to me that this is only accidently thread-safe because
constructors are only run once. But should threads actually end up manipulating
or freeing memory that is file-global to cgfsng.c we'd be screwed. Now, I might
be wrong here but the cleaner implementation is to allocate a cgroup driver on
demand whenever we need it.
Take the chance and rework the cgroup_ops interface to make the functions it
wants to have implemented a lot cleaner.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
First, I forgot to actually replace strncpy() with strlcpy(). Second, we don't
want to \0-terminate since this is an abstract unix socket and this is not
required. Instead, let's simply use memcpy() which is more correct and also
silences gcc-8.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Apparently -Werror=stringop-overflow will trigger an error here even though
this is completely valid since we now that we're definitely copying a \0-byte.
Work around this gcc-8 quirk by using memcpy(). This shouldn't trigger the
warning.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
These files have never been used and as such have no dependencies in the
codebase whatsoever. So remove them. If we need them we can simply pull them
out of the git history.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Tycho Andersen [Wed, 9 May 2018 01:48:31 +0000 (01:48 +0000)]
execute: set init_path when existing init is found
I'm not really sure we should be looking in the rootfs for an existing
init, but I'll send a much more invasive patch to correct that. For now,
let's just make sure we set init_path when we find one, so that later in
execute_start() we don't bail.
Tycho Andersen [Wed, 9 May 2018 01:29:06 +0000 (01:29 +0000)]
execute: account for -o path option count
This always works fine... until your exec() fails and you try to go and
free it, you've overwritten the allocator's metadata (and potentially other
stuff) and it fails.
Tycho Andersen [Tue, 8 May 2018 15:43:19 +0000 (09:43 -0600)]
add some TRACE/ERROR reporting
The errors in execute_start are important because nothing actually prints
out what error if any there was in these cases, so you're left with an
empty log.
The TRACE logs are simply to tell you which version of start lxc chose to
invoke: exec or start.
This is already done in do_lxcapi_start{l}() so a) no need to do it again here
and b) this would close the state socket pair sockets, corrup the fd, and lead
to EBADF.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Tycho Andersen [Thu, 3 May 2018 18:32:19 +0000 (18:32 +0000)]
fix logic for execute log file
The problem here is that lxc-init runs *inside* the container. So if a
person has the log file set to /home/$USER/foo, lxc-init ends up making a
directory /home/$USER/foo inside the container to put the log file in. What
we really want are the logs to be propagated from inside the container to
the outside. We accomplish this by passing an fd without O_CLOEXEC, and
telling lxc-init to log to that file.
Thomas Petazzoni [Fri, 20 Apr 2018 10:26:33 +0000 (12:26 +0200)]
lxc/tools/lxc_monitor: include missing <stddef.h>
lxc_monitor.c uses offsetof(), so it should include
<stddef.h>. Otherwise the build fails with the musl C library:
tools/lxc_monitor.c: In function ‘lxc_abstract_unix_connect’:
tools/lxc_monitor.c:324:9: warning: implicit declaration of function ‘offsetof’ [-Wimplicit-function-declaration]
offsetof(struct sockaddr_un, sun_path) + len + 1);
^~~~~~~~
tools/lxc_monitor.c:324:18: error: expected expression before ‘struct’
offsetof(struct sockaddr_un, sun_path) + len + 1);
^~~~~~
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
LXC generates and loads the seccomp-bpf filter in the host/container which
spawn the new container. In other words, userspace N is responsible for
generating and loading the seccomp-bpf filter which restricts userspace N + 1.
Assume 64bit kernel and 32bit userspace running a 64bit container. In this case
the 32-bit x86 userspace is used to create a seccomp-bpf filter for a 64-bit
userspace. Unless one explicitly adds the 64-bit ABI to the libseccomp filter,
or adjusts the default behavior for "BAD_ARCH", *all* 64-bit x86 syscalls will
be blocked.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Suggested-by: Paul Moore <paul@paul-moore.com>
Kaarle Ritvanen [Sun, 15 Apr 2018 11:50:28 +0000 (14:50 +0300)]
do_lxcapi_create: set umask
Always use 022 as the umask when creating the rootfs directory and
executing the template. A too loose umask may cause security issues.
A too strict umask may cause programs to fail inside the container.
Signed-off-by: Kaarle Ritvanen <kaarle.ritvanen@datakunkku.fi>
This commit deals with different kernel and userspace layouts and nesting. Here
are three examples:
1. 64bit kernel and 64bit userspace running 32bit containers
2. 64bit kernel and 32bit userspace running 64bit containers
3. 64bit kernel and 64bit userspace running 32bit containers running 64bit containers
Two things to lookout for:
1. The compat arch that is detected might have already been present in the main
context. So check that it actually hasn't been and only then add it.
2. The contexts don't need merging if the architectures are the same and also can't be.
With these changes I can run all crazy/weird combinations with proper seccomp
isolation.