Serge Hallyn [Fri, 1 May 2015 21:11:28 +0000 (21:11 +0000)]
Use 'cgm listcontrollers' list rather than /proc/self/cgroups
to populate the list of subsystems to use.
Cgmanager can be started with some subsystems disabled (i.e.
cgmanager -M cpuset). If lxc using cgmanager then uses the
/proc/self/cgroup output to determine which controllers to use,
it will fail when trying to do things to cpuset. Instead, ask
cgmanager which controllers to use.
This still defers (per patch 1/1) to the lxc.cgroup.use values.
Use POSIX-compliant function names in bash completion
When running in posix mode (for example, because it was invoked as `sh`,
or with the --posix option), bash rejects the function names previously
used because they contain hyphens, which are not legal POSIX names, and
exits immediately.
This is a particularly serious problem on a system in which the
following three conditions hold:
1. The `sh` executable is provided by bash, e. g. via a symlink
2. Gnome Display Manager is used to launch X sessions
3. Bash completion is loaded in the (system or user) profile file
instead of in the bashrc file
In that case, GDM's Xsession script (run with `sh`, i. e., bash in posix
mode) sources the profile files, thus causing the shell to load the bash
completion files. Upon encountering the non-POSIX-compliant function
names, bash would then exit, immediately ending the X session.
Fixes #521.
Signed-off-by: Lucas Werkmeister <mail@lucaswerkmeister.de>
Cyril Bitterich [Sat, 9 May 2015 19:57:14 +0000 (21:57 +0200)]
lxc-debian.in: Fixed errors if dbus is not installed
The lxc-debian template debootstraps a minimum debian system which does not contain dbus.
If systemd is used this will result in getty-static.service to be used instead of getty@ .
The systemd default files uses 6 tty's instead of the 4 the script creates.
This will lead to repeated error messages in the systemd journal.
Martin Pitt [Thu, 7 May 2015 11:38:50 +0000 (13:38 +0200)]
Call /lib/apparmor/profile-load directly instead of the wrapper
AppArmor ships /lib/apparmor/profile-load. /lib/init/apparmor-profile-load is
merely a wrapper which calls the former, so just call it directly to avoid the
dependency on the wrapper.
Make lxc-checkconfig work with kernel versions > 3
(1) Add test for kernel version greater 3.
(2) Use && and || instead of -a and -o as suggested in
http://www.unix.com/man-page/posix/1p/test/.
lxc-checkconfig will currently report "missing" on "Cgroup memory controller"
for kernel versions greater 3. This happens because the script, before checking
for the corresponding memory variable in the kernel config, currently will test
whether we have a major kernel version greater- or equal to 3 and a minor kernel
version greater- or equal to 6. This adds an additional test whether we have a
major kernel version greater than 3.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
Particularly when using the go-lxc api with lots of threads, it
happens that if the open files limit is > 1024, we will try to
select on fd > 1024 which breaks on glibc.
In the past, lxc-cmd-stop would wait until the command pipe was closed
before returning, ensuring that the container monitor had exited.
Now that we accept the actual success return value, lxcapi_stop can
return success before the monitor has fully exited.
So explicitly wait for the container to stop, when lxc-cmd-stop returned
success.
1. When we stop a container from the lxc_cmd stop handler, we kill its
init task, then we unfreeze the container to make sure it receives the
signal. When that unfreeze succeeds, we were immediately returning 0,
without sending a response to the invoker.
2. lxc_cmd returns the length of the field received. In the case of
an lxc_cmd_stop this is 16. But a comment claims we expect no response,
only a 0. In fact the handler does send a response, which may or may
not include an error. So don't call an error just because we got back a
response.
Serge Hallyn [Wed, 18 Mar 2015 00:02:18 +0000 (19:02 -0500)]
cgmanager: put unprivileged containers under $(curcgroup)/lxc/$(container0
Currently if we are in /user.slice/user-1000.slice/session-c2.scope,
and we start an unprivileged container t1, it will be in cgroup
3:memory:/user.slice/user-1000.slice/session-c2.scope/t1. If
we then do a 'lxc-cgroup -n t1 freezer.tasks', cgm_get will
first switch to 3:memory:/user.slice/user-1000.slice/session-c2.scope
then look up 't1's values. The reasons for this are
1. cgmanager get_value is relative to your own cgroup, so we need
to be sure to be in t1's cgroup or an ancestor
2. we don't want to be in the container's cgroup bc it might freeze us.
But in Ubuntu 15.04 it was decided that
3:memory:/user.slice/user-1000.slice/session-c2.scope/tasks should
not be writeable by the user, making this fail.
Therefore put all unprivileged cgroups under "lxc/%n". That way
the "lxc" cgroup should always be owned by the user so that he can
enter.
This patch enables seccomp support for LXC containers running on PowerPC
architectures. It is based on the latest PowerPC support added to libseccomp, on
the working-ppc64 branch [1].
Libseccomp has been tested on ppc, ppc64 and ppc64le architectures. LXC with
seccomp support has been tested on ppc and ppc64 architectures, using the
default seccomp policy example files delivered with the LXC package.
CVE-2015-1334: Don't use the container's /proc during attach
A user could otherwise over-mount /proc and prevent the apparmor profile
or selinux label from being written which combined with a modified
/bin/sh or other commonly used binary would lead to unconfined code
execution.
Reported-by: Roman Fiedler Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Stéphane Graber [Mon, 2 Feb 2015 09:21:20 +0000 (11:21 +0200)]
In lxc.mount.auto, skip on ENONENT
This resolves the case where /proc/sysrq-trigger doesn't exist by simply
ignoring any mount failure on ENOENT. With the current mount list, this
will always result in a safe environment (typically the read-only
underlay).
Closes #425
v2: Don't always show an error
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Thu, 29 Jan 2015 16:09:45 +0000 (16:09 +0000)]
clone_paths: use 'rootfs' for destination directory
We were trying to be smart and use whatever the last part of
the container's rootfs path was. However for block devices
that doesn't make much sense. I.e. if lxc.rootfs = /dev/md-1,
chances are that /var/lib/lxc/c1/md-1 does not exist.
So always use the $lxcpath/$lxcname/rootfs, and if it does
not exist, try to create it.
With this, 'lxc-clone -s -o c1 -n c2' where c1 has an lvm backend
is fixed. See https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1414771
Serge Hallyn [Tue, 27 Jan 2015 09:29:17 +0000 (10:29 +0100)]
fix busybox unpriv
1. tty5 is not needed
2. the devices should be optional in case they didn't exist in the
host / parent-container
3. switch from 'touch $rootfs/dev/$dev' to using create=file in the
mount entry.
Patrick O'Leary [Wed, 17 Dec 2014 01:47:21 +0000 (19:47 -0600)]
replace deprecated `index` with `strchr`
The `index` libc function was removed in POSIX 2008, and `strchr` is a direct
replacement. The bionic (Android) libc has removed `index` when you are
compiling for a 64-bit architecture, such as AArch64.
Signed-off-by: Patrick O'Leary <patrick.oleary@gmail.com> Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Michael Adam [Mon, 19 Jan 2015 21:50:58 +0000 (22:50 +0100)]
add "--mask-tmp" to lxc-fedora, plus some template script fixes]
Hi Michael,
do you have any concerns with the attached patch to
the fedora template that adds an option --mask-tmp
that prevents fedora/systemd from over-mounting
/tmp with tmpfs, which is useful in some cases?
Thanks - Michael
----- Forwarded message from Michael Adam <obnox@samba.org> -----
Date: Sat, 10 Jan 2015 13:12:06 +0100
From: Michael Adam <obnox@samba.org>
To: LXC development mailing-list <lxc-devel@lists.linuxcontainers.org>
Subject: Re: [lxc-devel] [PATCHES] add "--mask-tmp" to lxc-fedora, plus some
template script fixes
User-Agent: Mutt/1.5.23 (2014-03-12)
On 2015-01-10 at 13:08 +0100, Michael Adam wrote:
> On 2015-01-10 at 04:05 +0000, Serge Hallyn wrote:
>
> > The less controversial one is adding mask-tmp to the fedora template.
> > It looks fine to me, but that should go separately to mwarfield, our
> > fedora template maintainer :)
>
> I had notified mhw of my patches on irc, but apparently he is
> currently very busy.
>
> For a start, following is an update of the uncontroversial fix
> patches, i.e. the fix patche without the path ones, and without
> the mask-tmp patch.
And here comes the mask-tmp patch.
It needs to be applied onto the previous fix-patchset.
From 9589dca113535ed2f4faad89db2fab33bb8a9d7e Mon Sep 17 00:00:00 2001
From: Michael Adam <obnox@samba.org>
Date: Thu, 8 Jan 2015 10:25:24 +0100
Subject: [PATCH] lxc-fedora: add a new option --mask-tmp
This will configure the container to prevent the standard
behaviour of over-mounting /tmp with tmpfs, which can be
undesirable in some cases.
My personal use case is vagrant-lxc in combination with
vagrant-cachier.
Signed-off-by: Michael Adam <obnox@samba.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Axel Neumann [Tue, 13 Jan 2015 09:48:52 +0000 (10:48 +0100)]
Fix instantiation of multiple vlan interfaces with same id
Container fail to start with configs (as shown below) where the same
vlan id is used for several type=vlan container interfaces.
Then, during the instantiation of the vlan interfaces, an error occurs
because the lxc code tries to assign the same temporary name to both
of them before it is bound into the container.
Serge Hallyn [Fri, 9 Jan 2015 22:00:28 +0000 (22:00 +0000)]
Fix reversed args in mount call
Riya Khanna reported that with a ramfs rootfs the mount to make
/ rprivate was returning -EFAULT. NULL was being passed as the
mount target. Pass "/" instead.
Martin Pitt [Thu, 8 Jan 2015 12:09:37 +0000 (13:09 +0100)]
apparmor: Fix slave bind mounts
The permission to make a mount "slave" is spelt "make-slave", not "slave", see
https://launchpad.net/bugs/1401619. Also, we need to make all mounts slave, not
just the root dir.
Serge Hallyn [Fri, 19 Dec 2014 18:23:52 +0000 (18:23 +0000)]
Enable seccomp by default for unprivileged users.
In contrast to what the comment above the line disabling it said,
it seems to work just fine. It also is needed on current kernels
(until Eric's patch hits upstream) to prevent unprivileged containers
from hosing fuse filesystems they inherit.
Serge Hallyn [Fri, 19 Dec 2014 18:22:55 +0000 (18:22 +0000)]
seccomp: add rule to reject umount -f
If a container has a bind mount from a host nfs or fuse
filesystem, and does 'umount -f', it will disconnect the
host's filesystem. This patch adds a seccomp rule to
block umount -f from a container. It also adds that rule
to the default seccomp profile.
Shuai Zhang [Sun, 30 Nov 2014 13:03:37 +0000 (21:03 +0800)]
audit: added capacity and reserve() to nlmsg
There are now two (permitted) ways to add data to netlink message:
1. put_xxx()
2. call nlmsg_reserve() to get a pointer to newly reserved room within the
original netlink message, then write or memcpy data to that area.
Both of them guarantee adding requested length data do not overflow the
pre-allocated message buffer by checking against its cap field first.
And there may be no need to access nlmsg_len outside nl module, because both
put_xxx() and nlmsg_reserve() have alread did that for us.