Cameron Norman [Mon, 1 Dec 2014 21:29:26 +0000 (13:29 -0800)]
lxc-debian: adjust init system configurations
Do as much as possible to allow containers switching from non-systemd to
systemd to work as intended (but nothing that will cause side effects).
Use update-rc.d disable instead of remove so the init scripts are not
re-enabled when the package is updated
Signed-off-by: Cameron Norman <camerontnorman@gmail.com> Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Antonio Terceiro [Mon, 24 Nov 2014 01:51:06 +0000 (23:51 -0200)]
lxc-debian: support systemd as PID 1
Containers with systemd need a somewhat special setup, which I borrowed
and adapted from lxc-fedora. These changes are required so that Debian 8
(jessie) containers work properly, and are a no-op for previous Debian
versions.
Signed-off-by: Antonio Terceiro <terceiro@debian.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Gu1 [Tue, 28 Oct 2014 01:14:28 +0000 (02:14 +0100)]
lxc-debian: Fix default mirrors
Fix a typo in the lines inserted in the default sources.list.
Change the default mirror to http.debian.net which is (supposedly) more
accurate and better than cdn.debian.net for a generic configuration.
Use security.debian.org directly for the {release}/updates repository.
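As a rough illustration of the resulting defaults, the sketch below renders sources.list lines for a given release. The helper name default_sources_list, the component list, and the exact URL paths are assumptions made for this example; they are not copied from the template.

```python
def default_sources_list(release, components=("main",)):
    """Render sources.list lines for a release (hypothetical helper)."""
    comps = " ".join(components)
    return [
        # Main archive through the http.debian.net redirector.
        f"deb http://http.debian.net/debian {release} {comps}",
        # Security updates fetched directly from security.debian.org.
        f"deb http://security.debian.org/ {release}/updates {comps}",
    ]


if __name__ == "__main__":
    print("\n".join(default_sources_list("jessie")))
```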
Abin Shahab [Wed, 12 Nov 2014 00:06:52 +0000 (00:06 +0000)]
Remounts bind mounts if read-only flag is provided
Bind mounts do not honor filesystem mount options. This change will
remount filesystems that are bind mounted if there are changes to
filesystem mount options, specifically if the mount is readonly.
Signed-off-by: Abin Shahab <ashahab@altiscale.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Silvio Fricke [Fri, 14 Nov 2014 19:56:12 +0000 (20:56 +0100)]
lxc/utils: bugfix freed pointer return value
We allocate a buffer and save its address in a static variable. After
that, we free the buffer and return the now-dangling pointer.
Here is an excerpt from a valgrind report:
[...]
==11568== Invalid read of size 1
==11568== at 0x4C2D524: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11568== by 0x5961C9B: puts (in /usr/lib/libc-2.20.so)
==11568== by 0x400890: main (lxc_config.c:73)
==11568== Address 0x6933e21 is 1 bytes inside a block of size 32 free'd
==11568== at 0x4C2B200: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11568== by 0x4E654F2: lxc_global_config_value (utils.c:415)
==11568== by 0x4E92177: lxc_get_global_config_item (lxccontainer.c:2287)
==11568== by 0x400883: main (lxc_config.c:71)
[...]
Serge Hallyn [Sun, 2 Nov 2014 14:01:18 +0000 (14:01 +0000)]
cgmanager: fix 'attach' with "all" controller support
"all" is not a supported keyword for cgmanager's get_pid_cgroup.
Pass the first mounted cgroup subsystem instead of passing "all" when
getting the container's cgroup to attach to.
Also, make sure that the target's cgroup is in fact identical for all
controllers before attaching with "all". If not, then we must attach to
each cgroup separately, or else we will not be in all the same cgroups
as the target container.
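The "identical cgroups" check described above can be sketched as follows. This is a simplified model, not the cgmanager client code: the function name attach_targets and the dict mapping controller names to cgroup paths are assumptions made for illustration.

```python
def attach_targets(cgroup_paths):
    """Return the controller names to attach through (hypothetical helper).

    cgroup_paths maps a controller name to the target container's
    cgroup path for that controller.
    """
    if not cgroup_paths:
        return []
    # "all" is only safe when every controller uses the same path.
    if len(set(cgroup_paths.values())) == 1:
        return ["all"]
    # Otherwise attach once per controller.
    return sorted(cgroup_paths)
```

With identical paths a single request using "all" is enough; with differing paths the caller falls back to one request per controller.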
Serge Hallyn [Mon, 27 Oct 2014 14:23:10 +0000 (14:23 +0000)]
lxc_global_config_value: simplify the theme
Rather than try to free all the not-being-returned items at
each if clause where we assign one to return value, just NULL
the one we are returning so we can safely free all the
values. This should fix the newly reported coverity memory
leak
Serge Hallyn [Tue, 14 Oct 2014 11:04:35 +0000 (11:04 +0000)]
lxc-start: don't re-try to mount rootfs if we already did so
If we are root using a user namespace and are mounting a blockdev as rootfs,
then we do this before unsharing the userns, because we are not allowed to
do it in a userns. But after unsharing the userns, we unconditionally
retried mounting the rootfs, resulting in failure. Stop that.
Serge Hallyn [Mon, 27 Oct 2014 03:01:30 +0000 (22:01 -0500)]
do_rootfs_setup: fix return bugs
Fix return value on bind mount failure.
If we've already mounted the rootfs, exit after the bind mount
rather than re-trying the rootfs mount. The only case where
this happens is when root is starting a container in a user
namespace and with a block device backing store.
In that case, pre-mount hooks will be executed in the initial
user namespace. That may be worth fixing. Or it may be what
we want. We should think about it and fix it.
Dark Templar [Wed, 22 Oct 2014 14:35:08 +0000 (09:35 -0500)]
Fix another gentoo template typo
I've found one more typo in the gentoo template: the configuration in
the generated file /etc/conf.d/hostname was not valid. It didn't impact
me because "lxc.utsname" is set in the container's configuration file
and the hostname service is not used. Anyway, I've made a patch and am
sending it with this mail.
Signed-off-by: Dark Templar <dark_templar@hotbox.ru> Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
When running unprivileged, lxc-create will touch a fstab file, with bind-mounts
for the ttys and other devices. Add this entry in the container config.
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
busybox template: support for unprivileged containers
Apply the changes found in templates/lxc-download to the busybox template as
well. Change ownership of the config and fstab files to the unprivileged user,
and the ownership of the rootfs to root in the new user namespace.
Eliminate the "unsupported for userns" flag.
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
KATOH Yasufumi [Thu, 2 Oct 2014 09:01:06 +0000 (18:01 +0900)]
lxc_global_config_value can return the default lxc.cgroup.pattern whether root or non-root
>>> On Tue, 30 Sep 2014 19:48:09 +0000
in message "Re: [lxc-devel] [PATCH] lxc-config can show lxc.cgroup.(use|pattern)"
Serge Hallyn-san wrote:
> I think it would be worth also augmenting
> lxc_global_config_value() to return a default lxc.cgroup.use
> for 'all', and a default lxc.cgroup.pattern ("/lxc/%n" for root
> or "%n" for non-root).
Dongsheng Yang [Tue, 16 Sep 2014 04:58:55 +0000 (12:58 +0800)]
network: allow lxc_network_move_by_index() rename netdev in moving.
In netlink, we can set the dest_name of a netdev when moving it
between namespaces, all in one netlink request. Moving a netdev named
src_name to a netdev named dest_name is a common use case.
So this patch adds a parameter to lxc_network_move_by_index() to
indicate the dest_name for the move. NULL means keep the
src_name.
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
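A minimal model of the new parameter's semantics, assuming the request is represented as a plain dict. IFLA_NET_NS_PID and IFLA_IFNAME are real rtnetlink attribute names, but the dict layout and the function move_request are hypothetical and do not build an actual netlink message.

```python
def move_request(ifindex, pid, dest_name=None):
    """Describe one 'move netdev to namespace' request (hypothetical model)."""
    attrs = {"ifindex": ifindex, "IFLA_NET_NS_PID": pid}
    # None plays the role of NULL: the device keeps its current name.
    if dest_name is not None:
        attrs["IFLA_IFNAME"] = dest_name
    return attrs
```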
Serge Hallyn [Thu, 9 Oct 2014 15:54:51 +0000 (10:54 -0500)]
fix lxc.mount.auto clearing
The way config_mount was structured, sending 'lxc.mount.auto = '
ended up actually clearing all lxc.mount.entry items. Fix that by
moving the check for an empty value to after the subkey checks.
Then, actually do the clearing of auto_mounts in config_mount_auto.
The 'strlen(subkey)' check being removed was bogus: the subkey is
either known to be 'lxc.mount.entry', or else subkey would have been
NULL (and forced a return in the block above).
This would have been clearer if the config_mount() and helper
fns were structured like the rest of confile.c. It's tempting
to switch it over, but there are subtleties in there so it's
not something to do without a lot of thought and testing.
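The ordering fix can be modelled roughly like this: dispatch on the key first, and only then treat an empty value as "clear this particular list". The dict layout and the function below are a simplified, hypothetical stand-in for confile.c, not its real structure.

```python
def config_mount(conf, key, value):
    """Apply one lxc.mount* line to conf (simplified, hypothetical model)."""
    if key == "lxc.mount.auto":
        if not value:
            conf["auto_mounts"] = []  # clears only the auto mounts
        else:
            conf["auto_mounts"].extend(value.split())
        return
    if key == "lxc.mount.entry":
        if not value:
            conf["entries"] = []  # clears only the mount entries
        else:
            conf["entries"].append(value)
        return
    if key == "lxc.mount":
        conf["fstab"] = value or None
        return
    raise ValueError(f"unknown key: {key}")
```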
Andrey Vagin [Sat, 4 Oct 2014 21:49:16 +0000 (01:49 +0400)]
lxc: don't call pivot_root if / is on a ramfs
pivot_root can't be called if / is on a ramfs. Currently chroot is
called before pivot_root. In this case the standard well-known
'chroot escape' technique allows escaping from the container.
I think the best way to handle this situation is to take the following actions:
* clean all mounts, which should not be visible in CT
* move CT's rootfs into /
* chroot into /
I don't have a host, where / is on a ramfs, so I can't test this patch.
Serge Hallyn [Wed, 8 Oct 2014 05:14:26 +0000 (00:14 -0500)]
cgmanager: several fixes
These all fix various ways that cgroup actions could fail if an
unprivileged user's cgroup paths were not all the same for all
controllers.
1. in cgm_{g,s}et, use the right controller, not the first in the list,
to get the cgroup path.
2. when we pass 'all' to cgmanager for a ${METHOD}_abs, make sure that all
cgroup paths are the same. That isn't necessary for methods not
taking an absolute path, so split up the former
cgm_supports_multiple_controllers() function into two booleans, one
telling whether cgm supports it, and another telling us whether
cgm supports it AND all controller cgroup paths are the same.
3. separately, do_cgm_enter with abs=true couldn't work if all
cgroup paths were not the same. So just ditch that helper and
call lxc_cgmanager_enter() where needed, because the special
cases would be more complicated.
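Item 1 above can be illustrated with a small lookup that derives the controller from the key being read or written, instead of taking the first controller in the list. The helper names and the dict of per-controller paths are assumptions for this sketch.

```python
def controller_for_key(key):
    """Return the controller prefix of a cgroup key such as 'memory.limit_in_bytes'."""
    controller, sep, _rest = key.partition(".")
    if not sep or not controller:
        raise ValueError(f"not a controller-qualified key: {key!r}")
    return controller


def cgroup_path_for_key(key, paths):
    """Pick the cgroup path of the controller that owns key (hypothetical helper)."""
    return paths[controller_for_key(key)]
```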
apparmor: restrict signal and ptrace for processes
Restrict signal and ptrace for processes running under the container
profile. Rules based on AppArmor base abstraction. Add unix rules for
processes running under the container profile.
To cover all the cases we have around, we need to:
- Attempt to use cgm if present (preferred)
- Attempt to use cgmanager directly over dbus otherwise
- Fallback to cgroupfs
When "lxc.autodev = 1", LXC automatically creates a "/dev/.lxc/<name>.<hash>"
folder to put container's devices in so that they are visible from both
the host and the container itself.
On container exit (be it normal or not), this folder was not cleaned up,
which made the "/dev" folder grow continuously.
We fix this by adding a new `int lxc_delete_autodev(struct lxc_handler
*handler)` called from `static void lxc_fini(const char *name, struct
lxc_handler *handler)`.
Signed-off-by: Jean-Tiare LE BIGOT <jean-tiare.le-bigot@ovh.net> Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Serge Hallyn [Fri, 22 Aug 2014 03:50:36 +0000 (22:50 -0500)]
lxc_map_ids: don't do bogus check for newgidmap
If we didn't find newuidmap, then simply require the caller to be
root and write to /proc/self/uid_map manually. Checking for
newgidmap to exist is bogus.
support use of 'all' controller when cgmanager supports it
Introduce a new list of controllers just containing "all".
Make the lists of controllers null-terminated.
If the cgmanager api version is high enough, use the 'all' controller
rather than walking all controllers, which should greatly reduce the
amount of dbus overhead. This will be especially important for
those going through a cgproxy.
Also remove the call to cleanup cgroups when a cgroup existed. That
usually fails (and failure is ignored) since the to-be-cleaned-up
cgroup is busy, but we shouldn't even be trying. Note this can
leave extra cgroups that are never cleaned up; however, it's better than us
accidentally removing a cgroup that someone else had created and was
about to use.
lxc-gentoo: keep original uid/gid of files/dirs when installing
Call tar with --numeric-owner option to use numbers for user/group
names because the whole uid/gid in rootfs should be consistently
unchanged as in original stage3 tarball and private portage.
lxc-plamo: keep original uid/gid of files/dirs when installing
Regardless of whether "installpkg" command exists or not, install the
command temporarily with static linked tar command into the lxc cache
directory to keep the original uid/gid of files/directories. Also,
use sed command instead of ed command for simplicity.
When managing containers, I need to take action based on container
exit status. For instance, if it exited abnormally (status!=0), I
sometimes want to respawn it automatically. Or, when invoking
`lxc-stop` I want to know if it terminated gracefully (ie on `SIGTERM`)
or on `SIGKILL` after a timeout.
This patch adds a new message type, `lxc_msg_exit_code`, to preserve
ABI. It sends the raw status code as returned by `waitpid`, so a
listening application may want to apply `WEXITSTATUS` before using it.
This is what `lxc-monitor` does.
Signed-off-by: Jean-Tiare LE BIGOT <jean-tiare.le-bigot@ovh.net>
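A listener receiving the raw waitpid status could decode it along these lines. The sketch uses Python's os.WIFEXITED, os.WEXITSTATUS, os.WIFSIGNALED and os.WTERMSIG wrappers; the function name describe_status and the tuple it returns are assumptions, and how the status reaches the listener is left out.

```python
import os


def describe_status(status):
    """Decode a raw waitpid() status into (kind, value)."""
    if os.WIFEXITED(status):
        # Normal exit: the value is the exit code.
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSIGNALED(status):
        # Killed by a signal such as SIGTERM or SIGKILL.
        return ("signaled", os.WTERMSIG(status))
    return ("other", status)
```

A supervisor could respawn on ("exited", n) with n != 0, and tell a graceful stop from a forced one by comparing the signal number against signal.SIGTERM and signal.SIGKILL.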
Serge Hallyn [Fri, 29 Aug 2014 14:20:44 +0000 (14:20 +0000)]
lxc-cgm: fix issue with nested chowning
To ask cgmanager to chown files as an unpriv user, we must send the
request from the container's namespace (with our own userid also
mapped in). However when we create a new namespace then we must
open a new dbus connection, so that our credential and the credential
on the dbus socket match. Otherwise the proxy will refuse the request.
Because we were warning about this failure but not exiting, the failure
was not noticed until the unprivileged container went on to try to
administer its cgroups, i.e. creating a container inside itself.
Fix this by having the do_chown_cgroup create a new cgmanager connection.
In order to reduce the number of connections, since the list of subsystems
is global anyway, don't call do_chown_cgroup once for each controller,
just call it once and have it run over all controllers.
(This patch does not change the fact that we don't fail if the
chown failed. I think we should change that, but let's do it in a
later patch)
Daniel Miranda [Mon, 25 Aug 2014 21:16:43 +0000 (18:16 -0300)]
build: Make setup.py run from srcdir to avoid distutils errors
distutils can't handle paths to source files containing '..'. It will
try to navigate away from the build directory and fail. To fix that,
before building the python module, transform all the path variables then
cd to the srcdir, and set the build directory manually.
This is hopefully the last needed fix to use separate build and
source directories.
Signed-off-by: Daniel Miranda <danielkza2@gmail.com> Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Daniel Miranda [Mon, 25 Aug 2014 21:16:42 +0000 (18:16 -0300)]
build: don't remove configuration template on clean
Now that default.conf is generated/linked during the configuration
phase, it should no longer be removed in the 'clean' stage, or
subsequent builds will fail. Only remove it during 'dist-clean'.
Signed-off-by: Daniel Miranda <danielkza2@gmail.com> Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Serge Hallyn [Fri, 22 Aug 2014 21:23:56 +0000 (16:23 -0500)]
statvfs: do nothing if statvfs does not exist (android/bionic)
If statvfs does not exist, then don't recalculate mount flags
at remount.
If someone does need this, they could replace the code (only
if !HAVE_STATVFS) with code parsing /proc/self/mountinfo (which
exists in the recent git history)
Serge Hallyn [Wed, 20 Aug 2014 23:18:40 +0000 (23:18 +0000)]
lxc_mount_auto_mounts: honor existing nodev etc at remounts
Same problem as we had with mount_entry(). lxc_mount_auto_mounts()
sometimes does bind mount followed by remount to change options.
With recent kernels it must pass any preexisting NODEV/NOSUID/etc
flags.
Serge Hallyn [Wed, 20 Aug 2014 22:51:43 +0000 (22:51 +0000)]
mount_entry: use statvfs
Use statvfs instead of parsing /proc/self/mountinfo to check for the
flags we need to add to the MS_BIND mount flags. This will be faster
and the code is cleaner.
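The statvfs approach can be sketched in a few lines: read f_flag for the mount point and translate the bits that must be preserved into mount(2) flags. The MS_* values are the Linux constants; the function names are hypothetical, and the getattr fallbacks assume the Linux ST_* values on platforms where Python does not expose them.

```python
import os

# Linux mount(2) flag values.
MS_RDONLY, MS_NOSUID, MS_NODEV, MS_NOEXEC = 1, 2, 4, 8

# Pairs of (statvfs f_flag bit, matching mount flag).
_ST_TO_MS = (
    (getattr(os, "ST_RDONLY", 1), MS_RDONLY),
    (getattr(os, "ST_NOSUID", 2), MS_NOSUID),
    (getattr(os, "ST_NODEV", 4), MS_NODEV),
    (getattr(os, "ST_NOEXEC", 8), MS_NOEXEC),
)


def required_flags_from_f_flag(f_flag):
    """Translate statvfs f_flag bits into the mount flags to carry over."""
    flags = 0
    for st_bit, ms_bit in _ST_TO_MS:
        if f_flag & st_bit:
            flags |= ms_bit
    return flags


def required_flags(path):
    """Return the ro/nosuid/nodev/noexec flags already set on path's mount."""
    return required_flags_from_f_flag(os.statvfs(path).f_flag)
```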
Daniel Miranda [Thu, 21 Aug 2014 10:56:39 +0000 (07:56 -0300)]
build: Fix support for split build and source dirs
Building LXC in a separate target directory, by running configure from
outside the source tree, failed with multiple errors, mostly in the
Python and Lua extensions, due to assuming the source dir and build dir
are the same in a few places. To fix that:
- Pre-process setup.py with the appropriate directories at configure
time
- Introduce the build dir as an include path in the Lua Makefile
- Link the default container configuration file from the alternatives
in the configure stage, instead of setting a variable and using it
in the Makefile
Signed-off-by: Daniel Miranda <danielkza2@gmail.com> Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Serge Hallyn [Thu, 21 Aug 2014 16:02:18 +0000 (16:02 +0000)]
chmod container dir to 0770 (v2)
This prevents u2 from going into /home/u1/.local/share/lxc/u1/rootfs
and running setuid-root applications to get write access to u1's
container rootfs.
v2: set umask to 002 for the mkdir. Otherwise if umask happens to be,
say, 022, then user does not have write permissions under the container
dir and creation of $containerdir/partial file will fail.
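The effect of the umask on the mkdir mode can be reproduced with a short sketch. The helper name make_container_dir is hypothetical, and the code only demonstrates the permission arithmetic, not LXC's actual directory creation.

```python
import os
import tempfile


def make_container_dir(path, mode=0o770, umask=0o002):
    """Create path with the given mode under a temporary umask.

    Returns the permission bits the directory actually received.
    """
    old = os.umask(umask)
    try:
        os.mkdir(path, mode)
    finally:
        os.umask(old)  # always restore the caller's umask
    return os.stat(path).st_mode & 0o777


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        print(oct(make_container_dir(os.path.join(base, "c1"))))
        print(oct(make_container_dir(os.path.join(base, "c2"), umask=0o022)))
```

With umask 022 the group write bit is masked off, so the directory ends up 0750 rather than 0770.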
S.Çağlar Onur [Fri, 22 Aug 2014 16:10:12 +0000 (12:10 -0400)]
show additional info if btrfs subvolume deletion fails (issue #315)
Unprivileged users require "-o user_subvol_rm_allowed" mount option for btrfs.
Raise the INFO level message to ERROR to make this clear. It now says the following:
[caglar@qop:~] lxc-destroy -n rubik
lxc_container: Is the rootfs mounted with -o user_subvol_rm_allowed?
lxc_container: Error destroying rootfs for rubik
Destroying rubik failed
Signed-off-by: S.Çağlar Onur <caglar@10ur.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
TAMUKI Shoichi [Tue, 19 Aug 2014 00:29:49 +0000 (09:29 +0900)]
Update plamo template
- If "installpkg" command does not exist, lxc-plamo temporarily
installs the command with static linked tar command into the lxc
cache directory. The tar command does not refer to passwd/group
files, which means that only a few files/directories are extracted
with wrong user/group ownership. To avoid this, the installpkg
command now uses the standard tar command in the system.
- Change mode to 666 for $rootfs/dev/null to allow write access for
all users.
- Small fix in usage message.
Serge Hallyn [Sat, 9 Aug 2014 00:30:12 +0000 (00:30 +0000)]
monitor: fix sockname calculation for long lxcpaths
A long enough lxcpath (and small PATH_MAX through crappy defines) can cause
the creation of the string to be hashed to fail. So just use alloca to
get a string of the size we need.
More importantly, while I can't explain it, if lxcpath is too long, setting
sockname[sizeof(addr->sun_path)-2] to \0 simply doesn't seem to work. So set
sockname[sizeof(addr->sun_path)-3] to \0, which does work.
Serge Hallyn [Sat, 9 Aug 2014 00:28:18 +0000 (00:28 +0000)]
command socket: use hash if needed
The container command socket is an abstract unix socket containing
the lxcpath and container name. Those can be too long. In that case,
use the hash of the lxcpath and lxcname. Continue to use the path and
name if possible to avoid any back compat issues.
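The fallback can be sketched as: build the plain name, and switch to a hashed name only when the plain one would not fit in sun_path. The 108-byte limit is the Linux size of sun_path; the "<lxcpath>/<name>/command" and "lxc/<hash>/command" layouts and the choice of 64-bit FNV-1a are assumptions for this example, not taken from the commit.

```python
SUN_PATH_LEN = 108  # sizeof(sun_path) in struct sockaddr_un on Linux


def fnv1a_64(data):
    """64-bit FNV-1a hash of a bytes object."""
    h = 0xCBF29CE484222325
    for byte in data:
        h ^= byte
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h


def command_socket_name(lxcpath, name):
    """Abstract socket name without the leading NUL (hypothetical layout)."""
    plain = f"{lxcpath}/{name}/command"
    # Keep room for the leading NUL of abstract sockets and a trailing NUL.
    if len(plain.encode()) <= SUN_PATH_LEN - 2:
        return plain  # unchanged for short names, so nothing breaks
    digest = fnv1a_64(f"{lxcpath}/{name}".encode())
    return f"lxc/{digest:016x}/command"
```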
Serge Hallyn [Mon, 18 Aug 2014 03:28:21 +0000 (03:28 +0000)]
do_mount_entry: add noexec, nosuid, nodev, rdonly flags if needed at remount
See http://lkml.org/lkml/2014/8/13/746 and its history. The kernel now refuses
mounts if we don't add ro,nosuid,nodev,noexec flags if they were already there.
Also use the newly found info to skip remount if unneeded. For background, if
you want to create a read-only bind mount, then you must first mount(2) with
MS_BIND to create the bind mount, then re-mount(2) again to get the new mount
options to apply. So if this wasn't a bind mount, or no new mount options were
introduced, then we don't do the second mount(2).
null_endofword() and get_field() were not changed, only moved up in
the file.
(Note, while I can start containers inside a privileged container with
this patch, most of the lxc tests still fail with the kernel in question;
Andy's patch seems to still be needed - a kernel that includes it is available
at https://launchpad.net/~serge-hallyn/+archive/ubuntu/userns-natty
ppa:serge-hallyn/userns-natty)
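The remount rule described in this commit can be modelled as a small pure function: skip the second mount(2) unless this is a bind mount with extra options, and when remounting, carry over any ro/nosuid/nodev/noexec flags already present. The MS_* values are the Linux constants; the function name and its calling convention are assumptions, not the code in do_mount_entry.

```python
# Linux mount(2) flag values.
MS_RDONLY, MS_NOSUID, MS_NODEV, MS_NOEXEC = 1, 2, 4, 8
MS_REMOUNT, MS_BIND = 32, 4096
PRESERVED = MS_RDONLY | MS_NOSUID | MS_NODEV | MS_NOEXEC


def remount_flags(requested, existing):
    """Return the flags for the second mount(2) call, or None to skip it.

    requested: flags parsed from the mount entry's options.
    existing:  flags already set on the mount being bind mounted.
    """
    if not (requested & MS_BIND):
        return None  # not a bind mount: the first mount applied everything
    if (requested & ~MS_BIND) == 0:
        return None  # bind mount with no new options: nothing to change
    # The kernel refuses the remount if flags that were already set are dropped.
    return requested | MS_REMOUNT | (existing & PRESERVED)
```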
Stéphane Graber [Sat, 16 Aug 2014 21:16:36 +0000 (17:16 -0400)]
Revert "chmod container dir to 0770"
This commit broke the testsuite for unprivileged containers as the
container directory is now 0750 with the owner being the container root
and the group being the user's group, meaning that the parent user can
only enter the directory, not create entries in there.
Serge Hallyn [Thu, 14 Aug 2014 18:29:55 +0000 (18:29 +0000)]
chmod container dir to 0770
This prevents u2 from going into /home/u1/.local/share/lxc/u1/rootfs
and running setuid-root applications to get write access to u1's
container rootfs.