Serge Hallyn [Tue, 25 Feb 2014 05:08:26 +0000 (23:08 -0600)]
always check whether rootfs is shared
(this expands on Dwight's recent patch, commit c597baa8f9)
After unshare(CLONE_NEWNS) and before doing any mounting, always
check whether rootfs is shared. Otherwise template runs or clone
scripts can bleed mount activity to the host.
Serge Hallyn [Fri, 21 Feb 2014 20:36:06 +0000 (14:36 -0600)]
add dir support
It used to be supported with the lxc-create.in script, and
the manpage says it's supported... So let's just support it.
Now
sudo lxc-create -t download --dir /opt/ab -n ab
works, creating the container rootfs under /opt/ab. This
generally isn't something I'd recommend, however telling users
to use a different lxc-path isn't as friendly as I'd like,
because each lxcpath requires separate lxc-ls and lxc-autostart
runs.
Dwight Engen [Wed, 19 Feb 2014 21:44:19 +0000 (16:44 -0500)]
fix mounts not propagating back to root mntns during create and clone
Systems based on systemd mount the root shared by default. We don't want
mounts done during creation by templates nor those done internally by
bdev during rsync based clones to propagate to the root mntns.
The create case already had the right check, but the mount call was
missing "/", so it was failing.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Tue, 18 Feb 2014 22:33:51 +0000 (17:33 -0500)]
Set a reasonable fallback for get_rundir
If get_rundir can't find XDG_RUNTIME_DIR in the environment, it'll
attempt to build a path using ~/.cache/lxc/run/. Should that fail
because of missing $HOME in the environment, it'll then return NULL an
all callers will fail in that case.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Tue, 18 Feb 2014 21:12:52 +0000 (15:12 -0600)]
Fix unprivileged networking
If we are unprivileged and have asked for a veth device, then create
a pipe over which to pass the veth names.
Network-related todos:
1. set mtu on the container side of veth device
2. set mtu in lxc-user-nic. Note that this probably requires an
update to the /etc/lxc/lxc-usernet file :(
Serge Hallyn [Tue, 18 Feb 2014 21:01:38 +0000 (15:01 -0600)]
cache whether 'optional' was in mntopts
after commit 4e4ca16158f91ac1271495638a4e62881169474e we are
checking for optional in mntopts after we forcibly remove it.
Cache whether we had it before removing it.
Serge Hallyn [Mon, 17 Feb 2014 18:47:35 +0000 (12:47 -0600)]
attach: try to use the container's seccomp policy
We can't get the actual policy (in the case where the policy file
has changed) from the container, but at least we can use the
seccomp policy file listed in the container config file.
(If anyone wants to further improve this, it may be better to get
the seccomp policy over the cmd api; not sure that's what we want,
and this seems simpler to hook into the existing code, so I went
this way for now)
Stéphane Graber [Mon, 17 Feb 2014 15:51:53 +0000 (10:51 -0500)]
download: Support nested containers in unpriv
This adds detection for the case where we are root in an unprivileged
container and then run LXC from there. In this case, we want to download
to the system location, ignore the missing uid/gid ranges and run
templates that are userns-ready.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
S.Çağlar Onur [Sun, 16 Feb 2014 21:20:48 +0000 (16:20 -0500)]
fill missing netdev fields for unprivileged containers
lxc-user-nic now returns the names of the interfaces and
unpriv_assign_nic function parses that information to fill
missing netdev->veth_attr.pair and netdev->name.
With this patch get_running_config_item started to provide
correct information;
Dwight Engen [Thu, 13 Feb 2014 21:13:03 +0000 (16:13 -0500)]
create fd, stdin, stdout, stderr symlinks in /dev
The kernel's Documentation/devices.txt says that these symlinks should
exist in /dev (they are listed in the "Compulsory" section). I'm not
currently adding nfsd and X0R since they are required for iBCS, but
they can be easily added to the array later if need be.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Acked-by: Michael H. Warfield <mhw@WittsEnd.com> Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Stéphane Graber [Thu, 13 Feb 2014 17:42:21 +0000 (12:42 -0500)]
lxc-start-ephemeral: Use attach
With this change, systems that support it will use attach to run any
provided command.
This doesn't change the default behaviour of attaching to tty1, but it
does make it much easier to script or even get a quick shell with:
lxc-start-ephemeral -o p1 -n p2 -- /bin/bash
I'm doing the setgid,initgroups,setuid,setenv magic in python rather
than using the attach_wait parameters as I need access to the pwd module
in the target namespace to grab the required information.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Thu, 13 Feb 2014 16:17:48 +0000 (11:17 -0500)]
coverity: Do chdir following chroot
We used to do chdir(path), chroot(path). That's correct but not properly
handled coverity, so do chroot(path), chdir("/") instead as that's the
recommended way.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Thu, 13 Feb 2014 06:52:52 +0000 (00:52 -0600)]
overlayfs_clonepaths: if unpriv then rsync in a userns
This allows lxc-snapshot and lxc-clone -s from an overlayfs container
to work unprivileged. (lxc-clone -s from a directory backed container
already did work)
Stéphane Graber [Wed, 12 Feb 2014 22:46:06 +0000 (17:46 -0500)]
Fix some configure.ac issues
- Run on distro without lsb_release
- Don't try and interpret with_runtime_path as a command
- Don't print stuff on screen while in the middle of a check
Stéphane Graber [Wed, 12 Feb 2014 22:30:12 +0000 (17:30 -0500)]
travis: Build using the daily PPA
Now that we depend on seccomp2, the backport currently in precise is too
old to allow for a succesful build, so instead use ppa:ubuntu-lxc/daily
which contains recent versions of all needed build-dependencies.
Serge Hallyn [Wed, 12 Feb 2014 21:50:20 +0000 (15:50 -0600)]
seccomp: introduce v2 policy (v2)
v2 allows specifying system calls by name, and specifying
architecture. A policy looks like:
2
whitelist
open
read
write
close
mount
[x86]
open
read
Also use SCMP_ACT_KILL by default rather than SCMP_ACT_ERRNO(31) -
which confusingly returns 'EMLINK' on x86_64. Note this change
is also done for v1 as I think it is worthwhile.
With this patch, I can in fact use a seccomp policy like:
2
blacklist
mknod errno 0
after which 'sudo mknod null c 1 3' silently succeeds without
creating the null device.
changelog v2:
add blacklist support
support default action
support per-rule action
Stéphane Graber [Wed, 12 Feb 2014 16:58:15 +0000 (11:58 -0500)]
lxc-start-ephemeral: Allow unprivileged run
This allows running lxc-start-ephemeral using overlayfs. aufs remains
blocked as it hasn't been looked at and patched to work in the kernel at
this point (not sure if it ever wil).
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Wed, 12 Feb 2014 04:20:03 +0000 (22:20 -0600)]
check for access to lxcpath
The previous check for access to rootfs->path failed in the case of
overlayfs or loop backign stores. Instead just check early on for
access to lxcpath.