git.ipfire.org Git - thirdparty/lxc.git/log

statvfs: do nothing if statvfs does not exist (android/bionic)

If statvfs does not exist, then don't recalculate mount flags
at remount.

If someone does need this, they could replace the code (only
if !HAVE_STATVFS) with code parsing /proc/self/mountinfo (which
exists in the recent git history)

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

lxc_mount_auto_mounts: honor existing nodev etc at remounts

Same problem as we had with mount_entry(). lxc_mount_auto_mounts()
sometimes does bind mount followed by remount to change options.
With recent kernels it must pass any preexisting NODEV/NOSUID/etc
flags.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

mount_entry: use statvfs

Use statvfs instead of parsing /proc/self/mountinfo to check for the
flags we need to and into the msbind mount flags. This will be faster
and the code is cleaner.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

build: Fix support for split build and source dirs

Building LXC in a separate target directory, by running configure from
outside the source tree, failed with multiple errors, mostly in the
Python and Lua extensions, due to assuming the source dir and build dir
are the same in a few places. To fix that:

- Pre-process setup.py with the appropriate directories at configure
  time
- Introduce the build dir as an include path in the Lua Makefile
- Link the default container configuration file from the alternatives
  in the configure stage, instead of setting a variable and using it
  in the Makefile

Signed-off-by: Daniel Miranda <danielkza2@gmail.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

chmod container dir to 0770 (v2)

This prevents u2 from going into /home/u1/.local/share/lxc/u1/rootfs
and running setuid-root applications to get write access to u1's
container rootfs.

v2: set umask to 002 for the mkdir. Otherwise if umask happens to be,
say, 022, then user does not have write permissions under the container
dir and creation of $containerdir/partial file will fail.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

ignore SIGKILL (CTRL-C) and SIGQUIT (CTRL-\) - issue #313

Signed-off-by: S.Çağlar Onur <caglar@10ur.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

show additional info if btrfs subvolume deletion fails (issue #315)

Unprivileged users require "-o user_subvol_rm_allowed" mount option for btrfs.
Make the INFO level message to ERROR to make it clear, which now says following;

[caglar@qop:~] lxc-destroy -n rubik
lxc_container: Is the rootfs mounted with -o user_subvol_rm_allowed?
lxc_container: Error destroying rootfs for rubik
Destroying rubik failed

Signed-off-by: S.Çağlar Onur <caglar@10ur.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

Update plamo template

- If "installpkg" command does not exist, lxc-plamo temporarily
  install the command with static linked tar command into the lxc
  cache directory.  The tar command does not refer to passwd/group
  files, which means that only a few files/directories are extracted
  with wrong user/group ownership.  To avoid this, the installpkg
  command now uses the standard tar command in the system.
- Change mode to 666 for $rootfs/dev/null to allow write access for
  all users.
- Small fix in usage message.

Signed-off-by: TAMUKI Shoichi <tamuki@linet.gr.jp>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: KATOH Yasufumi <karma@jazz.email.ne.jp>

monitor: fix sockname calculation for long lxcpaths

A long enough lxcpath (and small PATH_MAX through crappy defines) can cause
the creation of the string to be hashed to fail.  So just use alloca to
get the size string we need.

More importantly, while I can't explain it, if lxcpath is too long, setting
sockname[sizeof(addr->sun_path)-2] to \0 simply doesn't seem to work.  So set
sockname[sizeof(addr->sun_path)-3] to \0, which does work.

With this, and with

lxc.lxcpath = /opt/lxc0123456789/lxc0123456789/lxc0123456789/lxc0123456789/lxc0123456789/lxc0123456789/lxc0123456789/lxc0123456789/lxc0123456789/lxc0123456789

in /etc/lxc/lxc.conf, I can run lxc-wait just fine.  Without it, it fails
(as does lxc-start -d, which uses lxc_wait to verify the container started)

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

command socket: use hash if needed

The container command socket is an abstract unix socket containing
the lxcpath and container name. Those can be too long. In that case,
use the hash of the lxcpath and lxcname. Continue to use the path and
name if possible to avoid any back compat issues.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

doc: Add destroy option to lxc-snapshot(1)

This commit is the same as the commit 18aa217 and 99e616a on master
branch, except "ALL" keyword.

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Discontinue the use of in-line comments (stable)

Those aren't supported, it's just a lucky coincidence that they weren't
causing problems.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

tests: Call sync before testing a shutdown

This should avoid tests failure when the machine running the tests has
either very slow disks or a lot of data waiting to be flushed.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

do_mount_entry: add nexec, nosuid, nodev, rdonly flags if needed at remount

See http://lkml.org/lkml/2014/8/13/746 and its history.  The kernel now refuses
mounts if we don't add ro,nosuid,nodev,noexec flags if they were already there.

Also use the newly found info to skip remount if unneeded.  For background, if
you want to create a read-only bind mount, then you must first mount(2) with
MS_BIND to create the bind mount, then re-mount(2) again to get the new mount
options to apply.  So if this wasn't a bind mount, or no new mount options were
introduced, then we don't do the second mount(2).

null_endofword() and get_field() were not changed, only moved up in
the file.

(Note, while I can start containers inside a privileged container with
this patch, most of the lxc tests still fail with the kernel in question;
Andy's patch seems to still be needed - a kernel with which is available
at https://launchpad.net/~serge-hallyn/+archive/ubuntu/userns-natty
ppa:serge-hallyn/userns-natty)

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

doc: Fix Japanese translation of lxc.container.conf(5)

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Revert "chmod container dir to 0770"

This commit broke the testsuite for unprivileged containers as the
container directory is now 0750 with the owner being the container root
and the group being the user's group, meaning that the parent user can
only enter the directory, not create entries in there.

This reverts commit c86da6a3ac517b78e6f710df7efe2f51d153b73c.

Fix typo in the previous commit...

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

Add extra debugging

This is an hybrid between Micahel's original patch and me making the new
debugging statements look like our existing ones.

Signed-off-by: "Micahel J. Evans" <mjevans1983@gmail.com>
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

doc: language correction

Signed-off-by: Lars Wikberg <lars.wikberg@anvia.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

chmod container dir to 0770

This prevents u2 from going into /home/u1/.local/share/lxc/u1/rootfs
and running setuid-root applications to get write access to u1's
container rootfs.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Dwight Engen <dwight.engen@oracle.com>

cgmanager get/set: clean up child (v2)

(Thanks, Dwight, this one look right?)

Make sure we reap our child at cgm_{s,g}et.

Changelog: Fix change in behavior on empty read from the do_cgm_get()
helper that was spotted by Dwight.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Dwight Engen <dwight.engen@oracle.com>

introduce --with-distro=raspbian

Raspberry Pi kernel finally supports all the bits required by LXC [1]

This patch makes "./configure --with-distro=raspbian" to install lxcbr0
based config file and upstart jobs.
Also src/lxc/lxc.net now checks the existence of the lxc-dnsmasq user
(and fallbacks to dnsmasq)

RPI users still need to pass
"MIRROR=http://archive.raspbian.org/raspbian/" parameter to lxc-create
to pick the correct packages

MIRROR=http://archive.raspbian.org/raspbian/ lxc-create -t debian -n rpi

[1] https://github.com/raspberrypi/linux/issues/176

stable-1.0: Cherry-picked from master minus the lxc-net change as
lxc-net isn't available in LXC 1.0.x. Instead it is assumed that the
distribution will take care of setting up the network (lxcbr0 in this
case).

Signed-off-by: S.Çağlar Onur <caglar@10ur.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

lxc-test-unpriv: test lxc-clone -s

This would have caught a regression in Ubuntu's 3.16 kernel.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

coverity: malloc the right size for btrs_node tree

We were allocating sizeof(tree) instead of sizeof(*tree).

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

coverity: don't use newname after null check

Actually, get rid of the temporary variables, and set newname
and lxcpath to usable values if they were NULL.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

doc: add lxc.console.logpath to Japanese lxc.container.conf(5)

Update for commit 96f15ca

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

add lxc.console.logpath

logpath has been supported through lxc-start command line, but not
through the API. Since the lxc.console is now required to be a device,
support lxc.console.logfile to be a simple file to which console output
will be logged.

clear_config_item is not supported, as it isn't for lxc.console, bc
you can do 'lxc.console.logfile =' to clear it.

(This patch is for stable-1.0)

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Do not allow snapshots of LVM backed containers

They don't work right now, so until we fix that, don't allow it.

(This patch is for stable-1.0)

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Ensure /dev/pts directory exists on pts setup

When `lxc.autodev = 0` and empty tmpfs is mounted on /dev
and private pts are requested, we need to ensure '/dev/pts'
exists before attempting to mount devpts on it.

Signed-off-by: Jean-Tiare LE BIGOT <jean-tiare.le-bigot@ovh.net>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Fix a file descriptor leak in the monitord spawn

Signed-off-by: Vincent Giersch <vincent.giersch@ovh.net>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Fix a file descriptor leak in the daemonization

Especially when using the Python API, the child process inherits of
the file descriptiors of the script.

Signed-off-by: Vincent Giersch <vincent.giersch@ovh.net>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

fix '--log-priority' --> '--logpriority' in main

Signed-off-by: Jean-Tiare LE BIGOT <jean-tiare.le-bigot@ovh.net>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Use portageq to determine portage distdir

Signed-off-by: Rabi Shanker Guha <guha.rabishankar@gmail.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Include hostname in DHCP requests

With the current old CentOS template, dnsmasq was not able to resolve
the hostname of an lxc container after it had been created. This minor
change rectifies that.

Signed-off-by: Kalman Olah <hello@kalmanolah.net>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

ssh: send hostname to dhcp server

Send container's hostname to dhcp server when getting ip address.

Signed-off-by: Nikolay Martynov <mar.kolya@gmail.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

templates: switch from arch command to uname -m

Signed-off-by: Michael Werner <xaseron@googlemail.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

ubuntu templates: don't check for $rootfs/run/shm

/dev/shm must be turned from a directory into a symlink to /run/shm.
The templates do this only if they find -d $rootfs/run/shm. Since /run
will be a tmpfs, checking for it in the rootfs is silly. It also is
currently broken as ubuntu cloud images have an empty /run.

(this should fix https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1353734)

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

btrfs: support recursive subvolume deletion (v2)

Pull the #defines and struct definitions for btrfs into a separate
.h file to not clutter bdev.c

Implement btrfs recursive delete support

A non-root user isn't allow to do the ioctls needed for searching (as you can
verify with 'btrfs subvolume list').  So for an unprivileged user, if the
rootfs has subvolumes under it, deletion will fail.  Otherwise, it will
succeed.

Changelog: Aug 1:
  . Fix wrong objid passing when determining directory paths
  . In do_remove_btrfs_children, avoid dereferencing NULL dirid
  . Fix memleak in error case.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

doc: Add 'zfs' to the parameter of -B option in lxc-create(1)

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

doc: Update the description of SELinux in Japanese lxc.container.conf(5)

Update for commit 719fae0

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

doc: Add -F option to Japanese lxc-start(1)

Update for commit 476d302

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

use non-thread-safe getpwuid and getpwgid for android

We only call it (so far) after doing a fork(), so this is fine. If we
ever need such a thing from threaded context, we'll simply need to write
our own version for android.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

print a helpful message if creating unpriv container with no idmap

This gives me:

ubuntu@c-t1:~$ lxc-create -t download -n u1
lxc_container: No mapping for container root
lxc_container: Error chowning /home/ubuntu/.local/share/lxc/u1/rootfs to container root
lxc_container: You must either run as root, or define uid mappings
lxc_container: To pass uid mappings to lxc-create, you could create
lxc_container: ~/.config/lxc/default.conf:
lxc_container: lxc.include = /etc/lxc/default.conf
lxc_container: lxc.id_map = u 0 100000 65536
lxc_container: lxc.id_map = g 0 100000 65536
lxc_container: Error creating backing store type (none) for u1
lxc_container: Error creating container u1

when I create a container without having an id mapping defined.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

provide an example SELinux policy for older releases

The virtd_lxc_t type provided by the default RHEL/CentOS/Oracle 6.5
policy is an unconfined_domain(), so it doesn't really enforce anything.
This change will provide a link in the documentation to an example
policy that does confine containers.

On more recent distributions with new enough policy, it is recommended
not to use this sample policy, but to use the types already available
on the system from /etc/selinux/targeted/contexts/lxc_contexts, ie:

process = "system_u:system_r:svirt_lxc_net_t:s0"
file = "system_u:object_r:svirt_sandbox_file_t:s0"

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

download: Have wget retry 3 times

This forces wget to retry if it gets a network error.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

lxc-start: Add -F (foreground) option

Introduce a new -F option (no-op for now) as an opposite of -d.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

apparmor: Allow slave bind mounts

Without this, if the system uses shared subtrees by default (like systemd), you
get a large stream of

  lxc-start: Permission denied - Failed to make /<mountpoint> rslave
  lxc-start: Continuing...

with

  apparmor="DENIED" operation="mount" info="failed flags match" error=-13
  profile="/usr/bin/lxc-start" name="/" pid=17284 comm="lxc-start" flags="rw, slave"

and eventual failure plus a lot of leftover mounts in the host.

https://launchpad.net/bugs/1325468

add help string for ubuntu templete

Signed-off-by: Trần Ngọc Quân <vnwildman@gmail.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

fix typo in btrfs error msg

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

unprivileged containers: use next available nic name if unspecified

Rather than always using eth0. Otherwise unpriv containers cannot have
multiple lxc.network.type = veth's without manually setting
lxc.network.name =.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

Sysvinit script fixes

Signed-off-by: Stefano Ansaloni <ansalonistefano@gmail.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Add SIGPWR support to lxc_init

This patch adds SIGPWR support to lxc_init.
This helps to properly shutdown lxc_init based containers.

Signed-off-by: Nikolay Martynov <mar.kolya@gmail.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

remove mountcgroup hook entirely

Also fix the comment in lxc-cirros template (which I overlooked last time).

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Remove mention of mountcgroups in ubuntu.common config

That mount hook predates the lxc.mount.auto = cgroup option. So mention
that instead.

Perhaps we should simply drop the mountcgroup hook from the tree, but
I'm not doing that in this patch.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

lxc-test-{unpriv,usernic.in}: make sure to chgrp as well

These tests are failing on new kernels because the container root is
not privileged over the directories, since privilege no requires
the group being mapped into the container.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

doc: Add mention that veth.pair is ignored for unpriv in Japanese man

Update Japanese lxc.container.conf(5) for commit 8982c0f

Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

doc: Mention that veth.pair is ignored for unpriv

veth.pair is ignore for unprivileged containers as allowing an
unprivileged user to set a specific device name would allow them to
trigger actions in tools like NetworkManager or other uevent based
handlers that may react based on specific names or prefixes being used.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

centos template: prevent mingetty from calling vhangup(2)

When using unprivileged containers, tty fails because of vhangup. Adding
--nohangup to nimgetty, it fixes the issue. This is the same problem
occurred for oracle template, commit 2e83f7201c5d402478b9849f0a85c62d5b9f1589

Signed-off-by: Claudio Alarcon clalarco@gmail.com

Fix typo in previous patch

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

confile: sanity-check netdev->type before setting netdev->priv elements

The netdev->priv is shared for the netdev types. A bad config file
could mix configuration for different types, resulting in a bad
netdev->priv when starting or even destroying a container. So sanity
check the netdev->type before setting a netdev->priv element.

This should fix https://github.com/lxc/lxc/issues/254

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

rootfs_is_blockdev: don't run if no rootfs is specified

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

change version to 1.0.5 in configure.ac

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

tests: lxc-test-ubuntu doesn't actually need bind9-host

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

tests: Clarify error message and fix return codes

Reported-by: Michael J. Evans
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

lxc-debian: standardize formatting

Signed-off-by: Alexander Dreweke <alexander@dreweke.net>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

lxc-debian: fix formatting

added space ">/" -> "> /"

Signed-off-by: Alexander Dreweke <alexander@dreweke.net>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Also add --verison support to lxc-start-ephemeral

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

Add support for --version to lxc-ls and lxc-device

This is based on the patch submitted by:
Yuto KAWAMURA(kawamuray) <kawamuray.dadada@gmail.com>

Updated to use lxc.version rather than @LXC_VERSION@ and to apply to
both lxc-ls and lxc-device rather than just the former.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

Fix attach_wait and threads

Signed-off-by: Dorian Eikenberg <dorian.eikenberg@uni-duesseldorf.de>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Fix incorrect timeout handling of do_reboot_and_check()

Currently do_reboot_and_check() is decreasing timeout variable even if
it is set to -1, so running 'lxc-stop --reboot --timeout=-1 ...' will
exits immediately at end of second iteration of loop, without waiting
container reboot.
Also, there is no need to call gettimeofday if timeout is set to -1, so
these statements should be evaluated only when timeout is enabled.

Signed-off-by: Yuto KAWAMURA(kawamuray) <kawamuray.dadada@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

Change find_fstype_cb to ignore blank lines and comments

/etc/filesystems could be contain blank lines and comments.
Change find_fstype_cb() to ignore blank lines and comments which starts
with '#'.

Signed-off-by: Yuto KAWAMURA(kawamuray) <kawamuray.dadada@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

Fix invalid seccomp profile name for Ubuntu.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

make the container exit code propagate to lxc-start exit code when appropriate

Signed-off-by: Rodrigo Sampaio Vaz <rodrigo@heroku.com>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>

chown_mapped_root: don't try chgrp if we don't own the file

New kernels require that to have privilege over a file, your
userns must have the old and new groups mapped into your userns.
So if a file is owned by our uid but another groupid, then we
have to chgrp the file to our primary group before we can try
(in a new user namespace) to chgrp the file to a group id in the
namespace.

But in some cases (when cloning) the file may already be mapped
into the container. Now we cannot chgrp the file to our own
primary group - and we don't have to.

So detect that case. Only try to chgrp the file to our primary
group if the file is owned by our euid (i.e. not by the container)
and the owning group is not already mapped into the container by
default.

With this patch, I'm again able to both create and clone containers
with no errors again.

Reported-by: S.Çağlar Onur <caglar@10ur.org>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Enable default seccomp profile for all distros

This updates the common config to include Serge's seccomp profile by
default for privileged containers.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

Apparmor: allow hugetlbfs mounts everywhere

Signed-off-by: Jesse Tane <jesse.tane@gmail.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

Cast to gid_t to fix android build failure

stat.st_gid is unsigned long in bionic instead of the expected gid_t, so
just cast it to gid_t.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

Fix to work lxc-destroy with unprivileged containers on recent kernel

Change idmap_add_id() to add both ID_TYPE_UID and ID_TYPE_GID entries
to an existing lxc_conf, not just an ID_TYPE_UID entry, so as to work
lxc-destroy with unprivileged containers on recent kernel.

Signed-off-by: TAMUKI Shoichi <tamuki@linet.gr.jp>
Acked-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

Fix to work lxc-start with unprivileged containers on recent kernel

Change chown_mapped_root() to map in both the root uid and gid, not
just the uid, so as to work lxc-start with unprivileged containers on
recent kernel.

Signed-off-by: TAMUKI Shoichi <tamuki@linet.gr.jp>
Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

Don't call sig_name twice, use pointer instead

Signed-off-by: Alexander Vladimirov <alexander.idkfa.vladimirov@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

cgm_get: make sure @value is null-terminated

Previously this was done by strncpy, but now we just read
the len bytes - not including \0 - from a pipe, so pre-fill
@value with 0s to be safe.

This fixes the python3 api_test failure.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>

cgmanager: have cgm_set and cgm_get use absolute path when possible

This allows users to get/set cgroup settings when logged into a different
session than that from which they started the container.

There is no cgmanager command to do an _abs variant of cgmanager_get_value
and cgmanager_set_value.  So we fork off a new task, which enters the
parent cgroup of the started container, then can get/set the value from
there.  The reason not to go straight into the container's cgroup is that
if we are freezing the container, or the container is already frozen, we'll
freeze as well :)  The reason to fork off a new task is that if we are
in a cgroup which is set to remove-on-empty, we may not be able to return
to our original cgroup after making the change.

This should fix https://github.com/lxc/lxc/issues/246

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

attach: Fix querying for the current personality

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>

Prevent write_config from corrupting container config

write_config doesn't check the value sig_name function returns,
this causes write_config to produce corrupted container config when
using non-predefined signal names.

Signed-off-by: Alexander Vladimirov <alexander.idkfa.vladimirov@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

Include ubuntu.priv.seccomp in dist tarball

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

ubuntu containers: use a seccomp filter by default (v2)

Blacklist module loading, kexec, and open_by_handle_at (the cause of the
not-docker-specific dockerinit mounts namespace escape).

This should be applied to all arches, but iiuc stgraber will be doing
some reworking of the commonizations which will simplify that, so I'm
not doing it here.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

seccomp: fix 32-bit rules

When calling seccomp_rule_add(), you must pass the native syscall number
even if the context is a 32-bit context. So use resolve_name rather
than resolve_name_arch.

Enhance the check of /proc/self/status for Seccomp: so that we do not
enable seccomp policies if seccomp is not built into the kernel. This
is needed before we can enable by-default seccomp policies (which we
want to do next)

Fix wrong return value check from seccomp_arch_exist, and remove
needless abstraction in arch handling.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

seccomp: support 'all' arch sections (plus bugfixes)

seccomp_ctx is already a void*, so don't use 'scmp_filter_ctx *'

Separately track the native arch from the arch a rule is aimed at.

Clearly ignore irrelevant architectures (i.e. arm rules on x86)

Don't try to load seccomp (and don't fail) if we are already
seccomp-confined.  Otherwise nested containers fail.

Make it clear that the extra seccomp ctx is only for compat calls
on 64-bit arch.  (This will be extended to arm64 when libseccomp
supports it).  Power may will complicate this (if ever it is supported)
and require a new rethink and rewrite.

NOTE - currently when starting a 32-bit container on 64-bit host,
rules pertaining to 32-bit syscalls (as opposed to once which have
the same syscall #) appear to be ignored.  I can reproduce that without
lxc, so either there is a bug in seccomp or a fundamental
misunderstanding in how I"m merging the contexts.

Rereading the seccomp_rule_add manpage suggests that keeping the seccond
seccomp context may not be necessary, but this is not something I care
to test right now.  If it's true, then the code could be simplified, and
it may solve my concerns about power.

With this patch I'm able to start nested containers (with seccomp
policies defined) including 32-bit and 32-bit-in-64-bit.

[ this patch does not yet add the default seccomp policy ]

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

fix the expansion of libexecdir when not explicitly passed to configure

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

split -lcap and -lselinux out of LIBS

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

don't force dropping capabilities in lxc-init

Commit 0af683cf added clearing of capabilities to lxc-init, but only
after lxc_setup_fs() was done, likely so that the mounting done in
that routine wouldn't fail.

However, in my testing lxc_caps_reset() wasn't really effective
anyway since it did not clear the bounding set. Adding prctl
PR_CAPBSET_DROP in a loop from 0 to CAP_LAST_CAP would fix this, but I
don't think its necessary to forcefully clear all capabilities since
users can now specify lxc.cap.keep = none to drop all capabilities.

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

Fix typo in lxc_attach's usage

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

seccomp: warn but continue on unresolvable syscalls

If a syscall is listed which is not resolvable, continue. This allows
us to keep a more complete list of syscalls in a global seccomp policy
without having to worry about older kernels not supporting the newer
syscalls.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

bdev.c: initialize a pointer to avoid build failures with -Werror=maybe-uninitialized

Signed-off-by: Leonid Isaev <lisaev@umail.iu.edu>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>

lxc-autostart: Respect -P

-P was only used for log setup and not when retrieving the container list.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

tests: apparmor: Always end with a newline

Some error messages in lxc-test-apparmor didn't end with a newline,
leading to slightly difficult to read output.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

tests: Don't fail when HOME isn't defined

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

tests: Avoid the download template when possible

The use of the download template with an hardcoded --arch=amd64 in aa.c
was causing test failures on any platform incapable of running amd64
binaries.

This wasn't noticed in the CI environment as we run the tests within
containers on an amd64 kernel but this caused failures on the Ubuntu CI
environment.

Instead, let's use the busybox template, tweaking the configuration when
needed to match the needs of the testcase.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

change version to 1.0.4 in configure.ac

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>

tests: Wait 5s for init to respond in lxc-test-autostart

lxc-test-autostart occasionaly fails at the restart test in the CI
environment. Looking at the current test case, the most obvious race
there is if lxc-wait exists succesfuly immediately after LXC marked the
container RUNNING (init spawned) but before init had a chance to setup
the signal handlers.

To avoid this potential race period, let's add a 5s delay between the
tests to give a chance for init to finish starting up.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>