Robert Schiele [Fri, 21 Aug 2015 05:35:34 +0000 (07:35 +0200)]
check for NULL pointers before calling setenv()
Latest glibc release actually honours calling setenv with a NULL
pointer by causing SIGSEGV but checking pointers before submitting
to any system function is a good idea anyway.
Signed-off-by: Robert Schiele <rschiele@gmail.com>
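The guard described in this commit can be sketched as a small wrapper. This is only an illustration of the idea, not the code from the commit; the name safe_setenv is hypothetical and does not exist in LXC.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* expose setenv() under strict -std= modes */
#endif
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical wrapper (not an LXC function): refuse NULL arguments
 * instead of handing them to setenv().
 * Returns 0 on success, -1 with errno set on failure.
 */
int safe_setenv(const char *name, const char *value, int overwrite)
{
	if (!name || !value) {
		errno = EINVAL;
		return -1;
	}
	return setenv(name, value, overwrite);
}
```

A caller would use safe_setenv("container", "lxc", 1) exactly like setenv() and treat a -1 return as a configuration error rather than risking a crash inside libc.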
Tycho Andersen [Fri, 14 Aug 2015 16:24:47 +0000 (10:24 -0600)]
c/r: enable tracefs
tracefs is a new filesystem that can be mounted by users. Only the options
and fs name need to be passed to restore the state, so we can use criu's
auto fs feature.
Robert LeBlanc [Thu, 13 Aug 2015 19:36:55 +0000 (13:36 -0600)]
Caps are getting lost when cloning an LXC container. Adding the -X parameter copies the extended attributes. This allows things like ping to continue to be used by a non-privileged user, at least on Debian.
Tycho Andersen [Mon, 10 Aug 2015 17:12:18 +0000 (11:12 -0600)]
c/r: get rid of dump_net_info()
This was originally used to propagate the bridge and veth names across
hosts, but now we extract both from the container's config file, and
nothing reads the files that dump_net_info() writes, so let's just get rid
of them.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
reuse label cleanup since free(NULL) is a no-op
Signed-off-by: Arjun Sreedharan <arjun024@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
When setting lxc.network.veth.pair to get a fixed interface
name the recreation of it after a reboot caused an EEXIST.
-) The reboot flag is now a three-state value. It's set to
1 to request a reboot, and 2 during a reboot until after
lxc_spawn where it is reset to 0.
-) If the reboot flag is set (!= 0) within instantiate_veth and
a fixed name is used, the interface is now deleted before
being recreated.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
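The three-state reboot flag and the delete-before-recreate rule can be sketched as follows. The enum and function names are hypothetical and chosen for illustration; only the values 0, 1 and 2 and their meaning come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the three states described in the commit. */
enum reboot_state {
	REBOOT_NONE = 0,        /* normal operation */
	REBOOT_REQUESTED = 1,   /* a reboot has been requested */
	REBOOT_IN_PROGRESS = 2, /* rebooting, until after lxc_spawn */
};

/*
 * Decide whether a stale veth interface has to be deleted before it is
 * recreated: only when the reboot flag is set (!= 0) and the
 * configuration pins a fixed pair name.
 */
bool must_delete_stale_veth(enum reboot_state reboot, const char *fixed_pair_name)
{
	return reboot != REBOOT_NONE &&
	       fixed_pair_name != NULL && fixed_pair_name[0] != '\0';
}
```

With an auto-generated interface name there is nothing to collide with, so the sketch returns false in that case regardless of the reboot state.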
Jiri Slaby [Wed, 5 Aug 2015 08:32:54 +0000 (10:32 +0200)]
templates: lxc-opensuse, use rpm to determine build version
zypper info's output is not usable for several reasons:
* it is localized -- there is no "Version: " in my output
* it shows results both from the repo and local system
So use plain rpm to determine whether build is installed and if proper
version is in place.
1) Two checks on amd64 for whether compat_ctx has already
been generated were redundant, as compat_ctx is generally
generated before entering the parsing loop.
2) With introduction of reject_force_umount the check for
whether the syscall has the same id on both native and
compat archs results in false behavior as this is an
internal keyword and thus produces a -1 on
seccomp_syscall_resolve_name_arch().
The result was that it was added to the native architecture
twice and never to the 32 bit architecture, causing it to
have no effect on 32 bit containers on 64 bit hosts.
3) I do not see a reason to care about whether the syscalls
have the same number on the two architectures. On the one
hand this check was there to avoid adding it to two archs
(and effectively leaving one arch unprotected), while on
the other hand it seemed to be okay to add it to the
same arch *twice*.
The entire architecture checking branches are now reduced to
three simple cases: 'native', 'non-native' and 'all'. With
'all' adding to both architectures regardless of the syscall
ID.
Also note that libseccomp had a bug in its architecture
checking, so architecture related filters weren't working as
expected before version 2.2.2, which may have contributed to
the confusion in the original architecture-related code.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
The Fedora 22 squashfs doesn't appear to work and the Fedora 21 one isn't
available, so let's use the fedora archive mirror and pull the good old
Fedora 20 squashfs.
Loop devices can be added on the fly when needed, they're
not always created beforehand. The loop-control device can
be used to find and allocate the next available number
instead of going through the /dev directory contents (which
is now only a fallback mechanism).
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
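Asking the loop-control device for a free index might look like the sketch below. It assumes a Linux system with <linux/loop.h> available; the function name is hypothetical, and the fallback of scanning /dev is left to the caller.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/loop.h>

/*
 * Ask the kernel for the index of a free loop device.
 * Returns the index N (for /dev/loopN), or -1 on failure, in which
 * case a caller could fall back to scanning the /dev directory.
 */
int next_free_loop_index(void)
{
	int fd, idx;

	fd = open("/dev/loop-control", O_RDWR);
	if (fd < 0)
		return -1;

	/* LOOP_CTL_GET_FREE finds a free device, allocating one if needed. */
	idx = ioctl(fd, LOOP_CTL_GET_FREE);
	close(fd);

	return idx < 0 ? -1 : idx;
}
```

Opening /dev/loop-control usually requires privileges, so an unprivileged caller should expect -1 and take the fallback path.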
CVE-2015-1334: Don't use the container's /proc during attach
A user could otherwise over-mount /proc and prevent the apparmor profile
or selinux label from being written which combined with a modified
/bin/sh or other commonly used binary would lead to unconfined code
execution.
Reported-by: Roman Fiedler
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
KATOH Yasufumi [Thu, 25 Jun 2015 09:14:04 +0000 (18:14 +0900)]
Support unprivileged ephemeral container using aufs
As of commit 31a882e, an unprivileged container can use aufs.
This patch removes the check for unprivileged aufs and changes the path of
the xino file so that an unprivileged user can mount aufs.
Signed-off-by: KATOH Yasufumi <karma@jazz.email.ne.jp>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Stéphane Graber [Thu, 18 Jun 2015 19:55:45 +0000 (15:55 -0400)]
lxc-net: Use iproute and relative paths everywhere (V2)
V2 changes:
- Keep using /var/lib for the lease file, but make it respect localstatedir
- Don't pass an empty --conf-file as that confuses dnsmasq when
/etc/dnsmasq.conf doesn't exist or isn't readable.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Lenz Grimmer [Fri, 12 Jun 2015 23:08:41 +0000 (01:08 +0200)]
use `hostname` for DHCP_HOSTNAME in ifcfg-eth0
Updated centos/fedora/oracle templates to use `hostname` for DHCP_HOSTNAME in
/etc/sysconfig/network/ifcfg-eth0, so the container's host name is propagated
to the host's DHCP server (e.g. dnsmasq, which also acts as the DNS server).
This resolves lxc/lxd#756
Dennis Schridde [Thu, 11 Jun 2015 17:51:02 +0000 (19:51 +0200)]
Adopt capability drop explanations from other distros on Gentoo, drop setpcap,sys_nice caps
Documents setpcap, sys_admin and sys_resource as breaking systemd, but does not drop them from lxc.cap.drop, since the default init system on Gentoo is OpenRC and anything that only breaks systemd can therefore be blocked anyway.
This also drops setpcap and sys_nice caps, as these are also dropped in other non-systemd distros.
Most of the explanatory blurb was copied from other distros' configs.
Serge Hallyn [Thu, 11 Jun 2015 04:08:15 +0000 (23:08 -0500)]
daemonized start: exit children on failure, don't return
When starting a daemonized container, only the original parent
thread should return to the caller. The first forked child
immediately exits after forking, but the grandchild
was in some places returning on error, causing a second instance
of the calling function to run.
Tycho Andersen [Wed, 10 Jun 2015 21:57:50 +0000 (21:57 +0000)]
uniformly nullify std fds
In various places throughout the code, we want to "nullify" the std fds,
opening them to /dev/null or zero or so. Instead, let's unify this code and do
it in such a way that Coverity (probably) won't complain.
v2: use /dev/null for stdin as well
v3: add a comment about use of C's short circuiting
v4: axe comment, check errors on dup2, s/quiet/need_null_stdfds
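A unified helper of the kind described here could be sketched like this. It is an assumption-laden illustration, not the function added by the commit: the name null_fds and its array-based signature are hypothetical, chosen so that the same code can be used on descriptors other than 0-2.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Point every descriptor in fds[] at /dev/null.
 * Returns 0 on success, -1 if /dev/null cannot be opened or a dup2() fails.
 */
int null_fds(const int *fds, size_t n)
{
	bool keep = false;
	int ret = 0;
	int nullfd = open("/dev/null", O_RDWR);

	if (nullfd < 0)
		return -1;

	for (size_t i = 0; i < n; i++) {
		if (fds[i] == nullfd) {
			/* open() already landed on a target fd; leave it open. */
			keep = true;
			continue;
		}
		if (dup2(nullfd, fds[i]) < 0) {
			ret = -1;
			break;
		}
	}

	if (!keep)
		close(nullfd);
	return ret;
}
```

For the standard descriptors a caller would pass an array holding STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO. Checking every dup2() return value is the part that v4 of the patch calls out explicitly.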
Daniel Golle [Tue, 9 Jun 2015 10:58:12 +0000 (12:58 +0200)]
fix build on mpc85xx
Initialize ret to 0 so compiler no longer complains about
monitor.c: In function 'lxc_monitor_open':
monitor.c:212:5: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized]
https://github.com/openwrt/packages/issues/1356
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Serge Hallyn [Tue, 2 Jun 2015 22:33:34 +0000 (22:33 +0000)]
api_start: always close fds 0-2 when daemonized
commit 507cee3618237d3 moved the close and re-open of fds 0-2 into
do_start. But this means that the lxc monitor itself keeps the
caller's fds 0-2 open, which is wrong for daemonized containers.
Closes #548
Reported-by: Mathieu Le Marec - Pasquet <kiorky@cryptelium.net>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
Serge Hallyn [Wed, 27 May 2015 10:05:16 +0000 (10:05 +0000)]
cgmanager: attach: never use 'all' controller
We were using the 'all' controller if the current task was in the
same cgroup for all controllers. That doesn't suffice; we'd have
to check the target as well. At that point we may as well just
attach controller by controller.
An optimization to consider is to check the /proc/initpid/cgroup
for all identical controllers. Let's start by just getting it
right.
Stéphane Graber [Fri, 29 May 2015 15:39:25 +0000 (11:39 -0400)]
Fix ABI compatibility
Until we bump the SONAME to liblxc2, only symbol additions and struct
member additions are allowed.
Adding struct members in the middle of the struct breaks backward
compatibility.
This commit makes it clear when struct members were added and moves a
few members that were added in the middle of the 1.0 struct to the end
of it.
Note that unfortunately that means we're breaking backward compatibility
between LXC 1.1.0 and the state after this commit. Given that 1.1 is
reasonably new, this is the least damaging way of fixing the problem.
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
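Why appending is safe and inserting is not can be checked at compile time. The structs below are made-up stand-ins, not the real liblxc container struct, and the argument assumes that the library, not the caller, allocates the struct.

```c
#include <assert.h>
#include <stddef.h>

/* Layout as shipped in a hypothetical first release. */
struct container_v1 {
	char *name;
	int (*start)(struct container_v1 *c);
	int (*stop)(struct container_v1 *c);
};

/* Compatible update: the new member is appended at the end. */
struct container_v2_ok {
	char *name;
	int (*start)(struct container_v2_ok *c);
	int (*stop)(struct container_v2_ok *c);
	int (*checkpoint)(struct container_v2_ok *c); /* added later */
};

/* Incompatible update: the new member shifts 'stop'. */
struct container_v2_bad {
	char *name;
	int (*start)(struct container_v2_bad *c);
	int (*checkpoint)(struct container_v2_bad *c); /* added in the middle */
	int (*stop)(struct container_v2_bad *c);
};

/* Old binaries look up 'stop' at the v1 offset, so it must not move. */
_Static_assert(offsetof(struct container_v2_ok, stop) ==
	       offsetof(struct container_v1, stop),
	       "appending members keeps existing offsets");
_Static_assert(offsetof(struct container_v2_bad, stop) !=
	       offsetof(struct container_v1, stop),
	       "inserting members moves existing offsets");
```

A program built against the v1 layout and run against the "bad" layout would call checkpoint when it meant to call stop, which is the kind of breakage this commit undoes.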
Dwight Schauer [Tue, 2 Jun 2015 04:41:09 +0000 (23:41 -0500)]
The yum in CentOS 5.11 does not know about '--releasever', which is used by: lxc-create ... -- release=VERSION
The release version only needs to be set in the outer bootstrap, not the inner one.
With this change an lxc-create bootstrap of CentOS 5.11 completes enough to be usable.
CentOS 5.11 containers can be created, started, stopped, and networking works.
Signed-off-by: Dwight Schauer <das@teegra.net>
Serge Hallyn [Sun, 17 May 2015 20:14:13 +0000 (20:14 +0000)]
proc update - don't assume we are pid 1
(I erred in the first patch, causing every lxc-attach to unmount the
container's /proc)
Since we now use mount_proc_if_needed() from attach, as opposed to only
from start, we cannot assume we are pid 1. So fix the check for whether
to mount a new proc.
Serge Hallyn [Sun, 17 May 2015 13:04:47 +0000 (13:04 +0000)]
attach: mount a sane proc for LSM setup
To set lsm labels, a namespace-local proc mount is needed.
If a container does not have a lxc.mount.auto = proc set, then
tasks in the container do not have a correct /proc mount until
init feels like doing the mount. At startup we handle this
by mounting a temporary /proc if needed. We weren't doing this
at attach, though, so that
Serge Hallyn [Fri, 1 May 2015 21:11:28 +0000 (21:11 +0000)]
Use 'cgm listcontrollers' list rather than /proc/self/cgroup
to populate the list of subsystems to use.
Cgmanager can be started with some subsystems disabled (i.e.
cgmanager -M cpuset). If lxc using cgmanager then uses the
/proc/self/cgroup output to determine which controllers to use,
it will fail when trying to do things to cpuset. Instead, ask
cgmanager which controllers to use.
This still defers (per patch 1/1) to the lxc.cgroup.use values.
Doing this requires some btrfs functions from bdev to be used in
utils.c. Because utils.h is imported by lxc_init.c, I had to create
a new initutils.[ch], which is used by both lxc_init.c and utils.c.
We could instead put the btrfs functions into utils.c, which would
be a shorter patch, but it really doesn't belong there. So I went
the other way figuring there may be more such cases coming up of
fns in utils.c needing code from bdev.c which can't go into lxc_init.
Currently, if we detect a btrfs subvolume we just remove it. The
st_dev on that dir is different, so we cannot detect if this is
bound in from another fs easily. If we care, we should check
whether this is a mountpoint; this patch doesn't do that.
Use POSIX-compliant function names in bash completion
When running in posix mode (for example, because it was invoked as `sh`,
or with the --posix option), bash rejects the function names previously
used because they contain hyphens, which are not legal POSIX names, and
exits immediately.
This is a particularly serious problem on a system in which the
following three conditions hold:
1. The `sh` executable is provided by bash, e. g. via a symlink
2. Gnome Display Manager is used to launch X sessions
3. Bash completion is loaded in the (system or user) profile file
instead of in the bashrc file
In that case, GDM's Xsession script (run with `sh`, i. e., bash in posix
mode) sources the profile files, thus causing the shell to load the bash
completion files. Upon encountering the non-POSIX-compliant function
names, bash would then exit, immediately ending the X session.
Fixes #521.
Signed-off-by: Lucas Werkmeister <mail@lucaswerkmeister.de>
Cyril Bitterich [Sat, 9 May 2015 19:57:14 +0000 (21:57 +0200)]
lxc-debian.in: Fixed errors if dbus is not installed
The lxc-debian template debootstraps a minimal Debian system which does not contain dbus.
If systemd is used, this results in getty-static.service being used instead of getty@.
The systemd default files use 6 ttys instead of the 4 the script creates.
This leads to repeated error messages in the systemd journal.
Martin Pitt [Thu, 7 May 2015 11:38:50 +0000 (13:38 +0200)]
Call /lib/apparmor/profile-load directly instead of the wrapper
AppArmor ships /lib/apparmor/profile-load. /lib/init/apparmor-profile-load is
merely a wrapper which calls the former, so just call it directly to avoid the
dependency on the wrapper.
Kien Truong [Mon, 6 Apr 2015 16:20:43 +0000 (17:20 +0100)]
Properly free memory of sorted cgroup settings
We need to use lxc_list_for_each_safe, otherwise de-allocation
will fail with a list size bigger than 2. The pointer to the head
of the list also needs freeing after we've freed all other elements
of the list.
Signed-off-by: Kien Truong <duckientruong@gmail.com>
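The reason a "safe" iterator is needed while freeing can be shown with a minimal circular list. The types and helpers below are hypothetical and only loosely modelled on a head-plus-nodes list; they are not the lxc_list implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal circular doubly linked list with a dedicated head node. */
struct node {
	void *elem;
	struct node *next;
	struct node *prev;
};

/* Remember the successor before the body runs, so the body may free 'it'. */
#define list_for_each_safe(it, head, nxt) \
	for ((it) = (head)->next, (nxt) = (it)->next; (it) != (head); \
	     (it) = (nxt), (nxt) = (nxt)->next)

struct node *list_new(void)
{
	struct node *head = malloc(sizeof(*head));

	if (head) {
		head->elem = NULL;
		head->next = head->prev = head;
	}
	return head;
}

int list_add_tail(struct node *head, void *elem)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->elem = elem;
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
	return 0;
}

/* Free every node, then the head itself; returns the number of nodes freed. */
int list_free_all(struct node *head)
{
	struct node *it, *nxt;
	int freed = 0;

	list_for_each_safe(it, head, nxt) {
		free(it); /* elem is not owned by the list in this sketch */
		freed++;
	}
	free(head);
	return freed;
}
```

A plain iterator would read it->next after free(it); the safe variant has already stored that pointer, and the final free(head) covers the second half of the fix.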
Kien Truong [Sun, 5 Apr 2015 23:46:22 +0000 (23:46 +0000)]
Sort the cgroup memory settings before applying.
Add a function to sort the cgroup settings before applying.
Currently, the function will put memory.memsw.limit_in_bytes after
memory.limit_in_bytes setting so the container will start
regardless of the order specified in the input. Fix #453
Signed-off-by: Kien Truong <duckientruong@gmail.com>
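The reordering can be sketched on a plain array. The struct and function names are hypothetical, and the commit itself works on LXC's own list type rather than an array; only the rule that memory.memsw.limit_in_bytes must come after memory.limit_in_bytes is taken from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct cgroup_setting {
	const char *key;
	const char *value;
};

/*
 * If memory.memsw.limit_in_bytes appears before memory.limit_in_bytes,
 * move it to directly after memory.limit_in_bytes. All other entries
 * keep their relative order.
 */
void order_memory_limits(struct cgroup_setting *s, size_t n)
{
	size_t memsw = n, limit = n;

	for (size_t i = 0; i < n; i++) {
		if (memsw == n && strcmp(s[i].key, "memory.memsw.limit_in_bytes") == 0)
			memsw = i;
		else if (limit == n && strcmp(s[i].key, "memory.limit_in_bytes") == 0)
			limit = i;
	}

	if (memsw == n || limit == n || memsw > limit)
		return; /* nothing to reorder */

	/* Shift the entries in between one slot left, then re-insert memsw. */
	struct cgroup_setting tmp = s[memsw];
	memmove(&s[memsw], &s[memsw + 1], (limit - memsw) * sizeof(*s));
	s[limit] = tmp;
}
```

The order matters because the kernel rejects a memsw limit that is lower than the current memory limit, so the memory limit has to be written first.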
c/r: check for criu images in the checkpoint directory
CRIU can get confused if there are two dumps that are written to the same
directory, so we make some minimal effort to prevent people from doing this.
This is a better alternative than forcing liblxc to create the directory, since
it is mostly race free (and neither solution is bullet proof anyway if someone
rsyncs some bad images over the top of the good ones).
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
This updates lxc-net with the following changes:
- Better recover from crashes/partial runs
- Better error detection and reporting
- Less code duplication (use the stop code on crash)
- Better state tracking
- Allow for restart of all of lxc-net except for the bridge itself
- Only support iproute from this point on (ifconfig's been deprecated
for years)
V2: Use template variables everywhere
Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Make lxc-checkconfig work with kernel versions > 3
(1) Add a test for kernel versions greater than 3.
(2) Use && and || instead of -a and -o as suggested in
http://www.unix.com/man-page/posix/1p/test/.
lxc-checkconfig currently reports "missing" for "Cgroup memory controller"
on kernel versions greater than 3. This happens because the script, before
checking for the corresponding memory variable in the kernel config, tests
whether the major kernel version is greater than or equal to 3 and the minor
kernel version is greater than or equal to 6. This adds an additional test
for a major kernel version greater than 3.
Signed-off-by: Christian Brauner <christianvanbrauner@gmail.com>
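The corrected condition is easy to get wrong, so here it is spelled out. lxc-checkconfig itself is a shell script; the C function below is a hypothetical restatement of the logic only.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * True when major.minor is at least 3.6.
 * The "major > 3" branch is the case the old check missed: it required
 * major >= 3 AND minor >= 6, which rejects versions such as 4.0.
 */
bool kernel_at_least_3_6(int major, int minor)
{
	return major > 3 || (major == 3 && minor >= 6);
}
```

Comparing the major version first and looking at the minor version only when the major versions are equal is the general pattern for two-component version checks.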
Particularly when using the go-lxc api with lots of threads, it
happens that if the open files limit is > 1024, we will try to
select on fd > 1024 which breaks on glibc.
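One common way around the fd_set size limit of select() is to wait with poll(), which takes descriptors by value. Whether the commit took this route is not stated in the message above, so treat the following as a sketch of that general technique with a hypothetical helper name.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait until fd is readable, without the FD_SETSIZE limit of select().
 * Returns 1 if readable (or at end of file), 0 on timeout, -1 on error.
 */
int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret <= 0)
		return ret; /* 0: timeout, -1: error (errno set by poll) */

	return (pfd.revents & (POLLIN | POLLHUP)) ? 1 : -1;
}
```

A production version would typically also retry when poll() fails with EINTR; that is left out here to keep the sketch short.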
logs: introduce a thread-local 'current' lxc_config (v2)
The logging code uses a global log_fd and log_level to direct
logging (ERROR(), etc). While the container configuration file allows
for lxc.loglevel and lxc.logfile, those are only used at configuration
file read time to set the global variables. This works ok in the
lxc front-end programs, but becomes a problem with threaded API users.
The simplest solution would be to not allow per-container configuration
files, but it'd be nice to avoid that.
Passing a logfd or lxc_conf into every ERROR/INFO/etc call is "possible",
but would be a huge complication as there are many functions, including
struct member functions and callbacks, which don't have that info and
would need to get it from somewhere.
So the approach I'm taking here is to say that all real container work
is done inside api calls, and therefore the API calls themselves can
set a thread-local variable indicating which log info to use. If
unset, then use the global values. The lxc-* programs, when called
with a '-o logfile' argument, set a global variable to indicate that
the user-specified value should be used.
In this patch:
If the lxc container configuration specifies a loglevel/logfile, only
set the lxc_config's logfd and loglevel according to those, not the
global values.
Each API call is wrapped to set/unset the current_config. (The few
exceptions are calls which do not result in any log actions)
Update logfile appender to use the logfile specified in lxc_conf if (a)
current_config is set and (b) the lxc-* command did not override it.
Changelog (2015-04-21):
. always re-set current_config to NULL at end of an API
call, rather than storing the previous value. We don't
nest API calls.
. remove the log_lock stuff which wasn't used
. lxc_conf_free: if the config is current_config, set
current_config to NULL. (It can't be another thread's
current_config, or we wouldn't be freeing it)
. lxc_check_inherited: don't close fd if it is the
current_config->logfd. Note this is only called when
starting a container, so we have no other threads at
this point.
Changelog (2015-04-22)
. Unset the per-container logfd on destroy
.
. Do so before we rm the containerdir. Otherwise if the logfile is set
. to $lxcpath/$name/log, the containerdir won't be fully deleted.
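The thread-local approach can be sketched in a few lines. Everything here is a reduced, hypothetical stand-in: struct conf is not the real lxc_conf, the helper names are invented, and the '-o logfile' override of the lxc-* programs is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for the real configuration struct. */
struct conf {
	int logfd;
	int loglevel;
};

/* Global default, used when no API call is in progress on this thread. */
static int global_logfd = 2;

/* Per-thread pointer to the configuration of the container being worked on. */
static _Thread_local struct conf *current_config;

/* Pick the log fd: per-container if one is current and set, else global. */
int effective_logfd(void)
{
	if (current_config && current_config->logfd >= 0)
		return current_config->logfd;
	return global_logfd;
}

/* Shape of a wrapped API call: set current_config, work, reset to NULL. */
int api_call(struct conf *c, int (*work)(struct conf *))
{
	int ret;

	current_config = c;
	ret = work(c);
	current_config = NULL; /* API calls are not nested */
	return ret;
}

/* Example payload: report which fd a log statement would use right now. */
int report_logfd(struct conf *c)
{
	(void)c;
	return effective_logfd();
}
```

Because the pointer is thread-local, two threads working on different containers each see their own logfd, while code running outside any API call falls back to the global value.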
If we don't re-open these after clone, the init process has a pointer to the
parent's /dev/{zero,null}. CRIU sees these and wants to dump the parent's
mount namespace, which is unnecessary. Instead, we should just re-open
stdin/out/err after we do the clone and pivot root, to ensure that we have
pointers to the devices in init's rootfs instead of the host's.
v2: Only close fds if the container was daemonized. This didn't turn out as
nicely as described on the list because lxc_start() doesn't actually have
the struct lxc_container, so it can't see the flag. Instead, we just pass it
down everywhere.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>