Thomas Jarosch [Thu, 2 Feb 2017 11:48:35 +0000 (12:48 +0100)]
lxc_setup_tios(): Ignore SIGTTOU and SIGTTIN signals
Prevent an endless loop while executing lxc-attach in the background:
The kernel might fire SIGTTOU while an ioctl() in tcsetattr()
is executed. When the ioctl() is resumed and retries,
the signal handler interrupts it again.
We can't configure the TTY to stop sending
the signals in the first place since that
is a modification/write to the TTY already.
Still we clear the TOSTOP flag to prevent further signals.
Command to reproduce the hang:
----------------------------
cat > lxc_hang.sh << EOF
/usr/bin/timeout 5s /usr/bin/lxc-attach -n SOMECONTAINER -- /bin/true
EOF
sh lxc_hang.sh # hangs
----------------------------
Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Previous versions of lxc-attach simply attached to the specified namespaces of
a container and ran a shell or the specified command without first allocating a
pseudo terminal. This made them vulnerable to input faking via a TIOCSTI ioctl
call after switching between userspace execution contexts with different
privilege levels. Newer versions of lxc-attach will try to allocate a pseudo
terminal master/slave pair on the host and attach any standard file descriptors
which refer to a terminal to the slave side of the pseudo terminal before
executing a shell or command. Note, that if none of the standard file
descriptors refer to a terminal lxc-attach will not try to allocate a pseudo
terminal. Instead it will simply attach to the containers namespaces and run a
shell or the specified command.
(This is a backport of a series of patches fixing CVE-2016-10124.)
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
- Make escape sequence to exit tty optional since we want to reuse
lxc_console_cb_tty_stdin() in lxc_attach.c.
- Export the following functions since they can be reused in other modules:
- lxc_console_cb_tty_stdin()
- lxc_console_cb_tty_master()
- lxc_setup_tios(int fd, struct termios *oldtios);
- lxc_console_winsz(int srcfd, int dstfd);
- lxc_console_cb_sigwinch_fd(int fd, uint32_t events, void *cbdata, struct lxc_epoll_descr *descr);
- lxc_tty_state *lxc_console_sigwinch_init(int srcfd, int dstfd);
- lxc_console_sigwinch_fini(struct lxc_tty_state *ts);
- rewrite lxc_console_set_stdfds()
- Make lxc_console_set_stdfds useable by other callers that do not have
access to lxc_handler.
- Use ssh settings for ptys.
- Remove all asserts from console.{c,h}.
- Adapt start.c to changes.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Serge Hallyn [Mon, 6 Mar 2017 19:36:19 +0000 (13:36 -0600)]
seccomp: set SCMP_FLTATR_ATL_TSKIP if available
Newer libseccomp has a flag called SCMP_FLTATR_ATL_TSKIP which
allows syscall '-1' (nop) to be executed. Without that flag,
debuggers cannot skip system calls inside containers. For reference,
see the seccomp(2) manpage, which says:
The tracer can skip the system call by changing the system call number to -1.
Adam Borowski [Sun, 12 Feb 2017 06:26:54 +0000 (07:26 +0100)]
seccomp: allow x32 guests on amd64 hosts.
Without this patch, x32 guests (and no others) worked "natively" with x32
host lxc, but not on regular amd64 hosts. That was especially problematic
as a number of ioctls such as those needed by netfilter don't work in such
scenarios, thus you want to run amd64 on the host.
With the patch, you can use all three ABIs: i386 x32 amd64 on amd64 hosts.
Despite x32 being little used, there's no reason to deny it by default:
the admin needs to compile their own kernel with CONFIG_X86_X32=y or (on
Debian) boot with syscall.x32=y. If they've done so, it is a reasonable
assumption they want x32 guests.
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
CVE-2017-5985: Ensure target netns is caller-owned
Before this commit, lxc-user-nic could potentially have been tricked into
operating on a network namespace over which the caller did not hold privilege.
This commit ensures that the caller is privileged over the network namespace by
temporarily dropping privilege.
Launchpad: https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1654676 Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Colin Watson [Thu, 26 Jan 2017 14:32:08 +0000 (14:32 +0000)]
Make lxc-start-ephemeral Python 3.2-compatible
On Ubuntu 12.04 LTS with Python 3.2, `lxc-start-ephemeral` breaks as
follows:
Traceback (most recent call last):
File "/usr/bin/lxc-start-ephemeral", line 371, in attach_as_user
File "/usr/lib/python3.2/subprocess.py", line 515, in check_output
File "/usr/lib/python3.2/subprocess.py", line 732, in __init__
LookupError: unknown encoding: ANSI_X3.4-1968
This is because `universal_newlines=True` causes `subprocess` to use
`io.TextIOWrapper`, and in versions of Python earlier than 3.3 that
fetched the preferred encoding using `locale.getpreferredencoding()`
rather than `locale.getpreferredencoding(False)`, thereby changing the
locale and causing codecs to be reloaded. However, `attach_as_user`
runs inside the container and thus can't rely on having access to the
same Python standard library on disk.
The workaround is to decode by hand instead, avoiding the temporary
change of locale.
Use AC_HEADER_MAJOR to detect major()/minor()/makedev()
Before the change build failed on Gentoo as:
bdev/lxclvm.c: In function 'lvm_detect':
bdev/lxclvm.c:140:4: error: implicit declaration of function 'major' [-Werror=implicit-function-declaration]
major(statbuf.st_rdev), minor(statbuf.st_rdev));
^~~~~
bdev/lxclvm.c:140:28: error: implicit declaration of function 'minor' [-Werror=implicit-function-declaration]
major(statbuf.st_rdev), minor(statbuf.st_rdev));
^~~~~
glibc plans to remove <sys/sysmacros.h> from glibc's <sys/types.h>:
https://sourceware.org/ml/libc-alpha/2015-11/msg00253.html
Gentoo already applied glibc patch to experimental glibc-2.24
to start preparingfor the change.
Autoconf has AC_HEADER_MAJOR to find out which header defines
reqiured macros:
https://www.gnu.org/software/autoconf/manual/autoconf-2.69/html_node/Particular-Headers.html
This change should also increase portability across other libcs.
Bug: https://bugs.gentoo.org/604360 Signed-off-by: Sergei Trofimovich <siarheit@google.com>
This mainly affects Trusty. The 3.13 kernel has a broken overlay module which
does not handle symlinks correctly. This is a problem for containers that use
an overlay based rootfs since safe_mount() uses /proc/<pid>/fd/<fd-number> in
its calls to mount().
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Fabrice Fontaine [Sun, 18 Dec 2016 20:39:24 +0000 (21:39 +0100)]
Add --enable-gnutls option
Previously HAVE_LIBGNUTLS was never set in config.h even if gnutls was
detected as AC_CHECK_LIB default action-if-found was overriden by
enable_gnutls=yes
This patch adds an --enable-gnutls option and will call AC_CHECK_LIB
with the default action to write HAVE_LIBGNUTLS in config.h
So far, we opened a file descriptor refering to proc on the host inside the
host namespace and handed that fd to the attached process in
attach_child_main(). This was done to ensure that LSM labels were correctly
setup. However, by exploiting a potential kernel bug, ptrace could be used to
prevent the file descriptor from being closed which in turn could be used by an
unprivileged container to gain access to the host namespace. Aside from this
needing an upstream kernel fix, we should make sure that we don't pass the fd
for proc itself to the attached process. However, we cannot completely prevent
this, as the attached process needs to be able to change its apparmor profile
by writing to /proc/self/attr/exec or /proc/self/attr/current. To minimize the
attack surface, we only send the fd for /proc/self/attr/exec or
/proc/self/attr/current to the attached process. To do this we introduce a
little more IPC between the child and parent:
* IPC mechanism: (X is receiver)
* initial process intermediate attached
* X <--- send pid of
* attached proc,
* then exit
* send 0 ------------------------------------> X
* [do initialization]
* X <------------------------------------ send 1
* [add to cgroup, ...]
* send 2 ------------------------------------> X
* [set LXC_ATTACH_NO_NEW_PRIVS]
* X <------------------------------------ send 3
* [open LSM label fd]
* send 4 ------------------------------------> X
* [set LSM label]
* close socket close socket
* run program
The attached child tells the parent when it is ready to have its LSM labels set
up. The parent then opens an approriate fd for the child PID to
/proc/<pid>/attr/exec or /proc/<pid>/attr/current and sends it via SCM_RIGHTS
to the child. The child can then set its LSM laben. Both sides then close the
socket fds and the child execs the requested process.
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>
Stéphane Graber [Mon, 14 Nov 2016 16:53:07 +0000 (11:53 -0500)]
debian: Don't depend on libui-dialog-perl
This package doesn't exist in stretch anymore, and it's unclear why we
were depending on a library to begin with (as opposed to having it
brought by whatever needs it).
Somehow this implementation of a cgroupfs backend decided to use the hierarchy
numbers it detects in /proc/cgroups and /proc/self/cgroups as indices for
the hierarchy struct. Controller numbering usually starts at 1 but may start at
0 if:
a) the controller is not mounted on a cgroups v1 hierarchy;
b) the controller is bound to the cgroups v2 single unified hierarchy; or
c) the controller is disabled
To avoid having to rework our fallback backend significantly, we should
explicitly check for each controller if hierarchy[i] != NULL.
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>
conf: merge network namespace move & rename on shutdown
On shutdown we move physical network interfaces back to the
host namespace and rename them afterwards as well as in the
later lxc_network_delete() step. However, if the device had
a name which already exists in the host namespace then the
moving fails and so do the subsequent rename attempts. When
the namespace ceases to exist the devices finally end up
in the host namespace named 'dev<ID>' by the kernel.
In order to avoid this, we do the moving and renaming in a
single step (lxc_netdev_move_by_*()'s move & rename happen
in a single netlink transaction).
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Evgeni Golov [Sat, 8 Oct 2016 16:29:30 +0000 (18:29 +0200)]
mark the python examples as having utf-8 encoding
this allows running them also under Python2, which otherwise
would choke on Stéphane's name and error out with
SyntaxError: Non-ASCII character '\xc3' in file …
- We expect destroy to fail in zfs_clone() so try to silence it so users are
not irritated when they create zfs snapshots.
- Add -r recursive to zfs_destroy(). This code is only hit when a) the
container has no snapshots or b) the user calls destroy with snapshots. So
this should be safe. Without -r snapshots will remain.
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>
James Cowgill [Mon, 15 Aug 2016 16:09:44 +0000 (16:09 +0000)]
seccomp: Implement MIPS seccomp handling
MIPS processors implement 3 ABIs: o32, n64 and n32 (similar to x32). The kernel
treats each ABI separately so syscalls disallowed on "all" arches should be
added to all three seccomp sets. This is implemented by expanding compat_arch
and compat_ctx to accept two compat architectures.
After this, the MIPS hostarch detection code and config section code is added.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Stéphane Graber [Wed, 17 Aug 2016 19:42:34 +0000 (15:42 -0400)]
Use full GPG fingerprint instead of long IDs.
With how easy it is to create a collision on a short ID nowadays and
given that the user doesn't actually have to remember or manually enter
the key ID, lets just use the full fingerprint from now on.
This fixes a double free corruption on container-requested
reboots when lxc_spawn() fails before receiving the ttys, as
lxc_fini() (part of __lxc_start()'s cleanup) calls
lxc_delete_tty().
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Antonio Terceiro [Wed, 29 Jun 2016 17:58:35 +0000 (14:58 -0300)]
lxc-debian: fix regression when creating wheezy containers
The regression was introduced by commit 3c39b0b7a2b445e08d2e2aecb05566075f4f3423 which makes it possible to
create working stretch containers by forcinig `init` to be in the
included package list.
However, `init` didn't exit before jessie, so now for wheezy we
explicitly include `sysvinit`; sysvinit on wheezy is essential,
so it would already be included anyway.
Signed-off-by: Antonio Terceiro <terceiro@debian.org>
Preetam D'Souza [Tue, 28 Jun 2016 03:12:12 +0000 (23:12 -0400)]
Include all lxcmntent.h function declarations on Bionic
Newer versions of Android (5.0+, aka API Level 21+) include mntent.h,
which declares setmntent and endmntent. This hits an edge
case with the preprocessor checks in lxcmntent.h because HAVE_SETMNTENT
and HAVE_ENDMNTENT are both defined (in Bionic's mntent.h), but conf.c
always includes lxcmntent.h on Bionic! As a result, we get compiler
warnings of implicit function declarations for setmntent endmntent.
This patch always includes setmntent/endmntent/hasmntopt function
declarations on Bionic, which gets rid of these warnings.
Antonio Terceiro [Fri, 17 Jun 2016 22:00:56 +0000 (19:00 -0300)]
lxc-debian: make sure init is installed
init 1.34 is not "Essential" anymore, in order to make it not required
on minimal chroots, docker containers, etc. Because of that we now need
to manually include it on systems that are expected to boot.
Signed-off-by: Antonio Terceiro <terceiro@debian.org>
Stewart Brodie [Tue, 10 May 2016 12:57:00 +0000 (13:57 +0100)]
Allow configuration file values to be quoted
If the value starts and ends with matching quote characters, those
characters are stripped automatically. Quote characters are the
single quote (') or double quote ("). The quote removal is done after
the whitespace trimming.
This is needed particularly in order that lxc.environment values may
have trailing spaces. However, the quote removal is done for all values
in the parse_line function, as it has non-const access to the value.
Signed-off-by: Stewart Brodie <stewart@metahusky.net>
Aron Podrigal [Sun, 1 May 2016 15:06:53 +0000 (11:06 -0400)]
Fixed - set PyErr when Container.__init__ fails
When container init failed for whatever reason, previously it resulted
in a `SystemError: NULL result without error in PyObject_Call`
This will now result in a RuntimeError with the error message
previously printed to stderr.