In case the lxc command socket is hashed and the socket was created for a
different path than the one we're currently querying
lxc_cmd_get_{lxcpath,name}() can return NULL. The command socket path is hashed
when len(lxcpath) > sizeof(sun_path) - 2.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
These functions define pointer to their key shifted by a
number and guard access to it later via another variable.
Let's make this more explicit (and additionally have the
pointer be NULL in the case where it is not supposed to be
used).
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Thomas Jarosch [Thu, 2 Feb 2017 11:48:35 +0000 (12:48 +0100)]
lxc_setup_tios(): Ignore SIGTTOU and SIGTTIN signals
Prevent an endless loop while executing lxc-attach in the background:
The kernel might fire SIGTTOU while an ioctl() in tcsetattr()
is executed. When the ioctl() is resumed and retries,
the signal handler interrupts it again.
We can't configure the TTY to stop sending
the signals in the first place since that
is a modification/write to the TTY already.
Still we clear the TOSTOP flag to prevent further signals.
Command to reproduce the hang:
----------------------------
cat > lxc_hang.sh << EOF
/usr/bin/timeout 5s /usr/bin/lxc-attach -n SOMECONTAINER -- /bin/true
EOF
sh lxc_hang.sh # hangs
----------------------------
Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Previous versions of lxc-attach simply attached to the specified namespaces of
a container and ran a shell or the specified command without first allocating a
pseudo terminal. This made them vulnerable to input faking via a TIOCSTI ioctl
call after switching between userspace execution contexts with different
privilege levels. Newer versions of lxc-attach will try to allocate a pseudo
terminal master/slave pair on the host and attach any standard file descriptors
which refer to a terminal to the slave side of the pseudo terminal before
executing a shell or command. Note, that if none of the standard file
descriptors refer to a terminal lxc-attach will not try to allocate a pseudo
terminal. Instead it will simply attach to the containers namespaces and run a
shell or the specified command.
(This is a backport of a series of patches fixing CVE-2016-10124.)
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
- Make escape sequence to exit tty optional since we want to reuse
lxc_console_cb_tty_stdin() in lxc_attach.c.
- Export the following functions since they can be reused in other modules:
- lxc_console_cb_tty_stdin()
- lxc_console_cb_tty_master()
- lxc_setup_tios(int fd, struct termios *oldtios);
- lxc_console_winsz(int srcfd, int dstfd);
- lxc_console_cb_sigwinch_fd(int fd, uint32_t events, void *cbdata, struct lxc_epoll_descr *descr);
- lxc_tty_state *lxc_console_sigwinch_init(int srcfd, int dstfd);
- lxc_console_sigwinch_fini(struct lxc_tty_state *ts);
- rewrite lxc_console_set_stdfds()
- Make lxc_console_set_stdfds useable by other callers that do not have
access to lxc_handler.
- Use ssh settings for ptys.
- Remove all asserts from console.{c,h}.
- Adapt start.c to changes.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Serge Hallyn [Mon, 6 Mar 2017 19:36:19 +0000 (13:36 -0600)]
seccomp: set SCMP_FLTATR_ATL_TSKIP if available
Newer libseccomp has a flag called SCMP_FLTATR_ATL_TSKIP which
allows syscall '-1' (nop) to be executed. Without that flag,
debuggers cannot skip system calls inside containers. For reference,
see the seccomp(2) manpage, which says:
The tracer can skip the system call by changing the system call number to -1.
Adam Borowski [Sun, 12 Feb 2017 06:26:54 +0000 (07:26 +0100)]
seccomp: allow x32 guests on amd64 hosts.
Without this patch, x32 guests (and no others) worked "natively" with x32
host lxc, but not on regular amd64 hosts. That was especially problematic
as a number of ioctls such as those needed by netfilter don't work in such
scenarios, thus you want to run amd64 on the host.
With the patch, you can use all three ABIs: i386 x32 amd64 on amd64 hosts.
Despite x32 being little used, there's no reason to deny it by default:
the admin needs to compile their own kernel with CONFIG_X86_X32=y or (on
Debian) boot with syscall.x32=y. If they've done so, it is a reasonable
assumption they want x32 guests.
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
CVE-2017-5985: Ensure target netns is caller-owned
Before this commit, lxc-user-nic could potentially have been tricked into
operating on a network namespace over which the caller did not hold privilege.
This commit ensures that the caller is privileged over the network namespace by
temporarily dropping privilege.
Launchpad: https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1654676 Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Colin Watson [Thu, 26 Jan 2017 14:32:08 +0000 (14:32 +0000)]
Make lxc-start-ephemeral Python 3.2-compatible
On Ubuntu 12.04 LTS with Python 3.2, `lxc-start-ephemeral` breaks as
follows:
Traceback (most recent call last):
File "/usr/bin/lxc-start-ephemeral", line 371, in attach_as_user
File "/usr/lib/python3.2/subprocess.py", line 515, in check_output
File "/usr/lib/python3.2/subprocess.py", line 732, in __init__
LookupError: unknown encoding: ANSI_X3.4-1968
This is because `universal_newlines=True` causes `subprocess` to use
`io.TextIOWrapper`, and in versions of Python earlier than 3.3 that
fetched the preferred encoding using `locale.getpreferredencoding()`
rather than `locale.getpreferredencoding(False)`, thereby changing the
locale and causing codecs to be reloaded. However, `attach_as_user`
runs inside the container and thus can't rely on having access to the
same Python standard library on disk.
The workaround is to decode by hand instead, avoiding the temporary
change of locale.
Use AC_HEADER_MAJOR to detect major()/minor()/makedev()
Before the change build failed on Gentoo as:
bdev/lxclvm.c: In function 'lvm_detect':
bdev/lxclvm.c:140:4: error: implicit declaration of function 'major' [-Werror=implicit-function-declaration]
major(statbuf.st_rdev), minor(statbuf.st_rdev));
^~~~~
bdev/lxclvm.c:140:28: error: implicit declaration of function 'minor' [-Werror=implicit-function-declaration]
major(statbuf.st_rdev), minor(statbuf.st_rdev));
^~~~~
glibc plans to remove <sys/sysmacros.h> from glibc's <sys/types.h>:
https://sourceware.org/ml/libc-alpha/2015-11/msg00253.html
Gentoo already applied glibc patch to experimental glibc-2.24
to start preparingfor the change.
Autoconf has AC_HEADER_MAJOR to find out which header defines
reqiured macros:
https://www.gnu.org/software/autoconf/manual/autoconf-2.69/html_node/Particular-Headers.html
This change should also increase portability across other libcs.
Bug: https://bugs.gentoo.org/604360 Signed-off-by: Sergei Trofimovich <siarheit@google.com>
This mainly affects Trusty. The 3.13 kernel has a broken overlay module which
does not handle symlinks correctly. This is a problem for containers that use
an overlay based rootfs since safe_mount() uses /proc/<pid>/fd/<fd-number> in
its calls to mount().
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Fabrice Fontaine [Sun, 18 Dec 2016 20:39:24 +0000 (21:39 +0100)]
Add --enable-gnutls option
Previously HAVE_LIBGNUTLS was never set in config.h even if gnutls was
detected as AC_CHECK_LIB default action-if-found was overriden by
enable_gnutls=yes
This patch adds an --enable-gnutls option and will call AC_CHECK_LIB
with the default action to write HAVE_LIBGNUTLS in config.h
So far, we opened a file descriptor refering to proc on the host inside the
host namespace and handed that fd to the attached process in
attach_child_main(). This was done to ensure that LSM labels were correctly
setup. However, by exploiting a potential kernel bug, ptrace could be used to
prevent the file descriptor from being closed which in turn could be used by an
unprivileged container to gain access to the host namespace. Aside from this
needing an upstream kernel fix, we should make sure that we don't pass the fd
for proc itself to the attached process. However, we cannot completely prevent
this, as the attached process needs to be able to change its apparmor profile
by writing to /proc/self/attr/exec or /proc/self/attr/current. To minimize the
attack surface, we only send the fd for /proc/self/attr/exec or
/proc/self/attr/current to the attached process. To do this we introduce a
little more IPC between the child and parent:
* IPC mechanism: (X is receiver)
* initial process intermediate attached
* X <--- send pid of
* attached proc,
* then exit
* send 0 ------------------------------------> X
* [do initialization]
* X <------------------------------------ send 1
* [add to cgroup, ...]
* send 2 ------------------------------------> X
* [set LXC_ATTACH_NO_NEW_PRIVS]
* X <------------------------------------ send 3
* [open LSM label fd]
* send 4 ------------------------------------> X
* [set LSM label]
* close socket close socket
* run program
The attached child tells the parent when it is ready to have its LSM labels set
up. The parent then opens an approriate fd for the child PID to
/proc/<pid>/attr/exec or /proc/<pid>/attr/current and sends it via SCM_RIGHTS
to the child. The child can then set its LSM laben. Both sides then close the
socket fds and the child execs the requested process.
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>
Stéphane Graber [Mon, 14 Nov 2016 16:53:07 +0000 (11:53 -0500)]
debian: Don't depend on libui-dialog-perl
This package doesn't exist in stretch anymore, and it's unclear why we
were depending on a library to begin with (as opposed to having it
brought by whatever needs it).
Somehow this implementation of a cgroupfs backend decided to use the hierarchy
numbers it detects in /proc/cgroups and /proc/self/cgroups as indices for
the hierarchy struct. Controller numbering usually starts at 1 but may start at
0 if:
a) the controller is not mounted on a cgroups v1 hierarchy;
b) the controller is bound to the cgroups v2 single unified hierarchy; or
c) the controller is disabled
To avoid having to rework our fallback backend significantly, we should
explicitly check for each controller if hierarchy[i] != NULL.
Signed-off-by: Christian Brauner <christian.brauner@canonical.com>
conf: merge network namespace move & rename on shutdown
On shutdown we move physical network interfaces back to the
host namespace and rename them afterwards as well as in the
later lxc_network_delete() step. However, if the device had
a name which already exists in the host namespace then the
moving fails and so do the subsequent rename attempts. When
the namespace ceases to exist the devices finally end up
in the host namespace named 'dev<ID>' by the kernel.
In order to avoid this, we do the moving and renaming in a
single step (lxc_netdev_move_by_*()'s move & rename happen
in a single netlink transaction).
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>