Rachid Koucha [Sat, 12 Oct 2019 11:05:50 +0000 (13:05 +0200)]
Bad sgml/man translation
When running "man lxc.container.conf", an internal "man" keyword is displayed:
$ man lxc.container.conf
[...]
lxc.mount.entry
Specify a mount point corresponding to a line in the fstab format. Moreover lxc supports mount propagation, such as
rslave or rprivate, and adds three additional mount options. optional don't fail if mount does not work. create=dir
or create=file to create dir (or file) when the point will be mounted. relative source path is taken to be relative to
the mounted container root. For instance,
In the usual case the child runs in a separate pid namespace, and so far we
haven't been able to reliably set the pdeath signal there. After setting the
pdeath signal we need to verify that we haven't lost a race in which we were
orphaned before the signal took effect: in that case the signal won't help us
since, well, the parent is already dead.
We could already handle this case correctly when we were in the same pidns,
since getppid() returns a valid pid there. In a separate pidns, getppid()
returns 0 because the parent doesn't exist in our pidns.
A while back, while Jann and I were discussing other things, he came up with a
nifty idea: simply pass an fd for the parent's status file and check its
"State:" field. This is the implementation of that idea.
Suggested-by: Jann Horn <jann@thejh.net>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
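A minimal C sketch of the idea, assuming Linux procfs semantics. It is not LXC's implementation: `open_own_status()`, `status_says_dead()`, `parent_is_dead()` and `set_death_signal()` are hypothetical names, and treating an unreadable or unparsable status file as "dead" is an assumption. The parent opens its own status file before creating the child; the child arms the signal with `prctl(PR_SET_PDEATHSIG)` and then re-reads the inherited fd to check the "State:" field.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/prctl.h>
#include <unistd.h>

/*
 * Hypothetical: called by the parent before clone()/fork(). /proc/self
 * resolves to the caller at open time, so the child inherits an fd that
 * describes its parent even if the parent is invisible in its pidns.
 */
int open_own_status(void)
{
	return open("/proc/self/status", O_RDONLY | O_CLOEXEC);
}

/*
 * Hypothetical parser: does the "State:" field of a status file say
 * zombie ('Z') or dead ('X')? A missing field counts as dead, since we
 * cannot confirm that the process is alive.
 */
bool status_says_dead(const char *status)
{
	const char *state;

	if (strncmp(status, "State:", 6) == 0)
		state = status + 6;
	else if ((state = strstr(status, "\nState:")) != NULL)
		state += 7;
	else
		return true;

	while (*state == ' ' || *state == '\t')
		state++;

	return *state == 'Z' || *state == 'X';
}

/*
 * Re-read the parent's status through the inherited fd. pread() at
 * offset 0 makes procfs regenerate the contents. A failed read (e.g.
 * ESRCH once the parent has been reaped) is treated as "dead".
 */
bool parent_is_dead(int parent_status_fd)
{
	char buf[4096];
	ssize_t len;

	len = pread(parent_status_fd, buf, sizeof(buf) - 1, 0);
	if (len <= 0)
		return true;
	buf[len] = '\0';

	return status_says_dead(buf);
}

/*
 * Arm the parent-death signal, then check whether we already lost the
 * race. Returns 0 if armed, 1 if the parent was already dead (the signal
 * will never arrive), -1 if prctl() failed.
 */
int set_death_signal(int sig, int parent_status_fd)
{
	if (prctl(PR_SET_PDEATHSIG, (unsigned long)sig, 0, 0, 0) < 0)
		return -1;

	return parent_is_dead(parent_status_fd) ? 1 : 0;
}
```

In the child, a return value of 1 from `set_death_signal(SIGKILL, fd)` means the parent died before the signal was armed, so the child should exit on its own. Going through the fd sidesteps getppid(), which returns 0 across pid namespaces.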
Julio Faracco [Thu, 5 Sep 2019 04:43:21 +0000 (01:43 -0300)]
utils: Copying source filename to avoid missing info.
Some applications use the information returned by LOOP_GET_STATUS64. The
file backing a loop device is recorded in the structure field
`lo_file_name`, but the current code sets up loop devices without filling
in this information. A legacy example of code checking this is cryptsetup:
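A sketch of the fix, assuming the `struct loop_info64` / `LOOP_SET_STATUS64` interface from `<linux/loop.h>`. The helper names are hypothetical and this is not the cryptsetup code referred to above; the point is that the source path is copied into `lo_file_name` before `LOOP_SET_STATUS64`, so a later `LOOP_GET_STATUS64` reports it.

```c
#define _GNU_SOURCE
#include <linux/loop.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: fill a loop_info64 so that LOOP_GET_STATUS64
 * later reports the backing file in lo_file_name. The field is a
 * fixed-size byte array (LO_NAME_SIZE bytes), so long paths are
 * truncated; the memset() guarantees NUL termination.
 */
void loop_info_set_source(struct loop_info64 *info, const char *source)
{
	memset(info, 0, sizeof(*info));
	strncpy((char *)info->lo_file_name, source, LO_NAME_SIZE - 1);
}

/*
 * Hypothetical setup sequence: bind the backing file to the loop
 * device, then record its name. Needs privileges to succeed.
 */
int loop_attach(int loop_fd, int backing_fd, const char *source)
{
	struct loop_info64 info;

	if (ioctl(loop_fd, LOOP_SET_FD, backing_fd) < 0)
		return -1;

	loop_info_set_source(&info, source);
	if (ioctl(loop_fd, LOOP_SET_STATUS64, &info) < 0) {
		/* Don't leave a half-configured device behind. */
		ioctl(loop_fd, LOOP_CLR_FD, 0);
		return -1;
	}

	return 0;
}
```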
Antonio Terceiro [Sun, 18 Aug 2019 20:30:32 +0000 (17:30 -0300)]
lxc-attach: make sure exit status of command is returned
Commit ae68cad763d5b39a6a9e51de2acd1ad128b720ca introduced a regression that
makes lxc-attach ignore the exit status of the executed command. This was first
identified in 3.0.4 LTS, whereas it worked correctly in 3.0.3.
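The expected behaviour is that lxc-attach exits with the status of the command it ran. A hypothetical helper showing that propagation with `waitpid()` (not the actual lxc-attach code; mapping death by signal to 128 + signal number is an assumption borrowed from shell convention):

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Wait for the attached command and turn its wait status into an exit
 * code for our own caller. Returns -1 if the status cannot be obtained.
 */
int wait_for_exit_code(pid_t pid)
{
	int status;
	pid_t ret;

	do {
		ret = waitpid(pid, &status, 0);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -1;

	if (WIFEXITED(status))
		return WEXITSTATUS(status);

	/* Assumption: shell convention, 128 + signal number. */
	if (WIFSIGNALED(status))
		return 128 + WTERMSIG(status);

	return -1;
}
```

The caller would then `exit(wait_for_exit_code(pid))` rather than returning a fixed status, which is exactly what the regression lost.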
cgfsng: mount pure unified cgroup layout correctly
When pure cgroup unified mode is used we cannot pre-mount a tmpfs as this
confuses systemd.
Users should also set lxc.mount.auto = cgroup:force to ensure that systemd in
the container and on the host use identical cgroup layouts.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
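For illustration only: one common way to recognise the pure unified layout is that `/sys/fs/cgroup` itself is a cgroup2 mount, detectable with `statfs()` and `CGROUP2_SUPER_MAGIC`. The helper below is a hypothetical sketch, not necessarily how cgfsng decides.

```c
#define _GNU_SOURCE
#include <linux/magic.h>
#include <sys/vfs.h>

/*
 * Hypothetical check: in pure unified mode the cgroup root is a cgroup2
 * mount rather than a tmpfs holding per-controller mounts.
 * Returns 1 for cgroup2, 0 for any other filesystem, -1 on error.
 */
int is_cgroup2_mount(const char *path)
{
	struct statfs sfs;

	if (statfs(path, &sfs) < 0)
		return -1;

	return sfs.f_type == CGROUP2_SUPER_MAGIC ? 1 : 0;
}
```

Under that assumption, `is_cgroup2_mount("/sys/fs/cgroup") == 1` on the host is the situation in which no tmpfs should be pre-mounted for the container.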
Julio Faracco [Sat, 3 Aug 2019 05:16:13 +0000 (02:16 -0300)]
utils: Fix wrong integer of a function parameter.
If SSL is enabled, utils includes the function `do_sha1_hash()` to compute
the SHA-1 digest of a buffer. The last argument of `EVP_DigestFinal_ex()`
must be an `unsigned int *`, but the current code passes an `int *`.
See error:
    utils.c:350:38: error: passing 'int *' to parameter of type 'unsigned int *' converts between pointers to integer types with different sign [-Werror,-Wpointer-sign]
    EVP_DigestFinal_ex(mdctx, md_value, md_len);
                                        ^~~~~~
    /usr/include/openssl/evp.h:549:49: note: passing argument to parameter 's' here
            unsigned int *s);
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Most kernels don't support this functionality yet, so the warning is
printed a lot. Warnings alarm our users, so let's log this case at INFO
level instead.
suppress false-negative error in templates and nvidia hook
``/proc`` might be mounted with ``hidepid=2``.
This makes ``/proc/1/…`` appear absent for non-root users.
When the templates or the nvidia hook are run as a non-root user
(e.g., when creating unprivileged containers), the error
"/proc/1/uid_map: No such file or directory" is printed.
Since the script works correctly despite the error, the
message is likely to confuse users.
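The templates and the hook are shell scripts, so the following C sketch only illustrates the pattern under stated assumptions: a uid_map file is checked for the initial user namespace's identity mapping `0 0 4294967295`, and a missing file (as under `hidepid=2`) is reported to the caller as -1 instead of producing an error message. `uid_map_is_identity()` and `probe_uid_map()` are hypothetical names.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stdio.h>

/* Does the uid_map stream contain the full identity mapping? */
bool uid_map_is_identity(FILE *f)
{
	unsigned long inside, outside, count;

	while (fscanf(f, "%lu %lu %lu", &inside, &outside, &count) == 3) {
		if (inside == 0 && outside == 0 && count == 4294967295UL)
			return true;
	}

	return false;
}

/*
 * Probe a uid_map file without printing anything. Returns 1 (identity),
 * 0 (some other mapping) or -1 if the file is not visible, e.g. because
 * /proc is mounted with hidepid=2 and hides /proc/1 from non-root
 * users. The caller decides whether -1 deserves a message at all.
 */
int probe_uid_map(const char *path)
{
	FILE *f = fopen(path, "r");
	int ret;

	if (!f)
		return -1;

	ret = uid_map_is_identity(f) ? 1 : 0;
	fclose(f);
	return ret;
}
```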
seccomp: send caller pidfd along with proxied requests
First, this should close the race between the process exiting
and the proxy reading the request.
Second, it helps the proxy quickly access information
from /proc (such as ./cwd, ./ns/mnt, ...).
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
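Passing a descriptor alongside a request uses `SCM_RIGHTS` ancillary data on the Unix socket. A minimal sketch, assuming one fd per message; `send_with_fd()` and `recv_with_fd()` are hypothetical helpers, and how the caller's pidfd is obtained is out of scope here.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send buf plus one file descriptor in a single packet. */
ssize_t send_with_fd(int sock, const void *buf, size_t len, int fd)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	struct msghdr msg = {0};
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align; /* forces correct alignment */
	} control;
	struct cmsghdr *cmsg;

	memset(&control, 0, sizeof(control));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = control.buf;
	msg.msg_controllen = sizeof(control.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, MSG_NOSIGNAL);
}

/* Receive one packet and at most one passed fd (*fd = -1 if none). */
ssize_t recv_with_fd(int sock, void *buf, size_t len, int *fd)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = {0};
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} control;
	struct cmsghdr *cmsg;
	ssize_t ret;

	*fd = -1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = control.buf;
	msg.msg_controllen = sizeof(control.buf);

	/* MSG_CMSG_CLOEXEC: the received fd is close-on-exec. */
	ret = recvmsg(sock, &msg, MSG_CMSG_CLOEXEC);
	if (ret < 0)
		return ret;

	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_RIGHTS &&
		    cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
			memcpy(fd, CMSG_DATA(cmsg), sizeof(int));
			break;
		}
	}

	return ret;
}
```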
We only read the message without the cookie. For now, check
that the sender didn't try to send more than that by having
`recvmsg()` return the original size of the packet (via
`MSG_TRUNC`) if it was longer.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
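The size check can be sketched as follows. It assumes Linux, where `MSG_TRUNC` in the `recvmsg()` flags makes datagram and `SOCK_SEQPACKET` Unix sockets return the packet's full length; `recv_packet_checked()` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Receive one packet into buf. With MSG_TRUNC the return value is the
 * real packet length even if it did not fit, so an oversized message is
 * detected instead of being silently cut short. Excess bytes are lost.
 */
ssize_t recv_packet_checked(int sock, void *buf, size_t len, bool *too_long)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
	ssize_t ret;

	ret = recvmsg(sock, &msg, MSG_TRUNC);
	if (ret < 0)
		return ret;

	*too_long = (size_t)ret > len;
	return ret;
}
```

A caller expecting exactly `sizeof(msg)` bytes would treat `too_long`, or any return value other than `sizeof(msg)`, as a protocol error.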
With the previous commit we now attempt to reconnect to the
proxy at the beginning of the notify handler if we have no
connection.
If the connection fails later on, there is no real need to
reconnect immediately, since we send a default response
anyway (particularly if the recv() fails). (This also gives
the proxy more time, for instance if it was just
restarted.)
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
If a syscall happens after we have already failed to communicate
with the proxy, proxy_fd is -1.
Before the previous commit we'd then be stuck in the state
where there was no proxy registered. With the previous
commit we'd send a default reply and only then try to
reconnect.
Improve this even further by trying to reconnect right at
the start.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
seccomp: send default response when there's no proxy
Particularly when there's no proxy registered (i.e. none is
configured, but the seccomp profile still has a 'notify'
rule), we don't want to leave the calling process hanging.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
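Taken together, the reconnect behaviour described in the last few commits can be sketched as below. Everything here is hypothetical (the names, the `enum notify_outcome` values, the single send/recv transaction); only the policy follows the text: reconnect at the start, answer with a default response when there is no proxy, and after a failure answer immediately and leave reconnecting to the next notification.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical outcomes for one seccomp notification. */
enum notify_outcome {
	NOTIFY_FORWARDED,        /* the proxy answered; forward its reply */
	NOTIFY_DEFAULT_RESPONSE, /* the caller sends the default response */
};

/* Connect to a SOCK_SEQPACKET proxy listening on path; -1 on failure. */
static int proxy_connect(const char *path)
{
	struct sockaddr_un addr = { .sun_family = AF_UNIX };
	int fd;

	if (strlen(path) >= sizeof(addr.sun_path))
		return -1;
	strcpy(addr.sun_path, path);

	fd = socket(AF_UNIX, SOCK_SEQPACKET | SOCK_CLOEXEC, 0);
	if (fd < 0)
		return -1;

	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}

/* *proxy_fd persists across notifications; -1 means "not connected". */
enum notify_outcome handle_notification(const char *path, int *proxy_fd,
					const void *req, size_t req_len,
					void *resp, size_t resp_len)
{
	/* Reconnect right at the start if an earlier notification lost the proxy. */
	if (*proxy_fd < 0)
		*proxy_fd = proxy_connect(path);

	/* Still no proxy (not running, or none configured): don't hang the caller. */
	if (*proxy_fd < 0)
		return NOTIFY_DEFAULT_RESPONSE;

	if (send(*proxy_fd, req, req_len, MSG_NOSIGNAL) == (ssize_t)req_len &&
	    recv(*proxy_fd, resp, resp_len, 0) > 0)
		return NOTIFY_FORWARDED;

	/*
	 * The transaction failed: answer with the default response now and
	 * leave reconnecting to the next notification, which also gives a
	 * restarting proxy more time.
	 */
	close(*proxy_fd);
	*proxy_fd = -1;
	return NOTIFY_DEFAULT_RESPONSE;
}
```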
If the first sendmsg() fails, try to reconnect once before
failing. Otherwise, if the proxy restarts while no syscall
happens, the next syscall always fails with ENOSYS.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
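A sketch of the retry-once behaviour, assuming a `SOCK_SEQPACKET` proxy listening on a filesystem path; `proxy_connect()` and `proxy_send_request()` are hypothetical names. `MSG_NOSIGNAL` keeps a dead peer from raising SIGPIPE, so a stale connection simply shows up as a failed send.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect to a SOCK_SEQPACKET proxy listening on path; -1 on failure. */
static int proxy_connect(const char *path)
{
	struct sockaddr_un addr = { .sun_family = AF_UNIX };
	int fd;

	if (strlen(path) >= sizeof(addr.sun_path))
		return -1;
	strcpy(addr.sun_path, path);

	fd = socket(AF_UNIX, SOCK_SEQPACKET | SOCK_CLOEXEC, 0);
	if (fd < 0)
		return -1;

	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}

/*
 * Send one request. If the first send fails (e.g. the proxy restarted
 * while no syscall happened), reconnect once and retry.
 * Returns 0 on success, -1 on failure.
 */
int proxy_send_request(const char *path, int *proxy_fd,
		       const void *req, size_t len)
{
	if (*proxy_fd >= 0 &&
	    send(*proxy_fd, req, len, MSG_NOSIGNAL) == (ssize_t)len)
		return 0;

	/* Drop the stale connection (if any) and reconnect exactly once. */
	if (*proxy_fd >= 0)
		close(*proxy_fd);
	*proxy_fd = proxy_connect(path);
	if (*proxy_fd < 0)
		return -1;

	if (send(*proxy_fd, req, len, MSG_NOSIGNAL) == (ssize_t)len)
		return 0;

	close(*proxy_fd);
	*proxy_fd = -1;
	return -1;
}
```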
When we fail to send a message, we send a default seccomp
response and try to reconnect to the proxy. It doesn't
really make sense to retry sending the request over the
new connection, as the syscall has already been answered. The
same goes for receiving the response: after reconnecting to
the proxy, we're a new client to a potentially new proxy
process, so awaiting a response without having sent a
request makes little sense either.
In the future we should probably have a timeout or retry
count for the entire proxy _transaction_ before sending a
response to seccomp at all (and probably handle requests
asynchronously).
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
The seccomp notify API has a few variable parts: the struct sizes
are queried at runtime, and we now also have a user-configured
cookie.
This means that with a SOCK_STREAM connection the proxy
needs to carefully read() the right amount of data based on
the contents of our proxy message struct to avoid ending up
in the middle of a packet.
For now this may not be too tragic, since we currently only
ever send a single packet and then wait for the response.
However, at some point we may want to be able to handle
multiple processes simultaneously, so it makes sense to
switch to a packet-based connection.
So switch to SOCK_SEQPACKET, which is packet-based (and
also guarantees ordering). The `MSG_PEEK` flag can be
used with `recvmsg()` to figure out a packet's size on the
other end, and the size *should* usually not change after
that for an existing connection from a running container.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
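Reading a packet whose size is only known at runtime can be sketched like this. It assumes Linux: `MSG_PEEK` leaves the packet queued, and adding `MSG_TRUNC` makes `recv()` report its real length even with a zero-sized buffer. `recv_whole_packet()` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Return a malloc'ed copy of the next packet and store its size in
 * *len, or NULL on error. A zero-length peek (peer closed the
 * connection, or an empty packet) is treated as "no packet".
 */
void *recv_whole_packet(int sock, size_t *len)
{
	ssize_t size, ret;
	void *buf;

	/* Peek without consuming the packet; MSG_TRUNC reports its full length. */
	size = recv(sock, NULL, 0, MSG_PEEK | MSG_TRUNC);
	if (size <= 0)
		return NULL;

	buf = malloc((size_t)size);
	if (!buf)
		return NULL;

	/* With SOCK_SEQPACKET the packet is read whole; its size should not change. */
	ret = recv(sock, buf, (size_t)size, 0);
	if (ret != size) {
		free(buf);
		return NULL;
	}

	*len = (size_t)size;
	return buf;
}
```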