- the reboot2() API extension doesn't exist so the command socket fd needs to
be closed unconditionally
- fix bad cherry-pick that tried to take the lock on the state client list
twice
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
This is based on raw_clone in systemd but adapted to our needs. The main reason
is that we need an implementation of fork()/clone() that guarantees that no
pthread_atfork() handlers are run. Since clone() in glibc currently doesn't
run pthread_atfork() handlers we should be fine for now, but there's no
guarantee that it won't start running them in the future. So let's do the
syscall directly - or as directly as we can. An additional nice feature is
that we get fork() behavior,
i.e. lxc_raw_clone() returns 0 in the child and the child pid in the parent.
Our implementation tries to make sure that we cover all cases according to
kernel sources. Note that we are not interested in any arguments that could be
passed after the stack.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
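As a rough illustration of the idea described above (not the actual
lxc_raw_clone() code), a direct clone syscall with a NULL stack could look like
the sketch below. The names raw_clone_sketch() and exit_code_via_raw_clone()
are hypothetical, and the sketch assumes an architecture such as x86_64 or
aarch64 where the flags are the first syscall argument; to my understanding
other architectures order the arguments differently and need their own
handling.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the real lxc_raw_clone(): invoke the clone
 * syscall directly so no libc wrapper (and hence no pthread_atfork()
 * handler) is involved. With a NULL stack this behaves like fork():
 * 0 is returned in the child and the child pid in the parent.
 *
 * Assumption: flags is the first syscall argument (x86_64, aarch64).
 * Everything after the stack is passed as NULL because we don't use it.
 */
static pid_t raw_clone_sketch(unsigned long flags)
{
    return (pid_t)syscall(SYS_clone, flags | SIGCHLD, NULL, NULL, NULL, NULL);
}

/* Usage helper: run a child that exits with `code` and return what the
 * parent reaps, or -1 on any error. */
static int exit_code_via_raw_clone(int code)
{
    int status;
    pid_t pid = raw_clone_sketch(0);

    if (pid < 0)
        return -1;

    if (pid == 0)
        _exit(code); /* child: leave immediately, skip atexit handlers */

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;

    return WEXITSTATUS(status);
}
```

SIGCHLD is OR-ed into the flags so the parent can reap the child with a plain
waitpid(), just as it would after fork().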
When we report STOPPED to a caller and then close the command socket it is
technically possible - and I've seen this happen on the test builders - that a
container start() right after a wait() will receive ECONNREFUSED because it
called open() before we close(). So for all new state clients simply close the
command socket. This will inform all state clients that the container is
STOPPED and also prevents a race between an open()/close() on the command socket
causing a new process to get ECONNREFUSED because we haven't yet closed the
command socket.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Serge Hallyn [Thu, 14 Dec 2017 19:16:02 +0000 (13:16 -0600)]
dir_detect: warn on eperm
If the user has lxc.rootfs.path = /some/path/foo, but can't access
some piece of that path, then we'll get an unhelpful "failed to
mount" without any indication of the problem.
Prior to this patch we raced with a very short-lived init process. Essentially,
the init process could exit before we had time to record the cgroup namespace
causing the container to abort and report ABORTING to the caller when it
actually started just fine. Let's not do this.
(This uses syscall(SYS_getpid) in the child to retrieve the pid just in case
we're on an older glibc version and we end up in the namespace sharing branch
of the actual lxc_clone() call.)
Additionally this fixes the shortlived tests. They were faulty so far and
should have actually failed because of the cgroup namespace recording race but
the ret variable used to return from the function was not correctly
initialized. This fixes it.
Furthermore, the shortlived tests used the c->error_num variable to determine
success or failure but this is actually not correct when the container is
started daemonized.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
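The syscall(SYS_getpid) remark above can be sketched as follows. The helper
name raw_getpid() is hypothetical; the point is only that the pid is requested
from the kernel directly instead of going through the libc getpid() wrapper,
which is presumably what matters on the older glibc versions the message
mentions.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel for our pid directly rather than
 * relying on whatever the libc getpid() wrapper returns. */
static pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

In a process that was not created behind libc's back both calls agree, so
raw_getpid() == getpid() holds there.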
In the case the container has a console with a valid slave pty file descriptor
we duplicate std{in,out,err} to the slave file descriptor so console logging
works correctly. When the container does not have a valid slave pty file
descriptor for its console and is started daemonized we should dup to
/dev/null.
Closes #1646.
Signed-off-by: Li Feng <lifeng68@huawei.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
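A minimal sketch of the /dev/null fallback described above, assuming
hypothetical helpers dup_to_devnull() and refers_to_devnull() that are not
part of LXC. A daemonized caller would pass STDIN_FILENO, STDOUT_FILENO and
STDERR_FILENO as the descriptors to redirect.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: make every descriptor in `fds` refer to /dev/null.
 * Returns 0 on success and -1 on error. */
static int dup_to_devnull(const int *fds, size_t n)
{
    int ret = 0, keep = 0;
    int nullfd = open("/dev/null", O_RDWR);

    if (nullfd < 0)
        return -1;

    for (size_t i = 0; i < n; i++) {
        if (fds[i] == nullfd) {
            keep = 1; /* already /dev/null, must not be closed below */
            continue;
        }
        if (dup2(nullfd, fds[i]) < 0) {
            ret = -1;
            break;
        }
    }

    if (!keep)
        close(nullfd);

    return ret;
}

/* Usage helper: check whether `fd` refers to the same device as /dev/null. */
static int refers_to_devnull(int fd)
{
    struct stat fd_st, null_st;

    if (fstat(fd, &fd_st) < 0 || stat("/dev/null", &null_st) < 0)
        return 0;

    return S_ISCHR(fd_st.st_mode) && fd_st.st_rdev == null_st.st_rdev;
}
```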
we made std{err,in,out} a duplicate of the slave file descriptor of the console
if it existed. This meant we also duplicated all of them when we executed
application containers in the foreground even if some std{err,in,out} file
descriptor did not refer to a {p,t}ty. This blocked use cases such as:
echo foo | lxc-execute -n -- cat
which are very valid and common with application containers but less common
with system containers where we don't have to care about this. So my suggestion
is to unconditionally duplicate std{err,in,out} to the console file descriptor
if we are either running daemonized - this ensures that daemonized application
containers with a single bash shell keep on working - or when we are not
running an application container. In other cases we only duplicate those file
descriptors that actually refer to a {p,t}ty. This logic is similar to what we
do for lxc-attach already.
Refers to #1690.
Closes #2028.
Reported-by: Felix Abecassis <fabecassis@nvidia.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
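The decision described above can be written down as a small predicate. This is
only a sketch of the stated rule; should_dup_to_console() and its parameters
are hypothetical names, not LXC functions.

```c
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical predicate: should `fd` (one of stdin, stdout, stderr) be
 * made a duplicate of the console file descriptor?
 *
 * - daemonized or not an application container: always
 * - foreground application container: only if `fd` refers to a terminal
 */
static bool should_dup_to_console(bool daemonized, bool app_container, int fd)
{
    if (daemonized || !app_container)
        return true;

    return isatty(fd) == 1;
}
```

With this rule a pipe on stdin of a foreground application container is left
alone, which is what the "echo foo | lxc-execute" use case needs.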
Detaching network namespaces as an unprivileged user is currently not possible
and attaching to the user namespace will mean we are not allowed to move the
network device into an ancestor network namespace.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
tools: block using lxc-execute without config file
Moving away from internal symbols we can't do hacks like we currently do in
lxc-start and call internal functions like lxc_conf_init(). This is unsafe
anyway. Instead, we should simply error out if the user didn't give us a
configuration file to use. lxc-start refuses to start in that case already.
Relates to discussion in https://github.com/lxc/go-lxc/pull/96#discussion_r155075560 .
Closes #2023.
Reported-by: Felix Abecassis <fabecassis@nvidia.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Callers can then make a decision whether they want to consider the peer closing
the connection an error or not. For example, a c->wait(c, "STOPPED", -1) call
can then consider an ECONNRESET not an error but rather see it - correctly - as
a container exiting before being able to register a state client.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
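How a caller might act on this is sketched below. wait_failed() is a
hypothetical helper that takes the errno saved after a failed receive on the
state client connection together with the state being waited for; it is an
illustration of the idea, not the logic LXC actually uses.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: decide whether a wait for `wanted_state` failed.
 * A connection reset while waiting for "STOPPED" is read as the container
 * having exited before the state client could be registered, so it is not
 * treated as a failure. Any other non-zero errno is.
 */
static bool wait_failed(int saved_errno, const char *wanted_state)
{
    if (saved_errno == ECONNRESET && strcmp(wanted_state, "STOPPED") == 0)
        return false;

    return saved_errno != 0;
}
```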
Take the lock on the list after we've done all necessary work and check state.
If we are in requested state, do cleanup and return without adding the state
client to the state client list.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
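The ordering described above can be sketched like this, with hypothetical
types and names (struct handler_sketch, add_state_client()) standing in for
the real handler and state client list: the allocation happens before the lock
is taken, and the lock only covers the state check and the list update.

```c
#include <pthread.h>
#include <stdlib.h>

struct state_client {
    int wanted;
    struct state_client *next;
};

/* Hypothetical stand-in for the handler owning the state client list. */
struct handler_sketch {
    int state;
    pthread_mutex_t lock;
    struct state_client *clients;
    size_t nclients;
};

/* Returns 1 if the handler already is in the requested state (nothing is
 * added), 0 if the client was registered, -1 on allocation failure. */
static int add_state_client(struct handler_sketch *h, int wanted)
{
    /* Costly work first, outside of the lock. */
    struct state_client *c = malloc(sizeof(*c));
    if (!c)
        return -1;
    c->wanted = wanted;

    pthread_mutex_lock(&h->lock);
    if (h->state == wanted) {
        /* Already there: clean up and don't touch the list. */
        pthread_mutex_unlock(&h->lock);
        free(c);
        return 1;
    }
    c->next = h->clients;
    h->clients = c;
    h->nclients++;
    pthread_mutex_unlock(&h->lock);

    return 0;
}
```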
- setting the handler->state value is atomic on any POSIX implementation since
we're dealing with an integer (enum/lxc_state_t)
- while the state clients are served it is not possible for lxc_set_state() to
transition to the next state anyway so there's no danger in moving to the
next state with clients missing it
- we only care about the list being modified
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
There are multiple reasons why this is not required:
- every command is transactional
- we only care about the list being modified not the memory allocation and
other costly operations
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
We're dealing with an integer (lxc_state_t which is an enum). Any POSIX
implementation makes those operations atomic so there's no need to lock
this.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Abbas Ally [Sun, 3 Dec 2017 05:51:44 +0000 (05:51 +0000)]
Add bash completion to list backing store types for lxc-create -B
- Backing Store types are hard-coded (Not sure how to get programmatically)
- Closes #1236
CC-Hsu [Sat, 2 Dec 2017 11:27:34 +0000 (19:27 +0800)]
Add new dependency to Slackware template
I followed the changelog of Slackware-current (http://www.slackware.com/changelog/)
and found that Slackware-current split the hostname utility out of the
util-linux package on Nov 17 2017. So I added the new package to the template.
Change conf.c to export function write_id_mapping, which will now be
called inside main function of lxc_unshare.c.
This is required because the setuid syscall only permits a process in a new
userns to set a new uid if the uid passed as parameter is mapped inside the
ns via the uid_map file[1]. So, just after the clone invocation, map the uid
passed as parameter into the newly created user namespace, and put the current
uid as the ID-outside-ns. After the mapping is done, the setuid call succeeds.
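For illustration, a uid_map entry has the form "ID-inside-ns ID-outside-ns
length". The sketch below only formats a single-uid entry of that form;
format_id_map() is a hypothetical helper and not the write_id_mapping()
function exported by this change. Writing the resulting line to the uid_map
file of the cloned process is left out.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: format a uid_map line that maps exactly one uid.
 * `id_inside` is the uid requested for the new user namespace and
 * `id_outside` is the current uid of the caller.
 * Returns the number of characters written or -1 if `buf` is too small.
 */
static int format_id_map(char *buf, size_t size, uid_t id_inside, uid_t id_outside)
{
    int n = snprintf(buf, size, "%u %u 1\n",
                     (unsigned int)id_inside, (unsigned int)id_outside);

    if (n < 0 || (size_t)n >= size)
        return -1;

    return n;
}
```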
Felix Abecassis [Wed, 29 Nov 2017 04:27:53 +0000 (20:27 -0800)]
confile_utils: simplify lxc_config_net_hwaddr
In addition to the memory corruption fixed in ee3e84df78424d26fc6c90862fbe0fa92a686b0d,
this function was also performing invalid memory accesses for the following inputs:
- `lxc.net`
- `lxc.net.`
- `lxc.net.0.`
- `lxc.network`
- `lxc.network.0.`
Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>