git.ipfire.org Git - thirdparty/lxc.git/log
thirdparty/lxc.git
7 years agolsm: non-functional changes
Christian Brauner [Mon, 22 Jan 2018 09:48:56 +0000 (10:48 +0100)] 
lsm: non-functional changes

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoFix comments and add check in lxc_poll.
LiFeng [Mon, 22 Jan 2018 11:13:52 +0000 (06:13 -0500)] 
Fix comments and add check in lxc_poll.

Add check whether handler->conf->console.path is 'none'

Signed-off-by: LiFeng <lifeng68@huawei.com>
7 years agoModify .gitignore
LiFeng [Mon, 22 Jan 2018 12:48:21 +0000 (07:48 -0500)] 
Modify .gitignore

Add:
src/tests/lxc-test-api-reboot
src/tests/lxc-test-criu-check-feature
src/tests/lxc-test-raw-clone
src/tests/lxc-test-share-ns
src/tests/lxc-test-state-server

Signed-off-by: LiFeng <lifeng68@huawei.com>
7 years agostart: fix mainloop cleanup goto statements
Christian Brauner [Sun, 21 Jan 2018 12:55:42 +0000 (13:55 +0100)] 
start: fix mainloop cleanup goto statements

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agocoverity: #1426694
Christian Brauner [Sat, 20 Jan 2018 20:46:31 +0000 (21:46 +0100)] 
coverity: #1426694

remove logically dead code

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agocoverity: #1426734
Christian Brauner [Sat, 20 Jan 2018 20:44:50 +0000 (21:44 +0100)] 
coverity: #1426734

do not call close on bad fd

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agocoverity: #1427190
Christian Brauner [Mon, 8 Jan 2018 17:25:56 +0000 (18:25 +0100)] 
coverity: #1427190

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agocoverity: #1427191
Christian Brauner [Mon, 8 Jan 2018 17:24:41 +0000 (18:24 +0100)] 
coverity: #1427191

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agocoverity: #1427638
Christian Brauner [Sat, 20 Jan 2018 20:35:35 +0000 (21:35 +0100)] 
coverity: #1427638

avoid (however unlikely) double free

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agocoverity: #1427639
Christian Brauner [Sat, 20 Jan 2018 20:30:17 +0000 (21:30 +0100)] 
coverity: #1427639

remove logically dead code

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agocoverity: #1427668
Christian Brauner [Sat, 20 Jan 2018 20:26:33 +0000 (21:26 +0100)] 
coverity: #1427668

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agounlink lxc-init
Tycho Andersen [Wed, 20 Dec 2017 17:52:38 +0000 (17:52 +0000)] 
unlink lxc-init

It's sort of an implementation detail that this exists at all, and we
should probably not pollute the container's mount tables or FS with this.

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
7 years agoCreate console when the rootfs is NULL
LiFeng [Thu, 18 Jan 2018 22:26:11 +0000 (17:26 -0500)] 
Create console when the rootfs is NULL

Signed-off-by: LiFeng <lifeng68@huawei.com>
7 years agotools: fix android
Christian Brauner [Sat, 20 Jan 2018 13:01:19 +0000 (14:01 +0100)] 
tools: fix android

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agocgfsng: reduce delta
Christian Brauner [Sat, 20 Jan 2018 11:52:08 +0000 (12:52 +0100)] 
cgfsng: reduce delta

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoRevert commit "bla" with bad commit message
Christian Brauner [Sat, 20 Jan 2018 11:51:04 +0000 (12:51 +0100)] 
Revert commit "bla" with bad commit message

This reverts commit 04c141a4ed93e5daf20b90b3da80314cb501b1a9.

7 years agobla
Christian Brauner [Sat, 20 Jan 2018 11:48:06 +0000 (12:48 +0100)] 
bla

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agocgroups: reduce delta
Christian Brauner [Sat, 20 Jan 2018 11:40:45 +0000 (12:40 +0100)] 
cgroups: reduce delta

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoattach: reduce delta
Christian Brauner [Sat, 20 Jan 2018 11:12:05 +0000 (12:12 +0100)] 
attach: reduce delta

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoattach_options: reduce delta
Christian Brauner [Sat, 20 Jan 2018 11:06:14 +0000 (12:06 +0100)] 
attach_options: reduce delta

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agotest: fix console tests
Christian Brauner [Fri, 19 Jan 2018 15:53:24 +0000 (16:53 +0100)] 
test: fix console tests

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoconsole: cleanup
Christian Brauner [Fri, 19 Jan 2018 14:36:14 +0000 (15:36 +0100)] 
console: cleanup

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoset exit status to 1 in the unknown si_code case
Tycho Andersen [Fri, 19 Jan 2018 04:23:48 +0000 (04:23 +0000)] 
set exit status to 1 in the unknown si_code case

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
7 years agolxc-execute: actually exit with the status of the spawned task
Tycho Andersen [Fri, 19 Jan 2018 03:31:33 +0000 (03:31 +0000)] 
lxc-execute: actually exit with the status of the spawned task

Now that we have things propagated through init and liblxc correctly, at
least in non-daemon mode, we can exit with the actual exit status of the
task, instead of always succeeding, which is not so helpful.

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
7 years agostart: don't return false when the container's init exits nonzero
Tycho Andersen [Fri, 19 Jan 2018 03:29:05 +0000 (03:29 +0000)] 
start: don't return false when the container's init exits nonzero

This seems slightly counter-intuitive, but IMO it's what we want.
Basically, ->start() should succeed if the container is spawned correctly
(similar to how golang's exec.Cmd.Start() returns nil if the thing spawns
correctly), and users can check error_num (i.e. golang's exec.Cmd.Wait())
to see how it exited.

This preserves previous behavior, which basically was that start was always
successful if the thing actually launched. Since we never kept track of
exit codes, this would always succeed too. Now that we do, it doesn't, and
this change is required.

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
7 years agoremember the exit code from the init process
Tycho Andersen [Fri, 19 Jan 2018 03:24:59 +0000 (03:24 +0000)] 
remember the exit code from the init process

error_num seems to be trying to remember the exit code of the init process,
except that nothing actually keeps track of it anywhere. So, let's add a
field to the handler, so that we can keep track of the process' exit
status, and then propagate it to error_num in struct lxc_container so that
people can use it.

Note that this is a slight behavior change, essentially instead of making
error_num always == the return code from start, now it contains slightly
more useful information (the actual exit status). But, there is only one
internal user of error_num which I'll fix later in the series, so IMO
this is ok.

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
7 years agolxc.init: correctly exit with the app's error code
Tycho Andersen [Fri, 19 Jan 2018 03:21:10 +0000 (03:21 +0000)] 
lxc.init: correctly exit with the app's error code

Based on the comments in the code (and the have_status flag), the intent
here (and IMO, the desired behavior) should be for init.lxc to propagate
the actual exit code from the real application process up through.
Otherwise, it is swallowed and nobody can access it.

The bug being fixed here is that ret held the correct exit code, but when
it went around the loop again (to wait for other children) ret is
clobbered. Let's save the desired exit status somewhere else, so it can't
get clobbered, and we propagate things correctly.

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
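
The clobbering bug described above can be illustrated with a minimal sketch. This is not the lxc.init code: `spawn_exit()` and `reap_children()` are hypothetical helpers, and the default exit value of 1 is an assumption. The point is only that the interesting status is copied into its own variable, so later iterations of the wait loop cannot overwrite it.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper for this example: fork a child that exits with `code`. */
static pid_t spawn_exit(int code)
{
	pid_t pid = fork();

	if (pid == 0)
		_exit(code);
	return pid;
}

/*
 * Reap every child, but remember only the exit status of `main_pid`.
 * `ret` and `status` are reused on each iteration, so the status we care
 * about is copied to `exit_with` as soon as it is seen.
 */
static int reap_children(pid_t main_pid)
{
	int exit_with = 1; /* assumed default if the main child is never seen */

	for (;;) {
		int status;
		pid_t ret = wait(&status);

		if (ret < 0) {
			if (errno == EINTR)
				continue;
			break; /* ECHILD: no children left */
		}

		if (ret == main_pid && WIFEXITED(status))
			exit_with = WEXITSTATUS(status);
	}

	return exit_with;
}
```

Calling `reap_children(app_pid)` after spawning the application and any helper processes returns the application's exit status regardless of the order in which the children are reaped.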
7 years agofix lxc_error_set_and_log to match the docs
Tycho Andersen [Fri, 19 Jan 2018 03:20:08 +0000 (03:20 +0000)] 
fix lxc_error_set_and_log to match the docs

The documentation for this function says if the task was killed by a
signal, the return code will be 128+n, where n is the signal number. Let's
make that actually true.

(We'll use this behavior in later patches.)

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
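
As a rough illustration of the 128+n convention mentioned above, the following sketch converts a wait(2) status into an exit code. The function name `status_to_exit_code()` is made up for this example and the -1 fallback is an assumption; it is not the body of lxc_error_set_and_log.

```c
#include <sys/wait.h>

/*
 * Translate a wait(2) status into a shell-style exit code:
 * the exit status if the task exited normally, 128 + signal number if it
 * was killed by a signal, and -1 for anything else (assumed fallback).
 */
static int status_to_exit_code(int status)
{
	if (WIFEXITED(status))
		return WEXITSTATUS(status);
	if (WIFSIGNALED(status))
		return 128 + WTERMSIG(status);
	return -1;
}
```

With this mapping a task killed by SIGKILL (signal 9) is reported as 137, which matches what a shell would show.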
7 years agostart: don't log stop/continue for non-init processes
Tycho Andersen [Fri, 19 Jan 2018 00:50:39 +0000 (00:50 +0000)] 
start: don't log stop/continue for non-init processes

This non-init forwarding check should really be before all the log messages
about "init continued" or "init stopped", since they will otherwise lie
about some process that wasn't init being stopped or continued.

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
7 years agocommands: add LXC_CMD_SERVE_STATE_CLIENTS
Christian Brauner [Wed, 17 Jan 2018 19:46:04 +0000 (20:46 +0100)] 
commands: add LXC_CMD_SERVE_STATE_CLIENTS

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agofreezer: non-functional changes
Christian Brauner [Wed, 17 Jan 2018 19:09:13 +0000 (20:09 +0100)] 
freezer: non-functional changes

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agolxccontainer: restore blocking wait()
Christian Brauner [Wed, 17 Jan 2018 14:22:36 +0000 (15:22 +0100)] 
lxccontainer: restore blocking wait()

Closes #2027.
Closes lxc/go-lxc#98.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoutils: check suffix length
Christian Brauner [Wed, 17 Jan 2018 11:21:09 +0000 (12:21 +0100)] 
utils: check suffix length

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agotest: log error on failure
Christian Brauner [Wed, 17 Jan 2018 10:50:54 +0000 (11:50 +0100)] 
test: log error on failure

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoutils: do not rely on unitialized variable
Christian Brauner [Wed, 17 Jan 2018 10:19:05 +0000 (11:19 +0100)] 
utils: do not rely on unitialized variable

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agolxc-alpine: allow retaining sys_ptrace per container
Kaarle Ritvanen [Tue, 16 Jan 2018 13:53:04 +0000 (15:53 +0200)] 
lxc-alpine: allow retaining sys_ptrace per container

Signed-off-by: Kaarle Ritvanen <kaarle.ritvanen@datakunkku.fi>
7 years agoconsole: set SFD_CLOEXEC on signal fd
Christian Brauner [Sun, 31 Dec 2017 00:58:16 +0000 (01:58 +0100)] 
console: set SFD_CLOEXEC on signal fd

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agostart: non-functional changes
Christian Brauner [Sun, 31 Dec 2017 00:48:01 +0000 (01:48 +0100)] 
start: non-functional changes

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agotools: honor --console and --console-log
Christian Brauner [Mon, 25 Dec 2017 00:52:33 +0000 (01:52 +0100)] 
tools: honor --console and --console-log

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoattach: minor tweaks
Christian Brauner [Sun, 24 Dec 2017 18:24:35 +0000 (19:24 +0100)] 
attach: minor tweaks

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoconsole: add lxc_pty_map_ids()
Christian Brauner [Sun, 24 Dec 2017 18:13:54 +0000 (19:13 +0100)] 
console: add lxc_pty_map_ids()

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoconsole: adapt lxc_console_mainloop_add()
Christian Brauner [Sat, 23 Dec 2017 12:25:44 +0000 (13:25 +0100)] 
console: adapt lxc_console_mainloop_add()

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoattach: cleanup attach_child_main()
Christian Brauner [Sat, 23 Dec 2017 11:39:52 +0000 (12:39 +0100)] 
attach: cleanup attach_child_main()

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoconsole: add some pty helpers
Christian Brauner [Sat, 23 Dec 2017 11:19:51 +0000 (12:19 +0100)] 
console: add some pty helpers

- int lxc_make_controlling_pty()
- int lxc_login_pty()
- void lxc_pty_conf_free()
- void lxc_pty_info_init()
- void lxc_pty_init()

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agostart: non-functional changes
Christian Brauner [Sat, 23 Dec 2017 11:03:32 +0000 (12:03 +0100)] 
start: non-functional changes

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoconsole: move pty creation to separate function
Christian Brauner [Sat, 23 Dec 2017 10:59:36 +0000 (11:59 +0100)] 
console: move pty creation to separate function

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoconfile: improve log messages
Christian Brauner [Tue, 9 Jan 2018 10:20:44 +0000 (11:20 +0100)] 
confile: improve log messages

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoadd some idmap parsing error messages
Tycho Andersen [Tue, 9 Jan 2018 00:07:50 +0000 (00:07 +0000)] 
add some idmap parsing error messages

otherwise, we just get a return value of false from setting config failure,
with no indication as to what actually failed in the log.

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
7 years agofix up lxc-usernsexec's exit status
Tycho Andersen [Mon, 8 Jan 2018 16:20:24 +0000 (16:20 +0000)] 
fix up lxc-usernsexec's exit status

* exit(1) when there is an option parsing error
* exit(0) when the user explicitly asks for help
* exit(1) when the user specifies an invalid option

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
7 years agoAdd return check for 'lxc_cmd_get_name'
LiFeng [Mon, 8 Jan 2018 19:26:00 +0000 (14:26 -0500)] 
Add return check for 'lxc_cmd_get_name'

If 'lxc_cmd_get_name' fails and returns NULL, this causes a segmentation fault.

Signed-off-by: LiFeng <lifeng68@huawei.com>
7 years agoInclude -devel suffix in version string
Stéphane Graber [Fri, 5 Jan 2018 20:20:55 +0000 (15:20 -0500)] 
Include -devel suffix in version string

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
7 years agoFix broken indentation
Stéphane Graber [Fri, 5 Jan 2018 20:19:30 +0000 (15:19 -0500)] 
Fix broken indentation

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
7 years agocgfsng: only establish mapping once
Christian Brauner [Thu, 4 Jan 2018 14:28:12 +0000 (15:28 +0100)] 
cgfsng: only establish mapping once

When we deleted cgroups for unprivileged containers we used to allocate a new
mapping and clone a new user namespace each time we delete a cgroup. This of
course meant - on a cgroup v1 system - doing this >= 10 times when all
controllers were used. Let's not do this and only allocate and establish a
mapping once.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoconf: rework userns_exec_1()
Christian Brauner [Thu, 4 Jan 2018 14:01:06 +0000 (15:01 +0100)] 
conf: rework userns_exec_1()

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoconf: non-functional changes
Christian Brauner [Thu, 4 Jan 2018 13:59:42 +0000 (14:59 +0100)] 
conf: non-functional changes

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoconf: write "deny" to /proc/[pid]/setgroups
Christian Brauner [Wed, 3 Jan 2018 15:28:40 +0000 (16:28 +0100)] 
conf: write "deny" to /proc/[pid]/setgroups

When fully unprivileged users run a container that only maps their own {g,u}id
and they do not have access to setuid new{g,u}idmap binaries we will write the
idmapping directly. This however requires us to write "deny" to
/proc/[pid]/setgroups otherwise any write to /proc/[pid]/gid_map will be
denied.

On a sidenote, this patch enables fully unprivileged containers. If you now set
lxc.net.[i].type = empty no privilege whatsoever is required to run a container.

Enhances #2033.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Felix Abecassis <fabecassis@nvidia.com>
Cc: Jonathan Calmels <jcalmels@nvidia.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoconfigure.ac: fix the check for static libcap
Serge Hallyn [Thu, 4 Jan 2018 03:02:53 +0000 (21:02 -0600)] 
configure.ac: fix the check for static libcap

The existing check doesn't work, because when you statically
link a program against libc, any functions not called are not
included.  So cap_init() which we check for is not there in
the built binary.

So instead just check whether a "gcc -lcap -static" works.
If libcap.a is not available it will fail, if it is it will
succeed.

Signed-off-by: Serge Hallyn <shallyn@cisco.com>
7 years agogentoo: Add support for .xz tarballs
Stéphane Graber [Wed, 3 Jan 2018 23:06:33 +0000 (18:06 -0500)] 
gentoo: Add support for .xz tarballs

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
7 years agoconf: detect if devpts can be mounted with gid=5
Christian Brauner [Tue, 2 Jan 2018 23:11:38 +0000 (00:11 +0100)] 
conf: detect if devpts can be mounted with gid=5

Closes #2033.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agocgfsng: use init {g,u}id
Christian Brauner [Tue, 2 Jan 2018 22:41:10 +0000 (23:41 +0100)] 
cgfsng: use init {g,u}id

If no id mapping for the container's root id is defined try to use the id
mappings specified via lxc.init.{g,u}id.

Closes #2033.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoconf{ile}: detect ns{g,u}id mapping for root
Christian Brauner [Tue, 2 Jan 2018 22:27:55 +0000 (23:27 +0100)] 
conf{ile}: detect ns{g,u}id mapping for root

Closes #2033.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoconf: adapt userns_exec_1()
Christian Brauner [Tue, 2 Jan 2018 21:31:16 +0000 (22:31 +0100)] 
conf: adapt userns_exec_1()

Closes #2033.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoconf: adapt idmap helpers
Christian Brauner [Tue, 2 Jan 2018 21:15:17 +0000 (22:15 +0100)] 
conf: adapt idmap helpers

- mapped_hostid_entry()
- idmap_add()

Closes #2033.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agomainloop: use epoll_create1(EPOLL_CLOEXEC)
Christian Brauner [Tue, 26 Dec 2017 19:57:12 +0000 (20:57 +0100)] 
mainloop: use epoll_create1(EPOLL_CLOEXEC)

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoconsole: do not allow non-pty devices on open()
Christian Brauner [Tue, 26 Dec 2017 17:00:08 +0000 (18:00 +0100)] 
console: do not allow non-pty devices on open()

We don't allow non-pty devices anyway so don't let open() create unneeded
files.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agostart: properly cleanup mainloop
Christian Brauner [Tue, 26 Dec 2017 12:45:12 +0000 (13:45 +0100)] 
start: properly cleanup mainloop

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agolxc_config: Add -h and --help flags handler
Marcos Paulo de Souza [Sat, 30 Dec 2017 18:35:52 +0000 (16:35 -0200)] 
lxc_config: Add -h and --help flags handler

As the other tools already do, show the usage message when -h or --help
is used.

Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>
7 years agomainloop: capture output of short-lived init procs
Christian Brauner [Mon, 25 Dec 2017 13:53:40 +0000 (14:53 +0100)] 
mainloop: capture output of short-lived init procs

The handler for the signal fd will detect when the init process of a container
has exited and cause the mainloop to close. However, this can happen before the
console handlers - or any other events for that matter - are handled. So in the
case of init exiting we still need to allow for all buffered input to the
console to be handled before exiting. This allows us to capture output from
short-lived init processes.

This is conceptually equivalent to my implementation of ExecReaderToChannel()
https://github.com/lxc/lxd/blob/master/shared/util_linux.go#L527

Closes #1694.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
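
The idea of handling all buffered console input before the main loop exits can be sketched with a plain read loop. `drain_fd()` is a hypothetical helper written for this example; the real mainloop is epoll-based and is not reproduced here.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read everything still buffered on `fd` into `buf` until EOF or until
 * the buffer is full. Returns the number of bytes stored, or -1 on a
 * read error. A main loop could call something like this after noticing
 * that init has exited but before tearing down the console handler.
 */
static ssize_t drain_fd(int fd, char *buf, size_t size)
{
	size_t used = 0;

	while (used < size) {
		ssize_t n = read(fd, buf + used, size - used);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			break; /* EOF: the writer is gone and nothing is left */
		used += (size_t)n;
	}

	return (ssize_t)used;
}
```

Output written by a process that has already exited stays in the pipe or pty buffer until it is read, which is why draining after the exit notification still captures it.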
7 years agomainloop: add mainloop macros
Christian Brauner [Mon, 25 Dec 2017 13:52:39 +0000 (14:52 +0100)] 
mainloop: add mainloop macros

This makes it clearer why handlers return what value.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agostart: handle setting death signal smarter
Christian Brauner [Fri, 22 Dec 2017 21:52:42 +0000 (22:52 +0100)] 
start: handle setting death signal smarter

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agostart: fix death signal
Christian Brauner [Fri, 22 Dec 2017 21:17:44 +0000 (22:17 +0100)] 
start: fix death signal

On set{g,u}id() the kernel does:

    /* dumpability changes */
    if (!uid_eq(old->euid, new->euid) ||
        !gid_eq(old->egid, new->egid) ||
        !uid_eq(old->fsuid, new->fsuid) ||
        !gid_eq(old->fsgid, new->fsgid) ||
        !cred_cap_issubset(old, new)) {
            if (task->mm)
                    set_dumpable(task->mm, suid_dumpable);
            task->pdeath_signal = 0;
            smp_wmb();
    }

which means we need to re-enable the death signal after the set{g,u}id().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
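
A minimal sketch of re-arming the parent death signal after an id switch, using prctl(2). The helper names are invented for this example and the order of setgid()/setuid() calls is an assumption; it is not the code from start.c.

```c
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel to send `sig` to us when our parent dies (0 disables it). */
static int set_death_signal(int sig)
{
	return prctl(PR_SET_PDEATHSIG, (unsigned long)sig, 0, 0, 0);
}

/* Return the currently configured death signal, or -1 on error. */
static int get_death_signal(void)
{
	int sig = 0;

	if (prctl(PR_GET_PDEATHSIG, (unsigned long)&sig, 0, 0, 0) < 0)
		return -1;
	return sig;
}

/*
 * Switch ids and then set the death signal again, because a credential
 * change clears pdeath_signal in the kernel.
 */
static int switch_ids_keep_death_signal(uid_t uid, gid_t gid, int sig)
{
	if (setgid(gid) < 0 || setuid(uid) < 0)
		return -1;
	return set_death_signal(sig);
}
```

`get_death_signal()` is handy for checking whether a credential change really did reset the setting in a given situation.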
7 years agostart: simplify cgroup namespace preservation
Christian Brauner [Fri, 22 Dec 2017 16:18:50 +0000 (17:18 +0100)] 
start: simplify cgroup namespace preservation

Since we are now dumpable we can open /proc/<child-pid>/ns/cgroup so let's
avoid the overhead of sending around fds.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agostart: make us dumpable
Christian Brauner [Fri, 22 Dec 2017 16:11:45 +0000 (17:11 +0100)] 
start: make us dumpable

When we set{u,g}id() the kernel will make us undumpable. This is unnecessary
since we can guarantee that whatever is running inside the child process at
this point is fully trusted by the parent. Making us dumpable lets users
use debuggers on the child process before the exec as well and also allows us
to open /proc/<child-pid> files in lieu of the child.
Note, that we only need to perform the prctl(PR_SET_DUMPABLE, ...) if our
effective uid on the host is not 0. If our effective uid on the host is 0 then
we will keep all capabilities in the child user namespace across set{g,u}id().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
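
The prctl(PR_SET_DUMPABLE, ...) step mentioned above might be sketched like this. `make_dumpable()` and `is_dumpable()` are hypothetical names, and passing the host effective uid in as a parameter is a simplification for the example.

```c
#include <sys/prctl.h>
#include <sys/types.h>

/*
 * Restore the dumpable flag after a set{u,g}id() call.
 * Following the commit message, this is only needed when the effective
 * uid on the host is not 0; `host_euid` is supplied by the caller.
 */
static int make_dumpable(uid_t host_euid)
{
	if (host_euid == 0)
		return 0;
	return prctl(PR_SET_DUMPABLE, 1UL, 0, 0, 0);
}

/* Return the current dumpable flag as reported by the kernel, -1 on error. */
static int is_dumpable(void)
{
	return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}
```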
7 years agostart: log closing cmd socket and STOPPED state
Christian Brauner [Sat, 16 Dec 2017 13:39:12 +0000 (14:39 +0100)] 
start: log closing cmd socket and STOPPED state

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agostart: use lxc_raw_clone_cb() where possible
Christian Brauner [Fri, 15 Dec 2017 16:42:31 +0000 (17:42 +0100)] 
start: use lxc_raw_clone_cb() where possible

This way we can rely on the kernel's copy-on-write support similar to fork().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agonamespace: add lxc_raw_clone_cb()
Christian Brauner [Fri, 15 Dec 2017 16:35:43 +0000 (17:35 +0100)] 
namespace: add lxc_raw_clone_cb()

This is a copy-on-write (no stack passed) variant of lxc_clone().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
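
A rough sketch of what a copy-on-write, callback-style clone wrapper can look like. `raw_clone_cb()` and `exit_with()` are names made up for this example, and the syscall argument order assumes an architecture such as x86_64 or aarch64 where clone takes (flags, stack, ...); it should not be read as the lxc_raw_clone_cb() implementation.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * No stack is passed, so the child runs on a copy-on-write copy of the
 * parent's stack, like fork(). The child calls `fn(arg)` and exits with
 * its return value; the parent gets the child's pid, or -1 on error.
 */
static pid_t raw_clone_cb(int (*fn)(void *), void *arg, unsigned long flags)
{
	pid_t pid;

	pid = (pid_t)syscall(SYS_clone, flags | SIGCHLD, NULL, NULL, NULL, 0UL);
	if (pid == 0)
		_exit(fn(arg)); /* child: run the callback and never return */

	return pid;
}

/* Example callback: exit with the integer passed in `arg`. */
static int exit_with(void *arg)
{
	return *(int *)arg;
}
```

Because the child sees a private copy of the parent's memory, `arg` can point at a local variable of the caller without any extra allocation.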
7 years agonamespace: comment lxc_{raw_}clone()
Christian Brauner [Fri, 15 Dec 2017 16:35:07 +0000 (17:35 +0100)] 
namespace: comment lxc_{raw_}clone()

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agotree-wide: s/getpid()/lxc_raw_getpid()/g
Christian Brauner [Sat, 16 Dec 2017 01:07:43 +0000 (02:07 +0100)] 
tree-wide: s/getpid()/lxc_raw_getpid()/g

This is to avoid bad surprises caused by older glibc's pid cache (up to 2.25)
when using clone().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agonamespace: add lxc_raw_getpid()
Christian Brauner [Sat, 16 Dec 2017 00:23:17 +0000 (01:23 +0100)] 
namespace: add lxc_raw_getpid()

Because of older glibc's pid cache (up to 2.25) whenever clone() is called the
child must retrieve its own pid via lxc_raw_getpid().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
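
Bypassing a libc-level pid cache comes down to issuing the getpid syscall directly. A minimal sketch, with `raw_getpid()` as an illustrative name rather than the actual lxc_raw_getpid() definition:

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for our pid directly instead of going through getpid(). */
static pid_t raw_getpid(void)
{
	return (pid_t)syscall(SYS_getpid);
}
```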
7 years agotests: expand lxc_raw_clone() tests
Christian Brauner [Fri, 15 Dec 2017 16:03:09 +0000 (17:03 +0100)] 
tests: expand lxc_raw_clone() tests

- test CLONE_VFORK
- test CLONE_FILES

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoattach: handle /proc with hidepid={1,2} property
Christian Brauner [Wed, 20 Dec 2017 23:42:37 +0000 (00:42 +0100)] 
attach: handle /proc with hidepid={1,2} property

Receive fd for LSM security module before we set{g,u}id(). The reason is that
on set{g,u}id() the kernel will a) make us undumpable and b) we will change our
effective uid. This means our effective uid will be different from the
effective uid of the process that created us which means that this process no
longer has capabilities in our namespace including CAP_SYS_PTRACE. This means
we will not be able to read any /proc/<pid> files for the process anymore when
/proc is mounted with hidepid={1,2}. So let's get the lsm label fd before the
set{g,u}id().

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoattach: use lxc_raw_clone()
Christian Brauner [Wed, 20 Dec 2017 12:14:33 +0000 (13:14 +0100)] 
attach: use lxc_raw_clone()

This lets us simplify the whole file a lot and makes things way clearer. It
also lets us avoid the infamous pid cache.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoattach: simplify significantly
Christian Brauner [Mon, 18 Dec 2017 01:46:10 +0000 (02:46 +0100)] 
attach: simplify significantly

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agocgfsng: Add new macro to print errors
Marcos Paulo de Souza [Wed, 20 Dec 2017 01:43:47 +0000 (23:43 -0200)] 
cgfsng: Add new macro to print errors

At this point, macros such as DEBUG or ERROR do not take effect because
this code is called from cgroup_ops_init(cgroup.c), which runs with
__attribute__((constructor)), before any log level is set from any tool
like lxc-start, so these messages are lost.

From now on, use the same LXC_DEBUG_CGFSNG environment variable to
control these messages.

Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>
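
An environment-controlled debug macro for code that runs in a constructor could be sketched as below. Only the LXC_DEBUG_CGFSNG variable name comes from the commit message; the macro name `CGFSNG_DEBUG`, the helper `cgfsng_debug_enabled()`, the choice of stderr and the constructor are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* True when the LXC_DEBUG_CGFSNG environment variable is set. */
static bool cgfsng_debug_enabled(void)
{
	return getenv("LXC_DEBUG_CGFSNG") != NULL;
}

/*
 * Print directly to stderr: the regular logging macros cannot be used
 * yet, because no log level has been configured when a constructor runs.
 * The first argument must be a string literal.
 */
#define CGFSNG_DEBUG(...)                                        \
	do {                                                     \
		if (cgfsng_debug_enabled())                      \
			fprintf(stderr, "cgfsng: " __VA_ARGS__); \
	} while (0)

/* Runs before main(), i.e. before any tool had a chance to set up logging. */
__attribute__((constructor)) static void cgfsng_early_init(void)
{
	CGFSNG_DEBUG("constructor running before main()\n");
}
```

Running a program built from this with `LXC_DEBUG_CGFSNG=1` in the environment makes the early message visible; without the variable nothing is printed.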
7 years ago[monitor] wrong statement of break
独孤昊天 [Mon, 18 Dec 2017 06:52:25 +0000 (14:52 +0800)] 
[monitor] wrong statement of break

If lxc_abstract_unix_connect fails and returns -1, this code never goes to retry.

Signed-off-by: liuhao <liuhao27@huawei.com>
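
The kind of bug described above (a misplaced break that prevents the retry path from ever running) can be illustrated with a generic retry loop. Everything here is hypothetical: the `connect_fn` callback type, `connect_with_retry()`, the test double `flaky_connect()`, and in particular the assumption that only ECONNREFUSED should trigger a retry, which the commit message does not state.

```c
#include <errno.h>

/* Hypothetical connect callback: returns an fd on success, -1 on error. */
typedef int (*connect_fn)(void *ctx);

/*
 * Try to connect up to `max_tries` times. The loop stops early only on
 * success or on an error other than ECONNREFUSED; a refused connection
 * falls through to the next attempt instead of breaking out.
 */
static int connect_with_retry(connect_fn do_connect, void *ctx, int max_tries)
{
	int fd = -1;

	for (int i = 0; i < max_tries; i++) {
		fd = do_connect(ctx);
		if (fd >= 0 || errno != ECONNREFUSED)
			break;
		/* a real caller would sleep briefly before retrying */
	}

	return fd;
}

/* Test double: refuse the connection until the counter reaches zero. */
static int flaky_connect(void *ctx)
{
	int *failures_left = ctx;

	if (*failures_left > 0) {
		(*failures_left)--;
		errno = ECONNREFUSED;
		return -1;
	}
	return 3; /* pretend this is a connected fd */
}
```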
7 years agocommands_utils: add missing mutex
Christian Brauner [Mon, 18 Dec 2017 11:45:25 +0000 (12:45 +0100)] 
commands_utils: add missing mutex

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agotests: s/lxc.init.cmd/lxc.init_cmd/g
Christian Brauner [Sun, 17 Dec 2017 16:53:46 +0000 (17:53 +0100)] 
tests: s/lxc.init.cmd/lxc.init_cmd/g

lxc.init.cmd is the new key that stable-2.0 doesn't know about.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agolxc_init: fix cgroup parsing
Christian Brauner [Thu, 14 Dec 2017 22:00:04 +0000 (23:00 +0100)] 
lxc_init: fix cgroup parsing

coverity: #1426132
coverity: #1426133

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoutils: use lxc_raw_clone() in run_command()
Christian Brauner [Thu, 14 Dec 2017 15:42:00 +0000 (16:42 +0100)] 
utils: use lxc_raw_clone() in run_command()

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agonamespace: add lxc_raw_clone()
Christian Brauner [Thu, 14 Dec 2017 14:31:54 +0000 (15:31 +0100)] 
namespace: add lxc_raw_clone()

This is based on raw_clone in systemd but adapted to our needs. The main reason
is that we need an implementation of fork()/clone() that does guarantee us that
no pthread_atfork() handlers are run. While clone() in glibc currently doesn't
run pthread_atfork() handlers we should be fine but there's no guarantee that
this won't be the case in the future. So let's do the syscall directly - or as
direct as we can. An additional nice feature is that we get fork() behavior,
i.e. lxc_raw_clone() returns 0 in the child and the child pid in the parent.

Our implementation tries to make sure that we cover all cases according to
kernel sources. Note that we are not interested in any arguments that could be
passed after the stack.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
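
To make the fork()-like behavior concrete, here is a simplified sketch of a raw clone wrapper. `raw_clone()` is an illustrative name; the argument order assumes an architecture such as x86_64 or aarch64 where the clone syscall takes (flags, stack, ...), and unlike a complete implementation it does not reject flags that would need a stack or extra arguments.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Invoke the clone syscall directly, without going through a libc
 * wrapper that might run pthread_atfork() handlers. Returns 0 in the
 * child, the child's pid in the parent, and -1 on error.
 * Callers must not pass flags such as CLONE_VM or CLONE_SETTLS, since no
 * stack, tid pointers or TLS are supplied.
 */
static pid_t raw_clone(unsigned long flags)
{
	return (pid_t)syscall(SYS_clone, flags | SIGCHLD, NULL, NULL, NULL, 0UL);
}
```

Usage mirrors fork(): the child branch is taken when the return value is 0, and the parent can waitpid() on the returned pid because SIGCHLD is always set as the exit signal.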
7 years agocommands: fix race when open()/close() cmd socket
Christian Brauner [Thu, 14 Dec 2017 19:57:15 +0000 (20:57 +0100)] 
commands: fix race when open()/close() cmd socket

When we report STOPPED to a caller and then close the command socket it is
technically possible - and I've seen this happen on the test builders - that a
container start() right after a wait() will receive ECONNREFUSED because it
called open() before we close(). So for all new state clients simply close the
command socket. This will inform all state clients that the container is
STOPPED and also prevents a race between a open()/close() on the command socket
causing a new process to get ECONNREFUSED because we haven't yet closed the
command socket.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agoSHARE_NS options should be before OPT_USAGE
Tycho Andersen [Thu, 14 Dec 2017 00:57:48 +0000 (00:57 +0000)] 
SHARE_NS options should be before OPT_USAGE

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
7 years agoinit: don't kill(-1) if we aren't in a pid ns
Tycho Andersen [Fri, 8 Dec 2017 23:23:26 +0000 (23:23 +0000)] 
init: don't kill(-1) if we aren't in a pid ns

...otherwise we'll kill everyone on the machine. Instead, let's explicitly
try to kill our children. Let's do a best effort against fork bombs by
disabling forking via the pids cgroup if it exists. This is best effort for
a number of reasons:

* the pids cgroup may not be available
* the container may have bind mounted /dev/null over pids.max, so the write
  doesn't do anything

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
7 years agostart: fix cgroup namespace preservation
Christian Brauner [Tue, 12 Dec 2017 23:22:47 +0000 (00:22 +0100)] 
start: fix cgroup namespace preservation

Prior to this patch we raced with a very short-lived init process. Essentially,
the init process could exit before we had time to record the cgroup namespace
causing the container to abort and report ABORTING to the caller when it
actually started just fine. Let's not do this.

(This uses syscall(SYS_getpid) in the child to retrieve the pid just in case
we're on an older glibc version and we end up in the namespace sharing branch
of the actual lxc_clone() call.)

Additionally this fixes the shortlived tests. They were faulty so far and
should have actually failed because of the cgroup namespace recording race but
the ret variable used to return from the function was not correctly
initialized. This fixes it.
Furthermore, the shortlived tests used the c->error_num variable to determine
success or failure but this is actually not correct when the container is
started daemonized.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agotools: exit success when lxc-execute is daemonized
Christian Brauner [Tue, 12 Dec 2017 20:05:39 +0000 (21:05 +0100)] 
tools: exit success when lxc-execute is daemonized

The error_num value doesn't tell us anything since the container hasn't exited.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agostart: do not unconditionally dup std{in,out,err}
Christian Brauner [Tue, 12 Dec 2017 19:09:06 +0000 (20:09 +0100)] 
start: do not unconditionally dup std{in,out,err}

Starting with commit

    commit c5b93afba1d79c6861a6f45db2943b6f3cfbdab4
    Author: Li Feng <lifeng68@huawei.com>
    Date:   Mon Jul 10 17:19:52 2017 +0800

        start: dup std{in,out,err} to pty slave

        In the case the container has a console with a valid slave pty file descriptor
        we duplicate std{in,out,err} to the slave file descriptor so console logging
        works correctly. When the container does not have a valid slave pty file
        descriptor for its console and is started daemonized we should dup to
        /dev/null.

        Closes #1646.

        Signed-off-by: Li Feng <lifeng68@huawei.com>
        Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

we made std{err,in,out} a duplicate of the slave file descriptor of the console
if it existed. This meant we also duplicated all of them when we executed
application containers in the foreground even if some std{err,in,out} file
descriptor did not refer to a {p,t}ty. This blocked use cases such as:

    echo foo | lxc-execute -n -- cat

which are very valid and common with application containers but less common
with system containers where we don't have to care about this. So my suggestion
is to unconditionally duplicate std{err,in,out} to the console file descriptor
if we are either running daemonized - this ensures that daemonized application
containers with a single bash shell keep on working - or when we are not
running an application container. In other cases we only duplicate those file
descriptors that actually refer to a {p,t}ty. This logic is similar to what we
do for lxc-attach already.

Refers to #1690.
Closes #2028.

Reported-by: Felix Abecassis <fabecassis@nvidia.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
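
The "only duplicate descriptors that refer to a {p,t}ty" rule can be sketched with isatty(3) and dup2(2). `dup_ttys_to_console()` is a hypothetical helper; the daemonized and system-container cases, where everything is duplicated unconditionally, are left out.

```c
#include <unistd.h>

/*
 * Redirect only those descriptors in `fds` that currently refer to a
 * terminal to `console_fd`. Descriptors backed by pipes or files are left
 * untouched, so pipelines such as `echo foo | ... cat` keep working.
 * Returns the number of descriptors redirected, or -1 on error.
 */
static int dup_ttys_to_console(int console_fd, const int *fds, int nfds)
{
	int redirected = 0;

	for (int i = 0; i < nfds; i++) {
		if (!isatty(fds[i]))
			continue;
		if (dup2(console_fd, fds[i]) < 0)
			return -1;
		redirected++;
	}

	return redirected;
}
```

In the foreground application-container case a caller would pass the three standard descriptors (STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO) as `fds`.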
7 years agocoverity: #1425857
Christian Brauner [Sat, 9 Dec 2017 19:00:40 +0000 (20:00 +0100)] 
coverity: #1425857

remove logically dead code

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agocoverity: #1425858
Christian Brauner [Sat, 9 Dec 2017 18:59:11 +0000 (19:59 +0100)] 
coverity: #1425858

free allocated memory

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agocoverity: #1425859
Christian Brauner [Sat, 9 Dec 2017 18:53:43 +0000 (19:53 +0100)] 
coverity: #1425859

check return value of snprintf()

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
7 years agocoverity: #1425860
Christian Brauner [Sat, 9 Dec 2017 18:51:55 +0000 (19:51 +0100)] 
coverity: #1425860

remove logically dead code

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>