Li Feng [Mon, 10 Jul 2017 09:19:52 +0000 (17:19 +0800)]
start: dup std{in,out,err} to pty slave
If the container has a console with a valid slave pty file descriptor,
we duplicate std{in,out,err} to the slave file descriptor so console logging
works correctly. When the container does not have a valid slave pty file
descriptor for its console and is started daemonized we should dup to
/dev/null.
Closes #1646.
Signed-off-by: Li Feng <lifeng68@huawei.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
This adds a little more flexibility to the state server. The idea is to have a
command socket function "lxc_cmd_add_state_client()" whose only task is to add
a new state client to the container's in-memory handler. This function returns
either the state of the container if it is already in the requested state or it
will return the newly registered client's fd in one of its arguments to the
caller. We then provide a separate helper function "lxc_cmd_sock_rcv_state()"
which can be passed the returned client fd and listens on the fd for the
requested state.
This is useful when we want to first register a client, then send a signal to
the container and wait for a state. This ensures that the client fd is
registered before the signal can have any effect and can e.g. be used to catch
something like the "STOPPING" state that is very ephemeral.
Additionally we provide a convenience function "lxc_cmd_sock_get_state()" which
combines both tasks and is used in e.g. "lxc_wait()".
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Add a test to see if we can start daemonized containers that have a very
short-lived init process. The point of this is to see whether we can correctly
retrieve the state.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Since we killed lxc-monitord we rely on the container's command socket to wait
for the container. This doesn't work nicely on daemonized startup since a
container's init process might be so short-lived that we
won't even be able to add a state client before the mainloop closes. But the
container might still have been RUNNING and executed the init binary correctly.
In this case we would erroneously report that the container failed to start
when it actually started just fine.
This commit ensures that we really catch all cases where the container successfully
ran by switching to a short-lived per-container anonymous unix socket pair that
uses credentials to pass container states around. It is immediately closed once
the container has started successfully.
This should also make daemonized container start way more robust since we don't
rely on the command socket handler to be running.
For the experienced developer: Yes, I did think about utilizing the command
socket directly for this. The problem is that when the mainloop starts it may
end up accept()ing the connection that we want
do_wait_on_daemonized_start() to accept() so this won't work and might cause us
to hang indefinitely. The same problem arises when the container fails to start
before the mainloop is created. In this case we would hang indefinitely as
well.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Long Wang [Sat, 8 Jul 2017 02:29:57 +0000 (10:29 +0800)]
af_unix: remove unlink operation
It is not necessary to unlink the abstract socket pathname when
we have finished using the socket. The abstract name is automatically
removed when the socket is closed.
This patch allows users to start containers in AppArmor namespaces.
Users can define their own profiles for their containers, but
lxc-start must be allowed to change to a namespace.
A container configuration file can wrap a container in an AppArmor
profile using lxc.aa_profile.
A process in an AppArmor namespace can view or manage only the profiles
belonging to that namespace, as if no other profiles existed. A namespace can
be created as follows:
sudo mkdir /sys/kernel/security/apparmor/policy/namespaces/$NAMESPACE
AppArmor can stack profiles so that the contained process is bound
by the intersection of all profiles of the stack. This is achieved
using the '//&' operator as follows:
In this case, even though the guest process appears unconfined in the
namespace, it is still confined by $PROFILE.
A guest allowed to access "/sys/kernel/security/apparmor/** rwklix,"
will be able to manage its own profile set, while still being
enclosed in the topmost profile $PROFILE:
Different guests can be assigned the same namespace or different
namespaces. In the first case, they will share their profiles.
In the second case, they will have distinct sets of profiles.
start: send state to legacy lxc-monitord state server even if no state clients registered
PR https://github.com/lxc/lxc/pull/1618 killed lxc-monitord; for backwards
compatibility, we still send the state to the legacy lxc-monitord state server
in `lxc_set_state`.
We should also send the state when no state clients are registered, otherwise
the `lxc-monitor` client will not receive a state change event when the
container transitions to `STARTING` or `RUNNING`.
lxc-monitor has an option to tell lxc-monitord to quit.
```
~/lxc (master) # lxc-monitor --help
lxc-monitor monitors the state of the NAME container
Options :
-n, --name=NAME NAME of the container
NAME may be a regular expression
-Q, --quit tell lxc-monitord to quit
```
But it does not work. This patch fixes that.
network: perform network validation at creation time
Some of the checks were previously performed when parsing the network config.
But since we now allow for a little more flexibility, those parse-time checks
no longer suffice. Instead, let's validate the network at creation time.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
lxc_get_netdev_by_idx() takes care of checking whether a netdev struct for a
given index is already allocated. If so, it returns a pointer to it to the
caller.
If it doesn't find one, it will allocate a new netdev struct and insert it into
the network list at the right position. For example, let's say you have the
following networks defined in your config file:
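The concrete example is not included in this excerpt; as an illustration (hypothetical values, assuming the indexed lxc.net.[i] keys from the network rework), a config defining indices 0 and 2 but not 1 might look like:

```
lxc.net.0.type = veth
lxc.net.0.link = br0
lxc.net.2.type = empty
```

Asking lxc_get_netdev_by_idx() for index 0 would then return the already allocated struct, while asking for index 1 would allocate a fresh netdev struct and insert it into the list between indices 0 and 2.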
When we merged the new logging function for the API we exposed the log level
argument in the struct as "priority". We had actually requested that it be
changed to "level", which somehow didn't happen and we missed it. Given that
there has been no new liblxc release, let's fix it right now before it hits
users.
Also, take the chance to change the terminology in the log from "priority" to
"level" globally. This is to prevent confusion with syslog's "priority"
argument which we also support.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
tests: don't fail when no processes for user exist
Since we kicked lxc-monitord there will very likely be no user processes around
anymore after all containers have been stopped. Which is a very very very good
thing. So let's not error out when pkill doesn't find any processes.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
A LXC container's lifecycle is regulated by the states STARTING, RUNNING,
STOPPING, STOPPED, ABORTING. These states are tracked in the LXC handler and
can be checked via appropriate functions in the command socket callback system.
(The freezer stages are not part of a container's lifecycle since they are not
recorded in the LXC handler. This might change in the future but given that the
freezer controller will be removed from future cgroup implementations it is
unlikely.) So far, LXC was using an external helper to track the states of a
container (lxc-monitord). This solution was error prone. For example, the
external state server would hang in various scenarios, seemingly caused either
by very subtle internal races or by signals confusing the external state
server.
LXC will switch from an external state monitor (lxc-monitord) which serves as a
state server for state clients to a native implementation using the individual
container's command socket. This solution was discussed and outlined by Stéphane
Graber and Christian Brauner during a LX{C,D} sprint.
The LXC handler will gain an additional field to track state clients. In order
for a state client to receive state notifications from the command server it
will need to register itself via the lxc_cmd_state_server() function in the
state client list. The state client list will be served by lxc_set_state()
during the container's lifecycle. lxc_set_state() will also take care of
removing any clients from the state list in the LXC handler once the requested
state has been reached and sent to the client.
In order to prevent races between adding and serving new state clients the state
client list and the state field in the LXC handler will be protected by a lock.
This commit effectively deprecates lxc-monitord. Instead of serving states to
state clients via the lxc-monitord fifo and socket we will now send the state
of the container via the container's command socket.
lxc-monitord is still usable and will - for the sake of the lxc-monitor
command - be kept around so that non-API state clients can still monitor the
container during its lifecycle.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
The removed codepath was non-functional for a long time now. All mounting is
handled through bdev.{c,h} and if that fails the other codepath would
necessarily fail as well. So let's remove them. This makes it way clearer what
is going on and simplifies things massively.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
- Enable lxc_abstract_unix_{send,recv}_fd() to send and receive multiple fds at
once.
- lxc_abstract_unix_{send,recv}_fd() -> lxc_abstract_unix_{send,recv}_fds()
- Send tty fds from child to parent all at once.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
This is a potentially security sensitive operation and I really want to keep an
eye on *when exactly* this is sent. So add more logging on the TRACE() level.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
This adds a test that checks LXC's configuration jump table to verify that all
methods for a given configuration item are implemented. If one is missing,
we'll error out.
This should provide additional safety since a) the API can now be sure that
dereferencing the pointer for a given method in the config struct is safe and
b) when users implement new configuration keys and forget to implement a
required method we'll see it right away.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Afaict, userns_exec_1() is only used to operate based on privileges for the
user's own {g,u}id on the host and for the container root's unmapped {g,u}id.
This means we require only to establish a mapping from:
- the container root {g,u}id as seen from the host -> user's host {g,u}id
- the container root -> some sub{g,u}id
The former we add if the user did not specify a mapping. The latter we
retrieve from the container's configured {g,u}id mappings.
Closes #1598.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>