Stéphane Graber [Mon, 27 Aug 2012 22:53:00 +0000 (18:53 -0400)]
Merge the liblxc API work by Serge Hallyn.
This turns liblxc into a public library implementing a container structure.
The container structure is meant to cover most LXC commands and can easily be
used to write bindings in other programming languages.
More information on the new functions can be found in src/lxc/lxccontainer.h
Test programs using the API can also be found in src/tests/
Christian Seiler [Tue, 21 Aug 2012 22:03:16 +0000 (00:03 +0200)]
lxc-attach: Add -R option to remount /sys and /proc when only partially attaching
When attaching to only some namespaces of the container but not the mount
namespace, the contents of /sys and /proc of the host system do not properly
reflect the context of the container's pid and/or network namespaces, and
possibly others.
The introduced -R option adds the possibility to additionally unshare the
mount namespace (when it is not being attached) and remount /sys and /proc
in order for those filesystems to properly reflect the container's context
even when only attaching to some of the namespaces.
Signed-off-by: Christian Seiler <christian@iwakd.de> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Cc: Daniel Lezcano <daniel.lezcano@free.fr>
Christian Seiler [Tue, 21 Aug 2012 22:03:15 +0000 (00:03 +0200)]
lxc-attach: Add -s option to select namespaces to attach to
This patch allows the user to select any list of namespaces (network, pid,
mount, uts, ipc, user) that lxc-attach should use when attaching to the
container; all other namespaces will not be attached to.
This allows the user to for example attach to just the network namespace and
use the host's (and not the container's) network tools to reconfigure the
network of the container.
Signed-off-by: Christian Seiler <christian@iwakd.de> Cc: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Christian Seiler [Tue, 21 Aug 2012 22:03:14 +0000 (00:03 +0200)]
lxc-unshare: Move functions to determine clone flags from command line options to namespace.c
In order to be able to reuse code in lxc-attach, the functions
lxc_namespace_2_cloneflag and lxc_fill_namespace_flags are moved from
lxc_unshare.c to namespace.c.
Signed-off-by: Christian Seiler <christian@iwakd.de> Cc: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Christian Seiler [Tue, 21 Aug 2012 22:03:13 +0000 (00:03 +0200)]
lxc-attach: Detect which namespaces to attach to dynamically
Use the command interface to contact lxc-start to receive the set of
flags passed to clone() when starting the container. This allows lxc-attach
to determine which namespaces were used for the container and select only
those to attach to.
Signed-off-by: Christian Seiler <christian@iwakd.de> Cc: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Christian Seiler [Tue, 21 Aug 2012 22:03:12 +0000 (00:03 +0200)]
lxc-attach: Remodel cgroup attach logic and attach to namespaces again in parent process
With the introduction of lxc-attach's functionality to attach to cgroups,
the setns() calls were put in the child process after the fork() and not the
parent process before the fork() so the parent process remained outside the
namespaces and could add the child to the correct cgroup.
Unfortunately, the pid namespace really affects only children of the current
process and not the process itself, which has several drawbacks: The
attached program does not have a pid inside the container and the context
that is used when remounting /proc from that process is wrong. Thus, the
previous logic of first setting the namespaces and then forking so the child
process (which then exec()s to the desired program) is a real member of the
container.
However, inside the container, there is no guarantee that the cgroup
filesystem is still be mounted and that we are allowed to write to it (which
is why the setns() was moved in the first place).
To work around both problems, we separate the cgroup attach functionality
into two parts: Preparing the attach process, which just opens the tasks
files of all cgroups and keeps the file descriptors open and the writing to
those fds part. This allows us to open all the tasks files in lxc_attach,
then call setns(), then fork, in the child process close them completely and
in the parent process just write the pid of the child process to all those
fds.
Signed-off-by: Christian Seiler <christian@iwakd.de> Cc: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Christian Seiler [Tue, 21 Aug 2012 22:03:11 +0000 (00:03 +0200)]
lxc-start: Add command to retrieve the clone flags used to start the container.
Add the LXC_COMMAND_CLONE_FLAGS that retrieves the flags passed to clone(2)
when the container was started. This allows external programs to determine
which namespaces the container was unshared from.
Signed-off-by: Christian Seiler <christian@iwakd.de> Cc: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Serge Hallyn [Tue, 21 Aug 2012 15:11:23 +0000 (10:11 -0500)]
lxc-create: Make location of container rootfs configurable
Make 'dir' an explicit backing store type, which accepts '--dir rootfs'
as an option to specify a custom location for the container rootfs. Also
update lxc-destroy to now remove the rootfs separately, as removing
@LXCPATH@/$name may not hit it.
Serge Hallyn [Tue, 21 Aug 2012 15:05:19 +0000 (10:05 -0500)]
lxc_start: exit early if insufficient privs in daemon mode
Starting a container with insufficient privilege (correctly) fails
during lxc_init. However, if starting a daemonized container, we
daemonize before we get to that check. Therefore while the
container will fail to start, and the logfile will show this, the
'lxc-start -n x -d' command will return success. For ease of
scripting, do a check for the required privilege before we exit.
Jan Kiszka [Mon, 9 Jul 2012 17:15:48 +0000 (19:15 +0200)]
Add network-down script
Analogously to lxc.network.script.up, add the ability to register a down
script. It is called before the guest network is finally destroyed,
allowing to clean up resources that are not reset/destroyed
automatically. Parameters of the down script are identical to the up
script except for the execution context "down".
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Serge Hallyn [Fri, 17 Aug 2012 02:11:50 +0000 (21:11 -0500)]
Cleanup partial container if -h was passed to template
If user calls 'lxc-create -t ubuntu -- -h' (as opposed to
'lxc-create -t ubuntu -h') then the ubuntu template will print its
help then exit 0. Then lxc-create does not cleanup. So detect this
in lxc-create.
This patch is so far just a proof of concept. The libseccomp api will be
changing soon so it probably wouldn't be worth pulling this until it is
updated for the new API.
This patch introduces support for seccomp to lxc. Seccomp lets a program
restrict its own (and its children's) future access to system calls. It
uses a simple whitelist system call policy file. It would probably be
better to switch to something more symbolic (i.e specifying 'open' rather
than the syscall #, especially given container arch flexibility).
I just wanted to get this out there as a first step. You can also get
source for an ubuntu package based on this patch at
https://code.launchpad.net/~serge-hallyn/ubuntu/quantal/lxc/lxc-seccomp
Jan Kiszka [Thu, 9 Aug 2012 22:54:48 +0000 (17:54 -0500)]
lxc-wait: Add timeout option
Allow to specify a timeout for waiting on state changes via lxc-wait.
Helpful for scripts that need to handle errors or excessive delays in
state changing procedures.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
The 'lxc.mount =' entry can have more than one space, or tabs, before the =.
We only need to disambiguate from 'lxc.mount.entry'. So just check for a
space or tab after mount.
CAP_LAST_CAP in linux/capability.h doesn't always match what the kernel
actually supports. If the kernel supports fewer capabilities, then a
cap_get_flag for an unsupported capability returns -EINVAL.
Recognize that, and don't fail when initializing capabilities when this
happens, rather accept that we've reached the last capability.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
- Update list of extra packages for debootstrap to only include vim
and ssh. The others were only relevant when we were still using the
minbase variant. (LP: #996839)
- Drop any hardcoded Ubuntu version check and replace by feature
checks instead.
- Format lxc-ubuntu to consistently use 4-spaces indent instead of
mixed spaces/tabs.
- Update default /etc/network/interfaces to include the header.
- Update default /etc/hosts to match that of a regular Ubuntu system.
- Drop support for end-of-life releases (gutsy on sparc).
- Make sure /etc/resolv.conf is valid before running any apt command.
- Update template help message for release and arch parameters.
- Switch default Ubuntu version from lucid to precise.
When installing a non-native architecture, the template
installs a bunch of packages of the native architecture to work around
existing limitations of qemu-user-static, mostly related to netlink.
The current code would install upstart of the host architecture but
force the amd64 version of the others. This was just a mistake done
while testing/developping the code. Fixing now to always install
the native architecture version of all of them.
lxc-init used to be under /usr/lib/lxc. Now it is under
/usr/lib/<multiarch>/lxc, but old containers will still have it under
/usr/lib/lxc. So search for a valid lxc-init to run.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
lxc-ubuntu-cloud: extract the right filenames from tarball
Signed-off-by: Ben Howard <ben.howard@canonical.com> Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Description: Fix handling of user-data in ubuntu-cloud template
Signed-off-by: Ben Howard <ben.howard@canonical.com> Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
templates: use relative paths when creating containers
At the same time, allow lxc.mount.entry to specify an absolute target
path relative to /var/lib/lxc/CN/rootfs, even if rootfs is a blockdev.
Otherwise all such entries are ignored for blockdev-backed containers.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
This patch introduces support for 4 hooks. We'd like to have 6 in
all to mirror the openvz ones (thanks to Stéphane for this info):
pre-start: in the host namespace before container mounting happens
mount: after container mounting (as per config and /var/lib/lxc/container/fstab)
but before pivot_root
start: immediately before exec'ing init
stop: in container namespace and in chroot before shutdown
umount: after other unmounting has happened
post-stop: outside of the container
stop and umount are not implemented here because when the kernel kills
the container init, it kills the namespace. We can probably work around
this, i.e. by keeping the /proc/pid/ns/mnt open, and using that, though
all container tasks including init would still be dead. Is that worth
pursuing?
start also presents a bit of an issue. openvz allows a script on the
host to be specified, apparently. My patch requires the script or
program to exist in the container. I'm fine with trying to do it the
openvz way, but I wasn't sure what the best way to do that was. Openvz
(I'm told) opens the script and passes its contents to a bash in the
container. But that limits the hooks to being only scripts. By
requiring the hook to be in the container, we can allow any sort of
hook, and assume that any required libraries/dependencies exist
there.
This could be done as generic 'lsm_init()' and 'lsm_load()' functions,
however that would make it impossible to compile one package supporting
more than one lsm. If we explicitly add the selinux, smack, and aa
hooks in the source, then one package can be built to support multiple
kernels.
The smack support should be pretty trivial, and probably very close
to the apparmor support.
The selinux support may require more, including labeling the passed-in
fds (consoles etc) and filesystems.
If someone on the list has the inclination and experience to add selinux
support, please let me know. Otherwise, I'll do Smack and SELinux.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
It optionally waits (an optional timeout # of seconds) for the container to
be STOPPED. If given -r, it reboots the container (and exits immediately).
I decided to add the timeout after all because it's harder to finagle into
an upstart post-stop script than a full bash script.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
lxc-ubuntu-cloud.in: re-enable use of daily cloud images
There are two types of cloud images - released and daily ones. We were
always using daily ones, instead of using released by default with an
option for daily. Fix that.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
lxc-ubuntu.in: fix up the logic adding group for bound users
1. 'getent group $user' assumes user's group is named $user.
2. if 'getent group' returns error, just ignore the group in container
3. (misc) while it happens to all work out fine anyway, don't do
getent passwd $bindhome if $bindhome isn't defined. (it will
successfully return all password entries)
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
When creating a container as lvm snapshot, use the original size unless
user explicitly overrides it.
It's all well and good to day "use lvextend if you run out of space", but
in the meantime applications may become corrupted...
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
ubuntu template: add sudo group and cleanup minor devttydir issue
Always add the user to the 'sudo' group as it's been around
since at least Ubuntu 10.04. In addition make the user part
of the admin group until 12.04 where it's been removed.
Also fix a minor layout issue with devttydir.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Daniel Baumann [Tue, 31 Jul 2012 14:01:24 +0000 (16:01 +0200)]
fix netstat script with separator
Allow to use -- as seperator in lxc-netstat, otherwise -n from lxc-netstat
collides with netstats -n option (Closes: #641251).
[Serge Hallyn] update patch to (1) not demand argument for
exec (breaks) and (2) set $name not $lxc_name.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
David Ward [Thu, 3 May 2012 22:50:15 +0000 (00:50 +0200)]
make help consistent for other scripts
Display help information in a consistent format.
Print error messages and help information to stderr. Prefix error
messages with the name of the script (for easier debugging as part
of larger scripts).
Allow help information to be printed as a non-root user.
Fix file mode for lxc-checkconfig.in.
Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
David Ward [Thu, 3 May 2012 22:50:15 +0000 (00:50 +0200)]
rewrite lxc-ps
Use bash instead of perl; eliminates final lxc dependency on perl
(beneficial for minimal operating system environments).
Modify the cgroup search to only use hierarchies that contain one
or more subsystems. When searching, if a hierarchy contains the
'ns' subsystem, do not append '/lxc' to the parent cgroup.
Maintain column spacing. Expand container name column as necessary.
Properly handle spaces in 'ps' output that are not field separators
(for example, try 'lxc-ps -o pid,args').
Fix file mode in repository.
Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
David Ward [Thu, 3 May 2012 22:50:15 +0000 (00:50 +0200)]
refresh lxc-netstat
Modify the cgroup search to only use hierarchies that contain one
or more subsystems. When searching, if a hierarchy contains the
'ns' subsystem, do not append '/lxc' to the parent cgroup.
Change method of bind mounting /proc/<pid>/net onto /proc/net, to
avoid error "cannot mount block device /proc/<pid>/net read-only".
Check that user is root. Check that container name is specified
before calling 'exec'.
Update the help information.
Print error messages and help information to stderr.
Make indentation consistent.
Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
David Ward [Thu, 3 May 2012 22:50:15 +0000 (00:50 +0200)]
refresh lxc-ls
Add an '--active' option that lists active containers by searching
cgroups. (Otherwise, the directories in /var/lib/lxc are listed.)
Modify the cgroup search to only use hierarchies that contain one
or more subsystems. When searching, if a hierarchy contains the
'ns' subsystem, do not append '/lxc' to the parent cgroup.
Add a '--help' option that prints the command syntax.
Print error messages and help information to stderr.
Update the documentation.
Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
David Ward [Thu, 3 May 2012 22:50:15 +0000 (00:50 +0200)]
cgroup: only touch hierarchies that are bound to subsystems
Obtain a list of subsystems from /proc/cgroups, and ignore hierarchies
that are not bound to any of them (especially the 'systemd' hierarchy:
http://www.freedesktop.org/wiki/Software/systemd/PaxControlGroups ).
Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
David Ward [Thu, 3 May 2012 22:50:15 +0000 (00:50 +0200)]
lxc-attach: unify code for attaching a pid to a cgroup
To attach a new pid to the cgroups for an existing container, we can use
the same method that we did when we started the container: iterate over
all the mounted cgroup hierarchies; find the cgroup that pid 1 is in for
each hierarchy; add 'lxc/<name>' to the end of it; then write the pid to
the 'tasks' file in that cgroup. (The only difference is that we do not
create the cgroup again.) Note that we follow exactly the same iteration
pattern to delete our cgroups when a container is shutdown.
There may be situations where additional cgroups hierarchies are mounted
after the container is started, or the cgroup for pid 1 gets reassigned.
But we currently don't handle any of these cases in the shutdown code or
anywhere else, so it doesn't make sense to try to handle these cases for
lxc-attach by itself. Aside from simplifying the code, this change makes
it easier to solve a different problem: ignoring hierarchies that are
not bound to any subsystems (like 'systemd').
Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
David Ward [Thu, 3 May 2012 22:50:15 +0000 (00:50 +0200)]
utmp: support non-rootfs configuration
Having a rootfs is not a necessary condition for monitoring utmp, since
/var or /var/run can just be remounted inside the container instead. We
should rely on the other two conditions already in place to decide
whether to monitor the utmp file:
- the container was started with 'lxc-start', which indicates that it
has a real init process and is expected to write to a utmp file
- support for CAP_SYS_BOOT was not found in the kernel, which would
otherwise supersede utmp monitoring
Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
David Ward [Thu, 3 May 2012 22:50:15 +0000 (00:50 +0200)]
utmp: do not set conf->need_utmp_watch if CAP_SYS_BOOT is not found
If CAP_SYS_BOOT is not found in the kernel, the existing value for
conf->need_utmp_watch should be left intact (which will be '1' for
containers started with 'lxc-start', or '0' for containers started
with 'lxc-execute').
Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
David Ward [Thu, 3 May 2012 22:50:15 +0000 (00:50 +0200)]
lxc-attach: use execvp instead of execve
execvp does not require specifying the full path to the executable
(e.g., "ls" instead of "/bin/ls"), making the operation of 'lxc-attach'
consistent with 'lxc-start' and 'lxc-execute'.
Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Serge Hallyn [Sun, 18 Mar 2012 23:31:40 +0000 (00:31 +0100)]
ubuntu templates cleanups
1. fix inconsistent use of '--auth-key' (not --auth_key) which broke their
usage
2. add --debug option to lxc-ubuntu (which does set -x to show what broke)
(idea from Idea from lifeless and benji)
3. fix incorrect assumption about group with -b option. User's default group
may not be the same as username. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Serge Hallyn [Sun, 18 Mar 2012 23:31:40 +0000 (00:31 +0100)]
do check for utmp checking at the right time
We were doing the check for whether we need to watch utmp from a
thread cloned from that which will actually do the utmp watching.
As a result, the utmp file was always being watched, even if it
didn't need to be.
Serge Hallyn [Mon, 5 Mar 2012 22:53:14 +0000 (23:53 +0100)]
cgroups: fix broken support for deprecated ns cgroup
when using ns cgroup, use /cgroup/<init-cgroup> rather than
/cgroup/<init-cgroup>/lxc
At least lxc-start, lxc-stop, lxc-cgroup, lxc-console and lxc-ls work
with this patch. I've tested this in a 2.6.35 kernel with ns cgroup,
and in a 3.2 kernel without ns cgroup.
Note also that because of the check for container reboot support,
if we're using the ns cgroup we now end up with a /cgroup/<container>/2
cgroup created, empty, by the clone(CLONE_NEWPID). I'm really not
sure how much time we want to spend cleaning such things up since
ns cgroup is deprecated in kernel.
Signed-off-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Serge Hallyn [Thu, 16 Feb 2012 20:14:13 +0000 (14:14 -0600)]
update ubuntu templates to provide macaddr and more
Add a macaddr if precisely one veth is specified but no hwaddr. Allow
specifying ssh authkeys. In cloud template, copy locales by default and allow
a tarball to be specified.
Signed-off-by: Ben Howard <ben.howard@canonical.com> Signed-off-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Serge Hallyn [Thu, 16 Feb 2012 20:13:26 +0000 (14:13 -0600)]
lxc-ubuntu: fix obscure arguments
1. --path is meant to be passed by lxc-create, but should not be passed
in by users. Don't advertise it in --help.
2. --clean syntax ends up not making much sense. Get rid of it, and
add '--flush-cache' option instead.
Signed-off-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Serge Hallyn [Thu, 16 Feb 2012 20:01:20 +0000 (14:01 -0600)]
ubuntu template changes
Author: Stéphane Graber <stgraber@ubuntu.com>
Use ubuntu/ubuntu instead of root/root by default. Stop
removing tty[56].conf in Precise. Stop messing with dhclient.conf.
Set devttydir on Precise to /dev/lxc to allow for clean upgrades.