Serge Hallyn [Fri, 2 Feb 2024 16:41:11 +0000 (10:41 -0600)]
lxc-test-usernic: drop cgroup handling
This stuff is not needed on a modern systemd-based system, and in fact
it breaks there. It would probably be better to detect such a system so
that a non-systemd box can still run this test, but I'm not sure what
check would be reliable.
tree-wide: use container_uses_namespace() in less trivial cases
In our current codebase we rely on the following logical pattern:
list_empty(&handler->conf->id_map)
*IF AND ONLY IF*
the container does NOT use a user namespace
This is perfectly correct nowadays, but once we (hopefully) have
"isolated user namespaces" support ready, it will no longer hold:
it will be perfectly fine to have a user namespace with empty
/proc/*/{u,g}id_map files. Such a setup is already possible today,
but this kind of configuration is close to useless and nobody
actually uses it.
No functional changes intended.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
This will be useful for the future support of isolated
user namespaces [1]. I have already experimented with
that locally and found that the LXC codebase has a
bunch of different ways to check whether a container
uses a user namespace.
This commit contains a trivial conversion from
open-coded versions of the container_uses_namespace()
check to actual uses of the helper.
Cole Miller [Fri, 15 Jul 2022 17:52:52 +0000 (13:52 -0400)]
Disable IPv6 link-local addresses for bridged veth
When creating a bridged veth tunnel, disable assignment of IPv6
link-local addresses on the host's end by writing 1 to
/proc/sys/net/ipv6/conf/NAME/disable_ipv6, if it exists.
Scott Moser [Tue, 22 Aug 2023 18:07:36 +0000 (14:07 -0400)]
Fix start api call to split quoted strings in execute or init command.
If a user of the container.start API call provided NULL for the argv
argument, then lxc would load either the 'lxc.execute.cmd' or the
'lxc.init.cmd' configuration item as the command.
It would then simply split that string on spaces, so a command with
quoted arguments ended up executing the array:
['touch', '"file', 'one"', '"file', '2"']
This differs from the experience with the `lxc-start` command, which
uses lxc_string_split_quoted and executes:
['touch', 'file one', 'file 2']
Note that, as described in lxc_string_split_quoted, commands that include
nested quotes and possibly other special characters are still a problem. In
those cases, the caller of 'start' can provide an argv array.
get_hierarchy: don't WARN about no usable controller
If I start a container with loglevel WARN and (on a pretty
stock Ubuntu) run lxc-info -n $c, I get:
lxc-start media 20230706233337.765 WARN cgfsng - cgroups/cgfsng.c:get_hierarchy:142 - There is no useable cpuacct controller
lxc-start media 20230706233337.765 WARN cgfsng - cgroups/cgfsng.c:get_hierarchy:142 - There is no useable blkio controller
I don't think that's worth WARNing about, so change it to
INFO.
Levent Komurcu [Mon, 26 Jun 2023 07:23:30 +0000 (09:23 +0200)]
Add libarchive tar support for lxc download
This patch fixes unpacking images when the system-provided tar is libarchive's bsdtar. Unlike GNU tar, bsdtar doesn't support exclude-matching flags such as --anchored. Instead, when bsdtar is detected, each exclude path is prefixed with ^ to emulate the behavior of --anchored.
lxccontainer: extend lxccontainer API with set_timeout
The lxccontainer set_timeout method lets callers set the LXC
client timeout for waiting on a monitor response.
Right now it is implemented using the SO_RCVTIMEO option on the
client socket (but that is an implementation detail that may
change in the future).
This commit doesn't change behavior: it just adds a new option
and its setter, without changing the implementation of any
existing LXC command. It also extends the internal API function
lxc_cmd with lxc_cmd_timeout.
Issue #4257
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Let's disable IORING_POLL_ADD_MULTI to work around an issue
with false-positive POLLIN events in the CQ.
In my local setup I managed to fix the issue without this
by making terminal FDs non-blocking, but during full
testsuite runs in Jenkins we found that the issue
still persists. So let's add this ugly workaround too.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Let's prevent freezes in read(2) by making the terminal FDs non-blocking.
We discovered an issue with the io_uring mainloop when multishot
poll (IORING_POLL_ADD_MULTI) mode is enabled: sometimes false-positive
poll events are put into the CQ. A subsequent read(2) then blocks
forever, stalling all mainloop processing indefinitely.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
It turns out that our parsing of /proc/pid/stat was not safe in general
(though probably safe for lxc, since our executable names do not contain
spaces).
Let's fix this by searching backwards through the file for the last ')',
and then continuing on from there.
This was reported to me by Solar Designer, who pointed me to this thread:
https://twitter.com/solardiz/status/1634204168545001473
Indeed, this is a lot of tap dancing to work around the kernel's 16-character
executable name limit. Perhaps I'll send a kernel patch to raise that
limit next.
There is a long story behind this. Many years ago, Stéphane Graber
discovered an issue with AppArmor mount rules.
Since the commit
https://github.com/lxc/lxc/commit/7f2b13275daf68b173474900b1ce2c04105da33f
("apparmor: Update mount states handling"), changing mount propagation
flags has been prohibited, because adding rules that allow mount
propagation changes gives a user inside the container the ability
to mount anything [1].
With modern systemd versions this problem has become more critical than
before. For instance, Arch Linux containers fail to start without the
nesting AppArmor profile enabled (because the nesting profile effectively
just allows all mounts). Of course, that's a security issue.
We've also enabled sharing on the container rootfs:
https://github.com/lxc/lxc/pull/4229
Now many workloads need to change the propagation flag to
private (see https://github.com/canonical/craft-parts/pull/400).
For example, systemd inside the container fails to boot with:
bpf-lsm: BPF LSM hook not enabled in the kernel, BPF LSM not supported
Failed to remount root directory as MS_SLAVE: Permission denied
(sd-gens) failed with exit status 1.
[!!!!!!] Failed to start up manager.
Exiting PID 1...
John Johansen (AppArmor maintainer) and the LXD team worked on a fix [2].
It has already been merged into the stable AppArmor 3.0 and 3.1 branches.
There is no stable AppArmor release tag that includes it yet, but I expect
it to land in AppArmor 3.0.10.
See also:
[1] https://bugs.launchpad.net/apparmor/+bug/1597017
[2] https://gitlab.com/apparmor/apparmor/-/merge_requests/333
Fixes: #4280
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Serge Hallyn [Wed, 8 Mar 2023 05:53:59 +0000 (23:53 -0600)]
switch from libsystemd's dbus to dbus-1
This is purely so that we can do static linking. Linking against
libsystemd makes that a challenge because, while static linking is
perfectly simple to do, distros tend not to provide a libsystemd.a.
Tools that want to (a) link against liblxc and (b) have a statically
linked binary to bind into a minimal container are ill served by
this. So link against libdbus-1.
- .github/workflows/build.yml: switch to dbus-1.
- src/lxc/cgroups/cgfsng.c: replace the unpriv_systemd_create_scope(),
  start_scope(), and enter_scope() systemd code with dbus-1 code.
- src/tests/oss-fuzz.sh: update from libsystemd-dev to libdbus-1-dev
- src/tests/oss-fuzz.sh: disable dbus
- .github/workflows/*: update from libsystemd-dev to libdbus-1-dev
- meson.build and meson_options.txt: switch from sd_bus to dbus
- lxc.spec.in: add dbus-1 to BuildRequires
Signed-off-by: Serge Hallyn <serge@hallyn.com>
Changelog: 03/13: use custom iter type so we can clean up more easily...
Changelog: 03/13: initialize each dbus_iter to { 0 } as mihalicyn suggested.
tree-wide: convert fcntl(FD_CLOEXEC) to SOCK_CLOEXEC
- replace accept() + fcntl(FD_CLOEXEC) with accept4(..., SOCK_CLOEXEC)
- remove fcntl(FD_CLOEXEC) in lxc_server_init() as we already set
SOCK_CLOEXEC in lxc_abstract_unix_open().
See also: ad9429e52 ("tree-wide: make socket SOCK_CLOEXEC")
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Scott Moser [Fri, 24 Feb 2023 21:48:10 +0000 (16:48 -0500)]
Allow fuse mounts in apparmor start-container.
Unprivileged users should be able to do FUSE mounts during start-container.
Specifically, this solves the problem of unprivileged FUSE mounting via a
pre-hook.