lxc/lxccontainer: stop printing misleading errors in enter_net_ns()
In enter_net_ns() we first try to enter the network namespace, before
entering the user namespace, to support the inherited netns case properly.
For an unprivileged container with a non-shared network namespace, EPERM
is expected on this first attempt. Let's take this into account
and stop misleading users with these error messages.
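The decision described above can be sketched as a small predicate. This is only an illustration: `netns_failure_is_expected()` and its parameters are hypothetical names, not functions from the LXC tree.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of LXC): decide whether a failed attempt
 * to enter the network namespace is the expected one and should stay quiet.
 */
static bool netns_failure_is_expected(int saved_errno, bool before_userns)
{
	/*
	 * EPERM on the first try, made before the user namespace has been
	 * entered, is normal for an unprivileged container that owns its
	 * network namespace. Anything else is worth reporting.
	 */
	return before_userns && saved_errno == EPERM;
}
```

A caller would log an error only when this predicate returns false.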
src/tests/lxc-test-unpriv: prevent fail on cleanup path
/run/user/$(id -u $TUSER) is a mountpoint for tmpfs, so rm -rf
may fail with EBUSY. We should mask this failure and prevent the test from
being marked as failed because of it.
Also add set -x to make debugging easier in case of failures.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
src/tests/lxc-test-apparmor-mount: prevent fail on cleanup path
/run/user/$(id -u $TUSER) is a mountpoint for tmpfs, so rm -rf
may fail with EBUSY. We should mask this failure and prevent the test from
being marked as failed because of it.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Commit eae44ce19931 ("conf: fix append_ttyname()") changed the format
of `conf->ttys.tty_names`, where the `container_ttys=` prefix was
removed.
This seems to have been taken into account in `lxc_create_ttys()` in
`src/lxc/conf.c`, however that's not enough. `do_start()` in
`src/lxc/start.c` clears the environment, and then does `putenv(...)`
directly on the value of `tty_names`. As it no longer has the
`container_ttys=` prefix, this call doesn't have the intended effect.
This behaviour is also confirmed via `ltrace` when doing `lxc-start`:
Given that `do_start()` clears the environment anyway, there is no
reason for another `setenv()` call in `lxc_create_ttys()`, and a fix
is required for `putenv()` in `do_start()`.
Change the `putenv()` call to `setenv()` in `do_start()` to account
for the change of format in `conf->ttys.tty_names`. Remove extraneous
`setenv()` from `lxc_create_ttys()`.
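A minimal sketch of the difference, assuming `tty_names` holds only the value (the sample value `pts/0 pts/1` and both helper names are hypothetical, this is not the LXC code):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Variable name taken from the commit message above. */
#define TTY_ENV_NAME "container_ttys"

/*
 * Hypothetical helper: export the TTY list when tty_names has no
 * "container_ttys=" prefix. setenv() takes name and value separately
 * and copies both.
 */
static int export_tty_names(const char *tty_names)
{
	return setenv(TTY_ENV_NAME, tty_names, 1);
}

/*
 * For contrast: putenv() expects one "NAME=value" string, so passing a
 * bare value does not define container_ttys. putenv() keeps the pointer
 * it is given, hence the static buffer.
 */
static int export_tty_names_broken(const char *tty_names)
{
	static char buf[256];

	snprintf(buf, sizeof(buf), "%s", tty_names);
	return putenv(buf);
}
```

With the second helper, `getenv("container_ttys")` stays NULL, which matches the symptom described above.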
Fixes #4198
Fixes: eae44ce19931 ("conf: fix append_ttyname()")
Signed-off-by: Roman Azarenko <roman.azarenko+gh@genexis.eu>
lxc/start: do prctl(PR_SET_DUMPABLE) after last uid/gid switch
We need to do prctl(PR_SET_DUMPABLE) later, after the last lxc_switch_uid_gid()
call. Otherwise, our earlier call won't be effective, as commit_creds()
in the kernel [1] will set_dumpable(task->mm, suid_dumpable) if the UID/GID or capabilities
were affected by the lxc_switch_uid_gid() call.
This only affects the LXC API ->start(struct lxc_container *c, int useinit, char *const argv[])
call when useinit == 1, because in this case we don't perform an additional exec() and
the task's dumpable bit remains set to 2 (default value taken from /proc/sys/fs/suid_dumpable).
If useinit == 0, then we do exec() (see start_ops->start callback) and the dumpable
flag will be reset in begin_new_exec() to SUID_DUMP_USER=1 [2]. Then everything will be fine.
prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_FROM...) fails with EPERM, because:
- container's init task->mm: (get_dumpable(mm) != SUID_DUMP_USER)
AND
- mm->user_ns == init_user_ns (as there was no exec() and mm_struct->user_ns was set in the initial
user namespace when we run lxc-execute)
(for more details see [3])
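The ordering can be sketched as follows. This is an illustration under assumptions, not the LXC implementation: `switch_ids_then_set_dumpable()` is a hypothetical helper, and passing `(uid_t)-1` or `(gid_t)-1` is a convention invented here to mean "leave unchanged".

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: change credentials first, then set the dumpable
 * flag, so that a credential change cannot undo it afterwards.
 * Returns the resulting dumpable value, or -1 on error.
 */
static int switch_ids_then_set_dumpable(uid_t uid, gid_t gid)
{
	if (gid != (gid_t)-1 && setgid(gid) < 0)
		return -1;
	if (uid != (uid_t)-1 && setuid(uid) < 0)
		return -1;

	/* 1 is SUID_DUMP_USER; this must come after the last switch. */
	if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) < 0)
		return -1;

	return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}
```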
Since 7418b27f1 ("tree-wide: use __u32 for capabilities"), opening
/proc/sys/kernel/cap_last_cap has never worked: it was failing with
EXDEV and we were using a fallback codepath to get the last cap.
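For context, a rough sketch of the "read the file, else fall back" pattern. The commit message does not show the fallback, so the probing loop below is an assumption about one way to do it; `last_cap()` and `probe_last_cap()` are hypothetical names and plain fopen() stands in for whatever helper LXC uses.

```c
#include <stdio.h>
#include <sys/prctl.h>

/*
 * Assumed fallback: ask the kernel about each capability in turn until
 * it rejects one. The bound of 64 is an arbitrary safety limit.
 */
static int probe_last_cap(void)
{
	int cap = 0;

	while (cap < 64 && prctl(PR_CAPBSET_READ, cap, 0, 0, 0) >= 0)
		cap++;

	return cap - 1;
}

/* Prefer the value published by the kernel, else probe for it. */
static int last_cap(const char *path)
{
	int value = -1;
	FILE *f = fopen(path, "r");

	if (f) {
		if (fscanf(f, "%d", &value) != 1)
			value = -1;
		fclose(f);
	}

	return value >= 0 ? value : probe_last_cap();
}
```

It would be called as `last_cap("/proc/sys/kernel/cap_last_cap")`.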
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
lxc/conf: handle rootfs open_at error in lxc_mount_rootfs
If the LXC build is misconfigured, for instance with --prefix=/
and /lib being a symlink to /usr/lib, then open_at always fails
to open the rootfs. Let's add an error print to make it easier to
figure this out.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Let's revert this change as it introduces 2 regressions (1, 2) and a behavior change (3):
1. It's not correct to do exit(2) from a signal handler in this case,
as we skip proper cleanup procedures such as restoring the PTY configuration
state (see lxc_terminal_delete()), which leads to a problem with the PTY after lxc-attach exits.
[ hint: just try to use lxc-attach on the main branch with this change and you will
see it. After lxc-attach exits you won't be able to type anything in your
current terminal session as it's messed up. ]
2. This introduces a race condition in the code, which leads to a
regression in LXD (and I believe Incus too) that can be seen as
random "Failed to retrieve PID of executing child process" errors
on "lxc exec"/"incus exec" commands. It's extremely hard to reproduce,
but my guess is that we are getting a race condition here, because
by the time we set a new signal handler for SIGCHLD, the transient process
is still alive, and when it exits it generates a SIGCHLD which may lead to
exit().
3. This changes behavior of lxc-attach that has been there for *years*,
and it's quite scary to be honest. I'm not against having this change, but
in a different form; for example, we could add a new command line parameter to
the lxc-attach command which enables this behavior.
My first attempt was to fix that change to prevent the race, but then
I noticed that we also have the more serious problem described in (1),
which requires more work.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Stéphane Graber [Thu, 19 Dec 2024 03:13:05 +0000 (22:13 -0500)]
github: Rework test workflow
Introduce a main "tests" workflow which runs the LXC testsuite on both
x86_64 and aarch64, on a variety of compilers and OSes, and which also
handles the sanitizer runs.
Jef Steelant [Fri, 6 Dec 2024 10:20:20 +0000 (11:20 +0100)]
lxccontainer: fix enter_net_ns helper to work when netns is inherited
If a network namespace is shared by setting lxc.namespace.share.net and
the container is unprivileged, then the network namespace should be
entered before entering the user namespace. However, if an unprivileged
user started a container, then the network namespace should be entered
after entering the user namespace. To solve this, we try to enter the
network namespace before entering the user namespace. If that does not
succeed, it is tried again inside the user namespace.
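The try-then-retry order can be sketched with a toy model. Everything here is hypothetical: the callback type, the `sim_*` names, and the simulated rule that a shared netns can only be entered before the user namespace and an owned one only after it.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical callback: returns true if the namespace was entered. */
typedef bool (*enter_ns_fn)(const char *ns);

/* Try "net" first, enter "user" if needed, then retry "net" if required. */
static bool enter_net_ns_sketch(enter_ns_fn enter, bool needs_userns)
{
	bool net_entered = enter("net");

	if (needs_userns && !enter("user"))
		return false;

	if (!net_entered)
		return enter("net");

	return true;
}

/* Toy state standing in for the real namespaces. */
static bool sim_in_userns;
static bool sim_netns_shared;

static bool sim_enter(const char *ns)
{
	if (strcmp(ns, "user") == 0) {
		sim_in_userns = true;
		return true;
	}

	/* Shared netns: only before userns. Owned netns: only after. */
	return sim_netns_shared ? !sim_in_userns : sim_in_userns;
}

/* Reset the toy state and run the sketch for one scenario. */
static bool sim_run(bool netns_shared)
{
	sim_in_userns = false;
	sim_netns_shared = netns_shared;
	return enter_net_ns_sketch(sim_enter, true);
}
```

In this model both scenarios succeed: the shared case on the first attempt, the non-shared case on the retry.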
Add support for PuzzleFS images in the oci template
PuzzleFS images (media type 'application/vnd.puzzlefs.image.rootfs.v1')
can be mounted in a similar way to squashfs images: we just have to
detect the type and reuse the existing code for providing a mount
helper. PuzzleFS is a next-generation container filesystem [1] with
several benefits, such as reduced duplication, reproducible image
builds, direct mounting support and memory safety guarantees.
Since PuzzleFS currently doesn't provide an image config, also add
support for empty image configs, which are supported by the OCI spec [2].
The MOUNT_HELPER is now passed a `--persist <upperdir>` flag, so it
knows that it needs to create an overlay. This is needed because LXC
expects a writable rootfs and both atomfs and puzzlefs are read-only
filesystems.
Serge Hallyn [Thu, 3 Oct 2024 18:41:39 +0000 (13:41 -0500)]
meson.build: drop suggest-attribute=noreturn build option
The suggest-attribute=noreturn option marks functions which will
never return, to give the compiler some hints. It catches all of
our src/lxc/tools/*.c *_main functions as follows:
error: function might be candidate for attribute ‘noreturn’ [-Werror=suggest-attribute=noreturn]
But if we mark those __noreturn, then the compiler complains that:
../src/lxc/tools/lxc_attach.c:320:53: warning: ‘main’ specifies less restrictive attribute than its target ‘lxc_attach_main’: ‘noreturn’ [-Wmissing-attributes]
320 | int __attribute__((weak, alias("lxc_attach_main"))) main(int argc, char *argv[]);
This recommendation is really not very important, so let's not ask
the build to warn about it.