From: Christian Brauner Date: Mon, 12 Jan 2026 12:51:32 +0000 (+0100) Subject: Merge patch series "mount: add OPEN_TREE_NAMESPACE" X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=1bce1a664ac25d37a327c433a01bc347f0a81bd6;p=thirdparty%2Flinux.git Merge patch series "mount: add OPEN_TREE_NAMESPACE" Christian Brauner says: When creating containers the setup usually involves using CLONE_NEWNS via clone3() or unshare(). This copies the caller's complete mount namespace. The runtime will also assemble a new rootfs and then use pivot_root() to switch the old mount tree with the new rootfs. Afterward it will recursively umount the old mount tree thereby getting rid of all mounts. On a basic system here where the mount table isn't particularly large this still copies about 30 mounts. Copying all of these mounts only to get rid of them later is pretty wasteful. This is exacerbated if intermediary mount namespaces are used that only exist for a very short amount of time and are immediately destroyed again causing a ton of mounts to be copied and destroyed needlessly. With a large mount table and a system where thousands or ten-thousands of namespaces are spawned in parallel this quickly becomes a bottleneck increasing contention on the semaphore. Extend open_tree() with a new OPEN_TREE_NAMESPACE flag. Similar to OPEN_TREE_CLONE only the indicated mount tree is copied. Instead of returning a file descriptor referring to that mount tree OPEN_TREE_NAMESPACE will cause open_tree() to return a file descriptor to a new mount namespace. In that new mount namespace the copied mount tree has been mounted on top of a copy of the real rootfs. The caller can setns() into that mount namespace and perform any additionally setup such as move_mount()ing detached mounts in there. This allows OPEN_TREE_NAMESPACE to function as a combined unshare(CLONE_NEWNS) and pivot_root(). A caller may for example choose to create an extremely minimal rootfs: fd_mntns = open_tree(-EBADF, "/var/lib/containers/wootwoot", OPEN_TREE_NAMESPACE); This will create a mount namespace where "wootwoot" has become the rootfs mounted on top of the real rootfs. The caller can now setns() into this new mount namespace and assemble additional mounts. This also works with user namespaces: unshare(CLONE_NEWUSER); fd_mntns = open_tree(-EBADF, "/var/lib/containers/wootwoot", OPEN_TREE_NAMESPACE); which creates a new mount namespace owned by the earlier created user namespace with "wootwoot" as the rootfs mounted on top of the real rootfs. This will scale a lot better when creating tons of mount namespaces and will allow to get rid of a lot of unnecessary mount and umount cycles. It also allows to create mount namespaces without needing to spawn throwaway helper processes. * patches from https://patch.msgid.link/20251229-work-empty-namespace-v1-0-bfb24c7b061f@kernel.org: selftests/open_tree: add OPEN_TREE_NAMESPACE tests mount: add OPEN_TREE_NAMESPACE Link: https://patch.msgid.link/20251229-work-empty-namespace-v1-0-bfb24c7b061f@kernel.org Signed-off-by: Christian Brauner --- 1bce1a664ac25d37a327c433a01bc347f0a81bd6