git.ipfire.org Git - thirdparty/systemd.git/commit

nspawn: make sure host root can write to the uidmapped mounts we prepare for the container payload

When using user namespaces in conjunction with uidmapped mounts, nspawn
so far set up two uidmappings:

1. One that is used for the uidmapped mount and that maps the UID range
   0…65535 on the backing fs to some high UID range X…X+65535 on the
   uidmapped fs. (Let's call this mapping the "mount mapping")

2. One that is used for the user namespace the container payload
   processes run in, that maps X…X+65535 back to 0…65535. (Let's call
   this one the "process mapping").

These mappings hence are pretty much identical, one just moves things up
and one back down. (Reminder: we do all this so that the processes can
run under high UIDs while running off file systems that require no
recursive chown()ing, i.e. we want processes with high UID range but
files with low UID range.)
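
To illustrate the two mappings, here is a minimal Python sketch. It only
models the arithmetic, not the kernel interfaces. The base X = 524288,
the tuple layout (source start, destination start, count) and the
translate() helper are assumptions made up for this example, not names
taken from the nspawn code.

```python
X = 524288       # hypothetical base of the high UID range, chosen for illustration
RANGE = 65536    # number of UIDs covered by 0…65535


def translate(mapping, uid):
    """Translate uid through (src_start, dst_start, count) entries.

    Returns None if no entry covers the UID, i.e. the UID is unmapped.
    """
    for src, dst, count in mapping:
        if src <= uid < src + count:
            return dst + (uid - src)
    return None


# Mount mapping: UID on the backing fs -> UID on the uidmapped fs
mount_mapping = [(0, X, RANGE)]

# Process mapping: host UID -> UID as seen by the container payload
process_mapping = [(X, 0, RANGE)]

# A file owned by UID 1000 on disk is moved up by the mount mapping
# and moved back down by the process mapping.
on_mount = translate(mount_mapping, 1000)
in_container = translate(process_mapping, on_mount)
```

In this model a file with low UID 1000 on the backing fs appears as
X+1000 on the uidmapped fs and as 1000 again inside the container, so
no chown() of the files on disk is involved.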

This creates one problem, i.e. issue #20989: if nspawn (which runs as
host root, i.e. host UID 0) wants to add inodes to the uidmapped mount
it can't do that, since host UID 0 is not defined in the mount mapping
(only the X…X+65535 range is, after all, and X > 0). Processes whose
UID is not mapped in a uidmapped fs cannot create inodes in it, since
those would be owned by an unmapped UID, which then triggers the famous
EOVERFLOW error.
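
The failure can be modelled with a small Python sketch. This is only a
simulation of the lookup, under the assumption that creating an inode
requires translating the caller's UID back to a backing fs UID. The
base X = 524288 and the backing_owner() helper are hypothetical and
exist only for this example.

```python
import errno

X = 524288       # hypothetical base of the high UID range
RANGE = 65536

# Mount mapping as (backing fs start, uidmapped fs start, count)
mount_mapping = [(0, X, RANGE)]


def backing_owner(mapping, caller_uid):
    """Return the backing fs UID a new inode would be owned by.

    Raises OSError(EOVERFLOW) if the caller's UID has no entry in the
    mount mapping, mirroring the error described above.
    """
    for backing, mapped, count in mapping:
        if mapped <= caller_uid < mapped + count:
            return backing + (caller_uid - mapped)
    raise OSError(errno.EOVERFLOW, "caller UID is not mapped in the uidmapped mount")


# A payload process running as host UID X can create inodes (owned by 0 on disk).
payload_owner = backing_owner(mount_mapping, X)

# Host root (UID 0) is outside X…X+65535, so the lookup fails.
try:
    backing_owner(mount_mapping, 0)
    host_root_error = None
except OSError as e:
    host_root_error = e.errno
```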

Let's fix this, by explicitly including an entry for the host UID 0 in
the mount mapping. Specifically, we'll extend the mount mapping to map
UID 2147483646 (which is INT32_MAX-1, see code for an explanation why I
picked this one) of the backing fs to UID 0 on the uidmapped fs. This
way nspawn can create inodes on the uidmapped fs as it likes (which will
then actually be owned by UID 2147483646 on the backing fs), as it
always did. Note that we do *not* create a similar entry in the process
mapping. Thus any files created by nspawn that way (and not chown()ed to
something better) will appear as unmapped (i.e. as overflowuid/"nobody")
in the container payload. And that's good. Of course, the latter is
mostly theoretical, as nspawn should generally chown() the inodes it
creates to UID ranges that actually make sense for the container (and we
generally already do this correctly), but it's good to know that we are
safe here, given we might accidentally forget to chown() some inodes we
create.
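
The effect of the extra entry can be sketched in Python as follows.
Again this only models the arithmetic. The base X = 524288, the
constant name BACKING_ROOT_UID, the helper names and the overflowuid
value 65534 (the usual kernel default) are assumptions made for this
example, not taken from the patch.

```python
X = 524288                     # hypothetical base of the high UID range
RANGE = 65536
BACKING_ROOT_UID = 2147483646  # INT32_MAX-1, as described above (name is illustrative)
OVERFLOW_UID = 65534           # assumed default overflowuid ("nobody")

# Mount mapping, now with one extra entry for host UID 0:
# (backing fs start, uidmapped fs start, count)
mount_mapping = [(0, X, RANGE), (BACKING_ROOT_UID, 0, 1)]

# Process mapping is unchanged: (host start, container start, count)
process_mapping = [(X, 0, RANGE)]


def to_backing(caller_uid):
    """Backing fs owner of an inode created by caller_uid, or None if unmapped."""
    for backing, mapped, count in mount_mapping:
        if mapped <= caller_uid < mapped + count:
            return backing + (caller_uid - mapped)
    return None


def seen_in_container(host_uid):
    """UID the payload sees for an inode owned by host_uid on the uidmapped fs."""
    for host, inner, count in process_mapping:
        if host <= host_uid < host + count:
            return inner + (host_uid - host)
    return OVERFLOW_UID


# nspawn (host UID 0) creates an inode and forgets to chown() it:
created_as_root = to_backing(0)
view_unchowned = seen_in_container(0)

# nspawn chown()s the inode to X, the container's root:
after_chown = (to_backing(X), seen_in_container(X))
```

In this model an inode created by host root lands on disk as UID
2147483646 and shows up as overflowuid in the payload, while an inode
chown()ed to X is stored as UID 0 and seen as UID 0 by the payload.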

Net effect: the two mappings will not be identical anymore. The mount
mapping has one entry more, and the only reason it exists is so that
nspawn can access the uidmapped fs reasonably independently from any
process mapping.

Fixes: #20989

author	Lennart Poettering <lennart@poettering.net>
	Thu, 17 Mar 2022 12:46:12 +0000 (13:46 +0100)
committer	Lennart Poettering <lennart@poettering.net>
	Thu, 17 Mar 2022 18:08:12 +0000 (19:08 +0100)
commit	50ae2966d20b0b4a19def060de3b966b7a70b54a
tree	d0c072dfc682f5d2e39439d8b664c76a359eba37
parent	264caae299aa8f42f20460ad3280add657a3747f

src/basic/user-util.h
src/nspawn/nspawn-mount.c
src/nspawn/nspawn.c
src/shared/dissect-image.c
src/shared/mount-util.c
src/shared/mount-util.h