The step from "PID namespace membership is read-only after process
creation" to "process trees loosely mirror PID namespaces" is not
trivial to follow. Part of the reasoning is left unstated: since PID
namespace membership is read-only, one has to use fork() or one of its
variants to "change" PID namespace, and these APIs need to return a
valid child PID in the caller's namespace. The consequence (setns()
fails on PID namespaces that are not descendants of the caller's) can
also be made explicit while explaining how it is enforced.

Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Message-ID: <20260513083339.27911-2-matthieu@buffet.re>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
To put things another way:
a process's PID namespace membership is determined when the process is created
and cannot be changed thereafter.
-Among other things,
-this means that
+.P
+Because of this,
+and because system calls to create a process
+in another namespace
+need to return a meaningful new PID
+in the namespace of their caller,
the parental relationship between processes
loosely mirrors
the parental relationship between PID namespaces:
the parent of a process
is either in the same namespace
or resides in an ancestor PID namespace
(immediate parent or not).
+This is enforced by the design of
+.BR clone (2)
+and
+.BR unshare (2),
+while
+.BR setns (2)
+only accepts
+the caller's PID namespace
+and its descendants.
.P
A process may call
.BR unshare (2)