# Users, Groups, UIDs and GIDs on `systemd` systems

Here's a summary of the requirements `systemd` (and Linux) place on UID/GID
assignments and their ranges.

Note that while in theory UIDs and GIDs are orthogonal concepts, in practice
they really aren't. With that in mind, when we discuss UIDs below it should be
assumed that whatever we say about UIDs applies to GIDs in mostly the same way,
and all the special assignments and ranges for UIDs have mostly the same
validity for GIDs too.

## Special Linux UIDs

In theory, the C type `uid_t` is 32bit wide on Linux, i.e. its range is
0…4294967295. However, four UIDs are special on Linux:

1. 0 → The `root` super-user

2. 65534 → The `nobody` UID, also called the "overflow" UID or similar. It's
   where various subsystems map unmappable users to, for example NFS or user
   namespacing. (The latter can be changed with a sysctl during runtime, but
   that's not supported on `systemd`. If you do change it you void your
   warranty.) Because Fedora is a bit confused the `nobody` user is called
   `nfsnobody` there (and they have a different `nobody` user at UID 99). I
   hope this will be corrected eventually though. (Also, some distributions
   call the `nobody` group `nogroup`. I wish they didn't.)

3. 4294967295, aka "32bit `(uid_t) -1`" → This UID is not a valid user ID, as
   `setresuid()`, `chown()` and friends treat -1 as a special request to not
   change the UID of the process/file. This UID is hence not available for
   assignment to users in the user database.

4. 65535, aka "16bit `(uid_t) -1`" → Once upon a time `uid_t` used to be 16bit,
   and programs compiled for that would hence assume that `(uid_t) -1` is
   65535. This UID is hence not usable either.

The `nss-systemd` glibc NSS module will synthesize user database records for
the UIDs 0 and 65534 if the system user database doesn't list them. This means
that any system where this module is enabled works to some minimal level
without `/etc/passwd`.
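
The four special UIDs listed above translate into a very small validity check.
The following is only a minimal sketch in Python; the constant and function
names are made up for illustration and are not part of any `systemd` API:

```python
# Hypothetical names, chosen for this sketch only.
UID_ROOT = 0
UID_NOBODY = 65534
UID_INVALID_16 = 0xFFFF        # 65535, the 16bit (uid_t) -1
UID_INVALID_32 = 0xFFFFFFFF    # 4294967295, the 32bit (uid_t) -1


def uid_is_valid(uid: int) -> bool:
    """Return True if uid fits in 32 bits and is neither (uid_t) -1 value."""
    if not 0 <= uid <= 0xFFFFFFFF:
        return False
    return uid not in (UID_INVALID_16, UID_INVALID_32)
```

With this, `uid_is_valid(65534)` is true (the `nobody` user is a perfectly
valid UID), while `uid_is_valid(65535)` and `uid_is_valid(4294967295)` are
both false.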

## Special Distribution UID ranges

Distributions generally split the available UID range in two:

1. 1…999 → System users. These are users that do not map to actual "human"
   users, but are used as security identities for system daemons, to implement
   privilege separation and run system daemons with minimal privileges.

2. 1000…65533 and 65536…4294967294 → Everything else, i.e. regular (human) users.

Note that most distributions allow changing the boundary between system and
regular users, even during runtime as user configuration. Moreover, some older
systems placed the boundary at 499/500, or even 99/100. In `systemd`, the
boundary is configurable only at compile time, as this should be a decision
for distribution builders, not for users. Moreover, we strongly discourage
downstreams from changing the boundary from the upstream default of 999/1000.

Also note that programs such as `adduser` tend to allocate from a subset of the
available regular user range only, usually 1000…60000. This subset is usually
user-configurable, too.

Note that `systemd` requires that system users and groups are resolvable
without networking available, a requirement that is not made for regular
users. This means regular users may be stored in remote LDAP or NIS databases,
but system users may not (except when a consistent local cache is kept that is
available during earliest boot, including in the initial RAM disk).

## Special `systemd` GIDs

`systemd` defines no special UIDs beyond what Linux already defines (see
above). However, it does define some special group/GID assignments, which are
primarily used for `systemd-udevd`'s device management. The precise list of the
currently defined groups is found in this `sysusers.d` snippet:
[basic.conf](https://raw.githubusercontent.com/systemd/systemd/master/sysusers.d/basic.conf.in)

It's strongly recommended that downstream distributions include these groups in
their default group databases.

Note that the actual GID numbers assigned to these groups do not have to be
constant beyond a specific system. There's one exception however: the `tty`
group must have the GID 5. That's because it must be encoded in the `devpts`
mount parameters during earliest boot, at a time when NSS lookups are not
possible. (Note that the actual GID can be changed during `systemd` build time,
but downstreams are strongly advised against doing that.)

## Special `systemd` UID ranges

`systemd` defines a number of special UID ranges:

1. 61184…65519 → UIDs for dynamic users are allocated from this range (see the
   `DynamicUser=` documentation in
   [`systemd.exec(5)`](https://www.freedesktop.org/software/systemd/man/systemd.exec.html)). This
   range has been chosen so that it is below the 16bit boundary (i.e. below
   65535), in order to provide compatibility with container environments that
   assign a 64K range of UIDs to containers using user namespacing. This range
   is above the 60000 boundary, so that its allocations are unlikely to be
   affected by `adduser` allocations (see above). And we leave some room
   upwards for other purposes. (And if you wonder why precisely these numbers:
   if you write them in hexadecimal, they might make more sense: 0xEF00 and
   0xFFEF). The `nss-systemd` module will synthesize user records implicitly
   for all currently allocated dynamic users from this range. Thus, NSS-based
   user record resolving works correctly without those users being in
   `/etc/passwd`.

2. 524288…1879048191 → UID range for `systemd-nspawn`'s automatic allocation of
   per-container UID ranges. When the `--private-users=pick` switch is used (or
   `-U`) then it will automatically find a so far unused 16bit subrange of this
   range and assign it to the container. The range is picked so that the upper
   16bit of the 32bit UIDs are constant for all users of the container, while
   the lower 16bit directly encode the 65536 UIDs assigned to the
   container. This mode of allocation means that the upper 16bit of any UID
   assigned to a container are kind of a "container ID", while the lower 16bit
   directly expose the container's own UID numbers. If you wonder why precisely
   these numbers, consider them in hexadecimal: 0x00080000…0x6FFFFFFF. This
   range is above the 16bit boundary. Moreover it's below the 31bit boundary,
   as some broken code (specifically: the kernel's `devpts` file system)
   erroneously considers UIDs signed integers, and hence can't deal with values
   of 2^31 and above. The `nss-mymachines` glibc NSS module will synthesize
   user database records for all UIDs assigned to a running container from this
   range.

Note for both allocation ranges: when a UID allocation takes place NSS is
checked for collisions first, and a different UID is picked if an entry is
found. Thus, the user database is used as synchronization mechanism to ensure
exclusive ownership of UIDs and UID ranges. To ensure compatibility with other
subsystems allocating from the same ranges it is hence essential that they
ensure that whatever they pick shows up in the user/group databases, either by
providing an NSS module, or by adding entries directly to `/etc/passwd` and
`/etc/group`. Do note that, for performance reasons, `systemd-nspawn` will only
do an NSS check for the first UID of the range it allocates, not all 65536 of
them. Also note that while the allocation logic is operating, the glibc
`lckpwdf()` user database lock is taken, in order to make this logic race-free.
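
To make the hexadecimal reasoning behind the two ranges above concrete, here is
a small Python sketch. The constant and function names are invented for this
illustration, and the enumeration of candidate bases assumes (as described
above) that every container range starts with the lower 16bit set to zero; it
is not a copy of what `systemd-nspawn` actually does internally.

```python
# Hypothetical names; the values are the ones quoted in the text above.
DYNAMIC_UID_MIN, DYNAMIC_UID_MAX = 0xEF00, 0xFFEF
CONTAINER_UID_MIN, CONTAINER_UID_MAX = 0x00080000, 0x6FFFFFFF
CONTAINER_RANGE_SIZE = 0x10000  # 65536 UIDs per container


def container_bases() -> range:
    """Return every 64K-aligned base UID inside the container range."""
    return range(CONTAINER_UID_MIN, CONTAINER_UID_MAX + 1, CONTAINER_RANGE_SIZE)


if __name__ == "__main__":
    print(DYNAMIC_UID_MIN, DYNAMIC_UID_MAX)      # decimal form of 0xEF00, 0xFFEF
    bases = container_bases()
    print(hex(bases[0]), hex(bases[-1]), len(bases))
```

Every value returned by `container_bases()` has its lower 16bit cleared, so the
upper 16bit can act as the "container ID" mentioned above.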

## Figuring out the system's UID boundaries

The most important boundaries of the local system may be queried with
`pkg-config`:

```
$ pkg-config --variable=systemuidmax systemd
999
$ pkg-config --variable=dynamicuidmin systemd
61184
$ pkg-config --variable=dynamicuidmax systemd
65519
$ pkg-config --variable=containeruidbasemin systemd
524288
$ pkg-config --variable=containeruidbasemax systemd
1878982656
```

(Note that the last value is the maximum UID *base* `systemd-nspawn` might
pick. Given that 64K UIDs are assigned to each container according to this
allocation logic, the maximum UID used for this range is hence
1878982656+65535=1879048191.)

Note that `systemd` does not make any of these values runtime-configurable. All
these boundaries are chosen during build time. That said, the system UID/GID
boundary is traditionally configured in `/etc/login.defs`, though `systemd`
won't look there during runtime.
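
If you want to read these values from a program rather than a shell, the
`pkg-config` calls shown above can be wrapped as follows. This is just a sketch
in Python: the helper names are hypothetical, the `run` parameter exists only
so the call can be replaced in tests, and the code assumes `pkg-config` and the
`systemd` `.pc` file are installed.

```python
import subprocess


def systemd_pkgconfig_int(variable, run=subprocess.run):
    """Query one integer pkg-config variable of the systemd package."""
    result = run(
        ["pkg-config", f"--variable={variable}", "systemd"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pkg-config fails
    )
    # int() raises ValueError if the variable is unknown and the output is empty.
    return int(result.stdout.strip())


def container_uid_max(base_max, range_size=65536):
    """Highest UID covered when a full range starts at the maximum base."""
    return base_max + range_size - 1


if __name__ == "__main__":
    base_max = systemd_pkgconfig_int("containeruidbasemax")
    print(container_uid_max(base_max))
```

With the `containeruidbasemax` value shown above, `container_uid_max()` yields
1879048191, matching the arithmetic in the note.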

## Considerations for container managers

If you hack on a container manager, and wonder how and how many UIDs best to
assign to your containers, here are a few recommendations:

1. Definitely, don't assign fewer than 65536 UIDs/GIDs. After all, the `nobody`
   user has magic properties and hence should be available in your container,
   and given that it's assigned the UID 65534, you should really cover the full
   16bit range in your container. Note that `systemd` will, as mentioned,
   synthesize user records for the `nobody` user, and assumes its availability
   in various other parts of its codebase, too. Hence assigning fewer users
   means you lose compatibility with running `systemd` code inside your
   container. And most likely other packages impose similar requirements.

2. While it's fine to assign more than 65536 UIDs/GIDs to a container, there's
   most likely not much value in doing so, as Linux distributions won't use the
   higher ranges by default (as mentioned, neither `adduser` nor `systemd`'s
   dynamic user concept allocates from above the 16bit range). Unless you
   actively care about nested containers, it's hence probably a good idea to
   allocate exactly 65536 UIDs per container, neither fewer nor more. A pretty
   side-effect is that by doing so, you expose the same number of UIDs per
   container as Linux 2.2 supported for the whole system, back in the day.

3. Consider allocating UID ranges for containers so that the first UID you
   assign has the lower 16bit all set to zero. That way, the upper 16bit become
   a container ID of some kind, while the lower 16bit directly encode the
   internal container UID. This is the way `systemd-nspawn` allocates UID ranges
   (see above). Following this allocation logic ensures best compatibility with
   `systemd-nspawn` and all other container managers following the scheme, as it
   is then sufficient to check NSS for conflicts on the first UID you pick only,
   as that's what they do, too. Moreover, it makes `chown()`ing container file
   system trees nicely robust to interruptions: as the external UID encodes the
   internal UID in a fixed way, it's very easy to adjust the container's base
   UID without the need to know the original base UID: to change the container
   base, just mask away the upper 16bit, and insert the upper 16bit of the new
   container base instead. Here are the easy conversions to derive the internal
   UID, the external UID, and the container base UID from each other:

   ```
   INTERNAL_UID = EXTERNAL_UID & 0x0000FFFF
   CONTAINER_BASE_UID = EXTERNAL_UID & 0xFFFF0000
   EXTERNAL_UID = INTERNAL_UID | CONTAINER_BASE_UID
   ```

4. When picking a UID range for containers, make sure to check NSS first, with
   a simple `getpwuid()` call: if there's already a user record for the first
   UID you want to pick, then it's already in use: pick a different one. Wrap
   that call in a `lckpwdf()` + `ulckpwdf()` pair, to make allocation
   race-free. Provide an NSS module that makes all UIDs you end up taking show
   up in the user database, and make sure that the NSS module returns up-to-date
   information before you release the lock, so that other system components can
   safely use the NSS user database as allocation check, too. Note that if you
   follow this scheme no changes to `/etc/passwd` need to be made, thus
   minimizing the artifacts the container manager persistently leaves in the
   system.
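
Recommendations 3 and 4 above can be sketched in a few lines of Python. All
function names below are hypothetical and exist only for this illustration.
The NSS check uses the standard `pwd.getpwuid()`, which raises `KeyError` when
no record exists. Note that this sketch deliberately leaves out the
`lckpwdf()` + `ulckpwdf()` locking, since the Python standard library offers no
wrapper for it; a real container manager would have to hold that lock around
the check and the allocation.

```python
import pwd


def to_internal_uid(external_uid):
    """Lower 16bit: the UID as seen inside the container."""
    return external_uid & 0x0000FFFF


def to_container_base(external_uid):
    """Upper 16bit: the container's base UID."""
    return external_uid & 0xFFFF0000


def to_external_uid(internal_uid, container_base):
    """Combine an internal UID with a container base."""
    return internal_uid | container_base


def rebase_uid(external_uid, new_base):
    """Move an external UID to a new base without knowing the old base."""
    return to_external_uid(to_internal_uid(external_uid), to_container_base(new_base))


def uid_in_use(uid, lookup=pwd.getpwuid):
    """Return True if the user database already has a record for uid."""
    try:
        lookup(uid)
    except KeyError:
        return False
    return True


def pick_container_base(candidates, in_use=uid_in_use):
    """Return the first 64K-aligned candidate base with no user record, or None."""
    for base in candidates:
        if base & 0xFFFF == 0 and not in_use(base):
            return base
    return None
```

For example, `rebase_uid(0x00081234, 0x00090000)` gives `0x00091234`: the
internal UID `0x1234` is kept and only the container base changes, which is
what makes an interrupted recursive `chown()` safe to restart.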

## Summary

| UID/GID               | Purpose               | Defined By    | Listed in                     |
|-----------------------|-----------------------|---------------|-------------------------------|
| 0                     | `root` user           | Linux         | `/etc/passwd` + `nss-systemd` |
| 1…4                   | System users          | Distributions | `/etc/passwd`                 |
| 5                     | `tty` group           | `systemd`     | `/etc/passwd`                 |
| 6…999                 | System users          | Distributions | `/etc/passwd`                 |
| 1000…60000            | Regular users         | Distributions | `/etc/passwd` + LDAP/NIS/…    |
| 60001…61183           | Unused                |               |                               |
| 61184…65519           | Dynamic service users | `systemd`     | `nss-systemd`                 |
| 65520…65533           | Unused                |               |                               |
| 65534                 | `nobody` user         | Linux         | `/etc/passwd` + `nss-systemd` |
| 65535                 | 16bit `(uid_t) -1`    | Linux         |                               |
| 65536…524287          | Unused                |               |                               |
| 524288…1879048191     | Container UID ranges  | `systemd`     | `nss-mymachines`              |
| 1879048192…4294967294 | Unused                |               |                               |
| 4294967295            | 32bit `(uid_t) -1`    | Linux         |                               |

Note that "Unused" in the table above doesn't mean that these ranges are
really unused. It just means that these ranges have no well-established
pre-defined purposes between Linux, generic low-level distributions and
`systemd`. There might very well be other packages that allocate from these
ranges.
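
For quick lookups, the summary table can be mirrored as a small data structure.
The following Python sketch simply transcribes the rows above (using the
upstream default boundaries, which a given distribution may have changed); the
names `UID_RANGES` and `classify_uid` are made up for this example.

```python
# (first, last, purpose) rows transcribed from the summary table above.
UID_RANGES = [
    (0, 0, "root user"),
    (1, 4, "System users"),
    (5, 5, "tty group"),
    (6, 999, "System users"),
    (1000, 60000, "Regular users"),
    (60001, 61183, "Unused"),
    (61184, 65519, "Dynamic service users"),
    (65520, 65533, "Unused"),
    (65534, 65534, "nobody user"),
    (65535, 65535, "16bit (uid_t) -1"),
    (65536, 524287, "Unused"),
    (524288, 1879048191, "Container UID ranges"),
    (1879048192, 4294967294, "Unused"),
    (4294967295, 4294967295, "32bit (uid_t) -1"),
]


def classify_uid(uid):
    """Return the purpose the summary table assigns to uid."""
    for first, last, purpose in UID_RANGES:
        if first <= uid <= last:
            return purpose
    raise ValueError(f"{uid} is outside the 32bit UID range")
```

For instance, `classify_uid(61500)` returns `"Dynamic service users"` and
`classify_uid(0x00081234)` returns `"Container UID ranges"`.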