---
title: Users, Groups, UIDs and GIDs on systemd Systems
category: Concepts
layout: default
---

# Users, Groups, UIDs and GIDs on systemd Systems

Here's a summary of the requirements `systemd` (and Linux) make on UID/GID
assignments and their ranges.

Note that while in theory UIDs and GIDs are orthogonal concepts, they really
aren't in practice. With that in mind, when we discuss UIDs below it should be
assumed that whatever we say about UIDs applies to GIDs in mostly the same way,
and that all the special assignments and ranges for UIDs have mostly the same
validity for GIDs too.

## Special Linux UIDs

In theory, the range of the C type `uid_t` is 32bit wide on Linux,
i.e. 0…4294967295. However, four UIDs are special on Linux:

1. 0 → The `root` super-user

2. 65534 → The `nobody` UID, also called the "overflow" UID or similar. It's
   where various subsystems map unmappable users to, for example file systems
   only supporting 16bit UIDs, NFS or user namespacing. (The latter can be
   changed with a sysctl during runtime, but that's not supported on
   `systemd`. If you do change it you void your warranty.) Because Fedora is a
   bit confused the `nobody` user is called `nfsnobody` there (and they have a
   different `nobody` user at UID 99). I hope this will be corrected eventually
   though. (Also, some distributions call the `nobody` group `nogroup`. I wish
   they didn't.)

3. 4294967295, aka "32bit `(uid_t) -1`" → This UID is not a valid user ID, as
   `setresuid()`, `chown()` and friends treat -1 as a special request to not
   change the UID of the process/file. This UID is hence not available for
   assignment to users in the user database.

4. 65535, aka "16bit `(uid_t) -1`" → Before Linux kernel 2.4 `uid_t` used to be
   16bit, and programs compiled for that would hence assume that `(uid_t) -1`
   is 65535. This UID is hence not usable either.
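
The two reserved "-1" values are simply what -1 wraps to at each integer width. Here is a minimal Python sketch of that arithmetic and of the resulting validity check. The constant and function names are illustrative assumptions, not an existing API.

```python
# Illustrative names only; these are assumptions, not a systemd or libc API.
UID_INVALID_16 = (-1) & 0xFFFF        # (uid_t) -1 as seen with a 16bit uid_t
UID_INVALID_32 = (-1) & 0xFFFFFFFF    # (uid_t) -1 with the 32bit uid_t
UID_ROOT = 0
UID_NOBODY = 65534


def uid_is_valid(uid: int) -> bool:
    """Return True if uid may be assigned to a user in the user database."""
    if not 0 <= uid <= 0xFFFFFFFF:
        return False  # does not fit into a 32bit uid_t at all
    return uid not in (UID_INVALID_16, UID_INVALID_32)


print(UID_INVALID_16, UID_INVALID_32)
print(uid_is_valid(UID_NOBODY), uid_is_valid(UID_INVALID_16))
```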

The `nss-systemd` glibc NSS module will synthesize user database records for
the UIDs 0 and 65534 if the system user database doesn't list them. This means
that any system where this module is enabled works to some minimal level
without `/etc/passwd`.

## Special Distribution UID ranges

Distributions generally split the available UID range in two:

1. 1…999 → System users. These are users that do not map to actual "human"
   users, but are used as security identities for system daemons, to implement
   privilege separation and run system daemons with minimal privileges.

2. 1000…65533 and 65536…4294967294 → Everything else, i.e. regular (human) users.

Note that most distributions allow changing the boundary between system and
regular users, even during runtime as user configuration. Moreover, some older
systems placed the boundary at 499/500, or even 99/100. In `systemd`, the
boundary is configurable only at compile time, as this should be a decision
for distribution builders, not for users. Moreover, we strongly discourage
downstreams from changing the boundary from the upstream default of 999/1000.

Also note that programs such as `adduser` tend to allocate from a subset of the
available regular user range only, usually 1000…60000. That subset is usually
user-configurable, too.

Note that systemd requires that system users and groups are resolvable without
networking available, a requirement that is not made for regular users. This
means regular users may be stored in remote LDAP or NIS databases, but system
users may not (except when a consistent local cache is kept that is available
during earliest boot, including in the initial RAM disk).

## Special `systemd` GIDs

`systemd` defines no special UIDs beyond what Linux already defines (see
above). However, it does define some special group/GID assignments, which are
primarily used for `systemd-udevd`'s device management. The precise list of the
currently defined groups is found in this `sysusers.d` snippet:
[basic.conf](https://raw.githubusercontent.com/systemd/systemd/master/sysusers.d/basic.conf.in)

It's strongly recommended that downstream distributions include these groups in
their default group databases.

Note that the actual GID numbers assigned to these groups do not have to be
constant beyond a specific system. There's one exception however: the `tty`
group must have the GID 5. That's because it must be encoded in the `devpts`
mount parameters during earliest boot, at a time when NSS lookups are not
possible. (Note that the actual GID can be changed during `systemd` build time,
but downstreams are strongly advised against doing that.)

## Special `systemd` UID ranges

`systemd` defines a number of special UID ranges:

1. 60001…60513 → UIDs for home directories managed by
   [`systemd-homed.service(8)`](https://www.freedesktop.org/software/systemd/man/systemd-homed.service.html). UIDs
   from this range are automatically assigned to any home directory discovered,
   and persisted locally on first login. On different systems the same user
   might get different UIDs assigned in case of conflict, though an attempt is
   made to keep UID assignments stable by deriving them from a hash of the
   user name.

2. 61184…65519 → UIDs for dynamic users are allocated from this range (see the
   `DynamicUser=` documentation in
   [`systemd.exec(5)`](https://www.freedesktop.org/software/systemd/man/systemd.exec.html)). This
   range has been chosen so that it is below the 16bit boundary (i.e. below
   65535), in order to provide compatibility with container environments that
   assign a 64K range of UIDs to containers using user namespacing. This range
   is above the 60000 boundary, so that its allocations are unlikely to be
   affected by `adduser` allocations (see above). And we leave some room
   upwards for other purposes. (And if you wonder why precisely these numbers:
   if you write them in hexadecimal, they might make more sense: 0xEF00 and
   0xFFEF). The `nss-systemd` module will synthesize user records implicitly
   for all currently allocated dynamic users from this range. Thus, NSS-based
   user record resolving works correctly without those users being in
   `/etc/passwd`.

3. 524288…1879048191 → UID range for `systemd-nspawn`'s automatic allocation of
   per-container UID ranges. When the `--private-users=pick` switch is used (or
   `-U`) then it will automatically find a so far unused 16bit subrange of this
   range and assign it to the container. The range is picked so that the upper
   16bit of the 32bit UIDs are constant for all users of the container, while
   the lower 16bit directly encode the 65536 UIDs assigned to the
   container. This mode of allocation means that the upper 16bit of any UID
   assigned to a container are kind of a "container ID", while the lower 16bit
   directly expose the container's own UID numbers. If you wonder why precisely
   these numbers, consider them in hexadecimal: 0x00080000…0x6FFFFFFF. This
   range is above the 16bit boundary. Moreover it's below the 31bit boundary,
   as some broken code (specifically: the kernel's `devpts` file system)
   erroneously considers UIDs signed integers, and hence can't deal with values
   of 2^31 or above. The `nss-mymachines` glibc NSS module will synthesize user
   database records for all UIDs assigned to a running container from this
   range.
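
To make the "container ID in the upper 16bit" idea concrete, here is a small Python sketch that splits a host-side UID from the `systemd-nspawn` range into its container base and the UID seen inside the container. The function and constant names are made up for illustration; only the numeric boundaries come from the text above.

```python
# Boundaries taken from the range description above; names are illustrative.
CONTAINER_UID_MIN = 0x00080000   # 524288
CONTAINER_UID_MAX = 0x6FFFFFFF   # 1879048191


def split_container_uid(external_uid: int) -> tuple:
    """Split a host-side UID into (container base UID, UID inside container)."""
    if not CONTAINER_UID_MIN <= external_uid <= CONTAINER_UID_MAX:
        raise ValueError(f"{external_uid} is outside the container UID range")
    base = external_uid & 0xFFFF0000       # upper 16bit: the "container ID"
    internal = external_uid & 0x0000FFFF   # lower 16bit: UID in the container
    return base, internal


base, internal = split_container_uid(0x00080000 + 1000)
print(hex(base), internal)
```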

Note for both allocation ranges: when a UID allocation takes place NSS is
checked for collisions first, and a different UID is picked if an entry is
found. Thus, the user database is used as synchronization mechanism to ensure
exclusive ownership of UIDs and UID ranges. To ensure compatibility with other
subsystems allocating from the same ranges it is hence essential that they
ensure that whatever they pick shows up in the user/group databases, either by
providing an NSS module, or by adding entries directly to `/etc/passwd` and
`/etc/group`. Note that, for performance reasons, `systemd-nspawn` will only
do an NSS check for the first UID of the range it allocates, not all 65536 of
them. Also note that while the allocation logic is operating, the glibc
`lckpwdf()` user database lock is taken, in order to make this logic race-free.

## Figuring out the system's UID boundaries

The most important boundaries of the local system may be queried with
`pkg-config`:

```
$ pkg-config --variable=systemuidmax systemd
999
$ pkg-config --variable=dynamicuidmin systemd
61184
$ pkg-config --variable=dynamicuidmax systemd
65519
$ pkg-config --variable=containeruidbasemin systemd
524288
$ pkg-config --variable=containeruidbasemax systemd
1878982656
```

(Note that the latter encodes the maximum UID *base* `systemd-nspawn` might
pick. Given that 64K UIDs are assigned to each container according to this
allocation logic, the maximum UID used for this range is hence
1878982656+65535=1879048191.)
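
The arithmetic can be double-checked with a few lines of Python. The two base values are the defaults shown in the `pkg-config` output above; the variable names, and the count of possible bases derived from them, are only illustrative.

```python
# Default values as printed by pkg-config above; variable names are made up.
CONTAINER_UID_BASE_MIN = 524288
CONTAINER_UID_BASE_MAX = 1878982656
UIDS_PER_CONTAINER = 0x10000  # 64K UIDs per container

# Highest UID that can end up assigned to a container.
highest_uid = CONTAINER_UID_BASE_MAX + UIDS_PER_CONTAINER - 1

# Number of distinct 64K-aligned bases between the two boundaries, inclusive.
base_count = (CONTAINER_UID_BASE_MAX - CONTAINER_UID_BASE_MIN) // UIDS_PER_CONTAINER + 1

print(highest_uid, hex(highest_uid), base_count)
```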

Note that systemd does not make any of these values runtime-configurable. All
these boundaries are chosen during build time. That said, the system UID/GID
boundary is traditionally configured in `/etc/login.defs`, though systemd won't
look there during runtime.

## Considerations for container managers

If you hack on a container manager, and wonder how and how many UIDs best to
assign to your containers, here are a few recommendations:

1. Definitely, don't assign fewer than 65536 UIDs/GIDs. After all the `nobody`
user has magic properties, and hence should be available in your container, and
given that it's assigned the UID 65534, you should really cover the full 16bit
range in your container. Note that systemd will, as mentioned, synthesize
user records for the `nobody` user, and assumes its availability in various
other parts of its codebase, too, hence assigning fewer users means you lose
compatibility with running systemd code inside your container. And most likely
other packages make similar restrictions.

2. While it's fine to assign more than 65536 UIDs/GIDs to a container, there's
most likely not much value in doing so, as Linux distributions won't use the
higher ranges by default (as mentioned neither `adduser` nor `systemd`'s
dynamic user concept allocate from above the 16bit range). Unless you actively
care for nested containers, it's hence probably a good idea to allocate exactly
65536 UIDs per container, and neither fewer nor more. A pretty side-effect is
that by doing so, you expose the same number of UIDs per container as Linux 2.2
supported for the whole system, back in the day.

3. Consider allocating UID ranges for containers so that the first UID you
assign has the lower 16bits all set to zero. That way, the upper 16bits become
a container ID of some kind, while the lower 16bits directly encode the
internal container UID. This is the way `systemd-nspawn` allocates UID ranges
(see above). Following this allocation logic ensures best compatibility with
`systemd-nspawn` and all other container managers following the scheme, as it
is sufficient then to check NSS for the first UID you pick regarding conflicts,
as that's what they do, too. Moreover, it makes `chown()`ing container file
system trees nicely robust to interruptions: as the external UID encodes the
internal UID in a fixed way, it's very easy to adjust the container's base UID
without the need to know the original base UID: to change the container base,
just mask away the upper 16bit, and insert the upper 16bit of the new container
base instead. Here are the easy conversions to derive the internal UID, the
external UID, and the container base UID from each other:

```
INTERNAL_UID = EXTERNAL_UID & 0x0000FFFF
CONTAINER_BASE_UID = EXTERNAL_UID & 0xFFFF0000
EXTERNAL_UID = INTERNAL_UID | CONTAINER_BASE_UID
```

4. When picking a UID range for containers, make sure to check NSS first, with
a simple `getpwuid()` call: if there's already a user record for the first UID
you want to pick, then it's already in use: pick a different one. Wrap that
call in a `lckpwdf()` + `ulckpwdf()` pair, to make allocation
race-free. Provide an NSS module that makes all UIDs you end up taking show up
in the user database, and make sure that the NSS module returns up-to-date
information before you release the lock, so that other system components can
safely use the NSS user database as allocation check, too. Note that if you
follow this scheme no changes to `/etc/passwd` need to be made, thus minimizing
the artifacts the container manager persistently leaves in the system.
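
Recommendations 3 and 4 can be sketched together in Python. This is only an outline under stated assumptions: every name below is hypothetical, the candidate range reuses the `systemd-nspawn` boundaries from earlier in this document, and the `lckpwdf()`/`ulckpwdf()` locking is left out because the Python standard library does not wrap those calls. A real container manager would hold that lock around the lookup and register the picked range through NSS before releasing it.

```python
import pwd

# Hypothetical constants, reusing the systemd-nspawn boundaries given earlier.
BASE_MIN = 0x00080000
BASE_MAX = 0x6FFF0000
STEP = 0x10000  # 65536 UIDs per container


def uid_in_use(uid):
    """Return True if NSS already has a user record for uid."""
    try:
        pwd.getpwuid(uid)
    except KeyError:
        return False
    return True


def pick_container_base(in_use=uid_in_use, candidates=None):
    """Return the first 64K-aligned base whose first UID is unknown to NSS.

    NOTE: no lckpwdf() lock is taken here, so this sketch is not race-free.
    """
    if candidates is None:
        candidates = range(BASE_MIN, BASE_MAX + STEP, STEP)
    for base in candidates:
        if base & 0xFFFF:
            raise ValueError("container base must have the lower 16bit zeroed")
        if not in_use(base):
            return base
    return None  # every candidate is already taken


def rebase_uid(external_uid, new_base):
    """Move a host-side UID to a new container base, keeping the internal UID."""
    return (external_uid & 0x0000FFFF) | (new_base & 0xFFFF0000)


# Usage with a fake "user database" so that the result is deterministic.
taken = {0x00080000, 0x00090000}
print(hex(pick_container_base(in_use=taken.__contains__)))
print(hex(rebase_uid(0x00080000 + 1000, 0x00200000)))
```

Passing the lookup as a parameter keeps the picking logic testable without touching the real user database; by default it falls back to `pwd.getpwuid()`.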

## Summary

| UID/GID               | Purpose               | Defined By    | Listed in                     |
|-----------------------|-----------------------|---------------|-------------------------------|
| 0                     | `root` user           | Linux         | `/etc/passwd` + `nss-systemd` |
| 1…4                   | System users          | Distributions | `/etc/passwd`                 |
| 5                     | `tty` group           | `systemd`     | `/etc/group`                  |
| 6…999                 | System users          | Distributions | `/etc/passwd`                 |
| 1000…60000            | Regular users         | Distributions | `/etc/passwd` + LDAP/NIS/…    |
| 60001…60513           | Human Users (homed)   | `systemd`     | `nss-systemd`                 |
| 60514…61183           | Unused                |               |                               |
| 61184…65519           | Dynamic service users | `systemd`     | `nss-systemd`                 |
| 65520…65533           | Unused                |               |                               |
| 65534                 | `nobody` user         | Linux         | `/etc/passwd` + `nss-systemd` |
| 65535                 | 16bit `(uid_t) -1`    | Linux         |                               |
| 65536…524287          | Unused                |               |                               |
| 524288…1879048191     | Container UID ranges  | `systemd`     | `nss-mymachines`              |
| 1879048192…2147483647 | Unused                |               |                               |
| 2147483648…4294967294 | HIC SVNT LEONES       |               |                               |
| 4294967295            | 32bit `(uid_t) -1`    | Linux         |                               |

Note that "Unused" in the table above doesn't mean that these ranges are
really unused. It just means that these ranges have no well-established
pre-defined purposes between Linux, generic low-level distributions and
`systemd`. There might very well be other packages that allocate from these
ranges.

Note that the range 2147483648…4294967294 (i.e. 2^31…2^32-2) should be handled
with care. Various programs (including kernel file systems, see `devpts`) have
trouble with UIDs outside of the signed 32bit range, i.e. any UIDs equal to or
above 2147483648. It is thus strongly recommended to stay away from this range
in order to avoid complications. This range should be considered reserved for
future, special purposes.
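
As a quick way to look up where a given number falls, the summary table can be encoded as a sorted list of range starts. The following Python sketch does that. The helper name is illustrative, the boundaries assume the upstream defaults listed in the table, and the entry for 5 refers to the `tty` GID.

```python
import bisect

# (first UID/GID of the range, purpose), copied from the summary table.
# Assumes upstream default boundaries; distributions may differ.
RANGES = [
    (0, "root user"),
    (1, "System users"),
    (5, "tty group"),
    (6, "System users"),
    (1000, "Regular users"),
    (60001, "Human Users (homed)"),
    (60514, "Unused"),
    (61184, "Dynamic service users"),
    (65520, "Unused"),
    (65534, "nobody user"),
    (65535, "16bit (uid_t) -1"),
    (65536, "Unused"),
    (524288, "Container UID ranges"),
    (1879048192, "Unused"),
    (2147483648, "HIC SVNT LEONES"),
    (4294967295, "32bit (uid_t) -1"),
]
STARTS = [start for start, _ in RANGES]


def classify(uid: int) -> str:
    """Return the purpose of the table row that uid falls into."""
    if not 0 <= uid <= 0xFFFFFFFF:
        raise ValueError("not representable as a 32bit uid_t")
    # Index of the last range whose start is <= uid.
    return RANGES[bisect.bisect_right(STARTS, uid) - 1][1]


for uid in (0, 999, 1000, 61184, 65534, 524288):
    print(uid, classify(uid))
```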

## Notes on resolvability of user and group names

User names, UIDs, group names and GIDs don't have to be resolvable using NSS
(i.e. `getpwuid()` and `getpwnam()` and friends) all the time. However, systemd
makes the following requirements:

System users generally have to be resolvable during early boot already. This
means they should not be provided by any networked service (as those usually
become available during late boot only), except if a local cache is kept that
makes them available during early boot too (i.e. before networking is
up). Specifically, system users need to be resolvable at least before
`systemd-udevd.service` and `systemd-tmpfiles.service` are started, as both
need to resolve system users. Note, however, that there might be more services
requiring full resolvability of system users than just these two.

Regular users do not need to be resolvable during early boot; it is sufficient
if they become resolvable during late boot. Specifically, regular users need to
be resolvable at the point in time the `nss-user-lookup.target` unit is
reached. This target unit is generally used as synchronization point between
providers of the user database and consumers of it. Services that require that
the user database is fully available (for example, the login service
`systemd-logind.service`) are ordered *after* it, while services that provide
parts of the user database (for example an LDAP user database client) are
ordered *before* it. Note that `nss-user-lookup.target` is a *passive* unit: in
order to minimize synchronization points on systems that don't need it the unit
is pulled into the initial transaction only if there's at least one service
that really needs it, and that means only if there's a service providing the
local user database somehow through IPC or suchlike. Or in other words: if you
hack on some networked user database project, then make sure you order your
service `Before=nss-user-lookup.target` and that you pull it in with
`Wants=nss-user-lookup.target`. However, if you hack on some project that needs
the user database to be up in full, then order your service
`After=nss-user-lookup.target`, but do *not* pull it in via a `Wants=`
dependency.