Commit | Line | Data |
---|---|---|
c11b1abf | 1 | .\" Copyright (c) 2002 by Michael Kerrisk <mtk.manpages@gmail.com> |
fea681da | 2 | .\" |
5fbde956 | 3 | .\" SPDX-License-Identifier: Linux-man-pages-copyleft |
fea681da MK |
4 | .\" |
5 | .\" 6 Aug 2002 - Initial Creation | |
c11b1abf MK |
6 | .\" Modified 2003-05-23, Michael Kerrisk, <mtk.manpages@gmail.com> |
7 | .\" Modified 2004-05-27, Michael Kerrisk, <mtk.manpages@gmail.com> | |
1c1e15ed | 8 | .\" 2004-12-08, mtk Added O_NOATIME for CAP_FOWNER |
5eaee3d9 | 9 | .\" 2005-08-16, mtk, Added CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE |
c8e68512 MK |
10 | .\" 2008-07-15, Serge Hallyn <serue@us.bbm.com> |
11 | .\" Document file capabilities, per-process capability | |
12 | .\" bounding set, changed semantics for CAP_SETPCAP, | |
13 | .\" and other changes in 2.6.2[45]. | |
14 | .\" Add CAP_MAC_ADMIN, CAP_MAC_OVERRIDE, CAP_SETFCAP. | |
15 | .\" 2008-07-15, mtk | |
16 | .\" Add text describing circumstances in which CAP_SETPCAP | |
17 | .\" (theoretically) permits a thread to change the | |
18 | .\" capability sets of another thread. | |
19 | .\" Add section describing rules for programmatically | |
20 | .\" adjusting thread capability sets. | |
21 | .\" Describe rationale for capability bounding set. | |
22 | .\" Document "securebits" flags. | |
23 | .\" Add text noting that if we set the effective flag for one file | |
24 | .\" capability, then we must also set the effective flag for all | |
25 | .\" other capabilities where the permitted or inheritable bit is set. | |
bfb730f9 | 26 | .\" 2011-09-07, mtk/Serge Hallyn: Add CAP_SYSLOG |
5eaee3d9 | 27 | .\" |
6e00b7a8 | 28 | .TH CAPABILITIES 7 2021-08-27 "Linux" "Linux Programmer's Manual" |
fea681da MK |
29 | .SH NAME |
30 | capabilities \- overview of Linux capabilities | |
31 | .SH DESCRIPTION | |
fea681da | 32 | For the purpose of performing permission checks, |
008f1ecc | 33 | traditional UNIX implementations distinguish two categories of processes: |
fea681da MK |
34 | .I privileged |
35 | processes (whose effective user ID is 0, referred to as superuser or root), | |
36 | and | |
37 | .I unprivileged | |
c7094399 | 38 | processes (whose effective UID is nonzero). |
fea681da MK |
39 | Privileged processes bypass all kernel permission checks, |
40 | while unprivileged processes are subject to full permission | |
41 | checking based on the process's credentials | |
42 | (usually: effective UID, effective GID, and supplementary group list). | |
ade303d7 | 43 | .PP |
c13182ef MK |
44 | Starting with kernel 2.2, Linux divides the privileges traditionally |
45 | associated with superuser into distinct units, known as | |
fea681da | 46 | .IR capabilities , |
3dfe7e0d | 47 | which can be independently enabled and disabled. |
cf7a13d4 | 48 | Capabilities are a per-thread attribute. |
c8e68512 | 49 | .\" |
c634028a | 50 | .SS Capabilities list |
c8e68512 MK |
51 | The following list shows the capabilities implemented on Linux, |
52 | and the operations or behaviors that each capability permits: | |
fea681da | 53 | .TP |
45286787 | 54 | .BR CAP_AUDIT_CONTROL " (since Linux 2.6.11)" |
5eaee3d9 MK |
55 | Enable and disable kernel auditing; change auditing filter rules; |
56 | retrieve auditing status and filtering rules. | |
57 | .TP | |
c81cea2c MK |
58 | .BR CAP_AUDIT_READ " (since Linux 3.16)" |
59 | .\" commit a29b694aa1739f9d76538e34ae25524f9c549d59 | |
60 | .\" commit 3a101b8de0d39403b2c7e5c23fd0b005668acf48 | |
61 | Allow reading the audit log via a multicast netlink socket. | |
62 | .TP | |
45286787 | 63 | .BR CAP_AUDIT_WRITE " (since Linux 2.6.11)" |
c8e68512 | 64 | Write records to the kernel auditing log. |
dd61e8a8 | 65 | .\" FIXME Add FAN_ENABLE_AUDIT |
5eaee3d9 | 66 | .TP |
9339d749 MK |
67 | .BR CAP_BLOCK_SUSPEND " (since Linux 3.5)" |
68 | Employ features that can block system suspend | |
69 | .RB ( epoll (7) | |
70 | .BR EPOLLWAKEUP , | |
71 | .IR /proc/sys/wake_lock ). | |
72 | .TP | |
81701c04 MK |
73 | .BR CAP_BPF " (since Linux 5.8)" |
74 | Employ privileged BPF operations; see | |
75 | .BR bpf (2) | |
76 | and | |
28a4c58c | 77 | .BR bpf\-helpers (7). |
81701c04 MK |
78 | .IP |
79 | This capability was added in Linux 5.8 to separate out | |
80 | BPF functionality from the overloaded | |
1ae6b2c7 | 81 | .B CAP_SYS_ADMIN |
81701c04 MK |
82 | capability. |
83 | .TP | |
71f6247f MK |
84 | .BR CAP_CHECKPOINT_RESTORE " (since Linux 5.9)" |
85 | .\" commit 124ea650d3072b005457faed69909221c2905a1f | |
86 | .PD 0 | |
87 | .RS | |
88 | .IP * 2 | |
89 | Update | |
90 | .I /proc/sys/kernel/ns_last_pid | |
91 | (see | |
92 | .BR pid_namespaces (7)); | |
93 | .IP * | |
94 | employ the | |
95 | .I set_tid | |
96 | feature of | |
97 | .BR clone3 (2); | |
98 | .\" FIXME There is also some use case relating to | |
99 | .\" prctl_set_mm_exe_file(); in the 5.9 sources, see | |
100 | .\" prctl_set_mm_map(). | |
101 | .IP * | |
102 | read the contents of the symbolic links in | |
1ae6b2c7 | 103 | .IR /proc/ pid /map_files |
71f6247f MK |
104 | for other processes. |
105 | .RE | |
106 | .PD | |
107 | .IP | |
108 | This capability was added in Linux 5.9 to separate out | |
109 | checkpoint/restore functionality from the overloaded | |
1ae6b2c7 | 110 | .B CAP_SYS_ADMIN |
71f6247f MK |
111 | capability. |
112 | .TP | |
fea681da | 113 | .B CAP_CHOWN |
c8e68512 | 114 | Make arbitrary changes to file UIDs and GIDs (see |
fea681da MK |
115 | .BR chown (2)). |
116 | .TP | |
117 | .B CAP_DAC_OVERRIDE | |
118 | Bypass file read, write, and execute permission checks. | |
c8e68512 | 119 | (DAC is an abbreviation of "discretionary access control".) |
fea681da MK |
120 | .TP |
121 | .B CAP_DAC_READ_SEARCH | |
a537062e MK |
122 | .PD 0 |
123 | .RS | |
124 | .IP * 2 | |
fea681da | 125 | Bypass file read permission checks and |
a537062e MK |
126 | directory read and execute permission checks; |
127 | .IP * | |
3bbab71a | 128 | invoke |
24ee13df MK |
129 | .BR open_by_handle_at (2); |
130 | .IP * | |
131 | use the | |
132 | .BR linkat (2) | |
133 | .B AT_EMPTY_PATH | |
134 | flag to create a link to a file referred to by a file descriptor. | |
a537062e MK |
135 | .RE |
136 | .PD | |
fea681da MK |
137 | .TP |
138 | .B CAP_FOWNER | |
c8e68512 MK |
139 | .PD 0 |
140 | .RS | |
141 | .IP * 2 | |
fea681da | 142 | Bypass permission checks on operations that normally |
9ee4a2b6 | 143 | require the filesystem UID of the process to match the UID of |
fea681da MK |
144 | the file (e.g., |
145 | .BR chmod (2), | |
146 | .BR utime (2)), | |
c8e68512 | 147 | excluding those operations covered by |
fea681da MK |
148 | .B CAP_DAC_OVERRIDE |
149 | and | |
150 | .BR CAP_DAC_READ_SEARCH ; | |
c8e68512 | 151 | .IP * |
1dc9bca6 MK |
152 | set inode flags (see |
153 | .BR ioctl_iflags (2)) | |
fea681da | 154 | on arbitrary files; |
c8e68512 | 155 | .IP * |
fea681da | 156 | set Access Control Lists (ACLs) on arbitrary files; |
c8e68512 | 157 | .IP * |
1c1e15ed | 158 | ignore directory sticky bit on file deletion; |
c8e68512 | 159 | .IP * |
c99eb2b2 MK |
160 | modify |
161 | .I user | |
162 | extended attributes on a sticky directory owned by any user; | |
163 | .IP * | |
1c1e15ed MK |
164 | specify |
165 | .B O_NOATIME | |
166 | for arbitrary files in | |
167 | .BR open (2) | |
168 | and | |
169 | .BR fcntl (2). | |
c8e68512 MK |
170 | .RE |
171 | .PD | |
fea681da MK |
172 | .TP |
173 | .B CAP_FSETID | |
3bbab71a MK |
174 | .PD 0 |
175 | .RS | |
176 | .IP * 2 | |
ed948c28 | 177 | Don't clear set-user-ID and set-group-ID mode |
c8e68512 | 178 | bits when a file is modified; |
3bbab71a | 179 | .IP * |
c8e68512 | 180 | set the set-group-ID bit for a file whose GID does not match |
9ee4a2b6 | 181 | the filesystem or any of the supplementary GIDs of the calling process. |
3bbab71a MK |
182 | .RE |
183 | .PD | |
fea681da MK |
184 | .TP |
185 | .B CAP_IPC_LOCK | |
bea08fec | 186 | .\" FIXME . As at Linux 3.2, there are some strange uses of this capability |
46c73a44 | 187 | .\" in other places; they probably should be replaced with something else. |
3dcdef94 MK |
188 | .PD 0 |
189 | .RS | |
190 | .IP * 2 | |
c8e68512 | 191 | Lock memory |
fea681da MK |
192 | .RB ( mlock (2), |
193 | .BR mlockall (2), | |
194 | .BR mmap (2), | |
3dcdef94 MK |
195 | .BR shmctl (2)); |
196 | .IP * | |
197 | Allocate memory using huge pages | |
36e6250f | 198 | .RB ( memfd_create (2), |
3dcdef94 | 199 | .BR mmap (2), |
fea681da | 200 | .BR shmctl (2)). |
3dcdef94 MK |
201 | .PD |
202 | .RE | |
fea681da MK |
203 | .TP |
204 | .B CAP_IPC_OWNER | |
205 | Bypass permission checks for operations on System V IPC objects. | |
206 | .TP | |
207 | .B CAP_KILL | |
208 | Bypass permission checks for sending signals (see | |
209 | .BR kill (2)). | |
097585ed | 210 | This includes use of the |
c8e68512 | 211 | .BR ioctl (2) |
097585ed | 212 | .B KDSIGACCEPT |
c8e68512 | 213 | operation. |
bea08fec | 214 | .\" FIXME . CAP_KILL also has an effect for threads + setting child |
a7c1e564 MK |
215 | .\" termination signal to other than SIGCHLD: without this |
216 | .\" capability, the termination signal reverts to SIGCHLD | |
c13182ef | 217 | .\" if the child does an exec(). What is the rationale |
a7c1e564 | 218 | .\" for this? |
fea681da | 219 | .TP |
c8e68512 MK |
220 | .BR CAP_LEASE " (since Linux 2.4)" |
221 | Establish leases on arbitrary files (see | |
fea681da MK |
222 | .BR fcntl (2)). |
223 | .TP | |
224 | .B CAP_LINUX_IMMUTABLE | |
c8e68512 MK |
225 | Set the |
226 | .B FS_APPEND_FL | |
fea681da | 227 | and |
c8e68512 | 228 | .B FS_IMMUTABLE_FL |
e7e006f2 | 229 | inode flags (see |
1dc9bca6 | 230 | .BR ioctl_iflags (2)). |
fea681da | 231 | .TP |
c8e68512 | 232 | .BR CAP_MAC_ADMIN " (since Linux 2.6.25)" |
7f82d0b0 | 233 | Allow MAC configuration or state changes. |
c8e68512 MK |
234 | Implemented for the Smack Linux Security Module (LSM). |
235 | .TP | |
236 | .BR CAP_MAC_OVERRIDE " (since Linux 2.6.25)" | |
7f82d0b0 | 237 | Override Mandatory Access Control (MAC). |
c8e68512 MK |
238 | Implemented for the Smack LSM. |
239 | .TP | |
240 | .BR CAP_MKNOD " (since Linux 2.4)" | |
241 | Create special files using | |
fea681da MK |
242 | .BR mknod (2). |
243 | .TP | |
244 | .B CAP_NET_ADMIN | |
e87268ec MK |
245 | Perform various network-related operations: |
246 | .PD 0 | |
247 | .RS | |
248 | .IP * 2 | |
249 | interface configuration; | |
250 | .IP * | |
12fe8fd3 | 251 | administration of IP firewall, masquerading, and accounting; |
e87268ec MK |
252 | .IP * |
253 | modify routing tables; | |
254 | .IP * | |
255 | bind to any address for transparent proxying; | |
256 | .IP * | |
1cc2995a | 257 | set type-of-service (TOS); |
e87268ec MK |
258 | .IP * |
259 | clear driver statistics; | |
260 | .IP * | |
261 | set promiscuous mode; | |
262 | .IP * | |
263 | enable multicasting; | |
264 | .IP * | |
265 | use | |
266 | .BR setsockopt (2) | |
267 | to set the following socket options: | |
268 | .BR SO_DEBUG , | |
269 | .BR SO_MARK , | |
1ae6b2c7 | 270 | .B SO_PRIORITY |
e87268ec MK |
271 | (for a priority outside the range 0 to 6), |
272 | .BR SO_RCVBUFFORCE , | |
273 | and | |
274 | .BR SO_SNDBUFFORCE . | |
275 | .RE | |
276 | .PD | |
fea681da MK |
277 | .TP |
278 | .B CAP_NET_BIND_SERVICE | |
6eb334b2 | 279 | Bind a socket to Internet domain privileged ports |
fea681da MK |
280 | (port numbers less than 1024). |
281 | .TP | |
282 | .B CAP_NET_BROADCAST | |
c8e68512 | 283 | (Unused) Make socket broadcasts, and listen to multicasts. |
fd39ef0c MK |
284 | .\" FIXME Since Linux 4.2, there are use cases for netlink sockets |
285 | .\" commit 59324cf35aba5336b611074028777838a963d03b | |
fea681da MK |
286 | .TP |
287 | .B CAP_NET_RAW | |
93e9e2d6 MK |
288 | .PD 0 |
289 | .RS | |
290 | .IP * 2 | |
dd55b8a1 | 291 | Use RAW and PACKET sockets; |
93e9e2d6 MK |
292 | .IP * |
293 | bind to any address for transparent proxying. | |
294 | .RE | |
295 | .PD | |
fea681da MK |
296 | .\" Also various IP options and setsockopt(SO_BINDTODEVICE) |
297 | .TP | |
e39e4240 MK |
298 | .BR CAP_PERFMON " (since Linux 5.8)" |
299 | Employ various performance-monitoring mechanisms, including: | |
e39e4240 | 300 | .RS |
cbcd1195 | 301 | .IP * 2 |
f7cf9c0b | 302 | .PD 0 |
e39e4240 MK |
303 | call |
304 | .BR perf_event_open (2); | |
305 | .IP * | |
306 | employ various BPF operations that have performance implications. | |
307 | .RE | |
308 | .PD | |
309 | .IP | |
310 | This capability was added in Linux 5.8 to separate out | |
311 | performance monitoring functionality from the overloaded | |
1ae6b2c7 | 312 | .B CAP_SYS_ADMIN |
e39e4240 | 313 | capability. |
874355e3 | 314 | See also the kernel source file |
b49c2acb | 315 | .IR Documentation/admin\-guide/perf\-security.rst . |
e39e4240 | 316 | .TP |
fea681da | 317 | .B CAP_SETGID |
3bbab71a MK |
318 | .RS |
319 | .PD 0 | |
320 | .IP * 2 | |
c8e68512 | 321 | Make arbitrary manipulations of process GIDs and supplementary GID list; |
3bbab71a | 322 | .IP * |
5bea231d | 323 | forge GID when passing socket credentials via UNIX domain sockets; |
3bbab71a | 324 | .IP * |
5bea231d | 325 | write a group ID mapping in a user namespace (see |
f58fb24f | 326 | .BR user_namespaces (7)). |
3bbab71a MK |
327 | .PD |
328 | .RE | |
fea681da | 329 | .TP |
c8e68512 | 330 | .BR CAP_SETFCAP " (since Linux 2.6.24)" |
b8cee784 | 331 | Set arbitrary capabilities on a file. |
29c1f3cf KK |
332 | .IP |
333 | .\" commit db2e718a47984b9d71ed890eb2ea36ecf150de18 | |
334 | Since Linux 5.12, this capability is | |
a1508e36 MK |
335 | also needed to map user ID 0 in a new user namespace; see |
336 | .BR user_namespaces (7) | |
337 | for details. | |
c8e68512 MK |
338 | .TP |
339 | .B CAP_SETPCAP | |
e62172cb | 340 | If file capabilities are supported (i.e., since Linux 2.6.24): |
c8e68512 MK |
341 | add any capability from the calling thread's bounding set |
342 | to its inheritable set; | |
343 | drop capabilities from the bounding set (via | |
344 | .BR prctl (2) | |
345 | .BR PR_CAPBSET_DROP ); | |
346 | make changes to the | |
347 | .I securebits | |
348 | flags. | |
e62172cb MK |
349 | .IP |
350 | If file capabilities are not supported (i.e., kernels before Linux 2.6.24): | |
351 | grant or remove any capability in the | |
352 | caller's permitted capability set to or from any other process. | |
353 | (This property of | |
354 | .B CAP_SETPCAP | |
355 | is not available when the kernel is configured to support | |
356 | file capabilities, since | |
357 | .B CAP_SETPCAP | |
358 | has entirely different semantics for such kernels.) | |
fea681da MK |
359 | .TP |
360 | .B CAP_SETUID | |
3bbab71a MK |
361 | .RS |
362 | .PD 0 | |
363 | .IP * 2 | |
c8e68512 | 364 | Make arbitrary manipulations of process UIDs |
fea681da MK |
365 | .RB ( setuid (2), |
366 | .BR setreuid (2), | |
367 | .BR setresuid (2), | |
368 | .BR setfsuid (2)); | |
3bbab71a | 369 | .IP * |
a7d96776 | 370 | forge UID when passing socket credentials via UNIX domain sockets; |
3bbab71a | 371 | .IP * |
5bea231d | 372 | write a user ID mapping in a user namespace (see |
f58fb24f | 373 | .BR user_namespaces (7)). |
3bbab71a MK |
374 | .PD |
375 | .RE | |
777f5a9e | 376 | .\" FIXME CAP_SETUID also has an effect in exec(); document this. |
fea681da MK |
377 | .TP |
378 | .B CAP_SYS_ADMIN | |
fa50d3d4 MK |
379 | .IR Note : |
380 | this capability is overloaded; see | |
aca89285 | 381 | .I Notes to kernel developers |
fa50d3d4 | 382 | below. |
ade303d7 | 383 | .IP |
c8e68512 MK |
384 | .PD 0 |
385 | .RS | |
386 | .IP * 2 | |
387 | Perform a range of system administration operations including: | |
fea681da MK |
388 | .BR quotactl (2), |
389 | .BR mount (2), | |
390 | .BR umount (2), | |
40ca3880 | 391 | .BR pivot_root (2), |
1368e847 MK |
392 | .BR swapon (2), |
393 | .BR swapoff (2), | |
fea681da | 394 | .BR sethostname (2), |
f169a862 | 395 | and |
c8e68512 MK |
396 | .BR setdomainname (2); |
397 | .IP * | |
bfb730f9 MK |
398 | perform privileged |
399 | .BR syslog (2) | |
400 | operations (since Linux 2.6.37, | |
1ae6b2c7 | 401 | .B CAP_SYSLOG |
bfb730f9 MK |
402 | should be used to permit such operations); |
403 | .IP * | |
c8e68512 | 404 | perform |
c11e3891 MK |
405 | .B VM86_REQUEST_IRQ |
406 | .BR vm86 (2) | |
407 | command; | |
408 | .IP * | |
045c5bde | 409 | access the same checkpoint/restore functionality that is governed by |
1ae6b2c7 | 410 | .B CAP_CHECKPOINT_RESTORE |
045c5bde MK |
411 | (but the latter, weaker capability is preferred for accessing |
412 | that functionality). | |
413 | .IP * | |
2fbfb575 | 414 | perform the same BPF operations as are governed by |
1ae6b2c7 | 415 | .B CAP_BPF |
2fbfb575 MK |
416 | (but the latter, weaker capability is preferred for accessing |
417 | that functionality). | |
418 | .IP * | |
419 | employ the same performance monitoring mechanisms as are governed by | |
1ae6b2c7 | 420 | .B CAP_PERFMON |
2fbfb575 MK |
421 | (but the latter, weaker capability is preferred for accessing |
422 | that functionality). | |
423 | .IP * | |
c11e3891 | 424 | perform |
fea681da MK |
425 | .B IPC_SET |
426 | and | |
427 | .B IPC_RMID | |
428 | operations on arbitrary System V IPC objects; | |
c8e68512 | 429 | .IP * |
1a3b63f7 MK |
430 | override |
431 | .B RLIMIT_NPROC | |
432 | resource limit; | |
433 | .IP * | |
fea681da MK |
434 | perform operations on |
435 | .I trusted | |
436 | and | |
437 | .I security | |
19531dec | 438 | extended attributes (see |
89fabe2e | 439 | .BR xattr (7)); |
c8e68512 MK |
440 | .IP * |
441 | use | |
08baa0af | 442 | .BR lookup_dcookie (2); |
c8e68512 | 443 | .IP * |
a1f926b8 MK |
444 | use |
445 | .BR ioprio_set (2) | |
446 | to assign | |
447 | .B IOPRIO_CLASS_RT | |
83ee9237 | 448 | and (before Linux 2.6.25) |
237aa7c5 | 449 | .B IOPRIO_CLASS_IDLE |
a1f926b8 | 450 | I/O scheduling classes; |
c8e68512 | 451 | .IP * |
f5ac5bbf | 452 | forge PID when passing socket credentials via UNIX domain sockets; |
c8e68512 | 453 | .IP * |
fea681da | 454 | exceed |
b49c2acb | 455 | .IR /proc/sys/fs/file\-max , |
3dfe7e0d MK |
456 | the system-wide limit on the number of open files, |
457 | in system calls that open files (e.g., | |
fea681da MK |
458 | .BR accept (2), |
459 | .BR execve (2), | |
460 | .BR open (2), | |
f169a862 | 461 | .BR pipe (2)); |
c8e68512 | 462 | .IP * |
c13182ef | 463 | employ |
0f807eea MK |
464 | .B CLONE_* |
465 | flags that create new namespaces with | |
a7c1e564 MK |
466 | .BR clone (2) |
467 | and | |
c67d3814 MK |
468 | .BR unshare (2) |
469 | (but, since Linux 3.8, | |
470 | creating user namespaces does not require any capability); | |
c8e68512 | 471 | .IP * |
0f322ccc MK |
472 | access privileged |
473 | .I perf | |
474 | event information; | |
2bfe6656 MK |
475 | .IP * |
476 | call | |
c3b49118 MK |
477 | .BR setns (2) |
478 | (requires | |
479 | .B CAP_SYS_ADMIN | |
480 | in the | |
481 | .I target | |
482 | namespace); | |
e4698850 | 483 | .IP * |
0f807eea MK |
484 | call |
485 | .BR fanotify_init (2); | |
486 | .IP * | |
2cf45b0d | 487 | perform privileged |
a7c1e564 MK |
488 | .B KEYCTL_CHOWN |
489 | and | |
490 | .B KEYCTL_SETPERM | |
491 | .BR keyctl (2) | |
e64e6056 MK |
492 | operations; |
493 | .IP * | |
494 | perform | |
495 | .BR madvise (2) | |
496 | .B MADV_HWPOISON | |
0f807eea MK |
497 | operation; |
498 | .IP * | |
499 | employ the | |
500 | .B TIOCSTI | |
501 | .BR ioctl (2) | |
502 | to insert characters into the input queue of a terminal other than | |
838ad419 | 503 | the caller's controlling terminal; |
0f807eea | 504 | .IP * |
0f807eea | 505 | employ the obsolete |
51c5c662 | 506 | .BR nfsservctl (2) |
c42221c4 MK |
507 | system call; |
508 | .IP * | |
509 | employ the obsolete | |
0f807eea MK |
510 | .BR bdflush (2) |
511 | system call; | |
512 | .IP * | |
513 | perform various privileged block-device | |
514 | .BR ioctl (2) | |
515 | operations; | |
516 | .IP * | |
9ee4a2b6 | 517 | perform various privileged filesystem |
0f807eea MK |
518 | .BR ioctl (2) |
519 | operations; | |
520 | .IP * | |
fdf41f57 MK |
521 | perform privileged |
522 | .BR ioctl (2) | |
523 | operations on the | |
1ae6b2c7 | 524 | .I /dev/random |
fdf41f57 MK |
525 | device (see |
526 | .BR random (4)); | |
527 | .IP * | |
c6ddae52 MK |
528 | install a |
529 | .BR seccomp (2) | |
530 | filter without first having to set the | |
531 | .I no_new_privs | |
532 | thread attribute; | |
533 | .IP * | |
968b27aa MK |
534 | modify allow/deny rules for device control groups; |
535 | .IP * | |
536 | employ the | |
537 | .BR ptrace (2) | |
538 | .B PTRACE_SECCOMP_GET_FILTER | |
539 | operation to dump tracee's seccomp filters; | |
540 | .IP * | |
541 | employ the | |
542 | .BR ptrace (2) | |
543 | .B PTRACE_SETOPTIONS | |
544 | operation to suspend the tracee's seccomp protections (i.e., the | |
545 | .B PTRACE_O_SUSPEND_SECCOMP | |
115c1eb4 | 546 | flag); |
c6ddae52 | 547 | .IP * |
a526aa40 | 548 | perform administrative operations on many device drivers; |
7e7e8de3 | 549 | .IP * |
a526aa40 | 550 | modify autogroup nice values by writing to |
1ae6b2c7 | 551 | .IR /proc/ pid /autogroup |
7e7e8de3 MK |
552 | (see |
553 | .BR sched (7)). | |
c8e68512 MK |
554 | .RE |
555 | .PD | |
fea681da MK |
556 | .TP |
557 | .B CAP_SYS_BOOT | |
c8e68512 | 558 | Use |
08baa0af MK |
559 | .BR reboot (2) |
560 | and | |
561 | .BR kexec_load (2). | |
fea681da MK |
562 | .TP |
563 | .B CAP_SYS_CHROOT | |
4312e0cb MK |
564 | .RS |
565 | .PD 0 | |
566 | .IP * 2 | |
c8e68512 | 567 | Use |
4312e0cb MK |
568 | .BR chroot (2); |
569 | .IP * | |
570 | change mount namespaces using | |
571 | .BR setns (2). | |
572 | .PD | |
573 | .RE | |
fea681da MK |
574 | .TP |
575 | .B CAP_SYS_MODULE | |
3bbab71a MK |
576 | .RS |
577 | .PD 0 | |
578 | .IP * 2 | |
c8e68512 MK |
579 | Load and unload kernel modules |
580 | (see | |
fea681da MK |
581 | .BR init_module (2) |
582 | and | |
c8e68512 | 583 | .BR delete_module (2)); |
3bbab71a | 584 | .IP * |
c8e68512 MK |
585 | in kernels before 2.6.25: |
586 | drop capabilities from the system-wide capability bounding set. | |
3bbab71a MK |
587 | .PD |
588 | .RE | |
fea681da MK |
589 | .TP |
590 | .B CAP_SYS_NICE | |
c8e68512 MK |
591 | .PD 0 |
592 | .RS | |
593 | .IP * 2 | |
0c576731 | 594 | Lower the process nice value |
fea681da MK |
595 | .RB ( nice (2), |
596 | .BR setpriority (2)) | |
c8e68512 MK |
597 | and change the nice value for arbitrary processes; |
598 | .IP * | |
599 | set real-time scheduling policies for the calling process, | |
600 | and set scheduling policies and priorities for arbitrary processes | |
fea681da | 601 | .RB ( sched_setscheduler (2), |
f96787ab | 602 | .BR sched_setparam (2), |
0d59d0c8 | 603 | .BR sched_setattr (2)); |
c8e68512 | 604 | .IP * |
fea681da | 605 | set CPU affinity for arbitrary processes |
c13182ef | 606 | .RB ( sched_setaffinity (2)); |
c8e68512 | 607 | .IP * |
a1f926b8 | 608 | set I/O scheduling class and priority for arbitrary processes |
c13182ef | 609 | .RB ( ioprio_set (2)); |
c8e68512 MK |
610 | .IP * |
611 | apply | |
a1f926b8 | 612 | .BR migrate_pages (2) |
c8e68512 | 613 | to arbitrary processes and allow processes |
a1f926b8 | 614 | to be migrated to arbitrary nodes; |
c13182ef | 615 | .\" FIXME CAP_SYS_NICE also has the following effect for |
a1f926b8 MK |
616 | .\" migrate_pages(2): |
617 | .\" do_migrate_pages(mm, &old, &new, | |
618 | .\" capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE); | |
1a0fbe37 | 619 | .\" |
bea08fec | 620 | .\" Document this. |
c8e68512 MK |
621 | .IP * |
622 | apply | |
a7c1e564 | 623 | .BR move_pages (2) |
c8e68512 MK |
624 | to arbitrary processes; |
625 | .IP * | |
4d62f7b6 MK |
626 | use the |
627 | .B MPOL_MF_MOVE_ALL | |
c13182ef | 628 | flag with |
a7c1e564 | 629 | .BR mbind (2) |
c13182ef | 630 | and |
a7c1e564 | 631 | .BR move_pages (2). |
c8e68512 MK |
632 | .RE |
633 | .PD | |
fea681da MK |
634 | .TP |
635 | .B CAP_SYS_PACCT | |
c8e68512 | 636 | Use |
fea681da MK |
637 | .BR acct (2). |
638 | .TP | |
639 | .B CAP_SYS_PTRACE | |
eb64a9cb MK |
640 | .PD 0 |
641 | .RS | |
de6a5c05 | 642 | .IP * 2 |
c8e68512 | 643 | Trace arbitrary processes using |
cbd7b9bf | 644 | .BR ptrace (2); |
eb64a9cb | 645 | .IP * |
cbd7b9bf MK |
646 | apply |
647 | .BR get_robust_list (2) | |
38b6e5b0 | 648 | to arbitrary processes; |
eb64a9cb | 649 | .IP * |
b8f84ce2 MK |
650 | transfer data to or from the memory of arbitrary processes using |
651 | .BR process_vm_readv (2) | |
652 | and | |
3bbab71a | 653 | .BR process_vm_writev (2); |
b8f84ce2 | 654 | .IP * |
38b6e5b0 MK |
655 | inspect processes using |
656 | .BR kcmp (2). | |
eb64a9cb MK |
657 | .RE |
658 | .PD | |
fea681da MK |
659 | .TP |
660 | .B CAP_SYS_RAWIO | |
4637c8cb MK |
661 | .PD 0 |
662 | .RS | |
663 | .IP * 2 | |
c8e68512 | 664 | Perform I/O port operations |
fea681da MK |
665 | .RB ( iopl (2) |
666 | and | |
667 | .BR ioperm (2)); | |
4637c8cb | 668 | .IP * |
fea681da | 669 | access |
474e1f9d | 670 | .IR /proc/kcore ; |
4637c8cb | 671 | .IP * |
474e1f9d MK |
672 | employ the |
673 | .B FIBMAP | |
674 | .BR ioctl (2) | |
4637c8cb MK |
675 | operation; |
676 | .IP * | |
677 | open devices for accessing x86 model-specific registers (MSRs, see | |
3bbab71a | 678 | .BR msr (4)); |
4637c8cb MK |
679 | .IP * |
680 | update | |
681 | .IR /proc/sys/vm/mmap_min_addr ; | |
682 | .IP * | |
683 | create memory mappings at addresses below the value specified by | |
684 | .IR /proc/sys/vm/mmap_min_addr ; | |
685 | .IP * | |
50b2aa27 | 686 | map files in |
cef53f3e | 687 | .IR /proc/bus/pci ; |
4637c8cb MK |
688 | .IP * |
689 | open | |
1ae6b2c7 | 690 | .I /dev/mem |
4637c8cb MK |
691 | and |
692 | .IR /dev/kmem ; | |
693 | .IP * | |
694 | perform various SCSI device commands; | |
695 | .IP * | |
696 | perform certain operations on | |
697 | .BR hpsa (4) | |
698 | and | |
699 | .BR cciss (4) | |
700 | devices; | |
701 | .IP * | |
702 | perform a range of device-specific operations on other devices. | |
703 | .RE | |
704 | .PD | |
fea681da MK |
705 | .TP |
706 | .B CAP_SYS_RESOURCE | |
c8e68512 MK |
707 | .PD 0 |
708 | .RS | |
709 | .IP * 2 | |
9ee4a2b6 | 710 | Use reserved space on ext2 filesystems; |
c8e68512 MK |
711 | .IP * |
712 | make | |
fea681da MK |
713 | .BR ioctl (2) |
714 | calls controlling ext3 journaling; | |
c8e68512 MK |
715 | .IP * |
716 | override disk quota limits; | |
717 | .IP * | |
718 | increase resource limits (see | |
fea681da | 719 | .BR setrlimit (2)); |
c8e68512 MK |
720 | .IP * |
721 | override | |
fea681da | 722 | .B RLIMIT_NPROC |
c8e68512 MK |
723 | resource limit; |
724 | .IP * | |
aa66392d MK |
725 | override maximum number of consoles on console allocation; |
726 | .IP * | |
727 | override maximum number of keymaps; | |
728 | .IP * | |
729 | allow more than 64 Hz interrupts from the real-time clock; | |
730 | .IP * | |
c8e68512 | 731 | raise |
fea681da | 732 | .I msg_qbytes |
c8e68512 | 733 | limit for a System V message queue above the limit in |
0daa9e92 | 734 | .I /proc/sys/kernel/msgmnb |
fea681da MK |
735 | (see |
736 | .BR msgop (2) | |
737 | and | |
ad7b0f91 MK |
738 | .BR msgctl (2)); |
739 | .IP * | |
7509f758 MK |
740 | allow the |
741 | .B RLIMIT_NOFILE | |
742 | resource limit on the number of "in-flight" file descriptors | |
743 | to be bypassed when passing file descriptors to another process | |
744 | via a UNIX domain socket (see | |
745 | .BR unix (7)); | |
746 | .IP * | |
ad7b0f91 | 747 | override the |
b49c2acb | 748 | .I /proc/sys/fs/pipe\-max\-size |
ad7b0f91 MK |
749 | limit when setting the capacity of a pipe using the |
750 | .B F_SETPIPE_SZ | |
751 | .BR fcntl (2) | |
1cc2995a | 752 | command; |
46883521 MK |
753 | .IP * |
754 | use | |
1ae6b2c7 | 755 | .B F_SETPIPE_SZ |
46883521 | 756 | to increase the capacity of a pipe above the limit specified by |
b49c2acb | 757 | .IR /proc/sys/fs/pipe\-max\-size ; |
b39a2012 MK |
758 | .IP * |
759 | override | |
5d63eed8 AM |
760 | .IR /proc/sys/fs/mqueue/queues_max , |
761 | .IR /proc/sys/fs/mqueue/msg_max , | |
69a0c93e SM |
762 | and |
763 | .I /proc/sys/fs/mqueue/msgsize_max | |
aade901b | 764 | limits when creating POSIX message queues (see |
ecc1f45b MK |
765 | .BR mq_overview (7)); |
766 | .IP * | |
3bbab71a | 767 | employ the |
ecc1f45b MK |
768 | .BR prctl (2) |
769 | .B PR_SET_MM | |
8ddcc591 | 770 | operation; |
41f00272 | 771 | .IP * |
8ddcc591 | 772 | set |
1ae6b2c7 | 773 | .IR /proc/ pid /oom_score_adj |
8ddcc591 MK |
774 | to a value lower than the value last set by a process with |
775 | .BR CAP_SYS_RESOURCE . | |
c8e68512 MK |
776 | .RE |
777 | .PD | |
fea681da MK |
778 | .TP |
779 | .B CAP_SYS_TIME | |
c8e68512 | 780 | Set system clock |
fea681da MK |
781 | .RB ( settimeofday (2), |
782 | .BR stime (2), | |
783 | .BR adjtimex (2)); | |
c8e68512 | 784 | set real-time (hardware) clock. |
fea681da MK |
785 | .TP |
786 | .B CAP_SYS_TTY_CONFIG | |
c8e68512 | 787 | Use |
749ac769 MK |
788 | .BR vhangup (2); |
789 | employ various privileged | |
790 | .BR ioctl (2) | |
791 | operations on virtual terminals. | |
bfb730f9 MK |
792 | .TP |
793 | .BR CAP_SYSLOG " (since Linux 2.6.37)" | |
5f94327c MK |
794 | .RS |
795 | .PD 0 | |
de6a5c05 | 796 | .IP * 2 |
bfb730f9 MK |
797 | Perform privileged |
798 | .BR syslog (2) | |
799 | operations. | |
800 | See | |
801 | .BR syslog (2) | |
802 | for information on which operations require privilege. | |
10fe5485 MK |
803 | .IP * |
804 | View kernel addresses exposed via | |
805 | .I /proc | |
806 | and other interfaces when | |
1ae6b2c7 | 807 | .I /proc/sys/kernel/kptr_restrict |
10fe5485 | 808 | has the value 1. |
4eaa04c5 | 809 | (See the discussion of |
10fe5485 MK |
810 | .I kptr_restrict |
811 | in | |
812 | .BR proc (5).) | |
5f94327c MK |
813 | .PD |
814 | .RE | |
d6b08708 MK |
815 | .TP |
816 | .BR CAP_WAKE_ALARM " (since Linux 3.0)" | |
817 | Trigger something that will wake up the system (set | |
818 | .B CLOCK_REALTIME_ALARM | |
819 | and | |
820 | .B CLOCK_BOOTTIME_ALARM | |
821 | timers). | |
c8e68512 | 822 | .\" |
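
The capabilities listed above are commonly handled as bits in a 64-bit mask. The following is a minimal sketch, not part of the manual page, that decodes such a mask into capability names. The bit numbers are an assumption taken from the kernel header linux/capability.h (they are not stated in the text above), and `decode_caps` is a hypothetical helper name.

```python
# Capability names indexed by bit number, as defined in the kernel
# header <linux/capability.h> (assumption: matches your kernel headers).
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
    "CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE",
    "CAP_NET_BROADCAST", "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK",
    "CAP_IPC_OWNER", "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT",
    "CAP_SYS_PTRACE", "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT",
    "CAP_SYS_NICE", "CAP_SYS_RESOURCE", "CAP_SYS_TIME",
    "CAP_SYS_TTY_CONFIG", "CAP_MKNOD", "CAP_LEASE", "CAP_AUDIT_WRITE",
    "CAP_AUDIT_CONTROL", "CAP_SETFCAP", "CAP_MAC_OVERRIDE",
    "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM", "CAP_BLOCK_SUSPEND",
    "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF", "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(mask):
    """Return the names of the capabilities whose bits are set in mask."""
    names = []
    bit = 0
    while mask:
        if mask & 1:
            if bit < len(CAP_NAMES):
                names.append(CAP_NAMES[bit])
            else:
                # Bit not known to this table (e.g., a newer kernel).
                names.append("unknown(%d)" % bit)
        mask >>= 1
        bit += 1
    return names
```

For example, `decode_caps(1 << 21)` yields a one-element list naming CAP_SYS_ADMIN. Bits beyond the table are reported as unknown rather than guessed.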
c634028a | 823 | .SS Past and current implementation |
c8e68512 MK |
824 | A full implementation of capabilities requires that: |
825 | .IP 1. 3 | |
826 | For all privileged operations, | |
827 | the kernel must check whether the thread has the required | |
828 | capability in its effective set. | |
829 | .IP 2. | |
137d81b5 | 830 | The kernel must provide system calls allowing a thread's capability sets to |
c8e68512 MK |
831 | be changed and retrieved. |
832 | .IP 3. | |
9ee4a2b6 | 833 | The filesystem must support attaching capabilities to an executable file, |
c8e68512 MK |
834 | so that a process gains those capabilities when the file is executed. |
835 | .PP | |
836 | Before kernel 2.6.24, only the first two of these requirements are met; | |
837 | since kernel 2.6.24, all three requirements are met. | |
838 | .\" | |
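
Besides the system calls mentioned in requirement 2, Linux also reports the capability sets as hexadecimal masks in the Cap* fields of /proc/PID/status (see proc(5)). The following sketch, under that assumption, parses those fields and tests one bit of the effective set, which is the set the kernel consults per requirement 1. The function names and the sample text are hypothetical, made up for illustration.

```python
def parse_cap_fields(status_text):
    """Map each Cap* field of a /proc/PID/status dump to an integer mask."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.startswith("Cap"):
            fields[key] = int(value.strip(), 16)  # masks are printed in hex
    return fields


def has_effective(status_text, cap_bit):
    """True if bit cap_bit is set in the CapEff (effective) mask."""
    cap_eff = parse_cap_fields(status_text).get("CapEff", 0)
    return bool((cap_eff >> cap_bit) & 1)


# Made-up sample: only bit 10 is set in the permitted and effective sets.
SAMPLE = (
    "Name:\tcat\n"
    "CapInh:\t0000000000000000\n"
    "CapPrm:\t0000000000000400\n"
    "CapEff:\t0000000000000400\n"
    "CapBnd:\t000001ffffffffff\n"
    "CapAmb:\t0000000000000000\n"
)
```

On a Linux system, the same functions could be applied to the text read from /proc/self/status. The parsing is kept separate from the file access so that it can be exercised on any platform.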
8de5616f MK |
839 | .SS Notes to kernel developers |
840 | When adding a new kernel feature that should be governed by a capability, | |
841 | consider the following points. | |
842 | .IP * 3 | |
ddb624a9 MK |
843 | The goal of capabilities is to divide the power of superuser into pieces, |
844 | such that if a program that has one or more capabilities is compromised, | |
845 | its power to do damage to the system would be less than the same program | |
846 | running with root privilege. | |
8de5616f MK |
847 | .IP * |
848 | You have the choice of either creating a new capability for your new feature, | |
849 | or associating the feature with one of the existing capabilities. | |
ddb624a9 | 850 | In order to keep the set of capabilities to a manageable size, |
8de5616f MK |
851 | the latter option is preferable, |
852 | unless there are compelling reasons to take the former option. | |
ddb624a9 MK |
853 | (There is also a technical limit: |
854 | the size of capability sets is currently limited to 64 bits.) | |
8de5616f MK |
855 | .IP * |
856 | To determine which existing capability might best be associated | |
857 | with your new feature, review the list of capabilities above in order | |
858 | to find a "silo" into which your new feature best fits. | |
ddb624a9 | 859 | One approach to take is to determine if there are other features |
9f92e4e1 | 860 | requiring capabilities that will always be used along with the new feature. |
ddb624a9 MK |
861 | If the new feature is useless without these other features, |
862 | you should use the same capability as the other features. | |
8de5616f | 863 | .IP * |
1ae6b2c7 | 864 | .I Don't |
8de5616f MK |
865 | choose |
866 | .B CAP_SYS_ADMIN | |
867 | if you can possibly avoid it! | |
868 | A vast proportion of existing capability checks are associated | |
6e9219f7 MK |
869 | with this capability (see the partial list above). |
870 | It can plausibly be called "the new root", | |
871 | since on the one hand, it confers a wide range of powers, | |
872 | and on the other hand, | |
873 | its broad scope means that this is the capability | |
874 | that is required by many privileged programs. | |
8de5616f MK |
875 | Don't make the problem worse. |
876 | The only new features that should be associated with | |
877 | .B CAP_SYS_ADMIN | |
878 | are ones that | |
879 | .I closely | |
880 | match existing uses in that silo. | |
881 | .IP * | |
882 | If you have determined that it really is necessary to create | |
883 | a new capability for your feature, | |
ddb624a9 | 884 | don't make or name it as a "single-use" capability. |
8de5616f | 885 | Thus, for example, the addition of the highly specific |
1ae6b2c7 | 886 | .B CAP_SYS_PACCT |
8de5616f MK |
887 | was probably a mistake. |
888 | Instead, try to identify and name your new capability as a broader | |
889 | silo into which other related future use cases might fit. | |
890 | .\" | |
c634028a | 891 | .SS Thread capability sets |
1db1d36d | 892 | Each thread has the following capability sets containing zero or more |
fea681da MK |
893 | of the above capabilities: |
894 | .TP | |
1ae6b2c7 | 895 | .I Permitted |
c8e68512 MK |
896 | This is a limiting superset for the effective |
897 | capabilities that the thread may assume. | |
898 | It is also a limiting superset for the capabilities that | |
899 | may be added to the inheritable set by a thread that does not have the | |
900 | .B CAP_SETPCAP | |
901 | capability in its effective set. | |
ade303d7 | 902 | .IP |
cf7a13d4 | 903 | If a thread drops a capability from its permitted set, |
3b777aff | 904 | it can never reacquire that capability (unless it |
c930827f | 905 | .BR execve (2)s |
c8e68512 MK |
906 | either a set-user-ID-root program, or |
907 | a program whose associated file capabilities grant that capability). | |
fea681da | 908 | .TP |
1ae6b2c7 | 909 | .I Inheritable |
c8e68512 | 910 | This is a set of capabilities preserved across an |
fea681da | 911 | .BR execve (2). |
6260f4cd AL |
912 | Inheritable capabilities remain inheritable when executing any program, |
913 | and inheritable capabilities are added to the permitted set when executing | |
914 | a program that has the corresponding bits set in the file inheritable set. | |
915 | .IP | |
916 | Because inheritable capabilities are not generally preserved across | |
917 | .BR execve (2) | |
918 | when running as a non-root user, applications that wish to run helper | |
e574dcd0 MK |
919 | programs with elevated capabilities should consider using |
920 | ambient capabilities, described below. | |
c8e68512 | 921 | .TP |
1ae6b2c7 | 922 | .I Effective |
c8e68512 MK |
923 | This is the set of capabilities used by the kernel to |
924 | perform permission checks for the thread. | |
6260f4cd | 925 | .TP |
36de80b9 MK |
926 | .IR Bounding " (per-thread since Linux 2.6.25)" |
927 | The capability bounding set is a mechanism that can be used | |
928 | to limit the capabilities that are gained during | |
929 | .BR execve (2). | |
930 | .IP | |
931 | Since Linux 2.6.25, this is a per-thread capability set. | |
932 | In older kernels, the capability bounding set was a system-wide attribute |
933 | shared by all threads on the system. | |
934 | .IP | |
aca89285 KK |
935 | For more details, see |
936 | .I Capability bounding set | |
937 | below. | |
36de80b9 | 938 | .TP |
c2b279af | 939 | .IR Ambient " (since Linux 4.3)" |
e574dcd0 | 940 | .\" commit 58319057b7847667f0c9585b9de0e8932b0fdb08 |
6260f4cd AL |
941 | This is a set of capabilities that are preserved across an |
942 | .BR execve (2) | |
3375bef1 | 943 | of a program that is not privileged. |
e574dcd0 MK |
944 | The ambient capability set obeys the invariant that no capability |
945 | can ever be ambient if it is not both permitted and inheritable. | |
ade303d7 | 946 | .IP |
3375bef1 MK |
947 | The ambient capability set can be directly modified using |
948 | .BR prctl (2). | |
949 | Ambient capabilities are automatically lowered if either of | |
950 | the corresponding permitted or inheritable capabilities is lowered. | |
ade303d7 | 951 | .IP |
3375bef1 MK |
952 | Executing a program that changes UID or GID due to the |
953 | set-user-ID or set-group-ID bits or executing a program that has | |
954 | any file capabilities set will clear the ambient set. | |
955 | Ambient capabilities are added to the permitted set and | |
956 | assigned to the effective set when | |
6260f4cd | 957 | .BR execve (2) |
e574dcd0 | 958 | is called. |
5367a9ab MK |
959 | If ambient capabilities cause a process's permitted and effective |
960 | capabilities to increase during an | |
961 | .BR execve (2), | |
962 | this does not trigger the secure-execution mode described in | |
963 | .BR ld.so (8). | |
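The ambient-set invariant described above can be illustrated with a small model. This is only a sketch: capability sets are represented as Python integers used as bitmasks, and the helper names `raise_ambient` and `lower_inheritable` are hypothetical, chosen for illustration. In a real program the ambient set is changed with prctl(2), not with these helpers.

```python
# Capability sets modeled as integer bitmasks (an illustrative convention,
# not a kernel or libcap API).
CAP_NET_BIND_SERVICE = 1 << 10  # capability number 10
CAP_NET_RAW = 1 << 13           # capability number 13


def raise_ambient(permitted, inheritable, ambient, cap):
    """Return the new ambient set with cap added.

    Mirrors the invariant that a capability can be ambient only if it
    is both permitted and inheritable; otherwise the request is refused.
    """
    if not (permitted & cap and inheritable & cap):
        raise PermissionError("capability must be permitted and inheritable")
    return ambient | cap


def lower_inheritable(inheritable, ambient, cap):
    """Drop cap from the inheritable set.

    The ambient set is lowered automatically so that it stays a subset
    of the inheritable set.
    """
    inheritable &= ~cap
    return inheritable, ambient & inheritable


# Hypothetical usage: raise CAP_NET_RAW, then drop it from inheritable.
amb = raise_ambient(CAP_NET_RAW, CAP_NET_RAW, 0, CAP_NET_RAW)
inh, amb = lower_inheritable(CAP_NET_RAW, amb, CAP_NET_RAW)
```

After the second call both `inh` and `amb` are empty, which is the automatic lowering of ambient capabilities described in the text.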
fea681da | 964 | .PP |
fea681da MK |
965 | A child created via |
966 | .BR fork (2) | |
967 | inherits copies of its parent's capability sets. | |
aca89285 KK |
968 | For details on how |
969 | .BR execve (2) | |
970 | affects capabilities, see | |
971 | .I Transformation of capabilities during execve() | |
972 | below. | |
fea681da MK |
973 | .PP |
974 | Using | |
975 | .BR capset (2), | |
aca89285 KK |
976 | a thread may manipulate its own capability sets; see |
977 | .I Programmatically adjusting capability sets | |
978 | below. | |
afae50e4 MK |
979 | .PP |
980 | Since Linux 3.2, the file | |
981 | .I /proc/sys/kernel/cap_last_cap | |
a60b1f03 | 982 | .\" commit 73efc0394e148d0e15583e13712637831f926720 |
afae50e4 MK |
983 | exposes the numerical value of the highest capability |
984 | supported by the running kernel; | |
985 | this can be used to determine the highest bit | |
986 | that may be set in a capability set. | |
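As a minimal sketch of how a program might use this file, the following Python reads the value and builds a mask of all capability bits supported by the running kernel. The function names are hypothetical. The fallback of returning None when the file is absent (for example, on kernels before Linux 3.2) is an assumption about how a caller might want to handle that case.

```python
from pathlib import Path


def highest_capability(path="/proc/sys/kernel/cap_last_cap"):
    """Return the highest capability number, or None if the file is absent."""
    try:
        return int(Path(path).read_text().strip())
    except FileNotFoundError:
        return None


def full_capability_mask(last_cap):
    """Return a bitmask with bits 0..last_cap set."""
    return (1 << (last_cap + 1)) - 1


if __name__ == "__main__":
    last = highest_capability()
    if last is not None:
        print(f"cap_last_cap={last} mask={full_capability_mask(last):#x}")
```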
c8e68512 | 987 | .\" |
c634028a | 988 | .SS File capabilities |
c8e68512 MK |
989 | Since kernel 2.6.24, the kernel supports |
990 | associating capability sets with an executable file using | |
991 | .BR setcap (8). | |
992 | The file capability sets are stored in an extended attribute (see | |
6a65cff8 MK |
993 | .BR setxattr (2) |
994 | and | |
995 | .BR xattr (7)) | |
c8e68512 MK |
996 | named |
997 | .IR "security.capability" . | |
998 | Writing to this extended attribute requires the | |
1ae6b2c7 | 999 | .B CAP_SETFCAP |
fea681da | 1000 | capability. |
c8e68512 | 1001 | The file capability sets, |
cf7a13d4 | 1002 | in conjunction with the capability sets of the thread, |
c8e68512 | 1003 | determine the capabilities of a thread after an |
c930827f | 1004 | .BR execve (2). |
ade303d7 | 1005 | .PP |
c8e68512 | 1006 | The three file capability sets are: |
fea681da | 1007 | .TP |
3dfe7e0d | 1008 | .IR Permitted " (formerly known as " forced ): |
c8e68512 | 1009 | These capabilities are automatically permitted to the thread, |
cf7a13d4 | 1010 | regardless of the thread's inheritable capabilities. |
fea681da | 1011 | .TP |
c8e68512 MK |
1012 | .IR Inheritable " (formerly known as " allowed ): |
1013 | This set is ANDed with the thread's inheritable set to determine which | |
1014 | inheritable capabilities are enabled in the permitted set of | |
1015 | the thread after the | |
1016 | .BR execve (2). | |
1017 | .TP | |
fea681da | 1018 | .IR Effective : |
c8e68512 MK |
1019 | This is not a set, but rather just a single bit. |
1020 | If this bit is set, then during an | |
1021 | .BR execve (2) | |
1022 | all of the new permitted capabilities for the thread are | |
1023 | also raised in the effective set. | |
1024 | If this bit is not set, then after an | |
1025 | .BR execve (2), | |
1026 | none of the new permitted capabilities is in the new effective set. | |
ade303d7 | 1027 | .IP |
c8e68512 | 1028 | Enabling the file effective capability bit implies |
2914a14d | 1029 | that any file permitted or inheritable capability that causes a |
c8e68512 MK |
1030 | thread to acquire the corresponding permitted capability during an |
1031 | .BR execve (2) | |
aca89285 KK |
1032 | (see |
1033 | .I Transformation of capabilities during execve() | |
1034 | below) will also cause the thread to acquire that |
c8e68512 MK |
1035 | capability in its effective set. |
1036 | Therefore, when assigning capabilities to a file | |
1037 | .RB ( setcap (8), | |
1038 | .BR cap_set_file (3), | |
1039 | .BR cap_set_fd (3)), | |
1040 | if we specify the effective flag as being enabled for any capability, | |
1041 | then the effective flag must also be specified as enabled | |
1042 | for all other capabilities for which the corresponding permitted or | |
1043 | inheritable flag is enabled. |
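The consistency rule for the effective flag can be sketched as a small check. The representation here is an assumption made for illustration: each capability name maps to a set of flag letters drawn from 'e', 'i', and 'p', loosely following the textual notation accepted by setcap(8). The function `effective_bit` is hypothetical and is not part of libcap.

```python
def effective_bit(file_caps):
    """Derive the single file effective bit from per-capability flags.

    file_caps maps a capability name to a set of flags from
    {'e', 'i', 'p'}.  Because the file stores only one effective bit,
    the effective flag must be either absent everywhere or present on
    every capability that has a permitted or inheritable flag.
    """
    any_effective = any("e" in flags for flags in file_caps.values())
    if any_effective:
        missing = sorted(
            name
            for name, flags in file_caps.items()
            if flags & {"p", "i"} and "e" not in flags
        )
        if missing:
            raise ValueError(
                "effective flag must also be set for: " + ", ".join(missing)
            )
    return any_effective
```

For example, `effective_bit({"cap_net_raw": {"e", "p"}, "cap_chown": {"p"}})` raises ValueError in this model, since cap_chown is permitted but lacks the effective flag.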
1044 | .\" | |
c281d050 | 1045 | .SS File capability extended attribute versioning |
b6848704 MK |
1046 | To allow extensibility, |
1047 | the kernel supports a scheme to encode a version number inside the | |
1048 | .I security.capability | |
1049 | extended attribute that is used to implement file capabilities. | |
1050 | These version numbers are internal to the implementation, | |
1051 | and not directly visible to user-space applications. | |
1052 | To date, the following versions are supported: | |
1053 | .TP | |
1ae6b2c7 | 1054 | .B VFS_CAP_REVISION_1 |
b6848704 MK |
1055 | This was the original file capability implementation, |
1056 | which supported 32-bit masks for file capabilities. | |
1057 | .TP | |
1058 | .BR VFS_CAP_REVISION_2 " (since Linux 2.6.25)" | |
1059 | .\" commit e338d263a76af78fe8f38a72131188b58fceb591 | |
1060 | This version allows for file capability masks that are 64 bits in size, | |
1061 | and was necessary as the number of supported capabilities grew beyond 32. | |
1062 | The kernel transparently continues to support the execution of files | |
1063 | that have 32-bit version 1 capability masks, | |
1064 | but when adding capabilities to files that did not previously | |
1065 | have capabilities, or modifying the capabilities of existing files, | |
bcaa30c9 MK |
1066 | it automatically uses the version 2 scheme |
1067 | (or possibly the version 3 scheme, as described below). | |
b6848704 MK |
1068 | .TP |
1069 | .BR VFS_CAP_REVISION_3 " (since Linux 4.14)" | |
1070 | .\" commit 8db6c34f1dbc8e06aa016a9b829b06902c3e1340 | |
bcaa30c9 | 1071 | Version 3 file capabilities are provided |
12dce731 | 1072 | to support namespaced file capabilities (described below). |
bcaa30c9 | 1073 | .IP |
b6848704 | 1074 | As with version 2 file capabilities, |
bcaa30c9 MK |
1075 | version 3 capability masks are 64 bits in size. |
1076 | But in addition, the root user ID of the namespace is encoded in the |
b6848704 MK |
1077 | .I security.capability |
1078 | extended attribute. | |
7da0c87a MK |
1079 | (A namespace's root user ID is the value that user ID 0 |
1080 | inside that namespace maps to in the initial user namespace.) | |
7b45f4b2 | 1081 | .IP |
bcaa30c9 MK |
1082 | Version 3 file capabilities are designed to coexist |
1083 | with version 2 capabilities; | |
1084 | that is, on a modern Linux system, | |
1085 | there may be some files with version 2 capabilities | |
1086 | while others have version 3 capabilities. | |
1087 | .PP | |
1088 | Before Linux 4.14, | |
c281d050 MK |
1089 | the only kind of file capability extended attribute |
1090 | that could be attached to a file was a | |
bcaa30c9 | 1091 | .B VFS_CAP_REVISION_2 |
c281d050 | 1092 | attribute. |
bcaa30c9 | 1093 | Since Linux 4.14, |
9b2c207a | 1094 | the version of the |
bcaa30c9 | 1095 | .I security.capability |
9b2c207a MK |
1096 | extended attribute that is attached to a file |
1097 | depends on the circumstances in which the attribute was created. | |
bcaa30c9 | 1098 | .PP |
7b45f4b2 | 1099 | Starting with Linux 4.14, a |
7b45f4b2 MK |
1100 | .I security.capability |
1101 | extended attribute is automatically created as (or converted to) | |
bcaa30c9 MK |
1102 | a version 3 |
1103 | .RB ( VFS_CAP_REVISION_3 ) | |
1104 | attribute if both of the following are true: | |
7b45f4b2 | 1105 | .IP (1) 4 |
ffea2c14 | 1106 | The thread writing the attribute resides in a noninitial user namespace. |
7b45f4b2 MK |
1107 | (More precisely: the thread resides in a user namespace other |
1108 | than the one from which the underlying filesystem was mounted.) | |
1109 | .IP (2) | |
1110 | The thread has the | |
1ae6b2c7 | 1111 | .B CAP_SETFCAP |
7b45f4b2 MK |
1112 | capability over the file inode, |
1113 | meaning that (a) the thread has the | |
1114 | .B CAP_SETFCAP | |
1115 | capability in its own user namespace; | |
1116 | and (b) the UID and GID of the file inode have mappings in | |
1117 | the writer's user namespace. | |
bcaa30c9 | 1118 | .PP |
7b45f4b2 | 1119 | When a |
1ae6b2c7 | 1120 | .B VFS_CAP_REVISION_3 |
7b45f4b2 MK |
1121 | .I security.capability |
1122 | extended attribute is created, the root user ID of the creating thread's | |
1123 | user namespace is saved in the extended attribute. | |
bcaa30c9 | 1124 | .PP |
2c77e8de | 1125 | By contrast, creating or modifying a |
7b45f4b2 MK |
1126 | .I security.capability |
1127 | extended attribute from a privileged | |
1128 | .RB ( CAP_SETFCAP ) | |
1129 | thread that resides in the | |
90ef0f7b | 1130 | namespace where the underlying filesystem was mounted |
7b45f4b2 | 1131 | (this normally means the initial user namespace) |
2c77e8de | 1132 | automatically results in the creation of a version 2 |
bcaa30c9 | 1133 | .RB ( VFS_CAP_REVISION_2 ) |
7b45f4b2 | 1134 | attribute. |
bcaa30c9 | 1135 | .PP |
2c77e8de MK |
1136 | Note that the creation of a version 3 |
1137 | .I security.capability | |
1138 | extended attribute is automatic. | |
1139 | That is to say, when a user-space application writes | |
1140 | .RB ( setxattr (2)) | |
1141 | a | |
1142 | .I security.capability | |
1143 | attribute in the version 2 format, | |
1144 | the kernel will automatically create a version 3 attribute | |
1145 | if the attribute is created in the circumstances described above. | |
1146 | Correspondingly, when a version 3 | |
1147 | .I security.capability | |
1148 | attribute is retrieved | |
1149 | .RB ( getxattr (2)) | |
1150 | by a process that resides inside a user namespace that was created by the | |
1151 | root user ID (or a descendant of that user namespace), | |
1152 | the returned attribute is (automatically) | |
1153 | simplified to appear as a version 2 attribute | |
1154 | (i.e., the returned value is the size of a version 2 attribute and does | |
1155 | not include the root user ID). | |
1156 | These automatic translations mean that no changes are required to | |
1157 | user-space tools (e.g., | |
1158 | .BR setcap (8) |
1159 | and |
1160 | .BR getcap (8)) |
1161 | in order for those tools to be used to create and retrieve version 3 | |
1162 | .I security.capability | |
1163 | attributes. | |
1164 | .PP | |
bcaa30c9 MK |
1165 | Note that a file can have either a version 2 or a version 3 |
1166 | .I security.capability | |
1167 | extended attribute associated with it, but not both: | |
1168 | creation or modification of the | |
1169 | .I security.capability | |
1170 | extended attribute will automatically modify the version | |
1171 | according to the circumstances in which the extended attribute is | |
1172 | created or modified. | |
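To make the three attribute layouts concrete, the following sketch decodes a raw security.capability value. The constants and field order are taken, as best recalled, from the kernel's linux/capability.h UAPI header (a little-endian 32-bit magic word, then pairs of permitted and inheritable words, then a root user ID in version 3). Treat them as assumptions to check against the header. The function name and the returned dictionary are hypothetical.

```python
import struct

# Assumed values from <linux/capability.h>; verify against your headers.
VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001
REVISIONS = {0x01000000: 1, 0x02000000: 2, 0x03000000: 3}


def parse_security_capability(raw):
    """Decode a raw security.capability extended attribute value."""
    if len(raw) < 4:
        raise ValueError("attribute too short")
    (magic_etc,) = struct.unpack_from("<I", raw, 0)
    version = REVISIONS.get(magic_etc & VFS_CAP_REVISION_MASK)
    if version is None:
        raise ValueError("unknown file capability revision")

    # Version 1 has one 32-bit word per set; versions 2 and 3 have two.
    n_words = 1 if version == 1 else 2
    expected = 4 + 8 * n_words + (4 if version == 3 else 0)
    if len(raw) != expected:
        raise ValueError("unexpected attribute size")

    permitted = inheritable = 0
    for i in range(n_words):
        perm, inh = struct.unpack_from("<II", raw, 4 + 8 * i)
        permitted |= perm << (32 * i)
        inheritable |= inh << (32 * i)

    rootid = None
    if version == 3:
        (rootid,) = struct.unpack_from("<I", raw, 4 + 8 * n_words)

    return {
        "version": version,
        "effective": bool(magic_etc & VFS_CAP_FLAGS_EFFECTIVE),
        "permitted": permitted,
        "inheritable": inheritable,
        "rootid": rootid,
    }
```

On Linux, a raw value could be obtained with `os.getxattr(path, "security.capability")`. As noted above, a reader inside the relevant user namespace may be handed a version 3 attribute already simplified to the version 2 form.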
b6848704 | 1173 | .\" |
c634028a | 1174 | .SS Transformation of capabilities during execve() |
c13182ef | 1175 | During an |
c930827f | 1176 | .BR execve (2), |
1e321034 | 1177 | the kernel calculates the new capabilities of |
fea681da | 1178 | the process using the following algorithm: |
ade303d7 | 1179 | .PP |
088a639b | 1180 | .in +4n |
b8302363 | 1181 | .EX |
f04f131f | 1182 | P'(ambient) = (file is privileged) ? 0 : P(ambient) |
6260f4cd | 1183 | |
f04f131f | 1184 | P'(permitted) = (P(inheritable) & F(inheritable)) | |
2e87ced3 | 1185 | (F(permitted) & P(bounding)) | P'(ambient) |
fea681da | 1186 | |
f04f131f | 1187 | P'(effective) = F(effective) ? P'(permitted) : P'(ambient) |
fea681da | 1188 | |
5bdccabd | 1189 | P'(inheritable) = P(inheritable) [i.e., unchanged] |
2e87ced3 MK |
1190 | |
1191 | P'(bounding) = P(bounding) [i.e., unchanged] | |
b8302363 | 1192 | .EE |
088a639b | 1193 | .in |
ade303d7 | 1194 | .PP |
fea681da | 1195 | where: |
c8e68512 | 1196 | .RS 4 |
2e87ced3 | 1197 | .IP P() 6 |
c13182ef | 1198 | denotes the value of a thread capability set before the |
c930827f | 1199 | .BR execve (2) |
2e87ced3 | 1200 | .IP P'() |
8295fc02 | 1201 | denotes the value of a thread capability set after the |
c930827f | 1202 | .BR execve (2) |
2e87ced3 | 1203 | .IP F() |
fea681da | 1204 | denotes a file capability set |
c8e68512 | 1205 | .RE |
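The algorithm above translates directly into bitmask arithmetic. The following is a sketch of that arithmetic only, with capability sets modeled as Python integers. The function name, its parameters, and the returned dictionary are hypothetical. It does not model the special handling of root or the cases in which file capabilities are ignored.

```python
def execve_caps(p_inheritable, p_bounding, p_ambient,
                f_permitted, f_inheritable, f_effective,
                file_is_privileged):
    """Compute the thread capability sets after an execve().

    P() values are the thread's sets before the call and F() values are
    the file's sets.  file_is_privileged is True if the file has
    capabilities or has the set-user-ID or set-group-ID bit set.
    """
    new_ambient = 0 if file_is_privileged else p_ambient
    new_permitted = ((p_inheritable & f_inheritable)
                     | (f_permitted & p_bounding)
                     | new_ambient)
    new_effective = new_permitted if f_effective else new_ambient
    return {
        "ambient": new_ambient,
        "permitted": new_permitted,
        "effective": new_effective,
        "inheritable": p_inheritable,  # unchanged
        "bounding": p_bounding,        # unchanged
    }


CAP_NET_RAW = 1 << 13  # capability number 13

# A file with CAP_NET_RAW in its permitted set and the effective bit on,
# executed by a thread whose bounding set allows that capability.
result = execve_caps(p_inheritable=0, p_bounding=CAP_NET_RAW, p_ambient=0,
                     f_permitted=CAP_NET_RAW, f_inheritable=0,
                     f_effective=True, file_is_privileged=True)
```

In this example the thread ends up with CAP_NET_RAW in both its permitted and effective sets. If the bounding set were empty, both would be empty.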
3375bef1 | 1206 | .PP |
ddc1ad30 MK |
1207 | Note the following details relating to the above capability |
1208 | transformation rules: | |
1209 | .IP * 3 | |
1210 | The ambient capability set is present only since Linux 4.3. | |
1211 | When determining the transformation of the ambient set during | |
1212 | .BR execve (2), | |
1213 | a privileged file is one that has capabilities or | |
3375bef1 | 1214 | has the set-user-ID or set-group-ID bit set. |
ddc1ad30 MK |
1215 | .IP * |
1216 | Prior to Linux 2.6.25, | |
1217 | the bounding set was a system-wide attribute shared by all threads. | |
1218 | That system-wide value was employed to calculate the new permitted set during | |
1219 | .BR execve (2) | |
1220 | in the same manner as shown above for | |
1221 | .IR P(bounding) . | |
ade303d7 | 1222 | .PP |
56cc88cb | 1223 | .IR Note : |
1a9ed17c MK |
1224 | during the capability transitions described above, |
1225 | file capabilities may be ignored (treated as empty) for the same reasons | |
56cc88cb MK |
1226 | that the set-user-ID and set-group-ID bits are ignored; see |
1227 | .BR execve (2). | |
1a9ed17c | 1228 | File capabilities are similarly ignored if the kernel was booted with the |
f6acfeb8 | 1229 | .I no_file_caps |
1a9ed17c | 1230 | option. |
ade303d7 | 1231 | .PP |
e3ed67ed MK |
1232 | .IR Note : |
1233 | according to the rules above, | |
1234 | if a process with nonzero user IDs performs an | |
1235 | .BR execve (2) | |
1236 | then any capabilities that are present in | |
1237 | its permitted and effective sets will be cleared. | |
1238 | For the treatment of capabilities when a process with a | |
1239 | user ID of zero performs an | |
1240 | .BR execve (2), | |
aca89285 KK |
1241 | see |
1242 | .I Capabilities and execution of programs by root | |
1243 | below. | |
c8e68512 | 1244 | .\" |
e0e57837 | 1245 | .SS Safety checking for capability-dumb binaries |
4a866754 | 1246 | A capability-dumb binary is an application that has been |
e0e57837 MK |
1247 | marked to have file capabilities, but has not been converted to use the |
1248 | .BR libcap (3) | |
1249 | API to manipulate its capabilities. | |
1250 | (In other words, this is a traditional set-user-ID-root program | |
1251 | that has been switched to use file capabilities, | |
1252 | but whose code has not been modified to understand capabilities.) | |
2c767761 | 1253 | For such applications, |
e0e57837 MK |
1254 | the effective capability bit is set on the file, |
1255 | so that the file permitted capabilities are automatically | |
1256 | enabled in the process effective set when executing the file. | |
1257 | The kernel recognizes a file which has the effective capability bit set | |
1258 | as capability-dumb for the purpose of the check described here. | |
ade303d7 | 1259 | .PP |
e0e57837 MK |
1260 | When executing a capability-dumb binary, |
1261 | the kernel checks if the process obtained all permitted capabilities | |
1262 | that were specified in the file permitted set, | |
1263 | after the capability transformations described above have been performed. | |
1264 | (The typical reason why this might | |
1265 | .I not | |
1266 | occur is that the capability bounding set masked out some | |
1267 | of the capabilities in the file permitted set.) | |
1268 | If the process did not obtain the full set of | |
1269 | file permitted capabilities, then | |
1270 | .BR execve (2) | |
1271 | fails with the error | |
1272 | .BR EPERM . | |
1273 | This prevents possible security risks that could arise when | |
1274 | a capability-dumb application is executed with less privilege than it needs. |
1275 | Note that, by definition, | |
1276 | the application could not itself recognize this problem, | |
1277 | since it does not employ the | |
1278 | .BR libcap (3) | |
1279 | API. | |
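The check described in this subsection can be modeled as follows. This is a sketch of the logic as stated in the text, not the kernel code: the function name is hypothetical, capability sets are integer bitmasks, and the failure is modeled by raising PermissionError with errno set to EPERM.

```python
import errno

CAP_NET_ADMIN = 1 << 12  # capability number 12
CAP_NET_RAW = 1 << 13    # capability number 13


def check_capability_dumb(f_permitted, f_effective, new_permitted):
    """Fail if a capability-dumb binary did not get all file permitted caps.

    f_effective marks the file as capability-dumb.  new_permitted is the
    thread's permitted set after the execve() transformation.
    """
    missing = f_permitted & ~new_permitted
    if f_effective and missing:
        raise PermissionError(errno.EPERM,
                              "file permitted capabilities not obtained")
    return new_permitted


# The bounding set masks out CAP_NET_ADMIN, so the process would obtain
# only CAP_NET_RAW from a file that asks for both.
f_permitted = CAP_NET_RAW | CAP_NET_ADMIN
p_bounding = CAP_NET_RAW
new_permitted = f_permitted & p_bounding
```

With these values, `check_capability_dumb(f_permitted, True, new_permitted)` raises PermissionError, whereas the same call with the effective bit off returns the reduced permitted set.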
1280 | .\" | |
c8e68512 | 1281 | .SS Capabilities and execution of programs by root |
33d0916f MK |
1282 | .\" See cap_bprm_set_creds(), bprm_caps_from_vfs_cap() and |
1283 | .\" handle_privileged_root() in security/commoncap.c (Linux 5.0 source) | |
bc1950ac | 1284 | In order to mirror traditional UNIX semantics, |
33d0916f MK |
1285 | the kernel performs special treatment of file capabilities when |
1286 | a process with UID 0 (root) executes a program and | |
1287 | when a set-user-ID-root program is executed. | |
bc1950ac | 1288 | .PP |
33d0916f MK |
1289 | After having performed any changes to the process effective ID that |
1290 | were triggered by the set-user-ID mode bit of the binary\(eme.g., | |
1291 | switching the effective user ID to 0 (root) because | |
1292 | a set-user-ID-root program was executed\(emthe | |
1293 | kernel calculates the file capability sets as follows: | |
c8e68512 | 1294 | .IP 1. 3 |
bc1950ac | 1295 | If the real or effective user ID of the process is 0 (root), |
33d0916f MK |
1296 | then the file inheritable and permitted sets are ignored; |
1297 | instead they are notionally considered to be all ones | |
c8e68512 | 1298 | (i.e., all capabilities enabled). |
aca89285 KK |
1299 | (There is one exception to this behavior, described in |
1300 | .I Set-user-ID-root programs that have file capabilities | |
1301 | below.) | |
c8e68512 | 1302 | .IP 2. |
bc1950ac MK |
1303 | If the effective user ID of the process is 0 (root) or |
1304 | the file effective bit is in fact enabled, | |
33d0916f | 1305 | then the file effective bit is notionally defined to be one (enabled). |
3dfe7e0d | 1306 | .PP |
33d0916f MK |
1307 | These notional values for the file's capability sets are then used |
1308 | as described above to calculate the transformation of the process's | |
1309 | capabilities during | |
1310 | .BR execve (2). | |
bc1950ac | 1311 | .PP |
33d0916f | 1312 | Thus, when a process with nonzero UIDs |
c930827f | 1313 | .BR execve (2)s |
33d0916f MK |
1314 | a set-user-ID-root program that does not have capabilities attached, |
1315 | or when a process whose real and effective UIDs are zero | |
ab8aa2e4 | 1316 | .BR execve (2)s |
33d0916f MK |
1317 | a program, the calculation of the process's new |
1318 | permitted capabilities simplifies to: | |
1319 | .PP | |
1320 | .in +4n | |
1321 | .EX | |
1322 | P'(permitted) = P(inheritable) | P(bounding) | |
1323 | ||
1324 | P'(effective) = P'(permitted) | |
1325 | .EE | |
1326 | .in | |
1327 | .PP | |
1328 | Consequently, the process gains all capabilities in its permitted and | |
1329 | effective capability sets, | |
ab8aa2e4 | 1330 | except those masked out by the capability bounding set. |
33d0916f MK |
1331 | (In the calculation of P'(permitted), |
1332 | the P'(ambient) term can be simplified away because it is by | |
1333 | definition a subset of P(inheritable).) |
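The simplification can be checked with a small model. This sketch covers only the case of a process whose real and effective UIDs are both 0 executing a program that has no file capabilities and no set-user-ID or set-group-ID bits. The function name is hypothetical, and `ALL_CAPS` assumes capability numbers 0 to 40 purely for illustration.

```python
ALL_CAPS = (1 << 41) - 1  # assumption: capabilities 0..40
CAP_CHOWN = 1 << 0        # capability number 0
CAP_SYS_MODULE = 1 << 16  # capability number 16


def root_execve(p_inheritable, p_bounding, p_ambient=0, all_caps=ALL_CAPS):
    """Apply the execve() rules with the notional file sets used for root."""
    f_permitted = f_inheritable = all_caps  # rule 1: treated as all ones
    f_effective = True                      # rule 2: effective UID is 0
    new_ambient = p_ambient                 # file is not privileged here
    new_permitted = ((p_inheritable & f_inheritable)
                     | (f_permitted & p_bounding)
                     | new_ambient)
    new_effective = new_permitted if f_effective else new_ambient
    return {"permitted": new_permitted, "effective": new_effective}


# Root with CAP_SYS_MODULE removed from its bounding set.
bounding = ALL_CAPS & ~CAP_SYS_MODULE
sets = root_execve(p_inheritable=CAP_CHOWN, p_bounding=bounding)
```

Here `sets["permitted"]` equals `CAP_CHOWN | bounding`, that is, P(inheritable) | P(bounding), and CAP_SYS_MODULE is absent because the bounding set masks it out.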
ab8aa2e4 | 1334 | .PP |
33d0916f MK |
1335 | The special treatments of user ID 0 (root) described in this subsection |
1336 | can be disabled using the securebits mechanism described below. | |
1337 | .\" | |
0603dda3 MK |
1338 | .\" |
1339 | .SS Set-user-ID-root programs that have file capabilities | |
aca89285 KK |
1340 | There is one exception to the behavior described in |
1341 | .I Capabilities and execution of programs by root | |
1342 | above. | |
33d0916f MK |
1343 | If (a) the binary that is being executed has capabilities attached and |
1344 | (b) the real user ID of the process is | |
1345 | .I not | |
1346 | 0 (root) and | |
1347 | (c) the effective user ID of the process | |
1348 | .I is | |
1349 | 0 (root), then the file capability bits are honored | |
1350 | (i.e., they are not notionally considered to be all ones). | |
1351 | The usual way in which this situation can arise is when executing | |
1352 | a set-user-ID-root program that also has file capabilities. |
1353 | When such a program is executed, | |
1354 | the process gains just the capabilities granted by the program | |
0603dda3 MK |
1355 | (i.e., not all capabilities, |
1356 | as would occur when executing a set-user-ID-root program | |
1357 | that does not have any associated file capabilities). | |
bc1950ac | 1358 | .PP |
c199053b MK |
1359 | Note that one can assign empty capability sets to a program file, |
1360 | and thus it is possible to create a set-user-ID-root program that | |
1361 | changes the effective and saved set-user-ID of the process | |
1362 | that executes the program to 0, | |
1363 | but confers no capabilities to that process. | |
0603dda3 | 1364 | .\" |
c8e68512 MK |
1365 | .SS Capability bounding set |
1366 | The capability bounding set is a security mechanism that can be used | |
1367 | to limit the capabilities that can be gained during an | |
1368 | .BR execve (2). | |
1369 | The bounding set is used in the following ways: | |
1370 | .IP * 2 | |
1371 | During an | |
1372 | .BR execve (2), | |
1373 | the capability bounding set is ANDed with the file permitted | |
1374 | capability set, and the result of this operation is assigned to the | |
1375 | thread's permitted capability set. | |
1376 | The capability bounding set thus places a limit on the permitted | |
1377 | capabilities that may be granted by an executable file. | |
1378 | .IP * | |
1379 | (Since Linux 2.6.25) | |
1380 | The capability bounding set acts as a limiting superset for | |
1381 | the capabilities that a thread can add to its inheritable set using | |
1382 | .BR capset (2). | |