Commit | Line | Data |
---|---|---|
c11b1abf | 1 | .\" Copyright (c) 2002 by Michael Kerrisk <mtk.manpages@gmail.com> |
fea681da | 2 | .\" |
93015253 | 3 | .\" %%%LICENSE_START(VERBATIM) |
fea681da MK |
4 | .\" Permission is granted to make and distribute verbatim copies of this |
5 | .\" manual provided the copyright notice and this permission notice are | |
6 | .\" preserved on all copies. | |
7 | .\" | |
8 | .\" Permission is granted to copy and distribute modified versions of this | |
9 | .\" manual under the conditions for verbatim copying, provided that the | |
10 | .\" entire resulting derived work is distributed under the terms of a | |
11 | .\" permission notice identical to this one. | |
12 | .\" | |
13 | .\" Since the Linux kernel and libraries are constantly changing, this | |
14 | .\" manual page may be incorrect or out-of-date. The author(s) assume no | |
15 | .\" responsibility for errors or omissions, or for damages resulting from | |
10d76543 MK |
16 | .\" the use of the information contained herein. The author(s) may not |
17 | .\" have taken the same level of care in the production of this manual, | |
18 | .\" which is licensed free of charge, as they might when working | |
19 | .\" professionally. | |
fea681da MK |
20 | .\" |
21 | .\" Formatted or processed versions of this manual, if unaccompanied by | |
22 | .\" the source, must acknowledge the copyright and authors of this work. | |
4b72fb64 | 23 | .\" %%%LICENSE_END |
fea681da MK |
24 | .\" |
25 | .\" 6 Aug 2002 - Initial Creation | |
c11b1abf MK |
26 | .\" Modified 2003-05-23, Michael Kerrisk, <mtk.manpages@gmail.com> |
27 | .\" Modified 2004-05-27, Michael Kerrisk, <mtk.manpages@gmail.com> | |
1c1e15ed | 28 | .\" 2004-12-08, mtk Added O_NOATIME for CAP_FOWNER |
5eaee3d9 | 29 | .\" 2005-08-16, mtk, Added CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE |
c8e68512 MK |
30 | .\" 2008-07-15, Serge Hallyn <serue@us.bbm.com> |
31 | .\" Document file capabilities, per-process capability | |
32 | .\" bounding set, changed semantics for CAP_SETPCAP, | |
33 | .\" and other changes in 2.6.2[45]. | |
34 | .\" Add CAP_MAC_ADMIN, CAP_MAC_OVERRIDE, CAP_SETFCAP. | |
35 | .\" 2008-07-15, mtk | |
36 | .\" Add text describing circumstances in which CAP_SETPCAP | |
37 | .\" (theoretically) permits a thread to change the | |
38 | .\" capability sets of another thread. | |
39 | .\" Add section describing rules for programmatically | |
40 | .\" adjusting thread capability sets. | |
41 | .\" Describe rationale for capability bounding set. | |
42 | .\" Document "securebits" flags. | |
43 | .\" Add text noting that if we set the effective flag for one file | |
44 | .\" capability, then we must also set the effective flag for all | |
45 | .\" other capabilities where the permitted or inheritable bit is set. | |
bfb730f9 | 46 | .\" 2011-09-07, mtk/Serge Hallyn: Add CAP_SYSLOG |
5eaee3d9 | 47 | .\" |
3df541c0 | 48 | .TH CAPABILITIES 7 2016-07-17 "Linux" "Linux Programmer's Manual" |
fea681da MK |
49 | .SH NAME |
50 | capabilities \- overview of Linux capabilities | |
51 | .SH DESCRIPTION | |
fea681da | 52 | For the purpose of performing permission checks, |
008f1ecc | 53 | traditional UNIX implementations distinguish two categories of processes: |
fea681da MK |
54 | .I privileged |
55 | processes (whose effective user ID is 0, referred to as superuser or root), | |
56 | and | |
57 | .I unprivileged | |
c7094399 | 58 | processes (whose effective UID is nonzero). |
fea681da MK |
59 | Privileged processes bypass all kernel permission checks, |
60 | while unprivileged processes are subject to full permission | |
61 | checking based on the process's credentials | |
62 | (usually: effective UID, effective GID, and supplementary group list). | |
63 | ||
c13182ef MK |
64 | Starting with kernel 2.2, Linux divides the privileges traditionally |
65 | associated with superuser into distinct units, known as | |
fea681da | 66 | .IR capabilities , |
3dfe7e0d | 67 | which can be independently enabled and disabled. |
cf7a13d4 | 68 | Capabilities are a per-thread attribute. |
c8e68512 | 69 | .\" |
c634028a | 70 | .SS Capabilities list |
c8e68512 MK |
71 | The following list shows the capabilities implemented on Linux, |
72 | and the operations or behaviors that each capability permits: | |
fea681da | 73 | .TP |
45286787 | 74 | .BR CAP_AUDIT_CONTROL " (since Linux 2.6.11)" |
5eaee3d9 MK |
75 | Enable and disable kernel auditing; change auditing filter rules; |
76 | retrieve auditing status and filtering rules. | |
77 | .TP | |
c81cea2c MK |
78 | .BR CAP_AUDIT_READ " (since Linux 3.16)" |
79 | .\" commit a29b694aa1739f9d76538e34ae25524f9c549d59 | |
80 | .\" commit 3a101b8de0d39403b2c7e5c23fd0b005668acf48 | |
81 | Allow reading the audit log via a multicast netlink socket. | |
82 | .TP | |
45286787 | 83 | .BR CAP_AUDIT_WRITE " (since Linux 2.6.11)" |
c8e68512 | 84 | Write records to kernel auditing log. |
5eaee3d9 | 85 | .TP |
9339d749 MK |
86 | .BR CAP_BLOCK_SUSPEND " (since Linux 3.5)" |
87 | Employ features that can block system suspend | |
88 | .RB ( epoll (7) | |
89 | .BR EPOLLWAKEUP , | |
90 | .IR /proc/sys/wake_lock ). | |
91 | .TP | |
fea681da | 92 | .B CAP_CHOWN |
c8e68512 | 93 | Make arbitrary changes to file UIDs and GIDs (see |
fea681da MK |
94 | .BR chown (2)). |
95 | .TP | |
96 | .B CAP_DAC_OVERRIDE | |
97 | Bypass file read, write, and execute permission checks. | |
c8e68512 | 98 | (DAC is an abbreviation of "discretionary access control".) |
fea681da MK |
99 | .TP |
100 | .B CAP_DAC_READ_SEARCH | |
a537062e MK |
101 | .PD 0 |
102 | .RS | |
103 | .IP * 2 | |
fea681da | 104 | Bypass file read permission checks and |
a537062e MK |
105 | directory read and execute permission checks; |
106 | .IP * | |
3bbab71a | 107 | invoke |
a537062e MK |
108 | .BR open_by_handle_at (2). |
109 | .RE | |
110 | .PD | |
fea681da MK |
111 | .TP |
112 | .B CAP_FOWNER | |
c8e68512 MK |
113 | .PD 0 |
114 | .RS | |
115 | .IP * 2 | |
fea681da | 116 | Bypass permission checks on operations that normally |
9ee4a2b6 | 117 | require the filesystem UID of the process to match the UID of |
fea681da MK |
118 | the file (e.g., |
119 | .BR chmod (2), | |
120 | .BR utime (2)), | |
c8e68512 | 121 | excluding those operations covered by |
fea681da MK |
122 | .B CAP_DAC_OVERRIDE |
123 | and | |
124 | .BR CAP_DAC_READ_SEARCH ; | |
c8e68512 | 125 | .IP * |
fea681da MK |
126 | set extended file attributes (see |
127 | .BR chattr (1)) | |
128 | on arbitrary files; | |
c8e68512 | 129 | .IP * |
fea681da | 130 | set Access Control Lists (ACLs) on arbitrary files; |
c8e68512 | 131 | .IP * |
1c1e15ed | 132 | ignore directory sticky bit on file deletion; |
c8e68512 | 133 | .IP * |
1c1e15ed MK |
134 | specify |
135 | .B O_NOATIME | |
136 | for arbitrary files in | |
137 | .BR open (2) | |
138 | and | |
139 | .BR fcntl (2). | |
c8e68512 MK |
140 | .RE |
141 | .PD | |
fea681da MK |
142 | .TP |
143 | .B CAP_FSETID | |
3bbab71a MK |
144 | .PD 0 |
145 | .RS | |
146 | .IP * 2 | |
ed948c28 | 147 | Don't clear set-user-ID and set-group-ID mode |
c8e68512 | 148 | bits when a file is modified; |
3bbab71a | 149 | .IP * |
c8e68512 | 150 | set the set-group-ID bit for a file whose GID does not match |
9ee4a2b6 | 151 | the filesystem or any of the supplementary GIDs of the calling process. |
3bbab71a MK |
152 | .RE |
153 | .PD | |
fea681da MK |
154 | .TP |
155 | .B CAP_IPC_LOCK | |
bea08fec | 156 | .\" FIXME . As at Linux 3.2, there are some strange uses of this capability |
46c73a44 | 157 | .\" in other places; they probably should be replaced with something else. |
c8e68512 | 158 | Lock memory |
fea681da MK |
159 | .RB ( mlock (2), |
160 | .BR mlockall (2), | |
161 | .BR mmap (2), | |
162 | .BR shmctl (2)). | |
163 | .TP | |
164 | .B CAP_IPC_OWNER | |
165 | Bypass permission checks for operations on System V IPC objects. | |
166 | .TP | |
167 | .B CAP_KILL | |
168 | Bypass permission checks for sending signals (see | |
169 | .BR kill (2)). | |
097585ed | 170 | This includes use of the |
c8e68512 | 171 | .BR ioctl (2) |
097585ed | 172 | .B KDSIGACCEPT |
c8e68512 | 173 | operation. |
bea08fec | 174 | .\" FIXME . CAP_KILL also has an effect for threads + setting child |
a7c1e564 MK |
175 | .\" termination signal to other than SIGCHLD: without this |
176 | .\" capability, the termination signal reverts to SIGCHLD | |
c13182ef | 177 | .\" if the child does an exec(). What is the rationale |
a7c1e564 | 178 | .\" for this? |
fea681da | 179 | .TP |
c8e68512 MK |
180 | .BR CAP_LEASE " (since Linux 2.4)" |
181 | Establish leases on arbitrary files (see | |
fea681da MK |
182 | .BR fcntl (2)). |
183 | .TP | |
184 | .B CAP_LINUX_IMMUTABLE | |
c8e68512 MK |
185 | Set the |
186 | .B FS_APPEND_FL | |
fea681da | 187 | and |
c8e68512 MK |
188 | .B FS_IMMUTABLE_FL |
189 | .\" These attributes are now available on ext2, ext3, Reiserfs, XFS, JFS | |
e7e006f2 | 190 | inode flags (see |
fea681da MK |
191 | .BR chattr (1)). |
192 | .TP | |
c8e68512 MK |
193 | .BR CAP_MAC_ADMIN " (since Linux 2.6.25)" |
194 | Allow Mandatory Access Control (MAC) configuration or state changes. | |
195 | Implemented for the Smack Linux Security Module (LSM). | |
196 | .TP | |
197 | .BR CAP_MAC_OVERRIDE " (since Linux 2.6.25)" | |
198 | Override MAC. | |
199 | Implemented for the Smack LSM. | |
200 | .TP | |
201 | .BR CAP_MKNOD " (since Linux 2.4)" | |
202 | Create special files using | |
fea681da MK |
203 | .BR mknod (2). |
204 | .TP | |
205 | .B CAP_NET_ADMIN | |
e87268ec MK |
206 | Perform various network-related operations: |
207 | .PD 0 | |
208 | .RS | |
209 | .IP * 2 | |
210 | interface configuration; | |
211 | .IP * | |
12fe8fd3 | 212 | administration of IP firewall, masquerading, and accounting; |
e87268ec MK |
213 | .IP * |
214 | modify routing tables; | |
215 | .IP * | |
216 | bind to any address for transparent proxying; | |
217 | .IP * | |
218 | set type-of-service (TOS); | |
219 | .IP * | |
220 | clear driver statistics; | |
221 | .IP * | |
222 | set promiscuous mode; | |
223 | .IP * | |
224 | enable multicasting; | |
225 | .IP * | |
226 | use | |
227 | .BR setsockopt (2) | |
228 | to set the following socket options: | |
229 | .BR SO_DEBUG , | |
230 | .BR SO_MARK , | |
231 | .BR SO_PRIORITY | |
232 | (for a priority outside the range 0 to 6), | |
233 | .BR SO_RCVBUFFORCE , | |
234 | and | |
235 | .BR SO_SNDBUFFORCE . | |
236 | .RE | |
237 | .PD | |
fea681da MK |
238 | .TP |
239 | .B CAP_NET_BIND_SERVICE | |
6eb334b2 | 240 | Bind a socket to Internet domain privileged ports |
fea681da MK |
241 | (port numbers less than 1024). |
242 | .TP | |
243 | .B CAP_NET_BROADCAST | |
c8e68512 | 244 | (Unused) Make socket broadcasts, and listen to multicasts. |
fea681da MK |
245 | .TP |
246 | .B CAP_NET_RAW | |
93e9e2d6 MK |
247 | .PD 0 |
248 | .RS | |
249 | .IP * 2 | |
250 | use RAW and PACKET sockets; | |
251 | .IP * | |
252 | bind to any address for transparent proxying. | |
253 | .RE | |
254 | .PD | |
fea681da MK |
255 | .\" Also various IP options and setsockopt(SO_BINDTODEVICE) |
256 | .TP | |
257 | .B CAP_SETGID | |
3bbab71a MK |
258 | .RS |
259 | .PD 0 | |
260 | .IP * 2 | |
c8e68512 | 261 | Make arbitrary manipulations of process GIDs and supplementary GID list; |
3bbab71a | 262 | .IP * |
5bea231d | 263 | forge GID when passing socket credentials via UNIX domain sockets; |
3bbab71a | 264 | .IP * |
5bea231d | 265 | write a group ID mapping in a user namespace (see |
f58fb24f | 266 | .BR user_namespaces (7)). |
3bbab71a MK |
267 | .PD |
268 | .RE | |
fea681da | 269 | .TP |
c8e68512 MK |
270 | .BR CAP_SETFCAP " (since Linux 2.6.24)" |
271 | Set file capabilities. | |
272 | .TP | |
273 | .B CAP_SETPCAP | |
274 | If file capabilities are not supported: | |
275 | grant or remove any capability in the | |
276 | caller's permitted capability set to or from any other process. | |
277 | (This property of | |
278 | .B CAP_SETPCAP | |
279 | is not available when the kernel is configured to support | |
280 | file capabilities, since | |
fea681da | 281 | .B CAP_SETPCAP |
c8e68512 MK |
282 | has entirely different semantics for such kernels.) |
283 | ||
284 | If file capabilities are supported: | |
285 | add any capability from the calling thread's bounding set | |
286 | to its inheritable set; | |
287 | drop capabilities from the bounding set (via | |
288 | .BR prctl (2) | |
289 | .BR PR_CAPBSET_DROP ); | |
290 | make changes to the | |
291 | .I securebits | |
292 | flags. | |
fea681da MK |
293 | .TP |
294 | .B CAP_SETUID | |
3bbab71a MK |
295 | .RS |
296 | .PD 0 | |
297 | .IP * 2 | |
c8e68512 | 298 | Make arbitrary manipulations of process UIDs |
fea681da MK |
299 | .RB ( setuid (2), |
300 | .BR setreuid (2), | |
301 | .BR setresuid (2), | |
302 | .BR setfsuid (2)); | |
3bbab71a | 303 | .IP * |
a7d96776 | 304 | forge UID when passing socket credentials via UNIX domain sockets; |
3bbab71a | 305 | .IP * |
5bea231d | 306 | write a user ID mapping in a user namespace (see |
f58fb24f | 307 | .BR user_namespaces (7)). |
3bbab71a MK |
308 | .PD |
309 | .RE | |
777f5a9e | 310 | .\" FIXME CAP_SETUID also has an effect in exec(); document this. |
fea681da MK |
311 | .TP |
312 | .B CAP_SYS_ADMIN | |
c8e68512 MK |
313 | .PD 0 |
314 | .RS | |
315 | .IP * 2 | |
316 | Perform a range of system administration operations including: | |
fea681da MK |
317 | .BR quotactl (2), |
318 | .BR mount (2), | |
319 | .BR umount (2), | |
1368e847 MK |
320 | .BR swapon (2), |
321 | .BR swapoff (2), | |
fea681da | 322 | .BR sethostname (2), |
f169a862 | 323 | and |
c8e68512 MK |
324 | .BR setdomainname (2); |
325 | .IP * | |
bfb730f9 MK |
326 | perform privileged |
327 | .BR syslog (2) | |
328 | operations (since Linux 2.6.37, | |
329 | .BR CAP_SYSLOG | |
330 | should be used to permit such operations); | |
331 | .IP * | |
c8e68512 | 332 | perform |
c11e3891 MK |
333 | .B VM86_REQUEST_IRQ |
334 | .BR vm86 (2) | |
335 | command; | |
336 | .IP * | |
337 | perform | |
fea681da MK |
338 | .B IPC_SET |
339 | and | |
340 | .B IPC_RMID | |
341 | operations on arbitrary System V IPC objects; | |
c8e68512 | 342 | .IP * |
1a3b63f7 MK |
343 | override |
344 | .B RLIMIT_NPROC | |
345 | resource limit; | |
346 | .IP * | |
fea681da MK |
347 | perform operations on |
348 | .I trusted | |
349 | and | |
350 | .I security | |
351 | Extended Attributes (see | |
89fabe2e | 352 | .BR xattr (7)); |
c8e68512 MK |
353 | .IP * |
354 | use | |
08baa0af | 355 | .BR lookup_dcookie (2); |
c8e68512 | 356 | .IP * |
a1f926b8 MK |
357 | use |
358 | .BR ioprio_set (2) | |
359 | to assign | |
360 | .B IOPRIO_CLASS_RT | |
83ee9237 | 361 | and (before Linux 2.6.25) |
237aa7c5 | 362 | .B IOPRIO_CLASS_IDLE |
a1f926b8 | 363 | I/O scheduling classes; |
c8e68512 | 364 | .IP * |
f5ac5bbf | 365 | forge PID when passing socket credentials via UNIX domain sockets; |
c8e68512 | 366 | .IP * |
fea681da | 367 | exceed |
3dfe7e0d MK |
368 | .IR /proc/sys/fs/file-max , |
369 | the system-wide limit on the number of open files, | |
370 | in system calls that open files (e.g., | |
fea681da MK |
371 | .BR accept (2), |
372 | .BR execve (2), | |
373 | .BR open (2), | |
f169a862 | 374 | .BR pipe (2)); |
c8e68512 | 375 | .IP * |
c13182ef | 376 | employ |
0f807eea MK |
377 | .B CLONE_* |
378 | flags that create new namespaces with | |
a7c1e564 MK |
379 | .BR clone (2) |
380 | and | |
c67d3814 MK |
381 | .BR unshare (2) |
382 | (but, since Linux 3.8, | |
383 | creating user namespaces does not require any capability); | |
c8e68512 | 384 | .IP * |
e4698850 | 385 | call |
0f322ccc MK |
386 | .BR perf_event_open (2); |
387 | .IP * | |
0f322ccc MK |
388 | access privileged |
389 | .I perf | |
390 | event information; | |
2bfe6656 MK |
391 | .IP * |
392 | call | |
c3b49118 MK |
393 | .BR setns (2) |
394 | (requires | |
395 | .B CAP_SYS_ADMIN | |
396 | in the | |
397 | .I target | |
398 | namespace); | |
e4698850 | 399 | .IP * |
0f807eea MK |
400 | call |
401 | .BR fanotify_init (2); | |
402 | .IP * | |
0563f204 MK |
403 | call |
404 | .BR bpf (2); | |
405 | .IP * | |
2cf45b0d | 406 | perform privileged |
a7c1e564 MK |
407 | .B KEYCTL_CHOWN |
408 | and | |
409 | .B KEYCTL_SETPERM | |
410 | .BR keyctl (2) | |
e64e6056 MK |
411 | operations; |
412 | .IP * | |
ba8f381e MK |
413 | use |
414 | .BR ptrace (2) | |
415 | .B PTRACE_SECCOMP_GET_FILTER | |
416 | to dump a tracee's seccomp filters; | |
417 | .IP * | |
e64e6056 MK |
418 | perform |
419 | .BR madvise (2) | |
420 | .B MADV_HWPOISON | |
0f807eea MK |
421 | operation; |
422 | .IP * | |
423 | employ the | |
424 | .B TIOCSTI | |
425 | .BR ioctl (2) | |
426 | to insert characters into the input queue of a terminal other than | |
838ad419 | 427 | the caller's controlling terminal; |
0f807eea | 428 | .IP * |
0f807eea | 429 | employ the obsolete |
51c5c662 | 430 | .BR nfsservctl (2) |
c42221c4 MK |
431 | system call; |
432 | .IP * | |
433 | employ the obsolete | |
0f807eea MK |
434 | .BR bdflush (2) |
435 | system call; | |
436 | .IP * | |
437 | perform various privileged block-device | |
438 | .BR ioctl (2) | |
439 | operations; | |
440 | .IP * | |
9ee4a2b6 | 441 | perform various privileged filesystem |
0f807eea MK |
442 | .BR ioctl (2) |
443 | operations; | |
444 | .IP * | |
fdf41f57 MK |
445 | perform privileged |
446 | .BR ioctl (2) | |
447 | operations on the | |
448 | .IR /dev/random | |
449 | device (see | |
450 | .BR random (4)); | |
451 | .IP * | |
0f807eea | 452 | perform administrative operations on many device drivers. |
c8e68512 MK |
453 | .RE |
454 | .PD | |
fea681da MK |
455 | .TP |
456 | .B CAP_SYS_BOOT | |
c8e68512 | 457 | Use |
08baa0af MK |
458 | .BR reboot (2) |
459 | and | |
460 | .BR kexec_load (2). | |
fea681da MK |
461 | .TP |
462 | .B CAP_SYS_CHROOT | |
c8e68512 | 463 | Use |
fea681da MK |
464 | .BR chroot (2). |
465 | .TP | |
466 | .B CAP_SYS_MODULE | |
3bbab71a MK |
467 | .RS |
468 | .PD 0 | |
469 | .IP * 2 | |
c8e68512 MK |
470 | Load and unload kernel modules |
471 | (see | |
fea681da MK |
472 | .BR init_module (2) |
473 | and | |
c8e68512 | 474 | .BR delete_module (2)); |
3bbab71a | 475 | .IP * |
c8e68512 MK |
476 | in kernels before 2.6.25: |
477 | drop capabilities from the system-wide capability bounding set. | |
3bbab71a MK |
478 | .PD |
479 | .RE | |
fea681da MK |
480 | .TP |
481 | .B CAP_SYS_NICE | |
c8e68512 MK |
482 | .PD 0 |
483 | .RS | |
484 | .IP * 2 | |
485 | Raise process nice value | |
fea681da MK |
486 | .RB ( nice (2), |
487 | .BR setpriority (2)) | |
c8e68512 MK |
488 | and change the nice value for arbitrary processes; |
489 | .IP * | |
490 | set real-time scheduling policies for calling process, | |
491 | and set scheduling policies and priorities for arbitrary processes | |
fea681da | 492 | .RB ( sched_setscheduler (2), |
f96787ab MK |
493 | .BR sched_setparam (2), |
494 | .BR sched_setattr (2)); | |
c8e68512 | 495 | .IP * |
fea681da | 496 | set CPU affinity for arbitrary processes |
c13182ef | 497 | .RB ( sched_setaffinity (2)); |
c8e68512 | 498 | .IP * |
a1f926b8 | 499 | set I/O scheduling class and priority for arbitrary processes |
c13182ef | 500 | .RB ( ioprio_set (2)); |
c8e68512 MK |
501 | .IP * |
502 | apply | |
a1f926b8 | 503 | .BR migrate_pages (2) |
c8e68512 | 504 | to arbitrary processes and allow processes |
a1f926b8 | 505 | to be migrated to arbitrary nodes; |
c13182ef | 506 | .\" FIXME CAP_SYS_NICE also has the following effect for |
a1f926b8 MK |
507 | .\" migrate_pages(2): |
508 | .\" do_migrate_pages(mm, &old, &new, | |
509 | .\" capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE); | |
1a0fbe37 | 510 | .\" |
bea08fec | 511 | .\" Document this. |
c8e68512 MK |
512 | .IP * |
513 | apply | |
a7c1e564 | 514 | .BR move_pages (2) |
c8e68512 MK |
515 | to arbitrary processes; |
516 | .IP * | |
4d62f7b6 MK |
517 | use the |
518 | .B MPOL_MF_MOVE_ALL | |
c13182ef | 519 | flag with |
a7c1e564 | 520 | .BR mbind (2) |
c13182ef | 521 | and |
a7c1e564 | 522 | .BR move_pages (2). |
c8e68512 MK |
523 | .RE |
524 | .PD | |
fea681da MK |
525 | .TP |
526 | .B CAP_SYS_PACCT | |
c8e68512 | 527 | Use |
fea681da MK |
528 | .BR acct (2). |
529 | .TP | |
530 | .B CAP_SYS_PTRACE | |
eb64a9cb MK |
531 | .PD 0 |
532 | .RS | |
de6a5c05 | 533 | .IP * 2 |
c8e68512 | 534 | Trace arbitrary processes using |
cbd7b9bf | 535 | .BR ptrace (2); |
eb64a9cb | 536 | .IP * |
cbd7b9bf MK |
537 | apply |
538 | .BR get_robust_list (2) | |
38b6e5b0 | 539 | to arbitrary processes; |
eb64a9cb | 540 | .IP * |
b8f84ce2 MK |
541 | transfer data to or from the memory of arbitrary processes using |
542 | .BR process_vm_readv (2) | |
543 | and | |
3bbab71a | 544 | .BR process_vm_writev (2); |
b8f84ce2 | 545 | .IP * |
38b6e5b0 MK |
546 | inspect processes using |
547 | .BR kcmp (2). | |
eb64a9cb MK |
548 | .RE |
549 | .PD | |
fea681da MK |
550 | .TP |
551 | .B CAP_SYS_RAWIO | |
4637c8cb MK |
552 | .PD 0 |
553 | .RS | |
554 | .IP * 2 | |
c8e68512 | 555 | Perform I/O port operations |
fea681da MK |
556 | .RB ( iopl (2) |
557 | and | |
558 | .BR ioperm (2)); | |
4637c8cb | 559 | .IP * |
fea681da | 560 | access |
474e1f9d | 561 | .IR /proc/kcore ; |
4637c8cb | 562 | .IP * |
474e1f9d MK |
563 | employ the |
564 | .B FIBMAP | |
565 | .BR ioctl (2) | |
4637c8cb MK |
566 | operation; |
567 | .IP * | |
568 | open devices for accessing x86 model-specific registers (MSRs, see | |
3bbab71a | 569 | .BR msr (4)); |
4637c8cb MK |
570 | .IP * |
571 | update | |
572 | .IR /proc/sys/vm/mmap_min_addr ; | |
573 | .IP * | |
574 | create memory mappings at addresses below the value specified by | |
575 | .IR /proc/sys/vm/mmap_min_addr ; | |
576 | .IP * | |
50b2aa27 | 577 | map files in |
cef53f3e | 578 | .IR /proc/bus/pci ; |
4637c8cb MK |
579 | .IP * |
580 | open | |
581 | .IR /dev/mem | |
582 | and | |
583 | .IR /dev/kmem ; | |
584 | .IP * | |
585 | perform various SCSI device commands; | |
586 | .IP * | |
587 | perform certain operations on | |
588 | .BR hpsa (4) | |
589 | and | |
590 | .BR cciss (4) | |
591 | devices; | |
592 | .IP * | |
593 | perform a range of device-specific operations on other devices. | |
594 | .RE | |
595 | .PD | |
fea681da MK |
596 | .TP |
597 | .B CAP_SYS_RESOURCE | |
c8e68512 MK |
598 | .PD 0 |
599 | .RS | |
600 | .IP * 2 | |
9ee4a2b6 | 601 | Use reserved space on ext2 filesystems; |
c8e68512 MK |
602 | .IP * |
603 | make | |
fea681da MK |
604 | .BR ioctl (2) |
605 | calls controlling ext3 journaling; | |
c8e68512 MK |
606 | .IP * |
607 | override disk quota limits; | |
608 | .IP * | |
609 | increase resource limits (see | |
fea681da | 610 | .BR setrlimit (2)); |
c8e68512 MK |
611 | .IP * |
612 | override | |
fea681da | 613 | .B RLIMIT_NPROC |
c8e68512 MK |
614 | resource limit; |
615 | .IP * | |
aa66392d MK |
616 | override maximum number of consoles on console allocation; |
617 | .IP * | |
618 | override maximum number of keymaps; | |
619 | .IP * | |
620 | allow more than 64 Hz interrupts from the real-time clock; | |
621 | .IP * | |
c8e68512 | 622 | raise |
fea681da | 623 | .I msg_qbytes |
c8e68512 | 624 | limit for a System V message queue above the limit in |
0daa9e92 | 625 | .I /proc/sys/kernel/msgmnb |
fea681da MK |
626 | (see |
627 | .BR msgop (2) | |
628 | and | |
ad7b0f91 MK |
629 | .BR msgctl (2)); |
630 | .IP * | |
631 | override the | |
632 | .I /proc/sys/fs/pipe-max-size | |
633 | limit when setting the capacity of a pipe using the | |
634 | .B F_SETPIPE_SZ | |
635 | .BR fcntl (2) | |
636 | command; | |
46883521 MK |
637 | .IP * |
638 | use | |
639 | .BR F_SETPIPE_SZ | |
640 | to increase the capacity of a pipe above the limit specified by | |
b39a2012 MK |
641 | .IR /proc/sys/fs/pipe-max-size ; |
642 | .IP * | |
643 | override | |
644 | .I /proc/sys/fs/mqueue/queues_max | |
645 | limit when creating POSIX message queues (see | |
ecc1f45b MK |
646 | .BR mq_overview (7)); |
647 | .IP * | |
3bbab71a | 648 | employ the |
ecc1f45b MK |
649 | .BR prctl (2) |
650 | .B PR_SET_MM | |
8ddcc591 | 651 | operation; |
41f00272 | 652 | .IP * |
8ddcc591 | 653 | set |
750653a8 | 654 | .IR /proc/[pid]/oom_score_adj |
8ddcc591 MK |
655 | to a value lower than the value last set by a process with |
656 | .BR CAP_SYS_RESOURCE . | |
c8e68512 MK |
657 | .RE |
658 | .PD | |
fea681da MK |
659 | .TP |
660 | .B CAP_SYS_TIME | |
c8e68512 | 661 | Set system clock |
fea681da MK |
662 | .RB ( settimeofday (2), |
663 | .BR stime (2), | |
664 | .BR adjtimex (2)); | |
c8e68512 | 665 | set real-time (hardware) clock. |
fea681da MK |
666 | .TP |
667 | .B CAP_SYS_TTY_CONFIG | |
c8e68512 | 668 | Use |
749ac769 MK |
669 | .BR vhangup (2); |
670 | employ various privileged | |
671 | .BR ioctl (2) | |
672 | operations on virtual terminals. | |
bfb730f9 MK |
673 | .TP |
674 | .BR CAP_SYSLOG " (since Linux 2.6.37)" | |
5f94327c MK |
675 | .RS |
676 | .PD 0 | |
de6a5c05 | 677 | .IP * 2 |
bfb730f9 MK |
678 | Perform privileged |
679 | .BR syslog (2) | |
680 | operations. | |
681 | See | |
682 | .BR syslog (2) | |
683 | for information on which operations require privilege. | |
10fe5485 MK |
684 | .IP * |
685 | View kernel addresses exposed via | |
686 | .I /proc | |
687 | and other interfaces when | |
688 | .IR /proc/sys/kernel/kptr_restrict | |
689 | has the value 1. | |
4eaa04c5 | 690 | (See the discussion of |
10fe5485 MK |
691 | .I kptr_restrict |
692 | in | |
693 | .BR proc (5).) | |
5f94327c MK |
694 | .PD |
695 | .RE | |
d6b08708 MK |
696 | .TP |
697 | .BR CAP_WAKE_ALARM " (since Linux 3.0)" | |
698 | Trigger something that will wake up the system (set | |
699 | .B CLOCK_REALTIME_ALARM | |
700 | and | |
701 | .B CLOCK_BOOTTIME_ALARM | |
702 | timers). | |
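
As a rough illustration of how the capabilities listed above are represented, the following Python sketch decodes a capability bit mask into names. The bit numbers are taken from the kernel header linux/capability.h. The helper names decode_caps() and effective_caps_of() are hypothetical, and reading the CapEff field of /proc/[pid]/status is an assumption about where such a mask would come from; that file is not discussed in this section.

```python
# Capability names indexed by bit number, as defined in <linux/capability.h>.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ",
]


def decode_caps(mask):
    """Return the names of the capabilities whose bits are set in mask."""
    names = []
    bit = 0
    while mask:
        if mask & 1:
            # Bits beyond the table belong to capabilities added after
            # this list was written; report them by number.
            if bit < len(CAP_NAMES):
                names.append(CAP_NAMES[bit])
            else:
                names.append("cap_%d" % bit)
        mask >>= 1
        bit += 1
    return names


def effective_caps_of(pid="self"):
    """Decode the CapEff line of /proc/<pid>/status (hypothetical helper)."""
    with open("/proc/%s/status" % pid) as f:
        for line in f:
            if line.startswith("CapEff:"):
                return decode_caps(int(line.split()[1], 16))
    return []
```

For instance, decode_caps(1 << 10) yields a list containing only CAP_NET_BIND_SERVICE, the capability needed to bind to ports below 1024.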
c8e68512 | 703 | .\" |
c634028a | 704 | .SS Past and current implementation |
c8e68512 MK |
705 | A full implementation of capabilities requires that: |
706 | .IP 1. 3 | |
707 | For all privileged operations, | |
708 | the kernel must check whether the thread has the required | |
709 | capability in its effective set. | |
710 | .IP 2. | |
137d81b5 | 711 | The kernel must provide system calls allowing a thread's capability sets to |
c8e68512 MK |
712 | be changed and retrieved. |
713 | .IP 3. | |
9ee4a2b6 | 714 | The filesystem must support attaching capabilities to an executable file, |
c8e68512 MK |
715 | so that a process gains those capabilities when the file is executed. |
716 | .PP | |
717 | Before kernel 2.6.24, only the first two of these requirements are met; | |
718 | since kernel 2.6.24, all three requirements are met. | |
719 | .\" | |
c634028a | 720 | .SS Thread capability sets |
cf7a13d4 | 721 | Each thread has three capability sets containing zero or more |
fea681da MK |
722 | of the above capabilities: |
723 | .TP | |
fea681da | 724 | .IR Permitted : |
c8e68512 MK |
725 | This is a limiting superset for the effective |
726 | capabilities that the thread may assume. | |
727 | It is also a limiting superset for the capabilities that | |
728 | may be added to the inheritable set by a thread that does not have the | |
729 | .B CAP_SETPCAP | |
730 | capability in its effective set. | |
731 | ||
cf7a13d4 | 732 | If a thread drops a capability from its permitted set, |
3b777aff | 733 | it can never reacquire that capability (unless it |
c930827f | 734 | .BR execve (2)s |
c8e68512 MK |
735 | either a set-user-ID-root program, or |
736 | a program whose associated file capabilities grant that capability). | |
fea681da | 737 | .TP |
c8e68512 MK |
738 | .IR Inheritable : |
739 | This is a set of capabilities preserved across an | |
fea681da | 740 | .BR execve (2). |
6260f4cd AL |
741 | Inheritable capabilities remain inheritable when executing any program, |
742 | and inheritable capabilities are added to the permitted set when executing | |
743 | a program that has the corresponding bits set in the file inheritable set. | |
744 | .IP | |
745 | Because inheritable capabilities are not generally preserved across | |
746 | .BR execve (2) | |
747 | when running as a non-root user, applications that wish to run helper | |
e574dcd0 MK |
748 | programs with elevated capabilities should consider using |
749 | ambient capabilities, described below. | |
c8e68512 MK |
750 | .TP |
751 | .IR Effective : | |
752 | This is the set of capabilities used by the kernel to | |
753 | perform permission checks for the thread. | |
6260f4cd AL |
754 | .TP |
755 | .IR Ambient " (since Linux 4.3):" | |
e574dcd0 | 756 | .\" commit 58319057b7847667f0c9585b9de0e8932b0fdb08 |
6260f4cd AL |
757 | This is a set of capabilities that are preserved across an |
758 | .BR execve (2) | |
3375bef1 | 759 | of a program that is not privileged. |
e574dcd0 MK |
760 | The ambient capability set obeys the invariant that no capability |
761 | can ever be ambient if it is not both permitted and inheritable. | |
3375bef1 MK |
762 | |
763 | The ambient capability set can be directly modified using | |
764 | .BR prctl (2). | |
765 | Ambient capabilities are automatically lowered if either of | |
766 | the corresponding permitted or inheritable capabilities is lowered. | |
767 | ||
768 | Executing a program that changes UID or GID due to the | |
769 | set-user-ID or set-group-ID bits or executing a program that has | |
770 | any file capabilities set will clear the ambient set. | |
771 | Ambient capabilities are added to the permitted set and | |
772 | assigned to the effective set when | |
6260f4cd | 773 | .BR execve (2) |
e574dcd0 | 774 | is called. |
fea681da | 775 | .PP |
fea681da MK |
776 | A child created via |
777 | .BR fork (2) | |
778 | inherits copies of its parent's capability sets. | |
3dfe7e0d | 779 | See below for a discussion of the treatment of capabilities during |
c930827f | 780 | .BR execve (2). |
fea681da MK |
781 | .PP |
782 | Using | |
783 | .BR capset (2), | |
c8e68512 | 784 | a thread may manipulate its own capability sets (see below). |
afae50e4 MK |
785 | .PP |
786 | Since Linux 3.2, the file | |
787 | .I /proc/sys/kernel/cap_last_cap | |
a60b1f03 | 788 | .\" commit 73efc0394e148d0e15583e13712637831f926720 |
afae50e4 MK |
789 | exposes the numerical value of the highest capability |
790 | supported by the running kernel; | |
791 | this can be used to determine the highest bit | |
792 | that may be set in a capability set. | |
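
A minimal sketch of using cap_last_cap, assuming Python and two hypothetical helper names, read_last_cap() and valid_cap_mask(): the first reads the file, and the second builds a mask covering every capability bit from 0 up to the value read.

```python
def valid_cap_mask(last_cap):
    """Return a mask with bits 0..last_cap set."""
    return (1 << (last_cap + 1)) - 1


def read_last_cap(path="/proc/sys/kernel/cap_last_cap"):
    """Return the highest capability number supported by the kernel.

    Returns None if the file cannot be read or parsed, for example on
    kernels before Linux 3.2 or on a non-Linux system.
    """
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    last = read_last_cap()
    if last is None:
        print("cap_last_cap is not available")
    else:
        print("highest capability: %d, mask: %#x" % (last, valid_cap_mask(last)))
```

A caller could use the mask to reject or ignore bits that the running kernel does not know about before building a capability set.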
c8e68512 | 793 | .\" |
c634028a | 794 | .SS File capabilities |
c8e68512 MK |
795 | Since kernel 2.6.24, the kernel supports |
796 | associating capability sets with an executable file using | |
797 | .BR setcap (8). | |
798 | The file capability sets are stored in an extended attribute (see | |
799 | .BR setxattr (2)) | |
800 | named | |
801 | .IR "security.capability" . | |
802 | Writing to this extended attribute requires the | |
803 | .BR CAP_SETFCAP | |
fea681da | 804 | capability. |
c8e68512 | 805 | The file capability sets, |
cf7a13d4 | 806 | in conjunction with the capability sets of the thread, |
c8e68512 | 807 | determine the capabilities of a thread after an |
c930827f | 808 | .BR execve (2). |
c8e68512 MK |
809 | |
810 | The three file capability sets are: | |
fea681da | 811 | .TP |
3dfe7e0d | 812 | .IR Permitted " (formerly known as " forced ): |
c8e68512 | 813 | These capabilities are automatically permitted to the thread, |
cf7a13d4 | 814 | regardless of the thread's inheritable capabilities. |
fea681da | 815 | .TP |
c8e68512 MK |
816 | .IR Inheritable " (formerly known as " allowed ): |
817 | This set is ANDed with the thread's inheritable set to determine which | |
818 | inheritable capabilities are enabled in the permitted set of | |
819 | the thread after the | |
820 | .BR execve (2). | |
821 | .TP | |
fea681da | 822 | .IR Effective : |
c8e68512 MK |
823 | This is not a set, but rather just a single bit. |
824 | If this bit is set, then during an | |
825 | .BR execve (2) | |
826 | all of the new permitted capabilities for the thread are | |
827 | also raised in the effective set. | |
828 | If this bit is not set, then after an | |
829 | .BR execve (2), | |
830 | none of the new permitted capabilities is in the new effective set. | |
831 | ||
832 | Enabling the file effective capability bit implies | |
2914a14d | 833 | that any file permitted or inheritable capability that causes a |
c8e68512 MK |
834 | thread to acquire the corresponding permitted capability during an |
835 | .BR execve (2) | |
e33a08e1 | 836 | (see the transformation rules described below) will also cause the thread to acquire that |
c8e68512 MK |
837 | capability in its effective set. |
838 | Therefore, when assigning capabilities to a file | |
839 | .RB ( setcap (8), | |
840 | .BR cap_set_file (3), | |
841 | .BR cap_set_fd (3)), | |
842 | if we specify the effective flag as being enabled for any capability, | |
843 | then the effective flag must also be specified as enabled | |
844 | for all other capabilities for which the corresponding permitted or | |
845 | inheritable flag is enabled. | |
1f601b1c MK |
846 | .RE |
847 | ||
1a0dff18 MK |
848 | File capability sets are ignored if the executable file |
849 | resides on a filesystem mounted with the | |
850 | .B nosuid | |
851 | option (see | |
852 | .BR mount (2) | |
853 | and | |
854 | .BR mount (8)). | |
c8e68512 | 855 | .\" |
c634028a | 856 | .SS Transformation of capabilities during execve() |
fea681da | 857 | .PP |
c13182ef | 858 | During an |
c930827f | 859 | .BR execve (2), |
1e321034 | 860 | the kernel calculates the new capabilities of |
fea681da | 861 | the process using the following algorithm: |
088a639b | 862 | .in +4n |
fea681da MK |
863 | .nf |
864 | ||
3375bef1 | 865 | P'(ambient) = (file is privileged) ? 0 : P(ambient) |
6260f4cd | 866 | |
c13182ef | 867 | P'(permitted) = (P(inheritable) & F(inheritable)) | |
6260f4cd | 868 | (F(permitted) & cap_bset) | P'(ambient) |
fea681da | 869 | |
6260f4cd | 870 | P'(effective) = F(effective) ? P'(permitted) : P'(ambient) |
fea681da | 871 | |
5bdccabd | 872 | P'(inheritable) = P(inheritable) [i.e., unchanged] |
fea681da MK |
873 | |
874 | .fi | |
088a639b | 875 | .in |
fea681da | 876 | where: |
c8e68512 | 877 | .RS 4 |
fea681da | 878 | .IP P 10 |
c13182ef | 879 | denotes the value of a thread capability set before the |
c930827f | 880 | .BR execve (2) |
c8e68512 | 881 | .IP P' |
8295fc02 | 882 | denotes the value of a thread capability set after the |
c930827f | 883 | .BR execve (2) |
c8e68512 | 884 | .IP F |
fea681da | 885 | denotes a file capability set |
c8e68512 MK |
886 | .IP cap_bset |
887 | is the value of the capability bounding set (described below). | |
888 | .RE | |
3375bef1 MK |
889 | .PP |
890 | A privileged file is one that has capabilities or | |
891 | has the set-user-ID or set-group-ID bit set. | |
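The transformation rules above can be read as plain bitmask arithmetic. The following is a minimal sketch only, not kernel code: the names ThreadCaps, FileCaps, and transform_on_execve are hypothetical, each capability set is modeled as a Python integer with one bit per capability, and the two capability constants are included purely for illustration.

```python
from typing import NamedTuple

# Illustrative bit positions (assumed here for the example only).
CAP_NET_BIND_SERVICE = 1 << 10
CAP_NET_RAW = 1 << 13


class ThreadCaps(NamedTuple):
    """Hypothetical model of a thread's capability sets (P or P')."""
    permitted: int
    inheritable: int
    effective: int
    ambient: int


class FileCaps(NamedTuple):
    """Hypothetical model of the file capability sets (F)."""
    permitted: int
    inheritable: int
    effective: bool   # a single bit, not a set
    privileged: bool  # file has capabilities, or set-user-ID/set-group-ID bit


def transform_on_execve(p: ThreadCaps, f: FileCaps, cap_bset: int) -> ThreadCaps:
    """Apply the four rules listed above and return P'."""
    # P'(ambient) = (file is privileged) ? 0 : P(ambient)
    new_ambient = 0 if f.privileged else p.ambient
    # P'(permitted) = (P(inheritable) & F(inheritable)) |
    #                 (F(permitted) & cap_bset) | P'(ambient)
    new_permitted = ((p.inheritable & f.inheritable)
                     | (f.permitted & cap_bset)
                     | new_ambient)
    # P'(effective) = F(effective) ? P'(permitted) : P'(ambient)
    new_effective = new_permitted if f.effective else new_ambient
    # P'(inheritable) = P(inheritable), i.e. unchanged
    return ThreadCaps(new_permitted, p.inheritable, new_effective, new_ambient)
```

For example, if the bounding set passed as cap_bset lacks a capability that is present in the file permitted set, that capability does not appear in the resulting permitted set unless it arrives through the inheritable or ambient terms.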
c8e68512 | 892 | .\" |
e0e57837 | 893 | .SS Safety checking for capability-dumb binaries |
4a866754 | 894 | A capability-dumb binary is an application that has been |
e0e57837 MK |
895 | marked to have file capabilities, but has not been converted to use the |
896 | .BR libcap (3) | |
897 | API to manipulate its capabilities. | |
898 | (In other words, this is a traditional set-user-ID-root program | |
899 | that has been switched to use file capabilities, | |
900 | but whose code has not been modified to understand capabilities.) | |
2c767761 | 901 | For such applications, |
e0e57837 MK |
902 | the effective capability bit is set on the file, |
903 | so that the file permitted capabilities are automatically | |
904 | enabled in the process effective set when executing the file. | |
905 | The kernel recognizes a file which has the effective capability bit set | |
906 | as capability-dumb for the purpose of the check described here. | |
907 | ||
908 | When executing a capability-dumb binary, | |
909 | the kernel checks if the process obtained all permitted capabilities | |
910 | that were specified in the file permitted set, | |
911 | after the capability transformations described above have been performed. | |
912 | (The typical reason why this might | |
913 | .I not | |
914 | occur is that the capability bounding set masked out some | |
915 | of the capabilities in the file permitted set.) | |
916 | If the process did not obtain the full set of | |
917 | file permitted capabilities, then | |
918 | .BR execve (2) | |
919 | fails with the error | |
920 | .BR EPERM . | |
921 | This prevents possible security risks that could arise when | |
922 | a capability-dumb application is executed with less privilege than it needs. | |
923 | Note that, by definition, | |
924 | the application could not itself recognize this problem, | |
925 | since it does not employ the | |
926 | .BR libcap (3) | |
927 | API. | |
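The check described in this section amounts to a subset test on bitmasks. The sketch below is illustrative only: the function name passes_dumb_binary_check is hypothetical, capability sets are modeled as Python integers, and the real kernel logic may take further conditions into account.

```python
def passes_dumb_binary_check(new_permitted: int,
                             file_permitted: int,
                             file_effective: bool) -> bool:
    """Return False where the text above says execve(2) fails with EPERM.

    new_permitted  -- the thread's permitted set after the transformation
    file_permitted -- the file permitted set
    file_effective -- the file effective bit; when set, the file is
                      treated as capability-dumb for this check
    """
    if not file_effective:
        # The check applies only to files with the effective bit set.
        return True
    # Every capability in the file permitted set must have been obtained.
    return (new_permitted & file_permitted) == file_permitted
```

A result of False corresponds to the case where, for instance, the capability bounding set masked out part of the file permitted set.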
928 | .\" | |
c8e68512 MK |
929 | .SS Capabilities and execution of programs by root |
930 | In order to provide an all-powerful | |
931 | .I root | |
932 | using capability sets, during an | |
933 | .BR execve (2): | |
934 | .IP 1. 3 | |
935 | If a set-user-ID-root program is being executed, | |
936 | or the real user ID of the process is 0 (root), | |
937 | then the file inheritable and permitted sets are defined to be all ones | |
938 | (i.e., all capabilities enabled). | |
939 | .IP 2. | |
940 | If a set-user-ID-root program is being executed, | |
941 | then the file effective bit is defined to be one (enabled). | |
3dfe7e0d | 942 | .PP |
c8e68512 MK |
943 | The upshot of the above rules, |
944 | combined with the capabilities transformations described above, | |
945 | is that when a process | |
c930827f | 946 | .BR execve (2)s |
3dfe7e0d | 947 | a set-user-ID-root program, or when a process with an effective UID of 0 |
c930827f | 948 | .BR execve (2)s |
3dfe7e0d | 949 | a program, |
c13182ef | 950 | it gains all capabilities in its permitted and effective capability sets, |
c8e68512 | 951 | except those masked out by the capability bounding set. |
c7094399 | 952 | .\" If a process with real UID 0, and nonzero effective UID does an |
c8e68512 | 953 | .\" exec(), then it gets all capabilities in its |
35fb7de5 | 954 | .\" permitted set, and no effective capabilities |
3dfe7e0d | 955 | This provides semantics that are the same as those provided by |
008f1ecc | 956 | traditional UNIX systems. |
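The two numbered rules can be sketched as a small adjustment applied to the file capability sets before the transformation takes place. This is a hedged illustration that follows the wording of the rules above literally; the function name file_caps_for_root and the all_caps parameter (a mask with every supported capability bit set) are assumptions made for the example, and actual kernel behavior may differ in detail.

```python
from typing import Tuple


def file_caps_for_root(file_permitted: int,
                       file_inheritable: int,
                       file_effective: bool,
                       setuid_root: bool,
                       real_uid: int,
                       all_caps: int) -> Tuple[int, int, bool]:
    """Return the (permitted, inheritable, effective) file values
    after applying rules 1 and 2 as worded above."""
    # Rule 1: a set-user-ID-root program, or a real user ID of 0,
    # makes the file inheritable and permitted sets all ones.
    if setuid_root or real_uid == 0:
        file_permitted = all_caps
        file_inheritable = all_caps
    # Rule 2: a set-user-ID-root program has the file effective bit enabled.
    if setuid_root:
        file_effective = True
    return file_permitted, file_inheritable, file_effective
```

Feeding the result for a set-user-ID-root program into the transformation rules yields a permitted and effective set containing every capability that the bounding set does not mask out.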
c8e68512 MK |
957 | .SS Capability bounding set |
958 | The capability bounding set is a security mechanism that can be used | |
959 | to limit the capabilities that can be gained during an | |
960 | .BR execve (2). | |
961 | The bounding set is used in the following ways: | |
962 | .IP * 2 | |
963 | During an | |
964 | .BR execve (2), | |
965 | the capability bounding set is ANDed with the file permitted | |
966 | capability set, and the result of this operation is assigned to the | |
967 | thread's permitted capability set. | |
968 | The capability bounding set thus places a limit on the permitted | |
969 | capabilities that may be granted by an executable file. | |
970 | .IP * | |
971 | (Since Linux 2.6.25) | |
972 | The capability bounding set acts as a limiting superset for | |
973 | the capabilities that a thread can add to its inheritable set using | |
974 | .BR capset (2). | |