git.ipfire.org Git - thirdparty/man-pages.git/blame - man7/capabilities.7
Many pages: Fix style issues reported by `make lint-groff`
Commit Line Data
c11b1abf 1.\" Copyright (c) 2002 by Michael Kerrisk <mtk.manpages@gmail.com>
fea681da 2.\"
5fbde956 3.\" SPDX-License-Identifier: Linux-man-pages-copyleft
fea681da
MK
4.\"
5.\" 6 Aug 2002 - Initial Creation
c11b1abf
MK
6.\" Modified 2003-05-23, Michael Kerrisk, <mtk.manpages@gmail.com>
7.\" Modified 2004-05-27, Michael Kerrisk, <mtk.manpages@gmail.com>
1c1e15ed 8.\" 2004-12-08, mtk Added O_NOATIME for CAP_FOWNER
5eaee3d9 9.\" 2005-08-16, mtk, Added CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE
c8e68512
MK
10.\" 2008-07-15, Serge Hallyn <serue@us.bbm.com>
11.\" Document file capabilities, per-process capability
12.\" bounding set, changed semantics for CAP_SETPCAP,
13.\" and other changes in 2.6.2[45].
14.\" Add CAP_MAC_ADMIN, CAP_MAC_OVERRIDE, CAP_SETFCAP.
15.\" 2008-07-15, mtk
16.\" Add text describing circumstances in which CAP_SETPCAP
17.\" (theoretically) permits a thread to change the
18.\" capability sets of another thread.
19.\" Add section describing rules for programmatically
20.\" adjusting thread capability sets.
21.\" Describe rationale for capability bounding set.
22.\" Document "securebits" flags.
23.\" Add text noting that if we set the effective flag for one file
24.\" capability, then we must also set the effective flag for all
25.\" other capabilities where the permitted or inheritable bit is set.
bfb730f9 26.\" 2011-09-07, mtk/Serge hallyn: Add CAP_SYSLOG
5eaee3d9 27.\"
6e00b7a8 28.TH CAPABILITIES 7 2021-08-27 "Linux" "Linux Programmer's Manual"
fea681da
MK
29.SH NAME
30capabilities \- overview of Linux capabilities
31.SH DESCRIPTION
fea681da 32For the purpose of performing permission checks,
008f1ecc 33traditional UNIX implementations distinguish two categories of processes:
fea681da
MK
34.I privileged
35processes (whose effective user ID is 0, referred to as superuser or root),
36and
37.I unprivileged
c7094399 38processes (whose effective UID is nonzero).
fea681da
MK
39Privileged processes bypass all kernel permission checks,
40while unprivileged processes are subject to full permission
41checking based on the process's credentials
42(usually: effective UID, effective GID, and supplementary group list).
ade303d7 43.PP
c13182ef
MK
44Starting with kernel 2.2, Linux divides the privileges traditionally
45associated with superuser into distinct units, known as
fea681da 46.IR capabilities ,
3dfe7e0d 47which can be independently enabled and disabled.
cf7a13d4 48Capabilities are a per-thread attribute.
c8e68512 49.\"
c634028a 50.SS Capabilities list
c8e68512
MK
51The following list shows the capabilities implemented on Linux,
52and the operations or behaviors that each capability permits:
fea681da 53.TP
45286787 54.BR CAP_AUDIT_CONTROL " (since Linux 2.6.11)"
5eaee3d9
MK
55Enable and disable kernel auditing; change auditing filter rules;
56retrieve auditing status and filtering rules.
57.TP
c81cea2c
MK
58.BR CAP_AUDIT_READ " (since Linux 3.16)"
59.\" commit a29b694aa1739f9d76538e34ae25524f9c549d59
60.\" commit 3a101b8de0d39403b2c7e5c23fd0b005668acf48
61Allow reading the audit log via a multicast netlink socket.
62.TP
45286787 63.BR CAP_AUDIT_WRITE " (since Linux 2.6.11)"
c8e68512 64Write records to the kernel auditing log.
dd61e8a8 65.\" FIXME Add FAN_ENABLE_AUDIT
5eaee3d9 66.TP
9339d749
MK
67.BR CAP_BLOCK_SUSPEND " (since Linux 3.5)"
68Employ features that can block system suspend
69.RB ( epoll (7)
70.BR EPOLLWAKEUP ,
71.IR /sys/power/wake_lock ).
72.TP
81701c04
MK
73.BR CAP_BPF " (since Linux 5.8)"
74Employ privileged BPF operations; see
75.BR bpf (2)
76and
28a4c58c 77.BR bpf\-helpers (7).
81701c04
MK
78.IP
79This capability was added in Linux 5.8 to separate out
80BPF functionality from the overloaded
1ae6b2c7 81.B CAP_SYS_ADMIN
81701c04
MK
82capability.
83.TP
71f6247f
MK
84.BR CAP_CHECKPOINT_RESTORE " (since Linux 5.9)"
85.\" commit 124ea650d3072b005457faed69909221c2905a1f
86.PD 0
87.RS
88.IP * 2
89Update
90.I /proc/sys/kernel/ns_last_pid
91(see
92.BR pid_namespaces (7));
93.IP *
94employ the
95.I set_tid
96feature of
97.BR clone3 (2);
98.\" FIXME There is also some use case relating to
99.\" prctl_set_mm_exe_file(); in the 5.9 sources, see
100.\" prctl_set_mm_map().
101.IP *
102read the contents of the symbolic links in
1ae6b2c7 103.IR /proc/ pid /map_files
71f6247f
MK
104for other processes.
105.RE
106.PD
107.IP
108This capability was added in Linux 5.9 to separate out
109checkpoint/restore functionality from the overloaded
1ae6b2c7 110.B CAP_SYS_ADMIN
71f6247f
MK
111capability.
112.TP
fea681da 113.B CAP_CHOWN
c8e68512 114Make arbitrary changes to file UIDs and GIDs (see
fea681da
MK
115.BR chown (2)).
116.TP
117.B CAP_DAC_OVERRIDE
118Bypass file read, write, and execute permission checks.
c8e68512 119(DAC is an abbreviation of "discretionary access control".)
fea681da
MK
120.TP
121.B CAP_DAC_READ_SEARCH
a537062e
MK
122.PD 0
123.RS
124.IP * 2
fea681da 125Bypass file read permission checks and
a537062e
MK
126directory read and execute permission checks;
127.IP *
3bbab71a 128invoke
24ee13df
MK
129.BR open_by_handle_at (2);
130.IP *
131use the
132.BR linkat (2)
133.B AT_EMPTY_PATH
134flag to create a link to a file referred to by a file descriptor.
a537062e
MK
135.RE
136.PD
fea681da
MK
137.TP
138.B CAP_FOWNER
c8e68512
MK
139.PD 0
140.RS
141.IP * 2
fea681da 142Bypass permission checks on operations that normally
9ee4a2b6 143require the filesystem UID of the process to match the UID of
fea681da
MK
144the file (e.g.,
145.BR chmod (2),
146.BR utime (2)),
c8e68512 147excluding those operations covered by
fea681da
MK
148.B CAP_DAC_OVERRIDE
149and
150.BR CAP_DAC_READ_SEARCH ;
c8e68512 151.IP *
1dc9bca6
MK
152set inode flags (see
153.BR ioctl_iflags (2))
fea681da 154on arbitrary files;
c8e68512 155.IP *
fea681da 156set Access Control Lists (ACLs) on arbitrary files;
c8e68512 157.IP *
1c1e15ed 158ignore directory sticky bit on file deletion;
c8e68512 159.IP *
c99eb2b2
MK
160modify
161.I user
162extended attributes on a sticky directory owned by any user;
163.IP *
1c1e15ed
MK
164specify
165.B O_NOATIME
166for arbitrary files in
167.BR open (2)
168and
169.BR fcntl (2).
c8e68512
MK
170.RE
171.PD
fea681da
MK
172.TP
173.B CAP_FSETID
3bbab71a
MK
174.PD 0
175.RS
176.IP * 2
ed948c28 177Don't clear set-user-ID and set-group-ID mode
c8e68512 178bits when a file is modified;
3bbab71a 179.IP *
c8e68512 180set the set-group-ID bit for a file whose GID does not match
9ee4a2b6 181the filesystem or any of the supplementary GIDs of the calling process.
3bbab71a
MK
182.RE
183.PD
fea681da
MK
184.TP
185.B CAP_IPC_LOCK
bea08fec 186.\" FIXME . As at Linux 3.2, there are some strange uses of this capability
46c73a44 187.\" in other places; they probably should be replaced with something else.
3dcdef94
MK
188.PD 0
189.RS
190.IP * 2
c8e68512 191Lock memory
fea681da
MK
192.RB ( mlock (2),
193.BR mlockall (2),
194.BR mmap (2),
3dcdef94
MK
195.BR shmctl (2));
196.IP *
197allocate memory using huge pages
36e6250f 198.RB ( memfd_create (2),
3dcdef94 199.BR mmap (2),
fea681da 200.BR shmctl (2)).
3dcdef94
MK
201.RE
202.PD
fea681da
MK
203.TP
204.B CAP_IPC_OWNER
205Bypass permission checks for operations on System V IPC objects.
206.TP
207.B CAP_KILL
208Bypass permission checks for sending signals (see
209.BR kill (2)).
097585ed 210This includes use of the
c8e68512 211.BR ioctl (2)
097585ed 212.B KDSIGACCEPT
c8e68512 213operation.
bea08fec 214.\" FIXME . CAP_KILL also has an effect for threads + setting child
a7c1e564
MK
215.\" termination signal to other than SIGCHLD: without this
216.\" capability, the termination signal reverts to SIGCHLD
c13182ef 217.\" if the child does an exec(). What is the rationale
a7c1e564 218.\" for this?
fea681da 219.TP
c8e68512
MK
220.BR CAP_LEASE " (since Linux 2.4)"
221Establish leases on arbitrary files (see
fea681da
MK
222.BR fcntl (2)).
223.TP
224.B CAP_LINUX_IMMUTABLE
c8e68512
MK
225Set the
226.B FS_APPEND_FL
fea681da 227and
c8e68512 228.B FS_IMMUTABLE_FL
e7e006f2 229inode flags (see
1dc9bca6 230.BR ioctl_iflags (2)).
fea681da 231.TP
c8e68512 232.BR CAP_MAC_ADMIN " (since Linux 2.6.25)"
7f82d0b0 233Allow MAC configuration or state changes.
c8e68512
MK
234Implemented for the Smack Linux Security Module (LSM).
235.TP
236.BR CAP_MAC_OVERRIDE " (since Linux 2.6.25)"
7f82d0b0 237Override Mandatory Access Control (MAC).
c8e68512
MK
238Implemented for the Smack LSM.
239.TP
240.BR CAP_MKNOD " (since Linux 2.4)"
241Create special files using
fea681da
MK
242.BR mknod (2).
243.TP
244.B CAP_NET_ADMIN
e87268ec
MK
245Perform various network-related operations:
246.PD 0
247.RS
248.IP * 2
249interface configuration;
250.IP *
12fe8fd3 251administration of IP firewall, masquerading, and accounting;
e87268ec
MK
252.IP *
253modify routing tables;
254.IP *
255bind to any address for transparent proxying;
256.IP *
1cc2995a 257set type-of-service (TOS);
e87268ec
MK
258.IP *
259clear driver statistics;
260.IP *
261set promiscuous mode;
262.IP *
263enable multicasting;
264.IP *
265use
266.BR setsockopt (2)
267to set the following socket options:
268.BR SO_DEBUG ,
269.BR SO_MARK ,
1ae6b2c7 270.B SO_PRIORITY
e87268ec
MK
271(for a priority outside the range 0 to 6),
272.BR SO_RCVBUFFORCE ,
273and
274.BR SO_SNDBUFFORCE .
275.RE
276.PD
fea681da
MK
277.TP
278.B CAP_NET_BIND_SERVICE
6eb334b2 279Bind a socket to Internet domain privileged ports
fea681da
MK
280(port numbers less than 1024).
281.TP
282.B CAP_NET_BROADCAST
c8e68512 283(Unused) Make socket broadcasts, and listen to multicasts.
fd39ef0c
MK
284.\" FIXME Since Linux 4.2, there are use cases for netlink sockets
285.\" commit 59324cf35aba5336b611074028777838a963d03b
fea681da
MK
286.TP
287.B CAP_NET_RAW
93e9e2d6
MK
288.PD 0
289.RS
290.IP * 2
dd55b8a1 291Use RAW and PACKET sockets;
93e9e2d6
MK
292.IP *
293bind to any address for transparent proxying.
294.RE
295.PD
fea681da
MK
296.\" Also various IP options and setsockopt(SO_BINDTODEVICE)
297.TP
e39e4240
MK
298.BR CAP_PERFMON " (since Linux 5.8)"
299Employ various performance-monitoring mechanisms, including:
e39e4240 300.RS
cbcd1195 301.IP * 2
f7cf9c0b 302.PD 0
e39e4240
MK
303call
304.BR perf_event_open (2);
305.IP *
306employ various BPF operations that have performance implications.
307.RE
308.PD
309.IP
310This capability was added in Linux 5.8 to separate out
311performance monitoring functionality from the overloaded
1ae6b2c7 312.B CAP_SYS_ADMIN
e39e4240 313capability.
874355e3 314See also the kernel source file
b49c2acb 315.IR Documentation/admin\-guide/perf\-security.rst .
e39e4240 316.TP
fea681da 317.B CAP_SETGID
3bbab71a
MK
318.RS
319.PD 0
320.IP * 2
c8e68512 321Make arbitrary manipulations of process GIDs and supplementary GID list;
3bbab71a 322.IP *
5bea231d 323forge GID when passing socket credentials via UNIX domain sockets;
3bbab71a 324.IP *
5bea231d 325write a group ID mapping in a user namespace (see
f58fb24f 326.BR user_namespaces (7)).
3bbab71a
MK
327.PD
328.RE
fea681da 329.TP
c8e68512 330.BR CAP_SETFCAP " (since Linux 2.6.24)"
b8cee784 331Set arbitrary capabilities on a file.
29c1f3cf
KK
332.IP
333.\" commit db2e718a47984b9d71ed890eb2ea36ecf150de18
334Since Linux 5.12, this capability is
a1508e36
MK
335also needed to map user ID 0 in a new user namespace; see
336.BR user_namespaces (7)
337for details.
c8e68512
MK
338.TP
339.B CAP_SETPCAP
e62172cb 340If file capabilities are supported (i.e., since Linux 2.6.24):
c8e68512
MK
341add any capability from the calling thread's bounding set
342to its inheritable set;
343drop capabilities from the bounding set (via
344.BR prctl (2)
345.BR PR_CAPBSET_DROP );
346make changes to the
347.I securebits
348flags.
e62172cb
MK
349.IP
350If file capabilities are not supported (i.e., kernels before Linux 2.6.24):
351grant or remove any capability in the
352caller's permitted capability set to or from any other process.
353(This property of
354.B CAP_SETPCAP
355is not available when the kernel is configured to support
356file capabilities, since
357.B CAP_SETPCAP
358has entirely different semantics for such kernels.)
fea681da
MK
359.TP
360.B CAP_SETUID
3bbab71a
MK
361.RS
362.PD 0
363.IP * 2
c8e68512 364Make arbitrary manipulations of process UIDs
fea681da
MK
365.RB ( setuid (2),
366.BR setreuid (2),
367.BR setresuid (2),
368.BR setfsuid (2));
3bbab71a 369.IP *
a7d96776 370forge UID when passing socket credentials via UNIX domain sockets;
3bbab71a 371.IP *
5bea231d 372write a user ID mapping in a user namespace (see
f58fb24f 373.BR user_namespaces (7)).
3bbab71a
MK
374.PD
375.RE
777f5a9e 376.\" FIXME CAP_SETUID also an effect in exec(); document this.
fea681da
MK
377.TP
378.B CAP_SYS_ADMIN
fa50d3d4
MK
379.IR Note :
380this capability is overloaded; see
aca89285 381.I Notes to kernel developers
fa50d3d4 382below.
ade303d7 383.IP
c8e68512
MK
384.PD 0
385.RS
386.IP * 2
387Perform a range of system administration operations including:
fea681da
MK
388.BR quotactl (2),
389.BR mount (2),
390.BR umount (2),
40ca3880 391.BR pivot_root (2),
1368e847
MK
392.BR swapon (2),
393.BR swapoff (2),
fea681da 394.BR sethostname (2),
f169a862 395and
c8e68512
MK
396.BR setdomainname (2);
397.IP *
bfb730f9
MK
398perform privileged
399.BR syslog (2)
400operations (since Linux 2.6.37,
1ae6b2c7 401.B CAP_SYSLOG
bfb730f9
MK
402should be used to permit such operations);
403.IP *
c8e68512 404perform
c11e3891
MK
405.B VM86_REQUEST_IRQ
406.BR vm86 (2)
407command;
408.IP *
045c5bde 409access the same checkpoint/restore functionality that is governed by
1ae6b2c7 410.B CAP_CHECKPOINT_RESTORE
045c5bde
MK
411(but the latter, weaker capability is preferred for accessing
412that functionality).
413.IP *
2fbfb575 414perform the same BPF operations as are governed by
1ae6b2c7 415.B CAP_BPF
2fbfb575
MK
416(but the latter, weaker capability is preferred for accessing
417that functionality).
418.IP *
419employ the same performance monitoring mechanisms as are governed by
1ae6b2c7 420.B CAP_PERFMON
2fbfb575
MK
421(but the latter, weaker capability is preferred for accessing
422that functionality).
423.IP *
c11e3891 424perform
fea681da
MK
425.B IPC_SET
426and
427.B IPC_RMID
428operations on arbitrary System V IPC objects;
c8e68512 429.IP *
1a3b63f7
MK
430override
431.B RLIMIT_NPROC
432resource limit;
433.IP *
fea681da
MK
434perform operations on
435.I trusted
436and
437.I security
19531dec 438extended attributes (see
89fabe2e 439.BR xattr (7));
c8e68512
MK
440.IP *
441use
08baa0af 442.BR lookup_dcookie (2);
c8e68512 443.IP *
a1f926b8
MK
444use
445.BR ioprio_set (2)
446to assign
447.B IOPRIO_CLASS_RT
83ee9237 448and (before Linux 2.6.25)
237aa7c5 449.B IOPRIO_CLASS_IDLE
a1f926b8 450I/O scheduling classes;
c8e68512 451.IP *
f5ac5bbf 452forge PID when passing socket credentials via UNIX domain sockets;
c8e68512 453.IP *
fea681da 454exceed
b49c2acb 455.IR /proc/sys/fs/file\-max ,
3dfe7e0d
MK
456the system-wide limit on the number of open files,
457in system calls that open files (e.g.,
fea681da
MK
458.BR accept (2),
459.BR execve (2),
460.BR open (2),
f169a862 461.BR pipe (2));
c8e68512 462.IP *
c13182ef 463employ
0f807eea
MK
464.B CLONE_*
465flags that create new namespaces with
a7c1e564
MK
466.BR clone (2)
467and
c67d3814
MK
468.BR unshare (2)
469(but, since Linux 3.8,
470creating user namespaces does not require any capability);
c8e68512 471.IP *
0f322ccc
MK
472access privileged
473.I perf
474event information;
2bfe6656
MK
475.IP *
476call
c3b49118
MK
477.BR setns (2)
478(requires
479.B CAP_SYS_ADMIN
480in the
481.I target
482namespace);
e4698850 483.IP *
0f807eea
MK
484call
485.BR fanotify_init (2);
486.IP *
2cf45b0d 487perform privileged
a7c1e564
MK
488.B KEYCTL_CHOWN
489and
490.B KEYCTL_SETPERM
491.BR keyctl (2)
e64e6056
MK
492operations;
493.IP *
494perform
495.BR madvise (2)
496.B MADV_HWPOISON
0f807eea
MK
497operation;
498.IP *
499employ the
500.B TIOCSTI
501.BR ioctl (2)
502to insert characters into the input queue of a terminal other than
838ad419 503the caller's controlling terminal;
0f807eea 504.IP *
0f807eea 505employ the obsolete
51c5c662 506.BR nfsservctl (2)
c42221c4
MK
507system call;
508.IP *
509employ the obsolete
0f807eea
MK
510.BR bdflush (2)
511system call;
512.IP *
513perform various privileged block-device
514.BR ioctl (2)
515operations;
516.IP *
9ee4a2b6 517perform various privileged filesystem
0f807eea
MK
518.BR ioctl (2)
519operations;
520.IP *
fdf41f57
MK
521perform privileged
522.BR ioctl (2)
523operations on the
1ae6b2c7 524.I /dev/random
fdf41f57
MK
525device (see
526.BR random (4));
527.IP *
c6ddae52
MK
528install a
529.BR seccomp (2)
530filter without first having to set the
531.I no_new_privs
532thread attribute;
533.IP *
968b27aa
MK
534modify allow/deny rules for device control groups;
535.IP *
536employ the
537.BR ptrace (2)
538.B PTRACE_SECCOMP_GET_FILTER
539operation to dump tracee's seccomp filters;
540.IP *
541employ the
542.BR ptrace (2)
543.B PTRACE_SETOPTIONS
544operation to suspend the tracee's seccomp protections (i.e., the
545.B PTRACE_O_SUSPEND_SECCOMP
115c1eb4 546flag);
c6ddae52 547.IP *
a526aa40 548perform administrative operations on many device drivers;
7e7e8de3 549.IP *
a526aa40 550modify autogroup nice values by writing to
1ae6b2c7 551.IR /proc/ pid /autogroup
7e7e8de3
MK
552(see
553.BR sched (7)).
c8e68512
MK
554.RE
555.PD
fea681da
MK
556.TP
557.B CAP_SYS_BOOT
c8e68512 558Use
08baa0af
MK
559.BR reboot (2)
560and
561.BR kexec_load (2).
fea681da
MK
562.TP
563.B CAP_SYS_CHROOT
4312e0cb
MK
564.RS
565.PD 0
566.IP * 2
c8e68512 567Use
4312e0cb
MK
568.BR chroot (2);
569.IP *
570change mount namespaces using
571.BR setns (2).
572.PD
573.RE
fea681da
MK
574.TP
575.B CAP_SYS_MODULE
3bbab71a
MK
576.RS
577.PD 0
578.IP * 2
c8e68512
MK
579Load and unload kernel modules
580(see
fea681da
MK
581.BR init_module (2)
582and
c8e68512 583.BR delete_module (2));
3bbab71a 584.IP *
c8e68512
MK
585in kernels before 2.6.25:
586drop capabilities from the system-wide capability bounding set.
3bbab71a
MK
587.PD
588.RE
fea681da
MK
589.TP
590.B CAP_SYS_NICE
c8e68512
MK
591.PD 0
592.RS
593.IP * 2
0c576731 594Lower the process nice value
fea681da
MK
595.RB ( nice (2),
596.BR setpriority (2))
c8e68512
MK
597and change the nice value for arbitrary processes;
598.IP *
599set real-time scheduling policies for calling process,
600and set scheduling policies and priorities for arbitrary processes
fea681da 601.RB ( sched_setscheduler (2),
f96787ab 602.BR sched_setparam (2),
0d59d0c8 603.BR sched_setattr (2));
c8e68512 604.IP *
fea681da 605set CPU affinity for arbitrary processes
c13182ef 606.RB ( sched_setaffinity (2));
c8e68512 607.IP *
a1f926b8 608set I/O scheduling class and priority for arbitrary processes
c13182ef 609.RB ( ioprio_set (2));
c8e68512
MK
610.IP *
611apply
a1f926b8 612.BR migrate_pages (2)
c8e68512 613to arbitrary processes and allow processes
a1f926b8 614to be migrated to arbitrary nodes;
c13182ef 615.\" FIXME CAP_SYS_NICE also has the following effect for
a1f926b8
MK
616.\" migrate_pages(2):
617.\" do_migrate_pages(mm, &old, &new,
618.\" capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
1a0fbe37 619.\"
bea08fec 620.\" Document this.
c8e68512
MK
621.IP *
622apply
a7c1e564 623.BR move_pages (2)
c8e68512
MK
624to arbitrary processes;
625.IP *
4d62f7b6
MK
626use the
627.B MPOL_MF_MOVE_ALL
c13182ef 628flag with
a7c1e564 629.BR mbind (2)
c13182ef 630and
a7c1e564 631.BR move_pages (2).
c8e68512
MK
632.RE
633.PD
fea681da
MK
634.TP
635.B CAP_SYS_PACCT
c8e68512 636Use
fea681da
MK
637.BR acct (2).
638.TP
639.B CAP_SYS_PTRACE
eb64a9cb
MK
640.PD 0
641.RS
de6a5c05 642.IP * 2
c8e68512 643Trace arbitrary processes using
cbd7b9bf 644.BR ptrace (2);
eb64a9cb 645.IP *
cbd7b9bf
MK
646apply
647.BR get_robust_list (2)
38b6e5b0 648to arbitrary processes;
eb64a9cb 649.IP *
b8f84ce2
MK
650transfer data to or from the memory of arbitrary processes using
651.BR process_vm_readv (2)
652and
3bbab71a 653.BR process_vm_writev (2);
b8f84ce2 654.IP *
38b6e5b0
MK
655inspect processes using
656.BR kcmp (2).
eb64a9cb
MK
657.RE
658.PD
fea681da
MK
659.TP
660.B CAP_SYS_RAWIO
4637c8cb
MK
661.PD 0
662.RS
663.IP * 2
c8e68512 664Perform I/O port operations
fea681da
MK
665.RB ( iopl (2)
666and
667.BR ioperm (2));
4637c8cb 668.IP *
fea681da 669access
474e1f9d 670.IR /proc/kcore ;
4637c8cb 671.IP *
474e1f9d
MK
672employ the
673.B FIBMAP
674.BR ioctl (2)
4637c8cb
MK
675operation;
676.IP *
677open devices for accessing x86 model-specific registers (MSRs, see
3bbab71a 678.BR msr (4));
4637c8cb
MK
679.IP *
680update
681.IR /proc/sys/vm/mmap_min_addr ;
682.IP *
683create memory mappings at addresses below the value specified by
684.IR /proc/sys/vm/mmap_min_addr ;
685.IP *
50b2aa27 686map files in
cef53f3e 687.IR /proc/bus/pci ;
4637c8cb
MK
688.IP *
689open
1ae6b2c7 690.I /dev/mem
4637c8cb
MK
691and
692.IR /dev/kmem ;
693.IP *
694perform various SCSI device commands;
695.IP *
696perform certain operations on
697.BR hpsa (4)
698and
699.BR cciss (4)
700devices;
701.IP *
702perform a range of device-specific operations on other devices.
703.RE
704.PD
fea681da
MK
705.TP
706.B CAP_SYS_RESOURCE
c8e68512
MK
707.PD 0
708.RS
709.IP * 2
9ee4a2b6 710Use reserved space on ext2 filesystems;
c8e68512
MK
711.IP *
712make
fea681da
MK
713.BR ioctl (2)
714calls controlling ext3 journaling;
c8e68512
MK
715.IP *
716override disk quota limits;
717.IP *
718increase resource limits (see
fea681da 719.BR setrlimit (2));
c8e68512
MK
720.IP *
721override
fea681da 722.B RLIMIT_NPROC
c8e68512
MK
723resource limit;
724.IP *
aa66392d
MK
725override maximum number of consoles on console allocation;
726.IP *
727override maximum number of keymaps;
728.IP *
729allow more than 64 Hz interrupts from the real-time clock;
730.IP *
c8e68512 731raise
fea681da 732.I msg_qbytes
c8e68512 733limit for a System V message queue above the limit in
0daa9e92 734.I /proc/sys/kernel/msgmnb
fea681da
MK
735(see
736.BR msgop (2)
737and
ad7b0f91
MK
738.BR msgctl (2));
739.IP *
7509f758
MK
740allow the
741.B RLIMIT_NOFILE
742resource limit on the number of "in-flight" file descriptors
743to be bypassed when passing file descriptors to another process
744via a UNIX domain socket (see
745.BR unix (7));
746.IP *
ad7b0f91 747override the
b49c2acb 748.I /proc/sys/fs/pipe\-max\-size
ad7b0f91
MK
749limit when setting the capacity of a pipe using the
750.B F_SETPIPE_SZ
751.BR fcntl (2)
1cc2995a 752command;
46883521
MK
753.IP *
754use
1ae6b2c7 755.B F_SETPIPE_SZ
46883521 756to increase the capacity of a pipe above the limit specified by
b49c2acb 757.IR /proc/sys/fs/pipe\-max\-size ;
b39a2012
MK
758.IP *
759override
5d63eed8
AM
760.IR /proc/sys/fs/mqueue/queues_max ,
761.IR /proc/sys/fs/mqueue/msg_max ,
69a0c93e
SM
762and
763.I /proc/sys/fs/mqueue/msgsize_max
aade901b 764limits when creating POSIX message queues (see
ecc1f45b
MK
765.BR mq_overview (7));
766.IP *
3bbab71a 767employ the
ecc1f45b
MK
768.BR prctl (2)
769.B PR_SET_MM
8ddcc591 770operation;
41f00272 771.IP *
8ddcc591 772set
1ae6b2c7 773.IR /proc/ pid /oom_score_adj
8ddcc591
MK
774to a value lower than the value last set by a process with
775.BR CAP_SYS_RESOURCE .
c8e68512
MK
776.RE
777.PD
fea681da
MK
778.TP
779.B CAP_SYS_TIME
c8e68512 780Set system clock
fea681da
MK
781.RB ( settimeofday (2),
782.BR stime (2),
783.BR adjtimex (2));
c8e68512 784set real-time (hardware) clock.
fea681da
MK
785.TP
786.B CAP_SYS_TTY_CONFIG
c8e68512 787Use
749ac769
MK
788.BR vhangup (2);
789employ various privileged
790.BR ioctl (2)
791operations on virtual terminals.
bfb730f9
MK
792.TP
793.BR CAP_SYSLOG " (since Linux 2.6.37)"
5f94327c
MK
794.RS
795.PD 0
de6a5c05 796.IP * 2
bfb730f9
MK
797Perform privileged
798.BR syslog (2)
799operations.
800See
801.BR syslog (2)
802for information on which operations require privilege.
10fe5485
MK
803.IP *
804View kernel addresses exposed via
805.I /proc
806and other interfaces when
1ae6b2c7 807.I /proc/sys/kernel/kptr_restrict
10fe5485 808has the value 1.
4eaa04c5 809(See the discussion of
10fe5485
MK
810.I kptr_restrict
811in
812.BR proc (5).)
5f94327c
MK
813.PD
814.RE
d6b08708
MK
815.TP
816.BR CAP_WAKE_ALARM " (since Linux 3.0)"
817Trigger something that will wake up the system (set
818.B CLOCK_REALTIME_ALARM
819and
820.B CLOCK_BOOTTIME_ALARM
821timers).
c8e68512 822.\"
c634028a 823.SS Past and current implementation
c8e68512
MK
824A full implementation of capabilities requires that:
825.IP 1. 3
826For all privileged operations,
827the kernel must check whether the thread has the required
828capability in its effective set.
829.IP 2.
137d81b5 830The kernel must provide system calls allowing a thread's capability sets to
c8e68512
MK
831be changed and retrieved.
832.IP 3.
9ee4a2b6 833The filesystem must support attaching capabilities to an executable file,
c8e68512
MK
834so that a process gains those capabilities when the file is executed.
835.PP
836Before kernel 2.6.24, only the first two of these requirements were met;
837since kernel 2.6.24, all three requirements are met.
838.\"
8de5616f
MK
839.SS Notes to kernel developers
840When adding a new kernel feature that should be governed by a capability,
841consider the following points.
842.IP * 3
ddb624a9
MK
843The goal of capabilities is to divide the power of superuser into pieces,
844such that if a program that has one or more capabilities is compromised,
845its power to do damage to the system would be less than that of the same program
846running with root privilege.
8de5616f
MK
847.IP *
848You have the choice of either creating a new capability for your new feature,
849or associating the feature with one of the existing capabilities.
ddb624a9 850In order to keep the set of capabilities to a manageable size,
8de5616f
MK
851the latter option is preferable,
852unless there are compelling reasons to take the former option.
ddb624a9
MK
853(There is also a technical limit:
854the size of capability sets is currently limited to 64 bits.)
8de5616f
MK
855.IP *
856To determine which existing capability might best be associated
857with your new feature, review the list of capabilities above in order
858to find a "silo" into which your new feature best fits.
ddb624a9 859One approach to take is to determine if there are other features
9f92e4e1 860requiring capabilities that will always be used along with the new feature.
ddb624a9
MK
861If the new feature is useless without these other features,
862you should use the same capability as the other features.
8de5616f 863.IP *
1ae6b2c7 864.I Don't
8de5616f
MK
865choose
866.B CAP_SYS_ADMIN
867if you can possibly avoid it!
868A vast proportion of existing capability checks are associated
6e9219f7
MK
869with this capability (see the partial list above).
870It can plausibly be called "the new root",
871since on the one hand, it confers a wide range of powers,
872and on the other hand,
873its broad scope means that this is the capability
874that is required by many privileged programs.
8de5616f
MK
875Don't make the problem worse.
876The only new features that should be associated with
877.B CAP_SYS_ADMIN
878are ones that
879.I closely
880match existing uses in that silo.
881.IP *
882If you have determined that it really is necessary to create
883a new capability for your feature,
ddb624a9 884don't make or name it as a "single-use" capability.
8de5616f 885Thus, for example, the addition of the highly specific
1ae6b2c7 886.B CAP_SYS_PACCT
8de5616f
MK
887was probably a mistake.
888Instead, try to identify and name your new capability as a broader
889silo into which other related future use cases might fit.
890.\"
c634028a 891.SS Thread capability sets
1db1d36d 892Each thread has the following capability sets containing zero or more
fea681da
MK
893of the above capabilities:
894.TP
1ae6b2c7 895.I Permitted
c8e68512
MK
896This is a limiting superset for the effective
897capabilities that the thread may assume.
898It is also a limiting superset for the capabilities that
899may be added to the inheritable set by a thread that does not have the
900.B CAP_SETPCAP
901capability in its effective set.
ade303d7 902.IP
cf7a13d4 903If a thread drops a capability from its permitted set,
3b777aff 904it can never reacquire that capability (unless it
c930827f 905.BR execve (2)s
c8e68512
MK
906either a set-user-ID-root program, or
907a program whose associated file capabilities grant that capability).
fea681da 908.TP
1ae6b2c7 909.I Inheritable
c8e68512 910This is a set of capabilities preserved across an
fea681da 911.BR execve (2).
6260f4cd
AL
912Inheritable capabilities remain inheritable when executing any program,
913and inheritable capabilities are added to the permitted set when executing
914a program that has the corresponding bits set in the file inheritable set.
915.IP
916Because inheritable capabilities are not generally preserved across
917.BR execve (2)
918when running as a non-root user, applications that wish to run helper
e574dcd0
MK
919programs with elevated capabilities should consider using
920ambient capabilities, described below.
c8e68512 921.TP
1ae6b2c7 922.I Effective
c8e68512
MK
923This is the set of capabilities used by the kernel to
924perform permission checks for the thread.
6260f4cd 925.TP
36de80b9
MK
926.IR Bounding " (per-thread since Linux 2.6.25)"
927The capability bounding set is a mechanism that can be used
928to limit the capabilities that are gained during
929.BR execve (2).
930.IP
931Since Linux 2.6.25, this is a per-thread capability set.
932In older kernels, the capability bounding set was a system wide attribute
933shared by all threads on the system.
934.IP
aca89285
KK
935For more details, see
936.I Capability bounding set
937below.
36de80b9 938.TP
c2b279af 939.IR Ambient " (since Linux 4.3)"
e574dcd0 940.\" commit 58319057b7847667f0c9585b9de0e8932b0fdb08
6260f4cd
AL
941This is a set of capabilities that are preserved across an
942.BR execve (2)
3375bef1 943of a program that is not privileged.
e574dcd0
MK
944The ambient capability set obeys the invariant that no capability
945can ever be ambient if it is not both permitted and inheritable.
ade303d7 946.IP
3375bef1
MK
947The ambient capability set can be directly modified using
948.BR prctl (2).
949Ambient capabilities are automatically lowered if either of
950the corresponding permitted or inheritable capabilities is lowered.
ade303d7 951.IP
3375bef1
MK
952Executing a program that changes UID or GID due to the
953set-user-ID or set-group-ID bits or executing a program that has
954any file capabilities set will clear the ambient set.
955Ambient capabilities are added to the permitted set and
956assigned to the effective set when
6260f4cd 957.BR execve (2)
e574dcd0 958is called.
5367a9ab
MK
959If ambient capabilities cause a process's permitted and effective
960capabilities to increase during an
961.BR execve (2),
962this does not trigger the secure-execution mode described in
963.BR ld.so (8).
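.PP
The ambient-set invariant described above can be illustrated with a small
user-space model.
This is only a sketch: the class and method names are hypothetical,
Python sets stand in for the kernel's bit masks,
and nothing here calls a real kernel interface.

```python
# Toy model of the ambient-set invariant; not a kernel interface.
# All names below (ThreadCaps, raise_ambient, ...) are hypothetical.

class ThreadCaps:
    def __init__(self, permitted, inheritable):
        self.permitted = set(permitted)
        self.inheritable = set(inheritable)
        self.ambient = set()

    def raise_ambient(self, cap):
        """Add cap to the ambient set only if it is both permitted and inheritable."""
        if cap not in self.permitted or cap not in self.inheritable:
            raise PermissionError(f"{cap} is not both permitted and inheritable")
        self.ambient.add(cap)

    def drop_permitted(self, cap):
        """Lowering a permitted capability also lowers the ambient one."""
        self.permitted.discard(cap)
        self.ambient.discard(cap)

    def drop_inheritable(self, cap):
        """Lowering an inheritable capability also lowers the ambient one."""
        self.inheritable.discard(cap)
        self.ambient.discard(cap)


if __name__ == "__main__":
    t = ThreadCaps({"CAP_NET_RAW", "CAP_CHOWN"}, {"CAP_NET_RAW"})
    t.raise_ambient("CAP_NET_RAW")
    print(sorted(t.ambient))
    t.drop_inheritable("CAP_NET_RAW")
    print(sorted(t.ambient))
```

In the model, the ambient set is never allowed to contain a capability that
is missing from either of the other two sets,
which is the property the text states for the real ambient set.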
fea681da 964.PP
fea681da
MK
965A child created via
966.BR fork (2)
967inherits copies of its parent's capability sets.
aca89285
KK
968For details on how
969.BR execve (2)
970affects capabilities, see
971.I Transformation of capabilities during execve()
972below.
fea681da
MK
973.PP
974Using
975.BR capset (2),
aca89285
KK
976a thread may manipulate its own capability sets; see
977.I Programmatically adjusting capability sets
978below.
afae50e4
MK
979.PP
980Since Linux 3.2, the file
981.I /proc/sys/kernel/cap_last_cap
a60b1f03 982.\" commit 73efc0394e148d0e15583e13712637831f926720
afae50e4
MK
983exposes the numerical value of the highest capability
984supported by the running kernel;
985this can be used to determine the highest bit
986that may be set in a capability set.
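.PP
A capability set is commonly displayed as a hexadecimal bit mask.
The following sketch decodes such a mask into capability names and
builds the mask with every bit up to a given highest capability set.
It assumes the capability numbering of the kernel header
linux/capability.h (0 for CAP_CHOWN through 40 for CAP_CHECKPOINT_RESTORE),
which is not spelled out in this page;
the function names are hypothetical.

```python
# Capability names indexed by bit number, assuming the numbering used in
# the kernel's <linux/capability.h> up to Linux 5.9.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(mask):
    """Return the names of the capabilities whose bits are set in mask."""
    return [name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)]


def full_mask(last_cap):
    """Return the mask with bits 0..last_cap (inclusive) all set."""
    return (1 << (last_cap + 1)) - 1


def read_last_cap(path="/proc/sys/kernel/cap_last_cap"):
    """Return the highest capability number, or None if the file is unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    last = read_last_cap()
    if last is not None:
        print(hex(full_mask(last)))
    print(decode_caps(0x21))
```

Capabilities numbered above the end of the table are silently ignored by
decode_caps(), so on a newer kernel the table would need to be extended.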
c8e68512 987.\"
c634028a 988.SS File capabilities
c8e68512
MK
989Since kernel 2.6.24, the kernel supports
990associating capability sets with an executable file using
991.BR setcap (8).
992The file capability sets are stored in an extended attribute (see
6a65cff8
MK
993.BR setxattr (2)
994and
995.BR xattr (7))
c8e68512
MK
996named
997.IR "security.capability" .
998Writing to this extended attribute requires the
1ae6b2c7 999.B CAP_SETFCAP
fea681da 1000capability.
c8e68512 1001The file capability sets,
cf7a13d4 1002in conjunction with the capability sets of the thread,
c8e68512 1003determine the capabilities of a thread after an
c930827f 1004.BR execve (2).
ade303d7 1005.PP
c8e68512 1006The three file capability sets are:
fea681da 1007.TP
3dfe7e0d 1008.IR Permitted " (formerly known as " forced ):
c8e68512 1009These capabilities are automatically permitted to the thread,
cf7a13d4 1010regardless of the thread's inheritable capabilities.
fea681da 1011.TP
c8e68512
MK
1012.IR Inheritable " (formerly known as " allowed ):
1013This set is ANDed with the thread's inheritable set to determine which
1014inheritable capabilities are enabled in the permitted set of
1015the thread after the
1016.BR execve (2).
1017.TP
fea681da 1018.IR Effective :
c8e68512
MK
1019This is not a set, but rather just a single bit.
1020If this bit is set, then during an
1021.BR execve (2)
1022all of the new permitted capabilities for the thread are
1023also raised in the effective set.
1024If this bit is not set, then after an
1025.BR execve (2),
1026none of the new permitted capabilities is in the new effective set.
ade303d7 1027.IP
c8e68512 1028Enabling the file effective capability bit implies
2914a14d 1029that any file permitted or inheritable capability that causes a
c8e68512
MK
1030thread to acquire the corresponding permitted capability during an
1031.BR execve (2)
aca89285
KK
1032(see
1033.I Transformation of capabilities during execve()
1034below) will also cause the thread to acquire that
c8e68512
MK
1035capability in its effective set.
1036Therefore, when assigning capabilities to a file
1037.RB ( setcap (8),
1038.BR cap_set_file (3),
1039.BR cap_set_fd (3)),
1040if we specify the effective flag as being enabled for any capability,
1041then the effective flag must also be specified as enabled
1042for all other capabilities for which the corresponding permitted or
1043inheritable flag is enabled.
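.PP
The rule about the effective flag can be expressed as a simple check.
The Python sketch below is illustrative only; the function name is
hypothetical, and each argument is the set of capability names for which
the corresponding flag is enabled on the file.

```python
def effective_flag_is_consistent(permitted, inheritable, effective):
    """Check the all-or-nothing rule for the file effective flag.

    If the effective flag is enabled for any capability, it must be
    enabled for every capability whose permitted or inheritable flag
    is enabled.
    """
    if not effective:
        # No capability has the effective flag: always consistent.
        return True
    return (permitted | inheritable) <= effective
```

.PP
For example, enabling the permitted flag for two capabilities but the
effective flag for only one of them is rejected by this check.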
1044.\"
c281d050 1045.SS File capability extended attribute versioning
b6848704
MK
1046To allow extensibility,
1047the kernel supports a scheme to encode a version number inside the
1048.I security.capability
1049extended attribute that is used to implement file capabilities.
1050These version numbers are internal to the implementation,
1051and not directly visible to user-space applications.
1052To date, the following versions are supported:
1053.TP
1ae6b2c7 1054.B VFS_CAP_REVISION_1
b6848704
MK
1055This was the original file capability implementation,
1056which supported 32-bit masks for file capabilities.
1057.TP
1058.BR VFS_CAP_REVISION_2 " (since Linux 2.6.25)"
1059.\" commit e338d263a76af78fe8f38a72131188b58fceb591
1060This version allows for file capability masks that are 64 bits in size,
1061and was necessary as the number of supported capabilities grew beyond 32.
1062The kernel transparently continues to support the execution of files
1063that have 32-bit version 1 capability masks,
1064but when adding capabilities to files that did not previously
1065have capabilities, or modifying the capabilities of existing files,
bcaa30c9
MK
1066it automatically uses the version 2 scheme
1067(or possibly the version 3 scheme, as described below).
b6848704
MK
1068.TP
1069.BR VFS_CAP_REVISION_3 " (since Linux 4.14)"
1070.\" commit 8db6c34f1dbc8e06aa016a9b829b06902c3e1340
bcaa30c9 1071Version 3 file capabilities are provided
12dce731 1072to support namespaced file capabilities (described below).
bcaa30c9 1073.IP
b6848704 1074As with version 2 file capabilities,
bcaa30c9
MK
1075version 3 capability masks are 64 bits in size.
1076But in addition, the root user ID of the namespace is encoded in the
b6848704
MK
1077.I security.capability
1078extended attribute.
7da0c87a
MK
1079(A namespace's root user ID is the value that user ID 0
1080inside that namespace maps to in the initial user namespace.)
7b45f4b2 1081.IP
bcaa30c9
MK
1082Version 3 file capabilities are designed to coexist
1083with version 2 capabilities;
1084that is, on a modern Linux system,
1085there may be some files with version 2 capabilities
1086while others have version 3 capabilities.
1087.PP
1088Before Linux 4.14,
c281d050
MK
1089the only kind of file capability extended attribute
1090that could be attached to a file was a
bcaa30c9 1091.B VFS_CAP_REVISION_2
c281d050 1092attribute.
bcaa30c9 1093Since Linux 4.14,
9b2c207a 1094the version of the
bcaa30c9 1095.I security.capability
9b2c207a
MK
1096extended attribute that is attached to a file
1097depends on the circumstances in which the attribute was created.
bcaa30c9 1098.PP
7b45f4b2 1099Starting with Linux 4.14, a
7b45f4b2
MK
1100.I security.capability
1101extended attribute is automatically created as (or converted to)
bcaa30c9
MK
1102a version 3
1103.RB ( VFS_CAP_REVISION_3 )
1104attribute if both of the following are true:
7b45f4b2 1105.IP (1) 4
ffea2c14 1106The thread writing the attribute resides in a noninitial user namespace.
7b45f4b2
MK
1107(More precisely: the thread resides in a user namespace other
1108than the one from which the underlying filesystem was mounted.)
1109.IP (2)
1110The thread has the
1ae6b2c7 1111.B CAP_SETFCAP
7b45f4b2
MK
1112capability over the file inode,
1113meaning that (a) the thread has the
1114.B CAP_SETFCAP
1115capability in its own user namespace;
1116and (b) the UID and GID of the file inode have mappings in
1117the writer's user namespace.
bcaa30c9 1118.PP
7b45f4b2 1119When a
1ae6b2c7 1120.B VFS_CAP_REVISION_3
7b45f4b2
MK
1121.I security.capability
1122extended attribute is created, the root user ID of the creating thread's
1123user namespace is saved in the extended attribute.
bcaa30c9 1124.PP
2c77e8de 1125By contrast, creating or modifying a
7b45f4b2
MK
1126.I security.capability
1127extended attribute from a privileged
1128.RB ( CAP_SETFCAP )
1129thread that resides in the
90ef0f7b 1130namespace where the underlying filesystem was mounted
7b45f4b2 1131(this normally means the initial user namespace)
2c77e8de 1132automatically results in the creation of a version 2
bcaa30c9 1133.RB ( VFS_CAP_REVISION_2 )
7b45f4b2 1134attribute.
bcaa30c9 1135.PP
2c77e8de
MK
1136Note that the creation of a version 3
1137.I security.capability
1138extended attribute is automatic.
1139That is to say, when a user-space application writes
1140.RB ( setxattr (2))
1141a
1142.I security.capability
1143attribute in the version 2 format,
1144the kernel will automatically create a version 3 attribute
1145if the attribute is created in the circumstances described above.
1146Correspondingly, when a version 3
1147.I security.capability
1148attribute is retrieved
1149.RB ( getxattr (2))
1150by a process that resides inside a user namespace that was created by the
1151root user ID (or a descendant of that user namespace),
1152the returned attribute is (automatically)
1153simplified to appear as a version 2 attribute
1154(i.e., the returned value is the size of a version 2 attribute and does
1155not include the root user ID).
1156These automatic translations mean that no changes are required to
1157user-space tools (e.g.,
1158.BR setcap (8)
1159and
1160.BR getcap (8))
1161in order for those tools to be used to create and retrieve version 3
1162.I security.capability
1163attributes.
1164.PP
bcaa30c9
MK
1165Note that a file can have either a version 2 or a version 3
1166.I security.capability
1167extended attribute associated with it, but not both:
1168creation or modification of the
1169.I security.capability
1170extended attribute will automatically modify the version
1171according to the circumstances in which the extended attribute is
1172created or modified.
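.PP
Although tools need not care about the version,
the raw bytes returned by
.BR getxattr (2)
begin with a header word that carries the revision.
The Python sketch below decodes such a value.
The constants and layout are not defined on this page; they are assumed
from the kernel's
.I <linux/capability.h>
header (a little-endian 32-bit header, followed by pairs of 32-bit
permitted and inheritable words, followed in version 3 by a 32-bit root
user ID), and the function name is hypothetical.

```python
import struct

VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001

# revision value -> (version, number of 32-bit word pairs, has root ID)
_LAYOUTS = {
    0x01000000: (1, 1, False),
    0x02000000: (2, 2, False),
    0x03000000: (3, 2, True),
}


def parse_security_capability(raw):
    """Decode the raw bytes of a security.capability extended attribute."""
    if len(raw) < 4:
        raise ValueError("attribute too short")
    (magic_etc,) = struct.unpack_from("<I", raw, 0)
    revision = magic_etc & VFS_CAP_REVISION_MASK
    if revision not in _LAYOUTS:
        raise ValueError(f"unknown revision {revision:#x}")
    version, words, has_rootid = _LAYOUTS[revision]
    expected = 4 + 8 * words + (4 if has_rootid else 0)
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    permitted = inheritable = 0
    for i in range(words):
        # Each pair holds 32 bits of the permitted and inheritable masks.
        p, inh = struct.unpack_from("<II", raw, 4 + 8 * i)
        permitted |= p << (32 * i)
        inheritable |= inh << (32 * i)
    rootid = None
    if has_rootid:
        (rootid,) = struct.unpack_from("<I", raw, 4 + 8 * words)
    return {
        "version": version,
        "effective": bool(magic_etc & VFS_CAP_FLAGS_EFFECTIVE),
        "permitted": permitted,
        "inheritable": inheritable,
        "rootid": rootid,
    }
```

.PP
On Linux, the input could be obtained with Python's
.IR os.getxattr(path,\ "security.capability") .
Because of the automatic translation described above,
a version 3 attribute may be presented to the caller as version 2.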
b6848704 1173.\"
c634028a 1174.SS Transformation of capabilities during execve()
c13182ef 1175During an
c930827f 1176.BR execve (2),
1e321034 1177the kernel calculates the new capabilities of
fea681da 1178the process using the following algorithm:
ade303d7 1179.PP
088a639b 1180.in +4n
b8302363 1181.EX
f04f131f 1182P'(ambient) = (file is privileged) ? 0 : P(ambient)
6260f4cd 1183
f04f131f 1184P'(permitted) = (P(inheritable) & F(inheritable)) |
2e87ced3 1185 (F(permitted) & P(bounding)) | P'(ambient)
fea681da 1186
f04f131f 1187P'(effective) = F(effective) ? P'(permitted) : P'(ambient)
fea681da 1188
5bdccabd 1189P'(inheritable) = P(inheritable) [i.e., unchanged]
2e87ced3
MK
1190
1191P'(bounding) = P(bounding) [i.e., unchanged]
b8302363 1192.EE
088a639b 1193.in
ade303d7 1194.PP
fea681da 1195where:
c8e68512 1196.RS 4
2e87ced3 1197.IP P() 6
c13182ef 1198denotes the value of a thread capability set before the
c930827f 1199.BR execve (2)
2e87ced3 1200.IP P'()
8295fc02 1201denotes the value of a thread capability set after the
c930827f 1202.BR execve (2)
2e87ced3 1203.IP F()
fea681da 1204denotes a file capability set
c8e68512 1205.RE
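.PP
The algorithm above can be transcribed directly into code.
The following Python sketch is a model for experimentation, not the kernel
implementation: the class and function names are hypothetical,
capability sets are represented as integer bit masks, and the caller
decides whether the file counts as privileged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadCaps:
    permitted: int = 0
    inheritable: int = 0
    effective: int = 0
    bounding: int = 0
    ambient: int = 0


@dataclass(frozen=True)
class FileCaps:
    permitted: int = 0
    inheritable: int = 0
    effective: bool = False   # the single file effective bit
    privileged: bool = False  # file has capabilities or set-UID/set-GID bit


def transform_on_execve(p, f):
    """Return the thread capability sets after execve(), given P and F."""
    ambient = 0 if f.privileged else p.ambient
    permitted = ((p.inheritable & f.inheritable)
                 | (f.permitted & p.bounding)
                 | ambient)
    effective = permitted if f.effective else ambient
    return ThreadCaps(
        permitted=permitted,
        inheritable=p.inheritable,  # unchanged
        effective=effective,
        bounding=p.bounding,        # unchanged
        ambient=ambient,
    )
```

.PP
For instance, if the file permitted set contains two capabilities but the
thread's bounding set contains only one of them,
only that one appears in the new permitted set.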
3375bef1 1206.PP
ddc1ad30
MK
1207Note the following details relating to the above capability
1208transformation rules:
1209.IP * 3
1210The ambient capability set is present only since Linux 4.3.
1211When determining the transformation of the ambient set during
1212.BR execve (2),
1213a privileged file is one that has capabilities or
3375bef1 1214has the set-user-ID or set-group-ID bit set.
ddc1ad30
MK
1215.IP *
1216Prior to Linux 2.6.25,
1217the bounding set was a system-wide attribute shared by all threads.
1218That system-wide value was employed to calculate the new permitted set during
1219.BR execve (2)
1220in the same manner as shown above for
1221.IR P(bounding) .
ade303d7 1222.PP
56cc88cb 1223.IR Note :
1a9ed17c
MK
1224during the capability transitions described above,
1225file capabilities may be ignored (treated as empty) for the same reasons
56cc88cb
MK
1226that the set-user-ID and set-group-ID bits are ignored; see
1227.BR execve (2).
1a9ed17c 1228File capabilities are similarly ignored if the kernel was booted with the
f6acfeb8 1229.I no_file_caps
1a9ed17c 1230option.
ade303d7 1231.PP
e3ed67ed
MK
1232.IR Note :
1233according to the rules above,
1234if a process with nonzero user IDs performs an
1235.BR execve (2)
1236then any capabilities that are present in
1237its permitted and effective sets will be cleared.
1238For the treatment of capabilities when a process with a
1239user ID of zero performs an
1240.BR execve (2),
aca89285
KK
1241see
1242.I Capabilities and execution of programs by root
1243below.
c8e68512 1244.\"
e0e57837 1245.SS Safety checking for capability-dumb binaries
4a866754 1246A capability-dumb binary is an application that has been
e0e57837
MK
1247marked to have file capabilities, but has not been converted to use the
1248.BR libcap (3)
1249API to manipulate its capabilities.
1250(In other words, this is a traditional set-user-ID-root program
1251that has been switched to use file capabilities,
1252but whose code has not been modified to understand capabilities.)
2c767761 1253For such applications,
e0e57837
MK
1254the effective capability bit is set on the file,
1255so that the file permitted capabilities are automatically
1256enabled in the process effective set when executing the file.
1257The kernel recognizes a file which has the effective capability bit set
1258as capability-dumb for the purpose of the check described here.
ade303d7 1259.PP
e0e57837
MK
1260When executing a capability-dumb binary,
1261the kernel checks if the process obtained all permitted capabilities
1262that were specified in the file permitted set,
1263after the capability transformations described above have been performed.
1264(The typical reason why this might
1265.I not
1266occur is that the capability bounding set masked out some
1267of the capabilities in the file permitted set.)
1268If the process did not obtain the full set of
1269file permitted capabilities, then
1270.BR execve (2)
1271fails with the error
1272.BR EPERM .
1273This prevents possible security risks that could arise when
1274a capability-dumb application is executed with less privilege than it needs.
1275Note that, by definition,
1276the application could not itself recognize this problem,
1277since it does not employ the
1278.BR libcap (3)
1279API.
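.PP
The check described in this subsection amounts to a mask comparison.
The Python sketch below is a simplified model under the assumption that
capability sets are integer bit masks; the function name is hypothetical,
and the return value imitates the error that
.BR execve (2)
would report.

```python
import errno


def check_capability_dumb(file_permitted, file_effective, new_permitted):
    """Return 0 if the exec may proceed, or errno.EPERM if it must fail.

    file_effective is the file effective bit; a file with this bit set
    is treated as capability-dumb.
    """
    missing = file_permitted & ~new_permitted
    if file_effective and missing:
        # Some file permitted capability was not obtained, for example
        # because the bounding set masked it out.
        return errno.EPERM
    return 0
```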
1280.\"
c8e68512 1281.SS Capabilities and execution of programs by root
33d0916f
MK
1282.\" See cap_bprm_set_creds(), bprm_caps_from_vfs_cap() and
1283.\" handle_privileged_root() in security/commoncap.c (Linux 5.0 source)
bc1950ac 1284In order to mirror traditional UNIX semantics,
33d0916f
MK
1285the kernel performs special treatment of file capabilities when
1286a process with UID 0 (root) executes a program and
1287when a set-user-ID-root program is executed.
bc1950ac 1288.PP
33d0916f
MK
1289After having performed any changes to the process effective ID that
1290were triggered by the set-user-ID mode bit of the binary\(eme.g.,
1291switching the effective user ID to 0 (root) because
1292a set-user-ID-root program was executed\(emthe
1293kernel calculates the file capability sets as follows:
c8e68512 1294.IP 1. 3
bc1950ac 1295If the real or effective user ID of the process is 0 (root),
33d0916f
MK
1296then the file inheritable and permitted sets are ignored;
1297instead they are notionally considered to be all ones
c8e68512 1298(i.e., all capabilities enabled).
aca89285
KK
1299(There is one exception to this behavior, described in
1300.I Set-user-ID-root programs that have file capabilities
1301below.)
c8e68512 1302.IP 2.
bc1950ac
MK
1303If the effective user ID of the process is 0 (root) or
1304the file effective bit is in fact enabled,
33d0916f 1305then the file effective bit is notionally defined to be one (enabled).
3dfe7e0d 1306.PP
33d0916f
MK
1307These notional values for the file's capability sets are then used
1308as described above to calculate the transformation of the process's
1309capabilities during
1310.BR execve (2).
bc1950ac 1311.PP
33d0916f 1312Thus, when a process with nonzero UIDs
c930827f 1313.BR execve (2)s
33d0916f
MK
1314a set-user-ID-root program that does not have capabilities attached,
1315or when a process whose real and effective UIDs are zero
ab8aa2e4 1316.BR execve (2)s
33d0916f
MK
1317a program, the calculation of the process's new
1318permitted capabilities simplifies to:
1319.PP
1320.in +4n
1321.EX
1322P'(permitted) = P(inheritable) | P(bounding)
1323
1324P'(effective) = P'(permitted)
1325.EE
1326.in
1327.PP
1328Consequently, the process gains all capabilities in its permitted and
1329effective capability sets,
ab8aa2e4 1330except those masked out by the capability bounding set.
33d0916f
MK
1331(In the calculation of P'(permitted),
1332the P'(ambient) term can be simplified away because it is by
1333definition a subset of P(inheritable).)
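.PP
The two rules and the resulting simplification can be checked with a short
Python sketch.
It is illustrative only: the function name is hypothetical,
.I ALL_CAPS
assumes that capabilities 0 through 40 exist,
and the sketch deliberately ignores the exception for set-user-ID-root
programs that have file capabilities.

```python
ALL_CAPS = (1 << 41) - 1  # assumption: capabilities 0..40 exist


def notional_file_caps(real_uid, effective_uid,
                       f_permitted, f_inheritable, f_effective):
    """Return the notional (permitted, inheritable, effective) file values."""
    if real_uid == 0 or effective_uid == 0:
        # Rule 1: file inheritable and permitted sets are treated as all ones.
        f_permitted = f_inheritable = ALL_CAPS
    if effective_uid == 0:
        # Rule 2: the file effective bit is treated as enabled.
        f_effective = True
    return f_permitted, f_inheritable, f_effective


if __name__ == "__main__":
    fp, fi, fe = notional_file_caps(0, 0, 0, 0, False)
    p_inheritable, p_bounding = 0b0100, 0b0011
    new_permitted = (p_inheritable & fi) | (fp & p_bounding)
    # Same value as P(inheritable) | P(bounding).
    print(new_permitted == (p_inheritable | p_bounding))
```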
ab8aa2e4 1334.PP
33d0916f
MK
1335The special treatments of user ID 0 (root) described in this subsection
1336can be disabled using the securebits mechanism described below.
1337.\"
0603dda3
MK
1338.\"
1339.SS Set-user-ID-root programs that have file capabilities
aca89285
KK
1340There is one exception to the behavior described in
1341.I Capabilities and execution of programs by root
1342above.
33d0916f
MK
1343If (a) the binary that is being executed has capabilities attached and
1344(b) the real user ID of the process is
1345.I not
13460 (root) and
1347(c) the effective user ID of the process
1348.I is
13490 (root), then the file capability bits are honored
1350(i.e., they are not notionally considered to be all ones).
1351The usual way in which this situation can arise is when executing
1352a set-UID-root program that also has file capabilities.
1353When such a program is executed,
1354the process gains just the capabilities granted by the program
0603dda3
MK
1355(i.e., not all capabilities,
1356as would occur when executing a set-user-ID-root program
1357that does not have any associated file capabilities).
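.PP
Conditions (a) to (c) above combine into a single predicate,
sketched here in Python with a hypothetical function name.

```python
def file_caps_are_honored(real_uid, effective_uid, file_has_caps):
    """True if the file capability bits are honored rather than being
    notionally considered to be all ones."""
    return file_has_caps and real_uid != 0 and effective_uid == 0
```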
bc1950ac 1358.PP
c199053b
MK
1359Note that one can assign empty capability sets to a program file,
1360and thus it is possible to create a set-user-ID-root program that
1361changes the effective and saved set-user-ID of the process
1362that executes the program to 0,
1363but confers no capabilities to that process.
0603dda3 1364.\"
c8e68512
MK
1365.SS Capability bounding set
1366The capability bounding set is a security mechanism that can be used
1367to limit the capabilities that can be gained during an
1368.BR execve (2).
1369The bounding set is used in the following ways:
1370.IP * 2
1371During an
1372.BR execve (2),
1373the capability bounding set is ANDed with the file permitted
1374capability set, and the result of this operation is assigned to the
1375thread's permitted capability set.
1376The capability bounding set thus places a limit on the permitted
1377capabilities that may be granted by an executable file.
1378.IP *
1379(Since Linux 2.6.25)
1380The capability bounding set acts as a limiting superset for
1381the capabilities that a thread can add to its inheritable set using
1382.BR capset (2).