git.ipfire.org Git - thirdparty/man-pages.git/blame - man7/capabilities.7
Commit Line Data
c11b1abf 1.\" Copyright (c) 2002 by Michael Kerrisk <mtk.manpages@gmail.com>
fea681da 2.\"
93015253 3.\" %%%LICENSE_START(VERBATIM)
fea681da
MK
4.\" Permission is granted to make and distribute verbatim copies of this
5.\" manual provided the copyright notice and this permission notice are
6.\" preserved on all copies.
7.\"
8.\" Permission is granted to copy and distribute modified versions of this
9.\" manual under the conditions for verbatim copying, provided that the
10.\" entire resulting derived work is distributed under the terms of a
11.\" permission notice identical to this one.
12.\"
13.\" Since the Linux kernel and libraries are constantly changing, this
14.\" manual page may be incorrect or out-of-date. The author(s) assume no
15.\" responsibility for errors or omissions, or for damages resulting from
10d76543
MK
16.\" the use of the information contained herein. The author(s) may not
17.\" have taken the same level of care in the production of this manual,
18.\" which is licensed free of charge, as they might when working
19.\" professionally.
fea681da
MK
20.\"
21.\" Formatted or processed versions of this manual, if unaccompanied by
22.\" the source, must acknowledge the copyright and authors of this work.
4b72fb64 23.\" %%%LICENSE_END
fea681da
MK
24.\"
25.\" 6 Aug 2002 - Initial Creation
c11b1abf
MK
26.\" Modified 2003-05-23, Michael Kerrisk, <mtk.manpages@gmail.com>
27.\" Modified 2004-05-27, Michael Kerrisk, <mtk.manpages@gmail.com>
1c1e15ed 28.\" 2004-12-08, mtk Added O_NOATIME for CAP_FOWNER
5eaee3d9 29.\" 2005-08-16, mtk, Added CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE
c8e68512
MK
30.\" 2008-07-15, Serge Hallyn <serue@us.bbm.com>
31.\" Document file capabilities, per-process capability
32.\" bounding set, changed semantics for CAP_SETPCAP,
33.\" and other changes in 2.6.2[45].
34.\" Add CAP_MAC_ADMIN, CAP_MAC_OVERRIDE, CAP_SETFCAP.
35.\" 2008-07-15, mtk
36.\" Add text describing circumstances in which CAP_SETPCAP
37.\" (theoretically) permits a thread to change the
38.\" capability sets of another thread.
39.\" Add section describing rules for programmatically
40.\" adjusting thread capability sets.
41.\" Describe rationale for capability bounding set.
42.\" Document "securebits" flags.
43.\" Add text noting that if we set the effective flag for one file
44.\" capability, then we must also set the effective flag for all
45.\" other capabilities where the permitted or inheritable bit is set.
bfb730f9 46.\" 2011-09-07, mtk/Serge Hallyn: Add CAP_SYSLOG
5eaee3d9 47.\"
3df541c0 48.TH CAPABILITIES 7 2016-07-17 "Linux" "Linux Programmer's Manual"
fea681da
MK
49.SH NAME
50capabilities \- overview of Linux capabilities
51.SH DESCRIPTION
fea681da 52For the purpose of performing permission checks,
008f1ecc 53traditional UNIX implementations distinguish two categories of processes:
fea681da
MK
54.I privileged
55processes (whose effective user ID is 0, referred to as superuser or root),
56and
57.I unprivileged
c7094399 58processes (whose effective UID is nonzero).
fea681da
MK
59Privileged processes bypass all kernel permission checks,
60while unprivileged processes are subject to full permission
61checking based on the process's credentials
62(usually: effective UID, effective GID, and supplementary group list).
63
c13182ef
MK
64Starting with kernel 2.2, Linux divides the privileges traditionally
65associated with superuser into distinct units, known as
fea681da 66.IR capabilities ,
3dfe7e0d 67which can be independently enabled and disabled.
cf7a13d4 68Capabilities are a per-thread attribute.
c8e68512 69.\"
c634028a 70.SS Capabilities list
c8e68512
MK
71The following list shows the capabilities implemented on Linux,
72and the operations or behaviors that each capability permits:
fea681da 73.TP
45286787 74.BR CAP_AUDIT_CONTROL " (since Linux 2.6.11)"
5eaee3d9
MK
75Enable and disable kernel auditing; change auditing filter rules;
76retrieve auditing status and filtering rules.
77.TP
c81cea2c
MK
78.BR CAP_AUDIT_READ " (since Linux 3.16)"
79.\" commit a29b694aa1739f9d76538e34ae25524f9c549d59
80.\" commit 3a101b8de0d39403b2c7e5c23fd0b005668acf48
81Allow reading the audit log via a multicast netlink socket.
82.TP
45286787 83.BR CAP_AUDIT_WRITE " (since Linux 2.6.11)"
c8e68512 84Write records to kernel auditing log.
5eaee3d9 85.TP
9339d749
MK
86.BR CAP_BLOCK_SUSPEND " (since Linux 3.5)"
87Employ features that can block system suspend
88.RB ( epoll (7)
89.BR EPOLLWAKEUP ,
90.IR /sys/power/wake_lock ).
91.TP
fea681da 92.B CAP_CHOWN
c8e68512 93Make arbitrary changes to file UIDs and GIDs (see
fea681da
MK
94.BR chown (2)).
95.TP
96.B CAP_DAC_OVERRIDE
97Bypass file read, write, and execute permission checks.
c8e68512 98(DAC is an abbreviation of "discretionary access control".)
fea681da
MK
99.TP
100.B CAP_DAC_READ_SEARCH
a537062e
MK
101.PD 0
102.RS
103.IP * 2
fea681da 104Bypass file read permission checks and
a537062e
MK
105directory read and execute permission checks;
106.IP *
107Invoke
108.BR open_by_handle_at (2).
109.RE
110.PD
fea681da
MK
111.TP
112.B CAP_FOWNER
c8e68512
MK
113.PD 0
114.RS
115.IP * 2
fea681da 116Bypass permission checks on operations that normally
9ee4a2b6 117require the filesystem UID of the process to match the UID of
fea681da
MK
118the file (e.g.,
119.BR chmod (2),
120.BR utime (2)),
c8e68512 121excluding those operations covered by
fea681da
MK
122.B CAP_DAC_OVERRIDE
123and
124.BR CAP_DAC_READ_SEARCH ;
c8e68512 125.IP *
fea681da
MK
126set extended file attributes (see
127.BR chattr (1))
128on arbitrary files;
c8e68512 129.IP *
fea681da 130set Access Control Lists (ACLs) on arbitrary files;
c8e68512 131.IP *
1c1e15ed 132ignore directory sticky bit on file deletion;
c8e68512 133.IP *
1c1e15ed
MK
134specify
135.B O_NOATIME
136for arbitrary files in
137.BR open (2)
138and
139.BR fcntl (2).
c8e68512
MK
140.RE
141.PD
fea681da
MK
142.TP
143.B CAP_FSETID
ed948c28 144Don't clear set-user-ID and set-group-ID mode
c8e68512
MK
145bits when a file is modified;
146set the set-group-ID bit for a file whose GID does not match
9ee4a2b6 147the filesystem GID or any of the supplementary GIDs of the calling process.
fea681da
MK
148.TP
149.B CAP_IPC_LOCK
bea08fec 150.\" FIXME . As at Linux 3.2, there are some strange uses of this capability
46c73a44 151.\" in other places; they probably should be replaced with something else.
c8e68512 152Lock memory
fea681da
MK
153.RB ( mlock (2),
154.BR mlockall (2),
155.BR mmap (2),
156.BR shmctl (2)).
157.TP
158.B CAP_IPC_OWNER
159Bypass permission checks for operations on System V IPC objects.
160.TP
161.B CAP_KILL
162Bypass permission checks for sending signals (see
163.BR kill (2)).
097585ed 164This includes use of the
c8e68512 165.BR ioctl (2)
097585ed 166.B KDSIGACCEPT
c8e68512 167operation.
bea08fec 168.\" FIXME . CAP_KILL also has an effect for threads + setting child
a7c1e564
MK
169.\" termination signal to other than SIGCHLD: without this
170.\" capability, the termination signal reverts to SIGCHLD
c13182ef 171.\" if the child does an exec(). What is the rationale
a7c1e564 172.\" for this?
fea681da 173.TP
c8e68512
MK
174.BR CAP_LEASE " (since Linux 2.4)"
175Establish leases on arbitrary files (see
fea681da
MK
176.BR fcntl (2)).
177.TP
178.B CAP_LINUX_IMMUTABLE
c8e68512
MK
179Set the
180.B FS_APPEND_FL
fea681da 181and
c8e68512
MK
182.B FS_IMMUTABLE_FL
183.\" These attributes are now available on ext2, ext3, Reiserfs, XFS, JFS
e7e006f2 184inode flags (see
fea681da
MK
185.BR chattr (1)).
186.TP
c8e68512
MK
187.BR CAP_MAC_ADMIN " (since Linux 2.6.25)"
188Allow MAC configuration or state changes.
189Implemented for the Smack Linux Security Module (LSM).
190.TP
191.BR CAP_MAC_OVERRIDE " (since Linux 2.6.25)"
192Override Mandatory Access Control (MAC).
193Implemented for the Smack LSM.
194.TP
195.BR CAP_MKNOD " (since Linux 2.4)"
196Create special files using
fea681da
MK
197.BR mknod (2).
198.TP
199.B CAP_NET_ADMIN
e87268ec
MK
200Perform various network-related operations:
201.PD 0
202.RS
203.IP * 2
204interface configuration;
205.IP *
12fe8fd3 206administration of IP firewall, masquerading, and accounting;
e87268ec
MK
207.IP *
208modify routing tables;
209.IP *
210bind to any address for transparent proxying;
211.IP *
212set type-of-service (TOS);
213.IP *
214clear driver statistics;
215.IP *
216set promiscuous mode;
217.IP *
218enable multicasting;
219.IP *
220use
221.BR setsockopt (2)
222to set the following socket options:
223.BR SO_DEBUG ,
224.BR SO_MARK ,
225.BR SO_PRIORITY
226(for a priority outside the range 0 to 6),
227.BR SO_RCVBUFFORCE ,
228and
229.BR SO_SNDBUFFORCE .
230.RE
231.PD
fea681da
MK
232.TP
233.B CAP_NET_BIND_SERVICE
6eb334b2 234Bind a socket to Internet domain privileged ports
fea681da
MK
235(port numbers less than 1024).
236.TP
237.B CAP_NET_BROADCAST
c8e68512 238(Unused) Make socket broadcasts, and listen to multicasts.
fea681da
MK
239.TP
240.B CAP_NET_RAW
93e9e2d6
MK
241.PD 0
242.RS
243.IP * 2
244use RAW and PACKET sockets;
245.IP *
246bind to any address for transparent proxying.
247.RE
248.PD
fea681da
MK
249.\" Also various IP options and setsockopt(SO_BINDTODEVICE)
250.TP
251.B CAP_SETGID
c8e68512 252Make arbitrary manipulations of process GIDs and supplementary GID list;
5bea231d
MK
253forge GID when passing socket credentials via UNIX domain sockets;
254write a group ID mapping in a user namespace (see
f58fb24f 255.BR user_namespaces (7)).
fea681da 256.TP
c8e68512
MK
257.BR CAP_SETFCAP " (since Linux 2.6.24)"
258Set file capabilities.
259.TP
260.B CAP_SETPCAP
261If file capabilities are not supported:
262grant or remove any capability in the
263caller's permitted capability set to or from any other process.
264(This property of
265.B CAP_SETPCAP
266is not available when the kernel is configured to support
267file capabilities, since
fea681da 268.B CAP_SETPCAP
c8e68512
MK
269has entirely different semantics for such kernels.)
270
271If file capabilities are supported:
272add any capability from the calling thread's bounding set
273to its inheritable set;
274drop capabilities from the bounding set (via
275.BR prctl (2)
276.BR PR_CAPBSET_DROP );
277make changes to the
278.I securebits
279flags.
fea681da
MK
280.TP
281.B CAP_SETUID
c8e68512 282Make arbitrary manipulations of process UIDs
fea681da
MK
283.RB ( setuid (2),
284.BR setreuid (2),
285.BR setresuid (2),
286.BR setfsuid (2));
a7d96776 287forge UID when passing socket credentials via UNIX domain sockets;
5bea231d 288write a user ID mapping in a user namespace (see
f58fb24f 289.BR user_namespaces (7)).
777f5a9e 290.\" FIXME CAP_SETUID also an effect in exec(); document this.
fea681da
MK
291.TP
292.B CAP_SYS_ADMIN
c8e68512
MK
293.PD 0
294.RS
295.IP * 2
296Perform a range of system administration operations including:
fea681da
MK
297.BR quotactl (2),
298.BR mount (2),
299.BR umount (2),
1368e847
MK
300.BR swapon (2),
301.BR swapoff (2),
fea681da 302.BR sethostname (2),
f169a862 303and
c8e68512
MK
304.BR setdomainname (2);
305.IP *
bfb730f9
MK
306perform privileged
307.BR syslog (2)
308operations (since Linux 2.6.37,
309.BR CAP_SYSLOG
310should be used to permit such operations);
311.IP *
c8e68512 312perform
c11e3891
MK
313.B VM86_REQUEST_IRQ
314.BR vm86 (2)
315command;
316.IP *
317perform
fea681da
MK
318.B IPC_SET
319and
320.B IPC_RMID
321operations on arbitrary System V IPC objects;
c8e68512 322.IP *
1a3b63f7
MK
323override
324.B RLIMIT_NPROC
325resource limit;
326.IP *
fea681da
MK
327perform operations on
328.I trusted
329and
330.I security
331Extended Attributes (see
89fabe2e 332.BR xattr (7));
c8e68512
MK
333.IP *
334use
08baa0af 335.BR lookup_dcookie (2);
c8e68512 336.IP *
a1f926b8
MK
337use
338.BR ioprio_set (2)
339to assign
340.B IOPRIO_CLASS_RT
83ee9237 341and (before Linux 2.6.25)
237aa7c5 342.B IOPRIO_CLASS_IDLE
a1f926b8 343I/O scheduling classes;
c8e68512 344.IP *
f5ac5bbf 345forge PID when passing socket credentials via UNIX domain sockets;
c8e68512 346.IP *
fea681da 347exceed
3dfe7e0d
MK
348.IR /proc/sys/fs/file-max ,
349the system-wide limit on the number of open files,
350in system calls that open files (e.g.,
fea681da
MK
351.BR accept (2),
352.BR execve (2),
353.BR open (2),
f169a862 354.BR pipe (2));
c8e68512 355.IP *
c13182ef 356employ
0f807eea
MK
357.B CLONE_*
358flags that create new namespaces with
a7c1e564
MK
359.BR clone (2)
360and
c67d3814
MK
361.BR unshare (2)
362(but, since Linux 3.8,
363creating user namespaces does not require any capability);
c8e68512 364.IP *
e4698850 365call
0f322ccc
MK
366.BR perf_event_open (2);
367.IP *
0f322ccc
MK
368access privileged
369.I perf
370event information;
2bfe6656
MK
371.IP *
372call
c3b49118
MK
373.BR setns (2)
374(requires
375.B CAP_SYS_ADMIN
376in the
377.I target
378namespace);
e4698850 379.IP *
0f807eea
MK
380call
381.BR fanotify_init (2);
382.IP *
0563f204
MK
383call
384.BR bpf (2);
385.IP *
c13182ef 386perform
a7c1e564
MK
387.B KEYCTL_CHOWN
388and
389.B KEYCTL_SETPERM
390.BR keyctl (2)
e64e6056
MK
391operations;
392.IP *
393perform
394.BR madvise (2)
395.B MADV_HWPOISON
0f807eea
MK
396operation;
397.IP *
398employ the
399.B TIOCSTI
400.BR ioctl (2)
401to insert characters into the input queue of a terminal other than
838ad419 402the caller's controlling terminal;
0f807eea 403.IP *
0f807eea 404employ the obsolete
51c5c662 405.BR nfsservctl (2)
c42221c4
MK
406system call;
407.IP *
408employ the obsolete
0f807eea
MK
409.BR bdflush (2)
410system call;
411.IP *
412perform various privileged block-device
413.BR ioctl (2)
414operations;
415.IP *
9ee4a2b6 416perform various privileged filesystem
0f807eea
MK
417.BR ioctl (2)
418operations;
419.IP *
420perform administrative operations on many device drivers.
c8e68512
MK
421.RE
422.PD
fea681da
MK
423.TP
424.B CAP_SYS_BOOT
c8e68512 425Use
08baa0af
MK
426.BR reboot (2)
427and
428.BR kexec_load (2).
fea681da
MK
429.TP
430.B CAP_SYS_CHROOT
c8e68512 431Use
fea681da
MK
432.BR chroot (2).
433.TP
434.B CAP_SYS_MODULE
c8e68512
MK
435Load and unload kernel modules
436(see
fea681da
MK
437.BR init_module (2)
438and
c8e68512
MK
439.BR delete_module (2));
440in kernels before 2.6.25:
441drop capabilities from the system-wide capability bounding set.
fea681da
MK
442.TP
443.B CAP_SYS_NICE
c8e68512
MK
444.PD 0
445.RS
446.IP * 2
447Raise process nice value
fea681da
MK
448.RB ( nice (2),
449.BR setpriority (2))
c8e68512
MK
450and change the nice value for arbitrary processes;
451.IP *
452set real-time scheduling policies for calling process,
453and set scheduling policies and priorities for arbitrary processes
fea681da 454.RB ( sched_setscheduler (2),
f96787ab
MK
455.BR sched_setparam (2),
456.BR sched_setattr (2));
c8e68512 457.IP *
fea681da 458set CPU affinity for arbitrary processes
c13182ef 459.RB ( sched_setaffinity (2));
c8e68512 460.IP *
a1f926b8 461set I/O scheduling class and priority for arbitrary processes
c13182ef 462.RB ( ioprio_set (2));
c8e68512
MK
463.IP *
464apply
a1f926b8 465.BR migrate_pages (2)
c8e68512 466to arbitrary processes and allow processes
a1f926b8 467to be migrated to arbitrary nodes;
c13182ef 468.\" FIXME CAP_SYS_NICE also has the following effect for
a1f926b8
MK
469.\" migrate_pages(2):
470.\" do_migrate_pages(mm, &old, &new,
471.\" capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
bea08fec 472.\" Document this.
c8e68512
MK
473.IP *
474apply
a7c1e564 475.BR move_pages (2)
c8e68512
MK
476to arbitrary processes;
477.IP *
4d62f7b6
MK
478use the
479.B MPOL_MF_MOVE_ALL
c13182ef 480flag with
a7c1e564 481.BR mbind (2)
c13182ef 482and
a7c1e564 483.BR move_pages (2).
c8e68512
MK
484.RE
485.PD
fea681da
MK
486.TP
487.B CAP_SYS_PACCT
c8e68512 488Use
fea681da
MK
489.BR acct (2).
490.TP
491.B CAP_SYS_PTRACE
eb64a9cb
MK
492.PD 0
493.RS
494.IP * 3
c8e68512 495Trace arbitrary processes using
cbd7b9bf 496.BR ptrace (2);
eb64a9cb 497.IP *
cbd7b9bf
MK
498apply
499.BR get_robust_list (2)
38b6e5b0 500to arbitrary processes;
eb64a9cb 501.IP *
b8f84ce2
MK
502transfer data to or from the memory of arbitrary processes using
503.BR process_vm_readv (2)
504and
505.BR process_vm_writev (2);
506.IP *
38b6e5b0
MK
507inspect processes using
508.BR kcmp (2).
eb64a9cb
MK
509.RE
510.PD
fea681da
MK
511.TP
512.B CAP_SYS_RAWIO
4637c8cb
MK
513.PD 0
514.RS
515.IP * 2
c8e68512 516Perform I/O port operations
fea681da
MK
517.RB ( iopl (2)
518and
519.BR ioperm (2));
4637c8cb 520.IP *
fea681da 521access
474e1f9d 522.IR /proc/kcore ;
4637c8cb 523.IP *
474e1f9d
MK
524employ the
525.B FIBMAP
526.BR ioctl (2)
4637c8cb
MK
527operation;
528.IP *
529open devices for accessing x86 model-specific registers (MSRs, see
530.BR msr (4));
531.IP *
532update
533.IR /proc/sys/vm/mmap_min_addr ;
534.IP *
535create memory mappings at addresses below the value specified by
536.IR /proc/sys/vm/mmap_min_addr ;
537.IP *
50b2aa27 538map files in
cef53f3e 539.IR /proc/bus/pci ;
4637c8cb
MK
540.IP *
541open
542.IR /dev/mem
543and
544.IR /dev/kmem ;
545.IP *
546perform various SCSI device commands;
547.IP *
548perform certain operations on
549.BR hpsa (4)
550and
551.BR cciss (4)
552devices;
553.IP *
554perform a range of device-specific operations on other devices.
555.RE
556.PD
fea681da
MK
557.TP
558.B CAP_SYS_RESOURCE
c8e68512
MK
559.PD 0
560.RS
561.IP * 2
9ee4a2b6 562Use reserved space on ext2 filesystems;
c8e68512
MK
563.IP *
564make
fea681da
MK
565.BR ioctl (2)
566calls controlling ext3 journaling;
c8e68512
MK
567.IP *
568override disk quota limits;
569.IP *
570increase resource limits (see
fea681da 571.BR setrlimit (2));
c8e68512
MK
572.IP *
573override
fea681da 574.B RLIMIT_NPROC
c8e68512
MK
575resource limit;
576.IP *
aa66392d
MK
577override maximum number of consoles on console allocation;
578.IP *
579override maximum number of keymaps;
580.IP *
581allow more than 64 Hz interrupts from the real-time clock;
582.IP *
c8e68512 583raise
fea681da 584.I msg_qbytes
c8e68512 585limit for a System V message queue above the limit in
0daa9e92 586.I /proc/sys/kernel/msgmnb
fea681da
MK
587(see
588.BR msgop (2)
589and
ad7b0f91
MK
590.BR msgctl (2));
591.IP *
592override the
593.I /proc/sys/fs/pipe-max-size
594limit when setting the capacity of a pipe using the
595.B F_SETPIPE_SZ
596.BR fcntl (2)
597command;
46883521
MK
598.IP *
599use
600.BR F_SETPIPE_SZ
601to increase the capacity of a pipe above the limit specified by
b39a2012
MK
602.IR /proc/sys/fs/pipe-max-size ;
603.IP *
604override
605.I /proc/sys/fs/mqueue/queues_max
606limit when creating POSIX message queues (see
ecc1f45b
MK
607.BR mq_overview (7));
608.IP *
609employ
610.BR prctl (2)
611.B PR_SET_MM
8ddcc591 612operation;
41f00272 613.IP *
8ddcc591
MK
614set
615.IR /proc/PID/oom_score_adj
616to a value lower than the value last set by a process with
617.BR CAP_SYS_RESOURCE .
c8e68512
MK
618.RE
619.PD
fea681da
MK
620.TP
621.B CAP_SYS_TIME
c8e68512 622Set system clock
fea681da
MK
623.RB ( settimeofday (2),
624.BR stime (2),
625.BR adjtimex (2));
c8e68512 626set real-time (hardware) clock.
fea681da
MK
627.TP
628.B CAP_SYS_TTY_CONFIG
c8e68512 629Use
749ac769
MK
630.BR vhangup (2);
631employ various privileged
632.BR ioctl (2)
633operations on virtual terminals.
bfb730f9
MK
634.TP
635.BR CAP_SYSLOG " (since Linux 2.6.37)"
5f94327c
MK
636.RS
637.PD 0
10fe5485 638.IP * 3
bfb730f9
MK
639Perform privileged
640.BR syslog (2)
641operations.
642See
643.BR syslog (2)
644for information on which operations require privilege.
10fe5485
MK
645.IP *
646View kernel addresses exposed via
647.I /proc
648and other interfaces when
649.IR /proc/sys/kernel/kptr_restrict
650has the value 1.
4eaa04c5 651(See the discussion of the
10fe5485
MK
652.I kptr_restrict
653in
654.BR proc (5).)
5f94327c
MK
655.PD
656.RE
d6b08708
MK
657.TP
658.BR CAP_WAKE_ALARM " (since Linux 3.0)"
659Trigger something that will wake up the system (set
660.B CLOCK_REALTIME_ALARM
661and
662.B CLOCK_BOOTTIME_ALARM
663timers).
c8e68512 664.\"
c634028a 665.SS Past and current implementation
c8e68512
MK
666A full implementation of capabilities requires that:
667.IP 1. 3
668For all privileged operations,
669the kernel must check whether the thread has the required
670capability in its effective set.
671.IP 2.
137d81b5 672The kernel must provide system calls allowing a thread's capability sets to
c8e68512
MK
673be changed and retrieved.
674.IP 3.
9ee4a2b6 675The filesystem must support attaching capabilities to an executable file,
c8e68512
MK
676so that a process gains those capabilities when the file is executed.
677.PP
678Before kernel 2.6.24, only the first two of these requirements are met;
679since kernel 2.6.24, all three requirements are met.
680.\"
c634028a 681.SS Thread capability sets
cf7a13d4 682Each thread has three capability sets containing zero or more
fea681da
MK
683of the above capabilities:
684.TP
fea681da 685.IR Permitted :
c8e68512
MK
686This is a limiting superset for the effective
687capabilities that the thread may assume.
688It is also a limiting superset for the capabilities that
689may be added to the inheritable set by a thread that does not have the
690.B CAP_SETPCAP
691capability in its effective set.
692
cf7a13d4 693If a thread drops a capability from its permitted set,
3b777aff 694it can never reacquire that capability (unless it
c930827f 695.BR execve (2)s
c8e68512
MK
696either a set-user-ID-root program, or
697a program whose associated file capabilities grant that capability).
fea681da 698.TP
c8e68512
MK
699.IR Inheritable :
700This is a set of capabilities preserved across an
fea681da 701.BR execve (2).
6260f4cd
AL
702Inheritable capabilities remain inheritable when executing any program,
703and inheritable capabilities are added to the permitted set when executing
704a program that has the corresponding bits set in the file inheritable set.
705.IP
706Because inheritable capabilities are not generally preserved across
707.BR execve (2)
708when running as a non-root user, applications that wish to run helper
e574dcd0
MK
709programs with elevated capabilities should consider using
710ambient capabilities, described below.
c8e68512
MK
711.TP
712.IR Effective :
713This is the set of capabilities used by the kernel to
714perform permission checks for the thread.
6260f4cd
AL
715.TP
716.IR Ambient " (since Linux 4.3):"
e574dcd0 717.\" commit 58319057b7847667f0c9585b9de0e8932b0fdb08
6260f4cd
AL
718This is a set of capabilities that are preserved across an
719.BR execve (2)
3375bef1 720of a program that is not privileged.
e574dcd0
MK
721The ambient capability set obeys the invariant that no capability
722can ever be ambient if it is not both permitted and inheritable.
3375bef1
MK
723
724The ambient capability set can be directly modified using
725.BR prctl (2).
726Ambient capabilities are automatically lowered if either of
727the corresponding permitted or inheritable capabilities is lowered.
728
729Executing a program that changes UID or GID due to the
730set-user-ID or set-group-ID bits or executing a program that has
731any file capabilities set will clear the ambient set.
732Ambient capabilities are added to the permitted set and
733assigned to the effective set when
6260f4cd 734.BR execve (2)
e574dcd0 735is called.
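The ambient-set invariant can be illustrated with a small user-space model.
This is only a sketch: capability sets are modeled as integer bit masks, and
the class and method names are hypothetical, not a kernel or libcap interface.

```python
# Toy model of a thread's capability sets as integer bit masks.
# ThreadCaps and its methods are illustrative names only.
CAP_NET_BIND_SERVICE = 1 << 10  # bit number as defined in <linux/capability.h>


class ThreadCaps:
    def __init__(self, permitted=0, inheritable=0):
        self.permitted = permitted
        self.inheritable = inheritable
        self.ambient = 0

    def raise_ambient(self, caps):
        """Add caps to the ambient set; each must be permitted and inheritable."""
        if caps & ~(self.permitted & self.inheritable):
            raise PermissionError("capability is not both permitted and inheritable")
        self.ambient |= caps

    def lower_permitted(self, caps):
        """Drop caps from the permitted set; the ambient set follows."""
        self.permitted &= ~caps
        self.ambient &= self.permitted & self.inheritable

    def lower_inheritable(self, caps):
        """Drop caps from the inheritable set; the ambient set follows."""
        self.inheritable &= ~caps
        self.ambient &= self.permitted & self.inheritable
```

In this model, raising an ambient capability fails unless the capability is
already in both the permitted and inheritable sets, and lowering either of
those sets clears the matching ambient bit.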
fea681da 736.PP
fea681da
MK
737A child created via
738.BR fork (2)
739inherits copies of its parent's capability sets.
3dfe7e0d 740See below for a discussion of the treatment of capabilities during
c930827f 741.BR execve (2).
fea681da
MK
742.PP
743Using
744.BR capset (2),
c8e68512 745a thread may manipulate its own capability sets (see below).
afae50e4
MK
746.PP
747Since Linux 3.2, the file
748.I /proc/sys/kernel/cap_last_cap
a60b1f03 749.\" commit 73efc0394e148d0e15583e13712637831f926720
afae50e4
MK
750exposes the numerical value of the highest capability
751supported by the running kernel;
752this can be used to determine the highest bit
753that may be set in a capability set.
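As a minimal sketch of how this file might be used, the following reads the
value and builds a mask of all capability bits the kernel supports.
The function names are hypothetical; the path parameter exists only so the
code can be exercised against another file.

```python
def highest_capability(path="/proc/sys/kernel/cap_last_cap"):
    """Return the number of the highest capability reported in the given file."""
    with open(path) as f:
        return int(f.read().strip())


def full_capability_mask(last_cap):
    """Return an integer mask with bits 0..last_cap set."""
    return (1 << (last_cap + 1)) - 1
```

On a kernel that provides the file, full_capability_mask(highest_capability())
yields a mask covering every capability known to the running kernel.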
c8e68512 754.\"
c634028a 755.SS File capabilities
c8e68512
MK
756Since kernel 2.6.24, the kernel supports
757associating capability sets with an executable file using
758.BR setcap (8).
759The file capability sets are stored in an extended attribute (see
760.BR setxattr (2))
761named
762.IR "security.capability" .
763Writing to this extended attribute requires the
764.BR CAP_SETFCAP
fea681da 765capability.
c8e68512 766The file capability sets,
cf7a13d4 767in conjunction with the capability sets of the thread,
c8e68512 768determine the capabilities of a thread after an
c930827f 769.BR execve (2).
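The following sketch decodes a raw
.I security.capability
value into the three file capability sets.
It assumes the on-disk layout and constants of struct vfs_cap_data from the
kernel header <linux/capability.h>; the function name is hypothetical.

```python
import struct

# Constants assumed from the kernel header <linux/capability.h>.
VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_REVISION_1 = 0x01000000  # one 32-bit word per set
VFS_CAP_REVISION_2 = 0x02000000  # two 32-bit words per set
VFS_CAP_REVISION_3 = 0x03000000  # as revision 2, followed by a root ID
VFS_CAP_FLAGS_EFFECTIVE = 0x000001

_WORDS = {VFS_CAP_REVISION_1: 1, VFS_CAP_REVISION_2: 2, VFS_CAP_REVISION_3: 2}


def parse_file_caps(raw):
    """Decode an xattr value into (permitted, inheritable, effective_bit)."""
    (magic_etc,) = struct.unpack_from("<I", raw, 0)
    words = _WORDS.get(magic_etc & VFS_CAP_REVISION_MASK)
    if words is None:
        raise ValueError("unknown file capability revision")
    permitted = inheritable = 0
    for i in range(words):
        # Each word pair is little-endian: permitted first, then inheritable.
        p, inh = struct.unpack_from("<II", raw, 4 + 8 * i)
        permitted |= p << (32 * i)
        inheritable |= inh << (32 * i)
    return permitted, inheritable, bool(magic_etc & VFS_CAP_FLAGS_EFFECTIVE)
```

On Linux, the raw value for a given file could be obtained with
os.getxattr(path, "security.capability") and passed to this function.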
c8e68512
MK
770
771The three file capability sets are:
fea681da 772.TP
3dfe7e0d 773.IR Permitted " (formerly known as " forced ):
c8e68512 774These capabilities are automatically permitted to the thread,
cf7a13d4 775regardless of the thread's inheritable capabilities.
fea681da 776.TP
c8e68512
MK
777.IR Inheritable " (formerly known as " allowed ):
778This set is ANDed with the thread's inheritable set to determine which
779inheritable capabilities are enabled in the permitted set of
780the thread after the
781.BR execve (2).
782.TP
fea681da 783.IR Effective :
c8e68512
MK
784This is not a set, but rather just a single bit.
785If this bit is set, then during an
786.BR execve (2)
787all of the new permitted capabilities for the thread are
788also raised in the effective set.
789If this bit is not set, then after an
790.BR execve (2),
791none of the new permitted capabilities is in the new effective set.
792
793Enabling the file effective capability bit implies
2914a14d 794that any file permitted or inheritable capability that causes a
c8e68512
MK
795thread to acquire the corresponding permitted capability during an
796.BR execve (2)
e33a08e1 797(see the transformation rules described below) will also acquire that
c8e68512
MK
798capability in its effective set.
799Therefore, when assigning capabilities to a file
800.RB ( setcap (8),
801.BR cap_set_file (3),
802.BR cap_set_fd (3)),
803if we specify the effective flag as being enabled for any capability,
804then the effective flag must also be specified as enabled
805for all other capabilities for which the corresponding permitted or
806inheritable flag is enabled.
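The rule about the effective flag can be expressed as a small consistency
check. This is a sketch only: the per-capability flag representation and the
function name are assumptions made for illustration.

```python
def effective_flags_consistent(flags):
    """Check the effective-flag rule for a proposed file capability setting.

    flags maps a capability name to a set drawn from {"p", "i", "e"}
    (permitted, inheritable, effective).
    """
    if not any("e" in f for f in flags.values()):
        return True  # effective flag not used at all
    # Every capability with a permitted or inheritable flag needs "e" too.
    return all("e" in f for f in flags.values() if f & {"p", "i"})
```

A setting that marks one capability as effective but leaves another
permitted-only capability without the effective flag is rejected by this check.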
807.\"
c634028a 808.SS Transformation of capabilities during execve()
fea681da 809.PP
c13182ef 810During an
c930827f 811.BR execve (2),
1e321034 812the kernel calculates the new capabilities of
fea681da 813the process using the following algorithm:
088a639b 814.in +4n
fea681da
MK
815.nf
816
3375bef1 817P'(ambient) = (file is privileged) ? 0 : P(ambient)
6260f4cd 818
c13182ef 819P'(permitted) = (P(inheritable) & F(inheritable)) |
6260f4cd 820 (F(permitted) & cap_bset) | P'(ambient)
fea681da 821
6260f4cd 822P'(effective) = F(effective) ? P'(permitted) : P'(ambient)
fea681da 823
5bdccabd 824P'(inheritable) = P(inheritable) [i.e., unchanged]
fea681da
MK
825
826.fi
088a639b 827.in
fea681da 828where:
c8e68512 829.RS 4
fea681da 830.IP P 10
c13182ef 831denotes the value of a thread capability set before the
c930827f 832.BR execve (2)
c8e68512 833.IP P'
8295fc02 834denotes the value of a thread capability set after the
c930827f 835.BR execve (2)
c8e68512 836.IP F
fea681da 837denotes a file capability set
c8e68512
MK
838.IP cap_bset
839is the value of the capability bounding set (described below).
840.RE
3375bef1
MK
841.PP
842A privileged file is one that has capabilities or
843has the set-user-ID or set-group-ID bit set.
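The algorithm above can be sketched with integer bit masks.
The function and parameter names are hypothetical; only the formulas come
from the rules given in this section.

```python
CAP_NET_BIND_SERVICE = 1 << 10  # bit numbers as defined in <linux/capability.h>
CAP_NET_RAW = 1 << 13


def execve_transform(p_inheritable, p_ambient,
                     f_permitted, f_inheritable, f_effective,
                     cap_bset, file_is_privileged):
    """Return (permitted, inheritable, effective, ambient) after execve()."""
    # P'(ambient) = (file is privileged) ? 0 : P(ambient)
    new_ambient = 0 if file_is_privileged else p_ambient
    # P'(permitted) = (P(inheritable) & F(inheritable)) |
    #                 (F(permitted) & cap_bset) | P'(ambient)
    new_permitted = ((p_inheritable & f_inheritable)
                     | (f_permitted & cap_bset)
                     | new_ambient)
    # P'(effective) = F(effective) ? P'(permitted) : P'(ambient)
    new_effective = new_permitted if f_effective else new_ambient
    # P'(inheritable) = P(inheritable)
    return new_permitted, p_inheritable, new_effective, new_ambient
```

For example, a file whose permitted set holds only CAP_NET_RAW and whose
effective bit is set yields that capability in both the permitted and
effective sets, provided the bounding set does not mask it out.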
c8e68512 844.\"
e0e57837 845.SS Safety checking for capability-dumb binaries
4a866754 846A capability-dumb binary is an application that has been
e0e57837
MK
847marked to have file capabilities, but has not been converted to use the
848.BR libcap (3)
849API to manipulate its capabilities.
850(In other words, this is a traditional set-user-ID-root program
851that has been switched to use file capabilities,
852but whose code has not been modified to understand capabilities.)
2c767761 853For such applications,
e0e57837
MK
854the effective capability bit is set on the file,
855so that the file permitted capabilities are automatically
856enabled in the process effective set when executing the file.
857The kernel recognizes a file which has the effective capability bit set
858as capability-dumb for the purpose of the check described here.
859
860When executing a capability-dumb binary,
861the kernel checks if the process obtained all permitted capabilities
862that were specified in the file permitted set,
863after the capability transformations described above have been performed.
864(The typical reason why this might
865.I not
866occur is that the capability bounding set masked out some
867of the capabilities in the file permitted set.)
868If the process did not obtain the full set of
869file permitted capabilities, then
870.BR execve (2)
871fails with the error
872.BR EPERM .
873This prevents possible security risks that could arise when
874a capability-dumb application is executed with less privilege than it needs.
875Note that, by definition,
876the application could not itself recognize this problem,
877since it does not employ the
878.BR libcap (3)
879API.
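The check described above can be modeled as follows.
This is a sketch under the assumption that capability sets are integer bit
masks; the function name is hypothetical and the real check is performed
inside the kernel.

```python
import errno


def check_capability_dumb(f_permitted, f_effective, new_permitted):
    """Raise PermissionError (EPERM) if a capability-dumb binary falls short.

    f_effective is the file effective bit; new_permitted is the thread's
    permitted set after the execve() transformation.
    """
    if f_effective and (new_permitted & f_permitted) != f_permitted:
        raise PermissionError(errno.EPERM,
                              "file permitted capabilities not fully granted")
```

A bounding set that masks out one of the file permitted capabilities makes
the model raise the error, mirroring the failure of the real call.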
880.\"
c8e68512
MK
881.SS Capabilities and execution of programs by root
882In order to provide an all-powerful
883.I root
884using capability sets, during an
885.BR execve (2):
886.IP 1. 3
887If a set-user-ID-root program is being executed,
888or the real user ID of the process is 0 (root)
889then the file inheritable and permitted sets are defined to be all ones
890(i.e., all capabilities enabled).
891.IP 2.
892If a set-user-ID-root program is being executed,
893then the file effective bit is defined to be one (enabled).
3dfe7e0d 894.PP
c8e68512
MK
895The upshot of the above rules,
896combined with the capabilities transformations described above,
897is that when a process
c930827f 898.BR execve (2)s
3dfe7e0d 899a set-user-ID-root program, or when a process with an effective UID of 0
c930827f 900.BR execve (2)s
3dfe7e0d 901a program,
c13182ef 902it gains all capabilities in its permitted and effective capability sets,
c8e68512 903except those masked out by the capability bounding set.
c7094399 904.\" If a process with real UID 0, and nonzero effective UID does an
c8e68512 905.\" exec(), then it gets all capabilities in its
35fb7de5 906.\" permitted set, and no effective capabilities
3dfe7e0d 907This provides semantics that are the same as those provided by
008f1ecc 908traditional UNIX systems.
c8e68512
MK
909.SS Capability bounding set
910The capability bounding set is a security mechanism that can be used
911to limit the capabilities that can be gained during an
912.BR execve (2).
913The bounding set is used in the following ways:
914.IP * 2
915During an
916.BR execve (2),
917the capability bounding set is ANDed with the file permitted
918capability set, and the result of this operation is assigned to the
919thread's permitted capability set.
920The capability bounding set thus places a limit on the permitted
921capabilities that may be granted by an executable file.
922.IP *
923(Since Linux 2.6.25)
924The capability bounding set acts as a limiting superset for
925the capabilities that a thread can add to its inheritable set using
926.BR capset (2).
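The two limits on additions to the inheritable set (the permitted set, for a
thread lacking
.BR CAP_SETPCAP ,
and the bounding set) can be combined in a short model.
The function name and the bit-mask representation are assumptions made for
illustration.

```python
CAP_SETPCAP = 1 << 8  # bit number as defined in <linux/capability.h>


def may_set_inheritable(old_inheritable, new_inheritable,
                        permitted, effective, cap_bset):
    """Return True if the new inheritable set respects both limits."""
    added = new_inheritable & ~old_inheritable
    # Without CAP_SETPCAP in the effective set, additions must be permitted.
    if not (effective & CAP_SETPCAP) and added & ~permitted:
        return False
    # Additions must also lie within the capability bounding set.
    if added & ~cap_bset:
        return False
    return True
```

Capabilities already present in the inheritable set are unaffected by either
limit in this model; only newly added bits are checked.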