You can also see this in the various implementations of
->get_unmapped_area() - if the specified address isn't available,
the kernel basically ignores the hint (apart from the 5level
paging hack).
Clarify how this works a bit.
Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Nikola Forró [Fri, 22 Feb 2019 16:14:14 +0000 (17:14 +0100)]
socket.2: Remove notes concerning AF_ALG and AF_XDP
All address families are now documented in address_families.7,
which is already present in SEE ALSO section. Also, the AF_ALG
note contains dead link to kernel HTML documentation.
Signed-off-by: Nikola Forró <nforro@redhat.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
sigaction.2: Describe obsolete usage of struct sigcontext as signal handler argument
* man2/sigaction.2 (.SS Undocumented): Provide information about
relation between the second argument of sa_handler and
uc_mcontext field of the struct ucontext structure.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
bert hubert [Sun, 24 Feb 2019 10:18:11 +0000 (11:18 +0100)]
ip.7: IP_RECVTTL error fixed
I need to get the TTL of UDP datagrams from userspace, so I set
the IP_RECVTTL socket option. And as promised by ip.7, I then get
IP_TTL messages from recvfrom. However, unlike what the manpage
promises, the TTL field gets passed as a 32 bit integer.
Michael Kerrisk [Tue, 12 Feb 2019 15:56:13 +0000 (16:56 +0100)]
capabilities.7: Substantially rework "Capabilities and execution of programs by root"
Rework for improved clarity, and also to include missing details
on the case where (1) the binary that is being executed has
capabilities attached and (2) the real user ID of the process is
not 0 (root) and (3) the effective user ID of the process is 0
(root).
Kernel code analysis and some test code (GPLv3 licensed) below.
======
My analysis of security/commoncaps.c capabilities handling
(from Linux 4.20 source):
execve() eventually calls __do_execve_file():
__do_execve_file()
|
+-prepare_bprm_creds(&bprm)
| |
| +-prepare_exec_creds()
| | |
| | +-prepare_creds()
| | |
| | | // Returns copy of existing creds
| | |
| | +-security_prepare_creds()
| | |
| | +-cred_prepare() [via hook]
| | // Seems to do nothing for commoncaps
| |
| // Returns creds provided by prepare_creds()
|
// Places creds returned by prepare_exec_creds() in bprm->creds
|
|
+-prepare_binprm(&bprm) // bprm from prepare_bprm_creds()
|
+-bprm_fill_uid(&bprm)
|
| // Places current credentials into bprm
|
| // Performs set-UID & set-GID transitions if those file bits are set
|
+-security_bprm_set_creds(&bprm)
|
+-bprm_set_creds(&bprm) [via hook]
|
+-cap_bprm_set_creds(&bprm)
|
// effective = false
|
+-get_file_caps(&bprm, &effective, &has_fcap)
| |
| +-get_vfs_caps_from_disk(..., &vcaps)
| |
| | // Fetches file capabilities from disk and places in vcaps
| |
| +-bprm_caps_from_vfs_caps(&vcaps, &bprm, &effective, &has_fcap)
|
| // If file effective bit is set: effective = true
| //
| // If file has capabilities: has_fcap |= true
| //
| // Perform execve transformation:
| // P'(perm) = F(inh) & P(Inh) | F(Perm) & P(bset)
|
+-handle_privileged_root(&bprm, has_fcap, &effective, root_uid)
|
| // If has_fcap && (rUID != root && eUID == root) then
| // return without doing anything
| //
| // If rUID == root || eUID == root then
| // P'(perm) = P(inh) | P(bset)
| //
| // If eUID == root then
| // effective = true
|
// Perform execve() transformation:
//
// P'(Amb) = (privprog) ? 0 : P(Amb)
// P'(Perm) |= P'(Amb)
// P'(Eff) = effective ? P'(Perm) : P'(Amb)
Summary
1. Perform set-UID/set-GID transformations
2. P'(Amb) = (privprog) ? 0 : P(Amb)
3. If [process has nonzero UIDs] OR
([file has caps] && [rUID != root && eUID == root]), then
Michael Kerrisk [Tue, 12 Feb 2019 09:29:21 +0000 (10:29 +0100)]
capabilities.7: Improve the discussion of when file capabilities are ignored
The text stated that the execve() capability transitions are not
performed for the same reasons that setuid and setgid mode bits
may be ignored (as described in execve(2)). But, that's not quite
correct: rather, the file capability sets are treated as empty
for the purpose of the capability transition calculations.
Also merge the new 'no_file_caps' kernel option text into the
same paragraph.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk [Sat, 23 Feb 2019 20:58:23 +0000 (21:58 +0100)]
capget.2: Remove crufty sentence suggesting use of deprecated functions
Remove crufty sentence suggesting use of deprecated capsetp(3) and
capgetp(3); the manual page for those functions has long (at least
as far back as 2007) noted that they are deprecated.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk [Sun, 10 Feb 2019 02:40:15 +0000 (03:40 +0100)]
capabilities.7: Rework discussion of exec and UID 0, correcting a couple of details
Clarify the "Capabilities and execution of programs by root"
section, and correct a couple of details:
* If a process with rUID == 0 && eUID != 0 does an exec,
the process will nevertheless gain effective capabilities
if the file effective bit is set.
* Set-UID-root programs only confer a full set of capabilities
if the binary does not also have attached capabilities.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Howard Johnson [Thu, 10 Jan 2019 04:01:46 +0000 (17:01 +1300)]
localedef.1: Note that -f and -c, are reversed from what you might expect
I was reading the local-gen bash script, looking for why I'm
getting locale errors, when I noticed that localdef's -f and -c
options were named, in what I think, is a very confusing way.
-c is the same as --force, and
-f charmapfile is the same as --charmap=charmapfile.
Yes, it would have been better if they're names had been reversed,
like this:
-f is the same as --force, and
-c charmapfile is the same as --charmap=charmapfile.
But given what they are, I thought it would be helpful to give a
heads up to watch for their irregular naming. I hope I've worded
it appropriately.
I'm not ccing this to anyone else, (i.e. developers, etc), as
these features work as described in the man page. They're just
confusing.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Xiao Yang [Fri, 21 Dec 2018 03:01:58 +0000 (11:01 +0800)]
bsd_signal.3: Fix the wrong version of _POSIX_C_SOURCE
According to the latest glibc, the bsd_signal() function is just
declared when POSIX.1-2008 (or newer) instead of POSIX.1-2001 is
not set since glibc v2.26.
Please see the following code from signal/signal.h:
-----------------------------------------------------------------
/* The X/Open definition of `signal' conflicts with the BSD version.
So they defined another function `bsd_signal'. */
extern __sighandler_t bsd_signal (int __sig, __sighandler_t __handler)
__THROW;
-----------------------------------------------------------------
Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk [Sun, 23 Dec 2018 16:50:02 +0000 (17:50 +0100)]
write.2: RETURN VALUE: clarify details of partial write and
As reported by Nadav Har'El in
https://bugzilla.kernel.org/show_bug.cgi?id=197961
The write(2) manual page has this paragraph:
"On success, the number of bytes written is returned
(zero indicates nothing was written). It is not an error
if this number is smaller than the number of bytes
requested; this may happen for example because the disk
device was filled. See also NOTES."
I find a few problems with this paragraph:
1. It's not clear what "See also NOTES." refers to (does it
refer to anything?). What in the NOTES is relevant here?
2. The paragraph seems to suggest that write(2) of a
non-empty buffer may sometimes return even 0 in case of an
error like the device being filled. I think this is wrong
- if there was an error after already writing some number
of bytes, this non-zero number is returned. But if there's
an error before writing any bytes, -1 will be returned
(and the error reason in errno) - 0 will not be returned
unless the given count is 0 (that case is explained in the
following paragraph).
3. The paragraph doesn't explain what a user should do
after a short write (i.e., write(2) returning less than
count). How would the user know why there was an error, or
if there even was one? I think users should be told what
to do next because this information is part of how to use
this API correctly. I think users should be told to retry
the rest of the write (i.e., write(fd, buf+ret, count-ret)
and this will either succeed in writing some more data if
the error reason was solved, or the second write will
return -1 and the error reason in errno.
Reported-by: Nadav Har'El <nyh@math.technion.ac.il> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk [Sun, 23 Dec 2018 16:36:18 +0000 (17:36 +0100)]
getxattr.2, removexattr.2, setxattr.2: ERRORS: replace ENOATTR with ENODATA
ENOATTR is not a standard error code, but rather one that is
defined in 'libattr' as a synonym for ENODATA. The manual pages
should use the error code actually returned by the kernel APIs.
See also https://bugzilla.kernel.org/show_bug.cgi?id=201995
Reported-by: Enrico Scholz <enrico.scholz@sigma-chemnitz.de> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>