The restriction to CAP_SYS_ADMIN was removed from map_files in
2015 [1]. There was a fixme that indicated this might happen, but
the main text was never updated when this commit landed. While
we're at it, add a note about the ptrace access check that is
still required.
Li Xinhai [Fri, 14 Feb 2020 17:03:58 +0000 (17:03 +0000)]
mbind.2: Remove note about MPOL_MF_STRICT being ignored
Current code ignores MPOL_MF_STRICT when handling hugetlb
mappings; the patch [1] now handles MPOL_MF_STRICT with the same
semantics as for other mappings. So we can remove the note that
'MPOL_MF_STRICT is ignored on huge page mappings'; no other part
of the man page changes.
Michael Kerrisk [Sat, 11 Apr 2020 11:32:29 +0000 (13:32 +0200)]
time_namespaces.7: Tweaks for symbolic clock-IDs in /proc/PID/timens_offsets
Andrei Vagin implemented a change I suggested:
clock-IDs are now expressed in symbolic form (e.g.,
"monotonic") instead of numeric form (e.g., 1) when reading
/proc/PID/timens_offsets, and can be expressed either
symbolically or numerically when writing to that file.
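A minimal sketch of a reader that accepts both forms, assuming each
line of the file has the layout "<clock-id> <offset-secs>
<offset-nanosecs>" and that the symbolic names are "monotonic" and
"boottime". The helper name parse_timens_line() is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Fallbacks for strict-ISO builds; these are the Linux clock ID values. */
#ifndef CLOCK_MONOTONIC
#define CLOCK_MONOTONIC 1
#endif
#ifndef CLOCK_BOOTTIME
#define CLOCK_BOOTTIME 7
#endif

/*
 * Parse one "<clock-id> <offset-secs> <offset-nanosecs>" line.
 * The clock ID may be symbolic ("monotonic", "boottime") or numeric.
 * Returns 0 on success, -1 on a malformed line.
 * (parse_timens_line is a hypothetical helper, not a library function.)
 */
int parse_timens_line(const char *line, int *clockid,
                      long long *secs, long *nsecs)
{
    char id[32];

    if (sscanf(line, "%31s %lld %ld", id, secs, nsecs) != 3)
        return -1;

    if (strcmp(id, "monotonic") == 0) {
        *clockid = CLOCK_MONOTONIC;
    } else if (strcmp(id, "boottime") == 0) {
        *clockid = CLOCK_BOOTTIME;
    } else {
        /* Not a known name: accept a plain decimal clock number. */
        char *end;
        long n = strtol(id, &end, 10);

        if (end == id || *end != '\0')
            return -1;
        *clockid = (int) n;
    }
    return 0;
}
```

The parser only normalizes the clock ID; it does not check whether
the kernel would accept the offsets.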
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk [Tue, 7 Apr 2020 13:07:51 +0000 (15:07 +0200)]
time_namespaces.7: Add an ERRORS description for writes to timens_offsets
In particular, note the ERANGE restrictions reported by
Thomas Gleixner.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk [Thu, 9 Apr 2020 09:43:59 +0000 (11:43 +0200)]
strcmp.3: Rework text describing return value to be clearer
Reported-by: Andrew Micallef <andrew.micallef@live.com.au>
Reported-by: Walter Harms <wharms@bfs.de>
Reviewed-by: Andrew Micallef <andrew.micallef@live.com.au>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Mike Frysinger [Fri, 10 Apr 2020 05:33:54 +0000 (07:33 +0200)]
proc.5: Clarify /proc/[pid]/cmdline mutability
The cmdline file is a window into memory that is controlled by the
target process, and that memory may be changed arbitrarily, as can
the window via prctl settings. Make sure people understand that
this file is all an illusion.
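A minimal sketch of the effect, assuming a Linux system with /proc
mounted: the process edits one byte of its own argv[0] and the change
appears in /proc/self/cmdline. The helper names are hypothetical.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to size bytes of /proc/self/cmdline; returns the count or -1. */
static ssize_t read_cmdline(char *buf, size_t size)
{
    int fd = open("/proc/self/cmdline", O_RDONLY);
    if (fd == -1)
        return -1;
    ssize_t n = read(fd, buf, size);
    close(fd);
    return n;
}

/*
 * Temporarily overwrite the first byte of argv[0] (passed in as argv0)
 * and check whether /proc/self/cmdline shows the change.
 * Returns 1 if the file followed the in-memory edit, 0 if it did not,
 * and -1 on error.
 */
int cmdline_tracks_memory(char *argv0)
{
    char before[256], after[256];

    if (argv0 == NULL || argv0[0] == '\0')
        return -1;
    if (read_cmdline(before, sizeof(before)) <= 0)
        return -1;

    char saved = argv0[0];
    argv0[0] = (saved == 'X') ? 'Y' : 'X';      /* edit our own memory */
    ssize_t n = read_cmdline(after, sizeof(after));
    argv0[0] = saved;                           /* restore the original */

    if (n <= 0)
        return -1;
    return before[0] != after[0];
}
```

Calling cmdline_tracks_memory(argv[0]) from main() is the intended
use; any tool that trusts the file is trusting the target process.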
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Zack Weinberg [Mon, 23 Dec 2019 17:31:46 +0000 (12:31 -0500)]
sigaction.2, signal.7: Document kernel bugs in delivery of signals from CPU exceptions
signal.7: Which signal is delivered in response to a CPU exception
is under-documented and does not always make sense. See
<https://bugzilla.kernel.org/show_bug.cgi?id=205831> for an
example where it doesn’t make sense; per the discussion there,
this cannot be changed because of backward compatibility concerns,
so let’s instead document the problem.
sigaction.2: For related reasons, the kernel doesn’t always fill
in all of the fields of the siginfo_t when delivering signals from
CPU exceptions. Document this as well. I imagine this one
_could_ be fixed, but the problem would still be relevant to
anyone using an older kernel.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk [Thu, 9 Apr 2020 19:51:31 +0000 (21:51 +0200)]
fanotify_mark.2: wfix
After a comment from Matthew Bobrowski:
Although, I would just have to point out that it doesn't
necessarily have to be a "script" file, but rather a file of
any type that can have its contents interpreted, which then
results in a form of program execution i.e.
$ /usr/lib64/ld-linux-x86-64.so.2 ./foo
In this case, foo is not a "script" file.
Reported-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Keno Fischer [Mon, 16 Mar 2020 03:21:27 +0000 (23:21 -0400)]
arch_prctl.2: Add ARCH_SET_CPUID subcommand
This subcommand was added a few years ago to support cpuid emulation
on x86 targets, but no changes to the man page appear to have been
made at the time. This commit adds a description for it and the
corresponding getter.
Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The example is misleading. It is not a good idea to unlink an
existing socket because we might try to start the server multiple
times. In this case it is preferable to receive an error.
We could add code that removes the socket when the server process
is killed but that would stretch the example too far.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk [Mon, 6 Apr 2020 11:43:00 +0000 (13:43 +0200)]
connect.2: Update the details on AF_UNSPEC
Update the details on AF_UNSPEC and circumstances in which
socket can be reconnected.
From a mail conversation with Eric Dumazet:
> connect() man page seems obsolete or confusing :
>
> Generally, connection-based protocol sockets may successfully
> connect() only once; connectionless protocol sockets may use
> connect() multiple times to change their association.
> Connectionless sockets may dissolve the association by connecting to
> an address with the sa_family member of sockaddr set to AF_UNSPEC
> (supported on Linux since kernel 2.2).
>
> 1) At least TCP has supported AF_UNSPEC thing forever.
> 2) By definition connectionless sockets do not have an association,
> why would they call connect(AF_UNSPEC) to remove a connection
> which does not exist ...
Calling connect() on a connectionless socket serves two purposes:
a) Assigns a default outgoing address for datagrams (sent using write(2)).
b) Causes datagrams sent from sources other than the peer address to be
discarded.
Both of these things are true in AF_UNIX and the Internet domains.
Using connect(AF_UNSPEC) allows the local datagram socket to clear
this association (without having to connect() to a *different*
peer), so that now it can send datagrams to any peer and receive
datagrams for any peer. (I've just retested all of this.)
>
> Maybe we should rewrite this paragraph to match reality, since
> this causes confusion.
>
>
> Some protocol sockets may successfully connect() only once.
> Some protocol sockets may use connect() multiple times to change
> their association.
> Some protocol sockets may dissolve the association by connecting to
> an address with the sa_family member of sockaddr set to AF_UNSPEC
> (supported on Linux since kernel 2.2).
When I first saw your note, I was afraid that I had written
the offending text. But, I see it has been there since the
manual page was first added in 1992 (other than the piece
"(supported on Linux since kernel 2.2)", which I added in
2007). Perhaps it was true in 1992.
Anyway, I confirm your statement about TCP sockets. The
connect(AF_UNSPEC) thing works; thereafter, the socket may be
connected to another socket.
Interestingly, connect(AF_UNSPEC) does not seem to work for
UNIX domain stream sockets. (My light testing gives an EINVAL
error on connect(AF_UNSPEC) of an already connected UNIX stream
socket. I could not easily spot where this error was being
generated in the kernel though.)
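The datagram case described above can be sketched as follows. This is
an illustration under stated assumptions, not the test program used in
the discussion: the peer address 127.0.0.1:9 is arbitrary, the helper
name is hypothetical, and getpeername() failing with ENOTCONN is used
as the sign that the association is gone.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Connect a UDP socket to 127.0.0.1:9 (an arbitrary address chosen for
 * illustration), then dissolve the association with connect(AF_UNSPEC).
 * Returns 0 if the socket had a peer before and none afterwards,
 * -1 otherwise.
 */
int udp_connect_then_dissolve(void)
{
    int result = -1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(9);
    peer.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    struct sockaddr unspec;
    memset(&unspec, 0, sizeof(unspec));
    unspec.sa_family = AF_UNSPEC;

    struct sockaddr_storage ss;
    socklen_t len = sizeof(ss);

    /* Set the default destination; a datagram socket sends nothing here. */
    if (connect(fd, (struct sockaddr *) &peer, sizeof(peer)) == -1)
        goto out;
    if (getpeername(fd, (struct sockaddr *) &ss, &len) == -1)
        goto out;

    /* Dissolve the association. */
    if (connect(fd, &unspec, sizeof(unspec)) == -1)
        goto out;

    /* With no peer set, getpeername() should fail with ENOTCONN. */
    len = sizeof(ss);
    if (getpeername(fd, (struct sockaddr *) &ss, &len) == -1 &&
        errno == ENOTCONN)
        result = 0;
out:
    close(fd);
    return result;
}
```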
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk [Mon, 6 Apr 2020 05:06:15 +0000 (07:06 +0200)]
lseek.2: ERRORS: ENXIO can also occur for SEEK_DATA in middle of hole at end of file
Quoting Matthew Wilcox:
The current text of the lseek manpage is ambiguous about
the behaviour of lseek(SEEK_DATA) for a file which is
entirely a hole (or the end of the file is a hole and the
pos lies within the hole). The draft POSIX language is
specific (ENXIO is returned when whence is SEEK_DATA and
offset lies within the final hole of the file). Could I
trouble you to wordsmith that in?
If you want to look at the draft POSIX text, it's here:
https://www.austingroupbugs.net/view.php?id=415
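A small probe for this case, offered as a sketch: it creates a
temporary file that is one large hole and calls lseek() with
SEEK_DATA. The helper name and the /tmp path are assumptions. On
filesystems that do not track holes the whole file is reported as
data, so the call may return the offset instead of failing.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for SEEK_DATA in glibc headers */
#endif
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3             /* Linux value, if the headers hid it */
#endif

/*
 * Create a temporary file of the given size that consists of a single
 * hole, then call lseek(fd, offset, SEEK_DATA).
 * Returns 1 if the call failed with ENXIO, 0 if it returned an offset
 * (filesystems without hole support treat the file as all data),
 * and -1 on any other error.
 */
int seek_data_in_hole(off_t size, off_t offset)
{
    char path[] = "/tmp/seek_data_XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);               /* the file vanishes when fd is closed */

    int result = -1;
    if (ftruncate(fd, size) == 0) {
        off_t pos = lseek(fd, offset, SEEK_DATA);
        if (pos == (off_t) -1)
            result = (errno == ENXIO) ? 1 : -1;
        else
            result = 0;
    }
    close(fd);
    return result;
}
```

For example, seek_data_in_hole(1 << 20, 4096) asks about an offset
inside the final (and only) hole of a 1 MiB file.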
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk [Thu, 2 Apr 2020 13:11:32 +0000 (15:11 +0200)]
timerfd_create.2: Note a case where timerfd_settime() can fail with ECANCELED
From email discussions with Thomas Gleixner:
======
Hello Thomas, et al,
Following on from our discussion of read() on a timerfd [1], I
happened to remember a Debian bug report [2] that points out that
timer_settime() can fail with the error ECANCELED, which is both
surprising and odd (because despite the error, the timer does get
updated).
The relevant kernel code (I think, from your commit [3]) seems to be
the following in timerfd_setup():
    if (texp != 0) {
        if (flags & TFD_TIMER_ABSTIME)
            texp = timens_ktime_to_host(clockid, texp);
        if (isalarm(ctx)) {
            if (flags & TFD_TIMER_ABSTIME)
                alarm_start(&ctx->t.alarm, texp);
            else
                alarm_start_relative(&ctx->t.alarm, texp);
        } else {
            hrtimer_start(&ctx->t.tmr, texp, htmode);
        }
        if (timerfd_canceled(ctx))
            return -ECANCELED;
    }
Using a small test program [4] shows the behavior. The program loops,
repeatedly calling timerfd_settime() (with a delay of a few seconds
before each call). In another terminal window, enter the following
command a few times:
$ sudo date -s "5 seconds" # Add 5 secs to wall-clock time
I see behavior as follows (the /sudo date -s "5 seconds"/ command was
executed before loop iterations 0, 2, and 4):
[[
$ ./timerfd_settime_ECANCELED
0
Current time is 1585729978 secs, 868510078 nsecs
Timer value is now 0 secs, 0 nsecs
timerfd_settime() succeeded
Timer value is now 9 secs, 999991977 nsecs
1
Current time is 1585729982 secs, 716339545 nsecs
Timer value is now 6 secs, 152167990 nsecs
timerfd_settime() succeeded
Timer value is now 9 secs, 999992940 nsecs
2
Current time is 1585729991 secs, 567377831 nsecs
Timer value is now 1 secs, 148959376 nsecs
timerfd_settime: Operation canceled
Timer value is now 9 secs, 999976294 nsecs
3
Current time is 1585729995 secs, 405385503 nsecs
Timer value is now 6 secs, 161989917 nsecs
timerfd_settime() succeeded
Timer value is now 9 secs, 999993317 nsecs
4
Current time is 1585730004 secs, 225036165 nsecs
Timer value is now 1 secs, 180346909 nsecs
timerfd_settime: Operation canceled
Timer value is now 9 secs, 999984345 nsecs
]]
I note from the above.
(1) If the wall-clock is changed before the first timerfd_settime()
call, the call succeeds. This is of course expected.
(2) If the wall-clock is changed after a timerfd_settime() call, then
the next timerfd_settime() call fails with ECANCELED.
(3) Even if the timerfd_settime() call fails, the timer is still updated(!).
Some questions:
(a) What is the rationale for timerfd_settime() failing with ECANCELED
in this case? (Currently, the manual page says nothing about this.)
(b) It seems at the least surprising, but more likely a bug, that
timerfd_settime() fails with ECANCELED while at the same time
successfully updating the timer value.
        if (clock_gettime(CLOCK_REALTIME, &start) == -1)
            errExit("clock_gettime");
        printf("Current time is %ld secs, %ld nsecs\n",
                start.tv_sec, start.tv_nsec);
        /* Before resetting the timer, retrieve its current value
           so that after the timerfd_settime() call, we can see
           whether the value has changed */
        if (timerfd_gettime(tfd, &gts) == -1)
            perror("timerfd_gettime");
        printf("Timer value is now %ld secs, %ld nsecs\n",
                gts.it_value.tv_sec, gts.it_value.tv_nsec);
        if (timerfd_gettime(tfd, &gts) == -1)
            perror("timerfd_gettime");
        printf("Timer value is now %ld secs, %ld nsecs\n",
                gts.it_value.tv_sec, gts.it_value.tv_nsec);
        printf("\n");
    }
}
=======
Subject: Re: timer_settime() and ECANCELED
Date: Wed, 01 Apr 2020 19:42:42 +0200
From: Thomas Gleixner <tglx@linutronix.de>
Michael,
"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
> Following on from our discussion of read() on a timerfd [1], I
> happened to remember a Debian bug report [2] that points out that
> timer_settime() can fail with the error ECANCELED, which is both
> surprising and odd (because despite the error, the timer does get
> updated).
...
> (1) If the wall-clock is changed before the first timerfd_settime()
> call, the call succeeds. This is of course expected.
> (2) If the wall-clock is changed after a timerfd_settime() call, then
> the next timerfd_settime() call fails with ECANCELED.
> (3) Even if the timerfd_settime() call fails, the timer is still updated(!).
>
> Some questions:
> (a) What is the rationale for timerfd_settime() failing with ECANCELED
> in this case? (Currently, the manual page says nothing about this.)
> (b) It seems at the least surprising, but more likely a bug, that
> timerfd_settime() fails with ECANCELED while at the same time
> successfully updating the timer value.
Really good question and TBH I can't remember why this is implemented in
the way it is, but I have a faint memory that at least (a) is
intentional.
After staring at the code for a while I came up with the following
answers:
(a): If the clock was set event ("date -s ...") which triggered the
cancel was not yet consumed by user space via read(), then that
information would get lost because arming the timer to the new
value has to reset the state.
(b): Arming the timer in that case is indeed very questionable, but it
could be argued that because the clock was set event happened with
the old expiry value that the new expiry value is not affected.
I'd be happy to change that and not arm the timer in the case of a
pending cancel, but I fear that some user space already depends on
that behaviour.
Thanks,
tglx
======
Subject: Re: timer_settime() and ECANCELED
Date: Thu, 02 Apr 2020 10:49:18 +0200
From: Thomas Gleixner <tglx@linutronix.de>
To: Michael Kerrisk (man-pages) <mtk.manpages@gmail.com>
"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
> On 4/1/20 7:42 PM, Thomas Gleixner wrote:
>> (b): Arming the timer in that case is indeed very questionable, but it
>> could be argued that because the clock was set event happened with
>> the old expiry value that the new expiry value is not affected.
>>
>> I'd be happy to change that and not arm the timer in the case of a
>> pending cancel, but I fear that some user space already depends on
>> that behaviour.
>
> Yes, that's the risk, of course. So, shall we just document all
> this in the manual page?
I think so.
Thanks,
tglx
======
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim
Reviewed-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
CLOCK_BOOTTIME            y  n (EINVAL)  y               y            y
CLOCK_BOOTTIME_ALARM      y  n (EINVAL)  y [1]           y [1]        y [1]
CLOCK_MONOTONIC           y  n (EINVAL)  y               y            y
CLOCK_MONOTONIC_COARSE    y  n (EINVAL)  n (ENOTSUP)     n (ENOTSUP)  n (EINVAL)
CLOCK_MONOTONIC_RAW       y  n (EINVAL)  n (ENOTSUP)     n (ENOTSUP)  n (EINVAL)
CLOCK_REALTIME            y  y           y               y            y
CLOCK_REALTIME_ALARM      y  n (EINVAL)  y [1]           y [1]        y [1]
CLOCK_REALTIME_COARSE     y  n (EINVAL)  n (ENOTSUP)     n (ENOTSUP)  n (EINVAL)
CLOCK_TAI                 y  n (EINVAL)  y               y            n (EINVAL)
CLOCK_PROCESS_CPUTIME_ID  y  n (EINVAL)  y               y            n (EINVAL)
CLOCK_THREAD_CPUTIME_ID   y  n (EINVAL)  n (EINVAL [2])  y            n (EINVAL)
pthread_getcpuclockid()   y  n (EINVAL)  y               y            n (EINVAL)
[1] The caller must have CAP_WAKE_ALARM, or the error EPERM results.
[2] This error is generated in the glibc wrapper.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>