Michael Kerrisk [Thu, 12 Aug 2021 03:41:56 +0000 (05:41 +0200)]
mount_setattr.2: Clarify the description of "detached" mounts
From email:
>> Thanks. I made it "detached". Elsewhere, the page already explains
>> that a detached mount is one that:
>>
>> must have been created by calling open_tree(2) with the
>> OPEN_TREE_CLONE flag and it must not already have been
>> visible in the filesystem.
>>
>> Which seems a fine explanation.
>>
>> ????
>> But, just a thought... "visible in the filesystem" seems not quite accurate.
>> What you really mean I guess is that it must not already have been
>> /visible in the filesystem hierarchy/previously mounted/something else/,
>> right?
I suppose that I should have clarified that my main problem was
that you were using the word "filesystem" in a way that I find
unconventional/ambiguous. I mean, I normally take the term
"filesystem" to be "a storage system for folding files".
Here, you are using "filesystem" to mean something else, what
I might call like "the single directory hierarchy" or "the
filesystem hierarchy" or "the list of mount points".
> A detached mount is created via the OPEN_TREE_CLONE flag. It is a
> separate new mount so "previously mounted" is not applicable.
> A detached mount is _related_ to what the MS_BIND flag gives you with
> mount(2). However, they differ conceptually and technically. A MS_BIND
> mount(2) is always visible in the fileystem when mount(2) returns, i.e.
> it is discoverable by regular path-lookup starting within the
> filesystem.
>
> However, a detached mount can be seen as a split of MS_BIND into two
> distinct steps:
> 1. fd_tree = open_tree(OPEN_TREE_CLONE): create a new mount
> 2. move_mount(fd_tree, <somewhere>): attach the mount to the filesystem
>
> 1. and 2. together give you the equivalent of MS_BIND.
> In between 1. and 2. however the mount is detached. For the kernel
> "detached" means that an anonymous mount namespace is attached to it
> which doen't appear in proc and has a 0 sequence number (Technically,
> there's a bit of semantical argument to be made that "attached" and
> "detached" are ambiguous as they could also be taken to mean "does or
> does not have a parent mount". This ambiguity e.g. appears in
> do_move_mount(). That's why the kernel itself calls it an "anonymous
> mount". However, an OPEN_TREE_CLONE-detached mount of course doesn't
> have a parent mount so it works.).
>
> For userspace it's better to think of detached and attached in terms of
> visibility in the filesystem or in a mount namespace. That's more
> straightfoward, more relevant, and hits the target in 90% of the cases.
>
> However, the better and clearer picture is to say that a
> OPEN_TREE_CLONE-detached mount is a mount that has never been
> move_mount()ed. Which in turn can be defined as the detached mount has
> never been made visible in a mount namespace. Once that has happened the
> mount is irreversibly an attached mount.
>
> I keep thinking that maybe we should just say "anonymous mount"
> everywhere. So changing the wording to:
I'm not against the word "detached". To user space, I think it is a
little more meaningful than "anonymous". For the moment, I'll stay with
"detached", but if you insist on "anonymous", I'll probably change it.
> [...]
> EINVAL The mount that is to be ID mapped is not an anonymous mount;
> that is, the mount has already been visible in a mount namespace.
I like that text *a lot* better! Thanks very much for suggesting
wordings. It makes my life much easier.
I've made the text:
EINVAL The mount that is to be ID mapped is not a detached
mount; that is, the mount has not previously been
visible in a mount namespace.
> [...]
> The mount must be an anonymous mount; that is, it must have been
> created by calling open_tree(2) with the OPEN_TREE_CLONE flag and it
> must not already have been visible in a mount namespace, i.e. it must
> not have been attached to the filesystem hierarchy with syscalls such
> as move_mount() syscall.
And that too! I've made the text:
• The mount must be a detached mount; that is, it must have
been created by calling open_tree(2) with the
OPEN_TREE_CLONE flag and it must not already have been
visible in a mount namespace. (To put things another way:
the mount must not have been attached to the filesystem
hierarchy with a system call such as move_mount(2).)
Reported-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk [Thu, 12 Aug 2021 03:16:42 +0000 (05:16 +0200)]
mount_setattr.2: EXAMPLES: use -1 rather than -EBADF
From email with Christian Braner:
> [1]: In this code "source" is expected to be absolute. If it's not
> absolute we should fail. This can be achieved by passing -1/-EBADF,
> afaict.
D'oh! Okay. I hadn't considered that use case for an invalid dirfd.
(And now I've done some adjustments to openat(2),which contains a
rationale for the *at() functions.)
So, now I understand your purpose, but still the code is obscure,
since
* You use a magic value (-EBADF) rather than (say) -1.
* There's no explanation (comment about) of the fact that you want
to prevent relative pathnames.
So, I've changed the code to use -1, not -EBADF, and I've added some
comments to explain that the intent is to prevent relative pathnames.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Make the description of the EBADF error for invalid 'dirfd' more
uniform. In particular, note that the error only occurs when the
pathname is relative, and that it occurs when the 'dirfd' is
neither valid *nor* has the value AT_FDCWD.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk [Tue, 10 Aug 2021 09:11:40 +0000 (11:11 +0200)]
pthread_setname_np.3: EXAMPLES: remove a bug by simplify the code
From an email conversation with Alexis:
Hello Alexis,
On 8/6/21 7:06 PM, Alexis Wilke wrote:
> Hi guys,
>
> The pthread_setname_np(3) manual page has an example where the second
> argument is used to get a size of the thread name.
>
> https://man7.org/linux/man-pages/man3/pthread_setname_np.3.html#EXAMPLES
>
> The current code:
>
> rc = pthread_getname_np(thread, thread_name,
> (argc > 2) ? atoi(argv[1]) : NAMELEN);
>
> The suggested code:
>
> rc = pthread_getname_np(thread, thread_name,
> (argc > 2) ? atoi(argv[2]) : NAMELEN);
I agree that there's a problem, but I think we could go even simpler:
> I'm thinking that maybe the author meant to compute the length like so:
>
> rc = pthread_getname_np(thread, thread_name,
> (argc > 2) ? strlen(argv[1]) + 1 :
> NAMELEN);
>
> But I think that the atoi() points to using argv[2] as a number
> representing the length.
>
> (Of course, it should be tested against NAMELEN as a maximum, but I
> understand that examples do not always show how to verify each possible
> error).
I imagine that the author's intention was to allow the user to do
experiments where argv[2] specified a number less than NAMELEN,
in order to see the resulting ERANGE error. But, that experiment
is of limited value, and complicates the code unnecessarily, IMO,
so that's why I made the change above.
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
Reported-by: Alexis Wilke <alexis@m2osw.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk [Tue, 10 Aug 2021 08:41:05 +0000 (10:41 +0200)]
mount_setattr.2: Rework the discussion of MOUNT_ATTR__ATIME
Phrases such as "In the new mount API" will date fast. Remove it.
Also:
* Make it clear that MOUNT_ATTR__ATIME expresses a bit field.
* Replace 'enum' with 'enumeration'.
* Clarify what is meant by "partially" set MOUNT_ATTR__ATIME.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk [Tue, 10 Aug 2021 07:23:08 +0000 (09:23 +0200)]
mount_setattr.2: Remove description of propagation types
These types are already well described in mount_namespaces(7);
indeed, much of the text from that page seems to have just been
cut and pasted into this page! Simply referring the reader to
mount_namespaces(7) is sufficient.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk [Tue, 10 Aug 2021 07:13:56 +0000 (09:13 +0200)]
mount_setattr.2: Reword the description of the 'propagation field'
Point out that this field can have the value zero, meaning
no change. And avoid discussions of 'enum', and simply say
that otherwise the field has one of the MS_* values.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
seccomp.2: Clarify that bad system calls kill the thread
Reported-by: Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk [Tue, 10 Aug 2021 00:08:49 +0000 (02:08 +0200)]
mount_setattr.2: Move the discussion of ID-mapped mounts to NOTES
Having this discussion under DESCRIPTION clutters that section,
and has the effect of burying the discussion of propagation. Move
the discussion to NOTES, to make the page more readable.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk [Mon, 9 Aug 2021 20:56:47 +0000 (22:56 +0200)]
mount_setattr.2: Minor clean-ups in example program
- Change some instances of "-" to "\"
- Use C99 style (declare variables nearer use in code)
- Add a bit of white space
- Remove one 'const...const' added by Alex that caused
compiler warnings
- Use "reverse Christmas tree" form for declarations in main()
- Other minor changes
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
mount_setattr.2: Minor tweaks to Christian's patch
- Fix SYNOPSIS to fit in 78 columns
Also, we don't show when an include is included for a specific type,
unless that header is included _only_ for the type,
or there might be confusion (e.g., termios).
Instead, that type should be documented in system_data_types(7),
with a link page mount_attr-struct(3).
- Fix references to mount_setattr(). See man-pages(7):
Any reference to the subject of the current manual page should be writ‐
ten with the name in bold followed by a pair of parentheses in Roman
(normal) font. For example, in the fcntl(2) man page, references to
the subject of the page would be written as: fcntl(). The preferred
way to write this in the source file is:
.BR fcntl ()
- Fix line breaks according to semantic newline rules (and add some commas)
- Fix wrong usage of .IR when .RI should have been used
- Fix formatting of variable part in FOO<number>:
- Make italic the variable part (as groff_man(7) recommends)
- Remove <>
- Use syntax recommended by G. Branden Robinson (groff)
- Fix unnecessary uses of .BR or .IR when .B or .I would suffice
- Fix formatting of punctuation
In some cases, it was in italics or bold, and it should always be in roman.
- Use uppercase to begin text, even in bullet points, since those were
multi-sentence.
- Simplify usage of .RS/.RE in combination with .IP
- s/fat/FAT/ as fs(7) does
- Slightly reword some sentences for consistency
- Use Linux-specific for consistency with other pages (in VERSIONS)
- EXAMPLES: Place the return type in a line of its own (as in other pages)
- Fix alignment of code
- Replace unnecessary use of the GNU extension ({}) by do {} while (0)
In that case, there was no return value (moreover, it's a noreturn).
- Break complex declaration lines into a line for each variable
The variables were being initialized, some to non-zero values,
so for clarity, a line for each one seems more appropriate.
- Add const to pointers when possible
- s/\\/\e/
- Remove unmatched groff commands
Cc: Christian Brauner <brauner@kernel.org> Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Kurt Kanzenbach [Sun, 8 Aug 2021 08:41:16 +0000 (10:41 +0200)]
futex.2: Document FUTEX_LOCK_PI2
FUTEX_LOCK_PI2 is a new futex operation which was recently introduced into the
Linux kernel. It works exactly like FUTEX_LOCK_PI. However, it has support for
selectable clocks for timeouts. By default CLOCK_MONOTONIC is used. If
FUTEX_CLOCK_REALTIME is specified then the timeout is measured against
CLOCK_REALTIME.
This new operation addresses an inconsistency in the futex interface:
FUTEX_LOCK_PI only works with timeouts based on CLOCK_REALTIME in contrast to
all the other PI operations.
Document the FUTEX_LOCK_PI2 command.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
- Move example program to a new EXAMPLES section
- Invert logic in the handler to have the failure in the
conditional path, and the success out of any conditionals.
- Use NULL, EXIT_SUCCESS, and EXIT_FAILURE instead of magic numbers
- Separate declarations from code
- Put function return type on its own line
- Put function opening brace on its line
Cc: Peter Collingbourne <pcc@google.com> Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
[
I was reading the man page for ldd(1)[1]; and I read this in the first
paragraph of the DECRIPTION section:
ldd prints the shared objects (shared libraries) required by each
program or shared object specified on the command line. An
example of its use and output (using sed(1) to trim leading white
space for readability in this page) is the following:
This is a little confusing though since that sed(1) command does not
seem to work. (and also potentially misleading for someone who is trying
figure out how to parse ldd(1)'s output.)
ldd(1) prepends a TAB character (0x09) to each line, not spaces:
I read ldd(1)'s source code[2] (it is part of glibc) and it seems to be
a bash script that tries to use different rtld programs ( ld.so(8) )
from an RTLDLIST.
I have tried to follow the git history of glibc to see when the switch
from spaces to the TAB character occured, but, to me, it seems like
glibc.git/elf/rtld.c has always used '\t'; at since 6a76c115150318eae5d02eca76f2fc03be7bd029[3] (358th commit since glibc
started using the git repository repository - Nov 18th 1995): before
that commit there are not any results for `git grep '\\t'` in the elf
directory and I did not investigate further.
Still, at the time of that commit, glibc did not seem to have an ldd(1)
utility.
Perhaps the man page is old and its original author was using and
documenting an ldd(1) utility that was not part of glibc when he was
writing it.
Anyhow, since I think that sed(1) command will not work on any system
that uses, at least, the most recent version of glibc (because lld(1)
and the ld.so(8) programs it depends on are all part of glibc), I think
that that example should be changed to avoid confusions.
The output format of ldd(1) does not seem to be clearly defined, so I
think this would be a good option:
$ ldd /bin/ls | sed 's/^[[:space:]]*/ /'
NB: ^\s* should also work on most GNU/Linux systems, but \s is
non-standard or documented so I don not suggest using it in the man
page.
Another option could be to remove "the pipe to sed(1)" part and the note
in parentheses that explains why it was used by the original author.
$ uname -a
Linux t420 5.10.54-1-lts #1 SMP Wed, 28 Jul 2021 15:05:20 +0000
x86_64 GNU/Linux
$ pacman -Qo ldd
/usr/bin/ldd is owned by glibc 2.33-5
$ pacman -Qo /usr/share/man/man1/ldd.1.gz
/usr/share/man/man1/ldd.1.gz is owned by man-pages 5.12-2
$ pacman -Qo /usr/lib/ld-linux.so.2
/usr/lib/ld-linux.so.2 is owned by lib32-glibc 2.33-5
$ pacman -Qo /usr/lib64/ld-linux-x86-64.so.2
/usr/lib/ld-linux-x86-64.so.2 is owned by glibc 2.33-5
$ pacman -F /usr/libx32/ld-linux-x32.so.2 || echo not available on arch linux.
not available on arch linux.
]
Cc: James O. D. Hunt <jamesodhunt@gmail.com> Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Explain that `optstring` cannot contain a semi-colon (`;`)
character.
[mtk: verified with a small test program; see also posix/getopt.c
in the glibc sources:
if (temp == NULL || c == ':' || c == ';')
{
if (print_errors)
fprintf (stderr, _("%s: invalid option -- '%c'\n"), argv[0], c);
d->optopt = c;
return '?';
}
]
Also explain that `optstring` can include `+` as an option
character, possibly in addition to that character being used as
the first character in `optstring` to denote `POSIXLY_CORRECT`
behaviour.
[mtk: verified with a small test program.]
switch (opt) {
case 'p': pstr = optarg; break;
case 'x': xfnd++; break;
case ';': printf("Got semicolon\n"); break;
case '+': printf("Got plus\n"); break;
case ':': usageError(argv[0], "Missing argument", optopt);
case '?': usageError(argv[0], "Unrecognized option", optopt);
default:
printf("Unexpected case in switch()\n");
exit(EXIT_FAILURE);
}
}
if (xfnd != 0)
printf("-x was specified (count=%d)\n", xfnd);
if (pstr != NULL)
printf("-p was specified with the value \"%s\"\n", pstr);
if (optind < argc)
printf("First nonoption argument is \"%s\" at argv[%d]\n",
argv[optind], optind);
exit(EXIT_SUCCESS);
}
Michael Kerrisk [Sun, 8 Aug 2021 09:24:16 +0000 (11:24 +0200)]
readv.2, pipe.7: Make text on pipe writes more general to avoid a confusion in writev(2)
After a patch proposal from наб triggered by concerns that, when
talking about PIPE_BUF, pipe(7) explicitly mentions write(2) but
not writev(2), I've concluded that the reference in writev(2) to
pipe(7) is not needed (mea culpa; I added that text), and I think
the text in pipe(7) could be written to be closer to the POSIX
spec, which doesn't talk about "write() calls", but simply about
"writes".
Reported-by: наб <nabijaczleweli@nabijaczleweli.xyz> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk [Sun, 8 Aug 2021 02:34:37 +0000 (04:34 +0200)]
readv.2: Minor fixes (part 2) to Will Manley's patch
Mainly: I generally don't want us to be including URLs to mailing
list discussions in a manual page. Either, the issue in the
discussion is worth writing up in the manual page (so that
the reader doesn't have to look elsewhere), or the details
are less important, in which case it is sufficient to note the
existence of the bug. I think this is an example of the latter.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Will Manley [Wed, 28 Jul 2021 20:19:37 +0000 (22:19 +0200)]
readv2: Note preadv2(..., RWF_NOWAIT) bug in BUGS section
To save the next person before they fall foul of it. See
<https://lore.kernel.org/linux-fsdevel/fea8b16d-5a69-40f9-b123-e84dcd6e8f2e@www.fastmail.com/T/#u>
and <https://github.com/tokio-rs/tokio/issues/3803> for more information.
Signed-off-by: Will Manley <will@williammanley.net> Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This flag was recently added to Linux 5.14 by a patch I wrote:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0ae71c7720e3ae3aabd2e8a072d27f7bd173d25c
This patch adds documentation for the flag, the error code that the flag
added and explains in the caveat when it is useful.
Dan Robertson [Wed, 28 Jul 2021 20:19:42 +0000 (22:19 +0200)]
man2/fallocate.2: tfix documentation of shared blocks
Fix a typo in the documentation of using fallocate to allocate shared
blocks. The flag FALLOC_FL_UNSHARE should instead be documented as
FALLOC_FL_UNSHARE_RANGE.
Fixes: 63a599c657d8 ("man2/fallocate.2: Document behavior with shared blocks") Signed-off-by: Dan Robertson <dan@dlrobertson.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>