Michael Kerrisk [Fri, 16 Oct 2020 09:24:25 +0000 (11:24 +0200)]
seccomp_user_notif.2: EXAMPLE: Improve allocation of response buffer
From a conversation with Jann Horn:
[[
>>>> struct seccomp_notif_resp *resp = malloc(sizes.seccomp_notif_resp);
>>>
>>> This should probably do something like max(sizes.seccomp_notif_resp,
>>> sizeof(struct seccomp_notif_resp)) in case the program was built
>>> against new UAPI headers that make struct seccomp_notif_resp big, but
>>> is running under an old kernel where that struct is still smaller?
>>
>> I'm confused. Why? I mean, if the running kernel says that it expects
>> a buffer of a certain size, and we allocate a buffer of that size,
>> what's the problem?
>
> Because in userspace, we cast the result of malloc() to a "struct
> seccomp_notif_resp *". If the kernel tells us that it expects a size
> smaller than sizeof(struct seccomp_notif_resp), then we end up with a
> pointer to a struct that consists partly of allocated memory, partly
> of out-of-bounds memory, which is generally a bad idea - I'm not sure
> whether the C standard permits that. And if userspace then e.g.
> decides to access some member of that struct that is beyond what the
> kernel thinks is the struct size, we get actual OOB memory accesses.
Got it. (But gosh, this seems like a fragile API mess.)
I added the following to the code:
/* When allocating the response buffer, we must allow for the fact
that the user-space binary may have been built with user-space
headers where 'struct seccomp_notif_resp' is bigger than the
response buffer expected by the (older) kernel. Therefore, we
allocate a buffer that is the maximum of the two sizes. This
ensures that if the supervisor places bytes into the response
structure that are past the response size that the kernel expects,
then the supervisor is not touching an invalid memory location. */
Michael Kerrisk [Fri, 16 Oct 2020 09:02:08 +0000 (11:02 +0200)]
seccomp_user_notif.2: EXAMPLE: ensure path read() by the supervisor is null-terminated
From a conversation with Jann Horn:
>> We should probably make sure here that the value we read is actually
>> NUL-terminated?
>
> So, I was curious about that point also. But, (why) are we not
> guaranteed that it will be NUL-terminated?
Because it's random memory filled by another process, which we don't
necessarily trust. While seccomp notifiers aren't usable for applying
*extra* security restrictions, the supervisor will still often be more
privileged than the supervised process.
Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk [Fri, 16 Oct 2020 07:29:10 +0000 (09:29 +0200)]
seccomp_user_notif.2: Small wording fix
Change "read(2) will return 0" to "read(2) may return 0".
Quoting Jann Horn:
Maybe make that "may return 0" instead of "will return 0" -
reading from /proc/$pid/mem can only return 0 in the
following cases AFAICS:
1. task->mm was already gone at open() time
2. mm->mm_users has dropped to zero (the mm only has lazytlb
users; page tables and VMAs are being blown away or have
been blown away)
3. the syscall was called with length 0
When a process has gone away, normally mm->mm_users will
drop to zero, but someone else could theoretically still be
holding a reference to the mm (e.g. someone else in the
middle of accessing /proc/$pid/mem). (Such references
should normally not be very long-lived though.)
Additionally, in the unlikely case that the OOM killer just
chomped through the page tables of the target process, I
think the read will return -EIO (same error as if the
address was simply unmapped) if the address is within a
non-shared mapping. (Maybe that's something procfs could do
better...)
Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk [Mon, 28 Sep 2020 20:13:12 +0000 (22:13 +0200)]
seccomp_user_notif.2: Document the seccomp user-space notification mechanism
The APIs used by this mechanism comprise not only seccomp(2), but
also a number of ioctl(2) operations. And any useful example
demonstrating these APIs is will necessarily be rather long.
Trying to cram all of this into the seccomp(2) page would make
that page unmanageably long. Therefore, let's document this
mechanism in a separate page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The existing text says the structures (plural!) contain a 'struct
seccomp_data'. But this is only true for the received notification
structure (seccomp_notif). So, reword the sentence to be more
general, noting simply that the structures may evolve over time.
Add some comments to the structure definition.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
CIRCLEQ_ENTRY.3, CIRCLEQ_HEAD.3, CIRCLEQ_INIT.3, CIRCLEQ_INSERT_AFTER.3, CIRCLEQ_INSERT_BEFORE.3, CIRCLEQ_INSERT_HEAD.3, CIRCLEQ_INSERT_TAIL.3, CIRCLEQ_REMOVE.3: Link to the new circleq(3) page instead of queue(3)
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
SLIST_EMPTY.3, SLIST_ENTRY.3, SLIST_FIRST.3, SLIST_FOREACH.3, SLIST_HEAD.3, SLIST_HEAD_INITIALIZER.3, SLIST_INIT.3, SLIST_INSERT_AFTER.3, SLIST_INSERT_HEAD.3, SLIST_NEXT.3, SLIST_REMOVE.3, SLIST_REMOVE_HEAD.3: Link to the new slist(3) page instead of queue(3)
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
LIST_EMPTY.3, LIST_ENTRY.3, LIST_FIRST.3, LIST_FOREACH.3, LIST_HEAD.3, LIST_HEAD_INITIALIZER.3, LIST_INIT.3, LIST_INSERT_AFTER.3, LIST_INSERT_BEFORE.3, LIST_INSERT_HEAD.3, LIST_NEXT.3, LIST_REMOVE.3: Link to the new list.3 page instead of queue.3
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
list.3: NAME: Add description
list.3: DESCRIPTION: Add short description
list.3: SEE ALSO: Add insque(3) and queue(3)
list.3: BUGS: Note LIST_FOREACH() limitations
list.3: RETURN VALUE: Add details about the return value of those macros that "return" a value
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
list.3: NAME: ffix: Use man markup
list.3: SYNOPSIS: ffix: Use man markup
list.3: DESCRIPTION: ffix: Use man markup
list.3: DESCRIPTION: ffix: Use man markup
list.3: CONFORMING TO: ffix: Use man markup
list.3: EXAMPLES: ffix: Use man markup
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
list.3: SYNOPSIS: Copy include from queue.3
list.3: DESCRIPTION: Copy description about naming of macros from queue.3
list.3: DESCRIPTION: Remove unrelated code to adapt to this page
list.3: DESCRIPTION: Remove lines pointing to the EXAMPLES
list.3: CONFORMING TO: Copy from queue.3
list.3: CONFORMING TO: Adapt to this page
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk [Sun, 18 Oct 2020 13:00:14 +0000 (15:00 +0200)]
system_data_types.7: srcfix: add comment noting time_t difference in POSIX.1-2001
Paul Eggert commented on a patch that proposed to note the
POSIX.2001 details:
No actual POSIXish implementation ever made it a
real-floating type, though, and that point should be made
lest some conscientious programmer worry about a nonexistent
porting issue.
We opted to drop the patch, but in case someone else points out
this POSIX.1-2001 difference in the future, let's leave a comment
in the page source.
Reported-by: Paul Eggert <eggert@cs.ucla.edu> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It stated that 'eflags' may be the OR of one or two of those two flags,
but then a third flag was documented
(which according to the previous wording could not be used?!).
Moreover, the wording also disallowed using 0 (i.e., no flags at all),
which POSIX specifically allows;
I tested the function with no flags and it worked fine for me,
so I guess it was a problem with the documentation,
and not with the implementation itself.
queue.3: Replace incomplete example by a complete example
I added the EXAMPLES section.
The examples in this page are incomplete
(you can't copy&paste&compile&run).
I fixed the one about TAILQ first,
and the rest should follow.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>