Ammar Faizi [Tue, 29 Mar 2022 10:17:35 +0000 (17:17 +0700)]
tools/nolibc/stdlib: Implement `malloc()`, `calloc()`, `realloc()` and `free()`
Implement basic dynamic allocator functions. These functions are
currently only available on architectures that have nolibc mmap()
syscall implemented. These are not a super-fast memory allocator,
but at least they can satisfy basic needs for having heap without
libc.
Cc: David Laight <David.Laight@ACULAB.COM> Acked-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Ammar Faizi [Tue, 29 Mar 2022 10:17:33 +0000 (17:17 +0700)]
tools/nolibc/sys: Implement `mmap()` and `munmap()`
Implement mmap() and munmap(). Currently, they are only available for
architecures that have my_syscall6 macro. For architectures that don't
have, this function will return -1 with errno set to ENOSYS (Function
not implemented).
This has been tested on x86 and i386.
Notes for i386:
1) The common mmap() syscall implementation uses __NR_mmap2 instead
of __NR_mmap.
2) The offset must be shifted-right by 12-bit.
Acked-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Ammar Faizi [Tue, 29 Mar 2022 10:17:32 +0000 (17:17 +0700)]
tools/nolibc: i386: Implement syscall with 6 arguments
On i386, the 6th argument of syscall goes in %ebp. However, both Clang
and GCC cannot use %ebp in the clobber list and in the "r" constraint
without using -fomit-frame-pointer. To make it always available for
any kind of compilation, the below workaround is implemented.
1) Push the 6-th argument.
2) Push %ebp.
3) Load the 6-th argument from 4(%esp) to %ebp.
4) Do the syscall (int $0x80).
5) Pop %ebp (restore the old value of %ebp).
6) Add %esp by 4 (undo the stack pointer).
Cc: x86@kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/lkml/2e335ac54db44f1d8496583d97f9dab0@AcuMS.aculab.com Suggested-by: David Laight <David.Laight@ACULAB.COM> Acked-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Ammar Faizi [Tue, 29 Mar 2022 10:17:31 +0000 (17:17 +0700)]
tools/nolibc: Remove .global _start from the entry point code
Building with clang yields the following error:
```
<inline asm>:3:1: error: _start changed binding to STB_GLOBAL
.global _start
^
1 error generated.
```
Make sure only specify one between `.global _start` and `.weak _start`.
Remove `.global _start`.
Cc: llvm@lists.linux.dev Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Ammar Faizi [Tue, 29 Mar 2022 10:17:30 +0000 (17:17 +0700)]
tools/nolibc: Replace `asm` with `__asm__`
Replace `asm` with `__asm__` to support compilation with -std flag.
Using `asm` with -std flag makes GCC think `asm()` is a function call
instead of an inline assembly.
GCC doc says:
For the C language, the `asm` keyword is a GNU extension. When
writing C code that can be compiled with `-ansi` and the `-std`
options that select C dialects without GNU extensions, use
`__asm__` instead of `asm`.
Link: https://gcc.gnu.org/onlinedocs/gcc/Basic-Asm.html Reported-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org> Acked-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Wed, 23 Mar 2022 07:18:07 +0000 (08:18 +0100)]
tools/nolibc/stdlib: only reference the external environ when inlined
When building with gcc at -O0 we're seeing link errors due to the
"environ" variable being referenced by getenv(). The problem is that
at -O0 gcc will not inline getenv() and will not drop the external
reference. One solution would be to locally declare the variable as
weak, but then it would appear in all programs even those not using
it, and would be confusing to users of getenv() who would forget to
set environ to envp.
An alternate approach used in this patch consists in always inlining
the outer part of getenv() that references this extern so that it's
always dropped when not used. The biggest part of the function was
now moved to a new function called _getenv() that's still not inlined
by default.
Reported-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Tested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Wed, 23 Mar 2022 07:18:06 +0000 (08:18 +0100)]
tools/nolibc/string: do not use __builtin_strlen() at -O0
clang wants to use strlen() for __builtin_strlen() at -O0. We don't
really care about -O0 but it at least ought to build, so let's make
sure we don't choke on this, by dropping the optimizationn for
constant strings in this case.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 21 Mar 2022 17:33:14 +0000 (18:33 +0100)]
tools/nolibc: add the nolibc subdir to the common Makefile
The Makefile in tools/ is used to forward options to the makefiles
in the various subdirs. Let's add nolibc there so that it becomes
possible to make tools/nolibc_headers_standalone from the main tree
to simply create a completely usable sysroot.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 21 Mar 2022 17:33:13 +0000 (18:33 +0100)]
tools/nolibc: add a makefile to install headers
This provides a target "headers_standalone" which installs the nolibc's
arch-specific headers with "arch.h" taken from the current arch (or a
concatenation of both i386 and x86_64 for arch=x86), then installs
kernel headers. This creates a convenient sysroot which is directly
usable by a bare-metal compiler to create any executable.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 21 Mar 2022 17:33:12 +0000 (18:33 +0100)]
tools/nolibc/types: add poll() and waitpid() flag definitions
- POLLIN etc were missing, so poll() could only be used with timeouts.
- WNOHANG was not defined and is convenient to check if a child is still
running
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 21 Mar 2022 17:33:10 +0000 (18:33 +0100)]
tools/nolibc/string: add strcmp() and strncmp()
We need these functions all the time, including when checking environment
variables and parsing command-line arguments. These implementations were
optimized to show optimal code size on a wide range of compilers (22 bytes
return included for strcmp(), 33 for strncmp()).
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 21 Mar 2022 17:33:07 +0000 (18:33 +0100)]
tools/nolibc/stdio: make printf(%s) accept NULL
It's often convenient to support this, especially in test programs where
a NULL may correspond to an allocation error or a non-existing value.
Let's make printf("%s") support being passed a NULL. In this case it
prints "(null)" like glibc's printf().
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:54 +0000 (17:23 +0100)]
tools/nolibc/stdlib: implement abort()
libgcc uses it for certain divide functions, so it must be exported. Like
for memset() we do that in its own section so that the linker can strip
it when not needed.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:53 +0000 (17:23 +0100)]
tools/nolibc: also mention how to build by just setting the include path
Now that a few basic include files are provided, some simple portable
programs may build, which will save them from having to surround their
includes with #ifndef NOLIBC. This patch mentions how to proceed, and
enumerates the list of files that are covered.
A comprehensive list of required include files is available here:
https://en.cppreference.com/w/c/header
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:52 +0000 (17:23 +0100)]
tools/nolibc/time: create time.h with time()
The time() syscall is used by a few simple applications, and is trivial
to implement based on gettimeofday() that we already have. Let's create
the file to ease porting and provide the function. It never returns any
error, though it may segfault in case of invalid pointer, like other
implementations relying on gettimeofday().
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:47 +0000 (17:23 +0100)]
tools/nolibc/string: export memset() and memmove()
"clang -Os" and "gcc -Ofast" without -ffreestanding may ignore memset()
and memmove(), hoping to provide their builtin equivalents, and finally
not find them. Thus we must export these functions for these rare cases.
Note that as they're set in their own sections, they will be eliminated
by the linker if not used. In addition, they do not prevent gcc from
identifying them and replacing them with the shorter "rep movsb" or
"rep stosb" when relevant.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:46 +0000 (17:23 +0100)]
tools/nolibc/types: define PATH_MAX and MAXPATHLEN
These ones are often used and commonly set by applications to fallback
values. Let's fix them both to agree on PATH_MAX=4096 by default, as is
already present in linux/limits.h.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:44 +0000 (17:23 +0100)]
tools/nolibc: move exported functions to their own section
Some functions like raise() and memcpy() are permanently exported because
they're needed by libgcc on certain platforms. However most of the time
they are not needed and needlessly take space.
Let's move them to their own sub-section, called .text.nolibc_<function>.
This allows ld to get rid of them if unused when passed --gc-sections.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:43 +0000 (17:23 +0100)]
tools/nolibc/string: add tiny versions of strncat() and strlcat()
While these functions are often dangerous, forcing the user to work
around their absence is often much worse. Let's provide small versions
of each of them. The respective sizes in bytes on a few architectures
are:
The two are quite different, and strncat() is even different from
strncpy() in that it limits the amount of data it copies and will always
terminate the output by one zero, while strlcat() will always limit the
total output to the specified size and will put a zero if possible.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:42 +0000 (17:23 +0100)]
tools/nolibc/string: add strncpy() and strlcpy()
These are minimal variants. strncpy() always fills the destination for
<size> chars, while strlcpy() copies no more than <size> including the
zero and returns the source's length. The respective sizes on various
archs are:
Willy Tarreau [Mon, 7 Feb 2022 16:23:41 +0000 (17:23 +0100)]
tools/nolibc/string: slightly simplify memmove()
The direction test inside the loop was not always completely optimized,
resulting in a larger than necessary function. This change adds a
direction variable that is set out of the loop. Now the function is down
to 48 bytes on x86, 32 on ARM and 68 on mips. It's worth noting that other
approaches were attempted (including relying on the up and down functions)
but they were only slightly beneficial on x86 and cost more on others.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:40 +0000 (17:23 +0100)]
tools/nolibc/string: use unidirectional variants for memcpy()
Till now memcpy() relies on memmove(), but it's always included for libgcc,
so we have a larger than needed function. Let's implement two unidirectional
variants to copy from bottom to top and from top to bottom, and use the
former for memcpy(). The variants are optimized to be compact, and at the
same time the compiler is sometimes able to detect the loop and to replace
it with a "rep movsb". The new function is 24 bytes instead of 52 on x86_64.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:37 +0000 (17:23 +0100)]
tools/nolibc/stdlib: avoid a 64-bit shift in u64toh_r()
The build of printf() on mips requires libgcc for functions __ashldi3 and
__lshrdi3 due to 64-bit shifts when scanning the input number. These are
not really needed in fact since we scan the number 4 bits at a time. Let's
arrange the loop to perform two 32-bit shifts instead on 32-bit platforms.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:36 +0000 (17:23 +0100)]
tools/nolibc/sys: make open() take a vararg on the 3rd argument
Let's pass a vararg to open() so that it remains compatible with existing
code. The arg is only dereferenced when flags contain O_CREAT. The function
is generally not inlined anymore, causing an extra call (total 16 extra
bytes) but it's still optimized for constant propagation, limiting the
excess to no more than 16 bytes in practice when open() is called without
O_CREAT, and ~40 with O_CREAT, which remains reasonable.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:35 +0000 (17:23 +0100)]
tools/nolibc/stdio: add perror() to report the errno value
It doesn't contain the text for the error codes, but instead displays
"errno=" followed by the errno value. Just like the regular errno, if
a non-empty message is passed, it's placed followed with ": " on the
output before the errno code. The message is emitted on stderr.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:31 +0000 (17:23 +0100)]
tools/nolibc/stdio: add stdin/stdout/stderr and fget*/fput* functions
The standard puts() function always emits the trailing LF which makes it
unconvenient for small string concatenation. fputs() ought to be used
instead but it requires a FILE*.
This adds 3 dummy FILE* values (stdin, stdout, stderr) which are in fact
pointers to struct FILE of one byte. We reserve 3 pointer values for them,
-3, -2 and -1, so that they are ordered, easing the tests and mapping to
integer.
>From this, fgetc(), fputc(), fgets() and fputs() were implemented, and
the previous putchar() and getchar() now remap to these. The standard
getc() and putc() macros were also implemented as pointing to these
ones.
There is absolutely no buffering, fgetc() and fgets() read one byte at
a time, fputc() writes one byte at a time, and only fputs() which knows
the string's length writes all of it at once.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:28 +0000 (17:23 +0100)]
tools/nolibc/stdlib: add i64toa() and u64toa()
These are 64-bit variants of the itoa() and utoa() functions. They also
support reentrant ones, and use the same itoa_buffer. The functions are
a bit larger than the previous ones in 32-bit mode (86 and 98 bytes on
x86_64 and armv7 respectively), which is why we continue to provide them
as separate functions.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:27 +0000 (17:23 +0100)]
tools/nolibc/stdlib: replace the ltoa() function with more efficient ones
The original ltoa() function and the reentrant one ltoa_r() present a
number of drawbacks. The divide by 10 generates calls to external code
from libgcc_s, and the number does not necessarily start at the beginning
of the buffer.
Let's rewrite these functions so that they do not involve a divide and
only use loops on powers of 10, and implement both signed and unsigned
variants, always starting from the buffer's first character. Instead of
using a static buffer for each function, we're now using a common one.
In order to avoid confusion with the ltoa() name, the new functions are
called itoa_r() and utoa_r() to distinguish the signed and unsigned
versions, and for convenience for their callers, these functions now
reutrn the number of characters emitted. The ltoa_r() function is just
an inline mapping to the signed one and which returns the buffer.
The functions are quite small (86 bytes on x86_64, 68 on armv7) and
do not depend anymore on external code.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:26 +0000 (17:23 +0100)]
tools/nolibc/stdlib: move ltoa() to stdlib.h
This function is not standard and performs the opposite of atol(). Let's
move it with atol(). It's been split between a reentrant function and one
using a static buffer.
There's no more definition in nolibc.h anymore now.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:25 +0000 (17:23 +0100)]
tools/nolibc/types: move makedev to types.h and make it a macro
The makedev() man page says it's supposed to be a macro and that some
OSes have it with the other ones in sys/types.h so it now makes sense
to move it to types.h as a macro. Let's also define major() and
minor() that perform the reverse operation.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Sun, 13 Feb 2022 08:53:01 +0000 (09:53 +0100)]
tools/nolibc/types: make FD_SETSIZE configurable
The macro was hard-coded to 256 but it's common to see it redefined.
Let's support this and make sure we always allocate enough entries for
the cases where it wouldn't be multiple of 32.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Sun, 13 Feb 2022 08:52:10 +0000 (09:52 +0100)]
tools/nolibc/types: move the FD_* functions to macros in types.h
FD_SET, FD_CLR, FD_ISSET, FD_ZERO are often expected to be macros and
not functions. In addition we already have a file dedicated to such
macros and types used by syscalls, it's types.h, so let's move them
there and turn them to macros. FD_CLR() and FD_ISSET() were missing,
so they were added. FD_ZERO() now deals with its own loop so that it
doesn't rely on memset() that sets one byte at a time.
Cc: David Laight <David.Laight@aculab.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:20 +0000 (17:23 +0100)]
tools/nolibc/string: split the string functions into string.h
The string manipulation functions (mem*, str*) are now found in
string.h. The file depends on almost nothing and will be
usable from other includes if needed. Maybe more functions could
be added.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:18 +0000 (17:23 +0100)]
tools/nolibc/sys: split the syscall definitions into their own file
The syscall definitions were moved to sys.h. They were arranged
in a more easily maintainable order, whereby the sys_xxx() and xxx()
functions were grouped together, which also enlights the occasional
mappings such as wait relying on wait4().
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:17 +0000 (17:23 +0100)]
tools/nolibc/arch: split arch-specific code into individual files
In order to ease maintenance, this splits the arch-specific code into
one file per architecture. A common file "arch.h" is used to include the
right file among arch-* based on the detected architecture. Projects
which are already split per architecture could simply rename these
files to $arch/arch.h and get rid of the common arch.h. For this
reason, include guards were placed into each arch-specific file.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:16 +0000 (17:23 +0100)]
tools/nolibc/types: split syscall-specific definitions into their own files
The macros and type definitions used by a number of syscalls were moved
to types.h where they will be easier to maintain. A few of them
are arch-specific and must not be moved there (e.g. O_*, sys_stat_struct).
A warning about them was placed at the top of the file.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Willy Tarreau [Mon, 7 Feb 2022 16:23:15 +0000 (17:23 +0100)]
tools/nolibc/std: move the standard type definitions to std.h
The ordering of includes and definitions for now is a bit of a mess, as
for example asm/signal.h is included after int definitions, but plenty of
structures are defined later as they rely on other includes.
Let's move the standard type definitions to a dedicated file that is
included first. We also move NULL there. This way all other includes
are aware of it, and we can bring asm/signal.h back to the top of the
file.
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Merge tag 'trace-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull more tracing updates from Steven Rostedt:
- Rename the staging files to give them some meaning. Just
stage1,stag2,etc, does not show what they are for
- Check for NULL from allocation in bootconfig
- Hold event mutex for dyn_event call in user events
- Mark user events to broken (to work on the API)
- Remove eBPF updates from user events
- Remove user events from uapi header to keep it from being installed.
- Move ftrace_graph_is_dead() into inline as it is called from hot
paths and also convert it into a static branch.
* tag 'trace-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Move user_events.h temporarily out of include/uapi
ftrace: Make ftrace_graph_is_dead() a static branch
tracing: Set user_events to BROKEN
tracing/user_events: Remove eBPF interfaces
tracing/user_events: Hold event_mutex during dyn_event_add
proc: bootconfig: Add null pointer check
tracing: Rename the staging files for trace_events
Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fix from Stephen Boyd:
"A single revert to fix a boot regression seen when clk_put() started
dropping rate range requests. It's best to keep various systems
booting so we'll kick this out and try again next time"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
Revert "clk: Drop the rate range on clk_put()"
Merge tag 'x86-urgent-2022-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of x86 fixes and updates:
- Make the prctl() for enabling dynamic XSTATE components correct so
it adds the newly requested feature to the permission bitmap
instead of overwriting it. Add a selftest which validates that.
- Unroll string MMIO for encrypted SEV guests as the hypervisor
cannot emulate it.
- Handle supervisor states correctly in the FPU/XSTATE code so it
takes the feature set of the fpstate buffer into account. The
feature sets can differ between host and guest buffers. Guest
buffers do not contain supervisor states. So far this was not an
issue, but with enabling PASID it needs to be handled in the buffer
offset calculation and in the permission bitmaps.
- Avoid a gazillion of repeated CPUID invocations in by caching the
values early in the FPU/XSTATE code.
- Enable CONFIG_WERROR in x86 defconfig.
- Make the X86 defconfigs more useful by adapting them to Y2022
reality"
* tag 'x86-urgent-2022-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu/xstate: Consolidate size calculations
x86/fpu/xstate: Handle supervisor states in XSTATE permissions
x86/fpu/xsave: Handle compacted offsets correctly with supervisor states
x86/fpu: Cache xfeature flags from CPUID
x86/fpu/xsave: Initialize offset/size cache early
x86/fpu: Remove unused supervisor only offsets
x86/fpu: Remove redundant XCOMP_BV initialization
x86/sev: Unroll string mmio with CC_ATTR_GUEST_UNROLL_STRING_IO
x86/config: Make the x86 defconfigs a bit more usable
x86/defconfig: Enable WERROR
selftests/x86/amx: Update the ARCH_REQ_XCOMP_PERM test
x86/fpu/xstate: Fix the ARCH_REQ_XCOMP_PERM implementation
Merge tag 'core-urgent-2022-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RT signal fix from Thomas Gleixner:
"Revert the RT related signal changes. They need to be reworked and
generalized"
* tag 'core-urgent-2022-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "signal, x86: Delay calling signals in atomic on RT enabled kernels"
Merge tag 'dma-mapping-5.18-1' of git://git.infradead.org/users/hch/dma-mapping
Pull more dma-mapping updates from Christoph Hellwig:
- fix a regression in dma remap handling vs AMD memory encryption (me)
- finally kill off the legacy PCI DMA API (Christophe JAILLET)
* tag 'dma-mapping-5.18-1' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: move pgprot_decrypted out of dma_pgprot
PCI/doc: cleanup references to the legacy PCI DMA API
PCI: Remove the deprecated "pci-dma-compat.h" API
Stephen Boyd [Sun, 3 Apr 2022 02:28:18 +0000 (19:28 -0700)]
Revert "clk: Drop the rate range on clk_put()"
This reverts commit 7dabfa2bc4803eed83d6f22bd6f045495f40636b. There are
multiple reports that this breaks boot on various systems. The common
theme is that orphan clks are having rates set on them when that isn't
expected. Let's revert it out for now so that -rc1 boots.
Merge tag 'perf-tools-for-v5.18-2022-04-02' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull more perf tools updates from Arnaldo Carvalho de Melo:
- Avoid SEGV if core.cpus isn't set in 'perf stat'.
- Stop depending on .git files for building PERF-VERSION-FILE, used in
'perf --version', fixing some perf tools build scenarios.
- Convert tracepoint.py example to python3.
- Update UAPI header copies from the kernel sources: socket,
mman-common, msr-index, KVM, i915 and cpufeatures.
- Update copy of libbpf's hashmap.c.
- Directly return instead of using local ret variable in
evlist__create_syswide_maps(), found by coccinelle.
* tag 'perf-tools-for-v5.18-2022-04-02' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf python: Convert tracepoint.py example to python3
perf evlist: Directly return instead of using local ret variable
perf cpumap: More cpu map reuse by merge.
perf cpumap: Add is_subset function
perf evlist: Rename cpus to user_requested_cpus
perf tools: Stop depending on .git files for building PERF-VERSION-FILE
tools headers cpufeatures: Sync with the kernel sources
tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
tools headers UAPI: Sync linux/kvm.h with the kernel sources
tools kvm headers arm64: Update KVM headers from the kernel sources
tools arch x86: Sync the msr-index.h copy with the kernel sources
tools headers UAPI: Sync asm-generic/mman-common.h with the kernel
perf beauty: Update copy of linux/socket.h with the kernel sources
perf tools: Update copy of libbpf's hashmap.c
perf stat: Avoid SEGV if core.cpus isn't set
Merge tag 'kbuild-fixes-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix empty $(PYTHON) expansion.
- Fix UML, which got broken by the attempt to suppress Clang warnings.
- Fix warning message in modpost.
* tag 'kbuild-fixes-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
modpost: restore the warning message for missing symbol versions
Revert "um: clang: Strip out -mno-global-merge from USER_CFLAGS"
kbuild: Remove '-mno-global-merge'
kbuild: fix empty ${PYTHON} in scripts/link-vmlinux.sh
kconfig: remove stale comment about removed kconfig_print_symbol()
Merge tag 'mips_5.18_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- build fix for gpio
- fix crc32 build problems
- check for failed memory allocations
* tag 'mips_5.18_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: crypto: Fix CRC32 code
MIPS: rb532: move GPIOD definition into C-files
MIPS: lantiq: check the return value of kzalloc()
mips: sgi-ip22: add a check for the return of kzalloc()
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
- Only do MSR filtering for MSRs accessed by rdmsr/wrmsr
- Documentation improvements
- Prevent module exit until all VMs are freed
- PMU Virtualization fixes
- Fix for kvm_irq_delivery_to_apic_fast() NULL-pointer dereferences
- Other miscellaneous bugfixes
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
KVM: x86: fix sending PV IPI
KVM: x86/mmu: do compare-and-exchange of gPTE via the user address
KVM: x86: Remove redundant vm_entry_controls_clearbit() call
KVM: x86: cleanup enter_rmode()
KVM: x86: SVM: fix tsc scaling when the host doesn't support it
kvm: x86: SVM: remove unused defines
KVM: x86: SVM: move tsc ratio definitions to svm.h
KVM: x86: SVM: fix avic spec based definitions again
KVM: MIPS: remove reference to trap&emulate virtualization
KVM: x86: document limitations of MSR filtering
KVM: x86: Only do MSR filtering when access MSR by rdmsr/wrmsr
KVM: x86/emulator: Emulate RDPID only if it is enabled in guest
KVM: x86/pmu: Fix and isolate TSX-specific performance event logic
KVM: x86: mmu: trace kvm_mmu_set_spte after the new SPTE was set
KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs
KVM: x86: Trace all APICv inhibit changes and capture overall status
KVM: x86: Add wrappers for setting/clearing APICv inhibits
KVM: x86: Make APICv inhibit reasons an enum and cleanup naming
KVM: X86: Handle implicit supervisor access with SMAP
KVM: X86: Rename variable smap to not_smap in permission_fault()
...
Merge tag 'for-5.18/drivers-2022-04-02' of git://git.kernel.dk/linux-block
Pull block driver fix from Jens Axboe:
"Got two reports on nbd spewing warnings on load now, which is a
regression from a commit that went into your tree yesterday.
Revert the problematic change for now"
* tag 'for-5.18/drivers-2022-04-02' of git://git.kernel.dk/linux-block:
Revert "nbd: fix possible overflow on 'first_minor' in nbd_dev_add()"
Merge tag 'tag-chrome-platform-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
Pull chrome platform updates from Benson Leung:
"cros_ec_typec:
- Check for EC device - Fix a crash when using the cros_ec_typec
driver on older hardware not capable of typec commands
- Make try power role optional
- Mux configuration reorganization series from Prashant
cros_ec_debugfs:
- Fix use after free. Thanks Tzung-bi
sensorhub:
- cros_ec_sensorhub fixup - Split trace include file
misc:
- Add new mailing list for chrome-platform development:
chrome-platform@lists.linux.dev
Now with patchwork!"
* tag 'tag-chrome-platform-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
platform/chrome: cros_ec_debugfs: detach log reader wq from devm
platform: chrome: Split trace include file
platform/chrome: cros_ec_typec: Update mux flags during partner removal
platform/chrome: cros_ec_typec: Configure muxes at start of port update
platform/chrome: cros_ec_typec: Get mux state inside configure_mux
platform/chrome: cros_ec_typec: Move mux flag checks
platform/chrome: cros_ec_typec: Check for EC device
platform/chrome: cros_ec_typec: Make try power role optional
MAINTAINERS: platform-chrome: Add new chrome-platform@lists.linux.dev list
After being merged, user_events become more visible to a wider audience
that have concerns with the current API.
It is too late to fix this for this release, but instead of a full
revert, just mark it as BROKEN (which prevents it from being selected in
make config). Then we can work finding a better API. If that fails,
then it will need to be completely reverted.
To not have the code silently bitrot, still allow building it with
COMPILE_TEST.
And to prevent the uapi header from being installed, then later changed,
and then have an old distro user space see the old version, move the
header file out of the uapi directory.
Surround the include with CONFIG_COMPILE_TEST to the current location,
but when the BROKEN tag is taken off, it will use the uapi directory,
and fail to compile. This is a good way to remind us to move the header
back.
tracing: Move user_events.h temporarily out of include/uapi
While user_events API is under development and has been marked for broken
to not let the API become fixed, move the header file out of the uapi
directory. This is to prevent it from being installed, then later changed,
and then have an old distro user space update with a new kernel, where
applications see the user_events being available, but the old header is in
place, and then they get compiled incorrectly.
Also, surround the include with CONFIG_COMPILE_TEST to the current
location, but when the BROKEN tag is taken off, it will use the uapi
directory, and fail to compile. This is a good way to remind us to move
the header back.
Christophe Leroy [Wed, 30 Mar 2022 07:00:19 +0000 (09:00 +0200)]
ftrace: Make ftrace_graph_is_dead() a static branch
ftrace_graph_is_dead() is used on hot paths, it just reads a variable
in memory and is not worth suffering function call constraints.
For instance, at entry of prepare_ftrace_return(), inlining it avoids
saving prepare_ftrace_return() parameters to stack and restoring them
after calling ftrace_graph_is_dead().
While at it using a static branch is even more performant and is
rather well adapted considering that the returned value will almost
never change.
Inline ftrace_graph_is_dead() and replace 'kill_ftrace_graph' bool
by a static branch.
After being merged, user_events become more visible to a wider audience
that have concerns with the current API. It is too late to fix this for
this release, but instead of a full revert, just mark it as BROKEN (which
prevents it from being selected in make config). Then we can work finding
a better API. If that fails, then it will need to be completely reverted.
tracing: Rename the staging files for trace_events
When looking for implementation of different phases of the creation of the
TRACE_EVENT() macro, it is pretty useless when all helper macro
redefinitions are in files labeled "stageX_defines.h". Rename them to
state which phase the files are for. For instance, when looking for the
defines that are used to create the event fields, seeing
"stage4_event_fields.h" gives the developer a good idea that the defines
are in that file.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Li RongQing [Wed, 9 Mar 2022 08:35:44 +0000 (16:35 +0800)]
KVM: x86: fix sending PV IPI
If apic_id is less than min, and (max - apic_id) is greater than
KVM_IPI_CLUSTER_SIZE, then the third check condition is satisfied but
the new apic_id does not fit the bitmask. In this case __send_ipi_mask
should send the IPI.
This is mostly theoretical, but it can happen if the apic_ids on three
iterations of the loop are for example 1, KVM_IPI_CLUSTER_SIZE, 0.
Fixes: aaffcfd1e82 ("KVM: X86: Implement PV IPIs in linux guest") Signed-off-by: Li RongQing <lirongqing@baidu.com>
Message-Id: <1646814944-51801-1-git-send-email-lirongqing@baidu.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 29 Mar 2022 16:56:24 +0000 (12:56 -0400)]
KVM: x86/mmu: do compare-and-exchange of gPTE via the user address
FNAME(cmpxchg_gpte) is an inefficient mess. It is at least decent if it
can go through get_user_pages_fast(), but if it cannot then it tries to
use memremap(); that is not just terribly slow, it is also wrong because
it assumes that the VM_PFNMAP VMA is contiguous.
The right way to do it would be to do the same thing as
hva_to_pfn_remapped() does since commit add6a0cd1c5b ("KVM: MMU: try to
fix up page faults before giving up", 2016-07-05), using follow_pte()
and fixup_user_fault() to determine the correct address to use for
memremap(). To do this, one could for example extract hva_to_pfn()
for use outside virt/kvm/kvm_main.c. But really there is no reason to
do that either, because there is already a perfectly valid address to
do the cmpxchg() on, only it is a userspace address. That means doing
user_access_begin()/user_access_end() and writing the code in assembly
to handle exceptions correctly. Worse, the guest PTE can be 8-byte
even on i686 so there is the extra complication of using cmpxchg8b to
account for. But at least it is an efficient mess.
(Thanks to Linus for suggesting improvement on the inline assembly).
Reported-by: Qiuhao Li <qiuhao@sysec.org> Reported-by: Gaoning Pan <pgn@zju.edu.cn> Reported-by: Yongkang Jia <kangel@zju.edu.cn> Reported-by: syzbot+6cde2282daa792c49ab8@syzkaller.appspotmail.com Debugged-by: Tadeusz Struk <tadeusz.struk@linaro.org> Tested-by: Maxim Levitsky <mlevitsk@redhat.com> Cc: stable@vger.kernel.org Fixes: bd53cb35a3e9 ("X86/KVM: Handle PFNs outside of kernel reach when touching GPTEs") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When emulating exit from long mode, EFER_LMA is cleared with
vmx_set_efer(). This will already unset the VM_ENTRY_IA32E_MODE control
bit as requested by SDM, so there is no need to unset VM_ENTRY_IA32E_MODE
again in exit_lmode() explicitly. In case EFER isn't supported by
hardware, long mode isn't supported, so exit_lmode() cannot be reached.
Note that, thanks to the shadow controls mechanism, this change doesn't
eliminate vmread or vmwrite.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20220311102643.807507-3-zhenzhong.duan@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Zhenzhong Duan [Fri, 11 Mar 2022 10:26:42 +0000 (18:26 +0800)]
KVM: x86: cleanup enter_rmode()
vmx_set_efer() sets uret->data but, in fact if the value of uret->data
will be used vmx_setup_uret_msrs() will have rewritten it with the value
returned by update_transition_efer(). uret->data is consumed if and only
if uret->load_into_hardware is true, and vmx_setup_uret_msrs() takes care
of (a) updating uret->data before setting uret->load_into_hardware to true
(b) setting uret->load_into_hardware to false if uret->data isn't updated.
Opportunistically use "vmx" directly instead of redoing to_vmx().
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20220311102643.807507-2-zhenzhong.duan@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Sun, 13 Mar 2022 14:05:22 +0000 (15:05 +0100)]
KVM: MIPS: remove reference to trap&emulate virtualization
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220313140522.1307751-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 15 Mar 2022 22:17:15 +0000 (18:17 -0400)]
KVM: x86: document limitations of MSR filtering
MSR filtering requires an exit to userspace that is hard to implement and
would be very slow in the case of nested VMX vmexit and vmentry MSR
accesses. Document the limitation.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Hou Wenlong [Mon, 7 Mar 2022 12:26:33 +0000 (20:26 +0800)]
KVM: x86: Only do MSR filtering when access MSR by rdmsr/wrmsr
If MSR access is rejected by MSR filtering,
kvm_set_msr()/kvm_get_msr() would return KVM_MSR_RET_FILTERED,
and the return value is only handled well for rdmsr/wrmsr.
However, some instruction emulation and state transition also
use kvm_set_msr()/kvm_get_msr() to do msr access but may trigger
some unexpected results if MSR access is rejected, E.g. RDPID
emulation would inject a #UD but RDPID wouldn't cause a exit
when RDPID is supported in hardware and ENABLE_RDTSCP is set.
And it would also cause failure when load MSR at nested entry/exit.
Since msr filtering is based on MSR bitmap, it is better to only
do MSR filtering for rdmsr/wrmsr.
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Message-Id: <2b2774154f7532c96a6f04d71c82a8bec7d9e80b.1646655860.git.houwenlong.hwl@antgroup.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Hou Wenlong [Wed, 2 Mar 2022 13:15:14 +0000 (21:15 +0800)]
KVM: x86/emulator: Emulate RDPID only if it is enabled in guest
When RDTSCP is supported but RDPID is not supported in host,
RDPID emulation is available. However, __kvm_get_msr() would
only fail when RDTSCP/RDPID both are disabled in guest, so
the emulator wouldn't inject a #UD when RDPID is disabled but
RDTSCP is enabled in guest.
Like Xu [Wed, 9 Mar 2022 08:42:57 +0000 (16:42 +0800)]
KVM: x86/pmu: Fix and isolate TSX-specific performance event logic
HSW_IN_TX* bits are used in generic code which are not supported on
AMD. Worse, these bits overlap with AMD EventSelect[11:8] and hence
using HSW_IN_TX* bits unconditionally in generic code is resulting in
unintentional pmu behavior on AMD. For example, if EventSelect[11:8]
is 0x2, pmc_reprogram_counter() wrongly assumes that
HSW_IN_TX_CHECKPOINTED is set and thus forces sampling period to be 0.
Also per the SDM, both bits 32 and 33 "may only be set if the processor
supports HLE or RTM" and for "IN_TXCP (bit 33): this bit may only be set
for IA32_PERFEVTSEL2."
Opportunistically eliminate code redundancy, because if the HSW_IN_TX*
bit is set in pmc->eventsel, it is already set in attr.config.
Reported-by: Ravi Bangoria <ravi.bangoria@amd.com> Reported-by: Jim Mattson <jmattson@google.com> Fixes: 103af0a98788 ("perf, kvm: Support the in_tx/in_tx_cp modifiers in KVM arch perfmon emulation v5") Co-developed-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220309084257.88931-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jim Mattson [Sat, 26 Feb 2022 23:41:31 +0000 (15:41 -0800)]
KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs
AMD EPYC CPUs never raise a #GP for a WRMSR to a PerfEvtSeln MSR. Some
reserved bits are cleared, and some are not. Specifically, on
Zen3/Milan, bits 19 and 42 are not cleared.
When emulating such a WRMSR, KVM should not synthesize a #GP,
regardless of which bits are set. However, undocumented bits should
not be passed through to the hardware MSR. So, rather than checking
for reserved bits and synthesizing a #GP, just clear the reserved
bits.
This may seem pedantic, but since KVM currently does not support the
"Host/Guest Only" bits (41:40), it is necessary to clear these bits
rather than synthesizing #GP, because some popular guests (e.g Linux)
will set the "Host Only" bit even on CPUs that don't support
EFER.SVME, and they don't expect a #GP.
Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379957] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000130026) at rIP: 0xffffffff9b276a28 (native_write_msr+0x8/0x30)
Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379958] Call Trace:
Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379963] amd_pmu_disable_event+0x27/0x90
Fixes: ca724305a2b0 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM") Reported-by: Lotus Fenn <lotusf@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Like Xu <likexu@tencent.com> Reviewed-by: David Dunn <daviddunn@google.com>
Message-Id: <20220226234131.2167175-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>