git.ipfire.org Git - thirdparty/linux.git/log

Keith Busch [Thu, 27 Feb 2025 22:39:15 +0000 (14:39 -0800)]

ublk: zc register/unregister bvec

Provide new operations for the user to request mapping an active request
to an io_uring instance's buf_table. The user has to provide the index
at which it wants to install the buffer.

A reference count is taken on the request to ensure it can't be
completed while it is active in a ring's buf_table.
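
As an illustration of the reference counting described above, here is a
minimal userspace sketch. All names (demo_req, demo_req_get,
demo_req_put) are hypothetical and the counter is a plain int rather
than the kernel's atomic refcount; it only models the idea that a
request cannot complete while a buffer table slot still pins it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a block request; not the real ublk structure. */
struct demo_req {
	int refs;        /* references currently held on the request */
	bool completed;  /* set once the last reference is dropped */
};

/* Take a reference while the request's pages sit in a buffer table slot. */
static void demo_req_get(struct demo_req *rq)
{
	rq->refs++;
}

/* Drop a reference; the request completes only when none remain. */
static void demo_req_put(struct demo_req *rq)
{
	assert(rq->refs > 0);
	if (--rq->refs == 0)
		rq->completed = true;
}

/*
 * Walk through register -> driver done -> unregister. Returns 1 if the
 * request stayed alive until the table slot was released.
 */
static int demo_register_cycle(void)
{
	struct demo_req rq = { .refs = 1, .completed = false };
	bool alive_while_registered;

	demo_req_get(&rq);	/* register: the table slot pins the request */
	demo_req_put(&rq);	/* the driver drops its own reference */
	alive_while_registered = !rq.completed;
	demo_req_put(&rq);	/* unregister: the last reference goes away */

	return alive_while_registered && rq.completed;
}
```

In this model, completing early would require the count to reach zero
while the slot is still installed, which the extra reference prevents.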

Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20250227223916.143006-6-kbusch@meta.com
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Keith Busch [Thu, 27 Feb 2025 22:39:14 +0000 (14:39 -0800)]

io_uring: add support for kernel registered bvecs

Provide an interface for the kernel to leverage the existing
pre-registered buffers that io_uring provides. User space can reference
these later to achieve zero-copy IO.

User space must register an empty fixed buffer table with io_uring in
order for the kernel to make use of it.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20250227223916.143006-5-kbusch@meta.com
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Xinyu Zhang [Thu, 27 Feb 2025 22:39:13 +0000 (14:39 -0800)]

nvme: map uring_cmd data even if address is 0

When using kernel registered bvec fixed buffers, the "address" is
actually the offset into the bvec rather than userspace address.
Therefore it can be 0.
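
To make the distinction concrete, a small userspace sketch follows. The
struct and both helpers are hypothetical and are not the NVMe driver's
code; they only contrast a NULL-style test, which rejects address 0,
with a range test that accepts offset 0 into a registered buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical registered buffer; only its total length matters here. */
struct demo_fixed_buf {
	size_t len;
};

/* NULL-style check: treats address 0 as "no data", wrong for offsets. */
static int demo_has_data_old(unsigned long addr, size_t len)
{
	return addr != 0 && len != 0;
}

/* Offset-style check: 0 is a valid start, only the range is validated. */
static int demo_range_ok(const struct demo_fixed_buf *buf,
			 unsigned long offset, size_t len)
{
	assert(buf);
	if (len == 0 || offset > buf->len)
		return 0;
	return len <= buf->len - offset;
}
```

With a 4096-byte buffer, a 512-byte transfer at offset 0 fails the old
check but passes the range check, while an out-of-range transfer is
still rejected.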

We can skip checking whether the address is NULL before mapping
uring_cmd data. A bad userspace address will be handled properly later
when the user buffer is imported.

With this patch, we will be able to use the kernel registered bvec fixed
buffers in io_uring NVMe passthru with ublk zero-copy support.

Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Xinyu Zhang <xizhang@purestorage.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20250227223916.143006-4-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Keith Busch [Thu, 27 Feb 2025 22:39:12 +0000 (14:39 -0800)]

io_uring/rw: move fixed buffer import to issue path

Registered buffers may depend on a linked command, which makes the prep
path too early to import. Move to the issue path when the node is
actually needed like all the other users of fixed buffers.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20250227223916.143006-3-kbusch@meta.com
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Keith Busch [Thu, 27 Feb 2025 22:39:11 +0000 (14:39 -0800)]

io_uring/rw: move buffer_select outside generic prep

Cleans up the generic rw prep to not require the do_import flag. Use a
different prep function for callers that might need buffer select.

Based-on-a-patch-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20250227223916.143006-2-kbusch@meta.com
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Arnd Bergmann [Thu, 27 Feb 2025 13:20:09 +0000 (14:20 +0100)]

io_uring/net: fix build warning for !CONFIG_COMPAT

A code rework resulted in an uninitialized return code when COMPAT
mode is disabled:

io_uring/net.c:722:6: error: variable 'ret' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
  722 |         if (io_is_compat(req->ctx)) {
      |             ^~~~~~~~~~~~~~~~~~~~~~
io_uring/net.c:736:15: note: uninitialized use occurs here
  736 |         if (unlikely(ret))
      |                      ^~~

Since io_is_compat() turns into a compile-time 'false', the #ifdef
here is completely unnecessary, and removing it avoids the warning.
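
The pattern can be sketched in userspace as follows. DEMO_COMPAT is an
assumed stand-in for CONFIG_COMPAT, and demo_is_compat() and
demo_prep() are hypothetical names, not the io_uring functions. Because
the helper folds to a constant false when compat is disabled, a plain
if/else assigns ret on every path and no #ifdef is required.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: DEMO_COMPAT plays the role of CONFIG_COMPAT; 0 = disabled. */
#ifndef DEMO_COMPAT
#define DEMO_COMPAT 0
#endif

struct demo_ctx {
	bool compat;
};

/* Constant false when compat is compiled out, so the branch is dead code. */
static inline bool demo_is_compat(const struct demo_ctx *ctx)
{
	assert(ctx);
	return DEMO_COMPAT && ctx->compat;
}

/* Every path assigns ret, so the compiler has nothing to warn about. */
static int demo_prep(const struct demo_ctx *ctx)
{
	int ret;

	if (demo_is_compat(ctx))
		ret = 32;	/* compat layout would be handled here */
	else
		ret = 64;	/* native layout is handled here */

	return ret;
}
```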

Fixes: 51e158d40589 ("io_uring/net: unify *mshot_prep calls with compat")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20250227132018.1111094-1-arnd@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Wed, 26 Feb 2025 20:46:34 +0000 (20:46 +0000)]

io_uring: rearrange opdef flags by use pattern

Keep all flags that we use in the generic req init path close together.
That saves a load for x86 because apparently some compilers prefer
reading single bytes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ef03b6ce4a0c2a5234cd4037fa07e9e4902dcc9e.1740602793.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Wed, 26 Feb 2025 11:41:21 +0000 (11:41 +0000)]

io_uring/net: extract iovec import into a helper

Deduplicate iovec imports between compat and !compat by introducing a
helper function.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6a5f8c526f6732c4249a7fa0213b49e1a3ecccf0.1740569495.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Wed, 26 Feb 2025 11:41:20 +0000 (11:41 +0000)]

io_uring/net: unify *mshot_prep calls with compat

Instead of duplicating an io_recvmsg_mshot_prep() call in the compat
path, let the common code handle it. For that, copy necessary compat
fields into struct user_msghdr. Note, it zeroes user_msghdr to be on the
safe side as compat is not that interesting and overhead shouldn't be
high.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/94e62386dec570f83b4a4270a46ac60bc415fb71.1740569495.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Wed, 26 Feb 2025 11:41:19 +0000 (11:41 +0000)]

io_uring/net: derive iovec storage later

Don't read free_iov until right before we need it to import the iovec.
The only place that uses it before that is provided buffer selection,
but it only serves as temporary storage and iovec content is not reused
afterwards, so use a local variable for that.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8bfa7d74c33e37860a724f4e0e96660c25cd4c02.1740569495.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Wed, 26 Feb 2025 11:41:18 +0000 (11:41 +0000)]

io_uring/net: verify msghdr before copying iovec

Normally, net/ would verify msghdr before importing iovec, for example
see copy_msghdr_from_user(), which is further assumed by __copy_msghdr()
validating msg->msg_iovlen.

io_uring does it in reverse order, which is fine, but it'll be more
convenient to flip it so that the iovec business is done at the end and
eventually can be nicely pulled out of the msghdr parsing section and
thought of as a separate step. That also makes structure accesses more
localised, which should be better for caches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cd35dc1b48d4e6e31f59ae7304c037fbe8a3fd3d.1740569495.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Wed, 26 Feb 2025 11:41:17 +0000 (11:41 +0000)]

io_uring/net: isolate msghdr copying code

The user access section in io_msg_copy_hdr() is overextended by covering
selected buffers. It's hard to work with and prone to errors. Limit the
section to msghdr import only, selected buffers will do a separate
copy_from_user() call, and then move it into its own function. This
should be fine, selected buffer single shots are not important, for
multishots the overhead should be non-existent, and it's not that
expensive overall.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d3eb1f81c8cfbea9f1aa57dab90c472d2aa6e371.1740569495.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Wed, 26 Feb 2025 11:41:16 +0000 (11:41 +0000)]

io_uring/net: simplify compat selbuf iov parsing

Use copy_from_user() instead of open coded access_ok() + get_user(),
that's simpler and we don't care about compat that much.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e51f9c323a3cd4ad7c8da656559bdf6237f052fb.1740569495.git.asml.silence@gmail.com
[axboe: fold in bogus < 0 check for tmp_iov.iov_len]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Wed, 26 Feb 2025 11:41:15 +0000 (11:41 +0000)]

io_uring/net: remove unnecessary REQ_F_NEED_CLEANUP

Setting REQ_F_NEED_CLEANUP in io_recvmsg_prep_setup() and in
io_sendmsg_setup() is a relic of the past and doesn't do anything
useful; the flag should be, and is, set earlier, on iovec and
async_data allocation.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6aedc3141c1fc027128a4503656cfd686a6980ef.1740569495.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Jens Axboe [Thu, 27 Feb 2025 14:18:01 +0000 (07:18 -0700)]

Merge branch 'io_uring-6.14' into for-6.15/io_uring

Merge mainline fixes into 6.15 branch, as upcoming patches depend on
fixes that went into the 6.14 mainline branch.

* io_uring-6.14:
  io_uring/net: save msg_control for compat
  io_uring/rw: clean up mshot forced sync mode
  io_uring/rw: move ki_complete init into prep
  io_uring/rw: don't directly use ki_complete
  io_uring/rw: forbid multishot async reads
  io_uring/rsrc: remove unused constants
  io_uring: fix spelling error in uapi io_uring.h
  io_uring: prevent opcode speculation
  io-wq: backoff when retrying worker creation

Pavel Begunkov [Mon, 24 Feb 2025 21:31:10 +0000 (13:31 -0800)]

io_uring: combine buffer lookup and import

Registered buffers are currently imported in two steps: first we look up
a rsrc node and then use it to set up the iterator. The first part is
usually done at the prep stage, and import happens whenever it's needed.
As we want to defer binding to a node so that it works with linked
requests, combine both steps into a single helper.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250224213116.3509093-6-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Mon, 24 Feb 2025 21:31:09 +0000 (13:31 -0800)]

io_uring/nvme: pass issue_flags to io_uring_cmd_import_fixed()

io_uring_cmd_import_fixed() will need to know the io_uring execution
state in following commits, for now just pass issue_flags into it
without actually using it.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250224213116.3509093-5-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Mon, 24 Feb 2025 21:31:08 +0000 (13:31 -0800)]

io_uring/net: reuse req->buf_index for sendzc

There is already a field in io_kiocb that can store a registered buffer
index, use that instead of stashing the value into struct io_sr_msg.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250224213116.3509093-4-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Keith Busch [Mon, 24 Feb 2025 21:31:07 +0000 (13:31 -0800)]

io_uring/nop: reuse req->buf_index

There is already a field in io_kiocb that can store a registered buffer
index, use that instead of stashing the value into struct io_nop.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250224213116.3509093-3-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Keith Busch [Mon, 24 Feb 2025 21:31:06 +0000 (13:31 -0800)]

io_uring/rsrc: remove redundant check for valid imu

The only caller to io_buffer_unmap already checks if the node's buf is
not null, so no need to check again.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250224213116.3509093-2-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Mon, 24 Feb 2025 19:45:06 +0000 (19:45 +0000)]

io_uring/rw: open code io_prep_rw_setup()

Open code io_prep_rw_setup() into its only caller, it doesn't provide
any meaningful abstraction anymore.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/61ba72e2d46119db71f27ab908018e6a6cd6c064.1740425922.git.asml.silence@gmail.com
[axboe: fold in 'ret' being unused fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Tue, 25 Feb 2025 15:59:02 +0000 (15:59 +0000)]

io_uring/net: save msg_control for compat

Match the compat part of io_sendmsg_copy_hdr() with its counterpart and
save msg_control.

Fixes: c55978024d123 ("io_uring/net: move receive multishot out of the generic msghdr path")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2a8418821fe83d3b64350ad2b3c0303e9b732bbd.1740498502.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Mon, 24 Feb 2025 19:45:05 +0000 (19:45 +0000)]

io_uring/rw: extract helper for iovec import

Split a helper out of __io_import_rw_buffer() that handles vectored
buffers. I'll need it for registered vectored buffers, but it also looks
cleaner, especially with parameters being properly named.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/075470cfb24be38709d946815f35ec846d966f41.1740425922.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Mon, 24 Feb 2025 19:45:04 +0000 (19:45 +0000)]

io_uring/rw: rename io_import_iovec()

io_import_iovec() is not limited to iovecs but also imports buffers for
normal reads and selected buffers, rename it for clarity.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/91cea59340b61a8f52dc7b8e720274577a25188c.1740425922.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Mon, 24 Feb 2025 19:45:03 +0000 (19:45 +0000)]

io_uring/rw: allocate async data in io_prep_rw()

rw always allocates async_data, so instead of doing that deeper in prep
calls inside of io_prep_rw_setup(), be a bit more explicit and do that
early on in io_prep_rw().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5ead621051bc3374d1e8d96f816454906a6afd71.1740425922.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Sun, 23 Feb 2025 17:22:31 +0000 (17:22 +0000)]

io_uring: make io_poll_issue() sturdier

io_poll_issue() forwards the call to io_issue_sqe() and thus inherits
some of the handling. That's not particularly failure resistant, as for
example returning an innocent-looking IOU_OK from a multishot issue
will lead to severe bugs.

Reimplement io_poll_issue() without io_issue_sqe()'s request completion
logic. Remove extra checks as we know that req->file is already set,
linked timeouts are armed, and iopoll is not supported. Also cover it
with warnings for now.

The patch should be useful by itself, but it's also preparing the
codebase for other future clean ups.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3096d7b1026d9a52426a598bdfc8d9d324555545.1740331076.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Sun, 23 Feb 2025 17:22:30 +0000 (17:22 +0000)]

io_uring/net: canonise accept mshot handling

Use a more recognisable pattern for mshot accept: first try to post an
mshot cqe if needed, and afterwards do the terminating handling.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/daf5c0df7e2966deb0a115021c065fc6161a52d7.1740331076.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Sun, 23 Feb 2025 17:22:29 +0000 (17:22 +0000)]

io_uring/net: fix accept multishot handling

REQ_F_APOLL_MULTISHOT doesn't guarantee it's executed from the multishot
context, so a multishot accept may get executed inline, fail
io_req_post_cqe(), and ask the core code to kill the request with
-ECANCELED by returning IOU_STOP_MULTISHOT even when a socket has been
accepted and installed.

Cc: stable@vger.kernel.org
Fixes: 390ed29b5e425 ("io_uring: add IORING_ACCEPT_MULTISHOT for accept")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/51c6deb01feaa78b08565ca8f24843c017f5bc80.1740331076.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Mon, 24 Feb 2025 12:42:24 +0000 (12:42 +0000)]

io_uring/net: use io_is_compat()

Use io_is_compat() for consistency.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Link: https://lore.kernel.org/r/fff93d9d08243284c5db5d546be766a82e85c130.1740400452.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Mon, 24 Feb 2025 12:42:23 +0000 (12:42 +0000)]

io_uring/waitid: use io_is_compat()

Use io_is_compat() for consistency.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Link: https://lore.kernel.org/r/28c5b5f1f1bf7f4d18869dafe6e4147ce1bbf0f5.1740400452.git.asml.silence@gmail.com
Link: https://lore.kernel.org/r/20250224172337.2009871-1-csander@purestorage.com
[axboe: fold in improvement from Caleb, see link]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Mon, 24 Feb 2025 12:42:22 +0000 (12:42 +0000)]

io_uring/rw: shrink io_iov_compat_buffer_select_prep

Compat performance is not important and simplicity is more appreciated.
Let's not be smart about it and use the simpler copy_from_user() instead
of an access_ok() + __get_user() pair.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b334a3a5040efa424ded58e4d8a6ef2554324266.1740400452.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Mon, 24 Feb 2025 12:42:21 +0000 (12:42 +0000)]

io_uring/rw: compile out compat param passing

Even when COMPAT is compiled out, we still have to pass
ctx->compat to __import_iovec(). Replace the read through an indirection
with a constant when the kernel doesn't support compat.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Link: https://lore.kernel.org/r/2819df9c8533c36b46d7baccbb317a0ec89da6cd.1740400452.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Mon, 24 Feb 2025 12:42:20 +0000 (12:42 +0000)]

io_uring/cmd: optimise !CONFIG_COMPAT flags setting

Use io_is_compat() to avoid extra overhead in io_uring_cmd() for flag
setting when compat is compiled out.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Link: https://lore.kernel.org/r/f4d74c62d7cbddc386c0a9138ecd2b2ed6d3f146.1740400452.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Mon, 24 Feb 2025 12:42:19 +0000 (12:42 +0000)]

io_uring: introduce io_is_compat()

A preparation patch adding a simple helper for gauging the compat state.
It'll help us to optimise and compile out more code in the following
commits.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Link: https://lore.kernel.org/r/1a87a640265196a67bc38300128e0bfd7839ab1f.1740400452.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Wed, 19 Feb 2025 01:33:40 +0000 (01:33 +0000)]

io_uring/rw: clean up mshot forced sync mode

Move code forcing synchronous execution of multishot read requests out
of the more generic __io_read().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4ad7b928c776d1ad59addb9fff64ef2d1fc474d5.1739919038.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Wed, 19 Feb 2025 01:33:39 +0000 (01:33 +0000)]

io_uring/rw: move ki_complete init into prep

Initialise ki_complete during request prep stage, we'll depend on it not
being reset during issue in the following patch.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/817624086bd5f0448b08c80623399919fda82f34.1739919038.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Wed, 19 Feb 2025 01:33:38 +0000 (01:33 +0000)]

io_uring/rw: don't directly use ki_complete

We want to avoid checking ->ki_complete directly in the io_uring
completion path. Fortunately we have only two callbacks, the selection
of which depends on the ring's constant flags, i.e. IOPOLL, so use that
to infer the function.
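
A rough userspace sketch of that inference is shown below. The flag
value, struct and callbacks are hypothetical placeholders rather than
the kernel definitions; the point is only that the callback is derived
from a ring flag that never changes after setup, so the stored pointer
does not have to be inspected.

```c
#include <assert.h>

/* Stand-in for IORING_SETUP_IOPOLL; the value is arbitrary in this demo. */
#define DEMO_SETUP_IOPOLL	(1u << 0)

struct demo_ring {
	unsigned int flags;	/* fixed at ring creation time */
};

typedef int (*demo_complete_fn)(long res);

/* Two hypothetical completion callbacks with distinguishable results. */
static int demo_complete_irq(long res)
{
	return (int)res;
}

static int demo_complete_iopoll(long res)
{
	return (int)res + 1000;
}

/* Infer the callback from the ring flags instead of a stored pointer. */
static demo_complete_fn demo_pick_complete(const struct demo_ring *ring)
{
	assert(ring);
	if (ring->flags & DEMO_SETUP_IOPOLL)
		return demo_complete_iopoll;
	return demo_complete_irq;
}
```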

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4eb4bdab8cbcf5bc87083f7047edc81e920ab83c.1739919038.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Wed, 19 Feb 2025 01:33:37 +0000 (01:33 +0000)]

io_uring/rw: forbid multishot async reads

At the moment we can't sanely handle queuing an async request from a
multishot context, so disable them. It shouldn't matter as pollable
files / sockets don't normally do async.

Patching it in __io_read() is not the cleanest way, but it's simpler
than other options, so let's fix it there and clean up on top.

Cc: stable@vger.kernel.org
Reported-by: chase xd <sl1589472800@gmail.com>
Fixes: fc68fcda04910 ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7d51732c125159d17db4fe16f51ec41b936973f8.1739919038.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Caleb Sander Mateos [Wed, 19 Feb 2025 03:34:43 +0000 (20:34 -0700)]

io_uring/rsrc: remove unused constants

IO_NODE_ALLOC_CACHE_MAX has been unused since commit fbbb8e991d86
("io_uring/rsrc: get rid of io_rsrc_node allocation cache") removed the
rsrc_node_cache.

IO_RSRC_TAG_TABLE_SHIFT and IO_RSRC_TAG_TABLE_MASK have been unused
since commit 7029acd8a950 ("io_uring/rsrc: get rid of per-ring
io_rsrc_node list") removed the separate tag table for registered nodes.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Li Zetao <lizetao1@huawei.com>
Link: https://lore.kernel.org/r/20250219033444.2020136-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Jens Axboe [Tue, 18 Feb 2025 23:47:40 +0000 (16:47 -0700)]

io_uring: fix spelling error in uapi io_uring.h

This is obviously not that important, but when changes are synced back
from the kernel to liburing, the codespell CI ends up erroring because
of this misspelling. Let's just correct it and avoid this biting us
again on an import.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Caleb Sander Mateos [Wed, 12 Feb 2025 00:51:18 +0000 (17:51 -0700)]

io_uring: use lockless_cq flag in io_req_complete_post()

io_uring_create() computes ctx->lockless_cq as:
ctx->task_complete || (ctx->flags & IORING_SETUP_IOPOLL)

So use it to simplify that expression in io_req_complete_post().
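
The equivalence can be checked with a small userspace model. The struct
below is a hypothetical reduction of the ring context and the flag
value is a placeholder; the sketch only shows that a flag cached at
creation time matches the longer expression.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for IORING_SETUP_IOPOLL; the value is arbitrary in this demo. */
#define DEMO_SETUP_IOPOLL	(1u << 0)

/* Hypothetical reduction of the ring context to the fields involved. */
struct demo_ctx {
	unsigned int flags;
	bool task_complete;
	bool lockless_cq;	/* cached once at creation time */
};

/* Mirrors the computation quoted in the commit message. */
static void demo_ctx_init(struct demo_ctx *ctx, unsigned int flags,
			  bool task_complete)
{
	assert(ctx);
	ctx->flags = flags;
	ctx->task_complete = task_complete;
	ctx->lockless_cq = task_complete || (flags & DEMO_SETUP_IOPOLL);
}

/* The long form that the commit replaces. */
static bool demo_cond_long(const struct demo_ctx *ctx)
{
	return ctx->task_complete || (ctx->flags & DEMO_SETUP_IOPOLL);
}

/* The short form using the cached flag. */
static bool demo_cond_short(const struct demo_ctx *ctx)
{
	return ctx->lockless_cq;
}
```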

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Li Zetao <lizetao1@huawei.com>
Link: https://lore.kernel.org/r/20250212005119.3433005-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Caleb Sander Mateos [Mon, 17 Feb 2025 02:25:05 +0000 (19:25 -0700)]

io_uring: pass struct io_tw_state by value

8e5b3b89ecaf ("io_uring: remove struct io_tw_state::locked") removed the
only field of io_tw_state but kept it as a task work callback argument
to "forc[e] users not to invoke them carelessly out of a wrong context".
Passing the struct io_tw_state * argument adds a few instructions to all
callers that can't inline the functions and see the argument is unused.

So pass struct io_tw_state by value instead. Since it's a 0-sized value,
it can be passed without any instructions needed to initialize it.
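
For illustration, a userspace sketch of passing an empty struct by
value. The names are hypothetical, and the sketch relies on empty
structs and empty initializers being accepted as a GNU C extension
(GCC and Clang), where such a struct has size 0; strictly conforming
C compilers may reject it.

```c
#include <assert.h>
#include <stddef.h>

/* Empty struct: a GNU C extension (GCC, Clang), with size 0 there. */
struct demo_tw_state {
};

/* Alias so the representation can be changed in one place. */
typedef struct demo_tw_state demo_tw_token_t;

/*
 * A task-work style callback. The token carries no data; it exists only
 * so that callers must be in a context that can produce one.
 */
static int demo_task_work(int payload, demo_tw_token_t tw)
{
	(void)tw;
	return payload * 2;
}

/* Only this runner creates a token, then forwards it by value. */
static int demo_run_task_work(int payload)
{
	demo_tw_token_t tw = {};

	assert(sizeof(tw) == 0);	/* nothing to initialize or pass */
	return demo_task_work(payload, tw);
}
```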

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://lore.kernel.org/r/20250217022511.1150145-2-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Caleb Sander Mateos [Mon, 17 Feb 2025 02:25:04 +0000 (19:25 -0700)]

io_uring: introduce type alias for io_tw_state

In preparation for changing how io_tw_state is passed, introduce a type
alias io_tw_token_t for struct io_tw_state *. This allows for changing
the representation in one place, without having to update the many
functions that just forward their struct io_tw_state * argument.

Also add a comment to struct io_tw_state to explain its purpose.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://lore.kernel.org/r/20250217022511.1150145-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Caleb Sander Mateos [Sun, 16 Feb 2025 22:58:59 +0000 (15:58 -0700)]

io_uring/rsrc: avoid NULL check in io_put_rsrc_node()

Most callers of io_put_rsrc_node() already check that node is non-NULL:
- io_rsrc_data_free()
- io_sqe_buffer_register()
- io_reset_rsrc_node()
- io_req_put_rsrc_nodes() (REQ_F_BUF_NODE indicates non-NULL buf_node)

Only io_splice_cleanup() can call io_put_rsrc_node() with a NULL node.
So move the NULL check there.
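
The shape of the change can be sketched in userspace as below.
demo_node, demo_put_node() and demo_splice_cleanup() are hypothetical
stand-ins, not the rsrc code: the callee assumes a valid node, and the
one caller that may have none performs the check itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resource node with a simple reference count. */
struct demo_node {
	int refs;
};

/* The callee no longer checks for NULL; callers guarantee a valid node. */
static void demo_put_node(struct demo_node *node)
{
	assert(node);
	node->refs--;
}

/* The one caller that may hold no node carries the check itself. */
static void demo_splice_cleanup(struct demo_node *node)
{
	if (node)
		demo_put_node(node);
}
```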

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://lore.kernel.org/r/20250216225900.1075446-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Caleb Sander Mateos [Wed, 12 Feb 2025 16:48:05 +0000 (09:48 -0700)]

io_uring: pass ctx instead of req to io_init_req_drain()

io_init_req_drain() takes a struct io_kiocb *req argument but only uses
it to get struct io_ring_ctx *ctx. The caller already knows the ctx, so
pass it instead.

Drop "req" from the function name since it operates on the ctx rather
than a specific req.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://lore.kernel.org/r/20250212164807.3681036-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Caleb Sander Mateos [Tue, 11 Feb 2025 20:19:56 +0000 (13:19 -0700)]

io_uring: use IO_REQ_LINK_FLAGS more

Replace the 2 instances of REQ_F_LINK | REQ_F_HARDLINK with
the more commonly used IO_REQ_LINK_FLAGS.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://lore.kernel.org/r/20250211202002.3316324-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Jens Axboe [Sat, 8 Feb 2025 17:50:34 +0000 (10:50 -0700)]

io_uring/net: improve recv bundles

Current recv bundles are only supported for multishot receives, and
additionally they also always post at least 2 CQEs if more data is
available than what a buffer will hold. This happens because the initial
bundle recv will do a single buffer, and then do the rest of what is in
the socket as a followup receive. As shown in a test program, if 1k
buffers are available and 32k is available to receive in the socket,
you'd get the following completions:

bundle=1, mshot=0
cqe res 1024
cqe res 1024
[...]
cqe res 1024

bundle=1, mshot=1
cqe res 1024
cqe res 31744

where bundle=1 && mshot=0 will post 32 1k completions, and bundle=1 &&
mshot=1 will post a 1k completion and then a 31k completion.

To support bundle recv without multishot, it's possible to simply retry
the recv immediately and post a single completion, rather than split it
into two completions. With the below patch, the same test looks as
follows:

bundle=1, mshot=0
cqe res 32768

bundle=1, mshot=1
cqe res 32768

where mshot=0 works fine for bundles, and both of them post just a
single 32k completion rather than split it into separate completions.
Posting fewer completions is always a nice win, and not needing
multishot for proper bundle efficiency is nice for cases that can't
necessarily use multishot.
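
The completion counts above can be reproduced with a toy userspace
model. It is not the kernel's receive path: the "socket" is just a
byte count, the buffers are equally sized, and every name is
hypothetical. It contrasts posting one completion per filled buffer
with retrying the receive and posting a single accumulated completion.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_CQES	64

/* Toy completion queue that records the result of each posted CQE. */
struct demo_cq {
	size_t res[DEMO_MAX_CQES];
	int count;
};

static void demo_post(struct demo_cq *cq, size_t res)
{
	assert(cq->count < DEMO_MAX_CQES);
	cq->res[cq->count++] = res;
}

/* One completion per filled buffer. */
static void demo_recv_per_buffer(struct demo_cq *cq, size_t pending,
				 size_t buf_len, int nr_bufs)
{
	while (pending > 0 && nr_bufs > 0) {
		size_t n = pending < buf_len ? pending : buf_len;

		demo_post(cq, n);
		pending -= n;
		nr_bufs--;
	}
}

/* Retry the receive right away and post one accumulated completion. */
static void demo_recv_bundle_retry(struct demo_cq *cq, size_t pending,
				   size_t buf_len, int nr_bufs)
{
	size_t done = 0;

	while (pending > 0 && nr_bufs > 0) {
		size_t n = pending < buf_len ? pending : buf_len;

		done += n;
		pending -= n;
		nr_bufs--;
	}
	if (done)
		demo_post(cq, done);
}
```

With 32k pending and 32 buffers of 1k each, the first helper records 32
completions of 1024 bytes and the second records one of 32768 bytes.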

Reported-by: Norman Maurer <norman_maurer@apple.com>
Link: https://lore.kernel.org/r/184f9f92-a682-4205-a15d-89e18f664502@kernel.dk
Fixes: 2f9c9515bdfd ("io_uring/net: support bundles for recv")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Jens Axboe [Wed, 5 Feb 2025 20:16:29 +0000 (13:16 -0700)]

io_uring/waitid: use generic io_cancel_remove() helper

Don't implement our own loop rolling and checking, just use the generic
helper to find and cancel requests.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Jens Axboe [Wed, 5 Feb 2025 20:15:57 +0000 (13:15 -0700)]

io_uring/futex: use generic io_cancel_remove() helper

Don't implement our own loop rolling and checking, just use the generic
helper to find and cancel requests.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Jens Axboe [Wed, 5 Feb 2025 20:13:58 +0000 (13:13 -0700)]

io_uring/cancel: add generic cancel helper

Any opcode that is cancelable ends up defining its own cancel helper
for finding and canceling a specific request. Add a generic helper that
can be used for this purpose.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Jens Axboe [Wed, 5 Feb 2025 19:52:46 +0000 (12:52 -0700)]

io_uring/waitid: convert to io_cancel_remove_all()

Use the generic helper for cancelations.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Jens Axboe [Wed, 5 Feb 2025 19:51:26 +0000 (12:51 -0700)]

io_uring/futex: convert to io_cancel_remove_all()

Use the generic helper for cancelations.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Jens Axboe [Wed, 5 Feb 2025 19:48:56 +0000 (12:48 -0700)]

io_uring/cancel: add generic remove_all helper

Any opcode that is cancelable ends up defining its own remove all
helper, which iterates the pending list and cancels matches. Add a
generic helper for it, which can be used by them.
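
A userspace sketch of such a generic helper follows. The request type,
the singly linked pending list and the signature are assumptions made
for the example and do not match the kernel's data structures; the
opcode-specific part is reduced to a callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pending request on a singly linked list. */
struct demo_req {
	int owner;		/* which task queued the request */
	bool cancelled;
	struct demo_req *next;
};

/* Opcode-specific cancel callback; returns true if the request was cancelled. */
typedef bool (*demo_cancel_fn)(struct demo_req *req);

/*
 * Generic loop: walk the pending list, cancel every match and unlink it.
 * Returns true if at least one request was cancelled.
 */
static bool demo_cancel_remove_all(struct demo_req **head, int owner,
				   bool cancel_all, demo_cancel_fn cancel)
{
	struct demo_req **pp = head;
	bool found = false;

	assert(head && cancel);
	while (*pp) {
		struct demo_req *req = *pp;

		if ((cancel_all || req->owner == owner) && cancel(req)) {
			*pp = req->next;	/* unlink the cancelled request */
			req->next = NULL;
			found = true;
		} else {
			pp = &req->next;
		}
	}
	return found;
}

/* Example callback an opcode might supply. */
static bool demo_futex_cancel(struct demo_req *req)
{
	req->cancelled = true;
	return true;
}
```

Each opcode then only supplies its callback, and the iteration and
matching logic lives in one place.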

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Wed, 5 Feb 2025 11:36:49 +0000 (11:36 +0000)]

io_uring/kbuf: uninline __io_put_kbufs

__io_put_kbufs() and other helper functions are too large to be inlined,
compilers would normally refuse to do so. Uninline it and move it
together with io_kbuf_commit() into kbuf.c.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3dade7f55ad590e811aff83b1ec55c9c04e17b2b.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Pavel Begunkov [Wed, 5 Feb 2025 11:36:48 +0000 (11:36 +0000)]

io_uring/kbuf: introduce io_kbuf_drop_legacy()

io_kbuf_drop() is only used for legacy provided buffers, and so
__io_put_kbuf_list() is never called for REQ_F_BUFFER_RING. Remove the
dead branch out of __io_put_kbuf_list(), rename it into
io_kbuf_drop_legacy() and use it directly instead of io_kbuf_drop().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c8cc73e2272f09a86ecbdad9ebdd8304f8e583c0.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Pavel Begunkov [Wed, 5 Feb 2025 11:36:47 +0000 (11:36 +0000)]

io_uring/kbuf: open code __io_put_kbuf()

__io_put_kbuf() is a trivial wrapper, open code it into
__io_put_kbufs().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9dc17380272b48d56c95992c6f9eaacd5546e1d3.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Pavel Begunkov [Wed, 5 Feb 2025 11:36:46 +0000 (11:36 +0000)]

io_uring/kbuf: remove legacy kbuf caching

Remove all struct io_buffer caches. It makes it a fair bit simpler.
Apart from killing a bunch of lines and juggling between lists,
__io_put_kbuf_list() doesn't need ->completion_lock locking now.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/18287217466ee2576ea0b1e72daccf7b22c7e856.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Pavel Begunkov [Wed, 5 Feb 2025 11:36:45 +0000 (11:36 +0000)]

io_uring/kbuf: simplify __io_put_kbuf

As a preparation step remove an optimisation from __io_put_kbuf() trying
to use the locked cache. With that __io_put_kbuf_list() is only used
with ->io_buffers_comp, and we remove the explicit list argument.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1b7f1394ec4afc7f96b35a61f5992e27c49fd067.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Pavel Begunkov [Wed, 5 Feb 2025 11:36:44 +0000 (11:36 +0000)]

io_uring/kbuf: move locking into io_kbuf_drop()

Move the burden of locking out of the caller and into io_kbuf_drop(), which
will help with further refactoring.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/530f0cf1f06963029399f819a9a58b1a34bebef3.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Pavel Begunkov [Wed, 5 Feb 2025 11:36:43 +0000 (11:36 +0000)]

io_uring/kbuf: remove legacy kbuf kmem cache

Remove the kmem cache used by legacy provided buffers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8195c207d8524d94e972c0c82de99282289f7f5c.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Pavel Begunkov [Wed, 5 Feb 2025 11:36:42 +0000 (11:36 +0000)]

io_uring/kbuf: remove legacy kbuf bulk allocation

Legacy provided buffers are slow and discouraged in favour of the ring
variant. Remove the bulk allocation to keep it simpler as we don't care
about performance.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a064d70370e590efed8076e9501ae4cfc20fe0ca.1738724373.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Pavel Begunkov [Fri, 31 Jan 2025 17:31:03 +0000 (17:31 +0000)]

io_uring: sanitise ring params earlier

Do all struct io_uring_params validation early on before allocating the
context. That makes initialisation easier, especially by having fewer
places where we need to care about partial de-initialisation.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/363ba90b83ff78eefdc88b60e1b2c4a39d182247.1738344646.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Pavel Begunkov [Fri, 31 Jan 2025 17:28:21 +0000 (17:28 +0000)]

io_uring: check for iowq alloc_workqueue failure

alloc_workqueue() can fail even during init in io_uring_init(). Check
the result and panic if anything went wrong.

Fixes: 73eaa2b583493 ("io_uring: use private workqueue for exit work")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3a046063902f888f66151f89fa42f84063b9727b.1738343083.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Pavel Begunkov [Fri, 31 Jan 2025 17:27:02 +0000 (17:27 +0000)]

io_uring: deduplicate caches deallocation

Add a function that frees all ring caches since we already have two
spots repeating the same thing and it's easy to miss it and change only
one of them.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b6b0125677c58bdff99eda91ab320137406e8562.1738342562.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Max Kellermann [Tue, 28 Jan 2025 13:39:25 +0000 (14:39 +0100)]

io_uring/io-wq: pass io_wq to io_get_next_work()

The only caller has already determined this pointer, so let's skip
the redundant dereference.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/r/20250128133927.3989681-7-max.kellermann@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Max Kellermann [Tue, 28 Jan 2025 13:39:24 +0000 (14:39 +0100)]

io_uring/io-wq: do not use bogus hash value

Previously, the `hash` variable was initialized with `-1` and only
updated by io_get_next_work() if the current work was hashed.  Commit
60cf46ae6054 ("io-wq: hash dependent work") changed this to always
call io_get_work_hash() even if the work was not hashed.  This caused
the `hash != -1U` check to always be true, adding some overhead for
the `hash->wait` code.

This patch fixes the regression by checking the `IO_WQ_WORK_HASHED`
flag.
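
A minimal sketch of why the old check could never be false, assuming the hash
is derived by shifting the flags word (the macro names, the shift value and
the helper functions here are illustrative stand-ins, not the kernel's
definitions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WORK_HASHED	(1u << 0)	/* hypothetical flag bit */
#define WORK_HASH_SHIFT	24		/* hypothetical: hash kept in the top bits */

struct work {
	unsigned int flags;
};

/* Always yields a small value, even for work that was never hashed. */
static unsigned int work_hash(const struct work *w)
{
	return w->flags >> WORK_HASH_SHIFT;
}

/*
 * Regressed form: the hash is computed unconditionally, so it can never
 * equal the -1U sentinel and the check is always true.
 */
static bool needs_hash_wait_buggy(const struct work *w)
{
	unsigned int hash;

	assert(w != NULL);
	hash = work_hash(w);
	return hash != -1U;
}

/* Fixed form: ask the flag whether the work is hashed at all. */
static bool needs_hash_wait_fixed(const struct work *w)
{
	assert(w != NULL);
	return (w->flags & WORK_HASHED) != 0;
}
```

For unhashed work the first variant still takes the hash wait path, while the
second skips it.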

Perf diff for a flood of `IORING_OP_NOP` with `IOSQE_ASYNC`:

    38.55%     -1.57%  [kernel.kallsyms]  [k] queued_spin_lock_slowpath
     6.86%     -0.72%  [kernel.kallsyms]  [k] io_worker_handle_work
     0.10%     +0.67%  [kernel.kallsyms]  [k] put_prev_entity
     1.96%     +0.59%  [kernel.kallsyms]  [k] io_nop_prep
     3.31%     -0.51%  [kernel.kallsyms]  [k] try_to_wake_up
     7.18%     -0.47%  [kernel.kallsyms]  [k] io_wq_free_work

Fixes: 60cf46ae6054 ("io-wq: hash dependent work")
Cc: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/r/20250128133927.3989681-6-max.kellermann@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Max Kellermann [Tue, 28 Jan 2025 13:39:23 +0000 (14:39 +0100)]

io_uring/io-wq: cache work->flags in variable

This eliminates several redundant atomic reads and therefore reduces
the duration the surrounding spinlocks are held.
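
The pattern can be sketched in user-space C11 as follows. This is only an
illustration of reading an atomic once and reusing the local copy; the struct,
flag names and helpers are made up for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define WORK_HASHED	(1u << 0)	/* hypothetical flag bits */
#define WORK_CANCEL	(1u << 1)

struct work {
	atomic_uint flags;
};

/* Two atomic loads for what is logically one decision. */
static bool runnable_hashed_reread(struct work *w)
{
	assert(w != NULL);
	return (atomic_load(&w->flags) & WORK_HASHED) &&
	       !(atomic_load(&w->flags) & WORK_CANCEL);
}

/* One atomic load; every later test uses the cached local value. */
static bool runnable_hashed_cached(struct work *w)
{
	unsigned int flags;

	assert(w != NULL);
	flags = atomic_load(&w->flags);
	return (flags & WORK_HASHED) && !(flags & WORK_CANCEL);
}
```

Besides saving loads, the cached variant makes all checks see the same
snapshot of the flags.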

In several io_uring benchmarks, this reduced the CPU time spent in
queued_spin_lock_slowpath() considerably:

io_uring benchmark with a flood of `IORING_OP_NOP` and `IOSQE_ASYNC`:

    38.86%     -1.49%  [kernel.kallsyms]  [k] queued_spin_lock_slowpath
     6.75%     +0.36%  [kernel.kallsyms]  [k] io_worker_handle_work
     2.60%     +0.19%  [kernel.kallsyms]  [k] io_nop
     3.92%     +0.18%  [kernel.kallsyms]  [k] io_req_task_complete
     6.34%     -0.18%  [kernel.kallsyms]  [k] io_wq_submit_work

HTTP server, static file:

    42.79%     -2.77%  [kernel.kallsyms]     [k] queued_spin_lock_slowpath
     2.08%     +0.23%  [kernel.kallsyms]     [k] io_wq_submit_work
     1.19%     +0.20%  [kernel.kallsyms]     [k] amd_iommu_iotlb_sync_map
     1.46%     +0.15%  [kernel.kallsyms]     [k] ep_poll_callback
     1.80%     +0.15%  [kernel.kallsyms]     [k] io_worker_handle_work

HTTP server, PHP:

    35.03%     -1.80%  [kernel.kallsyms]     [k] queued_spin_lock_slowpath
     0.84%     +0.21%  [kernel.kallsyms]     [k] amd_iommu_iotlb_sync_map
     1.39%     +0.12%  [kernel.kallsyms]     [k] _copy_to_iter
     0.21%     +0.10%  [kernel.kallsyms]     [k] update_sd_lb_stats

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/r/20250128133927.3989681-5-max.kellermann@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Max Kellermann [Tue, 28 Jan 2025 13:39:22 +0000 (14:39 +0100)]

io_uring/io-wq: move worker lists to struct io_wq_acct

Have separate linked lists for bounded and unbounded workers.  This
way, io_acct_activate_free_worker() sees only workers relevant to it
and doesn't need to skip irrelevant ones.  This speeds up the
linked list traversal (under acct->lock).

The `io_wq.lock` field is moved to `io_wq_acct.workers_lock`.  It did
not actually protect "access to elements below", that is, not all of
them; it only protected access to the worker lists.  By having two
locks instead of one, contention on this lock is reduced.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/r/20250128133927.3989681-4-max.kellermann@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Max Kellermann [Tue, 28 Jan 2025 13:39:21 +0000 (14:39 +0100)]

io_uring/io-wq: add io_worker.acct pointer

This replaces the `IO_WORKER_F_BOUND` flag.  All code that checks this
flag is not interested in knowing whether this is a "bound" worker;
all it does with this flag is determine the `io_wq_acct` pointer.  At
the cost of an extra pointer field, we can eliminate some fragile
pointer arithmetic.  In turn, the `create_index` and `index` fields
are not needed anymore.
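
A simplified sketch of the trade-off, using hypothetical types rather than the
real io-wq structures: the old style re-derives the accounting structure from
a flag on every use, while the new style stores the pointer once at creation.

```c
#include <assert.h>
#include <stddef.h>

enum { ACCT_BOUND, ACCT_UNBOUND, ACCT_NR };

struct wq_acct {
	int nr_workers;
};

struct wq {
	struct wq_acct acct[ACCT_NR];
};

#define WORKER_F_BOUND	(1u << 0)	/* hypothetical flag */

/* Old style: the flag exists only to select the accounting slot. */
struct worker_old {
	struct wq *wq;
	unsigned int flags;
};

static struct wq_acct *worker_old_acct(const struct worker_old *w)
{
	int idx = (w->flags & WORKER_F_BOUND) ? ACCT_BOUND : ACCT_UNBOUND;

	return &w->wq->acct[idx];
}

/* New style: resolve the slot once and keep the pointer in the worker. */
struct worker_new {
	struct wq *wq;
	struct wq_acct *acct;
};

static void worker_new_init(struct worker_new *w, struct wq *wq, int idx)
{
	assert(idx >= 0 && idx < ACCT_NR);
	w->wq = wq;
	w->acct = &wq->acct[idx];
}
```

Callers of the new form simply use w->acct and never need to know whether the
worker is bound.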

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/r/20250128133927.3989681-3-max.kellermann@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Max Kellermann [Tue, 28 Jan 2025 13:39:20 +0000 (14:39 +0100)]

io_uring/io-wq: eliminate redundant io_work_get_acct() calls

Instead of calling io_work_get_acct() again, pass acct to
io_wq_insert_work() and io_wq_remove_pending().

This atomic access in io_work_get_acct() was done under the
`acct->lock`, and optimizing it away reduces lock contention a bit.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/r/20250128133927.3989681-2-max.kellermann@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Linus Torvalds [Sun, 16 Feb 2025 22:02:44 +0000 (14:02 -0800)]

Linux 6.14-rc3

commit | commitdiff | tree

Linus Torvalds [Sun, 16 Feb 2025 20:58:51 +0000 (12:58 -0800)]

Merge tag 'kbuild-fixes-v6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

- Fix annoying logs when building tools in parallel

- Fix the Debian linux-headers package build again

- Fix the target triple detection for userspace programs on Clang

* tag 'kbuild-fixes-v6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  modpost: Fix a few typos in a comment
  kbuild: userprogs: fix bitsize and target detection on clang
  kbuild: fix linux-headers package build when $(CC) cannot link userspace
  tools: fix annoying "mkdir -p ..." logs when building tools in parallel

commit | commitdiff | tree

Linus Torvalds [Sun, 16 Feb 2025 20:54:42 +0000 (12:54 -0800)]

Merge tag 'driver-core-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core api addition from Greg KH:
"Here is a new driver core api for 6.14-rc3 that is being added to
  allow platform devices to stop being abused.

  It adds a new 'faux_device' structure and bus and api to allow almost
  a straight or simpler conversion from platform devices that were not
  really a platform device. It also comes with a binding for rust, with
  an example driver in rust showing how it's used.

  I'm adding this now so that the patches that convert the different
  drivers and subsystems can all start flowing into linux-next now
  through their different development trees, in time for 6.15-rc1.

  We have a number that are already reviewed and tested, but adding
  those conversions now doesn't seem right. For now, no one is using
  this, and it passes all build tests from 0-day and linux-next, so all
  should be good"

* tag 'driver-core-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  rust/kernel: Add faux device bindings
  driver core: add a faux bus for use when a simple device/bus is needed

commit | commitdiff | tree

Linus Torvalds [Sun, 16 Feb 2025 20:50:44 +0000 (12:50 -0800)]

Merge tag 'tty-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull serial driver fixes from Greg KH:
"Here are some small serial driver fixes for some reported problems.
  Nothing major, just:

   - sc16is7xx irq check fix

   - 8250 fifo underflow fix

   - serial_port and 8250 iotype fixes

  Most of these have been in linux-next already, and all have passed
  0-day testing"

* tag 'tty-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: 8250: Fix fifo underflow on flush
  serial: 8250_pnp: Remove unneeded ->iotype assignment
  serial: 8250_platform: Remove unneeded ->iotype assignment
  serial: 8250_of: Remove unneeded ->iotype assignment
  serial: port: Make ->iotype validation global in __uart_read_properties()
  serial: port: Always update ->iotype in __uart_read_properties()
  serial: port: Assign ->iotype correctly when ->iobase is set
  serial: sc16is7xx: Fix IRQ number check behavior

commit | commitdiff | tree

Linus Torvalds [Sun, 16 Feb 2025 19:15:50 +0000 (11:15 -0800)]

Merge tag 'usb-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are some small USB driver fixes, and new device ids, for
  6.14-rc3. Lots of tiny stuff for reported problems, including:

   - new device ids and quirks

   - usb hub crash fix found by syzbot

   - dwc2 driver fix

   - dwc3 driver fixes

   - uvc gadget driver fix

   - cdc-acm driver fixes for a variety of different issues

   - other tiny bugfixes

  Almost all of these have been in linux-next this week, and all have
  passed 0-day testing"

* tag 'usb-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (25 commits)
  usb: typec: tcpm: PSSourceOffTimer timeout in PR_Swap enters ERROR_RECOVERY
  usb: roles: set switch registered flag early on
  usb: gadget: uvc: Fix unstarted kthread worker
  USB: quirks: add USB_QUIRK_NO_LPM quirk for Teclast dist
  usb: gadget: core: flush gadget workqueue after device removal
  USB: gadget: f_midi: f_midi_complete to call queue_work
  usb: core: fix pipe creation for get_bMaxPacketSize0
  usb: dwc3: Fix timeout issue during controller enter/exit from halt state
  USB: Add USB_QUIRK_NO_LPM quirk for sony xperia xz1 smartphone
  USB: cdc-acm: Fill in Renesas R-Car D3 USB Download mode quirk
  usb: cdc-acm: Fix handling of oversized fragments
  usb: cdc-acm: Check control transfer buffer size before access
  usb: xhci: Restore xhci_pci support for Renesas HCs
  USB: pci-quirks: Fix HCCPARAMS register error for LS7A EHCI
  USB: serial: option: drop MeiG Smart defines
  USB: serial: option: fix Telit Cinterion FN990A name
  USB: serial: option: add Telit Cinterion FN990B compositions
  USB: serial: option: add MeiG Smart SLM828
  usb: gadget: f_midi: fix MIDI Streaming descriptor lengths
  usb: dwc2: gadget: remove of_node reference upon udc_stop
  ...

commit | commitdiff | tree

Linus Torvalds [Sun, 16 Feb 2025 18:55:17 +0000 (10:55 -0800)]

Merge tag 'irq_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq Kconfig cleanup from Borislav Petkov:

- Remove an unused config item GENERIC_PENDING_IRQ_CHIPFLAGS

* tag 'irq_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Remove unused CONFIG_GENERIC_PENDING_IRQ_CHIPFLAGS

commit | commitdiff | tree

Linus Torvalds [Sun, 16 Feb 2025 18:41:50 +0000 (10:41 -0800)]

Merge tag 'perf_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 perf fixes from Borislav Petkov:

- Explicitly clear DEBUGCTL.LBR to prevent LBRs from remaining
   enabled after handoff to the OS

- Check CPUID(0x23) leaf and subleafs presence properly

- Remove the PEBS-via-PT feature from being supported on hybrid systems

- Fix perf record/top default commands on systems without a raw PMU
   registered

* tag 'perf_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Ensure LBRs are disabled when a CPU is starting
  perf/x86/intel: Fix ARCH_PERFMON_NUM_COUNTER_LEAF
  perf/x86/intel: Clean up PEBS-via-PT on hybrid
  perf/x86/rapl: Fix the error checking order

commit | commitdiff | tree

Linus Torvalds [Sun, 16 Feb 2025 18:38:24 +0000 (10:38 -0800)]

Merge tag 'sched_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Borislav Petkov:

- Clarify what happens when a task is woken up from the wake queue and
   make clear its removal from that queue is atomic

* tag 'sched_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Clarify wake_up_q()'s write to task->wake_q.next

commit | commitdiff | tree

Linus Torvalds [Sun, 16 Feb 2025 18:30:58 +0000 (10:30 -0800)]

Merge tag 'objtool_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fixes from Borislav Petkov:

- Move a warning about an ld.lld breakage into the verbose setting as
   said breakage has been fixed in the meantime

- Teach objtool to ignore dangling jump table entries added by Clang

* tag 'objtool_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Move dodgy linker warn to verbose
  objtool: Ignore dangling jump table entries

commit | commitdiff | tree

Linus Torvalds [Sun, 16 Feb 2025 18:25:12 +0000 (10:25 -0800)]

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"ARM:

   - Large set of fixes for vector handling, especially in the
     interactions between host and guest state.

     This fixes a number of bugs affecting actual deployments, and
     greatly simplifies the FP/SIMD/SVE handling. Thanks to Mark Rutland
     for dealing with this thankless task.

   - Fix an ugly race between vcpu and vgic creation/init, resulting in
     unexpected behaviours

   - Fix use of kernel VAs at EL2 when emulating timers with nVHE

   - Small set of pKVM improvements and cleanups

  x86:

   - Fix broken SNP support with KVM module built-in, ensuring the PSP
     module is initialized before KVM even when the module
     infrastructure cannot be used to order initcalls

   - Reject Hyper-V SEND_IPI hypercalls if the local APIC isn't being
     emulated by KVM to fix a NULL pointer dereference

   - Enter guest mode (L2) from KVM's perspective before initializing
     the vCPU's nested NPT MMU so that the MMU is properly tagged for
     L2, not L1

   - Load the guest's DR6 outside of the innermost .vcpu_run() loop, as
     the guest's value may be stale if a VM-Exit is handled in the
     fastpath"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
  x86/sev: Fix broken SNP support with KVM module built-in
  KVM: SVM: Ensure PSP module is initialized if KVM module is built-in
  crypto: ccp: Add external API interface for PSP module initialization
  KVM: arm64: vgic: Hoist SGI/PPI alloc from vgic_init() to kvm_create_vgic()
  KVM: arm64: timer: Drop warning on failed interrupt signalling
  KVM: arm64: Fix alignment of kvm_hyp_memcache allocations
  KVM: arm64: Convert timer offset VA when accessed in HYP code
  KVM: arm64: Simplify warning in kvm_arch_vcpu_load_fp()
  KVM: arm64: Eagerly switch ZCR_EL{1,2}
  KVM: arm64: Mark some header functions as inline
  KVM: arm64: Refactor exit handlers
  KVM: arm64: Refactor CPTR trap deactivation
  KVM: arm64: Remove VHE host restore of CPACR_EL1.SMEN
  KVM: arm64: Remove VHE host restore of CPACR_EL1.ZEN
  KVM: arm64: Remove host FPSIMD saving for non-protected KVM
  KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state
  KVM: x86: Load DR6 with guest value only before entering .vcpu_run() loop
  KVM: nSVM: Enter guest mode before initializing nested NPT MMU
  KVM: selftests: Add CPUID tests for Hyper-V features that need in-kernel APIC
  KVM: selftests: Manage CPUID array in Hyper-V CPUID test's core helper
  ...

commit | commitdiff | tree

Linus Torvalds [Sun, 16 Feb 2025 18:19:41 +0000 (10:19 -0800)]

Merge tag 'mips-fixes_6.14_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Thomas Bogendoerfer:
"Fix for o32 ptrace/get_syscall_info"

* tag 'mips-fixes_6.14_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: fix mips_get_syscall_arg() for o32
  MIPS: Export syscall stack arguments properly for remote use

commit | commitdiff | tree

Linus Torvalds [Sun, 16 Feb 2025 01:20:39 +0000 (17:20 -0800)]

Merge tag 'devicetree-fixes-for-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

- Add bindings for QCom QCS8300 clocks, QCom SAR2130P qfprom, and
   powertip,{st7272|hx8238a} displays

- Fix compatible for TI am62a7 dss

- Add a kunit test for __of_address_resource_bounds()

* tag 'devicetree-fixes-for-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: display: Add powertip,{st7272|hx8238a} as DT Schema description
  dt-bindings: nvmem: qcom,qfprom: Add SAR2130P compatible
  dt-bindings: display: ti: Fix compatible for am62a7 dss
  of: address: Add kunit test for __of_address_resource_bounds()
  dt-bindings: clock: qcom: Add QCS8300 video clock controller
  dt-bindings: clock: qcom: Add CAMCC clocks for QCS8300
  dt-bindings: clock: qcom: Add GPU clocks for QCS8300

commit | commitdiff | tree

Linus Torvalds [Sun, 16 Feb 2025 01:14:53 +0000 (17:14 -0800)]

Merge tag 'uml-for-linus-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux

Pull UML fixes from Richard Weinberger:

- Align signal stack correctly

- Convert to raw spinlocks where needed (irq and virtio)

- FPU related fixes

* tag 'uml-for-linus-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
  um: convert irq_lock to raw spinlock
  um: virtio_uml: use raw spinlock
  um: virt-pci: don't use kmalloc()
  um: fix execve stub execution on old host OSs
  um: properly align signal stack on x86_64
  um: avoid copying FP state from init_task
  um: add back support for FXSAVE registers

commit | commitdiff | tree

Linus Torvalds [Sun, 16 Feb 2025 00:34:41 +0000 (16:34 -0800)]

Merge tag 'trace-ring-buffer-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull trace ring buffer fixes from Steven Rostedt:

- Enable resize on mmap() error

   When a process mmaps a ring buffer, its size is locked and resizing
   is disabled. But if the user passes in a wrong parameter, the mmap()
   can fail after the resize was disabled and the mmap() exits with
   error without reenabling the ring buffer resize. This prevents the
   ring buffer from ever being resized after that. Reenable resizing of
   the ring buffer on mmap() error.

- Have resizing return proper error and not always -ENOMEM

   If the ring buffer is mmapped by one task and another task tries to
   resize the buffer it will error with -ENOMEM. This is confusing to
   the user as there may be plenty of memory available. Have it return
   the error that actually happens (in this case -EBUSY) where the user
   can understand why the resize failed.

- Test the sub-buffer array to validate persistent memory buffer

   On boot up, the initialization of the persistent memory buffer will
   do a validation check to see if the content of the data is valid, and
   if so, it will use the memory as is, otherwise it re-initializes it.
   There's meta data in this persistent memory that keeps track of which
   sub-buffer is the reader page and an array that states the order of
   the sub-buffers. The values in this array are indexes into the
   sub-buffers. The validator checks to make sure that all the entries
   in the array are within the sub-buffer list index, but it does not
   check for duplications.

   While working on this code, the array got corrupted and had
   duplicates, where not all the sub-buffers were accounted for. This
   passed the validator as all entries were valid, but the link list was
   incorrect and could have caused a crash. The corruption only produced
   incorrect data, but it could have been more severe. To fix this,
   create a bitmask that covers all the sub-buffer indexes and set it to
   all zeros. While iterating the array checking the values of the array
   content, have it set a bit corresponding to the index in the array.
   If the bit was already set, then it is a duplicate and mark the
   buffer as invalid and reset it.

- Prevent mmap()ing persistent ring buffer

   The persistent ring buffer uses vmap() to map the persistent memory.
   Currently, the mmap() logic only uses virt_to_page() to get the page
   from the ring buffer memory and use that to map to user space. This
   works because a normal ring buffer uses alloc_page() to allocate its
   memory. But because the persistent ring buffer uses vmap() it causes a
   kernel crash.

   Fixing this to work with vmap() is not hard, but since mmap() on
   persistent memory buffers never worked, just have the mmap() return
   -ENODEV (what is returned when mmap() is not supported at all), as
   persistent memory ring buffers never supported mmap(). Normal buffers
   will still allow mmap(). Implementing mmap() for persistent memory
   ring buffers can wait till the next merge window.

- Fix polling on persistent ring buffers

   There's a "buffer_percent" option (default set to 50), that is used
   to have reads of the ring buffer binary data block until the buffer
   fills to that percentage. The field "pages_touched" is incremented
   every time a new sub-buffer has content added to it. This field is
   used in the calculations to determine the amount of content that is in
   the buffer, and if that exceeds the "buffer_percent" it will wake the
   task polling on the buffer.

   As persistent ring buffers can be created by the content from a
   previous boot, the "pages_touched" field was not updated. This means
   that if a task were to poll on the persistent buffer, it would block
   even if the buffer was completely full. It would block even if the
   "buffer_percent" was zero, because with "pages_touched" as zero, it
   would be calculated as the buffer having no content. Update
   pages_touched when initializing the persistent ring buffer from a
   previous boot.
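
The duplicate check described in the sub-buffer array item above can be
sketched in plain C as follows. This is an illustration only, not the ring
buffer code: the function name subbuf_array_valid() and the use of a
heap-allocated bitmask of 64-bit words are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Returns true only if every entry of order[] is a valid sub-buffer index
 * and no index appears twice.
 */
static bool subbuf_array_valid(const int *order, size_t nr_subbufs)
{
	size_t words = nr_subbufs / 64 + 1;	/* always at least one word */
	uint64_t *seen;
	bool valid = true;
	size_t i;

	assert(order != NULL);

	seen = calloc(words, sizeof(*seen));	/* bitmask starts all zero */
	if (!seen)
		return false;

	for (i = 0; i < nr_subbufs; i++) {
		int idx = order[i];
		uint64_t bit;

		/* Range check: the part the validator already did. */
		if (idx < 0 || (size_t)idx >= nr_subbufs) {
			valid = false;
			break;
		}

		/* Duplicate check: the bit is already set on a repeat. */
		bit = UINT64_C(1) << (idx % 64);
		if (seen[idx / 64] & bit) {
			valid = false;
			break;
		}
		seen[idx / 64] |= bit;
	}

	free(seen);
	return valid;
}
```

Because the array has exactly as many entries as there are sub-buffers, having
no duplicates and no out-of-range values also implies that every sub-buffer is
accounted for.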

* tag 'trace-ring-buffer-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Update pages_touched to reflect persistent buffer content
  tracing: Do not allow mmap() of persistent ring buffer
  ring-buffer: Validate the persistent meta data subbuf array
  tracing: Have the error of __tracing_resize_ring_buffer() passed to user
  ring-buffer: Unlock resize on mmap error

commit | commitdiff | tree

Steven Rostedt [Fri, 14 Feb 2025 17:35:12 +0000 (12:35 -0500)]

ring-buffer: Update pages_touched to reflect persistent buffer content

The pages_touched field represents the number of subbuffers in the ring
buffer that have content that can be read. This is used in accounting of
"dirty_pages" and "buffer_percent" to allow the user to wait for the
buffer to be filled to a certain amount before it reads the buffer in
blocking mode.

The persistent buffer never updated this value so it was set to zero, and
this accounting would treat it as if it had no content. This would cause user
space to wait for content even though there's enough content in the ring
buffer that satisfies the buffer_percent.
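
To make the effect concrete, here is a deliberately simplified model of the
wake-up decision. It assumes the reader is woken once the touched sub-buffers
reach buffer_percent of the total; the real accounting in the ring buffer is
more involved, and reader_should_wake() is a made-up name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified assumption: wake the blocked reader once the number of
 * sub-buffers with content reaches buffer_percent of all sub-buffers.
 */
static bool reader_should_wake(size_t pages_touched, size_t nr_pages,
			       unsigned int buffer_percent)
{
	assert(buffer_percent <= 100);

	if (nr_pages == 0)
		return true;
	return pages_touched * 100 >= (size_t)buffer_percent * nr_pages;
}
```

In this model a persistent buffer whose eight sub-buffers are all full, but
whose pages_touched was left at zero, gives reader_should_wake(0, 8, 50) ==
false, so the reader keeps waiting. With the count restored to 8 the same call
returns true.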

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/20250214123512.0631436e@gandalf.local.home
Fixes: 5f3b6e839f3ce ("ring-buffer: Validate boot range memory events")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

commit | commitdiff | tree

Steven Rostedt [Fri, 14 Feb 2025 16:55:47 +0000 (11:55 -0500)]

tracing: Do not allow mmap() of persistent ring buffer

When trying to mmap a trace instance buffer that is attached to
reserve_mem, it would crash:

BUG: unable to handle page fault for address: ffffe97bd00025c8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 2862f3067 P4D 2862f3067 PUD 0
Oops: Oops: 0000 [#1] PREEMPT_RT SMP PTI
CPU: 4 UID: 0 PID: 981 Comm: mmap-rb Not tainted 6.14.0-rc2-test-00003-g7f1a5e3fbf9e-dirty #233
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:validate_page_before_insert+0x5/0xb0
Code: e2 01 89 d0 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 <48> 8b 46 08 a8 01 75 67 66 90 48 89 f0 8b 50 34 85 d2 74 76 48 89
RSP: 0018:ffffb148c2f3f968 EFLAGS: 00010246
RAX: ffff9fa5d3322000 RBX: ffff9fa5ccff9c08 RCX: 00000000b879ed29
RDX: ffffe97bd00025c0 RSI: ffffe97bd00025c0 RDI: ffff9fa5ccff9c08
RBP: ffffb148c2f3f9f0 R08: 0000000000000004 R09: 0000000000000004
R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
R13: 00007f16a18d5000 R14: ffff9fa5c48db6a8 R15: 0000000000000000
FS:  00007f16a1b54740(0000) GS:ffff9fa73df00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffe97bd00025c8 CR3: 00000001048c6006 CR4: 0000000000172ef0
Call Trace:
  <TASK>
  ? __die_body.cold+0x19/0x1f
  ? __die+0x2e/0x40
  ? page_fault_oops+0x157/0x2b0
  ? search_module_extables+0x53/0x80
  ? validate_page_before_insert+0x5/0xb0
  ? kernelmode_fixup_or_oops.isra.0+0x5f/0x70
  ? __bad_area_nosemaphore+0x16e/0x1b0
  ? bad_area_nosemaphore+0x16/0x20
  ? do_kern_addr_fault+0x77/0x90
  ? exc_page_fault+0x22b/0x230
  ? asm_exc_page_fault+0x2b/0x30
  ? validate_page_before_insert+0x5/0xb0
  ? vm_insert_pages+0x151/0x400
  __rb_map_vma+0x21f/0x3f0
  ring_buffer_map+0x21b/0x2f0
  tracing_buffers_mmap+0x70/0xd0
  __mmap_region+0x6f0/0xbd0
  mmap_region+0x7f/0x130
  do_mmap+0x475/0x610
  vm_mmap_pgoff+0xf2/0x1d0
  ksys_mmap_pgoff+0x166/0x200
  __x64_sys_mmap+0x37/0x50
  x64_sys_call+0x1670/0x1d70
  do_syscall_64+0xbb/0x1d0
  entry_SYSCALL_64_after_hwframe+0x77/0x7f

The reason was that the code that maps the ring buffer pages to user space
has:

page = virt_to_page((void *)cpu_buffer->subbuf_ids[s]);

And uses that in:

vm_insert_pages(vma, vma->vm_start, pages, &nr_pages);

But virt_to_page() does not work with vmap()'d memory which is what the
persistent ring buffer has. It is rather trivial to allow this, but for
now just disable mmap() of instances that have their ring buffer from the
reserve_mem option.

If an mmap() is performed on a persistent buffer it will return -ENODEV
just like it would if the .mmap field wasn't defined in the
file_operations structure.
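
From user space, this means a tool that maps the buffer may want to treat
ENODEV as "mapping not supported here" and fall back to another way of
reading. A minimal sketch, where the enum and try_map_buffer() are
hypothetical names and only mmap() itself is a real API:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

enum map_result {
	MAP_RESULT_OK,		/* mapping succeeded, *out is valid */
	MAP_RESULT_UNSUPPORTED,	/* ENODEV: this file cannot be mmap()ed */
	MAP_RESULT_ERROR,	/* any other failure, errno is preserved */
};

/* Try to map len bytes of fd read-only and classify the outcome. */
static enum map_result try_map_buffer(int fd, size_t len, void **out)
{
	void *p;

	assert(out != NULL && len > 0);

	p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (p != MAP_FAILED) {
		*out = p;
		return MAP_RESULT_OK;
	}

	*out = NULL;
	return errno == ENODEV ? MAP_RESULT_UNSUPPORTED : MAP_RESULT_ERROR;
}
```

A caller that gets MAP_RESULT_UNSUPPORTED could, for instance, fall back to
read() on the same file descriptor instead of treating the failure as fatal.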

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/20250214115547.0d7287d3@gandalf.local.home
Fixes: 9b7bdf6f6ece6 ("tracing: Have trace_printk not use binary prints if boot buffer")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

commit | commitdiff | tree

Linus Torvalds [Sat, 15 Feb 2025 18:20:47 +0000 (10:20 -0800)]

Merge tag 'i2c-for-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"MAINTAINERS maintenance.

  Changed email, added entry, deleted entry falling back to a generic
  one"

* tag 'i2c-for-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  MAINTAINERS: Add maintainer for Qualcomm's I2C GENI driver
  MAINTAINERS: delete entry for AXXIA I2C
  MAINTAINERS: Use my kernel.org address for I2C ACPI work

Linus Torvalds [Sat, 15 Feb 2025 18:15:24 +0000 (10:15 -0800)]

Merge tag 's390-6.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:

- Fix isolated VFs handling by verifying that a VF’s parent PF is
   locally owned before registering it in an existing PCI domain

- Disable arch_test_bit() optimization for PROFILE_ALL_BRANCHES to
   work around a gcc failure in handling __builtin_constant_p() in this
   case

- Fix CHPID "configure" attribute caching in CIO by not updating the
   cache when SCLP returns no data, ensuring consistent sysfs output

- Remove CONFIG_LSM from default configs and rely on defaults, which
   enables BPF LSM hook

* tag 's390-6.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/pci: Fix handling of isolated VFs
  s390/pci: Pull search for parent PF out of zpci_iov_setup_virtfn()
  s390/bitops: Disable arch_test_bit() optimization for PROFILE_ALL_BRANCHES
  s390/cio: Fix CHPID "configure" attribute caching
  s390/configs: Remove CONFIG_LSM

Uwe Kleine-König [Thu, 13 Feb 2025 16:04:29 +0000 (17:04 +0100)]

modpost: Fix a few typos in a comment

Namely: s/becasue/because/ and s/wiht/with/ plus an added article.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

Thomas Weißschuh [Thu, 13 Feb 2025 14:55:17 +0000 (15:55 +0100)]

kbuild: userprogs: fix bitsize and target detection on clang

scripts/Makefile.clang was changed in the linked commit to move --target from
KBUILD_CFLAGS to KBUILD_CPPFLAGS, as that generally has a broader scope.
However that variable is not inspected by the userprogs logic,
breaking cross compilation on clang.

Use both variables to detect bitsize and target arguments for userprogs.

Fixes: feb843a469fb ("kbuild: add $(CLANG_FLAGS) to KBUILD_CPPFLAGS")
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

Linus Torvalds [Sat, 15 Feb 2025 17:54:46 +0000 (09:54 -0800)]

Merge tag 'rust-fixes-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux

Pull rust fixes from Miguel Ojeda:

- Fix objtool warning due to future Rust 1.85.0 (to be released in a
   few days)

- Clean future Rust 1.86.0 (to be released 2025-04-03) Clippy warning

* tag 'rust-fixes-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
  rust: rbtree: fix overindented list item
  objtool/rust: add one more `noreturn` Rust function

Linus Torvalds [Sat, 15 Feb 2025 17:28:55 +0000 (09:28 -0800)]

tegra210-adma: fix 32-bit x86 build

The Tegra210 Audio DMA controller driver did a plain divide:

page_no = (res_page->start - res_base->start) / cdata->ch_base_offset;

which causes problems on 32-bit x86 configurations that have 64-bit
resource sizes:

  x86_64-linux-ld: drivers/dma/tegra210-adma.o: in function `tegra_adma_probe':
  tegra210-adma.c:(.text+0x1322): undefined reference to `__udivdi3'

because gcc doesn't generate the trivial code for a 64-by-32 divide,
turning it into a function call to do a full 64-by-64 divide.  And the
kernel intentionally doesn't provide that helper function, because 99%
of the time all you want is the narrower version.

Of course, tegra210 is a 64-bit architecture and the 32-bit x86 build is
purely for build testing, so this really is just about build coverage
failure.

But build coverage is good.

Side note: div_u64() would be suboptimal if you actually have a 32-bit
resource_size_t, so our "helpers" for divides are admittedly making it
harder than it should be to generate good code for all the possible cases.

At some point, I'll consider 32-bit x86 so entirely legacy that I can't
find it in myself to care any more, and we'll just add the __udivdi3
library function.

But for now, the right thing to do is to use "div_u64()" to show that
you know that you are doing the simpler divide with a 32-bit number.
And the build error enforces that.

While fixing the build issue, also check for division-by-zero, and for
overflow.  Which hopefully cannot happen on real production hardware,
but the value of 'ch_base_offset' can definitely be zero in other
places.
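
As a rough user-space illustration of the pattern described above (a 64-by-32 divide guarded against a zero divisor and an oversized result), here is a sketch. div_u64_sketch() only mimics the signature of the kernel's div_u64(), and page_no_or_error() with its parameters is a hypothetical helper, not the driver's actual code. The "page below base" check is an extra assumption of this sketch.

```c
#include <limits.h>
#include <stdint.h>

/* User-space stand-in for the kernel's div_u64(): u64 dividend, u32 divisor. */
static inline uint64_t div_u64_sketch(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;
}

/*
 * Hypothetical helper (not the driver's code): derive a page number from
 * two resource start addresses and a per-channel offset.
 * Returns the page number, or -1 if the offset is zero, the page lies
 * below the base, or the quotient does not fit in an unsigned int.
 */
static inline int64_t page_no_or_error(uint64_t page_start, uint64_t base_start,
				       uint32_t ch_base_offset)
{
	uint64_t q;

	if (ch_base_offset == 0)
		return -1;	/* would divide by zero */
	if (page_start < base_start)
		return -1;	/* subtraction would wrap (sketch assumption) */

	q = div_u64_sketch(page_start - base_start, ch_base_offset);
	if (q > UINT_MAX)
		return -1;	/* result would overflow the page number */

	return (int64_t)q;
}
```

Taking the divisor as a 32-bit value is what lets the compiler use the narrower divide on a 32-bit build instead of calling __udivdi3.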

Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Linus Torvalds [Sat, 15 Feb 2025 16:13:45 +0000 (08:13 -0800)]

Merge tag 'gpio-fixes-for-v6.14-rc3-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

- fix interrupt handling issues in gpio-bcm-kona

- add an ACPI quirk for Acer Nitro ANV14 fixing an issue with spurious
   wake up events

- add missing return value checks to gpio-stmpe

- fix a crash in error path in gpiochip_get_ngpios()

* tag 'gpio-fixes-for-v6.14-rc3-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpiolib: Fix crash on error in gpiochip_get_ngpios()
  gpio: stmpe: Check return value of stmpe_reg_read in stmpe_gpio_irq_sync_unlock
  gpiolib: acpi: Add a quirk for Acer Nitro ANV14
  gpio: bcm-kona: Add missing newline to dev_err format string
  gpio: bcm-kona: Make sure GPIO bits are unlocked when requesting IRQ
  gpio: bcm-kona: Fix GPIO lock/unlock for banks above bank 0

Pavel Begunkov [Fri, 14 Feb 2025 22:48:15 +0000 (22:48 +0000)]

io_uring: prevent opcode speculation

sqe->opcode is used to index several tables, so make sure we sanitise it
against speculation.
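
For readers unfamiliar with the technique, the following user-space sketch shows the branch-free index clamping that helpers such as the kernel's array_index_nospec() are built on. The opcode table, its size, and all function names here are hypothetical. The sketch only demonstrates the mask arithmetic: it relies on arithmetic right shift of negative values (as gcc and clang implement it) and is not by itself a guaranteed speculation barrier, since a user-space compiler is free to optimize it.

```c
#include <limits.h>

/*
 * Returns ~0UL when index < size and 0 otherwise, without a branch.
 * Assumes size <= LONG_MAX and an arithmetic right shift of negative
 * signed values.
 */
static inline unsigned long index_mask_sketch(unsigned long index,
					      unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/* Clamp an index to 0 when it is out of bounds. */
static inline unsigned long clamp_index_sketch(unsigned long index,
					       unsigned long size)
{
	return index & index_mask_sketch(index, size);
}

#define OP_LAST_SKETCH 4	/* hypothetical number of opcodes */

/* Hypothetical per-opcode table, indexed by opcode. */
static const int op_needs_file_sketch[OP_LAST_SKETCH] = { 0, 1, 1, 0 };

/* Returns the table entry for a valid opcode, or -1 for an invalid one. */
static inline int lookup_opcode_sketch(unsigned int opcode)
{
	if (opcode >= OP_LAST_SKETCH)
		return -1;

	/* Even if the check above is mispredicted, the index stays in range. */
	opcode = (unsigned int)clamp_index_sketch(opcode, OP_LAST_SKETCH);
	return op_needs_file_sketch[opcode];
}
```

The bounds check still rejects bad opcodes architecturally; the mask only ensures that a mispredicted branch cannot read past the end of the table.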

Cc: stable@vger.kernel.org
Fixes: d3656344fea03 ("io_uring: add lookup table for various opcode needs")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Li Zetao <lizetao1@huawei.com>
Link: https://lore.kernel.org/r/7eddbf31c8ca0a3947f8ed98271acc2b4349c016.1739568408.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Masahiro Yamada [Thu, 13 Feb 2025 06:26:44 +0000 (15:26 +0900)]

kbuild: fix linux-headers package build when $(CC) cannot link userspace

Since commit 5f73e7d0386d ("kbuild: refactor cross-compiling
linux-headers package"), the linux-headers Debian package fails to
build when $(CC) cannot build userspace applications, for example,
when using toolchains installed by the 0day bot.

The host programs in the linux-headers package should be rebuilt using
the distro's cross-compiler, ${DEB_HOST_GNU_TYPE}-gcc instead of $(CC).
Hence, the variable 'CC' must be expanded in this shell script instead
of in the top-level Makefile.

Commit f354fc88a72a ("kbuild: install-extmod-build: add missing
quotation marks for CC variable") was not a correct fix because
CC="ccache gcc" should be unrelated when rebuilding userspace tools.

Fixes: 5f73e7d0386d ("kbuild: refactor cross-compiling linux-headers package")
Reported-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Closes: https://lore.kernel.org/linux-kbuild/CAK7LNARb3xO3ptBWOMpwKcyf3=zkfhMey5H2KnB1dOmUwM79dA@mail.gmail.com/T/#t
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>

Masahiro Yamada [Tue, 11 Feb 2025 00:29:06 +0000 (09:29 +0900)]

tools: fix annoying "mkdir -p ..." logs when building tools in parallel

When CONFIG_OBJTOOL=y or CONFIG_DEBUG_INFO_BTF=y, parallel builds
show awkward "mkdir -p ..." logs.

  $ make -j16
    [ snip ]
  mkdir -p /home/masahiro/ref/linux/tools/objtool && make O=/home/masahiro/ref/linux subdir=tools/objtool --no-print-directory -C objtool
  mkdir -p /home/masahiro/ref/linux/tools/bpf/resolve_btfids && make O=/home/masahiro/ref/linux subdir=tools/bpf/resolve_btfids --no-print-directory -C bpf/resolve_btfids

Defining MAKEFLAGS=<value> on the command line wipes out command line
switches from the resultant MAKEFLAGS definition, even though the command
line switches are active. [1]

MAKEFLAGS puts all single-letter options into the first word, and that
word will be empty if no single-letter options were given. [2]
However, this breaks if MAKEFLAGS=<value> is given on the command line.

The tools/ and tools/% targets set MAKEFLAGS=<value> on the command
line, which breaks the following code in tools/scripts/Makefile.include:

    short-opts := $(firstword -$(MAKEFLAGS))

If MAKEFLAGS really needs modification, it should be done through the
environment variable, as follows:

    MAKEFLAGS=<value> $(MAKE) ...

That said, I question whether modifying MAKEFLAGS is necessary here.
The only flag we might want to exclude is --no-print-directory, as the
tools build system changes the working directory. However, people might
find the "Entering/Leaving directory" logs annoying.

I simply removed the offending MAKEFLAGS=<value>.

[1]: https://savannah.gnu.org/bugs/?62469
[2]: https://www.gnu.org/software/make/manual/make.html#Testing-Flags

Fixes: ea01fa9f63ae ("tools: Connect to the kernel build system")
Fixes: a50e43332756 ("perf tools: Honor parallel jobs")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Daniel Xu <dxu@dxuuu.xyz>

Linus Torvalds [Sat, 15 Feb 2025 03:56:12 +0000 (19:56 -0800)]

Merge tag 'alpha-fixes-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha

Pull alpha fixes from Matt Turner:
"A few changes for alpha, including some important fixes for kernel
  stack alignment"

* tag 'alpha-fixes-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
  alpha: Use str_yes_no() helper in pci_dac_dma_supported()
  alpha: Replace one-element array with flexible array member
  alpha: align stack for page fault and user unaligned trap handlers
  alpha: make stack 16-byte aligned (most cases)
  alpha: replace hardcoded stack offsets with autogenerated ones

Linus Torvalds [Sat, 15 Feb 2025 00:49:07 +0000 (16:49 -0800)]

Merge tag 'pci-v6.14-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci fixes from Bjorn Helgaas:

- Update a BUILD_BUG_ON() usage that works on current compilers, but
   breaks compilation on gcc 5.3.1 (Alex Williamson)

- Avoid use of FLR for Mediatek MT7922 WiFi; the device previously
   worked after a long timeout and fallback to SBR, but after a recent
   RRS change it doesn't work at all after FLR (Bjorn Helgaas)

* tag 'pci-v6.14-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI: Avoid FLR for Mediatek MT7922 WiFi
  PCI: Fix BUILD_BUG_ON usage for old gcc

Paolo Bonzini [Sat, 15 Feb 2025 00:08:35 +0000 (19:08 -0500)]

Merge tag 'kvm-x86-fixes-6.14-rcN' of https://github.com/kvm-x86/linux into HEAD

KVM fixes for 6.14 part 1

- Reject Hyper-V SEND_IPI hypercalls if the local APIC isn't being emulated
   by KVM to fix a NULL pointer dereference.

- Enter guest mode (L2) from KVM's perspective before initializing the vCPU's
   nested NPT MMU so that the MMU is properly tagged for L2, not L1.

- Load the guest's DR6 outside of the innermost .vcpu_run() loop, as the
   guest's value may be stale if a VM-Exit is handled in the fastpath.

Ashish Kalra [Mon, 10 Feb 2025 22:54:18 +0000 (22:54 +0000)]

x86/sev: Fix broken SNP support with KVM module built-in

Fix the enabling of SNP host support, and with it SNP support as a
whole, which is broken when the KVM module is built in.

SNP host support is enabled in snp_rmptable_init(), which is invoked as
a device_initcall(). The SNP check on the IOMMU is done during IOMMU PCI
init (the IOMMU_PCI_INIT stage). For that reason snp_rmptable_init() is
currently invoked via device_initcall() and cannot be invoked via
subsys_initcall(), as the core IOMMU subsystem itself gets initialized
via subsys_initcall().

Now, if the kvm_amd module is built in, it gets initialized before SNP
host support is enabled in snp_rmptable_init():

[   10.131811] kvm_amd: TSC scaling supported
[   10.136384] kvm_amd: Nested Virtualization enabled
[   10.141734] kvm_amd: Nested Paging enabled
[   10.146304] kvm_amd: LBR virtualization supported
[   10.151557] kvm_amd: SEV enabled (ASIDs 100 - 509)
[   10.156905] kvm_amd: SEV-ES enabled (ASIDs 1 - 99)
[   10.162256] kvm_amd: SEV-SNP enabled (ASIDs 1 - 99)
[   10.171508] kvm_amd: Virtual VMLOAD VMSAVE supported
[   10.177052] kvm_amd: Virtual GIF supported
...
...
[   10.201648] kvm_amd: in svm_enable_virtualization_cpu

And then svm_x86_ops->enable_virtualization_cpu()
(svm_enable_virtualization_cpu) programs MSR_VM_HSAVE_PA as follows:
wrmsrl(MSR_VM_HSAVE_PA, sd->save_area_pa);

So VM_HSAVE_PA is non-zero before SNP support is enabled on all CPUs.

snp_rmptable_init() gets invoked after svm_enable_virtualization_cpu()
as follows:
...
[   11.256138] kvm_amd: in svm_enable_virtualization_cpu
...
[   11.264918] SEV-SNP: in snp_rmptable_init

This triggers a #GP exception in snp_rmptable_init() when snp_enable()
is invoked to set SNP_EN in SYSCFG MSR:

[   11.294289] unchecked MSR access error: WRMSR to 0xc0010010 (tried to write 0x0000000003fc0000) at rIP: 0xffffffffaf5d5c28 (native_write_msr+0x8/0x30)
...
[   11.294404] Call Trace:
[   11.294482]  <IRQ>
[   11.294513]  ? show_stack_regs+0x26/0x30
[   11.294522]  ? ex_handler_msr+0x10f/0x180
[   11.294529]  ? search_extable+0x2b/0x40
[   11.294538]  ? fixup_exception+0x2dd/0x340
[   11.294542]  ? exc_general_protection+0x14f/0x440
[   11.294550]  ? asm_exc_general_protection+0x2b/0x30
[   11.294557]  ? __pfx_snp_enable+0x10/0x10
[   11.294567]  ? native_write_msr+0x8/0x30
[   11.294570]  ? __snp_enable+0x5d/0x70
[   11.294575]  snp_enable+0x19/0x20
[   11.294578]  __flush_smp_call_function_queue+0x9c/0x3a0
[   11.294586]  generic_smp_call_function_single_interrupt+0x17/0x20
[   11.294589]  __sysvec_call_function+0x20/0x90
[   11.294596]  sysvec_call_function+0x80/0xb0
[   11.294601]  </IRQ>
[   11.294603]  <TASK>
[   11.294605]  asm_sysvec_call_function+0x1f/0x30
...
[   11.294631]  arch_cpu_idle+0xd/0x20
[   11.294633]  default_idle_call+0x34/0xd0
[   11.294636]  do_idle+0x1f1/0x230
[   11.294643]  ? complete+0x71/0x80
[   11.294649]  cpu_startup_entry+0x30/0x40
[   11.294652]  start_secondary+0x12d/0x160
[   11.294655]  common_startup_64+0x13e/0x141
[   11.294662]  </TASK>

This #GP exception is triggered by the following erratum for AMD
family 19h Models 10h-1Fh processors:

Processor may generate spurious #GP(0) Exception on WRMSR instruction:
Description:
The Processor will generate a spurious #GP(0) Exception on a WRMSR
instruction if the following conditions are all met:
- the target of the WRMSR is a SYSCFG register.
- the write changes the value of SYSCFG.SNPEn from 0 to 1.
- One of the threads that share the physical core has a non-zero
value in the VM_HSAVE_PA MSR.

The document being referred to above:
https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/revision-guides/57095-PUB_1_01.pdf
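
The three conditions quoted above can be restated as a small predicate. This is a user-space sketch only: the function and parameter names are hypothetical, the MSR number 0xc0010010 is taken from the log above, and the position of SNPEn at bit 24 of SYSCFG is an assumption of this sketch that should be checked against the AMD documentation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_SYSCFG_SKETCH	0xc0010010u	/* from the WRMSR log above */
#define SYSCFG_SNP_EN_SKETCH	(1ULL << 24)	/* assumed bit position */

/*
 * Returns true when a WRMSR matches all three erratum conditions:
 *  - the target is the SYSCFG register,
 *  - the write flips SNPEn from 0 to 1,
 *  - at least one thread sharing the core has a non-zero VM_HSAVE_PA.
 * hsave_pa[] holds the VM_HSAVE_PA value of each sibling thread.
 */
static inline bool wrmsr_may_spuriously_gp(uint32_t msr, uint64_t old_val,
					   uint64_t new_val,
					   const uint64_t *hsave_pa,
					   size_t nr_threads)
{
	size_t i;

	if (msr != MSR_SYSCFG_SKETCH)
		return false;
	if ((old_val & SYSCFG_SNP_EN_SKETCH) || !(new_val & SYSCFG_SNP_EN_SKETCH))
		return false;

	for (i = 0; i < nr_threads; i++) {
		if (hsave_pa[i] != 0)
			return true;
	}

	return false;
}
```

Read this way, the ordering problem is visible directly: once any sibling thread has programmed VM_HSAVE_PA, the later attempt to set SNPEn satisfies all three conditions.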

To summarize, with the kvm_amd module built in, KVM/SVM initialization
happens before host SNP is enabled. This SVM initialization sets
VM_HSAVE_PA to a non-zero value, which then triggers a #GP when
SYSCFG.SNPEn is being set. That in turn causes SNP_INIT(_EX) to fail
with an INVALID_CONFIG error, as SYSCFG[SnpEn] is not set on all CPUs.

Essentially, the SNP host enabling code should be invoked before KVM
initialization, which is currently not the case when KVM is built in.

Fix this by calling snp_rmptable_init() early and directly from
iommu_snp_enable() instead of via device_initcall(), so that SNP host
support is enabled before KVM initialization when the kvm_amd module is
built in.

Add handling for the `iommu=off` or `amd_iommu=off` options.

Note that IOMMUs need to be enabled for SNP initialization. Therefore,
if host SNP support is enabled but late IOMMU initialization fails, the
PSP driver's SNP_INIT will fail, because the IOMMU SNP sanity checks in
the SNP firmware fail with an invalid configuration error, as shown
below:

[    9.723114] ccp 0000:23:00.1: sev enabled
[    9.727602] ccp 0000:23:00.1: psp enabled
[    9.732527] ccp 0000:a2:00.1: enabling device (0000 -> 0002)
[    9.739098] ccp 0000:a2:00.1: no command queues available
[    9.745167] ccp 0000:a2:00.1: psp enabled
[    9.805337] ccp 0000:23:00.1: SEV-SNP: failed to INIT rc -5, error 0x3
[    9.866426] ccp 0000:23:00.1: SEV API:1.53 build:5

Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature")
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Message-ID: <138b520fb83964782303b43ade4369cd181fdd9c.1739226950.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

A mirror of Linus' kernel repository
