Remove a bunch of unused xdr_*decode* functions:
The last use of xdr_decode_netobj() was removed in 2021 by:
commit 7cf96b6d0104 ("lockd: Update the NLMv4 SHARE arguments decoder to
use struct xdr_stream")
The last use of xdr_decode_string_inplace() was removed in 2021 by:
commit 3049e974a7c7 ("lockd: Update the NLMv4 FREE_ALL arguments decoder
to use struct xdr_stream")
The last use of xdr_stream_decode_opaque() was removed in 2024 by:
commit fed8a17c61ff ("xdrgen: typedefs should use the built-in string and
opaque functions")
The functions xdr_stream_decode_string() and
xdr_stream_decode_opaque_dup() were both added in 2018 by the
commit 0e779aa70308 ("SUNRPC: Add helpers for decoding opaque and string
types")
but never used.
Create a kernel .nfs keyring similar to the nvme .nvme one. Unlike for
a userspace-created keyrind, tlshd is a possesor of the keys with this
and thus the keys don't need user read permissions.
Allow tlshd to use a per-mount key from the kernel keyring similar
to NVMe over TCP.
Note that tlshd expects keys and certificates stored in the kernel
keyring to be in DER format, not the PEM format used for file based keys
and certificates, so they need to be converted before they are added
to the keyring, which is a bit unexpected.
Jeff Layton [Wed, 18 Jun 2025 13:19:12 +0000 (09:19 -0400)]
nfs: add cache_validity to the nfs_inode_event tracepoints
Managing the cache_validity flags is the deep voodoo of NFS cache
coherency. Let's have a little extra visibility into that value via the
nfs_inode_event tracepoints.
NFS: remove unused pnfs_ld_data field from struct nfs_server
The last code that was using this was removed via commit 20d655d6197d
("pnfs/blocklayout: use the device id cache") which was merged in
v3.18-rc1, so it can be removed completely.
NFS: remove unused time_delta field from struct nfs_server
The last code that was using this was removed via commit ca0daa277aca
("NFS: Cache aggressively when file is open for writing") which was
merged in v4.8-rc1, so it can be removed completely.
NFS: remove unused wpages field from struct nfs_server
The wpages field is not serving any purpose since commit c63c7b051395
("NFS: Fix a race when doing NFS write coalescing") which was merged in
v2.6.22-rc1. Remove it completely.
Zheyun Shen [Thu, 22 May 2025 23:37:32 +0000 (16:37 -0700)]
KVM: SVM: Flush cache only on CPUs running SEV guest
On AMD CPUs without ensuring cache consistency, each memory page
reclamation in an SEV guest triggers a call to do WBNOINVD/WBINVD on all
CPUs, thereby affecting the performance of other programs on the host.
Typically, an AMD server may have 128 cores or more, while the SEV guest
might only utilize 8 of these cores. Meanwhile, host can use qemu-affinity
to bind these 8 vCPUs to specific physical CPUs.
Therefore, keeping a record of the physical core numbers each time a vCPU
runs can help avoid flushing the cache for all CPUs every time.
Take care to allocate the cpumask used to track which CPUs have run a
vCPU when copying or moving an "encryption context", as nothing guarantees
memory in a mirror VM is a strict subset of the ASID owner, and the
destination VM for intrahost migration needs to maintain it's own set of
CPUs. E.g. for intrahost migration, if a CPU was used for the source VM
but not the destination VM, then it can only have cached memory that was
accessible to the source VM. And a CPU that was run in the source is also
used by the destination is no different than a CPU that was run in the
destination only.
Note, KVM is guaranteed to do flush caches prior to sev_vm_destroy(),
thanks to kvm_arch_guest_memory_reclaimed for SEV and SEV-ES, and
kvm_arch_gmem_invalidate() for SEV-SNP. I.e. it's safe to free the
cpumask prior to unregistering encrypted regions and freeing the ASID.
Opportunistically clean up sev_vm_destroy()'s comment regarding what is
(implicitly, what isn't) skipped for mirror VMs.
TCP socket iterators use iter->offset to track progress through a
bucket, which is a measure of the number of matching sockets from the
current bucket that have been seen or processed by the iterator. On
subsequent iterations, if the current bucket has unprocessed items, we
skip at least iter->offset matching items in the bucket before adding
any remaining items to the next batch. However, iter->offset isn't
always an accurate measure of "things already seen" when the underlying
bucket changes between reads, which can lead to repeated or skipped
sockets. Instead, this series remembers the cookies of the sockets we
haven't seen yet in the current bucket and resumes from the first cookie
in that list that we can find on the next iteration.
This is a continuation of the work started in [1]. This series largely
replicates the patterns applied to UDP socket iterators, applying them
instead to TCP socket iterators.
CHANGES
=======
v5 -> v6:
* In patch ten ("selftests/bpf: Create established sockets in socket
iterator tests"), use poll() to choose a socket that has a connection
ready to be accept()ed. Before, connect_to_server would set the
O_NONBLOCK flag on all listening sockets so that accept_from_one could
loop through them all and find the one that connect_to_addr_str
connected to. However, this is subtly buggy and could potentially lead
to test flakes, since the 3 way handshake isn't necessarily done when
connect returns, so it's possible none of the accept() calls succeed.
Use poll() instead to guarantee that the socket we accept() from is
ready and eliminate the need for the O_NONBLOCK flag (Martin).
v4 -> v5:
* Move WARN_ON_ONCE before the `done` label in patch two ("bpf: tcp:
Make sure iter->batch always contains a full bucket snapshot"")
(Martin).
* Remove unnecessary kfunc declaration in patch eleven ("selftests/bpf:
Create iter_tcp_destroy test program") (Martin).
* Make sure to close the socket fd at the end of `destroy` in patch
twelve ("selftests/bpf: Add tests for bucket resume logic in
established sockets") (Martin).
v3 -> v4:
* Drop braces around sk_nulls_for_each_from in patch five ("bpf: tcp:
Avoid socket skips and repeats during iteration") (Stanislav).
* Add a break after the TCP_SEQ_STATE_ESTABLISHED case in patch five
(Stanislav).
* Add an `if (sock_type == SOCK_STREAM)` check before assigning
TCP_LISTEN to skel->rodata->ss in patch eight ("selftests/bpf: Allow
for iteration over multiple states") to more clearly express the
intent that the option is only consumed for SOCK_STREAM tests
(Stanislav).
* Move the `i = 0` assignment into the for loop in patch ten
("selftests/bpf: Create established sockets in socket iterator
tests") (Stanislav).
v2 -> v3:
* Unroll the loop inside bpf_iter_tcp_batch to make the logic easier to
follow in patch two ("bpf: tcp: Make sure iter->batch always contains
a full bucket snapshot"). This gets rid of the `resizes` variable from
v2 and eliminates the extra conditional that checks how many batch
resize attempts have occurred so far (Stanislav).
Note: This changes the behavior slightly. Before, in the case that
the second call to tcp_seek_last_pos (and later bpf_iter_tcp_resume)
advances to a new bucket, which may happen if the current bucket is
emptied after releasing its lock, the `resizes` "budget" would be
reset, the net effect being that we would try a batch resize with
GFP_USER at most once per bucket. Now, we try to resize the batch
with GFP_USER at most once per call, so it makes it slightly more
likely that we hit the GFP_NOWAIT scenario. However, this edge case
should be rare in practice anyway, and the new behavior is more or
less consistent with the original retry logic, so avoid the loop and
prefer code clarity.
* Move the call to bpf_iter_tcp_put_batch out of
bpf_iter_tcp_realloc_batch and call it directly before invoking
bpf_iter_tcp_realloc_batch with GFP_USER inside bpf_iter_tcp_batch.
/Don't/ call it before invoking bpf_iter_tcp_realloc_batch the second
time while we hold the lock with GFP_NOWAIT. This avoids a conditional
inside bpf_iter_tcp_realloc_batch from v2 that only calls
bpf_iter_tcp_put_batch if flags != GFP_NOWAIT and is a bit more
explicit (Stanislav).
* Adjust patch five ("bpf: tcp: Avoid socket skips and repeats during
iteration") to fit with the new logic in patch two.
v1 -> v2:
* In patch five ("bpf: tcp: Avoid socket skips and repeats during
iteration"), remove unnecessary bucket bounds checks in
bpf_iter_tcp_resume. In either case, if st->bucket is outside the
current table's range then bpf_iter_tcp_resume_* calls *_get_first
which immediately returns NULL anyway and the logic will fall through.
(Martin)
* Add a check at the top of bpf_iter_tcp_resume_listening and
bpf_iter_tcp_resume_established to see if we're done with the current
bucket and advance it immediately instead of wasting time finding the
first matching socket in that bucket with
(listening|established)_get_first. In v1, we originally discussed
adding logic to advance the bucket in bpf_iter_tcp_seq_next and
bpf_iter_tcp_seq_stop, but after trying this the logic seemed harder
to track. Overall, keeping everything inside bpf_iter_tcp_resume_*
seemed a bit clearer. (Martin)
* Instead of using a timeout in the last patch ("selftests/bpf: Add
tests for bucket resume logic in established sockets") to wait for
sockets to leave the ehash table after calling close(), use
bpf_sock_destroy to deterministically destroy and remove them. This
introduces one more patch ("selftests/bpf: Create iter_tcp_destroy
test program") to create the iterator program that destroys a selected
socket. Drive this through a destroy() function in the last patch
which, just like close(), accepts a socket file descriptor. (Martin)
* Introduce one more patch ("selftests/bpf: Allow for iteration over
multiple states") to fix a latent bug in iter_tcp_soreuse where the
sk->sk_state != TCP_LISTEN check was ignored. Add the "ss" variable to
allow test code to configure which socket states to allow.
Jordan Rife [Mon, 14 Jul 2025 18:09:15 +0000 (11:09 -0700)]
selftests/bpf: Create iter_tcp_destroy test program
Prepare for bucket resume tests for established TCP sockets by creating
a program to immediately destroy and remove sockets from the TCP ehash
table, since close() is not deterministic.
Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Jordan Rife [Mon, 14 Jul 2025 18:09:14 +0000 (11:09 -0700)]
selftests/bpf: Create established sockets in socket iterator tests
Prepare for bucket resume tests for established TCP sockets by creating
established sockets. Collect socket fds from connect() and accept()
sides and pass them to test cases.
Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Jordan Rife [Mon, 14 Jul 2025 18:09:13 +0000 (11:09 -0700)]
selftests/bpf: Make ehash buckets configurable in socket iterator tests
Prepare for bucket resume tests for established TCP sockets by making
the number of ehash buckets configurable. Subsequent patches force all
established sockets into the same bucket by setting ehash_buckets to
one.
Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Lizhi Xu [Fri, 13 Jun 2025 03:05:34 +0000 (11:05 +0800)]
jfs: truncate good inode pages when hard link is 0
The fileset value of the inode copy from the disk by the reproducer is
AGGR_RESERVED_I. When executing evict, its hard link number is 0, so its
inode pages are not truncated. This causes the bugon to be triggered when
executing clear_inode() because nrpages is greater than 0.
The reproducer builds a corrupted file on disk with a negative i_size value.
Add a check when opening this file to avoid subsequent operation failures.
Reported-by: syzbot+630f6d40b3ccabc8e96e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=630f6d40b3ccabc8e96e Tested-by: syzbot+630f6d40b3ccabc8e96e@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
When computing the tree index in dbAllocAG, we never check if we are
out of bounds realative to the size of the stree.
This could happen in a scenario where the filesystem metadata are
corrupted.
The intended implementations of `ForeignOwnable` will not return null
pointers from `into_foreign`, as this would render the implementation of
`try_from_foreign` useless. Current users of `ForeignOwnable` rely on
`into_foreign` returning non-null pointers. So require `into_foreign` to
return non-null pointers.
Suggested-by: Benno Lossin <lossin@kernel.org> Suggested-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org> Reviewed-by: Benno Lossin <lossin@kernel.org> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20250612-pointed-to-v3-2-b009006d86a1@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Andreas Hindborg [Thu, 12 Jun 2025 13:09:43 +0000 (15:09 +0200)]
rust: types: add FOREIGN_ALIGN to ForeignOwnable
The current implementation of `ForeignOwnable` is leaking the type of the
opaque pointer to consumers of the API. This allows consumers of the opaque
pointer to rely on the information that can be extracted from the pointer
type.
To prevent this, change the API to the version suggested by Maira
Canal (link below): Remove `ForeignOwnable::PointedTo` in favor of a
constant, which specifies the alignment of the pointers returned by
`into_foreign`.
With this change, `ArcInner` no longer needs `pub` visibility, so change it
to private.
Alice Ryhl [Mon, 16 Jun 2025 13:45:57 +0000 (13:45 +0000)]
rust: uaccess: use newtype for user pointers
Currently, Rust code uses a typedef for unsigned long to represent
userspace addresses. This is unfortunate because it means that userspace
addresses could accidentally be mixed up with other integers. To
alleviate that, we introduce a new UserPtr struct that wraps a raw
pointer to represent a userspace address. By using a struct, type
checking enforces that userspace addresses cannot be mixed up with
anything else.
This is similar to the __user annotation in C that detects cases where
user pointers are mixed with non-user pointers.
Note that unlike __user pointers in C, this type is just a pointer
without a target type. This means that it can't detect cases such as
mixing up which struct this user pointer references. However, that is
okay due to the way this is intended to be used - generally, you create
a UserPtr in your ioctl callback from the provided usize *before*
dispatching on which ioctl is in use, and then after dispatching on the
ioctl you pass the UserPtr into a UserSliceReader or UserSliceWriter;
selecting the target type does not happen until you have obtained the
UserSliceReader/Writer.
The UserPtr type is not marked with #[derive(Debug)], which means that
it's not possible to print values of this type. This avoids ASLR
leakage.
The type is added to the prelude as it is a fairly fundamental type
similar to c_int. The wrapping_add() method is renamed to
wrapping_byte_add() for consistency with the method name found on raw
pointers.
Reviewed-by: Benno Lossin <lossin@kernel.org> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Reviewed-by: Christian Schrefl <chrisi.schrefl@gmail.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20250616-userptr-newtype-v3-1-5ff7b2d18d9e@google.com
[ Reworded title. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
The tegra variant of smmu-v3 now uses the iommufd mmap interface but
is missing the corresponding import:
ERROR: modpost: module arm_smmu_v3 uses symbol _iommufd_object_depend from namespace IOMMUFD, but does not import it.
ERROR: modpost: module arm_smmu_v3 uses symbol iommufd_viommu_report_event from namespace IOMMUFD, but does not import it.
ERROR: modpost: module arm_smmu_v3 uses symbol _iommufd_destroy_mmap from namespace IOMMUFD, but does not import it.
ERROR: modpost: module arm_smmu_v3 uses symbol _iommufd_object_undepend from namespace IOMMUFD, but does not import it.
ERROR: modpost: module arm_smmu_v3 uses symbol _iommufd_alloc_mmap from namespace IOMMUFD, but does not import it.
Miguel Ojeda [Sat, 12 Jul 2025 16:01:03 +0000 (18:01 +0200)]
rust: use `#[used(compiler)]` to fix build and `modpost` with Rust >= 1.89.0
Starting with Rust 1.89.0 (expected 2025-08-07), the Rust compiler fails
to build the `rusttest` target due to undefined references such as:
kernel...-cgu.0:(.text....+0x116): undefined reference to
`rust_helper_kunit_get_current_test'
Moreover, tooling like `modpost` gets confused:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/gpu/drm/nova/nova.o
ERROR: modpost: missing MODULE_LICENSE() in drivers/gpu/nova-core/nova_core.o
The reason behind both issues is that the Rust compiler will now [1]
treat `#[used]` as `#[used(linker)]` instead of `#[used(compiler)]`
for our targets. This means that the retain section flag (`R`,
`SHF_GNU_RETAIN`) will be used and that they will be marked as `unique`
too, with different IDs. In turn, that means we end up with undefined
references that did not get discarded in `rusttest` and that multiple
`.modinfo` sections are generated, which confuse tooling like `modpost`
because they only expect one.
Thus start using `#[used(compiler)]` to keep the previous behavior
and to be explicit about what we want. Sadly, it is an unstable feature
(`used_with_arg`) [2] -- we will talk to upstream Rust about it. The good
news is that it has been available for a long time (Rust >= 1.60) [3].
The changes should also be fine for previous Rust versions, since they
behave the same way as before [4].
Alternatively, we could use `#[no_mangle]` or `#[export_name = ...]`
since those still behave like `#[used(compiler)]`, but of course it is
not really what we want to express, and it requires other changes to
avoid symbol conflicts.
docs: dt: writing-schema: Document preferred order of properties
Document established Devicetree bindings maintainers review practice:
using DTS coding style property order in both 'properties' and
'required' secions.
Document established Devicetree bindings maintainers review practice:
instance indexes, either as properties or as custom new OF alias, are
not accepted. Recommended way is to use, depending on the
situation/hardware: different compatible, cell arguments or syscon
phandle arguments.
docs: dt: writing-bindings: Document compatible and filename naming
Document established Devicetree bindings maintainers review practices:
1. Compatibles should not use bus suffixes to encode the type of
interface, because the parent bus node defines that interface, e.g.
"vendor,device" instead of "vendor,device-i2c" + "vendor,device-spi".
2. If the compatible represents the device as a whole, it should not
contain the type of device in the name.
3. Filenames should match compatible. The best if match is 100%, but if
binding has multiple compatibles, then one of the fallbacks should be
used. Alternatively a genericish name is allowed if it follows
"vendor,device" style.
docs: dt: submitting-patches: Avoid 'YAML' in the subject and add an example
Patches adding new device bindings should avoid 'YAML' keyword in the
subject, because all bindings are supposed to be in DT schema format,
which uses YAML. The DT schema is welcomed only in case of patches
doing conversion. Effectively people get confused that subject should
not contain anything else than device name after the prefix, so add two
recommended examples.
Miguel Ojeda [Sat, 12 Jul 2025 16:01:02 +0000 (18:01 +0200)]
objtool/rust: add one more `noreturn` Rust function for Rust 1.89.0
Starting with Rust 1.89.0 (expected 2025-08-07), under
`CONFIG_RUST_DEBUG_ASSERTIONS=y`, `objtool` may report:
rust/kernel.o: warning: objtool: _R..._6kernel4pageNtB5_4Page8read_raw()
falls through to next function _R..._6kernel4pageNtB5_4Page9write_raw()
(and many others) due to calls to the `noreturn` symbol:
core::panicking::panic_nounwind_fmt
Thus add the mangled one to the list so that `objtool` knows it is
actually `noreturn`.
See commit 56d680dd23c3 ("objtool/rust: list `noreturn` Rust functions")
for more details.
Cc: stable@vger.kernel.org # Needed in 6.12.y and later (Rust is pinned in older LTSs). Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250712160103.1244945-2-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
I was debugging some unrelated issue and noticed the current code was
very verbose. We can improve it easily by using the more common batch
buffer building pattern.
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-348 (-348)
Function old new delta
xe_gt_record_default_lrcs.cold 304 296 -8
xe_gt_record_default_lrcs 2200 1860 -340
Total: Before=13554, After=13206, chg -2.57%
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-7-a5e2ca03f6bd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Lucas De Marchi [Thu, 10 Jul 2025 20:33:51 +0000 (13:33 -0700)]
drm/xe/gt: Drop third submission for default context
There's no need to submit the nop job again on the first queue. Any
state needed is already saved when the first LRC is switched out. The
comment is a little misleading regarding indirect W/A: first of all
there's still no indirect W/A enabled and secondly, even after they are,
there's no need to submit this job again for having their state
propagated: the indirect W/A will actually run on every LRC switch.
Lucas De Marchi [Thu, 10 Jul 2025 20:33:50 +0000 (13:33 -0700)]
drm/xe/lrc: Remove leftover TODO/FIXME
There isn't anything to set for CTX_TIMESTAMP handling in the empty
LRC: that is set on every LRC init since it should always start from 0
rather than the value saved in the image after first submission.
The FIXME about perma-pinning also doesn't make much sense as we will
always going to pin the lrc and the GGTT mapping has nothing to do with
VM bind.
Lucas De Marchi [Thu, 10 Jul 2025 20:33:48 +0000 (13:33 -0700)]
drm/xe/gt: Extract emit_job_sync()
Both the nop and wa jobs are going through the same boiler plate calls
to emit the job with a timeout and handling error for both bb and job.
Extract emit_job_sync() so those functions create the bb, handling
possible errors and delegate the part about really emitting the job
and waiting for its completion.
Lucas De Marchi [Thu, 10 Jul 2025 20:33:47 +0000 (13:33 -0700)]
drm/xe: Count dwords before allocating
The bb allocation in emit_wa_job() is wrong in 2 ways: first it's
allocating enough space for the 3DSTATE or hardcoding 4k depending on
the engine. In the first case it doesn't account for the WAs and in the
former it may not be sufficient. Secondly it's using the size instead of
number of dwords, causing the buffer to be 4x bigger than needed:
xe_bb_new() receives number of dwords as parameter and its declaration
was also not following its implementation.
Lastly, reword the debug message since it's not only about the LRC WAs
anymore as it also include the 3DSTATE for render.
While it's unlikely this is causing any real issue, let's calculate the
needed space and allocate just enough.
Lucas De Marchi [Thu, 10 Jul 2025 20:33:46 +0000 (13:33 -0700)]
drm/xe/lrc: Reduce scope of empty lrc data
The only case in which new lrc data is created from scratch is when it's
called prior to recording the default lrc. There's no need to check for
NULL init_data since in that case the function already failed: just move
the allocation where it's needed.
Drivers could leverage the fact that the VF BAR MMIO reservation is created
for total number of VFs supported by the device by resizing the BAR to
larger size when smaller number of VFs is enabled.
Add pci_iov_vf_bar_set_size() to control the size and a
pci_iov_vf_bar_get_sizes() helper to get the VF BAR sizes that will allow
up to num_vfs to be successfully enabled with the current underlying
reservation size.
PCI/IOV: Check that VF BAR fits within the reservation
When the resource representing a VF MMIO BAR reservation is created, its
size is always large enough to accommodate the BAR of all SR-IOV Virtual
Functions that can potentially be created (total VFs). If for whatever
reason it's not possible to accommodate all VFs, the resource is not
assigned and no VFs can be created.
An upcoming change will allow VF BAR size to be modified by drivers at a
later point in time, which means that the check for resource assignment is
no longer sufficient.
Add an additional check that verifies that the VF BAR for all enabled VFs
fits within the underlying reservation resource.
PCI/IOV: Allow IOV resources to be resized in pci_resize_resource()
Similar to regular resizable BARs, VF BARs can also be resized.
The capability layout is the same as PCI_EXT_CAP_ID_REBAR, which means we
can reuse most of the implementation, the only difference being resource
size calculation (which is multiplied by total VFs) and memory decoding
(which is controlled by a separate VF MSE field in SR-IOV cap).
Extend the pci_resize_resource() function to accept IOV resources.
PCI/IOV: Add pci_resource_num_to_vf_bar() to convert VF BAR number to/from IOV resource
There are multiple places where conversions between IOV resources and
corresponding VF BAR numbers are done.
Extract the logic to pci_resource_num_from_vf_bar() and
pci_resource_num_to_vf_bar() helpers.
Suggested-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://patch.msgid.link/20250702093522.518099-3-michal.winiarski@intel.com
PCI/IOV: Restore VF resizable BAR state after reset
Similar to regular resizable BARs, VF BARs can also be resized, e.g. by the
system firmware or the PCI subsystem itself.
The capability layout is the same as PCI_EXT_CAP_ID_REBAR.
Add the capability ID and restore it as a part of IOV state.
See PCIe r6.2, sec 7.8.7.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patch.msgid.link/20250702093522.518099-2-michal.winiarski@intel.com
Jordan Rife [Mon, 14 Jul 2025 18:09:12 +0000 (11:09 -0700)]
selftests/bpf: Allow for iteration over multiple states
Add parentheses around loopback address check to fix up logic and make
the socket state filter configurable for the TCP socket iterators.
Iterators can skip the socket state check by setting ss to 0.
Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Jordan Rife [Mon, 14 Jul 2025 18:09:09 +0000 (11:09 -0700)]
bpf: tcp: Avoid socket skips and repeats during iteration
Replace the offset-based approach for tracking progress through a bucket
in the TCP table with one based on socket cookies. Remember the cookies
of unprocessed sockets from the last batch and use this list to
pick up where we left off or, in the case that the next socket
disappears between reads, find the first socket after that point that
still exists in the bucket and resume from there.
This approach guarantees that all sockets that existed when iteration
began and continue to exist throughout will be visited exactly once.
Sockets that are added to the table during iteration may or may not be
seen, but if they are they will be seen exactly once.
Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Jordan Rife [Mon, 14 Jul 2025 18:09:08 +0000 (11:09 -0700)]
bpf: tcp: Use bpf_tcp_iter_batch_item for bpf_tcp_iter_state batch items
Prepare for the next patch that tracks cookies between iterations by
converting struct sock **batch to union bpf_tcp_iter_batch_item *batch
inside struct bpf_tcp_iter_state.
Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Jordan Rife [Mon, 14 Jul 2025 18:09:07 +0000 (11:09 -0700)]
bpf: tcp: Get rid of st_bucket_done
Get rid of the st_bucket_done field to simplify TCP iterator state and
logic. Before, st_bucket_done could be false if bpf_iter_tcp_batch
returned a partial batch; however, with the last patch ("bpf: tcp: Make
sure iter->batch always contains a full bucket snapshot"),
st_bucket_done == true is equivalent to iter->cur_sk == iter->end_sk.
Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Jordan Rife [Mon, 14 Jul 2025 18:09:06 +0000 (11:09 -0700)]
bpf: tcp: Make sure iter->batch always contains a full bucket snapshot
Require that iter->batch always contains a full bucket snapshot. This
invariant is important to avoid skipping or repeating sockets during
iteration when combined with the next few patches. Before, there were
two cases where a call to bpf_iter_tcp_batch may only capture part of a
bucket:
1. When bpf_iter_tcp_realloc_batch() returns -ENOMEM.
2. When more sockets are added to the bucket while calling
bpf_iter_tcp_realloc_batch(), making the updated batch size
insufficient.
In cases where the batch size only covers part of a bucket, it is
possible to forget which sockets were already visited, especially if we
have to process a bucket in more than two batches. This forces us to
choose between repeating or skipping sockets, so don't allow this:
1. Stop iteration and propagate -ENOMEM up to userspace if reallocation
fails instead of continuing with a partial batch.
2. Try bpf_iter_tcp_realloc_batch() with GFP_USER just as before, but if
we still aren't able to capture the full bucket, call
bpf_iter_tcp_realloc_batch() again while holding the bucket lock to
guarantee the bucket does not change. On the second attempt use
GFP_NOWAIT since we hold onto the spin lock.
I did some manual testing to exercise the code paths where GFP_NOWAIT is
used and where ERR_PTR(err) is returned. I used the realloc test cases
included later in this series to trigger a scenario where a realloc
happens inside bpf_iter_tcp_batch and made a small code tweak to force
the first realloc attempt to allocate a too-small batch, thus requiring
another attempt with GFP_NOWAIT. Some printks showed both reallocs with
the tests passing:
Jun 27 00:00:53 crow kernel: again GFP_USER
Jun 27 00:00:53 crow kernel: again GFP_NOWAIT
Jun 27 00:00:53 crow kernel: again GFP_USER
Jun 27 00:00:53 crow kernel: again GFP_NOWAIT
With this setup, I also forced each of the bpf_iter_tcp_realloc_batch
calls to return -ENOMEM to ensure that iteration ends and that the
read() in userspace fails.
Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Jordan Rife [Mon, 14 Jul 2025 18:09:05 +0000 (11:09 -0700)]
bpf: tcp: Make mem flags configurable through bpf_iter_tcp_realloc_batch
Prepare for the next patch which needs to be able to choose either
GFP_USER or GFP_NOWAIT for calls to bpf_iter_tcp_realloc_batch.
Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Eric Biggers [Sat, 12 Jul 2025 23:23:06 +0000 (16:23 -0700)]
lib/crypto: tests: Add KUnit tests for SHA-1 and HMAC-SHA1
Add a KUnit test suite for the SHA-1 library functions, including the
corresponding HMAC support. The core test logic is in the
previously-added hash-test-template.h. This commit just adds the actual
KUnit suite, and it adds the generated test vectors to the tree so that
gen-hash-testvecs.py won't have to be run at build time.
Eric Biggers [Wed, 9 Jul 2025 20:01:12 +0000 (13:01 -0700)]
lib/crypto: tests: Add KUnit tests for Poly1305
Add a KUnit test suite for the Poly1305 functions. Most of its test
cases are instantiated from hash-test-template.h, which is also used by
the SHA-2 tests. A couple additional test cases are also included to
test edge cases specific to Poly1305.
Eric Biggers [Wed, 9 Jul 2025 20:01:11 +0000 (13:01 -0700)]
lib/crypto: tests: Add KUnit tests for SHA-384 and SHA-512
Add KUnit test suites for the SHA-384 and SHA-512 library functions,
including the corresponding HMAC support. The core test logic is in the
previously-added hash-test-template.h. This commit just adds the actual
KUnit suites, and it adds the generated test vectors to the tree so that
gen-hash-testvecs.py won't have to be run at build time.
Eric Biggers [Wed, 9 Jul 2025 20:01:10 +0000 (13:01 -0700)]
lib/crypto: tests: Add KUnit tests for SHA-224 and SHA-256
Add KUnit test suites for the SHA-224 and SHA-256 library functions,
including the corresponding HMAC support. The core test logic is in the
previously-added hash-test-template.h. This commit just adds the actual
KUnit suites, and it adds the generated test vectors to the tree so that
gen-hash-testvecs.py won't have to be run at build time.
The initial use cases for this will be sha224_kunit, sha256_kunit,
sha384_kunit, sha512_kunit, and poly1305_kunit.
Add a Python script gen-hash-testvecs.py which generates the test
vectors required by test_hash_test_vectors,
test_hash_all_lens_up_to_4096, and test_hmac.
Eric Biggers [Mon, 30 Jun 2025 17:22:24 +0000 (10:22 -0700)]
fsverity: Switch from crypto_shash to SHA-2 library
fsverity supports two hash algorithms: SHA-256 and SHA-512. Since both
of these have a library API now, just use the library API instead of
crypto_shash. Even with multiple algorithms, the library-based code
still ends up being quite a bit simpler, due to how clumsy the
old-school crypto API is. The library-based code is also more
efficient, since it avoids overheads such as indirect calls.
Eric Biggers [Mon, 30 Jun 2025 17:48:05 +0000 (10:48 -0700)]
apparmor: use SHA-256 library API instead of crypto_shash API
This user of SHA-256 does not support any other algorithm, so the
crypto_shash abstraction provides no value. Just use the SHA-256
library API instead, which is much simpler and easier to use.
Eric Biggers [Sat, 12 Jul 2025 23:23:04 +0000 (16:23 -0700)]
lib/crypto: x86/sha1: Migrate optimized code into library
Instead of exposing the x86-optimized SHA-1 code via x86-specific
crypto_shash algorithms, instead just implement the sha1_blocks()
library function. This is much simpler, it makes the SHA-1 library
functions be x86-optimized, and it fixes the longstanding issue where
the x86-optimized SHA-1 code was disabled by default. SHA-1 still
remains available through crypto_shash, but individual architectures no
longer need to handle it.
To match sha1_blocks(), change the type of the nblocks parameter of the
assembly functions from int to size_t. The assembly functions actually
already treated it as size_t.
There are no _raw or _scale attributes in use, so those are all removed.
There are no currentY attributes in use with these modifiers, so those
are also removed. All of the voltageY are changed to altvoltageY since
that is how they are actually used. None of these channels are used
with scan buffers, so all of those attributes are removed as well. And
the {in,out}_altvoltageY_{i,q}_phase attributes were missing so those
are added.
The differential channel names for admv1013 are fixed.
Nicera D3-323-AA is a PIR sensor for human detection. It has support for
raw data measurements and detection notification. The communication
protocol is custom made and therefore needs to be GPIO bit banged.
The device has two main settings that can be configured: a threshold
value for detection and a band-pass filter. The configurable parameters
for the band-pass filter are the high-pass and low-pass cutoff
frequencies and its peak gain. Map these settings to the corresponding
parameters in the `iio` framework.
Raw data measurements can be obtained from the device. However, since we
rely on bit banging, it will be rather cumbersome with buffer support.
The main reason being that the data protocol has strict timing
requirements (it's serial like UART), and it's mainly used during
debugging since in real-world applications only the event notification
is of importance. Therefore, only add support for events (for now).
iio: dac: vf610: Simplify with devm_clk_get_enabled()
Driver is getting clock and almost immediately enabling it, with no
relevant code executed between, thus the probe path and cleanups can be
simplified with devm_clk_get_enabled().
David Lechner [Thu, 10 Jul 2025 02:20:00 +0000 (21:20 -0500)]
iio: imu: bno055: fix OOB access of hw_xlate array
Fix a potential out-of-bounds array access of the hw_xlate array in
bno055.c.
In bno055_get_regmask(), hw_xlate was iterated over the length of the
vals array instead of the length of the hw_xlate array. In the case of
bno055_gyr_scale, the vals array is larger than the hw_xlate array,
so this could result in an out-of-bounds access. In practice, this
shouldn't happen though because a match should always be found which
breaks out of the for loop before it iterates beyond the end of the
hw_xlate array.
By adding a new hw_xlate_len field to the bno055_sysfs_attr, we can be
sure we are iterating over the correct length.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202507100510.rGt1YOOx-lkp@intel.com/ Fixes: 4aefe1c2bd0c ("iio: imu: add Bosch Sensortec BNO055 core driver") Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://patch.msgid.link/20250709-iio-const-data-19-v2-1-fb3fc9191251@baylibre.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
The temperature sensor in the MT7981 is same as in the MT7986.
Add compatible string for mt7981.
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250708220405.1072393-2-olek2@wp.pl Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Jonathan Cameron [Sun, 29 Jun 2025 18:36:49 +0000 (19:36 +0100)]
iio: accel: kionix-kx022a: Apply approximate iwyu principles to includes
Motivated by the W=1 warning about export.h that was introduced this cycle
this is an attempt to apply an approximation of the principles of including
whatever is used in the file directly.
Helped by the include-what-you-use tool.
Reasoning:
- Drop linux/moduleparam.h as completely unused.
- linux/array_size.h for ARRAY_SIZE()
- linux/bitmap.h for for_each_set_bit
- linux/errno.h for error codes.
- linux/export.h for EXPORT_SYMBOL*()
- linux/math64.h for do_div - alternative would be asm/div64.h
- linux/minmax.h for min()
- linux/sysfs.h for sysfs_emit()
- linux/time64.h for USEC_PER_MSEC
- linux/iio/buffer.h for iio_push_to_buffers_with_timestamp()
- asm/byteorder.h for le16_to_cpu()
iio: adc: ad4170-4: Add support for weigh scale, thermocouple, and RTD sens
The AD4170-4 design provides features to aid interfacing with weigh scales,
thermocouples, and RTD sensors, which are set up with additional circuitry
for proper sensor operation. A key characteristic of those sensors is that
the circuit they are in must be excited with a single, a pair, or two pairs
of signals. The external circuit can be excited either by a voltage supply
or by AD4170-4 excitation signals. The sensor can then be read through a
different pair of lines that are connected to the AD4170-4 ADC.
Extend the ad4170-4 driver to handle external circuit sensors.
iio: adc: ad4170-4: Add support for internal temperature sensor
The AD4170-4 has an internal temperature sensor that can be read using the
ADC. Whenever possible, configure an IIO channel to provide the chip's
temperature.
The AD4170-4 has four multifunctional pins that can be used as GPIOs. The
GPIO functionality can be accessed when the AD4170-4 chip is not busy
performing continuous data capture or handling any other register
read/write request. Also, the AD4170-4 does not provide any interrupt based
on GPIO pin states so AD4170-4 GPIOs can't be used as interrupt sources.
Implement gpio_chip callbacks to make AD4170-4 GPIO pins controllable
through the gpiochip interface.
The AD4170-4 chip can use an externally supplied clock at the XTAL2 pin, or
an external crystal connected to the XTAL1 and XTAL2 pins. Alternatively,
the AD4170-4 can provide its 16 MHz internal clock at the XTAL2 pin. In
addition, the chip has a programmable clock divider that allows dividing
the external or internal clock frequency, however, control for that is not
provided in this patch. Extend the AD4170-4 driver so it effectively uses
the provided external clock, if any, or supplies its own clock as a clock
provider.
iio: adc: ad4170-4: Add support for buffered data capture
Extend the AD4170-4 driver to allow buffered data capture in continuous
read mode. In continuous read mode, the chip skips the instruction phase
and outputs just ADC sample data, enabling faster sample rates to be
reached. The internal channel sequencer always starts sampling from channel
0 and channel 0 must be enabled if more than one channel is selected for
data capture. The scan mask validation callback checks if the
aforementioned condition is met.
The AD4170-4 is a multichannel, low noise, 24-bit precision sigma-delta
analog to digital converter. The AD4170-4 design offers a flexible data
acquisition solution with crosspoint multiplexed analog inputs,
configurable ADC voltage reference inputs, ultra-low noise integrated PGA,
digital filtering, wide range of configurable output data rates, internal
oscillator and temperature sensor, four GPIOs, and integrated features for
interfacing with load cell weigh scales, RTD, and thermocouple sensors.
Add basic support for the AD4170-4 ADC with the following features:
- Single-shot read.
- Analog front end PGA configuration.
- Differential and pseudo-differential input configuration.
iio: imu: inv_icm42600: reorganize DMA aligned buffers in structure
Move all DMA aligned buffers together at the end of the structure.
1. Timestamp anynomous structure is not used with DMA so it doesn't
belong after __aligned(IIO_DMA_MINALIGN).
2. struct inv_icm42600_fifo contains it's own __aligned(IIO_DMA_MINALIGN)
within it at the end so it should not be after __aligned(IIO_DMA_MINALIGN)
in the outer struct either.
3. Normally 1 would have been considered a bug, but because of the extra
alignment from 2, it actually was OK, but we shouldn't be relying on such
quirks.
Refactor the sensor interrupt mapping by utilizing regmap_assign_bits(),
which accepts a boolean value directly. Introduce a helper function to
streamline the identification of the configured interrupt line pin. Also,
use identifiers from units.h to represent the full 8-bit register when
setting bits.
This is a purely refactoring change and does not affect functionality.