riscv: ftrace: Properly acquire text_mutex to fix a race condition
As reported by lockdep, some patching was done without acquiring
text_mutex, so there could be a race when mapping the page to patch
since we use the same fixmap entry.
Reported-by: Han Gao <rabenda.cn@gmail.com> Reported-by: Vivian Wang <wangruikang@iscas.ac.cn> Reported-by: Yao Zi <ziyao@disroot.org> Closes: https://lore.kernel.org/linux-riscv/aGODMpq7TGINddzM@pie.lan/ Tested-by: Yao Zi <ziyao@disroot.org> Tested-by: Han Gao <rabenda.cn@gmail.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250711-alex-fixes-v2-1-d85a5438da6c@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
The presence or absence of the CPPC SBI extension is currently logged
on every boot. This message is not particularly useful and can clutter
the boot log. Remove this debug message to reduce noise during boot.
riscv: Stop considering R_RISCV_NONE as bad relocations
Even though those relocations should not be present in the final
vmlinux, there are a lot of them. And since those relocations are
considered "bad", they flood the compilation output which may hide some
legitimate bad relocations.
drm/panel/boe-himax8279d: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/boe-tv101wum-nl6: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/himax-hx83102: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/ilitek-ili9882t: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/lpm102a188a: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/jdi-lt070me05000: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/khadas-ts050: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/kd097d04: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/lg-sw43408: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/novatek-nt36672a: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/osd101t2587-53ts: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/vvx10f034n00: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/raspberrypi: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
drm/panel/lq101r1sx01: Use refcounted allocation in place of devm_kzalloc()
Move to using the new API devm_drm_panel_alloc() to allocate the
panel. In the call to the new API, avoid using explicit type and use
__typeof() for more type safety.
On 11 Oct 2022, it was reported that the crc32 verification
of the u-boot environment failed only on big-endian systems
for the u-boot-env nvmem layout driver with the following error.
This problem has been present since the driver was introduced,
and before it was made into a layout driver.
The suggested fix at the time was to use further endianness
conversion macros in order to have both the stored and calculated
crc32 values to compare always represented in the system's endianness.
This was not accepted due to sparse warnings
and some disagreement on how to handle the situation.
Later on in a newer revision of the patch, it was proposed to use
cpu_to_le32() for both values to compare instead of le32_to_cpu()
and store the values as __le32 type to remove compilation errors.
The necessity of this is based on the assumption that the use of crc32()
requires endianness conversion because the algorithm uses little-endian,
however, this does not prove to be the case and the issue is unrelated.
Upon inspecting the current kernel code,
there already is an existing use of le32_to_cpu() in this driver,
which suggests there already is special handling for big-endian systems,
however, it is big-endian systems that have the problem.
This, being the only functional difference between architectures
in the driver combined with the fact that the suggested fix
was to use the exact same endianness conversion for the values
brings up the possibility that it was not necessary to begin with,
as the same endianness conversion for two values expected to be the same
is expected to be equivalent to no conversion at all.
After inspecting the u-boot environment of devices of both endianness
and trying to remove the existing endianness conversion,
the problem is resolved in an equivalent way as the other suggested fixes.
Ultimately, it seems that u-boot is agnostic to endianness
at least for the purpose of environment variables.
In other words, u-boot reads and writes the stored crc32 value
with the same endianness that the crc32 value is calculated with
in whichever endianness a certain architecture runs on.
Therefore, the u-boot-env driver does not need to convert endianness.
Remove the usage of endianness macros in the u-boot-env driver,
and change the type of local variables to maintain the same return type.
If there is a special situation in the case of endianness,
it would be a corner case and should be handled by a unique "compatible".
Even though it is not necessary to use endianness conversion macros here,
it may be useful to use them in the future for consistent error printing.
Norman Maurer [Tue, 15 Jul 2025 14:02:50 +0000 (16:02 +0200)]
io_uring/net: Support multishot receive len cap
At the moment its very hard to do fine grained backpressure when using
multishot as the kernel might produce a lot of completions before the
user has a chance to cancel a previous submitted multishot recv.
This change adds support to issue a multishot recv that is capped by a
len, which means the kernel will only rearm until X amount of data is
received. When the limit is reached the completion will signal to the
user that a re-arm needs to happen manually by not setting the IORING_CQE_F_MORE
flag.
Andrew Price [Wed, 16 Jul 2025 13:12:07 +0000 (14:12 +0100)]
gfs2: Validate i_depth for exhash directories
A fuzzer test introduced corruption that ends up with a depth of 0 in
dir_e_read(), causing an undefined shift by 32 at:
index = hash >> (32 - dip->i_depth);
As calculated in an open-coded way in dir_make_exhash(), the minimum
depth for an exhash directory is ilog2(sdp->sd_hash_ptrs) and 0 is
invalid as sdp->sd_hash_ptrs is fixed as sdp->bsize / 16 at mount time.
So we can avoid the undefined behaviour by checking for depth values
lower than the minimum in gfs2_dinode_in(). Values greater than the
maximum are already being checked for there.
Also switch the calculation in dir_make_exhash() to use ilog2() to
clarify how the depth is calculated.
Tested with the syzkaller repro.c and xfstests '-g quick'.
Reported-by: syzbot+4708579bb230a0582a57@syzkaller.appspotmail.com Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
James Morse [Fri, 11 Jul 2025 18:27:43 +0000 (18:27 +0000)]
arm64: cacheinfo: Provide helper to compress MPIDR value into u32
Filesystems like resctrl use the cache-id exposed via sysfs to identify
groups of CPUs. The value is also used for PCIe cache steering tags. On
DT platforms cache-id is not something that is described in the
device-tree, but instead generated from the smallest MPIDR of the CPUs
associated with that cache. The cache-id exposed to user-space has
historically been 32 bits.
MPIDR values may be larger than 32 bits.
MPIDR only has 32 bits worth of affinity data, but the aff3 field lives
above 32bits. The corresponding lower bits are masked out by
MPIDR_HWID_BITMASK and contain an SMT flag and Uni-Processor flag.
Swizzzle the aff3 field into the bottom 32 bits and using that.
In case more affinity fields are added in the future, the upper RES0
area should be checked. Returning a value greater than 32 bits from
this helper will cause the caller to give up on allocating cache-ids.
Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Link: https://lore.kernel.org/r/20250711182743.30141-4-james.morse@arm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
James Morse [Fri, 11 Jul 2025 18:27:42 +0000 (18:27 +0000)]
cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id
Filesystems like resctrl use the cache-id exposed via sysfs to identify
groups of CPUs. The value is also used for PCIe cache steering tags. On
DT platforms cache-id is not something that is described in the
device-tree, but instead generated from the smallest CPU h/w id of the
CPUs associated with that cache.
CPU h/w ids may be larger than 32 bits.
Add a hook to allow architectures to compress the value from the devicetree
into 32 bits. Returning the same value is always safe as cache_of_set_id()
will stop if a value larger than 32 bits is seen.
For example, on arm64 the value is the MPIDR affinity register, which only
has 32 bits of affinity data, but spread accross the 64 bit field. An
arch-specific bit swizzle gives a 32 bit value.
Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Link: https://lore.kernel.org/r/20250711182743.30141-3-james.morse@arm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rob Herring [Fri, 11 Jul 2025 18:27:41 +0000 (18:27 +0000)]
cacheinfo: Set cache 'id' based on DT data
Use the minimum CPU h/w id of the CPUs associated with the cache for the
cache 'id'. This will provide a stable id value for a given system. As
we need to check all possible CPUs, we can't use the shared_cpu_map
which is just online CPUs. As there's not a cache to CPUs mapping in DT,
we have to walk all CPU nodes and then walk cache levels.
The cache_id exposed to user-space has historically been 32 bits, and
is too late to change. This value is parsed into a u32 by user-space
libraries such as libvirt:
https://github.com/libvirt/libvirt/blob/master/src/util/virresctrl.c#L1588
Give up on assigning cache-id's if a CPU h/w id greater than 32 bits
is found.
match_cache_node() does not make use of the __free() cleanup helpers
because of_find_next_cache_node(prev) does not drop a reference to prev,
and its too easy to accidentally drop the reference on cpu, which belongs
to for_each_of_cpu_node().
Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Rob Herring <robh@kernel.org>
[ ben: converted to use the __free cleanup idiom ] Signed-off-by: Ben Horgan <ben.horgan@arm.com>
[ morse: Add checks to give up if a value larger than 32 bits is seen. ] Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Link: https://lore.kernel.org/r/20250711182743.30141-2-james.morse@arm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
- New AMD processor will support different input/output for same command.
- In some scenarios the input value is not cleared, which will be added to
output before reporting the data.
- Clearing input explicitly will be a cleaner and safer approach.
misc: amd-sbi: Address copy_to/from_user() warning reported in smatch
Smatch warnings are reported for below commit,
Commit bb13a84ed6b7 ("misc: amd-sbi: Add support for CPUID protocol")
from Apr 28, 2025 (linux-next), leads to the following Smatch static
checker warning:
drivers/misc/amd-sbi/rmi-core.c:376 apml_rmi_reg_xfer() warn: maybe return -EFAULT instead of the bytes remaining?
drivers/misc/amd-sbi/rmi-core.c:394 apml_mailbox_xfer() warn: maybe return -EFAULT instead of the bytes remaining?
drivers/misc/amd-sbi/rmi-core.c:411 apml_cpuid_xfer() warn: maybe return -EFAULT instead of the bytes remaining?
drivers/misc/amd-sbi/rmi-core.c:428 apml_mcamsr_xfer() warn: maybe return -EFAULT instead of the bytes remaining?
copy_to/from_user() returns number of bytes, not copied.
In case data not copied, return "-EFAULT".
Additionally, fixes the "-EPROTOTYPE" error return as intended.
Fixes: 35ac2034db72 ("misc: amd-sbi: Add support for AMD_SBI IOCTL") Fixes: bb13a84ed6b7 ("misc: amd-sbi: Add support for CPUID protocol") Fixes: 69b1ba83d21c ("misc: amd-sbi: Add support for read MCA register protocol") Fixes: cf141287b774 ("misc: amd-sbi: Add support for register xfer") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/aDVyO8ByVsceybk9@stanley.mountain/ Reviewed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Akshay Gupta <akshay.gupta@amd.com> Link: https://lore.kernel.org/r/20250716110729.2193725-2-akshay.gupta@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
misc: amd-sbi: Address potential integer overflow issue reported in smatch
Smatch warnings are reported for below commit,
Commit bb13a84ed6b7 ("misc: amd-sbi: Add support for CPUID protocol")
from Apr 28, 2025 (linux-next), leads to the following Smatch static
checker warning:
drivers/misc/amd-sbi/rmi-core.c:132 rmi_cpuid_read() warn: bitwise OR is zero '0xffffffff00000000 & 0xffff'
drivers/misc/amd-sbi/rmi-core.c:132 rmi_cpuid_read() warn: potential integer overflow from user 'msg->cpu_in_out << 32'
drivers/misc/amd-sbi/rmi-core.c:213 rmi_mca_msr_read() warn: bitwise OR is zero '0xffffffff00000000 & 0xffff'
drivers/misc/amd-sbi/rmi-core.c:213 rmi_mca_msr_read() warn: potential integer overflow from user 'msg->mcamsr_in_out << 32'
CPUID & MCAMSR thread data from input is available at byte 4 & 5, this
patch fixes to copy the user data correctly in the argument.
Previously, CPUID and MCAMSR data is return only for thread 0.
Fixes: bb13a84ed6b7 ("misc: amd-sbi: Add support for CPUID protocol") Fixes: 69b1ba83d21c ("misc: amd-sbi: Add support for read MCA register protocol") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/aDVyO8ByVsceybk9@stanley.mountain/ Reviewed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Akshay Gupta <akshay.gupta@amd.com> Link: https://lore.kernel.org/r/20250716110729.2193725-1-akshay.gupta@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ian Abbott [Tue, 8 Jul 2025 13:06:27 +0000 (14:06 +0100)]
comedi: comedi_test: Fix possible deletion of uninitialized timers
In `waveform_common_attach()`, the two timers `&devpriv->ai_timer` and
`&devpriv->ao_timer` are initialized after the allocation of the device
private data by `comedi_alloc_devpriv()` and the subdevices by
`comedi_alloc_subdevices()`. The function may return with an error
between those function calls. In that case, `waveform_detach()` will be
called by the Comedi core to clean up. The check that
`waveform_detach()` uses to decide whether to delete the timers is
incorrect. It only checks that the device private data was allocated,
but that does not guarantee that the timers were initialized. It also
needs to check that the subdevices were allocated. Fix it.
Ian Abbott [Mon, 7 Jul 2025 16:14:39 +0000 (17:14 +0100)]
comedi: Fix initialization of data for instructions that write to subdevice
Some Comedi subdevice instruction handlers are known to access
instruction data elements beyond the first `insn->n` elements in some
cases. The `do_insn_ioctl()` and `do_insnlist_ioctl()` functions
allocate at least `MIN_SAMPLES` (16) data elements to deal with this,
but they do not initialize all of that. For Comedi instruction codes
that write to the subdevice, the first `insn->n` data elements are
copied from user-space, but the remaining elements are left
uninitialized. That could be a problem if the subdevice instruction
handler reads the uninitialized data. Ensure that the first
`MIN_SAMPLES` elements are initialized before calling these instruction
handlers, filling the uncopied elements with 0. For
`do_insnlist_ioctl()`, the same data buffer elements are used for
handling a list of instructions, so ensure the first `MIN_SAMPLES`
elements are initialized for each instruction that writes to the
subdevice.
Ian Abbott [Mon, 7 Jul 2025 15:33:54 +0000 (16:33 +0100)]
comedi: Fix use of uninitialized data in insn_rw_emulate_bits()
For Comedi `INSN_READ` and `INSN_WRITE` instructions on "digital"
subdevices (subdevice types `COMEDI_SUBD_DI`, `COMEDI_SUBD_DO`, and
`COMEDI_SUBD_DIO`), it is common for the subdevice driver not to have
`insn_read` and `insn_write` handler functions, but to have an
`insn_bits` handler function for handling Comedi `INSN_BITS`
instructions. In that case, the subdevice's `insn_read` and/or
`insn_write` function handler pointers are set to point to the
`insn_rw_emulate_bits()` function by `__comedi_device_postconfig()`.
For `INSN_WRITE`, `insn_rw_emulate_bits()` currently assumes that the
supplied `data[0]` value is a valid copy from user memory. It will at
least exist because `do_insnlist_ioctl()` and `do_insn_ioctl()` in
"comedi_fops.c" ensure at lease `MIN_SAMPLES` (16) elements are
allocated. However, if `insn->n` is 0 (which is allowable for
`INSN_READ` and `INSN_WRITE` instructions, then `data[0]` may contain
uninitialized data, and certainly contains invalid data, possibly from a
different instruction in the array of instructions handled by
`do_insnlist_ioctl()`. This will result in an incorrect value being
written to the digital output channel (or to the digital input/output
channel if configured as an output), and may be reflected in the
internal saved state of the channel.
Fix it by returning 0 early if `insn->n` is 0, before reaching the code
that accesses `data[0]`. Previously, the function always returned 1 on
success, but it is supposed to be the number of data samples actually
read or written up to `insn->n`, which is 0 in this case.
Ian Abbott [Mon, 7 Jul 2025 13:57:37 +0000 (14:57 +0100)]
comedi: das6402: Fix bit shift out of bounds
When checking for a supported IRQ number, the following test is used:
/* IRQs 2,3,5,6,7, 10,11,15 are valid for "enhanced" mode */
if ((1 << it->options[1]) & 0x8cec) {
However, `it->options[i]` is an unchecked `int` value from userspace, so
the shift amount could be negative or out of bounds. Fix the test by
requiring `it->options[1]` to be within bounds before proceeding with
the original test. Valid `it->options[1]` values that select the IRQ
will be in the range [1,15]. The value 0 explicitly disables the use of
interrupts.
Ian Abbott [Mon, 7 Jul 2025 13:46:22 +0000 (14:46 +0100)]
comedi: aio_iiro_16: Fix bit shift out of bounds
When checking for a supported IRQ number, the following test is used:
if ((1 << it->options[1]) & 0xdcfc) {
However, `it->options[i]` is an unchecked `int` value from userspace, so
the shift amount could be negative or out of bounds. Fix the test by
requiring `it->options[1]` to be within bounds before proceeding with
the original test. Valid `it->options[1]` values that select the IRQ
will be in the range [1,15]. The value 0 explicitly disables the use of
interrupts.
Fixes: ad7a370c8be4 ("staging: comedi: aio_iiro_16: add command support for change of state detection") Cc: stable@vger.kernel.org # 5.13+ Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Link: https://lore.kernel.org/r/20250707134622.75403-1-abbotti@mev.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ian Abbott [Mon, 7 Jul 2025 13:34:29 +0000 (14:34 +0100)]
comedi: pcl812: Fix bit shift out of bounds
When checking for a supported IRQ number, the following test is used:
if ((1 << it->options[1]) & board->irq_bits) {
However, `it->options[i]` is an unchecked `int` value from userspace, so
the shift amount could be negative or out of bounds. Fix the test by
requiring `it->options[1]` to be within bounds before proceeding with
the original test. Valid `it->options[1]` values that select the IRQ
will be in the range [1,15]. The value 0 explicitly disables the use of
interrupts.
Ian Abbott [Mon, 7 Jul 2025 13:09:08 +0000 (14:09 +0100)]
comedi: das16m1: Fix bit shift out of bounds
When checking for a supported IRQ number, the following test is used:
/* only irqs 2, 3, 4, 5, 6, 7, 10, 11, 12, 14, and 15 are valid */
if ((1 << it->options[1]) & 0xdcfc) {
However, `it->options[i]` is an unchecked `int` value from userspace, so
the shift amount could be negative or out of bounds. Fix the test by
requiring `it->options[1]` to be within bounds before proceeding with
the original test.
Reported-by: syzbot+c52293513298e0fd9a94@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c52293513298e0fd9a94 Fixes: 729988507680 ("staging: comedi: das16m1: tidy up the irq support in das16m1_attach()") Tested-by: syzbot+c52293513298e0fd9a94@syzkaller.appspotmail.com Suggested-by: "Enju, Kohei" <enjuk@amazon.co.jp> Cc: stable@vger.kernel.org # 5.13+ Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Link: https://lore.kernel.org/r/20250707130908.70758-1-abbotti@mev.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ian Abbott [Mon, 7 Jul 2025 12:15:55 +0000 (13:15 +0100)]
comedi: Fix some signed shift left operations
Correct some left shifts of the signed integer constant 1 by some
unsigned number less than 32. Change the constant to 1U to avoid
shifting a 1 into the sign bit.
The corrected functions are comedi_dio_insn_config(),
comedi_dio_update_state(), and __comedi_device_postconfig().
Ian Abbott [Fri, 4 Jul 2025 12:04:05 +0000 (13:04 +0100)]
comedi: Fail COMEDI_INSNLIST ioctl if n_insns is too large
The handling of the `COMEDI_INSNLIST` ioctl allocates a kernel buffer to
hold the array of `struct comedi_insn`, getting the length from the
`n_insns` member of the `struct comedi_insnlist` supplied by the user.
The allocation will fail with a WARNING and a stack dump if it is too
large.
Avoid that by failing with an `-EINVAL` error if the supplied `n_insns`
value is unreasonable.
Define the limit on the `n_insns` value in the `MAX_INSNS` macro. Set
this to the same value as `MAX_SAMPLES` (65536), which is the maximum
allowed sum of the values of the member `n` in the array of `struct
comedi_insn`, and sensible comedi instructions will have an `n` of at
least 1.
Merge patch series "fs: refactor write_begin/write_end and add ext4 IOCB_DONTCACHE support"
陈涛涛 Taotao Chen <chentaotao@didiglobal.com> says:
This patch series refactors the address_space_operations write_begin()
and write_end() callbacks to take const struct kiocb * as their first
argument, allowing IOCB flags such as IOCB_DONTCACHE to propagate to the
filesystem's buffered I/O path.
Ext4 is updated to implement handling of the IOCB_DONTCACHE flag and
advertises support via the FOP_DONTCACHE file operation flag.
Additionally, the i915 driver's shmem write paths are updated to bypass
the legacy write_begin/write_end interface in favor of directly
calling write_iter() with a constructed synchronous kiocb. Another i915
change replaces a manual write loop with kernel_write() during GEM shmem
object creation.
Tested with ext4 and i915 GEM workloads.
* patches from https://lore.kernel.org/20250716093559.217344-1-chentaotao@didiglobal.com:
ext4: support uncached buffered I/O
mm/pagemap: add write_begin_get_folio() helper function
fs: change write_begin/write_end interface to take struct kiocb *
drm/i915: Refactor shmem_pwrite() to use kiocb and write_iter
drm/i915: Use kernel_write() in shmem object create
Set FOP_DONTCACHE in ext4_file_operations to declare support for
uncached buffered I/O.
To handle this flag, update ext4_write_begin() and ext4_da_write_begin()
to use write_begin_get_folio(), which encapsulates FGP_DONTCACHE logic
based on iocb->ki_flags.
Part of a series refactoring address_space_operations write_begin and
write_end callbacks to use struct kiocb for passing write context and
flags.
mm/pagemap: add write_begin_get_folio() helper function
Add write_begin_get_folio() to simplify the common folio lookup logic
used by filesystem ->write_begin() implementations.
This helper wraps __filemap_get_folio() with common flags such as
FGP_WRITEBEGIN, conditional FGP_DONTCACHE, and set folio order based
on the write length.
Part of a series refactoring address_space_operations write_begin and
write_end callbacks to use struct kiocb for passing write context and
flags.
drm/i915: Refactor shmem_pwrite() to use kiocb and write_iter
Refactors shmem_pwrite() to replace the ->write_begin/end logic
with a write_iter-based implementation using kiocb and iov_iter.
While kernel_write() was considered, it caused about 50% performance
regression. vfs_write() is not exported for kernel use. Therefore,
file->f_op->write_iter() is called directly with a synchronously
initialized kiocb to preserve performance and remove write_begin
usage.
Performance results use gem_pwrite on Intel CPU i7-10700
(average of 10 runs):
Ensure that epoll instances can never form a graph deeper than
EP_MAX_NESTS+1 links.
Currently, ep_loop_check_proc() ensures that the graph is loop-free and
does some recursion depth checks, but those recursion depth checks don't
limit the depth of the resulting tree for two reasons:
- They don't look upwards in the tree.
- If there are multiple downwards paths of different lengths, only one of
the paths is actually considered for the depth check since commit 28d82dc1c4ed ("epoll: limit paths").
Essentially, the current recursion depth check in ep_loop_check_proc() just
serves to prevent it from recursing too deeply while checking for loops.
A more thorough check is done in reverse_path_check() after the new graph
edge has already been created; this checks, among other things, that no
paths going upwards from any non-epoll file with a length of more than 5
edges exist. However, this check does not apply to non-epoll files.
As a result, it is possible to recurse to a depth of at least roughly 500,
tested on v6.15. (I am unsure if deeper recursion is possible; and this may
have changed with commit 8c44dac8add7 ("eventpoll: Fix priority inversion
problem").)
To fix it:
1. In ep_loop_check_proc(), note the subtree depth of each visited node,
and use subtree depths for the total depth calculation even when a subtree
has already been visited.
2. Add ep_get_upwards_depth_proc() for similarly determining the maximum
depth of an upwards walk.
3. In ep_loop_check(), use these values to limit the total path length
between epoll nodes to EP_MAX_NESTS edges.
MAINTAINERS: add block and fsdevel lists to iov_iter
We've had multiple instances where people didn't Cc fsdevel or block
which are easily the most affected subsystems by iov_iter changes.
Put a stop to that and make sure both lists are Cced so we can catch
stuff like [1] early.
Ming Lei [Wed, 16 Jul 2025 11:48:08 +0000 (19:48 +0800)]
loop: use kiocb helpers to fix lockdep warning
The lockdep tool can report a circular lock dependency warning in the loop
driver's AIO read/write path:
```
[ 6540.587728] kworker/u96:5/72779 is trying to acquire lock:
[ 6540.593856] ff110001b5968440 (sb_writers#9){.+.+}-{0:0}, at: loop_process_work+0x11a/0xf70 [loop]
[ 6540.603786]
[ 6540.603786] but task is already holding lock:
[ 6540.610291] ff110001b5968440 (sb_writers#9){.+.+}-{0:0}, at: loop_process_work+0x11a/0xf70 [loop]
[ 6540.620210]
[ 6540.620210] other info that might help us debug this:
[ 6540.627499] Possible unsafe locking scenario:
[ 6540.627499]
[ 6540.634110] CPU0
[ 6540.636841] ----
[ 6540.639574] lock(sb_writers#9);
[ 6540.643281] lock(sb_writers#9);
[ 6540.646988]
[ 6540.646988] *** DEADLOCK ***
```
This patch fixes the issue by using the AIO-specific helpers
`kiocb_start_write()` and `kiocb_end_write()`. These functions are
designed to be used with a `kiocb` and manage write sequencing
correctly for asynchronous I/O without introducing the problematic
lock dependency.
The `kiocb` is already part of the `loop_cmd` struct, so this change
also simplifies the completion function `lo_rw_aio_do_completion()` by
using the `iocb` from the `cmd` struct directly, instead of retrieving
the loop device from the request queue.
Frank Li [Sat, 12 Jul 2025 18:19:04 +0000 (19:19 +0100)]
dt-bindings: nvmem: convert vf610-ocotp.txt to yaml format
Convert vf610-ocotp.txt to yaml format.
Additional changes:
- Remove label in examples.
- Add include file in examples.
- Move reg just after compatible in examples.
- Add ref: nvmem.yaml and nvmem-deprecated-cells.yaml
- Remove #address-cells and #size-cells from required list to match existed
dts file.
Signed-off-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://lore.kernel.org/r/20250712181905.6738-9-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dt-bindings: nvmem: mediatek: efuse: split MT8186/MT8188 from base version
On MT8186 and MT8188 one of the NVMEM cells contains the GPU speed bin
value. In combination with the GPU OPP bindings, on these two platforms
there is an implied scheme of converting the cell value to what the GPU
OPP "opp-supported-hw" property matches. This does not apply to the base
mediatek,efuse hardware, nor does it apply to any of the other platforms
that do not have the GPU speed bin cell. The platform maintainer argues
that this makes the compatibles incompatible with the base
"mediatek,efuse" compatible, as shown in the link given.
Deprecate the MT8186/MT8188 + "mediatek,efuse" combination, and add
new entries with MT8186 being the base model and MT8188 falling back
to MT8186.
Link: https://lore.kernel.org/all/11028242-afe4-474a-9d76-cd1bd9208987@collabora.com/ Fixes: ff1df1886f43 ("dt-bindings: nvmem: mediatek: efuse: Add support for MT8188") Cc: Johnson Wang <johnson.wang@mediatek.com> Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Acked-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://lore.kernel.org/r/20250712181905.6738-8-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now that the driver core can properly handle constant struct bus_type,
move the nvmem_bus_type variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.
nvmem: core: Fix typos in comments and MODULE_AUTHOR strings
This patch fixes minor typo issues for nvmem-core.c:
Corrects "form" to "from" in multiple function descriptions.
Fixes missing closing angle brackets in MODULE_AUTHOR entries.
These changes improve readability and formatting consistency.
Sven Peter [Sat, 12 Jul 2025 18:18:58 +0000 (19:18 +0100)]
dt-bindings: nvmem: fixed-layout: Allow optional bit positions
NVMEM nodes can optionally include the bits property to specify the bit
position of the cell within a byte.
Extend patternProperties to allow adding the bit offset to the node
address to be able to distinguish nodes with the same address but
different bit positions, e.g.
Before the conversion to NVMEM layouts in commit bd912c991d2e
("dt-bindings: nvmem: layouts: add fixed-layout") this extension was
originally added with commit 4b2545dd19ed ("dt-bindings: nvmem: Extend
patternProperties to optionally indicate bit position") to the now
deprecated layout.
Signed-off-by: Sven Peter <sven@kernel.org> Reviewed-by: "Rob Herring (Arm)" <robh@kernel.org> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://lore.kernel.org/r/20250712181905.6738-3-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sven Peter [Sat, 12 Jul 2025 18:18:57 +0000 (19:18 +0100)]
nvmem: apple: drop default ARCH_APPLE in Kconfig
When the first driver for Apple Silicon was upstreamed we accidentally
included `default ARCH_APPLE` in its Kconfig which then spread to almost
every subsequent driver. As soon as ARCH_APPLE is set to y this will
pull in many drivers as built-ins which is not what we want.
Thus, drop `default ARCH_APPLE` from Kconfig.
Signed-off-by: Sven Peter <sven@kernel.org> Reviewed-by: Janne Grunau <j@jannau.net> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://lore.kernel.org/r/20250712181905.6738-2-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now that the driver core can properly handle constant struct bus_type,
move the fsi_bus_type variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.
misc: rtsx: usb: Ensure mmc child device is active when card is present
When a card is present in the reader, the driver currently defers
autosuspend by returning -EAGAIN during the suspend callback to
trigger USB remote wakeup signaling. However, this does not guarantee
that the mmc child device has been resumed, which may cause issues if
it remains suspended while the card is accessible.
This patch ensures that all child devices, including the mmc host
controller, are explicitly resumed before returning -EAGAIN. This
fixes a corner case introduced by earlier remote wakeup handling,
improving reliability of runtime PM when a card is inserted.
Fixes: 883a87ddf2f1 ("misc: rtsx_usb: Use USB remote wakeup signaling for card insertion detection") Cc: stable@vger.kernel.org Signed-off-by: Ricky Wu <ricky_wu@realtek.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20250711140143.2105224-1-ricky_wu@realtek.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Replace the RAW SPI accesses with spi-mem API. The latter will fall back to
RAW SPI accesses if spi-mem callbacks are not implemented by a controller
driver.
Notable advantages:
- read function now allocates a bounce buffer for SPI DMA compatibility,
similar to write function;
- the driver can now be used in conjunction with SPI controller drivers
providing spi-mem API only, e.g. spi-nxp-fspi.
- during the initial probe the driver polls busy/ready status bit for 25ms
instead of giving up instantly and hoping that the FW didn't write the
EEPROM
Notes:
- mutex_lock() has been dropped from fm25_aux_read() because the latter is
only being called in probe phase and therefore cannot race with
at25_ee_read() or at25_ee_write()
Quick 4KB block size test with CY15B102Q 256KB F-RAM over spi_omap2_mcspi
driver (no spi-mem ops provided, fallback to raw SPI inside spi-mem):
The lower throughtput probably comes from the 3 messages per SPI transfer
inside spi-mem instead of hand-crafted 2 messages per transfer in the
former at25 code. However, if the raw SPI access is not preserved, then
the driver doesn't grow from the lines-of-code perspective and subjectively
could be considered even a bit simpler.
Higher performance impact on the read operation could be explained by the
newly introduced bounce buffer in read operation. I didn't find any
explanation or guarantee, why would a bounce buffer be not needed on the
read side, so I assume it's a pure luck that nobody read EEPROM into
some variable on stack on an architecture where kernel stack would be
not DMA-able.
vmci: Prevent the dispatching of uninitialized payloads
The reproducer executes the host's unlocked_ioctl call in two different
tasks. When init_context fails, the struct vmci_event_ctx is not fully
initialized when executing vmci_datagram_dispatch() to send events to all
vm contexts. This affects the datagram taken from the datagram queue of
its context by another task, because the datagram payload is not initialized
according to the size payload_size, which causes the kernel data to leak
to the user space.
Before dispatching the datagram, and before setting the payload content,
explicitly set the payload content to 0 to avoid data leakage caused by
incomplete payload initialization.
To avoid the oob check failure when executing __compiletime_lessthan()
in memset(), directly use the address of the vmci_event_ctx instance ev
to replace ev.msg.hdr, because their addresses are the same.
eeprom: at25: fram: Detect and support inside-out chip variants
Infineon seems to be confused with the order ID bytes should be presented
by the FRAM chips and to be on the safe side they offer chips which are
either JEDEC conform or the full opposite of the latter.
Examples of the chips which present ID bytes in the reversed order are:
CY15B102QN
CY15B204QSN
Let's support them nevertheless. Except reversing the ID bytes, they also
have quite different density encoding even across EXCELON(tm) family.
The patch has been tested with the above two chips.
misc: fastrpc: Use of_reserved_mem_region_to_resource() for "memory-region"
Use the newly added of_reserved_mem_region_to_resource() function to
handle "memory-region" properties.
The error handling is a bit different. "memory-region" is optional, so
failed lookup is not an error. But then an error in
of_reserved_mem_lookup() is treated as an error. However, that
distinction is not really important. Either the region is available
and usable or it is not. So now, it is just
of_reserved_mem_region_to_resource() which is checked for an error.
mcb: use sysfs_emit_at() instead of scnprintf() in show functions
This change improves clarity and ensures proper bounds checking in
line with the preferred sysfs_emit() API usage for sysfs 'show'
functions. The PAGE_SIZE check is now handled internally by the helper.
No functional change intended.
Signed-off-by: Abhinav Ananthu <abhinav.ogl@gmail.com> Signed-off-by: Johannes Thumshirn <jth@kernel.org> Reviewed-by: Johannes Thumshirn <jth@kernel.org> Link: https://lore.kernel.org/r/20250707074720.40051-2-jth@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tiffany Yang [Mon, 14 Jul 2025 18:53:19 +0000 (11:53 -0700)]
binder: encapsulate individual alloc test cases
Each case tested by the binder allocator test is defined by 3 parameters:
the end alignment type of each requested buffer allocation, whether those
buffers share the front or back pages of the allotted address space, and
the order in which those buffers should be released. The alignment type
represents how a binder buffer may be laid out within or across page
boundaries and relative to other buffers, and it's used along with
whether the buffers cover part (sharing the front pages) of or all
(sharing the back pages) of the vma to calculate the sizes passed into
each test.
binder_alloc_test_alloc recursively generates each possible arrangement
of alignment types and then tests that the binder_alloc code tracks pages
correctly when those buffers are allocated and then freed in every
possible order at both ends of the address space. While they provide
comprehensive coverage, they are poor candidates to be represented as
KUnit test cases, which must be statically enumerated. For 5 buffers and
5 end alignment types, the test case array would have 750,000 entries.
This change structures the recursive calls into meaningful test cases so
that failures are easier to interpret.
Tiffany Yang [Mon, 14 Jul 2025 18:53:18 +0000 (11:53 -0700)]
binder: Convert binder_alloc selftests to KUnit
Convert the existing binder_alloc_selftest tests into KUnit tests. These
tests allocate and free an exhaustive combination of buffers with
various sizes and alignments. This change allows them to be run without
blocking or otherwise interfering with other processes in binder.
This test is refactored into more meaningful cases in the subsequent
patch.
Tiffany Yang [Mon, 14 Jul 2025 18:53:16 +0000 (11:53 -0700)]
kunit: test: Export kunit_attach_mm()
Tests can allocate from virtual memory using kunit_vm_mmap(), which
transparently creates and attaches an mm_struct to the test runner if
one is not already attached. This is suitable for most cases, except for
when the code under test must access a task's mm before performing an
mmap. Expose kunit_attach_mm() as part of the interface for those
cases. This does not change the existing behavior.
Tiffany Yang [Mon, 14 Jul 2025 18:53:15 +0000 (11:53 -0700)]
binder: Store lru freelist in binder_alloc
Store a pointer to the free pages list that the binder allocator should
use for a process inside of struct binder_alloc. This change allows
binder allocator code to be tested and debugged deterministically while
a system is using binder; i.e., without interfering with other binder
processes and independently of the shrinker. This is necessary to
convert the current binder_alloc_selftest into a kunit test that does
not rely on hijacking an existing binder_proc to run.
A binder process's binder_alloc->freelist should not be changed after
it is initialized. A sole exception is the process that runs the
existing binder_alloc selftest. Its freelist can be temporarily replaced
for the duration of the test because it runs as a single thread before
any pages can be added to the global binder freelist, and the test frees
every page it allocates before dropping the binder_selftest_lock. This
exception allows the existing selftest to be used to check for
regressions, but it will be dropped when the binder_alloc tests are
converted to kunit in a subsequent patch in this series.
Tiffany Yang [Mon, 14 Jul 2025 18:53:14 +0000 (11:53 -0700)]
binder: Fix selftest page indexing
The binder allocator selftest was only checking the last page of buffers
that ended on a page boundary. Correct the page indexing to account for
buffers that are not page-aligned.
Dmitry Antipov [Thu, 26 Jun 2025 07:30:54 +0000 (10:30 +0300)]
binder: use guards for plain mutex- and spinlock-protected sections
Use 'guard(mutex)' and 'guard(spinlock)' for plain (i.e. non-scoped)
mutex- and spinlock-protected sections, respectively, thus making
locking a bit simpler. Briefly tested with 'stress-ng --binderfs'.
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20250626073054.7706-2-dmantipov@yandex.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Richter [Wed, 25 Jun 2025 05:43:37 +0000 (07:43 +0200)]
s390/pai_crypto: Rename PAI Crypto event 4210
The PAI crypto event number 4210 is named
PCC_COMPUTE_LAST_BLOCK_CMAC_USING_ENCRYPTED_AES_256A
According to the z16 and z17 Principle of Operation documents
SA22-7832-13 and SA22-7832-14 the event is named
PCC_COMPUTE_LAST_BLOCK_CMAC_USING_ENCRYPTED_AES_256
without a trailing 'A'.
Adjust this event name.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Sakari Ailus [Tue, 20 May 2025 10:34:37 +0000 (13:34 +0300)]
container_of: Document container_of() is not to be used in new code
There is a warning in the kerneldoc documentation of container_of() that
constness of its ptr argument is lost. While this is a valid suggestion
container_of_const() should be used instead, the vast majority of new
code still uses container_of():
drm/amdgpu: Reset the clear flag in buddy during resume
- Added a handler in DRM buddy manager to reset the cleared
flag for the blocks in the freelist.
- This is necessary because, upon resuming, the VRAM becomes
cluttered with BIOS data, yet the VRAM backend manager
believes that everything has been cleared.
v2:
- Add lock before accessing drm_buddy_clear_reset_blocks()(Matthew Auld)
- Force merge the two dirty blocks.(Matthew Auld)
- Add a new unit test case for this issue.(Matthew Auld)
- Having this function being able to flip the state either way would be
good. (Matthew Brost)
v3(Matthew Auld):
- Do merge step first to avoid the use of extra reset flag.
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Cc: stable@vger.kernel.org Fixes: a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3812 Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250716075125.240637-2-Arunpravin.PaneerSelvam@amd.com
spi: gpio: Use explicit 'unsigned int' for parameter types
The C standard allows 'unsigned' as a shorthand for 'unsigned int'.
For improved code clarity and consistency with the prevailing kernel coding
style, replace the shorthand with the more explicit 'unsigned int' type
for function parameters.
This is a purely stylistic cleanup and has no functional impact on the
generated code.
ASoC: dt-bindings: qcom,lpass-va-macro: Define clock-names in top-level
Device variants use different amount of clock inputs, but all of them
are in the same order, 'clock-names' in top-level properties can define
the list and each if:then: block can only narrow the number of items.
This is preferred syntax, because it keeps list unified among devices
and encourages adding new entries to the end of the list, instead of
adding them in the middle. The change has no functional impact, but
partially reverts approach implemented in commit cfad817095e1 ("ASoC:
dt-bindings: qcom,lpass-va-macro: Add missing NPL clock").
drm/sitronix/st7571-i2c: Add support for the ST7567 Controller
The Sitronix ST7567 is a monochrome Dot Matrix LCD Controller that has SPI,
I2C and parallel interfaces. The st7571-i2c driver only has support for I2C
so displays using other transport interfaces are currently not supported.
The DRM_FORMAT_R1 pixel format and data commands are the same than what
is used by the ST7571 controller, so only is needed a different callback
that implements the expected initialization sequence for the ST7567 chip.
mmc: loongson2: Unify the function prefixes for loongson2_mmc_pdata
The function prefixes for loongson2_mmc_pdata follow two naming
conventions: SoC-based and DMA-based.
First, DMA-based prefixes are the preferred choice, as they clearly
highlight differences, such as prepare_dma; however, for functions
related to SoC, such as reorder_cmd_data, it is agreed to use the
smallest SoC name as the fallback prefix, such as ls2k0500.
memstick: core: Zero initialize id_reg in h_memstick_read_dev_id()
A new warning in clang [1] points out that id_reg is uninitialized then
passed to memstick_init_req() as a const pointer:
drivers/memstick/core/memstick.c:330:59: error: variable 'id_reg' is uninitialized when passed as a const pointer argument here [-Werror,-Wuninitialized-const-pointer]
330 | memstick_init_req(&card->current_mrq, MS_TPC_READ_REG, &id_reg,
| ^~~~~~
Commit de182cc8e882 ("drivers/memstick/core/memstick.c: avoid -Wnonnull
warning") intentionally passed this variable uninitialized to avoid an
-Wnonnull warning from a NULL value that was previously there because
id_reg is never read from the call to memstick_init_req() in
h_memstick_read_dev_id(). Just zero initialize id_reg to avoid the
warning, which is likely happening in the majority of builds using
modern compilers that support '-ftrivial-auto-var-init=zero'.
The ovpn_netdev_write() function is responsible for injecting
decapsulated and decrypted packets back into the local network stack.
Prior to this patch, the skb could retain GSO metadata from the outer,
encrypted tunnel packet. This original GSO metadata, relevant to the
sender's transport context, becomes invalid and misleading for the
tunnel/data path once the inner packet is exposed.
Leaving this stale metadata intact causes internal GSO validation checks
further down the kernel's network stack (validate_xmit_skb()) to fail,
leading to packet drops. The reasons for these failures vary by
protocol, for example:
- for ICMP, no offload handler is registered;
- for TCP and UDP, the respective offload handlers return errors when
comparing skb->len to the outdated skb_shinfo(skb)->gso_size.
By calling skb_gso_reset(skb) we ensure the inner packet is presented to
gro_cells_receive() with a clean state, correctly indicating it is an
individual packet from the perspective of the local stack.
This change eliminates the "Driver has suspect GRO implementation, TCP
performance may be compromised" warning and improves overall TCP
performance by allowing GSO/GRO to function as intended on the
decapsulated traffic.
Fixes: 11851cbd60ea ("ovpn: implement TCP transport") Reported-by: Gert Doering <gert@greenie.muc.de> Closes: https://github.com/OpenVPN/ovpn-net-next/issues/4 Tested-by: Gert Doering <gert@greenie.muc.de> Signed-off-by: Ralf Lici <ralf@mandelbit.com> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Netlink ops do not expect all attributes to be always set, however
this condition is not explicitly coded any where, leading the user
to believe that all sent attributes are somewhat processed.
Fix this behaviour by introducing explicit checks.
For CMD_OVPN_PEER_GET and CMD_OVPN_KEY_GET directly open-code the
needed condition in the related ops handlers.
While for all other ops use attribute subsets in the ovpn.yaml spec file.
Ralf Lici [Wed, 4 Jun 2025 13:11:58 +0000 (15:11 +0200)]
ovpn: propagate socket mark to skb in UDP
OpenVPN allows users to configure a FW mark on sockets used to
communicate with other peers. The mark is set by means of the
`SO_MARK` Linux socket option.
However, in the ovpn UDP code path, the socket's `sk_mark` value is
currently ignored and it is not propagated to outgoing `skbs`.
This commit ensures proper inheritance of the field by setting
`skb->mark` to `sk->sk_mark` before handing the `skb` to the network
stack for transmission.
staging: greybus: gbphy: fix up const issue with the match callback
gbphy_dev_match_id() should be taking a const pointer, as the pointer
passed to it from the container_of() call was const to start with (it
was accidentally cast away with the call.) Fix this all up by correctly
marking the pointer types.
Cc: Alex Elder <elder@kernel.org> Cc: greybus-dev@lists.linaro.org Fixes: d69d80484598 ("driver core: have match() callback in struct bus_type take a const *") Reviewed-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/2025070115-reoccupy-showy-e2ad@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>