drm/exynos: Make exynos_drm_framebuffer_init() an internal interface
The only caller of exynos_drm_framebuffer_init() is the helper
exynos_user_fb_create() from the same source file. Declare the
former as static. Tidy up the header's include statements.
v2:
- clean up the includes in the header file (Chen-Yu)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
Replace the internal DRM framebuffer with a DRM client buffer. The
client buffer allocates the DRM framebuffer on a file and also uses
GEM object handles via the regular ADDFB2 interfaces.
Using client-buffer interfaces unifies framebuffer allocation for
DRM clients in user space and exynos' internal fbdev emulation. It
also simplifies the clean-up side of the fbdev emulation.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
drm/exynos: fbdev: Calculate buffer geometry with format helpers
Replace the geometry and size calculation in exynos' fbdev emulation
with DRM format helpers. This consists of a 4CC lookup from the fbdev
parameters, format lookup, pitch calculation and size calculation.
Then allocate the GEM buffer object for the framebuffer memory from
the calculated size. Mmap provides the allocated buffer to user space,
so align the buffer size to PAGE_SIZE.
Initialize the fields screen_size and fix.smem_len in struct fb_info
from the size of the allocated buffer object. This is the real size
and can differ from the requested size.
v3:
- add more error checks to geometry calculations
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
drm/exynos: fbdev: Remove offset into screen_buffer
The screen_buffer field in struct fb_info contains the kernel address
of the first byte of framebuffer memory. Do not add the display offset.
This offset only describes scrolling during scanout.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: 19c8b8343d9c ("drm/exynos: fixed overlay data updating.") Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com> Cc: dri-devel@lists.freedesktop.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: <stable@vger.kernel.org> # v3.2+
hwmon: (pmbus/adm1266) serialize sequencer_state debugfs read with pmbus_lock
adm1266_state_read() backs the sequencer_state debugfs entry and
issues an i2c_smbus_read_word_data(client, ADM1266_READ_STATE)
against the device without taking pmbus_lock. pmbus_core holds
pmbus_lock around its own multi-transaction sequences (notably the
"set PAGE, then read paged register" pattern used by hwmon
attributes), so an unlocked debugfs reader can land between a PAGE
write and the subsequent paged read in another thread. READ_STATE
itself is not paged, so it cannot corrupt PAGE in flight, but the
same defensive serialisation that applies to the GPIO accessors
applies here: any direct device access from outside pmbus_core
should be ordered with respect to pmbus_core's own.
Take pmbus_lock at the top of adm1266_state_read() via the
scope-based guard().
hwmon: (pmbus/adm1266) serialize NVMEM blackbox read with pmbus_lock
adm1266_nvmem_read() is the reg_read callback the NVMEM core invokes
when userspace reads /sys/bus/nvmem/devices/.../nvmem on this chip.
On the first byte of every read it does a memset of data->dev_mem,
walks the device blackbox through adm1266_nvmem_read_blackbox()
(which issues a chain of PMBus block transactions), and then memcpys
the refreshed buffer out to userspace. None of that runs under
pmbus_lock today.
Two consequences:
- The PMBus traffic the refresh issues is not serialised against
pmbus_core's own multi-step PAGE+register sequences. A paged
hwmon attribute read from another thread can land between a
PAGE write and the paged read in either direction and corrupt
one side's view of the device state machine.
- The NVMEM core does not serialise concurrent reg_read calls, so
two userspace readers racing at offset 0 can interleave the
memset of data->dev_mem with another reader's
adm1266_nvmem_read_blackbox() refill or memcpy out, returning
torn data to userspace.
Take pmbus_lock at the top of adm1266_nvmem_read() via the
scope-based guard(). Patch 5 of this series moves
adm1266_config_nvmem() past pmbus_do_probe() so the lock is
guaranteed to be live before the callback is reachable from
userspace.
hwmon: (pmbus/adm1266) serialize GPIO PMBus accesses with pmbus_lock
adm1266_gpio_get(), adm1266_gpio_get_multiple(), and
adm1266_gpio_dbg_show() all issue PMBus reads against the device but
none of them take pmbus_lock. The pmbus_core framework holds
pmbus_lock around its own multi-transaction sequences (notably the
"set PAGE, then read paged register" pattern used by hwmon
attributes), so an unlocked GPIO accessor can land between a PAGE
write and the subsequent paged read in another thread and corrupt
either side's view of the device state machine.
Take pmbus_lock at the top of each of the three accessors via the
scope-based guard(). The lock is uncontended in the common case and
adds only a single mutex round-trip per call.
hwmon: (pmbus/adm1266) register the nvmem device after pmbus_do_probe()
adm1266_probe() calls adm1266_config_nvmem() -- which goes on to
devm_nvmem_register() and exposes adm1266_nvmem_read() to userspace --
before pmbus_do_probe() has initialised the per-client PMBus state.
Same latent hazard as the gpio_chip one fixed in the previous patch:
once the nvmem device is registered, gpiolib's nvmem char-dev / sysfs
interface is reachable, and any concurrent read triggers
adm1266_nvmem_read() -> adm1266_nvmem_read_blackbox(), which issues
PMBus traffic that races pmbus_do_probe()'s own device accesses with
no serialisation.
Move adm1266_config_nvmem() down past pmbus_do_probe() so the nvmem
device isn't reachable from userspace until the PMBus state the
nvmem accessors depend on is fully initialised.
hwmon: (pmbus/adm1266) register the gpio_chip after pmbus_do_probe()
adm1266_probe() calls adm1266_config_gpio() -- which goes on to
devm_gpiochip_add_data() and exposes the gpio_chip callbacks to
gpiolib -- before pmbus_do_probe() has initialised the per-client
PMBus state (notably the pmbus_lock mutex the core hands out via
pmbus_get_data()).
That ordering is already a latent hazard: any GPIO access that lands
between adm1266_config_gpio() and the end of pmbus_do_probe() (for
example a sysfs read from a user space agent that opens the gpiochip
the instant gpiolib advertises it) races pmbus_do_probe()'s own
device accesses with no serialisation.
Move adm1266_config_gpio() down past pmbus_do_probe() so the chip
isn't reachable from userspace until the PMBus state it depends on
is fully initialised.
hwmon: (pmbus/adm1266) reject short block-read responses in the GPIO accessors
adm1266_gpio_get() and adm1266_gpio_get_multiple() both compose the
pin-status word as
pins_status = read_buf[0] + (read_buf[1] << 8);
right after i2c_smbus_read_block_data(), guarding only against an
error return. A well-behaved device returns 2 bytes for
GPIO_STATUS/PDIO_STATUS, but the helper happily reports a 0- or
1-byte response too. If the device returns 0 bytes, both read_buf
slots are uninitialized stack memory; if it returns 1 byte, read_buf[1]
is.
The composed value then flows through set_bit() into the caller's
*bits in adm1266_gpio_get_multiple(), or into the return value of
adm1266_gpio_get(), and ends up in userspace via gpiolib (sysfs and
the char-dev ioctls). That leaks a few bits of kernel stack per
request on any device whose firmware glitch, bus error, or hostile
slave produces a short block-read response.
Add the missing length check to both call sites and surface a short
response as -EIO.
The second *bits = 0 throws away every GPIO bit the first loop just
populated, so callers asking for any combination of GPIO and PDIO
pins always see the GPIO portion of the returned bits as zero.
Drop the redundant second assignment so both halves of the result
survive.
where ADM1266_PDIO_STATUS is the PMBus command code (0xE9, i.e. 233),
not the number of PDIO pins. The intended upper bound is
ADM1266_GPIO_NR + ADM1266_PDIO_NR = 25.
gpiolib hands in a mask sized for gc.ngpio (= 25 bits on this chip),
so the iteration walks find_next_bit() up to 242, reading up to 217
extra bits (a handful of unsigned-long words: four on 64-bit, seven
on 32-bit) of whatever lives past the end of the mask in the
caller's stack. Any incidental set bit in that range then drives a
set_bit(gpio_nr, bits) call that writes past the end of the
caller-supplied bits array too -- both out-of-bounds.
Substitute ADM1266_PDIO_NR for the constant so the scan stops at the
last real PDIO bit.
hwmon: (pmbus/adm1266) bounce blackbox records through a protocol-sized buffer
adm1266_pmbus_block_xfer() copies the device-supplied block payload
into the caller-provided buffer using the device-supplied length:
memcpy(data_r, &msgs[1].buf[1], msgs[1].buf[0]);
The helper does not know how large data_r is and trusts the device to
return at most one record's worth of bytes. adm1266_nvmem_read_blackbox()
violates that contract: it advances read_buff inside data->dev_mem in
ADM1266_BLACKBOX_SIZE (64-byte) strides while the helper is willing to
write up to ADM1266_PMBUS_BLOCK_MAX (255) bytes. A device that returns
more than 64 bytes on the trailing record (read_buff offset 1984 in
the 2048-byte dev_mem allocation) overflows dev_mem by up to 191 bytes
before the post-call
if (ret != ADM1266_BLACKBOX_SIZE)
return -EIO;
can reject the response.
Contain the fix in the caller without changing the helper signature:
read each record into a 255-byte local bounce buffer that matches the
helper's maximum output, validate the returned length, and only then
copy exactly ADM1266_BLACKBOX_SIZE bytes into the dev_mem slot.
hwmon: (pmbus/adm1266) include adapter number in GPIO line label
Platforms that fit more than one ADM1266 on different I2C buses at
the same 7-bit slave address (a common shelf-management pattern,
e.g. one device per power domain) end up with duplicate GPIO line
labels because the existing format only includes the slave address.
Including the adapter number disambiguates them.
The adapter number is formatted as decimal to match the i2c-N
convention used elsewhere in Linux (sysfs paths, dev nodes); the
slave address keeps its conventional hexadecimal form.
The label is purely informational (visible via gpioinfo and the
gpiochip /sys/class/gpio name); no DT or ABI consumer parses it.
Shuicheng Lin [Thu, 14 May 2026 20:32:10 +0000 (20:32 +0000)]
drm/xe/oa: Fix exec_queue leak on width check in stream open
In xe_oa_stream_open_ioctl(), when param.exec_q->width > 1 the
function returns -EOPNOTSUPP directly, skipping the existing
err_exec_q cleanup path. The exec_queue reference obtained by
xe_exec_queue_lookup() is leaked.
The exec queue holds a reference on the xe_file, which is only
dropped during queue teardown. The leaked lookup ref is not on
the file's exec_queue xarray, so file close cannot release it.
This keeps both the exec queue and the file private state pinned
indefinitely.
Jump to err_exec_q instead of returning directly so the reference
is released.
but read_buf in struct adm1266_data is declared as
u8 read_buf[ADM1266_PMBUS_BLOCK_MAX + 1];
For a max-length block response (length byte = 255 + up to 1 PEC
byte), the i2c controller is told to write 257 bytes into a 256-byte
buffer, putting one byte past the end of read_buf. The same response
also makes the subsequent PEC compare
if (crc != msgs[1].buf[msgs[1].buf[0] + 1])
read a byte beyond the array.
Bump the read_buf declaration to ADM1266_PMBUS_BLOCK_MAX + 2 so the
buffer can hold the length byte, up to 255 payload bytes, and the PEC
byte the i2c_msg length already accounts for.
adm1266_nvmem_read_blackbox() loops over a record_count that comes
straight from byte 3 of the BLACKBOX_INFO response. The destination
buffer is data->dev_mem, sized for the nvmem cell's declared 2048
bytes (ADM1266_BLACKBOX_MAX_RECORDS * ADM1266_BLACKBOX_SIZE = 32 * 64).
A device that reports a record_count greater than 32 -- whether due
to firmware bugs, bus corruption, or a non-responsive slave returning
0xff -- would walk read_buff past the end of the dev_mem allocation
on the trailing iterations.
Cap record_count at ADM1266_BLACKBOX_MAX_RECORDS (introduced here)
before entering the loop and return -EIO on any larger value, so a
malformed BLACKBOX_INFO response cannot drive the loop out of bounds.
hwmon: (pmbus/adm1266) widen blackbox-info buffer to I2C_SMBUS_BLOCK_MAX
adm1266_nvmem_read_blackbox() declares a 5-byte stack buffer and
passes it to i2c_smbus_read_block_data() to retrieve the 4-byte
BLACKBOX_INFO response. i2c_smbus_read_block_data() does not honour
caller buffer sizes -- it memcpy()s data.block[0] bytes from the
SMBus transaction (where data.block[0] is the length byte returned by
the slave device, up to I2C_SMBUS_BLOCK_MAX = 32):
memcpy(values, &data.block[1], data.block[0]);
If the device returns any block length above 5, the call overflows
the caller's 5-byte stack buffer before the post-call
if (ret != 4)
return -EIO;
check has a chance to reject the response.
Widen the local buffer to I2C_SMBUS_BLOCK_MAX so the helper has room
for any well-formed SMBus block response, matching the convention used
by the other i2c_smbus_read_block_data() callers in this driver.
hwmon: (pmbus/adm1266) seed timestamp from the real-time clock
adm1266_set_rtc() seeds the chip's SET_RTC register from
ktime_get_seconds(), which returns CLOCK_MONOTONIC -- i.e. seconds
since the host last booted, not seconds since the Unix epoch.
The chip stamps that value into every blackbox record it captures.
Userspace reading those timestamps back expects wall-clock seconds:
that's what the SET_RTC frame layout documents (datasheet Rev. D,
Table 84) and what every other consumer of "seconds since epoch"
assumes. Seeding from CLOCK_MONOTONIC gives blackbox records a
timestamp that is only meaningful within a single boot of the host
and silently resets to small values on every reboot.
Switch to ktime_get_real_seconds() so the seed matches what the
register is documented to hold.
The EC signature check uses && instead of || between the four
byte comparisons. With &&, the condition is true only when ALL
four bytes fail to match simultaneously, meaning the driver
accepts a device as a valid Microchip EC if ANY single byte of
the 4-byte "MCHP" signature happens to match.
Due to short-circuit evaluation, if the first byte reads back as
'M' (0x4D, a very common register value), the remaining three
comparisons are skipped entirely and the device is accepted.
Change && to || so the check rejects devices that do not fully
match the expected EC signature, as originally intended.
Kean Ren [Thu, 21 May 2026 03:52:27 +0000 (11:52 +0800)]
hwmon: (lenovo-ec-sensors): Convert to devm_request_region()
Replace manual request_region()/release_region() with
devm_request_region(). This lets the device-managed framework
handle I/O region lifetime automatically and fixes:
- A double release_region() when probe fails after acquiring the
I/O region: the probe error path releases it, and then
lenovo_ec_init() releases it again on the same error path.
- A release-after-use window in lenovo_ec_exit() where
release_region() was called before platform_device_unregister(),
leaving the hwmon device active with a released I/O region.
- Missing release_region() in lenovo_ec_probe() if
devm_hwmon_device_register_with_info() fails.
Remove all four manual release_region() calls that are now handled
automatically and replace request_region with
devm_request_region, use dev_err replace pr_err.
Also remove the now-unnecessary braces around the single-statement
if body.
Chen-Yu Tsai [Thu, 26 Mar 2026 09:30:38 +0000 (17:30 +0800)]
drm/exynos/dma: Drop iommu_dma_init_domain() stub
Commit 1feda5eb77fc ("drm/exynos: Use selected dma_dev default iommu
domain instead of a fake one") removed the code around creating a
custom IOMMU domain, but forgot to remove the stub.
Remove the iommu_dma_init_domain() stub as the function is no longer
referenced, and was also made private to the IOMMU DMA code.
Myeonghun Pak [Fri, 24 Apr 2026 13:21:31 +0000 (22:21 +0900)]
HID: u2fzero: free allocated URB on probe errors
u2fzero_fill_in_urb() allocates dev->urb with usb_alloc_urb(), but
u2fzero_probe() ignored its return value and only freed the URB from
u2fzero_remove().
If LED or hwrng registration fails after the URB allocation, probe returns
an error and the driver core does not call .remove(), leaking the URB. A
failed URB setup was also allowed to continue probing with an unusable
device.
Check the URB setup result and add the missing probe-error unwind so the
URB is freed before returning from later errors.
Signed-off-by: Myeonghun Pak <mhun512@gmail.com> Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Merge patch series "OPENAT2_REGULAR flag support for openat2"
Dorjoy Chowdhury <dorjoychy111@gmail.com> says:
I came upon this "Ability to only open regular files" uapi feature
suggestion from
https://uapi-group.org/kernel-features/#ability-to-only-open-regular-files
and thought it would be something I could do as a first patch and get to
know the kernel code a bit better.
The following filesystems have been tested by building and booting the
kernel x86 bzImage in a Fedora 43 VM in QEMU. I have tested with
OPENAT2_REGULAR that regular files can be successfully opened and
non-regular files (directory, fifo etc) return -EFTYPE.
- btrfs
- NFS (loopback)
- SMB (loopback)
Christian Brauner (Amutable) <brauner@kernel.org>:
All atomic_open implementations were audited for OPENAT2_REGULAR handling.
Explicit checks were added to ceph, gfs2, nfs (v4), and cifs/smb — these
are the filesystems whose atomic_open can encounter an existing non-regular
file and would otherwise call finish_open() on it or return a misleading
error code. The checks allow these filesystems to return -EFTYPE directly
and avoid unnecessary open+close round-trips.
The remaining implementations (9p, fuse, vboxsf, nfs v2/v3) don't need
explicit checks. They only call finish_open() on freshly created files
(always S_IFREG) and use finish_no_open() for lookup hits, letting the
VFS catch non-regular files via the do_open() safety net. Notably, fuse
also validates the server response (S_ISREG check on the reply) before
reaching finish_open().
* patches from https://patch.msgid.link/20260328172314.45807-1-dorjoychy111@gmail.com:
kselftest/openat2: test for OPENAT2_REGULAR flag
openat2: new OPENAT2_REGULAR flag support
Dorjoy Chowdhury [Sat, 16 May 2026 14:43:11 +0000 (16:43 +0200)]
kselftest/openat2: test for OPENAT2_REGULAR flag
Just a happy path test.
Christian Brauner <brauner@kernel.org> says:
Update OPENAT2_REGULAR fallback define to match upper-32-bit UAPI value.
Port the test to the kselftest_harness TEST*/FIXTURE framework to match
the migrated openat2_test.c, and add a regression test ensuring
open()/openat() keep ignoring the internal __O_REGULAR carrier bit.
Dorjoy Chowdhury [Sat, 16 May 2026 14:42:39 +0000 (16:42 +0200)]
openat2: new OPENAT2_REGULAR flag support
This flag indicates the path should be opened if it's a regular file.
This is useful to write secure programs that want to avoid being
tricked into opening device nodes with special semantics while thinking
they operate on regular files. This is a requested feature from the
uapi-group[1].
The previously introduced EFTYPE error code is returned when the path
doesn't refer to a regular file. For example, if openat2 is called on
path /dev/null with OPENAT2_REGULAR in the flag param, it will return
-EFTYPE.
When used in combination with O_CREAT, either the regular file is
created, or if the path already exists, it is opened if it's a regular
file. Otherwise, -EFTYPE is returned.
When OPENAT2_REGULAR is combined with O_DIRECTORY, -EINVAL is returned
as it doesn't make sense to open a path that is both a directory and a
regular file.
The UAPI bit lives in the upper 32 bits of open_how::flags
(((__u64)1 << 32)) so that open(2) and openat(2) -- whose @flags
argument is a C int -- cannot physically express it. This is a
structural guarantee, not a runtime mask: the bit is unrepresentable in
32 bits.
Because the rest of the VFS open path narrows to 32 bits in several
places (op->open_flag, f->f_flags, the unsigned open_flag argument of
i_op->atomic_open()), build_open_flags() translates OPENAT2_REGULAR
into a kernel-internal lower-32-bit carrier __O_REGULAR (bit 4, unused
as an O_* on every architecture) before the assignment to op->open_flag.
__O_REGULAR then rides through the existing channels exactly like
__FMODE_EXEC. do_dentry_open() strips it so it cannot leak back to
userspace via fcntl(F_GETFL).
Four BUILD_BUG_ON_MSG() invariants in build_open_flags() prevent any
future bit collision or accidental low-32 redefinition:
- VALID_OPEN_FLAGS fits in 32 bits.
- OPENAT2_REGULAR lives in the upper 32 bits.
- OPENAT2_REGULAR does not alias any open()/openat() flag.
- __O_REGULAR does not alias any user-visible flag.
Move OPENAT2_REGULAR to the upper 32 bits of open_how::flags with a
kernel-internal __O_REGULAR carrier so that open(2)/openat(2) cannot
encode the flag; add BUILD_BUG_ON_MSG() invariants and register
__O_REGULAR in the fcntl_init() allocation-uniqueness BUILD_BUG_ON()
(bit count 21 -> 22).
Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com> Link: https://patch.msgid.link/20260328172314.45807-2-dorjoychy111@gmail.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Aleksa Sarai <aleksa@amutable.com> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
ASoC: cs35l56: Fix flushing of IRQ work in cs35l56_sdw_remove()
Use flush_work() instead of cancel_work_sync() to terminate pending IRQ
work in cs35l56_sdw_remove(). And flush_work() again after masking the
interrupts to flush any queueing that was racing with the masking. This is
the same sequence as cs35l56_sdw_system_suspend().
cs35l56_sdw_interrupt() takes the pm_runtime to prevent the bus powering-
down before the interrupt status can be read and handled. The work releases
this pm_runtime. So cancelling it, instead of flushing, could leave an
unbalanced pm_runtime.
Huacai Chen [Thu, 21 May 2026 12:58:40 +0000 (20:58 +0800)]
LoongArch: Remove unused code to avoid build warning
After commit feee6b2989165631b1 ("mm/memory_hotplug: shrink zones when
offlining memory"), __remove_pages() doesn't need the "zone" parameter
so the "page" variable is also unused. Remove the unused code to avoid
such build warning:
arch/loongarch/mm/init.c: In function 'arch_remove_memory':
arch/loongarch/mm/init.c:134:22: warning: variable 'page' set but not used [-Wunused-but-set-variable=]
134 | struct page *page = pfn_to_page(start_pfn);
WANG Rui [Thu, 21 May 2026 12:58:36 +0000 (20:58 +0800)]
LoongArch: Avoid initrd overlap during kernel relocation
Validate the relocation address against the initrd region specified via
"initrd=" or "initrdmem=" on the command line. Reject relocation targets
that overlap the initrd to prevent memory corruption during early boot.
WANG Rui [Thu, 21 May 2026 12:58:36 +0000 (20:58 +0800)]
LoongArch: Skip relocation-time KASLR if already applied
When the kernel is relocated during early boot (efistub or kexec_file),
a randomized load address may has already been selected and applied. In
this case, performing KASLR again in relocate.c is unnecessary.
Note: strictly-defined KASLR means the kernel's final runtime address
has a random offset from the kernel's load address, which is implemented
in relocate.c; broadly-defined KALSR means the kernel's final runtime
address has a random offset from the kernel's link address (a.k.a.
VMLINUX_LOAD_ADDRESS), which also include the efistlub implementation,
kexec_file implementation and QEMU direct kernel boot. kaslr_disabled()
return true only means strictly-defined KASLR is disabled.
WANG Rui [Thu, 21 May 2026 12:58:36 +0000 (20:58 +0800)]
efi/loongarch: Randomize kernel preferred address for KASLR
Introduce efi_get_kimg_kaslr_address() helper to compute the preferred
kernel image load address dynamically when CONFIG_RANDOMIZE_BASE is
enabled. The function derives a random offset by using the EFI-provided
randomness combined with the timer tick value, and constrains it within
CONFIG_RANDOMIZE_BASE_MAX_OFFSET.
Update EFI_KIMG_PREFERRED_ADDRESS to call this helper so that the EFI
stub can select a randomized load address when KASLR is active, while
preserving the original base address behavior when KASLR is disabled or
"nokaslr" is specified.
Note: LoongArch can't KASLR for hibernation, so set efi_nokaslr to true
if "resume=<devname>" is explicitly specified in cmdline.
tracing: Fix unload_page for simple_ring_buffer init rollback
The unload_page callback expects the return value of load_page() as its
argument: ret = load_page(va); unload(ret). Fix the rollback code in
simple_ring_buffer_init_mm() where the descriptor's VA is used instead
of the loaded page address.
David Carlier [Tue, 12 May 2026 13:54:20 +0000 (14:54 +0100)]
tracing: Fix nr_subbufs initialization in simple_ring_buffer_init_mm()
nr_subbufs in the ring buffer metadata is always initialized to zero
because it is assigned from cpu_buffer->nr_pages before the page
initialization loop has run. While nr_subbufs is not currently read
by the kernel, it should reflect the actual buffer geometry in the
meta page for correctness.
Move the assignment after the page loop so that cpu_buffer->nr_pages
holds the final count.
Link: https://patch.msgid.link/20260512135420.99194-1-devnexen@gmail.com Fixes: 34e5b958bdad ("tracing: Introduce simple_ring_buffer") Reviewed-by: Vincent Donnefort <vdonnefort@google.com> Assisted-by: Claude:claude-opus-4-7 Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ring-buffer: Flush and stop persistent ring buffer on panic
On real hardware, panic and machine reboot may not flush hardware cache
to memory. This means the persistent ring buffer, which relies on a
coherent state of memory, may not have its events written to the buffer
and they may be lost. Moreover, there may be inconsistency with the
counters which are used for validation of the integrity of the
persistent ring buffer which may cause all data to be discarded.
To avoid this issue, stop recording of the ring buffer on panic and
flush the cache of the ring buffer's memory.
Fixes: e645535a954a ("tracing: Add option to use memmapped memory for trace boot instance") Cc: stable@vger.kernel.org Cc: Will Deacon <will@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ian Rogers <irogers@google.com> Link: https://patch.msgid.link/177751969602.2136606.12031934362587643488.stgit@mhiramat.tok.corp.google.com Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Thu, 21 May 2026 02:08:01 +0000 (22:08 -0400)]
ring-buffer: Fix reporting of missed events in iterator
When tracing is active while reading the trace file, if the iterator
reading the buffer detects that the writer has passed the iterator head,
it will reset and set a "missed events" flag. This flag is passed to the
output processing to show the user that events were missed:
CPU:4 [LOST EVENTS]
The problem is that the flag is reset after it is checked in
ring_buffer_iter_dropped(). But the "trace" file iterates over all the CPU
ring buffers and it will check if they are dropped when figuring out which
buffer to print next. This prematurely clears the missed_events flag if
the CPU buffer with the missed events is not the one that is printed next.
On the iteration where the CPU buffer with the missed events is printed,
the check if it had missed events would return false and the output does
not show that events were missed.
Do not reset the missed_events flag when checking if there were missed
events, but instead clear it when moving the iterator head to the next
event.
When a buffer is freed either by LRU eviction or because it is unset,
the lockref is marked as dead instantly, which prevents the buffer from
being used after finding it in the buffer hash in xfs_buf_lookup and
xfs_buf_find_insert. But the latter will then not add the new buffer to
the hash because it already found an existing buffer.
Fix this using in two places: Remove the buffer from the hash before
marking the lockref dead so that that no buffer with a dead lockref can
be found in the hash, but if we find one in xfs_buf_find_insert due to
store reordering, handle this case correctly instead of returning an
unhashed buffer.
Fixes: 67fe4303972e ("xfs: don't keep a reference for buffers on the LRU") Reported-by: Andrey Albershteyn <aalbersh@redhat.com> Reported-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Mateusz Guzik [Sat, 16 May 2026 02:18:52 +0000 (04:18 +0200)]
fs/pipe: write to ->poll_usage only once
Both GNU and BSD makes share a "token pipe" between their instances, as
a result a -j $BIGNUM invocation results in multicore perf problems in
the poll handler.
Avoiding the store will reduce it a little bit. However, the crux of the
problem is the locked queuing up in poll_wait().
Merge patch series "fix crashes when mounting legacy file system with sector size > PAGE_SIZE"
Christoph Hellwig <hch@lst.de> says:
Due to an almost comical failure on my part, my work in progress test
case failed to create any file system on a 64k block size loop device,
and then tried to mount it, leading to a probe of file system built
into my kernel. Roughly the first half of the series are file systems
that actually crashed, but I fixed up all the pattern of missing
error handling that I saw.
Amir Goldstein [Wed, 22 Apr 2026 12:52:12 +0000 (14:52 +0200)]
docs: add guidelines for submitting new filesystems
This document is motivated by the ongoing maintenance burden that
abandoned and untestable filesystems impose on VFS developers, blocking
infrastructure changes such as folio conversions and iomap migration.
This week alone, two new filesystems were proposed on linux-fsdevel
(VMUFAT and FTRFS), highlighting the need for documented guidelines
that new filesystem authors can refer to before submission.
Multiple recent discussions on linux-fsdevel have touched on the
criteria for merging new filesystems and for deprecating old ones,
covering topics such as modern VFS interface adoption, testability,
userspace utilities, maintainer commitment, and user base viability.
Add Documentation/filesystems/adding-new-filesystems.rst describing
the technical requirements and community expectations for merging a
new filesystem into the kernel. The guidelines cover:
- Alternatives to consider before proposing a new in-kernel filesystem
- Technical requirements: modern VFS interfaces (iomap, folios,
fs_context mount API), testability, and userspace utilities
- Community expectations: identified maintainers, demonstrated
commitment, sustained backing, and a clear user base
- Ongoing obligations after merging, including the risk of deprecation
for unmaintained filesystems
Jeff Layton [Wed, 22 Apr 2026 11:29:48 +0000 (07:29 -0400)]
dcache: add extra sanity checks of the dentry in dentry_free()
If d_flags isn't what we expect, then it's good to display it. Add a new
DENTRY_WARN_ONCE() macro that also displays d_flags for the dentry.
Change D_FLAG_VERIFY() to call that instead of a generic WARN_ON_ONCE().
Change the existing hlist_unhashed() check in dentry_free() to use the
new macro, and add checks for other invariants of a dead dentry. Notably:
1) Ensure that DCACHE_LRU_LIST and DCACHE_SHRINK_LIST are not set.
Li RongQing [Fri, 10 Apr 2026 08:09:18 +0000 (04:09 -0400)]
fs/coredump: reduce redundant log noise in validate_coredump_safety
Currently, writing to 'core_pattern' or 'suid_dumpable' sysctl nodes
always triggers validate_coredump_safety(), even if the values have
not changed. This results in redundant warning messages in dmesg:
"Unsafe core_pattern used with fs.suid_dumpable=2..."
This patch optimizes the procfs handlers to only invoke the safety
validation when an actual change in the configuration is detected:
1. In proc_dostring_coredump(), compare the new core_pattern string
with the existing one using strncmp().
2. In proc_dointvec_minmax_coredump(), check if the new suid_dumpable
value differs from the previous one.
This keeps the kernel log clean from repetitive warnings when
re-applying the same sysctl settings.
struct i2c_device_id {
char name[I2C_NAME_SIZE];
- kernel_ulong_t driver_data; /* Data private to the driver */
+ union {
+ /* Data private to the driver */
+ kernel_ulong_t driver_data;
+ const void *driver_data_ptr;
+ };
};
/* pci_epf */
and this requires that .driver_data is assigned via a named initializer
for static data. This requirement isn't a bad one because named
initializers are also much better readable than list initializers.
The union added to struct i2c_device_id enables further cleanups like:
diff --git a/drivers/iio/accel/kxcjk-1013.c b/drivers/iio/accel/kxcjk-1013.c
index 8a082ff034dd..b2aac7348d22 100644
--- a/drivers/iio/accel/kxcjk-1013.c
+++ b/drivers/iio/accel/kxcjk-1013.c
@@ -1429,7 +1429,7 @@ static int kxcjk1013_probe(struct i2c_client *client)
that are an improvement for readability (again!) and it keeps some
properties of the pointers (here: being const) without having to pay
attention for that. (I didn't find a good example in sound/soc, so an
iio driver was used to demonstrate the gain.)
My additional motivation for this effort is CHERI[1]. This is a hardware
extension that uses 128 bit pointers but unsigned long is still 64 bit.
So with CHERI you cannot store pointers in unsigned long variables.
The first patch drops a few empty remove callbacks that I found while
working on patch #2. The second converts all hwmon drivers to use named
initializers.
ASoC: Use named initializers for arrays of i2c_device_data
While being less compact, using named initializers allows to more easily
see which members of the structs are assigned which value without having
to lookup the declaration of the struct. And it's also more robust
against changes to the struct definition.
The mentioned robustness is relevant for a planned change to struct
i2c_device_id that replaces .driver_data by an anonymous union.
While touching all these arrays, unify indention and usage of commas.
This patch doesn't modify the compiled arrays, only their representation
in source form benefits. The former was confirmed with x86 and arm64
builds.
Johan Hovold [Thu, 21 May 2026 07:38:16 +0000 (09:38 +0200)]
spi: fix controller registration API inconsistency
The SPI controller API is asymmetric in that a controller is allocated
and registered in two step, while it is freed as part of deregistration.
[1]
This is especially unfortunate as any driver data is freed along with
the controller, something which has lead to use-after-free bugs during
deregistration when drivers tear down resources after deregistering the
controller (or tear down resources that may still be in use before
deregistering the controller in an attempt to work around the API).
To reduce the risk of such bugs being introduced a device managed
allocation interface was added, but this arguably made things even less
consistent as now whether the controller gets freed as part of
deregistration depends on how it was allocated. [2][3]
With most drivers converted to use managed allocation in preparation for
fixing the API, the remaining 16 drivers can be converted in one
tree-wide change. Ten of those drivers use the bitbang interface and can
be converted by simply removing the extra reference already taken by
spi_bitbang_start() (and updating the two bitbang drivers that use
managed allocation). [4]
Fix the API inconsistency by no longer dropping a reference when
deregistering non-devres allocated controllers.
[1] 68b892f1fdc4 ("spi: document odd controller reference handling")
[2] 5e844cc37a5c ("spi: Introduce device-managed SPI controller allocation")
[3] 3f174274d224 ("spi: fix misleading controller deregistration kernel-doc")
[4] 702a4879ec33 ("spi: bitbang: Let spi_bitbang_start() take a reference to master")
====================
vsock/virtio: fix skb overhead accounting to preserve full buf_alloc
Patch 1 resets the connection when we can no longer queue packets,
this prevents silent data loss, and both peers are notified.
Patch 2 increases the total budget to `buf_alloc * 2` for payload
plus skb overhead similar to how SO_RCVBUF is doubled to reserve
space for sk_buff metadata. This preserves the full buf_alloc for
payload under normal operation, while still bounding the skb queue
growth.
In the future, we plan to improve how we handle the merging of packets
to minimize overhead and avoid closing connections.
vsock/virtio: fix skb overhead accounting to preserve full buf_alloc
After commit 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb
queue"), virtio_transport_inc_rx_pkt() subtracts per-skb overhead from
buf_alloc when checking whether a new packet fits. This reduces the
effective receive buffer below what the user configured via
SO_VM_SOCKETS_BUFFER_SIZE, causing legitimate data packets to be
silently dropped and applications that rely on the full buffer size
to deadlock.
Also, the reduced space is not communicated to the remote peer, so
its credit calculation accounts more credit than the receiver will
actually accept, causing data loss (there is no retransmission).
With this approach we currently have failures in
tools/testing/vsock/vsock_test.c. Test 18 sometimes fails, while
test 22 always fails in this way:
18 - SOCK_STREAM MSG_ZEROCOPY...hash mismatch
Fix by allowing at most `buf_alloc * 2` as the total budget for payload
plus skb overhead in virtio_transport_inc_rx_pkt(), similar to how
SO_RCVBUF is doubled to reserve space for sk_buff metadata.
This preserves the full buf_alloc for payload under normal operation,
while still bounding the skb queue growth.
With this patch, all tests in tools/testing/vsock/vsock_test.c are
now passing again.
vsock/virtio: reset connection on receiving queue overflow
When there is no more space to queue an incoming packet, the packet is
silently dropped. This causes data loss without any notification to
either peer, since there is no retransmission.
Under normal circumstances, this should never happen. However, it could
happen if the other peer doesn't respect the credit, or if the skb
overhead, which we recently began to take into account with commit 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb queue"),
is too high.
Fix this by resetting the connection and setting the local socket error
to ENOBUFS when virtio_transport_recv_enqueue() can no longer queue a
packet, so both peers are explicitly notified of the failure rather than
silently losing data.
Fixes: ae6fcfbf5f03 ("vsock/virtio: discard packets if credit is not respected") Cc: stable@vger.kernel.org Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260518090656.134588-2-sgarzare@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Peter Ujfalusi [Wed, 20 May 2026 14:13:49 +0000 (17:13 +0300)]
ASoC: SOF: ipc4-control: Use local copy of IPC message for sending
If a kcontrol update comes to a control right at the same time when the
PCM containing the control is started up then there is a small window when
a race can happen:
while the widget is set up the swidget->setup_mutex is taken in topology
level and if the control update comes at this point, it will be stopped
within sof_ipc4_set_get_kcontrol_data() with the mutex and it will be
blocked until the swidget setup is done, which will clear the control's
IPC message payload.
To avoid such race, use local copy of the template IPC message instead of
the template directly. This will ensure data integrity in case of
concurrent updates during initialization.
====================
Add preliminary NETC switch support for i.MX94
i.MX94 NETC (v4.3) integrates 802.1Q Ethernet switch functionality, the
switch provides advanced QoS with 8 traffic classes and a full range of
TSN standards capabilities. It has 3 user ports and 1 CPU port, and the
CPU port is connected to an internal ENETC through the pseduo link, so
instead of a back-to-back MAC, the lightweight "pseudo MAC" is used at
both ends of the pseudo link to transfer Ethernet frames. The pseudo
link provides a zero-copy interface (no serialization delay) and lower
power (less logic and memory).
Like most Ethernet switches, the NETC switch also supports a proprietary
switch tag, is used to carry in-band metadata information about frames.
This in-band metadata information can include the source port from which
the frame was received, what was the reason why this frame got forwarded
to the entity, and for the entity to indicate the precise destination
port of a frame. The NETC switch tag is added to frames after the source
MAC address. There are three types of switch tags, and each type has 1
to 4 subtypes, more details are as follows.
Forward switch tag (Type = 0): Represents forwarded frames.
- SubType = 0 - Normal frame processing.
To_Port switch tag (Type = 1): Represents frames that are to be sent to
a specific switch port.
- SubType = 0. No request to perform timestamping.
- SubType = 1. Request to perform one-step timestamping.
- SubType = 2. Request to perform two-step timestamping.
- SubType = 3. Request to perform both one-step timestamping and
two-step timestamping.
To_Host switch tag (Type = 2): Represents frames redirected or copied to
the switch management port.
- SubType = 0. Received frames redirected or copied to the switch
management port.
- SubType = 1. Received frames redirected or copied to the switch
management port with captured timestamp at the switch port where
the frame was received.
- SubType = 2. Transmit timestamp response (two-step timestamping).
Currently, this patch set supports Forward tag, SubType 0 of To_Port tag
and SubType 0 of To_Host tag. More tags will be supported in the future.
In addition, the switch supports NETC Table Management Protocol (NTMP),
some switch functionality is controlled using control messages sent to
the hardware using BD ring interface with 32B descriptors similar to the
packet Transmit BD ring used on ENETC. This interface is referred to as
the command BD ring. This is used to configure functionality where the
underlying resources may be shared between different entities or being
too large to configure using direct registers.
For this patch set, we have supported the following tables through the
command BD ring interface.
FDB Table: It contains forwarding and/or filtering information about MAC
addresses. The FDB table is used for MAC learning lookups and MAC
forwarding lookups.
VLAN Filter Table: It contains configuration and control information for
each VLAN configured on the switch.
Buffer Pool Table: It contains buffer pool configuration and operational
information. Each entry corresponds to a buffer pool. Currently, we use
this table to implement flow control feature on each port.
Ingress Port Filter Table: It contains a set of filters each capable of
classifying incoming traffic using a mix of L2, L3, and L4 parsed and
arbitrary field data. We use this table to implement host flood support
to the switch port.
The switch also supports other tables, and we will add more advanced
features through them in the future.
====================
Wei Fang [Mon, 18 May 2026 08:25:06 +0000 (16:25 +0800)]
net: dsa: netc: add support for ethtool private statistics
Implement the ethtool private statistics interface to expose additional
port-level and MAC-level counters that are not covered by the standard
IEEE 802.3 statistics. The pMAC counters are only reported when the port
supports Frame Preemption (802.1Qbu/802.3br).
Note that although rtnl_link_stats64 provides some standard statistics
such as rx octets, rx frame errors, rx dropped packets, and tx packets,
these are overall port statistics. The NETC switch supports preemption
on each port, and each port has two MACs (eMAC and pMAC). The driver
private statistics are used to obtain statistics for each MAC, allowing
users to perform analysis and debugging.
Wei Fang [Mon, 18 May 2026 08:25:05 +0000 (16:25 +0800)]
net: dsa: netc: add support for the standardized counters
Each user port of the NETC switch supports 802.3 basic and mandatory
managed objects statistic counters and IETF Management Information
Database (MIB) package (RFC2665) and Remote Network Monitoring (RMON)
counters. And all of these counters are 64-bit registers. In addition,
some user ports support preemption, so these ports have two MACs, MAC
0 is the express MAC (eMAC), MAC 1 is the preemptible MAC (pMAC). So
for ports that support preemption, the statistics are the sum of the
pMAC and eMAC statistics.
Note that the current switch driver does not support preemption, all
frames are sent and received via the eMAC by default. The statistics
read from the pMAC should be zero.
Wei Fang [Mon, 18 May 2026 08:25:04 +0000 (16:25 +0800)]
net: dsa: netc: initialize buffer pool table and implement flow-control
The buffer pool is a quantity of memory available for buffering a group
of flows (e.g. frames having the same priority, frames received from the
same port), while waiting to be transmitted on a port. The buffer pool
tracks internal memory consumption with upper bound limits and optionally
a non-shared portion when associated with a shared buffer pool. Currently
the shared buffer pool is not supported, it will be added in the future.
For i.MX94, the switch has 4 ports and 8 buffer pools, so each port is
allocated two buffer pools. For frames with priorities of 0 to 3, they
will be mapped to the first buffer pool; For frames with priorities of
4 to 7, they will be mapped to the second buffer pool. Each buffer pool
has a flow control on threshold and a flow control off threshold. By
setting these threshold, add the flow control support to each port.
Wei Fang [Mon, 18 May 2026 08:25:03 +0000 (16:25 +0800)]
net: dsa: netc: add FDB, STP, MTU, port setup and host flooding support
Expand the NETC switch driver with several foundational features:
- FDB and MDB management
- STP state handling
- MTU configuration
- Port setup/teardown
- Host flooding support
At this stage, the driver operates only in standalone port mode. Each
port uses VLAN 0 as its PVID, meaning ingress frames are internally
assigned VID 0 regardless of whether they arrive tagged or untagged.
Note that this does not inject a VLAN 0 header into the frame, the VID
is used purely for subsequent VLAN processing within the switch.
Wei Fang [Mon, 18 May 2026 08:25:02 +0000 (16:25 +0800)]
net: dsa: netc: add phylink MAC operations
Different versions of NETC switches have different numbers of ports and
MAC capabilities. Add .phylink_get_caps() to struct netc_switch_info,
allowing each NETC switch version to implement its own callback for
obtaining MAC capabilities.
Implement the phylink_mac_ops callbacks: .mac_config(), .mac_link_up(),
and .mac_link_down(). Note that flow-control configuration is not yet
supported in .mac_link_up(), but will be implemented in a subsequent
patch.
Wei Fang [Mon, 18 May 2026 08:25:01 +0000 (16:25 +0800)]
net: dsa: netc: introduce NXP NETC switch driver for i.MX94
For i.MX94 series, the NETC IP provides full 802.1Q Ethernet switch
functionality, advanced QoS with 8 traffic classes, and a full range of
TSN standards capabilities. The switch has 3 user ports and 1 CPU port,
the CPU port is connected to an internal ENETC. Since the switch and the
internal ENETC are fully integrated within the NETC IP, no back-to-back
MAC connection is required. Instead, a light-weight "pseudo MAC" is used
between the switch and the ENETC. This translates to lower power (less
logic and memory) and lower delay (as there is no serialization delay
across this link).
Introduce the initial NETC switch driver with basic probe and remove
functionality. More features will be added in subsequent patches.
Wei Fang [Mon, 18 May 2026 08:25:00 +0000 (16:25 +0800)]
net: dsa: add NETC switch tag support
The NXP NETC switch tag is a proprietary header added to frames after the
source MAC address. The switch tag has 3 types, and each type has 1 ~ 4
subtypes, the details are as follows.
Forward NXP switch tag (Type=0): Represents forwarded frames.
- SubType = 0 - Normal frame processing.
To_Port NXP switch tag (Type=1): Represents frames that are to be sent
to a specific switch port.
- SubType = 0. No request to perform timestamping.
- SubType = 1. Request to perform one-step timestamping.
- SubType = 2. Request to perform two-step timestamping.
- SubType = 3. Request to perform both one-step timestamping and
two-step timestamping.
To_Host NXP switch tag (Type=2): Represents frames redirected or copied
to the switch management port.
- SubType = 0. Received frames redirected or copied to the switch
management port.
- SubType = 1. Received frames redirected or copied to the switch
management port with captured timestamp at the switch port where
the frame was received.
- SubType = 2. Transmit timestamp response (two-step timestamping).
In addition, the length of different type switch tag is different, the
minimum length is 6 bytes, the maximum length is 14 bytes. Currently,
Forward tag, SubType 0 of To_Port tag and Subtype 0 of To_Host tag are
supported. More tags will be supported in the future.
Wei Fang [Mon, 18 May 2026 08:24:59 +0000 (16:24 +0800)]
net: enetc: add multiple command BD rings support
All the tables of NETC switch are managed through the command BD ring,
but unlike ENETC, the switch has two command BD rings, if the current
ring is busy, the switch driver can switch to another ring to manage
the table. Currently, the NTMP driver does not support multiple rings.
Therefore, update ntmp_select_and_lock_cbdr() to select a appropriate
ring to execute the command for the switch.
Wei Fang [Mon, 18 May 2026 08:24:58 +0000 (16:24 +0800)]
net: enetc: add support for "Add" and "Delete" operations to IPFT
The ingress port filter table (IPFT )contains a set of filters each
capable of classifying incoming traffic using a mix of L2, L3, and L4
parsed and arbitrary field data. As a result of a filter match, several
actions can be specified such as on whether to deny or allow a frame,
overriding internal QoS attributes associated with the frame and setting
parameters for the subsequent frame processing functions, such as stream
identification, policing, ingress mirroring. Each entry corresponds to a
filter. The ingress port filter entries are added using a precedence
value. If a frame matches multiple entries, the entry with the higher
precedence is used. Currently, this patch only adds "Add" and "Delete"
operations to the ingress port filter table. These two interfaces will
be used by both ENETC driver and NETC switch driver.
Wei Fang [Mon, 18 May 2026 08:24:57 +0000 (16:24 +0800)]
net: enetc: add support for the "Update" operation to buffer pool table
The buffer pool table contains buffer pool configuration and operational
information. Each entry corresponds to a buffer pool. The Entry ID value
represents the buffer pool ID to access.
The buffer pool table is a static bounded index table, buffer pools are
always present and enabled. It only supports Update and Query operations,
This patch only adds ntmp_bpt_update_entry() helper to support updating
the specified entry of the buffer pool table. Query action to the table
will be added in the future.
Wei Fang [Mon, 18 May 2026 08:24:56 +0000 (16:24 +0800)]
net: enetc: add support for the "Add" operation to VLAN filter table
The VLAN filter table contains configuration and control information for
each VLAN configured on the switch. Each VLAN entry includes the VLAN
port membership, which FID to use in the FDB lookup, which spanning tree
group to use, the egress frame modification actions to apply to a frame
exiting form this VLAN, and various configuration and control parameters
for this VLAN.
The VLAN filter table can only be managed by the command BD ring using
table management protocol version 2.0. The table supports Add, Delete,
Update and Query operations. And the table supports 3 access methods:
Entry ID, Exact Match Key Element and Search. But currently we only add
the ntmp_vft_add_entry() helper to support the upcoming switch driver to
add an entry to the VLAN filter table. Other interfaces will be added in
the future.
Wei Fang [Mon, 18 May 2026 08:24:55 +0000 (16:24 +0800)]
net: enetc: add basic operations to the FDB table
The FDB table is used for MAC learning lookups and MAC forwarding lookups.
Each table entry includes information such as a FID and MAC address that
may be unicast or multicast and a forwarding destination field containing
a port bitmap identifying the associated port(s) with the MAC address.
FDB table entries can be static or dynamic. Static entries are added from
software whereby dynamic entries are added either by software or by the
hardware as MAC addresses are learned in the datapath.
The FDB table can only be managed by the command BD ring using table
management protocol version 2.0. Table management command operations Add,
Delete, Update and Query are supported. And the FDB table supports three
access methods: Entry ID, Exact Match Key Element and Search. This patch
adds the following basic supports to the FDB table.
ntmp_fdbt_update_entry() - update the configuration element data of a
specified FDB entry
ntmp_fdbt_delete_entry() - delete a specified FDB entry
ntmp_fdbt_add_entry() - add an entry into the FDB table
ntmp_fdbt_search_port_entry() - Search the FDB entry on the specified
port based on RESUME_ENTRY_ID.
Wei Fang [Mon, 18 May 2026 08:24:54 +0000 (16:24 +0800)]
net: enetc: add pre-boot initialization for i.MX94 switch
Before probing the NETC switch driver, some pre-initialization needs to
be set in NETCMIX and IERB to ensure that the switch can work properly.
For example, i.MX94 NETC switch has three external ports and each port
is bound to a link. And each link needs to be configured so that it can
work properly, such as I/O variant and MII protocol.
In addition, the switch port 2 (MAC 2) and ENETC 0 (MAC 3) share the same
parallel interface, they cannot be used at the same time due to the SoC
constraint. And the MAC selection is controlled by the mac2_mac3_sel bit
of EXT_PIN_CONTROL register. Currently, the interface is set for ENETC 0
by default unless the switch port 2 is enabled in the DT node.
Like ENETC, each external port of the NETC switch can manage its external
PHY through its port MDIO registers. And the port can only access its own
external PHY by setting the PHY address to the LaBCR[MDIO_PHYAD_PRTAD].
If the accessed PHY address is not equal to LaBCR[MDIO_PHYAD_PRTAD], then
the MDIO access initiated by port MDIO will be invalid.
Wei Fang [Mon, 18 May 2026 08:24:53 +0000 (16:24 +0800)]
dt-bindings: net: dsa: add NETC switch
Add bindings for NETC switch. This switch is a PCIe function of NETC IP,
it supports advanced QoS with 8 traffic classes and 4 drop resilience
levels, and a full range of TSN standards capabilities. The switch CPU
port connects to an internal ENETC port, which is also a PCIe function
of NETC IP. So these two ports use a light-weight "pseudo MAC" instead
of a back-to-back MAC, because the "pseudo MAC" provides the delineation
between switch and ENETC, this translates to lower power (less logic and
memory) and lower delay (as there is no serialization delay across this
link).
Wei Fang [Mon, 18 May 2026 08:24:52 +0000 (16:24 +0800)]
dt-bindings: net: dsa: update the description of 'dsa,member' property
The current description indicates that the 'dsa,member' property cannot
be set for a switch that is not part of any cluster. Vladimir thinks
that this is a case where the actual technical limitation was poorly
transposed into words when this restriction was first documented, in
commit 8c5ad1d6179d ("net: dsa: Document new binding").
The true technical limitation is that many DSA tagging protocols are
topology-unaware, and always call dsa_conduit_find_user() with a
switch_id of 0. Specifying a custom "dsa,member" property with a
non-zero switch_id would break them.
Therefore, for topology-aware switches, it is fine to specify this
property for them, even if they are not part of any cluster. Our NETC
switch is a good example which is topology-aware, the switch_id is
carried in the switch tag, but the switch_id 0 is reserved for VEPA
switch and cannot be used, so we need to use this property to assign
a non-zero switch_id for it.
Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Wei Fang <wei.fang@nxp.com> Acked-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20260518082506.1318236-2-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Anand Moon [Wed, 20 May 2026 04:40:41 +0000 (10:10 +0530)]
media: meson: vdec: Fix memory leak in error path of vdec_open
The vdec_open() function previously jumped directly to
err_m2m_release when vdec_init_ctrls() failed, skipping
release of the m2m context. This caused a resource leak.
Fix it by introducing a proper err_m2m_ctx_release label
that calls v4l2_m2m_ctx_release(sess->m2m_ctx) before
releasing the m2m device.
Signed-off-by: Maha Maryam Javaid <mahamaryamjavaid@gmail.com> Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Myeonghun Pak [Wed, 6 May 2026 12:41:16 +0000 (21:41 +0900)]
media: cedrus: clean up media device on probe failure
cedrus_probe() initializes the media device before registering the video
device, the media controller, and the media device. If any of those later
steps fails, probe returns without calling media_device_cleanup(), so the
media device internals initialized by media_device_init() are left behind.
Add a media-device cleanup label to the probe unwind path and route video
registration failures through it as well.
Fixes: 50e761516f2b8c ("media: platform: Add Cedrus VPU decoder driver") Cc: stable@vger.kernel.org Reviewed-by: Paul Kocialkowski <paulk@sys-base.io> Co-developed-by: Ijae Kim <ae878000@gmail.com> Signed-off-by: Ijae Kim <ae878000@gmail.com> Signed-off-by: Myeonghun Pak <mhun512@gmail.com> Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Sven Püschel [Wed, 20 May 2026 22:44:32 +0000 (00:44 +0200)]
media: rockchip: rga: add rga3 support
Add support for the RGA3 unit contained in the RK3588.
Only a basic feature set consisting of scaling and color conversion is
implemented. Currently unimplemented features include:
- Advanced formats like 10bit YUV, FBCE mode and Tile8x8 mode
- Background color (V4L2_CID_BG_COLOR)
- Configurable alpha value (V4L2_CID_ALPHA_COMPONENT)
- Image flipping (V4L2_CID_HFLIP and V4L2_CID_VFLIP)
- Image rotation (V4L2_CID_ROTATE)
- Image cropping/composing (VIDIOC_S_SELECTION)
- Only very basic output cropping for 1088 -> 1080 cases is implemented
The register address defines were copied from the
vendor Rockchip kernel sources and slightly adjusted to not start at 0
again for the cmd registers.
During testing it has been noted that the scaling of the hardware is
slightly incorrect. A test conversion of 128x128 RGBA to 256x256 RGBA
causes a slightly larger scaling. The scaling is suddle, as it seems
that the image is scaled to a 2px larger version and then cropped to
it's final size. Trying to use the RGA2 scaling factor calculation
didn't work. As the calculation matches the vendor kernel driver, no
further research has been utilized to check if there may be some kind of
better scaling factor calculation.
Furthermore comparing the RGA3 conversion with the GStreamer
videoconvertscale element, the chroma-site is different. A quick testing
didn't reveal a chroma-site that creates the same image with the
GStreamer Element. Also when converting from YUV to RGB the RGB values
differ by 1 or 2. This doesn't seem to be a colorspace conversion issue
but rather a slightly different precision on the calculation.
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Sven Püschel <s.pueschel@pengutronix.de> Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Sven Püschel [Wed, 20 May 2026 22:44:31 +0000 (00:44 +0200)]
media: rockchip: rga: disable multi-core support
Disable multi-core support in preparation of the RGA3 addition. The
RK3588 SoC features two equal RGA3 cores. This allows scheduling of the
work between both cores, which is not yet implemented. Until it is
implemented avoid exposing both cores as independent video devices to
prevent an ABI breakage when multi-core support is added.
This patch is copied from the Hantro driver patch to disable multi core
support by Sebastian Reichel. See
commit ccdeb8d57f7f ("media: hantro: Disable multicore support")
Sven Püschel [Wed, 20 May 2026 22:44:30 +0000 (00:44 +0200)]
media: rockchip: rga: add feature flags
In preparation to the RGA3 addition add feature flags, which can limit
the exposed feature set of the video device, like rotating or selection
support. This is necessary as the RGA3 doesn't initially implement the
full feature set currently exposed by the driver.
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Sven Püschel <s.pueschel@pengutronix.de> Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Sven Püschel [Wed, 20 May 2026 22:44:29 +0000 (00:44 +0200)]
media: rockchip: rga: move rga_fmt to rga-hw.h
Move rga_fmt to rga-hw in preparation of the RGA3 addition, as the struct
contains many RGA2 specific values. They are used to write the correct
register values quickly based on the chosen format. Therefore the
pointer to the rga_fmt struct is kept but changed to an opaque void
pointer outside of the rga-hw.h.
To enumerate and set the correct formats, two helper functions need to
be exposed in the rga_hw struct:
enum_format just get's the vidioc_enum_fmt format and it's return value
is also returned from vidioc_enum_fmt. This is a simple pass-through,
as the implementation is very simple.
adjust_and_map_format is a simple abstraction around the previous
rga_find_format. But unlike rga_find_format, it always returns a valid
format. Therefore the passed format value is also a pointer to update
it in case the values are not supported by the hardware.
Due to the RGA3 supporting different formats on the capture and output
side, an additional parameter is_capture has been added to support
this use-case. The additional ctx parameter is also added to allow the
RGA3 to limit the colorimetry only if an RGB<->YCrCb transformation
happens.
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Sven Püschel <s.pueschel@pengutronix.de> Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Sven Püschel [Wed, 20 May 2026 22:44:28 +0000 (00:44 +0200)]
media: rockchip: rga: remove stride from rga_frame
Remove the stride variable from rga_frame. Despite the comment it
didn't involve any calculation and is just a copy of the
plane_fmt[0].bytesperline value. Therefore avoid this struct member
and use the bytesperline value directly in the places where it is
required.
Also drop the dependency on the depth format member, which was only
used to calculate the stride of the default format. This is already done
by the v4l2_fill_pixfmt_mp_aligned helper and used as stride in try_fmt.
Therefore using it's value also for the default format stride is just
more consistent.
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Sven Püschel <s.pueschel@pengutronix.de> Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>