Remove any GFP_KERNEL arguments found in the new kmalloc_obj-family
helpers. This captures the script used in commit 189f164e573e ("Convert
remaining multi-line kmalloc_obj/flex GFP_KERNEL uses").
Linus Torvalds [Fri, 20 Mar 2026 16:58:56 +0000 (09:58 -0700)]
Merge tag 'io_uring-7.0-20260320' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fixes from Jens Axboe:
- A bit of a work-around for AF_UNIX recv multishot, as the in-kernel
implementation doesn't properly signal EOF. We'll likely rework this
one going forward, but the fix is sufficient for now
- Two fixes for incrementally consumed buffers, for non-pollable files
and for 0 byte reads
* tag 'io_uring-7.0-20260320' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/kbuf: propagate BUF_MORE through early buffer commit path
io_uring/kbuf: fix missing BUF_MORE for incremental buffers at EOF
io_uring/poll: fix multishot recv missing EOF on wakeup race
Thomas Weißschuh [Tue, 17 Mar 2026 08:40:36 +0000 (09:40 +0100)]
tools/nolibc: MIPS: fix clobbers of 'lo' and 'hi' registers on different ISAs
All MIPS ISAs before r6 use the 'lo' and 'hi' special registers.
These are clobbered by system calls and need to be marked as such to
avoid miscompilations. Currently nolibc ties the clobbers to the ABI.
But this is wrong and leads to ISA<->ABI combinations which are not
handled correctly, leading to compiler errors or miscompilations.
David Laight [Sun, 8 Mar 2026 11:37:42 +0000 (11:37 +0000)]
selftests/nolibc: Use printf variable field widths and precisions
Now that printf supports '*' for field widths and precisions, they
can be used to simplify the test output:
- aligning the "[OK]" strings.
- reporting the expected sprintf() output when there is a mismatch.
David Laight [Sun, 8 Mar 2026 11:37:41 +0000 (11:37 +0000)]
tools/nolibc/printf: Add support for octal output
Octal output isn't often used, but adding it costs very little.
Supporting "%#o" is mildly annoying: it has to add a leading '0' if
there isn't one present. In simple cases this is the same as adding a sign
of '0', but that adds an extra '0' in a few places.
So you need three tests: %o, '#', and no leading '0' (which can only be
checked after the zero padding for the precision).
If all the tests are deferred until after zero padding, then too many values
are 'live' across the call to _nolibc_u64toa_base() and get spilled to the stack.
Hence the check that ignores the 'sign' if it is the same as the first
character of the output string.
David Laight [Sun, 8 Mar 2026 11:37:38 +0000 (11:37 +0000)]
tools/nolibc/printf: Special case 0 and add support for %#x
The output for %#x is almost the same as that for %p, both output in
hexadecimal with a leading "0x".
However, for zero %#x should just output "0" (the same as decimal and octal).
For %p match glibc and output "(nil)" rather than "0x0" or "0".
Add tests for "%#x", "% d", "%+d" and passing NULL to "%p".
David Laight [Sun, 8 Mar 2026 11:37:36 +0000 (11:37 +0000)]
tools/nolibc/printf: Prepend sign to converted number
Instead of appending the converted number to the sign, convert first
and then prepend the sign (or "0x").
Use the length returned by u64toh_r() instead of calling strlen().
Needed so that zero padding can be inserted between the sign and digits
in an upcoming patch.
David Laight [Sun, 8 Mar 2026 11:37:34 +0000 (11:37 +0000)]
tools/nolibc/printf: Add support for length modifiers tzqL and formats iX
Length modifiers t (ptrdiff_t) and z (size_t) are aliases for l (long);
q and L are 64-bit, the same as j (intmax).
Format i is an alias for d, and X is similar to x but upper case.
Supporting them is mostly just adding the relevant bit to the bit
pattern used for matching characters.
Although %X is detected, the output will be lower case.
Change/add tests to use conversions i and X, and length modifiers L and ll.
Use the correct minimum value for "%Li".
Linus Torvalds [Fri, 20 Mar 2026 16:54:40 +0000 (09:54 -0700)]
Merge tag 'spi-fix-v7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"There's a couple of core fixes here from Johan, fixing a race
condition and an error handling path, plus a bunch of driver specific
fixups.
The Qualcomm issues could be nasty if you ran into them, especially
the DMA ordering one"
* tag 'spi-fix-v7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: geni-qcom: Check DMA interrupts early in ISR
spi: fix statistics allocation
spi: fix use-after-free on controller registration failure
spi: geni-qcom: Fix CPHA and CPOL mode change detection
spi: axiado: Fix double-free in ax_spi_probe()
spi: amlogic-spisg: Fix memory leak in aml_spisg_probe()
spi: amlogic: spifc-a4: Remove redundant clock cleanup
Linus Torvalds [Fri, 20 Mar 2026 16:52:45 +0000 (09:52 -0700)]
Merge tag 'regulator-fix-v7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"Just one fix here from Hugo Villeneuve, the documentation for some of
the regulator DT properties had been cut'n'pasted so that if anyone
actually read it they'd be informed that those properties had
completely incorrect meanings"
* tag 'regulator-fix-v7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: dt-bindings: fix typos in regulator-uv-* descriptions
Linus Torvalds [Fri, 20 Mar 2026 16:46:15 +0000 (09:46 -0700)]
Merge tag 'pmdomain-v7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain fixes from Ulf Hansson:
- bcm: increase ASB control timeout for bcm2835
- mediatek: fix power domain count
* tag 'pmdomain-v7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
pmdomain: bcm: bcm2835-power: Increase ASB control timeout
pmdomain: mediatek: Fix power domain count
David Laight [Sun, 8 Mar 2026 11:37:32 +0000 (11:37 +0000)]
tools/nolibc/printf: Use goto and reduce indentation
Upcoming changes will need to use goto to jump to the code that
outputs characters.
Use 'goto do_output' to output a known number of characters.
Use 'goto do_strlen_output' to output a '\0' terminated string.
Removes a level of indentation from the format processing code.
The change is best reviewed using 'git diff -b' after applying it.
David Laight [Sun, 8 Mar 2026 11:37:31 +0000 (11:37 +0000)]
tools/nolibc/printf: Simplify __nolibc_printf()
Move the check for the length modifiers into the format processing
between the field width and conversion specifier.
This lets the loop be simplified and a 'fast scan' for a format start
used.
If an error is detected (eg an invalid conversion specifier) then
copy the invalid format to the output buffer.
Reduces code size by about 10% on x86-64.
Some versions of gcc bloat this version by generating a jump table.
This all goes away in later patches.
David Laight [Sun, 8 Mar 2026 11:37:29 +0000 (11:37 +0000)]
tools/nolibc: Rename the 'errnum' parameter to strerror()
Change the parameter variable name from 'errno' to 'errnum'.
Matches any documentation and avoids any issues that might happen
if errno is actually a #define (which is not uncommon).
David Laight [Sun, 8 Mar 2026 11:37:28 +0000 (11:37 +0000)]
tools/nolibc: Implement strerror() in terms of strerror_r()
strerror() can be the only part of a program that has a .data section.
This requires 4k in the program file.
Add a simple implementation of strerror_r() and use that in strerror()
so that the "errno=" string is copied at run-time.
Use __builtin_memcpy() because that optimises away the input string
and just writes the required constants to the target buffer.
Code size change largely depends on whether the inlining decision for
strerror() changes.
Change the tests to use the normal EXPECT_VFPRINTF() when testing %m.
Skip the tests when !is_nolibc.
David Laight [Mon, 2 Mar 2026 10:17:59 +0000 (10:17 +0000)]
selftests/nolibc: Let EXPECT_VFPRINTF() tests be skipped
Tests that check explicit nolibc behavior (eg "%m") or test places
where the nolibc behaviour deviates from the libc need skipping
when compiled to use the host libc.
David Laight [Mon, 2 Mar 2026 10:17:57 +0000 (10:17 +0000)]
selftests/nolibc: Use length of 'expected' string to check snprintf() output
Instead of requiring the test cases to specify both the length and the
expected output, take the length from the expected output.
Tests that expect the output to be truncated are changed to specify
the un-truncated output.
Change the strncmp() to a memcmp() with an extra check that the
output is actually terminated.
Append a '+' to the printed output (after the final ") when the output
is truncated.
David Laight [Mon, 2 Mar 2026 10:17:55 +0000 (10:17 +0000)]
selftests/nolibc: Return correct value when printf test fails
Correctly return 1 (the number of errors) when strcmp()
fails rather than the return value from strncmp() which is the
signed difference between the mismatching characters.
David Laight [Mon, 23 Feb 2026 10:17:20 +0000 (10:17 +0000)]
tools/nolibc: Optimise and common up the number to ascii functions
Implement u[64]to[ah]_r() using a common function that uses multiply
by reciprocal to generate the least significant digit first and then
reverses the string.
On 32bit this is five multiplies (with 64bit product) for each output
digit. I think the old utoa_r() always did 36 multiplies and a lot
of subtracts - so this is likely faster even for 32bit values.
Definitely better for 64bit values (especially small ones).
Clearly shifts are faster for base 16, but reversing the output buffer
makes a big difference.
Sharing the code reduces the footprint (unless gcc decides to constant
fold the functions).
Definitely helps vfprintf() where the constants get loaded and a single
call is done.
Also makes it cheap to add octal support to vfprintf for completeness.
David Laight [Mon, 23 Feb 2026 10:17:21 +0000 (10:17 +0000)]
selftests/nolibc: Fix build with host headers and libc
Many systems don't have strlcpy() or strlcat() and readdir_r() is
deprecated. This makes the tests fail to build with the host headers.
Disable the 'directories' test and define strlcpy(), strlcat() and
readdir_r() using #defines so that the code compiles.
Thomas Weißschuh [Wed, 18 Mar 2026 17:00:33 +0000 (18:00 +0100)]
selftests/nolibc: fix test_file_stream() on musl libc
fwrite() modifying errno is non-standard.
Only validate this behavior on those libc implementations which
implement it.
Fixes: a5f00be9b3b0 ("tools/nolibc: Add a simple test for writing to a FILE and reading it back")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Linus Torvalds [Fri, 20 Mar 2026 16:38:12 +0000 (09:38 -0700)]
Merge tag 'ata-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fixes from Niklas Cassel:
- ADATA SU680 SSDs are causing command timeouts when LPM is enabled.
Enable the ATA_QUIRK_NOLPM quirk to prevent LPM from being enabled
on these devices (Damien)
- When receiving a REPORT SUPPORTED OPERATION CODES command with an
invalid REPORTING OPTIONS format, sense data should have the field
pointer set to byte 2 (the location of the REPORTING OPTIONS field)
instead of incorrectly pointing to byte 1 (Damien)
* tag 'ata-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata-scsi: report correct sense field pointer in ata_scsiop_maint_in()
ata: libata-core: disable LPM on ADATA SU680 SSD
Linus Torvalds [Fri, 20 Mar 2026 16:34:32 +0000 (09:34 -0700)]
Merge tag 'mtd/fixes-for-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull MTD fixes from Miquel Raynal:
- In SPI NOR, there was an issue with the RDCR capability, leaving
several platforms no longer able to use it for the wrong reasons
(the follow-up commit renames the helper to avoid future confusion)
- NAND controller drivers needed to be improved to fix some timings, a
locking scenario and avoid certain operations during panic writes
- The Spear600 DT binding conversion was done partially, leading to
several warnings which have individually been fixed
- Tudor gets replaced by Takahiro for the SPI NOR maintenance
- Plus two more misc fixes
* tag 'mtd/fixes-for-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: pl353: make sure optimal timings are applied
mtd: spi-nor: Rename spi_nor_spimem_check_op()
mtd: spi-nor: Fix RDCR controller capability core check
mtd: rawnand: brcmnand: skip DMA during panic write
mtd: rawnand: serialize lock/unlock against other NAND operations
dt-bindings: mtd: st,spear600-smi: Fix example
dt-bindings: mtd: st,spear600-smi: #address/size-cells is mandatory
dt-bindings: mtd: st,spear600-smi: Fix description
mtd: rawnand: cadence: Fix error check for dma_alloc_coherent() in cadence_nand_init()
mtd: Avoid boot crash in RedBoot partition table parser
MAINTAINERS: add Takahiro Kuwano as SPI NOR reviewer
MAINTAINERS: remove Tudor Ambarus as SPI NOR maintainer
Linus Torvalds [Fri, 20 Mar 2026 16:29:03 +0000 (09:29 -0700)]
Merge tag 'iommu-fixes-v7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:
"Intel VT-d:
- Abort all pending requests on dev_tlb_inv timeout to avoid
hardlockup
- Limit IOPF handling to PRI-capable device to avoid SVA attach
failure
AMD-Vi:
- Make sure identity domain is not used when SNP is active
Core fixes:
- Handle mapping IOVA 0x0 correctly
- Fix crash in SVA code
- Kernel-doc fix in IO-PGTable code"
* tag 'iommu-fixes-v7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
iommu/amd: Block identity domain when SNP enabled
iommu/sva: Fix crash in iommu_sva_unbind_device()
iommu/io-pgtable: fix all kernel-doc warnings in io-pgtable.h
iommu: Fix mapping check for 0x0 to avoid re-mapping it
iommu/vt-d: Only handle IOPF for SVA when PRI is supported
iommu/vt-d: Fix intel iommu iotlb sync hardlockup and retry
Linus Torvalds [Fri, 20 Mar 2026 16:23:01 +0000 (09:23 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"There's a small crop of fixes for the MPAM resctrl driver, a fix for
SCS/PAC patching with the AMDGPU driver and a page-table fix for
realms running with 52-bit physical addresses:
- Fix DWARF parsing for SCS/PAC patching to work with very large
modules (such as the amdgpu driver)
- Fixes to the mpam resctrl driver
- Fix broken handling of 52-bit physical addresses when sharing
memory from within a realm"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: realm: Fix PTE_NS_SHARED for 52bit PA support
arm_mpam: Force __iomem casts
arm_mpam: Disable preemption when making accesses to fake MSC in kunit test
arm_mpam: Fix null pointer dereference when restoring bandwidth counters
arm64/scs: Fix handling of advance_loc4
- Update maintainers for Hyper-V DRM driver (Saurabh Sengar)
- Misc clean up in MSHV crashdump code (Ard Biesheuvel, Uros Bizjak)
- Minor improvements to MSHV code (Mukesh R, Wei Liu)
- Revert not yet released MSHV scrub partition hypercall (Wei Liu)
* tag 'hyperv-fixes-signed-20260319' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
mshv: Fix error handling in mshv_region_pin
MAINTAINERS: Update maintainers for Hyper-V DRM driver
mshv: Fix use-after-free in mshv_map_user_memory error path
mshv: pass struct mshv_user_mem_region by reference
x86/hyperv: Use any general-purpose register when saving %cr2 and %cr8
x86/hyperv: Use current_stack_pointer to avoid asm() in hv_hvcrash_ctxt_save()
x86/hyperv: Save segment registers directly to memory in hv_hvcrash_ctxt_save()
x86/hyperv: Use __naked attribute to fix stackless C function
Revert "mshv: expose the scrub partition hypercall"
mshv: add arm64 support for doorbell & intercept SINTs
mshv: refactor synic init and cleanup
x86/hyperv: print out reserved vectors in hexadecimal
Dave Airlie [Fri, 20 Mar 2026 16:12:41 +0000 (02:12 +1000)]
Merge tag 'drm-xe-fixes-2026-03-19' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- A number of teardown fixes (Daniele, Matt Brost, Zhanjun, Ashutosh)
- Skip over non-leaf PTE for PRL generation (Brian)
- Fix an uninitialized variable (Umesh)
- Fix a missing runtime PM reference (Sanjay)
Linus Torvalds [Fri, 20 Mar 2026 16:07:29 +0000 (09:07 -0700)]
Merge tag 'v7.0-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Fix reporting of i_blocks
- Fix Kerberos mounts with different usernames to same server
- Trivial comment cleanup
* tag 'v7.0-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix generic/694 due to wrong ->i_blocks
cifs: smb1: fix comment typo
smb: client: fix krb5 mount with username option
Linus Torvalds [Fri, 20 Mar 2026 16:03:37 +0000 (09:03 -0700)]
Merge tag 'v7.0-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Three use after free fixes (in close, in compounded ops, and in tree
disconnect)
- Multichannel fix
- return proper volume identifier (superblock uuid if available) in
FS_OBJECT_ID queries
* tag 'v7.0-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix use-after-free in durable v2 replay of active file handles
ksmbd: fix use-after-free of share_conf in compound request
ksmbd: use volume UUID in FS_OBJECT_ID_INFORMATION
ksmbd: unset conn->binding on failed binding request
ksmbd: fix share_conf UAF in tree_conn disconnect
Dave Airlie [Fri, 20 Mar 2026 15:52:29 +0000 (01:52 +1000)]
Merge tag 'drm-misc-fixes-2026-03-19' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
A doc warning fix and a memory leak fix for vmwgfx, a deadlock fix and
interrupt handling fixes for imagination, a locking fix for
pagemap_until, a UAF fix for drm_dev_unplug, and a multi-channel audio
handling fix for dw-hdmi-qp.
Dave Airlie [Fri, 20 Mar 2026 15:43:57 +0000 (01:43 +1000)]
Merge tag 'drm-intel-fixes-2026-03-19' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
- Fix #15771: Screen corruption and stuttering on P14s w/ 3K display
- Fix for PSR entry setup frames count on rejected commit
- Fix OOPS if firmware is not loaded and suspend is attempted
- Fix unlikely NULL deref due to DC6 on probe
Dave Jiang [Thu, 19 Mar 2026 15:25:41 +0000 (08:25 -0700)]
cxl: Add endpoint decoder flags clear when PCI reset happens
When a PCI reset happens, the lock and enable flags of the CXL device
should be cleared to avoid stale state flags after reset. Add flag
clearing during cxl_reset_done() to clear the relevant endpoint
decoder flags for all decoders of the endpoint device.
Reported-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260319152541.2739343-1-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
x86/efi: efi_unmap_boot_services: fix calculation of ranges_to_free size
The ranges_to_free array must have enough room to store the entire EFI
memmap plus one extra element for the terminating NULL entry.
The calculation of this array size wrongly adds 1 to the overall size
in bytes instead of adding 1 to the number of elements.
Merge patch series "pid_namespace: make init creation more flexible"
Pavel Tikhomirov <ptikhomirov@virtuozzo.com> says:
The first patch properly annotates accesses to ->child_reaper with
_ONCE macros, to protect unlocked accesses from possible cpu/compiler
optimization problems.
The second patch makes sure that init is always the first process in
the pid namespace; previously this was only checked for the set_tid case.
The third patch allows joining a pid namespace before its init
is created, so one process can create the pid namespace and another
process can then create the pid namespace init after setns(). Please see
the detailed description in the patch commit message. It depends on the
second patch.
The fourth and final patch is a comprehensive test that covers both the
basic use case of creating a pid namespace and its init separately, and a
more specific use case which shows how clone3(set_tid) usability improves
after this change.
This change is generally useful as it makes clone3(set_tid) more
universal and lets it work evenly in all cases. It is also highly
useful to CRIU for handling nested containers.
* patches from https://patch.msgid.link/20260318122157.280595-1-ptikhomirov@virtuozzo.com:
MAINTAINERS: add a new entry for testing pidns init creation via setns
selftests: Add tests for creating pidns init via setns
pid_namespace: allow opening pid_for_children before init was created
pid: check init is created first after idr alloc
pid_namespace: avoid optimization of accesses to ->child_reaper
Pavel Tikhomirov [Wed, 18 Mar 2026 12:21:52 +0000 (13:21 +0100)]
selftests: Add tests for creating pidns init via setns
First testcase "pidns_init_via_setns" checks that a process can become
Pid 1 (init) in a new Pid namespace created via unshare() and joined via
setns().
Second testcase "pidns_init_via_setns_set_tid" checks that during this
process we can use clone3() + set_tid and set the pid in both the new
and old pid namespaces (owned by different user namespaces). This test
requires root to run to avoid complex setup for wrapper userns.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
--
pidns_init_via_setns. Make pidns_init_via_setns_set_tid require root.
Pavel Tikhomirov [Wed, 18 Mar 2026 12:21:51 +0000 (13:21 +0100)]
pid_namespace: allow opening pid_for_children before init was created
This effectively gives us an ability to create the pid namespace init as
a child of the process (setns-ed to the pid namespace) different to the
process which created the pid namespace itself.
Original problem:
There is a cool set_tid feature in the clone3() syscall: it allows you to
create a process with desired pids on multiple pid namespace levels, which
is useful for restoring processes in CRIU in the nested pid namespace case.
In nested container case we can potentially see this kind of pid/user
namespace tree:
So to create the "Process" and set pids {p0, p1, ... pn} for it on all
pid namespace levels we can use clone3() syscall set_tid feature, BUT
the syscall does not allow you to set pid on pid namespace levels you
don't have permission to. So basically you have to be in "User NS0" when
creating the "Process" to actually be able to set pids on all levels.
It is ok for almost any process, but with pid namespace init this does
not work, as currently we can only create pid namespace init and the pid
namespace itself simultaneously, so to make "Pid NSn" owned by "User
NSn" we have to be in the "User NSn".
We can't possibly be in "User NS0" and "User NSn" at the same time,
hence the problem.
Alternative solution:
Yes, for the case of pid namespace init we can use the good old
/proc/sys/kernel/ns_last_pid interface on the levels lower than n. But
it is much more complicated and requires tons of extra code. It
would be nice to make the clone3() set_tid interface also applicable to this
corner case.
Implementation:
Now that anyone can setns to the pid namespace before init is created,
and thus multiple processes can fork children into the pid
namespace, it is important to enforce that the first process created is
always the pid namespace init. (Note that this was done by the previous
preparatory patch as a standalone useful change.) Other processes are
only allowed after init sets pid_namespace->child_reaper.
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
--
v2: Use *_ONCE for ->child_reaper accesses atomicity, and avoid taking
task_list lock for reading it. Rebase to master, and thus remove
now excess pidns_ready variable.
v3: Separate *_ONCE change and "init is first" checks into separate
commits.
v5: Add Andrei's review tag.
->child_reaper which can influence the pid namespace, so it looks like
the pid namespace is fully setup at the point when init sets
->child_reaper to receive more processes. Thus tasklist lock looks
excess in pidns_for_children_get()'s ->child_reaper check and it should
be safe not to have it in the corresponding check in alloc_pid()
(introduced earlier in this series).
Pavel Tikhomirov [Wed, 18 Mar 2026 12:21:50 +0000 (13:21 +0100)]
pid: check init is created first after idr alloc
This moves the condition (tid != 1 && !tmp->child_reaper) to after idr
alloc, so that it ensures the first process in a pid namespace has pid 1
not only when clone3(set_tid) requests a wrong pid, but also when idr
itself hands out a wrong pid for some reason.
This could have happened before this patch: if the
alloc_pid()->pidfs_add_pid() code path fails while creating the first
process, idr->idr_next is no longer zero, and the next process calling
alloc_pid() gets 2 as a pid from idr_alloc_cyclic(). Though thanks
to the PIDNS_ADDING logic, free_pid() disables further pid allocation in
this case, so it does not lead to any real problem.
Note: This is also a preparation for the next patch in the series, which
will introduce an ability of creating init from the task different to
the task which had created the pid namespace. Needed to make sure that
init is always first, even in this new case.
--
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Link: https://patch.msgid.link/20260318122157.280595-3-ptikhomirov@virtuozzo.com
v3: Split from main commit. Merge two checks of ->child_reaper into one.
v4: Update commit message about PIDNS_ADDING.
v5: Add Andrei's review tag.
Signed-off-by: Christian Brauner <brauner@kernel.org>
Pavel Tikhomirov [Wed, 18 Mar 2026 12:21:49 +0000 (13:21 +0100)]
pid_namespace: avoid optimization of accesses to ->child_reaper
To avoid potential problems related to cpu/compiler optimizations around
->child_reaper, use WRITE_ONCE (in addition to the tasklist lock)
everywhere it is written, and READ_ONCE wherever it is read without an
explicit lock. Note: this also pairs with the existing lockless READ_ONCE
in nsfs_fh_to_dentry().
Also add ASSERT_EXCLUSIVE_WRITER before the write to tell KCSAN that we
don't expect any concurrent ->child_reaper modifications, so any such
modifications get detected.
--
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Link: https://patch.msgid.link/20260318122157.280595-2-ptikhomirov@virtuozzo.com
v3: Split from main commit. Add ASSERT_EXCLUSIVE_WRITER.
v5: Add one more READ_ONCE for access without lock in free_pid().
Signed-off-by: Christian Brauner <brauner@kernel.org>
Tomas Glozar [Tue, 10 Mar 2026 16:07:25 +0000 (17:07 +0100)]
rtla: Fix segfault on multiple SIGINTs
Detach stop_trace() from SIGINT/SIGALRM on tool clean-up to prevent it
from crashing RTLA by accessing freed memory.
This prevents a crash when multiple SIGINTs are received.
Fixes: d6899e560366 ("rtla/timerlat_hist: Abort event processing on second signal")
Fixes: 80967b354a76 ("rtla/timerlat_top: Abort event processing on second signal")
Reviewed-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20260310160725.144443-1-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Joanne Koong [Fri, 20 Mar 2026 00:51:45 +0000 (17:51 -0700)]
writeback: don't block sync for filesystems with no data integrity guarantees
Add a SB_I_NO_DATA_INTEGRITY superblock flag for filesystems that cannot
guarantee data persistence on sync (eg fuse). For superblocks with this
flag set, sync kicks off writeback of dirty inodes but does not wait
for the flusher threads to complete the writeback.
This replaces the per-inode AS_NO_DATA_INTEGRITY mapping flag added in
commit f9a49aa302a0 ("fs/writeback: skip AS_NO_DATA_INTEGRITY mappings
in wait_sb_inodes()"). The flag belongs at the superblock level because
data integrity is a filesystem-wide property, not a per-inode one.
Having this flag at the superblock level also allows us to skip having
to iterate every dirty inode in wait_sb_inodes() only to skip each inode
individually.
Prior to this commit, mappings with no data integrity guarantees skipped
waiting on writeback completion but still waited on the flusher threads
to finish initiating the writeback. Waiting on the flusher threads is
unnecessary. This commit kicks off writeback but does not wait on the
flusher threads. This change properly addresses a recent report [1] for
a suspend-to-RAM hang seen on fuse-overlayfs that was caused by waiting
on the flusher threads to finish:
On fuse this is problematic because there are paths that may cause the
flusher thread to block (eg if systemd freezes the user session cgroups
first, which freezes the fuse daemon, before invoking the kernel
suspend. The kernel suspend triggers ->write_inode() which on fuse issues
a synchronous setattr request, which cannot be processed since the
daemon is frozen. Or if the daemon is buggy and cannot properly complete
writeback, initiating writeback on a dirty folio already under writeback
leads to writeback_get_folio() -> folio_prepare_writeback() ->
unconditional wait on writeback to finish, which will cause a hang).
This commit restores fuse to its prior behavior before tmp folios were
removed, where sync was essentially a no-op.
Yuto Ohnuki [Thu, 26 Feb 2026 20:18:58 +0000 (20:18 +0000)]
fs: remove stale and duplicate forward declarations
Remove the following unnecessary forward declarations from fs.h, which
improves maintainability.
- struct hd_geometry: became unused in fs.h when
block_device_operations was moved to blkdev.h in commit 08f858512151
("[PATCH] move block_device_operations to blkdev.h"). The forward
declaration is now added to blkdev.h where it is actually used.
- struct iovec: became unused when aio_read/aio_write were removed in
commit 8436318205b9 ("->aio_read and ->aio_write removed")
- struct iov_iter: duplicate forward declaration. This removes the
redundant second declaration, added in commit 293bc9822fa9
("new methods: ->read_iter() and ->write_iter()")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512301303.s7YWTZHA-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202512302139.Wl0soAlz-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202512302105.pmzYfmcV-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202512302125.FNgHwu5z-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202512302108.nIV8r5ES-lkp@intel.com/
Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com>
Link: https://patch.msgid.link/20260226201857.27310-2-ytohnuki@amazon.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
The component has component->val_bytes, which is set via
snd_soc_component_setup_regmap(). But it can be calculated from
component->regmap, so there is no need to keep it as component->val_bytes.
This patchset adds a new snd_soc_component_regmap_val_bytes()
and removes component->val_bytes / snd_soc_component_setup_regmap().
component has component->val_bytes which is set via
snd_soc_component_setup_regmap(). But it can be calculated via
component->regmap. No need to keep it as component->val_bytes.
ASoC: soc-ops: use snd_soc_component_regmap_val_bytes()
component has component->val_bytes which is set via
snd_soc_component_setup_regmap(). But it can be calculated via
component->regmap. No need to keep it as component->val_bytes.
ASoC: tegra: use snd_soc_component_regmap_val_bytes()
sof_parse_token_sets() accepts array->size values that can be invalid
for a vendor tuple array header. In particular, a zero size does not
advance the parser state and can lead to non-progress parsing on
malformed topology data.
Validate array->size against the minimum header size and reject values
smaller than sizeof(*array) before parsing. This preserves behavior for
valid topologies and hardens malformed-input handling.
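Why a zero size stalls the parser, and how the minimum-size check fixes it, can be shown with a simplified userspace walker. The struct vendor_array_hdr layout and the parse_vendor_arrays() function are illustrative stand-ins, not the SOF or ALSA topology code; the point is only that rejecting size < sizeof(header) guarantees every iteration advances.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical stand-in for a topology vendor-array header. */
struct vendor_array_hdr {
	uint32_t size;		/* total size of this array, header included */
	uint32_t type;
	uint32_t num_elems;
};

/*
 * Walk a buffer of back-to-back vendor arrays. Returns the number of arrays
 * parsed, or -1 on a malformed header. Requiring size >= sizeof(header)
 * means "off" strictly increases, so the loop always terminates.
 */
static int parse_vendor_arrays(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int count = 0;

	while (off < len) {
		struct vendor_array_hdr hdr;

		if (len - off < sizeof(hdr))
			return -1;	/* truncated header */
		memcpy(&hdr, buf + off, sizeof(hdr));

		if (hdr.size < sizeof(hdr))
			return -1;	/* too small: would not advance */
		if (hdr.size > len - off)
			return -1;	/* claims more data than is left */

		/* ... parse hdr.num_elems tuples here ... */
		off += hdr.size;
		count++;
	}
	return count;
}
```

Without the hdr.size < sizeof(hdr) check, a header with size 0 would leave off unchanged and the loop would re-read the same header forever.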
Thomas Gleixner [Tue, 17 Mar 2026 09:01:54 +0000 (10:01 +0100)]
clocksource: Rewrite watchdog code completely
The clocksource watchdog code has over time reached the state of an
impenetrable maze of duct tape and staples. The original design, which was
made in the context of systems far smaller than today's, is based on the
assumption that the to-be-monitored clocksource (TSC) can be trivially
compared against a known-to-be-stable clocksource (HPET/ACPI-PM timer).
Over the years it turned out that this approach has major flaws:
- Long delays between watchdog invocations can result in wraparounds
of the reference clocksource
- Scalability of the reference clocksource readout can degrade on large
multi-socket systems due to interconnect congestion
This was addressed with various heuristics which degraded the accuracy of
the watchdog to the point that it fails to detect actual TSC problems on
older hardware, which exhibits slow inter-CPU drifts due to firmware
manipulating the TSC to hide SMI time.
To address this and bring back sanity to the watchdog, rewrite the code
completely with a different approach:
1) Restrict the validation against a reference clocksource to the boot
CPU, which is usually the CPU/socket closest to the legacy block that
contains the reference source (HPET/ACPI-PM timer). Validate that the
reference readout completes within a bounded latency so that the actual
comparison against the TSC stays within 500ppm as long as the clocks
are stable.
2) Compare the TSCs of the other CPUs in a round-robin fashion against
the boot CPU, in the same way the TSC synchronization on CPU hotplug
works. This can still suffer from a delayed reaction of the remote CPU
to the SMP function call and from the latency of the control variable's
cache line. But this latency does not affect correctness; it only affects
accuracy. With low contention the readout latency is in the low
nanoseconds range, which detects even slight skews between CPUs. Under
high contention this obviously becomes less accurate, but it still
detects slow skews reliably, as it relies solely on subsequent readouts
being monotonically increasing. It can just take slightly longer to
detect the issue.
3) Rewrite the watchdog test so that it tests the various mechanisms one
by one and validates the results against the expectations.
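The two checks described in 1) and 2) can be sketched as plain arithmetic. This is a hedged userspace illustration, not the kernel's implementation: interval_within_ppm() and remote_reading_consistent() are hypothetical names, the 500ppm bound comes from the text, and readings are assumed to be already converted to nanoseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Check 1): an interval measured by the watched clock (e.g. TSC) must agree
 * with the same interval measured by the reference clock to within max_ppm
 * parts per million. diff / ref_ns <= max_ppm / 1e6 is rearranged to avoid
 * division; the products fit in 64 bits for intervals of minutes, not days.
 */
static bool interval_within_ppm(uint64_t watched_ns, uint64_t ref_ns, uint64_t max_ppm)
{
	uint64_t diff = watched_ns > ref_ns ? watched_ns - ref_ns : ref_ns - watched_ns;

	return diff * 1000000ULL <= ref_ns * max_ppm;
}

/*
 * Check 2): the boot CPU reads its counter (before), the remote CPU reads its
 * counter (remote), then the boot CPU reads again (after). If the counters are
 * in sync, the remote value must lie between the two local values. Extra
 * latency only widens the window; it never makes a good CPU fail.
 */
static bool remote_reading_consistent(uint64_t before, uint64_t remote, uint64_t after)
{
	return before <= remote && remote <= after;
}
```

For a 1 ms reference interval, 500ppm allows 500 ns of disagreement, so a 400 ns difference passes and a 600 ns difference fails.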
Kit Dallege [Sun, 15 Mar 2026 17:10:01 +0000 (18:10 +0100)]
dma-mapping: fix false kernel-doc comment marker
Change /** to /* for the DMA attributes list comment in dma-mapping.h.
The comment is not a kernel-doc structured comment and should not use
the kernel-doc opening marker.
Juergen Gross [Tue, 14 Oct 2025 11:28:15 +0000 (13:28 +0200)]
xen/privcmd: add boot control for restricted usage in domU
When running in an unprivileged domU under Xen, the privcmd driver
is restricted to allow only hypercalls against a target domain, for
which the current domU is acting as a device model.
Add a boot parameter "unrestricted" to allow all hypercalls (the
hypervisor will still refuse destructive hypercalls affecting other
guests).
Make this new parameter effective only if the domU wasn't started
using secure boot, as otherwise hypercalls targeting the domU itself
might violate the secure boot functionality.
This is achieved by adding another lockdown reason, which is checked
to be unset when applying the "unrestricted" option.
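The resulting policy for a domU reduces to a small decision, sketched below in userspace C. The enum and privcmd_effective_mode() are hypothetical; in the real driver the "locked_down" input would come from testing the new lockdown reason, which this sketch simply takes as a boolean.

```c
#include <assert.h>
#include <stdbool.h>

enum privcmd_mode {
	PRIVCMD_RESTRICTED_TO_TARGET,	/* only hypercalls for the target domain */
	PRIVCMD_UNRESTRICTED,		/* all hypercalls; Xen still refuses destructive ones */
};

/*
 * Hypothetical helper for a domU: the "unrestricted" boot option only takes
 * effect when the kernel is not locked down (e.g. not booted via secure boot).
 */
static enum privcmd_mode privcmd_effective_mode(bool unrestricted_param, bool locked_down)
{
	if (unrestricted_param && !locked_down)
		return PRIVCMD_UNRESTRICTED;
	return PRIVCMD_RESTRICTED_TO_TARGET;
}
```

Ignoring the option under lockdown, rather than failing the boot, keeps the secure-boot guarantee while leaving the driver usable for its device-model role.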
This is part of XSA-482
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- new patch
Leon Romanovsky [Mon, 16 Mar 2026 19:06:52 +0000 (21:06 +0200)]
mm/hmm: Indicate that HMM requires DMA coherency
HMM is fundamentally about allowing a sophisticated device to perform DMA
directly to a process’s memory while the CPU accesses that same memory at
the same time. It is similar to SVA but does not rely on IOMMU support.
Because the entire model depends on concurrent access to shared memory, it
fails as a uAPI if SWIOTLB substitutes the memory or if the CPU caches are
not coherent with DMA.
Until now, there has been no reliable way to report this, and various
approximations have been used:
	int hmm_dma_map_alloc(struct device *dev, struct hmm_dma_map *map,
			      size_t nr_entries, size_t dma_entry_size)
	{
		<...>
		/*
		 * The HMM API violates our normal DMA buffer ownership rules and can't
		 * transfer buffer ownership. The dma_addressing_limited() check is a
		 * best approximation to ensure no swiotlb buffering happens.
		 */
		dma_need_sync = !dev->dma_skip_sync;
		if (dma_need_sync || dma_addressing_limited(dev))
			return -EOPNOTSUPP;
So let's mark mapped buffers with the DMA_ATTR_REQUIRE_COHERENT attribute
to prevent silent data corruption if someone tries to use HMM on a system
with SWIOTLB or incoherent DMA.
Leon Romanovsky [Mon, 16 Mar 2026 19:06:51 +0000 (21:06 +0200)]
RDMA/umem: Tell DMA mapping that UMEM requires coherency
The RDMA subsystem exposes DMA regions through the verbs interface, which
assumes a coherent system. Use the DMA_ATTR_REQUIRE_COHERENT attribute
to ensure coherency and avoid taking the SWIOTLB path.
The RDMA verbs programming model resembles HMM and assumes concurrent DMA
and CPU access to userspace memory. The hardware and programming model
support "one-sided" operations initiated remotely without any local CPU
involvement or notification. These include ATOMIC compare/swap, READ, and
WRITE. A remote CPU can use these operations to traverse data structures,
manipulate locks, and perform similar tasks without the host CPU’s
awareness. If SWIOTLB substitutes memory or DMA is not cache coherent,
these use cases break entirely.
In-kernel RDMA is fine with incoherent mappings because kernel users do
not rely on one-sided operations in ways that would expose these issues.
A given region may also be exported multiple times, which can trigger
warnings about cacheline overlaps. These warnings are suppressed when the
new attribute is used.
Leon Romanovsky [Mon, 16 Mar 2026 19:06:50 +0000 (21:06 +0200)]
iommu/dma: add support for DMA_ATTR_REQUIRE_COHERENT attribute
Add support for the DMA_ATTR_REQUIRE_COHERENT attribute to the exported
functions. This attribute indicates that the SWIOTLB path must not be
used and that no sync operations should be performed.
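The attribute's contract (no SWIOTLB bouncing, no sync operations) can be modelled as a mapping decision. This is a hedged userspace sketch: the SKETCH_ATTR_* bit values, struct sketch_dev and sketch_map_decision() are invented for illustration, and returning -EOPNOTSUPP when coherency cannot be guaranteed is an assumption borrowed from the hmm_dma_map_alloc() excerpt, not a statement about the iommu/dma code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bit values; the kernel's DMA_ATTR_* values are different. */
#define SKETCH_ATTR_SKIP_CPU_SYNC	(1UL << 0)
#define SKETCH_ATTR_REQUIRE_COHERENT	(1UL << 1)

struct sketch_dev {
	bool coherent;		/* CPU caches are coherent with device DMA */
	bool needs_bounce;	/* this buffer would have to go through SWIOTLB */
};

/*
 * Returns 0 and sets *do_sync if the mapping may proceed, or -EOPNOTSUPP if
 * REQUIRE_COHERENT is requested but cannot be honoured.
 */
static int sketch_map_decision(const struct sketch_dev *dev, unsigned long attrs,
			       bool *do_sync)
{
	if (attrs & SKETCH_ATTR_REQUIRE_COHERENT) {
		/* Never bounce and never rely on cache maintenance. */
		if (!dev->coherent || dev->needs_bounce)
			return -EOPNOTSUPP;
		*do_sync = false;
		return 0;
	}
	/* Ordinary mappings sync when bouncing or when caches are incoherent. */
	*do_sync = (!dev->coherent || dev->needs_bounce) &&
		   !(attrs & SKETCH_ATTR_SKIP_CPU_SYNC);
	return 0;
}
```

Failing the mapping up front, rather than silently bouncing, is what lets HMM and RDMA users avoid the silent data corruption described above.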
Juergen Gross [Thu, 9 Oct 2025 14:54:58 +0000 (16:54 +0200)]
xen/privcmd: restrict usage in unprivileged domU
The Xen privcmd driver allows user space processes to issue arbitrary
hypercalls. This is normally not a problem, as access is usually
limited to root and the hypervisor will deny any hypercalls
affecting other domains.
If the guest is booted using secure boot, however, the privcmd
driver would enable a root user process to modify, e.g., kernel
memory contents, thus breaking the secure boot feature.
The only known case where an unprivileged domU really needs to
use the privcmd driver is when it is acting as the device
model for another guest. In this case all hypercalls issued via the
privcmd driver will target that other guest.
Fortunately, the privcmd driver can already be locked down to allow
only hypercalls targeting a specific domain, but today this mode can
only be activated from user land.
The target domain can be obtained from Xenstore, so when not running
in dom0 restrict the privcmd driver to that target domain from the
beginning, resolving the potential problem of breaking secure boot.
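Once the target domain is known, restricting the driver amounts to a per-hypercall domain check like the hypothetical one below. In the real driver the target comes from Xenstore (and, per the V2 notes, open() waits until it is known); here it is simply stored in a struct, and -EPERM as the rejection code is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical restriction state for the privcmd device. */
struct privcmd_restriction {
	bool restricted;	/* true when not running in dom0 */
	uint16_t target_domid;	/* domain this domU acts as device model for */
};

/* Allow a hypercall only if it targets the restricted domain. */
static int privcmd_check_domid(const struct privcmd_restriction *r, uint16_t domid)
{
	if (!r->restricted)
		return 0;
	return domid == r->target_domid ? 0 : -EPERM;
}
```

Applying the restriction from the first open(), instead of waiting for user land to request it, closes the window in which a root process could target the domU itself.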
This is XSA-482
Reported-by: Teddy Astie <teddy.astie@vates.tech>
Fixes: 1c5de1939c20 ("xen: add privcmd driver")
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- defer reading from Xenstore if Xenstore isn't ready yet (Jan Beulich)
- wait in open() if target domain isn't known yet
- issue message in case no target domain found (Jan Beulich)
Mapped buffers that carry this attribute require a DMA-coherent system.
This means that they can't take the SWIOTLB path, may overlap CPU cache
lines, and don't perform cache flushing.
Sources already have SPDX-FileCopyrightText (~40 instances) and more
appear on the mailing list, so document that it is allowed. On the
other hand, SPDX defines several other tags like SPDX-FileType, so add a
checkpatch rule that narrows the allowed tags to only two of them: license
and copyright. That way no new tags sneak into the kernel unnoticed.
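The rule itself is an allow-list of two tag names. checkpatch is written in Perl, so the C function below is only a hypothetical restatement of the idea, not the actual checkpatch code: it finds an "SPDX-" tag on a line and accepts it only if it is one of the two permitted tags.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns false if the line carries an SPDX tag other than the two allowed ones. */
static bool spdx_tag_allowed(const char *line)
{
	static const char *const allowed[] = {
		"SPDX-License-Identifier:",
		"SPDX-FileCopyrightText:",
	};
	const char *tag = strstr(line, "SPDX-");

	if (!tag)
		return true;	/* no SPDX tag on this line */
	for (size_t i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++)
		if (strncmp(tag, allowed[i], strlen(allowed[i])) == 0)
			return true;
	return false;
}
```

An allow-list, rather than a deny-list of known-unwanted tags, is what ensures that any tag SPDX adds in the future is also flagged.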
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Joe Perches <joe@perches.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Leon Romanovsky [Mon, 16 Mar 2026 19:06:47 +0000 (21:06 +0200)]
dma-mapping: Clarify valid conditions for CPU cache line overlap
Rename the DMA_ATTR_CPU_CACHE_CLEAN attribute to better reflect that it
is a debugging aid that informs the DMA core code that CPU cache line
overlaps are allowed, and refine the documentation describing its use.
Leon Romanovsky [Mon, 16 Mar 2026 19:06:46 +0000 (21:06 +0200)]
dma-mapping: handle DMA_ATTR_CPU_CACHE_CLEAN in trace output
Tracing prints decoded DMA attribute flags, but it does not yet
include the recently added DMA_ATTR_CPU_CACHE_CLEAN. Add support
for decoding and displaying this attribute in the trace output.
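Decoding attribute flags for trace output is typically table-driven, in the spirit of the __print_flags() helper used by trace events. The userspace sketch below shows the idea; the SK_ATTR_* bit values and the decode_attrs() function are hypothetical and do not match the kernel's definitions. Supporting a new attribute means adding one table row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bit assignments for this sketch only. */
#define SK_ATTR_WEAK_ORDERING	(1UL << 0)
#define SK_ATTR_SKIP_CPU_SYNC	(1UL << 1)
#define SK_ATTR_CPU_CACHE_CLEAN	(1UL << 2)

struct flag_name {
	unsigned long mask;
	const char *name;
};

static const struct flag_name attr_names[] = {
	{ SK_ATTR_WEAK_ORDERING,   "WEAK_ORDERING" },
	{ SK_ATTR_SKIP_CPU_SYNC,   "SKIP_CPU_SYNC" },
	{ SK_ATTR_CPU_CACHE_CLEAN, "CPU_CACHE_CLEAN" },	/* the newly decoded flag */
};

/* Write "A|B|..." for the set bits into out (len > 0); unknown bits are ignored. */
static void decode_attrs(unsigned long attrs, char *out, size_t len)
{
	size_t used = 0;

	out[0] = '\0';
	for (size_t i = 0; i < sizeof(attr_names) / sizeof(attr_names[0]); i++) {
		if (!(attrs & attr_names[i].mask))
			continue;
		int n = snprintf(out + used, len - used, "%s%s",
				 used ? "|" : "", attr_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			return;	/* output truncated: stop */
		used += (size_t)n;
	}
}
```

With SKIP_CPU_SYNC and CPU_CACHE_CLEAN set, this produces "SKIP_CPU_SYNC|CPU_CACHE_CLEAN"; before the table row was added, the second flag would simply have been dropped from the output.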
Leon Romanovsky [Mon, 16 Mar 2026 19:06:45 +0000 (21:06 +0200)]
dma-debug: Allow multiple invocations of overlapping entries
Repeated DMA mappings with DMA_ATTR_CPU_CACHE_CLEAN trigger a dma-debug
splat. This prevents using the attribute in cases where a DMA
region is shared and reused more than seven times.
arm64: dts: renesas: rzt2h-rzn2h-evk: Fix GMAC pins sort order
Restore alphabetical sort order of the pin control subnodes by
exchanging the gmac1-pins and gmac2-pins nodes.
While at it, fix the index in an incorrect "GMAC2" comment.
Add a device tree overlay to support the MayQueen PixPaper e-paper display
on the Renesas RZ/V2H EVK (KAKIP board). The display is connected via the
SPI0 interface and uses GPIO pins for reset, busy, and DC control.
The overlay configures:
- RSPI0 pinmux for SPI communication (MOSI, MISO, CLK, CE0),
- PixPaper display device with proper GPIO assignments,
- SPI frequency set to 1MHz for stable operation.
This enables support for the Open-EP Community pixpaper-213-c module on
the RZ/V2H platform.
Lad Prabhakar [Fri, 23 Jan 2026 22:59:55 +0000 (22:59 +0000)]
arm64: dts: renesas: r9a09g077m44-rzt2h-evk: Clarify SD0 power jumpers
Clarify the board setup requirements for using SDHI0 on the RZ/T2H EVK by
documenting the CN78 jumper positions needed to supply SD0 power for
either the default eMMC configuration or the SD card slot configuration.
Fix a malformed MODULE_AUTHOR macro in the RZ/G2L USBPHY control driver
where the author's name and opening angle bracket were missing, leaving
only the email address with a stray closing >. Correct it to the standard
Name <email> format.
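The "Name <email>" convention is easy to check mechanically. The function below is a hypothetical userspace checker written for illustration (it is not part of the kernel or checkpatch): it requires a non-empty name, a space, and a single trailing <...> group containing an '@', which is exactly what the malformed string lacked.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical checker for MODULE_AUTHOR()-style "Name <email>" strings. */
static bool author_string_ok(const char *s)
{
	const char *lt = strchr(s, '<');
	const char *gt = strrchr(s, '>');

	if (!lt || !gt || gt < lt || gt[1] != '\0')
		return false;	/* need a trailing <...> group */
	if (lt - s < 2 || lt[-1] != ' ')
		return false;	/* need "Name " before '<' */
	if (strchr(lt + 1, '<'))
		return false;	/* only one '<' allowed */
	const char *at = strchr(lt, '@');
	return at && at < gt;	/* bracketed part must look like an email */
}
```

A string such as "user@example.com>", missing both the name and the opening bracket, fails the first check.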
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Nitin Gote [Tue, 17 Mar 2026 08:00:59 +0000 (13:30 +0530)]
drm/xe: Extend Wa_14026781792 for xe3lpg
Wa_14026781792 applies to all graphics versions from 30.00
through 35.10 (inclusive). Since there are no IPs between
30.05 and 35.10, consolidate the RTP rules into a single
GRAPHICS_VERSION_RANGE(3000, 3510).
v2: (Matt)
- There are no IPs between 30.05 and 35.10 either,
so consolidate this into a single GRAPHICS_VERSION_RANGE(3000, 3510)
- Also move it up to the top part of the table
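The numbers in GRAPHICS_VERSION_RANGE(3000, 3510) follow the encoding implied by the text: version major.minor becomes major * 100 + minor, so 30.00 is 3000, 30.05 is 3005 and 35.10 is 3510. The helpers below are a hypothetical userspace sketch of that encoding and of the inclusive range test, not the xe driver's RTP code.

```c
#include <assert.h>
#include <stdbool.h>

/* Encode a graphics IP version "major.minor" as major * 100 + minor. */
static unsigned int verx100(unsigned int major, unsigned int minor)
{
	return major * 100 + minor;
}

/* Inclusive range check, matching "30.00 through 35.10 (inclusive)". */
static bool graphics_version_in_range(unsigned int ver, unsigned int from, unsigned int until)
{
	return ver >= from && ver <= until;
}
```

Since no IPs exist between 30.05 and 35.10, one inclusive range covers every affected version without matching anything unintended.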
Varun Gupta [Tue, 17 Mar 2026 04:04:47 +0000 (09:34 +0530)]
drm/xe/xe3p_lpg: Add Wa_16029437861
Wa_16029437861 requires disabling COAMA atomics by setting bit 22
(SQ_DISABLE_COAMA) of L3SQCREG2 (0xb104) for Xe3p_LPG graphics
version 35.10 stepping A0..B0. This bit is already set by the existing
Wa_14026144927 entry, so add the new WA ID to the same implementation.
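Setting a single register bit is a read-modify-write that leaves the other bits untouched, which is why one implementation can serve both workaround IDs. The offset 0xb104 and bit 22 come from the text; fake_mmio, reg_read() and reg_write() are hypothetical stand-ins for real MMIO access, and the stepping check is omitted from this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define L3SQCREG2_OFFSET	0xb104u		/* register offset from the text */
#define SQ_DISABLE_COAMA	(1u << 22)	/* bit 22 from the text */

/* Hypothetical MMIO backing store standing in for real register access. */
static uint32_t fake_mmio[0x10000 / 4];

static uint32_t reg_read(uint32_t offset) { return fake_mmio[offset / 4]; }
static void reg_write(uint32_t offset, uint32_t val) { fake_mmio[offset / 4] = val; }

/*
 * OR in one bit, preserving all other bits. Setting an already-set bit is a
 * no-op, so a single table entry can satisfy several workarounds that need
 * the same bit.
 */
static void apply_disable_coama(void)
{
	reg_write(L3SQCREG2_OFFSET, reg_read(L3SQCREG2_OFFSET) | SQ_DISABLE_COAMA);
}
```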
Luca Ceresoli [Tue, 10 Mar 2026 12:13:23 +0000 (13:13 +0100)]
drm/bridge: add drm_bridge_clear_and_put()
Drivers having a struct drm_bridge pointer pointing to a bridge in many
cases hold that reference until the owning device is removed. In those
cases the reference to the bridge can be put in the .remove callback
(possibly using devm actions) or in the .destroy func (possibly with the
help of struct drm_bridge::next_bridge). At those moments the driver should
not be operating anymore and won't dereference the bridge pointer after it
is put.
However there are cases when drivers need to stop holding a reference to a
bridge even when their device is not being removed. This is the case for
bridge hot-unplug, when a bridge is removed but the previous entity (bridge
or encoder) stays. In such a case the "previous entity" needs to put it
but cannot do so via devm or .destroy, because it is not being removed.
However, simply putting the reference and then clearing the pointer is
risky, because there is a time window between the two operations in which
the reference has been put, and thus the bridge could be deallocated, but
the pointer is still assigned. If other functions of the same driver were
invoked concurrently, they might dereference my_priv->some_bridge during
that window, resulting in a use-after-free.
A correct solution is to clear the pointer before putting the reference,
but that needs a temporary variable: