Merge tag 'pm-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"As is tradition, cpufreq is the part with the largest number of
updates that include core fixes and cleanups as well as updates of
several assorted drivers, but there are also quite a few updates
related to system sleep, mostly focused on asynchronous suspend and
resume of devices and on making the integration of system suspend
and resume with runtime PM easier.
Runtime PM is also updated to allow some code duplication in drivers
to be eliminated going forward and to work more consistently overall
in some cases.
Apart from that, there are some driver core updates related to PM
domains that should help to address ordering issues with devm_ cleanup
routines relying on PM domains, some assorted devfreq updates
including core fixes and cleanups, tooling updates, and documentation
and MAINTAINERS updates.
Specifics:
- Fix two initialization ordering issues in the cpufreq core and a
governor initialization error path in it, and clean it up (Lifeng
Zheng)
- Add Granite Rapids support in no-HWP mode to the intel_pstate
cpufreq driver (Li RongQing)
- Make intel_pstate always use HWP_DESIRED_PERF when operating in the
passive mode (Rafael Wysocki)
- Allow building the tegra124 cpufreq driver as a module (Aaron
Kling)
- Do minor cleanups for Rust cpufreq and cpumask APIs and fix
MAINTAINERS entry for cpu.rs (Abhinav Ananthu, Ritvik Gupta, Lukas
Bulwahn)
- Clean up assorted cpufreq drivers (Arnd Bergmann, Dan Carpenter,
Krzysztof Kozlowski, Sven Peter, Svyatoslav Ryhel, Lifeng Zheng)
- Add the NEED_UPDATE_LIMITS flag to the CPPC cpufreq driver
(Prashant Malani)
- Fix minimum performance state label error in the amd-pstate driver
documentation (Shouye Liu)
- Add the CPUFREQ_GOV_STRICT_TARGET flag to the userspace cpufreq
governor and explain HW coordination influence on it in the
documentation (Shashank Balaji)
- Fix opencoded for_each_cpu() in idle_state_valid() in the DT
cpuidle driver (Yury Norov)
- Remove info about non-existing QoS interfaces from the PM QoS
documentation (Ulf Hansson)
- Use c_* types via kernel prelude in Rust for OPP (Abhinav Ananthu)
- Add HiSilicon uncore frequency scaling driver to devfreq (Jie Zhan)
- Simplify the sun8i-a33-mbus devfreq driver by using more devm
functions (Uwe Kleine-König)
- Fix an index typo in trans_stat() in devfreq (Chanwoo Choi)
- Check devfreq governor before using governor->name (Lifeng Zheng)
- Remove a redundant devfreq_get_freq_range() call from
devfreq_add_device() (Lifeng Zheng)
- Limit max_freq with scaling_min_freq in devfreq (Lifeng Zheng)
- Replace sscanf() with kstrtoul() in set_freq_store() (Lifeng Zheng)
- Extend the asynchronous suspend and resume of devices to handle
suppliers like parents and consumers like children (Rafael Wysocki)
- Make pm_runtime_force_resume() work for drivers that set the
DPM_FLAG_SMART_SUSPEND flag and allow PCI drivers and drivers that
collaborate with the general ACPI PM domain to set it (Rafael
Wysocki)
- Add kernel parameter to disable asynchronous suspend/resume of
devices (Tudor Ambarus)
- Drop redundant might_sleep() calls from some functions in the
device suspend/resume core code (Zhongqiu Han)
- Fix the handling of monitors connected right before waking up the
system from sleep (tuhaowen)
- Clean up MAINTAINERS entries for suspend and hibernation (Rafael
Wysocki)
- Fix error code path in the KEXEC_JUMP flow and drop a redundant
pm_restore_gfp_mask() call from it (Rafael Wysocki)
- Rearrange suspend/resume error handling in the core device suspend
and resume code (Rafael Wysocki)
- Fix up white space that does not follow coding style in the
hibernation core code (Darshan Rathod)
- Document return values of suspend-related API functions in the
runtime PM framework (Sakari Ailus)
- Mark last busy stamp in multiple autosuspend-related functions in
the runtime PM framework and update its documentation (Sakari
Ailus)
- Take active children into account in pm_runtime_get_if_in_use() for
consistency (Rafael Wysocki)
- Fix NULL pointer dereference in get_pd_power_uw() in the dtpm_cpu
power capping driver (Sivan Zohar-Kotzer)
- Add support for the Bartlett Lake platform to the Intel RAPL power
capping driver (Qiao Wei)
- Add PL4 support for Panther Lake to the intel_rapl_msr power
capping driver (Zhang Rui)
- Update contact information in the PM ABI docs and maintainer
information in the power domains DT binding (Rafael Wysocki)
- Update PM header inclusions to follow the IWYU (Include What You
Use) principle (Andy Shevchenko)
- Add flags to specify power on attach/detach for PM domains, make
the driver core detach PM domains in device_unbind_cleanup(), and
drop the dev_pm_domain_detach() call from the platform bus type
(Claudiu Beznea)
- Improve Python binding's Makefile for cpupower (John B. Wyatt IV)
- Fix printing of CORE, CPU fields in cpupower-monitor (Gautham
Shenoy)"
* tag 'pm-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (75 commits)
cpufreq: CPPC: Mark driver with NEED_UPDATE_LIMITS flag
PM: docs: Use my kernel.org address in ABI docs and DT bindings
PM: hibernate: Fix up white space that does not follow coding style
PM: sleep: Rearrange suspend/resume error handling in the core
Documentation: amd-pstate:fix minimum performance state label error
PM: runtime: Take active children into account in pm_runtime_get_if_in_use()
kexec_core: Drop redundant pm_restore_gfp_mask() call
kexec_core: Fix error code path in the KEXEC_JUMP flow
PM: sleep: Clean up MAINTAINERS entries for suspend and hibernation
drivers: cpufreq: add Tegra114 support
rust: cpumask: Replace `MaybeUninit` and `mem::zeroed` with `Opaque` APIs
cpufreq: Exit governor when failed to start old governor
cpufreq: Move the check of cpufreq_driver->get into cpufreq_verify_current_freq()
cpufreq: Init policy->rwsem before it may be possibly used
cpufreq: Initialize cpufreq-based frequency-invariance later
cpufreq: Remove duplicate check in __cpufreq_offline()
cpufreq: Contain scaling_cur_freq.attr in cpufreq_attrs
cpufreq: intel_pstate: Add Granite Rapids support in no-HWP mode
cpufreq: intel_pstate: Always use HWP_DESIRED_PERF in passive mode
PM / devfreq: Add HiSilicon uncore frequency scaling driver
...
Merge tag 'landlock-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock update from Mickaël Salaün:
"Fix test issues, improve build compatibility, and add new tests"
* tag 'landlock-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
landlock: Fix cosmetic change
samples/landlock: Fix building on musl libc
landlock: Fix warning from KUnit tests
selftests/landlock: Add test to check rule tied to covered mount point
selftests/landlock: Fix build of audit_test
selftests/landlock: Fix readlink check
Merge tag 'selinux-pr-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
- Introduce the concept of a SELinux "neveraudit" type which prevents
all auditing of the given type/domain.
Taken by itself, the benefit of marking a SELinux domain with the
"neveraudit" tag is likely not very interesting, especially given the
significant overlap with the "dontaudit" tag.
However, given that the "neveraudit" tag applies to *all* auditing of
the tagged domain, we can do some fairly interesting optimizations
when a SELinux domain is marked as both "permissive" and "dontaudit"
(think of the unconfined_t domain).
While this pull request includes optimized inode permission and
getattr hooks, these optimizations require SELinux policy changes,
therefore the improvements may not be visible on standard downstream
Linux distos for a period of time.
- Continue the deprecation process of /sys/fs/selinux/user.
After removing the associated userspace code in 2020, we marked the
/sys/fs/selinux/user interface as deprecated in Linux v6.13 with
pr_warn() and the usual documention update.
This adds a five second sleep after the pr_warn(), following a
previous deprecation process pattern that has worked well for us in
the past in helping identify any existing users that we haven't yet
reached.
- Add a __GFP_NOWARN flag to our initial hash table allocation.
Fuzzers such a syzbot often attempt abnormally large SELinux policy
loads, which the SELinux code gracefully handles by checking for
allocation failures, but not before the allocator emits a warning
which causes the automated fuzzing to flag this as an error and
report it to the list. While we want to continue to support the work
done by the fuzzing teams, we want to focus on proper issues and not
an error case that is already handled safely. Add a NOWARN flag to
quiet the allocator and prevent syzbot from tripping on this again.
- Remove some unnecessary selinuxfs cleanup code, courtesy of Al.
- Update the SELinux in-kernel documentation with pointers to
additional information.
* tag 'selinux-pr-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: don't bother with selinuxfs_info_free() on failures
selinux: add __GFP_NOWARN to hashtab_init() allocations
selinux: optimize selinux_inode_getattr/permission() based on neveraudit|permissive
selinux: introduce neveraudit types
documentation: add links to SELinux resources
selinux: add a 5 second sleep to /sys/fs/selinux/user
Merge tag 'lsm-pr-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- Add Nicolas Bouchinet and Xiu Jianfeng as Lockdown maintainers
The Lockdown LSM has been without a dedicated mantainer since its
original acceptance upstream, and it has suffered as a result.
Thankfully we have two new volunteers who together I believe have the
background and desire to help ensure Lockdown is properly supported.
- Remove the unused cap_mmap_file() declaration
* tag 'lsm-pr-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
MAINTAINERS: Add Xiu and myself as Lockdown maintainers
security: Remove unused declaration cap_mmap_file()
lsm: trivial comment fix
Merge tag 'tpmdd-next-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm updates from Jarkko Sakkinen:
"Quite a few commits but nothing really that would be worth of spending
too much time for, or would want to emphasize in particular"
* tag 'tpmdd-next-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm_crb_ffa: handle tpm busy return code
tpm_crb_ffa: Remove memset usage
tpm_crb_ffa: Fix typos in function name
tpm: Check for completion after timeout
tpm: Use of_reserved_mem_region_to_resource() for "memory-region"
tpm: Replace scnprintf() with sysfs_emit() and sysfs_emit_at() in sysfs show functions
tpm_crb_ffa: Remove unused export
tpm: tpm_crb_ffa: try to probe tpm_crb_ffa when it's built-in
firmware: arm_ffa: Change initcall level of ffa_init() to rootfs_initcall
tpm/tpm_svsm: support TPM_CHIP_FLAG_SYNC
tpm/tpm_ftpm_tee: support TPM_CHIP_FLAG_SYNC
tpm: support devices with synchronous send()
tpm: add bufsiz parameter in the .send callback
Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux
Pull fscrypt updates from Eric Biggers:
"Simplify how fscrypt uses the crypto API, resulting in some
significant performance improvements:
- Drop the incomplete and problematic support for asynchronous
algorithms. These drivers are bug-prone, and it turns out they are
actually much slower than the CPU-based code as well.
- Allocate crypto requests on the stack instead of the heap. This
improves encryption and decryption performance, especially for
filenames. This also eliminates a point of failure during I/O"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
ceph: Remove gfp_t argument from ceph_fscrypt_encrypt_*()
fscrypt: Remove gfp_t argument from fscrypt_encrypt_block_inplace()
fscrypt: Remove gfp_t argument from fscrypt_crypt_data_unit()
fscrypt: Switch to sync_skcipher and on-stack requests
fscrypt: Drop FORBID_WEAK_KEYS flag for AES-ECB
fscrypt: Don't use asynchronous CryptoAPI algorithms
fscrypt: Don't use problematic non-inline crypto engines
fscrypt: Drop obsolete recommendation to enable optimized SHA-512
fscrypt: Explicitly include <linux/export.h>
Merge tag 'libcrypto-conversions-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library conversions from Eric Biggers:
"Convert fsverity and apparmor to use the SHA-2 library functions
instead of crypto_shash. This is simpler and also slightly faster"
* tag 'libcrypto-conversions-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
fsverity: Switch from crypto_shash to SHA-2 library
fsverity: Explicitly include <linux/export.h>
apparmor: use SHA-256 library API instead of crypto_shash API
Merge tag 'libcrypto-tests-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library test updates from Eric Biggers:
"Add KUnit test suites for the Poly1305, SHA-1, SHA-224, SHA-256,
SHA-384, and SHA-512 library functions.
These are the first KUnit tests for lib/crypto/. So in addition to
being useful tests for these specific algorithms, they also establish
some conventions for lib/crypto/ testing going forwards.
The new tests are fairly comprehensive: more comprehensive than the
generic crypto infrastructure's tests. They use a variety of
techniques to check for the types of implementation bugs that tend to
occur in the real world, rather than just naively checking some test
vectors. (Interestingly, poly1305_kunit found a bug in QEMU)
The core test logic is shared by all six algorithms, rather than being
duplicated for each algorithm.
Each algorithm's test suite also optionally includes a benchmark"
* tag 'libcrypto-tests-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crypto: tests: Annotate worker to be on stack
lib/crypto: tests: Add KUnit tests for SHA-1 and HMAC-SHA1
lib/crypto: tests: Add KUnit tests for Poly1305
lib/crypto: tests: Add KUnit tests for SHA-384 and SHA-512
lib/crypto: tests: Add KUnit tests for SHA-224 and SHA-256
lib/crypto: tests: Add hash-test-template.h and gen-hash-testvecs.py
Merge tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library updates from Eric Biggers:
"This is the main crypto library pull request for 6.17. The main focus
this cycle is on reorganizing the SHA-1 and SHA-2 code, providing
high-quality library APIs for SHA-1 and SHA-2 including HMAC support,
and establishing conventions for lib/crypto/ going forward:
- Migrate the SHA-1 and SHA-512 code (and also SHA-384 which shares
most of the SHA-512 code) into lib/crypto/. This includes both the
generic and architecture-optimized code. Greatly simplify how the
architecture-optimized code is integrated. Add an easy-to-use
library API for each SHA variant, including HMAC support. Finally,
reimplement the crypto_shash support on top of the library API.
- Apply the same reorganization to the SHA-256 code (and also SHA-224
which shares most of the SHA-256 code). This is a somewhat smaller
change, due to my earlier work on SHA-256. But this brings in all
the same additional improvements that I made for SHA-1 and SHA-512.
There are also some smaller changes:
- Move the architecture-optimized ChaCha, Poly1305, and BLAKE2s code
from arch/$(SRCARCH)/lib/crypto/ to lib/crypto/$(SRCARCH)/. For
these algorithms it's just a move, not a full reorganization yet.
- Fix the MIPS chacha-core.S to build with the clang assembler.
- Fix the Poly1305 functions to work in all contexts.
- Fix a performance regression in the x86_64 Poly1305 code.
- Clean up the x86_64 SHA-NI optimized SHA-1 assembly code.
Note that since the new organization of the SHA code is much simpler,
the diffstat of this pull request is negative, despite the addition of
new fully-documented library APIs for multiple SHA and HMAC-SHA
variants.
These APIs will allow further simplifications across the kernel as
users start using them instead of the old-school crypto API. (I've
already written a lot of such conversion patches, removing over 1000
more lines of code. But most of those will target 6.18 or later)"
* tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (67 commits)
lib/crypto: arm64/sha512-ce: Drop compatibility macros for older binutils
lib/crypto: x86/sha1-ni: Convert to use rounds macros
lib/crypto: x86/sha1-ni: Minor optimizations and cleanup
crypto: sha1 - Remove sha1_base.h
lib/crypto: x86/sha1: Migrate optimized code into library
lib/crypto: sparc/sha1: Migrate optimized code into library
lib/crypto: s390/sha1: Migrate optimized code into library
lib/crypto: powerpc/sha1: Migrate optimized code into library
lib/crypto: mips/sha1: Migrate optimized code into library
lib/crypto: arm64/sha1: Migrate optimized code into library
lib/crypto: arm/sha1: Migrate optimized code into library
crypto: sha1 - Use same state format as legacy drivers
crypto: sha1 - Wrap library and add HMAC support
lib/crypto: sha1: Add HMAC support
lib/crypto: sha1: Add SHA-1 library functions
lib/crypto: sha1: Rename sha1_init() to sha1_init_raw()
crypto: x86/sha1 - Rename conflicting symbol
lib/crypto: sha2: Add hmac_sha*_init_usingrawkey()
lib/crypto: arm/poly1305: Remove unneeded empty weak function
lib/crypto: x86/poly1305: Fix performance regression on short messages
...
Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC updates from Eric Biggers:
- Reorganize the architecture-optimized CRC code
It now lives in lib/crc/$(SRCARCH)/ rather than arch/$(SRCARCH)/lib/,
and it is no longer artificially split into separate generic and arch
modules. This allows better inlining and dead code elimination
The generic CRC code is also no longer exported, simplifying the API.
(This mirrors the similar changes to SHA-1 and SHA-2 in lib/crypto/,
which can be found in the "Crypto library updates" pull request)
- Improve crc32c() performance on newer x86_64 CPUs on long messages by
enabling the VPCLMULQDQ optimized code
- Simplify the crypto_shash wrappers for crc32_le() and crc32c()
Register just one shash algorithm for each that uses the (fully
optimized) library functions, instead of unnecessarily providing
direct access to the generic CRC code
- Remove unused and obsolete drivers for hardware CRC engines
- Remove CRC-32 combination functions that are no longer used
- Add kerneldoc for crc32_le(), crc32_be(), and crc32c()
- Convert the crc32() macro to an inline function
* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (26 commits)
lib/crc: x86/crc32c: Enable VPCLMULQDQ optimization where beneficial
lib/crc: x86: Reorganize crc-pclmul static_call initialization
lib/crc: crc64: Add include/linux/crc64.h to kernel-api.rst
lib/crc: crc32: Change crc32() from macro to inline function and remove cast
nvmem: layouts: Switch from crc32() to crc32_le()
lib/crc: crc32: Document crc32_le(), crc32_be(), and crc32c()
lib/crc: Explicitly include <linux/export.h>
lib/crc: Remove ARCH_HAS_* kconfig symbols
lib/crc: x86: Migrate optimized CRC code into lib/crc/
lib/crc: sparc: Migrate optimized CRC code into lib/crc/
lib/crc: s390: Migrate optimized CRC code into lib/crc/
lib/crc: riscv: Migrate optimized CRC code into lib/crc/
lib/crc: powerpc: Migrate optimized CRC code into lib/crc/
lib/crc: mips: Migrate optimized CRC code into lib/crc/
lib/crc: loongarch: Migrate optimized CRC code into lib/crc/
lib/crc: arm64: Migrate optimized CRC code into lib/crc/
lib/crc: arm: Migrate optimized CRC code into lib/crc/
lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/
lib/crc: Move files into lib/crc/
lib/crc32: Remove unused combination support
...
- Remove arbitrary 4K limitation of program header size (Yin Fengwei)
- Reorder function qualifiers for copy_clone_args_from_user() (Dishank Jogi)
* tag 'execve-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (25 commits)
fork: reorder function qualifiers for copy_clone_args_from_user
binfmt_elf: remove the 4k limitation of program header size
binfmt_elf: Warn on missing or suspicious regset note names
xtensa: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
um: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
x86/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
sparc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
sh: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
s390/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
riscv: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
powerpc/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
parisc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
openrisc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
nios2: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
MIPS: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
m68k: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
LoongArch: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
hexagon: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
csky: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
arm64: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
...
Merge tag 'ata-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata updates from Damien Le Moal:
- Replace the ATA_DFLAG_ZAC device flag with the helper function
ata_dev_is_zac() testing directly the device class and device zoned
mode (me)
- Some small cleanup of ata_scsi_offline_dev() code (me)
- Improve the description of the link power management (LPM) policies
in Kconfig and in the comments defining these. Together with this,
clarify the description of the ahci driver mobile_lpm_policy module
parameter (me)
- Various code refactoring of libata LPM handling (ata_eh_set_lpm()
renaming, introduce ata_dev_config_lpm(), LPM related quirk handling,
and LPM related feature advertizing on device scan) (me)
- Avoid unnecessary device reset when revalidating after an error when
LPM is used (me)
- Do not allow setting a port/link LPM policy if LPM is not supported,
either because the controller does not support partial, slumber nor
devsleep, or when the port is an external port with hotplug
capability (me)
- Make sure that device initiated power management (DIPM) is not
enabled if the host (controller) lacks support for this feature (me)
- Improve messages and debug messages related to LPM, in particular,
reduce the number of messages signaling the lack of LPM support (me)
- Cache in memory a device general purpose log directory to avoid
having to access this log for every log page access. The intent here
is to reduce the number of read log commands when scanning or
revalidating a device (me)
- Change ata_dev_cleanup_cdl_resources() to be a static function (me)
- Rename and simplify the mode setting functions (me)
- Introduce the helper function ata_port_eh_scheduled() to check if EH
is pending or running for a port (me)
- Improve ata_eh_set_pending() (return bool instead of int) (me)
- Use sysfs_emit() instead of scnprintf() for libata-transport
attributes (Jonathan)
- Use the existing macro definiton of RDC vendor ID instead of
hardcoding it in the pata_rdc driver (Andy)
- Rework how EH is called for a port to avoid needing to pass along the
prereset, softreset, hardreset and postreset operations. The driver
API documentation for this is also updated (me)
* tag 'ata-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: (28 commits)
Documentation: driver-api: Update libata error handler information
ata: libata-eh: Simplify reset operation management
ata: libata-eh: Remove ata_do_eh()
ata: pata_rdc: Use registered definition for the RDC vendor
ata: libata-eh: Make ata_eh_followup_srst_needed() return a bool
ata: libata-transport: replace scnprintf with sysfs_emit for simple attributes
ata: libata-eh: use bool for fastdrain in ata_eh_set_pending()
ata: libata: Introduce ata_port_eh_scheduled()
ata: libata-core: Rename ata_do_set_mode()
ata: libata-eh: Rename and make ata_set_mode() static
ata: libata-core: Make ata_dev_cleanup_cdl_resources() static
ata: libata-core: Cache the general purpose log directory
ata: libata_eh: Add debug messages to ata_eh_link_set_lpm()
ata: libata-core: Reduce the number of messages signaling broken LPM
ata: ahci: Disallow LPM policy control if not supported
ata: ahci: Disallow LPM policy control for external ports
ata: ahci: Disable DIPM if host lacks support
ata: libata-sata: Disallow changing LPM state if not supported
ata: libata-eh: Avoid unnecessary resets when revalidating devices
ata: libata-core: Advertize device support for DIPM and HIPM features
...
- NVMe pull request via Christoph:
- try PCIe function level reset on init failure (Keith Busch)
- log TLS handshake failures at error level (Maurizio Lombardi)
- pci-epf: do not complete commands twice if nvmet_req_init()
fails (Rick Wertenbroek)
- misc cleanups (Alok Tiwari)
- Removal of the pktcdvd driver
This has been more than a decade coming at this point, and some
recently revealed breakages that had it causing issues even for cases
where it isn't required made me re-pull the trigger on this one. It's
known broken and nobody has stepped up to maintain the code
- Series for ublk supporting batch commands, enabling the use of
multishot where appropriate
- Speed up ublk exit handling
- Fix for the two-stage elevator fixing which could leak data
- Convert NVMe to use the new IOVA based API
- Increase default max transfer size to something more reasonable
- Series fixing write operations on zoned DM devices
- Add tracepoints for zoned block device operations
- Prep series working towards improving blk-mq queue management in the
presence of isolated CPUs
- Don't allow updating of the block size of a loop device that is
currently under exclusively ownership/open
- Set chunk sectors from stacked device stripe size and use it for the
atomic write size limit
- Switch to folios in bcache read_super()
- Fix for CD-ROM MRW exit flush handling
- Various tweaks, fixes, and cleanups
* tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux: (94 commits)
block: restore two stage elevator switch while running nr_hw_queue update
cdrom: Call cdrom_mrw_exit from cdrom_release function
sunvdc: Balance device refcount in vdc_port_mpgroup_check
nvme-pci: try function level reset on init failure
dm: split write BIOs on zone boundaries when zone append is not emulated
block: use chunk_sectors when evaluating stacked atomic write limits
dm-stripe: limit chunk_sectors to the stripe size
md/raid10: set chunk_sectors limit
md/raid0: set chunk_sectors limit
block: sanitize chunk_sectors for atomic write limits
ilog2: add max_pow_of_two_factor()
nvmet: pci-epf: Do not complete commands twice if nvmet_req_init() fails
nvme-tcp: log TLS handshake failures at error level
docs: nvme: fix grammar in nvme-pci-endpoint-target.rst
nvme: fix typo in status code constant for self-test in progress
nvmet: remove redundant assignment of error code in nvmet_ns_enable()
nvme: fix incorrect variable in io cqes error message
nvme: fix multiple spelling and grammar issues in host drivers
block: fix blk_zone_append_update_request_bio() kernel-doc
md/raid10: fix set but not used variable in sync_request_write()
...
Merge tag 'for-6.17/io_uring-20250728' of git://git.kernel.dk/linux
Pull io_uring updates from Jens Axboe:
- Optimization to avoid reference counts on non-cloned registered
buffers. This is how these buffers were handled prior to having
cloning support, and we can still use that approach as long as the
buffers haven't been cloned to another ring.
- Cleanup and improvement for uring_cmd, where btrfs was the only user
of storing allocated data for the lifetime of the uring_cmd. Clean
that up so we can get rid of the need to do that.
- Avoid unnecessary memory copies in uring_cmd usage. This is
particularly important as a lot of uring_cmd usage necessitates the
use of 128b SQEs.
- A few updates for recv multishot, where it's now possible to add
fairness limits for limiting how much is transferred for each retry
loop. Additionally, recv multishot now supports an overall cap as
well, where once reached the multishot recv will terminate. The
latter is useful for buffer management and juggling many recv streams
at the same time.
- Add support for returning the TX timestamps via a new socket command.
This feature can work in either singleshot or multishot mode, where
the latter triggers a completion whenever new timestamps are
available. This is an alternative to using the existing error queue.
- Add support for an io_uring "mock" file, which is the start of being
able to do 100% targeted testing in terms of exercising io_uring
request handling. The idea is to have a file type that can be
anything the tester would like, and behave exactly how you want it to
behave in terms of hitting the code paths you want.
- Improve zcrx by using sgtables to de-duplicate and improve dma
address handling.
- Prep work for supporting larger pages for zcrx.
- Various little improvements and fixes.
* tag 'for-6.17/io_uring-20250728' of git://git.kernel.dk/linux: (42 commits)
io_uring/zcrx: fix leaking pages on sg init fail
io_uring/zcrx: don't leak pages on account failure
io_uring/zcrx: fix null ifq on area destruction
io_uring: fix breakage in EXPERT menu
io_uring/cmd: remove struct io_uring_cmd_data
btrfs/ioctl: store btrfs_uring_encoded_data in io_btrfs_cmd
io_uring/cmd: introduce IORING_URING_CMD_REISSUE flag
io_uring/zcrx: account area memory
io_uring: export io_[un]account_mem
io_uring/net: Support multishot receive len cap
io_uring: deduplicate wakeup handling
io_uring/net: cast min_not_zero() type
io_uring/poll: cleanup apoll freeing
io_uring/net: allow multishot receive per-invocation cap
io_uring/net: move io_sr_msg->retry_flags to io_sr_msg->flags
io_uring/net: use passed in 'len' in io_recv_buf_select()
io_uring/zcrx: prepare fallback for larger pages
io_uring/zcrx: assert area type in io_zcrx_iov_page
io_uring/zcrx: allocate sgtable for umem areas
io_uring/zcrx: introduce io_populate_area_dma
...
Merge tag 'v6.17-rc-smb3-server-fixes' of git://git.samba.org/ksmbd
Pull smb server updates from Steve French:
- Fix mtime/ctime reporting issue
- Auth fixes, including two session setup race bugs reported by ZDI
- Locking improvement in query directory
- Fix for potential deadlock in creating hardlinks
- Improvements to path name processing
* tag 'v6.17-rc-smb3-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix corrupted mtime and ctime in smb2_open
ksmbd: fix Preauh_HashValue race condition
ksmbd: check return value of xa_store() in krb5_authenticate
ksmbd: fix null pointer dereference error in generate_encryptionkey
smb/server: add ksmbd_vfs_kern_path()
smb/server: avoid deadlock when linking with ReplaceIfExists
smb/server: simplify ksmbd_vfs_kern_path_locked()
smb/server: use lookup_one_unlocked()
Merge tag 'hfs-v6.17-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs
Pull hfs/hfsplus updates from Viacheslav Dubeyko:
"Johannes Thumshirn has made nice cleanup in hfsplus_submit_bio().
Tetsuo Handa has fixed the syzbot reported issue in
hfsplus_create_attributes_file() for the case of corruption the
Attributes File's metadata.
Yangtao Li has fixed the syzbot reported issue by removing the
uneccessary WARN_ON() in hfsplus_free_extents().
Other fixes:
- restore generic/001 successful execution by erasing deleted b-tree
nodes
- eliminate slab-out-of-bounds issue in hfs_bnode_read() and
hfsplus_bnode_read() by checking correctness of offset and length
when accessing b-tree node contents
- eliminate slab-out-of-bounds read in hfsplus_uni2asc() if the
b-tree node record has corrupted length of a name that could be
bigger than HFSPLUS_MAX_STRLEN
- eliminate general protection fault in hfs_find_init() for the case
of initial b-tree object creation"
* tag 'hfs-v6.17-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs:
hfs: fix general protection fault in hfs_find_init()
hfs: fix slab-out-of-bounds in hfs_bnode_read()
hfsplus: fix slab-out-of-bounds in hfsplus_bnode_read()
hfsplus: fix slab-out-of-bounds read in hfsplus_uni2asc()
hfsplus: don't use BUG_ON() in hfsplus_create_attributes_file()
hfsplus: don't set REQ_SYNC for hfsplus_submit_bio()
hfsplus: remove mutex_lock check in hfsplus_free_extents
hfs: make splice write available again
hfsplus: make splice write available again
hfs: fix not erasing deleted b-tree node issue
Merge tag 'fs_for_v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull udf and ext2 updates from Jan Kara:
"A few udf and ext2 fixes and cleanups"
* tag 'fs_for_v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Verify partition map count
udf: stop using write_cache_pages
ext2: Handle fiemap on empty files to prevent EINVAL
When introduced in commit 9eb22f7fedfc ("fs: add ioctl to query metadata
and protection info capabilities") the stub of blk_get_meta_cap() for
!BLK_DEV_INTEGRITY always returns -EOPNOTSUPP. The motivation was that
while the command was unsupported in that configuration it was still
recognized.
A later change instead assumed -ENOIOCTLCMD as is required for unknown
ioctl commands per Documentation/driver-api/ioctl.rst. The result being
that on !BLK_DEV_INTEGRITY configs, any ioctl which reaches
blkdev_common_ioctl() will return -EOPNOTSUPP.
Change the stub to return -ENOIOCTLCMD, fixing the issue and better
matching with expectations.
[ The blkdev_common_ioctl() confusion has been fixed, but -ENOIOCTLCMD
is the right thing to return for unrecognized ioctls, so the patch
remains the right thing to do. - Linus ]
fuse: remove page alignment check for writeback len
Remove incorrect page alignment check for the writeback len arg in
fuse_iomap_writeback_range(). len will always be block-aligned as
passed in by iomap.
On regular fuse filesystems, i_blkbits is set to PAGE_SHIFT so this is
not a problem but for fuseblk filesystems, the block size is set to a
default of 512 bytes or a block size passed in at mount time.
Please note that non-page-aligned lengths are fine for the logic in
fuse_iomap_writeback_range(). The check was originally added as a
safeguard to detect conspicuously wrong ranges.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Fixes: ef7e7cbb323f ("fuse: use iomap for writeback") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Link: https://lore.kernel.org/linux-fsdevel/CA+G9fYs5AdVM-T2Tf3LciNCwLZEHetcnSkHsjZajVwwpM2HmJw@mail.gmail.com/ Reported-by: Sasha Levin <sashal@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'vfs-6.17-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs iomap updates from Christian Brauner:
- Refactor the iomap writeback code and split the generic and ioend/bio
based writeback code.
There are two methods that define the split between the generic
writeback code, and the implemementation of it, and all knowledge of
ioends and bios now sits below that layer.
- Add fuse iomap support for buffered writes and dirty folio writeback.
This is needed so that granular uptodate and dirty tracking can be
used in fuse when large folios are enabled. This has two big
advantages. For writes, instead of the entire folio needing to be
read into the page cache, only the relevant portions need to be. For
writeback, only the dirty portions need to be written back instead of
the entire folio.
* tag 'vfs-6.17-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fuse: refactor writeback to use iomap_writepage_ctx inode
fuse: hook into iomap for invalidating and checking partial uptodateness
fuse: use iomap for folio laundering
fuse: use iomap for writeback
fuse: use iomap for buffered writes
iomap: build the writeback code without CONFIG_BLOCK
iomap: add read_folio_range() handler for buffered writes
iomap: improve argument passing to iomap_read_folio_sync
iomap: replace iomap_folio_ops with iomap_write_ops
iomap: export iomap_writeback_folio
iomap: move folio_unlock out of iomap_writeback_folio
iomap: rename iomap_writepage_map to iomap_writeback_folio
iomap: move all ioend handling to ioend.c
iomap: add public helpers for uptodate state manipulation
iomap: hide ioends from the generic writeback code
iomap: refactor the writeback interface
iomap: cleanup the pending writeback tracking in iomap_writepage_map_blocks
iomap: pass more arguments using the iomap writeback context
iomap: header diet
Merge tag 'vfs-6.17-rc1.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull superblock callback update from Christian Brauner:
"Currently all filesystems which implement super_operations::shutdown()
can not afford losing a device.
Thus fs_bdev_mark_dead() will just call the ->shutdown() callback for
the involved filesystem.
But it will no longer be the case, as multi-device filesystems like
btrfs can handle certain device loss without the need to shutdown the
whole filesystem.
To allow those multi-device filesystems to be integrated to use
fs_holder_ops:
- Add a new super_operations::remove_bdev() callback
- Try ->remove_bdev() callback first inside fs_bdev_mark_dead().
If the callback returned 0, meaning the fs can handling the device
loss, then exit without doing anything else.
If there is no such callback or the callback returned non-zero
value, continue to shutdown the filesystem as usual.
This means the new remove_bdev() should only do the check on whether
the operation can continue, and if so do the fs specific handlings.
The shutdown handling should still be handled by the existing
->shutdown() callback.
For all existing filesystems with shutdown callback, there is no
change to the code nor behavior.
Btrfs is going to implement both the ->remove_bdev() and ->shutdown()
callbacks soon"
* tag 'vfs-6.17-rc1.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: add a new remove_bdev() callback
Merge tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull fileattr updates from Christian Brauner:
"This introduces the new file_getattr() and file_setattr() system calls
after lengthy discussions.
Both system calls serve as successors and extensible companions to
the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR system calls which have
started to show their age in addition to being named in a way that
makes it easy to conflate them with extended attribute related
operations.
These syscalls allow userspace to set filesystem inode attributes on
special files. One of the usage examples is the XFS quota projects.
XFS has project quotas which could be attached to a directory. All new
inodes in these directories inherit project ID set on parent
directory.
The project is created from userspace by opening and calling
FS_IOC_FSSETXATTR on each inode. This is not possible for special
files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left
with empty project ID. Those inodes then are not shown in the quota
accounting but still exist in the directory. This is not critical but
in the case when special files are created in the directory with
already existing project quota, these new inodes inherit extended
attributes. This creates a mix of special files with and without
attributes. Moreover, special files with attributes don't have a
possibility to become clear or change the attributes. This, in turn,
prevents userspace from re-creating quota project on these existing
files.
In addition, these new system calls allow the implementation of
additional attributes that we couldn't or didn't want to fit into the
legacy ioctls anymore"
* tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: tighten a sanity check in file_attr_to_fileattr()
tree-wide: s/struct fileattr/struct file_kattr/g
fs: introduce file_getattr and file_setattr syscalls
fs: prepare for extending file_get/setattr()
fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP
selinux: implement inode_file_[g|s]etattr hooks
lsm: introduce new hooks for setting/getting inode fsxattr
fs: split fileattr related helpers into separate file
Merge tag 'vfs-6.17-rc1.integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs 'protection info' updates from Christian Brauner:
"This adds the new FS_IOC_GETLBMD_CAP ioctl() to query metadata and
protection info (PI) capabilities. This ioctl returns information
about the files integrity profile. This is useful for userspace
applications to understand a files end-to-end data protection support
and configure the I/O accordingly.
For now this interface is only supported by block devices. However the
design and placement of this ioctl in generic FS ioctl space allows us
to extend it to work over files as well. This maybe useful when
filesystems start supporting PI-aware layouts.
A new structure struct logical_block_metadata_cap is introduced, which
contains the following fields:
- lbmd_flags:
bitmask of logical block metadata capability flags
- lbmd_interval:
the amount of data described by each unit of logical block metadata
- lbmd_size:
size in bytes of the logical block metadata associated with each
interval
- lbmd_opaque_size:
size in bytes of the opaque block tag associated with each interval
- lbmd_opaque_offset:
offset in bytes of the opaque block tag within the logical block
metadata
- lbmd_pi_size:
size in bytes of the T10 PI tuple associated with each interval
- lbmd_pi_offset:
offset in bytes of T10 PI tuple within the logical block metadata
- lbmd_pi_guard_tag_type:
T10 PI guard tag type
- lbmd_pi_app_tag_size:
size in bytes of the T10 PI application tag
- lbmd_pi_ref_tag_size:
size in bytes of the T10 PI reference tag
- lbmd_pi_storage_tag_size:
size in bytes of the T10 PI storage tag
The internal logic to fetch the capability is encapsulated in a helper
function blk_get_meta_cap(), which uses the blk_integrity profile
associated with the device. The ioctl returns -EOPNOTSUPP, if
CONFIG_BLK_DEV_INTEGRITY is not enabled"
* tag 'vfs-6.17-rc1.integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
block: fix lbmd_guard_tag_type assignment in FS_IOC_GETLBMD_CAP
block: fix FS_IOC_GETLBMD_CAP parsing in blkdev_common_ioctl()
fs: add ioctl to query metadata and protection info capabilities
nvme: set pi_offset only when checksum type is not BLK_INTEGRITY_CSUM_NONE
block: introduce pi_tuple_size field in blk_integrity
block: rename tuple_size field in blk_integrity to metadata_size
Merge tag 'vfs-6.17-rc1.rust' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs rust updates from Christian Brauner:
- Allow poll_table pointers to be NULL
- Add Rust files to vfs MAINTAINERS entry
* tag 'vfs-6.17-rc1.rust' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
vfs: add Rust files to MAINTAINERS
poll: rust: allow poll_table ptrs to be null
Merge tag 'vfs-6.17-rc1.bpf' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs bpf updates from Christian Brauner:
"These changes allow bpf to read extended attributes from cgroupfs.
This is useful in redirecting AF_UNIX socket connections based on
cgroup membership of the socket. One use-case is the ability to
implement log namespaces in systemd so services and containers are
redirected to different journals"
* tag 'vfs-6.17-rc1.bpf' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests/kernfs: test xattr retrieval
selftests/bpf: Add tests for bpf_cgroup_read_xattr
bpf: Mark cgroup_subsys_state->cgroup RCU safe
bpf: Introduce bpf_cgroup_read_xattr to read xattr of cgroup's node
kernfs: remove iattr_mutex
Merge tag 'vfs-6.17-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull pidfs updates from Christian Brauner:
- persistent info
Persist exit and coredump information independent of whether anyone
currently holds a pidfd for the struct pid.
The current scheme allocated pidfs dentries on-demand repeatedly.
This scheme is reaching it's limits as it makes it impossible to pin
information that needs to be available after the task has exited or
coredumped and that should not be lost simply because the pidfd got
closed temporarily. The next opener should still see the stashed
information.
This is also a prerequisite for supporting extended attributes on
pidfds to allow attaching meta information to them.
If someone opens a pidfd for a struct pid a pidfs dentry is allocated
and stashed in pid->stashed. Once the last pidfd for the struct pid
is closed the pidfs dentry is released and removed from pid->stashed.
So if 10 callers create a pidfs dentry for the same struct pid
sequentially, i.e., each closing the pidfd before the other creates a
new one then a new pidfs dentry is allocated every time.
Because multiple tasks acquiring and releasing a pidfd for the same
struct pid can race with each another a task may still find a valid
pidfs entry from the previous task in pid->stashed and reuse it. Or
it might find a dead dentry in there and fail to reuse it and so
stashes a new pidfs dentry. Multiple tasks may race to stash a new
pidfs dentry but only one will succeed, the other ones will put their
dentry.
The current scheme aims to ensure that a pidfs dentry for a struct
pid can only be created if the task is still alive or if a pidfs
dentry already existed before the task was reaped and so exit
information has been was stashed in the pidfs inode.
That's great except that it's buggy. If a pidfs dentry is stashed in
pid->stashed after pidfs_exit() but before __unhash_process() is
called we will return a pidfd for a reaped task without exit
information being available.
The pidfds_pid_valid() check does not guard against this race as it
doens't sync at all with pidfs_exit(). The pid_has_task() check might
be successful simply because we're before __unhash_process() but
after pidfs_exit().
Introduce a new scheme where the lifetime of information associated
with a pidfs entry (coredump and exit information) isn't bound to the
lifetime of the pidfs inode but the struct pid itself.
The first time a pidfs dentry is allocated for a struct pid a struct
pidfs_attr will be allocated which will be used to store exit and
coredump information.
If all pidfs for the pidfs dentry are closed the dentry and inode can
be cleaned up but the struct pidfs_attr will stick until the struct
pid itself is freed. This will ensure minimal memory usage while
persisting relevant information.
The new scheme has various advantages. First, it allows to close the
race where we end up handing out a pidfd for a reaped task for which
no exit information is available. Second, it minimizes memory usage.
Third, it allows to remove complex lifetime tracking via dentries
when registering a struct pid with pidfs. There's no need to get or
put a reference. Instead, the lifetime of exit and coredump
information associated with a struct pid is bound to the lifetime of
struct pid itself.
- extended attributes
Now that we have a way to persist information for pidfs dentries we
can start supporting extended attributes on pidfds. This will allow
userspace to attach meta information to tasks.
One natural extension would be to introduce a custom pidfs.* extended
attribute space and allow for the inheritance of extended attributes
across fork() and exec().
The first simple scheme will allow privileged userspace to set
trusted extended attributes on pidfs inodes.
- Allow autonomous pidfs file handles
Various filesystems such as pidfs and drm support opening file
handles without having to require a file descriptor to identify the
filesystem. The filesystem are global single instances and can be
trivially identified solely on the information encoded in the file
handle.
This makes it possible to not have to keep or acquire a sentinal file
descriptor just to pass it to open_by_handle_at() to identify the
filesystem. That's especially useful when such sentinel file
descriptor cannot or should not be acquired.
For pidfs this means a file handle can function as full replacement
for storing a pid in a file. Instead a file handle can be stored and
reopened purely based on the file handle.
Such autonomous file handles can be opened with or without specifying
a a file descriptor. If no proper file descriptor is used the
FD_PIDFS_ROOT sentinel must be passed. This allows us to define
further special negative fd sentinels in the future.
Userspace can trivially test for support by trying to open the file
handle with an invalid file descriptor.
- Allow pidfds for reaped tasks with SCM_PIDFD messages
This is a logical continuation of the earlier work to create pidfds
for reaped tasks through the SO_PEERPIDFD socket option merged in 923ea4d4482b ("Merge patch series "net, pidfs: enable handing out
pidfds for reaped sk->sk_peer_pid"").
- Two minor fixes:
* Fold fs_struct->{lock,seq} into a seqlock
* Don't bother with path_{get,put}() in unix_open_file()
* tag 'vfs-6.17-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (37 commits)
don't bother with path_get()/path_put() in unix_open_file()
fold fs_struct->{lock,seq} into a seqlock
selftests: net: extend SCM_PIDFD test to cover stale pidfds
af_unix: enable handing out pidfds for reaped tasks in SCM_PIDFD
af_unix: stash pidfs dentry when needed
af_unix/scm: fix whitespace errors
af_unix: introduce and use scm_replace_pid() helper
af_unix: introduce unix_skb_to_scm helper
af_unix: rework unix_maybe_add_creds() to allow sleep
selftests/pidfd: decode pidfd file handles withou having to specify an fd
fhandle, pidfs: support open_by_handle_at() purely based on file handle
uapi/fcntl: add FD_PIDFS_ROOT
uapi/fcntl: add FD_INVALID
fcntl/pidfd: redefine PIDFD_SELF_THREAD_GROUP
uapi/fcntl: mark range as reserved
fhandle: reflow get_path_anchor()
pidfs: add pidfs_root_path() helper
fhandle: rename to get_path_anchor()
fhandle: hoist copy_from_user() above get_path_from_fd()
fhandle: raise FILEID_IS_DIR in handle_type
...
Merge tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull mmap_prepare updates from Christian Brauner:
"Last cycle we introduce f_op->mmap_prepare() in c84bf6dd2b83 ("mm:
introduce new .mmap_prepare() file callback").
This is preferred to the existing f_op->mmap() hook as it does require
a VMA to be established yet, thus allowing the mmap logic to invoke
this hook far, far earlier, prior to inserting a VMA into the virtual
address space, or performing any other heavy handed operations.
This allows for much simpler unwinding on error, and for there to be a
single attempt at merging a VMA rather than having to possibly
reattempt a merge based on potentially altered VMA state.
Far more importantly, it prevents inappropriate manipulation of
incompletely initialised VMA state, which is something that has been
the cause of bugs and complexity in the past.
The intent is to gradually deprecate f_op->mmap, and in that vein this
series coverts the majority of file systems to using f_op->mmap_prepare.
Prerequisite steps are taken - firstly ensuring all checks for mmap
capabilities use the file_has_valid_mmap_hooks() helper rather than
directly checking for f_op->mmap (which is now not a valid check) and
secondly updating daxdev_mapping_supported() to not require a VMA
parameter to allow ext4 and xfs to be converted.
Commit bb666b7c2707 ("mm: add mmap_prepare() compatibility layer for
nested file systems") handles the nasty edge-case of nested file
systems like overlayfs, which introduces a compatibility shim to allow
f_op->mmap_prepare() to be invoked from an f_op->mmap() callback.
This allows for nested filesystems to continue to function correctly
with all file systems regardless of which callback is used. Once we
finally convert all file systems, this shim can be removed.
As a result, ecryptfs, fuse, and overlayfs remain unaltered so they
can nest all other file systems.
We additionally do not update resctl - as this requires an update to
remap_pfn_range() (or an alternative to it) which we defer to a later
series, equally we do not update cramfs which needs a mixed mapping
insertion with the same issue, nor do we update procfs, hugetlbfs,
syfs or kernfs all of which require VMAs for internal state and hooks.
We shall return to all of these later"
* tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
doc: update porting, vfs documentation to describe mmap_prepare()
fs: replace mmap hook with .mmap_prepare for simple mappings
fs: convert most other generic_file_*mmap() users to .mmap_prepare()
fs: convert simple use of generic_file_*_mmap() to .mmap_prepare()
mm/filemap: introduce generic_file_*_mmap_prepare() helpers
fs/xfs: transition from deprecated .mmap hook to .mmap_prepare
fs/ext4: transition from deprecated .mmap hook to .mmap_prepare
fs/dax: make it possible to check dev dax support without a VMA
fs: consistently use can_mmap_file() helper
mm/nommu: use file_has_valid_mmap_hooks() helper
mm: rename call_mmap/mmap_prepare to vfs_mmap/mmap_prepare
Merge tag 'vfs-6.17-rc1.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull fallocate updates from Christian Brauner:
"fallocate() currently supports creating preallocated files
efficiently. However, on most filesystems fallocate() will preallocate
blocks in an unwriten state even if FALLOC_FL_ZERO_RANGE is specified.
The extent state must later be converted to a written state when the
user writes data into this range, which can trigger numerous metadata
changes and journal I/O. This may leads to significant write
amplification and performance degradation in synchronous write mode.
At the moment, the only method to avoid this is to create an empty
file and write zero data into it (for example, using 'dd' with a large
block size). However, this method is slow and consumes a considerable
amount of disk bandwidth.
Now that more and more flash-based storage devices are available it is
possible to efficiently write zeros to SSDs using the unmap write
zeroes command if the devices do not write physical zeroes to the
media.
For example, if SCSI SSDs support the UMMAP bit or NVMe SSDs support
the DEAC bit[1], the write zeroes command does not write actual data
to the device, instead, NVMe converts the zeroed range to a
deallocated state, which works fast and consumes almost no disk write
bandwidth.
This series implements the BLK_FEAT_WRITE_ZEROES_UNMAP feature and
BLK_FLAG_WRITE_ZEROES_UNMAP_DISABLED flag for SCSI, NVMe and
device-mapper drivers, and add the FALLOC_FL_WRITE_ZEROES and
STATX_ATTR_WRITE_ZEROES_UNMAP support for ext4 and raw bdev devices.
fallocate() is subsequently extended with the FALLOC_FL_WRITE_ZEROES
flag. FALLOC_FL_WRITE_ZEROES zeroes a specified file range in such a
way that subsequent writes to that range do not require further
changes to the file mapping metadata. This flag is beneficial for
subsequent pure overwriting within this range, as it can save on block
allocation and, consequently, significant metadata changes"
* tag 'vfs-6.17-rc1.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
ext4: add FALLOC_FL_WRITE_ZEROES support
block: add FALLOC_FL_WRITE_ZEROES support
block: factor out common part in blkdev_fallocate()
fs: introduce FALLOC_FL_WRITE_ZEROES to fallocate
dm: clear unmap write zeroes limits when disabling write zeroes
scsi: sd: set max_hw_wzeroes_unmap_sectors if device supports SD_ZERO_*_UNMAP
nvmet: set WZDS and DRB if device enables unmap write zeroes operation
nvme: set max_hw_wzeroes_unmap_sectors if device supports DEAC bit
block: introduce max_{hw|user}_wzeroes_unmap_sectors to queue limits
Merge tag 'vfs-6.17-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull async directory updates from Christian Brauner:
"This contains preparatory changes for the asynchronous directory
locking scheme.
While the locking scheme is still very much controversial and we're
still far away from landing any actual changes in that area the
preparatory work that we've been upstreaming for a while now has been
very useful. This is another set of minor changes and cleanups"
* tag 'vfs-6.17-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
exportfs: use lookup_one_unlocked()
coda: use iterate_dir() in coda_readdir()
VFS: Minor fixes for porting.rst
VFS: merge lookup_one_qstr_excl_raw() back into lookup_one_qstr_excl()
Merge tag 'vfs-6.17-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull namespace updates from Christian Brauner:
"This contains namespace updates. This time specifically for nsfs:
- Userspace heavily relies on the root inode numbers for namespaces
to identify the initial namespaces. That's already a hard
dependency. So we cannot change that anymore. Move the initial
inode numbers to a public header and align the only two namespaces
that currently don't do that with all the other namespaces.
- The root inode of /proc having a fixed inode number has been part
of the core kernel ABI since its inception, and recently some
userspace programs (mainly container runtimes) have started to
explicitly depend on this behaviour.
The main reason this is useful to userspace is that by checking
that a suspect /proc handle has fstype PROC_SUPER_MAGIC and is
PROCFS_ROOT_INO, they can then use openat2() together with
RESOLVE_{NO_{XDEV,MAGICLINK},BENEATH} to ensure that there isn't a
bind-mount that replaces some procfs file with a different one.
This kind of attack has lead to security issues in container
runtimes in the past (such as CVE-2019-19921) and libraries like
libpathrs[1] use this feature of procfs to provide safe procfs
handling functions"
* tag 'vfs-6.17-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
uapi: export PROCFS_ROOT_INO
mntns: use stable inode number for initial mount ns
netns: use stable inode number for initial mount ns
nsfs: move root inode number to uapi
Merge tag 'vfs-6.17-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull overlayfs updates from Christian Brauner:
"This contains overlayfs updates for this cycle.
The changes for overlayfs in here are primarily focussed on preparing
for some proposed changes to directory locking.
Overlayfs currently will sometimes lock a directory on the upper
filesystem and do a few different things while holding the lock. This
is incompatible with the new potential scheme.
This series narrows the region of code protected by the directory
lock, taking it multiple times when necessary. This theoretically
opens up the possibilty of other changes happening on the upper
filesytem between the unlock and the lock. To some extent the patches
guard against that by checking the dentries still have the expect
parent after retaking the lock. In general, concurrent changes to the
upper and lower filesystems aren't supported properly anyway"
* tag 'vfs-6.17-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
ovl: properly print correct variable
ovl: rename ovl_cleanup_unlocked() to ovl_cleanup()
ovl: change ovl_create_real() to receive dentry parent
ovl: narrow locking in ovl_check_rename_whiteout()
ovl: narrow locking in ovl_whiteout()
ovl: change ovl_cleanup_and_whiteout() to take rename lock as needed
ovl: narrow locking on ovl_remove_and_whiteout()
ovl: change ovl_workdir_cleanup() to take dir lock as needed.
ovl: narrow locking in ovl_workdir_cleanup_recurse()
ovl: narrow locking in ovl_indexdir_cleanup()
ovl: narrow locking in ovl_workdir_create()
ovl: narrow locking in ovl_cleanup_index()
ovl: narrow locking in ovl_cleanup_whiteouts()
ovl: narrow locking in ovl_rename()
ovl: simplify gotos in ovl_rename()
ovl: narrow locking in ovl_create_over_whiteout()
ovl: narrow locking in ovl_clear_empty()
ovl: narrow locking in ovl_create_upper()
ovl: narrow the locked region in ovl_copy_up_workdir()
ovl: Call ovl_create_temp() without lock held.
...
Merge tag 'vfs-6.17-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull coredump updates from Christian Brauner:
"This contains an extension to the coredump socket and a proper rework
of the coredump code.
- This extends the coredump socket to allow the coredump server to
tell the kernel how to process individual coredumps. This allows
for fine-grained coredump management. Userspace can decide to just
let the kernel write out the coredump, or generate the coredump
itself, or just reject it.
* COREDUMP_KERNEL
The kernel will write the coredump data to the socket.
* COREDUMP_USERSPACE
The kernel will not write coredump data but will indicate to the
parent that a coredump has been generated. This is used when
userspace generates its own coredumps.
* COREDUMP_REJECT
The kernel will skip generating a coredump for this task.
* COREDUMP_WAIT
The kernel will prevent the task from exiting until the coredump
server has shutdown the socket connection.
The flexible coredump socket can be enabled by using the "@@"
prefix instead of the single "@" prefix for the regular coredump
socket:
@@/run/systemd/coredump.socket
- Cleanup the coredump code properly while we have to touch it
anyway.
Split out each coredump mode in a separate helper so it's easy to
grasp what is going on and make the code easier to follow. The core
coredump function should now be very trivial to follow"
* tag 'vfs-6.17-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits)
cleanup: add a scoped version of CLASS()
coredump: add coredump_skip() helper
coredump: avoid pointless variable
coredump: order auto cleanup variables at the top
coredump: add coredump_cleanup()
coredump: auto cleanup prepare_creds()
cred: add auto cleanup method
coredump: directly return
coredump: auto cleanup argv
coredump: add coredump_write()
coredump: use a single helper for the socket
coredump: move pipe specific file check into coredump_pipe()
coredump: split pipe coredumping into coredump_pipe()
coredump: move core_pipe_count to global variable
coredump: prepare to simplify exit paths
coredump: split file coredumping into coredump_file()
coredump: rename do_coredump() to vfs_coredump()
selftests/coredump: make sure invalid paths are rejected
coredump: validate socket path in coredump_parse()
coredump: don't allow ".." in coredump socket path
...
Merge tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc VFS updates from Christian Brauner:
"This contains the usual selections of misc updates for this cycle.
Features:
- Add ext4 IOCB_DONTCACHE support
This refactors the address_space_operations write_begin() and
write_end() callbacks to take const struct kiocb * as their first
argument, allowing IOCB flags such as IOCB_DONTCACHE to propagate
to the filesystem's buffered I/O path.
Ext4 is updated to implement handling of the IOCB_DONTCACHE flag
and advertises support via the FOP_DONTCACHE file operation flag.
Additionally, the i915 driver's shmem write paths are updated to
bypass the legacy write_begin/write_end interface in favor of
directly calling write_iter() with a constructed synchronous kiocb.
Another i915 change replaces a manual write loop with
kernel_write() during GEM shmem object creation.
Cleanups:
- don't duplicate vfs_open() in kernel_file_open()
- proc_fd_getattr(): don't bother with S_ISDIR() check
- fs/ecryptfs: replace snprintf with sysfs_emit in show function
- vfs: Remove unnecessary list_for_each_entry_safe() from
evict_inodes()
- filelock: add new locks_wake_up_waiter() helper
- fs: Remove three arguments from block_write_end()
- VFS: change old_dir and new_dir in struct renamedata to dentrys
* tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (24 commits)
netfs: Remove unused declaration netfs_queue_write_request()
eventpoll: fix sphinx documentation build warning
ext4: support uncached buffered I/O
mm/pagemap: add write_begin_get_folio() helper function
fs: change write_begin/write_end interface to take struct kiocb *
drm/i915: Refactor shmem_pwrite() to use kiocb and write_iter
drm/i915: Use kernel_write() in shmem object create
eventpoll: Fix semi-unbounded recursion
vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes()
fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable
fs/buffer: remove the min and max limit checks in __getblk_slow()
fs: Prevent file descriptor table allocations exceeding INT_MAX
fs: Remove three arguments from block_write_end()
fs/ecryptfs: replace snprintf with sysfs_emit in show function
fs: annotate suspected data race between poll_schedule_timeout() and pollwake()
docs/vfs: update references to i_mutex to i_rwsem
fs/buffer: remove comment about hard sectorsize
fs_context: fix parameter name in infofc() macro
VFS: change old_dir and new_dir in struct renamedata to dentrys
proc_fd_getattr(): don't bother with S_ISDIR() check
...
Merge tag 'pull-mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs mount updates from Al Viro:
- mount hash conflicts rudiments are gone now - we do not allow
multiple mounts with the same parent/mountpoint to be hashed at the
same time.
- 'struct mount' changes:
- mnt_umounting is gone
- mnt_slave_list/mnt_slave is an hlist now
- overmounts are kept track of by explicit pointer in mount
- a bunch of flags moved out of mnt_flags to a new field, with
only namespace_sem for protection
- mnt_expiry is protected by mount_lock now (instead of
namespace_sem)
- MNT_LOCKED is used only for mounts that need to remain attached
to their parents to prevent mountpoint exposure - no more
overloading it for absolute roots
- all mnt_list uses are transient now - it's used only to
represent temporary sets during umount_tree()
- mount refcounting change: children no longer pin parents for any
mounts, whether they'd passed through umount_tree() or not
- 'struct mountpoint' changes:
- refcount is no more; what matters is ->m_list emptiness
- instead of temporary bumping the refcount, we insert a new
object (pinned_mountpoint) into ->m_list
- new calling conventions for lock_mount() and friends
- do_move_mount()/attach_recursive_mnt() seriously cleaned up
- globals in fs/pnode.c are gone
- propagate_mnt(), change_mnt_propagation() and propagate_umount()
cleaned up (in the last case - pretty much completely rewritten).
- freeing of emptied mnt_namespace is done in namespace_unlock(). For
one thing, there are subtle ordering requirements there; for another
it simplifies cleanups.
- assorted cleanups
- restore the machinery for long-term mounts from accumulated bitrot.
This is going to get a followup come next cycle, when the change of
vfs_fs_parse_string() calling conventions goes into -next
* tag 'pull-mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (48 commits)
statmount_mnt_basic(): simplify the logics for group id
invent_group_ids(): zero ->mnt_group_id always implies !IS_MNT_SHARED()
get rid of CL_SHARE_TO_SLAVE
take freeing of emptied mnt_namespace to namespace_unlock()
copy_tree(): don't link the mounts via mnt_list
change_mnt_propagation(): move ->mnt_master assignment into MS_SLAVE case
mnt_slave_list/mnt_slave: turn into hlist_head/hlist_node
turn do_make_slave() into transfer_propagation()
do_make_slave(): choose new master sanely
change_mnt_propagation(): do_make_slave() is a no-op unless IS_MNT_SHARED()
change_mnt_propagation() cleanups, step 1
propagate_mnt(): fix comment and convert to kernel-doc, while we are at it
propagate_mnt(): get rid of last_dest
fs/pnode.c: get rid of globals
propagate_one(): fold into the sole caller
propagate_one(): separate the "what should be the master for this copy" part
propagate_one(): separate the "do we need secondary here?" logics
propagate_mnt(): handle all peer groups in the same loop
propagate_one(): get rid of dest_master
mount: separate the flags accessed only under namespace_sem
...
Merge tag 'pull-ceph-d_name-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull ceph dentry->d_name fixes from Al Viro:
"Stuff that had fallen through the cracks back in February; ceph folks
tested that pile and said they prefer to have it go through my tree..."
* tag 'pull-ceph-d_name-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ceph: fix a race with rename() in ceph_mdsc_build_path()
prep for ceph_encode_encrypted_fname() fixes
[ceph] parse_longname(): strrchr() expects NUL-terminated string
Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc VFS updates from Al Viro:
"VFS-related cleanups in various places (mostly of the "that really
can't happen" or "there's a better way to do it" variety)"
* tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
gpib: use file_inode()
binder_ioctl_write_read(): simplify control flow a bit
secretmem: move setting O_LARGEFILE and bumping users' count to the place where we create the file
apparmor: file never has NULL f_path.mnt
landlock: opened file never has a negative dentry
Merge tag 'pull-securityfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull securityfs updates from Al Viro:
"Securityfs cleanups and fixes:
- one extra reference is enough to pin a dentry down; no need for
two. Switch to regular scheme, similar to shmem, debugfs, etc. This
fixes a securityfs_recursive_remove() dentry leak, among other
things.
- we need to have the filesystem pinned to prevent the contents
disappearing; what we do not need is pinning it for each file.
Doing that only for files and directories in the root is enough.
- the previous two changes allow us to get rid of the racy kludges in
efi_secret_unlink(), where we can use simple_unlink() instead of
securityfs_remove(). Which does not require unlocking and relocking
the parent, with all deadlocks that invites.
- Make securityfs_remove() take the entire subtree out, turning
securityfs_recursive_remove() into its alias. Makes a lot more
sense for callers and fixes a mount leak, while we are at it.
- Making securityfs_remove() remove the entire subtree allows for
much simpler life in most of the users - efi_secret, ima_fs, evm,
ipe, tmp get cleaner. I hadn't touched apparmor use of securityfs,
but I suspect that it would be useful there as well"
* tag 'pull-securityfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
tpm: don't bother with removal of files in directory we'll be removing
ipe: don't bother with removal of files in directory we'll be removing
evm_secfs: clear securityfs interactions
ima_fs: get rid of lookup-by-dentry stuff
ima_fs: don't bother with removal of files in directory we'll be removing
efi_secret: clean securityfs use up
make securityfs_remove() remove the entire subtree
fix locking in efi_secret_unlink()
securityfs: pin filesystem only for objects directly in root
securityfs: don't pin dentries twice, once is enough...
Merge tag 'pull-rpc_pipefs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull rpc_pipefs updates from Al Viro:
"Massage rpc_pipefs to use saner primitives and clean up the APIs
provided to the rest of the kernel"
* tag 'pull-rpc_pipefs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
rpc_create_client_dir(): return 0 or -E...
rpc_create_client_dir(): don't bother with rpc_populate()
rpc_new_dir(): the last argument is always NULL
rpc_pipe: expand the calls of rpc_mkdir_populate()
rpc_gssd_dummy_populate(): don't bother with rpc_populate()
rpc_mkpipe_dentry(): switch to simple_start_creating()
rpc_pipe: saner primitive for creating regular files
rpc_pipe: saner primitive for creating subdirectories
rpc_pipe: don't overdo directory locking
rpc_mkpipe_dentry(): saner calling conventions
rpc_unlink(): saner calling conventions
rpc_populate(): lift cleanup into callers
rpc_unlink(): use simple_recursive_removal()
rpc_{rmdir_,}depopulate(): use simple_recursive_removal() instead
rpc_pipe: clean failure exits in fill_super
new helper: simple_start_creating()
Merge tag 'pull-simple_recursive_removal' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull simple_recursive_removal() update from Al Viro:
"Removing subtrees of kernel filesystems is done in quite a few places;
unfortunately, it's easy to get wrong. A number of open-coded attempts
are out there, with varying amount of bogosities.
simple_recursive_removal() had been introduced for doing that with all
precautions needed; it does an equivalent of rm -rf, with sufficient
locking, eviction of anything mounted on top of the subtree, etc.
This series converts a bunch of open-coded instances to using that"
* tag 'pull-simple_recursive_removal' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
functionfs, gadgetfs: use simple_recursive_removal()
kill binderfs_remove_file()
fuse_ctl: use simple_recursive_removal()
pstore: switch to locked_recursive_removal()
binfmt_misc: switch to locked_recursive_removal()
spufs: switch to locked_recursive_removal()
add locked_recursive_removal()
better lockdep annotations for simple_recursive_removal()
simple_recursive_removal(): saner interaction with fsnotify
Merge tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull dentry d_flags updates from Al Viro:
"The current exclusion rules for dentry->d_flags stores are rather
unpleasant. The basic rules are simple:
- stores to dentry->d_flags are OK under dentry->d_lock
- stores to dentry->d_flags are OK in the dentry constructor, before
becomes potentially visible to other threads
Unfortunately, there's a couple of exceptions to that, and that's
where the headache comes from.
The main PITA comes from d_set_d_op(); that primitive sets ->d_op of
dentry and adjusts the flags that correspond to presence of individual
methods. It's very easy to misuse; existing uses _are_ safe, but proof
of correctness is brittle.
Use in __d_alloc() is safe (we are within a constructor), but we might
as well precalculate the initial value of 'd_flags' when we set the
default ->d_op for given superblock and set 'd_flags' directly instead
of messing with that helper.
The reasons why other uses are safe are bloody convoluted; I'm not
going to reproduce it here. See [1] for gory details, if you care. The
critical part is using d_set_d_op() only just prior to
d_splice_alias(), which makes a combination of d_splice_alias() with
setting ->d_op, etc a natural replacement primitive.
Better yet, if we go that way, it's easy to take setting ->d_op and
modifying 'd_flags' under ->d_lock, which eliminates the headache as
far as 'd_flags' exclusion rules are concerned. Other exceptions are
minor and easy to deal with.
What this series does:
- d_set_d_op() is no longer available; instead a new primitive
(d_splice_alias_ops()) is provided, equivalent to combination of
d_set_d_op() and d_splice_alias().
- new field of struct super_block - 's_d_flags'. This sets the
default value of 'd_flags' to be used when allocating dentries on
this filesystem.
- new primitive for setting 's_d_op': set_default_d_op(). This
replaces stores to 's_d_op' at mount time.
All in-tree filesystems converted; out-of-tree ones will get caught
by the compiler ('s_d_op' is renamed, so stores to it will be
caught). 's_d_flags' is set by the same primitive to match the
's_d_op'.
- a lot of filesystems had sb->s_d_op->d_delete equal to
always_delete_dentry; that is equivalent to setting
DCACHE_DONTCACHE in 'd_flags', so such filesystems can bloody well
set that bit in 's_d_flags' and drop 'd_delete()' from
dentry_operations.
In quite a few cases that results in empty dentry_operations, which
means that we can get rid of those.
- kill simple_dentry_operations - not needed anymore
- massage d_alloc_parallel() to get rid of the other exception wrt
'd_flags' stores - we can set DCACHE_PAR_LOOKUP as soon as we
allocate the new dentry; no need to delay that until we commit to
using the sucker.
As the result, 'd_flags' stores are all either under ->d_lock or done
before the dentry becomes visible in any shared data structures"
Link: https://lore.kernel.org/all/20250224010624.GT1977892@ZenIV/
* tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (21 commits)
configfs: use DCACHE_DONTCACHE
debugfs: use DCACHE_DONTCACHE
efivarfs: use DCACHE_DONTCACHE instead of always_delete_dentry()
9p: don't bother with always_delete_dentry
ramfs, hugetlbfs, mqueue: set DCACHE_DONTCACHE
kill simple_dentry_operations
devpts, sunrpc, hostfs: don't bother with ->d_op
shmem: no dentry retention past the refcount reaching zero
d_alloc_parallel(): set DCACHE_PAR_LOOKUP earlier
make d_set_d_op() static
simple_lookup(): just set DCACHE_DONTCACHE
tracefs: Add d_delete to remove negative dentries
set_default_d_op(): calculate the matching value for ->d_flags
correct the set of flags forbidden at d_set_d_op() time
split d_flags calculation out of d_set_d_op()
new helper: set_default_d_op()
fuse: no need for special dentry_operations for root dentry
switch procfs from d_set_d_op() to d_splice_alias_ops()
new helper: d_splice_alias_ops()
procfs: kill ->proc_dops
...
Merge tag 'pull-headers_param' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull asm/param cleanup from Al Viro:
"This massages asm/param.h to simpler and more uniform shape:
- all arch/*/include/uapi/asm/param.h are either generated includes
of <asm-generic/param.h> or a #define or two followed by such
include
- no arch/*/include/asm/param.h anywhere, generated or not
- include <asm/param.h> resolves to arch/*/include/uapi/asm/param.h
of the architecture in question (or that of host in case of uml)
- include/asm-generic/param.h pulls uapi/asm-generic/param.h and
deals with USER_HZ, CLOCKS_PER_SEC and with HZ redefinition after
that"
* tag 'pull-headers_param' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
loongarch, um, xtensa: get rid of generated arch/$ARCH/include/asm/param.h
alpha: regularize the situation with asm/param.h
xtensa: get rid uapi/asm/param.h
Merge tag 'nfsd-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"NFSD is finally able to offer write delegations to clients that open
files with O_WRONLY, thanks to patches from Dai Ngo. We're expecting
this to accelerate a few interesting corner cases.
The cap on the number of operations per NFSv4 COMPOUND has been
lifted. Now, clients that send COMPOUNDs containing dozens of
operations (for example, a long stream of LOOKUP operations to walk a
pathname in a single round trip) will no longer be rejected.
This release re-enables the ability for NFSD to perform NFSv4.2 COPY
operations asynchronously. This feature has been disabled to mitigate
the risk of denial-of-service when too many such requests arrive.
Many thanks to the contributors, reviewers, testers, and bug reporters
who participated during the v6.17 development cycle"
* tag 'nfsd-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (32 commits)
nfsd: Drop dprintk in blocklayout xdr functions
sunrpc: make svc_tcp_sendmsg() take a signed sentp pointer
sunrpc: rearrange struct svc_rqst for fewer cachelines
sunrpc: return better error in svcauth_gss_accept() on alloc failure
sunrpc: reset rq_accept_statp when starting a new RPC
sunrpc: remove SVC_SYSERR
sunrpc: fix handling of unknown auth status codes
NFSD: Simplify struct knfsd_fh
NFSD: Access a knfsd_fh's fsid by pointer
Revert "NFSD: Force all NFSv4.2 COPY requests to be synchronous"
NFSD: Avoid multiple -Wflex-array-member-not-at-end warnings
NFSD: Use vfs_iocb_iter_write()
NFSD: Use vfs_iocb_iter_read()
NFSD: Clean up kdoc for nfsd_open_local_fh()
NFSD: Clean up kdoc for nfsd_file_put_local()
NFSD: Remove definition for trace_nfsd_ctl_maxconn
NFSD: Remove definition for trace_nfsd_file_gc_recent
NFSD: Remove definitions for unused trace_nfsd_file_lru trace points
NFSD: Remove definition for trace_nfsd_file_unhash_and_queue
nfsd: Use correct error code when decoding extents
...
Merge tag 'gfs2-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Prevent cluster nodes from trying to recover their own filesystems
during a withdraw
- Add two missing migrate_folio aops and an additional exhash directory
consistency check (both triggered by syzbot bug reports)
- Sanitize how dlm results are processed and clean up a few quirks in
the glock code
- Minor stuff: Get rid of the GIF_ALLOC_FAILED flag; use SECTOR_SIZE
and SECTOR_SHIFT
* tag 'gfs2-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: No more self recovery
gfs2: Validate i_depth for exhash directories
gfs2: Set .migrate_folio in gfs2_{rgrp,meta}_aops
gfs2: a minor finish_xmote cleanup
gfs2: simplify finish_xmote
gfs2: sanitize the gdlm_ast -> finish_xmote interface
gfs2: Minor do_xmote cancelation fix
gfs2: Remove GIF_ALLOC_FAILED flag
gfs2: Use SECTOR_SIZE and SECTOR_SHIFT
Merge tag 'xfs-merge-6.17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Carlos Maiolino:
"This doesn't contain any new features. It mostly is a collection of
clean ups and code refactoring that I preferred to postpone to the
merge window.
It includes removal of several unused tracepoints, refactoring key
comparing routines under the B-Trees management and cleanup of xfs
journaling code"
* tag 'xfs-merge-6.17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (44 commits)
xfs: don't use a xfs_log_iovec for ri_buf in log recovery
xfs: don't use a xfs_log_iovec for attr_item names and values
xfs: use better names for size members in xfs_log_vec
xfs: cleanup the ordered item logic in xlog_cil_insert_format_items
xfs: don't pass the old lv to xfs_cil_prepare_item
xfs: remove unused trace event xfs_reflink_cow_enospc
xfs: remove unused trace event xfs_discard_rtrelax
xfs: remove unused trace event xfs_log_cil_return
xfs: remove unused trace event xfs_dqreclaim_dirty
fs/xfs: replace strncpy with memtostr_pad()
xfs: Remove unused label in xfs_dax_notify_dev_failure
xfs: improve the comments in xfs_select_zone_nowait
xfs: improve the comments in xfs_max_open_zones
xfs: stop passing an inode to the zone space reservation helpers
xfs: rename oz_write_pointer to oz_allocated
xfs: use a uint32_t to cache i_used_blocks in xfs_init_zone
xfs: improve the xg_active_ref check in xfs_group_free
xfs: remove the xlog_ticket_t typedef
xfs: remove xrep_trans_{alloc,cancel}_hook_dummy
xfs: return the allocated transaction from xchk_trans_alloc_empty
...
Merge tag 'erofs-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"We now support metadata compression. It can be useful for embedded use
cases or archiving a large number of small files.
Additionally, readdir performance has been improved by enabling
readahead (note that it was already common practice for ext3/4 non-dx
and f2fs directories). We may consider further improvements later to
align with ext4's s_inode_readahead_blks behavior for slow devices
too.
The remaining commits are minor.
Summary:
- Add support for metadata compression
- Enable readahead for directories to improve readdir performance
- Minor fixes and cleanups"
* tag 'erofs-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: support to readahead dirent blocks in erofs_readdir()
erofs: implement metadata compression
erofs: add on-disk definition for metadata compression
erofs: fix build error with CONFIG_EROFS_FS_ZIP_ACCEL=y
erofs: remove ENOATTR definition
erofs: refine erofs_iomap_begin()
erofs: unify meta buffers in z_erofs_fill_inode()
erofs: remove need_kmap in erofs_read_metabuf()
erofs: do sanity check on m->type in z_erofs_load_compact_lcluster()
erofs: get rid of {get,put}_page() for ztailpacking data
Merge tag 'ntfs3_for_6.17' of https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 updates from Konstantin Komarov:
"Added:
- sanity check for file name
- mark live inode as bad and avoid any operations
Fixed:
- handling of symlinks created in windows
- creation of symlinks for relative path
Changed:
- cancel setting inode as bad after removing name fails
- revert 'replace inode_trylock with inode_lock'"
* tag 'ntfs3_for_6.17' of https://github.com/Paragon-Software-Group/linux-ntfs3:
Revert "fs/ntfs3: Replace inode_trylock with inode_lock"
fs/ntfs3: Exclude call make_bad_inode for live nodes.
fs/ntfs3: cancle set bad inode after removing name fails
fs/ntfs3: Add sanity check for file name
fs/ntfs3: correctly create symlink for relative path
fs/ntfs3: fix symlinks cannot be handled correctly
Merge tag 'for-6.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"A number of usability and feature updates, scattered performance
improvements and fixes. Highlight of the core changes is getting
closer to enabling large folios (now behind a config option).
User visible changes:
- update defrag ioctl, add new flag to request no compression on
existing extents
- restrict writes to block devices after mount
- in experimental config, enable large folios for data, almost
complete but not widely tested
- add stats tracking duration of critical section in transaction
commit to /sys/fs/btrfs/FSID/commit_stats
Performance improvements:
- caching of lookup results of free space bitmap (20% runtime
improvement on an empty file creation benchmark)
- accessors to metadata (b-tree items) simplified and optimized,
minor improvement in metadata-heavy workloads
- readahead on compressed data improves sequential read
- the xarray for extent buffers is indexed by denser keys, leading to
better packing of the nodes (50-70% reduction of leaf nodes)
Notable fixes:
- stricter compression mount option parsing
- send properly emits fallocate command for file holes when protocol
v2 is used
- fix overallocation of chunks with mount option 'ssd_spread', due to
interaction with size classes not finding the right chunk
(workaround: manual reclaim by 'usage' balance filter)
- various quota enable/disable races with rescan, more verbose
notifications about inconsistent state
- populate otime in tree-log during log replay
- handle ENOSPC when NOCOW file is used with mmap()
Core:
- large data folios enabled in experimental config
- in zoned mode, allocate reloc block group on mount to make sure
there's always one available for zone reclaim under heavy load
- rework device opening, they're always open as read-only and delayed
until the super block is created, allowing the restricted writes
after mount
- preparatory work for adding blk_holder_ops, allowing device
freeze/thaw in the future
Cleanups, refactoring:
- type and naming unifications (int/bool, return variables)
- rb-tree helper refactoring and simplifications
- reorder memory allocations to less critical places
- RCU string (used for device name) refactoring and API removal
- replace all remaining use of strcpy()"
* tag 'for-6.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (209 commits)
btrfs: send: use fallocate for hole punching with send stream v2
btrfs: unfold transaction aborts when writing dirty block groups
btrfs: use saner variable type and name to indicate extrefs at add_inode_ref()
btrfs: don't skip remaining extrefs if dir not found during log replay
btrfs: don't ignore inode missing when replaying log tree
btrfs: enable large data folios for data reloc inode
btrfs: output more info when btrfs_subpage_assert() failed
btrfs: reloc: unconditionally invalidate the page cache for each cluster
btrfs: defrag: add flag to force no-compression
btrfs: fix ssd_spread overallocation
btrfs: zoned: requeue to unused block group list if zone finish failed
btrfs: zoned: do not remove unwritten non-data block group
btrfs: remove btrfs_clear_extent_bits()
btrfs: use cached state when falling back from NOCoW write to CoW write
btrfs: set EXTENT_NORESERVE before range unlock in btrfs_truncate_block()
btrfs: don't print relocation messages from auto reclaim
btrfs: remove redundant auto reclaim log message
btrfs: make btrfs_check_nocow_lock() check more than one extent
btrfs: assert we can NOCOW the range in btrfs_truncate_block()
btrfs: update function comment for btrfs_check_nocow_lock()
...
Merge tag 'timers-urgent-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single fix for the PTP systemcounter mechanism:
The rework of this mechanism added a 'use_nsec' member to struct
system_counterval. get_device_system_crosststamp() instantiates that
struct on the stack and hands a pointer to the driver callback.
Only the drivers which set use_nsec to true, initialize that field,
but all others ignore it. As get_device_system_crosststamp() does not
initialize the struct, the use_nsec field contains random stack
content in those cases. That causes a miscalulation usually resulting
in a failing range check in the best case.
Initialize the structure before handing it to the drivers to cure
that"
* tag 'timers-urgent-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Zero initialize system_counterval when querying time from phc drivers
sched/task_stack: Add missing const qualifier to end_of_stack()
Add missing const qualifier to the non-CONFIG_THREAD_INFO_IN_TASK
version of end_of_stack() to match the CONFIG_THREAD_INFO_IN_TASK
version. Fixes a warning with CONFIG_KSTACK_ERASE=y on archs that don't
select THREAD_INFO_IN_TASK (such as LoongArch):
error: passing 'const struct task_struct *' to parameter of type 'struct task_struct *' discards qualifiers
The stackleak_task_low_bound() function correctly uses a const task
parameter, but the legacy end_of_stack() prototype didn't like that.
Build tested on loongarch (with CONFIG_KSTACK_ERASE=y) and m68k
(with CONFIG_DEBUG_STACK_USAGE=y).
kstack_erase: Add -mgeneral-regs-only to silence Clang warnings
Once CONFIG_KSTACK_ERASE is enabled with Clang on i386, the build warns:
kernel/kstack_erase.c:168:2: warning: function with attribute 'no_caller_saved_registers' should only call a function with attribute 'no_caller_saved_registers' or be compiled with '-mgeneral-regs-only' [-Wexcessive-regsave]
Add -mgeneral-regs-only for the kstack_erase handler, to make Clang feel
better (it is effectively a no-op flag for the kernel). No binary
changes encountered.
Build & boot tested with Clang 21 on x86_64, and i386.
Build tested with GCC 14.2.0 on x86_64, i386, arm64, and arm.
init.h: Disable sanitizer coverage for __init and __head
While __noinstr already contained __no_sanitize_coverage, it needs to
be added to __init and __head section markings to support the Clang
implementation of CONFIG_KSTACK_ERASE. This is to make sure the stack
depth tracking callback is not executed in unsupported contexts.
The other sanitizer coverage options (trace-pc and trace-cmp) aren't
needed in __head nor __init either ("We are interested in code coverage
as a function of a syscall inputs"[1]), so this is fine to disable for
them as well.
kstack_erase: Disable kstack_erase for all of arm compressed boot code
When building with CONFIG_KSTACK_ERASE=y and CONFIG_ARM_ATAG_DTB_COMPAT=y,
the compressed boot environment encounters an undefined symbol error:
ld.lld: error: undefined symbol: __sanitizer_cov_stack_depth
>>> referenced by atags_to_fdt.c:135
This occurs because the compiler instruments the atags_to_fdt() function
with sanitizer coverage calls, but the minimal compressed boot environment
lacks access to sanitizer runtime support.
The compressed boot environment already disables stack protector with
-fno-stack-protector. Similarly disable sanitizer coverage by adding
$(DISABLE_KSTACK_ERASE) to the general compiler flags (and remove it
from the one place it was noticed before), which contains the appropriate
flags to prevent sanitizer instrumentation.
This follows the same pattern used in other early boot contexts where
sanitizer runtime support is unavailable.
Merge tag 'i2c-for-6.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- qup: avoid potential hang when waiting for bus idle
- tegra: improve ACPI reset error handling
- virtio: use interruptible wait to prevent hang during transfer
* tag 'i2c-for-6.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: qup: jump out of the loop in case of timeout
i2c: virtio: Avoid hang by using interruptible completion wait
i2c: tegra: Fix reset error handling with ACPI
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux
Pull ARM fixes from Russell King:
- use an absolute path for asm/unified.h in KBUILD_AFLAGS to solve a
regression caused by commit d5c8d6e0fa61 ("kbuild: Update assembler
calls to use proper flags and language target")
- fix dead code elimination binutils version check again
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: 9450/1: Fix allowing linker DCE with binutils < 2.36
ARM: 9448/1: Use an absolute path to unified.h in KBUILD_AFLAGS
Merge tag 'soc-fixes-6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC fixes from Arnd Bergmann:
"These are two fixes that came in late, one addresses a regression on a
rockchips based board, the other is for ensuring a consistent dt
binding for a device added in 6.16 before the incorrect one makes it
into a release"
* tag 'soc-fixes-6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
arm64: dts: rockchip: Drop netdev led-triggers on NanoPi R5S
arm64: dts: allwinner: a523: Rename emac0 to gmac0
Wolfram Sang [Fri, 25 Jul 2025 22:59:39 +0000 (00:59 +0200)]
Merge tag 'i2c-host-fixes-6.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
i2c-host-fixes for v6.16-rc8
qup: avoid potential hang when waiting for bus idle
tegra: improve ACPI reset error handling
virtio: use interruptible wait to prevent hang during transfer
res = hfs_find_init(HFS_SB(inode->i_sb)->ext_tree, &fd);
if (!res) {
res = __hfs_ext_cache_extent(&fd, inode, block);
hfs_find_exit(&fd);
}
return res;
}
The problem here that hfs_find_init() is trying to use
HFS_SB(inode->i_sb)->ext_tree that is not initialized yet.
It will be initailized when hfs_btree_open() finishes
the execution.
The patch adds checking of tree pointer in hfs_find_init()
and it reworks the logic of hfs_btree_open() by reading
the b-tree's header directly from the volume. The read_mapping_page()
is exchanged on filemap_grab_folio() that grab the folio from
mapping. Then, sb_bread() extracts the b-tree's header
content and copy it into the folio.
Reported-by: Wenzhi Wang <wenzhi.wang@uwaterloo.ca> Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250710213657.108285-1-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
This patch introduces is_bnode_offset_valid() method that checks
the requested offset value. Also, it introduces
check_and_correct_requested_length() method that checks and
correct the requested length (if it is necessary). These methods
are used in hfs_bnode_read(), hfs_bnode_write(), hfs_bnode_clear(),
hfs_bnode_copy(), and hfs_bnode_move() with the goal to prevent
the access out of allocated memory and triggering the crash.
The reason of the issue that code doesn't check the correctness
of the requested offset and length. As a result, incorrect value
of offset or/and length could result in access out of allocated
memory.
This patch introduces is_bnode_offset_valid() method that checks
the requested offset value. Also, it introduces
check_and_correct_requested_length() method that checks and
correct the requested length (if it is necessary). These methods
are used in hfsplus_bnode_read(), hfsplus_bnode_write(),
hfsplus_bnode_clear(), hfsplus_bnode_copy(), and hfsplus_bnode_move()
with the goal to prevent the access out of allocated memory
and triggering the crash.
Reported-by: Kun Hu <huk23@m.fudan.edu.cn> Reported-by: Jiaji Qin <jjtan24@m.fudan.edu.cn> Reported-by: Shuoran Bai <baishuoran@hrbeu.edu.cn> Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Link: https://lore.kernel.org/r/20250703214804.244077-1-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
where HFSPLUS_MAX_STRLEN is 255 bytes. The issue happens if length
of the structure instance has value bigger than 255 (for example,
65283). In such case, pointer on unicode buffer is going beyond of
the allocated memory.
The patch fixes the issue by checking the length value of
hfsplus_unistr instance and using 255 value in the case if length
value is bigger than HFSPLUS_MAX_STRLEN. Potential reason of such
situation could be a corruption of Catalog File b-tree's node.
Reported-by: Wenzhi Wang <wenzhi.wang@uwaterloo.ca> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org Reviewed-by: Yangtao Li <frank.li@vivo.com> Link: https://lore.kernel.org/r/20250710230830.110500-1-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
hfsplus: don't use BUG_ON() in hfsplus_create_attributes_file()
When the volume header contains erroneous values that do not reflect
the actual state of the filesystem, hfsplus_fill_super() assumes that
the attributes file is not yet created, which later results in hitting
BUG_ON() when hfsplus_create_attributes_file() is called. Replace this
BUG_ON() with -EIO error with a message to suggest running fsck tool.
hfsplus: don't set REQ_SYNC for hfsplus_submit_bio()
hfsplus_submit_bio() called by hfsplus_sync_fs() uses bdev_virt_rw() which
in turn uses submit_bio_wait() to submit the BIO.
But submit_bio_wait() already sets the REQ_SYNC flag on the BIO so there
is no need for setting the flag in hfsplus_sync_fs() when calling
hfsplus_submit_bio().
Merge tag 'drm-fixes-2025-07-26' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes (part 2) from Dave Airlie:
"Just the follow up fixes for i915 and xe, all pretty minor.
i915:
- Fix DP 2.7 Gbps DP_LINK_BW value on g4x
- Fix return value on intel_atomic_commit_fence_wait
xe:
- Fix build without debugfs"
* tag 'drm-fixes-2025-07-26' of https://gitlab.freedesktop.org/drm/kernel:
drm/xe: Fix build without debugfs
drm/i915/display: Fix dma_fence_wait_timeout() return value handling
drm/i915/dp: Fix 2.7 Gbps DP_LINK_BW value on g4x
Merge tag 'vfs-6.16-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"Two last-minute fixes for this cycle:
- Set afs vllist to NULL if addr parsing fails
- Add a missing check for reaching the end of the string in afs"
* tag 'vfs-6.16-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
afs: Set vllist to NULL if addr parsing fails
afs: Fix check for NULL terminator
Merge tag 'bcachefs-2025-07-24' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"User reported fixes:
- Fix btree node scan on encrypted filesystems by not using btree
node header fields encrypted
- Fix a race in btree write buffer flush; this caused EROs primarily
during fsck for some people"
* tag 'bcachefs-2025-07-24' of git://evilpiepirate.org/bcachefs:
bcachefs: Add missing snapshots_seen_add_inorder()
bcachefs: Fix write buffer flushing from open journal entry
bcachefs: btree_node_scan: don't re-read before initializing found_btree_node
ARM: 9450/1: Fix allowing linker DCE with binutils < 2.36
Commit e7607f7d6d81 ("ARM: 9443/1: Require linker to support KEEP within
OVERLAY for DCE") accidentally broke the binutils version restriction
that was added in commit 0d437918fb64 ("ARM: 9414/1: Fix build issue
with LD_DEAD_CODE_DATA_ELIMINATION"), reintroducing the segmentation
fault addressed by that workaround.
Restore the binutils version dependency by using
CONFIG_LD_CAN_USE_KEEP_IN_OVERLAY as an additional condition to ensure
that CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION is only enabled with
binutils >= 2.36 and ld.lld >= 21.0.0.
Closes: https://lore.kernel.org/6739da7d-e555-407a-b5cb-e5681da71056@landley.net/ Closes: https://lore.kernel.org/CAFERDQ0zPoya5ZQfpbeuKVZEo_fKsonLf6tJbp32QnSGAtbi+Q@mail.gmail.com/ Cc: stable@vger.kernel.org Fixes: e7607f7d6d81 ("ARM: 9443/1: Require linker to support KEEP within OVERLAY for DCE") Reported-by: Rob Landley <rob@landley.net> Tested-by: Rob Landley <rob@landley.net> Reported-by: Martin Wetterwald <martin@wetterwald.eu> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
ARM: 9448/1: Use an absolute path to unified.h in KBUILD_AFLAGS
After commit d5c8d6e0fa61 ("kbuild: Update assembler calls to use proper
flags and language target"), which updated as-instr to use the
'assembler-with-cpp' language option, the Kbuild version of as-instr
always fails internally for arch/arm with
<command-line>: fatal error: asm/unified.h: No such file or directory
compilation terminated.
because '-include' flags are now taken into account by the compiler
driver and as-instr does not have '$(LINUXINCLUDE)', so unified.h is not
found.
This went unnoticed at the time of the Kbuild change because the last
use of as-instr in Kbuild that arch/arm could reach was removed in 5.7
by commit 541ad0150ca4 ("arm: Remove 32bit KVM host support") but a
stable backport of the Kbuild change to before that point exposed this
potential issue if one were to be reintroduced.
Follow the general pattern of '-include' paths throughout the tree and
make unified.h absolute using '$(srctree)' to ensure KBUILD_AFLAGS can
be used independently.
Closes: https://lore.kernel.org/CACo-S-1qbCX4WAVFA63dWfHtrRHZBTyyr2js8Lx=Az03XHTTHg@mail.gmail.com/ Cc: stable@vger.kernel.org Fixes: d5c8d6e0fa61 ("kbuild: Update assembler calls to use proper flags and language target") Reported-by: KernelCI bot <bot@kernelci.org> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
block: restore two stage elevator switch while running nr_hw_queue update
The kmemleak reports memory leaks related to elevator resources that
were originally allocated in the ->init_hctx() method. The following
leak traces are observed after running blktests block/040:
The issue arises while we run nr_hw_queue update, Specifically, we first
reallocate hardware contexts (hctx) via __blk_mq_realloc_hw_ctxs(), and
then later invoke elevator_switch() (assuming q->elevator is not NULL).
The elevator switch code would first exit old elevator (elevator_exit)
and then switches to the new elevator. The elevator_exit loops through
each hctx and invokes the elevator’s per-hctx exit method ->exit_hctx(),
which releases resources allocated during ->init_hctx().
This memleak manifests when we reduce the num of h/w queues - for example,
when the initial update sets the number of queues to X, and a later update
reduces it to Y, where Y < X. In this case, we'd loose the access to old
hctxs while we get to elevator exit code because __blk_mq_realloc_hw_ctxs
would have already released the old hctxs. As we don't now have any
reference left to the old hctxs, we don't have any way to free the
scheduler resources (which are allocate in ->init_hctx()) and kmemleak
complains about it.
This issue was caused due to the commit 596dce110b7d ("block: simplify
elevator reattachment for updating nr_hw_queues"). That change unified
the two-stage elevator teardown and reattachment into a single call that
occurs after __blk_mq_realloc_hw_ctxs() has already freed the hctxs.
This patch restores the previous two-stage elevator switch logic during
nr_hw_queues updates. First, the elevator is switched to 'none', which
ensures all scheduler resources are properly freed. Then, the hardware
contexts (hctxs) are reallocated, and the software-to-hardware queue
mappings are updated. Finally, the original elevator is reattached. This
sequence prevents loss of references to old hctxs and avoids the scheduler
resource leaks reported by kmemleak.
Cc: stable@vger.kernel.org Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
If client send multiple session setup requests to ksmbd,
Preauh_HashValue race condition could happen.
There is no need to free sess->Preauh_HashValue at session setup phase.
It can be freed together with session at connection termination phase.
Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-27661 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
ksmbd: fix null pointer dereference error in generate_encryptionkey
If client send two session setups with krb5 authenticate to ksmbd,
null pointer dereference error in generate_encryptionkey could happen.
sess->Preauth_HashValue is set to NULL if session is valid.
So this patch skip generate encryption key if session is valid.
Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-27654 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Kent Overstreet [Tue, 22 Jul 2025 03:41:50 +0000 (23:41 -0400)]
bcachefs: Fix write buffer flushing from open journal entry
When flushing the btree write buffer, we pull write buffer keys directly
from the journal instead of letting the journal write path copy them to
the write buffer.
When flushing from the currently open journal buffer, we have to block
new reservations and wait for outstanding reservations to complete.
Recheck the reservation state after blocking new reservations:
previously, we were checking the reservation count from before calling
__journal_block().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Merge tag 'mm-hotfixes-stable-2025-07-24-18-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"11 hotfixes. 9 are cc:stable and the remainder address post-6.15
issues or aren't considered necessary for -stable kernels.
7 are for MM"
* tag 'mm-hotfixes-stable-2025-07-24-18-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
sprintf.h requires stdarg.h
resource: fix false warning in __request_region()
mm/damon/core: commit damos_quota_goal->nid
kasan: use vmalloc_dump_obj() for vmalloc error reports
mm/ksm: fix -Wsometimes-uninitialized from clang-21 in advisor_mode_show()
mm: update MAINTAINERS entry for HMM
nilfs2: reject invalid file types when reading inodes
selftests/mm: fix split_huge_page_test for folio_split() tests
mailmap: add entry for Senozhatsky
mm/zsmalloc: do not pass __GFP_MOVABLE if CONFIG_COMPACTION=n
mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list
Stephen Rothwell [Mon, 21 Jul 2025 06:15:57 +0000 (16:15 +1000)]
sprintf.h requires stdarg.h
In file included from drivers/crypto/intel/qat/qat_common/adf_pm_dbgfs_utils.c:4:
include/linux/sprintf.h:11:54: error: unknown type name 'va_list'
11 | __printf(2, 0) int vsprintf(char *buf, const char *, va_list);
| ^~~~~~~
include/linux/sprintf.h:1:1: note: 'va_list' is defined in header '<stdarg.h>'; this is probably fixable by adding '#include <stdarg.h>'
Link: https://lkml.kernel.org/r/20250721173754.42865913@canb.auug.org.au Fixes: 39ced19b9e60 ("lib/vsprintf: split out sprintf() and friends") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Akinobu Mita [Sat, 19 Jul 2025 11:26:04 +0000 (20:26 +0900)]
resource: fix false warning in __request_region()
A warning is raised when __request_region() detects a conflict with a
resource whose resource.desc is IORES_DESC_DEVICE_PRIVATE_MEMORY.
But this warning is only valid for iomem_resources.
The hmem device resource uses resource.desc as the numa node id, which can
cause spurious warnings.
This warning appeared on a machine with multiple cxl memory expanders.
One of the NUMA node id is 6, which is the same as the value of
IORES_DESC_DEVICE_PRIVATE_MEMORY.
In this environment it was just a spurious warning, but when I saw the
warning I suspected a real problem so it's better to fix it.
This change fixes this by restricting the warning to only iomem_resource.
This also adds a missing new line to the warning message.
Link: https://lkml.kernel.org/r/20250719112604.25500-1-akinobu.mita@gmail.com Fixes: 7dab174e2e27 ("dax/hmem: Move hmem device registration to dax_hmem.ko") Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Sat, 19 Jul 2025 18:19:32 +0000 (11:19 -0700)]
mm/damon/core: commit damos_quota_goal->nid
DAMOS quota goal uses 'nid' field when the metric is
DAMOS_QUOTA_NODE_MEM_{USED,FREE}_BP. But the goal commit function is not
updating the goal's nid field. Fix it.
Link: https://lkml.kernel.org/r/20250719181932.72944-1-sj@kernel.org Fixes: 0e1c773b501f ("mm/damon/core: introduce damos quota goal metrics for memory node utilization") [6.16.x] Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
GCC appears to have kind of fragile inlining heuristics, in the
sense that it can change whether or not it inlines something based on
optimizations. It looks like the kcov instrumentation being added (or in
this case, removed) from a function changes the optimization results,
and some functions marked "inline" are _not_ inlined. In that case,
we end up with __init code calling a function not marked __init, and we
get the build warnings I'm trying to eliminate in the coming patch that
adds __no_sanitize_coverage to __init functions:
This problem is somewhat fragile (though using either __always_inline
or __init will deterministically solve it), but we've tripped over
this before with GCC and the solution has usually been to just use
__always_inline and move on.
For x86 this means forcing several functions to be inline with
__always_inline.
GCC appears to have kind of fragile inlining heuristics, in the
sense that it can change whether or not it inlines something based on
optimizations. It looks like the kcov instrumentation being added (or in
this case, removed) from a function changes the optimization results,
and some functions marked "inline" are _not_ inlined. In that case,
we end up with __init code calling a function not marked __init, and we
get the build warnings I'm trying to eliminate in the coming patch that
adds __no_sanitize_coverage to __init functions:
This problem is somewhat fragile (though using either __always_inline
or __init will deterministically solve it), but we've tripped over
this before with GCC and the solution has usually been to just use
__always_inline and move on.
For arm64 this requires forcing one ACPI function to be inlined with
__always_inline.
Merge tag 'pci-v6.16-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fix from Bjorn Helgaas:
- Create pwrctrl devices only when we need them, i.e., when
CONFIG_PCI_PWRCTRL is enabled.
This allows brcmstb to work around a pwrctrl regression by
disabling CONFIG_PCI_PWRCTRL (Manivannan Sadhasivam)
* tag 'pci-v6.16-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI/pwrctrl: Create pwrctrl devices only when CONFIG_PCI_PWRCTRL is enabled
Lucas De Marchi [Tue, 22 Jul 2025 19:52:08 +0000 (12:52 -0700)]
drm/xe: Fix build without debugfs
When CONFIG_DEBUG_FS is off, drivers/gpu/drm/xe/xe_gt_debugfs.o
is not built and build fails on some setups with:
ld: drivers/gpu/drm/xe/xe_gt.o: in function `xe_fault_inject_gt_reset':
drivers/gpu/drm/xe/xe_gt.h:27:(.text+0x1659): undefined reference to `gt_reset_failure'
ld: drivers/gpu/drm/xe/xe_gt.h:27:(.text+0x1c16): undefined reference to `gt_reset_failure'
collect2: error: ld returned 1 exit status
Do not use the gt_reset_failure attribute if debugfs is not enabled.
Fixes: 8f3013e0b222 ("drm/xe: Introduce fault injection for gt reset") Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20250722-xe-fix-build-fault-v1-1-157384d50987@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 4d3bbe9dd28c0a4ca119e4b8823c5f5e9cb3ff90) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Merge tag 'sound-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Some last-minute fixes. All changes are device-specific small fixes or
quirks, safe to apply"
* tag 'sound-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: mediatek: common: fix device and OF node leak
ALSA: hda/realtek: Fix mute LED mask on HP OMEN 16 laptop
ALSA: usb-audio: qcom: Adjust mutex unlock order
ASoC: SDCA: correct the calculation of the maximum init table size
ASoC: rt5650: Eliminate the high frequency glitch
ASoC: SOF: Intel: PTL: Add the sdw_process_wakeen op
ALSA: hda/realtek - Add mute LED support for HP Pavilion 15-eg0xxx
ALSA: hda/realtek - Add mute LED support for HP Victus 15-fa0xxx
ASoC: mediatek: mt8365-dai-i2s: pass correct size to mt8365_dai_set_priv
The function ksmbd_vfs_kern_path_locked() seems to serve two functions
and as a result has an odd interface.
On success it returns with the parent directory locked and with write
access on that filesystem requested, but it may have crossed over a
mount point to return the path, which makes the lock and the write
access irrelevant.
This patches separates the functionality into two functions:
- ksmbd_vfs_kern_path() does not lock the parent, does not request
write access, but does cross mount points
- ksmbd_vfs_kern_path_locked() does not cross mount points but
does lock the parent and request write access.
The parent_path parameter is no longer needed. For the _locked case
the final path is sufficient to drop write access and to unlock the
parent (using path->dentry->d_parent which is safe while the lock is
held).
There were 3 caller of ksmbd_vfs_kern_path_locked().
- smb2_create_link() needs to remove the target if it existed and
needs the lock and the write-access, so it continues to use
ksmbd_vfs_kern_path_locked(). It would not make sense to
cross mount points in this case.
- smb2_open() is the only user that needs to cross mount points
and it has no need for the lock or write access, so it now uses
ksmbd_vfs_kern_path()
- smb2_creat() does not need to cross mountpoints as it is accessing
a file that it has just created on *this* filesystem. But also it
does not need the lock or write access because by the time
ksmbd_vfs_kern_path_locked() was called it has already created the
file. So it could use either interface. It is simplest to use
ksmbd_vfs_kern_path().
ksmbd_vfs_kern_path_unlock() is still needed after
ksmbd_vfs_kern_path_locked() but it doesn't require the parent_path any
more. After a successful call to ksmbd_vfs_kern_path(), only path_put()
is needed to release the path.
Signed-off-by: NeilBrown <neil@brown.name> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Two important arm64 fixes ahead of the 6.16 release.
The first fixes a regression introduced during the merge window where
the KVM UUID (which is used to advertise KVM-specific hypercalls for
things like time synchronisation in the guest) was corrupted thanks to
an endianness bug introduced when converting the code to use the
UUID_INIT() helper.
The second fixes a stack-pointer corruption issue during
context-switch which has been observed in the wild when taking a
pseudo-NMI with shadow call stack enabled.
Summary:
- Fix broken UUID value for the KVM/arm64 hypervisor SMCCC interface
- Fix stack corruption on context-switch, primarily seen on (but not
limited to) configurations with both pNMI and SCS enabled"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack()
arm64: kvm, smccc: Fix vendor uuid
Merge tag 'net-6.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can and xfrm.
The TI regression notified last week is actually on our net-next tree,
it does not affect 6.16.
We are investigating a virtio regression which is quite hard to
reproduce - currently only our CI sporadically hits it. Hopefully it
should not be critical, and I'm not sure that an additional week would
be enough to solve it.
Current release - fix to a fix:
- sched: sch_qfq: avoid sleeping in atomic context in qfq_delete_class
Previous releases - regressions:
- xfrm:
- set transport header to fix UDP GRO handling
- delete x->tunnel as we delete x
- eth:
- mlx5: fix memory leak in cmd_exec()
- i40e: when removing VF MAC filters, avoid losing PF-set MAC
- gve: fix stuck TX queue for DQ queue format
Previous releases - always broken:
- can: fix NULL pointer deref of struct can_priv::do_set_mode
- eth:
- ice: fix a null pointer dereference in ice_copy_and_init_pkg()
- ism: fix concurrency management in ism_cmd()
- dpaa2: fix device reference count leak in MAC endpoint handling
- icssg-prueth: fix buffer allocation for ICSSG
Misc:
- selftests: mptcp: increase code coverage"
* tag 'net-6.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits)
net: hns3: default enable tx bounce buffer when smmu enabled
net: hns3: fixed vf get max channels bug
net: hns3: disable interrupt when ptp init failed
net: hns3: fix concurrent setting vlan filter issue
s390/ism: fix concurrency management in ism_cmd()
selftests: drv-net: wait for iperf client to stop sending
MAINTAINERS: Add in6.h to MAINTAINERS
selftests: netfilter: tone-down conntrack clash test
can: netlink: can_changelink(): fix NULL pointer deref of struct can_priv::do_set_mode
net/sched: sch_qfq: Avoid triggering might_sleep in atomic context in qfq_delete_class
gve: Fix stuck TX queue for DQ queue format
net: appletalk: Fix use-after-free in AARP proxy probe
net: bcmasp: Restore programming of TX map vector register
selftests: mptcp: connect: also cover checksum
selftests: mptcp: connect: also cover alt modes
e1000e: ignore uninitialized checksum word on tgp
e1000e: disregard NVM checksum on tgp when valid checksum bit is not set
ice: Fix a null pointer dereference in ice_copy_and_init_pkg()
i40e: When removing VF MAC filters, only check PF-set MAC
i40e: report VF tx_dropped with tx_errors instead of tx_discards
...