
Merge tag 'gpio-updates-for-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio updates from Bartosz Golaszewski:
"There's one new driver (Apple SMC) and extensions to existing drivers
  for supporting new HW models. A lot of different improvements across
  drivers and in core GPIO code. Details on that are in the signed tag
  as usual.

  We managed to remove some of the legacy APIs. Arnd Bergmann started to
  work on making the legacy bits optional so that we may compile them
  only for older platforms that still really need them.

  Rob Herring has done a lot of work to convert legacy .txt dt-bindings
  for GPIO controllers to YAML. There are only a few left now in the
  GPIO tree.

  A big part of the commits in this PR concern the conversion of GPIO
  drivers to using the new line value setter callbacks. This conversion
  is now complete treewide (unless I've missed something) and once all
  the changes from different trees land in mainline, I'll send you
  another PR containing a commit dropping the legacy callbacks from the
  tree.

  As the quest to pay back technical debt never really ends, we're
  starting another set of interface conversions, this time it's about
  moving fields specific to only a handful of drivers using the
  gpio-mmio helper out of the core gpio_chip structure that every
  controller implements and uses. This cycle we introduce a new set of
  APIs and convert a few drivers under drivers/gpio/, next cycle we'll
  convert remaining modules treewide (in gpio, pinctrl and mfd trees)
  and finally remove the old interfaces and move the gpio-mmio fields
  into their own structure wrapping gpio_chip.

  One last change I should mention here is the rework of the sysfs
  interface. In 2016, we introduced the GPIO character device as the
  preferred alternative to the sysfs class under /sys/class/gpio. While
  it has seen wide adoption with the help of its user-space
  counterpart - libgpiod - there are still users who prefer the
  simplicity of sysfs.

  As far as the GPIO subsystem is concerned, the problem is not the
  existence of the GPIO class as such but rather the fact that it
  exposes the global GPIO numbers to user space, preventing us from
  ever being able to remove the numberspace from the kernel. To that
  end, in this release we introduced a parallel, limited sysfs interface
  that doesn't expose these numbers and only implements a subset of
  features that are relevant to the existing users. This is a result of
  several discussions over the course of the last year and should allow us
  to remove the legacy part some time in the future.

  Summary:

  GPIOLIB core:
   - introduce a parallel, limited sysfs user ABI that doesn't expose
     the global GPIO numbers to user-space while maintaining backward
     compatibility with the end goal of it completely replacing the
     existing interface, allowing us to remove it
   - remove the legacy devm_gpio_request() routine which has no more
     users
   - start the process of making it possible to compile out the legacy
     parts of the GPIO core for users who don't need them by introducing
     a new Kconfig option: GPIOLIB_LEGACY
   - don't use global GPIO numbers in debugfs output from the core code
     (drivers still do it, the work is ongoing)
   - start the process of moving the fields specific to the gpio-mmio
     helper out of the core struct gpio_chip into their own structure
     that wraps it: create a new header with modern interfaces and
     convert several drivers to using it
   - remove the platform data structure associated with the gpio-mmio
     helper from the kernel after having converted all remaining users
     to generic device properties
   - remove legacy struct gpio definition as it has no more users

  New drivers:
   - add the GPIO driver for the Apple System Management Controller

  Driver improvements:
   - add support for new models to gpio-adp5585, gpio-tps65219 and
     gpio-pca953x
   - extend the interrupt support in gpio-loongson-64bit
   - allow marking simulated GPIO lines as invalid in gpio-sim
   - convert all remaining GPIO drivers to using the new GPIO value
     setter callbacks
   - convert gpio-rcar to using simple device power management ops
     callbacks
   - don't check if the current direction of a line is output before setting
     the value in gpio-pisosr and ti-fpc202: the GPIO core already
     handles that
   - also drop unneeded GPIO range checks in drivers, the core already
     makes sure we're within bounds when calling driver callbacks
   - use dev_fwnode() where applicable across GPIO drivers
   - set line value in gpio-zynqmp-modepin and gpio-twl6040 when the
     user wants to change direction of the pin to output even though
     these drivers don't need to do anything else to actually set the
     direction; otherwise a call like gpiod_direction_output(d, 1) will
     not result in the line being driven high
   - remove the redundant call to pm_runtime_mark_last_busy() from
     gpio-arizona
   - use lock guards in gpio-cadence and gpio-mxc
   - check the return values of regmap functions in gpio-wcd934x and
     gpio-tps65912
   - use better regmap interfaces in gpio-wcove and gpio-pca953x
   - remove dummy GPIO chip callbacks from several drivers in cases
     where the GPIO core can already handle their absence
   - allow building gpio-palmas as a module

  Fixes:
   - use correct bit widths (according to the documentation) in
     gpio-virtio

  Device-tree bindings:
   - convert several of the legacy .txt documents for many different
     devices to YAML, improving automatic validation
   - create a "trivial" GPIO DT schema that covers a wide range of
     simple hardware that share a set of basic GPIO properties
   - document new HW: Apple Mac SMC GPIO block and adp5589 I/O expander
   - document a new model for pca95xx
   - add and/or remove properties in YAML documents for gpio-rockchip,
     fsl,qoriq-gpio, arm,pl061 and gpio-xilinx

  Misc:
   - some minor refactoring in several places: adding/removing forward
     declarations, moving defines to better places, constifying the
     arguments of some functions, removing duplicate includes, etc.
   - documentation updates"
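Two driver-side conventions from the summary above can be shown together: value setters that return an error code instead of void, and a direction_output callback that actually applies the requested value, so that gpiod_direction_output(d, 1) drives the line high. The sketch below is a userspace model of a hypothetical I2C-style expander; every `demo_*` name is made up for illustration and the struct is not the kernel's `struct gpio_chip`.

```c
#include <errno.h>

/* Hypothetical expander state; not a kernel structure. */
struct demo_expander {
	int bus_ok;          /* nonzero while the simulated bus works */
	unsigned int out;    /* output register contents */
	unsigned int dir;    /* bit set = line configured as output */
};

/* Simulated register write that can fail, like an I2C/regmap transfer. */
static inline int demo_bus_write(struct demo_expander *exp,
				 unsigned int *reg, unsigned int val)
{
	if (!exp->bus_ok)
		return -EIO;
	*reg = val;
	return 0;
}

/* Setter returning 0 or a negative errno instead of void, so the
 * caller learns when the hardware write failed. */
static inline int demo_set(struct demo_expander *exp, unsigned int offset,
			   int value)
{
	unsigned int val = value ? exp->out | (1u << offset)
				 : exp->out & ~(1u << offset);

	return demo_bus_write(exp, &exp->out, val);
}

/* direction_output must also apply the value; otherwise the line keeps
 * its old level. Writing the value before switching the direction is one
 * common choice to avoid briefly driving a stale level. */
static inline int demo_direction_output(struct demo_expander *exp,
					unsigned int offset, int value)
{
	int ret = demo_set(exp, offset, value);

	if (ret)
		return ret;
	return demo_bus_write(exp, &exp->dir, exp->dir | (1u << offset));
}
```

With the void-returning style, a failed transfer in the setter would simply be lost; returning the errno lets the caller propagate it.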

* tag 'gpio-updates-for-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (202 commits)
  MIPS: alchemy: gpio: use new GPIO line value setter callbacks for the remaining chips
  gpiolib: enable CONFIG_GPIOLIB_LEGACY even for !GPIOLIB
  gpio: virtio: Fix config space reading.
  gpiolib: make legacy interfaces optional
  dt-bindings: gpio: rockchip: Allow use of a power-domain
  gpiolib: of: add forward declaration for struct device_node
  power: reset: macsmc-reboot: Add driver for rebooting via Apple SMC
  gpio: Add new gpio-macsmc driver for Apple Macs
  mfd: Add Apple Silicon System Management Controller
  soc: apple: rtkit: Make shmem_destroy optional
  dt-bindings: mfd: Add Apple Mac System Management Controller
  dt-bindings: power: reboot: Add Apple Mac SMC Reboot Controller
  dt-bindings: gpio: Add Apple Mac SMC GPIO block
  gpio: cadence: Remove duplicated include in gpio-cadence.c
  gpio: tps65219: Add support for TI TPS65214 PMIC
  gpio: tps65219: Update _IDX & _OFFSET macro prefix
  gpio: sysfs: Fix an end of loop test in gpiod_unexport()
  dt-bindings: gpio: Convert qca,ar7100-gpio to DT schema
  dt-bindings: gpio: Convert maxim,max3191x to DT schema
  dt-bindings: gpio: fsl,qoriq-gpio: Add missing mpc8xxx compatibles
  ...
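Moving gpio-mmio-specific fields out of the core gpio_chip and into a structure that wraps it relies on the usual kernel embedding idiom: callbacks still receive the core chip and recover the wrapper with container_of(). A userspace approximation, assuming hypothetical `demo_*` types (this does not reproduce the new header's actual contents):

```c
#include <stddef.h>

/* Hypothetical, heavily trimmed stand-in for the core chip structure. */
struct demo_gpio_chip {
	const char *label;
	unsigned int ngpio;
};

/* Hypothetical wrapper: fields only mmio-style drivers need live here,
 * so the core structure no longer has to carry them. */
struct demo_mmio_chip {
	struct demo_gpio_chip gc;   /* embedded core chip */
	unsigned long data_reg;     /* e.g. data register offset */
	unsigned long shadow;       /* e.g. shadow copy of output values */
};

/* Userspace version of the kernel's container_of() idiom. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static inline struct demo_mmio_chip *to_demo_mmio_chip(struct demo_gpio_chip *gc)
{
	return demo_container_of(gc, struct demo_mmio_chip, gc);
}

/* A callback gets the core chip but only touches wrapper-specific state. */
static inline void demo_mmio_set_shadow_bit(struct demo_gpio_chip *gc,
					    unsigned int offset, int value)
{
	struct demo_mmio_chip *chip = to_demo_mmio_chip(gc);

	if (value)
		chip->shadow |= 1UL << offset;
	else
		chip->shadow &= ~(1UL << offset);
}
```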

Merge tag 'sound-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound updates from Takashi Iwai:
"This includes lots of file shuffling due to HD-audio code
  reorganization and many trivial changes, but otherwise there shouldn't
  be much surprise from the functionality POV. The PR includes the PM
  changes as a prerequisite, too. Some highlights below:

  Core:
   - Performance optimizations in PCM core code
   - Refactoring of ASoC Kconfig menus to be hopefully more consistent
     and easier to navigate.
   - Refactoring of ASoC DAPM code, mainly hiding functionality that
     doesn't need to be exposed to drivers

  HD-audio reorganization:
   - All code is moved under sound/hda with a more understandable
     tree structure, along with file renames
   - The huge Realtek driver code is split into several parts: a common
     helper module plus driver modules per probe entry
   - HDMI and Cirrus codec drivers are also split

  ASoC:
   - Further work on the generic handling for SoundWire SDCA devices
   - Support for AMD ACP7.2 and SoundWire on ACP 7.1, Fairphone 4 & 5,
     various Intel systems, Qualcomm QCS8275, Richtek RTQ9124 and TI
     TAS5753

  HD-audio and USB-audio:
   - TAS2781 driver cleanup and TAS2770 support
   - EQ enablement in CA0132 driver
   - USB audio quirk code cleanups

  Others:
   - Cleanups of PM autosuspend call patterns with the update from the
     PM tree
   - Lots of strcpy() -> strscpy() conversions for fixed size arrays"

* tag 'sound-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (385 commits)
  ALSA: hda: Add TAS2770 support
  ASoC: qcom: sm8250: Add Fairphone 4 soundcard compatible
  ASoC: dt-bindings: qcom,sm8250: Add Fairphone 4 sound card
  ASoC: dt-bindings: qcom,q6afe: Document q6usb subnode
  ASoC: SDCA: Fix implicit cast from le16
  ASoC: SDCA: Shrink detected_mode_handler() stack frame
  ASoC: SDCA: Check devm_mutex_init() return value
  ASoC: SDCA: add route by the number of input pins in MU entity
  ALSA: hda/realtek: Add support for ASUS Commercial laptops using CS35L41 HDA
  ASoC: Intel: sof_rt5682: Add HDMI-In capture with rt5682 support for PTL.
  ASoC: codec: tlv320aic32x4: Fix reset GPIO check
  ASoC: dt-bindings: qcom,lpass-va-macro: Define clock-names in top-level
  ASoC: SDCA: Add hw_params() helper function
  ASoC: SDCA: Add a helper to get the SoundWire port number
  ASoC: SDCA: Add helper to add DAI constraints
  ASoC: soc-dai: Add private data to snd_soc_dai
  ASoC: SDCA: Move SDCA search functions and export
  ASoC: SDCA: Remove overly chatty input pin list warning
  ASoC: SDCA: Allow read-only controls to be deferrable
  ASoC: SDCA: Update memory allocations to zero initialise
  ...
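The strcpy() -> strscpy() conversions mentioned above bound copies into fixed-size arrays. A userspace approximation of the kernel strscpy() contract (copy at most size - 1 characters, always NUL-terminate when size > 0, return the copied length or -E2BIG on truncation) might look like this; `demo_strscpy` is a hypothetical name, not a libc or kernel function:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>  /* ssize_t */

/* Hypothetical userspace approximation of the kernel's strscpy(). */
static inline ssize_t demo_strscpy(char *dest, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;   /* no room even for the terminator */

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dest[i] = src[i];
	dest[i] = '\0';          /* always terminated, unlike strncpy() */

	/* Characters left in src mean the copy was truncated. */
	return src[i] == '\0' ? (ssize_t)i : -E2BIG;
}
```

Unlike strcpy(), the destination can never overflow, and unlike strncpy() the result is always terminated and truncation is reported to the caller.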

Merge tag 'thermal-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control updates from Rafael Wysocki:
"These update the thermal control sysfs interface and multiple thermal
  control drivers:

   - Convert EAGAIN into ENODATA in temp_show() to prevent user space
     from polling the sysfs file in vain after a failing O_NONBLOCK read
     under the assumption that the read would have blocked (Hsin-Te
     Yuan)

   - Add Wildcat Lake PCI ID to the int340x Intel processor thermal
     driver (Srinivas Pandruvada)

   - Add debugfs interface to override the temperature set by the
     firmware in the Intel platform temperature control (PTC) interface
     and add a new sysfs control attribute called thermal_tolerance to
     it (Srinivas Pandruvada)

   - Enable the stage 2 shutdown in the qcom-spmi-temp-alarm thermal
     driver and add support for more SPMI variants to it (Anjelique
     Melendez)

   - Constify the thermal_zone_device_ops structure where possible in
     several assorted thermal drivers (Christophe Jaillet)

   - Use the dev_fwnode() helper instead of of_fwnode_handle(), as it is
     more adequate, wherever possible in thermal drivers (Jiri Slaby)

   - Implement and document One-Time Programmable fuse support in the
     Rockchip thermal driver in order to increase the precision of the
     measurements (Nicolas Frattaroli)

   - Change the way the MediaTek LVTS thermal driver stores the
     initialization data sequence to support different sequences
     matching different platforms. Introduce mt7988 support with a new
     initialization sequence (Mason Chang)

   - Document the QCom TSens Milos Temperature Sensor DT bindings (Luca
     Weiss)

   - Add the fallback compatible string for MT7981 and MT8516 DT
     bindings (Aleksander Jan Bajkowski)

   - Add the compatible string for the Tegra210B01 SOC_THERM driver
     (Aaron Kling)"
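The first item only changes which errno reaches user space: -EAGAIN tells an O_NONBLOCK reader to retry, which can lead to pointless polling, while -ENODATA reports that no temperature is available. A minimal sketch of that mapping, using a hypothetical helper rather than the actual temp_show() body:

```c
#include <errno.h>

/*
 * Hypothetical helper: maps the result of a sensor read before it is
 * returned from a sysfs show() routine. -EAGAIN would invite user space
 * to retry in vain, so it is reported as -ENODATA instead.
 */
static inline int demo_temp_show_errno(int get_temp_ret)
{
	if (get_temp_ret == -EAGAIN)
		return -ENODATA;
	return get_temp_ret;   /* success or any other error passes through */
}
```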

* tag 'thermal-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
  dt-bindings: thermal: tegra: Document Tegra210B01
  dt-bindings: thermal: mediatek: Add fallback compatible string for MT7981 and MT8516
  dt-bindings: thermal: qcom-tsens: document the Milos Temperature Sensor
  thermal/drivers/mediatek/lvts_thermal: Add mt7988 lvts commands
  thermal/drivers/mediatek/lvts_thermal: Add lvts commands and their sizes to driver data
  thermal/drivers/mediatek/lvts_thermal: Change lvts commands array to static const
  thermal/drivers/rockchip: Support reading trim values from OTP
  dt-bindings: thermal: rockchip: document otp thermal trim
  thermal/drivers/rockchip: Support RK3576 SoC in the thermal driver
  dt-bindings: rockchip-thermal: Add RK3576 compatible
  thermal/drivers/rockchip: Rename rk_tsadcv3_tshut_mode
  thermal: Use dev_fwnode()
  thermal: Constify struct thermal_zone_device_ops
  thermal/drivers/loongson2: Constify struct thermal_zone_device_ops
  thermal/drivers/qcom-spmi-temp-alarm: Add support for LITE PMIC peripherals
  thermal/drivers/qcom-spmi-temp-alarm: Add support for GEN2 rev 2 PMIC peripherals
  thermal/drivers/qcom-spmi-temp-alarm: Prepare to support additional Temp Alarm subtypes
  thermal/drivers/qcom-spmi-temp-alarm: Add temp alarm data struct based on HW subtype
  thermal/drivers/qcom-spmi-temp-alarm: Enable stage 2 shutdown when required
  thermal: sysfs: Return ENODATA instead of EAGAIN for reads
  ...
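The LVTS rework moves the initialization commands and their sizes into per-platform driver data, so one driver can replay a different sequence per SoC. The table-driven pattern can be sketched as follows; the structure, names, and command values are hypothetical placeholders, not the driver's real registers:

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical per-SoC init sequences; values are placeholders. */
static const uint32_t demo_cmds_soc_a[] = { 0x1, 0x2, 0x3 };
static const uint32_t demo_cmds_soc_b[] = { 0x10, 0x20 };

/* Per-platform match data carries the command array and its length. */
struct demo_lvts_data {
	const char *name;
	const uint32_t *init_cmds;
	size_t num_init_cmds;
};

static const struct demo_lvts_data demo_platforms[] = {
	{ "soc-a", demo_cmds_soc_a, DEMO_ARRAY_SIZE(demo_cmds_soc_a) },
	{ "soc-b", demo_cmds_soc_b, DEMO_ARRAY_SIZE(demo_cmds_soc_b) },
};

/* Simulated init: "writes" each command into a register log buffer.
 * Returns the number of commands written (capped at log_len). */
static inline size_t demo_lvts_init(const struct demo_lvts_data *data,
				    uint32_t *reg_log, size_t log_len)
{
	size_t i;

	for (i = 0; i < data->num_init_cmds && i < log_len; i++)
		reg_log[i] = data->init_cmds[i];
	return i;
}
```

Adding a new SoC then means adding a table and a match entry rather than touching the init code.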

Merge tag 'acpi-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI updates from Rafael Wysocki:
"These update APEI (new EINJv2 error injection, assorted fixes), fix
  the ACPI processor driver, update the legacy ACPI /proc interface
  (multiple assorted fixes of minor issues) and several assorted ACPI
  drivers (minor fixes and cleanups):

   - Printing the address in acpi_ex_trace_point() is either incorrect
     during early kernel boot or not really useful later when pathnames
     resolve properly, so stop doing it (Mario Limonciello)

   - Address several minor issues in the legacy ACPI proc interface
     (Andy Shevchenko)

   - Fix acpi_object union initialization in the ACPI processor driver
     to avoid using memory that contains leftover data (Sebastian Ott)

   - Make the ACPI processor perflib driver take the initial _PPC limit
     into account as appropriate (Jiayi Li)

   - Fix message formatting in the ACPI processor throttling driver and
     in the ACPI PCI link driver (Colin Ian King)

   - Clean up general ACPI PM domain handling (Rafael Wysocki)

   - Fix iomem-related sparse warnings in the APEI EINJ driver (Zaid
     Alali, Tony Luck)

   - Add EINJv2 error injection support to the APEI EINJ driver (Zaid
     Alali)

   - Fix memory corruption in error_type_set() in the APEI EINJ driver
     (Dan Carpenter)

   - Fix a less-than-zero comparison on a size_t variable in the APEI EINJ
     driver (Colin Ian King)

   - Fix check and iounmap of an uninitialized pointer in the APEI EINJ
     driver (Colin Ian King)

   - Add TAINT_MACHINE_CHECK to the GHES panic path in APEI to improve
     diagnostics and post-mortem analysis (Breno Leitao)

   - Update APEI reviewer records and other ACPI-related information in
     MAINTAINERS as well as the contact information in the ACPI ABI
     documentation (Rafael Wysocki)

   - Fix the handling of synchronous uncorrected memory errors in APEI
     (Shuai Xue)

   - Remove an AudioDSP-related ID from the ACPI LPSS driver (Andy
     Shevchenko)

   - Replace sprintf()/scnprintf() with sysfs_emit() in the ACPI fan
     driver and update a debug message in fan_get_state_acpi4() (Eslam
     Khafagy, Abdelrahman Fekry, Sumeet Pawnikar)

   - Add Intel Wildcat Lake support to the ACPI DPTF driver (Srinivas
     Pandruvada)

   - Add more debug information regarding failing firmware updates to
     the ACPI pfr_update driver (Chen Yu)

   - Reduce the verbosity of the ACPI PRM (platform runtime mechanism)
     driver to avoid user confusion (Zhu Qiyu)

   - Replace sprintf() with sysfs_emit() in the ACPI TAD (time and alarm
     device) driver (Sukrut Heroorkar)

   - Enable CONFIG_ACPI_DEBUG by default to make it easier to get ACPI
     debug messages from OEM platforms (Mario Limonciello)

   - Fix parent device references in ASL examples in the ACPI
     documentation and fix spelling and style in the gpio-properties
     documentation in firmware-guide (Andy Shevchenko)

   - Fix typos in ACPI documentation and comments (Bjorn Helgaas)"

* tag 'acpi-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (39 commits)
  ACPI: Fix typos
  ACPI/PCI: Remove space before newline
  ACPI: processor: throttling: Remove space before newline
  ACPI: processor: perflib: Fix initial _PPC limit application
  ACPI/PNP: Use my kernel.org address in MAINTAINERS and ABI docs
  ACPI: TAD: Replace sprintf() with sysfs_emit()
  ACPI: APEI: handle synchronous exceptions in task work
  ACPI: APEI: send SIGBUS to current task if synchronous memory error not recovered
  ACPI: APEI: MAINTAINERS: Update reviewers for APEI
  Documentation: ACPI: Fix parent device references
  ACPI: fan: Update debug message in fan_get_state_acpi4()
  ACPI: PRM: Reduce unnecessary printing to avoid user confusion
  ACPI: fan: Replace sprintf() with sysfs_emit()
  ACPI: APEI: EINJ: Fix trigger actions
  ACPI: processor: fix acpi_object initialization
  ACPI: APEI: GHES: add TAINT_MACHINE_CHECK on GHES panic path
  ACPI: LPSS: Remove AudioDSP related ID
  Documentation: firmware-guide: gpio-properties: Spelling and style fixes
  ACPI: fan: Replace sprintf()/scnprintf() with sysfs_emit() in show() functions
  ACPI: PM: Set .detach in acpi_general_pm_domain definition
  ...
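The "less than zero comparison on a size_t variable" fix refers to a classic C pitfall: once a negative error code is stored in an unsigned size_t, a `< 0` check can never succeed. A generic sketch with hypothetical helpers (not the EINJ code itself):

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>  /* ssize_t */

/* Hypothetical parser returning a length or a negative error code. */
static inline ssize_t demo_parse_len(int fail)
{
	return fail ? -EINVAL : 16;
}

/* Buggy pattern: the error becomes a huge unsigned value, so the
 * "< 0" test is always false (compilers typically warn about this). */
static inline int demo_check_buggy(int fail)
{
	size_t len = demo_parse_len(fail);

	return len < 0 ? -1 : 0;
}

/* Fixed pattern: keep a signed type until the error is ruled out. */
static inline int demo_check_fixed(int fail)
{
	ssize_t len = demo_parse_len(fail);

	return len < 0 ? -1 : 0;
}
```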

Merge tag 'pm-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
"As is tradition, cpufreq is the part with the largest number of
  updates that include core fixes and cleanups as well as updates of
  several assorted drivers, but there are also quite a few updates
  related to system sleep, mostly focused on asynchronous suspend and
  resume of devices and on making the integration of system suspend
  and resume with runtime PM easier.

  Runtime PM is also updated to allow some code duplication in drivers
  to be eliminated going forward and to work more consistently overall
  in some cases.

  Apart from that, there are some driver core updates related to PM
  domains that should help to address ordering issues with devm_ cleanup
  routines relying on PM domains, some assorted devfreq updates
  including core fixes and cleanups, tooling updates, and documentation
  and MAINTAINERS updates.

  Specifics:

   - Fix two initialization ordering issues in the cpufreq core and a
     governor initialization error path in it, and clean it up (Lifeng
     Zheng)

   - Add Granite Rapids support in no-HWP mode to the intel_pstate
     cpufreq driver (Li RongQing)

   - Make intel_pstate always use HWP_DESIRED_PERF when operating in the
     passive mode (Rafael Wysocki)

   - Allow building the tegra124 cpufreq driver as a module (Aaron
     Kling)

   - Do minor cleanups for Rust cpufreq and cpumask APIs and fix
     MAINTAINERS entry for cpu.rs (Abhinav Ananthu, Ritvik Gupta, Lukas
     Bulwahn)

   - Clean up assorted cpufreq drivers (Arnd Bergmann, Dan Carpenter,
     Krzysztof Kozlowski, Sven Peter, Svyatoslav Ryhel, Lifeng Zheng)

   - Add the NEED_UPDATE_LIMITS flag to the CPPC cpufreq driver
     (Prashant Malani)

   - Fix minimum performance state label error in the amd-pstate driver
     documentation (Shouye Liu)

   - Add the CPUFREQ_GOV_STRICT_TARGET flag to the userspace cpufreq
     governor and explain HW coordination influence on it in the
     documentation (Shashank Balaji)

   - Fix opencoded for_each_cpu() in idle_state_valid() in the DT
     cpuidle driver (Yury Norov)

   - Remove info about non-existing QoS interfaces from the PM QoS
     documentation (Ulf Hansson)

   - Use c_* types via kernel prelude in Rust for OPP (Abhinav Ananthu)

   - Add HiSilicon uncore frequency scaling driver to devfreq (Jie Zhan)

   - Allow devfreq drivers to add custom sysfs ABIs (Jie Zhan)

   - Simplify the sun8i-a33-mbus devfreq driver by using more devm
     functions (Uwe Kleine-König)

   - Fix an index typo in trans_stat() in devfreq (Chanwoo Choi)

   - Check devfreq governor before using governor->name (Lifeng Zheng)

   - Remove a redundant devfreq_get_freq_range() call from
     devfreq_add_device() (Lifeng Zheng)

   - Limit max_freq with scaling_min_freq in devfreq (Lifeng Zheng)

   - Replace sscanf() with kstrtoul() in set_freq_store() (Lifeng Zheng)

   - Extend the asynchronous suspend and resume of devices to handle
     suppliers like parents and consumers like children (Rafael Wysocki)

   - Make pm_runtime_force_resume() work for drivers that set the
     DPM_FLAG_SMART_SUSPEND flag and allow PCI drivers and drivers that
     collaborate with the general ACPI PM domain to set it (Rafael
     Wysocki)

   - Add kernel parameter to disable asynchronous suspend/resume of
     devices (Tudor Ambarus)

   - Drop redundant might_sleep() calls from some functions in the
     device suspend/resume core code (Zhongqiu Han)

   - Fix the handling of monitors connected right before waking up the
     system from sleep (tuhaowen)

   - Clean up MAINTAINERS entries for suspend and hibernation (Rafael
     Wysocki)

   - Fix error code path in the KEXEC_JUMP flow and drop a redundant
     pm_restore_gfp_mask() call from it (Rafael Wysocki)

   - Rearrange suspend/resume error handling in the core device suspend
     and resume code (Rafael Wysocki)

   - Fix up white space that does not follow coding style in the
     hibernation core code (Darshan Rathod)

   - Document return values of suspend-related API functions in the
     runtime PM framework (Sakari Ailus)

   - Mark last busy stamp in multiple autosuspend-related functions in
     the runtime PM framework and update its documentation (Sakari
     Ailus)

   - Take active children into account in pm_runtime_get_if_in_use() for
     consistency (Rafael Wysocki)

   - Fix NULL pointer dereference in get_pd_power_uw() in the dtpm_cpu
     power capping driver (Sivan Zohar-Kotzer)

   - Add support for the Bartlett Lake platform to the Intel RAPL power
     capping driver (Qiao Wei)

   - Add PL4 support for Panther Lake to the intel_rapl_msr power
     capping driver (Zhang Rui)

   - Update contact information in the PM ABI docs and maintainer
     information in the power domains DT binding (Rafael Wysocki)

   - Update PM header inclusions to follow the IWYU (Include What You
     Use) principle (Andy Shevchenko)

   - Add flags to specify power on attach/detach for PM domains, make
     the driver core detach PM domains in device_unbind_cleanup(), and
     drop the dev_pm_domain_detach() call from the platform bus type
     (Claudiu Beznea)

   - Improve Python binding's Makefile for cpupower (John B. Wyatt IV)

   - Fix printing of CORE, CPU fields in cpupower-monitor (Gautham
     Shenoy)"

* tag 'pm-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (75 commits)
  cpufreq: CPPC: Mark driver with NEED_UPDATE_LIMITS flag
  PM: docs: Use my kernel.org address in ABI docs and DT bindings
  PM: hibernate: Fix up white space that does not follow coding style
  PM: sleep: Rearrange suspend/resume error handling in the core
  Documentation: amd-pstate:fix minimum performance state label error
  PM: runtime: Take active children into account in pm_runtime_get_if_in_use()
  kexec_core: Drop redundant pm_restore_gfp_mask() call
  kexec_core: Fix error code path in the KEXEC_JUMP flow
  PM: sleep: Clean up MAINTAINERS entries for suspend and hibernation
  drivers: cpufreq: add Tegra114 support
  rust: cpumask: Replace `MaybeUninit` and `mem::zeroed` with `Opaque` APIs
  cpufreq: Exit governor when failed to start old governor
  cpufreq: Move the check of cpufreq_driver->get into cpufreq_verify_current_freq()
  cpufreq: Init policy->rwsem before it may be possibly used
  cpufreq: Initialize cpufreq-based frequency-invariance later
  cpufreq: Remove duplicate check in __cpufreq_offline()
  cpufreq: Contain scaling_cur_freq.attr in cpufreq_attrs
  cpufreq: intel_pstate: Add Granite Rapids support in no-HWP mode
  cpufreq: intel_pstate: Always use HWP_DESIRED_PERF in passive mode
  PM / devfreq: Add HiSilicon uncore frequency scaling driver
  ...
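Replacing sscanf() with kstrtoul() in a sysfs store handler tightens validation: sscanf("%lu") happily parses "12abc" as 12, while kstrtoul() rejects trailing garbage (a single trailing newline, as written by echo, is allowed). A userspace approximation, assuming a hypothetical decimal-only helper `demo_kstrtoul10` (the real kstrtoul() also takes a base argument):

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical approximation of kstrtoul(s, 10, res): returns 0 on
 * success, -EINVAL on malformed input, -ERANGE on overflow. */
static inline int demo_kstrtoul10(const char *s, unsigned long *res)
{
	char *end;
	unsigned long val;

	if (*s == '+')
		s++;
	/* strtoul() would silently skip whitespace and accept '-'. */
	if (!isdigit((unsigned char)*s))
		return -EINVAL;

	errno = 0;
	val = strtoul(s, &end, 10);
	if (errno == ERANGE)
		return -ERANGE;

	/* Allow exactly one trailing newline, nothing else. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	*res = val;
	return 0;
}
```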

Merge branch 'bpf-show-precise-rejected-function-when-attaching-to-__noreturn-and-deny-list-functions'

KaFai Wan says:

====================
bpf: Show precise rejected function when attaching to __noreturn and deny list functions

Show precise rejected function when attaching fexit/fmod_ret to __noreturn
functions.
Add log for attaching tracing programs to functions in deny list.
Add selftest for attaching tracing programs to functions in deny list.
Migrate fexit_noreturns case into tracing_failure test suite.

changes:
v4:
- change tracing_deny case attaching function (Yonghong Song)
- add Acked-by: Yafang Shao and Yonghong Song

v3:
- add tracing_deny case into existing files (Alexei)
- migrate fexit_noreturns into tracing_failure
- change SOB
https://lore.kernel.org/bpf/20250722153434.20571-1-kafai.wan@linux.dev/

v2:
- change verifier log message (Alexei)
- add missing Suggested-by
https://lore.kernel.org/bpf/20250714120408.1627128-1-mannkafai@gmail.com/

v1:
https://lore.kernel.org/all/20250710162717.3808020-1-mannkafai@gmail.com/
---
====================

Link: https://patch.msgid.link/20250724151454.499040-1-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Migrate fexit_noreturns case into tracing_failure test suite

Delete fexit_noreturns.c files and migrate the cases into
tracing_failure.c files.

The result:

$ tools/testing/selftests/bpf/test_progs -t tracing_failure/fexit_noreturns
#467/4 tracing_failure/fexit_noreturns:OK
#467 tracing_failure:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250724151454.499040-5-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add selftest for attaching tracing programs to functions in deny list

The result:

$ tools/testing/selftests/bpf/test_progs -t tracing_failure/tracing_deny
#468/3 tracing_failure/tracing_deny:OK
#468 tracing_failure:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250724151454.499040-4-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Add log for attaching tracing programs to functions in deny list

Show the rejected function name when attaching tracing programs to
functions in deny list.

With this change, the log shows why tracing programs can't attach to
functions like __rcu_read_lock().

$ ./fentry
libbpf: prog '__rcu_read_lock': BPF program load failed: -EINVAL
libbpf: prog '__rcu_read_lock': -- BEGIN PROG LOAD LOG --
Attaching tracing programs to function '__rcu_read_lock' is rejected.

Suggested-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250724151454.499040-3-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Show precise rejected function when attaching fexit/fmod_ret to __noreturn functions

With this change, the log shows the precise name of the rejected
function when attaching fexit/fmod_ret to __noreturn functions.

$ ./fexit
libbpf: prog 'fexit': BPF program load failed: -EINVAL
libbpf: prog 'fexit': -- BEGIN PROG LOAD LOG --
Attaching fexit/fmod_ret to __noreturn function 'do_exit' is rejected.

Suggested-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250724151454.499040-2-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
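The two rejection paths described in these commits can be modelled in userspace: a lookup in a sorted deny list, and a check that refuses fexit/fmod_ret on __noreturn functions, each producing the log messages quoted above. This is a sketch only: the attach-type enum, the deny-list entries beyond __rcu_read_lock, and the order of the checks are assumptions, and none of it is verifier code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical attach types for this sketch. */
enum demo_attach { DEMO_FENTRY, DEMO_FEXIT, DEMO_FMOD_RET };

/* Hypothetical sorted deny list; entries are illustrative only. */
static const char *const demo_deny_list[] = {
	"__rcu_read_lock",
	"__rcu_read_unlock",
	"migrate_disable",
	"migrate_enable",
};

static int demo_cmp(const void *key, const void *elem)
{
	return strcmp((const char *)key, *(const char *const *)elem);
}

/* Returns 0 if attaching is allowed; otherwise writes a message naming
 * the rejected function into msg and returns -EINVAL. */
static inline int demo_check_attach(const char *func, int is_noreturn,
				    enum demo_attach type, char *msg,
				    size_t msg_len)
{
	if (bsearch(func, demo_deny_list,
		    sizeof(demo_deny_list) / sizeof(demo_deny_list[0]),
		    sizeof(demo_deny_list[0]), demo_cmp)) {
		snprintf(msg, msg_len,
			 "Attaching tracing programs to function '%s' is rejected.",
			 func);
		return -EINVAL;
	}
	if (is_noreturn && (type == DEMO_FEXIT || type == DEMO_FMOD_RET)) {
		snprintf(msg, msg_len,
			 "Attaching fexit/fmod_ret to __noreturn function '%s' is rejected.",
			 func);
		return -EINVAL;
	}
	return 0;
}
```

fentry remains allowed on __noreturn functions in this model, since only the exit-side program types depend on the function returning.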

Merge tag 'landlock-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull landlock update from Mickaël Salaün:
"Fix test issues, improve build compatibility, and add new tests"

* tag 'landlock-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  landlock: Fix cosmetic change
  samples/landlock: Fix building on musl libc
  landlock: Fix warning from KUnit tests
  selftests/landlock: Add test to check rule tied to covered mount point
  selftests/landlock: Fix build of audit_test
  selftests/landlock: Fix readlink check

Merge tag 'audit-pr-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit update from Paul Moore:
"A single audit patch that restores logging of an audit event in the
  module load failure case"

* tag 'audit-pr-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit,module: restore audit logging in load failure case

Merge tag 'selinux-pr-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:

- Introduce the concept of a SELinux "neveraudit" type which prevents
   all auditing of the given type/domain.

   Taken by itself, the benefit of marking a SELinux domain with the
   "neveraudit" tag is likely not very interesting, especially given the
   significant overlap with the "dontaudit" tag.

   However, given that the "neveraudit" tag applies to *all* auditing of
   the tagged domain, we can do some fairly interesting optimizations
   when a SELinux domain is marked as both "permissive" and "neveraudit"
   (think of the unconfined_t domain).

   While this pull request includes optimized inode permission and
   getattr hooks, these optimizations require SELinux policy changes,
   therefore the improvements may not be visible on standard downstream
   Linux distros for a period of time.

- Continue the deprecation process of /sys/fs/selinux/user.

   After removing the associated userspace code in 2020, we marked the
   /sys/fs/selinux/user interface as deprecated in Linux v6.13 with
   pr_warn() and the usual documentation update.

   This adds a five second sleep after the pr_warn(), following a
   previous deprecation process pattern that has worked well for us in
   the past in helping identify any existing users that we haven't yet
   reached.

- Add a __GFP_NOWARN flag to our initial hash table allocation.

   Fuzzers such as syzbot often attempt abnormally large SELinux policy
   loads, which the SELinux code gracefully handles by checking for
   allocation failures, but not before the allocator emits a warning
   which causes the automated fuzzing to flag this as an error and
   report it to the list. While we want to continue to support the work
   done by the fuzzing teams, we want to focus on proper issues and not
   an error case that is already handled safely. Add a NOWARN flag to
   quiet the allocator and prevent syzbot from tripping on this again.

- Remove some unnecessary selinuxfs cleanup code, courtesy of Al.

- Update the SELinux in-kernel documentation with pointers to
   additional information.

* tag 'selinux-pr-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: don't bother with selinuxfs_info_free() on failures
  selinux: add __GFP_NOWARN to hashtab_init() allocations
  selinux: optimize selinux_inode_getattr/permission() based on neveraudit|permissive
  selinux: introduce neveraudit types
  documentation: add links to SELinux resources
  selinux: add a 5 second sleep to /sys/fs/selinux/user

Merge tag 'lsm-pr-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull lsm updates from Paul Moore:

- Add Nicolas Bouchinet and Xiu Jianfeng as Lockdown maintainers

   The Lockdown LSM has been without a dedicated maintainer since its
   original acceptance upstream, and it has suffered as a result.
   Thankfully we have two new volunteers who together I believe have the
   background and desire to help ensure Lockdown is properly supported.

- Remove the unused cap_mmap_file() declaration

* tag 'lsm-pr-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  MAINTAINERS: Add Xiu and myself as Lockdown maintainers
  security: Remove unused declaration cap_mmap_file()
  lsm: trivial comment fix

Merge tag 'tpmdd-next-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull tpm updates from Jarkko Sakkinen:
"Quite a few commits, but nothing really worth spending too much time
  on, or that I would want to emphasize in particular"

* tag 'tpmdd-next-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm_crb_ffa: handle tpm busy return code
  tpm_crb_ffa: Remove memset usage
  tpm_crb_ffa: Fix typos in function name
  tpm: Check for completion after timeout
  tpm: Use of_reserved_mem_region_to_resource() for "memory-region"
  tpm: Replace scnprintf() with sysfs_emit() and sysfs_emit_at() in sysfs show functions
  tpm_crb_ffa: Remove unused export
  tpm: tpm_crb_ffa: try to probe tpm_crb_ffa when it's built-in
  firmware: arm_ffa: Change initcall level of ffa_init() to rootfs_initcall
  tpm/tpm_svsm: support TPM_CHIP_FLAG_SYNC
  tpm/tpm_ftpm_tee: support TPM_CHIP_FLAG_SYNC
  tpm: support devices with synchronous send()
  tpm: add bufsiz parameter in the .send callback

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux

Pull fscrypt updates from Eric Biggers:
"Simplify how fscrypt uses the crypto API, resulting in some
  significant performance improvements:

   - Drop the incomplete and problematic support for asynchronous
     algorithms. These drivers are bug-prone, and it turns out they are
     actually much slower than the CPU-based code as well.

   - Allocate crypto requests on the stack instead of the heap. This
     improves encryption and decryption performance, especially for
     filenames. This also eliminates a point of failure during I/O"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
  ceph: Remove gfp_t argument from ceph_fscrypt_encrypt_*()
  fscrypt: Remove gfp_t argument from fscrypt_encrypt_block_inplace()
  fscrypt: Remove gfp_t argument from fscrypt_crypt_data_unit()
  fscrypt: Switch to sync_skcipher and on-stack requests
  fscrypt: Drop FORBID_WEAK_KEYS flag for AES-ECB
  fscrypt: Don't use asynchronous CryptoAPI algorithms
  fscrypt: Don't use problematic non-inline crypto engines
  fscrypt: Drop obsolete recommendation to enable optimized SHA-512
  fscrypt: Explicitly include <linux/export.h>

Merge tag 'libcrypto-conversions-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull crypto library conversions from Eric Biggers:
"Convert fsverity and apparmor to use the SHA-2 library functions
  instead of crypto_shash. This is simpler and also slightly faster"

* tag 'libcrypto-conversions-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  fsverity: Switch from crypto_shash to SHA-2 library
  fsverity: Explicitly include <linux/export.h>
  apparmor: use SHA-256 library API instead of crypto_shash API

Merge tag 'libcrypto-tests-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull crypto library test updates from Eric Biggers:
"Add KUnit test suites for the Poly1305, SHA-1, SHA-224, SHA-256,
  SHA-384, and SHA-512 library functions.

  These are the first KUnit tests for lib/crypto/. So in addition to
  being useful tests for these specific algorithms, they also establish
  some conventions for lib/crypto/ testing going forwards.

  The new tests are fairly comprehensive: more comprehensive than the
  generic crypto infrastructure's tests. They use a variety of
  techniques to check for the types of implementation bugs that tend to
  occur in the real world, rather than just naively checking some test
  vectors. (Interestingly, poly1305_kunit found a bug in QEMU)

  The core test logic is shared by all six algorithms, rather than being
  duplicated for each algorithm.

  Each algorithm's test suite also optionally includes a benchmark"
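One technique that goes beyond fixed test vectors is checking that incremental hashing agrees with one-shot hashing for every split point of a message. A minimal sketch, using 32-bit FNV-1a as a hypothetical stand-in for a real hash; the helper names are illustrative and do not come from hash-test-template.h:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct fnv1a_ctx {
	uint32_t h;
};

static void fnv1a_init(struct fnv1a_ctx *c)
{
	c->h = 2166136261u; /* FNV-1a 32-bit offset basis */
}

static void fnv1a_update(struct fnv1a_ctx *c, const uint8_t *p, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		c->h ^= p[i];
		c->h *= 16777619u; /* FNV 32-bit prime, wraps mod 2^32 */
	}
}

static uint32_t fnv1a_final(const struct fnv1a_ctx *c)
{
	return c->h;
}

static uint32_t fnv1a_oneshot(const uint8_t *p, size_t len)
{
	struct fnv1a_ctx c;

	fnv1a_init(&c);
	fnv1a_update(&c, p, len);
	return fnv1a_final(&c);
}

/* True if hashing in two pieces, split at every offset, matches one-shot. */
static bool incremental_matches_oneshot(const uint8_t *p, size_t len)
{
	uint32_t expected = fnv1a_oneshot(p, len);

	for (size_t split = 0; split <= len; split++) {
		struct fnv1a_ctx c;

		fnv1a_init(&c);
		fnv1a_update(&c, p, split);
		fnv1a_update(&c, p + split, len - split);
		if (fnv1a_final(&c) != expected)
			return false;
	}
	return true;
}
```

A buggy implementation that mishandles buffered partial blocks would typically pass a few vectors yet fail this check at some split offset.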

* tag 'libcrypto-tests-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  lib/crypto: tests: Annotate worker to be on stack
  lib/crypto: tests: Add KUnit tests for SHA-1 and HMAC-SHA1
  lib/crypto: tests: Add KUnit tests for Poly1305
  lib/crypto: tests: Add KUnit tests for SHA-384 and SHA-512
  lib/crypto: tests: Add KUnit tests for SHA-224 and SHA-256
  lib/crypto: tests: Add hash-test-template.h and gen-hash-testvecs.py

Merge tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull crypto library updates from Eric Biggers:
"This is the main crypto library pull request for 6.17. The main focus
  this cycle is on reorganizing the SHA-1 and SHA-2 code, providing
  high-quality library APIs for SHA-1 and SHA-2 including HMAC support,
  and establishing conventions for lib/crypto/ going forward:

   - Migrate the SHA-1 and SHA-512 code (and also SHA-384 which shares
     most of the SHA-512 code) into lib/crypto/. This includes both the
     generic and architecture-optimized code. Greatly simplify how the
     architecture-optimized code is integrated. Add an easy-to-use
     library API for each SHA variant, including HMAC support. Finally,
     reimplement the crypto_shash support on top of the library API.

   - Apply the same reorganization to the SHA-256 code (and also SHA-224
     which shares most of the SHA-256 code). This is a somewhat smaller
     change, due to my earlier work on SHA-256. But this brings in all
     the same additional improvements that I made for SHA-1 and SHA-512.

  There are also some smaller changes:

   - Move the architecture-optimized ChaCha, Poly1305, and BLAKE2s code
     from arch/$(SRCARCH)/lib/crypto/ to lib/crypto/$(SRCARCH)/. For
     these algorithms it's just a move, not a full reorganization yet.

   - Fix the MIPS chacha-core.S to build with the clang assembler.

   - Fix the Poly1305 functions to work in all contexts.

   - Fix a performance regression in the x86_64 Poly1305 code.

   - Clean up the x86_64 SHA-NI optimized SHA-1 assembly code.

  Note that since the new organization of the SHA code is much simpler,
  the diffstat of this pull request is negative, despite the addition of
  new fully-documented library APIs for multiple SHA and HMAC-SHA
  variants.

  These APIs will allow further simplifications across the kernel as
  users start using them instead of the old-school crypto API. (I've
  already written a lot of such conversion patches, removing over 1000
  more lines of code. But most of those will target 6.18 or later)"

* tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (67 commits)
  lib/crypto: arm64/sha512-ce: Drop compatibility macros for older binutils
  lib/crypto: x86/sha1-ni: Convert to use rounds macros
  lib/crypto: x86/sha1-ni: Minor optimizations and cleanup
  crypto: sha1 - Remove sha1_base.h
  lib/crypto: x86/sha1: Migrate optimized code into library
  lib/crypto: sparc/sha1: Migrate optimized code into library
  lib/crypto: s390/sha1: Migrate optimized code into library
  lib/crypto: powerpc/sha1: Migrate optimized code into library
  lib/crypto: mips/sha1: Migrate optimized code into library
  lib/crypto: arm64/sha1: Migrate optimized code into library
  lib/crypto: arm/sha1: Migrate optimized code into library
  crypto: sha1 - Use same state format as legacy drivers
  crypto: sha1 - Wrap library and add HMAC support
  lib/crypto: sha1: Add HMAC support
  lib/crypto: sha1: Add SHA-1 library functions
  lib/crypto: sha1: Rename sha1_init() to sha1_init_raw()
  crypto: x86/sha1 - Rename conflicting symbol
  lib/crypto: sha2: Add hmac_sha*_init_usingrawkey()
  lib/crypto: arm/poly1305: Remove unneeded empty weak function
  lib/crypto: x86/poly1305: Fix performance regression on short messages
  ...

dt-bindings: Correct indentation and style in DTS example

DTS examples in the bindings should be indented with 2 or 4 spaces and
aligned with the opening '- |', so correct any deviations, such as
3-space indentation or a mixture of 2- and 4-space indentation in one
binding.

No functional changes here, but saves some comments during reviews of
new patches built on existing code.

Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC
Acked-by: Lee Jones <lee@kernel.org>
Acked-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> # renesas
Link: https://lore.kernel.org/r/20250107131456.247610-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20250725100241.120106-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>

staging: media: atomisp: Fix stack buffer overflow in gmin_get_var_int()

When gmin_get_config_var() calls efi.get_variable() and the EFI variable
is larger than the expected buffer size, two behaviors combine to create
a stack buffer overflow:

1. gmin_get_config_var() does not return the proper error code when
   efi.get_variable() fails. It returns the stale 'ret' value from
   earlier operations instead of indicating the EFI failure.

2. When efi.get_variable() returns EFI_BUFFER_TOO_SMALL, it updates
   *out_len to the required buffer size but writes no data to the output
   buffer. However, due to bug #1, gmin_get_var_int() believes the call
   succeeded.

The caller gmin_get_var_int() then performs:
- Allocates val[CFG_VAR_NAME_MAX + 1] (65 bytes) on stack
- Calls gmin_get_config_var(dev, is_gmin, var, val, &len) with len=64
- If EFI variable is >64 bytes, efi.get_variable() sets len=required_size
- Due to bug #1, thinks call succeeded with len=required_size
- Executes val[len] = 0, writing past end of 65-byte stack buffer

This creates a stack buffer overflow when EFI variables are larger than
64 bytes. Since EFI variables can be controlled by firmware or system
configuration, this could potentially be exploited for code execution.

Fix the bug by returning proper error codes from gmin_get_config_var()
based on EFI status instead of stale 'ret' value.
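A userspace model of the bug and the fix, assuming a hypothetical `fake_get_variable()` that mimics the EFI_BUFFER_TOO_SMALL contract described above (it updates `*len` to the required size and writes nothing). The status codes, helper names, and the choice of -ENOSPC are illustrative assumptions, not the atomisp or EFI definitions:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define CFG_VAR_NAME_MAX 64

/* Hypothetical stand-ins for EFI status codes. */
enum fake_efi_status { FAKE_EFI_SUCCESS, FAKE_EFI_BUFFER_TOO_SMALL };

/* Mimics get_variable(): on a too-small buffer, report the required
 * size through *len and leave the buffer untouched. */
static enum fake_efi_status fake_get_variable(const char *value, char *buf,
					      size_t *len)
{
	size_t need = strlen(value);

	if (need > *len) {
		*len = need;
		return FAKE_EFI_BUFFER_TOO_SMALL;
	}
	memcpy(buf, value, need);
	*len = need;
	return FAKE_EFI_SUCCESS;
}

/* Fixed behavior: derive the return code from the EFI status instead of
 * returning a stale value. */
static int get_config_var(const char *value, char *out, size_t *out_len)
{
	if (fake_get_variable(value, out, out_len) != FAKE_EFI_SUCCESS)
		return -ENOSPC;
	return 0;
}

/* Caller pattern: val must hold CFG_VAR_NAME_MAX + 1 bytes. Terminate
 * only after success, when len is known to fit in the buffer. */
static int get_var_string(const char *value, char *val)
{
	size_t len = CFG_VAR_NAME_MAX;
	int ret = get_config_var(value, val, &len);

	if (ret)
		return ret; /* len may now exceed the buffer; do not use it */
	val[len] = '\0';    /* len <= CFG_VAR_NAME_MAX here */
	return 0;
}
```

With the stale-return bug, the oversized case would fall through to `val[len] = '\0'` with `len` equal to the variable's full size.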

The gmin_get_var_int() function is called during device initialization
for camera sensor configuration on Intel Bay Trail and Cherry Trail
platforms using the atomisp camera stack.

Reported-by: zepta <z3ptaa@gmail.com>
Closes: https://lore.kernel.org/all/CAPBS6KoQyM7FMdPwOuXteXsOe44X4H3F8Fw+y_qWq6E+OdmxQA@mail.gmail.com
Fixes: 38d4f74bc148 ("media: atomisp_gmin_platform: stop abusing efivar API")
Reviewed-by: Hans de Goede <hansg@kernel.org>
Link: https://lore.kernel.org/r/20250724080756.work.741-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>

Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull CRC updates from Eric Biggers:

- Reorganize the architecture-optimized CRC code

   It now lives in lib/crc/$(SRCARCH)/ rather than arch/$(SRCARCH)/lib/,
   and it is no longer artificially split into separate generic and arch
   modules. This allows better inlining and dead code elimination

   The generic CRC code is also no longer exported, simplifying the API.
   (This mirrors the similar changes to SHA-1 and SHA-2 in lib/crypto/,
   which can be found in the "Crypto library updates" pull request)

- Improve crc32c() performance on newer x86_64 CPUs on long messages by
   enabling the VPCLMULQDQ optimized code

- Simplify the crypto_shash wrappers for crc32_le() and crc32c()

   Register just one shash algorithm for each that uses the (fully
   optimized) library functions, instead of unnecessarily providing
   direct access to the generic CRC code

- Remove unused and obsolete drivers for hardware CRC engines

- Remove CRC-32 combination functions that are no longer used

- Add kerneldoc for crc32_le(), crc32_be(), and crc32c()

- Convert the crc32() macro to an inline function
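For reference, a minimal bit-at-a-time sketch of the reflected CRC-32 (polynomial 0xEDB88320) and CRC-32C (polynomial 0x82F63B78). It assumes the crc32_le() convention that the caller supplies the seed and applies any final inversion; it is not the optimized lib/crc/ code, and the `model_*` names are hypothetical. The inline `model_crc32()` wrapper illustrates what an inline function offers over a cast-based macro: a typed `const void *` parameter that needs no cast at call sites.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC: each byte is processed LSB-first. */
static uint32_t crc_reflected(uint32_t crc, const uint8_t *p, size_t len,
			      uint32_t poly)
{
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? poly : 0);
	}
	return crc;
}

static inline uint32_t model_crc32_le(uint32_t crc, const uint8_t *p,
				      size_t len)
{
	return crc_reflected(crc, p, len, 0xEDB88320u);
}

static inline uint32_t model_crc32c(uint32_t crc, const void *p, size_t len)
{
	return crc_reflected(crc, p, len, 0x82F63B78u);
}

/* Inline wrapper: arguments are type-checked and no cast is needed. */
static inline uint32_t model_crc32(uint32_t crc, const void *p, size_t len)
{
	return model_crc32_le(crc, p, len);
}
```

With a seed of 0xFFFFFFFF and a final inversion, these produce the standard check values for "123456789" (0xCBF43926 and 0xE3069283).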

* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (26 commits)
  lib/crc: x86/crc32c: Enable VPCLMULQDQ optimization where beneficial
  lib/crc: x86: Reorganize crc-pclmul static_call initialization
  lib/crc: crc64: Add include/linux/crc64.h to kernel-api.rst
  lib/crc: crc32: Change crc32() from macro to inline function and remove cast
  nvmem: layouts: Switch from crc32() to crc32_le()
  lib/crc: crc32: Document crc32_le(), crc32_be(), and crc32c()
  lib/crc: Explicitly include <linux/export.h>
  lib/crc: Remove ARCH_HAS_* kconfig symbols
  lib/crc: x86: Migrate optimized CRC code into lib/crc/
  lib/crc: sparc: Migrate optimized CRC code into lib/crc/
  lib/crc: s390: Migrate optimized CRC code into lib/crc/
  lib/crc: riscv: Migrate optimized CRC code into lib/crc/
  lib/crc: powerpc: Migrate optimized CRC code into lib/crc/
  lib/crc: mips: Migrate optimized CRC code into lib/crc/
  lib/crc: loongarch: Migrate optimized CRC code into lib/crc/
  lib/crc: arm64: Migrate optimized CRC code into lib/crc/
  lib/crc: arm: Migrate optimized CRC code into lib/crc/
  lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/
  lib/crc: Move files into lib/crc/
  lib/crc32: Remove unused combination support
  ...

Merge tag 'hardening-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:

- Introduce and start using TRAILING_OVERLAP() helper for fixing
   embedded flex array instances (Gustavo A. R. Silva)

- mux: Convert mux_control_ops to a flex array member in mux_chip
   (Thorsten Blum)

- string: Group str_has_prefix() and strstarts() (Andy Shevchenko)

- Remove KCOV instrumentation from __init and __head (Ritesh Harjani,
   Kees Cook)

- Refactor and rename stackleak feature to support Clang

- Add KUnit test for seq_buf API

- Fix KUnit fortify test under LTO

* tag 'hardening-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (22 commits)
  sched/task_stack: Add missing const qualifier to end_of_stack()
  kstack_erase: Support Clang stack depth tracking
  kstack_erase: Add -mgeneral-regs-only to silence Clang warnings
  init.h: Disable sanitizer coverage for __init and __head
  kstack_erase: Disable kstack_erase for all of arm compressed boot code
  x86: Handle KCOV __init vs inline mismatches
  arm64: Handle KCOV __init vs inline mismatches
  s390: Handle KCOV __init vs inline mismatches
  arm: Handle KCOV __init vs inline mismatches
  mips: Handle KCOV __init vs inline mismatch
  powerpc/mm/book3s64: Move kfence and debug_pagealloc related calls to __init section
  configs/hardening: Enable CONFIG_INIT_ON_FREE_DEFAULT_ON
  configs/hardening: Enable CONFIG_KSTACK_ERASE
  stackleak: Split KSTACK_ERASE_CFLAGS from GCC_PLUGINS_CFLAGS
  stackleak: Rename stackleak_track_stack to __sanitizer_cov_stack_depth
  stackleak: Rename STACKLEAK to KSTACK_ERASE
  seq_buf: Introduce KUnit tests
  string: Group str_has_prefix() and strstarts()
  kunit/fortify: Add back "volatile" for sizeof() constants
  acpi: nfit: intel: avoid multiple -Wflex-array-member-not-at-end warnings
  ...

Merge tag 'execve-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull execve updates from Kees Cook:

- Introduce regular REGSET note macros arch-wide (Dave Martin)

- Remove arbitrary 4K limitation of program header size (Yin Fengwei)

- Reorder function qualifiers for copy_clone_args_from_user() (Dishank Jogi)

* tag 'execve-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (25 commits)
  fork: reorder function qualifiers for copy_clone_args_from_user
  binfmt_elf: remove the 4k limitation of program header size
  binfmt_elf: Warn on missing or suspicious regset note names
  xtensa: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  um: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  x86/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  sparc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  sh: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  s390/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  riscv: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  powerpc/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  parisc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  openrisc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  nios2: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  MIPS: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  m68k: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  LoongArch: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  hexagon: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  csky: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  arm64: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names
  ...

Merge tag 'ata-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata updates from Damien Le Moal:

- Replace the ATA_DFLAG_ZAC device flag with the helper function
   ata_dev_is_zac(), which directly tests the device class and the
   device zoned mode (me)

- Some small cleanup of ata_scsi_offline_dev() code (me)

- Improve the description of the link power management (LPM) policies
   in Kconfig and in the comments defining these. Together with this,
   clarify the description of the ahci driver mobile_lpm_policy module
   parameter (me)

- Various code refactoring of libata LPM handling (ata_eh_set_lpm()
   renaming, introduce ata_dev_config_lpm(), LPM related quirk handling,
   and LPM related feature advertizing on device scan) (me)

- Avoid unnecessary device reset when revalidating after an error when
   LPM is used (me)

- Do not allow setting a port/link LPM policy if LPM is not supported,
   either because the controller supports none of the partial, slumber,
   or devsleep states, or because the port is an external port with
   hotplug capability (me)

- Make sure that device initiated power management (DIPM) is not
   enabled if the host (controller) lacks support for this feature (me)

- Improve messages and debug messages related to LPM, in particular,
   reduce the number of messages signaling the lack of LPM support (me)

- Cache a device's general purpose log directory in memory to avoid
   having to access this log for every log page access. The intent here
   is to reduce the number of read log commands when scanning or
   revalidating a device (me)

- Change ata_dev_cleanup_cdl_resources() to be a static function (me)

- Rename and simplify the mode setting functions (me)

- Introduce the helper function ata_port_eh_scheduled() to check if EH
   is pending or running for a port (me)

- Improve ata_eh_set_pending() (return bool instead of int) (me)

- Use sysfs_emit() instead of scnprintf() for libata-transport
   attributes (Jonathan)

- Use the existing macro definition of the RDC vendor ID instead of
   hardcoding it in the pata_rdc driver (Andy)

- Rework how EH is called for a port to avoid needing to pass along the
   prereset, softreset, hardreset and postreset operations. The driver
   API documentation for this is also updated (me)
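The LPM policy and DIPM rules listed above can be read as simple predicates. A sketch with hypothetical capability flags and structure names (illustrative only, not the libata or ahci definitions):

```c
#include <stdbool.h>

/* Hypothetical host capability bits. */
#define HOST_CAP_PARTIAL  (1u << 0)
#define HOST_CAP_SLUMBER  (1u << 1)
#define HOST_CAP_DEVSLEEP (1u << 2)
#define HOST_CAP_SDIPM    (1u << 3) /* host supports device-initiated PM */

struct model_port {
	unsigned int host_caps;
	bool external_hotplug; /* external port with hotplug capability */
};

/* Policy changes make sense only if the host supports at least one
 * low-power state and the port is not an external hotplug port. */
static bool lpm_policy_settable(const struct model_port *p)
{
	unsigned int states = HOST_CAP_PARTIAL | HOST_CAP_SLUMBER |
			      HOST_CAP_DEVSLEEP;

	if (!(p->host_caps & states))
		return false;
	return !p->external_hotplug;
}

/* DIPM is enabled only if both the device and the host support it. */
static bool dipm_allowed(const struct model_port *p, bool dev_has_dipm)
{
	return dev_has_dipm && (p->host_caps & HOST_CAP_SDIPM);
}
```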

* tag 'ata-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: (28 commits)
  Documentation: driver-api: Update libata error handler information
  ata: libata-eh: Simplify reset operation management
  ata: libata-eh: Remove ata_do_eh()
  ata: pata_rdc: Use registered definition for the RDC vendor
  ata: libata-eh: Make ata_eh_followup_srst_needed() return a bool
  ata: libata-transport: replace scnprintf with sysfs_emit for simple attributes
  ata: libata-eh: use bool for fastdrain in ata_eh_set_pending()
  ata: libata: Introduce ata_port_eh_scheduled()
  ata: libata-core: Rename ata_do_set_mode()
  ata: libata-eh: Rename and make ata_set_mode() static
  ata: libata-core: Make ata_dev_cleanup_cdl_resources() static
  ata: libata-core: Cache the general purpose log directory
  ata: libata_eh: Add debug messages to ata_eh_link_set_lpm()
  ata: libata-core: Reduce the number of messages signaling broken LPM
  ata: ahci: Disallow LPM policy control if not supported
  ata: ahci: Disallow LPM policy control for external ports
  ata: ahci: Disable DIPM if host lacks support
  ata: libata-sata: Disallow changing LPM state if not supported
  ata: libata-eh: Avoid unnecessary resets when revalidating devices
  ata: libata-core: Advertize device support for DIPM and HIPM features
  ...

Merge tag 'zonefs-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

Pull zonefs update from Damien Le Moal:

- Use ZONEFS_SUPER_SIZE instead of PAGE_SIZE to read the super block
   from disk (Johannes).

* tag 'zonefs-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: use ZONEFS_SUPER_SIZE instead of PAGE_SIZE

Merge tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

- MD pull request via Yu:
      - call del_gendisk synchronously (Xiao)
      - cleanup unused variable (John)
      - cleanup workqueue flags (Ryo)
      - fix faulty rdev can't be removed during resync (Qixing)

- NVMe pull request via Christoph:
      - try PCIe function level reset on init failure (Keith Busch)
      - log TLS handshake failures at error level (Maurizio Lombardi)
      - pci-epf: do not complete commands twice if nvmet_req_init()
        fails (Rick Wertenbroek)
      - misc cleanups (Alok Tiwari)

- Removal of the pktcdvd driver

   This has been more than a decade coming at this point, and some
   recently revealed breakages that had it causing issues even for cases
   where it isn't required made me re-pull the trigger on this one. It's
   known broken and nobody has stepped up to maintain the code

- Series for ublk supporting batch commands, enabling the use of
   multishot where appropriate

- Speed up ublk exit handling

- Fix for the two-stage elevator fixing which could leak data

- Convert NVMe to use the new IOVA based API

- Increase default max transfer size to something more reasonable

- Series fixing write operations on zoned DM devices

- Add tracepoints for zoned block device operations

- Prep series working towards improving blk-mq queue management in the
   presence of isolated CPUs

- Don't allow updating of the block size of a loop device that is
   currently under exclusive ownership/open

- Set chunk sectors from stacked device stripe size and use it for the
   atomic write size limit

- Switch to folios in bcache read_super()

- Fix for CD-ROM MRW exit flush handling

- Various tweaks, fixes, and cleanups
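The shortlog below adds max_pow_of_two_factor() as part of the chunk_sectors/atomic write limit work. Assuming it returns the largest power of two that divides its argument (the lowest set bit), here is a userspace sketch of how such a helper could cap an atomic write unit so that naturally aligned atomic writes never straddle a chunk (stripe) boundary. `cap_atomic_unit()` and the `model_` prefix are hypothetical:

```c
#include <stdint.h>

/* Largest power of two dividing n (n > 0): isolate the lowest set bit. */
static inline uint64_t model_max_pow_of_two_factor(uint64_t n)
{
	return n & (~n + 1); /* equivalent to n & -n for unsigned n */
}

/*
 * Hypothetical: a power-of-two unit that divides the chunk size keeps
 * every naturally aligned atomic write inside a single chunk, so cap
 * the unit at the largest power of two dividing the chunk size.
 */
static uint64_t cap_atomic_unit(uint64_t unit_max_bytes,
				uint64_t chunk_sectors)
{
	uint64_t chunk_bytes = chunk_sectors << 9; /* 512-byte sectors */
	uint64_t limit;

	if (!chunk_bytes)
		return unit_max_bytes; /* no chunk boundary to respect */
	limit = model_max_pow_of_two_factor(chunk_bytes);
	return unit_max_bytes < limit ? unit_max_bytes : limit;
}
```

For example, a 192-sector (96 KiB) stripe is divisible by 32 KiB but not 64 KiB, so a 64 KiB unit would be capped to 32 KiB.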

* tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux: (94 commits)
  block: restore two stage elevator switch while running nr_hw_queue update
  cdrom: Call cdrom_mrw_exit from cdrom_release function
  sunvdc: Balance device refcount in vdc_port_mpgroup_check
  nvme-pci: try function level reset on init failure
  dm: split write BIOs on zone boundaries when zone append is not emulated
  block: use chunk_sectors when evaluating stacked atomic write limits
  dm-stripe: limit chunk_sectors to the stripe size
  md/raid10: set chunk_sectors limit
  md/raid0: set chunk_sectors limit
  block: sanitize chunk_sectors for atomic write limits
  ilog2: add max_pow_of_two_factor()
  nvmet: pci-epf: Do not complete commands twice if nvmet_req_init() fails
  nvme-tcp: log TLS handshake failures at error level
  docs: nvme: fix grammar in nvme-pci-endpoint-target.rst
  nvme: fix typo in status code constant for self-test in progress
  nvmet: remove redundant assignment of error code in nvmet_ns_enable()
  nvme: fix incorrect variable in io cqes error message
  nvme: fix multiple spelling and grammar issues in host drivers
  block: fix blk_zone_append_update_request_bio() kernel-doc
  md/raid10: fix set but not used variable in sync_request_write()
  ...

tracing: trace_fprobe: Fix typo of the semicolon

Fix a typo that uses ',' instead of ';' as the statement delimiter.
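Why a stray ',' can matter: in C, `a = x, b = y;` is a single expression statement built with the comma operator. It usually behaves like two statements, which is why such typos survive, but the difference shows up under an unbraced `if`. A small illustration, unrelated to the actual trace_fprobe code:

```c
/* With ';' only the first assignment is conditional. */
static int with_semicolon(int cond)
{
	int a = 0, b = 0;

	if (cond)
		a = 1;
	b = 2;
	return a * 10 + b;
}

/* With ',' both assignments form one statement, so both are
 * conditional. */
static int with_comma(int cond)
{
	int a = 0, b = 0;

	if (cond)
		a = 1, b = 2;
	return a * 10 + b;
}
```

Both functions return 12 when `cond` is true; when it is false, the first returns 2 and the second returns 0.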

Link: https://lore.kernel.org/linux-trace-kernel/175366879192.487099.5714468217360139639.stgit@mhiramat.tok.corp.google.com/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Merge tag 'for-6.17/io_uring-20250728' of git://git.kernel.dk/linux

Pull io_uring updates from Jens Axboe:

- Optimization to avoid reference counts on non-cloned registered
   buffers. This is how these buffers were handled prior to having
   cloning support, and we can still use that approach as long as the
   buffers haven't been cloned to another ring.

- Cleanup and improvement for uring_cmd, where btrfs was the only user
   of storing allocated data for the lifetime of the uring_cmd. Clean
   that up so we can get rid of the need to do that.

- Avoid unnecessary memory copies in uring_cmd usage. This is
   particularly important as a lot of uring_cmd usage necessitates the
   use of 128b SQEs.

- A few updates for recv multishot, where it's now possible to add
   fairness limits for limiting how much is transferred for each retry
   loop. Additionally, recv multishot now supports an overall cap as
   well, where once reached the multishot recv will terminate. The
   latter is useful for buffer management and juggling many recv streams
   at the same time.

- Add support for returning the TX timestamps via a new socket command.
   This feature can work in either singleshot or multishot mode, where
   the latter triggers a completion whenever new timestamps are
   available. This is an alternative to using the existing error queue.

- Add support for an io_uring "mock" file, which is the start of being
   able to do 100% targeted testing in terms of exercising io_uring
   request handling. The idea is to have a file type that can be
   anything the tester would like, and behave exactly how you want it to
   behave in terms of hitting the code paths you want.

- Improve zcrx by using sgtables to de-duplicate and improve DMA
   address handling.

- Prep work for supporting larger pages for zcrx.

- Various little improvements and fixes.
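A simplified model of the two recv multishot limits described above: a per-invocation cap that bounds how much one retry loop transfers before yielding (fairness), and an overall cap after which the multishot request terminates. The structure, field, and function names are hypothetical and do not mirror io_uring's internals or its SQE layout:

```c
#include <stdbool.h>
#include <stddef.h>

struct model_recv {
	size_t per_invocation_cap; /* 0 = unlimited */
	size_t total_cap;          /* 0 = unlimited */
	size_t total_done;
	bool terminated;
};

/*
 * One invocation: receive up to 'available' bytes in chunks of 'chunk',
 * stop at the per-invocation cap so other work gets a turn, and
 * terminate the multishot request once the overall cap is reached.
 * Returns the bytes transferred in this invocation.
 */
static size_t model_recv_invoke(struct model_recv *r, size_t available,
				size_t chunk)
{
	size_t done = 0;

	while (!r->terminated && available > 0) {
		size_t n = chunk < available ? chunk : available;

		if (r->per_invocation_cap && done + n > r->per_invocation_cap)
			n = r->per_invocation_cap - done;
		if (r->total_cap && r->total_done + n > r->total_cap)
			n = r->total_cap - r->total_done;
		if (n == 0)
			break;
		done += n;
		available -= n;
		r->total_done += n;
		if (r->total_cap && r->total_done == r->total_cap)
			r->terminated = true;
		if (r->per_invocation_cap && done == r->per_invocation_cap)
			break; /* yield; a later retry continues */
	}
	return done;
}
```

With a 100-byte per-invocation cap and a 250-byte overall cap, three invocations transfer 100, 100, and 50 bytes, after which the request is terminated.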

* tag 'for-6.17/io_uring-20250728' of git://git.kernel.dk/linux: (42 commits)
  io_uring/zcrx: fix leaking pages on sg init fail
  io_uring/zcrx: don't leak pages on account failure
  io_uring/zcrx: fix null ifq on area destruction
  io_uring: fix breakage in EXPERT menu
  io_uring/cmd: remove struct io_uring_cmd_data
  btrfs/ioctl: store btrfs_uring_encoded_data in io_btrfs_cmd
  io_uring/cmd: introduce IORING_URING_CMD_REISSUE flag
  io_uring/zcrx: account area memory
  io_uring: export io_[un]account_mem
  io_uring/net: Support multishot receive len cap
  io_uring: deduplicate wakeup handling
  io_uring/net: cast min_not_zero() type
  io_uring/poll: cleanup apoll freeing
  io_uring/net: allow multishot receive per-invocation cap
  io_uring/net: move io_sr_msg->retry_flags to io_sr_msg->flags
  io_uring/net: use passed in 'len' in io_recv_buf_select()
  io_uring/zcrx: prepare fallback for larger pages
  io_uring/zcrx: assert area type in io_zcrx_iov_page
  io_uring/zcrx: allocate sgtable for umem areas
  io_uring/zcrx: introduce io_populate_area_dma
  ...

Merge tag 'v6.17-rc-smb3-server-fixes' of git://git.samba.org/ksmbd

Pull smb server updates from Steve French:

- Fix mtime/ctime reporting issue

- Auth fixes, including two session setup race bugs reported by ZDI

- Locking improvement in query directory

- Fix for potential deadlock in creating hardlinks

- Improvements to path name processing

* tag 'v6.17-rc-smb3-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix corrupted mtime and ctime in smb2_open
  ksmbd: fix Preauh_HashValue race condition
  ksmbd: check return value of xa_store() in krb5_authenticate
  ksmbd: fix null pointer dereference error in generate_encryptionkey
  smb/server: add ksmbd_vfs_kern_path()
  smb/server: avoid deadlock when linking with ReplaceIfExists
  smb/server: simplify ksmbd_vfs_kern_path_locked()
  smb/server: use lookup_one_unlocked()

Merge tag 'hfs-v6.17-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs

Pull hfs/hfsplus updates from Viacheslav Dubeyko:
"Johannes Thumshirn has made a nice cleanup in hfsplus_submit_bio().

  Tetsuo Handa has fixed the syzbot reported issue in
  hfsplus_create_attributes_file() for the case of corruption the
  Attributes File's metadata.

  Yangtao Li has fixed the syzbot reported issue by removing the
  unnecessary WARN_ON() in hfsplus_free_extents().

  Other fixes:

   - restore generic/001 successful execution by erasing deleted b-tree
     nodes

   - eliminate slab-out-of-bounds issue in hfs_bnode_read() and
     hfsplus_bnode_read() by checking correctness of offset and length
     when accessing b-tree node contents

   - eliminate slab-out-of-bounds read in hfsplus_uni2asc() if the
     b-tree node record has corrupted length of a name that could be
     bigger than HFSPLUS_MAX_STRLEN

   - eliminate general protection fault in hfs_find_init() for the case
     of initial b-tree object creation"
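The slab-out-of-bounds fixes above share one pattern: validate or clamp an offset and length taken from on-disk metadata against the size of the in-memory buffer before copying. A hedged sketch with hypothetical `node_read()` and `name_len_ok()` helpers; the kernel's hfs_bnode_read() and hfsplus_uni2asc() differ in detail, and MODEL_MAX_STRLEN is only a stand-in for HFSPLUS_MAX_STRLEN:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_STRLEN 255 /* stand-in for HFSPLUS_MAX_STRLEN */

/*
 * Copy up to len bytes at off from a node of node_size bytes, clamping
 * the request so it never reads past the end of the node. Returns the
 * number of bytes actually copied.
 */
static size_t node_read(const unsigned char *node, size_t node_size,
			void *dst, size_t off, size_t len)
{
	if (off >= node_size)
		return 0;               /* offset entirely out of range */
	if (len > node_size - off)      /* avoids off + len overflow */
		len = node_size - off;
	memcpy(dst, node + off, len);
	return len;
}

/* A name length read from a possibly corrupted record must not exceed
 * the maximum the destination buffer was sized for. */
static bool name_len_ok(size_t name_len)
{
	return name_len <= MODEL_MAX_STRLEN;
}
```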

* tag 'hfs-v6.17-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs:
  hfs: fix general protection fault in hfs_find_init()
  hfs: fix slab-out-of-bounds in hfs_bnode_read()
  hfsplus: fix slab-out-of-bounds in hfsplus_bnode_read()
  hfsplus: fix slab-out-of-bounds read in hfsplus_uni2asc()
  hfsplus: don't use BUG_ON() in hfsplus_create_attributes_file()
  hfsplus: don't set REQ_SYNC for hfsplus_submit_bio()
  hfsplus: remove mutex_lock check in hfsplus_free_extents
  hfs: make splice write available again
  hfsplus: make splice write available again
  hfs: fix not erasing deleted b-tree node issue

Merge tag 'fs_for_v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull udf and ext2 updates from Jan Kara:
"A few udf and ext2 fixes and cleanups"

* tag 'fs_for_v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Verify partition map count
  udf: stop using write_cache_pages
  ext2: Handle fiemap on empty files to prevent EINVAL

block: change blk_get_meta_cap() stub return -ENOIOCTLCMD

When introduced in commit 9eb22f7fedfc ("fs: add ioctl to query metadata
and protection info capabilities"), the stub of blk_get_meta_cap() for
!BLK_DEV_INTEGRITY always returned -EOPNOTSUPP.  The motivation was that
while the command was unsupported in that configuration it was still
recognized.

A later change instead assumed -ENOIOCTLCMD, as is required for unknown
ioctl commands per Documentation/driver-api/ioctl.rst. As a result, on
!BLK_DEV_INTEGRITY configs, any ioctl which reaches
blkdev_common_ioctl() returns -EOPNOTSUPP.

Change the stub to return -ENOIOCTLCMD, fixing the issue and better
matching expectations.

[ The blkdev_common_ioctl() confusion has been fixed, but -ENOIOCTLCMD
  is the right thing to return for unrecognized ioctls, so the patch
  remains the right thing to do.   - Linus ]
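Why the return value matters: in this convention, -ENOIOCTLCMD means "not a command I handle", so dispatch continues, while any other error such as -EOPNOTSUPP is final. A userspace model of the failure described above; the handler names and command number are hypothetical, and ENOIOCTLCMD is defined locally because it is a kernel-internal errno (515) that userspace headers do not provide:

```c
#include <errno.h>

#ifndef ENOIOCTLCMD
#define ENOIOCTLCMD 515 /* kernel-internal errno, not in userspace headers */
#endif

#define CMD_OTHER 2u /* hypothetical unrelated ioctl command */

/* Old stub: claims every command with a definitive error. */
static int meta_cap_stub_old(unsigned int cmd)
{
	(void)cmd;
	return -EOPNOTSUPP;
}

/* New stub: "not mine", so dispatch continues. */
static int meta_cap_stub_new(unsigned int cmd)
{
	(void)cmd;
	return -ENOIOCTLCMD;
}

/*
 * Model of a common handler that consults the metadata-capability helper
 * before its own switch and treats anything but -ENOIOCTLCMD as final.
 */
static int model_common_ioctl(unsigned int cmd, int (*meta_cap)(unsigned int))
{
	int ret = meta_cap(cmd);

	if (ret != -ENOIOCTLCMD)
		return ret;
	switch (cmd) {
	case CMD_OTHER:
		return 0;            /* handled by the common code */
	default:
		return -ENOIOCTLCMD; /* let the caller try the driver */
	}
}
```

With the old stub, even an unrelated command fails with -EOPNOTSUPP; with the new stub, it reaches its real handler.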

Link: https://lore.kernel.org/lkml/CACzX3AsRd__fXb9=CJPTTJC494SDnYAtYrN2=+bZgMCvM6UQDg@mail.gmail.com
Fixes: 42b0ef01e6b5 ("block: fix FS_IOC_GETLBMD_CAP parsing in blkdev_common_ioctl()")
Signed-off-by: Klara Modin <klarasmodin@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fuse: remove page alignment check for writeback len

Remove incorrect page alignment check for the writeback len arg in
fuse_iomap_writeback_range(). As passed in by iomap, len will always be
block-aligned.

On regular fuse filesystems, i_blkbits is set to PAGE_SHIFT, so this is
not a problem, but for fuseblk filesystems, the block size is set to a
default of 512 bytes or a block size passed in at mount time.

Please note that non-page-aligned lengths are fine for the logic in
fuse_iomap_writeback_range(). The check was originally added as a
safeguard to detect conspicuously wrong ranges.
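
A small sketch of the arithmetic behind the fix, assuming 4 KiB pages: a length that iomap hands over for a fuseblk inode with 512-byte blocks can be block-aligned without being page-aligned, which is exactly what the removed check tripped over. The helper names are hypothetical; IS_ALIGNED() is redefined locally with the same power-of-two semantics as the kernel macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Same meaning as the kernel's IS_ALIGNED() for power-of-two a. */
#define IS_ALIGNED(x, a) (((x) & ((uint64_t)(a) - 1)) == 0)

/* Would the removed page-alignment check have rejected this len? */
static bool fails_page_check(uint64_t len)
{
	return !IS_ALIGNED(len, 1ULL << SKETCH_PAGE_SHIFT);
}

/* What iomap actually guarantees: alignment to the inode block size. */
static bool is_block_aligned(uint64_t len, unsigned int blkbits)
{
	return IS_ALIGNED(len, 1ULL << blkbits);
}
```

For example, 1536 bytes is three 512-byte blocks (blkbits = 9) yet not a multiple of 4096.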

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Fixes: ef7e7cbb323f ("fuse: use iomap for writeback")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Link: https://lore.kernel.org/linux-fsdevel/CA+G9fYs5AdVM-T2Tf3LciNCwLZEHetcnSkHsjZajVwwpM2HmJw@mail.gmail.com/
Reported-by: Sasha Levin <sashal@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'vfs-6.17-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs iomap updates from Christian Brauner:

- Refactor the iomap writeback code and split the generic and ioend/bio
   based writeback code.

   There are two methods that define the split between the generic
   writeback code and the implementation of it, and all knowledge of
   ioends and bios now sits below that layer.

- Add fuse iomap support for buffered writes and dirty folio writeback.

   This is needed so that granular uptodate and dirty tracking can be
   used in fuse when large folios are enabled. This has two big
   advantages. For writes, instead of the entire folio needing to be
   read into the page cache, only the relevant portions need to be. For
   writeback, only the dirty portions need to be written back instead of
   the entire folio.
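
Granular dirty tracking can be pictured as one bit per block in a folio, with writeback walking only the contiguous dirty runs. The following is a hypothetical sketch of that idea, not iomap's internal per-folio state, and it assumes at most 64 blocks per folio so that a single uint64_t suffices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-folio state: bit i set means block i is dirty. */
struct folio_dirty_state {
	unsigned int nr_blocks;	/* at most 64 in this sketch */
	uint64_t dirty;
};

static void mark_dirty(struct folio_dirty_state *fs, unsigned int first,
		       unsigned int count)
{
	assert(first + count <= fs->nr_blocks);
	for (unsigned int i = first; i < first + count; i++)
		fs->dirty |= 1ULL << i;
}

/*
 * Store contiguous dirty runs as (first block, length) pairs; only these
 * would be written back instead of the whole folio. Returns the number
 * of runs stored (at most max).
 */
static size_t dirty_ranges(const struct folio_dirty_state *fs,
			   unsigned int starts[], unsigned int lens[],
			   size_t max)
{
	size_t n = 0;
	unsigned int i = 0;

	while (i < fs->nr_blocks && n < max) {
		if (!(fs->dirty & (1ULL << i))) {
			i++;
			continue;
		}
		unsigned int start = i;

		while (i < fs->nr_blocks && (fs->dirty & (1ULL << i)))
			i++;
		starts[n] = start;
		lens[n] = i - start;
		n++;
	}
	return n;
}
```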

* tag 'vfs-6.17-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fuse: refactor writeback to use iomap_writepage_ctx inode
  fuse: hook into iomap for invalidating and checking partial uptodateness
  fuse: use iomap for folio laundering
  fuse: use iomap for writeback
  fuse: use iomap for buffered writes
  iomap: build the writeback code without CONFIG_BLOCK
  iomap: add read_folio_range() handler for buffered writes
  iomap: improve argument passing to iomap_read_folio_sync
  iomap: replace iomap_folio_ops with iomap_write_ops
  iomap: export iomap_writeback_folio
  iomap: move folio_unlock out of iomap_writeback_folio
  iomap: rename iomap_writepage_map to iomap_writeback_folio
  iomap: move all ioend handling to ioend.c
  iomap: add public helpers for uptodate state manipulation
  iomap: hide ioends from the generic writeback code
  iomap: refactor the writeback interface
  iomap: cleanup the pending writeback tracking in iomap_writepage_map_blocks
  iomap: pass more arguments using the iomap writeback context
  iomap: header diet

Merge tag 'vfs-6.17-rc1.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull superblock callback update from Christian Brauner:
"Currently all filesystems which implement super_operations::shutdown()
  cannot afford to lose a device.

  Thus fs_bdev_mark_dead() will just call the ->shutdown() callback for
  the involved filesystem.

  This will no longer be the case, as multi-device filesystems like
  btrfs can handle certain device losses without the need to shut down
  the whole filesystem.

  To allow those multi-device filesystems to use fs_holder_ops:

   - Add a new super_operations::remove_bdev() callback

   - Try ->remove_bdev() callback first inside fs_bdev_mark_dead().

     If the callback returned 0, meaning the fs can handle the device
     loss, then exit without doing anything else.

     If there is no such callback or the callback returned a non-zero
     value, continue to shut down the filesystem as usual.

  This means the new remove_bdev() should only check whether the
  operation can continue and, if so, do the fs-specific handling.
  The shutdown handling should still be done by the existing
  ->shutdown() callback.

  For all existing filesystems with shutdown callback, there is no
  change to the code nor behavior.

  Btrfs is going to implement both the ->remove_bdev() and ->shutdown()
  callbacks soon"
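
The fs_bdev_mark_dead() control flow described above can be modeled in a few lines. This is a heavily reduced, hypothetical sketch: the sketch_* structures, the device-counting policy and the callback signatures are assumptions for illustration, not the kernel's super_operations.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_sb;

/* Hypothetical reduced callback table. */
struct sketch_sops {
	int (*remove_bdev)(struct sketch_sb *sb, int devno);
	void (*shutdown)(struct sketch_sb *sb);
};

struct sketch_sb {
	const struct sketch_sops *ops;
	int ndevs;
	bool is_shut_down;
};

/* Try ->remove_bdev() first; shut down only if it is absent or fails. */
static void sketch_mark_dead(struct sketch_sb *sb, int devno)
{
	if (sb->ops->remove_bdev && sb->ops->remove_bdev(sb, devno) == 0)
		return;			/* fs handled the loss, keep running */
	if (sb->ops->shutdown)
		sb->ops->shutdown(sb);
}

/* Example multi-device policy: survive while more than one device is left. */
static int multi_remove_bdev(struct sketch_sb *sb, int devno)
{
	(void)devno;
	if (sb->ndevs > 1) {
		sb->ndevs--;
		return 0;
	}
	return -EIO;			/* cannot continue: request shutdown */
}

static void generic_shutdown(struct sketch_sb *sb)
{
	sb->is_shut_down = true;
}

static const struct sketch_sops multi_ops = { multi_remove_bdev, generic_shutdown };
static const struct sketch_sops single_ops = { NULL, generic_shutdown };
```

A filesystem that only provides ->shutdown() behaves exactly as before.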

* tag 'vfs-6.17-rc1.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: add a new remove_bdev() callback

Merge tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull fileattr updates from Christian Brauner:
"This introduces the new file_getattr() and file_setattr() system calls
  after lengthy discussions.

  Both system calls serve as successors and extensible companions to
  the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR ioctls, which have
  started to show their age in addition to being named in a way that
  makes it easy to conflate them with extended attribute related
  operations.

  These syscalls allow userspace to set filesystem inode attributes on
  special files. One of the usage examples is the XFS quota projects.

  XFS has project quotas which can be attached to a directory. All new
  inodes in these directories inherit the project ID set on the parent
  directory.

  The project is created from userspace by opening each inode and
  calling FS_IOC_FSSETXATTR on it. This is not possible for special
  files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left
  with an empty project ID. Those inodes are then not shown in the quota
  accounting but still exist in the directory. This is not critical, but
  when special files are created in a directory with an already
  existing project quota, these new inodes inherit extended attributes.
  This creates a mix of special files with and without attributes.
  Moreover, special files with attributes have no way to have those
  attributes cleared or changed. This, in turn, prevents userspace from
  re-creating the quota project on these existing files.

  In addition, these new system calls allow the implementation of
  additional attributes that we couldn't or didn't want to fit into the
  legacy ioctls anymore"

* tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: tighten a sanity check in file_attr_to_fileattr()
  tree-wide: s/struct fileattr/struct file_kattr/g
  fs: introduce file_getattr and file_setattr syscalls
  fs: prepare for extending file_get/setattr()
  fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP
  selinux: implement inode_file_[g|s]etattr hooks
  lsm: introduce new hooks for setting/getting inode fsxattr
  fs: split fileattr related helpers into separate file

Merge tag 'vfs-6.17-rc1.integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs 'protection info' updates from Christian Brauner:
"This adds the new FS_IOC_GETLBMD_CAP ioctl() to query metadata and
  protection info (PI) capabilities. This ioctl returns information
  about the file's integrity profile. This is useful for userspace
  applications to understand a file's end-to-end data protection support
  and configure the I/O accordingly.

  For now this interface is only supported by block devices. However, the
  design and placement of this ioctl in generic FS ioctl space allows us
  to extend it to work over files as well. This may be useful when
  filesystems start supporting PI-aware layouts.

  A new structure struct logical_block_metadata_cap is introduced, which
  contains the following fields:

   - lbmd_flags:
     bitmask of logical block metadata capability flags

   - lbmd_interval:
     the amount of data described by each unit of logical block metadata

   - lbmd_size:
     size in bytes of the logical block metadata associated with each
     interval

   - lbmd_opaque_size:
     size in bytes of the opaque block tag associated with each interval

   - lbmd_opaque_offset:
     offset in bytes of the opaque block tag within the logical block
     metadata

   - lbmd_pi_size:
     size in bytes of the T10 PI tuple associated with each interval

   - lbmd_pi_offset:
     offset in bytes of T10 PI tuple within the logical block metadata

   - lbmd_pi_guard_tag_type:
     T10 PI guard tag type

   - lbmd_pi_app_tag_size:
     size in bytes of the T10 PI application tag

   - lbmd_pi_ref_tag_size:
     size in bytes of the T10 PI reference tag

   - lbmd_pi_storage_tag_size:
     size in bytes of the T10 PI storage tag

  The internal logic to fetch the capability is encapsulated in a helper
  function blk_get_meta_cap(), which uses the blk_integrity profile
  associated with the device. The ioctl returns -EOPNOTSUPP if
  CONFIG_BLK_DEV_INTEGRITY is not enabled"
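
As a worked example of how the returned fields combine, assume the metadata for consecutive intervals is laid out back to back in the metadata buffer. The struct below is a hypothetical local mirror holding only four of the described fields; its name and field types are not the uapi definition of struct logical_block_metadata_cap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of four fields; not the uapi layout. */
struct lbmd_sketch {
	uint32_t interval;	/* lbmd_interval: data bytes per metadata unit */
	uint32_t size;		/* lbmd_size: metadata bytes per interval */
	uint32_t pi_offset;	/* lbmd_pi_offset: PI tuple offset in metadata */
	uint32_t pi_size;	/* lbmd_pi_size: PI tuple size in bytes */
};

/* Metadata bytes needed for an I/O of data_len bytes. */
static uint64_t lbmd_buf_len(const struct lbmd_sketch *c, uint64_t data_len)
{
	assert(c->interval != 0 && data_len % c->interval == 0);
	return data_len / c->interval * c->size;
}

/* Byte offset of interval idx's PI tuple inside the metadata buffer. */
static uint64_t lbmd_pi_pos(const struct lbmd_sketch *c, uint64_t idx)
{
	return idx * c->size + c->pi_offset;
}
```

With lbmd_interval = 4096, lbmd_size = 16 and an 8-byte PI tuple at offset 8, a 16 KiB I/O needs 4 * 16 = 64 metadata bytes, and the PI tuple of interval index 2 starts at byte 2 * 16 + 8 = 40.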

* tag 'vfs-6.17-rc1.integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  block: fix lbmd_guard_tag_type assignment in FS_IOC_GETLBMD_CAP
  block: fix FS_IOC_GETLBMD_CAP parsing in blkdev_common_ioctl()
  fs: add ioctl to query metadata and protection info capabilities
  nvme: set pi_offset only when checksum type is not BLK_INTEGRITY_CSUM_NONE
  block: introduce pi_tuple_size field in blk_integrity
  block: rename tuple_size field in blk_integrity to metadata_size

Merge tag 'vfs-6.17-rc1.rust' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs rust updates from Christian Brauner:

- Allow poll_table pointers to be NULL

- Add Rust files to vfs MAINTAINERS entry

* tag 'vfs-6.17-rc1.rust' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  vfs: add Rust files to MAINTAINERS
  poll: rust: allow poll_table ptrs to be null

Merge tag 'vfs-6.17-rc1.bpf' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs bpf updates from Christian Brauner:
"These changes allow bpf to read extended attributes from cgroupfs.

  This is useful in redirecting AF_UNIX socket connections based on
  cgroup membership of the socket. One use-case is the ability to
  implement log namespaces in systemd so services and containers are
  redirected to different journals"

* tag 'vfs-6.17-rc1.bpf' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  selftests/kernfs: test xattr retrieval
  selftests/bpf: Add tests for bpf_cgroup_read_xattr
  bpf: Mark cgroup_subsys_state->cgroup RCU safe
  bpf: Introduce bpf_cgroup_read_xattr to read xattr of cgroup's node
  kernfs: remove iattr_mutex

Merge tag 'vfs-6.17-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull pidfs updates from Christian Brauner:

- persistent info

   Persist exit and coredump information independent of whether anyone
   currently holds a pidfd for the struct pid.

   The current scheme repeatedly allocates pidfs dentries on demand.
   This scheme is reaching its limits, as it makes it impossible to pin
   information that needs to be available after the task has exited or
   coredumped and that should not be lost simply because the pidfd got
   closed temporarily. The next opener should still see the stashed
   information.

   This is also a prerequisite for supporting extended attributes on
   pidfds to allow attaching meta information to them.

   If someone opens a pidfd for a struct pid a pidfs dentry is allocated
   and stashed in pid->stashed. Once the last pidfd for the struct pid
   is closed the pidfs dentry is released and removed from pid->stashed.

   So if 10 callers create a pidfs dentry for the same struct pid
   sequentially, i.e., each closing the pidfd before the other creates a
   new one then a new pidfs dentry is allocated every time.

   Because multiple tasks acquiring and releasing a pidfd for the same
   struct pid can race with each other, a task may still find a valid
   pidfs entry from the previous task in pid->stashed and reuse it. Or
   it might find a dead dentry in there, fail to reuse it, and stash a
   new pidfs dentry. Multiple tasks may race to stash a new pidfs dentry
   but only one will succeed; the others will put their dentry.

   The current scheme aims to ensure that a pidfs dentry for a struct
   pid can only be created if the task is still alive or if a pidfs
   dentry already existed before the task was reaped and so exit
   information has been stashed in the pidfs inode.

   That's great except that it's buggy. If a pidfs dentry is stashed in
   pid->stashed after pidfs_exit() but before __unhash_process() is
   called we will return a pidfd for a reaped task without exit
   information being available.

   The pidfds_pid_valid() check does not guard against this race as it
   doesn't sync at all with pidfs_exit(). The pid_has_task() check might
   be successful simply because we're before __unhash_process() but
   after pidfs_exit().

   Introduce a new scheme where the lifetime of information associated
   with a pidfs entry (coredump and exit information) isn't bound to the
   lifetime of the pidfs inode but the struct pid itself.

   The first time a pidfs dentry is allocated for a struct pid a struct
   pidfs_attr will be allocated which will be used to store exit and
   coredump information.

   If all pidfds for the pidfs dentry are closed, the dentry and inode
   can be cleaned up, but the struct pidfs_attr will stick around until
   the struct pid itself is freed. This ensures minimal memory usage
   while persisting relevant information.

   The new scheme has various advantages. First, it closes the race
   where we end up handing out a pidfd for a reaped task for which no
   exit information is available. Second, it minimizes memory usage.
   Third, it allows removing complex lifetime tracking via dentries
   when registering a struct pid with pidfs. There's no need to get or
   put a reference. Instead, the lifetime of exit and coredump
   information associated with a struct pid is bound to the lifetime of
   struct pid itself.
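
The new lifetime rule (per-open dentry, per-pid attribute) can be sketched as a single-threaded model. All names here are hypothetical; the real code also has to handle concurrent openers and reference counting, which the sketch deliberately leaves out. In this model, exit information is recorded only if a pidfd was opened at least once, matching the description that the attribute is allocated the first time a pidfs dentry is created.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pidfs_attr. */
struct sketch_attr {
	bool exited;
	int exit_code;
};

/* Hypothetical stand-in for struct pid. */
struct sketch_pid {
	struct sketch_attr *attr;	/* lives until the pid is freed */
	int open_fds;			/* stands in for the dentry/inode */
};

/* First open allocates the attribute; later opens reuse it. */
static bool sketch_pidfd_open(struct sketch_pid *pid)
{
	if (!pid->attr) {
		pid->attr = calloc(1, sizeof(*pid->attr));
		if (!pid->attr)
			return false;
	}
	pid->open_fds++;
	return true;
}

/* Closing the last pidfd drops the "dentry" but keeps the attribute. */
static void sketch_pidfd_close(struct sketch_pid *pid)
{
	assert(pid->open_fds > 0);
	pid->open_fds--;
}

static void sketch_task_exit(struct sketch_pid *pid, int code)
{
	if (pid->attr) {
		pid->attr->exited = true;
		pid->attr->exit_code = code;
	}
}

/* Only freeing the pid releases the persisted information. */
static void sketch_pid_free(struct sketch_pid *pid)
{
	free(pid->attr);
	pid->attr = NULL;
}
```

A later opener still sees the exit code even though no pidfd was open when the task exited.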

- extended attributes

   Now that we have a way to persist information for pidfs dentries we
   can start supporting extended attributes on pidfds. This will allow
   userspace to attach meta information to tasks.

   One natural extension would be to introduce a custom pidfs.* extended
   attribute space and allow for the inheritance of extended attributes
   across fork() and exec().

   The first simple scheme will allow privileged userspace to set
   trusted extended attributes on pidfs inodes.

- Allow autonomous pidfs file handles

   Various filesystems such as pidfs and drm support opening file
   handles without requiring a file descriptor to identify the
   filesystem. These filesystems are global single instances and can be
   trivially identified solely from the information encoded in the file
   handle.

   This makes it possible to avoid keeping or acquiring a sentinel file
   descriptor just to pass it to open_by_handle_at() to identify the
   filesystem. That's especially useful when such a sentinel file
   descriptor cannot or should not be acquired.

   For pidfs this means a file handle can function as a full replacement
   for storing a pid in a file. Instead, a file handle can be stored and
   reopened purely based on the file handle.

   Such autonomous file handles can be opened with or without specifying
   a file descriptor. If no proper file descriptor is used, the
   FD_PIDFS_ROOT sentinel must be passed. This allows us to define
   further special negative fd sentinels in the future.

   Userspace can trivially test for support by trying to open the file
   handle with an invalid file descriptor.

- Allow pidfds for reaped tasks with SCM_PIDFD messages

   This is a logical continuation of the earlier work to create pidfds
   for reaped tasks through the SO_PEERPIDFD socket option merged in
   923ea4d4482b ("Merge patch series "net, pidfs: enable handing out
   pidfds for reaped sk->sk_peer_pid"").

- Two minor fixes:

    * Fold fs_struct->{lock,seq} into a seqlock

    * Don't bother with path_{get,put}() in unix_open_file()

* tag 'vfs-6.17-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (37 commits)
  don't bother with path_get()/path_put() in unix_open_file()
  fold fs_struct->{lock,seq} into a seqlock
  selftests: net: extend SCM_PIDFD test to cover stale pidfds
  af_unix: enable handing out pidfds for reaped tasks in SCM_PIDFD
  af_unix: stash pidfs dentry when needed
  af_unix/scm: fix whitespace errors
  af_unix: introduce and use scm_replace_pid() helper
  af_unix: introduce unix_skb_to_scm helper
  af_unix: rework unix_maybe_add_creds() to allow sleep
  selftests/pidfd: decode pidfd file handles withou having to specify an fd
  fhandle, pidfs: support open_by_handle_at() purely based on file handle
  uapi/fcntl: add FD_PIDFS_ROOT
  uapi/fcntl: add FD_INVALID
  fcntl/pidfd: redefine PIDFD_SELF_THREAD_GROUP
  uapi/fcntl: mark range as reserved
  fhandle: reflow get_path_anchor()
  pidfs: add pidfs_root_path() helper
  fhandle: rename to get_path_anchor()
  fhandle: hoist copy_from_user() above get_path_from_fd()
  fhandle: raise FILEID_IS_DIR in handle_type
  ...

rv: Add opid per-cpu monitor

Add a per-cpu monitor as part of the sched model:
* opid: operations with preemption and irq disabled
Monitor to ensure wakeup and need_resched occur with irq and
preemption disabled or in irq handlers.

Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/20250728135022.255578-10-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Acked-by: Nam Cao <namcao@linutronix.de>
Tested-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

rv: Add nrp and sssw per-task monitors

Add 2 per-task monitors as part of the sched model:

* nrp: need-resched preempts
    Monitor to ensure preemption requires need resched.
* sssw: set state sleep and wakeup
    Monitor to ensure sched_set_state to sleepable leads to sleeping and
    sleeping tasks require wakeup.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/20250728135022.255578-9-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Acked-by: Nam Cao <namcao@linutronix.de>
Tested-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

rv: Replace tss and sncid monitors with more complete sts

The tss monitor currently guarantees task switches can happen only while
scheduling, whereas the sncid monitor enforces that scheduling occurs with
interrupts disabled.

Replace the monitors with a more comprehensive specification which
implies both but also ensures that:
* each scheduler call disables interrupts to switch
* each task switch happens with interrupts disabled

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nam Cao <namcao@linutronix.de>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/20250728135022.255578-8-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

sched: Adapt sched tracepoints for RV task model

Add the following tracepoint:
* sched_set_need_resched(tsk, cpu, tif)
Called when the need resched [lazy] flag is set for a task

Remove the unused ip parameter from sched_entry and sched_exit and alter
sched_entry to have a value of preempt consistent with the one used in
sched_switch.

Also adapt all monitors using sched_{entry,exit} to avoid breaking build.

These tracepoints are useful to describe the Linux task model and are
adapted from the patches by Daniel Bristot de Oliveira
(https://bristot.me/linux-task-model/).

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nam Cao <namcao@linutronix.de>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/20250728135022.255578-7-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

rv: Retry when da monitor detects race conditions

A DA monitor can be accessed from multiple cores simultaneously. This is
likely, for instance, when dealing with per-task monitors reacting to
events that do not always occur on the CPU where the task is running.
This can cause race conditions where two events change the next state
and we see inconsistent values. E.g.:

  [62] event_srs: 27: sleepable x sched_wakeup -> running (final)
  [63] event_srs: 27: sleepable x sched_set_state_sleepable -> sleepable
  [63] error_srs: 27: event sched_switch_suspend not expected in the state running

In this case the monitor fails because the event on CPU 62 wins against
the one on CPU 63, although the correct state should have been
sleepable, since the task gets suspended.

Detect if the current state was modified by using try_cmpxchg while
storing the next value. If it was, try again reading the current state.
After a maximum number of failed retries, react by calling a special
tracepoint, printing on the console and resetting the monitor.

Remove the functions da_monitor_curr_state() and da_monitor_set_state()
as they only hide the underlying implementation in this case.

Monitors where this type of condition can occur must be able to account
for racing events in any possible order, as we cannot know the winner.
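
The retry scheme maps naturally onto C11 atomics, where atomic_compare_exchange_strong() plays the role of the kernel's try_cmpxchg(): on failure it refreshes the expected value, so the next attempt recomputes the transition from the state that actually won. The automaton, MAX_RETRIES and the function names below are hypothetical and only loosely follow the srs trace above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical automaton, loosely following the srs trace above. */
enum state { RUNNING, SLEEPABLE, SLEEPING, NR_STATES, INVALID = -1 };
enum event { SET_SLEEPABLE, WAKEUP, SWITCH_SUSPEND, NR_EVENTS };

static const int next_state[NR_STATES][NR_EVENTS] = {
	[RUNNING]   = { SLEEPABLE, RUNNING, INVALID },
	[SLEEPABLE] = { SLEEPABLE, RUNNING, SLEEPING },
	[SLEEPING]  = { INVALID,   RUNNING, INVALID },
};

#define MAX_RETRIES 3	/* arbitrary bound for the sketch */

/*
 * Returns true if the event was accepted. *gave_up is set when other
 * CPUs kept changing the state and we ran out of retries; a real
 * monitor would then trace, log and reset itself.
 */
static bool da_sketch_event(_Atomic int *curr, enum event ev, bool *gave_up)
{
	int cur = atomic_load(curr);

	*gave_up = false;
	for (int i = 0; i < MAX_RETRIES; i++) {
		int next = next_state[cur][ev];

		if (next == INVALID)
			return false;	/* event not expected in this state */
		/* On failure, cur is refreshed with the value that won. */
		if (atomic_compare_exchange_strong(curr, &cur, next))
			return true;
	}
	*gave_up = true;
	return false;
}
```

Because the transition is recomputed from the refreshed state, the table itself must accept the racing events in either order, as noted above.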

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/20250728135022.255578-6-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Reviewed-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

rv: Adjust monitor dependencies

RV monitors relying on the preemptirqs tracepoints are set as dependent
on PREEMPT_TRACER and IRQSOFF_TRACER. In fact, those configurations do
enable the tracepoints but are not the minimal configurations enabling
them, which are TRACE_PREEMPT_TOGGLE and TRACE_IRQFLAGS (not selectable
manually).

Set TRACE_PREEMPT_TOGGLE and TRACE_IRQFLAGS as dependencies for
monitors.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/20250728135022.255578-5-gmonaco@redhat.com
Fixes: fbe6c09b7eb4 ("rv: Add scpd, snep and sncid per-cpu monitors")
Acked-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

rv: Use strings in da monitors tracepoints

Using DA monitors tracepoints with KASAN enabled triggers the following
warning:

BUG: KASAN: global-out-of-bounds in do_trace_event_raw_event_event_da_monitor+0xd6/0x1a0
Read of size 32 at addr ffffffffaada8980 by task ...
Call Trace:
  <TASK>
[...]
  do_trace_event_raw_event_event_da_monitor+0xd6/0x1a0
  ? __pfx_do_trace_event_raw_event_event_da_monitor+0x10/0x10
  ? trace_event_sncid+0x83/0x200
  trace_event_sncid+0x163/0x200
[...]
The buggy address belongs to the variable:
  automaton_snep+0x4e0/0x5e0

This is caused by the tracepoints reading a 32-byte __array instead of
a __string from the automata definition. Such strings are literals, and
reading 32 bytes ends up in out-of-bounds memory accesses (e.g. the next
automaton's data in this case).
The error is harmless as, while printing the string, we stop at the null
terminator, but it should still be fixed.

Use the __string facilities while defining the tracepoints to avoid
reading out-of-bounds memory.
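
The idea behind the fix can be shown in plain C: size the copy from the source string instead of from a fixed 32-byte array. This hypothetical record type only mimics what __string()/__assign_str() achieve; it is not the tracing macro implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical trace record with a dynamically sized string payload. */
struct sketch_record {
	size_t len;	/* bytes stored, including the terminator */
	char data[];	/* flexible array member */
};

static struct sketch_record *record_state(const char *state)
{
	size_t len = strlen(state) + 1;
	struct sketch_record *r = malloc(sizeof(*r) + len);

	if (!r)
		return NULL;
	r->len = len;
	memcpy(r->data, state, len);	/* never reads past the literal */
	return r;
}
```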

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/20250728135022.255578-4-gmonaco@redhat.com
Fixes: 792575348ff7 ("rv/include: Add deterministic automata monitor definition via C macros")
Reviewed-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

rv: Remove trailing whitespace from tracepoint string

RV event tracepoints print a line with the format:
"event_xyz: S0 x event -> S1 "
"event_xyz: S1 x event -> S0 (final)"

While printing an event leading to a non-final state, the line
has trailing whitespace (visible above before the closing ").

Adapt the format string not to print the trailing whitespace if we are
not printing "(final)".
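
One way to get the described format without the trailing space is to let the optional suffix carry its own leading space. This is a sketch; the function name and argument list are assumptions, and the real tracepoint format string may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* The " (final)" suffix includes its leading space, so non-final lines
 * end right after the next state's name. */
static int format_event(char *buf, size_t size, const char *mon,
			const char *from, const char *ev, const char *to,
			bool final)
{
	return snprintf(buf, size, "event_%s: %s x %s -> %s%s", mon, from, ev,
			to, final ? " (final)" : "");
}
```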

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/20250728135022.255578-3-gmonaco@redhat.com
Reviewed-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

rv: Add da_handle_start_run_event_ to per-task monitors

The RV da_monitor API allows starting monitors in two ways:
da_handle_start_event_NAME and da_handle_start_run_event_NAME.
The former is used when the event is followed by the initial state of
the module: we ignore the event, but we know the monitor is in the
initial state and can start monitoring. The latter can be used if the
event can only occur in the initial state: we handle the event as if
the monitor was in the initial state.
This latter API is defined for implicit monitors but not for per-task
ones.

Define da_handle_start_run_event_NAME macro also for per-task monitors.
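
The difference between the two start variants can be sketched on a tiny hypothetical monitor. The table, states and function names are illustrative assumptions modeled on the description above, not the generated da_monitor macros.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-state monitor. */
enum { S_INIT, S_OTHER, S_NR, S_INVALID = -1 };
enum { E_A, E_B, E_NR };

static const int table[S_NR][E_NR] = {
	[S_INIT]  = { [E_A] = S_OTHER,   [E_B] = S_INVALID },
	[S_OTHER] = { [E_A] = S_INVALID, [E_B] = S_INIT },
};

struct mon {
	bool monitoring;
	int state;
};

static bool mon_handle(struct mon *m, int ev)
{
	int next = table[m->state][ev];

	if (next == S_INVALID)
		return false;	/* a real monitor would react here */
	m->state = next;
	return true;
}

static void mon_start(struct mon *m)
{
	m->monitoring = true;
	m->state = S_INIT;
}

/* start_event flavour: the event only signals that we are now in the
 * initial state; it is not processed itself. */
static bool start_event(struct mon *m, int ev)
{
	if (!m->monitoring) {
		mon_start(m);
		return true;
	}
	return mon_handle(m, ev);
}

/* start_run_event flavour: the event can only occur in the initial
 * state, so start and then process it from that state. */
static bool start_run_event(struct mon *m, int ev)
{
	if (!m->monitoring)
		mon_start(m);
	return mon_handle(m, ev);
}
```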

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/20250728135022.255578-2-gmonaco@redhat.com
Reviewed-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

Merge tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull mmap_prepare updates from Christian Brauner:
"Last cycle we introduced f_op->mmap_prepare() in c84bf6dd2b83 ("mm:
  introduce new .mmap_prepare() file callback").

  This is preferred to the existing f_op->mmap() hook as it does not
  require a VMA to be established yet, thus allowing the mmap logic to
  invoke this hook far, far earlier, prior to inserting a VMA into the
  virtual address space, or performing any other heavy handed operations.

  This allows for much simpler unwinding on error, and for there to be a
  single attempt at merging a VMA rather than having to possibly
  reattempt a merge based on potentially altered VMA state.

  Far more importantly, it prevents inappropriate manipulation of
  incompletely initialised VMA state, which is something that has been
  the cause of bugs and complexity in the past.

  The intent is to gradually deprecate f_op->mmap, and in that vein this
  series converts the majority of file systems to using f_op->mmap_prepare.

  Prerequisite steps are taken - firstly ensuring all checks for mmap
  capabilities use the file_has_valid_mmap_hooks() helper rather than
  directly checking for f_op->mmap (which is now not a valid check) and
  secondly updating daxdev_mapping_supported() to not require a VMA
  parameter to allow ext4 and xfs to be converted.

  Commit bb666b7c2707 ("mm: add mmap_prepare() compatibility layer for
  nested file systems") handles the nasty edge-case of nested file
  systems like overlayfs, which introduces a compatibility shim to allow
  f_op->mmap_prepare() to be invoked from an f_op->mmap() callback.

  This allows for nested filesystems to continue to function correctly
  with all file systems regardless of which callback is used. Once we
  finally convert all file systems, this shim can be removed.

  As a result, ecryptfs, fuse, and overlayfs remain unaltered so they
  can nest all other file systems.

  We additionally do not update resctrl, as this requires an update to
  remap_pfn_range() (or an alternative to it) which we defer to a later
  series. Equally, we do not update cramfs, which needs a mixed mapping
  insertion with the same issue, nor do we update procfs, hugetlbfs,
  sysfs or kernfs, all of which require VMAs for internal state and hooks.
  We shall return to all of these later"
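
The compatibility-shim idea can be sketched with reduced stand-in types: a converted filesystem only edits a descriptor, and a nested filesystem that still receives a VMA through ->mmap() builds such a descriptor, lets the lower filesystem fill it in, then applies the result. Every type, flag and function name below is hypothetical; the kernel's struct vm_area_desc and the actual shim carry much more state.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_VM_FLAG 0x1UL	/* hypothetical flag */

/* Reduced stand-ins for a VMA descriptor and a VMA. */
struct sketch_desc {
	unsigned long start, end, flags;
	const void *vm_ops;
};

struct sketch_vma {
	unsigned long start, end, flags;
	const void *vm_ops;
};

struct sketch_fops {
	int (*mmap)(struct sketch_vma *vma);
	int (*mmap_prepare)(struct sketch_desc *desc);
};

/* A converted filesystem edits only the descriptor, so nothing needs
 * unwinding if a later step of the mmap fails. */
static int lower_mmap_prepare(struct sketch_desc *desc)
{
	desc->flags |= SKETCH_VM_FLAG;
	return 0;
}

/* Nested fs ->mmap(): describe the VMA, let the lower fs adjust the
 * description, then apply the result to the VMA. */
static int nested_mmap(const struct sketch_fops *lower, struct sketch_vma *vma)
{
	struct sketch_desc desc = {
		.start = vma->start, .end = vma->end,
		.flags = vma->flags, .vm_ops = vma->vm_ops,
	};
	int err;

	if (!lower->mmap_prepare)
		return lower->mmap ? lower->mmap(vma) : -ENODEV;
	err = lower->mmap_prepare(&desc);
	if (err)
		return err;
	vma->flags = desc.flags;
	vma->vm_ops = desc.vm_ops;
	return 0;
}
```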

* tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  doc: update porting, vfs documentation to describe mmap_prepare()
  fs: replace mmap hook with .mmap_prepare for simple mappings
  fs: convert most other generic_file_*mmap() users to .mmap_prepare()
  fs: convert simple use of generic_file_*_mmap() to .mmap_prepare()
  mm/filemap: introduce generic_file_*_mmap_prepare() helpers
  fs/xfs: transition from deprecated .mmap hook to .mmap_prepare
  fs/ext4: transition from deprecated .mmap hook to .mmap_prepare
  fs/dax: make it possible to check dev dax support without a VMA
  fs: consistently use can_mmap_file() helper
  mm/nommu: use file_has_valid_mmap_hooks() helper
  mm: rename call_mmap/mmap_prepare to vfs_mmap/mmap_prepare

Merge tag 'vfs-6.17-rc1.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull fallocate updates from Christian Brauner:
"fallocate() currently supports creating preallocated files
  efficiently. However, on most filesystems fallocate() will preallocate
  blocks in an unwritten state even if FALLOC_FL_ZERO_RANGE is specified.

  The extent state must later be converted to a written state when the
  user writes data into this range, which can trigger numerous metadata
  changes and journal I/O. This may lead to significant write
  amplification and performance degradation in synchronous write mode.

  At the moment, the only method to avoid this is to create an empty
  file and write zero data into it (for example, using 'dd' with a large
  block size). However, this method is slow and consumes a considerable
  amount of disk bandwidth.

  Now that more and more flash-based storage devices are available it is
  possible to efficiently write zeros to SSDs using the unmap write
  zeroes command if the devices do not write physical zeroes to the
  media.

  For example, if SCSI SSDs support the UNMAP bit or NVMe SSDs support
  the DEAC bit[1], the write zeroes command does not write actual data
  to the device; instead, NVMe converts the zeroed range to a
  deallocated state, which works fast and consumes almost no disk write
  bandwidth.

  This series implements the BLK_FEAT_WRITE_ZEROES_UNMAP feature and
  BLK_FLAG_WRITE_ZEROES_UNMAP_DISABLED flag for SCSI, NVMe and
  device-mapper drivers, and adds the FALLOC_FL_WRITE_ZEROES and
  STATX_ATTR_WRITE_ZEROES_UNMAP support for ext4 and raw bdev devices.

  fallocate() is subsequently extended with the FALLOC_FL_WRITE_ZEROES
  flag. FALLOC_FL_WRITE_ZEROES zeroes a specified file range in such a
  way that subsequent writes to that range do not require further
  changes to the file mapping metadata. This flag is beneficial for
  subsequent pure overwriting within this range, as it can save on block
  allocation and, consequently, significant metadata changes"

* tag 'vfs-6.17-rc1.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  ext4: add FALLOC_FL_WRITE_ZEROES support
  block: add FALLOC_FL_WRITE_ZEROES support
  block: factor out common part in blkdev_fallocate()
  fs: introduce FALLOC_FL_WRITE_ZEROES to fallocate
  dm: clear unmap write zeroes limits when disabling write zeroes
  scsi: sd: set max_hw_wzeroes_unmap_sectors if device supports SD_ZERO_*_UNMAP
  nvmet: set WZDS and DRB if device enables unmap write zeroes operation
  nvme: set max_hw_wzeroes_unmap_sectors if device supports DEAC bit
  block: introduce max_{hw|user}_wzeroes_unmap_sectors to queue limits
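
As a userspace illustration (not part of this series), a caller could try the new flag first and fall back to the slow zero-writing approach described above when the kernel or filesystem rejects it. The helper name zero_range_written() is hypothetical, and the 0x80 fallback value for FALLOC_FL_WRITE_ZEROES is an assumption for systems whose headers predate the flag.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <linux/falloc.h>

/* Assumption: headers that predate the flag lack it; 0x80 is the
 * value assumed for the uapi definition. */
#ifndef FALLOC_FL_WRITE_ZEROES
#define FALLOC_FL_WRITE_ZEROES 0x80
#endif

/*
 * Hypothetical helper: zero [off, off + len) of a regular file.
 * Returns 1 if fallocate(FALLOC_FL_WRITE_ZEROES) did the work, 0 if the
 * slow fallback wrote zero buffers instead, or -1 on a write error.
 */
static int zero_range_written(int fd, off_t off, off_t len)
{
	static const char zeroes[4096];

	assert(fd >= 0 && off >= 0 && len >= 0);
	if (fallocate(fd, FALLOC_FL_WRITE_ZEROES, off, len) == 0)
		return 1;

	/* Any fallocate() failure (e.g. EOPNOTSUPP on kernels or
	 * filesystems without support) falls back to writing zeroes. */
	while (len > 0) {
		size_t chunk = len < (off_t)sizeof(zeroes) ? (size_t)len
							   : sizeof(zeroes);
		ssize_t n = pwrite(fd, zeroes, chunk, off);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		off += n;
		len -= n;
	}
	return 0;
}
```

Treating every fallocate() failure as "unsupported" keeps the sketch short; real code would likely report errors such as ENOSPC instead of retrying with plain writes.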

Merge tag 'vfs-6.17-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull async directory updates from Christian Brauner:
"This contains preparatory changes for the asynchronous directory
  locking scheme.

  While the locking scheme is still very much controversial and we're
  still far away from landing any actual changes in that area, the
  preparatory work that we've been upstreaming for a while now has been
  very useful. This is another set of minor changes and cleanups"

* tag 'vfs-6.17-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  exportfs: use lookup_one_unlocked()
  coda: use iterate_dir() in coda_readdir()
  VFS: Minor fixes for porting.rst
  VFS: merge lookup_one_qstr_excl_raw() back into lookup_one_qstr_excl()

Merge tag 'vfs-6.17-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull namespace updates from Christian Brauner:
"This contains namespace updates. This time specifically for nsfs:

   - Userspace heavily relies on the root inode numbers for namespaces
     to identify the initial namespaces. That's already a hard
     dependency. So we cannot change that anymore. Move the initial
     inode numbers to a public header and align the only two namespaces
     that currently don't do that with all the other namespaces.

   - The root inode of /proc having a fixed inode number has been part
     of the core kernel ABI since its inception, and recently some
     userspace programs (mainly container runtimes) have started to
     explicitly depend on this behaviour.

     The main reason this is useful to userspace is that by checking
     that a suspect /proc handle has fstype PROC_SUPER_MAGIC and is
     PROCFS_ROOT_INO, they can then use openat2() together with
     RESOLVE_{NO_{XDEV,MAGICLINK},BENEATH} to ensure that there isn't a
     bind-mount that replaces some procfs file with a different one.

     This kind of attack has led to security issues in container
     runtimes in the past (such as CVE-2019-19921) and libraries like
     libpathrs[1] use this feature of procfs to provide safe procfs
     handling functions"

* tag 'vfs-6.17-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  uapi: export PROCFS_ROOT_INO
  mntns: use stable inode number for initial mount ns
  netns: use stable inode number for initial mount ns
  nsfs: move root inode number to uapi

Merge tag 'vfs-6.17-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull overlayfs updates from Christian Brauner:
"This contains overlayfs updates for this cycle.

  The changes for overlayfs in here are primarily focussed on preparing
  for some proposed changes to directory locking.

  Overlayfs currently will sometimes lock a directory on the upper
  filesystem and do a few different things while holding the lock. This
  is incompatible with the new potential scheme.

  This series narrows the region of code protected by the directory
  lock, taking it multiple times when necessary. This theoretically
  opens up the possibility of other changes happening on the upper
  filesystem between the unlock and the lock. To some extent the patches
  guard against that by checking the dentries still have the expected
  parent after retaking the lock. In general, concurrent changes to the
  upper and lower filesystems aren't supported properly anyway"

* tag 'vfs-6.17-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
  ovl: properly print correct variable
  ovl: rename ovl_cleanup_unlocked() to ovl_cleanup()
  ovl: change ovl_create_real() to receive dentry parent
  ovl: narrow locking in ovl_check_rename_whiteout()
  ovl: narrow locking in ovl_whiteout()
  ovl: change ovl_cleanup_and_whiteout() to take rename lock as needed
  ovl: narrow locking on ovl_remove_and_whiteout()
  ovl: change ovl_workdir_cleanup() to take dir lock as needed.
  ovl: narrow locking in ovl_workdir_cleanup_recurse()
  ovl: narrow locking in ovl_indexdir_cleanup()
  ovl: narrow locking in ovl_workdir_create()
  ovl: narrow locking in ovl_cleanup_index()
  ovl: narrow locking in ovl_cleanup_whiteouts()
  ovl: narrow locking in ovl_rename()
  ovl: simplify gotos in ovl_rename()
  ovl: narrow locking in ovl_create_over_whiteout()
  ovl: narrow locking in ovl_clear_empty()
  ovl: narrow locking in ovl_create_upper()
  ovl: narrow the locked region in ovl_copy_up_workdir()
  ovl: Call ovl_create_temp() without lock held.
  ...
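
The "narrow the lock, then revalidate the parent" pattern can be modeled in userspace as below. This is a hypothetical analogy using pthread mutexes, not overlayfs code: struct dir, struct entry and relock_and_check() are invented names, and returning -ESTALE on a parent mismatch is an arbitrary choice for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for a directory and a dentry. */
struct dir {
	pthread_mutex_t lock;
};

struct entry {
	struct dir *parent;	/* may change while the lock is dropped */
};

/*
 * Work that does not need the directory lock runs unlocked; when the
 * lock is retaken, confirm the entry still has the expected parent.
 * On success the lock is held and the caller must unlock it.
 */
static int relock_and_check(struct dir *expected, struct entry *e)
{
	assert(expected != NULL && e != NULL);
	pthread_mutex_lock(&expected->lock);
	if (e->parent != expected) {
		/* Moved while unlocked: drop the lock and bail out. */
		pthread_mutex_unlock(&expected->lock);
		return -ESTALE;
	}
	return 0;
}
```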

Merge tag 'vfs-6.17-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull coredump updates from Christian Brauner:
"This contains an extension to the coredump socket and a proper rework
  of the coredump code.

   - This extends the coredump socket to allow the coredump server to
     tell the kernel how to process individual coredumps. This allows
     for fine-grained coredump management. Userspace can decide to just
     let the kernel write out the coredump, or generate the coredump
     itself, or just reject it.

     * COREDUMP_KERNEL
       The kernel will write the coredump data to the socket.

     * COREDUMP_USERSPACE
       The kernel will not write coredump data but will indicate to the
       parent that a coredump has been generated. This is used when
       userspace generates its own coredumps.

     * COREDUMP_REJECT
       The kernel will skip generating a coredump for this task.

     * COREDUMP_WAIT
       The kernel will prevent the task from exiting until the coredump
       server has shut down the socket connection.

     The flexible coredump socket can be enabled by using the "@@"
     prefix instead of the single "@" prefix for the regular coredump
     socket:

       @@/run/systemd/coredump.socket

   - Clean up the coredump code properly while we have to touch it
     anyway.

     Split out each coredump mode into a separate helper so it's easy to
     grasp what is going on and make the code easier to follow. The core
     coredump function should now be very trivial to follow"

* tag 'vfs-6.17-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits)
  cleanup: add a scoped version of CLASS()
  coredump: add coredump_skip() helper
  coredump: avoid pointless variable
  coredump: order auto cleanup variables at the top
  coredump: add coredump_cleanup()
  coredump: auto cleanup prepare_creds()
  cred: add auto cleanup method
  coredump: directly return
  coredump: auto cleanup argv
  coredump: add coredump_write()
  coredump: use a single helper for the socket
  coredump: move pipe specific file check into coredump_pipe()
  coredump: split pipe coredumping into coredump_pipe()
  coredump: move core_pipe_count to global variable
  coredump: prepare to simplify exit paths
  coredump: split file coredumping into coredump_file()
  coredump: rename do_coredump() to vfs_coredump()
  selftests/coredump: make sure invalid paths are rejected
  coredump: validate socket path in coredump_parse()
  coredump: don't allow ".." in coredump socket path
  ...
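
A rough sketch of how a core_pattern value could be classified under the prefixes described above ("|" for a pipe helper, "@" for the regular coredump socket, "@@" for the flexible socket protocol). The enum and function are hypothetical, and the path rules (absolute path, no "..") are assumptions inferred from the "validate socket path" and "don't allow \"..\"" commit titles rather than the kernel's exact checks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum dump_kind {
	DUMP_FILE,		/* plain file pattern */
	DUMP_PIPE,		/* "|helper ..." */
	DUMP_SOCKET,		/* "@/path" */
	DUMP_SOCKET_FLEX,	/* "@@/path" */
	DUMP_INVALID,
};

/* Hypothetical classifier; *path points at the rest of the pattern. */
static enum dump_kind classify_core_pattern(const char *pat,
					    const char **path)
{
	enum dump_kind kind;
	const char *p;

	assert(pat != NULL && path != NULL);
	*path = NULL;
	if (pat[0] == '|') {
		*path = pat + 1;
		return DUMP_PIPE;
	}
	if (pat[0] != '@') {
		*path = pat;
		return DUMP_FILE;
	}
	kind = pat[1] == '@' ? DUMP_SOCKET_FLEX : DUMP_SOCKET;
	p = kind == DUMP_SOCKET_FLEX ? pat + 2 : pat + 1;
	/* Assumed validation: absolute path, no ".." anywhere in it. */
	if (p[0] != '/' || strstr(p, "..") != NULL)
		return DUMP_INVALID;
	*path = p;
	return kind;
}
```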

Merge tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc VFS updates from Christian Brauner:
"This contains the usual selections of misc updates for this cycle.

  Features:

   - Add ext4 IOCB_DONTCACHE support

     This refactors the address_space_operations write_begin() and
     write_end() callbacks to take const struct kiocb * as their first
     argument, allowing IOCB flags such as IOCB_DONTCACHE to propagate
     to the filesystem's buffered I/O path.

     Ext4 is updated to implement handling of the IOCB_DONTCACHE flag
     and advertises support via the FOP_DONTCACHE file operation flag.

     Additionally, the i915 driver's shmem write paths are updated to
     bypass the legacy write_begin/write_end interface in favor of
     directly calling write_iter() with a constructed synchronous kiocb.
     Another i915 change replaces a manual write loop with
     kernel_write() during GEM shmem object creation.

  Cleanups:

   - don't duplicate vfs_open() in kernel_file_open()

   - proc_fd_getattr(): don't bother with S_ISDIR() check

   - fs/ecryptfs: replace snprintf with sysfs_emit in show function

   - vfs: Remove unnecessary list_for_each_entry_safe() from
     evict_inodes()

   - filelock: add new locks_wake_up_waiter() helper

   - fs: Remove three arguments from block_write_end()

   - VFS: change old_dir and new_dir in struct renamedata to dentrys

   - netfs: Remove unused declaration netfs_queue_write_request()

  Fixes:

   - eventpoll: Fix semi-unbounded recursion

   - eventpoll: fix sphinx documentation build warning

   - fs/read_write: Fix spelling typo

   - fs: annotate data race between poll_schedule_timeout() and
     pollwake()

   - fs/pipe: set FMODE_NOWAIT in create_pipe_files()

   - docs/vfs: update references to i_mutex to i_rwsem

   - fs/buffer: remove comment about hard sectorsize

   - fs/buffer: remove the min and max limit checks in __getblk_slow()

   - fs/libfs: don't assume blocksize <= PAGE_SIZE in
     generic_check_addressable

   - fs_context: fix parameter name in infofc() macro

   - fs: Prevent file descriptor table allocations exceeding INT_MAX"

* tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (24 commits)
  netfs: Remove unused declaration netfs_queue_write_request()
  eventpoll: fix sphinx documentation build warning
  ext4: support uncached buffered I/O
  mm/pagemap: add write_begin_get_folio() helper function
  fs: change write_begin/write_end interface to take struct kiocb *
  drm/i915: Refactor shmem_pwrite() to use kiocb and write_iter
  drm/i915: Use kernel_write() in shmem object create
  eventpoll: Fix semi-unbounded recursion
  vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes()
  fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable
  fs/buffer: remove the min and max limit checks in __getblk_slow()
  fs: Prevent file descriptor table allocations exceeding INT_MAX
  fs: Remove three arguments from block_write_end()
  fs/ecryptfs: replace snprintf with sysfs_emit in show function
  fs: annotate suspected data race between poll_schedule_timeout() and pollwake()
  docs/vfs: update references to i_mutex to i_rwsem
  fs/buffer: remove comment about hard sectorsize
  fs_context: fix parameter name in infofc() macro
  VFS: change old_dir and new_dir in struct renamedata to dentrys
  proc_fd_getattr(): don't bother with S_ISDIR() check
  ...
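
From userspace, uncached buffered writes are requested per call with the RWF_DONTCACHE flag to pwritev2(); with this series ext4 can honor it. A minimal sketch, assuming RWF_DONTCACHE is 0x80 when the libc headers lack it and that EOPNOTSUPP (or EINVAL) means the kernel or filesystem does not support it; write_uncached() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>	/* open() in the usage example */
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080	/* assumption: uapi value */
#endif

/* Hypothetical helper: buffered write that asks the kernel to drop the
 * written page cache afterwards, falling back to a normal write. */
static ssize_t write_uncached(int fd, const void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	ssize_t n;

	assert(fd >= 0 && (buf != NULL || len == 0));
	n = pwritev2(fd, &iov, 1, off, RWF_DONTCACHE);
	if (n < 0 && (errno == EOPNOTSUPP || errno == EINVAL))
		/* No FOP_DONTCACHE support: plain buffered write. */
		n = pwritev2(fd, &iov, 1, off, 0);
	return n;
}
```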

Merge tag 'pull-mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs mount updates from Al Viro:

- mount hash conflicts rudiments are gone now - we do not allow
     multiple mounts with the same parent/mountpoint to be hashed at the
     same time.

- 'struct mount' changes:
      - mnt_umounting is gone
      - mnt_slave_list/mnt_slave is an hlist now
      - overmounts are kept track of by explicit pointer in mount
      - a bunch of flags moved out of mnt_flags to a new field, with
        only namespace_sem for protection
      - mnt_expiry is protected by mount_lock now (instead of
        namespace_sem)
      - MNT_LOCKED is used only for mounts that need to remain attached
        to their parents to prevent mountpoint exposure - no more
        overloading it for absolute roots
      - all mnt_list uses are transient now - it's used only to
        represent temporary sets during umount_tree()

- mount refcounting change: children no longer pin parents for any
   mounts, whether they'd passed through umount_tree() or not

- 'struct mountpoint' changes:
      - refcount is no more; what matters is ->m_list emptiness
      - instead of temporarily bumping the refcount, we insert a new
        object (pinned_mountpoint) into ->m_list
      - new calling conventions for lock_mount() and friends

- do_move_mount()/attach_recursive_mnt() seriously cleaned up

- globals in fs/pnode.c are gone

- propagate_mnt(), change_mnt_propagation() and propagate_umount()
   cleaned up (in the last case - pretty much completely rewritten).

- freeing of emptied mnt_namespace is done in namespace_unlock(). For
   one thing, there are subtle ordering requirements there; for another
   it simplifies cleanups.

- assorted cleanups

- restore the machinery for long-term mounts from accumulated bitrot.

   This is going to get a followup come next cycle, when the change of
   vfs_fs_parse_string() calling conventions goes into -next

* tag 'pull-mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (48 commits)
  statmount_mnt_basic(): simplify the logics for group id
  invent_group_ids(): zero ->mnt_group_id always implies !IS_MNT_SHARED()
  get rid of CL_SHARE_TO_SLAVE
  take freeing of emptied mnt_namespace to namespace_unlock()
  copy_tree(): don't link the mounts via mnt_list
  change_mnt_propagation(): move ->mnt_master assignment into MS_SLAVE case
  mnt_slave_list/mnt_slave: turn into hlist_head/hlist_node
  turn do_make_slave() into transfer_propagation()
  do_make_slave(): choose new master sanely
  change_mnt_propagation(): do_make_slave() is a no-op unless IS_MNT_SHARED()
  change_mnt_propagation() cleanups, step 1
  propagate_mnt(): fix comment and convert to kernel-doc, while we are at it
  propagate_mnt(): get rid of last_dest
  fs/pnode.c: get rid of globals
  propagate_one(): fold into the sole caller
  propagate_one(): separate the "what should be the master for this copy" part
  propagate_one(): separate the "do we need secondary here?" logics
  propagate_mnt(): handle all peer groups in the same loop
  propagate_one(): get rid of dest_master
  mount: separate the flags accessed only under namespace_sem
  ...

Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull CLASS(fd) update from Al Viro:
"A missing bit of commit 66635b077624 ("assorted variants of irqfd
  setup: convert to CLASS(fd)") from a year ago.

  mshv_eventfd would've been covered by that, but it had forked slightly
  before that series and got merged into mainline later"

* tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  mshv_eventfd: convert to CLASS(fd)

Merge tag 'pull-ceph-d_name-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull ceph dentry->d_name fixes from Al Viro:
"Stuff that had fallen through the cracks back in February; ceph folks
  tested that pile and said they prefer to have it go through my tree..."

* tag 'pull-ceph-d_name-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ceph: fix a race with rename() in ceph_mdsc_build_path()
  prep for ceph_encode_encrypted_fname() fixes
  [ceph] parse_longname(): strrchr() expects NUL-terminated string

Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull misc VFS updates from Al Viro:
"VFS-related cleanups in various places (mostly of the "that really
  can't happen" or "there's a better way to do it" variety)"

* tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  gpib: use file_inode()
  binder_ioctl_write_read(): simplify control flow a bit
  secretmem: move setting O_LARGEFILE and bumping users' count to the place where we create the file
  apparmor: file never has NULL f_path.mnt
  landlock: opened file never has a negative dentry

Merge tag 'pull-securityfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull securityfs updates from Al Viro:
"Securityfs cleanups and fixes:

   - one extra reference is enough to pin a dentry down; no need for
     two. Switch to regular scheme, similar to shmem, debugfs, etc. This
     fixes a securityfs_recursive_remove() dentry leak, among other
     things.

   - we need to have the filesystem pinned to prevent the contents
     disappearing; what we do not need is pinning it for each file.
     Doing that only for files and directories in the root is enough.

   - the previous two changes allow us to get rid of the racy kludges in
     efi_secret_unlink(), where we can use simple_unlink() instead of
     securityfs_remove(); unlike the latter, it does not require
     unlocking and relocking the parent, with all the deadlocks that invites.

   - Make securityfs_remove() take the entire subtree out, turning
     securityfs_recursive_remove() into its alias. Makes a lot more
     sense for callers and fixes a mount leak, while we are at it.

   - Making securityfs_remove() remove the entire subtree allows for
     much simpler life in most of the users - efi_secret, ima_fs, evm,
     ipe, tpm get cleaner. I hadn't touched apparmor use of securityfs,
     but I suspect that it would be useful there as well"

* tag 'pull-securityfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  tpm: don't bother with removal of files in directory we'll be removing
  ipe: don't bother with removal of files in directory we'll be removing
  evm_secfs: clear securityfs interactions
  ima_fs: get rid of lookup-by-dentry stuff
  ima_fs: don't bother with removal of files in directory we'll be removing
  efi_secret: clean securityfs use up
  make securityfs_remove() remove the entire subtree
  fix locking in efi_secret_unlink()
  securityfs: pin filesystem only for objects directly in root
  securityfs: don't pin dentries twice, once is enough...

bpf: Fix various typos in verifier.c comments

This patch fixes several minor typos in comments within the BPF verifier.
No changes in functionality.

Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
Link: https://lore.kernel.org/r/20250727081754.15986-1-suchitkarunakaran@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'bpf-improve-64bits-bounds-refinement'

Paul Chaignon says:

====================
bpf: Improve 64bits bounds refinement

This patchset improves the 64bits bounds refinement when the s64 range
crosses the sign boundary. The first patch explains the small addition
to __reg64_deduce_bounds. The last one explains why we need a third
round of __reg_deduce_bounds. The third patch adds a selftest with a
more complete example of the impact on verification. The second and
fourth patches update the existing selftests to take the new refinement
into account.

This patchset should reduce the number of kernel warnings hit by
syzkaller due to invariant violations [1]. It was also tested with
Agni [2] (and Cilium's CI for good measure).

Link: https://syzkaller.appspot.com/bug?extid=c711ce17dd78e5d4fdcf
Link: https://github.com/bpfverif/agni

Changes in v4:
  - Fixed outdated test comment, noticed by Eduard.
  - Rebased.
Changes in v3:
  - Added a 5th patch to call __reg_deduce_bounds a third time in
    reg_bounds_sync following tests from Eduard.
  - Fixed broken indentations in the first patch.
Changes in v2 (all on Eduard's suggestions):
  - Added two tests to ensure we cover all cases of u64/s64 overlap.
  - Improved tests to check deduced ranges with __msg.
  - Improved code comments.
====================

Link: https://patch.msgid.link/cover.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Add third round of bounds deduction

Commit d7f008738171 ("bpf: try harder to deduce register bounds from
different numeric domains") added a second call to __reg_deduce_bounds
in reg_bounds_sync because a single call wasn't enough to converge to a
fixed point in terms of register bounds.

With patch "bpf: Improve bounds when s64 crosses sign boundary" from
this series, Eduard noticed that calling __reg_deduce_bounds twice is no
longer enough to converge. The first selftest added in "selftests/bpf:
Test cross-sign 64bits range refinement" highlights the need for a third
call to __reg_deduce_bounds. After instruction 7, reg_bounds_sync
performs the following bounds deduction:

  reg_bounds_sync entry:          scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146)
  __update_reg_bounds:            scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146)
  __reg_deduce_bounds:
      __reg32_deduce_bounds:      scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146,umin32=0xfffffcf1,umax32=0xffffff6e)
      __reg64_deduce_bounds:      scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146,umin32=0xfffffcf1,umax32=0xffffff6e)
      __reg_deduce_mixed_bounds:  scalar(smin=-655,smax=0xeffffeee,umin=umin32=0xfffffcf1,umax=0xffffffffffffff6e,smin32=-783,smax32=-146,umax32=0xffffff6e)
  __reg_deduce_bounds:
      __reg32_deduce_bounds:      scalar(smin=-655,smax=0xeffffeee,umin=umin32=0xfffffcf1,umax=0xffffffffffffff6e,smin32=-783,smax32=-146,umax32=0xffffff6e)
      __reg64_deduce_bounds:      scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e)
      __reg_deduce_mixed_bounds:  scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e)
  __reg_bound_offset:             scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e,var_off=(0xfffffffffffffc00; 0x3ff))
  __update_reg_bounds:            scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e,var_off=(0xfffffffffffffc00; 0x3ff))

In particular, notice how:
1. In the first call to __reg_deduce_bounds, __reg32_deduce_bounds
   learns new u32 bounds.
2. __reg64_deduce_bounds is unable to improve bounds at this point.
3. __reg_deduce_mixed_bounds derives new u64 bounds from the u32 bounds.
4. In the second call to __reg_deduce_bounds, __reg64_deduce_bounds
   improves the smax and umin bounds thanks to patch "bpf: Improve
   bounds when s64 crosses sign boundary" from this series.
5. Subsequent functions are unable to improve the ranges further (only
   tnums). Yet, a better smin32 bound could be learned from the smin
   bound.

__reg32_deduce_bounds is able to improve smin32 from smin, but for that
we need a third call to __reg_deduce_bounds.

As discussed in [1], there may be a better way to organize the deduction
rules to learn the same information with fewer calls to the same
functions. Such an optimization requires further analysis and is
orthogonal to the present patchset.

Link: https://lore.kernel.org/bpf/aIKtSK9LjQXB8FLY@mail.gmail.com/
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Co-developed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/79619d3b42e5525e0e174ed534b75879a5ba15de.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Test invariants on JSLT crossing sign

The improvement of the u64/s64 range refinement fixed the invariant
violation that was happening on this test for BPF_JSLT when crossing the
sign boundary.

After this patch, we have one test remaining with a known invariant
violation. It's the same test as fixed here but for 32-bit ranges.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/ad046fb0016428f1a33c3b81617aabf31b51183f.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Test cross-sign 64bits range refinement

This patch adds coverage for the new cross-sign 64bits range refinement
logic. The three tests cover the cases when the u64 and s64 ranges
overlap (1) in the negative portion of s64, (2) in the positive portion
of s64, and (3) in both portions.

The first test is a simplified version of a BPF program generated by
syzkaller that caused an invariant violation [1]. It looks like
syzkaller could not extract the reproducer itself (and therefore didn't
report it to the mailing list), but I was able to extract it from the
console logs of a crash.

The principle is similar to the invariant violation described in
commit 6279846b9b25 ("bpf: Forget ranges when refining tnum after
JSET"): the verifier walks a dead branch, uses the condition to refine
ranges, and ends up with inconsistent ranges. In this case, the dead
branch is when we fall through both jumps. The new refinement logic
improves the bounds such that the second jump is properly detected as
always-taken and the verifier doesn't end up walking a dead branch.

The second and third tests are inspired by the first, but rely on
conditional jumps to prepare the bounds instead of ALU instructions. An
R10 write is used to trigger a verifier error when the bounds can't be
refined.

Link: https://syzkaller.appspot.com/bug?extid=c711ce17dd78e5d4fdcf
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/a0e17b00dab8dabcfa6f8384e7e151186efedfdd.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Update reg_bound range refinement logic

This patch updates the range refinement logic in the reg_bound test to
match the new logic from the previous commit. Without this change, tests
would fail because we end up with more precise ranges than the tests
expect.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/b7f6b1fbe03373cca4e1bb6a113035a6cd2b3ff7.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Improve bounds when s64 crosses sign boundary

__reg64_deduce_bounds currently improves the s64 range using the u64
range and vice versa, but only if it doesn't cross the sign boundary.

This patch improves __reg64_deduce_bounds to cover the case where the
s64 range crosses the sign boundary but overlaps with the u64 range on
only one end. In that case, we can improve both ranges. Consider the
following example, with the s64 range crossing the sign boundary:

    0                                                   U64_MAX
    |  [xxxxxxxxxxxxxx u64 range xxxxxxxxxxxxxx]              |
    |----------------------------|----------------------------|
    |xxxxx s64 range xxxxxxxxx]                       [xxxxxxx|
    0                     S64_MAX S64_MIN                    -1

The u64 range overlaps only with the positive portion of the s64 range.
We can thus derive the following new s64 and u64 ranges.

    0                                                   U64_MAX
    |  [xxxxxx u64 range xxxxx]                               |
    |----------------------------|----------------------------|
    |  [xxxxxx s64 range xxxxx]                               |
    0                     S64_MAX S64_MIN                    -1

The same logic can probably apply to the s32/u32 ranges, but this patch
doesn't implement that change.

In addition to the selftests, the __reg64_deduce_bounds change was
also tested with Agni, the formal verification tool for the range
analysis [1].

Link: https://github.com/bpfverif/agni
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/933bd9ce1f36ded5559f92fdc09e5dbc823fa245.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map()

The caller has already passed in the memslot, yet kvm_riscv_gstage_map()
retrieves it again in two places (`kvm_faultin_pfn` and
`mark_page_dirty`). Replace them with `__kvm_faultin_pfn` and
`mark_page_dirty_in_slot`, which take the memslot directly.

Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/50989f0a02790f9d7dc804c2ade6387c4e7fbdbc.1749634392.git.zhouquan@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs

There is already a helper function, find_vma_intersection(), for
searching for intersecting VMAs; use it directly.

Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/230d6c8c8b8dd83081fcfd8d83a4d17c8245fa2f.1731552790.git.zhouquan@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: perf/kvm: Add reporting of interrupt events

For `perf kvm stat` on RISC-V, in order to avoid the
occurrence of `UNKNOWN` event names, interrupts should be
reported in addition to exceptions.

Testing without the patch:

Event name                    Samples  Sample%       Time(ns)
---------------------------  --------  --------  ------------
STORE_GUEST_PAGE_FAULT        1496461   53.00%    889612544
UNKNOWN                        887514   31.00%    272857968
LOAD_GUEST_PAGE_FAULT          305164   10.00%    189186331
VIRTUAL_INST_FAULT              70625    2.00%    134114260
SUPERVISOR_SYSCALL              32014    1.00%     58577110
INST_GUEST_PAGE_FAULT               1    0.00%         2545

Testing with the patch:

Event name                    Samples  Sample%       Time(ns)
---------------------------  --------  --------  ------------
IRQ_S_TIMER                   211271    58.00%  738298680600
EXC_STORE_GUEST_PAGE_FAULT    111279    30.00%  130725914800
EXC_LOAD_GUEST_PAGE_FAULT      22039     6.00%   25441480600
EXC_VIRTUAL_INST_FAULT          8913     2.00%   21015381600
IRQ_VS_EXT                      4748     1.00%   10155464300
IRQ_S_EXT                       2802     0.00%   13288775800
IRQ_S_SOFT                      1998     0.00%    4254129300

Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/9693132df4d0f857b8be3a75750c36b40213fcc0.1726211632.git.zhouquan@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
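
The UNKNOWN rows come from exits whose cause is an interrupt rather than an exception. A minimal decoding sketch, assuming the exit reason is a raw RV64 scause value whose top bit flags an interrupt; the tables cover only a few causes (using the kernel's IRQ_*/EXC_* names and standard RISC-V cause codes) and are not the perf implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CAUSE_IRQ_FLAG (1ULL << 63)	/* interrupt bit on RV64 */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

struct cause_name {
	uint64_t code;
	const char *name;
};

static const struct cause_name irq_names[] = {
	{ 1, "IRQ_S_SOFT" },   { 2, "IRQ_VS_SOFT" }, { 5, "IRQ_S_TIMER" },
	{ 6, "IRQ_VS_TIMER" }, { 9, "IRQ_S_EXT" },   { 10, "IRQ_VS_EXT" },
	{ 12, "IRQ_S_GEXT" },
};

static const struct cause_name exc_names[] = {
	{ 2, "EXC_INST_ILLEGAL" },
	{ 10, "EXC_SUPERVISOR_SYSCALL" },
	{ 20, "EXC_INST_GUEST_PAGE_FAULT" },
	{ 21, "EXC_LOAD_GUEST_PAGE_FAULT" },
	{ 22, "EXC_VIRTUAL_INST_FAULT" },
	{ 23, "EXC_STORE_GUEST_PAGE_FAULT" },
};

static const char *lookup(const struct cause_name *t, size_t n, uint64_t code)
{
	assert(t != NULL);
	for (size_t i = 0; i < n; i++)
		if (t[i].code == code)
			return t[i].name;
	return "UNKNOWN";
}

/* Without the interrupt table, every interrupt exit would be UNKNOWN. */
static const char *exit_reason_name(uint64_t scause)
{
	if (scause & CAUSE_IRQ_FLAG)
		return lookup(irq_names, ARRAY_LEN(irq_names),
			      scause & ~CAUSE_IRQ_FLAG);
	return lookup(exc_names, ARRAY_LEN(exc_names), scause);
}
```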

RISC-V: KVM: Enable ring-based dirty memory tracking

Enable ring-based dirty memory tracking on riscv:

- Enable CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL as riscv is weakly
  ordered.
- Set KVM_DIRTY_LOG_PAGE_OFFSET for the ring buffer's physical page
  offset.
- Add a check to kvm_riscv_check_vcpu_requests for checking
  whether the dirty ring is soft full.

To handle vCPU requests that cause exits to userspace,
`kvm_riscv_check_vcpu_requests` is modified to return a value
(currently only 0 or 1).

Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20e116efb1f7aff211dd8e3cf8990c5521ed5f34.1749810735.git.zhouquan@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
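
CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL matters because, on a weakly ordered host, the userspace side of the dirty ring must read each entry's flags with acquire semantics and publish the RESET bit with release semantics. A sketch of that harvesting loop, using a local mirror of struct kvm_dirty_gfn and its flag bits (assumed to match <linux/kvm.h>); harvest() is a hypothetical name and the KVM_RESET_DIRTY_RINGS ioctl that follows harvesting is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the uapi struct kvm_dirty_gfn layout (assumed). */
struct dirty_gfn {
	uint32_t flags;
	uint32_t slot;
	uint64_t offset;
};

#define GFN_F_DIRTY 0x1u	/* assumed KVM_DIRTY_GFN_F_DIRTY */
#define GFN_F_RESET 0x2u	/* assumed KVM_DIRTY_GFN_F_RESET */

/*
 * Collect dirty entries starting at *fetch into out[] (room for size
 * entries). The acquire load orders the reads of slot/offset after the
 * DIRTY bit; the release store publishes RESET after we are done.
 */
static unsigned harvest(struct dirty_gfn *ring, unsigned size,
			unsigned *fetch, uint64_t *out)
{
	unsigned n = 0;

	assert(size > 0 && (size & (size - 1)) == 0);	/* power of two */
	while (n < size) {
		struct dirty_gfn *e = &ring[*fetch & (size - 1)];
		uint32_t flags = __atomic_load_n(&e->flags, __ATOMIC_ACQUIRE);

		if (!(flags & GFN_F_DIRTY) || (flags & GFN_F_RESET))
			break;
		out[n++] = e->offset;	/* slot ignored for brevity */
		__atomic_store_n(&e->flags, flags | GFN_F_RESET,
				 __ATOMIC_RELEASE);
		(*fetch)++;
	}
	return n;
}
```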

RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap

The Smnpm extension requires special handling because the guest ISA
extension maps to a different extension (Ssnpm) on the host side.
commit 1851e7836212 ("RISC-V: KVM: Allow Smnpm and Ssnpm extensions for
guests") missed that the vcpu->arch.isa bit is based only on the host
extension, so currently both KVM_RISCV_ISA_EXT_{SMNPM,SSNPM} map to
vcpu->arch.isa[RISCV_ISA_EXT_SSNPM]. This does not cause any problems
for the guest, because both extensions are force-enabled anyway when the
host supports Ssnpm, but prevents checking for (guest) Smnpm in the SBI
FWFT logic.

Redefine kvm_isa_ext_arr to look up the guest extension, since only the
guest -> host mapping is unambiguous. Factor out the logic for checking
for host support of an extension, so this special case only needs to be
handled in one place, and be explicit about which variables hold a host
vs a guest ISA extension.
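The guest -> host lookup described above can be sketched as a table
indexed by guest extension. This is a minimal illustration, not the
kernel code: the enums, `guest_to_host[]`, `host_has()`,
`guest_ext_available()` and `vcpu_enable_ext()` are hypothetical names,
and only three extensions are modelled. The special case (guest Smnpm
backed by host Ssnpm) lives in one table entry, and the per-vCPU bitmap
is indexed by guest extension so Smnpm and Ssnpm get distinct bits.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical extension IDs: guest-visible IDs vs host IDs. */
enum guest_ext { G_EXT_SSNPM, G_EXT_SMNPM, G_EXT_ZBB, G_EXT_MAX };
enum host_ext  { H_EXT_SSNPM, H_EXT_ZBB, H_EXT_MAX };

/* Indexed by guest extension: guest -> host is unambiguous, whereas
 * host Ssnpm backs both guest Ssnpm and guest Smnpm. */
static const enum host_ext guest_to_host[G_EXT_MAX] = {
	[G_EXT_SSNPM] = H_EXT_SSNPM,
	[G_EXT_SMNPM] = H_EXT_SSNPM,  /* the special case, handled once */
	[G_EXT_ZBB]   = H_EXT_ZBB,
};

static unsigned long host_isa;  /* hypothetical host capability bits */
static unsigned long vcpu_isa;  /* per-vCPU bits, indexed by guest ext */

static bool host_has(enum host_ext h)
{
	return host_isa & (1UL << h);
}

/* Factored-out check: can the host provide guest extension g? */
static bool guest_ext_available(enum guest_ext g)
{
	if ((size_t)g >= G_EXT_MAX)
		return false;
	return host_has(guest_to_host[g]);
}

static void vcpu_enable_ext(enum guest_ext g)
{
	if (guest_ext_available(g))
		vcpu_isa |= 1UL << g;
}
```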

Fixes: 1851e7836212 ("RISC-V: KVM: Allow Smnpm and Ssnpm extensions for guests")
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20250111004702.2813013-2-samuel.holland@sifive.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Delegate illegal instruction fault to VS mode

Delegate illegal instruction faults to VS mode by default, so that such
exceptions are no longer trapped to HS mode and redirected back to VS.

Delegating illegal instruction faults is particularly important for
guest applications that use vector instructions frequently. In such
cases, an illegal instruction fault is raised the first time a guest
user thread uses a vector instruction, after which the guest kernel
enables the thread to execute subsequent vector instructions.

The firmware PMU event counter is kept, so the guest can still query
illegal instruction events via an SBI call. The guest will simply see a
zero count for illegal instruction faults and know that 'firmware' has
delegated them.
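Delegation is controlled per exception cause by a bit in the hedeleg
CSR. The sketch below only computes such a mask; the cause numbers come
from the RISC-V privileged specification, while the exact set of other
delegated causes and the helper `delegated_to_vs()` are illustrative
assumptions rather than KVM's actual default value.

```c
#include <stdbool.h>

/* Exception cause codes from the RISC-V privileged specification. */
#define EXC_INST_MISALIGNED   0
#define EXC_INST_ILLEGAL      2
#define EXC_BREAKPOINT        3
#define EXC_SYSCALL           8   /* environment call from U-mode */
#define EXC_INST_PAGE_FAULT  12
#define EXC_LOAD_PAGE_FAULT  13
#define EXC_STORE_PAGE_FAULT 15

#define BIT(n) (1UL << (n))

/* Illustrative hedeleg value: bit n set means cause n raised in VS/VU
 * mode goes straight to the guest kernel instead of trapping to HS. */
static const unsigned long hedeleg_default =
	BIT(EXC_INST_MISALIGNED) | BIT(EXC_INST_ILLEGAL) |
	BIT(EXC_BREAKPOINT) | BIT(EXC_SYSCALL) |
	BIT(EXC_INST_PAGE_FAULT) | BIT(EXC_LOAD_PAGE_FAULT) |
	BIT(EXC_STORE_PAGE_FAULT);

static bool delegated_to_vs(unsigned long hedeleg, unsigned int cause)
{
	/* Guard the shift so out-of-range causes are simply not delegated. */
	return cause < 8 * sizeof(unsigned long) && (hedeleg & BIT(cause));
}
```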

Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Link: https://lore.kernel.org/r/20250714094554.89151-1-luxu.kernel@bytedance.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs

Currently, all kvm_riscv_hfence_xyz() APIs assume VMID to be the
host VMID of the Guest/VM, which restricts these APIs to host TLB
maintenance only. Let's allow passing VMID as a parameter to all
kvm_riscv_hfence_xyz() APIs so that they can be re-used for nested
virtualization related TLB maintenance.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com>
Link: https://lore.kernel.org/r/20250618113532.471448-13-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Factor-out g-stage page table management

The upcoming nested virtualization support can share g-stage page table
management with the current host g-stage implementation, so factor out
g-stage page table management into separate sources and use the
"kvm_riscv_mmu_" prefix for host g-stage functions.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com>
Link: https://lore.kernel.org/r/20250618113532.471448-12-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence

Currently, struct kvm_riscv_hfence does not have a vmid field, and the
various hfence processing functions always pick the VMID assigned to
the guest/VM. This prevents hfence operations on an arbitrary VMID, so
add a vmid field to struct kvm_riscv_hfence and use it wherever
applicable.
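Why a per-request VMID matters can be shown with a toy TLB whose
entries are tagged by VMID. This is a hypothetical model, not KVM code:
`struct tlb_entry`, `struct hfence` and `hfence_process()` are
illustrative names, and the "flush" just clears valid bits. A request
carrying its own VMID can target a nested guest's translations without
touching those of the VM's own VMID.

```c
#include <stdbool.h>
#include <stddef.h>

#define TLB_ENTRIES 4

/* Toy TLB: each cached translation is tagged with its owning VMID. */
struct tlb_entry { bool valid; unsigned long vmid; unsigned long gpa; };
static struct tlb_entry tlb[TLB_ENTRIES];

/* Hypothetical hfence request with an explicit VMID field. */
struct hfence {
	unsigned long vmid;
	unsigned long addr;   /* start guest physical address */
	unsigned long size;   /* length in bytes */
};

/* Invalidate entries of the request's VMID within [addr, addr + size). */
static void hfence_process(const struct hfence *f)
{
	for (size_t i = 0; i < TLB_ENTRIES; i++) {
		struct tlb_entry *e = &tlb[i];

		if (e->valid && e->vmid == f->vmid &&
		    e->gpa >= f->addr && e->gpa - f->addr < f->size)
			e->valid = false;
	}
}
```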

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com>
Link: https://lore.kernel.org/r/20250618113532.471448-11-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Introduce struct kvm_gstage_mapping

Introduce struct kvm_gstage_mapping which represents a g-stage
mapping at a particular g-stage page table level. Also, update
the kvm_riscv_gstage_map() to return the g-stage mapping upon
success.
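A mapping "at a particular g-stage page table level" pairs an address
and PTE with the level of the leaf, and the level determines the page
size. The sketch below assumes 4 KiB base pages and 512-entry tables
(as in Sv39x4), so level 0 = 4 KiB, 1 = 2 MiB, 2 = 1 GiB; the struct
layout and the helpers `gstage_map()` and `page_size_to_level()` are
hypothetical stand-ins that skip the actual table walk.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define LEVEL_BITS  9   /* 512 PTEs per table */
#define MAX_LEVEL   2   /* three levels, as in Sv39x4 */

/* Hypothetical mirror of a g-stage mapping record. */
struct gstage_mapping {
	uint64_t addr;    /* guest physical address mapped */
	uint64_t pte;     /* leaf PTE value written */
	uint32_t level;   /* level of the leaf PTE */
};

static uint64_t level_to_page_size(uint32_t level)
{
	return 1ULL << (PAGE_SHIFT + LEVEL_BITS * level);
}

static int page_size_to_level(uint64_t page_size, uint32_t *level)
{
	for (uint32_t l = 0; l <= MAX_LEVEL; l++) {
		if (level_to_page_size(l) == page_size) {
			*level = l;
			return 0;
		}
	}
	return -1;   /* not a supported page size */
}

/* Returns 0 and fills *out on success, -1 on bad size or alignment. */
static int gstage_map(uint64_t gpa, uint64_t page_size, uint64_t pte,
		      struct gstage_mapping *out)
{
	uint32_t level;

	if (page_size_to_level(page_size, &level) || (gpa & (page_size - 1)))
		return -1;
	/* A real implementation would walk/allocate tables and set the PTE. */
	out->addr = gpa;
	out->pte = pte;
	out->level = level;
	return 0;
}
```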

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com>
Link: https://lore.kernel.org/r/20250618113532.471448-10-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Factor-out MMU related declarations into separate headers

The MMU, TLB, and VMID management for KVM RISC-V already exists as
separate sources, so create separate headers along the same lines. This
further simplifies the asm/kvm_host.h header.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com>
Link: https://lore.kernel.org/r/20250618113532.471448-9-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect()

The H-extension CSRs accessed by kvm_riscv_vcpu_trap_redirect() will
trap when KVM RISC-V is running as a Guest/VM, so avoid these traps by
using ncsr_xyz() instead of csr_xyz().

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com>
Link: https://lore.kernel.org/r/20250618113532.471448-8-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range()

The kvm_arch_flush_remote_tlbs_range() expected by KVM core can be
easily implemented for RISC-V using kvm_riscv_hfence_gvma_vmid_gpa()
hence provide it.

Also with kvm_arch_flush_remote_tlbs_range() available for RISC-V, the
mmu_wp_memory_region() can happily use kvm_flush_remote_tlbs_memslot()
instead of kvm_flush_remote_tlbs().
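The range flush reduces to simple arithmetic: a starting guest frame
number and a page count become a guest physical address and a byte
length for a GPA-range fence. The sketch assumes 4 KiB pages; the
functions `hfence_gvma_gpa()`, `flush_remote_tlbs_range()` and
`fences_needed()` are hypothetical stand-ins that only record or count
what would be issued, not the kernel's implementations.

```c
#include <stdint.h>

#define PAGE_SHIFT 12

/* Hypothetical record of the last GPA-range fence requested. */
struct gpa_fence { uint64_t gpa; uint64_t size; unsigned int order; };
static struct gpa_fence last_fence;

static void hfence_gvma_gpa(uint64_t gpa, uint64_t size, unsigned int order)
{
	last_fence = (struct gpa_fence){ gpa, size, order };
}

/* Range flush expressed via the GPA fence: frames -> bytes. */
static int flush_remote_tlbs_range(uint64_t gfn, uint64_t nr_pages)
{
	hfence_gvma_gpa(gfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT, PAGE_SHIFT);
	return 0;
}

/* Per-page fences needed to cover size bytes at 1 << order granularity. */
static uint64_t fences_needed(uint64_t size, unsigned int order)
{
	return (size + (1ULL << order) - 1) >> order;
}
```

For a memslot, the same call would be made with the slot's base frame
and page count, which is why a write-protect pass over one slot no
longer needs a full-VM flush.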

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com>
Link: https://lore.kernel.org/r/20250618113532.471448-7-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Don't flush TLB when PTE is unchanged

The gstage_set_pte() and gstage_op_pte() should flush the TLB only when
a leaf PTE actually changes, so that unnecessary TLB flushes are avoided.
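The rule can be sketched as "skip the write and the flush when the PTE
value is unchanged". This is a conservative user-space variant, not the
kernel's code: it flushes when either the old or the new value is a
leaf, the V/R/W/X bit positions follow the RISC-V PTE layout, and the
flush is simulated by a counter in the hypothetical `set_pte()` helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* RISC-V PTE permission bits (low bits of a PTE). */
#define PTE_V (1ULL << 0)
#define PTE_R (1ULL << 1)
#define PTE_W (1ULL << 2)
#define PTE_X (1ULL << 3)

static unsigned int tlb_flushes;   /* counts simulated TLB flushes */

/* A valid PTE with any of R/W/X set is a leaf; V alone points to a table. */
static bool pte_is_leaf(uint64_t pte)
{
	return (pte & PTE_V) && (pte & (PTE_R | PTE_W | PTE_X));
}

static void set_pte(uint64_t *ptep, uint64_t new_pte)
{
	uint64_t old = *ptep;

	if (old == new_pte)
		return;              /* unchanged: no write, no flush */
	*ptep = new_pte;
	if (pte_is_leaf(old) || pte_is_leaf(new_pte))
		tlb_flushes++;       /* stands in for an HFENCE */
}
```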

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com>
Link: https://lore.kernel.org/r/20250618113532.471448-6-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH

The KVM_REQ_HFENCE_GVMA_VMID_ALL is the same as KVM_REQ_TLB_FLUSH, so
to avoid confusion let's replace KVM_REQ_HFENCE_GVMA_VMID_ALL with
KVM_REQ_TLB_FLUSH. Also, rename kvm_riscv_hfence_gvma_vmid_all_process()
to kvm_riscv_tlb_flush_process().

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com>
Link: https://lore.kernel.org/r/20250618113532.471448-5-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize()

The kvm_riscv_local_tlb_sanitize() sanitizes the TLB mappings related
to the current VMID when a VCPU is moved from one host CPU to another.

Let's move kvm_riscv_local_tlb_sanitize() to VMID management
sources and rename it to kvm_riscv_gstage_vmid_sanitize().
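The idea behind the sanitize step can be modelled in a few lines:
remember which host CPU a vCPU last ran on and, when that changes,
flush the local TLB entries for the VM's VMID. When all vCPUs of a VM
share one VMID, the new CPU's TLB may hold stale entries left by
another vCPU of the same VM. The struct, `vmid_sanitize()` and
`local_flush_vmid()` below are hypothetical; the flush is simulated.

```c
/* Hypothetical per-vCPU state. */
struct vcpu {
	int last_host_cpu;      /* -1 if the vCPU has not run yet */
	unsigned long vmid;     /* VMID shared by all vCPUs of the VM */
};

static unsigned int local_flushes;       /* simulated flush count */
static unsigned long last_flushed_vmid;  /* VMID of the last flush */

static void local_flush_vmid(unsigned long vmid)
{
	last_flushed_vmid = vmid;
	local_flushes++;
}

/* Called before the vCPU enters the guest on host_cpu. */
static void vmid_sanitize(struct vcpu *v, int host_cpu)
{
	if (v->last_host_cpu == host_cpu)
		return;                  /* same CPU: nothing stale to drop */
	local_flush_vmid(v->vmid);
	v->last_host_cpu = host_cpu;
}
```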

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com>
Link: https://lore.kernel.org/r/20250618113532.471448-4-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init()

The kvm_riscv_vcpu_aia_init() never fails, so drop its return value,
which is always zero.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com>
Link: https://lore.kernel.org/r/20250618113532.471448-3-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value

The kvm_riscv_vcpu_alloc_vector_context() returns an error code on
failure, so don't ignore it in kvm_arch_vcpu_create().

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com>
Link: https://lore.kernel.org/r/20250618113532.471448-2-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

Merge tag 'pull-rpc_pipefs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull rpc_pipefs updates from Al Viro:
"Massage rpc_pipefs to use saner primitives and clean up the APIs
  provided to the rest of the kernel"

* tag 'pull-rpc_pipefs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  rpc_create_client_dir(): return 0 or -E...
  rpc_create_client_dir(): don't bother with rpc_populate()
  rpc_new_dir(): the last argument is always NULL
  rpc_pipe: expand the calls of rpc_mkdir_populate()
  rpc_gssd_dummy_populate(): don't bother with rpc_populate()
  rpc_mkpipe_dentry(): switch to simple_start_creating()
  rpc_pipe: saner primitive for creating regular files
  rpc_pipe: saner primitive for creating subdirectories
  rpc_pipe: don't overdo directory locking
  rpc_mkpipe_dentry(): saner calling conventions
  rpc_unlink(): saner calling conventions
  rpc_populate(): lift cleanup into callers
  rpc_unlink(): use simple_recursive_removal()
  rpc_{rmdir_,}depopulate(): use simple_recursive_removal() instead
  rpc_pipe: clean failure exits in fill_super
  new helper: simple_start_creating()

Merge tag 'pull-simple_recursive_removal' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull simple_recursive_removal() update from Al Viro:
"Removing subtrees of kernel filesystems is done in quite a few places;
  unfortunately, it's easy to get wrong. A number of open-coded attempts
  are out there, with varying amount of bogosities.

  simple_recursive_removal() had been introduced for doing that with all
  precautions needed; it does an equivalent of rm -rf, with sufficient
  locking, eviction of anything mounted on top of the subtree, etc.

  This series converts a bunch of open-coded instances to using that"

* tag 'pull-simple_recursive_removal' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  functionfs, gadgetfs: use simple_recursive_removal()
  kill binderfs_remove_file()
  fuse_ctl: use simple_recursive_removal()
  pstore: switch to locked_recursive_removal()
  binfmt_misc: switch to locked_recursive_removal()
  spufs: switch to locked_recursive_removal()
  add locked_recursive_removal()
  better lockdep annotations for simple_recursive_removal()
  simple_recursive_removal(): saner interaction with fsnotify

Merge tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull dentry d_flags updates from Al Viro:
"The current exclusion rules for dentry->d_flags stores are rather
  unpleasant. The basic rules are simple:

   - stores to dentry->d_flags are OK under dentry->d_lock

   - stores to dentry->d_flags are OK in the dentry constructor, before
     it becomes potentially visible to other threads

  Unfortunately, there are a couple of exceptions to that, and that's
  where the headache comes from.

  The main PITA comes from d_set_d_op(); that primitive sets ->d_op of
  dentry and adjusts the flags that correspond to presence of individual
  methods. It's very easy to misuse; existing uses _are_ safe, but proof
  of correctness is brittle.

  Use in __d_alloc() is safe (we are within a constructor), but we might
  as well precalculate the initial value of 'd_flags' when we set the
  default ->d_op for given superblock and set 'd_flags' directly instead
  of messing with that helper.

  The reasons why other uses are safe are bloody convoluted; I'm not
  going to reproduce them here. See [1] for gory details, if you care. The
  critical part is using d_set_d_op() only just prior to
  d_splice_alias(), which makes a combination of d_splice_alias() with
  setting ->d_op, etc. a natural replacement primitive.

  Better yet, if we go that way, it's easy to take setting ->d_op and
  modifying 'd_flags' under ->d_lock, which eliminates the headache as
  far as 'd_flags' exclusion rules are concerned. Other exceptions are
  minor and easy to deal with.

  What this series does:

   - d_set_d_op() is no longer available; instead a new primitive
     (d_splice_alias_ops()) is provided, equivalent to combination of
     d_set_d_op() and d_splice_alias().

   - new field of struct super_block - 's_d_flags'. This sets the
     default value of 'd_flags' to be used when allocating dentries on
     this filesystem.

   - new primitive for setting 's_d_op': set_default_d_op(). This
     replaces stores to 's_d_op' at mount time.

     All in-tree filesystems converted; out-of-tree ones will get caught
     by the compiler ('s_d_op' is renamed, so stores to it will be
     caught). 's_d_flags' is set by the same primitive to match the
     's_d_op'.

   - a lot of filesystems had sb->s_d_op->d_delete equal to
     always_delete_dentry; that is equivalent to setting
     DCACHE_DONTCACHE in 'd_flags', so such filesystems can bloody well
     set that bit in 's_d_flags' and drop 'd_delete()' from
     dentry_operations.

     In quite a few cases that results in empty dentry_operations, which
     means that we can get rid of those.

   - kill simple_dentry_operations - not needed anymore

   - massage d_alloc_parallel() to get rid of the other exception wrt
     'd_flags' stores - we can set DCACHE_PAR_LOOKUP as soon as we
     allocate the new dentry; no need to delay that until we commit to
     using the sucker.

  As the result, 'd_flags' stores are all either under ->d_lock or done
  before the dentry becomes visible in any shared data structures"
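The "precalculate d_flags at mount time" approach can be sketched as
follows. Everything here is a simplified, hypothetical model: the
`F_*` bits are not the kernel's DCACHE_* values, and `struct dops`,
`sb_set_default_ops()` and `dentry_init()` only stand in for
dentry_operations, set_default_d_op() and __d_alloc(). The flags
matching the present methods (plus an extra "don't cache" bit) are
computed once per superblock and copied into each dentry while it is
still private to the allocating thread.

```c
#include <stddef.h>

/* Hypothetical flag bits; NOT the kernel's DCACHE_* values. */
#define F_OP_HASH       0x01u
#define F_OP_COMPARE    0x02u
#define F_OP_REVALIDATE 0x04u
#define F_OP_DELETE     0x08u
#define F_DONTCACHE     0x10u

/* Hypothetical subset of dentry_operations: only presence matters. */
struct dops {
	int (*d_hash)(void);
	int (*d_compare)(void);
	int (*d_revalidate)(void);
	int (*d_delete)(void);
};

struct sb     { const struct dops *s_d_op; unsigned int s_d_flags; };
struct dentry { const struct dops *d_op;   unsigned int d_flags; };

/* Map each method that is present to its flag bit. */
static unsigned int d_op_flags(const struct dops *op)
{
	unsigned int flags = 0;

	if (!op)
		return 0;
	if (op->d_hash)
		flags |= F_OP_HASH;
	if (op->d_compare)
		flags |= F_OP_COMPARE;
	if (op->d_revalidate)
		flags |= F_OP_REVALIDATE;
	if (op->d_delete)
		flags |= F_OP_DELETE;
	return flags;
}

/* Mount time: store the default ops and the matching flags together. */
static void sb_set_default_ops(struct sb *sb, const struct dops *op,
			       unsigned int extra)
{
	sb->s_d_op = op;
	sb->s_d_flags = d_op_flags(op) | extra;
}

/* Constructor: not yet visible to other threads, so plain stores. */
static void dentry_init(struct dentry *d, const struct sb *sb)
{
	d->d_op = sb->s_d_op;
	d->d_flags = sb->s_d_flags;
}
```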

Link: https://lore.kernel.org/all/20250224010624.GT1977892@ZenIV/
* tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (21 commits)
  configfs: use DCACHE_DONTCACHE
  debugfs: use DCACHE_DONTCACHE
  efivarfs: use DCACHE_DONTCACHE instead of always_delete_dentry()
  9p: don't bother with always_delete_dentry
  ramfs, hugetlbfs, mqueue: set DCACHE_DONTCACHE
  kill simple_dentry_operations
  devpts, sunrpc, hostfs: don't bother with ->d_op
  shmem: no dentry retention past the refcount reaching zero
  d_alloc_parallel(): set DCACHE_PAR_LOOKUP earlier
  make d_set_d_op() static
  simple_lookup(): just set DCACHE_DONTCACHE
  tracefs: Add d_delete to remove negative dentries
  set_default_d_op(): calculate the matching value for ->d_flags
  correct the set of flags forbidden at d_set_d_op() time
  split d_flags calculation out of d_set_d_op()
  new helper: set_default_d_op()
  fuse: no need for special dentry_operations for root dentry
  switch procfs from d_set_d_op() to d_splice_alias_ops()
  new helper: d_splice_alias_ops()
  procfs: kill ->proc_dops
  ...

Merge tag 'pull-headers_param' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull asm/param cleanup from Al Viro:
"This massages asm/param.h to simpler and more uniform shape:

   - all arch/*/include/uapi/asm/param.h are either generated includes
     of <asm-generic/param.h> or a #define or two followed by such
     include

   - no arch/*/include/asm/param.h anywhere, generated or not

   - include <asm/param.h> resolves to arch/*/include/uapi/asm/param.h
     of the architecture in question (or that of host in case of uml)

   - include/asm-generic/param.h pulls uapi/asm-generic/param.h and
     deals with USER_HZ, CLOCKS_PER_SEC and with HZ redefinition after
     that"
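The layering described above (a uapi default for HZ, then a
kernel-side redefinition plus USER_HZ and CLOCKS_PER_SEC) can be
imitated in a single C file. This is a sketch of the pattern, not the
real headers: CONFIG_HZ = 250 is an assumed configuration value, and
`jiffies_to_user_ticks()` is a hypothetical helper showing why
interfaces that report ticks to userspace use USER_HZ rather than HZ.

```c
/* "uapi" part: provide a default only if nothing defined HZ already. */
#ifndef HZ
#define HZ 100
#endif

/* "kernel" part: HZ becomes the configured tick rate, while
 * user-visible tick counts keep a fixed USER_HZ. */
#define CONFIG_HZ 250          /* hypothetical kernel configuration */
#undef HZ
#define HZ             CONFIG_HZ
#define USER_HZ        100
#define CLOCKS_PER_SEC (USER_HZ)

/* Convert internal jiffies to user-visible ticks (overflow ignored). */
static unsigned long long jiffies_to_user_ticks(unsigned long long j)
{
	return j * USER_HZ / HZ;
}
```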

* tag 'pull-headers_param' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  loongarch, um, xtensa: get rid of generated arch/$ARCH/include/asm/param.h
  alpha: regularize the situation with asm/param.h
  xtensa: get rid uapi/asm/param.h

Merge tag 'nfsd-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
"NFSD is finally able to offer write delegations to clients that open
  files with O_WRONLY, thanks to patches from Dai Ngo. We're expecting
  this to accelerate a few interesting corner cases.

  The cap on the number of operations per NFSv4 COMPOUND has been
  lifted. Now, clients that send COMPOUNDs containing dozens of
  operations (for example, a long stream of LOOKUP operations to walk a
  pathname in a single round trip) will no longer be rejected.

  This release re-enables the ability for NFSD to perform NFSv4.2 COPY
  operations asynchronously. This feature had been disabled to mitigate
  the risk of denial-of-service when too many such requests arrive.

  Many thanks to the contributors, reviewers, testers, and bug reporters
  who participated during the v6.17 development cycle"

* tag 'nfsd-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (32 commits)
  nfsd: Drop dprintk in blocklayout xdr functions
  sunrpc: make svc_tcp_sendmsg() take a signed sentp pointer
  sunrpc: rearrange struct svc_rqst for fewer cachelines
  sunrpc: return better error in svcauth_gss_accept() on alloc failure
  sunrpc: reset rq_accept_statp when starting a new RPC
  sunrpc: remove SVC_SYSERR
  sunrpc: fix handling of unknown auth status codes
  NFSD: Simplify struct knfsd_fh
  NFSD: Access a knfsd_fh's fsid by pointer
  Revert "NFSD: Force all NFSv4.2 COPY requests to be synchronous"
  NFSD: Avoid multiple -Wflex-array-member-not-at-end warnings
  NFSD: Use vfs_iocb_iter_write()
  NFSD: Use vfs_iocb_iter_read()
  NFSD: Clean up kdoc for nfsd_open_local_fh()
  NFSD: Clean up kdoc for nfsd_file_put_local()
  NFSD: Remove definition for trace_nfsd_ctl_maxconn
  NFSD: Remove definition for trace_nfsd_file_gc_recent
  NFSD: Remove definitions for unused trace_nfsd_file_lru trace points
  NFSD: Remove definition for trace_nfsd_file_unhash_and_queue
  nfsd: Use correct error code when decoding extents
  ...

Merge tag 'gfs2-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

- Prevent cluster nodes from trying to recover their own filesystems
   during a withdraw

- Add two missing migrate_folio aops and an additional exhash directory
   consistency check (both triggered by syzbot bug reports)

- Sanitize how dlm results are processed and clean up a few quirks in
   the glock code

- Minor stuff: Get rid of the GIF_ALLOC_FAILED flag; use SECTOR_SIZE
   and SECTOR_SHIFT

* tag 'gfs2-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: No more self recovery
  gfs2: Validate i_depth for exhash directories
  gfs2: Set .migrate_folio in gfs2_{rgrp,meta}_aops
  gfs2: a minor finish_xmote cleanup
  gfs2: simplify finish_xmote
  gfs2: sanitize the gdlm_ast -> finish_xmote interface
  gfs2: Minor do_xmote cancelation fix
  gfs2: Remove GIF_ALLOC_FAILED flag
  gfs2: Use SECTOR_SIZE and SECTOR_SHIFT

Merge tag 'xfs-merge-6.17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Carlos Maiolino:
"This doesn't contain any new features. It mostly is a collection of
  clean ups and code refactoring that I preferred to postpone to the
  merge window.

  It includes removal of several unused tracepoints, refactoring of the
  key comparison routines in the B-tree code, and cleanup of the xfs
  journaling code"

* tag 'xfs-merge-6.17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (44 commits)
  xfs: don't use a xfs_log_iovec for ri_buf in log recovery
  xfs: don't use a xfs_log_iovec for attr_item names and values
  xfs: use better names for size members in xfs_log_vec
  xfs: cleanup the ordered item logic in xlog_cil_insert_format_items
  xfs: don't pass the old lv to xfs_cil_prepare_item
  xfs: remove unused trace event xfs_reflink_cow_enospc
  xfs: remove unused trace event xfs_discard_rtrelax
  xfs: remove unused trace event xfs_log_cil_return
  xfs: remove unused trace event xfs_dqreclaim_dirty
  fs/xfs: replace strncpy with memtostr_pad()
  xfs: Remove unused label in xfs_dax_notify_dev_failure
  xfs: improve the comments in xfs_select_zone_nowait
  xfs: improve the comments in xfs_max_open_zones
  xfs: stop passing an inode to the zone space reservation helpers
  xfs: rename oz_write_pointer to oz_allocated
  xfs: use a uint32_t to cache i_used_blocks in xfs_init_zone
  xfs: improve the xg_active_ref check in xfs_group_free
  xfs: remove the xlog_ticket_t typedef
  xfs: remove xrep_trans_{alloc,cancel}_hook_dummy
  xfs: return the allocated transaction from xchk_trans_alloc_empty
  ...

Merge tag 'erofs-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs updates from Gao Xiang:
"We now support metadata compression. It can be useful for embedded use
  cases or archiving a large number of small files.

  Additionally, readdir performance has been improved by enabling
  readahead (note that it was already common practice for ext3/4 non-dx
  and f2fs directories). We may consider further improvements later to
  align with ext4's s_inode_readahead_blks behavior for slow devices
  too.

  The remaining commits are minor.

  Summary:

   - Add support for metadata compression

   - Enable readahead for directories to improve readdir performance

   - Minor fixes and cleanups"

* tag 'erofs-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: support to readahead dirent blocks in erofs_readdir()
  erofs: implement metadata compression
  erofs: add on-disk definition for metadata compression
  erofs: fix build error with CONFIG_EROFS_FS_ZIP_ACCEL=y
  erofs: remove ENOATTR definition
  erofs: refine erofs_iomap_begin()
  erofs: unify meta buffers in z_erofs_fill_inode()
  erofs: remove need_kmap in erofs_read_metabuf()
  erofs: do sanity check on m->type in z_erofs_load_compact_lcluster()
  erofs: get rid of {get,put}_page() for ztailpacking data

Merge tag 'ntfs3_for_6.17' of https://github.com/Paragon-Software-Group/linux-ntfs3

Pull ntfs3 updates from Konstantin Komarov:
"Added:
   - sanity check for file name
   - mark live inode as bad and avoid any operations

  Fixed:
   - handling of symlinks created in Windows
   - creation of symlinks for relative path

  Changed:
   - cancel setting inode as bad after removing name fails
   - revert 'replace inode_trylock with inode_lock'"

* tag 'ntfs3_for_6.17' of https://github.com/Paragon-Software-Group/linux-ntfs3:
  Revert "fs/ntfs3: Replace inode_trylock with inode_lock"
  fs/ntfs3: Exclude call make_bad_inode for live nodes.
  fs/ntfs3: cancle set bad inode after removing name fails
  fs/ntfs3: Add sanity check for file name
  fs/ntfs3: correctly create symlink for relative path
  fs/ntfs3: fix symlinks cannot be handled correctly

Merge tag 'for-6.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
"A number of usability and feature updates, scattered performance
  improvements and fixes. Highlight of the core changes is getting
  closer to enabling large folios (now behind a config option).

  User visible changes:

   - update defrag ioctl, add new flag to request no compression on
     existing extents

   - restrict writes to block devices after mount

   - in experimental config, enable large folios for data, almost
     complete but not widely tested

   - add stats tracking duration of critical section in transaction
     commit to /sys/fs/btrfs/FSID/commit_stats

  Performance improvements:

   - caching of lookup results of free space bitmap (20% runtime
     improvement on an empty file creation benchmark)

   - accessors to metadata (b-tree items) simplified and optimized,
     minor improvement in metadata-heavy workloads

   - readahead on compressed data improves sequential read

   - the xarray for extent buffers is indexed by denser keys, leading to
     better packing of the nodes (50-70% reduction of leaf nodes)

  Notable fixes:

   - stricter compression mount option parsing

   - send properly emits fallocate command for file holes when protocol
     v2 is used

   - fix overallocation of chunks with mount option 'ssd_spread', due to
     interaction with size classes not finding the right chunk
     (workaround: manual reclaim by 'usage' balance filter)

   - various quota enable/disable races with rescan, more verbose
     notifications about inconsistent state

   - populate otime in tree-log during log replay

   - handle ENOSPC when NOCOW file is used with mmap()

  Core:

   - large data folios enabled in experimental config

   - improved error handling, transaction abort call sites

   - in zoned mode, allocate reloc block group on mount to make sure
     there's always one available for zone reclaim under heavy load

   - rework device opening: devices are always opened read-only, and
     opening is delayed until the super block is created, allowing the
     restricted writes after mount

   - preparatory work for adding blk_holder_ops, allowing device
     freeze/thaw in the future

  Cleanups, refactoring:

   - type and naming unifications (int/bool, return variables)

   - rb-tree helper refactoring and simplifications

   - reorder memory allocations to less critical places

   - RCU string (used for device name) refactoring and API removal

   - replace all remaining uses of strcpy()"

* tag 'for-6.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (209 commits)
  btrfs: send: use fallocate for hole punching with send stream v2
  btrfs: unfold transaction aborts when writing dirty block groups
  btrfs: use saner variable type and name to indicate extrefs at add_inode_ref()
  btrfs: don't skip remaining extrefs if dir not found during log replay
  btrfs: don't ignore inode missing when replaying log tree
  btrfs: enable large data folios for data reloc inode
  btrfs: output more info when btrfs_subpage_assert() failed
  btrfs: reloc: unconditionally invalidate the page cache for each cluster
  btrfs: defrag: add flag to force no-compression
  btrfs: fix ssd_spread overallocation
  btrfs: zoned: requeue to unused block group list if zone finish failed
  btrfs: zoned: do not remove unwritten non-data block group
  btrfs: remove btrfs_clear_extent_bits()
  btrfs: use cached state when falling back from NOCoW write to CoW write
  btrfs: set EXTENT_NORESERVE before range unlock in btrfs_truncate_block()
  btrfs: don't print relocation messages from auto reclaim
  btrfs: remove redundant auto reclaim log message
  btrfs: make btrfs_check_nocow_lock() check more than one extent
  btrfs: assert we can NOCOW the range in btrfs_truncate_block()
  btrfs: update function comment for btrfs_check_nocow_lock()
  ...

KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list

VDISR_EL2 and VSESR_EL2 are now visible to userspace for nested VMs. Add
them to get-reg-list.

Link: https://lore.kernel.org/r/20250728152603.2823699-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

Merge branch 'kvm-arm64/vgic-v4-ctl' into kvmarm/next

* kvm-arm64/vgic-v4-ctl:
  : Userspace control of nASSGIcap, courtesy of Raghavendra Rao Ananta
  :
  : Allow userspace to decide if support for SGIs without an active state is
  : advertised to the guest, allowing VMs from GICv3-only hardware to be
  : migrated to GICv4.1-capable machines.
  Documentation: KVM: arm64: Describe VGICv3 registers writable pre-init
  KVM: arm64: selftests: Add test for nASSGIcap attribute
  KVM: arm64: vgic-v3: Allow userspace to write GICD_TYPER2.nASSGIcap
  KVM: arm64: vgic-v3: Allow access to GICD_IIDR prior to initialization
  KVM: arm64: vgic-v3: Consolidate MAINT_IRQ handling
  KVM: arm64: Disambiguate support for vSGIs v. vLPIs

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

Merge branch 'kvm-arm64/el2-reg-visibility' into kvmarm/next

* kvm-arm64/el2-reg-visibility:
  : Fixes to EL2 register visibility, courtesy of Marc Zyngier
  :
  :  - Expose EL2 VGICv3 registers via the VGIC attributes accessor, not the
  :    KVM_{GET,SET}_ONE_REG ioctls
  :
  :  - Condition visibility of FGT registers on the presence of FEAT_FGT in
  :    the VM
  KVM: arm64: selftest: vgic-v3: Add basic GICv3 sysreg userspace access test
  KVM: arm64: Enforce the sorting of the GICv3 system register table
  KVM: arm64: Clarify the check for reset callback in check_sysreg_table()
  KVM: arm64: vgic-v3: Fix ordering of ICH_HCR_EL2
  KVM: arm64: Document registers exposed via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS
  KVM: arm64: selftests: get-reg-list: Add base EL2 registers
  KVM: arm64: selftests: get-reg-list: Simplify feature dependency
  KVM: arm64: Advertise FGT2 registers to userspace
  KVM: arm64: Condition FGT registers on feature availability
  KVM: arm64: Expose GICv3 EL2 registers via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS
  KVM: arm64: Let GICv3 save/restore honor visibility attribute
  KVM: arm64: Define helper for ICH_VTR_EL2
  KVM: arm64: Define constant value for ICC_SRE_EL2
  KVM: arm64: Don't advertise ICH_*_EL2 registers through GET_ONE_REG
  KVM: arm64: Make RVBAR_EL2 accesses UNDEF

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>