block: Fix default IO priority if there is no IO context
Upstream commit 53889bcaf536 ("block: make __get_task_ioprio() easier to
read") changes the IO priority returned to the caller if no IO context
is defined for the task. Prior to this commit, the returned IO priority
was determined by task_nice_ioclass() and task_nice_ioprio(). Now it is
always IOPRIO_DEFAULT, which translates to IOPRIO_CLASS_NONE with priority
0. However, task_nice_ioclass() returns IOPRIO_CLASS_IDLE, IOPRIO_CLASS_RT,
or IOPRIO_CLASS_BE depending on the task scheduling policy, and
task_nice_ioprio() returns a value determined by task_nice(). This causes
regressions in test code checking the IO priority and class of IO
operations on tasks with no IO context.
Fix the problem by returning the IO priority calculated from
task_nice_ioclass() and task_nice_ioprio() if no IO context is defined
to match earlier behavior.
Fixes: 53889bcaf536 ("block: make __get_task_ioprio() easier to read") Cc: Jens Axboe <axboe@kernel.dk> Cc: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20250731044953.1852690-1-linux@roeck-us.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
- support for Wake-on-touch in intel-thc (Even Xu)
- support for "Input max input size control" and "Input interrupt delay"
I2C features in order to improve compatibility of THC devices with
legacy HIDI2C touch devices (Even Xu)
Merge tag 'mtd/for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull mtd updates from Miquel Raynal:
"MTD changes:
- Apart from a binding conversion to yaml, only minor changes/small
fixes have been merged.
Raw NAND changes:
- Minor fixes for various controller drivers like DMA mapping checks,
better timing derivations or bitflip statistics.
- some Hynix NAND flashes were not supporting read-retries, so don't
even try to do it
SPI NAND changes:
- In order to support high-speed modes, certain chips need extra
configuration like adding more dummy cycles. This is now possible,
especially on Winbond chips.
- Aside from that, Gigadevice gets support for a new chip (GD5F1GM9).
SPI NOR changes:
- A notable changes is the fix for exiting 4-byte addressing on
Infineon SEMPER flashes. These flashes do not support the standard
EX4B opcode (E9h), and use a vendor-specific opcode (B8h) instead.
- There is also a fix for unlocking flashes that are write-protected
at power-on. This was caused by using an uninitialized mtd_info in
spi_nor_try_unlock_all()"
* tag 'mtd/for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (26 commits)
mtd: spinand: winbond: Add comment about the maximum frequency
mtd: spinand: winbond: Enable high-speed modes on w35n0xjw
mtd: spinand: winbond: Enable high-speed modes on w25n0xjw
mtd: spinand: Add a ->configure_chip() hook
mtd: spinand: Add a frequency field to all READ_FROM_CACHE variants
mtd: spinand: Fix macro alignment
spi: spi-mem: Take into account the actual maximum frequency
spi: spi-mem: Use picoseconds for calculating the op durations
mtd: rawnand: atmel: set pmecc data setup time
mtd: spinand: propagate spinand_wait() errors from spinand_write_page()
mtd: rawnand: fsmc: Add missing check after DMA map
mtd: rawnand: rockchip: Add missing check after DMA map
mtd: rawnand: hynix: don't try read-retry on SLC NANDs
mtd: rawnand: atmel: Fix dma_mapping_error() address
mtd: nand: brcmnand: fix mtd corrected bits stat
mtd: rawnand: renesas: Add missing check after DMA map
mtd: spinand: gigadevice: Add support for GD5F1GM9 chips
mtd: nand: brcmnand: replace manual string choices with standard helpers
mtd: map: Don't use "proxy" headers
mtd: spi-nor: Fix spi_nor_try_unlock_all()
...
- fix for potential NULL pointer dereference in hid-apple that could be caused
by malicious device with APPLE_MAGIC_BACKLIGHT quirk present triggering
overflow in data field (Qasim Ijaz)
- third party trackpart support for MacBookPro15,1 (Aditya Garg)
- Apple Magic Keyboard A311[89] USB-C support (Aditya Garg, Grigorii Sokolik)
Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
"This is the usual collection of primarily clk driver updates.
The big part of the diff is all the new Qualcomm clk drivers added for
a few SoCs they're working on. The other two vendors with significant
work this cycle are Renesas and Amlogic. Renesas adds a bunch of clks
to existing drivers and supports some new SoCs while Amlogic is
starting a significant refactoring to simplify their code.
The core framework gained a pair of helpers to get the 'struct device'
or 'struct device_node' associated with a 'struct clk_hw'. Some
associated KUnit tests were added for these simple helpers as well.
Beyond that core change there are lots of little fixes throughout the
clk drivers for the stuff we see every day, wrong clk driver data that
affects tree topology or supported frequencies, etc. They're not found
until the clks are actually used by some consumer device driver.
New Drivers:
- Global, display, gpu, video, camera, tcsr, and rpmh clock
controller for the Qualcomm Milos SoC
- Camera, display, GPU, and video clock controllers for Qualcomm
QCS615
- Video clock controller driver for Qualcomm SM6350
- Camera clock controller driver for Qualcomm SC8180X
- I3C clocks and resets on Renesas RZ/G3E
- Expanded Serial Peripheral Interface (xSPI) clocks and resets on
Renesas RZ/V2H(P) and RZ/V2N
- SPI (RSPI) clocks and resets on Renesas RZ/V2H(P)
- SDHI and I2C clocks on Renesas RZ/T2H and RZ/N2H
- Ethernet clocks and resets on Renesas RZ/G3E
- Initial support for the Renesas RZ/T2H (R9A09G077) and RZ/N2H
(R9A09G087) SoCs
- Ethernet clocks and resets on Renesas RZ/V2H and RZ/V2N
- Timer, I2C, watchdog, GPU, and USB2.0 clocks and resets on Renesas
RZ/V2N
Updates:
- Support atomic PWMs in the PWM clk driver
- clk_hw_get_dev() and clk_hw_get_of_node() helpers
- Replace round_rate() with determine_rate() in various clk drivers
- Convert clk DT bindings to DT schema format for DT validation
- Various clk driver cleanups and refactorings from static analysis
tools and possibly real humans
- A lot of little fixes here and there to things like clk tree
topology, missing frequencies, flagging clks as critical, etc"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (216 commits)
clk: clocking-wizard: Fix the round rate handling for versal
clk: Fix typos
clk: spacemit: ccu_pll: fix error return value in recalc_rate callback
clk: tegra: periph: Make tegra_clk_periph_ops static
clk: tegra: periph: Fix error handling and resolve unsigned compare warning
clk: imx: scu: convert from round_rate() to determine_rate()
clk: imx: pllv4: convert from round_rate() to determine_rate()
clk: imx: pllv3: convert from round_rate() to determine_rate()
clk: imx: pllv2: convert from round_rate() to determine_rate()
clk: imx: pll14xx: convert from round_rate() to determine_rate()
clk: imx: pfd: convert from round_rate() to determine_rate()
clk: imx: frac-pll: convert from round_rate() to determine_rate()
clk: imx: fracn-gppll: convert from round_rate() to determine_rate()
clk: imx: fixup-div: convert from round_rate() to determine_rate()
clk: imx: cpu: convert from round_rate() to determine_rate()
clk: imx: busy: convert from round_rate() to determine_rate()
clk: imx: composite-93: remove round_rate() in favor of determine_rate()
clk: imx: composite-8m: remove round_rate() in favor of determine_rate()
clk: qcom: Remove redundant pm_runtime_mark_last_busy() calls
clk: imx: Remove redundant pm_runtime_mark_last_busy() calls
...
Merge tag 'pwm/for-6.17-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux
Pull pwm fixes from Uwe Kleine-König:
"Two fixes for the mediatek and the imx-tpm driver. Both are old
(v4.12-rc1 and v5.2-rc1 respectively).
The mediatek issue is that both period and duty_cycle were configured
to higher values than requested. For most applications the period part
is no tragedy, but a PWM that is configured for duty_cycle = 0 should
really emit a constant inactive signal. That was noticed by an LED not
being completely off in this case (two commits for one fix: a
preparatory one and the actual fix in the second one).
For the imx-tpm PWM driver the fixed issue is that the first period is
quite a bit too long under some circumstances. So it might take up to
UINT32_MAX << 7 clock ticks until the PWM starts toggling. With an
assumed input clock rate of 166 MHz (completely made up) that's 55
minutes"
* tag 'pwm/for-6.17-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
pwm: imx-tpm: Reset counter if CMOD is 0
pwm: mediatek: Fix duty and period setting
pwm: mediatek: Handle hardware enable and clock enable separately
Miguel Ojeda [Thu, 31 Jul 2025 19:41:37 +0000 (21:41 +0200)]
gpu: nova-core: fix up formatting after merge
In the merge 260f6f4fda93 ("Merge tag 'drm-next-2025-07-30' of
https://gitlab.freedesktop.org/drm/kernel"), the formatting in the
conflict resolution doesn't match what `make rustfmt` wants to make it.
Fix it up appropriately.
Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'media/v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- v4l2 core:
- sub-device framework routing improvements
- NV12M tiled variants added to v4l2_format_info
- some fixes at control handler freeing logic
- fixed H264 SEPARATE_COLOUR_PLANE check
- new staging driver: Intel IPU7 PCI
- Rockchip video decoder driver got promoted from staging
- iris: added HEVC/VP9 encoder/decoder support
- vsp1: driver has gained Renesas VSPX support
- uvc:
- switched to vb2 ioctl helpers
- added MSXU 1.5 metadata support
- atomisp: GC0310 sensor driver cleanups in preparation for moving it
out of staging
- Lots of cleanup, fixes and improvements
* tag 'media/v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (310 commits)
media: rkvdec: Unstage the driver
media: rkvdec: Remove TODO file
media: dt-bindings: rockchip: Add RK3576 Video Decoder bindings
media: dt-bindings: rockchip: Document RK3588 Video Decoder bindings
media: amphion: Support dmabuf and v4l2 buffer without binding
media: verisilicon: postproc: 4K support
media: v4l2: Add support for NV12M tiled variants to v4l2_format_info()
media: uvcvideo: Use a count variable for meta_formats instead of 0 terminating
media: uvcvideo: Auto-set UVC_QUIRK_MSXU_META
media: uvcvideo: Introduce V4L2_META_FMT_UVC_MSXU_1_5
media: uvcvideo: Introduce dev->meta_formats
media: Documentation: Add note about UVCH length field
media: uvcvideo: Do not mark valid metadata as invalid
media: uvcvideo: uvc_v4l2_unlocked_ioctl: Invert PM logic
media: core: export v4l2_translate_cmd
media: uvcvideo: Turn on the camera if V4L2_EVENT_SUB_FL_SEND_INITIAL
media: uvcvideo: Remove stream->is_streaming field
media: uvcvideo: Split uvc_stop_streaming()
media: uvcvideo: Handle locks in uvc_queue_return_buffers
media: uvcvideo: Use vb2 ioctl and fop helpers
...
Merge tag 'libnvdimm-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Ira Weiny:
"Header file cleanups. Specifically use specific headers rather than
just kernel.h in libnvdimm.h to reduce header file usage.
The second patch is a fix to CXL but is going through my tree as a
libnvdimm patch caused the issue"
* tag 'libnvdimm-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
cxl: Include range.h in cxl.h
libnvdimm: Don't use "proxy" headers
Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd updates from Jason Gunthorpe:
"This broadly brings the assigned HW command queue support to iommufd.
This feature is used to improve SVA performance in VMs by avoiding
paravirtualization traps during SVA invalidations.
Along the way I think some of the core logic is in a much better state
to support future driver backed features.
Summary:
- IOMMU HW now has features to directly assign HW command queues to a
guest VM. In this mode the command queue operates on a limited set
of invalidation commands that are suitable for improving guest
invalidation performance and easy for the HW to virtualize.
This brings the generic infrastructure to allow IOMMU drivers to
expose such command queues through the iommufd uAPI, mmap the
doorbell pages, and get the guest physical range for the command
queue ring itself.
- An implementation for the NVIDIA SMMUv3 extension "cmdqv" is built
on the new iommufd command queue features. It works with the
existing SMMU driver support for cmdqv in guest VMs.
- Many precursor cleanups and improvements to support the above
cleanly, changes to the general ioctl and object helpers, driver
support for VDEVICE, and mmap pgoff cookie infrastructure.
- Sequence VDEVICE destruction to always happen before VFIO device
destruction. When using the above type features, and also in future
confidential compute, the internal virtual device representation
becomes linked to HW or CC TSM configuration and objects. If a VFIO
device is removed from iommufd those HW objects should also be
cleaned up to prevent a sort of UAF. This became important now that
we have HW backing the VDEVICE.
- Fix one syzkaller found error related to math overflows during iova
allocation"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (57 commits)
iommu/arm-smmu-v3: Replace vsmmu_size/type with get_viommu_size
iommu/arm-smmu-v3: Do not bother impl_ops if IOMMU_VIOMMU_TYPE_ARM_SMMUV3
iommufd: Rename some shortterm-related identifiers
iommufd/selftest: Add coverage for vdevice tombstone
iommufd/selftest: Explicitly skip tests for inapplicable variant
iommufd/vdevice: Remove struct device reference from struct vdevice
iommufd: Destroy vdevice on idevice destroy
iommufd: Add a pre_destroy() op for objects
iommufd: Add iommufd_object_tombstone_user() helper
iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice
iommufd/selftest: Test reserved regions near ULONG_MAX
iommufd: Prevent ALIGN() overflow
iommu/tegra241-cmdqv: import IOMMUFD module namespace
iommufd: Do not allow _iommufd_object_alloc_ucmd if abort op is set
iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support
iommu/tegra241-cmdqv: Add user-space use support
iommu/tegra241-cmdqv: Do not statically map LVCMDQs
iommu/tegra241-cmdqv: Simplify deinit flow in tegra241_cmdqv_remove_vintf()
iommu/tegra241-cmdqv: Use request_threaded_irq
iommu/arm-smmu-v3-iommufd: Add hw_info to impl_ops
...
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
- Various minor code cleanups and fixes for hns, iser, cxgb4, hfi1,
rxe, erdma, mana_ib
- Prefetch supprot for rxe ODP
- Remove memory window support from hns as new device FW is no longer
support it
- Remove qib, it is very old and obsolete now, Cornelis wishes to
restructure the hfi1/qib shared layer
- Fix a race in destroying CQs where we can still end up with work
running because the work is cancled before the driver stops
triggering it
- Improve interaction with namespaces:
* Follow the devlink namespace for newly spawned RDMA devices
* Create iopoib net devces in the parent IB device's namespace
* Allow CAP_NET_RAW checks to pass in user namespaces
- A new flow control scheme for IB MADs to try and avoid queue
overflows in the network
- Fix 2G message sizes in bnxt_re
- Optimize mkey layout for mlx5 DMABUF
- New "DMA Handle" concept to allow controlling PCI TPH and steering
tags
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (71 commits)
RDMA/siw: Change maintainer email address
RDMA/mana_ib: add support of multiple ports
RDMA/mlx5: Refactor optional counters steering code
RDMA/mlx5: Add DMAH support for reg_user_mr/reg_user_dmabuf_mr
IB: Extend UVERBS_METHOD_REG_MR to get DMAH
RDMA/mlx5: Add DMAH object support
RDMA/core: Introduce a DMAH object and its alloc/free APIs
IB/core: Add UVERBS_METHOD_REG_MR on the MR object
net/mlx5: Add support for device steering tag
net/mlx5: Expose IFC bits for TPH
PCI/TPH: Expose pcie_tph_get_st_table_size()
RDMA/mlx5: Fix incorrect MKEY masking
RDMA/mlx5: Fix returned type from _mlx5r_umr_zap_mkey()
RDMA/mlx5: remove redundant check on err on return expression
RDMA/mana_ib: add additional port counters
RDMA/mana_ib: Fix DSCP value in modify QP
RDMA/efa: Add CQ with external memory support
RDMA/core: Add umem "is_contiguous" and "start_dma_addr" helpers
RDMA/uverbs: Add a common way to create CQ with umem
RDMA/mlx5: Optimize DMABUF mkey page size
...
Merge tag 'leds-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds
Pull LED updates from Lee Jones:
"Improvements & Fixes:
- A fix for QCOM Flash to prevent incorrect register access when the
driver is re-bound. This is solved by duplicating a static register
array during the probe function, which prevents register addresses
from being miscalculated after multiple binds
- The LP50xx driver now correctly handles the 'reg' property in
device tree child nodes to ensure the multi_index is set correctly.
This prevents issues where LEDs could be controlled incorrectly if
the device tree nodes were processed in a different order. An error
is returned if the reg property is missing or out of range
- Add a Kconfig dependency on between TPS6131x and
V4L2_FLASH_LED_CLASS to prevent a build failure when the driver is
built-in and the V4L2 flash infrastructure is a loadable module
- Fix a potential buffer overflow warning in PCA955x reported by
older GCC versions by using a more precise format specifier when
creating the default LED label.
Cleanups & Refactoring:
- Correct the MAINTAINERS file entry for the TPS6131X flash LED
driver to point to the correct device tree binding file name
- Fix a comment in the Flash Class for the flash_timeout setter to
"flash timeout" from "flash duration" for accuracy
- The of_led_get() function is no longer exported as it has no users
outside of its own module.
Removals:
- Revert the commit to configure LED blink intervals for hardware
offload in the Netdev Trigger. This change was found to break
existing PHY drivers by putting their LEDs into a permanent,
unconditional blinking state.
Device Tree Bindings Updates:
- Update the binding for LP50xx to document that the child reg
property is the index within the LED bank. The example was also
updated to use correct values
- Update the JNCP5623 binding to add 0x39 as a valid I2C address, as
it is used by the NCP5623C variant"
* tag 'leds-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds:
dt-bindings: leds: ncp5623: Add 0x39 as a valid I2C address
Revert "leds: trigger: netdev: Configure LED blink interval for HW offload"
leds: pca955x: Avoid potential overflow when filling default_label (take 2)
leds: Unexport of_led_get()
leds: tps6131x: Add V4L2_FLASH_LED_CLASS dependency
dt-bindings: leds: lp50xx: Document child reg, fix example
leds: leds-lp50xx: Handle reg to get correct multi_index
leds: led-class-flash:: Fix flash_timeout comment
MAINTAINERS: Adjust file entry in TPS6131X FLASH LED DRIVER
leds: flash: leds-qcom-flash: Fix registry access after re-bind
Merge tag 'mfd-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
"New Support & Features:
- Add extensive support for the Analog Devices ADP5589 I/O expander,
including core MFD, GPIO, PWM, and a new keypad matrix input
driver. This also adds support for handling various events
including GPI, keypad, reset and unlock ev ents
- Add support for the TI TPS652G1 PMIC, a stripped-down version of
the TPS65224, including core MFD, PFSM, pinctrl, and GPIO support
- Add support for the Apple Silicon System Management Controller
(SMC), including the core MFD driver which handles the RTKit-based
protocol, a new GPIO driver for PMU GPIOs, and a new
reboot/power-off driver.
Improvements & Fixes:
- Dynamically add ADP5585 sub-devices based on device tree properties
- Move ADP5585 oscillator control from the child PWM driver to the
main MFD driver to better handle shared resources
- Add support for a hardware reset pin and VDD regulator to the
ADP5585 driver
- Update the TPS65219 MFD cell's GPIO compatible string for the
TPS65214 to reflect hardware capabilities correctly
- Separate the ChromeOS EC charge-control probing from the USB-PD
subsystem, allowing it to probe independently based on the
dedicated EC_FEATURE_CHARGER
- Fix an interrupt naming typo in the MT6370 driver
- Fix RK806 PMIC reset behavior by allowing the reset mode to be
customized via a new device tree property
- Fix AXP20X regulator cell ID conflicts for secondary PMICs on
boards without an IRQ line connected
- Fix MT6397 keypad sub-device creation to use specific names instead
of a generic one, ensuring correct driver binding
- Fix a build warning in the stm32-timers driver by adding a missing
include for export.h.
Cleanups & Refactoring:
- Refactor the ADP5585 driver to simplify how regmap defaults are
handled, making it easier to add new chip variants
- Introduce per-chip register map structures for the ADP5585/ADP5589
family to handle differences between the devices
- Convert several drivers to use dev_fwnode() instead of
of_fwnode_handle()
- Make various static structures const in the cs40l50, rohm-bd71828,
tps65219, and twl6040 drivers
- Remove redundant pm_runtime_mark_last_busy() calls from several
drivers
- Alphabetize Kconfig entries for Cirrus Logic and Maxim drivers
- Remove unused fields from the 'tps65219' struct
- Update several MFD-related headers to follow the 'Include What You
Use' (IWYU) principle.
Removals:
- Remove the old, platform-data-based adp5589-keys input driver,
which is now superseded by the new MFD-based adp5585-keys driver
- Remove the unused twl6030_mmc_card_detect() functions and
associated header declarations
- Remove the now unused pcf50633/core.h header file
- Remove the fsl,imx8qxp-csr device tree binding, which was being
used incorrectly.
Device Tree Bindings Updates:
- Add support for the Analog Devices ADP5589 I/O expander to the
adi,adp5585.yaml binding
- Add new properties to the adi,adp5585.yaml binding for input
events, including keypad pins, unlock events, and reset events
- Add a reset-gpios property to the adi,adp5585.yaml binding
- Add the TI TPS652G1 PMIC to the ti,tps6594.yaml binding
- Add new bindings for the Apple Mac System Management Controller
(SMC) and its sub-devices: apple,smc.yaml, apple,smc-gpio.yaml, and
apple,smc-reboot.yaml
- Convert the Freescale MXS LRADC binding (mxs-lradc) to YAML schema
format
- Convert and combine the NXP LPC1850 CREG, DMAMUX, and USB OTG PHY
bindings into a single YAML schema file
- Convert the TI TPS65910 binding to YAML schema format
- Add a comment to the samsung,s2mps11.yaml binding to clarify the
use of 'oneOf' for interrupt properties
- Add the rockchip,reset-mode property to the rockchip,rk806.yaml
binding to allow customization of the PMIC's reset behavior"
* tag 'mfd-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (28 commits)
mfd: dt-bindings: Convert TPS65910 to DT schema
mfd: Minor Cirrus/Maxim Kconfig order fixes
mfd: Remove redundant pm_runtime_mark_last_busy() calls
mfd: mt6397: Do not use generic name for keypad sub-devices
mfd: axp20x: Set explicit ID for regulator cell if no IRQ line is present
mfd: mt6370: Fix the interrupt naming typo
mfd: rk8xx-core: Allow to customize RK806 reset mode
dt-bindings: mfd: rk806: Allow to customize PMIC reset mode
mfd: syscon: atmel-smc: Don't use "proxy" headers
mfd: madera: Don't use "proxy" headers
mfd: wm8350-core: Don't use "proxy" headers
dt-bindings: mfd: samsung,s2mps11: Add comment about interrupts properties
mfd: davinci_voicecodec: Don't use "proxy" headers
mfd: pcf50633: Remove the header file core.h
mfd: tps65219: Remove another unused field from 'struct tps65219'
mfd: tps65219: Remove an unused field from 'struct tps65219'
mfd: tps65219: Constify struct regmap_irq_sub_irq_map and tps65219_chip_data
mfd: rohm-bd71828: Constify some structures
dt-bindings: mfd: fsl,imx8qxp-csr: Remove binding documentation
mfd: axp20x: Set explicit ID for AXP313 regulator
...
Merge tag 'integrity-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity update from Mimi Zohar:
"A single commit to permit disabling IMA from the boot command line for
just the kdump kernel.
The exception itself sort of makes sense. My concern is that
exceptions do not remain as exceptions, but somehow morph to become
the norm"
* tag 'integrity-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: add a knob ima= to allow disabling IMA in kdump kernel
libbpf: Avoid possible use of uninitialized mod_len
Though mod_len is only read when mod_name != NULL and both are initialized
together, gcc15 produces a warning with -Werror=maybe-uninitialized:
libbpf.c: In function 'find_kernel_btf_id.constprop':
libbpf.c:10100:33: error: 'mod_len' may be used uninitialized [-Werror=maybe-uninitialized]
10100 | if (mod_name && strncmp(mod->name, mod_name, mod_len) != 0)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
libbpf.c:10070:21: note: 'mod_len' was declared here
10070 | int ret, i, mod_len;
| ^~~~~~~
Silence the false positive.
Signed-off-by: Achill Gilgenast <fossdd@pwned.life> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250729094611.2065713-1-fossdd@pwned.life Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Wed, 30 Jul 2025 23:47:33 +0000 (01:47 +0200)]
bpf: Fix oob access in cgroup local storage
Lonial reported that an out-of-bounds access in cgroup local storage
can be crafted via tail calls. Given two programs each utilizing a
cgroup local storage with a different value size, and one program
doing a tail call into the other. The verifier will validate each of
the indivial programs just fine. However, in the runtime context
the bpf_cg_run_ctx holds an bpf_prog_array_item which contains the
BPF program as well as any cgroup local storage flavor the program
uses. Helpers such as bpf_get_local_storage() pick this up from the
runtime context:
For the second program which was called from the originally attached
one, this means bpf_get_local_storage() will pick up the former
program's map, not its own. With mismatching sizes, this can result
in an unintended out-of-bounds access.
To fix this issue, we need to extend bpf_map_owner with an array of
storage_cookie[] to match on i) the exact maps from the original
program if the second program was using bpf_get_local_storage(), or
ii) allow the tail call combination if the second program was not
using any of the cgroup local storage maps.
Fixes: 7d9c3427894f ("bpf: Make cgroup storages shared between programs on the same cgroup") Reported-by: Lonial Con <kongln9170@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20250730234733.530041-4-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Wed, 30 Jul 2025 23:47:31 +0000 (01:47 +0200)]
bpf: Move bpf map owner out of common struct
Given this is only relevant for BPF tail call maps, it is adding up space
and penalizing other map types. We also need to extend this with further
objects to track / compare to. Therefore, lets move this out into a separate
structure and dynamically allocate it only for BPF tail call maps.
Daniel Borkmann [Wed, 30 Jul 2025 23:47:30 +0000 (01:47 +0200)]
bpf: Add cookie object to bpf maps
Add a cookie to BPF maps to uniquely identify BPF maps for the timespan
when the node is up. This is different to comparing a pointer or BPF map
id which could get rolled over and reused.
Merge tag 'caps-pr-20250729' of git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux
Pull capabilities update from Serge Hallyn:
- Fix broken link in documentation in capability.h
- Correct the permission check for unsafe exec
During exec, different effective and real credentials were assumed to
mean changed credentials, making it impossible in the no-new-privs
case to keep different uid and euid
* tag 'caps-pr-20250729' of git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux:
uapi: fix broken link in linux/capability.h
exec: Correct the permission check for unsafe exec
Merge tag 'sh-for-v6.17-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux
Pull sh update from John Paul Adrian Glaubitz:
"A single patch by Ben Hutchings replaces the hyphen in the exported
shell variable ld-bfd with an underscore to avoid issues with certain
shells such as dash which do not pass through environment variables
whose name includes a hyphen"
* tag 'sh-for-v6.17-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
sh: Do not use hyphen in exported variable name
Module (SFP) eeprom GET has a lot of input params, they are all
mistakenly listed as output in the spec. Looks like kernel doesn't
output them at all. Correct what are the inputs and what the outputs.
Reported-by: Duo Yi <duo@meta.com> Fixes: a353318ebf24 ("tools: ynl: populate most of the ethtool spec") Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250730172137.1322351-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Namhyung Kim [Thu, 31 Jul 2025 07:03:30 +0000 (00:03 -0700)]
perf record: Cache build-ID of hit DSOs only
It post-processes samples to find which DSO has samples. Based on that
info, it can save used DSOs in the build-ID cache directory. But for
some reason, it saves all DSOs without checking the hit mark. Skipping
unused DSOs can give some speedup especially with --buildid-mmap being
default.
On my idle machine, `time perf record -a sleep 1` goes down from 3 sec
to 1.5 sec with this change.
Fixes: e29386c8f7d71fa5 ("perf record: Add --buildid-mmap option to enable PERF_RECORD_MMAP2's build id") Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250731070330.57116-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Merge tag 'fsnotify_for_v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
"A couple of small improvements for fsnotify subsystem.
The most interesting is probably Amir's change modifying the meaning
of fsnotify fmode bits (and I spell it out specifically because I know
you care about those). There's no change for the common cases of no
fsnotify watches or no permission event watches. But when there are
permission watches (either for open or for pre-content events) but no
FAN_ACCESS_PERM watch (which nobody uses in practice) we are now able
optimize away unnecessary cache loads from the read path"
* tag 'fsnotify_for_v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: optimize FMODE_NONOTIFY_PERM for the common cases
fsnotify: merge file_set_fsnotify_mode_from_watchers() with open perm hook
samples: fix building fs-monitor on musl systems
fanotify: sanitize handle_type values when reporting fid
Merge tag 'jfs-6.17' of github.com:kleikamp/linux-shaggy
Pull jfs updates from Dave Kleikamp:
"Fixes and cleanups for JFS filesystem"
* tag 'jfs-6.17' of github.com:kleikamp/linux-shaggy:
jfs: fix metapage reference count leak in dbAllocCtl
jfs: stop using write_cache_pages
jfs: truncate good inode pages when hard link is 0
jfs: jfs_xtree: replace XT_GETPAGE macro with xt_getpage()
jfs: Regular file corruption check
jfs: upper bound check of tree index in dbAllocAG
Merge tag 'for-linus-6.17-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs updates from Mike Marshall:
"Fixes for string handling in debugfs and sysfs:
- change scnprintf to sysfs_emit in sysfs code.
- change sprintf to scnprintf in debugfs code.
- refactor debugfs mask-to-string code for readability and slightly
improved functionality"
* tag 'for-linus-6.17-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
fs/orangefs: Allow 2 more characters in do_c_string()
fs: orangefs: replace scnprintf() with sysfs_emit()
fs/orangefs: use snprintf() instead of sprintf()
Merge tag 'ubifs-for-linus-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull UBI and UBIFS updates from Richard Weinberger:
"UBIFS:
- No longer use write_cache_pages()
UBI:
- Remove an unused function"
* tag 'ubifs-for-linus-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
ubifs: stop using write_cache_pages
mtd: ubi: Remove unused ubi_flush
Merge tag 'ext4_for_linus_6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Major ext4 changes for 6.17:
- Better scalability for ext4 block allocation
- Fix insufficient credits when writing back large folios
Miscellaneous bug fixes, especially when handling exteded attriutes,
inline data, and fast commit"
* tag 'ext4_for_linus_6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (39 commits)
ext4: do not BUG when INLINE_DATA_FL lacks system.data xattr
ext4: implement linear-like traversal across order xarrays
ext4: refactor choose group to scan group
ext4: convert free groups order lists to xarrays
ext4: factor out ext4_mb_scan_group()
ext4: factor out ext4_mb_might_prefetch()
ext4: factor out __ext4_mb_scan_group()
ext4: fix largest free orders lists corruption on mb_optimize_scan switch
ext4: fix zombie groups in average fragment size lists
ext4: merge freed extent with existing extents before insertion
ext4: convert sbi->s_mb_free_pending to atomic_t
ext4: fix typo in CR_GOAL_LEN_SLOW comment
ext4: get rid of some obsolete EXT4_MB_HINT flags
ext4: utilize multiple global goals to reduce contention
ext4: remove unnecessary s_md_lock on update s_mb_last_group
ext4: remove unnecessary s_mb_last_start
ext4: separate stream goal hits from s_bal_goals for better tracking
ext4: add ext4_try_lock_group() to skip busy groups
ext4: initialize superblock fields in the kballoc-test.c kunit tests
ext4: refactor the inline directory conversion and new directory codepaths
...
After hugetlbfs PTE_MARKER support for s390 introduced region-third and
segment table swap entries, it is now possible to also enable THP_SWAP
and THP_MIGRATION for s390.
s390 has different layout for PTE and region / segment table entries
(RSTE). This is also true for swap entries, and their swap type and offset
encoding. For hugetlbfs PTE_MARKER support, s390 has internal
__swp_type_rste() and __swp_offset_rste() helpers to correctly handle RSTE
swap entries.
But common swap code does not know about this difference, and only uses
__swp_type(), __swp_offset() and __swp_entry() helpers for conversion
between arch-dependent and arch-independent representation of swp_entry_t
for all pagetable levels. On s390, those helpers only work for PTE swap
entries.
Therefore, implement __pmd_to_swp_entry() to build a fake PTE swap entry
and return the arch-dependent representation of that. Correspondingly,
implement __swp_entry_to_pmd() to convert that into a proper PMD swap
entry again. With this, the arch-dependent swp_entry_t representation will
always look like a PTE swap entry in common code.
This is somewhat similar to fake PTEs in hugetlbfs code for s390, but only
requires conversion of the swap type and offset, and not all the possible
PTE bits.
For PMD swap entry SOFT_DIRTY handling, use the same helpers as for normal
PMDs. Similar to PTEs, the SOFT_DIRTY bit location is the same for swap
and normal entries.
Enable the functionality of commits d593d64f043a ("lib: Add register read/write tracing support") 210031971cdd ("asm-generic/io: Add logging support for MMIO accessors").
It only depends on explicit function calls for the tracing.
s390/mm: Set high_memory at the end of the identity mapping
The value of high_memory variable is set by set_high_memory() function
to a value returned by memblock_end_of_DRAM(). The latter function
returns by default the upper bound of the last online memory block,
not the upper bound of the directly mapped memory region. As result,
in case the end of memory happens to be offline, high_memory variable
is set to a value that is short on the last offline memory blocks size:
RANGE SIZE STATE REMOVABLE BLOCK
0x0000000000000000-0x000000ffffffffff 1T online yes 0-511
0x0000010000000000-0x0000011fffffffff 128G offline 512-575
Memory block size: 2G
Total online memory: 1T
Total offline memory: 128G
In the past the value of high_memory was derived from max_low_pfn,
which in turn was derived from the identity_size. Since identity_size
accommodates the whole memory size - including tailing offline blocks,
the offlined blocks did not impose any problem. But since commit e120d1bc12da ("arch, mm: set high_memory in free_area_init()") the
value of high_memory is derived from the last memblock online region,
and that is where the problem comes from.
The value of high_memory is used by several drivers and by external
tools (e.g. crash tool aborts while loading a dump).
Similarily to ARM, use the override path provided by set_high_memory()
function and set the value of high_memory at the end of the identity
mapping early. That forces set_high_memory() to leave in high_memory
the correct value, even when the end of available memory is offline.
Fixes: e120d1bc12da ("arch, mm: set high_memory in free_area_init()") Tested-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
s390/ap: Unmask SLCF bit in card and queue ap functions sysfs
The SLCF bit ("stateless command filtering") introduced with
CEX8 cards was because of the function mask's default value
suppressed when user space read the ap function for an AP
card or queue. Unmask this bit so that user space applications
like lszcrypt can evaluate and list this feature.
Fixes: d4c53ae8e494 ("s390/ap: store TAPQ hwinfo in struct ap_card") Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Various controller drivers received minor fixes like DMA mapping checks,
better timing derivations or bitflip statistics.
It has also been discovered that some Hynix NAND flashes were not
supporting read-retries, which is not properly supported.
* SPI NAND changes:
In order to support high-speed modes, certain chips need extra
configuration like adding more dummy cycles. This is now possible,
especially on Winbond chips.
Aside from that, Gigadevice gets support for a new chip (GD5F1GM9).
- Fix exiting 4-byte addressing on Infineon SEMPER flashes. These
flashes do not support the standard EX4B opcode (E9h), and use a
vendor-specific opcode (B8h) instead.
- Fix unlocking of flashes that are write-protected at power-on. This
was caused by using an uninitialized mtd_info in
spi_nor_try_unlock_all().
Merge tag 'v6.17-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
"API:
- Allow hash drivers without fallbacks (e.g., hardware key)
Algorithms:
- Add hmac hardware key support (phmac) on s390
- Re-enable sha384 in FIPS mode
- Disable sha1 in FIPS mode
- Convert zstd to acomp
Drivers:
- Lower priority of qat skcipher and aead
- Convert aspeed to partial block API
- Add iMX8QXP support in caam
- Add rate limiting support for GEN6 devices in qat
- Enable telemetry for GEN6 devices in qat
- Implement full backlog mode for hisilicon/sec2"
* tag 'v6.17-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits)
crypto: keembay - Use min() to simplify ocs_create_linked_list_from_sg()
crypto: hisilicon/hpre - fix dma unmap sequence
crypto: qat - make adf_dev_autoreset() static
crypto: ccp - reduce stack usage in ccp_run_aes_gcm_cmd
crypto: qat - refactor ring-related debug functions
crypto: qat - fix seq_file position update in adf_ring_next()
crypto: qat - fix DMA direction for compression on GEN2 devices
crypto: jitter - replace ARRAY_SIZE definition with header include
crypto: engine - remove {prepare,unprepare}_crypt_hardware callbacks
crypto: engine - remove request batching support
crypto: qat - flush misc workqueue during device shutdown
crypto: qat - enable rate limiting feature for GEN6 devices
crypto: qat - add compression slice count for rate limiting
crypto: qat - add get_svc_slice_cnt() in device data structure
crypto: qat - add adf_rl_get_num_svc_aes() in rate limiting
crypto: qat - relocate service related functions
crypto: qat - consolidate service enums
crypto: qat - add decompression service for rate limiting
crypto: qat - validate service in rate limiting sysfs api
crypto: hisilicon/sec2 - implement full backlog mode for sec
...
Merge tag 'nvme-6.17-2025-07-31' of git://git.infradead.org/nvme into block-6.17
Pull NVMe changes from Christoph:
"- add support for getting the FDP featuee in fabrics passthru path
(Nitesh Shetty)
- add capability to connect to an administrative controller
(Kamaljit Singh)
- fix a leak on sgl setup error (Keith Busch)
- initialize discovery subsys after debugfs is initialized
(Mohamed Khalfella)
- fix various comment typos (Bjorn Helgaas)
- remove unneeded semicolons (Jiapeng Chong)"
* tag 'nvme-6.17-2025-07-31' of git://git.infradead.org/nvme:
nvme: fix various comment typos
nvme-auth: remove unneeded semicolon
nvme-pci: fix leak on sgl setup error
nvmet: initialize discovery subsys after debugfs is initialized
nvme: add capability to connect to an administrative controller
nvmet: add support for FDP in fabrics passthru path
Merge tag 'ipe-pr-20250728' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe
Pull ipe update from Fan Wu:
"A single commit from Eric Biggers to simplify the IPE (Integrity
Policy Enforcement) policy audit with the SHA-256 library API"
* tag 'ipe-pr-20250728' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe:
ipe: use SHA-256 library API instead of crypto_shash API
These were initially added because some software was checking for their
presence. However, the device is not NUMA, so adding these is wrong and
hence they should be removed.
Stephen Boyd [Thu, 31 Jul 2025 16:10:06 +0000 (09:10 -0700)]
Merge branch 'clk-fixes' into clk-next
Resolve conflicts with i.MX95 changes 88768d6f8c13 ("clk:
imx95-blk-ctl: Rename lvds and displaymix csr blk") in clk-imx
and aacc875a448d ("clk: imx: Fix an out-of-bounds access in
dispmix_csr_clk_dev_data") in clk-fixes.
* clk-fixes:
clk: sunxi-ng: v3s: Fix TCON clock parents
clk: sunxi-ng: v3s: Fix CSI1 MCLK clock name
clk: sunxi-ng: v3s: Fix CSI SCLK clock name
dt-bindings: clock: mediatek: Add #reset-cells property for MT8188
clk: imx: Fix an out-of-bounds access in dispmix_csr_clk_dev_data
clk: scmi: Handle case where child clocks are initialized before their parents
clk: sunxi-ng: a523: Mark MBUS clock as critical
Pull documentation updates from Jonathan Corbet:
"It has been a relatively busy cycle for docs, especially the build
system:
- The Perl kernel-doc script was added to 2.3.52pre1 just after the
turn of the millennium. Over the following 25 years, it accumulated
a vast amount of cruft, all in a language few people want to deal
with anymore. Mauro's Python replacement in 6.16 faithfully
reproduced all of the cruft in the hope of avoiding regressions.
Now that we have a more reasonable code base, though, we can work
on cleaning it up; many of the changes this time around are toward
that end.
- A reorganization of the ext4 docs into the usual TOC format.
- Various Chinese translations and updates.
- A new script from Mauro to help with docs-build testing.
- A new document for linked lists
- A sweep through MAINTAINERS fixing broken GitHub git:// repository
links.
...and lots of fixes and updates"
* tag 'docs-6.17' of git://git.lwn.net/linux: (147 commits)
scripts: add origin commit identification based on specific patterns
sphinx: kernel_abi: fix performance regression with O=<dir>
Documentation: core-api: entry: Replace deprecated KVM entry/exit functions
docs: fault-injection: drop reference to md-faulty
docs: document linked lists
scripts: kdoc: make it backward-compatible with Python 3.7
docs: kernel-doc: emit warnings for ancient versions of Python
Documentation/rtla: Describe exit status
Documentation/rtla: Add include common_appendix.rst
docs: kernel: Clarify printk_ratelimit_burst reset behavior
Documentation: ioctl-number: Don't repeat macro names
Documentation: ioctl-number: Shorten macros table
Documentation: ioctl-number: Correct full path to papr-physical-attestation.h
Documentation: ioctl-number: Extend "Include File" column width
Documentation: ioctl-number: Fix linuxppc-dev mailto link
overlayfs.rst: fix typos
docs: kdoc: emit a warning for ancient versions of Python
docs: kdoc: clean up check_sections()
docs: kdoc: directly access the always-there KdocItem fields
docs: kdoc: straighten up dump_declaration()
...
Ben Horgan [Wed, 9 Jul 2025 09:38:08 +0000 (10:38 +0100)]
bitfield: Ensure the return values of helper functions are checked
As type##_replace_bits() has no side effects it is only useful if its
return value is checked. Add __must_check to enforce this usage. To have
the bits replaced in-place typep##_replace_bits() can be used instead.
Although, type_##_get_bits() and type_##_encode_bits() are harder to misuse
they are still only useful if the return value is checked. For
consistency, also add __must_check to these.
Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Vincent Mailhol [Mon, 9 Jun 2025 02:45:47 +0000 (11:45 +0900)]
test_bits: add tests for __GENMASK() and __GENMASK_ULL()
The definitions of GENMASK() and GENMASK_ULL() do not depend any more
on __GENMASK() and __GENMASK_ULL(). Duplicate the existing unit tests
so that __GENMASK{,ULL}() are still covered.
Because __GENMASK() and __GENMASK_ULL() do use GENMASK_INPUT_CHECK(),
drop the TEST_GENMASK_FAILURES negative tests.
It would be good to have a small assembly test case for GENMASK*() in
case somebody decides to unify both in the future. However, I lack
expertise in assembly to do so. Instead add a FIXME message to
highlight the absence of the asm unit test.
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Vincent Mailhol [Mon, 9 Jun 2025 02:45:45 +0000 (11:45 +0900)]
bits: split the definition of the asm and non-asm GENMASK*()
In an upcoming change, the non-asm GENMASK*() will all be unified to
depend on GENMASK_TYPE() which indirectly depend on sizeof(), something
not available in asm.
Instead of adding further complexity to GENMASK_TYPE() to make it work
for both asm and non asm, just split the definition of the two variants.
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Shaopeng Tan [Mon, 23 Jun 2025 07:46:45 +0000 (16:46 +0900)]
cpumask: Remove unnecessary cpumask_nth_andnot()
Commit 94f753143028("x86/resctrl: Optimize cpumask_any_housekeeping()")
switched the only user of cpumask_nth_andnot() to other cpumask
functions, but left the function cpumask_nth_andnot() unused.
This makes function find_nth_andnot_bit() unused as well. Delete them.
Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
clocksource: Improve randomness in clocksource_verify_choose_cpus()
The current algorithm of picking a random CPU works OK for dense online
cpumask, but if cpumask is non-dense, the distribution of picked CPUs
is skewed.
For example, on 8-CPU board with CPUs 4-7 offlined, the probability of
selecting CPU 0 is 5/8. Accordingly, cpus 1, 2 and 3 are chosen with
probability 1/8 each. The proper algorithm should pick each online CPU
with probability 1/4.
Switch it to cpumask_random(), which has better statistical
characteristics.
CC: Andrew Morton <akpm@linux-foundation.org> Acked-by: John Stultz <jstultz@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Yury Norov [NVIDIA]" <yury.norov@gmail.com>
scarlett2_input_select_ctl_info() sets up the string arrays allocated
via kasprintf(), but it misses NULL checks, which may lead to NULL
dereference Oops. Let's add the proper NULL check.
The HD-audio codec driver configs have been updated again since the
previous change. Correct the types and enable all Realtek HD-audio
codecs for loongson, per request.
The Realtek and HDMI HD-audio codec configs have been slightly updated
again since the previous change. Follow the new kconfig changes for
multi_v7_defconfig and tegra_defconfig, and add a few other configs
for HDMI codecs, too.
ALSA: usb-audio: Add DSD support for Comtrue USB Audio device
The vendor Comtrue Inc. (0x2fc6) produces USB audio chipsets like
the CT7601 which are capable of Native DSD playback.
This patch adds QUIRK_FLAG_DSD_RAW for Comtrue (VID 0x2fc6), which enables
native DSD playback (DSD_U32_LE) on their USB Audio device. This has been
verified under Ubuntu 25.04 with JRiver.
Steve French [Mon, 28 Jul 2025 17:32:53 +0000 (12:32 -0500)]
smb3 client: add way to show directory leases for improved debugging
When looking at performance issues around directory caching, or debugging
directory lease issues, it is helpful to be able to display the current
directory leases (as we can e.g. or open files). Create pseudo-file
/proc/fs/cifs/open_dirs that displays current directory leases. Here
is sample output:
Steven Rostedt [Tue, 29 Jul 2025 18:23:14 +0000 (14:23 -0400)]
unwind: Finish up unwind when a task exits
On do_exit() when a task is exiting, if a unwind is requested and the
deferred user stacktrace is deferred via the task_work, the task_work
callback is called after exit_mm() is called in do_exit(). This means that
the user stack trace will not be retrieved and an empty stack is created.
Instead, add a function unwind_deferred_task_exit() and call it just
before exit_mm() so that the unwinder can call the requested callbacks
with the user space stack.
Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Jens Remus <jremus@linux.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182406.504259474@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt [Tue, 29 Jul 2025 18:23:13 +0000 (14:23 -0400)]
unwind deferred: Use SRCU unwind_deferred_task_work()
Instead of using the callback_mutex to protect the link list of callbacks
in unwind_deferred_task_work(), use SRCU instead. This gets called every
time a task exits that has to record a stack trace that was requested.
This can happen for many tasks on several CPUs at the same time. A mutex
is a bottleneck and can cause a bit of contention and slow down performance.
As the callbacks themselves are allowed to sleep, regular RCU cannot be
used to protect the list. Instead use SRCU, as that still allows the
callbacks to sleep and the list can be read without needing to hold the
callback_mutex.
Link: https://lore.kernel.org/all/ca9bd83a-6c80-4ee0-a83c-224b9d60b755@efficios.com/ Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Jens Remus <jremus@linux.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182406.331548065@kernel.org Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt [Tue, 29 Jul 2025 18:23:12 +0000 (14:23 -0400)]
unwind: Add USED bit to only have one conditional on way back to user space
On the way back to user space, the function unwind_reset_info() is called
unconditionally (but always inlined). It currently has two conditionals.
One that checks the unwind_mask which is set whenever a deferred trace is
called and is used to know that the mask needs to be cleared. The other
checks if the cache has been allocated, and if so, it resets the
nr_entries so that the unwinder knows it needs to do the work to get a new
user space stack trace again (it only does it once per entering the
kernel).
Use one of the bits in the unwind mask as a "USED" bit that gets set
whenever a trace is created. This will make it possible to only check the
unwind_mask in the unwind_reset_info() to know if it needs to do work or
not and eliminates a conditional that happens every time the task goes
back to user space.
Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Jens Remus <jremus@linux.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182406.155422551@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt [Tue, 29 Jul 2025 18:23:11 +0000 (14:23 -0400)]
unwind deferred: Add unwind_completed mask to stop spurious callbacks
If there's more than one registered tracer to the unwind deferred
infrastructure, it is currently possible that one tracer could cause extra
callbacks to happen for another tracer if the former requests a deferred
stacktrace after the latter's callback was executed and before the task
went back to user space.
Here's an example of how this could occur:
[Task enters kernel]
tracer 1 request -> add cookie to its buffer
tracer 1 request -> add cookie to its buffer
<..>
[ task work executes ]
tracer 1 callback -> add trace + cookie to its buffer
[tracer 2 requests and triggers the task work again]
[ task work executes again ]
tracer 1 callback -> add trace + cookie to its buffer
tracer 2 callback -> add trace + cookie to its buffer
[Task exits back to user space]
This is because the bit for tracer 1 gets set in the task's unwind_mask
when it did its request and does not get cleared until the task returns
back to user space. But if another tracer were to request another deferred
stacktrace, then the next task work will executed all tracer's callbacks
that have their bits set in the task's unwind_mask.
To fix this issue, add another mask called unwind_completed and place it
into the task's info->cache structure. The cache structure is allocated
on the first occurrence of a deferred stacktrace and this unwind_completed
mask is not needed until then. It's better to have it in the cache than to
permanently waste space in the task_struct.
After a tracer's callback is executed, it's bit gets set in this
unwind_completed mask. When the task_work enters, it will AND the task's
unwind_mask with the inverse of the unwind_completed which will eliminate
any work that already had its callback executed since the task entered the
kernel.
When the task leaves the kernel, it will reset this unwind_completed mask
just like it resets the other values as it enters user space.
Link: https://lore.kernel.org/all/20250716142609.47f0e4a5@batman.local.home/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Jens Remus <jremus@linux.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182405.989222722@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt [Tue, 29 Jul 2025 18:23:10 +0000 (14:23 -0400)]
unwind deferred: Use bitmask to determine which callbacks to call
In order to know which registered callback requested a stacktrace for when
the task goes back to user space, add a bitmask to keep track of all
registered tracers. The bitmask is the size of long, which means that on a
32 bit machine, it can have at most 32 registered tracers, and on 64 bit,
it can have at most 64 registered tracers. This should not be an issue as
there should not be more than 10 (unless BPF can abuse this?).
When a tracer registers with unwind_deferred_init() it will get a bit
number assigned to it. When a tracer requests a stacktrace, it will have
its bit set within the task_struct. When the task returns back to user
space, it will call the callbacks for all the registered tracers where
their bits are set in the task's mask.
When a tracer is removed by the unwind_deferred_cancel() all current tasks
will clear the associated bit, just in case another tracer gets registered
immediately afterward and then gets their callback called unexpectedly.
To prevent live locks from happening if an event that happens between the
task_work and when the task goes back to user space, triggers the deferred
unwind, have the unwind_mask get cleared on exit to user space and not
after the callback is made.
Move the pending bit from a value on the task_struct to bit zero of the
unwind_mask (saves space on the task_struct). This will allow modifying
the pending bit along with the work bits atomically.
Instead of clearing a work's bit after its callback is called, it is
delayed until exit. If the work is requested again, the task_work is not
queued again and the request will be notified that the task has already been
called by returning a positive number (the same as if it was already
pending).
The pending bit is cleared before calling the callback functions but the
current work bits remain. If one of the called works registers again, it
will not trigger a task_work if its bit is still present in the task's
unwind_mask.
If a new work requests a deferred unwind, then it will set both the
pending bit and its own bit. Note this will also cause any work that was
previously queued and had their callback already executed to be executed
again. Future work will remove these spurious callbacks.
The use of atomic_long bit operations were suggested by Peter Zijlstra: Link: https://lore.kernel.org/all/20250715102912.GQ1613200@noisy.programming.kicks-ass.net/
The unwind_mask could not be converted to atomic_long_t do to atomic_long
not having all the bit operations needed by unwind_mask. Instead it
follows other use cases in the kernel and just typecasts the unwind_mask
to atomic_long_t when using the two atomic_long functions.
Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Jens Remus <jremus@linux.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182405.822789300@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt [Tue, 29 Jul 2025 18:23:09 +0000 (14:23 -0400)]
unwind_user/deferred: Make unwind deferral requests NMI-safe
Make unwind_deferred_request() NMI-safe so tracers in NMI context can
call it and safely request a user space stacktrace when the task exits.
Note, this is only allowed for architectures that implement a safe
cmpxchg. If an architecture requests a deferred stack trace from NMI
context that does not support a safe NMI cmpxchg, it will get an -EINVAL
and trigger a warning. For those architectures, they would need another
method (perhaps an irqwork), to request a deferred user space stack trace.
That can be dealt with later if one of theses architectures require this
feature.
Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Jens Remus <jremus@linux.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182405.657072238@kernel.org Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Add an interface for scheduling task work to unwind the user space stack
before returning to user space. This solves several problems for its
callers:
- Ensure the unwind happens in task context even if the caller may be
running in interrupt context.
- Avoid duplicate unwinds, whether called multiple times by the same
caller or by different callers.
- Create a "context cookie" which allows trace post-processing to
correlate kernel unwinds/traces with the user unwind.
A concept of a "cookie" is created to detect when the stacktrace is the
same. A cookie is generated the first time a user space stacktrace is
requested after the task enters the kernel. As the stacktrace is saved on
the task_struct while the task is in the kernel, if another request comes
in, if the cookie is still the same, it will use the saved stacktrace,
and not have to regenerate one.
The cookie is passed to the caller on request, and when the stacktrace is
generated upon returning to user space, it calls the requester's callback
with the cookie as well as the stacktrace. The cookie is cleared
when it goes back to user space. Note, this currently adds another
conditional to the unwind_reset_info() path that is always called
returning to user space, but future changes will put this back to a single
conditional.
A global list is created and protected by a global mutex that holds
tracers that register with the unwind infrastructure. The number of
registered tracers will be limited in future changes. Each perf program or
ftrace instance will register its own descriptor to use for deferred
unwind stack traces.
Note, in the function unwind_deferred_task_work() that gets called when
returning to user space, it uses a global mutex for synchronization which
will cause a big bottleneck. This will be replaced by SRCU, but that
change adds some complex synchronization that deservers its own commit.
Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Jens Remus <jremus@linux.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182405.488066537@kernel.org Co-developed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cache the results of the unwind to ensure the unwind is only performed
once, even when called by multiple tracers.
The cache nr_entries gets cleared every time the task exits the kernel.
When a stacktrace is requested, nr_entries gets set to the number of
entries in the stacktrace. If another stacktrace is requested, if
nr_entries is not zero, then it contains the same stacktrace that would be
retrieved so it is not processed again and the entries is given to the
caller.
Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182405.319691167@kernel.org Reviewed-by: Jens Remus <jremus@linux.ibm.com> Reviewed-By: Indu Bhagat <indu.bhagat@oracle.com> Co-developed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pavel Tikhomirov [Mon, 21 Jul 2025 03:49:13 +0000 (11:49 +0800)]
dm-raid: do not include dm-core.h
In commit 4cc96131afce ("dm: move request-based code out to dm-rq.[hc]")
we have a note: "DM targets should _never_ include dm-core.h!". And it
is not used in any DM targets except dm-raid now, so let's remove it
from dm-raid for consistency, also use special helpers instead of
accessing dm_table and mapper_device fields directly. This change is
merely a cleanup and should not affect functionality.
md: dm-zoned-target: Initialize return variable r to avoid uninitialized use
Fix Smatch-detected error:
drivers/md/dm-zoned-target.c:1073 dmz_iterate_devices()
error: uninitialized symbol 'r'.
Smatch detects a possible use of the uninitialized variable 'r' in
dmz_iterate_devices() because if dmz->nr_ddevs is zero, the loop is
skipped and 'r' is returned without being set, leading to undefined
behavior.
Initialize 'r' to 0 before the loop. This ensures that if there are no
devices to iterate over, the function still returns a defined value.
Eric Biggers [Wed, 9 Jul 2025 19:09:02 +0000 (12:09 -0700)]
dm-verity: remove support for asynchronous hashes
The support for asynchronous hashes in dm-verity has outlived its
usefulness. It adds significant code complexity and opportunity for
bugs. I don't know of anyone using it in practice. (The original
submitter of the code possibly was, but that was 8 years ago.) Data I
recently collected for en/decryption shows that using off-CPU crypto
"accelerators" is consistently much slower than the CPU
(https://lore.kernel.org/r/20250704070322.20692-1-ebiggers@kernel.org/),
even on CPUs that lack dedicated cryptographic instructions. Similar
results are likely to be seen for hashing.
I already removed support for asynchronous hashes from fsverity two
years ago, and no one ever complained.
Moreover, neither dm-verity, fsverity, nor fscrypt has ever actually
used the asynchronous crypto algorithms in a truly asynchronous manner.
The lack of interest in such optimizations provides further evidence
that it's only the CPU-based crypto that actually matters.
Historically, it's also been common for people to forget to enable the
optimized SHA-256 code, which could contribute to an off-CPU crypto
engine being perceived as more useful than it really is. In 6.16 I
fixed that: the optimized SHA-256 code is now enabled by default.
Therefore, let's drop the support for asynchronous hashes in dm-verity.
Keith Busch [Tue, 29 Jul 2025 18:12:47 +0000 (11:12 -0700)]
nvme-pci: fix leak on sgl setup error
We need to free the descriptor that was allocated. We also don't
necessarily need to unmap each sgl entry, which was previously being
attempted unconditionally.
Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
nvmet: initialize discovery subsys after debugfs is initialized
During nvme target initialization discovery subsystem is initialized
before "nvmet" debugfs directory is created. This results in discovery
subsystem debugfs directory to be created in debugfs root directory.
In other words, the codepath above is exeucted before nvmet_debugfs is
created. We get /sys/kernel/debug/nqn.2014-08.org.nvmexpress.discovery
instead of /sys/kernel/debug/nvmet/nqn.2014-08.org.nvmexpress.discovery.
Move nvmet_init_discovery() call after nvmet_init_debugfs() to fix it.
Fixes: 649fd41420a8 ("nvmet: add debugfs support") Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
nvme: add capability to connect to an administrative controller
Add capability to connect to an administrative controller by
preventing ioq creation for admin-controllers.
Add a nvme_admin_ctrl() to check if a controller's CNTRLTYPE indicates
that it is an administrative controller and override ctrl->queue_count to
1 for admin controllers, so that only the admin queue and no I/O queues
are created for an administrative controller. This override is done in
nvme_init_ctrl_finish() after ctrl->cntrltype has been initialized in
nvme_init_identify() so nvme_admin_ctrl() will work correctly.
Doing this override in generic code (nvme_init_ctrl_finish) makes it
transport agnostic and will work properly for nvme/tcp as well as for
nvme/rdma.
Suggested-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Kamaljit Singh <kamaljit.singh1@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
nvmet: add support for FDP in fabrics passthru path
Add support for admin_get_feature FDP(0x1d) feature id, thus enabling
FDP at the initiator side for the target controller and namespaces
attached to it.
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Petr Pavlu [Mon, 30 Jun 2025 14:32:36 +0000 (16:32 +0200)]
module: Rename MAX_PARAM_PREFIX_LEN to __MODULE_NAME_LEN
The maximum module name length (MODULE_NAME_LEN) is somewhat confusingly
defined in terms of the maximum parameter prefix length
(MAX_PARAM_PREFIX_LEN), when in fact the dependency is in the opposite
direction.
This split originates from commit 730b69d22525 ("module: check kernel param
length at compile time, not runtime"). The code needed to use
MODULE_NAME_LEN in moduleparam.h, but because module.h requires
moduleparam.h, this created a circular dependency. It was resolved by
introducing MAX_PARAM_PREFIX_LEN in moduleparam.h and defining
MODULE_NAME_LEN in module.h in terms of MAX_PARAM_PREFIX_LEN.
Rename MAX_PARAM_PREFIX_LEN to __MODULE_NAME_LEN for clarity. This matches
the similar approach of defining MODULE_INFO in module.h and __MODULE_INFO
in moduleparam.h.
Petr Pavlu [Mon, 30 Jun 2025 14:32:35 +0000 (16:32 +0200)]
tracing: Replace MAX_PARAM_PREFIX_LEN with MODULE_NAME_LEN
Use the MODULE_NAME_LEN definition in module_exists() to obtain the maximum
size of a module name, instead of using MAX_PARAM_PREFIX_LEN. The values
are the same but MODULE_NAME_LEN is more appropriate in this context.
MAX_PARAM_PREFIX_LEN was added in commit 730b69d22525 ("module: check
kernel param length at compile time, not runtime") only to break a circular
dependency between module.h and moduleparam.h, and should mostly be limited
to use in moduleparam.h.
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20250630143535.267745-5-petr.pavlu@suse.com Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Petr Pavlu [Mon, 30 Jun 2025 14:32:34 +0000 (16:32 +0200)]
module: Restore the moduleparam prefix length check
The moduleparam code allows modules to provide their own definition of
MODULE_PARAM_PREFIX, instead of using the default KBUILD_MODNAME ".".
Commit 730b69d22525 ("module: check kernel param length at compile time,
not runtime") added a check to ensure the prefix doesn't exceed
MODULE_NAME_LEN, as this is what param_sysfs_builtin() expects.
Later, commit 58f86cc89c33 ("VERIFY_OCTAL_PERMISSIONS: stricter checking
for sysfs perms.") removed this check, but there is no indication this was
intentional.
Since the check is still useful for param_sysfs_builtin() to function
properly, reintroduce it in __module_param_call(), but in a modernized form
using static_assert().
While here, clean up the __module_param_call() comments. In particular,
remove the comment "Default value instead of permissions?", which comes
from commit 9774a1f54f17 ("[PATCH] Compile-time check re world-writeable
module params"). This comment was related to the test variable
__param_perm_check_##name, which was removed in the previously mentioned
commit 58f86cc89c33.
Fixes: 58f86cc89c33 ("VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Link: https://lore.kernel.org/r/20250630143535.267745-4-petr.pavlu@suse.com Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Petr Pavlu [Mon, 30 Jun 2025 14:32:33 +0000 (16:32 +0200)]
module: Remove unnecessary +1 from last_unloaded_module::name size
The variable last_unloaded_module::name tracks the name of the last
unloaded module. It is a string copy of module::name, which is
MODULE_NAME_LEN bytes in size and includes the NUL terminator. Therefore,
the size of last_unloaded_module::name can also be just MODULE_NAME_LEN,
without the need for an extra byte.
Fixes: e14af7eeb47e ("debug: track and print last unloaded module in the oops trace") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Link: https://lore.kernel.org/r/20250630143535.267745-3-petr.pavlu@suse.com Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Petr Pavlu [Mon, 30 Jun 2025 14:32:32 +0000 (16:32 +0200)]
module: Prevent silent truncation of module name in delete_module(2)
Passing a module name longer than MODULE_NAME_LEN to the delete_module
syscall results in its silent truncation. This really isn't much of
a problem in practice, but it could theoretically lead to the removal of an
incorrect module. It is more sensible to return ENAMETOOLONG or ENOENT in
such a case.
Update the syscall to return ENOENT, as documented in the delete_module(2)
man page to mean "No module by that name exists." This is appropriate
because a module with a name longer than MODULE_NAME_LEN cannot be loaded
in the first place.
Thomas Weißschuh [Fri, 11 Jul 2025 13:31:38 +0000 (15:31 +0200)]
kunit: test: Drop CONFIG_MODULE ifdeffery
The function stubs exposed by module.h allow the code to compile properly
without the ifdeffery. The generated object code stays the same, as the
compiler can optimize away all the dead code.
As the code is still typechecked developer errors can be detected faster.
Thomas Weißschuh [Fri, 11 Jul 2025 13:31:37 +0000 (15:31 +0200)]
module: make structure definitions always visible
To write code that works with both CONFIG_MODULES=y and CONFIG_MODULES=n
it is convenient to use "if (IS_ENABLED(CONFIG_MODULES))" over raw #ifdef.
The code will still fully typechecked but the unreachable parts are
discarded by the compiler. This prevents accidental breakage when a certain
kconfig combination was not specifically tested by the developer.
This pattern is already supported to some extend by module.h defining
empty stub functions if CONFIG_MODULES=n.
However some users of module.h work on the structured defined by module.h.
Therefore these structure definitions need to be visible, too.
Many structure members are still gated by specific configuration settings.
The assumption for those is that the code using them will be gated behind
the same configuration setting anyways.
Thomas Weißschuh [Fri, 11 Jul 2025 13:31:36 +0000 (15:31 +0200)]
module: move 'struct module_use' to internal.h
The struct was moved to the public header file in commit c8e21ced08b3
("module: fix kdb's illicit use of struct module_use.").
Back then the structure was used outside of the module core.
Nowadays this is not true anymore, so the structure can be made internal.
A port link power management (LPM) policy can be controlled using the
link_power_management_policy sysfs host attribute. However, this
attribute exists also for hosts that do not support LPM and in such
case, attempting to change the LPM policy for the host (port) will fail
with -EOPNOTSUPP.
Introduce the new sysfs link_power_management_supported host attribute
to indicate to the user if a the port and the devices connected to the
port for the host support LPM, which implies that the
link_power_management_policy attribute can be used.
Since checking that a port and its devices support LPM is common between
the new ata_scsi_lpm_supported_show() function and the existing
ata_scsi_lpm_store() function, the new helper ata_scsi_lpm_supported()
is introduced.
Fixes: 413e800cadbf ("ata: libata-sata: Disallow changing LPM state if not supported") Reported-by: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com> Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202507251014.a5becc3b-lkp@intel.com Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal [Tue, 29 Jul 2025 10:37:12 +0000 (19:37 +0900)]
ata: libata-scsi: Return aborted command when missing sense and result TF
ata_gen_ata_sense() is always called for a failed qc missing sense data
so that a sense key, code and code qualifier can be generated using
ata_to_sense_error() from the qc status and error fields of its result
task file. However, if the qc does not have its result task file filled,
ata_gen_ata_sense() returns early without setting a sense key.
Improve this by defaulting to returning ABORTED COMMAND without any
additional sense code, since we do not know the reason for the failure.
The same fix is also applied in ata_gen_passthru_sense() with the
additional check that the qc failed (qc->err_mask is set).
Fixes: 816be86c7993 ("ata: libata-scsi: Check ATA_QCFLAG_RTF_FILLED before using result_tf") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal [Tue, 29 Jul 2025 09:28:07 +0000 (18:28 +0900)]
ata: libata-scsi: Fix ata_to_sense_error() status handling
Commit 8ae720449fca ("libata: whitespace fixes in ata_to_sense_error()")
inadvertantly added the entry 0x40 (ATA_DRDY) to the stat_table array in
the function ata_to_sense_error(). This entry ties a failed qc which has
a status filed equal to ATA_DRDY to the sense key ILLEGAL REQUEST with
the additional sense code UNALIGNED WRITE COMMAND. This entry will be
used to generate a failed qc sense key and sense code when the qc is
missing sense data and there is no match for the qc error field in the
sense_table array of ata_to_sense_error().
As a result, for a failed qc for which we failed to get sense data (e.g.
read log 10h failed if qc is an NCQ command, or REQUEST SENSE EXT
command failed for the non-ncq case, the user very often end up seeing
the completely misleading "unaligned write command" error, even if qc
was not a write command. E.g.:
Fix this by removing the ATA_DRDY entry from the stat_table array so
that we default to always returning ABORTED COMMAND without any
additional sense code, since we do not know any better. The entry 0x08
(ATA_DRQ) is also removed since signaling ABORTED COMMAND with a parity
error is also misleading (as a parity error would likely be signaled
through a bus error). So for this case, also default to returning
ABORTED COMMAND without any additional sense code. With this, the
previous example error case becomes:
sd 0:0:0:0: [sda] tag#17 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
sd 0:0:0:0: [sda] tag#17 Sense Key : Aborted Command [current]
sd 0:0:0:0: [sda] tag#17 Add. Sense: No additional sense information
sd 0:0:0:0: [sda] tag#17 CDB: Read(10) 28 00 00 00 10 00 00 00 08 00
I/O error, dev sda, sector 4096 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Together with these fixes, refactor stat_table to make it more readable
by putting the entries comments in front of the entries and using the
defined status bits macros instead of hardcoded values.
Reported-by: Lorenz Brun <lorenz@brun.one> Reported-by: Brandon Schwartz <Brandon.Schwartz@wdc.com> Fixes: 8ae720449fca ("libata: whitespace fixes in ata_to_sense_error()") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Merge tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel
Pull drm updates from Dave Airlie:
"Highlights:
- Intel xe enable Panthor Lake, started adding WildCat Lake
- amdgpu has a bunch of reset improvments along with the usual IP
updates
- msm got VM_BIND support which is important for vulkan sparse memory
- more drm_panic users
- gpusvm common code to handle a bunch of core SVM work outside
drivers.
Detail summary:
Changes outside drm subdirectory:
- 'shrink_shmem_memory()' for better shmem/hibernate interaction
- Rust support infrastructure:
- make ETIMEDOUT available
- add size constants up to SZ_2G
- add DMA coherent allocation bindings
- mtd driver for Intel GPU non-volatile storage
- i2c designware quirk for Intel xe
core:
- atomic helpers: tune enable/disable sequences
- add task info to wedge API
- refactor EDID quirks
- connector: move HDR sink to drm_display_info
- fourcc: half-float and 32-bit float formats
- mode_config: pass format info to simplify
dma-buf:
- heaps: Give CMA heap a stable name
ci:
- add device tree validation and kunit
displayport:
- change AUX DPCD access probe address
- add quirk for DPCD probe
- add panel replay definitions
- backlight control helpers
fbdev:
- make CONFIG_FIRMWARE_EDID available on all arches
fence:
- fix UAF issues
format-helper:
- improve tests
gpusvm:
- introduce devmem only flag for allocation
- add timeslicing support to GPU SVM
panel:
- switch to reference counter drm_panel allocations
- fwnode panel lookup
- Huiling hl055fhv028c support
- Raspberry Pi 7" 720x1280 support
- edp: KDC KD116N3730A05, N160JCE-ELL CMN, N116BCJ-EAK
- simple: AUO P238HAN01
- st7701: Winstar wf40eswaa6mnn0
- visionox: rm69299-shift
- Renesas R61307, Renesas R69328 support
- DJN HX83112B
hdmi:
- add CEC handling
- YUV420 output support
xe:
- WildCat Lake support
- Enable PanthorLake by default
- mark BMG as SRIOV capable
- update firmware recommendations
- Expose media OA units
- aux-bux support for non-volatile memory
- MTD intel-dg driver for non-volatile memory
- Expose fan control and voltage regulator in sysfs
- restructure migration for multi-device
- Restore GuC submit UAF fix
- make GEM shrinker drm managed
- SRIOV VF Post-migration recovery of GGTT nodes
- W/A additions/reworks
- Prefetch support for svm ranges
- Don't allocate managed BO for each policy change
- HWMON fixes for BMG
- Create LRC BO without VM
- PCI ID updates
- make SLPC debugfs files optional
- rework eviction rejection of bound external BOs
- consolidate PAT programming logic for pre/post Xe2
- init changes for flicker-free boot
- Enable GuC Dynamic Inhibit Context switch
i915:
- drm_panic support for i915/xe
- initial flip queue off by default for LNL/PNL
- Wildcat Lake Display support
- Support for DSC fractional link bpp
- Support for simultaneous Panel Replay and Adaptive sync
- Support for PTL+ double buffer LUT
- initial PIPEDMC event handling
- drm_panel_follower support
- DPLL interface renames
- allocate struct intel_display dynamically
- flip queue preperation
- abstract DRAM detection better
- avoid GuC scheduling stalls
- remove DG1 force probe requirement
- fix MEI interrupt handler on RT kernels
- use backlight control helpers for eDP
- more shared display code refactoring
amdgpu:
- add userq slot to INFO ioctl
- SR-IOV hibernation support
- Suspend improvements
- Backlight improvements
- Use scaling for non-native eDP modes
- cleaner shader updates for GC 9.x
- Remove fence slab
- SDMA fw checks for userq support
- RAS updates
- DMCUB updates
- DP tunneling fixes
- Display idle D3 support
- Per queue reset improvements
- initial smartmux support
amdkfd:
- enable KFD on loongarch
- mtype fix for ext coherent system memory
radeon:
- CS validation additional GL extensions
- drop console lock during suspend/resume
- bump driver version
msm:
- VM BIND support
- CI: infrastructure updates
- UBWC single source of truth
- decouple GPU and KMS support
- DP: rework I/O accessors
- DPU: SM8750 support
- DSI: SM8750 support
- GPU: X1-45 support and speedbin support for X1-85
- MDSS: SM8750 support
nova:
- register! macro improvements
- DMA object abstraction
- VBIOS parser + fwsec lookup
- sysmem flush page support
- falcon: generic falcon boot code and HAL
- FWSEC-FRTS: fb setup and load/execute
panfrost:
- MT8370 support
- bo labeling
- 64-bit register access
qaic:
- add RAS support
rockchip:
- convert inno_hdmi to a bridge
rz-du:
- add RZ/V2H(P) support
- MIPI-DSI DCS support
sitronix:
- ST7567 support
sun4i:
- add H616 support
tidss:
- add TI AM62L support
- AM65x OLDI bridge support
bochs:
- drm panic support
vkms:
- YUV and R* format support
- use faux device
vmwgfx:
- fence improvements
hyperv:
- move out of simple
- add drm_panic support"
* tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel: (1479 commits)
drm/tidss: oldi: convert to devm_drm_bridge_alloc() API
drm/tidss: encoder: convert to devm_drm_bridge_alloc()
drm/amdgpu: move reset support type checks into the caller
drm/amdgpu/sdma7: re-emit unprocessed state on ring reset
drm/amdgpu/sdma6: re-emit unprocessed state on ring reset
drm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset
drm/amdgpu/sdma5: re-emit unprocessed state on ring reset
drm/amdgpu/gfx12: re-emit unprocessed state on ring reset
drm/amdgpu/gfx11: re-emit unprocessed state on ring reset
drm/amdgpu/gfx10: re-emit unprocessed state on ring reset
drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset
drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset
drm/amdgpu: Add WARN_ON to the resource clear function
drm/amd/pm: Use cached metrics data on SMUv13.0.6
drm/amd/pm: Use cached data for min/max clocks
gpu: nova-core: fix bounds check in PmuLookupTableEntry::new
drm/amdgpu: Replace HQD terminology with slots naming
drm/amdgpu: Add user queue instance count in HW IP info
drm/amd/amdgpu: Add helper functions for isp buffers
drm/amd/amdgpu: Initialize swnode for ISP MFD device
...
netlink: avoid infinite retry looping in netlink_unicast()
netlink_attachskb() checks for the socket's read memory allocation
constraints. Firstly, it has:
rmem < READ_ONCE(sk->sk_rcvbuf)
to check if the just increased rmem value fits into the socket's receive
buffer. If not, it proceeds and tries to wait for the memory under:
rmem + skb->truesize > READ_ONCE(sk->sk_rcvbuf)
The checks don't cover the case when skb->truesize + sk->sk_rmem_alloc is
equal to sk->sk_rcvbuf. Thus the function neither successfully accepts
these conditions, nor manages to reschedule the task - and is called in
retry loop for indefinite time which is caught as:
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 0-....: (25999 ticks this GP) idle=ef2/1/0x4000000000000000 softirq=262269/262269 fqs=6212
(t=26000 jiffies g=230833 q=259957)
NMI backtrace for cpu 0
CPU: 0 PID: 22 Comm: kauditd Not tainted 5.10.240 #68
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc42 04/01/2014
Call Trace:
<IRQ>
dump_stack lib/dump_stack.c:120
nmi_cpu_backtrace.cold lib/nmi_backtrace.c:105
nmi_trigger_cpumask_backtrace lib/nmi_backtrace.c:62
rcu_dump_cpu_stacks kernel/rcu/tree_stall.h:335
rcu_sched_clock_irq.cold kernel/rcu/tree.c:2590
update_process_times kernel/time/timer.c:1953
tick_sched_handle kernel/time/tick-sched.c:227
tick_sched_timer kernel/time/tick-sched.c:1399
__hrtimer_run_queues kernel/time/hrtimer.c:1652
hrtimer_interrupt kernel/time/hrtimer.c:1717
__sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1113
asm_call_irq_on_stack arch/x86/entry/entry_64.S:808
</IRQ>