
thermal/core: Fix missing stub for devm_thermal_cooling_device_register

Even though it is very unlikely that the thermal framework is disabled,
the newly added devm_thermal_cooling_device_register() function has no
stub for kernels built without the thermal framework.

Add it.
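
For illustration, the usual shape of such a stub is sketched below in
user-space C. This is not the code added by the commit: THERMAL_ENABLED,
cooling_device_register(), err_ptr(), ptr_err(), is_err() and
probe_sketch() are hypothetical stand-ins, the signature is simplified,
and returning -ENODEV from the stub is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
static inline void *err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-4095;
}

struct cooling_device;	/* opaque, hypothetical */

#ifdef THERMAL_ENABLED	/* hypothetical stand-in for the Kconfig check */
struct cooling_device *cooling_device_register(const char *type);
#else
/* Stub: keeps callers building when the framework is compiled out. */
static inline struct cooling_device *cooling_device_register(const char *type)
{
	(void)type;
	return err_ptr(-ENODEV);	/* assumed error code */
}
#endif

/* Caller code is identical in both configurations. */
static long probe_sketch(const char *type)
{
	struct cooling_device *cdev;

	assert(type != NULL);
	cdev = cooling_device_register(type);
	if (is_err(cdev))
		return ptr_err(cdev);
	return 0;
}
```

With THERMAL_ENABLED undefined, probe_sketch() compiles against the stub
and reports the stub's error code instead of failing at link time.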

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605301554.S9n45bfQ-lkp@intel.com/
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20260601090152.1243983-2-daniel.lezcano@kernel.org

dt-bindings: thermal: cooling-devices: Update support for 3 cells cooling device

Extend the thermal cooling device binding to support a 3 cells specifier
along with the 2 cells format.

Update #cooling-cells property to enum to support both 2 and 3 arguments.

Fix pwm-fan.yaml to restrict the number of cells to 'const: 2'.

Signed-off-by: Gaurav Kohli <gaurav.kohli@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20260526140802.1059293-22-daniel.lezcano@oss.qualcomm.com

thermal/of: Support cooling device ID in cooling-spec

Extend the cooling device specifier parsing to support an optional
cooling device identifier (cdev_id).

Two formats are now supported:

  - Legacy format:
        <&cdev lower upper>

  - Indexed format:
        <&cdev cdev_id lower upper>

When the indexed format is used, both the device node and the
cdev_id must match in order to bind a cooling device to a thermal
zone. The legacy format continues to match on the device node only,
preserving backward compatibility.

Update the parsing logic accordingly to handle both formats and
extract the mitigation limits from the appropriate arguments.

This is a preparatory step for upcoming DT bindings describing
cooling devices using (device node, id) tuples instead of child
nodes.

No functional change for existing device trees.
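
A minimal user-space sketch of the two formats and the matching rule
follows. It is not the kernel's parsing code: struct cooling_spec,
parse_cooling_spec() and cooling_spec_matches() are hypothetical names,
and the cells are assumed to arrive as a plain array following the
phandle.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical result type; the kernel's real structures differ. */
struct cooling_spec {
	int has_id;		/* 1 if the indexed format was used */
	uint32_t cdev_id;	/* only meaningful when has_id is set */
	uint32_t lower;
	uint32_t upper;
};

/*
 * args/nargs are the cells following the phandle:
 *   2 cells: <lower upper>          (legacy)
 *   3 cells: <cdev_id lower upper>  (indexed)
 */
static int parse_cooling_spec(const uint32_t *args, int nargs,
			      struct cooling_spec *out)
{
	assert(args && out);

	switch (nargs) {
	case 2:
		out->has_id = 0;
		out->cdev_id = 0;
		out->lower = args[0];
		out->upper = args[1];
		return 0;
	case 3:
		out->has_id = 1;
		out->cdev_id = args[0];
		out->lower = args[1];
		out->upper = args[2];
		return 0;
	default:
		return -EINVAL;
	}
}

/* Legacy specs match on the node only; indexed specs also need the id. */
static int cooling_spec_matches(const struct cooling_spec *spec,
				const void *spec_node,
				const void *cdev_node, uint32_t cdev_id)
{
	if (spec_node != cdev_node)
		return 0;
	return !spec->has_id || spec->cdev_id == cdev_id;
}
```

The mitigation limits move by one cell in the indexed format, which is
why the limits are read from different positions in each case.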

Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20260526140802.1059293-21-daniel.lezcano@oss.qualcomm.com

thermal/of: Pass cdev_id and introduce devm registration helper

Extend the OF cooling device registration to support an explicit
cooling device identifier (cdev_id), preparing for upcoming DT
bindings where cooling devices are identified by a tuple (device node,
id) instead of relying on child nodes.

Introduce a new helper:

devm_thermal_of_cooling_device_register()

which registers a cooling device using the device's of_node and an
explicit cdev_id. This complements the existing
devm_thermal_of_child_cooling_device_register() helper, which
remains dedicated to the legacy child-node based bindings.

Internally, factorize the devm registration logic into a common
helper to avoid code duplication.

Existing users are unaffected, as the child-based helper continues
to pass a default cdev_id of 0, preserving current behavior.

This change is a preparatory step for supporting indexed cooling
devices in thermal OF bindings.

Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20260526140802.1059293-20-daniel.lezcano@oss.qualcomm.com

thermal/of: Add cooling device ID support

Introduce an identifier (cdev_id) for cooling devices registered from
device tree.

This prepares support for a new DT binding where cooling devices are
identified by a tuple (device node, ID), instead of relying on child
nodes.

Existing users are updated to pass a default ID of 0, preserving the
current behavior.

Future changes will extend the cooling map parsing to match cooling
devices based on both the device node and the ID.

No functional change intended.

Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20260526140802.1059293-19-daniel.lezcano@oss.qualcomm.com

thermal/of: Rename the devm_thermal_of_cooling_device_register() function

To clarify that the function operates on child nodes, rename:

          devm_thermal_of_cooling_device_register()
                             |
                             v
       devm_thermal_of_child_cooling_device_register()

Used the command:

      find . -type f -name '*.[ch]' -exec \
        sed -i 's/devm_thermal_of_cooling_device_register/devm_thermal_of_child_cooling_device_register/g' {} \;

Did not use clang-format-diff because it does not indent correctly
and checkpatch complained. Manually reindented to make checkpatch
happy.

This prepares for upcoming support of cooling devices identified by
an ID rather than device tree child nodes.

No functional change.

Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20260526140802.1059293-18-daniel.lezcano@oss.qualcomm.com

thermal/core: Make cooling device OF node conditional on CONFIG_THERMAL_OF

The device node pointer stored in struct thermal_cooling_device is
only used by the OF-specific thermal code to associate cooling devices
with thermal zones defined in device tree.

Now that OF and non-OF registration paths are separated and non-OF
users no longer rely on devm_thermal_of_cooling_device_register() with
a NULL device node, the np field is no longer required for non-OF
configurations.

Make this field conditional on CONFIG_THERMAL_OF to reduce memory
footprint and better reflect its usage.

Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20260526140802.1059293-16-daniel.lezcano@oss.qualcomm.com

thermal/of: Move cooling device OF helpers out of thermal core

The functions:
- thermal_of_cooling_device_register()
- devm_thermal_of_cooling_device_register()

are specific to device tree usage but are currently implemented in
thermal_core.c.

Move them to thermal_of.c to better reflect the separation between
generic thermal core code and OF-specific logic.

This change is enabled by the recent split of the cooling device
registration into allocation and addition phases, allowing OF-specific
handling (such as device node assignment) to be isolated from the core.

No functional change intended.

Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20260526140802.1059293-17-daniel.lezcano@oss.qualcomm.com

hwmon: Use non-OF thermal cooling device registration API

Some HWMON drivers register cooling devices using the OF helper
devm_thermal_of_cooling_device_register() with a NULL device node.

With the introduction of a dedicated non-OF registration API,
switch these users to devm_thermal_cooling_device_register()
to make the intent explicit and avoid relying on OF-specific helpers.

This is a pure refactoring with no functional change.

Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20260526140802.1059293-15-daniel.lezcano@oss.qualcomm.com

thermal/core: Add devm_thermal_cooling_device_register()

Introduce a device-managed variant of the non-OF cooling device
registration API.

This complements devm_thermal_of_cooling_device_register() and allows
non-device-tree users to register cooling devices with automatic
cleanup tied to the device lifecycle.

The helper relies on devm_add_action_or_reset() to release the cooling
device via thermal_cooling_device_release() on driver detach or probe
failure.

This keeps the API consistent across OF and non-OF users and avoids
manual cleanup in error paths.
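
The "add action or reset" idea can be sketched in user space as below.
This is a simplified model, not the kernel's devres implementation:
struct device, struct action, add_action_or_reset(), release_all() and
mark_released() are hypothetical names for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical miniature of a device-managed action list. */
struct action {
	void (*fn)(void *data);
	void *data;
	struct action *next;
};

struct device {
	struct action *actions;	/* most recently added first */
};

/*
 * Register a cleanup action. On allocation failure, run the action
 * immediately so the caller never has to undo anything by hand.
 */
static int add_action_or_reset(struct device *dev, void (*fn)(void *),
			       void *data)
{
	struct action *a;

	assert(dev && fn);

	a = malloc(sizeof(*a));
	if (!a) {
		fn(data);
		return -ENOMEM;
	}
	a->fn = fn;
	a->data = data;
	a->next = dev->actions;
	dev->actions = a;
	return 0;
}

/* Driver detach or probe failure: run the actions in reverse order. */
static void release_all(struct device *dev)
{
	while (dev->actions) {
		struct action *a = dev->actions;

		dev->actions = a->next;
		a->fn(a->data);
		free(a);
	}
}

/* Stand-in for the cooling device release callback. */
static void mark_released(void *data)
{
	*(int *)data = 1;
}
```

In this model the registration helper would register the cooling device,
then queue its release with add_action_or_reset(), so every exit path
ends in the same cleanup.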

Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20260526140802.1059293-14-daniel.lezcano@oss.qualcomm.com

thermal/core: Introduce non-OF thermal_cooling_device_register()

Split the cooling device registration API into OF and non-OF variants.

Introduce thermal_cooling_device_register() for non-device-tree users
and rework thermal_of_cooling_device_register() to use the new
alloc/add split.

This removes the need for the internal __thermal_cooling_device_register()
helper and makes the separation between OF and non-OF users explicit.

Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20260526140802.1059293-13-daniel.lezcano@oss.qualcomm.com

thermal/drivers/samsung: Enable TMU by default

The SoC Thermal Management Unit (TMU) is essential for proper operation
of the SoCs.  The kernel should not ask users to choose drivers when
the choice is obvious and developers know that the answer should be
'yes' or 'module'.

Impact of making it default:

1. arm64 defconfig: No changes, already present in defconfig.

2. arm32: No changes, the driver is already selected by MACH_EXYNOS.

3. COMPILE_TEST builds: enabled by default for arm32 or arm64 builds,
   whenever ARCH_EXYNOS is selected.  This has an impact on build time
   and feels logical, because anyone who selects ARCH_EXYNOS probably
   wants to build-test it entirely by default.  Kernels with
   COMPILE_TEST are not supposed to be used for booting.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Link: https://patch.msgid.link/20260526135312.8697-2-krzysztof.kozlowski@oss.qualcomm.com

thermal/driver/qoriq: Workaround unexpected temperature readings from tmu

Invalid temperature measurements may be observed across the temperature
range specified in the device data sheet. The invalid temperature can
be read from any remote site and from any capture or report registers.
The invalid change in temperature can be positive or negative and the
resulting temperature can be outside the calibrated range, in which
case the TSR[ORL] or TSR[ORH] bit will be set.

Workaround:
Use the rising/falling edge threshold to filter out the invalid temp.
Check the TIDR register to make sure no jump happens when reading the temp.

i.MX93 ERR052243:
(https://www.nxp.com/webapp/Download?colCode=IMX93_2P87F&appType=license)

Signed-off-by: Jacky Bai <ping.bai@nxp.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Link: https://patch.msgid.link/20260430-imx93_tmu-v6-3-485459d7b54f@nxp.com

thermal/drivers/qoriq: Add i.MX93 tmu support

For the Thermal Monitor Unit (TMU) used on i.MX93, the HW revision info
read from the ID register is the same as the one used on some QorIQ
platforms, but the configuration differs slightly. Add the i.MX93
compatible string and the corresponding code for it.

Signed-off-by: Alice Guo <alice.guo@nxp.com>
Signed-off-by: Jacky Bai <ping.bai@nxp.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260430-imx93_tmu-v6-2-485459d7b54f@nxp.com

dt-bindings: thermal: qoriq: Add compatible string for imx93

Add the i.MX93 compatible string 'fsl,imx93-tmu' because the Thermal
Monitor Unit (TMU) on i.MX93 differs from the QorIQ platform and is not
fully compatible with existing platforms, such as fsl,qoriq-tmu.

Signed-off-by: Jacky Bai <ping.bai@nxp.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Tested-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20260430-imx93_tmu-v6-1-485459d7b54f@nxp.com

thermal/drivers/spacemit/k1: Add thermal sensor support

The thermal sensor on K1 supports monitoring five temperature zones.
The driver registers these sensors with the thermal framework
and supports standard operations:
- Reading temperature (millidegree Celsius)
- Setting high/low thresholds for interrupts

Signed-off-by: Shuwei Wu <shuwei.wu@mailbox.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Anand Moon <linux.amoon@gmail.com>
Reviewed-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Reviewed-by: Yao Zi <me@ziyao.cc>
Tested-by: Anand Moon <linux.amoon@gmail.com>
Tested-by: Vincent Legoll <legoll@online.fr> # OrangePi-RV2
Tested-by: Gong Shuai <gsh517025@gmail.com>
Link: https://patch.msgid.link/20260427-k1-thermal-v5-2-df39187480ed@mailbox.org

dt-bindings: thermal: Add SpacemiT K1 thermal sensor

Document the SpacemiT K1 Thermal Sensor, which supports
monitoring temperatures for five zones: soc, package, gpu, cluster0,
and cluster1.

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Shuwei Wu <shuwei.wu@mailbox.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Link: https://patch.msgid.link/20260427-k1-thermal-v5-1-df39187480ed@mailbox.org

thermal/drivers/imx: Do not split quoted string across lines

The checkpatch tool warns against splitting quoted strings across
multiple lines. Join the dev_info message into a single line to
improve the ability to grep for the message in the source.

Signed-off-by: Mayur Kumar <kmayur809@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Link: https://patch.msgid.link/20260511174255.215207-1-kmayur809@gmail.com

thermal/of: Fix trailing whitespace and repeated word

Correct a trailing whitespace error on line 101 and remove a
duplicated "which" in the kernel-doc comment for thermal_of_zone_register.

Signed-off-by: Mayur Kumar <kmayur809@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Link: https://patch.msgid.link/20260511161854.193573-1-kmayur809@gmail.com

thermal/drivers/qcom/tsens: Atomic temperature read with hardware-guided retries

The existing TSENS temperature read logic polls the valid bit and then
reads the temperature register. When temperature reads are triggered
at very short intervals, this can race with hardware updates and allow
the temperature field to be read while it is still being updated.

In this case, the valid bit may already be asserted even though the
temperature value is transitioning, resulting in an incorrect reading.

Hardware programming guidelines require the temperature value and the
valid bit to be sampled atomically in the same read transaction. A
reading is considered valid only if the valid bit is observed set in
that same sample.

The guidelines further specify that software should attempt the
temperature read up to three times to account for transient update
windows. If none of the attempts yields a valid sample, a stable fallback
value must be returned: if the first and second samples match, the second
value is returned; otherwise, if the second and third samples match, the
third value is returned; if neither pair matches, -EAGAIN is returned.

Update the TSENS sensor read logic to implement atomic sampling along
with the recommended retry-and-compare fallback behavior. This removes
the race window and ensures deterministic temperature values in
accordance with hardware requirements.
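
The retry-and-compare behavior described above can be sketched in
user-space C as follows. This is not the driver code: VALID_BIT,
TEMP_MASK, read_reg_fn, read_temp_atomic() and the replay helper are
hypothetical, the register layout is assumed, and sign handling of the
temperature field is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define VALID_BIT	(1u << 31)	/* hypothetical bit position */
#define TEMP_MASK	0x0fffu		/* hypothetical field width */
#define MAX_TRIES	3

/* Hypothetical accessor: one read returns valid bit and value together. */
typedef uint32_t (*read_reg_fn)(void *ctx);

static int read_temp_atomic(read_reg_fn read_reg, void *ctx, int *temp)
{
	uint32_t raw[MAX_TRIES];
	int i;

	assert(read_reg && temp);

	for (i = 0; i < MAX_TRIES; i++) {
		/* The valid bit is checked on the same sample as the value. */
		raw[i] = read_reg(ctx);
		if (raw[i] & VALID_BIT) {
			*temp = (int)(raw[i] & TEMP_MASK);
			return 0;
		}
	}

	/* No valid sample: accept two consecutive matching values. */
	if ((raw[0] & TEMP_MASK) == (raw[1] & TEMP_MASK))
		*temp = (int)(raw[1] & TEMP_MASK);
	else if ((raw[1] & TEMP_MASK) == (raw[2] & TEMP_MASK))
		*temp = (int)(raw[2] & TEMP_MASK);
	else
		return -EAGAIN;

	return 0;
}

/* Test helper: replays a fixed sequence of register values. */
struct replay {
	const uint32_t *values;
	int pos;
};

static uint32_t replay_read(void *ctx)
{
	struct replay *r = ctx;

	return r->values[r->pos++];
}
```

The loop stops at the first sample whose valid bit is set, so the
fallback comparison only runs when all three attempts were invalid.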

Signed-off-by: Priyansh Jain <priyansh.jain@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://patch.msgid.link/20260514113643.1954111-1-priyansh.jain@oss.qualcomm.com

dt-bindings: thermal: qcom-tsens: Document the Hawi Temperature Sensor

Document the Temperature Sensor (TSENS) on the Qualcomm Hawi SoC.

Signed-off-by: Dipa Ramesh Mantre <dipa.mantre@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260512-dtbinding-hawi-v1-1-96149d06cccf@oss.qualcomm.com

dt-bindings: thermal: qcom-tsens: Document the Shikra Temperature Sensor

Document the Temperature Sensor (TSENS) on the Shikra SoC.

Signed-off-by: Gaurav Kohli <gaurav.kohli@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260513-tsens_binding-v1-1-1780c6a6caf2@oss.qualcomm.com

thermal/drivers/qcom: Fix typo in comment

Fix a typo in the struct tsens_irq_data comment.
Replace "uppper" with "upper".

Signed-off-by: Jinseok Kim <always.starving0@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Acked-by: Amit Kucheria <amitk@kernel.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://patch.msgid.link/20260516152324.1863-1-always.starving0@gmail.com

thermal/drivers/amlogic: Add support for secure monitor calibration readout

Some SoCs (e.g. T7) expose thermal calibration data through the secure
monitor rather than a directly accessible eFuse register. Add a use_sm
flag to amlogic_thermal_data to select this path, and retrieve the
firmware handle and tsensor_id from the "amlogic,secure-monitor" DT
phandle with one fixed argument.

Also introduce the amlogic,t7-thermal compatible using this new path.

While refactoring, fix a pre-existing bug where
amlogic_thermal_initialize() was called after
devm_thermal_of_zone_register(), causing the thermal framework to
read an uninitialized trim_info on zone registration.

Signed-off-by: Ronald Claveau <linux-kernel-dev@aliel.fr>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20260424-add-thermal-t7-vim4-v5-4-9040ca36afe2@aliel.fr

thermal/drivers/amlogic: Add missing dependency on MESON_SM

The amlogic thermal driver calls meson_sm_get() and
meson_sm_get_thermal_calib() which are exported by the meson_sm
driver. Without CONFIG_MESON_SM enabled, the build fails with
undefined references to these symbols.

Add a proper Kconfig dependency on MESON_SM instead of relying on
stub functions, which makes the dependency explicit and prevents
invalid configurations.

Closes: https://lore.kernel.org/oe-kbuild-all/202605291530.en7aGn7w-lkp@intel.com/
Reported-by: Mark Brown <broonie@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ronald Claveau <linux-kernel-dev@aliel.fr>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Link: https://lore.kernel.org/oe-kbuild-all/202605291530.en7aGn7w-lkp@intel.com/
Link: https://patch.msgid.link/20260602-fix-missing-meson_sm-symbol-v3-1-6f7f69cd7d6c@aliel.fr

Merge patch series "super: retire sget()"

Christian Brauner <brauner@kernel.org> says:

CIFS plus the two ext4 KUnit tests (extents-test, mballoc-test) were the
last in-tree callers, and all three convert cleanly to sget_fc(). That
lets sget() and its prototype come out, taking ~60 lines that only
existed to be kept in lockstep with sget_fc() on every publish-path
change.

* patches from https://patch.msgid.link/20260529-work-sget-v2-0-57bbe08604e4@kernel.org:
  fs: retire sget()
  smb: client: convert cifs_smb3_do_mount() to sget_fc()
  ext4: convert mballoc KUnit test to sget_fc()
  ext4: convert extents KUnit test to sget_fc()

Link: https://patch.msgid.link/20260529-work-sget-v2-0-57bbe08604e4@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>

fs: retire sget()

sget() and sget_fc() have lived side by side as near-duplicate
find-or-create-and-publish helpers for the legacy and fs_context mount
APIs. The three remaining in-tree callers (CIFS plus the ext4 extents
and mballoc KUnit tests) have all been moved to sget_fc(). Nothing
calls sget() anymore.

Delete sget() from fs/super.c and the prototype in <linux/fs.h>.
Update the two comments that referred to "sget()" or "sget{_fc}()" to
just say "sget_fc()".

This removes ~60 lines of code that only existed to be kept in
lockstep with sget_fc() on every superblock publish-path change.

Link: https://patch.msgid.link/20260529-work-sget-v2-4-57bbe08604e4@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

smb: client: convert cifs_smb3_do_mount() to sget_fc()

The CIFS mount path already runs through fs_context: smb3_get_tree()
calls smb3_get_tree_common() with a struct fs_context * in hand. But
the fc is dropped on the way to sget(). Plumb it through to sget_fc()
so the legacy sget() interface can go.

cifs_smb3_do_mount() now takes (struct fs_context *, struct
smb3_fs_context *). The old (fs_type, flags) pair is reconstructed
from fc->fs_type and fc->sb_flags. The flags argument was always
passed as 0 by the sole caller anyway. The cifs_dbg diagnostic now
prints fc->sb_flags directly.

cifs_match_super() and cifs_set_super() were the two void-data
callbacks for sget(). The match callback now takes
(struct super_block *, struct fs_context *) and reads struct
cifs_mnt_data out of fc->sget_key. The set callback is gone entirely:
sget_fc() pre-populates sb->s_fs_info from fc->s_fs_info before
invoking set() so set_anon_super_fc() (which just allocates an anon
bdev) is sufficient.

Before sget_fc() we stash cifs_sb in fc->s_fs_info, the per-mount data
in fc->sget_key and force fc->sb_flags to SB_NODIRATIME | SB_NOATIME
to reproduce the previous hard-coded behaviour (alloc_super() reads
fc->sb_flags). The original sb_flags is saved and restored around the
call so the rest of the mount path sees the same fc semantics as
before.
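
The stash, override and restore sequence can be sketched as below. This
is a user-space model rather than the CIFS code: struct
fs_context_sketch, find_sb_fn, find_sb_with_forced_flags() and
record_flags() are hypothetical names, and the flag values are
placeholders for this sketch.

```c
#include <assert.h>

#define SB_NOATIME_SKETCH	(1u << 10)	/* placeholder value */
#define SB_NODIRATIME_SKETCH	(1u << 11)	/* placeholder value */

/* Hypothetical subset of the fields named in the commit message. */
struct fs_context_sketch {
	unsigned int sb_flags;
	void *s_fs_info;
	void *sget_key;
};

/* Stand-in for the find-or-create superblock call. */
typedef int (*find_sb_fn)(struct fs_context_sketch *fc);

static int find_sb_with_forced_flags(struct fs_context_sketch *fc,
				     void *sb_info, void *mnt_data,
				     find_sb_fn find_sb)
{
	unsigned int saved_flags;
	int ret;

	assert(fc && find_sb);

	/* Stash the per-superblock info and the match key. */
	fc->s_fs_info = sb_info;
	fc->sget_key = mnt_data;

	/* Force the historical flags only for the duration of the call. */
	saved_flags = fc->sb_flags;
	fc->sb_flags = SB_NODIRATIME_SKETCH | SB_NOATIME_SKETCH;
	ret = find_sb(fc);
	fc->sb_flags = saved_flags;

	return ret;
}

/* Test helper: records the flags visible during the lookup. */
static unsigned int flags_seen;

static int record_flags(struct fs_context_sketch *fc)
{
	flags_seen = fc->sb_flags;
	return 0;
}
```

The lookup sees the forced flags, while code running after the call sees
the caller's original sb_flags again.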

mnt_data.flags keeps its historical value of 0 so the CIFS_MS_MASK
comparison in compare_mount_options() returns the same (always-equal)
result.

No functional change. With this in place sget() has no remaining CIFS
caller.

Link: https://patch.msgid.link/20260529-work-sget-v2-3-57bbe08604e4@kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

ext4: convert mballoc KUnit test to sget_fc()

Same treatment as the extents KUnit test. The mballoc test uses sget()
as a thin "give me an initialized superblock" wrapper for a fake
file_system_type. Move it onto sget_fc() so sget() can go away.

Add a no-op mbt_init_fs_context() so fs_context_for_mount() has
something to call on the fake fs_type. mbt_set() now takes a struct
fs_context * (still a no-op). mbt_ext4_alloc_super_block() allocates
the fc, hands it to sget_fc() and drops the fc reference once the sb
is published.

No functional change.

Link: https://patch.msgid.link/20260529-work-sget-v2-2-57bbe08604e4@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

ext4: convert extents KUnit test to sget_fc()

The extents KUnit test uses sget() to get an initialized superblock for
its fake file_system_type. sget() predates fs_context and we want to
retire it. Switch this caller over to sget_fc().

Add a no-op ext_init_fs_context() so fs_context_for_mount() has
something to call on the fake fs_type. ext_set() now takes a struct
fs_context * (still a no-op). extents_kunit_init() allocates the fc,
hands it to sget_fc() and drops the fc reference once the sb is
published. sget_fc() does not retain a pointer to it.

No functional change for the test.

Link: https://patch.msgid.link/20260529-work-sget-v2-1-57bbe08604e4@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

dma-mapping: direct: fix missing mapping for THRU_HOST_BRIDGE segments

In dma_direct_map_sg(), the case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE
incorrectly used 'break' instead of falling through to MAP_NONE.
As a result, segments traversing the host bridge skipped the required
dma_direct_map_phys() call entirely, leaving sg->dma_address
uninitialized and leading to DMA failures. Fix this by using
'fallthrough;'.
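
The control-flow difference can be sketched in user-space C as below.
This is not the code in dma_direct_map_sg(): the enum, struct segment,
map_phys() and map_segment() are hypothetical, and a comment marks the
fall-through where the kernel uses its 'fallthrough;' macro.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the mapping types named in the commit. */
enum p2p_map_type {
	MAP_NONE,
	MAP_THRU_HOST_BRIDGE,
	MAP_BUS_ADDR,
};

struct segment {
	uint64_t phys;
	uint64_t dma_address;	/* 0 means "not mapped" in this sketch */
};

/* Stand-in for the direct mapping call: a fixed offset. */
static uint64_t map_phys(uint64_t phys)
{
	return phys + 0x1000;
}

static void map_segment(struct segment *sg, enum p2p_map_type type,
			uint64_t bus_addr)
{
	assert(sg);

	switch (type) {
	case MAP_THRU_HOST_BRIDGE:
		/*
		 * Traffic through the host bridge is mapped like ordinary
		 * memory. A 'break' here would skip the mapping below.
		 */
		/* fall through */
	case MAP_NONE:
		sg->dma_address = map_phys(sg->phys);
		break;
	case MAP_BUS_ADDR:
		sg->dma_address = bus_addr;
		break;
	}
}
```

With a 'break' in the first case, dma_address would keep whatever value
it had before the call, which is the failure the commit describes.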

Fixes: a25e7962db0d79 ("PCI/P2PDMA: Refactor the p2pdma mapping helpers")
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260603013723.2439-1-lirongqing@baidu.com

ipv6: anycast: insert aca into global hash under idev->lock

syzbot reported a splat [1]: a slab-use-after-free in
ipv6_chk_acast_addr(), which walks the global inet6_acaddr_lst[] hash
under RCU and dereferences a struct ifacaddr6 that has already been
freed while still linked in the hash, so a later reader walks into a
dangling node.

In __ipv6_dev_ac_inc() the aca is allocated with refcount 1, then
aca_get() bumps it to 2 to keep it alive across the unlocked region.
It is published to idev->ac_list under idev->lock, but
ipv6_add_acaddr_hash() runs after write_unlock_bh(). A concurrent
teardown (ipv6_ac_destroy_dev() from addrconf_ifdown(), under RTNL)
can slip into that window:

  CPU0 __ipv6_dev_ac_inc           CPU1 ipv6_ac_destroy_dev (RTNL)
  ------------------------------   ------------------------------------
  aca_alloc()              refcnt 1
  aca_get()               refcnt 2
  write_lock_bh(idev->lock)
    add aca to ac_list
  write_unlock_bh(idev->lock)
                                   write_lock_bh(idev->lock)
                                     pull aca off ac_list
                                   write_unlock_bh(idev->lock)
                                   ipv6_del_acaddr_hash(aca)
                                     hlist_del_init_rcu() is a no-op,
                                     aca is not in the hash yet
                                   aca_put()           refcnt 2->1
  ipv6_add_acaddr_hash(aca)
    aca now inserted into the hash
  aca_put()                refcnt 1->0
    call_rcu(aca_free_rcu) -> kfree(aca)

The hash removal becomes a no-op because the insertion has not
happened yet, so once CPU0 inserts and drops the last reference, the
aca is freed while still linked in inet6_acaddr_lst[], and readers
dereference freed memory after the slab slot is reused.

This window opened once RTNL stopped serializing the join path against
device teardown. Move ipv6_add_acaddr_hash() inside the idev->lock
section so the ac_list and hash insertions are atomic with respect to
teardown: a racing remover now either misses the aca entirely or finds
it in both lists.

acaddr_hash_lock is now nested under idev->lock, which is acquired in
softirq context, so switch all acaddr_hash_lock sites to spin_lock_bh()
to avoid the irq lock inversion reported in [2].
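
The interleaving above can be replayed in a single-threaded user-space
model, sketched below. It uses no real locking or RCU and is not the
kernel code: struct aca_model, aca_model_put(), teardown() and
replay_join() are hypothetical, and the teardown is assumed to run
immediately after the join path leaves its locked section.

```c
#include <assert.h>

/* Hypothetical model of an anycast entry, not struct ifacaddr6. */
struct aca_model {
	int refcnt;
	int in_ac_list;
	int in_hash;
	int freed;
};

static void aca_model_put(struct aca_model *aca)
{
	assert(aca->refcnt > 0);
	if (--aca->refcnt == 0)
		aca->freed = 1;	/* stands in for call_rcu() + kfree() */
}

/* Teardown: unlink from the list, unhash, drop the list's reference. */
static void teardown(struct aca_model *aca)
{
	aca->in_ac_list = 0;
	aca->in_hash = 0;	/* a no-op if the entry was never hashed */
	aca_model_put(aca);
}

/*
 * Replay the join path with teardown running right after the locked
 * section. hash_under_lock selects the fixed ordering.
 */
static void replay_join(struct aca_model *aca, int hash_under_lock)
{
	aca->refcnt = 2;	/* allocation reference + aca_get() */
	aca->in_ac_list = 0;
	aca->in_hash = 0;
	aca->freed = 0;

	/* Locked section of the join path. */
	aca->in_ac_list = 1;
	if (hash_under_lock)
		aca->in_hash = 1;

	/* Lock dropped: the racing teardown gets in here. */
	teardown(aca);

	/* Join path resumes. */
	if (!hash_under_lock)
		aca->in_hash = 1;
	aca_model_put(aca);
}
```

With the old ordering the entry ends up freed but still hashed; with the
insertion inside the locked section it ends up freed and unhashed.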

[1] https://syzkaller.appspot.com/bug?extid=a01df04303c131efbf3a
[2] https://lore.kernel.org/netdev/6a194ef7.ba3b1513.1890b4.0000.GAE@google.com/

Reported-by: syzbot+819eb928d120d2bdad0e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6a191f87.ce022c6e.138e56.0003.GAE@google.com/T/
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Fixes: eb1ac9ff6c4a ("ipv6: anycast: Don't hold RTNL for IPV6_JOIN_ANYCAST.")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260529152219.235475-1-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

kbuild: Remove unnecessary 'T' modifier in cmd_ar_builtin_fixup

In cmd_ar_builtin_fixup, the 'T' modifier was added to '$(AR) mPi' to
work around a bug in llvm-ar that caused thin archives to be silently
converted to full archives [1]. Since commit 20c098928356 ("kbuild: Bump
minimum version of LLVM for building the kernel to 15.0.0"), all
supported versions of llvm-ar have this issue fixed, so the 'T' modifier
and comment can be removed.

Link: https://github.com/llvm/llvm-project/commit/d17c54d17de22d2961a04163f3dbc8e973de89b8
Signed-off-by: Nathan Chancellor <nathan@kernel.org>

n64cart: use strscpy in n64cart_probe

strcpy() has been deprecated [1] because it performs no bounds checking
on the destination buffer, which can lead to buffer overflows. While the
current code works correctly, replace strcpy() with the safer strscpy()
to follow secure coding best practices.
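
The difference in behavior can be sketched with a user-space
approximation of a bounded copy. bounded_copy() is a hypothetical helper
written for this sketch, not the kernel's strscpy() implementation; it
only mirrors the documented behavior of always NUL-terminating and
returning -E2BIG on truncation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Copy at most size - 1 bytes, always NUL-terminate when size > 0, and
 * report truncation instead of writing past the destination.
 */
static long bounded_copy(char *dst, const char *src, size_t size)
{
	size_t len;

	assert(dst && src);

	if (size == 0)
		return -E2BIG;

	for (len = 0; len < size; len++) {
		dst[len] = src[len];
		if (src[len] == '\0')
			return (long)len;	/* copied length, without NUL */
	}

	/* Source did not fit: terminate and signal truncation. */
	dst[size - 1] = '\0';
	return -E2BIG;
}
```

Unlike strcpy(), the copy stops at the end of the destination buffer, and
the caller can detect truncation from the return value.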

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20260517172617.3954-2-thorsten.blum@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/nop: Drop a wrong comment in struct io_nop

This was copy-pasted from io_rw, where the comment actually makes sense.
Here, we don't have struct kiocb. Drop the comment.

Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://patch.msgid.link/20260602215327.1885109-4-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/net: Remove async_size for OP_LISTEN

OP_LISTEN does not use async_data. Remove it.

Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://patch.msgid.link/20260602215327.1885109-3-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/net: Avoid msghdr on op_connect/op_bind async data

Both IORING_OP_CONNECT and IORING_OP_BIND reuse the msghdr object just
to store the sockaddr. Beyond allocating a much larger object than
needed, msghdr can also wrap an iovec, which will be recycled
unnecessarily. This uses the sockaddr directly.

Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://patch.msgid.link/20260602215327.1885109-2-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Input: atkbd - add DMI quirk for Lenovo Yoga Air 14 (83QK)

The Lenovo Yoga Air 14 (83QK) laptop keyboard becomes unresponsive
after the standard atkbd init sequence. Controlled testing on the
actual hardware shows the F5 (ATKBD_CMD_RESET_DIS / deactivate)
command specifically corrupts the EC state, causing zero IRQ1
interrupts after init.

Skipping only the deactivate command (while keeping F4 ENABLE)
resolves the issue completely: both keystroke input and CapsLock
LED toggle work correctly. The reverse test - skipping only F4
while keeping F5 - makes the problem worse (zero keystroke
interrupts), confirming F5 is the sole culprit.

Add a DMI quirk entry for LENOVO/83QK using the existing
atkbd_deactivate_fixup callback, consistent with the existing
entries for LG Electronics and HONOR FMB-P that address the
same EC F5 deactivate issue.

Signed-off-by: Zeyu WANG <zeyu.thomas.wang@gmail.com>
Link: https://patch.msgid.link/20260602170909.14725-1-zeyu.thomas.wang@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

cgroup/cpuset: Change Ridong's email

The chenridong@huaweicloud.com is no longer a valid email,
replace it with the personal email ridong.chen@linux.dev

Signed-off-by: Ridong Chen <ridong.chen@linux.dev>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

sched_ext: Don't warn on NULL cgrp_moving_from in scx_cgroup_move_task()

A WARN fires when systemd's user manager writes "+cpu +memory +pids" to
its own subtree_control while a sched_ext scheduler is loaded:

  WARNING: at kernel/sched/ext.c:3227 scx_cgroup_move_task+0xa8/0xb0
   scx_cgroup_move_task+0xa8/0xb0
   sched_move_task+0x134/0x290
   cpu_cgroup_attach+0x39/0x70
   cgroup_migrate_execute+0x37d/0x450
   cgroup_update_dfl_csses+0x1e3/0x270
   cgroup_subtree_control_write+0x3e7/0x440

scx_cgroup_can_attach() arms cgrp_moving_from only when a task's cpu
cgroup changes. It can still be NULL when scx_cgroup_move_task() runs,
through this sequence:

  Step                               Result
  ---------------------------------  ----------------------------------
  1. cpu enabled on cgroup G         cpu css = A
  2. cpu toggled off then on for G   A killed, B created (same cgroup)
  3. an exiting task keeps A alive   migration skips it, A now stale
  4. +memory migrates G              stale A vs current B pulls cpu in
  5. cpu attach runs for all tasks   hits a live, cpu-unchanged task
  6. scx_cgroup_move_task() on it    cgrp_moving_from NULL -> WARN

The mismatch is that scx_cgroup_can_attach() keys on cgroup identity
while migration drives the move on css identity, so a NULL cgrp_moving_from
here is a legitimate css-only migration, not a missing prep.

The call is already gated on cgrp_moving_from, so just drop the warning.
ops.cgroup_prep_move() and ops.cgroup_move() stay paired.

Fixes: 819513666966 ("sched_ext: Add cgroup support")
Cc: stable@vger.kernel.org # v6.12+
Reported-by: Matt Fleming <mfleming@cloudflare.com>
Closes: https://lore.kernel.org/all/20260601124156.2205704-1-mfleming@cloudflare.com/
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>

sctp: diag: reject stale associations in dump_one path

The SCTP exact sock_diag lookup can hold a transport reference, block on
lock_sock(sk), and then resume after sctp_association_free() has marked
the association dead and freed its bind address list.

When that happens, inet_assoc_attr_size() and
inet_diag_msg_sctpasoc_fill() can still dereference association state
that is no longer valid for reporting. In particular,
inet_diag_msg_sctpasoc_fill() may read an empty bind-address list as a
real sctp_sockaddr_entry and trigger an out-of-bounds read from
unrelated association memory.

Reject the association after taking the socket lock if it has been
reaped or detached from the endpoint, and report the lookup as stale.
This keeps the exact dump-one path from formatting torn association
state.

Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Zhengchuan Liang <zcliangcn@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Zhao Zhang <zzhan461@ucr.edu>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/fac6043fa20a2ff68e12958c431836f692c51268.1780113823.git.zzhan461@ucr.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fec: fix pinctrl default state restore order on resume

In fec_resume(), fec_enet_clk_enable() is called before
pinctrl_pm_select_default_state() in the non-WoL path, inverting the
ordering used in fec_suspend() which correctly switches to the sleep
pinctrl state before disabling clocks.

For PHYs with the PHY_RST_AFTER_CLK_EN flag (e.g. TI DP83848 or
SMSC LAN87xx), fec_enet_clk_enable() triggers a hardware reset pulse
via the phy-reset GPIO. With the GPIO pin still in sleep pinctrl state
at that point, the GPIO write has no physical effect and the PHY never
receives the required reset after clock enable, leading to unreliable
link establishment after system resume.

Fix by restoring the default pinctrl state before enabling clocks,
making resume the proper mirror of suspend. The call is made
unconditionally: fec_suspend() only switches to the sleep pinctrl state
on the non-WoL path and leaves the pins in the default state when WoL
is enabled, so on a WoL resume the device is already in the default
state and pinctrl_pm_select_default_state() is a no-op.

Fixes: de40ed31b3c5 ("net: fec: add Wake-on-LAN support")
Signed-off-by: Tapio Reijonen <tapio.reijonen@vaisala.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260529-b4-fec-resume-pinctrl-order-v3-1-6eda0f592fca@vaisala.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/rseq: Add config fragment

Currently there is no config fragment for the rseq selftests but there are
a couple of configuration options which are required for running them:

- CONFIG_RSEQ is required for obvious reasons, it is enabled by default
   but it doesn't hurt to specify it in case the user is using a
   defconfig that disables it.

- CONFIG_RSEQ_SLICE_EXTENSION is tested by the slice_test test, the
   test will fail without it.

Add a configuration fragment which enables these options, helping encourage
CI systems and people doing manual testing to run the tests with all the
features. This also requires CONFIG_EXPERT since it is a dependency for
slice extension.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260424-selftests-rseq-config-fragment-v2-1-a9475996edcb@kernel.org

net: lan743x: permit VLAN-tagged packets up to configured MTU

VLAN-tagged interfaces on lan743x devices were previously unreachable via
SSH and failed to respond to large ping packets (e.g. "ping -s 1469" given
MTU=1500). In these scenarios, "ethtool -S" reports non-zero "RX Oversize
Frame Errors". According to Microchip AN2948, the MAC_RX FSE (VLAN field
size enforcement) bit determines whether frames with VLAN tags exceeding
the base MTU plus tag length are discarded.

The driver must set the MAC_RX.FSE bit before setting MAC_RX.RXEN to allow
VLAN-tagged frames up to the interface MTU, preventing them from being
treated as oversized. As a result, both the base and VLAN-tagged interfaces
can use the same MTU without receive errors.

Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: David Thompson <davthompson@nvidia.com>
Reviewed-by: Thangaraj Samynathan <Thangaraj.s@microchip.com>
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Tested-by: Nicolai Buchwitz <nb@tipi-net.de> # lan7430 on arm64 (RevPi
Link: https://patch.msgid.link/20260529210300.433135-1-davthompson@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

x86/platform/uv: Use str_enabled_disabled() in uv_nmi_setup_hubless_intr()

Replace hard-coded strings with the str_enabled_disabled() helper. This
unifies the output and helps the linker with deduplication, which can result
in a smaller binary. Additionally, address the following Coccinelle/coccicheck
warning reported by string_choices.cocci:

opportunity for str_enabled_disabled(uv_pch_intr_now_enabled)
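
As a rough userspace illustration of what the helper replaces: the
function below is a stand-in written for this sketch, and the
format_pch_status() caller and its message text are hypothetical, not
taken from the UV NMI code.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kernel helper of the same name. */
static inline const char *str_enabled_disabled(bool v)
{
	return v ? "enabled" : "disabled";
}

/* Hypothetical caller: formats a status line into buf. */
static int format_pch_status(char *buf, size_t len, bool now_enabled)
{
	/* Before: now_enabled ? "enabled" : "disabled" spelled out inline. */
	return snprintf(buf, len, "PCH NMI %s",
			str_enabled_disabled(now_enabled));
}
```

Routing every call site through one helper keeps the two strings
identical everywhere, which is what makes deduplication possible.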

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Kyle Meyer <kyle.meyer@hpe.com>
Link: https://patch.msgid.link/20260504181945.143928-2-thorsten.blum@linux.dev

net: rds: clear i_sends on setup unwind

The RDS IB connection teardown path is written so it can run during
partial startup and on repeated shutdown attempts. It uses NULL
pointers to distinguish resources that are still owned from resources
that have already been released.

When rds_ib_setup_qp() fails after allocating i_sends but before
allocating i_recvs, the sends_out path frees i_sends without clearing
the pointer. A later shutdown pass can still treat that stale pointer
as a live send ring allocation.

Clear i_sends after vfree() in the error unwind path so the existing
shutdown logic continues to use the correct ownership state.
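
The ownership convention can be sketched in userspace as follows. This
is a minimal model, not the RDS code: struct conn, conn_setup() and
conn_shutdown() are hypothetical names, and calloc()/free() stand in
for the kernel allocators.

```c
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for the connection state. */
struct conn {
	void *sends;	/* NULL means "not owned" */
	void *recvs;
};

/* Setup that can fail after allocating sends; unwinds and clears. */
static int conn_setup(struct conn *c, int fail_recvs)
{
	c->sends = calloc(16, 64);
	if (!c->sends)
		return -1;

	c->recvs = fail_recvs ? NULL : calloc(16, 64);
	if (!c->recvs) {
		free(c->sends);
		c->sends = NULL;	/* otherwise shutdown sees a stale pointer */
		return -1;
	}
	return 0;
}

/* Idempotent teardown: relies on NULL to mean "already released". */
static void conn_shutdown(struct conn *c)
{
	free(c->sends);
	c->sends = NULL;
	free(c->recvs);
	c->recvs = NULL;
}
```

With the pointer cleared in the unwind path, calling the teardown after
a failed setup, or calling it twice, never touches freed memory.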

Fixes: 3b12f73a5c29 ("rds: ib: add error handle")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Zhengchuan Liang <zcliangcn@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Yuqi Xu <xuyq21@lenovo.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Reviewed-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/5a0f7624bb9845a7b67d26166a150b59e7f394ce.1779632468.git.xuyq21@lenovo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

futex/requeue: Prevent NULL pointer dereference in remove_waiter() on self-deadlock

When FUTEX_CMP_REQUEUE_PI requeues a non-top waiter that already owns the
target PI futex, task_blocks_on_rt_mutex() returns -EDEADLK before setting
waiter->task.

The subsequent remove_waiter() in rt_mutex_start_proxy_lock() dereferences
the NULL waiter->task, causing a kernel crash.

Add a self-deadlock check for non-top waiters before calling
rt_mutex_start_proxy_lock(), analogous to the top-waiter check in
futex_lock_pi_atomic().

Fixes: 3bfdc63936dd4773109b7b8c280c0f3b5ae7d349 ("rtmutex: Use waiter::task instead of current in remove_waiter()")
Signed-off-by: Ji'an Zhou <eilaimemedsnaimel@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: stable@vger.kernel.org

vdso/treewide: Drop GENERIC_TIME_VSYSCALL

This Kconfig symbol is not used anymore, remove it.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260519-vdso-generic_time_vsyscal-v1-3-5c2a5905d5f5@linutronix.de

vdso/vsyscall: Gate update_vsyscall() behind CONFIG_GENERIC_GETTIMEOFDAY

Both the compilation of kernel/time/vsyscall.c, which contains the real
definition of update_vsyscall() and the other vDSO definitions in
timekeeper_internal.h use CONFIG_GENERIC_GETTIMEOFDAY and not
CONFIG_GENERIC_TIME_VSYSCALL.

Align the code to use a single Kconfig symbol.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260519-vdso-generic_time_vsyscal-v1-2-5c2a5905d5f5@linutronix.de

riscv: vdso: Drop CONFIG_GENERIC_TIME_VSYSCALL guard around syscall fallbacks

The syscall definitions can be built just fine for 32-bit systems.
Also the guard does not cover __arch_get_hw_counter() which is always
used together with those system call fallbacks. Also this header is
unused when no vDSO is built anyways.

Drop the ifdeffery. The logic will be simpler to understand. Furthermore,
this prepares for the complete removal of CONFIG_GENERIC_TIME_VSYSCALL.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260519-vdso-generic_time_vsyscal-v1-1-5c2a5905d5f5@linutronix.de

vdso/datastore: Mark vdso_k_*_data pointers as __ro_after_init

These pointers are only modified once in vdso_setup_data_pages(),
during the init phase. Make them read-only after that.

Drop __refdata as that would conflict with __ro_after_init.
Modpost does accept the reference from a __ro_after_init symbol to
an __init one.

Fixes: 05988dba1179 ("vdso/datastore: Allocate data pages dynamically")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260513-vdso-ro-after-init-v1-1-4b51f74015a4@linutronix.de

timers/migration: Turn tmigr_hierarchy level_list into a flexible array

The level_list array is allocated separately right after the parent
struct. The size of the array is already known.

Move level_list to the struct tail as a flexible array member and fold the
two allocations into a single kzalloc_flex().
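
A userspace sketch of the layout change, assuming simplified types:
struct level, struct hierarchy and hierarchy_alloc() are illustrative
names, and calloc() stands in for the kernel's zeroing allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Illustrative per-level payload. */
struct level {
	int nr_groups;
};

struct hierarchy {
	unsigned int nr_levels;
	struct level level_list[];	/* flexible array member at the tail */
};

/* One zeroed allocation covers the header and all levels. */
static struct hierarchy *hierarchy_alloc(unsigned int nr_levels)
{
	struct hierarchy *h;

	h = calloc(1, sizeof(*h) +
		      (size_t)nr_levels * sizeof(h->level_list[0]));
	if (h)
		h->nr_levels = nr_levels;
	return h;
}
```

A single allocation also means a single free and one fewer error path
to unwind.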

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Assisted-by: Claude:Opus-4.7
Link: https://patch.msgid.link/20260522231618.41622-1-rosenp@gmail.com

timers/migration: Deactivate per-capacity hierarchies under nohz_full

NOHZ_FULL CPUs' global timers are guaranteed to be handled by the timekeeper
CPU, which never stops its tick and therefore remains active in the
hierarchy.

But since the introduction of per-capacity hierarchies, this guarantee is
broken because the timekeeper may not belong to the same hierarchy as all
the NOHZ_FULL CPUs.

Fix it by simply turning off capacity awareness when NOHZ_FULL is
running and forcing a single hierarchy. NOHZ_FULL is not exactly optimized
powerwise anyway.

Fixes: 098cbaad8e57 ("timers/migration: Split per-capacity hierarchies")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260519220926.63437-3-frederic@kernel.org

timers/migration: Fix hotplug migrator selection target on asymmetric capacity machines

When a top-level migrator is deactivated, either at CPU down hotplug time
or when a CPU is domain isolated, a new migrator is elected among the
available CPUs and woken up to take over the migration duty.

However that election must happen at the scope of a given hierarchy and not
globally, which the introduction of per-capacity hierarchies failed to
handle.

As a result, a given hierarchy may end up without a migrator to handle
global timers.

Fix it by making sure that the new migrator belongs to the same hierarchy
as the outgoing CPU.

Fixes: 098cbaad8e57 ("timers/migration: Split per-capacity hierarchies")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260519220926.63437-2-frederic@kernel.org

sched/cputime: Handle dyntick-idle steal time correctly

The dyntick-idle steal time is currently accounted when the tick restarts
but the stolen idle time is not subtracted from the idle time that was
already accounted. This is to avoid observing the idle time going backward
as the dyntick-idle cputime accessors can't reliably know in advance the
stolen idle time.

In order to maintain a forward progressing idle cputime while subtracting
idle steal time from it, keep track of the previously accounted idle stolen
time and subtract it from _later_ idle cputime accounting.
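
One possible reading of this scheme, sketched in userspace. The struct
and function names are hypothetical and the real accounting code may be
organized differently; the point is only that stolen time is remembered
as a debt and paid off by later idle deltas, so the reported idle time
never decreases.

```c
#include <stdint.h>

struct idle_acct {
	uint64_t idle_ns;	/* reported idle time, never decreases */
	uint64_t steal_debt_ns;	/* stolen idle time not yet subtracted */
};

/* Steal time becomes known only after the idle time was accounted. */
static void account_idle_steal(struct idle_acct *a, uint64_t steal_ns)
{
	a->steal_debt_ns += steal_ns;
}

/* Later idle deltas first pay off the debt, then count as idle. */
static void account_idle(struct idle_acct *a, uint64_t delta_ns)
{
	uint64_t pay = delta_ns < a->steal_debt_ns ?
		       delta_ns : a->steal_debt_ns;

	a->steal_debt_ns -= pay;
	a->idle_ns += delta_ns - pay;
}
```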

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-16-frederic@kernel.org

sched/cputime: Handle idle irqtime gracefully

The dyntick-idle cputime accounting always assumes that interrupt time
accounting is enabled and consequently stops elapsing the idle time during
dyntick-idle interrupts.

This doesn't mix well with disabled interrupt time accounting because
then idle interrupts become a cputime blind-spot. Also this feature is
disabled on most configurations and the overhead of pausing dyntick-idle
accounting while in idle interrupts could then be avoided.

Fix the situation by pausing dyntick-idle accounting during idle
interrupts only if either native vtime (which does interrupt time
accounting) or generic interrupt time accounting is enabled.

Also make sure that the accumulated interrupt time is not accidentally
subtracted from later accounting.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-15-frederic@kernel.org

sched/cputime: Provide get_cpu_[idle|iowait]_time_us() off-case

The last reason why get_cpu_idle/iowait_time_us() may return -1 now is if
the config doesn't support nohz.

The ad-hoc replacement solution by cpufreq is to compute jiffies minus the
whole busy cputime. Although the intention is to provide a coherent low
resolution estimation of the idle and iowait time, the implementation is
buggy because jiffies don't start at 0.

Just provide instead a real get_cpu_[idle|iowait]_time_us() off-case.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-14-frederic@kernel.org

tick/sched: Consolidate idle time fetching APIs

Fetching the idle cputime is available through a variety of accessors all
over the place depending on the different accounting flavours and needs:

  - idle vtime generic accounting can be accessed by kcpustat_field(),
    kcpustat_cpu_fetch(), get_idle/iowait_time() and
    get_cpu_idle/iowait_time_us()

  - dynticks-idle accounting can only be accessed by get_idle/iowait_time()
    or get_cpu_idle/iowait_time_us()

  - CONFIG_NO_HZ_COMMON=n idle accounting can be accessed by kcpustat_field()
    kcpustat_cpu_fetch(), or get_idle/iowait_time() but not by
    get_cpu_idle/iowait_time_us()

Moreover get_idle/iowait_time() relies on get_cpu_idle/iowait_time_us()
with a nonsensical conversion to microseconds and back to nanoseconds on
the way.

Start consolidating the APIs by removing get_idle/iowait_time() and making
kcpustat_field() and kcpustat_cpu_fetch() work for all cases.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-13-frederic@kernel.org

tick/sched: Account tickless idle cputime only when tick is stopped

There is no real point in switching to dyntick-idle cputime accounting mode
if the tick is not actually stopped. This just adds overhead, notably
fetching the GTOD, on each idle exit and each idle IRQ entry for no reason
during short idle trips.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-12-frederic@kernel.org

tick/sched: Remove unused fields

Remove fields left unused after the dyntick-idle cputime migration to
scheduler code.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-11-frederic@kernel.org

tick/sched: Move dyntick-idle cputime accounting to cputime code

Although the dynticks-idle cputime accounting is necessarily tied to the
tick subsystem, the actual related accounting code has no business residing
there and should be part of the scheduler cputime code.

Move away the relevant pieces and state machine to where they belong.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-10-frederic@kernel.org

tick/sched: Remove nohz disabled special case in cputime fetch

Even when nohz is not runtime enabled, the dynticks idle cputime accounting
can run and the common idle cputime accessors are still relevant.

Remove the nohz disabled special case accordingly.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-9-frederic@kernel.org

tick/sched: Unify idle cputime accounting

The non-vtime dynticks-idle cputime accounting is a big mess that
accumulates within two concurrent statistics, each having their own
shortcomings:

* The accounting for online CPUs which is based on the delta between
   tick_nohz_start_idle() and tick_nohz_stop_idle().

   Pros:
       - Works when the tick is off

       - Has nsecs granularity

   Cons:
       - Accounts idle steal time but doesn't subtract it from idle
         cputime.

       - Assumes CONFIG_IRQ_TIME_ACCOUNTING by not accounting IRQs but
         the IRQ time is simply ignored when
         CONFIG_IRQ_TIME_ACCOUNTING=n

       - The windows between 1) idle task scheduling and the first call
         to tick_nohz_start_idle() and 2) idle task between the last
         tick_nohz_stop_idle() and the rest of the idle time are
         blindspots wrt. cputime accounting (though mostly insignificant
         amount)

       - Relies on private fields outside of kernel stats, with specific
         accessors.

* The accounting for offline CPUs which is based on ticks and the
   jiffies delta during which the tick was stopped.

   Pros:
       - Handles steal time correctly

       - Handles CONFIG_IRQ_TIME_ACCOUNTING=y and
         CONFIG_IRQ_TIME_ACCOUNTING=n correctly.

       - Handles the whole idle task

       - Accounts directly to kernel stats, without midlayer accumulator.

   Cons:
       - Doesn't elapse when the tick is off, which doesn't make it
         suitable for online CPUs.

       - Has TICK_NSEC granularity (jiffies)

       - Needs to track the dyntick-idle ticks that were accounted and
         subtract them from the total jiffies time spent while the tick
         was stopped. This is an ugly workaround.

Having two different accountings for a single context is not the only
problem: since those accountings are of different natures, it is
possible to observe the global idle time going backward after a CPU goes
offline.

Clean up the situation by introducing a hybrid approach that stays
coherent and works for both online and offline CPUs:

  * Tick based or native vtime accounting operate before the idle loop
    is entered and resume once the idle loop prepares to exit.

  * When the idle loop starts, switch to dynticks-idle accounting as is
    done currently, except that the statistics accumulate directly to the
    relevant kernel stat fields.

  * Private dyntick cputime accounting fields are removed.

  * Works in both the online and offline cases.

Further improvements will include:

  * Only switch to dynticks-idle cputime accounting when the tick actually
    goes in dynticks mode.

  * Handle CONFIG_IRQ_TIME_ACCOUNTING=n correctly such that the
    dynticks-idle accounting still elapses while on IRQs.

  * Correctly subtract idle steal cputime from idle time

Reported-by: Xin Zhao <jackzxcui1989@163.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-8-frederic@kernel.org

s390/time: Prepare to stop elapsing in dynticks-idle

Currently the tick subsystem stores the idle cputime accounting in private
fields, allowing cohabitation with architecture idle vtime accounting. The
former is fetched on online CPUs, the latter on offline CPUs.

For consolidation purposes, architecture vtime accounting will continue to
account the cputime but will make a break when the idle tick is
stopped. The dyntick cputime accounting will then be relayed by the tick
subsystem so that the idle cputime is still seen advancing coherently even
when the tick isn't there to flush the idle vtime.

Prepare for that and introduce three new APIs which will be used in
subsequent patches:

  - vtime_dynticks_start() is deemed to be called when idle enters in
    dyntick mode. The idle cputime that elapsed so far is accumulated
    and accounted. Also idle time accounting is ignored.

  - vtime_dynticks_stop() is deemed to be called when idle exits from
    dyntick mode. The vtime entry clocks are fast-forwarded to current time
    so that idle accounting restarts elapsing from now. Also idle time
    accounting is resumed.

  - vtime_reset() is deemed to be called from dynticks idle IRQ entry to
    fast-forward the clock to current time so that the IRQ time is still
    accounted by vtime while nohz cputime is paused.

Also accumulated vtime won't be flushed from dyntick-idle ticks to avoid
accounting twice the idle cputime, along with nohz accounting.

Co-developed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-7-frederic@kernel.org

powerpc/time: Prepare to stop elapsing in dynticks-idle

Currently the tick subsystem stores the idle cputime accounting in
private fields, allowing cohabitation with architecture idle vtime
accounting. The former is fetched on online CPUs, the latter on offline
CPUs.

For consolidation purposes, architecture vtime accounting will continue
to account the cputime but will make a break when the idle tick is
stopped. The dyntick cputime accounting will then be relayed by the tick
subsystem so that the idle cputime is still seen advancing coherently
even when the tick isn't there to flush the idle vtime.

Prepare for that and introduce three new APIs which will be used in
subsequent patches:

  - vtime_dynticks_start() is deemed to be called when idle enters in
    dyntick mode. The idle cputime that elapsed so far is accumulated.

  - vtime_dynticks_stop() is deemed to be called when idle exits from
    dyntick mode. The vtime entry clocks are fast-forwarded to current time
    so that idle accounting restarts elapsing from now.

  - vtime_reset() is deemed to be called from dynticks idle IRQ entry to
    fast-forward the clock to current time so that the IRQ time is still
    accounted by vtime while nohz cputime is paused.

Also accumulated vtime won't be flushed from dyntick-idle ticks to avoid
accounting twice the idle cputime, along with nohz accounting.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-6-frederic@kernel.org

sched/cputime: Correctly support generic vtime idle time

Currently whether generic vtime is running or not, the idle cputime is
fetched from the nohz accounting.

However generic vtime already does its own idle cputime accounting. Only
the kernel stat accessors are not plugged to support it.

Read the idle generic vtime cputime when it's running. This will later
allow splitting nohz and vtime cputime accounting more clearly.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-5-frederic@kernel.org

sched/cputime: Remove superfluous and error prone kcpustat_field() parameter

The first parameter to kcpustat_field() is a pointer to the cpu kcpustat to
be fetched from. This parameter is error prone because a copy to a kcpustat
could be passed by accident instead of the original one. Also the kcpustat
structure can already be retrieved with the help of the mandatory CPU
argument.

Remove the needless parameter.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-4-frederic@kernel.org

sched/idle: Handle offlining first in idle loop

Offline handling happens from within the inner idle loop, after the
beginning of dyntick cputime accounting, nohz idle load balancing and
TIF_NEED_RESCHED polling.

This is not necessary and even buggy because:

  * There is no dyntick handling to do. And calling tick_nohz_idle_enter()
    messes up the struct tick_sched reset that was performed on
    tick_sched_timer_dying().

  * There is no nohz idle balancing to do.

  * Polling on TIF_NEED_RESCHED is irrelevant at this stage, there are no
    more tasks allowed to run.

  * No need to check if need_resched() before offline handling since
    stop_machine is done and all per-cpu kthreads should be done with
    their job.

Therefore move the offline handling to the beginning of the idle loop.
This will also ease the idle cputime unification later by not elapsing
idle time while offline through the call to:

   tick_nohz_idle_enter() -> tick_nohz_start_idle()

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-3-frederic@kernel.org

tick/sched: Fix TOCTOU in nohz idle time fetch

When the nohz idle time is fetched, the current clock timestamp is taken
outside the seqcount, which can result in a race as reported by Sashiko:

    get_cpu_sleep_time_us()                 tick_nohz_start_idle()
    -----------------------                 ---------------------
    now = ktime_get()
                                            write_seqcount_begin(idle_sleeptime_seq);
                                            idle_entrytime = ktime_get()
                                            tick_sched_flag_set(ts, TS_FLAG_IDLE_ACTIVE);
                                            write_seqcount_end(&ts->idle_sleeptime_seq);
    read_seqcount_begin(idle_sleeptime_seq)
    delta = now - idle_entrytime;
    //!! But now < idle_entrytime
    idle = *sleeptime +  delta;
    read_seqcount_retry(&ts->idle_sleeptime_seq, seq)

Here the read side fetches the timestamp before the write side and its
update. As a result the time delta computed on the read side is negative
(ktime_t is signed) and breaks the cputime monotonicity guarantee.

This could possibly be fixed by reading the current clock timestamp
inside the seqcount, but the reader overhead might then increase. Simply
checking that the current timestamp is above the idle entry time is
enough to prevent any such issue.
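
The check can be sketched in isolation as below. This is a userspace
model with a hypothetical idle_time_ns() helper taking plain signed
nanosecond values; it leaves out the seqcount retry loop entirely and
only shows the guard on the delta.

```c
#include <stdint.h>

/*
 * Hypothetical reader-side computation. "now" may have been sampled
 * before the writer updated idle_entrytime, so the raw delta can be
 * negative. Only add it when it is positive.
 */
static int64_t idle_time_ns(int64_t sleeptime, int64_t idle_entrytime,
			    int64_t now, int idle_active)
{
	if (idle_active && now > idle_entrytime)
		return sleeptime + (now - idle_entrytime);

	return sleeptime;
}
```

In the racy case the reader simply reports the accumulated sleep time
without the in-progress idle period, which keeps the value monotonic.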

Fixes: 620a30fa0bd1 ("timers/nohz: Protect idle/iowait sleep time under seqcount")
Reported-by: Sashiko
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260508131647.43868-2-frederic@kernel.org

net: garp: fix unsigned integer underflow in garp_pdu_parse_attr

The receive-side GARP attribute parser computes dlen with reversed
operands:

dlen = sizeof(*ga) - ga->len;

ga->len is the on-wire attribute length and includes the GARP attribute
header. For normal attributes with data, ga->len is larger than
sizeof(*ga), so the subtraction underflows in unsigned arithmetic.

The resulting value is later passed to garp_attr_lookup(), whose length
argument is u8. After truncation, the parsed data length usually no
longer matches the length stored for locally registered attributes, so
received Join/Leave events are ignored. This breaks the GARP receive path
for common attributes, such as GVRP VLAN registration attributes.

Compute the data length as the attribute length minus the header length.
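
The effect of the reversed operands can be reproduced in userspace. The
struct below is a simplified model of an attribute header (a one-byte
length, a one-byte event, then data); its name and the two helper
functions are made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model: len counts the two header bytes plus the data. */
struct attr_hdr {
	uint8_t len;
	uint8_t event;
	uint8_t data[];
};

/* Reversed operands: wraps around whenever len > sizeof(header). */
static uint8_t dlen_buggy(const struct attr_hdr *ga)
{
	unsigned int dlen = sizeof(*ga) - ga->len;

	return (uint8_t)dlen;	/* truncated as a u8 length argument would be */
}

/* Attribute length minus header length. */
static uint8_t dlen_fixed(const struct attr_hdr *ga)
{
	unsigned int dlen = ga->len - sizeof(*ga);

	return (uint8_t)dlen;
}
```

For an attribute with len = 4 (two header bytes plus two data bytes),
the fixed computation yields 2, while the reversed one wraps and
truncates to 254, which cannot match a registered 2-byte attribute.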

Fixes: eca9ebac651f ("net: Add GARP applicant-only participant")
Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn>
Reported-by: Ao Wang <wangao@seu.edu.cn>
Reported-by: Xuewei Feng <fengxw06@126.com>
Reported-by: Qi Li <qli01@tsinghua.edu.cn>
Reported-by: Ke Xu <xuke@tsinghua.edu.cn>
Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260527083200.42861-1-zhaoyz24@mails.tsinghua.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

time: Fix off-by-one in settimeofday() usec validation

The validation check uses '>' instead of '>=' when comparing tv_usec
against USEC_PER_SEC, allowing the value 1000000 through. After
conversion to nanoseconds (*= 1000), this produces tv_nsec ==
NSEC_PER_SEC, violating the timespec invariant that tv_nsec must be
less than NSEC_PER_SEC.

Use '>=' to reject tv_usec values that are not in the valid range of
0 to 999999.
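
The boundary case can be checked with a small userspace sketch. The
helper names are hypothetical and the constants are redefined locally;
only the comparison operators mirror the description above.

```c
#include <stdbool.h>

#define USEC_PER_SEC 1000000L
#define NSEC_PER_SEC 1000000000L

/* Old check: '>' lets tv_usec == USEC_PER_SEC through. */
static bool usec_valid_old(long tv_usec)
{
	return !(tv_usec > USEC_PER_SEC || tv_usec < 0);
}

/* New check: '>=' accepts only 0..999999. */
static bool usec_valid_new(long tv_usec)
{
	return !(tv_usec >= USEC_PER_SEC || tv_usec < 0);
}

/* The conversion applied after validation. */
static long long usec_to_nsec(long tv_usec)
{
	return tv_usec * 1000LL;
}
```

With tv_usec = 1000000 the old check passes and the conversion yields
exactly NSEC_PER_SEC, which is out of range for tv_nsec.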

Fixes: 5e0fb1b57bea ("y2038: time: avoid timespec usage in settimeofday()")
Signed-off-by: Naveen Kumar Chaudhary <naveen.osdev@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Acked-by: John Stultz <jstultz@google.com>
Link: https://patch.msgid.link/4rikk44zew3s6577dugmx4jyblz7o5c57niuap6ct3td5yfm6w@gh7pcumg7qor

clockevents: Fix duplicate type specifier in stub function parameter

The stub for arch_inlined_clockevent_set_next_coupled() has 'u64 u64
cycles' in its parameter list. Since u64 is a typedef, the compiler
parses the second 'u64' as the parameter name, making 'cycles' an
unused token. Remove the duplicate so the parameter is correctly named.

Fixes: 89f951a1e8ad ("clockevents: Provide support for clocksource coupled comparators")
Signed-off-by: Naveen Kumar Chaudhary <naveen.osdev@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/7tostpvxzdn6tobmyow63a5rweatls5kux3scqp2vzhe7mv6uq@ecr746b4hyhf

ACPI: button: Switch over to devres-based resource management

Switch over the ACPI button driver to devres-based resource management
by making the following changes:

* Use devm_kzalloc() for allocating button object memory.

* Use devm_input_allocate_device() for allocating the input class
   device object.

* Turn acpi_lid_remove_fs() into a devm cleanup action added
   by devm_acpi_lid_add_fs() which is a new wrapper around
   acpi_lid_add_fs().

* Add devm_acpi_button_init_wakeup() for initializing the wakeup source
   and make it add a custom devm action that will automatically remove
   the wakeup source registered by it.

* Turn acpi_button_remove_event_handler() into a devm cleanup action
   added by devm_acpi_button_add_event_handler() which is a new wrapper
   around acpi_button_add_event_handler().
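
The general pattern behind these changes can be modelled in user space
as a per-device list of cleanup actions that run in reverse registration
order. This is only a sketch of the idea: struct fake_dev,
add_cleanup_action(), release_all() and record_step() are hypothetical
names, not the devres API.

```c
#include <stdlib.h>

/* One registered cleanup action. */
struct action {
	void (*fn)(void *data);
	void *data;
	struct action *next;
};

/* Hypothetical device object owning its cleanup actions. */
struct fake_dev {
	struct action *actions;	/* most recently added first */
};

/* Register a cleanup action; if that fails, run it immediately. */
static int add_cleanup_action(struct fake_dev *dev, void (*fn)(void *),
			      void *data)
{
	struct action *a = malloc(sizeof(*a));

	if (!a) {
		fn(data);
		return -1;
	}
	a->fn = fn;
	a->data = data;
	a->next = dev->actions;
	dev->actions = a;
	return 0;
}

/* Run every action, newest first, as happens on unbind or probe failure. */
static void release_all(struct fake_dev *dev)
{
	while (dev->actions) {
		struct action *a = dev->actions;

		dev->actions = a->next;
		a->fn(a->data);
		free(a);
	}
}

/* Demo action: append one character to a trace buffer. */
static char trace[8];
static size_t trace_len;

static void record_step(void *data)
{
	if (trace_len < sizeof(trace) - 1)
		trace[trace_len++] = *(const char *)data;
}
```

Registering three actions in setup order and then calling release_all()
runs them in the opposite order, which is what lets a probe function
drop its explicit rollback path.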

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2283436.Mh6RI2rZIc@rafael.j.wysocki
[ rjw: Rebased and removed unnecessary input device parent assignment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPI: button: Reorganize installing and removing event handlers

To facilitate subsequent changes, move the code installing and
removing button event handlers into two separate functions called
acpi_button_add_event_handler() and acpi_button_remove_event_handler(),
respectively, and rearrange it to reduce code duplication.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2714170.Lt9SDvczpP@rafael.j.wysocki

ACPI: button: Use string literals for generating netlink messages

Instead of storing strings that never change later under
acpi_device_class(device) and using them for generating netlink
messages, use pointers to string literals with the same content.

This also allows the clearing of the acpi_device_class(device)
area during driver removal and in the probe rollback path to be
dropped.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2070791.usQuhbGJ8B@rafael.j.wysocki

ACPI: button: Clean up adding and removing lid procfs interface

The procfs interface is only used with lid devices, which only becomes
clear after looking into the function bodies of acpi_button_add_fs()
and acpi_button_remove_fs(). Moreover, the only error code returned
by the former of these functions is -ENODEV, so the ret local variable
in it is redundant, and the return type of the latter one can be changed
to void.

Accordingly, rename these functions to acpi_lid_add_fs() and
acpi_lid_remove_fs(), respectively, move the button->type checks
against ACPI_BUTTON_TYPE_LID from them to their callers, and make
code simplifications as per the above.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1869050.VLH7GnMWUR@rafael.j.wysocki

ACPI: button: Merge two switch () statements in acpi_button_probe()

Two switch () statements in acpi_button_probe() operate on the same
value and the statements between them can be reordered with respect
to the second one, so merge them.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3352815.5fSG56mABF@rafael.j.wysocki

ACPI: button: Drop redundant variable from acpi_button_probe()

Local char pointer called "name" in acpi_button_probe() is redundant
because its value can be assigned directly to input->name and the
latter can be used in the only other place where "name" is read, so
get rid of it.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3706239.iIbC2pHGDl@rafael.j.wysocki

ACPI: button: Rework device verification during probe

Instead of manually comparing the primary ID of the device (returned
by _HID) with each of the device IDs supported by the driver, use
acpi_match_acpi_device() (which includes the ACPI companion device
pointer check against NULL) and store the ACPI button type as
driver_data in button_device_ids[], which allows a multi-branch
conditional statement to be replaced with a switch () one. However,
to continue preventing successful probing of devices that only have
one of the supported device IDs in their _CID lists, compare the
matched device ID with the primary ID of the device and return an
error if they don't match.
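
A user-space sketch of the table-driven approach, assuming a simplified
device model (struct id_entry, struct fake_device, match_device() and
probe_type() are illustrative names and do not reproduce the ACPI core
helpers):

```c
#include <stddef.h>
#include <string.h>

enum button_type { BUTTON_POWER = 1, BUTTON_SLEEP, BUTTON_LID };

struct id_entry {
	const char *id;
	unsigned long driver_data;	/* carries the button type */
};

static const struct id_entry button_ids[] = {
	{ "PNP0C0C", BUTTON_POWER },
	{ "PNP0C0E", BUTTON_SLEEP },
	{ "PNP0C0D", BUTTON_LID },
	{ NULL, 0 },
};

/* Hypothetical device: ids[0] models _HID, later entries model _CID. */
struct fake_device {
	const char *ids[4];
};

/* Return the first table entry matching any ID of the device. */
static const struct id_entry *match_device(const struct fake_device *dev)
{
	for (size_t i = 0; i < 4 && dev->ids[i]; i++)
		for (const struct id_entry *e = button_ids; e->id; e++)
			if (!strcmp(dev->ids[i], e->id))
				return e;
	return NULL;
}

/* Return the button type, or -1 if the device must not be probed. */
static int probe_type(const struct fake_device *dev)
{
	const struct id_entry *e = match_device(dev);

	/* Reject no match, and matches that came only from a _CID entry. */
	if (!e || strcmp(e->id, dev->ids[0]))
		return -1;

	switch (e->driver_data) {
	case BUTTON_POWER:
	case BUTTON_SLEEP:
	case BUTTON_LID:
		return (int)e->driver_data;
	default:
		return -1;
	}
}
```

A device whose primary ID is in the table is accepted with its type
taken from driver_data, while a device that lists a supported ID only
among its secondary IDs is still rejected.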

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/7960518.EvYhyI6sBW@rafael.j.wysocki
[ rjw: Fixed button memory leak on probe failure ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ntsync: Honour caller's time namespace for absolute MONOTONIC timeouts

ntsync_schedule() takes the absolute timeout from userspace and hands it to
schedule_hrtimeout_range_clock() with HRTIMER_MODE_ABS. For the default
CLOCK_MONOTONIC path, it does not call timens_ktime_to_host() first.

A process inside a CLOCK_MONOTONIC time namespace computes the absolute
timeout in its own clock view. The kernel reads the same value against the
host clock. The two differ by the namespace offset. The timeout then fires
too early or too late.

Other users of absolute timeouts run the ktime through
timens_ktime_to_host() before starting the hrtimer. ntsync was added later
and missed that step.

/dev/ntsync is mode 0666. Any user inside a time namespace that can
open it is affected. The visible effect is wrong timeout behaviour
for Wine in a container that sets a CLOCK_MONOTONIC offset.

Reproducer: unshare --user --time, set the monotonic offset to -10s,
issue NTSYNC_IOC_WAIT_ANY with a 100 ms absolute MONOTONIC timeout.
The baseline run elapses about 100 ms. The run inside the namespace
elapses about 0 ms.

Apply timens_ktime_to_host() to the parsed timeout when the caller
did not set NTSYNC_WAIT_REALTIME. The helper does nothing in the
initial time namespace, so the fast path is unchanged.
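
The arithmetic behind the reproducer can be sketched as follows. This is
a simplified user-space model with hypothetical helpers (ns_to_host()
and remaining()), assuming that namespace time equals host time plus the
namespace offset:

```c
#include <stdint.h>

#define NSEC_PER_MSEC 1000000LL
#define NSEC_PER_SEC  1000000000LL

/*
 * Convert an absolute timeout from the namespace view to the host view.
 * Values below the offset are treated as already expired.
 */
static int64_t ns_to_host(int64_t ns_abs, int64_t offset)
{
	if (ns_abs < offset)
		return 0;
	return ns_abs - offset;
}

/* Time until an absolute host-clock deadline fires; 0 if already past. */
static int64_t remaining(int64_t host_now, int64_t host_deadline)
{
	return host_deadline > host_now ? host_deadline - host_now : 0;
}
```

With the host clock at 100 s and an offset of -10 s, the caller sees
90 s and asks for 90.1 s. Compared directly against the host clock that
deadline is already in the past, whereas the converted value leaves the
expected 100 ms.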

Fixes: b4a7b5fe3f51 ("ntsync: Introduce NTSYNC_IOC_WAIT_ANY.")
Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Elizabeth Figura <zfigura@codeweavers.com>
Link: https://patch.msgid.link/20260528063311.3300393-3-maoyixie.tju@gmail.com

time/namespace: Export init_time_ns and do_timens_ktime_to_host()

timens_ktime_to_host() compares the current time namespace against
init_time_ns for the fast path. It calls do_timens_ktime_to_host() for the
offset case. Both symbols are needed at link time by any caller of the
inline.

All current callers are built in, but ntsync can be built as a module,
which prevents it from using the inline helper.

Export both with EXPORT_SYMBOL_GPL.

Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260528063311.3300393-2-maoyixie.tju@gmail.com

hsr: Remove WARN_ONCE() in hsr_addr_is_self().

syzbot reported the warning [0] in hsr_addr_is_self(),
whose assumption is simply wrong.

hsr->self_node is cleared in hsr_del_self_node(), which
is called from hsr_dellink().

Since dev->rtnl_link_ops->dellink() is called before
unregister_netdevice_many(), there is a window in which a
user can find the device without hsr->self_node.

Let's remove WARN_ONCE() in hsr_addr_is_self().

[0]:
HSR: No self node
WARNING: net/hsr/hsr_framereg.c:39 at hsr_addr_is_self+0x211/0x3f0 net/hsr/hsr_framereg.c:39, CPU#0: syz.4.16848/17220
Modules linked in:
CPU: 0 UID: 0 PID: 17220 Comm: syz.4.16848 Tainted: G             L      syzkaller #0 PREEMPT_{RT,(full)}
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
RIP: 0010:hsr_addr_is_self+0x211/0x3f0 net/hsr/hsr_framereg.c:39
Code: 33 2f 41 0f b7 dd 89 ee 09 de 31 ff e8 c8 b4 c6 f6 09 dd 74 54 e8 0f b0 c6 f6 31 ed eb 53 e8 06 b0 c6 f6 48 8d 3d 2f 50 9c 04 <67> 48 0f b9 3a 31 ed eb 42 e8 c1 13 1f 00 89 c5 31 ff 89 c6 e8 96
RSP: 0018:ffffc900041c70e0 EFLAGS: 00010283
RAX: ffffffff8afdc6ca RBX: ffffffff8afdc4e6 RCX: 0000000000080000
RDX: ffffc90010493000 RSI: 0000000000000948 RDI: ffffffff8f9a1700
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: ffffc900041c71e8 R11: fffff52000838e3f R12: dffffc0000000000
R13: ffff888041f9e3c0 R14: ffff888086ee3802 R15: 0000000000000000
FS:  00007f6fe985d6c0(0000) GS:ffff888126176000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f80bd437dac CR3: 0000000025096000 CR4: 00000000003526f0
DR0: ffffffffffffffff DR1: 00000000000001f8 DR2: 0000000000000002
DR3: ffffffffefffff15 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
check_local_dest net/hsr/hsr_forward.c:592 [inline]
fill_frame_info net/hsr/hsr_forward.c:728 [inline]
hsr_forward_skb+0xa11/0x2a80 net/hsr/hsr_forward.c:739
hsr_dev_xmit+0x253/0x370 net/hsr/hsr_device.c:236
__netdev_start_xmit include/linux/netdevice.h:5368 [inline]
netdev_start_xmit include/linux/netdevice.h:5377 [inline]
xmit_one net/core/dev.c:3888 [inline]
dev_hard_start_xmit+0x2df/0x860 net/core/dev.c:3904
__dev_queue_xmit+0x1428/0x3900 net/core/dev.c:4870
neigh_output include/net/neighbour.h:556 [inline]
ip_finish_output2+0xcec/0x10b0 net/ipv4/ip_output.c:237
ip_send_skb net/ipv4/ip_output.c:1510 [inline]
ip_push_pending_frames+0x8b/0x110 net/ipv4/ip_output.c:1530
raw_sendmsg+0x1547/0x1a50 net/ipv4/raw.c:659
sock_sendmsg_nosec net/socket.c:787 [inline]
__sock_sendmsg net/socket.c:802 [inline]
____sys_sendmsg+0x7da/0x9c0 net/socket.c:2698
___sys_sendmsg+0x2a5/0x360 net/socket.c:2752
__sys_sendmsg net/socket.c:2784 [inline]
__do_sys_sendmsg net/socket.c:2789 [inline]
__se_sys_sendmsg net/socket.c:2787 [inline]
__x64_sys_sendmsg+0x1c3/0x2a0 net/socket.c:2787
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f6feb62ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f6fe985d028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f6feb8a6090 RCX: 00007f6feb62ce59
RDX: 0000000000000000 RSI: 0000200000000000 RDI: 0000000000000004
RBP: 00007f6feb6c2d6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f6feb8a6128 R14: 00007f6feb8a6090 R15: 00007ffcf01cc488
</TASK>

Fixes: f266a683a480 ("net/hsr: Better frame dispatch")
Reported-by: syzbot+652670cf249077eb498b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a1a861e.b111c304.35cd64.0016.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260530064300.340793-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nf-26-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for net:

1) Fix splat with PREEMPT_RCU caused by smp_processor_id() in nfqueue,
   from Fernando Fernandez Mancera.

2) Fix possible use of pointer to old IPVS scheduler after RCU grace
   period when editing service, from Julian Anastasov.

3) Fix possible forever RCU walk over rt->fib6_siblings in nft_fib6,
   if rt is unlinked mid-iteration, apparently same issue happens in
   the fib6 core. From Jiayuan Chen.

4) Add mutex to guard refcount in synproxy infrastructure, since
   concurrent hook {un}registration can happen.
   From Fernando Fernandez Mancera.

5) Bail out if IRC conntrack helper fails to parse a command, do not
   try parsing using other command handlers, from Florian Westphal.
   This fixes a possible out-of-bound read.

6) Possible use-after-free in nft_tunnel by releasing template dst
   after all references have been dropped, from Tristan Madani.

7) Ignore conntrack template in nft_ct, from Jiayuan Chen.

8) Missing skb_ensure_writable() in ebt_snat, from Yiming Qian.

9) Remove multi-register byteorder support, this allows for kernel
   stack info leak, from Florian Westphal.

* tag 'nf-26-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_byteorder: remove multi-register support
  netfilter: bridge: make ebt_snat ARP rewrite writable
  netfilter: nft_ct: bail out on template ct in get eval
  netfilter: nft_tunnel: fix use-after-free on object destroy
  netfilter: conntrack_irc: fix possible out-of-bounds read
  netfilter: synproxy: add mutex to guard hook reference counting
  netfilter: nft_fib_ipv6: bail out of sibling walk if rt got unlinked
  ipvs: clear the svc scheduler ptr early on edit
  netfilter: xt_NFQUEUE: prefer raw_smp_processor_id
====================

Link: https://patch.msgid.link/20260601115923.433946-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Add preempt_{disable,enable}_nested() in reqsk_queue_hash_req().

syzbot reported a weird reqsk->rsk_refcnt underflow in
__inet_csk_reqsk_queue_drop().

The captured reqsk_put() in __inet_csk_reqsk_queue_drop()
is called only when it successfully removes reqsk from ehash.

Moreover, reqsk_timer_handler() calls another reqsk_put()
after that.

This indicates that the reqsk was missing both refcnts for
ehash and the timer itself.

Since all the syzbot reports had PREEMPT_RT enabled, the only
possible scenario is that reqsk_queue_hash_req() is preempted
after mod_timer() and before refcount_set(), and then the timer
triggered after 1s aborts the reqsk due to its listener's close().

Let's wrap mod_timer() and refcount_set() with
preempt_disable_nested() and preempt_enable_nested().

Note that inet_ehash_insert() holds the normal spin_lock()
(mutex in PREEMPT_RT), so it must be called outside of
preempt_disable_nested(), but this is fine.

The lookup path just ignores 0 sk_refcnt entries in ehash
and tries to create another reqsk, but this will fail at
inet_ehash_insert().

[0]:
refcount_t: underflow; use-after-free.
WARNING: lib/refcount.c:28 at refcount_warn_saturate+0xb2/0x110 lib/refcount.c:28, CPU#0: ktimers/0/16
Modules linked in:
CPU: 0 UID: 0 PID: 16 Comm: ktimers/0 Tainted: G             L      syzkaller #0 PREEMPT_{RT,(full)}
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
RIP: 0010:refcount_warn_saturate+0xb2/0x110 lib/refcount.c:28
Code: e4 7d d1 0a 67 48 0f b9 3a eb 4a e8 38 3d 23 fd 48 8d 3d e1 7d d1 0a 67 48 0f b9 3a eb 37 e8 25 3d 23 fd 48 8d 3d de 7d d1 0a <67> 48 0f b9 3a eb 24 e8 12 3d 23 fd 48 8d 3d db 7d d1 0a 67 48 0f
RSP: 0000:ffffc90000157948 EFLAGS: 00010246
RAX: ffffffff84a1301b RBX: 0000000000000003 RCX: ffff88801ca98000
RDX: 0000000000000100 RSI: 0000000000000000 RDI: ffffffff8f72ae00
RBP: ffffffff99ae3b01 R08: ffff88801ca98000 R09: 0000000000000005
R10: 0000000000000100 R11: 0000000000000004 R12: ffff8880425ef568
R13: ffff8880425ef4f8 R14: ffff8880425ef578 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff888126386000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7b46710e9c CR3: 000000000dbb6000 CR4: 00000000003526f0
Call Trace:
<TASK>
__refcount_sub_and_test include/linux/refcount.h:400 [inline]
__refcount_dec_and_test include/linux/refcount.h:432 [inline]
refcount_dec_and_test include/linux/refcount.h:450 [inline]
reqsk_put include/net/request_sock.h:136 [inline]
__inet_csk_reqsk_queue_drop+0x3ce/0x440 net/ipv4/inet_connection_sock.c:1007
reqsk_timer_handler+0x651/0xdf0 net/ipv4/inet_connection_sock.c:1137
call_timer_fn+0x192/0x5e0 kernel/time/timer.c:1748
expire_timers kernel/time/timer.c:1799 [inline]
__run_timers kernel/time/timer.c:2374 [inline]
__run_timer_base+0x6a3/0x9f0 kernel/time/timer.c:2386
run_timer_base kernel/time/timer.c:2395 [inline]
run_timer_softirq+0x67/0x170 kernel/time/timer.c:2403
handle_softirqs+0x1de/0x6d0 kernel/softirq.c:622
__do_softirq kernel/softirq.c:656 [inline]
run_ktimerd+0x69/0x100 kernel/softirq.c:1151
smpboot_thread_fn+0x541/0xa50 kernel/smpboot.c:160
kthread+0x388/0x470 kernel/kthread.c:436
ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>

Fixes: d2d6422f8bd1 ("x86: Allow to enable PREEMPT_RT.")
Reported-by: syzbot+e809069bc15f26300526@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6a1a7bcf.0a9e871e.332604.000b.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20260601182101.3183993-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: Annotate sk->sk_write_space() for UDP SOCKMAP.

UDP TX skb->destructor() is sock_wfree(), and UDP holds lock_sock()
only for UDP_CORK / MSG_MORE sendmsg().

Otherwise, sk->sk_write_space() may be read locklessly while SOCKMAP
rewrites sk->sk_write_space().

Let's use WRITE_ONCE() and READ_ONCE() for sk->sk_write_space().
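
A user-space sketch of the annotated access pattern. The macros below
are simplified stand-ins built on volatile accesses (the kernel's
definitions do more), and struct fake_sock with its helpers is
hypothetical:

```c
/* Simplified stand-ins; they rely on the GCC/Clang __typeof__ extension. */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

struct fake_sock {
	void (*sk_write_space)(struct fake_sock *sk);
	int wakeups;
};

static void default_write_space(struct fake_sock *sk)
{
	sk->wakeups += 1;
}

static void sockmap_write_space(struct fake_sock *sk)
{
	sk->wakeups += 100;
}

/* Writer side: replace the callback with a single store. */
static void install_write_space(struct fake_sock *sk,
				void (*fn)(struct fake_sock *))
{
	WRITE_ONCE(sk->sk_write_space, fn);
}

/* Lockless reader: load the pointer once, then call the local copy. */
static void tx_complete(struct fake_sock *sk)
{
	void (*write_space)(struct fake_sock *) = READ_ONCE(sk->sk_write_space);

	write_space(sk);
}
```

Loading the pointer into a local before the call means the reader
invokes either the old or the new callback, never a value the compiler
reloaded halfway through.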

Note that the write side is annotated by commit 2ef2b20cf4e0
("net: annotate data-races around sk->sk_{data_ready,write_space}").

Fixes: 7b98cd42b049 ("bpf: sockmap: Add UDP support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://patch.msgid.link/20260529193941.3897256-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

pcnet32: stop holding device spin lock during napi_complete_done

napi_complete_done may call gro_flush_normal (though not currently, as GRO
is unsupported at the moment), which may result in packet TX. This will
eventually result in calling pcnet32_start_xmit - resulting in a deadlock
while trying to re-acquire the already locked spin lock.

It is safe to split the spinlock block into two, because the hardware
registers are still protected from concurrent access, and the two blocks
perform unrelated operations that don't need to happen atomically.

Fixes: 5b2ec6f2be51 ("pcnet32: use napi_complete_done()")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Oscar Maes <oscmaes92@gmail.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260528140320.5556-1-oscmaes92@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'soc-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull SoC fixes from Arnd Bergmann:
"Following the previous set of fixes, this addresses another
  significant number of small issues found in firmware drivers (tee,
  optee, qcomtee, qcom ice, exynos acpm) through various tools.

  This is about error handling, resource leaks, concurrency and a
  use-after-free bug.

  The fixes for the Qualcomm ICE driver also introduce interface changes
  in the UFS and MMC drivers using it.

  Outside of firmware drivers, there are a few fixes across the tree:

   - Minor driver code mistakes in the Atmel EBI memory controller, the
     i.MX soc ID driver and socfpga boot logic

   - A defconfig change to avoid a boot time regression on multiple
     qualcomm boards

   - Device tree fixes for qualcomm, at91 and gemini, addressing mostly
     minor configuration mistakes"

* tag 'soc-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (28 commits)
  firmware: samsung: acpm: Fix infinite loop on sequence number exhaustion
  firmware: samsung: acpm: Fix missing LKMM barriers in sequence allocator
  firmware: samsung: acpm: Fix false timeouts and Use-After-Free in polling
  ARM: dts: gemini: Fix partition offsets
  ARM: socfpga: Fix OF node refcount leak in SMP setup
  soc: qcom: ice: Fix the error code when 'qcom,ice' property is not found
  arm64: dts: qcom: eliza: Add power-domain and iface clk for ice node
  arm64: dts: qcom: milos: Add power-domain and iface clk for ice node
  tee: qcomtee: add missing va_end in early return qcomtee_object_user_init()
  tee: fix params_from_user() error path in tee_ioctl_supp_recv
  tee: shm: fix shm leak in register_shm_helper()
  tee: fix tee_ioctl_object_invoke_arg padding
  arm64: defconfig: Enable PCI M.2 power sequencing driver
  scsi: ufs: ufs-qcom: Remove NULL check from devm_of_qcom_ice_get()
  mmc: sdhci-msm: Remove NULL check from devm_of_qcom_ice_get()
  soc: qcom: ice: Return proper error codes from devm_of_qcom_ice_get() instead of NULL
  soc: qcom: ice: Return -ENODEV if the ICE platform device is not found
  soc: qcom: ice: Fix race between qcom_ice_probe() and of_qcom_ice_get()
  ARM: dts: microchip: sam9x7: fix GMAC clock configuration
  firmware: samsung: acpm: Fix mailbox channel leak on probe error
  ...

RDMA/efa: Validate SQ ring size against max LLQ size

Validate the SQ ring size against the device's max LLQ size. This
ensures that when using 128-byte WQEs, userspace cannot exceed the queue
limits.

On create QP, userspace provides the SQ ring size (depth x WQE size)
which is validated against the max LLQ size.
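
A minimal sketch of such a bound check, assuming the ring size is the
product of depth and WQE size (sq_ring_size_ok() and its parameters are
hypothetical, and the numbers in the note below are made up for
illustration):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Accept the SQ only if depth * wqe_size fits in the device's max LLQ
 * size. The product is computed in 64 bits so it cannot wrap.
 */
static bool sq_ring_size_ok(uint32_t sq_depth, uint32_t wqe_size,
			    uint32_t max_llq_size)
{
	uint64_t ring_size = (uint64_t)sq_depth * wqe_size;

	return sq_depth && wqe_size && ring_size <= max_llq_size;
}
```

With an assumed limit of 32768 bytes, a depth of 512 is accepted for
64-byte WQEs but rejected for 128-byte WQEs, since the latter needs
65536 bytes.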

Fixes: 40909f664d27 ("RDMA/efa: Add EFA verbs implementation")
Link: https://patch.msgid.link/r/20260526081536.1203553-1-ynachum@amazon.com
Reviewed-by: Michael Margolin <mrgolin@amazon.com>
Signed-off-by: Yonatan Nachum <ynachum@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Merge branches 'rcutorture.2026.05.24' and 'misc.2026.05.24' into rcu-merge.2026.05.24

rcutorture.2026.05.24: Torture-test updates
misc.2026.05.24: Miscellaneous RCU updates

rcu/nocb: reduce stack usage in nocb_gp_wait()

When CONFIG_UBSAN_ALIGNMENT is enabled, the stack usage of nocb_gp_wait()
grows above typical warning limits:

In file included from kernel/rcu/tree.c:4930:
kernel/rcu/tree_nocb.h: In function 'rcu_nocb_gp_kthread':
kernel/rcu/tree_nocb.h:866:1: error: the frame size of 1968 bytes is larger than 1280 bytes [-Werror=frame-larger-than=]

Apparently, the problem is passing rcu_data from a 'void *' pointer,
which gcc assumes may be misaligned. When the function is not inlined
into rcu_nocb_gp_kthread(), that is no longer visible to gcc.

Add a 'noinline_for_stack' annotation that leads to skipping a lot of
the alignment sanitizer checks and keeps the stack usage 60% lower here.

Reviewed-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

cpufreq/amd-pstate: Fix setting EPP in performance mode

EPP 0 is the only supported value in the performance policy.
commit 798c47593cca ("cpufreq/amd-pstate: Add support for platform profile
class") changed this while adding platform profile support to the
dynamic EPP feature, but this actually wasn't necessary since platform
profile writes disable manual EPP writes.

Restore allowing writing EPP of 0 when in performance mode.

Reviewed-by: Marco Scardovi <scardracs@disroot.org>
Tested-by: Marco Scardovi <scardracs@disroot.org>
Reported-by: Stuart Meckle <stuartmeckle@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221473
Closes: https://gitlab.freedesktop.org/upower/power-profiles-daemon/-/work_items/190
Fixes: 798c47593cca ("cpufreq/amd-pstate: Add support for platform profile class")
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

KVM: s390: Remove ptep_zap_softleaf_entry()

Migration entries do not need to be removed.

The swap subsystem has been (and still is being) heavily reworked. The
current implementation of ptep_zap_softleaf_entry() has been slowly
modified and is now wrong, since it unconditionally calls
swap_put_entries_direct() for both swap and migration entries.

Remove ptep_zap_softleaf_entry() altogether, merge the path for proper
swap entries directly in the only caller, and ignore migration entries.

Fixes: 200197908dc4 ("KVM: s390: Refactor and split some gmap helpers")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-11-imbrenda@linux.ibm.com>

KVM: s390: Fix possible reference leak in fault-in code

If kvm_s390_new_mmu_cache() fails, kvm_s390_faultin_gfn() returns
without releasing the faulted page.

Fix this by moving the allocation of the memory cache outside of the
loop. There is no reason to check at every iteration.

Opportunistically fix a comment.

Fixes: e907ae530133 ("KVM: s390: Add helper functions for fault handling")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-10-imbrenda@linux.ibm.com>

KVM: s390: Prevent memslots outside the ASCE range

With KVM_S390_VM_MEM_LIMIT_SIZE, userspace can set the highest address
allowed for the VM. Creating a memslot that lies over the maximum
address does not make sense and is only a potential source of bugs.

Prevent creation of memslots over the maximum address, and prevent the
maximum address from being reduced below the end of existing memslots.
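
The two checks can be sketched as below. The helper names are
hypothetical, and the sketch assumes the limit is an exclusive upper
bound on guest addresses, which the text above does not specify:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A memslot covering [base, base + size) is acceptable only if it ends
 * at or below the limit. Written to avoid overflow in base + size.
 */
static bool memslot_within_limit(uint64_t base, uint64_t size,
				 uint64_t limit)
{
	return size <= limit && base <= limit - size;
}

/* The limit may not be lowered below the end of the highest memslot. */
static bool limit_change_allowed(uint64_t new_limit,
				 uint64_t highest_slot_end)
{
	return new_limit >= highest_slot_end;
}
```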

Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-9-imbrenda@linux.ibm.com>

io_uring/bpf-ops: restrict ctx access to BPF

BPF programs should have no need to look into struct io_ring_ctx; if
anything, most such cases would be anti-patterns like looking up ring
indices directly via the context.

Replace it with a new empty structure, which is just an alias to struct
io_ring_ctx. It'll create a new BTF type and fail verification if a BPF
program tries to access it (beyond the first byte). It'll also give more
flexibility for the future, and otherwise it can be made aligned with
io_ring_ctx as before with struct groups if ever needed or extended in a
different way.

Fixes: d0e437b76bd3c ("io_uring/bpf-ops: implement loop_step with BPF struct_ops")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/5f6ca3649e9e0bae8667db4357e28dd00cd07901.1780394491.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block/partitions/acorn: use min in {riscix,linux}_partition

Use min() to replace the open-coded implementations and to simplify
riscix_partition() and linux_partition().

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20260602160757.973736-3-thorsten.blum@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'mm-hotfixes-stable-2026-06-01-20-58' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM fixes from Andrew Morton:
"13 hotfixes. All are for MM. 10 are cc:stable and the remaining 3
  address post-7.1 issues or aren't considered suitable for backporting.

  There's a three-patch series "userfaultfd: verify VMA state across
  UFFDIO_COPY retry" from Mike Rapoport which fixes a few uffd things.
  The rest are singletons - please see the individual changelogs for
  details"

* tag 'mm-hotfixes-stable-2026-06-01-20-58' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  userfaultfd: remove redundant check in vm_uffd_ops()
  userfaultfd: refuse to __mfill_atomic_pte() for unsupported VMAs
  userfaultfd: verify VMA state across UFFDIO_COPY retry
  mm/huge_memory: update file PMD counter before folio_put()
  mm/huge_memory: update file PUD counter before folio_put()
  mm/hugetlb_vmemmap: fix incorrect vmemmap restore in rollback
  mm/damon/ops-common: call folio_test_lru() after folio_get()
  mm/cma: fix reserved page leak on activation failure
  mm/memory-failure: fix hugetlb_lock AA deadlock in get_huge_page_for_hwpoison
  mm/hugetlb: restore reservation on error in hugetlb folio copy paths
  mm/cma_debug: fix invalid accesses for inactive CMA areas
  memcg: use round-robin victim selection in refill_stock
  mm/hugetlb: avoid false positive lockdep assertion

arm64: mm: Unmap kernel data/bss entirely from the linear map

The linear aliases of the kernel text and rodata are also mapped
read-only in the linear map. Given that the contents of these regions
are mostly identical to the version in the loadable image, mapping them
read-only and leaving their contents visible is a reasonable hardening
measure.

Data and bss, however, are now also mapped read-only but the contents of
these regions are more likely to contain data that we'd rather not leak.
So let's unmap these entirely in the linear map when the kernel is
running normally.

When going into hibernation or waking up from it, these regions need to
be mapped, so map the region initially, and toggle the valid bit to
map/unmap the region as needed.

Doing so is required because pages covering the kernel image are marked
as PageReserved, and therefore disregarded for snapshotting by the
hibernate logic unless they are mapped.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>

arm64: mm: Map the kernel data/bss read-only in the linear map

On systems where the bootloader adheres to the original arm64 boot
protocol, the placement of the kernel in the physical address space is
highly predictable, and this makes the placement of its linear alias in
the kernel virtual address space equally predictable, given the lack of
randomization of the linear map.

The linear aliases of the kernel text and rodata regions are already
mapped read-only, but the kernel data and bss are mapped read-write in
this region. This is not needed, so map them read-only as well.

Note that the statically allocated kernel page tables do need to be
modifiable via the linear map, so leave these mapped read-write.

Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>

mm: Make empty_zero_page[] const

The empty zero page is used to back any kernel or user space mapping
that is supposed to remain cleared, and so the page itself is never
supposed to be modified.

So mark it as const, which moves it into .rodata rather than .bss: on
most architectures, this ensures that both the kernel's mapping of it
and any aliases that are accessible via the kernel direct (linear) map
are mapped read-only, and cannot be used (inadvertently or maliciously)
to corrupt the contents of the zero page.

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Feng Tang <feng.tang@linux.alibaba.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>