git.ipfire.org Git - thirdparty/linux.git/log

s390/sclp: Allow user-space to provide PCI reports for NVMe SMART data

The new SCLP action qualifier 4 is used by user-space code to provide
NVMe SMART log data to the platform.

Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>

arm64/fpsimd: ptrace: zero target's fpsimd_state, not the tracer's

sve_set_common() is the backend for PTRACE_SETREGSET(NT_ARM_SVE) and
PTRACE_SETREGSET(NT_ARM_SSVE). Every write in the function operates on
the tracee (target) - except a single memset that uses current instead,
zeroing the tracer's saved V0-V31 / FPSR / FPCR shadow on every ptrace
SETREGSET call.

The memset is meant to give the tracee a defined zero register image
before the user-supplied payload is copied in (for partial writes,
header-only writes, and FPSIMD<->SVE format switches). Aiming it at
current both denies the tracee that clean slate and silently corrupts
the tracer.

The corruption of the tracer's saved FPSIMD state is not always
observable. Where the tracer's state is live on a CPU, this may be
reused without loading the corrupted state from memory, and will
eventually be written back over the corrupted state. Where the tracer's
state is saved in SVE_PT_REGS_SVE format, only the FPSR and FPCR are
clobbered, and the effective copy of the vectors is in the task's
sve_state.

Reproducible on an arm64 kernel with SVE: a single-threaded tracer that
loads a known pattern into V0-V31, issues PTRACE_SETREGSET(NT_ARM_SVE)
on a child, and reads V0-V31 back observes them all zeroed within tens
of thousands of iterations when a sibling thread keeps stealing the
FPSIMD CPU binding.

Fixes: 316283f276eb ("arm64/fpsimd: ptrace: Consistently handle partial writes to NT_ARM_(S)SVE")
Cc: <stable@vger.kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

clocksource/drivers/sun5i: Add D1 hstimer support

D1 high speed timer differs from existing timer-sun5i by register base
offset.

Add sunxi quirks to handle D1 specific offset.
Add D1 compatible string to OF match table.

Signed-off-by: Michal Piekos <michal.piekos@mmpsystems.pl>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Chen-Yu Tsai <wens@kernel.org>
Link: https://patch.msgid.link/20260428-h616-t113s-hstimer-v3-2-7e02178a93ee@mmpsystems.pl

dt-bindings: timer: allwinner,sun5i-a13-hstimer: add H616 and D1

D1 is similar to existing sun5i, but with different register offsets.
H616 uses same offsets as D1.

Add allwinner,sun20i-d1-hstimer
Add allwinner,sun50i-h616-hstimer with fallback to
allwinner,sun20i-d1-hstimer
Extend schema condition for interrupts to cover D1 compatible variant.

Signed-off-by: Michal Piekos <michal.piekos@mmpsystems.pl>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260428-h616-t113s-hstimer-v3-1-7e02178a93ee@mmpsystems.pl

io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER

io_uring_enter() with IORING_ENTER_ABS_TIMER takes an absolute
timespec from the caller via ext_arg->ts. It arms an ABS mode
hrtimer in __io_cqring_wait_schedule(). The conversion path in
io_uring/wait.c parses ext_arg->ts inline rather than going
through io_parse_user_time(). It therefore does not pick up the
time namespace conversion added by the previous patch.

Apply timens_ktime_to_host() to the parsed time on the
IORING_ENTER_ABS_TIMER branch. This mirrors the IORING_TIMEOUT_ABS
fix in io_parse_user_time(). Use ctx->clockid as the clock id.
ctx->clockid is set either at ring creation or via
IORING_REGISTER_CLOCK.

timens_ktime_to_host() is a no-op for clocks not affected by time
namespaces. It is also a no-op for callers in the initial time
namespace. The fast path is unchanged.

Reproducer: in unshare --user --time, with a -10s monotonic
offset, call io_uring_enter with min_complete=1,
IORING_ENTER_ABS_TIMER, and ts = now + 1s. The call returns
-ETIME after <1ms instead of after the expected ~1s.

Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Link: https://patch.msgid.link/20260504153755.1293932-3-maoyi.xie@ntu.edu.sg
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/timeout: honour caller's time namespace for IORING_TIMEOUT_ABS

io_uring's IORING_OP_TIMEOUT and IORING_OP_LINK_TIMEOUT accept a
timespec from the caller via io_parse_user_time(). With
IORING_TIMEOUT_ABS, the timestamp is an absolute deadline on the
selected clock. The clock is CLOCK_MONOTONIC by default.
CLOCK_BOOTTIME and CLOCK_REALTIME are also selectable.

A submitter inside a CLONE_NEWTIME time namespace observes
CLOCK_MONOTONIC and CLOCK_BOOTTIME shifted by the namespace's
offsets relative to the host. Every other ABS timer interface in
the kernel converts the caller's absolute time to host view via
timens_ktime_to_host() before arming an hrtimer:

  kernel/time/posix-timers.c    -- timer_settime(TIMER_ABSTIME)
  kernel/time/posix-stubs.c     -- clock_nanosleep(TIMER_ABSTIME)
  kernel/time/alarmtimer.c      -- alarm_timer_nsleep(TIMER_ABSTIME)
  fs/timerfd.c                  -- timerfd_settime(TFD_TIMER_ABSTIME)

io_parse_user_time() does not. As a result, an absolute timeout
submitted from within a time namespace is interpreted in host
view. That is generally a different point in time. It may already
be in the past, causing the timer to fire immediately, or far in
the future, causing the timer not to fire when expected.

Reproducer: in unshare --user --time, with a -10s monotonic
offset, submit IORING_OP_TIMEOUT with IORING_TIMEOUT_ABS and
deadline = now + 1s. The CQE is delivered after <1ms instead of
the expected ~1s.

Apply timens_ktime_to_host() to the parsed time when
IORING_TIMEOUT_ABS is set. Split the existing clock id resolver
in io_timeout_get_clock() into a flags only helper
io_flags_to_clock(), so io_parse_user_time() can resolve the
clock without a struct io_timeout_data.

timens_ktime_to_host() is a no-op for clocks not affected by time
namespaces, e.g. CLOCK_REALTIME. It is also a no-op for callers
in the initial time namespace. The fast path is unchanged.

SQPOLL is also covered. The SQPOLL kernel thread is created via
create_io_thread() with CLONE_THREAD and no CLONE_NEW* flag.
copy_namespaces() therefore shares the submitter's nsproxy by
reference. Inside the SQPOLL kthread, current->nsproxy->time_ns
is the submitter's time_ns. timens_ktime_to_host() resolves
correctly.

Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Link: https://patch.msgid.link/20260504153755.1293932-2-maoyi.xie@ntu.edu.sg
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ublk: validate physical_bs_shift, io_min_shift and io_opt_shift

ublk_validate_params() checks logical_bs_shift is within
[9, PAGE_SHIFT] but has no upper bound for physical_bs_shift,
io_min_shift, or io_opt_shift. A malicious userspace can set any
of these to a large value (e.g., 44), causing undefined behavior
from `1 << shift` in ublk_ctrl_start_dev() since the result is
stored in 32-bit unsigned int.

Cap all three at ilog2(SZ_256M) (28). 256M is big enough to cover
all practical block sizes, and originates from the maximum physical
block size possible in NVMe (lba_size * (1 + npwg), where npwg is
16-bit).

Also zero out ub->params with memset() when copy_from_user() fails
or ublk_validate_params() returns error, so that no stale or partial
params survive for a subsequent START_DEV to consume.

Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver")
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Link: https://patch.msgid.link/20260506082238.22363-1-tom.leiming@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

drm/xe/cri: Add new PCI IDs

Add support for new CRI PCI IDs.

Bspec: 77979
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20260505074855.3813063-2-balasubramani.vivekanandan@intel.com
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>

wifi: mac80211: explicitly disable FTM responder on AP stop

When stopping the AP, explicitly disable FTM responder while
disabling beaconing.

Link: https://patch.msgid.link/20260505151241.f213196d7d6a.I95d65c030e986c5f7d63ecbd79596da890b9fc84@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: don't blindly start the responder upon BSS_CHANGED_FTM_RESPONDER

mac80211 may just want to stop it, so check the ftm_responder boolean
before starting the responder.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260505151241.285da8fbf7f4.I1b6922ca8d06d592356d7a5d190e6118fec1d5b5@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

drm/bridge: microchip-lvds: fix bus format mismatch with VESA displays

The LVDS controller was hardcoded to JEIDA mapping, which leads to
distorted output on panels expecting VESA mapping.

Update the driver to dynamically select the appropriate mapping and
pixel size based on the panel's advertised media bus format. This
ensures compatibility with both JEIDA and VESA displays.

Signed-off-by: Sandeep Sheriker M <sandeep.sheriker@microchip.com>
Signed-off-by: Dharma Balasubiramani <dharma.b@microchip.com>
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Link: https://patch.msgid.link/20250625-microchip-lvds-v6-3-7ce91f89d35a@microchip.com
Signed-off-by: Manikandan Muralidharan <manikandan.m@microchip.com>

drm/bridge: microchip-lvds: migrate to atomic bridge ops

Replace legacy .enable and .disable callbacks with their atomic
counterparts .atomic_enable and .atomic_disable.

Signed-off-by: Dharma Balasubiramani <dharma.b@microchip.com>
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Link: https://patch.msgid.link/20250625-microchip-lvds-v6-2-7ce91f89d35a@microchip.com
Signed-off-by: Manikandan Muralidharan <manikandan.m@microchip.com>

drm/bridge: microchip-lvds: Remove unused drm_panel and redundant port node lookup

Drop the unused drm_panel field from the mchp_lvds structure, and remove
the unnecessary port device node lookup, as devm_drm_of_get_bridge()
already performs the required checks internally.

Signed-off-by: Dharma Balasubiramani <dharma.b@microchip.com>
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Link: https://patch.msgid.link/20250625-microchip-lvds-v6-1-7ce91f89d35a@microchip.com
Signed-off-by: Manikandan Muralidharan <manikandan.m@microchip.com>

wifi: mac80211_hwsim: claim HT STBC capability

This is already claimed for VHT and HE, so it doesn't really
make sense to not claim it for HT, and this causes sigma-dut
failures since it assumes VHT support implies HT support.

Link: https://patch.msgid.link/20260506093231.155762-2-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211_hwsim: enable NAN_DATA interface simulation support

Enable NAN_DATA interface simulation support by adding it to the
supported interface types. This completes the NAN Data Path
simulation introduced in the previous patches.

Signed-off-by: Daniel Gabay <daniel.gabay@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260506064301.d4bd95959bfa.I450087714bd55189242ab6a72ce6650be36edbcb@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211_hwsim: Support Tx of multicast data on NAN

Add support for transmitting multicast data frames. These
frames can be transmitted when all the peer NDI stations
on the interface are available at the current slot.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260506064301.0af7e24f0df3.I3c2de3e456ae092c939e6bfd3d30960fbf2fbeaa@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211_hwsim: Do not declare support for NDPE

Do not declare support for NAN Data Path Extension attribute
as this is handled by user space and should be set by it.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260506064301.711c61538c8a.I9796410c0376f50a07259cc611428d76c51f180a@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211_hwsim: Declare support for secure NAN

Advertise NL80211_EXT_FEATURE_SECURE_NAN to indicate support for
NAN Pairing, enabling peer authentication and secure data path
establishment.

Signed-off-by: Daniel Gabay <daniel.gabay@intel.com>
Reviewed-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260506064301.d3bcc26b4525.I6993cc70c43579694ffd429f1afb971a73db2ae4@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211_hwsim: add NAN data path TX/RX support

Implement TX and RX path handling for NAN Data Path (NDP) frames,
enabling data communication between NAN peers during scheduled
availability windows.

TX path:
- Select TX channel based on current time slot: use DW channel
  during Discovery Windows, or FAW channel from local
  schedule during Further Availability Windows.
- Verify peer availability before transmission by checking committed
  DW schedule or FAW of the peer schedule.

RX path:
- Extend NAN receive filtering to handle NAN_DATA interface frames.
- Accept incoming frames during FAW slots when channel matches local
  schedule.

Signed-off-by: Daniel Gabay <daniel.gabay@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260506064301.155252ebc72b.Ic210f6c095c6ff372941bc8c77ee9c8c37d0356c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211_hwsim: set HAS_RATE_CONTROL when using NAN

- NAN switches between bands/channels per its schedule, so mac80211
  rate control can't work, set HAS_RATE_CONTROL instead.
- Skip rate control checks for NAN interfaces in
  mac80211_hwsim_sta_rc_update() as it's not relevant.
- Move set_rts_threshold stub to HWSIM_COMMON_OPS and return 0 instead
  of -EOPNOTSUPP to prevent failures in non-MLO tests that set RTS
  threshold (hwsim ignores the use_rts instruction from mac80211
  anyway).

Signed-off-by: Daniel Gabay <daniel.gabay@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260506064301.216e68be61ac.If9ef94a12cec8dfc55416afaf745d6e5025a5ec9@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211_hwsim: implement NAN schedule callbacks

Implement mac80211 schedule callbacks for NAN Data Path support:

- Track local schedule via BSS_CHANGED_NAN_LOCAL_SCHED, caching
  the channel for each 16TU time slot.
- Copy peer schedule to driver-private storage in
  nan_peer_sched_changed callback for use in TX availability
  decisions.

Signed-off-by: Daniel Gabay <daniel.gabay@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260506064301.f3ad9e3dc9d4.I75cf3555b7506d5b8bb30e70a0f3721ab73477cb@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211_hwsim: add NAN PHY capabilities

Add static HT, VHT and HE PHY capabilities to the NAN capabilities
structure. These are based on the existing band capability structures
and initialization in mac80211_hwsim.

The NAN PHY capabilities are used by mac80211 and nl80211 to
advertise device capabilities for NAN data interfaces.

Signed-off-by: Daniel Gabay <daniel.gabay@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260506064301.2c94c156f05d.I539fab4adf2eb43bfec27006f7529b926e5208ea@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211_hwsim: add NAN_DATA interface limits

Increase interface limits for NAN_DATA interface.

Signed-off-by: Daniel Gabay <daniel.gabay@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260506064301.587955b23089.I261b782e5c198726b9465815d59ce037f094784d@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211_hwsim: implement NAN synchronization

Add all the handling to do NAN synchronization on 2.4 GHz including
sending out beacons. With this, the mac80211_hwsim NAN device also works
when used in conjunction with an external medium simulation.

Note that the TSF sync is not ideal in case of an external medium
simulation. This is because the mactime for received frames needs to be
estimated and the simulation may not update the timestamp of beacons
to the actual time that the frame was transmitted.

The implementation has an initial short phase where it scans for
clusters. This facilitates cluster joining and avoids creating a new
cluster immediately, which would result in two cluster join
notifications. It does not scan otherwise and will only see another
cluster appearing if a discovery beacon happens to be sent during the
2.4 GHz discovery window (DW).

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260506064301.7d21c3cdc565.I98b6c15eadefd6d123658294ef1a0cd3c2ce3054@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211_hwsim: protect tsf_offset using a spinlock

To implement NAN synchronization in hwsim, the TSF needs to be adjusted
regularly from the RX path. Add a spinlock so that this can be done in a
safe manner.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260506064301.18f36f264eb9.I0da5477220b896e2177bd521f7d9a8f2595631e6@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211_hwsim: only RX on NAN when active on a slot

This moves the NAN receive into the main code and changes it so that
frame RX only happens when the device is active on the channel. This
limits RX to the DW slots as there is currently no datapath.

With this the globally stored channel is obsolete, remove it.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260506064301.8cf4a67d3436.Ife07cf4ae8a2d59766356398163f7ee8d734bd6a@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211_hwsim: select NAN TX channel based on current TSF

Move the TX channel selection into the NAN specific file and select the
channel based on the current slot.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260506064301.c235b5a78b98.I5ec4076a8a9445233dc414c6ecaa39f32f1e9595@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211_hwsim: limit TX of frames to the NAN DW

Frames submitted on the NAN device interface should only be transmitted
during one of the discovery windows (DWs). It is assumed that software
submits frames from the DW end notifications for the next DW period.

Simulate this behaviour by checking that we are currently in a DW before
transmitting from ieee80211_hwsim_wake_tx_queue. As frames will be
queued up at the start of a DW, wake the management TX queue every time
a DW is started. Do so with a randomized offset just to avoid every
client transmitting at the same time.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260506064301.f3456f159655.Id6780e2f7f7cab03264299b7d696ba5b1269e451@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: cfg80211: don't allow NAN DATA on multi radio devices

The support for NAN DATA was added for single radio devices only. For
example, checking the interface combinations is done for a single radio.
Prevent registration with NAN DATA interface type for multi radio
devices.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260505194607.ff87e6fcff56.If201aa58119d2a6b08223ecb63bc2869f63ff5a1@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Revert "drm/edid: add CTA Video Format Data Block support"

This reverts commit e3953ff665742c15c002af9e176bd24d5cd9ec61.

Seems to have been accidentally pushed without mandatory review.

Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de> #irc
Acked-by: Maxime Ripard <mripard@kernel.org> #irc

wifi: nl80211: re-check wiphy netns in nl80211_prepare_wdev_dump() continuation

NL80211_CMD_GET_SCAN is implemented as a multi-call dumpit. The first
invocation of nl80211_prepare_wdev_dump() validates the requested wdev
against the caller's netns via __cfg80211_wdev_from_attrs(). Subsequent
invocations look up the same wiphy by its global index and do not check
that the wiphy is still in the caller's netns.

Add the same filter to the continuation path. If the wiphy's netns no
longer matches the caller's, return -ENODEV and the netlink dump
machinery terminates the walk cleanly.

Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Link: https://patch.msgid.link/20260506064854.2207105-3-maoyixie.tju@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: nl80211: require CAP_NET_ADMIN over the target netns in SET_WIPHY_NETNS

NL80211_CMD_SET_WIPHY_NETNS dispatches with GENL_UNS_ADMIN_PERM, which
verifies that the caller has CAP_NET_ADMIN for the source netns. It
doesn't verify that the caller has CAP_NET_ADMIN over the target netns
selected by NL80211_ATTR_NETNS_FD or NL80211_ATTR_PID.

This diverges from the convention enforced in
net/core/rtnetlink.c::rtnl_get_net_ns_capable():

    /* For now, the caller is required to have CAP_NET_ADMIN in
     * the user namespace owning the target net ns.
     */
    if (!sk_ns_capable(sk, net->user_ns, CAP_NET_ADMIN))
        return ERR_PTR(-EACCES);

A user with CAP_NET_ADMIN in their own user namespace can therefore
push a wiphy into an arbitrary netns (including init_net) over which
they have no privilege.

Mirror the rtnetlink convention by requiring CAP_NET_ADMIN in the
target netns before calling cfg80211_switch_netns().

Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Link: https://patch.msgid.link/20260506064854.2207105-2-maoyixie.tju@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: nl80211: fix NL80211_PMSR_FTM_REQ_ATTR_FTMS_PER_BURST usage

This is documented as a u8 and has a policy of NLA_U8, but uses
nla_get_u32() which means it's completely broken on big-endian.
Fix it to use nla_get_u8().

Fixes: 9bb7e0f24e7e ("cfg80211: add peer measurement with FTM initiator API")
Link: https://patch.msgid.link/20260505113837.260159-2-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: remove station if connection prep fails

If connection preparation fails for MLO connections, then the
interface is completely reset to non-MLD. In this case, we must
not keep the station since it's related to the link of the vif
being removed. Delete an existing station. Any "new_sta" is
already being removed, so that doesn't need changes.

This fixes a use-after-free/double-free in debugfs if that's
enabled, because a vif going from MLD (and to MLD, but that's
not relevant here) recreates its entire debugfs.

Cc: stable@vger.kernel.org
Fixes: 81151ce462e5 ("wifi: mac80211: support MLO authentication/association with one link")
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260505151533.c4e52deb06ad.Iafe56cec7de8512626169496b134bce3a6c17010@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: rtw89: debug: add BB diagnose

Add BB diagnostic to track potential abnormal conditions. Currently,
five diagnostic metrics are monitored: 1) Hang detection monitors
consecutive absence of TX and RX activity. 2) PD maximum triggers
when PD stays at its maximum threshold for a period. 3) No RX
occurs when no CCA activity is detected over multiple consecutive
cycles. 4) High FA indicates a high false alarm ratio, reflecting
severe environmental interference. 5) EDCCA alerts when high EDCCA
ratio, signaling a potential TX hang.

These metrics are exposed via debugfs diag_bb.

Output:

  [PHY 0]
  Diag bitmap = 0x0
  Event{Hang, PD MAX, No RX, High FA, High EDCCA Ratio} = {0, 0, 0, 0, 0}
  consecutive_no_tx_cnt=0, consecutive_no_rx_cnt=0

Signed-off-by: Kuan-Chung Chen <damon.chen@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20260429132625.1659182-8-pkshih@realtek.com

wifi: rtw89: debug: add RX statistics in bb_info

Expand RX packet statistics including coding type, spatial
diversity, and beamforming. These statistics are accumulated
per PHY and displayed in bb_info debugfs.

RX statistics output:

== RX General
LDPC: 190, BCC: 0, STBC: 0, SU_NON_BF: 0, SU_BF: 190, MU: 0

Signed-off-by: Kuan-Chung Chen <damon.chen@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20260429132625.1659182-7-pkshih@realtek.com

ALSA: sparc/dbri: add missing fallthrough

Fixes compiler error with probably newer compilers:

sound/sparc/dbri.c:595:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
  595 |         case 1:
      |         ^
sound/sparc/dbri.c:595:2: note: insert 'break;' to avoid fall-through
  595 |         case 1:
      |         ^
      |         break;

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20260506031854.780411-1-rosenp@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

wifi: rtw89: debug: extend bb_info with TX status and PER

Enhance bb_info debugfs by adding TX status information to aid
debugging and performance analysis.

A snapshot of TX-related registers, including PPDU type and subtype,
bandwidth, TX power, STBC, etc. The information is collected per
PHY during track_work and displayed via bb_info debugfs.

TX status output:

  == TX General
  EHT_MU_SU DATA
  BW: 160, TX_SC: 0, TX_PATH_EN: 3, PATH_MAP: 0xe4
  TXPWR TMAC: 11, P0: 11, P1: 11 dBm
  MCS: 9, STBC: 0
  Info: [0x0000000b, 0xc1702c30, 0x0ff1e430, 0xc0000000, 0x99000009, 0x40000000]
  Common ctrl: [0x00000150, 0x40190140]

In addition, include the retry ratio from RA reports and display it
as PER (packet error rate) in station debug information.

PER output:

  TX rate [0, 0]: HE 2SS MCS-4 GI:0.8 BW:20 (hw_rate=0x324) PER:23

Signed-off-by: Kuan-Chung Chen <damon.chen@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20260429132625.1659182-6-pkshih@realtek.com

gpio: 74x164: support lines-initial-states for boot-time output state

74HC595 and 74LVC594 chains retain their output state from the first
serial write onwards. Today the driver always kicks that first write
from a zero-initialised buffer, so every output comes up low until user
space issues a write. Boards that rely on the chain to drive signals
whose power-on state matters (active-low indicators, reset lines, etc.)
have no way to express the desired initial pattern via DT.

Read the optional lines-initial-states bitmask, recently documented for
this binding, into chip->buffer before the first
__gen_74x164_write_config() so the chain comes up in a known state on
the very first SPI transaction. Bit N maps to GPIO line N (matching the
nxp,pcf8575 convention); on this output-only device, bit=0 drives the
line low and bit=1 drives it high. Property absence keeps the existing
zeroing behaviour intact.

Suggested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Chanhong Jung <happycpu@gmail.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/20260429035134.1023330-3-happycpu@gmail.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>

dt-bindings: gpio: fairchild,74hc595: add lines-initial-states property

The 74HC595 and 74LVC594 shift registers latch their outputs until the
first serial write, so boards that depend on a specific power-on pattern
(for example active-low indicators, reset lines, or other signals that
must come up non-zero) have no way to express that today: the Linux
driver always writes zeros from its zero-initialised buffer during
probe.

Document support for the existing lines-initial-states bitmask, already
defined for nxp,pcf8575, so the same convention covers this output-only
device. Bit N corresponds to GPIO line N. Because the 74HC595/74LVC594
family is push-pull output only (no input mode, no high-impedance state
under software control), bit=0 drives the line low and bit=1 drives it
high; this differs from nxp,pcf8575, where the 0/1 polarity reflects the
quasi-bidirectional nature of that part.

The bitmask covers up to 32 lines, which fits the typical 1-4 chip
cascades that appear in tree. Should longer chains require seeding in
the future, the property can be extended to a uint32-array without
breaking the bit-N-equals-line-N convention.

Suggested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Chanhong Jung <happycpu@gmail.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/20260429035134.1023330-2-happycpu@gmail.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>

drm/gma500: Drop unused include of <drm/drm_pciids.h>

The <drm/drm_pciids.h> header only defines radeon_PCI_IDS which isn't
used in the gma500 driver. So drop the useless include statement.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Link: https://patch.msgid.link/981b4edf04d7fce3bd750f1136ac4ad78628d114.1777545446.git.u.kleine-koenig@baylibre.com

ALSA: core: Serialize deferred fasync state checks

snd_fasync_helper() updates fasync->on under snd_fasync_lock, and
snd_fasync_work_fn() now also evaluates fasync->on under the same
lock. snd_kill_fasync() still tests the flag before taking the lock,
leaving an unsynchronized read against FASYNC enable/disable updates.

Move the enabled-state check into the locked section.

Also clear fasync->on under snd_fasync_lock in snd_fasync_free()
before unlinking the pending entry. Together with the locked sender-side
check, this publishes teardown before flushing the deferred work and
prevents a racing sender from requeueing the entry after free has
started.

Fixes: ef34a0ae7a26 ("ALSA: core: Add async signal helpers")
Fixes: 8146cd333d23 ("ALSA: core: Fix potential data race at fasync handling")
Cc: stable@vger.kernel.org
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260506-alsa-core-fasync-on-lock-v1-1-ea48c77d6ca4@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

wifi: rtw89: debug: add PMAC counter in bb_info

PMAC (Pseudo MAC) is a circuit within the baseband that can report
various packet-related counters through registers, such as TX ON,
TX EN, CCA, FA, CRC, etc. The driver periodically collects per
PHY PMAC counters in track_work and exposes them through the
bb_info debugfs for easier debugging.

The output of PMAC counter:

  == PMAC
  TX [CCK_TXEN, CCK_TXON, OFDM_TXEN, OFDM_TXON]: [0, 0, 17, 17]
  CRC  [CCK, OFDM, HT, VHT, HE, EHT, ALL, MPDU]
   ok: [0, 301, 0, 0, 5, 0, 306, 5]
  err: [0, 4, 0, 0, 0, 0, 4, 0]
  CCA [CCK, OFDM]: [0, 353]
  FA  [CCK, OFDM]: [0, 39]
  CCA spoofing [CCK, OFDM]: [0, 0]
  CCK SFD: 0, SIG_GG: 0
  OFDM Parity: 4, Rate: 2, LSIG_BRK_S: 0, LSIG_BRK_L: 7, SBD: 5
  AMPDU miss: 0

Signed-off-by: Kuan-Chung Chen <damon.chen@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20260429132625.1659182-5-pkshih@realtek.com

wifi: rtw89: debug: bb_info entry including TX rate count for WiFi 7 chips

Enhance TX performance visibility for WiFi 7 chips by introducing TX
rate count tracking. This is critical for debugging and validation.
Additionally, introduce a new debugfs bb_info to enable and provide
baseband status.

Usage of bb_info debugfs:
  $ echo enable 1 > bb_info  // Start logging BB statistics information
  $ echo mac_id 0 > bb_info  // Specify mac_id for TX rate count tracking

The output of bb_info:

  TP TX: 0 [0] Mbps, RX: 0 [0] Mbps
  Avg packet length: TX=118, RX=136
  TF: 0
  TX count [0]:
     Legacy: [0, 0, 0, 0]
       OFDM: [0, 0, 0, 0, 0, 0, 0, 0]
    MCS 1SS: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    MCS 2SS: [183, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

  [PHY 0]
  == RSSI/RX Rate
  Beacon: 19 (-41 dBm)
  RX count:
     Legacy: [0, 0, 0, 0]
       OFDM: [0, 0, 0, 0, 0, 0, 0, 0]
       HT 0: [0, 0, 0, 0, 0, 0, 0, 0]
       HT 1: [0, 0, 0, 0, 0, 0, 0, 0]
    VHT 1SS: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0][0, 0]
    VHT 2SS: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0][0, 0]
     HE 1SS: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
     HE 2SS: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    EHT 1SS: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0][0, 0]
    EHT 2SS: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 185]

  TX rate [0, 2]: EHT 2SS MCS-0 GI:0.8 FB_G BW:160 (hw_rate=0x420) ==> agg_wait=-1 (1)
  RX rate [0, 2]: EHT 2SS MCS-13 GI:0.8 BW:160 (hw_rate=0x42d)
  RSSI: -43 dBm (raw=134, prev=135) [-43, -44]
  EVM: [38.75, (41.50, 43.00)]    SNR: 39

Signed-off-by: Kuan-Chung Chen <damon.chen@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20260429132625.1659182-4-pkshih@realtek.com

wifi: rtw89: phy: support per PHY RX statistics

Previously, RX statistics such as beacon RSSI and packet
counters were shared across all PHYs. To support MLO,
extend the statistics to be maintained per PHY.

Update the debugfs output for phy_info and beacon_info
to include a "[PHY X]" label for better clarity.

The output of phy_info:

  TP TX: 0 [0] Mbps (lv: 0), RX: 0 [0] Mbps (lv: 0)
  Avg packet length: TX=0, RX=120
  TF: 0

  [PHY 0]
  Beacon: 19 (-45 dBm)
  RX count:
     Legacy: [0, 0, 0, 0]
      ...
    EHT 2SS: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

The output of beacon_info:

  [PHY 0]
  Beacon: 20
  raw rssi: 131
  hw rate: 4
  length: 437

  [Beacon info]
  interval: 100
  dtim: 1

Signed-off-by: Kuan-Chung Chen <damon.chen@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20260429132625.1659182-3-pkshih@realtek.com

wifi: rtw89: mlo: rearrange MLSR link decision flow

The original MLSR link decision refers to RSSI, but it should be
based on the premise of an existing link. Otherwise, make a link
decision to select a new link from any available band.

Signed-off-by: Kuan-Chung Chen <damon.chen@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20260429132625.1659182-2-pkshih@realtek.com

drm/bridge: cdns-dsi: Replace deprecated UNIVERSAL_DEV_PM_OPS()

The deprecated UNIVERSAL_DEV_PM_OPS() macro uses the provided callbacks
for both runtime PM and system sleep. This causes the DSI clocks to be
disabled twice: once during runtime suspend and again during system
suspend, resulting in a WARN message from the clock framework when
attempting to disable already-disabled clocks.

[   84.384540] clk:231:5 already disabled
[   84.388314] WARNING: CPU: 2 PID: 531 at /drivers/clk/clk.c:1181 clk_core_disable+0xa4/0xac
...
[   84.579183] Call trace:
[   84.581624]  clk_core_disable+0xa4/0xac
[   84.585457]  clk_disable+0x30/0x4c
[   84.588857]  cdns_dsi_suspend+0x20/0x58 [cdns_dsi]
[   84.593651]  pm_generic_suspend+0x2c/0x44
[   84.597661]  ti_sci_pd_suspend+0xbc/0x15c
[   84.601670]  dpm_run_callback+0x8c/0x14c
[   84.605588]  __device_suspend+0x1a0/0x56c
[   84.609594]  dpm_suspend+0x17c/0x21c
[   84.613165]  dpm_suspend_start+0xa0/0xa8
[   84.617083]  suspend_devices_and_enter+0x12c/0x634
[   84.621872]  pm_suspend+0x1fc/0x368

To address this issue, replace UNIVERSAL_DEV_PM_OPS() with
RUNTIME_PM_OPS(). Bridge and panel drivers should only deal with runtime
PM, as the DRM framework manages system-wide power transitions through
the bridge enable() and disable() hooks.

Link: https://lore.kernel.org/all/fbde0659-78f3-46e4-98cf-d832f765a18b@ideasonboard.com/
Cc: stable@vger.kernel.org # 6.1.x
Fixes: e19233955d9e ("drm/bridge: Add Cadence DSI driver")
Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Signed-off-by: Vitor Soares <vitor.soares@toradex.com>
Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Link: https://patch.msgid.link/20260505134705.188661-2-ivitro@gmail.com
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>

drm/xlnx/zynqmp-dpsub: Fix dependencies for COMPILE_TEST

The zynqmp-dpsub driver does not have build time dependencies on the PHY
or DMA drivers. These are runtime hardware restrictions.

Make the two dependencies optional if COMPILE_TEST.

Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Link: https://patch.msgid.link/20260505094716.1784225-1-wenst@chromium.org
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>

ALSA: hda/realtek: Add mute LED fixup for HP Pavilion 15-cs1xxx

Add a new fixup for the mute LED on the HP Pavilion 15-cs1xxx series
using the VREF on NID 0x1b.

The BIOS on these models (tested up to F.32) incorrectly reports
the mute LED on NID 0x18 via DMI OEM strings, which lacks VREF
capabilities. This fixup overrides the LED pin to the correct
NID 0x1b.

Signed-off-by: Rodrigo Faria <rodrigofilipefaria@gmail.com>
Link: https://patch.msgid.link/20260505185518.23625-1-rodrigofilipefaria@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: seq: Fix UMP group 16 filtering

The sequencer UAPI defines group_filter as an unsigned int bitmap.
Bit 0 filters groupless messages and bits 1-16 filter UMP groups 1-16.

The internal snd_seq_client storage is only unsigned short, so bit 16
is truncated when userspace sets the filter. The same truncation affects
the automatic UMP client filter used to avoid delivery to inactive
groups, so events for group 16 cannot be filtered.

Store the internal bitmap as unsigned int and keep both userspace-provided
and automatically generated values limited to the defined UAPI bits.

Fixes: d2b706077792 ("ALSA: seq: Add UMP group filter")
Cc: stable@vger.kernel.org
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260506-alsa-seq-ump-group16-filter-v1-1-b75160bf6993@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

wifi: rtl8xxxu: Detect the maximum supported channel width

Some devices malfunction when connected to a network with 40 MHz channel
width, because they don't support that.

RTL8188FU, RTL8192FU, and RTL8710BU (RTL8188GU) have a way to signal
this (and some other capabilities) to the driver. Get this information
from the hardware and advertise 40 MHz support only when the hardware
can handle it. We assume the other chips can always handle it.

RTL8710BU needs a different way to retrieve this information, which will
be implemented some other time.

Fixes: dbf9b7bb0edf ("wifi: rtl8xxxu: Enable 40 MHz width by default")
Cc: stable@vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221394
Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Reviewed-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/c57de68e-5d57-4c26-898f-8a284bb25381@gmail.com

irqchip/starfive: Fix error check for devm_platform_ioremap_resource()

devm_platform_ioremap_resource() returns an error pointer on failure, not
NULL.

Fix the check to use IS_ERR() and return PTR_ERR() to correctly handle
allocation failures.

Fixes: 2f59ca185497 ("irqchip/starfive: Use devm_ interfaces to simplify resource release")
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Changhuang Liang <changhuang.liang@starfivetech.com>
Link: https://patch.msgid.link/20260506041413.1670799-1-nichen@iscas.ac.cn

hrtimer: Return ktime_t from hrtimer_get_next_event()/hrtimer_next_event_without()

These functions really work in terms of ktime_t and not u64.

Change their return types and adapt the callers.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260504-hrtimer-next_event-v2-1-7a5d0550b42f@linutronix.de

clocksource: Clean up clocksource_update_freq() functions

Remove the unused functions __clocksource_update_freq_hz() and
__clocksource_update_freq_khz().

Then make __clocksource_update_freq_scale() static as it is not used
from external callers anymore. Also clean up the comment accordingly.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Acked-by: John Stultz <jstultz@google.com>
Link: https://patch.msgid.link/20260504-clocksource-update_freq-v2-1-3e696fb01776@linutronix.de

alarmtimer: Remove stale return description from alarm_handle_timer()

alarm_handle_timer() was converted from returning enum alarmtimer_restart
to void, but the kernel-doc "Return:" line was not removed. Remove the
stale description.

Fixes: 2634303f8773 ("alarmtimers: Remove return value from alarm functions")
Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260429080635.166790-1-zhanxusheng@xiaomi.com

selftests/posix_timers: Use CLOCK_THREAD_CPUTIME_ID for ITIMER_PROF measurements

It was reported that the posix_timers test was at times seeing failures
with ITIMER_PROF timers, specifically in cases where the RCU_SOFTIRQ was
taking up significant amounts of time.

Analysis showed that as the time in softirq isn't included in the task
stime + utime accounting used to trigger the SIGPROF so delays from softirq
work could cause it to appear that the signal was incorrectly delayed.

Contributing to this is that the test uses gettimeofday() to measure
itimers, which also means any scheduling delay can also cause failures (as
the task may not be running the entire time).

To fix this, convert all the itimer measurements to use clock_gettime(),
tweaking the logic to use nsecs instead of usecs. Then for ITIMER_PROF
timers, utilize the CLOCK_THREAD_CPUTIME_ID clockid so that it is similarly
measuring the time the task was running.

Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260428173957.1394265-1-jstultz@google.com

scripts/timers: Add timer_migration_tree.py

Introduce a script that provides a simple ascii representation of the
timer migration tree on top of boot trace events.

First boot with:

    trace_event==tmigr_connect_cpu_parent,tmigr_connect_child_parent

Then parse the result with:

scripts/timer_migration_tree.py < /sys/kernel/tracing/trace

On a system with 8 CPUs, this produces the following output:

    Tree for capacity 1024

                                      /-0, node 0, lvl:-1
                                     |
                                     |--1, node 0, lvl:-1
                                     |
                                     |--2, node 0, lvl:-1
                                     |
                                     |--3, node 0, lvl:-1
    -- /00000000dcebac8b, node 0, lvl:0
                                     |--4, node 0, lvl:-1
                                     |
                                     |--5, node 0, lvl:-1
                                     |
                                     |--6, node 0, lvl:-1
                                     |
                                      \-7, node 0, lvl:-1

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260423165354.95152-7-frederic@kernel.org

timers/migration: Handle capacity in connect tracepoints

This let tracers know to which hierarchy a CPU belongs to.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260423165354.95152-6-frederic@kernel.org

timers/migration: Split per-capacity hierarchies

Systems with heterogeneous CPU capacities, such as big.LITTLE, have
reported power issues since the introduction of the new timer migration
code.

Timers migrate from small capacity CPUs to big ones, degrading their
target residency and thus overall power consumption.

Solve this with splitting hierarchies per CPU capacity. For example in
a big.LITTLE machine, split a single hierarchy in two: one for big
capacity CPUs and another one for small capacity CPUs. This way global
timers only migrate across CPUs of the same capacity.

For simplicity purpose, split hierarchies keep the same number of
possible levels as if there were a single hierarchy, even though the
CPUs are distributed between multiple hierarchies. This could be a
problem on NUMA systems with heterogeneous CPU capacities (provided that
ever exists yet) where useless intermediate nodes may be created.
Solving this properly will imply on boot to know in advance how many
capacities are available and the number of CPUs for each of them.

Reported-by: Sehee Jeong <sehee1.jeong@samsung.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260423165354.95152-5-frederic@kernel.org

timers/migration: Track CPUs in a hierarchy

When a new root is created, the old root is connected to it and
propagates up its own assumed to be active state, since the hotplug
control CPU is itself active and part of the old root.

However with per-capacity hierarchies, this assumption won't be true
anymore because the hotplug control CPU calling the timer migration
prepare callback may not belong to the same hierarchy as the booting
CPU.

To solve this, track the available CPUs per hierarchies so that the
root connection can be offlined to safe CPUs.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260423165354.95152-4-frederic@kernel.org

timers/migration: Abstract out hierarchy to prepare for CPU capacity awareness

In order to prepare for separating out CPUs from different capacities in
distinct hierarchies, create a hierarchy structure that group setup
must rely upon.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260423165354.95152-3-frederic@kernel.org

drm: renesas: rz-du: mipi_dsi: Fix return path on error

In case of error, we should unwind correctly.
Switching to using dmam_ instead of dma_ and moving the code earlier
fixes the issue.

Fixes: 6f392f371650 ("drm: renesas: rz-du: Implement MIPI DSI host transfers")
Suggested-by: Pavel Machek <pavel@nabladev.com>
Signed-off-by: Chris Brandt <chris.brandt@renesas.com>
Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com>
Link: https://patch.msgid.link/20260501132135.196701-1-chris.brandt@renesas.com
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>

Merge branch 'timers/urgent' into timers/core

Pull in the timer migration fix so further changes can be applied.

timers/migration: Fix another hotplug activation race

The hotplug control CPU is assumed to be active in the hierarchy but
that doesn't imply that the root is active. If the current CPU is not
the one that activated the current hierarchy, and the CPU performing
this duty is still halfway through the tree, the root may still be
observed inactive. And this can break the activation of a new root as in
the following scenario:

1) Initially, the whole system has 64 CPUs and only CPU 63 is awake.

                   [GRP1:0]
                    active
                  /    |    \
                 /     |     \
         [GRP0:0]    [...]    [GRP0:7]
           idle      idle      active
         /   |   \               |
     CPU 0  CPU 1  ...         CPU 63
     idle   idle               active

2) CPU 63 goes idle _but_ due to a #VMEXIT it hasn't yet reached the
   [GRP1:0]->parent dereference (that would be NULL and stop the walk)
   in __walk_groups_from().

                   [GRP1:0]
                     idle
                  /    |    \
                 /     |     \
         [GRP0:0]    [...]    [GRP0:7]
           idle      idle       idle
         /   |   \                |
     CPU 0  CPU 1  ...         CPU 63
     idle   idle                idle

3) CPU 1 wakes up, activates GRP0:0 but didn't yet manage to propagate
   up to GRP1:0 due to yet another #VMEXIT.

                   [GRP1:0]
                     idle
                  /    |    \
                 /     |     \
         [GRP0:0]    [...]    [GRP0:7]
         active      idle       idle
         /   |   \                |
     CPU 0  CPU 1  ...         CPU 63
     idle  active               idle

3) CPU 0 wakes up and doesn't need to walk above GRP0:0 as it's CPU 1
   role.

                   [GRP1:0]
                     idle
                  /    |    \
                 /     |     \
         [GRP0:0]    [...]    [GRP0:7]
         active      idle       idle
         /   |   \                |
     CPU 0  CPU 1  ...         CPU 63
    active  active              idle

4) CPU 0 boots CPU 64. It creates a new root for it.

                             [GRP2:0]
                               idle
                           /          \
                          /            \
                   [GRP1:0]           [GRP1:1]
                   idle                 idle
                  /    |    \                \
                 /     |     \                \
         [GRP0:0]    [...]    [GRP0:7]      [GRP0:8]
         active      idle       idle          idle
         /   |   \                |            |
     CPU 0  CPU 1  ...         CPU 63        CPU 64
    active  active              idle         offline

5) CPU 0 activates the new root, but note that GRP1:0 is still idle,
   waiting for CPU 1 to resume from #VMEXIT and activate it.

                             [GRP2:0]
                              active
                           /          \
                          /            \
                   [GRP1:0]           [GRP1:1]
                   idle                 idle
                  /    |    \                \
                 /     |     \                \
         [GRP0:0]    [...]    [GRP0:7]      [GRP0:8]
         active      idle       idle          idle
         /   |   \                |            |
     CPU 0  CPU 1  ...         CPU 63        CPU 64
    active  active              idle         offline

6) CPU 63 resumes after #VMEXIT and sees the new GRP1:0 parent.
   Therefore it propagates the stale inactive state of GRP1:0 up to
   GRP2:0.

                             [GRP2:0]
                              idle
                           /          \
                          /            \
                   [GRP1:0]           [GRP1:1]
                   idle                 idle
                  /    |    \                \
                 /     |     \                \
         [GRP0:0]    [...]    [GRP0:7]      [GRP0:8]
         active      idle       idle          idle
         /   |   \                |            |
     CPU 0  CPU 1  ...         CPU 63        CPU 64
    active  active              idle         offline

7) CPU 1 resumes after #VMEXIT and finally activates GRP1:0. But it
   doesn't observe its parent link because no ordering enforced that.
   Therefore GRP2:0 is spuriously left idle.

                             [GRP2:0]
                              idle
                           /          \
                          /            \
                   [GRP1:0]           [GRP1:1]
                   active                 idle
                  /    |    \                \
                 /     |     \                \
         [GRP0:0]    [...]    [GRP0:7]      [GRP0:8]
         active      idle       idle          idle
         /   |   \                |            |
     CPU 0  CPU 1  ...         CPU 63        CPU 64
    active  active              idle         offline

Such races are highly theoretical and the problem would solve itself
once the old root ever becomes idle again. But it still leaves a taste
of discomfort.

Fix it with enforcing a fully ordered atomic read of the old root state
before propagating the activate state up to the new root. It has a two
directions ordering effect:

* Acquire + release of the latest old root state: If the hotplug control
  CPU is not the one that woke up the old root, make sure to acquire its
  active state and propagate it upwards through the ordered chain of
  activation (the acquire pairs with the cmpxchg() in tmigr_active_up()
  and subsequent releases will pair with atomic_read_acquire() and
  smp_mb__after_atomic() in tmigr_inactive_up()).

* Release: If the hotplug control CPU is not the one that must wake up
  the old root, but the CPU covering that is lagging behind its duty,
  publish the links from the old root to the new parents. This way the
  lagging CPU will propagate the active state itself.

Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260423165354.95152-2-frederic@kernel.org

drm/bridge: ite-it6263: Drop unnecessary blank line

Drop unnecessary blank line in it6263_hdmi_write_hdmi_infoframe().

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Liu Ying <victor.liu@nxp.com>
Link: https://patch.msgid.link/20260504145906.155198-1-biju.das.jz@bp.renesas.com
Signed-off-by: Liu Ying <victor.liu@nxp.com>

x86/cpu: Remove the CONFIG_X86_INVD_BUG quirk

Now that support for 486 CPUs is gone, remove this
quirk as well.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ahmed S. Darwish <darwi@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250425084216.3913608-7-mingo@kernel.org

x86/cpu, x86/platform, watchdog: Remove CONFIG_X86_RDC321X support

This depends on M486 CPU support, which has been removed.

Note that we still keep the RDC321X MFD, watchdog and GPIO
drivers, because apparently there were 586/686 CPUs offered with the
RDC321X, according to Arnd Bergmann:

| "the [RDC321X] product line is still actively developed by RDC
|  and DM&P, and I suspect that some of the drivers are still used
|  on 586tsc-class (vortex86dx, vortex86mx) and 686-class
|  (vortex86dx3, vortex86ex) SoCs that do run modern kernels and
|  get updates."

For this reason, update the watchdog driver and offer it on
the broader 32-bit landscape, which has been COMPILE_TEST=y
build-tested previously already:

  -       depends on X86_RDC321X || COMPILE_TEST
  +       depends on X86_32 || COMPILE_TEST

The MFD and GPIO drivers were already independent of CONFIG_X86_RDC321X.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20250425084216.3913608-6-mingo@kernel.org

x86/cpu: Remove TSC-less CONFIG_M586 support

Remove support for TSC-less Pentium variants.

All TSC-capable Pentium variants, derivatives and
clones should still work under the M586TSC or M586MMX
options.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250425084216.3913608-5-mingo@kernel.org

x86/cpu: Remove CPU_SUP_UMC_32 support

These are 486 based CPUs, which build option (M486) is now gone
upstream.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250425084216.3913608-4-mingo@kernel.org

x86/cpu: Remove CONFIG_MWINCHIP3D/MWINCHIPC6

These CPUs lack CMPXCHG8B support, according to Arnd Bergmann:

| "Winchip6 (486-class, no tsc, no cx8) and Winchip3D
| (486-class, with tsc but no cx8)"

Any still available derivatives, if they have TSC and CX8 support,
would work with regular Pentium builds, there's no need to have
a separate build option for them.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250425084216.3913608-3-mingo@kernel.org

x86: Mark AMD Geode support as orphaned

Andres mentioned that he no longer has access to Geode hardware including
the OLPC XO-1, so the MAINTAINERS entry is no longer accurate. I also
noticed that the documentation link no longer works, as the product
was finally discontinued a few years ago.

Aside from the XO-1, there are still a few embeded boards with custom code
in arch/x86/platforms/geode and a number of Geode based thin clients were
shipped that may continue to work without any custom kernel code.

Mark the platform as orphaned, remove the dead link, and update the
files list to include the platform code.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andres Salomon <dilinger@queued.net>
Link: https://lore.kernel.org/all/fddba1c8-a95a-490f-962e-8505cb948672@queued.net/
Link: https://patch.msgid.link/20260505212458.2263891-1-arnd@kernel.org

drm/bridge: ite-it6263: Move chip initialization code from probe to atomic_enable

On the RZ/G3L SMARC EVK, suspend to RAM powers down the ITE IT6263 chip.
The display controller driver's system PM callbacks invoke
drm_mode_config_helper_{suspend,resume}, which in turn call the bridge's
atomic_{disable,enable} callbacks to handle suspend/resume for the bridge
without dedicated PM ops.

To support proper reinitialization after power loss, move reset_gpio into
the it6263 struct so it is accessible beyond probe time. Relocate
it6263_hw_reset(), it6263_lvds_set_i2c_addr(), it6263_lvds_config() and
it6263_hdmi_config() from probe to atomic_enable, ensuring the chip is
fully reset and reconfigured on every enable, including after a
suspend/resume cycle.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Liu Ying <victor.liu@nxp.com>
Link: https://patch.msgid.link/20260501061200.20129-1-biju.das.jz@bp.renesas.com
Signed-off-by: Liu Ying <victor.liu@nxp.com>

Merge tag 'loongarch-fixes-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
"Fix some build and runtime issues after 32BIT Kconfig option enabled,
  improve the platform-specific PCI controller compatibility, drop
  custom __arch_vdso_hres_capable(), and fix a lot of KVM bugs"

* tag 'loongarch-fixes-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: KVM: Move unconditional delay into timer clear scenery
  LoongArch: KVM: Fix HW timer interrupt lost when inject interrupt by software
  LoongArch: KVM: Move AVEC interrupt injection into switch loop
  LoongArch: KVM: Use kvm_set_pte() in kvm_flush_pte()
  LoongArch: KVM: Fix missing EMULATE_FAIL in kvm_emu_mmio_read()
  LoongArch: KVM: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  LoongArch: KVM: Fix "unreliable stack" for kvm_exc_entry
  LoongArch: KVM: Compile switch.S directly into the kernel
  LoongArch: vDSO: Drop custom __arch_vdso_hres_capable()
  LoongArch: Fix potential ADE in loongson_gpu_fixup_dma_hang()
  LoongArch: Use per-root-bridge PCIH flag to skip mem resource fixup
  LoongArch: Fix SYM_SIGFUNC_START definition for 32BIT
  LoongArch: Specify -m32/-m64 explicitly for 32BIT/64BIT
  LoongArch: Make CONFIG_64BIT as the default option

Merge branch 'xsk-fix-bugs-around-xsk-skb-allocation'

Jason Xing says:

====================
xsk: fix bugs around xsk skb allocation

There are rare issues around xsk_build_skb(). Some of them
were founded by Sashiko[1][2].

[1]: https://lore.kernel.org/all/20260415082654.21026-1-kerneljasonxing@gmail.com/
[2]: https://lore.kernel.org/all/20260418045644.28612-1-kerneljasonxing@gmail.com/
====================

Link: https://patch.msgid.link/20260502200722.53960-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

xsk: fix u64 descriptor address truncation on 32-bit architectures

In copy mode TX, xsk_skb_destructor_set_addr() stores the 64-bit
descriptor address into skb_shinfo(skb)->destructor_arg (void *) via a
uintptr_t cast:

skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);

On 32-bit architectures uintptr_t is 32 bits, so the upper 32 bits of
the descriptor address are silently dropped. In XDP_ZEROCOPY unaligned
mode the chunk offset is encoded in bits 48-63 of the descriptor
address (XSK_UNALIGNED_BUF_OFFSET_SHIFT = 48), meaning the offset is
lost entirely. The completion queue then returns a truncated address to
userspace, making buffer recycling impossible.

Fix this by handling the 32-bit case directly in
xsk_skb_destructor_set_addr(): when !CONFIG_64BIT, allocate an
xsk_addrs struct (the same path already used for multi-descriptor
SKBs) to store the full u64 address. The existing tagged-pointer logic
in xsk_skb_destructor_is_addr() stays unchanged: slab pointers returned
from kmem_cache_zalloc() are always word-aligned and therefore have
bit 0 clear, which correctly identifies them as a struct pointer
rather than an inline tagged address on every architecture.

Factor the shared kmem_cache_zalloc + destructor_arg assignment into
__xsk_addrs_alloc() and add a wrapper xsk_addrs_alloc() that handles
the inline-to-list upgrade (is_addr check + get_addr + num_descs = 1).
The three former open-coded kmem_cache_zalloc call sites now reduce to
a single call each.

Propagate the -ENOMEM from xsk_skb_destructor_set_addr() through
xsk_skb_init_misc() so the caller can clean up the skb via kfree_skb()
before skb->destructor is installed.

The overhead is one extra kmem_cache_zalloc per first descriptor on
32-bit only; 64-bit builds are completely unchanged.

Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
Fixes: 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-9-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

xsk: fix xsk_addrs slab leak on multi-buffer error path

When xsk_build_skb() / xsk_build_skb_zerocopy() sees the first
continuation descriptor, it promotes destructor_arg from an inlined
address to a freshly allocated xsk_addrs (num_descs = 1). The counter
is bumped to >= 2 only at the very end of a successful build (by calling
xsk_inc_num_desc()).

If the build fails in between (e.g. alloc_page() returns NULL with
-EAGAIN, or the MAX_SKB_FRAGS overflow hits), we jump to free_err, skip
calling xsk_inc_num_desc() to increment num_descs and leave the half-built
skb attached to xs->skb for the app to retry. The skb now has
1) destructor_arg = a real xsk_addrs pointer,
2) num_descs = 1

If the app never retries and just close()s the socket, xsk_release()
calls xsk_drop_skb() -> xsk_consume_skb(), which decides whether to
free xsk_addrs by testing num_descs > 1:

if (unlikely(num_descs > 1))
kmem_cache_free(xsk_tx_generic_cache, destructor_arg);

Because num_descs is exactly 1 the branch is skipped and the
xsk_addrs object is leaked to the xsk_tx_generic_cache slab.

Fix it by directly testing if destructor_arg is still addr. Or else it
is modified and used to store the newly allocated memory from
xsk_tx_generic_cache regardless of increment of num_desc, which we
need to handle.

Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
Fixes: 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-8-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

xsk: avoid skb leak in XDP_TX_METADATA case

Fix it by explicitly adding kfree_skb() before returning back to its
caller.

How to reproduce it in virtio_net:
1. the current skb is the first one (which means no frag and xs->skb is
NULL) and users enable metadata feature.
2. xsk_skb_metadata() returns a error code.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
4. there is no chance to free this skb anymore.

Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-7-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

xsk: prevent CQ desync when freeing half-built skbs in xsk_build_skb()

Once xsk_skb_init_misc() has been called on an skb, its destructor is
set to xsk_destruct_skb(), which submits the descriptor address(es) to
the completion queue and advances the CQ producer. If such an skb is
subsequently freed via kfree_skb() along an error path - before the
skb has ever been handed to the driver - the destructor still runs and
submits a bogus, half-initialized address to the CQ.

Postpone the init phase when we believe the allocation of first frag is
successfully completed. Before this init, skb can be safely freed by
kfree_skb().

Closes: https://lore.kernel.org/all/20260419045822.843BFC2BCAF@smtp.kernel.org/
Fixes: c30d084960cf ("xsk: avoid overwriting skb fields for multi-buffer traffic")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-6-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path

When xsk_build_skb() processes multi-buffer packets in copy mode, the
first descriptor stores data into the skb linear area without adding
any frags, so nr_frags stays at 0. The caller then sets xs->skb = skb
to accumulate subsequent descriptors.

If a continuation descriptor fails (e.g. alloc_page returns NULL with
-EAGAIN), we jump to free_err where the condition:

if (skb && !skb_shinfo(skb)->nr_frags)
kfree_skb(skb);

evaluates to true because nr_frags is still 0 (the first descriptor
used the linear area, not frags). This frees the skb while xs->skb
still points to it, creating a dangling pointer. On the next transmit
attempt or socket close, xs->skb is dereferenced, causing a
use-after-free or double-free.

Fix by using a !xs->skb check to handle first frag situation, ensuring
we only free skbs that were freshly allocated in this call
(xs->skb is NULL) and never free an in-progress multi-buffer skb that
the caller still references.

Closes: https://lore.kernel.org/all/20260415082654.21026-4-kerneljasonxing@gmail.com/
Fixes: 6b9c129c2f93 ("xsk: remove @first_frag from xsk_build_skb()")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-5-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

xsk: handle NULL dereference of the skb without frags issue

When a first descriptor (xs->skb == NULL) triggers -EOVERFLOW in
xsk_build_skb_zerocopy() (e.g., MAX_SKB_FRAGS exceeded), the
free_err -EOVERFLOW handler unconditionally dereferences xs->skb
via xsk_inc_num_desc(xs->skb) and xsk_drop_skb(xs->skb), causing
a NULL pointer dereference.

Fix this by guarding the existing xsk_inc_num_desc()/xsk_drop_skb()
calls with an xs->skb check (for the continuation case), and add
an else branch for the first-descriptor case that manually cancels
the one reserved CQ slot and increments invalid_descs by one to
account for the single invalid descriptor.

Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-4-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS

Fix it by explicitly adding kfree_skb() before returning back to its
caller.

How to reproduce it in virtio_net:
1. the current skb is the first one (which means xs->skb is NULL) and
hit the limit MAX_SKB_FRAGS.
2. xsk_build_skb_zerocopy() returns -EOVERFLOW.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'. This
is why bug can be triggered.
4. there is no chance to free this skb anymore.

Note that if in this case the xs->skb is not NULL, xsk_build_skb() will
call xsk_drop_skb(xs->skb) to do the right thing.

Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-3-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

xsk: reject sw-csum UMEM binding to IFF_TX_SKB_NO_LINEAR devices

skb_checksum_help() is a common helper that writes the folded
16-bit checksum back via skb->data + csum_start + csum_offset,
i.e. it relies on the skb's linear head and fails (with WARN_ONCE
and -EINVAL) when skb_headlen() is 0.

AF_XDP generic xmit takes two very different paths depending on the
netdev. Drivers that advertise IFF_TX_SKB_NO_LINEAR (e.g. virtio_net)
skip the "copy payload into a linear head" step on purpose as a
performance optimisation: xsk_build_skb_zerocopy() only attaches UMEM
pages as frags and never calls skb_put(), so skb_headlen() stays 0
for the whole skb. For these skbs there is simply no linear area for
skb_checksum_help() to write the csum into - the sw-csum fallback is
structurally inapplicable.

The patch tries to catch this and reject the combination with error at
setup time. Rejecting at bind() converts this silent per-packet failure
into a synchronous, actionable -EOPNOTSUPP at setup time. HW csum and
launch_time metadata on IFF_TX_SKB_NO_LINEAR drivers are unaffected
because they do not call skb_checksum_help().

Without the patch, every descriptor carrying 'XDP_TX_METADATA |
XDP_TXMD_FLAGS_CHECKSUM' produces:
1) a WARN_ONCE "offset (N) >= skb_headlen() (0)" from skb_checksum_help(),
2) sendmsg() returning -EINVAL without consuming the descriptor
   (invalid_descs is not incremented),
3) a wedged TX ring: __xsk_generic_xmit() does not advance the
    consumer on non-EOVERFLOW errors, so the next sendmsg() re-reads
    the same descriptor and re-hits the same WARN until the socket
    is closed.

Closes: https://lore.kernel.org/all/20260419045822.843BFC2BCAF@smtp.kernel.org/#t
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
Link: https://patch.msgid.link/20260502200722.53960-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-mana-avoid-queue-struct-allocation-failure-under-memory-fragmentation'

Aditya Garg says:

====================
net: mana: Avoid queue struct allocation failure under memory fragmentation

The MANA driver can fail to load on systems with high memory
utilization because several allocations in the queue setup paths
require large physically contiguous blocks via kmalloc. Under memory
fragmentation these high-order allocations may fail, preventing the
driver from creating queues when opening the interface or when
reconfiguring channels, ring parameters or MTU at runtime.

Allocation sizes that are problematic:

  mana_create_txq -> tx_qp flat array (sizeof(mana_tx_qp) = 35528):
    16 queues (default): 35528 * 16 =  ~555 KB contiguous
    64 queues (max):     35528 * 64 = ~2220 KB contiguous

  mana_create_rxq -> rxq struct with flex array
  (sizeof(mana_rxq) = 35712, rx_oobs=296 per entry):
    depth 1024 (default): 35712 + 296 * 1024 =  ~331 KB per queue
    depth 8192 (max):     35712 + 296 * 8192 = ~2403 KB per queue

  mana_pre_alloc_rxbufs -> rxbufs_pre and das_pre arrays:
    16 queues, depth 1024 (default): 16 * 1024 * 8 =  128 KB each
    64 queues, depth 8192 (max):     64 * 8192 * 8 = 4096 KB each

This series addresses the issue by:
  1. Converting the tx_qp flat array into an array of pointers with
     per-queue kvzalloc (~35 KB each), replacing a single contiguous
     allocation that can reach ~2.2 MB at 64 queues.
  2. Switching rxbufs_pre, das_pre, and rxq allocations to
     kvmalloc/kvzalloc so the allocator can fall back to vmalloc
     when contiguous memory is unavailable.

Throughput testing confirms no regression. Since kvmalloc falls
back to vmalloc under memory fragmentation, all kvmalloc calls
were temporarily replaced with vmalloc to simulate the fallback
path (iperf3, GBits/sec):

                 Physically contiguous         vmalloc region
  Connections      TX          RX              TX          RX
  --------------------------------------------------------------
  1                47.2        46.9            46.8        46.6
  16               181         181             181         181
  32               181         181             181         181
  64               181         181             181         181
====================

Link: https://patch.msgid.link/20260502074552.23857-1-gargaditya@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mana: Use kvmalloc for large RX queue and buffer allocations

The RX path allocations for rxbufs_pre, das_pre, and rxq scale with
queue count and queue depth. With high queue counts and depth, these can
exceed what kmalloc can reliably provide from physically contiguous
memory under fragmentation.

Switch these from kmalloc to kvmalloc variants so the allocator
transparently falls back to vmalloc when contiguous memory is scarce,
and update the corresponding frees to kvfree.

Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/20260502074552.23857-3-gargaditya@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mana: Use per-queue allocation for tx_qp to reduce allocation size

Convert tx_qp from a single contiguous array allocation to per-queue
individual allocations. Each mana_tx_qp struct is approximately 35KB.
With many queues (e.g., 32/64), the flat array requires a single
contiguous allocation that can fail under memory fragmentation.

Change mana_tx_qp *tx_qp to mana_tx_qp **tx_qp (array of pointers),
allocating each queue's mana_tx_qp individually via kvzalloc. This
reduces each allocation to ~35KB and provides vmalloc fallback,
avoiding allocation failure due to fragmentation.

Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/20260502074552.23857-2-gargaditya@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'selftests-rds-log-collection-tap-compliance-and-cleanups'

Allison Henderson says:

====================
selftests: rds: Log collection, TAP compliance and cleanups

This series is a set of bug fixes and improvements for the rds
selftests.

Patch 1 bumps the kselftest timeout from 400s to 800s. The original
limit was developed against a lean config, but the kselftest harness
counts boot time and gcov log collection against the limit, so a
default config with gcov enabled needs more headroom.

Patch 2 corrects some typos in the run.sh USAGE string and removes an
unused "-g" flag.

Patch 3 silences a handful of pylint warnings in test.py: it adds a
module docstring, suppresses the warnings tied to the sys.path.append
import trick, marks the long lived tcpdump Popen with disable-next
consider-using-with, and drops unused exception variables from two
BlockingIOError except clauses.

Patch 4 adds a -t flag to run.sh so the timeout can be overridden
if needed.

Patch 5 adds a RDS_LOG_DIR environment variable that specifies where
logs should be stored, or skips log collection if left unset

Patch 6 adds a SUDO_USER environment variable that sets the user
for tcpdump --relinquish-privileges. This avoid the permissions
drop that would leave pcaps empty on 9pfs since 9p does not
support chown

Patch 7 removes the initial tmp tcpdumps and instead saves the pcaps
directly to the logdir if it is set.

Patch 8 hoists the tcpdump shutdown into a helper and calls it from the
timeout signal handler so that the processes are properly terminated
and dumps are flushed

Patch 9 fixes gcov collection by ensuring debugfs is mounted, and
specifying the --root folder so that gcov can still find the kernel
source when it is run from the ksft test directory.

Patch 10 makes the test output TAP compliant so the kselftest runner
parses results correctly.
====================

Link: https://patch.msgid.link/20260504054143.4027538-1-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: rds: Make rds selftests TAP compliant

This patch updates the rds selftests output to be TAP compliant.

Use ksft_pr() to mark debug output with a leading '# ' so that TAP
parsers treat it as commentary, and convert all informational print()
calls to use ksft_pr(). sys.exit(0) is changed to os._exit(0) to
avoid duplicate prints from the buffered TAP output. The console
output from the tcpdump subprocess is silenced, and the gcov console
output is redirected to a gcovr.log.

Finally adjust the exit path so that the hash check loop sets a
return code instead exiting directly. Then print the TAP results
and totals lines before exiting.

Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260504054143.4027538-11-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: rds: Fix gcov collection

debugfs is not mounted automatically in a virtme-ng guest, so the
gcov data copy from /sys/kernel/debug/gcov/ silently finds nothing
depending on whether debugfs is mounted by default on the host OS.
Fix this by mounting debugfs in run.sh before copying the gcda
files.

Finally when invoked through the kselftest runner, the working
directory is the test directory rather than the kernel source root.
gcovr defaults --root to the current working directory, which causes
it to filter out all coverage data for files under net/rds/ since
they are not under the test directory. Fix this by passing --root
to gcovr explicitly.

Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260504054143.4027538-10-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: rds: Stop tcpdump on timeout

The timeout signal handler for the rds selftests currently just
exits when the time limit is exceeded, and forgets to stop the
network dumps. Fix this by hoisting the tcpdump terminate commands
into a helper function, and call it from the signal handler before
exiting

Bound proc.wait() with a timeout (and fall back to proc.kill())
so an unresponsive tcpdump cannot hang the timeout path itself.

We also pop() tcpdump_procs as we iterate, so stop_pcaps() is safe
to call from both the normal cleanup path and the signal handler,
since the second invocation simply has nothing to do

Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260504054143.4027538-9-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: rds: Remove tmp pcaps

This patch removes the initial tmp tcpdumps and instead saves
the pcaps directly to the logdir if it is set.

Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260504054143.4027538-8-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: rds: Add SUDO_USER env variable

This patch modifies rds selftests to use the environment variable
SUDO_USER for tcpdumps if it is set. This is needed to avoid chown
operations on the vng 9pfs which is not supported. Passing a user
listed in sudoers avoids the tcpdump privilege drop which may
otherwise create empty pcaps

Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260504054143.4027538-7-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: rds: Add RDS_LOG_DIR env variable

This patch modifies the rds selftest to look for an env variable
RDS_LOG_DIR, and log all traces, pcaps and gcov collections to
the folder specified in RDS_LOG_DIR. If RDS_LOG_DIR is unset,
logs are not collected.

Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260504054143.4027538-6-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: rds: Add timeout flag to run.sh

Add a -t flag to run.sh to optionally override the default
timeout. The --timeout flag is already supported in test.py,
so just add the shorthand -t flag

Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260504054143.4027538-5-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: rds: Fix more pylint errors

This patch fixes a few pylint errors in test.py. Remove unused exception
variables from except blocks, and disable warnings for imports that cannot
appear at the start of the module. Also disable warnings for the
tcpdump processes. The suggestion to use a with block does not apply
here since the process needs to outlive the parent to collect the dumps.
Lastly add the module docstring at the top of the module.

Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260504054143.4027538-4-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: rds: Update USAGE string for run.sh

The run.sh script does not have a -g flag. Update USAGE string with
correct flags. Also fix typo packet_duplcate -> packet_duplicate

Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260504054143.4027538-3-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: rds: Increase selftest timeout

The 400s time out was originally developed under a leaner
kernel config that booted much faster than a default config.
Boot up is included as part of the over all test runtime, as
well as any log collection done when the test is complete.
A slower config combined with the gcov enabled test means
we'll need more time to accommodate the boot up and log
collection. So, bump time out to 800s.

Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260504054143.4027538-2-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

powerpc/pasemi: Drop redundant res assignment

Return value of pas_add_bridge() is not used, so code can be simplified
to fix W=1 clang warnings:

arch/powerpc/platforms/pasemi/pci.c:275:6: error: variable 'res' set but not used [-Werror,-Wunused-but-set-variable]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260317130823.240279-4-krzysztof.kozlowski@oss.qualcomm.com

powerpc/ps3: Drop redundant result assignment

Return value of ps3_start_probe_thread() is not used, so code can be
simplified to fix W=1 clang warnings:

arch/powerpc/platforms/ps3/device-init.c:953:6: error: variable 'result' set but not used [-Werror,-Wunused-but-set-variable]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260317130823.240279-3-krzysztof.kozlowski@oss.qualcomm.com

powerpc/vdso: Drop -DCC_USING_PATCHABLE_FUNCTION_ENTRY from 32-bit flags with clang

After commit 73cdf24e81e4 ("powerpc64: make clang cross-build
friendly"), building 64-bit little endian + CONFIG_COMPAT=y with clang
results in many warnings along the lines of:

  $ cat arch/powerpc/configs/compat.config
  CONFIG_COMPAT=y

  $ make -skj"$(nproc)" ARCH=powerpc LLVM=1 ppc64le_defconfig compat.config arch/powerpc/kernel/vdso/
  ...
  In file included from <built-in>:4:
  In file included from lib/vdso/gettimeofday.c:6:
  In file included from include/vdso/datapage.h:15:
  In file included from include/vdso/cache.h:5:
  arch/powerpc/include/asm/cache.h:77:8: warning: unknown attribute 'patchable_function_entry' ignored [-Wunknown-attributes]
     77 | static inline u32 l1_icache_bytes(void)
        |        ^~~~~~
  include/linux/compiler_types.h:235:58: note: expanded from macro 'inline'
    235 | #define inline inline __gnu_inline __inline_maybe_unused notrace
        |                                                          ^~~~~~~
  include/linux/compiler_types.h:215:34: note: expanded from macro 'notrace'
    215 | #define notrace                 __attribute__((patchable_function_entry(0, 0)))
        |                                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ...

arch/powerpc/Makefile adds -DCC_USING_PATCHABLE_FUNCTION_ENTRY to
KBUILD_CPPFLAGS, which is inherited by the 32-bit vDSO. However, the
32-bit little endian target does not support
'-fpatchable-function-entry', resulting in the warnings above.

Remove -DCC_USING_PATCHABLE_FUNCTION_ENTRY from the 32-bit vDSO flags
when building with clang to avoid the warnings.

Fixes: 73cdf24e81e4 ("powerpc64: make clang cross-build friendly")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260311-ppc-vdso-drop-cc-using-pfe-define-clang-v1-1-66c790e22650@kernel.org

platform/chrome: cros_ec_typec: Init mutex in Thunderbolt registration

cros_typec_register_thunderbolt() missed initializing the `adata->lock`
mutex. This leads to a NULL dereference when the mutex is later
acquired (e.g. in cros_typec_altmode_work()).

Initialize the mutex in cros_typec_register_thunderbolt() to fix the
issue.

Cc: stable@vger.kernel.org
Fixes: 3b00be26b16a ("platform/chrome: cros_ec_typec: Thunderbolt support")
Reviewed-by: Benson Leung <bleung@chromium.org>
Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Link: https://lore.kernel.org/r/20260505053403.3335740-1-tzungbi@kernel.org
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>