]> git.ipfire.org Git - thirdparty/kernel/stable.git/log
thirdparty/kernel/stable.git
12 days agoipmr: Add __rcu to netns_ipv4.mrt.
Kuniyuki Iwashima [Sat, 2 May 2026 18:07:47 +0000 (18:07 +0000)] 
ipmr: Add __rcu to netns_ipv4.mrt.

kernel test robot reported this Sparse warning:

  $ make C=1 net/ipv4/ipmr.o
  net/ipv4/ipmr.c:312:24: error: incompatible types in comparison expression (different address spaces):
  net/ipv4/ipmr.c:312:24:    struct mr_table [noderef] __rcu *
  net/ipv4/ipmr.c:312:24:    struct mr_table *

Let's add __rcu annotation to netns_ipv4.mrt.

Fixes: b3b6babf4751 ("ipmr: Free mr_table after RCU grace period.")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605030032.glNApko7-lkp@intel.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260502180755.359554-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 days agopsp: strip variable-length PSP header in psp_dev_rcv()
David Carlier [Sat, 2 May 2026 14:19:45 +0000 (15:19 +0100)] 
psp: strip variable-length PSP header in psp_dev_rcv()

psp_dev_rcv() unconditionally removes a fixed PSP_ENCAP_HLEN, even
when psph->hdrlen indicates that the PSP header carries optional
fields. A frame whose PSP header advertises a non-zero VC or any
extension would therefore be silently mis-decapsulated: option bytes
would spill into the inner packet head and downstream parsing would
fail on a corrupted skb.

Compute the full PSP header length from psph->hdrlen, pull the
optional bytes into the linear region, and strip the whole header
when decapsulating. Optional fields (VC, ...) are still ignored,
just discarded with the rest of the header instead of leaking.
crypt_offset and the VIRT flag are intentionally not validated here
- callers know their device's PSP implementation and can decide.

Both in-tree callers gate on hardware-validated PSP, so this is a
correctness fix rather than a reachable corruption path under
current configurations.

Fixes: 0eddb8023cee ("psp: provide decapsulation and receive helper for drivers")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Daniel Zahka <daniel.zahka@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David Carlier <devnexen@gmail.com>
Link: https://patch.msgid.link/20260502141945.14484-1-devnexen@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 days agonet: prevent possible UAF in rtnl_prop_list_size()
Eric Dumazet [Sat, 2 May 2026 12:41:02 +0000 (12:41 +0000)] 
net: prevent possible UAF in rtnl_prop_list_size()

I was mistaken by synchronize_rcu() [1] call in netdev_name_node_alt_destroy(),
giving a false sense of RCU safety at delete times.

We have to use list_del_rcu() to not confuse potential readers
in rtnl_prop_list_size().

[1] This synchronize_rcu() call was later removed in commit 723de3ebef03
("net: free altname using an RCU callback").

Fixes: 9f30831390ed ("net: add rcu safety to rtnl_prop_list_size()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260502124102.499204-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 days agoMerge branch 'mptcp-misc-fixes-for-v7-1-rc3'
Jakub Kicinski [Tue, 5 May 2026 02:20:52 +0000 (19:20 -0700)] 
Merge branch 'mptcp-misc-fixes-for-v7-1-rc3'

Matthieu Baerts says:

====================
mptcp: misc fixes for v7.1-rc3

Here are various unrelated fixes:

- Patch 1: increment the right MIB counter. A fix for v5.7.

- Patch 2: set the right MPTCP reset reason. A fix for v5.9.

- Patch 3: fix rx timestamp corruption when on MPTCP passive fastopen. A
  fix for v6.2.

- Patch 4: increase sockopt seq after having set TCP_MAXSEG to propagate
  it to newer subflows later. A fix for 6.17.
====================

Link: https://patch.msgid.link/20260501-net-mptcp-misc-fixes-7-1-rc3-v1-0-b70118df778e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 days agomptcp: sockopt: increase seq in mptcp_setsockopt_all_sf
Matthieu Baerts (NGI0) [Fri, 1 May 2026 19:35:37 +0000 (21:35 +0200)] 
mptcp: sockopt: increase seq in mptcp_setsockopt_all_sf

mptcp_setsockopt_all_sf() was missing a call to sockopt_seq_inc(). This
is required not to cause missing synchronization for newer subflows
created later on.

This helper is called each time a socket option is set on subflows, and
future ones will need to inherit this option after their creation.

Fixes: 51c5fd09e1b4 ("mptcp: add TCP_MAXSEG sockopt support")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260501-net-mptcp-misc-fixes-7-1-rc3-v1-4-b70118df778e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 days agomptcp: fix rx timestamp corruption on fastopen
Paolo Abeni [Fri, 1 May 2026 19:35:36 +0000 (21:35 +0200)] 
mptcp: fix rx timestamp corruption on fastopen

The skb cb offset containing the timestamp presence flag is cleared
before loading such information. Cache such value before MPTCP CB
initialization.

Fixes: 36b122baf6a8 ("mptcp: add subflow_v(4,6)_send_synack()")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260501-net-mptcp-misc-fixes-7-1-rc3-v1-3-b70118df778e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 days agomptcp: use MPTCP_RST_EMPTCP for ACK HMAC validation failure
Shardul Bankar [Fri, 1 May 2026 19:35:35 +0000 (21:35 +0200)] 
mptcp: use MPTCP_RST_EMPTCP for ACK HMAC validation failure

When HMAC validation fails on a received ACK + MP_JOIN in
subflow_syn_recv_sock(), the subflow is reset with reason
MPTCP_RST_EPROHIBIT ("Administratively prohibited"). This is
incorrect: HMAC validation failure is an MPTCP protocol-level
error, not an administrative policy denial.

The mirror site on the client, in subflow_finish_connect(), already
uses MPTCP_RST_EMPTCP ("MPTCP-specific error") for the same kind of
HMAC failure on the SYN/ACK + MP_JOIN. Use the same reason on the
server side for symmetry and accuracy.

Suggested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Fixes: 443041deb5ef ("mptcp: fix NULL pointer in can_accept_new_subflow")
Cc: stable@vger.kernel.org
Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260501-net-mptcp-misc-fixes-7-1-rc3-v1-2-b70118df778e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 days agomptcp: use MPJoinSynAckHMacFailure for SynAck HMAC failure
Shardul Bankar [Fri, 1 May 2026 19:35:34 +0000 (21:35 +0200)] 
mptcp: use MPJoinSynAckHMacFailure for SynAck HMAC failure

In subflow_finish_connect(), HMAC validation of the server's HMAC
in SYN/ACK + MP_JOIN increments MPTCP_MIB_JOINACKMAC ("HMAC was
wrong on ACK + MP_JOIN") on failure. The function processes the
SYN/ACK, not the ACK; the matching MPTCP_MIB_JOINSYNACKMAC counter
("HMAC was wrong on SYN/ACK + MP_JOIN") exists but is not
incremented anywhere in the tree.

The mirror site on the server, subflow_syn_recv_sock(), already
uses JOINACKMAC correctly for ACK HMAC failure. Use JOINSYNACKMAC
at the SYN/ACK validation site so each counter reflects the packet
whose HMAC actually failed.

Suggested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Fixes: fc518953bc9c ("mptcp: add and use MIB counter infrastructure")
Cc: stable@vger.kernel.org
Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260501-net-mptcp-misc-fixes-7-1-rc3-v1-1-b70118df778e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 days agovsock/virtio: fix potential unbounded skb queue
Eric Dumazet [Thu, 30 Apr 2026 12:26:52 +0000 (12:26 +0000)] 
vsock/virtio: fix potential unbounded skb queue

virtio_transport_inc_rx_pkt() checks vvs->rx_bytes + len > vvs->buf_alloc.

virtio_transport_recv_enqueue() skips coalescing for packets
with VIRTIO_VSOCK_SEQ_EOM.

If fed with packets with len == 0 and VIRTIO_VSOCK_SEQ_EOM,
a very large number of packets can be queued
because vvs->rx_bytes stays at 0.

Fix this by estimating the skb metadata size:

(Number of skbs in the queue) * SKB_TRUESIZE(0)

Fixes: 077706165717 ("virtio/vsock: don't use skbuff state to account credit")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: "Eugenio Pérez" <eperezma@redhat.com>
Cc: virtualization@lists.linux.dev
Link: https://patch.msgid.link/20260430122653.554058-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 days agonet: usb: asix: ax88772: re-add usbnet_link_change() in phylink callbacks
Markus Baier [Fri, 1 May 2026 16:39:41 +0000 (18:39 +0200)] 
net: usb: asix: ax88772: re-add usbnet_link_change() in phylink callbacks

Commit e0bffe3e6894 ("net: asix: ax88772: migrate to phylink") replaced
the asix_adjust_link() PHY callback with phylink's mac_link_up() and
mac_link_down() handlers, but did not carry over the usbnet_link_change()
notification that commit 805206e66fab ("net: asix: fix "can't send until
first packet is send" issue") had added.

As a result, the original symptom returns: when the link comes up,
usbnet is never notified, so the RX URB submission stays dormant until
some other event (e.g. a transmitted packet triggering the status
endpoint interrupt) wakes it up.

This is reproducible with the Apple A1277 USB Ethernet Adapter
(05ac:1402, AX88772A based) on a Banana Pro using a static IPv4
configuration. After bringing the interface up, no incoming packets are
received until the first outgoing frame triggers usbnet's RX path.

Restore the link change notification, gated on a carrier transition so
the call remains idempotent if the status endpoint also reports the
change later.

Fixes: e0bffe3e6894 ("net: asix: ax88772: migrate to phylink")
Signed-off-by: Markus Baier <Markus.Baier@soslab.tu-darmstadt.de>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20260501163941.107668-1-Markus.Baier@soslab.tu-darmstadt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 days agoASoC: tas2770: Deal with bogus initial temperature value
James Calligeros [Sun, 3 May 2026 12:23:24 +0000 (22:23 +1000)] 
ASoC: tas2770: Deal with bogus initial temperature value

TAS2770 initialises the temperature readout registers to 0.
This value persists until the chip is fully powered up and
the ADC starts sampling. The ADC then persists the last sampled
temperature during software shutdown.

The ADC should therefore never return 0 in normal operating
conditions, so return -ENODATA and mark it as a fault condition
using HWMON_T_FAULT.

Fixes: ff73e2780169 ("ASoC: tas2770: expose die temp to hwmon")
Signed-off-by: James Calligeros <jcalligeros99@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
12 days agoASoC: tas2764: Deal with bogus initial temperature register value
James Calligeros [Sun, 3 May 2026 12:23:23 +0000 (22:23 +1000)] 
ASoC: tas2764: Deal with bogus initial temperature register value

The TAS2764 datasheet specifies that the chip initialises the
temperature register such that the temperature reading is 2.6 *C,
ostensibly to prevent tripping the chip's protection circuitry.
The chip is not capable of representing 2.6 *C however, and the
register is actually initialised to 0. The ADC does not start
sampling until the chip is powered up, and the last sampled
temperature persists in the register during software shutdown.
Therefore, any reading returning 0 is almost certain to be
from before the ADC has actually started sampling, meaning that
it is invalid.

Return -ENODATA early if the temperature has not yet been sampled
by the chip, and indicate a fault condition using HWMON_T_FAULT.

Fixes: 186dfc85f9a8 ("ASoC: tas2764: expose die temp to hwmon")
Signed-off-by: James Calligeros <jcalligeros99@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
12 days agonetpoll: pass buffer size to egress_dev() to avoid MAC truncation
Breno Leitao [Fri, 1 May 2026 09:58:41 +0000 (02:58 -0700)] 
netpoll: pass buffer size to egress_dev() to avoid MAC truncation

egress_dev() formats np->dev_mac via snprintf() but receives buf as
a bare char *, so it cannot derive the buffer size from the pointer. The
size argument was hardcoded to MAC_ADDR_STR_LEN (3 * ETH_ALEN - 1 = 17),
which is silly wrong in two ways:

 1) misleading kernel log output on the MAC-selected target path
    (np->dev_name[0] == '\0'); for example "aa:bb:cc:dd:ee:ff doesn't
    exist, aborting" was logged as "aa:bb:cc:dd:ee:f doesn't exist,
    aborting".

 2) the second argument of snprintf is the size of the buffer, not the
    size of what you want to write.

Add a bufsz parameter to egress_dev() and pass sizeof(buf) from each
caller, matching the standard snprintf() idiom and removing the
hardcoded size from the helper.

Every caller already declares "char buf[MAC_ADDR_STR_LEN + 1]" so the
formatted MAC continues to fit.

Tested by booting with
  netconsole=6665@/aa:bb:cc:dd:ee:ff,6666@10.0.0.1/00:11:22:33:44:55
on a kernel without a matching device. Pre-fix dmesg shows
"aa:bb:cc:dd:ee:f doesn't exist, aborting"; post-fix shows the full
"aa:bb:cc:dd:ee:ff doesn't exist, aborting".

Fixes: f8a10bed32f5 ("netconsole: allow selection of egress interface via MAC address")
Cc: stable@vger.kernel.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260501-netpoll_snprintf_fix-v1-1-84b0566e6597@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 days agoaf_unix: Set gc_in_progress to true in unix_gc().
Kuniyuki Iwashima [Fri, 1 May 2026 07:39:41 +0000 (07:39 +0000)] 
af_unix: Set gc_in_progress to true in unix_gc().

Igor Ushakov reported that unix_gc() could run with gc_in_progress
being false if the work is scheduled while running:

  Thread 1         Thread 2                     Thread 3
  --------         --------                     --------
                   unix_schedule_gc()           unix_schedule_gc()
                   `- if (!gc_in_progress)      `- if (!gc_in_progress)
                      |- gc_in_progress = true     |
                      `- queue_work()              |
  unix_gc() <----------------/                     |
  |                                                |- gc_in_progress = true
  ...                                              `- queue_work()
  |                                                       |
  `- gc_in_progress = false                               |
                                                          |
  unix_gc() <---------------------------------------------'
  |
  ... /* gc_in_progress == false */
  |
  `- gc_in_progress = false

unix_peek_fpl() relies on gc_in_progress not to confuse GC
by MSG_PEEK.

Let's set gc_in_progress to true in unix_gc().

Fixes: 8b90a9f819dc ("af_unix: Run GC on only one CPU.")
Reported-by: Igor Ushakov <sysroot314@gmail.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260501073945.1884564-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 days agosched/isolation: Make HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN
Waiman Long [Thu, 30 Apr 2026 07:44:20 +0000 (10:44 +0300)] 
sched/isolation: Make HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN

Since commit 041ee6f3727a ("kthread: Rely on HK_TYPE_DOMAIN for preferred
affinity management"), kthreads default to use the HK_TYPE_DOMAIN
cpumask. IOW, it is no longer affected by the setting of the nohz_full
boot kernel parameter.

That means HK_TYPE_KTHREAD should now be an alias of HK_TYPE_DOMAIN
instead of HK_TYPE_KERNEL_NOISE to correctly reflect the current kthread
behavior. Make the change as HK_TYPE_KTHREAD is still being used in
some networking code.

Fixes: 041ee6f3727a ("kthread: Rely on HK_TYPE_DOMAIN for preferred affinity management")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
12 days agoipvs: Guard access of HK_TYPE_KTHREAD cpumask with RCU
Waiman Long [Thu, 30 Apr 2026 07:44:19 +0000 (10:44 +0300)] 
ipvs: Guard access of HK_TYPE_KTHREAD cpumask with RCU

The ip_vs_ctl.c file and the associated ip_vs.h file are the only places
in the kernel where HK_TYPE_KTHREAD cpumask is being retrieved and used.
Now that HK_TYPE_KTHREAD/HK_TYPE_DOMAIN cpumask can be changed at run
time. We need to use RCU to guard access to this cpumask to avoid a
potential UAF problem as the returned cpumask may be freed before it
is being used.

We can replace HK_TYPE_KTHREAD by HK_TYPE_DOMAIN as they are aliases
of each other, but keeping the HK_TYPE_KTHREAD name can highlight the
fact that it is the kthread initiated by ipvs that is being controlled.

Fixes: 03ff73510169 ("cpuset: Update HK_TYPE_DOMAIN cpumask from cpuset")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
12 days agoipvs: fix shift-out-of-bounds in ip_vs_rht_desired_size
Julian Anastasov [Thu, 30 Apr 2026 07:44:18 +0000 (10:44 +0300)] 
ipvs: fix shift-out-of-bounds in ip_vs_rht_desired_size

Calling roundup_pow_of_two() with 0 has undefined result:

UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
shift exponent 64 is too large for 64-bit type 'unsigned long'
CPU: 1 UID: 0 PID: 77 Comm: kworker/u8:4 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
Workqueue: events_unbound conn_resize_work_handler
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 ubsan_epilogue+0xa/0x30 lib/ubsan.c:233
 __ubsan_handle_shift_out_of_bounds+0x385/0x410 lib/ubsan.c:494
 __roundup_pow_of_two include/linux/log2.h:57 [inline]
 ip_vs_rht_desired_size+0x2cf/0x410 net/netfilter/ipvs/ip_vs_core.c:240
 ip_vs_conn_desired_size net/netfilter/ipvs/ip_vs_conn.c:765 [inline]
 conn_resize_work_handler+0x1b6/0x14c0 net/netfilter/ipvs/ip_vs_conn.c:822
 process_one_work kernel/workqueue.c:3302 [inline]
 process_scheduled_works+0xb5d/0x1860 kernel/workqueue.c:3385
 worker_thread+0xa53/0xfc0 kernel/workqueue.c:3466
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>

Reported-by: syzbot+217f1db9c791e27fe54a@syzkaller.appspotmail.com
Fixes: b655388111cf ("ipvs: add resizable hash tables")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
12 days agoipvs: fix races around est_mutex and est_cpulist
Julian Anastasov [Thu, 30 Apr 2026 07:44:17 +0000 (10:44 +0300)] 
ipvs: fix races around est_mutex and est_cpulist

Sashiko reports for races and possible crash around
the usage of est_cpulist_valid and sysctl_est_cpulist.
The problem is that we do not lock est_mutex in some
places which can lead to wrong write ordering and
as result problems when calling cpumask_weight()
and cpumask_empty().

Fix them by moving the est_max_threads read/write under
locked est_mutex. Do the same for one ip_vs_est_reload_start()
call to protect the cpumask_empty() usage of sysctl_est_cpulist.

To remove the chance of deadlock while stopping the
estimation kthreads, keep the data structure for kthread 0
even after last estimator is removed and do not hold mutexes
while stopping this task. Now we will use a new flag 'needed'
to know when kthread 0 should run. The kthreads above 0
do not use mutexes, so stop them under est_mutex because
their kthread data still can be destroyed if they do not
serve estimators. Now all kthreads will be started by
the est_reload_work to properly serialize the stop/start
for kthread 0.

Reduce the use of service_mutex in ip_vs_est_calc_phase()
because under est_mutex we can safely walk est_kt_arr to
stop the kthreads above slot 0.

As ip_vs_stop_estimator() for tot_stats should be called
under service_mutex, do it early in the netns exit path
in ip_vs_flush() to avoid locking the mutex again later.
It still should be called in ip_vs_control_net_cleanup_sysctl()
when we are called during netns init error. Use -2 for ktid
as indicator if estimator was already stopped.

Finally, fix use-after-free for kd->est_row in
ip_vs_est_calc_phase(). est->ktrow should simply switch to
a delay value while estimator is linked to est_temp_list.

Link: https://sashiko.dev/#/patchset/20260331165015.2777765-1-longman%40redhat.com
Link: https://sashiko.dev/#/patchset/20260420171308.87192-1-ja%40ssi.bg
Link: https://sashiko.dev/#/patchset/20260422125123.40658-1-ja%40ssi.bg
Link: https://sashiko.dev/#/patchset/20260424175858.54752-1-ja%40ssi.bg
Link: https://sashiko.dev/#/patchset/20260425103918.7447-1-ja%40ssi.bg
Fixes: f0be83d54217 ("ipvs: add est_cpulist and est_nice sysctl vars")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
12 days agoipvs: do not leak dest after get from dest trash
Julian Anastasov [Thu, 30 Apr 2026 07:44:16 +0000 (10:44 +0300)] 
ipvs: do not leak dest after get from dest trash

Sashiko warns about leaked dest if ip_vs_start_estimator()
fails in ip_vs_add_dest(). Add ip_vs_trash_put_dest() to
put back the dest into dest trash.

Link: https://sashiko.dev/#/patchset/20260428175725.72050-1-ja%40ssi.bg
Fixes: 705dd3444081 ("ipvs: use kthreads for stats estimation")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
12 days agoipvs: fix the spin_lock usage for RT build
Julian Anastasov [Thu, 30 Apr 2026 07:44:15 +0000 (10:44 +0300)] 
ipvs: fix the spin_lock usage for RT build

syzbot reports for sleeping function called from invalid context [1].
The recently added code for resizable hash tables uses
hlist_bl bit locks in combination with spin_lock for
the connection fields (cp->lock).

Fix the following problems:

* avoid using spin_lock(&cp->lock) under locked bit lock
because it sleeps on PREEMPT_RT

* as the recent changes call ip_vs_conn_hash() only for newly
allocated connection, the spin_lock can be removed there because
the connection is still not linked to table and does not need
cp->lock protection.

* the lock can be removed also from ip_vs_conn_unlink() where we
are the last connection user.

* the last place that is fixed is ip_vs_conn_fill_cport()
where now the cp->lock is locked before the other locks to
ensure other packets do not modify the cp->flags in non-atomic
way. Here we make sure cport and flags are changed only once
if two or more packets race to fill the cport. Also, we fill
cport early, so that if we race with resizing there will be
valid cport key for the hashing. Add a warning if too many
hash table changes occur for our RCU read-side critical
section which is error condition but minor because the
connection still can expire gracefully. Still, restore the
cport to 0 to allow retransmitted packet to properly fill
the cport. Problems reported by Sashiko.

[1]:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 16, name: ktimers/0
preempt_count: 2, expected: 0
RCU nest depth: 3, expected: 3
8 locks held by ktimers/0/16:
 #0: ffffffff8de5f260 (local_bh){.+.+}-{1:3}, at: __local_bh_disable_ip+0x3c/0x420 kernel/softirq.c:163
 #1: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: __local_bh_disable_ip+0x3c/0x420 kernel/softirq.c:163
 #2: ffff8880b8826360 (&base->expiry_lock){+...}-{3:3}, at: spin_lock include/linux/spinlock_rt.h:45 [inline]
 #2: ffff8880b8826360 (&base->expiry_lock){+...}-{3:3}, at: timer_base_lock_expiry kernel/time/timer.c:1502 [inline]
 #2: ffff8880b8826360 (&base->expiry_lock){+...}-{3:3}, at: __run_timer_base+0x120/0x9f0 kernel/time/timer.c:2384
 #3: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:300 [inline]
 #3: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
 #3: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: __rt_spin_lock kernel/locking/spinlock_rt.c:50 [inline]
 #3: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rt_spin_lock+0x1e0/0x400 kernel/locking/spinlock_rt.c:57
 #4: ffffc90000157a80 ((&cp->timer)){+...}-{0:0}, at: call_timer_fn+0xd4/0x5e0 kernel/time/timer.c:1745
 #5: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:300 [inline]
 #5: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
 #5: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: ip_vs_conn_unlink net/netfilter/ipvs/ip_vs_conn.c:315 [inline]
 #5: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: ip_vs_conn_expire+0x257/0x2390 net/netfilter/ipvs/ip_vs_conn.c:1260
 #6: ffffffff8de5f260 (local_bh){.+.+}-{1:3}, at: __local_bh_disable_ip+0x3c/0x420 kernel/softirq.c:163
 #7: ffff888068d4c3f0 (&cp->lock#2){+...}-{3:3}, at: spin_lock include/linux/spinlock_rt.h:45 [inline]
 #7: ffff888068d4c3f0 (&cp->lock#2){+...}-{3:3}, at: ip_vs_conn_unlink net/netfilter/ipvs/ip_vs_conn.c:324 [inline]
 #7: ffff888068d4c3f0 (&cp->lock#2){+...}-{3:3}, at: ip_vs_conn_expire+0xd4a/0x2390 net/netfilter/ipvs/ip_vs_conn.c:1260
Preemption disabled at:
[<ffffffff898a6358>] bit_spin_lock include/linux/bit_spinlock.h:38 [inline]
[<ffffffff898a6358>] hlist_bl_lock+0x18/0x110 include/linux/list_bl.h:149
CPU: 0 UID: 0 PID: 16 Comm: ktimers/0 Tainted: G        W    L      syzkaller #0 PREEMPT_{RT,(full)}
Tainted: [W]=WARN, [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 __might_resched+0x329/0x480 kernel/sched/core.c:9162
 __rt_spin_lock kernel/locking/spinlock_rt.c:48 [inline]
 rt_spin_lock+0xc2/0x400 kernel/locking/spinlock_rt.c:57
 spin_lock include/linux/spinlock_rt.h:45 [inline]
 ip_vs_conn_unlink net/netfilter/ipvs/ip_vs_conn.c:324 [inline]
 ip_vs_conn_expire+0xd4a/0x2390 net/netfilter/ipvs/ip_vs_conn.c:1260
 call_timer_fn+0x192/0x5e0 kernel/time/timer.c:1748
 expire_timers kernel/time/timer.c:1799 [inline]
 __run_timers kernel/time/timer.c:2374 [inline]
 __run_timer_base+0x6a3/0x9f0 kernel/time/timer.c:2386
 run_timer_base kernel/time/timer.c:2395 [inline]
 run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2405
 handle_softirqs+0x1de/0x6d0 kernel/softirq.c:622
 __do_softirq kernel/softirq.c:656 [inline]
 run_ktimerd+0x69/0x100 kernel/softirq.c:1151
 smpboot_thread_fn+0x541/0xa50 kernel/smpboot.c:160
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>

Reported-by: syzbot+504e778ddaecd36fdd17@syzkaller.appspotmail.com
Link: https://sashiko.dev/#/patchset/20260415200216.79699-1-ja%40ssi.bg
Link: https://sashiko.dev/#/patchset/20260420165539.85174-4-ja%40ssi.bg
Link: https://sashiko.dev/#/patchset/20260422135823.50489-4-ja%40ssi.bg
Fixes: 2fa7cc9c7025 ("ipvs: switch to per-net connection table")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
12 days agoipvs: fix races around the conn_lfactor and svc_lfactor sysctl vars
Julian Anastasov [Thu, 30 Apr 2026 07:44:14 +0000 (10:44 +0300)] 
ipvs: fix races around the conn_lfactor and svc_lfactor sysctl vars

Sashiko warns that the new sysctls vars can be changed
after the hash tables are destroyed and their respective
resizing works canceled, leading to mod_delayed_work()
being called for canceled works.

Solve this in different ways. conn_tab can be present even
without services and is destroyed only on netns exit, so use
disable_delayed_work_sync() to disable the work instead of
adding more synchronization mechanisms.

As for the svc_table, it is destroyed when the services
are deleted, so we must be sure that netns exit is not
called yet (the check for 'enable') and the work is
not canceled by checking all under same mutex lock.

Also, use WRITE_ONCE when updating the sysctl vars as we
already read them with READ_ONCE.

Link: https://sashiko.dev/#/patchset/20260410112352.23599-1-fw%40strlen.de
Fixes: 8d7de5477e47 ("ipvs: add conn_lfactor and svc_lfactor sysctl vars")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
12 days agoipvs: fixes for the new ip_vs_status info
Julian Anastasov [Thu, 30 Apr 2026 07:44:13 +0000 (10:44 +0300)] 
ipvs: fixes for the new ip_vs_status info

Sashiko reports some problems for the recently added
/proc/net/ip_vs_status:

* ip_vs_status_show() as a table reader may run long after the
conn_tab and svc_table table are released. While ip_vs_conn_flush()
properly changes the conn_tab_changes counter when conn_tab is removed,
ip_vs_del_service() and ip_vs_flush() were missing such change for
the svc_table_changes counter. As result, readers like
ip_vs_dst_event() and ip_vs_status_show() may continue to use
a freed table after a cond_resched_rcu() call.

* While counting the buckets in ip_vs_status_show() make sure we
traverse only the needed number of entries in the chain. This also
prevents possible overflow of the 'count' variable.

* Add check for 'loops' to prevent infinite loops while restarting
the traversal on table change.

* While IP_VS_CONN_TAB_MAX_BITS is 20 on 32-bit platforms and
there is no risk to overflow when multiplying the number of
conn_tab buckets to 100, prefer the div_u64() helper to make
the following dividing safer.

* Use 0440 permissions for ip_vs_status to restrict the
info only to root due to the exported information for hash
distribution.

Link: https://sashiko.dev/#/patchset/20260410112352.23599-1-fw%40strlen.de
Fixes: 9a9ccef907a7 ("ipvs: add ip_vs_status info")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
12 days agoMerge tag 'linux_kselftest-fixes-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Mon, 4 May 2026 23:02:53 +0000 (16:02 -0700)] 
Merge tag 'linux_kselftest-fixes-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:

 - Fix extra test number increment in ksft_exit_skip() that results in
   incorrect KTAP result

 - Fix regression introduced by addition of explicit constructor orders
   for fixture tests. This addition broke the ordering of those relative
   to non-fixture tests and the reverse-constructor-order detection

* tag 'linux_kselftest-fixes-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: harness: Restore order of test functions
  selftests: kselftest: fix wrong test number in ksft_exit_skip

12 days agoselftests: ovpn: reduce ping count in test.sh
Ralf Lici [Wed, 29 Apr 2026 08:00:16 +0000 (10:00 +0200)] 
selftests: ovpn: reduce ping count in test.sh

The second stage of test.sh ("run baseline data traffic") performs a
basic connectivity check with ping -qfc 500 -w 3.  On slower CI
instances this is too strict for TCP: the RTT is high enough that 500
echo requests do not reliably complete within 3 seconds, so the stage
flakes and the test fails even though the ovpn setup is healthy.

Reduce the packet count to 100 for both the plain and 3000-byte pings in
that stage.  This still verifies peer setup, key exchange, routing, and
data-path traffic, without making the basic connectivity check depend on
timing out under load.

Fixes: 959bc330a439 ("testing/selftests: add test tool and scripts for ovpn module")
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
12 days agoovpn: ensure packet delivery happens with BH disabled
Ralf Lici [Wed, 25 Mar 2026 16:49:18 +0000 (17:49 +0100)] 
ovpn: ensure packet delivery happens with BH disabled

ovpn injects decrypted packets into the netdev RX path through
ovpn_netdev_write() which invokes gro_cells_receive() and
dev_dstats_rx_add().

ovpn_netdev_write() is normally called in softirq context,
however, in case of TCP connections it may also be invoked
process context.

When this happens gro_cells_receive() will throw a warning:

  [  230.183747][   T12] WARNING: net/core/gro_cells.c:30 at gro_cells_receive+0x708/0xaa0, CPU#1: kworker/u16:0/12

and lockdep will also report a potential inconsistent lock state:

  WARNING: inconsistent lock state
  7.0.0-rc4+ #246 Tainted: G        W
  --------------------------------
  inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.

because attempts to acquire gro_cells->bh_lock by both
contexts may lead to a deadlock.

At the same time, dev_dstats_rx_add() does not expect to race
with a softirq (which may happen when invoked in process context),
because the latter may access its per-cpu state and corrupt
it.

Fix all this by invoking local_bh_disable/enable() around
gro_cells_receive() and dev_dstats_rx_add() to ensure that
bottom halves are always disabled before calling both of
them.

Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
12 days agoovpn: reset MAC header before passing skb up
Qingfang Deng [Mon, 27 Apr 2026 04:00:11 +0000 (12:00 +0800)] 
ovpn: reset MAC header before passing skb up

After decapsulating a packet, the skb->mac_header still points to the
outer transport header.

Fix this by calling skb_reset_mac_header() in ovpn_netdev_write() to
ensure the MAC header points to the beginning of
the inner IP/network packet, as expected by the rest of the stack.

Reported-by: Minqiang Chen <ptpt52@gmail.com>
Fixes: 8534731dbf2d ("ovpn: implement packet processing")
Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
12 days agodocs: cgroup-v1: Update charge-commit section
T.J. Mercier [Thu, 30 Apr 2026 20:11:42 +0000 (13:11 -0700)] 
docs: cgroup-v1: Update charge-commit section

Commit 1d8f136a421f ("memcg/hugetlb: remove memcg hugetlb
try-commit-cancel protocol") removed mem_cgroup_commit_charge() and
mem_cgroup_cancel_charge(), but the docs still refer to those functions.
There is no longer any charge cancellation.

Update the docs to match the code.

Signed-off-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
12 days agosched_ext: idle: Recheck prev_cpu after narrowing allowed mask
David Carlier [Thu, 30 Apr 2026 09:27:47 +0000 (10:27 +0100)] 
sched_ext: idle: Recheck prev_cpu after narrowing allowed mask

scx_select_cpu_dfl() narrows @allowed to @cpus_allowed & @p->cpus_ptr
when the BPF caller supplies a @cpus_allowed that differs from
@p->cpus_ptr and @p doesn't have full affinity. However,
@is_prev_allowed was computed against the original (wider)
@cpus_allowed, so the prev_cpu fast paths could pick a @prev_cpu that
is in @cpus_allowed but not in @p->cpus_ptr, violating the intended
invariant that the returned CPU is always usable by @p. The kernel
masks this via the SCX_EV_SELECT_CPU_FALLBACK fallback, but the
behavior contradicts the documented contract.

Move the @is_prev_allowed evaluation past the narrowing block so it
tests against the final @allowed mask.

Fixes: ee9a4e92799d ("sched_ext: idle: Properly handle invalid prev_cpu during idle selection")
Cc: stable@vger.kernel.org # v6.16+
Assisted-by: Claude <noreply@anthropic.com>
Signed-off-by: David Carlier <devnexen@gmail.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
12 days agodrm/sti: remove bridge when sti_hda component_add fails
Osama Abdelkader [Thu, 23 Apr 2026 20:06:19 +0000 (22:06 +0200)] 
drm/sti: remove bridge when sti_hda component_add fails

Use devm_drm_bridge_add() so the bridge is released if probe fails after
registration, and drop the manual drm_bridge_remove() in remove().

Check the return value of devm_drm_bridge_add().

Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Fixes: d28726efc637 ("drm/sti: hda: add bridge before attaching")
Cc: stable@vger.kernel.org
Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Acked-by: Raphaël Gallais-Pou <rgallaispou@gmail.com>
Link: https://patch.msgid.link/20260423200622.325076-1-osama.abdelkader@gmail.com
Signed-off-by: Raphael Gallais-Pou <raphael.gallais-pou@foss.st.com>
12 days agoMerge tag 'for-linus-7.1-2' of https://github.com/cminyard/linux-ipmi
Linus Torvalds [Mon, 4 May 2026 19:48:30 +0000 (12:48 -0700)] 
Merge tag 'for-linus-7.1-2' of https://github.com/cminyard/linux-ipmi

Pull IPMI fixes from Corey Minyard:
 "Fix a number of issues that came up recently

  The first two fixes are workarounds for buggy IPMI hardware. The
  hardware says it has data for the IPMI driver to read constantly, so
  the driver reads the data constantly, causing any new requests to be
  blocked.

  The first fix was to check for invalid data right when the data was
  read from the device and stop the operation there (there was a later
  check for invalid data, but it could not stop the operation at that
  point). It turned out the device was providing good data, so that
  didn't fix the issue, but it's still a good check.

  The second fix stops fetching this data after a few fetches and allows
  other operations to occur. The driver won't work very well, but at
  least it won't wedge. This seems to fix the issue.

  The third issue is a problem I spotted while working on the previous
  issue where if a certain memory allocation failed the driver would
  stop working.

  The fourth issue is a problem was a missing set to NULL on a PTR_ERR()
  return, introduced in the previous series for 7.1"

* tag 'for-linus-7.1-2' of https://github.com/cminyard/linux-ipmi:
  ipmi:ssif: NULL thread on error
  ipmi:si: Return state to normal if message allocation fails
  ipmi: Add limits to event and receive message requests
  ipmi: Check event message buffer response for bad data

12 days agosched_ext: Skip past-sched_ext_dead() tasks in scx_task_iter_next_locked()
Tejun Heo [Tue, 28 Apr 2026 00:16:35 +0000 (14:16 -1000)] 
sched_ext: Skip past-sched_ext_dead() tasks in scx_task_iter_next_locked()

scx_task_iter's cgroup-scoped mode can return tasks whose
sched_ext_dead() has already completed: cgroup_task_dead() removes
from cset->tasks after sched_ext_dead() in finish_task_switch() and is
irq-work deferred on PREEMPT_RT. The global mode is fine -
sched_ext_dead() removes from scx_tasks via list_del_init() first.

Callers (sub-sched enable prep/abort/apply, scx_sub_disable(),
scx_fail_parent()) assume returned tasks are still on @sch and trip
WARN_ON_ONCE() or operate on torn-down state otherwise.

Set %SCX_TASK_OFF_TASKS in sched_ext_dead() under @p's rq lock and
have scx_task_iter_next_locked() skip flagged tasks under the same
lock. Setter and reader serialize on the per-task rq lock - no race.

Signed-off-by: Tejun Heo <tj@kernel.org>
12 days agocgroup, sched_ext: Include exiting tasks in cgroup iter
Tejun Heo [Tue, 28 Apr 2026 00:16:34 +0000 (14:16 -1000)] 
cgroup, sched_ext: Include exiting tasks in cgroup iter

a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup") made
css_task_iter_advance() skip exiting tasks so cgroup.procs stays consistent
with waitpid() visibility. Unfortunately, this broke scx_task_iter.

scx_task_iter walks either scx_tasks (global) or a cgroup subtree via
css_task_iter() and the two modes are expected to cover the same set of
tasks. After the above change the cgroup-scoped mode silently skips tasks
past exit_signals() that are still on scx_tasks.

scx_sub_enable_workfn()'s abort path is one of the symptoms: an exiting
SCX_TASK_SUB_INIT task can race past the cgroup iter leaking
__scx_init_task() state. Other iterations share the same gap.

Add CSS_TASK_ITER_WITH_DEAD to opt out of the skip and use it from
scx_task_iter().

Fixes: b0e4c2f8a0f0 ("sched_ext: Implement cgroup subtree iteration for scx_task_iter")
Reported-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
12 days agoMAINTAINERS: Update mail for Peter Rosin
Peter Rosin [Thu, 30 Apr 2026 04:09:58 +0000 (06:09 +0200)] 
MAINTAINERS: Update mail for Peter Rosin

I'm resigning from my position at Axentia.

Signed-off-by: Peter Rosin <peda@lysator.liu.se>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
12 days agocgroup: Defer css percpu_ref kill on rmdir until cgroup is depopulated
Tejun Heo [Fri, 1 May 2026 18:31:22 +0000 (08:31 -1000)] 
cgroup: Defer css percpu_ref kill on rmdir until cgroup is depopulated

A chain of commits going back to v7.0 reworked rmdir to satisfy the
controller invariant that a subsystem's ->css_offline() must not run while
tasks are still doing kernel-side work in the cgroup.

[1] d245698d727a ("cgroup: Defer task cgroup unlink until after the task is done switching out")
[2] a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup")
[3] 1b164b876c36 ("cgroup: Wait for dying tasks to leave on rmdir")
[4] 4c56a8ac6869 ("cgroup: Fix cgroup_drain_dying() testing the wrong condition")
[5] 13e786b64bd3 ("cgroup: Increment nr_dying_subsys_* from rmdir context")

[1] moved task cset unlink from do_exit() to finish_task_switch() so a
task's cset link drops only after the task has fully stopped scheduling.
That made tasks past exit_signals() linger on cset->tasks until their final
context switch, which led to a series of problems as what userspace expected
to see after rmdir diverged from what the kernel needs to wait for. [2]-[5]
tried to bridge that divergence: [2] filtered the exiting tasks from
cgroup.procs; [3] had rmdir(2) sleep in TASK_UNINTERRUPTIBLE for them; [4]
fixed the wait's condition; [5] made nr_dying_subsys_* visible
synchronously.

The cgroup_drain_dying() wait in [3] turned out to be a dead end. When the
rmdir caller is also the reaper of a zombie that pins a pidns teardown (e.g.
host PID 1 systemd reaping orphan pids that were re-parented to it during
the same teardown), rmdir blocks in TASK_UNINTERRUPTIBLE waiting for those
pids to free, the pids can't free because PID 1 is the reaper and it's stuck
in rmdir, and the system A-A deadlocks. No internal lock ordering breaks
this; the wait itself is the bug.

The css killing side that drove the original reorder, however, can be made
cleanly asynchronous: ->css_offline() is already async, run from
css_killed_work_fn() driven by percpu_ref_kill_and_confirm(). The fix is to
make that chain start only after all tasks have left the cgroup. rmdir's
user-visible side then returns as soon as cgroup.procs and friends are
empty, while ->css_offline() still runs only after the cgroup is fully
drained.

Verified by the original reproducer (pidns teardown + zombie reaper, runs
under vng) which hangs vanilla and succeeds here, and by per-commit
deterministic repros for [2], [3], [4], [5] with a boot parameter that
widens the post-exit_signals() window so each state is reliably reachable.
Some stress tests on top of that.

cgroup_apply_control_disable() has the same shape of pre-existing race:
when a controller is disabled via subtree_control, kill_css() ran
synchronously while tasks past exit_signals() could still be linked to
the cgroup's csets, and ->css_offline() could fire before they drained.
This patch preserves the existing synchronous behavior at that call site
(kill_css_sync() + kill_css_finish() back-to-back) and a follow-up patch
will defer kill_css_finish() there using a per-css trigger.

This seems like the right approach and I don't see problems with it. The
changes are somewhat invasive but not excessively so, so backporting to
-stable should be okay. If something does turn out to be wrong, the fallback
is to revert the entire chain ([1]-[5]) and rework in the development branch
instead.

v2: Pin cgrp across the deferred destroy work with explicit
    cgroup_get()/cgroup_put() around queue_work() and the work_fn. v1
    wasn't actually broken (ordered cgroup_offline_wq + queue_work order
    in cgroup_task_dead() saved it) but the explicit ref removes the
    dependency on those non-obvious invariants. Also note the
    pre-existing cgroup_apply_control_disable() race in the description;
    a follow-up will defer kill_css_finish() there.

Fixes: 1b164b876c36 ("cgroup: Wait for dying tasks to leave on rmdir")
Cc: stable@vger.kernel.org # v7.0+
Reported-and-tested-by: Martin Pitt <martin@piware.de>
Link: https://lore.kernel.org/all/afHNg2VX2jy9bW7y@piware.de/
Link: https://lore.kernel.org/all/35e0670adb4abeab13da2c321582af9f@kernel.org/
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
12 days agokunit: config: KUNIT_DEBUGFS should depend on DEBUG_FS
David Gow [Sat, 25 Apr 2026 03:41:54 +0000 (11:41 +0800)] 
kunit: config: KUNIT_DEBUGFS should depend on DEBUG_FS

CONFIG_KUNIT_DEBUGFS is totally useless without debugfs, so it should
depend on CONFIG_DEBUG_FS.

Link: https://lore.kernel.org/r/20260425034155.53913-2-david@davidgow.net
Fixes: e2219db280e3 ("kunit: add debugfs /sys/kernel/debug/kunit/<suite>/results display")
Signed-off-by: David Gow <david@davidgow.net>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
12 days agokunit: config: Enable KUNIT_DEBUGFS by default
David Gow [Sat, 25 Apr 2026 03:41:53 +0000 (11:41 +0800)] 
kunit: config: Enable KUNIT_DEBUGFS by default

The KUNIT_DEBUGFS option is currently enabled based on the value of
KUNIT_ALL_TESTS, but it really doesn't have anything to do with the set of
enabled tests, so just enable it by default anyway. In particular, this
shouldn't be only visible if KUNIT_ALL_TESTS is set, which is quite
confusing.

Link: https://lore.kernel.org/r/20260425034155.53913-1-david@davidgow.net
Fixes: beaed42c427d ("kunit: default KUNIT_* fragments to KUNIT_ALL_TESTS")
Signed-off-by: David Gow <david@davidgow.net>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
12 days agoALSA: usb-audio: add clock quirk for Motu 1248
Nicola Lunghi [Mon, 4 May 2026 14:45:20 +0000 (16:45 +0200)] 
ALSA: usb-audio: add clock quirk for Motu 1248

The Motu 1248 (and probably other older Motu AVB interfaces) take more
than 2 seconds to switch clock. During the clock switching process the
device return that the clock is not valid. This is similar to what
already implemented for the Microbook II interface. Add the Motu
1248 usb id to the existing Motu quirk.

Signed-off-by: Nicola Lunghi <nick83ola@gmail.com>
Link: https://patch.msgid.link/20260504144520.699522-2-nick83ola@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
12 days agoALSA: usb-audio: midi2: Restart output URBs on resume
Cássio Gabriel [Mon, 4 May 2026 14:08:45 +0000 (11:08 -0300)] 
ALSA: usb-audio: midi2: Restart output URBs on resume

USB MIDI 2.0 suspend saves the endpoint running state, clears it and
kills all endpoint URBs. Resume restores the running state, but only
restarts input endpoints.

For a running output endpoint, this leaves the endpoint marked running
with an empty URB queue. Output transfer progress depends on either the
rawmidi trigger path starting the queue or an output completion refilling
it. After suspend there is no completion left, and output data that
remains queued in the raw UMP or legacy rawmidi buffer can stay stalled
until userspace happens to trigger the stream again.

Restore the saved state with atomic accessors, keep input endpoints
restarted as before, and restart output endpoints that were running before
suspend. Clear the saved suspend state after restoring it.

Fixes: ff49d1df79ae ("ALSA: usb-audio: USB MIDI 2.0 UMP support")
Cc: stable@vger.kernel.org
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260504-usb-midi2-output-resume-v1-1-c089cc8ad3c6@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
12 days agoALSA: hda/realtek: Fix mute and mic-mute LEDs for HP Envy X360 15-fh0xxx
Fernando Antunez Antonio [Mon, 4 May 2026 13:33:26 +0000 (07:33 -0600)] 
ALSA: hda/realtek: Fix mute and mic-mute LEDs for HP Envy X360 15-fh0xxx

This enables the mute and mic-mute LEDs on the HP Envy X360 15-fh0xxx
2-in-1 laptops.
The quirk 'ALC245_FIXUP_HP_ENVY_X360_15_FH0XXX' has been created and
is now enabled for this device.

This is my first patch, and I'm still getting to grips with the code,
so there's probably a better way to implement this fix.
I apologize for any inconvenience caused by the constant release of
new versions of this patch.

Signed-off-by: Fernando Antunez Antonio <fer.antunez24antonio@gmail.com>
Link: https://patch.msgid.link/20260504-hpenvy-muteled-fix-v3-1-5567fd9b3d25@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
12 days agoALSA: usb-audio: Add quirk flags for JBL Pebbles
Rong Zhang [Mon, 4 May 2026 11:38:05 +0000 (19:38 +0800)] 
ALSA: usb-audio: Add quirk flags for JBL Pebbles

JBL Pebbles is a pair of desktop speakers with UAC interface. Its
Playback and Capture mixers use linear volume with val = 0/999/1 and
0/3996/4. Meanwhile, the reported sample rates are truncated to
multiples of 0x100 (i.e., 44100 => 44032), resulting in noisy kmsg, as a
warning message is printed each time a stream is opened.

Add a quirk table entry matching VID/PID=0x05fc/0x0231 and applying
linear volume and sample rate quirk flags, so that it can work properly.

Also note that the volume control knob on device is an incremental
encoder. It does nothing but sends KEY_VOLUMEUP and KEY_VOLUMEDOWN per
rotation, controlling the UAC Playback volume mixer indirectly. Hence,
the linear volume quirk flags also enable the volume control knob to
function properly.

Quirky device sample:

  usb 5-1.1: new full-speed USB device number 12 using xhci_hcd
  usb 5-1.1: New USB device found, idVendor=05fc, idProduct=0231, bcdDevice= 1.00
  usb 5-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
  usb 5-1.1: Product: JBL Pebbles
  usb 5-1.1: Manufacturer: Harman International Industries
  usb 5-1.1: SerialNumber: 1.0.0
  usb-storage 5-1.1:1.0: USB Mass Storage device detected
  scsi host0: usb-storage 5-1.1:1.0
  usb 5-1.1: Found last interface = 1
  usb 5-1.1: 2:1: add audio endpoint 0x5
  usb 5-1.1: Creating new data endpoint #5
  usb 5-1.1: 2:1 Set sample rate 44100, clock 0
  usb 5-1.1: current rate 44032 is different from the runtime rate 44100
  usb 5-1.1: 3:1: add audio endpoint 0x84
  usb 5-1.1: Creating new data endpoint #84
  usb 5-1.1: 3:1 Set sample rate 44100, clock 0
  usb 5-1.1: current rate 44032 is different from the runtime rate 44100
  usb 5-1.1: [2] FU [PCM Playback Switch] ch = 1, val = 0/1/1
  usb 5-1.1: Warning! Unlikely big volume step count (=999), linear volume or wrong cval->res?
  usb 5-1.1: [2] FU [PCM Playback Volume] ch = 2, val = 0/999/1
  usb 5-1.1: [5] FU [Mic Capture Switch] ch = 1, val = 0/1/1
  usb 5-1.1: Warning! Unlikely big volume step count (=999), linear volume or wrong cval->res?
  usb 5-1.1: [5] FU [Mic Capture Volume] ch = 2, val = 0/3996/4
  input: Harman International Industries JBL Pebbles as /devices/pci0000:00/0000:00:08.3/0000:67:00.3/usb5/5-1/5-1.1/5-1.1:1.4/0003:05FC:0231.0018/input/input55
  hid-generic 0003:05FC:0231.0018: input,hidraw2: USB HID v2.01 Device [Harman International Industries JBL Pebbles] on usb-0000:67:00.3-1.1/input4

Signed-off-by: Rong Zhang <i@rong.moe>
Link: https://patch.msgid.link/20260504-uac-jbl-pebbles-v1-1-c888d592a286@rong.moe
Signed-off-by: Takashi Iwai <tiwai@suse.de>
12 days agoALSA: firewire-tascam: Do not drop unread control events
Cássio Gabriel [Mon, 4 May 2026 00:55:52 +0000 (21:55 -0300)] 
ALSA: firewire-tascam: Do not drop unread control events

tscm_hwdep_read_queue() copies as many queued control events as fit in
the userspace buffer. When the buffer is smaller than the current
contiguous queue segment, length is rounded down to the number of bytes
that can be copied.

However, after copying that shortened length, the code advances pull_pos
to the original tail_pos, marking the whole contiguous segment as
consumed. Any events between the copied portion and tail_pos are lost.

Limit tail_pos to the position after the entries actually copied before
updating pull_pos. When the whole segment fits, this is equivalent to the
old tail_pos update; when the buffer is smaller, the remaining events
stay queued for the next read.

Fixes: a8c0d13267a4 ("ALSA: firewire-tascam: notify events of change of state for userspace applications")
Cc: stable@vger.kernel.org
Suggested-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Co-developed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20260503-alsa-firewire-tascam-read-queue-v2-1-126c6efd7642@gmail.com
12 days agoALSA: usb-audio: Add quirk flags for AlphaTheta EUPHONIA
Anton Swart [Sun, 3 May 2026 21:15:17 +0000 (23:15 +0200)] 
ALSA: usb-audio: Add quirk flags for AlphaTheta EUPHONIA

The AlphaTheta EUPHONIA (VID 0x2b73, PID 0x0047) is a USB Audio
Class 2 DJ mixer that requires implicit feedback for full-duplex
operation. The capture endpoint (0x83 IN, interface 2) acts as the
implicit feedback source for the playback endpoint (0x03 OUT,
interface 1), and the device firmware does not send isochronous
data on the capture endpoint unless the host is simultaneously
sending data on the playback endpoint, i.e. playback must be
started first.

Without QUIRK_FLAG_PLAYBACK_FIRST the kernel waits for capture URBs
before submitting playback URBs, creating a deadlock: the device
waits for playback data and the host waits for capture data.
Without QUIRK_FLAG_GENERIC_IMPLICIT_FB the kernel does not detect
the implicit feedback relationship between the two interfaces.

The same flag combination is already used for the Behringer UMC202HD,
UMC204HD and UMC404HD (0x1397:0x0507/0x0508/0x0509), which exhibit
the identical implicit-feedback topology.

Tested on Raspberry Pi 5 with kernel 6.12.75; continuous full-duplex
streaming at 96 kHz / 24-bit, zero XRUNs.

Signed-off-by: Anton Swart <anton.swart.jhb@gmail.com>
Link: https://patch.msgid.link/20260503211517.14332-1-anton.swart.jhb@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
12 days agodrm/bridge: tda998x: Return NULL instead of 0 in tda998x_edid_read()
Kory Maincent (TI) [Fri, 17 Apr 2026 15:54:45 +0000 (17:54 +0200)] 
drm/bridge: tda998x: Return NULL instead of 0 in tda998x_edid_read()

tda998x_edid_read() returns a const struct drm_edid pointer, but when
tda998x_edid_delay_wait() fails (process killed while waiting for the
HPD timeout), the integer literal 0 is returned instead of NULL,
triggering a sparse warning: "Using plain integer as NULL pointer"

Replace 0 with NULL to fix the sparse warning.

Fixes: c76a8be4feec ("drm/bridge: tda998x: Add support for DRM_BRIDGE_ATTACH_NO_CONNECTOR")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202604172257.Imo6GOH9-lkp@intel.com/
Signed-off-by: Kory Maincent (TI) <kory.maincent@bootlin.com>
Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Link: https://patch.msgid.link/20260417155446.1068893-1-kory.maincent@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
12 days agodrm/bridge: tda998x: Use __be32 for audio port OF property pointer
Kory Maincent (TI) [Tue, 28 Apr 2026 09:04:56 +0000 (11:04 +0200)] 
drm/bridge: tda998x: Use __be32 for audio port OF property pointer

of_get_property() returns a pointer to big-endian (__be32) data, but
port_data in tda998x_get_audio_ports() was declared as const u32 *,
causing a sparse endianness type mismatch warning. Fix the declaration
to use const __be32 *.

Fixes: 7e567624dc5a4 ("drm/i2c: tda998x: Register ASoC hdmi-codec and add audio DT binding")
Cc: stable@vger.kernel.org
Signed-off-by: Kory Maincent (TI) <kory.maincent@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260428090457.121894-1-kory.maincent@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
13 days agowifi: ath5k: do not access array OOB
Jiri Slaby (SUSE) [Tue, 9 Dec 2025 10:04:59 +0000 (11:04 +0100)] 
wifi: ath5k: do not access array OOB

Vincent reports:
> The ath5k driver seems to do an array-index-out-of-bounds access as
> shown by the UBSAN kernel message:
> UBSAN: array-index-out-of-bounds in drivers/net/wireless/ath/ath5k/base.c:1741:20
> index 4 is out of range for type 'ieee80211_tx_rate [4]'
> ...
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x5d/0x80
>  ubsan_epilogue+0x5/0x2b
>  __ubsan_handle_out_of_bounds.cold+0x46/0x4b
>  ath5k_tasklet_tx+0x4e0/0x560 [ath5k]
>  tasklet_action_common+0xb5/0x1c0

It is real. 'ts->ts_final_idx' can be 3 on 5212, so:
   info->status.rates[ts->ts_final_idx + 1].idx = -1;
with the array defined as:
   struct ieee80211_tx_rate rates[IEEE80211_TX_MAX_RATES];
while the size is:
   #define IEEE80211_TX_MAX_RATES  4
is indeed bogus.

Set this 'idx = -1' sentinel only if the array index is less than the
array size. As mac80211 will not look at rates beyond the size
(IEEE80211_TX_MAX_RATES).

Note: The effect of the OOB write is negligible. It just overwrites the
next member of info->status, i.e. ack_signal.

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Reported-by: Vincent Danjean <vdanjean@debian.org>
Link: https://lore.kernel.org/all/aQYUkIaT87ccDCin@eldamar.lan
Closes: https://bugs.debian.org/1119093
Fixes: 6d7b97b23e11 ("ath5k: fix tx status reporting issues")
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251209100459.2253198-1-jirislaby@kernel.org
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
13 days agowifi: ath12k: fix peer_id usage in normal RX path
Baochen Qiang [Mon, 27 Apr 2026 05:51:41 +0000 (13:51 +0800)] 
wifi: ath12k: fix peer_id usage in normal RX path

ath12k_dp_rx_deliver_msdu() currently uses hal_rx_desc_data::peer_id
parsed from mpdu_start descriptor to do peer lookup. However In an A-MSDU
aggregation scenario, hardware only populates mpdu_start descriptor for
the first sub-msdu, but not the following ones. In that case peer_id could
be invalid, leading to peer lookup failure:

ath12k_wifi7_pci 0000:06:00.0: rx skb 00000000c391c041 len 1532 peer (null) 0 ucast sn 0 eht320 rate_idx 12 vht_nss 2 freq 6105 band 3 flag 0x40d1a fcs-err 0 mic-err 0 amsdu-more 0

As a result pubsta is NULL and parts of ieee80211_rx_status structure are
left uninitialized, which may cause unexpected behavior.

Fix it by switching the normal RX path to use ath12k_skb_rxcb::peer_id
which is parsed from REO ring's rx_mpdu_desc and is always valid.

hal_rx_desc_data::peer_id is still used in
ath12k_wifi7_dp_rx_frag_h_mpdu(), which is safe since A-MSDU
aggregation does not occur for fragmented frames. Similarly,
ath12k_skb_rxcb::peer_id may be overwritten by hal_rx_desc_data::peer_id
in ath12k_wifi7_dp_rx_h_mpdu(), which only handles non-aggregated
multicast/broadcast traffic.

Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3

Fixes: 11157e0910fd ("wifi: ath12k: Use ath12k_dp_peer in per packet Tx & Rx paths")
Signed-off-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Reviewed-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
Link: https://patch.msgid.link/20260427-ath12k-fix-peer-id-source-v1-1-b5f701fb8e88@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
13 days agowifi: ath12k: initialize RSSI dBm conversion event state
Rameshkumar Sundaram [Mon, 27 Apr 2026 10:30:11 +0000 (16:00 +0530)] 
wifi: ath12k: initialize RSSI dBm conversion event state

Currently, the RSSI dBm conversion event handler leaves struct
ath12k_wmi_rssi_dbm_conv_info_arg uninitialized on the stack before
calling the TLV parser. If one of the optional sub-TLVs is absent, the
corresponding *_present flag retains stack garbage and later gets read
in ath12k_wmi_update_rssi_offsets(). With UBSAN enabled this triggers an
invalid-load report for _Bool:

UBSAN: invalid-load in drivers/net/wireless/ath/ath12k/wmi.c:9682:15
load of value 9 is not a valid value for type '_Bool'
Call Trace:
 ath12k_wmi_rssi_dbm_conversion_params_info_event.cold+0x72/0x85 [ath12k]
 ath12k_wmi_op_rx+0x1871/0x2ab0 [ath12k]
 ath12k_htc_rx_completion_handler+0x44b/0x810 [ath12k]
 ath12k_ce_recv_process_cb+0x554/0x9f0 [ath12k]
 ath12k_ce_per_engine_service+0xbe/0xf0 [ath12k]
 ath12k_pci_ce_workqueue+0x69/0x120 [ath12k]

Initialize the parsed event state to zero before passing it to the TLV
parser so missing sub-TLVs correctly leave the presence flags false.

Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1

Fixes: 0314ee81a91d ("wifi: ath12k: handle WMI event for real noise floor calculation")
Signed-off-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Link: https://patch.msgid.link/20260427103011.2983269-1-rameshkumar.sundaram@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
13 days agowifi: ath12k: fix leak in some ath12k_wmi_xxx() functions
Nicolas Escande [Wed, 22 Apr 2026 16:32:58 +0000 (18:32 +0200)] 
wifi: ath12k: fix leak in some ath12k_wmi_xxx() functions

Some wmi functions were using plain 'return ath12k_wmi_cmd_send(...)'
without explicitly handling the error code. This leads to leaking the skb
in case of error.

Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.3.1-00218-QCAHKSWPL_SILICONZ-1

Fixes: 66a9448b1b89 ("wifi: ath12k: implement hardware data filter")
Fixes: 593174170919 ("wifi: ath12k: implement WoW enable and wakeup commands")
Fixes: 4a3c212eee0e ("wifi: ath12k: add basic WoW functionalities")
Fixes: 16f474d6d49d ("wifi: ath12k: add WoW net-detect functionality")
Fixes: 1666108c74c4 ("wifi: ath12k: support ARP and NS offload")
Fixes: aab4ae566fa1 ("wifi: ath12k: support GTK rekey offload")
Fixes: 7af01e569529 ("wifi: ath12k: handle keepalive during WoWLAN suspend and resume")
Signed-off-by: Nicolas Escande <nico.escande@gmail.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Link: https://patch.msgid.link/20260422163258.3013872-1-nico.escande@gmail.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
13 days agomm/memfd_luo: document preservation of file seals
David Carlier [Thu, 23 Apr 2026 12:56:48 +0000 (13:56 +0100)] 
mm/memfd_luo: document preservation of file seals

Commit 8a552d68a86e ("mm: memfd_luo: preserve file seals") started
preserving file seals across live update and restoring them via
memfd_add_seals() on retrieve, but the DOC header was not updated and
still listed seals under "Non-Preserved Properties" as being unsealed
on restore.

Move the Seals entry to the "Preserved Properties" section and describe
the actual behavior, including the MEMFD_LUO_ALL_SEALS restriction that
both preserve and retrieve enforce.

Fixes: 8a552d68a86e ("mm: memfd_luo: preserve file seals")
Signed-off-by: David Carlier <devnexen@gmail.com>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://patch.msgid.link/20260423125648.152113-2-devnexen@gmail.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
13 days agomm/memfd_luo: reject memfds whose page count exceeds UINT_MAX
David Carlier [Thu, 23 Apr 2026 12:56:47 +0000 (13:56 +0100)] 
mm/memfd_luo: reject memfds whose page count exceeds UINT_MAX

memfd_luo_preserve_folios() declares max_folios as unsigned int and
computes it from the inode size, then passes it to memfd_pin_folios()
which itself caps max_folios at unsigned int.  For files whose base-page
count exceeds UINT_MAX (larger than 16 TiB with 4 KiB pages), the
assignment truncates silently: only a prefix of the file gets pinned and
preserved, while memfd_luo_preserve() still records the full inode size
in ser->size.  On retrieve the inode is restored to the full size but
only the preserved prefix repopulates the page cache, so the tail comes
back as holes and user data is silently lost across the live update.

Reject such files at preserve time with -EFBIG rather than chunk the
pin loop, which would also require enlarging the preserved folios array
well beyond what is practical.

Fixes: b3749f174d68 ("mm: memfd_luo: allow preserving memfd")
Signed-off-by: David Carlier <devnexen@gmail.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Link: https://patch.msgid.link/20260423125648.152113-1-devnexen@gmail.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
13 days agospi: microchip core-qspi gpio-cs fixes + cleanup
Mark Brown [Mon, 4 May 2026 13:23:04 +0000 (22:23 +0900)] 
spi: microchip core-qspi gpio-cs fixes + cleanup

Conor Dooley <conor@kernel.org> says:

v3 with the review comment about the core handing CS_HIGH dealt with.
I noticed that in the same function there was a "raw" BIT(1), which I
replaced with a macro that the patch was already adding for use in the
setup function...

13 days agospi: microchip-core-qspi: remove some inline markings
Conor Dooley [Thu, 30 Apr 2026 10:10:20 +0000 (11:10 +0100)] 
spi: microchip-core-qspi: remove some inline markings

Remove inline markings from a number of functions that are called as
part of mem ops callbacks. None of them are either particularly trivial
or sensitive to overhead of a function call. Just let the compiler
decide what to do with them.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260430-serpent-stimulate-59fb860ef429@spud
Signed-off-by: Mark Brown <broonie@kernel.org>
13 days agospi: microchip-core-qspi: don't attempt to transmit during emulated read-only dual...
Conor Dooley [Thu, 30 Apr 2026 10:10:19 +0000 (11:10 +0100)] 
spi: microchip-core-qspi: don't attempt to transmit during emulated read-only dual/quad operations

The core will deal with reads by creating clock cycles itself, there's
no need to generate clock cycles by transmitting garbage data at the
driver level. Further, transmitting garbage data just bricks the transfer
since QSPI doesn't have a dedicated master-out line like MOSI in regular
SPI. I'm not entirely sure if the transfer is bricked because of the
garbage data being transmitted on the bus or because the core loses
track of whether it is supposed to be sending or receiving data.

Fixes: 8f9cf02c88528 ("spi: microchip-core-qspi: Add regular transfers")
CC: stable@vger.kernel.org
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260430-freezing-saloon-95b1f3d9dad0@spud
Signed-off-by: Mark Brown <broonie@kernel.org>
13 days agospi: microchip-core-qspi: control built-in cs manually
Conor Dooley [Thu, 30 Apr 2026 10:10:18 +0000 (11:10 +0100)] 
spi: microchip-core-qspi: control built-in cs manually

The coreQSPI IP supports only a single chip select, which is
automagically operated by the hardware - set low when the transmit
buffer first gets written to and set high when the number of bytes
written to the TOTALBYTES field of the FRAMES register have been sent on
the bus. Additional devices must use GPIOs for their chip selects.
It was reported to me that if there are two devices attached to this
QSPI controller that the in-built chip select is set low while linux
tries to access the device attached to the GPIO.

This went undetected as the boards that connected multiple devices to
the SPI controller all exclusively used GPIOs for chip selects, not
relying on the built-in chip select at all. It turns out that this was
because the built-in chip select, when controlled automagically, is set
low when active and high when inactive, thereby ruling out its use for
active-high devices or devices that need to transmit with the chip
select disabled.

Modify the driver so that it controls chip select directly, retaining
the behaviour for mem_ops of setting the chip select active for the
entire duration of the transfer in the exec_op callback. For regular
transfers, implement the set_cs callback for the core to use.

As part of this, the existing setup callback, mchp_coreqspi_setup_op(),
is removed. Modifying the CLKIDLE field is not safe to do during
operation when there are multiple devices, so this code is removed
entirely. Setting the MASTER and ENABLE fields is something that can be
done once at probe, it doesn't need to be re-run for each device.
Instead the new setup callback sets the built-in chip select to its
inactive state for active-low devices, as the reset value of the chip
select in software controlled mode is low.

Fixes: 8f9cf02c88528 ("spi: microchip-core-qspi: Add regular transfers")
Fixes: 8596124c4c1bc ("spi: microchip-core-qspi: Add support for microchip fpga qspi controllers")
CC: stable@vger.kernel.org
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260430-hamstring-busload-f941d0347b5e@spud
Signed-off-by: Mark Brown <broonie@kernel.org>
13 days agospi: imx: Three fixes for the i.MX SPI driver
Mark Brown [Mon, 4 May 2026 13:22:18 +0000 (22:22 +0900)] 
spi: imx: Three fixes for the i.MX SPI driver

John Madieu <john.madieu@gmail.com> says:

This series independent fixes found in the i.MX SPI driver.

These are:

1/3 fixes a precedence bug in spi_imx_dma_max_wml_find() that makes
    the watermark-finding logic effectively dead code. The function
    currently always returns wml = 1 because of how the !-operator
    binds to the modulo expression.

2/3 fixes a missing return on the package-1 failure path in
    spi_imx_dma_data_prepare(). The error path frees the
    dma_data array and the package-0 buffers, then falls through
    to "return 0" - the caller proceeds with a freed pointer.

3/3 makes spi_imx_setupxfer() propagate the prepare_transfer()
    return value. Currently a -EINVAL from mx51_ecspi_prepare_transfer
    (e.g. on a word_delay overflow) is silently swallowed and the
    transfer proceeds with a partially-configured controller.

13 days agospi: imx: Propagate prepare_transfer() error from spi_imx_setupxfer()
John Madieu [Fri, 1 May 2026 13:59:51 +0000 (13:59 +0000)] 
spi: imx: Propagate prepare_transfer() error from spi_imx_setupxfer()

spi_imx_setupxfer() calls the per-variant prepare_transfer()
callback and returns 0 unconditionally:

spi_imx->devtype_data->prepare_transfer(spi_imx, spi, t);

return 0;

mx51_ecspi_prepare_transfer() can return -EINVAL when the requested
word_delay does not fit in MX51_ECSPI_PERIOD_MASK. The error is
detected after a partial set of register writes (CTRL: BL, clkdiv,
SMC), so the controller is left in a partially-configured state and
the transfer is then submitted as if setup succeeded.

Propagate the return value. The other variants' prepare_transfer
callbacks all return 0, so this is a no-op for them.

Fixes: a3bb4e663df3 ("spi: imx: support word delay")
Signed-off-by: John Madieu <john.madieu@gmail.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260501135951.2416527-4-john.madieu@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
13 days agospi: imx: Fix UAF on package-1 prepare failure in spi_imx_dma_data_prepare()
John Madieu [Fri, 1 May 2026 13:59:50 +0000 (13:59 +0000)] 
spi: imx: Fix UAF on package-1 prepare failure in spi_imx_dma_data_prepare()

When transfer->len exceeds MX51_ECSPI_CTRL_MAX_BURST and is not a
multiple of it, spi_imx_dma_data_prepare() splits the transfer into
two DMA packages. If preparing the second package fails:

ret = spi_imx_dma_tx_data_handle(spi_imx, &spi_imx->dma_data[1],
 transfer->tx_buf + spi_imx->dma_data[0].data_len,
 false);
if (ret) {
kfree(spi_imx->dma_data[0].dma_tx_buf);
kfree(spi_imx->dma_data[0].dma_rx_buf);
kfree(spi_imx->dma_data);
}
}

return 0;

the function frees the package-0 buffers and the dma_data array,
then falls through to `return 0`, telling the caller the prepare
succeeded. The caller then dereferences the freed dma_data array,
producing a use-after-free.

Return the error from the failure path so the caller takes its
existing failure branch.

Fixes: faa8e404ad8e ("spi: imx: support dynamic burst length for ECSPI DMA mode")
Signed-off-by: John Madieu <john.madieu@gmail.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260501135951.2416527-3-john.madieu@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
13 days agospi: imx: Fix precedence bug in spi_imx_dma_max_wml_find()
John Madieu [Fri, 1 May 2026 13:59:49 +0000 (13:59 +0000)] 
spi: imx: Fix precedence bug in spi_imx_dma_max_wml_find()

The watermark search in spi_imx_dma_max_wml_find() reads:

if (!dma_data->dma_len % (i * bytes_per_word))
break;

The unary ! binds tighter than %, so this parses as:

if ((!dma_data->dma_len) % (i * bytes_per_word))
break;

!dma_data->dma_len is 0 or 1, and `0 % x == 0` for any x; `1 % x` is
0 unless x == 1. The condition is therefore false in every case
except dma_len != 0 with i * bytes_per_word == 1, i.e. i == 1 and
bytes_per_word == 1.

The loop almost always falls through to its end, leaving i == 0,
which the post-loop fallback rewrites to 1:

if (i == 0)
i = 1;

So spi_imx->wml ends up at 1 for essentially every DMA transfer,
defeating the entire purpose of the function. The DMA engine then
requests service after every single FIFO word instead of using
multi-word bursts, hurting throughput on every DMA-capable variant.

Add the missing parentheses so the modulo is computed first, then
negated:

if (!(dma_data->dma_len % (i * bytes_per_word)))
break;

Fixes: faa8e404ad8e ("spi: imx: support dynamic burst length for ECSPI DMA mode")
Signed-off-by: John Madieu <john.madieu@gmail.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260501135951.2416527-2-john.madieu@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
13 days agoASoC: fsl_xcvr: Fix event generation for cached controls
Cássio Gabriel [Tue, 28 Apr 2026 03:07:08 +0000 (00:07 -0300)] 
ASoC: fsl_xcvr: Fix event generation for cached controls

ALSA controls should return 1 from a put callback when the control
value changes. fsl_xcvr_capds_put() and fsl_xcvr_tx_cs_put() both
update cached control data but always return 0, so ALSA suppresses
change notifications for the Capabilities Data Structure and playback
IEC958 channel status controls.

Compare the old and new cached values before copying the new data,
and return whether the control value changed.

Fixes: 28564486866f ("ASoC: fsl_xcvr: Add XCVR ASoC CPU DAI driver")
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260428-asoc-fsl-xcvr-event-generation-v1-1-f21cf0812c4f@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
13 days agoASoC: sdw_utils: avoid the SDCA companion function not supported failure
Derek Fang [Thu, 30 Apr 2026 12:10:43 +0000 (20:10 +0800)] 
ASoC: sdw_utils: avoid the SDCA companion function not supported failure

Treat the companion amp as generic AMP until full support for companion
amp is added.

Signed-off-by: Derek Fang <derek.fang@realtek.com>
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://patch.msgid.link/20260430121043.552241-1-yung-chuan.liao@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
13 days agoASoC: amd: yc: Add HP OMEN Gaming Laptop 16-ap0xxx product line in quirk table
Tommaso Soncin [Wed, 29 Apr 2026 16:08:57 +0000 (18:08 +0200)] 
ASoC: amd: yc: Add HP OMEN Gaming Laptop 16-ap0xxx product line in quirk table

Add a DMI quirk for the HP OMEN Gaming Laptop 16-ap0xxx line fixing the
issue where the internal microphone was not detected.

Cc: stable@vger.kernel.org
Signed-off-by: Tommaso Soncin <soncintommaso@gmail.com>
Link: https://patch.msgid.link/20260429160858.538986-1-soncintommaso@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
13 days agoASoC: cs35l56: Fix out-of-bounds in dev_err() in cs35l56_read_onchip_spkid()
Richard Fitzgerald [Thu, 30 Apr 2026 10:11:34 +0000 (11:11 +0100)] 
ASoC: cs35l56: Fix out-of-bounds in dev_err() in cs35l56_read_onchip_spkid()

Remove the incorrect use of onchip_spkid_gpios[i] in the dev_err() after
regmap_read() of CS35L56_GPIO_STATUS1 returns an error.

This dev_err() was incorrectly copy-pasted from one inside the for-loop,
where i was valid. The read of CS35L56_GPIO_STATUS1 isn't for a specific
GPIO register, so the use of onchip_spkid_gpios[i] in the error message is
both irrelevant and out-of-bounds here.

Fixes: 4d1e3e2c404d ("ASoC: cs35l56: Support for reading speaker ID from on-chip GPIOs")
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://patch.msgid.link/20260430101134.2655938-1-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
13 days agoASoC: amd: yc: Add DMI quirk for MSI Bravo 15 C7VE
Bob Song [Thu, 30 Apr 2026 01:49:20 +0000 (09:49 +0800)] 
ASoC: amd: yc: Add DMI quirk for MSI Bravo 15 C7VE

The laptop requires a quirk ID to enable its internal microphone. Add
it to the DMI quirk table.

Reported-by: gannovera <gannovera@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218402
Signed-off-by: Bob Song <songxiebing@kylinos.cn>
Link: https://patch.msgid.link/20260430014920.141276-1-songxiebing@kylinos.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
13 days agoASoC: cs35l56: Fix hibernate write in runtime resume error path
Richard Fitzgerald [Wed, 29 Apr 2026 10:53:15 +0000 (11:53 +0100)] 
ASoC: cs35l56: Fix hibernate write in runtime resume error path

The error path of cs35l56_runtime_resume_common() should only write
the hibernation sequence if can_hibernate is true.

Something has already gone badly wrong if we ever reach the error
path. But triggering hibernate on hardware that does not support it
is likely to make the situation unrecoverable without a full reboot
because there might not be any hardware signal to exit hibernate.

Fixes: a47cf4dac7dc ("ASoC: cs35l56: Change hibernate sequence to use allow auto hibernate")
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://patch.msgid.link/20260429105315.2438298-1-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
13 days agoASoC: spacemit: fix RX DMA params not set when TX is running
Troy Mitchell [Wed, 29 Apr 2026 09:00:50 +0000 (17:00 +0800)] 
ASoC: spacemit: fix RX DMA params not set when TX is running

When TX is already running (SSCR_SSE is set), the hw_params callback
returns early before setting up DMA parameters for the RX stream. This
prevents the capture path from configuring its DMA data properly.

Move the SSCR_SSE check after DMA parameter setup and format
constraints, so both TX and RX streams get their DMA configuration
regardless of whether the hardware is already enabled. The early return
now only skips the register writes that would disrupt an active stream.

Fixes: fce217449075 ("ASoC: spacemit: add i2s support for K1 SoC")
Signed-off-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Link: https://patch.msgid.link/20260429-k1-i2s-fix-v2-1-8d67835aaddc@linux.spacemit.com
Signed-off-by: Mark Brown <broonie@kernel.org>
13 days agodrm/fb-helper: Fix clipping when damage area spans a single scanline
Francesco Lavra [Tue, 10 Feb 2026 17:35:45 +0000 (18:35 +0100)] 
drm/fb-helper: Fix clipping when damage area spans a single scanline

When the damage area resulting from a dirty memory range spans a single
scanline, the width of the rectangle is calculated dynamically because it
may not coincide with the framebuffer width.
If the dirty range ends exactly at the end of the scanline, the `bit_end`
variable is incorrectly assigned a 0 value, which results in a bogus clip
rectangle where the x2 coordinate is 0. This prevents the dirty scanline
from being flushed to the hardware.
Change the calculation of the `bit_end` value to fix the x2 coordinate
value in the above edge case.

Fixes: ded74cafeea9 ("drm/fb-helper: Clip damage area horizontally")
Signed-off-by: Francesco Lavra <flavra@baylibre.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20260210173545.733937-1-flavra@baylibre.com
13 days agodrm/qxl: Fix missing KMS poll cleanup
Myeonghun Pak [Fri, 24 Apr 2026 11:25:18 +0000 (20:25 +0900)] 
drm/qxl: Fix missing KMS poll cleanup

drm_kms_helper_poll_init() initializes the output polling work and
enables polling for the DRM device. qxl enables polling before calling
drm_dev_register(), but the drm_dev_register() failure path tears down
the modeset and device state without disabling the polling helper.

The remove path also unregisters and shuts down the DRM device without
first disabling the polling helper. Add matching drm_kms_helper_poll_fini()
calls in both paths so the delayed polling work is cancelled before qxl
tears down the associated modeset/device state.

Signed-off-by: Myeonghun Pak <mhun512@gmail.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 5ff91e442652 ("qxl: use drm helper hotplug support")
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20260424112543.57819-1-mhun512@gmail.com
13 days agoASoC: codecs: ab8500: Remove suspicious code
Uwe Kleine-König (The Capable Hub) [Thu, 30 Apr 2026 15:45:24 +0000 (17:45 +0200)] 
ASoC: codecs: ab8500: Remove suspicious code

anc_configure() passed values from drvdata->anc_fir_values[],
drvdata->anc_iir_values[] and drvdata->sid_fir_values[] as register
offset to snd_soc_component_read(). The content of these arrays are user
controllable via the component controls "ANC FIR Coefficients", "ANC
IIR Coefficients" and "Sidetone FIR Coefficients" which I assume are
supposed to hold register values, not register offsets.

Without a datasheet for that component and given that before commit
a201aef1a88b ("ASoC: codecs: ab8500: Fix casting of private data") the
arrays overlapped with driver control structures and thus didn't work
properly since 2012, drop that functionality and let someone repair it
who has an actual need for it.

With the core functionally removed several code parts become essentially
unused and are removed, too.

Reported-by: Sashiko (gemini/gemini-3.1-pro-preview)
Link: https://sashiko.dev/#/patchset/20260428192255.2294705-2-u.kleine-koenig%40baylibre.com
Fixes: 679d7abdc754 ("ASoC: codecs: Add AB8500 codec-driver")
Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Link: https://patch.msgid.link/20260430154524.338912-2-u.kleine-koenig@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
13 days agoALSA: pcmtest: Return -EFAULT on pattern read copy failure
Cássio Gabriel [Fri, 1 May 2026 17:45:14 +0000 (14:45 -0300)] 
ALSA: pcmtest: Return -EFAULT on pattern read copy failure

pattern_write() reports -EFAULT when copy_from_user() fails, but
pattern_read() converts copy_to_user() failures into a zero-length read.
That makes a userspace buffer fault look like EOF instead of reporting the
actual error.

Return -EFAULT from pattern_read() when copying the pattern data to
userspace fails, and update the file offset only after a successful copy.

Fixes: 315a3d57c64c ("ALSA: Implement the new Virtual PCM Test Driver")
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260501-alsa-pcmtest-pattern-read-efault-v1-1-53e1e8c11dda@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
13 days agoi2c: stub: Reject I2C block transfers with invalid length
Weiming Shi [Tue, 14 Apr 2026 17:23:39 +0000 (01:23 +0800)] 
i2c: stub: Reject I2C block transfers with invalid length

The I2C_SMBUS_I2C_BLOCK_DATA case in stub_xfer() uses data->block[0]
as the transfer length. The existing check only clamps it to avoid
overrunning the chip->words[256] register array, but does not validate
it against I2C_SMBUS_BLOCK_MAX (32), which is the limit of the union
i2c_smbus_data.block buffer (34 bytes total). The driver is a
development/test tool (CONFIG_I2C_STUB=m, not built by default)
that must be loaded with a chip_addr= parameter.

A local user with access to /dev/i2c-* can issue an I2C_SMBUS ioctl
with I2C_SMBUS_I2C_BLOCK_DATA and data->block[0] > 32, causing
stub_xfer() to read or write past the end of the union
i2c_smbus_data.block buffer:

 BUG: KASAN: stack-out-of-bounds in stub_xfer (drivers/i2c/i2c-stub.c:223)
 Read of size 1 at addr ffff88800abcfd92 by task exploit/81
 Call Trace:
  <TASK>
  stub_xfer (drivers/i2c/i2c-stub.c:223)
  __i2c_smbus_xfer (drivers/i2c/i2c-core-smbus.c:593)
  i2c_smbus_xfer (drivers/i2c/i2c-core-smbus.c:536)
  i2cdev_ioctl_smbus (drivers/i2c/i2c-dev.c:391)
  i2cdev_ioctl (drivers/i2c/i2c-dev.c:478)
  __x64_sys_ioctl (fs/ioctl.c:583)
  do_syscall_64 (arch/x86/entry/syscall_64.c:94)
  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
  </TASK>

The bug exists because i2c-stub implements .smbus_xfer directly,
bypassing the I2C_SMBUS_BLOCK_MAX validation in
i2c_smbus_xfer_emulated(). The I2C_SMBUS_BLOCK_DATA case in the same
function correctly validates against I2C_SMBUS_BLOCK_MAX, but the
I2C_SMBUS_I2C_BLOCK_DATA case does not.

Fix by rejecting transfers with data->block[0] == 0 or
data->block[0] > I2C_SMBUS_BLOCK_MAX with -EINVAL, consistent with
both the I2C_SMBUS_BLOCK_DATA case in the same function and the
I2C_SMBUS_I2C_BLOCK_DATA validation in i2c_smbus_xfer_emulated().

Fixes: 4710317891e4 ("i2c-stub: Implement I2C block support")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
13 days agox86/efi: Fix graceful fault handling after FPU softirq changes
Ivan Hu [Thu, 30 Apr 2026 07:41:07 +0000 (15:41 +0800)] 
x86/efi: Fix graceful fault handling after FPU softirq changes

Since commit d02198550423 ("x86/fpu: Improve crypto performance by
making kernel-mode FPU reliably usable in softirqs"), kernel_fpu_begin()
calls fpregs_lock() which uses local_bh_disable() instead of the
previous preempt_disable(). This sets SOFTIRQ_OFFSET in preempt_count
during the entire EFI runtime service call, causing in_interrupt() to
return true in normal task context.

The graceful page fault handler efi_crash_gracefully_on_page_fault()
uses in_interrupt() to bail out for faults in real interrupt context.
With SOFTIRQ_OFFSET now set, the handler always bails out, leaving EFI
firmware page faults unhandled. This escalates to die() which also sees
in_interrupt() as true and calls panic("Fatal exception in interrupt"),
resulting in a hard system freeze. On systems with buggy firmware that
triggers page faults during EFI runtime calls (e.g., accessing unmapped
memory in GetTime()), this causes an unrecoverable hang instead of the
expected graceful EFI_ABORTED recovery.

Fix by replacing in_interrupt() with !in_task(). This preserves the
original intent of bailing for interrupts or NMI faults, while no longer
falsely triggering from the FPU code path's local_bh_disable().

Fixes: d02198550423 ("x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs")
Cc: <stable@vger.kernel.org>
Signed-off-by: Ivan Hu <ivan.hu@canonical.com>
[ardb: Sashiko spotted that using 'in_hardirq() || in_nmi()' leaves a
       window where a softirq may be taken before fpregs_lock() is
       called, but after efi_rts_work.efi_rts_id has been assigned,
       and any page faults occurring in that window will then be
       misidentified as having been caused by the firmware. Instead,
       use !in_task(), which incorporates in_serving_softirq(). ]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
13 days agoi2c: Compare the return value of gpiod_get_direction against GPIO_LINE_DIRECTION_OUT
Nikola Z. Ivanov [Wed, 15 Apr 2026 20:50:21 +0000 (23:50 +0300)] 
i2c: Compare the return value of gpiod_get_direction against GPIO_LINE_DIRECTION_OUT

The GPIO_LINE_DIRECTION_* definitions have just recently been exposed to
gpio consumers.h by breaking them out in a separate defs.h file.

Use this to validate the gpio direction instead of the hard-coded literal.

Signed-off-by: Nikola Z. Ivanov <zlatistiv@gmail.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
13 days agoparisc: Fix IRQ leak in LASI driver
Hongling Zeng [Sun, 3 May 2026 04:17:44 +0000 (12:17 +0800)] 
parisc: Fix IRQ leak in LASI driver

When request_irq() succeeds but gsc_common_setup() fails later,
the IRQ is never released. Fix this by adding proper error handling
with goto labels to ensure resources are released in LIFO order.

Detected by Smatch:
  drivers/parisc/lasi.c:216 lasi_init_chip() warn: 'lasi->gsc_irq.irq'
from request_irq() not released on lines: 207.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Closes: https://lore.kernel.org/r/202604180957.4QdAIxP6-lkp@intel.com/
Signed-off-by: Hongling Zeng <zenghongling@kylinos.cn>
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
13 days agostaging: rtl8723bs: os_dep: avoid NULL pointer dereference in rtw_cbuf_alloc
Shyam Sunder Reddy Padira [Tue, 14 Apr 2026 07:13:06 +0000 (12:43 +0530)] 
staging: rtl8723bs: os_dep: avoid NULL pointer dereference in rtw_cbuf_alloc

The return value of kzalloc_flex() is used without
ensuring that the allocation succeeded, and the
pointer is dereferenced unconditionally.

Guard the access to the allocated structure to
avoid a potential NULL pointer dereference if the
allocation fails.

Fixes: 980cd426a257 ("staging: rtl8723bs: replace rtw_zmalloc() with kzalloc()")
Cc: stable <stable@kernel.org>
Signed-off-by: Shyam Sunder Reddy Padira <shyamsunderreddypadira@gmail.com>
Reviewed-by: Dan Carpenter <error27@gmail.com>
Link: https://patch.msgid.link/20260414071308.4781-2-shyamsunderreddypadira@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
13 days agoi2c: dev: prevent integer overflow in I2C_TIMEOUT ioctl
Mingyu Wang [Mon, 27 Apr 2026 02:57:45 +0000 (10:57 +0800)] 
i2c: dev: prevent integer overflow in I2C_TIMEOUT ioctl

While fuzzing with Syzkaller, a persistent `schedule_timeout: wrong
timeout value` warning was observed, accompanied by SMBus controller
state machine corruption.

The I2C_TIMEOUT ioctl accepts a user-provided timeout in multiples of
10 ms. The user argument is checked against INT_MAX, but it is
subsequently multiplied by 10 before being passed to msecs_to_jiffies().

A malicious user can pass a large value (e.g., 429496729) that passes
the `arg > INT_MAX` check but overflows when multiplied by 10. This
results in a truncated 32-bit unsigned value that bypasses the
internal `(int)m < 0` check in `msecs_to_jiffies()`.

The truncated value is then assigned to `client->adapter->timeout`
(a signed 32-bit int), which is reinterpreted as a negative number.
When passed to wait_for_completion_timeout(), this negative value
undergoes sign extension to a 64-bit unsigned long, triggering the
`schedule_timeout` warning and causing premature returns. This leaves
the SMBus state machine in an unrecoverable state, constituting a
local Denial of Service (DoS).

Fix this by bounding the user argument to `INT_MAX / 10`.

Signed-off-by: Mingyu Wang <25181214217@stu.xidian.edu.cn>
[wsa: move the comment as well]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
13 days agostaging: vme_user: fix root device leak on init failure
Johan Hovold [Fri, 24 Apr 2026 10:49:10 +0000 (12:49 +0200)] 
staging: vme_user: fix root device leak on init failure

Make sure to deregister and free the root device in case module
initialisation fails.

Fixes: 658bcdae9c67 ("vme: Adding Fake VME driver")
Cc: stable@vger.kernel.org # 4.9
Cc: Martyn Welch <martyn@welchs.me.uk>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260424104910.2619349-1-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
13 days agoi2c: acpi: Add ELAN0678 to i2c_acpi_force_100khz_device_ids
Niels Franke [Sat, 18 Apr 2026 05:37:19 +0000 (07:37 +0200)] 
i2c: acpi: Add ELAN0678 to i2c_acpi_force_100khz_device_ids

The ELAN0678 touchpad (04F3:3195) found in the Lenovo ThinkPad X13
exhibits excessive smoothing when the I2C bus runs at 400KHz, making
the touchpad feel sluggish when plugged into AC power. This is the
same issue previously fixed for ELAN06FA.

The device's ACPI table (Lenovo TP-R22) specifies 0x00061A80 (400KHz)
for the I2cSerialBusV2 descriptor. Forcing the bus to 100KHz eliminates
the sluggish behavior.

Signed-off-by: Niels Franke <nielsfranke@gmail.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
[wsa: kept the sorting]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
13 days agofbdev: udlfb: add vm_ops to dlfb_ops_mmap to prevent use-after-free
Rajat Gupta [Mon, 4 May 2026 03:51:10 +0000 (20:51 -0700)] 
fbdev: udlfb: add vm_ops to dlfb_ops_mmap to prevent use-after-free

dlfb_ops_mmap() uses remap_pfn_range() to map vmalloc framebuffer pages
to userspace but sets no vm_ops on the VMA. This means the kernel cannot
track active mmaps. When dlfb_realloc_framebuffer() replaces the backing
buffer via FBIOPUT_VSCREENINFO, existing mmap PTEs are not invalidated.
On USB disconnect, dlfb_ops_destroy() calls vfree() on the old pages
while userspace PTEs still reference them, resulting in a use-after-free:
the process retains read/write access to freed kernel pages.

Add vm_operations_struct with open/close callbacks that maintain an
atomic mmap_count on struct dlfb_data. In dlfb_realloc_framebuffer(),
check mmap_count and return -EBUSY if the buffer is currently mapped,
preventing buffer replacement while userspace holds stale PTEs.

Tested with PoC using dummy_hcd + raw_gadget USB device emulation.

Signed-off-by: Rajat Gupta <rajgupt@qti.qualcomm.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
13 days agodt-bindings: i2c: apple,i2c: Add t8122 compatible
Janne Grunau [Fri, 20 Mar 2026 12:23:24 +0000 (13:23 +0100)] 
dt-bindings: i2c: apple,i2c: Add t8122 compatible

The i2c block on the Apple silicon t8122 (M3) SoC is compatible with the
existing driver. Add "apple,t8122-i2c" as SoC specific compatible under
"apple,t8103-i2c" used by the deriver.

Signed-off-by: Janne Grunau <j@jannau.net>
Acked-by: Andi Shyti <andi.shyti@kernel.org>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
13 days agoiommu/amd: Fix precedence order in set_dte_passthrough()
Weinan Liu [Thu, 30 Apr 2026 23:28:51 +0000 (23:28 +0000)] 
iommu/amd: Fix precedence order in set_dte_passthrough()

Bitwise OR | operator has a higher precedence than the ternary ?:
operatior. It will be incorrectly evaluated as:

new->data[1] |= (FIELD_PREP(...) | dev_data->ats_enabled) ? DTE_FLAG_IOTLB : 0;

Wrap the conditional operation in parentheses to enforce the
correct evaluation order.

Fixes: 93eee2a49c1b ("iommu/amd: Refactor logic to program the host page table in DTE")
Signed-off-by: Weinan Liu <wnliu@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
13 days agoi2c: stm32f7: reinit_completion() per transfer not per msg
Marek Vasut [Sat, 2 May 2026 15:31:54 +0000 (17:31 +0200)] 
i2c: stm32f7: reinit_completion() per transfer not per msg

Currently, the driver may repeatedly call reinit_completion() during
transfer which contains multiple messages, while another thread is
waiting for the completion.

This happens during transfer with more than 1 message, invoked via
stm32f7_i2c_xfer_core() -> stm32f7_i2c_xfer_msg(). After invoking the
stm32f7_i2c_xfer_msg() to start transfer, stm32f7_i2c_xfer_core()
calls wait_for_completion_timeout() to wait for completion of the
transfer of all messages. When the first message transfer completes,
the hard IRQ handler triggers, and detects transfer completion, which
leads to stm32f7_i2c_isr_event_thread() IRQ thread being started. The
stm32f7_i2c_isr_event_thread() calls stm32f7_i2c_xfer_msg() in case
there are more messages.

Without this change, the second and later stm32f7_i2c_xfer_msg() would
call reinit_completion() on the completion which is still being waited
for in stm32f7_i2c_xfer_core(). Fix this by moving the reinit_completion()
into stm32f7_i2c_xfer_core(), together with wait_for_completion_timeout().

Since stm32f7_i2c_xfer_core() now waits for completion of the entire
transfer, increase the default timeout. This fixes sporadic transfer
timeouts on STM32MP25xx during kernel boot.

Fixes: aeb068c57214 ("i2c: i2c-stm32f7: add driver")
Signed-off-by: Marek Vasut <marex@nabladev.com>
[wsa: reworded commit subject]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
13 days agodt-bindings: i2c: amlogic: Add compatible for T7 SOC
Ronald Claveau [Fri, 24 Apr 2026 14:17:33 +0000 (16:17 +0200)] 
dt-bindings: i2c: amlogic: Add compatible for T7 SOC

Add the T7 SOC compatible which fallback to AXG compatible.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Ronald Claveau <linux-kernel-dev@aliel.fr>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
13 days agoi2c: testunit: Replace system_long_wq with system_dfl_long_wq
Marco Crivellari [Thu, 30 Apr 2026 09:08:10 +0000 (11:08 +0200)] 
i2c: testunit: Replace system_long_wq with system_dfl_long_wq

Currently the code enqueue work items using {queue|mod}_delayed_work(),
using system_long_wq. This workqueue should be used when long works are
expected, but it is a per-cpu workqueue.

This is important because queue_delayed_work() queue the work using:

   queue_delayed_work_on(WORK_CPU_UNBOUND, ...);

Note that WORK_CPU_UNBOUND = NR_CPUS.

This would end up calling __queue_delayed_work() that does:

    if (housekeeping_enabled(HK_TYPE_TIMER)) {
    //      [....]
    } else {
            if (likely(cpu == WORK_CPU_UNBOUND))
                    add_timer_global(timer);
            else
                    add_timer_on(timer, cpu);
    }

So when cpu == WORK_CPU_UNBOUND the timer is global and is
not using a specific CPU. Later, when __queue_work() is called:

    if (req_cpu == WORK_CPU_UNBOUND) {
            if (wq->flags & WQ_UNBOUND)
                    cpu = wq_select_unbound_cpu(raw_smp_processor_id());
            else
                    cpu = raw_smp_processor_id();
    }

Because the wq is not unbound, it takes the CPU where the timer
fired and enqueue the work on that CPU.
The consequence of all of this is that the work can run anywhere,
depending on where the timer fired.

Recently, a new unbound workqueue specific for long running work has
been added:

    c116737e972e ("workqueue: Add system_dfl_long_wq for long unbound works")

So change system_long_wq with system_dfl_long_wq so that the work may
benefit from scheduler task placement.

Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
[wsa: remove FIXME as well]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
13 days agowifi: cw1200: Revert "Fix locking in error paths"
Bart Van Assche [Thu, 30 Apr 2026 17:44:15 +0000 (10:44 -0700)] 
wifi: cw1200: Revert "Fix locking in error paths"

Revert commit d98c24617a83 ("wifi: cw1200: Fix locking in error paths")
because it introduces a locking bug instead of fixing a locking bug.
cw1200_wow_resume() unlocks priv->conf_mutex. Hence, adding
mutex_unlock(&priv->conf_mutex) just after cw1200_wow_resume() is wrong.

Reported-by: Ben Hutchings <ben@decadent.org.uk>
Closes: https://lore.kernel.org/all/408661f69f263266b028713e1412ba36d457e63d.camel@decadent.org.uk/
Fixes: d98c24617a83 ("wifi: cw1200: Fix locking in error paths")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20260430174418.1845431-1-bvanassche@acm.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
13 days agowifi: mac80211: tests: mark HT check strict
Johannes Berg [Mon, 4 May 2026 06:54:27 +0000 (08:54 +0200)] 
wifi: mac80211: tests: mark HT check strict

The HT check now only applies in strict mode since APs
were found to be broken. Mark it as such.

Fixes: 711a9c018ad2 ("wifi: mac80211: skip ieee80211_verify_sta_ht_mcs_support check in non-strict mode")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
13 days agoio_uring/eventfd: reset deferred signal state
Yufan Chen [Sun, 3 May 2026 17:57:10 +0000 (01:57 +0800)] 
io_uring/eventfd: reset deferred signal state

Recursive eventfd wakeups must defer io_uring eventfd signaling because
eventfd_signal_mask() rejects reentry from eventfd wakeup handlers. The
io_ev_fd ops bit tracks an outstanding deferred signal so that the same
rcu_head is not queued twice.

That bit is only set today. Once the first deferred callback runs, later
recursive notifications still see the bit set and skip queueing another
deferred signal. This can leave new completions without a matching
eventfd wake after the first recursive deferral.

Clear the pending bit before issuing the deferred signal. If the wakeup
path recurses while the callback runs, a new signal can be queued for
the next RCU grace period while the current callback keeps its reference
until it returns.

Signed-off-by: Yufan Chen <ericterminal@gmail.com>
Fixes: 60b6c075e8eb ("io_uring/eventfd: move to more idiomatic RCU free usage")
Link: https://patch.msgid.link/20260503175710.37209-1-yufan.chen@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 days agoio_uring/napi: clear tracked NAPI entries on unregister
Yufan Chen [Sun, 3 May 2026 17:56:10 +0000 (01:56 +0800)] 
io_uring/napi: clear tracked NAPI entries on unregister

IORING_UNREGISTER_NAPI disables NAPI busy polling, but it currently
leaves any previously tracked NAPI IDs on the ring context. The normal
wait path only checks whether the list is empty before entering the busy
poll helper, so an unregistered ring can still observe stale entries and
run an unexpected busy poll pass.

Make unregister switch the context to inactive and free the tracked
entries. Do the same inactive transition while changing the tracking
strategy, and recheck the expected tracking mode under napi_lock before
inserting a newly learned NAPI ID. This prevents a racing poll path from
repopulating the list after unregister or reconfiguration.

Also make the busy poll dispatcher ignore inactive mode explicitly.

Signed-off-by: Yufan Chen <ericterminal@gmail.com>
Fixes: 6bf90bd8c58a ("io_uring/napi: add static napi tracking strategy")
Link: https://patch.msgid.link/20260503175610.35521-1-yufan.chen@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
13 days agodrm/ttm: Fix GPU MM stats during pool shrinking
Matthew Brost [Sat, 2 May 2026 06:53:38 +0000 (23:53 -0700)] 
drm/ttm: Fix GPU MM stats during pool shrinking

TTM pool shrinking frees pages by calling __free_pages() directly,
which bypasses updates to NR_GPU_ACTIVE and leaves GPU MM accounting
out of sync.

Introduce a helper, __free_pages_gpu_account(), and use it for all page
frees in ttm_pool.c so GPU MM statistics are updated consistently.

Reported-by: Kenneth Crudup <kenny@panix.com>
Fixes: ae80122f3896 ("drm/ttm: use gpu mm stats to track gpu memory allocations. (v4)")
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Tested-by: Kenneth Crudup <kenny@panix.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Link: https://patch.msgid.link/20260502065338.2720646-1-matthew.brost@intel.com
13 days agosmb: client: use kzalloc to zero-initialize security descriptor buffer
Bjoern Doebel [Thu, 30 Apr 2026 08:57:17 +0000 (08:57 +0000)] 
smb: client: use kzalloc to zero-initialize security descriptor buffer

Commit 62e7dd0a39c2d ("smb: common: change the data type of num_aces
to le16") split struct smb_acl's __le32 num_aces field into __le16
num_aces and __le16 reserved. The reserved field corresponds to Sbz2
in the MS-DTYP ACL wire format, which must be zero [1].

When building an ACL descriptor in build_sec_desc(), we are using a
kmalloc()'ed descriptor buffer and writing the fields explicitly using
le16() writes now. This never writes to the 2 byte reserved field,
leaving it as uninitialized heap data.

When the reserved field happens to contain non-zero slab garbage,
Samba rejects the security descriptor with "ndr_pull_security_descriptor
failed: Range Error", causing chmod to fail with EINVAL.

Change kmalloc() to kzalloc() to ensure the entire buffer is
zero-initialized.

Fixes: 62e7dd0a39c2d ("smb: common: change the data type of num_aces to le16")
Cc: stable@vger.kernel.org
Signed-off-by: Bjoern Doebel <doebel@amazon.de>
Assisted-by: Kiro:claude-opus-4.6
[1] https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-dtyp/20233ed8-a6c6-4097-aafa-dd545ed24428
Signed-off-by: Steve French <stfrench@microsoft.com>
13 days agocifs: abort open_cached_dir if we don't request leases
Shyam Prasad N [Tue, 28 Apr 2026 16:07:47 +0000 (21:37 +0530)] 
cifs: abort open_cached_dir if we don't request leases

It is possible that SMB2_open_init may not set lease context based
on the requested oplock level. This can happen when leases have been
temporarily or permanently disabled. When this happens, we will have
open_cached_dir making an open without lease context and the response
will anyway be rejected by open_cached_dir (thereby forcing a close to
discard this open). That's unnecessary two round-trips to the server.

This change adds a check before making the open request to the server
to make sure that SMB2_open_init did add the expected lease context
to the open in open_cached_dir.

Cc: <stable@vger.kernel.org>
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
13 days agoLoongArch: KVM: Move unconditional delay into timer clear scenery
Bibo Mao [Mon, 4 May 2026 01:00:48 +0000 (09:00 +0800)] 
LoongArch: KVM: Move unconditional delay into timer clear scenery

When timer interrupt arrives in guest kernel, guest kernel clears the
timer interrupt and program timer with the next incoming event.

During this stage, timer tick is -1 and timer interrupt status is
disabled in ESTAT register. KVM hypervisor need write zero with timer
tick register and wait timer interrupt injection from HW side, and
then clear timer interrupt.

So there is 2 cycle delay in KVM hypervisor to emulate such scenery,
and the delay is unnecessary if there is no need to clear the timer
interrupt.

Here move 2 cycle delay into timer clear scenery and add timer ESTAT
checking after delay, and set max timer expire value if timer interrupt
does not arrive still.

Cc: stable@vger.kernel.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
13 days agoLoongArch: KVM: Fix HW timer interrupt lost when inject interrupt by software
Bibo Mao [Mon, 4 May 2026 01:00:48 +0000 (09:00 +0800)] 
LoongArch: KVM: Fix HW timer interrupt lost when inject interrupt by software

With passthrough HW timer, timer interrupt is injected by HW. When
inject emulated CPU interrupt by software such SIP0/SIP1/IPI, HW timer
interrupt may be lost.

Here check whether there is timer tick value inversion before and after
injecting emulated CPU interrupt by software, timer enabling by reading
timer cfg register is skipped. If the timer tick value is detected with
changing, then timer should be enabled. And inject a timer interrupt by
software if there is.

Cc: <stable@vger.kernel.org>
Fixes: f45ad5b8aa93 ("LoongArch: KVM: Implement vcpu interrupt operations").
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
13 days agoLoongArch: KVM: Move AVEC interrupt injection into switch loop
Bibo Mao [Mon, 4 May 2026 01:00:48 +0000 (09:00 +0800)] 
LoongArch: KVM: Move AVEC interrupt injection into switch loop

When AVEC interrupt controller is emulated in user space, AVEC interrupt
is injected by software like SIP0/SIP1/TI/IPI interrupts. Here also move
the AVEC interrupt injection in switch loop.

Cc: stable@vger.kernel.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
13 days agoLoongArch: KVM: Use kvm_set_pte() in kvm_flush_pte()
Tao Cui [Mon, 4 May 2026 01:00:38 +0000 (09:00 +0800)] 
LoongArch: KVM: Use kvm_set_pte() in kvm_flush_pte()

kvm_flush_pte() is the only caller that directly assigns *pte instead
of using the kvm_set_pte() wrapper. Use the wrapper for consistency with
the rest of the file.

No functional change intended.

Cc: stable@vger.kernel.org
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
13 days agoLoongArch: KVM: Fix missing EMULATE_FAIL in kvm_emu_mmio_read()
Tao Cui [Mon, 4 May 2026 01:00:38 +0000 (09:00 +0800)] 
LoongArch: KVM: Fix missing EMULATE_FAIL in kvm_emu_mmio_read()

In the ldptr (0x24...0x27) opcode decoding path, the default case only
breaks out but without setting "ret" value to EMULATE_FAIL. This leaves
run->mmio.len uninitialized (stale from a previous MMIO operation) while
"ret" value remains EMULATE_DO_MMIO, causing the code to proceed with an
incorrect MMIO length.

Add "ret = EMULATE_FAIL" to match the other default branches in the same
function (e.g. the 0x28...0x2e and 0x38 cases).

Cc: stable@vger.kernel.org
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
13 days agoLoongArch: KVM: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
Qiang Ma [Mon, 4 May 2026 01:00:37 +0000 (09:00 +0800)] 
LoongArch: KVM: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS

It doesn't make sense to return the recommended maximum number of vCPUs
which exceeds the maximum possible number of vCPUs.

Other architectures have already done this, such as commit 57a2e13ebdda
("KVM: MIPS: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS")

Cc: stable@vger.kernel.org
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Qiang Ma <maqianga@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
13 days agoLoongArch: KVM: Fix "unreliable stack" for kvm_exc_entry
Xianglai Li [Mon, 4 May 2026 01:00:37 +0000 (09:00 +0800)] 
LoongArch: KVM: Fix "unreliable stack" for kvm_exc_entry

Insert the appropriate UNWIND hint into the kvm_exc_entry assembly
function to guide the generation of correct ORC table entries, thereby
solving the timeout problem ("unreliable stack") while loading the
livepatch-sample module on a physical machine running virtual machines
with multiple vcpus.

Cc: stable@vger.kernel.org
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
13 days agoLoongArch: KVM: Compile switch.S directly into the kernel
Xianglai Li [Mon, 4 May 2026 01:00:37 +0000 (09:00 +0800)] 
LoongArch: KVM: Compile switch.S directly into the kernel

If we directly compile the switch.S file into the kernel, the address of
the kvm_exc_entry function will definitely be within the DMW memory area.
Therefore, we will no longer need to perform a copy relocation of the
kvm_exc_entry.

So this patch compiles switch.S directly into the kernel, and then remove
the copy relocation execution logic for the kvm_exc_entry function.

Cc: stable@vger.kernel.org
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
13 days agoLoongArch: vDSO: Drop custom __arch_vdso_hres_capable()
Thomas Weißschuh [Mon, 4 May 2026 01:00:20 +0000 (09:00 +0800)] 
LoongArch: vDSO: Drop custom __arch_vdso_hres_capable()

The custom definition is identical to the generic fallback one.

So remove it.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
13 days agoLoongArch: Fix potential ADE in loongson_gpu_fixup_dma_hang()
Wentao Guan [Mon, 4 May 2026 01:00:20 +0000 (09:00 +0800)] 
LoongArch: Fix potential ADE in loongson_gpu_fixup_dma_hang()

The switch case in loongson_gpu_fixup_dma_hang() may not DC2 or DC3, and
readl(crtc_reg) will access with random address, because the "device" is
from "base+PCI_DEVICE_ID", "base" is from "pdev->devfn+1". This is wrong
when my platform inserts a discrete GPU:

lspci -tv
-[0000:00]-+-00.0  Loongson Technology LLC Hyper Transport Bridge Controller
...
           +-06.0  Loongson Technology LLC LG100 GPU
           +-06.2  Loongson Technology LLC Device 7a37
...

Add a default switch case to fix the panic as below:

 Kernel ade access[#1]:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.136-loong64-desktop-hwe+ #4
 pc 90000000017e5534 ra 90000000017e54c0 tp 90000001002f8000 sp 90000001002fb6c0
 a0 80000efe00003100 a1 0000000000003100 a2 0000000000000000 a3 0000000000000002
 a4 90000001002fb6b4 a5 900000087cdb58fd a6 90000000027af000 a7 0000000000000001
 t0 00000000000085b9 t1 000000000000ffff t2 0000000000000000 t3 0000000000000000
 t4 fffffffffffffffd t5 00000000fffb6d9c t6 0000000000083b00 t7 00000000000070c0
 t8 900000087cdb4d94 u0 900000087cdb58fd s9 90000001002fb826 s0 90000000031c12c8
 s1 7fffffffffffff00 s2 90000000031c12d0 s3 0000000000002710 s4 0000000000000000
 s5 0000000000000000 s6 9000000100053000 s7 7fffffffffffff00 s8 90000000030d4000
    ra: 90000000017e54c0 loongson_gpu_fixup_dma_hang+0x40/0x210
   ERA: 90000000017e5534 loongson_gpu_fixup_dma_hang+0xb4/0x210
  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
  PRMD: 00000004 (PPLV0 +PIE -PWE)
  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
 ESTAT: 00480000 [ADEM] (IS= ECode=8 EsubCode=1)
  BADV: 7fffffffffffff00
  PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
 Modules linked in:
 Process swapper/0 (pid: 1, threadinfo=(____ptrval____), task=(____ptrval____))
 Stack : 0000000000000006 90000001002fb778 90000001002fb704 0000000000000007
         0000000016a65700 90000000017e5690 000000000000ffff ffffffffffffffff
         900000000209f7c0 9000000100053000 900000000209f7a8 9000000000eebc08
         0000000000000000 0000000000000000 0000000000000006 90000001002fb778
         90000001000530b8 90000000027af000 0000000000000000 9000000100054000
         9000000100053000 9000000000ebb70c 9000000100004c00 9000000004000001
         90000001002fb7e4 bae765461f31cb12 0000000000000000 0000000000000000
         0000000000000006 90000000027af000 0000000000000030 90000000027af000
         900000087cd6f800 9000000100053000 0000000000000000 9000000000ebc560
         7a2500147cdaf720 bae765461f31cb12 0000000000000001 0000000000000030
         ...
 Call Trace:
 [<90000000017e5534>] loongson_gpu_fixup_dma_hang+0xb4/0x210
 [<9000000000eebc08>] pci_fixup_device+0x108/0x280
 [<9000000000ebb70c>] pci_setup_device+0x24c/0x690
 [<9000000000ebc560>] pci_scan_single_device+0xe0/0x140
 [<9000000000ebc684>] pci_scan_slot+0xc4/0x280
 [<9000000000ebdd00>] pci_scan_child_bus_extend+0x60/0x3f0
 [<9000000000f5bc94>] acpi_pci_root_create+0x2b4/0x420
 [<90000000017e5e74>] pci_acpi_scan_root+0x2d4/0x440
 [<9000000000f5b02c>] acpi_pci_root_add+0x21c/0x3a0
 [<9000000000f4ee54>] acpi_bus_attach+0x1a4/0x3c0
 [<90000000010e200c>] device_for_each_child+0x6c/0xe0
 [<9000000000f4bbf4>] acpi_dev_for_each_child+0x44/0x70
 [<9000000000f4ef40>] acpi_bus_attach+0x290/0x3c0
 [<90000000010e200c>] device_for_each_child+0x6c/0xe0
 [<9000000000f4bbf4>] acpi_dev_for_each_child+0x44/0x70
 [<9000000000f4ef40>] acpi_bus_attach+0x290/0x3c0
 [<9000000000f5211c>] acpi_bus_scan+0x6c/0x280
 [<900000000189c028>] acpi_scan_init+0x194/0x310
 [<900000000189bc6c>] acpi_init+0xcc/0x140
 [<9000000000220cdc>] do_one_initcall+0x4c/0x310
 [<90000000018618fc>] kernel_init_freeable+0x258/0x2d4
 [<900000000184326c>] kernel_init+0x28/0x13c
 [<9000000000222008>] ret_from_kernel_thread+0xc/0xa4

Cc: stable@vger.kernel.org
Fixes: 95db0c9f526d ("LoongArch: Workaround LS2K/LS7A GPU DMA hang bug")
Link: https://gist.github.com/opsiff/ebf2dac51b4013d22462f2124c55f807
Link: https://gist.github.com/opsiff/a62f2a73db0492b3c49bf223a339b133
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>