git.ipfire.org Git - thirdparty/linux.git/log

sched/idle: Handle offlining first in idle loop

Offline handling happens from within the inner idle loop, after the
beginning of dyntick cputime accounting, nohz idle load balancing and
TIF_NEED_RESCHED polling.

This is not necessary and even buggy because:

  * There is no dyntick handling to do. And calling tick_nohz_idle_enter()
    messes up with the struct tick_sched reset that was performed on
    tick_sched_timer_dying().

  * There is no nohz idle balancing to do.

  * Polling on TIF_RESCHED is irrelevant at this stage, there are no more
    tasks allowed to run.

  * No need to check if need_resched() before offline handling since
    stop_machine is done and all per-cpu kthread should be done with
    their job.

Therefore move the offline handling at the beginning of the idle loop.
This will also ease the idle cputime unification later by not elapsing
idle time while offline through the call to:

   tick_nohz_idle_enter() -> tick_nohz_start_idle()

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-3-frederic@kernel.org

tick/sched: Fix TOCTOU in nohz idle time fetch

When the nohz idle time is fetched, the current clock timestamp is taken
outside the seqcount, which can result in a race as reported by Sashiko:

    get_cpu_sleep_time_us()                 tick_nohz_start_idle()
    -----------------------                 ---------------------
    now = ktime_get()
                                            write_seqcount_begin(idle_sleeptime_seq);
                                            idle_entrytime = ktime_get()
                                            tick_sched_flag_set(ts, TS_FLAG_IDLE_ACTIVE);
                                            write_seqcount_end(&ts->idle_sleeptime_seq);
    read_seqcount_begin(idle_sleeptime_seq)
    delta = now - idle_entrytime);
    //!! But now < idle_entrytime
    idle = *sleeptime +  delta;
    read_seqcount_retry(&ts->idle_sleeptime_seq, seq)

Here the read side fetches the timestamp before the write side and its
update. As a result the time delta computed on the read side is negative
(ktime_t is signed) and breaks the cputime monotonicity guarantee.

This could possibly be fixed with reading the current clock timestamp
inside the seqcount but the reader overhead might then increase. Also
simply checking that the current timestamp is above the idle entry time
is enough to prevent any issue of the like.

Fixes: 620a30fa0bd1 ("timers/nohz: Protect idle/iowait sleep time under seqcount")
Reported-by: Sashiko
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260508131647.43868-2-frederic@kernel.org

net: garp: fix unsigned integer underflow in garp_pdu_parse_attr

The receive-side GARP attribute parser computes dlen with reversed
operands:

dlen = sizeof(*ga) - ga->len;

ga->len is the on-wire attribute length and includes the GARP attribute
header. For normal attributes with data, ga->len is larger than
sizeof(*ga), so the subtraction underflows in unsigned arithmetic.

The resulting value is later passed to garp_attr_lookup(), whose length
argument is u8. After truncation, the parsed data length usually no
longer matches the length stored for locally registered attributes, so
received Join/Leave events are ignored. This breaks the GARP receive path
for common attributes, such as GVRP VLAN registration attributes.

Compute the data length as the attribute length minus the header length.

Fixes: eca9ebac651f ("net: Add GARP applicant-only participant")
Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn>
Reported-by: Ao Wang <wangao@seu.edu.cn>
Reported-by: Xuewei Feng <fengxw06@126.com>
Reported-by: Qi Li <qli01@tsinghua.edu.cn>
Reported-by: Ke Xu <xuke@tsinghua.edu.cn>
Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260527083200.42861-1-zhaoyz24@mails.tsinghua.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: tso: add new tests for ip6tnl, ipip, and sit tunnels

Add new tunnel test cases for ip6tnl, ipip, and sit. ip6tnl supports
ipv[46] as inner l3 header, and the other two tunnels only support a
single inner l3 type.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20260529-tso-tunnels-v1-1-3771ee9eaaa9@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

time: Fix off-by-one in settimeofday() usec validation

The validation check uses '>' instead of '>=' when comparing tv_usec
against USEC_PER_SEC, allowing the value 1000000 through. After
conversion to nanoseconds (*= 1000), this produces tv_nsec ==
NSEC_PER_SEC, violating the timespec invariant that tv_nsec must be
less than NSEC_PER_SEC.

Use '>=' to reject tv_usec values that are not in the valid range of
0 to 999999.

Fixes: 5e0fb1b57bea ("y2038: time: avoid timespec usage in settimeofday()")
Signed-off-by: Naveen Kumar Chaudhary <naveen.osdev@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Acked-by: John Stultz <jstultz@google.com>
Link: https://patch.msgid.link/4rikk44zew3s6577dugmx4jyblz7o5c57niuap6ct3td5yfm6w@gh7pcumg7qor

clockevents: Fix duplicate type specifier in stub function parameter

The stub for arch_inlined_clockevent_set_next_coupled() has 'u64 u64
cycles' in its parameter list. Since u64 is a typedef, the compiler
parses the second 'u64' as the parameter name, making 'cycles' an
unused token. Remove the duplicate so the parameter is correctly named.

Fixes: 89f951a1e8ad ("clockevents: Provide support for clocksource coupled comparators")
Signed-off-by: Naveen Kumar Chaudhary <naveen.osdev@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/7tostpvxzdn6tobmyow63a5rweatls5kux3scqp2vzhe7mv6uq@ecr746b4hyhf

ACPI: button: Switch over to devres-based resource management

Switch over the ACPI button driver to devres-based resource management
by making the following changes:

* Use devm_kzalloc() for allocating button object memory.

* Use devm_input_allocate_device() for allocating the input class
   device object.

* Turn acpi_lid_remove_fs() into a devm cleanup action added
   by devm_acpi_lid_add_fs() which is a new wrapper around
   acpi_lid_add_fs().

* Add devm_acpi_button_init_wakeup() for initializing the wakeup source
   and make it add a custom devm action that will automatically remove
   the wakeup source registered by it.

* Turn acpi_button_remove_event_handler() into a devm cleanup action
   added by devm_acpi_button_add_event_handler() which is a new wrapper
   around acpi_button_add_event_handler().

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2283436.Mh6RI2rZIc@rafael.j.wysocki
[ rjw: Rebased and removed unnecessary input device parent assignment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPI: button: Reorganize installing and removing event handlers

To facilitate subsequent changes, move the code installing and
removing button event handlers into two separate functions called
acpi_button_add_event_handler() and acpi_button_remove_event_handler(),
respectively, and rearrange it to reduce code duplication.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2714170.Lt9SDvczpP@rafael.j.wysocki

ACPI: button: Use string literals for generating netlink messages

Instead of storing strings that never change later under
acpi_device_class(device) and using them for generating netlink
messages, use pointers to string literals with the same content.

This also allows the clearing of the acpi_device_class(device)
area during driver removal and in the probe rollback path to be
dropped.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2070791.usQuhbGJ8B@rafael.j.wysocki

ACPI: button: Clean up adding and removing lid procfs interface

The procfs interface is only used with lid devices which only becomes
clear after looking into the function bodies of acpi_button_add_fs()
and acpi_button_remove_fs(). Moreover, the only error code returned
by the former of these functions is -ENODEV, so the ret local variable
in it is redundant, and the return type of the latter one can be changed
to void.

Accordingly, rename these functions to acpi_button_add_fs() and
acpi_button_remove_fs(), respectively, move the button->type checks
against ACPI_BUTTON_TYPE_LID from them to their callers, and make
code simplifications as per the above.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1869050.VLH7GnMWUR@rafael.j.wysocki

ACPI: button: Merge two switch () statements in acpi_button_probe()

Two switch () statements in acpi_button_probe() operate on the same
value and the statements between them can be reordered with respect
to the second one, so merge them.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3352815.5fSG56mABF@rafael.j.wysocki

ACPI: button: Drop redundant variable from acpi_button_probe()

Local char pointer called "name" in acpi_button_probe() is redundant
because its value can be assigned directly to input->name and the
latter can be used in the only other place where "name" is read, so
get rid of it.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3706239.iIbC2pHGDl@rafael.j.wysocki

ACPI: button: Rework device verification during probe

Instead of manually comparing the primary ID of the device (retuned
by _HID) with each of the device IDs supported by the driver, use
acpi_match_acpi_device() (which includes the ACPI companion device
pointer check against NULL) and store the ACPI button type as
driver_data in button_device_ids[], which allows a multi-branch
conditional statement to be replaced with a switch () one. However,
to continue preventing successful probing of devices that only have
one of the supported device IDs in their _CID lists, compare the
matched device ID with the primary ID of the device and return an
error if they don't match.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/7960518.EvYhyI6sBW@rafael.j.wysocki
[ rjw: Fixed button memory leak on probe failure ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ntsync: Honour caller's time namespace for absolute MONOTONIC timeouts

ntsync_schedule() takes the absolute timeout from userspace and hands it to
schedule_hrtimeout_range_clock() with HRTIMER_MODE_ABS. For the default
CLOCK_MONOTONIC path, it does not call timens_ktime_to_host() first.

A process inside a CLOCK_MONOTONIC time namespace computes the absolute
timeout in its own clock view. The kernel reads the same value against the
host clock. The two differ by the namespace offset. The timeout then fires
too early or too late.

Other users of absolute timeouts run the ktime through
timens_ktime_to_host() before starting the hrtimer. ntsync was added later
and missed that step.

/dev/ntsync is mode 0666. Any user inside a time namespace that can
open it is affected. The visible effect is wrong timeout behaviour
for Wine in a container that sets a CLOCK_MONOTONIC offset.

Reproducer: unshare --user --time, set the monotonic offset to -10s,
issue NTSYNC_IOC_WAIT_ANY with a 100 ms absolute MONOTONIC timeout.
The baseline run elapses about 100 ms. The run inside the namespace
elapses about 0 ms.

Apply timens_ktime_to_host() to the parsed timeout when the caller
did not set NTSYNC_WAIT_REALTIME. The helper does nothing in the
initial time namespace, so the fast path is unchanged.

Fixes: b4a7b5fe3f51 ("ntsync: Introduce NTSYNC_IOC_WAIT_ANY.")
Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Elizabeth Figura <zfigura@codeweavers.com>
Link: https://patch.msgid.link/20260528063311.3300393-3-maoyixie.tju@gmail.com

time/namespace: Export init_time_ns and do_timens_ktime_to_host()

timens_ktime_to_host() in compares the current time namespace against
init_time_ns for the fast path. It calls do_timens_ktime_to_host() for the
offset case. Both symbols are needed at link time by any caller of the
inline.

All current callers are builtin, but ntsync can be built as module, which
prevents it from using it.

Export both with EXPORT_SYMBOL_GPL.

Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260528063311.3300393-2-maoyixie.tju@gmail.com

hsr: Remove WARN_ONCE() in hsr_addr_is_self().

syzbot reported the warning [0] in hsr_addr_is_self(),
whose assumption is simply wrong.

hsr->self_node is cleared in hsr_del_self_node(), which
is called from hsr_dellink().

Since dev->rtnl_link_ops->dellink() is called before
unregister_netdevice_many(), there is a window when
user can find the device but without hsr->self_node.

Let's remove WARN_ONCE() in hsr_addr_is_self().

[0]:
HSR: No self node
WARNING: net/hsr/hsr_framereg.c:39 at hsr_addr_is_self+0x211/0x3f0 net/hsr/hsr_framereg.c:39, CPU#0: syz.4.16848/17220
Modules linked in:
CPU: 0 UID: 0 PID: 17220 Comm: syz.4.16848 Tainted: G             L      syzkaller #0 PREEMPT_{RT,(full)}
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
RIP: 0010:hsr_addr_is_self+0x211/0x3f0 net/hsr/hsr_framereg.c:39
Code: 33 2f 41 0f b7 dd 89 ee 09 de 31 ff e8 c8 b4 c6 f6 09 dd 74 54 e8 0f b0 c6 f6 31 ed eb 53 e8 06 b0 c6 f6 48 8d 3d 2f 50 9c 04 <67> 48 0f b9 3a 31 ed eb 42 e8 c1 13 1f 00 89 c5 31 ff 89 c6 e8 96
RSP: 0018:ffffc900041c70e0 EFLAGS: 00010283
RAX: ffffffff8afdc6ca RBX: ffffffff8afdc4e6 RCX: 0000000000080000
RDX: ffffc90010493000 RSI: 0000000000000948 RDI: ffffffff8f9a1700
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: ffffc900041c71e8 R11: fffff52000838e3f R12: dffffc0000000000
R13: ffff888041f9e3c0 R14: ffff888086ee3802 R15: 0000000000000000
FS:  00007f6fe985d6c0(0000) GS:ffff888126176000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f80bd437dac CR3: 0000000025096000 CR4: 00000000003526f0
DR0: ffffffffffffffff DR1: 00000000000001f8 DR2: 0000000000000002
DR3: ffffffffefffff15 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
check_local_dest net/hsr/hsr_forward.c:592 [inline]
fill_frame_info net/hsr/hsr_forward.c:728 [inline]
hsr_forward_skb+0xa11/0x2a80 net/hsr/hsr_forward.c:739
hsr_dev_xmit+0x253/0x370 net/hsr/hsr_device.c:236
__netdev_start_xmit include/linux/netdevice.h:5368 [inline]
netdev_start_xmit include/linux/netdevice.h:5377 [inline]
xmit_one net/core/dev.c:3888 [inline]
dev_hard_start_xmit+0x2df/0x860 net/core/dev.c:3904
__dev_queue_xmit+0x1428/0x3900 net/core/dev.c:4870
neigh_output include/net/neighbour.h:556 [inline]
ip_finish_output2+0xcec/0x10b0 net/ipv4/ip_output.c:237
ip_send_skb net/ipv4/ip_output.c:1510 [inline]
ip_push_pending_frames+0x8b/0x110 net/ipv4/ip_output.c:1530
raw_sendmsg+0x1547/0x1a50 net/ipv4/raw.c:659
sock_sendmsg_nosec net/socket.c:787 [inline]
__sock_sendmsg net/socket.c:802 [inline]
____sys_sendmsg+0x7da/0x9c0 net/socket.c:2698
___sys_sendmsg+0x2a5/0x360 net/socket.c:2752
__sys_sendmsg net/socket.c:2784 [inline]
__do_sys_sendmsg net/socket.c:2789 [inline]
__se_sys_sendmsg net/socket.c:2787 [inline]
__x64_sys_sendmsg+0x1c3/0x2a0 net/socket.c:2787
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f6feb62ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f6fe985d028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f6feb8a6090 RCX: 00007f6feb62ce59
RDX: 0000000000000000 RSI: 0000200000000000 RDI: 0000000000000004
RBP: 00007f6feb6c2d6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f6feb8a6128 R14: 00007f6feb8a6090 R15: 00007ffcf01cc488
</TASK>

Fixes: f266a683a480 ("net/hsr: Better frame dispatch")
Reported-by: syzbot+652670cf249077eb498b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a1a861e.b111c304.35cd64.0016.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260530064300.340793-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bpf: Silence unused-but-set-variable warning in bpf_for_each_reg_in_vstate_mask

The macro requires callers to pass a stack variable, but not all
callbacks use it. Add (void)__stack to suppress the clang W=1 warning.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260602175204.624401-1-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge tag 'batadv-next-pullrequest-20260601' of https://git.open-mesh.org/batadv

Simon Wunderlich says:

====================
This batman-adv cleanup patchset includes the following patches, all by
Sven Eckelmann:

- drop batman-adv specific version

- MAINTAINERS housekeeping for batman-adv (two patches)

- add missing includes

- use atomic_xchg() for gw.reselect check

- extract netdev wifi detection information object

- replace inappropriate atomic access with (READ|WRITE)_ONCE
   (six patches)

- tt: replace open-coded overflow check with helper

- tvlv: avoid unnecessary OGM buffer reallocations

- use neigh_node's orig_node only as id

* tag 'batadv-next-pullrequest-20260601' of https://git.open-mesh.org/batadv:
  batman-adv: use neigh_node's orig_node only as id
  batman-adv: tvlv: avoid unnecessary OGM buffer reallocations
  batman-adv: tt: replace open-coded overflow check with helper
  batman-adv: replace non-atomic last_ttvn with (READ|WRITE)_ONCE
  batman-adv: replace non-atomic packet_size_max with (READ|WRITE)_ONCE
  batman-adv: replace non-atomic mesh state with (READ|WRITE)_ONCE
  batman-adv: replace non-atomic vlan config fields with (READ|WRITE)_ONCE
  batman-adv: replace non-atomic hardif config fields with (READ|WRITE)_ONCE
  batman-adv: replace non-atomic meshif config fields with (READ|WRITE)_ONCE
  batman-adv: extract netdev wifi detection information object
  batman-adv: use atomic_xchg() for gw.reselect check
  batman-adv: add missing includes
  MAINTAINERS: Don't send batman-adv patches to netdev
  MAINTAINERS: Rename batman-adv T(ree)
  batman-adv: drop batman-adv specific version
====================

Link: https://patch.msgid.link/20260601123629.707089-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nf-26-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for net:

1) Fix splat with PREEMPT_RCU because smp_processor_id() in nfqueue,
   from Fernando Fernandez Mancera.

2) Fix possible use of pointer to old IPVS scheduler after RCU grace
   period when editing service, from Julian Anastasov.

3) Fix possible forever RCU walk over rt->fib6_siblings in nft_fib6,
   if rt is unlinked mid-iteration, apparently same issue happens in
   the fib6 core. From Jiayuan Chen.

4) Add mutex to guard refcount in synproxy infrastructure, since
   concurrent hook {un}registration can happen.
   From Fernando Fernandez Mancera.

5) Bail out if IRC conntrack helper fails to parse a command, do not
   try parsing using other command handlers, from Florian Westphal.
   This fixes a possible out-of-bound read.

6) Possible use-after-free in nft_tunnel by releasing template dst
   after all references has been dropped, from Tristan Madani.

7) Ignore conntrack template in nft_ct, from Jiayuan Chen.

8) Missing skb_ensure_writable() in ebt_snat, Yiming Qian.

9) Remove multi-register byteorder support, this allows for kernel
   stack info leak, from Florian Westphal.

* tag 'nf-26-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_byteorder: remove multi-register support
  netfilter: bridge: make ebt_snat ARP rewrite writable
  netfilter: nft_ct: bail out on template ct in get eval
  netfilter: nft_tunnel: fix use-after-free on object destroy
  netfilter: conntrack_irc: fix possible out-of-bounds read
  netfilter: synproxy: add mutex to guard hook reference counting
  netfilter: nft_fib_ipv6: bail out of sibling walk if rt got unlinked
  ipvs: clear the svc scheduler ptr early on edit
  netfilter: xt_NFQUEUE: prefer raw_smp_processor_id
====================

Link: https://patch.msgid.link/20260601115923.433946-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Add preempt_{disable,enable}_nested() in reqsk_queue_hash_req().

syzbot reported a weird reqsk->rsk_refcnt underflow in
__inet_csk_reqsk_queue_drop().

The captured reqsk_put() in __inet_csk_reqsk_queue_drop()
is called only when it successfully removes reqsk from ehash.

Moreover, reqsk_timer_handler() calls another reqsk_put()
after that.

This indicates that the reqsk was missing both refcnts for
ehash and the timer itself.

Since all the syzbot reports had PREEMPT_RT enabled, the only
possible scenario is that reqsk_queue_hash_req() is preempted
after mod_timer() and before refcount_set(), and then the timer
triggered after 1s aborts the reqsk due to its listener's close().

Let's wrap mod_timer() and refcount_set() with
preempt_disable_nested() and preempt_enable_nested().

Note that inet_ehash_insert() holds the normal spin_lock()
(mutex in PREEMPT_RT), so it must be called outside of
preempt_disable_nested(), but this is fine.

The lookup path just ignores 0 sk_refcnt entries in ehash
and tries to create another reqsk, but this will fail at
inet_ehash_insert().

[0]:
refcount_t: underflow; use-after-free.
WARNING: lib/refcount.c:28 at refcount_warn_saturate+0xb2/0x110 lib/refcount.c:28, CPU#0: ktimers/0/16
Modules linked in:
CPU: 0 UID: 0 PID: 16 Comm: ktimers/0 Tainted: G             L      syzkaller #0 PREEMPT_{RT,(full)}
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
RIP: 0010:refcount_warn_saturate+0xb2/0x110 lib/refcount.c:28
Code: e4 7d d1 0a 67 48 0f b9 3a eb 4a e8 38 3d 23 fd 48 8d 3d e1 7d d1 0a 67 48 0f b9 3a eb 37 e8 25 3d 23 fd 48 8d 3d de 7d d1 0a <67> 48 0f b9 3a eb 24 e8 12 3d 23 fd 48 8d 3d db 7d d1 0a 67 48 0f
RSP: 0000:ffffc90000157948 EFLAGS: 00010246
RAX: ffffffff84a1301b RBX: 0000000000000003 RCX: ffff88801ca98000
RDX: 0000000000000100 RSI: 0000000000000000 RDI: ffffffff8f72ae00
RBP: ffffffff99ae3b01 R08: ffff88801ca98000 R09: 0000000000000005
R10: 0000000000000100 R11: 0000000000000004 R12: ffff8880425ef568
R13: ffff8880425ef4f8 R14: ffff8880425ef578 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff888126386000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7b46710e9c CR3: 000000000dbb6000 CR4: 00000000003526f0
Call Trace:
<TASK>
__refcount_sub_and_test include/linux/refcount.h:400 [inline]
__refcount_dec_and_test include/linux/refcount.h:432 [inline]
refcount_dec_and_test include/linux/refcount.h:450 [inline]
reqsk_put include/net/request_sock.h:136 [inline]
__inet_csk_reqsk_queue_drop+0x3ce/0x440 net/ipv4/inet_connection_sock.c:1007
reqsk_timer_handler+0x651/0xdf0 net/ipv4/inet_connection_sock.c:1137
call_timer_fn+0x192/0x5e0 kernel/time/timer.c:1748
expire_timers kernel/time/timer.c:1799 [inline]
__run_timers kernel/time/timer.c:2374 [inline]
__run_timer_base+0x6a3/0x9f0 kernel/time/timer.c:2386
run_timer_base kernel/time/timer.c:2395 [inline]
run_timer_softirq+0x67/0x170 kernel/time/timer.c:2403
handle_softirqs+0x1de/0x6d0 kernel/softirq.c:622
__do_softirq kernel/softirq.c:656 [inline]
run_ktimerd+0x69/0x100 kernel/softirq.c:1151
smpboot_thread_fn+0x541/0xa50 kernel/smpboot.c:160
kthread+0x388/0x470 kernel/kthread.c:436
ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>

Fixes: d2d6422f8bd1 ("x86: Allow to enable PREEMPT_RT.")
Reported-by: syzbot+e809069bc15f26300526@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6a1a7bcf.0a9e871e.332604.000b.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20260601182101.3183993-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: Annotate sk->sk_write_space() for UDP SOCKMAP.

UDP TX skb->destructor() is sock_wfree(), and UDP holds lock_sock()
only for UDP_CORK / MSG_MORE sendmsg().

Otherwise, sk->sk_write_space() may be read locklessly while SOCKMAP
rewrites sk->sk_write_space().

Let's use WRITE_ONCE() and READ_ONCE() for sk->sk_write_space().

Note that the write side is annotated by commit 2ef2b20cf4e0
("net: annotate data-races around sk->sk_{data_ready,write_space}").

Fixes: 7b98cd42b049 ("bpf: sockmap: Add UDP support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://patch.msgid.link/20260529193941.3897256-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Remove sock->state assignment.

Both struct socket and struct sock have a variable to
manage its state, sock->state and sk->sk_state.

When both are used, the former typically manages syscall
state and the latter manages the actual connection state.

AF_UNIX only uses sk->sk_state.

Let's remove unnecessary assignemnts for sock->state.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260529191829.3864438-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: add socat syslog for PPPoL2TP

As done in pppoe.sh, start socat as the syslog listener. In case the
test fails, dump its log to see what's going on.

Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
Link: https://patch.msgid.link/20260529021146.5739-1-qingfang.deng@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-phy-dp83822-add-optional-external-phy-clock'

Stefan Wahren says:

====================
net: phy: dp83822: Add optional external PHY clock

This small series implement support for external PHY clock for the
dp83822 driver.
====================

Link: https://patch.msgid.link/20260528184642.33424-1-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: dp83822: Add optional external PHY clock

In some cases, the PHY can use an external ref clock source instead of a
crystal.

Add an optional clock in the PHY node to make sure that the clock source
is enabled, if specified, before probing.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260528184642.33424-3-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: dp83822: Improve readability in dp8382x_probe

Introduce a local pointer for device so devm_kzalloc() fit into
a single line. Also this makes following changes easier to read.

Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260528184642.33424-2-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

pcnet32: stop holding device spin lock during napi_complete_done

napi_complete_done may call gro_flush_normal (though not currently, as GRO
is unsupported at the moment), which may result in packet TX. This will
eventually result in calling pcnet32_start_xmit - resulting in a deadlock
while trying to re-acquire the already locked spin lock.

It is safe to split the spinlock block into two, because the hardware
registers are still protected from concurrent access, and the two blocks
perform unrelated operations that don't need to happen atomically.

Fixes: 5b2ec6f2be51 ("pcnet32: use napi_complete_done()")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Oscar Maes <oscmaes92@gmail.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260528140320.5556-1-oscmaes92@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

cgroup: Migrate tasks to the root css when a controller is rebound

cgroup_apply_control_disable() defers kill_css_finish() while a css is
still populated, relying on css_update_populated() to fire the deferred
kill once the populated count reaches zero.

This deadlocks when a controller is rebound out of a hierarchy. Mounting
an implicit_on_dfl controller such as perf_event as a v1 hierarchy steals
it off the default hierarchy, and rebind_subsystems() kills its
per-cgroup csses while they are still populated. The migration run in the
same step keeps the old css for a controller no longer in the hierarchy's
mask, so no task is migrated off the dying csses. Their populated count
never reaches zero, the deferred kill_css_finish() never fires, and the
next cgroup_lock_and_drain_offline() hangs forever under cgroup_mutex.

That migration is already a no-op pass over the rebound subtree. Add
cgroup_rebind_ss_mask so find_existing_css_set() resolves the leaving
controllers to the root css. Their tasks are migrated there, the
per-cgroup csses depopulate, and cgroup_apply_control_disable() kills
them synchronously. The deferral stays correct for the rmdir and
controller-disable paths it was meant for.

Fixes: 1dffd95575eb ("cgroup: Defer kill_css_finish() in cgroup_apply_control_disable()")
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/all/41cd159c-54e5-45e0-81df-eaf36a6c028e@sirena.org.uk/
Reported-by: Bert Karwatzki <spasswolf@web.de>
Closes: https://lore.kernel.org/all/4e986b4ed7e16547805d54b6e67d09120bc4d2f2.camel@web.de/
Tested-by: Mark Brown <broonie@kernel.org>
Tested-by: Bert Karwatzki <spasswolf@web.de>
Signed-off-by: Tejun Heo <tj@kernel.org>

tcp: change bpf_skops_hdr_opt_len() signature

Some compilers do not inline bpf_skops_hdr_opt_len() from
tcp_established_options(), forcing an expensive stack canary
when CONFIG_STACKPROTECTOR_STRONG=y.

Change bpf_skops_hdr_opt_len() to return @remaining by value
to remove this stack canary from TCP fast path.

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 1/1 up/down: 10/-59 (-49)
Function                                     old     new   delta
bpf_skops_hdr_opt_len                        297     307     +10
tcp_established_options                      574     515     -59
Total: Before=31456795, After=31456746, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260601093819.469626-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: b53: hide legacy gpiolib usage on non-mips

The MIPS bcm53xx platform still uses the legacy gpiolib interfaces based
on gpio numbers, but other platforms do not.

Hide these interfaces inside of the existing #ifdef block and use the
modern interfaces in the common parts of the driver to allow building
it when the gpio_set_value() is left out of the kernel.

Reviewed-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260601165716.648230-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'soc-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull SoC fixes from Arnd Bergmann:
"Following the previous set of fixes, this addresses another
  significant number of small issues found in firmware drivers (tee,
  optee, qcomtee, qcom ice, exynos acpm) drivers through various tools.

  This is about error handling, resource leaks, concurrency and a
  use-after-free bug.

  The fixes for the Qualcomm ICE driver also introduce interface changes
  in the UFS and MMC drivers using it.

  Outside of firmware drivers, there are a few fixes across the tree:

   - Minor driver code mistakes in the Atmel EBI memory controller, the
     i.MX soc ID driver and socfpga boot logic

   - A defconfig change to avoid a boot time regression on multiple
     qualcomm boards

   - Device tree fixes for qualcomm, at91 and gemini, addressing mostly
     minor configuration mistakes"

* tag 'soc-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (28 commits)
  firmware: samsung: acpm: Fix infinite loop on sequence number exhaustion
  firmware: samsung: acpm: Fix missing LKMM barriers in sequence allocator
  firmware: samsung: acpm: Fix false timeouts and Use-After-Free in polling
  ARM: dts: gemini: Fix partition offsets
  ARM: socfpga: Fix OF node refcount leak in SMP setup
  soc: qcom: ice: Fix the error code when 'qcom,ice' property is not found
  arm64: dts: qcom: eliza: Add power-domain and iface clk for ice node
  arm64: dts: qcom: milos: Add power-domain and iface clk for ice node
  tee: qcomtee: add missing va_end in early return qcomtee_object_user_init()
  tee: fix params_from_user() error path in tee_ioctl_supp_recv
  tee: shm: fix shm leak in register_shm_helper()
  tee: fix tee_ioctl_object_invoke_arg padding
  arm64: defconfig: Enable PCI M.2 power sequencing driver
  scsi: ufs: ufs-qcom: Remove NULL check from devm_of_qcom_ice_get()
  mmc: sdhci-msm: Remove NULL check from devm_of_qcom_ice_get()
  soc: qcom: ice: Return proper error codes from devm_of_qcom_ice_get() instead of NULL
  soc: qcom: ice: Return -ENODEV if the ICE platform device is not found
  soc: qcom: ice: Fix race between qcom_ice_probe() and of_qcom_ice_get()
  ARM: dts: microchip: sam9x7: fix GMAC clock configuration
  firmware: samsung: acpm: Fix mailbox channel leak on probe error
  ...

ALSA: seq: oss: Reject reads that cannot fit the next event

snd_seq_oss_read() checks whether the next queued OSS sequencer event
fits in the remaining userspace buffer before removing it from the read
queue.

The check is inverted. It currently stops when the event is smaller than
the remaining buffer, so a normal 4-byte event is not copied for an
8-byte read buffer. Conversely, an 8-byte event can be copied for a
smaller read count.

Break only when the remaining userspace buffer is smaller than the next
event, and report -EINVAL if no complete event has been copied. This
prevents an undersized read from looking like end-of-file while leaving
the event queued for a later read with a large enough buffer.

Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260602-alsa-seq-oss-read-size-check-v1-1-10e59b1742e0@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

RDMA/efa: Validate SQ ring size against max LLQ size

Validate the SQ ring size against the device's max LLQ size. This
ensures that when using 128-byte WQEs, userspace cannot exceed the queue
limits.

On create QP, userspace provides the SQ ring size (depth x WQE size)
which is validated against the max LLQ size.

Fixes: 40909f664d27 ("RDMA/efa: Add EFA verbs implementation")
Link: https://patch.msgid.link/r/20260526081536.1203553-1-ynachum@amazon.com
Reviewed-by: Michael Margolin <mrgolin@amazon.com>
Signed-off-by: Yonatan Nachum <ynachum@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

ALSA: seq: Restore created port information after insertion

Commit 2ee646353cd5 ("ALSA: seq: Register kernel port with full
information") split sequencer port creation from list insertion so a
port can be filled before it becomes visible.

However, snd_seq_ioctl_create_port() still copies port->addr back to the
ioctl argument before snd_seq_insert_port() assigns the final port
number. A successful SNDRV_SEQ_IOCTL_CREATE_PORT without
SNDRV_SEQ_PORT_FLG_GIVEN_PORT can therefore report port -1 to userspace.

Move the ioctl address copy after successful insertion, and keep the
default "port-%d" name assignment from overwriting a caller-provided port
name. This restores the observable behavior from before the split while
keeping the port populated before publication.

Fixes: 2ee646353cd5 ("ALSA: seq: Register kernel port with full information")
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260602-alsa-seq-create-port-info-fix-v1-1-eec0280131e9@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge branches 'rcutorture.2026.05.24' and 'misc.2026.05.24' into rcu-merge.2026.05.24

rcutorture.2026.05.24: Torture-test updates
misc.2026.05.24: Miscellaneous RCU updates

rcu/nocb: reduce stack usage in nocb_gp_wait()

When CONFIG_UBSAN_ALIGNMENT is enabled, the stack usage of nocb_gp_wait()
grows above typical warning limits:

In file included from kernel/rcu/tree.c:4930:
kernel/rcu/tree_nocb.h: In function 'rcu_nocb_gp_kthread':
kernel/rcu/tree_nocb.h:866:1: error: the frame size of 1968 bytes is larger than 1280 bytes [-Werror=frame-larger-than=]

Apparently, the problem is passing rcu_data from a 'void *' pointer,
which gcc assumes may be misaligned. When the function is not inlined
into rcu_nocb_gp_kthread(), that is no longer visible to gcc.

Add a 'noinline_for_stack' annotation that leads to skipping a lot of
the alignment sanitizer checks and keeps the stack usage 60% lower here.

Reviewed-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

ARM: dts: imx7: add nvmem-layout

TQMa7 has board-information located in EEPROM at offset 0x20.
Add necessary nodes and properties for nvmem cell.

Signed-off-by: Alexander Feilke <Alexander.Feilke@ew.tq-group.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

cpufreq/amd-pstate: Fix setting EPP in performance mode

EPP 0 is the only supported value in the performance policy.
commit 798c47593cca ("cpufreq/amd-pstate: Add support for platform profile
class") changed this while adding platform profile support to the
dynamic EPP feature, but this actually wasn't necessary since platform
profile writes disable manual EPP writes.

Restore allowing writing EPP of 0 when in performance mode.

Reviewed-by: Marco Scardovi <scardracs@disroot.org>
Tested-by: Marco Scardovi <scardracs@disroot.org>
Reported-by: Stuart Meckle <stuartmeckle@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221473
Closes: https://gitlab.freedesktop.org/upower/power-profiles-daemon/-/work_items/190
Fixes: 798c47593cca ("cpufreq/amd-pstate: Add support for platform profile class")
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

KVM: s390: Remove ptep_zap_softleaf_entry()

Migration entries do not need to be removed.

The swap subsystem has been (and still is being) heavily reworked. The
current implementation of ptep_zap_softleaf_entry() has been slowly
modified and is now wrong, since it unconditionally calls
swap_put_entries_direct() for both swap and migration entries.

Remove ptep_zap_softleaf_entry() altogether, merge the path for proper
swap entries directly in the only caller, and ignore migration entries.

Fixes: 200197908dc4 ("KVM: s390: Refactor and split some gmap helpers")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-11-imbrenda@linux.ibm.com>

KVM: s390: Fix possible reference leak in fault-in code

If kvm_s390_new_mmu_cache() fails, kvm_s390_faultin_gfn() returns
without releasing the faulted page.

Fix this by moving the allocation of the memory cache outside of the
loop. There is no reason to check at every iteration.

Opportunistically fix a comment.

Fixes: e907ae530133 ("KVM: s390: Add helper functions for fault handling")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-10-imbrenda@linux.ibm.com>

KVM: s390: Prevent memslots outside the ASCE range

With KVM_S390_VM_MEM_LIMIT_SIZE, userspace can set the highest address
allowed for the VM. Creating a memslot that lies over the maximum
address does not make sense and is only a potential source of bugs.

Prevent creation of memslots over the maximum address, and prevent the
maximum address from being reduced below the end of existing memslots.

Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-9-imbrenda@linux.ibm.com>

io_uring/bpf-ops: restrict ctx access to BPF

BPF programs should have no need in looking into struct io_ring_ctx, if
anything, most of such cases would be anti patterns like looking up ring
indices directly via the context.

Replace it with a new empty structure, which is just an alias to struct
io_ring_ctx. It'll create a new BTF type and fail verification if a BPF
program tries to access it (beyond the first byte). It'll also give more
flexibility for the future, and otherwise it can be made aligned with
io_ring_ctx as before with struct groups if ever needed or extended in a
different way.

Fixes: d0e437b76bd3c ("io_uring/bpf-ops: implement loop_step with BPF struct_ops")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/5f6ca3649e9e0bae8667db4357e28dd00cd07901.1780394491.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

drm/gem/shmem: Introduce __drm_gem_shmem_free_sgt_locked()

One of the complications of trying to use the shmem helpers to create a
scatterlist for shmem objects is that we need to be able to provide a
guarantee that the driver cannot be unbound for the lifetime of the
scatterlist.

The easiest way of handling this seems to be just hooking up an unmap
operation to devres the first time we create a scatterlist, which allows us
to still take advantage of gem shmem facilities without breaking that
guarantee. To allow for this, we extract __drm_gem_shmem_free_sgt_locked()
- which allows a caller (e.g. the rust bindings) to manually unmap the sgt
for a gem object as needed.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://patch.msgid.link/20260529183702.677677-6-lyude@redhat.com

rust: drm: gem: s/device::Device/Device/ for shmem.rs

We're about to start explicitly mentioning kernel devices as well in this
file, so this makes it easier to differentiate the two by allowing us to
import `device` as `kernel::device`.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20260428190605.3355690-2-lyude@redhat.com

block/partitions/acorn: use min in {riscix,linux}_partition

Use min() to replace the open-coded implementations and to simplify
riscix_partition() and linux_partition().

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20260602160757.973736-3-thorsten.blum@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'more-gen_loader-fixes-2'

Daniel Borkmann says:

====================
More gen_loader fixes #2

Another small follow-up from the sashiko findings about signed loaders.
In particular, closing the gap to reject exclusive maps in iterators.
====================

Link: https://patch.msgid.link/20260602133052.423725-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Test that exclusive maps are rejected as iter targets

Add a subtest to map_excl that creates an exclusive map and verifies a
bpf_map_elem iterator cannot be attached to it, which would otherwise
let an unrelated program read and overwrite the map's contents through
the iterator's writable value buffer.

  # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t map_excl
  [...]
  ./test_progs -t map_excl
  [    1.704382] bpf_testmod: loading out-of-tree module taints kernel.
  [    1.706068] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  #215/1   map_excl/map_excl_allowed:OK
  #215/2   map_excl/map_excl_denied:OK
  #215/3   map_excl/map_excl_no_map_in_map:OK
  #215/4   map_excl/map_excl_no_map_iter:OK
  #215     map_excl:OK
  Summary: 1/4 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260602133052.423725-5-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Keep verifier_map_ptr exercising ops pointer access

sashiko complained that 38498c0ebacd ("selftests/bpf: Adjust verifier_map_ptr
for the map's excl field") would slightly decrease the test coverage given
before the test was against the verifier rejecting the ops pointer. Recover
the old test with the right offsets and add the existing one as an additional
test case.

  # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_map_ptr
  [    1.672932] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  #637/1   verifier_map_ptr/bpf_map_ptr: read with negative offset rejected:OK
  #637/2   verifier_map_ptr/bpf_map_ptr: read with negative offset rejected @unpriv:OK
  #637/3   verifier_map_ptr/bpf_map_ptr: write rejected:OK
  #637/4   verifier_map_ptr/bpf_map_ptr: write rejected @unpriv:OK
  #637/5   verifier_map_ptr/bpf_map_ptr: read non-existent field rejected:OK
  #637/6   verifier_map_ptr/bpf_map_ptr: read non-existent field rejected @unpriv:OK
  #637/7   verifier_map_ptr/bpf_map_ptr: read beyond excl field rejected:OK
  #637/8   verifier_map_ptr/bpf_map_ptr: read beyond excl field rejected @unpriv:OK
  #637/9   verifier_map_ptr/bpf_map_ptr: read ops field accepted:OK
  #637/10  verifier_map_ptr/bpf_map_ptr: read ops field accepted @unpriv:OK
  #637/11  verifier_map_ptr/bpf_map_ptr: r = 0, map_ptr = map_ptr + r:OK
  #637/12  verifier_map_ptr/bpf_map_ptr: r = 0, map_ptr = map_ptr + r @unpriv:OK
  #637/13  verifier_map_ptr/bpf_map_ptr: r = 0, r = r + map_ptr:OK
  #637/14  verifier_map_ptr/bpf_map_ptr: r = 0, r = r + map_ptr @unpriv:OK
  #637     verifier_map_ptr:OK
  [...]
  Summary: 2/20 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260602133052.423725-4-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: Guard add_data() against size overflow

add_data() computes size8 = roundup(size, 8) and then hands size8 to
realloc_data_buf() before doing memcpy(gen->data_cur, data, size) with
the original size. A wrapped size8 passes through the realloc_data_buf()
INT32_MAX check. Harden this against overflow, though not realistic to
happen in practice.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260602133052.423725-3-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Reject exclusive maps for bpf_map_elem iterators

Exclusive maps (aka excl_prog_hash) are meant to be reachable only
from the single program whose hash matches. This is enforced by
check_map_prog_compatibility() when the map is referenced from a
program such as signed BPF loaders.

A bpf_map_elem iterator, however, binds its target map at attach
time in bpf_iter_attach_map() instead of referencing it from the
program, so the exclusivity check is never reached. On top of that,
the iterator exposes the map value as a writable buffer.

Fixes: baefdbdf6812 ("bpf: Implement exclusive map creation")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260602133052.423725-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge tag 'mm-hotfixes-stable-2026-06-01-20-58' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM fixes from Andrew Morton:
"13 hotfixes. All are for MM. 10 are cc:stable and the remaining 3
  address post-7.1 issues or aren't considered suitable for backporting.

  There's a three-patch series "userfaultfd: verify VMA state across
  UFFDIO_COPY retry" from Mike Rapoport which fixes a few uffd things.
  The rest are singletons - please see the individual changelogs for
  details"

* tag 'mm-hotfixes-stable-2026-06-01-20-58' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  userfaultfd: remove redundant check in vm_uffd_ops()
  userfaultfd: refuse to __mfill_atomic_pte() for unsupported VMAs
  userfaultfd: verify VMA state across UFFDIO_COPY retry
  mm/huge_memory: update file PMD counter before folio_put()
  mm/huge_memory: update file PUD counter before folio_put()
  mm/hugetlb_vmemmap: fix incorrect vmemmap restore in rollback
  mm/damon/ops-common: call folio_test_lru() after folio_get()
  mm/cma: fix reserved page leak on activation failure
  mm/memory-failure: fix hugetlb_lock AA deadlock in get_huge_page_for_hwpoison
  mm/hugetlb: restore reservation on error in hugetlb folio copy paths
  mm/cma_debug: fix invalid accesses for inactive CMA areas
  memcg: use round-robin victim selection in refill_stock
  mm/hugetlb: avoid false positive lockdep assertion

dt-bindings: arm-smmu: Correct and add constraints for Hawi, Shikra and Kaanapali

Previous commit 75949eb02653 ("dt-bindings: arm-smmu: Constrain clocks
for newer Qualcomm variants") duplicated constraints for
qcom,sm6350-smmu-500 and qcom,sm6375-smmu-500 - these are already part
of previous "if:" block.

It also missed enforcing one clock for qcom,kaanapali-smmu-500 in GPU
case and missed simultaneously added Shikra and Hawi.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Will Deacon <will@kernel.org>

dt-bindings: arm-smmu: Add compatible for Qualcomm Nord SoC

Document Applications Processor Subsystem (APSS) SMMU on Qualcomm
Nord SoC.

Signed-off-by: Shawn Guo <shengchao.guo@oss.qualcomm.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Will Deacon <will@kernel.org>

accel/amdxdna: Preserve user address when PASID is disabled

When PASID is not used, the buffer user address is set to
AMDXDNA_INVALID_ADDR. As a result, heap buffer user address validation
fails even though the original userspace address is available.

Preserve the userspace address regardless of PASID usage so heap buffer
address validation works correctly.

Fixes: dbc8fd7a03cb ("accel/amdxdna: Add expandable device heap support")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260602040624.2206774-1-lizhi.hou@amd.com

KVM: x86: Take PIC lock on KVM_GET_IRQCHIP path

When userspace issues the KVM_SET_IRQCHIP ioctl to set the state of
the PIC, kvm_vm_ioctl_set_irqchip() grabs @kvm->arch.vpic->lock before
updating the state. However, the KVM_GET_IRQCHIP ioctl to retrieve the
same PIC state does not grab such lock, potentially causing torn reads
for userspace.

Fix this by grabbing the lock on the read path.

This issue goes all the way back. The bug was introduced with the
addition of PIC ioctl code itself in 6ceb9d791eee ("KVM: Add get/
set irqchip ioctls for in-kernel PIC live migration support"). Later,
894a9c5543ab ("KVM: x86: missing locking in PIT/IRQCHIP/SET_BSP_CPU
ioctl paths") added the locking for kvm_vm_ioctl_set_irqchip(), but
missed kvm_vm_ioctl_get_irqchip().

Fixes: 6ceb9d791eee ("KVM: Add get/set irqchip ioctls for in-kernel PIC live migration support")
Fixes: 894a9c5543ab ("KVM: x86: missing locking in PIT/IRQCHIP/SET_BSP_CPU ioctl paths")
Reported-by: Claude Code:claude-opus-4.6
Signed-off-by: Carlos López <clopez@suse.de>
Link: https://patch.msgid.link/20260529140013.14925-2-clopez@suse.de
Signed-off-by: Sean Christopherson <seanjc@google.com>

arm64: mm: Unmap kernel data/bss entirely from the linear map

The linear aliases of the kernel text and rodata are also mapped
read-only in the linear map. Given that the contents of these regions
are mostly identical to the version in the loadable image, mapping them
read-only and leaving their contents visible is a reasonable hardening
measure.

Data and bss, however, are now also mapped read-only but the contents of
these regions are more likely to contain data that we'd rather not leak.
So let's unmap these entirely in the linear map when the kernel is
running normally.

When going into hibernation or waking up from it, these regions need to
be mapped, so map the region initially, and toggle the valid bit so
map/unmap the region as needed.

Doing so is required because pages covering the kernel image are marked
as PageReserved, and therefore disregarded for snapshotting by the
hibernate logic unless they are mapped.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>

arm64: mm: Map the kernel data/bss read-only in the linear map

On systems where the bootloader adheres to the original arm64 boot
protocol, the placement of the kernel in the physical address space is
highly predictable, and this makes the placement of its linear alias in
the kernel virtual address space equally predictable, given the lack of
randomization of the linear map.

The linear aliases of the kernel text and rodata regions are already
mapped read-only, but the kernel data and bss are mapped read-write in
this region. This is not needed, so map them read-only as well.

Note that the statically allocated kernel page tables do need to be
modifiable via the linear map, so leave these mapped read-write.

Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>

mm: Make empty_zero_page[] const

The empty zero page is used to back any kernel or user space mapping
that is supposed to remain cleared, and so the page itself is never
supposed to be modified.

So mark it as const, which moves it into .rodata rather than .bss: on
most architectures, this ensures that both the kernel's mapping of it
and any aliases that are accessible via the kernel direct (linear) map
are mapped read-only, and cannot be used (inadvertently or maliciously)
to corrupt the contents of the zero page.

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Feng Tang <feng.tang@linux.alibaba.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>

sh: Drop cache flush of the zero page at boot

SuperH performs cache maintenance on the zero page during boot,
presumably because before commit

6215d9f4470f ("arch, mm: consolidate empty_zero_page")

the zero page did double duty as a boot params region, and was cleared
separately, as it was not part of BSS. The memset() in question was
dropped by that commit, but the __flush_wback_region() call remained.

As empty_zero_page[] has been moved to BSS, it can be treated as any
other BSS memory, and so the cache flush can be dropped.

Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>

powerpc/code-patching: Avoid r/w mapping of the zero page

The only remaining use of map_patch_area() is mapping the zero page, and
immediately unmapping it again so that the intermediate page table
levels are all guaranteed to be populated.

The use of the zero page here is completely arbitrary, and not harmful
per se, but currently, it creates a writable mapping, and does so in a
manner that requires that the empty_zero_page[] symbol is not
const-qualified.

Given that this is about to change, and that map_patch_area() now never
maps anything other than the zero page, let's simplify the code and
- remove the helpers and call [un]map_kernel_page() directly
- take the PA of empty_zero_page directly
- create a read-only temporary mapping.

This allows empty_zero_page[] to be repainted as const u8[] in a
subsequent patch, without making substantial changes to this code
patching logic.

Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Link: https://lore.kernel.org/all/20260520085423.485402-1-ardb@kernel.org/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>

arm64: mm: Don't abuse memblock NOMAP to check for overlaps

Now that the linear region mapping routines respect existing table
mappings and contiguous block and page mappings, it is no longer needed
to fiddle with the memblock tables to set and clear the NOMAP attribute
in order to omit text and rodata when creating the linear map.

Instead, map the kernel text and rodata alias first with the desired
initial attributes and granularity, so that the loop iterating over the
memblocks will not remap it in a manner that prevents it from being
remapped with updated attributes later.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>

arm64: Move fixmap and kasan page tables to end of kernel image

Move the fixmap and kasan page tables out of the BSS section, and place
them at the end of the image, right before the init_pg_dir section where
some of the other statically allocated page tables live.

These page tables are currently the only data objects in vmlinux that
are meant to be accessed via the kernel image's linear alias, and so
placing them together allows the remainder of the data/bss section to be
remapped read-only or unmapped entirely.

Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>

arm64: mm: Permit contiguous attribute for preliminary mappings

There are a few cases where we omit the contiguous hint for mappings
that start out as read-write and are remapped read-only later, on the
basis that manipulating live descriptors with the PTE_CONT attribute set
is unsafe. When support for the contiguous hint was added to the code,
the ARM ARM was ambiguous about this, and so we erred on the side of
caution.

In the meantime, this has been clarified [0], and regions that will be
remapped in their entirety, retaining the contiguous bit on all entries,
can use the contiguous hint both in the initial mapping as well as the
one that replaces it. Note that this requires that the logic that may be
called to remap overlapping regions respects existing valid descriptors
that have the contiguous bit cleared.

So omit the NO_CONT_MAPPINGS flag in places where it is unneeded.

[0] RJQQTC

For a TLB lookup in a contiguous region mapped by translation table entries that
have consistent values for the Contiguous bit, but have the OA, attributes, or
permissions misprogrammed, that TLB lookup is permitted to produce an OA, access
permissions, and memory attributes that are consistent with any one of the
programmed translation table values.

Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>

arm64: kfence: Avoid NOMAP tricks when mapping the early pool

Now that the map_mem() routines respect existing page mappings and
contiguous granule sized blocks with the contiguous bit cleared, there
is no longer a reason to play tricks with the memblock NOMAP attribute.

Instead, the kfence pool can be allocated and mapped with page
granularity first, and this granularity will be respected when the rest
of DRAM is mapped later, even if block and contiguous mappings are
allowed for the remainder of those mappings.

Add the NO_EXEC_MAPPINGS flag to ensure that hierarchical XN attributes
are set on the intermediate page tables that are allocated when mapping
the pool.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>

arm64: mm: Permit contiguous descriptors to be manipulated

Currently, pgattr_change_is_safe() is overly pedantic when it comes to
descriptors with the contiguous hint attribute set, as it rejects
assignments even if the old and the new value are the same.

In fact, as per ARM ARM RJQQTC, manipulating descriptors with the
contiguous bit set is safe as long as the bit itself does not change
value, in the sense that no TLB conflict aborts or other exceptions may
be raised as a result. Inconsistent permission attributes within the
contiguous region may result in any of the alternatives to be taken to
apply to the entire region, which might be a programming error, but it
does not constitute an unsafe manipulation in terms of what
pgattr_change_is_safe() is intended to detect.

So drop the special PTE_CONT check, but still omit PTE_CONT from 'mask'
so that modifying the bit is still regarded as unsafe.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>

arm64: mm: Preserve non-contiguous descriptors when mapping DRAM

Instead of blindly overwriting existing live entries regardless of the
value of their contiguous bit when mapping DRAM regions at
contiguous-hint granularity, check whether the contiguous region in
question contains any valid descriptors that have the contiguous bit
cleared, and in that case, leave the contiguous bit unset on the entire
region. This permits the logic of mapping the kernel's linear alias to
be simplified in a subsequent patch.

Note that this can only result in a misprogrammed contiguous bit (as per
ARM ARM RNGLXZ) if the region in question already contains a mix of
valid contiguous and valid non-contiguous descriptors, in which case it
was already misprogrammed to begin with.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>

arm64: mm: Preserve existing table mappings when mapping DRAM

Instead of blindly overwriting an existing table entry when mapping DRAM
regions, take care not to replace a pre-existing table entry with a
block entry. This permits the logic of mapping the kernel's linear alias
to be simplified in a subsequent patch.

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>

arm64: mm: Check for pud_/pmd_set_huge() failures on kernel mappings

Sashiko reports:

| If pmd_set_huge() rejects an unsafe page table transition (such as
| mapping a different physical address over an existing block mapping),
| it returns 0 and leaves the page table entry unmodified.
|
| Because *pmdp remains unmodified, READ_ONCE(pmd_val(*pmdp)) will equal
| pmd_val(old_pmd). The transition from old_pmd to old_pmd is evaluated
| as safe by pgattr_change_is_safe(), so the BUG_ON never triggers.
|
| This allows invalid and unsafe mapping updates to be silently dropped
| instead of panicking, leaving stale memory mappings active while the
| caller assumes the update was successful.

The same applies to pud_set_huge() in alloc_init_pud().

Given how it is generally preferred to limp on rather than blow up the
system if an unexpected condition such as this one occurs, and the fact
that there are no known cases where this disparity results in real
problems, let's WARN on these failures rather than BUG, allowing the
system to survive to the point where it can actually report them.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>

arm64: mm: Drop redundant pgd_t* argument from map_mem()

__map_memblock() and map_mem() always operate on swapper_pg_dir, so
there is no need to pass around a pgd_t pointer between them.

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>

arm64: mm: Remove bogus stop condition from map_mem() loop

The memblock API guarantees that start is not greater than or equal to
end, so there is no need to test it. And if it were, it is doubtful that
breaking out of the loop would be a reasonable course of action here
(rather than attempting to map the remaining regions)

So let's drop this check.

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>

ASoC: loongson: Refactor DMA and regmap handling

Binbin Zhou <zhoubinbin@loongson.cn> says:

This series refactors the Loongson I2S ASoC drivers, reducing code
duplication and improving DMA differentiation. It also adds an entry
in MAINTAINERS and applies a few fixes to the es8323 codec driver.

These changes have been tested on Loongson-2K0300 (platform, eDMA) and
Loongson-2K2000 (PCI, iDMA) boards.

Link: https://patch.msgid.link/cover.1780304703.git.zhoubinbin@loongson.cn

ASoC: loongson: Separate external shared DMA from the platform interface

The Loongson I2S platform driver (used on LS2K1000, LS7A etc.) relies on
an external DMA engine (e.g., dw_dmac) rather than the internal DMA.
However, its DMA-related code was originally embedded in
loongson_i2s_plat.c, duplicating logic that should be shared.

Extract the external DMA (eDMA) support from the platform driver and move
it into loongson_dma.c alongside the existing internal DMA (iDMA) code.

This change eliminates code duplication and prepares for future
consolidation of DMA selection logic.

Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn>
Link: https://patch.msgid.link/979368ad269f192703ed24e9a19eebce32316745.1780304703.git.zhoubinbin@loongson.cn
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: loongson: Use the `idma` identifier for internal DMA variables

The Loongson I2S controller can work with two types of DMA:
- Internal DMA (iDMA): integrated DMA engine, driven by dedicated
registers and interrupts.
- External DMA (eDMA): generic DMA engine (e.g., dw_dmac), using the
standard dmaengine API.

To distinguish these two distinct implementations, rename all
internal-DMA-related structures, functions, and the component driver
to use the "idma" prefix.

No functional change intended.

Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn>
Link: https://patch.msgid.link/58e91c54f2bf658ac9b773741ca2aebc3866e550.1780304703.git.zhoubinbin@loongson.cn
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: loongson: Combined regmap definitions

Previously, the regmap configuration for Loongson I2S controller was
duplicated in both PCI and platform glue drivers. Move the common
regmap configuration into the shared loongson_i2s.c to avoid code
duplication and centralize register access handling.

While moving, adjust the following:
- Mark RX_DATA/TX_DATA/I2S_CTRL as volatile registers. The PCI version
  incorrectly marked CFG/CFG1 as volatile, which prevented proper
  regcache synchronization.
- Change cache type from REGCACHE_FLAT to REGCACHE_MAPLE. The register
  map is sparse and the number of registers is small; MAPLE tree provides
  better scalability and is the recommended cache type for modern
  regmap users.

Also, the following warning for the i2s_plat driver will be eliminated:

loongson-i2s-plat loongson-i2s: using zero-initialized flat cache, this may cause unexpected behavior.

Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn>
Link: https://patch.msgid.link/e32d24479fc382dc3de6aded6351c13b43b6391d.1780304703.git.zhoubinbin@loongson.cn
Signed-off-by: Mark Brown <broonie@kernel.org>

MAINTAINERS: Add entry for Loongson ASoC driver

Add MAINTAINERS entry for Loongson I2S ASoC drivers to track
changes in sound/soc/loongson/ directory.

Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn>
Link: https://patch.msgid.link/9451dfcd6ff3048eac0656d3720908386128b7fc.1780304703.git.zhoubinbin@loongson.cn
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: es9356: Use new SoundWire enumeration helper

Update the driver to use the new core helper that waits for the device
to enumerate on SoundWire and be initialised by the SoundWire core.

Link: https://lore.kernel.org/linux-sound/20260512103022.1154645-1-ckeepax@opensource.cirrus.com/
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20260602102749.3962261-1-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

of: reserved_mem: only support one <base size> entry in reg property

A /reserved-memory child node may have multiple <base size> tuples in
'reg' property, but multiple entries in 'reg' have never been fully
functional:
- fdt_scan_reserved_mem() in the early pass loops over every
   tuple and reserves them all.

- fdt_scan_reserved_mem_late() reads 'reg' by
   of_flat_dt_get_addr_size(), which returns false if entries != 1.
   So 'reg' property with multiple <base size> entries will be
   skipped, no reserved_mem entry is created in reserved_mem[].

Supporting multiple <base size> tuples is not a good idea:
  - It requires reserved_mem_ops->node_init support. Currently,
    CMA(rmem_cma_setup) and DMA(rmem_dma_setup) are not supported.

  - of_reserved_mem_lookup() is name-based, only the first entry in
    multiple <base size> tuples will be found.

So change to support one <base size> entry in 'reg' property.

Also update dt binding:
  https://github.com/devicetree-org/dt-schema/pull/197

Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Wandun Chen <chenwandun@lixiang.com>
Tested-by: Meijing Zhao <zhaomeijing@lixiang.com>
Link: https://lore.kernel.org/all/20260506014752.GA280279-robh@kernel.org/
Link: https://patch.msgid.link/20260525121700.2706141-1-chenwandun1@gmail.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>

ASoC: mediatek: mt8192 probe cleanup

Cássio Gabriel <cassiogabrielcontato@gmail.com> says:

Fix two MT8192 AFE probe cleanup issues that mirror the recently fixed
MT8189 and MT8196 paths.

The first patch registers a devm cleanup action for a successful
reserved-memory assignment so later probe failures and driver unbind
release it.

The second patch checks the temporary runtime resume used while
reinitializing the regmap cache and makes the regcache failure path drop
the PM reference and clear pm_runtime_bypass_reg_ctl.

Link: https://patch.msgid.link/20260527-asoc-mt8192-probe-cleanup-v1-0-1bb834d05b72@gmail.com

ASoC: mediatek: mt8192: Check runtime resume during probe

The MT8192 AFE probe enables runtime PM temporarily while reinitializing
the regmap cache from hardware, but it uses pm_runtime_get_sync()
without checking the return value. If runtime resume fails, probe keeps
going without the device necessarily being accessible, and
pm_runtime_get_sync() may leave the PM usage count incremented.

The regmap_reinit_cache() failure path also returns before dropping the
temporary PM reference and before clearing pm_runtime_bypass_reg_ctl.

Use pm_runtime_resume_and_get() so resume failures do not leak a usage
count, and clear the temporary bypass flag after dropping the probe PM
reference on all regmap_reinit_cache() outcomes.

Fixes: 125ab5d588b0 ("ASoC: mediatek: mt8192: add platform driver")
Cc: stable@vger.kernel.org
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260527-asoc-mt8192-probe-cleanup-v1-2-1bb834d05b72@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt8192: Release reserved memory on cleanup

The MT8192 AFE probe calls of_reserved_mem_device_init() and falls
back to preallocated buffers when no reserved memory region is
available. When the reserved memory assignment succeeds, however, the
driver never releases it.

Register a devm cleanup action after a successful reserved-memory
assignment so the assignment is released on probe failure and driver
unbind.

Fixes: ec4a10ca4a68 ("ASoC: mediatek: use reserved memory or enable buffer pre-allocation")
Cc: stable@vger.kernel.org
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260527-asoc-mt8192-probe-cleanup-v1-1-1bb834d05b72@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt8183: Fix probe resource cleanup

Cássio Gabriel <cassiogabrielcontato@gmail.com> says:

The MT8183 AFE probe has two cleanup gaps that match issues
recently fixed in newer MediaTek AFE drivers.

First, reserved memory assigned with of_reserved_mem_device_init()
is never released on driver removal or later probe failures.

Second, the probe-time runtime PM resume used before reinitializing
the regmap cache is unchecked, and a regmap_reinit_cache() failure
skips the temporary PM put.

Fix both issues with a devm reserved-memory release action and
checked runtime PM resume handling.

Link: https://patch.msgid.link/20260527-asoc-mt8183-probe-cleanup-v1-0-4f4f5593c8d1@gmail.com

ASoC: mediatek: mt8183: Check runtime resume during probe

The MT8183 AFE probe uses pm_runtime_get_sync() before reading hardware
defaults into the regmap cache, but does not check whether runtime resume
failed. If regmap_reinit_cache() then fails, the temporary runtime PM
usage count is also not released.

Use pm_runtime_resume_and_get() so resume failures abort probe without
leaking a usage count, and release the temporary reference before
handling the regmap cache result.

Fixes: a94aec035a12 ("ASoC: mediatek: mt8183: add platform driver")
Cc: stable@vger.kernel.org
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260527-asoc-mt8183-probe-cleanup-v1-2-4f4f5593c8d1@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt8183: Release reserved memory on cleanup

The MT8183 AFE probe can assign reserved memory with
of_reserved_mem_device_init(), but the assignment is never released on
driver removal or later probe failures.

Register a devm cleanup action so the reserved memory assignment is
released consistently, matching newer Mediatek AFE drivers.

Fixes: ec4a10ca4a68 ("ASoC: mediatek: use reserved memory or enable buffer pre-allocation")
Cc: stable@vger.kernel.org
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260527-asoc-mt8183-probe-cleanup-v1-1-4f4f5593c8d1@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

regulator: Use named initializers for platform_device_id arrays

Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com> says:

this series targets to use named initializers for platform_device_id
arrays. In general these are better readable for humans and more robust
to changes in the respective struct definition.

This robustness is needed as I want to do

Link: https://patch.msgid.link/cover.1779878004.git.u.kleine-koenig@baylibre.com

regulator: Unify usage of space and comma in platform_device_id arrays

After converting all these arrays to use named initializers and fixing
coding style en passant, adapt the coding style also for those drivers that
already used named initializers before for consistency.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Link: https://patch.msgid.link/a3a2736ebfcfa5a228dcebfbfefc14960dcce314.1779878004.git.u.kleine-koenig@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>

regulator: Use named initializers for platform_device_id arrays

Named initializers are better readable and more robust to changes of the
struct definition. This robustness is relevant for a planned change to
struct platform_device_id replacing .driver_data by an anonymous unit.

While touching these arrays unify spacing and usage of commas.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Acked-by: Karel Balej <balejk@matfyz.cz>
Reviewed-by: Matti Vaittinen <mazziesaccount@gmail.com>
Link: https://patch.msgid.link/d02f55dfd5bdd743ae5cd76f2a5af0d346226a68.1779878004.git.u.kleine-koenig@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>

regulator: Drop unused assignment of platform_device_id driver data

Several drivers explicitly set the .driver_data member of struct
platform_device_id to zero without relying on that value. Drop these
unused assignments.

While touching these arrays unify spacing, usage of commas and use
named initializers for .name.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Link: https://patch.msgid.link/613cd1bed263c2bf562ee714595f6d57f442804d.1779878004.git.u.kleine-koenig@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: codecs: rk3328: Use managed GPIO and clock helpers

rk3328_platform_probe() acquires the mute GPIO with gpiod_get_optional()
but never releases it. It also enables mclk and pclk manually while
relying on probe error labels for unwind, and the driver has no platform
remove callback to disable those clocks after a successful unbind.

This path has already needed fixes for missing clock unwinds on probe
errors. Use devm_gpiod_get_optional() and devm_clk_get_enabled() so the
GPIO and enabled clock lifetimes are tied to the device. This removes the
manual error labels and makes both probe failure and driver unbind follow
the normal devres cleanup path.

Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260525-asoc-rk3328-devm-resources-v1-1-2abde0006f89@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

regulator: scmi: fix of_node refcount leak in scmi_regulator_probe()

scmi_regulator_probe() calls of_find_node_by_name() which takes a
reference on the returned device node. On the error path where
process_scmi_regulator_of_node() fails, the function returns without
calling of_node_put() on the child node, leaking the reference.

Add of_node_put(np) on the error path to properly release the
reference.

Cc: stable@vger.kernel.org
Fixes: 0fbeae70ee7c ("regulator: add SCMI driver")
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Link: https://patch.msgid.link/20260527104850.872415-1-vulab@iscas.ac.cn
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: rockchip: i2s: Use managed hclk and runtime PM cleanup

The Rockchip I2S driver mixes devm-managed probe resources with manual
runtime PM and hclk cleanup. This leaves the remove path doing runtime PM
shutdown and clock disable before devm-managed ASoC and PCM resources are
released.

Keep the bus clock enabled for the device lifetime with
devm_clk_get_enabled(), and move the runtime PM teardown into devres so the
unwind order matches the managed registrations. This also removes the
remove callback, which only existed for cleanup.

Use a devm action for the final runtime suspend and register it before the
managed runtime PM action, so teardown disables runtime PM before forcing
the device into the suspended state.

Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Link: https://patch.msgid.link/20260521-asoc-rockchip-i2s-devm-cleanup-v1-1-9319bd781393@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

mount: honour SB_NOUSER in the new mount API

One should *not* be allowed to mount one of those, new API or not.

Reported-by: Denis Arefev <arefev@swemel.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://patch.msgid.link/20260602020444.GP2636677@ZenIV
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

ASoC: cs35l56: Share common SoundWire interrupt enable/disable code

Move the duplicated SoundWire interrupt enable/disable code into shared
functions. These new functions are in cs35l56.c to prevent circular
dependency between cs35l56.c and cs35l56-sdw.c

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://patch.msgid.link/20260529140350.408557-1-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

KVM: s390: Lock pte when making page secure

Make sure _kvm_s390_pv_make_secure() takes the pte lock for the given
address when attempting to make the page secure.

One of the steps in making the page secure is freezing the folio using
folio_ref_freeze(), which temporarily sets the reference count to 0.
Any attempt to get such a folio while frozen will fail and cause a
warning to be printed.

Other users of folio_ref_freeze() make sure that the page is not mapped
while it's being frozen, thus preventing gup functions from being able
to access it. For _kvm_s390_pv_make_secure(), this is not possible,
because the page needs to be mapped in order for the import to succeed.

By taking the pte lock, gup functions will be blocked until the import
operation is done, thus avoiding the race.

In theory this does not completely solve the issue: if a page is mapped
through multiple mappings, locking one pte does not protect from
calling gup on it through the other mapping. In practice this does not
happen and it is a decent stopgap solution until a more correct
solution is available.

Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-8-imbrenda@linux.ibm.com>

KVM: s390: Fix fault-in code

Fix the fault-in code so that it does not return success if a
concurrent unmap event invalidated the fault-in process between the
best-effort lockless check and the proper check with lock.

The new behaviour is to retry, like the best-effort lockless check
already did.

This prevents the fault-in handler from returning success without
having actually faulted in the requested page.

Fixes: e907ae530133 ("KVM: s390: Add helper functions for fault handling")
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-7-imbrenda@linux.ibm.com>

KVM: s390: vsie: Fix rmap handling in _do_shadow_crste()

Fix _do_shadow_crste() to also apply a mask on the reverse address, to
prevent spurious entries from being created, like already done in
gmap_protect_rmap().

Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-6-imbrenda@linux.ibm.com>

KVM: s390: Fix guest / virtual address confusion in _essa_clear_cbrl()

Until now, gmap_helper_zap_one_page() was being called with the guest
absolute address, but it expects a userspace virtual address.

This meant that in the best case the requested pages were not being
discarded, and in the worst case that the wrong pages were being
discarded.

Fix this by converting the guest absolute address to host virtual
before passing it to gmap_helper_zap_one_page().

Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-5-imbrenda@linux.ibm.com>

KVM: s390: Avoid potentially sleeping while atomic when zapping pages

Factor out try_get_locked_pte(), which behaves similarly to
get_locked_pte(), but does not attempt to allocate missing tables and
performs a spin_trylock() instead of blocking.

The new function is also exported, since it will be used in other
patches.

If intermediate entries are missing, there can be no pte swap entry to
free, so it's safe to ignore them.

This avoids potentially sleeping while atomic.

Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-4-imbrenda@linux.ibm.com>

KVM: s390: Fix _gmap_crstep_xchg_atomic()

The previous incorrect behaviour cleared the vsie_notif bit without
returning false, which allowed shadow crstes to be installed without
the vsie_notif bit.

Return false and do not perform the operation if an unshadow event has
been triggered, but still attempt to clear the vsie_notif bit from the
existing crste.

This will prevent the installation of shadow crstes without vsie_notif
bit and will also prevent the caller from looping forever if it was
not checking for the sg->invalidated flag.

Fixes: b827ef02f409 ("KVM: s390: Remove non-atomic dat_crstep_xchg()")
Fixes: a2c17f9270cc ("KVM: s390: New gmap code")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-3-imbrenda@linux.ibm.com>

KVM: s390: Fix _gmap_unmap_crste()

In _gmap_unmap_crste(), the crste to be unmapped is zapped calling
gmap_crstep_xchg_atomic() exactly once, and expecting it to succeed.
This is a reasonable sanity check, since kvm->mmu_lock is being held in
write mode, and thus no races should be possible.

An upcoming patch will change the behaviour of gmap_crstep_xchg_atomic()
to return false and clear the vsie_notif bit if the operation triggers
an unshadow operation. With the new behaviour, an unmap operation that
triggers an unshadow would cause the VM to be killed.

Prepare for the change by checking if the vsie_notif bit was set in
the old crste if gmap_crstep_xchg_atomic() fails the first time, and
try a second time. The second time no failures are allowed.

Fixes: b827ef02f409 ("KVM: s390: Remove non-atomic dat_crstep_xchg()")
Fixes: a2c17f9270cc ("KVM: s390: New gmap code")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-2-imbrenda@linux.ibm.com>

tracing/eprobes: Allow use of BTF names to dereference pointers

Add syntax to the parsing of eprobes to be able to typecast a trace event
field that is a pointer to a structure.

Currently, a dereference must be a number, where the user has to figure
out manually the offset of a member of a structure that they want to
dereference.

But for event probes that records a field that happens to be a pointer to
a structure, it cannot dereference these values with BTF naming, but
must use numerical offsets.

For example, to find out what device a sk_buff is pointing to in the
net_dev_xmit trace event, one must first use gdb to find the offsets of the
members of the structures:

(gdb) p &((struct sk_buff *)0)->dev
$1 = (struct net_device **) 0x10
(gdb) p &((struct net_device *)0)->name
$2 = (char (*)[16]) 0x118

And then use the raw numbers to dereference:

  # echo 'e:xmit net.net_dev_xmit +0x118(+0x10($skbaddr)):string' >> dynamic_events

If BTF is in the kernel, then instead, the skbaddr can be typecast to
sk_buff and use the normal dereference logic.

  # echo 'e:xmit net.net_dev_xmit (sk_buff)skbaddr->dev->name:string' >> dynamic_events
  # echo 1 > events/eprobes/xmit/enable
  # cat trace
[..]
    sshd-session-1022    [000] b..2.   860.249343: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.250061: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.250142: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.263553: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.283820: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.302716: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.322905: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.342828: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.362268: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.382335: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.400856: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.419893: xmit: (net.net_dev_xmit) arg1="enp7s0"

The syntax is simply: (STRUCT)(FIELD)->MEMBER[->MEMBER..]

Also add comments around the #else and #endif of #ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
to know what they are for.

Link: https://lore.kernel.org/all/20260601130746.2139d926@gandalf.local.home/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>