Wang Zhaolong [Tue, 18 Feb 2025 14:30:05 +0000 (22:30 +0800)]
smb: client: Fix netns refcount imbalance causing leaks and use-after-free
Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network
namespace.") attempted to fix a netns use-after-free issue by manually
adjusting reference counts via sk->sk_net_refcnt and sock_inuse_add().
However, a later commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock
after rmmod") pointed out that the approach of manually setting
sk->sk_net_refcnt in the first commit was technically incorrect, as
sk->sk_net_refcnt should only be set for user sockets. It led to issues
like TCP timers not being cleared properly on close. The second commit
moved to a model of just holding an extra netns reference for
server->ssocket using get_net(), and dropping it when the server is torn
down.
But there remain some gaps in the get_net()/put_net() balancing added by
these commits. The incomplete reference handling in these fixes results
in two issues:
1. Netns refcount leaks[1]
The problem process is as follows:
```
mount.cifs cifsd
cifs_do_mount
cifs_mount
cifs_mount_get_session
cifs_get_tcp_session
get_net() /* First get net. */
ip_connect
generic_ip_connect /* Try port 445 */
get_net()
->connect() /* Failed */
put_net()
generic_ip_connect /* Try port 139 */
get_net() /* Missing matching put_net() for this get_net().*/
cifs_get_smb_ses
cifs_negotiate_protocol
smb2_negotiate
SMB2_negotiate
cifs_send_recv
wait_for_response
cifs_demultiplex_thread
cifs_read_from_socket
cifs_readv_from_socket
cifs_reconnect
cifs_abort_connection
sock_release();
server->ssocket = NULL;
/* Missing put_net() here. */
generic_ip_connect
get_net()
->connect() /* Failed */
put_net()
sock_release();
server->ssocket = NULL;
free_rsp_buf
...
clean_demultiplex_info
/* It's only called once here. */
put_net()
```
When cifs_reconnect() is triggered, the server->ssocket is released
without a corresponding put_net() for the reference acquired in
generic_ip_connect() before. it ends up calling generic_ip_connect()
again to retry get_net(). After that, server->ssocket is set to NULL
in the error path of generic_ip_connect(), and the net count cannot be
released in the final clean_demultiplex_info() function.
2. Potential use-after-free
The current refcounting scheme can lead to a potential use-after-free issue
in the following scenario:
```
cifs_do_mount
cifs_mount
cifs_mount_get_session
cifs_get_tcp_session
get_net() /* First get net */
ip_connect
generic_ip_connect
get_net()
bind_socket
kernel_bind /* failed */
put_net()
/* after out_err_crypto_release label */
put_net()
/* after out_err label */
put_net()
```
In the exception handling process where binding the socket fails, the
get_net() and put_net() calls are unbalanced, which may cause the
server->net reference count to drop to zero and be prematurely released.
To address both issues, this patch ties the netns reference counting to
the server->ssocket and server lifecycles. The extra reference is now
acquired when the server or socket is created, and released when the
socket is destroyed or the server is torn down.
Aman [Thu, 6 Mar 2025 17:46:43 +0000 (17:46 +0000)]
CIFS: Propagate min offload along with other parameters from primary to secondary channels.
In a multichannel setup, it was observed that a few fields were not being
copied over to the secondary channels, which impacted performance in cases
where these options were relevant but not properly synchronized. To address
this, this patch introduces copying the following parameters from the
primary channel to the secondary channels:
By copying these parameters, we ensure consistency across channels and
prevent performance degradation due to missing or outdated settings.
Cc: stable@vger.kernel.org Signed-off-by: Aman <aman1@microsoft.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Pali Rohár [Wed, 30 Oct 2024 20:59:35 +0000 (21:59 +0100)]
cifs: Improve establishing SMB connection with NetBIOS session
Function ip_rfc1001_connect() send NetBIOS session request but currently
does not read response. It even does not wait for the response. Instead it
just calls usleep_range(1000, 2000) and explain in comment that some
servers require short break before sending SMB negotiate packet. Response
is later handled in generic is_smb_response() function called from
cifs_demultiplex_thread().
That comment probably refers to the old DOS SMB server which cannot process
incoming SMB negotiate packet if it has not sent NetBIOS session response
packet. Note that current sleep timeout is too small when trying to
establish connection to DOS SMB server running in qemu virtual machine
connected over qemu user networking with guestfwd netcat options. So that
usleep_range() call is not useful at all.
NetBIOS session response packet contains useful error information, like
the server name specified NetBIOS session request packet is incorrect.
Old Windows SMB servers and even the latest SMB server on the latest
Windows Server 2022 version requires that the name is the correct server
name, otherwise they return error RFC1002_NOT_PRESENT. This applies for all
SMB dialects (old SMB1, and also modern SMB2 and SMB3).
Therefore read the reply of NetBIOS session request and implement parsing
of the reply. Log received error to dmesg to help debugging reason why
connection was refused. Also convert NetBIOS error to useful errno.
Note that ip_rfc1001_connect() function is used only when doing connection
over port 139. So the common SMB scenario over port 445 is not affected by
this change at all.
Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Pali Rohár [Wed, 30 Oct 2024 21:46:20 +0000 (22:46 +0100)]
cifs: Fix establishing NetBIOS session for SMB2+ connection
Function ip_rfc1001_connect() which establish NetBIOS session for SMB
connections, currently uses smb_send() function for sending NetBIOS Session
Request packet. This function expects that the passed buffer is SMB packet
and for SMB2+ connections it mangles packet header, which breaks prepared
NetBIOS Session Request packet. Result is that this function send garbage
packet for SMB2+ connection, which SMB2+ server cannot parse. That function
is not mangling packets for SMB1 connections, so it somehow works for SMB1.
Fix this problem and instead of smb_send(), use smb_send_kvec() function
which does not mangle prepared packet, this function send them as is. Just
API of this function takes struct msghdr (kvec) instead of packet buffer.
[MS-SMB2] specification allows SMB2 protocol to use NetBIOS as a transport
protocol. NetBIOS can be used over TCP via port 139. So this is a valid
configuration, just not so common. And even recent Windows versions (e.g.
Windows Server 2022) still supports this configuration: SMB over TCP port
139, including for modern SMB2 and SMB3 dialects.
This change fixes SMB2 and SMB3 connections over TCP port 139 which
requires establishing of NetBIOS session. Tested that this change fixes
establishing of SMB2 and SMB3 connections with Windows Server 2022.
Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Pali Rohár [Mon, 14 Oct 2024 11:47:04 +0000 (13:47 +0200)]
cifs: Fix getting DACL-only xattr system.cifs_acl and system.smb3_acl
Currently ->get_acl() callback always create request for OWNER, GROUP and
DACL, even when only DACLs was requested by user. Change API callback to
request only information for which the caller asked. Therefore when only
DACLs requested, then SMB client will prepare and send DACL-only request.
This change fixes retrieving of "system.cifs_acl" and "system.smb3_acl"
xattrs to contain only DACL structure as documented.
Note that setting/changing of "system.cifs_acl" and "system.smb3_acl"
xattrs already takes only DACL structure and ignores all other fields.
Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Ivan Abramov [Mon, 10 Mar 2025 14:04:06 +0000 (17:04 +0300)]
smb: client: Remove redundant check in cifs_oplock_break()
There is an unnecessary NULL check of inode in cifs_oplock_break(), since
there are multiple dereferences of cinode prior to it.
Based on usage of cifs_oplock_break() in cifs_new_fileinfo() we can safely
assume that inode is not NULL, so there is no need to check inode in
cifs_oplock_break() at all.
Therefore, this redundant check can be removed.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Ivan Abramov <i.abramov@mt-integration.ru> Signed-off-by: Steve French <stfrench@microsoft.com>
Bharath SM [Mon, 17 Mar 2025 10:27:26 +0000 (15:57 +0530)]
smb: mark the new channel addition log as informational log with cifs_info
For multichannel mounts, when a new channel is successfully opened
we currently log 'successfully opened new channel on iface: <>' as
cifs_dbg(VFS..) which is eventually translated into a pr_err log.
Marking these informational logs as error logs may lead to confusion
for users so they will now be logged as info logs instead.
Signed-off-by: Bharath SM <bharathsm@microsoft.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Bharath SM [Mon, 17 Mar 2025 10:27:25 +0000 (15:57 +0530)]
smb: minor cleanup to remove unused function declaration
remove cifs_writev_complete declaration from header file
Signed-off-by: Bharath SM <bharathsm@microsoft.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Linus Torvalds [Sat, 22 Mar 2025 17:45:44 +0000 (10:45 -0700)]
Merge tag 'io_uring-6.14-20250322' of git://git.kernel.dk/linux
Pull io_uring fix from Jens Axboe:
"Just a single fix for the commit that went into your tree yesterday,
which exposed an issue with not always clearing notifications. That
could cause them to be used more than once"
* tag 'io_uring-6.14-20250322' of git://git.kernel.dk/linux:
io_uring/net: fix sendzc double notif flush
Before the blamed commit, sendzc relied on io_req_msg_cleanup() to clear
REQ_F_NEED_CLEANUP, so after the following snippet the request will
never hit the core io_uring cleanup path.
io_notif_flush();
io_req_msg_cleanup();
The easiest fix is to null the notification. io_send_zc_cleanup() can
still be called after, but it's tolerated.
David Howells [Wed, 19 Mar 2025 15:57:46 +0000 (15:57 +0000)]
keys: Fix UAF in key_put()
Once a key's reference count has been reduced to 0, the garbage collector
thread may destroy it at any time and so key_put() is not allowed to touch
the key after that point. The most key_put() is normally allowed to do is
to touch key_gc_work as that's a static global variable.
However, in an effort to speed up the reclamation of quota, this is now
done in key_put() once the key's usage is reduced to 0 - but now the code
is looking at the key after the deadline, which is forbidden.
Fix this by using a flag to indicate that a key can be gc'd now rather than
looking at the key's refcount in the garbage collector.
Fixes: 9578e327b2b4 ("keys: update key quotas in key_put()") Reported-by: syzbot+6105ffc1ded71d194d6d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/673b6aec.050a0220.87769.004a.GAE@google.com/ Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: syzbot+6105ffc1ded71d194d6d@syzkaller.appspotmail.com Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Namhyung Kim [Sat, 22 Mar 2025 07:13:01 +0000 (08:13 +0100)]
perf/amd/ibs: Prevent leaking sensitive data to userspace
Although IBS "swfilt" can prevent leaking samples with kernel RIP to the
userspace, there are few subtle cases where a 'data' address and/or a
'branch target' address can fall under kernel address range although RIP
is from userspace. Prevent leaking kernel 'data' addresses by discarding
such samples when {exclude_kernel=1,swfilt=1}.
IBS can now be invoked by unprivileged user with the introduction of
"swfilt". However, this creates a loophole in the interface where an
unprivileged user can get physical address of the userspace virtual
addresses through IBS register raw dump (PERF_SAMPLE_RAW). Prevent this
as well.
This upstream commit fixed the most obvious leak:
65a99264f5e5 perf/x86: Check data address for IBS software filter
Follow that up with a more complete fix.
Fixes: d29e744c7167 ("perf/x86: Relax privilege filter restriction on AMD IBS") Suggested-by: Matteo Rizzo <matteorizzo@google.com> Co-developed-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250321161251.1033-1-ravi.bangoria@amd.com
Linus Torvalds [Fri, 21 Mar 2025 20:42:55 +0000 (13:42 -0700)]
Merge tag 'regulator-fix-v6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"More fixes than I'd like at this point, some of which is due to me
cooking things in -next for a bit and resetting that cooking time as
more fixes came in.
- Christian Eggers fixed some race conditions with the dummy
regulator not being available very early in boot due to the use of
asynchronous probing, both the provider side (ensuring that it's
availalbe) and consumer side (handling things if that goes wrong)
are fixed
- Ludvig Pärsson fixed some lockdep issues with the debugfs
registration for regulators holding more locks than it really needs
causing issues later when looking at the resulting debugfs.boot
- Some device specific fixes for incorrect descriptions of the
RTQ2208 from ChiYuan Huang"
* tag 'regulator-fix-v6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: rtq2208: Fix the LDO DVS capability
regulator: rtq2208: Fix incorrect buck converter phase mapping
regulator: check that dummy regulator has been probed before using it
regulator: dummy: force synchronous probing
regulator: core: Fix deadlock in create_regulator()
Linus Torvalds [Fri, 21 Mar 2025 20:02:28 +0000 (13:02 -0700)]
Merge tag 'pinctrl-v6.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fix from Linus Walleij:
- A single patch for Spacemit K1 fixing up the Kconfig to not default
to "y"
* tag 'pinctrl-v6.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: spacemit: PINCTRL_SPACEMIT_K1 should not default to y unconditionally
Linus Torvalds [Fri, 21 Mar 2025 17:30:15 +0000 (10:30 -0700)]
Merge tag 'io_uring-6.14-20250321' of git://git.kernel.dk/linux
Pull io_uring fix from Jens Axboe:
"Single fix heading to stable, fixing an issue with io_req_msg_cleanup()
sometimes too eagerly clearing cleanup flags"
* tag 'io_uring-6.14-20250321' of git://git.kernel.dk/linux:
io_uring/net: don't clear REQ_F_NEED_CLEANUP unconditionally
Linus Torvalds [Fri, 21 Mar 2025 15:52:31 +0000 (08:52 -0700)]
Merge tag 'perf-urgent-2025-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf events fixes from Ingo Molnar:
"Two fixes: an RAPL PMU driver error handling fix, and an AMD IBS
software filter fix"
* tag 'perf-urgent-2025-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/rapl: Fix error handling in init_rapl_pmus()
perf/x86: Check data address for IBS software filter
Linus Torvalds [Fri, 21 Mar 2025 15:48:40 +0000 (08:48 -0700)]
Merge tag 'sched-urgent-2025-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"Revert a scheduler performance optimization that regressed other
workloads"
* tag 'sched-urgent-2025-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "sched/core: Reduce cost of sched_move_task when config autogroup"
* tag 'drm-fixes-2025-03-21' of https://gitlab.freedesktop.org/drm/kernel: (21 commits)
drm/xe: Fix exporting xe buffers multiple times
gpu: host1x: Do not assume that a NULL domain means no DMA IOMMU
drm/amdgpu/pm: Handle SCLK offset correctly in overdrive for smu 14.0.2
drm/amd/display: Fix incorrect fw_state address in dmub_srv
drm/amd/display: Use HW lock mgr for PSR1 when only one eDP
drm/amd/display: Fix message for support_edp0_on_dp1
drm/amdkfd: Fix user queue validation on Gfx7/8
drm/amdgpu: Restore uncached behaviour on GFX12
drm/amdgpu/gfx12: correct cleanup of 'me' field with gfx_v12_0_me_fini()
drm/amdkfd: Fix instruction hazard in gfx12 trap handler
drm/amdgpu/pm: wire up hwmon fan speed for smu 14.0.2
drm/amd/pm: add unique_id for gfx12
drm/amdgpu: Remove JPEG from vega and carrizo video caps
drm/amdgpu: Fix JPEG video caps max size for navi1x and raven
drm/amdgpu: Fix MPEG2, MPEG4 and VC1 video caps max size
drm/radeon: fix uninitialized size issue in radeon_vce_cs_parse()
accel/qaic: Fix integer overflow in qaic_validate_req()
accel/qaic: Fix possible data corruption in BOs > 2G
drm/v3d: Set job pointer to NULL when the job's fence has an error
drm/v3d: Don't run jobs that have errors flagged in its fence
...
Linus Torvalds [Thu, 20 Mar 2025 23:55:24 +0000 (16:55 -0700)]
Merge tag 'dma-mapping-6.14-2025-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux
Pull dma-mapping fix from Marek Szyprowski:
- fix missing clear bdr in check_ram_in_range_map() (Baochen Qiang)
* tag 'dma-mapping-6.14-2025-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
dma-mapping: fix missing clear bdr in check_ram_in_range_map()
Linus Torvalds [Thu, 20 Mar 2025 21:13:50 +0000 (14:13 -0700)]
Merge tag 'vfs-6.14-final.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"A final set of fixes for this cycle:
VFS:
- Ensure that the stable offset api doesn't return duplicate
directory entries when userspace has to perform the getdents call
multiple times on large directories
afs:
- Prevent invalid pointer dereference during get_link RCU pathwalk
fuse:
- Fix deadlock caused by uninitialized rings when using io_uring with
fuse
- Handle race condition when using io_uring with fuse to prevent NULL
dereference
libnetfs:
- Ensure that invalidate_cache is only called if implemented
- Fix collection of results during pause when collection is
offloaded
- Ensure rolling_buffer_load_from_ra() doesn't clear mark bits
- Make netfs_unbuffered_read() return ssize_t rather than int"
* tag 'vfs-6.14-final.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
libfs: Fix duplicate directory entry in offset_dir_lookup
fuse: fix possible deadlock if rings are never initialized
netfs: Fix netfs_unbuffered_read() to return ssize_t rather than int
netfs: Fix rolling_buffer_load_from_ra() to not clear mark bits
netfs: Call `invalidate_cache` only if implemented
netfs: Fix collection of results during pause when collection offloaded
fuse: fix uring race condition for null dereference of fc
afs: Fix afs_atcell_get_link() to check if ws_cell is unset first
perf/x86/rapl: Fix error handling in init_rapl_pmus()
If init_rapl_pmu() fails while allocating memory for "rapl_pmu" objects,
we miss freeing the "rapl_pmus" object in the error path. Fix that.
Fixes: 9b99d65c0bb4 ("perf/x86/rapl: Move the pmu allocation out of CPU hotplug") Signed-off-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250320100617.4480-1-dhananjay.ugwekar@amd.com
Linus Torvalds [Thu, 20 Mar 2025 18:34:30 +0000 (11:34 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fix from Paolo Bonzini:
"A lone fix for a s390 regression. An earlier 6.14 commit stopped
taking the pte lock for pages that are being converted to secure, but
it was needed to avoid races.
The patch was in development for a while and is finally ready, but I
wish it was split into 3-4 commits at least"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: s390: pv: fix race when making a page secure
io_req_msg_cleanup() relies on the fact that io_netmsg_recycle() will
always fully recycle, but that may not be the case if the msg cache
was already full. To ensure that normal cleanup always gets run,
let io_netmsg_recycle() deal with clearing the relevant cleanup flags,
as it knows exactly when that should be done.
The `struct ttm_resource->placement` contains TTM_PL_FLAG_* flags, but
it was incorrectly tested for XE_PL_* flags.
This caused xe_dma_buf_pin() to always fail when invoked for
the second time. Fix this by checking the `mem_type` field instead.
Fixes: 7764222d54b7 ("drm/xe: Disallow pinning dma-bufs in VRAM") Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: intel-xe@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v6.8+ Signed-off-by: Tomasz Rusinowicz <tomasz.rusinowicz@intel.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250218100353.2137964-1-jacek.lawrynowicz@linux.intel.com Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
(cherry picked from commit b96dabdba9b95f71ded50a1c094ee244408b2a8e) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
- eth: ti: am65-cpsw: fix NAPI registration sequence
Previous releases - regressions:
- ipv6: fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw().
- mptcp: fix data stream corruption in the address announcement
- bluetooth: fix connection regression between LE and non-LE adapters
- can:
- flexcan: only change CAN state when link up in system PM
- ucan: fix out of bound read in strscpy() source
Previous releases - always broken:
- lwtunnel: fix reentry loops
- ipv6: fix TCP GSO segmentation with NAT
- xfrm: force software GSO only in tunnel mode
- eth: ti: icssg-prueth: add lock to stats
Misc:
- add Andrea Mayer as a maintainer of SRv6"
* tag 'net-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (33 commits)
MAINTAINERS: Add Andrea Mayer as a maintainer of SRv6
Revert "gre: Fix IPv6 link-local address generation."
Revert "selftests: Add IPv6 link-local address generation tests for GRE devices."
net/neighbor: add missing policy for NDTPA_QUEUE_LENBYTES
tools headers: Sync uapi/asm-generic/socket.h with the kernel sources
mptcp: Fix data stream corruption in the address announcement
selftests: net: test for lwtunnel dst ref loops
net: ipv6: ioam6: fix lwtunnel_output() loop
net: lwtunnel: fix recursion loops
net: ti: icssg-prueth: Add lock to stats
net: atm: fix use after free in lec_send()
xsk: fix an integer overflow in xp_create_and_assign_umem()
net: stmmac: dwc-qos-eth: use devm_kzalloc() for AXI data
selftests: drv-net: use defer in the ping test
phy: fix xa_alloc_cyclic() error handling
dpll: fix xa_alloc_cyclic() error handling
devlink: fix xa_alloc_cyclic() error handling
ipv6: Set errno after ip_fib_metrics_init() in ip6_route_info_create().
ipv6: Fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw().
net: ipv6: fix TCP GSO segmentation with NAT
...
Linus Torvalds [Thu, 20 Mar 2025 16:25:25 +0000 (09:25 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Collected driver fixes from the last few weeks, I was surprised how
significant many of them seemed to be.
- Fix rdma-core test failures due to wrong startup ordering in rxe
- Don't crash in bnxt_re if the FW supports more than 64k QPs
- Fix wrong QP table indexing math in bnxt_re
- Calculate the max SRQs for userspace properly in bnxt_re
- Don't try to do math on errno for mlx5's rate calculation
- Properly allow userspace to control the VLAN in the QP state during
INIT->RTR for bnxt_re
- 6 bug fixes for HNS:
- Soft lockup when processing huge MRs, add a cond_resched()
- Fix missed error unwind for doorbell allocation
- Prevent bad send queue parameters from userspace
- Wrong error unwind in qp creation
- Missed xa_destroy during driver shutdown
- Fix reporting to userspace of max_sge_rd, hns doesn't have a
read/write difference"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/hns: Fix wrong value of max_sge_rd
RDMA/hns: Fix missing xa_destroy()
RDMA/hns: Fix a missing rollback in error path of hns_roce_create_qp_common()
RDMA/hns: Fix invalid sq params not being blocked
RDMA/hns: Fix unmatched condition in error path of alloc_user_qp_db()
RDMA/hns: Fix soft lockup during bt pages loop
RDMA/bnxt_re: Avoid clearing VLAN_ID mask in modify qp path
RDMA/mlx5: Handle errors returned from mlx5r_ib_rate()
RDMA/bnxt_re: Fix reporting maximum SRQs on P7 chips
RDMA/bnxt_re: Add missing paranthesis in map_qp_id_to_tbl_indx
RDMA/bnxt_re: Fix allocation of QP table
RDMA/rxe: Fix the failure of ibv_query_device() and ibv_query_device_ex() tests
Linus Torvalds [Thu, 20 Mar 2025 16:18:38 +0000 (09:18 -0700)]
Merge tag 'efi-fixes-for-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
"Here's a final batch of EFI fixes for v6.14.
The efivarfs ones are fixes for changes that were made this cycle.
James's fix is somewhat of a band-aid, but it was blessed by the VFS
folks, who are working with James to come up with something better for
the next cycle.
- Avoid physical address 0x0 for random page allocations
- Add correct lockdep annotation when traversing efivarfs on resume
- Avoid NULL mount in kernel_file_open() when traversing efivarfs on
resume"
* tag 'efi-fixes-for-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efivarfs: fix NULL dereference on resume
efivarfs: use I_MUTEX_CHILD nested lock to traverse variables on resume
efi/libstub: Avoid physical address 0x0 when doing random allocation
David Ahern [Wed, 12 Mar 2025 09:22:12 +0000 (10:22 +0100)]
MAINTAINERS: Add Andrea Mayer as a maintainer of SRv6
Andrea has made significant contributions to SRv6 support in Linux.
Acknowledge the work and on-going interest in Srv6 support with a
maintainers entry for these files so hopefully he is included
on patches going forward.
Following Paolo's suggestion, let's revert the IPv6 link-local address
generation fix for GRE devices. The patch introduced regressions in the
upstream CI, which are still under investigation.
Start by reverting the kselftest that depend on that fix (patch 1), then
revert the kernel code itself (patch 2).
====================
This patch broke net/forwarding/ip6gre_custom_multipath_hash.sh in some
circumstances (https://lore.kernel.org/netdev/Z9RIyKZDNoka53EO@mini-arch/).
Let's revert it while the problem is being investigated.
1) Fix tunnel mode TX datapath in packet offload mode
by directly putting it to the xmit path.
From Alexandre Cassen.
2) Force software GSO only in tunnel mode in favor
of potential HW GSO. From Cosmin Ratiu.
ipsec-2025-03-19
* tag 'ipsec-2025-03-19' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm_output: Force software GSO only in tunnel mode
xfrm: fix tunnel mode TX datapath in packet offload mode
====================
Paolo Abeni [Thu, 20 Mar 2025 14:29:59 +0000 (15:29 +0100)]
Merge tag 'batadv-net-pullrequest-20250318' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here is batman-adv bugfix:
- Ignore own maximum aggregation size during RX, Sven Eckelmann
* tag 'batadv-net-pullrequest-20250318' of git://git.open-mesh.org/linux-merge:
batman-adv: Ignore own maximum aggregation size during RX
====================
Lin Ma [Sat, 15 Mar 2025 16:51:13 +0000 (00:51 +0800)]
net/neighbor: add missing policy for NDTPA_QUEUE_LENBYTES
Previous commit 8b5c171bb3dc ("neigh: new unresolved queue limits")
introduces new netlink attribute NDTPA_QUEUE_LENBYTES to represent
approximative value for deprecated QUEUE_LEN. However, it forgot to add
the associated nla_policy in nl_ntbl_parm_policy array. Fix it with one
simple NLA_U32 type policy.
Arthur Mongodin [Fri, 14 Mar 2025 20:11:31 +0000 (21:11 +0100)]
mptcp: Fix data stream corruption in the address announcement
Because of the size restriction in the TCP options space, the MPTCP
ADD_ADDR option is exclusive and cannot be sent with other MPTCP ones.
For this reason, in the linked mptcp_out_options structure, group of
fields linked to different options are part of the same union.
There is a case where the mptcp_pm_add_addr_signal() function can modify
opts->addr, but not ended up sending an ADD_ADDR. Later on, back in
mptcp_established_options, other options will be sent, but with
unexpected data written in other fields due to the union, e.g. in
opts->ext_copy. This could lead to a data stream corruption in the next
packet.
Using an intermediate variable, prevents from corrupting previously
established DSS option. The assignment of the ADD_ADDR option
parameters is now done once we are sure this ADD_ADDR option can be set
in the packet, e.g. after having dropped other suboptions.
Fixes: 1bff1e43a30e ("mptcp: optimize out option generation") Cc: stable@vger.kernel.org Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Arthur Mongodin <amongodin@randorisec.fr> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
[ Matt: the commit message has been updated: long lines splits and some
clarifications. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250314-net-mptcp-fix-data-stream-corr-sockopt-v1-1-122dbb249db3@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Yang Yingliang [Thu, 3 Nov 2022 12:11:46 +0000 (20:11 +0800)]
i2c: amd-mp2: drop free_irq() of devm_request_irq() allocated irq
irq allocated with devm_request_irq() will be freed in devm_irq_release(),
using free_irq() in ->remove() will causes a dangling pointer, and a
subsequent double free. So remove the free_irq() in the error path and
remove path.
Fixes: 969864efae78 ("i2c: amd-mp2: use msix/msi if the hardware supports") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Link: https://lore.kernel.org/r/20221103121146.99836-1-yangyingliang@huawei.com Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Yongjian Sun [Thu, 20 Mar 2025 03:44:17 +0000 (11:44 +0800)]
libfs: Fix duplicate directory entry in offset_dir_lookup
There is an issue in the kernel:
In tmpfs, when using the "ls" command to list the contents
of a directory with a large number of files, glibc performs
the getdents call in multiple rounds. If a concurrent unlink
occurs between these getdents calls, it may lead to duplicate
directory entries in the ls output. One possible reproduction
scenario is as follows:
Create 1026 files and execute ls and rm concurrently:
for i in {1..1026}; do
echo "This is file $i" > /tmp/dir/file$i
done
ls /tmp/dir rm /tmp/dir/file4
->getdents(file1026-file5)
->unlink(file4)
->getdents(file5,file3,file2,file1)
It is expected that the second getdents call to return file3
through file1, but instead it returns an extra file5.
The root cause of this problem is in the offset_dir_lookup
function. It uses mas_find to determine the starting position
for the current getdents call. Since mas_find locates the first
position that is greater than or equal to mas->index, when file4
is deleted, it ends up returning file5.
It can be fixed by replacing mas_find with mas_find_rev, which
finds the first position that is less than or equal to mas->index.
Fixes: b9b588f22a0c ("libfs: Use d_children list to iterate simple_offset directories") Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com> Link: https://lore.kernel.org/r/20250320034417.555810-1-sunyongjian@huaweicloud.com Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
When the destination is the same after the transformation, we enter a
lwtunnel loop. This is true for most of lwt users: ioam6, rpl, seg6,
seg6_local, ila_lwt, and lwt_bpf. It can happen in their input() and
output() handlers respectively, where either dst_input() or dst_output()
is called at the end. It can also happen in xmit() handlers.
... until rpl_do_srh() fails, which means skb_cow_head() failed.
This series provides a fix at the core level of lwtunnel to catch such
loops when they're not caught by the respective lwtunnel users, and
handle the loop case in ioam6 which is one of the users. This series
also comes with a new selftest to detect some dst cache reference loops
in lwtunnel users.
====================
Justin Iurman [Fri, 14 Mar 2025 12:00:48 +0000 (13:00 +0100)]
selftests: net: test for lwtunnel dst ref loops
As recently specified by commit 0ea09cbf8350 ("docs: netdev: add a note
on selftest posting") in net-next, the selftest is therefore shipped in
this series. However, this selftest does not really test this series. It
needs this series to avoid crashing the kernel. What it really tests,
thanks to kmemleak, is what was fixed by the following commits:
- commit c71a192976de ("net: ipv6: fix dst refleaks in rpl, seg6 and
ioam6 lwtunnels")
- commit 92191dd10730 ("net: ipv6: fix dst ref loops in rpl, seg6 and
ioam6 lwtunnels")
- commit c64a0727f9b1 ("net: ipv6: fix dst ref loop on input in seg6
lwt")
- commit 13e55fbaec17 ("net: ipv6: fix dst ref loop on input in rpl
lwt")
- commit 0e7633d7b95b ("net: ipv6: fix dst ref loop in ila lwtunnel")
- commit 5da15a9c11c1 ("net: ipv6: fix missing dst ref drop in ila
lwtunnel")
Justin Iurman [Fri, 14 Mar 2025 12:00:47 +0000 (13:00 +0100)]
net: ipv6: ioam6: fix lwtunnel_output() loop
Fix the lwtunnel_output() reentry loop in ioam6_iptunnel when the
destination is the same after transformation. Note that a check on the
destination address was already performed, but it was not enough. This
is the example of a lwtunnel user taking care of loops without relying
only on the last resort detection offered by lwtunnel.
Justin Iurman [Fri, 14 Mar 2025 12:00:46 +0000 (13:00 +0100)]
net: lwtunnel: fix recursion loops
This patch acts as a parachute, catch all solution, by detecting
recursion loops in lwtunnel users and taking care of them (e.g., a loop
between routes, a loop within the same route, etc). In general, such
loops are the consequence of pathological configurations. Each lwtunnel
user is still free to catch such loops early and do whatever they want
with them. It will be the case in a separate patch for, e.g., seg6 and
seg6_local, in order to provide drop reasons and update statistics.
Another example of a lwtunnel user taking care of loops is ioam6, which
has valid use cases that include loops (e.g., inline mode), and which is
addressed by the next patch in this series. Overall, this patch acts as
a last resort to catch loops and drop packets, since we don't want to
leak something unintentionally because of a pathological configuration
in lwtunnels.
The solution in this patch reuses dev_xmit_recursion(),
dev_xmit_recursion_inc(), and dev_xmit_recursion_dec(), which seems fine
considering the context.
Closes: https://lore.kernel.org/netdev/2bc9e2079e864a9290561894d2a602d6@akamai.com/ Closes: https://lore.kernel.org/netdev/Z7NKYMY7fJT5cYWu@shredder/ Fixes: ffce41962ef6 ("lwtunnel: support dst output redirect function") Fixes: 2536862311d2 ("lwt: Add support to redirect dst.input") Fixes: 14972cbd34ff ("net: lwtunnel: Handle fragmentation") Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Link: https://patch.msgid.link/20250314120048.12569-2-justin.iurman@uliege.be Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently the API emac_update_hardware_stats() reads different ICSSG
stats without any lock protection.
This API gets called by .ndo_get_stats64() which is only under RCU
protection and nothing else. Add lock to this API so that the reading of
statistics happens during lock.
Gavrilov Ilia [Thu, 13 Mar 2025 08:50:08 +0000 (08:50 +0000)]
xsk: fix an integer overflow in xp_create_and_assign_umem()
Since the i and pool->chunk_size variables are of type 'u32',
their product can wrap around and then be cast to 'u64'.
This can lead to two different XDP buffers pointing to the same
memory area.
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Paolo Abeni [Wed, 19 Mar 2025 18:44:05 +0000 (19:44 +0100)]
Merge tag 'for-net-2025-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_event: Fix connection regression between LE and non-LE adapters
- Fix error code in chan_alloc_skb_cb()
* tag 'for-net-2025-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_event: Fix connection regression between LE and non-LE adapters
Bluetooth: Fix error code in chan_alloc_skb_cb()
====================
Linus Torvalds [Wed, 19 Mar 2025 18:12:18 +0000 (11:12 -0700)]
Merge tag 'hwmon-fixes-for-v6.14-rc8/6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Fix an entry in MAINTAINERS to avoid sending hwmon review requests to
the i2c mailing list
- Fix an out-of-bounds access in nct6775 driver
* tag 'hwmon-fixes-for-v6.14-rc8/6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (nct6775-core) Fix out of bounds access for NCT679{8,9}
MAINTAINERS: correct list and scope of LTC4286 HARDWARE MONITOR
net: stmmac: dwc-qos-eth: use devm_kzalloc() for AXI data
Everywhere else in the driver uses devm_kzalloc() when allocating the
AXI data, so there is no kfree() of this structure. However,
dwc-qos-eth uses kzalloc(), which leads to this memory being leaked.
Switch to use devm_kzalloc().
Fixes: d8256121a91a ("stmmac: adding new glue driver dwmac-dwc-qos-eth") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1tsRyv-0064nU-O9@rmk-PC.armlinux.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jason Gunthorpe [Tue, 4 Feb 2025 19:18:19 +0000 (15:18 -0400)]
gpu: host1x: Do not assume that a NULL domain means no DMA IOMMU
Previously with tegra-smmu, even with CONFIG_IOMMU_DMA, the default domain
could have been left as NULL. The NULL domain is specially recognized by
host1x_iommu_attach() as meaning it is not the DMA domain and
should be replaced with the special shared domain.
This happened prior to the below commit because tegra-smmu was using the
NULL domain to mean IDENTITY.
Now that the domain is properly labled the test in DRM doesn't see NULL.
Check for IDENTITY as well to enable the special domains.
This is the same issue and basic fix as seen in
commit fae6e669cdc5 ("drm/tegra: Do not assume that a NULL domain means no
DMA IOMMU").
Fixes: c8cc2655cc6c ("iommu/tegra-smmu: Implement an IDENTITY domain") Reported-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt> Closes: https://lore.kernel.org/all/c6a6f114-3acd-4d56-a13b-b88978e927dc@tecnico.ulisboa.pt/ Tested-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Link: https://patchwork.freedesktop.org/patch/msgid/0-v1-10dcc8ce3869+3a7-host1x_identity_jgg@nvidia.com
Linus Torvalds [Wed, 19 Mar 2025 14:31:43 +0000 (07:31 -0700)]
Merge tag 'ata-6.14-final' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fix from Niklas Cassel:
- Fix a regression on ATI AHCI controllers, where certain Samsung
drives fails to be detected on a warm boot when LPM is enabled.
LPM on ATI AHCI works fine with other drives. Likewise, the
Samsung drives works fine with LPM with other AHI controllers.
Thus, just like the weirdo ATA_QUIRK_NO_NCQ_ON_ATI quirk, add a
new ATA_QUIRK_NO_LPM_ON_ATI quirk to disable LPM only on ATI
AHCI controllers.
* tag 'ata-6.14-final' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata-core: Add ATA_QUIRK_NO_LPM_ON_ATI for certain Samsung SSDs
Paolo Bonzini [Wed, 19 Mar 2025 13:01:53 +0000 (09:01 -0400)]
Merge tag 'kvm-s390-master-6.14-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
Holding the pte lock for the page that is being converted to secure is
needed to avoid races. A previous commit removed the locking, which
caused issues. Fix by locking the pte again.
Luis Henriques [Thu, 6 Mar 2025 11:12:18 +0000 (11:12 +0000)]
fuse: fix possible deadlock if rings are never initialized
When mounting a user-space filesystem using io_uring, the initialization
of the rings is done separately in the server side. If for some reason
(e.g. a server bug) this step is not performed it will be impossible to
unmount the filesystem if there are already requests waiting.
This issue is easily reproduced with the libfuse passthrough_ll example,
if the queue depth is set to '0' and a request is queued before trying to
unmount the filesystem. When trying to force the unmount, fuse_abort_conn()
will try to wake up all tasks waiting in fc->blocked_waitq, but because the
rings were never initialized, fuse_uring_ready() will never return 'true'.
Fixes: 3393ff964e0f ("fuse: block request allocation until io-uring init is complete") Signed-off-by: Luis Henriques <luis@igalia.com> Link: https://lore.kernel.org/r/20250306111218.13734-1-luis@igalia.com Acked-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
Miaoqian Lin [Wed, 19 Mar 2025 03:23:04 +0000 (11:23 +0800)]
spi: Fix reference count leak in slave_show()
Fix a reference count leak in slave_show() by properly putting the device
reference obtained from device_find_any_child().
Fixes: 6c364062bfed ("spi: core: Add support for registering SPI slave controllers") Fixes: c21b0837983d ("spi: Use device_find_any_child() instead of custom approach") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://patch.msgid.link/20250319032305.70340-1-linmq006@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
Pierre Riteau <pierre@stackhpc.com> found suspicious handling an error
from xa_alloc_cyclic() in scheduler code [1]. The same is done in few
other places.
v1 --> v2: [2]
* add fixes tags
* fix also the same usage in dpll and phy
xa_alloc_cyclic() can return 1, which isn't an error. To prevent
situation when the caller of this function will treat it as no error do
a check only for negative here.
Fixes: 384968786909 ("net: phy: Introduce ethernet link topology representation") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In case of returning 1 from xa_alloc_cyclic() (wrapping) ERR_PTR(1) will
be returned, which will cause IS_ERR() to be false. Which can lead to
dereference not allocated pointer (pin).
Fix it by checking if err is lower than zero.
This wasn't found in real usecase, only noticed. Credit to Pierre.
Fixes: 97f265ef7f5b ("dpll: allocate pin ids in cycle") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In case of returning 1 from xa_alloc_cyclic() (wrapping) ERR_PTR(1) will
be returned, which will cause IS_ERR() to be false. Which can lead to
dereference not allocated pointer (rel).
Fix it by checking if err is lower than zero.
This wasn't found in real usecase, only noticed. Credit to Pierre.
Fixes: c137743bce02 ("devlink: introduce object and nested devlink relationship infra") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Here are some miscellaneous fixes and changes for netfslib:
(1) Fix the collection of results during a pause in transmission.
(2) Call ->invalidate_cache() only if provided.
(3) Fix the rolling buffer to not hammer atomic bit clears when loading
from readahead.
(4) Fix netfs_unbuffered_read() to return ssize_t.
* patches from https://lore.kernel.org/r/20250314164201.1993231-1-dhowells@redhat.com:
netfs: Fix netfs_unbuffered_read() to return ssize_t rather than int
netfs: Fix rolling_buffer_load_from_ra() to not clear mark bits
netfs: Call `invalidate_cache` only if implemented
netfs: Fix collection of results during pause when collection offloaded
David Howells [Fri, 14 Mar 2025 16:41:58 +0000 (16:41 +0000)]
netfs: Fix rolling_buffer_load_from_ra() to not clear mark bits
rolling_buffer_load_from_ra() looms large in the perf report because it
loops around doing an atomic clear for each of the three mark bits per
folio. However, this is both inefficient (it would be better to build a
mask and atomically AND them out) and unnecessary as they shouldn't be set.
Fix this by removing the loop.
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20250314164201.1993231-4-dhowells@redhat.com Acked-by: "Paulo Alcantara (Red Hat)" <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: netfs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
Max Kellermann [Fri, 14 Mar 2025 16:41:57 +0000 (16:41 +0000)]
netfs: Call `invalidate_cache` only if implemented
Many filesystems such as NFS and Ceph do not implement the
`invalidate_cache` method. On those filesystems, if writing to the
cache (`NETFS_WRITE_TO_CACHE`) fails for some reason, the kernel
crashes like this:
David Howells [Fri, 14 Mar 2025 16:41:56 +0000 (16:41 +0000)]
netfs: Fix collection of results during pause when collection offloaded
A netfs read request can run in one of two modes: for synchronous reads
writes, the app thread does the collection of results and for asynchronous
reads, this is offloaded to a worker thread. This is controlled by the
NETFS_RREQ_OFFLOAD_COLLECTION flag.
Now, if a subrequest incurs an error, the NETFS_RREQ_PAUSE flag is set to
stop the issuing loop temporarily from issuing more subrequests until a
retry is successful or the request is abandoned.
When the issuing loop sees NETFS_RREQ_PAUSE, it jumps to
netfs_wait_for_pause() which will wait for the PAUSE flag to be cleared -
and whilst it is waiting, it will call out to the collector as more results
acrue... But this is the wrong thing to do if OFFLOAD_COLLECTION is set as
we can then end up with both the app thread and the work item collecting
results simultaneously.
This manifests itself occasionally when running the generic/323 xfstest
against multichannel cifs as an oops that's a bit random but frequently
involving io_submit() (the test does lots of simultaneous async DIO reads).
Fix this by only doing the collection in netfs_wait_for_pause() if the
NETFS_RREQ_OFFLOAD_COLLECTION is not set.
Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item") Reported-by: Steve French <stfrench@microsoft.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20250314164201.1993231-2-dhowells@redhat.com Acked-by: "Paulo Alcantara (Red Hat)" <pc@manguebit.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
Joanne Koong [Tue, 18 Mar 2025 00:30:28 +0000 (17:30 -0700)]
fuse: fix uring race condition for null dereference of fc
There is a race condition leading to a kernel crash from a null
dereference when attemping to access fc->lock in
fuse_uring_create_queue(). fc may be NULL in the case where another
thread is creating the uring in fuse_uring_create() and has set
fc->ring but has not yet set ring->fc when fuse_uring_create_queue()
reads ring->fc. There is another race condition as well where in
fuse_uring_register(), ring->nr_queues may still be 0 and not yet set
to the new value when we compare qid against it.
This fix sets fc->ring only after ring->fc and ring->nr_queues have been
set, which guarantees now that ring->fc is a proper pointer when any
queues are created and ring->nr_queues reflects the right number of
queues if ring is not NULL. We must use smp_store_release() and
smp_load_acquire() semantics to ensure the ordering will remain correct
where fc->ring is assigned only after ring->fc and ring->nr_queues have
been assigned.
Tomasz Pakuła [Tue, 11 Mar 2025 21:38:33 +0000 (22:38 +0100)]
drm/amdgpu/pm: Handle SCLK offset correctly in overdrive for smu 14.0.2
Currently, it seems like the code was carried over from RDNA3 because
it assumes two possible values to set. RDNA4, instead of having:
0: min SCLK
1: max SCLK
only has
0: SCLK offset
This change makes it so it only reports current offset value instead of
showing possible min/max values and their indices. Moreover, it now only
accepts the offset as a value, without the indice index.
Additionally, the lower bound was printed as %u by mistake.
Setting this offset:
Old: "s 1 <offset>"
New: "s <offset>"
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4036 Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Tomasz Pakuła <tomasz.pakula.oficjalny@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1cfeb60e6e8837b1de5eb4e17df7cf31f4442144) Cc: stable@vger.kernel.org # 6.12.x
Lo-an Chen [Mon, 10 Mar 2025 06:52:22 +0000 (14:52 +0800)]
drm/amd/display: Fix incorrect fw_state address in dmub_srv
[WHY]
The fw_state in dmub_srv was assigned with wrong address.
The address was pointed to the firmware region.
[HOW]
Fix the firmware state by using DMUB_DEBUG_FW_STATE_OFFSET
in dmub_cmd.h.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Lo-an Chen <lo-an.chen@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f57b38ac85a01bf03020cc0a9761d63e5c0ce197)
drm/amd/display: Use HW lock mgr for PSR1 when only one eDP
[WHY]
DMUB locking is important to make sure that registers aren't accessed
while in PSR. Previously it was enabled but caused a deadlock in
situations with multiple eDP panels.
[HOW]
Detect if multiple eDP panels are in use to decide whether to use
lock. Refactor the function so that the first check is for PSR-SU
and then replay is in use to prevent having to look up number
of eDP panels for those configurations.
Fixes: f245b400a223 ("Revert "drm/amd/display: Use HW lock mgr for PSR1"") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3965 Reviewed-by: ChiaHsuan Chung <chiahsuan.chung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ed569e1279a3045d6b974226c814e071fa0193a6) Cc: stable@vger.kernel.org
Philip Yang [Wed, 29 Jan 2025 17:37:30 +0000 (12:37 -0500)]
drm/amdkfd: Fix user queue validation on Gfx7/8
To workaround queue full h/w issue on Gfx7/8, when application create
AQL queue, the ring buffer bo allocate size is queue_size/2 and
map queue_size ring buffer to GPU in 2 pieces using 2 attachments, each
attachment map size is queue_size/2, with same ring_bo backing memory.
For Gfx7/8, user queue buffer validation should use queue_size/2 to
verify ring_bo allocation and mapping size.
Fixes: 68e599db7a54 ("drm/amdkfd: Validate user queue buffers") Suggested-by: Tomáš Trnka <trnka@scm.com> Signed-off-by: Philip Yang <Philip.Yang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e7a477735f1771b9a9346a5fbd09d7ff0641723a) Cc: stable@vger.kernel.org
Wentao Liang [Wed, 12 Mar 2025 06:31:06 +0000 (14:31 +0800)]
drm/amdgpu/gfx12: correct cleanup of 'me' field with gfx_v12_0_me_fini()
In gfx_v12_0_cp_gfx_load_me_microcode_rs64(), gfx_v12_0_pfp_fini() is
incorrectly used to free 'me' field of 'gfx', since gfx_v12_0_pfp_fini()
can only release 'pfp' field of 'gfx'. The release function of 'me' field
should be gfx_v12_0_me_fini().
Fixes: 52cb80c12e8a ("drm/amdgpu: Add gfx v12_0 ip block support (v6)") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ebdc52607a46cda08972888178c6aa9cd6965141) Cc: stable@vger.kernel.org # 6.12.x
Jay Cornwall [Fri, 7 Feb 2025 21:40:34 +0000 (16:40 -0500)]
drm/amdkfd: Fix instruction hazard in gfx12 trap handler
VALU instructions with SGPR source need wait states to avoid hazard
with SALU using different SGPR.
v2: Eliminate some hazards to reduce code explosion
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Lancelot Six <lancelot.six@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7e0459d453b911435673edd7a86eadc600c63238) Cc: stable@vger.kernel.org # 6.12.x
drm/radeon: fix uninitialized size issue in radeon_vce_cs_parse()
On the off chance that command stream passed from userspace via
ioctl() call to radeon_vce_cs_parse() is weirdly crafted and
first command to execute is to encode (case 0x03000001), the function
in question will attempt to call radeon_vce_cs_reloc() with size
argument that has not been properly initialized. Specifically, 'size'
will point to 'tmp' variable before the latter had a chance to be
assigned any value.
Play it safe and init 'tmp' with 0, thus ensuring that
radeon_vce_cs_reloc() will catch an early error in cases like these.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: 2fc5703abda2 ("drm/radeon: check VCE relocation buffer range v3") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2d52de55f9ee7aaee0e09ac443f77855989c6b68) Cc: stable@vger.kernel.org
ipv6: Set errno after ip_fib_metrics_init() in ip6_route_info_create().
While creating a new IPv6, we could get a weird -ENOMEM when
RTA_NH_ID is set and either of the conditions below is true:
1) CONFIG_IPV6_SUBTREES is enabled and rtm_src_len is specified
2) nexthop_get() fails
e.g.)
# strace ip -6 route add fe80::dead:beef:dead:beef nhid 1 from ::
recvmsg(3, {msg_iov=[{iov_base=[...[
{error=-ENOMEM, msg=[... [...]]},
[{nla_len=49, nla_type=NLMSGERR_ATTR_MSG}, "Nexthops can not be used with so"...]
]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 148
Let's set err explicitly after ip_fib_metrics_init() in
ip6_route_info_create().
Fixes: f88d8ea67fbd ("ipv6: Plumb support for nexthop object in a fib6_info") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250312013854.61125-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
ipv6: Fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw().
fib_check_nh_v6_gw() expects that fib6_nh_init() cleans up everything
when it fails.
Commit 7dd73168e273 ("ipv6: Always allocate pcpu memory in a fib6_nh")
moved fib_nh_common_init() before alloc_percpu_gfp() within fib6_nh_init()
but forgot to add cleanup for fib6_nh->nh_common.nhc_pcpu_rth_output in
case it fails to allocate fib6_nh->rt6i_pcpu, resulting in memleak.
Let's call fib_nh_common_release() and clear nhc_pcpu_rth_output in the
error path.
Note that we can remove the fib6_nh_release() call in nh_create_ipv6()
later in net-next.git.
* tag 'linux-can-fixes-for-6.14-20250314' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: flexcan: disable transceiver during system PM
can: flexcan: only change CAN state when link up in system PM
can: rcar_canfd: Fix page entries in the AFL list
dt-bindings: can: renesas,rcar-canfd: Fix typo in pattern properties for R-Car V4M
can: statistics: use atomic access in hot path
can: ucan: fix out of bound read in strscpy() source
====================