Linus Torvalds [Sat, 21 Dec 2024 19:07:19 +0000 (11:07 -0800)]
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull BPF fixes from Daniel Borkmann:
- Fix inlining of bpf_get_smp_processor_id helper for !CONFIG_SMP
systems (Andrea Righi)
- Fix BPF USDT selftests helper code to use asm constraint "m" for
LoongArch (Tiezhu Yang)
- Fix BPF selftest compilation error in get_uprobe_offset when
PROCMAP_QUERY is not defined (Jerome Marchand)
- Fix BPF bpf_skb_change_tail helper when used in context of BPF
sockmap to handle negative skb header offsets (Cong Wang)
- Several fixes to BPF sockmap code, among others, in the area of
socket buffer accounting (Levi Zim, Zijian Zhang, Cong Wang)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Test bpf_skb_change_tail() in TC ingress
selftests/bpf: Introduce socket_helpers.h for TC tests
selftests/bpf: Add a BPF selftest for bpf_skb_change_tail()
bpf: Check negative offsets in __bpf_skb_min_len()
tcp_bpf: Fix copied value in tcp_bpf_sendmsg
skmsg: Return copied bytes in sk_msg_memcopy_from_iter
tcp_bpf: Add sk_rmem_alloc related logic for tcp_bpf ingress redirection
tcp_bpf: Charge receive socket buffer in bpf_tcp_ingress()
selftests/bpf: Fix compilation error in get_uprobe_offset()
selftests/bpf: Use asm constraint "m" for LoongArch
bpf: Fix bpf_get_smp_processor_id() on !CONFIG_SMP
Linus Torvalds [Sat, 21 Dec 2024 18:56:34 +0000 (10:56 -0800)]
Merge tag 'media/v6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- fix a clang build issue with mediatec vcodec
- add missing variable initialization to dib3000mb write function
* tag 'media/v6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: mediatek: vcodec: mark vdec_vp9_slice_map_counts_eob_coef noinline
media: dvb-frontends: dib3000mb: fix uninit-value in dib3000_write_reg
Linus Torvalds [Sat, 21 Dec 2024 18:51:04 +0000 (10:51 -0800)]
Merge tag 'pci-v6.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI fixes from Krzysztof Wilczyński:
"Two small patches that are important for fixing boot time hang on
Intel JHL7540 'Titan Ridge' platforms equipped with a Thunderbolt
controller.
The boot time issue manifests itself when a PCI Express bandwidth
control is unnecessarily enabled on the Thunderbolt controller
downstream ports, which only supports a link speed of 2.5 GT/s in
accordance with USB4 v2 specification (p. 671, sec. 11.2.1, "PCIe
Physical Layer Logical Sub-block").
As such, there is no need to enable bandwidth control on such
downstream port links, which also works around the issue.
Both patches were tested by the original reporter on the hardware on
which the failure origin golly manifested itself. Both fixes were
proven to resolve the reported boot hang issue, and both patches have
been in linux-next this week with no reported problems"
* tag 'pci-v6.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI/bwctrl: Enable only if more than one speed is supported
PCI: Honor Max Link Speed when determining supported speeds
Linus Torvalds [Sat, 21 Dec 2024 18:47:47 +0000 (10:47 -0800)]
Merge tag 'pm-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix some amd-pstate driver issues:
- Detect preferred core support in amd-pstate before driver
registration to avoid initialization ordering issues (K Prateek
Nayak)
- Fix issues with with boost numerator handling in amd-pstate leading
to inconsistently programmed CPPC max performance values (Mario
Limonciello)"
* tag 'pm-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq/amd-pstate: Use boost numerator for upper bound of frequencies
cpufreq/amd-pstate: Store the boost numerator as highest perf again
cpufreq/amd-pstate: Detect preferred core support before driver registration
Linus Torvalds [Sat, 21 Dec 2024 18:44:44 +0000 (10:44 -0800)]
Merge tag 'thermal-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"Fix two issues with the user thermal thresholds feature introduced in
this development cycle (Daniel Lezcano)"
* tag 'thermal-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal/thresholds: Fix boundaries and detection routine
thermal/thresholds: Fix uapi header macros leading to a compilation error
Linus Torvalds [Sat, 21 Dec 2024 17:35:18 +0000 (09:35 -0800)]
Merge tag '6.13-rc3-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- fix regression in display of write stats
- fix rmmod failure with network namespaces
- two minor cleanups
* tag '6.13-rc3-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: fix bytes written value in /proc/fs/cifs/Stats
smb: client: fix TCP timers deadlock after rmmod
smb: client: Deduplicate "select NETFS_SUPPORT" in Kconfig
smb: use macros instead of constants for leasekey size and default cifsattrs value
Linus Torvalds [Sat, 21 Dec 2024 17:32:24 +0000 (09:32 -0800)]
Merge tag 'nfs-for-6.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client fixes from Trond Myklebust:
- NFS/pnfs: Fix a live lock between recalled layouts and layoutget
- Fix a build warning about an undeclared symbol 'nfs_idmap_cache_timeout'
* tag 'nfs-for-6.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
fs/nfs: fix missing declaration of nfs_idmap_cache_timeout
NFS/pnfs: Fix a live lock between recalled layouts and layoutget
Linus Torvalds [Sat, 21 Dec 2024 17:29:46 +0000 (09:29 -0800)]
Merge tag 'ceph-for-6.13-rc4' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A handful of important CephFS fixes from Max, Alex and myself: memory
corruption due to a buffer overrun, potential infinite loop and
several memory leaks on the error paths. All but one marked for
stable"
* tag 'ceph-for-6.13-rc4' of https://github.com/ceph/ceph-client:
ceph: allocate sparse_ext map only for sparse reads
ceph: fix memory leak in ceph_direct_read_write()
ceph: improve error handling and short/overflow-read logic in __ceph_sync_read()
ceph: validate snapdirname option length when mounting
ceph: give up on paths longer than PATH_MAX
ceph: fix memory leaks in __ceph_sync_read()
Cong Wang [Fri, 13 Dec 2024 03:40:55 +0000 (19:40 -0800)]
selftests/bpf: Add a BPF selftest for bpf_skb_change_tail()
As requested by Daniel, we need to add a selftest to cover
bpf_skb_change_tail() cases in skb_verdict. Here we test trimming,
growing and error cases, and validate its expected return values and the
expected sizes of the payload.
Cong Wang [Fri, 13 Dec 2024 03:40:54 +0000 (19:40 -0800)]
bpf: Check negative offsets in __bpf_skb_min_len()
skb_network_offset() and skb_transport_offset() can be negative when
they are called after we pull the transport header, for example, when
we use eBPF sockmap at the point of ->sk_data_ready().
__bpf_skb_min_len() uses an unsigned int to get these offsets, this
leads to a very large number which then causes bpf_skb_change_tail()
failed unexpectedly.
Fix this by using a signed int to get these offsets and ensure the
minimum is at least zero.
This is caused by tcp_bpf_sendmsg() returning a larger value(12289) than
size (8192), which causes the while loop in splice_to_socket() to release
an uninitialized pipe buf.
The underlying cause is that this code assumes sk_msg_memcopy_from_iter()
will copy all bytes upon success but it actually might only copy part of
it.
This commit changes it to use the real copied bytes.
Linus Torvalds [Fri, 20 Dec 2024 21:48:41 +0000 (13:48 -0800)]
Merge tag 'hwmon-for-v6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Fix reporting of negative temperature, current, and voltage values in
the tmp513 driver
* tag 'hwmon-for-v6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (tmp513) Fix interpretation of values of Temperature Result and Limit Registers
hwmon: (tmp513) Fix Current Register value interpretation
hwmon: (tmp513) Fix interpretation of values of Shunt Voltage and Limit Registers
Linus Torvalds [Fri, 20 Dec 2024 21:37:58 +0000 (13:37 -0800)]
Merge tag 'block-6.13-20241220' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- Minor cleanups for bdev/nvme using the helpers introduced
- Revert of a deadlock fix that still needs more work
- Fix a UAF of hctx in the cpu hotplug code
* tag 'block-6.13-20241220' of git://git.kernel.dk/linux:
block: avoid to reuse `hctx` not removed from cpuhp callback list
block: Revert "block: Fix potential deadlock while freezing queue and acquiring sysfs_lock"
nvme: use blk_validate_block_size() for max LBA check
block/bdev: use helper for max block size check
Linus Torvalds [Fri, 20 Dec 2024 21:32:43 +0000 (13:32 -0800)]
Merge tag 'io_uring-6.13-20241220' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Fix for a file ref leak for registered ring fds
- Turn the ->timeout_lock into a raw spinlock, as it nests under the
io-wq lock which is a raw spinlock as it's called from the scheduler
side
- Limit ring resizing to DEFER_TASKRUN for now. We will broaden this in
the future, but for now, ensure that it's only feasible on rings with
a single user
- Add sanity check for io-wq enqueuing
* tag 'io_uring-6.13-20241220' of git://git.kernel.dk/linux:
io_uring: check if iowq is killed before queuing
io_uring/register: limit ring resizing to DEFER_TASKRUN
io_uring: Fix registered ring file refcount leak
io_uring: make ctx->timeout_lock a raw spinlock
Linus Torvalds [Fri, 20 Dec 2024 19:09:40 +0000 (11:09 -0800)]
Merge tag 'usb-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt fixes from Greg KH:
"Here are some important, and small, fixes for USB and Thunderbolt
issues that have come up in the -rc releases. And some new device ids
for good measure. Included in here are:
- Much reported xhci bugfix for usb-storage devices (and other
devices as well, tripped me up on a video camera)
- thunderbolt fixes for some small reported issues
- new usb-serial device ids
All of these have been in linux-next this week with no reported issues"
* tag 'usb-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: xhci: fix ring expansion regression in 6.13-rc1
xhci: Turn NEC specific quirk for handling Stop Endpoint errors generic
thunderbolt: Improve redrive mode handling
USB: serial: option: add Telit FE910C04 rmnet compositions
USB: serial: option: add MediaTek T7XX compositions
USB: serial: option: add Netprisma LCUK54 modules for WWAN Ready
USB: serial: option: add MeiG Smart SLM770A
USB: serial: option: add TCL IK512 MBIM & ECM
thunderbolt: Don't display nvm_version unless upgrade supported
thunderbolt: Add support for Intel Panther Lake-M/P
Linus Torvalds [Fri, 20 Dec 2024 19:04:02 +0000 (11:04 -0800)]
Merge tag 'regulator-fix-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"The recently added regulator-uv-survival-time-ms property was renamed
during the review of the series that added it, but unfortunately only
in the DT binding and not in the code that parses the binding.
This brings the code in line with the binding, if someone started
using the original name we can add compat support for it but there's
nothing upstream yet and it's a very niche feature so hopefully not"
* tag 'regulator-fix-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: rename regulator-uv-survival-time-ms according to DT binding
Linus Torvalds [Fri, 20 Dec 2024 18:17:53 +0000 (10:17 -0800)]
Merge tag 'drm-fixes-2024-12-20' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Probably the last pull before Christmas holidays, I'll still be around
for most of the time anyways, nothing too major in here, bunch of
amdgpu and i915 along with a smattering of fixes across the board.
core:
- fix FB dependency
- avoid div by 0 more in vrefresh
- maintainers update
display:
- fix DP tunnel error path
dma-buf:
- fix !DEBUG_FS
sched:
- docs warning fix
panel:
- collection of misc panel fixes
i915:
- Reset engine utilization buffer before registration
- Ensure busyness counter increases motonically
- Accumulate active runtime on gt reset
amdgpu:
- Disable BOCO when CONFIG_HOTPLUG_PCI_PCIE is not enabled
- scheduler job fixes
- IP version check fixes
- devcoredump fix
- GPUVM update fix
- NBIO 2.5 fix
udmabuf:
- fix memory leak on last export
- sealing fixes
* tag 'drm-fixes-2024-12-20' of https://gitlab.freedesktop.org/drm/kernel: (33 commits)
drm/sched: Fix drm_sched_fini() docu generation
accel/ivpu: Fix WARN in ivpu_ipc_send_receive_internal()
accel/ivpu: Fix memory leak in ivpu_mmu_reserved_context_init()
accel/ivpu: Fix general protection fault in ivpu_bo_list()
drm/amdgpu/nbio7.0: fix IP version check
drm/amd: Update strapping for NBIO 2.5.0
drm/amdgpu: Handle NULL bo->tbo.resource (again) in amdgpu_vm_bo_update
drm/amdgpu: fix amdgpu_coredump
drm/amdgpu/smu14.0.2: fix IP version check
drm/amdgpu/gfx12: fix IP version check
drm/amdgpu/mmhub4.1: fix IP version check
drm/amdgpu/nbio7.11: fix IP version check
drm/amdgpu/nbio7.7: fix IP version check
drm/amdgpu: don't access invalid sched
drm/amd: Require CONFIG_HOTPLUG_PCI_PCIE for BOCO
drm: rework FB_CORE dependency
drm/fbdev: Select FB_CORE dependency for fbdev on DMA and TTM
fbdev: Fix recursive dependencies wrt BACKLIGHT_CLASS_DEVICE
i915/guc: Accumulate active runtime on gt reset
i915/guc: Ensure busyness counter increases motonically
...
Linus Torvalds [Fri, 20 Dec 2024 18:13:26 +0000 (10:13 -0800)]
Merge tag 'trace-ringbuffer-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ring-buffer fixes from Steven Rostedt:
- Fix possible overflow of mmapped ring buffer with bad offset
If the mmap() to the ring buffer passes in a start address that is
passed the end of the mmapped file, it is not caught and a
slab-out-of-bounds is triggered.
Add a check to make sure the start address is within the bounds
- Do not use TP_printk() to boot mapped ring buffers
As a boot mapped ring buffer's data may have pointers that map to the
previous boot's memory map, it is unsafe to allow the TP_printk() to
be used to read the boot mapped buffer's events. If a TP_printk()
points to a static string from within the kernel it will not match
the current kernel mapping if KASLR is active, and it can fault.
Have it simply print out the raw fields.
* tag 'trace-ringbuffer-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
trace/ring-buffer: Do not use TP_printk() formatting for boot mapped buffers
ring-buffer: Fix overflow in __rb_map_vma
Zijian Zhang [Tue, 10 Dec 2024 01:20:39 +0000 (01:20 +0000)]
tcp_bpf: Add sk_rmem_alloc related logic for tcp_bpf ingress redirection
When we do sk_psock_verdict_apply->sk_psock_skb_ingress, an sk_msg will
be created out of the skb, and the rmem accounting of the sk_msg will be
handled by the skb.
For skmsgs in __SK_REDIRECT case of tcp_bpf_send_verdict, when redirecting
to the ingress of a socket, although we sk_rmem_schedule and add sk_msg to
the ingress_msg of sk_redir, we do not update sk_rmem_alloc. As a result,
except for the global memory limit, the rmem of sk_redir is nearly
unlimited. Thus, add sk_rmem_alloc related logic to limit the recv buffer.
Since the function sk_msg_recvmsg and __sk_psock_purge_ingress_msg are
used in these two paths. We use "msg->skb" to test whether the sk_msg is
skb backed up. If it's not, we shall do the memory accounting explicitly.
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241210012039.1669389-3-zijianzhang@bytedance.com
Cong Wang [Tue, 10 Dec 2024 01:20:38 +0000 (01:20 +0000)]
tcp_bpf: Charge receive socket buffer in bpf_tcp_ingress()
When bpf_tcp_ingress() is called, the skmsg is being redirected to the
ingress of the destination socket. Therefore, we should charge its
receive socket buffer, instead of sending socket buffer.
Because sk_rmem_schedule() tests pfmemalloc of skb, we need to
introduce a wrapper and call it for skmsg.
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241210012039.1669389-2-zijianzhang@bytedance.com
We are seeing a sparse warning in gcs_restore_signal():
arch/arm64/kernel/signal.c:1054:9: sparse: sparse: cast removes address space '__user' of expression
when storing the final GCSPR_EL0 value back into the register, caused by
the fact that write_sysreg_s() casts the value it writes to a u64 which
sparse sees as discarding the __userness of the pointer.
Avoid this by treating the address as an integer, casting to a pointer only
when using it to write to userspace.
While we're at it also inline gcs_signal_cap_valid() into it's one user
and make equivalent updates to gcs_signal_entry().
Dave Airlie [Thu, 19 Dec 2024 21:13:44 +0000 (07:13 +1000)]
Merge tag 'drm-misc-fixes-2024-12-19' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
drm-misc-fixes for v6.13-rc4:
- udma-buf fixes related to sealing.
- dma-buf build warning fix when debugfs is not enabled.
- Assorted drm/panel fixes.
- Correct error return in drm_dp_tunnel_mgr_create.
- Fix even more divide by zero in drm_mode_vrefresh.
- Fix FBDEV dependencies in Kconfig.
- Documentation fix for drm_sched_fini.
- IVPU NULL pointer, memory leak and WARN fix.
Pavel Begunkov [Thu, 19 Dec 2024 19:52:58 +0000 (19:52 +0000)]
io_uring: check if iowq is killed before queuing
task work can be executed after the task has gone through io_uring
termination, whether it's the final task_work run or the fallback path.
In this case, task work will find ->io_wq being already killed and
null'ed, which is a problem if it then tries to forward the request to
io_queue_iowq(). Make io_queue_iowq() fail requests in this case.
Note that it also checks PF_KTHREAD, because the user can first close
a DEFER_TASKRUN ring and shortly after kill the task, in which case
->iowq check would race.
Bharath SM [Thu, 19 Dec 2024 17:58:50 +0000 (23:28 +0530)]
smb: fix bytes written value in /proc/fs/cifs/Stats
With recent netfs apis changes, the bytes written
value was not getting updated in /proc/fs/cifs/Stats.
Fix this by updating tcon->bytes in write operations.
Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib") Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
* tag 'net-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits)
net: mctp: handle skb cleanup on sock_queue failures
net: mdiobus: fix an OF node reference leak
octeontx2-pf: fix error handling of devlink port in rvu_rep_create()
octeontx2-pf: fix netdev memory leak in rvu_rep_create()
psample: adjust size if rate_as_probability is set
netdev-genl: avoid empty messages in queue dump
net: dsa: restore dsa_software_vlan_untag() ability to operate on VLAN-untagged traffic
selftests: openvswitch: fix tcpdump execution
net: usb: qmi_wwan: add Quectel RG255C
net: phy: avoid undefined behavior in *_led_polarity_set()
netfilter: ipset: Fix for recursive locking warning
ipvs: Fix clamp() of ip_vs_conn_tab on small memory systems
can: m_can: fix missed interrupts with m_can_pci
can: m_can: set init flag earlier in probe
rtnetlink: Try the outer netns attribute in rtnl_get_peer_net().
net: netdevsim: fix nsim_pp_hold_write()
idpf: trigger SW interrupt when exiting wb_on_itr mode
idpf: add support for SW triggered interrupts
qed: fix possible uninit pointer read in qed_mcp_nvm_info_populate()
net: ethernet: bgmac-platform: fix an OF node reference leak
...
Linus Torvalds [Thu, 19 Dec 2024 16:53:51 +0000 (08:53 -0800)]
Merge tag 'mmc-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
- mtk-sd: Cleanup the wakeup configuration in error/remove-path
- sdhci-tegra: Correct quirk for ADMA2 length
* tag 'mmc-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: mtk-sd: disable wakeup in .remove() and in the error path of .probe()
mmc: sdhci-tegra: Remove SDHCI_QUIRK_BROKEN_ADMA_ZEROLEN_DESC quirk
Linus Torvalds [Thu, 19 Dec 2024 16:50:05 +0000 (08:50 -0800)]
Merge tag 'pwm/for-6.13-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux
Pull pwm fix from Uwe Kleine-König:
"Fix regression in pwm-stm32 driver when converting to new waveform
support
Fabrice Gasnier found and fixed a regression I introduced with
v6.13-rc1 when converting the stm32 pwm driver to support the new
waveform stuff. On some hardware variants this completely broke the
driver"
* tag 'pwm/for-6.13-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
pwm: stm32: Fix complementary output in round_waveform_tohw()
Linus Torvalds [Thu, 19 Dec 2024 16:45:37 +0000 (08:45 -0800)]
Merge tag 'v6.13-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Two fixes for better handling maximum outstanding requests
- Fix simultaneous negotiate protocol race
* tag 'v6.13-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: conn lock to serialize smb2 negotiate
ksmbd: fix broken transfers when exceeding max simultaneous operations
ksmbd: count all requests in req_running counter
Lukas Wunner [Tue, 17 Dec 2024 09:51:02 +0000 (10:51 +0100)]
PCI/bwctrl: Enable only if more than one speed is supported
If a PCIe port only supports a single speed, enabling bandwidth control
is pointless: There's no need to monitor autonomous speed changes, nor
can the speed be changed.
Not enabling it saves a small amount of memory and compute resources,
but also fixes a boot hang reported by Niklas: It occurs when enabling
bandwidth control on Downstream Ports of Intel JHL7540 "Titan Ridge 2018"
Thunderbolt controllers. The ports only support 2.5 GT/s in accordance
with USB4 v2 sec 11.2.1, so the present commit works around the issue.
PCIe r6.2 sec 8.2.1 prescribes that:
"A device must support 2.5 GT/s and is not permitted to skip support
for any data rates between 2.5 GT/s and the highest supported rate."
Consequently, bandwidth control is currently only disabled if a port
doesn't support higher speeds than 2.5 GT/s. However the Implementation
Note in PCIe r6.2 sec 7.5.3.18 cautions:
"It is strongly encouraged that software primarily utilize the
Supported Link Speeds Vector instead of the Max Link Speed field,
so that software can determine the exact set of supported speeds on
current and future hardware. This can avoid software being confused
if a future specification defines Links that do not require support
for all slower speeds."
In other words, future revisions of the PCIe Base Spec may allow gaps
in the Supported Link Speeds Vector. To be future-proof, don't just
check whether speeds above 2.5 GT/s are supported, but rather check
whether *more than one* speed is supported.
Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller") Closes: https://lore.kernel.org/r/db8e457fcd155436449b035e8791a8241b0df400.camel@kernel.org Link: https://lore.kernel.org/r/3564908a9c99fc0d2a292473af7a94ebfc8f5820.1734428762.git.lukas@wunner.de Reported-by: Niklas Schnelle <niks@kernel.org> Tested-by: Niklas Schnelle <niks@kernel.org> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Jonathan Cameron <Jonthan.Cameron@huawei.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Lukas Wunner [Tue, 17 Dec 2024 09:51:01 +0000 (10:51 +0100)]
PCI: Honor Max Link Speed when determining supported speeds
The Supported Link Speeds Vector in the Link Capabilities 2 Register
indicates the *supported* link speeds. The Max Link Speed field in the
Link Capabilities Register indicates the *maximum* of those speeds.
pcie_get_supported_speeds() neglects to honor the Max Link Speed field and
will thus incorrectly deem higher speeds as supported. Fix it.
One user-visible issue addressed here is an incorrect value in the sysfs
attribute "max_link_speed".
But the main motivation is a boot hang reported by Niklas: Intel JHL7540
"Titan Ridge 2018" Thunderbolt controllers supports 2.5-8 GT/s speeds,
but indicate 2.5 GT/s as maximum. Ilpo recalls seeing this on more
devices. It can be explained by the controller's Downstream Ports
supporting 8 GT/s if an Endpoint is attached, but limiting to 2.5 GT/s
if the port interfaces to a PCIe Adapter, in accordance with USB4 v2
sec 11.2.1:
"This section defines the functionality of an Internal PCIe Port that
interfaces to a PCIe Adapter. [...]
The Logical sub-block shall update the PCIe configuration registers
with the following characteristics: [...]
Max Link Speed field in the Link Capabilities Register set to 0001b
(data rate of 2.5 GT/s only).
Note: These settings do not represent actual throughput. Throughput
is implementation specific and based on the USB4 Fabric performance."
The present commit is not sufficient on its own to fix Niklas' boot hang,
but it is a prerequisite: A subsequent commit will fix the boot hang by
enabling bandwidth control only if more than one speed is supported.
The GENMASK() macro used herein specifies 0 as lowest bit, even though
the Supported Link Speeds Vector ends at bit 1. This is done on purpose
to avoid a GENMASK(0, 1) macro if Max Link Speed is zero. That macro
would be invalid as the lowest bit is greater than the highest bit.
Ilpo has witnessed a zero Max Link Speed on Root Complex Integrated
Endpoints in particular, so it does occur in practice. (The Link
Capabilities Register is optional on RCiEPs per PCIe r6.2 sec 7.5.3.)
Fixes: d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds") Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org Link: https://lore.kernel.org/r/fe03941e3e1cc42fb9bf4395e302bff53ee2198b.1734428762.git.lukas@wunner.de Reported-by: Niklas Schnelle <niks@kernel.org> Tested-by: Niklas Schnelle <niks@kernel.org> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Jens Axboe [Thu, 19 Dec 2024 16:32:26 +0000 (09:32 -0700)]
io_uring/register: limit ring resizing to DEFER_TASKRUN
With DEFER_TASKRUN, we know the ring can't be both waited upon and
resized at the same time. This is important for CQ resizing. Allowing SQ
ring resizing is more trivial, but isn't the interesting use case. Hence
limit ring resizing in general to DEFER_TASKRUN only for now. This isn't
a huge problem as CQ ring resizing is generally the most useful on
networking type of workloads where it can be hard to size the ring
appropriately upfront, and those should be using DEFER_TASKRUN for
better performance.
Enzo Matsumiya [Tue, 10 Dec 2024 21:15:12 +0000 (18:15 -0300)]
smb: client: fix TCP timers deadlock after rmmod
Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.")
fixed a netns UAF by manually enabled socket refcounting
(sk->sk_net_refcnt=1 and sock_inuse_add(net, 1)).
The reason the patch worked for that bug was because we now hold
references to the netns (get_net_track() gets a ref internally)
and they're properly released (internally, on __sk_destruct()),
but only because sk->sk_net_refcnt was set.
Problem:
(this happens regardless of CONFIG_NET_NS_REFCNT_TRACKER and regardless
if init_net or other)
Setting sk->sk_net_refcnt=1 *manually* and *after* socket creation is not
only out of cifs scope, but also technically wrong -- it's set conditionally
based on user (=1) vs kernel (=0) sockets. And net/ implementations
seem to base their user vs kernel space operations on it.
e.g. upon TCP socket close, the TCP timers are not cleared because
sk->sk_net_refcnt=1:
(cf. commit 151c9c724d05 ("tcp: properly terminate timers for kernel sockets"))
net/ipv4/tcp.c:
void tcp_close(struct sock *sk, long timeout)
{
lock_sock(sk);
__tcp_close(sk, timeout);
release_sock(sk);
if (!sk->sk_net_refcnt)
inet_csk_clear_xmit_timers_sync(sk);
sock_put(sk);
}
Which will throw a lockdep warning and then, as expected, deadlock on
tcp_write_timer().
A way to reproduce this is by running the reproducer from ef7134c7fc48
and then 'rmmod cifs'. A few seconds later, the deadlock/lockdep
warning shows up.
Fix:
We shouldn't mess with socket internals ourselves, so do not set
sk_net_refcnt manually.
Also change __sock_create() to sock_create_kern() for explicitness.
As for non-init_net network namespaces, we deal with it the best way
we can -- hold an extra netns reference for server->ssocket and drop it
when it's released. This ensures that the netns still exists whenever
we need to create/destroy server->ssocket, but is not directly tied to
it.
Fixes: ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.") Cc: stable@vger.kernel.org Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
Bagas Sanjaya [Tue, 17 Dec 2024 03:49:15 +0000 (10:49 +0700)]
drm/sched: Fix drm_sched_fini() docu generation
Commit baf4afc5831438 ("drm/sched: Improve teardown documentation")
added a list of drm_sched_fini()'s problems. The list triggers htmldocs
warning (but renders correctly in htmldocs output):
Jerome Marchand [Wed, 18 Dec 2024 17:57:24 +0000 (18:57 +0100)]
selftests/bpf: Fix compilation error in get_uprobe_offset()
In get_uprobe_offset(), the call to procmap_query() use the constant
PROCMAP_QUERY_VMA_EXECUTABLE, even if PROCMAP_QUERY is not defined.
Define PROCMAP_QUERY_VMA_EXECUTABLE when PROCMAP_QUERY isn't.
Fixes: 4e9e07603ecd ("selftests/bpf: make use of PROCMAP_QUERY ioctl if available") Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20241218175724.578884-1-jmarchan@redhat.com
accel/ivpu: Fix WARN in ivpu_ipc_send_receive_internal()
Move pm_runtime_set_active() to ivpu_pm_init() so when
ivpu_ipc_send_receive_internal() is executed before ivpu_pm_enable()
it already has correct runtime state, even if last resume was
not successful.
Tiezhu Yang [Thu, 19 Dec 2024 11:15:06 +0000 (19:15 +0800)]
selftests/bpf: Use asm constraint "m" for LoongArch
Currently, LoongArch LLVM does not support the constraint "o" and no plan
to support it, it only supports the similar constraint "m", so change the
constraints from "nor" in the "else" case to arch-specific "nmr" to avoid
the build error such as "unexpected asm memory constraint" for LoongArch.
Ahmad Fatoum [Wed, 18 Dec 2024 19:54:53 +0000 (20:54 +0100)]
regulator: rename regulator-uv-survival-time-ms according to DT binding
The regulator bindings don't document regulator-uv-survival-time-ms, but
the more descriptive regulator-uv-less-critical-window-ms instead.
Looking back at v3[1] and v4[2] of the series adding the support,
the property was indeed renamed between these patch series, but
unfortunately the rename only made it into the DT bindings with the
driver code still using the old name.
Let's therefore rename the property in the driver code to follow suit.
This will break backwards compatibility, but there are no upstream
device trees using the property and we never documented the old name
of the property anyway. ¯\_(ツ)_/¯"
Jeremy Kerr [Wed, 18 Dec 2024 03:53:01 +0000 (11:53 +0800)]
net: mctp: handle skb cleanup on sock_queue failures
Currently, we don't use the return value from sock_queue_rcv_skb, which
means we may leak skbs if a message is not successfully queued to a
socket.
Instead, ensure that we're freeing the skb where the sock hasn't
otherwise taken ownership of the skb by adding checks on the
sock_queue_rcv_skb() to invoke a kfree on failure.
In doing so, rather than using the 'rc' value to trigger the
kfree_skb(), use the skb pointer itself, which is more explicit.
Also, add a kunit test for the sock delivery failure cases.
Joe Hattori [Wed, 18 Dec 2024 03:51:06 +0000 (12:51 +0900)]
net: mdiobus: fix an OF node reference leak
fwnode_find_mii_timestamper() calls of_parse_phandle_with_fixed_args()
but does not decrement the refcount of the obtained OF node. Add an
of_node_put() call before returning from the function.
This bug was detected by an experimental static analysis tool that I am
developing.
Fixes: bc1bee3b87ee ("net: mdiobus: Introduce fwnode_mdiobus_register_phy()") Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241218035106.1436405-1-joe@pf.is.s.u-tokyo.ac.jp Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 19 Dec 2024 08:55:21 +0000 (09:55 +0100)]
Merge tag 'nf-24-12-19' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following series contains two fixes for Netfilter/IPVS:
1) Possible build failure in IPVS on systems with less than 512MB
memory due to incorrect use of clamp(), from David Laight.
2) Fix bogus lockdep nesting splat with ipset list:set type,
from Phil Sutter.
netfilter pull request 24-12-19
* tag 'nf-24-12-19' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: ipset: Fix for recursive locking warning
ipvs: Fix clamp() of ip_vs_conn_tab on small memory systems
====================
Jakub Kicinski [Wed, 18 Dec 2024 02:25:08 +0000 (18:25 -0800)]
netdev-genl: avoid empty messages in queue dump
Empty netlink responses from do() are not correct (as opposed to
dump() where not dumping anything is perfectly fine).
We should return an error if the target object does not exist,
in this case if the netdev is down it has no queues.
Fixes: 6b6171db7fc8 ("netdev-genl: Add netlink framework functions for queue") Reported-by: syzbot+0a884bc2d304ce4af70f@syzkaller.appspotmail.com Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20241218022508.815344-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Mon, 16 Dec 2024 13:50:59 +0000 (15:50 +0200)]
net: dsa: restore dsa_software_vlan_untag() ability to operate on VLAN-untagged traffic
Robert Hodaszi reports that locally terminated traffic towards
VLAN-unaware bridge ports is broken with ocelot-8021q. He is describing
the same symptoms as for commit 1f9fc48fd302 ("net: dsa: sja1105: fix
reception from VLAN-unaware bridges").
For context, the set merged as "VLAN fixes for Ocelot driver":
https://lore.kernel.org/netdev/20240815000707.2006121-1-vladimir.oltean@nxp.com/
was developed in a slightly different form earlier this year, in January.
Initially, the switch was unconditionally configured to set OCELOT_ES0_TAG
when using ocelot-8021q, regardless of port operating mode.
This led to the situation where VLAN-unaware bridge ports would always
push their PVID - see ocelot_vlan_unaware_pvid() - a negligible value
anyway - into RX packets. To strip this in software, we would have needed
DSA to know what private VID the switch chose for VLAN-unaware bridge
ports, and pushed into the packets. This was implemented downstream, and
a remnant of it remains in the form of a comment mentioning
ds->ops->get_private_vid(), as something which would maybe need to be
considered in the future.
However, for upstream, it was deemed inappropriate, because it would
mean introducing yet another behavior for stripping VLAN tags from
VLAN-unaware bridge ports, when one already existed (ds->untag_bridge_pvid).
The latter has been marked as obsolete along with an explanation why it
is logically broken, but still, it would have been confusing.
So, for upstream, felix_update_tag_8021q_rx_rule() was developed, which
essentially changed the state of affairs from "Felix with ocelot-8021q
delivers all packets as VLAN-tagged towards the CPU" into "Felix with
ocelot-8021q delivers all packets from VLAN-aware bridge ports towards
the CPU". This was done on the premise that in VLAN-unaware mode,
there's nothing useful in the VLAN tags, and we can avoid introducing
ds->ops->get_private_vid() in the DSA receive path if we configure the
switch to not push those VLAN tags into packets in the first place.
Unfortunately, and this is when the trainwreck started, the selftests
developed initially and posted with the series were not re-ran.
dsa_software_vlan_untag() was initially written given the assumption
that users of this feature would send _all_ traffic as VLAN-tagged.
It was only partially adapted to the new scheme, by removing
ds->ops->get_private_vid(), which also used to be necessary in
standalone ports mode.
Where the trainwreck became even worse is that I had a second opportunity
to think about this, when the dsa_software_vlan_untag() logic change
initially broke sja1105, in commit 1f9fc48fd302 ("net: dsa: sja1105: fix
reception from VLAN-unaware bridges"). I did not connect the dots that
it also breaks ocelot-8021q, for pretty much the same reason that not
all received packets will be VLAN-tagged.
To be compatible with the optimized Felix control path which runs
felix_update_tag_8021q_rx_rule() to only push VLAN tags when useful (in
VLAN-aware mode), we need to restore the old dsa_software_vlan_untag()
logic. The blamed commit introduced the assumption that
dsa_software_vlan_untag() will see only VLAN-tagged packets, assumption
which is false. What corrupts RX traffic is the fact that we call
skb_vlan_untag() on packets which are not VLAN-tagged in the first
place.
Fixes: 93e4649efa96 ("net: dsa: provide a software untagging function on RX for VLAN-aware bridges") Reported-by: Robert Hodaszi <robert.hodaszi@digi.com> Closes: https://lore.kernel.org/netdev/20241215163334.615427-1-robert.hodaszi@digi.com/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241216135059.1258266-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 19 Dec 2024 03:20:20 +0000 (19:20 -0800)]
Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
idpf: trigger SW interrupt when exiting wb_on_itr mode
Joshua Hay says:
This patch series introduces SW triggered interrupt support for idpf,
then uses said interrupt to fix a race condition between completion
writebacks and re-enabling interrupts.
* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
idpf: trigger SW interrupt when exiting wb_on_itr mode
idpf: add support for SW triggered interrupts
====================
Adrian Moreno [Tue, 17 Dec 2024 21:16:51 +0000 (22:16 +0100)]
selftests: openvswitch: fix tcpdump execution
Fix the way tcpdump is executed by:
- Using the right variable for the namespace. Currently the use of the
empty "ns" makes the command fail.
- Waiting until it starts to capture to ensure the interesting traffic
is caught on slow systems.
- Using line-buffered output to ensure logs are available when the test
is paused with "-p". Otherwise the last chunk of data might only be
written when tcpdump is killed.
Jakub Kicinski [Thu, 19 Dec 2024 01:51:38 +0000 (17:51 -0800)]
Merge tag 'linux-can-fixes-for-6.13-20241218' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2024-12-18
There are 2 patches by Matthias Schiffer for the m_can_pci driver that
handles the m_can cores found on the Intel Elkhart Lake processor.
They fix the initialization and the interrupt handling under high CAN
bus load.
* tag 'linux-can-fixes-for-6.13-20241218' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: m_can: fix missed interrupts with m_can_pci
can: m_can: set init flag earlier in probe
====================
Jann Horn [Wed, 18 Dec 2024 16:56:25 +0000 (17:56 +0100)]
io_uring: Fix registered ring file refcount leak
Currently, io_uring_unreg_ringfd() (which cleans up registered rings) is
only called on exit, but __io_uring_free (which frees the tctx in which the
registered ring pointers are stored) is also called on execve (via
begin_new_exec -> io_uring_task_cancel -> __io_uring_cancel ->
io_uring_cancel_generic -> __io_uring_free).
This means: A process going through execve while having registered rings
will leak references to the rings' `struct file`.
Fix it by zapping registered rings on execve(). This is implemented by
moving the io_uring_unreg_ringfd() from io_uring_files_cancel() into its
callee __io_uring_cancel(), which is called from io_uring_task_cancel() on
execve.
This could probably be exploited *on 32-bit kernels* by leaking 2^32
references to the same ring, because the file refcount is stored in a
pointer-sized field and get_file() doesn't have protection against
refcount overflow, just a WARN_ONCE(); but on 64-bit it should have no
impact beyond a memory leak.
Arnd Bergmann [Tue, 17 Dec 2024 08:10:34 +0000 (09:10 +0100)]
net: phy: avoid undefined behavior in *_led_polarity_set()
gcc runs into undefined behavior at the end of the three led_polarity_set()
callback functions if it were called with a zero 'modes' argument and it
just ends the function there without returning from it.
This gets flagged by 'objtool' as a function that continues on
to the next one:
drivers/net/phy/aquantia/aquantia_leds.o: warning: objtool: aqr_phy_led_polarity_set+0xf: can't find jump dest instruction at .text+0x5d9
drivers/net/phy/intel-xway.o: warning: objtool: xway_gphy_led_polarity_set() falls through to next function xway_gphy_config_init()
drivers/net/phy/mxl-gpy.o: warning: objtool: gpy_led_polarity_set() falls through to next function gpy_led_hw_control_get()
There is no point to micro-optimize the behavior here to save a single-digit
number of bytes in the kernel, so just change this to a "return -EINVAL"
as we do when any unexpected bits are set.
Fixes: 1758af47b98c ("net: phy: intel-xway: add support for PHY LEDs") Fixes: 9d55e68b19f2 ("net: phy: aquantia: correctly describe LED polarity override") Fixes: eb89c79c1b8f ("net: phy: mxl-gpy: correctly describe LED polarity") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241217081056.238792-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Phil Sutter [Tue, 17 Dec 2024 19:56:55 +0000 (20:56 +0100)]
netfilter: ipset: Fix for recursive locking warning
With CONFIG_PROVE_LOCKING, when creating a set of type bitmap:ip, adding
it to a set of type list:set and populating it from iptables SET target
triggers a kernel warning:
| WARNING: possible recursive locking detected
| 6.12.0-rc7-01692-g5e9a28f41134-dirty #594 Not tainted
| --------------------------------------------
| ping/4018 is trying to acquire lock:
| ffff8881094a6848 (&set->lock){+.-.}-{2:2}, at: ip_set_add+0x28c/0x360 [ip_set]
|
| but task is already holding lock:
| ffff88811034c048 (&set->lock){+.-.}-{2:2}, at: ip_set_add+0x28c/0x360 [ip_set]
This is a false alarm: ipset does not allow nested list:set type, so the
loop in list_set_kadd() can never encounter the outer set itself. No
other set type supports embedded sets, so this is the only case to
consider.
To avoid the false report, create a distinct lock class for list:set
type ipset locks.
Fixes: f830837f0eed ("netfilter: ipset: list:set set type support") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
David Laight [Sat, 14 Dec 2024 17:30:53 +0000 (17:30 +0000)]
ipvs: Fix clamp() of ip_vs_conn_tab on small memory systems
The 'max_avail' value is calculated from the system memory
size using order_base_2().
order_base_2(x) is defined as '(x) ? fn(x) : 0'.
The compiler generates two copies of the code that follows
and then expands clamp(max, min, PAGE_SHIFT - 12) (11 on 32bit).
This triggers a compile-time assert since min is 5.
In reality a system would have to have less than 512MB memory
for the bounds passed to clamp to be reversed.
Swap the order of the arguments to clamp() to avoid the warning.
Replace the clamp_val() on the line below with clamp().
clamp_val() is just 'an accident waiting to happen' and not needed here.
Detected by compile time checks added to clamp(), specifically:
minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Closes: https://lore.kernel.org/all/CA+G9fYsT34UkGFKxus63H6UVpYi5GRZkezT9MRLfAbM3f6ke0g@mail.gmail.com/ Fixes: 4f325e26277b ("ipvs: dynamically limit the connection hash table") Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: David Laight <david.laight@aculab.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Linus Torvalds [Wed, 18 Dec 2024 22:17:21 +0000 (14:17 -0800)]
Merge tag 'for-6.13-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- tree-checker catches invalid number of inline extent references
- zoned mode fixes:
- enhance zone append IO command so it also detects emulated writes
- handle bio splitting at sectorsize boundary
- when deleting a snapshot, fix a condition for visiting nodes in reloc
trees
* tag 'for-6.13-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: tree-checker: reject inline extent items with 0 ref count
btrfs: split bios to the fs sector size boundary
btrfs: use bio_is_zone_append() in the completion handler
btrfs: fix improper generation check in snapshot delete
Linus Torvalds [Wed, 18 Dec 2024 20:52:57 +0000 (12:52 -0800)]
Merge tag 'cxl-fixes-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fixes from Ira Weiny:
- prevent probe failure when non-critical RAS unmasking fails
- fix CXL 1.1 link status sysfs attribute
- fix 4 way (and greater) switch interleave region creation
* tag 'cxl-fixes-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/region: Fix region creation for greater than x2 switches
cxl/pci: Check dport->regs.rcd_pcie_cap availability before accessing
cxl/pci: Fix potential bogus return value upon successful probing
Alex Deucher [Thu, 12 Dec 2024 21:49:20 +0000 (16:49 -0500)]
drm/amdgpu/nbio7.0: fix IP version check
Use the helper function rather than reading it directly.
Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0ec43fbece784215d3c4469973e4556d70bce915) Cc: stable@vger.kernel.org
Huacai Chen [Tue, 17 Dec 2024 07:37:04 +0000 (15:37 +0800)]
ACPI: EC: Enable EC support on LoongArch by default
Commit a6021aa24f6417416d933 ("ACPI: EC: make EC support compile-time
conditional") only enable ACPI_EC on X86 by default, but the embedded
controller is also widely used on LoongArch laptops so we also enable
ACPI_EC for LoongArch.
The laptop driver cannot work without EC, so also update the dependency
of LOONGSON_LAPTOP to let it depend on APCI_EC.
Fixes: a6021aa24f6417416d933 ("ACPI: EC: make EC support compile-time conditional") Reported-by: Xiaotian Wu <wuxiaotian@loongson.cn> Tested-by: Binbin Zhou <zhoubinbin@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Link: https://patch.msgid.link/20241217073704.3339587-1-chenhuacai@loongson.cn
[ rjw: Added Fixes: ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Steven Rostedt [Wed, 18 Dec 2024 19:15:07 +0000 (14:15 -0500)]
trace/ring-buffer: Do not use TP_printk() formatting for boot mapped buffers
The TP_printk() of a TRACE_EVENT() is a generic printf format that any
developer can create for their event. It may include pointers to strings
and such. A boot mapped buffer may contain data from a previous kernel
where the strings addresses are different.
One solution is to copy the event content and update the pointers by the
recorded delta, but a simpler solution (for now) is to just use the
print_fields() function to print these events. The print_fields() function
just iterates the fields and prints them according to what type they are,
and ignores the TP_printk() format from the event itself.
To understand the difference, when printing via TP_printk() the output
looks like this:
Add a check before the calculation to avoid this problem.
syzbot reported this as a slab-out-of-bounds in __rb_map_vma:
BUG: KASAN: slab-out-of-bounds in __rb_map_vma+0x9ab/0xae0 kernel/trace/ring_buffer.c:7058
Read of size 8 at addr ffff8880767dd2b8 by task syz-executor187/5836
Linus Torvalds [Wed, 18 Dec 2024 18:03:33 +0000 (10:03 -0800)]
Merge tag 'trace-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
"Replace trace_check_vprintf() with test_event_printk() and
ignore_event()
The function test_event_printk() checks on boot up if the trace event
printf() formats dereference any pointers, and if they do, it then
looks at the arguments to make sure that the pointers they dereference
will exist in the event on the ring buffer. If they do not, it issues
a WARN_ON() as it is a likely bug.
But this isn't the case for the strings that can be dereferenced with
"%s", as some trace events (notably RCU and some IPI events) save a
pointer to a static string in the ring buffer. As the string it points
to lives as long as the kernel is running, it is not a bug to
reference it, as it is guaranteed to be there when the event is read.
But it is also possible (and a common bug) to point to some allocated
string that could be freed before the trace event is read and the
dereference is to bad memory. This case requires a run time check.
The previous way to handle this was with trace_check_vprintf() that
would process the printf format piece by piece and send what it didn't
care about to vsnprintf() to handle arguments that were not strings.
This kept it from having to reimplement vsnprintf(). But it relied on
va_list implementation and for architectures that copied the va_list
and did not pass it by reference, it wasn't even possible to do this
check and it would be skipped. As 64bit x86 passed va_list by
reference, most events were tested and this kept out bugs where
strings would have been dereferenced after being freed.
Instead of relying on the implementation of va_list, extend the boot
up test_event_printk() function to validate all the "%s" strings that
can be validated at boot, and for the few events that point to strings
outside the ring buffer, flag both the event and the field that is
dereferenced as "needs_test". Then before the event is printed, a call
to ignore_event() is made, and if the event has the flag set, it
iterates all its fields and for every field that is to be tested, it
will read the pointer directly from the event in the ring buffer and
make sure that it is valid. If the pointer is not valid, it will print
a WARN_ON(), print out to the trace that the event has unsafe memory
and ignore the print format.
With this new update, the trace_check_vprintf() can be safely removed
and now all events can be verified regardless of architecture"
* tag 'trace-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Check "%s" dereference via the field and not the TP_printk format
tracing: Add "%s" check in test_event_printk()
tracing: Add missing helper functions in event pointer dereference check
tracing: Fix test_event_printk() to process entire print argument
Michel Dänzer [Tue, 17 Dec 2024 17:22:56 +0000 (18:22 +0100)]
drm/amdgpu: Handle NULL bo->tbo.resource (again) in amdgpu_vm_bo_update
Third time's the charm, I hope?
Fixes: d3116756a710 ("drm/ttm: rename bo->mem and make it a pointer") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3837 Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 695c2c745e5dff201b75da8a1d237ce403600d04) Cc: stable@vger.kernel.org
Christian König [Thu, 12 Dec 2024 15:29:18 +0000 (16:29 +0100)]
drm/amdgpu: fix amdgpu_coredump
The VM pointer might already be outdated when that function is called.
Use the PASID instead to gather the information instead.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 57f812d171af4ba233d3ed7c94dfa5b8e92dcc04) Cc: stable@vger.kernel.org
Alex Deucher [Thu, 12 Dec 2024 22:06:26 +0000 (17:06 -0500)]
drm/amdgpu/smu14.0.2: fix IP version check
Use the helper function rather than reading it directly.
Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8f2cd1067afe68372a1723e05e19b68ed187676a) Cc: stable@vger.kernel.org
Alex Deucher [Thu, 12 Dec 2024 22:04:58 +0000 (17:04 -0500)]
drm/amdgpu/gfx12: fix IP version check
Use the helper function rather than reading it directly.
Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f1fd1d0f40272948aa6ab82a3a82ecbbc76dff53) Cc: stable@vger.kernel.org
Alex Deucher [Thu, 12 Dec 2024 22:03:20 +0000 (17:03 -0500)]
drm/amdgpu/mmhub4.1: fix IP version check
Use the helper function rather than reading it directly.
Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 63bfd24088b42c6f55c2096bfc41b50213d419b2) Cc: stable@vger.kernel.org
Alex Deucher [Thu, 12 Dec 2024 22:00:07 +0000 (17:00 -0500)]
drm/amdgpu/nbio7.11: fix IP version check
Use the helper function rather than reading it directly.
Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2c8eeaaa0fe5841ccf07a0eb51b1426f34ef39f7) Cc: stable@vger.kernel.org
Alex Deucher [Thu, 12 Dec 2024 21:47:48 +0000 (16:47 -0500)]
drm/amdgpu/nbio7.7: fix IP version check
Use the helper function rather than reading it directly.
Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 22b9555bc90df22b585bdd1f161b61584b13af51) Cc: stable@vger.kernel.org
Since 2320c9e6a768 ("drm/sched: memset() 'job' in drm_sched_job_init()")
accessing job->base.sched can produce unexpected results as the initialisation
of (*job)->base.sched done in amdgpu_job_alloc is overwritten by the
memset.
This commit fixes an issue when a CS would fail validation and would
be rejected after job->num_ibs is incremented. In this case,
amdgpu_ib_free(ring->adev, ...) will be called, which would crash the
machine because the ring value is bogus.
To fix this, pass a NULL pointer to amdgpu_ib_free(): we can do this
because the device is actually not used in this function.
The next commit will remove the ring argument completely.
Fixes: 2320c9e6a768 ("drm/sched: memset() 'job' in drm_sched_job_init()") Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2ae520cb12831d264ceb97c61f72c59d33c0dbd7)
If the kernel hasn't been compiled with PCIe hotplug support this
can lead to problems with dGPUs that use BOCO because they effectively
drop off the bus.
To prevent issues, disable BOCO support when compiled without PCIe hotplug.
Reported-by: Gabriel Marcano <gabemarcano@yahoo.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707#note_2696862 Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20241211155601.3585256-1-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1ad5bdc28bafa66db0f041cc6cdd278a80426aae)
Linus Torvalds [Wed, 18 Dec 2024 17:55:55 +0000 (09:55 -0800)]
Merge tag 'hyperv-fixes-signed-20241217' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Various fixes to Hyper-V tools in the kernel tree (Dexuan Cui, Olaf
Hering, Vitaly Kuznetsov)
- Fix a bug in the Hyper-V TSC page based sched_clock() (Naman Jain)
- Two bug fixes in the Hyper-V utility functions (Michael Kelley)
- Convert open-coded timeouts to secs_to_jiffies() in Hyper-V drivers
(Easwar Hariharan)
* tag 'hyperv-fixes-signed-20241217' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
tools/hv: reduce resource usage in hv_kvp_daemon
tools/hv: add a .gitignore file
tools/hv: reduce resouce usage in hv_get_dns_info helper
hv/hv_kvp_daemon: Pass NIC name to hv_get_dns_info as well
Drivers: hv: util: Avoid accessing a ringbuffer not initialized yet
Drivers: hv: util: Don't force error code to ENODEV in util_probe()
tools/hv: terminate fcopy daemon if read from uio fails
drivers: hv: Convert open-coded timeouts to secs_to_jiffies()
tools: hv: change permissions of NetworkManager configuration file
x86/hyperv: Fix hv tsc page based sched_clock for hibernation
tools: hv: Fix a complier warning in the fcopy uio daemon
Juergen Gross [Wed, 18 Dec 2024 08:02:28 +0000 (09:02 +0100)]
x86/static-call: fix 32-bit build
In 32-bit x86 builds CONFIG_STATIC_CALL_INLINE isn't set, leading to
static_call_initialized not being available.
Define it as "0" in that case.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 0ef8047b737d ("x86/static-call: provide a way to do very early static-call updates") Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'amd-pstate-v6.13-2024-12-11' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux
Merge amd-pstate driver fixes for 6.13-rc4 from Mario Liminciello:
"Fix a problem where systems without preferred cores were
misdetecting preferred cores.
Fix issues with with boost numerator handling leading to
inconsistently programmed CPPC max performance values."
* tag 'amd-pstate-v6.13-2024-12-11' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux:
cpufreq/amd-pstate: Use boost numerator for upper bound of frequencies
cpufreq/amd-pstate: Store the boost numerator as highest perf again
cpufreq/amd-pstate: Detect preferred core support before driver registration
Commit be26ba96421a ("block: Fix potential deadlock while freezing queue and
acquiring sysfs_loc") actually reverts commit 22465bbac53c ("blk-mq: move cpuhp
callback registering out of q->sysfs_lock"), and causes the original resctrl
lockdep warning.
So revert it and we need to fix the issue in another way.
Cc: Nilay Shroff <nilay@linux.ibm.com> Fixes: be26ba96421a ("block: Fix potential deadlock while freezing queue and acquiring sysfs_loc") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241218101617.3275704-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Daniel Lezcano [Mon, 16 Dec 2024 21:26:44 +0000 (22:26 +0100)]
thermal/thresholds: Fix boundaries and detection routine
The current implementation does not work if the thermal zone is
interrupt driven only.
The boundaries are not correctly checked and computed as it happens
only when the temperature is increasing or decreasing.
The problem arises because the routine to detect when we cross a
threshold is correlated with the computation of the boundaries. We
assume we have to recompute the boundaries when a threshold is crossed
but actually we should do that even if the it is not the case.
Mixing the boundaries computation and the threshold detection for the
sake of optimizing the routine is much more complex as it appears
intuitively and prone to errors.
This fix separates the boundaries computation and the threshold
crossing detection into different routines. The result is a code much
more simple to understand, thus easier to maintain.
The drawback is we browse the thresholds list several time but we can
consider that as neglictible because that happens when the temperature
is updated. There are certainly some aeras to improve in the
temperature update routine but it would be not adequate as this change
aims to fix the thresholds for v6.13.
Fixes: 445936f9e258 ("thermal: core: Add user thresholds support") Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org> # rock5b, Lenovo x13s Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20241216212644.1145122-1-daniel.lezcano@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fabrice Gasnier [Tue, 17 Dec 2024 15:00:21 +0000 (16:00 +0100)]
pwm: stm32: Fix complementary output in round_waveform_tohw()
When the timer supports complementary output, the CCxNE bit must be set
additionally to the CCxE bit. So to not overwrite the latter use |=
instead of = to set the former.
The interrupt line of PCI devices is interpreted as edge-triggered,
however the interrupt signal of the m_can controller integrated in Intel
Elkhart Lake CPUs appears to be generated level-triggered.
Consider the following sequence of events:
- IR register is read, interrupt X is set
- A new interrupt Y is triggered in the m_can controller
- IR register is written to acknowledge interrupt X. Y remains set in IR
As at no point in this sequence no interrupt flag is set in IR, the
m_can interrupt line will never become deasserted, and no edge will ever
be observed to trigger another run of the ISR. This was observed to
result in the TX queue of the EHL m_can to get stuck under high load,
because frames were queued to the hardware in m_can_start_xmit(), but
m_can_finish_tx() was never run to account for their successful
transmission.
On an Elkhart Lake based board with the two CAN interfaces connected to
each other, the following script can reproduce the issue:
ip link set can0 up type can bitrate 1000000
ip link set can1 up type can bitrate 1000000
To fix the issue, repeatedly read and acknowledge interrupts at the
start of the ISR until no interrupt flags are set, so the next incoming
interrupt will also result in an edge on the interrupt line.
While we have received a report that even with this patch, the TX queue
can become stuck under certain (currently unknown) circumstances on the
Elkhart Lake, this patch completely fixes the issue with the above
reproducer, and it is unclear whether the remaining issue has a similar
cause at all.
While an m_can controller usually already has the init flag from a
hardware reset, no such reset happens on the integrated m_can_pci of the
Intel Elkhart Lake. If the CAN controller is found in an active state,
m_can_dev_setup() would fail because m_can_niso_supported() calls
m_can_cccr_update_bits(), which refuses to modify any other configuration
bits when CCCR_INIT is not set.
To avoid this issue, set CCCR_INIT before attempting to modify any other
configuration flags.
rtnetlink: Try the outer netns attribute in rtnl_get_peer_net().
Xiao Liang reported that the cited commit changed netns handling
in newlink() of netkit, veth, and vxcan.
Before the patch, if we don't find a netns attribute in the peer
device attributes, we tried to find another netns attribute in
the outer netlink attributes by passing it to rtnl_link_get_net().