Make sure to drop the reference taken to the iommu platform device when
looking up its driver data during of_xlate().
Note that commit 1a26044954a6 ("iommu/exynos: add missing put_device()
call in exynos_iommu_of_xlate()") fixed the leak in a couple of error
paths, but the reference is still leaking on success.
Fixes: aa759fd376fb ("iommu/exynos: Add callback for initializing devices from device tree") Cc: stable@vger.kernel.org # 4.2: 1a26044954a6 Cc: Yu Kuai <yukuai3@huawei.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
DSP expects the periods to be aligned to fragment sizes, currently
setting up to hw constriants on periods bytes is not going to work
correctly as we can endup with periods sizes aligned to 32 bytes however
not aligned to fragment size.
Update the constriants to use fragment size, and also set at step of
10ms for period size to accommodate DSP requirements of 10ms latency.
A matching Common object post processing instance is normally resused
across multiple streams. However currently we close this on DSP
even though there is a refcount on this copp object, this can result in
below error.
q6routing ab00000.remoteproc:glink-edge:apr:service@8:routing: Found Matching Copp 0x0
qcom-q6adm aprsvc:service:4:8: cmd = 0x10325 return error = 0x2
q6routing ab00000.remoteproc:glink-edge:apr:service@8:routing: DSP returned error[2]
q6routing ab00000.remoteproc:glink-edge:apr:service@8:routing: Found Matching Copp 0x0
qcom-q6adm aprsvc:service:4:8: cmd = 0x10325 return error = 0x2
q6routing ab00000.remoteproc:glink-edge:apr:service@8:routing: DSP returned error[2]
qcom-q6adm aprsvc:service:4:8: cmd = 0x10327 return error = 0x2
qcom-q6adm aprsvc:service:4:8: DSP returned error[2]
qcom-q6adm aprsvc:service:4:8: Failed to close copp -22
qcom-q6adm aprsvc:service:4:8: cmd = 0x10327 return error = 0x2
qcom-q6adm aprsvc:service:4:8: DSP returned error[2]
qcom-q6adm aprsvc:service:4:8: Failed to close copp -22
Fix this by addressing moving the adm_close to copp_kref destructor
callback.
The func_traceonoff_triggers.tc sometimes goes to fail
on my board, Kunpeng-920.
[root@localhost]# ./ftracetest ./test.d/ftrace/func_traceonoff_triggers.tc -l fail.log
=== Ftrace unit tests ===
[1] ftrace - test for function traceon/off triggers [FAIL]
[2] (instance) ftrace - test for function traceon/off triggers [UNSUPPORTED]
I look up the log, and it shows that the md5sum is different between csum1 and csum2.
Link: https://lore.kernel.org/r/20230818013226.2182299-1-zouyipeng@huawei.com Fixes: d87b29179aa0 ("selftests: ftrace: Use md5sum to take less time of checking logs") Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Yipeng Zou <zouyipeng@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The dma_alloc_coherent() allocates a dma-mapped buffer, pbl->pg_arr[i].
The dma_free_coherent() should pass the same buffer to
dma_free_coherent() and not page-aligned.
Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com> Link: https://patch.msgid.link/20251230085121.8023-2-fourier.thomas@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In bnxt_qplib_alloc_init_hwq(), while allocating memory for PDE table
driver incorrectly is using the "pg_size" value passed to the function.
Fixed to use the right value 4K. Also, fixed the allocation size for
PBL table.
The bnxt_re SEND path checks wr->send_flags to enable features such as
IP checksum offload. However, send_flags is a bitmask and may contain
multiple flags (e.g. IB_SEND_SIGNALED | IB_SEND_IP_CSUM), while the
existing code uses a switch() statement that only matches when
send_flags is exactly IB_SEND_IP_CSUM.
As a result, checksum offload is not enabled when additional SEND
flags are present.
Replace the switch() with a bitmask test:
if (wr->send_flags & IB_SEND_IP_CSUM)
This ensures IP checksum offload is enabled correctly when multiple
SEND flags are used.
Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Link: https://patch.msgid.link/20251219093308.2415620-1-alok.a.tiwari@oracle.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
RCFW_COMM_CONS_PCI_BAR_REGION is defined as BAR 2, so checking
!creq_db->reg.bar_id is incorrect and always false.
pci_resource_start() returns the BAR base address, and a value of 0
indicates that the BAR is unassigned. Update the condition to test
bar_base == 0 instead.
This ensures the driver detects and logs an error for an unassigned
RCFW communication BAR.
Fix missing comparison operator for RDMA_NETWORK_ROCE_V1 in the
conditional statement. The constant was used directly instead of
being compared with net_type, causing the condition to always
evaluate to true.
Fixes: 1c15b4f2a42f ("RDMA/core: Modify enum ib_gid_type and enum rdma_network_type") Signed-off-by: Jang Ingyu <ingyujang25@korea.ac.kr> Link: https://patch.msgid.link/20251219041508.1725947-1-ingyujang25@korea.ac.kr Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The page size used for device might in some cases be smaller than
PAGE_SIZE what results in a negative shift when calculating the number of
host pages in PAGE_SIZE for a debug log. Remove the debug line together
with the calculation.
Fixes: 40909f664d27 ("RDMA/efa: Add EFA verbs implementation") Link: https://patch.msgid.link/r/20251210173656.8180-1-mrgolin@amazon.com Reviewed-by: Tom Sela <tomsela@amazon.com> Reviewed-by: Yonatan Nachum <ynachum@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Reviewed-by: Gal Pressman <gal.pressman@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
rose_kill_by_device() collects sockets into a local array[] and then
iterates over them to disconnect sockets bound to a device being brought
down.
The loop mistakenly indexes array[cnt] instead of array[i]. For cnt <
ARRAY_SIZE(array), this reads an uninitialized entry; for cnt ==
ARRAY_SIZE(array), it is an out-of-bounds read. Either case can lead to
an invalid socket pointer dereference and also leaks references taken
via sock_hold().
When a nexthop object is deleted, it is marked as dead and then
fib_table_flush() is called to flush all the routes that are using the
dead nexthop.
The current logic in fib_table_flush() is to only flush error routes
(e.g., blackhole) when it is called as part of network namespace
dismantle (i.e., with flush_all=true). Therefore, error routes are not
flushed when their nexthop object is deleted:
# ip link add name dummy1 up type dummy
# ip nexthop add id 1 dev dummy1
# ip route add 198.51.100.1/32 nhid 1
# ip route add blackhole 198.51.100.2/32 nhid 1
# ip nexthop del id 1
# ip route show
blackhole 198.51.100.2 nhid 1 dev dummy1
As such, they keep holding a reference on the nexthop object which in
turn holds a reference on the nexthop device, resulting in a reference
count leak:
# ip link del dev dummy1
[ 70.516258] unregister_netdevice: waiting for dummy1 to become free. Usage count = 2
Fix by flushing error routes when their nexthop is marked as dead.
IPv6 does not suffer from this problem.
Fixes: 493ced1ac47c ("ipv4: Allow routes to use nexthop objects") Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Closes: https://lore.kernel.org/netdev/d943f806-4da6-4970-ac28-b9373b0e63ac@I-love.SAKURA.ne.jp/ Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20251221144829.197694-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
There exists a kernel oops caused by a BUG_ON(nhead < 0) at
net/core/skbuff.c:2232 in pskb_expand_head().
This bug is triggered as part of the calipso_skbuff_setattr()
routine when skb_cow() is passed headroom > INT_MAX
(i.e. (int)(skb_headroom(skb) + len_delta) < 0).
The root cause of the bug is due to an implicit integer cast in
__skb_cow(). The check (headroom > skb_headroom(skb)) is meant to ensure
that delta = headroom - skb_headroom(skb) is never negative, otherwise
we will trigger a BUG_ON in pskb_expand_head(). However, if
headroom > INT_MAX and delta <= -NET_SKB_PAD, the check passes, delta
becomes negative, and pskb_expand_head() is passed a negative value for
nhead.
Fix the trigger condition in calipso_skbuff_setattr(). Avoid passing
"negative" headroom sizes to skb_cow() within calipso_skbuff_setattr()
by only using skb_cow() to grow headroom.
Fixes: 2917f57b6bc1 ("calipso: Allow the lsm to label the skbuff directly.") Suggested-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Will Rosenberg <whrosenb@asu.edu> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://patch.msgid.link/20251219173637.797418-1-whrosenb@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This patch ensures that the RX ring size (rx_pending) is not
set below the permitted length. This avoids UBSAN
shift-out-of-bounds errors when users passes small or zero
ring sizes via ethtool -G.
The wrappers in include/linux/pci-dma-compat.h should go away.
The patch has been generated with the coccinelle script below and has been
hand modified to replace GFP_ with a correct flag.
It has been compile tested.
When memory is allocated in 'add_card()', GFP_KERNEL can be used because
this flag is already used a few lines above and no lock is taken in the
between.
In async_set_registers(), when usb_submit_urb() fails, the allocated
async_req structure and URB are not freed, causing a memory leak.
The completion callback async_set_reg_cb() is responsible for freeing
these allocations, but it is only called after the URB is successfully
submitted and completes (successfully or with error). If submission
fails, the callback never runs and the memory is leaked.
Fix this by freeing both the URB and the request structure in the error
path when usb_submit_urb() fails.
The problem is in this flow:
1) Port is enabled, queue_id != 0, in qom_list
2) Port gets disabled
-> team_port_disable()
-> team_queue_override_port_del()
-> del (removed from list)
3) Port is disabled, queue_id != 0, not in any list
4) Priority changes
-> team_queue_override_port_prio_changed()
-> checks: port disabled && queue_id != 0
-> calls del - hits the BUG as it is removed already
To fix this, change the check in team_queue_override_port_prio_changed()
so it returns early if port is not enabled.
Reported-by: syzbot+422806e5f4cce722a71f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=422806e5f4cce722a71f Fixes: 6c31ff366c11 ("team: remove synchronize_rcu() called during queue override change") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251212102953.167287-1-jiri@resnulli.us Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The ibm_rtl_init() function searches for the signature but has a pointer
arithmetic error. The loop counter suggests searching at 4-byte intervals
but the implementation only advances by 1 byte per iteration.
Fix by properly advancing the pointer by sizeof(unsigned int) bytes
each iteration.
Reported-by: Yuhao Jiang <danisjiang@gmail.com> Reported-by: Junrui Luo <moonafterrain@outlook.com> Fixes: 35f0ce032b0f ("IBM Real-Time "SMI Free" mode driver -v7") Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Link: https://patch.msgid.link/SYBPR01MB78812D887A92DE3802D0D06EAFA9A@SYBPR01MB7881.ausprd01.prod.outlook.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
A sysfs group is created in msi_init() when old_ec_model is enabled, but
never removed. Remove the msipf_old_attribute_group in that case.
Fixes: 03696e51d75a ("msi-laptop: Disable brightness control for new EC") Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com> Link: https://patch.msgid.link/20251217103617.27668-2-fourier.thomas@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Over the years, syzbot found many ways to crash the kernel
in ip6gre_header() [1].
This involves team or bonding drivers ability to dynamically
change their dev->needed_headroom and/or dev->hard_header_len
In this particular crash mld_newpack() allocated an skb
with a too small reserve/headroom, and by the time mld_sendpack()
was called, syzbot managed to attach an ip6gre device.
Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Reported-by: syzbot+43a2ebcf2a64b1102d64@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/693b002c.a70a0220.33cd7b.0033.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251211173550.2032674-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The openvswitch teardown code will immediately call
ovs_netdev_detach_dev() in response to a NETDEV_UNREGISTER notification.
It will then start the dp_notify_work workqueue, which will later end up
calling the vport destroy() callback. This callback takes the RTNL to do
another ovs_netdev_detach_port(), which in this case is unnecessary.
This causes extra pressure on the RTNL, in some cases leading to
"unregister_netdevice: waiting for XX to become free" warnings on
teardown.
We can straight-forwardly avoid the extra RTNL lock acquisition by
checking the device flags before taking the lock, and skip the locking
altogether if the IFF_OVS_DATAPATH flag has already been unset.
The Aspeed MDIO controller may return incorrect data when a read operation
follows immediately after a write. Due to a controller bug, the subsequent
read can latch stale data, causing the polling logic to terminate earlier
than expected.
To work around this hardware issue, insert a dummy read after each write
operation. This ensures that the next actual read returns the correct
data and prevents premature polling exit.
This workaround has been verified to stabilize MDIO transactions on
affected Aspeed platforms.
Add aspeed_mdio_op() and aseed_mdio_get_data() for register accessing.
aspeed_mdio_op() handles operations, write command to control register,
then check and wait operations is finished (bit 31 is cleared).
aseed_mdio_get_data() fetchs the result value of operation from data
register.
Signed-off-by: Potin Lai <potin.lai@quantatw.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: d1a1a4bade4b ("net: mdio: aspeed: add dummy read to avoid read-after-write issue") Signed-off-by: Sasha Levin <sashal@kernel.org>
As soon as crypto_aead_encrypt is called, the underlying request
may be freed by an asynchronous completion. Thus dereferencing
req->iv after it returns is invalid.
Instead of checking req->iv against info, create a new variable
unaligned_info and use it for that purpose instead.
Fixes: 0a270321dbf9 ("[CRYPTO] seqiv: Add Sequence Number IV Generator") Reported-by: Xiumei Mu <xmu@redhat.com> Reported-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
There are off-by-one bugs when configuring RSS hash key and lookup
table, causing out-of-bounds reads to memory [1] and out-of-bounds
writes to device registers.
Before commit 43a3d9ba34c9 ("i40evf: Allow PF driver to configure RSS"),
the loop upper bounds were:
i <= I40E_VFQF_{HKEY,HLUT}_MAX_INDEX
which is safe since the value is the last valid index.
That commit changed the bounds to:
i <= adapter->rss_{key,lut}_size / 4
where `rss_{key,lut}_size / 4` is the number of dwords, so the last
valid index is `(rss_{key,lut}_size / 4) - 1`. Therefore, using `<=`
accesses one element past the end.
Fix the issues by using `<` instead of `<=`, ensuring we do not exceed
the bounds.
[1] KASAN splat about rss_key_size off-by-one
BUG: KASAN: slab-out-of-bounds in iavf_config_rss+0x619/0x800
Read of size 4 at addr ffff888102c50134 by task kworker/u8:6/63
The buggy address belongs to the object at ffff888102c50100
which belongs to the cache kmalloc-64 of size 64
The buggy address is located 0 bytes to the right of
allocated 52-byte region [ffff888102c50100, ffff888102c50134)
Memory state around the buggy address: ffff888102c50000: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc ffff888102c50080: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc
>ffff888102c50100: 00 00 00 00 00 00 04 fc fc fc fc fc fc fc fc fc
^ ffff888102c50180: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc ffff888102c50200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Add service task schedule to set_rx_mode.
In some cases there are error messages printed out in PTP application
(ptp4l):
ptp4l[13848.762]: port 1 (ens2f3np3): received SYNC without timestamp
ptp4l[13848.825]: port 1 (ens2f3np3): received SYNC without timestamp
ptp4l[13848.887]: port 1 (ens2f3np3): received SYNC without timestamp
This happens when service task would not run immediately after
set_rx_mode, and we need it for setup tasks. This service task checks, if
PTP RX packets are hung in firmware, and propagate correct settings such
as multicast address for IEEE 1588 Precision Time Protocol.
RX timestamping depends on some of these filters set. Bug happens only
with high PTP packets frequency incoming, and not every run since
sometimes service task is being ran from a different place immediately
after starting ptp4l.
Fixes: 0e4425ed641f ("i40e: fix: do not sleep in netdev_ops") Reviewed-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Przemyslaw Korba <przemyslaw.korba@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The macros FAN_FROM_REG and TEMP_FROM_REG evaluate their arguments
multiple times. When used in lockless contexts involving shared driver
data, this causes Time-of-Check to Time-of-Use (TOCTOU) race
conditions.
Convert the macros to static functions. This guarantees that arguments
are evaluated only once (pass-by-value), preventing the race
conditions.
Adhere to the principle of minimal changes by only converting macros
that evaluate arguments multiple times and are used in lockless
contexts.
The macro FAN_FROM_REG evaluates its arguments multiple times. When used
in lockless contexts involving shared driver data, this leads to
Time-of-Check to Time-of-Use (TOCTOU) race conditions, potentially
causing divide-by-zero errors.
Convert the macro to a static function. This guarantees that arguments
are evaluated only once (pass-by-value), preventing the race
conditions.
Additionally, in store_fan_div, move the calculation of the minimum
limit inside the update lock. This ensures that the read-modify-write
sequence operates on consistent data.
Adhere to the principle of minimal changes by only converting macros
that evaluate arguments multiple times and are used in lockless
contexts.
In i2c_amd_probe(), amd_mp2_find_device() utilizes
driver_find_next_device() which internally calls driver_find_device()
to locate the matching device. driver_find_device() increments the
reference count of the found device by calling get_device(), but
amd_mp2_find_device() fails to call put_device() to decrement the
reference count before returning. This results in a reference count
leak of the PCI device each time i2c_amd_probe() is executed, which
may prevent the device from being properly released and cause a memory
leak.
Found by code review.
Cc: stable@vger.kernel.org Fixes: 529766e0a011 ("i2c: Add drivers for the AMD PCIe MP2 I2C controller") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Signed-off-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/20251022095402.8846-1-make24@iscas.ac.cn Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While testing rpmsg-char interface it was noticed that duplicate sysfs
entries are getting created and below warning is noticed.
Reason for this is that we are leaking rpmsg device pointer, setting it
null without actually unregistering device.
Any further attempts to unregister fail because rpdev is NULL,
resulting in a leak.
Fix this by unregistering rpmsg device before removing its reference
from rpmsg channel.
sysfs: cannot create duplicate filename '/devices/platform/soc@0/3700000.remot
eproc/remoteproc/remoteproc1/3700000.remoteproc:glink-edge/3700000.remoteproc:
glink-edge.adsp_apps.-1.-1'
[ 114.115347] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not
tainted 6.16.0-rc4 #7 PREEMPT
[ 114.115355] Hardware name: Qualcomm Technologies, Inc. Robotics RB3gen2 (DT)
[ 114.115358] Workqueue: events qcom_glink_work
[ 114.115371] Call trace:8
[ 114.115374] show_stack+0x18/0x24 (C)
[ 114.115382] dump_stack_lvl+0x60/0x80
[ 114.115388] dump_stack+0x18/0x24
[ 114.115393] sysfs_warn_dup+0x64/0x80
[ 114.115402] sysfs_create_dir_ns+0xf4/0x120
[ 114.115409] kobject_add_internal+0x98/0x260
[ 114.115416] kobject_add+0x9c/0x108
[ 114.115421] device_add+0xc4/0x7a0
[ 114.115429] rpmsg_register_device+0x5c/0xb0
[ 114.115434] qcom_glink_work+0x4bc/0x820
[ 114.115438] process_one_work+0x148/0x284
[ 114.115446] worker_thread+0x2c4/0x3e0
[ 114.115452] kthread+0x12c/0x204
[ 114.115457] ret_from_fork+0x10/0x20
[ 114.115464] kobject: kobject_add_internal failed for 3700000.remoteproc:
glink-edge.adsp_apps.-1.-1 with -EEXIST, don't try to register things with
the same name in the same directory.
[ 114.250045] rpmsg 3700000.remoteproc:glink-edge.adsp_apps.-1.-1:
device_add failed: -17
Make sure to drop the reference taken to the canvas platform device when
looking up its driver data.
Note that holding a reference to a device does not prevent its driver
data from going away so there is no point in keeping the reference.
Also note that commit 28f851e6afa8 ("soc: amlogic: canvas: add missing
put_device() call in meson_canvas_get()") fixed the leak in a lookup
error path, but the reference is still leaking on success.
Make sure to drop the reference taken to the ocmem platform device when
looking up its driver data.
Note that holding a reference to a device does not prevent its driver
data from going away so there is no point in keeping the reference.
Also note that commit 0ff027027e05 ("soc: qcom: ocmem: Fix missing
put_device() call in of_get_ocmem") fixed the leak in a lookup error
path, but the reference is still leaking on success.
Fixes: 88c1e9404f1d ("soc: qcom: add OCMEM driver") Cc: stable@vger.kernel.org # 5.5: 0ff027027e05 Cc: Brian Masney <bmasney@redhat.com> Cc: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Brian Masney <bmasney@redhat.com> Link: https://lore.kernel.org/r/20250926143511.6715-2-johan@kernel.org Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
__io_openat_prep() allocates a struct filename using getname(). However,
for the condition of the file being installed in the fixed file table as
well as having O_CLOEXEC flag set, the function returns early. At that
point, the request doesn't have REQ_F_NEED_CLEANUP flag set. Due to this,
the memory for the newly allocated struct filename is not cleaned up,
causing a memory leak.
Fix this by setting the REQ_F_NEED_CLEANUP for the request just after the
successful getname() call, so that when the request is torn down, the
filename will be cleaned up, along with other resources needing cleanup.
Clang is not happy about set but (in some cases) unused variable:
fs/nfsd/export.c:1027:17: error: variable 'inode' set but not used [-Werror,-Wunused-but-set-variable]
since it's used as a parameter to dprintk() which might be configured
a no-op. To avoid uglifying code with the specific ifdeffery just mark
the variable __maybe_unused.
The commit [1], which introduced this behaviour, is quite old and hence
the Fixes tag points to the first of the Git era.
inotify/fanotify do not allow users with no read access to a file to
subscribe to events (e.g. IN_ACCESS/IN_MODIFY), but they do allow the
same user to subscribe for watching events on children when the user
has access to the parent directory (e.g. /dev).
Users with no read access to a file but with read access to its parent
directory can still stat the file and see if it was accessed/modified
via atime/mtime change.
The same is not true for special files (e.g. /dev/null). Users will not
generally observe atime/mtime changes when other users read/write to
special files, only when someone sets atime/mtime via utimensat().
Align fsnotify events with this stat behavior and do not generate
ACCESS/MODIFY events to parent watchers on read/write of special files.
The events are still generated to parent watchers on utimensat(). This
closes some side-channels that could be possibly used for information
exfiltration [1].
Commit 89d9cec3b1e9 ("PM: runtime: Clear power.needs_force_resume in
pm_runtime_reinit()") added provisional clearing of power.needs_force_resume
to pm_runtime_reinit(), but it is done unconditionally which is a
mistake because pm_runtime_reinit() may race with driver probing
and removal [1].
To address this, notice that power.needs_force_resume should never
be set when runtime PM is enabled and so it only needs to be cleared
when runtime PM is disabled, and update pm_runtime_init() to only
clear that flag when runtime PM is disabled.
Fixes: 89d9cec3b1e9 ("PM: runtime: Clear power.needs_force_resume in pm_runtime_reinit()") Reported-by: Ed Tsai <ed.tsai@mediatek.com> Closes: https://lore.kernel.org/linux-pm/20251215122154.3180001-1-ed.tsai@mediatek.com/ [1] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: 6.17+ <stable@vger.kernel.org> # 6.17+ Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://patch.msgid.link/12807571.O9o76ZdvQC@rafael.j.wysocki Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Synthetic events currently do not have a function to register perf events.
This leads to calling the tracepoint register functions with a NULL
function pointer which triggers:
Instead, have the code return -ENODEV, which doesn't warn and has perf
error out with:
# perf record -e synthetic:futex_wait
Error:
The sys_perf_event_open() syscall returned with 19 (No such device) for event (synthetic:futex_wait).
"dmesg | grep -i perf" may provide additional information.
Ideally perf should support synthetic events, but for now just fix the
warning. The support can come later.
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://patch.msgid.link/20251216182440.147e4453@gandalf.local.home Fixes: 4b147936fa509 ("tracing: Add support for 'synthetic' events") Reported-by: Ian Rogers <irogers@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Set exit_code_hi to -1u as a temporary band-aid to fix a long-standing
(effectively since KVM's inception) bug where KVM treats the exit code as
a 32-bit value, when in reality it's a 64-bit value. Per the APM, offset
0x70 is a single 64-bit value:
070h 63:0 EXITCODE
And a sane reading of the error values defined in "Table C-1. SVM Intercept
Codes" is that negative values use the full 64 bits:
–1 VMEXIT_INVALID Invalid guest state in VMCB.
–2 VMEXIT_BUSYBUSY bit was set in the VMSA
–3 VMEXIT_IDLE_REQUIREDThe sibling thread is not in an idle state
-4 VMEXIT_INVALID_PMC Invalid PMC state
And that interpretation is confirmed by testing on Milan and Turin (by
setting bits in CR0[63:32] to generate VMEXIT_INVALID on VMRUN).
Furthermore, Xen has treated exitcode as a 64-bit value since HVM support
was adding in 2006 (see Xen commit d1bd157fbc ("Big merge the HVM
full-virtualisation abstractions.")).
Cc: Jim Mattson <jmattson@google.com> Cc: Yosry Ahmed <yosry.ahmed@linux.dev> Cc: stable@vger.kernel.org Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251113225621.1688428-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When emulating L2 instructions, svm_check_intercept() checks whether a
write to CR0 should trigger a synthesized #VMEXIT with
SVM_EXIT_CR0_SEL_WRITE. For MOV-to-CR0, SVM_EXIT_CR0_SEL_WRITE is only
triggered if any bit other than CR0.MP and CR0.TS is updated. However,
according to the APM (24593—Rev. 3.42—March 2024, Table 15-7):
The LMSW instruction treats the selective CR0-write
intercept as a non-selective intercept (i.e., it intercepts
regardless of the value being written).
Skip checking the changed bits for x86_intercept_lmsw and always inject
SVM_EXIT_CR0_SEL_WRITE.
When advancing the target expiration for the guest's APIC timer in periodic
mode, set the expiration to "now" if the target expiration is in the past
(similar to what is done in update_target_expiration()). Blindly adding
the period to the previous target expiration can result in KVM generating
a practically unbounded number of hrtimer IRQs due to programming an
expired timer over and over. In extreme scenarios, e.g. if userspace
pauses/suspends a VM for an extended duration, this can even cause hard
lockups in the host.
Currently, the bug only affects Intel CPUs when using the hypervisor timer
(HV timer), a.k.a. the VMX preemption timer. Unlike the software timer,
a.k.a. hrtimer, which KVM keeps running even on exits to userspace, the
HV timer only runs while the guest is active. As a result, if the vCPU
does not run for an extended duration, there will be a huge gap between
the target expiration and the current time the vCPU resumes running.
Because the target expiration is incremented by only one period on each
timer expiration, this leads to a series of timer expirations occurring
rapidly after the vCPU/VM resumes.
More critically, when the vCPU first triggers a periodic HV timer
expiration after resuming, advancing the expiration by only one period
will result in a target expiration in the past. As a result, the delta
may be calculated as a negative value. When the delta is converted into
an absolute value (tscdeadline is an unsigned u64), the resulting value
can overflow what the HV timer is capable of programming. I.e. the large
value will exceed the VMX Preemption Timer's maximum bit width of
cpu_preemption_timer_multi + 32, and thus cause KVM to switch from the
HV timer to the software timer (hrtimers).
After switching to the software timer, periodic timer expiration callbacks
may be executed consecutively within a single clock interrupt handler,
because hrtimers honors KVM's request for an expiration in the past and
immediately re-invokes KVM's callback after reprogramming. And because
the interrupt handler runs with IRQs disabled, restarting KVM's hrtimer
over and over until the target expiration is advanced to "now" can result
in a hard lockup.
E.g. the following hard lockup was triggered in the host when running a
Windows VM (only relevant because it used the APIC timer in periodic mode)
after resuming the VM from a long suspend (in the host).
Moreover, if the suspend duration of the virtual machine is not long enough
to trigger a hard lockup in this scenario, since commit 98c25ead5eda
("KVM: VMX: Move preemption timer <=> hrtimer dance to common x86"), KVM
will continue using the software timer until the guest reprograms the APIC
timer in some way. Since the periodic timer does not require frequent APIC
timer register programming, the guest may continue to use the software
timer in perpetuity.
Fixes: d8f2f498d9ed ("x86/kvm: fix LAPIC timer drift when guest uses periodic mode") Cc: stable@vger.kernel.org Signed-off-by: fuqiang wang <fuqiang.wng@gmail.com>
[sean: massage comments and changelog] Link: https://patch.msgid.link/20251113205114.1647493-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When restarting an hrtimer to emulate a the guest's APIC timer in periodic
mode, explicitly set the expiration using the target expiration computed
by advance_periodic_target_expiration() instead of adding the period to
the existing timer. This will allow making adjustments to the expiration,
e.g. to deal with expirations far in the past, without having to implement
the same logic in both advance_periodic_target_expiration() and
apic_timer_fn().
Cc: stable@vger.kernel.org Signed-off-by: fuqiang wang <fuqiang.wng@gmail.com>
[sean: split to separate patch, write changelog] Link: https://patch.msgid.link/20251113205114.1647493-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
WARN and don't restart the hrtimer if KVM's callback runs with the guest's
APIC timer in periodic mode but with a period of '0', as not advancing the
hrtimer's deadline would put the CPU into an infinite loop of hrtimer
events. Observing a period of '0' should be impossible, even when the
hrtimer is running on a different CPU than the vCPU, as KVM is supposed to
cancel the hrtimer before changing (or zeroing) the period, e.g. when
switching from periodic to one-shot.
If the osdmap is (maliciously) corrupted such that the encoded length
of ceph_pg_pool envelope is less than what is expected for a particular
encoding version, out-of-bounds reads may ensue because the only bounds
check that is there is based on that length value.
This patch adds explicit bounds checks for each field that is decoded
or skipped.
The ASP chip is a very old variant of the GSP chip and is used e.g. in
HP 730 workstations. When trying to reprogram the affinity it will crash
with a HPMC as the relevant registers don't seem to be at the usual
location. Let's avoid the crash by checking the sversion. Also note,
that reprogramming isn't necessary either, as the HP730 is a just a
single-CPU machine.
__scs_magic() needs a 'void *' variable, but a 'struct task_struct *' is
given. 'task_scs(tsk)' is the starting address of the task's shadow call
stack, and '__scs_magic(task_scs(tsk))' is the end address of the task's
shadow call stack. Here should be '__scs_magic(task_scs(tsk))'.
The user-visible effect of this bug is that when CONFIG_DEBUG_STACK_USAGE
is enabled, the shadow call stack usage checking function
(scs_check_usage) would scan an incorrect memory range. This could lead
to:
1. **Inaccurate stack usage reporting**: The function would calculate
wrong usage statistics for the shadow call stack, potentially showing
incorrect value in kmsg.
2. **Potential kernel crash**: If the value of __scs_magic(tsk)is
greater than that of __scs_magic(task_scs(tsk)), the for loop may
access unmapped memory, potentially causing a kernel panic. However,
this scenario is unlikely because task_struct is allocated via the slab
allocator (which typically returns lower addresses), while the shadow
call stack returned by task_scs(tsk) is allocated via vmalloc(which
typically returns higher addresses).
However, since this is purely a debugging feature
(CONFIG_DEBUG_STACK_USAGE), normal production systems should be not
unaffected. The bug only impacts developers and testers who are actively
debugging stack usage with this configuration enabled.
Link: https://lkml.kernel.org/r/20251011082222.12965-1-zhichi.lin@vivo.com Fixes: 5bbaf9d1fcb9 ("scs: Add support for stack usage debugging") Signed-off-by: Jiyuan Xie <xiejiyuan@vivo.com> Signed-off-by: Zhichi Lin <zhichi.lin@vivo.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Will Deacon <will@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Marco Elver <elver@google.com> Cc: Will Deacon <will@kernel.org> Cc: Yee Lee <yee.lee@mediatek.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
syzbot reported a kernel BUG in ocfs2_find_victim_chain() because the
`cl_next_free_rec` field of the allocation chain list (next free slot in
the chain list) is 0, triggring the BUG_ON(!cl->cl_next_free_rec)
condition in ocfs2_find_victim_chain() and panicking the kernel.
To fix this, an if condition is introduced in ocfs2_claim_suballoc_bits(),
just before calling ocfs2_find_victim_chain(), the code block in it being
executed when either of the following conditions is true:
1. `cl_next_free_rec` is equal to 0, indicating that there are no free
chains in the allocation chain list
2. `cl_next_free_rec` is greater than `cl_count` (the total number of
chains in the allocation chain list)
Either of them being true is indicative of the fact that there are no
chains left for usage.
This is addressed using ocfs2_error(), which prints
the error log for debugging purposes, rather than panicking the kernel.
Link: https://lkml.kernel.org/r/20251201130711.143900-1-activprithvi@gmail.com Signed-off-by: Prithvi Tambewagh <activprithvi@gmail.com> Reported-by: syzbot+96d38c6e1655c1420a72@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=96d38c6e1655c1420a72 Tested-by: syzbot+96d38c6e1655c1420a72@syzkaller.appspotmail.com Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
vidtv_channel_si_init() creates a temporary list (program, service, event)
and ownership of the memory itself is transferred to the PAT/SDT/EIT
tables through vidtv_psi_pat_program_assign(),
vidtv_psi_sdt_service_assign(), vidtv_psi_eit_event_assign().
The problem here is that the local pointer where the memory ownership
transfer was completed is not initialized to NULL. This causes the
vidtv_psi_pmt_create_sec_for_each_pat_entry() function to fail, and
in the flow that jumps to free_eit, the memory that was freed by
vidtv_psi_*_table_destroy() can be accessed again by
vidtv_psi_*_event_destroy() due to the uninitialized local pointer, so it
is freed once again.
Therefore, to prevent use-after-free and double-free vulnerability,
local pointers must be initialized to NULL when transferring memory
ownership.
Cc: <stable@vger.kernel.org> Reported-by: syzbot+1d9c0edea5907af239e0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=1d9c0edea5907af239e0 Fixes: 3be8037960bc ("media: vidtv: add error checks") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
KASAN reports a global-out-of-bounds access when running these nfit
tests: clear.sh, pmem-errors.sh, pfn-meta-errors.sh, btt-errors.sh,
daxdev-errors.sh, and inject-error.sh.
[] BUG: KASAN: global-out-of-bounds in nfit_test_ctl+0x769f/0x7840 [nfit_test]
[] Read of size 4 at addr ffffffffc03ea01c by task ndctl/1215
[] The buggy address belongs to the variable:
[] handle+0x1c/0x1df4 [nfit_test]
nfit_test_search_spa() uses handle[nvdimm->id] to retrieve a device
handle and triggers a KASAN error when it reads past the end of the
handle array. It should not be indexing the handle array at all.
The correct device handle is stored in per-DIMM test data. Each DIMM
has a struct nfit_mem that embeds a struct acpi_nfit_memdev that
describes the NFIT device handle. Use that device handle here.
With below scripts, it will trigger panic in f2fs:
mkfs.f2fs -f /dev/vdd
mount /dev/vdd /mnt/f2fs
touch /mnt/f2fs/foo
sync
echo 111 >> /mnt/f2fs/foo
f2fs_io fsync /mnt/f2fs/foo
f2fs_io shutdown 2 /mnt/f2fs
umount /mnt/f2fs
mount -o ro,norecovery /dev/vdd /mnt/f2fs
or
mount -o ro,disable_roll_forward /dev/vdd /mnt/f2fs
F2FS-fs (vdd): f2fs_recover_fsync_data: recovery fsync data, check_only: 0
F2FS-fs (vdd): Mounted with checkpoint version = 7f5c361f
F2FS-fs (vdd): Stopped filesystem due to reason: 0
F2FS-fs (vdd): f2fs_recover_fsync_data: recovery fsync data, check_only: 1
Filesystem f2fs get_tree() didn't set fc->root, returned 1
------------[ cut here ]------------
kernel BUG at fs/super.c:1761!
Oops: invalid opcode: 0000 [#1] SMP PTI
CPU: 3 UID: 0 PID: 722 Comm: mount Not tainted 6.18.0-rc2+ #721 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:vfs_get_tree.cold+0x18/0x1a
Call Trace:
<TASK>
fc_mount+0x13/0xa0
path_mount+0x34e/0xc50
__x64_sys_mount+0x121/0x150
do_syscall_64+0x84/0x800
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa6cc126cfe
The root cause is we missed to handle error number returned from
f2fs_recover_fsync_data() when mounting image w/ ro,norecovery or
ro,disable_roll_forward mount option, result in returning a positive
error number to vfs_get_tree(), fix it.
Cc: stable@kernel.org Fixes: 6781eabba1bd ("f2fs: give -EINVAL for norecovery and rw mount") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
F2FS can mount filesystems with corrupted directory depth values that
get runtime-clamped to MAX_DIR_HASH_DEPTH. When RENAME_WHITEOUT
operations are performed on such directories, f2fs_rename performs
directory modifications (updating target entry and deleting source
entry) before attempting to add the whiteout entry via f2fs_add_link.
If f2fs_add_link fails due to the corrupted directory structure, the
function returns an error to VFS, but the partial directory
modifications have already been committed to disk. VFS assumes the
entire rename operation failed and does not update the dentry cache,
leaving stale mappings.
In the error path, VFS does not call d_move() to update the dentry
cache. This results in new_dentry still pointing to the old inode
(new_inode) which has already had its i_nlink decremented to zero.
The stale cache causes subsequent operations to incorrectly reference
the freed inode.
This causes subsequent operations to use cached dentry information that
no longer matches the on-disk state. When a second rename targets the
same entry, VFS attempts to decrement i_nlink on the stale inode, which
may already have i_nlink=0, triggering a WARNING in drop_nlink().
Example sequence:
1. First rename (RENAME_WHITEOUT): file2 → file1
- f2fs updates file1 entry on disk (points to inode 8)
- f2fs deletes file2 entry on disk
- f2fs_add_link(whiteout) fails (corrupted directory)
- Returns error to VFS
- VFS does not call d_move() due to error
- VFS cache still has: file1 → inode 7 (stale!)
- inode 7 has i_nlink=0 (already decremented)
2. Second rename: file3 → file1
- VFS uses stale cache: file1 → inode 7
- Tries to drop_nlink on inode 7 (i_nlink already 0)
- WARNING in drop_nlink()
Fix this by explicitly invalidating old_dentry and new_dentry when
f2fs_add_link fails during whiteout creation. This forces VFS to
refresh from disk on subsequent operations, ensuring cache consistency
even when the rename partially succeeds.
Reproducer:
1. Mount F2FS image with corrupted i_current_depth
2. renameat2(file2, file1, RENAME_WHITEOUT)
3. renameat2(file3, file1, 0)
4. System triggers WARNING in drop_nlink()
The asd_pci_remove() function fails to synchronize with pending tasklets
before freeing the asd_ha structure, leading to a potential
use-after-free vulnerability.
When a device removal is triggered (via hot-unplug or module unload),
race condition can occur.
The fix adds tasklet_kill() before freeing the asd_ha structure,
ensuring all scheduled tasklets complete before cleanup proceeds.
Reported-by: Yuhao Jiang <danisjiang@gmail.com> Reported-by: Junrui Luo <moonafterrain@outlook.com> Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver") Cc: stable@vger.kernel.org Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Link: https://patch.msgid.link/ME2PR01MB3156AB7DCACA206C845FC7E8AFFDA@ME2PR01MB3156.ausprd01.prod.outlook.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The commit being reverted added code to __qla2x00_abort_all_cmds() to
call sp->done() without holding a spinlock. But unlike the older code
below it, this new code failed to check sp->cmd_type and just assumed
TYPE_SRB, which results in a jump to an invalid pointer in target-mode
with TYPE_TGT_CMD:
Then commit 4475afa2646d ("scsi: qla2xxx: Complete command early within
lock") added the spinlock back, because not having the lock caused a
race and a crash. But qla2x00_abort_srb() in the switch below already
checks for qla2x00_chip_is_down() and handles it the same way, so the
code above the switch is now redundant and still buggy in target-mode.
Remove it.
There are two reference count leaks in this driver:
1. In nforce2_fsb_read(): pci_get_subsys() increases the reference count
of the PCI device, but pci_dev_put() is never called to release it,
thus leaking the reference.
2. In nforce2_detect_chipset(): pci_get_subsys() gets a reference to the
nforce2_dev which is stored in a global variable, but the reference
is never released when the module is unloaded.
Fix both by:
- Adding pci_dev_put(nforce2_sub5) in nforce2_fsb_read() after reading
the configuration.
- Adding pci_dev_put(nforce2_dev) in nforce2_exit() to release the
global device reference.
intel_th_output_open() calls bus_find_device_by_devt() which
internally increments the device reference count via get_device(), but
this reference is not properly released in several error paths. When
device driver is unavailable, file operations cannot be obtained, or
the driver's open method fails, the function returns without calling
put_device(), leading to a permanent device reference count leak. This
prevents the device from being properly released and could cause
resource exhaustion over time.
Found by code review.
Cc: stable <stable@kernel.org> Fixes: 39f4034693b7 ("intel_th: Add driver infrastructure for Intel(R) Trace Hub devices") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Link: https://patch.msgid.link/20251112091723.35963-1-make24@iscas.ac.cn Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Discovered by Atuin - Automated Vulnerability Discovery Engine.
In ac_ioctl, the validation of IndexCard and the check for a valid
RamIO pointer are skipped when cmd is 6. However, the function
unconditionally executes readb(apbs[IndexCard].RamIO + VERS) at the
end.
If cmd is 6, IndexCard may reference a board that does not exist
(where RamIO is NULL), leading to a NULL pointer dereference.
Fix this by skipping the readb access when cmd is 6, as this
command is a global information query and does not target a specific
board context.
When clk_bulk_prepare_enable() fails, the error path jumps to
err_resetc_assert, skipping clk_bulk_put_all() and leaking the
clock references acquired by clk_bulk_get_all().
Add err_clk_put_all label to properly release clock resources
in all error paths.
lpc32xx_udc_probe() acquires an i2c_client reference through
isp1301_get_client() but fails to release it in both error handling
paths and the normal removal path. This could result in a reference
count leak for the I2C device, preventing proper cleanup and potentially
leading to resource exhaustion. Add put_device() to release the
reference in the probe failure path and in the remove function.
Calling path: isp1301_get_client() -> of_find_i2c_device_by_node() ->
i2c_find_device_by_fwnode(). As comments of i2c_find_device_by_fwnode()
says, 'The user must call put_device(&client->dev) once done with the
i2c client.'
Platform drivers can be probed after their init sections have been
discarded (e.g. on probe deferral or manual rebind through sysfs) so the
probe function and match table must not live in init.
The pvr2_trace message is reporting an error about control read
transfers, however it is using the incorrect variable write_len
instead of read_lean. Fix this by using the correct variable
read_len.
Fixes: d855497edbfb ("V4L/DVB (4228a): pvrusb2 to kernel 2.6.18") Cc: stable@vger.kernel.org Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
rlen value is a user-controlled value, but dtv5100_i2c_msg() does not
check the size of the rlen value. Therefore, if it is set to a value
larger than sizeof(st->data), an out-of-bounds vuln occurs for st->data.
Therefore, we need to add proper range checking to prevent this vuln.
Fixes: 60688d5e6e6e ("V4L/DVB (8735): dtv5100: replace dummy frontend by zl10353") Cc: stable@vger.kernel.org Signed-off-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We cannot determine which models require the NO_ATA_1X and
IGNORE_RESIDUE quirks aside from the EL-R12 optical drive device.
Fixes: 955a48a5353f ("usb: usb-storage: No additional quirks need to be added to the EL-R12 optical drive.") Signed-off-by: Chen Changcheng <chenchangcheng@kylinos.cn> Link: https://patch.msgid.link/20251218012318.15978-1-chenchangcheng@kylinos.cn Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit cbd9463da1b1 ("media: v4l2-mem2mem: Avoid calling .device_run in
v4l2_m2m_job_finish") deferred calls to .device_run() to a work queue to
avoid recursive calls when a job is finished right away from
.device_run(). It failed to update the v4l2_m2m_job_finish()
documentation that still states the function must not be called from
.device_run(). Fix it.
jbd2 journal handling code doesn't want jbd2_might_wait_for_commit()
to be placed between start_this_handle() and stop_this_handle(). So it
marks the region with rwsem_acquire_read() and rwsem_release().
However, the annotation is too strong for that purpose. We don't have
to use more than try lock annotation for that.
rwsem_acquire_read() implies:
1. might be a waiter on contention of the lock.
2. enter to the critical section of the lock.
All we need in here is to act 2, not 1. So trylock version of
annotation is sufficient for that purpose. Now that dept partially
relies on lockdep annotaions, dept interpets rwsem_acquire_read() as a
potential wait and might report a deadlock by the wait.
Replace it with trylock version of annotation.
Signed-off-by: Byungchul Park <byungchul@sk.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org
Message-ID: <20251024073940.1063-1-byungchul@sk.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the MB_CHECK_ASSERT macro is enabled, an assertion failure can
occur in __mb_check_buddy when checking preallocated blocks (pa) in
a block group:
Assertion failure in mb_free_blocks() : "groupnr == e4b->bd_group"
This happens when a pa at the very end of a block group (e.g.,
pa_pstart=32765, pa_len=3 in a group of 32768 blocks) becomes
exhausted - its pa_pstart is advanced by pa_len to 32768, which
lies in the next block group. If this exhausted pa (with pa_len == 0)
is still in the bb_prealloc_list during the buddy check, the assertion
incorrectly flags it as belonging to the wrong group. A possible
sequence is as follows:
__mb_check_buddy
for each pa in group
ext4_get_group_no_and_offset
MB_CHECK_ASSERT(groupnr == e4b->bd_group)
To fix this, we modify the check to skip block group validation for
exhausted preallocations (where pa_len == 0). Such entries are in a
transitional state and will be removed from the list soon, so they
should not trigger an assertion. This change prevents the false
positive while maintaining the integrity of the checks for active
allocations.
Fixes: c9de560ded61f ("ext4: Add multi block allocator for ext4") Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251106060614.631382-2-sunyongjian@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If ext4_get_inode_loc() fails (e.g. if it returns -EFSCORRUPTED),
iloc.bh will remain set to NULL. Since ext4_xattr_inode_dec_ref_all()
lacks error checking, this will lead to a null pointer dereference
in ext4_raw_inode(), called right after ext4_get_inode_loc().
Found by Linux Verification Center (linuxtesting.org) with SVACE.
The error path of copying the old config used the wrong variable in the
error message:
$ mkdir /tmp/build
$ ./tools/testing/ktest/config-bisect.pl -b /tmp/build config-good /tmp/config-bad
$ chmod 0 /tmp/build
$ ./tools/testing/ktest/config-bisect.pl -b /tmp/build config-good /tmp/config-bad good
cp /tmp/build//.config config-good.tmp ... [0 seconds] FAILED!
Use of uninitialized value $config in concatenation (.) or string at ./tools/testing/ktest/config-bisect.pl line 744.
failed to copy to config-good.tmp
When it should have shown:
failed to copy /tmp/build//.config to config-good.tmp
Cc: stable@vger.kernel.org Cc: John 'Warthog9' Hawley <warthog9@kernel.org> Fixes: 0f0db065999cf ("ktest: Add standalone config-bisect.pl program") Link: https://patch.msgid.link/20251203180924.6862bd26@gandalf.local.home Reported-by: "John W. Krahn" <jwkrahn@shaw.ca> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For years I wondered why the floppy driver does not just work on
sparc64, e.g:
root@SUNW_375_0066:# disktype /dev/fd0
disktype: Can't open /dev/fd0: No such device or address
[ 525.341906] disktype: attempt to access beyond end of device
fd0: rw=0, sector=0, nr_sectors = 16 limit=8
[ 525.341991] floppy: error 10 while reading block 0
Turns out floppy.c __floppy_read_block_0 tries to read one page for
the first test read to determine the disk size and thus fails if that
is greater than 4k. Adjust minimum MAX_DISK_SIZE to PAGE_SIZE to fix
floppy on sparc64 and likely all other PAGE_SIZE != 4KB configs.
In the C code, the 'inc' argument to the assembly functions
blake2s_compress_ssse3() and blake2s_compress_avx512() is declared with
type u32, matching blake2s_compress(). The assembly code then reads it
from the 64-bit %rcx. However, the ABI doesn't guarantee zero-extension
to 64 bits, nor do gcc or clang guarantee it. Therefore, fix these
functions to read this argument from the 32-bit %ecx.
In theory, this bug could have caused the wrong 'inc' value to be used,
causing incorrect BLAKE2s hashes. In practice, probably not: I've fixed
essentially this same bug in many other assembly files too, but there's
never been a real report of it having caused a problem. In x86_64, all
writes to 32-bit registers are zero-extended to 64 bits. That results
in zero-extension in nearly all situations. I've only been able to
demonstrate a lack of zero-extension with a somewhat contrived example
involving truncation, e.g. when the C code has a u64 variable holding
0x1234567800000040 and passes it as a u32 expecting it to be truncated
to 0x40 (64). But that's not what the real code does, of course.
vhost_vsock_get() uses hash_for_each_possible_rcu() to find the
`vhost_vsock` associated with the `guest_cid`. hash_for_each_possible_rcu()
should only be called within an RCU read section, as mentioned in the
following comment in include/linux/rculist.h:
/**
* hlist_for_each_entry_rcu - iterate over rcu list of given type
* @pos: the type * to use as a loop cursor.
* @head: the head for your list.
* @member: the name of the hlist_node within the struct.
* @cond: optional lockdep expression if called from non-RCU protection.
*
* This list-traversal primitive may safely run concurrently with
* the _rcu list-mutation primitives such as hlist_add_head_rcu()
* as long as the traversal is guarded by rcu_read_lock().
*/
Currently, all calls to vhost_vsock_get() are between rcu_read_lock()
and rcu_read_unlock() except for calls in vhost_vsock_set_cid() and
vhost_vsock_reset_orphans(). In both cases, the current code is safe,
but we can make improvements to make it more robust.
About vhost_vsock_set_cid(), when building the kernel with
CONFIG_PROVE_RCU_LIST enabled, we get the following RCU warning when the
user space issues `ioctl(dev, VHOST_VSOCK_SET_GUEST_CID, ...)` :
WARNING: suspicious RCU usage
6.18.0-rc7 #62 Not tainted
-----------------------------
drivers/vhost/vsock.c:74 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by rpc-libvirtd/3443:
#0: ffffffffc05032a8 (vhost_vsock_mutex){+.+.}-{4:4}, at: vhost_vsock_dev_ioctl+0x2ff/0x530 [vhost_vsock]
This is not a real problem, because the vhost_vsock_get() caller, i.e.
vhost_vsock_set_cid(), holds the `vhost_vsock_mutex` used by the hash
table writers. Anyway, to prevent that warning, add lockdep_is_held()
condition to hash_for_each_possible_rcu() to verify that either the
caller is in an RCU read section or `vhost_vsock_mutex` is held when
CONFIG_PROVE_RCU_LIST is enabled; and also clarify the comment for
vhost_vsock_get() to better describe the locking requirements and the
scope of the returned pointer validity.
About vhost_vsock_reset_orphans(), currently this function is only
called via vsock_for_each_connected_socket(), which holds the
`vsock_table_lock` spinlock (which is also an RCU read-side critical
section). However, add an explicit RCU read lock there to make the code
more robust and explicit about the RCU requirements, and to prevent
issues if the calling context changes in the future or if
vhost_vsock_reset_orphans() is called from other contexts.
Fixes: 834e772c8db0 ("vhost/vsock: fix use-after-free in network stack callers") Cc: stefanha@redhat.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20251126133826.142496-1-sgarzare@redhat.com>
Message-ID: <20251126210313.GA499503@fedora> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The "dev->clt_device_id" variable is set using ida_alloc_max() which
returns an int and in particular it returns negative error codes.
Change the type from u32 to int to fix the error checking.
Fixes: c9b5645fd8ca ("block: rnbd-clt: Fix leaked ID in init_dev()") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
Thus we can't hold the rport lock when calling nvme_fc_ctrl_put.
Justin suggested use the safe list iterator variant because
nvme_fc_ctrl_put will also modify the rport->list.
Cc: Justin Tee <justin.tee@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In sprd_clk_init(), when devm_clk_get() returns -EPROBE_DEFER
for either uart or source clock, we should propagate the
error instead of just warning and continuing with NULL clocks.
Currently the driver only emits a warning when clock acquisition
fails and proceeds with NULL clock pointers. This can lead to
issues later when the clocks are actually needed. More importantly,
when the clock provider is not ready yet and returns -EPROBE_DEFER,
we should return this error to allow deferred probing.
This change adds explicit checks for -EPROBE_DEFER after both:
1. devm_clk_get(uport->dev, uart)
2. devm_clk_get(uport->dev, source)
When -EPROBE_DEFER is encountered, the function now returns
-EPROBE_DEFER to let the driver framework retry probing
later when the clock dependencies are resolved.
The optical drive of EL-R12 has the same vid and pid as INIC-3069,
as follows:
T: Bus=02 Lev=02 Prnt=02 Port=01 Cnt=01 Dev#= 3 Spd=5000 MxCh= 0
D: Ver= 3.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 9 #Cfgs= 1
P: Vendor=13fd ProdID=3940 Rev= 3.10
S: Manufacturer=HL-DT-ST
S: Product= DVD+-RW GT80N
S: SerialNumber=423349524E4E38303338323439202020
C:* #Ifs= 1 Cfg#= 1 Atr=80 MxPwr=144mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=02 Prot=50 Driver=usb-storage
E: Ad=83(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E: Ad=0a(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
This will result in the optical drive device also adding
the quirks of US_FL_NO_ATA_1X. When performing an erase operation,
it will fail, and the reason for the failure is as follows:
[ 388.967742] sr 5:0:0:0: [sr0] tag#0 Send: scmd 0x00000000d20c33a7
[ 388.967742] sr 5:0:0:0: [sr0] tag#0 CDB: ATA command pass through(12)/Blank a1 11 00 00 00 00 00 00 00 00 00 00
[ 388.967773] sr 5:0:0:0: [sr0] tag#0 Done: SUCCESS Result: hostbyte=DID_TARGET_FAILURE driverbyte=DRIVER_OK cmd_age=0s
[ 388.967773] sr 5:0:0:0: [sr0] tag#0 CDB: ATA command pass through(12)/Blank a1 11 00 00 00 00 00 00 00 00 00 00
[ 388.967803] sr 5:0:0:0: [sr0] tag#0 Sense Key : Illegal Request [current]
[ 388.967803] sr 5:0:0:0: [sr0] tag#0 Add. Sense: Invalid field in cdb
[ 388.967803] sr 5:0:0:0: [sr0] tag#0 scsi host busy 1 failed 0
[ 388.967803] sr 5:0:0:0: Notifying upper driver of completion (result 8100002)
[ 388.967834] sr 5:0:0:0: [sr0] tag#0 0 sectors total, 0 bytes done.
For the EL-R12 standard optical drive, all operational commands
and usage scenarios were tested without adding the IGNORE_RESIDUE quirks,
and no issues were encountered. It can be reasonably concluded
that removing the IGNORE_RESIDUE quirks has no impact.
The UCSI spec states that the num_connectors field is 7 bits, and the
8th bit is reserved and should be set to zero.
Some buggy FW has been known to set this bit, and it can lead to a
system not booting.
Flag that the FW is not behaving correctly, and auto-fix the value
so that the system boots correctly.
Found on Lenovo P1 G8 during Linux enablement program. The FW will
be fixed, but seemed worth addressing in case it hit platforms that
aren't officially Linux supported.
Signed-off-by: Mark Pearson <mpearson-lenovo@squebb.ca> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20250821185319.2585023-1-mpearson-lenovo@squebb.ca Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Interrupts are disabled before entering usb_hcd_giveback_urb().
A spinlock_t becomes a sleeping lock on PREEMPT_RT, so it cannot be
acquired with disabled interrupts.
Save the interrupt status and restore it after usb_hcd_giveback_urb().
syz reported:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
Call Trace:
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
rt_spin_lock+0xc7/0x2c0 kernel/locking/spinlock_rt.c:57
spin_lock include/linux/spinlock_rt.h:44 [inline]
mon_bus_complete drivers/usb/mon/mon_main.c:134 [inline]
mon_complete+0x5c/0x200 drivers/usb/mon/mon_main.c:147
usbmon_urb_complete include/linux/usb/hcd.h:738 [inline]
__usb_hcd_giveback_urb+0x254/0x5e0 drivers/usb/core/hcd.c:1647
vhci_urb_enqueue+0xb4f/0xe70 drivers/usb/usbip/vhci_hcd.c:818
The kernel test robot reported that the exFAT remount operation
failed. The reason for the failure was that the process's umask
is different between mount and remount, causing fs_fmask and
fs_dmask are changed.
Potentially, both gid and uid may also be changed. Therefore, when
initializing fs_context for remount, inherit these mount options
from the options used during mount.
Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202511251637.81670f5c-lkp@intel.com Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The VIA watchdog driver uses allocate_resource() to reserve a MMIO
region for the watchdog control register. However, the allocated
resource was not given a name, which causes the kernel resource tree
to contain an entry marked as "<BAD>" under /proc/iomem on x86
platforms.
During boot, this unnamed resource can lead to a critical hang because
subsequent resource lookups and conflict checks fail to handle the
invalid entry properly.
If a mailbox command completes immediately after
wait_for_completion_timeout() times out, ha->mbx_intr_comp could be left
in an inconsistent state, causing the next mailbox command not to wait
for the hardware. Fix by reinitializing the completion before use.
When given the module parameter qlini_mode=exclusive, qla2xxx in
initiator mode is initially unable to successfully send SCSI commands to
devices it finds while scanning, resulting in an escalating series of
resets until an adapter reset clears the issue. Fix by checking the
active mode instead of the module parameter.
The PUT_64[LB]E() macros need to cast the value to unsigned long long
like the GET_64[LB]E() macros. Caused lots of warnings when compiled
on 32-bit, and clobbered addresses (36-bit P4080).
CP110 based platforms rely on the bootloader for pci port
initialization.
TF-A actively prevents non-uboot re-configuration of pci lanes, and many
boards do not have software control over the pci card reset.
If a pci port had link at boot-time and the clock is stopped at a later
point, the link fails and can not be recovered.
PCI controller driver probe - and by extension ownership of a driver for
the pci clocks - may be delayed especially on large modular kernels,
causing the clock core to start disabling unused clocks.
Add the CLK_IGNORE_UNUSED flag to the three pci port's clocks to ensure
they are not stopped before the pci controller driver has taken
ownership and tested for an existing link.
This fixes failed pci link detection when controller driver probes late,
e.g. with arm64 defconfig and CONFIG_PHY_MVEBU_CP110_COMPHY=m.
am33xx.dtsi has the same clock setup as am35xx.dtsi, setting
ti,no-reset-on-init and ti,no-idle on timer1_target and timer2_target,
so AM33 needs the same workaround as AM35 to avoid ti-sysc probe
failing on certain target modules.
With mailbox channel requested, there is possibility that interrupts may
come in, so need to make sure the workqueue is initialized before
the queue is scheduled by mailbox rx callback.
Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
channel_handler() sets intf->channels_ready to true but never
clears it, so __scan_channels() skips any rescan. When the BMC
firmware changes a rescan is required. Allow it by clearing
the flag before starting a new scan.
The race window between __scan_channels() and deliver_response() causes
the parameters of some channels to be set to 0.
1.[CPUA] __scan_channels() issues an IPMI request and waits with
wait_event() until all channels have been scanned.
wait_event() internally calls might_sleep(), which might
yield the CPU. (Moreover, an interrupt can preempt
wait_event() and force the task to yield the CPU.)
2.[CPUB] deliver_response() is invoked when the CPU receives the
IPMI response. After processing a IPMI response,
deliver_response() directly assigns intf->wchannels to
intf->channel_list and sets intf->channels_ready to true.
However, not all channels are actually ready for use.
3.[CPUA] Since intf->channels_ready is already true, wait_event()
never enters __wait_event(). __scan_channels() immediately
clears intf->null_user_handler and exits.
4.[CPUB] Once intf->null_user_handler is set to NULL, deliver_response()
ignores further IPMI responses, leaving the remaining
channels zero-initialized and unusable.
Fix the race between __scan_channels() and deliver_response() by
deferring both the assignment intf->channel_list = intf->wchannels
and the flag intf->channels_ready = true until all channels have
been successfully scanned or until the IPMI request has failed.
get_meter_levels_from_urb() parses the 64-byte meter packets sent by
the device and fills the per-channel arrays meter_level[],
comp_level[] and master_level[] in struct snd_us16x08_meter_store.
Currently the function derives the channel index directly from the
meter packet (MUB2(meter_urb, s) - 1) and uses it to index those
arrays without validating the range. If the packet contains a
negative or out-of-range channel number, the driver may write past
the end of these arrays.
Introduce a local channel variable and validate it before updating the
arrays. We reject negative indices, limit meter_level[] and
comp_level[] to SND_US16X08_MAX_CHANNELS, and guard master_level[]
updates with ARRAY_SIZE(master_level).