MSG_SPLICE_PAGES can attach pages from a pipe directly to an skb. TCP
marks such skbs with SKBFL_SHARED_FRAG after skb_splice_from_iter(),
so later paths that may modify packet data can first make a private
copy. The IPv4/IPv6 datagram append paths did not set this flag when
splicing pages into UDP skbs.
That leaves an ESP-in-UDP packet made from shared pipe pages looking
like an ordinary uncloned nonlinear skb. ESP input then takes the no-COW
fast path for uncloned skbs without a frag_list and decrypts in place
over data that is not owned privately by the skb.
Mark IPv4/IPv6 datagram splice frags with SKBFL_SHARED_FRAG, matching
TCP. Also make ESP input fall back to skb_cow_data() when the flag is
present, so ESP does not decrypt externally backed frags in place.
Private nonlinear skb frags still use the existing fast path.
This intentionally does not change ESP output. In esp_output_head(),
the path that appends the ESP trailer to existing skb tailroom without
calling skb_cow_data() is not reachable for nonlinear skbs:
skb_tailroom() returns zero when skb->data_len is nonzero, while ESP
tailen is positive. Thus ESP output will either use the separate
destination-frag path or fall back to skb_cow_data().
Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible") Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible") Fixes: 7da0dde68486 ("ip, udp: Support MSG_SPLICE_PAGES") Fixes: 6d8192bd69bb ("ip6, udp6: Support MSG_SPLICE_PAGES") Reported-by: Hyunwoo Kim <imv4bel@gmail.com> Reported-by: Kuan-Ting Chen <h3xrabbit@gmail.com> Tested-by: Hyunwoo Kim <imv4bel@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Kuan-Ting Chen <h3xrabbit@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
[bwh: Backported to 6.1: set the SKBFL_SHARED_FRAG flag in
ip_append_page() instead of __ip{,6}_append_data()] Signed-off-by: Ben Hutchings <benh@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
privcmd_vm_ops defines .close (privcmd_close), but neither .may_split
nor .open. When userspace does a partial munmap() on a privcmd mapping,
the kernel splits the VMA via __split_vma(). Since may_split is NULL,
the split is allowed. vm_area_dup() copies vm_private_data (a pages
array allocated in alloc_empty_pages()) into the new VMA without any
fixup, because there is no .open callback.
Both VMAs now point to the same pages array. When the unmapped portion
is closed, privcmd_close() calls:
- xen_unmap_domain_gfn_range()
- xen_free_unpopulated_pages()
- kvfree(pages)
The surviving VMA still holds the dangling pointer. When it is later
destroyed, the same sequence runs again, which leads to a double free.
Fix this issue by adding a .may_split callback denying the VMA split.
The src SG list offset wasn't set properly when decrypting in-place,
fix it.
Reported-by: Wolfgang Walter <linux@stwm.de> Fixes: e02494114ebf ("crypto: authencesn - Do not place hiseq at end of dst for out-of-place decryption") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When decrypting data that is not in-place (src != dst), there is
no need to save the high-order sequence bits in dst as it could
simply be re-copied from the source.
However, the data to be hashed need to be rearranged accordingly.
Reported-by: Taeyang Lee <0wn@theori.io> Fixes: 104880a6b470 ("crypto: authencesn - Convert to new AEAD interface") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
AF_ALG AEAD AIO requests currently use the socket-wide IV buffer during
request processing. For async requests, later socket activity can
update that shared state before the original request has fully
completed, which can lead to inconsistent IV handling.
Snapshot the IV into per-request storage when preparing the AEAD
request, so in-flight operations no longer depend on mutable socket
state.
Fixes: d887c52d6ae4 ("crypto: algif_aead - overhaul memory management") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Co-developed-by: Luxing Yin <tr0jan@lzu.edu.cn> Signed-off-by: Luxing Yin <tr0jan@lzu.edu.cn> Tested-by: Yucheng Lu <kanolyc@gmail.com> Signed-off-by: Douya Le <ldy3087146292@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This mostly reverts commit 72548b093ee3 except for the copying of
the associated data.
There is no benefit in operating in-place in algif_aead since the
source and destination come from different mappings. Get rid of
all the complexity added for in-place operation and just copy the
AD directly.
Fixes: 72548b093ee3 ("crypto: algif_aead - copy AAD from src to dst") Reported-by: Taeyang Lee <0wn@theori.io> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This function was rewritten twice. The earlier implementations had many
prerequisite commits, while the latest implementation is standalone.
It's much easier to just backport the latest code directly.
Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the core of io_uring was updated to handle completions
consistently and with fixed return codes, the POLL_REMOVE opcode
with updates got slightly broken. If a POLL_ADD is pending and
then POLL_REMOVE is used to update the events of that request, if that
update causes the POLL_ADD to now trigger, then that completion is lost
and a CQE is never posted.
Additionally, ensure that if an update does cause an existing POLL_ADD
to complete, that the completion value isn't always overwritten with
-ECANCELED. For that case, whatever io_poll_add() set the value to
should just be retained.
The original commit attempted to enable ACS in pci_dma_configure() prior
to IOMMU group assignment in iommu_init_device() to fix the ACS enablement
issue for OF platforms. But that assumption doesn't hold true for kernel
versions prior to v6.15, because on these older kernels,
pci_dma_configure() is called *after* iommu_init_device(). So the IOMMU
groups are already created before the ACS gets enabled. This causes the
devices that should have been split into separate groups by ACS, getting
merged into one group, thereby breaking the IOMMU isolation as reported on
the AMD machines.
So revert the offending commit to restore the IOMMU group assignment on
those affected machines. It should be noted that ACS has never really
worked on kernel versions prior to v6.15, so the revert doesn't make any
difference for OF platforms.
Reported-by: John Hancock <john@kernel.doghat.io> Reported-by: bjorn.forsman@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221234 Fixes: b20b659c2c6a ("PCI: Enable ACS after configuring IOMMU for OF platforms") Cc: Linux kernel regressions list <regressions@lists.linux.dev> Link: https://lore.kernel.org/regressions/2c30f181-ffc6-4d63-a64e-763cf4528f48@leemhuis.info Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
krb_authenticate frees sess->user and does not set the pointer
to NULL. It calls ksmbd_krb5_authenticate to reinitialise
sess->user but that function may return without doing so. If
that happens then smb2_sess_setup, which calls krb_authenticate,
will be accessing free'd memory when it later uses sess->user.
Cc: stable@vger.kernel.org Signed-off-by: Sean Heelan <seanheelan@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Leon Chen <leonchen.oss@139.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a session is found but its state is not SMB2_SESSION_VALID, It
indicates that no valid session was found, but it is missing to decrement
the reference count acquired by the session lookup, which results in
a reference count leak. This patch fixes the issue by explicitly calling
ksmbd_user_session_put to release the reference to the session.
Cc: stable@vger.kernel.org Reported-by: Alexandre <roger.andersen@protonmail.com> Reported-by: Stanislas Polu <spolu@dust.tt> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Li hongliang <1468888505@139.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently the code attempts to accept requests regardless of the
command identifier which may cause multiple requests to be marked
as pending (FLAG_DEFER_SETUP) which can cause more than
L2CAP_ECRED_MAX_CID(5) to be allocated in l2cap_ecred_rsp_defer
causing an overflow.
The spec is quite clear that the same identifier shall not be used on
subsequent requests:
'Within each signaling channel a different Identifier shall be used
for each successive request or indication.'
https://www.bluetooth.com/wp-content/uploads/Files/Specification/HTML/Core-62/out/en/host/logical-link-control-and-adaptation-protocol-specification.html#UUID-32a25a06-4aa4-c6c7-77c5-dcfe3682355d
So this attempts to check if there are any channels pending with the
same identifier and rejects if any are found.
Fixes: 15f02b910562 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode") Reported-by: Yiming Qian <yimingqian591@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
[ adapted variable names ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
`strncpy` is deprecated for use on NUL-terminated destination strings [1].
We should prefer more robust and less ambiguous string interfaces.
We expect name to be NUL-terminated based on its numerous uses with
functions that expect NUL-terminated strings.
For example in i2c-core-base.c +1533:
| dev_dbg(&adap->dev, "adapter [%s] registered\n", adap->name);
NUL-padding is not required as `adap` is already zero-alloacted with:
| adap = devm_kzalloc(&usbif->dev, sizeof(struct i2c_adapter), GFP_KERNEL);
With the above in mind, a suitable replacement is `strscpy` [2] due to
the fact that it guarantees NUL-termination on the destination buffer
without unnecessarily NUL-padding.
After this commit (e2b76ab8b5c9 "ksmbd: add support for read compound"),
response buffer management was changed to use dynamic iov array.
In the new design, smb2_calc_max_out_buf_len() expects the second
argument (hdr2_len) to be the offset of ->Buffer field in the
response structure, not a hardcoded magic number.
Fix the remaining call sites to use the correct offsetof() value.
Cc: stable@vger.kernel.org Fixes: e2b76ab8b5c9 ("ksmbd: add support for read compound") Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
[ adapted `req->CtlCode` field access to `req->CntCode` ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a compound request consists of QUERY_DIRECTORY + QUERY_INFO
(FILE_ALL_INFORMATION) and the first command consumes nearly the entire
max_trans_size, get_file_all_info() would blindly call smbConvertToUTF16()
with PATH_MAX, causing out-of-bounds write beyond the response buffer.
In get_file_all_info(), there was a missing validation check for
the client-provided OutputBufferLength before copying the filename into
FileName field of the smb2_file_all_info structure.
If the filename length exceeds the available buffer space, it could lead to
potential buffer overflows or memory corruption during smbConvertToUTF16
conversion. This calculating the actual free buffer size using
smb2_calc_max_out_buf_len() and returning -EINVAL if the buffer is
insufficient and updating smbConvertToUTF16 to use the actual filename
length (clamped by PATH_MAX) to ensure a safe copy operation.
Cc: stable@vger.kernel.org Fixes: e2b76ab8b5c9 ("ksmbd: add support for read compound") Reported-by: Asim Viladi Oglu Manizada <manizada@pm.me> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
[ adapted variable declarations ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move FSGSBASE enablement from identify_cpu() to cpu_init_exception_handling()
to ensure it is enabled before any exceptions can occur on both boot and
secondary CPUs.
== Background ==
Exception entry code (paranoid_entry()) uses ALTERNATIVE patching based on
X86_FEATURE_FSGSBASE to decide whether to use RDGSBASE/WRGSBASE instructions
or the slower RDMSR/SWAPGS sequence for saving/restoring GSBASE.
On boot CPU, ALTERNATIVE patching happens after enabling FSGSBASE in CR4.
When the feature is available, the code is permanently patched to use
RDGSBASE/WRGSBASE, which require CR4.FSGSBASE=1 to execute without triggering
== Boot Sequence ==
Boot CPU (with CR pinning enabled):
trap_init()
cpu_init() <- Uses unpatched code (RDMSR/SWAPGS)
x2apic_setup()
...
arch_cpu_finalize_init()
identify_boot_cpu()
identify_cpu()
cr4_set_bits(X86_CR4_FSGSBASE) # Enables the feature
# This becomes part of cr4_pinned_bits
...
alternative_instructions() <- Patches code to use RDGSBASE/WRGSBASE
Secondary CPUs (with CR pinning enabled):
start_secondary()
cr4_init() <- Code already patched, CR4.FSGSBASE=1
set implicitly via cr4_pinned_bits
cpu_init() <- exceptions work because FSGSBASE is
already enabled
Secondary CPU (with CR pinning disabled):
start_secondary()
cr4_init() <- Code already patched, CR4.FSGSBASE=0
cpu_init()
x2apic_setup()
rdmsrq(MSR_IA32_APICBASE) <- Triggers #VC in SNP guests
exc_vmm_communication()
paranoid_entry() <- Uses RDGSBASE with CR4.FSGSBASE=0
(patched code)
...
ap_starting()
identify_secondary_cpu()
identify_cpu()
cr4_set_bits(X86_CR4_FSGSBASE) <- Enables the feature, which is
too late
== CR Pinning ==
Currently, for secondary CPUs, CR4.FSGSBASE is set implicitly through
CR-pinning: the boot CPU sets it during identify_cpu(), it becomes part of
cr4_pinned_bits, and cr4_init() applies those pinned bits to secondary CPUs.
This works but creates an undocumented dependency between cr4_init() and the
pinning mechanism.
== Problem ==
Secondary CPUs boot after alternatives have been applied globally. They
execute already-patched paranoid_entry() code that uses RDGSBASE/WRGSBASE
instructions, which require CR4.FSGSBASE=1. Upcoming changes to CR pinning
behavior will break the implicit dependency, causing secondary CPUs to
generate #UD.
This issue manifests itself on AMD SEV-SNP guests, where the rdmsrq() in
x2apic_setup() triggers a #VC exception early during cpu_init(). The #VC
handler (exc_vmm_communication()) executes the patched paranoid_entry() path.
Without CR4.FSGSBASE enabled, RDGSBASE instructions trigger #UD.
== Fix ==
Enable FSGSBASE explicitly in cpu_init_exception_handling() before loading
exception handlers. This makes the dependency explicit and ensures both
boot and secondary CPUs have FSGSBASE enabled before paranoid_entry()
executes.
On arm64 server, we found folio that get from migration entry isn't locked
in softleaf_to_folio(). This issue triggers when mTHP splitting and
zap_nonpresent_ptes() races, and the root cause is lack of memory barrier
in softleaf_to_folio(). The race is as follows:
CPU0 CPU1
deferred_split_scan() zap_nonpresent_ptes()
lock folio
split_folio()
unmap_folio()
change ptes to migration entries
__split_folio_to_order() softleaf_to_folio()
set flags(including PG_locked) for tail pages folio = pfn_folio(softleaf_to_pfn(entry))
smp_wmb() VM_WARN_ON_ONCE(!folio_test_locked(folio))
prep_compound_page() for tail pages
In __split_folio_to_order(), smp_wmb() guarantees page flags of tail pages
are visible before the tail page becomes non-compound. smp_wmb() should
be paired with smp_rmb() in softleaf_to_folio(), which is missed. As a
result, if zap_nonpresent_ptes() accesses migration entry that stores tail
pfn, softleaf_to_folio() may see the updated compound_head of tail page
before page->flags.
This issue will trigger VM_WARN_ON_ONCE() in pfn_swap_entry_folio()
because of the race between folio split and zap_nonpresent_ptes()
leading to a folio incorrectly undergoing modification without a folio
lock being held.
This is a BUG_ON() before commit 93976a20345b ("mm: eliminate further
swapops predicates"), which in merged in v6.19-rc1.
To fix it, add missing smp_rmb() if the softleaf entry is migration entry
in softleaf_to_folio() and softleaf_to_page().
[tujinjiang@huawei.com: update function name and comments] Link: https://lkml.kernel.org/r/20260321075214.3305564-1-tujinjiang@huawei.com Link: https://lkml.kernel.org/r/20260319012541.4158561-1-tujinjiang@huawei.com Fixes: e9b61f19858a ("thp: reintroduce split_huge_page()") Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Barry Song <baohua@kernel.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ adapted fix from leafops.h softleaf_to_page()/softleaf_to_folio() ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
tcm_loop_target_reset() violates the SCSI EH contract: it returns SUCCESS
without draining any in-flight commands. The SCSI EH documentation
(scsi_eh.rst) requires that when a reset handler returns SUCCESS the driver
has made lower layers "forget about timed out scmds" and is ready for new
commands. Every other SCSI LLD (virtio_scsi, mpt3sas, ipr, scsi_debug,
mpi3mr) enforces this by draining or completing outstanding commands before
returning SUCCESS.
Because tcm_loop_target_reset() doesn't drain, the SCSI EH reuses in-flight
scsi_cmnd structures for recovery commands (e.g. TUR) while the target core
still has async completion work queued for the old se_cmd. The memset in
queuecommand zeroes se_lun and lun_ref_active, causing
transport_lun_remove_cmd() to skip its percpu_ref_put(). The leaked LUN
reference prevents transport_clear_lun_ref() from completing, hanging
configfs LUN unlink forever in D-state:
INFO: task rm:264 blocked for more than 122 seconds.
rm D 0 264 258 0x00004000
Call Trace:
__schedule+0x3d0/0x8e0
schedule+0x36/0xf0
transport_clear_lun_ref+0x78/0x90 [target_core_mod]
core_tpg_remove_lun+0x28/0xb0 [target_core_mod]
target_fabric_port_unlink+0x50/0x60 [target_core_mod]
configfs_unlink+0x156/0x1f0 [configfs]
vfs_unlink+0x109/0x290
do_unlinkat+0x1d5/0x2d0
Fix this by making tcm_loop_target_reset() actually drain commands:
1. Issue TMR_LUN_RESET via tcm_loop_issue_tmr() to drain all commands that
the target core knows about (those not yet CMD_T_COMPLETE).
2. Use blk_mq_tagset_busy_iter() to iterate all started requests and
flush_work() on each se_cmd — this drains any deferred completion work
for commands that already had CMD_T_COMPLETE set before the TMR (which
the TMR skips via __target_check_io_state()). This is the same pattern
used by mpi3mr, scsi_debug, and libsas to drain outstanding commands
during reset.
Fixes: e0eb5d38b732 ("scsi: target: tcm_loop: Use block cmd allocator for se_cmds") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Josef Bacik <josef@toxicpanda.com> Link: https://patch.msgid.link/27011aa34c8f6b1b94d2e3cf5655b6d037f53428.1773706803.git.josef@toxicpanda.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ added `bool reserved` parameter to `tcm_loop_flush_work_iter()` ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The devm_free_irq() and devm_request_irq() functions should not be
executed in an atomic context.
During device suspend, all userspace processes and most kernel threads
are frozen. Additionally, we flush all tx/rx status, disable all macb
interrupts, and halt rx operations. Therefore, it is safe to split the
region protected by bp->lock into two independent sections, allowing
devm_free_irq() and devm_request_irq() to run in a non-atomic context.
This modification resolves the following lockdep warning:
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:591
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 501, name: rtcwake
preempt_count: 1, expected: 0
RCU nest depth: 1, expected: 0
7 locks held by rtcwake/501:
#0: ffff0008038c3408 (sb_writers#5){.+.+}-{0:0}, at: vfs_write+0xf8/0x368
#1: ffff0008049a5e88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0xbc/0x1c8
#2: ffff00080098d588 (kn->active#70){.+.+}-{0:0}, at: kernfs_fop_write_iter+0xcc/0x1c8
#3: ffff800081c84888 (system_transition_mutex){+.+.}-{4:4}, at: pm_suspend+0x1ec/0x290
#4: ffff0008009ba0f8 (&dev->mutex){....}-{4:4}, at: device_suspend+0x118/0x4f0
#5: ffff800081d00458 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x48
#6: ffff0008031fb9e0 (&bp->lock){-.-.}-{3:3}, at: macb_suspend+0x144/0x558
irq event stamp: 8682
hardirqs last enabled at (8681): [<ffff8000813c7d7c>] _raw_spin_unlock_irqrestore+0x44/0x88
hardirqs last disabled at (8682): [<ffff8000813c7b58>] _raw_spin_lock_irqsave+0x38/0x98
softirqs last enabled at (7322): [<ffff8000800f1b4c>] handle_softirqs+0x52c/0x588
softirqs last disabled at (7317): [<ffff800080010310>] __do_softirq+0x20/0x2c
CPU: 1 UID: 0 PID: 501 Comm: rtcwake Not tainted 7.0.0-rc3-next-20260310-yocto-standard+ #125 PREEMPT
Hardware name: ZynqMP ZCU102 Rev1.1 (DT)
Call trace:
show_stack+0x24/0x38 (C)
__dump_stack+0x28/0x38
dump_stack_lvl+0x64/0x88
dump_stack+0x18/0x24
__might_resched+0x200/0x218
__might_sleep+0x38/0x98
__mutex_lock_common+0x7c/0x1378
mutex_lock_nested+0x38/0x50
free_irq+0x68/0x2b0
devm_irq_release+0x24/0x38
devres_release+0x40/0x80
devm_free_irq+0x48/0x88
macb_suspend+0x298/0x558
device_suspend+0x218/0x4f0
dpm_suspend+0x244/0x3a0
dpm_suspend_start+0x50/0x78
suspend_devices_and_enter+0xec/0x560
pm_suspend+0x194/0x290
state_store+0x110/0x158
kobj_attr_store+0x1c/0x30
sysfs_kf_write+0xa8/0xd0
kernfs_fop_write_iter+0x11c/0x1c8
vfs_write+0x248/0x368
ksys_write+0x7c/0xf8
__arm64_sys_write+0x28/0x40
invoke_syscall+0x4c/0xe8
el0_svc_common+0x98/0xf0
do_el0_svc+0x28/0x40
el0_svc+0x54/0x1e0
el0t_64_sync_handler+0x84/0x130
el0t_64_sync+0x198/0x1a0
Fixes: 558e35ccfe95 ("net: macb: WoL support for GEM type of Ethernet controller") Cc: stable@vger.kernel.org Reviewed-by: Théo Lebrun <theo.lebrun@bootlin.com> Signed-off-by: Kevin Hao <haokexin@gmail.com> Link: https://patch.msgid.link/20260318-macb-irq-v2-1-f1179768ab24@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ adapted WoL register writes to use MACB_BIT(MAG) instead of tmp variable ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When installing an emulated MMIO SPTE, do so *after* dropping/zapping the
existing SPTE (if it's shadow-present). While commit a54aa15c6bda3 was
right about it being impossible to convert a shadow-present SPTE to an
MMIO SPTE due to a _guest_ write, it failed to account for writes to guest
memory that are outside the scope of KVM.
E.g. if host userspace modifies a shadowed gPTE to switch from a memslot
to emulted MMIO and then the guest hits a relevant page fault, KVM will
install the MMIO SPTE without first zapping the shadow-present SPTE.
The driver lists (ld_free, ld_queue) are used in
rz_dmac_free_chan_resources(), rz_dmac_terminate_all(),
rz_dmac_issue_pending(), and rz_dmac_irq_handler_thread(), all under
the virtual channel lock. Take the same lock in rz_dmac_prep_slave_sg()
and rz_dmac_prep_dma_memcpy() as well to avoid concurrency issues, since
these functions also check whether the lists are empty and update or
remove list entries.
Both rz_dmac_disable_hw() and rz_dmac_irq_handle_channel() update the
CHCTRL register. To avoid concurrency issues when configuring
functionalities exposed by this registers, take the virtual channel lock.
All other CHCTRL updates were already protected by the same lock.
Previously, rz_dmac_disable_hw() disabled and re-enabled local IRQs, before
accessing CHCTRL registers but this does not ensure race-free access.
Remove the local IRQ disable/enable code as well.
In xfs_inode_item_push() and xfs_qm_dquot_logitem_push(), the AIL lock
is dropped to perform buffer IO. Once the cluster buffer no longer
protects the log item from reclaim, the log item may be freed by
background reclaim or the dquot shrinker. The subsequent spin_lock()
call dereferences lip->li_ailp, which is a use-after-free.
Fix this by saving the ailp pointer in a local variable while the AIL
lock is held and the log item is guaranteed to be valid.
Reported-by: syzbot+652af2b3c5569c4ab63c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=652af2b3c5569c4ab63c Fixes: 90c60e164012 ("xfs: xfs_iflush() is no longer necessary") Cc: stable@vger.kernel.org # v5.9 Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit b98535d09179 ("ext4: fix bug_on in start_this_handle during umount
filesystem") moved ext4_unregister_sysfs() before flushing s_sb_upd_work
to prevent new error work from being queued via /proc/fs/ext4/xx/mb_groups
reads during unmount. However, this introduced a use-after-free because
update_super_work calls ext4_notify_error_sysfs() -> sysfs_notify() which
accesses the kobject's kernfs_node after it has been freed by kobject_del()
in ext4_unregister_sysfs():
Instead of reordering the teardown sequence, fix this by making
ext4_notify_error_sysfs() detect that sysfs has already been torn down
by checking s_kobj.state_in_sysfs, and skipping the sysfs_notify() call
in that case. A dedicated mutex (s_error_notify_mutex) serializes
ext4_notify_error_sysfs() against kobject_del() in ext4_unregister_sysfs()
to prevent TOCTOU races where the kobject could be deleted between the
state_in_sysfs check and the sysfs_notify() call.
Fixes: b98535d09179 ("ext4: fix bug_on in start_this_handle during umount filesystem") Cc: Jiayuan Chen <jiayuan.chen@linux.dev> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260319120336.157873-1-jiayuan.chen@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
[ adapted mutex_init placement ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The above scenarios occur in initialization failures and teardown
paths, there are no parallel operations on the resources released
by kvfree(), this commit therefore remove rcu_read_lock/unlock() and
use rcu_access_pointer() instead of rcu_dereference() operations.
Fixes: 7c990728b99e ("ext4: fix potential race between s_flex_groups online resizing and access") Fixes: df3da4ea5a0f ("ext4: fix potential race between s_group_info online resizing and access") Signed-off-by: Zqiang <qiang.zhang@linux.dev> Reviewed-by: Baokun Li <libaokun@linux.alibaba.com> Link: https://patch.msgid.link/20260319094545.19291-1-qiang.zhang@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
[ adapted fix to inlined teardown code ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ext4_inode_attach_jinode() publishes ei->jinode to concurrent users.
It used to set ei->jinode before jbd2_journal_init_jbd_inode(),
allowing a reader to observe a non-NULL jinode with i_vfs_inode
still unset.
The fast commit flush path can then pass this jinode to
jbd2_wait_inode_data(), which dereferences i_vfs_inode->i_mapping and
may crash.
Fix this by initializing the jbd2_inode first.
Use smp_wmb() and WRITE_ONCE() to publish ei->jinode after
initialization. Readers use READ_ONCE() to fetch the pointer.
Fixes: a361293f5fede ("jbd2: Fix oops in jbd2_journal_file_inode()") Cc: stable@vger.kernel.org Signed-off-by: Li Chen <me@linux.beauty> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260225082617.147957-1-me@linux.beauty Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
[ adapted READ_ONCE(jinode) wrapping to split ext4_fc_submit_inode_data_all() and ext4_fc_wait_inode_data_all() ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit b81ac4395bbe ("usb: gadget: uvc: allow for application to cleanly
shutdown") introduced two stages of synchronization waits totaling 1500ms
in uvc_function_unbind() to prevent several types of kernel panics.
However, this timing-based approach is insufficient during power
management (PM) transitions.
When the PM subsystem starts freezing user space processes, the
wait_event_interruptible_timeout() is aborted early, which allows the
unbind thread to proceed and nullify the gadget pointer
(cdev->gadget = NULL):
[ 814.123447][ T947] configfs-gadget.g1 gadget.0: uvc: uvc_function_unbind()
[ 814.178583][ T3173] PM: suspend entry (deep)
[ 814.192487][ T3173] Freezing user space processes
[ 814.197668][ T947] configfs-gadget.g1 gadget.0: uvc: uvc_function_unbind no clean disconnect, wait for release
When the PM subsystem resumes or aborts the suspend and tasks are
restarted, the V4L2 release path is executed and attempts to access the
already nullified gadget pointer, triggering a kernel panic:
Address the race condition and NULL pointer dereference by:
1. State Synchronization (flag + mutex)
Introduce a 'func_unbound' flag in struct uvc_device. This allows
uvc_function_disconnect() to safely skip accessing the nullified
cdev->gadget pointer. As suggested by Alan Stern, this flag is protected
by a new mutex (uvc->lock) to ensure proper memory ordering and prevent
instruction reordering or speculative loads. This mutex is also used to
protect 'func_connected' for consistent state management.
2. Explicit Synchronization (completion)
Use a completion to synchronize uvc_function_unbind() with the
uvc_vdev_release() callback. This prevents Use-After-Free (UAF) by
ensuring struct uvc_device is freed after all video device resources
are released.
Fixes: b81ac4395bbe ("usb: gadget: uvc: allow for application to cleanly shutdown") Cc: stable <stable@kernel.org> Suggested-by: Alan Stern <stern@rowland.harvard.edu> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jimmy Hu <hhhuuu@google.com> Link: https://patch.msgid.link/20260320065427.1374555-1-hhhuuu@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ replaced guard()/scoped_guard() macros ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A race condition between gether_disconnect() and eth_stop() leads to a
NULL pointer dereference. Specifically, if eth_stop() is triggered
concurrently while gether_disconnect() is tearing down the endpoints,
eth_stop() attempts to access the cleared endpoint descriptor, causing
the following NPE:
Because eth_stop() crashes while holding the dev->lock, the thread
running gether_disconnect() fails to acquire the same lock and spins
forever, resulting in a hardlockup:
Core - Debugging Information for Hardlockup core(7)
Call trace:
queued_spin_lock_slowpath+0x94/0x488
_raw_spin_lock+0x64/0x6c
gether_disconnect+0x19c/0x1e8
ncm_set_alt+0x68/0x1a0
composite_setup+0x6a0/0xc50
The root cause is that the clearing of dev->port_usb in
gether_disconnect() is delayed until the end of the function.
Move the clearing of dev->port_usb to the very beginning of
gether_disconnect() while holding dev->lock. This cuts off the link
immediately, ensuring eth_stop() will see dev->port_usb as NULL and
safely bail out.
There was an issue when you did the following:
- setup and bind an hid gadget
- open /dev/hidg0
- use the resulting fd in EPOLL_CTL_ADD
- unbind the UDC
- bind the UDC
- use the fd in EPOLL_CTL_DEL
When CONFIG_DEBUG_LIST was enabled, a list_del corruption was reported
within remove_wait_queue (via ep_remove_wait_queue). After some
debugging I found out that the queues, which f_hid registers via
poll_wait were the problem. These were initialized using
init_waitqueue_head inside hidg_bind. So effectively, the bind function
re-initialized the queues while there were still items in them.
The solution is to move the initialization from hidg_bind to hidg_alloc
to extend their lifetimes to the lifetime of the function instance.
Additionally, I found many other possibly problematic init calls in the
bind function, which I moved as well.
Userspace can create an unlimited number of rfkill events if the system
is so configured, while not consuming them from the rfkill file
descriptor, causing a potential out of memory situation. Prevent this
from bounding the number of pending rfkill events at a "large" number
(i.e. 1000) to prevent abuses like this.
Cc: Johannes Berg <johannes@sipsolutions.net> Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/2026033013-disfigure-scroll-e25e@gregkh Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[ replaced `scoped_guard()` with explicit `mutex_lock()`/`mutex_unlock()` calls ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The seg6 lwtunnel uses a single dst_cache per encap route, shared
between seg6_input_core() and seg6_output_core(). These two paths
can perform the post-encap SID lookup in different routing contexts
(e.g., ip rules matching on the ingress interface, or VRF table
separation). Whichever path runs first populates the cache, and the
other reuses it blindly, bypassing its own lookup.
Fix this by splitting the cache into cache_input and cache_output,
so each path maintains its own cached dst independently.
Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Cc: stable@vger.kernel.org Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Justin Iurman <justin.iurman@gmail.com> Link: https://patch.msgid.link/20260404004405.4057-2-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ added missing dst reference loop guard in seg6_output_core() ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This commit was originally adding the ability to add MPTCP endpoints
with ID 0 by accident. The in-kernel PM, handling MPTCP endpoints at the
net namespace level, is not supposed to handle endpoints with such ID,
because this ID 0 is reserved to the initial subflow, as mentioned in
the MPTCPv1 protocol [1], a per-connection setting.
Note that 'ip mptcp endpoint add id 0' stops early with an error, but
other tools might still request the in-kernel PM to create MPTCP
endpoints with this restricted ID 0.
In other words, it was wrong to call the mptcp_pm_has_addr_attr_id
helper to check whether the address ID attribute is set: if it was set
to 0, a new MPTCP endpoint would be created with ID 0, which is not
expected, and might cause various issues later.
Commit 453b8fb68f36 ("xen/privcmd: restrict usage in
unprivileged domU") added a xenstore notifier to defer setting the
restriction target until Xenstore is ready.
XEN_PRIVCMD can be built as a module, but privcmd_exit() leaves that
notifier behind. Balance the notifier lifecycle by unregistering it on
module exit.
This is harmless even if xenstore was already ready at registration
time and the notifier was never queued on the chain.
An AF_RXRPC socket can be both client and server at the same time. When
sending new calls (ie. it's acting as a client), it uses rx->key to set the
security, and when accepting incoming calls (ie. it's acting as a server),
it uses rx->securities.
setsockopt(RXRPC_SECURITY_KEY) sets rx->key to point to an rxrpc-type key
and setsockopt(RXRPC_SECURITY_KEYRING) sets rx->securities to point to a
keyring of rxrpc_s-type keys.
Now, it should be possible to use both rx->key and rx->securities on the
same socket - but for userspace AF_RXRPC sockets rxrpc_setsockopt()
prevents that.
Fix this by:
(1) Remove the incorrect check rxrpc_setsockopt(RXRPC_SECURITY_KEYRING)
makes on rx->key.
(2) Move the check that rxrpc_setsockopt(RXRPC_SECURITY_KEY) makes on
rx->key down into rxrpc_request_key().
(3) Remove rxrpc_request_key()'s check on rx->securities.
This (in combination with a previous patch) pushes the checks down into the
functions that set those pointers and removes the cross-checks that prevent
both key and keyring being set.
Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Closes: https://sashiko.dev/#/patchset/20260401105614.1696001-10-dhowells@redhat.com Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Anderson Nascimento <anderson@allelesecurity.com>
cc: Luxiao Xu <rakukuip@gmail.com>
cc: Yuan Tan <yuantan098@gmail.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-16-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The jumbo_frm() chain-mode implementation unconditionally computes
len = nopaged_len - bmax;
where nopaged_len = skb_headlen(skb) (linear bytes only) and bmax is
BUF_SIZE_8KiB or BUF_SIZE_2KiB. However, the caller stmmac_xmit()
decides to invoke jumbo_frm() based on skb->len (total length including
page fragments):
When a packet has a small linear portion (nopaged_len <= bmax) but a
large total length due to page fragments (skb->len > bmax), the
subtraction wraps as an unsigned integer, producing a huge len value
(~0xFFFFxxxx). This causes the while (len != 0) loop to execute
hundreds of thousands of iterations, passing skb->data + bmax * i
pointers far beyond the skb buffer to dma_map_single(). On IOMMU-less
SoCs (the typical deployment for stmmac), this maps arbitrary kernel
memory to the DMA engine, constituting a kernel memory disclosure and
potential memory corruption from hardware.
Fix this by introducing a buf_len local variable clamped to
min(nopaged_len, bmax). Computing len = nopaged_len - buf_len is then
always safe: it is zero when the linear portion fits within a single
descriptor, causing the while (len != 0) loop to be skipped naturally,
and the fragment loop in stmmac_xmit() handles page fragments afterward.
qca_tty_receive() consumes each input byte before checking whether a
completed frame needs a fresh receive skb. When the current byte completes
a frame, the driver delivers that frame and then allocates a new skb for
the next one.
If that allocation fails, the current code returns i even though data[i]
has already been consumed and may already have completed the delivered
frame. Since serdev interprets the return value as the number of accepted
bytes, this under-reports progress by one byte and can replay the final
byte of the completed frame into a fresh parser state on the next call.
Return i + 1 in that failure path so the accepted-byte count matches the
actual receive-state progress.
Make sure to deregister the controller before dropping the reference to
the driver data on disconnect to avoid NULL-pointer dereferences or
use-after-free.
A use-after-free / refcount underflow is possible when the heartbeat
worker and intel_engine_park_heartbeat() race to release the same
engine->heartbeat.systole request.
The heartbeat worker reads engine->heartbeat.systole and calls
i915_request_put() on it when the request is complete, but clears
the pointer in a separate, non-atomic step. Concurrently, a request
retirement on another CPU can drop the engine wakeref to zero, triggering
__engine_park() -> intel_engine_park_heartbeat(). If the heartbeat
timer is pending at that point, cancel_delayed_work() returns true and
intel_engine_park_heartbeat() reads the stale non-NULL systole pointer
and calls i915_request_put() on it again, causing a refcount underflow:
Fix this by replacing the non-atomic pointer read + separate clear with
xchg() in both racing paths. xchg() is a single indivisible hardware
instruction that atomically reads the old pointer and writes NULL. This
guarantees only one of the two concurrent callers obtains the non-NULL
pointer and performs the put, the other gets NULL and skips it.
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/work_items/15880 Fixes: 058179e72e09 ("drm/i915/gt: Replace hangcheck by heartbeats") Cc: <stable@vger.kernel.org> # v5.5+ Signed-off-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com> Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/d4c1c14255688dd07cc8044973c4f032a8d1559e.1775038106.git.sebastian.brzezinka@intel.com
(cherry picked from commit 13238dc0ee4f9ab8dafa2cca7295736191ae2f42) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When dma_map_single() fails in tse_start_xmit(), the function returns
NETDEV_TX_OK without freeing the skb. Since NETDEV_TX_OK tells the
stack the packet was consumed, the skb is never freed, leaking memory
on every DMA mapping failure.
Add dev_kfree_skb_any() before returning to properly free the skb.
Fixes: bbd2190ce96d ("Altera TSE: Add main and header file for Altera Ethernet Driver") Cc: stable@vger.kernel.org Signed-off-by: David Carlier <devnexen@gmail.com> Link: https://patch.msgid.link/20260401211218.279185-1-devnexen@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The -EBUSY handling in tls_do_encryption(), introduced by commit 859054147318 ("net: tls: handle backlogging of crypto requests"), has
a use-after-free due to double cleanup of encrypt_pending and the
scatterlist entry.
When crypto_aead_encrypt() returns -EBUSY, the request is enqueued to
the cryptd backlog and the async callback tls_encrypt_done() will be
invoked upon completion. That callback unconditionally restores the
scatterlist entry (sge->offset, sge->length) and decrements
ctx->encrypt_pending. However, if tls_encrypt_async_wait() returns an
error, the synchronous error path in tls_do_encryption() performs the
same cleanup again, double-decrementing encrypt_pending and
double-restoring the scatterlist.
The double-decrement corrupts the encrypt_pending sentinel (initialized
to 1), making tls_encrypt_async_wait() permanently skip the wait for
pending async callbacks. A subsequent sendmsg can then free the
tls_rec via bpf_exec_tx_verdict() while a cryptd callback is still
pending, resulting in a use-after-free when the callback fires on the
freed record.
Fix this by skipping the synchronous cleanup when the -EBUSY async
wait returns an error, since the callback has already handled
encrypt_pending and sge restoration.
Fixes: 859054147318 ("net: tls: handle backlogging of crypto requests") Cc: stable@vger.kernel.org Signed-off-by: Muhammad Alifa Ramdhan <ramdhan@starlabs.sg> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20260403013617.2838875-1-ramdhan@starlabs.sg Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
batadv_tt_prepare_tvlv_global_data() builds the allocation length for a
global TT response in 16-bit temporaries. When a remote originator
advertises a large enough global TT, the TT payload length plus the VLAN
header offset can exceed 65535 and wrap before kmalloc().
The full-table response path still uses the original TT payload length when
it fills tt_change, so the wrapped allocation is too small and
batadv_tt_prepare_tvlv_global_data() writes past the end of the heap object
before the later packet-size check runs.
Fix this by rejecting TT responses whose TVLV value length cannot fit in
the 16-bit TVLV payload length field.
Fixes: 7ea7b4a14275 ("batman-adv: make the TT CRC logic VLAN specific") Cc: stable@vger.kernel.org Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Tested-by: Ren Wei <enjou1224z@gmail.com> Signed-off-by: Ruide Cao <caoruide123@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
pn532_receive_buf() reports the number of accepted bytes to the serdev
core. The current code consumes bytes into recv_skb and may already hand
a complete frame to pn533_recv_frame() before allocating a fresh receive
buffer.
If that alloc_skb() fails, the callback returns 0 even though it has
already consumed bytes, and it leaves recv_skb as NULL for the next
receive callback. That breaks the receive_buf() accounting contract and
can also lead to a NULL dereference on the next skb_put_u8().
Allocate the receive skb lazily before consuming the next byte instead.
If allocation fails, return the number of bytes already accepted.
Reboot starts failing on Poplar since commit 8424ecdde7df ("arm64: mm:
Set ZONE_DMA size based on devicetree's dma-ranges"), which effectively
changes zone_dma_bits from 30 to 32 for arm64 platforms that do not
properly define dma-ranges in device tree. It's unclear how Poplar reboot
gets broken by this change exactly, but a dma-ranges limiting zone_dma to
the first 1 GB fixes the regression.
The PCIe reset GPIO on Poplar is actually active low. The active high
worked before because kernel driver didn't respect the setting from DT.
This is changed since commit 1d26a55fbeb9 ("PCI: histb: Switch to using
gpiod API"), and thus PCIe on Poplar got brken since then.
The GRP_ACK_MSG handler in tipc_group_proto_rcv() currently decrements
bc_ackers on every inbound group ACK, even when the same member has
already acknowledged the current broadcast round.
Because bc_ackers is a u16, a duplicate ACK received after the last
legitimate ACK wraps the counter to 65535. Once wrapped,
tipc_group_bc_cong() keeps reporting congestion and later group
broadcasts on the affected socket stay blocked until the group is
recreated.
Fix this by ignoring duplicate or stale ACKs before touching bc_acked or
bc_ackers. This makes repeated GRP_ACK_MSG handling idempotent and
prevents the underflow path.
Fixes: 2f487712b893 ("tipc: guarantee that group broadcast doesn't bypass group unicast") Cc: stable@vger.kernel.org Signed-off-by: Oleh Konko <security@1seal.org> Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/41a4833f368641218e444fdcff822039.security@1seal.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
nft_ct_timeout_obj_destroy() frees the timeout object with kfree()
immediately after nf_ct_untimeout(), without waiting for an RCU grace
period. Concurrent packet processing on other CPUs may still hold
RCU-protected references to the timeout object obtained via
rcu_dereference() in nf_ct_timeout_data().
Add an rcu_head to struct nf_ct_timeout and use kfree_rcu() to defer
freeing until after an RCU grace period, matching the approach already
used in nfnetlink_cttimeout.c.
KASAN report:
BUG: KASAN: slab-use-after-free in nf_conntrack_tcp_packet+0x1381/0x29d0
Read of size 4 at addr ffff8881035fe19c by task exploit/80
Backport for conflicts introdued by
- conversion from sha1 to sha256 introduced in e44a4dc4b36c ("apparmor: switch SECURITY_APPARMOR_HASH from sha1 to sha256
")
- adding of conditioanl that nests the conflicting code inside an if
condition d61c57fde819 ("apparmor: make export of raw binary profile to userspace optional")
AppArmor was putting the reference to i_private data on its end after
removing the original entry from the file system. However the inode
can and does live beyond that point and it is possible that some of
the fs call back functions will be invoked after the reference has
been put, which results in a race between freeing the data and
accessing it through the fs.
While the rawdata/loaddata is the most likely candidate to fail the
race, as it has the fewest references. If properly crafted it might be
possible to trigger a race for the other types stored in i_private.
Fix this by moving the put of i_private referenced data to the correct
place which is during inode eviction.
Fixes: c961ee5f21b20 ("apparmor: convert from securityfs to apparmorfs for policy ns files") Reported-by: Qualys Security Advisory <qsa@qualys.com> Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com> Reviewed-by: Maxime Bélair <maxime.belair@canonical.com> Reviewed-by: Cengiz Can <cengiz.can@canonical.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Backport for conflicts introdued by d61c57fde819 ("apparmor: make export of raw binary profile to userspace optional")
which nests the conflicting code inside an if condition
There is a race condition that leads to a use-after-free situation:
because the rawdata inodes are not refcounted, an attacker can start
open()ing one of the rawdata files, and at the same time remove the
last reference to this rawdata (by removing the corresponding profile,
for example), which frees its struct aa_loaddata; as a result, when
seq_rawdata_open() is reached, i_private is a dangling pointer and
freed memory is accessed.
The rawdata inodes weren't refcounted to avoid a circular refcount and
were supposed to be held by the profile rawdata reference. However
during profile removal there is a window where the vfs and profile
destruction race, resulting in the use after free.
Fix this by moving to a double refcount scheme. Where the profile
refcount on rawdata is used to break the circular dependency. Allowing
for freeing of the rawdata once all inode references to the rawdata
are put.
Fixes: 5d5182cae401 ("apparmor: move to per loaddata files, instead of replicating in profiles") Reported-by: Qualys Security Advisory <qsa@qualys.com> Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com> Reviewed-by: Maxime Bélair <maxime.belair@canonical.com> Reviewed-by: Cengiz Can <cengiz.can@canonical.com> Tested-by: Salvatore Bonaccorso <carnil@debian.org> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Differential encoding allows loops to be created if it is abused. To
prevent this the unpack should verify that a diff-encode chain
terminates.
Unfortunately the differential encode verification had two bugs.
1. it conflated states that had gone through check and already been
marked, with states that were currently being checked and marked.
This means that loops in the current chain being verified are treated
as a chain that has already been verified.
2. the order bailout on already checked states compared current chain
check iterators j,k instead of using the outer loop iterator i.
Meaning a step backwards in states in the current chain verification
was being mistaken for moving to an already verified state.
Move to a double mark scheme where already verified states get a
different mark, than the current chain being kept. This enables us
to also drop the backwards verification check that was the cause of
the second error as any already verified state is already marked.
Fixes: 031dcc8f4e84 ("apparmor: dfa add support for state differential encoding") Reported-by: Qualys Security Advisory <qsa@qualys.com> Tested-by: Salvatore Bonaccorso <carnil@debian.org> Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com> Reviewed-by: Cengiz Can <cengiz.can@canonical.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Backport for api changes introduced in
- 90c436a64a6e ("apparmor: pass cred through to audit info.")
- 92de220a7f33 ("apparmor: update policy capable checks to use a label")
An unprivileged local user can load, replace, and remove profiles by
opening the apparmorfs interfaces, via a confused deputy attack, by
passing the opened fd to a privileged process, and getting the
privileged process to write to the interface.
This does require a privileged target that can be manipulated to do
the write for the unprivileged process, but once such access is
achieved full policy management is possible and all the possible
implications that implies: removing confinement, DoS of system or
target applications by denying all execution, by-passing the
unprivileged user namespace restriction, to exploiting kernel bugs for
a local privilege escalation.
The policy management interface can not have its permissions simply
changed from 0666 to 0600 because non-root processes need to be able
to load policy to different policy namespaces.
Instead ensure the task writing the interface has privileges that
are a subset of the task that opened the interface. This is already
done via policy for confined processes, but unconfined can delegate
access to the opened fd, by-passing the usual policy check.
The verify_dfa() function only checks DEFAULT_TABLE bounds when the state
is not differentially encoded.
When the verification loop traverses the differential encoding chain,
it reads k = DEFAULT_TABLE[j] and uses k as an array index without
validation. A malformed DFA with DEFAULT_TABLE[j] >= state_count,
therefore, causes both out-of-bounds reads and writes.
[ 57.179855] ==================================================================
[ 57.180549] BUG: KASAN: slab-out-of-bounds in verify_dfa+0x59a/0x660
[ 57.180904] Read of size 4 at addr ffff888100eadec4 by task su/993
The match_char() macro evaluates its character parameter multiple
times when traversing differential encoding chains. When invoked
with *str++, the string pointer advances on each iteration of the
inner do-while loop, causing the DFA to check different characters
at each iteration and therefore skip input characters.
This results in out-of-bounds reads when the pointer advances past
the input buffer boundary.
[ 94.984676] ==================================================================
[ 94.985301] BUG: KASAN: slab-out-of-bounds in aa_dfa_match+0x5ae/0x760
[ 94.985655] Read of size 1 at addr ffff888100342000 by task file/976
Currently the number of policy namespaces is not bounded relying on
the user namespace limit. However policy namespaces aren't strictly
tied to user namespaces and it is possible to create them and nest
them arbitrarily deep which can be used to exhaust system resource.
Hard cap policy namespaces to the same depth as user namespaces.
Fixes: c88d4c7b049e8 ("AppArmor: core policy routines") Reported-by: Qualys Security Advisory <qsa@qualys.com> Reviewed-by: Ryan Lee <ryan.lee@canonical.com> Reviewed-by: Cengiz Can <cengiz.can@canonical.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The profile removal code uses recursion when removing nested profiles,
which can lead to kernel stack exhaustion and system crashes.
Reproducer:
$ pf='a'; for ((i=0; i<1024; i++)); do
echo -e "profile $pf { \n }" | apparmor_parser -K -a;
pf="$pf//x";
done
$ echo -n a > /sys/kernel/security/apparmor/.remove
Replace the recursive __aa_profile_list_release() approach with an
iterative approach in __remove_profile(). The function repeatedly
finds and removes leaf profiles until the entire subtree is removed,
maintaining the same removal semantic without recursion.
The function sets `*ns = NULL` on every call, leaking the namespace
string allocated in previous iterations when multiple profiles are
unpacked. This also breaks namespace consistency checking since *ns
is always NULL when the comparison is made.
Remove the incorrect assignment.
The caller (aa_unpack) initializes *ns to NULL once before the loop,
which is sufficient.
Fixes: dd51c8485763 ("apparmor: provide base for multiple profiles to be replaced at once") Reported-by: Qualys Security Advisory <qsa@qualys.com> Tested-by: Salvatore Bonaccorso <carnil@debian.org> Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com> Reviewed-by: Cengiz Can <cengiz.can@canonical.com> Signed-off-by: Massimiliano Pellizzer <massimiliano.pellizzer@canonical.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Backport for conflicts caused by ad596ea74e74 ("apparmor: group dfa policydb unpacking")
- rearrange and consolidated the unpack.
b11e51dd7094 ("apparmor: test: make static symbols visible during kunit testing")
- rename function and make it visible to kunit tests
Start states are read from untrusted data and used as indexes into the
DFA state tables. The aa_dfa_next() function call in unpack_pdb() will
access dfa->tables[YYTD_ID_BASE][start], and if the start state exceeds
the number of states in the DFA, this results in an out-of-bound read.
==================================================================
BUG: KASAN: slab-out-of-bounds in aa_dfa_next+0x2a1/0x360
Read of size 4 at addr ffff88811956fb90 by task su/1097
...
Reject policies with out-of-bounds start states during unpacking
to prevent the issue.
The AD7923 was updated to support devices with 8 channels, but the size
of tx_buf and ring_xfer was not increased accordingly, leading to a
potential buffer overflow in ad7923_update_scan_mode().
Fixes: 851644a60d20 ("iio: adc: ad7923: Add support for the ad7908/ad7918/ad7928") Cc: stable@vger.kernel.org Signed-off-by: Nuno Sa <nuno.sa@analog.com> Signed-off-by: Zicheng Qu <quzicheng@huawei.com> Link: https://patch.msgid.link/20241029134637.2261336-1-quzicheng@huawei.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
[ Context change fixed. ] Signed-off-by: Robert Garcia <rob_garcia@163.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If a line is requested with debounce, and that results in debouncing
in software, and the line is subsequently reconfigured to enable edge
detection then the allocation of the kfifo to contain edge events is
overlooked. This results in events being written to and read from an
uninitialised kfifo. Read events are returned to userspace.
Initialise the kfifo in the case where the software debounce is
already active.
Fixes: 65cff7046406 ("gpiolib: cdev: support setting debounce") Signed-off-by: Kent Gibson <warthog618@gmail.com> Link: https://lore.kernel.org/r/20240510065342.36191-1-warthog618@gmail.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Robert Garcia <rob_garcia@163.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Some devices, like the Grandstream GUV3100 webcam, have an invalid UVC
descriptor where multiple entities share the same ID, this is invalid
and makes it impossible to make a proper entity tree without heuristics.
We have recently introduced a change in the way that we handle invalid
entities that has caused a regression on broken devices.
Implement a new heuristic to handle these devices properly.
Reported-by: Angel4005 <ooara1337@gmail.com> Closes: https://lore.kernel.org/linux-media/CAOzBiVuS7ygUjjhCbyWg-KiNx+HFTYnqH5+GJhd6cYsNLT=DaA@mail.gmail.com/ Fixes: 0e2ee70291e6 ("media: uvcvideo: Mark invalid entities with id UVC_INVALID_ENTITY_ID") Cc: stable@vger.kernel.org Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Reviewed-by: Hans de Goede <hansg@kernel.org> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Per UVC 1.1+ specification 3.7.2, units and terminals must have a non-zero
unique ID.
```
Each Unit and Terminal within the video function is assigned a unique
identification number, the Unit ID (UID) or Terminal ID (TID), contained in
the bUnitID or bTerminalID field of the descriptor. The value 0x00 is
reserved for undefined ID,
```
If we add a new entity with id 0 or a duplicated ID, it will be marked
as UVC_INVALID_ENTITY_ID.
In a previous attempt commit 3dd075fe8ebb ("media: uvcvideo: Require
entities to have a non-zero unique ID"), we ignored all the invalid units,
this broke a lot of non-compatible cameras. Hopefully we are more lucky
this time.
This also prevents some syzkaller reproducers from triggering warnings due
to a chain of entities referring to themselves. In one particular case, an
Output Unit is connected to an Input Unit, both with the same ID of 1. But
when looking up for the source ID of the Output Unit, that same entity is
found instead of the input entity, which leads to such warnings.
In another case, a backward chain was considered finished as the source ID
was 0. Later on, that entity was found, but its pads were not valid.
Here is a sample stack trace for one of those cases.
To avoid racing with FF playback events and corrupting device's event
queue take event_lock spinlock when calling uinput_dev_event() when
submitting a FF upload or erase "event".
A lockdep circular locking dependency warning can be triggered
reproducibly when using a force-feedback gamepad with uinput (for
example, playing ELDEN RING under Wine with a Flydigi Vader 5
controller):
The cycle is caused by four lock acquisition paths:
1. ff upload: input_ff_upload() holds ff->mutex and calls
uinput_dev_upload_effect() -> uinput_request_submit() ->
uinput_request_send(), which acquires udev->mutex.
2. device create: uinput_ioctl_handler() holds udev->mutex and calls
uinput_create_device() -> input_register_device(), which acquires
input_mutex.
3. device register: input_register_device() holds input_mutex and
calls kbd_connect() -> input_register_handle(), which acquires
dev->mutex.
4. evdev release: evdev_release() calls input_flush_device() under
dev->mutex, which calls input_ff_flush() acquiring ff->mutex.
Fix this by introducing a new state_lock spinlock to protect
udev->state and udev->dev access in uinput_request_send() instead of
acquiring udev->mutex. The function only needs to atomically check
device state and queue an input event into the ring buffer via
uinput_dev_event() -- both operations are safe under a spinlock
(ktime_get_ts64() and wake_up_interruptible() do not sleep). This
breaks the ff->mutex -> udev->mutex link since a spinlock is a leaf in
the lock ordering and cannot form cycles with mutexes.
To keep state transitions visible to uinput_request_send(), protect
writes to udev->state in uinput_create_device() and
uinput_destroy_device() with the same state_lock spinlock.
Additionally, move init_completion(&request->done) from
uinput_request_send() to uinput_request_submit() before
uinput_request_reserve_slot(). Once the slot is allocated,
uinput_flush_requests() may call complete() on it at any time from
the destroy path, so the completion must be initialised before the
request becomes visible.
The ehash table lookups are lockless and rely on
SLAB_TYPESAFE_BY_RCU to guarantee socket memory stability
during RCU read-side critical sections. Both tcp_prot and
tcpv6_prot have their slab caches created with this flag
via proto_register().
However, MPTCP's mptcp_subflow_init() copies tcpv6_prot into
tcpv6_prot_override during inet_init() (fs_initcall, level 5),
before inet6_init() (module_init/device_initcall, level 6) has
called proto_register(&tcpv6_prot). At that point,
tcpv6_prot.slab is still NULL, so tcpv6_prot_override.slab
remains NULL permanently.
This causes MPTCP v6 subflow child sockets to be allocated via
kmalloc (falling into kmalloc-4k) instead of the TCPv6 slab
cache. The kmalloc-4k cache lacks SLAB_TYPESAFE_BY_RCU, so
when these sockets are freed without SOCK_RCU_FREE (which is
cleared for child sockets by design), the memory can be
immediately reused. Concurrent ehash lookups under
rcu_read_lock can then access freed memory, triggering a
slab-use-after-free in __inet_lookup_established.
Fix this by splitting the IPv6-specific initialization out of
mptcp_subflow_init() into a new mptcp_subflow_v6_init(), called
from mptcp_proto_v6_init() before protocol registration. This
ensures tcpv6_prot_override.slab correctly inherits the
SLAB_TYPESAFE_BY_RCU slab cache.
struct xfrm_user_report is a __u8 proto field followed by a struct
xfrm_selector which means there is three "empty" bytes of padding, but
the padding is never zeroed before copying to userspace. Fix that up by
zeroing the structure before setting individual member variables.
Cc: stable <stable@kernel.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Assisted-by: gregkh_clanker_t1000 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
USB drivers bind to USB interfaces and any device managed resources
should have their lifetime tied to the interface rather than parent USB
device. This avoids issues like memory leaks when drivers are unbound
without their devices being physically disconnected (e.g. on probe
deferral or configuration changes).
Fix the USB anchor lifetime so that it is released on driver unbind.
Fixes: 8b4c0009313f ("rt2x00usb: Use usb anchor to manage URB") Cc: stable@vger.kernel.org # 4.7 Cc: Vishal Thanki <vishalthanki@gmail.com> Signed-off-by: Johan Hovold <johan@kernel.org> Acked-by: Stanislaw Gruszka <stf_xl@wp.pl> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/20260327113219.1313748-1-johan@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since the ChaCha permutation is invertible, the local variable
'permuted_state' is sufficient to compute the original 'state', and thus
the key, even after the permutation has been done.
While the kernel is quite inconsistent about zeroizing secrets on the
stack (and some prominent userspace crypto libraries don't bother at all
since it's not guaranteed to work anyway), the kernel does try to do it
as a best practice, especially in cases involving the RNG.
Thus, explicitly zeroize 'permuted_state' before it goes out of scope.
Currently we execute `SET_NETDEV_DEV(dev, &priv->lowerdev->dev)` for
the virt_wifi net devices. However, unregistering a virt_wifi device in
netdev_run_todo() can happen together with the device referenced by
SET_NETDEV_DEV().
It can result in use-after-free during the ethtool operations performed
on a virt_wifi device that is currently being unregistered. Such a net
device can have the `dev.parent` field pointing to the freed memory,
but ethnl_ops_begin() calls `pm_runtime_get_sync(dev->dev.parent)`.
Let's remove SET_NETDEV_DEV for virt_wifi to avoid bugs like this:
==================================================================
BUG: KASAN: slab-use-after-free in __pm_runtime_resume+0xe2/0xf0
Read of size 2 at addr ffff88810cfc46f8 by task pm/606
WARNING: CPU: 0 PID: 16 at io_uring/tctx.c:51 __io_uring_free+0xfa/0x140 io_uring/tctx.c:51
which is the
WARN_ON_ONCE(!xa_empty(&tctx->xa));
sanity check in __io_uring_free() when a io_uring_task is going through
its final put. The syzbot test case includes injecting memory allocation
failures, and it very much looks like xa_store() can fail one of its
memory allocations and end up with ->head being non-NULL even though no
entries exist in the xarray.
Until this issue gets sorted out, work around it by attempting to
iterate entries in our xarray, and WARN_ON_ONCE() if one is found.
Reported-by: syzbot+cc36d44ec9f368e443d3@syzkaller.appspotmail.com Link: https://lore.kernel.org/io-uring/673c1643.050a0220.87769.0066.GAE@google.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
[ Modify the function in io_uring.c because it's located here in v5.15. ] Signed-off-by: Robert Garcia <rob_garcia@163.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
geth_alloc() increments the reference count, but geth_free() fails to
decrement it. This prevents the configuration of attributes via configfs
after unlinking the function.
Decrement the reference count in geth_free() to ensure proper cleanup.
The Information Element (IE) parser rtw_get_ie() trusted the length
byte of each IE without validating that the IE body (len bytes after
the 2-byte header) fits inside the remaining frame buffer. A malformed
frame can advertise an IE length larger than the available data, causing
the parser to increment its pointer beyond the buffer end. This results
in out-of-bounds reads or, depending on the pattern, an infinite loop.
Fix by validating that (offset + 2 + len) does not exceed the limit
before accepting the IE or advancing to the next element.
This prevents OOB reads and ensures the parser terminates safely on
malformed frames.
[ The context change is due to the commit 4610e57a7d2e
("staging: rtl8723bs: Remove redundant else branches.") in v5.19
which is irrelevant to the logic of this patch. ]
Signed-off-by: Navaneeth K <knavaneeth786@gmail.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johnny Hao <johnny_haocn@sina.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix three refcount inconsistency issues related to `cifs_sb_tlink`.
Comments for `cifs_sb_tlink` state that `cifs_put_tlink()` needs to be
called after successful calls to `cifs_sb_tlink()`. Three calls fail to
update refcount accordingly, leading to possible resource leaks.
Fixes: 8ceb98437946 ("CIFS: Move rename to ops struct") Fixes: 2f1afe25997f ("cifs: Use smb 2 - 3 and cifsacl mount options getacl functions") Fixes: 366ed846df60 ("cifs: Use smb 2 - 3 and cifsacl mount options setacl function") Cc: stable@vger.kernel.org Signed-off-by: Shuhao Fu <sfual@cse.ust.hk> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Johnny Hao <johnny_haocn@sina.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In mctp_dump_addrinfo, ifa_index can be used to filter interfaces, but
only when the struct ifaddrmsg is provided. Otherwise it will be
comparing to uninitialised memory - reproducible in the syzkaller case from
dhcpd, or busybox "ip addr show".
The kernel MCTP implementation has always filtered by ifa_index, so
existing userspace programs expecting to dump MCTP addresses must
already be passing a valid ifa_index value (either 0 or a real index).
[ The context change is due to the commit 2d45eeb7d5d7
("mctp: no longer rely on net->dev_index_head[]") in v6.14
which is irrelevant to the logic of this patch. ]
Fixes: 583be982d934 ("mctp: Add device handling and netlink interface") Reported-by: syzbot+e76d52dadc089b9d197f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68135815.050a0220.3a872c.000e.GAE@google.com/ Reported-by: syzbot+1065a199625a388fce60@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/681357d6.050a0220.14dd7d.000d.GAE@google.com/ Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Link: https://patch.msgid.link/20250508-mctp-addr-dump-v2-1-c8a53fd2dd66@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Johnny Hao <johnny_haocn@sina.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Recently, we discovered the following issue through syzkaller:
BUG: KASAN: slab-use-after-free in fb_mode_is_equal+0x285/0x2f0
Read of size 4 at addr ff11000001b3c69c by task syz.xxx
...
Call Trace:
<TASK>
dump_stack_lvl+0xab/0xe0
print_address_description.constprop.0+0x2c/0x390
print_report+0xb9/0x280
kasan_report+0xb8/0xf0
fb_mode_is_equal+0x285/0x2f0
fbcon_mode_deleted+0x129/0x180
fb_set_var+0xe7f/0x11d0
do_fb_ioctl+0x6a0/0x750
fb_ioctl+0xe0/0x140
__x64_sys_ioctl+0x193/0x210
do_syscall_64+0x5f/0x9c0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Based on experimentation and analysis, during framebuffer unregistration,
only the memory of fb_info->modelist is freed, without setting the
corresponding fb_display[i]->mode to NULL for the freed modes. This leads
to UAF issues during subsequent accesses. Here's an example of reproduction
steps:
1. With /dev/fb0 already registered in the system, load a kernel module
to register a new device /dev/fb1;
2. Set fb1's mode to the global fb_display[] array (via FBIOPUT_CON2FBMAP);
3. Switch console from fb to VGA (to allow normal rmmod of the ko);
4. Unload the kernel module, at this point fb1's modelist is freed, leaving
a wild pointer in fb_display[];
5. Trigger the bug via system calls through fb0 attempting to delete a mode
from fb0.
Add a check in do_unregister_framebuffer(): if the mode to be freed exists
in fb_display[], set the corresponding mode pointer to NULL.
[ The context change is due to the commit 2c0c19b681d5
("fbdev: fbmem: Fix double free of 'fb_info->pixmap.addr'") in v5.16
which is irrelevant to the logic of this patch. ]
Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org Signed-off-by: Johnny Hao <johnny_haocn@sina.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Sinc commit 79a6d1bfe114 ("can: gs_usb: gs_usb_receive_bulk_callback():
unanchor URL on usb_submit_urb() error") a failing resubmit URB will print
an info message.
In the case of a short read where netdev has not yet been assigned,
initialize as NULL to avoid dereferencing an undefined value. Also report
the error value of the failed resubmit.
Fixes: 79a6d1bfe114 ("can: gs_usb: gs_usb_receive_bulk_callback(): unanchor URL on usb_submit_urb() error") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/all/20260119181904.1209979-1-kuba@kernel.org/ Link: https://patch.msgid.link/20260120-gs_usb-fix-error-message-v1-1-6be04de572bc@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Ruohan Lan <ruohanlan@aliyun.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In commit 7352e1d5932a ("can: gs_usb: gs_usb_receive_bulk_callback(): fix
URB memory leak"), the URB was re-anchored before usb_submit_urb() in
gs_usb_receive_bulk_callback() to prevent a leak of this URB during
cleanup.
However, this patch did not take into account that usb_submit_urb() could
fail. The URB remains anchored and
usb_kill_anchored_urbs(&parent->rx_submitted) in gs_can_close() loops
infinitely since the anchor list never becomes empty.
To fix the bug, unanchor the URB when an usb_submit_urb() error occurs,
also print an info message.
Fixes: 7352e1d5932a ("can: gs_usb: gs_usb_receive_bulk_callback(): fix URB memory leak") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/all/20260110223836.3890248-1-kuba@kernel.org/ Link: https://patch.msgid.link/20260116-can_usb-fix-reanchor-v1-1-9d74e7289225@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Ruohan Lan <ruohanlan@aliyun.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In gs_can_open(), the URBs for USB-in transfers are allocated, added to the
parent->rx_submitted anchor and submitted. In the complete callback
gs_usb_receive_bulk_callback(), the URB is processed and resubmitted. In
gs_can_close() the URBs are freed by calling
usb_kill_anchored_urbs(parent->rx_submitted).
However, this does not take into account that the USB framework unanchors
the URB before the complete function is called. This means that once an
in-URB has been completed, it is no longer anchored and is ultimately not
released in gs_can_close().
Fix the memory leak by anchoring the URB in the
gs_usb_receive_bulk_callback() to the parent->rx_submitted anchor.
[ The variable usbcan was renamed to parent in
commit b6980ad3a90c ("can: gs_usb: uniformly use "parent" as variable name for struct gs_usb")
introduced in v6.6. To backport to v5.15, replace parent with usbcan. ]
When a gadget request is only partially transferred in transfer()
because the per-frame bandwidth budget is exhausted, the loop advances
to the next queued request. If that next request is a zero-length
packet (ZLP), len evaluates to zero and the code takes the
unlikely(len == 0) path, which sets is_short = 1. This bypasses the
bandwidth guard ("limit < ep->ep.maxpacket && limit < len") that
lives in the else branch and would otherwise break out of the loop for
non-zero requests. The is_short path then completes the URB before all
data from the first request has been transferred.
Reproducer (bulk IN, high speed):
Device side (FunctionFS with Linux AIO):
1. Queue a 65024-byte write via io_submit (127 * 512, i.e. a
multiple of the HS bulk max packet size).
2. Immediately queue a zero-length write (ZLP) via io_submit.
Host side:
3. Submit a 65536-byte bulk IN URB.
Expected: URB completes with actual_length = 65024.
Actual: URB completes with actual_length = 53248, losing 11776
bytes that leak into subsequent URBs.
At high speed the per-frame budget is 53248 bytes (512 * 13 * 8).
The 65024-byte request exhausts this budget after 53248 bytes, leaving
the request incomplete (req->req.actual < req->req.length). Neither
the request nor the URB is finished, and rescan is 0, so the loop
advances to the ZLP. For the ZLP, dev_len = 0, so len = min(12288, 0)
= 0, taking the unlikely(len == 0) path and setting is_short = 1.
The is_short handler then sets *status = 0, completing the URB with
only 53248 of the expected 65024 bytes.
Fix this by breaking out of the loop when the current request has
remaining data (req->req.actual < req->req.length). The request
resumes on the next timer tick, preserving correct data ordering.
Signed-off-by: Sebastian Urban <surban@surban.net> Cc: stable <stable@kernel.org> Reviewed-by: Alan Stern <stern@rowland.harvard.edu> Link: https://patch.msgid.link/20260315151045.1155850-1-surban@surban.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes an error in synchronization in the dummy-hcd driver. The
error has a somewhat involved history. The synchronization mechanism
was introduced by commit 7dbd8f4cabd9 ("USB: dummy-hcd: Fix erroneous
synchronization change"), which added an emulated "interrupts enabled"
flag together with code emulating synchronize_irq() (it waits until
all current handler callbacks have returned).
But the emulated interrupt-disable occurred too late, after the driver
containing the handler callback routines had been told that it was
unbound and no more callbacks would occur. Commit 4a5d797a9f9c ("usb:
gadget: dummy_hcd: fix gpf in gadget_setup") tried to fix this by
moving the synchronize_irq() emulation code from dummy_stop() to
dummy_pullup(), which runs before the unbind callback.
There still were races, though, because the emulated interrupt-disable
still occurred too late. It couldn't be moved to dummy_pullup(),
because that routine can be called for reasons other than an impending
unbind. Therefore commits 7dc0c55e9f30 ("USB: UDC core: Add
udc_async_callbacks gadget op") and 04145a03db9d ("USB: UDC: Implement
udc_async_callbacks in dummy-hcd") added an API allowing the UDC core
to tell dummy-hcd exactly when emulated interrupts and their callbacks
should be disabled.
That brings us to the current state of things, which is still wrong
because the emulated synchronize_irq() occurs before the emulated
interrupt-disable! That's no good, beause it means that more emulated
interrupts can occur after the synchronize_irq() emulation has run,
leading to the possibility that a callback handler may be running when
the gadget driver is unbound.
To fix this, we have to move the synchronize_irq() emulation code yet
again, to the dummy_udc_async_callbacks() routine, which takes care of
enabling and disabling emulated interrupt requests. The
synchronization will now run immediately after emulated interrupts are
disabled, which is where it belongs.
Syzbot testing was able to provoke an addressing exception and crash
in the usb_gadget_udc_reset() routine in
drivers/usb/gadgets/udc/core.c, resulting from the fact that the
routine was called with a second ("driver") argument of NULL. The bad
caller was set_link_state() in dummy_hcd.c, and the problem arose
because of a race between a USB reset and driver unbind.
These sorts of races were not supposed to be possible; commit 7dbd8f4cabd9 ("USB: dummy-hcd: Fix erroneous synchronization change"),
along with a few followup commits, was written specifically to prevent
them. As it turns out, there are (at least) two errors remaining in
the code. Another patch will address the second error; this one is
concerned with the first.
The error responsible for the syzbot crash occurred because the
stop_activity() routine will sometimes drop and then re-acquire the
dum->lock spinlock. A call to stop_activity() occurs in
set_link_state() when handling an emulated USB reset, after the test
of dum->ints_enabled and before the increment of dum->callback_usage.
This allowed another thread (doing a driver unbind) to sneak in and
grab the spinlock, and then clear dum->ints_enabled and dum->driver.
Normally this other thread would have to wait for dum->callback_usage
to go down to 0 before it would clear dum->driver, but in this case it
didn't have to wait since dum->callback_usage had not yet been
incremented.
The fix is to increment dum->callback_usage _before_ calling
stop_activity() instead of after. Then the thread doing the unbind
will not clear dum->driver until after the call to
usb_gadget_udc_reset() safely returns and dum->callback_usage has been
decremented again.
device_property_read_foo() returns 0 on success and only then modifies
'val'. Currently, val is left uninitialized if the aforementioned
function returns non-zero, making nhi_wake_supported() return true
almost always (random != 0) if the property is not present in device
firmware.
Invert the check to make it make sense.
Fixes: 3cdb9446a117 ("thunderbolt: Add support for Intel Ice Lake") Cc: stable@vger.kernel.org Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ftgmac100_alloc_rings() allocates rx_skbs, tx_skbs, rxdes, txdes, and
rx_scratch in stages. On intermediate failures it returned -ENOMEM
directly, leaking resources allocated earlier in the function.
Rework the failure path to use staged local unwind labels and free
allocated resources in reverse order before returning -ENOMEM. This
matches common netdev allocation cleanup style.
Fixes: d72e01a0430f ("ftgmac100: Use a scratch buffer for failed RX allocations") Cc: stable@vger.kernel.org Signed-off-by: Yufan Chen <yufan.chen@linux.dev> Link: https://patch.msgid.link/20260328163257.60836-1-yufan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
vxlan_na_create() walks ND options according to option-provided
lengths. A malformed option can make the parser advance beyond the
computed option span or use a too-short source LLADDR option payload.
Validate option lengths against the remaining NS option area before
advancing, and only read source LLADDR when the option is large enough
for an Ethernet address.
Fixes: 4b29dba9c085 ("vxlan: fix nonfunctional neigh_reduce()") Cc: stable@vger.kernel.org Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Tested-by: Ao Zhou <n05ec@lzu.edu.cn> Co-developed-by: Yuan Tan <tanyuan98@outlook.com> Signed-off-by: Yuan Tan <tanyuan98@outlook.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Yang Yang <n05ec@lzu.edu.cn> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260326034441.2037420-4-n05ec@lzu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mtype_del() counts empty slots below n->pos in k, but it only drops the
bucket when both n->pos and k are zero. This misses buckets whose live
entries have all been removed while n->pos still points past deleted slots.
Treat a bucket as empty when all positions below n->pos are unused and
release it directly instead of shrinking it further.
Fixes: 8af1c6fbd923 ("netfilter: ipset: Fix forceadd evaluation path") Cc: stable@vger.kernel.org Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <dstsmallbird@foxmail.com> Signed-off-by: Yifan Wu <yifanwucs@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Reviewed-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
`me4000_xilinx_download()` loads the firmware that was requested by
`request_firmware()`. It is possible for it to overrun the source
buffer because it blindly trusts the file format. It reads a data
stream length from the first 4 bytes into variable `file_length` and
reads the data stream contents of length `file_length` from offset 16
onwards.
Add a test to ensure that the supplied firmware is long enough to
contain the header and the data stream. On failure, log an error and
return `-EINVAL`.
Note: The firmware loading was totally broken before commit ac584af59945
("staging: comedi: me4000: fix firmware downloading"), but that is the
most sensible target for this fix.
`me2600_xilinx_download()` loads the firmware that was requested by
`request_firmware()`. It is possible for it to overrun the source
buffer because it blindly trusts the file format. It reads a data
stream length from the first 4 bytes into variable `file_length` and
reads the data stream contents of length `file_length` from offset 16
onwards. Although it checks that the supplied firmware is at least 16
bytes long, it does not check that it is long enough to contain the data
stream.
Add a test to ensure that the supplied firmware is long enough to
contain the header and the data stream. On failure, log an error and
return `-EINVAL`.