Below two paths could overlap each other if we power off a drive quickly
after powering it on. There are multiple races in nvme_setup_io_queues()
because of shutdown_lock missing and improper use of NVMEQ_ENABLED bit.
nvme_reset_work() nvme_remove()
nvme_setup_io_queues() nvme_dev_disable()
... ...
A1 clear NVMEQ_ENABLED bit for admin queue lock
retry: B1 nvme_suspend_io_queues()
A2 pci_free_irq() admin queue B2 nvme_suspend_queue() admin queue
A3 pci_free_irq_vectors() nvme_pci_disable()
A4 nvme_setup_irqs(); B3 pci_free_irq_vectors()
... unlock
A5 queue_request_irq() for admin queue
set NVMEQ_ENABLED bit
...
nvme_create_io_queues()
A6 result = queue_request_irq();
set NVMEQ_ENABLED bit
...
fail to allocate enough IO queues:
A7 nvme_suspend_io_queues()
goto retry
If B3 runs in between A1 and A2, it will crash if irqaction haven't
been freed by A2. B2 is supposed to free admin queue IRQ but it simply
can't fulfill the job as A1 has cleared NVMEQ_ENABLED bit.
Fix: combine A1 A2 so IRQ get freed as soon as the NVMEQ_ENABLED bit
gets cleared.
After solved #1, A2 could race with B3 if A2 is freeing IRQ while B3
is checking irqaction. A3 also could race with B2 if B2 is freeing
IRQ while A3 is checking irqaction.
Fix: A2 and A3 take lock for mutual exclusion.
A3 could race with B3 since they could run free_msi_irqs() in parallel.
Fix: A3 takes lock for mutual exclusion.
A4 could fail to allocate all needed IRQ vectors if A3 and A4 are
interrupted by B3.
Fix: A4 takes lock for mutual exclusion.
If A5/A6 happened after B2/B1, B3 will crash since irqaction is not NULL.
They are just allocated by A5/A6.
Fix: Lock queue_request_irq() and setting of NVMEQ_ENABLED bit.
A7 could get chance to pci_free_irq() for certain IO queue while B3 is
checking irqaction.
Fix: A7 takes lock.
nvme_dev->online_queues need to be protected by shutdown_lock. Since it
is not atomic, both paths could modify it using its own copy.
Co-developed-by: Yuanyuan Zhong <yzhong@purestorage.com> Signed-off-by: Casey Chen <cachen@purestorage.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
[noahm@debian.org: backported to 5.10] Link: https://lore.kernel.org/linux-nvme/20210707211432.29536-1-cachen@purestorage.com/ Signed-off-by: Noah Meyerhans <noahm@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
# ip link add dummy0 type dummy
# ip link add link dummy0 name dummy0.1 type vlan id 1
# ip link add link dummy0.1 name dummy0.1.2 type vlan id 2
# rmmod 8021q
unregister_netdevice: waiting for dummy0.1 to become free. Usage count = 1
When rmmods 8021q, all vlan devs are deleted from their real_dev's vlan grp
and added into list_kill by unregister_vlan_dev(). dummy0.1 is unregistered
before dummy0.1.2, as it's using for_each_netdev() in __rtnl_kill_links().
When unregisters dummy0.1, dummy0.1.2 is not unregistered in the event of
NETDEV_UNREGISTER, as it's been deleted from dummy0.1's vlan grp. However,
due to dummy0.1.2 still holding dummy0.1, dummy0.1 will keep waiting in
netdev_wait_allrefs(), while dummy0.1.2 will never get unregistered and
release dummy0.1, as it delays dev_put until calling dev->priv_destructor,
vlan_dev_free().
This issue was introduced by Commit 563bcbae3ba2 ("net: vlan: fix a UAF in
vlan_dev_real_dev()"), and this patch is to fix it by moving dev_put() into
vlan_dev_uninit(), which is called after NETDEV_UNREGISTER event but before
netdev_wait_allrefs().
Fixes: 563bcbae3ba2 ("net: vlan: fix a UAF in vlan_dev_real_dev()") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Signed-off-by: Ivan Delalande <colona@arista.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch is to introduce vlan_dev_free_egress_priority() to
free egress priority for vlan dev, and keep vlan_dev_uninit()
static as .ndo_uninit. It makes the code more clear and safer
when adding new code in vlan_dev_uninit() in the future.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Signed-off-by: Ivan Delalande <colona@arista.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
->d_name.name can change on rename and the earlier value can be freed;
there are conditions sufficient to stabilize it (->d_lock on dentry,
->d_lock on its parent, ->i_rwsem exclusive on the parent's inode,
rename_lock), but none of those are met at any of the sites. Take a stable
snapshot of the name instead.
Link: https://lore.kernel.org/all/20240202182732.GE2087318@ZenIV/ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
[ Samasth: bp to fix CVE-2024-39494; Minor conflict resolved due to code context change ] Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
...followed by more symptoms of corruption, with similar stacks:
refcount_t: underflow; use-after-free.
kernel BUG at lib/list_debug.c:62!
Kernel panic - not syncing: Oops - BUG: Fatal exception
This happens because pps_device_destruct() frees the pps_device with the
embedded cdev immediately after calling cdev_del(), but, as the comment
above cdev_del() notes, fops for previously opened cdevs are still
callable even after cdev_del() returns. I think this bug has always
been there: I can't explain why it suddenly started happening every time
I reboot this particular board.
In commit d953e0e837e6 ("pps: Fix a use-after free bug when
unregistering a source."), George Spelvin suggested removing the
embedded cdev. That seems like the simplest way to fix this, so I've
implemented his suggestion, using __register_chrdev() with pps_idr
becoming the source of truth for which minor corresponds to which
device.
But now that pps_idr defines userspace visibility instead of cdev_add(),
we need to be sure the pps->dev refcount can't reach zero while
userspace can still find it again. So, the idr_remove() call moves to
pps_unregister_cdev(), and pps_idr now holds a reference to pps->dev.
This commit re-attempts the backport of the change to the linux-5.10.y
branch. Commit a1c3a19446a4 ("btrfs: avoid monopolizing a core when
activating a swap file") on this branch was reverted.
During swap activation we iterate over the extents of a file and we can
have many thousands of them, so we can end up in a busy loop monopolizing
a core. Avoid this by doing a voluntary reschedule after processing each
extent.
CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Koichiro Den <koichiro.den@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The backport for linux-5.10.y, commit a1c3a19446a4 ("btrfs: avoid
monopolizing a core when activating a swap file"), inserted
cond_resched() in the wrong location.
Revert it now; a subsequent commit will re-backport the original patch.
Fixes: a1c3a19446a4 ("btrfs: avoid monopolizing a core when activating a swap file") # linux-5.10.y Signed-off-by: Koichiro Den <koichiro.den@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Leaving the PIT interrupt running can cause noticeable steal time for
virtual guests. The VMM generally has a timer which toggles the IRQ input
to the PIC and I/O APIC, which takes CPU time away from the guest. Even
on real hardware, running the counter may use power needlessly (albeit
not much).
Make sure it's turned off if it isn't going to be used.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mhkelley@outlook.com> Link: https://lore.kernel.org/all/20240802135555.564941-1-dwmw2@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It should wait all existing dio write IOs before block removal,
otherwise, previous direct write IO may overwrite data in the
block which may be reused by other inode.
Cc: stable@vger.kernel.org Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Alva Lan <alvalan9@foxmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After the netdevsim update to use human-readable IP address formats for
IPsec, we can now use the source and destination IPs directly in testing.
Here is the result:
# ./rtnetlink.sh -t kci_test_ipsec_offload
PASS: ipsec_offload
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20241010040027.21440-4-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The PCI function 2 on ASIX AX99100 PCIe to Multi I/O Controller can be
configured as a single-port parallel port controller. The subvendor id
is 0x2000 when configured as parallel port. It supports IEEE-1284 EPP /
ECP with its ECR on BAR1.
Each of the 4 PCI functions on ASIX AX99100 PCIe to Multi I/O
Controller can be configured as a single-port serial port controller.
The subvendor id is 0x1000 when configured as serial port and MSI
interrupts are supported.
Move PCI Vendor and Device ID of ASIX AX99100 PCIe to Multi I/O
Controller to pci_ids.h for its serial and parallel port driver
support in subsequent patches.
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/r/20230724083933.3173513-3-jiaqing.zhao@linux.intel.com
[Moeko: Drop changes in drivers/net/can/sja1000/ems_pci.c] Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
nilfs_lookup_dirty_data_buffers(), which iterates through the buffers
attached to dirty data folios/pages, accesses the attached buffers without
locking the folios/pages.
For data cache, nilfs_clear_folio_dirty() may be called asynchronously
when the file system degenerates to read only, so
nilfs_lookup_dirty_data_buffers() still has the potential to cause use
after free issues when buffers lose the protection of their dirty state
midway due to this asynchronous clearing and are unintentionally freed by
try_to_free_buffers().
Eliminate this race issue by adjusting the lock section in this function.
[konishi.ryusuke@gmail.com: adjusted for page/folio conversion] Link: https://lkml.kernel.org/r/20250107200202.6432-3-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Patch series "nilfs2: protect busy buffer heads from being force-cleared".
This series fixes the buffer head state inconsistency issues reported by
syzbot that occurs when the filesystem is corrupted and falls back to
read-only, and the associated buffer head use-after-free issue.
This patch (of 2):
Syzbot has reported that after nilfs2 detects filesystem corruption and
falls back to read-only, inconsistencies in the buffer state may occur.
One of the inconsistencies is that when nilfs2 calls mark_buffer_dirty()
to set a data or metadata buffer as dirty, but it detects that the buffer
is not in the uptodate state:
The other is when nilfs_btree_propagate(), which propagates the dirty
state to the ancestor nodes of a b-tree that point to a dirty buffer,
detects that the origin buffer is not dirty, even though it should be:
Both of these issues are caused by the callbacks that handle the
page/folio write requests, forcibly clear various states, including the
working state of the buffers they hold, at unexpected times when they
detect read-only fallback.
Fix these issues by checking if the buffer is referenced before clearing
the page/folio state, and skipping the clear if it is.
[konishi.ryusuke@gmail.com: adjusted for page/folio conversion] Link: https://lkml.kernel.org/r/20250107200202.6432-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20250107200202.6432-2-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+b2b14916b77acf8626d7@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b2b14916b77acf8626d7 Reported-by: syzbot+d98fd19acd08b36ff422@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=d98fd19acd08b36ff422 Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Tested-by: syzbot+b2b14916b77acf8626d7@syzkaller.appspotmail.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After detecting file system corruption and degrading to a read-only mount,
dirty folios and buffers in the page cache are cleared, and a large number
of warnings are output at that time, often filling up the kernel log.
In this case, since the degrading to a read-only mount is output to the
kernel log, these warnings are not very meaningful, and are rather a
nuisance in system management and debugging.
The related nilfs2-specific page/folio routines have a silent argument
that suppresses the warning output, but since it is not currently used
meaningfully, remove both the silent argument and the warning output.
[konishi.ryusuke@gmail.com: adjusted for page/folio conversion] Link: https://lkml.kernel.org/r/20240816090128.4561-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: ca76bb226bf4 ("nilfs2: do not force clear folio if buffer is referenced") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This allows the assembly in entry.S to automatically keep in sync with
changes in the stack layout (struct pt_regs and struct switch_stack).
Cc: stable@vger.kernel.org Tested-by: Maciej W. Rozycki <macro@orcam.me.uk> Tested-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Ivan Kokshaysky <ink@unseen.parts> Signed-off-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
__static_call_update_early() has a check for early_boot_irqs_disabled, but
is used before early_boot_irqs_disabled is set up in start_kernel().
Xen PV has always special cased early_boot_irqs_disabled, but Xen PVH does
not and falls over the BUG when booting as dom0.
It is very suspect that early_boot_irqs_disabled starts as 0, becomes 1 for
a time, then becomes 0 again, but as this needs backporting to fix a
breakage in a security fix, dropping the BUG_ON() is the far safer option.
Fixes: 0ef8047b737d ("x86/static-call: provide a way to do very early static-call updates") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219620 Reported-by: Alex Zenla <alex@edera.dev> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Alex Zenla <alex@edera.dev> Link: https://lore.kernel.org/r/20241221211046.6475-1-andrew.cooper3@citrix.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is allowed for consoles to not provide a write() callback. For
example ttynull does this.
Check if a write() callback is available before using it.
Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230717194607.145135-2-john.ogness@linutronix.de Cc: Brian Norris <briannorris@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver does not touch the irqstatus register when it is disabling
interrupts. This might cause an interrupt to trigger for an interrupt
that was just disabled.
To fix the issue, clear the irqstatus registers right after disabling
the interrupts.
Fixes: 32a1795f57ee ("drm/tidss: New driver for TI Keystone platform Display SubSystem") Cc: stable@vger.kernel.org Reported-by: Jonathan Cormier <jcormier@criticallink.com> Closes: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1394222/am625-issue-about-tidss-rcu_preempt-self-detected-stall-on-cpu/5424479#5424479 Signed-off-by: Devarsh Thakkar <devarsht@ti.com>
[Tomi: mostly rewrote the patch] Reviewed-by: Jonathan Cormier <jcormier@criticallink.com> Tested-by: Jonathan Cormier <jcormier@criticallink.com> Reviewed-by: Aradhya Bhatia <aradhya.bhatia@linux.dev> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241021-tidss-irq-fix-v1-5-82ddaec94e4a@ideasonboard.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It has been observed that sometimes DSS will trigger an interrupt and
the top level interrupt (DISPC_IRQSTATUS) is not zero, but the VP and
VID level interrupt-statuses are zero.
As the top level irqstatus is supposed to tell whether we have VP/VID
interrupts, the thinking of the driver authors was that this particular
case could never happen. Thus the driver only clears the DISPC_IRQSTATUS
bits which has corresponding interrupts in VP/VID status. So when this
issue happens, the driver will not clear DISPC_IRQSTATUS, and we get an
interrupt flood.
It is unclear why the issue happens. It could be a race issue in the
driver, but no such race has been found. It could also be an issue with
the HW. However a similar case can be easily triggered by manually
writing to DISPC_IRQSTATUS_RAW. This will forcibly set a bit in the
DISPC_IRQSTATUS and trigger an interrupt, and as the driver never clears
the bit, we get an interrupt flood.
To fix the issue, always clear DISPC_IRQSTATUS. The concern with this
solution is that if the top level irqstatus is the one that triggers the
interrupt, always clearing DISPC_IRQSTATUS might leave some interrupts
unhandled if VP/VID interrupt statuses have bits set. However, testing
shows that if any of the irqstatuses is set (i.e. even if
DISPC_IRQSTATUS == 0, but a VID irqstatus has a bit set), we will get an
interrupt.
Co-developed-by: Bin Liu <b-liu@ti.com> Signed-off-by: Bin Liu <b-liu@ti.com> Co-developed-by: Devarsh Thakkar <devarsht@ti.com> Signed-off-by: Devarsh Thakkar <devarsht@ti.com> Co-developed-by: Jonathan Cormier <jcormier@criticallink.com> Signed-off-by: Jonathan Cormier <jcormier@criticallink.com> Fixes: 32a1795f57ee ("drm/tidss: New driver for TI Keystone platform Display SubSystem") Cc: stable@vger.kernel.org Tested-by: Jonathan Cormier <jcormier@criticallink.com> Reviewed-by: Aradhya Bhatia <aradhya.bhatia@linux.dev> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241021-tidss-irq-fix-v1-1-82ddaec94e4a@ideasonboard.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The initial value of err is -ENOBUFS, and err is guaranteed to be
less than 0 before all goto errout. Therefore, on the error path
of errout, there is no need to repeatedly judge that err is less than 0,
and delete redundant judgments to make the code more concise.
Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: becbd5850c03 ("neighbour: use RCU protection in __neigh_notify()") Signed-off-by: Sasha Levin <sashal@kernel.org>
Make the net pointer stored in possible_net_t structure annotated as
an RCU pointer. Change the access helpers to treat it as such.
Introduce read_pnet_rcu() helper to allow caller to dereference
the net pointer under RCU read lock.
Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: dd205fcc33d9 ("ipv4: use RCU protection in rt_is_expired()") Signed-off-by: Sasha Levin <sashal@kernel.org>
The following bug report happened with a PREEMPT_RT kernel:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2012, name: kwatchdog
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
get_random_u32+0x4f/0x110
clocksource_verify_choose_cpus+0xab/0x1a0
clocksource_verify_percpu.part.0+0x6b/0x330
clocksource_watchdog_kthread+0x193/0x1a0
It is due to the fact that clocksource_verify_choose_cpus() is invoked with
preemption disabled. This function invokes get_random_u32() to obtain
random numbers for choosing CPUs. The batched_entropy_32 local lock and/or
the base_crng.lock spinlock in driver/char/random.c will be acquired during
the call. In PREEMPT_RT kernel, they are both sleeping locks and so cannot
be acquired in atomic context.
Fix this problem by using migrate_disable() to allow smp_processor_id() to
be reliably used without introducing atomic context. preempt_disable() is
then called after clocksource_verify_choose_cpus() but before the
clocksource measurement is being run to avoid introducing unexpected
latency.
Fixes: 7560c02bdffb ("clocksource: Check per-CPU clock synchronization when marked unstable") Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/all/20250131173323.891943-2-longman@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
The "Checking clocksource synchronization" message is normally printed
when clocksource_verify_percpu() is called for a given clocksource if
both the CLOCK_SOURCE_UNSTABLE and CLOCK_SOURCE_VERIFY_PERCPU flags
are set.
It is an informational message and so pr_info() is the correct choice.
Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250125015442.3740588-1-longman@redhat.com
Stable-dep-of: 6bb05a33337b ("clocksource: Use migrate_disable() to avoid calling get_random_u32() in atomic context") Signed-off-by: Sasha Levin <sashal@kernel.org>
clocksource_verify_percpu() calls cpumask_weight() to check if any bit of a
given cpumask is set.
This can be done more efficiently with cpumask_empty() because
cpumask_empty() stops traversing the cpumask as soon as it finds first set
bit, while cpumask_weight() counts all bits unconditionally.
Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220210224933.379149-24-yury.norov@gmail.com
Stable-dep-of: 6bb05a33337b ("clocksource: Use migrate_disable() to avoid calling get_random_u32() in atomic context") Signed-off-by: Sasha Levin <sashal@kernel.org>
The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().
Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210803141621.780504-35-bigeasy@linutronix.de
Stable-dep-of: 6bb05a33337b ("clocksource: Use migrate_disable() to avoid calling get_random_u32() in atomic context") Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently, if skew is detected on a clock marked CLOCK_SOURCE_VERIFY_PERCPU,
that clock is checked on all CPUs. This is thorough, but might not be
what you want on a system with a few tens of CPUs, let alone a few hundred
of them.
Therefore, by default check only up to eight randomly chosen CPUs. Also
provide a new clocksource.verify_n_cpus kernel boot parameter. A value of
-1 says to check all of the CPUs, and a non-negative value says to randomly
select that number of CPUs, without concern about selecting the same CPU
multiple times. However, make use of a cpumask so that a given CPU will be
checked at most once.
Suggested-by: Thomas Gleixner <tglx@linutronix.de> # For verify_n_cpus=1. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Feng Tang <feng.tang@intel.com> Link: https://lore.kernel.org/r/20210527190124.440372-3-paulmck@kernel.org
Stable-dep-of: 6bb05a33337b ("clocksource: Use migrate_disable() to avoid calling get_random_u32() in atomic context") Signed-off-by: Sasha Levin <sashal@kernel.org>
Add a check for the return value of mlxsw_sp_port_get_stats_raw()
in __mlxsw_sp_port_get_stats(). If mlxsw_sp_port_get_stats_raw()
returns an error, exit the function to prevent further processing
with potentially invalid data.
Fixes: 614d509aa1e7 ("mlxsw: Move ethtool_ops to spectrum_ethtool.c") Cc: stable@vger.kernel.org # 5.9+ Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20250212152311.1332-1-vulab@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A recent LLVM commit [1] started generating an .ARM.attributes section
similar to the one that exists for 32-bit, which results in orphan
section warnings (or errors if CONFIG_WERROR is enabled) from the linker
because it is not handled in the arm64 linker scripts.
ld.lld: error: arch/arm64/kernel/vdso/vgettimeofday.o:(.ARM.attributes) is being placed in '.ARM.attributes'
ld.lld: error: arch/arm64/kernel/vdso/vgetrandom.o:(.ARM.attributes) is being placed in '.ARM.attributes'
ld.lld: error: vmlinux.a(lib/vsprintf.o):(.ARM.attributes) is being placed in '.ARM.attributes'
ld.lld: error: vmlinux.a(lib/win_minmax.o):(.ARM.attributes) is being placed in '.ARM.attributes'
ld.lld: error: vmlinux.a(lib/xarray.o):(.ARM.attributes) is being placed in '.ARM.attributes'
Discard the new sections in the necessary linker scripts to resolve the
warnings, as the kernel and vDSO do not need to retain it, similar to
the .note.gnu.property section.
- The bailout for a bad partoffset must use put_dev_sector(), since the
preceding read_part_sector() succeeded.
- If the partition table claims a silly sector size like 0xfff bytes
(which results in partition table entries straddling sector boundaries),
bail out instead of accessing out-of-bounds memory.
- We must not assume that the partition table contains proper NUL
termination - use strnlen() and strncmp() instead of strlen() and
strcmp().
The stmpe_reg_read function can fail, but its return value is not checked
in stmpe_gpio_irq_sync_unlock. This can lead to silent failures and
incorrect behavior if the hardware access fails.
This patch adds checks for the return value of stmpe_reg_read. If the
function fails, an error message is logged and the function returns
early to avoid further issues.
Fixes: b888fb6f2a27 ("gpio: stmpe: i2c transfer are forbiden in atomic context") Cc: stable@vger.kernel.org # 4.16+ Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Link: https://lore.kernel.org/r/20250212021849.275-1-vulab@iscas.ac.cn Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
do_page_fault() and do_entUna() are special because they use
non-standard stack frame layout. Fix them manually.
Cc: stable@vger.kernel.org Tested-by: Maciej W. Rozycki <macro@orcam.me.uk> Tested-by: Magnus Lindholm <linmag7@gmail.com> Tested-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk> Suggested-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Ivan Kokshaysky <ink@unseen.parts> Signed-off-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When flushing the serial port's buffer, uart_flush_buffer() calls
kfifo_reset() but if there is an outstanding DMA transfer then the
completion function will consume data from the kfifo via
uart_xmit_advance(), underflowing and leading to ongoing DMA as the
driver tries to transmit another 2^32 bytes.
This is readily reproduced with serial-generic and amidi sending even
short messages as closing the device on exit will wait for the fifo to
drain and in the underflow case amidi hangs for 30 seconds on exit in
tty_wait_until_sent(). A trace of that gives:
Since the port lock is held in when the kfifo is reset in
uart_flush_buffer() and in __dma_tx_complete(), adding a flush_buffer
hook to adjust the outstanding DMA byte count is sufficient to avoid the
kfifo underflow.
UEFI 2.11 introduced EFI_MEMORY_HOT_PLUGGABLE to annotate system memory
regions that are 'cold plugged' at boot, i.e., hot pluggable memory that
is available from early boot, and described as system RAM by the
firmware.
Existing loaders and EFI applications running in the boot context will
happily use this memory for allocating data structures that cannot be
freed or moved at runtime, and this prevents the memory from being
unplugged. Going forward, the new EFI_MEMORY_HOT_PLUGGABLE attribute
should be tested, and memory annotated as such should be avoided for
such allocations.
In the EFI stub, there are a couple of occurrences where, instead of the
high-level AllocatePages() UEFI boot service, a low-level code sequence
is used that traverses the EFI memory map and carves out the requested
number of pages from a free region. This is needed, e.g., for allocating
as low as possible, or for allocating pages at random.
While AllocatePages() should presumably avoid special purpose memory and
cold plugged regions, this manual approach needs to incorporate this
logic itself, in order to prevent the kernel itself from ending up in a
hot unpluggable region, preventing it from being unplugged.
So add the EFI_MEMORY_HOTPLUGGABLE macro definition, and check for it
where appropriate.
The problem is that GCC expects 16-byte alignment of the incoming stack
since early 2004, as Maciej found out [1]:
Having actually dug speculatively I can see that the psABI was changed in
GCC 3.5 with commit e5e10fb4a350 ("re PR target/14539 (128-bit long double
improperly aligned)") back in Mar 2004, when the stack pointer alignment
was increased from 8 bytes to 16 bytes, and arch/alpha/kernel/entry.S has
various suspicious stack pointer adjustments, starting with SP_OFF which
is not a whole multiple of 16.
Also, as Magnus noted, "ALPHA Calling Standard" [2] required the same:
D.3.1 Stack Alignment
This standard requires that stacks be octaword aligned at the time a
new procedure is invoked.
However:
- the "normal" kernel stack is always misaligned by 8 bytes, thanks to
the odd number of 64-bit words in 'struct pt_regs', which is the very
first thing pushed onto the kernel thread stack;
- syscall, fault, interrupt etc. handlers may, or may not, receive aligned
stack depending on numerous factors.
Somehow we got away with it until recently, when we ended up with
a stack corruption in kernel/smp.c:smp_call_function_single() due to
its use of 32-byte aligned local data and the compiler doing clever
things allocating it on the stack.
This adds padding between the PAL-saved and kernel-saved registers
so that 'struct pt_regs' have an even number of 64-bit words.
This makes the stack properly aligned for most of the kernel
code, except two handlers which need special threatment.
Note: struct pt_regs doesn't belong in uapi/asm; this should be fixed,
but let's put this off until later.
The J1939 standard requires the transmission of messages of length 0.
For example proprietary messages are specified with a data length of 0
to 1785. The transmission of such messages is not possible. Sending
results in no error being returned but no corresponding can frame
being generated.
Enable the transmission of zero length J1939 messages. In order to
facilitate this two changes are necessary:
1) If the transmission of a new message is requested from user space
the message is segmented in j1939_sk_send_loop(). Let the segmentation
take into account zero length messages, do not terminate immediately,
queue the corresponding skb.
2) j1939_session_skb_get_by_offset() selects the next skb to transmit
for a session. Take into account that there might be zero length skbs
in the queue.
Signed-off-by: Alexander Hölzl <alexander.hoelzl@gmx.net> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250205174651.103238-1-alexander.hoelzl@gmx.net Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Cc: stable@vger.kernel.org
[mkl: commit message rephrased] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Runtime PM is enabled as one of the last steps of probe(), so all
earlier gotos to "exit_free_device" label were not correct and were
leading to unbalanced runtime PM disable depth.
Fixes: 6e2fe01dd6f9 ("can: c_can: move runtime PM enable/disable to c_can_platform") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://patch.msgid.link/20250112-syscon-phandle-args-can-v1-1-314d9549906f@linaro.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
MeiG Smart SLM828 is an LTE-A CAT6 modem with the mPCIe form factor. The
"Cls=ff(vend.) Sub=10 Prot=02" and "Cls=ff(vend.) Sub=10 Prot=03"
interfaces respond to AT commands. Add these interfaces.
The product ID the modem uses is shared across multiple modems. Therefore,
add comments to describe which interface is used for which modem.
If we receive an initial fragment of size 8 bytes which specifies a wLength
of 1 byte (so the reassembled message is supposed to be 9 bytes long), and
we then receive a second fragment of size 9 bytes (which is not supposed to
happen), we currently wrongly bypass the fragment reassembly code but still
pass the pointer to the acm->notification_buffer to
acm_process_notification().
Make this less wrong by always going through fragment reassembly when we
expect more fragments.
Before this patch, receiving an overlong fragment could lead to `newctrl`
in acm_process_notification() being uninitialized data (instead of data
coming from the device).
If the first fragment is shorter than struct usb_cdc_notification, we can't
calculate an expected_size. Log an error and discard the notification
instead of reading lengths from memory outside the received data, which can
lead to memory corruption when the expected_size decreases between
fragments, causing `expected_size - acm->nb_index` to wrap.
This issue has been present since the beginning of git history; however,
it only leads to memory corruption since commit ea2583529cd1
("cdc-acm: reassemble fragmented notifications").
A mitigating factor is that acm_ctrl_irq() can only execute after userspace
has opened /dev/ttyACM*; but if ModemManager is running, ModemManager will
do that automatically depending on the USB device's vendor/product IDs and
its other interfaces.
Add Renesas R-Car D3 USB Download mode quirk and update comments
on all the other Renesas R-Car USB Download mode quirks to discern
them from each other. This follows R-Car Series, 3rd Generation
reference manual Rev.2.00 chapter 19.2.8 USB download mode .
Fixes: 6d853c9e4104 ("usb: cdc-acm: Add DISABLE_ECHO for Renesas USB Download mode") Cc: stable <stable@kernel.org> Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20250209145708.106914-1-marek.vasut+renesas@mailbox.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The cause of this error is that the device has two interfaces, and the
hub driver binds to interface 1 instead of interface 0, which is where
usb_hub_to_struct_hub() looks.
We can prevent the problem from occurring by refusing to accept hub
devices that violate the USB spec by having more than one
configuration or interface.
Reported-and-tested-by: Robert Morris <rtm@csail.mit.edu> Cc: stable <stable@kernel.org> Closes: https://lore.kernel.org/linux-usb/95564.1737394039@localhost/ Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/c27f3bf4-63d8-4fb5-ac82-09e3cd19f61c@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While the MIDI jacks are configured correctly, and the MIDIStreaming
endpoint descriptors are filled with the correct information,
bNumEmbMIDIJack and bLength are set incorrectly in these descriptors.
This does not matter when the numbers of in and out ports are equal, but
when they differ the host will receive broken descriptors with
uninitialized stack memory leaking into the descriptor for whichever
value is smaller.
The precise meaning of "in" and "out" in the port counts is not clearly
defined and can be confusing. But elsewhere the driver consistently
uses this to match the USB meaning of IN and OUT viewed from the host,
so that "in" ports send data to the host and "out" ports receive data
from it.
Cc: stable <stable@kernel.org> Fixes: c8933c3f79568 ("USB: gadget: f_midi: allow a dynamic number of input and output ports") Signed-off-by: John Keeping <jkeeping@inmusicbrands.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Link: https://lore.kernel.org/r/20250130195035.3883857-1-jkeeping@inmusicbrands.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The fastboot tool for communicating with Android bootloaders does not
work reliably with this device if USB 2 Link Power Management (LPM)
is enabled.
Various fastboot commands are affected, including the
following, which usually reproduces the problem within two tries:
fastboot getvar kernel
getvar:kernel FAILED (remote: 'GetVar Variable Not found')
This issue was hidden on many systems up until commit 63a1f8454962
("xhci: stored cached port capability values in one place") as the xhci
driver failed to detect USB 2 LPM support if USB 3 ports were listed
before USB 2 ports in the "supported protocol capabilities".
Adding the quirk resolves the issue. No drawbacks are expected since
the device uses different USB product IDs outside of fastboot mode, and
since fastboot commands worked before, until LPM was enabled on the
tested system by the aforementioned commit.
Based on a patch from Forest <forestix@nom.one> from which most of the
code and commit message is taken.
Teclast disk used on Huawei hisi platforms doesn't work well,
losing connectivity intermittently if LPM is enabled.
Add quirk disable LPM to resolve the issue.
When usb_control_msg is used in the get_bMaxPacketSize0 function, the
USB pipe does not include the endpoint device number. This can cause
failures when a usb hub port is reinitialized after encountering a bad
cable connection. As a result, the system logs the following error
messages:
usb usb2-port1: cannot reset (err = -32)
usb usb2-port1: Cannot enable. Maybe the USB cable is bad?
usb usb2-port1: attempt power cycle
usb 2-1: new high-speed USB device number 5 using ci_hdrc
usb 2-1: device descriptor read/8, error -71
The problem began after commit 85d07c556216 ("USB: core: Unite old
scheme and new scheme descriptor reads"). There
usb_get_device_descriptor was replaced with get_bMaxPacketSize0. Unlike
usb_get_device_descriptor, the get_bMaxPacketSize0 function uses the
macro usb_rcvaddr0pipe, which does not include the endpoint device
number. usb_get_device_descriptor, on the other hand, used the macro
usb_rcvctrlpipe, which includes the endpoint device number.
By modifying the get_bMaxPacketSize0 function to use usb_rcvctrlpipe
instead of usb_rcvaddr0pipe, the issue can be resolved. This change will
ensure that the endpoint device number is included in the USB pipe,
preventing reinitialization failures. If the endpoint has not set the
device number yet, it will still work because the device number is 0 in
udev.
Cc: stable <stable@kernel.org> Fixes: 85d07c556216 ("USB: core: Unite old scheme and new scheme descriptor reads") Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com> Reviewed-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/20250203105840.17539-1-eichest@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
LS7A EHCI controller doesn't have extended capabilities, so the EECP
(EHCI Extended Capabilities Pointer) field of HCCPARAMS register should
be 0x0, but it reads as 0xa0 now. This is a hardware flaw and will be
fixed in future, now just clear the EECP field to avoid error messages
on boot:
In dwc2_hsotg_udc_start(), e.g. when binding composite driver, "of_node"
is set to hsotg->dev->of_node.
It causes errors when binding the gadget driver several times, on
stm32mp157c-ev1 board. Below error is seen:
"pin PA10 already requested by 49000000.usb-otg; cannot claim for gadget.0"
The first time, no issue is seen as when registering the driver, of_node
isn't NULL:
-> gadget_dev_desc_UDC_store
-> usb_gadget_register_driver_owner
-> driver_register
...
-> really_probe -> pinctrl_bind_pins (no effect)
Then dwc2_hsotg_udc_start() sets of_node.
The second time (stop the gadget, reconfigure it, then start it again),
of_node has been set, so the probing code tries to acquire pins for the
gadget. These pins are hold by the controller, hence the error.
So clear gadget.dev.of_node in udc_stop() routine to avoid the issue.
drivers/usb/gadget/udc/renesas_usb3.c: In function 'renesas_usb3_probe':
drivers/usb/gadget/udc/renesas_usb3.c:2638:73: warning: '%d'
directive output may be truncated writing between 1 and 11 bytes into a
region of size 6 [-Wformat-truncation=]
2638 | snprintf(usb3_ep->ep_name, sizeof(usb3_ep->ep_name), "ep%d", i);
^~~~~~~~~~~~~~~~~~~~~~~~ ^~ ^
The role switch registration and set_role() can happen in parallel as they
are invoked independent of each other. There is a possibility that a driver
might spend significant amount of time in usb_role_switch_register() API
due to the presence of time intensive operations like component_add()
which operate under common mutex. This leads to a time window after
allocating the switch and before setting the registered flag where the set
role notifications are dropped. Below timeline summarizes this behavior
Thread1 | Thread2
usb_role_switch_register() |
| |
---> allocate switch |
| |
---> component_add() | usb_role_switch_set_role()
| | |
| | --> Drop role notifications
| | since sw->registered
| | flag is not set.
| |
--->Set registered flag.|
To avoid this, set the registered flag early on in the switch register
API.
Fixes: b787a3e78175 ("usb: roles: don't get/set_role() when usb_role_switch is unregistered") Cc: stable <stable@kernel.org> Signed-off-by: Elson Roy Serrao <quic_eserrao@quicinc.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20250206193950.22421-1-quic_eserrao@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Explicitly clear DEBUGCTL.LBR when a CPU is starting, prior to purging the
LBR MSRs themselves, as at least one system has been found to transfer
control to the kernel with LBRs enabled (it's unclear whether it's a BIOS
flaw or a CPU goof). Because the kernel preserves the original DEBUGCTL,
even when toggling LBRs, leaving DEBUGCTL.LBR as is results in running
with LBRs enabled at all times.
If a temporary error happened in the evaluation of the neighbor throughput
information, then the invalid throughput result should not be stored in the
throughtput EWMA.
Cc: stable@vger.kernel.org Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reference counting is used to ensure that
batadv_hardif_neigh_node and batadv_hard_iface
are not freed before/during
batadv_v_elp_throughput_metric_update work is
finished.
But there isn't a guarantee that the hard if will
remain associated with a soft interface up until
the work is finished.
This fixes a crash triggered by reboot that looks
like this:
(the batadv_v_mesh_free call is misleading,
and does not actually happen)
I was able to make the issue happen more reliably
by changing hardif_neigh->bat_v.metric_work work
to be delayed work. This allowed me to track down
and confirm the fix.
Cc: stable@vger.kernel.org Fixes: c833484e5f38 ("batman-adv: ELP - compute the metric based on the estimated throughput") Signed-off-by: Andy Strohman <andrew@andrewstrohman.com>
[sven@narfation.org: prevent entering batadv_v_elp_get_throughput without
soft_iface] Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Vexia EDU ATLA 10 tablet comes in 2 different versions with
significantly different mainboards. The only outward difference is that
the charging barrel on one is marked 5V and the other is marked 9V.
The 5V version mostly works with the BYTCR defaults, except that it is
missing a CHAN package in its ACPI tables and the default of using
SSP0-AIF2 is wrong, instead SSP0-AIF1 must be used. That and its jack
detect signal is not inverted as it usually is.
Add a DMI quirk for the 5V version to fix sound not working.
I got a syzbot report: slab-out-of-bounds Read in
orangefs_debug_write... several people suggested fixes,
I tested Al Viro's suggestion and made this patch.
Signed-off-by: Mike Marshall <hubcap@omnibond.com> Reported-by: syzbot+fc519d7875f2d9186c1f@syzkaller.appspotmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Function xen_pin_page calls xen_pte_lock, which in turn grab page
table lock (ptlock). When locking, xen_pte_lock expect mm->page_table_lock
to be held before grabbing ptlock, but this does not happen when pinning
is caused by xen_mm_pin_all.
This commit addresses lockdep warning below, which shows up when
suspending a Xen VM.
Definitions of ioread64 and iowrite64 macros in asm/io.h called by vfio
pci implementations are enclosed inside check for CONFIG_GENERIC_IOMAP.
They don't get defined if CONFIG_GENERIC_IOMAP is defined. Include
linux/io-64-nonatomic-lo-hi.h to define iowrite64 and ioread64 macros
when they are not defined. io-64-nonatomic-lo-hi.h maps the macros to
generic implementation in lib/iomap.c. The generic implementation does
64 bit rw if readq/writeq is defined for the architecture, otherwise it
would do 32 bit back to back rw.
Note that there are two versions of the generic implementation that
differs in the order the 32 bit words are written if 64 bit support is
not present. This is not the little/big endian ordering, which is
handled separately. This patch uses the lo followed by hi word ordering
which is consistent with current back to back implementation in the
vfio/pci code.
Apparently the Raptor Lake-P reference firmware configures the PIO log size
correctly, but some vendor BIOSes, including at least ASUSTeK COMPUTER INC.
Zenbook UX3402VA_UX3402VA, do not.
Apply the quirk for Raptor Lake-P. This prevents kernel complaints like:
DPC: RP PIO log size 0 is invalid
and also enables the DPC driver to dump the RP PIO Log registers when DPC
is triggered.
Note that the bug report also mentions 8086:a76e, which has been already
added by 627c6db20703 ("PCI/DPC: Quirk PIO log size for Intel Raptor Lake
Root Ports").
syzbot report a null-ptr-deref in vidtv_mux_stop_thread. [1]
If dvb->mux is not initialized successfully by vidtv_mux_init() in the
vidtv_start_streaming(), it will trigger null pointer dereference about mux
in vidtv_mux_stop_thread().
Adjust the timing of streaming initialization and check it before
stopping it.
Reported-by: syzbot+5e248227c80a3be8e96a@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=5e248227c80a3be8e96a Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Sasha Levin <sashal@kernel.org>
It appears that do_div() once more gets confused by a complex
expression that ends up not quite being constant despite
__builtin_constant_p() thinking it is:
Today a PV guest (including dom0) can create 2MB contiguous memory
regions for DMA buffers at max. This has led to problems at least
with the megaraid_sas driver, which wants to allocate a 2.3MB DMA
buffer.
The limiting factor is the frame array used to do the hypercall for
making the memory contiguous, which has 512 entries and is just a
static array in mmu_pv.c.
In order to not waste memory for non-PV guests, put the initial
frame array into .init.data section and dynamically allocate an array
from the .init_after_bootmem hook of PV guests.
In case a contiguous memory area larger than the initially supported
2MB is requested, allocate a larger buffer for the frame list. Note
that such an allocation is tried only after memory management has been
initialized properly, which is tested via a flag being set in the
.init_after_bootmem hook.
Fixes: 9f40ec84a797 ("xen/swiotlb: add alignment check for dma buffers") Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Alan Robinson <Alan.Robinson@fujitsu.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
After removing the conditional return from xen_create_contiguous_region(),
the accompanying comment was left in place, but it now precedes an
unrelated conditional and confuses readers.
Fixes: 989513a735f5 ("xen: cleanup pvh leftovers from pv-only sources") Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20230802163151.1486-1-petrtesarik@huaweicloud.com Signed-off-by: Juergen Gross <jgross@suse.com>
Stable-dep-of: e93ec87286bd ("x86/xen: allow larger contiguous memory regions in PV guests") Signed-off-by: Sasha Levin <sashal@kernel.org>
The settings for all GPIOs are locked by default in bcm_kona_gpio_reset.
The settings for a GPIO are unlocked when requesting it as a GPIO, but
not when requesting it as an interrupt, causing the IRQ settings to not
get applied.
Fix this by making sure to unlock the right bits when an IRQ is requested.
To avoid a situation where an IRQ being released causes a lock despite
the same GPIO being used by a GPIO request or vice versa, add an unlock
counter and only lock if it reaches 0.
The GPIO lock/unlock functions clear/write a bit to the relevant
register for each bank. However, due to an oversight the bit that
was being written was based on the total GPIO number, not the index
of the GPIO within the relevant bank, causing it to fail for any
GPIO above 32 (thus any GPIO for banks above bank 0).
Fix lock/unlock for these banks by using the correct bit.
The loop that detects/populates cache information already has a bounds
check on the array size but does not account for cache levels with
separate data/instructions cache. Fix this by incrementing the index
for any populated leaf (instead of any populated level).
Fixes: 5d425c186537 ("arm64: kernel: add support for cpu cache information") Signed-off-by: Radu Rendec <rrendec@redhat.com> Link: https://lore.kernel.org/r/20250206174420.2178724-1-rrendec@redhat.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
devm_kasprintf() can return a NULL pointer on failure,but this
returned value in mt_input_configured() is not checked.
Add NULL check in mt_input_configured(), to handle kernel NULL
pointer dereference error.
Fixes: 479439463529 ("HID: multitouch: Correct devm device reference for hidinput input_dev name") Signed-off-by: Charles Han <hanchunchao@inspur.com> Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If nfs4_client is in courtesy state then there is no point to send
the callback. This causes nfsd4_shutdown_callback to hang since
cl_cb_inflight is not 0. This hang lasts about 15 minutes until TCP
notifies NFSD that the connection was dropped.
This patch modifies nfsd4_run_cb_work to skip the RPC call if
nfs4_client is in courtesy state.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Fixes: 66af25799940 ("NFSD: add courteous server support for thread with only delegation") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If getting acl_default fails, acl_access and acl_default will be released
simultaneously. However, acl_access will still retain a pointer pointing
to the released posix_acl, which will trigger a WARNING in
nfs3svc_release_getacl like this:
Clear acl_access/acl_default after posix_acl_release is called to prevent
UAF from being triggered.
Fixes: a257cdd0e217 ("[PATCH] NFSD: Add server support for NFSv3 ACLs.") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241107014705.2509463-1-lilingfeng@huaweicloud.com/ Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Rick Macklem <rmacklem@uoguelph.ca> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently the skb size after coalescing is only limited by the skb
layout (the skb must not carry frag_list). A single coalesced skb
covering several MSS can potentially fill completely the receive
buffer. In such a case, the snd win will zero until the receive buffer
will be empty again, affecting tput badly.
Fixes: 8268ed4c9d19 ("mptcp: introduce and use mptcp_try_coalesce()") Cc: stable@vger.kernel.org # please delay 2 weeks after 6.13-final release Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241230-net-mptcp-rbuf-fixes-v1-3-8608af434ceb@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
==================================================================
BUG: KASAN: slab-out-of-bounds in ocfs2_match fs/ocfs2/dir.c:334
[inline]
BUG: KASAN: slab-out-of-bounds in ocfs2_search_dirblock+0x283/0x6e0
fs/ocfs2/dir.c:367
Read of size 1 at addr ffff88804d8b9982 by task syz-executor.2/14802
The two reports are all caused invalid negative i_size of dir inode. For
ocfs2, dir_inode can't be negative or zero.
Here add a check in which is called by ocfs2_check_dir_for_entry(). It
fixes the second report as ocfs2_check_dir_for_entry() must be called
before ocfs2_prepare_dir_for_insert(). Also set a up limit for dir with
OCFS2_INLINE_DATA_FL. The i_size can't be great than blocksize.
Declare ftrace_get_parent_ra_addr() as static to suppress clang
compiler warning that 'no previous prototype'. This function is
not intended to be called from other parts.
Fix follow error with clang-19:
arch/mips/kernel/ftrace.c:251:15: error: no previous prototype for function 'ftrace_get_parent_ra_addr' [-Werror,-Wmissing-prototypes]
251 | unsigned long ftrace_get_parent_ra_addr(unsigned long self_ra, unsigned long
| ^
arch/mips/kernel/ftrace.c:251:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
251 | unsigned long ftrace_get_parent_ra_addr(unsigned long self_ra, unsigned long
| ^
| static
1 error generated.
Signed-off-by: WangYuli <wangyuli@uniontech.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ioctl and sysfs handlers unconditionally call the ->enable callback.
Not all drivers implement that callback, leading to NULL dereferences.
Example of affected drivers: ptp_s390.c, ptp_vclock.c and ptp_mock.c.
Instead use a dummy callback if no better was specified by the driver.
Fixes: d94ba80ebbea ("ptp: Added a brand new class driver for ptp clocks.") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250123-ptp-enable-v1-1-b015834d3a47@weissschuh.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The NCSI state machine as it's currently implemented assumes that
transition to the next logical state is performed either explicitly by
calling `schedule_work(&ndp->work)` to re-queue itself or implicitly
after processing the predefined (ndp->pending_req_num) number of
replies. Thus to avoid the configuration FSM from advancing prematurely
and getting out of sync with the process it's essential to not skip
waiting for a reply.
This patch makes the code wait for reception of the Deselect Package
response for the last package probed before proceeding to channel
configuration.
Thanks go to Potin Lai and Cosmo Chou for the initial investigation and
testing.
Fixes: 8e13f70be05e ("net/ncsi: Probe single packages to avoid conflict") Cc: stable@vger.kernel.org Signed-off-by: Paul Fertser <fercerpav@gmail.com> Link: https://patch.msgid.link/20250116152900.8656-1-fercerpav@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For registered buffers, fastrpc driver sends the buffer information
to remote subsystem. There is a problem with current implementation
where the page address is being sent with an offset leading to
improper buffer address on DSP. This is leads to functional failures
as DSP expects base address in page information and extracts offset
information from remote arguments. Mask the offset and pass the base
page address to DSP.
This issue is observed is a corner case when some buffer which is registered
with fastrpc framework is passed with some offset by user and then the DSP
implementation tried to read the data. As DSP expects base address and takes
care of offsetting with remote arguments, passing an offsetted address will
result in some unexpected data read in DSP.
All generic usecases usually pass the buffer as it is hence is problem is
not usually observed. If someone tries to pass offsetted buffer and then
tries to compare data at HLOS and DSP end, then the ambiguity will be observed.
The function do_otp_read() does not set the output parameter *retlen,
which is expected to contain the number of bytes actually read.
As a result, in onenand_otp_walk(), the tmp_retlen variable remains
uninitialized after calling do_otp_walk() and used to change
the values of the buf, len and retlen variables.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
The "pipe" variable is a u8 which comes from the network. If it's more
than 127, then it results in memory corruption in the caller,
nci_hci_connect_gate().
Cc: stable@vger.kernel.org Fixes: a1b0b9415817 ("NFC: nci: Create pipe on specific gate in nci_hci_connect_gate") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/bcf5453b-7204-4297-9c20-4d8c7dacf586@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since nilfs_bmap_lookup_contig() in nilfs_fiemap() calculates its result
by being prepared to go through potentially maxblocks == INT_MAX blocks,
the value in n may experience an overflow caused by left shift of blkbits.
While it is extremely unlikely to occur, play it safe and cast right hand
expression to wider type to mitigate the issue.
Found by Linux Verification Center (linuxtesting.org) with static analysis
tool SVACE.
Mark did a conversion of ocfs2 to use folios and sent it to me as a
giant patch for review ;-)
So I've redone it as individual patches, and credited Mark for the patches
where his code is substantially the same. It's not a bad way to do it;
his patch had some bugs and my patches had some bugs. Hopefully all our
bugs were different from each other. And hopefully Mark likes all the
changes I made to his code!
This patch (of 23):
If we can't read the buffer, be sure to unlock the page before returning.
Link: https://lkml.kernel.org/r/20241205171653.3179945-1-willy@infradead.org Link: https://lkml.kernel.org/r/20241205171653.3179945-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Mark Tinguely <mark.tinguely@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 23aab037106d ("ocfs2: fix UBSAN warning in ocfs2_verify_volume()")
introduced a regression bug. The blksz_bits value is already converted to
CPU endian in the previous code; therefore, the code shouldn't use
le32_to_cpu() anymore.
Link: https://lkml.kernel.org/r/20250121112204.12834-1-heming.zhao@suse.com Fixes: 23aab037106d ("ocfs2: fix UBSAN warning in ocfs2_verify_volume()") Signed-off-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
count and offset are passed from user space and not checked, only
offset is capped to 40 bits, which can be used to read/write out of
bounds of the device.
Fixes: 6e3f26456009 (“vfio/platform: read and write support for the device fd”) Cc: stable@vger.kernel.org Reported-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Tested-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>