When an uncorrected memory error is consumed there is a race between the
CMCI from the memory controller reporting an uncorrected error with a UCNA
signature, and the core reporting and SRAR signature machine check when
the data is about to be consumed.
- Background: why *UN*corrected errors tied to *C*MCI in Intel platform [1]
Prior to Icelake memory controllers reported patrol scrub events that
detected a previously unseen uncorrected error in memory by signaling a
broadcast machine check with an SRAO (Software Recoverable Action
Optional) signature in the machine check bank. This was overkill because
it's not an urgent problem that no core is on the verge of consuming that
bad data. It's also found that multi SRAO UCE may cause nested MCE
interrupts and finally become an IERR.
Hence, Intel downgrades the machine check bank signature of patrol scrub
from SRAO to UCNA (Uncorrected, No Action required), and signal changed to
#CMCI. Just to add to the confusion, Linux does take an action (in
uc_decode_notifier()) to try to offline the page despite the UC*NA*
signature name.
- Background: why #CMCI and #MCE race when poison is consuming in Intel platform [1]
Having decided that CMCI/UCNA is the best action for patrol scrub errors,
the memory controller uses it for reads too. But the memory controller is
executing asynchronously from the core, and can't tell the difference
between a "real" read and a speculative read. So it will do CMCI/UCNA if
an error is found in any read.
Thus:
1) Core is clever and thinks address A is needed soon, issues a speculative read.
2) Core finds it is going to use address A soon after sending the read request
3) The CMCI from the memory controller is in a race with MCE from the core
that will soon try to retire the load from address A.
Quite often (because speculation has got better) the CMCI from the memory
controller is delivered before the core is committed to the instruction
reading address A, so the interrupt is taken, and Linux offlines the page
(marking it as poison).
- Why user process is killed for instr case
Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported
"not recovered"") tries to fix noise message "Memory error not recovered"
and skips duplicate SIGBUSs due to the race. But it also introduced a bug
that kill_accessing_process() return -EHWPOISON for instr case, as result,
kill_me_maybe() send a SIGBUS to user process.
If the CMCI wins that race, the page is marked poisoned when
uc_decode_notifier() calls memory_failure(). For dirty pages,
memory_failure() invokes try_to_unmap() with the TTU_HWPOISON flag,
converting the PTE to a hwpoison entry. As a result,
kill_accessing_process():
- call walk_page_range() and return 1 regardless of whether
try_to_unmap() succeeds or fails,
- call kill_proc() to make sure a SIGBUS is sent
- return -EHWPOISON to indicate that SIGBUS is already sent to the
process and kill_me_maybe() doesn't have to send it again.
However, for clean pages, the TTU_HWPOISON flag is cleared, leaving the
PTE unchanged and not converted to a hwpoison entry. Conversely, for
clean pages where PTE entries are not marked as hwpoison,
kill_accessing_process() returns -EFAULT, causing kill_me_maybe() to send
a SIGBUS.
Console log looks like this:
Memory failure: 0x827ca68: corrupted page was clean: dropped without side effects
Memory failure: 0x827ca68: recovery action for clean LRU page: Recovered
Memory failure: 0x827ca68: already hardware poisoned
mce: Memory error not recovered
To fix it, return 0 for "corrupted page was clean", preventing an
unnecessary SIGBUS to user process.
[1] https://lore.kernel.org/lkml/20250217063335.22257-1-xueshuai@linux.alibaba.com/T/#mba94f1305b3009dd340ce4114d3221fe810d1871 Link: https://lkml.kernel.org/r/20250312112852.82415-3-xueshuai@linux.alibaba.com Fixes: 046545a661af ("mm/hwpoison: fix error page recovered but reported "not recovered"") Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Tested-by: Tony Luck <tony.luck@intel.com> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ruidong Tian <tianruidong@linux.alibaba.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Yazen Ghannam <yazen.ghannam@amd.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch should fix a possible userfaultfd release() hang during
concurrent GUP.
This problem was initially reported by Dimitris Siakavaras in July 2023
[1] in a firecracker use case. Firecracker has a separate process
handling page faults remotely, and when the process releases the
userfaultfd it can race with a concurrent GUP from KVM trying to fault in
a guest page during the secondary MMU page fault process.
A similar problem was reported recently again by Jinjiang Tu in March 2025
[2], even though the race happened this time with a mlockall() operation,
which does GUP in a similar fashion.
In 2017, commit 656710a60e36 ("userfaultfd: non-cooperative: closing the
uffd without triggering SIGBUS") was trying to fix this issue. AFAIU,
that fixes well the fault paths but may not work yet for GUP. In GUP, the
issue is NOPAGE will be almost treated the same as "page fault resolved"
in faultin_page(), then the GUP will follow page again, seeing page
missing, and it'll keep going into a live lock situation as reported.
This change makes core mm return RETRY instead of NOPAGE for both the GUP
and fault paths, proactively releasing the mmap read lock. This should
guarantee the other release thread make progress on taking the write lock
and avoid the live lock even for GUP.
When at it, rearrange the comments to make sure it's uptodate.
The PGDAT_RECLAIM_LOCKED bit is used to provide mutual exclusion of node
reclaim for struct pglist_data using a single bit.
It is "locked" with a test_and_set_bit (similarly to a try lock) which
provides full ordering with respect to loads and stores done within
__node_reclaim().
It is "unlocked" with clear_bit(), which does not provide any ordering
with respect to loads and stores done before clearing the bit.
The lack of clear_bit() memory ordering with respect to stores within
__node_reclaim() can cause a subsequent CPU to fail to observe stores from
a prior node reclaim. This is not an issue in practice on TSO (e.g.
x86), but it is an issue on weakly-ordered architectures (e.g. arm64).
Fix this by using clear_bit_unlock rather than clear_bit to clear
PGDAT_RECLAIM_LOCKED with a release memory ordering semantic.
This provides stronger memory ordering (release rather than relaxed).
Link: https://lkml.kernel.org/r/20250312141014.129725-1-mathieu.desnoyers@efficios.com Fixes: d773ed6b856a ("mm: test and set zone reclaim lock before starting reclaim") Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Jade Alglave <j.alglave@ucl.ac.uk> Cc: Luc Maranget <luc.maranget@inria.fr> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The existing mremap() logic has grown organically over a very long period
of time, resulting in code that is in many parts, very difficult to follow
and full of subtleties and sources of confusion.
In addition, it is difficult to thread state through the operation
correctly, as function arguments have expanded, some parameters are
expected to be temporarily altered during the operation, others are
intended to remain static and some can be overridden.
This series completely refactors the mremap implementation, sensibly
separating functions, adding comments to explain the more subtle aspects
of the implementation and making use of small structs to thread state
through everything.
The reason for doing so is to lay the groundwork for planned future
changes to the mremap logic, changes which require the ability to easily
pass around state.
Additionally, it would be unhelpful to add yet more logic to code that is
already difficult to follow without first refactoring it like this.
The first patch in this series additionally fixes a bug when a VMA with
start address zero is partially remapped.
Tested on real hardware under heavy workload and all self tests are
passing.
This patch (of 3):
Consider the case of a partial mremap() (that results in a VMA split) of
an accountable VMA (i.e. which has the VM_ACCOUNT flag set) whose start
address is zero, with the MREMAP_MAYMOVE flag specified and a scenario
where a move does in fact occur:
addr end
| |
v v
|-------------|
| vma |
|-------------|
0
This move is affected by unmapping the range [addr, end). In order to
prevent an incorrect decrement of accounted memory which has already been
determined, the mremap() code in move_vma() clears VM_ACCOUNT from the VMA
prior to doing so, before reestablishing it in each of the VMAs
post-split:
addr end
| |
v v
|---| |---|
| A | | B |
|---| |---|
Commit 6b73cff239e5 ("mm: change munmap splitting order and move_vma()")
changed this logic such as to determine whether there is a need to do so
by establishing account_start and account_end and, in the instance where
such an operation is required, assigning them to vma->vm_start and
vma->vm_end.
Later the code checks if the operation is required for 'A' referenced
above thusly:
if (account_start) {
...
}
However, if the VMA described above has vma->vm_start == 0, which is now
assigned to account_start, this branch will not be executed.
As a result, the VMA 'A' above will remain stripped of its VM_ACCOUNT
flag, incorrectly.
The fix is to simply convert these variables to booleans and set them as
required.
I'm planning to implement lazy mmu mode for arm64 to optimize vmalloc. As
part of that, I will extend lazy mmu mode to cover kernel mappings in
vmalloc table walkers. While lazy mmu mode is already used for kernel
mappings in a few places, this will extend it's use significantly.
Having reviewed the existing lazy mmu implementations in powerpc, sparc
and x86, it looks like there are a bunch of bugs, some of which may be
more likely to trigger once I extend the use of lazy mmu. So this series
attempts to clarify the requirements and fix all the bugs in advance of
that series. See patch #1 commit log for all the details.
This patch (of 5):
The docs, implementations and use of arch_[enter|leave]_lazy_mmu_mode() is
a bit of a mess (to put it politely). There are a number of issues
related to nesting of lazy mmu regions and confusion over whether the
task, when in a lazy mmu region, is preemptible or not. Fix all the
issues relating to the core-mm. Follow up commits will fix the
arch-specific implementations. 3 arches implement lazy mmu; powerpc,
sparc and x86.
When arch_[enter|leave]_lazy_mmu_mode() was first introduced by commit 6606c3e0da53 ("[PATCH] paravirt: lazy mmu mode hooks.patch"), it was
expected that lazy mmu regions would never nest and that the appropriate
page table lock(s) would be held while in the region, thus ensuring the
region is non-preemptible. Additionally lazy mmu regions were only used
during manipulation of user mappings.
Commit 38e0edb15bd0 ("mm/apply_to_range: call pte function with lazy
updates") started invoking the lazy mmu mode in apply_to_pte_range(),
which is used for both user and kernel mappings. For kernel mappings the
region is no longer protected by any lock so there is no longer any
guarantee about non-preemptibility. Additionally, for RT configs, the
holding the PTL only implies no CPU migration, it doesn't prevent
preemption.
Commit bcc6cc832573 ("mm: add default definition of set_ptes()") added
arch_[enter|leave]_lazy_mmu_mode() to the default implementation of
set_ptes(), used by x86. So after this commit, lazy mmu regions can be
nested. Additionally commit 1a10a44dfc1d ("sparc64: implement the new
page table range API") and commit 9fee28baa601 ("powerpc: implement the
new page table range API") did the same for the sparc and powerpc
set_ptes() overrides.
powerpc couldn't deal with preemption so avoids it in commit b9ef323ea168
("powerpc/64s: Disable preemption in hash lazy mmu mode"), which
explicitly disables preemption for the whole region in its implementation.
x86 can support preemption (or at least it could until it tried to add
support nesting; more on this below). Sparc looks to be totally broken in
the face of preemption, as far as I can tell.
powerpc can't deal with nesting, so avoids it in commit 47b8def9358c
("powerpc/mm: Avoid calling arch_enter/leave_lazy_mmu() in set_ptes"),
which removes the lazy mmu calls from its implementation of set_ptes().
x86 attempted to support nesting in commit 49147beb0ccb ("x86/xen: allow
nesting of same lazy mode") but as far as I can tell, this breaks its
support for preemption.
In short, it's all a mess; the semantics for
arch_[enter|leave]_lazy_mmu_mode() are not clearly defined and as a result
the implementations all have different expectations, sticking plasters and
bugs.
arm64 is aiming to start using these hooks, so let's clean everything up
before adding an arm64 implementation. Update the documentation to state
that lazy mmu regions can never be nested, must not be called in interrupt
context and preemption may or may not be enabled for the duration of the
region. And fix the generic implementation of set_ptes() to avoid
nesting.
arch-specific fixes to conform to the new spec will proceed this one.
These issues were spotted by code review and I have no evidence of issues
being reported in the wild.
Link: https://lkml.kernel.org/r/20250303141542.3371656-1-ryan.roberts@arm.com Link: https://lkml.kernel.org/r/20250303141542.3371656-2-ryan.roberts@arm.com Fixes: bcc6cc832573 ("mm: add default definition of set_ptes()") Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juegren Gross <jgross@suse.com> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a process consumes a UE in a page, the memory failure handler
attempts to collect information for a potential SIGBUS. If the page is an
anonymous page, page_mapped_in_vma(page, vma) is invoked in order to
1. retrieve the vaddr from the process' address space,
2. verify that the vaddr is indeed mapped to the poisoned page,
where 'page' is the precise small page with UE.
It's been observed that when injecting poison to a non-head subpage of an
anonymous hugetlb page, no SIGBUS shows up, while injecting to the head
page produces a SIGBUS. The cause is that, though hugetlb_walk() returns
a valid pmd entry (on x86), but check_pte() detects mismatch between the
head page per the pmd and the input subpage. Thus the vaddr is considered
not mapped to the subpage and the process is not collected for SIGBUS
purpose. This is the calling stack:
check_pte() header says that it
"check if [pvmw->pfn, @pvmw->pfn + @pvmw->nr_pages) is mapped at the @pvmw->pte"
but practically works only if pvmw->pfn is the head page pfn at pvmw->pte.
Hindsight acknowledging that some pvmw->pte could point to a hugepage of
some sort such that it makes sense to make check_pte() work for hugepage.
Link: https://lkml.kernel.org/r/20250224211445.2663312-1-jane.chu@oracle.com Signed-off-by: Jane Chu <jane.chu@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> Cc: linmiaohe <linmiaohe@huawei.com> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Even though FOLL_SPLIT_PMD on hugetlb now always fails with -EOPNOTSUPP,
let's add a safety net in case FOLL_SPLIT_PMD usage would ever be
reworked.
In particular, before commit 9cb28da54643 ("mm/gup: handle hugetlb in the
generic follow_page_mask code"), GUP(FOLL_SPLIT_PMD) would just have
returned a page. In particular, hugetlb folios that are not PMD-sized
would never have been prone to FOLL_SPLIT_PMD.
hugetlb folios can be anonymous, and page_make_device_exclusive_one() is
not really prepared for handling them at all. So let's spell that out.
Link: https://lkml.kernel.org/r/20250210193801.781278-3-david@redhat.com Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Tested-by: Alistair Popple <apopple@nvidia.com> Cc: Alex Shi <alexs@kernel.org> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Dave Airlie <airlied@gmail.com> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Karol Herbst <kherbst@redhat.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Lyude <lyude@redhat.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Simona Vetter <simona.vetter@ffwll.ch> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yanteng Si <si.yanteng@linux.dev> Cc: Barry Song <v-songbaohua@oppo.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With commit 1a10a44dfc1d ("sparc64: implement the new page table range
API") set_ptes was added to the sparc architecture. The implementation
included calling arch_enter/leave_lazy_mmu() calls.
The patch removes the usage of arch_enter/leave_lazy_mmu() since this
implies nesting of lazy mmu regions which is not supported. Without this
fix, lazy mmu mode is effectively disabled because we exit the mode after
the first set_ptes:
Powerpc suffered the same problem and fixed it in a corresponding way with
commit 47b8def9358c ("powerpc/mm: Avoid calling
arch_enter/leave_lazy_mmu() in set_ptes").
Link: https://lkml.kernel.org/r/20250303141542.3371656-5-ryan.roberts@arm.com Fixes: 1a10a44dfc1d ("sparc64: implement the new page table range API") Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Andreas Larsson <andreas@gaisler.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juegren Gross <jgross@suse.com> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit 38e0edb15bd0 ("mm/apply_to_range: call pte function with lazy
updates") it's been possible for arch_[enter|leave]_lazy_mmu_mode() to be
called without holding a page table lock (for the kernel mappings case),
and therefore it is possible that preemption may occur while in the lazy
mmu mode. The Sparc lazy mmu implementation is not robust to preemption
since it stores the lazy mode state in a per-cpu structure and does not
attempt to manage that state on task switch.
Powerpc had the same issue and fixed it by explicitly disabling preemption
in arch_enter_lazy_mmu_mode() and re-enabling in
arch_leave_lazy_mmu_mode(). See commit b9ef323ea168 ("powerpc/64s:
Disable preemption in hash lazy mmu mode").
Given Sparc's lazy mmu mode is based on powerpc's, let's fix it in the
same way here.
Link: https://lkml.kernel.org/r/20250303141542.3371656-4-ryan.roberts@arm.com Fixes: 38e0edb15bd0 ("mm/apply_to_range: call pte function with lazy updates") Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Andreas Larsson <andreas@gaisler.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juegren Gross <jgross@suse.com> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If do_zone_finish() is called with a filesystem that has missing devices
(e.g. a RAID file system mounted in degraded mode) it is accessing the
btrfs_device::zone_info pointer, which will not be set if the device
in question is missing.
Check if the device is present (by checking if it has a valid block device
pointer associated) and if not, skip zone finishing for it.
Fixes: 4dcbb8ab31c1 ("btrfs: zoned: make zone finishing multi stripe capable") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If btrfs_zone_activate() is called with a filesystem that has missing
devices (e.g. a RAID file system mounted in degraded mode) it is accessing
the btrfs_device::zone_info pointer, which will not be set if the device in
question is missing.
Check if the device is present (by checking if it has a valid block
device pointer associated) and if not, skip zone activation for it.
Fixes: f9a912a3c45f ("btrfs: zoned: make zone activation multi stripe capable") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
At close_ctree() after we have ran delayed iputs either through explicitly
calling btrfs_run_delayed_iputs() or later during the call to
btrfs_commit_super() or btrfs_error_commit_super(), we assert that the
delayed iputs list is empty.
When we have compressed writes this assertion may fail because delayed
iputs may have been added to the list after we last ran delayed iputs.
This happens like this:
1) We have a compressed write bio executing;
2) We enter close_ctree() and flush the fs_info->endio_write_workers
queue which is the queue used for running ordered extent completion;
3) The compressed write bio finishes and enters
btrfs_finish_compressed_write_work(), where it calls
btrfs_finish_ordered_extent() which in turn calls
btrfs_queue_ordered_fn(), which queues a work item in the
fs_info->endio_write_workers queue that we have flushed before;
4) At close_ctree() we proceed, run all existing delayed iputs and
call btrfs_commit_super() (which also runs delayed iputs), but before
we run the following assertion below:
ASSERT(list_empty(&fs_info->delayed_iputs))
A delayed iput is added by the step below...
5) The ordered extent completion job queued in step 3 runs and results in
creating a delayed iput when dropping the last reference of the ordered
extent (a call to btrfs_put_ordered_extent() made from
btrfs_finish_one_ordered());
6) At this point the delayed iputs list is not empty, so the assertion at
close_ctree() fails.
Fix this by flushing the fs_info->compressed_write_workers queue at
close_ctree() before flushing the fs_info->endio_write_workers queue,
respecting the queue dependency as the later is responsible for the
execution of ordered extent completion.
CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The MT8173 disp-pwm device should have only one compatible string, based
on the following DT validation error:
arch/arm64/boot/dts/mediatek/mt8173-elm.dtb: pwm@1401e000: compatible: 'oneOf' conditional failed, one must be fixed:
['mediatek,mt8173-disp-pwm', 'mediatek,mt6595-disp-pwm'] is too long
'mediatek,mt8173-disp-pwm' is not one of ['mediatek,mt6795-disp-pwm', 'mediatek,mt8167-disp-pwm']
'mediatek,mt8173-disp-pwm' is not one of ['mediatek,mt8186-disp-pwm', 'mediatek,mt8188-disp-pwm', 'mediatek,mt8192-disp-pwm', 'mediatek,mt8195-disp-pwm', 'mediatek,mt8365-disp-pwm']
'mediatek,mt8173-disp-pwm' was expected
'mediatek,mt8183-disp-pwm' was expected
from schema $id: http://devicetree.org/schemas/pwm/mediatek,pwm-disp.yaml#
arch/arm64/boot/dts/mediatek/mt8173-elm.dtb: pwm@1401f000: compatible: 'oneOf' conditional failed, one must be fixed:
['mediatek,mt8173-disp-pwm', 'mediatek,mt6595-disp-pwm'] is too long
'mediatek,mt8173-disp-pwm' is not one of ['mediatek,mt6795-disp-pwm', 'mediatek,mt8167-disp-pwm']
'mediatek,mt8173-disp-pwm' is not one of ['mediatek,mt8186-disp-pwm', 'mediatek,mt8188-disp-pwm', 'mediatek,mt8192-disp-pwm', 'mediatek,mt8195-disp-pwm', 'mediatek,mt8365-disp-pwm']
'mediatek,mt8173-disp-pwm' was expected
'mediatek,mt8183-disp-pwm' was expected
from schema $id: http://devicetree.org/schemas/pwm/mediatek,pwm-disp.yaml#
Drop the extra "mediatek,mt6595-disp-pwm" compatible string.
Fixes: 61aee9342514 ("arm64: dts: mt8173: add MT8173 display PWM driver support node") Cc: YH Huang <yh.huang@mediatek.com> Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20250108083424.2732375-2-wenst@chromium.org Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hotplugged memory can be smaller than the original memory. For example,
on my target:
root@genericarmv8:~# cat /sys/kernel/debug/memblock/memory
0: 0x0000000064005000..0x0000000064023fff 0 NOMAP
1: 0x0000000064400000..0x00000000647fffff 0 NOMAP
2: 0x0000000068000000..0x000000006fffffff 0 DRV_MNG
3: 0x0000000088800000..0x0000000094ffefff 0 NONE
4: 0x0000000094fff000..0x0000000094ffffff 0 NOMAP
max_pfn will affect read_page_owner. Therefore, it should first compare and
then select the larger value for max_pfn.
In r852_ready(), the dev get from r852_get_dev() need to be checked.
An unstable device should not be ready. A proper implementation can
be found in r852_read_byte(). Add a status check and return 0 when it is
unstable.
In INFTL_findwriteunit(), the return value of inftl_read_oob()
need to be checked. A proper implementation can be
found in INFTL_deleteblock(). The status will be set as
SECTOR_IGNORE to break from the while-loop correctly
if the inftl_read_oob() fails.
Fixes: 8593fbc68b0d ("[MTD] Rework the out of band handling completely") Cc: stable@vger.kernel.org # v2.6+ Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Recently, during a debugging session using local MPTCP connections, I
noticed MPJoinAckHMacFailure was not zero on the server side. The
counter was in fact incremented when the PM rejected new subflows,
because the 'subflow' limit was reached.
The fix is easy, simply dissociating the two cases: only the HMAC
validation check should increase MPTCP_MIB_JOINACKMAC counter.
Fixes: 4cf8b7e48a09 ("subflow: introduce and use mptcp_can_accept_new_subflow()") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250407-net-mptcp-hmac-failure-mib-v1-1-3c9ecd0a3a50@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to the debug log, the same req received two SYN-ACK in a very
short time, very likely because the client retransmits the syn ack due
to multiple reasons.
Even if the packets are transmitted with a relevant time interval, they
can be processed by the server on different CPUs concurrently). The
'subflow_req->msk' ownership is transferred to the subflow the first,
and there will be a risk of a null pointer dereference here.
This patch fixes this issue by moving the 'subflow_req->msk' under the
`own_req == true` conditional.
Note that the !msk check in subflow_hmac_valid() can be dropped, because
the same check already exists under the own_req mpj branch where the
code has been moved to.
Fixes: 9466a1ccebbe ("mptcp: enable JOIN requests even if cookies are in use") Cc: stable@vger.kernel.org Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Gang Yan <yangang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250328-net-mptcp-misc-fixes-6-15-v1-1-34161a482a7f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The split_sg_phys function was incorrectly setting the offsets of all
scatterlist entries (except the first) to 0. Only the first scatterlist
entry's offset and length needs to be modified to account for the skip.
Setting the rest entries' offsets to 0 could lead to incorrect data
access.
I am using this function in a crypto driver that I'm currently developing
(not yet sent to mailing list). During testing, it was observed that the
output scatterlists (except the first one) contained incorrect garbage
data.
I narrowed this issue down to the call of sg_split(). Upon debugging
inside this function, I found that this resetting of offset is the cause
of the problem, causing the subsequent scatterlists to point to incorrect
memory locations in a page. By removing this code, I am obtaining
expected data in all the split output scatterlists. Thus, this was indeed
causing observable runtime effects!
This patch removes the offending code, ensuring that the page offsets in
the input scatterlist are preserved in the output scatterlist.
Currently, when a lock class is allocated, nr_unused_locks will be
increased by 1, until it gets used: nr_unused_locks will be decreased by
1 in mark_lock(). However, one scenario is missed: a lock class may be
zapped without even being used once. This could result into a situation
that nr_unused_locks != 0 but no unused lock class is active in the
system, and when `cat /proc/lockdep_stats`, a WARN_ON() will
be triggered in a CONFIG_DEBUG_LOCKDEP=y kernel:
Tegra264 has updated HSP_INT_DIMENSIONING register as follows:
* nSI is now BIT17:BIT21.
* nDB is now BIT12:BIT16.
Currently, we are using a static macro HSP_nINT_MASK to get the values
from HSP_INT_DIMENSIONING register. This results in wrong values for nSI
for HSP instances that supports 16 shared interrupts.
Define dimensioning masks in soc data and use them to parse nSI, nDB,
nAS, nSS & nSM values.
Fixes: 602dbbacc3ef ("mailbox: tegra: add support for Tegra264") Cc: stable@vger.kernel.org Signed-off-by: Kartik Rajput <kkartik@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Acked-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The off_gpios could be NULL. Add missing check in the kb3930_probe().
This is similar to the issue fixed in commit b1ba8bcb2d1f
("backlight: hx8357: Fix potential NULL pointer dereference").
This was detected by our static analysis tool.
Cc: stable@vger.kernel.org Fixes: ede6b2d1dfc0 ("mfd: ene-kb3930: Add driver for ENE KB3930 Embedded Controller") Suggested-by: Lee Jones <lee@kernel.org> Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com> Link: https://lore.kernel.org/r/20250224233736.1919739-1-chenyuan0y@gmail.com Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When determining the actual best period by looping through all
possible PWM configs, the resolution currently used is based on
bit shift value which is off-by-one above the possible maximum
PWM value allowed.
So subtract one from the resolution before determining the best
period so that the maximum duty cycle requested by the PWM user
won't result in a value above the maximum allowed by the selected
resolution.
Cc: stable@vger.kernel.org # 6.4 Fixes: b00d2ed37617 ("leds: rgb: leds-qcom-lpg: Add support for high resolution PWM") Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Reviewed-by: Sebastian Reichel <sre@kernel.org> Link: https://lore.kernel.org/r/20250305-leds-qcom-lpg-fix-max-pwm-on-hi-res-v4-3-bfe124a53a9f@linaro.org Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ideally, the requested duty cycle should never translate to a PWM
value higher than the selected resolution (PWM size), but currently the
best matched period is never reported back to the PWM consumer, so the
consumer will still be using the requested period which is higher than
the best matched one. This will result in PWM consumer requesting
duty cycle values higher than the allowed PWM value.
For example, a consumer might request a period of 5ms while the best
(closest) period the PWM hardware will do is 4.26ms. For this best
matched resolution, if the selected resolution is 8-bit wide, when
the consumer asks for a duty cycle of 5ms, the PWM value will be 300,
which is outside of what the resolution allows. This will happen with
all possible resolutions when selected.
Since for these Hi-Res PWMs, the current implementation is capping the PWM
value at a 15-bit resolution, even when lower resolutions are selected,
the value will be wrapped around by the HW internal logic to the selected
resolution.
Fix the issue by capping the PWM value to the maximum value allowed by
the selected resolution.
Cc: stable@vger.kernel.org # 6.4 Fixes: b00d2ed37617 ("leds: rgb: leds-qcom-lpg: Add support for high resolution PWM") Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Reviewed-by: Sebastian Reichel <sre@kernel.org> Link: https://lore.kernel.org/r/20250305-leds-qcom-lpg-fix-max-pwm-on-hi-res-v4-2-bfe124a53a9f@linaro.org Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Journal emptiness is not determined by sb->s_sequence == 0 but rather by
sb->s_start == 0 (which is set a few lines above). Furthermore 0 is a
valid transaction ID so the check can spuriously trigger. Remove the
invalid WARN_ON.
CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250206094657.20865-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The I3C master driver may receive an IBI from a target device that has not
been probed yet. In such cases, the master calls `i3c_master_queue_ibi()`
to queue an IBI work task, leading to "Unable to handle kernel read from
unreadable memory" and resulting in a kernel panic.
Typical IBI handling flow:
1. The I3C master scans target devices and probes their respective drivers.
2. The target device driver calls `i3c_device_request_ibi()` to enable IBI
and assigns `dev->ibi = ibi`.
3. The I3C master receives an IBI from the target device and calls
`i3c_master_queue_ibi()` to queue the target device driver’s IBI
handler task.
However, since target device events are asynchronous to the I3C probe
sequence, step 3 may occur before step 2, causing `dev->ibi` to be `NULL`,
leading to a kernel panic.
Add a NULL pointer check in `i3c_master_queue_ibi()` to prevent accessing
an uninitialized `dev->ibi`, ensuring stability.
The target can send the MDB byte followed by additional data bytes.
The readl on MRDATAB reads one actual byte, but the readsl advances
the destination pointer by 4 bytes. This causes the subsequent payload
to be copied to wrong position in the destination buffer.
Cc: stable@kernel.org Fixes: dd3c52846d59 ("i3c: master: svc: Add Silvaco I3C master driver") Signed-off-by: Stanley Chu <yschu@nuvoton.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20250318053606.3087121-3-yschu@nuvoton.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When mounting the same share twice, once with the "linux" mount parameter
(or equivalently "posix") and then once without (or e.g. with "nolinux"),
we were incorrectly reusing the same tree connection for both mounts.
This meant that the first mount of the share on the client, would
cause subsequent mounts of that same share on the same client to
ignore that mount parm ("linux" vs. "nolinux") and incorrectly reuse
the same tcon.
Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
soc_dev_attr->revision could be NULL, thus,
a pointer check is added to prevent potential NULL pointer dereference.
This is similar to the fix in commit 3027e7b15b02
("ice: Fix some null pointer dereference issues in ice_ptp.c").
Don't move forward with tpm_chip_start() inside tpm_try_get_ops(), unless
TPM_CHIP_FLAG_SUSPENDED is not set. tpm_find_get_ops() will return NULL in
such a failure case.
Fixes: 9265fed6db60 ("tpm: Lock TPM chip in tpm_pm_suspend() first") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Cc: stable@vger.kernel.org Cc: Jerry Snitselaar <jsnitsel@redhat.com> Cc: Mike Seo <mikeseohyungjin@gmail.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Smatch noticed that inode_getblk() can return 1 on successful mapping of
a block instead of expected 0 after commit b405c1e58b73 ("udf: refactor
udf_next_aext() to handle error"). This could confuse some of the
callers and lead to strange failures (although the one reported by
Smatch in udf_mkdir() is impossible to trigger in practice). Fix the
return value of inode_getblk().
Link: https://lore.kernel.org/all/cb514af7-bbe0-435b-934f-dd1d7a16d2cd@stanley.mountain Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Fixes: b405c1e58b73 ("udf: refactor udf_next_aext() to handle error") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
create_user_mr() has correct code to count the number of null keys
used to fill in a hole for the memory map. However, fill_indir()
does not follow the same to cap the range up to the 1GB limit
correspondingly. Fill in more null keys for the gaps in between,
so that null keys are correctly populated.
Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code") Cc: stable@vger.kernel.org Reported-by: Cong Meng <cong.meng@oracle.com> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20250220193732.521462-2-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If FI_ATOMIC_DIRTIED is not set, atomic file can encounter corruption
due to a mismatch between old file size and new data.
To resolve this issue, I changed to set FI_ATOMIC_DIRTIED when
FI_DIRTY_INODE is set. This ensures that FI_DIRTY_INODE, which was
previously cleared by the Writeback thread during the commit atomic, is
set and i_size is updated.
Cc: <stable@vger.kernel.org> Fixes: fccaa81de87e ("f2fs: prevent atomic file from being dirtied before commit") Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Sunmin Jeong <s_min.jeong@samsung.com> Signed-off-by: Yeongjin Gil <youngjin.gil@samsung.com> Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Syzkaller detected a use-after-free issue in ext4_insert_dentry that was
caused by out-of-bounds access due to incorrect splitting in do_split.
BUG: KASAN: use-after-free in ext4_insert_dentry+0x36a/0x6d0 fs/ext4/namei.c:2109
Write of size 251 at addr ffff888074572f14 by task syz-executor335/5847
The following loop is located right above 'if' statement.
for (i = count-1; i >= 0; i--) {
/* is more than half of this entry in 2nd half of the block? */
if (size + map[i].size/2 > blocksize/2)
break;
size += map[i].size;
move++;
}
'i' in this case could go down to -1, in which case sum of active entries
wouldn't exceed half the block size, but previous behaviour would also do
split in half if sum would exceed at the very last block, which in case of
having too many long name files in a single block could lead to
out-of-bounds access and following use-after-free.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Cc: stable@vger.kernel.org Fixes: 5872331b3d91 ("ext4: fix potential negative array index in do_split()") Signed-off-by: Artem Sadovnikov <a.sadovnikov@ispras.ru> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250404082804.2567-3-a.sadovnikov@ispras.ru Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A client driver may use mhi_unprepare_from_transfer() to quiesce
incoming data during the client driver's tear down. The client driver
might also be processing data at the same time, resulting in a call to
mhi_queue_buf() which will invoke mhi_gen_tre(). If mhi_gen_tre() runs
after mhi_unprepare_from_transfer() has torn down the channel, a panic
will occur due to an invalid dereference leading to a page fault.
This occurs because mhi_gen_tre() does not verify the channel state
after locking it. Fix this by having mhi_gen_tre() confirm the channel
state is valid, or return error to avoid accessing deinitialized data.
In case of attempts to compress playback something, for instance,
when audio routing is not set up correctly, the audio DSP is left in
inconsistent state because we are not doing the correct things in
the error path of q6asm_dai_compr_set_params().
So, when routing is not set up and compress playback is attempted
the following errors are present (simplified log):
q6routing routing: Routing not setup for MultiMedia-1 Session
q6asm-dai dais: Stream reg failed ret:-22
q6asm-dai dais: ASoC error (-22): at snd_soc_component_compr_set_params()
on 17300000.remoteproc:glink-edge:apr:service@7:dais
After setting the correct routing the compress playback will always fail:
q6asm-dai dais: cmd = 0x10db3 returned error = 0x9
q6asm-dai dais: DSP returned error[9]
q6asm-dai dais: q6asm_open_write failed
q6asm-dai dais: ASoC error (-22): at snd_soc_component_compr_set_params()
on 17300000.remoteproc:glink-edge:apr:service@7:dais
0x9 here means "Operation is already processed". The CMD_OPEN here was
sent the second time hence DSP responds that it was already done.
Turns out the CMD_CLOSE should be sent after the q6asm_open_write()
succeeded but something failed after that, for instance, routing
setup.
Fix this by slightly reworking the error path in
q6asm_dai_compr_set_params().
DSP expects the periods to be aligned to fragment sizes, currently
setting up to hw constriants on periods bytes is not going to work
correctly as we can endup with periods sizes aligned to 32 bytes however
not aligned to fragment size.
Update the constriants to use fragment size, and also set at step of
10ms for period size to accommodate DSP requirements of 10ms latency.
Fixes: 9b4fe0f1cd79 ("ASoC: qdsp6: audioreach: add q6apm-dai support") Cc: stable@vger.kernel.org Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Tested-by: Johan Hovold <johan+linaro@kernel.org> Link: https://patch.msgid.link/20250314174800.10142-5-srinivas.kandagatla@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This isn't fixing a real issue, but there's also zero point in going
through group and buffer setup, when the buffers are going to be
rejected once attempted to get used.
Since the new_metric and last_hop_metric variables can reach
the MAX_METRIC(0xffffffff) value, an integer overflow may occur
when multiplying them by 10/9. It can lead to incorrect behavior.
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Fixes: a8d418d9ac25 ("mac80211: mesh: only switch path when new metric is at least 10% better") Cc: stable@vger.kernel.org Signed-off-by: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru> Link: https://patch.msgid.link/20250212082124.4078236-1-Ilia.Gavrilov@infotecs.ru Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Implement the workaround for erratum
3.3 RGMII timing may be out of spec when transmit delay is enabled
for the 6320 family, which says:
When transmit delay is enabled via Port register 1 bit 14 = 1, duty
cycle may be out of spec. Under very rare conditions this may cause
the attached device receive CRC errors.
Signed-off-by: Marek Behún <kabel@kernel.org> Cc: <stable@vger.kernel.org> # 5.4.x Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250317173250.28780-8-kabel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When adding a socket option support in MPTCP, both the get and set parts
are supposed to be implemented.
IPV6_V6ONLY support for the setsockopt part has been added a while ago,
but it looks like the get part got forgotten. It should have been
present as a way to verify a setting has been set as expected, and not
to act differently from TCP or any other socket types.
Not supporting this getsockopt(IPV6_V6ONLY) blocks some apps which want
to check the default value, before doing extra actions. On Linux, the
default value is 0, but this can be changed with the net.ipv6.bindv6only
sysctl knob. On Windows, it is set to 1 by default. So supporting the
get part, like for all other socket options, is important.
Everything was in place to expose it, just the last step was missing.
Only new code is added to cover this specific getsockopt(), that seems
safe.
Fixes: c9b95a135987 ("mptcp: support IPV6_V6ONLY setsockopt") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/550 Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250314-net-mptcp-fix-data-stream-corr-sockopt-v1-2-122dbb249db3@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Set the device's runtime PM status and enable runtime PM before
registering the async sub-device. This is needed to avoid the case where
the device is runtime PM resumed while runtime PM has not been enabled
yet.
Also set the device's runtime PM status to suspended in remove only if it
wasn't so already.
Fixes: 1283b3b8f82b ("media: i2c: Add driver for Sony IMX219 sensor") Cc: stable@vger.kernel.org # for >= v6.6 Reviewed-by: Bingbu Cao <bingbu.cao@intel.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
words_count denotes the number of words in total payload, while data
points to payload of various property within it. When words_count
reaches last word, data can access memory beyond the total payload. This
can lead to OOB access. With this patch, the utility api for handling
individual properties now returns the size of data consumed. Accordingly
remaining bytes are calculated before parsing the payload, thereby
eliminates the OOB access possibilities.
There is a possibility that init_codecs is invoked multiple times during
manipulated payload from video firmware. In such case, if codecs_count
can get incremented to value more than MAX_CODEC_NUM, there can be OOB
access. Reset the count so that it always starts from beginning.
Lift the xshutdown (enable) GPIO 1 ms after enabling the regulators, as
required by the sensor's power-up sequence.
Fixes: d30bb512da3d ("media: Add a driver for the ov7251 camera sensor") Cc: stable@vger.kernel.org Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: d30bb512da3d ("media: Add a driver for the ov7251 camera sensor") Cc: stable@vger.kernel.org Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In v4l2_detect_gtf(), it seems safer to cast the 32-bit image_width
variable to the 64-bit type u64 before multiplying to avoid
a possible overflow. The resulting object code even seems to
look better, at least on x86_64.
Found by Linux Verification Center (linuxtesting.org) with Svace.
The visl driver supports both frame and slice mode, with and without a
start-code. But, the range and default for these enum controls was not
set, which currently limits the decoder to enums with a value of 0. Fix
this by setting the decoder mode and start code controls for both the
H.264 and HEVC codecs.
Fixes: 0c078e310b6d ("media: visl: add virtual stateless decoder driver") Cc: stable@vger.kernel.org Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If streamzap_callback() receives an urb with any non-critical error
status, i.e. any error code other than -ECONNRESET, -ENOENT or -ESHUTDOWN,
it will try to process IR data, ignoring a possible transfer failure.
Make streamzap_callback() process IR data only when urb->status is 0.
Move processing logic to a separate function to make code cleaner and
more similar to the URB completion handlers in other RC drivers.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 19770693c354 ("V4L/DVB: staging/lirc: add lirc_streamzap driver") Cc: stable@vger.kernel.org Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru> Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The change to only use interrupts to handle supported status changes
introduced an issue when it is necessary to poll for the status. Rather
than checking for the status after sleeping the code now sleeps after
the check. This means a correct, but slower, status change on the part
of the TPM can be missed, resulting in a spurious timeout error,
especially on a more loaded system. Switch back to sleeping *then*
checking. An up front check of the status has been done at the start of
the function, so this does not cause an additional delay when the status
is already what we're looking for.
Cc: stable@vger.kernel.org # v6.4+ Fixes: e87fcf0dc2b4 ("tpm, tpm_tis: Only handle supported interrupts") Signed-off-by: Jonathan McDowell <noodles@meta.com> Reviewed-by: Michal Suchánek <msuchanek@suse.de> Reviewed-by: Lino Sanfilippo <l.sanfilippo@kunbus.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixed warning on PM resume as shown below caused due to uninitialized
struct nand_operation that checks chip select field :
WARN_ON(op->cs >= nanddev_ntargets(&chip->base)
The fix uses the higher level nand_reset(chip, chipnr); where chipnr = 0, when
doing PM resume operation in compliance with the controller support for single
die nand chip. Switching from nand_reset_op() to nand_reset() implies more
than just setting the cs field op->cs, it also reconfigures the data interface
(ie. the timings). Tested and confirmed the NAND chip is in sync timing wise
with host after the fix.
Fixes: 97d90da8a886 ("mtd: nand: provide several helpers to do common NAND operations") Cc: stable@vger.kernel.org Signed-off-by: Kamal Dasu <kamal.dasu@broadcom.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In 2020, there's been an unnoticed change which rightfully attempted to
report probe deferrals upon DMA absence by checking the return value of
dma_request_chan_by_mask(). By doing so, it also reported errors which
were simply ignored otherwise, likely on purpose.
This change actually turned a void return into an error code. Hence, not
only the -EPROBE_DEFER error codes but all error codes got reported to
the callers, now failing to probe in the absence of Rx DMA channel,
despite the fact that DMA seems to not be supported natively by many
implementations.
Looking at the history, this change probably led to: ad2775dc3fc5 ("spi: cadence-quadspi: Disable the DAC for Intel LGM SoC") f724c296f2f2 ("spi: cadence-quadspi: fix Direct Access Mode disable for SoCFPGA")
In my case, the AM62A LP SK core octo-SPI node from TI does not
advertise any DMA channel, hinting that there is likely no support for
it, but yet when the support for the am654 compatible was added, DMA
seemed to be used, so just discarding its use with the
CQSPI_DISABLE_DAC_MODE quirk for this compatible does not seem the
correct approach.
Let's get change the return condition back to:
- return a probe deferral error if we get one
- ignore the return value otherwise
The "error" log level was however likely too high for something that is
expected to fail, so let's lower it arbitrarily to the info level.
Fixes: 935da5e5100f ("mtd: spi-nor: cadence-quadspi: Handle probe deferral while requesting DMA channel") Cc: stable@vger.kernel.org Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://patch.msgid.link/20250305200933.2512925-2-miquel.raynal@bootlin.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If kvm_arch_vcpu_create() fails to share the vCPU page with the
hypervisor, we propagate the error back to the ioctl but leave the
vGIC vCPU data initialised. Note only does this leak the corresponding
memory when the vCPU is destroyed but it can also lead to use-after-free
if the redistributor device handling tries to walk into the vCPU.
Add the missing cleanup to kvm_arch_vcpu_create(), ensuring that the
vGIC vCPU structures are destroyed on error.
Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250314133409.9123-1-will@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The code for detecting CPUs that are vulnerable to Spectre BHB was
based on a hardcoded list of CPU IDs that were known to be affected.
Unfortunately, the list mostly only contained the IDs of standard ARM
cores. The IDs for many cores that are minor variants of the standard
ARM cores (like many Qualcomm Kyro CPUs) weren't listed. This led the
code to assume that those variants were not affected.
Flip the code on its head and instead assume that a core is vulnerable
if it doesn't have CSV2_3 but is unrecognized as being safe. This
involves creating a "Spectre BHB safe" list.
As of right now, the only CPU IDs added to the "Spectre BHB safe" list
are ARM Cortex A35, A53, A55, A510, and A520. This list was created by
looking for cores that weren't listed in ARM's list [1] as per review
feedback on v2 of this patch [2]. Additionally Brahma A53 is added as
per mailing list feedback [3].
NOTE: this patch will not actually _mitigate_ anyone, it will simply
cause them to report themselves as vulnerable. If any cores in the
system are reported as vulnerable but not mitigated then the whole
system will be reported as vulnerable though the system will attempt
to mitigate with the information it has about the known cores.
Fixes: 558c303c9734 ("arm64: Mitigate spectre style branch history side channels") Cc: stable@vger.kernel.org Reviewed-by: Julius Werner <jwerner@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20250107120555.v4.2.I2040fa004dafe196243f67ebcc647cbedbb516e6@changeid Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Qualcomm Kryo 400-series Gold cores have a derivative of an ARM Cortex
A76 in them. Since A76 needs Spectre mitigation via looping then the
Kyro 400-series Gold cores also need Spectre mitigation via looping.
Qualcomm has confirmed that the proper "k" value for Kryo 400-series
Gold cores is 24.
Fixes: 558c303c9734 ("arm64: Mitigate spectre style branch history side channels") Cc: stable@vger.kernel.org Cc: Scott Bauer <sbauer@quicinc.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Trilok Soni <quic_tsoni@quicinc.com> Link: https://lore.kernel.org/r/20250107120555.v4.1.Ie4ef54abe02e7eb0eee50f830575719bf23bda48@changeid Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The interface specifies the symnum field as an input and output; the
hypervisor sets it to the next sequential symbol's index. xensyms_next()
incrementing the position explicitly (and xensyms_next_sym()
decrementing it to "rewind") is only correct as long as the sequence of
symbol indexes is non-sparse. Use the hypervisor-supplied value instead
to update the position in xensyms_next(), and use the saved incoming
index in xensyms_next_sym().
The smsdvb_module_init() returns without checking the retval from
smscore_register_hotplug().
If the smscore_register_hotplug() failed, the module failed to install,
leaving the smsdvb_debugfs not unregistered.
Fixes: 3f6b87cff66b ("[media] siano: allow showing the complete statistics via debugfs") Cc: stable@vger.kernel.org Signed-off-by: Yuan Can <yuancan@huawei.com> Acked-by: Ricardo Ribalda <ribalda@chromium.org> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move the v4l2_info() call displaying the video device name after the
device is actually registered.
This fixes a bug where the driver was always displaying "/dev/video0"
since it was reading from the vfd before it was registered.
Fixes: cf7f34777a5b ("media: vim2m: Register video device after setting up internals") Cc: stable@vger.kernel.org Signed-off-by: Matthew Majewski <mattwmajewski@gmail.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
qsize represents size of shared queued between driver and video
firmware. Firmware can modify this value to an invalid large value. In
such situation, empty_space will be bigger than the space actually
available. Since new_wr_idx is not checked, so the following code will
result in an OOB write.
...
qsize = qhdr->q_size
sfr->buf_size is in shared memory and can be modified by malicious user.
OOB write is possible when the size is made higher than actual sfr data
buffer. Cap the size to allocated size for such cases.
The mask to select the test-pattern in register ADV748X_SDP_FRP is
incorrect, it's the lower 3 bits which controls the pattern. The
GENMASK() macro is used incorrectly and the generated mask is 0x0e
instead of 0x07.
The result is that not all test patterns are selectable, and that in
some cases the wrong test pattern is activated. Fix this by correcting
the GENMASK().
On Mediatek devices with a system companion processor (SCP) the mtk_scp
structure has to be removed explicitly to avoid a resource leak.
Free the structure in case the allocation of the firmware structure fails
during the firmware initialization.
The MIPID02 can use up to 2 data lanes which leads to having a maximum
item number of 3 for the lane-polarities since this also contains the
clock lane.
CC: stable@vger.kernel.org Fixes: c2741cbe7f8a ("dt-bindings: media: st,stmipid02: Convert the text bindings to YAML") Signed-off-by: Alain Volmat <alain.volmat@foss.st.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A file handle that userspace provides to open_by_handle_at() can
legitimately contain an outdated inode number that has since been reused
for another purpose - that's why the file handle also contains a generation
number.
But if the inode number has been reused for an ea_inode, check_igot_inode()
will notice, __ext4_iget() will go through ext4_error_inode(), and if the
inode was newly created, it will also be marked as bad by iget_failed().
This all happens before the point where the inode generation is checked.
ext4_error_inode() is supposed to only be used on filesystem corruption; it
should not be used when userspace just got unlucky with a stale file
handle. So when this happens, let __ext4_iget() just return an error.
Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to
read when these offsets extend into frags.
This has been observed with iwlwifi and reproduced with tun with
IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port,
applied to a RAW socket, will silently miss matching packets.
This is unexpected behavior. Socket filter programs should be
consistent regardless of environment. Silent misses are
particularly concerning as hard to detect.
Use skb_copy_bits for offsets outside linear, same as done for
non-SKF_(LL|NET) offsets.
Offset is always positive after subtracting the reference threshold
SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of
the two is an offset against skb->data, and may be negative, but it
cannot point before skb->head, as skb_(mac|network)_offset would too.
This appears to go back to when frag support was introduced to
sk_run_filter in linux-2.4.4, before the introduction of git.
The amount of code change and 8/16/32 bit duplication are unfortunate.
But any attempt I made to be smarter saved very few LoC while
complicating the code.
Considering that the driver doesn't enable the used clocks (and also
that clk_get_rate() returns 0 if CONFIG_HAVE_CLK is unset) better check
the return value of clk_get_rate() for being non-zero before dividing by
it.
There were several issues in the function rcar_pwm_set_counter():
- The u64 values period_ns and duty_ns were cast to int on function
call which might loose bits on 32 bit architectures.
Fix: Make parameters to rcar_pwm_set_counter() u64
- The algorithm divided by the result of a division which looses
precision.
Fix: Make use of mul_u64_u64_div_u64()
- The calculated values were just masked to fit the respective register
fields which again might loose bits.
Fix: Explicitly check for overlow
Implement the respective fixes.
A side effect of fixing the 2nd issue is that there is no division by 0
if clk_get_rate() returns 0.
due to the fact that the !CONFIG_HAVE_CLK version of clk_get_rate()
returns zero.
This is presumably just a theoretical problem: COMPILE_TEST overrides
the dependency on RALINK which would select COMMON_CLK. Regardless it's
a good idea to check for the error explicitly to avoid divide-by-zero.
Fixes the following warning:
drivers/pwm/pwm-mediatek.o: warning: objtool: .text: unexpected end of section
Some Infineon devices have a issue where the status register will get
stuck with a quick REQUEST_USE / COMMAND_READY sequence. This is not
simply a matter of requiring a longer timeout; the work around is to
retry the command submission. Add appropriate logic to do this in the
send path.
This is fixed in later firmware revisions, but those are not always
available, and cannot generally be easily updated from outside a
firmware environment.
Testing has been performed with a simple repeated loop of doing a
TPM2_CC_GET_CAPABILITY for TPM_CAP_PROP_MANUFACTURER using the Go code
at:
Handle missing parent directories for LOG_FILE path to prevent test
failures. If the parent directories don't exist, create them to ensure
the tests proceed successfully.
Add comments about entry data storing code to __store_entry_arg() and
traceprobe_get_entry_data_size(). These are a bit complicated because of
building the entry data storing code and scanning it.
Function dispc_ovl_setup is not intended to work with the value OMAP_DSS_WB
of the enum parameter plane.
The value of this parameter is initialized in dss_init_overlays and in the
current state of the code it cannot take this value so it's not a real
problem.
For the purposes of defensive coding it wouldn't be superfluous to check
the parameter value, because some functions down the call stack process
this value correctly and some not.
For example, in dispc_ovl_setup_global_alpha it may lead to buffer
overflow.
Add check for this value.
Found by Linux Verification Center (linuxtesting.org) with SVACE static
analysis tool.
The access to the PCI config space via pci_ops::read and pci_ops::write is
a low-level hardware access. The functions can be accessed with disabled
interrupts even on PREEMPT_RT. The pci_lock is a raw_spinlock_t for this
purpose.
A spinlock_t becomes a sleeping lock on PREEMPT_RT, so it cannot be
acquired with disabled interrupts. The vmd_dev::cfg_lock is accessed in
the same context as the pci_lock.
Make vmd_dev::cfg_lock a raw_spinlock_t type so it can be used with
interrupts disabled.
This was reported as:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
Call Trace:
rt_spin_lock+0x4e/0x130
vmd_pci_read+0x8d/0x100 [vmd]
pci_user_read_config_byte+0x6f/0xe0
pci_read_config+0xfe/0x290
sysfs_kf_bin_read+0x68/0x90
Signed-off-by: Ryo Takakura <ryotkkr98@gmail.com> Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Acked-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
[bigeasy: reword commit message] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Link: https://lore.kernel.org/r/20250218080830.ufw3IgyX@linutronix.de
[kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: add back report info from
https://lore.kernel.org/lkml/20241218115951.83062-1-ryotkkr98@gmail.com/] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add error handling to propagate amdgpu_cgs_create_device() failures
to the caller. When amdgpu_cgs_create_device() fails, release hwmgr
and return -ENOMEM to prevent null pointer dereference.
[v1]->[v2]: Change error code from -EINVAL to -ENOMEM. Free hwmgr.
In preparation for adding support for MT8195's HDMI reserved
DPI, add calls to clk_prepare_enable() / clk_disable_unprepare()
for the TVD clock: in this particular case, the aforementioned
clock is not (and cannot be) parented to neither pixel or engine
clocks hence it won't get enabled automatically by the clock
framework.
Please note that on all of the currently supported MediaTek
platforms, the TVD clock is always a parent of either pixel or
engine clocks, and this means that the common clock framework
is already enabling this clock before the children.
On such platforms, this commit will only increase the refcount
of the TVD clock without any functional change.
debugfs hang_hws is used by GPU reset test with HWS, for MES this crash
the kernel with NULL pointer access because dqm->packet_mgr is not setup
for MES path.
Skip GPU with MES for now, MES hang_hws debugfs interface will be
supported later.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If GPU in reset, destroy_queue return -EIO, pqm_destroy_queue should
delete the queue from process_queue_list and free the resource.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If HW scheduler hangs and mode1 reset is used to recover GPU, KFD signal
user space to abort the processes. After process abort exit, user queues
still use the GPU to access system memory before h/w is reset while KFD
cleanup worker free system memory and free VRAM.
There is use-after-free race bug that KFD allocate and reuse the freed
system memory, and user queue write to the same system memory to corrupt
the data structure and cause driver crash.
To fix this race, KFD cleanup worker terminate user queues, then flush
reset_domain wq to wait for any GPU ongoing reset complete, and then
free outstanding BOs.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When releasing a device, if the release action causes a group to be
released, a warning is emitted because it can't find the group. This
happens because devres_release_all() moves the entire list to a todo
list and also move the group markers. Considering r* normal resource
nodes and g1 a group resource node:
g1 -----------.
v v
r1 -> r2 -> g1[0] -> r3-> g[1] -> r4
After devres_release_all(), dev->devres_head becomes empty and the todo
list it iterates on becomes:
g1
v
r1 -> r2 -> r3-> r4 -> g1[0]
When a call to component_del() is made and takes down the aggregate
device, a warning like this happen:
Because the devres group corresponding to the hdcp component cannot be
found. Just ignore this corner case: if the dev->devres_head is empty
and the caller is trying to remove a group, it's likely in the process
of device cleanup so just ignore it instead of warning.
Having an DRM_MODE_CONNECTOR_Unknown connector type is considered bad, and
drm_panel_bridge_add_typed() and derivatives are deprecated for this.
drm_panel_init() won't prevent initializing a panel with a
DRM_MODE_CONNECTOR_Unknown connector type. Luckily there are no in-tree
users doing it, so take this as an opportinuty to document a valid
connector type must be passed.
Returning an error if this rule is violated is not possible because
drm_panel_init() is a void function. Add at least a warning to make any
violations noticeable, especially to non-upstream drivers.
The Intel model of the OneXPlayer Mini uses a 1200x1920 portrait LCD panel.
The DMI strings are the same as the OneXPlayer, which already has a DMI
quirk, but the panel is different.
Add a DMI match to correctly rotate this panel.
Signed-off-by: Andrew Wyatt <fewtarius@steamfork.org> Co-developed-by: John Edwards <uejji@uejji.net> Signed-off-by: John Edwards <uejji@uejji.net> Tested-by: João Pedro Kurtz <joexkurtz@gmail.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20250213222455.93533-6-uejji@uejji.net Signed-off-by: Sasha Levin <sashal@kernel.org>
The AYANEO Slide uses a 1080x1920 portrait LCD panel. This is the same
panel used on the AYANEO Air Plus, but the DMI data is too different to
match both with one entry.
Add a DMI match to correctly rotate the panel on the AYANEO Slide.
This also covers the Antec Core HS, which is a rebranded AYANEO Slide with
the exact same hardware and DMI strings.
Signed-off-by: Andrew Wyatt <fewtarius@steamfork.org> Signed-off-by: John Edwards <uejji@uejji.net> Tested-by: John Edwards <uejji@uejji.net> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20250213222455.93533-4-uejji@uejji.net Signed-off-by: Sasha Levin <sashal@kernel.org>