This driver, for the time being, assumes that the kernel page size is 4kB,
so it fails on loong64 and aarch64 with 16kB pages, and ppc64el with 64kB
pages.
Signed-off-by: Simon Richter <Simon.Richter@hogyros.de> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org # v6.8+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20250802024152.3021-1-Simon.Richter@hogyros.de
(cherry picked from commit 0521a868222ffe636bf202b6e9d29292c1e19c62) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[ Adjust context ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When add_addr_timeout was set to 0, this caused the ADD_ADDR to be
retransmitted immediately, which looks like a buggy behaviour. Instead,
interpret 0 as "no retransmissions needed".
The documentation is updated to explicitly state that setting the timeout
to 0 disables retransmission.
Fixes: 93f323b9cccc ("mptcp: add a new sysctl add_addr_timeout") Cc: stable@vger.kernel.org Suggested-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250815-net-mptcp-misc-fixes-6-17-rc2-v1-5-521fe9957892@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ Before commit e4c28e3d5c09 ("mptcp: pm: move generic PM helpers to
pm.c"), mptcp_pm_alloc_anno_list() was in pm_netlink.c. The same patch
can be applied there without conflicts. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sk_reset_timer() was called twice in mptcp_pm_alloc_anno_list.
Simplify the code by using a 'goto' statement to eliminate the
duplication.
Note that this is not a fix, but it will help backporting the following
patch. The same "Fixes" tag has been added for this reason.
Fixes: 93f323b9cccc ("mptcp: add a new sysctl add_addr_timeout") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250815-net-mptcp-misc-fixes-6-17-rc2-v1-4-521fe9957892@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ Before commit e4c28e3d5c09 ("mptcp: pm: move generic PM helpers to
pm.c"), mptcp_pm_alloc_anno_list() was in pm_netlink.c. The same patch
can be applied there without conflicts. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This function is supposed to return true for valid headers and false for
invalid. In a couple places it returns -EINVAL instead which means the
invalid headers are counted as true. Change it to return false.
Fixes: 9f9967fed9d0 ("soc: qcom: mdt_loader: Ensure we don't read past the ELF header") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/db57c01c-bdcc-4a0f-95db-b0f2784ea91f@sabinyo.mountain Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
scsi_check_passthrough() is always called, but it doesn't check for if a
command completed successfully. As a result, if a command was successful and
the caller used SCMD_FAILURE_RESULT_ANY to indicate what failures it wanted
to retry, we will end up retrying the command. This will cause delays during
device discovery because of the command being sent multiple times. For some
USB devices it can also cause the wrong device size to be used.
This patch adds a check for if the command was successful. If it is we
return immediately instead of trying to match a failure.
Fixes: 994724e6b3f0 ("scsi: core: Allow passthrough to request midlayer retries") Reported-by: Kris Karas <bugs-a21@moonlit-rail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219652 Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20250107010220.7215-1-michael.christie@oracle.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Igor Pylypiv <ipylypiv@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dce110_fill_display_configs is shared between DCE 6-11, and
finding the first CRTC and its line time is relevant to DCE 6 too.
Move the code to find it from DCE 11 specific code.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4ab09785f8d5d03df052827af073d5c508ff5f63) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On DCE 6, DP audio was not working. However, it worked when an
HDMI monitor was also plugged in.
Looking at dce_aud_wall_dto_setup it seems that the main
difference is that we use DTO1 when only DP is plugged in.
When programming DTO1, it uses audio_dto_source_clock_in_khz
which is set from get_dp_ref_freq_khz
The dce60_get_dp_ref_freq_khz implementation looks incorrect,
because DENTIST_DISPCLK_CNTL seems to be always zero on DCE 6,
so it isn't usable.
I compared dce60_get_dp_ref_freq_khz to the legacy display code,
specifically dce_v6_0_audio_set_dto, and it turns out that in
case of DCE 6, it needs to use the display clock. With that,
DP audio started working on Pitcairn, Oland and Cape Verde.
However, it still didn't work on Tahiti. Despite having the
same DCE version, Tahiti seems to have a different audio device.
After some trial and error I realized that it works with the
default display clock as reported by the VBIOS, not the current
display clock.
[WHY & HOW]
IPS & self-fresh feature can cause vblank counter resets between
vblank disable and enable.
It may cause system stuck due to wait the vblank counter.
Call the drm_crtc_vblank_restore() during vblank enable to estimate
missed vblanks by using timestamps and update the vblank counter in
DRM.
It can make the vblank counter increase smoothly and resolve this issue.
Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 34d66bc7ff10e146a4cec76cf286979740a10954) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For later VBIOS versions, the fractional feedback divider is
calculated as the remainder of dividing the feedback divider by
a factor, which is set to 1000000. For reference, see:
- calculate_fb_and_fractional_fb_divider
- calc_pll_max_vco_construct
However, in case of old VBIOS versions that have
set_pixel_clock_v3, they only have 1 byte available for the
fractional feedback divider, and it's expected to be set to the
remainder from dividing the feedback divider by 10.
For reference see the legacy display code:
- amdgpu_pll_compute
- amdgpu_atombios_crtc_program_pll
This commit fixes set_pixel_clock_v3 by dividing the fractional
feedback divider passed to the function by 100000.
Fixes: 4562236b3bc0 ("drm/amd/dc: Add dc display driver (v2)") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 027e7acc7e17802ebf28e1edb88a404836ad50d6) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The extra 15% clock was added as a workaround for a Polaris issue
which uses DCE 11, and should not have been used on DCE 6 which
is already hardcoded to the highest possible display clock.
Unfortunately, the extra 15% was mistakenly copied and kept
even on code paths which don't affect Polaris.
This commit fixes that and also adds a check to make sure
not to exceed the maximum DCE 6 display clock.
Fixes: 8cd61c313d8b ("drm/amd/display: Raise dispclk value for Polaris") Fixes: dc88b4a684d2 ("drm/amd/display: make clk mgr soc specific") Fixes: 3ecb3b794e2c ("drm/amd/display: dc/clk_mgr: add support for SI parts (v2)") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 427980c1cbd22bb256b9385f5ce73c0937562408) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[WHY]
Although unlikely drm_atomic_get_new_connector_state() or
drm_atomic_get_old_connector_state() can return NULL.
[HOW]
Check returns before dereference.
Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1e5e8d672fec9f2ab352be121be971877bff2af9) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A VMM may send a non-fatal signal to its threads, including vCPU tasks,
at any time, and thus may signal vCPU tasks during KVM_RUN. If a vCPU
task receives the signal while its trying to spawn the huge page recovery
vhost task, then KVM_RUN will fail due to copy_process() returning
-ERESTARTNOINTR.
Rework call_once() to mark the call complete if and only if the called
function succeeds, and plumb the function's true error code back to the
call_once() invoker. This provides userspace with the correct, non-fatal
error code so that the VMM doesn't terminate the VM on -ENOMEM, and allows
subsequent KVM_RUN a succeed by virtue of retrying creation of the NX huge
page task.
Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
[implemented the kvm user side] Signed-off-by: Keith Busch <kbusch@kernel.org>
Message-ID: <20250227230631.303431-3-kbusch@meta.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Alistair Delva <adelva@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tracing code called by the SCLP interrupt handler contains early exits
if the SCCB address associated with an interrupt is NULL. This check is
performed after physical to virtual address translation.
If the kernel identity mapping does not start at address zero, the
resulting virtual address is never zero, so that the NULL checks won't
work. Subsequently this may result in incorrect accesses to the first
page of the identity mapping.
Fix this by introducing a function that handles the NULL case before
address translation.
Fixes: ada1da31ce34 ("s390/sclp: sort out physical vs virtual pointers usage") Cc: stable@vger.kernel.org Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When skb packets are sent out, these skb packets still depends on
the rxe resources, for example, QP, sk, when these packets are
destroyed.
If these rxe resources are released when the skb packets are destroyed,
the call traces will appear.
To avoid skb packets hang too long time in some network devices,
a timestamp is added when these skb packets are created. If these
skb packets hang too long time in network devices, these network
devices can free these skb packets to release rxe resources.
Reported-by: syzbot+8425ccfb599521edb153@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=8425ccfb599521edb153 Tested-by: syzbot+8425ccfb599521edb153@syzkaller.appspotmail.com Fixes: 1a633bdc8fd9 ("RDMA/rxe: Let destroy qp succeed with stuck packet") Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://patch.msgid.link/20250726013104.463570-1-yanjun.zhu@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The HP EliteBook x360 830 G6 and HP EliteBook 830 G6 have
Realtek HDA codec ALC215. It needs the ALC285_FIXUP_HP_GPIO_LED
quirk to enable the mute LED.
When memory_failure() is called for a already hwpoisoned pfn,
kill_accessing_process() will be called to kill current task. However, if
the vma of the accessing vaddr is VM_PFNMAP, walk_page_range() will skip
the vma in walk_page_test() and return 0.
Before commit aaf99ac2ceb7 ("mm/hwpoison: do not send SIGBUS to processes
with recovered clean pages"), kill_accessing_process() will return EFAULT.
For x86, the current task will be killed in kill_me_maybe().
However, after this commit, kill_accessing_process() simplies return 0,
that means UCE is handled properly, but it doesn't actually. In such
case, the user task will trigger UCE infinitely.
To fix it, add .test_walk callback for hwpoison_walk_ops to scan all vmas.
Link: https://lkml.kernel.org/r/20250815073209.1984582-1-tujinjiang@huawei.com Fixes: aaf99ac2ceb7 ("mm/hwpoison: do not send SIGBUS to processes with recovered clean pages") Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Shuai Xue <xueshuai@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The mm/debug_vm_pagetable test allocates manually page table entries for
the tests it runs, using also its manually allocated mm_struct. That in
itself is ok, but when it exits, at destroy_args() it fails to clear those
entries with the *_clear functions.
The problem is that leaves stale entries. If another process allocates an
mm_struct with a pgd at the same address, it may end up running into the
stale entry. This is happening in practice on a debug kernel with
CONFIG_DEBUG_VM_PGTABLE=y, for example this is the output with some extra
debugging I added (it prints a warning trace if pgtables_bytes goes
negative, in addition to the warning at check_mm() function):
Here the modprobe process ended up with an allocated mm_struct from the
mm_struct slab that was used before by the debug_vm_pgtable test. That is
not a problem, since the mm_struct is initialized again etc., however, if
it ends up using the same pgd table, it bumps into the old stale entry
when clearing/freeing the page table entries, so it tries to free an entry
already gone (that one which was allocated by the debug_vm_pgtable test),
which also explains the negative pgtables_bytes since it's accounting for
not allocated entries in the current process.
As far as I looked pgd_{alloc,free} etc. does not clear entries, and
clearing of the entries is explicitly done in the free_pgtables->
free_pgd_range->free_p4d_range->free_pud_range->free_pmd_range->
free_pte_range path. However, the debug_vm_pgtable test does not call
free_pgtables, since it allocates mm_struct and entries manually for its
test and eg. not goes through page faults. So it also should clear
manually the entries before exit at destroy_args().
This problem was noticed on a reboot X number of times test being done on
a powerpc host, with a debug kernel with CONFIG_DEBUG_VM_PGTABLE enabled.
Depends on the system, but on a 100 times reboot loop the problem could
manifest once or twice, if a process ends up getting the right mm->pgd
entry with the stale entries used by mm/debug_vm_pagetable. After using
this patch, I couldn't reproduce/experience the problems anymore. I was
able to reproduce the problem as well on latest upstream kernel (6.16).
I also modified destroy_args() to use mmput() instead of mmdrop(), there
is no reason to hold mm_users reference and not release the mm_struct
entirely, and in the output above with my debugging prints I already had
patched it to use mmput, it did not fix the problem, but helped in the
debugging as well.
After nfs_lock_and_join_requests() tests for whether the request is
still attached to the mapping, nothing prevents a call to
nfs_inode_remove_request() from succeeding until we actually lock the
page group.
The reason is that whoever called nfs_inode_remove_request() doesn't
necessarily have a lock on the page group head.
So in order to avoid races, let's take the page group lock earlier in
nfs_lock_and_join_requests(), and hold it across the removal of the
request in nfs_inode_remove_request().
Reported-by: Jeff Layton <jlayton@kernel.org> Tested-by: Joe Quanaim <jdq@meta.com> Tested-by: Andrew Steffen <aksteffen@meta.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Fixes: bd37d6fce184 ("NFSv4: Convert nfs_lock_and_join_requests() to use nfs_page_find_head_request()") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Due to a flaw in the hardware design, the GL9763e replay timer frequently
times out when ASPM is enabled. As a result, the warning messages will
often appear in the system log when the system accesses the GL9763e
PCI config. Therefore, the replay timer timeout must be masked.
The existing memstick core patch: commit 62c59a8786e6 ("memstick: Skip
allocating card when removing host") sets host->removing in
memstick_remove_host(),but still exists a critical time window where
memstick_check can run after host->eject is set but before removing is set.
In the rtsx_usb_ms driver, the problematic sequence is:
1. rtsx_usb_ms_drv_remove sets host->eject = true
2. cancel_work_sync(&host->handle_req) runs
3. memstick_check work may be executed here <-- danger window
4. memstick_remove_host sets removing = 1
During this window (step 3), memstick_check calls memstick_alloc_card,
which may indefinitely waiting for mrq_complete completion that will
never occur because rtsx_usb_ms_request sees eject=true and skips
scheduling work, memstick_set_rw_addr waits forever for completion.
This causes a deadlock when memstick_remove_host tries to flush_workqueue,
waiting for memstick_check to complete, while memstick_check is blocked
waiting for mrq_complete completion.
Fix this by setting removing=true at the start of rtsx_usb_ms_drv_remove,
before any work cancellation. This ensures memstick_check will see the
removing flag immediately and exit early, avoiding the deadlock.
In preparation to fix replay timer timeout, add
sdhci_gli_mask_replay_timer_timeout() function
to simplify some of the code, allowing it to be re-used.
The arm_smmu_attach_commit() updates master->ats_enabled before calling
arm_smmu_remove_master_domain() that is supposed to clean up everything
in the old domain, including the old domain's nr_ats_masters. So, it is
supposed to use the old ats_enabled state of the device, not an updated
state.
This isn't a problem if switching between two domains where:
- old ats_enabled = false; new ats_enabled = false
- old ats_enabled = true; new ats_enabled = true
but can fail cases where:
- old ats_enabled = false; new ats_enabled = true
(old domain should keep the counter but incorrectly decreased it)
- old ats_enabled = true; new ats_enabled = false
(old domain needed to decrease the counter but incorrectly missed it)
Update master->ats_enabled after arm_smmu_remove_master_domain() to fix
this.
Fixes: 7497f4211f4f ("iommu/arm-smmu-v3: Make changing domains be hitless for ATS") Cc: stable@vger.kernel.org Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Link: https://lore.kernel.org/r/20250801030127.2006979-1-nicolinc@nvidia.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This should have been fine because we're only in the 2nd slot and there's
another one after this, but iterate_folioq should not try to map a folio
that skips the whole size, and more importantly part here does not end up
zero (because 'PAGE_SIZE - skip % PAGE_SIZE' ends up PAGE_SIZE and not
zero..), so skip forward to the "advance to next folio" code
Link: https://lkml.kernel.org/r/20250813-iot_iter_folio-v3-0-a0ffad2b665a@codewreck.org Link: https://lkml.kernel.org/r/20250813-iot_iter_folio-v3-1-a0ffad2b665a@codewreck.org Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Fixes: db0aa2e9566f ("mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios") Reported-by: Maximilian Bosch <maximilian@mbosch.me> Reported-by: Ryan Lahfa <ryan@lahfa.xyz> Reported-by: Christian Theune <ct@flyingcircus.io> Reported-by: Arnout Engelen <arnout@bzzt.net> Link: https://lkml.kernel.org/r/D4LHHUNLG79Y.12PI0X6BEHRHW@mbosch.me/ Acked-by: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> [6.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The io_futex_data is allocated upfront and assigned to the io_kiocb
async_data field, but the request isn't marked with REQ_F_ASYNC_DATA
at that point. Those two should always go together, as the flag tells
io_uring whether the field is valid or not.
Additionally, on failure cleanup, the futex handler frees the data but
does not clear ->async_data. Clear the data and the flag in the error
path as well.
Thanks to Trend Micro Zero Day Initiative and particularly ReDress for
reporting this.
Cc: stable@vger.kernel.org Fixes: 194bb58c6090 ("io_uring: add support for futex wake and wait") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A BPF scheduler may want to use the built-in idle cpumasks in ops.init()
before the scheduler is fully initialized, either directly or through a
BPF timer for example.
However, this would result in an error, since the idle state has not
been properly initialized yet.
This can be easily verified by modifying scx_simple to call
scx_bpf_get_idle_cpumask() in ops.init():
Fixes: d73249f88743d ("sched_ext: idle: Make idle static keys private") Signed-off-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
[ Backport to 6.12:
- Original commit doesn't apply cleanly to 6.12 since d73249f88743d is
not present.
- This backport applies the same logical fix to prevent BPF scheduler
failures while accessing idle cpumasks from ops.init(). ] Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ata_gen_ata_sense() is always called for a failed qc missing sense data
so that a sense key, code and code qualifier can be generated using
ata_to_sense_error() from the qc status and error fields of its result
task file. However, if the qc does not have its result task file filled,
ata_gen_ata_sense() returns early without setting a sense key.
Improve this by defaulting to returning ABORTED COMMAND without any
additional sense code, since we do not know the reason for the failure.
The same fix is also applied in ata_gen_passthru_sense() with the
additional check that the qc failed (qc->err_mask is set).
Fixes: 816be86c7993 ("ata: libata-scsi: Check ATA_QCFLAG_RTF_FILLED before using result_tf") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ring provided buffers are potentially only valid within the single
execution context in which they were acquired. io_uring deals with this
and invalidates them on retry. But on the networking side, if
MSG_WAITALL is set, or if the socket is of the streaming type and too
little was processed, then it will hang on to the buffer rather than
recycle or commit it. This is problematic for two reasons:
1) If someone unregisters the provided buffer ring before a later retry,
then the req->buf_list will no longer be valid.
2) If multiple sockers are using the same buffer group, then multiple
receives can consume the same memory. This can cause data corruption
in the application, as either receive could land in the same
userspace buffer.
Fix this by disallowing partial retries from pinning a provided buffer
across multiple executions, if ring provided buffers are used.
Cc: stable@vger.kernel.org Reported-by: pt x <superman.xpt@gmail.com> Fixes: c56e022c0a27 ("io_uring: add support for user mapped provided buffer ring") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If all the subrequests in an unbuffered write stream fail, the subrequest
collector doesn't update the stream->transferred value and it retains its
initial LONG_MAX value. Unfortunately, if all active streams fail, then we
take the smallest value of { LONG_MAX, LONG_MAX, ... } as the value to set
in wreq->transferred - which is then returned from ->write_iter().
LONG_MAX was chosen as the initial value so that all the streams can be
quickly assessed by taking the smallest value of all stream->transferred -
but this only works if we've set any of them.
Fix this by adding a flag to indicate whether the value in
stream->transferred is valid and checking that when we integrate the
values. stream->transferred can then be initialised to zero.
This was found by running the generic/750 xfstest against cifs with
cache=none. It splices data to the target file. Once (if) it has used up
all the available scratch space, the writes start failing with ENOSPC.
This causes ->write_iter() to fail. However, it was returning
wreq->transferred, i.e. LONG_MAX, rather than an error (because it thought
the amount transferred was non-zero) and iter_file_splice_write() would
then try to clean up that amount of pipe bufferage - leading to an oops
when it overran. The kernel log showed:
Currently holes are sent as writes full of zeroes, which results in
unnecessarily using disk space at the receiving end and increasing the
stream size.
In some cases we avoid sending writes of zeroes, like during a full
send operation where we just skip writes for holes.
But for some cases we fill previous holes with writes of zeroes too, like
in this scenario:
1) We have a file with a hole in the range [2M, 3M), we snapshot the
subvolume and do a full send. The range [2M, 3M) stays as a hole at
the receiver since we skip sending write commands full of zeroes;
2) We punch a hole for the range [3M, 4M) in our file, so that now it
has a 2M hole in the range [2M, 4M), and snapshot the subvolume.
Now if we do an incremental send, we will send write commands full
of zeroes for the range [2M, 4M), removing the hole for [2M, 3M) at
the receiver.
We could improve cases such as this last one by doing additional
comparisons of file extent items (or their absence) between the parent
and send snapshots, but that's a lot of code to add plus additional CPU
and IO costs.
Since the send stream v2 already has a fallocate command and btrfs-progs
implements a callback to execute fallocate since the send stream v2
support was added to it, update the kernel to use fallocate for punching
holes for V2+ streams.
Test coverage is provided by btrfs/284 which is a version of btrfs/007
that exercises send stream v2 instead of v1, using fsstress with random
operations and fssum to verify file contents.
Link: https://github.com/kdave/btrfs-progs/issues/1001 CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Whenever we issue a command we allocate a path and then compute it. For
the current inode this is not necessary since we have one preallocated
and computed in the send context structure, so we can use it instead
and avoid allocating and freeing a path.
For example if we have 100 extents to send (100 write commands) for a
file, we are allocating and freeing paths 100 times.
So improve on this by avoiding path allocation and freeing whenever a
command is for the current inode by using the current inode's path
stored in the send context structure.
A test was run before applying this patch and the previous one in the
series:
"btrfs: send: keep the current inode's path cached"
The test script is the following:
$ cat test.sh
#!/bin/bash
DEV=/dev/nullb0
MNT=/mnt/nullb0
mkfs.btrfs -f $DEV > /dev/null
mount $DEV $MNT
DIR="$MNT/one/two/three/four"
FILE="$DIR/foobar"
mkdir -p $DIR
# Create some empty files to get a deeper btree and therefore make
# path computations slower.
for ((i = 1; i <= 30000; i++)); do
echo -n > "$DIR/filler_$i"
done
for ((i = 0; i < 10000; i += 2)); do
offset=$(( i * 4096 ))
xfs_io -f -c "pwrite -S 0xab $offset 4K" $FILE > /dev/null
done
Whenever we need to send a command for the current inode, like sending
writes, xattr updates, truncates, utimes, etc, we compute the inode's
path each time, which implies doing some memory allocations and traversing
the inode hierarchy to extract the name of the inode and each ancestor
directory, and that implies doing lookups in the subvolume tree amongst
other operations.
Most of the time, by far, the current inode's path doesn't change while
we are processing it (like if we need to issue 100 write commands, the
path remains the same and it's pointless to compute it 100 times).
To avoid this keep the current inode's path cached in the send context
and invalidate it or update it whenever it's needed (after unlinks or
renames).
A performance test, and its results, is mentioned in the next patch in
the series (subject: "btrfs: send: avoid path allocation for the current
inode when issuing commands").
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 005b0a0c24e1 ("btrfs: send: use fallocate for hole punching with send stream v2") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Extract the logic to rename the current inode at process_recorded_refs()
into a helper function and use it, therefore removing duplicated logic
and making it easier for an upcoming patch by avoiding yet more duplicated
logic.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 005b0a0c24e1 ("btrfs: send: use fallocate for hole punching with send stream v2") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We have several local variables at process_recorded_refs() that are used
as booleans, with some of them having a 'bool' type while two of them
having an 'int' type. Change this to make them all use the 'bool' type
which is more clear and to make everything more consistent.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 005b0a0c24e1 ("btrfs: send: use fallocate for hole punching with send stream v2") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We always send xattrs for the current inode only and both callers of
send_set_xattr() pass a path for the current inode. So move the path
allocation and computation to send_set_xattr(), reducing duplicated
code. This also facilitates an upcoming patch.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 005b0a0c24e1 ("btrfs: send: use fallocate for hole punching with send stream v2") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix up xfs_inumbers to now pass in the XFS_IBULK* flags into the flags
argument to xfs_inobt_walk, which expects the XFS_IWALK* flags.
Currently passing the wrong flags works for non-debug builds because
the only XFS_IWALK* flag has the same encoding as the corresponding
XFS_IBULK* flag, but in debug builds it can trigger an assert that no
incorrect flag is passed. Instead just extra the relevant flag.
Fixes: 5b35d922c52798 ("xfs: Decouple XFS_IBULK flags from XFS_IWALK flags") Cc: <stable@vger.kernel.org> # v5.19 Reported-by: cen zhang <zzzccc427@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
[ Adjust context ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Similar to mark_bg_unused() and mark_bg_to_reclaim(), we have a few
places that use bg_list with refcounting, mostly for retrying failures
to reclaim/delete unused.
These have custom logic for handling locking and refcounting the bg_list
properly, but they actually all want to do the same thing, so pull that
logic out into a helper. Unfortunately, mark_bg_unused() does still need
the NEW flag to avoid prematurely marking stuff unused (even if refcount
is fine, we don't want to mess with bg creation), so it cannot use the
new helper.
Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 62be7afcc13b ("btrfs: zoned: requeue to unused block group list if zone finish failed") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
All other users of the bg_list list_head increment the refcount when
adding to a list and decrement it when deleting from the list. Just for
the sake of uniformity and to try to avoid refcounting bugs, do it for
this list as well.
This does not fix any known ref-counting bug, as the reference belongs
to a single task (trans_handle is not shared and this represents
trans_handle->new_bgs linkage) and will not lose its original refcount
while that thread is running. And BLOCK_GROUP_FLAG_NEW protects against
ref-counting errors "moving" the block group to the unused list without
taking a ref.
With that said, I still believe it is simpler to just hold the extra ref
count for this list user as well.
Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 62be7afcc13b ("btrfs: zoned: requeue to unused block group list if zone finish failed") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we find an unexpected generation for the extent buffer we are cloning
at btrfs_copy_root(), we just WARN_ON() and don't error out and abort the
transaction, meaning we allow to persist metadata with an unexpected
generation. Instead of warning only, abort the transaction and return
-EUCLEAN.
CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Daniel Vacek <neelx@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Only one of the callers of __add_block_group_free_space() aborts the
transaction if the call fails, while the others don't do it and it's
either never done up the call chain or much higher in the call chain.
So make sure we abort the transaction at __add_block_group_free_space()
if it fails, which brings a couple benefits:
1) If some call chain never aborts the transaction, we avoid having some
metadata inconsistency because BLOCK_GROUP_FLAG_NEEDS_FREE_SPACE is
cleared when we enter __add_block_group_free_space() and therefore
__add_block_group_free_space() is never called again to add the block
group items to the free space tree, since the function is only called
when that flag is set in a block group;
2) If the call chain already aborts the transaction, then we get a better
trace that points to the exact step from __add_block_group_free_space()
which failed, which is better for analysis.
So abort the transaction at __add_block_group_free_space() if any of its
steps fails.
CC: stable@vger.kernel.org # 6.6+ Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Transaction aborts should be done next to the place the error happens,
which was not done in add_block_group_free_space().
Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 1f06c942aa70 ("btrfs: always abort transaction on failure to add block group to free space tree") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There's a race between a task disabling quotas and another running the
rescan ioctl that can result in a use-after-free of qgroup records from
the fs_info->qgroup_tree rbtree.
This happens as follows:
1) Task A enters btrfs_ioctl_quota_rescan() -> btrfs_qgroup_rescan();
2) Task B enters btrfs_quota_disable() and calls
btrfs_qgroup_wait_for_completion(), which does nothing because at that
point fs_info->qgroup_rescan_running is false (it wasn't set yet by
task A);
3) Task B calls btrfs_free_qgroup_config() which starts freeing qgroups
from fs_info->qgroup_tree without taking the lock fs_info->qgroup_lock;
4) Task A enters qgroup_rescan_zero_tracking() which starts iterating
the fs_info->qgroup_tree tree while holding fs_info->qgroup_lock,
but task B is freeing qgroup records from that tree without holding
the lock, resulting in a use-after-free.
Fix this by taking fs_info->qgroup_lock at btrfs_free_qgroup_config().
Also at btrfs_qgroup_rescan() don't start the rescan worker if quotas
were already disabled.
This patch fixes a race condition communication error, which ends up in
PD hard resets when losing the race. Some systems, like the Radxa ROCK
5B are powered through USB-C without any backup power source and use a
FUSB302 chip to do the PD negotiation. This means it is quite important
to avoid hard resets, since that effectively kills the system's
power-supply.
I've found the following race condition while debugging unplanned power
loss during booting the board every now and then:
1. lots of TCPM/FUSB302/PD initialization stuff
2. TCPM ends up in SNK_WAIT_CAPABILITIES (tcpm_set_pd_rx is enabled here)
3. the remote PD source does not send anything, so TCPM does a SOFT RESET
4. TCPM ends up in SNK_WAIT_CAPABILITIES for the second time
(tcpm_set_pd_rx is enabled again, even though it is still on)
At this point I've seen broken CRC good messages being send by the
FUSB302 with a logic analyzer sniffing the CC lines. Also it looks like
messages are being lost and things generally going haywire with one of
the two sides doing a hard reset once a broken CRC good message was send
to the bus.
I think the system is running into a race condition, that the FIFOs are
being cleared and/or the automatic good CRC message generation flag is
being updated while a message is already arriving.
Let's avoid this by caching the PD RX enabled state, as we have already
processed anything in the FIFOs and are in a good state. As a side
effect that this also optimizes I2C bus usage :)
As far as I can tell the problem theoretically also exists when TCPM
enters SNK_WAIT_CAPABILITIES the first time, but I believe this is less
critical for the following reason:
On devices like the ROCK 5B, which are powered through a TCPM backed
USB-C port, the bootloader must have done some prior PD communication
(initial communication must happen within 5 seconds after plugging the
USB-C plug). This means the first time the kernel TCPM state machine
reaches SNK_WAIT_CAPABILITIES, the remote side is not sending messages
actively. On other devices a hard reset simply adds some extra delay and
things should be good afterwards.
Replace ternary (condition ? "enable" : "disable") syntax with helpers
from string_choices.h because:
1. Simple function call with one argument is easier to read. Ternary
operator has three arguments and with wrapping might lead to quite
long code.
2. Is slightly shorter thus also easier to read.
3. It brings uniformity in the text - same string.
4. Allows deduping by the linker, which results in a smaller binary
file.
In order to support future versions of the SVSM_CORE_PVALIDATE call, all
reserved fields within a PVALIDATE entry must be set to zero as an SVSM should
be ensuring all reserved fields are zero in order to support future usage of
reserved areas based on the protocol version.
Fixes: fcd042e86422 ("x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Joerg Roedel <joerg.roedel@amd.com> Cc: <stable@kernel.org> Link: https://lore.kernel.org/7cde412f8b057ea13a646fb166b1ca023f6a5031.1755098819.git.thomas.lendacky@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
damon_migrate_pages() tries migration even if the target node is invalid.
If users mistakenly make such invalid requests via
DAMOS_MIGRATE_{HOT,COLD} action, the below kernel BUG can happen.
Add a target node validity check in damon_migrate_pages(). The validity
check is stolen from that of do_pages_move(), which is being used for the
move_pages() system call.
Link: https://lkml.kernel.org/r/20250720185822.1451-1-sj@kernel.org Fixes: b51820ebea65 ("mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion") [6.11.x] Signed-off-by: SeongJae Park <sj@kernel.org> Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Honggyu Kim <honggyu.kim@sk.com> Cc: Hyeongtak Ji <hyeongtak.ji@sk.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This modification is linked to the parent commit where the received
ADD_ADDR limit was accidentally reset when the endpoints were flushed.
To validate that, the test is now flushing endpoints after having set
new limits, and before checking them.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
A flush of the MPTCP endpoints should not affect the MPTCP limits. In
other words, 'ip mptcp endpoint flush' should not change 'ip mptcp
limits'.
But it was the case: the MPTCP_PM_ATTR_RCV_ADD_ADDRS (add_addr_accepted)
limit was reset by accident. Removing the reset of this counter during a
flush fixes this issue.
When skb_ext_add(skb, SKB_EXT_MPTCP) fails in mptcp_incoming_options(),
we used to return true, letting the segment proceed through the TCP
receive path without a DSS mapping. Such segments can leave inconsistent
mapping state and trigger a mid-stream fallback to TCP, which in testing
collapsed (by artificially forcing failures in skb_ext_add) throughput
to zero.
Return false instead so the TCP input path drops the skb (see
tcp_data_queue() and step-7 processing). This is the safer choice
under memory pressure: it preserves MPTCP correctness and provides
backpressure to the sender.
Control packets remain unaffected: ACK updates and DATA_FIN handling
happen before attempting the extension allocation, and tcp_reset()
continues to ignore the return value.
With this change, MPTCP continues to work at high throughput if we
artificially inject failures into skb_ext_add.
Fixes: 6787b7e350d3 ("mptcp: avoid processing packet if a subflow reset") Cc: stable@vger.kernel.org Signed-off-by: Christoph Paasch <cpaasch@openai.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250815-net-mptcp-misc-fixes-6-17-rc2-v1-1-521fe9957892@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When enabling a sched_ext scheduler, we may trigger invalid task state
transitions, resulting in warnings like the following (which can be
easily reproduced by running the hotplug selftest in a loop):
This happens because we skip initialization for tasks that are already
dead (with their usage counter set to zero), but we don't exclude them
during the scheduling class transition phase.
Fix this by also skipping dead tasks during class swiching, preventing
invalid task state transitions.
Fixes: a8532fac7b5d2 ("sched_ext: TASK_DEAD tasks must be switched into SCX on ops_enable") Cc: stable@vger.kernel.org # v6.12+ Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The panic happens in br_dev_queue_push_xmit() once it receives a
corrupted skb with ETH header already pushed in linear data. When it
attempts the skb_push() call, there's not enough headroom and
skb_push() panics.
The corrupted skb is put on the queue by HSR layer, which makes a
sequence of unintended transformations when it receives a specific
corrupted HSR frame (with incomplete TAG).
Fix it by dropping and consuming frames that are not long enough to
contain both ethernet and hsr headers.
Alternative fix would be to check for enough headroom before skb_push()
in br_dev_queue_push_xmit().
In the reproducer, this is injected via AF_PACKET, but I don't easily
see why it couldn't be sent over the wire from adjacent network.
Further Details:
In the reproducer, the following network interface chain is set up:
To trigger the events leading up to crash, reproducer sends a corrupted
HSR frame with incomplete TAG, via AF_PACKET socket on 'veth0_to_hsr'.
The first HSR-layer function to process this frame is
hsr_handle_frame(). It and then checks if the
protocol is ETH_P_PRP or ETH_P_HSR. If it is, it calls
skb_set_network_header(skb, ETH_HLEN + HSR_HLEN), without checking that
the skb is long enough. For the crashing frame it is not, and hence the
skb->network_header and skb->mac_len fields are set incorrectly,
pointing after the end of the linear buffer.
I will call this a BUG#1 and it is what is addressed by this patch. In
the crashing scenario before the fix, the skb continues to go down the
hsr path as follows.
hsr_handle_frame() then calls this sequence
hsr_forward_skb()
fill_frame_info()
hsr->proto_ops->fill_frame_info()
hsr_fill_frame_info()
hsr_fill_frame_info() contains a check that intends to check whether the
skb actually contains the HSR header. But the check relies on the
skb->mac_len field which was erroneously setup due to BUG#1, so the
check passes and the execution continues back in the hsr_forward_skb():
In create_stripped_skb_hsr(), a copy of the skb is created and is
further corrupted by operation that attempts to strip the HSR tag in a
call to __pskb_copy().
The skb enters create_stripped_skb_hsr() with ethernet header pushed in
linear buffer. The skb_pull(skb_in, HSR_HLEN) thus pulls 6 bytes of
ethernet header into the headroom, creating skb_in with a headroom of
size 8. The subsequent __pskb_copy() then creates an skb with headroom
of just 2 and skb->len of just 12, this is how it looks after the copy:
gdb) p skb->len
$10 = 12
(gdb) p skb->data
$11 = (unsigned char *) 0xffff888041e45382 "\252\252\252\252\252!\210\373",
(gdb) p skb->head
$12 = (unsigned char *) 0xffff888041e45380 ""
It seems create_stripped_skb_hsr() assumes that ETH header is pulled
in the headroom when it's entered, because it just pulls HSR header on
top. But that is not the case in our code-path and we end up with the
corrupted skb instead. I will call this BUG#2
*I got confused here because it seems that under no conditions can
create_stripped_skb_hsr() work well, the assumption it makes is not true
during the processing of hsr frames - since the skb_push() in
hsr_handle_frame to skb_pull in hsr_deliver_master(). I wonder whether I
missed something here.*
Next, the execution arrives in hsr_deliver_master(). It calls
skb_pull(ETH_HLEN), which just returns NULL - the SKB does not have
enough space for the pull (as it only has 12 bytes in total at this
point).
*The skb_pull() here further suggests that ethernet header is meant
to be pushed through the whole hsr processing and
create_stripped_skb_hsr() should pull it before doing the HSR header
pull.*
hsr_deliver_master() then puts the corrupted skb on the queue, it is
then picked up from there by bridge frame handling layer and finally
lands in br_dev_queue_push_xmit where it panics.
Cc: stable@kernel.org Fixes: 48b491a5cc74 ("net: hsr: fix mac_len checks") Reported-by: syzbot+a81f2759d022496b40ab@syzkaller.appspotmail.com Signed-off-by: Jakub Acs <acsjakub@amazon.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250819082842.94378-1-acsjakub@amazon.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dc_clk_mgr_create accidentally overwrites the dce60_clk_mgr
with the dce_clk_mgr, causing incorrect behaviour on DCE6.
Fix it by removing the extra dce_clk_mgr_construct.
Fixes: 62eab49faae7 ("drm/amd/display: hide VGH asic specific structs") Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit bbddcbe36a686af03e91341b9bbfcca94bd45fb6) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Why]
A null pointer dereference vulnerability exists in the AMD display driver's
(DC module) cleanup function dc_destruct().
When display control context (dc->ctx) construction fails
(due to memory allocation failure), this pointer remains NULL.
During subsequent error handling when dc_destruct() is called,
there's no NULL check before dereferencing the perf_trace member
(dc->ctx->perf_trace), causing a kernel null pointer dereference crash.
[How]
Check if dc->ctx is non-NULL before dereferencing.
Link: https://lore.kernel.org/r/tencent_54FF4252EDFB6533090A491A25EEF3EDBF06@qq.com Co-developed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
(Updated commit text and removed unnecessary error message) Signed-off-by: Siyang Liu <Security@tencent.com> Signed-off-by: Roman Li <roman.li@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9dd8e2ba268c636c240a918e0a31e6feaee19404) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
amdgpu_dm_commit_planes calls update_freesync_state_on_stream only for
the primary plane. If a commit affects a CRTC but not its primary plane,
it would previously not trigger a refresh cycle or affect LFC, violating
current UAPI semantics.
Fixes e.g. atomic commits affecting only the cursor plane being limited
to the minimum refresh rate.
Don't do this for the legacy cursor ioctls though, it would break the
UAPI semantics for those.
Suggested-by: Xaver Hugl <xaver.hugl@kde.org> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3034 Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit cc7bfba95966251b254cb970c21627124da3b7f4) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since KFD proc content was moved to kernel debugfs, we can't destroy KFD
debugfs before kfd_process_destroy_wq. Move kfd_process_destroy_wq prior
to kfd_debugfs_fini to fix a kernel NULL pointer problem. It happens
when /sys/kernel/debug/kfd was already destroyed in kfd_debugfs_fini but
kfd_process_destroy_wq calls kfd_debugfs_remove_process. This line
debugfs_remove_recursive(entry->proc_dentry);
tries to remove /sys/kernel/debug/kfd/proc/<pid> while
/sys/kernel/debug/kfd is already gone. It hangs the kernel by kernel
NULL pointer.
Signed-off-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Eric Huang <jinhuieric.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0333052d90683d88531558dcfdbf2525cc37c233) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After a recent change in clang to expose uninitialized warnings from
const variables and pointers [1], there is a warning in
imu_v12_0_program_rlc_ram() because data is passed uninitialized to
program_imu_rlc_ram():
drivers/gpu/drm/amd/amdgpu/imu_v12_0.c:374:30: error: variable 'data' is uninitialized when used here [-Werror,-Wuninitialized]
374 | program_imu_rlc_ram(adev, data, (const u32)size);
| ^~~~
As this warning happens early in clang's frontend, it does not realize
that due to the assignment of r to -EINVAL, program_imu_rlc_ram() is
never actually called, and even if it were, data would not be
dereferenced because size is 0.
Just initialize data to NULL to silence the warning, as the commit that
added program_imu_rlc_ram() mentioned it would eventually be used over
the old method, at which point data can be properly initialized and
used.
Cc: stable@vger.kernel.org Closes: https://github.com/ClangBuiltLinux/linux/issues/2107 Fixes: 56159fffaab5 ("drm/amdgpu: use new method to program rlc ram") Link: https://github.com/llvm/llvm-project/commit/2464313eef01c5b1edf0eccf57a32cdee01472c7 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
HUBBUB structure is not initialized on DCE hardware, so check if it is NULL
to avoid null dereference while accessing amdgpu_dm_capabilities file in
debugfs.
Signed-off-by: Peter Shkenev <mustela@erminea.space> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Gang Ba <Gang.Ba@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We only need the fw based discovery table for sysfs. No
need to parse it. Additionally parsing some of the board
specific tables may result in incorrect data on some boards.
just load the binary and don't parse it on those boards.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4441 Fixes: 80a0e8282933 ("drm/amdgpu/discovery: optionally use fw based ip discovery") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 62eedd150fa11aefc2d377fc746633fdb1baeb55) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver uses "whole" fps in all its calculations (e.g. in
load_per_instance()). Those calculation expect an fps bigger than 1, and
not big enough to overflow.
Clamp the param if the user provides a value that will result in an invalid
fps.
Reported-by: Hans Verkuil <hverkuil@xs4all.nl> Closes: https://lore.kernel.org/linux-media/f11653a7-bc49-48cd-9cdb-1659147453e4@xs4all.nl/T/#m91cd962ac942834654f94c92206e2f85ff7d97f0 Fixes: aaaa93eda64b ("[media] media: venus: venc: add video encoder files") Cc: stable@vger.kernel.org Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
[bod: Change "parm" to "param"] Signed-off-by: Bryan O'Donoghue <bod@kernel.org> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver uses "whole" fps in all its calculations (e.g. in
load_per_instance()). Those calculation expect an fps bigger than 1, and
not big enough to overflow.
Clamp the value if the user provides a param that will result in an invalid
fps.
Make sure the interrupt handler is initialized before the interrupt is
registered.
If the IRQ is registered before hfi_create(), it's possible that an
interrupt fires before the handler setup is complete, leading to a NULL
dereference.
This error condition has been observed during system boot on Rb3Gen2.
Add a check to ensure that the packet size does not exceed the number of
available words after reading the packet header from shared memory. This
ensures that the size provided by the firmware is safe to process and
prevent potential out-of-bounds memory access.
Both the ACE and CSI driver are missing a mei_cldev_disable() call in
their remove() function.
This causes the mei_cl client to stay part of the mei_device->file_list
list even though its memory is freed by mei_cl_bus_dev_release() calling
kfree(cldev->cl).
This leads to a use-after-free when mei_vsc_remove() runs mei_stop()
which first removes all mei bus devices calling mei_ace_remove() and
mei_csi_remove() followed by mei_cl_bus_dev_release() and then calls
mei_cl_all_disconnect() which walks over mei_device->file_list dereferecing
the just freed cldev->cl.
And mei_vsc_remove() it self is run at shutdown because of the
platform_device_unregister(tp->pdev) in vsc_tp_shutdown()
When building a kernel with KASAN this leads to the following KASAN report:
[ 106.634504] ==================================================================
[ 106.634623] BUG: KASAN: slab-use-after-free in mei_cl_set_disconnected (drivers/misc/mei/client.c:783) mei
[ 106.634683] Read of size 4 at addr ffff88819cb62018 by task systemd-shutdow/1
[ 106.634729]
[ 106.634767] Tainted: [E]=UNSIGNED_MODULE
[ 106.634770] Hardware name: Dell Inc. XPS 16 9640/09CK4V, BIOS 1.12.0 02/10/2025
[ 106.634773] Call Trace:
[ 106.634777] <TASK>
...
[ 106.634871] kasan_report (mm/kasan/report.c:221 mm/kasan/report.c:636)
[ 106.634901] mei_cl_set_disconnected (drivers/misc/mei/client.c:783) mei
[ 106.634921] mei_cl_all_disconnect (drivers/misc/mei/client.c:2165 (discriminator 4)) mei
[ 106.634941] mei_reset (drivers/misc/mei/init.c:163) mei
...
[ 106.635042] mei_stop (drivers/misc/mei/init.c:348) mei
[ 106.635062] mei_vsc_remove (drivers/misc/mei/mei_dev.h:784 drivers/misc/mei/platform-vsc.c:393) mei_vsc
[ 106.635066] platform_remove (drivers/base/platform.c:1424)
Add the missing mei_cldev_disable() calls so that the mei_cl gets removed
from mei_device->file_list before it is freed to fix this.
Fixes: 78876f71b3e9 ("media: pci: intel: ivsc: Add ACE submodule") Fixes: 29006e196a56 ("media: pci: intel: ivsc: Add CSI submodule") Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede <hansg@kernel.org> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Getting / Setting the frame interval using the V4L2 subdev pad ops
get_frame_interval/set_frame_interval causes a deadlock, as the
subdev state is locked in the [1] but also in the driver itself.
In [2] it's described that the caller is responsible to acquire and
release the lock in this case. Therefore, acquiring the lock in the
driver is wrong.
Remove the lock acquisitions/releases from mt9m114_ifp_get_frame_interval()
and mt9m114_ifp_set_frame_interval().
[1] drivers/media/v4l2-core/v4l2-subdev.c - line 1129
[2] Documentation/driver-api/media/v4l2-subdev.rst
Fixes: 24d756e914fc ("media: i2c: Add driver for onsemi MT9M114 camera sensor") Cc: stable@vger.kernel.org Signed-off-by: Mathis Foerst <mathis.foerst@mt.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ov2659_probe() doesn't properly free control handler resources in failure
paths, causing memory leaks. Add v4l2_ctrl_handler_free() to prevent these
memory leaks and reorder the ctrl_handler assignment for better code flow.
Fixes: c4c0283ab3cd ("[media] media: i2c: add support for omnivision's ov2659 sensor") Cc: stable@vger.kernel.org Signed-off-by: Zhang Shurong <zhang_shurong@foxmail.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During the probe() routine, the PiSP BE driver needs to power up the
interface in order to identify and initialize the hardware.
The driver resumes the interface by calling the
pispbe_runtime_resume() function directly, without going
through the pm_runtime helpers, but later suspends it by calling
pm_runtime_put_autosuspend().
This causes a PM usage count imbalance at probe time, notified by the
runtime_pm framework with the below message in the system log:
Fix this by resuming the interface using the pm runtime helpers instead
of calling the resume function directly and use the pm_runtime framework
in the probe() error path. While at it, remove manual suspend of the
interface in the remove() function. The driver cannot be unloaded if in
use, so simply disable runtime pm.
To simplify the implementation, make the driver depend on PM as the
RPI5 platform where the ISP is integrated in uses the PM framework by
default.
In the interrupt handler rain_interrupt(), the buffer full check on
rain->buf_len is performed before acquiring rain->buf_lock. This
creates a Time-of-Check to Time-of-Use (TOCTOU) race condition, as
rain->buf_len is concurrently accessed and modified in the work
handler rain_irq_work_handler() under the same lock.
Multiple interrupt invocations can race, with each reading buf_len
before it becomes full and then proceeding. This can lead to both
interrupts attempting to write to the buffer, incrementing buf_len
beyond its capacity (DATA_SIZE) and causing a buffer overflow.
Fix this bug by moving the spin_lock() to before the buffer full
check. This ensures that the check and the subsequent buffer modification
are performed atomically, preventing the race condition. An corresponding
spin_unlock() is added to the overflow path to correctly release the
lock.
This possible bug was found by an experimental static analysis tool
developed by our team.
Fixes: 0f314f6c2e77 ("[media] rainshadow-cec: new RainShadow Tech HDMI CEC driver") Cc: stable@vger.kernel.org Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When an program is streaming (ffplay) and another program (qv4l2)
changes the TV standard from NTSC to PAL, the kernel crashes due to trying
to copy to unmapped memory.
Changing from NTSC to PAL increases the resolution in the usbtv struct,
but the video plane buffer isn't adjusted, so it overflows.
Fixes: 0e0fe3958fdd13d ("[media] usbtv: Add support for PAL video source") Cc: stable@vger.kernel.org Signed-off-by: Ludwig Disterhof <ludwig@disterhof.eu> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
[hverkuil: call vb2_is_busy instead of vb2_is_streaming] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It's a common pattern in drivers to free the control handler's resources
and then return the handler's error code on drivers' error handling paths.
Alas, the v4l2_ctrl_handler_free() function also zeroes the error field,
effectively indicating successful return to the caller.
There's no apparent need to touch the error field while releasing the
control handler's resources and cleaning up stale pointers. Not touching
the handler's error field is a more certain way to address this problem
than changing all the users, in which case the pattern would be likely to
re-emerge in new drivers.
Do just that, don't touch the control handler's error field in
v4l2_ctrl_handler_free().
Fixes: 0996517cf8ea ("V4L/DVB: v4l2: Add new control handling framework") Cc: stable@vger.kernel.org Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Hans Verkuil <hverkuil@xs4all.nl> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The desired clock frequency was correctly set to 400MHz in the device tree
but was lowered by the driver to 300MHz breaking 4K 60Hz content playback.
Fix the issue by removing the driver call to clk_set_rate(), which reduce
the amount of board specific code.
Fixes: 003afda97c65 ("media: verisilicon: Enable AV1 decoder on rk3588") Cc: stable@vger.kernel.org Reviewed-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The pixel_array control size was calculated incorrectly:
the dimensions were swapped (dims[0] should be the height), and the
values should be the width or height divided by PIXEL_ARRAY_DIV
and rounded up. So don't use roundup, but use DIV_ROUND_UP instead.
This bug is harmless in the sense that nothing will break, except that
it consumes way too much memory for this control.
The pad argument to v4l2_subdev_state_xlate_streams() is incorrect, static
pad number is used for the source pad even though the pad number is
dependent on the stream. Fix it.
Add video_device_release() in label 'err_m2m' to release the memory
allocated by video_device_alloc() and prevent potential memory leaks.
Remove the reduntant code in label 'err_m2m'.
Fixes: a8ef0488cc59 ("media: imx: add csc/scaler mem2mem device") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The test pattern is set by a 8-bit register according to the
specification.
+--------+-------------------------------+
| BIT[0] | Solid color |
+--------+-------------------------------+
| BIT[1] | Color bar |
+--------+-------------------------------+
| BIT[2] | Fade to grey color bar |
+--------+-------------------------------+
| BIT[3] | PN9 |
+--------+-------------------------------+
| BIT[4] | Gradient horizontal |
+--------+-------------------------------+
| BIT[5] | Gradient vertical |
+--------+-------------------------------+
| BIT[6] | Check board |
+--------+-------------------------------+
| BIT[7] | Slant pattern |
+--------+-------------------------------+
Based on function above, current test pattern programming is wrong.
This patch fixes it by 'BIT(pattern - 1)'. If pattern is 0, driver
will disable the test pattern generation and set the pattern to 0.
Fixes: e62138403a84 ("media: hi556: Add support for Hi-556 sensor") Cc: stable@vger.kernel.org Signed-off-by: Bingbu Cao <bingbu.cao@intel.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This sd_init() function reads the firmware. The firmware data holds a
series of records and the function reads each record and sends the data
to the device. The request_ihex_firmware() function
calls ihex_validate_fw() which ensures that the total length of all the
records won't read out of bounds of the fw->data[].
However, a potential issue is if there is a single very large
record (larger than PAGE_SIZE) and that would result in memory
corruption. Generally we trust the firmware, but it's always better to
double check.
Fixes: 49b61ec9b5af ("[media] gspca: Add new vicam subdriver") Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
int main(int argc, char **argv)
{
unsigned long page_size = sysconf(_SC_PAGESIZE);
char *p = malloc(3 * page_size);
char *p_aligned;
/* initialize memory region. If not initialized, write syscall below will correctly return EFAULT. */
if (1)
memset(p, 'X', 3 * page_size);
p_aligned = (char *) ((((uintptr_t) p) + (2*page_size - 1)) & ~(page_size - 1));
/* Drop PROT_READ protection. Kernel and userspace should fault when accessing that memory region */
mprotect(p_aligned, page_size, PROT_NONE);
/* the following write() should return EFAULT, since PROT_READ was dropped by previous mprotect() */
int ret = write(2, p_aligned, 1);
if (!ret || errno != EFAULT)
printf("\n FAILURE: write() did not returned expected EFAULT value\n");
return 0;
}
Because of the way _PAGE_READ is handled, kernel code never generates
a read access fault when it access a page as the kernel privilege level
is always less than PL1 in the PTE.
This patch reworks the comments in the make_insert_tlb macro to try
to make this clearer.
Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # v5.12+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We use load and stbys,e instructions to trigger memory reference
interruptions without writing to memory. Because of the way read
access support is implemented, read access interruptions are only
triggered at privilege levels 2 and 3. The kernel and gateway
page execute at privilege level 0, so this code never triggers
a read access interruption. Thus, it is currently possible for
user code to execute a LWS compare and swap operation at an
address that is read protected at privilege level 3 (PRIV_USER).
Fix this by probing read access rights at privilege level 3 and
branching to lws_fault if access isn't allowed.
Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # v5.12+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Because of the way read access support is implemented, read access
interruptions are only triggered at privilege levels 2 and 3. The
kernel executes at privilege level 0, so __get_user() never triggers
a read access interruption (code 26). Thus, it is currently possible
for user code to access a read protected address via a system call.
Fix this by probing read access rights at privilege level 3 (PRIV_USER)
and setting __gu_err to -EFAULT (-14) if access isn't allowed.
Note the cmpiclr instruction does a 32-bit compare because COND macro
doesn't work inside asm.
Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # v5.12+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>