When GPU memory mappings are updated, the driver returns a fence so
userspace knows when the update is finished.
The previous refactor could pick the wrong fence or rely on checks that
are not safe for GPU mappings that stay valid even when memory is
missing. In some cases this could return an invalid fence or cause fence
reference counting problems.
Fix this by (v5,v6, per Christian):
- Starting from the VM’s existing last update fence, so a valid and
meaningful fence is always returned even when no new work is required.
- Selecting the VM-level fence only for always-valid / PRT mappings using
the required combined bo_va + bo guard.
- Using the per-BO page table update fence for normal MAP and REPLACE
operations.
- For UNMAP and CLEAR, returning the fence provided by
amdgpu_vm_clear_freed(), which may remain unchanged when nothing needs
clearing.
- Keeping fence reference counting balanced.
v7: Drop the extra bo_va/bo NULL guard since
amdgpu_vm_is_bo_always_valid() handles NULL BOs correctly (including
PRT). (Christian)
This makes VM timeline fences correct and prevents crashes caused by
incorrect fence handling.
Fixes: bd8150a1b337 ("drm/amdgpu: Refactor amdgpu_gem_va_ioctl for Handling Last Fence Update and Timeline Management v4") Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
A recent commit fixed a resource leak on early registration failures but
for some reason left out the first error path which still leaks the
resources associated with the interface.
Fix up also the first error path so that the interface is always
released on errors.
Fixes: 1f4c9d8a1021 ("most: core: fix resource leak in most_register_interface error paths") Fixes: 723de0f9171e ("staging: most: remove device from interface structure") Cc: Christian Gromm <christian.gromm@microchip.com> Cc: Navaneeth K <knavaneeth786@gmail.com> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://patch.msgid.link/20260116162950.21578-1-johan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Ben reports that when running with CONFIG_DEBUG_PREEMPT, using
__arch_counter_get_cntvct_stable() results in well deserves warnings,
as we access a per-CPU variable without preemption disabled.
Fix the issue by disabling preemption on reading the counter. We can
probably do a lot better by not disabling preemption on systems that
do not require horrible workarounds to return a valid counter value,
but this plugs the issue for the time being.
Fixes: 29cc0f3aa7c6 ("arm64: Force the use of CNTVCT_EL0 in __delay()") Reported-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/aZw3EGs4rbQvbAzV@e134344.arm.com Tested-by: Ben Horgan <ben.horgan@arm.com> Tested-by: André Draszik <andre.draszik@linaro.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Chris Mason noticed that there is a copy-paste error in a recent change
to xrep_dir_teardown that nulls out pointers after freeing the
resources.
Fixes: ba408d299a3bb3c ("xfs: only call xf{array,blob}_destroy if we have a valid pointer") Link: https://lore.kernel.org/linux-xfs/20260205194211.2307232-1-clm@meta.com/ Reported-by: Chris Mason <clm@meta.com> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 469c1c9eb6c9 ("kernel-doc: Issue warnings that were silently
discarded") started emitting warnings for cases that were previously
silently discarded. One such case is in intel_wakeref.h:
Warning: drivers/gpu/drm/i915/intel_wakeref.h:156 expecting prototype
for __intel_wakeref_put(). Prototype was for INTEL_WAKEREF_PUT_ASYNC()
instead
Arguably kernel-doc should be able to handle this, as it's valid C, but
having the flags defined between the function declarator and the body is
just asking for trouble. Move the INTEL_WAKEREF_PUT_* macros away from
there, making kernel-doc's life easier.
While at it, reduce the unnecessary abstraction levels by removing the
enum, and append _MASK to INTEL_WAKEREF_PUT_DELAY for clarity.
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20251215120908.3515578-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This commit is causing a suspend regression on systems such as the Asus
Zephyrus G14 (GA402RJ) with Ryzen 7 6700H: when suspending, the display
turns off but the device fails to fully power down. This is not seen
with v7.0-rc1 which indicates that there are changes missing. Therefore,
revert this change.
When building with clang older than 17 targeting architectures that use
asm goto for their get_user() and put_user(), such as arm64, after
commit f3d233daf011 ("ALSA: pcm: Relax __free() variable declarations"),
there are bogus errors around skipping over a variable declared with the
cleanup attribute:
sound/core/pcm_native.c:3308:6: error: cannot jump from this asm goto statement to one of its possible targets
if (put_user(result, &_xfern->result))
^
...
arch/arm64/include/asm/uaccess.h:298:2: note: expanded from macro '__put_mem_asm'
asm goto(
^
sound/core/pcm_native.c:3295:6: note: possible target of asm goto statement
if (put_user(0, &_xfern->result))
^
...
sound/core/pcm_native.c:3300:8: note: jump exits scope of variable with __attribute__((cleanup))
void *bufs __free(kfree) =
^
clang-17 fixed a bug in clang's jump scope checker [1] where all labels
in a function were checked as valid targets for all asm goto instances
in a function, regardless of whether they were actual targets in a
paricular asm goto's provided list of labels.
To workaround this, revert the change done to
snd_pcm_xfern_frames_ioctl() by commit f3d233daf011 ("ALSA: pcm: Relax
__free() variable declarations") to avoid a variable declared with
cleanup from existing between multiple uses of asm goto. There are no
other uses of cleanup in this function so there should be low risk from
moving this variable back to the top of the function.
impl_list_item_mod.rs calls container_of! without unsafe blocks at a
couple of places. Since container_of! is unsafe, the blocks are strictly
necessary.
The problem was so far not visible because the "unsafe-op-in-unsafe-fn"
check is a lint rather than a hard compiler error, and Rust suppresses
lints triggered inside of a macro from another crate.
Thus, the error becomes only visible once someone from within the kernel
crate tries to use linked lists:
error[E0133]: call to unsafe function `core::ptr::mut_ptr::<impl *mut T>::byte_sub`
is unsafe and requires unsafe block
--> rust/kernel/lib.rs:252:29
|
252 | let container_ptr = field_ptr.byte_sub(offset).cast::<$Container>();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^ call to unsafe function
|
::: rust/kernel/drm/jq.rs:98:1
|
98 | / impl_list_item! {
99 | | impl ListItem<0> for BasicItem { using ListLinks { self.links }; }
100 | | }
| |_- in this macro invocation
|
note: an unsafe function restricts its caller, but its body is safe by default
--> rust/kernel/list/impl_list_item_mod.rs:216:13
|
216 | unsafe fn view_value(me: *mut $crate::list::ListLinks<$num>) -> *const Self {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
::: rust/kernel/drm/jq.rs:98:1
|
98 | / impl_list_item! {
99 | | impl ListItem<0> for BasicItem { using ListLinks { self.links }; }
100 | | }
| |_- in this macro invocation
= note: requested on the command line with `-D unsafe-op-in-unsafe-fn`
= note: this error originates in the macro `$crate::container_of` which comes
from the expansion of the macro `impl_list_item`
Therefore, add unsafe blocks to container_of! calls to fix the issue.
[ As discussed, let's fix the build for those that want to use the
macro within the `kernel` crate now and we can discuss the proper
safety comments afterwards. Thus I removed the ones from the patch.
However, we cannot just avoid the comments with `CLIPPY=1`, so I
provided placeholders for now, like we did in the past. They were
also needed for an `unsafe impl`.
While I am not happy about it, it isn't worse than the current
status (the comments were meant to be there), and at least this
shows what is missing -- our pre-existing "good first issue" [1]
may motivate new contributors to complete them properly.
Finally, I moved one of the existing safety comments one line down
so that Clippy could locate it.
The event_hist_poll() function attempts to verify whether an event file is
being removed, but this check may not occur or could be unnecessarily
delayed. This happens because hist_poll_wakeup() is currently invoked only
from event_hist_trigger() when a hist command is triggered. If the event
file is being removed, no associated hist command will be triggered and a
waiter will be woken up only after an unrelated hist command is triggered.
Fix the issue by adding a call to hist_poll_wakeup() in
remove_event_file_dir() after setting the EVENT_FILE_FL_FREED flag. This
ensures that a task polling on a hist file is woken up and receives
EPOLLERR.
Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Tom Zanussi <zanussi@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://patch.msgid.link/20260219162737.314231-3-petr.pavlu@suse.com Fixes: 1bd13edbbed6 ("tracing/hist: Add poll(POLLIN) support on hist file") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The event_hist_open() and event_hist_poll() functions currently retrieve
a trace_event_file pointer from a file struct by invoking
event_file_data(), which simply returns file->f_inode->i_private. The
functions then check if the pointer is NULL to determine whether the event
is still valid. This approach is flawed because i_private is assigned when
an eventfs inode is allocated and remains set throughout its lifetime.
Instead, the code should call event_file_file(), which checks for
EVENT_FILE_FL_FREED. Using the incorrect access function may result in the
code potentially opening a hist file for an event that is being removed or
becoming stuck while polling on this file.
Correct the access method to event_file_file() in both functions.
Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Tom Zanussi <zanussi@kernel.org> Link: https://patch.msgid.link/20260219162737.314231-2-petr.pavlu@suse.com Fixes: 1bd13edbbed6 ("tracing/hist: Add poll(POLLIN) support on hist file") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The function graph tracer was modified to us the ftrace_ops of the
function tracer. This simplified the code as well as allowed more features
of the function graph tracer.
Not all architectures were converted over as it required the
implementation of HAVE_DYNAMIC_FTRACE_WITH_ARGS to implement. For those
architectures, it still did it the old way where the function graph tracer
handle was called by the function tracer trampoline. The handler then had
to check the hash to see if the registered handlers wanted to be called by
that function or not.
In order to speed up the function graph tracer that used ftrace_ops, if
only one callback was registered with function graph, it would call its
function directly via a static call.
Now, if the architecture does not support the use of using ftrace_ops and
still has the ftrace function trampoline calling the function graph
handler, then by doing a direct call it removes the check against the
handler's hash (list of functions it wants callbacks to), and it may call
that handler for functions that the handler did not request calls for.
On 32bit x86, which does not support the ftrace_ops use with function
graph tracer, it shows the issue:
~# trace-cmd start -p function -l schedule
~# trace-cmd show
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
2) * 11898.94 us | schedule();
3) # 1783.041 us | schedule();
1) | schedule() {
------------------------------------------
1) bash-8369 => kworker-7669
------------------------------------------
1) | schedule() {
------------------------------------------
1) kworker-7669 => bash-8369
------------------------------------------
1) + 97.004 us | }
1) | schedule() {
[..]
Now by starting the function tracer is another instance:
~# trace-cmd start -B foo -p function
This causes the function graph tracer to trace all functions (because the
function trace calls the function graph tracer for each on, and the
function graph trace is doing a direct call):
~# trace-cmd show
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
1) 1.669 us | } /* preempt_count_sub */
1) + 10.443 us | } /* _raw_spin_unlock_irqrestore */
1) | tick_program_event() {
1) | clockevents_program_event() {
1) 1.044 us | ktime_get();
1) 6.481 us | lapic_next_event();
1) + 10.114 us | }
1) + 11.790 us | }
1) ! 181.223 us | } /* hrtimer_interrupt */
1) ! 184.624 us | } /* __sysvec_apic_timer_interrupt */
1) | irq_exit_rcu() {
1) 0.678 us | preempt_count_sub();
When it should still only be tracing the schedule() function.
To fix this, add a macro FGRAPH_NO_DIRECT to be set to 0 when the
architecture does not support function graph use of ftrace_ops, and set to
1 otherwise. Then use this macro to know to allow function graph tracer to
call the handlers directly or not.
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://patch.msgid.link/20260218104244.5f14dade@gandalf.local.home Fixes: cc60ee813b503 ("function_graph: Use static_call and branch to optimize entry function") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Check the event length before adding it for accessing next index in
rb_read_data_buffer(). Since this function is used for validating
possibly broken ring buffers, the length of the event could be broken.
In that case, the new event (e + len) can point a wrong address.
To avoid invalid memory access at boot, check whether the length of
each event is in the possible range before using it.
There is a pointer head_page in rb_meta_validate_events() which is not
initialized at the beginning of a function. This pointer can be dereferenced
if there is a failure during reader page validation. In this case the control
is passed to "invalid" label where the pointer is dereferenced in a loop.
To fix the issue initialize orig_head and head_page before calling
rb_validate_buffer.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Since commit 9c328f54741b ("net: nfc: nci: Add parameter validation for
packet data") communication with nci nfc chips is not working any more.
The mentioned commit tries to fix access of uninitialized data, but
failed to understand that in some cases the data packet is of variable
length and can therefore not be compared to the maximum packet length
given by the sizeof(struct).
Fixes: 9c328f54741b ("net: nfc: nci: Add parameter validation for packet data") Cc: stable@vger.kernel.org Signed-off-by: Michael Thalmeier <michael.thalmeier@hale.at> Reported-by: syzbot+740e04c2a93467a0f8c8@syzkaller.appspotmail.com Link: https://patch.msgid.link/20260218083000.301354-1-michael.thalmeier@hale.at Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
On Intel MacBookPros with switchable graphics, when the iGPU
is enabled, the address of VRAM gets put at 0 in the dGPU's
virtual address space. This is non-standard and seems to cause
issues with the cursor if it ends up at 0. We have the framework
to reserve memory at 0 in the address space, so enable it here if
the vram start address is 0.
Reviewed-and-tested-by: Mario Kleiner <mario.kleiner.de@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4302 Cc: stable@vger.kernel.org Cc: Mario Kleiner <mario.kleiner.de@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[Why]
Fix fastboot broken in driver.
This is caused by an open source backport change 7495962c.
from the comment, the intended check is to disable fastboot
for pre-DCN10. but the logic check is reversed, and causes
fastboot to be disabled on all DCN10 and after.
fastboot is for driver trying to pick up bios used hw setting
and bypass reprogramming the hw if dc_validate_boot_timing()
condition meets.
Fixes: 7495962cbceb ("drm/amd/display: Disable fastboot on DCE 6 too") Cc: stable@vger.kernel.org Reviewed-by: Mario Limonciello <Mario.Limonciello@amd.com> Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Quentin forwards a report from Hyesoo Yu, describing an interesting
problem with the use of WFxT in __delay() when a vcpu is loaded and
that KVM is *not* in VHE mode (either nVHE or hVHE).
In this case, CNTVOFF_EL2 is set to a non-zero value to reflect the
state of the guest virtual counter. At the same time, __delay() is
using get_cycles() to read the counter value, which is indirected to
reading CNTPCT_EL0.
The core of the issue is that WFxT is using the *virtual* counter,
while the kernel is using the physical counter, and that the offset
introduces a really bad discrepancy between the two.
Fix this by forcing the use of CNTVCT_EL0, making __delay() consistent
irrespective of the value of CNTVOFF_EL2.
Reported-by: Hyesoo Yu <hyesoo.yu@samsung.com> Reported-by: Quentin Perret <qperret@google.com> Reviewed-by: Quentin Perret <qperret@google.com> Fixes: 7d26b0516a0d ("arm64: Use WFxT for __delay() when possible") Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/ktosachvft2cgqd5qkukn275ugmhy6xrhxur4zqpdxlfr3qh5h@o3zrfnsq63od Cc: stable@vger.kernel.org Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
`clippy` has changed behavior in [1] (Rust 1.95) where it no longer
warns about the `let_and_return` lint when a comment is placed between
the let binding and the return expression. Nightly thus fails to build,
because the expectation is no longer fulfilled.
Thus replace the expectation with an `allow`.
[ The errors were:
error: this lint expectation is unfulfilled
--> rust/pin-init/src/lib.rs:1279:10
|
1279 | #[expect(clippy::let_and_return)]
| ^^^^^^^^^^^^^^^^^^^^^^
|
= note: `-D unfulfilled-lint-expectations` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(unfulfilled_lint_expectations)]`
error: this lint expectation is unfulfilled
--> rust/pin-init/src/lib.rs:1295:10
|
1295 | #[expect(clippy::let_and_return)]
| ^^^^^^^^^^^^^^^^^^^^^^
These callback functions take a generic `T` that is used in the body as
the generic argument in `Registration` and `ThreadedRegistration`. Those
types require `T: 'static`, but due to a compiler bug this requirement
isn't propagated to the function. Thus add the bound. This was caught in
the upstream Rust CI [1].
[ The three errors looked similar and will start appearing with Rust
1.95.0 (expected 2026-04-16). The first one was:
error[E0310]: the parameter type `T` may not live long enough
Error: --> rust/kernel/irq/request.rs:266:43
|
266 | let registration = unsafe { &*(ptr as *const Registration<T>) };
| ^^^^^^^^^^^^^^^^^^^^^^
| |
| the parameter type `T` must be valid for the static lifetime...
| ...so that the type `T` will meet its required lifetime bounds
|
help: consider adding an explicit lifetime bound
|
264 | unsafe extern "C" fn handle_irq_callback<T: Handler + 'static>(_irq: i32, ptr: *mut c_void) -> c_uint {
| +++++++++
The io_zcrx_put_niov_uref() function uses a non-atomic
check-then-decrement pattern (atomic_read followed by separate
atomic_dec) to manipulate user_refs. This is serialized against other
callers by rq_lock, but io_zcrx_scrub() modifies the same counter with
atomic_xchg() WITHOUT holding rq_lock.
The same niov is pushed to the freelist twice, causing free_count to
exceed nr_iovs. Subsequent freelist pushes then perform an out-of-bounds
write (a u32 value) past the kvmalloc'd freelist array into the adjacent
slab object.
Fix this by replacing the non-atomic read-then-dec in
io_zcrx_put_niov_uref() with an atomic_try_cmpxchg loop that atomically
tests and decrements user_refs. This makes the operation safe against
concurrent atomic_xchg from scrub without requiring scrub to acquire
rq_lock.
Fixes: 34a3e60821ab ("io_uring/zcrx: implement zerocopy receive pp memory provider") Cc: stable@vger.kernel.org Signed-off-by: Kai Aizen <kai@snailsploit.com>
[pavel: removed a warning and a comment] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
Dave reports that kexec may fail when the first kernel boots via the EFI
stub but without EFI runtime services, as in that case, the RSDP address
field in struct bootparams is never assigned. Kexec copies this value
into the version of struct bootparams that it provides to the incoming
kernel, which may have no other means to locate the ACPI root pointer.
So take the value from the EFI config tables if no root pointer has been
set in the first kernel's struct bootparams.
Fixes: a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot") Cc: <stable@vger.kernel.org> # v6.1 Reported-by: Dave Young <dyoung@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Link: https://lore.kernel.org/linux-efi/aZQg_tRQmdKNadCg@darkstar.users.ipa.redhat.com/ Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently if we export a GPIO over sysfs and unbind the parent GPIO
controller, the exported attribute will remain under /sys/class/gpio
because once we remove the parent device, we can no longer associate the
descriptor with it in gpiod_unexport() and never drop the final
reference.
Rework the teardown code: provide an unlocked variant of
gpiod_unexport() and remove all exported GPIOs with the sysfs_lock taken
before unregistering the parent device itself. This is done to prevent
any new exports happening before we unregister the device completely.
The range size can be 65536 when the requested range covers all possible
u16 queue IDs (e.g. queue_mapping=0 and queue_mapping_max=U16_MAX).
That value cannot be represented in a u16 and previously wrapped to 0,
so tcf_skbedit_hash() could trigger a divide-by-zero:
When user provides a bogus pat_index value through the madvise IOCTL, the
xe_pat_index_get_coh_mode() function performs an array access without
validating bounds. This allows a malicious user to trigger an out-of-bounds
kernel read from the xe->pat.table array.
The vulnerability exists because the validation in madvise_args_are_sane()
directly calls xe_pat_index_get_coh_mode(xe, args->pat_index.val) without
first checking if pat_index is within [0, xe->pat.n_entries).
Although xe_pat_index_get_coh_mode() has a WARN_ON to catch this in debug
builds, it still performs the unsafe array access in production kernels.
v2(Matthew Auld)
- Using array_index_nospec() to mitigate spectre attacks when the value
is used
v3(Matthew Auld)
- Put the declarations at the start of the block
Fixes: ada7486c5668 ("drm/xe: Implement madvise ioctl for xe") Reviewed-by: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v6.18+ Cc: Matthew Brost <matthew.brost@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Jia Yao <jia.yao@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260205161529.1819276-1-jia.yao@intel.com
(cherry picked from commit 944a3329b05510d55c69c2ef455136e2fc02de29) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Stefano reports that after commit 62089b804895 ("kbuild: rpm-pkg:
Generate debuginfo package manually"), building with an rpm package
using rpm 4.20.0 fails with:
RPM build errors:
Dependency tokens must begin with alpha-numeric, '_' or '/': #�) = 0x0d000002
Dependency tokens must begin with alpha-numeric, '_' or '/': �) = 0x0d000000
Dependency tokens must begin with alpha-numeric, '_' or '/': ) = 0x7c0e000000
Unknown rich dependency op 'Hat': (Red Hat 15.2.1-7)) = 0x3130363230322000
Unknown rich dependency op 'Hat': (Red Hat 15.2.1-7)) = 0x4728203a43434800
Unknown rich dependency op 'Hat': (Red Hat 15.2.1-7)) = 0x3130363230322000
Unknown rich dependency op 'Hat': (Red Hat 15.2.1-7)) = 0x4728203a43434800
This error comes from the automatic requirements feature of rpm. The
-debuginfo subpackage has no dependencies, so disable this feature with
'AutoReq: 0' for this subpackage, avoiding the error. This matches the
official %_debug_template macro that rpm provides. While automatic
provides should be default enabled, be explicit like %_debug_template
does.
Additionally, while in the area, add the manual debug information
package to the Development/Debug group, further aligning with
%_debug_template.
The ASUS Vivobook Pro 15X (M6501RR) with AMD Ryzen 9 6900HX has an
internal DMIC that is not detected without a DMI quirk entry, as the
BIOS does not set the AcpDmicConnected ACPI _DSD property.
Adding the DMI entry enables the ACP6x DMIC machine driver to probe
successfully.
Attempting SOCKET_URING_OP_SETSOCKOPT on an AF_NETLINK socket resulted
in an -EOPNOTSUPP, as AF_NETLINK doesn't have an ioctl in its struct
proto, but only in struct proto_ops.
Prior to the blamed commit, io_uring_cmd_sock() only had two cmd_op
operations, both requiring ioctl, thus the check was warranted.
Since then, 4 new cmd_op operations have been added, none of which
depend on ioctl. This patch moves the ioctl check, so it only applies
to the original operations.
AFAICT, the ioctl requirement was unintentional, and it wasn't
visible in the blamed patch within 3 lines of context.
In several places in the code, we have a label to signify
the start of the code where a request can be replayed if
necessary. However, some of these places were missing the
necessary reinitializations of certain local variables
before replay.
This change makes sure that these variables get initialized
after the label.
Cc: stable@vger.kernel.org Reported-by: Yuchan Nam <entropy1110@gmail.com> Tested-by: Yuchan Nam <entropy1110@gmail.com> Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In an unlikely case when io_populate_area_dma() fails, which could only
happen on a PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA machine,
io_zcrx_map_area() will have an initialised and not freed table. It was
supposed to be cleaned up in the error path, but !is_mapped prevents
that.
Fix Sun FFB1 corrupted video out ([1] and [2]) by disabling overlay and
initializing window mode to a known state. The issue never appeared on
my FFB2+/vertical nor Elite3D/M6. It could also depend on the PROM
version.
/SUNW,ffb@1e,0: FFB at 000001fc00000000, type 11, DAC pnum[236c] rev[10] manuf_rev[4]
X (II) /dev/fb0: Detected FFB1, Z-buffer, Single-buffered.
X (II) /dev/fb0: BT9068 (PAC1) ramdac detected (with normal cursor control)
X (II) /dev/fb0: Detected Creator/Creator3D
of_parse_phandle() returns a device_node with refcount incremented,
which is stored in 'entry' and then copied to 'native_mode'. When the
error paths at lines 184 or 192 jump to 'entryfail', native_mode's
refcount is not decremented, causing a refcount leak.
Fix this by changing the goto target from 'entryfail' to 'timingfail',
which properly calls of_node_put(native_mode) before cleanup.
Fixes: cc3f414cf2e4 ("video: add of helper for display timings/videomode") Cc: stable@vger.kernel.org Signed-off-by: Weigang He <geoffreyhe2@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
The fbdev sysfs attributes are registered after sending the uevent for
the device creation, leaving a race window where e.g. udev rules may
not be able to access the sysfs attributes because the registration is
not done yet.
Fix this by switching to device_create_with_groups(). This also results in
a nice cleanup. After switching to device_create_with_groups() all that
is left of fb_init_device() is setting the drvdata and that can be passed
to device_create[_with_groups]() too. After which fb_init_device() can
be completely removed.
Dropping fb_init_device() + fb_cleanup_device() in turn allows removing
fb_info.class_flag as they were the only user of this field.
Fixes: 5fc830d6aca1 ("fbdev: Register sysfs groups through device_add_group") Cc: stable@vger.kernel.org Cc: Shixiong Ou <oushixiong@kylinos.cn> Signed-off-by: Hans de Goede <johannes.goede@oss.qualcomm.com> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
In some physical memory layout designs, the address space of CFMW (CXL
Fixed Memory Window) resides between multiple segments of system memory
belonging to the same NUMA node. In numa_cleanup_meminfo, these multiple
segments of system memory are merged into a larger numa_memblk. When
identifying which NUMA node the CFMW belongs to, it may be incorrectly
assigned to the NUMA node of the merged system memory.
When a CXL RAM region is created in userspace, the memory capacity of
the newly created region is not added to the CFMW-dedicated NUMA node.
Instead, it is accumulated into an existing NUMA node (e.g., NUMA0
containing RAM). This makes it impossible to clearly distinguish
between the two types of memory, which may affect memory-tiering
applications.
Example memory layout:
Physical address space:
0x00000000 - 0x1FFFFFFF System RAM (node0)
0x20000000 - 0x2FFFFFFF CXL CFMW (node2)
0x40000000 - 0x5FFFFFFF System RAM (node0)
0x60000000 - 0x7FFFFFFF System RAM (node1)
After numa_cleanup_meminfo, the two node0 segments are merged into one:
0x00000000 - 0x5FFFFFFF System RAM (node0) // CFMW is inside the range
0x60000000 - 0x7FFFFFFF System RAM (node1)
So the CFMW (0x20000000-0x2FFFFFFF) will be incorrectly assigned to node0.
To address this scenario, accurately identifying the correct NUMA node
can be achieved by checking whether the region belongs to both
numa_meminfo and numa_reserved_meminfo.
While this issue is only observed in a QEMU configuration, and no known
end users are impacted by this problem, it is likely that some firmware
implementation is leaving memory map holes in a CXL Fixed Memory Window.
CXL hotplug depends on mapping free window capacity, and it seems to be
only a coincidence to have not hit this problem yet.
Fixes: 779dd20cfb56 ("cxl/region: Add region creation support") Signed-off-by: Cui Chao <cuichao1753@phytium.com.cn> Cc: stable@vger.kernel.org Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gregory Price <gourry@gourry.net> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20260213060347.2389818-2-cuichao1753@phytium.com.cn Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
On the receive path, __ioam6_fill_trace_data() uses trace->nodelen
to decide how much data to write for each node. It trusts this field
as-is from the incoming packet, with no consistency check against
trace->type (the 24-bit field that tells which data items are
present). A crafted packet can set nodelen=0 while setting type bits
0-21, causing the function to write ~100 bytes past the allocated
region (into skb_shared_info), which corrupts adjacent heap memory
and leads to a kernel panic.
Add a shared helper ioam6_trace_compute_nodelen() in ioam6.c to
derive the expected nodelen from the type field, and use it:
- in ioam6_iptunnel.c (send path, existing validation) to replace
the open-coded computation;
- in exthdrs.c (receive path, ipv6_hop_ioam) to drop packets whose
nodelen is inconsistent with the type field, before any data is
written.
Per RFC 9197, bits 12-21 are each short (4-octet) fields, so they
are included in IOAM6_MASK_SHORT_FIELDS (changed from 0xff100000 to
0xff1ffc00).
Fixes: 9ee11f0fff20 ("ipv6: ioam: Data plane support for Pre-allocated Trace") Cc: stable@vger.kernel.org Signed-off-by: Junxi Qian <qjx1298677004@gmail.com> Reviewed-by: Justin Iurman <justin.iurman@gmail.com> Link: https://patch.msgid.link/20260211040412.86195-1-qjx1298677004@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 62089b804895 ("kbuild: rpm-pkg: Generate debuginfo package
manually") added uses of OBJCOPY and READELF, variables from Kbuild.
These variables are defined and work properly when using the binrpm-pkg
target because rpmbuild is run within Kbuild. However, these variables
are not defined when building from a source RPM package generated with
the srcrpm-pkg target, breaking the build when generating the debug info
subpackage.
Define a default value for these variables so that these commands
respect the value from Kbuild but continue to work when built from a
source RPM package.
Commit 62089b804895 ("kbuild: rpm-pkg: Generate debuginfo package
manually") effectively reverted commit a7c699d090a1 ("kbuild: rpm-pkg:
build a debuginfo RPM") but the approach it took is not safe for older
RPM releases. Restore commit a7c699d090a1 ("kbuild: rpm-pkg: build a
debuginfo RPM") for the !CONFIG_MODULE_SIG case to allow more
environments and configurations to take advantage of the separate debug
information package process.
Commit 62089b804895 ("kbuild: rpm-pkg: Generate debuginfo package
manually") moved away from the built-in RPM machinery for generating
-debuginfo packages to a more manual way to be compatible with module
signing, as the built-in machinery strips the modules after the
installation process, breaking the signatures.
Unfortunately, prior to rpm 4.20.0, there is a bug where a custom %files
directive is ignored for a -debuginfo subpackage [1], meaning builds
using older versions of RPM (such as on RHEL9 or RHEL10) fail with:
To workaround this, restrict the manual debug info package creation
process to when it is necessary (CONFIG_MODULE_SIG=y) and possible (when
using RPM >= 4.20.0). A follow up change will restore the RPM debuginfo
creation process using a separate internal flag to allow the package to
be built in more situations, as RPM 4.20.0 is a fairly recent version
and the built-in -debuginfo generation works fine when module signing is
disabled.
Cc: stable@vger.kernel.org Fixes: 62089b804895 ("kbuild: rpm-pkg: Generate debuginfo package manually") Link: https://github.com/rpm-software-management/rpm/commit/49f906998f3cf1f4152162ca61ac0869251c380f Reported-by: Steve French <smfrench@gmail.com> Closes: https://lore.kernel.org/CAH2r5mugbrHTwnaQwQiYEUVwbtqmvFYf0WZiLrrJWpgT8iwftw@mail.gmail.com/ Tested-by: Stefano Garzarella <sgarzare@redhat.com> Tested-by: Steve French <stfrench@microsoft.com> Tested-by: Juergen Gross <jgross@suse.com> Acked-by: Nicolas Schier <nsc@kernel.org> Link: https://patch.msgid.link/20260210-kbuild-fix-debuginfo-rpm-v1-1-0730b92b14bc@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Most samples cannot be build on some environments because they depend
on CC_CAN_LINK, which is set according to the result of
scripts/cc-can-link.sh called by cc_can_link_user.
Because cc-can-link.sh must now build without warning, it may fail
because it is calling printf() with an empty string:
+ cat
+ gcc -m32 -Werror -Wl,--fatal-warnings -x c - -o /dev/null
<stdin>: In function ‘main’:
<stdin>:4:9: error: zero-length gnu_printf format string [-Werror=format-zero-length]
cc1: all warnings being treated as errors
Fix this warning and the samples build by actually printing something.
Cc: stable@vger.kernel.org Fixes: d81d9d389b9b ("kbuild: don't enable CC_CAN_LINK if the dummy program generates warnings") Signed-off-by: Mickaël Salaün <mic@digikod.net> Reviewed-by: Nicolas Schier <nsc@kernel.org> Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20260212133544.1331437-1-mic@digikod.net Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When registering a second fgraph callback, direct path is disabled and
array loop is used instead. When ftrace_graph_active falls back to one,
we try to re-enable direct mode via ftrace_graph_enable_direct(true, ...).
But ftrace_graph_enable_direct() incorrectly disables the static key
rather than enabling it. This leaves fgraph_do_direct permanently off
after first multi-callback transition, so direct fast mode is never
restored.
Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260213142932519cuWSpEXeS4-UnCvNXnK2P@zte.com.cn Fixes: cc60ee813b503 ("function_graph: Use static_call and branch to optimize entry function") Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When the PCA-200E or SBA-200E adapter is being detached, the fore200e
is deallocated. However, the tx_tasklet or rx_tasklet may still be running
or pending, leading to use-after-free bug when the already freed fore200e
is accessed again in fore200e_tx_tasklet() or fore200e_rx_tasklet().
One of the race conditions can occur as follows:
CPU 0 (cleanup) | CPU 1 (tasklet)
fore200e_pca_remove_one() | fore200e_interrupt()
fore200e_shutdown() | tasklet_schedule()
kfree(fore200e) | fore200e_tx_tasklet()
| fore200e-> // UAF
Fix this by ensuring tx_tasklet or rx_tasklet is properly canceled before
the fore200e is released. Add tasklet_kill() in fore200e_shutdown() to
synchronize with any pending or running tasklets. Moreover, since
fore200e_reset() could prevent further interrupts or data transfers,
the tasklet_kill() should be placed after fore200e_reset() to prevent
the tasklet from being rescheduled in fore200e_interrupt(). Finally,
it only needs to do tasklet_kill() when the fore200e state is greater
than or equal to FORE200E_STATE_IRQ, since tasklets are uninitialized
in earlier states. In a word, the tasklet_kill() should be placed in
the FORE200E_STATE_IRQ branch within the switch...case structure.
The ID 8086:104f is matched by both i40e and ipw2200. The same device
ID should not be in more than one driver, because in that case, which
driver is used is unpredictable. Fix this by taking advantage of the
fact that i40e devices use PCI_CLASS_NETWORK_ETHERNET and ipw2200
devices use PCI_CLASS_NETWORK_OTHER to differentiate the devices.
Fixes: 2e45d3f4677a ("i40e: Add support for X710 B/P & SFP+ cards") Cc: stable@vger.kernel.org Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Link: https://patch.msgid.link/20260210021235.16315-1-enelsonmoore@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Several subsystems (slub, shmem, ttm, etc.) use page->private but don't
clear it before freeing pages. When these pages are later allocated as
high-order pages and split via split_page(), tail pages retain stale
page->private values.
This causes a use-after-free in the swap subsystem. The swap code uses
page->private to track swap count continuations, assuming freshly
allocated pages have page->private == 0. When stale values are present,
swap_count_continued() incorrectly assumes the continuation list is valid
and iterates over uninitialized page->lru containing LIST_POISON values,
causing a crash:
KASAN: maybe wild-memory-access in range [0xdead000000000100-0xdead000000000107]
RIP: 0010:__do_sys_swapoff+0x1151/0x1860
Fix this by clearing page->private in free_pages_prepare(), ensuring all
freed pages have clean state regardless of previous use.
Link: https://lkml.kernel.org/r/20260207173615.146159-1-mikhail.v.gavrilov@gmail.com Fixes: 3b8000ae185c ("mm/vmalloc: huge vmalloc backing pages should be split rather than compound") Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Suggested-by: Zi Yan <ziy@nvidia.com> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Brendan Jackman <jackmanb@google.com> Cc: Chris Li <chrisl@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kairui Song <ryncsn@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Patch series "mm/vmscan: fix demotion targets checks in reclaim/demotion",
v9.
This patch series addresses two issues in demote_folio_list(),
can_demote(), and next_demotion_node() in reclaim/demotion.
1. demote_folio_list() and can_demote() do not correctly check
demotion target against cpuset.mems_effective, which will cause (a)
pages to be demoted to not-allowed nodes and (b) pages fail demotion
even if the system still has allowed demotion nodes.
Patch 1 fixes this bug by updating cpuset_node_allowed() and
mem_cgroup_node_allowed() to return effective_mems, allowing directly
logic-and operation against demotion targets.
2. next_demotion_node() returns a preferred demotion target, but it
does not check the node against allowed nodes.
Patch 2 ensures that next_demotion_node() filters against the allowed
node mask and selects the closest demotion target to the source node.
This patch (of 2):
Fix two bugs in demote_folio_list() and can_demote() due to incorrect
demotion target checks against cpuset.mems_effective in reclaim/demotion.
Commit 7d709f49babc ("vmscan,cgroup: apply mems_effective to reclaim")
introduces the cpuset.mems_effective check and applies it to can_demote().
However:
1. It does not apply this check in demote_folio_list(), which leads
to situations where pages are demoted to nodes that are
explicitly excluded from the task's cpuset.mems.
2. It checks only the nodes in the immediate next demotion hierarchy
and does not check all allowed demotion targets in can_demote().
This can cause pages to never be demoted if the nodes in the next
demotion hierarchy are not set in mems_effective.
These bugs break resource isolation provided by cpuset.mems. This is
visible from userspace because pages can either fail to be demoted
entirely or are demoted to nodes that are not allowed in multi-tier memory
systems.
To address these bugs, update cpuset_node_allowed() and
mem_cgroup_node_allowed() to return effective_mems, allowing directly
logic-and operation against demotion targets. Also update can_demote()
and demote_folio_list() accordingly.
Bug 1 reproduction:
Assume a system with 4 nodes, where nodes 0-1 are top-tier and
nodes 2-3 are far-tier memory. All nodes have equal capacity.
Bug 2 reproduction:
Assume a system with 6 nodes, where nodes 0-2 are top-tier,
node 3 is a far-tier node, and nodes 4-5 are the farthest-tier nodes.
All nodes have equal capacity.
Test script:
echo 1 > /sys/kernel/mm/numa/demotion_enabled
mkdir /sys/fs/cgroup/test
echo +cpuset > /sys/fs/cgroup/cgroup.subtree_control
echo "0-2,4-5" > /sys/fs/cgroup/test/cpuset.mems
echo $$ > /sys/fs/cgroup/test/cgroup.procs
swapoff -a
# Expectation: Pages are demoted to Nodes 4-5
# Observation: No pages are demoted before oom.
stress-ng --oomable --vm 1 --vm-bytes 150% --mbind 0,1,2
Link: https://lkml.kernel.org/r/20260114205305.2869796-1-bingjiao@google.com Link: https://lkml.kernel.org/r/20260114205305.2869796-2-bingjiao@google.com Fixes: 7d709f49babc ("vmscan,cgroup: apply mems_effective to reclaim") Signed-off-by: Bing Jiao <bingjiao@google.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Gregory Price <gourry@gourry.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: Wei Xu <weixugc@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When user provides incorrectly sized buffer for build ID for PROCMAP_QUERY
we return with -ENAMETOOLONG error. After recent changes this condition
happens later, after we unlocked mmap_lock/per-VMA lock and did mmput(),
so original goto out is now wrong and will double-mmput() mm_struct. Fix
by jumping further to clean up only vm_file and name_buf.
Rename free_pages_prepare() to __free_pages_prepare(), add an fpi_t
parameter, and skip those checks if FPI_TRYLOCK is set. To keep the fpi_t
definition in mm/page_alloc.c, add a wrapper function free_pages_prepare()
that always passes FPI_NONE and use it in mm/compaction.c.
Link: https://lkml.kernel.org/r/20260209062639.16577-1-harry.yoo@oracle.com Fixes: 8c57b687e833 ("mm, bpf: Introduce free_pages_nolock()") Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Zi Yan <ziy@nvidia.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Brendan Jackman <jackmanb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit a833a693a490 ("mm: hugetlb: fix incorrect fallback for subpool")
fixed an underflow error for hstate->resv_huge_pages caused by incorrectly
attributing globally requested pages to the subpool's reservation.
Unfortunately, this fix also introduced the opposite problem, which would
leave spool->used_hpages elevated if the globally requested pages could
not be acquired. This is because while a subpool's reserve pages only
accounts for what is requested and allocated from the subpool, its "used"
counter keeps track of what is consumed in total, both from the subpool
and globally. Thus, we need to adjust spool->used_hpages in the other
direction, and make sure that globally requested pages are uncharged from
the subpool's used counter.
Each failed allocation attempt increments the used_hpages counter by how
many pages were requested from the global pool. Ultimately, this renders
the subpool unusable, as used_hpages approaches the max limit.
The issue can be reproduced as follows:
1. Allocate 4 hugetlb pages
2. Create a hugetlb mount with max=4, min=2
3. Consume 2 pages globally
4. Request 3 pages from the subpool (2 from subpool + 1 from global)
4.1 hugepage_subpool_get_pages(spool, 3) succeeds.
used_hpages += 3
4.2 hugetlb_acct_memory(h, 1) fails: no global pages left
used_hpages -= 2
5. Subpool now has used_hpages = 1, despite not being able to
successfully allocate any hugepages. It believes it can now only
allocate 3 more hugepages, not 4.
With each failed allocation attempt incrementing the used counter, the
subpool eventually reaches a point where its used counter equals its
max counter. At that point, any future allocations that try to
allocate hugeTLB pages from the subpool will fail, despite the subpool
not having any of its hugeTLB pages consumed by any user.
Once this happens, there is no way to make the subpool usable again,
since there is no way to decrement the used counter as no process is
really consuming the hugeTLB pages.
The underflow issue that the original commit fixes still remains fixed
as well.
Without this fix, used_hpages would keep on leaking if
hugetlb_acct_memory() fails.
Link: https://lkml.kernel.org/r/20260116204037.2270096-1-joshua.hahnjy@gmail.com Fixes: a833a693a490 ("mm: hugetlb: fix incorrect fallback for subpool") Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> Acked-by: Usama Arif <usama.arif@linux.dev> Cc: David Hildenbrand <david@kernel.org> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Ma Wupeng <mawupeng1@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Invalidating a dmabuf will impact other users of the shared BO.
In the scenario where process A moves the BO, it needs to inform
process B about the move and process B will need to update its
page table.
The commit fixes a synchronisation bug caused by the use of the
ticket: it made amdgpu_vm_handle_moved behave as if updating
the page table immediately was correct but in this case it's not.
An example is the following scenario, with 2 GPUs and glxgears
running on GPU0 and Xorg running on GPU1, on a system where P2P
PCI isn't supported:
glxgears:
export linear buffer from GPU0 and import using GPU1
submit frame rendering to GPU0
submit tiled->linear blit
Xorg:
copy of linear buffer
The sequence of jobs would be:
drm_sched_job_run # GPU0, frame rendering
drm_sched_job_queue # GPU0, blit
drm_sched_job_done # GPU0, frame rendering
drm_sched_job_run # GPU0, blit
move linear buffer for GPU1 access #
amdgpu_dma_buf_move_notify -> update pt # GPU0
It this point the blit job on GPU0 is still running and would
likely produce a page fault.
Cc: stable@vger.kernel.org Fixes: a448cb003edc ("drm/amdgpu: implement amdgpu_gem_prime_move_notify v2") Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
On Framework laptops with DDR5 modules, underflow can be observed.
It's unclear why it only occurs on specific desktop contents. However,
increasing enter/exit latencies by 3us seems to resolve it.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4463 Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
__io_fixed_fd_install() returns 0 on success for non-alloc mode
(specific slot), not the slot index. io_pipe_fixed() used this return
value directly as the slot index in fds[], which can cause the reported
values returned via copy_to_user() to be incorrect, or the error path
operating on the incorrect direct descriptor.
Fix by computing the actual 0-based slot index (slot - 1) for specific
slot mode, while preserving the existing behavior for auto-alloc mode
where __io_fixed_fd_install() already returns the allocated index.
Cc: stable@vger.kernel.org Fixes: 53db8a71ecb4 ("io_uring: add support for IORING_OP_PIPE") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
Explicit fixed file install/remove operations on slots outside the
configured alloc range can corrupt alloc_hint via io_file_bitmap_set()
and io_file_bitmap_clear(), which unconditionally update alloc_hint to
the bit position. This causes subsequent auto-allocations to fall
outside the configured range.
For example, if the alloc range is [10, 20) and a file is removed at
slot 2, alloc_hint gets set to 2. The next auto-alloc then starts
searching from slot 2, potentially returning a slot below the range.
Fix this by clamping alloc_hint to [file_alloc_start, file_alloc_end)
at the top of io_file_bitmap_get() before starting the search.
Cc: stable@vger.kernel.org Fixes: 6e73dffbb93c ("io_uring: let to set a range for file slot allocation") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
CephFS stores file data across multiple RADOS objects. An object is the
atomic unit of storage, so the writeback code must clean only folios
that belong to the same object with each OSD request.
CephFS also supports RAID0-style striping of file contents: if enabled,
each object stores multiple unbroken "stripe units" covering different
portions of the file; if disabled, a "stripe unit" is simply the whole
object. The stripe unit is (usually) reported as the inode's block size.
Though the writeback logic could, in principle, lock all dirty folios
belonging to the same object, its current design is to lock only a
single stripe unit at a time. Ever since this code was first written,
it has determined this size by checking the inode's block size.
However, the relatively-new fscrypt support needed to reduce the block
size for encrypted inodes to the crypto block size (see 'fixes' commit),
which causes an unnecessarily high number of write operations (~1024x as
many, with 4MiB objects) and correspondingly degraded performance.
Fix this (and clarify intent) by using i_layout.stripe_unit directly in
ceph_define_write_size() so that encrypted inodes are written back with
the same number of operations as if they were unencrypted.
This patch depends on the preceding commit ("ceph: do not propagate page
array emplacement errors as batch errors") for correctness. While it
applies cleanly on its own, applying it alone will introduce a
regression. This dependency is only relevant for kernels where ce80b76dd327 ("ceph: introduce ceph_process_folio_batch() method") has
been applied; stable kernels without that commit are unaffected.
When fscrypt is enabled, move_dirty_folio_in_page_array() may fail
because it needs to allocate bounce buffers to store the encrypted
versions of each folio. Each folio beyond the first allocates its bounce
buffer with GFP_NOWAIT. Failures are common (and expected) under this
allocation mode; they should flush (not abort) the batch.
However, ceph_process_folio_batch() uses the same `rc` variable for its
own return code and for capturing the return codes of its routine calls;
failing to reset `rc` back to 0 results in the error being propagated
out to the main writeback loop, which cannot actually tolerate any
errors here: once `ceph_wbc.pages` is allocated, it must be passed to
ceph_submit_write() to be freed. If it survives until the next iteration
(e.g. due to the goto being followed), ceph_allocate_page_array()'s
BUG_ON() will oops the worker.
Note that this failure mode is currently masked due to another bug
(addressed next in this series) that prevents multiple encrypted folios
from being selected for the same write.
For now, just reset `rc` when redirtying the folio to prevent errors in
move_dirty_folio_in_page_array() from propagating. Note that
move_dirty_folio_in_page_array() is careful never to return errors on
the first folio, so there is no need to check for that. After this
change, ceph_process_folio_batch() no longer returns errors; its only
remaining failure indicator is `locked_pages == 0`, which the caller
already handles correctly.
Commit 32dc0042528d ("tracing: Reset last-boot buffers when reading
out all cpu buffers") resets the last_boot_info when user read out
all data via trace_pipe* files. But it is not reset when user
resets the buffer from other files. (e.g. write `trace` file)
Reset it when the corresponding ring buffer is reset too.
Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/177071302364.2293046.17895165659153977720.stgit@mhiramat.tok.corp.google.com Fixes: 32dc0042528d ("tracing: Reset last-boot buffers when reading out all cpu buffers") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In commit 99537d5c476c ("net: macb: Relocate mog_init_rings() callback
from macb_mac_link_up() to macb_open()"), the mog_init_rings() callback
was moved from macb_mac_link_up() to macb_open() to resolve a deadlock
issue. However, this change introduced a tx/rx malfunction following
phy link down and up events. The issue arises from a mismatch between
the software queue->tx_head, queue->tx_tail, queue->rx_prepared_head,
and queue->rx_tail values and the hardware's internal tx/rx queue
pointers.
According to the Zynq UltraScale TRM [1], when tx/rx is disabled, the
internal tx queue pointer resets to the value in the tx queue base
address register, while the internal rx queue pointer remains unchanged.
The following is quoted from the Zynq UltraScale TRM:
When transmit is disabled, with bit [3] of the network control register
set low, the transmit-buffer queue pointer resets to point to the address
indicated by the transmit-buffer queue base address register. Disabling
receive does not have the same effect on the receive-buffer queue
pointer.
Additionally, there is no need to reset the RBQP and TBQP registers in a
phy event callback. Therefore, move macb_init_buffers() to macb_open().
In a phy link up event, the only required action is to reset the tx
software head and tail pointers to align with the hardware's behavior.
Commit 95540ad6747c ("net: ti: icssg-prueth: Add support for HSR frame
forward offload") introduced support for offloading HSR frame forwarding,
which relies on functions such as is_hsr_master() provided by the HSR
module. Although HSR provides stubs for configurations with HSR
disabled, this driver still requires an optional dependency on HSR.
Otherwise, build failures will occur when icssg-prueth is built-in
while HSR is configured as a module.
ld.lld: error: undefined symbol: is_hsr_master
>>> referenced by icssg_prueth.c:710 (drivers/net/ethernet/ti/icssg/icssg_prueth.c:710)
>>> drivers/net/ethernet/ti/icssg/icssg_prueth.o:(icssg_prueth_hsr_del_mcast) in archive vmlinux.a
>>> referenced by icssg_prueth.c:681 (drivers/net/ethernet/ti/icssg/icssg_prueth.c:681)
>>> drivers/net/ethernet/ti/icssg/icssg_prueth.o:(icssg_prueth_hsr_add_mcast) in archive vmlinux.a
>>> referenced by icssg_prueth.c:1812 (drivers/net/ethernet/ti/icssg/icssg_prueth.c:1812)
>>> drivers/net/ethernet/ti/icssg/icssg_prueth.o:(prueth_netdevice_event) in archive vmlinux.a
ld.lld: error: undefined symbol: hsr_get_port_ndev
>>> referenced by icssg_prueth.c:712 (drivers/net/ethernet/ti/icssg/icssg_prueth.c:712)
>>> drivers/net/ethernet/ti/icssg/icssg_prueth.o:(icssg_prueth_hsr_del_mcast) in archive vmlinux.a
>>> referenced by icssg_prueth.c:712 (drivers/net/ethernet/ti/icssg/icssg_prueth.c:712)
>>> drivers/net/etherneteth_hsr_del_mcast) in archive vmlinux.a
>>> referenced by icssg_prueth.c:683 (drivers/net/ethernet/ti/icssg/icssg_prueth.c:683)
>>> drivers/net/ethernet/ti/icssg/icssg_prueth.o:(icssg_prueth_hsr_add_mcast) in archive vmlinux.a
>>> referenced 1 more times
Fixes: 95540ad6747c ("net: ti: icssg-prueth: Add support for HSR frame forward offload") Signed-off-by: Kevin Hao <haokexin@gmail.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260207-icssg-dep-v3-1-8c47c1937f81@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The RX/TX flow-control bitmaps (rx_fc_pfvf_bmap and tx_fc_pfvf_bmap)
are allocated by cgx_lmac_init() but never freed in cgx_lmac_exit().
Unbinding and rebinding the driver therefore triggers kmemleak:
The ID 1186:4302 is matched by both r8169 and skge. The same device ID
should not be in more than one driver, because in that case, which
driver is used is unpredictable. I downloaded the latest drivers for
all hardware revisions of the D-Link DGE-530T from D-Link's website,
and the only drivers which contain this ID are Realtek drivers.
Therefore, remove this device ID from skge.
In the kernel bug report which requested addition of this device ID,
someone created a patch to add the ID to skge. Then, it was pointed
out that this device is an "r8169 in disguise", and a patch was created
to add it to r8169. Somehow, both of these patches got merged. See the
link below.
According to Documentation/dev-tools/kasan.rst, software KASAN modes use
compiler instrumentation to insert validity checks. Such instrumentation
might be incompatible with some parts of the kernel, and therefore needs
to be disabled, just use the attribute __no_sanitize_address to disable
instrumentation for the low level function setup_ptwalker().
Otherwise bringing up the secondary CPUs failed when CONFIG_KASAN is set
(especially when PTW is enabled), here are the call chains:
After commit 88fd2b70120d ("LoongArch: Fix sleeping in atomic context for
PREEMPT_RT"), it should guard percpu handler under !CONFIG_PREEMPT_RT to
avoid redundant operations.
Currently, use %p to prevent leaking information about the kernel memory
layout when printing the PC address, but the kernel log messages are not
useful to debug problem if bt_address() returns 0. Given that the type of
"pc" variable is unsigned long, it should use %px to print the unmodified
unwinding address.
Currently we use bottom-up allocation after sparse_init(), the reason is
sparse_init() need a lot of memory, and bottom-up allocation may exhaust
precious low memory (below 4GB). On the other hand, SWIOTLB and CMA need
low memories for DMA32, so swiotlb_init() and dma_contiguous_reserve()
need bottom-up allocation.
Since swiotlb_init() and dma_contiguous_reserve() are both called in
arch_mem_init(), we no longer need bottom-up allocation after that. So
we set the allocation policy to top-down at the end of arch_mem_init(),
in order to avoid later memory allocations (such as KASAN) exhaust low
memory.
This solve at least two problems:
1. Some buggy BIOSes use 0xfd000000~0xfe000000 for secondary CPUs, but
didn't reserve this range, which causes smpboot failures.
2. Some DMA32 devices, such as Loongson-DRM and OHCI, cannot work with
KASAN enabled.
According to Documentation/locking/seqlock.rst, seqcount_t is not
NMI-safe and seqcount_latch_t should be used when read path can interrupt
the write-side critical section. In this case, do not access
current->mems_allowed_seq and avoid retry.
Custom target specifications are unstable, but starting with Rust 1.95.0,
`rustc` requires to explicitly pass `-Zunstable-options` to use them [1]:
error: error loading target specification: custom targets are unstable and require `-Zunstable-options`
|
= help: run `rustc --print target-list` for a list of built-in targets
David (Rust compiler team lead), writes:
"We're destabilising custom targets to allow us to move forward with
build-std without accidentally exposing functionality that we'd like
to revisit prior to committing to. I'll start a thread on Zulip to
discuss with the RfL team how we can come up with an alternative
for them."
Thus pass it.
Cc: David Wood <david@davidtw.co> Cc: Wesley Wiser <wwiser@gmail.com> Cc: stable@vger.kernel.org # Needed in 6.12.y and later (Rust is pinned in older LTSs). Link: https://github.com/rust-lang/rust/pull/151534 Reviewed-by: Gary Guo <gary@garyguo.net> Tested-by: Gary Guo <gary@garyguo.net> Link: https://patch.msgid.link/20260206204535.39431-1-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Update the device ID for Dell XPS 13 7390 2-in-1 in the quirk
`QUIRK_EDP_LIMIT_RATE_HBR2` entry. The previous ID (0x8a12) was
incorrect; the correct ID is 0x8a52.
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/5969 Fixes: 21c586d9233a ("drm/i915/dp: Add device specific quirk to limit eDP rate to HBR2") Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Cc: <stable@vger.kernel.org> # v6.18+ Signed-off-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://patch.msgid.link/20251226043359.2553-1-ankit.k.nautiyal@intel.com
(cherry picked from commit c7c30c4093cc11ff66672471f12599a555708343) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The ceph_zero_partial_object function was missing proper snapshot
context for its OSD write operations, which could lead to data
inconsistencies in snapshots.
Since commit 6e690d54cfa8 ("serial: 8250: fix return error code in
serial8250_request_std_resource()"), registering an 8250 MMIO port
without mapbase no longer works, as the resource range is derived from
mapbase/mapsize.
Populate mapbase and mapsize accordingly. Also drop ugly membase KSEG1
pointer and set UPF_IOREMAP instead, letting the 8250 core perform the
ioremap.
Fixes: 6e690d54cfa8 ("serial: 8250: fix return error code in serial8250_request_std_resource()") Cc: stable@vger.kernel.org Reported-by: Waldemar Brodkorb <wbx@openadk.org> Link: https://lore.kernel.org/linux-mips/aX-d0ShTplHKZT33@waldemar-brodkorb.de/ Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
We used to use the cifs_tcp_ses_lock to protect a lot of objects
that are not just the server, ses or tcon lists. We later introduced
srv_lock, ses_lock and tc_lock to protect fields within the
corresponding structs. This was done to provide a more granular
protection and avoid unnecessary serialization.
There were still a couple of uses of cifs_tcp_ses_lock to provide
tcon fields. In this patch, I've replaced them with tc_lock.
Cc: stable@vger.kernel.org Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The LPI2C controller sends a NACK at the end of a receive command
unless another receive command is already queued in MTDR. During
SMBus block reads, this causes the controller to NACK immediately
after receiving the block length byte, aborting the transfer before
the data bytes are read.
Fix this by queueing a second receive command as soon as the block
length byte is received, keeping MTDR non-empty and ensuring
continuous ACKs. The initial receive command reads the block length,
and the subsequent command reads the remaining data bytes according
to the reported length.
Fixes: a55fa9d0e42e ("i2c: imx-lpi2c: add low power i2c bus driver") Signed-off-by: Carlos Song <carlos.song@nxp.com> Cc: <stable@vger.kernel.org> # v4.10+ Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/20260123105459.3448822-1-carlos.song@nxp.com Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
cpustat_tail indexes cpustat_util[], which is a NUM_SAMPLE_PERIODS-sized
ring buffer. need_counting_irqs() currently wraps the index using
NUM_HARDIRQ_REPORT, which only happens to match NUM_SAMPLE_PERIODS.
Use NUM_SAMPLE_PERIODS for the wrap to keep the ring math correct even if
the NUM_HARDIRQ_REPORT or NUM_SAMPLE_PERIODS changes.
Link: https://lkml.kernel.org/r/tencent_7068189CB6D6689EB353F3D17BF5A5311A07@qq.com Fixes: e9a9292e2368 ("watchdog/softlockup: Report the most frequent interrupts") Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Zhang Run <zhang.run@zte.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Enabling KCSAN is causing a large number of duplicate types in BTF for
core kernel structs like task_struct [1]. This is due to the definition
in include/linux/compiler_types.h
Because some objects in the kernel are compiled without KCSAN flags
(KCSAN_SANITIZE) we sometimes get the empty __data_racy annotation for
objects; as a result we get multiple conflicting representations of the
associated structs in DWARF, and these lead to multiple instances of core
kernel types in BTF since they cannot be deduplicated due to the
additional modifier in some instances.
Moving the __data_racy definition under CONFIG_KCSAN avoids this problem,
since the volatile modifier will be present for both KCSAN and
KCSAN_SANITIZE objects in a CONFIG_KCSAN=y kernel.
Link: https://lkml.kernel.org/r/20260116091730.324322-1-alan.maguire@oracle.com Fixes: 31f605a308e6 ("kcsan, compiler_types: Introduce __data_racy type qualifier") Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Reported-by: Nilay Shroff <nilay@linux.ibm.com> Tested-by: Nilay Shroff <nilay@linux.ibm.com> Suggested-by: Marco Elver <elver@google.com> Reviewed-by: Marco Elver <elver@google.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Bart van Assche <bvanassche@acm.org> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Jason A. Donenfeld <jason@zx2c4.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Naman Jain <namjain@linux.microsoft.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
- Create an MPTCP endpoint for address A without any flags
- Create a new MPTCP connection from address A
- Remove the MPTCP endpoint: the corresponding subflows will be removed
- Recreate the endpoint with the same ID, but with the subflow flag
- Change the same endpoint to add the fullmesh flag
In this case, msk->pm.local_addr_used has been kept to 0 as expected,
but the corresponding bit in msk->pm.id_avail_bitmap was still unset
after having removed the endpoint, causing the splat later on.
When removing an endpoint, the corresponding endpoint ID was only marked
as available for "signal" types with an announced address, plus all
"subflow" types, but not the other types like an endpoint corresponding
to the initial subflow. In these cases, re-creating an endpoint with the
same ID didn't signal/create anything. Here, adding the fullmesh flag
was creating the splat when calling __mark_subflow_endp_available() from
mptcp_pm_nl_fullmesh(), because msk->pm.local_addr_used was set to 0
while the ID was marked as used.
To fix this issue, the corresponding bit in msk->pm.id_avail_bitmap can
always be set as available when removing an MPTCP in-kernel endpoint. In
other words, moving the call to __set_bit() to do it in all cases,
except for "subflow" types where this bit is handled in a dedicated
helper.
Note: instead of adding a new spin_(un)lock_bh that would be taken in
all cases, do all the actions requiring the spin lock under the same
block.
This modification potentially fixes another issue reported by syzbot,
see [1]. But without a reproducer or more details about what exactly
happened before, it is hard to confirm.
Fixes: e255683c06df ("mptcp: pm: re-using ID of unused removed ADD_ADDR") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/606 Reported-by: syzbot+f56f7d56e2c6e11a01b6@syzkaller.appspotmail.com Closes: https://lore.kernel.org/68fcfc4a.050a0220.346f24.02fb.GAE@google.com [1] Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260205-net-mptcp-misc-fixes-6-19-rc8-v2-1-c2720ce75c34@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit a4e772898f8b ("PCI: Add missing bridge lock to pci_bus_lock()")
delegates the bridge device's pci_dev_trylock() to pci_bus_trylock() in
pci_slot_trylock(), but it forgets to remove the corresponding
pci_dev_unlock() when pci_bus_trylock() fails.
The commit 8278c6914306 ("PCI: Preserve bridge window resource type flags")
changed bridge window resource behavior such that flags are no longer zero
if the bridge window is not valid or is disabled (mainly to preserve the
type flags for later use). If a bridge window has its limit smaller than
base address, pci_read_bridge_*() sets both IORESOURCE_UNSET and
IORESOURCE_DISABLED to indicate the bridge window exists but is not valid
with the current base and limit configuration.
The code in pci_claim_bridge_resources() still depends on the old behavior
of checking validity of the bridge window solely based on !r->flags,
whereas after 8278c6914306, also IORESOURCE_DISABLED may indicate bridge
window addresses are not valid.
While pci_claim_resource() does check IORESOURCE_UNSET,
pci_claim_bridge_resource() attempts to clip the resource if
pci_claim_resource() fails, which is not correct for bridge window
resources that are not valid. As pci_bus_clip_resource() performs clipping
regardless of flags and then clears IORESOURCE_UNSET, it should not be
called unless the resource is valid.
The problem is visible in this log:
pci 0000:20:00.0: PCI bridge to [bus 21]
pci 0000:20:00.0: bridge window [io size 0x0000 disabled]: can't claim; no address assigned
pci 0000:20:00.0: [io 0x0000-0xffffffffffffffff disabled] clipped to [io 0x0000-0xffff disabled]
Add IORESOURCE_DISABLED check in pci_claim_bridge_resources() to only
claim bridge windows that appear to have a valid configuration.
If an error occurs during register_netdev() for the first MAC in
cpsw_register_ports(), even though cpsw->slaves[0].ndev is set to NULL,
cpsw->slaves[1].ndev would remain unchanged. This could later cause
cpsw_unregister_ports() to attempt unregistering the second MAC.
To address this, add a check for ndev->reg_state before calling
unregister_netdev(). With this change, setting cpsw->slaves[i].ndev
to NULL becomes unnecessary and can be removed accordingly.
Fixes: ed3525eda4c4 ("net: ethernet: ti: introduce cpsw switchdev based driver part 1 - dual-emac") Signed-off-by: Kevin Hao <haokexin@gmail.com> Cc: stable@vger.kernel.org Reviewed-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> Link: https://patch.msgid.link/20260205-cpsw-error-path-v1-2-6e58bae6b299@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The current error handling in cpsw_probe() has two issues:
- cpsw_unregister_ports() may be called before cpsw_register_ports() has
been executed.
- cpsw_unregister_ports() is already invoked within cpsw_register_ports()
in case of a register_netdev() failure, but the error path would call
it again.
Fixes: ed3525eda4c4 ("net: ethernet: ti: introduce cpsw switchdev based driver part 1 - dual-emac") Signed-off-by: Kevin Hao <haokexin@gmail.com> Cc: stable@vger.kernel.org Reviewed-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> Link: https://patch.msgid.link/20260205-cpsw-error-path-v1-1-6e58bae6b299@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Ungate GPU CG/PG in device_fini_hw and device_halt to protect GPU
register accesses, e.g. GC registers are accessed in amdgpu_irq_disable_all()
and amdgpu_fence_driver_hw_fini().
The kfd_event_page_set() function writes KFD_SIGNAL_EVENT_LIMIT * 8
bytes via memset without checking the buffer size parameter. This allows
unprivileged userspace to trigger an out-of bounds kernel memory write
by passing a small buffer, leading to potential privilege
escalation.
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> Reviewed-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 13b1f8e25bfd1 ("usb: dwc2: Force mode optimizations") removed the
dwc2_force_mode(hsotg, true) in dwc2_force_dr_mode() if dr_mode is host.
But this brings a bug: the controller fails to resume back as host,
further debugging shows that the controller is resumed as peripheral.
The reason is dwc2_force_dr_mode() missed the host mode forcing, and
when resuming from s2ram, GINTSTS is 0 by default, dwc2_is_device_mode
in dwc2_resume() misreads this as the controller is in peripheral mode.
Fix the resume failure by adding back the dwc2_force_mode(hsotg, true).
Then an obvious question is: why this bug hasn't been observed and fixed
for about six years? There are two resons: most dwc2 platforms set the
dr_mode as otg; Some platforms don't have suspend & resume support yet.
Currently dwc3_gadget_vbus_draw() can be called from atomic
context, which in turn invokes power-supply-core APIs. And
some these PMIC APIs have operations that may sleep, leading
to kernel panic.
Fix this by moving the vbus_draw into a workqueue context.
Fixes: 99288de36020 ("usb: dwc3: add an alternate path in vbus_draw callback") Cc: stable <stable@kernel.org> Tested-by: Samuel Wu <wusamuel@google.com> Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Signed-off-by: Prashanth K <prashanth.k@oss.qualcomm.com> Link: https://patch.msgid.link/20260204054155.3063825-1-prashanth.k@oss.qualcomm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The mmio regmap that may be allocated during probe is never freed.
Switch to using the device managed allocator so that the regmap is
released on probe failures (e.g. probe deferral) and on driver unbind.
Fixes: 61de83fd8256 ("mux: mmio: Do not use syscon helper to build regmap") Cc: stable@vger.kernel.org # 6.16 Cc: Andrew Davis <afd@ti.com> Signed-off-by: Johan Hovold <johan@kernel.org> Acked-by: Andrew Davis <afd@ti.com> Link: https://patch.msgid.link/20251127134702.1915-1-johan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When dw_pcie_iatu_setup() configures outbound address translation for both
type PCIE_ATU_TYPE_MEM and PCIE_ATU_TYPE_IO, the iATU index to use is
incremented before calling dw_pcie_prog_outbound_atu().
However for msg_atu_index, the index is not incremented before use,
causing the iATU index to be the same as the last configured iATU index,
which means that it will incorrectly use the same iATU index that is
already in use, breaking outbound address translation.
In total there are three problems with this code:
-It assigns msg_atu_index the same index that was used for the last
outbound address translation window, rather than incrementing the index
before assignment.
-The index should only be incremented (and msg_atu_index assigned) if the
use_atu_msg feature is actually requested/in use (pp->use_atu_msg is set).
-If the use_atu_msg feature is requested/in use, and there are no outbound
iATUs available, the code should return an error, as otherwise when this
this feature is used, it will use an iATU index that is out of bounds.
Fixes: e1a4ec1a9520 ("PCI: dwc: Add generic MSG TLP support for sending PME_Turn_Off when system suspend") Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Tested-by: Maciej W. Rozycki <macro@orcam.me.uk> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hans Zhang <zhanghuabing@ecosda.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260127151038.1484881-6-cassel@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
Loongson2ef reserves io range below 0x4000 (LOONGSON_PCI_IO_START) while
ISA-mode only IDE controller on the south bridge still has a hard
dependency on ISA IO ports.
The reservation was done by lifting loongson_pci_io_resource.start onto
0x4000. Prior to commit ae81aad5c2e1 ("MIPS: PCI: Use
pci_enable_resources()"), the arch specific pcibios_enable_resources()
did not check if the resources were claimed, which diverges from what
PCI core checks, effectively hiding the fact that IDE IO resources were
not properly within the resource tree. After starting to use
pcibios_enable_resources() from PCI core, enabling IDE controller fails:
pata_cs5536 0000:00:0e.2: BAR 0 [io 0x01f0-0x01f7]: not claimed; can't enable device
pata_cs5536 0000:00:0e.2: probe with driver pata_cs5536 failed with error -22
MIPS PCI code already has support for enforcing lower bounds using
PCIBIOS_MIN_IO in pcibios_align_resource() without altering the IO
window start address itself. Make Loongson2ef PCI code use
PCIBIOS_MIN_IO too.
Fixes: ae81aad5c2e1 ("MIPS: PCI: Use pci_enable_resources()") Cc: stable@vger.kernel.org Tested-by: Beiyan Yun <root@infi.wang> Tested-by: Yao Zi <me@ziyao.cc> Signed-off-by: Rong Zhang <rongrong@oss.cipunited.com> Acked-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
We are about to set loongson_pci_io_resource.start to 0 and adopt
PCIBIOS_MIN_IO. As the first step, PCI controller needs to be registered
in early stage to make it the root of other resources (e.g., i8259) and
prevent resource conflicts.
Register it in plat_mem_setup() instead of arch_initcall().
Fixes: ae81aad5c2e1 ("MIPS: PCI: Use pci_enable_resources()") Cc: stable@vger.kernel.org Tested-by: Beiyan Yun <root@infi.wang> Tested-by: Yao Zi <me@ziyao.cc> Signed-off-by: Rong Zhang <rongrong@oss.cipunited.com> Acked-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
When both KASAN and SLAB_STORE_USER are enabled, accesses to
struct kasan_alloc_meta fields can be misaligned on 64-bit architectures.
This occurs because orig_size is currently defined as unsigned int,
which only guarantees 4-byte alignment. When struct kasan_alloc_meta is
placed after orig_size, it may end up at a 4-byte boundary rather than
the required 8-byte boundary on 64-bit systems.
Note that 64-bit architectures without HAVE_EFFICIENT_UNALIGNED_ACCESS
are assumed to require 64-bit accesses to be 64-bit aligned.
See HAVE_64BIT_ALIGNED_ACCESS and commit adab66b71abf ("Revert:
"ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"") for more details.
Change orig_size from unsigned int to unsigned long to ensure proper
alignment for any subsequent metadata. This should not waste additional
memory because kmalloc objects are already aligned to at least
ARCH_KMALLOC_MINALIGN.
When allocating slabobj_ext array in alloc_slab_obj_exts(), the array
can be allocated from the same slab we're allocating the array for.
This led to obj_exts_in_slab() incorrectly returning true [1],
although the array is not allocated from wasted space of the slab.
Vlastimil Babka observed that this problem should be fixed even when
ignoring its incompatibility with obj_exts_in_slab(), because it creates
slabs that are never freed as there is always at least one allocated
object.
To avoid this, use the next kmalloc size or large kmalloc when
the array can be allocated from the same cache we're allocating
the array for.
In case of random kmalloc caches, there are multiple kmalloc caches
for the same size and the cache is selected based on the caller address.
Because it is fragile to ensure the same caller address is passed to
kmalloc_slab(), kmalloc_noprof(), and kmalloc_node_noprof(), bump the
size to (s->object_size + 1) when the sizes are equal, instead of
directly comparing the kmem_cache pointers.
Note that this doesn't happen when memory allocation profiling is
disabled, as when the allocation of the array is triggered by memory
cgroup (KMALLOC_CGROUP), the array is allocated from KMALLOC_NORMAL.
Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202601231457.f7b31e09-lkp@intel.com [1] Cc: stable@vger.kernel.org Fixes: 4b8736964640 ("mm/slab: add allocation accounting into slab allocation and free paths") Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Link: https://patch.msgid.link/20260126125714.88008-1-harry.yoo@oracle.com Reviewed-by: Hao Li <hao.li@linux.dev> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>